Health Services Evaluation
Health Services Research
Series Editor
Boris Sobolev
University of British Columbia
Vancouver, BC, Canada
Health services research is the study of the organization, uses and outcomes of
health care. The societal value of health services research lies in identifying the
ways in which health care can best be organized, financed, and delivered.
This ambitious agenda brings together researchers from a wide range of
disciplinary backgrounds that are required for evaluating the effectiveness of
diagnostic technologies, treatment procedures, and managerial solutions. The
series is envisaged as a collection that overviews the established knowledge
and provides access to accepted information in the field. The content is
grouped into six major areas.
This Springer imprint is published by the registered company Springer Science+Business Media,
LLC part of Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Series Preface
Health Services Research has experienced explosive growth in the past three
decades. The new field was formed at the interface of a number of disciplines,
including medicine, statistics, economics, management science, and the social
and behavioral sciences, which came together around the study of health care
practice, delivery and outcomes. The rich, multidisciplinary research enterprise
that developed from this fusion has already produced a growing and
sophisticated body of subject matter research and has also defined a body of
methodology that is integral to the field. True to the multidisciplinary origins
of the field, its methods continue to benefit from developments in diverse
disciplines, while formulating and addressing scientific questions that are
unique to health care and outcomes research.
The societal value of health services research lies in identifying the ways in
which health care can best be organized, financed, and delivered. This ambitious
agenda brings together researchers from a wide range of disciplinary
backgrounds who are required for evaluating the effectiveness of diagnostic
technologies, treatments, procedures, and health delivery systems as no single
discipline provides a full perspective on how the health systems operate.
A fundamental discovery was the persistent variation in health care utilization
across providers, regions and countries, variation that cannot be
explained by population illness level, known benefit or patient preference.
Another discovery was that treatments and procedures that are meant to benefit
patients may produce adverse events and unintended consequences. We have
learned that results of randomized clinical trials cannot always be generalized
to clinical practice because patients enrolled in trials can be highly selected.
Researchers have been able to identify patients who may benefit from a
treatment but there are groups of patients for whom the optimal treatment is
not well defined or may depend on their personal preferences. Learning what
works in real life gave rise to comparative effectiveness research.
The Health Services Research series addresses the increasing need for a
comprehensive reference in the field of inquiry that welcomes interdisciplinary
collaborations. This major reference work aims to be a source of information
for everyone who seeks to develop an understanding of health services and
health systems, and learn about the historic, political, and economic factors
that influence health policies at global, national, regional and local levels. The
intended readership includes graduate students, educators, researchers,
healthcare professionals, policy makers and service administrators.
The main reason for public support of health services research is the
common understanding that new knowledge will lead to more effective health
care. Over the past decades, we have witnessed the increased prominence of
health services and health policy research since the knowledge, skills and
approaches required for ground-breaking work distinguish it from other specialties.
An important step towards the formation of the profession is a
comprehensive reference work of established knowledge. The Health Services
Research series is intended to provide the health services researcher a home for
the foundations of the profession.
The Health Services Research series is available in both printed and online
formats. The online version will serve as a web-based conduit of information
that evolves as knowledge content expands. This innovative depository of
knowledge will offer various search tools, including cross-referencing across
chapters and linking to supplement data, other Springer reference works and
external articles.
About the Series Editor
General Hospital. The program’s mission is closing the gap between health
care that is possible and health care that is delivered. This ambitious agenda
brings together researchers from a wide range of disciplinary backgrounds that
are required for evaluating the effectiveness of diagnostic technologies, treatment
procedures, and managerial solutions. The program’s investigators
empirically assess the benefits and harms of therapeutic and health care
interventions in the acute and primary care setting, using patient registries
and data from routine medical care. By learning what works in everyday
clinical practice the program generates knowledge that helps physicians and
patients to make shared decisions about the best approach to treatment.
Dr. Sobolev promotes and advances the causality perspective in health
services research for informing policy and decision-making. In particular, his
recent work helped to estimate the reduction in postoperative mortality
expected from providing timely cardiac surgical care; the health effects of
receiving hip fracture surgery within the government benchmark; the proportion
of hospital readmissions that could be avoided had patients undergone
medication review in emergency departments rather than in hospital wards;
and the expected reduction of mortality had all chronic obstructive pulmonary
disease patients had their second exacerbation prevented.
About the Editors
serving as leaders who generate evidence and apply critical thinking to the
health challenges of today and tomorrow. The Department’s faculty generate
evidence and engage in knowledge exchange that advances effective and
sustainable systems for health services access and delivery.
As nominated principal investigator, Dr. Levy led the development and
implementation of the Maritime Strategy for Patient-Oriented Research SUPPORT
Unit. This initiative, co-funded by the Canadian Institutes of Health
Research, offers research infrastructure designed to promote patient-centered
outcomes and health services research in Canada’s three Maritime provinces.
The Unit’s mission is to lead the development and application of patient-centered
outcomes research, and the vision is to enhance the health and well-being
of individuals and populations in the Maritimes and across Canada. The
central goals include advancing research on health systems, knowledge translation
and implementation of healthcare transformation, and implementing
research at the point of care.
Sarah Goring has an M.Sc. in healthcare and epidemiology from the University
of British Columbia and more than 10 years of experience consulting in the
private sector, where she focuses on pharmacoepidemiology, evidence synthesis
methods, and health services research.
leading authority on the evaluation of diagnostic and screening tests and has
made major contributions to statistical methods for medical technology assessment
and health services and outcomes research. His current research activity
spans the spectrum of evidence-based diagnostic medicine, addresses both
methodology and subject matter, and has a major focus on the comparative
effectiveness of screening and diagnostic modalities. As the founding network
statistician of the American College of Radiology Imaging Network (ACRIN)
and a group statistician for the ECOG-ACRIN collaborative group, he has
decades-long experience in the clinical evaluation of modalities for diagnosis
and prediction in cancer and other chronic diseases. Dr. Gatsonis has served on
numerous review and advisory panels. He chaired the NAS Committee on
Applied and Theoretical Statistics and is a member of the NAS Committee on
National Statistics. He served on the IOM Committee on Comparative Effectiveness
Research Prioritization and the NAS Committee on Reproducibility
and Replicability in Science and was the founding editor-in-chief of Health
Services and Outcomes Research Methodology. Dr. Gatsonis was educated at
Princeton and Cornell, was elected fellow of the American Statistical Association,
and received a Long-Term Excellence Award from the Health Policy
Statistics Section of ASA.
Mary Jane King Surveillance and Ontario Cancer Registry, Cancer Care
Ontario, Toronto, ON, Canada
Martin Knapp The London School of Economics and Political Science,
London, UK
Madelon Kroneman Netherlands Institute of Health Services Research
(NIVEL), Utrecht, The Netherlands
Suszy Lessof European Observatory on Health Systems and Policies,
Brussels, Belgium
Adrian R. Levy Community Health and Epidemiology, Dalhousie University,
Halifax, NS, Canada
Chang Liu ACCESS Health International, New York, NY, USA
Lisa M. Lix Department of Community Health Sciences, University of
Manitoba, Winnipeg, MB, Canada
Gregory Marchildon Institute of Health Policy, Management and Evaluation,
University of Toronto, Toronto, ON, Canada
Anna Maresso European Observatory on Health Systems and Policies,
London School of Economics and Political Science, London, UK
Patricia J. Martens Winnipeg, MB, Canada
Ryozo Matsuda Ritsumeikan University, Kyoto, Japan
Gregory J. Matthews Department of Mathematics and Statistics, Loyola
University, Chicago, IL, USA
Martin McKee London School of Hygiene and Tropical Medicine, London,
UK
John R. T. Monson Florida Hospital System Center for Colon and Rectal
Surgery, Florida Hospital Medical Group Professor of Surgery, University of
Central Florida, College of Medicine, Florida Hospital, Orlando, FL, USA
Vincent Mor Department of Health Services, Policy and Practice, Brown
University School of Public Health, Providence, RI, USA
Providence Veterans Administration Medical Center, Center on Innovation,
Providence, RI, USA
Elias Mossialos London School of Economics and Political Science,
London, UK
Yan Mu UNICEF China, Beijing, China
Ross M. Mullner Division of Health Policy and Administration, School of
Public Health, University of Illinois, Chicago, IL, USA
Sarah Naeger Behavioral Health Research and Policy, IBM Watson Health,
Bethesda, MD, USA
Brian Neelon Department of Biostatistics and Bioinformatics, Duke University
School of Medicine, Durham, NC, USA
J. Patrick Nicol Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Katia Noyes Department of Surgery, University of Rochester Medical
Center, Rochester, NY, USA
Alistair James O’Malley The Dartmouth Institute for Health Policy and
Clinical Practice, Department of Biomedical Data Science, Geisel School of
Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston,
MA, USA
Jukka-Pekka Onnela Department of Biostatistics, Harvard School of Public
Health, Boston, MA, USA
Tyler Ostapyk University Advancement, Carleton University, Ottawa, ON,
Canada
Irene Papanicolas The London School of Economics and Political Science,
London, UK
Harvard T.H. Chan School of Public Health, Cambridge, MA, USA
Melania Pintilie University Health Network, Toronto, ON, Canada
Sujohn Prodhan Surveillance and Ontario Cancer Registry, Cancer Care
Ontario, Toronto, ON, Canada
Bernd Rechel European Observatory on Health Systems and Policies,
London School of Hygiene and Tropical Medicine, London, UK
Thomas Rice Department of Health Policy and Management, Fielding
School of Public Health, University of California, Los Angeles, CA, USA
Cynthia Robins Westat, Rockville, MD, USA
Leslie L. Roos Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Noralou P. Roos Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Pauline Rosenau Division of Management, Policy and Community Health,
School of Public Health, University of Texas Health Science Center at
Houston, Houston, TX, USA
Carolyn M. Rutter RAND Corporation, Santa Monica, CA, USA
Marya Saidi The London School of Economics and Political Science,
London, UK
xxiv Contributors
1 Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights

Contents
Introduction
Big Data Analytics on SEER Lung Cancer Data
Lung Cancer Survival Prediction System
Conditional Survival Prediction
Association Rule Mining
Illustrative Data Mining Results on SEER Data
Lung Cancer Outcome Calculator
Other Applications of Big Data Analytics in Healthcare
Summary
References
A. Agrawal (*) · A. Choudhary
Department of Electrical Engineering and Computer Science, Northwestern University,
Evanston, IL, USA
e-mail: ankita@eecs.northwestern.edu; choudhar@eecs.northwestern.edu

© Springer Science+Business Media, LLC, part of Springer Nature 2019
A. Levy et al. (eds.), Health Services Evaluation, Health Services Research,
https://doi.org/10.1007/978-1-4939-8715-3_2

The term “big data” has become a ubiquitous buzzword today in practically all areas
of science, technology, and commerce. It primarily denotes datasets that are too
large, complex, or both, to be
• Value: The goal of big data analytics is to discover the hidden knowledge from
huge amounts of data, which is akin to finding a needle in a haystack, and can be
extremely valuable. For example, big data analytics in healthcare can help enable
personalized medicine by identifying optimal patient-specific treatments, which can
potentially improve millions of lives, reduce waste of healthcare resources, and
save billions of dollars in healthcare expenditure.

The first three Vs above distinguish big data from small data, and other Vs are
characteristics of any type of data, including big data. Further, each application
domain can also introduce its own nuances to the process of big data management
and analytics. For example, in healthcare, the privacy and security of patients’
data are of paramount importance, and compliance with the Health Insurance
Portability and Accountability Act (HIPAA) and institutional review board (IRB)
protocols is necessary to work with many types of healthcare data. It is also worth
noting here that although the size and scale of healthcare data are not as large as
in some other domains of science like high energy physics or in business and
marketing, the sheer complexity and variety of the healthcare data becoming
available nowadays require the development of new big data approaches in
healthcare. For example, there are electronic healthcare records (EHRs), medical
images (e.g., mammograms), time-series data (e.g., ECG signals), textual data
(doctor’s notes, research papers), and genome sequence and related data (e.g., SNPs).

So what can big data analytics do for a real-world healthcare application? A variety
of personalized information such as patients’ electronic health records is
increasingly becoming available. What if we could intelligently integrate the hidden
knowledge from such healthcare data during a real-time patient encounter to
complement the physician’s expertise and potentially address the challenges of
personalized, safe, and cost-effective healthcare? Note that the challenge here is
to make the insights patient specific instead of giving generic population-wide
statistics. Why is this important? Let us try to understand with the help of an
example. The benefits of medical treatments can vary depending on one’s expected
survival, and thus not considering an individual patient’s prognosis can result in
poor quality of care as well as nonoptimal use of healthcare resources. Developing
accurate prognostic models using all available information and incorporating them
into clinical decision support could thus significantly improve quality of
healthcare (Collins et al. 2015), both in terms of improving clinical decision
support and enhancing informed patient consent. Development of accurate data-driven
models can also have a tremendous economic impact. The Centers for Disease Control
and Prevention estimates that there are more than 150,000 surgical site infections
annually (Magill et al. 2014), and each can cost $11,000–$35,000 per patient, i.e.,
about $5 billion every year. Accurate predictions and risk estimation for healthcare
outcomes can potentially avoid thousands of complications, resulting in improved
resource management and significantly reduced costs. This requires development of
advanced data-driven technologies that could effectively mine all available
historical data, extract and suitably store the resulting insights and models, and
make them available at the point of care in a patient-specific way.

In the rest of this chapter, we will see one such application of big data analytics
on electronic healthcare records so as to build predictive models on them and
discover interesting insights. In particular, we will take the example of lung
cancer data from the Surveillance, Epidemiology, and End Results (SEER) program to
build models of patient survival after 6 months, 9 months, 1 year, 2 years, and
5 years (Agrawal et al. 2011a) and for conditional survival as well (Agrawal et al.
2012). We will also see the application of association rule mining on this dataset
for 5-year survival (Agrawal et al. 2011b) and 5-year conditional survival (Agrawal
and Choudhary 2011). Finally, we will discuss the online lung cancer outcome
calculator that resulted from the described predictive analytics on SEER data and
conclude with some examples of big data analytics in other healthcare-related
applications.
Big Data Analytics on SEER Lung Cancer Data

Lung (respiratory) cancer is the second most common cancer and the leading cause of
cancer-related deaths in the USA. In 2012 alone, over 157,000 people in the USA died
from lung cancer. The 5-year survival rate for lung cancer is estimated to be just
15% (Ries et al. 2007). The Surveillance, Epidemiology, and End Results (SEER)
program of the National Cancer Institute (NCI) is an authoritative repository of
cancer statistics in the USA (SEER 2008). It is a population-based cancer registry
covering about 26% of the US population and is the largest publicly available cancer
dataset in the USA. It collects cancer data for all invasive and in situ cancers,
except basal and squamous cell carcinomas of the skin and in situ carcinomas of the
uterine cervix (Ries et al. 2007). The SEER data attributes can be broadly
categorized into demographic attributes, diagnosis attributes, treatment attributes,
and outcome attributes (see Table 1). The presence of outcome attributes makes the
SEER data very useful for doing predictive analytics and making models for cancer
survival.

Table 1 SEER data attributes
Type         Examples
Demographic  Age, gender, location, race/ethnicity, date of diagnosis
Diagnosis    Tumor primary site, size, extension, lymph node involvement
Treatment    Primary treatment, surgical procedure, radiation therapy
Outcome      Survival time, cause of death

Lung Cancer Survival Prediction System

So far we have seen what big data is and what big data analytics can do for
healthcare applications. We have also had a brief introduction to SEER and what kind
of data is present in the SEER database. So now let us dive deeper into what a
typical workflow for predictive analytics looks like, with the specific example of
lung cancer survival prediction on SEER data. Figure 2 depicts the overall
end-to-end workflow. It is worth mentioning here that this workflow for predictive
lung cancer outcome analytics is essentially a healthcare adaptation of existing
similar data science workflows in other domains, since most of the advanced
techniques for big data management and analytics were invented in the field of
computer science and more specifically high-performance data mining (Agrawal et al.
2013a; Xie et al. 1072), via applications in many different domains like business
and marketing (Xie et al. 2012), climate science (Ganguly et al. 2014), materials
informatics (Agrawal and Choudhary 2016), and social media analytics (Xie et al.
2013), among many others. Here we will only focus on the healthcare application of
developing a lung cancer survival prediction system. As shown in Fig. 2, it has five
stages described below.

Fig. 2 A typical workflow for predictive analytics, illustrated with the example of
outcome prediction models for lung cancer using SEER data

Data Collection
This is the obvious first step. Depending on the project, the kind of data required
for it, and the license agreements associated with that data, this can be the
easiest or the toughest step in the workflow. SEER has made it easy to get the
“SEER limited-use data” from their website on submitting a SEER limited-use data
agreement form. It creates a personalized SEER research data agreement for every
user that allows the use of the data for research purposes only. In particular,
there must be no attempt to identify the individual patients in the database. Of
course, obvious identifying information like patient name, SSN, etc., is excluded
from the data released by SEER, but it still has demographic information like age,
sex, and race, which is very useful for research purposes but should not be misused
to try to identify patients in any way. Such compliance with HIPAA regulations is
important to preserve patient privacy.

Data Transformation
Once the data is available, the first step is to understand the data format and
representation and do any necessary transformations to make it suitable for
modeling. Let us assume the data is in a row-column (spreadsheet) format, such as in
the case of SEER data. Each row corresponds to a
patient’s medical record and can also be referred to as an instance, data point, or
observation. The columns are the attributes, such as age, race, tumor size, surgery,
outcome, etc. Data attributes can be of different types – numeric, nominal, ordinal,
and interval – and it is important to have the correct representation of each
attribute for analysis, for which some data transformation might be necessary. More
broadly, data transformation is needed to ensure the quality of the data ahead of
modeling and remove or appropriately deal with noise, outliers, missing values,
duplicate data instances, etc.

Data transformation is usually unsupervised, which means that it does not depend on
the outcome or target attributes. For example, SEER encodes all attributes as
numbers, and many of them are actually nominal, like marital status, where “1”
represents “Single,” “2” represents “Married,” “3” represents “Separated,” “4”
represents “Divorced,” “5” represents “Widowed,” and “9” represents “Unknown.”
Numbers have a natural order, and the operations of addition, subtraction, and
division are defined, which may be fine for numeric attributes like “tumor size” but
not for nominal attributes like marital status, sex, race, etc. Such attributes need
to be explicitly converted to nominal for correct predictive modeling. Even numeric
attributes need to be examined carefully. For example, the tumor size attribute in
SEER data gives the exact size of the tumor in mm, if it is known. But in some
cases, the doctor’s notes may say “less than 2 cm,” in which case it is encoded as
“992,” which could easily be misinterpreted as 992 mm if not transformed
appropriately. Another example of an unsupervised data transformation required in
SEER data is to construct numeric survival time in months from the SEER format of
YYMM, so that it can be modeled correctly.

The above data transformations are required due to the way SEER data is represented
and may be necessary for almost any project dealing with this data. But there are
also problem-specific data transformations that may be necessary for building a
model as originally intended. For example, if we are interested in building a
predictive model for lung cancer survival, then we should only include those patient
records where the cause of the patient’s death was lung cancer, which is given by
the “cause of death” attribute. We also need to remove certain attributes from the
modeling that directly or indirectly specify the outcome, e.g., cause of death or
whether the patient is still alive. Further, for binary class prediction, we also
need to derive appropriate binary attributes for survival time, e.g., 5-year
survival.

There are also certain data transformation steps that could be supervised in some
cases, meaning that they depend on the outcome attribute(s). Examples include
feature selection/extraction, discretization, and sampling, and all of these can be
supervised or unsupervised. If they are supervised, they should in general be
considered together with other supervised analytics so as to avoid over-fitting
(more about this later).
evaluation and the results combined to get predictions for all the instances in the
dataset. This setting is called twofold cross validation, as the dataset is split
into two parts. It can further be generalized to k-fold cross validation, where the
dataset is randomly split into k parts. k − 1 parts are used to build the model, and
the remaining one part is used for testing. This process is repeated k times with
different test splits, and the results are combined to get predictions for all the
instances in the dataset using a model that did not see them while training.
Leave-one-out cross validation (LOOCV) is a special case of the more generic k-fold
cross validation, with k = N, the number of instances in the dataset. LOOCV is
commonly used when the dataset is not very large. To predict the target attribute
for each data instance, a separate predictive model is built using the remaining
N − 1 data instances, and the whole process is repeated for each data instance. The
resulting N predictions can then be compared with the N actual values to calculate
various quantitative metrics for accuracy. In this way, each of the N instances is
tested using a model that did not see it while training, thereby maximally utilizing
the available data for model building. Cross validation is a standard evaluation
setting to guard against over-fitting. Of course, k-fold cross validation
necessitates building k models, which may take a long time on large datasets.

Comparative assessments of how closely the models can predict the actual outcome are
used to provide an evaluation of the models’ predictive performance. Many binary
classification performance metrics are usually used for this purpose such as
accuracy, precision, recall/sensitivity, specificity, area under the ROC curve, etc.

1. c-statistic (AUC): The receiver operating characteristic (ROC) curve is a
graphical plot of true-positive rate against false-positive rate. The area under the
ROC curve (AUC or c-statistic) is one of the most effective metrics for evaluating
binary classification performance, as it is independent of the probability cutoff
and measures the discrimination power of the model.

2. Overall accuracy: It is the percentage of predictions that are correct. For
highly unbalanced classes where the minority class is the class of interest, overall
accuracy by itself may not be a very useful indicator of classification performance,
since even a trivial classifier that simply predicts the majority class would give
high values of overall accuracy:

$$\text{Overall accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of true positives (hits), TN is the number of true negatives
(correct rejections), FP is the number of false positives (false alarms), and FN is
the number of false negatives (misses).

3. Sensitivity (recall): It is the percentage of positive labeled records that were
predicted positive. Recall measures the completeness of the positive predictions:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

4. Specificity: It is the percentage of negative labeled records that were predicted
negative, thus measuring the completeness of the negative predictions:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

5. Positive predictive value (precision): It is the percentage of positive
predictions that are correct. Precision measures the correctness of positive
predictions:

$$\text{Positive predictive value} = \frac{TP}{TP + FP}$$

6. Negative predictive value: It is the percentage of negative predictions that are
correct, thereby measuring the correctness of negative predictions:

$$\text{Negative predictive value} = \frac{TN}{TN + FN}$$

7. F-measure: It is not too difficult to have a model with either good precision or
good recall, at the cost of each other. F-measure combines the two measures in a
single metric such that it is high only if both precision and recall are high:

$$\text{F-measure} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

12 months, 18 months, and 24 months), and the same binary classification techniques
were used to build five new models.
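The k-fold cross validation procedure and the confusion-matrix metrics defined earlier in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the labels are made-up toy data, and a ratio with a zero denominator is reported as 0.0 by convention. The trivial majority-class predictor stands in for a real model to show why overall accuracy alone can mislead on unbalanced classes.

```python
import random
from typing import Callable, Dict, List, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly split the indices 0..n-1 into k nearly equal test folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_predict(X: Sequence, y: Sequence[int], k: int,
                      fit_predict: Callable, seed: int = 0) -> List[int]:
    """Predict every instance using a model that never saw it in training."""
    predictions = [0] * len(X)
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        fold_preds = fit_predict([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx])
        for i, pred in zip(test_idx, fold_preds):
            predictions[i] = pred
    return predictions


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute the confusion-matrix metrics, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # convention chosen for this sketch

    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "ppv": precision,
        "npv": ratio(tn, tn + fn),
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


def majority_class(train_X: list, train_y: list, test_X: list) -> List[int]:
    """Trivial baseline: always predict the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_X)


if __name__ == "__main__":
    # Toy, made-up labels: 2 positives (class of interest) and 8 negatives.
    X = list(range(10))
    y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    preds = cross_val_predict(X, y, k=5, fit_predict=majority_class)
    print(binary_metrics(y, preds))
```

On this toy data the baseline reaches 80% overall accuracy while its sensitivity is zero, which is the situation the text warns about for unbalanced classes. Setting k equal to the number of instances turns the same routine into leave-one-out cross validation.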
Association Rule Mining
the SEER November 2008 Limited-Use Data files (SEER 2008) were used, which was released in April 2009. It had a follow-up cutoff date of December 31, 2006, i.e., the patients were diagnosed and followed up until this date. Data was selected for the patients diagnosed between 1998 and 2001. Since the follow-up cutoff date for the SEER data in study was December 31, 2006, and the goal of the project was to predict survival up to 5 years, data of 2001 and before was used. Also, since several important attributes were introduced to the SEER data in 1998 (like RX Summ-Surg Site 98-02, RX Summ-Scope Reg 98-02, RX Summ-Surg Oth 98-02, Summary stage 2000 (1998+)), data of 1998 and after was used. There were a total of 70,132 instances of patients with cancer of the respiratory system between 1998 and 2001, and there were 118 attributes in the raw data from SEER.

The SEER-related preprocessing resulted in modification and splitting of several attributes, many of which were found to have significant predictive power. In particular, 2 out of 11 newly created (derived) attributes were within the top 13 attributes that were eventually selected to be used in the lung cancer outcome calculator. These were (a) the count of regional lymph nodes that were removed and examined by the pathologist and (b) the count of malignant/in situ tumors. These attributes were derived from “Regional Nodes Examined” and “Sequence Number-Central,” respectively, from raw SEER data, both of which had nominal values encoded within the same attribute, with the latter also encoding nonmalignant tumors. After performing various steps of data transformation and feature selection, the data was reduced to 46,389 instances of lung cancer patients and 13 attributes (excluding the outcome attribute).

Predictive Analytics

For predictive analytics, binary outcome attributes for 6-month, 9-month, 1-year, 2-year, and 5-year survival were derived from survival time. The dataset of 5-year survival was subsequently filtered to generate five new datasets for modeling conditional survival after 5 years of diagnosis, given that the patient has already survived 3 months, 6 months, 12 months, 18 months, and 24 months.

Many predictive modeling techniques were found to give good accuracy measures that were statistically indistinguishable from the best accuracy. From among those, we chose the model based on alternating decision trees with additional logistic modeling on top for better calibration. Tenfold cross validation was used to estimate the accuracy of all the ten models. Table 3 presents the results for all the models (only accuracy and AUC included here for simplicity), along with the distribution of survived and not-survived patients in the data used to build the corresponding model.

Table 3 Model classification performance (tenfold cross validation)
Model               % Survived   % Not survived   % Model accuracy   AUC
5 year              12.8         87.2             91.8               0.924
2 year              23.4         76.6             85.6               0.859
1 year              40.2         59.8             74.5               0.796
9 month             48.8         51.2             71.0               0.779
6 month             60.1         39.9             69.8               0.765
5 year|3 month      16.9         83.1             89.8               0.912
5 year|6 month      21.4         78.6             87.3               0.900
5 year|12 month     31.9         68.1             82.1               0.875
5 year|18 month     43.9         56.1             78.1               0.850
5 year|24 month     54.9         45.1             76.1               0.830

Association Rule Mining

For association rule mining analysis, all missing/unknown values were removed, since we are interested in finding segments with precise definitions in terms of patient attributes. The survival time (in months) was chosen as the target attribute for the Hotspot algorithm. The dataset had 13,033 instances, 13 input patient attributes, and 1 target attribute. The average survival time in the entire dataset (tavg) was 24.45 months. So it would be interesting to find segments of patients where the
average survival time is significantly higher than or significantly lower than 24.45 months. Two independent analyses were performed to find segments in which average survival time was higher and lower than overall average survival, represented in the form of association rules. Lift of a rule/segment is a multiplicative metric that measures the relative improvement in the target (here survival time) as compared to the average value of the target across the entire dataset.

For association rule mining analysis on conditional survival data, a new dataset was constructed using only the cases in which the patient survived at least 12 months from the time of diagnosis. The conditional survival dataset had 6,788 instances, the same 13 input patient attributes, and 1 target attribute. The average survival time in the conditional survival dataset was 42.54 months. So, the above analysis was repeated on the conditional survival dataset with tavg = 42.54.

Tables 4 and 5 present the nonredundant association rules obtained with “higher” and “lower” mode, respectively. Tables 6 and 7 present the same for the conditional survival dataset.

Lung Cancer Outcome Calculator

The web tool is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator, and uses the following 13 attributes:
Table 4 Nonredundant association rules denoting segments where average survival time is significantly higher than 24.45 months
Segment description | Avg. survival time | Segment size | Lift
The tumor is well differentiated and localized, regional lymph nodes examined are between 4 and 17, age of the patient at time of diagnosis is less than 79, current tumor is patient’s first or second tumor, and resection of lobe/bilobectomy is performed by the surgeon | 68.18 | 100 | 2.79
The tumor is localized, age of patient is between 39 and 52, number of regional lymph nodes examined is between 1 and 14, and resection of lobe/bilobectomy is performed by the surgeon | 68.11 | 100 | 2.79
Tumor is well differentiated, number of regional lymph nodes examined is less than 15, resection of lobe/bilobectomy is performed, and regional lymph nodes are removed | 66.83 | 101 | 2.73
Tumor is localized, age of patient is between 41 and 52, tumor is confined to one lung, and resection of lobe/bilobectomy is performed | 66.26 | 111 | 2.71
Patient is born in Hawaii, patient’s age is less than 76, there is no lymph node involvement, and resection of lobe/bilobectomy is performed | 64.98 | 106 | 2.66
Tumor is localized, patient is born in Hawaii, patient’s age is less than 83, and surgery is performed | 63.96 | 101 | 2.62
Tumor is well differentiated, number of lymph nodes examined is between 7 and 18, there is no lymph node involvement, and patient’s age is less than 81 | 63.86 | 101 | 2.61
Tumor is localized, patient is born in Connecticut, tumor is confined to one lung, number of lymph nodes examined is greater than two, and resection of lobe/bilobectomy is performed | 63.10 | 103 | 2.58
Tumor is well differentiated, there is no lymph node involvement, patient’s age is less than 76, and intrapulmonary/ipsilateral hilar/ipsilateral peribronchial nodes are removed | 62.16 | 100 | 2.54
Tumor is localized (confined to one lung), patient is born in Hawaii and is less than 82 years old | 60.38 | 101 | 2.47
Tumor is localized (confined to one lung), patient is born in Hawaii, and cancer is confirmed by positive histology | 60.18 | 103 | 2.46
Tumor is localized, patient is born in California, and resection of lobe/bilobectomy is performed by the surgeon | 58.71 | 100 | 2.40
1 Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights 13
Table 5 Nonredundant association rules denoting segments where average survival time is significantly lower than 24.45 months
Segment description | Avg. survival time | Segment size | Lift
Tumor has metastasized and is poorly differentiated, lymph nodes are involved in metastasis, and no lymph nodes are removed | 5.21 | 100 | 4.69
Tumor has metastasized and is poorly differentiated, no surgery is performed, and the patient is born in Hawaii | 5.67 | 110 | 4.31
Tumor has metastasized, no surgery is performed, cancer is confirmed by positive histology, and patient is born in Hawaii | 5.73 | 128 | 4.26
Tumor has metastasized, surgery is contraindicated and not performed, and cancer is confirmed by positive histology | 5.78 | 132 | 4.23
Pleural effusion has taken place, tumor is poorly differentiated, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, and no surgery is performed | 7.53 | 205 | 3.25
Pleural effusion has taken place, cancer is confirmed by positive cytology, surgery is not recommended and hence not performed | 8.60 | 112 | 2.84
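As a worked check of the lift values reported in Tables 4 and 5, the sketch below computes lift as a ratio against the overall average survival time of 24.45 months. The inverse ratio used for the “lower” mode is an assumption inferred from the tabulated numbers rather than a definition stated in the text, and the function name `lift` is hypothetical.

```python
T_AVG = 24.45  # average survival time (months) in the entire dataset


def lift(segment_avg: float, overall_avg: float, mode: str = "higher") -> float:
    """Multiplicative improvement of a segment relative to the overall average.

    mode="higher": segment_avg / overall_avg
    mode="lower":  overall_avg / segment_avg  (assumed; matches Table 5 values)
    """
    if mode == "higher":
        return segment_avg / overall_avg
    if mode == "lower":
        return overall_avg / segment_avg
    raise ValueError(f"unknown mode: {mode!r}")


# First row of Table 4 ("higher" mode) and first row of Table 5 ("lower" mode).
lift_high = round(lift(68.18, T_AVG, mode="higher"), 2)
lift_low = round(lift(5.21, T_AVG, mode="lower"), 2)

print(lift_high, lift_low)
```

Under this reading, a lift above 1 always indicates a segment that departs from the overall average in the direction being searched for, whichever mode is used.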
Table 6 Nonredundant association rules denoting segments in the conditional survival dataset where average survival time is significantly higher than 42.54 months
Segment description | Avg. survival time | Segment size | Lift
Tumor is well differentiated and localized, patient’s age is less than 71, less than 13 regional lymph nodes are examined, and resection of lobe/bilobectomy is performed | 72.92 | 104 | 1.71
Tumor is well differentiated and localized (confined to one lung), patient’s age is less than 71, surgery is performed, less than eight regional lymph nodes are examined | 72.50 | 103 | 1.70
Tumor is well differentiated, patient’s age is less than 84, regional lymph nodes are removed, no lymph node involvement, no radiation therapy, and resection of lobe/bilobectomy is performed | 71.95 | 100 | 1.69
Tumor is localized (confined to one lung), patient’s age is between 41 and 52, surgery is performed, and resection of lobe/bilobectomy is performed | 69.66 | 105 | 1.64
Tumor is well differentiated, patient’s age is less than 79, no lymph node involvement, between 5 and 9 regional lymph nodes are examined | 68.44 | 100 | 1.61
Tumor is localized (confined to one lung), patient’s age is less than 77, patient is born in Connecticut, and resection of lobe/bilobectomy is performed | 67.99 | 119 | 1.60
Patient’s age is less than 76, patient is born in Hawaii, no lymph node involvement, and resection of lobe/bilobectomy is performed | 67.81 | 101 | 1.59
Patient’s age is less than 75, patient is born in California, no lymph node involvement, and resection of lobe/bilobectomy is performed | 65.37 | 102 | 1.54
Tumor is localized, no regional lymph nodes are removed, and resection of lobe/bilobectomy is performed | 62.14 | 102 | 1.46
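The conditional survival dataset behind Tables 6 and 7 keeps only patients who survived at least 12 months, and its average survival time then serves as the new baseline. A minimal sketch of that filtering step, together with deriving a binary outcome from survival time, might look as follows. The records, field names, and the "at least the horizon" labeling convention are all hypothetical and are not taken from the SEER data.

```python
from statistics import mean

# Hypothetical patient records; field names and values are illustrative only.
patients = [
    {"id": 1, "survival_months": 3},
    {"id": 2, "survival_months": 14},
    {"id": 3, "survival_months": 60},
    {"id": 4, "survival_months": 9},
    {"id": 5, "survival_months": 30},
]


def binary_outcome(record: dict, horizon_months: int) -> int:
    """1 if the patient survived at least `horizon_months`, else 0 (assumed convention)."""
    return int(record["survival_months"] >= horizon_months)


def conditional_dataset(records: list, min_months: int = 12) -> list:
    """Keep only patients who survived at least `min_months` from diagnosis."""
    return [r for r in records if r["survival_months"] >= min_months]


def average_survival(records: list) -> float:
    """Average survival time in months over the given records."""
    return mean(r["survival_months"] for r in records)


conditional = conditional_dataset(patients, min_months=12)
t_avg_all = average_survival(patients)             # baseline for the full data
t_avg_conditional = average_survival(conditional)  # baseline after filtering
five_year_labels = [binary_outcome(r, 60) for r in conditional]

print(len(conditional), t_avg_all, round(t_avg_conditional, 2), five_year_labels)
```

Filtering out early deaths raises the baseline average, which is consistent with the lifts in Tables 6 and 7 being smaller than those in Tables 4 and 5 for comparable segment averages.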
1. Age at diagnosis: Numeric age of the patient at the time of diagnosis of lung cancer.
2. Birth place: The place of birth of the patient. There are 198 options available to select for this attribute (based on the values observed in the SEER database).
3. Cancer grade: A descriptor of how the cancer cells appear and how fast they may grow and spread. Available options are well differentiated, moderately differentiated, poorly differentiated, undifferentiated, and undetermined.
4. Diagnostic confirmation: The best method used to confirm the presence of lung cancer. Available options are positive histology, positive cytology, positive microscopic
Table 7 Nonredundant association rules denoting segments in the conditional survival dataset where average survival time is significantly less than 42.54 months
Segment description | Avg. survival time | Segment size | Lift
Tumor is undifferentiated and has metastasized, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, no regional lymph nodes are removed, and no surgery is performed | 17.18 | 100 | 2.48
Tumor is spread, surgery not recommended, patient is born in Iowa | 20.28 | 137 | 2.10
Tumor is spread and undifferentiated, surgery not recommended, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, and cancer is confirmed by positive histology | 20.35 | 124 | 2.09
Pleural effusion has taken place, and tumor is poorly differentiated | 22.96 | 101 | 1.85
12. Number of malignant tumors in the past: An integer denoting the number of malignant tumors in the patient’s lifetime so far. This attribute is derived from the SEER attribute “Sequence Number-Central,” which encodes both numeric and categorical values for both malignant and benign tumors within a single attribute. As part of the preprocessing, the original SEER attribute was split into numeric and nominal parts, and the numeric part was further split into two attributes representing number of malignant and benign tumors, respectively.
13. Total regional lymph nodes examined: An integer denoting the total number of regional lymph nodes that were removed and examined by the pathologist. This attribute was derived by extracting the numeric part of the SEER attribute “Regional Nodes Examined.”

Figure 3 shows a screenshot of the lung cancer outcome calculator. This calculator is widely accessed from more than 15 countries, including many medical schools and hospitals. A previous version of this calculator was presented in Agrawal et al. (2011a). The current calculator incorporates faster models as described in this chapter and has a redesigned interface. It allows the user to enter values for the above-described 13 attributes and get patient-specific risk. For all the ten models, it also shows the distribution of survived and not-survived patients in the form of pie charts. Upon entering the patient attributes on the website, the patient-specific risk calculated by all the ten models is depicted along with the healthy and sick patient risk, which are essentially the median risk of death of patients who actually survived and did not survive, respectively, as calculated by the corresponding model. It generates bar charts corresponding to each of the ten models, and each of them has three bars. The middle bar denotes the patient-specific risk, and the left (right) bars denote the healthy (sick) patient risk. The patient-specific risk is thus put in context of the healthy and sick patient risk for an informative comparison.

Any data-driven tool like this in the field of healthcare has a disclaimer about its use, stating that it is meant to complement and not replace the advice of a medical doctor. Many such calculators are becoming popular in healthcare.

Other Applications of Big Data Analytics in Healthcare

We will conclude with a sampling of some other applications of big data in healthcare. There has been abundant work on mining electronic health records in addition to what is described in this chapter. Some of these include mining data from a particular hospital (Mathias et al. 2013), American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) (Agrawal et al. 2013b), and United Network for Organ Sharing (UNOS) (Agrawal et al. 2013c).

Apart from electronic health records, a very important source of healthcare data is social media. We are in the midst of a revolution in which, using social media, people interact, communicate, learn, influence, and make decisions. This data includes multi-way communications and interactions on social media (e.g., Facebook, Twitter), discussion forums, and blogs in the area of healthcare, public health, and medicine. The emergence and ubiquity of online social networks have enriched this data with evolving interactions and communities at mega-scale, and people are turning to social media for various kinds of healthcare guidance and knowledge, including proactive and preventive care. Patients with like conditions – often chronic conditions, such as flu, cancer, allergy, multiple sclerosis, diabetes, arthritis, ALS, etc. – find patients with the same condition on these networking sites and in public forums. And these virtual peers can very much become a key guiding source of data unlike in the past, when all information emanated from physicians. This big data, being produced in social media domain, offers a unique opportunity for advancing, studying the interaction between society and medicine, managing diseases, learning best practices, influencing policies, identifying best treatment, and, in general, empowering people. It thus has numerous applications in public health informatics, and we are already seeing
several studies in this domain (Lee et al. 2013, 2015; Xie et al. 2013).

Technological advances in sensors, micro- and nano-electronics, advanced materials, mobile computing, etc., have had an immense impact toward enabling future Internet of things (IoT) applications in several fields including healthcare. We are currently witnessing a rapid adoption of wearable devices under the IoT paradigm for a variety of healthcare applications (Andreu-Perez et al. 2015). These wearable and implantable sensors along with smartphones that are ubiquitously used all over the world form another source of healthcare big data and provide unprecedented opportunities for continuous healthcare monitoring and management.

The field of genomics is another area where big data analytics can play an important role. It is well recognized that in genomics and life sciences, almost everything is based on complex sequence-structure-function relationships, which are far from being well understood. With genomic sequencing becoming progressively easier and affordable, we have arrived at a point in time where huge amounts of biological sequence data have become increasingly available, thanks to the advent of next-generation sequencing (NGS). Functional interpretation of genomic data is the major task in fundamental life science. Research results in this area in turn feed research in other important areas such as cell biology, genetics, immunology, and disease-oriented fields. There has been a lot of work in bioinformatics on sequence data in terms of computationally mining the genomic sequences for interesting insights such as homology detection (Agrawal and Huang 2009, 2011). Furthermore, biological sequencing data also ushers in an era of personal genomics enabling individuals to have their personal DNA sequenced and studied to allow more precise and personalized ways of anticipating, diagnosing, and treating diseases on an individual basis (precision medicine). Genome assembly and sequence mapping techniques (Huang and Madan 1999; Misra et al. 2011) form the first step of this process by compiling the overlapping reads into a single genome. While it is a fact that personalized medicine is becoming more and more common, it is nonetheless in its infancy, and we are still far from realizing the dream of personalized medicine by optimally utilizing the flood of genomic data that we are able to collect now. Clearly, computational sequence analysis techniques are critical to unearth the hidden knowledge from such genomic sequence data, and big data analytics is expected to play a big role in that. For further reading on big data analytics in genomics, the following articles are recommended (Howe et al. 2008; ODriscoll et al. 2013; Marx 2013).

Summary

Big data has become a very popular term denoting huge volumes of complex datasets generated from various sources at a rapid rate. This big data potentially has immense hidden value that needs to be discovered by means of intelligently designed analysis methodologies that can scale for big data, and all of that falls in the scope of big data analytics. In this chapter, we have looked at some of the big data challenges in general and also what they mean in the context of healthcare. As an example of big data mining in healthcare, some recent works dealing with the use of predictive analytics and association rule mining on lung cancer data from SEER were discussed, including a lung cancer outcome calculator that has been deployed as a result of this analytics. Finally, we also briefly looked at a few other healthcare-related areas where big data analytics is playing an increasingly vital role.

References

Agrawal A, Choudhary A. Association rule mining based hotspot analysis on seer lung cancer data. Int J Knowl Discov Bioinform (IJKDB). 2011a;2(2):34–54.
Agrawal A, Choudhary A. Identifying hotspots in lung cancer data using association rule mining. In: 2nd IEEE ICDM workshop on biological data mining and its applications in healthcare (BioDM); 2011b. p. 995–1002.
Agrawal A, Choudhary A. Perspective: materials informatics and big data: realization of the fourth paradigm of science in materials science. APL Mater. 2016;4(053208):1–10.
Agrawal A, Huang X. Psiblast pairwisestatsig: reordering psi-blast hits using pairwise statistical significance. Bioinformatics. 2009;25(8):1082–3.
Agrawal A, Huang X. Pairwise statistical significance of local sequence alignment using sequence-specific and position-specific substitution matrices. IEEE/ACM Trans Comput Biol Bioinformatics. 2011;8(1):194–205.
Agrawal A, Misra S, Narayanan R, Polepeddi L, Choudhary A. A lung cancer outcome calculator using ensemble data mining on seer data. In: Proceedings of the tenth international workshop on data mining in bioinformatics (BIOKDD), New York: ACM; 2011. p. 1–9.
Agrawal A, Misra S, Narayanan R, Polepeddi L, Choudhary A. Lung cancer survival prediction using ensemble data mining on seer data. Sci Program. 2012;20(1):29–42.
Agrawal A, Patwary M, Hendrix W, Liao WK, Choudhary A. High performance big data clustering. IOS Press; 2013a. p. 192–211.
Agrawal A, Al-Bahrani R, Merkow R, Bilimoria K, Choudhary A. Colon surgery outcome prediction using acs nsqip data. In: Proceedings of the KDD workshop on Data Mining for Healthcare (DMH); 2013b. p. 1–6.
Agrawal A, Al-Bahrani R, Raman J, Russo MJ, Choudhary A. Lung transplant outcome prediction using unos data. In: Proceedings of the IEEE big data workshop on Bioinformatics and Health Informatics (BHI); 2013c. p. 1–8.
Andreu-Perez J, Leff DR, Ip H, Yang G-Z. From wearable sensors to smart implants – toward pervasive and personalized healthcare. IEEE Trans Biomed Eng. 2015;62(12):2750–62.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): the tripod statement. Ann Intern Med. 2015;162(1):55–63.
Ganguly AR, Kodra E, Agrawal A, Banerjee A, Boriah S, Chatterjee S, Chatterjee S, Choudhary A, Das D, Faghmous J, Ganguli P, Ghosh S, Hayhoe K, Hays C, Hendrix W, Fu Q, Kawale J, Kumar D, Kumar V, Liao WK, Liess S, Mawalagedara R, Mithal V, Oglesby R, Salvi K, Snyder PK, Steinhaeuser K, Wang D, Wuebbles D. Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques. Nonlinear Process Geophys. 2014;21:777–95.
Hey T, Tansley S, Tolle K, editors. The fourth paradigm: data-intensive scientific discovery. Redmond: Microsoft Research; 2009.
Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, Hill DP, Kania R, Schaeffer M, Pierre SS, et al. Big data: the future of biocuration. Nature. 2008;455(7209):47–50.
Huang X, Madan A. Cap3: a dna sequence assembly program. Genome Res. 1999;9(9):868–77.
Lee K, Agrawal A, Choudhary A. Real-time disease surveillance using twitter data: demonstration on flu and cancer. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD); 2013. p. 1474–77.
Lee K, Agrawal A, Choudhary A. Mining social media streams to improve public health allergy surveillance. In: Proceedings of IEEE/ACM international conference on Social Networks Analysis and Mining (ASONAM); 2015. p. 815–22.
Magill SS, Edwards JR, Bamberg W, Beldavs ZG, Dumyati G, Kainer MA, Lynfield R, Maloney M, McAllister-Hollod L, Nadle J, Ray SM, Thompson DL, Wilson LE, Fridkin SK. Multistate point-prevalence survey of health care-associated infections. N Engl J Med. 2014;370(13):1198–208.
Marx V. Biology: the big challenges of big data. Nature. 2013;498(7453):255–60.
Mathias JS, Agrawal A, Feinglass J, Cooper AJ, Baker DW, Choudhary A. Development of a 5 year life expectancy index in older adults using predictive mining of electronic health record data. J Am Med Inform Assoc. 2013;20:e118–24. JSM and AA are co-first authors.
Misra S, Agrawal A, Liao W-k, Choudhary A. Anatomy of a hash-based long read sequence mapping algorithm for next generation dna sequencing. Bioinformatics. 2011;27(2):189–95.
ODriscoll A, Daugelaite J, Sleator RD. Big data, hadoop and cloud computing in genomics. J Biomed Inform. 2013;46(5):774–81.
Ries LAG, Eisner MP. Cancer of the lung. In: Ries LAG, Young JL, Keel GE, Eisner MP, Lin YD, Horner M-J, eds. SEER survival monograph: Cancer survival among adults: U.S. SEER program, 1988–2001, Patient and Tumor Characteristics. NIH Pub. No. 07–6215. Bethesda, Md: National Cancer Institute, SEER Program; 2007:73–80.
SEER, Surveillance, epidemiology, and end results (seer) program (www.seer.cancer.gov) limited-use data (1973–2006). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch; 2008. Released April 2009, based on the November 2008 submission.
Xie Y, Honbo D, Choudhary A, Zhang K, Cheng Y, Agrawal A. Voxsup: a social engagement framework. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD) (Demo paper). ACM; 2012. p. 1556–9.
Xie Y, Chen Z, Zhang K, Cheng Y, Honbo DK, Agrawal A, Choudhary A. Muses: a multilingual sentiment elicitation system for social media data. IEEE Intell Syst. 2013a;99:1541–672.
Xie Y, Chen Z, Cheng Y, Zhang K, Agrawal A, WK Liao, Choudhary A. Detecting and tracking disease outbreaks by mining social media data. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI); 2013b. p. 2958–60.
Xie Y, Palsetia D, Trajcevski G, Agrawal A, Choudhary A. Silverback: scalable association mining for temporal data in columnar probabilistic databases. In: Proceedings of 30th IEEE International Conference on Data Engineering (ICDE), Industrial and Applications Track; 2014. p. 1072–83.
2 Health Services Data: Managing the Data Warehouse: 25 Years of Experience at the Manitoba Centre for Health Policy
Mark Smith, Leslie L. Roos, Charles Burchill, Ken Turner,
Dave G. Towns, Say P. Hong, Jessica S. Jarmasz, Patricia J.
Martens, Noralou P. Roos, Tyler Ostapyk, Joshua Ginter, Greg
Finlayson, Lisa M. Lix, Marni Brownell, Mahmoud Azimaee,
Ruth-Ann Soodeen, and J. Patrick Nicol
Contents
Introduction
Who We Are
What We Do
Our Data Is Our Strength
Privacy
Repository Tools
Glossary
Concept Dictionary
Characteristics of Administrative Data
Data Documentation
Applying for Access
Repository Documentation
The Data Management Process
Step 1: Formulate the Request and Receive the Data
Step 2: Become Familiar with the Data Structure and Content
Step 3: Apply SAS® Programs
Step 4: Evaluate Data Quality
Step 5: Document the Data
Who We Are

The Manitoba Centre for Health Policy (MCHP) is a research organization located within the Department of Community Health Sciences, Max Rady College of Medicine, Rady Faculty of Health Sciences, at the University of Manitoba (see Fig. 1). MCHP maintains the unique Population Health Research Data Repository (the Repository) that is used by researchers to describe and explain patterns of health care as well as profiles of illness, and to explore other factors that influence health such as socioeconomic status (income, education, employment, social status, etc.). This chapter provides an overview of MCHP, concentrating on the acquisition and preparation of data, and the management of the Repository to support research and to protect the privacy and confidentiality of Manitobans. The chapter that follows concentrates on MCHP’s research production as well as the policy and program impacts of those products over the past 25 years.

What We Do

MCHP’s mission is to conduct world-class population-based research to support the development of evidence-informed policy, programs, and services that maintain and improve the health and well-being of Manitobans (see Fig. 2).

MCHP was the first research unit of its kind in Canada. It continues to be recognized for its comprehensive and ever-expanding, linkable population-based data repository; its collaborative models of working with government and health regions; and for the outstanding caliber of its research (Jutte et al. 2011; Wolfson 2011). The Repository (see Fig. 3) is unique in terms of its comprehensiveness, degree of integration, and orientation around an anonymized population registry.

All the data files in the Repository are de-identified: names, addresses, phone numbers, and real personal health identification numbers (PHINs) are removed before files are transferred to MCHP by the data provider. MCHP complies with all laws and regulations governing the protection and use of personal information. Strict policies and procedures are implemented to protect the privacy and security of anonymized data.

Information in the Repository comes mainly from Manitoba Health and other provincial government departments. The ability to link files and track healthcare use from more than 70 databases, some of which include data as far back as 1970, allows researchers to investigate the health of Manitobans across a wide spectrum of indicators. The data can tell us about Manitobans’ visits to the doctor, hospital stays, home care and nursing-home use, pharmaceutical prescriptions, etc. It is continually expanding into new areas such as education (kindergarten through grade 12 and
22 M. Smith et al.
Fig. 3 The Population Health Research Data Repository. Data that is not yet a part of the registry but is currently being acquired is represented by a dotted line
some post-secondary), social housing, laboratory diagnostic information, in-hospital pharmaceuticals, and justice. Additional area-level data such as the Canadian census indicator of average household income, available from Statistics Canada through the Data Liberation Initiative, are also stored in the repository and available for linkage by postal code.

Some examples of how the data in the repository have been used in the past include:
• MCHP’s research into Manitoba’s aging population has helped estimate future needs for nursing-home beds, so regions can begin strategically to add services over the coming decades (Chateau et al. 2012).
• The results from MCHP’s report Population Aging and the Continuum of Older Adult Care in Manitoba, published in February 2011, were used by the Manitoba Government to invest $216 million to add more home care support; a new rehabilitation program for seniors after surgery; as well as new personal care homes (Doupe et al. 2011).
• A report released in 2010 found that women enrolled in Manitoba’s Healthy Baby Prenatal Benefit program had fewer low birth weight babies and fewer preterm births among other measurable improvements, lending substantial support for the program (Brownell et al. 2010).
• Other MCHP reports document comparative health status and the use of health and social services for groups such as Manitoba’s Francophone and Métis populations or for individual regional health authorities (RHAs) (Chartier et al. 2012; Fransoo et al. 2013; Martens et al. 2010).

MCHP personnel interact extensively with government officials, healthcare administrators, and clinicians to develop a topical and relevant research agenda. The strength of these interactions, along with the application of rigorous academic standards, enables MCHP to make significant contributions to the development of health and social policy. MCHP undertakes five major research projects every year under contract with Manitoba Health. In addition, MCHP investigators secure external funding by competing for research grants. Research completed at MCHP is widely published and internationally recognized (see Fig. 4). MCHP researchers collaborate with a number of highly respected scientists from Canada, the United States, Europe, and Australia.
Fig. 4 Number of documented publications in peer-reviewed journals arising from the use of MCHP Data, 1977–2014
the archived data cannot be accessed without appropriate approvals.
MCHP also implements small number disclosure control. Non-zero values that are less than six are suppressed in final reports. This helps to ensure that the privacy and confidentiality of individuals are retained while allowing individual-level data to be used for research purposes.

Repository Tools
MCHP has developed a number of web-based resources that document the historical use of information stored in the repository. Much of this “corporate knowledge” is captured in two resources: the MCHP Glossary and the Concept Dictionary.

Glossary
The MCHP Glossary is a compilation of short definitions for key terms used in MCHP publications. It documents terms commonly used in population health and health services research and consists of over 2,300 entries. Each glossary term contains a brief definition (and its source), links to related entries in the glossary and concept dictionary, and links to pertinent external sites and reports.

Concept Dictionary
Adjusted Clinical Group® (ACG®) Case-Mix System, Complications and Comorbidities, Teenage Pregnancy, Diagnoses and Procedures
• Education: High School Completion, Indices of Educational Achievement, Curriculum Level
• Statistics: Intra-class Correlation Coefficient (ICC), Sensitivity and Specificity, Prevalence and Incidence, General Estimating Equations (GEE)
• Data Management: Record Linkage, Common Exclusions, Duplicate Records
• Geographic Analysis: Regional Health Authorities (RHAs), Winnipeg Community Areas (WCAs)
• Costing: Hospital Stays, Prescriptions, Physician Services, Home Care
• Socioeconomic Status: Income Quintiles, Socioeconomic Factor Index (SEFI)
• Social: Family Structure, Income Assistance (IA), Residential Mobility

Developing new concepts involves collaboration between the research team, a concept developer, and the Concept Dictionary Coordinator. As shown in Fig. 6, the process involves five steps: (1) A request for the development (or update) of a concept; (2) Identification of reference materials and sources; (3) Development of a draft; (4) Review of the draft involving feedback and revisions; and (5) Publication of the concept on the MCHP website.
The contents of a concept typically include:
– Links to additional supporting material (both internal and external), and a list of references for the concept

An example concept is listed below in Fig. 7.
The MCHP Glossary and Concept Dictionary are available on-line at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/concept_dictionary.html

Characteristics of Administrative Data
Because administrative data are collected primarily for purposes other than research, care is required to ensure accurate results. Potential limitations include clinically imprecise coding, absence of key data on processes and outcomes, and the inconsistent recording of provider information. On the other hand, the administrative data housed in the Repository yield a number of advantages for conducting high-quality research, including:

• Population based: The entire population of the province is covered by the Manitoba Health Services Insurance Plan. Nonparticipation is minimal since residents are not required to pay premiums to register for insured benefits.
• Unique identifiers: Use of a consistent set of identifiers (with identification numbers of both program recipients and providers scrambled to ensure confidentiality) permits researchers to build histories of individuals across time and across government programs. For example, individuals who are discharged from hospital
can be linked to the medical claims file in order to determine whether adverse events are being treated in physicians’ offices.
• Longitudinal: Migration into and out of the province as well as mortality can be traced from 1970 onward. Tracking groups of subjects through time can determine if individuals receiving a given intervention truly have no adverse outcomes or if adverse events are not showing up because the individual has left the province or has died.

Some of the key characteristics and research importance of these attributes are detailed in Table 1.
MCHP has created a series of tools to document the content of the data files, the process of gaining access to the data, and techniques for working with the data.

Data Documentation
The MCHP Metadata Repository, currently available to internal users and users at remote access sites, organizes all of the Repository’s documentation. This tool provides a consistent set of documentation components for each group of data files. Components, displayed in the form of six
Table 1 Manitoba research registry: key characteristics and research relevance (Roos 1999)
Characteristics: Research relevance
Very large N: Many physical and statistical controls are feasible; rare events can be analyzed; statistical power is high
Population based for an entire province: Heterogeneity along many variables is provided
Longitudinal data (going back over 30 years): Many types of longitudinal designs are possible; important variables can be measured more reliably
Loss to follow-up specified: Follow-up critical for cohort studies is accommodated
Place of residence (according to postal code) at any point in time: Length of exposure to geographic areas can be quantified; measures of mobility and migration can be defined; small area variation analyses can be developed
Family composition at any point in time: Nonexperimental designs estimating the importance of different social variables and controlling for unmeasured background characteristics are facilitated
tabbed pages, include an Overview, Data Model, Data Quality Report, Data Dictionary, Additional Documents and Reports, and a Blog. See Fig. 8.

1. Overview – A standardized data description summarizes the data, information on the data provider, purpose and method of data collection, years of available data, size of data files, geographical parameters, data caveats, access requirements, and links to concepts to assist users working with the data. These descriptions provide users a sense of the extent, purpose, scope, and subject of a given database. They can also act as a first stop for researchers attempting to assess the feasibility of an administrative data project. The following list (see Table 2)
Table 2 Standard headers used to describe all databases housed in the Repository
Header name: Description
Summary: A brief summary of the data, often used in grant applications, requests for data, and report glossaries. These serve as a very basic and general introduction to the data
Source agency: Data provider. Frequently the same agency from which access permission is required
Type: A conceptual category (domain) that is indicative of the type of record included in the file (e.g., administrative or survey)
Purpose: Provides a brief overview of why the data is collected by the source agency. What use it serves in the originating organization
Scope: The scope of the database; who or what is in, and who or what is not. May also include geographic, age, or program scope
Data collection method: A brief description of the original data collection process at the source
Size: General estimates of numbers of rows (records or observations) and columns (fields or variables)
Data components: The separate tables or sections that make up the data set
Data level: The level at which researchers can effectively and reliably study the data (e.g., individual or aggregate)
Data years: Range of data years and whether acquired by calendar, fiscal, or academic year
Data highlights: Key characteristics applicable for typical analyses
Data cautions: Obvious issues with the data of potential importance to researchers or useful for assessing project feasibility
Access requirements: Who to apply to in order to gain access to the data. Direct links to the source agency’s contact info or website are also included when appropriate
More information: Links to other sources of information such as the glossary, data dictionary, concept dictionary, provider’s webpage, etc
Previous and potential studies: List of, and links to, MCHP deliverables and other reports or projects using the data
References: Any references used in the description/overview
Date modified: The date the overview was last modified
shows the standard headers used to describe all databases housed in the Repository.
Before the overview is published, the data provider and a selection of users who frequently work with the data review the document for accuracy and completeness. Overviews have also been sent to external organizations, such as Thomson Reuters’ “Data Citation Index,” that include these documents in their integrated search systems. This facilitates the introduction of the data to external researchers, allows users to track and discover publications using a specific MCHP dataset, and increases the reach of the work produced by MCHP.
2. Data Model – A data model is created to display the structure of data files and how they are linked together in the Repository (see Fig. 9).
3. Data Quality Report – The usability of each field is addressed when data files are stored in the Repository, and evaluations are summarized in a report available in the metadata repository. The data quality framework guiding this effort is available on MCHP’s external website. A complete description of the data quality process is provided in the document A Data Quality Evaluation Tool for Administration Data, available online, and in the MCHP Data Quality Framework: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/Data_Quality_Framework.pdf.
4. Data Dictionary – The data dictionary identifies the files and tables held in the Repository. It provides detailed descriptions of individual data elements to assist users in their extraction,
As the number of data files and users has grown, ensuring a common prerequisite level of knowledge has become increasingly important. An accreditation process established in April 2010 provides a consistent overview of MCHP and its data access and use policies and procedures. The accreditation material covers the MCHP mission (see Fig. 2), available data in the Repository, and the requirements for data use and publication of results. Accreditation is required for all researchers, students, and personnel working on approved projects. Once the initial accreditation session is completed, an online accreditation refresher module is available and must be completed annually. Accreditation information is also available for public access at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/accreditation.html

Repository Documentation
More general summaries of the Repository contents are produced in several formats:

1. Dataflow Diagram
The dataflow diagram illustrates the flow of data from its original source into the Repository. A reduced-scale version is shown in Fig. 10.
2. Data lists – several lists are maintained, each serving different purposes:
a. Population Health Research Data Repository List – a searchable and filterable list that indicates the years of available data and the source agency for each database, and provides links to individual data descriptions. An illustration of the interface is provided in Fig. 11.
b. Data Years Chart – Displays the years of available data for each file, with links to data descriptions. Figure 12 provides an example of the list.
3. Data Repository Slides – PowerPoint slides commonly used by researchers that describe or provide a representation of the MCHP data Repository (see Fig. 3).

The Data Management Process
MCHP’s six-step data management process (see Fig. 13) describes how data are transferred from a source agency, processed, and brought into the Repository in order to be used for research purposes.

Step 1: Formulate the Request and Receive the Data
A data-sharing agreement must be in place before any data can be received from the source agency. MCHP works in consultation with the source agency and the University of Manitoba’s Office of Legal Counsel to produce an agreement. The data-sharing agreement defines policies and practices about data confidentiality, privacy, legislative and regulatory requirements, data transfer, and ongoing use of the data for research purposes. Data-sharing agreements are of two types: agreements for data added to the Repository at regular intervals (typically annually), and agreements for data provided for a single research project. For data added to the Repository at regular intervals, MCHP assumes responsibility for overseeing its use. This involves ensuring that appropriate policies and procedures governing use are established, documented, and enforced. For data added only for one specific project – called project-specific data – the principal investigator of the project assumes responsibility for overseeing the use of the data.
Once a data-sharing agreement is produced, a data management analyst is assigned to work with the source agency to facilitate the transfer. Initially this involves meeting with representatives from the source agency to acquire background information, documentation, data model diagrams, data dictionaries, documentation about historical changes in the data (including changes in program scope, content, structure, and format), existing data quality reports, and other information relevant to the description or use of the data. This information is used to: (a) develop a formal data request; (b) enhance the metadata repository, which contains database documentation; and (c) prepare the Data Quality Report. The analyst asks the source agency for reports or publications that document
Fig. 10 A screenshot of the MCHP dataflow diagram available online. The full-scale diagram is available online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/dataflow_diagram.pdf
the entities in the data, such as people, places, events, or activities (e.g., annual reports). This information is used to assess the accuracy and validity of the files that are brought into the Repository. Financial data, such as annual budgets and total expenditures for specific programs, are also requested if available.
The initial data request encompasses historical documentation; that is, information that may have gone through multiple revisions over time, particularly in response to health system changes. The initial data request may in fact be a series of requests, one for each generation of source data. Future requests for updates may refer to the most recent generation only. All changes in coding methods, program constraints, and accounting measures are documented and incorporated into the metadata repository.
A sample data file is often prepared by the source agency and transferred to MCHP at the same time as the initial documentation transfer. Ideally, the sample consists of a random anonymized subset of the original data.
Once the documentation and sample data file have been evaluated, a formal data request is prepared and sent to the source agency. The data are then shipped to Manitoba Health for de-identification and data linkage (described above, under Privacy).

Step 2: Become Familiar with the Data Structure and Content
Once MCHP receives the data, a data management analyst reviews the documentation and the organization of files and structures. While data in the Repository are usually organized to reflect the structure of the original source data, sometimes the files must be reorganized so that questions can be addressed about the different units of analysis that comprise the data, including persons, places, objects, events, and dates.
Tasks undertaken in the process of becoming familiar with the data structure and content include:
1. Standardizing unique record identifiers. If the PHIN is missing, then a unique “placeholder” value is created by MCHP analysts.
2. Standardizing dates of events and correcting incomplete dates, where possible.
3. Standardizing frequently used demographic data elements, including sex and postal code.
4. Identifying and restricting access to data elements not normally made available to researchers without special permission. Examples include registration numbers and hospital chart numbers.
5. Reorganizing and converting files to a different file format, if necessary.

Step 3: Apply SAS® Programs
MCHP uses SAS® for analysis, which performs optimally with data files that have been denormalized (SAS Institute Inc. 2006). Denormalization is a process of adding redundant information to a data file to reduce the processing time required for analysis. Standardized formats are applied to selected fields, such
as date fields. Once a data file has been prepared for research use, the SAS Scalable Performance Data Server (SPDS) is used to sort and create indices and other design elements appropriate for the most commonly used applications. During this process, standard naming conventions for data files are applied. SAS® is then used to create a summary of the contents for documentation purposes.

Step 4: Evaluate Data Quality
A Data Quality Report is produced for each dataset in the Repository. This report is housed
in the metadata repository, which provides a single point of access for all documentation concerning a data file. The structure and contents of the Report, and the framework guiding the development of the report, are described below under Data Quality Evaluation Tool for Administration Data.

Step 5: Document the Data
Data dictionaries, which contain information about the name, contents, and format of each field, are created and stored in the metadata repository. The data dictionaries can be used to conduct an initial review of data quality; a cursory review can identify problems such as missing data, incompleteness of labels and descriptors, problems with ranges in numeric values, and/or integrity of data linkage keys.
Before the data are stored in the Repository, the data dictionaries are subjected to an initial assessment of accuracy and completeness. If deficiencies are identified, the analyst will investigate them through further contacts with the source agency, Manitoba Health, or MCHP personnel.

Step 6: Release the Data
If the data files and documentation appear ready, the data can be released internally for use. Release may be informal, in which case analysts are simply notified that the new data and documentation are available for use, or more formal, involving presentations to data analysts and researchers. The latter is useful when a new data source is valuable for multiple research projects, if substantial changes have occurred to existing data or when the source agency has introduced a new data-capture process or system. New and updated datasets are also announced in the MCHP quarterly newsletter Research Resources Update, published online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/repository/rrupdate.html

Percent of Time Spent on Each Data Management Activity
Now that MCHP has developed a methodological approach to acquiring and installing data, time spent in each of the various categories of activity can be tracked. Figure 14 shows staff
Fig. 14 Average percent of time spent on each data management activity* for 2014
time spent in each category accumulated over a 1-year period. It was instructive to realize that about one third of staff time is spent in either administrative (meetings and presentations, general communication, training) or application development activities. The latter includes such things as the development of data quality macros and tools to implement the metadata repository. As Fig. 14 shows, programming data to be stored (programming/installing data) and documenting data are two of the largest areas of activity, followed by data quality assessments and exploring data on arrival at the center. The smallest areas of activity involve requesting data and performing revisions to existing data (database maintenance). MCHP continues to monitor time spent on each activity in order to track fluctuations over time. At the moment, MCHP does not have a formal data release process; therefore, no time is accruing in that activity. A dissemination strategy will be developed in the coming year.

Summary
The six-step data management process used at MCHP follows standards and practices observed in other similar initiatives as well as recommendations developed by organizations maintaining repositories of anonymized personal health information for research purposes (for examples, see Daas et al. 2008; Holman et al. 1999; Lyman et al. 2008). MCHP’s process also reflects some of the distinctive aspects of the political and social environment in which it operates, including relationships with source agencies, the software platform on which the Repository is maintained, and provincial health privacy legislation.

Data Quality Evaluation Tool for Administration Data
Data collected for administrative purposes are not always of the best quality for research, and poor quality data may lead to false conclusions.
To determine the quality of data coming into MCHP, an evaluation tool was developed (see Fig. 15). This tool was implemented using SAS® software and is specifically designed to assess the following characteristics of administrative data:

• Completeness and correctness
• Consistency
• Referential Integrity
• Trends in the data
• Crosswalk linkage assessment
• Agreements using kappa statistics

Completeness and Correctness
Completeness refers to the magnitude of missing values; such values are identified and reported for all data elements. The assessment of correctness reports the fraction of data elements that are valid, invalid (e.g., categorical variables that do not match a reference list, out of range numerical variables, invalid dates such as a living person born in the 1800s), or missing, and identifies outliers for all numeric variables. The process of checking the large number of files that flow into the repository at MCHP would be infeasible if not for the ability to automate the process. Completeness and correctness can be evaluated using an automated set of SAS macros developed at MCHP called META, INVALID CHECK, and VIMO. These macros produce the VIMO table (see Fig. 16) that documents the percentage of valid, invalid, missing, and outlier data. Fields with invalid values are flagged and the total number of invalid records is automatically noted in the comment column.

Assessing Consistency
Consistency refers to the intra-record relationship among data elements. For example, hospital admission dates must precede hospital separation (discharge) dates. Consistency can be assessed using MCHP’s VALIDATION macro, which is based on predefined consistency criteria. Each record is checked for consistency, and the results are summarized as a table showing the total number of
[Figure: diagram of the data quality evaluation tool, showing the REFERENTIAL INTEGRITY, AGREEMENT, LINK, LINKYR, TREND, DQ GEN, VALIDATION, VIMO, CONTENTS, META, INVALID CHECK, and AUTOMATE macros and the documentation system]
*All macros displayed above are SAS® macros with the exception of the AUTOMATE macro, which is an Excel VBA macro
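The consistency rule described in this chapter (a hospital admission date should not fall after the separation date) can be illustrated with a short Python sketch. This is not MCHP's VALIDATION macro, which is written in SAS; the function name `check_consistency`, the field names, and the sample records are all hypothetical. The sketch also assumes that same-day admission and separation counts as consistent, which the text does not specify.

```python
from datetime import date


def check_consistency(records):
    """Apply one predefined criterion: separation must not precede admission.

    Returns (number of records checked, list of inconsistent records).
    """
    checked = 0
    inconsistent = []
    for rec in records:
        admit, sep = rec.get("admit_date"), rec.get("sep_date")
        if admit is None or sep is None:
            # Missing dates are a completeness problem, so they are skipped here.
            continue
        checked += 1
        if sep < admit:
            inconsistent.append(rec)
    return checked, inconsistent


# Hypothetical hospital records; ids and dates are made up for illustration.
records = [
    {"id": 1, "admit_date": date(2010, 3, 1), "sep_date": date(2010, 3, 5)},
    {"id": 2, "admit_date": date(2010, 4, 9), "sep_date": date(2010, 4, 2)},
    {"id": 3, "admit_date": date(2010, 5, 1), "sep_date": None},
]
checked, bad = check_consistency(records)
print(f"{len(bad)} of {checked} checked records are inconsistent")
```

Keeping the rule as a per-record check makes it easy to summarize the results as totals, in the spirit of the summary table the text mentions.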
[Figure: sample VIMO table for a dataset of 10,000 records. Columns: Type, Variable Name, Variable Label, Valid, Invalid, Missing, Outlier, Min, Max, Mean, Median, STD, Comment. Rows are grouped by variable type (e.g., ID, Num, Char); observed values are listed for character variables, and the comment column reports totals such as “68 Invalid Obs. in total”. Legend (potential data quality problems): None or Minimal (< 5%), Moderate (5-30%), Significant (> 30%), Unknown or N/A. Additional markers indicate no variance or 100% missing values, and Min/Max values based on the valid range.]
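As a rough illustration of what one row of a VIMO-style table summarizes, the following Python sketch computes the percentage of valid, invalid, missing, and outlier values for a single numeric field. It is not the MCHP VIMO macro (which is SAS); the function name `vimo_row` is hypothetical, and two choices are assumptions rather than facts from the text: values outside a supplied valid range count as invalid, and outliers are valid values more than `z` standard deviations from the mean of the valid values.

```python
from statistics import mean, pstdev


def vimo_row(values, valid_min, valid_max, z=3.0):
    """Summarize one numeric field as percentages of all records."""
    n = len(values)
    present = [v for v in values if v is not None]
    valid = [v for v in present if valid_min <= v <= valid_max]
    n_missing = n - len(present)
    n_invalid = len(present) - len(valid)

    # Outliers are counted among the valid values (assumed definition).
    n_outlier = 0
    if len(valid) > 1:
        centre, spread = mean(valid), pstdev(valid)
        if spread > 0:
            n_outlier = sum(1 for v in valid if abs(v - centre) > z * spread)

    def pct(count):
        return round(100.0 * count / n, 2) if n else 0.0

    return {
        "valid": pct(len(valid)),
        "invalid": pct(n_invalid),
        "missing": pct(n_missing),
        "outlier": pct(n_outlier),
        "n_invalid": n_invalid,  # the kind of total noted in a comment column
    }


# Hypothetical field with a valid range of 0 to 10.
row = vimo_row([5, 7, 8, None, 120, 6, 7, -1, 9, 8], valid_min=0, valid_max=10)
print(row)
```

In this sketch the valid, invalid, and missing percentages sum to 100, while the outlier percentage is reported separately.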
Key: CLIENT_VISIT_GUID
PRIMARY TABLE                 DUPLICATE    MISSING    TOTAL RECORDS
WRHA_EDIS_CLIENT_2007JAN      1 (x3)                  1,098,981

FOREIGN TABLE                 ORPHAN VALUES    TOTAL RECORDS
WRHA_EDIS_STATUS_2007JAN      399              2,987,150
WRHA_EDIS_PROVIDER_2007JAN    400              6,133,612
WRHA_EDIS_NACRS_2007JAN       188              586,504
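The referential integrity output above reports duplicate and missing key values in a primary table and orphan key values in the foreign tables. A minimal Python sketch of those two checks follows. It is an illustration under stated assumptions, not the MCHP REFERENTIAL INTEGRITY macro: the function names and sample keys are hypothetical, "duplicate" is taken to mean the number of distinct key values that occur more than once, and "orphan values" the number of distinct foreign-key values with no match in the primary table.

```python
from collections import Counter


def primary_key_report(keys):
    """Count duplicated and missing key values in a primary table."""
    present = [k for k in keys if k is not None]
    counts = Counter(present)
    return {
        "duplicate": sum(1 for c in counts.values() if c > 1),
        "missing": len(keys) - len(present),
        "total_records": len(keys),
    }


def orphan_report(foreign_keys, primary_keys):
    """Count distinct foreign-key values that have no matching primary key."""
    known = {k for k in primary_keys if k is not None}
    orphans = {k for k in foreign_keys if k is not None and k not in known}
    return {"orphan_values": len(orphans), "total_records": len(foreign_keys)}


# Hypothetical key columns; "B" occurs three times in the primary table.
primary = ["A", "B", "B", "B", None, "C"]
foreign = ["A", "B", "Z", "Z", "Y", "C"]

primary_summary = primary_key_report(primary)
foreign_summary = orphan_report(foreign, primary)
print(primary_summary)
print(foreign_summary)
```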
for a particular year of data. Typical output is illustrated in Fig. 19. Variations in expected trends are typically used as indicators that further exploration is necessary.

identifying information, and the data are sent to MCHP with an encrypted PHIN that can be linked with MCHP databases for research purposes. The viability of linking incoming data with other MCHP databases can be assessed using the LINK and LINKYR macros (see Fig. 20).

Assessing Agreement
[Figure: plot of frequency (0 to 500) by fiscal year, 2000/01 to 2009/10. Legend: Significant Outliers; Identical Subsequent Frequencies; Suppressed Small Frequencies (between 0 and 6)]
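The three markers named in the trend plot legend (significant outliers, identical subsequent frequencies, and suppressed small frequencies) can be sketched in Python as follows. This is not the MCHP TREND macro. The function name `trend_flags`, the sample counts, and the outlier rule (a yearly frequency that differs from the series median by more than a chosen fraction of the median) are assumptions made for illustration; the small-frequency rule follows the chapter's disclosure control of suppressing non-zero values below six.

```python
from statistics import median


def trend_flags(freq_by_year, tolerance=0.5, small=6):
    """Return a list of flags for each fiscal year's frequency."""
    years = sorted(freq_by_year)
    mid = median(freq_by_year[y] for y in years)
    flags = {}
    previous = None
    for year in years:
        freq = freq_by_year[year]
        notes = []
        if 0 < freq < small:
            notes.append("suppressed")  # non-zero counts below six
        if mid > 0 and abs(freq - mid) > tolerance * mid:
            notes.append("outlier")  # assumed rule, see lead-in
        if previous is not None and freq == previous:
            notes.append("identical_to_previous")
        flags[year] = notes
        previous = freq
    return flags


# Hypothetical yearly record counts for one data file.
counts = {
    "2000/01": 410,
    "2001/02": 395,
    "2002/03": 395,
    "2003/04": 120,
    "2004/05": 402,
    "2005/06": 4,
}
flags = trend_flags(counts)
for year, notes in flags.items():
    print(year, counts[year], notes)
```

Flags of this kind only indicate where further exploration is needed; they do not by themselves show that the data are wrong.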
discrepancies and anomalies in the data and correct or document them before the data are released.
Anyone interested in implementing the Data Quality assessment tools developed at MCHP can download the source code, examples, and documentation at http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/repository/dataquality.html. This software is freely available for use under a GNU General Public License.

Advantages of Using a Population-Based Registry
As illustrated in Fig. 3, a central component of the Repository is an anonymized population-based registry: a longitudinal registry of individuals covered by the provincial health insurance plan. It provides an opportunity for preparing data, improving quality, and understanding error through linkage to files with independent information on relevant variables. For example, comparing date of death from the Manitoba Health Insurance Registry with the date recorded in the government’s Vital Statistics files allows for error correction.
The population-based registry has been critically important for many studies since 1977 (Roos et al. 1977). Besides using the registry for computing geographically-based rates, individuals have been located within families to determine the health and health-service use of particular ethnic groups (Martens et al. 2005, 2011), and the registry has been critical for longitudinal studies, being used for relatively short-term follow-up of surgical outcomes and multi-year birth-cohort research (Brownell et al. 2014; Oreopoulos et al. 2008; Roos et al. 1992).
Variables pertaining to residential mobility, years living in a neighborhood with particular characteristics and so on, can be generated. Such recording of “exposure” is methodologically superior to relying on cross-sectional variables from one point in time. Variables measured at various times over relatively long periods may help resolve disagreements as to when, in the early life course, different factors might be occurring. To treat periods in the life course separately, counts of years in a neighborhood or in a particular social situation can be generated for different intervals (e.g., ages 0–1, 2–4, 5–9, etc.).

Constructing Reliable Social Measures
make collecting this data more efficient and reliable (Roos et al. 2008).
One of the advantages in using administrative social variables is that they can help adjust for differences in family background. For example, in one study it was found that nine variables (gender, income assistance, receiving services/children in care, family structure, number of siblings, birth order, mother’s age at first birth, residential mobility, and the neighborhood-based Socioeconomic Factor Index) accounted for as much variance in the Manitoba Language Arts achievement test as a similar sized set of variables from survey data (Roos et al. 2008, 2013). That is, administrative data were as good at predicting the outcome as were the survey data.
relatively rare, information on school performance and income assistance can provide a window on well-being during childhood and adolescence. Attention to the socioeconomic gradient over the early life course builds on the hypothesis that the relatively affluent will disproportionately take advantage of and benefit from health and educational programs. In other words, wealthy individuals are more likely to be exposed to and take advantage of new initiatives and opportunities that make them healthier, and low-income people are less likely to do so.

Summing Up

The Population Health Research Data Repository housed at MCHP is one of the most established and comprehensive provincial repositories of health and social data in Canada. Currently, more than 200 research projects are being conducted using these data. In addition to the policy-relevant research produced in the form of deliverables to the Manitoba government (discussed in the next chapter), numerous high-quality academic papers are published in areas of health services research and population health. Increasingly, studies are focusing on the social determinants of health as more social data become available (some of these are listed in Roos et al. 2008).

Research units like MCHP that house large databases accessed by many investigators and graduate students can benefit from the creation of web-based research resources to compile and disseminate common organizational knowledge. Creating a single point of access to the knowledge generated from a wide range of projects is important for ensuring a high level of productivity and methodological excellence.

References

Brownell M, Chartier M, Au W, Schultz J. Evaluation of the healthy baby program. 2010. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP-Healthy_Baby_Full_Report_WEB.pdf. Accessed 29 May 2013.
Brownell MD, Nickel NC, Chateau D, et al. Long-term benefits of full-day kindergarten: a longitudinal population-based study. Early Child Dev Care. 2014;185:291–316.
Chartier M, Finlayson G, Prior H, McGowan K-L, Chen H, de Rocquigny J, Walld R, Gousseau M. Health and healthcare utilization of francophones in Manitoba. 2012. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP_franco_report_en_20120513_WEB.pdf. Accessed 29 May 2013.
Chateau D, Doupe M, Walld R, Soodeen RA, Ouelette C, Rajotte L. Projecting personal care home bed equivalent needs in Manitoba through 2036. 2012. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP_pch_days_report_WEB.pdf. Accessed 29 May 2013.
Daas PJH, Arends-Tóth J, Schouten B, Kuijvenhoven L, Statistics Netherlands. Quality framework for the evaluation of administrative data. 2008. http://www.pietdaas.nl/beta/pubs/pubs/21Daas.pdf
Doupe M, Fransoo R, Chateau D, Dik N, Burchill C, Soodeen R-A, Bozat-Emre S, Guenette W. Population aging and the continuum of older adult care in Manitoba. 2011. http://mchp-appserv.cpe.umanitoba.ca/reference/LOC_Report_WEB.pdf. Accessed 29 May 2013.
Fransoo R, Martens P, The Need to Know Team, Prior H, Burchill C, Koseva I, Bailly A, Allegro E. The 2013 RHA indicators atlas. 2013. http://mchp-appserv.cpe.umanitoba.ca/reference//RHA_2013_web_version.pdf. Accessed 20 Nov 2013.
Holman CDJ, Bass AJ, Rouse IL, et al. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health. 1999;23:453–9.
Jutte DP, Roos LL, Brownell MD. Administrative record linkage as a tool for public health research. Annu Rev Public Health. 2011;32:91–108.
Lyman JA, Scully K, Harrison JH. The development of health care data warehouses to support data mining. Clin Lab Med. 2008;28:55–71.
Martens PJ, Sanderson D, Jebamani L. Mortality comparisons of First Nations to all other Manitobans: a provincial population-based look at health inequalities by region and gender. Can J Public Health. 2005;96:S33–8.
Martens PJ, Bartlett J, Burland E, Prior H, Burchill C, Huq S, Romphf L, Sanguins J, Carter S, Bailly A. Profile of metis health status and healthcare utilization in Manitoba: a population-based study. 2010. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP-Metis_Health_Status_Full_Report_(WEB)_(update_aug11_2011).pdf. Accessed 29 May 2013.
Martens PJ, Bartlett JG, Prior HJ, et al. What is the comparative health status and associated risk factors for the Metis? A population-based study in Manitoba, Canada. BMC Public Health. 2011;11:814.
Oreopoulos P, Stabile M, Walld R, et al. Short, medium, and long term consequences of poor infant health: an analysis using siblings and twins. J Hum Resour. 2008;43:88–138.
Roos NP. Establishing a population data-based policy unit. Med Care. 1999;37:JS15–26.
Roos NP, Roos LL, Henteleff PD. Elective surgical rates: do high rates mean lower surgical standards? N Engl J Med. 1977;297:360–5.
Roos LL, Fisher ES, Brazauskas R, et al. Health and surgical outcomes in Canada and the United States. Health Aff (Millwood). 1992;11(Summer):56–72.
Roos LL, Brownell M, Lix L, et al. From health research to social research: privacy, methods, approaches. Soc Sci Med. 2008;66:117–29.
Roos LL, Hiebert B, Manivong P, et al. What is most important: social factors, health selection, and adolescent educational achievement. Soc Indic Res. 2013;110:385–414.
SAS Institute Inc. SAS data integration studio 3.3: user’s guide. 2006. http://support.sas.com/documentation/onlinedoc/etls/usage33.pdf. Accessed 12 Aug 2014.
Wolfson M. A shining light in Canada’s health information system. Healthcare Policy. 2011;6:8–13.
3 Health Services Data, Sources and Examples: The Institute for Clinical Evaluative Sciences Data Repository
Contents
Introduction
Strengths and Challenges of Using Health Administrative Data for Health Services Research
The ICES Data Repository
Privacy, Data Governance, and Access to Data at ICES
Record Linkage and Desensitizing the Data for Research
Data Documentation, Metadata, and Data Quality Assessment
Data Quality Assessment in the Literature
New Data, New Uses, and New Ideas
References
concepts will be illustrated through the example of the data held in the Institute for Clinical Evaluative Sciences (ICES) Data Repository in Ontario.

Introduction

Under the 1984 Canada Health Act, health services deemed essential for all residents must be paid for by the provinces and territories. In order to manage, administer, and pay for health services for their populations, the provinces and territories routinely collect information about health system transactions. Generally, this spectrum of health administrative data includes information about people and their use of the health system, physicians’ billing claims, hospital discharges, emergency and ambulatory care, home care, complex continuing and long-term care, and claims for publicly funded prescription drugs, to name a few. Other large and routinely collected datasets are also generated and used by various organizations throughout the health system to understand how health services are being used. Examples include public health program information, agency-level client information, population-based registries and surveys, electronic medical records, and, most recently, large genomic biobank data. The power of these data is amplified when they are linked to each other to understand the whole picture of healthcare delivery. According to Friedman et al. (2005), when these data are used to generate “health statistics,” they create “fundamental knowledge about the health of populations” that inform the health system, “influences on health” that guide policy decisions, and “interactions among those influences” that guide program development and clinical care (Friedman et al. 2005). For example, linked data may answer health system, population-based, and clinical research questions such as: Are patients getting the care they need? Is the care timely and based on optimal evidence? How might the system be better organized to optimize care? Is the care provided equitable across the province?

This chapter focuses on the following areas:

• Strengths and challenges of using health administrative data for health services research
• Privacy and data governance
• Record linkage and desensitizing the data for research
• Data documentation and data quality assessment
• New data, new uses, and new ideas

These concepts will be illustrated through the example of the data held in the Data Repository at the Institute for Clinical Evaluative Sciences (ICES), a research organization in Toronto, Ontario, Canada, that collects and manages a large data repository that is used to generate evidence to improve health and the health system in Ontario.

Strengths and Challenges of Using Health Administrative Data for Health Services Research

In Ontario and elsewhere in Canada, health administrative data are used not only for managing the health system but also for health services research, policy development, and healthcare planning. Since most residents are eligible for healthcare, the data reflect full coverage of publicly funded service transactions. The data represent actual encounters with the healthcare system and are therefore population based, free from recall bias, readily available, consistent over time, and inexpensive to collect and use for secondary purposes compared to traditional research data. Generally, health administrative data are collected using standardized coding metrics, especially when the data are collected by a single source (such as a provincial health authority or ministry). Using one dataset alone is useful for health system surveillance and monitoring, but the real power of using administrative data lies in the ability to link multiple datasets at the individual person level and across healthcare sectors. In his seminal work, Dunn describes record linkage as follows:
Each person in the world creates a book of life. This book starts with birth and ends with death. Its pages are made of the records of principal events in life. Record linkage is the name given to the process of assembling the pages into a volume. (Dunn 1946)

The linkage of data enables researchers to answer questions based on information from different parts of the healthcare system. Without linkage, we can look at hospitalization data and ask: “How many people were admitted to hospital with a heart attack and what hospital care did they receive?” But with linked data, we can answer more involved questions, such as: “Of the people who were hospitalized with a heart attack, who received appropriate follow-up with a specialist? Who was prescribed the appropriate medication on a follow-up basis? What were their comparative mortality rates 5 or 10 years later?”

Linked data also allows for the creation of algorithms that generate cohorts of people with similar health conditions (such as diabetes, asthma, congestive heart failure, or opioid use) and/or healthcare experiences (such as mammography or hip replacement). These algorithms can be enriched when linked data, such as physician claims and hospital inpatient records, are used. Typically, algorithms are validated by primary data collection from medical charts at physician offices or in hospitals. Validated algorithms applied to annual or updated administrative data provide an efficient way to generate cohorts that would otherwise be very expensive to collect over time.

Using administrative health data for research has some challenges, however. Since the data are collected for administrative purposes, they are observational and therefore usually retrospective. They usually do not contain the clinical or sociodemographic detail (such as smoking, socioeconomic status, or medical test results) necessary to answer some research questions or to account for potential confounders of health outcomes. Administrative data may be prone to misclassifying individuals assigned to disease-based cohorts without adequate physician or hospital chart-abstracted person-level record validation. Finally, special legal authorities, privacy laws, and permissions are required to collect and access these datasets because even when the identifiers in these records are encoded, in rare cases, individual linked records could potentially identify individuals if proper methodologies and access controls are not employed. As noted by Chamberlayne et al., “The ethical issues surrounding access to a resource made up of linked data are more complex than those pertaining to access to a single data source” (Chamberlayne et al. 1998).

Comprehensive and routinely updated documentation, or metadata, is required to fully understand the rationale for the original collection of each variable – documentation is elusive at best and not always available to researchers. Comprehensive metadata is necessary to develop an accurate analytic plan, to assess face validity, and to ensure a reasonable interpretation of the data once analyzed. Currently, there are methodologies in the emerging field of “data quality science” to better standardize the assessment of administrative health data quality and to understand whether the data are “fit” to answer the intended research questions (Lix et al. 2012).

The ICES Data Repository

The Institute for Clinical Evaluative Sciences (ICES) in Ontario, Canada, is a not-for-profit research institute and the steward of a secure and accessible data repository that allows for the development of evidence that makes “policy better, health care stronger and people healthier” (from ICES website www.ices.on.ca; March 2014). ICES is funded primarily by the Ontario Ministry of Health and Long-Term Care with special initiative funds and investigator-driven peer-reviewed grants. As of April 2014, there were approximately 180 affiliated faculty from around Ontario and about 160 staff whose expertise includes data linkage and analysis, biostatistics, health informatics, epidemiology, project management, research administration, information technology, and database development and support. ICES science is organized across clinical program areas: cancer, cardiovascular, primary care and population health, chronic disease and
pharmacotherapy, health system planning and evaluation, kidney, dialysis and transplantation, and mental health.

Most of the ICES staff are located at ICES Central on the campus of Sunnybrook Health Sciences Centre in Toronto, Ontario, and other affiliated ICES scientists and staff are located across the province: Downtown Toronto, Queen’s University in Kingston, the University of Ottawa, Western University in London, and new sites developing at McMaster University in Hamilton and at the Northern Ontario School of Medicine in Thunder Bay.

ICES is the steward of a large, comprehensive, and linkable data repository used for research and evaluation. The ICES Data Repository consists primarily of health administrative data that are created in the day-to-day interactions with the healthcare system: billings of physicians to the Ontario Health Insurance Plan (OHIP), drug claims to the Ontario Drug Benefit (ODB) Program, discharge summaries of hospital stays (DAD) and emergency department visits (NACRS), and much more. With almost complete health services data coverage of the annual Ontario population from 1991 across most publicly funded healthcare sectors, ICES scientists, analysts, and staff apply scientific methods to advance the evidence for improvements in health and healthcare. The collection and use of these administrative data is authorized by ICES’ designation as one of four prescribed entities in Ontario under the Personal Health Information Protection Act 2004 (PHIPA, s.45). This means that ICES may collect and use personal health information for the purposes of evaluating and monitoring the health system, with adequate data governance permissions and controls.

The ICES Data Repository has the following attributes:

• Individual level: The data reflect people and their health and healthcare experiences, similar to data repositories in British Columbia, Manitoba, Quebec, Nova Scotia, and Newfoundland.
• Longitudinal: Like other jurisdictions, the ICES Data Repository includes most healthcare experiences over time. The ICES Repository goes back to 1991 and, in some cases, earlier.
• Population based: In 2013, there were over 13 million people in Ontario, and since most of the people who are eligible for healthcare are represented, this makes the ICES Repository the largest repository of its type in Canada.
• Comprehensive health sector data: Much of the administrative data in the ICES Repository represent publicly funded physician, hospital and health-based community care, as well as claims for prescription drugs for people aged 65 and over. Population and condition-specific registries are also included, where available. In some provincial data repositories, such as at the Manitoba Centre for Health Policy at the University of Manitoba, additional government administrative data outside the health sector, such as education and social support, are routinely included. At ICES, discussions to broaden the collection and use of data beyond the health sector have begun.
• Desensitized and linkable with coded identifiers: Individuals in the Repository are uniquely identified with an ICES-specific key number (IKN), which is obtained by encoding the Ontario health card number using a proprietary encoding algorithm. ICES in-house professionals replace any direct identifiers attached to the incoming data with a unique IKN that is used to link person-level records from one dataset to another. This in-house expertise that spans informatics and research has allowed for the easy integration of data with high data quality standards.
• Easy to use: All data are in SAS format and ready to use in an analytic environment; these data are linkable to each other using a unique person-level identifier and ready to use after appropriate data access approvals. Having the data repository organized in this manner creates efficiencies for research as the data are already in record-level format.
• Secure and privacy protected: Ontario privacy legislation (Personal Health Information Protection Act, PHIPA 2004) allows for ICES to
collect direct identifiers from data custodians for the purpose of assigning an IKN to each data record. ICES’ privacy policies, practices, and procedures and our prescribed entity status under PHIPA allow ICES to function with the approval of the Ontario Information and Privacy Commissioner (IPC). A full review of ICES privacy and security policies and procedures is undertaken every 3 years, with the approval letter from the IPC published on the ICES website (more detail on this below). Expert information and technology staff are on site to ensure the security and smooth maintenance of the research platform.
• Professional data management: Data quality and informatics experts apply the highest data quality standards and are leading in developing metadata and other documentation for the analysts and scientists to use.

• Derived chronic condition cohorts have been developed at ICES using linked data algorithms that have been validated by using primary data collection as a gold standard.
• Detailed clinical data has been extracted from electronic medical records and through ICES primary data collection projects.
• Population and demographic data through the Ministry of Health’s Registered Persons Database (RPDB) is used to characterize study subjects and to generate denominators for rate calculation.
• Additional clinical data, agency client-level data, and research data collections that are linkable to longitudinal outcome data are included on a project-by-project basis.

A full listing of the data in the ICES Data Repository can be found on the ICES website.
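As a rough illustration of how a shared coded identifier supports linked-data questions and rate calculations of the kind described in this chapter, the sketch below links hospital stays to physician claims on a person-level key and uses a registry list as the denominator. Every record, field name (`ikn`, `dx`, `specialty`), and the 30-day follow-up window is a hypothetical stand-in chosen for the example; none of it reflects actual ICES data structures or validated case definitions.

```python
from datetime import date

# Hypothetical, already-desensitized records keyed by a coded identifier.
hospital_stays = [
    {"ikn": "A1", "discharge": date(2013, 1, 10), "dx": "AMI"},
    {"ikn": "B2", "discharge": date(2013, 2, 1), "dx": "AMI"},
    {"ikn": "C3", "discharge": date(2013, 3, 5), "dx": "asthma"},
]
physician_claims = [
    {"ikn": "A1", "service": date(2013, 1, 24), "specialty": "cardiology"},
    {"ikn": "B2", "service": date(2013, 6, 1), "specialty": "cardiology"},
]
# Stand-in for a population registry that supplies the denominator.
registered_persons = ["A1", "B2", "C3", "D4"]

def followed_up(stay, claims, window_days=30):
    """True if the same person has a cardiology claim within window_days of discharge."""
    return any(
        c["ikn"] == stay["ikn"]
        and c["specialty"] == "cardiology"
        and 0 <= (c["service"] - stay["discharge"]).days <= window_days
        for c in claims
    )

# Cohort: people hospitalized with a heart attack (coded here as "AMI").
ami_stays = [s for s in hospital_stays if s["dx"] == "AMI"]

# Linked question: what share of them saw a specialist soon after discharge?
follow_up_rate = sum(followed_up(s, physician_claims) for s in ami_stays) / len(ami_stays)

# Population rate: distinct people in the cohort per 1,000 registered persons.
ami_rate_per_1000 = 1000 * len({s["ikn"] for s in ami_stays}) / len(registered_persons)
```

The exact match on the key is what makes this a deterministic linkage; the follow-up question cannot be answered from either file alone.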
operating procedures and policies are reviewed and approved by the Information and Privacy Commissioner of Ontario every 3 years.

The authority for ICES to hold and integrate data lies within detailed data sharing agreements or memoranda of understanding with every data partner. A data sharing agreement executed for every dataset integrated into the Repository outlines the legal authorities, the data collection and transfer methods, the desensitization procedures, and the use for each new dataset that ICES collects. The most comprehensive data sharing agreement is with the Ontario Ministry of Health and Long-Term Care, and this agreement outlines ICES’ responsibility in using the Ontario health administrative data.

ICES’ policies, practices, and procedures that prescribe the governance of the Repository overall and of each dataset at ICES are strictly followed: the use of the data at ICES is limited to the agreed-upon purpose and use defined in the data sharing agreement under which the data is authorized for ICES to collect.

Access to ICES Data

Research at ICES is generally managed within clinical program areas: cancer, cardiovascular, population health and primary care, chronic disease and pharmacotherapy, health system planning and evaluation, kidney, dialysis and transplantation, and mental health and addictions. As well, ICES currently has four active satellite sites: ICES UofT at the University of Toronto, ICES Queen’s in Kingston, ICES uOttawa, and ICES Western (ICES at McMaster University and ICES North at Lakehead/Laurentian University are being developed). Scientists and staff are affiliated with these programs. When a fully formed project is contemplated by an ICES scientist, the feasibility and rationale for its implementation are vetted by ICES program leads and management staff: Is the project aligned with the ICES mission? Can the question be answered with the data available (or new data collected)? What is the human resource capacity to implement the project (analyst and project management or coordination resources)? Are there adequate funds to implement the project? After these criteria are vetted, a privacy impact assessment (PIA) is completed by research teams outlining the project research protocol, the data being contemplated for the project, the output of the research, and the foreseeable privacy impacts or risks. The ICES privacy office reviews all privacy impact assessments and provides recommendations and final approval before any data can be accessed for projects. In some cases and according to data sharing agreements, the data custodian is notified or approves the use of their data for ICES projects and they receive a copy of reports that utilized their data. All ICES projects at a minimum undergo Research Ethics Board (REB) retrospective review; currently, the Sunnybrook Health Sciences Centre REB is the overseeing body for most ICES projects.

Record Linkage and Desensitizing the Data for Research

The ICES Data Repository is continuously growing. Mostly, the data collected at ICES initially contains direct identifiers so that the records attributed to a unique individual can be assigned the correct ICES key number (IKN) and the direct identifiers removed. This process of desensitizing data for research at ICES may be facilitated by record linkage (also known as record matching), a process by which records from two files are combined so that an individual’s information from one file can be merged with the same individual’s information from another file. For example, you may have one file of demographic data and another file of diagnostic patient information, and you want to combine and analyze them together. If both files contain a precise identifier that refers to the same person (such as health card number or social insurance number), the linkage task is relatively easy. This is called deterministic record linkage.

At ICES, not all individual-level data received contain Ontario health card numbers. Frequently, individuals are identified in the data records by their name, postal code, and other “soft” identifiers. Before data can be used for research, the IKN for these records must be found. Linkage to other fields may be used to match individuals from
different files. These, as listed below, come with some challenges.

Last name:
– Not unique between people (common names may be shared by numerous individuals)
– Subject to misspelling
– May change over time (e.g., at marriage)
First name:
– Similar issues as last names
– Nicknames may be used in one file and full names in the other
Date of birth:
– Subject to transcription and other errors
– Imprecise when supplied by someone other than the individual (e.g., family member at hospitalization)
– May be incomplete
– Not unique
Date of death:
– Similar issues as date of birth
– May only be applicable to a portion of the file
Location of personal residence such as postal code:
– Subject to change over time (as people move)
– Nonunique, in particular within families

To combine files that only contain imprecise direct identifiers such as those above, probabilistic record linkage (PRL) may be used. Another common term for PRL is “fuzzy matching.”

Probabilistic record linkage methodologies incorporate the relative frequencies of field values to compute their sensitivity and the positive predictive value and then combine these to form linkage weights for each pair of records. For example, if two records contain the same name, a greater weight is given if that name is rare in the population being studied. Conversely, two records sharing the same value that is quite common (e.g., birth year or female gender) may not contribute much to the linkage weight. Various encoding algorithms and string comparators are used to deal with alternate spellings, nicknames, and common transcription errors. Blocking is used to reduce the total number of comparisons; and clerical review is applied to pairs that did not yield a conclusive weight.

The Registered Persons Database (RPDB) was described earlier in this chapter. ICES receives a number of RPDB files monthly and thus has a cumulative record of the names, postal codes, and other demographic information for all health card holders in Ontario over time. This file is an essential component of making files without health card numbers (HCNs) useable for research.

Figure 1 illustrates the process of assignment of the ICES key number. Once an IKN is assigned to a record and the original direct identifiers are removed, that record is considered “desensitized” and can be (deterministically) linked to all other records in the ICES Data Repository that pertain to the same person. This facilitates the creation of analytic datasets that are prepared to answer specific research questions.

Other institutions that may not have the equivalent of the RPDB may find other solutions. For example, Chamberlayne et al. (1998) describe the creation of a Linkage Coordinating File (LCF) at the Centre for Health Services and Policy Research at the University of British Columbia. This file was created by applying probabilistic record linkage to data from various sources and contains personal identifiers and a unique person-level index. The file can be used to facilitate the linkage of other person-level files, in a way similar to the RPDB.

Data Documentation, Metadata, and Data Quality Assessment

Data Quality Assessment in the Literature

There are many frameworks and evaluation strategies for data quality, and many are created for specific purposes and types of data. Data quality assessment has been defined as “the whole of planned and systematic procedures that take place before, during and after data collection to be able to guarantee the quality of data in a database. . .for its intended use” (Arts et al. 2002).
54 K. Iron and K. Sykora
Fig. 1 Process for assignment of ICES key number at ICES, with and without Ontario health card number
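For records that arrive without a health card number, the probabilistic linkage weights discussed earlier in this section can be sketched as follows. The sketch uses the common Fellegi–Sunter style log-ratio weight, which is a stand-in chosen for illustration and not necessarily the computation used at ICES. The match probability `M_PROB`, the frequency tables, and the field names are all hypothetical.

```python
import math
from collections import Counter

# Assumed probability that a field agrees when two records truly belong
# to the same person (hypothetical value).
M_PROB = 0.95

def field_weight(a, b, counts):
    """Log2 weight for one field of a candidate record pair.

    Agreement on a rare value earns a large positive weight, agreement on a
    common value a small one, and disagreement a negative weight.
    """
    total = sum(counts.values())
    if a == b:
        u = counts[a] / total  # chance agreement on this specific value
        return math.log2(M_PROB / u)
    u = sum((n / total) ** 2 for n in counts.values())  # chance agreement on any value
    return math.log2((1 - M_PROB) / (1 - u))

def pair_weight(rec_a, rec_b, freq):
    """Sum the field weights over every field that has a frequency table."""
    return sum(field_weight(rec_a[f], rec_b[f], freq[f]) for f in freq)

# Hypothetical value frequencies in the population being studied.
freq = {
    "surname": Counter({"Smith": 50, "Lee": 30, "Roy": 19, "Zyskowski": 1}),
    "sex": Counter({"F": 50, "M": 50}),
}

rare = pair_weight({"surname": "Zyskowski", "sex": "F"},
                   {"surname": "Zyskowski", "sex": "F"}, freq)
common = pair_weight({"surname": "Smith", "sex": "F"},
                     {"surname": "Smith", "sex": "F"}, freq)
mismatch = pair_weight({"surname": "Smith", "sex": "F"},
                       {"surname": "Lee", "sex": "M"}, freq)
```

In practice, pairs would first be restricted by blocking (for example, comparing only records that share a birth year), names would be compared with string comparators rather than exact equality, and pairs whose total weight falls between an upper and a lower threshold would go to clerical review.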
Holt and Jones suggested that “data quality is not so much an absolute property of a statistical estimate but is related to the purpose for which the estimate is used” (Holt and Jones 1999, p. 24). When using administrative data, it is difficult to “guarantee” data quality; however, a robust assessment focusing on the linked data’s intended use and purpose will at least characterize the quality in an interpretable way.

Generally, the following domains and questions need to be examined when assessing data quality:

• Accuracy: Do the data reflect the truth?
• Validity: Do the data reflect what they were designed to reflect?
• Completeness: Do the data include all records that are collected? Have the fields been well populated?
• Comprehensiveness and coverage: Do the data cover 100% of the intended population? Alternatively, do they constitute a representative sample?
• Reliability: Are the data reproducible?
• Timeliness: Is there a short lag between data collection and use?
• Linkability: Can the data be connected to other data to reflect healthcare system complexity?
• Privacy: Do the data adhere to jurisdictional privacy laws? Are there appropriate and auditable privacy preserving procedures and practices? Has the risk been sufficiently reduced by removing sensitive information?
• Usability: Are the data organized, accessible, and provided in a format that can be easily used?
• Currency: What is the time lag between the time period reflected in the data and the time that data are ready for use?

A number of organizations have developed data quality frameworks to assess the data in their repositories. For example, the Canadian Institute for Health Information (CIHI) framework includes dimensions of relevance, timeliness, usability, accuracy, and comparability within an envelope of planning, implementing, and assessing (CIHI 2009).

Researchers at the Manitoba Centre for Health Policy have developed a data quality framework that has been broadly adopted by ICES (Azimaee et al. 2013). In that framework, dimensions of data quality are divided between those that can be assessed at the database level and those that can be assessed at the research level. In particular, they described database-specific data quality dimensions as:
format” (ACM 1993). The information in the data dictionary should contain, at a minimum:

• The name and brief description of each file
• A list of fields and their description
• For each field, a list and description of valid values
• Unstructured or semi-structured comments with additional information

The files in the ICES Data Repository are stored as SAS datasets. ICES leverages certain features of the SAS software to create data dictionaries that correspond dynamically to the data files. In particular:

• Dataset labels are used to describe the contents of each file.
• Similarly, variable labels describe each field.
• A central format catalogue contains descriptions of all valid discrete values of each field of all datasets.
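The idea of deriving a data dictionary from metadata carried with the data (a dataset label, variable labels, and a shared catalogue of valid values) can be sketched outside SAS as well. The example below is only an analogy in Python: the class names, the `DEMO` dataset, its variables, and the `SEXF` format are hypothetical and do not reproduce the ICES Data Dictionary or any SAS interface.

```python
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str
    label: str              # plays the role of a variable label
    format_name: str = ""   # key into the shared format catalogue, if any

@dataclass
class Dataset:
    name: str
    label: str              # plays the role of a dataset label
    variables: list = field(default_factory=list)

# Hypothetical central catalogue: one entry per format, listing valid values.
FORMAT_CATALOGUE = {"SEXF": {"F": "Female", "M": "Male"}}

def build_dictionary(dataset, catalogue):
    """Assemble a dictionary entry purely from metadata attached to the data."""
    return {
        "file": dataset.name,
        "description": dataset.label,
        "fields": [
            {
                "name": var.name,
                "description": var.label,
                "valid_values": dict(catalogue.get(var.format_name, {})),
            }
            for var in dataset.variables
        ],
    }

demo = Dataset(
    name="DEMO",
    label="Hypothetical demographic file",
    variables=[
        Variable("SEX", "Sex of the person", "SEXF"),
        Variable("BYEAR", "Year of birth"),
    ],
)
entry = build_dictionary(demo, FORMAT_CATALOGUE)
```

Because the entry is rebuilt from the labels and catalogue each time, a correction made to the metadata appears in the documentation without a separate manual edit.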
Fig. 3 Excerpt of ICES Data Dictionary (Source: ICES Data Dictionary https://datadictionary.ices.on.ca/Applications/DataDictionary/Variables.aspx?LibName=OHIP&MemName=&Variable=ADMDATE)
traditional manual methods. Since information is based on actual data elements, there is
internal consistency between the data and the documentation. The process of creating a data
dictionary for a new dataset is automated and quick, so that a data dictionary can be made
available at the same time the data are posted. And finally, if errors are discovered, they
are corrected in both the data and the documentation.
Data Quality Assessment at ICES
ICES’ holistic approach to assessing data quality includes a variety of tools that are used
to assess and document data quality, including:
• All the data elements in a dataset are displayed in a “VIMO report,” which summarizes the valid, invalid, missing, and outlier rates. Examples of invalid values are listed. Simple descriptive statistics are also displayed, and frequencies or histograms are linked to each field. ID variables are highlighted, and their uniqueness status is described.
• A trend analysis of the number of observations over time is performed, and the results are displayed graphically.
• The percentage of records that are linkable to the rest of the ICES Data Repository is displayed over time.
• Missing values over time are presented visually, so substantial changes can be easily detected.
• Content experts are identified for each of the datasets. These content experts are expected to be familiar with the data quality assessment for their dataset and detect any issues that need to be addressed.
• All data users participate in a data blog, in which questions and issues are discussed and, when appropriate, acted upon.
Figure 4 illustrates a VIMO assessment of a client intake dataset. Variable names are
hyperlinked to additional univariate descriptions. For example, for numeric values, a
histogram is presented, and for nonunique ID variables, frequencies of the number of records
per ID are displayed.
New Data, New Uses, and New Ideas
Health administrative data, particularly in the context of universal healthcare coverage,
present a tremendous opportunity to conduct health and healthcare research. Linkable
population-based data, with the appropriate privacy and security safeguards, are a resource
for examining population- and disease-based cohorts, trends in health services utilization,
prevalence and incidence trends, and effects of policy and system changes, among others.
Expertise and care must be applied to use such data effectively and optimally.
Administrative data are also collected outside the health sector for managing social
programs or educational systems. As with many similar data repositories in Canada and around
the world, ICES is exploring the expansion of its linkable data holdings to include
non-health administrative data from across the provincial and federal government and social
service agencies. For example, a new research program at ICES focusing on mental health and
addictions (MHA) was launched in 2013. Integrating community addictions and mental health
agency data with health data is critical to understanding prevention, early detection, and
timely and sustained appropriate care, which in many cases is delivered in a community
setting outside the medical model. Although much of the routine health data to support this
program already exists at ICES, a comprehensive evaluation of the full spectrum of MHA care
requires linkable person-level data that are generated from, for example, the education,
social support, youth justice, and child and youth services sectors.
Around the world, discussions about linking biobank and genomic data, electronic medical
record data, and other large data collections with each other and with administrative data
are propelling the field of big data repositories and analytics into new and uncharted
paradigms. Innovative data collection tools, dynamic and privacy-protecting record linkage
models, data use and governance frameworks, and technologies are quickly advancing to keep up
with the amount and the scope of data being generated and the research and private sector
demands that depend on linking disparate datasets.
Fig. 4 Example of VIMO (valid, invalid, missing, outlier) data quality assessment at ICES
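The VIMO summary described in the text reports, for each field, the share of values that are valid, invalid, missing, or outliers. A minimal Python sketch of that calculation follows. It is not the ICES implementation: the function name `vimo_rates`, the rule that the four categories are mutually exclusive and checked in the order missing, invalid, outlier, valid, and the example age thresholds are all assumptions made for illustration.

```python
def vimo_rates(values, is_valid, is_outlier=lambda v: False):
    """Classify each value into exactly one VIMO category and return rates.

    Assumed order of checks (an illustrative choice, not a documented rule):
    missing -> invalid -> outlier -> valid.
    """
    counts = {"valid": 0, "invalid": 0, "missing": 0, "outlier": 0}
    for v in values:
        if v is None or v == "":
            counts["missing"] += 1
        elif not is_valid(v):
            counts["invalid"] += 1
        elif is_outlier(v):
            counts["outlier"] += 1
        else:
            counts["valid"] += 1
    total = len(values)
    return {k: (n / total if total else 0.0) for k, n in counts.items()}


# Hypothetical "age" field from a client intake dataset.
ages = [34, 51, None, -2, 130, 47, "", 29]
rates = vimo_rates(
    ages,
    is_valid=lambda v: isinstance(v, int) and v >= 0,   # assumed validity rule
    is_outlier=lambda v: v > 110,                       # assumed outlier cutoff
)
print(rates)
```

In this toy field, four of the eight values are valid, two are missing, one is invalid, and one is an outlier. A real report would apply field-specific rules, for example valid values drawn from a format catalogue, and would track these rates over time as the surrounding text describes.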
K. Iron and K. Sykora
Ross M. Mullner
Contents
Introduction
Major Healthcare Programs
Medicare
Medicaid
Children’s Health Insurance Program
Information and Data Products
Information Products
Publications
Data Navigator
Interactive Dashboards
Data Products
Medicare and Medicaid Public Use Data File
Chronic Conditions Data Warehouse
Medicare Current Beneficiary Survey
Medicare Qualified Entity Program
Conclusion
References
services researchers, policymakers, educators, students, and the general public. In 2014, CMS
established the Office of Enterprise Data and Analytics (OEDA) to better oversee and
coordinate its large portfolio of data and information. The office also funds the privately
run Research Data Assistance Center (ResDAC), which provides training and technical
assistance to individuals requesting the agency’s data files. CMS information products
include an online research journal, Medicare and Medicaid Research Review (MMRR); other
publications including Medicare and Medicaid Statistical Supplement, Statistics Reference
Booklet, and CMS Fast Facts; a data navigator; and several interactive dashboards. Its data
products include numerous Medicare and Medicaid public use data files, the Chronic Conditions
Data Warehouse (CCW), the Medicare Current Beneficiary Survey (MCBS) files, and the Medicare
Qualified Entity (QE) Program. Many examples of CMS’ information and data products are
highlighted and discussed.
Introduction
The Centers for Medicare and Medicaid Services (CMS) is a major agency within the US
Department of Health and Human Services (DHHS). CMS (previously known as the Health Care
Financing Administration or HCFA) is responsible for administering and overseeing three of
the nation’s largest ongoing healthcare programs: Medicare, Medicaid, and the Children’s
Health Insurance Program (CHIP). In addition, CMS is responsible for implementing various
provisions of the Patient Protection and Affordable Care Act (ACA) of 2010, including the
construction of an insurance exchange or marketplace, consumer protections, and private
health insurance market regulations. In 2015, CMS through its various programs served almost
123 million people, more than one in three Americans, making it the single largest purchaser
of healthcare in the United States.
CMS’ stated mission is “as an effective steward of public funds, CMS is committed to
strengthening and modernizing the nation’s health care system to provide access to high
quality care and improved health at lower cost” (CMS 2015).
Headquartered in Baltimore, Maryland, with other offices in Bethesda, Maryland, and
Washington, DC, ten regional offices located throughout the nation, and three antifraud field
offices, CMS employs about 5,900 federal employees. CMS employees in Baltimore, Bethesda, and
Washington, DC, develop healthcare policies and regulations, establish payment rates, and
develop national operating systems for programs. Regional office employees provide services
to Medicare contractors; accompany state surveyors to hospitals, nursing homes, and other
facilities to ensure health and safety standards; and assist state CHIP and Medicaid
programs. CMS employees also work in offices in Miami, Los Angeles, and New York, cities
known to have high incidences of healthcare fraud and abuse.
Operationally, CMS consists of 15 major divisions, including seven centers: Center for
Strategic Planning, Center for Clinical Standards and Quality, Center for Medicare, Center
for Medicaid and CHIP Services, Center for Program Integrity, Center for Consumer Information
and Insurance Oversight, and Center for Medicare and Medicaid Innovation. CMS also has a
number of operational offices. One office that will increasingly play an important role in
data and information initiatives is the Office of Enterprise Data and Analytics (OEDA).
Established in 2014 and managed by CMS’ first chief data officer (CDO), the OEDA is tasked
with overseeing improvements in the agency’s data collection and dissemination activities. It
will work to better harness CMS’ vast data resources to guide decision-making, promoting
greater access to the agency’s data to support higher-quality, patient-centered care at lower
costs. The OEDA also manages the CMS-funded Research Data Assistance Center (ResDAC) at the
University of Minnesota, which conducts education and training programs and provides
assistance to researchers who want to access the agency’s data files (Brennan et al. 2014).
In 2015, CMS’ budget totaled an estimated $602 billion (CMS 2015).
4 Health Services Data: The Centers for Medicare and Medicaid Services (CMS) Claims Records 63
scope, and generosity. For example, a low-income individual may be eligible for Medicaid in
one state, but not in another. In 2015, an estimated 66.7 million individuals were receiving
Medicaid benefits in the nation.
Medicaid originally only provided healthcare services for certain categories of the poor such
as pregnant women, children, parents with young children, the elderly, and blind and disabled
individuals. The Affordable Care Act (ACA) of 2010 greatly expanded the Medicaid program to
cover millions of uninsured Americans. Under the new law, many states have expanded their
Medicaid programs to cover nearly all non-elderly poor adults (Henry J. Kaiser Family
Foundation 2015; Orentlicher 2015).
Medicaid is a very important payer for infants and the elderly and younger individuals with
significant disabilities. It pays for about half of all births in the nation. And Medicaid is
the nation’s only safety net for people who need long-term care services. About a third of
Medicaid spending pays for personal assistance in nursing homes and at home for people who
need help with the basic tasks of daily living (Feder and Komisar 2012).
Some individuals, known as dual eligible beneficiaries, receive both Medicaid and Medicare
benefits. They are enrolled in Medicare Part A and/or Part B and receive some form of
Medicaid benefits. In 2015, about 9.6 million individuals were dually eligible in the United
States (Cohen et al. 2015; Henry J. Kaiser Family Foundation 2015).
Children’s Health Insurance Program
Established in 1997 and reauthorized several times, the state Children’s Health Insurance
Program (CHIP) (Title XXI of the Social Security Act) is a program that provides federal
funds to states and matches state contributions to provide health insurance to children who
do not qualify for Medicaid. Specifically, CHIP provides health insurance for children less
than 19 years of age whose families are ineligible for Medicaid. While state benefit plans
vary, all CHIP plans cover immunizations, prescription medications, routine physician visits,
dental care, medically necessary orthodontics, mental and behavioral health,
hospitalizations, home health care, rehabilitation care, medical equipment, and laboratory
and x-ray services. In 2015, about 6.2 million children were enrolled in CHIP (Ewing 2008;
National Conference of State Legislatures 2014).
Information and Data Products
Each year CMS collects and processes enormous amounts of data. For the Medicare program
alone, CMS and its contractors process more than 1.3 billion claims a year and generate
billions of other non-claims data, such as eligibility checks, queries from telephone
contacts through its toll-free 1-800-MEDICARE help line, patient experience surveys, and
enrollment information. Additionally, CMS collects data on its Medicare and Medicaid
Electronic Health Record (EHR) Incentive Programs and on health insurance exchange or
marketplace coverage.
In the past, CMS tended to view the data and information it produced as only by-products of
its operations. Today, however, the development, management, use, and dissemination of data
and information resources have become one of CMS’ core functions. To become more transparent
and accountable, CMS is increasingly making more of its data and information available to
researchers, policymakers, educators, students, and the general public. By releasing these
resources, CMS is attempting to leverage its data and information to better evaluate and
improve its programs, facilitate healthcare innovation, develop new products and analysis
tools, and highlight actionable information for internal and external policy- and
decision-makers (CMS 2012).
Information Products
CMS produces many information products that are readily available to researchers and the
general public. These products include numerous publications, a data navigator, and several
interactive dashboards. Examples of some of the major information products are described
below.
Publications
For health services researchers and policy analysts, CMS publishes a peer-reviewed online
journal, the Medicare and Medicaid Research Review (MMRR). The journal (previously titled the
Health Care Financing Review) publishes research articles throughout the year on a continuous
basis. The articles address various topics such as trends in Medicare, Medicaid, and CHIP,
access and quality of care issues, healthcare insurance coverage, and payment for health
services. It also includes CMS News and Data Briefs. Issues of MMRR, as well as the entire
run of the Health Care Financing Review (Vols. 1–39; 1979–2009), can be accessed at:
www.ncbi.nlm.nih.gov/pmc/journals/2404.
CMS publishes annual data in its Medicare and Medicaid Statistical Supplement. This
comprehensive statistical supplement is updated on an ongoing basis by section as the data
becomes available. Consisting of 14 chapters, including 115 tables and 67 charts, the
supplement provides detailed tables on the personal healthcare expenditures for the entire US
population; characteristics of the Medicare program including enrollment, program payments,
cost sharing, utilization of short-stay hospitals, skilled nursing facilities, home health
agencies, hospices, physician services, hospital outpatient services, end-stage renal disease
services, managed care, and Medicare Part D; and characteristics of the Medicaid program
including the number of persons served, their demographic characteristics, and the types of
services they received. Current and past statistical supplements (2001 to the present) can be
accessed at:
www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MedicareMedicaidStatSupp/2013.html.
CMS also publishes an abridged version of the statistical supplement entitled CMS Statistics
Reference Booklet. This quick reference guide summarizes information about national
healthcare expenditures and the Medicare and Medicaid programs. Published in June of each
year, the booklet provides the most currently available information. Booklets are available
online for 2003 through the most currently available complete calendar year, at:
www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/CMS-Statistics-Reference-Booklet/2014.html.
The briefest statistical summary on annual CMS program and financial data is published in CMS
Fast Facts. It includes summary information on total national health expenditures; Medicare
enrollment, utilization, and expenditures; and the number of Medicaid recipients and payment
by selected types of service. CMS Fast Facts can be found at: www.cms.gov/fastfacts.
Data Navigator
An important tool for finding CMS information and data is the agency’s data navigator. The
data navigator is an easy-to-use, menu-driven search tool that guides the user to CMS’
information and data on the World Wide Web, including the agency’s data housed on external
websites such as the Henry J. Kaiser Family Foundation, the National Institute of Medicine,
and the Health Indicators Warehouse. The navigator enables the user to organize data into
categories, such as by CMS program, setting/type of care, topic, geography, and document
type. It also contains a comprehensive glossary of terms, a list of frequently asked
questions, and a place to subscribe for email updates. The CMS data navigator’s address is:
https://dnav.cms.gov.
Interactive Dashboards
To make its information more accessible, CMS has developed several interactive dashboards.
For example, the Medicare Geographic Variation Dashboard provides users with an easy-to-use,
customizable tool to find, compare, and analyze state- and county-level variations in
Medicare per capita costs. Data used in the dashboard are based on CMS claims data for
Medicare beneficiaries enrolled in the fee-for-service programs during the 5-year period
2008–2012. Users of the dashboard can compare state and county Medicare costs with those of
the nation and identify year-to-year trends compared to national trends over the same time
period. Specifically, users can compare
Medicare’s total per capita costs, inpatient per capita costs, post-acute care per capita
costs, hospice per capita costs, physician/outpatient department per capita costs, durable
medical equipment per capita costs, Medicare Part B drug per capita costs, outpatient
dialysis facility per capita costs, and the total number of Medicare beneficiaries in the
state or county. The dashboard can be found at:
www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_Dashboard.html.
Another example is the Medicare Chronic Condition Dashboard, which presents information for
2012 on the prevalence, utilization, and Medicare spending for 17 chronic disease conditions.
The conditions include Alzheimer’s disease/dementia, arthritis, asthma, atrial fibrillation,
autism spectrum disorders, cancer, chronic kidney disease, chronic obstructive pulmonary
disease (COPD), depression, diabetes, heart failure, hyperlipidemia, hypertension, ischemic
heart disease, osteoporosis, schizophrenia/psychoses, and stroke. The information is
presented by geographic areas such as federal government region, state, county, and hospital
referral region. Users of the dashboard can select specific categories by gender, age group,
Medicare beneficiaries only, and dual eligible beneficiaries (individuals receiving both
Medicare and Medicaid). The dashboard is located at:
www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Chronic-Conditions/CCDashboard.html.
Data Products
CMS produces many data products that are available to researchers as well as the general
public. These data products include many Medicare and Medicaid public use data files, the
Chronic Conditions Data Warehouse (CCW), the Medicare Current Beneficiary Survey (MCBS), and
the Medicare Data Sharing Program.
Medicare and Medicaid Public Use Data File
Many of CMS’ Medicare and Medicaid data files may be very useful to health services
researchers. Some of these files, because they contain specific patient- and
condition-identifiable data, are restricted and difficult to obtain; however, other
de-identified files are readily available as public use data files, which are free and can be
easily downloaded.
Table 1 presents a list of 23 CMS public use data files and systems and the years for which
they are available. The files are divided into nine broad
Table 1 List of CMS’ public use data files and the years for which they are available
Healthcare organization cost data files
1. Healthcare Cost Report Information System (HCRIS)
   Community Mental Health Centers, 2010–2015
   Health Clinics, 2009–2015
   Home Health Agencies, 1994–2014
   Hospices, 1999–2015
   Hospitals, 1996–2015
   Renal Dialysis Facilities, 1994–2015
   Skilled Nursing Facilities, 1996–2014
Medicare claims data files
2. Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs)
   Carrier Line Items PUF, 2008, 2010
   Durable Medical Equipment (DME) Line Items PUF, 2008, 2010
   Home Health Agency (HHA) Beneficiary PUF, 2008, 2010
   Hospice Beneficiary PUF, 2008, 2010
   Inpatient Claims PUF, 2008
   Outpatient Procedures PUF, 2008, 2010
   Prescription Drug Events PUF, 2008
   Skilled Nursing Facility (SNF) Beneficiary PUF, 2008, 2010
   Chronic Conditions PUF, 2008, 2010
   Institutional Providers and Beneficiary Summary PUF, 2013
   Prescription Drug Profiles PUF, 2008, 2010
3. Data Entrepreneurs’ Synthetic Public Use Files (DE-SynPUF), 2008–2010
   Beneficiary Summary
   Carrier Claims
   Inpatient Claims
   Outpatient Claims
Physician and supplier Medicare charges
4. Medicare Provider Utilization and Payment Data
   Medicare Physician and Other Suppliers, 2012
   Medicare Provider Utilization and Payment Data: Inpatient, 2011–2012
(continued)
Specifically, these reports are included in the Healthcare Cost Report Information System
(HCRIS). The various files in HCRIS contain annual mandatory cost reports submitted to CMS
from all healthcare facilities that accept Medicare funds. Nearly all of the nation’s
hospitals, skilled nursing homes, hospices, renal dialysis facilities, independent rural
health clinics, and freestanding federally qualified health centers submit these reports. The
cost reports consist of a series of forms that collect descriptive, financial, and
statistical data to determine if the Medicare program over- or underpaid the facility. These
files are frequently used by health services researchers to examine various facility
characteristics, calculate costs and charges, and determine the financial viability of the
facility (Asper 2013; Holmes et al. 2013; Kane and Magnus 2001). More information on the
various files can be found at: www.resdac.org/cms-data/files/hcris.
Medicare Claims Data Files
Another widely used data source is the Medicare Claims Data Files. These files are part of
the Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs), which consist of 11
separate basic standalone public use files. Most of these files contain non-identifiable
claims-specific data derived from a 5 % sample of all Medicare beneficiaries. The files are
often used by health services researchers, and they are increasingly being used to conduct
public health surveillance (Erdem and Concannon 2012; Stein et al. 2014; Erdem et al. 2014).
Additional information on the files and how health services researchers use them can be found
at: www.academyhealth.org/Training/ResourceDetail.cfm?ItemNumber=7097.
To encourage researchers to use the Medicare claims files, CMS has constructed the Data
Entrepreneurs’ Synthetic Public Use Files (DE-SynPUF). The DE-SynPUF allows researchers to
develop and create software applications for Medicare claims data, train individuals to
analyze claims data using the actual files, and support safe data mining innovations. Data
contained in the DE-SynPUF are based on a 5 % sample of Medicare beneficiaries including
beneficiary summary data, inpatient, outpatient, carrier, and prescription drug event claims.
More information can be found at:
www.resdac.org/event/webinar-introduction-data-entrepreneurs-synthetic-public-use-file-de-synpuf.
Physician and Supplier Medicare Charges
The next category includes the Medicare Provider Utilization and Payment Data files. These
files contain data on the services and procedures provided to Medicare beneficiaries by
physicians and other healthcare professionals on an inpatient and outpatient basis. They also
include all final-action physician/supplier Part B noninstitutional line items for the
Medicare fee-for-service population. For more information on these files, go to
www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data.
Program Evaluation and Health Outcomes
CMS offers researchers many program evaluation and health outcome public use data files. One
such set of files is contained in the Consumer Assessment of Healthcare Providers and Systems
(CAHPS). CAHPS consists of a family of various patient experience surveys. These surveys ask
patients, or in some cases family members, about their experiences with, and ratings of, the
care they received. The surveys in many cases are the only source of information on the care
they received. CAHPS surveys have been developed for hospitals, home health, Medicare
fee-for-service care, Medicare Advantage and Prescription Drug plans, in-center hemodialysis,
and hospices. Results from the surveys are contained in various public use files. Copies of
the CAHPS survey instruments can be found at:
www.cms.gov/Research-Statistics-Data-and-Systems/Research/CAHPS/index.html. More information
on the CAHPS public use data files can be found at: www.resdac.org/cms-data/files/cahps-puf.
A number of other CMS public use files are also derived from CAHPS. Data from various CAHPS
surveys are used to produce Medicare Compare files and related websites, which contain data
on individual facilities and physicians. These files provide contact information, quality of
care measures, lists of services offered, and a five-star rating system.
The Medicare Compare files are available for kidney dialysis facilities
(www.medicare.gov/dialysisfacilitycompare/), home health care agencies
(www.medicare.gov/homehealthcompare/), hospitals
(www.medicare.gov/hospitalcompare/search.html), skilled nursing facilities
(www.medicare.gov/nursinghomecompare/search.html), and physicians
(www.medicare.gov/physicianscompare/search.html). Many health services researchers have used
these files to measure the quality of care provided at various healthcare facilities (Werner
and Bradow 2006; Saunders and Chin 2013; Lutfiyya et al. 2013; Williams et al. 2014). More
information on the public use files can be found at
www.resdac.org/cms-data/files/medicare-compare.
Another public use file dealing with quality of healthcare is the Healthcare Effectiveness
Data and Information Set (HEDIS) public use file. CMS uses HEDIS to compare health plans
providing Medicare and Medicaid services. HEDIS, which was developed by the independent
not-for-profit National Committee for Quality Assurance (NCQA), is a widely used tool to
measure the performance of health plans. It currently consists of 81 measures across five
domains of care and service. HEDIS, which is used by more than 90 % of America’s health
plans, enables researchers to compare the performance of the plans. HEDIS has been used to
compare different quality measures of care (Pugh et al. 2013; Bundy et al. 2012). Information
on HEDIS and its performance measures can be found at:
www.ncqa.org/HEDISQualityMeasurement.aspx. Information on the public use file is available
at: www.resdac.org/cms-data/files/hedis-puf.
Lastly, the Medicare Health Outcome Survey (HOS) public use files provide a rich source of
outcome data on Medicare beneficiaries enrolled in Medicare Advantage programs. The Medicare
HOS consists of Base Line, Follow-Up, and Analytic Public Use Files. The survey, which
measures quality improvement activities, health plan performance, and outcomes of care, is
administered to cohorts of individuals who are repeatedly sampled over time. Results from the
Medicare HOS have been used by health services researchers and quality improvement
professionals to explore functional status measurement issues and identify ways to improve
healthcare practices (Haffer and Bowen 2004; Bowen 2012). More information can be found at
www.resdac.org/cms-data/file-family/Health-Outcomes-Survey-HOS.
Medicare Prescription Drug Program
The next category includes the Prescription Drug Plan Formulary and Pharmacy Network Files.
It consists of seven separate files: Beneficiary Cost File, Formulary File, Geographic
Locator File, Pharmacy Network File, Plan Information File, Pricing File, and Record Layout.
These files contain data on Medicare prescription drug plans and Medicare Advantage
prescription drug plans. The various files are updated weekly, monthly, and quarterly. For
more information see: www.resdac.org/cms-data/files/pharmacy-network.
Medicare Electronic Medical Records Program Files
CMS encourages the greater use of electronic medical records by all healthcare providers. It
has established an incentive program that provides payments to hospitals and healthcare
professionals to adopt, implement, upgrade, or demonstrate the use of electronic health
record technology. As of February 2015, more than 438,000 healthcare providers had received
funds for participating in the program. To identify eligible hospitals and professionals, CMS
has constructed the Medicare Electronic Health Record (EHR) Incentive Program Eligible
Professional Public Use File (Wright et al. 2014). More information on the program and the
files can be obtained at:
www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/DataAndReports.html.
Medicaid Data Files
The next category identifies four CMS Medicaid public use files. The Medicaid Analytic
Extract (MAX) Provider Characteristics File contains data on state Medicaid programs
including the number of individuals enrolled, demographic characteristics (age, gender,
ethnicity, and race), basis of eligibility (aged, disabled, children, and adults), and
maintenance assistance status (medically needy, poverty, waiver, and other). However,
70 R. M. Mullner
after several years of data collection, the files were discontinued. They were last updated in 2011. The MAX files have been used by researchers to study adherence to medications (Rust et al. 2013) and the maternal and infant outcomes of multistate Medicaid populations (Palmsten et al. 2014). A chartbook summarizing 2010 MAX data is also available (Borck et al. 2014). For more information about the public use files, see www.resdac.org/cms-data/files/max-provider-characteristics.

The second public use file is the Medicaid/CHIP Environmental Scanning and Program Characteristics (ESPC) File. This file was created by CMS to encourage cross-state analysis of Medicaid programs. It is now part of CMS’ Environmental Scanning and Program Characteristics (ESPC) Database. The ESPC can be linked to the Medicaid Analytic Extract (MAX) files and other Medicaid data. More information can be found at: www.resdac.org/cms-data/files/medicaidchip-espc.

Another public use file is the Medicaid State Drug Utilization File. This file contains data for covered outpatient drugs paid for by state Medicaid agencies since the start of the federal Drug Rebate Program in 1990. Currently, all states and the District of Columbia participate in the program, as well as about 600 drug manufacturers. For more information see: www.resdac.org/cms-data/files/medicaid-state-drug-utilization.

Lastly, the Medicaid Statistical Information System (MSIS) Datamart contains two public use data files: the State Summary Datamart and the Drug Utilization Datamart. Both of these files can be used to produce tables covering a wide range of Medicaid program statistics on eligibility and claims data. These files contain data on Medicaid eligibles, beneficiaries, payments, maintenance assistance status, age group, gender, race/ethnicity, service category, and program type. For more information go to: www.resdac.org/cms-data/files/msis-datamart.

Geographic Regions and Hospital Service Areas

The next category includes two geographic public use files. The first file is the Hospital Service Area File. It contains summary data on hospital discharges, length of stay, and total charges by CMS provider numbers and zip codes of the Medicare beneficiaries. Using these data, hospital service areas can be determined for various services. More information on the file can be found at: www.resdac.org/cms-data/files/hsaf.

The largest set of CMS geographic public use files is the Medicare Geographic Variation Files. They include 12 separate files: two files with state- and county-level data, four files with state-level data, and six files with hospital referral region (HRR) data. The files are divided into report and table formats for all Medicare beneficiaries, those under 65 years of age, and those 65 years of age and older. These geographic files contain demographic, spending, utilization, and quality of care indicators for the Medicare fee-for-service population at the state, county, and hospital referral region levels. The hospital referral regions were developed by the Dartmouth Atlas of Health Care Project and have been widely used by health services researchers to investigate regional differences in access, cost, quality, and the outcomes of care (Baker et al. 2014; Chen et al. 2014; Wennberg 2010). Detailed information on the files can be found at: www.resdac.org/cms-data/files/medicare-geographic-variation.

Directories of Providers and Coding Systems

The last category includes seven public use data files containing directories of providers and medical procedure coding systems. These files contain a listing of the unique CMS healthcare facility and healthcare professional provider identifiers and lists of CMS-recognized medical procedure codes. The lists and procedure codes are primarily used for billing and payment purposes.

The public use Health Care Information System (HCIS) Data File contains information on each Medicare Part A and B institutional provider by type of facility and state. Specifically, it lists CMS provider identifiers, facility characteristics, total payment amounts, total number of Medicare beneficiaries served, and total utilization for hospitals, skilled nursing facilities, home health agencies, and hospices. For more information see: www.resdac.org/cms-data/files/hcis.
4 Health Services Data: The Centers for Medicare and Medicaid Services (CMS) Claims Records 71
The Medicare Part B Summary Data Files consist of two separate public use files: the Carrier File and the National File. These files contain data summaries by Healthcare Common Procedure Coding System (HCPCS) code ranges. HCPCS codes are medical codes used to report supplies, equipment, and devices provided to patients. The files include allowed services, allowed charges, and payment amounts. More information on the files can be found at: www.resdac.org/cms-data/files/part-b-summary-data.

The next public use file is the National Provider Identifier (NPI) Downloadable File. The NPI is a unique, ten-digit identification number for each CMS-covered healthcare provider. By federal law, the NPI must be used in all administrative and financial healthcare transactions. The file contains NPI data on the name, gender, business address, and medical license number of each provider. For more information see: www.resdac.org/cms-data/files/nppes.

The Physician Supplier Procedure Summary Master File contains data on all Medicare Part B carrier and durable medical equipment regional carrier (DMERC) claims that were processed by CMS. Carriers are private companies that have contracts with Medicare to process Part B claims. Durable medical equipment (DME) is equipment that can withstand repeated use and is appropriate for home use, for example, wheelchairs, oxygen equipment, and hospital beds. The file includes data on each carrier; pricing locality; HCPCS procedure code; type and place of service; submitted, allowed, and denied services and charges; and payment amounts. More information can be found at: www.resdac.org/cms-data/files/psps.

The Provider of Services (POS) File contains a record of each Medicare provider, including all institutional providers, ambulatory surgical centers, and clinical laboratories. The file, which is updated quarterly, includes CMS provider identification numbers and the characteristics of hospitals and other types of facilities, including the name, address, and type of Medicare services the facility provided. For further information see: www.resdac.org/cms-data/files/pos.

The last two files in this category have been discontinued and replaced by the National Provider Identifier (NPI) Downloadable File, which was previously discussed. These two files, which may be of interest to researchers investigating physicians in the mid-2000s, are the Unique Physician Identification Number (UPIN) Directory and the Unique Physician Identification Number (UPIN) Group File. The first file contains the name, specialty, license number, and zip code of physicians, limited licensed practitioners, and some nonphysician practitioners who were enrolled in the Medicare program. The second file provides data on group practices and the physicians who were members of them. Both files were discontinued in 2007 with the implementation of the NPI. Information on the two files can be obtained at: www.resdac.org/cms-data/files/upin-directory and www.resdac.org/cms-data/files/upin-group.

Chronic Conditions Data Warehouse

Another important CMS data product is the Chronic Conditions Data Warehouse (CCW). Established in 2006, the CCW is a national Medicare and Medicaid research database containing claims and assessment data linked by beneficiary across the continuum of care. It also includes Medicare Part D prescription drug event data listing plan, pharmacy, prescriber characteristics, and a formulary file.

The CCW is designed to promote the use of current, easy-to-use Medicare and Medicaid analytic data files by researchers and policy analysts, promote longitudinal research using data already linked by beneficiary across the continuum of care, identify areas to improve the quality of care provided to chronically ill beneficiaries, identify possible ways to reduce program spending, and provide thorough documentation so these data may be used accurately (General Dynamics Information Technology 2013; CCW website, www.ccwdata.org/web/guest/about-ccw).

The CCW uses computer algorithms to identify various conditions. The database includes 27 chronic disease conditions, 9 mental health and tobacco use conditions, and 15 conditions that are related to physical and intellectual disability and developmental disorders.
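As a rough illustration of how a claims-based condition algorithm of this general kind can work, the following Python sketch flags a beneficiary when enough qualifying claims carry a diagnosis code of interest within a look-back window. It is only a sketch under stated assumptions: the diagnosis code prefix, the look-back period, the claim-count thresholds, and the names Claim, EXAMPLE_RULE, and has_condition are all invented for this example and are not the CCW’s published definitions, which should be taken from the CCW documentation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Claim:
    """A simplified claim record (hypothetical layout, not a CMS file format)."""
    beneficiary_id: str
    service_date: date
    claim_type: str          # e.g. "inpatient" or "carrier"
    diagnosis_codes: tuple   # diagnosis codes recorded on the claim


# Hypothetical rule for illustration only; NOT an official CCW definition.
EXAMPLE_RULE = {
    "code_prefixes": ("250",),                     # ICD-9-CM 250.xx, diabetes mellitus
    "lookback_days": 730,                          # assumed two-year reference period
    "min_claims": {"inpatient": 1, "carrier": 2},  # assumed thresholds per claim type
}


def has_condition(claims, beneficiary_id, as_of, rule=EXAMPLE_RULE):
    """Return True if the beneficiary meets any claim-type threshold in the rule."""
    window_start = as_of - timedelta(days=rule["lookback_days"])
    counts = {}
    for claim in claims:
        if claim.beneficiary_id != beneficiary_id:
            continue
        if not (window_start <= claim.service_date <= as_of):
            continue
        # str.startswith accepts a tuple of prefixes
        if any(code.startswith(rule["code_prefixes"]) for code in claim.diagnosis_codes):
            counts[claim.claim_type] = counts.get(claim.claim_type, 0) + 1
    return any(
        counts.get(kind, 0) >= needed
        for kind, needed in rule["min_claims"].items()
    )
```

Keeping the rule in a plain dictionary, separate from the matching logic, is a design choice for the sketch: a different condition could be expressed by swapping in another set of code prefixes and thresholds without touching the function.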
Specifically, the CCW’s chronic disease conditions include acquired hypothyroidism, acute myocardial infarction, Alzheimer’s disease, Alzheimer’s or related dementia, anemia, asthma, atrial fibrillation, benign prostatic hyperplasia, cataract, chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), heart failure, depression, diabetes, glaucoma, hip/pelvic fracture, hyperlipidemia, hypertension, ischemic heart disease, osteoporosis, rheumatoid/osteoarthritis, stroke/transient ischemic attack (TIA), breast cancer, colorectal cancer, lung cancer, prostate cancer, and endometrial cancer.

The CCW’s mental health and tobacco use conditions include conduct disorders and hyperkinetic syndrome, anxiety disorders, bipolar disorder, depressive disorders, personality disorders, posttraumatic stress disorder (PTSD), schizophrenia, schizophrenia and other psychotic disorders, and tobacco use disorder.

Lastly, the CCW’s physical and intellectual disability conditions include autism spectrum disorder; cerebral palsy; cystic fibrosis and other metabolic developmental disorders; epilepsy; intellectual disabilities and related conditions; learning disabilities and other developmental delays; mobility impairments; multiple sclerosis and transverse myelitis; muscular dystrophy; sensory – deafness and hearing impairment; sensory – blindness and visual impairment; spina bifida and other congenital anomalies of the nervous system; spinal cord injury; traumatic brain injury and nonpsychotic mental disorders due to brain damage; and other developmental delays.

General information on the CCW can be obtained at www.ccwdata.org/web/guest/home. A current detailed user guide (Buccaneer Computer Systems and Service 2015) can be found at: www.ccwdata.org.

Medicare Current Beneficiary Survey

A widely used CMS data product is the Medicare Current Beneficiary Survey (MCBS). Since the survey’s inception in 1991, the MCBS data files have been used to estimate the health status, healthcare use and expenditures, health insurance coverage, satisfaction with the care they received, and socioeconomic and demographic characteristics of Medicare beneficiaries. It also has been used to study the occurrence and treatment of specific chronic conditions of the elderly such as depression, dementia, hip fractures, glaucoma, osteoporosis, and rheumatoid arthritis. A bibliography and copies of over 800 research articles published from 1992 to 2013, which used MCBS data, can be found at www.cms.gov/Research-Statistics-Data-and-Systems/Research/MCBS/Bibliography.html.

The MCBS is a continuous, in-person, longitudinal panel survey of a representative national sample of the Medicare population. Survey respondents are interviewed three times a year over a period of 4 years to form a continuous profile of their healthcare experience. Two types of interviews are conducted: a community interview done at the respondent’s residence and a healthcare institutional interview of knowledgeable staff on behalf of the beneficiary. An important feature of the MCBS is that respondents are followed into and out of long-term care facilities during their panel participation. About 16,000 Medicare beneficiaries are interviewed every year (Adler 1994; Briesacher et al. 2012).

Two data products are derived each year from the MCBS: the Access to Care data file and the Cost and Use data file. The Access to Care file represents all persons enrolled in Medicare throughout the entire data collection year, which is referred to as the “always enrolled” beneficiary population. The file contains data on the beneficiaries’ access to healthcare, satisfaction with care, and usual source of care. The Access to Care file is released within a year of the survey (Petroski et al. 2014).

The Cost and Use file represents all persons enrolled in Medicare at any point during the data collection year, which is referred to as the “ever enrolled” beneficiary population. The file links Medicare claims data to survey-reported events and provides complete expenditure and source of payment data on all healthcare services, including those not covered by Medicare. The file contains data on the beneficiaries’ use and cost of healthcare services, information on supplementary
health insurance, living arrangements, income, health status, and physical functioning. The Cost and Use file is released within 2 years of the survey.

More information on the MCBS and its two files can be obtained at: www.cms.gov/Research-Statistics-Data-and-Systems/Research/MCBS/index.html?redirect=/MCBS. Additionally, an informative free webinar presentation, “Getting and Using the Medicare Current Beneficiary Survey (MCBS) for Health Services Research: Guidance from the Experts,” is available from AcademyHealth at: www.academyhealth.org/Training/ResourceDetail.cfm?ItemNumber=11031.

Medicare Qualified Entity Program

The last data product to be discussed is CMS’ Medicare Qualified Entity Program. This program, which was mandated by the Affordable Care Act of 2010, requires CMS to provide access to Medicare claims data to qualified entities (QEs) in order to produce public performance reports on physicians, hospitals, and other healthcare providers. The program enables the QEs to combine Medicare claims data with commercial insurance and Medicaid claims data. To become a QE, an organization must demonstrate existing expertise in performance measurement, the ability to combine Medicare data with other claims data, a process for allowing providers to review and correct their performance reports, and adherence to data privacy and security procedures (Hostetter and Klein 2013).

As of June 2014, CMS had certified 12 regional QEs and one national QE: Oregon Health Care Quality Corporation (Q-Corp), Health Improvement Collaborative of Greater Cincinnati, Kansas City Quality Improvement Consortium, Maine Health Management Coalition Foundation, Health Insight (covering five counties in New Mexico), California Healthcare Performance Information System, Pittsburgh Regional Health Initiative, Minnesota Community Measurement, Wisconsin Health Information Organization, Center for Improving Value in Health Care (covering Colorado), Minnesota Department of Health, Division of Health Policy, Midwest Health Initiative (covering the St. Louis area and 16 counties in Missouri), and the Health Care Cost Institute (covering all 50 states and the District of Columbia).

The QEs are beginning to release public reports using the combined Medicare and other payer data. The first report was published by the Oregon Health Care Quality Corporation, Information for a Healthy Oregon: Statewide Report on Health Care Quality 2014 (www.qcorp.org/reports/statewide-reports). It includes information on Oregon’s chronic disease care, preventive services, and ambulatory and hospital resource use.

More information on CMS’ Qualified Entity Program is available at: www.resdac.org/cms-data/request/qualified-entity-program; www.cms.gov/QEMedicareData; and www.QEMedicareData.org.

Conclusion

In the future, CMS will release more information and data products that will be useful to health services researchers, policymakers, educators, students, and the general public. CMS will continue to collect data on Medicare, Medicaid, and the Children’s Health Insurance Program (CHIP). At the same time, CMS will also expand its data collection efforts to measure its many new initiative programs, which are attempting to improve the quality of patient care, provide a greater emphasis on prevention and population health, and expand healthcare coverage. These initiatives will encourage all of the nation’s healthcare providers to use electronic health records, establish more Accountable Care Organizations (ACOs), increase value-based purchasing, better coordinate care for dual-eligible beneficiaries, and reduce unnecessary hospital readmissions. As CMS moves from being a volume payer of healthcare services to a value-based payer, it will need much more data to identify the best ways to increase the quality of care while at the same time lowering its costs (Burwell 2015; CMS Strategy 2013).
References

Commonwealth Fund Newsletter. 2013. Available at: www.commonwealthfund.org/publications/newsletters/quality-matters/2013/april-may/in-focus
Kane NM, Magnus SA. The Medicare cost report and the limits of hospital accountability: improving financial accounting data. J Health Polit Policy Law. 2001;26(1):81–106.
Lutfiyya MN, Gessert CE, Lipsky MS. Nursing home quality: a comparative analysis using CMS nursing home compare data to examine differences between rural and non-rural facilities. J Am Med Dir Assoc. 2013;14(8):593–8.
National Conference of State Legislatures. Children’s health: trends and options for covering kids. Washington, DC: National Conference of State Legislatures; 2014. Available at: www.ncsl.org/documents/health/coveringkids914.pdf
Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services. The Medicare advantage program in 2014. ASPE Issue Brief. 2014. Available at: http://aspe.hhs.gov
Orentlicher D. Medicaid at 50: no longer limited to the ‘Deserving’ poor? Yale J Health Policy Law Ethics. 2015;15(1):185–95.
Palmsten K, Huybrechts KF, Kowal MK, et al. Validity of maternal and infant outcomes within nationwide Medicaid data. Pharmacoepidemiol Drug Saf. 2014;23(6):646–55.
Petroski J, Ferraro D, Chu A. Ever enrolled Medicare population estimates from the MCBS access to care files. Medicare Medicaid Res Rev. 2014;4(2):E1–16. Available at: www.cms.gov/mmrr/Downloads/MMRR2014_004_02_a05.pdf
Pugh MJV, Marcum ZA, Copeland LA, et al. The quality of quality measures: HEDIS quality measures for medication management in the elderly and outcomes associated with new exposure. Drugs Aging. 2013;30(8):645–54. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3720786/
Rust G, Zhang S, Reynolds J. Inhaled corticosteroid adherence and emergency department utilization among Medicaid-enrolled children with asthma. J Asthma. 2013;50(7):769–75. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC4017346/
Saunders MR, Chin MH. Variation in dialysis quality measures by facility, neighborhood, and region. Med Care. 2013;51(5):413–7. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3651911/
Stein BD, Pangilinan M, Sorbero MJ, et al. Using claims data to generate clinical flags predicting short-term risk of continued psychiatric hospitalizations. Psychiatr Serv. 2014;65(11):1341–6.
U.S. Government Accountability Office. Health care transparency: actions needed to improve cost and quality information for consumers. Washington, DC: U.S. Government Accountability Office; 2014. Available at: www.gao.gov/products/GAO-15-11
Wennberg JE. Tracking medicine: a researcher’s quest to understand health care. New York: Oxford University Press; 2010.
Werner RM, Bradlow ET. Relationship between Medicare’s hospital compare performance measures and mortality rates. JAMA. 2006;296(22):2694–702.
Williams A, Straker JK, Applebaum R. The nursing home five star rating: how does it compare to resident and family views of care? Gerontologist. 2014.
Wright A, Feblowitz J, Samal L, et al. The Medicare electronic health record incentive program: provider performance on core and menu measures. Health Serv Res. 2014;49(1 Pt 2):325–46. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3925405/
5 Health Services Data: Typology of Health Care Data
Ross M. Mullner
Contents
Introduction
Basic Units of Analysis
  Individuals
  Households
  Groups/Populations
  Health Care Organizations
  Health Care Programs
  National Health Care Systems
Collection Methods
  Literature Reviews
  Observations
  Focus Groups
  Surveys
  Medical Records, Administrative, and Billing Sources
  Registries
  Vital Records
Data Sources and Holdings
  Government Organizations
  Private Organizations
Conclusion
References
analysis for the study? These units include individuals, households, groups/populations, health care organizations, health care programs, and national health care systems. Second, how were these data collected? The methods used to collect data include literature reviews, observations, focus groups, surveys, medical records and administrative and billing sources, registries, and vital records. Third, which government agency or private organization collected and is currently holding these data? Government data collection and holding agencies include US health information clearinghouses and libraries, US registries, US government agencies and departments, health programs and systems of other (non-US) nations, and government-sponsored international organizations. Private data collecting and holding organizations include health information clearinghouses and libraries; accreditation, evaluation, and regulatory organizations; associations and professional societies; foundations and trusts; health insurance and employee benefits organizations; registries; research and policy organizations; and survey research organizations. To illustrate each of the questions and classifications, many examples are provided and discussed. Many US and other public use data files are also identified and described.

Introduction

Health services research is a growing field of study that is becoming increasingly important to society. As medicine progresses and increasingly saves lives, becomes more technologically complex, and is ever more costly and demands a greater share of society’s resources, a growing number of people are conducting health services research studies. These researchers include physicians, nurses, epidemiologists, demographers, health economists, medical sociologists, political scientists, public policymakers, hospital administrators, insurance executives, senior business managers, and consultants.

Health services research can be broadly defined as a multidisciplinary field of study that focuses on access, cost, quality, and the outcome of health care (Mullner 2009). Access to health care, which can be defined as encompassing everything that facilitates or impedes the use of health care services, is a basic requirement of any health care facility, program, or system. A number of factors influence an individual’s access to health care including the environment, population characteristics, health behavior, and outcomes. Environmental factors include the health care system (e.g., whether it is acceptable to the individual or not) and external environmental factors (e.g., geographic distance and physical and political barriers). Population characteristics include predisposing characteristics (e.g., age and gender), enabling resources (e.g., income and health insurance), and perceived need (e.g., health beliefs). Health behavior includes personal health practices and previous use of health services. Lastly, outcomes include perceived health status, evaluated health status, and consumer satisfaction with care (Andersen 1995).

Health services researchers studying access to health care investigate various topics such as identifying ethnic and racial disparities in medical care; determining the geographic locations of health professional shortage areas; studying the factors associated with the diffusion and use of new medical technology and facilities; measuring access to hospitals and other health care facilities; and identifying the availability of health insurance coverage and determining its impact on the use of health care services (Agency for Healthcare Research and Quality 2014).

Cost of health care, which can be defined as the amount that has to be paid or spent to buy or obtain health care, can be differentiated and measured in many ways including average cost, fixed cost, incremental cost, marginal cost, total cost, and variable cost, as well as direct and indirect cost, avoided cost, cost of lost productivity, and the societal cost of illness (Culyer 2010; Feldstein 2011). It should be noted that health care cost frequently differs greatly from the price of health care, because the price is often not determined by cost but is instead greatly distorted by what health insurers are willing to pay (Painter and Chernew 2012).

Health services researchers studying the cost of health care investigate a large number of topics
such as conducting international comparisons of health care cost in various nations, determining the cost-benefit and cost-efficiency of medical procedures and drugs, investigating the impact of different methods of financing care, determining the impact of new payment reform models (e.g., pay-for-performance), identifying the impact of health care rationing, estimating the economic value of life, and identifying the economic and societal cost of particular medical conditions and diseases (Health Care Cost Institute 2014).

Quality of health care, which can be defined as getting the right care to the right patient at the right time, every time, is evaluated using three dimensions: structure, process, and outcome. Structure includes the characteristics of the care setting, such as type and size of the health facility, expertise of the medical staff, sophistication of the medical equipment, and the policies related to care delivery. Process consists of the methods of providing patients with consistent, appropriate, effective, safe, coordinated, timely, and patient-centered high-quality care. Outcome evaluates the end result of care the patient received (Clancy 2009; Donabedian 1980).

Health services researchers studying the quality of health care investigate such topics as identifying the impact of accreditation and licensing of health care facilities and professionals; estimating the overuse, underuse, and misuse of health care services; determining the occurrence of preventable medical errors; identifying the frequency of health care-associated infections; studying patient safety problems; and developing and testing new medical quality indicators of care (Agency for Healthcare Research and Quality 2014; National Committee for Quality Assurance 2006).

Lastly, the outcome of health care reflects the interrelated issues of access, cost, and quality of care. Outcome of health care can be broadly defined and includes the occurrence and change in the number and rate of death, disease, disability, discomfort, and dissatisfaction with health care. Death or mortality also includes changes in longevity. Disease or morbidity addresses acute and chronic disease and complications with medical care. Disability deals with the change in physical functional status and psychosocial functioning. Discomfort includes various levels of pain from “no pain” to “worst pain imaginable” and its duration. Dissatisfaction, expressed as the level of satisfaction, measures the specific and overall experience with care (Kane and Radosevich 2011).

Health services researchers studying the outcome of health care tend to investigate such topics as estimating the number of preventable deaths of enrollees in various health programs; determining the factors leading to the increase in longevity; identifying the health services provided to children and adolescents with chronic diseases and disabilities; developing and testing new pain scales; and analyzing and reporting the results of health satisfaction surveys (Halsey 2015; Perrin 2002; Williamson and Hoggart 2008).

Ideally, a health care facility, program, or system should provide the greatest access to health care, at the lowest possible cost, with the greatest level of quality, and achieve the best possible outcome of care. To work toward this very difficult ideal, health services researchers frequently study the equity, efficiency, and effectiveness of health care. Equity can be broadly defined as fairness, efficiency as the ratio of inputs to outputs, and effectiveness as meeting stated objectives and goals, such as the US national health goals contained in Healthy People 2020 (Aday et al. 2004).

The overall aim of health services research is to influence health policy and to improve the practice of medicine and public health. Health services researchers do this by monitoring and evaluating current health care facilities, programs, and systems and by predicting the consequences of proposed future health care policies.

Health services researchers frequently conduct studies using existing data sources. They typically conduct secondary data analysis of large databases that were collected by various government agencies and private organizations. There are many advantages in using existing data: they are readily available, inexpensive, and save time in collection, and they may be used to conduct longitudinal and international comparisons (Huston and Naylor 1996).

Primarily focusing on US data sources, this chapter outlines a practical typology, or classification
80 R. M. Mullner
framework, of health care data that is frequently used by these researchers. The typology
addresses three important, inextricably linked questions. First, what is the basic unit of
analysis of the study? Second, how were these data collected? Third, which government agency or
private organization collected and is currently holding these data?

Basic Units of Analysis

After identifying a particular study area of interest, and a specific topic, a health services
researcher must determine: What will be the basic unit of analysis of my proposed study? Table 1
shows a list of these units; it also presents some relevant questions that may be addressed for
each unit. The basic units of analysis include individuals, households, groups/populations,
health care organizations, health care programs, and national health care systems.

Table 1 Basic units of analysis
Individuals
  Identify the general demographic and social characteristics of individuals
  Determine the overall health status of individuals
  Measure the occurrence of specific diseases and medical conditions
Households
  Identify the demographic and social characteristics of households
  Measure the total household income and education levels
  Determine the household’s overall use of health care services
Groups/populations
  Identify the demographic, economic, and social characteristics of specific ethnic and minority groups
  Determine the overall health status of high-risk and vulnerable populations
  Measure the gaps in health care among various groups
  Identify health professional shortage areas
Organizations
  Identify the total number of health care organizations in a region
  Assess the operating characteristics of hospitals
  Determine the number of long-term care facilities in an area
  Measure the service areas and degree of competition between health care organizations
Health care programs
  Identify the characteristics of Medicare beneficiaries
  Assess the number and type of providers of Medicaid services
  Determine the unwarranted use of services
National health care systems
  Compare the access, costs, quality, and outcomes of various national health care systems
  Determine how each system rations care
  Identify by country the highest and lowest levels of care

Individuals

Many health services researchers conduct their studies focusing on individuals. Information on
individuals may be obtained from many sources such as patient health care records, birth and
death certificates, insurance claim forms, and various national health surveys. Data on them may
include a very large number of potential variables including the person’s age, sex, height,
weight, race, ethnicity, place of birth, language most often spoken, marital status, highest
level of education attained, main occupation, current work status, health insurance coverage,
past medical history, current overall health status, physical activities, degree of mobility,
disability status, individual risk factors (tobacco, alcohol use, and poor nutrition),
environmental risk factors (air pollution, groundwater contamination, and lack of sanitation),
self-care, the level of pain and discomfort experienced, cognition problems, interpersonal
activities, sleep and energy level, inventory of medicines and drugs, health-seeking behaviors,
health screenings, reproductive and sexual health care, maternal health care, child health
preventive care, and health goals. An example of a widely used individual health questionnaire is
the “World Health Survey, 2002,” which was implemented in 70 member states (countries) to gather
data on a sample of 300,000 adults. Data from the surveys were used to strengthen each country’s
capacity to monitor critical health outcomes and systems. A copy of the long- and short-survey
instruments can be found on WHO’s website, www.who.int/healthinfo/survey/en/ (WHO 2002).

Another very important large-scale survey of individuals is the US Centers for Disease Control
5 Health Services Data: Typology of Health Care Data 81
and Prevention’s (CDC) Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS is a
nationwide surveillance system that is conducted to monitor state-level prevalence of the major
behavioral risks (e.g., exercise, alcohol consumption, tobacco use, immunizations, and various
cancer screenings) among adults who have conditions associated with premature morbidity and
mortality. To collect data, the CDC works together with state health departments and conducts
monthly telephone surveys. Currently, more than 500,000 interviews are conducted annually, making
the BRFSS the world’s largest telephone survey. Data from the survey are published in various
reports (Xu et al. 2014), and annual survey data for 1984–2013 can be downloaded. The BRFSS also
offers a statistical tool, the Web Enabled Analysis Tool (WEAT), which lets researchers conduct
cross tabulations and logistic regression analysis, and an interactive mapping program to compare
data across geographic areas. More information on the BRFSS can be obtained at www.cdc.gov/brfss/.

Households

Health services researchers often study the health and health care seeking behavioral
characteristics of households. In the USA, they frequently use data collected by the Centers for
Disease Control and Prevention’s (CDC) National Center for Health Statistics (NCHS), the nation’s
principal government health statistics agency. Many of NCHS’ surveys collect data on the
demographic, socioeconomic, and health characteristics of households (NCHS “Summary” 2014b).

The oldest and arguably the most important National Center for Health Statistics’ household
survey is the National Health Interview Survey (NHIS). The NHIS, which is considered the
principal source of information on the health of the US population, has been used to continuously
monitor the nation’s health since 1957. This large-scale household survey collects data on a
statistically representative sample of the US civilian noninstitutional population. Each year
interviewers visit 35,000–40,000 households across the nation and collect data on about
75,000–100,000 individuals. The survey collects data on a broad range of topics including access
to health care services, health insurance coverage, physical and mental health status, chronic
medical conditions, health-related behaviors, functioning and activity limitations,
immunizations, and injuries and poisonings. Current and past NHIS public use data files,
questionnaires, documentation, and analytic reports are readily available and can be downloaded
for free from NCHS’ website, www.cdc.gov/nchs/nhis.htm (NCHS 2010).

Groups/Populations

When general sample surveys of individuals or households do not adequately yield reliable health
care data on specific groups or populations, supplements may be added to existing surveys or new
surveys may be developed to obtain data on those groups or populations. Some of these groups or
populations may include racial and ethnic minority groups (American Indians and Alaska Natives,
Asians, Native Hawaiian and Pacific Islanders, Blacks, and Hispanic/Latinos), high-risk and
vulnerable populations (infants, children under 5 years of age, pregnant women, and the elderly),
and groups with a specific disease or medical condition (blindness, hearing loss, and severe
disability).

To obtain information on a group or population, the National Health Interview Survey (NHIS) often
adds supplements to its standard survey and expands the number of households sampled. These
supplements are sponsored by various government agencies and nonprofit organizations. In 2014,
for example, the NHIS added 4,000 additional households to its survey to obtain more data on the
health of Native Hawaiian and Pacific Islanders (NCHS 2014a).

Health Care Organizations

Health services researchers frequently study health care organizations. They study many types of
organizations such as medical group practices, outpatient surgery centers, home care
organizations, Health Maintenance Organizations (HMOs), and Accountable Care Organizations
(ACOs). But they particularly study hospitals and nursing homes.

The hospital is arguably the single most important institution for the delivery of modern health
care, while the nursing home is the major institution caring for the elderly. The most widely
used data source on US hospitals is the American Hospital Association’s (AHA) annual survey of
hospitals. And the most important source of nursing home data is the US Department of Health and
Human Services, Centers for Medicare and Medicaid Services’ (CMS) Nursing Home Compare data
program.

The American Hospital Association (AHA) conducts an annual survey of the nation’s approximately
6,000 hospitals, which account for 920,000 staffed beds and 36 million admissions. The survey,
which is the most comprehensive and authoritative source on US hospitals, collects almost 900
variables on each hospital. These data include the hospital’s address, bed size, ownership
(for-profit, not-for-profit, government), type of hospital (community, psychiatric, long-term
care, federal, and units of institutions), membership in a multihospital system or network,
teaching status, type of facilities and services offered, physician arrangements, information
technology, total number of inpatients and outpatient visits, Medicare/Medicaid utilization,
revenues and expenses, and number of hospital staff. Data from the survey are published in the
annual AHA Guide to the Health Care Field and AHA Hospital Statistics, and the proprietary data
can be purchased on CD (AHA Data Viewer 2015; AHA 2013).

The Centers for Medicare and Medicaid Services (CMS), which administers the nation’s Medicare
program and works in partnership with state governments to administer Medicaid programs,
continuously gathers data on the country’s nearly 16,000 certified nursing homes. These nursing
homes provide services to over 1.4 million residents, corresponding to nearly 3% of the nation’s
over-65 population and 10% of the over-85 population. Because CMS pays for nursing home services
provided to Medicare beneficiaries and Medicaid recipients, it continuously monitors and updates
its files on them. CMS collects data on the address of each nursing home; the facilities’ bed
size, ownership type, and certification; number of nursing home residents; demographic and
medical characteristics of the residents, including cognitive and functional impairments; and the
number, type, and level of deficiencies these facilities experienced. The deficiencies include
citations for substandard quality of care; abuse; improper restraint use; pressure sores; actual
harm or worse; and the immediate jeopardy threat to the health or life of one or more nursing
home residents. Data on individual nursing homes can be obtained at CMS’s Medicare.gov Nursing
Home Compare website, www.medicare.gov/nursinghomecompare; the entire database can be downloaded;
and a summary nursing home data compendium, which is also available on the website, is published
annually (CMS 2014).

Health Care Programs

Many health services researchers study large national health care programs. One of the most
widely studied is the US Medicare program. This national program, administered by the federal
government, provides health insurance for over 50 million people, including those 65 years of age
or older, those with certain disabilities, and people of any age with End-Stage Renal Disease
(ESRD) (permanent kidney failure requiring dialysis or a kidney transplant).

The Medicare program consists of four different parts: Part A (hospital insurance covering
inpatient care, nursing home, hospice, and home health care), Part B (medical insurance covering
physician services, outpatient and home health care, and durable medical equipment), Part C
(Medicare Advantage, a managed care program covering Parts A and B), and Part D (covering
prescription drugs).

The program collects data on its various parts including claims for services provided to each
beneficiary admitted to a certified hospital and nursing home. It codes the beneficiaries’ address,
where they received care, their medical diagnoses, admission date, what services were provided,
discharge date, discharge status, cost of each service, and the total cost of care. If the
beneficiary dies after receiving care, the death is coded up to 3 years after discharge. One
widely used CMS database is the Medicare Provider Analysis and Review (MEDPAR) file, which can be
obtained from CMS’ website,
www.cms.gov/Research-Statistics-Data-and-Systems/IdentifiableDataFiles/MedicareProviderAnalysisandReviewFile.html
(CMS 2014).

An exemplar of the innovative use of the MEDPAR database is the research conducted by the
Dartmouth Atlas of Health Care Project. Health services researchers working on the project, which
is housed at Dartmouth College’s Institute for Health Policy and Clinical Practice, have studied
a wide range of medical practice patterns at the national, regional, state, and local levels. For
more than 20 years, these researchers have found and documented glaring unwarranted variations in
surgeries, diagnostic testing, imaging exams, physician visits, referrals to specialists,
hospitalizations, and stays in intensive care units. They have consistently found that more
health care is not necessarily better care (Dartmouth Atlas of Health Care 2015).

Using Medicare data, the Dartmouth researchers have identified three broad categories of medical
care: effective or necessary care, preference-sensitive care, and supply-sensitive care.
Effective or necessary care includes services that are based on sound medical evidence, which
work better than any alternative treatment (e.g., surgery for hip fractures and colon cancer).
They estimate that this category of care accounts for no more than 15% of total Medicare
spending. Preference-sensitive or elective care includes interventions for which there are
several options and where the outcomes vary depending on the option used (e.g., elective
surgeries, mammography screening tests, and prostate-specific antigen tests). This accounts for
about 25% of Medicare spending. Lastly, supply-sensitive care includes everyday medical care used
to treat patients with acute and chronic diseases (e.g., physician visits, imaging exams, and
admissions to hospitals). This accounts for about 60% of all Medicare spending.

To remedy the unwarranted variations in preference-sensitive care, the Dartmouth researchers
argue for the greater use of evidence-based medicine to identify the best option, and they call
for a fundamental reform of the physician-patient relationship, with greater shared
decision-making and informed patient choice. To remedy the variations in supply-sensitive care,
they argue that the common physician assumption that “more care is better” needs to change and
there must be a new emphasis on improving the science of health care delivery (Wennberg 2010).

Published reports of the Dartmouth Atlas of Health Care Project, as well as the data used in many
of its studies, can be downloaded from the project’s website, www.dartmouthatlas.org.

National Health Care Systems

Lastly, some health services researchers conduct cross-national studies of health care systems,
such as comparing the US health system to that of Canada, the United Kingdom, and other
industrialized nations. It is hoped that these multinational comparisons may help health
policymakers learn from the experiences of other nations, lead to new insights and perspectives,
help in evaluating existing policies, and identify possible new solutions to shared problems.

Three important sources of data on national health care systems are the World Health Organization
(WHO), Organisation for Economic Co-operation and Development (OECD), and the Commonwealth Fund.

The World Health Organization (WHO), which is the directing and coordinating authority for health
within the United Nations (UN), collects health-related data on its 194 member states (nations).
These data on the states are published in its series World Health Statistics. Issued annually
since 2005, World Health Statistics is the definitive source of information on the health of the
world’s people. The series is compiled using publications and databases produced and
maintained by the WHO’s technical programs and regional offices, and from various databases of
the UN and World Bank. Data in the publication provide a comprehensive summary of the current
health status and health system of each member state. These data include nine areas: life
expectancy and mortality, cause-specific morbidity and mortality, selected infectious diseases,
health service coverage, risk factors, health systems, health expenditures, health inequities,
and demographic and socioeconomic statistics. WHO’s data in published form are available on its
website, www.who.int (WHO 2014).

The Organisation for Economic Co-operation and Development (OECD) is an international membership
organization representing 34 industrialized nations that are committed to democracy and a free
market economy. The OECD, working with its member nations, produces data and reports on a wide
variety of economic and social topics, including health care. Each year it releases data
comparing the health care systems of its member nations, including: health care spending (average
spending per capita, spending as a percentage of GDP, spending per hospital discharge, and
pharmaceutical spending per capita); supply and utilization (number of practicing physicians per
population, average number of physician visits per capita, Magnetic Resonance Imaging (MRI)
machines per population, hospital discharges per population, and hip replacement inpatient cases
per population); health promotion and disease prevention efforts (cervical cancer screening
rates, flu immunization among adults 65 or older, and adults who report being daily smokers);
quality and patient safety (mortality amenable to health care, breast cancer 5-year survival
rate, and diabetes lower extremity amputation rates); prices (total hospital and physician prices
for appendectomy and bypass surgery, and diagnostic imaging prices); and long-term care and
social supports (percent of population age 65 or older, beds in residential long-term care
facilities per population age 65 or older, and health and social care spending as a percentage of
GDP). OECD data and reports, which are frequently used by health services researchers (Anderson
2014; Anderson and Squires 2010), can be downloaded from the OECD website, www.oecd.org/statistics/
(OECD 2013).

The Commonwealth Fund, a private, nonpartisan foundation headquartered in New York City that
supports independent research on health care issues to improve health care practice and policy,
conducts annual cross-national studies. Starting in 1998, its International Health Policy Center
has conducted multinational surveys of patients and their physicians to identify their
experiences with their health care systems. The surveys focus on various aspects of access,
costs, and quality of health care.

One of the center’s recent surveys was the “2014 Commonwealth Fund International Health Policy
Survey of Older Adults,” a telephone interview survey of more than 15,000 people age 65 or older
in 11 industrialized countries (Australia, Canada, France, Germany, the Netherlands, New Zealand,
Norway, Sweden, Switzerland, the United Kingdom, and the United States). The survey’s major
finding was that older adults in the US were sicker and more likely to have problems paying their
medical bills and getting needed health care than those in the other 10 countries (Osborn et al.
2014).

The center has also conducted five surveys to monitor changes in multinational health care system
performance, and the results have been published in a series of reports entitled Mirror, Mirror
on the Wall (2004, 2006, 2007, 2010, 2014). Over the years, these reports have consistently found
that among industrialized nations the US health care system has been the most expensive, but
underperforms relative to other nations on most dimensions of access, efficiency, and equity
(Davis et al. 2014).

Collection Methods

The second question of this typology of health care data is: How were these data collected? This
question is important, because the way the data were collected may limit the type of statistical
methodology that can be used to analyze them, and it may greatly affect the reliability and
validity of the results of the study. Each data collection method has advantages and
disadvantages, and the researcher should be well aware of
them. Table 2 shows the various data collection methods, and it also lists some relevant questions
that may be addressed by each method. The methods include literature reviews; observations; focus
groups; surveys; medical records, administrative, and billing sources; registries; and vital
records.
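
As a rough illustration only, the typology's three questions (unit of analysis, collection method, and the organization holding the data) can be treated as fields of a small classification record. The sketch below is an assumption of this edit rather than anything the chapter prescribes: the class names, field names, and the `find_sources` helper are hypothetical, while the category labels and the three example sources are taken from the chapter's own descriptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class UnitOfAnalysis(Enum):
    """Basic units of analysis, as listed in Table 1."""
    INDIVIDUALS = "individuals"
    HOUSEHOLDS = "households"
    GROUPS_POPULATIONS = "groups/populations"
    ORGANIZATIONS = "health care organizations"
    PROGRAMS = "health care programs"
    NATIONAL_SYSTEMS = "national health care systems"


class CollectionMethod(Enum):
    """Data collection methods, as listed in Table 2."""
    LITERATURE_REVIEWS = "literature reviews"
    OBSERVATIONS = "observations"
    FOCUS_GROUPS = "focus groups"
    SURVEYS = "surveys"
    RECORDS_ADMIN_BILLING = "medical records, administrative, and billing sources"
    REGISTRIES = "registries"
    VITAL_RECORDS = "vital records"


@dataclass(frozen=True)
class DataSource:
    """One data source classified along the three typology questions."""
    name: str
    unit: UnitOfAnalysis        # question 1: basic unit of analysis
    method: CollectionMethod    # question 2: how the data were collected
    holder: str                 # question 3: who collected and holds the data


# Example entries based on sources described in the chapter.
SOURCES = [
    DataSource("Behavioral Risk Factor Surveillance System (BRFSS)",
               UnitOfAnalysis.INDIVIDUALS, CollectionMethod.SURVEYS, "CDC"),
    DataSource("National Health Interview Survey (NHIS)",
               UnitOfAnalysis.HOUSEHOLDS, CollectionMethod.SURVEYS, "NCHS"),
    DataSource("AHA annual survey of hospitals",
               UnitOfAnalysis.ORGANIZATIONS, CollectionMethod.SURVEYS, "AHA"),
]


def find_sources(sources: Iterable[DataSource],
                 unit: Optional[UnitOfAnalysis] = None,
                 method: Optional[CollectionMethod] = None) -> List[DataSource]:
    """Return the sources matching the given unit and/or method (None means 'any')."""
    return [
        s for s in sources
        if (unit is None or s.unit is unit)
        and (method is None or s.method is method)
    ]


if __name__ == "__main__":
    for source in find_sources(SOURCES, unit=UnitOfAnalysis.HOUSEHOLDS):
        print(f"{source.name}: {source.method.value}, held by {source.holder}")
```

Keeping the three questions as separate fields mirrors the point that they are linked but distinct: the same unit of analysis can be reached through different collection methods, and the holder determines where the data must be requested.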

Table 2 Data collection methods
Literature reviews
  Identify what is known about a particular health care topic
  Determine the gaps in knowledge on the topic
  Conduct a meta-analysis to assess the clinical effectiveness of a health care intervention
  Answer a research question
Observations
  Observe patients taking their treatments
  Measure the degree of hand hygiene adherence at a health care organization
  Conduct a clinical observation, or shadowing, to determine how health care professionals actually provide patient care
Focus groups
  Determine the perceptions, opinions, beliefs, and attitudes towards a health program
  Identify specific problems with a health facility
  Present options to a group and see which ones are viewed favorably
Surveys
  Determine the past medical history of individuals
  Identify the experiences of patients in receiving care
  Measure the workload of physicians and other health care professionals
Medical records, administrative, and billing sources
  Identify and implement best practices of care
  Determine regional variations in the provision of health care
  Measure the average costs of various health care services
Registries
  Identify the occurrence of a disease within a population
  Assess the natural history of a disease, its management, and its outcomes
  Support health economic research
  Collect postmarketing safety data on medical products and pharmaceuticals
Vital records
  Determine trends in fetal and perinatal mortality
  Identify the relationship between infant birth weight and health care problems
  Determine trends in low-risk Cesarean delivery
  Identify trends in drug-poisoning deaths involving opioid analgesics and heroin

Literature Reviews

One of the easiest, fastest, and most economical ways to obtain data and information on a
research topic or a specific research question is to conduct a literature review. A comprehensive
literature review can help identify what is known and not known about a topic or question; what
data sources are available; what variables were found to be important; what statistical methods
were employed; what populations were studied; what sample sizes were used; and what are the gaps
or possible errors in the studies.

A major resource in conducting literature reviews is the US National Library of Medicine’s (NLM)
PubMed search engine. PubMed accesses MEDLINE and other databases of citations and abstracts in
the fields of medicine, nursing, public health, and health care systems. Currently, PubMed
contains more than 24 million citations from over 5,600 worldwide journals and thousands of books
and reports. PubMed is easy to use; it can be searched by entering Medical Subject Headings
(MeSH), the NLM’s controlled vocabulary; author names; title words or phrases; journal names; or
any combination of these. It also links to many full-text articles and reports. The PubMed
website is www.ncbi.nlm.nih.gov/pubmed.

Another important source for conducting literature reviews is the Cochrane Collaboration.
Consisting of a network of 14 centers around the world, the Cochrane Collaboration is a nonprofit
international organization that promotes and disseminates systematic reviews of health care
interventions, particularly clinical trials. Collaborators from over 120 countries conduct these
systematic reviews. The Cochrane Library contains a number of useful databases including Cochrane
Database of Systematic Reviews (CDSR); Cochrane Central Register of Controlled Trials (CENTRAL);
Database of Abstracts of Reviews of Effectiveness (DARE); Cochrane Methodology Register; Health
Technology Assessment Database (HTA); and the
National Health Service Economic Evaluation Database (NHS EED). The Cochrane Collaboration’s
website is www.cochrane.org.

Many of the Cochrane Collaboration’s systematic reviews include a meta-analysis of studies.
Meta-analysis is a statistical technique that combines the findings from multiple research
studies to develop a single conclusion that has greater statistical power. By pulling together a
number of independent studies, researchers can make a more accurate estimate of the size of an
effect (Borenstein et al. 2009; Higgins and Green 2008).

Observations

Health services researchers sometimes conduct observational studies to obtain data. In these
types of studies, individuals are observed or certain outcomes are measured, but no attempt is
made to affect the outcome. They do not involve an experiment or intervention. Observational
studies may be either cross-sectional or longitudinal. Cross-sectional studies are short, quick
snapshot studies, and they do not provide definitive information about a cause-and-effect
relationship. However, longitudinal studies that are conducted over long periods of time with
many observations can determine changes in individuals and populations. They can establish the
sequence of events and suggest a cause-and-effect relationship.

Observational studies can vary greatly in size, scope, and complexity. Some observational studies
are very small, inexpensive, quickly conducted, cross-sectional studies. An example of such a
study would be a researcher investigating the waiting times of patients at a health care clinic.
He or she might conduct the study by unobtrusively sitting in the waiting room for a few days
observing and coding the demographic characteristics of each patient and the number of minutes
they waited to be seen.

In contrast, other observational studies are very large, expensive, lengthy, longitudinal
studies. One of the most famous longitudinal observational studies in modern medicine is the
Framingham Heart Study. Begun in 1948 and continuing to the present, this study has followed
large cohorts of individuals from Framingham, Massachusetts, to determine their risk of
developing cardiovascular disease. Today, much of what is now common knowledge concerning the
major risk factors of developing heart disease (hypertension, high “bad” cholesterol, diabetes,
smoking, obesity, and a sedentary lifestyle) is based on the Framingham Study (Levy and Brink
2005).

Focus Groups

Occasionally, health services researchers conduct focus groups to obtain data. Focus groups
generally consist of five to ten participants who are asked their opinions about a topic in a
group interview. Although the interviews are informal, open-ended, and relatively broad, a
moderator asks the group a series of questions to help direct the discussion. Focus groups may be
used to explore new research areas, topics that are difficult to observe, and very sensitive
topics. They may also be used to gather preliminary data, aid in survey development and more
formal structured interviews, and clarify complex research findings. As the focus group session
is occurring, it is audio- and/or video-recorded. These recordings are then transcribed,
reviewed, and studied.

Focus groups have advantages as well as disadvantages. They may generate new ideas and allow
clarification of issues, and the group members may stimulate each other. However, members and the
moderator can bias responses; some members may dominate the group; and the results of the focus
group may be difficult to analyze or quantify (Krueger and Casey 2009).

Recently, the Robert Wood Johnson Foundation conducted a series of focus groups to gather
information on what consumers think about the rising cost of health care in the USA. The
foundation convened eight focus groups in four cities: Philadelphia; Charlotte, North Carolina;
Chicago; and Denver. The participants included individuals with employer-sponsored insurance,
those who purchased their insurance on the private market, those enrolled in Medicare, and those
without any health insurance coverage. The major findings of
the focus groups were that the participants were Some health surveys are conducted by inter-
very aware of their actual health care costs; they views, which may be completed over the tele-
were aware of the rising costs of care, but did not phone or face-to-face. Telephone interviews are
understand why it was happening; the rising costs more frequently used because of their versatility,
were affecting their daily lives and purchases; and data quality, and cost and time efficiency. In con-
they were increasingly angry about the increasing trast, face-to-face interviews are generally consid-
costs, but felt helpless in reversing the trends ered to provide the very best data quality, but they
(Robert Wood Johnson Foundation 2013). are the most expensive and time-consuming sur-
veys to complete (Aday and Cornelius 2006;
Johnson 2014).
Surveys To collect longitudinal data to measure
changes over time, health services researchers
Some health services researchers rely heavily on periodically send surveys to a panel of individuals
surveys to gather data for their studies. They or organizational respondents. An example of
occasionally conduct their own health care sur- such a survey is the US Agency for Healthcare
veys, but more often use data from surveys Research and Quality’s (AHRQ) Medical Expen-
conducted by others. Using these data, they con- diture Panel Survey (MEPS). Begun in 1996,
duct health needs assessments, develop health MEPS is a set of surveys of individuals and house-
profiles of groups/populations, monitor the health holds, their medical providers, and employers
of cohorts and populations, and collect pre- and across the nation. MEPS collects data to estimate
posttest health care measures. the frequency and use of specific health services,
Health care surveys are a very effective and the cost and payment for these services, and the
efficient method of estimating the characteristics health insurance coverage held by and available to
of large groups/populations using representative US workers.
samples. Most health surveys are conducted with Specifically, MEPS consists of three compo-
a large number of participants who are randomly nents: household, insurance, and other. The
selected to reduce the risk of selection bias. The household component collects panel data from a
surveys collect data in a structured, standardized sample of families and individuals using several
manner from each respondent. Lastly, these data rounds of interviewing conducted over 2 years.
are typically summarized as counts or persons or Data from the interviews make it possible for
events. researchers to identify how the changes in the
Health survey data are collected using two respondent’s health status, income, employment,
broad strategies, and the respondents are asked health insurance, use of services, and payment of
to reply to questions presented in questionnaires care are related. The insurance component gathers
or read aloud by interviewers. These two strate- data by surveying employers about the health
gies may be employed individually or in insurance coverage they offer their workers. The
combination. other component collects data on the hospitals,
The most widely used type of survey is the physicians, home health care providers, and phar-
self-administered mailed survey, whereby a macies that provided care to respondents. It is
questionnaire and an introductory cover letter used to supplement and/or replace information
are sent via standard mail to a sample of persons. received from the respondents.
The respondents are asked to complete the ques- Data obtain from MEPS are published in var-
tionnaire and return it to the researcher using a ious statistical briefs, which can be downloaded.
preaddressed return envelope enclosed with the Recent briefs have reported on the access to
questionnaire. With the increasing use of home health care by adult men and women, ages
computers, self-administered surveys are also 18–64 (Davis 2014); the number and character-
increasingly being sent to respondents via istics of the long-term uninsured (Rhoades and
e-mail and the Internet. Cohen 2014); and national health care expenses
by type of service and source of payment (Stagnitti and Carper 2014). MEPS household component public use data files and insurance component summary data tables are released on AHRQ’s MEPS website on a regular annual schedule, http://meps.ahrq.gov/mepsweb/about_meps/releaseschedule.jsp.

Medical Records, Administrative, and Billing Sources

Medical records, administrative, and billing sources are a rich source of health care data. The most widely used and easily accessible source of this type of data is the Medicare claims files. These data files have been widely used by health services researchers to identify: the factors that influence hospitalization; the geographic variations in the type of care patients receive, such as the previously discussed Dartmouth Atlas of Health Care Project; the cost-effectiveness of various clinical procedures; and the effect of health reform efforts such as the Affordable Care Act (ACA) on Medicare utilization rates.

CMS has numerous data files available to researchers. However, because of privacy concerns, some of the files are more restricted than others. CMS classifies its files into three categories: Research Identifiable Files (RIF), which are the most restricted files because they contain patient and condition identifiable data; Limited Data Sets (LDS), which are less restricted files because their patient-specific data are ranged or encrypted; and Public Use Files (PUF)/Non-identifiable Files, which are the least restricted files of all, are readily available, and can be easily downloaded.

CMS has released a number of public use data files. These “Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs)” mainly consist of 5 % random samples of all Medicare beneficiaries from a reference year. Examples of these data files include: Hospital Inpatient Claims file, containing the variables: age, gender, base DRG, ICD-9 procedure code, length of stay, and the amount paid; Durable Medical Equipment (DME) Line Items file, containing a list of equipment provided such as oxygen equipment, hospital beds, and wheelchairs; Prescription Drug Events file, containing the variables: age, gender, drug name, dose, cost, and payment by patient; Hospice Beneficiary file, containing the variables: age, gender, and length of stay; Carrier Line Items file, containing physician/supplier medical claims data, dates of service, and reimbursement amounts; Home Health Agency (HHA) Beneficiary file, containing demographic and claim-related variables; Outpatient Procedures file, containing demographic variables and procedures provided; Skilled Nursing Facility (SNF) Beneficiary file, containing demographic and nursing home claims; Chronic Conditions file, containing age, gender, various chronic conditions, and dual-eligibility status; Institutional Provider and Beneficiary Summary file, containing data on Medicare institutional claims paid during the calendar year and a summary of other measures; Prescription Drug Profiles file, containing demographic variables, plan, drug, and prescriber characteristics, and payment data; and the Geographic Variation Public Use file, containing demographic, spending, utilization, and quality indicators at the state, hospital referral region, and county level.

Further information about the data files can be obtained from the CMS-funded Research Data Assistance Center (ResDAC), which is located at the University of Minnesota, Minneapolis. Its website is www.resdac.org.

Registries

Health services researchers occasionally use data from registries to conduct their studies. Registries are tools that systematically collect a defined set of exposures, health conditions, and demographic data about individuals, with the data held in a central database for a specific purpose. They are used for a multitude of purposes including monitoring treatment benefits and risks, understanding the natural history of diseases, identifying unmet medical needs, and determining the quality of care. Registries can vary greatly in size, scope, and duration. Some registries collect data at a
single clinic for a few weeks, while others are international in scope and collect data for many decades. Registries may be sponsored by government agencies, nonprofit organizations, health care facilities, and/or private for-profit companies (Arts et al. 2002).

It is difficult to classify the various types of registries because of their great diversity and scope. Also, they may collect overlapping sets of data. However, they can be very roughly divided into product registries, disease or condition registries, and health services registries.

Product registries gather data on individuals who received a specific drug or medical device. To ensure safety, these registries have been established to monitor individuals who received such drugs as thalidomide, and those who were given medical devices such as implantable cardioverter defibrillators. Registries have also been established to monitor possible drug exposures during pregnancy and the neonatal consequences.

Disease or condition registries gather data on individuals with specific disorders. These registries may identify the natural history of a disease, evaluate possible treatments, and stimulate new research on the cause and outcome of the disorder. Diseases included in these registries can vary from rare diseases such as cystic fibrosis, to relatively common chronic diseases such as heart failure.

Health services registries tend to gather data on individual clinical encounters such as physician office visits, hospitalizations, clinical procedures, and total episodes of care. Some registries include all patients undergoing a procedure such as an appendectomy or those admitted to a hospital for a particular diagnosis such as community-acquired pneumonia. Many of these registries are used to evaluate the outcome of care and the associated quality of health care services (Gliklich and Dreyer 2010).

An example of a unique health services registry is the Health Resources and Services Administration’s (HRSA) National Practitioner Data Bank (NPDB). The NPDB is a critical tool in US efforts to protect patients from incompetent, unprofessional, and often dangerous health care practitioners. Since 1986, the NPDB has collected reports on medical malpractice payments, medical review actions, and sanctions by Boards of Medical Examiners. It collects information from medical malpractice payments and adverse licensure actions, Drug Enforcement Administration (DEA) reports, and Medicare and Medicaid exclusion actions concerning physicians, dentists, and other licensed health care practitioners. The NPDB provides this information to health care providers, hospitals, and state and federal agencies to use when making important hiring or licensing decisions. This helps protect the public by preventing physicians and other practitioners from hiding their past when they move to a new state (Wakefield 2011).

The NPDB public use data file, which does not include any information that identifies individuals or reporting entities, is available for statistical analysis at www.npdb.hrsa.gov/resources/publicData.jsp.

Vital Records

Vital records include birth certificates, marriage licenses and divorce decrees, and death certificates. In the USA, counties and state governments collect, manage, and disseminate vital records, not the federal government. Health services researchers frequently use data from birth and death certificates in their studies. They use these data to track health trends to determine changing public health and research priorities, identify racial and ethnic disparities, measure the impact of various diseases, ascertain the use of health care services, and address quality of care issues (Children’s Health Care Quality measures Core Set Technical Assistance and Analytic Support Program 2014; National Research Council 2009).

The US Standard Certificate of Live Birth contains a wealth of information on the newborn, as well as the mother and father. Data on the newborn include name, sex, time and place of birth, birth weight, Apgar scores, abnormal conditions, and congenital anomalies of the newborn. Data on the mother include name; address; education level; whether of Hispanic origin or not; race; date of first and last prenatal care visit; total number of prenatal visits; number of other pregnancy
outcomes; the degree of cigarette smoking before and during pregnancy; whether the mother was transferred for maternal medical or fetal indications for delivery; principal source of payment for the delivery; risk factors in the pregnancy such as diabetes, hypertension, and previous preterm birth; obstetric procedures used; onset of labor; characteristics of the labor and delivery; method of delivery; and maternal morbidity. Data on the father include: name; age; education level; whether of Hispanic origin or not; and race.

The US Standard Certificate of Death records the decedent’s: legal name; age; sex; social security number; birthplace; residence; marital status at the time of death; place of death; place of disposition; date of death; cause of death including the immediate and underlying cause; manner of death; if an injury led to death, the date and time of injury; and the location of injury.

There is also a separate certificate for fetal deaths. The US Standard Report of Fetal Death collects data on: the name of the fetus; sex; date and place where delivery occurred; initiating cause/condition; other significant causes or conditions; risk factors in the pregnancy; infections present and/or treated during the pregnancy; method of delivery; maternal morbidity; and congenital anomalies of the fetus.

Although birth, death, and fetal death certificates are confidential legal records, they can be obtained for research purposes from state public health departments. Summary data on births, deaths, fetal deaths, and linked birth/infant deaths can also be obtained from the National Center for Health Statistics (NCHS). Its data release and access policy for microdata and compressed vital statistics files can be found at www.cdc.gov/nchs/nvss/dvs_data_release.htm.

Data Sources and Holdings

The third question of this typology is: Which government agency or private organization collected and is currently holding these data? A large number of government and private organizations collect and disseminate health care data. Many private organizations, sometimes with government support through contracts and grants, also collect health care data for research purposes, to monitor health policies, and to identify their members’ views and opinions on various issues. Table 3 shows the classification of health care data collection organizations and holding sources, including a list of various representative organizations and their websites.

Government Organizations

Federal, state, and local governments collect data on the health care programs they conduct and manage. These data are often readily available to researchers at little or no cost. From the perspective of health services research, government data collection and holding agencies can be broadly classified into the following categories: US health information clearinghouses and libraries; US registries; US government agencies and departments; health programs and systems of other (non-US) nations; and government sponsored international organizations.

US Health Information Clearinghouses and Libraries

The federal government maintains many clearinghouses and libraries that are valuable resources for health services research. For example, the National Institutes of Health’s (NIH) National Library of Medicine (NLM) is the world’s largest biomedical library. The NLM maintains and makes available its vast print collection and produces and continuously updates its electronic information resources such as PubMed/MEDLINE. PubMed comprises more than 24 million citations from MEDLINE. The NLM also contains the National Information Center on Health Services Research and Health Care Technology (NICHSR). This center maintains databases and provides outreach and training, and information and publications on health services research. Its website is www.nlm.nih.gov/nichsr/.

The US government’s principal health statistical agency is the US Centers for Disease Control and Prevention’s (CDC) National Center for Health Statistics (NCHS). Since 1960, the NCHS has conducted numerous national health
Table 3 (continued)
US Senate, www.senate.gov
US Social Security Administration (SSA), www.ssa.gov
White House, www.whitehouse.gov
Health Programs and Systems of Other (non-U.S.) Nations
Australian Commission on Safety and Quality in Health Care, www.safetyandquality.gov.au
Australian Government Department of Human Services, www.humanservices.gov.au
Canadian Agency for Drugs and Technologies in Health, www.cadth.ca
Canadian Institute for Health Information, www.cihi.ca
Canadian Institutes of Health Research, www.cihr-irsc.gc.ca
Health Canada, www.hc-sc.gc.ca
United Kingdom’s National Health Service (NHS), www.nhs.uk
United Kingdom’s National Institute for Health and Care Excellence (NICE), www.nice.org.uk
Government Sponsored International Organizations
European Commission, www.ec.europa.eu
European Observatory on Health Systems and Policies, www.euro.who.int/en/about-us/partners/observatory/about-us
Organisation for Economic Co-operation and Development (OECD), www.oecd.org
Pan American Health Organization (PAHO), www.paho.org
United Nations Children’s Fund (UNICEF), www.unicef.org
United Nations (UN), www.un.org
World Bank, www.worldbank.org
World Health Organization (WHO), www.who.int
Private Organizations
Health information clearinghouses and libraries
Centre for Evidence-Based Medicine (CEBM), www.cebm.net
Cochrane Collaboration, www.cochrane.org
Cornell Disability Research Group, www.disabilitystatistics.org
Dartmouth Atlas of Health Care Project, www.dartmouthatlas.org
Data Resource Center for Child and Adolescent Health, www.childhealthdata.org
Health Care Cost Institute (HCCI), www.healthcostinstitute.org
Health Data Consortium, www.healthdataconsortium.org
IMS Health, www.imshealth.com
Inter-university Consortium for Political and Social Research (ICPSR), www.icpsr.umich.edu
National Association of Health Data Organizations (NAHDO), www.nahdo.org
National Implementation Research Network (NIRN), www.preventionaction.org
National Rehabilitation Information Center (NARIC), www.naric.com
National Rural Health Resource Center, www.ruralcenter.org
Accreditation, evaluation, and regulatory organizations
Accreditation Association for Ambulatory Health Care (AAAHC), www.aaahc.org
Accreditation Canada, www.accreditation.ca
Accreditation Commission for Health Care (ACHC), www.achc.org
Association of American Medical Colleges (AAMC), www.aamc.org
Board of Certification/Accreditation (BOC), www.bocusa.org
Center for Improvement in Healthcare Quality (CIHQ), www.cihq.org
Community Health Accreditation Partner (CHAP), www.chapinc.org
Det Norske Veritas (DNV) Healthcare, www.dnvglhealthcare.com
Health Grades, www.healthgrades.com
Healthcare Facilities Accreditation Program (HFAP), www.hfap.org
Healthcare Quality Association on Accreditation (HQAA), www.hqaa.org
Intersocietal Accreditation Commission (IAC), www.intersocietal.org
Joint Commission, www.jointcommission.org
Leapfrog Group, www.leapfroggroup.org
Medical Travel Quality Alliance (MTQUA), www.mtqua.org
National Business Group on Health (NBGH), www.businessgrouphealth.org
National Committee for Quality Assurance (NCQA), www.ncqa.org
National Quality Forum (NQF), www.qualityforum.org
URAC, www.urac.org
Associations and professional societies
Disease/condition associations
ALS Association, www.alsa.org
American Association for Cancer Research (AACR), www.aacr.org
American Cancer Society (ACS), www.cancer.org
American Chronic Pain Association, www.theacpa.org
American Diabetes Association (ADA), www.diabetes.org
American Heart Association (AHA), www.heart.org
American Stroke Association, www.strokeassociation.org
American Trauma Society (ATS), www.amtrauma.org
Canadian Mental Health Association (CMHA), www.cmha.ca
CORD (Canadian Organization for Rare Disorders), www.raredisorders.ca
EURORDIS (European Organisation for Rare Diseases), www.eurordis.org
Mental Health America, www.mentalhealthamerica.net
National Alliance on Mental Illness (NAMI), www.nami.org
National Health Council, www.nationalhealthcouncil.org
NORD (National Organization for Rare Disorders), www.rarediseases.org
Unite for Sight, www.uniteforsight.org
Demographic and population group associations
AAPD (American Association of People with Disabilities), www.aapd.com
AARP, www.aarp.org
American Correctional Health Services Association (ACHSA), www.achsa.org
National Alliance for Hispanic Health, www.hispanichealth.org
National Association of Counties (NACo), www.naco.org
National Coalition for the Homeless, www.nationalhomeless.org
National Medical Association (NMA), www.nmanet.org
National Rural Health Association (NRHA), www.ruralhealthweb.org
NCAI (National Congress of American Indians), www.ncai.org
Population Association of America, www.populationassociation.org
Health care organizations and trade associations
AAMI (Association for the Advancement of Medical Instrumentation), www.aami.org
Advanced Medical Technology Association (AdvaMed), www.advamed.org
Ambulatory Surgery Center Association, www.ascassociation.org
American Association of Accountable Care Organizations (AAACO), www.aaaco.org
American Association of Blood Banks (AABB), www.aabb.org
American Association of Eye and Ear Centers of Excellence (AAEECE), www.aaeece.org
American Association of Homes and Services for the Aging (AAHSA), www.aahsa.org
American Association of Preferred Provider Organizations (AAPPO), www.aappo.org
American Health Care Association (AHCA), www.ahca.org
American Health Information Management Association (AHIMA), www.ahima.org
American Hospital Association (AHA), www.aha.org
Association for Behavioral Health and Wellness (ABHW), www.abhw.org
Association of the British Pharmaceutical Industry (ABPI), www.abpi.org.uk
Association of Clinical Research Organizations (ACRO), www.acrohealth.org
Catholic Health Association of the United States (CHAUSA), www.chausa.org
Children’s Hospital Association, www.childrenshospitals.net
Federation of American Hospitals (FAH), www.fah.org
HealthCareCAN, www.healthcarecan.ca
HOPE: European Hospital and Healthcare Federation, www.hope.be
International Hospital Federation (IHF), www.ihf-fih.org
Medical Device Manufacturers Association (MDMA), www.medicaldevices.org
National Association of ACOs (NAACOS), www.naacos.com
National Association of Community Health Centers (NACHC), www.nachc.com
National Association for Home Care and Hospice (NAHC), www.nahc.org
PhRMA (Pharmaceutical Research and Manufacturers of America), www.phrma.org
Trauma Center Association of America, www.traumacenters.org
UHC (University Health System Consortium), www.uhc.edu
World Medical Association (WMA), www.wma.net
Professional societies
AcademyHealth, www.academyhealth.org
American Academy of Family Physicians (AAFP), www.aafp.org
American Academy of Pediatrics (AAP), www.aap.org
American Academy of Physician Assistants (AAPA), www.aapa.org
American Board of Medical Specialties (ABMS), www.abms.org
American College of Emergency Physicians (ACEP), www.acep.org
American College of Healthcare Executives (ACHE), www.ache.org
American College of Surgeons (ACS), www.facs.org
American College of Radiology (ACR), www.acr.org
American College of Wound Healing and Tissue Repair (ACWHTR), https://acwound.org
American Dental Association (ADA), www.ada.org
American Medical Association (AMA), www.ama-assn.org
American Nurses Association (ANA), www.nursingworld.org
American Osteopathic Association, www.osteopathic.org
American Psychiatric Association (APA), www.psychiatry.org
American Psychological Association (APA), www.apa.org
American Public Health Association (APHA), www.apha.org
American Society of Anesthesiologists, www.asahq.org
American Society of Health Economists (ASHE), www.healtheconomics.us
American Society of Plastic Surgeons (ASPS), www.plasticsurgery.org
Canadian Medical Association (CMA), www.cma.ca
European Society for Health and Medical Sociology (ESHMS), www.eshms.eu
Health Services Research Association of Australia and New Zealand, www.hsraanz.org
International Health Economics Association (iHEA), www.healtheconomics.org
National Association of Chronic Disease Directors, www.chronicdisease.org
National Association of Medicaid Directors (NAMD), www.medicaiddirectors.org
National Cancer Registrars Association (NCRA), www.ncra-usa.org
National Governors Association (NGA), www.nga.org
National League for Nursing (NLN), www.nln.org
Society of General Internal Medicine (SGIM), www.sgim.org
Society for Medical Decision Making (SMDM), www.smdm.org
Foundations and trusts
Canadian Foundation for Healthcare Improvement, www.cfhi-fcass.ca
Commonwealth Fund, www.commonwealthfund.org
Ford Foundation, www.fordfoundation.org
Gates (Bill and Melinda) Foundation, www.gatesfoundation.org
Health Research and Educational Trust (HRET), www.hret.org
Kaiser (Henry J.) Family Foundation, www.kff.org
Kellogg (WK) Foundation, www.wkkf.org
Kresge Foundation, www.kresge.org
MacArthur (John D. and Catherine T.) Foundation, www.macarthur.org
Milbank Memorial Fund, www.milbank.org
National Patient Safety Foundation (NPSF), www.npsf.org
New America Foundation, www.newamerica.net
NIHCM (National Institute for Health Care Management) Foundation, www.nihcm.org
Pew Charitable Trusts, www.pewtrusts.org
Physicians Foundation, www.physiciansfoundation.org
Pfizer Foundation, www.pfizer.com
Public Health Foundation, www.phf.org
Robert Wood Johnson Foundation (RWJF), www.rwjf.org
Wellcome Trust, www.wellcome.ac.uk
Health insurance and employee benefits organizations
American Academy of Insurance Medicine (AAIM), www.aaimedicine.org
America’s Health Insurance Plans (AHIP), www.ahip.org
American Insurance Association (AIA), www.aiadc.org
Association for Community Affiliated Plans (ACAP), www.communityplans.net
Blue Cross and Blue Shield Association (BCBS), www.bcbs.com
Canadian Life and Health Insurance Association (CLHIA), www.clhia.ca
Employee Benefit Research Institute (EBRI), www.ebri.org
Healthcare Financial Management Association (HFMA), www.hfma.org
Insurance – Canada, www.insurance-canada.ca
Medicaid Health Plans of America (MHPA), www.mhpa.org
National Academy of Social Insurance (NASI), www.nasi.org
National Association of Health Underwriters (NAHU), www.nahu.org
National Association of Insurance Commissioners (NAIC), www.naic.org
Physicians for a National Health Program (PNHP), www.pnhp.org
Registries
Alzheimer’s Prevention Registry, www.endalznow.org
American Burn Association, National Burn Repository, www.ameriburn.org
Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR), www.aoa.org.au
British Society for Rheumatology Rheumatoid Arthritis Register (BSRBR-RA), www.inflammation-repair.manchester.ac.uk
Congenital Muscle Disease International Registry (CMDIR), www.cmdir.org
Cystic Fibrosis Foundation (CFF) Patient Registry, www.cff.org
DANBIO Registry of Biologics Used in Rheumatoid Arthritis Patients, www.danbio-online.dk
Danish Hip Arthroplasty Register (DHR), www.kea.au.dk
EPIRARE (European Platform for Rare Disease Registries), www.epirare.eu
International Society of Heart and Lung Transplantation (ISHLT), www.ishlt.org
Kaiser Permanente Autoimmune Disease Registry, www.kaiserpermanente.org
NAACCR (North American Association of Central Cancer Registries), www.naaccr.org
National Cancer Data Base, www.facs.org/quality-programs/cancer/ncdb/
National Cardiovascular Data Registry, www.cardiosource.org
National Marrow Donor Program’s Be the Match Registry, www.bethematch.org
National Trauma Data Bank, www.ntdsdictionary.org
Register of Information and Knowledge about Swedish Heart Intensive-care Admissions (RIKS-HIA), www.ucr.uu.se
Scientific Registry of Transplant Recipients (SRTR), www.srtr.org
Swedish Childhood Cancer Registry, www.cceg.ki.se
Swedish Hip Arthroplasty Register (SHAR), www.shpr.se
Swedish National Cataract Register (NCR), www.kataraktreg.se
Swedish Rheumatology Quality Register (SRQ), www.srq.nu/en/
United Kingdom Cataract National Data Set for Adults, www.rcophth.ac.uk
United Kingdom Myocardial Ischaemia National Audit Project (MINAP), www.hqip.org.uk
United Network for Organ Sharing (UNOS), www.unos.org
Research and policy organizations
Abt Associates, www.abtassociates.com
American Enterprise Institute for Public Policy (AEI), www.aei.org
American Health Policy Institute, www.americanhealthpolicy.org
American Research Institute for Policy Development (ARIPD), www.aripd.org
Battelle Memorial Institute, www.battelle.org
Brookings Institution, www.brookings.edu
Canadian Association for Health Services and Policy Research (CAHSPR), www.cahspr.ca
Cato Institute, www.cato.org
Deloitte Center for Health Solutions, www2.deloitte.com
ECRI Institute, www.ecri.org
Families USA, www.familiesusa.org
Galen Institute, www.galen.org
George Washington University Center for Health Policy Research, www.publichealth.gwu.edu
Institute for Clinical Evaluative Sciences (Ontario, Canada), www.ices.on.ca
Institute for e-Health Policy, www.e-healthpolicy.org
Institute for the Future (IFTF), www.iftf.org
Institute for Healthcare Improvement (IHI), www.ihi.org
Institute of Medicine (IOM), www.iom.edu
International Health Economics Association (iHEA), www.healtheconomics.org
Lewin Group, www.lewin.com
Manitoba Centre for Health Policy, www.umanitoba.ca/centres/mchp/
Mathematica Policy Research, www.mathematica-mpr.com
McMaster University Centre for Health Economics and Policy Analysis (CHEPA), www.chepa.org
National Bureau of Economic Research (NBER), www.nber.org
National Center for Policy Analysis (NCPA), www.ncpa.org
National Center for Public Policy Research, www.nationalcenter.org
National Coalition on Health Care (NCHC), www.nchc.org
National Health Policy Forum (NHPF), www.nhpf.org
National Health Policy Group (NHPG), www.nhpg.org
National Institute for Health Care Reform (NIHCR), www.nihcr.org
Nuffield Trust, www.nuffieldtrust.org.uk
Patient-Centered Outcomes Research Institute (PCORI), www.pcori.org
RAND Corporation, www.rand.org
RTI International, www.rti.org
Stanford University Center for Health Policy/Center for Primary Care and Outcomes Research, www.stanford.edu
Transamerica Center for Health Studies, www.transamericacenterforhealthstudies.org
University of British Columbia, Centre for Health Services and Policy Research, www.chspr.ubc.ca
University of California Los Angeles Center for Health Policy Research, www.healthpolicy.ucla.edu
University of Illinois at Chicago Institute for Health Research Policy, www.ihrp.uic.edu
University of Nebraska Center for Health Policy Analysis and Rural Health Research, www.unmc.edu
Urban Institute, www.urban.org
Westat, www.westat.com
Survey research organizations
AAPOR (American Association for Public Opinion Research), www.aapor.org
AASRO (Association of Academic Survey Research Organizations), www.aasro.org
American Statistical Association, Survey Research Methods Section, www.amstat.org/sections/srms
CASRO (Council of American Survey Research Organizations), www.casro.org
ESRA (European Survey Research Association), www.europeansurveyresearch.org
Gallup, Inc., www.gallup.com
GESIS – Leibniz Institute for the Social Sciences, Mannheim, Germany, www.gesis.org/en/institute/
Harris Interactive, www.harrisinteractive.com
Institute for Social Research, York University, www.isr.yorku.ca
NORC at the University of Chicago, www.norc.org
ORC (Opinion Research Corporation) International, www.orcinternational.com
Population Research Laboratory, University of Alberta, www.ualberta.ca/PRL/
Rasmussen Reports, www.rasmussenreports.com
Roper Center for Public Opinion Research, University of Connecticut, www.ropercenter.uconn.edu
Survey Health Care, www.surveyhealthcare.com
Survey Research Laboratory, University of Illinois at Chicago, www.srl.uic.edu
University of Virginia Center for Survey Research, www.virginia.edu/surveys/moreinfo.htm
The US Food and Drug Administration (FDA) also has many registries that may be useful to health services researchers. There are registries that identify all US registered drugs and medical devices and their manufacturers; record the occurrence of adverse drug events and medication errors; and list drugs in short supply and the reasons for the shortages. The FDA also has a registry of all new and generic drug approvals. More information on these and other FDA registries can be found at www.fda.gov.

US Government Agencies and Departments
All 15 executive departments of the US federal government are involved in some way with collecting health care data. For example, the US Department of Labor (DOL) collects data on the nation's health care workers and makes projections of future needs; the US Department of Defense (DOD) records the health care services it provides to military personnel and their families at bases throughout the world; and the US Department of Homeland Security (DHS) works with and collects data on the nation's hospitals and public health departments to prepare for natural disasters and possible terrorist attacks.
The US Department of Health and Human Services (HHS) is by far the largest and arguably the most important collector of health care data of all. The HHS, with a budget of $1 trillion in fiscal year (FY) 2015, has many staff offices and operating divisions, which implement national health care policy, manage health care programs, deliver health care services, conduct medical research, and collect health care data. The major data collection systems sponsored by HHS, by agency and division, are listed in Table 4 (Office of the Assistant Secretary for Planning and Evaluation n.d.):

Table 4 List of major U.S. Department of Health and Human Services data sources by division/agency
Agency for Healthcare Research and Quality (AHRQ)
  Healthcare Cost and Utilization Project (HCUP)
  Medical Expenditure Panel Survey (MEPS)
Centers for Disease Control and Prevention (CDC)
  Behavioral Risk Factor Surveillance System (BRFSS)
  National Ambulatory Medical Care Survey (NAMCS)
  National Health Interview Survey (NHIS)
  National Health and Nutrition Examination Survey (NHANES)
  National Home and Hospice Care Survey (NHHCS)
  National Hospital Ambulatory Medical Care Survey (NHAMCS)
  National Hospital Care Survey (NHCS)
  National Hospital Discharge Survey (NHDS)
  National Immunization Survey (NIS)
  National Nursing Home Survey (NNHS)
  National Survey of Children’s Health (NSCH)
  National Survey of Family Growth (NSFG)
  National Survey of Residential Care Facilities (NSRCF)
  National Vital Statistics System (NVSS)
  State and Local Area Integrated Telephone Survey (SLAITS)
  Youth Risk Behavior Surveillance System (YRBSS)
Centers for Medicare and Medicaid Services (CMS)
  CMS Administrative Datasets
  Home Health Outcome and Assessment Information Set (OASIS)
  Medicare Current Beneficiary Survey (MCBS)
Health Resources and Services Administration (HRSA)
  Area Health Resource File (AHRF)
National Institutes of Health (NIH)
  Health and Retirement Study (HRS)
  National Children’s Study (NCS)
Substance Abuse and Mental Health Services Administration (SAMHSA)
  National Survey on Drug Use and Health (NSDUH)

Health Programs and Systems of Other (Non-U.S.) Nations
Some nations provide data and information about their health care programs and systems, which may be useful for health services research. This information can be quickly obtained via the Internet. Some of the best and most accessible English-language websites are: the United Kingdom’s National Health Service (NHS) at www.nhs.uk, including its National Institute for Health and Care Excellence (NICE), www.nice.org.uk; Health Canada, www.hc-sc.gc.ca, and the Canadian Institutes of Health Research, www.cihr-irsc.gc.ca; and the Australian Government Department of Human Services, www.humanservices.gov.au.
5 Health Services Data: Typology of Health Care Data 99
archive; and the terrorism and preparedness data resource center. Faculty, staff, and students of member institutions have full access to ICPSR’s data archives and to all of its services. Data files are available in SAS, SPSS, Stata, and R formats. ICPSR’s website is www.icpsr.umich.edu.

Accreditation, Evaluation, and Regulatory Organizations
To ensure that patients receive safe, high quality care, health care professionals, laboratories, programs, and health care facilities are accredited and regulated.
One of the most important accrediting organizations is the Joint Commission. Founded in 1951, the Joint Commission, an independent, not-for-profit organization, is the oldest and largest health care accrediting organization in the USA. It accredits and certifies more than 20,500 health care organizations and programs in the nation including: all types of hospitals; home care organizations, medical equipment services, pharmacy, and hospice services; nursing homes and rehabilitation centers; behavioral health care and addiction services; ambulatory care organizations, group practices, and office-based surgery practices; and independent and freestanding clinical laboratories.
To receive Joint Commission accreditation, hospitals, for example, must meet certain evidence-based process standards that are closely linked to positive patient outcomes. These process or accountability measures include heart attack care, pneumonia care, surgical care, children’s asthma care, inpatient psychiatric services, venous thromboembolism care, stroke care, immunization, and perinatal care (Chassin et al. 2010). The Joint Commission grants accreditation based on periodic reviews by its survey teams, who conduct unannounced onsite visits, and on quarterly self-assessment reports submitted by the hospitals. The quality and safety results for specific hospitals are available at www.qualitycheck.org.
A more recently established and popular health care evaluation organization is the Healthgrades Operating Company, which is known simply as Healthgrades. Founded in 1998 in Denver, Colorado, Healthgrades has amassed data on over three million US health care providers. Healthgrades provides online data to consumers on physicians, hospitals, and dentists. For example, its website identifies the names of physicians in a city or zip code, the conditions they treat, the procedures they perform, the physicians’ qualifications and patient feedback, and other criteria. In terms of qualifications, the site identifies whether the physician is board certified and has sanctions or malpractice claims against them, reports eight measures of care, and reports patients’ willingness to recommend the physician to their family and friends. Today, nearly one million people a day use the Healthgrades website. It should be noted that some physicians have criticized Healthgrades for having erroneous data and for not screening for false reviews. The Healthgrades website is www.healthgrades.com.

Associations and Professional Societies
The largest category of private health care organizations is associations and professional societies. This category, which includes hundreds of organizations (Swartout 2014), can be roughly subdivided into disease/condition associations, demographic and population group associations, health care organizations and trade associations, and professional societies.
There are associations for nearly every disease and medical condition. These associations help individuals suffering from the disease and their families, advocate on their behalf, educate the general public, and work to prevent and end the disease. An example of this type of association is the American Cancer Society (ACS). Founded in 1913, the American Cancer Society is one of the largest voluntary health organizations in the USA. With its headquarters in Atlanta, Georgia, the ACS also has over 350 local offices nationwide. The ACS works to prevent cancer and detect it as early as possible. The society offers free information, programs, and services, and it provides community referrals to patients, survivors, and caregivers. It funds research to identify the causes of cancer, to determine the best way to prevent cancer, and to discover new ways to cure the disease. It also works with lawmakers to
promote policies, laws, and regulations to prevent cancer. The ACS has a National Cancer Information Center, which is open 24 h a day, every day of the year, to answer questions from individuals. It also offers advice online. The ACS website is www.cancer.org.
Some associations represent specific demographic and population groups. For example, the National Rural Health Association (NRHA) works on behalf of the rural population of the USA. Nearly 25 % of the nation’s population lives in rural areas. Many rural residents tend to be poorer, have higher suicide rates, and experience higher rates of death and serious injury from accidents than their urban counterparts; they also face physician shortages and have to travel long distances to health facilities. The NRHA works to improve the health and well-being of rural Americans. It provides leadership on health issues through advocacy, communications, education, and research. Founded in 1980, with headquarters in Leawood, Kansas, the NRHA has more than 21,000 individual and organizational members, all sharing a common interest in rural health. Its website is www.ruralhealthweb.org.
Other associations represent health care organizations and trade associations. The Pharmaceutical Research and Manufacturers of America (PhRMA) is an example of a large, influential trade association. Founded in 1958, and headquartered in Washington, D.C., PhRMA represents the nation’s largest biopharmaceutical research and biotechnology companies, such as Amgen, Bayer, Eli Lilly, Merck, and Pfizer. Since 2000, PhRMA member companies have invested more than $550 billion in drug development, including an estimated $51.1 billion in 2013. PhRMA is an advocate for public policies that encourage the discovery of new medicines. To accomplish this, PhRMA is dedicated to achieving: broad patient access to medicines through a free market, without price controls; strong intellectual property incentives; and effective regulation and a free flow of information to patients. PhRMA publishes policy papers, profiles and reports, fact sheets, newsletters, and speeches (Pharmaceutical Research and Manufacturers of America 2014). These publications are available at its website, www.phrma.org.
Another type of health care association is the professional society. These societies advocate and lobby for their members, provide continuing education, and attempt to advance the field. They typically publish newsletters, factsheets, and journals, and hold local meetings and an annual convention for their members.
For example, one of the oldest professional medical societies is the American Medical Association (AMA). Founded in 1847, and incorporated in 1897, the AMA is the largest association of physicians and medical students in the USA. Starting as a small association, the AMA would become the single most influential organization on the practice of medicine in the nation. The AMA gained national prominence by publishing its flagship Journal of the American Medical Association and by reorganizing into local and state-level constituent societies, a national House of Delegates, a Board of Trustees, and national officers. With these changes, the membership of the AMA grew from around 8,000 in 1900 to approximately 220,000 today. During the 1960s, the membership market share of the AMA reached its zenith, representing about 70 % of the nation’s physicians, but today it represents only about 25 %. Its membership, and to some degree its influence, has declined because of the profusion of competing national specialty medical societies, the decline of solo practices, and the rise of salaried physicians who work for various organizations (American Medical Association 1997).
Today, the stated mission of the AMA is to promote the art and science of medicine and the betterment of public health. Headquartered in Chicago and with an office in Washington, D.C., the AMA advocates for its members by developing health care policies. The top items on the AMA’s current policy agenda include modification of the Affordable Care Act (ACA), the improvement of diabetes care delivery, changes in drug reporting, and increasing Medicaid payments to make them comparable to those paid by Medicare. The AMA also produces a number of important products and services. The association is one of the largest publishers of medical information in the world. For example, its weekly Journal of the American Medical Association
102 R. M. Mullner
HIV/AIDS prevention strategies using antiretroviral drugs in South Africa, India, and the USA; measuring the total costs of dementia in the USA; determining the impact of lowering the costs of healthy foods in supermarkets on the diet patterns of households in South Africa; identifying the effect of the Affordable Care Act (ACA) on hospital emergency department use by young adults who remained on their parents’ health insurance; and developing new models of patient-centered medical homes and nurse-managed health centers to help alleviate the growing shortage of primary care physicians in the USA (RAND Corporation 2013).
The RAND Corporation publishes all of its reports on its website. Further, RAND Health makes all of its surveys publicly available without charge. Examples of available surveys by topic are shown in Table 5. More information can be found at www.rand.org/health/surveys_tools.html.

Table 5 RAND Corporation health surveys by topic
Aging and health
  Assessing Care of Vulnerable Elders (ACOVE)
  Vulnerable Elders Survey (VES-13)
Diversity and health
  Homelessness survey
Health economics
  Hospital competition measures
  Managed health care survey
HIV, STDs, and sexual behavior
  HIV Cost and Services Utilization Study (HCSUS)
  HIV Identification, Prevention, and Treatment Services Surveys
  HIV Patient-Assessed Report of Status and Experience (HIV-PARSE)
Maternal, child, and adolescent health
  Pediatric Asthma Symptom Scale
  Pediatric Quality of Life Inventory (PedsQL Measurement Model)
Mental health
  Mental health inventory
  Depression screener
  Improving Care for Depression in Primary Care (Partners in Care)
Military health policy
  Chronic Illness Care Evaluation Instruments (ICICE website)
  Dialysis Patient Satisfaction Survey (DPSS)
  Patient Satisfaction Questionnaires (PSQ-III and PSQ-18)
  Patient Satisfaction Survey for the Unified Medical Group Association
Quality of life
  Epilepsy Surgery Inventory Survey (ESI-55)
  Kidney Disease Quality of Life Instrument (KDQOL)
  Medical Outcomes Study (MOS)
  Measures of quality of life
  Measures of patient adherence
  Mental health inventory
  Sexual problems measures
  Sleep scale
  Social support survey
  National Eye Institute Refractive Error Quality of Life Instrument
  Pediatric Quality of Life Inventory (PedsQL Measurement Model)
  Quality of Life in Epilepsy Inventory (QOLIE-89 and QOLIE-31)
  RAND Negative Impact of Asthma on Quality of Life
  Visual Function Questionnaire (VFQ-25)
Research methods
  Socially Desirable Response Set Five-Item Survey (SDRS-5)
  The Homelessness Survey

Survey Research Organizations
Academic and commercial survey research organizations frequently collect health care data. They often conduct health care surveys for various government agencies, commercial companies, and research and public policy organizations. Sometimes they also add health care questions to the general population surveys they conduct to determine changing attitudes, beliefs, and public opinions. Data from these surveys are often archived by the survey organizations and eventually are made available to researchers. Many of these organizations also provide lists of the survey questions they have used. These lists can be a valuable resource for researchers: it is difficult to design nonbiased questions, and researchers can judge the validity and reliability of questions that have already been used. Researchers may include these questions in the surveys they are designing.
One of the oldest independent academic-based survey research organizations is NORC at the University of Chicago. Founded in 1941, NORC, which originally stood for National Opinion Research Center, is headquartered in downtown Chicago with additional offices on the University of Chicago’s campus and in Washington, D.C. During the past 70 years, NORC has conducted many landmark national
large-scale health surveys including: the National Ambulatory Medical Care Survey, the first-ever survey of medical care delivered to patients by office-based physicians; the National Children’s Study, the largest study of children’s health and development, tracking 100,000 children from before birth through age 21; and the National Social Life, Health and Aging Project, a longitudinal study of the health of older Americans.
One of NORC’s flagship surveys and longest-running projects is its General Social Survey (GSS). Begun in 1972, and continuing today, this annual survey is widely regarded as the single best source of data on societal trends. Hundreds of researchers, policymakers, and students have used the survey’s data to study a wealth of topics. The GSS contains a standard set of demographic, behavioral, and attitudinal questions, plus various topics of special interest. For more than 40 years the GSS has been tracking the opinions of Americans. Over the years, many health care questions have been included in the survey, asking respondents about choice of physicians, difficulty receiving care, health insurance coverage, coverage changes, use of Medicare/Medicaid, incentives for physicians, opinions on HMOs, and whether they sought medical care for mental health problems. Data from the GSS and its various questionnaires and codebooks can be downloaded. A cross-tabulation program is also available (NORC at the University of Chicago 2011). More information on NORC and its surveys, including the GSS, can be obtained at www.norc.org.

Conclusion

This chapter has presented a practical typology of health care data, and it has identified and described many important data sources and public use files. Although much health care data are currently available, in the future much more data will be needed. The demand for more accessible, transparent, and comprehensive health care data will be driven by advances in medical science, rising public expectations, the continuing growth of the Internet and social media, and the ever increasing cost of health care.
In the future, patients, health care providers, policymakers, and health services researchers will all increasingly demand more health care data. Patients will need these data to help them make better-informed, evidence-based decisions. They need to know: Who are the best physicians for the care I need? What innovative treatments are available? What are the benefits and risks of the treatments? Which hospitals are the best providers of the treatments? Where can I get a second or even a third medical opinion? How much will the treatments cost? And which treatments are covered by my current health insurance policy?
Health care providers will need more data to better monitor the care they provide. They will need to hold down their costs, provide high quality services, and justify what they charge to health care insurers. They also will increasingly have to deal with patients demanding more data on the cost and quality of the care they received. Already many hospitals and clinics, insurers, and employers enable patients to access their electronic medical and billing records online.
Policymakers will need more data to develop new, more effective policies to help bend the cost curve. They will use these data to construct and test new medical care reimbursement models, which will hopefully lower costs and at the same time increase the quality of care. They will also develop policies to encourage more disease prevention and wellness programs.
Health services researchers will demand more data to better evaluate existing health care programs. They will increasingly conduct research on the relationship between the cost and quality of health care to determine its value to patients and society. Over time, using these data sources, health services researchers will forge an important new evidence-based science of health care delivery, a new science that will continue to build on the crucial concepts of access, cost, quality, and the outcome of health care.

References

Aaron HJ, Schwartz WB, Cox MA. Can we say no?: the challenge of rationing health care. Washington, DC: Brookings Institution Press; 2005.
Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco: Jossey-Bass; 2006.
Aday LA, Begley CE, Lairson DR, Balkrishnan R. Evaluating the healthcare system: effectiveness, efficiency, and equity. 3rd ed. Chicago: Health Administration Press; 2004.
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare disparities report. Rockville: Agency for Healthcare Research and Quality; 2014a. Available at: www.ahrq.gov/research/findings/nhqrdr/index.html
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare quality report. Rockville: Agency for Healthcare Research and Quality; 2014b. Available at: www.ahrq.gov/research/findings/nhqrdr/index.html
American College of Surgeons. National Trauma Data Bank 2013: annual report. Chicago: American College of Surgeons; 2013. Available at: www.ntdb.org
American Hospital Association (AHA). AHA guide to the health care field, 2014. Chicago: Health Forum; 2013a.
American Hospital Association (AHA). AHA hospital statistics, 2014. Chicago: Health Forum; 2013b.
American Hospital Association (AHA). AHA Data Viewer website. 2015. www.ahadataviewer.com
American Medical Association. Caring for the country: a history and celebration of the first 150 years of the American Medical Association. Chicago: American Medical Association; 1997.
Andersen RM. Revisiting the behavioral model and access to medical care: does it matter? J Health Soc Behav. 1995;36:1–10. Available at: www.mph.ufl.edu/files/2012/01/session6april2RevisitingBehavioralModel.pdf
Anderson C. Multinational comparisons of health system data, 2014. New York: Commonwealth Fund; 2014. Available at: www.commonwealthfund.org
Anderson GF, Squires DA. Measuring the U.S. health care system: a cross-national comparison. Issues in International Health Policy, Commonwealth Fund, Pub. 1412, Vol. 90, June 2010. Available at: www.commonwealthfund.org
Aron-Dine A, Einav L, Finkelstein A. The RAND Health Insurance Experiment, three decades later. J Econ Perspect. 2013;27(1):197–222. Available at: http://economics.mit.edu/files/8400
Arts DGT, de Keizer NF, Scheffer G-J. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc. 2002;9(6):600. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC349377
Black N. Why we need observational studies to evaluate the effectiveness of health care. BMJ. 1996;312(7040):1215–18. Available at: www.bmj.com/content/312/7040/1215
Borenstein M, Hedges LV, Higgins JPT, Rothstein HR. Introduction to meta-analysis. Chichester: Wiley; 2009.
Centers for Medicare and Medicaid Services (CMS). Medicare and you, 2015. Baltimore: Centers for Medicare and Medicaid Services; 2014a. Available at: www.cms.gov
Centers for Medicare and Medicaid Services (CMS). Nursing home data compendium 2013 edition. Baltimore: Centers for Medicare and Medicaid Services; 2014b. Available at: www.cms.gov
Chassin MR, Loeb JM, Schmaltz SP, Wachter RW. Accountability measures – using measurement to promote quality improvement. N Engl J Med. 2010;363(7):683–88. Available at: www.nejm.org/doi/full/10.1056/NEJMsb1002320
Children’s Health Care Quality Measures Core Set Technical Assistance and Analytic Program. Strategies for using vital records to measure quality of care in Medicaid and CHIP programs. Medicaid/CHIP Health Care Quality Measures: Technical Assistance Brief 4: Jan 2014, 1–11. Available at: www.medicaid.gov/Medicaid-CHIP-Program-information/By-Topics/Quality-of-Care/Downloads/Using-Vital-Records.pdf
Clancy CM. What is health care quality and who decides? Statement before the Committee on Finance, Subcommittee on Health Care, U.S. Senate, 18 Mar 2009. Available at: www.hhs.gov/asl/testify/2009/03/t20090318b.html
Culyer AJ. The dictionary of health economics, second edition. Northampton: Edward Elgar; 2010.
Dartmouth Atlas of Health Care Project. 2015. www.dartmouthatlas.org
Davis KE. Access to health care of adult men and women, ages 18–64, 2012. Medical Expenditure Panel Survey (MEPS) Statistical Brief #461. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st461/stat461.pdf
Davis K, Stremikis K, Squires D, Schoen C. Mirror, mirror on the wall: how the performance of the U.S. health care system compares internationally. Pub. No. 1755. New York: Commonwealth Fund; 2014. Available at: www.commonwealthfund.org
Donabedian A. The definition of quality and approaches to its assessment. Vol. 1. Explorations in quality assessment and monitoring. Ann Arbor: Health Administration Press; 1980.
Feldstein PJ. Health care economics. 7th ed. New York: Thomson Delmar Learning; 2011.
Gliklich RE, Dreyer NA, editors. Registries for evaluating patient outcomes: a user’s guide. 2nd ed. AHRQ Publication No. 10-EHC049. Rockville: U.S. Agency for Healthcare Research and Quality; 2010. p. 15–16. Available at: www.effectivehealthcare.ahrq.gov/ehc/products/74/531/Registries2nd ed Final to Eisenberg 9-15-10.pdf
Halsey MF, Albanese SA, Thacker M, The Project of the POSNA Practice Management Committee. Patient satisfaction surveys: an evaluation of POSNA members’ knowledge and experience. J Pediatr Orthop. 2015;35(1):104–7.
Health Care Cost Institute (HCCI). 2013 health care cost and utilization report. Washington, DC: Health Care Cost Institute; 2014. Available at: www.healthcostinstitute.org
Healthy People 2020. www.healthypeople.gov
Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley-Blackwell; 2008.
Huston P, Naylor CD. Health services research: reporting on studies using secondary data sources. Can Med Assoc J. 1996;155(12):1697–1702. Available at: www.ncbi.nlm.nih.gov
Johnson TP, editor. Handbook of health survey methods. New York: Wiley; 2014.
Kane RL, Radosevich DM. Conducting health outcomes research. Sudbury: Jones and Bartlett Learning; 2011.
Krueger RA, Casey MA. Focus groups: a practical guide for applied research. 4th ed. Thousand Oaks: Sage; 2009.
Larsson S, Lawyer P, Garellick G, Lindahl B, Lundstrom M. Use of 13 disease registries in 5 countries demonstrates the potential to use outcome data to improve health care’s value. Health Aff. 2012;31(1):220–7.
Lee K. Global institutions: the World Health Organization (WHO). New York: Routledge; 2008.
Levy D, Brink S. A change of heart: how the people of Framingham, Massachusetts, helped unravel the mysteries of cardiovascular disease. New York: Knopf; 2005.
Mullner RM, editor. Encyclopedia of health services research. 2 Vol. Thousand Oaks: Sage; 2009, xxix.
National Cancer Institute. SEER as a research resource. NIH Publication No. 10-7519. Bethesda: SEER Program, National Cancer Institute; 2010. Available at: www.seer.cancer.gov/about/factsheets/SEER_Research_Brochure.pdf
National Center for Health Statistics (NCHS). National health survey: the principal source of information on the health of the U.S. population. Hyattsville: National Center for Health Statistics; 2010. Available at: www.cdc.gov/nchs/data/nhis/brochure2010January.pdf
National Center for Health Statistics (NCHS). Health, United States, 2013: with special feature on prescription drugs. Hyattsville: National Center for Health Statistics; 2014a. Available at: www.cdc.gov/nchs/data/hus/hus13.pdf
National Center for Health Statistics (NCHS). Summary of current surveys and data collection systems. National Center for Health Statistics; 2014b. Available at: www.cdc.gov/nchs/data/factsheets/factsheet_summary1.pdf
National Committee for Quality Assurance (NCQA). The essential guide to health care quality. Washington, DC: National Committee for Quality Assurance; 2006. Available at: www.ncqa.org
National Research Council, Committee on National Statistics. Vital statistics: summary of a workshop. Washington, DC: National Academies Press; 2009. Available at: www.ncbi.nlm.nih.gov/books/NBK219877/
Newhouse JP, The Insurance Experiment Group. Free for all?: lessons from the RAND health experiment. Cambridge, MA: Harvard University Press; 1993.
NORC at the University of Chicago. Social science research in action. Chicago: NORC at the University of Chicago; 2011. Available at: www.norc.org/PDFs/Brochures-Collateral/NORC_Book_Social_Science_Research_in_Action.pdf
Office of the Assistant Secretary for Planning and Evaluation (ASPE). U.S. Department of Health and Human Services (HHS). Guide to HHS surveys and data resources. Washington, DC: U.S. Department of Health and Human Services; n.d. Available at: www.aspe.hhs.gov/sp/surveys/index.cfm
Organisation for Economic Co-operation and Development (OECD). Health at a glance 2013: OECD indicators. Paris: Organisation for Economic Co-operation and Development; 2013. Available at: https://doi.org/10.1787/health_glance-2013-en
Osborn R, Moulds D, Squires D, et al. International survey of older adults finds shortcomings in access, coordination, and patient-centered care. Health Aff. 2014;33(12):2247–55.
Painter MJ, Chernew ME. Counting change: measuring health care prices, costs, and spending. Princeton: Robert Wood Johnson Foundation; 2012. Available at: www.rwjf.org
Perrin JM. Health services research for children with disabilities. Milbank Q. 2002;80(2):303–24. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC2690116/
Pharmaceutical Research and Manufacturers of America. 2014 Biopharmaceutical Research Industry Profile. Washington, DC: Pharmaceutical Research and Manufacturers of America; 2014. Available at: www.phrma.org/sites/default/files/pdf/2014_PhRMA_PROFILE.pdf
RAND Corporation. RAND Corporation: annual report 2013. Santa Monica: RAND Corporation; 2013. Available at: www.rand.org/pubs/corporate_pubs/CP1-2013.html
Rhoades JA, Cohen SB. The long-term uninsured in America, 2009–12 (selected intervals): estimates for the U.S. civilian noninstitutionalized population under age 65. Medical Expenditure Panel Survey (MEPS) Statistical Brief #464. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st464/stat464.pdf
Robert Wood Johnson Foundation. Consumer attitudes on health care costs: insights from focus groups in four U.S. cities. Princeton: Robert Wood Johnson Foundation; 2013. Available at: www.rwjf.org/content/dam/farm/reports/issue_briefs/2013/rwjf403428
Stagnitti MN, Carper K. National health care expenses in the U.S. civilian noninstitutionalized population, distributions by types of service and source of payment, 2012. Medical Expenditure Panel Survey (MEPS) Statistical Brief #456. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st456/stat456.pdf
Swartout KA. Encyclopedia of associations: national organizations of the U.S. Farmington Hills: Gale Cengage Learning; 2014. Available at: www.gale.cengage.com
Wakefield MK. Statement by HRSA administrator Mary K. Wakefield, Ph.D., R.N. on the National Practitioner Data Bank Public Use File, 9 Nov 2011. Available at: www.npdb.hrsa.gov/resources/publicDataStatement.jsp
Wennberg JE. Tracking medicine: a researcher’s quest to understand health care. New York: Oxford University Press; 2010. p. 1–13.
Williamson A, Hoggart B. Pain: a review of three commonly used pain rating scales. J Clin Nurs. 2008;14(7):798–804. Available at: www.onlinelibrary.wiley.com/doi/10.1111/j.1365-2702.2005.01121.x/pdf
Workman TA. Engaging patients in information sharing and data collection: the role of patient-powered registries and research networks. AHRQ Community Forum White Paper, AHRQ Publication No. 13-EHC124-EF. Rockville: U.S. Agency for Healthcare Research and Quality; 2013. Available at: www.effectivehealthcare.ahrq.gov/ehc/assets/File/Patient-Powered-Registries-white-paper-130911.pdf
World Health Organization (WHO). World health survey, 2002, B – individual questionnaire. 2002. Available at: www.who.int/healthinfo/survey/en/
World Health Organization (WHO). World health statistics, 2014. Geneva: WHO Press; 2014. Available at: www.who.int
Xu F, Mawokomatanda T, Flegel D, et al. Surveillance for certain health behaviors among states and selected local areas – behavioral risk factor surveillance system, United States, 2011. Morb Mortal Wkly Rep (MMWR). 2014;63(9):1–149. Available at: www.cdc.gov/mmwr/pdf/ss/ss6309.pdf
6 Health Services Information: Application of Donabedian’s Framework to Improve the Quality of Clinical Care
Contents
Introduction
National Committee for Quality Assurance
Dr. Ernest Amory Codman’s Data-Driven Approach to Defining and Measuring Quality of Care
Dr. Avedis Donabedian’s Process-Structure-Outcome Model for Quality of Care
Processes of Care
Structures of Care
Outcomes of Care
Process-Structure-Outcomes in Cardiac Surgery
Risk Adjustment
Uncertainty
Implementation of VA National Quality Improvement Programs
The Processes, Structures, and Outcomes of Cardiac Surgery Study
A. L. W. Shroyer (*)
Department of Surgery, School of Medicine, Stony Brook
University, Stony Brook, NY, USA
e-mail: annielaurie.shroyer@stonybrookmedicine.edu
B. M. Carr
Department of Emergency Medicine, Mayo Clinic,
Rochester, MN, USA
F. L. Grover
Department of Surgery, School of Medicine at the
Anschutz Medical Campus, University of Colorado,
Aurora, CO, USA
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019
A. Levy et al. (eds.), Health Services Evaluation, Health Services Research,
https://doi.org/10.1007/978-1-4939-8715-3_7
evaluating whether the actual outcomes incurred achieve the original expectations. As perhaps one of the earliest documented descriptions related to defining or measuring the quality of medical care, King Hammurabi’s Code (1,700 BC) provided insights as to what were considered unacceptable care outcomes as compared to the expectations, providing clear instructions as to the direct consequences to clinicians for the delivery of substandard care:

If a physician performed a major operation on a nobleman with a bronze lancet and caused the nobleman’s death, or he opened the eye-socket of a nobleman and destroyed the nobleman’s eye, they shall cut off his hand. (Magno 1975)

To optimize quality of medical care, many facilities have patient safety initiatives focused on engaging healthcare professionals, organizations, and patients toward the attainment of a healthcare system that reduces errors, consistently improves the care provided (based on previously identified challenges), and creates an institutional culture focused upon assuring patient safety as a top priority.

Institutional patient safety cultures can foster and support the design and implementation of ideal clinical practices. This would be exemplified by an institutional culture that focuses on reducing the risk of adverse events occurring. Even with application of the best evidence available, unforeseen adverse consequences of the medical care provided unfortunately still do occur. As an example, the perioperative administration of prophylactic antibiotic therapy for patients undergoing surgery is commonly cited as a patient safety practice employed to prevent surgery-related infections in the postoperative period (van Kasteren et al. 2007). In spite of this important intervention, however, postoperative infections still remain an outstanding challenge faced by many healthcare institutions, with multiple approaches implemented to keep postoperative infection rates low (e.g., conscientious handwashing techniques used routinely, combined with sterile techniques for wound dressing changes).

National Committee for Quality Assurance

Toward the goal of providing quality rankings, the National Committee for Quality Assurance (NCQA) provides infrastructure support for a broad array of programs and services focused on measuring, analyzing, and continually improving the healthcare provided by US-based health plans. The National Committee for Quality Assurance has defined quality metrics that can be used to identify opportunities for quality improvement. The routine reporting of quality metrics has been useful to inform decisions at the clinical program, facility, health plan, and policy levels. By providing publicly available statistical reports evaluating health plan performance, important quality improvements have been documented and translated into reduced adverse event rates impacting patient care. For example, the use of beta-blockers for the subgroup of patients with a prior acute myocardial infarction (aka AMI or “heart attack”) has been documented in the peer-reviewed literature to reduce the chance of a repeat AMI by 40% (National Committee for Quality Assurance 2014a). Thus, beta-blocker use has been cited as a successful NCQA metric used to facilitate positive trends documented for quality of care outcomes.

Moreover, as part of the National Committee for Quality Assurance, the Healthcare Effectiveness Data and Information Set (HEDIS) was developed, and, as of 2014, the vast majority of US-based health plans submitted HEDIS metrics (which consisted of 81 measures for quality of care across five different care domains). The National Committee for Quality Assurance HEDIS requires that plans report the continued post-AMI use rates for beta-blocker medications for their eligible population. That is, health plans must calculate the proportion of their eligible enrollees (aged 18 years or older) who received persistent beta-blocker treatment for 6 months after discharge following their AMI hospitalization over the past year period. Although this specific HEDIS metric is most relevant to a smaller subgroup of ischemic heart disease patients, the National Committee for Quality Assurance
procedures has been documented to improve long-term survival in comparison to the use of other conduits, e.g., saphenous vein grafts (SVGs) (Goldman et al. 2004). The earlier cardiac surgery clinical guidelines identified that the use of LIMA for CABG surgery should be considered where longer-term survival may be an important consideration. More recent published literature has extended the LIMA benefits documented to include the elderly population. As with any surgical procedure, there are risks and benefits associated with every procedure, including LIMA use. The use of a LIMA graft generally takes more time; therefore, a LIMA graft may be contraindicated for emergent/urgent patients where surgical cross-clamp time may be critically important. Hence, the CABG LIMA use rates may be used as a quality of care metric for elective patients, but may not be a meaningful measure of quality of care for the emergent/urgent patient subgroups (Karthik and Fabri 2006).

Structures of Care

Structures of care, as another important metric to assess quality, were defined by Dr. Donabedian as being related to the “overall context or environment in which care is rendered to a group of patients,” including the characteristics of healthcare team members (e.g., credentials and experience) and healthcare facilities (e.g., the type and age of equipment) (Shroyer et al. 1995). Representing an important arm of Donabedian’s triad, structures of care include the manner in which healthcare facilities are organized and operated, the approaches used for care delivery, and the policies and procedures related to care including quality oversight processes.

For example, structures of surgical care may involve the physicians’ provider-specific characteristics, e.g., international medical graduate (IMG) or board certification status. Though not definitive, studies have shown no difference in mortality outcomes among hospitalized patients treated by graduates of US medical schools versus IMGs. There may, however, be a correlation between board certification and better clinical outcomes (Sharp et al. 2002; Norcini et al. 2010).

Facility characteristics, such as a hospital’s affiliation (academic versus community) or location (urban versus rural versus frontier hospitals), have also been studied as structural characteristics that have been documented to impact the quality of care provided. Academic affiliation, for instance, has not been shown to be a predictor of better outcomes (Papanikolaou et al. 2006). Location may be important, however, as rural hospitals have been shown to have worse performance on quality of care indicators than urban hospitals, in spite of studies showing their outcomes to not be inferior to those at urban hospitals (Nawal Lutfiyya et al. 2007; Dowsey et al. 2014; Tran et al. 2014). The identification of disparities such as this not only demonstrates the important role of structures of care in affecting the quality of care but may serve as an impetus to identify changes that can be made in the structures themselves to improve patient care.

Importantly, the entire process associated with accreditation, including the Joint Commission, is intended to coordinate a quality oversight mechanism, which, in theory, should validate the importance of structural measures for care. For example, the field of cardiac surgery has established minimal acceptable standards for nurse staffing ratios to be coordinated in critical care units for immediate post-CABG patient care. In order to be deemed of “acceptable” quality, standards for the number and type of nurse staffing must be met to assure that a high quality of care may be provided. For example, a study by VillaNueva and colleagues looked at risk-adjusted outcomes of cardiac surgery patients in relation to (1) “the demographics, education, experience, and employment of operation room (OR) and surgical intensive care unit (ICU) nurses involved in their care” and (2) “the staffing and vacancy ratios of OR and surgical ICU nurses involved in their care.” Significant variations were observed in processes of care between participating cardiac surgery centers, but there was insufficient data to draw conclusions on their effect on patient outcomes (VillaNueva et al. 1995). For this study, therefore, the theoretical link between structures
of care and outcomes of care could not be confirmed directly. Within Donabedian’s quality triad, there is a fundamental assumption underlying the assessment of structural quality elements; that is, the healthcare setting in which the care is rendered is a very important factor influencing the quality of medical care provided. In spite of the data-driven evidence being sparse, this assumption extends to the current Joint Commission accreditation assessments focused on evaluating the adequacy of healthcare facility basic structure of care.

Outcomes of Care

Finally, outcomes of care, the third piece of the triad, were defined by Dr. Donabedian as the measurable end points of the healthcare process (Malenka and O’Connor 1998).

Ideally, a broad range of clinically relevant outcomes should be assessed including (but not limited to) traditional measures of mortality and morbidity, health-related quality of life, condition-specific or disease-specific metrics of symptom status or functionality, general health status, general overall functionality, or patient satisfaction. The outcomes measured should be related to the full range of care end points salient to the patients impacted by the treatment received. Prioritized in importance based upon the nature of the question raised, outcomes may reflect a patient’s status at a single point in time (e.g., 30-day operative mortality) or changes over points in time (e.g., pre-CABG angina frequency compared to post-CABG 6-month follow-up angina frequency). For quality assessment purposes, moreover, outcomes may be subclassified in many ways including (1) planned (intended) or not planned (unintended) (Mavroudis et al. 2014), (2) preventable versus not preventable (Lee et al. 2003), (3) major versus minor in importance, and (4) related or not related to the medical care rendered (Shann et al. 2008).

Figure 2 illustrates the hypothetical interactions between processes, structures, and changes in the patient’s outcomes of care, where patients present to the healthcare system with an illness, in the context of their other patient risk characteristics. The medical care interventions received represent processes of care, as do the actions taken by patients themselves to address their illness state. These actions are coordinated within a healthcare environment, representing the structures of care. With the passage of time, the patient emerges from their episode of care with a changed relationship to their illness, which is the outcome of care measured. This “Process-Structure-Outcome” paradigm can be extended from a single episode of care to the full series of care encounters, in order to assess and to improve the quality of patient care received.

Process-Structure-Outcomes in Cardiac Surgery

Supporting these different outcome-based classification systems, multiple examples have been reported within the field of cardiothoracic surgery. Delayed sternal closure, for example, may be planned or unplanned (1). In pediatric cardiac surgery in particular, the surgeon may plan to leave the sternum open at the end of the procedure because this may allow for better heart function in certain patients. In other cases, however, the
Fig. 2 Theoretical “process-structure-outcome” framework. [Diagram: within the environment for care (structures of care), acts of care (processes) are applied to a patient living with illness, leading to a patient with a changed relationship to illness (outcomes of care).]
surgeon may have initially planned to close the patient’s chest, but found himself or herself unable to as a result of bleeding, myocardial edema, or arrhythmia (Yasa et al. 2010; Ozker et al. 2012). It is important to distinguish between the two when investigating the incidence of delayed sternal closure as a surgical complication, because planned delays in closure could inflate the apparent incidence of surgical complications. On the other hand, reintubation is rarely planned, but may be preventable (2) if it is brought about by unplanned extubation or as a complication of a neuromuscular blocking agent rather than a non-iatrogenic respiratory problem (Lee et al. 2003). Major and minor outcomes (3) are easily envisioned based on the degree to which they impact the patient (e.g., death or nonfatal myocardial infarction versus new-onset atrial fibrillation after CABG surgery, respectively). Finally, outcomes may be unrelated to the medical care rendered (4) when they are accepted as a normal consequence of a procedure in a certain fraction of patients. Microembolic events, for example, are known to be an unpreventable consequence of the use of extracorporeal circulation (i.e., cardiopulmonary bypass during cardiac surgery), while an embolic stroke involving a territory of brain circulation is not (Shann et al. 2008).

To support clinical decision-making, the outcomes identified for the medical care rendered should focus on the most clinically relevant end points or changes and may be judged in comparison with the best possible outcomes anticipated with the use of good processes and structures of care. Outcomes are often reported as rates, for example, the rate of a serious adverse event following a surgical procedure. For coronary artery bypass graft (CABG-only) procedures, for example, the national rate reported for 30-day operative mortality by the Society of Thoracic Surgeons (STS) for the period from 1996 to 2009 was 2.24% (Puskas et al. 2012). For an outcome to be useful, it must be compared across different populations that have the potential to achieve this desired end point and also compared to reference standards determined from the expected ideal outcome rate to be achieved. For example, the rates for Department of Veterans Affairs (VA) CABG 30-day operative mortality may be compared to non-VA/STS hospital rates (Public Law 99–166 1985), or these rates can be compared across time, by examining the metric for different periods. To be most useful as quality assessment metrics, it may be important to make comparisons of different outcome rates across key patient subgroups that did or did not receive specific treatments (e.g., rates of mediastinitis during the 30-day perioperative period for post-CABG patients treated versus not treated with a prophylactic antibiotic therapy). Moreover, goals for specific procedure-based outcomes can be proactively established, such as the STS national objective to achieve a 1% 30-day operative mortality rate for lower-risk CABG-only patients in the future (Mack 2012).

As part of a National Institutes of Health (NIH) initiative in 2004, a new repository entitled the Patient Reported Outcomes Measurement Information System (PROMIS®) system of measures was established. The PROMIS® metrics included patient self-reported mental, physical, and social health status as assessments of the patient’s perception of their overall well-being. The PROMIS® surveys identified how patients reacted and described how patients felt during specific times during care received for a preestablished set of conditions (National Institutes of Health 2014). To evaluate treatment effectiveness, PROMIS® assessments can be used as primary or secondary end points in clinical studies.

Intermediate outcomes, as observations in the pathway that directly lead to the final longer-term outcomes, have also been commonly measured. Specifically, intermediate outcomes may be commonly associated with processes of care, as key steps in the journey to obtaining desired longer-term health states. For example, the current ischemic heart disease guidelines promulgated by the American Heart Association would recommend that CABG patients be discharged from the hospital receiving lipid-lowering medications. At discharge, the use of lipid-lowering medications can be documented, as well as the patient’s current total cholesterol level (as well as high-density lipoprotein and low-density lipoprotein subcomponents). As an important marker related to post-
CABG patient’s long-term survival, therefore, both lipid-lowering medication use (as a process of care measure) and patient cholesterol measures following the CABG hospital discharge over time (as intermediate outcomes) may be assessed as part of a quality assurance program (Hiratzka et al. 2007).

Risk Adjustment

Although outcomes are considered by many to be the ultimate measure of quality of care, they are, to a large degree, influenced by the patient’s pretreatment condition and their unique characteristics (e.g., risk factors that influence outcomes). The goal of risk adjustment (a statistical analysis isolating the relationship between the outcomes of interest and the treatment effects of interest) is to control for the effects of other patient-relevant factors, although a patient’s pretreatment status may not be easy to measure. Specifically, patient risk factors may be defined as those characteristics that “place patients at an enhanced risk that an unfavorable event may occur” (Blumberg 1986). Generally, risk factors may be classified as modifiable (e.g., related to lifestyle or health behavior choices) or non-modifiable (e.g., related to the patients’ demographic characteristics, socioeconomic status, or their genetic propensity to incur disease-related adverse conditions). In evaluating a patient’s risk profile, it is of paramount importance to identify the patient’s severity of disease and comorbidities (e.g., other diseases that may impact a patient’s likelihood of experiencing an adverse event related to the primary disease being considered). In the realm of cardiac surgery risk adjustment of outcomes, patients’ demographic factors (e.g., age or gender) and socioeconomic status (e.g., highest educational level attained) along with the severity of their coronary disease and complexities of their comorbidities have been demonstrated to be related to risk-adjusted outcomes. Moreover, patient-based choices related to healthy behaviors (e.g., body mass index) and lifestyle (e.g., smoking status) may also influence their probability for having a major adverse event (Nashef et al. 2012).

Importantly, patient risk factors may predispose patients to appropriately receive different types of treatments or be excluded from consideration for a specific treatment or set of treatments. Based on these same risk characteristics, therefore, patients may be pre-selected by providers to be eligible to receive care or differential types of treatments. The patient characteristics that relate to the propensity of a provider (or set of providers) to select them for treatment may be considered in a slightly different modeling approach, related to a propensity analysis (Blackstone 2001, 2002). Based on a patient population’s likelihood to receive a specific treatment (which may also be related to their risk characteristics) using a propensity analysis, a risk-adjusted analysis can be performed to identify the quality of care rendered to the patient.

The risk-adjustment process, using a statistical modeling approach, calculates an “expected” risk (“E”) for each patient uniquely. Based on aggregating patient data, the sum of the “observed” adverse outcomes (“O”) may be compared to the sum of the “expected” risks for an adverse outcome to identify a patient subpopulation O/E ratio. Any specific patient subpopulation or provider-based O/E ratio that is statistically significantly higher than the value of 1.0 (i.e., where the preestablished confidence interval for the ratio of the “observed” event rate to the “expected” event rate lies entirely above 1.0) may be classified as a “high” outlier. Similarly, a “low” outlier can be identified based on an O/E ratio that is statistically significantly lower than the value of 1.0. In general, the risk-adjusted outcome “high” outliers are identified for more intensive quality reviews (e.g., expanded chart reviews or site visits) for potential quality challenges by oversight groups. In contrast, the “low” outliers may serve as potential opportunities to identify differential processes or structures of care that may be exemplary to serve as a “benchmark” for others, as a template to consider for quality improvement. In general, quality assurance processes may tend to use more generous confidence intervals (e.g., 90% confidence intervals) in order to be sensitive – that is, to screen in additional patients or provider
Fig. 3 Example: observed/expected ratio comparison for Department of Veterans Affairs coronary artery bypass graft 30-day operative mortality. [Plot by center (x-axis: Center, 1–44, including Center-X) of the observed/expected (O/E) ratio (left axis, 0–4; right axis, 3.2% per unit of O/E ratio, up to 12.8%), with high and low 90% confidence interval bounds, high and low outliers marked, and the adjusted VA mean of 3.0%.]
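
The O/E comparison described under Risk Adjustment and plotted in Fig. 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used by the VA or any oversight group: the function name `oe_ratio`, the sample data, and the interval formula (a normal approximation that treats each patient's outcome as an independent event occurring with the model's expected risk) are all assumptions made for this example, since the chapter does not specify how the confidence intervals are computed.

```python
from math import sqrt
from statistics import NormalDist


def oe_ratio(observed, expected, confidence=0.90):
    """Return (ratio, low, high, label) for one center or patient subgroup.

    observed: list of 0/1 outcomes (1 = adverse event, e.g., 30-day death)
    expected: list of model-predicted risks in (0, 1), one per patient
    The interval is an illustrative normal approximation (an assumption).
    """
    if len(observed) != len(expected) or not observed:
        raise ValueError("need one expected risk per observed outcome")
    o = sum(observed)   # "O": count of observed adverse outcomes
    e = sum(expected)   # "E": sum of per-patient expected risks
    # Variance of the event count if each patient had exactly the expected risk.
    var = sum(p * (1.0 - p) for p in expected)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    half_width = z * sqrt(var) / e
    ratio = o / e
    low = max(0.0, ratio - half_width)
    high = ratio + half_width
    if low > 1.0:
        label = "high outlier"      # interval lies entirely above 1.0
    elif high < 1.0:
        label = "low outlier"       # interval lies entirely below 1.0
    else:
        label = "not an outlier"    # interval encompasses 1.0
    return ratio, low, high, label


if __name__ == "__main__":
    # Hypothetical center: 200 CABG patients, each with a 3% expected risk,
    # and 12 observed deaths within 30 days.
    outcomes = [1] * 12 + [0] * 188
    risks = [0.03] * 200
    ratio, low, high, label = oe_ratio(outcomes, risks)
    print(f"O/E = {ratio:.2f} ({low:.2f} to {high:.2f}): {label}")
```

A 90% interval is the default here because, as the text notes, quality assurance screening tends to favor sensitivity; passing `confidence=0.95` narrows the set of centers flagged for review. A flagged center is only a candidate for chart review, not a confirmed quality problem.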
subgroups for closer quality assurance (QA) review activities (Shroyer et al. 2008).

As illustrated in Fig. 3 (which is an example report), VA medical center #3 would likely be identified in preliminary reviews as a “high-outlier” facility and may subsequently be screened for potential quality of care concerns, given that the observed rate for 30-day operative mortality is statistically significantly higher than the rate that would have been expected based on evaluating the patient risk characteristics for a CABG procedure. As documented, many quality assurance reports commonly will use a liberal p-value threshold (such as p < 0.10) to attempt to screen in more facilities for an in-depth quality review, casting a broader net for the next step in the review process. As O/E ratios (in and of themselves) are not definitive measures of quality of care, VA medical center #3 potentially might be selected for a detailed chart review and possibly a site visit (pending the results of the chart review) to explore for possible quality of care challenges. In contrast, center X had no statistically significant difference identified between their O and E rates, indicating no need for further quality investigation related to this specific end point. Finally, there are no “low-outlier” facilities in this example, as the confidence intervals for the O/E ratios for facilities #38–44 encompass the value of 1.0. If, however, there were “low outliers” identified, then these may be facilities to explore further with both in-depth chart reviews and/or site visits to identify “benchmark” care activities that may be useful to share and disseminate to other VA medical centers as “best practices.”

Though important, outcomes do have inherent limitations when used as quality of care metrics. Outcomes only indirectly provide information that a potential challenge may exist related to quality of care, but generally outcomes do not identify the specific actions needed to improve the quality of patient care. Moreover, outcomes do not usually provide an adequate level of information to guide the required changes as “action items” that can be
taken by providers directly. Hence, Donabedian’s triad assessment for quality of care, which complements outcomes with processes and structures, is required.

Uncertainty

For many quality-of-care endeavors, there is no adequate understanding of the relative impact of the patient risk factors upon adverse outcomes, nor adequate understanding of what might be the natural course of events had the patients not received any treatment or an alternative course of treatment. With a variety of care alternatives often available, the best approach to address a patient’s unique risk factor profile is not always clear. For example, in treating patients with ischemic heart disease, there is strong evidence suggesting CABG to be the best care strategy for patients with two- or three-vessel disease. However, the situations where medical management should be used to optimize long-term survival versus manage angina symptoms versus a revascularization may not be completely clear, particularly for high-risk patient subgroups (e.g., patients with two prior heart surgical procedures, as well as current severe angina symptoms). Given that clinical guidelines may provide evidence-based care strategies for some but not all patient subpopulations (particularly the highest-risk patient subgroups), compliance with state-of-the-art evidence provides an important indicator of quality of care – that is, a process-based assessment to augment the risk-adjusted outcomes assessments that may be coordinated. Unfortunately, there is not always an adequate evidence basis to coordinate guidelines: a recent evaluation identified that for the current ACC/AHA guidelines promulgated from 1994 to 2008, only 11% of the guideline recommendations were based on rigorous scientific, high-quality data-driven evidence (based on a review of 53 guidelines on 22 topics, with a total of 7,196 recommendations evaluated) (Tricoci et al. 2009).

To improve quality of care, it is important not only to identify and to monitor outcomes but also to subject these risk-adjusted outcome reports to critical review by the academic, industry, patient, and public targeted audiences. Over the past two decades, there has been an increasing emphasis placed on improving the public transparency as well as sharing reports of risk-adjusted provider-specific and facility-specific outcomes. As a case in point, the Society of Thoracic Surgeons has partnered with Consumer Reports to provide online provider-specific outcome reports, with risk-adjusted outcomes (The Society of Thoracic Surgeons 2012). Given that the availability of risk-adjusted outcomes information is increasing, it will be very interesting to observe the changes in both referral patterns and patient-provider choices that may occur over time in cardiac surgery utilization rates, revealing to what degree changes in patient patterns in obtaining care may be or may not be related to the use of risk-adjusted outcomes-based reports.

Emphasizing the clinician’s role in quality improvement, Dr. Donabedian noted that “An ideal physician is defined as one who selects and implements the strategy of care that maximizes health status improvement without wasted resources” (Donabedian et al. 1982). Toward this goal, new quality of care metrics may be added to evaluate “timeliness” of care rendered. For example, Dr. Boris Sobolev and his Canadian-based research team have forged the way to identify patterns in surgery wait times, evaluating the impact of the timeliness of care rendered for patients upon both their short-term and longer-term outcomes (Sobolev and Fradet 2008). Dr. Sobolev has also done similar research in other surgical fields (e.g., general surgery and orthopedics) that has demonstrated that longer wait times do appear to have detrimental effects on patient outcomes across a variety of surgical fields and procedures (Sobolev et al. 2003; Garbuz et al. 2006). Moreover, the referral patterns related to the risk-adjusted outcomes may be stratified based on wait time delays, taking into consideration the patient’s disease-related care processes – not just focusing on a patient’s single cardiac surgical care encounter. Although early in the evolutionary process, the current focus of quality of care, which uses the patient encounter as the primary unit of analysis, is beginning to
120 A. L. W. Shroyer et al.
transition to a disease management focus (e.g., evaluating the care provided related to the patient’s ischemic heart disease) and toward a patient-based holistic health perspective (Fihn et al. 2012).

Implementation of VA National Quality Improvement Programs

In 1972, the Department of Veterans Affairs (VA) established the Cardiac Surgery Consultants Board (CSCB) to provide quality assurance oversight for all VA-based cardiac surgery programs. Initially, the Cardiac Surgery Consultants Board review focused on evaluating descriptive reports of observed mortality cases, as well as monitoring rates for both mortality and major morbidity outcomes. Chart audits and site visits were performed by the Cardiac Surgery Consultants Board to assure that minimum standards for quality of cardiac surgery were met by means of a peer-review process (Veterans Health Administration 2008).

In 1985, the Health Care Financing Administration (HCFA) release of hospital report cards raised the public’s awareness of the wide variations in surgical outcomes reported across hospitals. Additionally, the Administration Health Care Amendments Act was passed, requiring that the VA establish a new quality assurance program which would identify significant deviations in risk-adjusted and unadjusted mortality and morbidity rates for surgical procedures when compared with prevailing national rates (Public Law 99–166 1985). Accordingly, the VA also had to determine if any discrepancies that were identified were related to differences in the quality of the VA-based healthcare services (Grover et al. 1990).

To address these legislative requirements, Drs. Hammermeister and Grover implemented in 1987 a new program entitled the “Continuous Improvement in Cardiac Surgery Program” (CICSP), gathering data related to each cardiac surgical patient’s unique set of risk factors, surgical procedural details, and 30-day operative death outcomes. In December 1987, the first risk-adjusted reports for 30-day operative mortality were produced; these were further refined in June 1990. With the VA CICSP fully implemented, the first risk-adjusted outcomes reports (focused on mortality and major perioperative complications) were produced, comparing the performance across all VA-based cardiac surgery programs.

Before the end of 1990, the CICSP data form (originally composed of 54 elements on a single sheet of paper) with associated definitions for risk, procedure-related, and outcome variables was mandated nationally by the VA as a new quality assurance requirement for all cardiac surgery programs. Based on the CICSP endeavor, a new noncardiac surgical quality improvement program, entitled the National Surgical Quality Improvement Program (NSQIP), was initiated in 1991 by Drs. Shukri Khuri and Jennifer Daley (Khuri et al. 1998). Expanding the focus to include a diversity of general surgical procedures, the VA NSQIP initiative partnered with the CICSP to obtain funding for local nurse or data coordinators to prospectively gather the patient preoperative risk characteristics, the detailed surgical processes of care, and the mortality and perioperative morbidity-related outcomes needed to coordinate risk-adjusted mortality reports. Similar to the CICSP oversight coordinated by the Cardiac Surgery Consultants Board, the NSQIP established an Executive Committee (EC) with key analytical support coordinated by Dr. William Henderson. Working in concert, the VA Central Office of Surgical Services (under the leadership and guidance of Drs. Gerald McDonald and Ralph DePalma) synchronized the CICSP and NSQIP efforts to provide data-driven reports routinely to both the national oversight committees (Cardiac Surgery Consultants Board and NSQIP Executive Committee) as well as to share these reports with local and regional surgical program leaders (including Cardiothoracic Division Chiefs, Chiefs of Surgical Services, Medical Center leaders, and VA Regional Office leaders). Both CICSP and NSQIP made their top priority the provision of good information to drive good local and regional decisions – to support internal VA-based self-assessment and self-improvement initiatives.
6 Health Services Information: Application of Donabedian’s Framework to Improve the. . . 121
With directives and continuous improvement communications coordinated by Drs. McDonald and DePalma, they were able to successfully provide the right information at the right time to the right individuals, as key decision-makers, to empower them to take the right actions to improve the safety and the quality of patient care.

As the first national comprehensive surgical quality improvement endeavor, the efforts of these key VA leaders, including Drs. Hammermeister, Grover, Shroyer, Khuri, Daley, and Henderson, radically shifted the quality-of-care paradigm away from crisis identification, which had focused on uncovering problem facilities or providers where urgent action was needed to address deficiencies in care. The new goal was to improve the quality of care for all facilities, with a focus on evaluating metrics comprehensively over time (Itani 2009a, Rodkey and Itani 2009). These data-driven quality improvement programs have made major impacts. The NSQIP program has identified risk factors for morbidity and mortality across a wide range of surgical subspecialties, including general surgery, orthopedics, neurosurgery, and many others (Itani 2009b). These risk factors have set the stage for continuous improvement in the field of surgery by providing tools with which to better evaluate the role of surgery in individual patients’ care and better identify patients for prophylactic measures or closer monitoring in the intra- and postoperative periods.

Having established their initial legacy, the CICSP and NSQIP programs provided an impetus, serving as models for others (such as the Northern New England Cardiovascular Consortium) to follow and to expand upon – with innovative enhancements (Malenka and O’Connor 1998).

The Processes, Structures, and Outcomes of Cardiac Surgery Study

During the early CICSP implementation period (1987–1991), however, it is important to realize that both Drs. Hammermeister and Grover recognized that there were inherent limitations in focusing on risk-adjusted outcome metrics as the ultimate quality of care metrics. Mortality, in and of itself, was a relatively rare event (under 3% mortality rate for CABG procedures). Given that the chart reviews and site visits performed by the VA Cardiac Surgery Consultants Board members often provided meaningful insights into the challenges that occurred with processes and structures of care, they initiated a new VA Health Services Research and Development Study entitled Processes, Structures, and Outcomes of Cardiac Surgery (PSOCS) to identify the important components of the cardiac surgical care rendered to veterans that might benefit from closer quality monitoring and reporting (Shroyer et al. 1995).

Funded in late 1991, the PSOCS study was initiated in May 1992 at 14 VA Medical Centers with active cardiac surgery programs (out of the 44 total VA cardiac surgery programs). The PSOCS study was a prospective cohort study, with funded research nurses and data support personnel. They gathered an extensive set of detailed data related to processes of care (including preoperative, intraoperative, postoperative, and post-discharge), structures of care related to the entire care provider team (e.g., team members’ educational background, specialty training, years of experience, and level of certification), and the environment in which the care was rendered. The environment was comprehensively assessed, including data about the key features of the operating room, recovery room, intensive care units, telemetry monitoring, staffing levels, and the quality and scope of oversight mechanisms. Additionally, the care provider interactions and communications were assessed via surveys. Finally, the nature and scope of surgical resident training were assessed, including the degree of supervision provided to the residents engaged in cardiac surgical patient care.

To complement the traditional mortality and morbidity outcome metrics routinely monitored by CICSP, a very broad array of outcomes was incorporated into the PSOCS study assessments. Focusing on the primary end points of death and major perioperative complications, outcome assessments were made at both 30 days following surgery or at the completion of the inpatient
of the medical staff organization and regulatory activities that were performed as part of the hospital’s quality integrating system.
4. The structures of care that may impact outcomes also included the number, education, experience, and specialty training of the physician provider team members (e.g., the surgeon, cardiologist, and anesthesiologist). Fundamentally, the provider team member characteristics, mix of providers providing care, and staffing levels, along with hospital and physician experience, were important structures that were hypothesized to impact patient outcomes, after holding patient-specific baseline risk factors constant (Shroyer et al. 1995).

Building on Dr. Donabedian’s paradigm for quality of care, the PSOCS study assumed that good processes and good structures of care were very likely to lead to improved patient outcomes. Uncovering problems with specific processes of care or structure-related weaknesses in the provider-based characteristics, the clinical care team mix, or facility-based characteristics could indicate targets for scrutiny, where different actions could be taken to improve care.

Methods of the PSOCS Study

Given that PSOCS outcomes included assessments at 6 months post-discharge, a series of “interval events” was monitored, including both health-related and non-health-related life events during this post-discharge time period. The sequence and timing of post-discharge events were gathered to evaluate the potential for interactions between post-discharge healthcare and non-healthcare events upon risk-adjusted 6-month patient outcomes of care.

Importantly, a comprehensive array of patient-specific risk factors was gathered. Risk factors were classified in four dimensions assessed at baseline, including severity of cardiac disease, comorbidities (i.e., noncardiac diseases), demographic and socioeconomic factors, and health status evaluations performed by both the care provider team and by the patients themselves (for both cardiac disease-specific and general health status domains). The risk factors were also analyzed to evaluate to what degree modifiable risk factors (e.g., patient’s alcohol use, smoking, and exercise habits) had a differential impact as compared to the non-modifiable risk factors (e.g., the patient’s age, gender, or race/ethnicity). Finally, a series of control variables was used (e.g., provider identifier, facility identifier, date/time sequencing variables) to coordinate the complex analyses required.

In total, there were 1,453 variables gathered for each PSOCS patient, including 249 outcome-related dependent variables (which were ultimately used to calculate three short-term and five intermediate 6-month outcomes) along with 1,102 independent variables (209 patient risk variables, 509 process-of-care variables, and 303 structure-of-care variables) and 23 interval events with 153 “control” variables used for analytical purposes. Across the 14 participating medical centers, the PSOCS study enrolled 3,988 patients during the period from 1992 to 1996, with follow-ups coordinated through early 1997 (O’Brien et al. 2004).

Due to the large number of variables, an initial task was data reduction, addressing the missing data and evaluating patterns of data completeness across surgeons and VA medical centers. Because intraoperative complications directly impacted outcomes, these were addressed analytically. As a first step, statistical risk models were built to predict the 30-day operative and 6-month outcomes. Within domains and coordinated in a nested analysis across sub-domains, the impact of processes of care upon risk-adjusted outcomes was evaluated. Specifically, processes of care related to operative duration (i.e., increased operative time), the use of inotropic agents, the use of transesophageal echocardiographic (TEE) monitoring and systemic temperature monitoring, and the use of hemoconcentration/ultrafiltration systems were powerful predictors of adverse composite outcomes. Since some of these processes of care may be initiated in response to adverse intermediate outcomes (e.g., intraoperative complications), a more complex analytical approach was
used to evaluate for the main effects (rather than interaction-related effects) for processes of care. Following these adjustments, the use of intraoperative transesophageal echocardiography and the use of hemoconcentration/ultrafiltration remained significantly associated with increased risk for an adverse outcome (O’Brien et al. 2004), which was likely driven by patient complexity.

Findings of the PSOCS Study

An important finding of this study, unanticipated in the original PSOCS design, was that, retrospectively, it is extremely difficult to differentiate planned versus unplanned processes of care. Intermediate outcomes, such as intraoperative complications, may cause providers to initiate new processes, previously unplanned, to address unforeseen challenges. Thus, differentiating between a planned process of care (i.e., a process of care that would be generally initiated for all patients) versus an unplanned process of care (i.e., a process of care that was initiated in response to an unforeseen challenge) is a critically important distinction for meaningful quality assessments. Quite simply, capturing the unplanned processes of care may be – in and of itself – an important indicator as a quality metric. With this important concept documented by PSOCS, it became clear that the use of state-of-the-art techniques and equipment for monitoring may provide for the early identification of potential adverse events.

To facilitate future quality-related research, the PSOCS study successfully built upon the historical literature basis, denoting that inotropic use, transesophageal echocardiography use, and the use of hemoconcentration/ultrafiltration appear to potentially impact post-CABG risk-adjusted outcomes. The PSOCS found that there was a consistent relationship documented between key times (i.e., cardiopulmonary bypass time or operative time) and risk-adjusted adverse outcomes, for which there is an association with the surgeon-specific and/or facility-specific practices. Not surprisingly, therefore, the PSOCS study identified that processes (e.g., operative times) were intertwined with structures of care (e.g., surgeon-specific years of experience). Moreover, the PSOCS study challenged the ability of research to isolate process-specific or structure-specific impacts on adverse risk-adjusted outcomes, and it identified the need to differentiate unplanned versus planned processes of care, an important advancement at the frontier of quality assessment. Finally, the PSOCS study documented that the statistical risk modeling approaches used may need to evolve, to be process- or structure-specific, in order to identify the unique risk factors that emerged (e.g., a new intraoperative complication) directing the change from planned to unplanned approaches (O’Brien et al. 2004).

The CICSP-X Program

With the PSOCS study’s data capture and preliminary analyses recently completed, the VA CICSP was dramatically expanded (entitled CICSP-X [as an expansion of CICSP], under the leadership of Dr. Shroyer) in 1997 as a clinical national quality improvement database to identify the interrelationships of risk factors with processes and structures of care, as well as to include a broader set of clinical outcomes (Shroyer et al. 2008). The CICSP-X program established the feasibility of coordinated multidimensional quality database reports to address a more comprehensive set of quality of care metrics, with a comprehensive “dashboard” of summary metrics reported for different quality of care dimensions, including a series of preestablished outcome metrics, as well as processes and structures of care measures.

In 1997, Department of Defense (DoD) and VA guidelines for Ischemic Heart Disease (IHD) became an impetus for additional changes to the VA Criteria and Standards, where new post-CABG hospital medication-use requirements were established (Veterans Health Administration and Department of Defense 1997). As a key process-of-care measure, CABG-only patients’ use of key evidence-based medical therapies was required for (1) lipid-lowering agents,
(2) beta-blockers for patients with a prior myocardial infarction, and (3) angiotensin-converting enzyme (ACE) or angiotensin II receptor blocker (ARB) medications for patients with a baseline low ejection fraction (40%). For CABG-only patients in high-risk subgroups, monitoring extended to additional guidelines, measuring compliance with standards including the use of diabetic agents for diabetic patients and antihypertensive medications for those with hypertension.

Due to the VA’s extensive Pharmacy Benefits Management (PBM) program (and outstanding leadership of the Pharmacy Benefits Management enterprise), the rates of guideline-based medication use could be identified for a CABG-only patient based on their preoperative risk profile. Although limited to identification of medications filled via the VA pharmacy (medications filled at non-VA pharmacies could not be easily ascertained), the compliance rates for all of the guideline-required medications (using an “all-or-none” evaluation) were routinely coordinated to assess overall cardiac surgery program performance. By improving compliance with Department of Defense/VA guidelines, the goal was to improve long-term survival post-CABG surgery, as well as to optimize veterans’ long-term health status and quality of life (Veterans Health Administration CARE-GUIDE Working Group et al. 1996).

Measuring Processes of Care

During the late 1990s, a wide variety of national watchdog agencies arose with the goal of providing quality of care oversight, such as the Leapfrog initiative (Milstein et al. 2000). The National Quality Forum was developed (Miller and Leatherman 1999) and published a set of performance indicators that were intended to serve as internal quality improvement metrics (National Quality Forum 2004). At that time, the National Quality Forum metrics represented the best data-driven evidence (or in the cases where evidence is lacking, the best clinical consensus) about the optimal approaches to provide cardiac surgical care to patients. As reference, these National Quality Forum quality metrics specified what would be anticipated “best practices” as well as established goals for surgeons to strive for in coordinating the care for their patients. For example, the use of internal mammary artery (IMA) conduits for a CABG graft placed to the left anterior descending (LAD) artery was generally preferred based on improved long-term survival rates, as well as reduced rates for repeat revascularization procedures. Since it may take slightly longer to take down the internal mammary artery, compared to harvesting a saphenous vein graft (SVG) conduit, this approach may not be advantageous for emergent patients. Similarly, elderly patients may not live long enough to document the internal mammary artery survival benefit (Ferguson et al. 2002). Based on the National Quality Forum standards combined with literature-based evidence and the feasibility of data to be captured, the CSCB identified as “best practice” the use of an internal mammary artery graft for CABG-only procedures, particularly emphasizing that this practice should be used for the subgroup of non-emergent patients (e.g., elective and urgent cases). Starting in 2008, therefore, the VA Criteria and Standards for Cardiac Surgery Programs specified that a CSCB review would be performed for cardiac surgery programs that performed fewer than 80% of their CABG-only procedures using internal mammary artery grafts during a 6-month reporting period. Figure 4, a sample report, illustrates the variability in internal mammary artery graft use across VA medical centers. Within this 6-month reporting period, center “X” had a CABG-only procedure internal mammary artery graft use rate of >80%. Hence, no quality reviews of center “X” would normally be required for this preestablished internal mammary artery graft use quality threshold.

In addition to assessing that the right processes of care were provided to the right patient, the VA CICSP-X reports were expanded to also evaluate cardiac surgical resource utilization, toward the goal of improving the efficiency of the VA care provided (Shroyer et al. 2008). The resource utilization metrics included evaluating the rates of the same-day surgery, the preoperative length of
stay, the operating room times, the postoperative length of stay, and the total length of stay for the veterans served. Because some patients underwent preoperative cardiac catheterizations during the CABG hospitalization and others did not, these two groups were considered separately, since this difference could impact both the rates for same-day surgery and the total length of stay.

Fig. 4 Example: rate of internal mammary artery graft use at Veterans Affairs Medical Centers

Among the important resource use metrics routinely evaluated by CICSP historically, the proportion of patients with same-day surgery, the preoperative length of stay (both for patients with and without a cardiac catheterization procedure during the CABG hospitalization), the postoperative length of stay, and the total length of stay were monitored. For example, Fig. 5 (which is a sample report) illustrates the types of resource consumption profiles provided by center. Within this example 6-month reporting period, center “X” might have had several areas that were flagged for potential efficiency reviews to examine practices of discharge-related processes and structures of care (e.g., early discharge planning and social work support systems).

Recent studies have attempted to further characterize the importance and utility of these types of resource utilization metrics. For example, the Virginia Cardiac Surgery Quality Initiative (VCSQI) database of over 42,000 patients undergoing CABG was recently analyzed to investigate the relationship between quality (as determined by various risk-adjusted measures of morbidity and mortality) and resource utilization (i.e., costs and length of stay) at individual hospitals. The VCSQI research team documented a strong correlation of risk-adjusted morbidity and mortality with length of stay but not directly with costs. This appears to support the importance of these types of process of care and outcome measures in assessing the value of services rendered at cardiac surgical centers. Further, it was shown that both preoperative and postoperative factors (e.g., comorbidities and complications, respectively) influence both length of stay and costs, reinforcing the importance of healthcare quality
initiatives in containing the costs associated with healthcare and increasing the value of the care rendered (Osnabrugge et al. 2014a, b).

Fig. 5 Example: Veterans Affairs coronary artery bypass grafting procedural resource consumption dashboard report (figure note: centers in the upper/lower quartiles and mid range are not outliers)

Monitoring Trends Over Time

Across all processes of care, structures of care, resource use, and risk-adjusted outcomes, reports for the most recent 6-month period, trends over time for the most recent 3-year period, and trends over time for the entire period monitored (from 1991 to the current reporting period) were coordinated. These “Time Series Monitors of Outcome” (TSMO) metrics were evaluated to identify if a cardiac surgery program might be a “high outlier,” “not an outlier,” or “low outlier” based on preestablished statistically driven thresholds (e.g., high and low outliers were generally more than two standard deviations beyond the mean).
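The outlier rule described above can be illustrated with a minimal sketch. It assumes that each program is summarized by a single observed/expected ratio and that the cutoffs sit exactly two standard deviations from the mean across programs; the text says only that outliers were "generally" more than two standard deviations beyond the mean, so the exact rule, the function name `classify_outliers`, and the ratios below are hypothetical and do not reproduce the VA's actual TSMO method.

```python
from statistics import mean, stdev


def classify_outliers(ratios, k=2.0):
    """Label each program by its distance from the mean across programs.

    ratios: dict mapping a program identifier to its metric value
            (assumed here to be an observed/expected ratio).
    k:      number of standard deviations used as the cutoff (assumed 2).
    """
    values = list(ratios.values())
    center = mean(values)
    spread = stdev(values)  # sample standard deviation; needs >= 2 programs
    labels = {}
    for program, value in ratios.items():
        if value > center + k * spread:
            labels[program] = "high outlier"
        elif value < center - k * spread:
            labels[program] = "low outlier"
        else:
            labels[program] = "not an outlier"
    return labels


# Hypothetical observed/expected mortality ratios for ten programs.
example_ratios = {
    "A": 0.95, "B": 1.00, "C": 1.05, "D": 0.98, "E": 1.02,
    "F": 1.00, "G": 0.97, "H": 1.03, "I": 1.00, "X": 2.00,
}
example_labels = classify_outliers(example_ratios)
for program, label in example_labels.items():
    print(program, label)
```

In this made-up example only program "X" falls above the upper cutoff. With few programs, a single extreme value inflates the standard deviation itself, which is one reason a fixed two-standard-deviation rule is best read as a screening flag for further review and not as a verdict on quality.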
Additionally, the trend line slope was evaluated for “upward” versus “downward” trending, versus “no trend identified.” The subgroup of VA cardiac surgery programs with upwardly oriented trends identified (i.e., a trend toward increasing adverse event rates or increased resource use or problems with guideline compliance) or “high-outlier” status (potential challenges in overall performance) was identified for intensive review, with potential site visits performed when these indicators clustered in a manner to raise potential quality of care concerns. Summary reports across all quality metrics (called “dashboards”) were developed as the number of quality indicators increased. These dashboards provided quick and easy identification of the subgroup of VA cardiac surgery programs with challenges identified. Similarly, a focus was placed on identifying exemplary performance, that is, when clusters of positive performance indicators were identified, particularly if positive trends over time were identified, as well as sustained positive performance over time (Marshall et al. 1998).

With the expanded focus on multidimensional quality reports, the original CICSP report had grown from six pages to over 200 pages. The use of dashboards addressed the information overload by providing summaries of the findings identified in these detailed process, structure, outcome, and resource reports (Shroyer et al. 2008). Based on the dashboard reports, very busy VA Central Office leadership team members, regional directors, hospital directors, and local VA cardiac surgery program directors could coordinate informed data-driven decisions to address any challenges identified, as well as work proactively to improve future VA cardiac surgery program quality of care. Thus, as an infrastructure quality reporting resource, the VA CICSP-X program set forth a dashboard framework that continues today as part of the consolidated VA Surgical Quality Improvement Program (VA SQIP), setting the VA as a leader in identifying, monitoring, and reporting quality for cardiac surgical care. As an example of this, Fig. 6 documents that there was a statistically significant downward trend observed for 30-day CABG operative mortality (a 2.1% reduction) from 1988 to 2007, indicative of continuing improvements over time for the CABG-only in-hospital surgical care and early post-discharge care provided.

Fig. 6 Example: Veterans Affairs time series monitors of outcome summary report evaluating trends in observed/expected ratios over time

As the VA historically invested substantial support at both the national level (in the
CICSP-X and NSQIP programs) and at the local level (for the local nurses or data coordinators used originally to gather the data required), it is important to pause to evaluate the return on this investment. Based upon VA findings to date, these quality improvement endeavors appear to have positively impacted short-term and longer-term rates of adverse cardiac surgical outcomes, with dramatic improvements and statistically significant downward-sloping trends in the mortality and morbidity rates over the 20+-year period reported (Grover et al. 2001; Shroyer et al. 2008). Based on the trends in risk-adjusted outcomes reported, moreover, these positive improvements do not seem to be related to the VA taking on easier cardiac surgical cases, as the risk profile for veterans basically remained the same (with the exception that the average age of the veterans served increased slightly over the period of time evaluated) (Shroyer et al. 2008). Moreover, the markers of VA efficiency similarly documented substantial improvements, with same-day surgery rates rising from 0% (1987) to 40% (1997).

Although no causal impact could be identified (as many changes in both surgical practices and medical management of ischemic heart disease occurred during these same periods), these positive trends in risk-adjusted outcomes support the continuation of quality improvement efforts and the expansion of these programs beyond cardiac surgical patient care.

Implementation of National Quality Improvement Programs

Under the leadership and guidance of Dr. Richard E. Clark, the Society of Thoracic Surgeons (STS) initiated the National Adult Cardiac Surgery Database (ACSD) in February 1991 with 330 surgeon members at 81 centers throughout the United States participating initially in this quality improvement endeavor (The Society of Thoracic Surgeons 2014c). Although the original goal was to initiate databases also for Congenital Heart Surgery (CHSD) and General Thoracic Surgery (GTSD), the development of these two databases was delayed until the full implementation of the STS Adult Cardiac Surgery Database was successfully coordinated.

As background, the purpose of the STS Adult Cardiac Surgery Database was to gather data on mortality, morbidity, and resource-use outcomes, as well as patient risk factors, to allow the evaluation of risk-adjusted cardiac surgical outcomes across providers and to report trends over time. By 1995, Dr. Clark had reported that the Adult Cardiac Surgery Database had grown to include 1,500 surgeons at 706 centers across 49 states, with decreasing postoperative length of stay trends documented and modest reductions in operative mortality rates in spite of increasing patient risk over time (Clark 1995).

By the late 1990s, a wide variety of STS initiatives had been coordinated related to the enhancement of the Adult Cardiac Surgery Database and the initiation of the Congenital Heart Surgery and General Thoracic Surgery endeavors. The STS databases were distributed to the participants by means of licensed software products via vendors, with centralized database management, analysis, and reporting functions coordinated by the Duke Clinical Research Institute (DCRI) team. Long-term goals were preestablished for the STS databases to become the main repositories to support improvements in local clinical decision-making, cardiac surgery program management, and policy decisions. Toward these goals, the existing database data forms and definitions were expanded to ensure that future comparisons might be coordinated across a broader array of outcomes (e.g., health-related quality of life, functional status, longer-term survival, and costs of care). Additionally, comparisons of cardiac surgical procedures to alternative treatments (e.g., cardiology-based interventions, such as the placement of stents) were planned.

By the early 2000s, the STS Adult Cardiac Surgery Database was viewed as the largest clinical repository of data available in the country, used to guide both health policy discussions and debates on reimbursement at congressional hearings. Database reports were generated semiannually, with local site reports compared to regional and national profiles. As STS National Database
130 A. L. W. Shroyer et al.
Committee members, Drs. Bruce Keogh (United Kingdom) and Paul Sargent (Belgium) worked with their European colleagues to build, upon the STS Adult Cardiac Surgery Database structure, a new European Association for Cardio-Thoracic Surgery Adult Cardiac Surgery Database (EACTS), transforming the STS template into a structure that could be used to support quality improvement efforts globally. As of late 2008, this database was reported to include over one million patient records from 366 hospitals across 29 countries in Europe (Nashef et al. 1999; Head et al. 2013).
The STS worked with the National Quality Forum and the American Medical Association’s Performance Improvement Physician’s Consortium to coordinate new quality of care metrics for national reporting from 1999 to 2001. These external collaborations, beyond the STS-based quality reporting endeavors, were very important to establish the external credibility of the STS Adult Cardiac Surgery Database. Even today, the National Quality Forum metrics reported for adult cardiac surgery include the STS Adult Cardiac Surgery Database-based metrics used widely in program-based quality of care assessments (The Society of Thoracic Surgeons 2014a).
Focused upon the importance of high-quality, accurate, and reliable STS data to generate reports, the STS Adult Cardiac Surgery Database Committee (chaired by Dr. Rich Prager) began a new quality improvement process in 2006, randomly selecting STS participating sites to audit and validate the number of cardiac surgical records and outcomes submitted by participating surgeons and sites. For a random sampling of Adult Cardiac Surgery Database participating sites from 2007 to 2013, each audited site’s submitted risk, operative procedure, and outcome data were compared with data obtained independently by an external audit company. The number of Adult Cardiac Surgery Database sites audited increased from 24 in 2007 (3% of sites) to 86 in 2013 (8% of sites). Over 92% of audited STS sites provided positive audit feedback, noting that the audit process had positively impacted their data accuracy. Across all risk, process of care, and outcome variable categories, the aggregate agreement rates ranged from 94.5% (2007) to 97.2% (2012), with improvements in the variable-specific agreement trends over time. Although the operative mortality agreement rate was reportedly lower in earlier years, the rate of reliability for death reporting has consistently remained above 95% since 2008. The STS external audit process established that Adult Cardiac Surgery Database data integrity is high, with data concordance reported at 97.2% (2012). By means of this external audit process, the STS Adult Cardiac Surgery Database can be interpreted with confidence, with independent external auditor verifications confirming that the data submitted by STS participating surgeons and centers are of the highest integrity (Winkley Shroyer et al. 2015, Member of STS Adult Cardiac Surgery Database Workgroup, “personal communication”).

Uncovering Quality Trends

Important quality improvement trends over time have been documented using the STS Adult Cardiac Surgery Database, including procedure-specific or population-specific reductions in the rate of adverse events reported. The overall rate of reoperations and, correspondingly, the rate of 30-day operative death have been documented to be diminishing (6.0% down to 3.4% and 6.1% down to 4.6%, respectively) over the 10-year period of 2000–2009 (The Society of Thoracic Surgeons 2014c). Importantly, the field of cardiothoracic surgery has documented substantial quality improvements over time, with diminishing rates of mortality and morbidity (Ferguson et al. 2002). As noted by Dr. Ferguson, remarkable strides to improve cardiac surgical care have been initiated by the surgeons (e.g., the use of new techniques for improved myocardial preservation) and the pharmaceutical industry (providing new medications). Other improvements include the implementation of care pathways, the formation of cardiac surgery dedicated teams (e.g., including a dedicated cardiac anesthesiologist), better approaches used for patient selection, as well as innovations to improve the efficiency of care (e.g., “fast-track” cardiac surgery early
6 Health Services Information: Application of Donabedian’s Framework to Improve the. . . 131
extubation protocols). Even though the population of cardiac surgical patients has grown older and sicker over time, risk-adjusted outcomes have improved. Another major change over time was the growing reliance on the STS Adult Cardiac Surgery Database by key national US-based decision-makers, including legislators. The STS Adult Cardiac Surgery Database was used to identify, monitor, report, and target future cardiac surgical improvements, shifting the national quality debates from a conceptual framework to data-driven patient care, program management, and policy discussions (Ferguson et al. 2002).
As a major transformation to multidimensional quality metrics, the STS has led the way in the development of composite scores, which were adopted by the National Quality Forum as new quality metrics in 2008. Specifically, Dr. David Shahian and the STS National Database Committee worked to coordinate an STS coronary artery bypass graft (CABG) composite score. The composite score comprised risk-adjusted mortality, risk-adjusted morbidity, a surgeon-related process of care metric (i.e., the use of the internal mammary artery as a conduit), and a facility-related process of care metric (i.e., the use of beta-blocker medications perioperatively) (O’Brien et al. 2007). In combination, these multidimensional composite metrics are used to categorize STS facilities and surgeons into “star ratings” for quality, based on a three-star, two-star, and single-star rating system, differentiating high- versus low-quality centers based on the composite metric (The Society of Thoracic Surgeons 2014b). Based on the success of the CABG-only composite score, an isolated aortic valve replacement (AVR) composite score was designed and implemented in 2012, as well as a combined aortic valve replacement-CABG composite score in 2014.
Most recently, the STS has added new modules to enhance focused quality endeavors for high-risk patient subgroups. For example, a new module related to prophylaxis and treatment of cardiac surgery patients who experience atrial fibrillation was added. As atrial fibrillation is a very common post-cardiac surgical complication, its prevention and early treatment are important quality considerations. Studies on atrial fibrillation have demonstrated that certain prophylactic measures (e.g., amiodarone, beta-blockers, magnesium, atrial pacing) do significantly reduce the rate of postoperative atrial fibrillation after cardiac surgery, as well as shorten hospital stays and decrease the cost of hospital care by over $1,200. No significant effects on mortality or the incidence of stroke have been demonstrated, however (Arsenault et al. 2013). Similarly, a new module related to documentation of the details of cardiac anesthesiology was added in July 2013 to identify the anesthesiology-related processes of care that may be targeted for future quality improvement initiatives (The Society of Thoracic Surgeons 2013). Most importantly, the focus on STS cardiac and thoracic procedural outcomes has been shifted to evaluate long-term outcomes, such as long-term survival. Toward this goal, database matches with the national death registry were performed, with the first long-term follow-up risk models predicting survival completed in 2012.

The Michigan Society of Thoracic and Cardiovascular Surgeons Quality Collaborative

The Michigan Society of Thoracic and Cardiovascular Surgeons Quality Collaborative (MSTCVS-QC), as an example of a regional STS initiative, is led by Dr. Richard Prager. The MSTCVS-QC is a consortium of 33 cardiac surgery programs throughout the state of Michigan focused on identifying intraoperative and postoperative opportunities to improve the quality of cardiac surgical care. As one of their recent endeavors, they examined the use of blood transfusions as a potential quality of care metric, examining the relationship between blood product use and clinical outcomes. The MSTCVS-QC found that quality collaborative educational approaches may have very positive impacts, as the blood product utilization was documented to decrease dramatically after routine quarterly reporting of program-identified transfusion rates was implemented. The quarterly MSTCVS-QC incorporated very frank
discussions about the potential adverse effects (i.e., increased risk of mortality and morbidity) associated with transfusions. Under the leadership of Dr. Prager, the Michigan team’s persistent and continued focus on this topic has dramatically revised clinical practice and enhanced blood product conservation approaches used throughout the state of Michigan (Paone et al. 2013).
Another recent MSTCVS-QC endeavor looked at how to reduce hospital-acquired infections (HAI) related to CABG procedures. Hospital-acquired infections include complications such as pneumonia, sepsis, septicemia, wound-related infections, as well as other infections reported. Since early 2008, Medicare has not reimbursed hospitals for post-CABG mediastinitis-related treatments, as infections (such as mediastinitis) are perceived to be directly related to a lower quality of surgical care provided during the initial CABG hospitalization. Interestingly, Dr. Prager and his MSTCVS-QC colleagues found that on average 5.1% of CABG patients developed hospital-acquired infection postoperatively. Moreover, there was a tremendous variation in the reported rates of post-CABG hospital-acquired infections (ranging from 0.9% to 19.1%). Differences in cardiac surgery program-based patient risk characteristics did not account for much of this dramatic difference in program-based hospital-acquired infection rates observed. Within this analysis, four centers appeared to be high outliers (i.e., had a hospital-acquired infection observed-to-expected (O/E) ratio that was statistically significantly higher than 1.0). Based on in-depth evaluations of the CABG care rendered at these four “high-outlier” centers, the MSTCVS-QC team concluded that the largest variations were found for pneumonia and multiple infection end points. Based on their reviews, they thought a multidisciplinary care team approach was needed to address the challenges identified, ideally to bridge across traditional specialty-based silos of care, facilitating heart patient team-based care approaches in the future. Working collaboratively as an STS regional society, therefore, the MSTCVS-QC team provides research on quality improvements that extend beyond the STS Adult Cardiac Surgery Database capabilities, enhancing the data-driven approaches used to assess and to improve cardiac surgical patients’ quality of care (Shih et al. 2014).

The American College of Surgeons’ Private Sector Initiative

As a separate endeavor, the American College of Surgeons (ACS) coordinated an NSQIP Private Sector initiative, building upon the VA-based historical work by Dr. Shukri Khuri’s team. The first step in this process was a feasibility study conducted in 1999 at three non-VA hospitals (University of Kentucky, University of Michigan, and Emory University) (Fink et al. 2002). Based on the initial success of this feasibility project, the NSQIP was expanded in 2001 to include 18 centers as part of a pilot project funded by the Agency for Healthcare Research and Quality (AHRQ) (Hall et al. 2009). Subsequently, the American College of Surgeons’ NSQIP pilot was expanded in 2004 to include other private hospitals’ reporting.
As background, the VA-based NSQIP had been documented to improve risk-adjusted mortality and morbidity across a diversity of surgical disciplines. For the period from 1991 to 2004, the surgical 30-day operative mortality rate improved by 31%, and the surgical 30-day perioperative major morbidity rate improved by 45% (Khuri 2005). During this time period, the VA NSQIP findings reported were deemed to be the “best in the nation” by the Institute of Medicine in 2003 for evaluating the quality of surgery across a broad range of surgical specialties (Khuri 2005). The “Patient Safety in Surgery” (PSS) study was initiated during 2001–2004 to evaluate the impact of a uniform quality improvement system and to compare VA and non-VA-based outcomes of care (where care-related details were gathered contemporaneously using a standardized set of data forms, definitions, and analyses). With nearly 185,000 surgical patient records gathered across 128 VA medical centers and 14 private sector hospitals, there were significant differences in the types of surgical procedures performed and patient baseline risk characteristics across the VA
versus non-VA hospitals. In spite of these differences in patient risk factors and procedures performed, the O/E ratios for 30-day operative death were remarkably similar between the VA and non-VA facilities (correlation coefficient = 0.98). Similar to the VA trends identified earlier, the non-VA private sector hospitals had an 8.7% decrease in major perioperative complications over the 3 years of the study, documenting an important and substantive quality improvement (Khuri et al. 2008).
The Agency for Healthcare Research and Quality provided a grant to Dr. Khuri’s team, based in part upon these promising findings, to conduct the “Structures and Processes of Surgical Care Study” in late 2003 to relate the processes and structures of surgical care to postoperative risk-adjusted outcomes. For this NSQIP-based endeavor, surveys were sent out to the 123 VA sites and 14 private sector sites that participated in the Patient Safety in Surgery study earlier. The survey included many questions, but specifically asked for information as to the organization of the preoperative, intraoperative, and postoperative care services. Additionally, information was gathered on hospital-specific surgical program-based characteristics such as surgical program size, surgeon-specific volumes at the VA and non-VA affiliates, patterns in surgical staffing ratios, the nature of the organizational structure, the use of local facility-based quality improvement efforts, the types of novel equipment/technology available (e.g., ultrasonography used in the operating room, the use of a harmonic scalpel, the use of radio-frequency ablation, or availability of ultrasound-guided aspiration devices), available information systems, the use of coordination/communication processes, as well as residency training program characteristics. The published results from the VA-based surveys (with responses sent back by the local Chiefs of Surgical Service) identified that there were tremendous variations in the processes and structures of general surgical care. As documented by the descriptive survey findings, the process and structure variables that appeared to be associated with risk-adjusted morbidity (14 variables) and risk-adjusted mortality (four variables) were preliminarily identified. Specifically, a higher O/E ratio (a potential marker for quality concerns) was found to be associated with several factors including anesthesia organized as a separate service, a larger number of operating rooms, more frequent reports of short staffing, and a higher rate for staff surgeons to be paid in part by the affiliated medical center. As a key process of care identified, changes in the anesthesia provider during the case (i.e., across the pre-, intra-, and postoperative time periods) were associated with worse risk-adjusted mortality rates. An inverse relationship between surgical volume (e.g., fewer cases per surgeon per month) and risk-adjusted morbidity (e.g., higher rates of perioperative complications) was identified. Overall, the self-reported survey findings for processes and structures of care appeared to be more strongly associated with the risk-adjusted morbidity rates observed, rather than risk-adjusted mortality rates documented. Importantly, the VA self-survey findings identified that a more integrated surgical service appeared to improve communication and coordination of surgical care, as well as the effectiveness of surgical team performance. Thus, these preliminary survey findings provided an impetus for the documentation of surgery-specific processes and structures of care, as well as the development of a more comprehensive set of quality metrics that are currently evaluated by NSQIP (Main et al. 2007).
In 2004, the Private Sector Study (conducted at 14 academic non-VA hospitals) was expanded and opened to other private sector hospitals. By 2008, the American College of Surgeons’ NSQIP market penetration for private hospitals included over 200 facilities with diverse characteristics located throughout the United States. The initial evaluation of the first 3 years (2005–2007) documented dramatic improvements in quality of surgical care rendered, with 66% of the hospitals documenting improved risk-adjusted mortality rates and 82% of the hospitals documenting improved risk-adjusted morbidity rates. In spite of the increasing patient risk characteristics reported (e.g., average patient age increased over time), the results were impressive, with 9,598 potential complications avoided at 183 private sector hospitals (Hall et al. 2009).
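The O/E (observed-to-expected) ratios used in the comparisons discussed in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the method used in the STS, NSQIP, or MSTCVS-QC analyses: the expected count is taken as the sum of hypothetical model-predicted risks, the interval uses Byar's approximation for a Poisson count (a choice made here, not drawn from the source), and all function names and example numbers are invented for illustration.

```python
import math


def expected_events(predicted_risks):
    """Expected event count: the sum of per-patient predicted probabilities."""
    return sum(predicted_risks)


def oe_ratio_with_ci(observed, expected, z=1.96):
    """Return (O/E, lower, upper) using Byar's approximation.

    Assumption: the observed count is treated as Poisson; z=1.96 gives an
    approximate 95% interval.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratio = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return ratio, lower, upper


def is_high_outlier(observed, expected, z=1.96):
    """Flag a center whose entire interval lies above 1.0."""
    _, lower, _ = oe_ratio_with_ci(observed, expected, z)
    return lower > 1.0


# Hypothetical center: 200 patients, each with a model-predicted risk of 5%,
# and 20 observed events.
risks = [0.05] * 200
expected = expected_events(risks)
ratio, lower, upper = oe_ratio_with_ci(20, expected)
print(f"O/E = {ratio:.2f} (approx. 95% CI {lower:.2f} to {upper:.2f})")
print("high outlier:", is_high_outlier(20, expected))
```

In this sketch a center is flagged as a high outlier only when the lower bound of the interval exceeds 1.0, which corresponds to the "statistically significantly higher than 1.0" criterion described for the outlier centers. A real analysis would derive the predicted risks from a validated risk-adjustment model.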
Although many factors likely contributed to these important and positive changes, the use of a data-driven quality improvement initiative was identified as a major factor that appeared to lead to better outcomes, cost savings, as well as improvements in safety across patient subgroups (Maggard-Gibbons 2014). Several publications were coordinated evaluating the usefulness of different types of process and structural interventions. The documented set of interventions effective in reducing the rate of adverse clinical outcomes included the use of protocols to manage postoperative blood glucose for diabetic patients, the use of venous thrombosis risk evaluations for high-risk patient subgroups, standardized approaches for wound care management, the use of physician order entry templates, the use of clinical pathways (e.g., a standardized approach to remove Foley catheters), enhanced tracking, and the use of more detailed patient tracking/monitoring tools for postoperative pulmonary management. Hence, Medicare payment reforms were initiated to provide positive reimbursement incentives for surgeons and hospitals to participate in national quality improvement reporting endeavors such as ACS-NSQIP and the STS national database endeavors. Most importantly, the use of clinical databases developed by surgeons for surgeons’ use in self-assessment and self-improvement endeavors gained momentum, with clinician-leaders rising to the ranks of government organizations (e.g., Dr. Jeff Rich, a cardiothoracic surgeon taking on a top-level leadership role with the Centers for Medicare and Medicaid Services) to advance the science of quality measurement and management.

Implementation Challenges: Dilemmas Faced by Quality Measurement Projects

In evaluating the optimal quality metric or set of metrics to use for a project, researchers must consider many factors. The purpose of the project as well as the type of questions raised will direct which types of assessments are most important (e.g., process, structure, and/or outcomes). To evaluate outcomes, there must be a plausible conceptual relationship (if not actual data) that would identify any other quality of care factors that could be associated with the outcomes selected for evaluation.
Different clinical fields are at different stages of maturation in selecting the “best” quality metrics. For surgical services, it has been demonstrated that the use of processes, structures, and risk-adjusted outcomes (as a comprehensive set of quality metrics) would be the most appropriate to consider. In other fields (e.g., psychiatry), however, simply defining the frequency of a broad array of clinical outcomes (along with the variety of risk factors that may be related to these outcomes) may be a more appropriate starting place for a project.
A good outcomes assessment instrument should be:

• Valid (reflect variations in quality that are consistent with expectations)
• Reliable (have reproducible findings across multiple raters for similar assessments of quality of care)
• Timely (measure a sufficient time sequence to evaluate the impact of medical care provided)
• Sensitive to change (reflect changes associated with the care impacts provided)
• Feasible to implement (reasonable to capture given time and cost constraints)
• Clinically relevant (reflect “best practice” and be useful to guide clinical decisions and/or actions)

(MacDermid et al. 2009). Additionally, the accurate documentation of risk factors is critical to allow risk-adjusted outcomes for meaningful comparison across provider subgroups, facilities, or patient subgroups (Shahian et al. 2004).
Although many advancements have been made in identifying approaches to implement Dr. Donabedian’s triad for assessing quality of patient care, many challenges remain that cause difficulties in achieving these goals. Specifically, there are issues related to handling missing data (Hamilton et al. 2010; Parsons et al. 2011). Although different statistical approaches can be
used to address missing data challenges, the distribution of missing data is unlikely to be random. Based on the nature and distribution of the missing data, therefore, it may be appropriate to clinically substitute specific values. For example, substituting negative findings for missing complications may be appropriate, as the medical chart does not uniformly document complications that did not occur. Where a statistical imputation approach is needed, there are ways to reduce the uncertainty associated with imputation. Whatever the approach used, the assumptions and methodological details should be documented. Where possible, sensitivity analyses should be conducted to evaluate the impact of the different imputation approaches upon the study-specific findings (as well as potential decisions to be drawn from these findings) (Hamilton et al. 2010).
Another challenge that arises in quality of care assessments is differentiating between planned and unplanned processes or structures of care, as well as to what degree these processes were coordinated in response to interim outcomes. For example, Dr. Guyatt and his team conducted a systematic review and meta-analysis of the factors associated with unplanned readmission for randomized, controlled, clinical trials of heart failure interventions (Gwadry-Sridhar et al. 2004). They found that targeted heart failure patients who received an educational intervention experienced a significantly decreased rate of unplanned hospital readmissions. As part of their review and analyses, they identified that unplanned readmission (as an adverse process of care that occurred relatively infrequently following targeted heart failure interventions) was a potential quality of care metric that was clinically relevant to monitor. However, unplanned readmission for congestive heart failure patients who received targeted educational interventions did not correspond with a decrease in longer-term patient survival (in the 6 months to 1 year post-intervention period). Thus, appropriate treatments coordinated at the time of the unplanned readmission may have mitigated any adverse impact upon the longer-term survival end point. In summary, unplanned processes of care that occur may be related to interim outcomes and may not necessarily result in adverse longer-term outcomes.
Additional difficulties in evaluating quality of patient care may be related to the uncertainty in documenting the sequence and timing of events. As a case in point, the NSQIP database was used to evaluate the impact of the timing of major perioperative complications upon mortality. Interestingly, early wound infections were associated with a higher risk of mortality, in spite of adjusting for patient risk factors and other complication burdens. Somewhat surprisingly, the early occurrence of cardiac arrest or unplanned intubation was associated with lower risk of mortality after adjustment for other factors. However, late occurrence of pneumonia, acute myocardial infarction, or cerebrovascular accident was associated with higher risk of mortality (Wakeam et al. 2014). Although these study findings were preliminarily based on NSQIP database records, the timing and sequence of perioperative complications do appear to matter when identifying the interrelationships of different adverse events, such as complications and mortality.
Finally, there are many factors that impact patient longer-term outcomes including both medical events and nonmedical factors that occur after the main medical intervention studied. Specifically, the VA PSOCS study evaluated the factors that influenced 6-month mortality and 6-month health-related quality of life (Rumsfeld et al. 2001, 2004). The variations in the occurrence of interval events following CABG discharge, including both medical and nonmedical life events, were substantial. Similarly, Dr. Murphy and colleagues found that living alone following CABG surgery was a major risk factor for readmission, when such solitary patients were compared to those who were married or lived with others (Murphy et al. 2008).

Summary

In summary, the goal of improving quality of care is an elusive one. The end point may appear to be in sight but, like a distant horizon, it cannot be reached. Great achievements have been
Sobolev B, Fradet G. Delays for coronary artery bypass surgery: how long is too long? Expert Rev Pharmacoecon Outcomes Res. 2008;8(1):27–32.

Further Readings

Arsenault KA, Yusuf AM, Crystal E, Healey JS, Morillo CA, Nair GM, Whitlock RP. Interventions for preventing post-operative atrial fibrillation in patients undergoing heart surgery. Cochrane Database Syst Rev. 2013;1, CD003611.
Blackstone EH. Breaking down barriers: helpful breakthrough statistical methods you need to understand better. J Thorac Cardiovasc Surg. 2001;122(3):430–9.
Clark RE. The STS Cardiac Surgery National Database: an update. Ann Thorac Surg. 1995;59(6):1376–80; discussion 1380–1371.
Dowsey MM, Petterwood J, Lisik JP, Gunn J, Choong PF. Prospective analysis of rural–urban differences in demographic patterns and outcomes following total joint replacement. Aust J Rural Health. 2014;22(5):241–8.
Ferguson TB Jr, Coombs LP, Peterson ED. Internal thoracic artery grafting in the elderly patient undergoing coronary artery bypass grafting: room for process improvement? J Thorac Cardiovasc Surg. 2002;123(5):869–80.
Fihn SD, Gardin JM, Abrams J, Berra K, Blankenship JC, Dallas AP, Douglas PS, Foody JM, Gerber TC, Hinderliter AL, King 3rd SB, Kligfield PD, Krumholz HM, Kwong RY, Lim MJ, Linderbaum JA, Mack MJ, Munger MA, Prager RL, Sabik JF, Shaw LJ, Sikkema JD, Smith Jr CR, Smith Jr SC, Spertus JA, Williams SV, Anderson JL, F. American College of Cardiology Foundation/American Heart Association Task. 2012 ACCF/AHA/ACP/AATS/PCNA/SCAI/STS guideline for the diagnosis and management of patients with stable ischemic heart disease: a report of the American College of Cardiology Foundation/American Heart Association task force on practice guidelines, and the American College of Physicians, American Association for Thoracic Surgery, Preventive Cardiovascular Nurses Association, Society for Cardiovascular Angiography and Interventions, and Society of Thoracic Surgeons. Circulation. 2012;126(25):e354–471.
Fink AS, Campbell Jr DA, Mentzer Jr RM, Henderson WG, Daley J, Bannister J, Hur K, Khuri SF. The National Surgical Quality Improvement Program in non-veterans administration hospitals: initial demonstration of feasibility. Ann Surg. 2002;236(3):344–53; discussion 353–344.
Garbuz DS, Xu M, Duncan CP, Masri BA, Sobolev B. Delays worsen quality of life outcome of primary total hip arthroplasty. Clin Orthop Relat Res. 2006;447:79–84.
Goldman S, Zadina K, Moritz T, Ovitt T, Sethi G, Copeland JG, Thottapurathu L, Krasnicka B, Ellis N, Anderson RJ, Henderson W, V. A. C. S. Group. Long-term patency of saphenous vein and left internal mammary artery grafts after coronary artery bypass surgery: results from a Department of Veterans Affairs Cooperative Study. J Am Coll Cardiol. 2004;44(11):2149–56.
Gwadry-Sridhar FH, Flintoft V, Lee DS, Lee H, Guyatt GH. A systematic review and meta-analysis of studies comparing readmission rates and mortality rates in patients with heart failure. Arch Intern Med. 2004;164(21):2315–20.
Hall BL, Hamilton BH, Richards K, Bilimoria KY, Cohen ME, Ko CY. Does surgical quality improve in the American College of Surgeons National Surgical Quality Improvement Program: an evaluation of all participating hospitals. Ann Surg. 2009;250(3):363–76.
Hamilton BH, Ko CY, Richards K, Hall BL. Missing data in the American College of Surgeons National Surgical Quality Improvement Program are not missing at random: implications and potential impact on quality assessments. J Am Coll Surg. 2010;210(2):125–39, e122.
Head SJ, Howell NJ, Osnabrugge RL, Bridgewater B, Keogh BE, Kinsman R, Walton P, Gummert JF, Pagano D, Kappetein AP. The European Association for Cardio-Thoracic Surgery (EACTS) database: an introduction. Eur J Cardiothorac Surg. 2013;44(3):e175–80.
Hiratzka LF, Eagle KA, Liang L, Fonarow GC, LaBresh KA, Peterson ED, C. Get With the Guidelines Steering. Atherosclerosis secondary prevention performance measures after coronary bypass graft surgery compared with percutaneous catheter intervention and nonintervention patients in the Get With the Guidelines database. Circulation. 2007;116(11 Suppl):I207–12.
Itani KM. A celebration and remembrance. Am J Surg. 2009a;198(5 Suppl):S1–2.
Itani KM. Fifteen years of the National Surgical Quality Improvement Program in review. Am J Surg. 2009b;198(5 Suppl):S9–18.
Karthik S, Fabri BM. Left internal mammary artery usage in coronary artery bypass grafting: a measure of quality control. Ann R Coll Surg Engl. 2006;88(4):367–9.
Khuri SF. The NSQIP: a new frontier in surgery. Surgery. 2005;138(5):837–43.
Khuri SF, Daley J, Henderson W, Hur K, Demakis J, Aust JB, Chong V, Fabri PJ, Gibbs JO, Grover F, Hammermeister K, Irvin 3rd G, McDonald G, Passaro Jr E, Phillips L, Scamman F, Spencer J, Stremple JF. The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg. 1998;228(4):491–507.
Lee PJ, MacLennan A, Naughton NN, O’Reilly M. An analysis of reintubations from a quality assurance database of 152,000 cases. J Clin Anesth. 2003;15(8):575–81.
MacDermid JC, Grewal R, MacIntyre NJ. Using an evidence-based approach to measure outcomes in clinical practice. Hand Clin. 2009;25(1):97–111, vii.
Mack MJ. If this were my last speech, what would I say? Ann Thorac Surg. 2012;94(4):1044–52.
Maggard-Gibbons M. The use of report cards and outcome measurements to improve the safety of surgical care: the American College of Surgeons National Surgical Quality Improvement Program. BMJ Qual Saf. 2014;23(7):589–99.
Magno G. The healing hand; man and wound in the ancient world. Cambridge, MA: Harvard University Press; 1975.
Main DS, Henderson WG, Pratte K, Cavender TA, Schifftner TL, Kinney A, Stoner T, Steiner JF, Fink AS, Khuri SF. Relationship of processes and structures of care in general surgery to postoperative outcomes: a descriptive analysis. J Am Coll Surg. 2007;204(6):1157–65.
Malenka DJ, O’Connor GT. The Northern New England Cardiovascular Disease Study Group: a regional collaborative effort for continuous quality improvement in cardiovascular disease. Jt Comm J Qual Improv. 1998;24(10):594–600.
Marshall G, Shroyer AL, Grover FL, Hammermeister KE. Time series monitors of outcomes. A new dimen-

National Institutes of Health. PROMIS: Patient Reported Outcomes Measurement Information System. 2014. Retrieved 25 Oct 2014, from http://www.nihpromis.org/
National Quality Forum. National voluntary consensus standards for cardiac surgery. Washington, DC: National Quality Forum; 2004.
National Quality Forum. NQF patient safety terms and definitions. Washington, DC: National Quality Forum; 2009.
Nawal Lutfiyya M, Bhat DK, Gandhi SR, Nguyen C, Weidenbacher-Hoper VL, Lipsky MS. A comparison of quality of care indicators in urban acute care hospitals and rural critical access hospitals in the United States. Int J Qual Health Care. 2007;19(3):141–9.
Nielsen ME. The legacy of Ernest A. Codman in the 21st century. J Urol. 2014;192(3):642–4.
Norcini JJ, Boulet JR, Dauphinee WD, Opalek A, Krantz ID, Anderson ST. Evaluating the quality of care provided by graduates of international medical schools. Health Aff (Millwood). 2010;29(8):1461–8.
O’Brien SM, Shahian DM, DeLong ER, Normand SL, Edwards FH, Ferraris VA, Haan CK, Rich JB, Shewan CM, Dokholyan RS, Anderson RP, Peterson
sion for measuring quality of care. Med Care. 1998;36 ED. Quality measurement in adult cardiac surgery:
(3):348–56. part 2 – statistical considerations in composite measure
Mavroudis C, Mavroudis CD, Jacobs JP, Siegel A, scoring and provider rating. Ann Thorac Surg. 2007;83
Pasquali SK, Hill KD, Jacobs ML. Procedure-based (4 Suppl):S13–26.
complications to guide informed consent: analysis of Osnabrugge RL, Speir AM, Head SJ, Jones PG,
society of thoracic surgeons-congenital heart surgery Ailawadi G, Fonner CE, Fonner E Jr, Kappetein AP,
database. Ann Thorac Surg. 2014;97(5):1838–49; dis- Rich JB. Cost, quality, and value in coronary artery
cussion 1849–51. bypass grafting. J Thorac Cardiovasc Surg.
Miller T, Leatherman S. The National Quality Forum: a 2014a ;148(6):2729-35.
‘me-too’ or a breakthrough in quality measurement Osnabrugge RL, Speir AM, Head SJ, Jones PG,
and reporting? Health Aff (Millwood). 1999;18 Ailawadi G, Fonner CE, Fonner Jr E, Kappetein AP,
(6):233–7. Rich JB. Prediction of costs and length of stay in
Milstein A, Galvin RS, Delbanco SF, Salber P, Buck Jr coronary artery bypass grafting. Ann Thorac Surg.
CR. Improving the safety of health care: the leapfrog 2014b;98(4):1286–93.
initiative. Eff Clin Pract. 2000;3(6):313–6. Ozker E, Saritas B, Vuran C, Yoruker U, Ulugol H, Turkoz
Murphy BM, Elliott PC, Le Grande MR, Higgins RO, R. Delayed sternal closure after pediatric cardiac oper-
Ernest CS, Goble AJ, Tatoulis J, Worcester ations; single center experience: a retrospective study. J
MU. Living alone predicts 30-day hospital readmission Cardiothorac Surg. 2012;7:102.
after coronary artery bypass graft surgery. Eur J Paone G, Brewer R, Likosky DS, Theurer PF, Bell GF,
Cardiovasc Prev Rehabil. 2008;15(2):210–5. Cogan CM, Prager RL, T. Membership of the Michigan
Nashef SA, Roques F, Michel P, Gauducheau E, Society of and S. Cardiovascular. Transfusion rate as a
Lemeshow S, Salamon R. European system for cardiac quality metric: is blood conservation a learnable skill?
operative risk evaluation (EuroSCORE). Eur J Ann Thorac Surg. 2013;96(4):1279–86.
Cardiothorac Surg. 1999;16(1):9–13. Papanikolaou PN, Christidi GD, Ioannidis JP. Patient out-
Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, comes with teaching versus nonteaching healthcare: a
Goldstone AR, Lockowandt U. EuroSCORE II. Eur J systematic review. PLoS Med. 2006;3(9), e341.
Cardiothorac Surg. 2012;41(4):734–44; discussion Parsons HM, Henderson WG, Ziegenfuss JY, Davern M,
744–735. Al-Refaie WB. Missing data and interpretation of can-
National Committee for Quality Assurance. About NCQA. cer surgery outcomes at the American College of Sur-
2014a. Retrieved 21 Nov 2014, from http://www.ncqa. geons National Surgical Quality Improvement
org/AboutNCQA.aspx Program. J Am Coll Surg. 2011;213(3):379–91.
National Committee for Quality Assurance. Persistence of Public Law 99–166. Veterans’ Administration Health-Care
beta-blocker treatment after a heart attack. 2014b. Amendments of 1985. Public Law. 1985;99–166.
Retrieved 21 Nov 2014, from http://www.ncqa.org/ Puskas JD, Kilgo PD, Thourani VH, Lattouf OM, Chen E,
ReportCards/HealthPlans/StateofHealthCareQuality/ Vega JD, Cooper W, Guyton RA, Halkos M. The soci-
2014TableofContents/BetaBlockers.aspx ety of thoracic surgeons 30-day predicted risk of
6 Health Services Information: Application of Donabedian’s Framework to Improve the. . . 139
mortality score also predicts long-term survival. Ann The Society of Thoracic Surgeons. STS CABG Composite
Thorac Surg. 2012;93(1):26–33; discussion 33–25. Score. 2014b. Retrieved 30 Oct 2014, from http://www.
Rodkey GV, Itani KM. Evaluation of healthcare quality: a sts.org/sts-public-reporting-online/cabg-composite-
tale of three giants. Am J Surg. 2009;198(5 Suppl):S3–8. score
Rumsfeld JS, Magid DJ, O’Brien M, McCarthy Jr M, The Society of Thoracic Surgeons. STS National Database.
MaWhinney S, Scd ALS, Moritz TE, Henderson WG, 2014c. Retrieved 30 Oct 2014, from http://www.sts.
Sethi GK, Grover FL, Hammermeister KE, org/national-database
S. Department of Veterans Affairs Cooperative Study Tran C, Wijeysundera HC, Qui F, Tu JV, Bhatia
in Health Services: Processes and S. Outcomes of Care RS. Comparing the ambulatory care and outcomes for
in Cardiac. Changes in health-related quality of life rural and urban patients with chronic ischemic heart
following coronary artery bypass graft surgery. Ann disease: a population-based cohort study. Circ
Thorac Surg. 2001;72(6):2026–32. Cardiovasc Qual Outcomes. 2014;7(6):8, 35–43.
Rumsfeld JS, Ho PM, Magid DJ, McCarthy Jr M, Shroyer Tricoci P, Allen JM, Kramer JM, Califf RM, Smith Jr
AL, MaWhinney S, Grover FL, Hammermeister SC. Scientific evidence underlying the ACC/AHA
KE. Predictors of health-related quality of life after clinical practice guidelines. JAMA. 2009;301
coronary artery bypass surgery. Ann Thorac Surg. (8):831–41.
2004;77(5):1508–13. van Kasteren ME, Mannien J, Ott A, Kullberg BJ, de Boer
Shahian DM, Blackstone EH, Edwards FH, Grover FL, AS, Gyssens IC. Antibiotic prophylaxis and the risk of
Grunkemeier GL, Naftel DC, Nashef SA, Nugent WC, surgical site infections following total hip arthroplasty:
Peterson ED, S. T. S. w. o. e.-b. surgery. Cardiac sur- timely administration is the most important factor. Clin
gery risk models: a position article. Ann Thorac Surg. Infect Dis. 2007;44(7):921–7.
2004;78(5):1868–77. Veterans Health Administration. VHA handbook 1102.3:
Shann KG, Giacomuzzi CR, Harness L, Myers GJ, Paugh criteria and standards for cardiac surgery programs.
TA, Mellas N, Groom RC, Gomez D, Thuys CA, Washington, DC: Veterans Health Administration;
Charette K, Ojito JW, Tinius-Juliani J, Calaritis C, 2008.
McRobb CM, Parpard M, Chancy T, Bacha E, Cooper Veterans Health Administration and Department of
DS, Jacobs JP, Likosky DS. Complications relating to Defense. VA/DoD clinical practice guideline for the
perfusion and extracorporeal circulation associated with management of ischemic heart disease. Washington,
the treatment of patients with congenital cardiac disease: DC: Veterans Health Administration, Department of
consensus definitions from the Multi-Societal Database Defense; 1997.
Committee for Pediatric and Congenital Heart Disease. Veterans Health Administration CARE-GUIDE Working
Cardiol Young. 2008;18 Suppl 2:206–14. Group, Denver VA Medical Center CARE-GUIDE
Sharp LK, Bashook PG, Lipsky MS, Horowitz SD, Miller Coordinating Team and United States Veterans Health
SH. Specialty board certification and clinical outcomes: Administration Office of Quality Management, Denver
the missing link. Acad Med. 2002;77(6):534–42. VA Medical Center CARE-GUIDE Coordinating Team,
Shih T, Zhang M, Kommareddi M, Boeve TJ, Harrington United States Veterans Health Administration Office of
SD, Holmes RJ, Roth G, Theurer PF, Prager RL, Likosky Quality Management. Veterans Health Administration
DS, T. Michigan Society of and C. Cardiovascular Sur- CARE-GUIDE for ischemic heart disease. Washington,
geons Quality. Center-level variation in infection rates DC: Department of Veterans Affairs; 1996.
after coronary artery bypass grafting. Circ Cardiovasc VillaNueva CB, Ludwig ST, Shroyer AL, Deegan NI,
Qual Outcomes. 2014;7(4):567–73. Steeger JE, London MJ, Sethi GK, Grover FL,
Sobolev B, Mercer D, Brown P, FitzGerald M, Jalink D, Hammermeister KE. Variations in the processes and
Shaw R. Risk of emergency admission while awaiting structures of cardiac surgery nursing care. Med Care.
elective cholecystectomy. CMAJ. 2003;169(7):662–5. 1995;33(10 Suppl):OS59–65.
The Joint Commission. Ernest Amory Codman Award. Wakeam E, Hyder JA, Tsai TC, Lipsitz SR, Orgill DP,
2014. Retrieved 23 Oct 2014, from http://www. Finlayson SR. Complication timing and association
jointcommission.org/codman.aspx with mortality in the American College of Surgeons’
The Society of Thoracic Surgeons. Consumer Reports and National Surgical Quality Improvement Program data-
STS Public Reporting. 2012. Retrieved 27 Oct 2014, base. J Surg Res. 2014,193(1):77–87.
from http://www.sts.org/news/consumer-reports-and- Winkley Shroyer AL, Bakaeen F, Shahian DM, Carr BM,
sts-public-reporting Prager RL, Jacobs JP, Ferraris V, Edwards F, Grover FL.
The Society of Thoracic Surgeons. Adult Cardiac Anes- The society of thoracic surgeons adult cardiac surgery
thesia Module. 2013. Retrieved 30 Oct 2014, from database: the driving force for improvement in cardiac
http://www.sts.org/sts-national-database/adult-cardiac- surgery. Semin Thorac Cardiovasc Surg. 2015 Sum-
anesthesia-module mer;27(2):144–51. PubMed PMID: 26686440.
The Society of Thoracic Surgeons. NQF# 0696: The STS Yasa H, Lafci B, Yilik L, Bademci M, Sahin A, Kestelli M,
CABG Composite Score. NQF: Quality Positioning Yesil M, Gurbuz A. Delayed sternal closure: an effec-
System. 2014a. Retrieved 30 Oct 2014, from http:// tive procedure for life-saving in open-heart surgery.
www.qualityforum.org/QPS Anadolu Kardiyol Derg. 2010;10(2):163–7.
7 Health Services Information: Data-Driven Improvements in Surgical Quality: Structure, Process, and Outcomes
Contents
Introduction
Stakeholders for Surgical Outcome Assessment
Types of Data for Surgical Outcome Assessment
Existing Data Sources
Data Quality
Changes in Surgical Procedures and Practices Over Time
Individual Surgeon Variation (Preferences, Techniques, and Skills)
Timing of Complications
Limited Information on Socioeconomic Drivers of Health
Need for Linked Data
Data Management and Big Data
Structure-Process-Outcome Assessment in Surgery
Theoretical Framework of Quality Assessment in Healthcare
Structure
Process
Surgical Outcomes
Risk Adjustment
K. Noyes (*)
Department of Surgery, University of Rochester Medical
Center, Rochester, NY, USA
e-mail: katia_noyes@urmc.rochester.edu
F. J. Fleming · J. C. Iannuzzi
University of Rochester Medical Center, Rochester, NY,
USA
J. R. T. Monson
Florida Hospital System Center for Colon and Rectal
Surgery, Florida Hospital Medical Group Professor of
Surgery, University of Central Florida, College of
Medicine, Florida Hospital, Orlando, FL, USA
e-mail: john.monson.md@flhosp.org
“To Err is Human” concluded that healthcare in the United States is not as safe as it should be. One of the report’s most revolutionary conclusions was that the majority of medical errors in the United States did not result from individual recklessness. More commonly, errors are caused by faulty systems, processes, and underlying conditions that lead people either to make mistakes or to fail to prevent them. The report advocated reducing harm through system-based initiatives rather than increasing pressure on individual providers (Brown and Patterson 2001). A focus on outcomes is thus even more paramount in surgery, where any small slip can quickly lead to disastrous consequences.

While the IOM report led to some system-level improvements, including expansion of health insurance coverage through PPACA in 2010, many problems remained or even worsened. In 2013, the IOM convened a committee of experts to examine the quality of cancer care in the United States and formulate recommendations for improvement. Delivering High-Quality Cancer Care: Charting a New Course for a System in Crisis presented the committee’s findings and recommendations. The committee concluded that the cancer care delivery system is in crisis due to a growing demand for cancer care, increasing treatment complexity (including surgical procedures), a shrinking workforce, and rising costs (Levit et al. 2013).

While it is widely recognized and accepted that assessment of surgical quality and outcomes should be a continuous process alongside care delivery, there is no clear consensus on how, when, and what outcomes should be measured. The problem is fueled by the fact that the definition of quality changes depending on the stakeholder’s perspective. For instance, surgeons evaluate each other’s quality based on technical skills, board certifications, and morbidity that they perceive to be under their direct control, characteristics that are often invisible and hence meaningless to patients. Instead, patients prefer clinicians with excellent communication skills who are always on time, regardless of whether or not the surgeon is a board-certified Fellow of the American College of Surgeons (FACS). Similar discrepancies and misalignments can be observed with respect to surgical outcomes. The vast majority of surgical oncologists will consider clean margins as synonymous with being “cured of cancer,” despite the fact that a patient may still have to endure many months of exhausting and toxic chemotherapy and radiation, temporary or permanent colostomy, fatigue, depression, and undesirable cosmetic changes. Successful quality improvement in clinical practice requires a common vision, multidisciplinary plans, and cooperation among all involved stakeholders, across the spectrum of all clinical providers including healthcare administrators, payers, social services, community organizations, and patient advocates.

Hurtado et al. (2001) define quality as “the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge,” but such broad definitions can have limited direct applications. A more useful definition measures quality over six domains: effectiveness, timely access, capacity, safety, patient centeredness, and equity (Leatherman and Sutherland 2003). Within each of these domains, it is possible to measure various elements, and so from this paradigm, a picture of a service’s quality of care can be outlined. However, such comprehensive assessment can be too burdensome and thus impractical for frequent monitoring and real-time evaluation.

In addition, there have been significant efforts to identify and assess important elements of care pathways, rather than individual procedures, which may lead to better outcomes and higher quality (Donabedian 1966; Hurtado et al. 2001; Maxwell 1984; Schiff and Rucker 2001; Sitzia and Wood 1997). Many countries have made significant progress with the implementation of national quality programs (Department of Health Office 1995; Department of Health 2000) including NSQIP (Agency for Healthcare Research and Quality 2009; Australian Commission on Safety and Quality in Healthcare 2008; American College of Surgeons 2014a), but further research is required to accurately and affordably improve assessments of surgical quality.
develop a payment mechanism to facilitate such engagement.

Other community-based stakeholders may include medical societies, public health and safety providers and agencies, social and aging services, and educational organizations. Stakeholders outside of the healthcare system and not-for-profit world may include patient support groups and organizations, payers, large self-insured corporations, and business alliances who are also interested in improving overall community health at a lower cost (Blackburn 1983; Brownson et al. 1996; Group 1991; Fawcett et al. 1997; Goodman et al. 1995; Howell et al. 1998; Johnston et al. 1996; Mayer et al. 1998; Zapka et al. 1992; Roussos and Fawcett 2000). In Upstate New York, the Greater Rochester and Finger Lakes regions are well recognized for their long history of community-wide collaborations including University of Rochester Medical Center, Finger Lakes Health Systems Agency (FLHSA), Monroe County Medical Society (MCMS), Rochester Business Alliance, Rochester regional office of American Cancer Society (2014), local payers (e.g., Excellus Blue Cross Blue Shield), accountable care organizations, and others. The FLHSA is an independent community health planning organization working collaboratively with multi-stakeholder groups to improve healthcare quality and access and eliminate healthcare disparities in the nine-county Finger Lakes region. Its mission is to bring into focus community health issues via data analysis and community engagement and to implement solutions through community collaboration and partnership. It has become the convener and facilitator of multi-stakeholder community initiatives to measure and improve the health, healthcare, and cost of care. In the initial round of the CMMI Innovation Challenge, the FLHSA was awarded $26.6 million for the initiative “Transforming the Delivery of Primary Care: A Community Partnership.”

Excellus Blue Cross Blue Shield is a nonprofit health plan, whose mission is to work collaboratively with local hospitals, doctors, employers, and community leaders to offer affordable healthcare products. For instance, Excellus administers its managed care products for Medicaid-eligible individuals through its partnering organization, Monroe Plan. Over the years, Excellus partnered with many other community stakeholders (e.g., Kodak, MCMS, URMC) to lead several area-wide initiatives aimed at improving quality of care and population health and reducing unnecessary variation in care and overuse of services.

Types of Data for Surgical Outcome Assessment

Existing Data Sources

There are multiple types of medical data available, and each has its own set of complexities: while answering important questions, each also leaves gaps that require further analysis from alternative perspectives found through other data sources. Typical datasets comprise hospital discharges, claims, registries, and survey results. Administrative data include hospital discharge data or billing data as recorded and provided by the hospital itself. These datasets are highly dependent on local practices and can vary between institutions. Such data can be linked with other subject data, providing an in-depth chart review; however, they are limited to the cases performed at an individual hospital. Some states have statewide discharge census data, including California and New York (Hannan et al. 1994, 1995, 2012; CA Society of Thoracic Surgeons 2014). These datasets provide billing data at a larger scale, including ICD-9 diagnosis codes and charges (in contrast to claims paid out), with the ability to track hospital- and surgeon-level variation and to link subjects longitudinally within the state (Table 1).

Claims data are available at the national as well as local levels and include Medicare data, which can be linked to other datasets, and insurance claims from sources such as Excellus Blue Cross Blue Shield, large self-insured corporations (Xerox, Kodak), and data warehouses (Thomson Reuters). Registry data can be quite detailed, albeit specific to the registry’s purpose. Examples of registry datasets include tumor registries like SEER that can be linked to Medicare
Table 1 Types of data used to assess surgical outcomes, quality, and safety

| Types of data | Databases | Examples |
|---|---|---|
| Cancer registry | SEER, NCDB | (Mack et al. 2013; Rutter et al. 2013) |
| Hospital registry | Case series | (Sinclair et al. 2012; Aquina et al. 2014b) |
| Observational | SPARCS, statewide data, Medicare/Medicaid, UHC | (Rickles et al. 2013; Aquina et al. 2014a) |
| Randomized controlled trials | CEA/CAS (NASCET), colonoscopy trial, breast cancer Z0011 | (Ferguson et al. 1999; Grube and Giuliano 2001; Whitlock et al. 2008; Atkin 2003) |
| Cost data | PharMetrics, hospital billing, Medicare charges, Tufts Cost-Effectiveness Registry | (Iannuzzi et al. 2014b; Jensen et al. 2012; Tufts 2014) |
| Process measures | SCIP, WHO Surgical checklist, inpatient smoking, VTE prophylaxis | (The Joint Commission Core Measure Sets 2014a; American College of Surgeons, Commission on Cancer, Surgical Care Improvement Project 2014b; Safety 2008) |
| Satisfaction | HCAHPS, Press Ganey | (Systems 2014; Press Ganey Associates 2014) |
| Benchmarking | ACS-NSQIP observed to expected mortality ratio (United States, thoracic, transplant; United Kingdom, all surgeons), hospital compare, creating centers of excellence (Medicaid Centers of Excellence for breast cancer) | (Centers for Medicare and Medicaid Services 2014; Department of Health 2000; Cohen et al. 2009a, b; Medicare.gov 2014) |
| | AMA provider survey | (Etzioni et al. 2010, 2014) |
| | AHA (ICU/staffing/nursing) | (Nallamothu et al. 2006; Solomon et al. 2002) |
SEER Surveillance, Epidemiology, and End Results Tumor Registry; NCDB National Cancer Data Base; SPARCS New York Statewide Planning and Research Cooperative System; UHC University HealthSystem Consortium; SCIP Surgical Care Improvement Project; VTE venous thromboembolism; HCAHPS Hospital Consumer Assessment of Healthcare Providers and Systems; ACS-NSQIP American College of Surgeons National Surgical Quality Improvement Program; CMS Centers for Medicare and Medicaid Services; AHA American Hospital Association; AMA American Medical Association; ICU intensive care unit
for more robust analysis; NCDB, which expands cancer data beyond the identified cancer centers included within SEER; and the National Surgical Quality Improvement Program (NSQIP) registry, which samples approximately 20% of all cases performed at participating hospitals. Other registries include those maintained by provider organizations (AMA, AHA). Finally, survey data can provide the patient perspective that is lacking from other large dataset analyses. Two prime examples are the Medicare Current Beneficiary Survey and the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Survey.

The first database for surgical outcomes was developed in NYS for cardiothoracic surgery (Hannan et al. 1990). It led to substantial quality improvement, facilitated development of the field of quality assessment and risk adjustment in medicine, and challenged the traditional approach of confidential reporting of adverse events. Based on its success, this was expanded to the STS National Database established in 1989. The STS states that “physicians are in the best position to measure clinical performance accurately and objectively” (Surgeons 2014), serving as a mandate for surgeon participation in these initiatives.

While cardiac surgery has long maintained a similar database for tracking quality, this approach was expanded nationally to help improve surgical outcomes. The National Surgical Quality Improvement Program (NSQIP) has been a major development within the surgical community as it provides more detailed surgical information at a national level than was ever previously available. The main purpose of this program was to improve quality through benchmarking, where hospitals were given risk-adjusted data comparing their outcomes nationally to other hospitals of similar size. Based on the depth of data, numerous research studies have been conducted, describing surgical risk factors and comparing operative approaches. While this has been very useful for expanding our understanding of surgical quality as a whole, it was quickly realized that different operations needed specific in-depth data in order to design meaningful quality improvement strategies. One approach to providing more detailed data has been the rollout of procedure-targeted variables, which institutions can add to the traditional NSQIP data for additional cost. This allows a more detailed look at individual procedures. It was first made available with the release of the 2012 NSQIP dataset, and the impact remains to be seen. Targeted variables require expert consensus, which can be difficult to obtain and limited in scope. This in-depth approach also requires more resources, which limits participation.

Another specialty-specific approach includes the Organ Procurement and Transplantation Network (OPTN) database aimed at monitoring transplant programs nationally. This is monitored and run by the US Department of Health and Human Services (National Cancer Institute 2014). The desire for more detailed data has led to a number of subspecialty datasets modeled after NSQIP. A few examples include a vascular surgery-specific dataset, the Vascular Quality Initiative (2014), Pediatric NSQIP, and an endocrine surgery-specific dataset, CESQIP (Collaborative Endocrine Surgery Quality Improvement Collective 2014). The methods of data collection vary: NSQIP employs a clinical nurse reviewer, whereas CESQIP does not yet have the same infrastructure and requires the surgeon or the surgeon’s designee to input data.

Another approach has been the creation of regional collaboratives, which requires a high level of collaboration among academic and nonteaching hospitals alike. Regional collaboratives will likely play a role in decreasing unnecessary variability and tracking quality at a more manageable, regional level, where it is easier to implement change than at the national level. Thus far, the regional approach has been seen in both Michigan and Central New York. The central New York collaborative, called UNYSQI (Upstate New York Surgical Quality Initiative), has focused predominantly on colorectal surgery and more specifically on addressing readmissions. NSQIP allows for only 40 additional variables, and given this narrow limit, the questions addressed must be specific.

Participation in data collection programs is promoted as it meets criteria for both maintenance of certification (MOC) and the Physician Quality Reporting System (PQRS) of CMS (EHealth University: Centers for Medicare & Medicaid Services 2014). Maintaining credentials requires that providers evaluate their performance against specialty-established requirements, which must include national benchmarking. The MOC outlines six core competencies, one of which is practice-based learning and improvement. Part IV of the process for continuous learning includes practice performance assessment. For the American Board of Surgery, diplomates must participate in a national, regional, or local surgical outcome database or quality assessment program. The PQRS, a part of CMS, is the second specific incentive promoting the use of outcome data collection programs: it uses both negative payment adjustments and incentive payments to ensure that providers report quality data (Table 2).

Data Quality

A common saying in large database analysis is “garbage in, garbage out,” and while there are methods to account for missing data, extensive missing data points remain a major limitation. One approach might be to limit case inclusion to only those with a full set of data; however, this quickly limits patient inclusion. This approach may be appropriate for some major data points such as sex, where it can be assumed that if subject sex is not recorded then other variables are likely to be of questionable quality. Missing data may also be secondary to the data collection process. For instance, in NSQIP, preoperative laboratory values are gathered; however,
Table 2 Databases and outcomes used to assess surgical outcomes, quality, and safety

| Dataset | Description | Sample and outcomes |
|---|---|---|
| ACS-NSQIP http://site.acsnsqip.org/ | Maintained by the American College of Surgeons. Participation through annual fees by hospital | 30-day data based on postoperative outcomes. Provides benchmarking |
| Pediatric NSQIP http://www.pediatric.acsnsqip.org/ | Subset of overall NSQIP | 30-day follow-up for surgical procedures performed on pediatric patients |
| VQI (Vascular Quality Initiative) www.vascularqualityinitiative.org | Vascular procedure-specific data (including those performed by radiologists, cardiologists, and vascular surgeons). Follow-up through 1 year. Governed by the Society of Vascular Surgeons (SVS) Patient Safety Organization | 255 participating centers. Uses cloud computing to allow multiple users to enter data and does not depend on full-time data entry specialist. Can be integrated into electronic medical records |
| CESQIP (Collaborative Endocrine Surgery Quality Improvement Program) http://cesqip.org/ | Since 2012, through the American Association of Endocrine Surgeons (AAES) | Patient-centered data collection, ongoing performance feedback to clinicians, and improvement based on analysis of collected data and collaborative learning |
| STS National Database http://www.sts.org/national-database | Society of Thoracic Surgeons run program that makes quality scores available to institutions and the public at large. National data for research requires specific application to the STS and is not released to participating hospitals by virtue of inclusion in data gathering | Focuses on three areas: adult cardiac, general thoracic, and congenital heart surgery |
| The Surveillance, Epidemiology, and End Results (SEER) program funded by the National Cancer Institute http://seer.cancer.gov/about/overview.html | 1973–2011 cancer incidence and survival data from population-based cancer registries covering approximately 28% of the US population | Includes data on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and 12-month survival |
Hospital discharge data
Statewide Planning and Research Comprehensive all-payer data Patient-level data on patient
Cooperative System (SPARCS) reporting system. The system was characteristics, diagnoses and
California Patient Discharge initially created to collect information treatments, services, and charges for
Dataset on discharges from hospitals each hospital inpatient stay and
National Inpatient Sample (US) outpatient (ambulatory surgery,
http://www.hcup-us.ahrq.gov/ emergency department, and outpatient
nisoverview.jsp services) visit, and each ambulatory
Hospital Episode Statistics (UK) surgery and outpatient service visit to
http://www.hscic.gov.uk/hes a hospital extension clinic and
diagnostic and treatment center
licensed to provide ambulatory
surgery services
The Centers for Medicare & CMS is responsible for administering Data on acute, psychiatric and skilled
Medicaid Services (CMS) claims the Medicare, Medicaid, and State nursing inpatient admissions,
and survey data Children’s Health Insurance outpatient services, procedures and
http://www.resdac.org/cms-data/ Programs. CMS gathers and formats tests, use of prescription medications,
file-directory about Medicare beneficiaries, skilled nursing, durable medical
Medicare claims, Medicare providers, equipment, and hospice
clinical data, and Medicaid eligibility
and claims. CMS also collects
additional survey data on health
behavior and utilization Medicare &
(continued)
7 Health Services Information: Data-Driven Improvements in Surgical Quality: Structure. . . 149
Table 2 (continued)
Dataset Description Sample and outcomes
Current Beneficiary Survey (MCBS)
and satisfaction with care Consumer
Assessment of Healthcare Providers &
Systems (CAHPS)
American Hospital Association Hospital-specific data on 1,000 data fields covering
(AHA) Annual Hospital Survey approximately 6,500 hospitals and organizational structure, personnel,
http://www.aha.org/research/rc/ 400-plus systems hospital facilities and services, and
stat-studies/data-and-directories. financial performance
shtml
American Medical Association Established in 1906, current and Information about demographics,
(AMA) Physician Masterfile historical data for more than 1.4 practice type, significant education,
http://www.ama-assn.org/ama/ million physicians, residents, and training and professional certification
pub/about-ama/physician-data- medical students in the United States, on virtually all Doctors of Medicine
resources/physician-masterfile. including approximately 411,000 (MD) and Doctors of Osteopathic
page graduates of foreign medical schools Medicine (DO)
there remains extensive variation in timing of preoperative labs, as well as whether a specific
blood level is checked at all. One particular example is albumin level. Albumin level has
demonstrated associations with nutrition and overall health status. Studies have shown
associations with surgical outcomes as well; however, this laboratory value is not always checked
preoperatively. In fact, there may be a bias of checking this value in patients that may be at
risk for malnutrition or have other major comorbidities. This fact may bias results leading to
concern about its inclusion in multivariable analysis, even though it holds clinical value. Some
suggest it should not be included at all, while others suggest it requires a more nuanced
approach. Albumin, for instance, is reported as a continuous variable, but can be transformed
into a binary variable using clinically meaningful cutoffs previously described as 3.5 g/dl. By
assuming all missing values fall within the normal range, one creates a differential
misclassification that underestimates the true effect as some in this group may in fact have low
albumin levels. Thus, if an observed association is found, it likely is true, albeit an
underestimate. The data can then still be useful for clinical decision making even though many
values are in fact missing. Another approach to this same problem can be assessing whether those
in the missing dataset are different with respect to the endpoint than the others. This is
specifically testing whether there is differential misclassification. If there is, then one can
treat the missing data group as its own categorical level without making any assumptions if there
is an observed effect compared to subjects with data. Another method includes imputation of data.
These methods are beyond the scope of this chapter, but briefly involve separate analysis
predicting that specific data point based on the subject’s other characteristics.
Missing data of the first type (missing sex) can be avoided through auditing processes. Many data
collection programs employ auditing processes to ensure quality data and sites are not included
if they demonstrate inability to conform to predetermined standards.
Another major limitation to all large datasets is changing variable definitions over time. While
this process is necessary to some extent as clinically meaningful definitions may change with
time, it can drastically limit the subject numbers available for analysis for that endpoint. One
such example is postoperative transfusion within NSQIP. Initially, the number of transfused units
was included intraoperatively and postoperatively defined as greater than 4 units. Researchers
were able to then describe this endpoint as major postoperative bleeding and specifically
describe the extent of intraoperative blood loss. This changed in 2011 when the number of
intraoperative units of blood was removed altogether and postoperative transfusion was changed to
2 units or more of packed red blood cells. The first limitation is the danger of merging datasets
across years without
understanding these changes. First, if ignored, researchers may erroneously code these missing
intraoperative transfusions as no transfusion given and make assumptions upon it which will
clearly be mistaken. Secondly, it poses a challenge in the second instance as the postoperative
transfusion variable in the newer dataset has a different clinical meaning. Two units of blood
can be given for merely low hematocrit levels with comorbidities meant to optimize patients and
no longer representing a postoperative bleeding event. These two variables of transfusion are not
comparable over time, given the changes limiting analysis.
Changes in Surgical Procedures and Practices Over Time
Other issues regarding data collection include the constantly evolving process of case definition
and even the addition of new surgical procedures over time. For instance, the change from ICD-9
to ICD-10 is looming, and how this will impact data collection remains to be seen. The nuanced
changes between the two systems will likely impact some areas more than others, and a deep
understanding of these nuances will be necessary to compare cases between these two time periods.
The last major ICD coding change was in 1975, and the medical arena has changed dramatically in
that time including the advent of the electronic record.
Some databases only include ICD-9 coding where numerous different procedures may be relevant for
repair of that diagnosis. For instance, appendicitis can be treated by an open approach making an
incision in the right lower quadrant or can be treated using laparoscopic techniques, using three
small incisions and a camera for appendix extraction. Where only ICD-9 codes are available, such
datasets lack discrimination, preventing comparison of operative approach.
The introduction of laparoscopic procedures is one example of how surgical procedures change over
time; while the first report of laparoscopic appendectomy was published in 1981, this practice
did not become ubiquitous until the turn of the century and now represents the preferred
technique (Korndorffer et al. 2010).
These changes can significantly impact research as each procedure has specific complications;
however, there may be limits in the available data due to changes not captured by the coding
systems. For instance, CPT coding does not capture robotic techniques, lumping them with
laparoscopic procedures. This has limited observational studies comparing or even tracking
robotics usage over the past decade. Another example of the limits of CPT coding is the absence
of transanal endoscopic microsurgery (TEMS) codes used for distal rectal cancer resections that
are of sufficiently minimal rectal wall invasion. This approach is a minimally invasive one that
spares the rectum and the sphincter allowing for essentially full rectal function in low-grade
tumors; however, they are lumped in with other rectal cancer resections which often include
complete rectal resections with end colostomy or loss of sphincter. The difference in quality of
life and even the types of complications are huge. While it clearly makes it impossible to
perform observational studies on TEMS within large datasets, it also adds variation and error
into any assumptions about outcomes after low rectal cancer resections. There are some ways to
exclude TEMS from a dataset by selecting cases where the tumor stage was sufficiently high to
make TEMS contraindicated; however, this does not help elucidate specifically the advantages of
TEMS. Another example where CPT coding fails is differentiating between some specific
laparoscopic approaches. Although open inguinal hernia repair has been a bread-and-butter surgical
operation, within the last decade, increasingly surgeons are applying their laparoscopic skills
to hernia repair. There are two available laparoscopic approaches: totally extraperitoneal (TEP)
or transabdominal preperitoneal (TAPP). The TAPP approach enters the abdominal cavity in standard
laparoscopic fashion repairing the hernia from the inside using tacks, whereas the TEP approach
enters a space above the peritoneum placing the mesh between layers and usually does not require
tacks to keep the mesh in place. Both approaches
may have different risk profiles and long-term sequelae; however, observational evaluation is
limited since there is no differentiation by CPT codes in the ICD-9 system.
There also remain many processes that are not coded in most databases. This includes many data
points that may impact outcomes, such as patient follow-up strategies, staffing, utilization of
trainees, and even postdischarge medications. While large datasets evolve, opportunities to
expand the data as research questions arise may be available. UNYSQI is one example where,
through the ACS-NSQIP, institutions can track their own specific data points which may help
answer specific questions.
The surgical field is constantly progressing, not just specifically with new procedures but also
with the introduction of entirely new specialties. For example, endocrine surgery is starting to
become a major surgical subspecialty; although not yet a board-certified specialty, the presence
of these more specialized surgeons may impact outcomes. Other major changes in surgery may also
impact outcomes, which have not been included in current databases. For example, resident work
hour restrictions by the ACGME continue to change and become increasingly strict. Previously, it
was not unheard of for surgical residents to work 100–120 h weekly, where now work hours are
capped at 80 per week and interns are prevented from taking 24-h call. These changes have
drastically changed patient coverage and in some cases required supplementing staffing through
advanced practice providers or moonlighters. These changes have not been tracked and it is
unclear how changing the workforce structure has impacted outcomes. Although controversial, this
question holds some urgency as more and more restrictions are being implemented. In fact, a new
randomized controlled trial will observe how these restrictions impact care; one arm of the trial
will require surgical residents to follow the new regulations, while the other will function
without work hour restrictions. However, such data is largely absent from current datasets.
Other major changes include the advent of telemedicine, and with robotics, even remote operations
are now possible: the first transatlantic cholecystectomy, the so-called “Lindbergh” operation,
was performed in 2001 (Marescaux et al. 2002). These changes were only possible through
improvements in electronic communication that decreased the lag time sufficiently to allow such
an operation.
The role that virtual communication will have in the future remains unclear, but will likely
increase in frequency in the coming decades. Currently, such approaches are not tracked; however,
including such practices in large healthcare databases may be useful in understanding their
uptake and impact on clinical care. Other adjunct advances also impact surgical care, although
largely unappreciated, such as major advances and availability in high-quality imaging. Where 20
years ago computed tomography was limited, it is now ubiquitous and high-quality scans are
available within minutes. These findings change the diagnostic paradigms and the quality of
surgical decision making, although availability of such high-quality CT scans is not included in
databases, even those that track whether CT scanning was done at all. Other technological
advances include intraoperative imaging through 3D laparoscopy and the development of new
instruments that make previously unthinkable operative approaches possible such as single
incision surgery or natural orifice transluminal endoscopic surgery that allows surgeons to
perform cholecystectomy through the vagina.
There are many other changes to the structure of healthcare that may drastically impact outcomes
including advances in patient monitoring or quality of care in the intensive care unit. While it
would be onerous to include all of these changes into any given dataset, it is important to
remember the many forces that impact outcomes. A projectile in physics has many forces that alter
its course, such as friction, rotation, and wind forces, and many of these forces can be ignored
to provide the overall picture, using the major forces of velocity and gravity on the object to
provide an estimated course; however, keeping these other forces in mind remains important as
they may have potential to be key forces in surgical care.
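The transfusion example discussed earlier in this chapter shows how a definition that changes over time can be handled when years of data are merged. What follows is only a minimal sketch under stated assumptions, not an actual NSQIP procedure: the field names (year, intraop_units, postop_transfusion) are hypothetical, and treating records from 2011 onward as following the newer definition is an assumption that should be checked against the data user guide. The sketch labels each record with its definition era, keeps uncollected intraoperative units as unknown rather than zero, and summarizes the postoperative transfusion variable separately for each era instead of pooling the two definitions.

```python
from collections import defaultdict

# Assumption for illustration: records from 2011 onward follow the newer
# definition described in the text; verify against the data user guide.
DEFINITION_CHANGE_YEAR = 2011


def tag_transfusion_era(record):
    """Return a copy of the record labeled with its definition era.

    Field names ("year", "intraop_units", "postop_transfusion") are
    hypothetical, not actual NSQIP variable names.
    """
    tagged = dict(record)
    if record["year"] < DEFINITION_CHANGE_YEAR:
        tagged["era"] = "older definition (>4 units)"
    else:
        tagged["era"] = "newer definition (>=2 units)"
        # Intraoperative units were no longer collected: keep the value as
        # unknown (None) rather than coding it as "no transfusion" (0).
        tagged["intraop_units"] = None
    return tagged


def transfusion_rate_by_era(records):
    """Summarize postoperative transfusion separately for each era."""
    counts = defaultdict(lambda: [0, 0])  # era -> [events, cases]
    for record in map(tag_transfusion_era, records):
        counts[record["era"]][0] += int(record["postop_transfusion"])
        counts[record["era"]][1] += 1
    return {era: events / cases for era, (events, cases) in counts.items()}


# Made-up records, for illustration only.
example_records = [
    {"year": 2009, "intraop_units": 0, "postop_transfusion": False},
    {"year": 2010, "intraop_units": 3, "postop_transfusion": True},
    {"year": 2012, "postop_transfusion": True},
    {"year": 2013, "postop_transfusion": False},
    {"year": 2013, "postop_transfusion": True},
]

rates = transfusion_rate_by_era(example_records)
```

Reporting the two eras side by side, rather than as a single pooled rate, keeps the different clinical meanings of the variable visible to the reader of the analysis.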
Individual Surgeon Variation (Preferences, Techniques, and Skills)
Even if there is a single code and agreed-upon surgical treatment or practice, the implementation
of this can vary considerably. Laparoscopic cholecystectomy, for instance, one of the most
commonly performed operations, has considerable variation in the way the procedure itself is
performed. The absence of this precise detail is an obstacle to standardizing procedures
nationally. There are statistical techniques for controlling for variation at the surgeon level,
specifically hierarchical modeling with random effects. Hierarchical random effect modeling also
addresses an issue that most multivariable models ignore: independence assumptions are violated
in healthcare studies, as patients are treated by surgeons within hospitals, which have been
shown to impact quality. Surgeon volume is one surgeon factor that was initially noted in 1979,
where complex procedures such as pancreatectomy and coronary artery bypass graft have better
outcomes when performed by higher-volume surgeons (Solomon et al. 2002; Birkmeyer et al. 2002;
Katz et al. 2004). This may in part reflect standardization of technique, evidence-based
practice, and skill, which may be a function of practice. Teasing out how outcomes are dependent
on technique variation is virtually impossible in current large datasets, although one could
argue this variation might explain quality to a much greater degree than even risk adjustment
based on patient factors.
Timing of Complications
Even if a reasonable outcome is chosen, it is essential to understand the interplay of that
complication with the hospital course. Incorrect assumptions about this can lead to incorrect
answers. Recent studies on readmissions have suffered from major errors when they attempt to
include complications as a risk factor for readmission (Aquina et al. 2014b). Some studies
suggest that complications are the biggest risk factor for readmission, and while this may seem
reasonable, they often confuse the reason the patient was admitted with a risk factor for
readmission. This has led to disastrous consequences as inclusion of such reasons for readmission
in the model can make all other risk factors no longer statistically significant, and in one
model, the authors came to the incorrect conclusion that the only risk factor for readmission was
postoperative complications, although subsequent studies have demonstrated this to be false. This
can be avoided by using complication timing to classify complications as occurring during the
inpatient stay or after discharge. While predischarge complications have been associated with
readmissions, the effect estimates have been much lower than previously described when all
complications are considered together.
Limited Information on Socioeconomic Drivers of Health
Analyses of patterns and outcomes of care require an assessment of the complex relationships
among patient characteristics, treatments, and outcomes. Furthermore, according to the Andersen
healthcare utilization model (Aday and Andersen 1974), usage of health services (including
inpatient care, outpatient physician visits, imaging, etc.) is determined by three dynamics:
predisposing factors, enabling factors, and need. Predisposing factors can be characteristics
such as race, age, and health beliefs. For instance, an individual who believes surgery is an
effective treatment for cancer is more likely to seek surgical care. Examples of enabling factors
could be familial support, access to health insurance, one’s community, etc. Need represents both
perceived and actual need for healthcare services. To conduct and interpret outcome analyses
properly, researchers should both understand the strengths and limitations of the primary data
sources from which these characteristics are derived and have a working knowledge of the
strategies used to translate primary data into the categories available in public databases. For
instance, SEER-Medicare documents details on individual cancer diagnoses, demographics (age,
gender, race), Medicare eligibility and program enrollment by month, and
aggregate measures of the individual’s “neighborhood” (e.g., average income and years of
education presented at the zip-code and census-tract level) as determined through a linkage to
recent US Census data. However, census level data do not allow for assessment of differences
among those zip-code areas.
Many analyses of large databases focus on the patient’s race or ethnicity as a confounder or a
predictor of outcome or a marker for other unobserved factors (disadvantaged geographic area or
low health literacy). Information on race is generally available, while information on ethnicity
is often missing or inappropriately coded. While most of the US data surveys allow only one
category for Hispanic ethnicity (yes/no), the NCDB classifies cancer patients into seven
categories (Mexican, Cuban, Puerto-Rican, Dominican, South/Central American, Hispanic by name,
and Other). In our analysis of treatment patterns for Hispanic cancer patients in NCDB, we
demonstrated persistent disparities in receipt of guideline-recommended care. The care in the
Hispanic group as a whole was not significantly different from non-Hispanic, while individual
subgroups demonstrated significant differences, highlighting a critical need for acknowledging
Hispanic subgroups in outcome research.
Need for Linked Data
Surgical safety and quality are multifactorial issues with more than one risk factor and hence
multiple potential mechanisms for improvement. For instance, reduction in postsurgical
complications could be partially achieved by more efficient patient education about early
symptoms, improvement in surgeon’s skills, changes in nursing and hospital practices, use of
surgical visiting nurse services, and other interventions. Similarly, one quality improvement
intervention may have impact on multiple stakeholders including patients and their caregivers,
clinic personnel, and health insurance. Hence, a comprehensive evaluation may require information
about all involved parties. Such data are rarely available in one dataset, and therefore, many
surgical outcomes and quality improvement studies are using multiple merged sources of data.
The SEER-Medicare data is a product of a linkage between two large population-based datasets:
Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute and
beneficiaries’ healthcare claims data collected by the Centers for Medicare and Medicaid Services
for billing and enrollment purposes. The linked dataset includes Medicare beneficiaries with
cancer from selected states participating in the SEER Program, with unit of observation being one
healthcare utilization event. This includes all Medicare-covered healthcare services from the
time of a person’s Medicare eligibility (before or after cancer diagnosis) until their death.
Because of complex sampling design, number of included variables, and specific data reporting
practices for tumor characteristics and services utilization, the investigator considering a
SEER-Medicare-based study or a proposal should spend time understanding SEER-Medicare data
limitations (National Institute of Health 2014) and learning about data layout and coding
(manuals and training are available at the NCI and other cancer research organizations).
The Medicare Current Beneficiary Survey (MCBS) is a longitudinal survey of a nationally
representative sample of the Medicare population. The MCBS contains data about
sociodemographics, health and medical history, healthcare expenditures, and sources of payment
for all services for a randomly selected representative sample of Medicare beneficiaries (Centers
for Medicare and Medicaid Services 2014). For every calendar year, there are two separate MCBS
data files released: Access to Care and Cost and Use files which can be ordered directly from the
CMS with assistance from the Research Data Assistance Center at the University of Minnesota
(Research Data Assistance Center 2014). MCBS Access to Care file contains information on
beneficiaries’ healthcare access, healthcare satisfaction, and their usual sources of care (Goss
et al. 2013; Research Data Assistance Center 2014). MCBS Cost and Use file offers a complete
summary of all healthcare expenditure and source of payment data on all healthcare services
including
expenditures not covered by CMS (Research Data Assistance Center 2015). The information collected
in the surveys is combined with the claims data on the use and cost of services. Medicare claims
data includes information on the utilization and cost of a broad range of services including
inpatient hospitalizations, outpatient hospital care, skilled nursing home services, and other
medical services. In order for the Cost and Use file to collect, summarize, and validate accurate
payment information, the release of C&U file is usually delayed by 2 years compared to the MCBS
AC file.
In addition to publicly available merged datasets, individual investigators can create their own
aggregated databases by linking together information from multiple sources and combining existing
data with prospectively collected and patient-reported information. Examples of such studies
include a NSQIP-based evaluation of preoperative use of statins and whether it is associated with
decreased postoperative major noncardiac complications in noncardiac procedures (Iannuzzi et al.
2013c), a study of recipients of abdominal solid organ transplant (ASOT) using additional data
from patient medical records (Sharma et al. 2011), and a retrospective review of the data from
medical records of patients diagnosed with hepatocellular carcinoma compared to patients in the
California Cancer Registry (CCR) (Atla et al. 2012).
Data Management and Big Data
More and more data are being collected for different purposes and are available to be linked
together including electronic memberships, online purchasing and consumer behavior records,
electronic transactions and others. The datasets become so large and complex that they become
difficult to manage using traditional resources, and organizations have to increase their
resources in order to be able to manage them. Before we know what to do with it, we have entered
into a new era of big data. Big data is high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to enable enhanced decision making,
insight discovery, and process optimization (Gartner 2013). The challenges of working with big
data include analysis, capture, curation, search, sharing, storage, transfer, visualization, and
privacy violations, among many others. Innovative solutions such as cloud computing chip away at
some challenges while remaining limited by others. For instance, cloud computing outside services
such as Amazon EC2, Box, Dropbox, Internet2, etc. provide storage or processing capabilities, but
without internal infrastructure or agreements with the outside services, there is the potential
for privacy violations. Yet, just like with the administrative data several decades earlier, the
opportunities provided by big data potentially outweigh the risks and, in time, data-driven
analytics may become as routine as EMR and digital image sharing.
Structure-Process-Outcome Assessment in Surgery
Theoretical Framework of Quality Assessment in Healthcare
According to Donabedian (1966), if there is evidence that good structure leads to appropriate
processes which in turn result in good outcomes, quality of healthcare intervention could be
measured in terms of either structures (S), processes (P), or outcomes (O) (Fig. 1).
These indicators can be measured using electronic, readily available data from the organizational
health information systems, data collected by cancer trackers, and other regional data systems,
like Rochester RHIO. It is important to work closely with each hospital’s clinical quality
assessment team, to avoid redundancy in data collection and other quality assessment and
reporting initiatives (e.g., Hospital Scorecard, the Clinical Service Scorecard, and the
Management Plan Tracking Reports, SCIP, HCAHPS), and others (Hospital Consumer Assessment of
Healthcare Providers and Systems 2014; The Joint Commission Core Measure Sets 2014a). Additional
financial and pre- and postadmission
cost and utilization information about patients can be obtained from CMS claims data for Medicare
fee-for-service beneficiaries and Excellus BCBS claims for commercially insured and Medicare HMO
patients (Medicare Health Insurance Claim (HIC) number or health insurance ID will be abstracted
from the patients’ medical charts).
The bundles of care for surgical patients can be defined by multidisciplinary care teams for
specific diagnoses and surgical service lines. A care bundle identifies a set of key
interventions from evidence-based guidelines that, when implemented, are expected to improve
patient outcomes (Institute for Healthcare Improvement 2006). The aim of care bundles is to
change patient care processes and thereby encourage guideline compliance in a number of clinical
settings (Brown et al. 2002; Burger and Resar 2006; Pronovost et al. 2006).
Using regional or national healthcare utilization and expenditure data with Medicare or private
plan reimbursement schedule, clinicians and hospital administrators can estimate annual cost of
care for surgical patients receiving various care bundles, by disease stage. These bundled cost
estimates can be used internally (e.g., for budgeting projections or to calculate return on
investment for new programs and interventions) or externally, to provide a foundation for
contract negotiations with payers, regional healthcare systems, and accountable care
organizations (Froimson et al. 2013; Ugiliweneza et al. 2014).
While it is tempting to seek out a single perfect metric of surgical quality, anybody familiar
with the complexity and variation in patient risks and the delivery of surgical care would agree
that such a metric could not possibly exist. More suitable would be a multidimensional measure
similar to the six-domain definition of healthcare quality suggested by the World Health
Organization (WHO). These dimensions require that healthcare be:
• Effective: delivering healthcare that is adherent to an evidence base and results in improved
  health outcomes for individuals and communities
  Example: each cancer case is reviewed by a specialty multidisciplinary team at least once
  before the final decision about treatment is reached.
• Efficient: delivering healthcare in a manner that maximizes resource use and avoids waste
  Example: avoid unnecessary imaging for colorectal cancer (CRC) patients such as PET scans or
  multiple CT scans.
• Accessible: delivering healthcare that is timely, geographically reasonable, and provided
in a setting where skills and resources are appropriate to the medical need
Example: providing a hub-and-spoke model for chemotherapy delivery for CRC patients residing
far from major cancer centers
• Acceptable/patient centered: delivering healthcare which takes into account the preferences
and aspirations of individual service users and the cultures of their communities
Example: offering palliative care to all patients with advanced cancer
• Equitable: delivering healthcare that does not vary in quality because of personal
characteristics such as gender, race, ethnicity, geographical location, or socioeconomic status
Example: providing financial assistance to low-income cancer patients assuring that
out-of-pocket expenses do not represent a barrier for adequate treatment
• Safe: delivering healthcare that minimizes risks and harm to service users
Example: following WHO surgical checklist to minimize the risk of surgical complications and
never events

As illustrated by the examples above, this definition of healthcare quality provides the link
between the organization of care, care processes, surgical quality, and outcomes. Hence, it
enables all participating stakeholders (e.g., clinicians, researchers, payers, and hospital
administrators) to rely on Donabedian’s framework when assessing quality of surgical services.
According to Donabedian, if there is evidence that good structure leads to appropriate processes
which in turn results in good outcomes, quality of healthcare intervention could be measured
based on presence of appropriate structures (S) or processes (P). Below we provide several
examples of evidence-based measures of quality in surgical care.

Structure

Lord Darzi, international expert on quality and innovation in cancer care, world-leading
colorectal surgeon, the former Minister of Health in the United Kingdom, and the lead author of
the UK Darzi Plan to redesign care delivery, encouraged healthcare agencies to “localize care
where possible, and centralize services where necessary” for efficacy and safety. This implies
that routine healthcare, like cancer survivorship services, should take place as close to home
as possible, while more complex care, like active cancer treatment, should be centralized to
ensure it is carried out by the most skilled professionals with cutting-edge equipment and high
volume/experience.

There exist several validated care delivery models to improve access to specialty care for
patients with complex chronic disease living in underserved or remote communities (for instance,
using videoconferencing technology for enhanced care coordination). There is a large body of
literature demonstrating that standardized care pathways, use of multidisciplinary teams (MDTs),
resident involvement (Iannuzzi et al. 2013a, b), availability of specialized providers (e.g.,
board-certified surgical specialists, surgical nurses, and PA) and services (e.g., stoma care,
wound care, surgical ICU), and receiving care in a high-volume center of excellence are
associated with better outcomes (Reames et al. 2014; Howell et al. 2014).

Evidence that hospital volume influences outcomes has been verified in nearly every major type
of surgery (Begg et al. 1998; Birkmeyer et al. 2002; Katz et al. 2004). This body of work
highlighted important and previously unrecognized variations in hospital performance and ignited
efforts to improve surgical quality among poorly performing hospitals. In an effort to reduce
these variations among hospitals, new health policy and quality improvement initiatives, such as
public reporting, pay-for-performance, and surgical checklists, have been implemented to promote
best practice and improve standards of care (Hannan et al. 1990, 2012; Haynes et al. 2009;
Lindenauer et al. 2007). Over the last decade, surgical mortality rates have significantly
decreased throughout the country, possibly due to such measures (Weiser et al. 2011; Finks et al.
2011; Birkmeyer 2012). While surgical/facility volume is easy to measure, the mechanism of
association between procedure volume and outcomes remains poorly understood. Possible
explanations highlight the importance of surgical expertise, specialized services, and
infrastructure that tend to be associated with large-volume centers.

Patient management following multidisciplinary principles consistently leads to superior
outcomes at much lower costs. Published supporting evidence for improved cancer-specific
outcomes with the use of multidisciplinary teams is available for a range of cancers, including
breast, lung, head and neck, esophageal, and colorectal (Chang et al. 2001; Coory et al. 2008;
Gabel et al. 1997; Stephens et al. 2006; Wille-Jorgensen et al. 2013; Burton et al. 2006).

Process

Many factors that constitute the structure and organization of surgical services contribute to
the processes of care and, ultimately, affect patient outcomes. For instance, in addition to
knowing structural features, such as whether a hospital has a surgical ICU, it is also important
to identify processes of care, such as how the ICU is staffed and what policies, regulations, and
checklists the SICU personnel adhere to, including failure to rescue, escalation of care,
communication, use of imaging and antibiotics, and patient nutritional protocols. If a residency
program is housed in a hospital (structure), what, when, and how surgical residents are required
to perform during cases (processes) may vary by institution and may have a serious impact on
institutional outcomes.

There is also a growing interest regarding the potentially detrimental impact of interruptive
operating room (OR) environments on surgical performance (Healey et al. 2006; Wiegmann et al.
2007). Previous investigations showed that interruptions occur frequently in ORs, across various
surgical specialties (Weigl et al. 2015).

In an effort to improve surgical outcomes and potentially lower costs, recent attention has been
placed on efficiency of care delivery and the surgical volume-outcome relationship. Luft et al.
first explored this concept in 1979, showing that there was a relation between hospital volume
and mortality for complex procedures such as open-heart surgery or coronary bypass (Luft et al.
1979). Since then, Birkmeyer et al. expanded on this idea by showing a significant relationship
between both hospital volume and surgeon volume and operative mortality for many different
procedures, including resections for lung, bladder, esophageal, and pancreatic cancer (Birkmeyer
et al. 2002). Subsequent surgical oncology studies have shown an association between volume and
negative margin status, superior nodal harvest, and both short-term and long-term survival.
Recently, the volume-outcome relationship has been demonstrated even for less specialized
procedures, such as incisional hernia repair (Aquina et al. 2014a).

Evidence of the volume-outcome relationship, along with financial pressures, implementation of
surgical bundled payments, and the shift to accountable care organizations, brought to light the
importance of efficient and coordinated models of care delivery. With the increase in the number
of surgical subspecialties and nonsurgical specialties performing surgical procedures (e.g.,
interventional radiology and cardiology, urogynecology), there is an increase in the involvement
of advanced practice providers in patient care delivery (e.g., nurse practitioners (NP),
physician assistants (PA), technicians, and therapists) and growing acceptance of
multidisciplinary care pathways (oncology, geriatrics, orthopedics, among others). For example,
high-volume bariatric surgery practices can hire psychologists, nutritionists, exercise
therapists, and specialty nurses to provide additional supportive services. This approach can
free the surgeon’s time and improve care coordination and patient experience. There are other
situations when the specialty and training of the provider are important, namely for procedures
that could be performed by different types of providers. One instance is the inferior vena cava
filter (IVC filter), a type of vascular filter that is implanted to prevent life-threatening
pulmonary emboli (PEs). IVC filters could be placed by a number of different types of providers
(vascular surgeons, general surgeons, cardiologists, interventional radiologists) for various
indications. The outcomes of the intervention
(mortality, complications, PE) could potentially depend on the specialty and skill of the
provider.

In general, clinic staff rarely bill for their services and often are employed by the
institution. Multidisciplinary consultations for cancer patients are also not reimbursable and
often count toward “academic time” for faculty physicians. As a result, these services may be
“invisible” from insurance claims or medical records. In fact, only one provider can be
associated with each billable service (procedure or hospital admission). For any service
delivered by more than one provider (e.g., resident participating in a surgical case, several
APPs involved in hospital discharge process), additional data may need to be included (e.g.,
operating notes, individual provider claims).

Surgical Outcomes

A choice of optimal outcome for each study or evaluation depends on the goal of the assessment
as well as factors that may be driving this outcome (causal pathway) and resources available to
the investigators, as some of the outcome collection processes may be very costly and time
consuming (e.g., health utility and quality of life measurement) (Drummond et al. 2005; Iezzoni
2004). Below we describe some of the most common types of outcomes used in surgical outcome
research and quality assessment and discuss their applications, limitations, and sources of data.

Clinical Outcomes
Mortality: When defining mortality, it is important to be specific about the duration of the
observation period (e.g., in-hospital vs. 30-day mortality) as well as the starting point for
the observation period (e.g., day when the procedure was performed for 30-day postsurgical
mortality versus 30 days after hospital discharge for 30-day hospital mortality). Using hospital
discharge abstracts and publicly available software, one can measure in-hospital mortality using
the most appropriate definitions for the needs of the project. For instance, if there is a
significant variation in the hospital length of stay between patients in the study, it may be
more accurate to define hospital mortality based on the 30-day postadmission interval rather
than postdischarge time (Borzecki et al. 2010; Hannan et al. 1990, 2013).

Cancer Survival: For surgical oncology studies, cancer survival rate is often a more appropriate
outcome metric than surgical mortality because the vast majority of cancer patients receive
multimodal therapy. Cancer survival is reported by most tumor registries or can be calculated
from pathology reports. Cancer survival is defined as the percentage of people who have survived
a certain type of cancer for a specific amount of time (e.g., 12 months, 2 or 5 years). Certain
cancers can recur many years after first being diagnosed and treated (e.g., breast cancer).
During this time, a former cancer patient (also called survivor) may die from a different
condition (oncologic or benign), and hence, the most appropriate choice of reported statistics
in this case would be tumor site-specific mortality. For instance, a patient may be successfully
treated for thyroid cancer but die from colon cancer 20 years later. Other types of survival
rates that give more specific information include disease-free survival rate (the proportion of
cancer patients who are cancer-free), progression-free survival rate (the proportion of cancer
patients who are not cured but whose cancer is not progressing), and cancer recurrence (cancer
that has returned after treatment and after a period of time during which the cancer was not
detected). Sometimes without detailed pathology data, it is impossible to distinguish cancer
recurrence from cancer progression. An example of the recurrence versus progression dilemma
could be observed in rectal cancer patients who received nonsurgical neoadjuvant treatment.
Following neoadjuvant chemoradiotherapy (CRT) and interval proctectomy, 15–20% of patients are
found to have a pathological complete response (pCR) to combined multimodal therapy, but
controversy persists about whether this yields a survival benefit (Martin et al. 2012).

Surgical Complications: Incisional Hernia. An incisional hernia occurs when abdominal wall
fascia fails to heal. Incisional hernia is a common postoperative complication following major
abdominal surgery. Data on incidence of incisional hernia is highly variable with reported
values ranging
from 0% to 91%. Incisional hernias are typically diagnosed within the first 3 years after
initial laparotomy (Yahchouchy-Chouillard et al. 2003; Rosen et al. 2003; Rea et al. 2012);
however, a hernia may take up to 10 years to become evident after the initial surgery (LeBlanc
et al. 2000; Akinci et al. 2013). This large amount of variation in the reported rates of
incisional hernia is not unexpected, given the wide range of patient groups included in the
studies, the surgery performed, and the length of follow-up (Caglià et al. 2014). Several
outcome measures could be appropriate for a study on incisional hernia including incidence,
prevalence, rates of hospital admission, and reoperation.

Surgical Complications: Surgical Site Infection (SSI) (Schweizer et al. 2014). In addition to
pain, discomfort, and high risk for readmission, surgical site infections (SSIs) are associated
with substantial morbidity and mortality. The costs of SSIs have been the focus of quality
improvement and safety efforts ever since the Centers for Medicare and Medicaid Services halted
compensation for the growing costs linked with SSIs after some surgical operations (so-called
potentially preventable infections) (Aquina et al. 2014b). Prior studies have reported cost of
hospitalizations after SSIs in the range from $24,000 to $100,000 (Schweizer et al. 2014).

Patient-Reported Outcome Measures (PROMs)
Patient-Reported Outcomes Measurement Information System (PROMIS®): Measures included in
PROMIS® are intended for standardized assessment of various patient-reported outcome domains,
including pain, fatigue, emotional distress, physical functioning, and social role participation
(Devlin and Appleby 2010). PROMIS® is a new set of tools intended to be used in routine clinical
practice as a part of an electronic medical record (EMR) system (Cella et al. 2007). PROMIS® was
established in 2004 with funding from the National Institutes of Health (NIH). PROMIS® measures
are based on common validated metrics to ensure a computerized and burden-free data collection
process in any healthcare setting that yields accurate measurement of patient health status
domains over time with few items (National Institute of Health 2015a).

Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) (Systems 2014): Just
like with any other consumer goods and services, many providers and organizations have collected
information on patient satisfaction with healthcare. However, prior to HCAHPS, there was no
national standard for collecting and publicly reporting patients’ perspectives on their
healthcare experience that would enable valid comparisons to be made across providers. In May
2005, the National Quality Forum (NQF), an organization responsible for standardization of
healthcare quality measurement and reporting, formally endorsed the CAHPS® Hospital Survey
(Press Ganey Associates Inc 2014).

The HCAHPS survey is mailed to a random sample of hospital patients after a recent discharge.
The survey asks patients to rate 21 aspects of their hospital care combined into nine key
topics: communication between patients and doctors, communication between patients and nurses,
responsiveness of the hospital staff, pain management, communication with patients about
medicines, discharge information, hospital’s cleanliness, hospital environment’s noise levels,
and transition of care. Patients’ perception of care is a key performance metric and is used to
determine payments to hospitals (Hospital Consumer Assessment of Healthcare Providers and
Systems 2014). The Hospital Compare database (4605 hospitals) can be used to examine
complication rates and patient-reported experience for hospitals across the nation. Prior
studies have demonstrated an inverse relationship between patient experience and complication
rates. This negative correlation suggests that reducing these complications can lead to a better
hospital experience. Overall, these results suggest that patient experience is generally
correlated with the quality of care provided.

Depending on the type of surgery and patient population, other outcome measures may also be
relevant (e.g., pain, functional status, and cognitive ability). Quality of life is a
multidomain
indicator that combines all aspects of health relevant to patients and, hence, may serve as an
aggregate outcome measure.

Quality of Life and Subjective Well-Being (Lee et al. 2013): Quality continues to be placed at
the heart of discussions about healthcare. This raises important questions about how quality of
care should be measured and from whose perspective: patient’s, provider’s, or payer’s.
Subjective well-being (SWB) is a measure of the overall “wellness” of an individual and as such
has the potential to be used as a global marker for how treatments affect people in the
experience of their lives. SWB links all stages in the treatment and care process, thus allowing
the overall quality of care to be determined and valued according to its direct effect on
people’s lives. SWB has been shown to have an effect on outcomes at all stages of the treatment
experience, and improved health and quality outcomes are shown to consistently enhance SWB (Lee
et al. 2013). Furthermore, SWB measures have been shown to be a suitable method to value the
impact of healthcare on the families and caregivers of patients and, in this way, can join up
health outcomes to show wider effects of treatment on patients’ lives. Measuring an individual’s
SWB throughout his or her treatment experience can enable a full appraisal of the quality of
care that they receive. This could facilitate service improvements at the microlevel and help
value treatments for resource allocation purposes at the macrolevel.

Surrogate Outcomes
Although everybody recognizes the importance of measuring patient outcomes and several valid and
accurate measures (as described above) are available, there are several practical barriers to
measuring patient outcomes. These include time (waiting for cancer recurrence or mortality to
occur while maintaining regular follow-up with a patient), personnel costs (to perform routine
surveillance and follow-ups), and patient burden (repeated follow-up, evaluations, and surveys).
One of the potential solutions to these problems is the use of surrogate outcomes. A surrogate
outcome (or endpoint) is a measure of effect of a specific treatment that may substitute for a
real clinical endpoint but does not necessarily have a guaranteed relationship (Cohn 2004).
Surrogate markers are also used when the number of events is very small, thus making it
impractical to conduct a clinical trial to detect a statistically significant effect (e.g.,
instead of measuring VTE events which have an incidence of less than 1%, studies often use
ultrasound-detected blood clots which are much more prevalent but do not always result in PE or
VTE) (Fleming and DeMets 1996).

A correlate does not make a surrogate. It is a common misconception that if an outcome is a
correlate (i.e., correlated with the true clinical outcome), it can be used as a valid surrogate
endpoint (i.e., a replacement for the true clinical outcome). However, proper justification for
such replacement requires that the effect of the intervention on the surrogate endpoint predicts
the effect on the clinical outcome, which is a much stronger condition than correlation. Other
examples of commonly used surrogate outcomes in surgery include costs of care as a measure of
poor outcomes and disability, positive surgical margins, carcinoembryonic antigen (CEA), and
number of lymph nodes retrieved as a measure of long-term cancer recurrence and mortality
(Nussbaum et al. 2014).

Composite Outcomes: Episode of Care or Care Bundles
The value of quality reporting in surgical care, however, is limited by problems with existing
measures of quality, mainly, that existing quality indicators are designed to measure the
quality of a specific facility (e.g., hospital) or a specific provider (e.g., surgeon). This,
however, does not reflect the current paradigm of care delivery when a patient may be diagnosed
in the community, referred to a regional center of excellence for neoadjuvant chemoradiation,
followed up for 6 months by an academic colorectal surgeon, before returning to the community
for years of posttreatment surveillance. Regional standardized pathways of care and a
multidisciplinary team (MDT) approach have been recommended by all clinical societies to better
identify, coordinate, deliver, and monitor the optimal treatment on an individual
patient-by-patient basis (Chang et al.
2001; Coory et al. 2008; Stephens et al. 2006; Abbas et al. 2014; Wille-Jorgensen et al. 2013;
Morris et al. 2006; Gatt et al. 2005; Adamina et al. 2011).

Risk adjustment is a set of analytic tools used for an array of functions in healthcare (Iezzoni
and Long-Bellil 2012; Schone and Brown 2013). One of the primary uses of risk adjustment is
providing fair comparison between different patient populations, providers, or programs. Risk
adjustment is also necessary to set costs for health plans to reflect expected treatment
expenses of their specific membership group. Because of differences in everyone’s health and
treatment needs, the cost and outcomes of healthcare may differ from person to person. Without
risk adjustment, plans or providers have an incentive to enroll and treat healthier patients
(so-called cream skimming or cherry-picking) and avoid sick, frail, or complex patients. After
appropriate risk adjustment, plans and providers receive a larger amount of reimbursement for
members with numerous chronic illnesses than for members with few or no health problems at all.
In addition to costs, risk adjustment is also applied to health outcomes when comparing
performance across providers (e.g., risk-adjusted mortality is reported by the STS National
Database and NSQIP, CABG Report Cards NYS, and UK surgical mortality reports) (National Health
Services 2015; The Society of Thoracic Surgeons National Database 2014). The methodology used
for risk adjustment varies, depending in part on healthcare market regulations, the populations
served, and the source of payments. Risk adjustment is used in all major public programs
offering health coverage in the United States, including Medicare Advantage (MA), Medicare Part
D, and state Medicaid managed care programs. The STS National Database, with its three million
patient records, has long used risk adjustment to provide more accurate patient outcomes. If not
risk adjusted, the records of surgeons who perform operations on higher-risk patients would
always look worse than the records of surgeons who treat low- or average-risk patients.

From Data to Quality Improvement

For many hospital and outpatient services, there is a wide difference between billed charges and
the amounts that providers expect to receive for services. Hospital charges are usually
determined by hospital administrators depending on prior history and demand. Reimbursement
rates, on the other hand, or the payments that hospitals are actually willing to accept for a
specific service or product, vary by payer and specific plan. On average, hospitals billed
Medicare 3.77 times (standard deviation = 1.83) what they were actually reimbursed, with a range
of 0.42 to 16.23 (Muhlestein 2013). The ratio may vary for private payers.

High hospital charges, though, do have some important consequences. First, since the charges do
not correlate with the amount being paid and hospital expenditures required to produce a
specific service (i.e., true cost), it becomes difficult, if not impossible, to compare process
between hospitals, and draw conclusions about financial sustainability of various service lines.
Second (and potentially devastating for some), those who are uninsured who receive care at a
hospital, or those who are insured and receive care at an out-of-network hospital, may face a
bill that exceeds by many times the negotiated price paid by any payer.

Focusing on Modifiable Factors

One of the major paradoxes that limits our ability to improve practice based on the results of
published studies is that most available predictors are not modifiable (readmissions: patient
severity, comorbidities), while most modifiable factors are not routinely collected through
standard clinical data systems (SES, organizational structure). Furthermore, the reported
statistical associations do not equal causation (although causation is often assumed) and hence,
modifying a predictor may not result in a desired change in the outcome of interest. Let’s
consider the example below.

Failure to rescue (FTR) refers to the mortality among patients with serious complications
(Johnston et al. 2014; Pucher et al. 2014; Almoudaris et al. 2013). Typically, it is hospitals
with greater FTR rates (not greater complication rates) that have the greatest rate of
mortality. Thus, although complications may occur, outcomes can still be improved by optimizing
the quality of care provided to the patient postcomplication. Although there have been several
studies highlighting the importance of FTR as a marker for quality of care, these have only
considered organizational aspects of healthcare. Few have explored the underlying human factors
that lead up to this critical event. Two main factors may contribute toward an FTR event: first,
a failure to recognize a sick patient and, second, a failure to act promptly once deterioration
has been detected. In both situations, an escalation of care (EOC) process is required if FTR is
to be avoided.

EOC involves a nurse recognizing a change in patient status and communicating it to a
postgraduate year 1 (PGY1) resident, who subsequently reviews the patient and then escalates
care further for advice and/or management. Escalation is a difficult process, as the first
doctor called by the nurses will usually be the most junior; this is the traditional hierarchy.
After initial assessment, the junior doctor must then contact his or her senior to explain why
they need help and the urgency of response required. All of this places a premium on the value
of communication between team members. However, failures in communication are ubiquitous and
frequent in the postoperative phase. Although this EOC process lies at the center of FTR and is
critically important for safety and quality of surgical care, it remains difficult to measure
and quantify and, hence, relatively unexplored in the research literature.

Identifying Actionable Goals

Despite the most sound study design and state-of-the-art statistical methodology, outcome
studies do not always lead to meaningful improvement in care quality and patient outcomes. Is
this the ground for skepticism? Not at all. Just like many investigations in basic biomedical
sciences, outcomes and quality assessment projects often fall short of their potential impact by
simply reporting barriers to high-quality care without considering strategies for systematically
overcoming these limitations and obstacles. Another common mistake is assuming that just because
some risk factors are statistically associated with poor quality or outcomes, they represent a
target for improvement. For instance, if low patient education is associated with poor cancer
prognosis, it may be naïve to assume that more education would improve outcomes in cancer
patients without a high school diploma. In this case, low education is likely to be a marker for
social and economic deprivation in this demographic group. Addressing this issue may require
developing a system-wide solution like providing a care navigator, graphics rather than
text-based decision support tools, and phone- rather than internet-based communication with care
providers.

Sometimes when large administrative datasets are used for the analysis, statistically
significant risk factors are not necessarily clinically significant. Before considering any
change in clinical practice, it may be beneficial to review the results for face validity with
all stakeholders involved in the care process. One approach is to use a systematic quantitative
validated method to assess risks in the process of information transfer across all phases of
surgical care. The method is known as failure mode and effect analysis (FMEA) and was originally
developed by engineers to accomplish proactive risk analyses (McDermott et al. 1996). The
National Center for Patient Safety of the US Department of Veterans Affairs adjusted FMEA for
use in healthcare, resulting in healthcare FMEA (HFMEA) (DeRosier et al. 2002). Healthcare FMEA
is a multistep process (Fig. 2) that uses a multidisciplinary team to proactively evaluate a
healthcare process. The team uses process flow diagrams, hazard scoring, and decision trees to
identify potential vulnerabilities and to assess their potential effect on patient care. The
method captures the likelihood of risks, the severity of consequences, and the probability that
they
Fig. 2 Main steps in surgical healthcare failure mode and effect analysis (HFMEA) (Adapted from the Veterans Affairs
National Center for Patient Safety, DeRosier et al. 2002)
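The hazard scoring step named in Fig. 2 can be illustrated with a short sketch. It assumes the
four-level severity and probability scales and the cutoff of 8 that are commonly described for
HFMEA; neither the scales nor the cutoff are stated in this chapter, so they should be checked
against DeRosier et al. 2002 before use. The failure modes listed are hypothetical examples of
information transfer problems, not findings from any study.

```python
# Hazard scoring sketch for HFMEA.
# ASSUMPTION: the 1-4 scales and the cutoff of 8 are not given in the chapter text.
SEVERITY = {"minor": 1, "moderate": 2, "major": 3, "catastrophic": 4}
PROBABILITY = {"remote": 1, "uncommon": 2, "occasional": 3, "frequent": 4}
DECISION_TREE_THRESHOLD = 8  # assumed cutoff for sending a failure mode to the decision tree


def hazard_score(severity, probability):
    """Return the severity rating multiplied by the probability rating."""
    return SEVERITY[severity] * PROBABILITY[probability]


def rank_failure_modes(failure_modes):
    """Score each failure mode, flag those at or above the cutoff, sort highest first."""
    scored = [
        {**mode, "score": hazard_score(mode["severity"], mode["probability"])}
        for mode in failure_modes
    ]
    for mode in scored:
        mode["needs_decision_tree"] = mode["score"] >= DECISION_TREE_THRESHOLD
    return sorted(scored, key=lambda mode: mode["score"], reverse=True)


# Hypothetical failure modes in postoperative information transfer.
failure_modes = [
    {"name": "allergy not mentioned at handoff",
     "severity": "catastrophic", "probability": "uncommon"},
    {"name": "drain output not recorded",
     "severity": "moderate", "probability": "occasional"},
    {"name": "pager number out of date",
     "severity": "major", "probability": "occasional"},
]

ranked = rank_failure_modes(failure_modes)
for mode in ranked:
    flag = "decision tree" if mode["needs_decision_tree"] else "below cutoff"
    print(f'{mode["score"]:>2}  {flag:<13}  {mode["name"]}')
```

Multiplying two ordinal ratings is a screening device rather than a probability estimate; its
purpose in this sketch is only to let a multidisciplinary team order failure modes for
discussion.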
important for observational studies using data poorly correlated with the more specific anasto-
where patients have not been randomized. motic leak variable as more specifically defined.
Higher-level statistical packages using multi- These findings suggest that prior reports are
variable approaches to adjust for patient-level based on identifying organ space infection as
factors are now readily available, providing adjusted estimated effects in terms of odds ratios. Despite the ubiquity of such methods, if not well thought out, results can be drastically skewed. Only confounding factors and covariates not on the causal pathway should be included. If one controls for factors on the causal pathway, one may find that no presumed risk factors are associated with the outcome, because they have been effectively controlled for in the multivariable analysis. This will be discussed further below. Confounders such as comorbidities may also be highly collinear, and grouping or using already established practices for comorbidity adjustment may be helpful in decreasing the number of variables, particularly if the research question concerns comparing two different surgical approaches where one only desires to adjust for comorbidities rather than ascertain their independent contribution to risk for poor outcome.
While multivariable analyses are presented with odds ratios, even this relatively straightforward result presentation requires some additional thought in terms of the desired interpretation. One particular nuance is whether to use a reference group that makes the odds ratio greater than one, in other words suggesting increased risk, or one such that the odds ratio suggests a protective effect. It is often more intuitive to present odds ratios suggesting increased risk; however, this is not always appropriate.
As quality data becomes more prevalent, multiple metrics reportedly measuring the same poor outcome may exist. Auditing these results and comparing which approach is more reliable and measures the underlying disease state is of utmost importance, particularly if this data is to lead to clinical change. For instance, using Pearson’s correlation coefficient, a study of NSQIP data when compared to regional data measuring anastomotic leaks found that the traditional approach of “organ space infection”
an anastomotic leak in colorectal surgery.
Odds ratios may be difficult to put into clinically meaningful terms other than demonstrating relative importance. Another approach to taking multivariable analysis to the next step is the creation of risk scores aimed at guiding clinical decision making. This approach effectively operationalizes the data available in multivariable analysis by weighting risk factors. The approach to these analyses is slightly different as they are aimed at predicting an event, rather than identifying all potential risk factors. This changes which variables are included in the analysis, as only those that improve the predictive ability should be used. There may be a high degree of crossover; however, risk scores are most useful when they are simple and so one may desire to make a parsimonious model, that is, a model with the fewest number of covariates while maximizing the predictive power of the model (Iannuzzi et al. 2013d, 2014a; Kelly et al. 2014a). In order to perform a predictive analysis, data should be split into a development and validation dataset so the risk score can be tested on naive subjects, estimating its ability to be applied to novel patients. Another similar approach is the use of nomograms, which is simply another way to organize risk score-type data.
With the advent of the electronic record, some of this risk scoring can now be integrated directly into the clinical record, alerting physicians about high-risk patients for readmissions or high-risk DVT patients, prompting some action such as prophylaxis prescription. This approach has increased the use of guideline-based approaches and may be an effective tool moving forward. NSQIP also provides individual patient risk calculators for many complications which allow in-office estimates of risk based on individual patient factors. This tool anecdotally has a high degree of satisfaction for patients and providers alike and likely improves the consent process.
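The development and validation split described for risk scores can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the cohort is synthetic, the factor names and effect sizes are invented, and the one-point-per-factor rule is a crude stand-in for weights that would normally come from regression coefficients. It is not the method used in the studies cited above. The point it shows is only that the score is built on the development set and its discrimination (the c-statistic) is then measured on patients the score has never seen.

```python
import random


def make_synthetic_cohort(n=400, seed=1):
    """Build a hypothetical cohort; factor names and effect sizes are invented."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        rec = {
            "smoker": rng.random() < 0.30,
            "diabetes": rng.random() < 0.20,
            "left_handed": rng.random() < 0.10,  # deliberately unrelated to the outcome
        }
        p_event = 0.05 + 0.30 * rec["smoker"] + 0.25 * rec["diabetes"]
        rec["event"] = rng.random() < p_event
        cohort.append(rec)
    return cohort


def split_dev_val(records, dev_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into development and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


def derive_points(dev, factors):
    """Assumed rule: give 1 point to a factor whose presence raises the event rate
    in the development set; factors that do not are dropped (parsimony)."""
    weights = {}
    for factor in factors:
        exposed = [r["event"] for r in dev if r[factor]]
        unexposed = [r["event"] for r in dev if not r[factor]]
        if not exposed or not unexposed:
            continue  # cannot compare rates without both groups
        if sum(exposed) / len(exposed) > sum(unexposed) / len(unexposed):
            weights[factor] = 1
    return weights


def risk_score(record, weights):
    """Sum the points of every weighted factor present in the record."""
    return sum(points for factor, points in weights.items() if record[factor])


def c_statistic(scores, outcomes):
    """Share of event/non-event pairs in which the event patient scores higher
    (ties count as one half)."""
    events = [s for s, y in zip(scores, outcomes) if y]
    non_events = [s for s, y in zip(scores, outcomes) if not y]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


if __name__ == "__main__":
    cohort = make_synthetic_cohort()
    dev, val = split_dev_val(cohort)
    weights = derive_points(dev, ["smoker", "diabetes", "left_handed"])
    val_scores = [risk_score(r, weights) for r in val]
    val_outcomes = [r["event"] for r in val]
    print("points derived on development set:", weights)
    print("validation c-statistic:", round(c_statistic(val_scores, val_outcomes), 3))
```

A c-statistic near 0.5 on the validation set would suggest the score does no better than chance in new patients, however well it appeared to fit the development data.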
Primary Sources
American College of Surgeons (ACS). National Surgical Quality Improvement Program. American College of Surgeons. 2014a. http://site.acsnsqip.org/. Accessed 19 Sept 2014.
Andersen R, Newman J. Societal and individual determinants of medical care utilization in the United States. Milbank Q. 2005;83(4):1–28.
Birkmeyer J. Progress and challenges in improving surgical outcomes. Br J Surg. 2012;99(11):1467–9.
Cohen M, Dimick J, Bilimoria K, Clifford K, Richards K, Hall B. Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling. J Am Coll Surg. 2009a;209(6):687–93.
Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44:166–206.
Fleming F, Thomas R, DeMets D. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125(7):605–13.
Hospital Consumer Assessment of Healthcare Providers and Systems. HCAHPS: Hospital Consumer Assessment of Healthcare Providers and Systems. 2014. http://www.hcahpsonline.org/home.aspx. Accessed 5 May 2015.
Maxwell R. Quality assessment in health. Br Med J. 1984;288(6428):1470.
Medicare.gov. The official U.S. Government Site for Medicare. Safe Surgery Checklist Use. In: Medicare.gov. 2014. http://www.medicare.gov/hospitalcompare/hospital-safe-surgery-checklist.html?AspxAutoDetectCookieSupport=1. Accessed 4 May 2015.
Pucher P, Rajesh A, Pritam S, Ara D. Enhancing surgical performance outcomes through process-driven care: a systematic review. World J Surg. 2014;38(6):1362–73.
Schiff GD, Rucker D. Beyond structure–process–outcome: Donabedian’s seven pillars and eleven buttresses of quality. Jt Comm J Qual Patient Saf. 2001;27(3):169–74.
Sinclair A, Schymura M, Boscoe F, Yung R, Chen K, Roohan P, Tai E, Schrag D. Measuring colorectal cancer care quality for the publicly insured in New York State. Cancer Med. 2012;1(3):363–71. https://doi.org/10.1002/cam4.30.
The Joint Commission Core Measure Sets. 2014a. http://www.jointcommission.org/core_measure_sets.aspx. Accessed 19 Sept 2014.
The Society of Thoracic Surgeons National Database. The Society of Thoracic Surgeons National Database. 2014. http://www.sts.org/national-database. Accessed 19 Sept 2014.
Tufts Medical Center. Cost-Effectiveness Analysis Registry. In: Cost-Effectiveness Analysis Registry. 2014. https://research.tufts-nemc.org/cear4/ Accessed 5 May 2015.
Abbas MA, Chang GJ, Read TE, Rothenberger DA, Garcia-Aguilar J, Peters W, Monson JR, Sharma A, Dietz DW, Madoff RD, Fleshman JW, Greene FL, Wexner SD, Remzi FH. Optimizing rectal cancer management: analysis of current evidence. Dis Colon Rectum. 2014;57(2):252–9. https://doi.org/10.1097/dcr.0000000000000020.
Adachi W, Lodolce AE. Use of failure mode and effects analysis in improving the safety of iv drug administration. Am J Health-Syst Pharm. 2005;62(9):917–22.
Adamina M, Kehlet H, Tomlinson G, Senagore A, Delaney C. Enhanced recovery pathways optimize health outcomes and resource utilization: a meta-analysis of randomized controlled trials in colorectal surgery. Surgery. 2011;149(6):830–40. https://doi.org/10.1016/j.surg.2010.11.003.
Aday L, Andersen R. A framework for the study of access to medical care. Health Serv Res. 1974;9(3):208.
Agency for Healthcare Research and Quality. National Healthcare Quality & Disparities Report, 2008. US Department of Health and Human Services. 2009.
Akinci M, Yilmaz KB, Kulah B, Seker GE, Ugurlu C, Kulacoglu H. Association of ventral incisional hernias with comorbid diseases. Chirurgia. 2013;108:807–11.
Almoudaris A, Burns E, Bottle A, Aylin P, Darzi A, Vincent C, Faiz O. Single measures of performance do not reflect overall institutional quality in colorectal cancer surgery. Gut. 2013;62(3):423–9.
American Cancer Society. What is cancer recurrence? In: When cancer comes back: cancer recurrence. 2014. http://www.cancer.org/treatment/survivorshipduringandaftertreatment/understandingrecurrence/whenyourcancercomesback/when-cancer-comes-back-what-is-recurrence. Accessed 7 Jul 2016.
American College of Surgeons. American College of Surgeons (ACS). In: American College of Surgeons. 2014b. https://www.facs.org/. Accessed 19 Sept 2014.
American College of Surgeons, Commission on Cancer, Surgical Care Improvement Project. Core measure sets. In: The Joint Commission. 2014b. http://www.jointcommission.org/surgical_care_improvement_project/. Accessed 10 May 2015.
American College of Surgeons. Commission on Cancer. In: American College of Surgeons. 2014c. https://www.facs.org/quality-programs/cancer. Accessed 19 Sept 2014.
Apkon M, Leonard J, Probst L, DeLizio L, Vitale R. Design of a safer approach to intravenous drug infusions: failure mode effects analysis. Qual Saf Health Care. 2004;13(4):265–71.
Aquina C, Kelly K, Probst C, Noyes K, Langstein H, Monson JR, Fleming F. Surgeon and facility volume play significant role in hernia recurrence and reoperation after open incisional hernia repair. SSAT 55th annual meeting, Chicago;2014a. 2–6 May 2014.
Aquina C, Rickles A, Iannuzzi JC, Kelly K, Probst C, Noyes K, Monson JR, Fleming FJ. Centers of
excellence have lower ostomy-relatedNsquip. Tripartite Birmingham;2014b. 30 June 30–3 July 2014.
Atkin W. Options for screening for colorectal cancer. Scand J Gastroenterol. 2003;38(237):13–6.
Atla P, Sheikh M, Mascarenhas R, Choudhury J, Mills P. Survival of patients with hepatocellular carcinoma in the San Joaquin Valley: a comparison with California Cancer Registry data. Ann Gastroenterol. 2012;25(2):138.
Australian Commission on Safety and Quality in Health Care. Windows into Safety and Quality in Health Care. 2008.
Begg C, Cramer L, Hoskins W, Brennan M. Impact of hospital volume on operative mortality for major cancer surgery. JAMA. 1998;280(20):1747–51.
Birkmeyer JD, Siewers AE, Finlayson EVA, Stukel TA, Lee Lucas F, Batista I, Gilbert Welch H, Wennberg DE. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346(15):1128–37.
Blackburn H. Research and demonstration projects in community cardiovascular disease prevention. J Public Health Policy. 1983;4:398–421.
Borzecki A, Christiansen C, Chew P, Loveland S, Rosen A. Comparison of in-hospital versus 30-day mortality assessments for selected medical conditions. Med Care. 2010;48(12):1117–21.
Brown A, Patterson D. To err is human. In: Proceedings of the first workshop on evaluating and architecting system dependability (EASY’01). 2001.
Brown M, Riley G, Schussler N, Etzioni R. Estimating health care costs related to cancer treatment from SEER-Medicare data. Med Care. 2002;40(8):IV104–17. https://doi.org/10.2307/3767931.
Brownson R, Smith C, Pratt M, Mack N, Jackson-Thompson J, Dean C, Dabney S, Wilkerson. Preventing cardiovascular disease through community-based risk reduction: the Bootheel Heart Health Project. Am J Public Health. 1996;86(2):206–13.
Burger CD, Roger RK. “Ventilator bundle” approach to prevention of ventilator-associated pneumonia. Mayo Clin Proc. 2006;81(6):849–50. https://doi.org/10.4065/81.6.849.
Burgmeier J. Failure mode and effect analysis: an application in reducing risk in blood transfusion. Jt Comm J Qual Patient Saf. 2002;28(6):331–9.
Burton S, Brown G, Daniels I, Norman A, Mason B, Cunningham D. MRI directed multidisciplinary team preoperative treatment strategy: the way to eliminate positive circumferential margins? Br J Cancer. 2006;94(3):351–7.
CA Society of Thoracic Surgeons. California Cardiac Surgery and Intervention Project (CCSIP). In: California Cardiac Surgery Intervention project. 2014. http://www.californiacardiacsurgery.com/CCSIP-2012/index.html. Accessed 19 Sept 2014.
Caglià P, Tracia A, Borzì L, Amodeo L, Tracia L, Veroux M, Amodeo C. Incisional hernia in the elderly: risk factors and clinical considerations. Intern J Surg. 2014;12(Suppl 2):S164–9.
Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries J, Bruce B, Rose M. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3.
Cella D, Gershon R, Bass M, Rothrock N. What is assessment center. In: Assessment Center. 2014. https://www.assessmentcenter.net/. Accessed 7 July 2016.
Centers for Medicare and Medicaid Services (CMS). Physician Quality Reporting System (PQRS): maintenance of Certification Program Incentive. In: eHealth University. 2014. https://www.cms.gov/eHealth/downloads/eHealthU_PQRSMaintenanceCertification-.pdf. Accessed 6 Jul 2016.
Chang J, Vines E, Bertsch H, Fraker D, Czerniecki B, Rosato E, Lawton T, Conant E, Orel S, Schuchter L, Fox K, Zieber N, Glick J, Solin L. The impact of a multidisciplinary breast cancer center on recommendations for patient management: the University of Pennsylvania experience. Cancer. 2001;91(7):1231–7.
Chantler C. The role and education of doctors in the delivery of health care*. The Lancet. 1999;353(9159):1178–81. https://doi.org/10.1016/S0140-6736(99)01075-2.
Cohen M, Bilimoria K, Ko C, Hall B. Development of an American College of Surgeons National Surgery Quality Improvement Program: morbidity and mortality risk calculator for colorectal surgery. J Am Coll Surg. 2009b;208(6):1009–16.
Cohn JN. Introduction to surrogate markers. Circulation. 2004;109(25 Suppl 1):IV-20–21.
Collaborative Endocrine Surgery Quality Improvement Collective. Collaborative Endocrine Surgery Quality Improvement Program (CESQIP). In: The American Association of Endocrine Surgeons. 2014. http://cesqip.org/. Accessed 19 Sept 2014.
Coory M, Gkolia P, Yang I, Bowman R, Fong K. Systematic review of multidisciplinary teams in the management of lung cancer. Lung Cancer. 2008;60(1):14–21.
Department of Health Office/Welsh. A policy framework for commissioning cancer services (Calman-Hine report). London: Department of Health; 1995.
Department of Health. The NHS Cancer plan: a plan for investment, a plan for reform. In: Publications. 2000. http://webarchive.nationalarchives.gov.uk/+/www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyandGuidance/DH_4009609. Accessed 19 Sept 2014.
DeRosier J, Stalhandske E, Bagian JP, Nudell T. Using health care failure mode and effect analysis™: the VA National Center for Patient Safety’s prospective risk analysis system. Jt Comm J Qual Patient Saf. 2002;28(5):248–67. http://www.patientsafety.va.gov/professionals/onthejob/hfmea.asp.
Devlin N, Appleby J. Getting the most out of PROMs: putting health outcomes at the heart of NHS decision-making. London: The King’s Fund; 2010.
Drummond MF, Sculpher M, Torrance GW, O’Brien BJ, Stoddart G. Methods for the economic evaluation of health care programmes. 3rd ed. New York: Oxford University Press; 2005.
EHealth University: Centers for Medicare & Medicaid Services. Physician Quality Reporting System (PQRS): Maintenance of Certification Program Incentive. CMS. 2014.
Etzioni DA, Cannom RR, Madoff RD, Ault GT, Beart Jr RW. Colorectal procedures: what proportion is performed by American Board of Colon and Rectal Surgery–certified surgeons? Dis Colon Rectum. 2010;53(5):713–20.
Etzioni DA, Young-Fadok TM, Cima RR, Wasif N, Madoff RD, Naessens JM, Habermann EB. Patient survival after surgical treatment of rectal cancer: impact of surgeon and hospital characteristics. Cancer. 2014;120(16):2472–81.
Fawcett S, Lewis R, Paine-Andrews A, Francisco V, Richter K, Williams E, Copple B. Evaluating community coalitions for prevention of substance abuse: the case of project freedom. Health Educ Behav. 1997;24(6):812–28.
Ferguson G, Eliasziw M, Barr H, Clagett P, Barnes R, Wallace C, Taylor W, Haynes B, Finan J, Hachinski V, Barnett H, for the North American Symptomatic Carotid Endarterectomy Trial Collaborators. The North American Symptomatic Carotid Endarterectomy Trial: surgical results in 1415 patients. Stroke. 1999;30(9):1751–8. https://doi.org/10.1161/01.str.30.9.1751.
Finks J, Osborne N, Birkmeyer J. Trends in hospital volume and operative mortality for high-risk surgery. N Engl J Med. 2011;364(22):2128–37.
Fletcher C. Failure mode and effects analysis: an interdisciplinary way to analyze and reduce medication errors. J Nurs Adm. 1997;27(12):19–26.
Froimson M, Rana A, White R, Marshall A, Schutzer S, Healy W, Naas P, Daubert G, Lorio R, Parsley B. Bundled payments for care improvement initiative: the next evolution of payment formulations: AAHKS Bundled Payment Task Force. J Arthroplasty. 2013;28(8):157–65.
Gabel M, Hilton N, Nathanson S. Multidisciplinary breast cancer clinics. Do they work? Cancer. 1997;79(12):2380–4.
Gartner. Big Data. In: IT Glossary. 2013. http://www.gartner.com/it-glossary/big-data/. Accessed 19 Sept 2014.
Gatt M, Anderson A, Reddy B, Hayward-Sampson P, Tring I, MacFie J. Randomized clinical trial of multimodal optimization of surgical care in patients undergoing major colonic resection. Br J Surg. 2005;92(11):1354–62. https://doi.org/10.1002/bjs.5187.
Goodman R, Wheeler F, Lee P. Evaluation of the Heart To Heart Project: lessons from a community-based chronic disease prevention project. Am J Health Promot. 1995;9(6):443–55.
Goss P, Lee B, Badovinac-Crnjevic T, Strasser-Weippl K, Chavarri-Guerra Y, Louis J, Villarreal-Garza C, Unger-Saldana K, Ferreyra M, Debiasi M, Liedke P, Touya D, Werutsky G, Higgins M, Fan L, Vasconcelos C, Cazap E, Vallejos C, Mohar A, Knaul F, Arreola H, Batura R, Luciani S, Sullivan R, Finkelstein D, Simon S, Barrios C, Kightlinger R, Gelrud A, Bychkovsky V, Lopes G, Stefani S, Blaya M, Souza F, Santos F, Kaemmerer A, Azambuja E, Zorilla A, Murillo R, Jeronimo J, Tsu V, Carvalho A, Gil C, Sternberg C, Duenas-Gonzalez A, Sgroi D, Cuello M, Fresco R, Reis R, Masera G, Gabus R, Ribeiro R, Knust R, Ismael G, Rosenblatt E, Roth B, Villa L, Solares A, Leon M, Torres-Vigil I, Covarrubias-Gomez A, Hernandez A, Bertolino M, Schwartsmann G, Santillana S, Esteva F, Fein L, Mano M, Gomez H, Hurlbert M, Durstine A, Azenha G. Planning cancer control in Latin America and the Caribbean. Lancet Oncol. 2013;14(5):391–436. https://doi.org/10.1016/S1470-2045(13)70048-2.
Group, COMMIT Research. Community Intervention Trial for Smoking Cessation (COMMIT): summary of design and intervention. J Natl Cancer Inst. 1991;83(22):1620–8.
Grube B, Giuliano A. Observation of the breast cancer patient with a tumor-positive sentinel node: implications of the ACOSOG Z0011 trial. Semin Surg Oncol. 2001;20(3):230–7.
Hannan E, Kilburn H, O’Donnell J, Lukacik G, Shields E. Adult open heart surgery in New York State: an analysis of risk factors and hospital mortality rates. JAMA. 1990;264(21):2768–74.
Hannan E, Siu A, Kumar D, Kilburn H, Chassin M. The decline in coronary artery bypass graft surgery mortality in New York State: the role of surgeon volume. JAMA. 1995;273(3):209–13.
Hannan E, Cozzens K, King S, Walford G, Shah N. The New York State cardiac registries history, contributions, limitations, and lessons for future efforts to assess and publicly report healthcare outcomes. J Am Coll Cardiol. 2012;59(25):2309–16.
Hannan E, Farrell L, Wechsler A, Jordan D, Lahey S, Culliford A, Gold J, Higgins R, Smith C. The New York risk score for in-hospital and 30-day mortality for coronary artery bypass graft surgery. Ann Thorac Surg. 2013;95(1):46–52.
Hannan EL, Kilburn H, Racz M, Shields E, Chassin MR. Improving the outcomes of coronary artery bypass surgery in New York State. JAMA. 1994;271(10):761–6.
Haynes A, Weiser T, Berry W, Lipsitz SR, Breizat A, Dellinger P, Herbosa T, Joseph S, Kibatala P, Lapitan M. A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med. 2009;360(5):491–9.
Healey AN, Sevdalis N, Vincent CA. Measuring intra-operative interference from distraction and interruption observed in the operating theatre. Ergonomics. 2006;49(5–6):589–604.
Hospital Consumer Assessment of Healthcare Providers and Systems. HCAHPS: Hospital consumer assessment of healthcare providers and systems. In: Hospital Consumer Assessment of Healthcare Providers and
Rosen M, Brody F, Ponsky J, Walsh R, Rosenblatt S, Duperier F, Fanning A, Siperstein A. Recurrence after laparoscopic ventral hernia repair. Surg Endosc Other Intervent Tech. 2003;17(1):123–8.
Roussos S, Fawcett S. A review of collaborative partnerships as a strategy for improving community health. Ann Rev Public Health. 2000;21(1):369–402. https://doi.org/10.1146/annurev.publhealth.21.1.369.
Rutter C, Johnson E, Feuer E, Knudsen A, Kuntz K, Schrag D. Secular trends in colon and rectal cancer relative survival. J Natl Cancer Inst. 2013;105:1806–13.
Schone E, Brown R. Risk adjustment: what is the current state of the art and how can it be improved? In: Robert Wood Johnson Foundation. 2013. http://www.rwjf.org/en/library/research/2013/07/risk-adjustment—what-is-the-current-state-of-the-art-and-how-c.html. Accessed 19 Sept 2014.
Schweizer M, Cullen J, Perencevich E, Vaughan S. Costs associated with surgical site infections in veterans affairs hospitals. JAMA Surg. 2014. https://doi.org/10.1001/jamasurg.2013.4663.
Sharma R, Hawley C, Griffin R, Mundy J, Peters P, Shah P. Cardiac surgical outcomes in abdominal solid organ (renal and hepatic) transplant recipients: a case matched study. Heart Lung Circ. 2011;20(12):804–5.
Sitzia J, Wood N. Patient satisfaction: a review of issues and concepts. Soc Sci Med. 1997;45(12):1829–43.
Society for Surgery of the Alimentary Tract. The society for surgery of the alimentary tract. In: The Society for Surgery of the Alimentary Tract. 2016. http://www.ssat.com/. Accessed 6 Jul 2016.
Society for Surgical Oncology. SSO: Society for surgical oncology. In: Society for Surgical Oncology. 2014. http://www.surgonc.org/. Accessed 19 Sept 2014.
Solomon D, Losina E, Baron J, Fossel A, Guadagnoli E, Lingard E, Miner A, Phillips C, Katz J. Contribution of hospital characteristics to the volume–outcome relationship: dislocation and infection following total hip replacement surgery. Arthritis Rheum. 2002;46(9):2436–44.
Stephens M, Lewis W, Brewster A, Lord I, Blackshaw G, Hodzovic I, Thomas G, Roberts S, Crosby T, Gent C, Allison M, Shute K. Multidisciplinary team management is associated with improved outcomes after surgery for esophageal cancer. Dis Esophagus. 2006;19(3):164–71. https://doi.org/10.1111/j.1442-2050.2006.00559.x.
U.S. Department of Health & Human Services. Data. In: Organ Procurement and Transplantation Network. 2014. http://optn.transplant.hrsa.gov/data/. Accessed 7 Jul 2016.
Ugiliweneza B, Kong M, Nosova K, Huang BA, Babu R, Klad SP, Boakye M. Spinal surgery: variations in healthcare costs and implications for episode-based bundled payments. Spine. 2014;39:1235–42.
Vascular Quality Initiative. Improving vascular care. In: Society for Vascular Surgery. 2014. http://www.vascularqualityinitiative.org/. Accessed 19 Sept 2014.
Wang Y, Jiang C, Guan J, Yang G, Yue J, Chen H, Xue J, Xu Z, Qian Q, Fan L. Molecular alterations of EGFR in small intestinal adenocarcinoma. Int J Colorectal Dis. 2013. https://doi.org/10.1007/s00384-013-1689-6.
Wayne A, Lodolce A. Use of failure mode and effects analysis in improving the safety of IV drug administration. Am J Health-Syst Pharm. 2005;62(9):917–22.
Wehrli-Veit M, Riley J, Austin J. A failure mode effect analysis on extracorporeal circuits for cardiopulmonary bypass. J Extra Corpor Technol. 2004;36(4):351–7.
Weigl M, Antoniadis S, Chiapponi C, Bruns C, Sevdalis N. The impact of intra-operative interruptions on surgeons’ perceived workload: an observational study in elective general and orthopedic surgery. Surg Endosc. 2015;29(1):145–53.
Weinstein R, Linkin D, Sausman C, Santos L, Lyons C, Fox C, Aumiller L, Esterhai J, Pittman B, Lautenbach E. Applicability of healthcare failure mode and effects analysis to healthcare epidemiology: evaluation of the sterilization and use of surgical instruments. Clin Infect Dis. 2005;41(7):1014–9.
Weir V. Best-practice protocols: preventing adverse drug events. Nurs Manage. 2005;36(9):24–30.
Weiser T, Regenbogen S, Thompson K, Haynes A, Lipsitz S, Berry W, Gawande A. An estimation of the global volume of surgery: a modelling strategy based on available data. Lancet. 2008;372(9633):139–44.
Weiser T, Semel M, Simon A, Lipsitz S, Haynes A, Funk L, Berry W, Gawande A. In-hospital death following inpatient surgical procedures in the United States, 1996–2006. World J Surg. 2011;35(9):1950–6.
Wetterneck T, Skibinski K, Roberts T, Kleppin S, Schroeder M, Enloe M, Rough S, Hundt A, Carayon P. Using failure mode and effects analysis to plan implementation of smart IV pump technology. Am J Health-Syst Pharm. 2006;63(16):1528–38.
Whitlock E, Lin J, Liles E, Beil T, Fu R. Screening for colorectal cancer: a targeted, updated systematic review for the US Preventive Services Task Force. Ann Intern Med. 2008;149(9):638–58.
Wiegmann D, ElBardissi A, Dearani J, Daly R, Sundt III T. Disruptions in surgical flow and their relationship to surgical errors: an exploratory investigation. Surgery. 2007;142(5):658–65.
Wille-Jorgensen P, Sparre P, Glenthoj A, Holck S, Norgaard Petersen L, Harling H, Stub Hojen H, Bulow S. Result of the implementation of multidisciplinary teams in rectal cancer. Colorectal Dis. 2013;15(4):410–3. https://doi.org/10.1111/codi.12013.
World Alliance for Patient Safety. WHO surgical safety checklist and implementation manual. In: World Health Organization. 2014. http://www.who.int/patientsafety/safesurgery/ss_checklist/en/. Accessed 7 Jul 2016.
Yahchouchy-Chouillard E, Aura T, Picone O, Etienne J, Fingerhut A. Incisional hernias. Digest Surg. 2003;20(1):3–9.
Zapka J, Marrocco G, Lewis B, McCusker J, Sullivan J, McCarthy J, Birch F. Inter-organizational responses to AIDS: a case study of the Worcester AIDS Consortium. Health Educ Res. 1992;7(1):31–46.
8 Health Services Information: From Data to Policy Impact (25 Years of Health Services and Population Health Research at the Manitoba Centre for Health Policy)
Contents
Introduction
The Deliverable Process
What Is a Deliverable?
Negotiating the Deliverable Topics
The Approval Process
Meetings
Presentations During the Project
Deliverable Measures and Indicators
Highlights of Selected Deliverables
The “Need to Know” Team Deliverables
Abstract
The impact of the Manitoba Centre for Health Policy (MCHP) on policy development has resulted from an integrated approach to knowledge translation (KT), combined with a close relationship between the proposed/ongoing research and those working on provincial programs. Under a 5-year funding agreement with Manitoba Health, the director of MCHP negotiates five new major projects (called “deliverables”) annually with the Deputy Minister of Health. Researchers interact among themselves and with the provincial government in several ways: through forums, advisory group meetings, knowledge translation workshops, and Need to Know (NTK) team meetings. Need to Know representatives are from all the regional health authorities (RHAs), from Manitoba Health, and also from MCHP staff. This and other activities related to knowledge translation are discussed. This chapter outlines steps in the deliverable process. MCHP researchers retain publication rights over the content of the deliverable with government input being advisory only. Several deliverables over the past 15 years, and their program and policy impacts, are discussed. Finally, linking information from various government departments with longitudinal and familial data has created a large, integrated data repository. Looking ahead, life stage analyses and intervention studies have great potential. In keeping with past success, MCHP believes information-rich environments should continue to facilitate opportunities for new types of research and policy analysis.
Introduction
The Manitoba Centre for Health Policy’s (MCHP) impact on policy and program development is the result, in large part, of an integrated approach to knowledge translation. This chapter focuses on this integrated approach which has become one of the key factors underlying MCHP’s success. This chapter begins with a description of the deliverable process and the numerous ways researchers interact with provincial government personnel. MCHP enjoys an arm’s-length relationship with the provincial government, which has no involvement in the interpretation of data or drafting of deliverables (reports), and MCHP retains rights to publish all of its work. Next, the impact several deliverables have had on government policies and programs will be highlighted. Following this, an overview of the knowledge translation (KT) activities that have resulted in so many of MCHP’s impacts is provided. To conclude, important and interesting research opportunities as well as challenges that lie ahead for scientists using information-rich repositories like ours will be discussed.
The Deliverable Process
What Is a Deliverable?
MCHP works under a 5-year funding agreement with Manitoba Health to undertake five new major research projects a year plus KT events that ensure the research is understood by
policy-makers and planners. These projects, termed deliverables, address health and social questions that can best be answered using data from the Population Health Research Data Repository (Repository) which is developed, housed, and maintained at MCHP (see ▶ Chap. 2, “Health Services Data: Managing the Data Warehouse: 25 Years of Experience at the Manitoba Centre for Health Policy”).
Each deliverable takes approximately 2 years to complete. Deliverables are produced by teams that typically include a principal and co-principal investigator (PI and Co-PI), a research coordinator (RC), research support (RS), and data analysts (DAs). Team members are typically chosen based on their area of expertise. Teams typically meet weekly or biweekly throughout the course of a deliverable to discuss the direction and progress of the study, interpret results, and determine how best to “tell the stories” that emerge from the data. A few times over the course of a deliverable, the team also meets with an “advisory group” made up of representatives from government and other stakeholders who have relevant expertise and can provide valuable feedback at different points in the research process (see section “Meetings” for more details).
Negotiating the Deliverable Topics
Topics for deliverables are jointly determined by the Deputy Minister of Manitoba Health in negotiation with the director of MCHP. Consultations with assistant deputy ministers, MCHP scientists, and regional health authorities (RHAs) are undertaken when appropriate. The final list of topics is signed off by the Minister of Manitoba Health.
Ideas are solicited from a broad range of stakeholders. If the research seems feasible using repository data, the idea is added to a list. Specific topics are also put forward by Manitoba Health and the Healthy Child Committee of Cabinet (Healthy Child Manitoba is Manitoba’s long-term, cross-departmental strategy for putting families and children first). Negotiations typically start in the fall with final decisions made by the following spring. At that time, Manitoba Health provides MCHP with a brief description of each deliverable. These descriptions are posted on the MCHP website in the area called “Upcoming MCHP Reports” http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/upcoming_deliverables.html.
The associate director of research at MCHP works with the director to assign the investigators for each project. Soon after, a similar process is undertaken by the lead research coordinator, the associate director of data access and use, and the research support coordinator to identify the remaining team members from their respective workgroups (research coordinators, data analysts, and research support). Occasionally, deliverable teams will include graduate students or members outside of MCHP because of their expertise or interest in the topic.
The Approval Process
The PI works with the deliverable team to develop an initial analysis plan, which is then presented and critically reviewed in a research-scientist forum held at MCHP. This forum is attended by internal researchers and team members who help refine the plan. The RC, in collaboration with the PIs, then prepares and submits the Health Information Privacy Committee (HIPC) and Health Research Ethics Board (HREB) applications for approval. Depending on the datasets to be used in the deliverable, additional approvals from other data providers may also be required. Throughout the life of the project, changes to the analysis plan (“amendments”) and annual progress reports must be submitted to HREB in order for the project to maintain its approved status.
Meetings
Meetings of the Advisory Group
An advisory group (AG) is also formed for each deliverable. It includes data providers, clinicians,
health or social service experts, provincial planners, policy-makers, RHA representatives, and other stakeholders with an interest in the topic. This group meets two to three times over the life of the project to review progress, discuss findings, suggest alternative strategies or approaches where necessary, provide clarifications based on their area of expertise, and review the final draft of the deliverable. It is also not uncommon for AG members to be contacted between meetings for their advice on specific issues. A strong relationship with policy-makers and other stakeholders also facilitates access to data and other nonfinancial resources that are important for the success of the research MCHP conducts.
The AG is a critically important group for MCHP; many times the real expertise concerning issues of data collection, history, and use lies with members of the AG. Their input provides an important check on any assumptions the deliverable team may have formed. Occasionally, depending on their contributions, AG members may also be recognized with authorship on the final report.

Meetings with the Associate Director of Research
Throughout the project, PIs meet with the associate director of research regularly to discuss their projects and enlist support if projects are progressing slowly or running into problems. Two common challenges addressed at these meetings are the acquisition of new data and human resource issues (lack of resources, inappropriate skills or expertise, workload conflicts, etc.). These meetings also help to ensure that steady progress is being made and that expectations concerning deadlines are achievable.

Meetings with the Need to Know (NTK) Team
A small number of deliverables involve the Need to Know Team (NTK Team), a collaborative researcher/senior-level-planner group that includes representatives from all RHAs, several representatives of Manitoba Health, and MCHP staff. The NTK Team was established in 2001 through funding from the Canadian Institutes of Health Research (CIHR) and has continued with support from various other sources. The NTK Team meets three times a year for 2-day workshops, together creating knowledge of relevance to regional planners, informing the research, building capacity among the partners, and devising dissemination and application strategies to promote research uptake. Its foundation and goals are simple: by having researchers work with decision-makers, research may be brought closer to policy. In other words, the hope is to smooth the transition between analysis and application, between paper and practice. In 2005 the national “CIHR Knowledge Translation Award” was awarded to the NTK Team for regional impact on health research.

Presentations During the Project

During the life of a typical deliverable, there are numerous opportunities to discuss the project, present preliminary results, and report on progress. Such opportunities include:

• MCHP knowledge translation workshop days – where invited guests consisting of government stakeholders meet with MCHP scientists and support staff to discuss deliverables
  • Provincial RHA Day
  • Winnipeg RHA Day
  • Manitoba Health Day
  • Manitoba Government Day
• Research forums – meetings where invited participants discuss the substantive merits of various research proposals and progress updates
  • Held weekly on Wednesday afternoons at MCHP
• NTK meetings (held two to three times a year, as discussed above)
• MCHP Advisory Board meetings (held biannually)
  • The board consists of five deputy ministers plus leading experts, other academic representatives, and the MCHP executive group.

The main steps in the deliverable process are presented in Table 1.
[Map of Manitoba's regional health authorities. Labels: Northern (formerly Burntwood and Nor-Man); Prairie Mountain (formerly Parkland, Assiniboine, and Brandon); Interlake-Eastern (formerly Interlake and North Eastman); Southern (formerly Central and South Eastman); Winnipeg (the former Churchill region is now part of Winnipeg Regional Health).]
practices that address inequities in the healthcare system. Table 2 provides a list of frequently included study indicators.

Highlights of Selected Deliverables

This section provides an overview of MCHP deliverables (see Table 3) that have had specific or ongoing impacts on policy and programs in the Manitoba community. The deliverables highlighted were published within the last 15 years (2000-2014) and there were no major criteria for their selection. Only deliverables with a concrete example of impact on policy and programs in Manitoba were described.

The “Need to Know” Team Deliverables

As described above, a small number of all deliverables involve the Need to Know Team (NTK Team), a collaborative researcher/senior-level-planner group that includes representatives from all RHAs, several representatives of Manitoba Health, and MCHP staff.

The RHA Indicators Atlas Reports
The NTK Team is an important component of the RHA Atlas deliverables. The Manitoba RHA Indicators Atlas reports provide regional and subregional data on over 50 indicators of population health status, health service use,
Fig. 2 Winnipeg Community Areas (WCAs) [map not reproduced]
* Churchill is also part of the Winnipeg Health Region (not shown in this map)
and quality of care. These reports provide RHAs with information on which to plan, increasing the likelihood that they will achieve their goals, and allow all RHAs to compare their health status with regional and provincial averages. The three atlases (see Table 3 a–c) were commissioned by Manitoba Health to inform the Comprehensive Community Health Assessment (CHA) reports required by provincial legislation every 5 years.
The atlases for CHA reporting are also used to develop RHA strategic plans. Over the years, numerous regions have told MCHP that resource allocation plans have been informed by evidence from our reports (e.g., the need to increase resources or support in some areas, while reducing them in others).
The establishment and early work of the NTK Team also resulted in organizational effects in all three partners (academic, provincial government, and RHAs). Several RHAs revised job descriptions and responsibilities to allocate more time and energy to finding and using evidence to inform decisions. At least one RHA actually created a new full-time position for this type of work. RHA representatives on the NTK Team are extremely valuable members of advisory groups for other deliverables, as they already have an established appreciation of the repository’s data and its possible uses. The team also increased the effectiveness and efficiency of the CHA network group, which has many representatives in common with the NTK Team. Each atlas has resulted in a round
of site visits. Almost every region has invited MCHP scientists to workshops in their RHAs to explore local results in depth and to discuss implications for policy and planning. Feedback from these regional workshops suggests that the impacts are significant and long-lasting.
Several NTK Team members are also public health officers who train medical students and
residents in their communities. In these regions, trainees may develop reports on the health of the communities in which they are working; they are frequently referred to the RHA atlas reports as a key source of information. Other NTK Team members have used atlases at their regional board of directors meetings, tackling one or two chapters of the report at each of a series of meetings. This provides valuable education for board members and the opportunity for discussion with senior management.
The two most recent RHA atlases are also listed 2nd and 16th on the list of the top 20 downloaded deliverables from MCHP’s website over a 5-year period (from April 1, 2009, to March 31, 2014) (see Table 4).

Other NTK Team Deliverables
The sex differences report (see Table 3-d) may have also played some role in the Winnipeg Regional Health Authority’s (WRHA) deliberations regarding heart health services in the mid-late 2000s. There had been some movement toward creating a women’s heart health center, based on other evidence (not coming from MCHP) demonstrating that female heart attack patients were not receiving the same level of service as their male counterparts. The MCHP report showed that this apparent sex bias was not actually real. Within every 5-year age group, female and male heart attack patients received the same level of care. The difference in intervention rates was driven solely by the fact that female patients are known to experience heart attacks at a much older age (8–10 years older) than males. Males were not being treated more aggressively than females, but rather, younger patients received more treatments than older patients, and the younger patients were more likely to be male.
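
The reasoning in the sex differences report is an instance of confounding by age: a crude comparison of intervention rates can show a gap between the sexes that vanishes once patients are compared within age groups. The following is a minimal sketch in Python using entirely hypothetical counts (they are not MCHP data, and the age groups and function names are invented for illustration). It shows how equal treatment rates within each age group can still produce unequal crude rates when one sex is concentrated in the older group.

```python
# Hypothetical counts, invented for illustration only (not MCHP data).
# Each record: (age_group, sex, patients, patients_receiving_intervention)
records = [
    ("55-59", "M", 800, 480),
    ("55-59", "F", 200, 120),
    ("75-79", "M", 200, 40),
    ("75-79", "F", 800, 160),
]


def crude_rates(rows):
    """Intervention rate per sex, ignoring age."""
    totals = {}
    for _, sex, patients, treated in rows:
        entry = totals.setdefault(sex, [0, 0])
        entry[0] += patients
        entry[1] += treated
    return {sex: treated / patients for sex, (patients, treated) in totals.items()}


def stratified_rates(rows):
    """Intervention rate per (age group, sex) stratum."""
    return {(age, sex): treated / patients for age, sex, patients, treated in rows}


if __name__ == "__main__":
    print("Crude rates by sex:", crude_rates(records))
    for stratum, rate in sorted(stratified_rates(records).items()):
        print(stratum, round(rate, 2))
```

In this made-up example the crude rates differ between the sexes, while the rates within each age group are identical; the whole gap comes from the different age distributions, which is the pattern the report describes.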
Table 4 The top 20 downloaded deliverables, April 1, 2009, to March 31, 2014
Rank | Deliverable | Year published/available online | Page views per year
1 | Perinatal Services and Outcomes in Manitoba | November 2012 | 104,735
2 | The 2013 RHA Indicators Atlas | October 2013 | 54,460
3 | Social Housing in Manitoba: Part I and Part II | June 2013 | 46,841
4 | Projecting Personal Care Home Bed Equivalent Needs in Manitoba Through 2036 | October 2012 | 27,572
5 | Profile of Metis Health Status and Healthcare Utilization in Manitoba: A Population-Based Study | June 2010 | 15,941
6 | Health Inequities in Manitoba: Is the Socioeconomic Gap in Health Widening or Narrowing Over Time? | September 2010 | 11,195
7 | Pharmaceutical Use in Manitoba: Opportunities to Optimize Use | December 2010 | 9,331
8 | The Additional Cost of Chronic Disease in Manitoba | April 2010 | 6,432
9 | Manitoba Child Health Atlas Update | November 2008 | 6,400
10 | What Works? A First Look at Evaluating Manitoba’s Regional Health Programs and Policies at the Population Level | March 2008 | 6,118
11 | Effects of Manitoba Pharmacare Formulary Policy on Utilization of Prescription Medications | December 2009 | 6,107
12 | Defining and Validating Chronic Diseases: An Administrative Data Approach | July 2006 | 6,031
13 | Patterns of Regional Mental Illness Disorder Diagnoses and Service Use in Manitoba: A Population-Based Study | September 2004 | 5,334
14 | Assessing The Health Of Children In Manitoba: A Population-Based Study | February 2001 | 5,213
15 | Who is in our hospitals and why | September 2013 | 5,103
16 | Manitoba RHA Indicators Atlas 2009 | September 2009 | 4,975
17 | The Health and Health Care Use of Registered First Nations People Living in Manitoba: A Population-Based Study | March 2002 | 4,906
18 | How are Manitoba’s Children Doing? | October 2012 | 4,832
19 | Composite Measures/Indices of Health and Health System Performance | August 2009 | 4,756
20 | Population Aging and the Continuum of Older Adult Care in Manitoba | February 2011 | 3,068
Note: PDF copies of all deliverables became available on the MCHP website in 1999
Page views per year are averaged over the 5-year period
The mental illness report (see Table 3-e) was important for documenting and spreading the word about the high prevalence of mental illness in Manitoba and the high use of healthcare services by people with mental illness. This topic was identified as a high priority by the rural and northern RHAs and by the Deputy Minister of Health and assistant deputy ministers. Between 1997 and 2002, more than one in four Manitobans had at least one mental illness diagnosis and used nearly half of the days people spent in hospitals. Most of the services used were not for mental illness, but across the entire spectrum of physical illness as well. This added important evidence to the understanding of the comorbidity of physical and mental illness. The timeliness and prominence of the report also resulted in its principal investigator, Dr. Patricia Martens, being invited to join the first Scientific Advisory Board for the Mental Health Commission of Canada.
The Mental Health Commission of Canada has used MCHP research in launching its national research project to find sustainable solutions for homeless people with mental health issues. MCHP was included as a key partner in the Winnipeg demonstration project: http://www.mentalhealthcommission.ca/sites/default/files/At%252520Home%252520Report%252520Winnipeg%252520ENG_0.pdf.
The mental illness report (see Table 3-e) also revealed that close to 83 % of nursing-home residents have at least one mental illness diagnosis, yet the most frequent users of psychiatrists are people 35–55 years old. The report indicated that planners may want to ensure that facility staff are trained to provide care to address mental health as well as physical health needs and that people in personal care homes are referred for treatment. This finding may have contributed to the decision by the provincial health Minister at the time to invest more than $40 million to implement a comprehensive strategy to improve the quality of care in Manitoba’s personal care homes. The funding was pledged to hire 250 registered nurses, registered psychiatric nurses, and licensed practical nurses, 100 personal healthcare aides, and 50 allied healthcare professionals to increase the direct hours of care, strengthen the work environment for staff, and provide dementia education to staff and families: http://news.gov.mb.ca/news/index.html?archive=&item=2707.

Manitoba’s Indigenous Population

The Métis community makes up roughly 6 % of Manitoba’s population. The Metis Health deliverable (see Table 3-f) explored the Métis community’s health status and healthcare use, as well as many social indicators of health. Overall, Métis people living in Northern Manitoba were found to be less healthy compared to those living in the southeast region (South Eastman) (see Fig. 1). This deliverable drew the attention of the Manitoba Metis Federation (MMF), who were concerned with identifying regions and health areas needing improvement in order to better the health and well-being of the Métis community. The MMF worked alongside MCHP to produce this report as one element in the regional planning profiles and to provide a springboard for other studies. This was the first attempt in Canada to do a population-level Métis health assessment.
The Health of First Nations deliverable (see Table 3-g), with the approval and collaborative support of the Health Information and Research Committee of the Assembly of Manitoba Chiefs, studied the health of Manitoba’s Registered First Nations people, identifying factors that contribute to differences in health. The study focused on the First Nations population as a group, as well as by Tribal Council and by on-reserve versus off-reserve populations. Comparisons were made to the Manitoba population across various health-related indicators. Compared to all other Manitobans, a Registered First Nations person’s life expectancy was 8 years shorter, the chance of dying at a young age was more than doubled, the chance of developing diabetes was more than quadrupled, and the chance of having an amputation as a result of diabetes increased 16-fold. Hospitalization rates were doubled for Registered First Nations persons compared to all other Manitobans, and they were three times higher for hospitalizations due to injury. Overall, health status rates varied across tribal councils. However, premature mortality rates were lowest in the north and highest in the south. This finding was surprising due to the “reversed” association with geography; in many previous MCHP studies and other reports, the health of residents of Northern Manitoba was usually shown to be worse than that of residents in the south. However, this report showed the opposite to be true: First Nations residents of the north were healthier than their counterparts in the south. These findings have been extensively used by the Assembly of Manitoba Chiefs (AMC) health councils for planning.

Hospitals, Emergency Departments, ICUs, and Long-Term Care

The epidemiology and outcomes of critical illness in Manitoba report (see Table 3-h) allowed linkage of the extensive clinical database created by the Department of Critical Care Medicine to the repository. This combination of data sources is unique, allowing a first-ever population-based exploration of the use of intensive care units (ICUs), and it fostered the development of an ongoing research group. In this report, the entire population of Manitoba and all hospitals were assessed from 1999/2000 to 2007/2008. About
0.6 % of Manitoba adults are admitted to an ICU each year, which means that about 8 % of those in hospitals are assessed as needing ICU care. Over a 9-year period, ICU beds in Winnipeg were full less than 5 % of the time. Outside of Winnipeg, ICU beds were full less than 1 % of the time. The average age for ICU patients was 64 years and admission rates peaked at those 80 years of age. Overall, about two-thirds of adult ICU care was for patients 60 years and older and the annual number of ICU admissions has dropped slightly; however, the length of stay in ICUs has increased over time. Repeated need for ICU care was surprisingly common (15 %) and previous ICU patients were almost four times more likely to be admitted again to an ICU in the year after discharge. Finally, the most common reason for ICU admission was cardiovascular conditions, followed by sepsis, lung disorders, accidents or traumas, and poisonings. This exploratory deliverable was the first of its kind to link clinical data on ICU patients into a population-based repository; thus it created a globally unique and flexible research tool. This tool is being leveraged for use in research projects and graduate student theses. The results on ICU bed utilization confirmed that the number of ICU beds in the Winnipeg RHA was within the recommended range. The report has resulted in four published manuscripts (Garland et al. 2013, 2014a, b; Olafson et al. 2014), with one more underway. It has also fostered several related research projects which have received peer-reviewed funding and provided additional publications.
The population in Manitoba, as it is in other parts of Canada and the developed world, is rapidly aging. The population aging deliverable (see Table 3-i) looked at the use of home care, supportive housing, and personal care homes (also known as nursing homes) in Winnipeg, MB, from several perspectives. First, past rates in nursing-home use were used to create two scenarios which showed that nursing-home use will increase by 30–50 % by 2031, emphasizing the importance of developing strategies to continually reduce rates of nursing-home use. This work also revealed the clinical profile of current day nursing-home residents, showing the potential for supportive housing to offset nursing-home use. While about 50 % of newly admitted nursing-home residents required weight-bearing help to complete activities of daily living (ADLs), about a quarter of new residents had at most moderate challenges across several clinical domains (e.g., ADLs, behavior, continence, cognitive performance). Furthermore, about 12 % of newly admitted nursing-home residents had the same clinical profile as supportive housing clients (i.e., minor ADL and/or cognitive challenges, with few needs in other clinical areas), suggesting the potential of supportive housing to offset nursing-home use, now and into the future. Collectively, these findings emphasized the need to develop appropriate transitional strategies across the older adult continuum of care, ensuring that people have access to the right care at the right time. Subsequently the Manitoba government announced two initiatives which may have been informed by this work:

• Advancing Continuing Care – A Blueprint to Support System Change
  http://news.gov.mb.ca/news/?item=31246
• Manitoba’s Framework for Alzheimer’s Disease and Other Dementias
  http://news.gov.mb.ca/news/index.html?item=31385

The analysis of emergency departments (see Table 3-j) has had several impacts. Manitoba Health approved funding for the Eastman RHA (see Fig. 1) to hire 2.1 equivalent full-time staff to support mental health services. This is due to the report’s finding that 54 % of frequent emergency department (ED) users (seven or more ED visits per year) have been diagnosed with two or more mental illnesses. The funding was approved for the placement of Registered Psychiatric Nurses in EDs. Manitoba Health designated a total of $165,302 for the 2008/09 and 2009/10 budget years: http://news.gov.mb.ca/news/index.html?archive=&item=4458. The Canadian Health Services Research Foundation (CHSRF) included some of the primary findings of this deliverable in their publication on emergency room overcrowding: http://www.cfhi-fcass.ca/sf-docs/default-source/mythbusters/Myth-Emergency-Rm-Overcrowding-EN.pdf?sfvrsn=0.
The CHSRF also wrote about MCHP’s ability to transform data into quality care and transfer information down the chain of command to those who could make the appropriate changes and improvements. Their report highlighted the approach the principal investigator Dr. Malcolm Doupe took in explaining the deliverable “Using Administrative Data to Develop Indicators of Quality Care in Personal Care Homes” (see Table 3-k) to the Brandon RHA personal care homes’ managers and policy-makers. Results were seen immediately in the quality of care: a pneumonia care map was introduced; the region’s “personal care forum” became more productive, setting goals and action plans and updating each other on their progress; and a program for better managing medications of new residents was introduced: http://www.cfhi-fcass.ca/sf-docs/default-source/building-the-case-for-quality/TRANSFORMING_DATA_ENG_1.pdf?sfvrsn=0.
The performance of rural and northern hospitals deliverable (see Table 3-l) showed that rural Manitobans do not use nearby hospitals. Across 68 rural hospitals, occupancy rates were below 60 % and some hospitals and health centers were keeping admitted patients for too long (low scores on discharge efficiency). In 2002 the Manitoba Government announced a pilot project with the Southeast Manitoba RHA to serve more surgery patients at two local hospitals in an effort to make better use of rural facilities and provide patient care closer to home: http://www.gov.mb.ca/chc/press/top/2002/07/2002-07-09-01.html.

Maternal and Child Health

The Perinatal Services and Outcomes deliverable (see Table 3-m) has been the number one deliverable downloaded from the MCHP website (see Table 4). The WRHA Women’s Health Program used the report to validate their initiatives and reiterate the importance of the prenatal period in promoting optimal early childhood development. Inadequate prenatal care is being addressed through the “Partners in Inner-city Integrated Prenatal Care (PIIPC)” initiative, stimulated in part by the high rates of inadequate care found in the Winnipeg Community Areas of Point Douglas, Downtown, and Inkster (see Fig. 2). The deliverable included new information on rates of postpartum depression/anxiety in Manitoba, revealing that women who experienced anxiety or depression during their pregnancy were eight times more likely to experience it postpartum. The WRHA reaffirmed the Women’s Health Program’s efforts to ensure that information and resources are continuously available in the postpartum period to foster mental health. Staff in the Population Health and Health Equity and Public Health Program, administered by Manitoba Health, noted that the perinatal deliverable influenced their thinking about potential positive impacts of public health engagement early with families in the prenatal period; findings from the deliverable have been used to inform development of the provincial public health nursing standards. The WRHA is actively interested in reducing health inequities. They have been particularly interested in breastfeeding initiation. The perinatal deliverable highlighted variations in initiation rates across the city (e.g., over 90 % in an affluent neighborhood and approximately 65 % in a less affluent one). These variations were significant in motivating the WRHA to begin tracking breastfeeding initiation and duration rates across Winnipeg.
The Baby First deliverable (see Table 3-n) evaluated how well the Manitoba Baby First screening program (established in 1999, now called “Families First”) works with regard to identifying children at risk. About 75 % of babies had a Baby First screening form filled out; the screen was reasonably successful in picking out children who eventually ended up in foster care. The strongest predictors of a child ending up in care were having a file with local child protection services, being on income assistance, having a mother who did not finish high school, and living in a one-parent family with no social support. Because the age of the mother at the birth of her first child was also found to be highly predictive (and was not currently being asked on the screening form), Healthy Child Manitoba responded to preliminary drafts of the report by adding this item to the screening form (see Fig. 3). In addition,
child maltreatment and assault injury rates in children up to 3 years of age declined after the Baby First home visiting program was initiated.
Poor health during childhood raises the risk of poor adult health. The Child Health Atlas (see Table 3-o) found that infant mortality was double for the lowest-income areas compared to the highest-income areas, and the leading cause of death for children was injury due to motor vehicle crashes. Children living outside of Winnipeg are
twice as likely to die from injuries and almost two-and-a-half times as likely to be hospitalized for injuries. Because of these findings, Manitoba Health announced a new public initiative aimed at preventing childhood injuries in the home: http://news.gov.mb.ca/news/index.html?item=25659&posted=2002-02-26.
The children’s educational outcomes and socioeconomic status deliverable (see Table 3-p), which stemmed from the second Child Health Atlas, revealed some very surprising findings. This deliverable looked at performance on Grade 12 standard tests by socioeconomic status (SES) (see Fig. 4). The left side of Fig. 4 shows, for youths who wrote the test, that students from the poorest families (those receiving provincial income assistance) had a passing rate of 75 %, whereas students residing in the city’s highest-income neighborhoods had a passing rate of 95 %. The right side of the graph includes not only those students who wrote the test but also, more importantly, those students born in the same year who are still residing in Winnipeg and who should have written the test had they progressed through the school system as expected. This population-based analysis shows a much steeper gradient, with the passing rates for youth in families on provincial income assistance dropping to 16 %. The two figures differ in that the one on the right includes those who have been held back a grade or more or who have withdrawn from school. Such surprising findings demonstrate the need for better educational programs and initiatives for students from low-income families. This report, along with the Child Health Atlas, led to the development of two initiatives:
[Figure 4: two stacked-bar panels. Horizontal axis in each panel: Socioeconomic Status (SES), with categories Income Assistance, Low, Low-Middle, Middle, High. Vertical axis: percentage of youths. Legend: Pass; Fail; Drop Course, Absent, Exempt, Incomplete; In Grade 12 but no LA test mark; In Grade 11 or lower; Withdrawn. Pass percentages labeled on the bars, in ascending order: 75%, 88%, 88%, 91%, 95% (left panel); 16%, 52%, 64%, 71%, 80% (right panel).]
Note: A version of this figure has also been published in Roos, NP et al., 2010, Milbank Quarterly, 88(3):382-403 and in Brownell, M et al., How Do
Educational Outcomes Vary With Socioeconomic Status? June 2004, Manitoba Centre for Health Policy
Fig. 4 Grade 12 language arts (LA) test performance by Winnipeg socioeconomic status, 2001/02. Youths born in
Manitoba in 1984
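
The gap between the two panels of Fig. 4 comes down to the choice of denominator: the left panel divides passes by the number of youths who wrote the test, while the right panel divides the same passes by the whole birth cohort still living in Winnipeg. The sketch below works through that arithmetic in Python. The counts are hypothetical, chosen only so that the two rates reproduce the 75 % and 16 % quoted in the text for youths in families on income assistance; the category names and function names are likewise assumptions made for illustration, not MCHP's definitions.

```python
# Hypothetical cohort counts for youths in families on income assistance.
# They are invented so the two rates match the 75 % and 16 % quoted in the
# text; they are not taken from the MCHP report.
cohort = {
    "pass": 480,
    "fail": 160,
    "in_grade_12_without_test_mark": 560,  # includes dropped, absent, exempt
    "in_grade_11_or_lower": 900,
    "withdrawn": 900,
}


def pass_rate_among_writers(counts):
    """Passes divided by youths who wrote the test (pass + fail)."""
    writers = counts["pass"] + counts["fail"]
    return counts["pass"] / writers


def pass_rate_population_based(counts):
    """Passes divided by everyone in the birth cohort."""
    return counts["pass"] / sum(counts.values())


if __name__ == "__main__":
    print(f"Among test writers: {pass_rate_among_writers(cohort):.0%}")
    print(f"Population-based:   {pass_rate_population_based(cohort):.0%}")
```

The number of passes is the same in both calculations; only the denominator changes, which is why a rate computed on test writers alone can hide youths who were held back or who left school.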
• The “Community School Investigators (CSI) program”
  http://www.bgcwinnipeg.ca/system/resources/W1siZiIsIjIwMTQvMDEvMTYvMTgvMDQvMzUvNDE3L0NTSV9SZXBvcnRfMjAxMi5wZGYiXV0/CSI%20Report%202012.pdf (p. 6)
• The Community Schools Partnership Initiative (CSPI)
  http://www.edu.gov.mb.ca/cspi/

Two additional child health atlases have been produced at MCHP since the 2001 and 2004 atlases: The Child Health Atlas Update (2008) (#9 in Table 4), which provided much needed information on child health for the annual Community Health Assessments, and How Are Manitoba’s Children Doing? (2012) (#18 in Table 4), which was a companion report to the legislated 5-year Healthy Child Manitoba report.

Knowledge Translation (KT)

disseminated through the natural interest of the decision-makers involved in the programs or policies for which they are relevant. Although research evidence is not the only influence on policy (often other pressures, such as economic or political realities, override the evidence), if policy-makers and planners understand the research, there is a good chance it will be important in the decision-making process (Martens 2011).
Some people have expressed concerns about having policy- and decision-makers involved in the process from start to finish. What if they bias the results? What if they ask the wrong questions? What if they don’t like the results? Such questions echoed our own fears in the early years. Through a combination of research funded from deliverables and our external grant-funded research from peer-reviewed granting agencies such as CIHR, Research Manitoba, and others, MCHP has learned that the best questions come from an exchange of ideas, both among researchers and between researchers and research users (Martens 2011).
completion.” Accessibility in this respect continues to grow, evidenced by the fact that our concept dictionary and glossary receive more than 1.5 million hits a year (excluding bots and Web crawlers). This is a remarkably high frequency for a small academic unit (Martens 2011).

Other KT Activities

MCHP has established a highly successful set of annual workshops attended by top-level planners, policy-makers, healthcare CEOs, VPs of planning, board members for RHAs, and front-line workers. These activities are based upon an interactive model of roundtable discussions concentrating on one or two MCHP reports. Attendees are encouraged to look for the stories in the data. Key to these workshop days is the presence of MCHP scientists to explain how to read the reports. In the book Arabian Nights by Tahir Shah, the author talks about his father explaining the importance of stories to him as a child. “‘Stories are a way of melting the ice,’ [his father] said gently, ‘turning it into water. They are like repackaging something – changing its form – so that the design of the sponge can accept it’” (Shah 2007: 298). This is an apt metaphor for telling research stories. Sometimes providing a written report may not be enough. In these workshops, MCHP turns written reports into stories by explaining how to read the graphs, how to look for connections, or how to relate data to real-life settings. Repackaging the research allows it to be understood and incorporated into the audience’s way of thinking (Martens 2011).

Impact of Large Integrated Data Repositories

Creation of a large integrated repository of data across multiple government domains has facilitated groundbreaking innovative research. Record
8 Health Services Information: From Data to Policy Impact (25 Years of Health Services and. . . 189
linkage has merged information from different departments while, at the same time, extensive longitudinal and familial data have allowed new types of studies and facilitated interdisciplinary work. The opportunities presented are unique advantages of large repositories.
Looking Ahead
As seen in the discussion of deliverables, the very large numbers of cases that accumulate when such data are routinely gathered facilitate complicated multivariate analyses and allow studying low-prevalence conditions or events. Because these data are typically collected over long periods of time, pre- and post-observations can be organized around different life events at the individual level, and also before and after key program implementation – with a time frame extending for over 40 years in the case of the MCHP repository. Merging data across different ministerial departments can bring together individual information from several subject areas to create predictors useful in a variety of contexts (e.g., population-based research on ethnicity, developing risk assessment tools) and permit examining important connections affecting the lives of individuals and patients. Data documenting use of, or lack of contact with, the healthcare system and residential mobility data can be put together for any interval from 1 day to many years. A real but relatively unexplored advantage of the MCHP repository would be to follow those born in the 1970s, where the ability to track family structure events and health outcomes over the first decades of life is outstanding. This line of inquiry provides the possibility of life stage analysis: does a diagnosis of attention deficit disorder which first occurs at age 4–8 have a different impact on educational outcomes than a diagnosis which first occurs at age 9–12? How important is a chronic disease diagnosis, one which continues over time, compared with the same diagnosis occurring during only one age period?
There is great interest in improving both observational and interventional studies. In addition, the population-based repository has great potential for “natural experiments” where administrative data may be used to consider the impact of policy and program changes. And research designs can be improved by building on the types of data available in Manitoba to construct control groups using propensity scores, sibling comparisons, and fine-grained ecological information. To date, such efforts are largely unexplored but have great potential for the future.
Summing Up
Research platforms lend themselves to forming an “ecosystem,” “an intertwined set of products and services that work together” (El Akkad and Marlow 2012). The MCHP ecosystem involves relations with people, including key decision-makers, software (for data cleaning, record linkage, and analysis), the extensive documentation accessible through our concept dictionary and glossary, predictors and outcome measures derived from multiple files, and a methodological/statistical tool kit. New data in the Manitoba repository have expanded the type and number of studies being carried out. These capabilities foster useful interactions with a diversity of investigators, helping to avoid an overreliance on a single funding source and bringing in valuable new perspectives.
The approaches put forward here seem generally relevant to “big data” where more attention needs to be paid to questions of design and analysis. The significant effort required to clean and prepare the databases should not be underestimated; Cukier and Mayer-Schoenberger have both noted the messiness of big data and highlighted the potential benefits of interagency collaboration in improving public services (Cukier and Mayer-Schoenberger 2013). The uses of population-based data are being more widely recognized. Information-rich environments should continue to facilitate opportunities for the next generation of researchers. That’s the real impact of MCHP’s academic and research history: building a culture where evidence informs policy in a way that works.
190 L. L. Roos et al.
Contents
Introduction
Health-Care Data and Defining the Unit of Analysis: Historical Perspective
The Episode of Care: A Unifying Concept
Episodes as an Analytical Tool: Advantages
Flexibility
Comprehensiveness
Clinical Meaningfulness
Episodes as an Analytical Tool: Challenges
Data Requirements
Complexity
Time and Resources Required
Methodological Challenges
Constructing an Episode of Care: Key Components
Data Sources Required
Individual-Level Record Linkage
Information on Type of Service
Diagnosis Information
The Date/Time of the Service Delivered
Core Elements of the Episode
Defining the Index Event and/or Starting Point
Defining the Endpoint
Selecting the Scope of Services Included
Outcome Measures
Constructing an Episode of Care: A Hip Fracture Example
Research Question
E. Hellsten (*)
Health Quality Ontario, Toronto, ON, Canada
e-mail: erik.hellsten@hqontario.ca
K. J. Sheehan
School of Population and Public Health, The University of
British Columbia, Vancouver, BC, Canada
particular hospital and how this sum compares to that in previous years – presenting health-care utilization data in this discrete, itemized fashion typically captures only fragments of the total picture of services delivered to treat a patient’s health condition.
The challenges of organizing health-care data into a coherent narrative stem in part from the unique nature of the health-care “product”: unlike most other commodities, health care is often delivered through a series of separate but related encounters, rather than through a single stand-alone service (Feldstein 1966; Hornbrook et al. 1985). A patient presenting with a health condition may receive health-care services that span multiple different health-care providers over several points in time. The way these providers and services interrelate in caring for an individual patient’s health condition is typically not readily apparent in standard itemized or index-based presentations of health-care data.
Figure 1 provides an illustrative example of a series of individual health-care service data points, which on closer inspection are revealed to be a single patient’s journey through treatment with a total knee replacement for osteoarthritis of the knee. Beginning with a consultation with a primary care physician for chronic knee pain that has failed to respond to conservative treatment, the patient is referred for a radiograph several days later and booked for a consultation with an orthopedic surgeon in their office 4 weeks following. During this consultation, the patient and surgeon decide on a total knee replacement surgery, which is scheduled at a local hospital approximately 2 months after the consultation. Several days before the patient enters hospital, they are assessed at a preoperative clinic to prepare for the surgery. The patient is then admitted to hospital, receives a total knee replacement on the day of admission, and is discharged home 3 days later without incident. Following their discharge home, the patient receives three weekly visits from a physiotherapist contracting with a local home care agency to assist with their rehabilitation. Three weeks later, the patient has a follow-up visit with the surgeon in their office to assess their recovery. Satisfied with the patient’s progress, the surgeon decides no further follow-up is needed; the patient’s care journey can now be considered to be at an end.
This complex series of encounters typifies a routine, simplified pathway for a patient receiving a successful total knee replacement. In some instances, the same patient’s journey might well be further complicated by additional health-care events, such as the appearance of in-hospital or postoperative complications, the need for readmission to hospital or revision surgery, and other potential sequelae.
For the health services researcher, the hypothetical knee replacement example likely produces over a dozen data points in the form of a series of individual encounters recorded between several health service providers and provider organizations over a span of several months. In many cases, these encounter data will also be housed across several discrete – and frequently disconnected – datasets: primary care physician and specialist billings, inpatient hospital discharge abstracts, home care agency records, and so on. The health services researcher faces the challenge of stitching these discrete observations
Fig. 1 Example episode of care for osteoarthritis of the knee. Illustrative example; timeline not to scale
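The stitching task described for the knee replacement example can be illustrated with a minimal sketch. It assumes three hypothetical datasets (physician billings, hospital abstracts, home care records) that share a common patient identifier; the dataset names, field names, identifiers, and dates are all illustrative inventions chosen to mirror the narrative, not data or structures from any real system.

```python
from datetime import date
from typing import NamedTuple

class Encounter(NamedTuple):
    patient_id: str      # hypothetical linkage key shared by all datasets
    service_date: date
    source: str          # hypothetical dataset name
    service: str

# Illustrative records only; dates loosely follow the knee replacement narrative.
physician_billings = [
    Encounter("P001", date(2015, 1, 5), "physician_billings", "primary care consultation"),
    Encounter("P001", date(2015, 1, 9), "physician_billings", "knee radiograph"),
    Encounter("P001", date(2015, 2, 6), "physician_billings", "orthopedic consultation"),
    Encounter("P001", date(2015, 5, 18), "physician_billings", "surgeon follow-up visit"),
    Encounter("P002", date(2015, 3, 2), "physician_billings", "primary care consultation"),
]
hospital_abstracts = [
    Encounter("P001", date(2015, 4, 2), "hospital_abstracts", "preoperative clinic assessment"),
    Encounter("P001", date(2015, 4, 7), "hospital_abstracts", "total knee replacement admission"),
]
home_care_records = [
    Encounter("P001", date(2015, 4, 13), "home_care_records", "physiotherapy visit"),
    Encounter("P001", date(2015, 4, 20), "home_care_records", "physiotherapy visit"),
    Encounter("P001", date(2015, 4, 27), "home_care_records", "physiotherapy visit"),
]

def build_timeline(patient_id, *datasets):
    """Pool one patient's encounters from every dataset and order them by date."""
    rows = [e for dataset in datasets for e in dataset if e.patient_id == patient_id]
    return sorted(rows, key=lambda e: e.service_date)

timeline = build_timeline("P001", physician_billings, hospital_abstracts, home_care_records)
for e in timeline:
    print(e.service_date.isoformat(), e.source, e.service)
```

Sorting the pooled records by date is only the first step: the result is a chronological list of encounters, and deciding which of them belong to the same episode still requires the index event, endpoint, and scope rules discussed in this chapter.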
194 E. Hellsten and K. J. Sheehan
services delivered may be captured. White et al. suggested “the natural history of the patient’s medical care” may be the most relevant primary unit of observation and proposed some approaches for disaggregating the data found in traditional health-care indexes into more meaningful forms, such as employing time windows of weeks or months rather than years, and better understanding the decision-making process that unfolds between patients and medical care practitioners over the course of a particular illness. The paper is perhaps also the first to describe the “episode of ill health or injury” as its unit of observation (White et al. 1961).
In his 1966 article “Research on the Demand for Health Services,” Paul Feldstein extended Codman’s original work to define the “product” of health care, noting that in order to define a meaningful unit of output for analysis, researchers required “a better understanding of how the various components of care are used in its production” (Feldstein 1966). Feldstein emphasized the importance of comprehensively accounting for the entire combination of service inputs – such as hospital care and physician visits – used to treat a particular illness and considering differences in the relative contributions of these services in the production of treatment products between groups of providers and over time. He noted the limitations of the aggregate indices conventionally applied to quantify national medical production in terms of outputs such as numbers of visits or bed days.
The Episode of Care: A Unifying Concept
Following these foundational papers’ assessment of the gaps in contemporary methods for analyzing health-care utilization data, 1967 saw the publication of three seminal health services research papers that each put forward a different perspective on establishing an operational definition for White et al.’s “natural history of the patient’s medical care.” In their series “The Development of Standards for the Audit and Planning of Medical Care,” Isidore Falk and colleagues took a clinical practice perspective on the issue, defining a unit of analysis suitable for the development of “standards for the content of good clinical performance” in particular diseases, against which providers’ medical practices could be evaluated “from preventive to postclinical after-care” (Falk et al. 1967). Within these units, which they presciently termed “pathways” in a subsequent paper in the same series (Schonfeld et al. 1968), the authors consulted expert physicians to arrive at quantitative judgments on what constituted appropriate medical utilization, such as the average time required for a first diagnostic visit or the average hospital length of stay for various diseases.
Published the same year, “Changes in the Costs of Treatment of Selected Illnesses, 1951–1965” by Anne Scitovsky (1967) extended earlier work developing an alternate approach to address the inadequacies of the Bureau of Labor Statistics’ medical care price index – which was based on the prices of individual medical items and offered a limited and fragmented view of changes in medical spending – to introduce a “cost-per-episode-of-illness” approach that enabled the construction of a medical care price index based on the average costs of treatment of selected illnesses rather than the costs of discrete items. By demarcating patient episodes of illness within a claims dataset that included all relevant services delivered between an initial diagnosis or presentation for a health issue and either a service-defined endpoint (e.g., the last chemotherapy treatment following breast cancer treatment) or a prescribed follow-up time period that varied by disease, Scitovsky was able to compare changes in service utilization and cost for particular diseases between two time periods. The episode unit enabled Scitovsky to both comprehensively capture the full range of services delivered to treat a specified disease and examine changes in the provision of care, such as a reduction in the rate of home visits and the shift of forearm fracture repairs from office-based general practice to hospital-based specialty care.
While White et al. provided a clinical practice construct of the episode of care and anticipated the use of clinical pathways, and Scitovsky made
operational use of the concept for analyzing and comparing costs and utilization (an application that continues to see widespread use today), it was Jerry Solon and colleagues who provided the first comprehensive definition of this new concept in “Delineating Episodes of Medical Care” (Solon et al. 1967). The authors described three essential features found in any medical care episode: a beginning point, a course of services directed toward an objective, and a point of termination or suspension of the service. Episodes could be constructed around a variety of issues, including a general health-related complaint, a set of defined symptoms, a diagnosed disease, or the achievement of a particular health objective (such as preventive care) where no active morbidities are present.
Solon et al. touched on a range of important (and still relevant) methodological issues such as the definition of clinically meaningful time intervals for different medical conditions between service encounters to mark the end of a previous episode and the beginning of a new one. They discussed the conceptual challenges posed by chronic conditions that require ongoing medical management without a definite closure and expounded on the relationships between health services contained within a single episode, such as a chain of related physician visits. They identified potential interactions between multiple related episodes within the same individual, such as periodic exacerbations, remissions or acute sequelae linked to an underlying chronic condition, concurrent episodes for comorbid conditions, or iatrogenic events resulting from the treatment delivered for an initial health problem. They suggested that concurrent conditions in a patient might be treated as either part of a single episode or multiple distinct episodes, depending on whether the physician chooses to focus on one illness at a time or treat several within the same encounter (Solon et al. 1967).
Solon et al. distinguished between episodes of care, which are defined based on reported health services, and episodes of illness, which may occur without the provision of health services. While the episode of illness is an important concept for understanding the etiology of sickness and disease apart from medical care, practically speaking, researchers typically face significant challenges in gathering precise data on episodes of illness that occur without corresponding provision of health services as these typically must be identified based on patient recollection. In their broadest definition, the episode of care may overlap with the episode of illness by including diagnostic follow-up after the point where medical care ceases, in order to understand the effect on a patient’s trajectory of illness (Solon et al. 1967).
Solon et al. also sketched out some potential applications of the episode concept in their 1967 paper, including using episodes as an organizing structure for clinicians planning a patient’s care and as a frame of reference for the development of standards of care for different medical conditions. They further applied the concept in their 1969 study “Episodes of Medical Care: Nursing Students’ Use of Medical Services,” analyzing and comparing the details of several years of health services received by nursing students and comparing episode-based utilization measures such as the volume and distribution of visits, diagnostic tests, and admissions within each episode (Solon et al. 1969).
After Solon’s codification of the essential elements of the episode of care, further refinements to and applications of the concept followed. In 1977, Moscovice first implemented episodes of care using computerized routines, constructing disease-specific algorithms to define episodes for several tracer conditions based on patient visit information (Moscovice 1977). The algorithms identified an initial encounter with the recorded incidence of a specified diagnosis code (the index event) and then tracked subsequent encounters by the same patient with reported codes for the same diagnosis or specified related comorbidities. For each condition, based on physician input, a maximum time interval was defined between service encounters to assign services either to an existing episode or to the start of a new episode. Services and resources expended for each health condition were similarly defined based on information contained in medical directives and through clinician input. Moscovice compared measures of utilization between providers and
9 Health Services Information: Key Concepts and Considerations in Building Episodes of Care. . . 197
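The general logic attributed to Moscovice's routines above (an index encounter carrying a specified diagnosis code, followed by related encounters grouped by a maximum time interval) can be sketched as follows. This is an illustration under stated assumptions rather than a reconstruction of the 1977 algorithms: the diagnosis codes, the 30-day interval, and the function name are hypothetical, and in practice the interval would be set per condition with clinical input.

```python
from datetime import date

# Hypothetical codes standing in for one tracer condition and its related comorbidities.
RELATED_CODES = {"OM1", "OM2"}

def build_episodes(encounters, related_codes, max_gap_days):
    """Group one patient's (service_date, diagnosis_code) encounters into episodes.

    Encounters with unrelated codes are ignored. A related encounter joins the
    current episode if it falls within max_gap_days of the previous related
    encounter; otherwise it becomes the index event of a new episode.
    """
    relevant = sorted(d for d, code in encounters if code in related_codes)
    episodes = []
    for service_date in relevant:
        if episodes and (service_date - episodes[-1][-1]).days <= max_gap_days:
            episodes[-1].append(service_date)
        else:
            episodes.append([service_date])
    return episodes

# Illustrative encounter history for a single patient.
encounters = [
    (date(2015, 1, 5), "OM1"),
    (date(2015, 1, 12), "OM2"),
    (date(2015, 1, 20), "XX9"),   # unrelated code, excluded from the episode
    (date(2015, 1, 30), "OM1"),
    (date(2015, 6, 1), "OM1"),
    (date(2015, 6, 10), "OM1"),
]

episodes = build_episodes(encounters, RELATED_CODES, max_gap_days=30)
for number, dates in enumerate(episodes, start=1):
    print(number, dates[0].isoformat(), dates[-1].isoformat(), len(dates))
```

The choice of interval drives the result: with the same encounters, a 10-day interval would split the January visits into two episodes, which is why the interval needs to be condition specific.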
issues or treatments between providers, settings, or points in time, using a format that is both more clinically homogeneous and more reflective of the underlying clinical reality of the health problem studied. The parameters of episodes used to develop analyses can also be set with the explicit input of clinicians who have understanding and expertise in the particular condition or intervention of focus, strengthening the credibility of the analysis.
Episodes as an Analytical Tool: Challenges
There are also a number of important challenges associated with designing, implementing, and interpreting episode-based approaches:
outlining such parameters. Finally, researchers may encounter difficulties in communicating episode-based analyses to others who may not be familiar with the concept.
Time and Resources Required
The increased complexity of the episode approach over traditional silo-based forms of analysis leads to increased time and resources required for tasks such as defining episodes, preparing datasets, and troubleshooting analyses. Many episode-based analyses also require substantial computing power to run.
Methodological Challenges
Defining the Index Event and/or Starting Point
might trigger a separate, concurrent episode for the complication.
(Scitovsky 1967; Solon et al. 1967). Often described as a “clean period,” this generally takes the form of a specified window of time where no services related to the episode are provided. For example, in the case of chronic bronchitis, this might be 45 days without any services related to bronchitis treatment such as x-rays or relevant medication. Theoretically, using this definition, episodes for a particular condition can have any duration, so long as relevant services continue to be provided for treatment of the condition. As with the duration of a fixed time window, the duration of a clean period should be condition or procedure specific and defined based on clinical input. Typically, episodes for acute conditions such as appendicitis – where a defined, time-limited course of treatment can be expected – will have shorter clean periods than episodes for chronic diseases or acute conditions with chronic sequelae like stroke, where follow-on care can sometimes last for years. It should be noted that with endpoints based on clean periods, the same considerations apply in terms of “open” episodes: active episodes where a dataset or claims history is censored before the full duration of the clean period elapses are considered “open” at that point.
Alternatively, an endpoint can be a fixed point in time, such as 30 days following a hospital admission or discharge. These sorts of calendar-based episode endpoints are commonly used for outcome measures that seek to compare “apples to apples” across providers that might have different discharge practices. The current public reporting principles adopted by the Centers for Medicare and Medicaid Services to report on hospital mortality, readmission, and other outcomes stipulate the use of a standardized time period to facilitate
comparison. A point-in-time approach may also be adopted in the case of chronic disease episodes based around an annualized analysis period or where a dataset is censored at a particular date and truncates “open” episodes. Using migraine episodes, Schulman et al. (1999) put forward a novel approach to empirically defining the length of an episode of care. The study used administrative claims data to determine the point in time following the index event where elevated weekly charges returned to their original pre-episode levels.
Finally, the start of a new episode may trigger the close of an existing one. For example, a patient suffering from osteoarthritis of the knees who receives a total knee replacement may have an ongoing osteoarthritis management episode replaced with a total knee replacement procedural episode. Following the surgery, should their osteoarthritis be completely addressed, the patient would not be expected to continue the original disease episode.
Examples of Endpoints
Moscovice drew on published medical directives and clinician expert opinion concerning “reasonable periods of follow-up” to define a time period for each condition where the absence of services related to the condition would mark the beginning of a new episode (Moscovice 1977). Scitovsky used a similar condition-specific approach to defining episode duration (Scitovsky 1967). Health Quality Ontario used input from clinical expert panels, informed by analysis of linked administrative data on utilization, to define the typical duration of services provided in episodes of hip fracture care (Health Quality Ontario 2013). Symmetry’s Episode Treatment Groups use the approach of “annualizing” the episode of care for chronic diseases with indefinite durations (Optum 2015).
Selecting the Scope of Services Included
Episodes of care can be as comprehensive or as specific in their inclusion of services as a researcher desires and as is feasible given available data. The scope of services included requires a decision on the part of the researcher: a more holistic episode approach might capture all services provided during the episode window, regardless of whether they appear to be directly related to a condition. This approach is being employed by the Centers for Medicare and Medicaid Services’ Bundled Payments for Care Improvement initiative (Centers for Medicare & Medicaid Services 2014). A more limited episode may include only those services directly related to a particular condition. For example, in defining services to be included in episodes of diabetes care, the Netherlands’ bundled payment initiative has included only community-based professional services, excluding drugs and hospitalizations (Struijs et al. 2012a).
Ultimately, the scope of services included in the episode depends on the objectives of the analysis and its intended applications and the nature of the data available. Payment applications, for example, may suggest the utility of a single episode payment that covers multiple different types of services over a fixed period of time, in order to prevent any risk of “double counting” payment (Struijs et al. 2012b). A truly comprehensive episode might even include services beyond those delivered by health-care providers: for an episode of care around complex patients with functional needs, it may be ideal to also include social care services delivered – to the extent that they are captured in databases.
If the researcher elects to use a more clinically focused approach or a categorically based approach to service inclusion, clinical input is imperative. Input from clinical panels is required to identify the services that are related to the episode of care and the types of services that would likely not be related.
Examples
Moscovice used published medical directives and clinical input to define lists of medical services that could “realistically be used in the treatment of a particular problem or related comorbidity.” In the case of otitis media, this list of services included lab tests such as throat cultures that
might be used to rule out plausible related comorbidities. Using these lists, Moscovice defined a set of “patterns of care” based on the most common combinations of services delivered to treat each episode. For otitis media, 20.6% of episodes analyzed consisted of a single visit, while 13.8% consisted of an initial visit, administration of an antibiotic, throat culture, and then a follow-up visit (Moscovice 1977).
Solon et al. examined nursing students’ utilization of health-care services within episodes of care through separating encounters into “universal” visits – those services, such as vaccinations, provided to all students – and “individual” visits specific to treating the nursing student’s episode (Solon et al. 1969).
Outcome Measures
Ultimately, the episode of care is intended to serve as a clinically relevant unit of analysis for measuring particular aspects of care or outcomes delivered. In the broadest sense, any outcome measured at a standard time frame (e.g., 30-day mortality) might be considered an application of the episode-based approach. However, most episode-based studies have focused largely on process- or utilization-related measures. Following Falk et al.’s concept of the episode or pathway as a unit of analysis for auditing quality of care (Falk et al. 1967), Lohr and Brook (1980) compared providers’ use of appropriate therapy for respiratory infection, while Nutting et al. (1981) used episodes of care to compare health systems’ performance in terms of preventative services, timely diagnoses, continuity of care, and other factors.
By far the most common use of episodes since their earliest uses has been for examination and comparison of health-care costs and utilization: studies by Scitovsky (1967) and Solon et al. (1969) examined measures of total episode costs and number of visits by different health professionals, respectively. A popular use of episode-based cost measures involves the comparison of different physicians or physicians’ practices in terms of the total downstream health-care costs of their patients – a practice known as profiling (Cave 1995). More recent studies have used episodes for similar cost and utilization profiling approaches with hospitals as the central unit of analysis (Birkmeyer et al. 2010), as well as exploring regional comparisons (Reschovsky et al. 2014). Regardless of the unit of analysis for comparison, the episode construct enables an “apples to apples” mechanism that allows for comparison of the total treatment “product” between different providers or regions.
The vast majority of episode-based costing analyses have been conducted in the United States, where the predominant use of itemized claims data for reimbursing health-care services naturally lends itself to the aggregation of such claims into episodes of care. In countries such as Canada or some European nations that make greater use of global budgets for funding health-care services, constructing episode-of-care investigations of health-care costs requires the development of methodological approaches that serve as surrogates for “pricing.” In Ontario, such approaches have been developed using a combination of case mix cost estimation methodologies for globally budgeted hospital sectors and claims schedules for physicians and other fee-for-service providers (Sutherland et al. 2012).
Examples
Sutherland et al. compared the total costs (including hospital, physician, and inpatient and community-based rehabilitation) of hip and knee replacement episodes between regions in Ontario, correlating higher costs with the use of less efficient care settings (Sutherland et al. 2012).
After defining the most common combinations of services (or “patterns of care”) used for each type of episode, Moscovice evaluated the proportion of episodes delivered according to these patterns and compared the results between different care providers and settings (Moscovice 1977).
Scitovsky used episode-based measures of total health-care costs per treated condition to assess differences in costs (and the changes in service mix driving these differences) for episodes of care over time (Scitovsky 1967).
204 E. Hellsten and K. J. Sheehan
Lohr and Brook (1980) used an episode-based analysis to compare quality of care for respiratory conditions before and after the publication of guidelines on the use of injectable antibiotics, defined as the percentage of episodes that included appropriate use of antibiotic therapy (Lohr and Brook 1980).
Constructing an Episode of Care: A Hip Fracture Example
Research Question
As described earlier, the episode of care can be as comprehensive or as specific in its inclusion of services as is desired and feasible. Here, an example is presented of a more focused episode construction to address the question of the effect of timing of hip fracture surgery on patient outcomes. Many argue that patients presenting to hospital with hip fracture should receive surgery as early as possible; however, the literature detailing the benefits of accelerated access to the procedure is inconclusive. Furthermore, little is known as to causes of delay: some patients wait to be medically stabilized, while others are delayed due to administrative factors such as hospital type, transfers, and date and time of admission.
The literature identifies the following pathways on the basis of treatment patients receive during acute hospitalization with hip fracture: surgical treatment (Menzies et al. 2010), nonsurgical treatment (Jain et al. 2003), or palliative care (Meier 2011). Most patients undergo surgical treatment during either their initial hospitalization or after transfers from hospitals where patients are initially admitted. While in the hospital, some patients are medically stabilized before surgery. Patients remain in the hospital after surgery until they are fit to be discharged home or to an alternative level of care. Some patients receive nonsurgical management of their hip fracture as their risk of complications and death is too high. These patients are medically stabilized and discharged home or to an alternative level of care. Palliative care is offered to patients at the end stage of a terminal illness.
Data Source: Canadian Institute for Health Information Discharge Abstract Database
The Canadian Institute for Health Information (CIHI) is an independent, not-for-profit organization that provides information on Canada’s health system and the health of Canadians (Canadian Institute for Health Information 2015). CIHI facilitates collection of standardized administrative, clinical, and demographic data from acute hospitalizations through the Discharge Abstract Database (DAD). The data (2003–2012) are presented as a series of flat comma-delimited files with multiple abstracts for some patients. To prepare data for analysis, researchers develop a relational database to facilitate combining abstracts into episodes of care. In the following sections, a conceptual framework for constructing an episode of hip fracture care and the approach for operationalizing it using the CIHI abstracts are described.
Here a method for constructing an episode of care to study the effects of timing of hip fracture surgery using acute care discharge abstracts is described, and therefore, the episode is confined to patients admitted to the hospital and outcomes occurring in-hospital. Data relating to emergency department wait times or post-acute care utilization were not provided.
Defining the Index Event
The ideal index event is injury time. This event enables researchers to capture all hip fracture patients, includes events preceding hospital admission such as prehospital death, and captures the time from injury to admission which contributes to delays (Sheehan et al. 2015). However, injury time is not available through administrative databases and therefore alternative index events must be considered. When identifying the index event for the episode from administrative data, researchers may select the hip fracture surgery procedure, the hip fracture diagnosis, or admission with a diagnosis of hip fracture (Fig. 3). A procedure approach captures outcomes which occur postoperatively implying that time at risk
9 Health Services Information: Key Concepts and Considerations in Building Episodes of Care. . . 205
Fig. 3 Approaches to defining the index event for a hip fracture episode of care. Thick vertical lines indicate the index event for constructing each care episode. Dashed boxes and arrows represent events and their timings ascertained retrospectively. Solid box and arrows represent events and their timing ascertained prospectively
begins at the time of surgery. A diagnosis approach includes patients who incur a hip fracture in acute hospital following admission for another diagnosis. Here an admission approach is adopted as it allows researchers to capture outcomes which occur before surgery, including preoperative death, while excluding patients who incur a hip fracture in the hospital after admission for another diagnosis (Sheehan et al. 2015).
Defining the Endpoint
In this example, a clinically logical event defines the endpoint: death, discharge home, or discharge to an alternative level of care. A fixed point in time is also considered an endpoint as the dataset is censored at March 2012.
Scope of the Services Included
In this example services included are specific to the effect of surgical timing on outcomes of acute hip fracture care. First, researchers define how time to surgery is measured. Where the index event is surgery, time to surgery is measured from admission to surgery within a single discharge abstract. Where the index event is diagnosis, time to surgery is measured from diagnosis (preadmission or postadmission) to surgery. Where the index event is admission, time to surgery is measured from the earliest admission time to surgery time, preoperative death, or discharge without surgery. This approach is inclusive of transfers which occurred between admission and discharge, a potential administrative factor for delay (Fransoo et al. 2012).
Transfers from one acute care facility to another present in the data as a single patient with multiple records for hip fracture. Here, contiguous abstracts linked by transfers are combined in one episode; the earliest admission date and the latest discharge date are designated as the beginning and the end of episode (Fig. 4). To determine whether multiple records for a given patient reflect transfer before definitive care, the following rules are applied:
1. Less than 6 h between discharge on one abstract and admission on another abstract (12 h if at least one institution codes the transfer)
Fig. 4 Conceptual framework for constructing hip fracture acute episodes of care. A patient is admitted to acute hospital for their first episode of hip fracture care. They may be transferred from one acute care facility to another before definitive care – surgery or conservative management. Once acute care is completed, they are discharged home or to an alternate level of care. On completion of the first episode of care, a patient may return to acute care for a related episode – revision surgery, readmission, or for a change in care. Alternatively a patient may return to acute care with an entirely new subsequent hip fracture
2. Admission before 6:00 (12:00 if at least one institution codes the transfer), when discharge and admission occur on the same day but discharge time is unknown
3. Discharge after 18:00 (12:00 if at least one institution codes the transfer), when discharge and admission occur on the same day but admission time is unknown
After discharge from acute care, some patients return to the hospital for an episode related to their hip fracture. They may return to acute care with a complication that requires revision surgery such as a failed fixation/prosthesis. Alternatively they may return to acute hospital for treatment of medical complications related to their hip fracture. Patients discharged without surgery may also return for surgery to alleviate pain or if they are no longer considered unfit for surgery (Fig. 4). Finally, patients may present to the hospital with an entirely new subsequent hip fracture (Fig. 4). Following consultation with orthopedic surgeons, the following rules are created for patients with multiple discharge abstracts to identify related episodes as revision, readmission, change in care, or subsequent:
• Revision: surgical admission within 90 days of discharge after initial surgical episode
• Readmission: nonsurgical admission within 90 days of discharge after initial surgical/nonsurgical episode
• Change in care: surgical admission within 30 days of admission for initial nonsurgical episode
• Subsequent: hip fracture admission more than 90 days after the initial episode
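The four classification rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Episode` record, the `classify_related` function, and the order in which the rules are checked are assumptions, as is counting the 90-day window for a subsequent fracture from discharge (the reading suggested by Table 1).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Episode:
    """Hypothetical episode record; field names are illustrative."""
    admitted: date
    discharged: date
    surgical: bool


def classify_related(initial: Episode, later: Episode) -> str:
    """Label a later hip fracture admission relative to the initial episode."""
    since_admission = (later.admitted - initial.admitted).days
    since_discharge = (later.admitted - initial.discharged).days

    # Change in care: surgery within 30 days of a nonsurgical admission.
    if later.surgical and not initial.surgical and since_admission <= 30:
        return "change in care"
    # Subsequent: more than 90 days after the initial episode
    # (assumed here to be counted from discharge).
    if since_discharge > 90:
        return "subsequent"
    # Revision: surgical admission within 90 days of a surgical episode.
    if later.surgical and initial.surgical:
        return "revision"
    # Readmission: nonsurgical admission within 90 days of any episode.
    if not later.surgical:
        return "readmission"
    return "unassigned"


initial = Episode(date(2010, 1, 1), date(2010, 1, 10), surgical=True)
later = Episode(date(2010, 2, 1), date(2010, 2, 5), surgical=True)
print(classify_related(initial, later))  # revision
```

Admissions that match none of the rules, such as surgery more than 30 days after a nonsurgical admission but within 90 days of discharge, fall through to "unassigned" in this sketch; how such cases should be handled is not stated in the text.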
After the application of the rules, some adjacent abstracts remain unassigned because their admission and discharge dates are in reverse order. Only abstracts with the earlier admission date are used for constructing care episodes.
Data Model
For patients with a single discharge abstract, the abstract represents the first episode of hip fracture care. Multiple abstracts for a given patient could represent the first episode of hip fracture care, revision surgery, readmission, change in care, or a subsequent episode of hip fracture care. As such, the data fields from multiple discharge abstracts are used to construct new fields or update information in the same field but from a different abstract. A data model is developed to relate multiple abstracts of hip fracture care for a given patient, which explicitly defines how data fields relate to each other (Table 1). In particular, the data model establishes relationships among tables containing discharge abstracts of the first episode of hip fracture care, revision, readmission, change in care, and subsequent hip fracture episodes.
This involves creating a series of data tables and establishing relationships between them:
• Episode of hip fracture care table contains discharge abstracts of the first and subsequent episodes of hip fracture care, uniquely identified by patient id and hip fracture number. The episode may combine information from abstracts linked by transfers.
• Revision surgery table contains discharge abstracts of surgical hospitalization following first or subsequent episodes of hip fracture care.
• Readmission table contains discharge abstracts of nonsurgical hospitalization following first or subsequent episodes of hip fracture care whether surgical or medical.
• Change in care table contains discharge abstracts of surgical hospitalization following first or subsequent nonsurgical episodes of hip fracture care.
• Other tables contain demographic and comorbidity data.
Normalization is used to organize the CIHI discharge abstracts. First, repeating data fields with similar data in individual tables are eliminated, a separate table for each set of related data is created, and each set of related data is classified with two primary keys: patient id and hip fracture number. This normalization helps avoid multiple
Table 1 Algorithm for identifying and classifying episodes of hip fracture care
Step 1 Remove duplicates from CIHI records
Step 2 For patients with single record, convert their records into episodes of initial hospitalization
Step 3 For patients with multiple records, combine records linked by transfers into care episodes:
  (a) Designate the earliest unlinked record as the start of a new episode
  (b) Combine contiguous records into an episode of care if transfer is identified
  (c) If records remain, go to 3a
Step 4 For each patient, classify the episode with earliest admission as initial hospitalization
Step 5 Classify episodes of surgical hospitalization with admission within 90 days of discharge from initial surgical hospitalization as revision
Step 6 Classify episodes with admission within 90 days of discharge from initial nonsurgical hospitalization as readmission
Step 7 Classify episodes of surgical hospitalization with admission within 30 days of admission from initial nonsurgical hospitalization as change in care
Step 8 For each patient, classify the episode with earliest admission beyond 90 days of discharge from initial surgical hospitalization as initial hospitalization with a new fracture
Step 9 Mark episodes with admission for open, pathological, and post-admit fracture
Step 10 Mark records not assigned to any episode as unassigned
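Step 3 of Table 1 can be illustrated with a short Python sketch that links one patient's abstracts by transfer and collapses each chain into an episode. It applies only the first transfer rule (less than 6 h between discharge and the next admission, or 12 h when a transfer is coded); the rules for unknown times are left out. The dictionary keys, the `transfer_coded` flag, and the function names are assumptions made for illustration and are not CIHI field names.

```python
from datetime import datetime, timedelta


def is_transfer(prev, nxt):
    """Rule 1 only: short gap between discharge and the next admission."""
    coded = prev["transfer_coded"] or nxt["transfer_coded"]
    limit = timedelta(hours=12 if coded else 6)
    gap = nxt["admitted"] - prev["discharged"]
    # Assumption: a negative gap (reversed dates) is never a transfer.
    return timedelta(0) <= gap < limit


def build_episodes(abstracts):
    """Combine one patient's contiguous abstracts into care episodes."""
    chains = []
    for rec in sorted(abstracts, key=lambda r: r["admitted"]):
        if chains and is_transfer(chains[-1][-1], rec):
            chains[-1].append(rec)   # Step 3b: extend the current episode
        else:
            chains.append([rec])     # Step 3a: start a new episode
    # Earliest admission and latest discharge bound each episode.
    return [
        {
            "admitted": chain[0]["admitted"],
            "discharged": max(r["discharged"] for r in chain),
            "n_abstracts": len(chain),
        }
        for chain in chains
    ]


abstracts = [
    {"admitted": datetime(2010, 1, 1, 8), "discharged": datetime(2010, 1, 3, 10), "transfer_coded": False},
    {"admitted": datetime(2010, 1, 3, 14), "discharged": datetime(2010, 1, 10, 9), "transfer_coded": False},
    {"admitted": datetime(2010, 5, 1, 12), "discharged": datetime(2010, 5, 6, 12), "transfer_coded": False},
]
episodes = build_episodes(abstracts)
print(len(episodes), episodes[0]["n_abstracts"])  # 2 2
```

The first two abstracts are 4 h apart and are merged into one episode; the third starts a new episode that Steps 4 to 8 would then classify.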
fields storing similar data in one table. Second, separate tables for groups of data fields that apply to multiple abstracts are created, and these tables are related with a foreign key. This normalization maintains records that only depend on a table’s primary key.
Use of the Data
The dataset was created for estimating the frequency of preoperative deaths, postoperative complications, and in-hospital deaths following complications among patients exposed to various times before surgery. More specifically, the dataset creation enabled capturing events and durations associated with hip fracture care delivery. By operationalizing patient pathways in terms of data available from the CIHI, preoperative transfers, surgery, postoperative transfers, and outcomes of admission (preoperative death, postoperative complications, and death), as well as events following discharge (readmissions, revisions, subsequent hip fractures), were captured. From this dataset the durations of hospital stay, preoperative stay, and postoperative stay were estimated.
Patient and administrative factors for delay including demographic, clinical, and injury data fields and hospital type, date, and time of admission were also captured. These data facilitate the assessment of potential causes of delay. Combining discharge abstracts of all patients, whether they have surgical or nonsurgical treatment or die before surgery, facilitates assessment of the total harm from delays by considering deaths in those who did not make it to surgery.
Constructing an Episode of Care: A Cardiac Example
Research Question
A patient identified as in need of coronary artery bypass graft (CABG) while a hospital inpatient or as an outpatient is registered on a wait list for the elective procedure. A patient may encounter different clinically logical events to define the endpoint: death, change in surgical candidacy, or the procedure itself. Sobolev and Kuramoto studied outcomes of surgical cardiac care according to time to surgery (Sobolev and Kuramoto 2007).
Data Sources
Data on patients registered to undergo CABG are obtained from the British Columbia Cardiac Registry (BCCR) (Volk et al. 1997). This prospective database contains dates of registration on the list, procedure, and withdrawal from the list, along with disease severity and other risk factors, for all patients who are registered to undergo CABG in any of the four tertiary care hospitals that provide cardiac care to adult residents of British Columbia. Additional information on access to CABG is obtained from the BC Linked Health Database Hospital Separations File (Chamberlayne et al. 1998) and deaths from the provincial Death File (Sobolev et al. 2006).
Capturing Events by Linking Data Sources
The care episode begins with a cardiac surgeon’s assessment and includes hospital inpatients and outpatients registered on a wait list for elective CABG. A series of events take place preoperatively outside the hospital; preoperatively, perioperatively, and postoperatively in the hospital; and postoperatively outside the hospital. The care episode ends with death, change in surgical candidacy, or the procedure itself.
For patients registered on a wait list for elective CABG, a preoperative assessment, which may include additional tests, may occur prior to admission or in the hospital. Their surgical candidacy is then confirmed or refuted by an anesthesiologist. Once a patient is identified as a surgical candidate, their access to the procedure is determined through scheduling of operating room time. Patients are selected from hospital admissions and from the wait list on the basis of urgency, resource availability, and plan for discharge from
the hospital. The allocated time may change if emergent cases arise, if cancelations occur prior to the scheduled time, or if a patient’s status changes during their wait. The patient is assessed again preoperatively, receives their surgery, is monitored postoperatively in the postanesthesia care unit, and is transferred to the ward or intensive care unit. The patient’s postoperative recovery is managed in the hospital until they are suitable for discharge home or to an alternate level of care. On discharge the patient is followed up in the community until their recovery is complete or death occurs.
Administrative health databases may hold multiple records for one patient. Patient records may be organized in two different formats – the “person-level” format or the “person-episode” format. The person-level format contains a single record per patient. In the current example, this approach would enable researchers to capture the time from inpatient registration on a wait list for elective CABG to the procedure, discharge, or transfer to an alternate level of care from a single hospitalization record. The person-episode format contains multiple records per patient. In the current example, this approach would enable researchers to capture the time from inpatient or outpatient registration on a wait list for elective CABG to the procedure, discharge, or transfer to an alternate level of care from multiple administrative records. As the present study aim is to determine the impact of waits on outcomes in cardiac care, all events contributing to the wait and potential outcomes of waiting should be captured. In order to achieve this, the person-episode approach is adopted whereby multiple data sources are linked.
The series of events during the care episode and patient characteristics are captured with administrative data entry. A data model which chronologically relates events captured by data elements is created. Events of interest include registration and removal from the wait list, hospital admission and discharge, scheduled surgery and unplanned emergency surgery, and preoperative, in-hospital, or follow-up death. Each event has an associated time stamp which allows researchers to sequence the events and to determine the interval (wait time) between events. Once the events are sequenced, the person-episode is created, which includes a de-identified patient number and an event number. This combination uniquely determines the patient-episode related to a specific event.
Linkage of Cardiac Registry, Hospital Separations, and Death Files
A patient’s Provincial Health Number is used to link BCCR records with the BC Linked Health Database Hospital Separations File and to the Death File. Events including hospital admission, comorbidities, surgery, hospital separation, and discharge type (home, alternate level of care, or death) are retrieved from the BC Linked Health Database Hospital Separations File. Deaths which do not occur in the hospital are captured by the Death File. Adopting a person-episode approach, the BCCR records are linked to the BC Linked Health Database Hospital Separations Files and the Death Files to create an analytical dataset. An analytical data dictionary is created to describe the variables created to represent events and patient characteristics (Table 2).
Use of the Data
The dataset was created for estimating outcomes of registration for elective (nonemergency) procedures in surgical cardiac care. These outcomes included preoperative death, postoperative death, change in urgency status, and unplanned emergency surgery among patients exposed to various times before CABG. More specifically, the dataset creation enables capturing events and durations associated with registration on a wait list for CABG.
By operationalizing patient pathways in terms of the data available from the cardiac registry, hospital separations, and death file, preoperative events (delay to surgery, change in urgency status, unplanned emergency surgery, death) and postoperative death were captured. Furthermore, the durations of time spent on the wait list for elective
Table 2 Analytical dataset data dictionary for records of patients awaiting elective coronary artery bypass grafting
Variable | Description | Source | Code
BCCR_ID | Patient identifier | BCCR | <Text>
AGECAT | Age decade | BCCR | 1 – 20–29 years; 2 – 30–39 years; ...; 8 – 90 years
SEXF | Sex | BCCR | 0 – man; 1 – woman
ANATOM | Coronary anatomy | BCCR | 1 – left main disease; 2 – 2- or 3-vessel disease, with PLAD; 3 – 3-vessel disease, with no PLAD; 4 – 1-vessel disease, with PLAD; 5 – 1- or 2-vessel disease, no PLAD; U – otherwise and unknown
UR_BR | Urgency at booking | BCCR | 0 – emergency; 1 – urgent; 2 – semiurgent; 3 – nonurgent; U – unknown
CM_CH | Comorbidities from Charlson index | Hospital separations | 0, 1, 2, 3, or 4 (≥4)
CM_BK | Major comorbidities | Hospital separations | 1 – CHF or diabetes or COPD or rheumatism or cancer; 0 – other
INST_BK | Location at registration | BCCR | Hospital 1, 2, 3, or 4
WL_ST | Wait-list registration date | BCCR | mm/dd/yyyy
WL_EN | Wait-list removal date | BCCR | mm/dd/yyyy
WL_RM | Reason for removal | BCCR | 0 – underwent surgery; 1 – death; 2 – medical treatment; 3 – at patient request; 4 – transfer to other hospital; 5 – otherwise removed from list; 6 – no surgical report; 7 – still on wait list; 8 – other surgery; 9 – death recorded in BCCR, not in Deaths File
DTHDATE | Death date | Death file | mm/dd/yyyy; < . > – no date recorded
EXIT_CODE | Type of hospital discharge | Hospital separations | D – discharged alive; S – left against medical advice; X – died in the hospital; N/A – not applicable
ADDATE | Hospital admission date | Hospital separations | mm/dd/yyyy; < . > – no date recorded
SEPDATE | Hospital separation date | Hospital separations | mm/dd/yyyy; < . > – no date recorded
With kind permission from Springer Science + Business Media: Analysis of Waiting-Time Data in Health Services Research, Waiting-time data used in this book, volume 1, 2008, 21–22, Boris Sobolev and Lisa Kuramoto, Table 2.1
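To show how the variables in Table 2 might be used, the following Python sketch sequences the dated events in one linked record and computes the time spent on the wait list. The variable names and the mm/dd/yyyy format come from Table 2; the record values are invented, the function names are hypothetical, and treating `.` or an empty string as a missing date is an assumption about how the `< . >` code is stored.

```python
from datetime import datetime

DATE_FIELDS = ["WL_ST", "WL_EN", "ADDATE", "SEPDATE", "DTHDATE"]


def parse_date(value):
    """Parse mm/dd/yyyy; return None for a missing date."""
    if value is None or value.strip() in ("", "."):
        return None
    return datetime.strptime(value.strip(), "%m/%d/%Y").date()


def sequence_events(record):
    """Return (date, variable) pairs in chronological order."""
    dated = [(parse_date(record.get(f)), f) for f in DATE_FIELDS]
    return sorted((d, f) for d, f in dated if d is not None)


def wait_days(record):
    """Days from wait-list registration to removal, if both are known."""
    start = parse_date(record.get("WL_ST"))
    end = parse_date(record.get("WL_EN"))
    if start is None or end is None:
        return None
    return (end - start).days


# Invented example record for one patient-episode.
record = {
    "WL_ST": "01/15/1995", "WL_EN": "03/01/1995", "WL_RM": "0",
    "ADDATE": "02/27/1995", "SEPDATE": "03/08/1995", "DTHDATE": ".",
}
print(wait_days(record))                        # 45
print([f for _, f in sequence_events(record)])  # ['WL_ST', 'ADDATE', 'WL_EN', 'SEPDATE']
```

In this invented record the patient is admitted two days before removal from the list for surgery (`WL_RM` of 0) and has no recorded death date, so the death event is simply absent from the sequence.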
surgery were estimated by urgency status. These data enabled researchers to answer questions such as:
• What is the variation in time spent waiting for elective surgery?
• What is the effect of delays in scheduling an operation?
• Do longer delays contribute to preoperative mortality among patients with less urgent need for surgery?
• What is the survival benefit of cardiac surgery?
• What is the risk of death associated with delayed surgical treatment?
Combining data of all patients registered on the CABG wait list, whether they went on to receive surgery or not, facilitates assessment of the total harm from delays by considering change in urgency status and deaths in those who did not make it to surgery.
Expanding on and Applying Episodes of Care: Further Considerations
Building Episode-Based Case Mix Classification Systems
While most of the studies conducting episode-based analyses reviewed in this chapter focus on a limited set of conditions, episode grouping software such as the Symmetry Episode Treatment Groups (ETGs) (Optum 2015), Thomson Reuters Medical Episode Groups (MEGs) (MaCurdy et al. 2009), and the Centers for Medicare and Medicaid Services’ episode grouping algorithms (Centers for Medicare and Medicaid Services 2015b) seek to assign all patient health-care encounters to mutually exclusive episodes based on their diagnosis and procedure combinations. Such systems are developed with the objective of establishing a comprehensive episode-based case mix classification system, analogous to the long-established diagnosis-related groups (DRGs) and other similar classification systems that categorize hospital inpatient stays into one of several hundred preestablished case mix groups that share similar clinical and resource utilization characteristics (Fetter et al. 1980).
Developing a case mix classification system is a significant endeavor. Rather than development being limited to a few particular types of episodes of interest, case mix systems operate under principles of being mutually exclusive and comprehensively exhaustive: thus, an effective episode grouping system (also known as a “grouper”) would feature logic to assign every health-care service claim or encounter record to a particular type of episode, selected from a limited list of episode categories.
From the researcher’s perspective, the decision on the appropriate approach here depends on the objectives of the analysis: if the objective is to develop an episode-based payment system that provides payments for all health-care services through an “episode bundle,” a full case mix system will be required to ensure all patients are assigned to a particular category. If the idea is to simply focus on analyzing a few different types of episodes, a full case mix system will not be required, although an existing public domain or commercial episode grouping product could be applied to define any number of episodes based on preexisting grouping algorithms. If an existing episode grouping solution is applied, the researcher is advised to acquire a thorough understanding of the underlying clinical logic of the software.
Risk Adjustment and Severity Classification
A key enhancement made in the 1990s over the basic episode concept of episode grouping and classification systems was the development of episode-based risk adjustment models. Wingert et al. (1995) first noted the need to incorporate severity adjustment into episode-based analyses, beyond that offered by a diagnosis-based classification system (Wingert et al. 1995).
Some episode grouping methodologies such as the ETGs employ a hierarchy of subcategories within each type of episode to differentiate between episodes of different severity levels.
These subcategories may be defined with a variety of proxy data points, including patient characteristics such as comorbidities or the type of health-care services received. For example, a diabetes episode restricted to ambulatory services may be assigned to a lower severity level than a diabetes episode that includes a hospitalization for complications of diabetes. The use of different severity categories within episode groups allows for the expected cost (or sometimes, price) of the episode to differ by severity level, in order to compare “apples to apples” in performance profiling applications or ensuring fair reimbursement levels in funding applications.
Even with the use of severity levels within episode groups, there may still be challenges with episode heterogeneity: MaCurdy et al. conducted an extensive series of simulation analyses using proprietary episode groupers and found substantial residual variation in unexplained costs within each severity grouping (MaCurdy et al. 2009). Certain types of health-care utilization that may potentially be included in the scope of the episode have been found to contribute substantial portions of this unexplained cost variation: Vertrees and other researchers with 3M Inc. examined a variety of different sets of parameters for defining post-acute episode windows and found that by excluding readmissions from the episode, the performance of existing case mix systems in terms of predicting total episode costs was vastly improved (Vertrees et al. 2013).
In addition to methods for risk adjustment within episode groups, some commercial groupers such as the ETG and MEG methodologies also enable the user to calculate an aggregate risk score for an individual based on their total episode history in a given time period. In such applications, a total risk score is calculated based on the sum of individual risk scores assigned to each type of episode experienced by an individual.
Attributing Episodes to Providers
Episodes of care may be used in applications that involve assigning an episode to a particular provider entity: for example, comparing the relative cost performance of physicians or determining what providers would be eligible to receive a share of a bundled payment. In such applications, business rules must be defined for the attribution of the episode to one or more providers. A variety of approaches to this task are possible and have been explored in the literature. Using a retrospective approach to assigning episodes to providers based on historical fee-for-service claims data, Hussey et al. (2009) examined the impacts of alternate rules for assigning episodes of care to physicians and facilities, with options including attribution to a single physician or facility with the highest total charges in retrospective claims, assignment to a group of physicians or facilities that met a minimum threshold of 25% of total charges, and assignment to the physician with the highest proportion of evaluation and management claim charges, using the rationale that this physician was likely to be the “most responsible” for managing the patient’s care. They concluded that the performance of alternate rules depended significantly on the trajectory of the condition studied: for example, a largely hospital-based episode such as myocardial infarction was more easily assigned to a single facility and physician than a largely ambulatory-based episode such as diabetes, where facilities played a relatively minor role and a larger number of providers were involved in providing care to individual patients (Hussey et al. 2009).
Policy Applications
Up until the 1990s, the use of episode of care methods was mainly confined to research-oriented applications and focused on a small set of conditions or procedures. In parallel, in the 1980s the US health policy landscape was transformed with the development and wide-scale use of the DRGs acute inpatient case mix classification system (Fetter et al. 1980). This was first developed for the purposes of utilization review and then subsequently, and most importantly, applied for the purposes of Medicare hospital payment.
In the 1990s, the first commercial episode-based case mix classification systems emerged and began to be employed by insurers and health maintenance
organizations for comparing efficiency across groups of providers (Wingert et al. 1995). These early efforts evolved into well-developed commercial platforms such as the ETGs (Optum 2015) and the MEGs (MaCurdy et al. 2009). The ETGs and MEGs both use a flexible time window to delineate different episodes. Episode-based classification software enabled commercial insurers to assign all their claims and encounter data to distinct episodes, advancing the practical use of episodes of care for policy applications such as payment and physician profiling.

In the past decade, episode-based payment and performance measurement approaches have gathered huge momentum in the United States, in large part due to the Medicare Payment Advisory Commission (MedPAC)’s endorsement of bundled payment approaches as a transformative alternative to the predominantly fee-for-service payment systems employed in the United States. In their influential 2008 report, Reforming the Medicare Delivery System, MedPAC put forward a strong case backed by extensive analysis for a nationwide shift toward bundled payments for episodes of care defined by an acute hospitalization and a fixed window of post-acute care services (Medicare Payment Advisory Committee 2008). Such a payment approach, MedPAC argued, would have the promise of overcoming several important limitations of Medicare’s fee-for-service payment approaches. Payments for episodes of care shared across groups of providers would offer strong financial incentives for groups of physicians, hospitals, and post-acute care providers to work together, coordinate services, and redesign patient pathways to improve efficiency across the episode. Bundled payments would also target observed unwarranted regional variations in the provision of post-acute care services for similar types of patients, where some areas made much more use of more costly and intensive settings such as inpatient rehabilitation beds and skilled nursing facilities than others. Finally, bundled payments would drive improved quality of care by ensuring that providers would be forced to absorb the costs of unplanned readmissions and complications occurring following discharge from acute care, as opposed to the “double payment” providers effectively received for such incidents under the fee-for-service payment system.

Building on the success of some earlier bundled payment pilot programs that employed limited episodes of care focused on hospital and physician services within single acute care stays, in 2011 the Centers for Medicare and Medicaid Services announced the “Bundled Payments for Care Improvement” (BPCI) initiative, a landmark demonstration project that allowed providers to volunteer for participation in a suite of bundled payment options, including episodes indexed by an acute inpatient hospitalization for a set of eligible conditions and extending into either 30, 60, or 90 days of post-acute care, and episodes limited to post-acute care only with similar 30, 60, or 90 day time window options. All Medicare Part A and Part B services are included in the episode. For each episode, a single payment is determined for the group of providers based on their historical service claims for similar episodes previously provided and adjusted for regional and national spending levels.

The majority of BPCI participants are enrolled in “retrospective” models, where providers continue to be paid on a fee-for-service basis followed by an episode-based reconciliation against the target total “price” for the episode by all providers participating in the demonstration project. Thus, groups of providers that are able to deliver episodes at a significantly lower cost than their target price are eligible for a share in the savings, whereas providers that exceed the target price may be required to return a share of the overspending to Medicare. As of July 2015, there were over 2000 provider entities that had contracted to participate in one of the BPCI models (Centers for Medicare and Medicaid Services 2015).

As these and other current major episode-driven policy initiatives in the United States, the Netherlands, Sweden, and elsewhere make abundantly clear, the episode of care is currently experiencing a renaissance in terms of its use as a foundational analytic construct to support payment system design, performance measurement initiatives, and a wide variety of health services research applications.
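The retrospective reconciliation described above for BPCI can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the 50 % sharing fraction, and the dollar amounts are assumptions made for the example and are not taken from the BPCI rules.

```python
from typing import Sequence


def reconcile_episode(target_price: float, ffs_payments: Sequence[float]) -> float:
    """Return the target price minus total fee-for-service spending for one episode.

    A positive result means the provider group spent less than the target;
    a negative result means it spent more.
    """
    return target_price - sum(ffs_payments)


def settlement(target_price: float, ffs_payments: Sequence[float],
               share: float = 0.5) -> float:
    """Amount shared with (positive) or returned by (negative) the provider group.

    `share` is a hypothetical sharing fraction chosen for illustration.
    """
    return share * reconcile_episode(target_price, ffs_payments)


# Illustrative numbers only.
under_target = settlement(20_000.0, [12_000.0, 3_500.0, 2_500.0])  # spending 18,000
over_target = settlement(20_000.0, [15_000.0, 4_000.0, 3_000.0])   # spending 22,000
print(under_target, over_target)
```

Keeping the comparison with the target price separate from the sharing rule makes it easy to substitute whatever sharing arrangement a particular model specifies.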
to preoperative mortality in less urgent patients? Med Care. 2006;44(7):680–6.
Solon J, Sheps CG, Lee SS. Delineating patterns of medical care. Am J Public Health Nations Health. 1960;50(8):1105–13.
Solon JA, Feeney JJ, Jones SH, et al. Delineating episodes of medical care. Am J Public Health Nations Health. 1967;57(3):401–8.
Solon JA, Rigg RD, Jones SH, et al. Episodes of care: nursing students’ use of medical services. Am J Public Health Nations Health. 1969;59(6):936–46.
Struijs JN, De Jong-Van Til JT, Lemmens LC, Drewes HW, De Bruin SR, Baan CA. Three years of bundled payment for diabetes care in the Netherlands. Impact on health care delivery process and the quality of care. RIVM Report. 2012a;260013001. https://www.researchgate.net/profile/Jeroen_Struijs/publication/233407675_Three_years_of_bundled_payment_for_diabetes_care_in_the_Netherlands._Effect_on_health_care_delivery_process_and_the_quality_of_care/links/09e4150a50b96ad6cb000000.pdf. Accessed 27 Oct 2016.
Struijs JN, Mohnen SM, Molema CC, De Jong-van Til JT, Baan CA. Effects of bundled payment on curative health care costs in the Netherlands: an analysis for diabetes care and vascular risk management based on nationwide claim data, 2007-2010. RIVM Report. 2012b;260013001. http://rivm.openrepository.com/rivm/handle/10029/257206. Accessed 27 Oct 2016.
Sutherland JM, Hellsten E, Yu K. Bundles: an opportunity to align incentives for continuing care in Canada? Health Policy (Amsterdam, Netherlands). 2012;107(2–3):209–17.
Vertrees JC, Averill RF, Eisenhandler J, et al. Bundling post-acute care services into MS-DRG payments. Medicare Medicaid Res Rev. 2013;3(3):E1–E19.
Volk T, Hahn L, Hayden R, et al. Reliability audit of a regional cardiac surgery registry. J Thorac Cardiovasc Surg. 1997;114(6):903–10.
White KL, Williams TF, Greenberg BG. The ecology of medical care. N Engl J Med. 1961;265:885–92.
Wingert TD, Kralewski JE, Lindquist TJ, et al. Constructing episodes of care from encounter and claims data: some methodological issues. Inquiry. 1995;32(4):430–43.
10 Health Services Information: Lessons Learned from the Society of Thoracic Surgeons National Database
Contents
Introduction
The Evolution of Healthcare Quality Measurement and Clinical Registries
Database Structure
STS Adult Cardiac Surgery Database (STS-ACSD)
STS Congenital Heart Surgery Database (STS-CHSD)
STS General Thoracic Surgery Database (STS-GTSD)
Database Operations
Data Sources
Vendors
STS Staff
Data Warehouse and Analytic Center
Data Quality and Audit
STS Quality Measurement Task Force (STS-QMTF)
STS Quality Initiatives Task Force (STS-QIT)
STS Public Reporting Task Force
STS Research Center
STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR)
Device Surveillance
Summary
References
D. M. Shahian (*)
Department of Surgery and Center for Quality and Safety,
Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
e-mail: dshahian@partners.org
J. P. Jacobs
Division of Cardiac Surgery, Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Johns Hopkins All Children’s Heart Institute, Saint
Petersburg/Tampa, FL, USA
I am called eccentric for saying in public that hospitals, if they wish to be sure of improvement. . .
• Must find out what their results are.
• Must analyze their results to find their strong and weak points.
• Must compare their results with those of other hospitals.
• Must care for what cases they can care for well, and avoid attempting to care for cases which they are not qualified to care for well.
• Must welcome publicity not only for their successes, but for their errors, so that the public may give them their help when it is needed.
• Must promote members of the medical staff on the basis which gives due consideration to what they can and do accomplish for their patients.
• Such opinions will not be eccentric a few years hence

Codman started his own End Result Hospital built upon these principles, but it eventually closed. Although Codman was ridiculed and disdained by many colleagues at the time, his work led directly to the formation of the American College of Surgeons and the Joint Commission on Accreditation of Healthcare Organizations.

The third visionary leader in healthcare quality measurement was Professor Avedis Donabedian at the University of Michigan (Donabedian 1966, 1988). Donabedian was the first to propose that healthcare quality could be measured using structure (e.g., 24/7 intensivist availability, nursing ratios, adoption of computerized physician order entry), process (e.g., achieving an “open artery” within 90 min for patients suffering an acute MI, administering aspirin to acute MI patients), and outcomes (e.g., mortality, complications, readmissions, patient-reported outcomes). Donabedian stressed that “Outcomes, by and large, remain the ultimate validators of the effectiveness and quality of medical care” (Donabedian 1966), anticipating the current emphasis on outcomes measurement as the optimal way to assess quality in healthcare.

The science and technology necessary to actualize the conceptual framework of Nightingale, Codman, and Donabedian did not become widely available until the latter half of the twentieth century. The enactment of Medicare legislation in 1965 resulted in a huge new claims data source which could be used for provider profiling, research, and policy development. The use of statistical techniques such as logistic regression and hierarchical regression expanded dramatically in the latter half of the twentieth century, facilitated in large part by the exponential growth of computing power and mass data storage capacity.

Another essential component for the development of robust quality assessment and improvement was the evolution of clinical data registries. Several seminal events in the mid- and late 1980s provided the proximate stimulus for the development of cardiac surgery databases, including the Society of Thoracic Surgeons (STS) National Database, which was the first large-scale clinical registry developed by a professional society. On March 12, 1986, the Health Care Financing Administration (HCFA, the predecessor of the Centers for Medicare and Medicaid Services or CMS) published a list of hospital mortality rates which were based on administrative claims data and minimally adjusted for patient risk. This was referred to by some as the Medicare “death list,” and it was widely criticized for its methodological shortcomings. However, despite these flaws, it was apparent to some farsighted leaders that this was the beginning of a new era in healthcare transparency. Among those who envisioned this future state were the leaders of STS. The most commonly performed procedure by members of that organization, coronary artery bypass grafting surgery (CABG), was a natural target for early efforts to assess performance. CABG was one of the most frequently performed and costly procedures in healthcare at that time and had well-defined outcomes including mortality, stroke, reoperation, kidney failure, and infections. Owing in part to a torrent of requests from STS members who believed that the HCFA “death list” had mischaracterized their programs as underperforming, STS leaders recognized the inadequacy of using minimally adjusted claims data to evaluate program performance. An ad hoc committee on risk factors was developed by STS (Kouchoukos et al. 1988) in order to define those patient factors that would be required to fairly adjust for inherent patient risk.
These were incorporated into what subsequently was to become the STS National Database (STS National Database 2014), which was made available to STS members in 1989 under the direction of Dr. Richard Clark (Clark 1989; Grover et al. 2014).

The 1986 HCFA release of mortality reports also stimulated the development of other cardiac surgery database and performance monitoring initiatives. In New York State, Dr. David Axelrod, the commissioner of health, was aware of a fivefold variation in unadjusted mortality rates for coronary artery bypass grafting surgery (CABG) among the 28 cardiac surgical programs in that state. However, he and the New York Cardiac Advisory Committee recognized that acting upon this data would be challenging, as low-performing hospitals would likely assert that their patients were “sicker,” just as they had when HCFA released its mortality reports. Accordingly, in collaboration with Dr. Edward Hannan, a clinical data registry for CABG was developed (the New York Cardiac Surgery Reporting System or CSRS) (Hannan et al. 2012). Using these data, expected results for each patient were estimated and aggregated to the program and surgeon levels. Comparing observed and expected results made it possible to generate risk-standardized mortality rates and ratios, and these were first released to the public in 1990. These results demonstrated that not only was there wide variation in unadjusted mortality rates but also in risk-adjusted mortality rates. Similarly, the Northern New England Cardiovascular Disease Study Group (O’Connor et al. 1991) found wide variation in the ratio of observed to expected mortality among CABG programs in that region.

In summary, the early development of clinical data registries by STS, as well as a few states and regions, was driven by a desire to produce valid, risk-adjusted results that would allow fair comparisons of performance among providers, accounting for the preoperative risk of their patients. Availability of such data would facilitate quality improvement by providers and might also impact consumer choice of providers, shifting market share to better performing groups, although the latter goal has yet to be achieved.

Over the next quarter century, the STS National Database has expanded from its initial focus on adult cardiac surgery, particularly CABG, to encompass all major cardiac surgical procedures in the adult, as well as congenital heart surgery and general thoracic surgery. By 2014, over 1080 programs participated in the STS Adult Cardiac Surgery Database (90–95 % of all US programs), 114 programs were contributors to the STS Congenital Database (95 % of all US programs), and 244 programs participated in the STS General Thoracic Database. Seven international sites also participated. Figures 1, 2, and 3 demonstrate the geographical distribution of participants in the three STS National Databases.

Database Structure

The STS National Database is composed of three clinical specialty databases and 10 functionally oriented, crosscutting task forces (Table 1). Each of the three clinical specialty databases has its own unique features and, in some instances, challenges.

STS Adult Cardiac Surgery Database (STS-ACSD)

The STS Adult Cardiac Surgery Database (STS-ACSD) is the oldest of the three specialty databases and has the largest number of participants (approximately 1080 in the USA). Based on studies by Jacobs and colleagues, center-level penetration (number of CMS sites with at least one matched STS participant divided by the total number of CMS CABG sites) increased from 83 % to 90 % between 2008 and 2012 (Jacobs et al. 2016). In 2012, 973 of 1,081 CMS CABG sites (90 %) were linked to an STS site. Patient-level penetration (number of CMS CABG hospitalizations done at STS sites divided by the total number of CMS CABG hospitalizations) increased from 89 % to 94 % from 2008 to 2012. In 2012, 71,634 of 76,072 CMS CABG hospitalizations (94 %) were at an STS site. Finally, completeness of case inclusion at STS sites (number of CMS CABG cases at STS sites
linked to STS records, divided by the total number of CMS CABG cases at STS sites) increased from 97 % to 98 % from 2008 to 2012. In 2012, 69,213 of 70,932 CMS CABG hospitalizations at STS sites (98 %) were linked to an STS record. This suggests that at STS-participating sites that billed CMS for CABG procedures, virtually all these billed cases were captured in the STS National Database. These high degrees of national penetration and completeness, together with high accuracy verified in the ongoing external audits (see “Data Quality and Audit” section below), are of critical importance when STS advocates for the use of its measures, rather than those based on claims data, in various public reporting programs. Lack of high national penetration is, in fact, a commonly used rationale for the continued use of claims-based metrics in many areas; however, this justification for use of claims-based metrics is clearly not applicable to adult cardiac surgery.

Fig. 1 STS Adult Cardiac Surgery Database Map, accessed July 2, 2016, at http://www.sts.org/sites/default/files/documents/adultcardiacMap_4.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)

The STS-ACSD now encompasses the entire spectrum of adult cardiac surgery. This includes CABG; surgery of the aortic, mitral, tricuspid, and pulmonary valves; surgery of the thoracic aorta; arrhythmia procedures; and less commonly performed procedures such as pulmonary thromboendarterectomy and removal of tumors of the heart and inferior vena cava. Data are collected regarding:

• Patient demographics
• Risk factors that may impact the outcomes of surgery
• Details of the specific disease process that led to surgery (e.g., degree of coronary artery stenosis in each vessel, etiology and severity of valvular lesions, type of thoracic aortic pathology)
• Technical details of the conduct of the procedure that was performed
• Detailed clinical outcomes
• Disposition of the patient (e.g., home, rehabilitation facility, or deceased)
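The 2012 penetration and completeness percentages quoted above for the STS-ACSD are simple ratios of counts, and they can be recomputed directly from the figures in the text. The short Python sketch below does so; the dictionary keys and the function name are labels chosen for this illustration.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Counts for 2012 as reported in the text (numerator, denominator).
coverage_2012 = {
    "center_level_penetration": (973, 1081),      # CMS CABG sites linked to an STS site
    "patient_level_penetration": (71634, 76072),  # CMS CABG hospitalizations at STS sites
    "case_completeness": (69213, 70932),          # hospitalizations at STS sites linked to an STS record
}

rounded = {name: round(percent(n, d)) for name, (n, d) in coverage_2012.items()}
for name, value in rounded.items():
    print(f"{name}: {value} %")
```

Rounded to whole percentages, the three ratios come out at 90, 94, and 98, matching the values reported in the text.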
Fig. 2 STS Congenital Heart Surgery Database Map, accessed July 2, 2016, at http://www.sts.org/sites/default/files/documents/congenitalMap_4.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)
Data from the STS-ACSD are reported back to participants on a quarterly basis (STS National Database 2014). These data include the types of procedures performed, demographics and risk factors of the patients, details about the conduct of the surgical procedure, and outcomes. In each case, this information is benchmarked against aggregate data from all STS-participating programs nationally and also against aggregate data from programs that are similar in terms of teaching intensity and size. Finally, participants are given their last several years of data so that important trends may be recognized. Twice yearly, in addition to the routine harvest feedback reports, participants also receive reports of their performance on National Quality Forum (NQF)-endorsed STS metrics and on the various STS composite performance metrics for specific procedures (e.g., isolated CABG, isolated aortic valve replacement, aortic valve replacement plus CABG) (Shahian et al. 2007a, 2012a, 2014; O’Brien et al. 2007). These performance reports provide numerical point estimates with credible intervals based on a Bayesian hierarchical model, and they also assign participants to a “star rating” category based on the true Bayesian probabilities (e.g., 99 % for isolated CABG) that the provider has worse than expected, as expected, or better than expected performance (see “STS Quality Measurement Task Force [STS-QMTF]” section below). These reports also include guidance as to which areas of performance are most in need of remediation and improvement.

In addition to these regular confidential feedback reports, STS-ACSD data are used for quality assessment, performance improvement initiatives, research, and public reporting and to satisfy regulatory and reimbursement imperatives. Many of these additional functions are discussed in subsequent sections.
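The probability-based star assignment mentioned above can be illustrated with a small Python sketch. It assumes that posterior draws of a participant's composite score are already available from a fitted hierarchical model (not shown), that higher scores mean better performance, and that the three categories are numbered one to three stars from worse to better; the function and argument names are hypothetical, and this is not the STS implementation.

```python
from typing import Sequence


def star_rating(posterior_draws: Sequence[float], benchmark: float,
                threshold: float = 0.99) -> int:
    """Assign 1, 2, or 3 stars from posterior draws of a performance score.

    3 stars: probability of scoring above the benchmark is at least `threshold`.
    1 star:  probability of scoring below the benchmark is at least `threshold`.
    2 stars: neither probability reaches the threshold (as expected).
    """
    if not posterior_draws:
        raise ValueError("at least one posterior draw is required")
    n = len(posterior_draws)
    p_better = sum(draw > benchmark for draw in posterior_draws) / n
    p_worse = sum(draw < benchmark for draw in posterior_draws) / n
    if p_better >= threshold:
        return 3
    if p_worse >= threshold:
        return 1
    return 2


# Illustrative draws only; real draws would come from the fitted model.
print(star_rating([0.97, 0.98, 0.99, 0.98], benchmark=0.95))  # every draw above the benchmark
print(star_rating([0.94, 0.96, 0.95, 0.97], benchmark=0.95))  # draws straddle the benchmark
```

Because the rule works on probabilities rather than point estimates, a participant with few cases and a wide posterior will usually fall into the middle category.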
Fig. 3 STS General Thoracic Surgery Database Map, accessed July 2, 2016, at http://www.sts.org/sites/default/files/documents/thoracicMap_5.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)
STS Congenital Heart Surgery Database (STS-CHSD)

The report of the 2010 STS Congenital Heart Surgery Practice and Manpower Survey, undertaken by the STS Workforce on Congenital Heart Surgery, documented that 125 hospitals in the USA and 8 hospitals in Canada perform pediatric and congenital heart surgery (Jacobs et al. 2011a). In 2014, the STS Congenital Heart
Surgery Database (STS-CHSD) included 114 congenital heart surgery programs representing 119 of the 125 hospitals (95.2 % penetrance by hospital) in the USA and 3 of the 8 centers in Canada.

The analysis of outcomes of patients undergoing pediatric and congenital cardiac surgery presents several unique challenges in the domains of nomenclature and risk adjustment. Unlike adult cardiac surgery, where the majority of operations involve CABG, aortic valve replacement, and mitral valve replacement or repair or a combination of these, congenital cardiac surgery involves a much wider variety of procedures.

One of the greatest challenges in the development and application of the STS-CHSD has involved standardization of nomenclature and definitions related to surgery for pediatric and congenital cardiac disease. During the 1990s, both the European Association for Cardio-Thoracic Surgery (EACTS) and STS created databases to assess the outcomes of congenital cardiac surgery. Beginning in 1998, these two organizations collaborated to create the International Congenital Heart Surgery Nomenclature and Database Project. By 2000, a common nomenclature and a common core minimal data set were adopted by EACTS and STS and published in the Annals of Thoracic Surgery (Mavroudis and Jacobs 2000; Franklin et al. 2008). In 2000, the International Nomenclature Committee for Pediatric and Congenital Heart Disease was established. This committee eventually evolved into the International Society for Nomenclature of Paediatric and Congenital Heart Disease (ISNPCHD). By 2005, members of the ISNPCHD crossmapped the nomenclature of the International Congenital Heart Surgery Nomenclature and Database Project of the EACTS and STS with the European Paediatric Cardiac Code (EPCC) of the Association for European Paediatric Cardiology (AEPC) and thereby created the International Pediatric and Congenital Cardiac Code (IPCCC) (Franklin et al. 2008; Jacobs et al. 2008), which is available for free download from the Internet at http://www.IPCCC.NET. This common nomenclature, the IPCCC, and the common minimum database data set created by the International Congenital Heart Surgery Nomenclature and Database Project are now utilized by the STS-CHSD, the EACTS Congenital Heart Surgery Database (EACTS-CHSD), and the Japan Congenital Cardiovascular Surgery Database (JCCVSD). As of January 1, 2014, the STS-CHSD contains data from 292,828 operations; the EACTS-CHSD contains data from over 157,772 operations; and the JCCVSD contains data from over 29,000 operations. Therefore, the combined data set of the STS-CHSD, the EACTS-CHSD, and the JCCVSD contains data from over 479,000 operations performed between 1998 and January 1, 2014, inclusive, all coded with the EACTS-STS-derived version of the IPCCC, and all coded with identical data specifications.

Similar to investigations of data sources used for adult cardiac surgery studies, several studies have examined the relative utility of clinical and administrative nomenclature for the evaluation of quality of care for patients undergoing treatment for pediatric and congenital cardiac disease. Given the far greater diversity of anatomic lesions and procedures compared with adult cardiac surgery, it is not surprising that the superiority of clinically rich data sources is even more apparent in congenital heart disease. Evidence from several investigations suggests inferior accuracy of coding of lesions in the congenitally malformed heart using administrative databases and the ninth revision of the International Classification of Diseases (ICD-9) (Cronk et al. 2003; Frohnert et al. 2005; Strickland et al. 2008; Pasquali et al. 2013; Jantzen et al. 2014). Analyses based on the codes available in ICD-9 are likely to have substantial misclassification of congenital cardiac disease. Furthermore, differences in case ascertainment between administrative and clinical registry data for children undergoing cardiac operations can translate into important differences in outcomes assessment.

Risk modeling is essential when assessing and comparing healthcare performance among programs and surgeons, as this adjusts for differences in the complexity and severity of patients they treat. Reliably accounting for the risk of adverse outcomes mitigates the possibility that providers caring for sicker patients will be unfairly
penalized, as their unadjusted results may be EACTS have transitioned from the primary use
worse simply because of case mix (Shahian of Aristotle and RACHS-1 to the primary use of
et al. 2013a). However, formal risk modeling is the STAT Mortality Categories for three major
challenging for rare operations because sample reasons:
sizes are small. Risk adjustment in congenital
cardiac surgery is particularly challenged by this 1. STAT Score was developed primarily based on
reality, as the specialty is defined by a very objective data, while RACHS-1 and Aristotle
wide variety of operations, many of which are were developed primarily on expert opinion
performed at a relatively low volume. Conse- (subjective probability).
quently, the STS-CHSD has implemented a meth- 2. STAT Score allows for classification of more
odology of risk adjustment based on complexity operations than RACHS-1 or Aristotle.
stratification. Complexity stratification provides 3. STAT Score has a higher c-statistic than
an alternative methodology that can facilitate the RACHS-1 or Aristotle.
analysis of outcomes of rare operations by divid-
ing the data into relatively homogeneous groups Data from the STS-CHSD are reported back to
(called strata). The data are then analyzed within participants every 6 months in feedback reports.
each stratum. Similar to the STS-ACSD, the data in these feed-
Three major multi-institutional efforts have back reports include the types of procedures
used complexity stratification to measure the complexity and potential risk of congenital cardiac surgical operations (Jacobs et al. 2009; O’Brien et al. 2009a):

1. Risk Adjustment in Congenital Heart Surgery-1 methodology (RACHS-1 methodology)
2. Aristotle Basic Complexity Score (ABC Score)
3. STS-EACTS Congenital Heart Surgery Mortality Categories (STS-EACTS Mortality Categories) (STAT Mortality Categories)

RACHS-1 and the ABC Score were developed at a time when limited multi-institutional clinical data were available and were therefore based in a large part on subjective probability (expert opinion). The STAT Mortality Categories are a tool for complexity stratification that was developed from an analysis of 77,294 operations entered into the EACTS-CHSD (33,360 operations) and the STS-CHSD (43,934 operations) between 2002 and 2007. Procedure-specific mortality rate estimates were calculated using a Bayesian model that adjusted for small denominators. Operations were sorted by increasing risk and grouped into five categories (the STAT Mortality Categories) that were designed to be optimal with respect to minimizing within-category variation and maximizing between-category variation. STS and

performed, demographics and risk factors of the patients, details about the conduct of the surgical procedure, and outcomes. In each case, this information is benchmarked against aggregate data from all participants in the STS-CHSD. Participants are given their last 4 years of data so that important trends may be recognized. The feedback report also includes an assessment of programmatic performance using the empirically derived 2014 STS Congenital Heart Surgery Database Mortality Risk Model that incorporates both procedural stratification by STAT Mortality Category and patient factors. This 2014 STS-CHSD Mortality Risk Model includes the following covariates:

• STAT Mortality Category
• Age
• Previous cardiovascular operation(s)
• Any noncardiac abnormality
• Any chromosomal abnormality or syndrome
• Important preoperative factors (mechanical circulatory support, shock persisting at time of surgery, mechanical ventilation, and renal dysfunction)
• Any other preoperative factors
• Prematurity (for neonates only)
• Weight (for neonates only)
• Weight-for-age-and-sex Z-score (for infants only)
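As a rough illustration of how covariates such as those listed above can be turned into an expected mortality figure, the sketch below sums per-patient predicted risks and forms an observed-to-expected (O/E) ratio for one program. The logistic form, the intercept, every coefficient, the field names, and the four-patient program are hypothetical placeholders chosen for readability; they are not the published 2014 STS-CHSD Mortality Risk Model.

```python
import math

# HYPOTHETICAL intercept and coefficients, for illustration only.
# They are NOT the published STS-CHSD risk model values.
INTERCEPT = -4.0
COEFFICIENTS = {
    "stat_category": 0.6,            # per STAT Mortality Category step (1-5)
    "prior_cardiac_operation": 0.3,  # 1 if any previous cardiovascular operation
    "chromosomal_abnormality": 0.4,  # 1 if any chromosomal abnormality or syndrome
    "mechanical_ventilation": 0.9,   # 1 if ventilated before surgery
}


def predicted_risk(patient):
    """Predicted probability of operative mortality under an assumed logistic model."""
    z = INTERCEPT + sum(coef * patient.get(name, 0) for name, coef in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def observed_to_expected(patients):
    """Return (observed deaths, expected deaths, O/E ratio) for one program."""
    observed = sum(p["died"] for p in patients)
    expected = sum(predicted_risk(p) for p in patients)
    return observed, expected, observed / expected


# A made-up four-patient program; real programs contribute far more cases.
program = [
    {"stat_category": 1, "died": 0},
    {"stat_category": 3, "prior_cardiac_operation": 1, "died": 0},
    {"stat_category": 5, "mechanical_ventilation": 1, "died": 1},
    {"stat_category": 4, "chromosomal_abnormality": 1, "died": 0},
]

observed, expected, ratio = observed_to_expected(program)
print(f"observed={observed}, expected={expected:.2f}, O/E={ratio:.2f}")
```

An O/E ratio above one means more deaths than the reference model predicts for that case mix, and a ratio below one means fewer. The sketch omits the confidence interval around the ratio, which is what the star ratings rely on.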
Centers for which the 95 % confidence interval for observed-to-expected mortality ratio does not include unity (does not overlap with the number one) are identified as one-star (low-performing) or three-star (high-performing) programs with respect to operative mortality. Star ratings are provided for the single category of ‘all ages and all STAT Categories.’ Public reporting of data from the STS-CHSD began in January 2015 using this star rating system, with reporting of both star ratings and the actual numerical mortality data on which the star rating is based. As of March 2016, 68 out of 113 (60.2 %) participants in STS-CHSD from the United States had agreed to publicly report their outcomes using this system.

Data quality in the STS-CHSD is evaluated through intrinsic data verification, including identification and correction of missing/out-of-range values and inconsistencies across fields and on-site audit. In 2014, approximately 10 % of participants (11 participants) will be randomly selected for audits of their center. The audit is designed to complement the internal quality controls. Its overall objective is to maximize the integrity of the data in the STS-CHSD by examining the accuracy, consistency, and completeness of the data. In 2013, the audit of the STS-CHSD included the following documentation of rates of completeness and accuracy for the specified fields of data:

• Primary diagnosis (completeness = 100 %, accuracy = 96.2 %)
• Primary procedure (completeness = 100 %, accuracy = 98.7 %)
• Mortality status at hospital discharge (completeness = 100 %, accuracy = 98.8 %)

Similar to the STS-ACSD, in addition to regular confidential feedback reports, STS-CHSD data are used for quality assessment, performance improvement initiatives, research, and public reporting (beginning in early 2015) and to satisfy regulatory and reimbursement imperatives. Many of these additional functions are discussed in subsequent sections.

STS General Thoracic Surgery Database (STS-GTSD)

The STS General Thoracic Surgery Database (STS-GTSD) is the newest of the three specialty databases, and it faces a unique challenge. Unlike adult and congenital heart surgery, both of which are practiced almost exclusively by board-certified cardiothoracic (CT) surgeons, general thoracic surgery in the USA is more often performed by general surgeons or by surgical oncologists. These surgeons are allowed to submit data to the STS National Database, but they rarely take advantage of this opportunity. Therefore, there are essentially two populations of patients undergoing noncardiac chest surgery in the USA. In the first group are patients operated upon by board-certified CT surgeons, many of whom are involved in academic or referral centers and most of whom participate in the STS-GTSD. The second group of patients is operated upon by surgeons who are not board-certified thoracic surgeons, who rarely if ever participate in the STS National Database, and who do not receive regular feedback information on their performance from the STS-GTSD. This diverse population of surgeons performing general thoracic surgery is an important consideration when assessing the performance of an STS-GTSD program, as their benchmark population of providers is already preselected to be among the best thoracic surgeons in the nation. An average STS-GTSD participant program may well have performance that substantially exceeds that of procedures performed by non-board-certified surgeons. Potentially useful areas of performance comparison include adequacy of preoperative staging, functional evaluation, intraoperative lymph node sampling, and morbidity and mortality.

Despite this challenge, the STS-GTSD is growing, and in 2015, it enrolled patients from 273 participants. External audit revealed high accuracy (overall 95 %). Mortality and morbidity risk models for lung cancer and esophageal resection have been developed in collaboration with the STS Quality Measurement Task Force (QMTF) (Kozower et al. 2010; Shapiro et al. 2010;
Wright et al. 2009), and performance metrics using these risk models will be used to classify thoracic programs as one, two, or three stars, similar to the approach used in adult cardiac surgery. Because STS-GTSD participants represent a high-performing subset of all US surgeons performing general thoracic procedures, STS has also compared the unadjusted results of STS surgeons with those available from the Nationwide Inpatient Sample (NIS) for all surgeons performing chest operations nationally. This comparison has revealed that surgeons who are actively participating in the STS-GTSD have superior results, likely both because of their specialized training as well as the feedback reports they receive.

Similar to the efforts by the STS-CHSD to standardize nomenclature internationally (see “STS Congenital Heart Surgery Database [STS-CHSD]” section above), the STS-GTSD continues to update its data specifications and harmonize data definitions with the European Society of Thoracic Surgeons. This work will facilitate joint research and quality improvement initiatives, as well as international comparisons of care.

Members of the STS General Thoracic Surgery Database Task Force are exploring options for obtaining long-term outcomes for cancer resection, including linking the STS-GTSD with Medicare data (see “STS Task Force on Longitudinal Follow-Up and Linked Registries [LFLR]” section below). However, other data sources will also be required, including various cancer registries, as 40 % of lung cancer resections and 60 % of esophageal cancer resections are performed in patients under the age of 65. (Medicare data only includes patients 65 or older and younger patients on dialysis.)

Database Operations

Data Sources

Although many investigators use claims data (e.g., Medicare) for performance evaluation and research, the distinguishing feature of the STS National Database and similar clinical registries is the degree of granularity and specificity of its data elements (STS National Database 2014). Since the inception of the STS National Database, periodic (typically every 3 years, in a cycle that allows one of each of the three databases to be updated each year) data specification upgrades occur based on the evolution of scientific knowledge as well as feedback from database managers, end users, and participants. Every data element collected has an associated sequence number which is mapped to a detailed clinical data specification. This feature of clinical registries – their highly structured and clinically granular data – distinguishes them from alternative data sources such as claims data (not clinically rich) and electronic health record (EHR) data (unstructured, lacking specific definitions used by all institutions).

This unique advantage of clinical registries, including the STS National Database, also poses one of their greatest challenges – data collection burden. Rather than allowing anyone to enter the data that become part of a patient’s STS record, these data are either entered by a trained abstractor, or data entered by caregivers are carefully reviewed by the data abstractor. These data managers work with surgeons, physician assistants, nurse practitioners, and others to ensure that the data entered into the STS National Database adhere to the definitions established by STS and that they are supported by documentation in the patient’s medical record. These data managers have many resources available to them including:

• The detailed written specifications themselves
• A teaching manual that expands upon the formal specifications and often includes clinical examples
• Advice of colleagues in regional collaboratives around the nation
• Biweekly telephone calls with STS National Database and Duke Clinical Research Institute leaders
• Email alerts
• Newsletters
• A 4-day annual national meeting (The Society of Thoracic Surgeons Advances in Quality and Outcomes [AQO] Conference: A Data
Managers Meeting) attended by nearly 500 data managers from around the country (at which data managers and surgeon leaders present educational sessions on challenging coding issues and new developments in data specifications)

Numerous studies have been conducted (Shahian et al. 2007b; Mack et al. 2005) showing that both the number and type of procedures performed and their results differ substantially with the use of detailed clinical data as opposed to claims data sources.

STS is working with EHR vendors to investigate how some STS variables might be automatically extracted from routinely collected EHR data. The most straightforward variables for this type of capture would include demographics, labs, and structured diagnostic testing such as percent coronary artery obstruction, ejection fraction, and valve areas. Other STS data elements which have complex data specifications would be more challenging to map from EHRs, and these complex elements might require the addition of specific fields to the EHR.

Vendors

The Society of Thoracic Surgeons has contractual relationships with a number of vendors who provide the data entry software by which participant programs enter data into the STS National Database. Each vendor differs in the sophistication of the reports they produce, opportunity for customization, cost, and ability to link with other databases such as the American College of Cardiology (ACC) National Cardiovascular Data Registry (NCDR). However, each vendor must achieve basic certification by STS to ensure that their software is capable of producing accurate and consistent results.

STS Staff

Numerous full-time staff at STS headquarters are devoted to database operations and serve multiple functions:

• Assist programs in joining the database.
• Develop and maintain appropriate contractual relationships with vendors, participants, and our warehouse and analytic center.
• Coordinate and staff the various STS National Database Task Forces and their respective conference calls and meetings.
• Develop and maintain budgets.
• Assure compliance with all relevant regulatory processes, including the Health Insurance Portability and Accountability Act of 1996 (HIPAA).
• Serve as the main resource for data managers.
• Arrange the annual STS Advances in Quality and Outcomes [AQO] Conference.
• Work with external organizational partners on issues such as public reporting.
• Coordinate the ongoing upgrades of all three clinical databases.

Data Warehouse and Analytic Center

Since 1998, the Duke Clinical Research Institute (DCRI) has served as the data warehouse and analytic center for the STS National Database. DCRI receives data from participants, which then undergo extensive data quality and consistency checks. Each participant receives a comprehensive harvest feedback report generated by DCRI, as previously described. These feedback reports are distributed every 3 months to participants in the STS-ACSD and every 6 months to participants in the STS-CHSD and the STS-GTSD. These feedback reports include extensive educational and explanatory materials describing how each report and metric are calculated. DCRI also provides statistical support for most of the STS National Database Task Forces, particularly the Quality Measurement Task Force, and they are also involved in the Access and Publications Task Force, the STS Task Force on Longitudinal Follow-Up and Linked Registries (LFLR), and the STS Research Center. DCRI statisticians play an integral role in the design and implementation of all STS risk models and performance measures.
Data Quality and Audit

Regardless of the granularity and specificity of the data elements in any registry, they are only useful if data are actually inputted in strict conformity with their specifications. A firm belief in the accuracy of data submitted by all programs nationally, and the metrics derived from them, provides the foundation of trust necessary to implement STS programs such as voluntary public reporting.

Data quality checks exist at several stages of the STS data entry process. First, there are internal consistency and out-of-range audits that take place at the time of data entry. For example, an age of 150 years would be rejected because it falls outside the acceptable data input range. Second, submitted data are reviewed at DCRI, and excessive rates of missing data or other irregularities not captured during data submission are reported back to the STS participant for revision. Third, STS participant sites receive a list of their demographics, risk factors, operative data, and outcomes compared to STS nationally and to hospitals of similar size and academic status. Substantial differences from these benchmarks would lead a program to evaluate the accuracy of its submissions.

Finally, STS has an extremely robust annual audit of all three of its databases, all conducted by a highly respected external organization. Ten percent of all STS National Database sites are randomly selected for audit annually. Each audit consists of 20 coronary bypass procedures and 10 valve procedures; approximately 82 data elements are abstracted from each medical record. Previously this process had required on-site visits by the external auditing agency, but a mechanism has been developed to access patient records electronically in a HIPAA-compliant fashion. In addition to validating STS submissions against the medical record (for accuracy of the data), STS submissions are also checked against hospital operative logs in order to ensure that all cases have been collected (for completeness of the data).

Each year, all three clinical databases comprising the STS National Database are audited. An extensive report is generated showing the agreement rate for all audited data elements and an overall assessment of the accuracy at audit sites. In 2013, among nearly 100,000 individual data elements audited, the overall agreement rates in the STS-ACSD averaged nearly 97 %. As described above, similar agreement rates are documented in the STS-CHSD and the STS-GTSD. In the STS-CHSD, an STS congenital heart surgeon volunteer leader also participates in each audit.

STS Quality Measurement Task Force (STS-QMTF)

The STS Quality Measurement Task Force (STS-QMTF) is responsible for all risk model and performance measure development for the Society. These quality measurement activities are fully integrated into the STS National Database, a unique arrangement that has numerous advantages. First, the performance measures are based on readily available STS clinical data. Second, the performance measures are developed through direct collaboration between statistical consultants and surgeons who have both clinical expertise and knowledge of performance measurement and health policy. Third, the performance measures can be tested for reliability and validity by using them in confidential participant feedback reports prior to public reporting. Pilot testing is a difficult process for many measure developers, but it is an inherent capability provided by a clinical registry such as the STS National Database.

In addition to having the best available clinical data, the next most important factor in performance measure development is risk models. These are essential to adjust for inherent differences in patient risk, and they are crucial if performance measures are to have face validity with stakeholder groups, especially the providers (Shahian et al. 2013a). Risk model development typically begins by identifying the most relevant outcomes for a particular type of procedure and specialty. Initial exploratory analyses are performed to determine if an adequate number of cases and endpoints are available and over what period of time these need to be aggregated in order to assure adequate sample size for the outcome in question.
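The exploratory question of how long data must be pooled can be sketched with simple arithmetic. The example below is a minimal illustration, assuming a steady annual case volume, a constant event rate, and a target minimum number of events; the function name and all three inputs are hypothetical, since the chapter does not state the thresholds STS actually uses.

```python
import math


def years_to_aggregate(annual_cases, event_rate, min_events):
    """Smallest whole number of years whose pooled data are expected to
    contain at least `min_events` outcome events.

    All inputs are illustrative assumptions, not STS parameters.
    """
    if annual_cases <= 0 or min_events <= 0 or not 0 < event_rate <= 1:
        raise ValueError("annual_cases and min_events must be positive; "
                         "event_rate must lie in (0, 1]")
    expected_events_per_year = annual_cases * event_rate
    return math.ceil(min_events / expected_events_per_year)


# Hypothetical example: 1,000 cases per year, a 5 % event rate,
# and a target of at least 120 events for model development.
print(years_to_aggregate(annual_cases=1000, event_rate=0.05, min_events=120))
```

A rarer outcome or a lower-volume procedure lengthens the aggregation window under these assumptions, which is one reason less common procedures are harder to model.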
The selection and definition of relevant endpoints is critical to the development of risk models. In both quality assessment activities and clinical research to improve patient care, STS has defined its major outcomes endpoint, mortality, in a unique fashion. Typically, mortality after hospitalizations or procedures has used one of two definitions. In-hospital mortality is collected with high accuracy, but it misses early post-discharge deaths occurring at home or in extended care facilities. Collecting only in-hospital outcomes may also create a perverse incentive to discharge patients earlier than desirable so that potential adverse outcomes do not occur during the index hospitalization. Another approach is to measure adverse outcomes at 30 days, regardless of where the patient is located. This avoids providing an incentive for premature discharge, but it may encourage some providers to keep a severely ill patient alive through artificial support just long enough to meet the 30-day threshold. STS seeks to avoid the disadvantages of either of these approaches alone by combining them. The time period of mortality data collection for all three STS National Databases is based upon the STS definition of operative mortality (Overman et al. 2013), which is now used by all three STS National Databases: operative mortality is defined as (1) all deaths, regardless of cause, occurring during the hospitalization in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities), and (2) all deaths, regardless of cause, occurring after discharge from the hospital, but before the end of the 30th postoperative day.

As the next step in risk model development, bivariate analyses are performed to study the association between individual risk factors and the outcome. A comprehensive array of candidate risk factors is entered into multivariable risk models, and odds ratios (with 95 % CI) are determined for each. In some instances, certain variables are “forced” into the model regardless of statistical significance because they are regarded by clinical experts as critical for face validity. The output of these models is assessed using measures of calibration, discrimination, and reliability and using actual data from the STS National Database.

After endorsement by the Executive Committee of STS, all STS performance measures are published in their entirety in the peer-reviewed literature (Shahian et al. 2009a, b; O’Brien et al. 2009b), including all special considerations discussed during the measurement development process, the final covariates and their parameterization, and the associated intercepts and coefficients of the risk model equations.

Risk-adjusted outcomes based on national benchmark STS data are provided back to participants at each quarterly harvest. Risk models are fully updated every few years, but annually a calibration factor is introduced so that the observed-to-expected ratio for a given year equals one. Multiple STS risk models are publicly available as online calculators on the STS website (STS short-term risk calculator 2014; STS long-term risk calculator 2014), and these sites are visited thousands of times each month.

The appropriate interpretation of risk-adjusted results bears special mention, given both its centrality in performance measurement and the fact that it is often misunderstood by many who view these reports. There are two primary statistical methods by which outcomes results are adjusted for inherent risk (Shahian and Normand 2008). In direct standardization, the stratum-specific rates (e.g., age, sex, ethnicity) for each population of interest (e.g., a particular hospital’s stratum-specific rate of adverse events) are applied to a standard or reference population. This method is often used in epidemiology where there are a limited number of strata to be considered, and the rates for each stratum are available. However, for most provider profiling applications, the number of strata, corresponding to individual risk factors, is too large to standardize in this fashion. Accordingly, almost all healthcare profiling initiatives use another statistical method, indirect standardization, for risk adjustment. In this approach, the rates derived from a reference or standard population of hospitals, often in the form of a risk model with intercepts and coefficients, are applied to the particular case mix of the institutions being studied. The actual results for an individual program’s case mix are compared to what would have been expected had that program’s
population of patients been treated by an average provider from the reference population.

Both methods of standardization provide risk adjustment in a generic sense – they “level the playing field” – so that programs caring for sicker patients are not penalized. However, only direct standardization permits direct comparison of the risk-standardized results of one specific program with those of another. In indirect standardization, the results for any particular program are based solely on its specific mix of patients, and these results can only be compared with the overall results of all providers for a similar case mix (Shahian and Normand 2008). For example, a small community heart surgery program may have a lower risk-adjusted mortality rate than a tertiary/quaternary center. However, using indirect standardization, it cannot be assumed that if faced with the same case mix of the tertiary center, it would also have superior results.

The primary motivation for development of the STS National Database was the need to provide accurate performance assessment, and this remains the highest priority of the STS-QMTF. A variety of measures have been developed including structure, process, and outcomes (the Donabedian triad) (Donabedian 1966). Risk-adjusted mortality rates for CABG were the original outcome used to classify cardiac surgery performance, but even this archetypal measure can be inadequate. For example, consider three survivors of coronary artery bypass surgery (CABG), all of whom would be considered to have had identical quality procedures based on mortality alone. One patient receives all the appropriate bypass grafts and medications and sustains no complications. The second patient receives only vein grafts, which have limited longevity, and does not receive postoperative medications to prevent progression of coronary disease. The third patient experiences the new onset of dialysis-dependent renal failure which will markedly impact both longevity and quality of life. Although all three survived surgery, the quality of care received by these three patients varied markedly.

The STS-QMTF has recognized the inadequacy of using CABG risk-adjusted mortality as the sole quality metric for cardiac surgery, and it has addressed this in a number of ways. First, it has expanded its activities in risk modeling and performance metrics beyond CABG to include other major cardiothoracic procedures such as isolated aortic valve replacement, aortic valve replacement combined with CABG, mitral valve replacement, mitral valve repair, multiple valve procedures, and numerous procedures in general thoracic surgery and congenital cardiac surgery. This expansion of the procedures that are available for risk modeling and performance assessment provides a much more comprehensive assessment of quality than focusing solely on CABG, whose incidence and rate of adverse outcomes have both been declining over the past decade. Second, instead of collecting information only on mortality, the STS-QMTF has developed risk models for more of the individual surgical complications such as stroke, reoperation, prolonged ventilation, infection, renal failure, prolonged length of stay, and a composite of major morbidity and mortality.

Third, in addition to viewing these measures individually, STS has increasingly focused on composite measures using multivariate hierarchical approaches. The first STS composite measure, CABG, included the risk-adjusted mortality, the occurrence of any (any or none) of the five major complications of CABG surgery (stroke, renal failure, prolonged ventilation, reoperation, and infection), the use of at least one internal mammary artery graft, and the provision of all four (all or none) NQF-endorsed medications (preoperative beta blockade, discharge beta blockade, lipid-lowering agents such as statins, and aspirin) (Shahian et al. 2007a; O’Brien et al. 2007). Similar composite measures have been developed for isolated aortic valve replacement (Shahian et al. 2012a) and for aortic valve replacement combined with CABG (Shahian et al. 2014), and a composite measure is currently under development for mitral valve surgery. These latter measures differ from the isolated CABG composite in that they consist solely of outcomes measures (mortality and morbidity) and do not include process measures. This reflects both a shift in healthcare performance measurement toward outcomes measures (rather than structure or process
measures) and the fact that evidence-based, widely accepted process measures suitable for performance measurement are not available for these other procedures.

STS envisions a portfolio of such procedure-specific composite measures and, ultimately, an overall composite of procedural performance encompassing information from all these individual composite metrics (a “composite of composites”). However, even this “composite of composites” will only be one component of an overall STS performance measurement system that will include multiple other domains. For example, just as important as the outcome of a particular procedure is the question of whether that procedure was indicated in the first place. Accordingly, STS has mapped both the ACCF/AHA CABG guidelines (Hillis et al. 2011) and the multi-societal 2012 Appropriate Use Criteria (AUC) for Coronary Revascularization (Patel et al. 2012) to the relevant data elements in the STS-ACSD. This will ultimately allow STS participants to receive immediate documentation that their patient meets one of these CABG guidelines or AUC. Similar mapping is underway for valve procedures. STS has also begun to explore failure to rescue (mortality following the development of a complication of surgery) as an additional new quality metric (Pasquali et al. 2012a). Previous research suggests that the ability to salvage a patient from a serious complication is a distinguishing feature of high-quality programs and complements other metrics such as overall morbidity. Patient-reported outcomes are also increasingly recognized for their value in assessing quality. These include both patient-reported functional outcomes (e.g., return to work and overall functional capacity) as well as patient satisfaction (e.g., HCAHPS or Hospital Consumer Assessment of Healthcare Providers and Systems, CGCAHPS or Clinician and Group Consumer Assessment of Healthcare Providers and Systems).

STS has also formed a Cost and Resource Task Force within the STS National Database. The objective of this task force is to link the STS National Database with cost data from hospital, commercial, federal, or state payer data. Such a linkage would provide accurate data regarding variability in resource use among programs, as well as the development of risk models for cost, so that programs being evaluated for cost efficiency are not unfairly penalized when they care for particularly complex patients. STS ultimately envisages a comprehensive portfolio of performance measures which might include a composite of multiple procedural composite measures, appropriateness, failure to rescue, patient-centered outcomes, and risk-adjusted resource utilization.

Finally, the most appropriate level of attribution for performance measures is a focus of continuing discussion. STS has historically measured performance only at the participant level (typically a hospital) for a variety of reasons. There are sample size concerns at the individual surgeon level, and cardiac surgery is a “team sport” requiring many participants in addition to the surgeon (e.g., cardiologist, anesthesiologist, perfusionist, nurses, critical care specialists, respiratory therapists). However, notwithstanding these concerns, many commercial payers and governmental agencies are now publishing (or requiring) information about surgeon-level performance, much of which is based on inadequately adjusted administrative claims data and/or flawed analytics. Consequently, STS feels a responsibility to offer a valid, surgeon-level metric. An individual surgeon performance metric has now been developed by STS for adult cardiac surgery. It is a composite measure based on morbidity and mortality data for 5 of the most commonly performed procedures, aggregated over 3 years. This measure has very high reliability (0.81) because of the large number of endpoints being analyzed (Shahian et al. 2015).

Regardless of the particular performance measure, the general STS-QMTF approach to profiling performance results across providers is similar. Results are estimated in Bayesian hierarchical models, and providers are classified as having expected, better than expected, or worse than expected performance based on true Bayesian probabilities rather than frequentist confidence intervals (Shahian et al. 2007a; O’Brien et al. 2007). Unlike the latter, the Bayesian credible interval has an intuitive probability interpretation. For example, given a database participant’s
observed data, if the lower limit of the 98 % Bayesian credible interval is greater than the STS average value, then there is at least 99 % probability (98 % credible interval plus 1 % upper tail) that the participant’s true performance (e.g., in avoiding mortality or morbidity or in using an internal mammary artery graft) exceeds the STS average value for their particular case mix. The Bayesian probability (and corresponding Bayesian credible interval) selected for a particular measure varies depending on factors such as event rates, variation of scores across programs, and sample sizes for typical providers. For procedures such as CABG which are frequently performed, STS has used 99 % Bayesian probabilities, which result in approximately 10–15 % of STS providers being labeled as low performing and 10–15 % classified as high performing, with the remainder being average. For less common procedures such as isolated valve replacement, STS-QMTF has used 95 % Bayesian probabilities (97.5 % credible intervals), which results in fewer outliers (Shahian et al. 2012a). Even with the lower probability requirement, the smaller number of observations means there is less data upon which to base an estimate of a provider’s performance, and the percentage of outliers is typically lower than for CABG. If the probability criterion were even lower (e.g., 90 % Bayesian probability), then more participants would be classified as outliers, but our certainty would also be much lower, jeopardizing face validity with providers and other stakeholders.

Importantly, when estimated in this fashion, there is no requirement for any fixed number of high or low outliers. If, for example, all programs function at a high level and were statistically indistinguishable using these criteria, they would all be average (or, in STS parlance, two-star) programs. In contrast to payers and commercial

consumers could choose surgeons or hospitals based on other criteria, such as convenience, availability, or service.

In reporting their results, STS provides varying levels of granularity. These range from point estimates with credible intervals for statistically sophisticated users to star ratings corresponding to as expected, better than expected, or worse than expected for typical consumers (based on the work of Professor Judith Hibbard (Hibbard et al. 2001)). When a composite measure encompasses multiple procedures or performance domains, STS always provides the ability to drill down to the lowest level of the composite, its constituent elements.

STS Quality Initiatives Task Force (STS-QIT)

The acquisition of healthcare data and their use in performance assessment are not goals in themselves. The primary objective of all these activities is to improve healthcare quality. Just as the Quality Measurement Task Force is an integral part of the STS National Database, the STS Quality Initiatives Task Force (STS-QIT) is similarly fully integrated. This facilitates the use of STS data as the basis for quality improvement projects and allows both baseline and subsequent performance to be measured, thus documenting the effectiveness of interventions. Another advantage of integrating the Quality Initiatives Task Force within the database is to facilitate the identification of gaps and variability in national performance and to focus quality initiatives in these areas.

At the national level, quality improvement initiatives have been conducted using the STS National Database to improve compliance with preoperative beta blockade and use of internal mammary artery bypass grafts for CABG, both
report card developers, who often seem deter- of which are NQF-endorsed performance mea-
mined to demonstrate differences among pro- sures (Ferguson et al. 2003). A 2012 report by
viders, STS believes the ideal situation from a ElBardissi and colleagues (ElBardissi et al. 2012)
societal perspective would be for all programs to suggests that the STS National Database and its
be functioning at a very high level and statistically quality measurement and improvement activities
indistinguishable (e.g., the very high safety record have dramatically improved cardiac surgery
of the commercial aircraft industry). Then, results over the past decade.
234 D. M. Shahian and J. P. Jacobs
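The Bayesian outlier rule described earlier in this section (a provider is rated better than average only when the posterior probability that its true performance exceeds the STS average reaches the chosen Bayesian probability) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the STS implementation: the function name `star_rating`, the representation of the posterior as a list of draws, the convention that a higher score means better performance, and the example numbers are all hypothetical.

```python
def star_rating(posterior_draws, sts_average, bayes_prob=0.99):
    """Classify a provider as 1 (low), 2 (average), or 3 (high) stars.

    posterior_draws: samples from the posterior distribution of the
                     provider's true score (hypothetical input; a higher
                     score is assumed to mean better performance).
    sts_average:     the STS average score for the provider's case mix.
    bayes_prob:      required Bayesian probability, e.g. 0.99 or 0.95.
    """
    n = len(posterior_draws)
    if n == 0:
        raise ValueError("need at least one posterior draw")
    # Posterior probability that true performance is above / below average.
    p_above = sum(d > sts_average for d in posterior_draws) / n
    p_below = sum(d < sts_average for d in posterior_draws) / n
    if p_above >= bayes_prob:
        return 3  # better than expected
    if p_below >= bayes_prob:
        return 1  # worse than expected
    return 2      # statistically indistinguishable from the average


# Made-up draws: 97 of 100 lie above an average of 0.95.
draws = [0.94] * 3 + [0.96] * 97
print(star_rating(draws, 0.95, bayes_prob=0.99))  # 2, because 0.97 < 0.99
print(star_rating(draws, 0.95, bayes_prob=0.95))  # 3, because 0.97 >= 0.95
```

The two calls show the trade-off discussed in the text: with the same evidence, lowering the required probability moves a provider from the average group into an outlier group, at the price of lower certainty. Nothing in the rule fixes the number of outliers in advance, so every provider can be rated two-star.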
STS-QIT has begun to identify key opportunities for improvement within cardiothoracic surgery and has developed focused webinars and online libraries of best practice articles to address these issues. Specific recent webinars (STS Quality Improvement webinars 2014) include blood conservation and transfusion triggers, glucose management, and mediastinal staging prior to lung cancer surgery. The Quality Initiatives Task Force is also exploring the possibility of identifying consistently low-performing programs using STS data and then offering such programs the possibility of external review of their database integrity (to identify potential coding issues that might lead to false outlier classification) and clinical practice (to facilitate quality improvement).

A number of states and regions have also used STS data to improve quality. For example, in a collaborative effort with Blue Cross Blue Shield of Michigan, the Michigan Society of Thoracic and Cardiovascular Surgeons has brought together representatives from all cardiac surgery programs in the state (Prager et al. 2009). They review performance of all programs, identify gaps and variability in outcomes, and review each cardiac surgery death using a standardized phase of care mortality analysis (POCMA). They have also implemented a number of best practice initiatives. Similarly, the Virginia Cardiac Surgery Quality Initiative (Speir et al. 2009) has brought together surgeons from across the state. They have linked STS clinical data to cost data with a focus on reducing both complications and their associated costs.

STS Public Reporting Task Force

Among healthcare professional societies, STS has taken the lead in public reporting by providing easily understandable cardiothoracic surgical outcomes data to the public (Shahian et al. 2011a, b). STS support of public reporting and transparency is based on several principles:

• Public reporting and accountability are our professional responsibilities.
• Patients and their families have a right to know the outcomes of cardiothoracic surgical procedures.
• Public reporting demonstrates commitment to quality improvement.
• Public reporting is one approach to improving quality.
• Public reporting promotes patient autonomy and facilitates shared decision-making.
• If professional medical and surgical societies do not publish accurate information about performance using the best available clinical data and risk adjustment, then the public will be forced to judge our performance based on unadjusted or inadequately adjusted administrative claims data.

The STS Public Reporting Task Force is responsible for the development and maintenance of the web-based platforms for public reporting of data from the STS National Database. STS has implemented voluntary public reporting through its STS Public Reporting Online Initiative [www.sts.org/publicreporting] and through collaboration with Consumers Union [www.consumerreports.org/health]. In each case, these reports are based on the STS composite measures and star ratings (with drill-down capability) described above.

In September 2010, STS began publicly reporting outcomes of isolated CABG surgery based on its NQF-endorsed composite CABG metric. In January 2013, STS began publicly reporting outcomes of isolated aortic valve replacement (AVR) surgery based on its NQF-endorsed AVR composite score. In August 2014, STS began publicly reporting outcomes of combined AVR + CABG surgery, using an NQF-endorsed composite score with the same two domains (risk-adjusted morbidity and mortality) as the isolated AVR composite.

STS plans to expand its portfolio of publicly reported cardiothoracic surgical quality measures by at least one additional new operation every year. Future publicly reported metrics will include pediatric and congenital heart surgery risk-adjusted operative mortality based on the 2014 STS Congenital Heart Surgery Database Mortality Risk Model (planned for public reporting in January 2015), mitral valve replacement (MVR) and mitral valve repair, a multi-domain composite for pulmonary lobectomy for cancer, and a multi-domain composite for esophagectomy. As of mid-2016, 50 % of adult cardiac surgery participants in the STS National Database and 60 % of congenital heart surgery participants had consented to voluntary public reporting.

literature and have significantly advanced knowledge in cardiothoracic surgery.

STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR)
They are also invaluable for comparative effectiveness research. Although randomized trials have been considered by many to be the gold standard of comparative effectiveness research, recent efforts have examined the possibility of using clinical registries as platforms for randomized trials (Frobert et al. 2013; Lauer and D’Agostino 2013). Performing randomized trials within clinical registries would potentially accomplish the dual objectives of decreasing the cost of these trials and increasing the generalizability of the results (as the included patients are more representative of “real-world” populations).

Clinical registries provide practitioners with accurate and timely feedback of their own outcomes and can benchmark these outcomes to regional, national, or even international aggregate data, thus facilitating quality improvement.

The STS National Database exemplifies the potential value of clinical registries for all of healthcare. High-quality data are collected once and then used for multiple purposes, with the ultimate goal of improving the care of all patients.

References

Carroll JD, Edwards FH, Marinac-Dabic D, et al. The STS-ACC transcatheter valve therapy national registry: a new partnership and infrastructure for the introduction and surveillance of medical devices and therapies. J Am Coll Cardiol. 2013;62(11):1026–34.
Clark RE. It is time for a national cardiothoracic surgical data base. Ann Thorac Surg. 1989;48(6):755–6.
Codman EA. The product of a hospital. Surg Gynecol Obstet. 1914;18:491–6.
Codman EA. A study in hospital efficiency. As demonstrated by the case report of the first two years of a private hospital. Reprint edition (originally published privately 1914–1920) ed. Oakbrook Terrace: Joint Commission on Accreditation of Healthcare Organizations; 1995.
Cronk CE, Malloy ME, Pelech AN, et al. Completeness of state administrative databases for surveillance of congenital heart disease. Birth Defects Res A Clin Mol Teratol. 2003;67(9):597–603.
Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44(3):166–206.
Donabedian A. The quality of care. How can it be assessed? JAMA. 1988;260(12):1743–8.
Donabedian A. The end results of health care: Ernest Codman’s contribution to quality assessment and beyond. Milbank Q. 1989;67(2):233–56.
ElBardissi AW, Aranki SF, Sheng S, O’Brien SM, Greenberg CC, Gammie JS. Trends in isolated coronary artery bypass grafting: an analysis of the Society of Thoracic Surgeons adult cardiac surgery database. J Thorac Cardiovasc Surg. 2012;143(2):273–81.
Ferguson Jr TB, Peterson ED, Coombs LP, et al. Use of continuous quality improvement to increase use of process measures in patients undergoing coronary artery bypass graft surgery: a randomized controlled trial. JAMA. 2003;290(1):49–56.
Franklin RC, Jacobs JP, Krogmann ON, et al. Nomenclature for congenital and paediatric cardiac disease: historical perspectives and The International Pediatric and Congenital Cardiac Code. Cardiol Young. 2008;18 Suppl 2:70–80.
Frobert O, Lagerqvist B, Olivecrona GK, et al. Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med. 2013;369(17):1587–97.
Frohnert BK, Lussky RC, Alms MA, Mendelsohn NJ, Symonik DM, Falken MC. Validity of hospital discharge data for identifying infants with cardiac defects. J Perinatol. 2005;25(11):737–42.
Grover FL, Shahian DM, Clark RE, Edwards FH. The STS National Database. Ann Thorac Surg. 2014;97 Suppl 1:S48–54.
Hammill BG, Hernandez AF, Peterson ED, Fonarow GC, Schulman KA, Curtis LH. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J. 2009;157(6):995–1000.
Hannan EL, Cozzens K, King III SB, Walford G, Shah NR. The New York State cardiac registries: history, contributions, limitations, and lessons for future efforts to assess and publicly report healthcare outcomes. J Am Coll Cardiol. 2012;59(25):2309–16.
Hibbard JH, Peters E, Slovic P, Finucane ML, Tusler M. Making health care quality reports easier to use. Jt Comm J Qual Improv. 2001;27(11):591–604.
Hillis LD, Smith PK, Anderson JL, et al. ACCF/AHA guideline for coronary artery bypass graft surgery: executive summary: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. Circulation. 2011;124(23):2610–42.
Iezzoni LI. Risk adjustment for measuring health care outcomes. 3rd ed. Chicago: Health Administration Press; 2003.
Jacobs JP, Jacobs ML, Mavroudis C, et al. Nomenclature and databases for the surgical treatment of congenital cardiac disease–an updated primer and an analysis of opportunities for improvement. Cardiol Young. 2008;18 Suppl 2:38–62.
Jacobs JP, Jacobs ML, Lacour-Gayet FG, et al. Stratification of complexity improves the utility and accuracy of outcomes analysis in a multi-institutional congenital heart surgery database: application of the risk adjustment in congenital heart surgery (RACHS-1) and Aristotle systems in the Society of Thoracic Surgeons (STS) Congenital Heart Surgery Database. Pediatr Cardiol. 2009;30(8):1117–30.
Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons adult cardiac surgery database to Centers for Medicare and Medicaid Services Medicare data. Ann Thorac Surg. 2010;90(4):1150–6.
Jacobs ML, Daniel M, Mavroudis C, et al. Report of the 2010 Society of Thoracic Surgeons congenital heart surgery practice and manpower survey. Ann Thorac Surg. 2011a;92(2):762–8.
Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons database to social security data to examine survival after cardiac operations. Ann Thorac Surg. 2011b;92(1):32–7.
Jacobs JP, Shahian DM, He X, et al. Penetration, completeness, and representativeness of the Society of Thoracic Surgeons adult cardiac surgery database. Ann Thorac Surg. 2016;101(1):33–41.
Jantzen DW, He X, Jacobs JP, et al. The impact of differential case ascertainment in clinical registry versus administrative data on assessment of resource utilization in pediatric heart surgery. World J Pediatr Congenit Heart Surg. 2014;5(3):398–405.
Kouchoukos NT, Ebert PA, Grover FL, Lindesmith GG. Report of the Ad Hoc Committee on risk factors for coronary artery bypass surgery. Ann Thorac Surg. 1988;45(3):348–9.
Kozower BD, Sheng S, O’Brien SM, et al. STS database risk models: predictors of mortality and major morbidity for lung cancer resection. Ann Thorac Surg. 2010;90(3):875–81.
Lauer MS, D’Agostino Sr RB. The randomized registry trial–the next disruptive technology in clinical research? N Engl J Med. 2013;369(17):1579–81.
Mack MJ, Herbert M, Prince S, Dewey TM, Magee MJ, Edgerton JR. Does reporting of coronary artery bypass grafting from administrative databases accurately reflect actual clinical outcomes? J Thorac Cardiovasc Surg. 2005;129(6):1309–17.
Mack MJ, Brennan JM, Brindis R, et al. Outcomes following transcatheter aortic valve replacement in the United States. JAMA. 2013;310(19):2069–77.
Mallon WJ. Ernest Amory Codman: the end result of a life in medicine. Philadelphia: W.B. Saunders Company; 2000.
Mavroudis C, Jacobs JP. Congenital heart surgery nomenclature and database project: overview and minimum dataset. Ann Thorac Surg. 2000;69(3, Suppl 1):S1–17.
Neuhauser D. Ernest Amory Codman, M.D., and end results of medical care. Int J Technol Assess Health Care. 1990;6(2):307–25.
O’Brien SM, Shahian DM, Delong ER, et al. Quality measurement in adult cardiac surgery: part 2–Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg. 2007;83 Suppl 4:S13–26.
O’Brien SM, Clarke DR, Jacobs JP, et al. An empirically based tool for analyzing mortality associated with congenital heart surgery. J Thorac Cardiovasc Surg. 2009a;138(5):1139–53.
O’Brien SM, Shahian DM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2–isolated valve surgery. Ann Thorac Surg. 2009b;88 Suppl 1:S23–42.
O’Connor GT, Plume SK, Olmstead EM, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. The Northern New England Cardiovascular Disease Study Group. JAMA. 1991;266(6):803–9.
Overman DM, Jacobs JP, Prager RL, et al. Report from the Society of Thoracic Surgeons National Database Workforce: clarifying the definition of operative mortality. World J Pediatr Congenit Heart Surg. 2013;4(1):10–2.
Pasquali SK, Jacobs JP, Shook GJ, et al. Linking clinical registry data with administrative data using indirect identifiers: implementation and validation in the congenital heart surgery population. Am Heart J. 2010;160(6):1099–104.
Pasquali SK, Li JS, Jacobs ML, Shah SS, Jacobs JP. Opportunities and challenges in linking information across databases in pediatric cardiovascular medicine. Prog Pediatr Cardiol. 2012a;33(1):21–4.
Pasquali SK, He X, Jacobs JP, Jacobs ML, O’Brien SM, Gaynor JW. Evaluation of failure to rescue as a quality metric in pediatric heart surgery: an analysis of the STS Congenital Heart Surgery Database. Ann Thorac Surg. 2012b;94(2):573–9.
Pasquali SK, Gaies MG, Jacobs JP, William GJ, Jacobs ML. Centre variation in cost and outcomes for congenital heart surgery. Cardiol Young. 2012c;22(6):796–9.
Pasquali SK, Li JS, He X, et al. Perioperative methylprednisolone and outcome in neonates undergoing heart surgery. Pediatrics. 2012d;129(2):e385–91.
Pasquali SK, Li JS, He X, et al. Comparative analysis of antifibrinolytic medications in pediatric heart surgery. J Thorac Cardiovasc Surg. 2012e;143(3):550–7.
Pasquali SK, Peterson ED, Jacobs JP, et al. Differential case ascertainment in clinical registry versus administrative data and impact on outcomes assessment for pediatric cardiac operations. Ann Thorac Surg. 2013;95(1):197–203.
Passaro Jr E, Organ Jr CH. Ernest A. Codman: the improper Bostonian. Bull Am Coll Surg. 1999;84(1):16–22.
Patel MR, Dehmer GJ, Hirshfeld JW, et al. ACCF/SCAI/STS/AATS/AHA/ASNC/HFSA/SCCT 2012 appropriate use criteria for coronary revascularization focused update: a report of the American College of Cardiology Foundation Appropriate Use Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, American Society of Nuclear Cardiology, and the Society of Cardiovascular Computed Tomography. J Thorac Cardiovasc Surg. 2012;143(4):780–803.
Prager RL, Armenti FR, Bassett JS, et al. Cardiac surgeons and the quality movement: the Michigan experience. Semin Thorac Cardiovasc Surg. 2009;21(1):20–7.
Shahian DM, Normand SL. Comparison of “risk-adjusted” hospital outcomes. Circulation. 2008;117(15):1955–63.
Shahian DM, Edwards FH, Ferraris VA, et al. Quality measurement in adult cardiac surgery: part 1–Conceptual framework and measure selection. Ann Thorac Surg. 2007a;83 Suppl 4:S3–12.
Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand SL. Comparison of clinical and administrative data sources for hospital coronary artery bypass graft surgery report cards. Circulation. 2007b;115(12):1518–27.
Shahian DM, O’Brien SM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1–coronary artery bypass grafting surgery. Ann Thorac Surg. 2009a;88 Suppl 1:S2–22.
Shahian DM, O’Brien SM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3–valve plus coronary artery bypass grafting surgery. Ann Thorac Surg. 2009b;88 Suppl 1:S43–62.
Shahian DM, Edwards FH, Jacobs JP, et al. Public reporting of cardiac surgery performance: part 1–history, rationale, consequences. Ann Thorac Surg. 2011a;92 Suppl 3:S2–11.
Shahian DM, Edwards FH, Jacobs JP, et al. Public reporting of cardiac surgery performance: part 2–implementation. Ann Thorac Surg. 2011b;92 Suppl 3:S12–23.
Shahian DM, He X, Jacobs JP, et al. The Society of Thoracic Surgeons isolated aortic valve replacement (AVR) composite score: a report of the STS Quality Measurement Task Force. Ann Thorac Surg. 2012a;94(6):2166–71.
Shahian DM, O’Brien SM, Sheng S, et al. Predictors of long-term survival following coronary artery bypass grafting surgery: results from The Society of Thoracic Surgeons Adult Cardiac Surgery Database (The ASCERT Study). Circulation. 2012b;125(12):1491–500.
Shahian DM, He X, Jacobs JP, et al. Issues in quality measurement: target population, risk adjustment, and ratings. Ann Thorac Surg. 2013a;96(2):718–26.
Shahian DM, Jacobs JP, Edwards FH, et al. The Society of Thoracic Surgeons National Database. Heart. 2013b;99(20):1494–501.
Shahian DM, He X, Jacobs JP, et al. The STS AVR + CABG composite score: a report of the STS Quality Measurement Task Force. Ann Thorac Surg. 2014;97(5):1604–9.
Shahian DM, He X, Jacobs JP, et al. The Society of Thoracic Surgeons composite measure of individual surgeon performance for adult cardiac surgery: a report of the Society of Thoracic Surgeons quality measurement task force. Ann Thorac Surg. 2015;100:1315–25.
Shapiro M, Swanson SJ, Wright CD, et al. Predictors of major morbidity and mortality after pneumonectomy utilizing the Society for Thoracic Surgeons General Thoracic Surgery Database. Ann Thorac Surg. 2010;90(3):927–34.
Speir AM, Rich JB, Crosby I, Fonner Jr E. Regional collaboration as a model for fostering accountability and transforming health care. Semin Thorac Cardiovasc Surg. 2009;21(1):12–9.
Spiegelhalter DJ. Surgical audit: statistical lessons from Nightingale and Codman. J R Stat Soc (Series A). 1999;162(Part 1):45–58.
Strickland MJ, Riehle-Colarusso TJ, Jacobs JP, et al. The importance of nomenclature for congenital cardiac disease: implications for research and evaluation. Cardiol Young. 2008;18 Suppl 2:92–100.
STS long-term risk calculator. http://www.sts.org/quality-research-patient-safety/quality/ascert-long-term-survival-calculator. Accessed 11 July 2014.
STS National Database. http://www.sts.org/sections/stsnationaldatabase/. Accessed 26 July 2014.
STS Quality Improvement webinars. http://www.sts.org/education-meetings/sts-webinar-series. Accessed 12 July 2014.
STS Research Center. http://www.sts.org/sites/default/files/documents/pdf/DirectorSTSResearchCenter_April2014.pdf. Accessed 13 July 2014.
STS short term risk calculator. http://www.sts.org/quality-research-patient-safety/quality/risk-calculator-and-models. Accessed 11 July 2014.
Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative effectiveness of revascularization strategies. N Engl J Med. 2012;366(16):1467–76.
Wright CD, Kucharczuk JC, O’Brien SM, Grab JD, Allen MS. Predictors of major morbidity and mortality after esophagectomy for esophageal cancer: a Society of Thoracic Surgeons General Thoracic Surgery Database risk adjustment model. J Thorac Cardiovasc Surg. 2009;137(3):587–95.
11 Health Services Information: Patient Safety Research Using Administrative Data

Chunliu Zhan

Contents
Introduction
Administrative Data: Definition, Data Resources, and Potential Patient Safety Measures
Medical Claims, Discharge, and Other Health Encounter Abstracts
Medical Records and Electronic Health Records
Reports and Surveillance of Patient Safety Events
Surveys of Healthcare Encounters and Healthcare Experiences
Other Data Sources and Data Linkage
Patient Safety Research Using Administrative Data: General Framework, Methods, and Tools
General Framework for Administrative Data-Based Patient Safety Research
Methodological Considerations
AHRQ Patient Safety Indicators: An Exemplary Tool for Administrative Data-Based Patient Safety Research
Patient Safety Research Using Administrative Data: Potentials and Limitations
Screen Patient Safety Events for In-depth Examination
Epidemiological Study
Public Reporting on Patient Safety Events
Advantages and Challenges in Administrative Data-Based Patient Safety Research
References
insights into the impacts and risk factors, and, to some extent, provided benchmarks for tracking progress in patient safety efforts at local, state, or national levels. Various methods and tools have been developed to aid such research. The main disadvantage lies with the fact that these administrative data are often collected without following any research design, protocol, or quality assurance procedure; therefore health services researchers using these data sources must make extra efforts in devising proper methodologies and must interpret their findings with extra caution. As more and more administrative data are collected and digitalized and more tailored methodologies and tools are developed, health services researchers will be presented with ever-greater opportunity to extract valid information and knowledge on patient safety issues from administrative data.

Introduction

A guiding principle for medical professionals is the Hippocratic oath: First, Do No Harm. But, inevitably, patient harms occur, and research is needed to understand why and how to prevent them. Since the Institute of Medicine (IOM) published its landmark report, To Err Is Human: Building a Safer Health System (Kohn et al. 1999), in 1999, the importance of vigorous, systematic research on patient safety has been recognized worldwide, and patient safety research has become a prominent domain of health services research. Using a variety of definitions, taxonomies, methods, and databases, health services researchers have addressed a wide range of patient safety-related questions, producing a large body of literature.

To the general public, patient safety is self-defined. As a research topic, its definition is far from universally agreed upon. IOM defines patient safety as “the prevention of harm to patients”, and its emphasis is placed on “the system of care delivery that (1) prevents errors; (2) learns from the errors that do occur; and (3) is built on a culture of safety that involves health care professionals, organizations, and patients” (Kohn et al. 1999). The Agency for Healthcare Research and Quality (AHRQ), the US federal agency charged with improving patient safety, defined patient safety as “freedom from accidental or preventable injuries produced by medical care.” The literature is littered with systems of definitions, taxonomies, categorizations, terms, and concepts associated with patient safety. The National Quality Forum’s list of “never events” or “serious reportable events” offers concrete examples of the types of issues patient safety research is concerned with:

• Surgical events: surgery or other invasive procedure performed on the wrong body part or the wrong patient, the wrong surgical or other invasive procedure performed on a patient, and unintended retention of a foreign object in a patient after surgery or other procedure
• Product or device events: such as patient death or serious injury associated with the use of contaminated drugs, devices, or biologics
• Patient protection events: discharge or release of a patient/resident of any age, who is unable to make decisions, to other than an authorized person and patient suicide, attempted suicide, or self-harm resulting in serious disability while being cared for in a healthcare facility
• Care management events: such as patient death or serious injury associated with a medication error (e.g., errors involving the wrong drug, wrong dose, wrong patient, wrong time, wrong rate, wrong preparation, or wrong route of administration), patient death or serious injury associated with unsafe administration of blood products, maternal death or serious injury associated with labor or delivery, and patient death or serious injury resulting from failure to follow up or communicate laboratory, pathology, or radiology test results
• Environmental events: patient or staff death or serious injury associated with a burn incurred in a healthcare setting and patient death or serious injury associated with the use of restraints or bedrails while being cared for in a healthcare setting
• Radiologic events: death or serious injury of a patient or staff associated with introduction of a metallic object into the MRI area
• Criminal events: any instance of care ordered by or provided by someone impersonating a physician, nurse, pharmacist, or other licensed healthcare provider, abduction of a patient/resident of any age, sexual abuse/assault on a patient within or on the grounds of a healthcare setting, and death or significant injury of a patient or staff member resulting from a physical assault (i.e., battery) that occurs within or on the grounds of a healthcare setting

To focus on the subject at hand, that is, how to use administrative data to conduct patient safety research, this chapter will refer to all events with patient safety implications as patient safety events without distinction.

Patient safety research can be done in a number of ways, such as follow-up of cohorts of patients as they come into contact with healthcare systems and randomized trials to examine whether a certain intervention works to reduce patient safety events. However, such studies are rare, because patient safety events are accidental in nature and therefore infrequent; to gather a sufficient number of cases of patient safety events, a researcher must collect a substantially large study sample. Unsurprisingly, most studies on patient safety were conducted using administrative data, the type of data collected routinely and processed in large volume for administrative purposes.

Health services researchers have used administrative data to study a variety of patient safety issues, from the prevalence to risk factors and effectiveness of interventions to reduce patient safety events. The apparent advantage of administrative data is in its large volume and its computerization, which make the most laborious and expensive part of research (data collection) relatively easy and cheap. Another advantage is that, because there is little risk of interrupting patient care in the data collection process and little risk of breaching patient privacy once patient identifiers are stripped, data acquisition can be done without jumping through many hoops. The apparent disadvantage lies with the fact that these administrative data are collected without following any research design, protocol, or quality assurance procedure;

extra efforts in devising methodologies and must interpret their findings with extra caution.

Patient safety as a research domain is relatively new compared with other health services research domains, and the issues are diverse and constantly evolving. Administrative data are also expanding fast, with more and more data collected and accumulated as computer technologies progress and interest in mining big data increases. Consequently, patient safety research using administrative data does not follow any clearly defined agenda, methodologies, or processes, giving researchers great room for creativity and innovation and also greater room for error.

This chapter provides a review of the administrative data sources currently available for patient safety research, the common methodologies and tools employed, and the types of patient safety research that can be conducted using administrative data. By going through some well-developed concepts, tools, and examples, the chapter intends to offer health services researchers a road map on how to use administrative data to generate information and knowledge to advance their patient safety agenda.

Administrative Data: Definition, Data Resources, and Potential Patient Safety Measures

Administrative data refer to data collected for administrative purposes. Such data are essential for running any kind of business, and the business of healthcare is no exception. Hospitals, outpatient clinics, nursing homes, home care providers, pharmacists, and all other healthcare providers collect and compile data on patients, medical conditions, treatments, and patient directives, create bills for patients and submit claims to insurers and other third-party payers for reimbursements, and compile business data for governance, internal audits, credentialing, and statistical reports. Health insurance companies deal with medical claims in addition to enrolling patients, generating enormous amounts of data on a daily basis. Drug companies collect data on drug sales, establish
therefore, researchers using these data must make drug registries for postmarket research, and
244 C. Zhan
compile data on drug safety to meet regulatory needs. Professional societies, such as the American Medical Association and the American Hospital Association, also compile extensive data on their members for membership management, licensing, accreditation, and other administrative purposes. Many employers, especially large and traditional companies, offer extensive health benefits, and, for management purposes, compile extensive data to track their employees’ use of health benefits and expenses. Last but not least, government agencies compile extensive data, including claims in order to pay the bills for patients covered by government programs, data from healthcare providers to monitor this important sector of the economy, and regular surveys to generate national statistics and track changes over time. Together, tremendous amounts of administrative data are produced and maintained by various entities, and these data hold great potential for research on a wide range of issues, including patient safety issues.

In general, any data source that records personal encounters or experiences with healthcare systems has the potential to contribute information and knowledge on patient safety. Many other data sources containing no patient care data can also be useful when merged with patient encounter data. Table 1 provides a brief summary of the types of administrative data sources that are available and that have been used by health services researchers to study patient safety.

It should be noted that, in health services research literature, claims data are often treated as synonymous with administrative data. This is because medical claims, which record individual patients’ individual episodes of care for insurance claims, are the most voluminous data, the first extensively computerized data, and the first administrative data sources extensively used in health services research. However, similar data on individual healthcare encounters are also collected in many countries or programs under universal insurance coverage, and these data are sometimes called discharge abstracts. Following the basic definition of “administrative data,” this review also includes other data sources that are collected for administrative purposes, but may be smaller in scale, less computerized, and less often used in health services research. The basic characteristics of these data sources and the potential patient safety measures that can be derived from these data sources are discussed in detail below.

Medical Claims, Discharge, and Other Health Encounter Abstracts

Data Sources

A healthcare provider must collect and compile data on each service rendered to each patient, for record keeping, patient tracking, billing, and other administrative purposes. At a minimum, the data
include some patient demographics, medical conditions, diagnoses, treatments, discharge or disposition status, and charges and payments. As mentioned earlier, the most important use of such data is to make insurance claims; therefore this type of data is often called “claims data” and further categorized as inpatient claims, outpatient claims, pharmacy claims, and so on. In many countries other than the United States, health encounters are similarly recorded and compiled but not for insurance claims purposes, and this type of administrative records may be called discharge abstracts, for example. Regardless of terms, data on individual healthcare encounters are universal and are available in various capacities for research use.

Researchers rarely have the need to deal with individual hospitals, primary care institutions, nursing homes, outpatient surgical centers, or home care agencies to access such data. Government agencies, insurers, health systems, and many commercial companies compile the data and offer them to various end users. In the United States, the Centers for Medicare and Medicaid Services (CMS) has been a major source of such administrative data. Medicare, a national social insurance program, guarantees access to health insurance for about 50 million Americans aged 65 and older and younger people with disabilities. Medicaid, a social healthcare program jointly funded by the state and federal governments and managed by the states, provides coverage for families and individuals with low income and resources. Together, Medicare and Medicaid process millions of claims each day. CMS has made great efforts to make these claims available to researchers and to standardize the data release process. The latest incarnation of these efforts is called the CMS Data Navigator (CMS 2014), intended to be the one stop for all CMS data sources, through standard processes that include formulated requests, approval, pricing, and payment procedures to ensure proper use and security of the data. The CMS data suite covers enrollment, outpatient care, hospitalization, pharmacy, and services delivered by other types of providers, and the data can be linked to form a rather complete history of an individual’s healthcare encounters.

Besides CMS, other federal agencies, state health departments, health plans, and private data institutions have also compiled claims data into research databases. One prominent example is AHRQ’s Healthcare Cost and Utilization Project (HCUP), a partnership of the federal government and states that compiles uniform hospital discharge records for research purposes (HCUP 2014). As of today, HCUP includes databases covering all hospital admissions from 47 states, emergency department visits from 31 states, and ambulatory surgery claims from 32 states. It has derived research databases with a sampling design to yield national estimates and developed various tools to reliably and effectively use these databases. On the private side, Truven Health Analytics MarketScan® databases contain complete claims for more than 199 million unique patients, and IMS Health compiles information from 100,000 suppliers from over 100 countries, with more than 45 billion healthcare transactions processed annually.

With the government paying for all health services provided by mostly private providers, Canada collects data on individual health encounters for almost the entire population. Some provinces have data on virtually all records of hospitalizations, pharmacy, physician visits, emergency department visits, and so on for every resident. Many efforts are made to make such data easy for researchers to access and use. For example, the Canadian Institute for Health Information maintains discharge abstract databases of administrative, clinical, and demographic information on hospital discharges received directly from acute care facilities or from their respective health authority or department of health. In the United Kingdom, hospital episode statistics comprises an administrative database of all inpatients in England, covering about 13 million episodes of care annually. Similar databases exist, in various forms, in almost all nations, most of which are available for research purposes.

Regardless of country, healthcare system, or purpose, administrative data of this sort record patient encounters with the healthcare system and capture similar sets of data elements:
• Patient demographics such as age, sex, race/ethnicity, county of residence and zip code, and expected payer
• Admission status including admission date, admission source and type, and primary and secondary diagnoses
• Treatments such as procedures and medications
• Discharge status entailing discharge date, patient disposition, or death
• Charges and payments

In addition, some identifiers for patients and providers, usually encrypted, are included, allowing for linking an individual patient’s claims from multiple care settings.

Potential Patient Safety Measures

A coding system for diagnosis and procedures is essential for recording patient encounters and for generating bills. The United States currently uses International Classification of Diseases, the ninth revision, Clinical Modification (ICD-9-CM), a coding system with three-digit numbers (i.e., 001–999) followed by a decimal point and up to two digits, supplemented by a group of E codes (E000–999) capturing external causes of injury (Iezzoni et al. 1994). Canada, Australia, New Zealand, and many European and Asian countries have been using ICD-10, an alphanumeric system in which each code starts with a letter (i.e., A–Z), followed by two numeric digits, a decimal point, and a digit (Quan et al. 2008).

Some of the codes specifically identify a patient safety event, and some codes suggest that there may be an event of patient safety concern. For example, there is an ICD-9-CM diagnosis code for “foreign object accidentally left in body during a procedure”: 998.4. Some other codes may also suggest such occurrence, including:

998.7: postoperative foreign substance reaction
E8710: post-surgical foreign object left in body
E8711: postinfusion foreign object left in body
E8712: postperfusion foreign object left in body
E8713: postinjection foreign object left in body
E8714: postendoscopy foreign object left in body
E8715: postcatheter foreign object left in body
E8716: post heart catheter foreign object left in body
E8717: post catheter removal foreign object left in body
E8718: foreign object left in body during other specified procedure
E8719: foreign object left in body during non-specified procedure

The corresponding ICD-10 codes for foreign object accidentally left in body during a procedure may include:

T81.509A: unspecified complication of foreign body accidentally left in body following unspecified procedure, initial encounter
T81.519A: adhesions due to foreign body accidentally left in body following unspecified procedure, initial encounter
T81.529A: obstruction due to foreign body accidentally left in body following unspecified procedure, initial encounter
T81.539A: perforation due to foreign body accidentally left in body following unspecified procedure, initial encounter

The process of identifying the right codes and eligible patients to measure patient safety is a mix of science and art. It rarely is clear that one code specifically records a specific patient safety event. The art of the process includes not only selection of relevant codes but also exclusion of patients for whom the codes are not likely to be relevant. Another consideration is whether a recorded event occurred during the current hospitalization (i.e., hospital-acquired condition) or whether it was already present on admission (i.e., comorbid condition). If the code appears as the first, or primary diagnosis, in a claim or discharge abstract, then it can be considered to record an event that is present on admission. But as many as 25 secondary diagnosis codes are recorded in some claims data, and only recently, a code was introduced in Medicare claims to indicate whether a diagnosis is present on admission. A great deal of effort in administrative data-based patient safety research goes into the artistic process with the dual purpose to maximize specificity (i.e., an
11 Health Services Information: Patient Safety Research Using Administrative Data 247
event flagged by the codes is truly a patient safety event) and sensitivity (i.e., all patient safety events are flagged). This process is further illustrated in later sections, in conjunction with the discussion of the methods and tools used in administrative data-based patient safety research.

Algorithms can also be built based on coded data other than ICD codes. Claims for medications can be used to screen for harmful drug-drug interactions and contraindicated drug-condition interactions. With data linked from multiple settings and over time, certain measures of inappropriate use, underuse, or overuse of care with safety implications can be studied.

Medical Records and Electronic Health Records

Data Sources

Medical records are as numerous as claims but much richer in information on patients and their healthcare experiences. Each healthcare encounter has a medical record associated with it to support diagnosis and justify services provided. Broadly speaking, a medical record may contain:

• Patient demographic information: name, address, date of birth, sex, race and ethnicity, legal status of any patient receiving behavioral healthcare services, and language and communication needs, including the preferred language for discussing healthcare issues
• Patient clinical information: reason(s) for admission; initial diagnosis; assessments; allergies to food or latex or medication; medical history; physical examination; diagnoses or conditions established during the patient’s course of care, treatment, and services; consultation reports; observations relevant to treatment; patient’s response to treatment; progress notes; medications ordered or prescribed; medications administered, including the strength, dose, frequency, and route; adverse drug reactions; treatment goals; plan of care and revisions to the plan of care; results of diagnostic and therapeutic tests and procedures; medications dispensed or prescribed on discharge; discharge diagnosis; and discharge plan and discharge planning evaluation
• Other information, such as advance directives, informed consent, and records of communication with the patient, such as telephone calls or email.

Medical records can be handwritten, typed, or electronic and can be coded or written in open-text narratives. The rich clinical information makes medical records a good source for patient safety research, allowing identification of various medical injuries, adverse events, errors, and near misses and allowing analysis of circumstances and causes of various patient safety events. Earlier research on patient safety used medical records predominantly as the primary data source (Kohn et al. 1999). Those earlier studies mostly had to work with medical records in paper format or electronic format that was not readily usable for research and had to rely on medical experts to transform medical records into research data, a process that was resource intensive and required exceptional knowledge and skills in medical context and research. As a result, earlier patient safety research with medical records was usually limited in scope and statistical power.

The wide adoption of electronic medical records (EMRs) offers great promise for patient safety research. In the United States, a substantial percentage of hospitals and physicians have started to use EMR systems, with various levels of capacity and usability. In the United Kingdom, the National Health Service collects and stores data electronically on primary care encounters in the clinical information management system. Great efforts are being made in Canada and all over the world to move the healthcare industry into the Information Age.

Potential Patient Safety Measures

In theory, EMRs hold much of what claims data can offer and much more. EMRs contain a great deal of information in structured, coded data similar to administrative data. The allure of EMR data in patient safety research lies with its rich clinical data, such as lab values, and narratives that record
Table 2 Medical record-based screening for patient safety events: adverse drug events associated with warfarin

Description | Screening algorithm
Numerator | The subset of the denominator who during the hospital stay experienced:
  • An INR ≥4.0 with one or more of the following symptoms: cardiac arrest/emergency measures to sustain life, death, gastrointestinal bleeding, genitourinary bleeding, a hematocrit drop of three or more points more than 48 h after admission, intracranial bleeding (subdural hematoma), a new hematoma, other types of bleeding or pulmonary bleeding
  • An INR >1.5 and an abrupt cessation/hold of warfarin with one or more of the above symptoms
  • An INR >1.5 and administration of vitamin K or fresh frozen plasma (FFP) with one or more of the above symptoms
  • An INR >1.5 and a blood transfusion absent a surgical procedure with one or more of the above symptoms
Denominator | All patients who received warfarin during hospitalization and had a documented INR result during the hospital stay
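
The numerator and denominator rules in Table 2 can be sketched in code. The following is a minimal illustration only, not the implementation used by any screening program: the `Stay` record, its field names, and the symptom labels are hypothetical, the 4.0 threshold is treated as inclusive by assumption, and the peak INR during the stay stands in for the time-aligned INR a real abstraction would use.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Symptom list transcribed from Table 2; the string labels are illustrative.
QUALIFYING_SYMPTOMS = {
    "cardiac arrest/emergency measures to sustain life",
    "death",
    "gastrointestinal bleeding",
    "genitourinary bleeding",
    "hematocrit drop of 3+ points more than 48 h after admission",
    "intracranial bleeding",
    "new hematoma",
    "other bleeding",
    "pulmonary bleeding",
}


@dataclass
class Stay:
    """Hypothetical abstraction of one hospital stay from a medical record."""
    received_warfarin: bool
    inr_results: List[float]                 # all documented INR values
    symptoms: Set[str] = field(default_factory=set)
    warfarin_held: bool = False              # abrupt cessation/hold of warfarin
    vitamin_k_or_ffp: bool = False           # vitamin K or fresh frozen plasma given
    transfusion_without_surgery: bool = False


def in_denominator(stay: Stay) -> bool:
    # Received warfarin and has at least one documented INR result.
    return stay.received_warfarin and len(stay.inr_results) > 0


def in_numerator(stay: Stay) -> bool:
    if not in_denominator(stay):
        return False
    # Every numerator rule requires at least one qualifying symptom.
    if not (stay.symptoms & QUALIFYING_SYMPTOMS):
        return False
    peak_inr = max(stay.inr_results)         # simplification: ignores timing
    if peak_inr >= 4.0:                      # inclusive threshold is an assumption
        return True
    rescue_action = (stay.warfarin_held
                     or stay.vitamin_k_or_ffp
                     or stay.transfusion_without_surgery)
    return peak_inr > 1.5 and rescue_action


def event_rate(stays: List[Stay]) -> Optional[float]:
    """Share of eligible stays flagged as possible warfarin adverse drug events."""
    eligible = [s for s in stays if in_denominator(s)]
    if not eligible:
        return None
    return sum(in_numerator(s) for s in eligible) / len(eligible)
```

A stay with a peak INR of 4.2 and gastrointestinal bleeding would be flagged, whereas a stay with a high INR but no qualifying symptom would count only toward the denominator. Flagged stays are candidates for review rather than confirmed events.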
medical providers’ observations, judgments, treatment details, and outcomes. Screening algorithms can be designed to search for patient safety events in coded data as well as in text narratives. The search can look for falls, retrieve lab data on toxic serum levels of digoxin, or screen for international normalized ratios (INRs) greater than 6 in patients on warfarin. It can entail a sophisticated, explicit, structured query of entire medical records. Table 2 shows an example that screens EMRs for possible adverse drug events for the patients on warfarin.

Such algorithms can be used in manual review of medical records and can also be used to design automatic review of EMRs.

There are many challenges in implementing such explicit screening algorithms, and compromises are made. The Institute for Healthcare Improvement (Griffin and Resar 2009) has developed a set of global trigger tools that screen medical records for possible adverse events, including groups of triggers for medical, surgical, and medication-related patient harms. The tools screen coded data; look for the most significant, easy-to-detect signs; and can be applied by healthcare organizations to review paper-based and also electronic medical records. The trigger tools have been adopted by many countries and health systems. For example, Adventist Health System used the tools to gauge the number, types, and severity levels of adverse events in 25 hospitals that used a common EMR system and developed a centralized process to do so uniformly, including quarterly reports to participating facilities to communicate findings and case studies illustrating the most egregious harms.

With regard to rich notes and other narratives in EMRs, there has been much hype but little real progress. The method to identify, extract, and encode relevant information from tremendous volumes of text narratives is called natural language processing (NLP). In general, EMR narratives are stored following internal structure; information extraction involves the selection of the relevant sections of EMR and then targeted text data processing. NLP systems, such as MEDSYNDIKATE, MetaMap, SemRep, MedLEE, and BioMedLEE, can extract data pertaining to patient safety events. In a recent study of adverse drug events attributable to six drugs, Wang et al. (2009) demonstrated the general process, which consists of five stages: (1) collecting the set of EMRs to be mined, (2) processing the summaries using NLP to encode clinical narrative data, (3) selecting data where a specific drug and its potential adverse drug events co-occur, (4) filtering data by excluding confounding information such as diseases or symptoms that occurred before the use of the drug, and (5) analyzing and determining the drug-adverse drug events association.

In theory, any type of errors and adverse events that can be recognized by a clinician going through a medical record can be captured electronically. However, this theory is far from being realized. There are many EMR systems that vary substantially in structure, format, and content, and there are legal and practical obstacles over data
sharing. However, some healthcare systems have started to pull together EMR data for research. It is expected that in the near future, research databases composed of large volumes of medical records from many providers and across care settings, databases resembling HCUP or the CMS Data Navigator, will be created and made available to health services researchers.

Reports and Surveillance of Patient Safety Events

Data Sources

Alternative data sources for patient safety research include mandatory and voluntary reports of medical errors or adverse events, drug safety, or nosocomial infection surveillance systems and other data systems that government agencies and nongovernmental organizations use specifically to monitor patient safety. Spontaneous reporting systems have been created as the primary means for providing postmarket safety information on drugs since the 1960s, and some systems have also covered patient safety events due to inappropriate use of drugs. Such systems exist all over the world under various names and with various mandates.

This type of data source records individual incidences of patient safety events and varies tremendously in formats and contents. One prominent example of such a reporting system is the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). FAERS contains information on adverse event and medication error reports submitted to the FDA by healthcare professionals and consumers voluntarily as well as by drug manufacturers who are required to send all adverse event reports they receive from healthcare providers and consumers. The database is designed to support the FDA’s postmarketing safety surveillance program for drug and therapeutic biologic products, to help FDA look for new safety concerns that might be related to a marketed product, to evaluate a manufacturer’s compliance with reporting regulations, and to respond to outside requests for information. Besides regulatory use, the FDA provides raw data consisting of individual case reports extracted from the FAERS database to researchers inside and outside of the FDA. Similar to the FDA FAERS, the UK’s Medicines and Healthcare Products Regulatory Agency operates a Yellow Card Scheme that allows patients and health professionals to report suspected side effects. The reports are continually assessed by medicine safety experts, together with additional sources of information such as clinical trial data, the medical literature, and data from international medicines regulators, in order to identify previously unidentified safety issues or side effects.

MEDMARX is a similar system of voluntary reports but focuses on medication errors. Currently, MEDMARX contains over 1.3 million medication error records reported by over 400 healthcare facilities that voluntarily participate. The program collects information on medication errors, categorizing them into nine severity levels, ranging from errors that do not reach patients to errors that cause death. The reporting system contains up to 13 required data elements and 29 optional data elements to describe error types, causes, locations, staff involved, products involved, and patient characteristics. The system also asks about actions taken in response to the errors, including both individual procedural activities (i.e., actions to recover from the error) and practice-based changes (i.e., actions to prevent future errors). Most data elements are coded fields allowing single or multiple selection, and some data fields are for textual descriptions.

Some surveillance systems collect similar data but make reporting mandatory in order to accurately track incidences of patient safety events. The Centers for Disease Control and Prevention (CDC) National Nosocomial Infections Surveillance System is a prominent example of such a data source, which has continued gathering reports from a sample of hospitals in the United States on nosocomial infections since the 1970s. Another example is the National Electronic Injury Surveillance System (NEISS) at the CDC, composed of a national probability sample of hospitals in the United States that collect patient information for every emergency visit involving an injury associated with consumer products, including medical products. More recently, to address
heightened public concerns over drug safety, the system started a Cooperative Adverse Drug Event Surveillance Project (NEISS-CADES) to capture cases that are defined as those occurring in persons who sought emergency care for injuries linked by the treating physician to the outpatient use of a drug or drug-specific adverse effects. Using NEISS-CADES, Budnitz et al. (2011) were able to estimate that adverse drug events in older Americans accounted for about 100,000 emergency hospitalizations a year in the United States, and four medications (warfarin, insulins, oral antiplatelet agents, oral hypoglycemic agents) were implicated alone or in combination in two-thirds of the cases.

Patient Safety Measures

Because each record of this type provides details for one specific patient safety event, no effort is needed to identify or validate the reported event. The data allow various kinds of targeted research, such as the types of errors or adverse events most frequently occurring, the circumstances, the possible causes as reported, and the follow-up actions. But this type of data has some obvious limitations for patient safety research. First, the reported event (adverse event or medication error) may not be due to, or have a causal relationship with, the product. Second, the reports do not always contain enough detail to properly evaluate an event. Third, because of the voluntary nature of data submission, the system does not receive reports for every adverse event or medication error that occurs; therefore, the data cannot be used to calculate the incidence of an adverse event or medication error in a population. Lastly, this type of data contains no controls (i.e., the patients without patient safety events), severely limiting its use in epidemiological research.

Surveys of Healthcare Encounters and Healthcare Experiences

Data Sources

Many government agencies conduct routine surveys to collect data in order to produce national statistics and track changes in the healthcare sector. Some of the surveys collect data on personal encounters with healthcare systems and, therefore, are potential data sources for patient safety research.

In the United States, the National Center for Health Statistics, under the CDC, conducts a wide array of national surveys that contain healthcare encounter experiences. The National Ambulatory Medical Care Survey collects information about the provision and use of ambulatory medical care services, drawing a random sample of visits to nonfederal, office-based physicians who are primarily engaged in direct patient care. The National Hospital Ambulatory Medical Care Survey collects similar data, on the utilization and provision of ambulatory care services in hospital emergency and outpatient departments from a national sample of visits to the emergency departments and outpatient departments of noninstitutional, general, and short-stay hospitals. The National Hospital Discharge Survey collects data from a national sample of hospital discharges from nonfederal, short-stay hospitals. The National Hospital Care Survey, a relatively new database, integrates inpatient data formerly collected by the National Hospital Discharge Survey with the emergency department, outpatient department, and ambulatory surgery center data collected by the National Hospital Ambulatory Medical Care Survey, with personal identifiers linking care provided to the same patient in the emergency departments, outpatient departments, ambulatory surgical centers, and inpatient departments.

Besides the surveys of healthcare encounters listed above, some surveys ask patients and families directly for information on their healthcare experiences. The CMS Medicare Current Beneficiary Survey is one such data source, containing survey responses from a random sample of Medicare beneficiaries and linking to their administrative data covering inpatient, outpatient, and other claims. The AHRQ Medical Expenditure Panel Survey is a set of large-scale surveys of families and individuals, their medical providers, and employers on healthcare use and spending. Similar surveys of healthcare encounters, residents, or families exist in various forms in
many other countries. For example, the Canadian Community Health Survey resembles the Medical Expenditure Panel Survey in general purposes and methods, collecting information annually from a large sample of the Canadian population on health status, healthcare utilization, and health determinants.
Patient Safety Measures
Surveys of healthcare encounters and healthcare use usually contain data on medical conditions, diagnoses, and procedures, coded by ICD-9-CM or other similar coding systems. As with claims data, some patient safety indicators can be derived from the coded data. Depending on the data collected, other screening algorithms can be designed. For example, many surveys collect data on medication prescriptions, and measures of inappropriate medication prescriptions can be derived by screening medications that generally should not be prescribed to patients with advanced age or with certain medical conditions. Once a patient safety event is identified with moderate specificity and sensitivity, survey data support a wide range of patient safety research, especially on national statistics, variation across regions and social strata, and changes over time.
Other Data Sources and Data Linkage
Many other administrative data sources besides the four types discussed earlier contain information on individual events of patient safety concerns. Malpractice claims, for example, contain rich data for patient safety research. A malpractice claim is a written demand for compensation for a medical injury, alleging that an attending physician or a care provider is responsible for the injury due to missed, delayed, or wrong diagnosis or treatment. A claims file captures information on an entire litigation, from the statement of claim through depositions, interrogations, reports of internal investigations, root cause analyses, expert opinions from both sides, and medical records and analysis, to final resolution and payments. Working with malpractice insurance companies, researchers can access closed malpractice claims to study the nature, causes, and circumstances of the underlying errors and identify potential strategies to improve patient safety.
Combining multiple data sources for research has been a significant trend in recent years. The FDA’s Mini-Sentinel Project is an example. Tasked with monitoring the safety of approved medical products, the postmarket surveillance system consists of claims data from 18 private health plans covering about 100 million people, supplemented by EMR data from 18 healthcare organizations, designed to answer the FDA’s questions on postmarket safety. The claims data capture the complete records of individuals’ exposure to a specific medical product in question and limited measures of patient outcomes such as death and major, codified complications. The linked EMR data are then used to confirm diagnoses and adverse events. The data are hosted locally with individual participants to protect privacy and confidentiality and are aggregated through common data formats and analytical modules. This complicates the data analysis somewhat, but with flexible design and proper stratification, such combined data can answer a great number of patient safety questions efficiently.
Some administrative data sources that are not concerned with patient safety events can be of great value to patient safety research. Data collected from providers for statistics, membership management, or licensing purposes can be merged with patient encounter data capable of identifying patient safety events. The American Hospital Association’s Annual Survey, for example, contains hospital-specific data on approximately 6,500 hospitals and 400-plus systems, including as many as 1,000 data fields covering organizational structure, personnel, hospital facilities and services, and financial performance. By linking these data with data on personal healthcare encounters, researchers can study a variety of hospital-level factors in relation to patient safety events. The American Medical Association maintains a suite of membership data, including the Physician Masterfile that contains extensive personal and practice-related data for more than 1.4 million physicians, residents, and medical students in the United States. By
252 C. Zhan
linking this file with other data, researchers are able to examine physician-related factors in relation to patient safety events. Other types of organizations, such as nursing homes, home care agencies, hospice, and primary care practices, all maintain similar membership data, and, in theory, all can be linked to amplify patient safety research.
Population census data and geopolitical data can make similar contributions to patient safety research. Population surveys can provide denominator information such as total population and subpopulations by age, race, economic status, and other categories. The Area Resource File, compiled by the US government, contains information on health facilities, health professions, measures of resources, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. By linking this file with other patient safety data through geographic codes, researchers can explore geographic variation in patient safety events and related economic, geographic, and political factors.
Data access to many of the above data sources can be challenging, but the challenges are fewer and less restrictive compared with other data-gathering efforts. Government-owned data are usually available following straightforward processes. Data owned by private organizations can be obtained in many ways, including through collaboration with the data owners or with researchers who work closely with the data owners.
Patient Safety Research Using Administrative Data: General Framework, Methods, and Tools
Because administrative data are not collected or compiled following an a priori study design, efforts in choosing appropriate methods and in presenting the results in light of inherent limitations of various data sources are of great importance in generating valid information and knowledge on patient safety questions. This section offers a brief review of the general framework, methods, and tools for patient safety research using administrative data.
General Framework for Administrative Data-Based Patient Safety Research
Generally speaking, there are two types of research: estimation and hypothesis testing. Since patient safety research is a relatively new field, most published studies since the landmark 1999 IOM report have been about estimating prevalence and incidence of patient safety events and distributions by categories, settings, causes, and circumstances. It is well recognized that each administrative data source has an inherent population, such as Medicare beneficiaries from Medicare claims, which is further refined by exclusion and inclusion criteria defined by the patient safety screening algorithms employed. The focus for a robust estimation study is to correctly identify the numerators (i.e., patient safety events) and the denominators (i.e., the underlying population at risk for the patient safety events), a seemingly straightforward but in reality rather tenuous process.
To test hypotheses, administrative data-based patient safety research usually follows the general framework of regression analysis in epidemiology in which the occurrence of a patient safety event Y is related to possible causes being examined or interventions evaluated X and confounding factors Z. Within this framework two types of questions can be addressed. The first type of question is why a patient safety event occurs, and the second type of question is what the consequences of such an event are.
In answering both questions, the most critical task is to build an analytical dataset out of one or more administrative data sources for a specific patient safety research question. This first step involves the correct identification and measurement of X, Y, and Z in the context of study cohorts of selected study subjects and time-stamp data, matching the data sources (e.g., who is in the dataset and what X, Y, and Z can be correctly measured and time-stamped) and the research questions to be answered. The second step is relatively easier,
using established statistical models or more advanced data-mining techniques to estimate the relational parameters in the equation. The third step, interpreting the results and making valid inferences in the full acknowledgment of data limitations, also demands great attention.
Methodological Considerations
Identification of Patients with Patient Safety Events
The previous section went through the list of potential administrative data sources and potential patient safety measures these data sources may offer. It is clear that the usefulness of an administrative data source in patient safety research depends, first of all, on the ability of the data source to correctly identify patient safety events. The validity of derived patient safety measures depends on carefully designed and validated indicators, screening algorithms, or triggers. Therefore, with the exception of medical error reports and malpractice claims where each record is, by definition, a patient safety event, a robust patient safety research project starts with the most critical task of screening, determining, and ascertaining patient safety events. This is a process of science, rooted in the researchers’ understanding of the relevant medical knowledge, the data-generating process, the structure of the specific databases, and the specific purposes of the relevant research. It is also an art since there is usually no set formula for health services researchers to follow in completing this first step.
In general, the validity of an administrative data-based patient safety measure can be evaluated by specificity and sensitivity, with medical record review serving most often as the gold standard. Specificity is defined by the positive predictive value (PPV), which is the proportion of patients flagged in the administrative data as having patient safety events who actually had such events, as confirmed by medical record review or other ascertaining methods. Sensitivity is the proportion of the patients with patient safety events that are actually flagged in the administrative data. Table 3 shows the calculation.
Zhan and his colleagues (2009) demonstrated the complexity of this issue in a study that attempted to determine the validity of identifying hospital-acquired catheter-associated urinary tract infections (CAUTIs) from Medicare claims, using medical record review as the gold standard. They found that ICD-9-CM procedure codes for urinary catheterization appeared in only 1.4 % of Medicare claims for patients who had urinary catheters. As a result, screening for UTIs in Medicare claims cannot be limited to claims that have a procedure code for urinary catheterization. Using major surgery as the denominator, Medicare claims had a PPV of 30 % and sensitivity of 65 % in identifying hospital-acquired CAUTIs. Because 80 % of the secondary diagnosis codes indicating UTIs were present on admission (POA), adding POA indicators in the screening algorithm would increase the PPV to 86 % and sensitivity to 79 % in identifying hospital-acquired CAUTIs. This study indicates that the screening algorithm based on the selected ICD-9-CM codes and POA code and confined to major surgery patients is a valid way to identify patients with hospital-acquired CAUTIs in Medicare claims data. Claims from private insurance do not currently contain POA codes and, therefore, are not suitable for research aimed at estimating CAUTI prevalence or hypothesis testing because, at a PPV of 30 %, about 70 % of the flagged cases would be false positives.
Table 3 Calculation of specificity and sensitivity of a patient safety measure based on administrative data, using medical record review as the gold standard
Administrative data screening | Medical record review: with patient safety event | Medical record review: without patient safety event
With patient safety event | True positive (TP) | False positive (FP)
Without patient safety event | False negative (FN) | True negative (TN)
Validity calculation: PPV = TP/(TP + FP); Sensitivity = TP/(TP + FN)
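As a minimal sketch of the validity calculation, the snippet below computes the two measures exactly as the text defines them: PPV as the share of flagged patients who truly had the event, TP/(TP + FP), and sensitivity as the share of true events that were flagged, TP/(TP + FN). The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any of the studies cited in this chapter.

```python
def validity_measures(tp: int, fp: int, fn: int) -> dict:
    """Validity of an administrative-data screen against a gold standard.

    tp: flagged in administrative data and confirmed by record review
    fp: flagged in administrative data but not confirmed
    fn: confirmed by record review but not flagged
    """
    flagged = tp + fp        # all cases the screen identified
    with_event = tp + fn     # all cases the gold standard confirmed
    if flagged == 0 or with_event == 0:
        raise ValueError("need at least one flagged case and one confirmed event")
    return {
        "ppv": tp / flagged,                 # TP / (TP + FP)
        "sensitivity": tp / with_event,      # TP / (TP + FN)
        "false_positive_share": fp / flagged,  # 1 - PPV
    }


# Hypothetical counts (assumption): 100 flagged claims, 30 confirmed on review,
# and 16 confirmed events that the screen missed.
m = validity_measures(tp=30, fp=70, fn=16)
print(f"PPV: {m['ppv']:.1%}")                          # PPV: 30.0%
print(f"Sensitivity: {m['sensitivity']:.1%}")          # Sensitivity: 65.2%
print(f"False positives among flagged: {m['false_positive_share']:.1%}")  # 70.0%
```

With a PPV of 30 %, seven of every ten flagged cases are false positives, which is why a screen of this quality is hard to defend for prevalence estimation.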
Because medical record review is labor intensive and expensive, researchers often cannot validate the screening algorithms they use and have to rely on what has been reported in the literature. In many cases, validity data are entirely unavailable. Nonetheless, researchers need to have a clear understanding of the specificity and sensitivity of the case identification algorithms they use based on relevant literature, context analysis, or experience and decide whether the patient safety measures are valid enough for their research purposes and discuss their results in light of these limitations.
Construction of Analytical Dataset
Only with confidence that patient safety events can be identified with an acceptable level of specificity and sensitivity from an administrative data source should a researcher proceed to construct an analytical dataset. As discussed earlier, most administrative data contain measures of basic personal information, medical conditions, diagnosis, treatment, and disposition, and the administrative data can be expanded by linking to other data sources on patients (e.g., National Death Index), providers (e.g., AHA hospital surveys), local socioeconomic data (e.g., Area Resource Files), and so on (e.g., census population statistics), to form analytical files. From these extended datasets, arrays of variables of interest, such as dependent variables, explanatory variables, or confounding controls, can be constructed, including:
• Patient characteristics: age, sex, insurance coverage, etc.
• Medical conditions and diagnoses: primary diagnosis, secondary diagnoses, comorbidities, etc.
• Treatment or utilization: medical and surgical procedures, medications, outpatient visits, etc.
• Patient outcomes: disposition (including death), length of stay, charges or payments, complications, etc.
• Provider characteristics: ownership, practice size and composition, financial status, etc.
• Area characteristics: population statistics, market competitiveness, managed care market share, etc.
These variables support a wide range of cross-sectional analyses and longitudinal studies when the variables are time-stamped. Many claims databases, such as Medicare claims, allow researchers to build a complete profile of a patient’s healthcare experiences from multiple settings (e.g., inpatient, outpatient, pharmacy), over multiple years. Researchers can identify not only cases of patient safety events and controls but also cohorts to retrospectively follow over time, greatly expanding the capacity of any single administrative data source.
Besides identifying administrative databases with variables of interest, one crucial consideration in analytical data construction is the linkage of multiple data sources. The simplest kind of record linkage is through a unique identification number, such as social security number, or multiple variables that accurately identify a person, such as name, age, date of birth, gender, address, phone number, and so on. This method is called deterministic or rules-based record linkage. Sometimes, a personal identifier is combined with some personal demographic data in databases with missing data or errors in the identifier. Administrative data sources often do not contain or share common identifiers, and an alternative method called probabilistic record linkage can be used. Probabilistic record linkage takes into account a wider range of potential identifiers, computing weights for each identifier based on its estimated ability to correctly identify a match or a non-match, and uses these weights to calculate the probability that two given records refer to the same entity. Record pairs with probabilities above a certain threshold are considered to be matches, while pairs with probabilities below another threshold are considered to be non-matches; pairs that fall between these two thresholds are considered to be “possible matches” and can be dealt with accordingly (e.g., human reviewed, linked, or not linked, depending on the requirements).
Data Analysis
For most patient safety studies using administrative data, the methods are simple and straightforward; the common statistical methods for observational studies, such as logistic regressions
with the dichotomous variable of having a patient safety event or not as the dependent variable and ordinary least-squares regression with a continuous outcome as the dependent variable, apply. As with observational studies, administrative data-based patient safety research can fall into the following broad categories:
• Cross-sectional study, involving studying a population at one specific point in time
• Case-control study, in which two existing groups differing in outcome are identified and compared on the basis of some hypothesized causal attribute
• Longitudinal study, involving repeated observations of the same variables over long periods of time
• Cohort study, a particular form of longitudinal study where a group of patients is closely monitored over a span of time
However, administrative data-based patient safety research is unique in many ways. First, the number of observations is substantially larger than in studies of experimental design or those involving primary data collection. Second, because, by definition, patient safety events are unintended or unexpected, the cases of interest (i.e., patient safety events) are usually very small in numbers and rates. The standard approaches to causal inference or risk adjustment easily produce statistically significant findings that are small and clinically meaningless. Third, the cases of interest are identified with a certain level of uncertainty or misclassification errors, as discussed earlier. These particulars should be borne in mind when devising analytical approaches.
The following general methods have been used in administrative data-based patient safety research:
• Matching: matching is a conceptually straightforward strategy, whereby confounders are identified and patients in the cases (e.g., those with patient safety events) are matched to the controls (e.g., those without safety events) on the basis of these factors so that, in the end, the case group and control group are “the same” with regard to these factors. Matching can either be done on a one-to-one basis or one-to-many basis, and patients can be matched with respect to a single confounder or multiple confounders. This method is particularly applicable to administrative data-based patient safety research because patients with safety events are few and potential controls are many; therefore, it is relatively easy to find one or more matching controls for each case.
• Stratification: once a confounding variable is identified, the cohort is grouped by levels of this factor. The analysis is then performed on each subgroup within which the factor remains constant, thereby removing the confounding potential of that factor.
• Multivariable regression: regression analysis, the most commonly used analytical technique, is based on modeling the mathematical relationships between two or more variables in observed data. In the context of administrative data-based patient safety research, there are four types of outcome measures. The first type is a binary outcome, such as surgical site infections complicating total hip replacement, where multivariable logistic regression is the proper method to identify factors associated with the infections. The second type is a continuous outcome, such as functional status or costs, where multivariable linear regression is applicable to study the influence of various predictors of the outcomes. The third type is an incidence rate, such as nosocomial infection rates at individual hospitals, where Poisson regression may be the best method to identify hospital-level factors that predict higher or lower nosocomial infection rates. The fourth type is a time-to-event outcome, such as reoperation following initial operation, where the Cox proportional hazards model may be most appropriate to study risk factors.
• Propensity score analysis: propensity score analysis entails two steps. In the first step, it summarizes multiple confounding variables into a probability or “propensity” of having a patient safety event or falling into an intervention group, usually generated by a logistic
regression model, with the propensity score ranging from 0 to 1. In the second step, the propensity score is used for matching or performing stratified analysis or is inserted into multivariable regression to estimate the impact of a patient safety event or an intervention.
• Instrumental variable analysis: the instrumental variable approach is a method for confounding control that has been used by economists for decades but has only recently been implemented in health services research. The basic idea is that if a variable (the instrumental variable) can be identified that causes variation in the treatment of interest but has no impact on the outcome (other than through its direct influence on treatment), then the variable can be used as an instrument in the regression analysis to control for unobserved or unobservable confounding of the outcome variable.
• Data-mining methodologies: data mining refers to an analytic process designed to explore data (usually large amounts of data, known as “big data”) in search of consistent patterns and systematic relationships between variables and then to validate the findings by applying the detected patterns to new subsets of data. One example of data-mining methods used in administrative data-based patient safety research is called disproportionality analysis, which creates algorithms that calculate observed-to-expected ratios. For example, to find the link between a drug and a suspected adverse event, researchers can compare each potential drug-adverse event pair to the background across all other drugs and events in the database and flag those pairs with disproportional ratios for further causal investigation. Unsupervised machine learning is another example, encompassing many data-mining methods purported to discover meaningful relationships between variables in large databases.
• Contextual analysis: some administrative data sources contain extensive narrative data. Screening text data for information on patient safety events is costly and, sometimes, unproductive even with advanced NLP techniques. By cascading steps through coded data, researchers can narrow down the text data and read selected text narratives to gain valuable insights. For example, in their analysis of warfarin-related medication errors, Zhan et al. (2008) found that one hospital reported dispensing errors four times higher than average, two thirds of the errors occurred in the hospital’s pharmacy department, and 65 % of the errors were caused by inaccurate/omitted transcriptions. The textual descriptions in these reports clearly revealed the difficulties the pharmacists were having with the hospital’s new medication administration record system, therefore pinpointing the fix.
In summary, all methods for observational studies in epidemiology, sociology, and economics are applicable to administrative data-based patient safety research. Health services researchers should consult textbooks in these fields and also follow the advancement of methodologies in data mining, pattern recognition, and machine learning that are being developed and increasingly applied to extract information and knowledge from “big data” in the Information Age.
Interpreting the Results
The results from administrative data-based patient safety research must be interpreted in light of the limitations implicit both in the data and in the methods. First of all, the specificity and sensitivity of the methods or algorithms used to screen or identify patient safety events must be adequately explained, and the potential bias due to misclassification of cases needs to be discussed. Similar measurement errors may also occur in other important variables derived from administrative data, and similar discussions need to be made.
Second, administrative data-based patient safety research shares the same flaws that all observational studies have. Regardless of what methods are used, there is always the possibility that confounding remains in the results, due to a wide range of possible causes from unobserved or missed confounders, to measurement errors and mis-specifications of analytical models.
Furthermore, multiple other criteria are required to establish causation. For example, multivariable adjustment cannot establish causation unless factors such as appropriate temporal ordering of predictors and outcome are ensured. Finally, health services researchers must completely report how the analyses were undertaken. From choice of confounders to the statistical procedure used, adequate information should be provided so that an independent analyst can reliably reproduce the reported results.
AHRQ Patient Safety Indicators: An Exemplary Tool for Administrative Data-Based Patient Safety Research
The AHRQ patient safety indicators (AHRQ PSIs) are one of the most popular measurement tools for screening patient safety events in administrative data (AHRQ 2014). Developed in the United States in the context of claims data using the ICD-9-CM coding system, this toolkit has been adopted worldwide. A case study of AHRQ PSIs serves to illustrate the general process, the potentials, the challenges, and the limitations of administrative data-based patient safety research.
AHRQ PSIs started with Iezzoni and colleagues’ 1994 complication screening program (CSP) that relied on ICD-9-CM codes in claims data to identify 27 potentially preventable inhospital complications, such as postoperative pneumonia, hemorrhage, medication incidents, and wound infection. In the mid-1990s, AHRQ broadened the CSP to include a set of administrative data-based quality indicators, including several measures of avoidable adverse events and complications. Realizing the potential value of administrative data-based measures in identifying patient safety events, AHRQ contracted with the Evidence-based Practice Center at the University of California, San Francisco, and Stanford University to further expand, test, and refine these measures as well as improve the evidence behind their use with extensive literature reviews and broad clinical consensus panels. The research team developed AHRQ PSIs through a five-step process (Romano et al. 2003). First, they reviewed literature to develop a list of candidate indicators and collected information about their performance. Second, they formed several panels of clinician experts to solicit their judgment of clinical sensibility and their suggestions for revisions to the candidate indicators. Third, they consulted ICD-9-CM coding experts to ensure that the definition of each indicator reflects the intended clinical situation. Fourth, they conducted empirical analysis of the promising indicators using HCUP data. Last, they produced the software and documentation for public release by AHRQ.
Since their inception, AHRQ PSIs have been constantly validated and updated. The latest PSIs (AHRQ 2014) include 23 indicators and one composite indicator with reasonable face and construct validity, specificity, and potential for fostering quality improvement. Most indicators, listed below, are expressed as rates per 1,000 discharges; some are designed to capture event rates within a community:
PSI 02 Death Rate in Low-Mortality Diagnosis Related Groups (DRGs)
PSI 03 Pressure Ulcer Rate
PSI 04 Death Rate among Surgical Inpatients with Serious Treatable Conditions
PSI 05 Retained Surgical Item or Unretrieved Device Fragment Count
PSI 06 Iatrogenic Pneumothorax Rate
PSI 07 Central Venous Catheter-Related Blood Stream Infection Rate
PSI 08 Postoperative Hip Fracture Rate
PSI 09 Perioperative Hemorrhage or Hematoma Rate
PSI 10 Postoperative Physiologic and Metabolic Derangement Rate
PSI 11 Postoperative Respiratory Failure Rate
PSI 12 Perioperative Pulmonary Embolism or Deep Vein Thrombosis Rate
PSI 13 Postoperative Sepsis Rate
PSI 14 Postoperative Wound Dehiscence Rate
PSI 15 Accidental Puncture or Laceration Rate
PSI 16 Transfusion Reaction Count
PSI 19 Obstetric Trauma Rate-Vaginal Delivery Without Instrument
PSI 21 Retained Surgical Item or Unretrieved Device Fragment Rate
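For the rate-type indicators, which use discharges as the denominator and are reported per 1,000, the observed (raw) rate is simple arithmetic. The sketch below is illustrative only: the function name and the counts are assumptions, and it does not reproduce the risk adjustment or smoothing performed by the AHRQ software. Indicators labeled "Count" in the list are reported as event counts and need no such scaling.

```python
def rate_per_1000(events: int, discharges_at_risk: int) -> float:
    """Observed indicator rate per 1,000 eligible discharges.

    events: discharges meeting the numerator definition
    discharges_at_risk: discharges meeting the denominator definition
    """
    if discharges_at_risk <= 0:
        raise ValueError("denominator must be positive")
    if not 0 <= events <= discharges_at_risk:
        raise ValueError("numerator must lie between 0 and the denominator")
    return 1000.0 * events / discharges_at_risk


# Hypothetical hospital (assumption): 12 flagged events among 4,800 eligible discharges.
print(rate_per_1000(12, 4800))  # 2.5
```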
Table 4 describes, as an example, the definition of the numerator, denominator, and key exclusions for PSI #13, postoperative sepsis.
AHRQ created software that implements evidence-based and consensus-approved algorithms; calculates raw rates, risk-adjusted rates that reflect the US hospitalized population in age, sex, DRGs, and comorbidities; and estimates smoothed rates that dampen random fluctuations over time. Thirty comorbidity categories are automatically generated by the software and used as risk adjusters along with variables available in
Table 4 Claims-based screening for patient safety events: AHRQ PSI #13, postoperative sepsis
Description Screening algorithm
Numerator Discharges, among cases meeting the inclusion and exclusion rules for the denominator, with any
secondary ICD-9-CM diagnosis codes for sepsis. ICD-9-CM sepsis diagnosis code 1
0380 STREPTOCOCCAL SEPTICEMIA
0381 STAPHYLOCOCCAL SEPTICEMIA
03810 STAPHYLOCOCC SEPTICEM NOS
03811 METH SUSC STAPH AUR SEPT
03812 MRSA SEPTICEMIA
03819 STAPHYLOCC SEPTICEM NEC
0382 PNEUMOCOCCAL SEPTICEMIA
0383 ANAEROBIC SEPTICEMIA
78552 SEPTIC SHOCK
78559 SHOCK W/O TRAUMA NEC
9980 POSTOPERATIVE SHOCK
99800 POSTOPERATIVE SHOCK, NOS
99802 POSTOP SHOCK,SEPTIC
03840 GRAM-NEGATIVE SEPTICEMIA NOS
03841 H. INFLUENAE SEPTICEMIA
03842 E COLI SEPTICEMIA
03843 PSEUDOMONAS SEPTICEMIA
03844 SERRATIA SEPTICEMIA
03849 GRAM-NEG SEPTICEMIA NEC
0388 SEPTICEMIA NEC
0389 SEPTICEMIA NOS
99591 SEPSIS
99592 SEVERE SEPSIS
Denominator Elective surgical discharges, for patients ages 18 years and older, with any-listed ICD-9-CM procedure
codes for an operating room procedure. Elective surgical discharges are defined by specific DRG or
MS-DRG codes with admission type recorded as elective (SID ATYPE=3)
Exclude cases:
With a principal ICD-9-CM diagnosis code (or secondary diagnosis present on admission) for sepsis
(see above)
With a principal ICD-9-CM diagnosis code (or secondary diagnosis present on admission) for
infection
With any-listed ICD-9-CM diagnosis codes or any-listed ICD-9-CM procedure codes for
immunocompromised state
With any-listed ICD-9-CM diagnosis codes for cancer
With length of stay of less than 4 days
MDC 14 (pregnancy, childbirth, and puerperium)
With missing gender (SEX=missing), age (AGE=missing), quarter (DQTR=missing), year
(YEAR=missing), or principal diagnosis (DX1=missing)
most administrative data systems. The PSI website also provides software (in Windows and SAS), benchmark tables, and risk-adjustment data for individual hospitals, hospital systems, health plans, states, and other interested parties to calculate their own risk-adjusted rates and make comparisons to national benchmarks. Researchers can download the documentation and software for free (AHRQ 2014).
The specificity and sensitivity of these indicators have been evaluated, accounting for a substantial portion of published literature on AHRQ PSIs. It appears that the validity of AHRQ PSIs varies substantially from indicator to indicator, depending also on the data sources and gold standards used.
Broadly speaking, AHRQ PSIs have been used for:
• Internal hospital quality improvement: individual hospitals use them as a case finding trigger, to do root cause analyses, to identify clusters of potential safety lapses, to evaluate impact of local interventions, and to monitor performance over time.
• External hospital accountability to the community: local government, health systems, and insurance carriers such as Blue Cross/Blue Shield of Illinois produce hospital profiles to support consumers.
• National, state, and regional analyses: government and researchers have used them to produce aggregate statistics (e.g., for AHRQ’s National Healthcare Quality/Disparities Reports), for surveillance of trends over time, and for assessing disparities across areas, socioeconomic strata, ethnicities, and so on.
• Testing research hypotheses related to patient safety: researchers have used the PSIs to test various hypotheses on patient safety risk factors, such as those that support house staff work hours reform and nurse staffing regulation.
• Public reporting by hospital: several states (e.g., Texas, New York, Colorado, Oregon, Massachusetts, Wisconsin, Florida, and Utah) include AHRQ PSI measures in their public reporting of hospital quality.
• Pay-for-performance by hospital: some reform initiatives, such as CMS/Premier Demonstration, include AHRQ PSI measures in pay-for-performance determination.
AHRQ PSIs continue to evolve. Besides periodical refinements, one development hinges on the addition of time stamps on diagnosis codes (i.e., present-on-admission code) in claims or discharge abstracts. This code helps to separate hospital-acquired adverse events (i.e., events that occurred after admission) from comorbidities (i.e., conditions present on admission). Another development is to include basic clinical data such as lab data, to improve risk adjustments, recognizing that such data exist alongside administrative data in many healthcare systems. The third direction is the conversion of ICD-9-CM based AHRQ PSIs to ICD-10, which most European countries use, with country-specific modifications (e.g., ICD-10-AM for Australian modification and ICD-10-GM for German modification).
These improvements, combined with advancements in administrative databases and computing technologies, will make AHRQ PSIs more useful in patient safety research in the future.
Patient Safety Research Using Administrative Data: Potentials and Limitations
Administrative data-based patient safety research started with a very simple expectation: to flag the infrequent cases with potential patient safety concerns in the large volume of claims in order to guide further, in-depth investigation. As administrative data sources became more available and screening algorithms improved, researchers began to produce a variety of estimates and statistics and test various hypotheses related to patient safety. More recently, attempts are being made to create safety performance reports from administrative data for individual providers or healthcare systems, study variations across regions, and track progress over time. Previous sections have touched on many examples of such work. This section offers a more
260 C. Zhan
detailed review of the types of patient safety studies, with examples, that administrative data can support and their limitations.

Screen Patient Safety Events for In-depth Examination

First and foremost, AHRQ PSIs, the global trigger tools, and most screening algorithms, are considered indicators, not definitive measures, of patient safety concerns. These indicators are proposed to screen claims data for adverse events and to guide subsequent medical record reviews to determine whether safety concerns exist. AHRQ PSIs, for example, enable institutions to quickly and easily identify a manageable number of medical records for closer scrutiny. Ackroyd-Stolarz et al. (2014) developed an algorithm to screen the discharge abstract database of a Nova Scotia hospital for fall-related injuries. They compared cases identified in administrative data against cases identified in structured medical record review, finding that administrative data could identify fall-related injuries with sensitivity of 96 % and specificity of 91 %. Their work provided the hospital with a powerful tool to locate records for patients with fall-related injuries, explore causes, and search for solutions to the problem.

Screening cases of patient safety concerns is especially advantageous when the targeted events are rare. For example, it is not likely that one hospital provides enough data to study patterns, causes, or circumstances of foreign objects left in patients during surgery, because the events occur in fewer than 1 in 10,000 surgeries (Zhan and Miller 2003). Screening claims with AHRQ PSIs could quickly identify such rare events, and associated medical records could be obtained and abstracted for in-depth analysis. This two-step approach is particularly useful for individual providers or health systems in their search for localized safety lapses and improvement strategies.

Epidemiological Study

A large proportion of administrative data-based patient safety research is aimed at discovering the epidemiology of patient safety events, categorizing the events, assessing the prevalence, and understanding the causes and impacts, following the general framework and methodologies discussed earlier.

Prevalence of Patient Safety Events

Because administrative data cover large populations, they are often the only available data sources to estimate national or state rates of patient safety events. The National Healthcare Quality Reports (AHRQ 2013), released annually, include, for example, the rate of postoperative sepsis based on nationwide inpatient claims and the rates of ambulatory care visits due to adverse events based on the National Ambulatory Medical Care Survey and the National Hospital Ambulatory Medical Care Survey. The Medicare Current Beneficiary Survey, the Medical Expenditure Panel Survey, the National Ambulatory Medical Care Survey, and the National Hospital Ambulatory Medical Care Survey have been used to examine the prevalence of inappropriate use of medications in the United States (e.g., Zhan et al. 2001).

Similar studies on the prevalence of patient safety events are numerous in medical literature, covering all settings of care and types of problems. A more recent example is a study conducted by Owens et al. (2014). By examining claims of hospitalizations and ambulatory surgical visits for infections following ambulatory surgery, the authors were able to estimate the incidence of surgical site infections after ambulatory surgery procedures, highlighting safety concerns in the fast-growing outpatient surgery centers in the United States.

Causes of Patient Safety Events

Many administrative data-based studies address the causes and circumstances of patient safety events. Gandhi et al. (2006) set out to determine how missed and delayed diagnoses in the outpatient setting led to patient injuries. For their purpose, the authors chose closed malpractice claims from four malpractice insurance companies. They selected 181 claims where patients sued doctors for injuries stemming from
diagnosis errors and had a team of doctors review the closed documents, including statement of claims, depositions, interrogatories, reports of internal investigations, root cause analyses, expert opinions on both sides of the litigations, medical records, and other documents in the closed file to determine what kind of errors happened and what the possible causes were. They found that failure to order appropriate diagnostic tests, failure to create a proper follow-up plan, and failure to obtain adequate history or perform adequate physical examination (55 %, 45 %, and 42 %, respectively) were the leading types of diagnosis errors that resulted in the malpractice cases.

Zhan et al. (2008) examined warfarin-related medication errors voluntarily reported to the MEDMARX database. By tabulating and cross-tabulating coded variables in a cascading way and screening open-ended narratives in selected reports, the authors were able to construct a comprehensive understanding of errors in warfarin prescriptions and administration in hospitals and clinics. They found that, in outpatient settings, 50 % of errors in warfarin medication occurred in pharmacies and 50 % were intercepted by a pharmacist, indicating the critical role of pharmacists in helping patients with warfarin use.

Impact of Patient Safety Events

Once patient safety events are identified with an acceptable level of validity in administrative data, it is relatively easy to examine the impacts of the events on various patient and social outcomes identifiable in the data. Using AHRQ PSIs, Zhan and Miller (2003) screened nationwide hospital claims and estimated the impacts of the selected patient safety events on length of stay, charges, and mortality. The authors found that postoperative sepsis, for example, extended hospital stay by about 11 days, added $58,000 in extra charges to the patients’ hospital bills, and increased the in-hospital mortality rate by 22 %. In another study, Zhan et al. (2006) showed that when a case of postoperative sepsis occurred, Medicare actually paid $9,000 extra. Taking the two studies together, it is easy to see that, once postoperative sepsis occurs, a hospital loses financially, establishing a case for collaboration among hospitals, payers, and patients or patient advocates to reduce postoperative sepsis. This type of study is common in health services research literature.

Interventions and Policies to Improve Safety

Administrative data have been used to evaluate many system-wide interventions aimed at improving patient safety. Many studies have been conducted in the United States, Canada, and the United Kingdom, for example, to evaluate how various levels of nurse staffing, different staffing models, and nursing hours affect patient safety, by linking safety estimates from hospital claims or abstracts to nurse staffing data from hospital surveys. Rafferty et al. (2007) did such a study using data from 30 English hospital trusts. They used data from three sources: hospital structure (e.g., size and teaching status) from hospital administrative databases; patient outcomes, specifically, patient mortality and failure to rescue, from hospital discharge abstracts; and data on nursing staffing and nurse job satisfaction from surveys of the participating hospitals. Their finding that higher patient-to-nurse ratios were associated with worse patient outcomes could help hospitals plan their nurse staffing.

A study by Dimick et al. (2013) is an example of how administrative data can be useful to evaluate national health policies. Starting in 2006, CMS has restricted coverage of bariatric surgery to hospitals designated as centers of excellence by two major professional organizations. The authors wanted to explore whether this coverage policy change improved patient safety as intended. It would be difficult to design a study based on primary data collection or data sources other than nationwide administrative data to evaluate this policy. Using claims from 12 states covering 2004–2009, Dimick et al. (2013) were able to estimate risk-adjusted rates of complications and reoperations of bariatric surgery before versus after the implementation of the national policy restricting coverage, finding that the policy has had no impact with regard to patient safety.
Table 5 Advantages and disadvantages of administrative data for patient safety research

Advantages of administrative data
• Already collected for administrative purposes and therefore no additional costs of collection (besides data acquisition and cleaning costs)
• Large coverage of population of interest allowing estimation and comparison at regional and national levels
• Collection process not intrusive to target population
• Regularly, continuously updated
• Mostly computerized
• Can be linked to form individual patient’s complete healthcare experiences
• Malpractice claims and spontaneous reports contain rich contextual data not available elsewhere

Disadvantages of administrative data
• Information collected is restricted to data required for administrative purposes
• Collection process does not follow any research design, protocol, or procedure; lack of researcher control over content
• Algorithms, triggers, or indicators with variable validity, subject to coding errors and coding variation across institutions
• Claims, abstracts, and surveys lack contextual, clinical information, while malpractice claims and spontaneous reports lack data on denominator or population at risk
• Results often statistically significant but clinically meaningless
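
Indicator validity, listed above among the disadvantages, is typically quantified against a gold standard such as medical record review. The following is a minimal Python sketch of that calculation. The function name `screening_validity` and the ten example records are hypothetical and are not drawn from any study cited in this chapter.

```python
from typing import Dict, List, Optional


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def screening_validity(
    flagged: List[bool], confirmed: List[bool]
) -> Dict[str, Optional[float]]:
    """Compare an administrative-data flag with a gold-standard review.

    flagged[i]   -- True if the screening algorithm flagged record i
    confirmed[i] -- True if chart review confirmed an event in record i
    """
    if len(flagged) != len(confirmed):
        raise ValueError("flagged and confirmed must have the same length")

    pairs = list(zip(flagged, confirmed))
    tp = sum(1 for f, c in pairs if f and c)          # flagged, event confirmed
    fp = sum(1 for f, c in pairs if f and not c)      # flagged, no event
    fn = sum(1 for f, c in pairs if not f and c)      # missed event
    tn = sum(1 for f, c in pairs if not f and not c)  # correctly not flagged

    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true events that were flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-events left unflagged
        "ppv": _ratio(tp, tp + fp),          # share of flags that were true events
    }


# Hypothetical example: ten discharge records (illustrative values only).
flagged = [True, True, True, False, True, False, False, False, False, False]
confirmed = [True, True, True, True, False, False, False, False, False, False]

result = screening_validity(flagged, confirmed)
print(result)
```

Positive predictive value is included alongside sensitivity and specificity because, for rare events, it determines how many flagged records a reviewer must read to find one true event.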
cohorts that mimic a prospective study design and test a wide range of hypotheses from risk factors to potential interventions.

The greatest limitation lies with the fact that the data were not collected with a research purpose, study protocol, or quality assurance procedure. Researchers have to creatively repurpose the data to meet their research needs and make great efforts in methodology design to minimize potential biases.

As discussed earlier, the most critical task of administrative data-based patient safety research is to design valid patient safety screening algorithms or indicators. Most of the indicators developed to date have relied on coded data in the administrative databases. Using ICD-9-CM codes as examples, many concerns exist. First, researchers can only find events for which there are corresponding ICD-9-CM codes. Second, there may be a substantial number of coding errors, due to misunderstanding of codes, or errors by physicians and coders, or miscommunications between them. Third, coding is very likely to be incomplete because of limited slots for coding secondary diagnoses and other reasons. Fourth, assignment of ICD-9-CM codes is variable because of the absence of precise clinical definitions and context. Last but not least, diagnoses are not dated in most administrative data systems, making it difficult to determine whether a secondary diagnosis occurs prior to admission (i.e., a comorbid disease) or during a hospitalization (i.e., a complication or medical error).

Administrative data have been repeatedly shown to have low sensitivity but fair specificity in identifying patient safety events. Focusing on specific adverse events for specific patient populations, as is built into the AHRQ PSIs, improves specificity appreciably. But, in most cases, researchers have to work with indicators that have modest validity in their research.

Lack of clinical details is another major limitation of most administrative data such as claims and discharge abstracts. Of special concern is the severity of illness that affects patient outcomes and conceivably affects the likelihood of patient safety events. Analyses of outcomes and risk factors associated with patient safety events are limited to variables available from administrative data. On the other hand, malpractice claims and spontaneous medical error reports contain extensive details on specific events, but the denominator populations (i.e., patients at risk for those reported events) are unknown, severely limiting the data’s ability to support estimation and hypothesis testing research.

There are also many analytical challenges. The sheer size of administrative data can give the illusion of great precision and power. Often the differences found are statistically significant but of little clinical meaning. Coupled with missing important confounding variables and difficulty in choosing correct statistical models that fit the data, clinically insignificant but statistically significant results could lead to biased inferences and erroneous conclusions. Health services researchers must bear in mind these limitations when designing their administrative data-based patient safety studies and must interpret the results with full acknowledgment of these limitations.

References

Ackroyd-Stolarz S, Bowles SK, Giffin L. Validating administrative data for the detection of adverse events in older hospitalized patients. Drug Healthc Patient Saf. 2014;13(6):101–8.
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare quality report. http://www.ahrq.gov/research/findings/nhqrdr/nhqr13/2013nhqr.pdf. Accessed 1 Sept 2014.
Agency for Healthcare Research and Quality (AHRQ). Patient safety indicators. http://www.qualityindicators.ahrq.gov/Modules/psi_resources.aspx. Accessed 1 Sept 2014.
Budnitz DS, Lovegrove MC, Shehab N, et al. Emergency hospitalizations for adverse drug events in older Americans. N Engl J Med. 2011;365(21):2002–12.
Centers for Medicare and Medicaid Services (CMS). CMS data navigator. http://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html. Accessed 1 Sept 2014.
Dimick JB, Nicholas LH, Ryan AM, et al. Bariatric surgery complications before vs after implementation of a national policy restricting coverage to centers of excellence. JAMA. 2013;309(8):792–9.
Gandhi T, Kachalia A, Thomas E, et al. Missed and delayed diagnoses in the ambulatory setting: a study of closed malpractice claims. Ann Intern Med. 2006;145:488–96.
Griffin FA, Resar RK. IHI global trigger tool for measuring adverse events (Second Edition). IHI innovation series white paper. Cambridge, MA: Institute for Healthcare Improvement; 2009.
Healthcare Cost and Utilization Project (HCUP). http://www.hcup-us.ahrq.gov/. Accessed 1 Sept 2014.
Iezzoni LI, Daley J, Heeren T, et al. Using administrative data to screen hospitals for high complication rates. Inquiry. 1994;31(1):40–55.
Kohn LT, Corrigan JM, Donaldson M, et al. To err is human: building a safer health system. Washington, DC: Institute of Medicine; 1999.
Owens PL, Barrett ML, Raetzman S, et al. Surgical site infections following ambulatory surgery procedures. JAMA. 2014;311(7):709–16.
Quan H, Drösler S, Sundararajan V, et al. Adaptation of AHRQ patient safety indicators for use in ICD-10 administrative data by an international consortium. In: Henriksen K, Battles JB, Keyes MA, et al., editors. Advances in patient safety: new directions and alternative approaches. Rockville: Agency for Healthcare Research and Quality; 2008.
Rafferty AM, Clarke SP, Coles J, et al. Outcomes of variation in hospital nurse staffing in English hospitals: cross-sectional analysis of survey data and discharge records. Int J Nurs Stud. 2007;44(2):175–82.
Romano PS, Geppert J, Davies S, et al. A national profile of patient safety in US hospitals based on administrative data. Health Aff. 2003;22(2):154–66.
Wang X, Hripcsak G, Markatou M, et al. Active computerized pharmacovigilance using natural language processing, statistics and electronic health records: a feasibility study. JAMIA. 2009;16:328–37.
Zhan C, Miller M. Excess length of stay, costs, and mortality attributable to medical injuries during hospitalization: an administrative data-based analysis. JAMA. 2003;290(14):1868–74.
Zhan C, Sangl J, Bierman A, et al. Inappropriate medication use in the community-dwelling elderly: findings from 1996 Medical Expenditure Panel Survey. JAMA. 2001;286(22):2823–9.
Zhan C, Friedman B, Mosso A, et al. Medicare payment for selected adverse events under the prospective payment system: building the business cases for investing in patient safety improvement. Health Aff. 2006;25(5):1386–93.
Zhan C, Smith SR, Keyes MA, et al. How useful are voluntary medication error reports? The case of warfarin-related medication errors. Joint Comm J Qual Patient Saf. 2008;34(1):36–44.
Zhan C, Elixhauser A, Richards C, et al. Identification of hospital-acquired catheter-associated urinary tract infections from Medicare claims: sensitivity and positive predictive value. Med Care. 2009;47(3):364–9.
12 Health Services Information: Personal Health Records as a Tool for Engaging Patients and Families

John Halamka

Contents
Introduction
A Short History of Personal Health Records
Policies
Products in the Marketplace
The Regulatory Environment: ARRA/HITECH, the HIPAA Omnibus Rule, and FDASIA
Myths
Digital Divide
Data Standards
The Role of Personal Medical Devices
Research: OpenNotes, ICU Harm Reduction, Care Plans, and Clinical Trials
Conclusion
References

MyChart enables the patient to review their diagnoses, active medications, allergies, health maintenance schedules, immunizations, test results, radiology results, appointments, and demographics. In many cases, relevant health educational resources are automatically linked to key terms or phrases in the patient’s medical record, such as a diagnosis of diabetes. In addition, patients can communicate with the physician office to request an appointment, request a prescription renewal, update demographic information, update immunization status, or update a health maintenance procedure. The patient can also request advice from an advice nurse or from their own physicians.

Based on the success of these early adopters, many electronic health record companies began offering patient access to electronic records in the late 2000s. As is discussed below, the Federal HITECH Meaningful Use program now requires that patients be able to view, download, and transmit their medical records, accelerating market deployment of personal health record functionality.

Policies

As personal health record technology was deployed, many novel policy questions arose. What information should be shared and when? Who should have access? Should parents have access to the records of their adolescent children? Over time, many best practices have evolved which have answered these questions.

Although the Health Insurance Portability and Accountability Act of 1996 (HIPAA) mandated that patients have access to their medical records, it did not require the release of data electronically. The HIPAA Omnibus Rule of 2013 does require electronic access, but it does not specify how quickly releases should occur. Should a cancer diagnosis be revealed to a patient in real time on a website or wait for a personal conversation with a physician?

At BIDMC, the majority of the record is shared with the patient immediately with minor exceptions, since it is the patient’s data.

A small number of reports are delayed to enable a discussion between provider and patient to occur first. The Commonwealth of Massachusetts has specific regulatory restrictions on the delivery of HIV test results, so they are not shown on PatientSite. The tests and their delays are summarized below:

CT scans (used to stage cancer): 4 days
PET scans (used to stage cancer): 4 days
Cytology results (used to diagnose cancer): 2 weeks
Pathology reports (used to diagnose cancer): 2 weeks

HIV diagnostic tests: never shown

• Bone marrow transplant screen, including:
  HIV-1 and HIV-2 antibody
  HTLV-I and HTLV-II antibody
  Nucleic acid amplification to HIV-I (NHIV)
• HIV-1 DNA PCR, qualitative
• HIV-2 and Western blot. Includes these results:
  HIV-2 AB and EIA
  HIV-2 and Western blot
• HIV-1 antibody confirmation. Includes these results:
  Western blot
  Anti-P24
  Anti-GP41
  Anti-GP120/160

We want the patient to own and be the steward of their own data, but we also want to support the patient/provider relationship and believe that bad news is best communicated in person. Over time, it is likely that even these delays and restrictions will be removed, making all data instantly available to the patient. When the wife of the author of this chapter was diagnosed with breast cancer in 2011, she wanted to see her pathology results immediately, even if they were bad news. In the future, the patient and provider may agree on data-sharing preferences as part of establishing a primary care relationship.

Other issues that arose during early experiences with personal health records included the access granted to adolescents and their parents.
As more and more practices and hospitals are making patient portals available to their patients, providers of adolescent patients are encountering a major hurdle: how to handle confidential adolescent information.

While adult patients generally maintain full personal control of their personal health record (PHR), adolescent PHRs are anything but personal. Adolescents rarely have full control of their record, but instead rely on parents and guardians to share control. The details around this shared access change over time, depending on developmental and age-appropriate considerations, as well as guardianship arrangements.

The biggest challenge then becomes how to protect the adolescent’s legal right to privacy and confidentiality within this hybrid/proxy-control model. Many medical encounters with adolescents come with the verbal assurance that what they tell us will (under most circumstances) remain entirely confidential, meaning we will not discuss personal health information pertaining to reproductive health, sexually transmitted diseases, substance abuse, and mental health with their parents or anyone else without their consent. As it turns out, this type of confidential information is pervasive throughout most EHRs.

Children’s Hospital Boston spent a lot of time thinking about this issue and about adolescent access to their patient portal and ultimately developed a custom-built solution to meet their own and their patients’ needs.

Their approach is built around differential access to the patient portal with the goal of mirroring current clinical practice and works as follows:

Access to the patient portal: Separate, linked accounts are created for the patient and parent(s). The parent has sole access to the patient’s portal until the patient turns 13, at which point both the parent and the patient can have access. They chose 13 years as the cutoff based on a number of factors, including developmental maturity and other precedents at their institution based on their policies. At 18 years, the patient becomes the sole owner of the portal account, and Children’s deactivates the parent’s link (unless they receive court documents stating that the parent remains the medical guardian).

Health information contained in the patient portal: Children’s has identified and tagged certain information from their EHR that they consider sensitive, such as labs related to pregnancy, sexually transmitted illnesses, genetic results, select confidential appointments, and potentially sensitive problems and medications. This information is currently filtered from both parent and adolescent accounts, but in the near future, the sensitive information will flow to the adolescent account, but not to the parent account. So, even if a patient is less than 13 years, the parent would not have access to this information.

This solution does take a lot of time and effort, but best replicates the current clinical practice. Many current PHR applications in the marketplace do not allow for this type of differential access and only enable full proxy access.

Alternative solutions include the following:

1. Shared access for patient and parent, but filtering of sensitive information. One could then choose the age at which patients would gain access without worrying about the parent seeing sensitive information at any age. This makes the age at which the patient obtains access, whether it is 10 or 13 years, less important. Unfortunately, this option restricts adolescent access to confidential information and creates a fragmented and incomplete record.
2. Adolescent access only. This is trickier, because choosing the appropriate age when parental access is discontinued is difficult and may vary depending on patient characteristics. Many practices choose 12 or 13 years. However, if sensitive information is not being filtered, there may be an occasional 11-year-old with a sexually transmitted infection. Also, some parents object to being cut off from their child’s medical information, and many play an important role in supporting their adolescent children and guiding them through healthcare decisions.
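
The differential-access rules described above for Children’s Hospital Boston can be expressed as a short decision function. The sketch below is illustrative only: the function name `can_view`, the role labels, and the parameter names are hypothetical. It models the planned behavior in which sensitive items flow to the adolescent account but not the parent account. Hiding sensitive items from a parent who retains guardianship after age 18 is an assumption, since the text does not address that case.

```python
def can_view(
    role: str,
    patient_age: int,
    item_sensitive: bool,
    guardianship_on_file: bool = False,
) -> bool:
    """Decide whether a portal account may see one item in the patient's record.

    role                 -- "patient" or "parent" (hypothetical labels)
    patient_age          -- patient's age in years
    item_sensitive       -- True if the item is tagged sensitive in the EHR
    guardianship_on_file -- True if court documents state that the parent
                            remains the medical guardian after age 18
    """
    if role == "patient":
        # The patient's own account opens at 13; under the planned behavior,
        # sensitive items flow to this account.
        return patient_age >= 13

    if role == "parent":
        if item_sensitive:
            # Sensitive items never flow to the parent account, even when the
            # patient is under 13. (Applying this to court-appointed guardians
            # of adult patients is an assumption.)
            return False
        if patient_age < 18:
            # Sole parental access before 13, shared access from 13 to 17.
            return True
        # At 18 the parent link is deactivated unless guardianship is documented.
        return guardianship_on_file

    raise ValueError(f"unknown role: {role!r}")


# A 15-year-old sees a sensitive lab result; the parent does not.
print(can_view("patient", 15, item_sensitive=True))
print(can_view("parent", 15, item_sensitive=True))
```

Expressing the age cutoffs and the sensitivity filter as separate checks mirrors the two components of the approach in the text, access to the portal and the information contained in it, so either can be changed without touching the other.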
The issues and solutions involved with adolescent PHRs are certainly complex and will continue to evolve over time. However, I am hopeful that PHRs will start incorporating the unique needs of the adolescent population in the near future, allowing both parents and adolescents to share responsibility and engage in their healthcare.

Products in the Marketplace

Over the nearly two decades that personal health records have been deployed, there have been four basic models.

Provider-hosted patient portal to the electronic health record: In this model, patients have access to provider record data from hospitals and clinics via a secure web portal connected to existing clinical information systems. Examples of this approach include the PatientSite and MyChart applications described above. The funding for provider-based PHRs is generally from the marketing department since PHRs are a powerful way to recruit and retain patients. Also, the Healthcare Quality Department may fund them to enhance patient safety since PHRs can support medication reconciliation workflows. Kaiser’s implementation does not distinguish between the personal health record and electronic health record. Instead they call it a patient-/provider-shared electronic health record.

Payer-hosted patient portal to the payer claims database: In this model, patients have access to administrative claims data such as discharge diagnoses, reimbursed medications, and lab tests ordered. Few payer-hosted systems contain actual lab data, but many payers are now working with labs to obtain this data. Additionally, payers are working together to enable the transport of electronic claims data between payers when patients move between plans, enhancing continuity of care. The funding for payer-based PHRs is based on reducing total claims to the payer through enrollment of patients in disease management programs and enhancing coordination of care. Many Blue Cross affiliates have made such sites available.

Employer sponsored: In this model, employees can access their claims data and benefit information via a portal hosted by an independent outsourcing partner. The funding for employer-based personal health records is based on reducing total healthcare costs to the employer through wellness and coordination of care. A healthy employee is a more productive employee. Keas is an example of an employer-sponsored application for employee health engagement.

Vendor hosted: Several vendors serve as a secure container for patients to retrieve, store, and manipulate their own health records. Microsoft’s HealthVault includes uploading and storage of records as well as a health search engine. Google offered such services from 2007 to 2012, but discontinued the service because of lack of adoption. Humetrix is an example of a consumer-centered technology vendor, focused on mobile apps and healthcare information exchange. The business model for these PHRs is generally based on attracting more users to advertising-based websites, although the PHR itself may be advertising free. Vendor-hosted PHRs include HITECH-mandated privacy protections, and their vendors must sign business associate agreements and agree to keep data private.

Here is the press release from Beth Israel Deaconess, describing the availability of HealthVault to its patients, which illustrates the value proposition communicated to the patients:

BOSTON: Beth Israel Deaconess Medical Center (BIDMC) is expanding options for users of its secure PatientSite portal by joining forces with Microsoft HealthVault to offer a new way to safely exchange medical records and other health data.

The affiliation follows an earlier commitment to offer a similar service through Google Health.

“We believe that patients should be the stewards of their own data,” says John Halamka, MD,
270 J. Halamka
BIDMC’s chief information officer. BIDMC’s PatientSite is wonderful if all care is delivered at BIDMC. However, many patients have primary care doctors, specialists, labs, pharmacies, and nontraditional providers at multiple institutions.

“Our vision is that BIDMC patients will be able to electronically upload their diagnosis lists, medication lists and allergy lists into a HealthVault account and share that information with health care providers who currently don’t have access to PatientSite.”

PatientSite, which currently has more than 40,000 patient users and 1,000 clinicians, enables patients to access their medical records online, securely email their doctors, make appointments, renew medications, and request referrals.

HealthVault is designed to put people in control of their health data. It helps them collect, store, and share health information with family members and participating health care providers, and it provides people with a choice of third-party applications and devices to help them manage things such as fitness, diet, and health.

HealthVault also provides a privacy- and security-enhanced foundation on which a broad ecosystem of providers – from medical providers and health and wellness device manufacturers to health associations – can build innovative new health and wellness solutions to help put people in increased control of their and their family’s health.

“The end result will be when patients leave the BIDMC area or see a provider outside the area they can have all their medical data located in one safe place,” adds Halamka.

The Regulatory Environment: ARRA/HITECH, the HIPAA Omnibus Rule, and FDASIA

The American Recovery and Reinvestment Act (ARRA) of 2009 included the HITECH provisions which launched the national Meaningful Use program. Meaningful Use includes certification for products, ensuring they are good enough, and attestation for clinicians that they are using the technology wisely.

In stage 1 of Meaningful Use, vendor software was certified to provide basic health information access to patients. Providers were optionally able to attest to use of personal health records as part of meeting criteria for stimulus payment. In stage 2 of Meaningful Use, use of personal health record technology became a mandatory part of attestation. The three provider requirements related to PHRs include:

• Providers must offer online access to health information to more than 50 % of their patients with more than 5 % of patients actually accessing their information.
• More than 5 % of patients must send secure messages to their provider.
• Providers must use the EHR to identify and provide educational resources to more than 10 % of patients.

Although some institutions have offered personal health records for many years, others have not yet established the workflow, created the policies, or experienced the cultural changes that are foundational to provider/patient electronic interaction. Many organizations have suggested that requiring actual use of the personal health record by the patient is beyond provider control and thus is unfair.

Beth Israel Deaconess has already achieved patient participation rates of 25 % for record viewing and 15 % for secure messaging without significant advertising or educational effort. Patients find value in the timeliness and convenience of these transactions, so participate enthusiastically. Admittedly, BIDMC had 15 years to refine the application, modify medical staff bylaws to require PHR use, and overcome some of the doubts and myths described below.

In addition to the Meaningful Use requirements, the HIPAA Omnibus Rule expands an individual’s rights to receive electronic copies of his or her health information and to restrict disclosures to a health plan concerning treatment for which the individual has paid out of pocket in full. Many healthcare organizations are struggling with the self-pay disclosures workflow, since modifying data flows based on how the patient pays is not
12 Health Services Information: Personal Health Records as a Tool for Engaging Patients and. . . 271
currently supported by commercial EHR products. There are also ongoing national efforts to refine the Omnibus Rule language for “accounting of disclosures,” when a patient requests a list of all who have accessed or received copies of their record. Implementing such accounting for all disclosures including treatment, payment, and operations requires capabilities not present in most commercial EHR products.

The Food and Drug Administration issued a report in April 2014 outlining the Food and Drug Administration Safety and Innovation Act (FDASIA) regulatory framework that is relevant to personal health records because of the increasing popularity of using mobile devices to access health-related resources. Mobile devices will be discussed in detail later in this chapter.

The FDA stratified mobile devices/apps into three categories:

Administrative apps – an application that reminds you about an appointment, describes costs/benefits such as co-pays, or helps you find a doctor.
Wellness apps – an application that measures your daily exercise, suggests weight loss strategies, or offers healthcare coaching via a social network.
Medical devices – an application that measures a body parameter such as pulse, blood pressure, or EKG and may offer therapeutic suggestions based on directly gathered diagnostic data.

The FDA reaffirmed its intent to regulate medical devices and not administrative apps/wellness apps.

It is unlikely that the FDA will regulate personal health records in the near future, but it will likely regulate the apps and devices which collect patient telemetry and transmit it to personal health records.

Myths

Many providers and patients have concerns about the impact of increased electronic data sharing and automated workflows. After nearly 20 years of experience with personal health records, it is clear that most of those concerns have not appeared in practice.

Providers were concerned that sharing electronic health records would result in more assertions of malpractice as patients found errors in their records. At BIDMC and other Harvard-associated hospitals, the opposite has been true. Informed and engaged patients do find errors and work with their providers to correct inaccuracies before harms occur. Malpractice assertions decrease when personal health records are deployed.

Providers were concerned that they would be overwhelmed with secure email or other electronic requests from patients. Electronic requests have replaced phone calls, reduced time spent on “phone tag,” and accelerated the resolution of simple administrative matters that can be delegated to others.

Patients were concerned that increased electronic access would create new security risks. While it is true that the Internet is increasingly a mire of viruses and malware, keeping electronic data centrally managed on secure servers is less risky than exchanging paper copies, storing PDFs on laptops, or exchanging electronic copies on USB flash drives because centrally stored information can be better audited and controlled.

Patients and providers were concerned that more transparency could jeopardize the clinician/patient relationship because of misunderstandings in the interpretation of electronic health records. Instead, providers have been careful to write comprehensible summaries with fewer abbreviations because they know a patient is likely to read their work.

There have been lessons learned along the way. Sharing inaccurate or confusing data with patients does not add value. For example, administrative billing data is a coded summary of the clinical care that lacks perfect specificity and time references: just because you had a diagnosis of low potassium 5 years ago does not imply it is a problem today.

Thus, we must be thoughtful about what data is sent to PHRs and how that data is presented to patients. The problem list is useful clinical
information as long as clinicians keep it current. BIDMC removed the ICD-9 administrative data feed so that the clinician’s problem list is the only data which populates the patient view. Also, BIDMC improved its problem list functionality so that it maps to a standardized terminology, SNOMED CT, enabling BIDMC to provide medical information and decision support based on a controlled vocabulary instead of just free text.

As long as the PHR software is usable and the data presented is relevant, supplemented by educational materials, the experience of provider/patient data sharing will be positive.

Digital Divide

As we offer more electronic resources to patients and encourage the use of mobile technology and home medical devices, we must be careful not to create a digital divide – the technology haves and have nots. In the Boston area, there are many academic and technology professionals with fast Internet connections and the latest mobile devices. There are also Medicaid patients without the funding to purchase personal devices and those who feel technology requires expertise beyond their comfort zone. Research done in the Boston area discovered that the large majority of Medicaid patients have phones capable of receiving text messages and most patients have access to the Internet at work, at a local library, or a community center. We must engineer our personal health records so they run anywhere on anything, but also protect privacy by not leaving behind cached data that could be viewed inappropriately. PatientSite and most vendor applications are web based so they can be accessed regardless of location or platform, with specific protections to ensure data is encrypted and not stored in web browsers. Engineering for those with disabilities, failing eyesight, or limited computer skills is also essential.

Data Standards

The HITECH Meaningful Use program requires the use of specific standards for transition of care summary transmission, public health reporting, and e-prescribing. Although the standards for personal health records are not explicitly stated, it is logical that personal health records should mirror the standards used in electronic health records themselves. Standards can generally be grouped into three categories.

Vocabulary – the terminology used in each part of the record to communicate meaning between sender and receiver. The Meaningful Use Common Data Set requires LOINC codes for labs, RxNorm codes for medications, SNOMED CT for problem lists, CVX for immunization names, and ISO 639-2 for primary language. The same standards should be used in personal health records and medical devices connecting to personal health records. Mappings to patient-friendly terminology, available from the National Library of Medicine’s Value Set Authority Center, are likely to be helpful to patients.
Content – the container used to package a collection of data to be transported between a sender and receiver. The Consolidated Clinical Document Architecture (CCDA) is used for all EHR transition of care summaries and is appropriate to use for sending data to PHRs and collecting data from patients. Medical devices may additionally use the IEEE 11073 standard to transfer data to and from PHRs.
Transmission – the secure protocol to transport content from one place to another without modification or interception. Meaningful Use stage 2 requires the Direct Protocol (SMTP/SMIME or SOAP/HTTPS) to be used for transport. These standards are also appropriate for personal health records and medical devices.

As standards become increasingly constrained, ease of interfacing improves and the value of interoperable products increases. Ideally, Meaningful Use certification should create an ecosystem of personal health record products, leveraging the liquidity of data to foster innovation. Later stages of Meaningful Use will likely encourage “modular” EHR and PHR products that plug into large commercial systems through the use of simple application
programming interfaces (APIs). The April 2014 JASON report, requested by AHRQ and facilitated by MITRE corporation, provides a roadmap for evolution of healthcare apps that expand the use of today’s EHRs and PHRs.

The Role of Personal Medical Devices

As Accountable Care Organizations move from fee for service to risk contracts, providers will be reimbursed for keeping patients healthy and not for delivering more care. Personal medical devices that report on patient activities, functional status, and body parameters between clinician visits will be increasingly important.

Such devices include electronic scales for measuring fluid retention in CHF patients, blood pressure measurement for refractory hypertension, glucometers for diabetics, and home spirometry for patients with COPD or asthma.

The current challenge is that home medical devices communicate using proprietary protocols that make interfacing to personal health records and electronic health records very challenging. The Continua Alliance is a group of 60 companies that collaboratively develops standards for incorporation into products with the goal that devices available at the local drugstore will “plug and play” with the diversity of current EHRs and PHRs without complex engineering or custom software development.

Future stages of Meaningful Use will likely include a requirement for patient-generated data. Payers, providers, and patients will all have incentives to include telemetry from home devices in electronic medical records that provide coordinated, optimized care further personalized via access to personal medical devices.

Here’s an example. The father of the author of this chapter had multiple sclerosis for 23 years. His mobility declined but there was no easy way to measure that decline. To complicate the situation, he self-medicated with over-the-counter and prescription medications to episodically reduce his symptoms. During personal visits his level of function seemed very high. Imagine that a Fitbit or other home device provided data about his mobility to an EHR or PHR. It would be clear that on some days he walked 50 ft and other days he walked 5,000 ft. The trend would be clear – fewer good mobility days and more limited function. Care plans, medications, and supportive therapies would be informed by this objective data.

Just as personal computing has evolved from terminals to PCs to mobile smartphones/tablets, it is likely that personal health records will increasingly run on mobile technology with interfaces to home care devices.

Research: OpenNotes, ICU Harm Reduction, Care Plans, and Clinical Trials

When BIDMC’s PatientSite was originally released, it included patient access to the entire health record except for the clinic notes a physician wrote about a patient. That changed in 2011 when notes were added via the OpenNotes project. Here’s the press release about it.

BOSTON – A Beth Israel Deaconess Medical Center-led study has found that patients with access to notes written by their doctors feel more in control of their care and report a better understanding of their medical issues, improved recall of their care plan, and being more likely to take their medications as prescribed.

Doctors participating in the OpenNotes trial at BIDMC, Geisinger Health System in Danville, PA, and Harborview Medical Center in Seattle reported that most of their fears about an additional time burden and offending or worrying patients did not materialize, and many reported enhanced trust, transparency, and communication with their patients.

“Patients are enthusiastic about open access to their primary care doctors’ notes. More than 85 % read them, and 99 % of those completing surveys recommended that this transparency continue,” says Tom Delbanco, MD, co-first author, a primary care doctor at BIDMC and the Koplow-Tullis Professor of General Medicine and Primary Care at Harvard Medical School. “Open notes may both engage patients far more
actively in their care and enhance safety when the patient reviews their records with a second set of eyes.”

“Perhaps most important clinically, a remarkable number of patients reported becoming more likely to take medications as prescribed,” adds Jan Walker, RN, MBA, co-first author and a Principal Associate in Medicine in the Division of General Medicine and Primary Care at BIDMC and Harvard Medical School. “And in contrast to the fears of many doctors, few patients reported being confused, worried or offended by what they read.”

The findings reflect the views of 105 primary care physicians and 13,564 of their patients who had at least one note available during a year-long voluntary program that provided patients at an urban academic medical center, a predominantly rural network of physicians, and an urban safety net hospital with electronic links to their doctors’ notes.

Of 5,391 patients who opened at least one note and returned surveys, between 77 % and 87 % reported OpenNotes made them feel more in control of their care, with 60–78 % reporting increased adherence to medications. Only 1–8 % of patients reported worry, confusion, or offense, three out of five felt they should be able to add comments to their doctors’ notes, and 86 % agreed that availability of notes would influence their choice of providers in the future.

Among doctors, a maximum of 5 % reported longer visits, and no more than 8 % said they spent extra time addressing patients’ questions outside of visits. A maximum of 21 % reported taking more time to write notes, while between 3 % and 36 % reported changing documentation content.

No doctor elected to stop providing access to notes after the experimental period ended.

“The benefits were achieved with far less impact on the work life of doctors and their staffs than anticipated,” says Delbanco. “While a sizeable minority reported changing the way their notes addressed substance abuse, mental health issues, malignancies and obesity, a smaller minority spent more time preparing their notes, and some commented that they were improved.”

“As one doctor noted: ‘My fears? Longer notes, more questions and messages from patients . . . In reality, it was not a big deal.’”

Walker suggests that so few patients were worried, confused, or offended by the note because “fear or uncertainty of what’s in a doctor’s ‘black box’ may engender far more anxiety than what is actually written, and patients who are especially likely to react negatively to notes may self-select to not read them.”

“We anticipate that some patients may be disturbed in the short term by reading their notes and doctors will need to work with patients to prevent such harms, ideally by talking frankly with them or agreeing proactively that some things are at times best left unread.”

“When this study began, it was a fascinating idea in theory,” says Risa Lavizzo-Mourey, MD, president and CEO of the Robert Wood Johnson Foundation, the primary funder of the study. “Now it’s tested and proven. The evidence is in: Patients support, use, and benefit from open medical notes. These results are exciting – and hold tremendous promise for transforming patient care.”

Although PatientSite provides great transparency into ambulatory and inpatient records, the ICU is still an area with limited patient and family engagement. Patient-connected devices in the ICU provide a dizzying array of data but rarely provide an interpretation of that data that is useful to families, especially while making end-of-life decisions. The Moore Foundation recently funded a grant for several hospitals, including BIDMC, to create unique patient dashboards that make the process of care in ICUs more transparent and reduce harms. Here’s an example.

As discussed previously, the father of the author of this chapter had multiple sclerosis for 23 years. He also had myelodysplastic syndrome for 2 years, had 3 myocardial infarctions since 2009, and died in mid-March of 2013.

When the family arrived at his ICU bedside in early March, they spoke with all his clinicians to create a mental dashboard of his progress. It looked something like this
Cardiac – history of 2 previous myocardial infarctions treated with 5 stents. New myocardial infarction resulting in apical hypokinesis and an ejection fraction of 25 %. No further stent placement possible, maximal medical therapy already given
Pulmonary – new congestive heart failure post recent myocardial infarction treated with diuretics, nitroglycerine drip, afterload reduction, upright position, and maximal oxygenation via bilevel positive airway pressure. O2 saturation in the 90s and falling despite maximal therapy (other than intubation)
Hematologic – failing bone marrow resulting in a white count of 1, a platelet count of 30, and a hematocrit of 20
Neurologic – significant increase in muscle spasticity, resulting in constant agitation. Pain medication requirements escalating. Consciousness fading.
Renal – creatinine rising

Although the family did not have real-time access to his records, they gathered enough data to turn this mental dashboard into a scorecard with green, yellow, and red indicators.

Cardiac – red due to irreversible low ejection fraction
Pulmonary – red due to the combination of falling O2 saturation despite aggressive therapy
Hematologic – red due to lack of treatment options available for myelodysplastic syndrome and an inability to transfuse given the low ejection fraction and congestive heart failure
Neurologic – yellow due to the potential for successful symptom control with pain medications
Renal – yellow due to treatment options available for renal failure

The patient had expressed his wishes in a durable power of attorney for healthcare – do not intubate, do not resuscitate, no pressors, no feeding tubes, and no heroic measures.

From the combination of the dashboard, scorecard, and his end-of-life wishes, it was clear that hospice was the best course of action.

Ideally, all patients and families should have the tools needed to make such decisions regardless of their medical sophistication.

The Moore Foundation project includes an automated ICU dashboard/scorecard for patients and families updated in real time based on data aggregated from the medical record and patient-connected telemetry. The architecture includes a cloud-hosted decision support web service. Hospitals send data in and the web service returns the wisdom of a graphical display.

Although OpenNotes and the Moore Foundation ICU project implement new ways to share data and its interpretation, we still need additional ways to involve patients and families in shared decision-making through the creation of shared care plans. BIDMC created the Passport to Trust initiative, in collaboration with a commercial PHR software vendor. Patients and doctors use a secure PHR website to develop a shared care plan, and then that plan is sent to the EHR using Meaningful Use standards and it is made part of the permanent medical record and integrated into care delivery. This kind of third-party PHR to EHR integration is likely to increase now that Meaningful Use requires EHRs to receive externally generated data. Also, care plan exchange is likely to be part of future stages of Meaningful Use.

An area in which more patient and family engagement could be beneficial is clinical trial enrollment. Today, most patients are unaware of the new treatments that could provide a cure or breakthrough. Many are willing to enroll in clinical trials but do not know how. Clinicians may be unaware of matching criteria or a patient’s suitability for a given trial. BIDMC has worked with a company called TrialX that enables patients and providers to use PHRs and EHRs with innovative electronic connections to clinical trial databases to facilitate the process. Not only can direct patient involvement in clinical trial enrollment accelerate research, it is likely that patients sharing their experiences with other patients will enable new discoveries to be rapidly disseminated for the benefit of all.
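
The step described earlier in this section, turning a per-system dashboard into green, yellow, and red indicators, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Moore Foundation project's logic and not clinical guidance: the names `SystemFinding`, `Status`, `score`, and `scorecard` are hypothetical, and the rule (red when a system is worsening with no acceptable treatment left, yellow when it is worsening but options remain, green otherwise) is a simplification inferred from the family's example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass
class SystemFinding:
    """One dashboard row (hypothetical structure, not from the chapter)."""
    system: str
    worsening: bool            # is this organ system currently deteriorating?
    treatment_available: bool  # is any option left that respects the patient's wishes?


def score(finding: SystemFinding) -> Status:
    # Assumed rule: stable -> green; worsening but treatable -> yellow;
    # worsening with no remaining option -> red.
    if not finding.worsening:
        return Status.GREEN
    return Status.YELLOW if finding.treatment_available else Status.RED


def scorecard(findings):
    """Map each organ system name to its indicator color."""
    return {f.system: score(f) for f in findings}


# The five systems from the family's mental dashboard, encoded by hand.
findings = [
    SystemFinding("Cardiac", worsening=True, treatment_available=False),
    SystemFinding("Pulmonary", worsening=True, treatment_available=False),
    SystemFinding("Hematologic", worsening=True, treatment_available=False),
    SystemFinding("Neurologic", worsening=True, treatment_available=True),
    SystemFinding("Renal", worsening=True, treatment_available=True),
]

card = scorecard(findings)
for system, status in card.items():
    print(f"{system}: {status.value}")
```

With these hand-entered inputs the sketch reproduces the family's scorecard: three red systems and two yellow. An automated dashboard would have to derive the two boolean inputs from record data and telemetry, which is the hard part and is not attempted here.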
Conclusion

From 1999 to the present, personal health records have transitioned from a research project to the mainstream and are now required by several federal programs. Patients and families increasingly expect access to their records, a role in decision-making, and the convenience of using electronic workflows to manage their care. Consumer platforms continue to rapidly evolve, accelerated by market demand and new interoperability standards incorporated into electronic health records.

As important as the technology has been, the breakthroughs of the past 5 years have been in culture and policy. Clinicians no longer fear sharing the record or participating in secure messaging. There are available policy solutions to tricky problems like sharing adolescent records with their parents.

The next few years will be an important turning point for the medical industry as care becomes increasingly focused on continuous wellness rather than episodic sickness. Patient-generated healthcare data and patient involvement in the entire process is essential to achieving our national and international policy goals for quality, safety, and efficiency. Patients, acting as stewards of their own data, will facilitate data sharing, discovery of new therapies, and innovation as part of a connected learning healthcare system.

References

AHIMA e-HIM Personal Health Record Work Group. The role of the personal health record in the EHR. J AHIMA. 2005;76(7):64A–D.
Archer N, Fevrier-Thomas U, Lokker C, McKibbon KA, Straus SE. Personal health records: a scoping review. J Am Med Inform Assoc. 2011;18(4):515–22. https://doi.org/10.1136/amiajnl-2011-000105. Review.
Beard L, Schein R, Morra D, Wilson K, Keelan J. The challenges in making electronic health records accessible to patients. J Am Med Inform Assoc. 2012;19(1):116–20.
Bourgeois FC, Taylor PL, Emans SJ, Nigrin DJ, Mandl KD. Whose personal control? Creating private, personally controlled health records for pediatric and adolescent patients. J Am Med Inform Assoc. 2008a;15(6):737–43.
Bourgeois FC, Taylor PL, Emans SJ, Nigrin DJ, Mandl KD. Whose personal control? Creating private, personally controlled health records for pediatric and adolescent patients. J Am Med Inform Assoc. 2008b;15(6):737–43. https://doi.org/10.1197/jamia.M2865. Epub 2008 Aug 28.
Brennan PF, Downs S, Casper G. Project HealthDesign: rethinking the power and potential of personal health records. J Biomed Inform. 2010;43 Suppl 5:S3–5. https://doi.org/10.1016/j.jbi.2010.09.001.
Britto MT, Wimberg J. Pediatric personal health records: current trends and key challenges. Pediatrics. 2009;123 Suppl 2:S97–9. https://doi.org/10.1542/peds.2008-1755I.
Collins SA, Vawdrey DK, Kukafka R, Kuperman GJ. Policies for patient access to clinical data via PHRs: current state and recommendations. J Am Med Inform Assoc. 2011;18 Suppl 1:i2–7. https://doi.org/10.1136/amiajnl-2011-000400. Epub 2011 Sep 7.
Council on Clinical Information Technology. Policy Statement–Using personal health records to improve the quality of health care for children. Pediatrics. 2009;124(1):403–9. https://doi.org/10.1542/peds.2009-1005.
Forsyth R, Maddock CA, Iedema RA, Lassere M. Patient perceptions of carrying their own health information: approaches towards responsibility and playing an active role in their own health – implications for a patient-held health file. Health Expect. 2010;13(4):416–26. https://doi.org/10.1111/j.1369-7625.2010.00593.x.
Goel MS, Brown TL, Williams A, Cooper AJ, Hasnain-Wynia R, Baker DW. Patient reported barriers to enrolling in a patient portal. J Am Med Inform Assoc. 2011;18 Suppl 1:i8–12. https://doi.org/10.1136/amiajnl-2011-000473. Epub 2011 Nov 9.
Haggstrom DA, Saleem JJ, Russ AL, Jones J, Russell SA, Chumbler NR. Lessons learned from usability testing of the VA’s personal health record. J Am Med Inform Assoc. 2011;18 Suppl 1:i13–7. https://doi.org/10.1136/amiajnl-2010-000082. Epub 2011 Oct 8.
Kaelber J. A research agenda for personal health records. Am Med Inform Assoc. 2008;15:729–36.
Kim EH, Stolyar A, Lober WB, Herbaugh AL, Shinstrom SE, Zierler BK, Soh CB, Kim Y. Challenges to using an electronic personal health record by a low-income elderly population. J Med Internet Res. 2009;11(4), e44. https://doi.org/10.2196/jmir.1256.
Poulton M. Patient confidentiality in sexual health services and electronic patient records. Sex Transm Infect. 2013;89(2):90. https://doi.org/10.1136/sextrans-2013-051014.
Rudd P, Frei T. How personal is the personal health record?: comment on “the digital divide in adoption and use of a personal health record”. Arch Intern Med. 2011;171(6):575–6. https://doi.org/10.1001/archinternmed.2011.35. No abstract available.
Saparova D. Motivating, influencing, and persuading patients through personal health records: a scoping
review. Perspect Health Inf Manag. 2012;9:1f. Epub 2012 Apr 1.
Sittig DF, Singh H. Rights and responsibilities of users of electronic health records. CMAJ. 2012;184(13):1479–83.
Sittig DF, Singh H, Longhurst CA. Rights and responsibilities of electronic health records (EHR) users caring for children. Arch Argent Pediatr. 2013;111(6):468–71.
Wynia M, Dunn K. Dreams and nightmares: practical and ethical issues for patients and physicians using personal health records. J Law Med Ethics. 2010;38(1):64–73. https://doi.org/10.1111/j.1748-720X.2010.00467.x.
Yamin CK, Emani S, Williams DH, Lipsitz SR, Karson AS, Wald JS, Bates DW. The digital divide in adoption and use of a personal health record. Arch Intern Med. 2011;171(6):568–74. https://doi.org/10.1001/archinternmed.2011.34.

13 A Framework for Health System Comparisons: The Health Systems in Transition (HiT) Series of the European Observatory on Health Systems and Policies

Contents
Introduction
The Ljubljana Charter: HiTs and Health Systems in Transition
The Observatory Partnership: HiTs and Policy Relevance
The Observatory Functions: HiTs in a Wider Work Plan
B. Rechel (*)
European Observatory on Health Systems and Policies,
London School of Hygiene and Tropical Medicine,
London, UK
e-mail: Bernd.Rechel@lshtm.ac.uk
S. Lessof · J. Figueras
European Observatory on Health Systems and Policies,
Brussels, Belgium
e-mail: szy@obs.euro.who.int; jfi@obs.euro.who.int
R. Busse
Technische Universität Berlin, Berlin, Germany
Department Health Care Management, Faculty of
Economics and Management, Technische Universität,
Berlin, Germany
e-mail: rbusse@tu-berlin.de
M. McKee
London School of Hygiene and Tropical Medicine,
London, UK
e-mail: Martin.McKee@lshtm.ac.uk
E. Mossialos
London School of Economics and Political Science,
London, UK
e-mail: e.a.mossialos@lse.ac.uk
E. van Ginneken
Berlin University of Technology, Berlin, Germany
European Observatory on Health Systems and Policies,
Department of Health Care Management, Berlin
University of Technology, Berlin, Germany
e-mail: ewout.vanginneken@tu-berlin.de
helps explain the specific approach taken to HiTs, but also speaks of the significance of context in developing comparative frameworks.
The Ljubljana Charter: HiTs and Health Systems in Transition
The Observatory can trace its origins to the early 1990s and the challenges Europe faced as western European expectations (and health-care costs) rose and as the countries emerging in the wake of the Soviet Union looked to overhaul their own health systems. The World Health Organization (WHO) Regional Office for Europe facilitated a process that culminated in the 1996 Ljubljana conference on European Health Care Reforms and the Ljubljana Charter, in which health ministers from across the European region committed themselves to a set of principles for health system reform. These reflected a growing understanding of health’s part in the wider society and economy, the importance of people and patients, the need for policy to be “based on evidence where available,” and the role of monitoring and learning from experience (Richards 2009).
The original HiTs were developed as part of the preparations for the Ministerial Conference. They were addressing a postcommunist Europe in which more than 15 new countries had emerged and many more were making a transition from state-managed to market economies with all the accompanying economic upheaval. There were also growing challenges to the sustainability of established and wealthy health systems and to notions of solidarity. The HiTs had therefore to establish a common vocabulary for describing health systems and to make sure that the terms used could be explained and understood in countries with very different traditions. They had also to provide for the fact that the systems to be compared were contending with significant discontinuities and ongoing change. This prompted the development of a template to describe health systems that would set down the bases on which to make comparisons across countries. It was comprehensive, allowed for very different path developments, and offered detailed explanations to guide authors.
The Observatory Partnership: HiTs and Policy Relevance
Many of (what came to be) the Observatory team were involved in developing evidence for Ljubljana. The Observatory, which took formal shape in May 1998, was designed to take forward the approach to evidence for policy, after the Charter was agreed (Box 1). The original Partners were WHO Europe, the government of Norway, the European Investment Bank, the World Bank, the London School of Economics and Political Science (LSE), and the London School of Hygiene & Tropical Medicine (LSHTM). The exact composition of the partnership has changed over the years, so that the Observatory today also includes the European Commission, more national governments (Austria, Belgium, Finland, Ireland, Slovenia, Sweden, and the United Kingdom), a regional government (Veneto), and the French National Union of Health Insurance Funds (UNCAM); but the concept of a partnership that brings different stakeholders together remains the same. The idea is that the Observatory, like a good health system, is informed by the people who use its services as well as those providing them. The Partners have genuine experience of shaping health systems, and this has prompted a focus on policy relevance and how decision-makers can access and use the evidence generated. They have insisted that the HiT series should be “accessible” to a nonspecialist, non-academic audience and, more specifically, be readable, clearly structured, consistent (so that readers can move from one HiT to another and find comparable information), and timely, that is, available while the data and analysis are still current.
Box 1: The European Observatory on Health Systems and Policies
The core mission of the Observatory is to support and promote evidence-based health policy-making through the comprehensive and rigorous analysis of the dynamics of
(continued)
282 B. Rechel et al.
Box 2: (continued)
the issues that matter most to decision-makers. All evidence is available “open access” to facilitate its use in practice.
• Performance assessment includes a package of methodological and empirical work designed to respond to country needs. There have been two key studies looking at the policy agenda for performance comparison to improve health services and separate work on the domains that comprise performance (efficiency, population health, responsiveness).
• Knowledge brokering involves engaging with policy-makers to understand what evidence they need and then assembling and communicating the relevant information at the right time. The Observatory combines an extensive publication program with face-to-face and electronic dissemination to convey evidence on what might work better or worse in different country and policy contexts.
The HiT Template: Structuring, Populating, and Signposting a Comparative Framework
HiTs use a standard questionnaire and format to guide authors – referred to as the HiT template. It guides the production of detailed descriptions of health system and policy initiatives so that every HiT examines the organization, financing, and delivery of health services, the role of key actors, and the challenges faced in the same way, establishes a comparable baseline for reviewing the impact of reforms, and takes a standardized approach to health system assessment. This structure is central to the ability of HiTs to inform comparative analysis and facilitates the exchange of reform experiences across countries. Arriving at a robust template is not straightforward, but the Observatory’s experience suggests some elements that can help.
Structure
The HiT template benefits from a clear structure, based on a functional perspective of health systems. It works from the premise that all health systems perform a number of nonnormative core functions (Duran et al. 2012), including the organization, the governance, the financing, the generation of physical and human resources, and the provision of health services. The first HiT template was developed in 1996. It was revised in 2007 and again in 2010, but all iterations have used the notion of core functions and have drawn on the literature and prevailing debate to interpret what those functions are.
All revisions have involved input from staff (editors) and national authors, based on their work on the country profiles, but they have also included consultation with a wider group of users and stakeholders (Observatory Partners, various units of WHO and of the European Commission’s health directorate, and, more recently, members of the HSPM network). These review stages have helped strengthen the template and build some consensus around its structure and approach.
Table 1 shows the changes over time and the very marked structural consistency between versions. This is in part because of a conscious decision to adapt rather than rethink the structure completely so that HiT users can read backwards in time as well as across countries. It is also a testament to the robustness of the first iteration. The adjustments reflect on a wider rethinking on how different elements fit into the whole and on what seemed more or less important at particular times.
The initial template placed more emphasis on the political, economic, and sociodemographic context and on a country’s historical background, because of the proximity to transition for so many eastern European countries. The 2004–2007 revision consolidated financing in one chapter, bringing together the collection and allocation of funds, and split the chapter on organization and management to address planning and regulation separately, reflecting shifts in emphasis at the time in wider academic and policy thinking. In addition, a
new chapter was added, on the assessment of the health system, again a response to the more explicit way this issue was being addressed at the time. The 2010 template condensed organization, governance, planning, and regulation into a single chapter again and revised and extended the section on performance assessment as policy-makers became increasingly interested in understanding and contextualizing the evaluations of their health systems that they were being confronted with.
Scope and Content
There were of course other changes to the template between iterations in terms of the detail addressed within the relatively stable overall structure. New questions and issues were added because areas like mental health, child health services, and palliative care (2007) or public health and intersectorality (2010) came to the policy fore and as a wide group of experts and users were consulted. The 2007 template was particularly heavily laden with new additions and contributed to longer and more time-consuming HiTs. Certainly there was a marked growth in the length of HiTs in successive iterations with Estonia, for example, growing from 67 pages in 2000, to 137 pages in 2004, and 227 pages in 2008. This was addressed to some extent in 2010 with a tightening of the template (see Box 3) after which the 2013 Estonia HiT dropped to 195 pages, and it is being revisited again in the 2015–2016 update.
Box 3: The 2010 Template, Structure and Contents
1. Introduction: the broader context of the health system, including economic and political context, and population health
2. Organization and governance: an overview of how the health system in the country is organized, the main actors and their decision-making powers, the historical background, regulation, and levels of patient empowerment
3. Financing: information on the level of expenditure, who is covered, what benefits are covered, the sources of health-care finance, how resources are pooled and allocated, the main areas of expenditure, and how providers are paid
4. Physical and human resources: the planning and distribution of infrastructure and capital stock, IT systems, and human resources, including registration, training, trends, and career paths
5. Provision of services: concentrates on patient flows, organization and delivery
(continued)
simultaneously maintaining comparability with other countries. It has proved to be helpful to flag up where data is missing or an element of a system is not yet developed rather than simply avoiding mention of it, as it helps readers understand gaps. Editors have an important role in steering HiTs between flexibility and consistency and deciding what should be included or omitted. They meet regularly to exchange experience and discuss practice.
Review

what is happening which different sectors, ministries, and levels of the health service (primary, secondary, regional, local) can all subscribe to. They use HiTs in considering reforms, as the basis for policy dialogue and to explore policy options, and to set their own health system’s performance in a European context. Other users are foreign analysts or consultants trying to get a comprehensive understanding of a health system, and researchers and students. HiTs are a single source of information and pull together different strands of analysis which otherwise can be surprisingly hard to find in “one place.”
country reviews for its region. The average impact factor of (European Observatory) HiTs, calculated internally using Thomson Reuters methodology, was 3.6 between 2012 and 2014, with a high of 4.26 in 2013 although this only captures citations in journals listed on PubMed/Medline. Google Scholar, which also recognizes the gray literature, shows that some HiTs achieve several hundred citations per edition.
The Observatory’s experience with HiTs has generated insights that others developing frameworks for health system comparison might usefully draw on. It demonstrates the importance of a user-friendly template that helps authors and editors produce accessible, relevant, and credible outputs with a focus on what is expected from the comparisons and on who is going to use them. However, it also suggests that no template is perfect. There are different ways of categorizing and grouping key functions (of a health or any other system) or of conceptualizing systems and different levels of tackling and reporting evaluation. To some extent these are a matter of preference. There are also and always tradeoffs between comprehensiveness and accessibility, completeness and timeliness, and inclusiveness and readability. The current HiT template can be seen as a pragmatic trade-off based on almost 20 years of experience. How other teams choose to balance these will depend on the focus of their comparisons and the people who are to use their work.
The Observatory has also found ways of combining (excellent) national authors with its own technical editors. This is not always straightforward as not all European countries have the same capacity in health system research and national experts with strong analytical and English writing skills can be hard to find (Santoro et al. 2016) and may move on rapidly. Moreover, HiT and HSPM authors are not normally remunerated but, at “best,” receive only small honoraria. The HiT series has addressed these challenges by identifying and linking formally with leading institutions, cultivating long-term relationships with HiT author teams, and, most recently, through its HSPM network. This mix of approaches may have helped build capacity in countries. It has certainly developed the understanding and research (and people) management skills of the editorial team. Other comparative initiatives with limited resources might also want to consider what they can do in terms of sharing ownership and recognition to create non-monetary incentives for national counterparts and to develop their own team.
Comparability is and will remain a challenge, despite the standard template, tables, and figures, and is likely to be an issue for all other comparative projects. This is somewhat obvious when it comes to quantitative data given the divergent geographic coverage of international databases and the differences in definitions and data collection methods, not to mention the challenges at the individual country level. While it is clear that caution must be exercised when comparing quantitative data from different sources, it is also true, if less obvious, that qualitative data and the descriptive elements of the HiTs raise issues of comparability. In some areas there are broadly accepted tools (OECD et al. 2011) that help, but in many there are no agreed standard definitions (with health professionals being a case in point). Other comparative projects will need both to draw on the latest available knowledge and frameworks and to invest in methodological work as the Observatory team has done, for example, with the conceptual model (the three-dimensional cube) to explain coverage (Busse et al. 2007; Busse and Schlette 2007). They will also need to tailor responses to data and evidence availability in parts of Europe (particularly but by no means exclusively in central, eastern, and southeastern Europe) and to hope that EC/OECD/WHO initiatives on data will ultimately fill the gaps. There will still and inevitably be differences in the information available in countries, in the issues which are important to them, and in the interests and strengths of authors. Those developing frameworks for comparison will have to address these tensions in light of their overarching objectives and in the knowledge that health systems are constantly evolving. They may also find, as the Observatory has, that a comparative framework simply cannot capture everything and that analysis for more specialized issues may require separate study.
Despite the challenges, the Observatory would hold that there is real value in a framework for health system comparison, particularly one that relates to a defined “user” need and which can be sustained over time. Much follows from knowing who will use a set of comparisons and why. Longevity allows a framework to evolve – to improve, strengthen comparability, and build up successive levels of knowledge. Combining the two means a framework can move beyond the descriptive to the truly evaluative so that it captures and assesses aspects of health system performance in ways that speak to policy-makers or the research community or, ideally, both.
References
Busse R, Blümel M. Germany: health system review. Health Syst Transit. 2014;16(2):1–296.
Busse R, Schlette S, editors. Health policy developments issue 7/8: focus on prevention, health and aging, and human resources. Gütersloh: Verlag Bertelsmann Stiftung; 2007.
Busse R, Schreyögg J, Gericke CA. Analyzing changes in health financing arrangements in high-income countries: a comprehensive framework approach, Health, Nutrition and Population (HNP) discussion paper. Washington, DC: World Bank; 2007.
Catallo C, Lavis J, The BRIDGE study team. Knowledge brokering in public health. In: Rechel B, McKee M, editors. Facets of public health in Europe. Maidenhead: Open University Press; 2014. p. 301–16.
Duran A, et al. Understanding health systems: scope, functions and objectives. In: Figueras J, McKee M, editors. Health systems, health, wealth and societal well-being: assessing the case for investing in health systems. Maidenhead: Open University Press; 2012. p. 19–36.
Figueras J, Tragakes E. Health care systems in transition: production template and questionnaire. Copenhagen: World Health Organization Regional Office for Europe; 1996.
Magnussen J, Vrangbak K, Saltman RB, editors. Nordic health care systems. Recent reforms and current policy challenges. Maidenhead: Open University Press; 2009.
Mossialos E, Allin S, Figueras J. Health systems in transition: template for analysis. Copenhagen: WHO Regional Office for Europe on behalf of the European Observatory on Health Systems and Policies; 2007.
OECD, Eurostat, WHO. A system of health accounts. Paris: OECD Publishing; 2011. https://doi.org/10.1787/9789264116016-en.
Rechel B, Thomson S, van Ginneken E. Health systems in transition: template for authors. Copenhagen: WHO Regional Office for Europe on behalf of the European Observatory on Health Systems and Policies; 2010.
Rechel B, et al. Lessons from two decades of health reform in Central Asia. Health Policy Plan. 2012;27(4):281–7.
Rechel B, Richardson E, McKee M, editors. Trends in health systems in the former Soviet countries. Copenhagen: World Health Organization; 2014 (acting as the host organization for, and secretariat of, the European Observatory on Health Systems and Policies).
Richards T. Europe’s knowledge broker. BMJ. 2009;339:b3871.
Santoro A, Glonti K, Bertollini R, Ricciardi W, McKee M. Mapping health research capacity in 17 countries of the former Soviet Union and South Eastern Europe: an exploratory study. Eur J Pub Health. 2016;26:349–54.
van Ginneken E, et al. The Baltic States: building on 20 years of health reforms. BMJ. 2012;345:e7348.
14 Health Services Knowledge: Use of Datasets Compiled Retrospectively to Correctly Represent Changes in Size of Wait List
Paul W. Armstrong
Contents
Introduction
Why Does the Waiting List Shrink (or Swell)? The Primary Hypothesis
What Happens to Enrolment and Admission in a Waiting List Initiative?
Does Size Shrink if Admission Exceeds Enrolment (and Does Size Swell if Enrolment Exceeds Admission)?
In South Glamorgan, Wales
In INSALUD, Spain
In England
In Victoria, Australia
In Winnipeg, Canada
In Sweden
In England
The Balance of Enrolments and Admissions (Plus Other Removals) Equals the Change in Size. Why?
If the Model Is Not Complicated, the Data Must Be Simple!
The Number of ‘Starts’ and ‘Stops’ Must Be the Same
Secondary Hypotheses
Inexplicably Complicated
Supplier-Induced Demand
Why has the Effect of Enrolment Confounded Analyses to Date?
Some Assumed Enrolment Was Fixed and Unvarying
Some Only Registered Discharge (and Death)
Some Compiled Returns
Some Made Hay
P. W. Armstrong (*)
London, UK
e-mail: P.W.Armstrong@outlook.com
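The relationship this chapter tests is stated in words in its contents list: the balance of enrolments and admissions (plus other removals) equals the change in size. A minimal Python sketch of that bookkeeping may help fix ideas. It is an illustration under stated assumptions and does not reproduce the chapter's formulae (1) through (5), which are not shown in this excerpt; the names `Period`, `change_in_size`, and `closing_size`, and all of the figures, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Period:
    """Counts for one reporting period (hypothetical structure)."""
    enrolments: int          # patients added to the list during the period
    admissions: int          # patients admitted from the list during the period
    other_removals: int = 0  # patients removed for any other reason


def change_in_size(p: Period) -> int:
    """Change in list size implied by the balance of flows in one period."""
    return p.enrolments - p.admissions - p.other_removals


def closing_size(opening_size: int, periods: Iterable[Period]) -> int:
    """Roll the opening list size forward through successive periods."""
    size = opening_size
    for p in periods:
        size += change_in_size(p)
    return size


# Hypothetical figures: admissions rise in every period.
periods = [
    Period(enrolments=120, admissions=100, other_removals=5),
    Period(enrolments=150, admissions=160, other_removals=10),
    Period(enrolments=190, admissions=170, other_removals=5),
]

for i, p in enumerate(periods, start=1):
    print(f"period {i}: admissions={p.admissions}, change={change_in_size(p):+d}")
print("closing size:", closing_size(400, periods))
```

In these made-up figures admissions rise in every period, yet the list shrinks only in the period where admissions plus other removals exceed enrolments. That is the sense in which the direction of change depends on the balance of flows and not on admissions alone.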
effect of enrolment and find that they are unable to test the primary hypothesis having failed to collect the necessary data.
Other investigators chose to infer that the relationship did not have the form hypothesized when they found no evidence of an inverse relationship between the number of admissions and changes in the size of the list (Feldstein 1967; Culyer and Cullis 1976; Snaith 1979; Buttery and Snaith 1980; Kenis 2006).
Culyer and Cullis (1976, 244) invoked “Say’s Law of Hospitals . . . that additions to the supply of inpatient capacity create equal additions to the demand for that capacity.” “[A]s the price of a good or service is lowered . . . the quantity demanded in any . . . period . . . will rise” (p. 244), so the authors hypothesized that an increase in “throughput capacity” would be accompanied by a fall in “mean waiting time” and “time price” which “would lead one to expect demand to increase” (p. 247). As a result, the authors expected “a positive relation between throughput capacity and waiting lists” (p. 247), and they reported significant direct correlations between these two variables for seven (out of 15) “hospital regions” over time (p. 249). Under this hypothesis, an increase in the number of admissions is about the worst line a policy-maker can take if his/her goal is to reduce the size of the list.
Buttery and Snaith (1980, 58) hypothesized “a self-regulating system . . . in which waiting times for patients and waiting lists per surgeon are relatively constant” (Feldstein 1967; Smethurst and Williams 2002). They thought that “[w]aiting times must provide a constraint on unmet need, preventing patients from coming forward or surgeons from putting them on their waiting lists if they do.” In other words, the number of enrolments shrank to counter any increase in wait as a result of any decrease in the number of admissions; and the number of enrolments swelled in response to any decrease in wait as a result of any increase in the number of admissions. They understood that if such a system exists, a “further increase in the number of surgeons will further increase the national waiting list” and “a diminution [will] reduce it.”
Culyer and Cullis (1976) imagine enrolments driven by admissions, i.e., ‘demand’ for a service induced by its ‘supply,’ and Buttery and Snaith (1980) imagine enrolments constrained by the length of wait. The former list is driven by the public’s appetite for consumption and the latter by the clinicians’ desire to limit their commitments. But the mechanisms and the outcomes envisaged are the same. An increase in the number of admissions is thought to reduce the length of wait, a reduction in the length of wait is thought to increase the number of enrolments, and an increase in the number of enrolments is thought to increase the size of the list. It is change in the number of admissions which is thought to evoke change in the number of enrolments in both instances, and both hypotheses predict a direct correlation between admissions and enrolments.
Enrolments are thought to contribute to self-regulation and to supplier-induced demand, but neither pair of investigators assembled data which allowed them to confirm whether the number of enrolments was fixed or to establish whether any variation had the pattern hypothesized. Instead, they assume relationships which are consistent with the primary hypothesis. They expect the size of the list to swell if there has been an increase in the number of enrolments, and they expect the size of the list to shrink if there has been a decrease in the number of enrolments.
The authors of most of these investigations dismissed the primary hypothesis without fair trial. It would have been reasonable to restrict attention to the effect of admissions on size – and to draw conclusions accordingly – when the effect of enrolment had been given no thought. It would also have been reasonable to restrict attention to the effect of admissions on size when counts of enrolments were thought to be unvarying. But it was not reasonable to dismiss the primary hypothesis without attempting to adjust for enrolments (formula 3) once the effect had been surmised and the variation acknowledged.
The relationship between the change in the size of the list and the balance of enrolments and admissions over a period is simple, not complex; exact, not approximate; and mathematical, not behavioral. The relationship is not affected by
the location of the delay (outpatient or inpatient/day case) or its cause (assessment, investigation, or treatment); by diagnosis or procedure; by clinician, specialty, or provider; or by any other classification of the lifelines. So the seeming variety of our case studies contributes a little color to the account but adds nothing to the veracity of the argument presented in formulae (1) through (5).
An increase in the number of admissions may either make matters better, under the primary hypothesis, or else may make matters worse, under the secondary hypothesis. The two views have influenced the thinking of contemporary commentators (Carvel 2004) and policy-makers but are contradictory. We wish to establish which studies provide relevant data, whether the results are trustworthy or suspect and whether they confirm or refute the primary hypothesis.
The literature is clogged with citations. So, in this chapter, we have given the floor to studies which use empirical data to explore the effect of admission on the size of the list while allowing for the effect of enrolment. We have not knowingly omitted any study which offers relevant data. We are widely read, and we have used reviews (Faulkner and Frankel 1993; Sanmartin et al. 1998; Hurst and Siciliani 2003; Finn 2004; Kreindler 2010) and reference lists (Harvey et al. 1993) to identify potentially relevant material. We are looking for one, well-substantiated, exception to the rule, a set of data which invalidates formula (3). But the literature is extensive, and we have not had time to run a systematic search of our own. Nevertheless, it should be easy to find examples given the eagerness with which alternative hypotheses have been adopted and the primary hypothesis dismissed.
Why Does the Waiting List Shrink (or Swell)? The Primary Hypothesis
A number of authorities claim that it is the balance of enrolments and admissions which determines whether there is an increase (or a decrease) in the size of a waiting list (DHSS 1981a; Sykes 1986; Naylor 1991; Worthington 1991; Street and Duckett 1996; Hanning and Lundström 1998; Torkki et al. 2002; NWTU 2003; National Audit Office Wales 2005; Kenis 2006; Kreindler 2010). They represent a variety of stakeholders, e.g., clinicians (DHSS 1981a; Sykes 1986; Naylor 1991; Hanning and Lundström 1998; Torkki et al. 2002), managers (Worthington 1991; Street and Duckett 1996; Hanning and Lundström 1998; Kenis 2006), and policy-makers (DHSS 1981a; NWTU 2003); and they represent a variety of paradigms, e.g., health economics (Street and Duckett 1996; Hanning and Lundström 1998), organizational science (Kenis 2006), and system dynamics (Worthington 1991). We thought it remarkable that they should all agree: when a consensus is the result of independent (and rigorous) evaluation by various stakeholders and different disciplines, their agreement adds weight to the evidence. But independent (and rigorous) evaluation is not the only way in which we reach a consensus. Some authors also claim that the relationship (formula 3) has never been observed in practice (Culyer and Cullis 1976). Given that the proposition of a second hypothesis implies the failure of the first, it is perhaps not surprising that this view is one echoed by many of those who have contributed to this literature. So the variety of the stakeholders provides no assurance of the independence of their judgment if it is the failure of the first hypothesis on which they are all agreed; and the variety of the paradigms provides us with no assurance of the rigor of their evaluation if the failure of the first hypothesis is assumed by each approach. We begin this chapter by conducting a fresh assessment of claims that the primary hypothesis has failed.
What Happens to Enrolment and Admission in a Waiting List Initiative?
Very few researchers have reported the number of enrolments. We know of only seven instances. Four appeared in print having been subject to peer review (White 1980; Street and Duckett 1996; Armstrong 2000, 2010), and three are contributions to the grey literature, two of which are in the public domain (Hamblin et al. 1998; Moral and de Pancorbo 2001) and the other of which is not (Kreindler and Bapuji
2010). This observation indicates that most researchers have not examined the effect of the balance of enrolments and admissions on the size of a waiting list and implies that the validity of the first hypothesis has not been widely evaluated.
Many researchers believe that an increase in the numbers admitted from a waiting list ought to be accompanied by a decrease in its size (Culyer and Cullis 1976; Goldacre et al. 1987; Naylor 1991). They rarely declare that enrolment may be a factor (Newton et al. 1995) or state that the number of enrolments is assumed to be fixed and unvarying, i.e., stationary (MoH 1963a). Instead, the effect of enrolment is conceded only when researchers are obliged to explain how an increase in the numbers admitted from a waiting list appears to have been accompanied by an increase in its size (MoH 1964; Goldacre et al. 1987; Hamblin et al. 1998; Sanmartin et al. 1998). (The size of the list and the numbers admitted may both increase yet still be consistent with the primary hypothesis if enrolment exceeded admission.) But at the close of their investigations, these researchers are unable to substantiate the claims they wish to make because they neither eliminated the variation in enrolments nor collected counts which would have allowed them to adjust for it. The limitations of these studies have not been sufficiently appreciated. Their results have nothing to contribute to our understanding of the relationship between enrolments, admissions, and the size of the list. Their discussions have nothing to contribute to our methodology because they fail to acknowledge that variation in enrolment confounds the apparent relationship between admissions and the size of the list (Newton et al. 1995).
Some analysts anticipate that an increase in expenditure, intended to increase the numbers admitted from the waiting list, ought to reduce its size. So the Irish Minister for Health and Children authorized the expenditure of an additional €246 million between 1993 and 2002 (Purcell 2003), on the understanding that this would buy substantial numbers of additional elective procedures and, as a consequence, would reduce the numbers who had been on the list for a long time. The Comptroller and Auditor General found invoices for work in the private sector commissioned by the hospitals he visited but, apart from these (Purcell 2003), was obliged to confess that “it is difficult to verify the reported number of procedures carried out under the [waiting list initiative] and to ascertain the extent to which they are over and above core-funded activity” (Purcell 2003, 26). (© Government of Ireland 2003.) He also reported that “the overall level of elective inpatient treatment . . . fell between 1998 and 2001,” which “suggests that the Initiative did not result in an increase in elective inpatient activity over and above existing levels” (Purcell 2003, 28). (© Government of Ireland 2003.) It is possible that the fault was due to a failure in the system of bookkeeping, that financial control mechanisms were blameless (Purcell 2003), and that additional activity failed to have the effect desired. But before accepting that this is the case, we would like to know whether the funds awarded to each hospital were apportioned in line with the intended contribution of each department and whether these additional resources appeared under the appropriate budget headings in time to pay for the activity planned. If a study does not quantify the effect of the additional expenditure on elective admission, we doubt its ability to provide empirical evidence about the effect of elective admission (or additional expenditure) on the size of the list. Newton et al. (1995, 784) report that “[i]n only six [out of 44 waiting list initiatives] was additional funding followed by a rise in admissions and a fall in list size.”
Some commentators believe that an increase in the amount of a resource, particularly one thought to be in critically short supply, ought to reduce the size of the waiting list (DHSS 1975, 1981a). Our evaluation of the effect of an increase in such a resource proceeds along the same lines as our evaluation of the effect of an increase in expenditure. We expect a reduction in the size of the list only when the number of admissions exceeds the number of enrolments. We therefore want to know what effect the increase in resources had on the number of admissions (and on the number of enrolments). This data is the minimum required for any evaluation, and the effect of a waiting list initiative on the size of the list
cannot be established without it. Regrettably, this information has rarely been assembled for the benefit of readers (Hamblin et al. 1998).
Does Size Shrink if Admission Exceeds Enrolment (and Does Size Swell if Enrolment Exceeds Admission)?
The veracity of the primary hypothesis can only be tested by studies which report the number of enrolments alongside the number of admissions and changes in the size of the list. These studies are therefore rather more important than has hitherto been recognized.
In South Glamorgan, Wales
White (1980) reports a study of the combined elective activity of three or four consultants in one surgical specialty (unspecified) at one public hospital in South Glamorgan, Wales. He thinks that the size of the outpatient waiting list ought to have something to do with the number of new outpatients seen and the number of GP referrals (White 1980), and he uses column charts to explore the relationship. He seems also to have thought that the size of the inpatient waiting list might have something to do with the number of discharges (or deaths) and the number of new outpatients seen.
White (1980) does not make full use of his data. So he looks for a relationship between the size of the outpatient waiting list and new outpatients seen and between the size of the outpatient waiting list and GP referrals, but he does not consider the combined effect on the size of the outpatient waiting list of new outpatients seen and GP referrals. Worse, he looks for a relationship between the size of the inpatient waiting list and deaths and discharges, but he does not consider even the univariate effect of new outpatients seen.
White (1980) does not reason correctly from the data he has assembled. He is not satisfied with the relationship he observes between the size of the outpatient waiting list and the number of new outpatients seen, but he does not express the same dissatisfaction with the relationship he observes between the size of the inpatient waiting list and discharges (or deaths). In the former instance, he attributes inconsistency to those who walk-in without having been entered on the list: in other words, he understands that not every new outpatient seen represented a unit reduction in the numbers waiting (White 1980). In the latter, he does not acknowledge the difference between the number of discharges (or deaths) and the number of elective admissions or between the number of decisions to admit to the list and the number of new outpatients seen. Instead, he expresses himself satisfied that “[f]ewer deaths and discharges in the specialty coincide with a lower in-patient waiting list” which “indicates that long in-patient waiting lists combine with greater in-patient activity” (White 1980, 274).
White (1980) uses surrogate measures to describe activity over 15 periods each of 3 months duration. He counts GP referrals rather than all referrals and referrals received rather than referrals accepted. He counts new outpatients booked rather than decisions to admit to the inpatient waiting list, and he counts discharges (or deaths) rather than elective admissions. Now, if it is the balance of enrolments and admissions which determines whether there is an increase (or a decrease) in the size of a waiting list, we would expect $E - A = C^{\text{now}} - C^{\text{then}}$ (3). But 2.39 new outpatients were booked per discharge (or death). Therefore, where $E$ represents new outpatients booked, $A$ represents discharges (or deaths), and $C$ represents the size of the inpatient waiting list, we do not expect $E - A$ to exactly equal $C^{\text{now}} - C^{\text{then}}$ (Table 1, right-hand side). However, if the surrogate measures have the effect anticipated on the size of the waiting list, we would expect a direct correlation between the two sides of formula (3), i.e., $E - A \propto C^{\text{now}} - C^{\text{then}}$.
We obtained the quantities $E - A$ and $C^{\text{now}} - C^{\text{then}}$ by calculation from counts charted by White (1980), and we used Spearman’s rho to assess the direction and strength of association between them. (We used the number of “deaths and
Table 1 Was the change in size directly correlated with the balance of enrolments and admissions in South Glamorgan, Wales?
Left-hand side, waiting for out-patient assessment: [3] E = no. of 'GP referrals'; [4] A = no. of 'new out-patients booked'; [6] C^now = size of 'out-patient waiting list'; [9] E − A = net in-flow; [10] C^now − C^then = change in stock.
Right-hand side, waiting for in-patient admission: [3] E = no. of 'new out-patients booked'; [4] A = no. of 'discharges and deaths'; [6] C^now = size of 'in-patient waiting list'; [9] E − A = net in-flow; [10] C^now − C^then = change in stock.
Year  Qtr  | E [3]  A [4]  C^now [6]  E − A [9]  C^now − C^then [10]  | E [3]  A [4]  C^now [6]  E − A [9]  C^now − C^then [10]
1976  1    | 1,076  1,109  712        −33                             | 1,109  363    290        746
      2    | 1,112  923    813        189        101                  | 923    455    271        468        −19
      3    | 1,197  1,296  495        −99        −318                 | 1,296  412    237        884        −34
      4    | 1,028  1,117  105        −89        −390                 | 1,117  451    233        666        −4
1977  1    | 1,080  826    346        254        241                  | 826    423    264        403        31
      2    | 1,108  583    857        525        511                  | 583    354    195        229        −69
      3    | 1,044  615    1,126      429        269                  | 615    377    231        238        36
      4    | 1,068  575    1,360      493        234                  | 575    385    222        190        −9
1978  1    | 1,020  672    1,649      348        289                  | 672    346    207        326        −15
      2    | 1,229  555    2,076      674        427                  | 555    344    114        211        −93
      3    | 1,092  704    2,084      388        8                    | 704    321    152        383        38
      4    | 1,205  1,385  1,489      −180       −595                 | 1,385  300    222        1,085      70
1979  1    | 1,237  1,381  1,247      −144       −242                 | 1,381  247    237        1,134      15
      2    | 876    583    1,505      293        258                  | 583    358    203        225        −34
      3    | 1,036  704    1,505      332        0                    | 704    323    217        381        14
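The derived columns of Table 1 can be recomputed from the tabulated counts. The following is a minimal sketch in Python, not part of White's (1980) analysis or of the chapter's own calculations; the function names `net_inflow` and `change_in_stock` are hypothetical, and the counts are the four quarters of 1976 from the out-patient side of the table.

```python
# Out-patient side of Table 1, 1976 Q1-Q4 (counts copied from the table).
gp_referrals = [1076, 1112, 1197, 1028]           # E, surrogate for enrolments
new_outpatients_booked = [1109, 923, 1296, 1117]  # A, surrogate for admissions
list_size = [712, 813, 495, 105]                  # C_now at the close of each quarter


def net_inflow(entries, exits):
    """Left-hand side of formula (3): E - A for each period."""
    return [e - a for e, a in zip(entries, exits)]


def change_in_stock(sizes):
    """Right-hand side of formula (3): C_now - C_then.

    The first period has no preceding census, so its change is None.
    """
    return [None] + [now - then for then, now in zip(sizes, sizes[1:])]


flow = net_inflow(gp_referrals, new_outpatients_booked)
stock = change_in_stock(list_size)

for quarter, (f, s) in enumerate(zip(flow, stock), start=1):
    print(f"1976 Q{quarter}: E - A = {f}, C_now - C_then = {s}")
```

Because the counts are surrogate measures, the two sides need not agree (189 against 101 in the second quarter), which is why a direct correlation, rather than an identity, is the most that can be expected of these data.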
used Spearman’s rho to assess the direction and strength of association between them (Spearman’s rho = +1.00, n = 4, p < 0.01).
The error of closure reports the difference between the number of dates of entry and the number of dates of exit (1) often as a percentage of all of those eligible for admission during the period of interest, i.e.,
$$\text{error of closure (\%)} = \frac{C^{\text{then}} + E - (A + C^{\text{now}})}{(C^{\text{then}} + E + A + C^{\text{now}})/2} \times 100 \qquad (5.1)$$
The initiative was associated with a reduction in the size of the list in its early years (in 1997 and 1998, according to $E - A$ and to $C^{\text{now}} - C^{\text{then}}$). But if we are prepared to credit the initiative with success in its early years – claiming that the initiative reduced the size of the list (Hanning and Lundström 1998) – we should also be prepared to credit it with failure in its later years – acknowledging that the initiative increased the size of the list (in 1999 and 2000, according to net in-flow and to change in stock).
The number of dates of entry ($C^{\text{then}} + E$) did not equal the number of dates of exit ($A + C^{\text{now}}$): the difference ranges from a shortfall of −15,148 to a surplus of +2009. Although these differences are small, a little less than 2.5% when compared with the number of lifelines enumerated over the period, they should not occur and require some attempt at explanation. If there were a systematic error in one of the three counts, we would expect the direction and the extent of the error to be consistent.
• If the number awaiting surgery is always overreported, e.g., by a factor of 1.05, the apparent change from one census to the next will correctly indicate whether the size of the list decreased or increased, but the size of the apparent change will be exaggerated by a factor of 1.05. If this were the only source of error, $C^{\text{now}} - C^{\text{then}}$ would always be greater than $E - A$ but would have the same direction.
• If the number of entries is always overreported, e.g., by a factor of 1.05, then $E - A$ will always be too positive. When the size of the list is increasing, $E - A$ will maximize the amount, and when the size of the list is decreasing, $E - A$ will minimize the amount sometimes to the extent of reporting an increase in size where there has been a decrease.
• If the number of exits is always overreported, e.g., by a factor of 1.05, then $E - A$ will always be too negative. When the size of the list is increasing, $E - A$ will minimize the amount sometimes to the extent of reporting a decrease in size where there has been an increase, and when the size of the list is decreasing, $E - A$ will maximize the amount.
Unfortunately, none of these scenarios fit Table 2 in which $E - A$ is more negative than $C^{\text{now}} - C^{\text{then}}$ in the first and second periods, less positive in the third, and more positive in the fourth. This implies either that there is systematic error in more than one count or that the error is not systematic.
There are several problems with the counts available. We have not been able to reconcile the
Table 2 Did the balance of “entries” and “exits” adequately account for the change in size in INSALUD, Spain?
Waiting for admission
Columns: [3] E = no. of “entries”; [4] A = no. of “exits”; [6] C^now = size of list; [9] E − A = net in-flow; [10] C^now − C^then = change in stock; [11] C^then + E = counting dates of entry; [12] A + C^now = counting dates of exit; [13] error of closure, difference; [14] error of closure (%).
Census date  E [3]     A [4]     C^now [6]  E − A [9]  C^now − C^then [10]  C^then + E [11]  A + C^now [12]  Difference [13]  (%) [14]
30-Jun-96                        190,000
31-Dec-96                        165,735               −24,265
31-Dec-97    445,816   478,452   148,247    −32,636    −17,488              611,551          626,699         −15,148          −2.45
31-Dec-98    489,331   509,414   132,221    −20,083    −16,026              637,578          641,635         −4,057           −0.63
31-Dec-99    557,950   552,929   141,827    5,021      9,606                690,171          694,756         −4,585           −0.66
31-Dec-00    616,527   598,117   158,228    18,410     16,401               758,354          756,345         2,009            0.27
Source: Moral and de Pancorbo 2001
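The error of closure reported in columns 13 and 14 of Table 2 follows directly from formula (5.1). The sketch below is an illustration in Python under that reading of the formula; the function name `error_of_closure` and the tuple layout are hypothetical choices, and the counts are copied from Table 2.

```python
def error_of_closure(c_then, entries, exits, c_now):
    """Return (difference, percentage) following formula (5.1).

    difference = (C_then + E) - (A + C_now)
    percentage = 100 * difference / mean of the two lifeline counts
    """
    dates_of_entry = c_then + entries
    dates_of_exit = exits + c_now
    difference = dates_of_entry - dates_of_exit
    mean_lifelines = (dates_of_entry + dates_of_exit) / 2
    return difference, 100 * difference / mean_lifelines


# (year, C_then, E, A, C_now) copied from Table 2 (INSALUD, Spain).
table_2 = [
    (1997, 165_735, 445_816, 478_452, 148_247),
    (1998, 148_247, 489_331, 509_414, 132_221),
    (1999, 132_221, 557_950, 552_929, 141_827),
    (2000, 141_827, 616_527, 598_117, 158_228),
]

results = {
    year: error_of_closure(c_then, e, a, c_now)
    for year, c_then, e, a, c_now in table_2
}

for year, (difference, percent) in results.items():
    print(f"{year}: difference = {difference:+,}, error of closure = {percent:+.2f}%")
```

The denominator is the mean of the two counts of lifelines, so the percentage does not depend on which of the two counts is taken to be the correct one.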
heights of the columns representing exits from the target population with the numbers reported in the text. (This undermines our confidence in the authors’ presentation of their data.) The chart records the suspiciously tidy 190,000 as the size of the list in June 1996, whereas the text reports a count of 168,265. (We have chosen to tabulate the numbers obtained from the chart which provides information on entries as well as exits.) As a consequence, we report a change in stock of −24,265 rather than of −2530, but this affects neither the correlation nor the error of closure. More importantly, we have had to read the counts of entries and exits off the printed version of the column chart. We enlarged this so that 1 mm represented 1674 patients on the vertical axis instead of 9412. The errors of closure are therefore equivalent to heights of 9.0, 2.4, 2.7, and 1.2 mm for the periods 1997, 1998, 1999, and 2000, respectively. If a measurement may be out by as much as 1.0 mm, then $E - A$ and $C^{\text{now}} - C^{\text{then}}$ may be out by as much as 2.0 mm, and the error of closure by as much as 4.0 mm. So one of these differences is not trivial. The waiting list initiative claims to have funded an additional 35,883 surgical procedures in 1997, when it recorded 15,148 too many exits (or too few entries) for the change in size observed.
The data provided by Moral and de Pancorbo (2001) is compatible with formula (3) and the primary hypothesis – an increase in “exits” (net of “entries”) may accompany a reduction in the size of the list.
In England
Hamblin et al. (1998) tabulate counts which describe activity over six periods each of 12 months duration and invite their readers to examine “[t]he effects of the Waiting Time Initiative” (1998, 13). They supply three different counts of ‘enrolments,’ two different counts of ‘admissions,’ and a count of the numbers awaiting elective admission on a day-case, or an inpatient, basis. When we used their counts of “[s]pecialist referring . . . with no date” as a measure of enrolment, and “[w]aiting list episodes” as a measure of admission, we obtained a perfect correlation between the change in stock and the net in-flow (Table 3a: Spearman’s rho = +1.00, n = 5, p < 0.01).
Similarly, when we used “[t]otal elective episodes” as a measure of admission, and combined “[s]pecialist referring . . . with no date” and “[s]pecialist referring . . . with date” as a measure of enrolment, we obtained a perfect correlation between the change in stock and the net in-flow (Table 3b: Spearman’s rho = +1.00, n = 5, p < 0.01).
We consider this result suspicious although it is everything we are looking for. If ‘enrolments,’
Table 3a Did the balance of “[s]pecialist referring” and “episodes” adequately account for the change in size in England?
Waiting for admission to hospital
Columns: [3] E = 'Specialist referring to waiting list with no date'; [4] A = 'waiting list episodes'; [6] C^now = 'waiting list size'; [9] E − A = net in-flow; [10] C^now − C^then = change in stock; [11] C^then + E = counting dates of entry; [12] A + C^now = counting dates of exit; [13] error of closure, difference; [14] error of closure (%); Ê = estimate of E.
Year-end  E [3]      A [4]      C^now [6]  E − A [9]  C^now − C^then [10]  C^then + E [11]  A + C^now [12]  Difference [13]  (%) [14]  Ê
1989/90   2,189,437  2,163,709  912,800    25,728                                           3,076,509                                  *
1990/91   2,094,683  2,101,089  906,394    −6,406     −6,406               3,007,483        3,007,483       0                0.00      2,094,683
1991/92   2,261,086  2,251,873  915,607    9,213      9,213                3,167,480        3,167,480       0                0.00      2,261,086
1992/93   2,362,393  2,283,026  994,974    79,367     79,367               3,278,000        3,278,000       0                0.00      2,362,393
1993/94   2,455,038  2,384,643  1,065,369  70,395     70,395               3,450,012        3,450,012       0                0.00      2,455,038
1994/95   2,493,649  2,514,977  1,044,041  −21,328    −21,328              3,559,018        3,559,018       0                0.00      2,493,649
Table 3b Did the balance of “[s]pecialist referring” and “episodes” adequately account for the change in size in England?
Waiting for admission to hospital
Columns: [3] E = 'Specialist referring to ... with no date' or 'with date'; [4] A = 'Total elective episodes'; [6] C^now = 'waiting list size'; [9] E − A = net in-flow; [10] C^now − C^then = change in stock; [11] C^then + E = counting dates of entry; [12] A + C^now = counting dates of exit; [13] error of closure, difference; [14] error of closure (%); Ê = estimate of E.
Year-end  E [3]      A [4]      C^now [6]  E − A [9]  C^now − C^then [10]  C^then + E [11]  A + C^now [12]  Difference [13]  (%) [14]  Ê
1989/90   3,361,737  3,336,009  912,800    25,728                                           4,248,809                                  *
1990/91   3,288,594  3,295,000  906,394    −6,406     −6,406               4,201,394        4,201,394       0                0.00      3,288,594
1991/92   3,684,057  3,674,844  915,607    9,213      9,213                4,590,451        4,590,451       0                0.00      3,684,057
1992/93   3,914,759  3,835,392  994,974    79,367     79,367               4,830,366        4,830,366       0                0.00      3,914,759
1993/94   4,065,606  3,995,211  1,065,369  70,395     70,395               5,060,580        5,060,580       0                0.00      4,065,606
1994/95   4,139,168  4,160,496  1,044,041  −21,328    −21,328              5,204,537        5,204,537       0                0.00      4,139,168
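The column headed Ê in Tables 3a and 3b can be reproduced by adding admissions to the change in stock. The sketch below is illustrative Python only; the helper name `estimate_enrolments` is hypothetical, and the figures are three rows of Table 3a. It also shows why counts rebuilt in this way close exactly: any estimate obtained by rearranging formula (3) satisfies formula (3) by construction, so a zero error of closure tells us nothing about the original counts.

```python
def estimate_enrolments(admissions, c_now, c_then):
    """Estimate enrolments by rearranging formula (3): E_hat = A + C_now - C_then."""
    return admissions + c_now - c_then


# (year, tabulated E, A, C_then, C_now) copied from Table 3a.
rows = [
    ("1990/91", 2_094_683, 2_101_089, 912_800, 906_394),
    ("1991/92", 2_261_086, 2_251_873, 906_394, 915_607),
    ("1992/93", 2_362_393, 2_283_026, 915_607, 994_974),
]

estimates = {}
closures = {}
for year, tabulated_e, a, c_then, c_now in rows:
    e_hat = estimate_enrolments(a, c_now, c_then)
    estimates[year] = e_hat
    # Any E obtained this way gives a closure difference of exactly zero.
    closures[year] = (c_then + e_hat) - (a + c_now)
    print(f"{year}: E_hat = {e_hat:,}, tabulated E = {tabulated_e:,}, "
          f"closure = {closures[year]}")
```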
‘admissions,’ and ‘size’ had enumerated the same lifelines and if ‘admission’ was the inevitable and, therefore, the only outcome of ‘enrolment,’ we might hope for a perfect correlation and for errors of closure of zero. But Hamblin et al. (1998) present counts obtained from Hospital Episode Statistics alongside counts from the KH07 return, i.e., counts of the number of episodes of investigation or treatment alongside counts of people awaiting admission, and they omit to report counts of “removals other than admissions” (CRIR 1998, 3 of KH06). We are told that “specialists . . . may either refer with a date for admission (these patients are known as ‘booked admissions’) or . . . without a date – the true ‘waiting list’ admissions” (Hamblin et al. 1998, 13). But the distinction between “booked admissions” and “waiting list admissions” was made by Hospital Episode Statistics among finished consultant episodes, and the distinction between those “with a date” and those “with no date” was made by the KH07 return in its count of the number of patients awaiting admission. The KH06 return, which counted the number of “decisions to admit” to the list (and the number “admitted” and the number of “removals other than admissions” from it), made no such distinction (CRIR 1998, 3 of KH06).
We know the authors were prepared to fill the gaps in their table by calculation because they indicate that they have done so for two of the eight items. The numbers in the column headed $\hat{E}$ (on the right-hand side of Tables 3a and 3b) are estimates obtained using formula (4): $\hat{E} = A + C^{\text{now}} - C^{\text{then}}$. (The reader can check these by adding the content of columns 4 and 10 in each row. We cannot estimate the number enrolled during 1989/90 without the size of the list at the start of that financial year.) We think that the numbers tabulated as “[s]pecialist referrals . . . with no date” and “[s]pecialist referrals . . . with a date” are estimates rather than counts. If this is correct, then the number of ‘enrolments’ presented in Tables 3a and 3b were obtained by assuming that the counts of ‘enrolments,’ ‘admissions,’ and ‘size,’ are perfectly consistent. The results therefore cannot be used to test whether this is true. At best, the table presented by Hamblin et al. (1998) provides an example which shows how the three counts ought to be related were the primary hypothesis true (Mason 1976; Fordham 1987). At worst, the table presented by Hamblin et al. (1998) invites readers to imagine that this is what actually happened to ‘enrolments’ when counts of finished consultant episodes and of patients awaiting admission varied in the manner indicated.
In Victoria, Australia
Street and Duckett (1996) report a study of the combined elective activity of surgeons at public hospitals dealing with patients in categories 1–3 in Victoria, Australia. The authors feared that an
increase in elective procedures would increase the size of the list (Street and Duckett 1996, 4). They use counts of “additions” and “deletions” to describe activity over a single period of 12 months duration (31 July 1993 to 31 July 1994).
Street and Duckett (1996, 12) claim that “hospitals have achieved waiting list reduction in the face of increases in the number of elective surgery patients: the number of additions to the list is . . . offset by increases in the number of patients . . . deleted from the list. . ..” They report that the number of category 1 patients waiting shrank from 1298 on 31 July 1993 to 195 on 31 July 1994 and that the number of category 2 patients waiting shrank from 12,115 on 31 July 1993 to 8506 on 31 July 1994 (Street and Duckett 1996), and they present an intuitively helpful plot of the number of “additions” to, and the number of “deletions” from, the surgical waiting list each month (31 December 1991 to 31 July 1994) (Street and Duckett 1996). This appears to describe the movement of people on and off the combined waiting list, although this is not clearly stated in the text.
It is true that the size of the list has diminished, despite more additions to the list (85,259, 1 Aug 1993–31 Jul 1994 incl.) than in the previous year (77,820, 1 Aug 1992–31 Jul 1993). But the published data permit only a single comparison, i.e., of the change in size between 31 Jul 1993 and 31 Jul 1994, with the difference in additions and deletions over the intervening period. It is therefore not possible to assess the strength of association between change in stock and net in-flow. The error of closure is small (335, or 0.29%, of those on the list at any point during the year).
The authors were unable to verify the number of additions and deletions we obtained from their plot (Street and Duckett 1996) 20 years after its publication but kindly volunteered the additional census counts reported in column 6 of Table 4. This allows us to describe elective activity over 32 periods each of one calendar month duration. The correlation between the change in size and the balance of enrolments and admissions was positive, strong, and statistically significant (Spearman’s rho = +0.99, n = 32, p < 0.01). But the count of dates of entry ($C^{\text{then}} + E$) did not equal the count of dates of exit ($A + C^{\text{now}}$). If we ignore the grossest error, a shortfall of −2002 cases (−6.70%) occurring in December 1991, the difference ranged from a shortfall of −105 (−0.29%) to a surplus of +109 (+0.32%) cases and was less than 0.20% in 28 (out of 32) instances.
If a measurement may be out by as much as 0.5 mm, then $E - A$ and $C^{\text{now}} - C^{\text{then}}$ may be out by as much as 1.0 mm and the error of closure by as much as 2.0 mm. 28 out of 32 errors cannot be attributed to this level of inaccuracy in reading the number of “additions” and “deletions” off a scale of 1 mm per 37 cases. While Street & Duckett’s data may not be entirely consistent with formula (3) and the primary hypothesis, the difference between “additions” and “deletions” accounts very well for the change in size.
In Winnipeg, Canada
Kreindler and Bapuji (2010) report a study of the elective replacement of hips and knees in Winnipeg, Canada. Winnipeg Regional Health Authority thought that an increase in elective procedures ought to reduce the size of the list (Kreindler and Bapuji 2010). Kreindler and Bapuji (2010) use counts of “arrivals” and “departures” to describe activity over 11 periods each of 3 months duration. They emulate Street and Duckett (1996) in presenting a similarly helpful plot of the number of “arrivals” and the number of “departures” during each quarter (31 Mar 2005–31 Mar 2008) (Kreindler and Bapuji 2010) alongside a plot of the number of joints still awaiting surgery at the close of each month (31 Jan 2005–31 Jan 2008) (Kreindler and Bapuji 2010). They appreciate that they ought to count the arrival and the departure of joints if they are interested in the number of joints requiring surgery (Table 5) or count the arrival and the departure of people if they are interested in the number of people awaiting surgery.
The correlation between $E - A$ and $C^{\text{now}} - C^{\text{then}}$ was positive, strong, and statistically significant (Spearman’s rho = +0.90, n = 11, p < 0.01). But the number of dates of entry ($C^{\text{then}} + E$) did not equal the number of dates of exit ($A + C^{\text{now}}$):
Table 4 Did the balance of “additions” and “deletions” adequately account for the change in size in Victoria, Australia?
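Waiting in Victoria, Australia
Columns: Year; Month-end; [3] E = no. of 'additions'; [4] A = no. of 'deletions'; [6] C^now = size of list; [9] E − A = net in-flow; [10] C^now − C^then = change in stock; [11] C^then + E = counting dates of entry; [12] A + C^now = counting dates of exit; [13] error of closure, difference; [14] error of closure (%).
Year Month-end E [3] A [4] C^now [6] E − A [9] C^now − C^then [10] C^then + E [11] A + C^now [12] Difference [13] (%) [14]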
1992 31-Dec 5,988 4,574 26,323 1,414 3,416 28,895 30,897 −2,002 −6.70
31-Jan 4,946 4,686 26,563 260 240 31,269 31,249 20 0.06
29-Feb 6,397 6,248 26,757 149 194 32,960 33,005 −45 −0.14
31-Mar 6,490 6,527 26,689 −37 −68 33,247 33,216 31 0.09
30-Apr 5,671 6,322 26,025 −651 −664 32,360 32,347 13 0.04
31-May 6,118 6,545 25,539 −427 −486 32,143 32,084 59 0.18
30-Jun 6,136 6,136 25,532 0 −7 31,675 31,668 7 0.02
31-Jul 6,545 6,025 26,098 520 566 32,077 32,123 −46 −0.14
1993 31-Aug 6,192 5,969 26,299 223 201 32,290 32,268 22 0.07
30-Sep 6,322 6,360 26,206 −38 −93 32,621 32,566 55 0.17
31-Oct 6,564 6,322 26,463 242 257 32,770 32,785 −15 −0.05
30-Nov 6,564 5,541 27,436 1,023 973 33,027 32,977 50 0.15
31-Dec 6,601 4,426 29,634 2,175 2,198 34,037 34,060 −23 −0.07
31-Jan 5,002 5,002 29,671 0 37 34,636 34,673 −37 −0.11
28-Feb 6,471 6,471 29,776 0 105 36,142 36,247 −105 −0.29
31-Mar 7,271 6,955 30,121 316 345 37,047 37,076 −29 −0.08
30-Apr 6,341 6,694 29,827 −353 −294 36,462 36,521 −59 −0.16
31-May 6,192 6,899 29,088 −707 −739 36,019 35,987 32 0.09
30-Jun 7,085 7,550 28,618 −465 −470 36,173 36,168 5 0.01
31-Jul 7,215 7,122 28,745 93 127 35,833 35,867 −34 −0.09
1994 31-Aug 6,917 7,847 27,740 −930 −1,005 35,662 35,587 75 0.21
30-Sep 7,494 7,810 27,391 −316 −349 35,234 35,201 33 0.09
31-Oct 6,843 7,140 27,113 −297 −278 34,234 34,253 −19 −0.06
30-Nov 7,178 7,736 26,549 −558 −564 34,291 34,285 6 0.02
31-Dec 7,029 6,360 27,164 669 615 33,578 33,524 54 0.16
31-Jan 5,839 6,285 26,678 −446 −486 33,003 32,963 40 0.12
28-Feb 7,252 7,940 25,881 −688 −797 33,930 33,821 109 0.32
31-Mar 7,903 7,959 25,850 −56 −31 33,784 33,809 −25 −0.07
30-Apr 6,583 7,308 25,093 −725 −757 32,433 32,401 32 0.10
31-May 7,624 7,921 24,776 −297 −317 32,717 32,697 20 0.06
30-Jun 7,512 8,014 24,271 −502 −505 32,288 32,285 3 0.01
31-Jul 7,085 7,308 24,041 −223 −230 31,356 31,349 7 0.02
Source: Street and Duckett 1996
the error of closure ranged from a shortfall of −62 (−1.70%) to a surplus of +62 (+1.58%) cases and was less than 1.00% in 7 (out of 11) instances.
Kreindler and Bapuji’s data is compatible with formula (3) and the primary hypothesis – an increase in “departures” (net of “arrivals”) may accompany a reduction in the size of the list. But we used a scale of 1 mm per 9.5 cases to estimate the size of the list and a scale of 1 mm per 6.5 cases to estimate the number of “arrivals” and “departures,” so nine out of 11 errors cannot be attributed to inaccuracy in reading the relevant plot.
When entry ($C^{\text{then}} + E$) and exit ($A + C^{\text{now}}$) dates are used to enumerate the same lifelines (Fig. 2), it is inconceivable that they give different counts. It is therefore reasonable to suspect the data when the counts appear inconsistent. Kreindler and Bapuji (2010, 76) recognized that their count of new “arrivals” might be considered inflated if admission was the only outcome of interest, so they calculated net “arrivals” (2005–2007) by deducting those “removed from the wait list without surgery” (2005–2007).
Kreindler and Bapuji (2010) may have deducted the number “removed” from the list during a 3-month period from the number known to have enrolled on the list in the same quarter. It is likely that some of those deducted in this fashion had enrolled earlier. If so, the net
Table 5 Was the change in size directly correlated with the balance of “arrivals” and “departures” in Winnipeg, Canada?
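Waiting in Winnipeg, Canada
Columns: Year; Month-end; [3] E = no. of 'arrivals'; [4] A = no. of 'departures'; [6] C^now = size of list; [9] E − A = net in-flow; [10] C^now − C^then = change in stock; [11] C^then + E = counting dates of entry; [12] A + C^now = counting dates of exit; [13] error of closure, difference; [14] error of closure (%).
Year Month-end E [3] A [4] C^now [6] E − A [9] C^now − C^then [10] C^then + E [11] A + C^now [12] Difference [13] (%) [14]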
2005 31-Jan 3,076
28-Feb 3,171
31-Mar 800 600 3,200 200 3,800
30-Apr 3,276
31-May 3,271
30-Jun 797 710 3,338 87 138 3,997 4,048 −51 −1.27
31-Jul 3,352
31-Aug 3,390
30-Sep 745 681 3,400 64 62 4,083 4,081 2 0.05
31-Oct 3,424
30-Nov 3,414
31-Dec 674 739 3,371 −65 −29 4,074 4,110 −36 −0.88
2006 31-Jan 3,352
28-Feb 3,271
31-Mar 679 892 3,190 −213 −181 4,050 4,082 −32 −0.79
30-Apr 3,133
31-May 3,062
30-Jun 769 868 3,029 −99 −161 3,959 3,897 62 1.58
31-Jul 3,043
31-Aug 2,957
30-Sep 769 816 3,024 −47 −5 3,798 3,840 −42 −1.10
31-Oct 2,995
30-Nov 2,957
31-Dec 677 790 2,881 −113 −143 3,701 3,671 30 0.81
2007 31-Jan 2,881
28-Feb 2,867
31-Mar 732 842 2,833 −110 −48 3,613 3,675 −62 −1.70
30-Apr 2,771
31-May 2,681
30-Jun 716 865 2,662 −149 −171 3,549 3,527 22 0.62
31-Jul 2,629
31-Aug 2,581
30-Sep 616 677 2,614 −61 −48 3,278 3,291 −13 −0.40
31-Oct 2,562
30-Nov 2,562
31-Dec 685 748 2,519 −63 −95 3,299 3,267 32 0.97
2008 31-Jan 2,500
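Two of the statements made about the Winnipeg counts can be checked against the quarterly rows of Table 5: the rank correlation between net in-flow and change in stock, and the number of closure differences that exceed what misreading the plots could explain. The sketch below is an illustration in Python, not the chapter's own procedure. The function names are hypothetical; Spearman's rho is computed as Pearson's r on average ranks, which is one standard way of handling ties; and the reading error of 0.5 mm per measurement is an assumption carried over from the discussion of the Victorian plot, with the four reading errors added in the worst case as that discussion does.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho computed as Pearson's r on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Quarterly values copied from Table 5 (30 Jun 2005 to 31 Dec 2007).
net_in_flow = [87, 64, -65, -213, -99, -47, -113, -110, -149, -61, -63]
change_in_stock = [138, 62, -29, -181, -161, -5, -143, -48, -171, -48, -95]
closure_difference = [-51, 2, -36, -32, 62, -42, 30, -62, 22, -13, 32]

rho = spearman_rho(net_in_flow, change_in_stock)

# ASSUMPTION: each reading may be out by 0.5 mm (the figure used for Victoria).
# E and A are read at 6.5 cases/mm; C_then and C_now at 9.5 cases/mm.
READING_ERROR_MM = 0.5
tolerance = READING_ERROR_MM * (6.5 + 6.5 + 9.5 + 9.5)
unexplained = [d for d in closure_difference if abs(d) > tolerance]

print(f"Spearman's rho = {rho:+.2f} (n = {len(net_in_flow)})")
print(f"tolerance = {tolerance} cases; {len(unexplained)} of "
      f"{len(closure_difference)} differences exceed it")
```

Under these assumptions the tolerance is 16 cases, and only the differences of 2 and −13 fall within it.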
The change in stock correlated perfectly with the net in-flow (Spearman’s rho = + 1.00, n = 64, p < 0.01) (Armstrong 2010). The number of dates of entry (Cthen + E) equals the number of dates of exit (A + Cnow), and there was no error of closure in any of the quarters studied.
It seems that the National Cataract Register for Sweden is entirely consistent with formula (3) and the primary hypothesis – the relationship between enrolments, admissions, and the size of the list was found to be mathematically exact. None of the numbers presented in columns 3, 4, and 6 of Table 6 were obtained by calculation. The count of enrolments was obtained by enumerating records with a start date in the period of interest, and the count of admissions was obtained by enumerating records with an end date in the relevant period. The count of those awaiting admission was obtained by enumerating records where the start date preceded, and where the end date succeeded, the date and time of the relevant census.
It is helpful, on this occasion, that the dataset registers extractions and is compiled retrospectively. It does not contain any record where a patient was removed from the list without having received treatment, and it does not contain any record where the outcome is not yet known. So if we want to know how many cataracts were enrolled during a particular quarter, or how many – at a specified date – were still awaiting extraction, we have to allow sufficiently lengthy follow-up to ensure that each of them received treatment. (Armstrong (2010) restricted his analysis to the set of cataracts extracted less than 2 years after enrolment.) But no count has to be adjusted in the manner described by Kreindler and Bapuji (2010) to exclude those removed from the list. As a result, the records are consistent with the model.

In England

The four studies cited here provide different compilations from the same series of counts. These counts were obtained from the Patient Administration System for each provider and used to complete a set of standard forms, which described the size of the list at the close of each quarter (the KH07) and the amount of activity over its course (the KH06 and KH07A). These central returns were collated by the Department of Health and used to produce aggregate counts for England.

Twelve Periods Each of 3 Months Duration
Newton et al. (1995) report a study of elective inpatient activity combined across NHS hospitals in England. The authors acknowledge that “studies . . . have so far failed to show a strong inverse correlation between admission rates and list size” (Newton et al. 1995, 784). Newton et al. (1995) describe activity over 12 periods each of 3 months duration using counts of additions and admissions from the KH06 return and counts of the number still waiting from the KH07 return. They report that “changes in the number of admissions correlated inversely with changes in list size (r = – 0.62; P < 0.001) . . . [a]fter adjusting for changes in the number of additions to lists” (Newton et al. 1995, 783). They obtain an inverse relationship because they model the effect on changes in size of admission (adjusting for enrolments) rather than the effect of enrolment (adjusting for admissions). The correlation is significant but not perfect, which means the errors of closure cannot be zero. Regrettably, the authors plotted the number of admissions and the number still waiting but not the number of additions, so we are not able to construct a suitable table for ourselves.
We think this result is due – at least in part – to a mismatch between their model and the records. The KH07 census counted some people who were subsequently removed from the list without having been admitted. Street and Duckett (1996) recognized that the size of their waiting list diminished as a result of deletion from the list, and they counted other reasons for deletion alongside treatment, but Newton et al. (1995) did not supplement their counts of admissions with the counts of other removals though these were also available from the KH06 return.
If we modify formula (3) to allow for an outcome other than admission, we obtain
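The displayed formula is not reproduced at this point in the text. Written out from the column headings of Table 3c and from the later restatement of formula (3.1), the modified balance would read as follows, where R (the count of other removals) is the added term; the second line is the same statement rearranged as dates of entry against dates of exit.

```latex
\begin{align}
E - (A + R) &= C_{\text{now}} - C_{\text{then}} \tag{3.1} \\
C_{\text{then}} + E &= A + R + C_{\text{now}} \notag
\end{align}
```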
Table 6 Did the balance of enrolments and admissions adequately account for the change in size in Sweden?
Table 6 (continued)
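The enumeration described for the Swedish register (counting start dates, end dates, and records that straddle a census) can be sketched as below. This is an illustration under stated assumptions: the six records are invented for the example, the function names `census` and `flows` are hypothetical, and a census is taken to fall at the close of the day named, so that a record starting on that day counts as waiting and a record ending on it does not.

```python
from datetime import date


def census(records, when):
    """Number still waiting at the close of `when`: started on or before it, ended after it."""
    return sum(1 for start, end in records if start <= when < end)


def flows(records, then, now):
    """Enrolments and admissions dated after `then` and on or before `now`."""
    enrolled = sum(1 for start, _ in records if then < start <= now)
    admitted = sum(1 for _, end in records if then < end <= now)
    return enrolled, admitted


# Hypothetical completed waits: (date of enrolment, date of admission).
records = [
    (date(2005, 1, 10), date(2005, 4, 2)),
    (date(2005, 2, 20), date(2005, 3, 15)),
    (date(2005, 3, 30), date(2005, 7, 1)),
    (date(2005, 4, 12), date(2005, 6, 20)),
    (date(2005, 5, 5), date(2005, 9, 9)),
    (date(2005, 6, 1), date(2005, 8, 15)),
]

then, now = date(2005, 3, 31), date(2005, 6, 30)
c_then, c_now = census(records, then), census(records, now)
enrolled, admitted = flows(records, then, now)

# With complete records and unbroken waits the balance closes exactly.
print(enrolled - admitted == c_now - c_then)
print(c_then + enrolled == admitted + c_now)
```

Because every record carries both dates and every wait ends in admission, each of the three counts is obtained by enumeration alone and the dates of entry must equal the dates of exit.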
Table 3c Did the balance of decisions to admit and of “admissions” and “removals” adequately account for the change in
size in England?
Waiting for Admission (in-patient or day case)
Columns, left to right: Year; Month-end; [3] No. of 'decisions-to-admit' (E); [4] No. of elective 'admissions' (A); [5] No. of other 'removals' (R); [6] Size of waiting list (Cnow); [9] Net in-flow, E − (A + R); [10] Change in stock, Cnow − Cthen; [11] Counting dates of entry, Cthen + E; [12] Counting dates of exit, A + R + Cnow; [13] Difference; [14] Error of closure (%).
2000 31-Dec 1,034,381
2001 31-Mar 992,918 872,188 172,696 1,006,727 −51,966 −27,654 2,027,299 2,051,611 −24,312 −1.19
Table 3d Did the balance of decisions to admit and of admissions and removals adequately account for the change in
size in England?
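Columns, left to right: Year ending; [3] No. of decisions to admit (E); [4] No. of admissions (A); [5] No. of removals (R); [6] Size of list (Cnow); [9] Net in-flow, E − (A + R); [10] Change in stock, Cnow − Cthen; [11] Counting dates of entry, Cthen + E; [12] Counting dates of exit, A + R + Cnow; [13] Difference; [14] Error of closure (%).
The first row (31-Mar-89) has no preceding census, so columns [10], [11], [13], and [14] are blank there and its final figure belongs to column [12].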
31-Mar-89 2,783,298 2,632,085 200,677 922,676 −49,464 3,755,438
31-Mar-90 2,943,658 2,768,482 260,503 958,976 −85,327 36,300 3,866,334 3,987,961 −121,627 −3.10
31-Mar-91 2,964,836 2,761,005 306,899 948,243 −103,068 −10,733 3,923,812 4,016,147 −92,335 −2.33
31-Mar-92 3,257,615 2,993,532 387,980 917,717 −123,897 −30,526 4,205,858 4,299,229 −93,371 −2.20
31-Mar-93 3,480,268 3,111,627 412,299 994,974 −43,658 77,257 4,397,985 4,518,900 −120,915 −2.71
31-Mar-94 3,501,715 3,110,477 451,559 1,065,369 −60,321 70,395 4,496,689 4,627,405 −130,716 −2.87
31-Mar-95 3,765,407 3,376,016 521,320 1,044,051 −131,929 −21,318 4,830,776 4,941,387 −110,611 −2.26
31-Mar-96 3,968,825 3,500,353 547,863 1,048,029 −79,391 3,978 5,012,876 5,096,245 −83,369 −1.65
31-Mar-97 4,111,511 3,549,074 551,999 1,158,004 10,438 109,975 5,159,540 5,259,077 −99,537 −1.91
31-Mar-98 4,192,037 3,543,634 558,242 1,297,662 90,161 139,658 5,350,041 5,399,538 −49,497 −0.92
31-Mar-99 4,189,323 3,826,507 672,432 1,072,860 −309,616 −224,802 5,486,985 5,571,799 −84,814 −1.53
31-Mar-00 4,159,078 3,682,180 622,787 1,037,066 −145,889 −35,794 5,231,938 5,342,033 −110,095 −2.08
31-Mar-01 3,935,930 3,467,338 613,931 1,006,727 −145,339 −30,339 4,972,996 5,087,996 −115,000 −2.29
31-Mar-02 3,781,437 3,244,185 581,534 1,035,365 −44,282 28,638 4,788,164 4,861,084 −72,920 −1.51
31-Mar-03 3,778,390 3,330,981 601,353 992,075 −153,944 −43,290 4,813,755 4,924,409 −110,654 −2.27
31-Mar-04 3,802,744 3,391,644 621,345 905,753 −210,245 −86,322 4,794,819 4,918,742 −123,923 −2.55
31-Mar-05 3,787,713 3,390,694 612,004 821,722 −214,985 −84,031 4,693,466 4,824,420 −130,954 −2.75
31-Mar-06 4,031,519 3,577,104 613,626 784,572 −159,211 −37,150 4,853,241 4,975,302 −122,061 −2.48
31-Mar-07 4,154,486 3,746,666 613,886 700,624 −206,066 −83,948 4,939,058 5,061,176 −122,118 −2.44
31-Mar-08 4,355,950 4,043,307 646,394 531,520 −333,751 −169,104 5,056,574 5,221,221 −164,647 −3.20
31-Mar-09 4,979,682 4,418,090 647,550 565,954 −85,958 34,434 5,511,202 5,631,594 −120,392 −2.16
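The comparison of directions that the text draws from Table 3d, and summarises in Table 7, can be sketched as follows. The pairs are columns [9] and [10] of Table 3d for the 20 years ending 31-Mar-90 to 31-Mar-09. The function names are hypothetical, and the reduction in prediction error is computed here as a lambda-type proportional reduction in error, which is an assumption about how the reported figure of 25% was obtained.

```python
# (net in-flow, change in stock) for the years ending 31-Mar-90 to 31-Mar-09, Table 3d.
pairs = [
    (-85327, 36300), (-103068, -10733), (-123897, -30526), (-43658, 77257),
    (-60321, 70395), (-131929, -21318), (-79391, 3978), (10438, 109975),
    (90161, 139658), (-309616, -224802), (-145889, -35794), (-145339, -30339),
    (-44282, 28638), (-153944, -43290), (-210245, -86322), (-214985, -84031),
    (-159211, -37150), (-206066, -83948), (-333751, -169104), (-85958, 34434),
]


def sign(value):
    return "+" if value > 0 else "-"


def direction_table(pairs):
    """Cross-tabulate the sign of the change in stock (row) by the sign of net in-flow (column)."""
    table = {(row, col): 0 for row in "+-" for col in "+-"}
    for net_inflow, change in pairs:
        table[(sign(change), sign(net_inflow))] += 1
    return table


def reduction_in_error(table):
    """Proportional reduction in errors of prediction when the column is known."""
    row_totals = [table[(row, "+")] + table[(row, "-")] for row in "+-"]
    errors_without = sum(row_totals) - max(row_totals)  # always guess the commoner row
    errors_with = sum(
        sum(table[(row, col)] for row in "+-") - max(table[(row, col)] for row in "+-")
        for col in "+-"
    )
    return (errors_without - errors_with) / errors_without


table = direction_table(pairs)
print(table)
print(reduction_in_error(table))
```

Guessing a decrease every year is wrong on 8 of 20 occasions; using the direction of net in-flow is wrong on 6, which is the reduction from eight errors to six that the text reports.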
any discrepancy between the two figures and why the Department of Health concurred (CRIR 1997).
The NAO’s data is compatible with formula (3) and the primary hypothesis – an increase in “admission” (plus “removal”) net of enrolment may accompany a reduction in the size of the list.

Twenty Periods each of 12-months Duration
The House of Commons Health Committee (2010) published an extended series of counts obtained from the Department of Health. It uses counts of decisions to admit and of the number admitted or removed from the list to describe activity over 20 periods each of 12-months duration (Table 3d). These counts were obtained from the same returns used by the NAO (2001a).
The correlation between E − (A + R) and Cnow − Cthen was positive, strong, and statistically significant (Spearman’s rho = + 0.97, n = 20, p < 0.01). But the number of dates of entry (Cthen + E) did not equal the number of dates of exit (A + R + Cnow): the discrepancy ranges from –164,647 (–3.20%) to
316 P. W. Armstrong
Table 7 Does the balance of enrolments and admissions (plus other removals) correctly predict the direction of any
change in the size of the list?
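Rows give the direction of the change in stock (Cnow − Cthen); columns give the direction of the net in-flow; the last figure in each row is the row total. The three panels appear in the order printed.

                   Net in-flow            Net in-flow            Net in-flow
                   +     −   Total        +     −   Total        +     −   Total
Cnow − Cthen  +   11     2     13         2     6      8         2     5      7
Cnow − Cthen  −    0    19     19         0    12     12         0     2      2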
–49,497 (–0.92%) patients. The counts systematically overestimate the number of exits from the English waiting list (or systematically underestimate the number of entries on it).
Had we predicted that the size of the list would shrink, we would have been mistaken only eight times out of 20 (Table 7). Had we used net in-flow to predict the direction of change in stock, we would have predicted an increase on two occasions and a decrease on 18, i.e., we would have been mistaken on six out of 20 occasions. This reduction in the error of prediction of 25% (LB = 0.25) is not significant. So the direction of any change in size appears to have had little to do with the efforts made during the course of the year. Results such as this might go some way to explaining the frustration of at least one former Minister of Health (Powell 1966).
The six exceptions in this data might be thought consistent with hypotheses of self-regulation and of supplier-induced demand – the size of the list showed an increase when it ought to have shown a decrease. But it should be noted that the exceptions in the data provided by Street and Duckett (1996) occur only when E = A, i.e., when E − A = 0, and that there are no exceptions in the data presented by other researchers (White 1980; Moral and de Pancorbo 2001; Kreindler and Bapuji 2010; Armstrong 2010), i.e., the direction of net in-flow (E − A) perfectly predicts the direction of any change in size (Cnow − Cthen). More importantly, exceptions (Street and Duckett 1996; House of Commons Health Committee 2010) are observed only because the number of dates of entry does not equal the number of dates of exit in the KH06 and KH07 returns.
The Health Committee’s data is compatible with formula (3) and the primary hypothesis – an increase in admission (plus removal) net of enrolment may accompany a reduction in the size of the list.

Nine Periods each of 6-months Duration
Armstrong (2000) reports a study of elective inpatient and day-case activity combined across NHS hospitals in England. He describes nine periods each of 6-months duration using counts of decisions to admit and of the number “admitted” or “removed”, who “self-deferred”, “failed to attend”, or were “suspended”. These counts were obtained from the same returns used by the NAO (2001a) and by the House of Commons Health Committee (2010).
In Table 3e, the change in size is always more positive than the net in-flow by between 68,237 and 32,115 patients, so the error of closure ranged between 2.27% and 1.15%. Armstrong asserts that “[t]he number of patients waiting at the start of a calendar period of interest or who counted as new ‘decisions-to-admit’ or as those ‘reset-to-zero’ or ‘reinstated’ during it, must be reconciled with the numbers admitted, removed, self-deferred, failed, medically deferred or suspended during the calendar period of interest or still awaiting admission at its close” (Armstrong 2000, 2043). But he was unable to account for this discrepancy by allowing for other flows “onto and off of the waiting list” for which there were data, i.e., those who were suspended from the list, those who canceled arrangements for their own admission or who simply did not attend, those who were reinstated to the list, and those whose start date was reset to zero (Armstrong 2000, 2043–2045).
The correlation between E − (A + R) and Cnow − Cthen was positive, strong, and statistically
Table 3e Did the balance of “decisions to admit” and of “admissions” and “removals” adequately account for the change in size in England?
Columns, left to right: Year; Censused 30 June (Cthen); [3] 'Decisions-to-admit' (E); 'Reset-to-zero' (*); 'Reinstated' (†); [4] Admitted (A); [5] Removed (R); Self-deferred; Failed; Medically deferred; Suspended; [6] Censused 31 December (Cnow); [9] Net in-flow, E − (A + R); [10] Change in stock, Cnow − Cthen; [11] Counting dates of entry, Cthen + E; [12] Counting dates of exit, A + R + Cnow; [13] Difference; [14] Error of closure (%).
1988 878,306 1,389,133 298,687 - 1,286,087 95,431 95,508 203,179 - - 931,495 7,615 53,189 2,566,126 2,611,700 −45,574 −1.8
1989 922,877 1,446,243 307,945 - 1,323,492 122,104 99,189 208,756 - - 971,845 647 48,968 2,677,065 2,725,386 −48,321 −1.8
1990 955,786 1,485,021 210,352 - 1,373,394 154,738 101,028 109,324 - - 965,520 −43,111 9,734 2,651,159 2,704,004 −52,845 −2.0
1991 964,050 1,614,328 190,474 - 1,463,869 196,526 93,065 97,409 - - 950,098 −46,067 −13,952 2,768,852 2,800,967 −32,115 −1.2
1992 937,054 1,748,716 204,380 - 1,553,237 202,358 111,373 93,007 - - 977,189 −6,879 40,135 2,890,150 2,937,164 −47,014 −1.6
1993 1,019,341 1,731,690 225,203 - 1,531,449 222,034 133,802 91,401 - - 1,065,785 −21,793 46,444 2,976,234 3,044,471 −68,237 −2.3
1994 1,077,497 1,861,754 257,577 50,008 1,665,747 251,393 160,343 97,234 - 50,008 1,070,492 −55,386 −7,005 3,246,836 3,295,217 −48,381 −1.5
1995 1,052,958 1,972,067 288,143 92,966 1,739,917 273,491 182,723 105,420 - 92,966 1,054,948 −41,341 1,990 3,406,134 3,449,465 −43,331 −1.3
1996 1,056,122 2,067,520 306,572 123,383 1,799,013 273,861 193,345 113,227 - 123,383 1,104,984 −5,354 48,862 3,553,597 3,607,813 −54,216 −1.5
1997 1,207,515 ‡ ‡ ‡ ‡ ‡ ‡ ‡ - ‡ 1,261,915 54,400
Note: The numbers in italics contribute nothing to the difference between those becoming eligible for admission, and those becoming ineligible, so the error of closure is really a
comparison of Cthen + E and A + R + Cnow
Adapted from Armstrong (2000)
a Estimated as the number who self-deferred or failed-to-attend for admission to hospital that quarter
b Estimated as the number temporarily suspended or deferred on medical grounds that quarter
c The quarterly counts were not collected in 1997/98
consistent with periodic censuses of population. It ought to be relatively easy to obtain counts of enrolments, admissions, and removals which are consistent with periodic censuses of the list.
We can cross-examine the paper, or digital, records rather than the individuals they represent: the records are retrieved and dismissed at the researchers’ convenience, and their details are always available for inspection and analysis. It ought therefore to have been possible to report an error of closure of 0% under the Körner Reporting System (CRIR 1997) which counted relevant records from a hospital’s Patient Administration System. One part of the error observed was due to the use of inconsistent definitions (Newton et al. 1995), another was due to incomplete flows (NAO 2001a), and still another was, we think, the result of allowing the data model to become too elaborate (Armstrong 2000). If our systems do not allow us to identify who was eligible for admission in the period between two censuses, and if they do not allow us to demonstrate that there are as many dates of exit for this set of records as there are dates of entry, the apparent complexity of the waiting list is a problem of our own making.

The Balance of Enrolments and Admissions (Plus Other Removals) Equals the Change in Size. Why?

We attribute our success (Armstrong 2010) in demonstrating this relationship to two things.

If the Model Is Not Complicated, the Data Must Be Simple!

The first is our assumption that each wait started and ended on the start and end dates of the record. This implies (a) that the dataset is complete, i.e., that no record was omitted, and (b) that both dates were entirely accurate. It also implies (c) that everyone, having once enrolled, was eventually admitted and (d) that no wait was ever broken. Items (c) and (d) are implied by the data definitions and tables of Working Group A (DHSS 1981b). In other words, we assumed that the waiting list had all of the attributes implied by our use of these two variables. (We did not modify the size of the list, deducting any patient who was suspended or deferred at that point; and we did not modify the length of wait, deducting any period when a patient was considered to be unfit or thought to be unavailable (Armstrong 2010)).
Like ourselves, other researchers are obliged to assume that the data are complete (or are at least representative) and that the data are accurate (or are at least not distorted) if they wish to proceed with their enquiries. Our success seems to suggest that the difficulties experienced by others (Armstrong 2000; NAO 2001a) may be due to a mismatch between the model and the data. The dataset is simple (IMG 1992); the data model is elaborate.
So when the Steering Group on Health Services Information (1984) proposed what became the KH06 and KH07 returns, they envisaged that patients would join the list as the result of a ‘decision to admit’ authorized by a clinician (Steering Group 1984, 85; IMG 1992, 5/3 & 5/8) and that patients would leave the list either as the result of “hav[ing] been admitted” (Steering Group 1984, 85) or as the result of “no longer needing to be admitted” (Steering Group 1984, 86). The only complication which seems to have been envisaged relates to those patients whose arrangements for admission miscarry. These fall into four categories: (a) those who did not attend, i.e., who neither declined the arrangement in advance nor presented themselves on the day, (b) those who deferred admission by contacting the hospital in advance, (c) those whose admission was canceled by the hospital, and (d) those who were admitted but subsequently discharged without having undergone investigation or treatment.
The Working Group recommended that information be collected about the “[n]umber of patients for whom arrangements to admit were made but [who] were not admitted” (DHSS 1981b, 125), i.e., it did not distinguish between the first, second, and third categories. The Steering Group recommended that information be collected about the “[n]umber of patients . . .
who were not admitted because they failed to attend” (Steering Group 1984, 87), i.e., it did not distinguish between the first and second categories. But the Steering Group also recommended that information be collected about the “[n]umber of patients for whom . . . admission did not take place because of cancellation by the hospital” (Steering Group 1984, 87), i.e., about the third category. But the KH07A return, developed in the 6 months prior to implementation of the system (DHSS 1986), asked for counts of the number of patients who deferred their own admission (the second category) rather than counts of the number whose admission was canceled by the hospital.
The earliest version of the KH06 reported four “events occurring during the quarter” (DHSS 1986, 4) namely, the “decisions to admit” which marked addition to the list and three mutually exclusive outcomes which marked subtraction from it. It was anticipated that a patient might be admitted from the waiting list to undergo investigation or treatment on an elective basis prior to discharge, that a patient might not be admitted although arrangements for this had been made, or else that a patient might be removed from the waiting list as no longer requiring the elective admission intended.
The three outcomes were subsequently defined by the Data Manual (version 1.0) so as to subsume other possibilities:

1. Patients who were admitted as emergencies were not to be counted as having been admitted from the waiting list as arranged (IMG 1992). Rather, they were to be counted as having been removed from the waiting list as no longer requiring elective admission (CRIR 1997).
2. Patients, who were admitted from the waiting list but were then discharged from hospital without undergoing the investigation or treatment planned, were not to be counted as having been admitted from the waiting list as arranged (IMG 1992).
3. Patients who were not admitted from the waiting list because the arrangement had been canceled by the hospital were not to be counted as “not admitted” (IMG 1992, para. 41; CRIR 1997).
4. Patients who were not admitted from the waiting list because they declined an offer or canceled an arrangement were also not to be counted as “not admitted” (IMG 1992, para. 41; CRIR 1997).

The instructions assert that “patients should only be taken off the elective admission list when they have been treated – unless the treatment is no longer required” (IMG 1992, para. 41), as though this had always been self-evident. But the examples given seem to suggest that practice was in need of correction. “Patients should not be removed from the waiting list, because of self-deferrals or deferral by the hospital. For example, a patient admitted but sent home because treatment has been deferred . . . should not be removed from the elective admission list” (IMG 1992, para. 41). Those who “failed to arrive” (IMG 1992, para. 48) are carefully distinguished from “self-deferred admissions . . . or admissions cancelled by the hospital” (IMG 1992, para. 87). They have neither been admitted from the waiting list as arranged nor removed from the waiting list as no longer requiring elective admission. So they appear to constitute a third class of event in the earliest version of the return in addition to the two expressly authorized.
The waiting list envisaged by the Steering Group on Health Services Information (1984) appears to have been one in which the arrangement of admission fulfilled the hospital’s entire responsibility to the patient. Such a view seems scarcely credible and therefore needs to be substantiated:

• Some of the instructions in the Data Manual (version 1.0) seem to confirm such an attitude toward the patient. So if a patient “failed to arrive” without giving notice of her intentions, her details are to be returned to the GPFH who will determine whether she requires a fresh referral, another consultation, and a new decision to admit (IMG 1992, para. 71; CRIR 1997). But the patient who declines an offer or cancels an arrangement in good time receives a degree of consideration. She is counted as waiting “with [a] date” until the intended
admission has passed, and she is then given a start date the same as that on which she ought to have left the list (IMG 1992). In other words, the hospital authorizes the patient’s return to the list without forwarding her details to the GPFH, waiting for another letter of referral, and organizing a fresh consultation in due course. The consideration extended to the exception – the patient who self-deferred admission – seems to confirm the rule about the patient who gave no warning but failed to attend.
• This attitude also seems to be confirmed by instructions in the Data Manual (version 4.0) about patients discharged without having been investigated or treated. “Patients are taken off the elective admission list once they are admitted into hospital. If treatment is then deferred because of lack of facilities or for medical reasons . . . the patient is discharged . . . A new decision to admit and a new elective admission list entry will then be made for the patient” (CRIR 1997, 16). So the wait is considered to be completed upon admission regardless of what happens next (CRIR 1998), and the patient who has not received the elective investigation or treatment promised will need “[a] new decision to admit and a new [entry on the] elective admission list” if she wishes to try again. The size of such a list shrinks not only as a result of admissions which are followed by investigation and treatment but also as a result of admissions which are not.
• The Working Group recommended that “waiting lists [be] regularly reviewed to remove patients no longer needing or wishing to be admitted” (DHSS 1981b, 127), acknowledging that some would never be admitted from the waiting list. But it did not recommend counting the “[n]umber of patients . . . removed from a list for reasons other than elective admission” (Steering Group 1984, 87). This suggests that the Working Group felt no responsibility toward those removed. One of the members of the group expressed an appropriate concern that the number of those still waiting should not be exaggerated by including anyone no longer eligible for elective admission. He noted that they “no longer need . . . or wish . . . to be admitted” at the time of the review. But he thought their eventual removal from the list implied that they were never really available for admission. He infers that they were not eligible at the time of any census in which they appeared and that the decision to admit ought never to have been authorized (Lee et al. 1987). He recommends deducting their contribution to counts of decisions to admit and of the numbers still waiting. We think this view seriously flawed. He rejects the possibility that these patients could have received investigation or treatment had it been made available more promptly.

There are grounds therefore for imagining that the balance of decisions to admit less the three outcomes (KH06) ought to have accounted for differences between the number waiting (KH07) at the close of this quarter and the number waiting at the close of the last quarter in the earliest days of the Körner Reporting System. If this were the case, the simplest model would require the insertion of an additional variable in formula (3.1) so that E − (A + N + R) = Cnow − Cthen, where N represents the number “not admitted” during the interval between Cnow and Cthen.
Table 3f allows us to assess the consistency of these counts. The correlation between E − (A + N + R) and Cnow − Cthen was strong, but it was not statistically significant (Spearman’s rho = 0.96, n = 4, p = 0.20), and it did not have the direction desired: the net in-flow indicates that the size of the list was getting smaller, while the change in stock indicates that the size of the list was getting bigger. (The counts of stock (KH07) and flow (KH06) do not appear to describe the same waiting list.) There was a substantial error of closure ranging from 10.40% to 4.90% of those eligible for admission at any point over the relevant 6 months.
The discrepancy in Table 3f might be explained in a number of ways. Apart from simple underreporting of the number of patients added to the list or overreporting of the numbers admitted from the list or removed, this might occur where individuals are reported as contributing more than
Table 3f Did the balance of “decisions to admit” and of those “admitted,” “not admitted,” or “removed” adequately
account for the change in size in England?
Waiting for Admission (in-patient or day case)
Columns, left to right: Year; No. of 'decisions-to-admit' (E); No. who were 'admitted' (A); No. who were 'not admitted' (N); No. who were 'removed' (R); Size of list at 31-Dec (Cnow); Size of list at 30-Jun (Cthen); [9] Net in-flow, E − (A + N + R); [10] Change in stock, Cnow − Cthen; [11] Counting dates of entry, Cthen + E; [12] Counting dates of exit, A + N + R + Cnow; [13] Difference; [14] Error of closure (%).
Column numbers as printed: [3] [4] [5] [7] [8] [9] [10] [11] [12] [13] [14]
1988 1,389,133 1,286,087 203,179 95,431 931,495 878,306 −195,564 53,189 2,267,439 2,516,192 −248,753 −10.40
1989 1,446,243 1,323,492 208,756 122,104 971,845 922,877 −208,109 48,968 2,369,120 2,626,197 −257,077 −10.29
1990 1,485,021 1,373,394 109,324 154,738 965,520 955,786 −152,435 9,734 2,440,807 2,602,976 −162,169 −6.43
1991 1,614,328 1,463,869 97,409 196,526 950,098 964,050 −143,476 −13,952 2,578,378 2,707,902 −129,524 −4.90
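The effect of counting, or omitting, the “not admitted” can be checked directly from the four rows of Table 3f. The sketch below recomputes the error of closure with N included (as in Table 3f) and with N omitted (the two-outcome balance of formula (3.1)). The function name `percent_error` is hypothetical, and the percentage is again taken against the mean of the two counts, an assumption inferred from the tabulated values.

```python
# Rows of Table 3f: year, E, A, N, R, Cnow (31-Dec), Cthen (30-Jun).
rows = [
    (1988, 1389133, 1286087, 203179, 95431, 931495, 878306),
    (1989, 1446243, 1323492, 208756, 122104, 971845, 922877),
    (1990, 1485021, 1373394, 109324, 154738, 965520, 955786),
    (1991, 1614328, 1463869, 97409, 196526, 950098, 964050),
]


def percent_error(c_then, enrolled, exits, c_now):
    """Error of closure (%), assuming the mean of entries and exits as denominator."""
    entries = c_then + enrolled
    departures = exits + c_now
    return round(100 * (entries - departures) / ((entries + departures) / 2), 2)


# Three outcomes end enrolment: admitted, not admitted, removed.
with_n = [percent_error(c_then, e, a + n + r, c_now) for _, e, a, n, r, c_now, c_then in rows]
# Two outcomes end enrolment: admitted, removed.
without_n = [percent_error(c_then, e, a + r, c_now) for _, e, a, n, r, c_now, c_then in rows]

for (year, *_), three, two in zip(rows, with_n, without_n):
    print(year, three, two)
```

Under these assumptions the three-outcome errors reproduce column [14] of Table 3f, and the two-outcome errors fall within the range of 2.14% to 1.24% that the text gives for Table 3g.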
one outcome but no more than one decision to admit. For example, where a patient is transferred from a list at another hospital and is duly admitted or removed without a local decision to admit having been made (IMG 1992). Or where a patient is removed from the list as not medically fit for elective admission (CRIR 1997) and is subsequently reinstated without a fresh decision to admit having been made (IMG 1992).
Other possibilities are more complicated and appear to be capable of accounting only for a part of the problem. So if a patient is temporarily suspended from the list on medical grounds at the close of a quarter, he will either be omitted from the decisions to admit over that quarter or else be omitted from those still waiting at its close. In the first instance, there will appear to have been fewer dates of entry (column 11, Table 3f) to the period of interest and the reported difference (column 13) in counts of dates of entry and dates of exit and the error of closure (column 14) – being negative – will appear larger. In the second, there will appear to have been fewer dates of exit (column 12) from the period of interest and the reported difference (column 13) in counts of dates of entry and dates of exit and the error of closure (column 14) – being negative – will appear smaller.
The Data Manual presents a complicated series of rules about what parts of which records contribute data on the official wait for elective admission. But version 1.0 also asserts that “patients should only be taken off the elective admission list when they have been treated – unless the treatment is no longer required” (IMG 1992, para. 41). Version 4.0 claims that “[t]he . . . KH06 . . . relate[s] to elective admission list events – all the additions to the waiting list (i.e., the number of decisions to admit) and removals from the waiting list that have taken place during the quarter” and also asserts that “[t]he change in the total numbers waiting should reflect this activity” (CRIR 1997, para. 144). Despite the fact that “failed to attend” is classed as an event on the KH06 return (CRIR 1997, para. 148), the simplest explanation for the discrepancy within Table 3f is that there are two outcomes which end enrolment not three. We obtain a better account of the stock and flow of the English waiting list if we omit the “failed to attend” (Table 3g).
Table 3g shows the consistency of the counts if the relationship is, in practice, best described by formula (3.1). The correlation between E − (A + R) and Cnow − Cthen was perfect and had the direction desired, but it was not statistically significant (Spearman’s rho = + 1.00, n = 4, p = 0.20). There was a small error of closure ranging from 2.14% to 1.24% of those eligible for admission at any point over the relevant 6 months.
This is a little disconcerting. The data model used in practice appears to be simpler (CRIR 1997) than the Data Manual would have us believe. Within a short time of implementation, the Government Statistical Service began to modify the KH06, KH07, and KH07A returns. Now we sympathize with the performance analyst who wishes to restrict attention to that part of the list, and that portion of the wait, for which a manager (or a clinician) might reasonably be held responsible. But we think the returns were changed without considering the effect on the consistency of the counts.
Neither the DHSS (1981b), nor the Steering Group (1984), nor the authors of the first set of
Table 3g Did the balance of “decisions to admit” and of those “admitted” or “removed” adequately account for the
change in size in England?
returns (DHSS 1987) mention the possibility of suspension from the waiting list either on medical grounds or for social reasons. But version 1.0 of the Data Manual instructed the NHS to suspend from the list those “patients who are not medically ready for admission” (IMG 1992, 16), and version 4.0 of the Data Manual advised the NHS that this was consistent with the practice of not adding patients to the list until they are “likely to be fit for surgery when offered” (CRIR 1997, 17). Version 4.0 also advised the NHS that “[p]atients may also be suspended from [a] . . . list for social reasons such as holidays or family commitments which may be notified in advance” (CRIR 1997, 17).

The IMG (1992, 9 & 18) asserted that “[p]atients who are currently not medically ready should not be included in the national returns” and emphasized that “patients . . . who are not medically ready for admission are excluded from all waiting list central returns.” Now counts of enrolments, admissions, and size ought to be consistent if each of them exclude all of those removed from the list (Lee et al. 1987; Kreindler and Bapuji 2010). In the same way, counts of enrolments, admissions (plus removals), and size ought to be consistent if each of them exclude all of those ever temporarily suspended from the list. But these are patients whose admission to the list was authorized because they were thought “likely to be fit for surgery when offered.” It is likely therefore that counts of decisions to admit enumerated some who were subsequently excluded from a census, so the size of the list is too small for the number enrolled. Moreover, (most of) those excluded from the census because they were not medically ready will be reinstated to the list and will subsequently contribute to the relevant count of admissions or removals, so the size of the list is also too small for the number admitted or removed. The publication of well-worded definitions may have improved the consistency of meaning attached to the various items, and the suspension of some (IMG 1992) who were not medically ready may have improved the homogeneity of the group requiring investigation or treatment. But omitting those reinstated during the quarter, and those suspended at its close, did not improve the consistency of counts of enrolments and admissions (plus removals), with size.

Insistence on a model with more carefully specified outputs ought to have prompted the development of a dataset with more carefully defined classes and counts. The National Audit Office (2001a, 21) reports the Department of Health as “acknowledging that they do not measure every flow onto and off of the waiting list.” But insistence on a model which introduces a break anywhere between the beginning and end of the patient’s time on the list demands another level of complexity from the dataset.

• In some instances, the wait continues to accrue. The patient who is suspended from the list on medical grounds becomes invisible to enumeration in the census, but there is no outcome or end date before the census to account for the disappearance, and there is no start date or reinstatement after the census to account for the reappearance (IMG 1992). The effect on the Körner Reporting System is to make the counts of stock and flow less consistent. (The Data Manual (version 4.0) acknowledges the problem. The number of patients suspended
from the list – on social grounds – is to be added back to the number still waiting before assessing whether the counts are consistent (CRIR 1997).)
• In other instances, the accumulated wait is discounted. The patient who declines an offer or cancels an arrangement accrues time on the list until the date offered or arranged. This then becomes the effective date of the patient’s addition to the list and the wait accumulated to date is reset to zero. But no outcome marks the end of the first wait, and no decision to admit marks the beginning of the second, so there is no record of flows which can account for the changes within the relevant waiting time categories.

A more elaborate definition of the wait for investigation or treatment requires a more complicated dataset, with additional variables to provide a start date and an end date for the latest of those occasions on which the patient is classed as “not being medically ready” (IMG 1992, 9). A still more elaborate definition requires a still more complicated dataset, with variables to provide start dates and end dates for each occasion on which the patient is suspended (CRIR 1997) and for each occasion (first, second, etc.) when a patient deferred admission. But what has not been recognized is that the occurrence of a break between enrolment and admission (or removal) has to be accounted for by flows other than enrolment and admission (or removal). The definitions adopted under the Körner Reporting System soon became so complicated that there were not variables enough to represent all of those thought to be eligible, or ineligible, for admission over a period of interest (Armstrong 2000; NAO 2001a). Some patients who had been temporarily suspended as “not medically ready” (IMG 1992, 9) were subsequently removed from the list without having first been reinstated (CRIR 1997). Data definitions have sometimes become so elaborate that it has proven impossible to reconstruct the state of the records as they stood on a particular date, even with the most up-to-date versions of the relevant software (Farquharson 2011). We think this reprehensible. The result is a list in which two counts – ostensibly of the same thing – give different answers (Armstrong 2000; NAO 2001a) and in which a simple relationship has been made to appear complicated. If the dataset is to be used to develop insight as well as to manage performance, then it must satisfy the requirements of researchers as well as those of analysts.

The Number of ‘Starts’ and ‘Stops’ Must Be the Same

The second reason for our success is that a simple relationship exists.

We identify all of those waiting – at a given moment – to be admitted for elective investigation or treatment, and we conduct a count. The only people on the list are those whose date (and time) of enrolment preceded the date (and time) of the census and whose date of admission (or removal) succeeded it. (If obtaining this count is complicated, it is because the list has been so narrowly defined that a great number of characteristics have to be evaluated in order to decide whether a particular record should be included or not.)

The count varies from one time to another. It is not difficult to apprehend that a unit increase in its size must follow each enrolment over the interval and that a unit decrease in its size must follow each admission (or removal), if no one contributes more than one record to the dataset. It follows that the balance of enrolments and admissions (plus removals), $E - (A + R)$, must exactly equal any change in the size of a list, $C_{\mathrm{now}} - C_{\mathrm{then}}$, and that the completeness, accuracy, and validity of the counts ought to be questioned whenever it fails to do so.

There is nothing original about the assertion that the balance of enrolments and admissions ought to equal any change in the size of the waiting list. Mason (1976) constructed a hypothetical example which – though it was incomplete – indicated that any difference in the numbers of enrolments and admissions was expected to account for any change in size, and Fordham (1987) provided a complete example which showed the behavior of two hypothetical lists over four quarters. The Department of Health
instructed those responsible for completing the KH06, KH07, and KH07A returns to check the consistency of their submissions for each provider each quarter. “Patients waiting at the end of the quarter should be equivalent to patients waiting at the end of the last quarter plus the number of additions . . . minus the number of patients admitted in the quarter or removed from the elective admission list for other reasons. For the figures to balance, suspended patients must also be taken into account” (CRIR 1997, 32). The National Audit Office (2001b) used the relationship to verify the purported reduction in the size of the list at Surrey and Sussex Healthcare NHS Trust (England), 1998–1999: they suspected a reduction of 1800 patients where the number of elective admissions was known to have reduced, and they found – among other things – 700 new patients and 300 transfers from other hospitals who had not been added to the list.

Secondary Hypotheses

One of the secondary hypotheses offered by the literature is attributed to the field of organization science. Kenis (2006, 296) claims that “[e]mpirical studies carried out in The Netherlands and elsewhere show . . . that the input of extra resources does not automatically lead to a shortening of the waiting list,” and he declares that “[w]aiting lists seem . . . to be an . . . example of a problem . . . characterized by a high level of complexity.” Neither observation is new. We don’t know who first suggested that the size of the list is influenced by many factors. But Sanmartin et al. (1998) drew attention to a plethora of factors which appeared to account for a part of the variation in size (DHSS 1975; Newton et al. 1995; Hanning and Lundström 1998) and advocated the use of complex models to evaluate their interaction and combined effect (DeCoster et al. 2007; Kreindler and Bapuji 2010).

Kenis (2006) does not tell us whether the extra resources had the intended effect on the number of admissions, and he does not tell us whether allowance was made for the effect of variation in the number of enrolments. In other words, he has neither established that the first hypothesis needs to be replaced nor has he justified the assertion that “decisions are often taken which are based on a simplified vision [sic] of the problem, which [are] inappropriate” (Kenis 2006, 296). Kenis (2006, 296) asserts that “[g]iven a certain level of complexity of a problem[,] it will become impossible to react in an equally complex way,” and he claims that this is properly the domain of organization science. But he does not substantiate the claim that waiting lists possess the requisite level of complexity, and he has not demonstrated that the paradigm fits. Instead, he classifies the first hypothesis as an example of “our modernist-rationalist way of thinking” (Kenis 2006, 296) and – perhaps as a consequence – anticipates its failure; he does not recognize the first hypothesis as an example of double-entry bookkeeping and – perhaps as a consequence – does not anticipate its success.

Another of the secondary hypotheses offered by the literature comes from the field of health economics. It is unfortunate that ‘supplier-induced demand’ (Culyer and Cullis 1976) envisaged a direct association between the number of admissions (or its surrogate) and the size of the list (Culyer and Cullis 1976) because the notion lay ready to hand and provided what some would think a plausible explanation. But the first hypothesis anticipates a relationship between the number of enrolments, the number of admissions, and the size of the waiting list which is mathematically exact, so there is no room for a second hypothesis until the first has proven false. Moreover, it is still necessary – once the primary hypothesis has proven false – for the secondary hypothesis to prove true.

In a cross-sectional study, we might expect to see variations between one hospital and another that are the result of differences in size of the two populations they serve. Let us imagine that there are no differences that would invalidate a simple comparison, e.g., no differences in the mix of age,
sex, and other salient factors and no differences in the indications for treatment or in the thresholds at which a patient is added to, or admitted from, the list, etc. Let us imagine that comparison reveals no difference in the rates of diagnosis specified on a suitable cross-classification of salient factors. If the only difference between one hospital and another is one of scale, then large hospitals serving large populations would report large numbers of admissions and large numbers waiting, while small hospitals serving small populations would report small numbers of admissions and small numbers waiting, i.e., we would expect a direct association between the number of admissions and the size of the list. The same reasoning would also lead us to expect a direct association between the number of admissions and the number of enrolments (Newton et al. 1995).

It is not enough to show a direct association between the number of admissions and the size of the list and attribute it to supplier-induced demand. This does not allow us to distinguish the effect of supplier-induced demand from the effect of the flow of patients on the stock (when the number of enrolments is not fixed and unvarying). It is also not enough to show a direct correlation between the number of admissions and the number of enrolments. This does not allow us to distinguish between the effect of supplier-induced demand and the effect of scale.

The use of the term supplier-induced demand suggests the futility of making additional resources available for elective treatment and – despite assurances to the contrary – implies that clinicians have been complicit. The way had been prepared for the notion long before the term entered the literature. Commentators viewed the waiting list “as a kind of iceberg” (Powell 1966, 39), likened the waiting list to a “bottomless pit” (Haywood 1974, 38), and thought that “trying to ‘get the waiting lists down’ [was] an activity about as hopeful as filling a sieve” (Powell 1966, 40); and the conviction that a plentiful supply might prompt burgeoning demand is (we think) older than any of these (Culyer and Cullis 1976). But the hypothesis of supplier-induced demand will prove to have been counterproductive – a diversion of attempts to understand the dynamics of the waiting list – if we find there is no need for a second hypothesis, whether as a result of empirical data or of mathematical proof. The same will be true if the second hypothesis is found not to fit: e.g., if the number of enrolments is found to determine the number of admissions rather than vice versa or if the financial transaction, which serves to authorize enrolment and underwrite admission, is found to occur at some other point in the market without any further exchange in the stock-cupboard.

Why has the Effect of Enrolment Confounded Analyses to Date?

Commentators, analysts, and researchers have shown very little interest in the effect of enrolment on the size of the waiting list. We wonder how this important confounder came to be overlooked and what might provide a sufficient incentive to correct the fault.

We assert that it is the relationship between the balance of enrolments and admissions and changes in the size of the list which is of primary concern, although it is the relationship between admissions and size which dominates the relevant literature. Such a view seems to imply that commentators, analysts, and researchers were wrong-footed at the start of the debate and that the early error has been reproduced in most of the work conducted since. Neither the scope of this chapter nor the extent of our scholarship allows this standpoint close consideration at present, but a few waymarks may be enough to indicate the route proposed.

Some Assumed Enrolment Was Fixed and Unvarying

In 1963, the then Ministry of Health (MoH) for the UK published what was only its fifth memorandum on the NHS waiting list (MoH 1963b, 1). The author claims that a stationary waiting list “normally represents not a deficiency of resources . . .” – there is no imbalance of enrolments and
admissions – “but a backlog of cases . . .,” a result of the accumulated imbalances of the past. The author also says that “[a] growing waiting list may often indicate a deficiency of resources,” i.e., there is an imbalance of enrolments and admissions. But he obscures matters by asserting that the “growing waiting list . . . will generally also include an element of backlog” (MoH 1963b, 1), insisting that “[a] continuous effort will be needed to prevent a backlog from arising again” (MoH 1963b, 3). His use of the words “generally,” also “normally,” and “often” implies doubt where there is, in fact, ground for none.

Whether an individual is on the list as the result of an historic backlog or as the result of its continuing growth, the additional case can only be cleared if additional means allow the number of admissions to exceed the number of enrolments however briefly. This is what the memorandum asserts. The term “backlog” is useful if it is confined to those who are awaiting admission from a list that is stationary: if any one of these is cleared, the reduction in size is permanent. The individual will never be replaced because the number of admissions equals the number of enrolments. But if we clear anyone from a waiting list that is growing, the reduction in size is momentary. This individual will shortly be replaced by another because the number of admissions does not equal the number of enrolments, and our efforts have to be never ending.

In an earlier memorandum, the Ministry expressed the view that “the hospital service is roughly keeping pace with demand but is not appreciably succeeding in reducing the very large waiting numbers” (MoH 1954, 1). (For the sake of the narrative, we shall assume that the same author wrote both memoranda.) He seems to have thought that the size of the list was approximately stationary, that is, fixed and unvarying. As a result, he sees the problem as one of clearing the backlog (DHSS 1981a; Naylor 1991). (According to Culyer and Cullis (1976), the waiting list for all specialties (excluding psychiatry), England and Wales, showed an increase in size of 4.3% over 7 years from 444.0 thousand on 31 December 1955 to 462.9 thousand on 31 December 1962.) This is helpfully confirmed in the memoirs of the then Minister of Health, Enoch Powell, who refers to “the circulars enjoining such devices as the use of mental hospital beds and theatres, or of military hospitals” (MoH 1963b, 1 & 3), to “the ‘waiting list at 31st December’ in the Ministry of Health’s annual reports . . . [as] . . . a reliably stable feature in an otherwise changing scene” (Culyer and Cullis 1976), and to “the special operations to ‘strafe’ the waiting lists, urged on the . . . ground that a stationary waiting list is not evidence of deficient capacity – otherwise it would lengthen – but of a backlog which, once ‘cleared off’, ought not . . . to recur” (Powell 1966, 40). The Minister confirms the understanding of his staff but considers the ground of their reasoning to have been “fallacious.” He no longer views the stationary waiting list in the same light. We disagree. The Minister’s error was in thinking the list stationary when there had been substantial variation in one at least of the factors thought to determine size, i.e., in admissions.

Had the size of the list in fact been stationary, the number of enrolments ought to have equaled the number of admissions. So it is not clear to us why anyone would expect the number of enrolments to be stationary, that is, fixed and unvarying, when “the total annual number of in-patients treated in hospitals has increased by one-sixth [16.7%], . . . since the early days of the service” (MoH 1954, 1). (Culyer and Cullis (1976) report that throughput capacity, their surrogate for elective admissions, showed an increase of 24.2% – from 11,547 cases/day in 1955 to 14,336 cases/day in 1962.) Nevertheless, the author of the memoranda feels no need to discuss the effect of variation in the number of enrolments, but he expects there to be a decrease in the size of the list if there is any increase in the number of admissions. A subsequent Secretary of State for Health and Social Services, Barbara Castle, presents her analysis in very similar terms. She knows that the list has both shrunk and swelled since MoH (1963b), but she chooses to describe it as approximately stationary: “over the past 10 years the total surgical waiting list in England and Wales has hovered at the half million
mark, with little change from 1 year to another” (DHSS 1975, 2). She seems to think it incongruous that “the number of admissions nevertheless increased by more than 7%” (DHSS 1975, 2) but like her predecessor feels no need to discuss the possibility of underlying variation in the number of enrolments.

According to Culyer and Cullis (1976, 244), “HM(63)22 . . . emphasized that a long waiting list that was numerically stationary is not normally an indication of resource deficiency in any permanent sense but represents instead a ‘backlog’ of cases which could, and should, be removed by determined short-term efforts”. The “situation is one in which the system has settled down into a kind of long-run administrative equilibrium producing a constant addition to the waiting list . . . each time period which is just sufficient to offset the numbers called from the existing waiting list during the period” (Culyer and Cullis 1976, 245). They think the Ministry envisaged a situation in which the number of enrolments “is just sufficient to offset” the number of admissions.

Frost (1980) traces this to the Annual Report of the Chief Medical Officer for the year 1962, which asserts that “a long but steady waiting list is an indication only of a backlog of work remaining from the past” and that “[i]t is only if the waiting list is steadily increasing that one has any justification for deducing . . . from waiting list data alone . . . that there is a shortage of beds” (MoH 1963a, 205). We might conclude that the list was not “steadily increasing” (Culyer and Cullis 1976) in the absence of any data on the number of elective admissions. Indeed, we would think it stationary were we to compare the size of the list in 1964 with the size of the list in 1960 (475,863/475,643 = 1.000) or the size of the list in 1965 with the size of the list in 1951 (498,972/496,131 = 1.006) (Powell 1966). But according to Frost (1980), the waiting list for general surgery and related specialties, England and Wales, showed an increase in size of 23.0% from 126,000 on 31 December 1949 to 155,000 on 31 December 1962.

But while Culyer and Cullis (1976) and Frost (1980) agree with our reading of the Ministry’s position, neither attributes the failure of initiatives to the correct cause. The number of enrolments was not stationary, so a brief excess of admissions was not capable of effecting a permanent reduction in size.

Some Only Registered Discharge (and Death)

The first dataset, which was intended to inform the administration of the NHS across England and Wales, provided even less evidence of insight. When it was implemented across the two countries in 1958, the Hospital In-Patient Enquiry required the completion of a printed form (HIP 1A) for a one-in-ten sample of discharges from, and deaths in, hospitals (MoH and GRO 1961a). (Several categories of discharges (and deaths) were excluded such as those originating from maternity units and psychiatric wards.) The form allowed hospitals to record the dates on which the patient had been “put on the list or booked” for the condition and had been “first sent for” to come in to hospital (MoH and GRO 1961a). Successive iterations were intended to improve the coverage, completeness, and consistency of the data.

Doubtful Definitions

The second version of the form, which was introduced in 1967 (DHSS and OPCS 1970), established the pattern of data capture for the 18 years that followed. It allowed hospitals to continue recording the date of admission, the date of first operation, and the date of discharge (or death), but it omitted the date “first sent for.” The original definition of the “waiting time” was “[t]he interval between the date a case is placed on the waiting list, or booked, and the date of admission (or the date first sent for if the patient did not come into hospital when first offered a bed)” (MoH and GRO 1961a, 264). This suggests that length was calculated using either the date of admission, or else the date “first sent for,” depending on which gave the shorter answer. If this is correct, then the definition of length and the method of calculation subsequently changed:
the later definition of the “waiting time” was “[t]he interval in weeks between the date a case is placed on the waiting list and the date of admission” (DHSS and OPCS 1970, 1987, xii), so the length of wait reported in 1967–1985 was longer – by definition – than in 1955–1966. We do not know why it was thought necessary to discount a part of the completed wait in the early years of the dataset, if a patient declined a reasonable offer of admission, and we do not know why the practice was abandoned in the later years of the dataset.

The definition of a “waiting list case” used in the later tabulations also differed from that used in the earlier tabulations. See Table 8.

Booked cases are included under the second definition but excluded explicitly under the third and implicitly under the first: a case cannot be booked, “it not being possible at that time to define in advance the exact day of admission.” “[P]atients whose admission has been deferred” are excluded under the third definition but are not excluded under the first or second. If this is correct, then there was a change in the mix of those included in official statistics over the 31 years of the Hospital In-Patient Enquiry: the discharges (and deaths) which follow elective admission were more narrowly defined and made to appear less numerous in 1967–1985 than in 1955–1966. We do not know why the entire waits, of each of those temporarily suspended at any point “for medical or personal reasons,” were included in the earlier version of the dataset but not in the later.

Event-Based Data Capture Makes some Vanish

The number of admissions should exactly equal the number of discharges in every subset of records defined on geography, or demography, or diagnostic group if the lengths of stay were always zero, and the number of admissions should approximately equal the number of discharges if the lengths of stay were short compared with the period of data capture. But not everyone admitted to hospital was eventually discharged with an appropriate diagnosis, having completed the series of investigations or the relevant course of treatment. Death accounted for 5.67% of the records submitted for 1958 (MoH and GRO 1961a, 107). Fortunately, those responsible for designing the Hospital In-Patient Enquiry thought it important to record the frequency and distribution of fatalities among those admitted so there were no outcomes of admission not represented in the dataset. The authors were able to claim “[a]lthough strictly related to discharges, in the majority of cases the data will approximately correspond to admissions” (MoH and GRO 1961a, 3).

It was not possible to collect information on the length of wait for admission until the HIP 1A was implemented as the first revision of the transcription form in 1952 (Registrar General 1959). Regrettably, the item “date put on the list or booked” (MoH and GRO 1961a, 298) appears to have been added without fully appreciating its implications for the dataset (Douglas 1962). The authors warn “that the . . . data presented here only
give details of those patients who are admitted to hospital” (MoH and GRO 1961b, 12). Just as discharges underestimate admissions by the number of deaths, so booked admissions and admissions from the waiting list underestimate enrolments by the number removed. “Nothing is known of those patients who did not obtain admission” (MoH and GRO 1961b, 12). But whether it is the discharges (and deaths) of the Hospital In-Patient Enquiry (1952–1987) or the finished consultant episodes of Hospital Episode Statistics (1987 to date), using an end date associated with elective admission to define the set of records, does not allow us to establish the frequency of occurrence of other outcomes or the length of wait with which they are typically associated.

Had the designers chosen to accumulate lengths of wait by sampling all of the outcomes of enrolment, the dataset would have allowed other researchers to identify cohorts of additions to the list, e.g., in 1958, and would have allowed us to examine what happened to their members prospectively. But the designers chose instead to accumulate lengths of wait by sampling only those patients who had experienced the event of interest and only those records where this had occurred within a specified period. This has left subsequent analysts and researchers with very little choice. If they want to use the existing datasets, they must be ready to assume that removal from the list is infrequent, or that it has nothing to do with the length of wait, or that the experience of this group of patients doesn’t matter. If they want to use the latest accessions to the dataset and present timely analyses, they must be prepared to examine the prior wait of the quarter’s admissions instead of the subsequent wait of the quarter’s enrolments.

It is regrettable that the event-based and period-specific data capture modeled by the Hospital In-Patient Enquiry has been emulated so widely. It means there are few examples where the date of an event at the start, rather than at the end, of the wait is used to define the set of records, so there has been little opportunity to demonstrate the consequences of the approach empirically. We think those responsible for funding enquiry in this area too suspicious of novelty and too content with the existing state of affairs.

Period-Specific Cross-sections Estimate the Probability of Enrolment

The dataset was constructed by combining samples from cross-sections of records where membership was defined by the date of discharge (or death), i.e., the dataset was period, rather than cohort, specific. But having used the end date to determine whether a record ought to be included or not, we are obliged to use the start date to discover the length of wait. In other words, the Hospital In-Patient Enquiry supplied measures which were retrospective rather than prospective – it calculated the length of wait backward. (The same is true of most of the datasets currently available to health services researchers.)

The technical terms fail to convey the incongruity of substituting one approach for the other: if we want to know how long a patient might expect to wait, the retrospective approach is akin to putting the cart in front of the horse. This is seldom appreciated because we seldom take sufficient care in defining what it is that we have calculated. Let us imagine that the dataset allows us to count all of those who were admitted as booked or waiting list cases during 1952, and to identify that proportion of these which had a prior wait of less than 3 months. Strictly speaking, it allows us to estimate the probability of being “put on the list” 0–2 months prior to being admitted. But we want to know the probability of being admitted 0–2 months after being “put on the list.” So we need to count all of those who were “put on the list” during 1952 and to identify what proportion of these had a subsequent wait of less than 3 months. Now the prior waits for the period will have the same distribution as the subsequent waits of the cohort if the waiting list happens to be stationary (and closed). But publication of the length of the prior wait for 32 out of 34 years would seem to imply very great confidence in the veracity of this assumption.

It is likely that the design of the first dataset owed something to the preferences, practices, and technologies of the day. Each form
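
A minimal sketch may make the contrast between the prior waits of a period and the subsequent waits of a cohort concrete. It is not drawn from the Hospital In-Patient Enquiry: the records, the month numbering, and the function names below are all hypothetical, removals are not modelled, and a record with no admission month stands for a patient still waiting when the data end. The toy list is growing, so the two proportions need not agree, and the last lines check that the balance of enrolments and admissions equals the change in size between two censuses.

```python
from fractions import Fraction

# Hypothetical records: (month put on the list, month admitted or None).
# Months are counted from January 1951 = 0, so 1952 spans months 12 to 23.
RECORDS = [
    (2, 13), (8, 14), (11, 12),      # enrolled in 1951, admitted in 1952
    (12, 13), (13, 15), (14, 22),    # enrolled and admitted in 1952
    (16, 30), (22, 27), (23, 29),    # enrolled in 1952, admitted later
    (18, None),                      # enrolled in 1952, still waiting
    (20, 21),                        # enrolled and admitted in 1952
]
YEAR_1952 = range(12, 24)


def prior_wait_under(records, period, limit):
    """Period view: share of those ADMITTED in the period whose prior wait was < limit months."""
    admitted = [(e, a) for e, a in records if a is not None and a in period]
    short = [1 for e, a in admitted if a - e < limit]
    return Fraction(len(short), len(admitted))


def subsequent_wait_under(records, period, limit):
    """Cohort view: share of those ENROLLED in the period who were admitted in < limit months."""
    enrolled = [(e, a) for e, a in records if e in period]
    short = [1 for e, a in enrolled if a is not None and a - e < limit]
    return Fraction(len(short), len(enrolled))


def census(records, t):
    """Number on the list at the end of month t (enrolled by t, not yet admitted)."""
    return sum(1 for e, a in records if e <= t and (a is None or a > t))


def balance(records, t_then, t_now):
    """Enrolments minus admissions over the interval (t_then, t_now]."""
    enrolments = sum(1 for e, _ in records if t_then < e <= t_now)
    admissions = sum(1 for _, a in records if a is not None and t_then < a <= t_now)
    return enrolments - admissions


period_share = prior_wait_under(RECORDS, YEAR_1952, 3)
cohort_share = subsequent_wait_under(RECORDS, YEAR_1952, 3)
size_change = census(RECORDS, 23) - census(RECORDS, 11)
print(period_share, cohort_share, size_change, balance(RECORDS, 11, 23))
```

On these invented records, 4 of the 7 patients admitted in 1952 had waited less than 3 months, whereas only 3 of the 8 patients enrolled in 1952 were admitted within 3 months. The list grew by one over the year, which is exactly the excess of enrolments over admissions.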
1985. In 1957, the Ministry invited non-participating hospitals to extend coverage by submitting forms for a one-in-ten sample of inpatients discharged (or dead). In 1974, hospitals were invited to extend coverage by submitting forms for a one-in-ten sample of all whose discharge (or death) followed treatment (or investigation) as a day case.

We do not know whether the waiting list was thought to be stationary, or not, and we do not know whether there was an understanding of the consequences of assuming that the list is stationary, when it is not. We have found no documentation which alerts users to the fact that the prior waits for a period do not have the same distribution as the subsequent waits of the cohort unless the list is stationary (and closed). There is therefore no evidence that the Government Statistical Service considered the published measures to be erroneous when the waiting list was not, in fact, stationary.

Design, Analysis, and Interpretation are Constrained

The Hospital In-Patient Enquiry was compiled from period-specific cross-sections of those who had died in hospital, or been discharged, having been admitted electively. This method of data capture is analogous to drawing samples from each year’s contribution to the filing cabinets. It is easy to understand and implement, and it is widely used and familiar. It may provide inexpensive data for the purposes of research if items are collected as a matter of routine for other purposes, but the advantage of this has always to be set against the disadvantage that records were not constructed and items not collected with the aim of this particular investigation clearly in mind. As a result, the dataset may not contain all of the necessary records, i.e., the representation it provides may be biased (Berkson 1946; MoH and GRO 1961b; Cornfield and Haenszel 1960). The dataset may not contain all of the necessary variables, i.e., the analyses it permits may not allow for confounding and effect modification. And, where the dataset seems to include the necessary variables, the data may prove insufficiently reliable, valid, sensitive, or complete.

Datasets have been constructed which make use of the inputs of hospital administration, under standard definitions and across many hospitals, in order to meet the needs of researchers as well as those of analysts. The investment which their development represents is sometimes justified in part by the benefit – unspecified and intangible – which the designers expect to accrue from subsequent investigations. But the usefulness of these datasets for the purposes of research depends upon the goals and design of investigations not yet envisaged and on the extent to which the designers have succeeded in anticipating their requirements.

The dates of compilation, the list of contributors, and the stated inclusion and exclusion criteria indicate some of the more obvious limitations of these datasets. But most also constrain researchers in a way that is not obvious. Although the datasets supply records of the wait for elective admission, researchers may not use these to conduct cohort analyses – prospective or retrospective – of all of those who were added to the waiting list. The event-based (and period-specific) method of data capture used to compile the dataset obliges researchers to examine the prior waits of those admitted and the probabilities of enrolment, e.g., 0–2 months, prior to admission when they might have preferred to examine the subsequent waits of those enrolled and the probabilities of admission, e.g., 0–2 months, after enrolment.

This constraint is an artifact of the method of data capture. The Hospital In-Patient Enquiry aimed to compile information about hospital morbidity. It opted to do this by collating records of discharges (and deaths) instead of records of admissions or enrolments because case notes were more likely to include diagnoses, investigations, and treatments at the later of the three events. By definition, those who were removed from the list were not admitted, and their omission from the dataset may have been quite unintentional. Their case notes contained little information about diagnoses, investigations, or procedures, no date of admission, and no date of discharge (or death). So it would have been easy to class them with incomplete records and other examples of missing
data and to assume that the error was random rather than systematic.

We do not think the designers of the Hospital In-Patient Enquiry fully appreciated the consequences of appending the “date put on the list or booked” (MoH and GRO 1961a, 298) to form HIP 1A (Douglas 1962). Nevertheless, they established a precedent which resulted in the popularization of a defective method and widespread publication of biased estimates. Existing methods of data capture should be amended to include outcomes of enrolment other than admission (Armstrong 2000), and new datasets should define the set of interest – wherever possible – by using the date of an event at the start of the record rather than the date of an event at the end.

An Apparent Lack of Candor

The Ministry of Health (1963b) discussed the numbers waiting as reported in the SH3 return at the close of each year in its memorandum, HM(63)22, but it made no mention of the length of wait although the tables from the Hospital In-Patient Enquiry for 1955, 1956–1957, and 1958 were all available at the close of 1961. We think it unlikely that any data on the length of wait would have been ignored when the Hospital In-Patient Enquiry was intended to inform the administration of the NHS and the Ministry of Health was preparing to issue guidance (MoH 1963b). But the tables published during 1963 (for the 1959 and 1960 datasets) were the only ones in the series (1955–1985) which failed to report the length of wait despite collecting the dates needed to do so. The omission of appropriate statistics from the tables for 1959 (MoH and GRO 1963a) and 1960 (MoH and GRO 1963b) implies a lack of candor in the run-up to the British General Election of 1964 (Conservative, 1951–1964; Labour, 1964–1970).

The Government Statistical Service said nothing about the length of wait in 1959 and 1960 when it published its collection of historical tables in 1972. But it drew attention to an increase in “the median waiting time” and to an increase in “the proportion of those admitted who had been waiting six months or more,” when it examined the data for 1957–1960 as part of a longer series after the British General Election of 1979 (Labour: 1974–1979; Conservative: 1979–1997). It inferred “that hospitals were losing ground, . . . between 1957 and 1967, against increasing pressure on their resources” (DHSS et al. 1979, 266).

This observation in 1979 is consistent with the views expressed in HM(63)22. Had the number of enrolments been stationary in the early 1960s, the Government Statistical Service expected a decrease in the length of wait to accompany an increase in the number of admissions. But “the proportion of those admitted who had been waiting six months or more” and “the median waiting time” were observed to increase despite an increase in the number of admissions, which suggests “increasing pressure on resources,” i.e., that the number of enrolments increased.

Some Compiled Returns

A judgment was passed on the set of discharges (and deaths), which resulted in abolition of the Hospital In-Patient Enquiry after 31 December 1985 and in implementation of the Körner Reporting System on 1 April 1987. It was asserted that “[t]his survey is being replaced by the Körner data system” (DH and OPCS 1989, 1), i.e., that the Körner Reporting System replaced records of discharges (and deaths) with aggregate counts, sometimes of those admitted (or removed) from the list, sometimes of those still awaiting admission, and sometimes of those enrolled on the list. This might seem to suggest that the work of compiling the records of discharges (and deaths) had become too burdensome, even on the basis of a one-in-ten sample (MoH and GRO 1961a), or else that the English NHS had decided that a series of aggregate counts could better meet its needs and had identified those it thought necessary. But this is not the whole story. The Körner Reporting System replaced a number of returns in addition to the Hospital In-Patient Enquiry, e.g., the SBH 203 and the EDP4 and EDP5 of the SH3 (Steering Group 1984); and, even as the assertion was being published, the first records of inpatient episodes were being compiled into Hospital Episode Statistics. It appears that none of the criticisms made
334 P. W. Armstrong
by Working Group A on hospital clinical activity have to do with items supplied by the Hospital In-Patient Enquiry (DHSS 1981b).

Nevertheless, it was the Körner Reporting System which introduced the count of decisions to admit each quarter, the first data on the number of enrolments, additions, or accessions to be collected in almost 39 years of the UK NHS (DHSS 1986, 4; Newton et al. 1995). Counts were also proposed of the number of patients admitted, and of the number of patients removed, from the list each quarter and of the number of patients awaiting admission at the quarter’s end (Steering Group 1984). The four counts seem to imply that the stock-flow model, or some version of the basic demographic equation (Newell 1988; Pressat 1985), may have informed the design of the relevant returns. But this is doubtful. Working Group A used a different model to justify its proposals to the NHS in 1981, one which claimed to provide information about demand (expressed, met, and unmet) and about attempts to supply demand (DHSS 1981b; Steering Group 1984).

We think that this is why its recommendations were presented under the heading “Information about demand for hospital facilities” (DHSS 1981b, 120) and why ‘demand’ was mentioned 42 times in the relevant chapter while ‘stock’ and ‘flow’ were not mentioned at all (DHSS 1981b). We think that this is why the forms were first implemented as returns about the “demand for elective admission” (DHSS 1987, 1) and why ‘demand’ is mentioned 13 times (and ‘stock’ and ‘flow’ are not mentioned at all) in the penultimate “DataSet Change Notice (DSCN)” of the series. We think that this interest in supply and demand is why Working Group A proposed the counting of “admission decisions” (DHSS 1981b, 129) despite the confusion of these with “admissions arranged” (DHSS 1987, 1) and why it coined the term “decision to admit” (DHSS 1981b, 123, 125–6 & 130) instead of “patients added to the list” (CRIR 1997, 2–5 of 7). We think that this is why Working Group A proposed a count of patients who were not admitted (despite arrangements having been made) as well as a count of patients who were (DHSS 1981b), and we think that this is why the Steering Group proposed counts of those who failed to attend, counts of admissions canceled by the hospital, and counts of patients removed from a list for any reason other than elective admission (Steering Group 1984).

We know that the design of the relevant returns was not solely dependent upon the members of Working Group A. So the Steering Group added the count of patients removed from the list to the KH06 return on “events occurring during [the] period” (1984, 90) and published its recommendations before it was realized that the additional counts of the KH07A return would be required. Later versions of the KH06 return (CRIR 1997; CRIR 1998) instructed NHS Trusts to check that the counts on the KH06, KH07, and KH07A returns were consistent, although the possibility of doing this was not mentioned by Working Group A, the Steering Group, or those responsible for the development of the earliest versions of the returns (DHSS 1981b; Steering Group 1984; DHSS 1986).

Despite the addition to the KH06 return of an instruction to evaluate the consistency of the data, we have found little evidence (in 40 sets of returns submitted by each provider) that the stock-flow model, or any version of the basic demographic equation, has been used to do this. (The instruction was added no later than 1 April 1996 (CRIR 1997) and remained in force until the return was abolished on 1 April 2006 (ISB 2006).)

• The version of the KH06 return, which was issued for use from 1 April 1998 (CRIR 1998, 7 of KH06), added “[e]xplanations may be given in the box below” to the second paragraph of instructions about checking consistency, and it also added a box with the invitation “[t]his area can be used for your notes and maybe [sic] used to explain any special features which have affected this return.” These changes might imply that the eight previous sets of submissions contained inconsistencies large enough to warrant explanation. But there were numerous changes in this version of the return – most having to do
with format and layout and very few having any effect on the counts. (The addition of pain management to the list of main specialty functions will have generated an additional series of counts, and the counts against one (or more) of the existing categories might have diminished as a consequence.) Given that previous versions of the return invited comment on counts of ordinary (or inpatient) admissions and counts of day-case admissions, the invitation to explain any inconsistency may reflect a desire for consistent presentation rather than grounds for concern.
• The National Audit Office (2001a, 21) “was unable to reconcile” the counts. It found 24,312† more patients on the list at the close of the quarter than were accounted for by enrolments less admissions and removals (Table 3c), and the Department of Health was unable to explain the discrepancy when asked to do so. The National Audit Office (2001b) also queried an inconsistent reduction in the size of the list at Surrey and Sussex Healthcare NHS Trust (England), 1998–1989. It is not likely that this Trust had checked the consistency of its returns.
• We have been informed that “[t]he NHS Data Model and Dictionary team are not aware . . . of any reviews or audits that [were] commissioned by the Department of Health into the internal consistency of the KH06 and KH07 returns” (personal communication, Mayet M, 24 January 2016).

Discussing attempts “to tackle waiting-list problems,” Yates (1987, 71) claimed “there is no tradition of writing up managerial work of this type in medical, or even in management journals.” (Copyright © John Yates 1987.) The paper by White (1980) appears to be the only example of its type which survived peer review and made it into print, but it is scarcely possible that he was the only analyst in England and Wales who was interested in the relationship between inputs, outputs, and the size of outpatient and inpatient waiting lists. So while the lack of documentary evidence suggests that NHS Trusts and District Health Authorities did not check the internal consistency of the KH06 and KH07 returns, this is a conclusion we are not yet ready to draw.

The four counts used to describe the inpatient waiting list might have been consistent when first proposed (Steering Group 1984; DHSS 1986; IMG 1992). The Steering Group (1984, 87) recommended counting the “[n]umber of patients for whom a decision-to-admit has been made,” the “[n]umber of patients admitted electively,” the “[n]umber of patients . . . removed from a list,” and the “[n]umber of patients still awaiting admission.” It appears to have discounted – at least for the purposes of the narrative – the possibility that an individual might require elective investigation or treatment more than once a quarter. Instead, it claims that “a cohort of all the patients for whom a decision to admit has been made during a specified time period can be followed up at regular intervals and the number in the cohort admitted at different times recorded” (Steering Group 1984, 86). The members of the cohort are “patients for whom a decision to admit has been made,” which seems to imply a single decision to admit per patient. Moreover, the cohort is “followed up at regular intervals” to identify those no longer awaiting the outcome of interest, i.e., “the number admitted,” which indicates that a member either has, or has not, been admitted “at different times” and seems to imply a single outcome per patient. The narrative does not mention removal from the list for reasons other than admission.

We do not think the Steering Group ignorant of the possibilities. It understood that while the counts describing the outpatient waiting list might be correlated, they were not consistent. Alluding to the decision to admit to the list, the Steering Group claims that “[p]iloting and consultation have shown the practical difficulty of capturing and recording any requests other than those made in writing. It is however feasible to record the number of written requests made by general practitioners and changes in this statistic should reflect changes in the total number of requests” (Steering Group 1984, 87).

But the Steering Group (1984, 87) also recommended regular reports of the “[n]umber of patients for whom arrangements to admit were
made but who were not admitted because they failed to attend” and of the “[n]umber of patients for whom arrangements were made but admission did not take place because of cancellation by the hospital.” If these are understood to be alternative outcomes of enrolment, then admission and removal by definition cannot provide a consistent account for the change in the size of the list. And the definitions of the four counts used to describe the inpatient waiting list were not wholly consistent in subsequent iterations of the Körner Reporting System (CRIR 1997; CRIR 1998).

The CRIR Secretariat (1997, 32) asserts that “[p]atients waiting at the end of the quarter should be equivalent to patients waiting at the end of the last quarter plus the number of additions and minus the number of patients admitted in the quarter or removed from the elective admission list for other reasons.” This is what we would expect if (a) the date of addition marked the start of each wait, (b) the date of admission (or of removal) marked the end of each wait, and (c) if everyone waiting was eligible for admission on any and all of the intervening dates. But not everyone was considered eligible for admission on any and all of the dates separating their addition to the list from their removal.

“For the figures to balance,” providers were told, “suspended patients must also be taken into account” (CRIR 1997, para. 164). There are two ways of doing this.

The first of these handles the count of those suspended as though it was a flow. The number suspended that quarter is added to decisions to admit this quarter as though that number were reinstated this quarter (Armstrong 2000), and the number suspended this quarter is added to the number removed. So we expect

$$E + S_{\text{then}} - (A + R + S_{\text{now}}) = C_{\text{now}} - C_{\text{then}}, \tag{3.2}$$

where $S_{\text{then}}$ represents those reinstated to the list, and $S_{\text{now}}$ represents those removed from the list, this quarter. The Körner Reporting System does not tell us how many were suspended over the course of the quarter. $S_{\text{now}}$ estimates the count in question by assuming that each suspension lasts one quarter on average. Moreover, the Körner Reporting System does not tell us how many were reinstated over the course of the quarter. $S_{\text{then}}$ estimates the count in question by assuming that each suspension lasts one quarter on average and that everyone suspended is duly reinstated (CRIR 1997).

The second handles the count of those suspended as though it was a stock. The number suspended at the end of that quarter is added to the count of those awaiting admission at that date, and the number suspended at the end of this quarter is added to the count of those awaiting admission at this date. So we expect

$$E_{\text{now}} - (A_{\text{now}} + R_{\text{now}}) = (C_{\text{now}} + S_{\text{now}}) - (C_{\text{then}} + S_{\text{then}}), \tag{3.3}$$

where $S_{\text{now}}$ represents those suspended from the list at the time of this census, and $S_{\text{then}}$ represents those suspended from the list at the time of that census. We do not need to make any assumptions about the length of suspension or the frequency of reinstatement under this approach. Instead, we expect the balance of enrolments less admissions (and removals) to account for the difference between the censuses once we have corrected those counts by adding back the suspended.

Formulae (3.2) and (3.3) are equivalent. But formula (3.2) tells us that enrolment and reinstatement cause the official list ($C_{\text{now}}$, $C_{\text{then}}$) to swell and that admission, removal, and suspension cause it to shrink, whereas formula (3.3) provides a simpler account – the number waiting increases as a result of enrolment and decreases as a result of admission and removal – but the list ($C_{\text{now}} + S_{\text{now}}$, $C_{\text{then}} + S_{\text{then}}$) is not the one reported in the Press. With a little rearrangement, both formulae yield the relationship which providers were to use to check the consistency of their counts of inpatients and of day cases (CRIR 1997, para. 164 & p. 6 of KH06), namely,

$$C_{\text{now}} = C_{\text{then}} + S_{\text{then}} + E_{\text{now}} - (A_{\text{now}} + R_{\text{now}}) - S_{\text{now}}. \tag{2.1}$$
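The consistency check implied by formulae (3.2), (3.3), and (2.1) can be sketched in a few lines of Python. This is a minimal illustration only: the class name, the field names, and every figure in the example are assumptions invented for the purpose, and none of them is taken from a KH06 or KH07 return or from official guidance.

```python
from dataclasses import dataclass


@dataclass
class QuarterCounts:
    """Hypothetical container for one quarter's counts (names are assumptions)."""
    c_then: int  # official list at the previous census (suspended excluded)
    s_then: int  # suspended at the previous census
    e_now: int   # decisions to admit (enrolments) this quarter
    a_now: int   # admissions this quarter
    r_now: int   # removals other than admission this quarter
    c_now: int   # official list at this census (suspended excluded)
    s_now: int   # suspended at this census


def expected_c_now(q: QuarterCounts) -> int:
    """Official list implied by the rearranged relationship, formula (2.1)."""
    return q.c_then + q.s_then + q.e_now - (q.a_now + q.r_now) - q.s_now


def residual_flow(q: QuarterCounts) -> int:
    """Left side minus right side of formula (3.2): suspensions treated as flows."""
    inflow = q.e_now + q.s_then
    outflow = q.a_now + q.r_now + q.s_now
    return (inflow - outflow) - (q.c_now - q.c_then)


def residual_stock(q: QuarterCounts) -> int:
    """Left side minus right side of formula (3.3): suspensions treated as stocks."""
    balance = q.e_now - (q.a_now + q.r_now)
    change_in_total = (q.c_now + q.s_now) - (q.c_then + q.s_then)
    return balance - change_in_total


if __name__ == "__main__":
    # Invented figures for a single provider and quarter.
    q = QuarterCounts(c_then=1000, s_then=50, e_now=400,
                      a_now=350, r_now=30, c_now=1010, s_now=60)
    print("expected official list:", expected_c_now(q))
    print("residual, formula (3.2):", residual_flow(q))
    print("residual, formula (3.3):", residual_stock(q))
```

When the reported counts are consistent, both residuals are zero. When the reported list differs from the expected one, the two residuals are equal to each other, which is the numerical counterpart of the statement that formulae (3.2) and (3.3) are equivalent.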
So the two approaches give identical results. The CRIR Secretariat claims that “[t]he change in the total numbers waiting should reflect this activity,” that is, “all the additions to the waiting list (i.e., the number of decisions to admit) and removals from the waiting list that have taken place during the quarter” (CRIR 1997, para. 144). If we understand “the total numbers waiting” to include those suspended, i.e., $C_{\text{now}} + S_{\text{now}}$ and $C_{\text{then}} + S_{\text{then}}$, this statement would seem to imply the relationships of formula (3.3). But if we understand the “total . . . of all patients waiting for admission” to exclude those suspended (CRIR 1997, para. 155), the statement would seem to imply the relationships of formula (3.2). Given that the data about suspensions ($S_{\text{now}}$, $S_{\text{then}}$) were obtained by taking a census (CRIR 1997), formula (3.3) is the model which ought to be used.

Having demonstrated the consistency of the data by adjusting for suspensions (CRIR 1997), we ought to be willing to acknowledge – in the first instance – that it is “the total numbers waiting” and not the official numbers which reflect the balance of enrolments less removals (and admissions) and, in the second instance, that it is the balance of enrolments and reinstatements less admissions (and removals and suspensions) which changes the official numbers and not “the total numbers waiting.”

Now some patients will require elective treatment (or investigation) on more than one occasion (IMG 1992). Some will require treatment (or investigation) for the same condition, will undergo the same procedure, and will appear on the same list, on two (or more) occasions. The NHS accepts that the manager ought not to be held responsible for that part of any wait over which she can be expected to exercise no control. So if a patient is admitted to the same waiting list twice (CRIR 1997), e.g., for extraction of two cataracts, she is not considered as waiting for the second operation until she has been discharged from hospital after the first. But the data model implied by this is more complicated than that in which each patient is (assumed) to require just one admission or in which we count, for example, the number of decisions to admit – rather than the number of individuals added – to the list. The instructions for the KH07 return are simple: by definition, no patient can wait for more than one procedure at a time, so no patient may be counted more than once in the census at the end of the quarter. But the instructions for the KH06 return are not simple: if the dates of the decisions to admit fall in the same quarter for both procedures, the count of decisions to admit must not include the second of them; and if the dates of the admissions do not fall in the same quarter for both procedures, the count of admissions must include the second of them. We think that the date of admission (or removal) for the subsequent procedure will be counted more often than the date of the decision to admit which preceded it. So the consistency of the four counts was impaired when the KH07 was modified to exclude all of those ‘awaiting’ an additional procedure and the KH06 was modified to exclude those ‘awaiting’ a second procedure only when the first procedure had not yet been completed.

While the terms stock and flow have not been used in any document about the KH06, KH07, and KH07A returns or in any of the official commentary, they were introduced as labels for the datasets which took their place. DSCN 09/2006, which announced the “data flow” intended to replace the tabulated content of the returns (ISB 2006, 1), mentioned ‘stock’ 29 times and did not mention demand once. (It also mentioned ‘flow’ 41 times, but not all of these were to do with the events previously recorded by the KH06.) Despite this, there seems to be little understanding of the relationships implied by the stock-flow model even when the terms are used extensively. The definitions of the four counts represented either as a ‘stock’ or as a ‘flow’ are not perfectly consistent (ISB 2006, 44 & 46).

Dr. A. Mason, who had previously demonstrated an excellent understanding of the relationship between stock and flow (Mason 1976), was a member of the Secretariat and therefore party to the deliberations both of Working Group A and the Steering Group. Now Working Group A claimed that “information is required about the balance between referrals and the number seen . . . [t]o identify whether the number of patients waiting for an out-patient appointment is increasing or
decreasing” (DHSS 1981b, 122). It also claimed that “information is required about the balance between expressed and met demand” (DHSS 1981b, 123), presumably in order to determine whether the number of patients waiting for an inpatient admission is increasing or decreasing. Nevertheless, we fear that neither the stock-flow model nor the basic demographic equation had much influence on the analysis of the data. The English NHS appears to have collected relevant counts for 24 years (1 April 1987–31 March 2010) without ever testing its convictions about the effect of enrolment on the size of the list, and it appears to have done so for 10 of these despite instructions to check the consistency of the counts (1 April 1996–1 April 2006).

The KH06, KH07, and KH07A returns were abolished on 31 March 2010, on the grounds that the suite of 18-week referral to treatment times adequately met the needs of users. But this dataset has failed to provide any information about the number of enrolments, additions, or accessions for 5½ years (31 March 2010–1 October 2015 (Analytical Services 2015)). The deficiency has now been rectified, ostensibly to allow the reintroduction of a check on the consistency of the four counts, i.e., of “new RTT clock starts” (E), “completed RTT pathways” (A), “validation removals” (R), and changes in the size of the list ($C_{\text{now}} - C_{\text{then}}$). But Analytical Services did not explain why we expect start dates and end dates to yield exactly the same count of those eligible for admission at any point during the month of interest (Analytical Services 2015, 8). It is perhaps not surprising that it permits “a reasonable tolerance” for the consistency check as did the CRIR Secretariat before it (CRIR 1997, 6 of KH06).

Some Made Hay

Culyer and Cullis (1976) note that the size of the waiting list for England and Wales has not decreased as a result of increases in the number of admissions. They claim “that no one has to date succeeded in formulating a systematic and testable model to explain the phenomena . . . satisfactorily” (Culyer and Cullis 1976, 251), and they advocate “[a]n alternative approach, likely to appeal to those who prefer not to reject the supply/demand approach entirely” (Culyer and Cullis 1976, 247). But “despite very diligent searching” (Culyer and Cullis 1976, 264), and despite emphasizing the “one behavioural law that has never been refuted” (Culyer and Cullis 1976, 244), they are obliged to confess that “we have been unable to uncover any systematic and reliable empirical relationships among the relevant variables, nor have we been able to devise a plausible ‘behavioural’ model that has led to the specification of such a set of relationships” (Culyer and Cullis 1976, 264). Culyer and Cullis (1976) claim that the first hypothesis has failed without realizing that it has not been subject to a fair trial. They attempted to construct a model without considering the effect of variation in the number of enrolments.

Researchers continue to find fresh evidence of the direct association between the number of admissions (or an appropriate surrogate) and the size of the list (Buttery and Snaith 1980; Frost 1980), which Culyer and Cullis (1976) viewed as indicating the failure of the first hypothesis. Moreover, there appears to have been little diminution in the popularity of the “one behavioural law that has never been refuted” as a result of Culyer and Cullis’s inability to implement it satisfactorily. The direct relationship continues to be explained by the appetites of those who enter the marketplace to sell (supplier-induced demand) rather than the appetites of those who enter the marketplace to buy.

The Institute of Social and Economic Research received support from the Department of Health and Social Security “for . . . research into the economics of waiting lists.” It received a grant, and Culyer and Cullis (1976, 239) “benefited enormously from discussions with DHSS officials,” which may be why the DHSS turned to the Institute for advice. But it is likely that the enquiry was also prompted by prevailing opinion, e.g., by “Parkinson’s Law of Hospital Beds” (Powell 1966, 43) and “Say’s Law of Hospitals” (Culyer and Cullis 1976, 244), and by the expressions of other economists (Feldstein 1967) earlier on the scene. Whatever the reason, the DHSS
chose to consult economists rather than the members of any other school of social science. It is perhaps no surprise that supplier-induced demand has become the dominant paradigm in the literature from the UK.

The Primary Hypothesis Has Not Been Falsified

The Ministry of Health (1954, 5) recommended “the careful and regular study of such figures as . . . size of waiting list in proportion to number . . . of patients treated, degree of urgency of need of patients on the waiting list, numbers waiting for defined periods and such other indices as are available in published documents.” In other words, the Ministry expected the compilation of information about the numbers waiting and the numbers admitted, but it did not expect the compilation of information about the numbers enrolled. It is therefore not surprising that the Hospital In-Patient Enquiry did not provide counts of the numbers enrolled in England and Wales.

The omission was hallowed by successive datasets, first by those that relied on printed forms and second by those that relied on electronic media for their inputs. The national dataset compiled records after investigation or treatment had been completed, and these records were collated by the period in which the event was registered, i.e., by the date of discharge (or death) (Registrar General 1959). This architecture facilitated the counting and cross-classification of discharges (and deaths), and it reflected our need for data on morbidity (Registrar General 1959).

But we are also interested in the use made of the costlier resources. This has expressed itself in an interest in the length of stay and therefore in the occurrence of admission as well as discharge (or death). Extracting information about admissions from a collection of discharges (and deaths) is a little involved. It is not difficult to obtain the information we require when we have both the date of admission and the date of discharge (or death), but we face the problem of our choices while we await the date of the second event. If we have chosen to register discharges and deaths, we can determine how many of these were admitted during the period of interest, and we can calculate the length of their completed stay. But we have no information about those who have yet to be discharged (or to die): we cannot determine how many of them were admitted during the period of interest, and we cannot calculate the length of their incomplete stay. And if we have chosen to register admissions, we can count them and calculate the length of stay with ease, i.e., we know which of those admitted during the period of interest have yet to (die or) be discharged, but we have little information about the outcome of their admission, e.g., diagnosis, treatment, and destination of discharge.

Extracting information about enrolments from a collection of discharges (and deaths) is more involved. If we are to obtain a complete set of enrolments, we must:

• Identify those who have been discharged (or who died) following admission from the waiting list.
• Identify those who have not been discharged (or have not died) following admission from the list.
• Identify those who have not been admitted from the list.

This third group includes (i) some who will be admitted from the list and who will, in due course, be numbered among the discharged (or dead), and it includes (ii) others who – having been removed from the list – will never be admitted and will therefore never be numbered among the discharged (or dead).

We face the problem of our choices. If we had chosen to register patients immediately after their enrolment on the list, instead of after their discharge from hospital, it would be easy to determine the size of a cohort and to cross-classify its members. But the architecture of successive datasets in England prized economy of effort: it set about capturing the requisite variables, and relevant records, in a single pass. This can only be done using discharges (and deaths). If we attempt to compile our records on admission, some data about the outcome of admission will be missing.
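
The consequence of each choice of register can be illustrated with a short sketch. It is an illustration under stated assumptions, not a description of any national dataset: the `Stay` record, the patient labels, and the dates are hypothetical. It contrasts a register compiled at discharge (or death) with one compiled at admission, for a one-month period of interest.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Stay:
    """Hypothetical hospital stay; not a field layout from any real dataset."""
    patient: str
    admitted: date
    discharged: Optional[date]  # None while the patient is still in hospital


def discharge_register(stays: List[Stay], start: date, end: date) -> List[Stay]:
    """Records compiled at discharge (or death) during the period."""
    return [s for s in stays
            if s.discharged is not None and start <= s.discharged <= end]


def admission_register(stays: List[Stay], start: date, end: date) -> List[Stay]:
    """Records compiled at admission during the period, as known at its end."""
    register = []
    for s in stays:
        if start <= s.admitted <= end:
            outcome_known = s.discharged is not None and s.discharged <= end
            # Outcomes that occur after the period are not yet on the record.
            register.append(s if outcome_known else replace(s, discharged=None))
    return register


def length_of_stay(stay: Stay) -> Optional[int]:
    """Completed length of stay in days, or None if the stay is incomplete."""
    if stay.discharged is None:
        return None
    return (stay.discharged - stay.admitted).days


# Hypothetical records: three admissions during January 2000.
stays = [
    Stay("A", date(2000, 1, 5), date(2000, 1, 12)),
    Stay("B", date(2000, 1, 20), date(2000, 2, 3)),
    Stay("C", date(2000, 1, 28), None),
]
start, end = date(2000, 1, 1), date(2000, 1, 31)

by_discharge = discharge_register(stays, start, end)
by_admission = admission_register(stays, start, end)

# Admissions in the period that the discharge register can see.
seen_admissions = [s.patient for s in by_discharge if start <= s.admitted <= end]
# Admissions whose outcome is still missing from the admission register.
missing_outcome = [s.patient for s in by_admission if s.discharged is None]

print(seen_admissions, [length_of_stay(s) for s in by_discharge])
print(len(by_admission), missing_outcome)
```

With these hypothetical records, the discharge register sees only one of the three admissions in the period, though it supplies a completed length of stay for that one. The admission register counts all three, but the outcome fields of the two patients still in hospital at the end of the period remain empty.
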
These details could be supplied by taking a second pass at a later date and replacing any record which was incomplete with the now completed version.

There has been no attempt to construct a national dataset from enrolments in England using repeated passes to upload the latest details from the most recent accessions. And there has been no attempt to construct an equivalent dataset out of discharges (and deaths) for the purposes of longitudinal research, where timeliness is much less of an issue. But the relationship between enrolments, admissions (and removals), and the size of the list cannot be assessed empirically using the dataset available (Hospital Episode Statistics). It would not be reasonable however to attribute a lack of interest in the effect of enrolment to the lack of relevant data. The Department of Health and Social Security instructed hospitals to report the number of enrolments as aggregate counts between 1 April 1987 and 31 March 2010, by completing the KH06 return on a quarterly basis. Nonetheless, there is little evidence that this data has been used to check the reliability of the counts or the validity of the relationship hypothesized.

References

Analytical Services. Aligning the publication of performance data – statistics consultation. Leeds: NHS England; 2015. p. 8. https://www.engage.england.nhs.uk/consultation/aligning-publication-performance-data. Accessed 11 July 2016. Contains public sector information licensed under the Open Government Licence v3.0.
Armstrong PW. First steps in analysing NHS waiting times: avoiding the ‘stationary and closed population’ fallacy. Stat Med. 2000;19:2037–2051. By permission of John Wiley and Sons. https://doi.org/10.1002/1097-0258(20000815)19:15<2037::AID-SIM606>3.0.CO;2-R/pdf.
Armstrong PW. Spotting the pantomime villain: do the usual approaches correctly indicate when waiting times got shorter? Health Serv Manag Res. 2010;23:103–115. By permission of SAGE. https://doi.org/10.1258/hsmr.2009.009021.
Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics. 1946;2:47–53.
Buttery RB, Snaith AH. Surgical provision, waiting times and waiting lists. Health Trends. 1980;12:57–61.
Carvel J. Tories doubt fall in hospital waits. Guardian, 10 Jan 2004, p. 6.
Committee for Regulating Information Requirements (CRIR) Secretariat. Central returns: waiting times. DSCN: 10/98/P10. Birmingham: NHS Executive; 1998. p. 3, 7 of KH06. Contains public sector information licensed under the Open Government Licence v3.0.
Committee for Regulating Information Requirements (CRIR) Secretariat. Patients awaiting elective admission. In: The Data Manual. Hospital services module, version 4.0. Birmingham: Information Management Group, NHS Executive; 1997. p. 7, 12–4, 16–7, 29–32, 2–6 of 7, 3 of 4. Contains public sector information licensed under the Open Government Licence v3.0.
Cornfield J, Haenszel W. Some aspects of retrospective studies. J Chronic Dis. 1960;11:523–34.
Culyer AJ, Cullis JG. Some economics of hospital waiting lists in the NHS. J Soc Policy. 1976;5(3):239–64. By permission of Cambridge University Press.
DeCoster C, Chateau D, Dahl M, et al. Waiting times for surgery, Manitoba 1999/2000 to 2003/04. Winnipeg: Manitoba Centre for Health Policy; 2007. p. 6, 37–8, 53, 59. http://mchp-appserv.cpe.umanitoba.ca/reference/swt_3web.pdf. Accessed 11 July 2016.
Department of Health and Social Security (DHSS). A report of the working groups A to the steering group on health services information. London: NHS/DHSS Steering Group on health services information; 1981b. p. 120–30. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Management services. Demand for elective admission: statistical returns KH06, KH07 and KH07A. SM(87)2/8. Blackpool: Statistics and Research Division 2A; 1987. p. 1. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Management services. Post Korner aggregate statistical returns. SM(86)2/11. Blackpool: Statistics and Research Division 2A Fylde; 1986. p. 4. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Orthopaedic services: waiting time for out-patient appointments and in-patient treatment. Report of a working party to the Secretary of State for Social Services. London: DHSS; 1981a. p. 11, 24, 33, 42, 76, 80–1. http://nhsreality.wordpress.com/2015/01/. Accessed 11 July 2016.
Department of Health and Social Security (DHSS). Reduction of waiting times for in-patients admission: management arrangements. HSC(IS)181. London: DHSS; 1975. p. 1–4. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys (DHSS & OPCS). Hospital in-patient enquiry, summary tables. Based on a one in ten sample of NHS patients in hospitals in England, 1985. MB4 no. 26. London: HMSO; 1987. p. xi–xii. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys (DHSS & OPCS). Report on hospital in-patient enquiry for the year 1967. Part I. Tables. London: HMSO; 1970. p. 298–9. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys, Welsh Office. Hospital in-patient enquiry. Main tables. Based on a one in ten sample of NHS patients in hospitals in England and Wales, 1974, Series MB4, no. 2. London: HMSO; 1978. p. ix. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys, Welsh Office. Hospital in-patient enquiry. Patterns of morbidity. Based on a one in ten sample of NHS patients in hospitals in England and Wales, 1962–67, Series MB4, no. 3. London: HMSO; 1979. p. 266. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health, Office of Population Censuses and Surveys (DH & OPCS). Hospital in-patient enquiry in-patient and day case trends. Based on a nominal one in ten sample of NHS patients in hospitals in England 1979–1985, Series MB4, no. 29. London: HMSO; 1989. p. 1. Contains public sector information licensed under the Open Government Licence v3.0.
Douglas JWB. Ministry of Health and General Register Office: report on hospital in-patient enquiry for the year 1958: Part II. London: HMSO; 1961. p. 301. 17s. 6d. Popul Stud (Camb) 1962; 16(2):196.
Farquharson D. Waiting times management in Lothian. Edinburgh: NHS Lothian; 2011. p. 7. http://www.scottish.parliament.uk/S4_HealthandSportCommittee/General Documents/2012.01.09_to_DM_-_report_from_NHS_Lothian_on_waiting_times_management.pdf. Accessed 11 July 2016.
Faulkner A, Frankel S. Delayed access to non-emergency NHS services. A review of NHS waiting times and waiting list research issues. Bristol: Health Care Evaluation Unit, University of Bristol; 1993. p. 23, 84.
Feldstein MS. Economic analysis for health service efficiency. Amsterdam: North-Holland; 1967. p. 152, 200.
Finn C. The management, collection and publication of acute day and inpatient waiting lists. Dublin: Institute for the Study of Social Change, University College Dublin; 2004. p. 12–7.
Fordham R. Managing orthopaedic waiting lists. Discussion paper no. 27. York: Centre for Health Economics, University of York; 1987. p. 9. http://www.york.ac.uk/che/pdf/dp27.pdf. Accessed 11 July 2016.
Fowkes FGR, Page SM, Phillips-Miles D. Surgical manpower, beds and output in the NHS: 1967–1977. Br J Surg. 1983;70:114–6.
Frost CEB. How permanent are NHS waiting lists? Soc Sci Med. 1980;14C:1–11.
Goldacre MJ, Lee A, Don B. Waiting list statistics. I: relation between admissions from waiting list and length of waiting list. Br Med J (Clin Res Ed). 1987;295:1105–8.
Hamblin R, Harrison A, Boyle S. Access to elective care: why waiting lists grow? London: King’s Fund; 1998. p. 12–5, 26, 58. http://kingsfund.koha-ptfs.eu/cgi-bin/koha/opac-detail.pl?biblionumber=20657. Accessed 17 Aug 2016. By permission of The King’s Fund.
Hanning M, Lundström M. Assessment of the maximum waiting time guarantee for cataract surgery. The case of a Swedish policy. Int J Technol Assess. 1998;14:180–93.
Harvey I, Webb M, Dowse J. Can a surgical treatment centre reduce waiting lists? Results of a natural experiment. J Epidemiol Community Health. 1993;47:373–6.
Haywood SC. Managing the health service. London: Allen & Unwin; 1974. p. 38.
Hinde A. The lexis chart. In: Demographic methods. London: Arnold; 1998. p. 12–3.
House of Commons Health Committee. Public expenditure on Health and Personal Social Services 2009. Memorandum received from the Department of Health containing replies to a written questionnaire from the Committee. London: The Stationery Office; 2010. p. 132–4. http://www.publications.parliament.uk/pa/cm200910/cmselect/cmhealth/269/269i.pdf. Accessed 11 July 2016. Contains public sector information licensed under the Open Government Licence v3.0.
Hurst J, Siciliani L. Tackling excessive waiting times for elective surgery: a comparison of policies in twelve OECD countries. Paris: OECD; 2003. https://doi.org/10.1787/108471127058. Accessed 11 July 2016.
Information Management Group (IMG). Patients awaiting elective admission. In: The Data Manual. Hospital services module, version 1.0. Birmingham: NHS Management Executive, Department of Health; 1992. p. 5, 8–10, 14–8. Contains public sector information licensed under the Open Government Licence v3.0.
Kenis P. Waiting lists in Dutch health care. An analysis from an organization theoretical perspective. J Health Organ Manag. 2006;20(4):294–308. By permission of Emerald. https://doi.org/10.1108/14777260610680104.
Kreindler SA. Policy strategies to reduce waits for elective care: a synthesis of international evidence. Br Med Bull. 2010;95:7–32.
Kreindler SA, Bapuji SB. Evaluation of the WRHA prehabilitation program. Winnipeg: Winnipeg Regional Health Authority; 2010. p. 73–9.
Lee A, Don B, Goldacre MJ. Waiting list statistics. II: an estimate of inflation of waiting list length. Br Med J (Clin Res Ed). 1987;295:1197–8.
Mason A. An epidemiological approach to the monitoring of hospital waiting list statistics. Proc R Soc Med. 1976;69:939–42.
Ministry of Health (MoH). National Health Service. The more effective use of hospital beds. HM(54)89. London: Ministry of Health; 1954. p. 1.
Ministry of Health (MoH). On the state of the public health. The annual report of the Chief Medical Officer of the Ministry of Health for the year 1962. London: HMSO; 1963a. p. 205–7.
Ministry of Health (MoH). Reduction of waiting lists, surgical and general. HM(63)22. London: Ministry of Health; 1963b.
Ministry of Health (MoH). Report of the Ministry of Health for the year ended 31st December 1963. The health and welfare services. 1963–64 Cmnd. 2389. London: HMSO; 1964. p. 44.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the two years 1956–1957. London: HMSO; 1961b. p. 12.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1958. Part II. Detailed tables and commentary. London: HMSO; 1961a. p. 107, 262, 264, 298–99.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1959. Part II. Detailed tables and commentary. London: HMSO; 1963a.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1960. Part II. Detailed tables and commentary. London: HMSO; 1963b.
Moral L, de Pancorbo CM. Surgical waiting list reduction programme. The Spanish experience. In: HOPE sub-committee on coordination, editor. Waiting lists and waiting times in health care. Managing demand and supply. Leuven: European Hospital and Healthcare Federation (HOPE); 2001. p. 7, 10–7, 48–9. http://www.hope.be/documents-library/. Accessed 11 July 2016.
National Audit Office. Inappropriate adjustments to NHS waiting lists. Report by the Comptroller and Auditor General. HC452. Session 2001–2002: 19 December 2001. London: The Stationery Office; 2001b. p. 27. https://www.nao.org.uk/report/inappropriate-adjustments-to-nhs-waiting-lists/. Accessed 11 July 2016.
National Audit Office. Inpatient and outpatient waiting in the NHS. Report by the Comptroller and Auditor General. HC 221. Session 2001–2002: 26 July 2001. London: The Stationery Office; 2001a. p. 21. https://www.nao.org.uk/report/inpatient-and-outpatient-waiting-in-the-nhs/. Accessed 11 July 2016. By permission of the National Audit Office.
National Audit Office Wales. NHS waiting times in Wales. Volume 1 – the scale of the problem. Cardiff: The Stationery Office; 2005. p. 7, 11, 18–9, 32–4, 43.
National Waiting Times Unit (NWTU). Managing waiting times. A good practice guide. Edinburgh: Scottish Executive; 2003. p. 4–5. http://www.gov.scot/Publications/2003/09/18035/25483. Accessed 11 July 2016.
Naylor CD. A different view of queues in Ontario. Health Aff (Millwood). 1991;10(3):110–28.
Naylor CD, Slaughter P, Sykora K, et al. Waits and rates: the 1997 ICES report on Coronary Surgical capacity for Ontario. Toronto: Institute for Clinical Evaluative Sciences; 1997. p. 14.
Newell C. The basic demographic equation. In: Methods and models in demography. London: Belhaven Press; 1988. p. 8.
Newton JN, Henderson J, Goldacre MJ. Waiting list dynamics and the impact of earmarked funding. BMJ. 1995;311:783–5.
NHS Information Standards Board (ISB). Measuring and recording of waiting times. DSCN: 09/2006. Birmingham: NHS Management Executive; 2006. p. 1, 44, 46. Contains public sector information licensed under the Open Government Licence v3.0.
NHS/DHSS Steering group on Health Services Information (Steering Group). A report on the collection and use of information about hospital clinical activity in the National Health Service. London: HMSO; 1984. p. 27–8, 86–90, 131. Contains public sector information licensed under the Open Government Licence v3.0.
Niinimäki T. Increasing demands on orthopedic services. Acta Orthop Scand. 1991;62(S241):42–3.
Nordberg M, Keskimäki I, Hemminki E. Is there a relation between waiting-list length and surgery rate? Int J Health Plann Manage. 1994;9:259–65.
Powell JE. Supply and demand. In: A new look at medicine and politics. London: Pitman Medical; 1966. p. 39–40. http://www.sochealth.co.uk/national-health-service/healthcare-generally/history-of-healthcare/a-new-look-at-medicine-and-politics-4/. Accessed 11 July 2016.
Pressat R. Balancing equation. In: Wilson C, editor. The dictionary of demography. Oxford: Blackwell; 1985. p. 15.
Purcell J. The waiting list initiative. Report on value for money examination. Dublin: Office of the Comptroller and Auditor General; 2003. p. 8, 17, 23, 26, 28. http://www.audgen.gov.ie/ViewDoc.asp?DocId=-1&CatID=5. Accessed 11 July 2016. By permission of the Office of the Comptroller and Auditor General.
Registrar General. Statistical review of England and Wales for the year 1955. Supplement on hospital in-patient statistics. London: HMSO; 1959. p. 2.
Sanmartin C, Barer ML, Sheps SB. Health care waiting lists and waiting times: a critical review of the literature. In: Waiting lists and waiting times for health care in Canada: more management, more money. Ottawa: Health Canada; 1998. p. 196, 198, 241–54, 270, 281. http://publications.gc.ca/site/eng/9.647111/publication.html. Accessed 11 July 2016.
Smethurst DP, Williams HC. Self-regulation in hospital waiting lists. J R Soc Med. 2002;95:287–9.
Snaith AH. Supply and demand in the NHS. Br Med J. 1979;1(6171):1159–60.
Street A, Duckett S. Are waiting lists inevitable? Health Policy. 1996;36:1–15. By permission of Elsevier. https://doi.org/10.1016/0168-8510(95)00790-3.
Sykes PA. DHSS waiting list statistics – a major deception? Br Med J (Clin Res Ed). 1986;293:1038–9.
Torkki M, Linna M, Seitsalo S, et al. How to report and monitor the performance of waiting list management. Int J Technol Assess. 2002;18(3):611–8.
White A. Waiting lists. A step towards representation, clarification and solving of information problems. Hosp Health Serv Rev. 1980;76(8):270–4.
Worthington D. Hospital waiting list management models. J Oper Res Soc. 1991;42(10):833–43.
Yates J. Why are we waiting? An analysis of hospital waiting lists. Oxford: Oxford University Press; 1987. p. 71. By permission of Oxford University Press.
15 Waiting Times: Evidence of Social Inequalities in Access for Care
Luigi Siciliani
Contents
Introduction
Sources of Inequalities in Waiting Times
Data and Empirical Methods
Data
Methods
A Review of the Evidence
International Studies: Evidence from SHARE and the Commonwealth Fund
United Kingdom
Australia
Norway
Sweden
Canada
Germany
Spain
Italy
Conclusions and Implications for Policy
References
seems however to suggest that within several publicly funded health systems, nonprice rationing does not guarantee equality of access by socioeconomic status. Individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital care than those with lower socioeconomic status. This negative gradient between waiting time and socioeconomic status may be interpreted as evidence of inequity within publicly funded systems which favors rich and more-educated patients over poorer and less-educated ones. The chapter provides an overview of methods and data to investigate the presence of social inequalities in waiting times and highlights key results.

Introduction

Equity is a key policy objective in publicly funded health systems. In many OECD countries, this takes the form of payments towards health care funding being related to ability to pay, not the use of medical care; access to health care being based on patients’ need, not patients’ ability to pay; and an overall reduction in health inequalities.

An extensive empirical literature has been devoted to documenting inequalities in healthcare financing, access, and health (see Wagstaff and Doorslaer 2000 for a review). This chapter focuses on one form of inequalities in access.

The principle that “access should be based on need” seems both intuitive and desirable. However, the words “access” and “need” are subject to different interpretations. “Access” can simply refer to healthcare utilization, i.e., whether a patient has received treatment or not. But it could also refer to the opportunity to receive treatment, when the monetary and nonmonetary costs that people incur have been taken into account. Monetary costs involve any copayment the patient has to pay or monetary expenses to reach a healthcare provider (a patient from a rural area may, for example, face significant travel costs). Nonmonetary costs can take the form of waiting times, if the patient has to wait several weeks for an elective treatment (e.g., a hip replacement) or a few hours in the emergency room. “Need” can be interpreted as ill health or severity, but also as the ability (or capacity) to benefit. The two concepts differ since ill patients may have low capacity to benefit from treatment (as for some cancer patients).

An extensive empirical literature has been devoted to testing whether, controlling for need, individuals with different socioeconomic status differ in healthcare utilization (see Wagstaff and Doorslaer 2000 for a review). In most studies, the level of healthcare utilization is measured by the number of visits to a specialist or a family doctor, while need is measured by self-reported health. Comparative international studies suggest that in many OECD countries there is generally pro-rich inequity for physician contacts, in particular in relation to specialist visits and, to a lesser extent, family-doctor consultations (where in some instances pro-poor inequities may be present) (see van Doorslaer et al. 2000, 2004; Devaux 2015 for a recent analysis).

This chapter focuses on inequalities of access as measured by waiting times for nonemergency treatments. Studies of inequalities in waiting times by socioeconomic status are relatively infrequent. This is perhaps surprising given that waiting times are a major health policy issue in many OECD countries. Average waiting times can reach several months for common procedures like cataract and hip replacement (Siciliani et al. 2014). Where prices are absent or of limited use and supply is constrained, publicly funded systems are often characterized by excess demand. Since the number of patients demanding treatment exceeds supply, patients are added to a waiting list and have to wait before receiving treatment (Martin and Smith 1999).

Waiting times generate dissatisfaction for patients since they postpone benefits from treatment, may induce a deterioration of the health status of the patient, prolong suffering, and generate uncertainty. A number of policies have been introduced across the globe to reduce or tackle waiting times (see Siciliani et al. 2013a for a review).

From an equity perspective, one possible advantage of rationing by waiting times is that
within publicly funded health systems, access to services is not supposed to depend on the ability to pay, unlike any form of price rationing where access is dependent on income. For a given level of need, the duration of the wait is supposed to be the same for patients with different incomes. Patients with higher need or urgency are supposed to wait less based on implicit or explicit prioritization rules.

A recent empirical literature, reviewed in this chapter, seems however to suggest that within several publicly funded health systems, nonprice rationing does not guarantee equality of access by socioeconomic status. Individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital care than those with lower socioeconomic status.

This negative gradient between waiting time and socioeconomic status may be interpreted as evidence of inequity within publicly funded systems which favors rich and more-educated patients over poorer and less-educated ones. Therefore, rationing by waiting times may be less equitable than it appears.

The chapter focuses on studies employing large samples either from administrative or survey data. The chapter is organized as follows. Possible sources of inequalities in waiting times are first discussed. Second, appropriate data and empirical methods are presented which can be usefully employed to investigate inequalities in waiting times. Third, the existing evidence is reviewed. Fourth, possible policy implications are drawn.

Sources of Inequalities in Waiting Times

This section describes different mechanisms that generate inequalities in waiting times. Several health systems are publicly funded and characterized by universal health coverage (e.g., Australia, Italy, New Zealand, Spain, and the United Kingdom). These often coexist with a parallel private sector for patients who are willing to pay out of pocket or who are covered by private health insurance. A key feature of this private sector is that waiting times are negligible: the promise of a short wait is indeed the main way to attract patients from the public to the private sector. For several elective treatments, patients can therefore wait and obtain treatment in the public sector for free (or by paying a small copayment) or opt for the private sector and obtain care more swiftly if they are willing to pay the price (or prospectively insure themselves privately).

Since it is individuals with higher income who are more likely to be able to afford private care, this generates inequalities in waiting times by socioeconomic status within a country. The extent of such inequalities due to the presence of the private sector is likely to depend on its relative size. For example, about 50% of treatments are private in Australia, but these tend to be negligible in the Nordic countries where the option of going private is much more limited.

Within publicly funded systems, access to care should be based exclusively on need, not on ability to pay (in contrast to contributions to the funding of health systems, which are instead based on ability to pay, not need). Therefore, waiting times for patients on the list should reflect need and not socioeconomic status. Indeed, patients on the list are prioritized by doctors. Patients with higher severity and urgency are supposed to wait less than less severe and less urgent patients.

In practice, it is possible that variations in waiting times for publicly funded patients also reflect non-need factors. Waiting-time inequalities may be due to hospital geography and therefore arise “across” hospitals. This could be due to some hospitals having more capacity (number of beds and doctors) and being able to attract a more skilled workforce. This may be the case for hospitals located in an urban as opposed to a rural area. Also, some geographical areas may be underfunded compared to others. If individuals with higher socioeconomic status live in areas where hospitals are better funded or have higher capacity, then this may contribute to inequalities in waiting times by socioeconomic status.

Inequalities in waiting times may also arise “within” the hospital as opposed to “across” hospitals. Individuals with higher socioeconomic status may engage more actively with the health
system and exercise pressure when they experience long delays. They may be able to express their needs better. They may also have better social networks (know someone) and use them to have priority over other patients (attempt to jump the queue). They may have a lower probability of missing scheduled appointments (which would increase the waiting time). They may search more actively for hospitals with lower waiting times and be willing to travel further.

Data and Empirical Methods

Data

individuals with primary, secondary, and tertiary educational attainments, etc.).

The key advantage of survey data is that socioeconomic status (such as income and highest educational attainment) is routinely recorded at the individual level. However, the sample tends to be smaller and more heterogeneous: patients’ treatment can range from less urgent ones (e.g., a cataract surgery) to more urgent ones (e.g., cancer treatment). Detailed measures of severity are generally also missing. A measure of self-reported health tends to be used as a proxy for health needs, which is in line with previous literature on measuring social inequalities in healthcare utilization (Wagstaff and Van Doorslaer 2000).
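
One way to read the use of self-reported health as a proxy for need is as a stratified comparison: waits are compared across educational groups only among respondents who report the same health. The sketch below illustrates the idea under stated assumptions. The survey rows, the category labels, and the waits in weeks are invented for illustration, and the stratification is a simplification chosen for clarity, not the method of any study cited here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey responses: (education, self_reported_health, wait_in_weeks).
responses = [
    ("tertiary", "good", 6), ("tertiary", "good", 8),
    ("primary", "good", 10), ("primary", "good", 12),
    ("tertiary", "poor", 4), ("tertiary", "poor", 6),
    ("primary", "poor", 7), ("primary", "poor", 9),
]


def mean_wait_by_need_and_education(rows):
    """Mean wait for each (self-reported health, education) cell."""
    cells = defaultdict(list)
    for education, health, wait in rows:
        cells[(health, education)].append(wait)
    return {cell: mean(waits) for cell, waits in cells.items()}


def gap_within_need(rows, low="primary", high="tertiary"):
    """Difference in mean wait (low minus high education) within each health group."""
    means = mean_wait_by_need_and_education(rows)
    health_groups = sorted({health for _, health, _ in rows})
    return {h: means[(h, low)] - means[(h, high)] for h in health_groups}


means = mean_wait_by_need_and_education(responses)
gaps = gap_within_need(responses)
print(gaps)
```

A positive gap within a health group means that, among respondents reporting the same health, those with less education waited longer on average. In these invented data the gap is positive in both groups, which is the pattern a negative gradient between waiting time and socioeconomic status would produce.
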
times (from specialist addition to the list) of patients on the list at a census date. This measure is analogous to the definition provided above but refers to the patients on the list at a given census date (as opposed to patients treated in a given year). Similarly, the referral-to-treatment waiting time of patients on the list can be defined.
The distribution of waiting time of patients treated measures the full duration of the patient’s waiting time experience (from entering to exiting the list). The distribution of the waiting times of patients on the list refers to an incomplete duration since, if on the list, patients are still in the process of waiting. The waiting time of patients treated has the advantage of capturing the full duration of a patient’s journey, but it is retrospective in nature. It also does not capture the wait of the patients who never received treatment since they died while waiting, changed their mind, received a treatment in the private sector, etc. The two distributions of waiting times are different but related. Both distributions can be used to compute the probability of being treated (i.e., of waiting time ending) as time passes, i.e., the hazard rate in terms of survival analysis. The hazard rate derived under the two distributions will be the same if the system is in steady state and if each patient on the list is ultimately treated. Both conditions are unlikely to hold in reality. This emphasizes some of the differences between the two distributions (but see Armstrong 2000, 2002; Dixon and Siciliani 2009 for a fuller discussion of these issues).
Table 1 below provides comparative figures of median and mean waiting times across OECD countries in 2011. It illustrates how some countries report inpatient waiting time from specialist addition to the list to treatment, some report inpatient waiting time for patients on the list, and some report both measures. Among the countries included, waiting times appear lowest in Denmark and the Netherlands. It is also evident that mean waiting times are longer than the median ones, and this is due to the skewed distribution of waiting times with a small proportion of patients having a very long wait. As an example, Fig. 1 provides the distribution of waiting times for hip replacement for several OECD countries (Australia, New Zealand, Portugal, Finland, and the United Kingdom).
It is important to emphasize that such measures of waiting times refer to elective (nonemergency) conditions where the wait is generally long (in the order of weeks or months) though they can be shorter for more urgent elective care (e.g., cancer care). Emergency care is therefore often excluded from the empirical analyses.
Most empirical analyses making use of administrative data surveyed in this chapter have employed data that measure the inpatient waiting time, which is computed retrospectively once the patient has received treatment. Those with survey data have included both the inpatient and the outpatient waiting time (for a specialist visit). Waiting-time measures from survey data are typically self-reported. Surveyed individuals are asked questions of the type “if you had an inpatient (outpatient) care in the last year, how long did you wait to be treated (to see a specialist)?” Answers may therefore suffer from recall bias.

Methods

The empirical analyses are interested in testing whether patients with higher socioeconomic status wait less than patients with lower socioeconomic status when admitted to hospital. This section first presents a simple model specification which can be estimated with the ordinary least squares (OLS) method and then proceeds to more sophisticated models such as duration analysis.

Model Specification with Administrative Data

Suppose that the researcher has at her disposal a sample of I patients receiving treatment in J hospitals. The sample includes all patients who received a specific treatment (e.g., hip and knee replacement, cataract surgery, coronary bypass, varicose veins). Each patient receives treatment only in one hospital. Each hospital in the sample treats at least one patient. Define w as the inpatient waiting time for patients receiving treatment in a public hospital. It is assumed that waiting times are measured in days and that
Table 1 Median (mean) waiting times for common surgical procedures: 2011
Patients treated – inpatient (time from specialist addition to list to treatment)
Hip replacement Knee replacement Cataract Hysterectomy Prostatectomy Cholecystectomy Hernia CABG PTCA
Australia 108 173 90 49 47 54 57 17
Canada 89 107 49 7
Finland 108 127 122 84 49 69 76 35 21
(125) (141) (125) (98) (72) (90) (96) (45) (31)
Netherlands (46) (43) (32) (34) (32) (35) (36) (26) (16)
New Zealand 90 96 84 98 63 62 57 28 51
(104) (112) (94) (109) (86) (86) (82) (37) (66)
Portugal 92 192 49 90 61 80 82 2
(149) (231) (67) (125) (115) (134) (120) (29)
Spain (127) (89) (91) (89) (87)
UK-England 81 85 57 61 31 70 60 52 35
(91) (96) (65) (70) (41) (81) (71) (62) (40)
UK-Scotland 75 80 62 48 51 61 63 35 29
(90) (94) (70) (53) (55) (77) (82) (47) (33)
Patients treated – Referral to treatment (time from family doctor referral to treatment)
Denmark 39 46 70 35 36 38 45 13
(51) (59) (99) (49) (56) (46) (56) (19)
Patients on the list – Inpatient
Ireland 103 119 118 96 81 93 98 77 54
(130) (153) (144) (131) (127) (132) (128) (102) (78)
New Zealand 60 65 51 65 51 58 54 46 38
(78) (84) (63) (73) (66) (75) (69) (60) (51)
Portugal 129 156 67 82 103 117 95 93
(189) (200) (100) (111) (185) (178) (147) (118)
Spain (93) (71) (74) (74) (71)
Sweden 43 45 40 25
Slovenia 340 495 58 90 90 240
(354) (512) (63) (122) (132) (275)
Source: Siciliani et al. (2013b)
[Figure: bar chart showing, for UK-Scotland, UK-England, Portugal, New Zealand, Finland, and Australia, the percentage of patients waiting 0-3, 3-6, 6-9, 9-12, and over 12 months]
Fig. 1 Distribution of waiting times of patients treated (Source: Siciliani et al. (2013b))
waiting time is a continuous variable. The following linear model can be specified:

$$ w_{ij} = d_j' \beta_j + y_{ij}' \beta_y + s_{ij}' \beta_s + e_{ij} \qquad (1) $$

where $w_{ij}$ is the waiting time of patient $i$ in public hospital $j$. Waiting times are a function of (and additively separable in) the determinants outlined on the right-hand side of Eq. 1.
$s_{ij}$ is a vector of patients’ characteristics capturing patients’ severity. These could include age, gender, and number of comorbidities. These factors control for the severity of the patient’s health condition. In many countries, patients on the list are prioritized on the basis of their severity and more severe patients wait less relative to nonsevere ones. The coefficients $\beta_s$ are therefore expected to be negative. They provide a measure of the extent to which patients with higher severity wait less.
$y_{ij}$ is a variable (or a vector of variables) which captures socioeconomic status, as measured by the income in the area where the patient lives. Inequalities in waiting time across patients with different socioeconomic status arise if $\beta_y \neq 0$. If $\beta_y$ is negative then individuals with higher (lower) socioeconomic status wait less (more), keeping other variables (including severity) constant.
$d_j$ is a vector of hospital dummy variables (fixed effects), one for each hospital. These are included to control for systematic differences in waiting times across hospitals which arise from differences in supply (beds, doctors, efficiency) or in demand (e.g., proportion of the elderly). Hospitals with higher $\beta_j$ have longer waiting times on average.
$e_{ij}$ is the idiosyncratic error term. This can be interpreted as any variation in waiting time which is not captured by the other variables (this includes coding and measurement error, or dimensions of severity unobserved by the researcher).
The simplest way to estimate Eq. 1 is with ordinary least squares (OLS). OLS minimizes the sum of the squared distances between the observed data and the predicted ones based on linear approximation, i.e., the sum of the squared errors (Cameron and Trivedi 2010, Chap. 3). OLS relies on a number of assumptions, including the exogeneity of the regressors, the error terms having the same variance (homoscedasticity) and conditionally uncorrelated observations. Under the assumption that the error terms are normally distributed, one can test whether the estimated coefficients are statistically different from zero.
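To make the estimation of Eq. 1 concrete, the following is a minimal sketch on simulated data, not the procedure of any study cited here. It assumes a single income variable and a single severity variable, a dependent variable simulated on a log scale, and made-up coefficient values; the function names `simulate`, `demean_by_group`, and `fixed_effects_ols` are hypothetical. Hospital fixed effects are handled with the within transformation (demeaning every variable by hospital), which gives the same slope estimates as including one dummy per hospital.

```python
import random


def simulate(n_patients=5000, n_hospitals=20, seed=0):
    """Simulate patient data with known coefficients (all numbers are made up)."""
    rng = random.Random(seed)
    hospital_effect = [rng.gauss(4.0, 0.3) for _ in range(n_hospitals)]
    log_wait, log_income, severity, hospital = [], [], [], []
    for _ in range(n_patients):
        j = rng.randrange(n_hospitals)
        s = rng.gauss(0.0, 1.0)
        # Assumption: more severe patients tend to have lower income.
        y = -0.5 * s + rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 0.5)
        # True coefficients: beta_y = -0.2, beta_s = -0.4.
        log_wait.append(hospital_effect[j] - 0.2 * y - 0.4 * s + e)
        log_income.append(y)
        severity.append(s)
        hospital.append(j)
    return log_wait, log_income, severity, hospital


def demean_by_group(values, groups):
    """Subtract the group mean from each value (the within transformation)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fixed_effects_ols(wait, income, severity, hospital):
    """Return (beta_y, beta_s) from OLS with hospital fixed effects."""
    w = demean_by_group(wait, hospital)
    y = demean_by_group(income, hospital)
    s = demean_by_group(severity, hospital)
    # Normal equations for two regressors, solved with Cramer's rule.
    s_yy = sum(a * a for a in y)
    s_ss = sum(a * a for a in s)
    s_ys = sum(a * b for a, b in zip(y, s))
    s_yw = sum(a * b for a, b in zip(y, w))
    s_sw = sum(a * b for a, b in zip(s, w))
    det = s_yy * s_ss - s_ys ** 2
    beta_y = (s_yw * s_ss - s_sw * s_ys) / det
    beta_s = (s_sw * s_yy - s_yw * s_ys) / det
    return beta_y, beta_s


log_wait, log_income, severity, hospital = simulate()
beta_y, beta_s = fixed_effects_ols(log_wait, log_income, severity, hospital)
print(f"beta_y = {beta_y:.3f}, beta_s = {beta_s:.3f}")
```

In this simulation the estimates should land near the values used to generate the data (-0.2 for income and -0.4 for severity). With real data a regression package would normally be used, since it also reports standard errors for hypothesis testing.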
For the coefficients $\beta_y$ to provide an unbiased (correct) estimate of whether patients with higher socioeconomic status wait more or less than other patients, either socioeconomic status has to be uncorrelated with other determinants of waiting times (which seems implausible) or, if it is correlated, all other possible determinants of waiting times have to be controlled for. Otherwise, the estimates of $\beta_y$ will be prone to so-called omitted variable bias.
For example, more severe patients are more likely to have lower socioeconomic status (Wagstaff and van Doorslaer 2000). Patients’ severity may therefore be correlated negatively with both waiting time and socioeconomic status. Failure to control for patient severity might generate biased results. Without controlling for severity, a positive correlation between waiting time and income may be observed, while such correlation may disappear once controls for severity are added.
Similarly, hospitals with high supply (and lower waiting times) are likely to be located in urban areas where high-income patients are concentrated, leading to a correlation between hospital characteristics and socioeconomic characteristics of the patient’s area of residence. Omitting hospital dummies (fixed effects) might overestimate inequalities. Including hospital fixed effects allows interpreting socioeconomic inequalities in waiting times “within” a hospital, rather than across hospitals. If researchers are interested in explaining waiting times inequalities across hospitals, a range of supply variables (e.g., number of beds and doctors, length of stay) can be employed instead of hospital fixed effects.
In summary, inequalities in waiting time across patients with different socioeconomic status arise if $\beta_y \neq 0$, i.e., when differences in waits are statistically significant even after controlling for patients’ severity and hospital fixed effects.
Hypothesis testing requires the error terms to be normally distributed. Given that waiting times have a skewed distribution, the error terms in Eq. 1 are unlikely to be normal. To address this issue, the dependent variable $w_{ij}$ is typically transformed by the logarithmic function, so that the dependent variable becomes $\log(w_{ij})$. If the covariates (regressors) on the RHS of Eq. 1 are also transformed in logs, then each estimated OLS coefficient can be interpreted as an elasticity. For example, if socioeconomic status is measured with income and $\beta_y = -0.5$, then a 10 % increase in income reduces waiting times by 5 %. If instead the covariates are dummy variables, then the estimated coefficient can be interpreted (approximately) as the proportionate change in waiting times (semielasticity). For example, suppose that socioeconomic status is measured through the highest level of education attained by the patient and patients either went to university or not. Suppose further that the estimated coefficient associated with the dummy variable (equal to one if the patient has a university degree) is equal to $\beta_y = -0.1$. Then, patients with a university degree wait 10 % less.
Estimating Eq. 1 by OLS treats hospital effects as fixed. This approach generates unbiased but inefficient estimates due to the inclusion of a large number of regressors (therefore introducing the possibility of not identifying a gradient when there is one). An alternative approach is to assume that hospital effects are random. Under the assumption that hospital effects are uncorrelated with other covariates, the coefficients in Eq. 1 will be estimated more efficiently. However, a random effect model will generate biased coefficients if hospital effects are correlated with other covariates. Whether the random effects generate different estimated coefficients compared to the fixed effects can be tested through a Hausman test (Cameron and Trivedi 2010, Chap. 8).

Model Specification with Survey Data

Studies that employ survey data typically have smaller samples. Investigating waiting times by treatment or procedure is often precluded. An analysis can still be conducted by pooling the sample across different treatments and conditions. In such studies, additional dummy variables have to be introduced to control for systematic differences in waiting times across conditions (e.g., waiting for a cataract surgery tends to be longer than for coronary bypass). Moreover, survey data rarely have information on the provider (e.g., the hospital) where the patient received the treatment.
It is therefore not possible to control for hospital fixed effects.
The model in Eq. 1 can be modified in the following way. Define again $w$ as the inpatient or outpatient waiting time for patients who received treatment in a given year. The model specification is:

$$ \ln(w_{ik}) = d_k' \beta_k + y_i' \beta_y + s_i' \beta_s + e_{ik} \qquad (2) $$

NBM reduces to the PM in the special case when there is no overdispersion in the data.
If measured in weeks or months, waiting times data are discretized: the variable is observed discretely, whereas the underlying process generating waiting times is intrinsically continuous. An alternative to the NBM is the interval regression model, which is specifically designed for discretized continuous variables.
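As a quick check on how coefficients are read in a log-linear specification such as Eq. 2, the exact proportionate changes can be worked out directly. The calculation below is a sketch under assumed, purely illustrative coefficient values ($-0.1$ on a university-degree dummy and $-0.5$ on log income); the subscripts 0 and 1 (before and after the change in the regressor) are notation introduced here.

```latex
% Dummy regressor (semielasticity), coefficient \beta_y = -0.1:
\ln w_1 - \ln w_0 = \beta_y
\;\Longrightarrow\;
\frac{w_1 - w_0}{w_0} = e^{\beta_y} - 1 = e^{-0.1} - 1 \approx -0.095

% Log regressor (elasticity), coefficient \beta_y = -0.5, income up by 10 %:
\frac{w_1}{w_0} = 1.1^{\beta_y} = 1.1^{-0.5} \approx 0.953
\;\Longrightarrow\;
\frac{w_1 - w_0}{w_0} \approx -0.047
```

The exact reductions are therefore about 9.5 % and 4.7 %, close to the approximate readings of 10 % and 5 %. The gap between the exact and approximate figures widens as coefficients grow in absolute value.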
which is less than one will imply that less-educated patients have a lower probability of exiting the list (and therefore of being treated within a given time).
The Cox model assumes that the hazard ratio between two different groups, for example, those treated in hospital $j$ and hospital $j'$, $\exp\left[\sum_k \beta_k (x_{jk} - x_{j'k})\right]$, is constant with time waited (Cameron and Trivedi 2005, Chap. 17.8). If this assumption is violated, then the stratified Cox model and the extended Cox model may be more appropriate. The former introduces group-specific baseline hazards, $h_{0j}(t)$. Therefore, the conditional hazard rate becomes: $h(t, x) = h_{0j}(t) \exp\left(\sum_k \beta_k x_k\right)$. The main advantage of the stratified Cox model is that it relaxes the common baseline hazard assumption. The main disadvantage is that hazard ratios between the stratified groups cannot be identified. The extended Cox model introduces time dependency by interacting covariates with the time waited, $g_k(t)$ (Pettitt and Daud 1990; Fisher and Lin 1999):

$$ h(t, x(t)) = h_0(t) \exp\left[\sum_k \beta_k x_k + \sum_k \delta_k x_k g_k(t)\right] \qquad (4) $$

where $\delta_k$ are the coefficients of the time interactions.

Other Methods

Another useful regression method for investigating waiting times is quantile regression (Cameron and Trivedi 2010, Chap. 7). Estimating Eq. 1 by OLS allows estimating the effect of socioeconomic status at the sample mean. Since patients differ in the degree of urgency, it may be interesting to estimate whether such effect is persistent also when waiting times are high or low, i.e., across different cut-off points in the waiting time distributions (say the 20th and 80th percentile, and at the median) through a quantile regression model. Doctors prioritize patients based on their degree of urgency. Some dimensions of urgency may however remain unobservable to the researcher. Whether a larger socioeconomic gradient should be expected at low or high waiting times is in principle indeterminate. Since waiting times are short when the condition is more urgent, richer and more-educated people may be keener to obtain reductions in waiting times when they perceive delays to affect their health more critically. On the other hand, precisely because waiting times are short, there may be less scope for influencing them.
Finally, a concern may be raised that estimates in Eq. 1 are contaminated by what is known as sample selection based on factors unobserved by the researcher. For example, patients with higher income who expect to wait a long time are more likely to afford and opt for the private sector. It may therefore arise that public hospitals treat poor patients with expected high and low waiting times but only rich patients with low waiting times. In turn, this may generate an apparent negative gradient between income and waiting time for patients receiving treatment within publicly funded hospitals. If the researcher observes whether patients went for public or private treatment, then a Heckman selection model can be estimated to adjust for sample-selection bias (Heckman 1979). Such a model involves estimating a selection equation for the choice of the patient between opting for private care versus public care, which can include socioeconomic status among its determinants. For the model to perform well, an identification variable is recommended, i.e., a variable which predicts the choice of going public versus private but does not directly affect waiting times (distance to the hospital may be such an identifying variable; see Sharma et al. 2013).

A Review of the Evidence

This section first reviews key results from international studies and then turns to studies that focus on individual countries.
International Studies: Evidence from SHARE and the Commonwealth Fund

Using survey data from the Survey of Health, Ageing and Retirement in Europe (SHARE), Siciliani and Verzulli (2009) test whether waiting times for specialist consultation and nonemergency surgery differ by socioeconomic status. The sample includes nine European countries: Austria, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain and Sweden. The survey covers 22,000 respondents across these European countries. The analysis controls for severity as proxied by age, gender, and self-reported health (and type of specialist care and treatment). Privately funded patients are excluded from the analysis (a minority of the sample). Therefore, the analysis can be interpreted in terms of inequalities among publicly funded patients. Since waiting times are measured in weeks and months, a negative binomial model is employed.
For specialist consultation, they find that individuals with high education experience a reduction in waiting times of 68 % in Spain, 67 % in Italy and 34 % in France (compared with individuals with low education). Individuals with intermediate education report a waiting-time reduction of 74 % in Greece (compared with individuals with low education). There is also evidence of a negative and significant association between education and waiting times for nonemergency surgery in Denmark, the Netherlands, and Sweden. High education reduces waits by 66 %, 32 %, and 48 %, respectively. There is some evidence of income effects, although generally modest. An increase in income of 10,000 Euro reduces waiting times for specialist consultation by 8 % in Germany and waiting times for nonemergency surgery by 26 % in Greece. Surprisingly, an increase in income of 10,000 Euro increases waits by 11 % in Sweden.
Schoen et al. (2010) use data from the 13th annual health policy survey conducted in 2010 by the Commonwealth Fund in eleven countries (Australia, Canada, France, Germany, New Zealand, the Netherlands, Norway, Sweden, Switzerland, the United Kingdom, and the United States). Waiting times are measured for a specialist visit and for elective surgery. Socioeconomic status is proxied by a dummy variable equal to one if income is above average. Control variables include age, health status, and, for the USA, private insurance status.
Employing logistic regression, the study shows that individuals with above-average income have a lower probability of waiting more than 2 months for a specialist visit in Australia, New Zealand, and the Netherlands. They also have a higher probability of waiting less than 4 weeks for a specialist visit in Australia, Canada, New Zealand, and the United States. No marked differences in waiting times by socioeconomic status are found for elective surgery. Since no control variable is included for patients going to a private provider, differences in waiting times by socioeconomic status could to some extent be explained by richer patients opting for the private sector when waiting times are high.

United Kingdom

Using administrative data, Cooper et al. (2009) investigate the presence of inpatient waiting-time inequalities in England for the following elective procedures: hip and knee replacement and cataract surgery. They also examine whether such inequalities varied during the Labour government between 1997 and 2007. Waiting time was much higher in the early years but then gradually fell. The analysis refers to publicly funded patients only, i.e., patients treated by the National Health Service. Patients who do not want to wait can opt for treatment in the private sector, but they will have to pay or hold private health insurance.
The regression analysis (similar to Eq. 1) controls for patients’ age, gender, area type (e.g., city, town and fringe, isolated village), but not for hospital fixed effects. The regressions are run for three periods corresponding to different government policy (1997–2000, 2001–2004, and 2005–2007). Socioeconomic status was measured through an index of income deprivation (the 2001 Carstairs index at the output area level then
transformed into five income deprivation quintiles). The Carstairs index is based on car ownership, unemployment, overcrowding, and social class within output areas, calculated by the Office for National Statistics.
The study finds that compared to patients with lowest income deprivation (highest socioeconomic status) patients in other groups tend to wait longer, up to about 2 weeks longer. For some procedures and years, the effect is non-monotonic with patients with middle-income deprivation waiting longest. Inequalities in waiting times tend to decrease over time. This is probably due to waiting times falling over the considered period. The authors conclude that equity improved over time. In the period 2005–2007, very little difference existed in waiting times across patients with differing deprivation.
The analysis by Cooper et al. (2009) does not account for hospital fixed effects. Therefore, inequalities in waiting times may reflect variations “across hospitals” due, for example, to different resources or variations “within the hospital” due, for example, to some patients being able to get ahead of the queue. Laudicella et al. (2012) extend the analysis by introducing hospital fixed effects but focus on hip replacement only in 2001. They split the deprivation index between “income” deprivation (based on individuals on benefits) and “education” deprivation. They provide evidence of inequalities in waiting times favoring more-educated and richer individuals. More precisely, a patient who is least skill-deprived in education waits 9–14 % less than other patients; patients in the fourth and fifth most income-deprived quintile wait about 7 % longer than other patients. The analysis provides evidence that most inequalities occur within hospitals rather than across hospitals (failure to control for hospital fixed effects results in underestimation of the income gradient). The key insights are similar when the Cox nonparametric model is employed. More educated patients have a higher probability of leaving the list (the inverse of hazard ratio) by 2–6 %. Richer patients have a higher probability of leaving the list by 4–9 %.
Pell et al. (2000) investigate inequalities in waiting times for cardiac surgery in Scotland. They employ administrative data measuring the inpatient waiting time. Similarly to Cooper et al. (2009), socioeconomic status is proxied through the Carstairs deprivation index. They find that the most deprived patients waited 24 days longer than the least deprived ones. This was in part due to less deprived patients being more likely to be classified as urgent.

Australia

Sharma et al. (2013) investigate the presence of inequalities in waiting times in the State of Victoria (which accounts for 25 % of the Australian population) in 2005. The study employs administrative data on inpatient waiting time for publicly funded patients. Several surgical procedures are considered (including eye, hip and knee procedures, hysterectomy, and prostatectomy).
A key institutional feature of the Australian system is that although everyone has public insurance, about half of the population has private health insurance and about half of the care is provided by private hospitals. More precisely, patients who seek treatment in a public hospital receive treatment for free under Medicare (Australia’s universal public health insurance scheme) but have to wait. Patients who seek treatment in a private hospital incur the full cost of treatment, which is paid by the patient either directly or through her private health insurer.
Given such an institutional feature, one explanation for a potential observed gradient between waiting time and socioeconomic status for publicly funded patients is the possibility of sample selection: rich patients who expect to wait are more likely to afford and opt for the private sector, generating a negative gradient between income and waiting time in the public system. In other words, public hospitals treat poor patients with expected high and low waiting times, but only rich patients with low waiting times are treated in public hospitals. This is of potential importance for policy. If the gradient is explained by sample
selection, then it should not be interpreted as evidence of inequity.
Since private hospitals have to report the same data as public hospitals, detailed administrative data are available for both the public and private sectors (unlike many other countries). These data are therefore suitable for testing for sample selection generated by the private sector through a Heckman sample-selection model (the distances to the nearest public and private hospitals are used as identifying variables).
Like the English studies, socioeconomic status is measured through an index which captures economic resources at small-area level (suburbs), known as the SEIFA (Socio-Economic Indexes for Areas) for economic resources. Examples of variables used to construct the SEIFA index are the proportions of: people with low income, single-parent families, occupied private housing with no car, households renting from a community organization, unemployed, and households owning a house they occupy.
The analysis suggests that individuals who live in richer areas wait less. Compared to patients living in areas with lowest income, patients living in areas with highest income wait 13 % less. With an average wait of 89 days, this implies an average reduction of 11 days. Patients in almost every decile of income have a progressively lower waiting time than the one below. Once selection is taken into account, the gradient between waiting times and socioeconomic status reduces significantly in size but does not disappear. Compared to patients in the lowest income decile, patients whose income falls between the 2nd and 7th deciles wait 3–4 % less, and patients whose income falls between the 8th and 10th deciles wait 5–7 % less. Therefore, the analysis still suggests evidence of inequity though a reduced one compared to the case when selection is not taken into account. The results from quantile regression models confirm that inequalities persist at different points of the waiting time distribution.
Johar et al. (2013) use administrative data from New South Wales in Australia to decompose variations in waiting times that are due to clinical need, supply factors, and nonclinical factors such as socioeconomic status. They measure inpatient waiting times for publicly funded patients in public hospitals in 2004–2005 and include all acute illnesses. Socioeconomic status is measured through the SEIFA index (mentioned above) split into five groups. Without controlling for supply factors, they find that more deprived patients wait 30 % longer than those in the least deprived group (they wait about a month more with an average wait of about 3 months across all patients included in the sample). These differences reflect inequalities both within and across hospitals.
Once the authors control for supply factors (such as bed occupancy rate, length of stay, ratio of clinical staff to beds, proportion of emergency admissions), patients in lower socioeconomic groups wait 16–24 % longer than patients in the highest socioeconomic group. This implies that richer patients live in areas with better supply of hospital services. However, inequalities within the hospital persist after controlling for supply factors. Quantile regression results confirm that inequalities are present at all quantiles of the waiting time distribution.

Norway

Monstad et al. (2014) use data from the Norwegian Arthroplasty Register for patients in need of hip replacement in Norway in 2002–2003 to test whether patients with higher socioeconomic status wait less. Income and education are measured at individual level. The sample covers 98 % of all hip replacements. Since every patient has a unique personal identification code, the registry data can be perfectly matched with other registers at Statistics Norway.
The healthcare system in Norway is largely publicly funded with a negligible private sector (therefore, the possibility to opt out is limited). Waiting times for hip replacement were on average 170 days. The analysis is presented separately for men and women. All specifications control for hospital fixed effects. Therefore, results can be interpreted as inequalities arising “within the hospital.” The study finds that richer men and more-educated women tend to wait less: a 10 %
increase in income reduces waiting times by 8 %; orthopedic surgery and 34 % longer waiting
women with 3 years of upper secondary education times for general surgery. No differences on the
wait 7 % less compared to those with compulsory basis of ethnicity and gender were found. Income
schooling only. mattered more at the upper tail of the waiting time
Carlsen and Kaarboe (2014) use administrative distribution.
data (the Norwegian patient registry) from all
elective inpatient and outpatient hospital stays in
Norway for 2004–2005. The waiting time is mea- Canada
sured from the referral (from family doctor) until
the patient meets with a hospital specialist. Socio- Alter et al. (1999) employ a large administrative
economic status is measured at small-area level dataset to investigate whether publicly funded
(about 31,000 cells). Since the register contains waiting times for patients in need of a coronary
information about hospital stay, gender, year of angiography in 1993–1997 in Ontario (Canada)
birth, and resident municipality, patients can be differ for by socioeconomic status. The latter is
uniquely assigned to population cells that com- proxied by neighborhood income as determined
bine gender, age, and municipality. For each pop- by the Canadian census. The study controlled for a
ulation cell, Statistics Norway computed a set of number of supply factors such as the hospital
variables that describe the income and educational volume, distance from hospital, type of hospital,
levels of the cell population in 2004. in addition to clinical ones capturing patients’
The study finds that men with tertiary educa- severity. The study finds that patients in the
tion wait about 15 % less than men with primary highest income quintile wait 45 % less compared
education only. Women in the lowest income to patients in the lowest income quintile.
quintile wait 11 % longer than women with Carrière and Sanmartin (2007) investigate
highest income quintile. However, once controls determinants of waiting times for specialist con-
are added for hospital-specific factors (whether sultation using the 2007 Canadian Community
they went to the local hospital, travel time, and Health Survey. Like other surveys, the analysis
choice of hospital), most of inequalities disappear. does not control for hospital variables. On the
Whether the patient goes to the “local hospitals” other hand, socioeconomic status (household
and travel distance are key factors explaining the income and educational attainment) is measured
gradient. Since hospitals in low-income regions at individual level. The key finding is that com-
have longer waiting time than hospitals located in pared with men in the top income quintile, those in
high-income and middle-income regions, control- the lowest were less likely to see a specialist
ling for local hospitals makes the income gradient within a month (after controlling for possible con-
flatter. Travel distance also weakens the associa- founders). This was not the case for women.
tion between income and waiting time. Patients’
income decreases in traveling distance, whereas
waiting time increases with distance. Germany
Spain

Abasolo et al. (2014) use the 2006 Spanish National Health Survey to test for waiting time inequalities. The Spanish health system is characterized by universal coverage and tax funding. Waiting time is measured for the last specialist visit and is measured separately for a first visit and for a review visit. Like other studies employing survey data, household income and education are measured at individual level. Only public patients are included in the analysis. Public patients have no or limited copayments for specialist services. Average waiting time was about 2 months.

The analysis controls for type of speciality, self-assessed health, existing conditions (such as hypertension and heart problems), whether the patient has private insurance, employment status, living in a rural area, and different regions, in addition to demographic variables. The study finds that a 10 % increase in income reduces waiting times for diagnosis visits by 2.6 %. Individuals with primary education wait 28 % longer than individuals with university studies.

Conclusions and Implications for Policy

Within publicly funded systems, access to services is supposed to depend on need and not ability to pay (or, more broadly, socioeconomic status). The recent empirical literature reviewed in this chapter seems however to suggest that this is not necessarily the case. The chapter focuses on elective (i.e., nonemergency) services and does not cover the literature on waiting times in the emergency room. There is empirical evidence from several countries, suggesting that individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital elective services than those with lower socioeconomic status. Combined with the empirical literature reviewed in the Introduction, it suggests that individuals with higher socioeconomic status tend to see doctors not only more frequently but also more swiftly.

Waiting-time inequalities within public systems may be due to a number of different reasons. They may be due to hospital geography with some
Martin S, Smith PC. Rationing by waiting lists: an empirical investigation. J Public Econ. 1999;71:141–64.
Monstad K, Engeaeter LB, Espehaug B. Waiting time socioeconomic status – an individual level analysis. Health Econ. 2014;23:446–61.
Pell J, Pell A, Norrie J, Ford I, Cobbe S. Effect of socioeconomic deprivation on waiting time for cardiac surgery: retrospective cohort study. Br Med J. 2000;321:15–8.
Petrelli A, De Luca G, Landriscina T, Costa G. Socioeconomic differences in waiting times for elective surgery: a population-based retrospective study. BMC Health Serv Res. 2012;12:268.
Pettitt AN, Daud IB. Investigating time dependence in Cox’s proportional hazards model. J R Stat Soc. Ser C (Appl Stat). 1990;39(3):313–29.
Roll K, Stargardt T, Schreyogg J. Effect of type of insurance and income on waiting time for outpatient care, the Geneva papers. Int Assoc Study Insur Econ. 2012;37:609–32.
Schoen C, Osborn R, Squires D, Doty MM, Pierson R, Applebaum S. How health insurance design affects access to care and costs, by income, in eleven countries. Health Aff. 2010;29(12):2323–34.
Sharma A, Siciliani L, Harris A. Waiting times and socioeconomic status: does sample selection matter? Econ Model. 2013;33:659–67.
Siciliani L, Verzulli R. Waiting times and socioeconomic status among elderly Europeans: evidence from SHARE. Health Econ. 2009;18(11):1295–306.
Siciliani L, Borowitz M, Moran V, editors. Waiting time policies in the health sector. What works? Paris: OECD Book; 2013a.
Siciliani L, Moran V, Borowitz M. Measuring and comparing health care waiting times in OECD countries. OECD health working papers, 67. OECD Publishing; 2013b. https://doi.org/10.1787/5k3w9t84b2kf-en.
Siciliani L, Moran V, Borowitz M. Measuring and comparing health care waiting times in OECD countries. Health Policy. 2014;118(3):292–303.
Tinghög G, Andersson D, Tinghög P, Lyttkens CH. Horizontal inequality when rationing by waiting lists. Int J Health Serv. 2014;44(1):169–84.
van Doorslaer E, Wagstaff A, et al. Equity in the delivery of health care in Europe and the US. J Health Econ. 2000;19(5):553–83.
Van Doorslaer E, Koolman X, Jones AM. Explaining income-related inequalities in doctor utilization in Europe. Health Econ. 2004;13(7):629–47.
Wagstaff A, van Doorslaer E. Equity in health care financing and delivery. Chapter 34. In: Culyer AJ, Newhouse JP, editors. Handbook of health economics, vol. 1. 1st ed. Amsterdam: Elsevier Science/North-Holland; 2000. p. 1803–62.
16 Health Services Data: The Ontario Cancer Registry (a Unique, Linked, and Automated Population-Based Registry)
Contents
Introduction
History of Cancer Registration in Ontario
Automation and OCRIS
EDW Reconstruction
Who Uses OCR Data and for What Purpose?
Examples of Provincial Stakeholders
Examples of National Stakeholders
Examples of International Stakeholders
Data Sources
Pathology
Activity Level Reporting
DAD and NACRS
Death Certificates
Data Systems and Consolidation
OCRIS and the EDW Successor
Patient Linkage
Case Resolution
Data Elements
Data Quality
Other Factors Affecting Quality
The OCR Adopts a New Approach to Counting Cancers
Topography and Morphology
Laterality
Timing
EDW Enterprise Data Warehouse
EDW-OCR Enterprise Data Warehouse based OCR
eMaRC Electronic Mapping, Reporting, and Coding Plus
ePath Electronic pathology data collection system
IACR International Association of Cancer Registries
IARC International Agency for Research on Cancer
ICBP International Cancer Benchmarking Partnership
ICD International Classification of Diseases
ICD-O International Classification of Diseases for Oncology
MPH Multiple Primary and Histology
NAACCR North American Association of Central Cancer Registries
NACRS CIHI’s National Ambulatory Care Reporting System
OCR Ontario Cancer Registry
OCRIS Ontario Cancer Registry Information System
OCTRF Ontario Cancer Treatment and Research Foundation
RCC Regional Cancer Center
SEER Surveillance, Epidemiology, and End Results program
SSF Site-specific factors
TNM Tumor Node Metastasis staging

Introduction

The purpose of this chapter is to describe the evolution of the Ontario Cancer Registry (OCR), and explore its many purposes, processes and applications that make it of particular interest to researchers. The chapter also emphasizes how the registry has established itself as an effective population-based surveillance and research tool.

The OCR is the official provincial cancer incidence registry for Ontario and is managed by Cancer Care Ontario (CCO). CCO is an agency of the Ontario Ministry of Health and Long-Term Care (the Ministry) and its advisor on the cancer and renal systems, as well as on access to care for key health services. CCO strives for continuous improvement in disease prevention, screening, the delivery of care, and the patient experience. CCO works with Regional Cancer Programs across the province, cancer experts, community advisory committees, hospitals, provincial agencies and government, public health units, the Ontario Hospital Association, the not for profit sector, as well as with cancer agencies in other provinces and the federal government, among others, in order to achieve its mandate. Authority for CCO’s programs and functions is provided in the provincial Cancer Act, the Personal Health Information Protection Act (PHIPA 2016), and a Memorandum of Understanding between the Ministry and CCO (Cancer Act 2006).

In accordance with Ontario’s PHIPA legislation, CCO is defined as a “prescribed entity” for certain functions. This designation authorizes CCO to collect, use, and disclose personal health information for the purposes of cancer management and planning. The OCR is a prescribed entity to support this goal. The OCR team at CCO is comprised of pathology coders, standards advisors, stage abstractors, quality assurance and data analysts, and a management team. The OCR team’s responsibilities include:

• Curating and coding source data to identify incident cancer cases
• Deriving population-level cancer staging values
• Working with standard setting bodies to establish the best practices for the registry
• Setting direction for the management of the registry
• Collaborating with partners and stakeholders to enable use of OCR data for surveillance and research, and more generally for cancer prevention and control

The goal of the registry is to collect and disseminate timely and high-quality information describing all cases of cancer diagnosed among Ontario residents using measures of cancer
366 S. Prodhan et al.
[Figure 1: series shown are New cases, Deaths, Incidence rate, and Mortality rate; horizontal axis is Year of diagnosis or death, 2002 to 2011]
Fig. 1 Cancer incidence and mortality counts and age-standardized rates per 100,000 (adjusted to the 1991 Canadian Standard population) from 2002 to 2011. The sharp rise in incidence from 2010 onward is attributed to the adoption by the OCR of new counting rules for multiple primary cancers (Source: Ontario Cancer Registry, 2014)
burden such as incidence and mortality. The OCR is the largest provincial cancer registry in Canada, covering a population that comprises almost 40 % of the Canadian population. With Ontario’s growing and aging population, the OCR is expected to have collected information on a projected 83,000 new cases of cancer in 2015 (Fig. 1).

History of Cancer Registration in Ontario

organizations and individuals in the healthcare system to provide such information to the OCTRF. This led to the formation of Ontario’s cancer registry in 1964. Initially managed by the Ontario Department of Health, the cancer registry began tracking cancer incidence in 1964 and retrospectively collected cancer mortality data from as far back as 1950. In 1970, the OCTRF took ownership of the cancer registry. A more complete description of the historical milestones of the registry is given elsewhere (Clarke et al. 1991).
of OCRIS and its automatic record delivery, the OCTRF was capable of receiving records almost instantly. OCRIS’ improvements to data collection and the development of case resolution (the sophisticated system of computerized medical logic) further established the province’s cancer registry as an important tool in cancer control. Enhancements to OCRIS were later made in the 1990s, and for the next 20 years, it continued to be an integral component of cancer registration in the province.

EDW Reconstruction

In 2014, many years of work culminated in the first major reconstruction of the cancer registry since the adoption of OCRIS. The registry was rebuilt within the newly adopted technology of the Enterprise Data Warehouse (EDW). This change also coincided with the adoption of new standards for the registration of cancer cases, specifically the Multiple Primary and Histology (MPH) coding rules of the Surveillance, Epidemiology, and End Results Program (SEER). The need to modernize the OCR through technological improvements was prompted by greater demand for the registry’s business intelligence capabilities. The new EDW-based OCR was officially launched in October 2014.

Who Uses OCR Data and for What Purpose?

In recent years, the community of users of cancer registry data has expanded beyond the traditional audience of epidemiologists, cancer surveillance analysts, public health researchers, and policy analysts. Increasingly, the healthcare provider community, health services researchers, and cancer system planners are turning to population-based cancer registries like the OCR for foundational data to address questions related to clinical care and healthcare planning. The following sections highlight several examples of the OCR’s stakeholders.

Examples of Provincial Stakeholders

The Cancer Quality Council of Ontario is an arm’s length agency of the Ontario government tasked with measuring the performance of the Ontario cancer system. The Council relies on OCR data to generate the Cancer System Quality Index, which reports on quality measures aimed at stimulating improvement in the cancer system.

Informing program delivery is another example of cancer registry data use. In partnership with CCO, Ontario’s Regional Cancer Programs administer programs and services for cancer prevention and care in all 14 of the province’s local health authorities and the Local Health Integration Networks (Fig. 2). OCR data are a source of information used by these networks in the planning, integration, and funding of local healthcare, as well as in improving access and the patient experience.

CCO also regularly shares OCR data with its provincial partners and collaborators, including:

• Pediatric Oncology Group of Ontario, Ontario’s lead agency on childhood cancer surveillance, research, care, and support
• Institute for Clinical Evaluative Sciences, a research institute that performs many leading evaluative studies on healthcare delivery and outcomes, often by linking together health data such as physician billing claims and hospital discharge abstracts with cancer data
• Cancer Research Institute of Queen’s University, which undertakes studies of cancer etiology, tumor biology, clinical trials, as well as outcomes and health services research
• Public Health Ontario, an agency dedicated to protecting and promoting the health of all Ontarians and reducing inequities in health through surveillance and research related to chronic and communicable diseases.

Examples of National Stakeholders

The OCR is 1 of 13 provincial and territorial cancer registries that populate the Canadian Cancer Registry managed by Canada’s statistical agency (Statistics Canada). The Canadian Cancer
Fig. 2 Map of Ontario’s Local Health Integration Networks. 1. Erie St. Clair, 2. South West, 3. Waterloo Wellington, 4. Hamilton Niagara Haldimand Brant, 5. Central West, 6. Mississauga Halton, 7. Toronto Central, 8. Central, 9. Central East, 10. South East, 11. Champlain, 12. North Simcoe Muskoka, 13. North East, 14. North West
Registry is the main source of cancer statistics used in cancer health planning and decision-making at the national level. The OCR represents the Canadian Cancer Registry’s largest provincial source of cancer data and, as a result, greatly influences national cancer statistics. The provincial and territorial cancer registries work with the Canadian Cancer Registry program to establish national standards for registry operations and data collection.

CCO also collaborates with the Canadian Partnership Against Cancer, a national agency that leads the performance measurement of Canada’s cancer system. The partnership uses OCR and other data from CCO and other provincial cancer agencies to identify disparities in cancer care and management at the national and provincial levels.

Examples of International Stakeholders

CCO actively shares OCR data with international organizations such as the North American Association of Central Cancer Registries (NAACCR) and the International Agency for Research on Cancer (IARC). The registry data are also used in numerous international research initiatives, including but not limited to the International Cancer Benchmarking Partnership (ICBP) and the CONCORD studies on cancer survival.

Established in 1987, NAACCR is an umbrella organization for North American cancer registries, governmental agencies, professional associations, and private groups interested in the dissemination of cancer data. NAACCR achieves its mission through the active participation of selected US state cancer registries and Canadian provincial and territorial cancer registries. As with other member registries, the OCR shares its data with NAACCR annually. The compiled data are used to present North American cancer statistics in NAACCR’s annual publication (Cancer Incidence in North America).

The OCR is one of several provincial cancer registries that submit their data every 5 years to IARC for inclusion in a compendium of cancer incidence data from internationally recognized cancer registries called Cancer Incidence in Five Continents. Data on childhood cancer incidence are also submitted by the OCR for inclusion in IARC’s International Incidence of Childhood Cancer report.
Table 1 The OCR’s four main data sources for incident record creation
Source | Type(s) of information | Relative rank of importance in record creation | Load frequency into EDW-OCR
Pathology (from public and private laboratories) | Pathology reports and diagnostic test results | 1 | Weekly
ALR (from Regional Cancer Centers) | Treatment, past medical history and out-of-province records | 2 | Monthly
DAD and NACRS (from CIHI) | Admissions, discharge and surgery data | 3/4 | Monthly
Death certificates (from the Registrar General of Ontario) | Cause of death; fact of death | 3 | Typically every 18–24 months (cause of death); every quarter (fact of death)
ALR Activity Level Reporting, DAD Discharge Abstract Database, NACRS National Ambulatory Care Reporting System, CIHI Canadian Institute for Health Information
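For readers who want to work with the contents of Table 1 programmatically, the following minimal Python sketch shows one way the table could be held as a small data structure. It is an illustration only and does not describe how the EDW-OCR stores this information. The names `SourceFeed`, `FEEDS`, `by_rank`, and `loaded_with` are hypothetical, and splitting the printed rank "3/4" into 3 for DAD and 4 for NACRS is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceFeed:
    """One row of Table 1 (hypothetical representation)."""
    source: str
    supplier: str
    rank: int            # relative rank of importance in record creation
    load_frequency: str  # how often the feed is loaded into the EDW-OCR


# Values transcribed from Table 1. The table prints "3/4" for DAD and NACRS;
# assigning 3 to DAD and 4 to NACRS is an assumption of this sketch.
FEEDS = [
    SourceFeed("Pathology", "public and private laboratories", 1, "weekly"),
    SourceFeed("ALR", "Regional Cancer Centers", 2, "monthly"),
    SourceFeed("DAD", "CIHI", 3, "monthly"),
    SourceFeed("NACRS", "CIHI", 4, "monthly"),
    SourceFeed("Death certificates", "Registrar General of Ontario", 3,
               "every 18-24 months (coded certificates); quarterly (fact of death)"),
]


def by_rank(feeds):
    """Return feeds ordered by rank; ties keep their original order."""
    return sorted(feeds, key=lambda feed: feed.rank)


def loaded_with(feeds, frequency):
    """Return the names of feeds whose load frequency equals `frequency`."""
    return [feed.source for feed in feeds if feed.load_frequency == frequency]


ranked_sources = [feed.source for feed in by_rank(FEEDS)]
monthly_sources = loaded_with(FEEDS, "monthly")
print(ranked_sources)
print(monthly_sources)
```

Keeping the rank and the load frequency as separate fields makes it easy to ask either question (which source matters most, which sources arrive on a given cycle) without re-reading the table.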
The ICBP is a global initiative that combines the OCR with 12 comparable population-based cancer registries. The ICBP’s registry data spans six countries across three continents. Open only to registry jurisdictions with universal access to healthcare and similar levels of healthcare spending, the ICBP aims to optimize the cancer policies and services of its partners. To date, the OCR has participated in three of five of the ICBP’s research modules, exploring the topics of cancer survival, delays between treatment and diagnosis, and short-term survival (ICBP booklet 2014).

The CONCORD study was the first worldwide analysis of its kind to systematically compare cancer survival across five continents, involving 101 cancer registries from 31 countries (Coleman et al. 2008). Canadian data in the study was composed of the OCR and four other provincial and territorial cancer registries. The OCR was used again in the follow-up CONCORD-2 study, which assessed survival across 279 population-based cancer registries from 67 countries (Allemani et al. 2015).

Data Sources

OCR records are created using data collected for purposes other than cancer registration. The data come from various administrative databases, laboratory reports, and clinical records, including:

• Pathology reports
• Activity Level Reporting from Regional Cancer Centers (RCCs)
• Surgery and discharge data from the Canadian Institute for Health Information (CIHI)
• Death certificates
• Notification of out of province diagnosis or treatment of Ontario residents

Each data source is managed differently by the OCR and serves a unique purpose in record creation (Table 1).

It is uncommon to have a single data source for any given cancer case (Fig. 3), but certain sources are more commonly available than others. For example, of the 233,020 incident cases recorded between 2010 and 2012, 84 % included a pathology report. In 7 % of all cases, pathology reports were the only given source record. By comparison, 60 % of all cases had a corresponding NACRS record. However, in less than 0.1 % of all cases, NACRS was the only provided source record.

Pathology

Pathology reports are the main diagnostic source for new case record creation (Table 1). Through the
ePath electronic pathology reporting system, CCO receives over one million pathology reports each year, sent in from 47 provincial facilities. In 2014, 237,834 of these reports were cancer relevant to 173,226 unique reports. To efficiently handle this large volume of information, pathology data is loaded into the EDW-OCR on a weekly basis.

Pathology reports are delivered to CCO in one of two forms – as narrative or as both narrative/synoptic reports. Narrative reports describing a patient’s pathology test results are those that have been written in sentence form or orally transcribed. While these types of reports can be submitted electronically, they cannot be handled automatically and are difficult to query. Coders must manually review narrative reports to derive relevant information and verify if there is indeed a cancer diagnosis.

Narrative reports currently account for approximately 70 % of all pathology reports received by CCO. The other 30 % of reports are received in synoptic form, a highly structured and standardized format of data submitted electronically. These reports improve overall completeness, ease of data exchange, treatment-related decision-making, and turnaround time. First implemented in 2009, the synoptic pathology reporting system in Ontario is derived from the Electronic Cancer Checklist (eCC) developed by the College of American Pathologists. Checklists and standard data fields in the eCCs eliminate the descriptive language found in narrative reports. Synoptic reports can be submitted in real time, making them a significantly more efficient method of pathology reporting.

One promising development is the inclusion of biomarkers in synoptic reporting. Biomarkers are laboratory indicators that can help identify abnormal processes, conditions, or disease. With respect to cancer care, biomarkers are of particular interest as they can provide information on cancer etiology, prognosis, and diagnosis. Examples of commonly used biomarkers include HER2 for breast cancer, KRAS for colorectal cancers, and ALK for lung cancer. In collaboration with the College of American Pathologists’ Pathology Electronic Reporting Committee, CCO is working to create biomarker templates for synoptic reporting. By September 2016, all 19 of Ontario’s genetic facilities are expected to implement eCC biomarker reporting. In preparation, Ontario has mandated 5 biomarker eCCs for lung, colorectal, breast, and stomach cancers and melanoma. CCO is also equipped to handle optional-use biomarker eCCs for endometrial, gastrointestinal stromal tumor, myeloid, lymphoid, and CNS tumors.
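The practical difference between narrative and synoptic pathology reports described in this section can be illustrated with a minimal Python sketch. It is a simplified illustration, not the eCC data model or CCO's software: the class names, the field names (`site`, `histology`, `biomarkers`), and the sample records are all hypothetical. The point is only that discrete fields can be queried directly, whereas free text has to be routed to a coder.

```python
from dataclasses import dataclass, field


@dataclass
class SynopticReport:
    """Hypothetical, simplified stand-in for a structured checklist-style report."""
    report_id: str
    site: str
    histology: str
    biomarkers: dict = field(default_factory=dict)  # e.g. {"HER2": "positive"}


@dataclass
class NarrativeReport:
    """Free-text report; its content must be read before it can be used."""
    report_id: str
    text: str


def select_by_biomarker(reports, marker, result):
    """Return IDs of synoptic reports whose discrete biomarker field matches."""
    return [
        r.report_id
        for r in reports
        if isinstance(r, SynopticReport) and r.biomarkers.get(marker) == result
    ]


def needs_manual_review(reports):
    """Narrative reports have no discrete fields, so queue them for a coder."""
    return [r.report_id for r in reports if isinstance(r, NarrativeReport)]


# Invented sample records, for illustration only.
reports = [
    SynopticReport("S1", "breast", "invasive ductal carcinoma", {"HER2": "positive"}),
    SynopticReport("S2", "lung", "adenocarcinoma", {"ALK": "negative"}),
    NarrativeReport("N1", "Sections show an infiltrating tumour ..."),
]

her2_positive = select_by_biomarker(reports, "HER2", "positive")
manual_queue = needs_manual_review(reports)
print(her2_positive, manual_queue)
```

In this sketch the synoptic records answer a biomarker query with a simple field comparison, while the narrative record can only be placed in a review queue, which mirrors the manual coding step described above.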
Activity Level Reporting

Data submitted by RCCs include Activity Level Reporting (ALR). ALR consists of patient records pertaining to radiation and systemic therapy services as well as oncology clinic visits. Sixty-two percent of new cancer cases in the OCR from 2010 to 2012 included ALR as a reporting source (Fig. 3). Some out-of-province data are collected for patients that access cancer services outside of Ontario (e.g., in neighbouring provinces). The loading of ALR data into the OCR occurs on a monthly cycle. ALR data can be reported in either ICD-10 or ICD-O-3 coding systems.

DAD and NACRS

CIHI supplies data from the Discharge Abstract Database (DAD) and National Ambulatory Care Reporting System (NACRS). DAD includes administrative, clinical, and demographic data pertaining to all hospital in-patient discharges. NACRS reports all hospital- and community-based ambulatory care in day surgery, outpatient clinics, and emergency departments. As of 2002, all CIHI data are coded in ICD-10-CA.

Death Certificates

Death certificates are obtained by the OCR from the Registrar General of Ontario. This information is used to track the vital status of patients in the registry and ensure that all incident cancer cases have been identified, particularly those that are only identified upon death. This process is known as death clearance.

Coded death certificates are received between 18 and 24 months after death. In lieu of death certificates, CCO also accepts fact of death for death clearance. CCO receives fact of death records approximately every quarter, describing deaths that have occurred approximately 6 months prior to the current quarter. Unlike death certificates, fact of death does not provide any insight into an individual’s diagnosis of cancer and can only be used to close existing cases in the OCR.

Data Systems and Consolidation

OCRIS and the EDW Successor

OCRIS had served as CCO’s cancer registry information system since the 1980s. In an effort to modernize the registry and align it with current standards, OCRIS was formally decommissioned and replaced by the Enterprise Data Warehouse (EDW)-based OCR in late 2014. The EDW was initially designed to store ALR data for examining treatment and financial metrics, but in 2005 the decision was made to reconstruct the cancer registry within the EDW.

The EDW is composed of numerous data holdings, three of which are primarily related to cancer registration (see “Technical Appendix” for more details):

• Pathology/source data mart
• Ontario Cancer Registry (EDW-OCR)
• Collaborative Staging (CS) data mart

CCO’s IT team is responsible for EDW support, data load, linkages, .net support and technical quality assurance.

Patient Linkage

Through the key processes of patient linkage and case resolution, the OCR registrars are able to generate linkable records that combine all relevant data while eliminating redundant records. The EDW-OCR also permits any manual correction of cases at the record level, something not previously possible with OCRIS. Although the OCR relies on various automatic processes, manual review and input are still required to verify the completeness and accuracy of information for cancer registration.

Patient linkage is one of cancer registration’s most fundamental processes and involves a combination of deterministic and probabilistic linkage routines to aggregate a person’s source records into a “best” linked person record, which is a composite record representing the individual. This
entails the linking of new source records to existing person records. Source records that do
not match to existing person records are consolidated and added to the OCR as new person
records. However, there are several challenges with the linkage process. Aside from
administrative errors like the misspelling of names or varying date formats, not all reports
contain identical data elements. Unlike data from CIHI, ALR, or ePath, death certificates fail
to provide patient Health Insurance Numbers. Because of the inconsistency in source data,
deterministic linkage is ruled out as a major method for creating patient records and
probabilistic linkage is used instead. Nonetheless, deterministic linkage is used to supply
names to CIHI records (via health card number) and some other identifiers to other source
records, using the provincial client registry.

Probabilistic linkage allows matching of data where the completeness of matching variables is
not 100% and tolerates typing errors, transpositions, initials, nicknames, etc. Through
probabilistic linkage, matches are assigned a total match score (weight). Matches with the
highest weights are automatically accepted, matches with low weights are rejected, and links
falling between the high/low thresholds are manually reviewed. The Master Patient Linkage links
incoming CIHI, ALR and Pathology data to existing OCR persons. Incoming data that does not link
to existing persons results in the addition of ‘new’ OCR persons. The Death Linkage links
incoming death certificates to the OCR. Death certificates with a cancer cause of death that do
not link to an existing OCR person result in the addition of a ‘new’ OCR person and a ‘Death
Certificate Only’ cancer case.

Incorrect linkage would have several implications. For example, if multiple reports were not
linked to their respective patient, redundant “persons” or cases would be generated. This would
result in the over-reporting of cancer incidence. Similarly, if death certificates were not
linked to the correct person record, the existing case would not pass death clearance and the
OCR would over-report cancer survival and prevalence.

Case Resolution

While the goal of patient linkage is to tie patient records together, case resolution works to
consolidate these data into individual cancer cases. The immense volume of data received by CCO
for the purpose of curating the OCR necessitates a highly competent system to handle
information and pare it down into discrete cases. Case resolution does this by identifying
individuals to process, reviewing their source data records, and identifying any primary
cancers. A rigorous set of rules is then used to automatically produce a “best” diagnosis from
the available data. At this point, only incident cases that have passed the various checks and
filters remain.

Unlike patient linkage, case resolution is an automatic process without concurrent manual
review. Automated logic processes all source records for a person, making cases for reportable
neoplasms. Any case found to be non-incident, problematic, or outside the interest of CCO is
appropriately flagged.

Non-incident cases are legitimate cases which do not qualify because the specific diagnosis is
not covered by the OCR definition of “incident,” which normally includes only invasive,
reportable cancers. This includes in situ cases as well as benign and borderline brain and
central nervous system tumors.

Problematic cases are those that either conflict with the system or do not meet the basic
criteria for a proven case. An example of the latter is a case consisting only of hospital
discharge records. Because discharge data alone is not indicative of a diagnosis or outcome, a
definite case cannot be created. After a follow-up review, problematic cases can be identified
as incident or non-incident or combined with already existing cases.

Some cases are deemed as “out of OCR range” or not of interest. These cases do not qualify as
incident cases because they fall outside of CCO’s rules on geography and timing. The rules on
geography exclude patients with a residence at diagnosis listed as outside of Ontario. However,
patients without a listed residence are still treated as incident cases and considered as
living in Ontario. Timing rules dictate that any cases diagnosed prior to 1964 be labeled as
“out of OCR range” and ignored. Any cases that are not flagged by these rules are considered
incident cases.

Resolved cases provide cancer-specific information such as the conditions of the diagnosis (ICD
code, age, date of diagnosis, etc.), incidence status (in situ, invasive, etc.), cancer stage,
and other data pertinent to oncologists and researchers.

All non-pathology source data come to CCO precoded. Because of the divergent coding sometimes
used by the sources, case resolution logic may mistakenly create multiple cases for a single
person. Manual reviewers examine source records and merge any such cases. All cases are subject
to manual review 6 months after their creation.

The successful completion of these processes allows for the creation of an OCR minimum dataset
for a given year (Fig. 4).

Fig. 4 Diagram of interrelated OCR processes (except for mortality data; death certificates and
fact of death are processed separately). Patient linkage and case resolution processes are
scheduled to run bimonthly. ePath electronic pathology data collection system, ALR Activity
Level Reporting, CIHI Canadian Institute for Health Information, CS Collaborative Stage, NAACCR
North American Association of Central Cancer Registries

Data Elements

Information in the OCR spans several domains of data including demographic and vital
statistics, tumor characteristics, treatment, and patient identifiers (Table 2).
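The geography, timing, and incidence rules described under Case Resolution can be pictured as a
small decision function. The Python sketch below is illustrative only: the field names, the flag
labels, the order in which the rules are checked, and the simplification that every
non-invasive behaviour counts as non-incident are assumptions made for the example, not CCO's
actual case resolution logic. The “problematic” flag is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# From the text: cases diagnosed prior to 1964 are "out of OCR range".
OCR_START_YEAR = 1964


@dataclass
class Case:
    """Hypothetical, simplified view of a resolved case."""
    diagnosis_year: int
    residence_at_diagnosis: Optional[str]  # None when no residence is listed
    behaviour: str  # assumed labels: "invasive", "in_situ", "benign", "borderline"


def classify_case(case: Case) -> str:
    """Return an assumed flag label for a case."""
    # Geography: excluded only when a residence is listed and it is outside Ontario.
    # A missing residence is treated as living in Ontario.
    if case.residence_at_diagnosis is not None and case.residence_at_diagnosis != "Ontario":
        return "out_of_range"
    # Timing: diagnoses before 1964 are out of range.
    if case.diagnosis_year < OCR_START_YEAR:
        return "out_of_range"
    # Simplification (assumption): only invasive behaviour counts as incident.
    if case.behaviour != "invasive":
        return "non_incident"
    return "incident"


if __name__ == "__main__":
    print(classify_case(Case(2005, None, "invasive")))
    print(classify_case(Case(2005, "Manitoba", "invasive")))
```

Checking geography and timing before behaviour is a design choice of the sketch; the source
does not state the order in which the flags are applied.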
Table 2 Data domains and elements in the OCR

Data domain                          Available data elements
Demographic and vital statistics     Date of birth
                                     Age at diagnosis
                                     Sex of patient
                                     Census tract, division, and subdivision
                                     Last known
                                     Place of residence
                                     Date of death
Tumor characteristics                Date of diagnosis
                                     Non-incident status
                                     Method of diagnosis/confirmation
                                     Type of pathology report
                                     Stage at diagnosis
                                     Stage (overall, clinical and pathological)
                                     Primary site (ICD-O-3 site code)
                                     Histology (ICD-O-3 histology code)
                                     Morphology
                                     Topography
                                     Site-specific factors
                                     Laterality
                                     SEER diagnosis group
                                     Clinical practice group
                                     Place of residence at diagnosis
Treatment                            Local Health Integration Network
                                     Public health unit
                                     Treatment facility
                                     Treatment date
                                     Date of last contact
                                     Care site ID
                                     Discharge count (DAD)
                                     Surgery count (NACRS)
                                     ALR/RCC
                                     Number of pathology reports
Identifiable/linkable information    Place of residence at diagnosis
                                     Patient name
                                     Ontario Health Insurance Plan number
                                     Health card number

SEER Surveillance, Epidemiology, and End Results Program; DAD Discharge Abstract Database,
NACRS National Ambulatory Care Reporting System, ALR Activity Level Reporting, RCC Regional
Cancer Center

Because numerous source records are often tied to a single case, some data elements such as
date of diagnosis must be derived using algorithms. In this case, establishing the date of
diagnosis is an automated activity. First, all source records linked to a case are
chronologically ordered. Then, the earliest date is selected as the date of diagnosis,
regardless of record type. The complexity of the algorithms used varies depending on the nature
of the element. For example, the methods used to generate stage data are considerably more
complex (see section “Cancer Stage at Diagnosis”).

Data Quality

The quality of cancer incidence data in the OCR compares favorably with that of other
provincial and national cancer registries. The OCR adheres to four dimensions of data quality:
comparability, completeness, accuracy, and timeliness (Parkin and Bray 2009).

Comparability is defined as the extent to which registry practices adhere to standard
guidelines, which include the criteria for registration, coding systems such as ICD-O-3,
multiple primary counting rules, and more. The standardization of OCR procedures ensures its
comparability and compatibility with other cancer registries.

Completeness refers to how well incident cancer cases are registered, specifically how closely
registry values for incidence and survival reflect the population’s true values. The OCR’s
ability to draw upon multiple data sources to register cases, often with multiple sources per
case, is conducive to a high level of completeness. OCR completeness is further verified
through case-finding audits, record linkage with national and provincial databases, and
comparisons with historic values.

Accuracy pertains to how well case records resemble their actual values. Just as with
completeness, the OCR maintains a high level of accuracy thanks to its use of multiple data
sources. Accuracy is further improved with re-abstraction studies and recoding audits,
histological verification of cases, examining “death certificate only” cases, and analyses of
missing information and internal inconsistencies.
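The date-of-diagnosis derivation described under Data Elements (order the linked source records
chronologically, then take the earliest date regardless of record type) can be sketched in a
few lines of Python. The record structure and the record type labels below are hypothetical,
chosen only to make the example self-contained; the OCR's own algorithm is not reproduced here.

```python
from datetime import date
from typing import List, NamedTuple


class SourceRecord(NamedTuple):
    """Hypothetical source record linked to a case."""
    record_type: str   # illustrative labels, e.g. "pathology", "discharge"
    record_date: date


def derive_date_of_diagnosis(records: List[SourceRecord]) -> date:
    """Order records chronologically and return the earliest date,
    ignoring the record type, as the text describes."""
    if not records:
        raise ValueError("a case needs at least one source record")
    ordered = sorted(records, key=lambda r: r.record_date)
    return ordered[0].record_date


if __name__ == "__main__":
    linked = [
        SourceRecord("discharge", date(2012, 5, 3)),
        SourceRecord("pathology", date(2012, 4, 17)),
        SourceRecord("activity", date(2012, 6, 1)),
    ]
    print(derive_date_of_diagnosis(linked))
```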
Table 3 Data quality indicators (NAACCR standard) for OCR 2008–2012 data years (a)

Indicator (% of all cases)              2008    2009    2010    2011    2012
Completeness of case ascertainment      94.9    95.0    96.1    99.1    94.8
Missing age                              0.0     0.0     0.0     0.0     0.0
Missing sex                              0.0     0.0     0.0     0.0     0.0
Death certificate only (DCO) cases       1.0     1.3     1.5     1.4     1.8
Passing edit checks                      100     100     100     100     100

(a) Current as of Nov 2015
Timeliness is the speed with which a registry can collect, process, and report complete and
accurate cancer data. OCR timeliness is contingent upon two variables: the time until receipt
and the time to process. The time until receipt refers to the time elapsed from diagnosis to
delivery to CCO. With the exception of cause of death information through death certificates,
which are typically received after 18–24 months, CCO receives and loads data into the EDW on a
regular schedule (see section “Data Sources”).

Every year the OCR shares its data with NAACCR as part of its annual call for data, which is
one of several calls for data by other organizations throughout the year to which the OCR
responds. The measures of quality using the NAACCR data quality standard are shown in Table 3
for reference.

Other Factors Affecting Quality

Registration System
One concern regarding data quality pertains to the recent transition from OCRIS to the EDW-OCR.
With each bimonthly case resolution cycle, EDW case data evolves. Existing cases expire and are
replaced with a new case file. Cases tied to OCRIS, namely, all data from before 2010, remain
unaffected and are listed as “frozen.” In order to mitigate variability, the data mart also
tracks new versions of old case files. When new and old case files maintain a fixed degree of
similarity, the two are linked in a process called case chaining. Case chaining ensures that
current case files can be found once an older case is retired. Variability can also arise in
instances where registrars manually edit EDW-OCR data by merging cases together or modifying
diagnosis codes and dates.

Data Auditing
CCO practices routine data audits to verify the accuracy of its data holdings. One such audit
is for inter-rater reliability, aimed at assessing the level of agreement among coders or
staging abstractors. These audits are necessary to minimize the loss of data integrity as a
result of human error and to establish consistency.

In a 2015 audit for stage quality, inter-rater reliability among 16 CCO analysts was assessed.
Each analyst independently staged an identical set of 96 randomly chosen cases diagnosed from
2012 to 2013. The “de-identified” set of cases included an equal number of breast, colorectal,
lung, and prostate primaries. Restrictions placed on the analysts prevented them from
consulting each other or accessing full patient records, case histories, or pathology reports.
Audits such as these allow CCO to discover any issues in data quality and promptly rectify
them. Among the group of 16 analysts, an overall agreement rate of 93.5% was found. In such
audits, CCO strives to maintain a crude agreement rate of at least 90%.

Data Sources and Timing of Loads
As part of the transition from OCRIS to the EDW-OCR, some of the data source rankings have
changed. In particular, pathology reports have replaced ALR data for the highest rank. This can
be attributed to the more reliable and efficient nature of some sources, which makes them more
valuable. Case data quality can be
further examined by performing NAACCR Edit Checks (Table 3). These checks identify cases that
warrant further review. Often such cases are coded incorrectly, with invalid topography and
morphology combinations, unconfirmed multiple primaries, and other errors which are easily
rectified.

Delays in the delivery and handling of source data mean that case resolution and registration
cycles can on occasion be out of sync. As previously mentioned, DAD, NACRS, and ALR data are
loaded on a monthly basis. In comparison, pathology (ePath) data is loaded weekly. However, as
the case resolution and registration cycles operate on a bimonthly schedule, any
misrepresentations of data become negligible over time.

Ontario Patients Treated Outside Ontario - Removal of Duplicates
Statistics Canada conducts a national duplicate resolution process with the provincial and
territorial cancer registries each year to account for multiple reporting of cases (e.g., due
to patients moving between jurisdictions). The exchange of data between provincial and
territorial cancer registries is necessary to resolve duplicate cases and identify cases that
may be missed, such as among individuals who access cancer services outside of their home
province. For example, residents of northwestern Ontario will often use out-of-province cancer
services in the neighboring province of Manitoba. Data sharing agreements exist between
provinces for the exchange of such information.

Death Clearance
Death certificates are used for the purpose of death clearance, a process that uses the coded
cause of death (where cancer is the underlying cause) to identify individuals who were not
previously recognized as having cancer. These “death certificate only” cases represent under 2%
of incident cases (Table 3). Unless fact of death is established otherwise, death certificates
are necessary to keep accurate survival and prevalence rates. Currently, there are no routinely
scheduled releases of death certificates from the Registrar General of Ontario. As a result,
the death clearance process may occur long after incident cases from other sources have been
identified.

The OCR Adopts a New Approach to Counting Cancers

Counting practices for OCRIS incident cases were modeled after standards set by the
International Agency for Research on Cancer (IARC) and the International Association of Cancer
Registries (IACR). These counting rules were very conservative and inflexible for patients
diagnosed with multiple primaries. Given that approximately 10–14% of cases with a single
primary will develop a subsequent cancer within 25 years, cancer incidence counts would be
under-reported by overlooking such subsequent primaries. The modified IARC/IACR rules used by
OCRIS did not recognize paired organs (e.g., left or right breast or lung) or colon and skin
melanoma subsites, nor did they have timing rules to recognize new, subsequent primary cancers
in the same organ. As a result, OCRIS likely reported lower rates of multiple primaries than
other registries with more liberal rules, including those using the SEER Multiple Primary and
Histology (MPH) coding rules.

However, starting with cases diagnosed in 2010, the OCR implemented the SEER MPH coding rules,
which use four criteria for counting multiple primaries: topography, morphology, laterality,
and timing (Johnson et al. 2012).

Topography and Morphology

Topography refers to a cancer’s anatomic site of origin, while morphology describes the type of
cell and its biological activity. The morphology of cancers is recorded with two codes,
describing the cancer’s histology and behavior. In OCRIS, additional primaries were only added
to the registry when cancers expressed both a different topography and morphology from the
initial primary
cancer. As shown in Table 4, the OCR accepts cancers that are morphologically identical but
have different topography, and vice versa, as being multiple primaries.

Table 4 Criteria for classifying cancers as multiple primaries under the modified IARC/IACR
rules in OCRIS compared to SEER Multiple Primary and Histology rules in the OCR

                                                          Multiple primary rule
Criteria                                                  IARC/IACR (in OCRIS)    SEER MPH (in OCR)
Same topography and different morphology (in general)     No                      Yes
Different topography and same morphology (in general)     No                      Yes
Laterality                                                No                      Yes
Timing (in general)                                       No                      Yes

IARC International Agency for Research on Cancer, IACR the International Association of Cancer
Registries, SEER Surveillance, Epidemiology, and End Results Program, MPH multiple primary and
histology

Laterality

Laterality applies mainly to paired organs and differentiates similar cancers by organ subsite.
The IARC/IACR rules do not recognize laterality. In cases where both paired organs, such as the
left and right kidney, were reported with invasive tumor, only a single primary would be
recognized. Using the SEER rules, paired sites are considered in the registration of multiple
primaries. As outlined in Table 5, only specific topographic sites are subject to the rules on
laterality. With cancers of the central nervous system, the laterality rule only applies to
benign and borderline tumors. Malignant central nervous system tumors remain exempt.

Timing

The diagnosis of multiple primaries can typically be described as synchronous or metachronous.
Synchronous cancers are those that develop at the same time or within a small time frame, while
metachronous cancers occur in sequence, one after another. Data on metachronous cancers are of
particular importance to researchers as they provide insight into causal mechanisms involved in
the formation of subsequent neoplasia. IARC/IACR rules dictate that the existence of two or
more primary cancers does not depend on time, and such cancers are therefore recognized as a
single primary case. The SEER rules on timing allow metachronous cancers to exist as multiple
primaries. As shown in Table 5, a cancer must have developed after a specified period of time
to qualify as a multiple primary.
Table 5 Applicable SEER multiple primary counting rules for laterality and timing (Source: SEER
Multiple Primary and Histology Coding Rules Manual, 2007)

Cancer type                                              Laterality                       Timing
Breast                                                   Yes                              5 years
Head and neck                                            Yes                              5 years
Kidney                                                   Yes                              3 years
Lung and pleura                                          Yes                              3 years
Urinary                                                  Yes                              3 years
Colon                                                    Yes                              1 year
Melanoma                                                 Yes                              60 days
Benign and borderline central nervous system tumors      Yes                              Does not apply
Malignant central nervous system tumors                  Does not apply                   Does not apply
Other sites                                              Yes, if considered a paired site 1 year, if applicable
Invasive diagnosis 60 days after an in situ diagnosis    Does not apply                   60 days
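As a rough illustration of how the timing column of Table 5 might be applied, the Python sketch
below looks up the interval for a cancer type and checks whether a later diagnosis falls after
it. The dictionary keys, the function names, the handling of 29 February, and the treatment of
the boundary day (the later diagnosis must fall strictly after the interval) are assumptions
made for the example. The topography, morphology, and laterality criteria, and the in situ to
invasive row, are not modeled.

```python
from datetime import date, timedelta
from typing import Optional

# (unit, amount) per cancer type, transcribed from the timing column of Table 5.
# None means the timing rule does not apply to that type.
TIMING_RULES = {
    "breast": ("years", 5),
    "head and neck": ("years", 5),
    "kidney": ("years", 3),
    "lung and pleura": ("years", 3),
    "urinary": ("years", 3),
    "colon": ("years", 1),
    "melanoma": ("days", 60),
    "benign/borderline cns": None,
    "malignant cns": None,
}
DEFAULT_RULE = ("years", 1)  # "Other sites": 1 year, if applicable


def add_interval(start: date, unit: str, amount: int) -> date:
    """Return start plus the interval; year arithmetic keeps the calendar day."""
    if unit == "days":
        return start + timedelta(days=amount)
    try:
        return start.replace(year=start.year + amount)
    except ValueError:
        # 29 February mapped to 28 February in a non-leap target year (assumption).
        return start.replace(year=start.year + amount, day=28)


def passes_timing_rule(cancer_type: str, first_dx: date, later_dx: date) -> Optional[bool]:
    """True if the later diagnosis falls after the Table 5 interval,
    False if it falls within it, None if timing does not apply."""
    rule = TIMING_RULES.get(cancer_type.lower(), DEFAULT_RULE)
    if rule is None:
        return None
    unit, amount = rule
    return later_dx > add_interval(first_dx, unit, amount)


if __name__ == "__main__":
    print(passes_timing_rule("Breast", date(2010, 1, 15), date(2016, 3, 1)))
    print(passes_timing_rule("Melanoma", date(2011, 1, 1), date(2011, 2, 1)))
```

Storing the interval as a unit and an amount avoids converting years to a fixed number of days,
which would drift across leap years.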
(Line chart. Y-axis: number of new cases, 5,000 to 10,500. X-axis: year of diagnosis, 2001 to
2011. Series: female breast, colorectal, lung, prostate.)
Fig. 5 Effect of implementing the SEER Multiple Primary and Histology counting rules on
incidence, by cancer type in Ontario, 2002–2011. Note the seemingly disproportionate rise in
the incidence of breast, colorectal, and lung cancers while the incidence of prostate cancer
remained relatively fixed (Source: Ontario Cancer Registry, 2015 (Cancer Care Ontario))
Implications of Counting Rules on Data and Analysis

There was a substantial increase in incident cases following the adoption of the SEER MPH rules
by the OCR. This change is due to how cancers were being counted rather than indicating that
more people in Ontario were being diagnosed with or dying of cancer. The new rules allow for a
more complete accounting of cancer incidence, which improves the ability of regions and
communities in Ontario to plan for the future needs of the cancer system.

To further examine the change imposed by the new counting rules, IARC/IACR and SEER MPH rules
were compared for 2010 and 2011 incident cases (Candido et al. 2015). According to this
analysis, overall there were 5.8% more cases under the SEER MPH rules, with the increase
varying by morphology, topography, sex, and age. The greatest change was observed in older age
groups and those diagnosed with melanoma of the skin, female breast cancer, and colorectal
cancer. The incidence of colorectal, female breast and lung cancers rose considerably following
the implementation of the SEER MPH counting rules (Fig. 5). However, the incidence of prostate
cancer remained largely the same.

Best Practices for Analysis

From an analytic perspective, if an analysis spans the OCRIS and OCR datasets, special care
must be taken to reconcile the two. More specifically, data from 2010 onward must first be made
IARC/IACR-compatible by using those multiple primary counting rules, which then allow for trend
analyses under a common rule.

For cancer projections, the projections must be undertaken based on incidence counts using the
IARC/IACR rules and then be modified with a
correction factor that accounts for the effect of the SEER MPH counting rules.

Cancer Stage at Diagnosis

Cancer stage at diagnosis reports the extent of a cancer’s invasion and spread beyond the
primary site. Factors used in staging include the tumor’s topographic site, size, multiplicity,
invasiveness, lymphatic penetration, and metastases. In a clinical setting, this information
helps determine a patient’s appropriate course of treatment and provides an estimate of their
prognosis. The dominant staging methods are the tumor, node, and metastasis (TNM) staging
system, used clinically, and the collaborative staging (CS) framework (which is based on TNM),
used by North American cancer registries.

TNM staging reports cancer stage as a function of tumor, node, and metastasis. First, the
primary tumor is classified by type, size, and extent. Next, the level of lymph node
involvement is determined. Lastly, any metastases are examined to assess the cancer’s spread
from the primary site. By taking these data into consideration, an overall stage value, ranging
from 0 to IV, can be assigned.

CS is a unified data collection framework designed to use a set of data elements based on the
extent of disease and clinically relevant factors. The primary objective of CS was to reconcile
the American Joint Committee on Cancer’s (AJCC) TNM staging system, the Union for International
Cancer Control TNM staging system, and the SEER Summary Staging system. This change brought
about a significant reduction in data redundancy and duplicate reporting. Furthermore, it
retained data integrity for both clinical and public health researchers while improving
accessibility and compatibility.

The input data items for CS that are collected from the medical record include both clinical
diagnostic results, like imaging, biopsy, and other tests, and any cancer resection surgery
results. Each data element has an additional field that identifies whether it was collected
from clinical or resection surgery findings and an indicator if neoadjuvant therapy was
performed prior to surgery. The CS algorithm then automatically derives a single set of T, N,
M, and stage group, which will be clinical or pathologic, depending on how the extent of
disease was discovered within the diagnostic and treatment process.

The staging guidelines for CS require significantly more information than is included in
clinical or pathological TNM reports. CCO staging analysts require data on tumor size, depth of
invasion, the number and location of positive lymph nodes, as well as site-specific factors
(see details below). TNM often does not specify these raw data elements, nor does it provide a
cancer stage indicator that combines clinical and pathology data. In order to derive the CS,
staging analysts must also review patient pathology and medical records in addition to TNM
reports. One calendar year is typically dedicated to the CS capture process for a given
diagnosis year.

One significant change that accompanied the adoption of CS was the introduction of
site-specific factors (SSFs). SSFs provide supplementary information unique to a cancer type to
assist in the staging process. This information expands the understanding of tumor
characteristics, prognosis, and predicted treatment response. SSFs for several cancer types
were introduced with AJCC 7th edition for cases diagnosed in 2010 onward. Furthermore, the
implementation of SSFs allows registries to collect data on biomarkers and other factors that
were previously not collected.

CS Automation and Integration

CS is generated in a two-part hybrid system (see section “Data Systems and Consolidation,”
Fig. 4). The first part, CS automation, is an automated process that populates CS abstracts by
identifying stageable registry cases and linking them to synoptic pathology reports. Stageable
cases are those that contain data sufficient to derive a TNM stage, either using the CS data
collection system or by manual TNM staging. CS abstracts are required to organize and summarize
case data pertinent to the staging process. Staging analysts remotely access hospital
electronic patient records to determine if clinical diagnostic tests need to be added to the CS
input information. This also
occurs when synoptic pathology data are insufficient for abstraction purposes or are
unavailable.

The second process, called CS integration, requires a more fine-tuned approach. It involves a
probabilistic tumor linkage between case and abstract followed by manual review of unlinked
abstracts. CS integration involves reviewing abstracts to determine a “best stage” and linking
it back to cases in the OCR. This process necessitates remote access to the electronic patient
record at hospitals. Currently, CCO is the only organization in Ontario authorized to exercise
this level of direct access to electronic patient records.

Source of Staging Data

In 2005, CS was captured for only a subset of patients from outside RCCs, representing less
than 15% stage capture in the first year of data collection. It has since expanded to include
both RCCs and non-RCC hospitals. Staged TNM data is received from Ontario RCCs, while non-RCC
hospitals have made patient records available to the OCR’s staging analysts.

Stage Capture Rates

Stage capture refers to the completeness of stage information on all stageable cancer cases
identified by the registry. Of the 65,816 cases identified in 2013, approximately 72% were
stageable (Fig. 6). CS was derived for 85% of those stageable cases, a significant improvement
from its introduction in 2008.

The proportion of CS cases increased substantially after 2010, when a national initiative led
by the Canadian Partnership Against Cancer supported the Canada-wide adoption of CS. The rise
in stage capture rates can also be attributed to the progressive rollout by CCO of CS to a
greater number of cancer types (Table 6). CS was officially implemented in the OCR for the four
most common cancers in 2010. In 2011, the use of CS grew to include melanoma of the skin and
gynecologic cancers, followed by
(Chart. Y-axis: stage capture %, 0% to 90%. X-axis: year, 2008 to 2013. Series: Unknown, TNM,
CS. N by year: 2008* 54,789; 2009* 56,520; 2010 67,717; 2011 70,957; 2012 69,757; 2013 65,816.)
Fig. 6 Stage capture rates using Collaborative Stage (CS) and Tumor Node Metastasis (TNM),
Ontario, 2008–2013 (Source: Ontario Cancer Registry, 2015 (Cancer Care Ontario). * 2008 and
2009 capture rates determined using modified IARC/IACR rules for counting multiple primaries)
further expansion in 2013 to include thyroid cancer.

Table 6 Progressive rollout by OCR of collaborative staging by cancer site

Cancer type            Year of full implementation of collaborative stage by OCR
Breast                 2010
Lung                   2010
Colorectal             2010
Prostate               2010
Gynecologic cancers    2011
Melanoma of skin       2012
Thyroid                2013

In 2015, there was a decision to retire the CS system and to implement TNM for population-based
staging in North America. The change in staging system is expected to take effect in Canadian
provincial registries with cases diagnosed in 2017. Despite the decision to return to TNM
staging, the AJCC has stated that it is committed to keeping SSFs an integral part of the
staging process. Discussions are still underway in Canada about additional factors relevant to
population-based staging that may need to be collected.

Linkage of the OCR to Other Datasets

By linking the OCR with other datasets, researchers can obtain a more comprehensive
understanding of healthcare issues. Whether linked with CCO or external datasets, the OCR can
serve as the basis for research studies, especially when patient-level data is available.
Datasets regularly linked with the OCR for the purpose of health services research include the
following healthcare utilization databases:

• Ontario Health Insurance Plan claims
• Ontario Drug Benefit claims
• CIHI’s Discharge Abstracts Database (DAD) for inpatient hospitalizations
• CIHI’s National Ambulatory Care Reporting System (NACRS)

This section describes the dataset linkage process within the OCR and outlines several of CCO’s
other data holdings that may be of interest to health services researchers.

Other Linkage Processes

Pending approval by CCO’s data disclosure team, CCO may process cohort files submitted by
external researchers (see section “Data Privacy and Access” for more information on data
access). At minimum, these cohort files must include names and birthdates of all patients to be
processed. Additional identifiable information such as health card numbers (HCNs), postal
codes, and gender may be included in the cohort file to expedite the linkage procedure and any
necessary manual resolutions. After a suitable cohort file has been received by CCO, a linked
file may be produced. In the interest of efficiency, cohort files for fewer than 300
individuals are linked to the OCR manually through a name search function. Cohort files for
over 300 individuals necessitate a probabilistic linkage in the same manner as described in
section “Data Systems and Consolidation,” but with the use of Automatch software. The software
compares records from client files to the OCR and assigns a total score corresponding to how
closely the records match. Matches on uncommon names will receive a higher score than matches
on common names, indicating greater confidence in the link.

These linkages are to a subset of OCR data. Subsets are pared down to comply with research
parameters. For example, if the cohort represents females enrolled in a research survey which
commenced in 2002, the subset will not contain female patients who died prior to 2002 or any
male patients. Typical information released from the OCR includes person key (a unique
identifier for an OCR person), date of diagnosis, topography, morphology, vital status, and
date of last contact or death.

The probabilistic linkage will match the cohort file to the OCR person records and assign a
total match score (weight). Matches with the highest weights will be automatically accepted,
matches with low weights will be rejected and links falling
between the high/low thresholds are manually surveillance reports eventually evolved to
reviewed. The high/low thresholds will be deter- describe province-wide patterns and trends in
mined by an OCR data analyst through analysis of healthcare delivery aimed at managing and
the data. The unique identifier for an OCR person planning for the cancer system, allocating
will then be used to select case level data from the resources, as well as evaluating and monitoring
OCR for the cohort members that were identified the cancer system. Between 1973 and 2014,
as matches to the OCR. The final product of the more than 460 peer-reviewed articles were
linkage is a file of matched records which typi- published using data from the OCR. The fre-
cally includes information related to the cancer quency of OCR data use grew substantially fol-
diagnosis and vital status information. lowing the 1990s (Fig. 7). In the last 2 years of
available data (2013–2014), 120 peer-reviewed
research articles were published citing use of
CCO’s Other Data Holdings the OCR. This growth may be attributed to
improvements in information capture in the
CCO data holdings store information collected healthcare sector and the growing availability
from healthcare service providers across the prov- of healthcare data. For instance, within CCO,
ince. This information enables the planning and ALR has evolved in its ability to measure and
funding of cancer and other healthcare services, monitor activity related to systemic treatment,
development of guidelines, and management of including chemotherapy and radiation therapy.
the cancer and renal care systems in Ontario. The Similarly, CCO’s recent implementation of the
major data holdings are shown in Table 7. Details Wait Time Information System increases the
about the data held within each of these reposito- scope of data the patient experiences in the
ries can be found on CCO’s website, www. healthcare system. Moreover, as healthcare-
cancercare.on.ca. related information has become more readily
Other provincial organizations with which available in electronic format, the potential for
CCO maintains a data sharing agreement (DSA) data linkage and exploration of research topics
will sometimes create linkages with OCR data. has continued to grow.
One such example is the Institute for Clinical This section presents specific examples of how
Evaluative Sciences, which uses their version of the OCR has been used for health services
the OCR data received from CCO to perform research. The works cited in this section provide
in-house linkages for research purposes. The dif- some recent examples of data linkage between
ferences in dataset versions between CCO and the OCR and other administrative data sources
organizations that receive CCO data through or linkage with primary data collected by the
data sharing agreements can be identified through research study.
their respective data dictionaries, which are often
available online.
Examples of Health Services Research
Using the OCR
Health Services Research Using
the OCR Using date of diagnosis, geography, and demo-
graphic information, researchers frequently
The OCR has been a source of data for projects extract data from the OCR for descriptive purposes,
across the cancer continuum. A review of the to explore trends over time, patterns of care, and
peer-reviewed literature suggests that use of potential gaps in access and equity. Using this
Ontario cancer data in health research dates approach, researchers have described the postoper-
back to the 1970s. A series of papers in the ative mortality risk among the elderly (Nanji
1970s by MacKay and Sellers reported on the et al. 2015), wait times from abnormal mammogra-
burden of cancer by using the OCR. Such cancer phy to surgery among breast cancer patients
16 Health Services Data: The Ontario Cancer Registry (a Unique, Linked, and Automated. . . 383
Table 7 CCO’s major data holdings as of September 2015 (Source: CCO, 2015)

Data holding: Activity Level Reporting (ALR)/Cancer Activity Datamart
Description: Provides an integrated set of data elements from Regional Cancer Centers (RCC) related to systemic treatment and radiotherapy that cannot be obtained from other providers. This information is used to support management decision-making, planning, accountability, and performance management at the RCC, regional, and corporate level.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Patient Information Management System (PIMS)/Pathology Datamart
Description: Database comprised of patient and tumor information for cancer and cancer-related pathology reports (tissue, cytology), submitted from public hospital (and some commercial) laboratories. PIMS documents patient, facility, report identifiers, and tumor identifiers, such as site, histology, and behavior. This information is used to support management decision-making, planning, disease surveillance and research, as well as contributing to resolved incidence case data in the Ontario Cancer Registry.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: New Drug Funding Program (NDFP)
Description: The NDFP database stores patient and treatment information about systemic therapy drug utilization at RCCs and other Ontario hospitals, for which reimbursement is being sought through the NDFP according to strict eligibility criteria.
Type of data: This dataset contains administrative data, clinical data (eligibility criteria), and demographic data.

Data holding: Ontario Breast Screening Program (OBSP)
Description: The associated Integrated Client Management System database provides an integrated set of data for each client screened in the OBSP for the purposes of program administration, management, and evaluation.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Colorectal Screening Data – Colonoscopy Interim Reporting Tool (CIRT)
Description: The data collected through CIRT will be used to understand current colonoscopy activities conducted within participating hospitals from both volume and quality perspectives. It will also be used to validate incremental volume allocations across the province.
Type of data: This dataset contains: administrative care and clinical data.

Data holding: Laboratory Reporting Tool (LRT)
Description: LRT contains CCC program FOBT (fecal occult blood test) kit distribution and results data from the CCC partner labs.

Data holding: Ontario Cervical Screening Program
Description: Cytobase is comprised of cervical cytology data (“Pap Test” results) collected from participating community laboratories. This cervical cancer screening database contains patient, physician, and laboratory information. This information is used to administer and evaluate the performance of CCO’s Cervical Screening Program, for cancer planning and management and for cancer surveillance research.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Brachytherapy Funding Program
Description: Stores patient and treatment information about prostate cancer patients at RCC hospitals, for which reimbursement is being sought.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Symptom Management Reporting Database
Description: The Symptom Management Reporting Database data is comprised of three components: patient registration, symptom screening using the Edmonton Symptom Assessment System (ESAS) and functional assessment using the Palliative Performance Scale and/or Eastern Cooperative Oncology Group Performance Status. This information is captured by participating sites using the Interactive Symptom Assessment and Collection system and then submitted on a monthly basis to the Symptom Management Reporting Database.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Interim Annotated Tumor Project (ATP) Database
Description: The Interim ATP provides an integrated set of data, combining tumor information from the Ontario Institute for Cancer Research’s Tumor Bank with CCO’s Cancer Registry, for the purpose of increasing the accuracy and utility of the information for both researchers and CCO planners. For example, researchers may use this information to study the association between genetics and response to cancer drugs; in turn, CCO may use this information to create clinical guidelines for the care and treatment of cancer patients in Ontario.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Wait Times Information System (WTIS)
Description: The Wait Time Information System is the first-ever information system for Ontario to collect accurate and timely wait time data. This system has been implemented in 82 Ontario hospitals. Work is underway to enhance this system to track wait times for all surgical procedures in Ontario. This web-based system performs several functions, which include:
• Enabling the collection of data related to wait times
• Providing clinicians and other health professionals with the tools required to effectively assess patient urgency according to a defined wait times standard
• Measuring and reporting wait times and data regarding utilization of procedures
• Supplying clinicians, administrators, and managers with near real-time information for use in monitoring and managing wait lists
• Reporting wait time information to the public on a website enabling patients to manage their own care and the public to assess progress on reducing wait times.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

(Cordeiro et al. 2015), and rates of thyroid cancer among children, adolescents, and young adults (Zhukova et al. 2015). Through linkage with a dataset identifying 140,000 registered or “Status Indians” in Ontario, researchers have been able to describe the cancer experience among the First Nations population in Ontario and study their survival rates over a 30-year time frame (Nishri et al. 2015). The ability of the OCR to identify patients in specific clinical subgroups has also enabled research studies to test concordance with clinical practice guidelines, such as the treatment of patients with stage II colon cancer (Biagi et al. 2009), follow-up surveillance of patients treated for Hodgkin’s lymphoma (Hodgson et al. 2010), and the use of single fraction radiotherapy for uncomplicated bone metastases (Ashworth et al. 2014).

Fig. 7 Distribution of peer-reviewed publications using the OCR as a data source 1973–2014 (Source: CCO Surveillance and OCR, July 2015). [Chart: publication frequency (vertical axis, 0–100) by year of publication (horizontal axis, 1973–2013)]

Population-based retrospective cohort studies have used the OCR to identify cohorts of patients who were diagnosed during a given period of time, underwent particular therapeutic courses, or experienced a particular model of care. This approach has been used to carry out research to look at healthcare costs among colorectal cancer patients (Mittmann et al. 2013), the impact of active surveillance in prostate cancer (Richard et al. 2015) and the use of adjuvant chemotherapy among patients with early breast cancer (Enright et al. 2012). These studies make use of noncancer comparison groups or population-based comparisons through strategies such as random digit dialing. They may also use comparison groups consisting of individuals with cancer who have experienced standard care, in which comparisons may also be derived from the OCR using treatment-based criteria. The OCR is able to provide covariates necessary for the statistical control of potential confounders in these comparative analyses (e.g., stage at diagnosis or date of diagnosis).

Studies incorporating survival analysis and modeling have been able to uncover factors associated with survival on a population level. Such studies have uncovered clinicopathological factors linked to survival among patients diagnosed with pancreatic adenocarcinoma (Kagedan et al. 2015), survival among bladder cancer patients receiving various treatment modalities (Leveridge et al. 2015; MacKillop et al. 2014a), survival among Ontario men who underwent radical prostatectomy, and general survival trends among individuals with laryngeal cancer (Macneil et al. 2015). By coupling OCR-defined cohorts with clinical data from sources such as surgical pathology reports, researchers have been able to associate the prognostic importance of specific clinical factors and provide direction for best practice in clinical
reporting (e.g., Berman et al. 2015). Other investigators have looked at variability in survival among patients visiting different RCCs in Ontario by linking the OCR with stage and treatment data (e.g., head and neck cancer – MacKillop et al. 2014b).

The OCR is also useful to health services researchers who are interested in the effectiveness of preventive strategies to control cancer, such as population-based screening programs. In this type of research design, the OCR data provides the clinical endpoint that will determine the effectiveness of screening intervention. The OCR has been used to capture rates of colorectal cancer among those individuals who had a positive guaiac fecal occult blood screening test as part of Ontario’s Colon Cancer Check program and assess their risk of colorectal cancer over a 30-month time frame (Tinmouth et al. 2015). The OCR has also been used to ascertain the rates of cervical cancer before and after the introduction of a human papillomavirus immunization program for girls in grade 8 (Smith et al. 2015). The OCR has been used widely to look at the effects of breast cancer screening and its various aspects through linkage with the data from the Ontario Breast Screening Program. This research has shed light on the role of mammographic density in screening outcomes (Boyd et al. 2007), the contribution of clinical breast examination to breast screening (Chiarelli et al. 2009), and the performance of digital compared with screen-film mammography (Chiarelli et al. 2013).

Patient Contact

Research teams will occasionally approach CCO to gain access to the OCR for the purpose of identifying individuals eligible to participate in cancer-related research studies. In these instances, analysts at CCO will work with research investigators to refine a set of criteria for participation in the study and extract a cohort from the OCR.

Until 2014, CCO had been in the practice of providing cohorts to the research team, who would then make contact with patients to request their participation, often via their physician. The current process for patient contact is initiated with a letter from CCO. These letters are used to confirm that the individual has been diagnosed with cancer, inform said individual about the research being performed, and obtain consent for the release of their contact information to the researcher. Individuals are also provided the option to opt out of any such studies in the future.

The new and more standardized approach to patient contact minimizes the risk associated with erroneous identification of cancer patients and contacting patients who do not or do not yet know they have cancer, as well as patients who are deceased. The approach also ensures a more consistent and effective process for obtaining informed consent from study participants.

Examples of patient-contact studies using OCR-identified patients as a sampling frame have included a case–control study to identify risk factors associated with ovarian tumors (McGee and Narod 2010), a study of quality of life and health utilities among prostate cancer patients (Krahn et al. 2013), a dietary study among breast cancer patients (Boucher et al. 2012), and a survey of men with prostate cancer about decision-making around the use of complementary and alternative medicine (Boon et al. 2005).

Data Privacy and Access

Privacy

As a prescribed entity under the Ontario Personal Health Information Protection Act, CCO is permitted to collect, use, and disclose personal health information. By way of comparison, other prescribed entities within Ontario include the Pediatric Oncology Group of Ontario, Canadian Institute for Health Information, and the Institute for Clinical Evaluative Sciences.

CCO has robust information management practices, outlined within the Privacy Program, in place to ensure the protection of personal health information within the OCR and its other data holdings. These information management practices are audited on a triennial basis by the Office of the Information and Privacy Commissioner of Ontario.
CCO’s Privacy Program includes privacy policies, standards, procedures, and guidelines. Its privacy assurance and risk management activities involve:

• Privacy impact assessments and risk mitigation plans
• Data sharing agreements
• Standard operating procedures

Staff privacy training and awareness activities are in place to maintain a culture of privacy across the organization. A data access program, described below, is implemented to review and approve external and internal requests for access to OCR data.

Data Request Process

CCO understands the value of health services research and has therefore implemented the data disclosure team to assist researchers and other data requestors in accessing its data holdings. Outlined in Table 8 are the four types of data requests typically received by CCO.

Figure 8 outlines CCO’s data disclosure process and the various internal groups involved. The Data Disclosure Subcommittee oversees all research data requests and occasionally some general data requests, in adherence with the Personal Health Information Protection Act and CCO’s Data Use and Disclosure Standard. This group also reviews CCO’s data disclosure policies and procedures. Before obtaining final approval by the Data Disclosure Subcommittee, research data requests must undergo an extensive review by subject matter experts in the data disclosure working group.

Technical Appendix

Synoptic pathology reports are an integral component of the EDW and feed the Pathology Data Mart, which is needed for CS integration (Fig. 9). Synoptic pathology reports from the Pathology Data Mart, OCR case files and CS abstracts are utilized by the Registry Plus service to drive CS integration and populate the CS data mart (see section “Cancer Stage at Diagnosis” for more information on CS and its processes). Registry Plus is a suite of publicly available free software programs for collecting and processing cancer registry data (Centers for Disease Control and Prevention 2015).

ePath, eMaRC, and ASTAIRE

All pathology reports are handled through CCO’s ePath electronic pathology reporting system. ePath receives, processes, and stores pathology reports, connecting the diagnostic laboratories to the OCR. ePath is comprised of several major subsystems, including the Electronic Mapping, Reporting, and Coding (eMaRC) and the Automated Synoptic Template Analysis Interface and Rule Engine (ASTAIRE).
Fig. 8 Data disclosure process at CCO. DD WG data disclosure working group, DDSC data disclosure subcommittee
Fig. 9 Diagram of pathology-driven processes at CCO. LIS laboratory information system, EPR electronic patient record, CS Collaborative Stage. [Diagram components: Hospital LIS, Hospital EPR – Health Records, Non-Synoptic Pathology Reports, Synoptic Pathology Reports, Pathology Data Mart, Registry Coder, CS Abstractors, RegistryPlus, CS Integration, CS Datamart, Ontario Cancer Registry]
CCO eMaRC is a subcomponent of the ePath system, which processes and stores pathology reports received in HL7 messaging format. CCO eMaRC automatically separates cancer reports from non-cancer and non-reportable reports, provides partial automation for numerous ICD-O-3 diagnosis codes and collaborative staging elements, and creates NAACCR-compatible abstract records. The system also merges multiple reports for a single patient so as to prevent the creation of extra cases in the OCR.
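
The two behaviours described above, filtering out non-reportable reports and merging multiple reports for one patient, can be illustrated with a minimal Python sketch. It is not CCO's implementation. The record fields, the reportability rule (ICD-O-3 behaviour code 3, malignant), and the merge key (patient plus topography) are all assumptions chosen for illustration; the actual eMaRC rules are not given in this chapter.

```python
from dataclasses import dataclass, field


@dataclass
class PathologyReport:
    # Hypothetical, simplified fields; real HL7 pathology messages carry far more.
    report_id: str
    patient_id: str
    topography: str  # ICD-O-3 topography code, e.g. "C50.9"
    behaviour: str   # ICD-O-3 behaviour digit; "3" denotes malignant


@dataclass
class CandidateCase:
    patient_id: str
    topography: str
    report_ids: list = field(default_factory=list)


def is_reportable(report):
    # Assumed rule for illustration only: keep malignant (behaviour "3") reports.
    return report.behaviour == "3"


def build_candidate_cases(reports):
    """Drop non-reportable reports, then merge the rest per patient and site."""
    cases = {}
    for report in reports:
        if not is_reportable(report):
            continue
        # Assumed merge key: same patient and same topography form one case.
        key = (report.patient_id, report.topography)
        if key not in cases:
            cases[key] = CandidateCase(report.patient_id, report.topography)
        cases[key].report_ids.append(report.report_id)
    return list(cases.values())


if __name__ == "__main__":
    sample = [
        PathologyReport("r1", "p1", "C50.9", "3"),
        PathologyReport("r2", "p1", "C50.9", "3"),  # second report, same tumour
        PathologyReport("r3", "p2", "C18.7", "0"),  # benign, filtered out
    ]
    for case in build_candidate_cases(sample):
        print(case.patient_id, case.topography, case.report_ids)
```

Keying on patient and topography is a deliberate simplification. A production registry would apply multiple-primary counting rules before deciding whether two reports describe the same case.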
A data quality assessment tool, ASTAIRE, ensures that synoptic data is compliant with the College of American Pathologists standards. ASTAIRE is made up of three components: GINGER, FRED, and ADELE. Combined, GINGER and FRED ensure that synoptic data is sufficiently complete and in line with current eCC versions. ADELE then cleans data so that it may be admitted to the EDW.

In the interest of privacy and efficiency, data handled through ePath is coded in Health Level Seven V2 format, which is a secure method of data transmission designed to protect sensitive health information. This data contains three main elements: patient ID (PID), observation report ID (OBR), and observations (OBX). Patient ID contains personal and identifiable information, such as a patient’s name, sex, and address. The observation report ID pertains to the pathology report and provides information regarding the pathologist, surgeon, referrals, and specimen collection. The observations data element conveys information regarding the clinical diagnosis, clinical history, gross pathology, submitted tissues, and full diagnosis.

References

Allemani C, Weir HK, Carreira H, Harewood R, Spika D, Wang XS. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet. 2015;385(9972):977–1010.
Ashworth A, Kong W, Chow EL, Mackillop W. The fractionation of palliative radiation therapy for bone metastases in Ontario. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014.
Berman DM, Kawashima A, Peng Y, Mackillop WJ, Siemens DR, Booth CM. Reporting trends and prognostic significance of lymphovascular invasion in muscle-invasive urothelial carcinoma: a population-based study. Int J Urol. 2015;22(2):163–70.
Biagi JJ, Wong R, Brierley J, Rahal R, Ross J. Assessing compliance with practice treatment guidelines by treatment centers and the reasons for noncompliance. Paper presented at: The 2009 Annual Meeting of the American Society of Clinical Oncology; Orlando; May 2009.
Boon H, Westlake K, Deber R, Moineddin R. Problem-solving and decision-making preferences: no difference between complementary and alternative medicine users and non-users. Complement Ther Med. 2005;13(3):213–6.
Boucher BA, Cotterchio M, Curca IA, Kreiger N, Harris SA, Kirsh VA, et al. Intake of phytoestrogen foods and supplements among women recently diagnosed with breast cancer in Ontario, Canada. Nutr Cancer. 2012;64(5):695–703.
Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med. 2007;356(3):227–36.
Cancer Act, R.S.O 1990, c. C.1 [Internet]. 22 June 2006 [cited 28 Oct 2015]. Available from: http://www.ontario.ca/laws/statute/90c01
Candido E, Young S, Nishri D. One cancer or two? The impact of changes to the rules for counting multiple primary cancers on estimates of cancer burden in Ontario [Internet]. In: Proceedings of the 2015 Canadian Society for Epidemiology and Biostatistics Conference; 1–4 June 2015; Mississauga/Toronto: Cancer Care Ontario; 2015. Available at: http://csebca.ipage.com/wordpress/wp-content/uploads/2014/06/June-2_1430_SouthStudio_C1.2-Candido.pdf
Centers for Disease Control and Prevention. Registry Plus, a suite of publicly available software programs for collecting and processing cancer registry data [Internet]. Atlanta: National Center for Chronic Disease Prevention and Health Promotion; Jan 2015 [cited 28 Oct 2015]. Available at: http://www.cdc.gov/cancer/npcr/
Chiarelli AM, Majpruz V, Brown P, Theriault M, Shumak R, Mai V. The contribution of clinical breast examination to the accuracy of breast screening. J Natl Cancer Inst. 2009;101(18):1236–43.
Chiarelli AM, Edwards SA, Prummel MV, Muradali D, Majpruz V, Done SJ, et al. Digital compared with screen-film mammography: performance measures in concurrent cohorts within an organized breast screening program. Radiology. 2013;268(3):684–93.
Clarke EA, Marrett LD, Kreiger N. Cancer registration in Ontario: a computer approach. IARC Sci Publ. 1991;95:246–57.
Coleman MP, Quaresma M, Berrino F, Lutz JM, De Angelis R, Capocaccia R, et al. Cancer survival on five continents: a worldwide population-based study (CONCORD). Lancet Oncol. 2008;9(8):730–56.
Cordeiro ED, Dixon M, Coburn N, Holloway C. A patient-centered approach toward wait times in the surgical management of breast cancer in the province of Ontario. Ann Surg Oncol. 2015;22(8):2509–16.
Enright K, Grunfeld E, Yun L, Moineddin R, Dent SF, Eisen A, et al. Acute care utilization (ACU) among women receiving adjuvant chemotherapy for early breast cancer (EBC). Paper presented at: The 2012 Breast Cancer Symposium; San Francisco; Sept 2012.
Hodgson DC, Grunfeld E, Gunraj N, Del Giudice L. A population-based study of follow-up care for Hodgkin lymphoma survivors: opportunities to improve
surveillance for relapse and late effects. Cancer. 2010;116(14):3417–25.
International Cancer Benchmarking. Showcasing our findings and impacts. London: Cancer Research; Dec 2014 [cited 28 Oct 2015]. Available from: http://www.cancerresearchuk.org/sites/default/files/icbp_pb_1012214_booklet_final.pdf
Johnson CH, Peace S, Adamo P, Fritz A, Percy-Laurry A, Edwards BK. The 2007 Multiple Primary and Histology Coding Rules [Internet]. Bethesda: National Cancer Institute’s Surveillance Epidemiology and End Results Program; Aug 2012. Available at: http://seer.cancer.gov/tools/mphrules/
Kagedan DJ, Raju R, Dixon M, Shin E, Li Q, Liu N, et al. Predictors of actual survival in resected pancreatic adenocarcinoma: A population-level analysis. Paper presented at: The 15th Annual Americas Hepato-Pancreato-Biliary Congress; Miami Beach; Sept 2015.
Krahn MD, Bremner KE, Alibhai SM, Ni A, Tomlinson G, Laporte A, et al. A reference set of health utilities for long-term survivors of prostate cancer: population-based data from Ontario, Canada. Qual Life Res. 2013;22(10):2951–62.
Leveridge MJ, Siemens DR, Mackillop WJ, Peng Y, Tannock IF, Berman DM, et al. Radical cystectomy and adjuvant chemotherapy for bladder cancer in the elderly: a population-based study. Urology. 2015;85(4):791–8.
MacKillop W, Siemens R, Zaza K, Kong W, Peng P, Berman D, et al. The outcomes of radiation therapy and surgery for bladder cancer: a population-based study. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014a.
MacKillop W, Kong W, Zaza K, Owen T, Booth C. Volume of practice and the outcomes of radiation therapy for head and neck cancer. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014b.
Macneil SD, Liu K, Shariff SZ, Thind A, Winquist E, Yoo J, et al. Secular trends in the survival of patients with laryngeal carcinoma, 1995–2007. Curr Oncol. 2015;22(2):85–99.
McGee J, Narod S. Low-malignant-potential tumor risk factor analysis: a matched case–control analysis. Paper presented at: The 41st Annual Meeting of the Society of Gynecologic Oncologists; San Francisco; Mar 2010.
Mittmann N, Isogai PK, Saskin R, Liu N, Porter J, Cheung MC, et al. Homecare utilization and costs in colorectal cancer. Paper presented at: Healthcare Cost, Quality, and Policy: Driving Stakeholder Innovation in Process and Practice Conference; Toronto; Nov 2013.
Nanji S, Mackillop WJ, Wei X, Booth CM. Management and outcome of colorectal cancer (CRC) liver metastases in the elderly: A population-based study. Paper presented at: The 15th Annual Americas Hepato-Pancreato-Biliary Congress; Miami Beach; Sept 2015.
Nishri ED, Sheppard AJ, Withrow DR, Marrett LD. Cancer survival among First Nations people of Ontario, Canada (1968–2007). Int J Cancer. 2015;136(3):639–45.
Parkin DM, Bray F. Evaluation of data quality in the cancer registry: principles and methods part II: Completeness. Eur J Cancer. 2009;45:756–64.
Personal Health Information Protection Act; June 2016 [cited July 2016]. Available from: https://www.ontario.ca/laws/statute/04p03
Richard PO, Alibhai S, Urbach D, Fleshner NE, Timilshina N, Klotz L, et al. The uptake of active surveillance in prostate cancer: Results of a population based-study. Paper presented at: The 2015 Annual Meeting of the American Urological Association; New Orleans; Apr 2015.
Smith LM, Strumpf EC, Kaufman JS, Lofters A, Schwandt M, Levesque LE. The early benefits of human papillomavirus vaccination on cervical dysplasia and anogenital warts. Pediatrics. 2015;135(5):1131–40.
Tinmouth JM, Lim T, Kone A, Mccurdy B, Dube C, Rabeneck L. Risk of colorectal cancer among those who are gFOBt positive but have had a recent prior colonoscopy: experience from an organized screening program. Paper presented at: Digestive Disease Week 2015; Washington, DC; May 2015.
Zhukova N, Pole J, Mistry M, Fried I, Bartels U, Huang A, et al. Clinical and molecular determinants of long-term survival in children with low grade glioma; a population based study. Paper presented at: The 16th International Symposium on Pediatric Neuro-Oncology in Conjunction with the 8th St. Jude-VIVA Forum; Singapore; 28 June 2015–2 July 2015.

17 Challenges of Measuring the Performance of Health Systems

Adrian R. Levy and Boris G. Sobolev

Contents
Introduction
Background
Performance Measurement in the Canadian Health-Care System
Data Requirements
A Case Study on Performance Measurement: Health Technology Assessment
Existing Research on Performance Measurement in Health Technology Assessment
Data Sources for Performance Measurement in Health Technology Assessment
Discussion
Recommendations
References

Health Information and Statistics Canada on a broad range of performance measures. Each indicator falls into one of the four dimensions of the health indicator: (1) health status provides insight on the health of Canadians, including well-being, human function, and selected health conditions; (2) nonmedical determinants of health reflect factors outside of the health system that affect health; (3) health system performance provides insight on the quality of health services, including accessibility, appropriateness, effectiveness, and patient safety; and (4) community and health system characteristics provide useful contextual information, rather than direct measures of health status or quality of care. The goals of this chapter are to characterize the goals of a high-functioning health-care system and provide a typology for performance measures in health care. Both of these will be done within the context of the renewal of the First Ministers’ Accord.

The intent of the 1984 Canada Health Act was to ensure that all residents of Canada have access to medically necessary health care on a prepaid basis. However, the act has not been uniformly applied across provinces and territories, leading to variability in available services and treatment in different jurisdictions. The federal government’s determination to adhere to national standards while reducing funding to the provinces has recently produced additional challenges. (For more details about federalism and health care in Canada, see Wilson (2000)).

The system currently used to measure the performance of the health system in Canada lags behind that of other countries such as the United States and the United Kingdom, both in terms of standardized indicators and research in the area. As a result, there is evidence indicating that the values of Canadians are misaligned with the funding and performance of the health-care system (Snowdon et al. 2012).

In this chapter, we review the current state of knowledge about performance measurement in health care and examine current efforts in Canada. We describe the structural, political, conceptual, and methodological challenges of performance measurement in the field of health technology assessment. We argue that without more clarity around ethics and perspectives and a more systematic approach to performance measurement, it will not be possible to develop a coherent strategy for informing policy-making and decision-making throughout the entire health-care system.

Background

At a fundamental level, “the primary aim of evaluation is to aid stakeholders in their decision making on policies and programs” (Alkin 2004). It is intended to provide evidence on the degree to which government policies and spending are effectively addressing specific issues identified by bureaucrats and elected officials.

Performance management in the public sector became a focus of interest in the late 1980s, starting with the reinventing government movement (Osborne and Gaebler 1992). In the United States, the 1993 Government Performance and Results Act obligated all federal departments and agencies to present 5-year strategic plans linked to performance measures; annual performance plans were required after 1998. In the United Kingdom, the financial management initiative was introduced in the early 1980s.

In Canada, the federal government introduced a centralized evaluation policy in 1977. Using evidence from peer-reviewed sources and from reports of the auditor general, Shepherd argued that, between 1977 and 2009, Canada’s evaluation policy was focused on operational issues directed primarily toward program managers (Shepherd 2012). In 2009, federal evaluation policy was refocused on fiscal prudence and accountability.

Performance Measurement in the Canadian Health-Care System

Over the past 25 years, there has been an increase in measuring and reporting on the performance of the Canadian health-care system at the federal, provincial, and territorial levels. On the demand side, provincial and territorial governments and health authorities have been subjected to intense pressure to contain costs; patients have greater expectations to be involved in decisions about
17 Challenges of Measuring the Performance of Health Systems 393
their treatment; and health-care professionals and health authorities expect more oversight and accountability to be built into the health-care system. On the supply side, the information revolution and progress in information technology have made it less expensive and more straightforward to collect, process, and disseminate data.

There have been several attempts to define the problem of how to measure health-care performance in Canada, the necessary first step toward aligning goals and objectives. In 2000, the First Ministers’ Meeting Communiqué on Health directed Canada’s health ministers to collaborate on the development of a comprehensive framework to report on health status, health outcomes, and quality of service using jointly agreed-upon comparable indicators. The intent was that such reporting would meet several objectives by providing information to Canadians on government performance, as well as assisting individuals, governments, and health-care providers to make more informed health choices. In September 2002, all fourteen federal, provincial, and territorial governments released comparable indicator reports on a set of 67 indicators. The 2003 First Ministers’ Accord on Health Care Renewal (Appendix 1) directed health ministers to develop more indicators to supplement work undertaken in response to the September 2000 communiqué and identified the following priority areas for reform: healthy Canadians, primary health care, home care, catastrophic drug coverage and pharmaceutical management, diagnostic and medical equipment, and health human resources. Federal, provincial, and territorial jurisdictions agreed on 70 indicators with 81 sub-indicators and established the Health Indicators Project to collate them and make them publicly available.

Priorities and directions for the Health Indicators Project were broadly revisited at a second consensus conference in March 2004. The resulting consensus statement established that health indicators must be:

• Relevant to established health goals
• Based on standard (comparable) definitions and methods
• Broadly available and able to be disseminated electronically across Canada at the regional, provincial, and national level

The primary goal of the Health Indicators Project was to support health regions in monitoring progress in improving and maintaining the health of the population and the functioning of the health system for which they are responsible through the provision of good-quality comparative information on:

• The overall health of the population served, how it compares with other regions in the province and country, and how it is changing over time
• The major nonmedical determinants of health in the region
• The health services received by the region’s residents
• The characteristics of the community or the health system

No mention was made of other potential uses for performance indicators, including establishing the competence of organizations and identifying the effectiveness of programs to meet specific objectives.

The communiqué from the 2004 First Ministers’ Meeting on the Future of Health Care, called “A 10-Year Plan to Strengthen Health Care,” included an explicit commitment to “accountability and reporting to citizens” that read: “all governments agree to report to their residents on health system performance including the elements set out in this communiqué.” In so doing, the first ministers agreed that performance indicators would be required and would be used for reporting purposes. The intent of the effort was to hold health ministries accountable for stewardship of the health-care system using performance indicators. The communiqué did not specify whether such reporting would be used in a formative (to improve specific health systems) or in a summative (to implement corrective measures or impose penalties) fashion.
394 A. R. Levy and B. G. Sobolev
Consultations continued with provincial and regional health authorities to ensure that relevant data were collected and consistent methods were used for performance measurement. In 2012, the 13th in a series of annual reports presented health indicator data from the Canadian Institute for Health Information and Statistics Canada on a broad range of performance measures (CIHI 2012). The data were grouped into four dimensions of health: (1) health status, which provides insight on the health of Canadians, including well-being, human function, and selected health conditions; (2) nonmedical determinants of health, which reflect factors outside of the health system that affect health; (3) health system performance, which provides insight on the quality of health services, including accessibility, appropriateness, effectiveness, and patient safety; and (4) community and health system characteristics, which provide useful contextual information rather than direct measures of health status or quality of care. That report used the following principles to categorize disparities in the health system:

• Same access to available care for the same need
• Same utilization for the same need
• Same quality of care for all

patient and treatment registries. As a result of the large amount of data collected in Canada, this country has been characterized as a data-rich environment (Roos et al. 2005). This is reflected by the activities of provincial data centers, which both serve as data custodians and collate and use administrative health and other databases for research and evaluation (Suissa et al. 2012).

Existing performance measures reported by the Canadian Institute for Health Information depend on information from provincial and territorial population registries, vital statistics, hospital discharge abstracts, and physician claims. Even though performance measures have been reported annually since 2003, there are concerns about the provinces’ ability to produce unbiased performance measures because of data quality; in Manitoba, the auditor was “unable to form an opinion on the accuracy of the data or on the adequacy of disclosure” for 21 of 56 health indicators used in the provincial report (Manitoba Minister of Health and Healthy Living 2004).

A Case Study on Performance Measurement: Health Technology Assessment
Table 1 Examples of health-care performance indicators and information needs according to the type of stakeholder

Stakeholder: Citizens
Goals:
• To see evidence that resources on health are being spent efficiently and align with stated priorities
• To have the information they need to hold policy and decision-makers accountable for health policies and health-care delivery that align with societal values
• To be reassured that necessary care will be forthcoming in time of need
Types of needed information:
• Transparent descriptions of stated priorities
• Comparative information on the health of the population versus that in other countries
• Comparative information on the performance of the health-care system versus that in other countries
• Transparent access to indicators of access, quality of care, and resource use in the health-care system

Stakeholder: Patients
Goals:
• To be reassured that they will have access to specific health care when they need it, within a safe timeframe and at adequate proximity
• To obtain information on the intended and unintended consequences of alternative health-care options and on the out-of-pocket expenses associated with these options
Types of needed information:
• Information on available health-care services and modalities
• Information on trade-offs between services in terms of potential intended and unintended health outcomes and out-of-pocket costs

Stakeholder: Health-care professionals
Goals:
• To provide high-quality and appropriate health care to patients
• To maintain and improve their knowledge and skills in health-care delivery
Types of needed information:
• Data on individual performance against benchmarks
• Up-to-date information on best practices, guidelines

Stakeholder: Hospitals
Goals:
• To monitor and improve the use of health-care resources
• To manage local budgets
• To identify and prioritize health technology acquisition and disinvestment
• To ensure patient safety
• To conduct continuous quality improvement
Types of needed information:
• Collective data on health-care quality, including patient safety indicators measured against benchmarks
• Information on distributions of access (utilization, waiting lists, and waiting times) measured against benchmarks
• A transparent health technology assessment process
• Information on patient experience and satisfaction
• Hospital-level costing information

Stakeholder: Health authorities
Goals:
• To ensure that hospitals and health-care professionals provide appropriate and cost-effective health care
• To ensure that patients have access to the specific health care they need, within a safe timeframe and at adequate proximity
• To manage regional budgets
• To assess the impact of health care on the regional health needs of the population
• To ensure equitable distribution of resources
Types of needed information:
• Information on the comparative health of their population versus that of populations served by other health authorities
• Information on the health needs of their region
• Information on the equity of health-care resource distribution
• Information on distributions of access (utilization, waiting lists, and waiting times) across health authority
• Health authority-level costing information

Stakeholder: Governments
Goals:
• To assess the impact of health care on patients and on population health
• To establish current and future health policy goals and priorities
• To set and manage governmental budgets
• To plan for the viability and sustainability of the health-care system
• To demonstrate the adequacy and proper functioning of regulatory procedures for health care
• To provide appropriate assessment and research infrastructure
• To promote investment and innovation in health care
Types of needed information:
• Comparative data on the health of their population versus that of populations in other provinces and territories and in other countries
• Information on the societal value of health care, elicited using transparent citizen engagement processes
• Information on the health needs of the region
• Information on the equity of health-care resource distribution
• Information on distributions of access (utilization, waiting lists, and waiting times) across the jurisdiction
• Aggregate and decomposed expenditure data at the provincial, territorial, and national level
• Information on societal productivity attributable to health and health care

Stakeholder: Regulators
Goals:
• To protect patient safety
• To ensure protection of health-care professionals and other consumers beyond patients
• To uphold their fiduciary responsibility
• To promote efficiency in health-care markets
Types of needed information:
• Safety signals from health care
• Integrity in reporting financial performance
• Information on innovation in health care
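
Table 1 is, in effect, a lookup from stakeholder to information needs, and it shows that several needs are shared across stakeholders. The following is a minimal Python sketch of that idea, not part of the chapter's method. The `NEEDS` dictionary and the `stakeholders_served` helper are hypothetical names, and the short labels are our own paraphrases of a few rows of the table, chosen only for illustration.

```python
# Hypothetical, abbreviated encoding of a few rows of Table 1.
# The labels paraphrase the table; they are not standard indicator names.
NEEDS = {
    "Citizens": {"stated priorities", "international comparisons", "access indicators"},
    "Hospitals": {"quality benchmarks", "access distributions", "costing"},
    "Health authorities": {"regional health needs", "equity", "access distributions", "costing"},
    "Governments": {"international comparisons", "equity", "access distributions", "expenditure data"},
}


def stakeholders_served(information_type):
    """Return, in sorted order, the stakeholders whose needs include this information type."""
    return sorted(s for s, needs in NEEDS.items() if information_type in needs)


# Which stakeholders would use information on distributions of access?
print(stakeholders_served("access distributions"))
```

A lookup of this kind makes it easy to see which types of information serve several stakeholders at once, which is one input when deciding which measures to prioritize.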
severity of secondary health complications. In health-care evaluation, a performance measure summarizes the distribution of a health-care outcome in the patient population. In most studies, the performance measure combines the observed responses for all patients or hospitals into a single number. For example, a performance study might record the timing and occurrence of a clinic appointment for each patient, with the distribution of time to clinic appointment (the health-care outcome) being summarized by the weekly rate of appointments (the performance measure).

There have been large investments in health technology assessment over the past decades, and the use of new health-care technology is an important driver of ongoing increases in health-care expenditures. Before an expensive new technology is implemented and covered in a jurisdiction, the expected impacts are assessed at the provincial level, and the technology’s incremental cost-effectiveness is often assessed by the Canadian Agency for Drugs and Technologies in Health; by several provinces, such as Ontario and Quebec; and by some Canadian hospitals (Levin et al. 2007; McGregor and Brophy 2005).

At the time a new health technology comes to market, there is typically little information on its benefits, safety, and cost implications for the population among whom the technology will be used. As such, health technology assessment provides an incomplete picture. It examines short-term safety, with a focus on the most common, serious (potentially life-threatening), and severe (potentially debilitating) unintended consequences; efficacy, often using data from the restricted conditions in randomized trials; the acquisition costs; and, sometimes, estimated cost-effectiveness on the basis of long-term projection models drawing on the limited information available at market launch.

Once the technology is marketed, some information becomes available on the geographic distribution of the technology and sometimes its utilization. However, this descriptive information alone is not adequate for assessing the performance of the technology. Decision-makers need to understand how new technologies affect patients once they have been adopted for use in the real world, and they need to understand how they affect the health system in terms of who is actually treated, the long-term clinical benefits, severe unintended consequences, health-related quality of life, and productivity. Even less is known about the impact of less severe unintended consequences, downstream medical and health consequences (for the population to whom the technology is actually applied), population effectiveness, or incremental cost-effectiveness in actual use.

Many innovations have led to less invasive technologies being introduced to treat conditions previously managed surgically, such as percutaneous transluminal coronary angioplasty, which is now being undertaken in patients who were previously managed with coronary artery bypass grafting (Weintraub et al. 2012), and extracorporeal shock-wave lithotripsy, which has displaced surgical removal of kidney stones. Less invasive technologies typically reduce patient morbidity and the length of hospital stay, often resulting in lower unit costs of treatment, and should therefore result in potential cost savings to the health-care system. However, understanding the long-term consequences of such technologies requires formal assessment because those savings are often not realized. Angioplasty leads to a greater need for repeat revascularization over time, which reduces the cost differential, and, perhaps because of reduced morbidity, the number of patients and treatments may increase after a new technology becomes established (Levy and McGregor 1995).

Although measuring the performance of new health-care technologies once they have been introduced into practice is crucial, it is done only rarely. The work of the Ontario Health Technology Advisory Committee is an exception (Levin et al. 2007). One reason such measurement is rare is that there is a lack of indicators on a new health technology and a time lag of at least several years before administrative data become available for analysis in Canada. This knowledge gap is becoming increasingly problematic as governments, health authorities, and hospitals struggle to work within fixed budgets, with the federal government planning on indexing its spending to inflation. Decision-makers in these organizations have said clearly
that they suffer from a lack of straightforward information about which technologies work, on whom, and under what circumstances (Health Technology Assessment Task Group on behalf of the Federal/Provincial/Territorial Advisory Committee on Information and Emerging Technologies 2004). There is no consensus on, or even an understanding of, what should be measured or how performance should be measured.

Existing Research on Performance Measurement in Health Technology Assessment

At least four groups of investigators have proposed methods to measure performance in health technology assessment. A group of investigators from the United Kingdom proposed a framework for describing decision-making systems that use health technology assessment to determine reimbursement of health technologies (Hutton et al. 2006). The framework groups systems under four main headings (constitution and governance, objectives, use of evidence and decision processes, and accountability) and identifies three processes (assessment, decision, and outputs and implementation). Hutton et al. assessed the feasibility of implementing the framework using published information on constitution and governance, methods and processes, the use of evidence, and transparency and accountability, at the stages of assessment, decision-making, and implementation. They found that most of the information needed for their framework was not publicly available.

A group of researchers from l’Université de Montréal proposed a framework for performance assessment in health technology assessment organizations (Lafortune et al. 2008). Their conceptual model includes four functions and organizational needs that must be balanced for a health technology agency to perform well: goal attainment, production, adaptation to the environment and culture, and value maintenance. Although this model has a strong conceptual grounding, it has yet to be applied in practice. It requires analysts to make qualitative judgments, which may make it more useful for improving performance within an organization than for comparing performance between organizations.

More recently, a group of European investigators proposed an input-throughput-outcome model of the health-care system in relation to the different types of health-care technologies (Velasco et al. 2010). The thrust of their argument is that “health technology assessment should develop to increase its focus on the ‘technologies applied to health care’ (i.e., the regulatory and policy measures for managing and organizing health-care systems).” They recommend that health technology assessment should have an increased focus on regulatory, financial, and policy measures for managing and organizing health-care systems. They also recommend that “countries embarking on health technology assessment should not consider establishing completely separate agencies for health technology assessment, quality development, performance measurement, and health service development, but should rather combine these agencies into a common knowledge strategy for evidence-informed decision-making in the health services and the health system.” Although such a strategy is ambitious, there would be much to be gained from it.

The framework closest to assessing some of the performance measures listed in Table 1 was developed in Quebec (Jacob and McGregor 1993). These authors outlined a new methodology for evaluating the impact of health technology assessments on policy and expenditures and applied it to 21 assessments produced by the Quebec Council for Health Technology Assessment between 1990 and 1995. Using published documents, interviews, questionnaires, and administrative health data, the authors sought to evaluate the impact of health technology assessments by addressing three fundamental questions: (1) What impact was intended? (2) To whom was the message directed? (3) To what extent was the hoped-for impact achieved, first in terms of policy and second in terms of actual distribution and the use of the technology? The authors determined that 18 of the 21 assessments had an influence on policy and that there were substantial savings to the health-care system. They concluded that it will
rarely be possible to precisely estimate impact, but systematic documentation of effects can be achieved. The self-stated limitations of their methodology included the identification of what they called critical incidents, systematic categorization of policies about health technology, and the use of documentation, which led to a degree of objectivity but also led to limitations relating to the reliance on analysts’ judgment. The interpretations were improved by consulting with important stakeholders. They also acknowledged that the impact of any health technology assessment is influenced by many other factors, substantially complicating interpretations. (Assessing causality when measuring performance of health technology is among the most pernicious challenges facing the careful analyst. This is made particularly challenging because of the impossibility of randomization in most studies. The thoughtful study by Jacob and McGregor (1993) is notable for its rigor and critical thinking in this area.)

None of the existing frameworks for performance measurement of health technology assessment have gained widespread acceptance or have been used widely to help guide allocation decisions. One reason for this lack of uptake may be that these frameworks are too complicated to be easily applied or understood. Part of the reason the frameworks are complex is that the variables that comprise the frameworks are not clearly defined. Without proper definition it is difficult to access the appropriate indicators, which in turn makes it difficult to examine the outcomes.

Other than the efforts of Jacob and McGregor (1993), existing publications on performance measurement in health technology assessment have focused on processes and not on outcomes. One reason for this is that outcomes are harder to measure in an unbiased fashion. Instead, existing performance measurement systems for health technology assessment are scattered and generated in a nonsystematic fashion. Additionally, health technology assessments must presently rely on data that are made available because it is relatively convenient to do so, such as information generated using routinely collected administrative health data (Roos et al. 2005) and registries (Tu et al. 2007); only rarely is a performance assessment done using a primary data collection procedure (Goeree et al. 2009).

Data Sources for Performance Measurement in Health Technology Assessment

In terms of using existing data sources for performance measurement, investigators in the United Kingdom have proposed a typology of databases according to their potential uses in the following elements of health technology assessment (Raftery et al. 2005):

• Group I databases can be used to identify both health technologies and health states; these, in turn, can be disaggregated into clinical registries, clinical administrative databases, and population-oriented databases. These databases can be used to assess effectiveness, equity, and diffusion.
• Group II databases can be used to identify health technologies but not health states. These databases can be used to assess diffusion only.
• Group III databases can be used to identify health states but not health technologies; these, in turn, can be disaggregated into adverse event reporting, disease-only registries, and health surveys. These databases have restricted scope; they are focused mainly on unintended adverse consequences of treatment or disease.

In the environmental scan that Raftery et al. conducted in England and Wales, 270 databases were identified, of which an estimated 60 had some potential for health technology assessment, approximately one-half of which could be assigned to Group I. These investigators made important recommendations for policy that are applicable in Canada: responsibility for the strategic development of databases should be clarified (in Canada, this might be refocused on the rationalization of data collection efforts within and across health authorities); more resources should be made available; and issues associated with
coding, confidentiality, custodianship and access, maintenance of clinical support, optimal use of information technology, filling gaps, and remedying deficiencies should be clarified.

Discussion

Efforts to measure and assess the performance of the health system in Canada are in the early stages, and the research agenda is enormous. Policy questions about what data to collect, and at what cost, now have equally important parallels in terms of how and when to most usefully summarize and report such information, how to integrate the information into governance and efforts to improve performance, and, ultimately, how to make wise decisions to optimize the health of the population.

Developing performance indicators can be seen as a four-step process consisting of policy, development, implementation, and evaluation phases (Ibrahim 2001). The process must address the conceptual, methodological, practical, and political considerations for developing performance measures for the Canadian health system. The lack of a conceptual framework for performance measurement in health means that research in the area is in its infancy. Methodological challenges are created by the nature of funding mechanisms in the Canadian health system and the potentially long time lags between cause and effect. Practical considerations include the daunting volume of work that would be required for greater performance measurement, including the cost and timing of such work. To date, many unresolved questions remain, such as the following: Who will decide the performance indicators? Who will measure them? How will the results of such measurements be presented? To whom and how often? Performance assessment should not be seen as a one-time effort: regular, ongoing follow-up is required. Political challenges include the different levels of governmental jurisdictions in Canada, with standards for care being laid out by the Canada Health Act; the federal government is responsible for protecting the health of the population by ensuring safety through the regulation of medical products by setting and enforcing maximum reimbursement amounts for medications, whereas provision of health care is mostly a provincial and territorial responsibility. This complicated legislative and regulatory environment means that political and health reform cycles must be considered at an early stage in the development of performance measures (Roberts et al. 2008). Performance indicators would be developed and implemented much more effectively if there were cooperation between the federal, provincial, and territorial governments as well as health authorities and individual hospitals.

It is not possible for any subset of performance measures to capture all of the facets of health care that are needed by different stakeholders. What is required is a process of systematically identifying and prioritizing performance measures that will meet at least some of the needs of each stakeholder. Determining what performance measures should be used is, at the most fundamental level, an ethical question because the output must represent the different values and needs of multiple stakeholders. (Performance measures could be developed to represent different ethical perspectives, including the following ones. First, the utilitarian perspective emphasizes the importance of achieving the greatest good for the greatest number. Bureaucrats require performance indicators to provide wise stewardship of the health-care system and to balance equity of access with efficient distribution. For example, some Canadian midsized cities may seek to establish catheterization laboratories to increase the speed of access to angioplasty for treating acute myocardial infarction, and provincial bureaucrats require access to information on both distributive and allocative efficiencies to balance the merit of these claims (Levy et al. 2010). Health-care professionals and hospital administrators use performance indicators to identify the functional competence of individual practitioners and organizations and to decide which technologies to adopt. Surgeons must maintain their skills to minimize operative complications, and health authority decision-makers may seek detailed information on postoperative infection rates when considering a technology for stapling versus
sewing colorectal anastomoses (when closing the opening left after removal of a colostomy bag). This information is needed when making policy decisions about purchasing and planning skills training. Second, the libertarian perspective emphasizes the rights of individuals to access and choose between levels of health care. For example, patients choosing between different treatments may seek detailed comparative information on the intended and unintended consequences of different treatment modalities: for example, when patients are considering angioplasty and stenting or bypass surgery for coronary artery disease, their risk preferences may be elicited if information on benefits and risks is available and synthesized in an understandable fashion. Third, the communitarian perspective emphasizes the need to balance the rights of individuals against the rights of the community as a whole. Organ donation (e.g., with a presumption that all persons are organ donors unless donation is actively opposed by the family), abortion and family planning services, and issues associated with the use of tobacco and intravenous drugs are all health-care matters in which communitarian values may be invoked.) Examples from the literature include performance measurement in the delivery of health-care services (Roski and Gregory 2001), health systems (Evans et al. 2001), and the health of the community (Klazinga et al. 2001).

The inherent complexities of health care, such as the diverse expertise of health-care professionals, the variety of organizational arrangements, the array of treatment protocols, and the myriad interactions between managerial and clinical activities, may necessitate that multiple outcomes be integrated in evaluating the effects of an intervention at the level of the patient, treatment, or health-care system (Sobolev et al. 2012). Table 1 provides examples of health-care performance indicators and information needs according to the type of stakeholder. This list is not intended to be exhaustive, and the categories and information needs overlap between stakeholders.

Once a performance measure comes into practice, it permeates the thinking of decision-makers and becomes normative (Murray and Lopez 1996). In so doing, it has the possibility of influencing policy decisions, spending, and even patterns of thinking about the health system. There is a risk of overreliance on existing performance measures to the detriment of other aspects of care. For instance, in 2004, Canada’s first ministers agreed to reduce waiting times in five priority areas – radiation therapy for cancer, cardiac care, diagnostic imaging, joint (hip and knee) replacement, and cataract surgery for sight restoration – by providing hospitals with cash incentives from a $5.5-billion funding envelope. The Canadian Institute for Health Information now reports on performance measures for waiting times (CIHI 2012b). The current emphasis on these five priority areas means that other necessary procedures not considered a priority are disincentivized. In orthopedics, for example, operations such as surgery to repair feet and ankles are paid for out of a hospital’s global budget and are not eligible for the incentive payments, which creates a financial incentive for hospitals to prioritize hip and knee replacements.

Recommendations

A useful performance measure should always begin with detailed documentation of the indicators that constitute the measure, once definitions have been agreed upon. Given the seemingly widespread acceptance in Canada of the four dimensions discussed earlier, indicators should fall into one of these dimensions: health status, nonmedical determinants of health, health system performance, and community and health system characteristics. There should also be a clarification of responsibility for the strategic development of databases, a greater availability of resources, and clarification of issues associated with coding, confidentiality, custodianship and access, maintenance of clinical support, optimal use of information technology, filling gaps, and remedying deficiencies.

The focus of measurement must be on outcomes as well as processes, and health performance measurement should have an increased
17 Challenges of Measuring the Performance of Health Systems 401
focus on regulatory, financial, and policy measures for managing and organizing health-care systems. There should not be separate agencies for quality development, performance measurement, and service development, but rather these should be combined in a common strategy that will inform decision-making throughout the entire health-care system.

There has been, to date, a lack of focus on strategic evaluations of policy and program coherence, that is, whether policies and programs are addressing the issues and values that are most important to Canadians, such as understanding and improving determinants of health by reducing poverty and aligning healthcare spending with the principles embodied in the Canada Health Act.

Acknowledgments This chapter is reprinted from Levy, Adrian R., and Boris G. Sobolev. “The Challenges of Measuring the Performance of Health Systems in Canada.” Health Care Federalism in Canada. Eds. Katherine Fierlbeck and William Lahey. Montreal: McGill-Queen’s University Press, 2013. Print.

References

Alkin M. Evaluation roots: tracing theorists’ views and influences. Thousand Oaks, CA: Sage; 2004.
Canadian Institute for Health Information (CIHI). Health indicators 2012. http://waittimes.cihi.ca/
Evans DB, Edejer TT, Lauer J, et al. Measuring quality: from the system to the provider. Int J Qual Health Care. 2001;13:439–46.
Goeree R, Levin L, Chandra K, et al. Health technology assessment and primary data collection for reducing uncertainty in decision making. J Am Coll Radiol. 2009;6:332–42.
Health Canada – Health Technology Assessment Task Group on behalf of the Federal/Provincial/Territorial Advisory Committee on Information and Emerging Technologies Technology Strategy 1.0. 2004. Available at http://www.hc-sc.gc.ca/hcs-sss/pubs/ehealth-esante/2004-tech-strateg/index-eng.php
Hutton J, McGrath C, Frybourg JM, et al. Framework for describing and classifying decision-making systems using technology assessment to determine the reimbursement of health technologies (fourth hurdle systems). Int J Technol Assess Health Care. 2006;22:10–8.
Ibrahim JE. Performance indicators from all perspectives. Int J Qual Health Care. 2001;13:431–2.
Jacob R, McGregor M. Assessing the impact of health technology assessment. Int J Technol Assess Health Care. 1993;13:68–80.
Klazinga N, Stronks K, Delnoij D, Verhoeff A. Indicators without a cause. Reflections on the development and use of indicators in health care from a public health perspective. Int J Qual Health Care. 2001;13:433–8.
Lafortune L, Farand L, Mondou I, et al. Assessing the performance of health technology assessment organizations: a framework. Int J Technol Assess Health Care. 2008;24:76–86.
Levin L, Goeree R, Sikich N, et al. Establishing a comprehensive continuum from an evidentiary base to policy development for health technologies: the Ontario experience. Int J Technol Assess Health Care. 2007;23:299–309.
Levy AR. Categorizing outcomes of health care delivery. Clin Invest Med. 2005;28:347–50.
Levy AR, McGregor M. How has extracorporeal shockwave lithotripsy changed the treatment of urinary stones in Quebec? Can Med Assoc J. 1995;153:1729–36.
Levy AR, Terashima M, Travers A. Should geographic analyses guide the creation of regionalized care models for ST-segment elevation myocardial infarction? Open Med. 2010;1:e22–5.
Manitoba, Minister of Health and Healthy Living. Manitoba’s comparable health indicator report. Winnipeg: Manitoba Health; 2004.
McGregor M, Brophy JM. End-user involvement in health technology assessment (HTA) development: a way to increase impact. Int J Technol Assess Health Care. 2005;21:263–7.
Murray CJL, Lopez AD. The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries and risk factors in 1990 and projected to 2020. Cambridge, MA: Harv Sch Public Health/WHO/World Bank; 1996; Report No. 1.
Osbourne D, Gaebler T. Reinventing government. Lexington: Addison-Wesley; 1992.
Raftery J, Roderick P, Stevens A. Potential use of routine databases in health technology assessment. Health Technol Assess. 2005;9:1–iv.
Roberts MJ, Hsiao W, Berman P, Reich M. Getting health reform right – a guide to improving performance and equity. Oxford, UK: Oxford University Press; 2008.
Roos LL, Gupta S, Soodeen RA, Jebamani L. Data quality in an information-rich environment: Canada as an example. Can J Aging. 2005;24 Suppl 1:153–70.
Roski J, Gregory R. Performance measurement for ambulatory care: moving towards a new agenda. Int J Qual Health Care. 2001;13:447–53.
Shepherd RP. In search of a balanced Canadian federal evaluation function: getting to relevance. Can J Program Eval. 2012;26:1–45.
Snowdon A, Schnarr K, Hussein A, Alessi C. Measuring what matters: the cost vs. values of health care. Ivey International Centre for Health Innovation. http://sites.ivey.ca/healthinnovation/thought-leadership/white-papers/measuring-what-matters-the-cost-vs-values-of-health-care-november-2012/
Sobolev B, Sanchez V, Kuramoto L. Health care evaluation using computer simulation: concepts, methods and
applications. New York: Springer; 2012; 480 pages ISBN: 978-1-4614-2232-7.
Suissa S, Henry D, Caetano P, et al. CNODES: the Canadian network for observational drug effect studies. Open Med. 2012;6, e134.
Tu JV, Bowen J, Chiu M, et al. Effectiveness and safety of drug-eluting stents in Ontario. N Engl J Med. 2007;357:1393–402.
Velasco GM, Gerhardus A, Rottingen JA, Busse R. Developing health technology assessment to address health care system needs. Health Policy. 2010;94:196–202.
Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative effectiveness of revascularization strategies. N Engl J Med. 2012;366:1467–76.
Part II
Methods in Health Services Research
18 Analysis of Repeated Measures and Longitudinal Data in Health Services Research
Contents
Introduction
Issues Inherent in Longitudinal Data
Historical Background
Statistical Models for the Analysis of Longitudinal and Repeated Measures Data
Mixed-Effects Regression Models
Matrix Formulation
Covariance Pattern Models
Calculating Effect Sizes
Illustrative Example: The WECare Study
Mixed-Effects Regression Models for Continuous Data Using the WECare Study
Curvilinear Growth Model
Covariance Pattern Models
Effect of Treatment Group on Change
Extensions and Alternatives
Analysis of Longitudinal Data with Missing Values
Generalized Estimating Equation Models
Models for Categorical Outcomes
J. Siddique (*)
Department of Preventive Medicine, Northwestern
University Feinberg School of Medicine, Chicago, IL, USA
e-mail: siddique@northwestern.edu
D. Hedeker
Department of Public Health Sciences, University of
Chicago, Chicago, IL, USA
e-mail: hedeker@uchicago.edu
R. D. Gibbons
Departments of Medicine and Public Health Sciences,
University of Chicago, Chicago, IL, USA
e-mail: rdg@uchicago.edu
Issues Inherent in Longitudinal Data

While longitudinal studies provide far more information than their cross-sectional counterparts, they are not without complexities. The following sections review some of the major issues associated with longitudinal data analysis.

Heterogeneity
Particularly in health services research, individual differences are the norm rather than the exception. The overall mean response in a sample drawn from a population provides little information regarding the experience of the individual. In contrast to cross-sectional studies in which it is reasonable to assume that there are independent random fluctuations at each measurement occasion, when the same subjects are repeatedly measured over time, their responses are correlated over time, and their estimated trend line or curve can be expected to deviate systematically from the overall mean trend line. For example, behavioral and/or biological subject-level characteristics can increase the likelihood of a favorable response to a particular experimental intervention (e.g., a new pharmacologic treatment for depression), leading subjects with those characteristics to have a trend with higher slope (i.e., rate of change) than the overall average rate of change for the sample as a whole. In many cases, these personal characteristics may be unobservable, leading to unexplained heterogeneity in the population. Modeling this unobserved heterogeneity in terms of variance components that describe subject-level effects is one way to accommodate the correlation of the repeated responses over time and to better describe individual differences in the statistical characterization of the observed data. These variance components are often termed “random-effects,” leading to terms like random-effects or mixed-effects regression models.

Missing Data
Perhaps the most important issue when analyzing data from longitudinal studies is the presence of missing data. Stated quite simply, not all subjects remain in the study for the entire length of the study. Reasons for discontinuing the study may be differentially related to the treatment. For example, some subjects may develop side effects to an otherwise effective treatment and must discontinue the study. Alternatively, some subjects might achieve the full benefit of the study early on and discontinue the study because they feel that their continued participation will provide no added benefit. The treatment of missing data in longitudinal studies is itself a vast literature, with major contributions by Laird (1988), Little (1995), Rubin (1976), and Little and Rubin (2002) to name a few. The basic issue is that even in a randomized and well-controlled clinical trial, the subjects who were initially enrolled in the study and randomized to the various treatment conditions may be quite different from those subjects that are available for analysis at the end of the trial. If subjects “drop out” because they already have derived full benefit from an effective treatment, an analysis that only considers those subjects who completed the trial may fail to show that the treatment was beneficial relative to the control condition. This type of analysis is often termed a “completer” analysis. To avoid this type of obvious bias, investigators often resort to an analysis in which the last available measurement is carried forward to the end of the study as if the subject had actually completed the study. This type of analysis, often termed an “end-point” analysis, introduces its own set of problems in that (a) all subjects are treated equally regardless of the actual intensity of their treatment over the course of the study, and (b) the actual responses that would have been observed at the end of the study, if the subject had remained in the study until its conclusion, may, in fact, be quite different from the response made at the time of discontinuation. Returning to the example of the study in which subjects discontinue when they feel that they have received full treatment benefit, an end-point analysis might miss the fact that some of these subjects may have had a relapse had they remained on treatment. Many other objections have been raised about these two simple approaches of handling missing data, which have led to more statistically reasoned approaches for
the analysis of longitudinal data with missing observations.

Irregularly Spaced Measurement Occasions
It is not at all uncommon in real longitudinal studies, either in the context of designed experiments or naturalistic cohorts, for individuals to vary both in the number of repeated measurements they contribute and even in the time at which the measurements are obtained. This may be due to drop-out or simply due to different subjects having different schedules of availability. While this can be quite problematic for traditional analysis of variance based approaches (leading to highly unbalanced designs which can produce biased parameter estimates and tests of hypotheses), more modern statistical approaches to the analysis of longitudinal data are all but immune to the “unbalancedness” that is produced by having different times of measurement for different subjects. Indeed, this is one of the most useful features of the regression approach to this problem, namely the ability to use all of the available data from each subject, regardless of when the data were specifically obtained.

Historical Background

Existing methods for the analysis of longitudinal data are an outgrowth of two earlier approaches for repeated measures data. The first approach, the so-called repeated measures ANOVA, was essentially a random intercept model that assumed that subjects could only deviate from the overall mean response pattern by a constant that was equivalent over time. A more reasonable view is that the subject-specific deviation is both in terms of the baseline response (i.e., intercept) and in terms of the rate of change over time (i.e., slope or set of trend parameters). This more general structure could not be accommodated by the repeated measures ANOVA. The random intercept model assumption leads to a compound-symmetric variance-covariance matrix for the repeated measurements in which the variances and covariances of the repeated measurements are constant over time. In general, it is common to find that variances increase over time and covariances decrease as time-points become more separated in time. Finally, based on the use of least-squares estimation, the repeated measures ANOVA breaks down for unbalanced designs, such as those in which the sample size decreases over time due to subject discontinuation. Based on these limitations, the repeated measures ANOVA and related approaches are mostly no longer used for the analysis of longitudinal data. Mixed-effects regression models, which are described in the next section, build upon the repeated measures ANOVA framework by allowing more than just the intercept term to vary by individual in order to better capture between-subject variability. In addition, mixed-effects regression models use all available data so that not all subjects need to be measured at the same time points.

The second early approach for repeated measures data was multivariate growth curve – or MANOVA – models (Potthoff and Roy 1964; Bock 1975). The primary advantage of the MANOVA approach versus the ANOVA approach is that the MANOVA assumes a general form for the correlation of repeated measurements over time, whereas the ANOVA assumes the much more restrictive compound-symmetric form. The disadvantage of the MANOVA model is that it requires complete data. Subjects with incomplete data are removed from the analysis, leading to potential bias. In addition, both MANOVA and ANOVA models focus on comparison of group means and provide no information regarding subject-specific growth curves. Finally, both ANOVA and MANOVA models require that the time-points are fixed across subjects (either evenly or unevenly spaced) and are treated as a classification variable in the ANOVA or MANOVA model. This precludes analysis of unbalanced designs in which different subjects are measured on different occasions. Finally, software for the MANOVA approach often makes it difficult to include time-varying covariates, which are often essential to modeling dynamic relationships between predictors and outcomes. The MANOVA approach has been extended into a set of methods referred to as covariance pattern models (CPMs) which also estimate the parameters of the
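
The compound-symmetric structure implied by the random intercept assumption of the repeated measures ANOVA can be written out explicitly. The following is a sketch under assumed notation: $\sigma^2_\upsilon$ denotes the between-subject (random intercept) variance and $\sigma^2$ the within-subject error variance, and three measurement occasions are used purely for illustration.

$$
\operatorname{Var}(y_{ij}) = \sigma^2_\upsilon + \sigma^2, \qquad
\operatorname{Cov}(y_{ij}, y_{ij'}) = \sigma^2_\upsilon \quad (j \neq j'),
$$

$$
V(y_i) =
\begin{bmatrix}
\sigma^2_\upsilon + \sigma^2 & \sigma^2_\upsilon & \sigma^2_\upsilon \\
\sigma^2_\upsilon & \sigma^2_\upsilon + \sigma^2 & \sigma^2_\upsilon \\
\sigma^2_\upsilon & \sigma^2_\upsilon & \sigma^2_\upsilon + \sigma^2
\end{bmatrix}.
$$

Under this structure every pair of occasions has the same correlation, $\sigma^2_\upsilon / (\sigma^2_\upsilon + \sigma^2)$, no matter how far apart in time the two measurements are, which is why it cannot represent variances that grow or correlations that decay over time.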
approaches estimate average change (across time) in a population, mixed-effects regression models (MRMs) can also estimate change for each subject. These estimates of individual change across time can be particularly useful in longitudinal studies where a proportion of subjects exhibit change that deviates from the average trend.

To help fix ideas, consider the following simple linear regression model for the measurement $y$ of individual $i$ ($i = 1, 2, \ldots, N$ subjects) on occasion $j$ ($j = 1, 2, \ldots, n_i$ occasions):

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathrm{Trt}_i + e_{ij}. \tag{1}$$

Ignoring subscripts, this model represents the regression of the outcome variable $y$ on the independent variable time (denoted $t$). The subscripts keep track of the particulars of the data, namely whose observation it is (subscript $i$) and when the observation was made (the subscript $j$). The independent variable $t$ gives a value to the level of time, and may represent time in weeks, months, etc. Since $y$ and $t$ carry both $i$ and $j$ subscripts, both the outcome variable and the time variable are allowed to vary by individuals and occasions. The variable $\mathrm{Trt}_i$ is a binary variable that indicates the treatment assigned to individual $i$. When $\mathrm{Trt}$ is dummy coded as a 1 or 0, with 1 indicating membership in the treatment group, the regression coefficient $\beta_0$ is the mean of $y$ when $t = 0$, $\beta_1$ is the slope or rate of change for the control group, and $\beta_2$ is the difference in slopes between the treatment and control groups.

In linear regression models, the errors $e_{ij}$ are assumed to be normally and independently distributed in the population with zero mean and common variance $\sigma^2$. This independence assumption makes the typical general linear regression model unreasonable for longitudinal data. This is because the outcomes $y$ are observed repeatedly from the same individuals, and so it is much more reasonable to assume that errors within an individual are correlated to some degree. Furthermore, the above model posits that the change across time is the same for all individuals since the model parameters ($\beta_0$, the intercept or initial level, and $\beta_1$, the linear change across time) do not vary by individuals except in terms of treatment assignment. For both of these reasons, it is useful to add individual-specific effects into the model that will account for the data dependency and describe differential time-trends for different individuals. This is precisely what MRMs do. The essential point is that MRMs therefore can be viewed as augmented linear regression models. Note also that here and elsewhere in this chapter, a main effect for treatment is not included in the model. That is, it is assumed that there is no difference in the expected outcomes between treatment groups at baseline. This is a reasonable assumption in a clinical trial where participants are randomized prior to receiving treatment. Alternatively, in an observational study where treatment (or exposure) is not randomized, it usually makes sense to include a main effect for treatment to account for differences between treatment groups at baseline.

Random Intercept Model
A simple extension of the linear regression model described in Eq. 1 is the random intercept model, which allows each subject to deviate from the overall mean response by a person-specific constant that applies equally over time:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathrm{Trt}_i + \upsilon_{0i} + e_{ij} \tag{2}$$

where $\upsilon_{0i}$ represents the influence of individual $i$ on his/her repeated observations. Notice that if individuals have no influence on their repeated outcomes, then all of the $\upsilon_{0i}$ terms would equal 0. However, it is more likely that subjects will have positive or negative influences on their longitudinal data, and so the $\upsilon_{0i}$ terms will deviate from 0. Since individuals in a sample are typically thought to be representative of a larger population of individuals, the individual-specific effects $\upsilon_{0i}$ are treated as random effects. That is, the $\upsilon_{0i}$ are considered to be representative of a distribution of individual effects in the population. The most common form for this population distribution is the normal distribution
Fig. 1 Simulated longitudinal data based on a random intercept model (left panel) and a random intercept and slope model (right panel). The solid bold line represents the overall population (average) trend. The dashed lines represent individual trends. Each panel plots the dependent variable (vertical axis, 20 to 26) against time (horizontal axis, 0 to 4), with legend entries “Average Trend” and “Individual Trends”
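
Data of the kind shown in Fig. 1 can be generated with a short simulation. The sketch below is illustrative only: the function name `simulate_trends`, its parameters, the numeric values, and the seed are all assumptions chosen to give values in roughly the plotted range, not the settings used to draw the figure. The treatment term is omitted for simplicity, and the error standard deviation defaults to zero so that the individual trend lines themselves are returned.

```python
import random


def simulate_trends(n_subjects, times, beta0, beta1,
                    sd_intercept, sd_slope=0.0, corr=0.0,
                    sd_error=0.0, seed=0):
    """Simulate y_ij = (beta0 + v0_i) + (beta1 + v1_i) * t_ij + e_ij.

    With sd_slope == 0 this is a random intercept model; otherwise it is a
    random intercept and slope model. All names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    trajectories = []
    for _ in range(n_subjects):
        # Two independent standard normal draws per subject.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Intercept deviation v0_i with standard deviation sd_intercept.
        v0 = sd_intercept * z1
        # Slope deviation v1_i with standard deviation sd_slope and
        # correlation corr with v0_i (a 2 x 2 Cholesky construction).
        v1 = sd_slope * (corr * z1 + (1.0 - corr ** 2) ** 0.5 * z2)
        trajectories.append(
            [beta0 + v0 + (beta1 + v1) * t + rng.gauss(0.0, sd_error)
             for t in times]
        )
    return trajectories


if __name__ == "__main__":
    times = [0, 1, 2, 3, 4]
    # Left panel analogue: parallel individual lines (random intercepts only).
    left = simulate_trends(10, times, beta0=24.0, beta1=-0.5,
                           sd_intercept=0.8)
    # Right panel analogue: individual lines differ in level and in slope.
    right = simulate_trends(10, times, beta0=24.0, beta1=-0.5,
                            sd_intercept=0.8, sd_slope=0.3, corr=-0.3)
    for name, panel in (("random intercept", left),
                        ("random intercept and slope", right)):
        print(name)
        for y in panel:
            print("  " + " ".join(f"{value:6.2f}" for value in y))
```

With `sd_slope=0.0` every simulated subject changes by exactly `beta1` per unit of time, so the lines are parallel; a positive `sd_slope` lets the lines fan out or cross.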
with mean 0 and variance $\sigma^2_\upsilon$. In addition, the model assumes that the errors of measurement ($e_{ij}$) are conditionally independent, which implies that the errors of measurement are independent conditional on the random individual-specific effects $\upsilon_{0i}$. Since the errors now have the influence due to individuals removed from them, this conditional independence assumption is much more reasonable than the ordinary independence assumption associated with the linear regression model in Eq. 1. The random intercept model is depicted graphically in the left panel of Fig. 1.

As can be seen, individuals deviate from the regression of $y$ on $t$ in a parallel manner in this model (since there is only one subject effect $\upsilon_{0i}$) (for simplicity, it is assumed the treatment effect $\beta_2 = 0$). In this figure the solid line represents the population average trend, which is based on $\beta_0$ and $\beta_1$. Also depicted are ten individual trends, both below and above the population (average) trend. For a given sample there are $N$ such lines, one for each individual. The variance term $\sigma^2_\upsilon$ represents the spread of these lines. If $\sigma^2_\upsilon$ is near zero, then the individual lines would not deviate much from the population trend and individuals do not exhibit much heterogeneity in their change across time. Alternatively, as individuals differ from the population trend, the lines move away from the population trend line and $\sigma^2_\upsilon$ increases. In this case, there is more individual heterogeneity in time-trends.

Random Intercept and Trend Model
For longitudinal data, the random intercept model is often too simplistic for a number of reasons. First, it is unlikely that the rate of change across time is the same for all individuals. It is more likely that individuals differ in their time-trends; not everyone changes at the same rate. Furthermore, the compound symmetry assumption of the random intercept model is usually untenable for most longitudinal data. In general, measurements at points close in time tend to be more highly
correlated than measurements further separated in time. Also, in many studies subjects are more similar at baseline due to entry criteria, and change at different rates across time. Thus, it is natural to expect that variability will increase over time.

For these reasons, a more realistic MRM allows both the intercept and time-trend to vary by individuals:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathrm{Trt}_i + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij}. \tag{3}$$

In this model, $\beta_0$ is the overall population intercept, $\beta_1$ is the overall population slope for the group with $\mathrm{Trt}$ coded 0, and $\beta_2$ indicates how the population slopes vary between treatment groups (by specifically indicating how the slope for $\mathrm{Trt}$ coded 1 is different than the slope for $\mathrm{Trt}$ coded 0). In terms of the random effects, $\upsilon_{0i}$ is the intercept deviation for subject $i$, and $\upsilon_{1i}$ is the slope deviation for subject $i$ (relative to their treatment group). As before, $e_{ij}$ is an independent error term distributed normally with mean 0 and variance $\sigma^2$. As with the random intercept model, the assumption regarding the independence of the errors is one of conditional independence, that is, they are independent conditional on $\upsilon_{0i}$ and $\upsilon_{1i}$. With two random individual-specific effects, the population distribution of intercept and slope deviations is assumed to be bivariate normal $N(0, \Sigma_\upsilon)$, with the random-effects variance-covariance matrix given by

$$\Sigma_\upsilon = \begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma_{\upsilon_0 \upsilon_1} \\ \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_1} \end{bmatrix}. \tag{4}$$

The model described in Eq. 3 can be thought of as a personal trend or change model since it represents the measurements of $y$ as a function of time, both at the individual ($\upsilon_{0i}$ and $\upsilon_{1i}$) and population ($\beta_0$ and $\beta_1$, plus $\beta_2$) levels. The intercept parameters indicate the starting point, and the slope parameters indicate the degree of change over time. The population intercept and slope parameters represent the overall (population) trend, while the individual parameters express how subjects deviate from the population trends. The right panel of Fig. 1 represents this model graphically.

As can be seen, individuals deviate from the average trend both in terms of their intercept and in terms of their slope. As with the random intercept model, the spread of the lines around the average intercept is measured by $\sigma^2_{\upsilon_0}$ in Eq. 4. The variance of the slopes around the average trend is measured by $\sigma^2_{\upsilon_1}$ in Eq. 4. By allowing the individual slopes to vary, it is now possible for individual trends to be positive even though the overall trend is negative. The term $\sigma_{\upsilon_0 \upsilon_1}$ in Eq. 4 measures the association (covariance) between the random intercept and slope. When this quantity is negative, individuals with larger intercepts ($\beta_0 + \upsilon_{0i}$) will have steeper slopes ($\beta_1 + \upsilon_{1i}$).

Matrix Formulation

A more compact representation of the MRM is afforded using matrices and vectors. This formulation helps to summarize statistical aspects of the model. For this, the MRM for the $n_i \times 1$ response vector $y$ for individual $i$ can be written as:

$$\underset{n_i \times 1}{y_i} = \underset{n_i \times p}{X_i}\;\underset{p \times 1}{\beta} + \underset{n_i \times r}{Z_i}\;\underset{r \times 1}{\upsilon_i} + \underset{n_i \times 1}{e_i} \tag{5}$$

with $i = 1 \ldots N$ individuals and $j = 1 \ldots n_i$ observations for individual $i$. Here, $y_i$ is the $n_i \times 1$ dependent variable vector for individual $i$, $X_i$ is the $n_i \times p$ covariate matrix for individual $i$, $\beta$ is the $p \times 1$ vector of fixed regression parameters, $Z_i$ is the $n_i \times r$ design matrix for the random effects, $\upsilon_i$ is the $r \times 1$ vector of random individual effects, and $e_i$ is the $n_i \times 1$ residual vector.

For example, in the random intercepts and slopes MRM just considered, for a participant in the treatment group ($\mathrm{Trt}_i = 1$) the data matrices are written as
$$y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{in_i} \end{bmatrix}, \quad X_i = \begin{bmatrix} 1 & t_{i1} & t_{i1} \\ 1 & t_{i2} & t_{i2} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i} \end{bmatrix}, \quad \text{and} \quad Z_i = \begin{bmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{bmatrix}$$

and the population and individual trend parameter vectors are written as

$$\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \quad \text{and} \quad \upsilon_i = \begin{bmatrix} \upsilon_{0i} \\ \upsilon_{1i} \end{bmatrix}$$

respectively. The distributional assumptions about the random effects and residuals are:

$$\upsilon_i \sim N(0, \Sigma_\upsilon), \qquad e_i \sim N(0, \sigma^2 I_{n_i}).$$

As a result, it can be shown that the expected value of the repeated measures $y_i$ is

$$E(y_i) = X_i \beta \tag{6}$$

and the variance-covariance matrix of $y_i$ is of the form:

$$V(y_i) = Z_i \Sigma_\upsilon Z_i' + \sigma^2 I_{n_i}. \tag{7}$$

For example, with $r = 2$, $n = 3$, and

$$X_i = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 2 \end{bmatrix} \quad \text{and} \quad Z_i = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$$

the expected value of $y$ is

$$\begin{bmatrix} \beta_0 \\ \beta_0 + \beta_1 + \beta_2 \\ \beta_0 + 2\beta_1 + 2\beta_2 \end{bmatrix}$$

and the variance-covariance matrix equals

$$\sigma^2 I_{n_i} + \begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma^2_{\upsilon_0} + \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} \\ \sigma^2_{\upsilon_0} + \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} + \sigma^2_{\upsilon_1} & \sigma^2_{\upsilon_0} + 3\sigma_{\upsilon_0 \upsilon_1} + 2\sigma^2_{\upsilon_1} \\ \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 3\sigma_{\upsilon_0 \upsilon_1} + 2\sigma^2_{\upsilon_1} & \sigma^2_{\upsilon_0} + 4\sigma_{\upsilon_0 \upsilon_1} + 4\sigma^2_{\upsilon_1} \end{bmatrix}$$

which allows the variances and covariances to change across time. For example, if $\sigma_{\upsilon_0 \upsilon_1}$ is positive, then clearly the variance increases across time. Diminishing variance across time is also possible if, for example, $-2\sigma_{\upsilon_0 \upsilon_1} > \sigma^2_{\upsilon_1}$. Other patterns are possible depending on the values of these variance and covariance parameters.

Models with additional random effects are also possible, as are models that allow autocorrelated errors, that is $e_i \sim N(0, \sigma^2 \Omega_i)$. Here, $\Omega$ might, for example, represent an autoregressive (AR) or moving average (MA) process for the residuals. Autocorrelated error regression models are common in econometrics. Their application within an MRM formulation is treated by Chi and Reinsel (1989) and Hedeker (1989), and extensively described in Verbeke and Molenberghs (2000). By including both random effects and autocorrelated errors, a wide range of variance-covariance structures for the repeated measures is possible. This flexibility is in sharp contrast to the traditional ANOVA models which assume either a compound symmetry structure (univariate ANOVA) or a totally general structure (MANOVA). Typically, compound symmetry is too restrictive and a general structure is not parsimonious. MRMs, alternatively, provide these two and everything in between, and so allow efficient modeling of the variance-covariance structure of the repeated measures.

Covariance Pattern Models

An alternative to using random effects to model correlated measurements over time is to explicitly model the covariance structure through the use of CPMs. These models are a direct outgrowth of the multivariate growth curve models described in the “Historical Background” section where the covariance structure of the repeated observations
was assumed to follow a general form and all parameters of the matrix were estimated. Rather than estimating every parameter of the covariance matrix, CPMs assume the variance-covariance matrix of the repeated observations follows a specific structure. For example, the compound symmetry (CS) covariance model has only two parameters, $\sigma^2$ (variance) and $\rho$ (correlation), and assumes that observations $Y_{ij}$ have constant variance over time and that the correlation between any two observations on the same subject is the same no matter how far apart those observations occurred. A variety of covariance structures exist and are available in most software packages. See Weiss (2005) for detailed descriptions of a number of different covariance matrices.

Using the matrix notation in Eq. 5, a CPM would be

$$y_i = X_i \beta + \varepsilon_i \qquad (8)$$

where instead of assuming the residuals are independent, it is assumed $\varepsilon_i \sim N(0, \Omega_i)$. Some common choices for $\Omega_i$ include the previously mentioned compound symmetry where for three observations on subject i the covariance matrix is

$$V(y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$$

and the parameter $\rho$ is the correlation between any two observations on the same subject. An autoregressive or AR(1) covariance structure also has two parameters like the compound symmetry structure but takes on a different form, namely,

$$V(y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}.$$

Thus, the farther apart two observations are in time, the lower the correlation between them (assuming $\rho > 0$). In general, CPMs apply structure by specifying a specific relationship between repeated observations on the same subject and assuming constant (homogeneous) variance over time (though the homogeneity of variance can be relaxed).

When choosing a covariance model for repeated measures data, one wishes to choose the most parsimonious model that fits the data well. This can be done by first modeling the mean of observations over time and then using likelihood ratio tests as well as model fit indices such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) to select the model that best fits the correlation and variance structure of the data. More details on methods for assessing and comparing model fit of the variance-covariance structure are described by Wolfinger (1993) and Grady and Helms (1995).

Calculating Effect Sizes

Effect Sizes for Mixed-Effects Models

It is often of interest to summarize results from an intervention in terms of effect sizes. The effect size of an intervention is defined as the difference in means between the intervention and the control (or its comparator) divided by the standard deviation of the outcome. Assume a random intercept and slope MRM as in Eq. 16, that is

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 (t_{ij} \times Trt_i) + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij}$$

To estimate the effect size of the treatment effect at time 2, begin by calculating the predicted mean for a subject in the treatment group at time 2 ($Trt_i = 1$, $t_{ij} = 2$):

$$E(y_{ij} \mid Trt_i = 1, t_{ij} = 2) = \beta_0 + 2\beta_1 + 2\beta_2 \qquad (9)$$

and the predicted mean for a control subject at time 2 is

$$E(y_{ij} \mid Trt_i = 0, t_{ij} = 2) = \beta_0 + 2\beta_1 \qquad (10)$$

since the random effects and error terms have mean 0. Thus the difference between the two
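As a worked sketch of where Eqs. 9 and 10 lead, the block below forms the effect size at time 2. It assumes, and this is an assumption rather than something stated in the text above, that the outcome standard deviation is taken from the model-implied variance of Eq. 7 evaluated at $t_{ij} = 2$, that is, with random-effects design row $(1, 2)$. The symbol $d$ is introduced here only for illustration.

$$E(y_{ij} \mid Trt_i = 1, t_{ij} = 2) - E(y_{ij} \mid Trt_i = 0, t_{ij} = 2) = 2\beta_2$$

$$V(y_{ij} \mid t_{ij} = 2) = \sigma^2_{\upsilon_0} + 4\sigma_{\upsilon_0\upsilon_1} + 4\sigma^2_{\upsilon_1} + \sigma^2$$

$$d = \frac{2\beta_2}{\sqrt{\sigma^2_{\upsilon_0} + 4\sigma_{\upsilon_0\upsilon_1} + 4\sigma^2_{\upsilon_1} + \sigma^2}}$$

The variance line is the third diagonal entry of the variance-covariance matrix in the earlier example with time points 0, 1, and 2, with the error variance $\sigma^2$ added.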
Table 1 WECare mean Hamilton Depression Rating Scale (HDRS) scores, percent missing, and cumulative measurement dropout at each time point

Mean HDRS score (% missing, % cumulative measurement dropout)

Month of study   Medication (n = 88)   CBT (n = 90)        TAU (n = 89)
Baseline         17.95 (0%, 0%)        16.28 (0%, 0%)      16.48 (0%, 0%)
Month 1          14.00 (20%, 2%)       13.11 (27%, 6%)     12.80 (27%, 4%)
Month 2          10.74 (16%, 5%)       11.42 (27%, 7%)     11.30 (29%, 10%)
Month 3          9.60 (28%, 8%)        10.24 (36%, 9%)     13.05 (27%, 11%)
Month 4          9.54 (31%, 9%)        9.07 (38%, 13%)     11.81 (35%, 12%)
Month 5          8.62 (40%, 14%)       10.47 (34%, 14%)    11.85 (40%, 13%)
Month 6          9.17 (28%, 18%)       10.73 (33%, 14%)    11.92 (29%, 15%)
Month 8          8.07 (36%, 24%)       9.62 (30%, 17%)     11.55 (33%, 18%)
Month 10         9.04 (40%, 27%)       8.31 (31%, 20%)     10.92 (31%, 19%)
Month 12         9.71 (30%, 30%)       8.38 (24%, 24%)     10.22 (19%, 19%)

Note. CBT cognitive behavioral therapy, TAU treatment as usual

between-subject variability in the data. Second, as shown by the plots of means over time, the trends in depression scores do not appear to be linear. Instead, they appear curvilinear, with an initial strong downward trend and then a leveling off over time.

Mixed-Effects Regression Models for Continuous Data Using the WECare Study

This section illustrates the use of MRMs for continuous data using the WECare data. The section begins by fitting the WECare data using the random intercept and slope model in Eq. 16. Here, time corresponds to the month of the interview and takes on values from 0 to 12. As noted above, the change in depression scores across time does not appear to be linear. For now, time is treated as linear in order to demonstrate the role of diagnostics in addressing model fit. Subsequently, quadratic and cubic terms are incorporated as well as the effect of treatment group in the model. The initial model is

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij} \qquad (16)$$

where $\beta_0$ is the average month 0 (baseline) HDRS level and $\beta_1$ is the average HDRS monthly linear change. The random effect $\upsilon_{0i}$ is the individual deviation from the average intercept, and $\upsilon_{1i}$ is the individual deviation from the average linear change. Fitting this model yields the results given in Table 2.

Focusing first on the estimated regression parameters, this model indicates that patients start, on average, with an HDRS score of 14.08 and change by −0.51 points each month. Lower scores on the HDRS reflect less depression, so patients are improving over time. The estimated HDRS score at a given month equals 14.08 − (0.51 × month). So for example, at month 2 the average depression score is 14.08 − (0.51 × 2) = 13.06. Both the intercept and slope are statistically significant (p < 0.0001). The intercept being significant is not particularly meaningful; it just indicates that HDRS scores are different from zero at baseline. However, because the slope is significant, one can conclude that the rate of improvement is significantly different from zero in this study. On average, patients are improving across time.

For the variance and covariance terms of the random effects, there are concerns in using the standard errors in constructing Wald test statistics (estimate divided by its standard error), particularly when the population variance is thought to be near zero and the number of subjects is small (Bryk and Raudenbush 1992). This is because variance parameters are bounded; they cannot be less than zero and so using the standard normal for the sampling distribution is not reasonable. As a result, statistical significance is not indicated for
Fig. 2 WECare depression scores over the course of the study. The top panel plots the raw HDRS scores for all 267 participants, where each line represents a single individual. The bottom panel plots mean HDRS scores by treatment group (TAU, CBT, Medication). There is substantial heterogeneity in the raw scores and nonlinear trends in the means. [Figure not reproduced; vertical axes: Raw HDRS Values (top) and Mean HDRS Scores (bottom); horizontal axis: Month (0, 1, 2, 3, 4, 5, 6, 8, 10, 12)]
the variance and covariance parameters in the tables. However, the magnitude of the estimates does reveal the degree of individual heterogeneity in both the intercepts and slopes. For example, while the average intercept in the population is estimated to be 14.08, the estimated population standard deviation for the intercept is 4.52 ($= \sqrt{20.44}$). Similarly, the average population slope is −0.51, but the estimated population standard deviation for the slope equals 0.42, and so approximately 95% of subjects are expected to have slopes in the interval −0.51 ± (1.96 × 0.42) = −1.33 to 0.31. That the interval includes positive slopes reflects the fact that not all subjects improve across time. Thus, there is considerable heterogeneity in terms of patients’ initial level of depression and in their change across time. Finally, the covariance between the intercept and linear trend is negative; expressed as a correlation it equals −0.13, which is small in size. This suggests that baseline depression level (i.e., intercept) is not related to the amount of linear change over time. Later on, it is seen that baseline level is positively correlated with quadratic trend: patients who are initially more depressed tend to level off over time more than patients who are less depressed at baseline.

Using the estimated population intercept ($\hat\beta_0$) and slope ($\hat\beta_1$) one can estimate the average HDRS score at each time-point. These are displayed in
Fig. 3 along with the observed means at each time-point. As can be seen, a linear trend does not result in close agreement between the observed and estimated means. In particular, there is an initial sharp downward trend that the linear model is unable to capture. For a more quantitative assessment, the interested reader is referred to Kaplan and George (1998), which describes use of econometric forecasting statistics to assess various forms of fit between observed and estimated means. The lack of fit of the estimated means to the observed means suggests the inclusion of curvilinear trends in the model, a point made in the next section.

Table 2 MRM regression results for WECare data with random intercepts and slopes and assuming linear change over time

Parameter name                      Symbol                              Estimate   SE     t        p-value
Intercept                           $\beta_0$                           14.08      0.33   42.30    <0.0001
Linear slope                        $\beta_1$                           −0.51      0.04   −12.27   <0.0001
Intercept variance                  $\sigma^2_{\upsilon_0}$             20.44      2.53
Intercept/linear slope covariance   $\sigma_{\upsilon_0\upsilon_1}$     −0.25      0.23
Linear slope variance               $\sigma^2_{\upsilon_1}$             0.18       0.04
Error variance                      $\sigma^2$                          23.67      0.88

Note. −2 log L = 12305.7.

Fig. 3 Observed and predicted WECare mean depression scores. Mean scores based on a linear or quadratic model do not fit the observed data as well as a model that includes cubic effects. [Figure not reproduced; vertical axis: Mean HDRS Scores; horizontal axis: Month; legend entries recovered: Raw Means, Linear Estimated Means]

Curvilinear Growth Model

In many situations, it is too simplistic to assume that the change across time is linear. In the present example, for instance, it appears that the depression scores diminish across time in a curvilinear manner. A curvilinear trend would allow a leveling off of the improvement across time. This is clearly plausible for rating scale data, like the HDRS scores, where values below zero are impossible. Here, a curvilinear growth model is considered by adding both a quadratic and cubic term to the model. A plot of observed versus estimated means using linear and quadratic terms (not shown) did not appear to fit the observed data well so a cubic term is also added. When random cubic effects were included in
the model, they were perfectly correlated with the random quadratic effects so the updated model only has random intercepts, slopes, and quadratic slopes. This produces the following model

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \upsilon_{0i} + \upsilon_{1i} t_{ij} + \upsilon_{2i} t_{ij}^2 + e_{ij}, \qquad (17)$$

where $\beta_0$ is the average month 0 HDRS level, $\beta_1$ is the average HDRS monthly linear change, $\beta_2$ is the average HDRS monthly quadratic change, and $\beta_3$ is the average HDRS monthly cubic change. Similarly, $\upsilon_{0i}$ is the individual deviation from average intercept, $\upsilon_{1i}$ is the individual deviation from average linear change, and $\upsilon_{2i}$ is the individual deviation from average quadratic change. Fitting this model yields the results given in Table 3.

Focusing first on the estimated regression parameters, this model indicates that patients start off, on average, with an HDRS score of 16.33. Note that this value is higher than the intercept of the linear model of 14.08 and closer to the observed baseline mean of 16.9. The linear, quadratic, and cubic terms in the model are all highly significant (p < 0.0001). The coefficient of the linear effect of month is −2.69, the coefficient of the quadratic term is 0.36, and the coefficient of the cubic term is −0.02. Thus, the estimated average HDRS score at a given month is calculated as 16.33 − (2.69 × month) + (0.36 × month²) − (0.02 × month³). So for example, at month 2 the average depression score is 16.33 − (2.69 × 2) + (0.36 × 4) − (0.02 × 8) ≈ 12.26. Average HDRS scores at each month are displayed in Fig. 3 along with the observed means and estimated means based on a linear model. Including a cubic effect in the model does a better job capturing trends in depression scores over time. Note that at months 8 and 10, the quadratic term dominates so that mean depression scores begin to increase, and then at month 12 the cubic term dominates so that HDRS scores decrease again. Most of the improvement in depression is occurring during the first few months of the study. Because the scale for each of these terms is different (e.g., the linear effect ranges from 0 to 12, the cubic effect ranges from 0 to 12³ = 1728), it is difficult to compare them to each other in terms of magnitude. The t-statistics provide some evidence of the magnitude and suggest that although the linear effect is strongest, all three effects contribute to the effect of time on depression symptoms.

As before, the variance and covariance terms in Table 3 provide information regarding the amount of heterogeneity in the data. The 95% confidence interval for subject-specific intercepts is 16.33 ± 3.87 and the 95% confidence interval
Table 3 MRM results for the WECare data with cubic trends and random intercept, slope, and quadratic slopes effects

Parameter name                         Symbol                              Estimate   SE      t        p-value
Intercept                              $\beta_0$                           16.33      0.34    47.99    <0.0001
Month                                  $\beta_1$                           −2.69      0.22    −12.03   <0.0001
Month²                                 $\beta_2$                           0.36       0.05    7.97     <0.0001
Month³                                 $\beta_3$                           −0.015     0.003   −6.12    <0.0001
Intercept variance                     $\sigma^2_{\upsilon_0}$             15.02      2.38
Intercept/linear slope covariance      $\sigma_{\upsilon_0\upsilon_1}$     0.67       0.69
Linear slope variance                  $\sigma^2_{\upsilon_1}$             1.55       0.36
Intercept/quadratic slope covariance   $\sigma_{\upsilon_0\upsilon_2}$     −0.10      0.05
Linear/quadratic slope covariance      $\sigma_{\upsilon_1\upsilon_2}$     −0.11      0.03
Quadratic slope variance               $\sigma^2_{\upsilon_2}$             0.01       0.002
Error variance                         $\sigma^2$                          19.75      0.79

Note. −2 log L = 12095.1
for the subject-specific quadratic terms in the model includes zero, reflecting the fact that there is considerable heterogeneity in terms of patients’ initial level of depression and in their changes across time.

Finally, the covariance between the linear effect and the quadratic effect is negative; expressed as a correlation it equals −0.94, which is very high. This is partially due to multicollinearity but also suggests that those patients who make the most initial gains (i.e., steep slopes) tend to level off at a greater rate (i.e., greater quadratic effects) than patients who have flatter slopes in the early stages of the study. An alternative explanation is that of a floor effect due to the HDRS rating scale. Simply put, once patients achieve low depression scores they no longer have room to keep improving and thus tend to level off.

An interesting question, at this point, is whether it is necessary to include random effects for the linear and quadratic terms or whether a less complicated model is sufficient. Fitting the more restrictive model with random intercepts and linear terms (not shown) yields −2 log L = 12155.6. Note that both models still include fixed effects for linear slope, quadratic slope, and cubic slope. Because these are nested models, they can be compared using a likelihood-ratio test. For this, one compares the difference in model deviance values (i.e., −2 log L) to a chi-square distribution, where the degrees of freedom equals the number of parameters set equal to zero in the more restrictive model. Comparing the full model to the restricted model with only random intercepts and slopes, $\chi^2_3$ = 12155.6 − 12095.1 = 60.5, p < 0.0001 for $H_0: \sigma_{\upsilon_0\upsilon_2} = \sigma_{\upsilon_1\upsilon_2} = \sigma^2_{\upsilon_2} = 0$. It should be noted that the use of the likelihood ratio test for this purpose also suffers from the variance boundary problem mentioned above (Verbeke and Molenberghs 2000). Based on simulation studies it can be shown that the likelihood-ratio test is too conservative (for testing null hypotheses about variance parameters), namely, it does not reject the null hypothesis often enough. This would then lead to accepting a more restrictive variance-covariance structure than is correct. As noted by Berkhof and Snijders (2001), this bias can largely be corrected by dividing the p-value obtained from the likelihood-ratio test (of variance terms) by two. In the present case it doesn’t really matter, but this modification yields p < 0.0001/2 = 0.00005. Thus, there is clear evidence that the assumption of only random intercepts and linear slopes is rejected, and the inclusion of the random quadratic slopes is necessary.

In addition to plots of the overall means over time, estimates of the individual trends, based on the random effects $\hat\upsilon_{0i}$, $\hat\upsilon_{1i}$, and $\hat\upsilon_{2i}$, are often of interest. Figure 4 contains a plot of the individual trend estimates from this model. These are obtained by calculating $\hat y_{ij} = \hat\beta_0 + \hat\beta_1 t_{ij} + \hat\beta_2 t_{ij}^2 + \hat\beta_3 t_{ij}^3 + \hat\upsilon_{0i} + \hat\upsilon_{1i} t_{ij} + \hat\upsilon_{2i} t_{ij}^2$ for t = 0, 1, . . ., 12, and then connecting the time point estimates for each individual. For clarity, 50 of the 267 WECare participants were randomly selected to display in Fig. 4.

The plot makes apparent the wide heterogeneity in trends across time, as well as the increasing variance in HDRS scores across time. Some individuals have initial accelerating downward trends suggesting immediate improvement and then a leveling off over time, while others appear to have more modest improvements and then perhaps a slight worsening of symptoms. Some individuals even have positive trends indicating a worsening of their depressive symptoms across time. This is not too surprising given that not all depression interventions work for everyone. At the end of this chapter, growth mixture models are briefly introduced which attempt to classify individuals into discrete latent classes based on the shape of their trajectories.

It is worth noting that the estimates of the individual trends presented in Fig. 4 are empirical Bayes (EB) estimates, which reflect a compromise between an estimate based solely on an individual’s data and an estimate for the population of interest. Thus, they are not equivalent to ordinary least squares (OLS) estimates (i.e.,
Fig. 4 Subject-specific estimated WECare HDRS means over time based on a model with cubic fixed effects and random intercept, slope, and quadratic slope effects. For clarity, only a random sample of 50 participants is displayed. [Figure not reproduced; vertical axis: Estimated HDRS Values; horizontal axis: Month]
fitting a regression line for each participant separately) which would only rely upon an individual’s data. An important advantage of EB estimates relative to OLS estimates is that they are not as prone to the undue influence of outliers. This is especially true when an individual has few measurements on which to base these estimates. Because of this, the EB estimates are said to be shrunken to the mean, where the mean of the random effects equals zero in the population. The degree of shrinkage depends on the number of measurements an individual has. Thus, if a subject has few measurements, then the EB estimate will be smaller (in absolute value) than the corresponding OLS estimate. Alternatively, if the subject has many measurements across time, then the EB and OLS estimates would be very similar. These EB estimates are readily available from most MRM software programs.

Finally, the fit of the observed variance-covariance matrix of the repeated measures is addressed. The observed variances and covariances are calculated based on the pairwise data for the covariances and the available data for each of the variances. The observed variance-covariance matrix is
$$V(y) = \begin{bmatrix}
26.87 \\
16.52 & 42.64 \\
17.19 & 30.54 & 49.54 \\
12.03 & 22.64 & 28.47 & 47.00 \\
12.65 & 28.68 & 29.47 & 32.39 & 52.74 \\
9.37 & 21.22 & 20.28 & 24.95 & 30.09 & 49.88 \\
9.10 & 21.82 & 29.03 & 26.73 & 29.34 & 28.15 & 49.75 \\
7.32 & 23.62 & 23.98 & 26.49 & 24.74 & 27.88 & 31.67 & 50.83 \\
7.93 & 22.11 & 22.79 & 22.69 & 26.19 & 23.96 & 27.05 & 33.33 & 53.32 \\
5.48 & 17.17 & 17.83 & 18.78 & 21.53 & 22.44 & 22.86 & 30.53 & 30.97 & 50.14
\end{bmatrix}$$
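The pairwise calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: each covariance uses only the subjects observed at both occasions, and each variance uses every subject observed at that occasion. The function names and the small data set are hypothetical and are not WECare data.

```python
from math import isnan


def pairwise_cov(x, y):
    """Sample covariance of x and y using only subjects observed on both."""
    pairs = [(a, b) for a, b in zip(x, y) if not (isnan(a) or isnan(b))]
    n = len(pairs)
    if n < 2:
        return float("nan")  # not enough complete pairs to estimate
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    return sum((a - mean_a) * (b - mean_b) for a, b in pairs) / (n - 1)


def pairwise_cov_matrix(columns):
    """Covariance matrix built entry by entry from pairwise-complete data."""
    k = len(columns)
    return [[pairwise_cov(columns[r], columns[c]) for c in range(k)]
            for r in range(k)]


NA = float("nan")
# Hypothetical scores for five subjects at three occasions (NA = missing).
month0 = [20.0, 15.0, 18.0, 12.0, 16.0]
month1 = [16.0, NA, 15.0, 10.0, 13.0]
month2 = [14.0, 11.0, NA, 9.0, NA]

V = pairwise_cov_matrix([month0, month1, month2])
```

Because each entry may rest on a different subset of subjects, a matrix built this way is a descriptive summary and is not guaranteed to be positive definite.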
$$\hat V(y) = Z \hat\Sigma_\upsilon Z' + \hat\sigma^2 I = \begin{bmatrix}
34.76 \\
15.58 & 37.23 \\
15.94 & 19.00 & 41.24 \\
16.09 & 20.10 & 23.43 & 45.80 \\
16.03 & 20.81 & 24.80 & 28.00 & 50.16 \\
15.76 & 21.11 & 25.60 & 29.24 & 32.04 & 53.72 \\
15.29 & 21.01 & 25.84 & 29.80 & 32.87 & 35.06 & 56.12 \\
13.72 & 19.60 & 24.63 & 28.82 & 32.18 & 34.69 & 36.36 & 56.93 \\
11.33 & 16.58 & 21.16 & 25.08 & 28.33 & 30.91 & 32.82 & 34.65 & 53.55 \\
8.11 & 11.95 & 15.44 & 18.56 & 21.32 & 23.72 & 25.76 & 28.75 & 30.29 & 50.14
\end{bmatrix}$$

where the design matrix of the random effects and the estimate of the random-effects variance-covariance matrix are given by

$$Z' = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 4 & 5 & 6 & 8 & 10 & 12 \\ 0 & 1 & 4 & 9 & 16 & 25 & 36 & 64 & 100 & 144 \end{bmatrix}, \quad \hat\Sigma_\upsilon = \begin{bmatrix} 15.02 & 0.67 & -0.10 \\ 0.67 & 1.55 & -0.11 \\ -0.10 & -0.11 & 0.01 \end{bmatrix}.$$
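The model-implied matrix can be rebuilt from its ingredients as a check on the formula. The sketch below uses the Table 3 estimates as inputs. It assumes an intercept variance of 15.02, an error variance of 19.75, and negative signs for the two covariances involving the quadratic slope. These are assumptions chosen because they reproduce the first entries of the printed matrix. The names `implied_cov`, `SIGMA_U`, and `SIGMA2_ERROR` are hypothetical.

```python
MONTHS = [0, 1, 2, 3, 4, 5, 6, 8, 10, 12]

# Assumed inputs: Table 3 estimates, with the covariances that involve the
# quadratic slope taken as negative (an assumption about signs).
SIGMA_U = [
    [15.02, 0.67, -0.10],
    [0.67, 1.55, -0.11],
    [-0.10, -0.11, 0.01],
]
SIGMA2_ERROR = 19.75


def implied_cov(months, sigma_u, sigma2):
    """Return Z Sigma_u Z' + sigma2 * I for design rows z = (1, t, t^2)."""
    z = [[1.0, float(t), float(t) ** 2] for t in months]
    n = len(z)
    v = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            quad = sum(z[i][a] * sigma_u[a][b] * z[j][b]
                       for a in range(3) for b in range(3))
            # The error variance contributes only to the diagonal.
            v[i][j] = quad + (sigma2 if i == j else 0.0)
    return v


V_HAT = implied_cov(MONTHS, SIGMA_U, SIGMA2_ERROR)
```

Only the early-month entries should be compared with the printed matrix. Entries for later months multiply the quadratic-slope variance by large powers of time, so the two-decimal rounding of that estimate moves them considerably.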
Table 4 Fit indices for various covariance patterns fit to the WECare data

Covariance pattern                  No. of parameters   −2 log L   AIC       BIC       p-value versus unstructured
Autoregressive Moving Average       3                   12115.6    12129.6   12129.6   0.0001
MRM                                 7                   12095.1    12117.1   12156.5   <0.0001
Toeplitz                            10                  12108.5    12136.5   12186.7   <0.0001
Heterogeneous Toeplitz              19                  12079.3    12125.3   12207.8   0.004
Factor analytic (2)                 29                  12064.8    12130.8   12249.1   0.006
Factor analytic (1)                 20                  12130.7    12178.7   12264.8   <0.0001
Heterogeneous CS                    11                  12202.1    12232.1   12286.0   <0.0001
Autoregressive (1)                  2                   12257.1    12269.1   12290.7   <0.0001
Heterogeneous Autoregressive (1)    11                  12227.8    12257.8   12311.7   <0.0001
Antedependence                      19                  12209.6    12255.6   12338.1   <0.0001
Unstructured                        55                  12016.9    12134.9   12346.6   NA
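The fit indices in Table 4 follow from the deviance and the parameter count. The sketch below rests on two assumptions that are inferred from the table values rather than stated in the text: the penalty counts four fixed-effect parameters in addition to the covariance parameters listed, and BIC uses the 267 participants as the sample size. The function names are hypothetical.

```python
from math import log


def aic(deviance, n_params):
    """Akaike criterion: deviance (-2 log L) plus 2 per estimated parameter."""
    return deviance + 2 * n_params


def bic(deviance, n_params, n_subjects):
    """Bayesian criterion: deviance plus log(sample size) per parameter."""
    return deviance + n_params * log(n_subjects)


# Assumptions (inferred, not stated in the text): 4 fixed effects are counted
# with the covariance parameters, and the sample size is the 267 participants.
N_FIXED = 4
N_SUBJECTS = 267

mrm_aic = aic(12095.1, 7 + N_FIXED)
mrm_bic = bic(12095.1, 7 + N_FIXED, N_SUBJECTS)
unstructured_aic = aic(12016.9, 55 + N_FIXED)
```

Under these assumptions the MRM and unstructured rows are reproduced to within rounding. The heavier per-parameter penalty in BIC is what pushes the 55-parameter unstructured model to the bottom of the table.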
As can be seen, while none of the covariance patterns provide a statistically similar fit to the data as the unstructured covariance in terms of a likelihood ratio test, the MRM with random intercepts, slopes, and quadratic slopes has the smallest AIC and the second smallest BIC among all the models. BIC imposes a high penalty on models with many parameters so it is not surprising that the unstructured covariance has the worst BIC. For this reason, Fitzmaurice et al. (2012) recommend against use of BIC for model selection of (co)variance structure. AIC is more useful for comparing models that are not nested, when a likelihood ratio test is not appropriate. Still, Table 4 suggests that the MRM provides a relatively parsimonious fit to the WECare data. Perhaps a model with both random subject effects and autocorrelated errors could be considered here.

i was randomized to CBT and 0 otherwise. The mixed-effects model is now

$$\begin{aligned} y_{ij} = {} & \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 \\ & + \beta_4 (t_{ij} \times MEDS_i) + \beta_5 (t_{ij}^2 \times MEDS_i) + \beta_6 (t_{ij}^3 \times MEDS_i) \\ & + \beta_7 (t_{ij} \times CBT_i) + \beta_8 (t_{ij}^2 \times CBT_i) + \beta_9 (t_{ij}^3 \times CBT_i) \\ & + \upsilon_{0i} + \upsilon_{1i} t_{ij} + \upsilon_{2i} t_{ij}^2 + e_{ij}. \end{aligned} \qquad (18)$$

The parameters $\upsilon_{0i}$, $\upsilon_{1i}$, $\upsilon_{2i}$, and $e_{ij}$ have the same interpretation as in section “Curvilinear Growth Model.”

The unstructured covariance model is

$$\begin{aligned} y_{ij} = {} & \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 \\ & + \beta_4 (t_{ij} \times MEDS_i) + \beta_5 (t_{ij}^2 \times MEDS_i) + \beta_6 (t_{ij}^3 \times MEDS_i) \\ & + \beta_7 (t_{ij} \times CBT_i) + \beta_8 (t_{ij}^2 \times CBT_i) + \beta_9 (t_{ij}^3 \times CBT_i) + \varepsilon_{ij}. \end{aligned} \qquad (19)$$
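Equations 18 and 19 share the same fixed part, so the population mean for any group and month can be read off one coefficient vector. The sketch below is an illustration only: the function names are hypothetical, and the coefficient values are made up to show the mechanics rather than taken from any fitted model.

```python
def mean_hdrs(t, meds, cbt, beta):
    """Population mean under the fixed part shared by Eqs. 18 and 19.

    beta holds beta0..beta9; meds and cbt are 0/1 group indicators.
    """
    powers = [t, t ** 2, t ** 3]
    mean = beta[0]
    for k, p in enumerate(powers):
        mean += beta[1 + k] * p          # time trend for the reference group
        mean += beta[4 + k] * p * meds   # Medication-by-time interaction
        mean += beta[7 + k] * p * cbt    # CBT-by-time interaction
    return mean


def group_difference(t, beta, offset):
    """Difference from the reference group at time t.

    offset 4 selects the Medication interactions, offset 7 the CBT ones.
    """
    return sum(beta[offset + k] * t ** (k + 1) for k in range(3))


# Hypothetical coefficients beta0..beta9, chosen only for illustration.
beta = [16.0, -2.0, 0.3, -0.015, -1.0, 0.1, 0.0, -0.5, 0.0, 0.0]

medication_month6 = mean_hdrs(6, 1, 0, beta)
reference_month6 = mean_hdrs(6, 0, 0, beta)
```

Because no main effects of treatment appear in the model, all groups share the intercept and any group difference at month t is carried entirely by the three interaction terms.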
via the error term). The covariance pattern model does not make this distinction. When the focus of inference is on the fixed effects in the model, this distinction is less important. In other settings, where there is interest in determining the degree of subject heterogeneity and/or examining individual subject trends, it may be more important.

In both models, $\beta_1$, $\beta_2$, and $\beta_3$ represent the linear, quadratic, and cubic effects of time for the TAU group, which has been chosen to be the reference group. The coefficients $\beta_4$, $\beta_5$, and $\beta_6$ are the time by Medication group interactions with the three time effects and indicate the difference in time trends between the Medication and TAU group. The coefficients $\beta_7$, $\beta_8$, and $\beta_9$ are the time by CBT group interactions and indicate the difference in time trends between the CBT and TAU group. A likelihood-ratio test can be used to test the null hypothesis that there is no effect of Medication versus TAU (i.e., $\beta_4$, $\beta_5$, and $\beta_6$ are zero) by fitting model 18 with and without the time by Medication interaction effects. This yields $\chi^2_3$ = 12091.2 − 12067.1 = 24.1, which has a p-value <0.0001. A similar test for the effect of the CBT group yields $\chi^2_3$ = 12074.0 − 12067.1 = 6.9, which has a p-value of 0.075. In model 19, the corresponding likelihood ratio tests are $\chi^2_3$ = 12012.6 − 11988.9 = 23.7 (p < 0.0001) for the Medication group and $\chi^2_3$ = 11995.2 − 11988.9 = 6.3 (p = 0.10) for the CBT group. Thus, both models give similar results regarding the significance of the Medication and CBT treatment groups versus the TAU group.

Table 5 reports the results from fitting the model described in Eq. 18 to the WECare data and Table 6 reports the results from the model described in Eq. 19. As can be seen, the estimates from both models are similar.

It is interesting to note that among the time by treatment interactions, only the interaction of Medication with linear time is significant. This suggests that the effect of the Medication intervention takes place early on in the study, during the initial sharp decline in depression scores. This is clearer in Fig. 5, which displays the estimated means at each time point by treatment group using the parameter estimates in Table 5. Even though the other treatment by time interactions are not significant, their magnitude is large enough such that the three different growth curves have very different shapes.

Once it has been established that the Medication intervention (but not the CBT intervention) produces significantly different outcomes than the TAU group (via likelihood ratio tests), it may be of interest to estimate the mean HDRS scores of these interventions at specific time points, their differences, and their corresponding effect sizes. This can be done using the methods described in section “Calculating Effect Sizes.”

For example, to calculate the effect size of the Medication intervention versus the TAU intervention at month 6, one begins by estimating the mean HDRS scores for both groups at month 6. For both Eqs. 18 and 19 the difference in mean HDRS scores at month 6 between the Medication and TAU interventions is $6\beta_4 + 6^2\beta_5 + 6^3\beta_6$. The variance at month 6 in the mixed-effects model is

$$\begin{aligned} Var(y_{ij} \mid t_{ij} = 6) &= Cov(\upsilon_{0i} + 6\upsilon_{1i} + 6^2\upsilon_{2i} + e_{i6},\; \upsilon_{0i} + 6\upsilon_{1i} + 6^2\upsilon_{2i} + e_{i6}) \\ &= Var(\upsilon_{0i}) + 2Cov(\upsilon_{0i}, 6\upsilon_{1i}) + 2Cov(\upsilon_{0i}, 6^2\upsilon_{2i}) + Var(6\upsilon_{1i}) + 2Cov(6\upsilon_{1i}, 6^2\upsilon_{2i}) + Var(6^2\upsilon_{2i}) + Var(e_{i6}) \\ &= \sigma^2_{\upsilon_0} + 12\sigma_{\upsilon_0\upsilon_1} + 72\sigma_{\upsilon_0\upsilon_2} + 36\sigma^2_{\upsilon_1} + 432\sigma_{\upsilon_1\upsilon_2} + 1296\sigma^2_{\upsilon_2} + \sigma^2 \\ &= 54.57. \end{aligned} \qquad (20)$$

In matrix notation, this is written as

$$Var(y_{ij} \mid t_{ij} = 6) = \begin{bmatrix} 1 & 6 & 6^2 \end{bmatrix} \Sigma_\upsilon \begin{bmatrix} 1 & 6 & 6^2 \end{bmatrix}^T + \sigma^2.$$

Using the estimates from Table 5, the effect size based on the mixed-effects model is

$$\text{Month 6 effect size} = \frac{4.39}{\sqrt{54.57}} = 0.60.$$

For the covariance pattern model, the variance at month 6 is simply the seventh term on the
Table 5 Results from a mixed-effect regression model fit to the WECare data

Parameter name                         Symbol                              Estimate   SE      t       p-value
Intercept                              $\beta_0$                           16.330     0.34    47.96   <0.0001
Month                                  $\beta_1$                           −2.081     0.36    −5.84   <0.0001
Month²                                 $\beta_2$                           0.325      0.07    4.38    <0.0001
Month³                                 $\beta_3$                           −0.016     0.00    −3.88   0.0001
Month*MEDS                             $\beta_4$                           −1.356     0.48    −2.8    0.005
Month²*MEDS                            $\beta_5$                           0.099      0.10    0.96    0.34
Month³*MEDS                            $\beta_6$                           0.001      0.01    0.11    0.92
Month*CBT                              $\beta_7$                           0.424      0.49    0.87    0.38
Month²*CBT                             $\beta_8$                           0.005      0.10    0.05    0.96
Month³*CBT                             $\beta_9$                           0.002      0.01    0.39    0.70
Intercept variance                     $\sigma^2_{\upsilon_0}$             15.062     2.387
Intercept, slope covariance            $\sigma_{\upsilon_0\upsilon_1}$     1.052      0.658
Slope variance                         $\sigma^2_{\upsilon_1}$             1.182      0.322
Intercept, quadratic slope covariance  $\sigma_{\upsilon_0\upsilon_2}$     −0.134     0.050
Slope, quadratic slope covariance      $\sigma_{\upsilon_1\upsilon_2}$     −0.078     0.025
Quadratic slope variance               $\sigma^2_{\upsilon_2}$             0.006      0.002
Error variance                         $\sigma^2$                          19.741     0.792

Note. −2 log L = 12067.1
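The deviance in the table note is the full-model value used in the likelihood-ratio tests reported for Eq. 18. As a minimal sketch, the p-values for those three-degree-of-freedom tests can be computed with the closed-form upper tail of the chi-square distribution with 3 degrees of freedom, which needs only the standard library. The function names are hypothetical. The boundary adjustment discussed earlier for variance parameters does not apply, since fixed effects are being tested here.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


def lrt_df3(deviance_restricted, deviance_full):
    """Likelihood-ratio statistic (difference in -2 log L) and its p-value."""
    stat = deviance_restricted - deviance_full
    return stat, chi2_sf_df3(stat)


# Deviances quoted in the text for model 18 with and without each set of
# three time-by-treatment interactions.
meds_stat, meds_p = lrt_df3(12091.2, 12067.1)
cbt_stat, cbt_p = lrt_df3(12074.0, 12067.1)
```

The restricted model always has the larger deviance, so the statistic is non-negative. A general-purpose chi-square routine would be needed for other degrees of freedom.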
Fig. 5 HDRS means over time by treatment group. [Figure not reproduced; vertical axis: Estimated HDRS Scores; horizontal axis: Month; legend: Treatment as usual, CBT, Medication]
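The month 6 effect size for Medication versus TAU can be approximated from the rounded Table 5 estimates. The sketch below makes assumptions about signs that the printed table may not show: the Medication-by-linear-time coefficient and the two covariances involving the quadratic slope are taken as negative, because that choice reproduces the reported variance of about 54.57. All names are hypothetical.

```python
from math import sqrt


def variance_at(t, sigma_u, sigma2):
    """Model-implied variance [1 t t^2] Sigma_u [1 t t^2]' + sigma2."""
    z = [1.0, float(t), float(t) ** 2]
    quad = sum(z[a] * sigma_u[a][b] * z[b] for a in range(3) for b in range(3))
    return quad + sigma2


def medication_difference(t, b4, b5, b6):
    """Medication minus TAU mean at month t: t*b4 + t^2*b5 + t^3*b6."""
    return b4 * t + b5 * t ** 2 + b6 * t ** 3


# Rounded Table 5 estimates; the negative signs are assumptions (see lead-in).
SIGMA_U = [
    [15.062, 1.052, -0.134],
    [1.052, 1.182, -0.078],
    [-0.134, -0.078, 0.006],
]
SIGMA2 = 19.741

var6 = variance_at(6, SIGMA_U, SIGMA2)
diff6 = medication_difference(6, -1.356, 0.099, 0.001)
effect6 = diff6 / sqrt(var6)
```

With rounded inputs the variance and difference come out close to, but not exactly equal to, the 54.57 and 4.39 quoted in the text. The negative sign of the result indicates lower (better) HDRS scores under Medication.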
from the study. As noted above, the treatment of missing data in longitudinal studies is itself a vast literature. An important consideration when drawing inferences from longitudinal data is the reason for the missing data, also referred to as the missing data mechanism (Rubin 1976). Most of the methods described in this chapter, with the exception of GEE methods, provide valid estimates under the assumption that the missing data mechanism is missing at random (MAR) as described by Rubin (1976), where the probability that a value is missing does not depend on unobserved information such as the value itself. When data are not missing at random (NMAR), that is, the probability that a value is missing does depend on unobserved information, it is necessary to model both the outcome and the missing data mechanism itself.

NMAR is an untestable assumption since the mechanism by definition depends on unobserved information. Thus, it is difficult to identify those situations where one is dealing with data that are NMAR. However, one situation where NMAR data are often a concern is participant drop-out, where subjects withdraw from a study and are never heard from again. In this situation, two common approaches for handling drop-outs in longitudinal designs are pattern-mixture models and shared-parameter models. In pattern-mixture models, the data are stratified by the different dropout patterns with distinct model parameters for each stratum. Marginal estimates across the patterns can be derived as a weighted average across pattern-specific estimates (Little 1995) or by using multiple imputation (Demirtas and Schafer 2003). Shared-parameter models are identified by using common random effects to relate the response with the missing-data indicator (Daniels and Hogan 2000; Guo et al. 2004).

Limitations due to space prevent an in-depth discussion of this topic. Instead, readers are referred to recent review articles including Kenward and Molenberghs (1999), Siddique et al. (2008), and Ibrahim and Molenberghs (2009). Also see the books by Little and Rubin (2002), Fitzmaurice et al. (2012), Hedeker and Gibbons (2006), and Daniels and Hogan (2008), which contain useful material on this topic.

Generalized Estimating Equation Models

In the 1980s, alongside development of MRMs and CPMs for incomplete longitudinal data, generalized estimating equations (GEE) models were developed (Liang and Zeger 1986; Zeger and Liang 1986). Essentially, GEE models extend generalized linear models (GLMs) to the case of correlated data. This class of models has become
18 Analysis of Repeated Measures and Longitudinal Data in Health Services Research 427
very popular, especially for the analysis of categorical and count outcomes, though they can be used for continuous outcomes as well. One difference between GEE models and MRMs is that GEE models are based on quasi-likelihood estimation, and so the full likelihood of the data is not specified. GEE models are termed marginal models, and they model the regression of y on x and the within-subject dependence (i.e., the association parameters) separately. The term "marginal" in this context indicates that the model for the mean response depends only on the covariates of interest, and not on any random effects or previous responses. In terms of missing data, GEE assumes that the missing data are missing completely at random (MCAR) where the probability that a value is missing does not depend either on observed or missing values. This is a stricter (and possibly less realistic) assumption than that assumed by the models employing full-likelihood estimation which assume missing data are MAR.

Conceptually, GEE reproduces the marginal means of the observed data, even if some of those means have limited information because of subject drop-out. Standard errors are adjusted (i.e., inflated) to accommodate the reduced amount of independent information produced by the correlation of the repeated observations over time. By contrast, mixed-effects models use the available data from all subjects to model temporal response patterns that would have been observed had the subjects all been measured to the end of the study. Because of this, estimated mean responses at the end of the study can be quite different for GEE versus MRM, if the future observations are related to the measurements that were made during the course of the study. If the available measurements are not related to the missing measurements (e.g., following dropout), GEE and MRM will produce quite similar estimates. This is the fundamental difference between GEE and MRM, that is, the assumption that the missing data are dependent on the observed responses for a given subject during that subject's participation in the study. It is hard to imagine that a subject's responses that would have been obtained following dropout would be independent of their observed responses during the study. This leads to a preference for full-likelihood approaches over quasi or partial likelihood approaches, and MRM over GEE, at least for longitudinal data. There is certainly less of an argument for a preference for data that are only clustered (e.g., providers nested within clinics), in which case advantages of MAR over MCAR are not as germane.

A basic feature of GEE models is that the joint distribution of a subject's response vector $y_i$ does not need to be specified. Instead, it is only the marginal distribution of $y_{ij}$ at each time point that needs to be specified. To clarify this further, suppose that there are two time-points and suppose that the outcome is a continuous normal random variable. GEE would only require us to assume that the distribution of $y_{i1}$ and $y_{i2}$ are two univariate normals, rather than assuming that $y_{i1}$ and $y_{i2}$ form a (joint) bivariate normal distribution. Thus, GEE avoids the need for multivariate distributions by only assuming a functional form for the marginal distribution at each time-point. This leads to a simpler quasi-likelihood approach for estimating the model parameters, rather than the full-likelihood approach of the MRM and CPM. The disadvantage, as mentioned above, is that because a multivariate distribution is not specified for the response vector, the assumption for the missing data is more stringent for the GEE than the full-likelihood estimated MRMs and CPMs. A complete treatment of GEE can be found in Hardin and Hilbe (2012).

Models for Categorical Outcomes

Reflecting the usefulness of mixed-effects modeling and the importance of categorical outcomes in many areas of research, generalization of mixed-effects models for categorical outcomes has been an active area of statistical research. For dichotomous response data, several approaches adopting either a logistic or probit regression model and various methods for incorporating and estimating the influence of the random effects have been developed (Gibbons 1981; Stiratelli et al. 1984; Wong and Mason 1985; Gibbons and Bock 1987;
428 J. Siddique et al.
Conaway 1989; Goldstein 1991). Here, a mixed-effects logistic regression model for the analysis of binary data is briefly described. Extensions of this model for analysis of ordinal, nominal, and count data are described in detail by Hedeker and Gibbons (2006).

To set the notation, let i denote individuals and let j denote the repeated measurement occasions within each individual. Assume that there are $i = 1, \ldots, N$ individuals and $j = 1, \ldots, n_i$ measurement occasions nested within each individual. Let $Y_{ij}$ be the value of the dichotomous outcome variable, coded 0 or 1. The logistic regression model is written in terms of the log odds (i.e., the logit) of the probability of a response, denoted $p_{ij}$. Considering first a random-intercept model, augmenting the logistic regression model with a single random effect yields:

$$\ln\left[\frac{p_{ij}}{1 - p_{ij}}\right] = x_{ij}'\beta + \upsilon_i \qquad (21)$$

where $x_{ij}$ is the $(p + 1) \times 1$ covariate vector (includes a 1 for the intercept), $\beta$ is the $(p + 1) \times 1$ vector of unknown regression parameters, and $\upsilon_i$ is the random subject effect. These random effects are assumed to be distributed in the population as $N(0, \sigma_\upsilon^2)$. For convenience and computational simplicity, in models for categorical outcomes the random effects are typically expressed in standardized form. For this, $\upsilon_i = \sigma_\upsilon \theta_i$ and the model is given as:

$$\ln\left[\frac{p_{ij}}{1 - p_{ij}}\right] = x_{ij}'\beta + \sigma_\upsilon \theta_i \qquad (22)$$

Notice that the random-effects variance term (i.e., the population standard deviation $\sigma_\upsilon$) is now explicitly included in the regression model. Thus, it and the regression coefficients are on the same scale, namely, in terms of the log-odds of a response.

The model can also be expressed in terms of a latent continuous variable y, with the observed dichotomous version Y being a manifestation of the unobserved continuous y. Here, the model is written as:

$$y_{ij} = x_{ij}'\beta + \sigma_\upsilon \theta_i + e_{ij} \qquad (23)$$

in which case the error term $e_{ij}$ follows a standard logistic distribution under the logistic regression model (or a standard normal distribution under the probit regression model). This representation helps to explain why the regression coefficients from a mixed-effects logistic regression model do not typically agree with those obtained from a fixed-effects logistic regression model, or for that matter from a GEE logistic regression model, which has regression coefficients that agree in scale with the fixed-effects model. In the mixed model, the conditional variance of the latent y given x equals $\sigma_\upsilon^2 + \sigma_e^2$, whereas in the fixed-effects model this conditional variance equals only the latter term $\sigma_e^2$ (which equals either $\pi^2/3$ or 1 depending on whether it is a logistic or probit regression model, respectively). As a result, equating the variances of the latent y under these two scenarios yields:

$$\beta_M \approx \beta_F \sqrt{\frac{\sigma_\upsilon^2 + \sigma_e^2}{\sigma_e^2}}$$

where $\beta_F$ and $\beta_M$ represent the regression coefficients from the fixed-effects and (random-intercepts) mixed-effects models, respectively. In practice, Zeger et al. (1988) have found that $(15/16)^2 \pi^2/3$ works better than $\pi^2/3$ for $\sigma_e^2$ in equating results of logistic regression models.

Several authors have commented on the difference in scale and interpretation of the regression coefficients in mixed models and marginal models, like the fixed-effects and GEE models (Neuhaus et al. 1991; Zeger et al. 1988). Regression estimates from the mixed model have been termed "subject-specific" to reinforce the notion that they are conditional estimates, conditional on the random (subject) effect. Thus, they represent the effect of a regressor on the outcome controlling for, or holding constant, the value of the random subject effect. Alternatively, the estimates from the fixed-effects and GEE models are "marginal" or "population-averaged" estimates which indicate the effect of a regressor averaging over the population of subjects. This difference of scale
and interpretation only occurs for nonlinear regression models like the logistic regression model. For the linear model this difference does not exist.

Growth Mixture Models

A frequent characteristic of depression clinical trials (such as the WECare study) is that outcomes over time are subject to considerable between-subject heterogeneity due to the fact that patients often follow different trajectories over time. Some participants may see immediate gains, only to relapse at a later date, while others will improve gradually over time. Some participants will not improve at all. When comparing the effectiveness of different treatments, it is important to identify and take into account these different trajectories because the effectiveness of an intervention may depend on the trajectory class of the participants. Despite the fact that heterogeneity of outcomes is common in depression studies, most analyses such as mixed-effects regression models assume that all individuals are drawn from a single population with common population parameters (Muthén 2004). That is, they assume that all individual trajectories vary around a single mean trajectory. This assumption goes counter to clinical observations and empirical data where variation in trajectory shapes is routinely observed. When individuals follow several different trajectory shapes, conventional repeated measures modeling may lead to a distorted assessment of treatment effects.

Growth mixture modeling (Muthén and Shedden 1999; Muthén et al. 2002; Xu and Hedeker 2002) relaxes the single population assumption to allow for parameter differences across several unobserved populations. Instead of considering individual variation around a single trajectory, a growth mixture model (GMM) allows different classes of individuals to vary around different mean trajectories.

Once multiple trajectories have been identified, analyses can be performed to predict trajectory class as a function of other covariates. This approach is particularly useful in randomized trials because it may suggest that for some groups of individuals one treatment may be better than another treatment based on the subject's predicted trajectory. For example, if a subject's age, number of children, and ethnicity are predictive of a trajectory where outcomes are more favorable under medication rather than CBT, then one would consider treating a patient with similar characteristics with medication. On the other hand, it may be that a subject's predicted trajectory suggests that both medication and CBT are effective. In that case, either treatment can be offered. In this way, growth mixture modeling may provide insights on personalized depression treatments that are tailored based on patient characteristics as well as preferences.

More specifically, let $c_i$ be a latent categorical variable representing the unobserved membership in a trajectory class for participant i, where $c_i = 1, 2, \ldots, K$. The variable c is referred to as a trajectory class variable. Define $y_{ij}$ as the outcome for participant i at time j, $j = 0, 1, \ldots, n_i$. Then, conditional on trajectory class k, the GMM augments Eq. 16 as follows:

$$(y_{ij} \mid c_i = k) = \beta_{0k} + \beta_{1k} t_{ij} + \beta_{2k} t_{ij}\,\mathrm{Trt}_i + \upsilon_{0ik} + \upsilon_{1ik} t_{ij} + e_{ijk} \qquad (24)$$

Both the random and fixed effects have the same interpretation as before, but now they are indexed by trajectory class k, so that they may vary by trajectory class.

Class membership is expressed by a multinomial logistic regression of the form:

$$P(c_i = k \mid x_i) = \frac{\exp(x_i'\delta_k)}{\sum_{s=1}^{K} \exp(x_i'\delta_s)} \qquad (25)$$
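The class-membership model in Eq. 25 can be illustrated with a short numerical sketch. The Python below is only an illustration under stated assumptions: the function names, the three classes, and all coefficient values are hypothetical and are not estimates from the WECare data. The coefficients of the last class are fixed at zero as a reference class, a common identification constraint that the text does not state. The sketch also checks that, for a binary covariate, the exponentiated coefficient equals the ratio of class odds.

```python
import math


def class_probabilities(x, deltas):
    """Return P(c_i = k | x_i) for k = 1..K, following Eq. 25.

    x      : covariate vector for one participant (first entry 1 for the intercept)
    deltas : one coefficient vector per trajectory class
    """
    scores = [sum(xj * dj for xj, dj in zip(x, delta)) for delta in deltas]
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients: [intercept, binary covariate] for K = 3 classes.
deltas = [
    [0.5, 0.8],    # class 1
    [-0.2, -0.4],  # class 2
    [0.0, 0.0],    # class 3, reference class (assumed constraint)
]

probs_male = class_probabilities([1.0, 1.0], deltas)    # covariate = 1
probs_female = class_probabilities([1.0, 0.0], deltas)  # covariate = 0

# Odds of class 1 versus the reference class, for each covariate value.
odds_male = probs_male[0] / probs_male[2]
odds_female = probs_female[0] / probs_female[2]
odds_ratio = odds_male / odds_female  # agrees with exp(0.8) up to rounding

print([round(p, 3) for p in probs_male])
print(round(odds_ratio, 4), round(math.exp(0.8), 4))
```

Because the probabilities are normalized over all K classes, they sum to one for any covariate vector, and the odds ratio depends only on the coefficient of the binary covariate, not on the intercepts.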
For binary variables x in Eq. 25, $e^{\delta}$ can be interpreted as the odds ratio of being in one class versus another. For example, if x is gender, then one can estimate the odds that a male participant, relative to a female participant, is in a given trajectory class.

The number of trajectories in a GMM must be specified a priori. Typically, several GMMs are fit assuming a different number of trajectory classes and the "correct" number of trajectories is chosen based on model fit criteria such as BIC. See Muthén et al. (2002) and Muthén et al. (2009) for more detail on fitting GMMs in clinical trial settings and Siddique et al. (2012) for an example of a GMM fit to the WECare data.

Discussion

This chapter reviewed methods for the analysis of longitudinal data commonly encountered in health services research. The chapter began by discussing issues inherent in longitudinal data and then described methods for analyzing these data, focusing on linear mixed-effects models and covariance-pattern models for continuous data. These methods were applied to data from a longitudinal depression treatment trial, going into specific detail on model selection, estimation of treatment effects, calculation of effect sizes, and interpretation.

Data from health services research are often missing and/or not continuous. These types of data suggest the use of models in addition to those discussed in this chapter. Due to space limitations, extended models for missing data and nonlinear models for noncontinuous data were only briefly mentioned. As described, MRMs and CPMs do allow for missing data and provide valid results under the assumption of missing at random (MAR). Thus, the extended missing data models are useful to the extent that researchers suspect that the missing data are missing not at random, a situation that is impossible to ascertain with the observed data. Finally, the chapter briefly described generalized estimating equation (GEE) models and growth mixture models (GMMs) for longitudinal data, noting some distinguishing features of these classes of models relative to MRMs and CPMs.

Mixed-effects models, which allow one to estimate subject-specific change over time and provide valid estimates in the presence of data missing at random, should be considered as the preferred methodology for analysis of longitudinal data by health services researchers. Most current statistical software packages include functions for estimating MRMs and their various extensions, thus making them easily accessible to the interested researcher.

Acknowledgments The authors wish to thank Jeanne Miranda for use of the WECare data. Dr. Siddique's work was supported by grant K07 CA154862-01 from the National Cancer Institute and R03 HS018815-01 from the Agency for Healthcare Research and Quality. Dr. Hedeker's work was supported by Award Number P01 CA098262 from the National Cancer Institute. Dr. Gibbons' work was supported by R01 MH8012201 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute, Agency for Healthcare Research and Quality, or the National Institutes of Health.

References

Berkhof J, Snijders TAB. Variance component testing in multilevel models. J Educ Behav Stat. 2001;26:133–52.
Bock RD. Multivariate statistical methods in behavioral research. New York: McGraw-Hill; 1975.
Bock RD. Within-subject experimentation in psychiatric research. In: Gibbons RD, Dysken MW, editors. Statistical and methodological advances in psychiatric research. New York: Spectrum; 1983a. p. 59–90.
Bock RD. The discrete Bayesian. In: Wainer H, Messick S, editors. Modern advances in psychometric research. Hillsdale: Erlbaum; 1983b. p. 103–15.
Bock RD. Measurement of human variation: a two stage model. In: Bock RD, editor. Multilevel analysis of educational data. New York: Academic; 1989.
Bryk AS, Raudenbush SW. Hierarchical linear models: applications and data analysis methods. Newbury Park: Sage; 1992.
Chi EM, Reinsel GC. Models for longitudinal data with random effects and AR(1) errors. J Am Stat Assoc. 1989;84:452–9.
Conaway MR. Analysis of repeated categorical measurements with conditional likelihood methods. J Am Stat Assoc. 1989;84:53–61.
Daniels MJ, Hogan JW. Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics. 2000;56:1241–8.
Daniels MJ, Hogan JW. Missing data in longitudinal studies: strategies for Bayesian modeling and sensitivity analysis. New York: Chapman & Hall/CRC; 2008.
de Leeuw J, Kreft I. Random coefficient models for multilevel analysis. J Educ Stat. 1986;11:57–85.
Demirtas H, Schafer JL. On the performance of random-coefficient pattern-mixture models for nonignorable dropout. Stat Med. 2003;22:2553–75.
Dempster AP, Rubin DB, Tsutakawa RK. Estimation in covariance component models. J Am Stat Assoc. 1981;76:341–53.
Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of longitudinal data. 2nd ed. New York: Oxford University Press; 2002.
Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2012.
Gibbons RD. Trend in correlated proportions. PhD thesis, University of Chicago, Department of Psychology, 1981.
Gibbons RD, Bock RD. Trend in correlated proportions. Psychometrika. 1987;52:113–24.
Gibbons RD, Hedeker D, Waternaux CM, Davis JM. Random regression models: a comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacol Bull. 1988;24:438–43.
Goldstein H. Multilevel mixed linear model analysis using iterative generalized least squares. Biometrika. 1986;73:43–56.
Goldstein H. Nonlinear multilevel models, with an application to discrete response data. Biometrika. 1991;78:45–51.
Goldstein H. Multilevel statistical models. 4th ed. Hoboken: Wiley; 2011.
Grady JJ, Helms RW. Model selection techniques for the covariance matrix for incomplete longitudinal data. Stat Med. 1995;14:1397–416.
Guo W, Ratcliffe SJ, Ten Have TR. A random pattern-mixture model for longitudinal data with dropouts. J Am Stat Assoc. 2004;99:929–37.
Hardin JW, Hilbe JM. Generalized estimating equations. 2nd ed. New York: Chapman and Hall; 2012.
Hedeker D. Random regression models with autocorrelated errors. PhD thesis, University of Chicago, Department of Psychology, 1989.
Hedeker D, Gibbons RD. Longitudinal data analysis. New York: Wiley; 2006.
Hui SL, Berger JO. Empirical Bayes estimation of rates in longitudinal studies. J Am Stat Assoc. 1983;78:753–9.
Ibrahim J, Molenberghs G. Missing data methods in longitudinal studies: a review (with discussion). TEST. 2009;18:1–43.
Kaplan D, George R. Evaluating latent growth models through ex post simulation. J Educ Behav Stat. 1998;23:216–35.
Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Stat Methods Med Res. 1999;8(1):51–83.
Laird NM. Missing data in longitudinal studies. Stat Med. 1988;7:305–15.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74.
Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Little RJA. Modeling the drop-out mechanism in repeated-measures studies. J Am Stat Assoc. 1995;90:1112–21.
Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. New York: Wiley; 2002.
Longford NT. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika. 1987;74:817–27.
Longford NT. Random coefficient models. New York: Oxford University Press; 1993.
Miranda J, Chung JY, Green BL, Krupnick J, Siddique J, Revicki DA, Belin T. Treating depression in predominantly low-income young minority women. J Am Med Assoc. 2003;290:57–65.
Miranda J, Chung JY, Green BL, Krupnick J, Siddique J, Revicki DA. One year outcomes of treating depression in predominantly low-income young minority women. J Consult Clin Psychol. 2006;74:99–111.
Muthén BO. Latent variable analysis: growth mixture modeling and related techniques for longitudinal data. In: Kaplan D, editor. Handbook of quantitative methodology for the social sciences. Newbury Park: Sage; 2004.
Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–9.
Muthén B, Brown CH, Masyn K, Jo B, Khoo ST, Yang CC, Wang CP, Kellam SG, Carlin JB, Liao J. General growth mixture modeling for randomized preventive interventions. Biostatistics. 2002;3(4):459–75.
Muthén BO, Brown CH, Leuchter A, Hunter A. General approaches to analysis of course: applying growth mixture modeling to randomized trials of depression medication. In: Shrout PE, editor. Causality and psychopathology: finding the determinants of disorders and their cures. Washington, DC: American Psychiatric Publishing; 2009. Forthcoming.
Neuhaus JM, Kalbfleisch JD, Hauck WW. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int Stat Rev. 1991;59:25–35.
Potthoff RF, Roy SN. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika. 1964;51:313–6.
Raudenbush SW, Bryk AS. A hierarchical model for studying school effects. Sociol Educ. 1986;59:1–17.
Raudenbush SW, Bryk AS. Hierarchical linear models. 2nd ed. Thousand Oaks: Sage; 2002.
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.
Siddique J, Brown CH, Hedeker D, Duan N, Gibbons RD, Miranda J, Lavori PW. Missing data in longitudinal trials–part B, analytic issues. Psychiatr Ann. 2008;38(12):793–801.
Siddique J, Chung JY, Brown CH, Miranda J. Comparative effectiveness of medication versus cognitive behavioral therapy in a randomized controlled trial of low-income young minority women with depression. J Consult Clin Psychol. 2012;80:995–1006.
Singer JD, Willett JB. Applied longitudinal data analysis. New York: Oxford University Press; 2003.
Stiratelli R, Laird NM, Ware JH. Random-effects models for serial observations with binary response. Biometrics. 1984;40:961–71.
Strenio JF, Weisberg HI, Bryk AS. Empirical Bayes estimation of individual growth curve parameters and their relationship to covariates. Biometrics. 1983;39:71–86.
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
Weiss RE. Modeling longitudinal data. New York: Springer; 2005.
Wolfinger RD. Covariance structure selection in general mixed models. Commun Stat Simul Comput. 1993;22:1079–106.
Wong GY, Mason WM. The hierarchical logistic regression model for multilevel analysis. J Am Stat Assoc. 1985;80:513–24.
Xu W, Hedeker D. A random-effects models for classifying treatment response in longitudinal clinical trials. J Biopharm Stat. 2002;11:253–73.
Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–30.
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–60.
19 Competing Risk Models

Melania Pintilie

Contents
Introduction
Motivation and Examples
The Need to Analyze Time to Event of Interest
The Follicular Lymphoma Example
The Pressure Ulcer Healing (PUH) Example
Estimation of the Probability of Event
Necessity for Special Techniques
Nonparametric Estimation of Probability of Event in the Presence of Competing Risks
The Justification of the Kalbfleisch and Prentice Formula (1)
The Intuitive Justification for Formula (1)
Confidence Intervals
Theoretical Background
General Remarks
A Theoretical Example
Regression Model
Fine and Gray Model
Interpretation of the Fine and Gray Model
Cox Regression in the Presence of Competing Risks
Other Developments
Analyzing Correlated Data
Analyzing Case-Cohort Design
Sample Size and Power
Software
References

M. Pintilie (*)
University Health Network, Toronto, ON, Canada
e-mail: pintilie@uhnresearch.ca
composite end point. However, this approach could diminish the effect on the event of interest or even suggest a totally different conclusion.

Sometimes a composite end point is not feasible. During the treatment a patient needs to be assisted by temporarily inserting a feeding tube. After the treatment and as the patient recovers, the tube is taken out. The time at which the tube is taken out could be considered as a surrogate for response. This end point cannot be considered together with death, for example, as the former is a positive outcome and the latter a negative one.

The following two examples will be used throughout this chapter to illustrate the different aspects of the analysis in the presence of competing risks. The datasets were slightly modified to help illustrate competing risks analysis. Clinical conclusions cannot be drawn from these analyses.

The Follicular Lymphoma Example

Consider as an example a cohort of patients with early-stage follicular lymphoma with the follow-up ranging between 1 and 31 years. For this disease, the prognosis is good with 10-year survival of approximately 75%. These patients could experience relapses (local and/or distant), a second malignancy, or die of other causes. Each of these events can be of interest with the rest being competing risks, with the exception of death, which cannot have any competing risks.

The Pressure Ulcer Healing (PUH) Example

This is a cohort of patients with advanced illness who were admitted to a palliative care center and followed until death (Maida et al. 2008, 2012). All patients had at least one pressure ulcer at the time of admittance, and the time from admittance to complete healing was recorded for all pressure ulcers that healed. The life expectancy for the cohort is low with median survival less than a month. The goal of this analysis is to study the time to pressure ulcer healing as a function of a patient's Palliative Performance Scale status, an important clinical factor which for this analysis is dichotomized at 40, bedridden vs. ambulatory. If a patient had more than one pressure ulcer, one was chosen at random for analysis to avoid having to deal with the added complexity of correlated observations (see section "Analyzing Correlated Data"). Dr. Vincent Maida and Dr. Marguerite Ennis graciously allowed the use of the pressure ulcer healing data (Maida et al. 2012) for the illustration of the concepts of this chapter.

Estimation of the Probability of Event

Necessity for Special Techniques

In the presence of competing risks, the estimates based on the Kaplan-Meier (KM) method when the competing risk is censored are not probabilities. This concept is illustrated using the cohort of follicular lymphoma described in section "The Follicular Lymphoma Example." The event of interest is the time to second malignancy following the lymphoma diagnosis. For the moment, the competing risks (the deaths without second malignancy) are ignored and censored. With this assumption, KM estimates can be obtained. KM estimates can also be calculated for the deaths without second malignancy as event and with the second malignancy censored. If the KM estimates can be interpreted as probabilities, then the calculated 1-KM would be the probability for each of the two specific types of event to happen. Since the two types of events are mutually exclusive, the sum of the 1-KM estimates calculated at each time point should be the probability of any of the two events to occur, namely, the probability for either second malignancy or death without second malignancy. In Fig. 1, the broken line is the 1-KM estimate for the second malignancy, while the solid line represents the sum of the 1-KM estimates for second malignancy and the death without second malignancy. The fact that the top line goes beyond the possible upper limit of a probability is a proof that
1-KM estimates
0.6
0.2
lymphoma
of second malignancy
0.3
0.2
0.1
0.0
0 5 10 15 20 25 30
Time to second malignancy
when competing risks are present, the KM esti- and dev j are the number of events of interest
mates cannot be interpreted as probabilities. at time tj. The probability of event can be esti-
mated as:
X d ev j
Nonparametric Estimation ^ ev ðtÞ ¼
F ^S tj1 (1)
all j, t t
nj
of Probability of Event in the Presence j
of Competing Risks
Here ^S tj1 is the KM estimate for the comple-
Kalbfleisch and Prentice (1980) modified the KM ment of the probability of all types of events.
estimator to obtain the probability of event in the ^ ev ðtÞ is sometimes called cumu-
In the literature, F
presence of competing risks. Briefly, suppose lative incidence function (CIF). Figure 2 shows
t1 < t2 < . . . are the ordered time points for all the estimation based on (1) and on the KM
types of events, nj are the number at risk at time tj, method for the second malignancy in follicular
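As an illustration of formula (1), the following minimal Python sketch computes the CIF and, for comparison, the 1-KM estimate obtained when the competing events are censored. The function names, the coding of event types (0 for censored, positive integers for event types), and the toy data are assumptions made for this illustration and are not taken from the chapter.

```python
def _group_by_time(times, causes):
    """Yield (time, list of causes tied at that time) in increasing time order."""
    data = sorted(zip(times, causes))
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = []
        while i < len(data) and data[i][0] == t:
            tied.append(data[i][1])
            i += 1
        yield t, tied


def cumulative_incidence(times, causes, cause_of_interest):
    """CIF from formula (1): sum of (d_ev_j / n_j) * S(t_{j-1}) over event times."""
    n_at_risk = len(times)
    surv_prev = 1.0  # KM estimate for all event types just before the current time
    cif = 0.0
    out = []
    for t, tied in _group_by_time(times, causes):
        d_all = sum(1 for c in tied if c != 0)
        d_ev = sum(1 for c in tied if c == cause_of_interest)
        if d_all > 0:
            cif += surv_prev * d_ev / n_at_risk
            surv_prev *= 1 - d_all / n_at_risk
            out.append((t, cif))
        n_at_risk -= len(tied)
    return out


def one_minus_km(times, causes, cause_of_interest):
    """1-KM for one event type, with every other event type treated as censored."""
    n_at_risk = len(times)
    surv = 1.0
    out = []
    for t, tied in _group_by_time(times, causes):
        d_ev = sum(1 for c in tied if c == cause_of_interest)
        if d_ev > 0:
            surv *= 1 - d_ev / n_at_risk
            out.append((t, 1 - surv))
        n_at_risk -= len(tied)
    return out


# Hypothetical toy data: 0 = censored, 1 = event of interest, 2 = competing event
times = [1, 2, 3, 4, 5]
causes = [1, 2, 1, 0, 0]
print(cumulative_incidence(times, causes, 1))
print(one_minus_km(times, causes, 1))
```

On this toy data the final 1-KM value for event type 1 is larger than the corresponding CIF, which is the overestimation described above.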
[Fig. 3: probability for second malignancy or death in follicular lymphoma, partitioned into probability for second malignancy and probability for death without second malignancy; x-axis: time to second malignancy.]
The Justification of the Kalbfleisch and Prentice Formula (1)

The well-known formula for the KM estimates can be written as a sum for its complement, the estimator for the probability of all events:

\[ \hat{F}(t) = 1 - \hat{S}(t) = 1 - \prod_{t_j \le t} \frac{n_j - d_j}{n_j} = \sum_{t_j \le t} \frac{d_j}{n_j}\, \hat{S}(t_{j-1}) \tag{2} \]

With two types of events, the number of events at $t_j$ is $d_j = d_{1j} + d_{2j}$, so the sum in (2) splits into two terms:

\[ \hat{F}(t) = \sum_{t_j \le t} \frac{d_{1j}}{n_j}\, \hat{S}(t_{j-1}) + \sum_{t_j \le t} \frac{d_{2j}}{n_j}\, \hat{S}(t_{j-1}) = \hat{F}_1(t) + \hat{F}_2(t) \tag{3} \]

It is easy to recognize the formula for the estimation of the probability of the event of interest (1) in the two terms in (3). Thus the probability of all events can be partitioned into the probabilities of the constituent types of events. Figure 3 shows the partition of the probability of second malignancy or death into the probability of second malignancy and the probability of death without second malignancy in the follicular lymphoma dataset.
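A small worked example, with hypothetical numbers chosen only for illustration, shows the partition in (3) and why the 1-KM values do not behave as probabilities. Assume five subjects and no ties: a type 1 event at $t_1$ ($n_1 = 5$), a type 2 event at $t_2$ ($n_2 = 4$), a type 1 event at $t_3$ ($n_3 = 3$), and two subjects censored afterwards. Here $\mathrm{KM}_k$ denotes the KM estimate for event type $k$ with the other type censored.

```latex
\begin{align*}
\hat{S}(t_1) &= \tfrac{4}{5}, \qquad
\hat{S}(t_2) = \tfrac{4}{5}\cdot\tfrac{3}{4} = \tfrac{3}{5}, \qquad
\hat{S}(t_3) = \tfrac{3}{5}\cdot\tfrac{2}{3} = \tfrac{2}{5} \\
\hat{F}_1(t_3) &= \tfrac{1}{5}\cdot 1 + \tfrac{1}{3}\cdot\tfrac{3}{5} = \tfrac{2}{5}, \qquad
\hat{F}_2(t_3) = \tfrac{1}{4}\cdot\tfrac{4}{5} = \tfrac{1}{5} \\
\hat{F}_1(t_3) + \hat{F}_2(t_3) &= \tfrac{3}{5} = 1 - \hat{S}(t_3) \\
1 - \mathrm{KM}_1(t_3) &= 1 - \tfrac{4}{5}\cdot\tfrac{2}{3} = \tfrac{7}{15}, \qquad
1 - \mathrm{KM}_2(t_3) = 1 - \tfrac{3}{4} = \tfrac{1}{4}
\end{align*}
```

The two CIFs add up to the probability of any event, 3/5, whereas the two 1-KM values add up to 43/60, which exceeds it.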
Table 1 Table of percentages

Time point   CIF (%)   1-KM (%)   Naive estimates (%)
1            1.5       1.5        1.5
2            3.1       3.2        3.1
3            3.7       3.8        3.7
4            4.1       4.2        4.1
5            4.6       4.9        4.6
6            6.1       6.7        6.1
7            6.8       7.6        6.8
8            7.2       8.1        7.2
9            7.8       8.9        7.8
10           7.9       9.2        7.9

the software calculates the variance but may not give the confidence interval. The confidence interval can be calculated using the same technique as in a noncompeting risks situation (Kalbfleisch and Prentice 1980). If cCIF is the complement of CIF (i.e., 1-CIF), then the limits of the confidence interval for cCIF are given by:

\[ \mathrm{cCIF}^{\exp(\pm A)}, \quad \text{where } A = \frac{z_{1-\alpha/2}\sqrt{\widehat{\mathrm{Var}}(\mathrm{cCIF})}}{\mathrm{cCIF}\,\ln(\mathrm{cCIF})} \tag{4} \]
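Formula (4) can be sketched in Python as follows. This is only an illustration: the function name is an assumption, the variance is taken as an input (as produced by the software), and the variance value in the usage line is hypothetical and not a result from the chapter. The limits for the CIF are obtained as one minus the limits for cCIF.

```python
import math
from statistics import NormalDist


def cif_confidence_interval(cif, var_cif, alpha=0.05):
    """Confidence limits for the CIF based on formula (4), applied to cCIF = 1 - CIF.

    Requires 0 < cif < 1. Var(cCIF) equals Var(CIF) because cCIF = 1 - CIF.
    """
    ccif = 1.0 - cif
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = z * math.sqrt(var_cif) / (ccif * math.log(ccif))
    # The two limits for cCIF are cCIF**exp(+A) and cCIF**exp(-A)
    low_c, high_c = sorted((ccif ** math.exp(a), ccif ** math.exp(-a)))
    # Map back to the CIF scale; the order of the limits is reversed
    return 1.0 - high_c, 1.0 - low_c


# CIF of 7.9% (time point 10 in Table 1) with a hypothetical variance of 0.0002
print(cif_confidence_interval(0.079, 0.0002))
```

Working on the transformed scale keeps both limits strictly between 0 and 1, which a symmetric interval around the estimate does not guarantee.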
[Figure: curves f(x) and F(x) plotted against x.]
\[ F_1(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\left[1 - e^{-(\lambda_1 + \lambda_2)t}\right] \tag{10} \]

Note that the quantity in brackets is the distribution function of an exponential distribution with parameter $\lambda_1 + \lambda_2$. As a distribution function, this quantity spans the 0–1 interval. On the other hand, the factor by which it is multiplied is a positive quantity less than 1. Therefore, the maximum of this function is $\lambda_1 / (\lambda_1 + \lambda_2)$, a quantity less than 1. The two hazards are:

\[ \tilde{h}_1(t) = \lambda_1 \tag{11} \]

\[ \gamma_1(t) = \frac{f_1(t)}{1 - F_1(t)} = \frac{\lambda_1(\lambda_1 + \lambda_2)\, e^{-(\lambda_1 + \lambda_2)t}}{\lambda_2 + \lambda_1 e^{-(\lambda_1 + \lambda_2)t}} \tag{12} \]

Note that the subhazard is the same as the hazard of the marginal distribution. This is always true under the latent failure time assumption and if the two types of event are independent. This lends itself to a nice interpretation of the effect in the absence of the other event. However, the assumption of independence cannot be proven and can rarely be made (Tsiatis 1975). In the absence of independence, the analysis of the subhazard cannot be interpreted.

When the two events are not independent, the subhazard is no longer the hazard of the marginal model:

\[ \tilde{h}_1(t) = \lambda_1 + \mu t \tag{13} \]

where $\mu$ is the parameter which controls the level of dependence between the two types of events. In contrast, the analysis of the subdistribution hazard does not assume independence, and it can be interpreted as reflecting the observable effect.

The Fine and Gray model (1999) is an extension of the Cox model to the situation of competing risks. The effects are estimated by maximizing the pseudo-likelihood, which is a function that depends on the observed covariates and the order in which the events were observed. As in the Cox regression, the hazard is modeled as:

\[ \gamma(t \mid x) = \gamma_0(t)\, e^{\beta x} \tag{14} \]

where $x$ is the covariate, $\gamma_0$ is the baseline hazard, and $\beta$ is the coefficient estimated by maximizing the pseudo-likelihood given by:

\[ PL(\beta) = \prod_{j=1}^{r} \left( \frac{e^{\beta x_j}}{\sum_{i \in R_j} w_{ij}\, e^{\beta x_i}} \right) \tag{15} \]

where $r$ is the number of events of interest and $R_j$ is the risk set at time $t_j$. This formula is written only for one covariate but it can easily be extended to many covariates. The difference between (15) and the partial likelihood of Cox regression is the weight $w_{ij}$ and the risk set. In Cox regression the risk set is defined as the set of observations with a longer observed time than the current event. For the Fine and Gray model, the risk set additionally includes all the competing risks events at all time points, regardless of the time at which the competing risk was observed. The involvement of the competing risks event is mitigated by the weight: the longer the duration between the current event and the observed competing risks event, the smaller the weight. For example, a competing risks event which happens at 2 years participates fully in the pseudo-likelihood for the terms before 2 years and participates less and less in the pseudo-likelihood for the terms which are farther and farther from
2 years. The weights are based on the distribution of the censoring time.

[Fig. 5: two diagrams of risk sets along a time axis; diagram A without competing risks, diagram B with competing risks events and their weights.]

In the two diagrams in Fig. 5, the horizontal line represents the time axis, the black circles represent the individuals for which the event of interest is observed, the vertical lines are for the censored observations, and the purple crosses in diagram B represent the competing risks events. In diagram A there are no competing risks, and all the individuals with an observed time larger than that of the individual for which the partial likelihood is written are in the risk set. For diagram B the competing risks are always in the risk set, every time with a different weight. Thus, the weight for the individual marked with j = 2 is one for the term j = 1, $w_{32}$ for j = 3, $w_{42}$ for j = 4, and $w_{62}$ for j = 6, where $1 \ge w_{32} \ge w_{42} \ge w_{62} \dots$

Interpretation of the Fine and Gray Model

The Fine and Gray regression (1999) models the subdistribution hazard (9). The exponent of a coefficient can be interpreted as the subdistribution hazards ratio. As in the Cox regression, the assumption of proportionality of hazards is made and can be checked by visually inspecting the Schoenfeld-type residuals.

Consider the example in section "The Follicular Lymphoma Example" with three types of events: second malignancy, disease failure (relapse), and death without second malignancy or disease failure. Any of these events could be considered as the event of interest with the rest of them as competing risks. The types of events and their frequency are listed in Table 2.

Table 2 Types of events

Type of event                                   Frequency   Role
Second malignancy                               56          Event of interest
Relapse before second malignancy                260         Competing risks event
Death without relapse or second malignancy      54          Competing risks event
Censored                                        171
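The weights illustrated in Fig. 5 can be sketched in Python. The sketch assumes the usual form of the Fine and Gray weight, in which an individual with an earlier competing risks event at time $t_i$ enters the term for event time $t_j$ with weight $\hat{G}(t_j)/\hat{G}(t_i)$, where $\hat{G}$ is the KM estimate of the censoring distribution. The function names, the status coding (0 censored, 1 event of interest, 2 competing risks event), the handling of ties, and the toy data are assumptions made for illustration.

```python
def censoring_survival(times, status):
    """KM estimate of the censoring distribution: censorings (status 0) are the 'events'."""
    obs = sorted(zip(times, status))
    n = len(obs)
    steps = []  # (time, value of G from that time onward)
    g = 1.0
    i = 0
    while i < n:
        t = obs[i][0]
        j = i
        n_cens = 0
        while j < n and obs[j][0] == t:
            if obs[j][1] == 0:
                n_cens += 1
            j += 1
        if n_cens:
            g *= 1 - n_cens / (n - i)  # n - i individuals are still at risk at t
            steps.append((t, g))
        i = j

    def G(t):
        value = 1.0
        for s, g_s in steps:
            if s <= t:
                value = g_s
        return value

    return G


def fine_gray_weight(t_event, t_i, status_i, G):
    """Weight of individual i in the risk set written at event-of-interest time t_event."""
    if t_i >= t_event:
        return 1.0  # still under observation: full weight
    if status_i == 2:
        return G(t_event) / G(t_i)  # earlier competing event: stays in, down-weighted
    return 0.0  # earlier censoring or earlier event of interest: not in the risk set


# Hypothetical toy data
times = [1, 2, 3, 4, 5, 6]
status = [1, 2, 0, 1, 0, 1]
G = censoring_survival(times, status)
event_times = [t for t, s in zip(times, status) if s == 1]
# Weights of the individual with a competing risks event at time 2
print([fine_gray_weight(t, 2, 2, G) for t in event_times])
```

The printed weights start at one for the term written before the competing risks event and do not increase afterwards, which matches the ordering of the weights described for diagram B.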
[Fig. 6: probability of event plotted against time to event.]
From Table 2 it is apparent that the most frequent type of event is disease failure. Figure 6 shows that the disease failures occur shortly after the initial diagnosis of follicular lymphoma, while the second malignancies and the deaths without disease failure happen at a more steady rate.

The Fine and Gray model was applied to second malignancy and to disease failure. Tables 3 and 4 show the results of these models.

Table 3 The results of the model with second malignancy as event of interest

Covariate             HR     95% conf. int.   p-value
Age                   1.03   1.01–1.05        0.0074
Sex: men vs. women    0.98   0.58–1.67        0.94
Stage: 2 vs. 1        0.78   0.41–1.48        0.44
Residual bulk         0.79   0.45–1.41        0.43
Chemotherapy          1.54   0.78–3.02        0.22

Table 4 The results of the model for disease failure

Covariate             HR     95% conf. int.   p-value
Age                   1.02   1.01–1.02        0.0019
Sex: men vs. women    1.04   0.81–1.33        0.76
Stage: 2 vs. 1        1.57   1.19–2.08        0.0016
Residual bulk         1.49   1.14–1.95        0.004
Chemotherapy          0.72   0.51–1.01        0.055

Thus, age is the only significant covariate for both types of events. As expected, the disease-specific factors like stage and residual bulk are significant for the disease failure. Furthermore, chemotherapy is marginally not significant. Those with residual bulk or of stage 2 are about 1.5-fold more likely to have disease failures than the ones without residual bulk or of stage 1, respectively. Those receiving chemotherapy are less likely to have a disease failure.

Table 5 shows the results when all end points are combined. The results in Table 5 are close to those seen in Table 4 although somewhat weaker for stage, bulk, and chemotherapy. The reason for the resemblance between the last two tables is the fact that there are many more relapses than second malignancies: 260 vs. 56. Thus the results in Table 5 are driven by the number of relapses. Some of the effects are weaker because those covariates have an opposite effect for the second malignancy than for disease failure.

Table 5 The results of the model with all end points combined

Covariate             HR     95% conf. int.   p-value
Age                   1.04   1.03–1.04        <0.0001
Sex: men vs. women    1.16   0.94–1.42        0.16
Stage: 2 vs. 1        1.41   1.11–1.79        0.0044
Residual bulk         1.45   1.15–1.82        0.0015
Chemotherapy          0.83   0.63–1.11        0.22

Cox Regression in the Presence of Competing Risks

If the competing risks event is censored, then, from the technical point of view, the analysis could be carried out using the usual Cox model
or Kaplan-Meier estimates, but the interpretation, when possible, is different. In the previous sections, the bias involved in estimating the probability of an event when competing risks are ignored was described. The main question is whether there is a bias when the competing risks are ignored in the modeling process and, if so, whether it is possible to predict how large this bias is and in which direction it goes. Another issue is whether the results of a model in which the competing risks are ignored can be interpreted at all.

In many instances the results of the Cox PH model and the Fine and Gray model will be very similar, giving the wrong impression that this is a general pattern. However, the two models do not always give similar results. Moreover, the direction of bias cannot be predicted. Finally, the results from the Cox model can be interpreted only under the strict assumption that the distribution of the event of interest and the distribution of the competing risks event are independent. This assumption can rarely be made and never substantiated (Tsiatis 1975). The PUH data (given in section "The Pressure Ulcer Healing (PUH) Example") offer an example in which the two models give different results.

Based on the Fine and Gray model (Table 6), the analysis suggests that the performance status is an important prognostic factor with regard to pressure ulcer healing. The patients who are bedridden have a longer time to healing than the ambulatory patients. The competing risk of death is ignored in the Cox model, and the effect is much attenuated, the p-value becomes nonsignificant, and one may reach the wrong conclusion. The probabilities of death and pressure ulcer healing are not independent: knowing that death occurred changes the probability that the pressure ulcer would have healed if the patient could be observed indefinitely. One possible mechanism is that the physiological systems needed for wound healing are part of the system failures associated with death. Only in the rare situation when the event of interest can be assumed independent from the competing events can the Cox model results be interpreted as the effect of a covariate when the competing risks do not exist.

Table 6 The prognostic value of palliative performance status for wound healing

Model            HR    95% conf. int.   p-value
Fine and Gray    3.3   1.7–6.7          0.00078
Cox model        1.7   0.8–3.6          0.13

Other Developments

Analyzing Correlated Data

A notable development is the extension of the Fine and Gray model to accommodate correlated data (strata and/or cluster). For example, in the PUH example, one may wish to analyze all pressure ulcers of a patient rather than just one. This creates clustered data. Zhou et al. (2011, 2012) extended the Fine and Gray model by applying Lee et al.'s (1992) approach.

Analyzing Case-Cohort Design

When the event of interest is rare, the collection of data for the whole cohort is not feasible. The case-cohort design allows one to take advantage of the number of events of interest while including only a fraction of the data without the event of interest. Pintilie et al. (2010) developed a pseudo-likelihood to analyze a case-cohort design in the presence of competing risks based on Barlow's work (1999).

Sample Size and Power

For the time-to-event analysis, the calculation of the sample size necessary to achieve a certain power involves two steps: (a) the calculation of the necessary number of events and (b) the calculation of the necessary number of patients to observe that number of events. The number of events $n_{ev}$ necessary to detect a specific hazard ratio (HR) is given by:

\[ \sqrt{n_{ev}} = \frac{z_{1-\alpha/2} + z_{1-\beta}}{sd(x)\,\ln(HR)} \tag{16} \]
where $z_{1-\alpha/2}$ and $z_{1-\beta}$ are the $1-\alpha/2$ and $1-\beta$ quantiles of the standard normal distribution. Thus, for $\alpha = 0.05$, $z_{1-\alpha/2} = 1.96$ and for $\beta = 0.2$, $z_{1-\beta} = 0.84$. $sd(x)$ stands for the standard deviation of the covariate to be tested. If a randomized trial with equal allocation in two arms is planned, then $sd(x) = 1/2$. The total number of patients needed to produce $n_{ev}$ events is $n = n_{ev}/P_{ev}$, where $P_{ev}$ is the probability that the event of interest occurs during the study period. When there are no competing risks, $P_{ev}$ can be expressed as:

\[ P_{ev} = 1 - \frac{e^{-\lambda f} - e^{-\lambda(a+f)}}{\lambda a} \tag{17} \]

where $\lambda$ is the hazard rate of the whole cohort, $a$ is the accrual time, and $f$ is the follow-up time added to the accrual time.

When competing risks are present the formula changes to:

\[ P_{ev} = \frac{\lambda_{ev}}{\lambda_{ev} + \lambda_{cr}} \left[ 1 - \frac{e^{-(\lambda_{ev}+\lambda_{cr})f} - e^{-(\lambda_{ev}+\lambda_{cr})(a+f)}}{(\lambda_{ev}+\lambda_{cr})\,a} \right] \tag{18} \]

where $\lambda_{ev}$ and $\lambda_{cr}$ are the marginal hazards for the event of interest and the competing risks event, respectively. It is obvious that if $\lambda_{cr} = 0$, i.e., when competing risks do not exist, formula (18) becomes (17). A close look at formula (18) shows that as $\lambda_{cr}$ increases, $P_{ev}$ decreases dramatically. Equivalently, as $\lambda_{cr}$ increases, the total number of patients necessary to observe a certain number of events of interest increases greatly.

Intuitively, this is obvious since the competing risks hinder the observation of the event of interest. One example is shown in Fig. 7, where an increase of the competing risks from 0 to 0.4 causes a doubling of the final sample size. The higher the rate of competing risks, the less likely it is to observe the event of interest, and therefore a larger initial sample size is needed. Therefore, ignoring the competing risks in the design stage will create an underpowered study and will result in a waste of effort and money.

[Fig. 7: sample size needed to observe 50 events.]

Although the independence between the event of interest and the competing risks event cannot usually be assumed in the analysis phase, this assumption needs to be made in this section for mathematical tractability. The second assumption made is that the times to the two types of events follow exponential distributions.

Example 1 Suppose that the researcher wants to validate the prognostic value of a specific marker in a cohort of patients. The marker is measured as present or absent, and the frequency of a positive marker is about half in this population. The cohort is already assembled, and it is known that there are 50 events of interest. The researcher wants to know if there is enough power to detect an effect size corresponding to a subdistribution hazard
ratio of 2 at the level of significance of 0.05. Solving formula (16) for $z_{1-\beta}$, the power is found to be 69%.

Example 2 A randomized study is being planned to test a new way of delivering radiation for cancer patients. Since radiation is a local treatment, the investigators are interested in testing its effect on local disease. Patients may experience a relapse outside the treated area or death of other causes, both representing competing risks events. It is known from previous studies that the rate of local disease in the standard arm is $\lambda_{ev} = 0.4$ and the rate of other relapses and death of other causes is $\lambda_{cr} = 0.1$. It is expected that the new treatment will not change the rate of competing risks but will decrease the rate of local disease to 0.2. The cancer center can accrue 50 patients per year, and it is desirable that the study accrue the patients in 5 years or less. The analysis will take place 1 year after finishing accrual. The $\alpha$ level is set to 0.05, and the desired power is 80%. Thus, $z_{1-\alpha/2} = 1.96$ and $z_{1-\beta} = 0.84$.

Note that the given rates for the local relapse refer to the marginal distributions; basically these rates are the hazards of the marginal exponential distributions. The ratio of the two rates for the local relapse (0.4 and 0.2) is not the subdistribution hazard ratio which will be detected. Unfortunately, even in the simple situation when all distributions are exponential and independent, the subdistribution hazards ratio is not independent of time. Its formula can be written as:

\[ sHR = \frac{\lambda_1(\lambda_1+\lambda_{cr})\left(\lambda_{cr} + \lambda_2 e^{-(\lambda_2+\lambda_{cr})t}\right) e^{-(\lambda_1+\lambda_{cr})t}}{\lambda_2(\lambda_2+\lambda_{cr})\left(\lambda_{cr} + \lambda_1 e^{-(\lambda_1+\lambda_{cr})t}\right) e^{-(\lambda_2+\lambda_{cr})t}} \tag{19} \]

where $\lambda_1$ and $\lambda_2$ are the rates of the event of interest in the two arms.

formula (16) puts the approximate number of events at 122. Formula (18) can be applied to each of the two arms, and the probability of event for the standard arm is 0.62 and for the new treatment is 0.41. On average it can be said that the probability of event in the study is approximately 0.5. Since the necessary number of events is 122, the total number of patients needs to be 244. This center can accrue 50 patients per year, and thus 244 is a reachable goal. Note that the accrual effort cannot be relaxed, as the maximum number the center can accrue is very close to the total number of patients needed.

Software

The competing risk analysis can be performed almost entirely within the R environment using the package cmprsk developed by Gray. This package contains functions that estimate the probability of the event of interest at any time point, plot these estimates, apply the Fine and Gray model, and plot the predicted probabilities of the event of interest based on this model. The package crrSC developed by Zhou extends the Fine and Gray model for stratified or clustered data.

The package mstate can be used to modify the data such that the usual Cox model can be applied. This analysis still models the subdistribution hazard, and the obtained coefficients are very close to the results obtained using the function crr from cmprsk. The variance-covariance matrix is slightly different, but for large datasets the differences are minimal.

STATA has a function which allows the user to apply the Fine and Gray model. The plots obtained are the predictive plots from the model.
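Independently of the packages listed above, the calculations in Examples 1 and 2 can be checked with a few lines of Python. This is a minimal sketch of formulas (16) and (18) under the same assumptions as in the text (independent exponential times); the function names are chosen for this illustration, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def power_from_events(n_events, hazard_ratio, sd_x=0.5, alpha=0.05):
    """Solve formula (16) for z_{1-beta} and convert it to power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = math.sqrt(n_events) * sd_x * abs(math.log(hazard_ratio)) - z_alpha
    return NormalDist().cdf(z_beta)


def prob_event(lam_ev, lam_cr, accrual, follow_up):
    """Formula (18); with lam_cr = 0 it reduces to formula (17)."""
    lam = lam_ev + lam_cr
    tail = (math.exp(-lam * follow_up)
            - math.exp(-lam * (accrual + follow_up))) / (lam * accrual)
    return lam_ev / lam * (1 - tail)


# Example 1: 50 events, subdistribution hazard ratio of 2, marker present in about half
print(round(power_from_events(50, 2.0), 2))

# Example 2: 5 years of accrual and 1 year of additional follow-up
p_standard = prob_event(0.4, 0.1, accrual=5, follow_up=1)
p_new = prob_event(0.2, 0.1, accrual=5, follow_up=1)
print(round(p_standard, 2), round(p_new, 2))

# Patients needed for the 122 events quoted in the text, using the rounded
# average probability of event of 0.5
print(math.ceil(122 / 0.5))
```

The number of events (122) is taken from the text rather than recomputed, because it depends on the time-varying subdistribution hazard ratio in (19).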
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med. 1999;18:695–706.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John Wiley & Sons, Inc.; 1980.
Lee EW, Wei LJ, Amato D, Leurgans S. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In: Klein JP, Goel PK, editors. Survival analysis: state of the art. Dordrecht: Kluwer; 1992.
Maida V, Corbo M, Dolzhykov M, Ennis M, Irani S, Trozzolo L. Wounds in advanced illness: a prevalence and incidence study based on a prospective case series. Int Wound J. 2008;5(2):305–14.
Maida V, Ennis M, Corban J. Wound outcomes in patients with advanced illness. Int Wound J. 2012;9(6):683–92.
Pintilie M. Competing risks: a practical perspective. Chichester: Wiley & Sons Ltd; 2006.
Pintilie M, Bai Y, Yun LS, Hodgson DC. The analysis of case cohort design in the presence of competing risks with application to estimate the risk of delayed cardiac toxicity among Hodgkin Lymphoma survivors. Stat Med. 2010;29(27):2802–10.
Tsiatis A. Nonidentifiability aspect of problem of competing risks. Proc Natl Acad Sci U S A. 1975;72:20–2.
Zhou BQ, Latouche A, Rocha V, Fine J. Competing risks regression for stratified data. Biostatistics. 2011;67(2):661–70.
Zhou BQ, Fine J, Latouche A, Labopin M. Competing risks regression for clustered data. Biostatistics. 2012;13(3):371–83.
20 Modeling and Analysis of Cost Data

Shizhe Chen and XH Andrew Zhou

Contents
Introduction
Methods for Mean Inference
Parametric Methods on Continuous Data
Nonparametric Methods on Continuous Data
Zero-Inflated Data
Two Sample
Applications on a Simple Example
Regression
Parameters of Interest
Linear Regression on Raw Data
Transformation on Y
Transformation on E[Y]
Two-Part Models
Quantile Regression
Prediction
Some Basic Concepts of Prediction Models
Difference from Regression Analysis
Appendix
Concept of General Pivots
Variances and Estimators for Back-Transformations
References
S. Chen
Department of Biostatistics, University of Washington, Seattle, WA, USA

X. A. Zhou (*)
Beijing International Center for Mathematical Research, Peking University, Beijing, China
VA Puget Sound Healthcare System, University of Washington, Seattle, WA, USA
e-mail: azhou@u.washington.edu

Abstract
Cost has become an important outcome in health services research. It can be used not only as a measure for health care spending but also as a measure for a part of health care value. Given ever-rising health care expenditure, the value of health care should include not only traditional measures, such as mortality and morbidity, but also the cost of health care. Due to limited resources, a new treatment with a slightly
better efficacy but much higher cost than an existing treatment may not be the treatment of choice for a patient. Hence, it is important to be able to appropriately analyze cost data. However, appropriately analyzing health care costs may be hindered by special distributional features of cost data, including skewness, zero values, clusters, heteroscedasticity, and multimodality. Over the decades, various methods have been proposed to address these features. This chapter is devoted to introducing methods that are able to provide relatively trustworthy results with acceptable efficiency, covering topics on mean inference, regression, and prediction.

Introduction

Rapidly rising health care costs, and health care reforms aimed at containing them, make cost an important outcome in any health services research. It is not straightforward to analyze cost data due to some of its special distributional features, which prevent us from using traditional statistical methods.

The first feature of cost data is its positive skewness, that is, the distribution is skewed to the right. The skewness arises due to a few patients with high costs, who account for the major part of the total expenses. In addition, cost data often come with a heavy upper tail, which occurs when the tail of the distribution cannot be bounded by an exponential distribution.

The second feature is discontinuity of the distribution at the zero value, which occurs because not all subjects in the population of interest incur health care costs in a given study period. For example, patients without any hospitalization during a study period have zero in-patient costs. One consequence of the distributional discontinuity at zero is that many standard statistical methods, which require a continuous distribution assumption, cannot be used in the inference of cost data.

The third feature is heteroscedasticity, which occurs when the variance of the cost of a patient is not constant. For example, if the variance of a random variable is a function of the mean, data generated from the distribution of this random variable will exhibit heteroscedasticity. This kind of mean-variance relation can also be observed in many known parametric distributions, such as a Poisson distribution and a lognormal distribution. Many traditional statistical methods, such as ordinary least squares (OLS), require homoscedasticity for the validity of their statistical inference. Ignoring heteroscedasticity in cost data can lead to wrong statistical inferences.

The fourth feature is censoring of the cost outcome, which occurs when the cost of a patient over a study period is only partially observed. For example, a patient drops out of the study before the study ends; as a result, we only observe the partial cost of this patient over the whole study period. Although the problem of censored cost data is related to survival analysis, the analytic techniques are different from traditional survival analysis ones.

The fifth feature is clustering, which occurs due to the effects of clinicians and hospitals. Since a given clinician tends to give patients similar prescriptions and to use similar kinds of drugs and treatments, the medical costs of this clinician's patients are expected to be correlated. The same reasoning applies to clinics and hospitals. Ignoring clustering would lead to invalid statistical inference.

The sixth feature is multimodality, which occurs when the distribution has more than one mode. This feature may be related to clinician clustering. For example, suppose the cost data are generated from patients who are cared for by two physicians with different treatment strategies: one physician uses a more liberal approach to ordering tests and prescribing drugs, and the other is more conservative in treating his/her patients. The distribution of the cost data is then a mixture of the distributions of the two physicians, which may lead to a bimodal distribution.

In this chapter, we concentrate on a review of statistical methods that can handle the first three distributional features of cost data: (1) skewness, (2) zero values, and (3) heteroscedasticity. We review various methods that have been proposed to address these features. As there is no single method that can handle all features that one might encounter in a health cost study, in this chapter, we also provide a rough evaluation of those methods to help researchers in choosing methods
that are most suitable. This chapter is organized as follows. Section "Methods for Mean Inference" focuses on mean inference, which is the very foundation of health cost analysis; section "Regression" is about regression models, which are a more complicated version of mean inference in which covariates are taken into consideration; section "Prediction" is a brief introduction to prediction models and some important concepts about prediction models.

Methods for Mean Inference

Methods and theorems are developed to summarize the distribution of health cost data which, as described in the previous section, does not have the "nice" properties that we usually assume to be true. The choice of the quantity that summarizes the distribution (in other words, the summary measure) should be considered on the basis of statistical convenience as well as scientific importance. For example, the sample median is known to be a better summary measure for the central location of a skewed distribution than the sample mean, but investigators care about the total cost instead of the median cost most of the time. As will be shown later, a number of methods have been proposed to find consistent and efficient estimators for the population mean.

Generally speaking, methods with more assumptions perform better than others when the assumptions hold or are not violated too much. One study has shown that using models with inappropriate assumptions on certain data would result in disastrous estimators (Briggs et al. 2005). Some methods depend on few or no assumptions and can be called robust models, but these methods are often low in efficiency. As the famous quote says, "All models are wrong, but some are useful" (Box 1976). The choice of models is especially important in health cost data, where the samples behave poorly, though no clear boundary can be drawn in making this decision. It is recommended to check the assumptions when applying certain methods.

Depending on the target population, medical costs have two possible distributions. It might be a continuous distribution with positive values when the population is defined as subjects who received treatments and paid for them. Such a population is interesting for the study of the revenue of a department. It might also be a distribution with a point mass at zero, when the population is defined as a certain group of people like citizens in a city, people in an insurance plan, etc. This kind of distribution is named zero-inflated distribution or delta distribution by Aitchison (1955). The first situation can be seen as a special case of the second one where the point mass at zero is 0. Hence, methods for continuous distributions can be used in the zero-inflated distribution with some modifications. This section will begin with discussions on continuous distributions and then proceed to the case with positive point mass at zero.

Parametric Methods on Continuous Data

As a classic way of doing statistical analysis, the distribution of data is sometimes assumed to be known and to have a finite number of parameters that characterize the distribution. This kind of assumption is called a parametric assumption. For instance, normality is a well-known example of a parametric assumption, in which the distribution is characterized by two parameters, the expectation and the variance. Unfortunately, this normality assumption does not apply to medical cost data, which are often highly right skewed. A common practice is to transform the data into a more well-behaved form. It is then possible to impose the normality assumption or some other parametric model on the transformed data.

Box (1976) proposed a family of transformations that can be modified to fit in various situations:

\[ \frac{y^{\lambda} - 1}{\lambda} = x\beta + e, \ \text{if } \lambda \neq 0; \qquad \log(y) = x\beta + e, \ \text{if } \lambda = 0, \tag{1} \]

where y is the original dependent variable, x is a row vector of covariates, e is an additive error term that is independent of the covariates x and β,
If, after transformation, the residuals follow a normal distribution, then the expected value of y follows from a straightforward calculation:

$$E(y \mid x) = \exp\left(x\beta + 0.5\,\sigma^{2}(x)\right). \tag{3}$$

Notice that (3) shows that the untransformed mean is a function of both the transformed mean and the transformed variance.

when $0 < \sigma^{2} < n/2$. The MSE in (5) can be estimated by plugging in the estimators of $\sigma^{2}$ and $\mu$, which are $S^{2}$ and $\bar W$, respectively.

2. Uniformly Minimum-Variance Unbiased Estimator (UMVUE)

$$\hat\theta_u = \exp(\bar W)\, g_n\left(S^{2}/2\right), \tag{6}$$

where
$$g_n(t) = \sum_{r=0}^{\infty} \frac{1}{r!}\,\frac{n-1+2r}{n-1}\left(\prod_{i=1}^{r}\frac{n-1}{n-1+2i}\right)\left(\frac{(n-1)\,t}{n}\right)^{r}. \tag{7}$$

As its name indicates, $\hat\theta_u$ is an unbiased estimator of θ. The mean squared error of $\hat\theta_u$ is

$$E\left(\hat\theta_u-\theta\right)^{2} = \theta^{2}\left[\exp\left(\frac{\sigma^{2}}{n}\right) g_n\left(\frac{\sigma^{4}}{2n}\right) - 1\right]. \tag{8}$$

3. Conditionally Minimal Mean Squared Error (MSE) Estimator

$$\hat\theta_c = \exp(\bar W)\, g_n\left(\frac{n-4}{2n-2}\,S^{2}\right), \tag{9}$$

where $g_n$ is as defined in (7). This estimator is biased; the bias is

$$E\left(\hat\theta_c\right) - \theta = \theta\left[\exp\left(-\frac{3\sigma^{2}}{2n}\right) - 1\right].$$

The MSE of $\hat\theta_c$ is

$$E\left(\hat\theta_c-\theta\right)^{2} = \theta^{2}\left[\exp\left(-\frac{2\sigma^{2}}{n}\right) g_n\left(\frac{(n-4)^{2}\sigma^{4}}{2n(n-1)^{2}}\right) - 2\exp\left(-\frac{3\sigma^{2}}{2n}\right) + 1\right]. \tag{10}$$

Simulation results by Zhou (1998) show that the conditionally minimal MSE estimator $\hat\theta_c$ is uniformly superior to the alternatives. However, the MSEs of these estimators are almost the same when the sample size is sufficiently large (n ≥ 200). In this case, the MLE $\hat\theta_m$ is recommended because it is easy to compute. With a small sample size, the conditionally minimal MSE estimator $\hat\theta_c$ is preferable to the others.

Confidence Intervals

The construction of confidence intervals is more straightforward than that of the estimators, because quantiles are invariant under monotone transformation. The confidence intervals of ln(θ) can be turned into CIs of θ by simply exponentiating the lower and upper bounds.

Recall that $\{W_i = \log(Y_i)\}$ are normally distributed, so $\bar W + S^{2}/2$ is the UMVU estimator of ln(θ). The target now is to build a confidence interval around $\bar W + S^{2}/2$. Zhou and Gao (1997) summarized several practical procedures for moderate or large sample sizes. Krishnamoorthy and Mathew (2003) applied the generalized pivotal quantity to this problem and obtained asymptotically efficient confidence intervals.

In general, confidence intervals and hypothesis tests are not interchangeable, since there can be slight differences between them. In this simple case of one-sample mean inference, however, a hypothesis test at significance level α is equivalent to checking whether the mean under the null hypothesis lies inside the 100(1 − α)% confidence interval: the null hypothesis is rejected when the null mean lies outside the interval. Thus, a more accurate confidence interval also gives a more reliable hypothesis test.

1. Cox’s method: The estimator of the variance of $\bar W + S^{2}/2$ is $S^{2}/n + S^{4}/(2(n-1))$. Cox, in a personal communication to Land (1972), proposed to construct the confidence interval for ln(θ) as

$$\bar W + \frac{S^{2}}{2} \pm Z_{1-\alpha/2}\sqrt{\frac{S^{2}}{n} + \frac{S^{4}}{2(n-1)}}, \tag{11}$$

where $Z_{1-\alpha/2}$ is the 100(1 − α/2)% quantile of the standard normal distribution, i.e., the normal distribution with mean zero and standard deviation 1. The corresponding confidence interval for θ is

$$\left(\exp\left\{\bar W + \frac{S^{2}}{2} - Z_{1-\alpha/2}\sqrt{\frac{S^{2}}{n} + \frac{S^{4}}{2(n-1)}}\right\},\ \exp\left\{\bar W + \frac{S^{2}}{2} + Z_{1-\alpha/2}\sqrt{\frac{S^{2}}{n} + \frac{S^{4}}{2(n-1)}}\right\}\right).$$

2. Angus’s conservative method: Although the exact pivotal quantity is not available in this
problem, an approximate pivotal statistic is available as

$$V(\theta) = \frac{\sqrt{n}\left(\bar W + S^{2}/2 - \ln(\theta)\right)}{\sqrt{S^{2}\left(1 + S^{2}/2\right)}}, \tag{12}$$

which, in a finite sample, has the same distribution as

$$T(\nu) = \frac{N + \dfrac{\sigma\sqrt{n}}{2}\left(\dfrac{\chi^{2}_{n-1}}{n-1} - 1\right)}{\sqrt{\dfrac{\chi^{2}_{n-1}}{n-1}\left(1 + \dfrac{\sigma^{2}}{2}\,\dfrac{\chi^{2}_{n-1}}{n-1}\right)}}, \tag{13}$$

where N and $\chi^{2}_{n-1}$ are independent random variables from a standard normal distribution and a χ² distribution with n − 1 d.f., respectively. The conservative CIs are

$$L_{1-\alpha} = \bar W + \frac{S^{2}}{2} - \frac{t_{1-\alpha/2}(n-1)}{\sqrt{n}}\sqrt{S^{2}\left(1 + \frac{S^{2}}{2}\right)}, \tag{14}$$

$$U_{1-\alpha} = \bar W + \frac{S^{2}}{2} + \frac{q_{\alpha/2}(n-1)}{\sqrt{n}}\sqrt{S^{2}\left(1 + \frac{S^{2}}{2}\right)}, \tag{15}$$

calculate $T_i$ as in (13), and denote $t_l$ as the 1 − α/2 empirical quantile and $t_u$ as the α/2 empirical quantile. The estimated bounds are

$$L_{1-\alpha} = \bar W + \frac{S^{2}}{2} - \frac{t_l}{\sqrt{n}}\sqrt{S^{2}\left(1 + \frac{S^{2}}{2}\right)}, \tag{16}$$

$$U_{1-\alpha} = \bar W + \frac{S^{2}}{2} - \frac{t_u}{\sqrt{n}}\sqrt{S^{2}\left(1 + \frac{S^{2}}{2}\right)}. \tag{17}$$

So the 100(1 − α)% confidence interval for θ is $(\exp(L_{1-\alpha}), \exp(U_{1-\alpha}))$.

4. A signed likelihood ratio approach: Wu et al. (2003) used the log-likelihood ratio to construct confidence intervals. The signed log-likelihood ratio r is defined as

$$r(m) = \operatorname{sgn}(\hat m - m)\left\{2\left[l\left(\hat m, \hat\sigma^{2}\right) - l\left(m, \hat\sigma^{2}_{m}\right)\right]\right\}^{1/2}. \tag{18}$$

The log-likelihood as a function of m = log(θ) and σ² is

$$l\left(m, \sigma^{2}\right) = -\frac{n}{2}\log\sigma^{2} + \left(m - \frac{\sigma^{2}}{2}\right)\frac{\sum_{i=1}^{n} W_i}{\sigma^{2}} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n} W_i^{2} - \frac{n}{2\sigma^{2}}\left(m - \frac{\sigma^{2}}{2}\right)^{2}.$$
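As a numerical illustration of the signed likelihood ratio in (18), with Cox's interval (11) alongside for comparison, the following is a minimal sketch rather than the authors' implementation. It assumes the log-likelihood is the normal likelihood of the log-costs with mean m − σ²/2; the closed form used for the constrained variance estimate follows from setting the score in σ² to zero for fixed m and is derived here, not quoted from the source. The data, function names, and the bisection solver (used in place of Newton-Raphson) are all illustrative assumptions.

```python
import math
from statistics import NormalDist, fmean

# Made-up costs, for illustration only.
costs = [120.0, 340.5, 89.9, 1500.0, 410.0, 230.0,
         75.0, 980.0, 640.0, 150.0, 2750.0, 310.0]

def log_summaries(y):
    """Return n, mean of logs, MLE variance (divisor n) and S^2 (divisor n-1)."""
    w = [math.log(v) for v in y]
    n, wbar = len(w), fmean(w)
    ss = sum((v - wbar) ** 2 for v in w)
    return n, wbar, ss / n, ss / (n - 1)

def cox_interval(y, alpha=0.05):
    """Cox's interval (11), exponentiated to the cost scale."""
    n, wbar, _, s2 = log_summaries(y)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    return math.exp(wbar + s2 / 2 - half), math.exp(wbar + s2 / 2 + half)

def profile_sigma2(m, wbar, sig2_hat):
    """Constrained MLE of sigma^2 for fixed m (assumed closed form, see lead-in)."""
    c = wbar - m
    return 2.0 * (math.sqrt(1.0 + sig2_hat + c * c) - 1.0)

def signed_root(m, n, wbar, sig2_hat):
    """Signed log-likelihood ratio r(m) of (18)."""
    def loglik(mu, sig2):
        q = n * (sig2_hat + (wbar - mu) ** 2)      # sum of (w_i - mu)^2
        return -0.5 * n * math.log(sig2) - q / (2.0 * sig2)
    sig2_m = profile_sigma2(m, wbar, sig2_hat)
    lr = 2.0 * (loglik(wbar, sig2_hat) - loglik(m - sig2_m / 2.0, sig2_m))
    m_hat = wbar + sig2_hat / 2.0
    return math.copysign(math.sqrt(max(lr, 0.0)), m_hat - m)

def find_root(f, lo, hi, tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def slr_interval(y, alpha=0.05):
    """Interval for theta from the two roots of r(m) = +z and r(m) = -z."""
    n, wbar, sig2_hat, _ = log_summaries(y)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    m_hat = wbar + sig2_hat / 2.0
    r = lambda m: signed_root(m, n, wbar, sig2_hat)
    step = 1.0
    while r(m_hat - step) < z:       # widen until the lower root is bracketed
        step *= 2.0
    lower = find_root(lambda m: r(m) - z, m_hat - step, m_hat)
    step = 1.0
    while r(m_hat + step) > -z:      # widen until the upper root is bracketed
        step *= 2.0
    upper = find_root(lambda m: r(m) + z, m_hat, m_hat + step)
    return math.exp(lower), math.exp(upper)

print(cox_interval(costs), slr_interval(costs))
```

Because r(m) is decreasing in m under the sign convention of (18), the sketch simply solves for both roots and orders them, which avoids depending on how the quantile subscripts are labeled.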
$$r(m) = \operatorname{sgn}(\hat m - m)\left\{ n\log\frac{\hat\sigma^{2}_{m}}{\hat\sigma^{2}} + n\left(\bar W - m + \frac{\hat\sigma^{2}_{m}}{2}\right)\right\}^{1/2}. \tag{19}$$

The 100(1 − α)% confidence interval is $\left(\exp\left(\hat m_{\alpha/2}\right), \exp\left(\hat m_{1-\alpha/2}\right)\right)$, where $\hat m_{\alpha}$ is the solution of $r(m) = z_{\alpha}$, the αth quantile of the standard normal distribution. The equation has no explicit solution, but it can be solved by numerical methods such as the Newton-Raphson method. An example of constructing the lower bound is given by (Wu et al. 2003):

the same algorithm described in the last section, simply replacing r with r*.

6. A generalized pivot approach: Krishnamoorthy and Mathew (2003) applied the concept of the generalized pivotal quantity to lognormal means. The generalized pivotal quantity can be viewed as a new concept of hypothesis test, and it yields asymptotically the same coverage rate as standard frequentist hypothesis testing. For more details about the generalized pivotal quantity, or fiducial quantity, please see the appendix of this chapter. The generalized pivot for ln(θ) is given by

$$T = \bar W - \frac{Z}{U/\sqrt{n-1}}\,\frac{S}{\sqrt{n}} + \frac{1}{2}\,\frac{S^{2}}{U^{2}/(n-1)},$$

where $Z \sim N(0,1)$ and $U^{2} \sim \chi^{2}_{n-1}$ are independent.
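A generalized pivot interval is usually computed by Monte Carlo. The sketch below is illustrative only and assumes the pivot for ln(θ) has the Krishnamoorthy and Mathew form $\bar W - \frac{Z}{U/\sqrt{n-1}}\frac{S}{\sqrt n} + \frac12\frac{S^2}{U^2/(n-1)}$ with Z standard normal and U² chi-square with n − 1 d.f., which matches the quantity $T_\theta$ used later in this chapter for zero-inflated data. The data, the function name, the number of draws, and the seed are assumptions.

```python
import math
import random
from statistics import fmean

# Made-up costs, for illustration only.
costs = [120.0, 340.5, 89.9, 1500.0, 410.0, 230.0,
         75.0, 980.0, 640.0, 150.0, 2750.0, 310.0]

def generalized_pivot_interval(y, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo percentile interval for theta = E[Y] under lognormality."""
    rng = random.Random(seed)
    w = [math.log(v) for v in y]
    n = len(w)
    wbar = fmean(w)
    s2 = sum((v - wbar) ** 2 for v in w) / (n - 1)
    pivots = []
    for _ in range(draws):
        z = rng.gauss(0.0, 1.0)
        # Chi-square with n-1 d.f. drawn as Gamma(shape=(n-1)/2, scale=2).
        u2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2_star = s2 * (n - 1) / u2          # pivot for sigma^2
        # Z/(U/sqrt(n-1)) * S/sqrt(n) equals Z * sqrt(sigma2_star / n).
        pivots.append(wbar - z * math.sqrt(sigma2_star / n) + sigma2_star / 2.0)
    pivots.sort()
    lo = pivots[int(math.floor(alpha / 2 * draws))]
    hi = pivots[min(draws - 1, int(math.ceil((1 - alpha / 2) * draws)) - 1)]
    return math.exp(lo), math.exp(hi)

print(generalized_pivot_interval(costs))
```

Fixing the seed makes the interval reproducible; in practice the number of draws would be chosen large enough that the endpoints are stable to the reported precision.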
showed that the adjusted signed log-likelihood ratio-based method provided the most satisfactory coverage probability and average biases. Although the computation of the adjusted signed log-likelihood ratio approach is far more complicated than the others, it is recommended when the sample size is too small for other methods.

It is worth noting that all methods above are based on the lognormal assumption, which requires the log-transformed data to be normally distributed. Although the estimators still behave well when the log-transformed data are approximately normally distributed, Briggs et al. (2005) argued that the inference would be invalid and misleading when the sample distribution deviates extremely from the assumed distribution. Hence, checking the normality of the transformed data (with a QQ plot, a goodness-of-fit test, etc.) is always necessary. When the normality assumption is not appropriate, other distributions such as the Gamma are available. It is also always possible to trade efficiency for robustness by using nonparametric methods, which will be introduced later.

Nonparametric Methods on Continuous Data

It is entirely possible to estimate θ, construct confidence intervals, and do hypothesis testing without parametric assumptions. Although the efficiency is often not satisfactory, the central limit theorem guarantees that the sample mean converges to a normal distribution.

Denote the sample mean as

$$\hat\theta_s = \bar Y. \tag{22}$$

The first one is to estimate it directly from the sample standard error. In other words, the standard error $s_b$ is the square root of $\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^{2}$.

The second one is to use the bootstrap approach proposed by Efron (1981). The algorithm can be summarized as below:

1. Resample n observations from the original data with equal weight and replacement.
2. Calculate the sample mean from the newly sampled data, denoted as $\theta_{is}$.
3. Repeat steps 1 and 2 M times, where M is a sufficiently large number chosen by the investigator.
4. Calculate the standard error $s_b$ of $\theta_{is}$.

Based on the central limit theorem, the confidence interval is $\left(\bar Y + s_b Z_{\alpha/2},\ \bar Y + s_b Z_{1-\alpha/2}\right)$, where $Z_q$ is the q-th quantile of the standard normal distribution.

Hall (1992) proposed a monotone transformation of the t-statistic to correct for the skewness of a positively skewed distribution without assuming any parametric form. The original t-statistic is

$$T = \frac{\sqrt{n}\left(\bar Y - \theta\right)}{\hat\tau},$$

where $\hat\tau^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^{2}$. The transformation is

$$g(T) = T + n^{-1/2}\hat\gamma\left(aT^{2} + b\right) + n^{-1}\left(a\hat\gamma\right)^{2} T^{3}/3, \tag{23}$$

where $\hat\gamma = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^{3}/\hat\tau^{3}$, a = 1/3, and b = 1/6. It is monotone and invertible. The unique inverse function of g is

$$g^{-1}(T) = n^{1/2}\left(a\hat\gamma\right)^{-1}\left[\left\{1 + 3a\hat\gamma\left(n^{-1/2}T - n^{-1}b\hat\gamma\right)\right\}^{1/3} - 1\right]. \tag{24}$$
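The bootstrap standard error interval and Hall's transformation (23) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a reference implementation: the inverse of g is obtained here by writing g as a shifted cube, the bootstrap version recomputes the skewness in each resample before applying g, and the data, function names, replication count, and seed are made up.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

# Made-up costs, for illustration only.
costs = [120.0, 340.5, 89.9, 1500.0, 410.0, 230.0,
         75.0, 980.0, 640.0, 150.0, 2750.0, 310.0]
A, B = 1.0 / 3.0, 1.0 / 6.0      # constants a and b of (23)

def moments(y):
    """Return n, mean, tau-hat (divisor n) and the skewness estimate gamma-hat."""
    n, ybar = len(y), fmean(y)
    tau = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    gamma = sum((v - ybar) ** 3 for v in y) / (n * tau ** 3)
    return n, ybar, tau, gamma

def hall_g(t, n, gamma):
    """Hall's transformation (23)."""
    return t + gamma * (A * t * t + B) / math.sqrt(n) + (A * gamma) ** 2 * t ** 3 / (3 * n)

def hall_g_inv(x, n, gamma):
    """Inverse of hall_g, using g(T) = sqrt(n)/(3 a gamma) * [(1 + a gamma T/sqrt(n))^3 - 1] + b gamma/sqrt(n)."""
    if gamma == 0.0:
        return x                                   # g is the identity when gamma = 0
    inner = 1.0 + 3.0 * A * gamma * (x / math.sqrt(n) - B * gamma / n)
    cube_root = math.copysign(abs(inner) ** (1.0 / 3.0), inner)
    return math.sqrt(n) * (cube_root - 1.0) / (A * gamma)

def bootstrap_clt_interval(y, alpha=0.05, reps=2000, seed=0):
    """Normal interval around the sample mean with a bootstrap standard error."""
    rng = random.Random(seed)
    n = len(y)
    means = [fmean(rng.choices(y, k=n)) for _ in range(reps)]   # steps 1 to 3
    sb = stdev(means)                                           # step 4
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return fmean(y) - z * sb, fmean(y) + z * sb

def hall_bootstrap_interval(y, alpha=0.05, reps=2000, seed=0):
    """Percentile interval based on the bootstrap distribution of g(T)."""
    rng = random.Random(seed)
    n, ybar, tau, gamma = moments(y)
    gs = []
    for _ in range(reps):
        _, yb, tb, gb = moments(rng.choices(y, k=n))
        if tb == 0.0:                    # degenerate resample, skip it
            continue
        gs.append(hall_g(math.sqrt(n) * (yb - ybar) / tb, n, gb))
    gs.sort()
    g_lo = gs[int(alpha / 2 * len(gs))]
    g_hi = gs[int((1 - alpha / 2) * len(gs)) - 1]
    scale = tau / math.sqrt(n)
    return ybar - scale * hall_g_inv(g_hi, n, gamma), ybar - scale * hall_g_inv(g_lo, n, gamma)

print(bootstrap_clt_interval(costs), hall_bootstrap_interval(costs))
```

For right-skewed data the Hall interval is typically asymmetric around the sample mean, with a longer upper arm, whereas the normal-theory interval is symmetric by construction.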
1. Resample n observations from the original data with equal weight and replacement.
2. Calculate g(T) from the newly sampled data.
3. Repeat steps 1 and 2 M times, where M is a sufficiently large number chosen by the investigator.
4. Denote the sample α/2 and 1 − α/2 quantiles as $\hat g_{\alpha/2}$ and $\hat g_{1-\alpha/2}$.

The resulting 1 − α two-sided confidence interval is $\left(\bar Y - n^{-1/2}\hat\tau\, g^{-1}\left(\hat g_{1-\alpha/2}\right),\ \bar Y - n^{-1/2}\hat\tau\, g^{-1}\left(\hat g_{\alpha/2}\right)\right)$.

Zhou and Gao (2000) recommended the parametric bootstrap version of Hall’s method, which yields the best coverage rate for both upper and lower endpoints in a simulation study of one-sided confidence intervals.

In another simulation study, Zhou (1998) showed that the sample mean has a relatively large mean squared error even when the sample size is as large as 200, compared to the other estimators discussed in the previous section. The mean squared error increases as σ increases, which is equivalent to saying that it increases with the skewness. It is important to notice that the simulation study was conducted on lognormal data, where the lognormal assumption actually holds. This might explain part of the poor performance of the sample mean compared to estimators based on the lognormal assumption. Yet the efficiency of the sample mean on skewed data is still very low.

Zero-Inflated Data

As discussed, medical cost data are often accompanied by a considerable number of observations with zero cost. The proportion of zeros might sometimes reach 30%. This point mass at zero causes extra difficulty in statistical inference, but it can be handled with small modifications of the methods used on continuous data. The nonparametric methods described in the previous section, in fact, need no modification at all and can be used directly in this situation. For instance, the sample mean is a nonparametric estimator of the population mean, and the bootstrap gives a confidence interval for it. They will therefore not be discussed further in this section, and the focus will be placed on parametric methods.

The most commonly used parametric model for zero-inflated data is the two-part model. A two-part model assumes that the number of zero observations is a random variable from a binomial distribution bin(n, p), where n is the number of observations and p is the probability that a subject has zero medical cost in the study period. The nonzero observations, conditional on being nonzero, are treated as the continuous data discussed in previous sections. The conditional distribution is assumed to be lognormal in this section.

For each group, the distribution of the sample is a lognormal distribution with a point mass at zero, which was named the delta distribution by Aitchison (1955). Suppose $\{Y_1, ..., Y_n\}$ is a random sample from a delta distribution; then the population mean is

$$\theta = (1 - p)\exp\left(\mu + \frac{\sigma^{2}}{2}\right),$$

where p is the probability that the random variable is zero and μ and σ² are the mean and variance, respectively, of the conditional normal distribution after transformation. Denote the number of zero observations as $N_0$ and the number of nonzero observations as $N_1$. In this section, the parameter of interest is θ, and, again, the construction of confidence intervals for θ is also discussed.

Point Estimate
1. The MVUE of θ is

$$\hat\theta_A = (1 - \hat p)\exp(\hat\mu)\, g_{N_1}\left(\frac{1}{2}\hat\sigma^{2}\right), \tag{25}$$

where g is the function defined in (7) with n replaced by $N_1$,

$$\hat p = \frac{N_0}{n}, \qquad \hat\mu = \frac{1}{N_1}\sum_{i=1}^{N_1} w_i,$$

and

$$\hat\sigma^{2} = \frac{1}{N_1 - 1}\sum_{i=1}^{N_1}\left(w_i - \hat\mu\right)^{2}.$$
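The series in (7) and the estimator in (25) are easy to evaluate numerically. The following is a minimal sketch, not the authors' code: the series index is taken to be the number of nonzero observations $N_1$ (an assumption about how (25) is meant to be read), the plug-in estimate $(1-\hat p)\exp(\hat\mu + \hat\sigma^2/2)$ is included for comparison, and the data and function names are made up.

```python
import math
from statistics import fmean

# Made-up costs with three zeros, for illustration only.
sample = [0.0, 0.0, 0.0, 120.0, 340.5, 89.9, 1500.0,
          410.0, 230.0, 75.0, 980.0, 640.0]

def finney_g(n, t, tol=1e-14, max_terms=1000):
    """Series g_n(t) of (7); requires n >= 2."""
    x = (n - 1) * t / n
    total, power_over_fact, prod = 1.0, 1.0, 1.0     # the r = 0 term equals 1
    for r in range(1, max_terms):
        power_over_fact *= x / r                     # x^r / r!
        prod *= (n - 1) / (n - 1 + 2 * r)            # product over i = 1..r
        term = power_over_fact * (n - 1 + 2 * r) / (n - 1) * prod
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def delta_estimates(y):
    """Point estimates of the mean of a delta (zero-inflated lognormal) sample."""
    n = len(y)
    w = [math.log(v) for v in y if v > 0]
    n1 = len(w)                                      # number of nonzero costs
    p_hat = (n - n1) / n
    mu_hat = fmean(w)
    sig2_hat = sum((v - mu_hat) ** 2 for v in w) / (n1 - 1)
    return {
        "p_hat": p_hat,
        "sample_mean": fmean(y),
        # (25), with the series index taken as N1 (assumption).
        "mvue": (1 - p_hat) * math.exp(mu_hat) * finney_g(n1, sig2_hat / 2),
        "bias_corrected_mle": (1 - p_hat) * math.exp(mu_hat + sig2_hat / 2),
    }

print(delta_estimates(sample))
```

Since every factor in the product of (7) is below one, $g_n(t) < \exp(t)$ for t > 0, so the MVUE is always smaller than the plug-in estimate on the same data.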
2. A bias-corrected MLE for θ is

$$\hat\theta_M = (1 - \hat p)\exp\left(\hat\mu + \frac{1}{2}\hat\sigma^{2}\right). \tag{26}$$

$$\cdots\left[\hat p(1 - \hat p) + \frac{1}{2}(1 - \hat p)\left(2\hat\sigma^{2} + \hat\sigma^{4}\right)\right].$$

So the 100(1 − α)% confidence interval based on $\hat\theta_A$ can be asymptotically approximated by

$$\left(\hat\theta_A - z_{1-\alpha/2}\sqrt{V},\ \hat\theta_A - z_{\alpha/2}\sqrt{V}\right).$$

2. The ML confidence intervals: Using the delta method and the properties of the MLE, a consistent variance estimator for the logarithm of the bias-corrected MLE, $\log\hat\theta_M$, can be written as

$$\widehat{SE}^{2} = \frac{N_0}{nN_1} + \frac{\hat\sigma^{2}}{N_1} + \frac{\hat\sigma^{4}}{2N_1}.$$

So the two-sided 100(1 − α)% confidence interval is

$$\left(\hat\theta_M\exp\left(z_{\alpha/2}\widehat{SE}\right),\ \hat\theta_M\exp\left(z_{1-\alpha/2}\widehat{SE}\right)\right).$$

3. A bootstrap approach for ML confidence intervals: Similar to the Angus method in the previous section, an approximate pivotal statistic can be derived:

$$T = \frac{\log(1 - \hat p) + \hat\mu + \dfrac{\hat\sigma^{2}}{2} - \left\{\log(1 - p) + \mu + \dfrac{\sigma^{2}}{2}\right\}}{\left\{\dfrac{\hat p}{n(1 - \hat p)} + \dfrac{\hat\sigma^{2}}{n(1 - \hat p)} + \dfrac{\hat\sigma^{4}}{2n(1 - \hat p)}\right\}^{0.5}}. \tag{27}$$

It follows the same distribution as the following statistic:

$$T = \frac{\dfrac{\sqrt{N_1}}{\sigma}\log\dfrac{N_1}{n(1 - p)} + Z + \dfrac{\sigma\sqrt{N_1}}{2}\left(\dfrac{\chi^{2}_{(N_1-1)}}{N_1} - 1\right)}{\left\{\dfrac{n - N_1}{n\sigma^{2}} + \dfrac{\chi^{2}_{(N_1-1)}}{N_1}\left(1 + \dfrac{\sigma^{2}\chi^{2}_{(N_1-1)}}{2N_1}\right)\right\}^{0.5}},$$

intervals are

$$\left(\hat\theta_M\exp\left(t_{\alpha/2}\widehat{SE}\right),\ \hat\theta_M\exp\left(t_{1-\alpha/2}\widehat{SE}\right)\right).$$

4. A signed likelihood ratio approach: The ML confidence intervals are based on the asymptotic normality of the MLE, which is questionable with small or moderate samples. An alternative is the likelihood ratio interval. The log-likelihood as a function of m = log(θ), μ, and σ² is

$$l\left(m, \mu, \sigma^{2}\right) = N_0\log\left\{1 - \exp\left(m - \mu - \frac{\sigma^{2}}{2}\right)\right\} + N_1\left(m - \mu - \frac{\sigma^{2}}{2}\right) - \frac{N_1}{2}\log\sigma^{2} - \frac{1}{2\sigma^{2}}\sum_{i=1}^{N_1}\left(w_i - \mu\right)^{2}.$$

Since there are nuisance parameters in the log-likelihood, the profile likelihood for m is used to compute the likelihood ratio statistic. In general, the way to solve this problem is to (i) use an iterative algorithm to find the $\hat\mu[m]$ and $\hat\sigma^{2}[m]$ that maximize the log-likelihood given m, subject to $\mu + \sigma^{2}/2 > m$; (ii) define $l_{prof}(m) = l\left(m, \hat\mu[m], \hat\sigma^{2}[m]\right)$, and find the $\hat m$ that maximizes this profile log-likelihood; (iii) define the likelihood ratio statistic $W(m) = 2\left\{l_{prof}(\hat m) - l_{prof}(m)\right\}$; and (iv) define $Z(m) = \operatorname{sgn}(\hat m - m)\sqrt{W(m)}$.
The
100(1
α)% confidence
intervals would be where
exp m ^ α=2 , exp m^ 1α=2 , where m
^ α is the solu-
tion of Zðm ^ α Þ ¼ zα . σ^ 2 ðmÞ
exp m μ ^ ðmÞ
5. An adjusted signed log-likelihood ratio 2
^a m ¼ ,
approach: Tian and Wu (2006) proposed a mod- σ^ 2 ðmÞ
ified version of the signed likelihood ratio 1 exp m μ ^ ðmÞ
2
statistics. They defined an adjusted signed ^a m
log-likelihood statistics as ^b m ¼ ,
σ^ 2 ðmÞ
1 exp m μ ^ ðmÞ
uðmÞ 2
Z ðmÞ ¼ Z ðmÞZ1 ðmÞlog , XN1
Z ðmÞ
T ¼ W 2i :
where Z(m) is defined i¼1
pffiffiffiffiffiffiffiffiffiffiffiffi as in the previous
^ mÞ W ðmÞ . The u(m) here is
section: sgnðm
complicated:
The
100(1
α)% confidence
intervals would be
exp m ^ α=2 , exp m ^ 1α=2 , where m
^ α is the solu-
AC tion of Z ðm ^ α Þ ¼ zα .
uðmÞ ¼ (29) 6. A generalized pivot approach: Tian (2005)
BD
applied the concept of generalized confidence
where A, B, C, and D are intervals on the zero-inflated data. Recall that the
models are almost the same except for the excess
μ
^ ðmÞ μ
^ ðmÞ μ ^ zeros. Tian derived a generalized pivot for p using
A¼ ^ am þ 1 þ 2
σ^ ðmÞ 2^ σ 2 σ^ 4 ðmÞ the relationship between binomial distribution
and beta distribution. The author also provided a
1 1 N1
þ log computing algorithm for this method:
σ^ 2 ðmÞ 2^ σ 4 ðm Þ N 0 ^a m
2 2 2
1 σ^ μ^ μ
^ ðm Þ i). Compute the transformed sample mean W
log 2 2þ 2
2 σ^ ðmÞ 2^
σ 2^σ ðmÞ and sample variance S2.
ii). Generate Z ~ N(0,1), U 2 χ 2N1 1 T p1 ~ beta(N0
1 ^
am 1 1 ^ 2 ðmÞ
μ
þ 2 þ þ 2 4
σ^ ðmÞ 2 2 2^ σ ðmÞ 2^ σ ðmÞ + 1, N1), and T p2 ~ beta(N0, N1 + 1). Com-
h qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffii pffiffiffiffiffiffi
1 1 pute Tθ ¼ W Z=U= N 1 1 s= N 1 þ
2 , 2 2
2^
σ 2 2^ σ ðmÞ S =U =ðN 1 1Þ . Then compute T 1 ¼ log
n 1 T p1 þ T θ and T 2 ¼ logð1T p2 Þ þ T θ .
B¼ ,
2N 0 σ^ 6 iii). Repeat ii for sufficiently many times and get a
nN 31 series of T1’s and T2’s.
C¼ , iv). Take the α/2 sample quantile of T1’s, denoted
2N 0 σ^ 6
" as L, and take the (1 α /2) sample quantile
N 0 ^
b m N1 N 0 ^b m N1 T of T2’s, denoted as U. The 100(1 α)% con-
D¼ 2
þ 4 6
σ^ ðmÞ 4 σ ðmÞ σ^ ðmÞ
2^ fidence intervals would be (L, U).
P
N1
#
2^
μ ðm Þ Wi The simulation by Zhou and Tu (2000) showed
i¼1 N1μ^ 2 ðmÞ that the bootstrap interval yields the best coverage
þ 6
σ^ 6 ðmÞ σ^ ðmÞ probability among the first four methods in small
0 12 to moderate samples, although bias-corrected ML
P
N1
has better accuracy when the skewness is very
BN ^ ^ ðmÞ i¼1 C
W i
B 0 b mn N 1 μ C small. Tian (2005) verified that the generalized
B þ 4 4 C ,
@ 2 σ^ ðmÞ σ^ ðmÞA confidence intervals provide comparable results as
the first four methods. Based on Tian’s simulation,
ð1 ð1 where
σ 2Δ ¼
p¼0 q¼0
2 3
ð1 X
1 4 1 λ
F1 F ðpÞ þ F1 Bj ðqÞ F1 1 5
1 ð pÞ þ 2 ð pÞ 1 ðqÞ þ F2 ðqÞ dq
2 1 j¼1
0
η k ð pÞ ¼ :
ð1Þ Fg ðpÞf g F1
g 1
g ð p Þ
0
The estimation is achieved by substituting all unknown values with their empirical estimates. In a simulation study, Dominici et al. (2005) showed that $\hat\Delta(\lambda = 2)$ has more robust performance than the MLE under the lognormal distribution, and it yields almost the same result when the parametric assumption is met. The choice of λ can also be made by cross validation. However, the computation of the smooth quantile ratio estimator, especially its asymptotic variance, is rather difficult compared to that of the MLE.

Confidence Intervals
With no parametric assumption, one can use the bootstrap or the asymptotic distribution of the smooth quantile ratio estimator to construct confidence intervals for the corresponding estimators. There are various ways to construct the confidence intervals when the lognormal assumption is applied.

1. A maximum likelihood approach: The maximum likelihood estimate of δ is

$$\hat\delta = \exp\left(\hat\mu_1 + \frac{1}{2}\hat\sigma_1^{2}\right) - \exp\left(\hat\mu_2 + \frac{1}{2}\hat\sigma_2^{2}\right), \tag{34}$$

where

$$\hat\mu_i = \frac{1}{n_i}\sum_{j=1}^{n_i} W_{ij}; \tag{35}$$

$$\hat\sigma_i^{2} = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(W_{ij} - \hat\mu_i\right)^{2}. \tag{36}$$

It is known that the asymptotic variance of the MLE achieves the variance bound given by

$$\hat v^{2} = h\left(\hat\theta\right)' \hat I^{-1} h\left(\hat\theta\right), \tag{37}$$

where I is the Fisher information matrix and $\hat I$ denotes its estimator:

$$\hat I = \begin{pmatrix} n_1/\hat\sigma_1^{2} & 0 & 0 & 0 \\ 0 & n_1/\left(2\hat\sigma_1^{4}\right) & 0 & 0 \\ 0 & 0 & n_2/\hat\sigma_2^{2} & 0 \\ 0 & 0 & 0 & n_2/\left(2\hat\sigma_2^{4}\right) \end{pmatrix}.$$

The function h is defined as the partial derivative of δ with respect to $\theta = \left(\mu_1, \sigma_1^{2}, \mu_2, \sigma_2^{2}\right)$:

$$h(\theta) = \frac{\partial\delta}{\partial\theta} = \left(m_1,\ \frac{1}{2}m_1,\ -m_2,\ -\frac{1}{2}m_2\right)', \tag{38}$$

where $m_1 = \exp\left(\mu_1 + \frac{1}{2}\sigma_1^{2}\right)$ and $m_2 = \exp\left(\mu_2 + \frac{1}{2}\sigma_2^{2}\right)$. The 100(1 − α)% confidence interval can be given by

$$\left(\hat\delta - z_{\alpha/2}\hat v,\ \hat\delta + z_{\alpha/2}\hat v\right), \tag{39}$$

where $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution. Since this is an asymptotic property, this CI can be expected to perform poorly in small sample settings.

2. A bootstrap approach: A parametric bootstrap method can be employed to replace the role of the asymptotic standard normal distribution. The algorithm is summarized below:
infinite categories, a test of the mean relation between the dependent variable and the covariate can be achieved by a linear regression. Both ordinary linear regression and the generalized linear model describe the mean relation between the dependent variable (the outcome) and the covariates. In other words, they give an “on-average” type of description of the data. The other kind of regression discussed in this section is quantile regression. As will be explained later, quantile regression is slightly different in interpretation from linear regression.

There is an extensive econometric literature on methodologies and applications of regression on medical costs. The features of cost data are the same as those in the last section: skewness, nonnegative values, and a nontrivial fraction of zero observations. Clustering and multimodality might also affect the validity of the results if not properly adjusted for.

The most common way to analyze cost data is log transformation. As discussed in the last section, log-transformed data often have a more symmetric distribution than the original data. The heteroscedasticity found in cost data can sometimes be mitigated by a variance-stabilizing transformation, including the log transformation. Thus, linear regression can be applied to the transformed data. However, a regression on transformed data can only be interpreted as the mean relationship between the transformed outcome and the covariates, which is often not of scientific interest. This causes no trouble when the relation of interest is multiplicative, for instance, the influence of the inflation rate on wages. But when the quantity of interest is, say, the total cost, a regression on the transformed data is not enough to answer the question. Therefore, back-transformation becomes a problem. The smearing estimator of Duan (1983) dominates this area.

Another way to deal with skewness and nonconstant variance is to use a generalized linear model (GLM). In a GLM, the relation between the dependent variable and the covariates is described by two components, the link function and the mean-variance relationship. The flexibility of the link function and the variance structure provides a wide range of models within the GLM setting. Various methods have been proposed to help researchers choose the model that best fits the data. Manning et al. (2005) found that the GLM and log-transformed OLS can be summarized in one family of models, named the generalized gamma model.

In most studies, the methods described above would not be considered complete without a way to deal with the nontrivial fraction of zeros. The zeros cause a direct problem with the log transformation, since log(0) is undefined. A straightforward, also naive, solution is to add a

Table 2 95% confidence intervals of the one-sample mean (2): NP methods

95% Confidence intervals    CLT      NP bootstrap    Hall
Lower bounds                343.5    346.2           420.5
Upper bounds                819      816.2           1692.3

Table 3 Estimates of the one-sample mean

Point estimate    Sample mean    MLE      UMVUE    cm MSE
                  581.3          542.7    540      529

Table 4 Estimates of the zero-inflated mean

Point estimate    Sample mean    Bias-corrected MLE    MVUE
                  492.3          462.7                 457.7
Table 6 Estimates of the mean difference between two continuous samples

Point estimate    Sample mean    MLE
                  269.1          331.6

Table 7 95% confidence intervals of the mean difference between two continuous samples

95% Confidence intervals    MLE       Parametric bootstrap    SLR       Generalized pivotal
Lower bounds                491.5     174.9                   442.1     195.5
Upper bounds                1154.7    2455.3                  2568.8    8613.1

Table 8 P-value of the hypothesis that mean difference is zero

p-Value    Score test    Z-Score test    Bootstrap test    Generalized p-value

2. The (conditional) marginal effect $\frac{\partial\mu(x)}{\partial x_k} = \frac{\partial E[y \mid x]}{\partial x_k}$. It is a typical measure of how a certain covariate $x_k$ affects the dependent variable Y. In simple regression, it is called the “slope.” However, the concept of slope might not be valid in other regression frameworks, which is why the marginal effect is introduced. Whereas the slope in linear regression does not depend on the other covariates, marginal effects in general do depend on the values of the other covariates. Interpretations of marginal effects must not ignore this property.
3. The average marginal effects $\theta_1 = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\mu(x)}{\partial x_k}$
estimate of the variance-covariance matrix is highly recommended to construct the robust standard errors of coefficients. A typical linear model can be written as

$$Y = X\beta + e,$$

where

$$Y = (y_1, y_2, \ldots, y_n)^{T},$$

and the design matrix

$$X = \begin{pmatrix} x_{11} & x_{12} & x_{13} & \ldots & x_{1p} \\ x_{21} & x_{22} & x_{23} & \ldots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & x_{n3} & \ldots & x_{np} \end{pmatrix},$$

and the residuals:

There is also an assumption called weak exogeneity, which is weaker than this one. When X is treated as fixed, the exogeneity holds.

Assumption 3: No Multicollinearity

rank(X) = p,

or, in other words, none of the column vectors of X can be written as a linear combination of the other columns.

Assumption 4: Uncorrelation

$$\operatorname{cor}\left(e_i, e_j \mid X\right) = 0, \quad i \neq j.$$

If this assumption is violated, it is necessary to use estimators of standard errors that adjust for correlations. This is most often observed in spatial or temporal data. A group of observations that has correlations among its members is called a cluster.
Research is often restricted by sample size, in which case efficiency is of more importance to investigators than consistency. Unfortunately, one major disadvantage of linear regression on raw data is its lack of efficiency. In other words, this regression method needs a greater sample size to reach the same accuracy as some other methods. Recall that skewness and heavy tailedness of the distribution of the outcomes (Y) are the main features responsible for the low efficiency of ordinary regressions. It is natural to think of transformations of Y to “correct” these features.

Transformation on Y

The intuition of transformation is straightforward: to achieve a better distribution of the data by transforming the outcomes with some monotone function. The advantage is also clear: an appropriate transformation increases the efficiency of estimation (Manning and Mullahy 2001, Briggs et al. 2005). As discussed in section “Parameters of Interest,” an obvious issue of transformation is the change of scale. Inference made on the transformed scale might not have scientific meaning. Moreover, it is inappropriate to transform estimates directly back to the original scale, since this results in biased and inconsistent estimates. Statistical inferences on the transformed scale are very likely to differ from those made on the original scale. Thus, the main difficulty in methods based on transformation is the back-transformation problem.

A general procedure can be summarized in three steps: transformation, regression, and back-transformation.

The first step, transformation, consists of choosing a transform function h and substituting h(y) for y. Various functions can serve as the transform function as long as they are monotone and thus invertible. The Box-Cox transformation is considered a well-defined group of transformations for skewed data. Other variance-stabilizing transformations are also available (Weisberg 2005). For the analysis of cost data, the log transformation is preferred in practice for certain practical reasons. For instance, regression analysis on the log-transformed scale reveals the proportional changes of the outcome. This is sometimes the quantity of interest, say, when investigating the association between wages and the inflation rate. In that case, the problem of back-transformation is avoided. Yet inference on the mean total cost is often what investigators of medical costs care about, which requires back-transformation. Another issue is that a variance-stabilizing transformation can normalize the distribution of the dependent variable while failing to stabilize the variance as intended. Therefore, homoscedasticity might not hold for the transformed data.

The next step is to apply the methods discussed in the last section to the transformed data. It is recommended to employ as few assumptions as possible, since there is no a priori knowledge of the transformed data. The inference made on the transformed scale might be adequate to answer the questions mentioned above, and then there is no need for the back-transformation step. Otherwise, the analysis should be continued.

The last step, back-transformation, is the key step in this method. Transformation is a tool to gain efficiency, but the questions of interest are still on the original scale of the cost data. The back-transformation methods are dominated by Duan’s smearing estimators. Duan (1983) proposed a nonparametric estimator that uses the average of the back-transformed residuals to estimate the expectation of the dependent variable on the original scale. We estimate $EY_0$ by substituting the unknown cdf F by its empirical estimate $\hat F_n$:

$$\hat E Y_0 = \frac{1}{n}\sum_{i=1}^{n} h^{-1}\left(x_0\beta + \hat e_i\right). \tag{45}$$

Further substituting the regression parameter β in (45) by its least squares estimate $\hat\beta$, the smearing estimator is thus defined as

$$\hat E Y_0 = \frac{1}{n}\sum_{i=1}^{n} h^{-1}\left(x_0\hat\beta + \hat e_i\right). \tag{46}$$

Applications and generalizations of Duan’s method have been proposed in recent years. In the rest of this section, three procedures are introduced as examples of transformation-based methods. The first one is the widely used
20 Modeling and Analysis of Cost Data 467
" #
logarithm transformation by Ai and Norton @^
μ ðxÞ @h x,β^ s2 ðx,^γ Þ
(2000); the rest are robust, yet efficient, nonpara- ¼μ
^ ðxÞ þ 0:5 ,
@xj @xj @xj
metric methods by Welsh and Zhou (2006) and
Zhou et al. (2008).
with variance
Example: Log Transformation 2
@ μ ðxÞ @ 2 μ ðxÞ
Ai and Norton (2000) derived the forms of stan- ω2j ðxÞ ¼ Σ β
dard errors of smearing estimators under log @xj @β @xj @β0
2
transformations by delta method. Their methods @ μðxÞ @ 2 μðxÞ
þ Σγ j 0 :
allow the situations where a nonlinear regres- @xj @γ @x @γ
sion has been applied in the second step. Results
for linear regression can be easily achieved from The sample average incremental effect or the
the general conclusions. marginal effect is
Although normality assumption might not X
^θ j ¼ 1
n
@2μ
^ ðx i Þ
always hold for transformed data, there is no ,
harm to look at the simplified case when the resid- n i¼1
@xj
uals are assumed to be normally distributed. Write
with variance
the model as ln( y) = k(x,β) + s(x,γ)e, where k(x,β)
is any models of the expectation of ln( y) given ! !!
x and e has mean 0 and unit variance. Imposing 1X n
@ 2 μðxi Þ 1X n
@ 2 μðxi Þ
ω3j ðxÞ ¼ Σβ
normality assumption on e means assuming e fol- n i¼1 @xj @β n i¼1 @β0 @xj
! !!
lows a standard normal distribution. Notice that the 1X n
@ 2 μðxi Þ 1X n
@ 2 μðxi Þ
square of s(x, γ) is the variance of the error term s þ Σγ :
n i¼1 @xj @γ n i¼1 @γ 0 @xj
(x,γ)e, writing it as a function of x allows for
heteroscedasticity. Both k(x,β) and s(x,γ) need to The quantities needed in the above formulas
be specified. For linear models, k(x,β) is defined as are listed in the appendix.
x β. Suppose β^ is the estimate of the linear regres-
0
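As a rough illustration of the normal-error case, the following Python sketch fits ln(y) on a single covariate by ordinary least squares, retransforms the mean as exp(k + 0.5 s²), and evaluates the incremental effect as the retransformed mean times [∂k/∂x + 0.5 ∂s²/∂x]. It assumes a linear k(x,β) = b0 + b1·x and a constant variance s², so the second bracketed term is zero. The function names and the toy data are illustrative assumptions and are not taken from Ai and Norton (2000).

```python
import math

def fit_log_linear(x, y):
    """OLS of ln(y) on one covariate with an intercept; returns b0, b1, s2."""
    n = len(x)
    z = [math.log(v) for v in y]
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    b1 = sxz / sxx
    b0 = z_bar - b1 * x_bar
    resid = [zi - (b0 + b1 * xi) for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance on the log scale
    return {"b0": b0, "b1": b1, "s2": s2}

def retransformed_mean(fit, x0):
    """mu_hat(x) = exp(k(x, beta_hat) + 0.5 * s2), valid under normal errors."""
    return math.exp(fit["b0"] + fit["b1"] * x0 + 0.5 * fit["s2"])

def incremental_effect(fit, x0):
    """d mu_hat / dx = mu_hat(x) * [dk/dx + 0.5 * ds2/dx]; ds2/dx = 0 here."""
    return retransformed_mean(fit, x0) * fit["b1"]

def average_incremental_effect(fit, x):
    """Sample average of the incremental effect over the observed covariates."""
    return sum(incremental_effect(fit, xi) for xi in x) / len(x)

# Toy data (assumed): ln(y) = 1 + 0.5 x plus residuals orthogonal to (1, x)
x = [0.0, 1.0, 2.0, 3.0]
e = [0.1, -0.1, -0.1, 0.1]
y = [math.exp(1.0 + 0.5 * xi + ei) for xi, ei in zip(x, e)]

fit = fit_log_linear(x, y)
theta_hat = average_incremental_effect(fit, x)
```

With heteroscedastic errors, s² would itself be modeled as a function of x and its derivative would enter the bracket; that case is not covered by this sketch.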
\[
p_n(z_1, z_2) = \frac{1}{n b_1 b_2}\sum_{i=1}^{n} K_1\!\left(\frac{Z_{1i} - z_1}{b_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{b_2}\right),
\]

\[
g_{1n}(y \mid z_1, z_2) = \partial G_n(y \mid z_1, z_2)/\partial z_1,
\]

Example: Nonparametric Retransformation

Welsh and Zhou (2006) proposed a method that can estimate the back-transformed mean and its standard error for any transformation function. The model is assumed to be h(y) = x′β0 + gi(β0, γ0)e, where gi can be a function of xi and ei are
independent and identically distributed random variables. The parameter ψ = (βᵀ, γᵀ)ᵀ is estimated from estimating equations. Then denote \(\eta_i(\psi) = x_i^{T}\beta_0 + g_i(\psi_0)\,e_i(\psi)\), where \(e_i(\psi) = \{h(y_i) - x_i^{T}\beta_0\}/g_i(\psi)\), and the estimated mean on the original scale is \(\hat{m} = \sum_{i=1}^{n} h^{-1}(\eta_i(\hat{\psi}))\), which is also a smearing estimator. The idea of this method is to estimate the empirical distribution of the residuals ei instead of making distributional assumptions. The corresponding standard errors are estimated with the help of the properties of empirical processes. In the original paper, Welsh and Zhou (2006) generalized this method to the situation in which there are observations with zero costs.

The idea of the transformation method is to transform the data so that they have a “better” distribution, one that is often more symmetric and less heavy tailed and can be better fitted with a linear model. By doing this, one can gain efficiency from the transformation and the assumptions. A natural alternative is to abandon the requirement of symmetry. For instance, a log-transformed linear model can also be interpreted as a lognormal model. In the next section, models of this kind (generalized linear models) and their applications will be discussed.

Transformation on E[Y]

The linear model can be viewed as a parametric model based on the normality assumption, where the mean of the normal distribution is assumed to have a linear relationship with the coefficients. If the model is correct, the dependent variable is normally distributed and therefore symmetric and without a heavy tail. A natural generalization of this traditional linear model is to expand the family of distributions to account for possible skewness and heavy tails; the result is called the generalized linear model (GLM) by McCullagh and Nelder (1989). The GLM was first introduced to the area of medical cost analysis by Blough et al. (1999).

Let μ be E(Y), where Y is an n × 1 vector. Yi, i = 1, 2, ..., n, are i.i.d. from a common distribution f. Assume that the following relationship exists: g(μ) = Xβ, with g a monotone increasing function, and that the variance-covariance matrix of Y is a function of μ, V(μ), which is determined by the density function f. The function g is usually called the link function, and var(Y) = V(μ) is called the mean-variance relationship or variance function. The unknown parameter can be estimated by maximum likelihood since a parametric form of f is available. In short, a GLM describes the relation between a function of the expectation of Y and the covariates; variation is addressed by the mean-variance relationship and/or the assumed distribution.

One important advantage of the GLM is that it can handle various types of data. For instance, count data can be described by the Poisson distribution with a log link function. Binary data can be analyzed with a Bernoulli distribution and a logit link, which is known as logistic regression.

Recall that in the linear model, the normality assumption is the least important assumption because of central limit theorems. The same holds here. The parametric assumption is not necessarily required in setting up a GLM, although it is still popular because of its direct interpretation. In the previous setup, one needs to specify the actual distribution of the dependent variable and then use it to derive the score function. In fact, one only needs the mean-variance relationship, which is used to construct an estimating equation that has the same properties as the score function. The estimators from the corresponding estimating equations are still consistent. Therefore, the procedure reduces to specifying (1) the link function and (2) the mean-variance relationship. Notice that the first item concerns the first moment of the dependent variable and the second concerns the second moment. That is why econometricians also call GLMs generalized moment methods.

With parametric assumptions, the MLE might have an explicit solution. Otherwise, the estimators can be obtained by solving the following estimating equations numerically:
\[
\sum_{i=1}^{N} \frac{\partial \mu(x_i;\beta)}{\partial \beta}\, V^{-1}\!\left(\mu(x_i;\beta)\right)\left(y_i - \mu(x_i;\beta)\right) = 0, \qquad (47)
\]

where \(\mu(x_i;\beta) = g^{-1}(x_i'\beta)\). If the model is specified correctly, the asymptotic variance of the estimator will be the inverse of the Fisher information up to some constant. Alternatively, one can use the sandwich estimator as a robust estimator. A commonly used test for the coefficients is the Wald test.

The regression must be interpreted with care. A GLM describes the relationship between the covariates and a function of the expectation of Y. Logistic regression, for example, specifies a linear relationship between the covariates and the log odds. For medical cost data, the situation is simpler since the most widely used GLM for analyzing cost data is a gamma distribution with a log link. Or, without the parametric assumption, one can employ a log link and V(y) = ϕμ², a mean-variance relationship often observed in medical cost data (Blough et al. 1999, Manning and Mullahy 2001).

Quadratic variance family:

\[
g(\mu_i; \theta_1, \theta_2) = \theta_1 \mu_i + \theta_2 \mu_i^2 .
\]

Denote the parameters as γ = (βᵀ, λ, θ1, θ2)ᵀ. Then the estimating equations are

\[
G_{i\beta_j} = (Y_i - \mu_i)\,V_i^{-1}\,\partial \mu_i/\partial \beta_j ;
\]

\[
G_{i\lambda} = (Y_i - \mu_i)\,V_i^{-1}\,(\partial \mu_i/\partial \lambda) ;
\]

\[
G_{i\theta_1} = \left[(Y_i - \mu_i)^2 - V_i\right] V_i^{-2}\,(\partial \mu_i/\partial \theta_1) ;
\]

\[
G_{i\theta_2} = \left[(Y_i - \mu_i)^2 - V_i\right] V_i^{-2}\,(\partial \mu_i/\partial \theta_2) .
\]

They can be combined in vector form. Let

\[
G_{i\gamma} = \left(G_{i\beta_1}, G_{i\beta_2}, \ldots, G_{i\beta_p}, G_{i\lambda}, G_{i\theta_1}, G_{i\theta_2}\right)^{T} .
\]

The estimating equation is then

\[
\sum_{i=1}^{n} G_{i\gamma} = 0 .
\]
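A minimal sketch of how estimating equation (47) might be solved numerically, assuming a log link, the variance function V(μ) = ϕμ², and a single covariate plus an intercept. Under these assumptions ∂μ/∂β = μx, the dispersion ϕ cancels, and (47) reduces to Σ x_i (y_i − μ_i)/μ_i = 0. The update below is a Fisher scoring step that uses the expected information Σ x_i x_i′, which is constant for this link and variance pair. The function name and the toy data are hypothetical.

```python
import math

def solve_log_link_ee(x, y, max_iter=100, tol=1e-12):
    """Solve sum_i x_i (y_i - mu_i) / mu_i = 0 with mu_i = exp(b0 + b1 * x_i)."""
    n = len(x)
    b0 = math.log(sum(y) / n)  # start from the intercept-only fit
    b1 = 0.0
    # Expected information (up to 1/phi): [[n, sum x], [sum x, sum x^2]]
    s1 = float(n)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    det = s1 * sxx - sx * sx
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        r = [(yi - mi) / mi for yi, mi in zip(y, mu)]  # scaled residuals
        u0 = sum(r)                                    # score for the intercept
        u1 = sum(xi * ri for xi, ri in zip(x, r))      # score for the slope
        d0 = (sxx * u0 - sx * u1) / det                # information^{-1} * score
        d1 = (-sx * u0 + s1 * u1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data (assumed): mean exp(0.5 + 0.3 x) times multiplicative noise whose
# deviations from 1 are orthogonal to (1, x), so (0.5, 0.3) solves the equation.
x = [0.0, 1.0, 2.0, 3.0]
m = [1.1, 0.9, 0.9, 1.1]
y = [math.exp(0.5 + 0.3 * xi) * mi for xi, mi in zip(x, m)]

b0_hat, b1_hat = solve_log_link_ee(x, y)
```

Because only the mean and variance functions are used, the same iteration applies whether or not the data are assumed to follow a gamma distribution.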
heavy-tailed data. One key feature of this GGM is that it provides more flexibility than some other models: lognormal models, gamma models, and Weibull models are all special cases of the GGM. The density function of the GGM is

\[
f(y; \kappa, \mu, \sigma) = \frac{\gamma^{\gamma}}{\sigma y \sqrt{\gamma}\,\Gamma(\gamma)} \exp\!\left(z\sqrt{\gamma} - u\right), \qquad (48)
\]

where \(\gamma = |\kappa|^{-2}\), \(z = \mathrm{sign}(\kappa)\{\log(y) - \mu\}/\sigma\), and \(u = \gamma \exp(|\kappa| z)\). The parameter μ is replaced by \(X_i^{T}\beta\). The expected value of y conditional on x is given by

\[
E(y \mid x) = \exp\!\left[x^{T}\beta + \frac{\sigma}{\kappa}\log(\kappa^{2}) + \log\Gamma\!\left(\frac{1}{\kappa^{2}} + \frac{\sigma}{\kappa}\right) - \log\Gamma\!\left(\frac{1}{\kappa^{2}}\right)\right].
\]

As shown in Manning et al. (2005), (48) is a lognormal model when κ is close to zero and a gamma distribution when σ = κ > 0. In other words, the values of the GGM parameters distinguish these special models from each other. A natural benefit of this setting is that a model selection problem can be restated as a hypothesis test on the parameters. From another perspective, it provides a systematic way to evaluate the appropriateness of those models.

In their paper, Manning et al. (2005) compared three versions of the GGM, distinguished by the way they deal with heteroscedasticity, against some existing models, including back-transformed linear regression of ln(y) on x, a GLM with log link and gamma distribution, and a maximum likelihood estimator of the Weibull model. Results showed that the GGM would choose the right model properly, yet the heteroscedasticity in x has to be accounted for. Also, the GGM can approximate the distribution of the data better than other parametric models because of its flexibility.

Two-Part Models

The models discussed above are all based on positive and continuous data. But in real-life research, there is always a considerable fraction of observations that have zero cost. One can choose to keep or drop all zero-cost observations depending on the research interest. The most commonly used modification is to construct a two-part model.

The intuition behind the two-part model is to describe separately the event that a cost occurs and how large the cost is when it occurs. The outcome variable in the first part is a binary variable δi, where 0 stands for no cost and 1 stands for positive cost. Most methods available for binary outcomes are applicable here, and logistic regression is a typical choice. In the second part, all remaining observations have positive costs, which turns the problem into the one discussed in the previous sections.

This may seem a little complicated, but the short argument below makes clear why a two-part model simplifies the problem. Suppose there is a parametric distribution for the second part. The likelihood function is

\[
L_n = \prod_{i=1}^{n} p(\delta_i \mid x_i)\, f(y_i \mid \delta_i = 1, x_i)^{\delta_i} = \prod_{i=1}^{n} p(\delta_i \mid x_i) \prod_{\delta_i = 1} f(y_i \mid \delta_i = 1, x_i). \qquad (49)
\]

If the conditional density function f does not depend on δ, then the two factors of the likelihood function can be maximized separately. Recall that none of the models in the previous sections involves δ; they can serve as the conditional density function here. Therefore, all one needs to do is to analyze the first part and the second part separately and then combine the results. The estimated mean of the population is

\[
\hat{y} = \hat{p}\,\hat{\mu},
\]

where \(\hat{\mu}\) is the estimated mean of the cost in the second part and \(\hat{p}\) is the estimated probability that a cost occurs. Blough et al. (1999) estimated the variance of \(\hat{y}\) by

\[
\mathrm{Var}(\hat{y}) = \mathrm{Var}(\hat{p}\,\hat{\mu}) = \hat{p}^{2}\,\mathrm{Var}(\hat{\mu}) + \hat{\mu}^{2}\,\mathrm{Var}(\hat{p}), \qquad (50)
\]

which is an approximation of the true variance. An alternative to using this equation is to generate
the variance by bootstrap methods. The parameters used in the first part need not be the same as those in the second part. Interpretations of the coefficients differ from those in the previous section, since inference on the second part is conditional on the event that a cost occurs.

As mentioned earlier, two-part models are quite popular in the analysis of medical cost. An example can be found in Blough et al. (1999), who used a logistic regression for the first part and a GLM with log link for the second part. If one chooses to transform yi, the back-transformation problem for a two-part model has been studied by Welsh and Zhou (2006).

Mixtures of Distributions

With a point mass at zero, observations gathering around zero can also be viewed as multimodality, which can be explained by the distribution actually being a mixture of several distributions. In fact, the two-part model is a special case of mixture models. A mixture model is helpful in classifying the observations into high-cost groups and low-cost groups. Say the true distribution of medical cost in a certain population is a mixture of several normal distributions with different means due to some unknown features of patients. Then the unknown features can be treated as a latent variable that indicates which normal component a patient belongs to. The expectation maximization (EM) algorithm gives the estimates of the coefficients of interest. More details about mixture models can be found in McLachlan and Peel (2000).

There are other methods to deal with zero-cost observations, which include adding a constant to each observation to force the data to be positive. However, some of these methods have hardly any realistic meaning and only serve as a way to handle the zero-cost observations. An advantage of the two-part model is that it has a sensible real-life interpretation.

outcomes. The mean is one quantity that can summarize the conditional distribution of the outcome variable. Of course there are other summary quantities, for instance, the median and the 25% and 75% quantiles, all of which describe the distribution in some sense. It is noted that the quantiles are better estimators than the mean for skewed or heavy-tailed data. However, the quantity of interest in this analysis is the total medical cost, which is directly related to the mean but not to a single quantile. In order to estimate the total medical cost or the mean, a series of quantiles should be estimated so that an empirical estimate of the distribution can be obtained. The regression of quantiles is called quantile regression. Koenker and Hallock (2001) said that “Quantile regression seeks to extend these ideas to the estimation of conditional quantile functions – models in which quantiles of the conditional distribution of the response variable are expressed as functions of observed covariates.”

Quantile regression can be viewed as a generalized median regression. In a median regression, the output of the regression describes the relation between the median, or 50% quantile, and the covariates. In contrast to mean regression, which minimizes the sum of squared differences between the estimated values and the observed values, median regression estimators minimize the sum of the absolute values of the differences:

\[
\min_{\beta} \sum_{i=1}^{n} \left| y_i - X_i'\beta \right| . \qquad (51)
\]

Now let τ range from 0 to 100%; a regression on the τth quantile is

\[
\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - X_i'\beta\right), \qquad (52)
\]
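The minimization in (51) and (52) can be sketched in a few lines of Python. The sketch assumes the standard check function ρ_τ(u) = u(τ − 1{u < 0}), which reduces to half the absolute value at τ = 0.5; this definition is an assumption here, since it is not spelled out in the surviving text. For one covariate plus an intercept, a minimizer of the piecewise-linear objective can always be found among the lines that pass through two data points, so the hypothetical helper below simply enumerates those lines. This brute-force search is for illustration only and is not how production software such as quantreg solves the problem.

```python
from itertools import combinations

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def quantile_line(x, y, tau):
    """Fit y ~ b0 + b1 * x at quantile tau by searching lines through data pairs."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a regression line
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(check_loss(yk - (b0 + b1 * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, b0, b1)
    loss, b0, b1 = best
    return b0, b1, loss

# Toy data (assumed): four points on the line y = x and one very large cost
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.0, 1.0, 2.0, 3.0, 100.0]

b0_med, b1_med, loss_med = quantile_line(x, y, 0.5)
```

In this toy example the median fit follows the four collinear points and is not pulled toward the single large observation, which is the robustness property that motivates quantile methods for skewed cost data.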
implementation of quantile regression can be achieved through existing software such as quantreg in R by Koenker (2009). A natural estimator of the conditional mean is the average of all conditional quantiles. The marginal effects are now specified with respect to each quantile.

A detailed example of quantile regression analysis can be found in Koenker and Hallock (2001). In general, investigators can set a series of values of τ, say from 10% to 90% in steps of 10%. That gives nine regression results, each of which describes the relation between the covariates and the corresponding quantile. A major advantage of quantile regression over linear regression is that it reveals the different effects of covariates across the distribution of outcomes. Regression on the mean averages these effects and reports only the averaged value. It is quite possible that a feature behaves differently for subjects with relatively low costs and for those with relatively high costs, as shown in the example in Koenker and Hallock (2001). A possible explanation is that there are unknown features even after some features are controlled for; those features affect the costs and interact with the controlled features. This concept is similar to mixture models, where unknown features define different models. But quantile regression does not attempt to identify the classifier; it simply performs the regression on different quantiles.

Prediction

As discussed in previous sections, various methods and models can be employed to discover and quantify the association between covariates and medical cost in the target population. Naturally, one would be interested in whether it is possible to predict the future medical cost for an individual, or a group of people, given certain information. It is worth noting that prediction is a very broad subject in which methods arise from various disciplines, and a full treatment is beyond the scope of this chapter. In this section, a brief overview of prediction methods and some important concepts about prediction are introduced, leaving the details to be explored by readers.

Some Basic Concepts of Prediction Models

The primary question of interest is how to accurately predict the response, in this case the medical cost, given other individual information (predictors) and previous knowledge (the observed sample and maybe the theoretical model). A secondary question is how to estimate the accuracy, i.e., the prediction error, of the proposed method. This type of prediction is called supervised learning in the sense that there is a response (or outcome) that can be used to judge how well the method does. Usually this is achieved by specifying a loss function that penalizes the method based on the deviation from the true response, e.g., the squared error and the absolute error. The regression methods described in the last section can be counted as methods of supervised learning, where most of them use squared error loss and quantile regression uses several versions of absolute error loss. Notice that an additional assumption is needed in order to make the prediction valid: the sample being predicted should be from the same population from which the observed sample is drawn. There are many other methods available, for example, principal component analysis (PCA), support vector machines (SVM), neural networks, random forests, and so on. A general and broad introduction to these methods can be found in Friedman et al. (2001).

As for the accuracy of prediction, several measures are available, for example, root mean squared error (RMSE) and mean squared prediction error (MSPE). RMSE is defined as the square root of the mean squared error (MSE), which can be estimated by the mean of the squared differences between the fitted values and the true values. MSPE is the mean of the squared differences between the predicted values and the true values. The difference between MSPE and MSE is that the model used to generate the predicted values is fitted on another dataset, while the MSE is calculated with the model fitted on the same dataset. In
other words, it requires two independent datasets to estimate MSPE but only one for MSE. The dataset used to fit the model is called the training set, and the other one is called the test set. MSE is almost always smaller than MSPE. Theoretically, MSPE is a better measure of accuracy than RMSE. However, estimating MSPE requires two independent datasets, which might be a luxury for a study with a small sample size. Meanwhile, relying on MSE might result in overfitting the current dataset, so that the model is not valid for generalization to other datasets.

Recall that the purpose of prediction is to predict the response with the highest accuracy, so the next question is how to choose the best of all these models, which is called model selection in the literature. The basic idea is to estimate the accuracy measure that each model achieves on the study dataset and to choose the one with the best performance. One question that researchers often encounter is how to decide which predictors, and how many of them, should be included in the model. Say the measure of accuracy is MSPE. Ideally, there should be a sufficiently large training set to fit the model and a test set that could give a good estimate of MSPE. However, this might not be the case in a real-world study. There are different approaches to overcome the limitation of sample size and generate an acceptable estimate of MSPE, such as pseudo out-of-sample forecasting and cross validation. Take cross validation as an example: a k-fold cross validation randomly divides the sample into k subsamples. Each subsample in turn is kept as the test set while the other k − 1 subsamples are used to fit the model. One can take the average of the k fitted models as the single fitted model. The average of the k MSPEs is then used as a quantity that summarizes how this model performs and as an estimate of the MSPE. The model that has the lowest average MSPE will then be chosen. A common mistake in doing cross validation is to somehow use the whole dataset in fitting the model, for instance, using the whole dataset to choose predictors and then fitting the model with these predictors by “k-fold cross validation.” The MSPE calculated in that manner would be smaller than the true value, and it cannot serve as an estimate of the true MSPE. Also, since the way it is generated is similar to that of MSE, it might also result in overfitting when it is used as a measure to choose the best model.

Even with cross validation, overfitting is still a problem. Throwing more predictors into the model will result in a smaller MSPE in most cases. The small MSPE presents a problem since it is possible that the fitted model has been modified to describe, and only describe, the observed sample, or training set, and thus the model is limited in being generalized to other samples from the population. Therefore, there is a trade-off between generalizability and accuracy.

Difference from Regression Analysis

At first sight, prediction and statistical inference are similar to each other in the context of regression: there is an observed sample, with several predictors (or covariates) and an outcome variable; one builds a model to describe the association between predictors and outcomes so that the mean, quantiles, or the distribution of the outcome can be explained by a function of the predictors. However, the focus of these two analyses is different. For statistical inference, the target is to describe the relationship between the covariates and outcomes in the population from which the sample is drawn. For prediction, the major interest lies in the accuracy of the predicted value, regardless of whether the model makes sense or not. For instance, it is acceptable to look at the fitted model and say that the predictive ability of certain predictors is high, but one should not overinterpret relationships discovered in a prediction model. In addition, further assumptions are needed if the regression model is used for prediction. The most important assumption is that the new sample should be from the same population on which the model is fitted, so that it is legitimate to use the model fitted on the observed sample to make predictions. Another point is that the conditional expectation of the response given the predictors has different interpretations under different settings. In regression analysis, it is the average response for those who have the given levels of predictors, the uncertainty of which is estimated by the standard error. For prediction model, it is
the predicted expectation of the response given the level of the predictors, the uncertainty of which is estimated by MSPE. Generally speaking, the prediction error is larger than the standard error.

Appendix

Concept of General Pivots

actually a discussion of the generalized pivotal quantity. Hannig et al. (2006) refined the definition given by Weerahandi (1993) and discovered that a subclass of generalized pivotal quantities is of particular interest and has good properties. This subclass is named the fiducial generalized pivotal quantity (FGPQ) because of its close connection with Fisher's (1935) fiducial argument.
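Suppose there exist mappings \(f_1, \ldots, f_k\), with \(f_j : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}\), such that \(f = (f_1, \ldots, f_k)\) is an invertible pivotal quantity with inverse mapping \(g(s, \cdot)\). Define

\[
\begin{aligned}
R_\theta = R_\theta(S, S^{*}, \xi) &= \pi\!\left(g_1(S, f(S^{*}, \xi)), \ldots, g_k(S, f(S^{*}, \xi))\right) \\
&= \pi\!\left(g_1(S, E^{*}), \ldots, g_k(S, E^{*})\right),
\end{aligned}
\qquad (53)
\]

where \(E^{*} = f(S^{*}, \xi)\) is an independent copy of \(E\). Then \(R_\theta\) is a FGPQ for θ = π(ξ). When θ is a scalar parameter, an equal-tailed two-sided (1 − α)100% GCI for θ is given by \(R_{\theta,\alpha/2} \le \theta \le R_{\theta,1-\alpha/2}\). Here \(R_{\theta,\gamma} = R_{\theta,\gamma}(s)\) denotes the 100γth percentile of the distribution of \(R_\theta\) conditional on \(S = s\). One-sided generalized confidence bounds are obtained in an obvious manner.

This method is only valid in problems where complete statistics exist. For the incomplete cases, the authors gave two generalizations of this method. For more details, please see Hannig et al. (2006).

Variances and Estimators for Back-Transformations

\[
\begin{aligned}
\omega_1(x) ={}& \frac{\partial \mu(x)}{\partial \beta}\,\Sigma_\beta\,\frac{\partial \mu(x)}{\partial \beta'} + \frac{\partial \mu(x)}{\partial \gamma}\,\Sigma_\gamma\,\frac{\partial \mu(x)}{\partial \gamma'} + 2\,\frac{\partial \mu(x)}{\partial \beta}\,\Sigma_{\beta\gamma}\,\frac{\partial \mu(x)}{\partial \gamma'} \\
&+ 2\,\frac{\partial \mu(x)}{\partial \beta}\,\Sigma_{1D\beta} + 2\,\frac{\partial \mu(x)}{\partial \gamma}\,\Sigma_{1D\gamma} + \Sigma_{1DD},
\end{aligned}
\]

\[
\begin{aligned}
\omega_{2j}(x) ={}& \frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\,\Sigma_\beta\,\frac{\partial^2 \mu(x)}{\partial \beta'\,\partial x_j} + \frac{\partial^2 \mu(x)}{\partial x_j\,\partial \gamma}\,\Sigma_\gamma\,\frac{\partial^2 \mu(x)}{\partial \gamma'\,\partial x_j} + 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\,\Sigma_{\beta\gamma}\,\frac{\partial^2 \mu(x)}{\partial \gamma'\,\partial x_j} \\
&+ 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\,\Sigma_{2D\beta} + 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \gamma}\,\Sigma_{2D\gamma} + \Sigma_{2DD},
\end{aligned}
\]

\[
\begin{aligned}
\omega_{3j}(x) ={}& \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_\beta\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \beta'\,\partial x_j}\right) + \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \gamma}\right)\Sigma_\gamma\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \gamma'\,\partial x_j}\right) \\
&+ 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_{\beta\gamma}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \gamma'\,\partial x_j}\right) \\
&+ 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_{3D\beta} + 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \gamma}\right)\Sigma_{3D\gamma} + \Sigma_{3DD}.
\end{aligned}
\]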
The derivatives:
" #
@^
μ ðxÞ @h x,β^
¼μ^ ðxÞ , ^γ
@β @β μ ðxÞ 1 X
@^ n
@mi x,β,^
¼ ,
@^
μ ðxÞ @s2 ðx,^γ Þ @β n i¼1 @β
¼μ^ ðxÞ 0:5 ,
μ ðxÞ 1 X
@^ n ^γ
@mi x,β,^
@γ @γ ¼ ,
" # @γ n i¼1 @γ
@^
μ ðxÞ @^ μ ðxÞ @h x,β^ @s2 ðx,^γ Þ
¼ þ 0:5 ^ ðxÞ 1 X
@2μ n ^γ
@m2i x,β,^
@xj @β @β @xj @xj ¼ ,
" # @xj @β n i¼1 @xj @β
@ 2 h x,β^
þ^μ ðxÞ , ^ ðxÞ 1 X
@2μ n ^γ
@m2i x,β,^
@xj @β ¼ ,
" # @x @γ
j n i¼1 @x @γ
j
@^
μ ðxÞ @^ μ ðxÞ @h x,β^ s2 ðx,^γ Þ
¼ þ 0:5
@xj @γ @γ @xj @xj !1
2 2 X n
@k xi ,β^ @k xi ,β^
^ βγ ¼
Σ
@ s ðx,^γ Þ @β @β0
þ0:5^ μ ðxÞ : i¼1
@xj @γ ! !1
@k xi ,β^ @s2 ðxi ,^γ Þ Xn
@s2 ðxi ,^γ Þ @sðxi ,^γ Þ
^e i ^η i
@β @γ @γ @γ 0
The variances derived from delta methods are i¼1
Weerahandi S. Generalized confidence intervals. J Am Stat Assoc. 1993;88(423):899–905. ISSN 01621459. URL http://www.jstor.org/stable/2290779
Weisberg S. Applied linear regression, volume 528. Wiley; 2005.
Welsh AH, Zhou XH. Estimating the retransformed mean in a heteroscedastic two-part model. J Stat Plann Infer. 2006;136(3):860–81.
Wu J, Wong ACM, Jiang G. Likelihood-based confidence intervals for a log-normal mean. Stat Med. 2003;22(11):1849–60.
Zhou XH. Estimation of the log-normal mean. Stat Med. 1998;17(19):2251–64.
Zhou XH, Gao S. Confidence intervals for the log-normal mean. Stat Med. 1997;16(7):783–90.
Zhou XH, Gao S. One-sided confidence intervals for means of positively skewed distributions. Am Stat. 2000:100–4.
Zhou XH, Tu W. Comparison of several independent population means when their samples contain log-normal and possibly zero observations. Biometrics. 1999;55(2):645–51.
Zhou XH, Tu W. Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics. 2000;56(4):1118–25.
Zhou XH, Lin H, Johnson E. Non-parametric heteroscedastic transformation regression models for skewed data with an application to health care costs. J R Stat Soc Ser B Stat Methodol. 2008;70(5):1029–47.
Zhou X-H, Gao S, Hui SL. Methods for comparing the means of two independent log-normal samples. Biometrics. 1997;53(3):1129–35. ISSN 0006341X. URL http://www.jstor.org/stable/2533570
Instrumental Variable Analysis
21
Michael Baiocchi, Jing Cheng, and Dylan S. Small
Contents
Introduction
Example: Neonatal Intensive Care Units
The Fundamentals
Methods to Address Selection Bias
Instrumental Variables: NICU Example Revisited
Sources of Instruments in Health Services Research Studies
IV Assumptions and Estimation for Binary IV and Binary Treatment
Framework and Notation
Two-Stage Least Squares (Wald) Estimator
More Efficient Estimation
Estimation with Observed Covariates
Understanding the Treatment Effect That IV Estimates
Relationship Between Average Treatment Effect for Compliers and Average Treatment Effect for the Whole Population
Characterizing the Compliers
Understanding the IV Estimate When Compliance Status Is Not Deterministic
Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions
Assessing the IV Assumptions
Sensitivity Analysis
M. Baiocchi (*)
Department of Statistics, Stanford University, Stanford,
CA, USA
e-mail: baiocchi@stanford.edu
J. Cheng
Department of Preventive and Restorative Dental Sciences,
University of California, San Francisco School of
Dentistry, San Francisco, CA, USA
e-mail: jing.cheng@ucsf.edu
D. S. Small
University of Pennsylvania, Philadelphia, PA, USA
e-mail: dsmall@wharton.upenn.edu
and/or conduct randomized studies. Observational data offers a sensible alternative source of data for developing evidence about the implications of different medical interventions. However, for studies using observational data to be considered as a reliable source for evidence of causal effects, great care is needed to design studies in a way that limits the number of alternative explanations for observed differences in outcomes between intervention and control. This chapter will examine instrumental variables as a framework for designing high-quality observational studies. A few of the common pitfalls to be aware of will be discussed.

Example: Neonatal Intensive Care Units

weight, congenital disorder indicators, parity, and information about the mother’s socioeconomic status. Yet even with this level of detail, our data cannot characterize the full set of clinical factors that a physician or family considers when deciding whether to route a preemie to a high-intensity care unit. As shall be discussed, these missing attributes will cause us considerable problems.

What one wants is not the naïve comparison of rates of death, that is, the percentage of preemies who died at the different types of NICUs, but the difference in probabilities of death for each preemie given whether the preemie was to be delivered at a low-level facility or a high-level facility. This is the causal effect of treatment. This concept is formalized below.
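The gap between the naïve comparison of death rates and the causal effect of treatment can be illustrated with a small calculation. The Python sketch below uses two hypothetical strata of preemies (high-risk and low-risk) with made-up death risks and routing probabilities; none of these numbers come from the NICU data discussed in this chapter. High-risk preemies are assumed to be routed to high-level NICUs more often, which is the kind of unmeasured selection described above.

```python
# Hypothetical strata: share of population, probability of delivery at a
# high-level NICU, and death risk under each type of facility (all assumed).
strata = [
    {"share": 0.5, "p_high": 0.8, "risk_high": 0.20, "risk_low": 0.30},  # high-risk
    {"share": 0.5, "p_high": 0.2, "risk_high": 0.02, "risk_low": 0.03},  # low-risk
]

def naive_difference(strata):
    """Observed death rate at high-level minus low-level NICUs."""
    n_high = sum(s["share"] * s["p_high"] for s in strata)
    n_low = sum(s["share"] * (1 - s["p_high"]) for s in strata)
    d_high = sum(s["share"] * s["p_high"] * s["risk_high"] for s in strata)
    d_low = sum(s["share"] * (1 - s["p_high"]) * s["risk_low"] for s in strata)
    return d_high / n_high - d_low / n_low

def average_causal_effect(strata):
    """Population average of (risk at high-level) - (risk at low-level)."""
    return sum(s["share"] * (s["risk_high"] - s["risk_low"]) for s in strata)

naive = naive_difference(strata)        # compares different mixes of preemies
causal = average_causal_effect(strata)  # compares the same preemies both ways
```

In this constructed example the naïve difference is positive, so high-level NICUs appear harmful, while the average causal effect is negative in every stratum and overall. The reversal arises only because the sicker preemies are concentrated at the high-level facilities.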
$D_i = 1$ and subject $i$ taking the control as $D_i = 0$. To formally denote the outcome subject
$i$ would experience under the treatment and control, write $Y(D_i = 1)$ and $Y(D_i = 0)$,
respectively. To simplify the notation, let $Y_{1i}$ and $Y_{0i}$ denote the potential outcome
under treatment and the potential outcome under control, respectively. In this chapter, $Y$ will
be thought of as a scalar, though it is possible to develop a framework where $Y$ is a vector of
outcomes. Excellent resources exist for reading up on the potential outcome framework (e.g.,
Rosenbaum 2002; Pearl 2009; Hernán and Robins 2013).
The ultimate, often unattainable, quantity of interest, namely, the individual level treatment
effect, can be described as

$$\Delta_i = Y_{1i} - Y_{0i}$$

Thus, $\Delta_i$ will tell us the difference in outcome, for subject $i$, between taking the
treatment and control. If this quantity could be observed, then the benefit from intervention
would be known explicitly. But, in practice only one or the other of the potential outcomes is
observed. To see this, write the observed outcome, denoted $Y_i^{obs}$ for the $i$th individual,
as a function of the potential outcomes (Neyman 1990; Rubin 1974):

$$Y_i^{obs} = D_i Y_{1i} + (1 - D_i) Y_{0i}$$

heart attack. High blood pressure is known to correlate with higher risk of heart attack, so it
is tempting to control for this covariate. Controlling for blood pressure is likely to improve
the precision of the estimate if a pretreatment blood pressure measure is used. It would be a
mistake to use a posttreatment measurement of blood pressure as a control because this
measurement may be affected by the drug and would thus result in an attenuated estimated causal
effect. Intuitively, this is because the estimation procedure is limiting the comparison in
outcome not just to people who took the drug versus people who didn’t, but to people who took the
drug and then had a certain level of blood pressure versus people who didn’t take the drug and
had the same level of blood pressure. The impact from the drug may have already happened via the
lowering of the blood pressure.
Let’s denote these measured pretreatment characteristics as $X_i$ for the $i$th subject. Further,
the subjects are likely to have characteristics which were not recorded. Let’s denote these
unobserved characteristics as $U_i$ for the $i$th subject. To keep things simple, assume that the
covariates are linearly related to the outcomes like so

$$Y_{1i} = X_i^T \beta_1 + U_i^T \alpha_1$$
$$Y_{0i} = X_i^T \beta_0 + U_i^T \alpha_0$$
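The warning above about posttreatment covariates can be checked with a small simulation. The sketch below is only an illustration under assumed numbers that do not come from the chapter: a randomized drug that lowers blood pressure by 10 units, a continuous risk score that rises by 0.02 per unit of posttreatment blood pressure, and a direct drug effect of -0.1. The helper names `ols` and `simulate` are hypothetical.

```python
import numpy as np

def ols(y, *columns):
    """Least squares fit with an intercept; returns coefficients (intercept first)."""
    design = np.column_stack([np.ones_like(y), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def simulate(n=200_000, seed=0):
    """Simulate a randomized drug whose effect partly runs through blood pressure."""
    rng = np.random.default_rng(seed)
    d = rng.integers(0, 2, size=n).astype(float)   # randomized treatment indicator
    bp_pre = rng.normal(130.0, 15.0, size=n)        # pretreatment blood pressure
    # Assumed: the drug lowers blood pressure by 10 units on average.
    bp_post = bp_pre - 10.0 * d + rng.normal(0.0, 5.0, size=n)
    # Assumed: risk rises 0.02 per unit of posttreatment blood pressure,
    # plus a direct drug effect of -0.1 that bypasses blood pressure.
    y = 0.02 * bp_post - 0.1 * d + rng.normal(0.0, 1.0, size=n)
    return d, bp_pre, bp_post, y

d, bp_pre, bp_post, y = simulate()

# Total causal effect by construction: direct effect plus the effect via blood pressure.
total_effect = -0.1 + 0.02 * (-10.0)

effect_pre = ols(y, d, bp_pre)[1]    # adjusting for the pretreatment measure
effect_post = ols(y, d, bp_post)[1]  # adjusting for the posttreatment measure

print(f"total effect by construction:      {total_effect:.3f}")
print(f"estimate, pretreatment control:    {effect_pre:.3f}")
print(f"estimate, posttreatment control:   {effect_post:.3f}")
```

Under these assumptions, adjusting for the pretreatment measure should recover roughly the total effect, while adjusting for the posttreatment measure should recover only the direct part, because the part of the effect that runs through blood pressure has been conditioned away.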
Substituting the linear forms of the potential outcomes into the observed outcome gives

$$Y_i^{obs} = X_i^T \beta_0 + D_i \left( X_i^T \beta_1 - X_i^T \beta_0 + U_i^T \alpha_1 - U_i^T \alpha_0 \right) + U_i^T \alpha_0$$

It is standard in econometrics to think of the above model as a regression, where the coefficient
on the treatment variable comes from two sources of variation: the first source is the variation
due to the observed covariates, $(X_i^T \beta_1 - X_i^T \beta_0)$, and the second is the
variation due to the unobserved covariates, $(U_i^T \alpha_1 - U_i^T \alpha_0)$, where $D_i$ may
be correlated with $U_i$. It is common to interpret the first source of variation as the gains
for the average person with covariate levels $X_i$ and to refer to the second source of variation
as the idiosyncratic gains for subject $i$. The idiosyncratic gains are the part of this model
which allows persons $i$ and $j$ to differ in gains from treatment even when $X_i = X_j$.

Selection Bias
One of the biggest problems with observational studies is that there is selection bias. Loosely
speaking, selection bias arises from how the subjects are sorted (or sort themselves) into the
treatment or control groups. The intuition here is that the treatment group was different from
the control group even before the intervention, and the two groups would probably have had
different outcomes even if there had been no intervention at all. Selection bias can occur in a
couple of different ways, but one way to write it is

$$f(X, U \mid D = 1) \neq f(X, U \mid D = 0)$$

that is, the joint distribution of the covariates for those who received the treatment is
different than for those who received the control. If this is true, that there is selection bias,
then

$$E[Y_1 - Y_0 \mid X] \neq E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 0]$$

This is problematic because the left-hand side of this expression is our unobservable quantity of
interest, but the right-hand side is made up of directly observable quantities. Yet the
corresponding equality is relied on in other settings, namely, experimentation. Why is that
acceptable?
In an experiment, because of randomization, it is known that

$$(X, U) \perp\!\!\!\perp D,$$

where $\perp\!\!\!\perp$ denotes independence. And it follows that

$$E[Y_1 - Y_0 \mid X] = E[Y_1 \mid X, D = 1] - E[Y_0 \mid X, D = 0]$$

Though it is often a dubious claim, many of the standard observational study techniques require
an assumption which essentially says that the only selection between treated and control groups
is on levels of the observed covariates, i.e., $U \perp\!\!\!\perp D \mid X$. This is sometimes
referred to as overt selection bias. Typically, if overt selection bias is the only form of bias,
then either conditioning on observed covariates (e.g., by using a regression) or matching is
enough to address overt bias. One particular assumption that is invoked quite often in the
current health literature is the absence of omitted variables (i.e., only overt bias).
Hidden bias exists when there are imbalances in the unobserved covariates. Let’s use the observed
outcome formula again, rewriting it like so:

$$Y_i^{obs} = X_i^T \beta_0 + D_i E[\Delta \mid X] + U_i^T \alpha_0 + D_i \left( U_i^T \alpha_1 - U_i^T \alpha_0 \right)$$

A least squares regression of $Y$ on $D$ based on the model above will tend to produce biased
estimates for $E[\Delta \mid X]$ when $D$ is correlated with either $U_i^T \alpha_0$ or
$(U_i^T \alpha_1 - U_i^T \alpha_0)$. This can arise from unobserved covariates which influence
both potential outcomes and selection into treatment. This bias is referred to as hidden bias. If
the average treatment effects given $X$, $E[\Delta \mid X]$, and the hidden biases given $X$,
$E[U_i^T \alpha_1 \mid X, D = 1] - E[U_i^T \alpha_0 \mid X, D = 0]$, are the same for all $X$,
then the regression estimate of $E[\Delta \mid X]$ is biased by

$$E[U_i^T \alpha_1 \mid X, D = 1] - E[U_i^T \alpha_0 \mid X, D = 0]$$
Methods to Address Selection Bias

In a randomized experiment setting, inference on the causal effect of treatment on the outcome
requires no further assumption than the method for randomizing subjects into the treatment or
control (Fisher 1949). The randomization guarantees independence of assigned treatment from the
covariates. This independence is for all covariates, both observed and unobserved. By observed
covariates we mean those covariates which appear in the analyst’s data set; unobserved covariates
are all of those that do not. If the sample is large enough, then this independence means that
the treatment group will almost surely have quite a similar covariate distribution as the control
group. Therefore, any variation noted in the outcome is more readily attributed to the variation
in the treatment level rather than variation in the covariates.
The primary challenge to observational studies is that selection into treatment is not randomly
assigned. Usually there are covariates, both observed and unobserved, which determine who
receives treatment and who receives control. In such a case, variation in the outcome is not
easily attributable to treatment levels because covariates are different between the different
levels as well.
There are techniques which were created to address this selection bias. These methods can be
classified (roughly) into two groups: (1) those methods which address only the observed selection
bias and (2) those methods which attempt to address selection bias on both the observed as well
as unobserved covariates. Falling into the first category are techniques like regression,
Bayesian hierarchical modeling, propensity score matching, and inverse probability weighting. The
second category includes methods like instrumental variables, regression discontinuity, and
difference in differences.

Methods to Address Overt Selection Bias
Only through special justification should methods which address only overt bias be considered
valid. Usually, this justification takes the form of an assumption. Informally, this assumption
can be thought of as saying: selection into the treatment is occurring only on variables which
are observed. Formally, this assumption is often written as

$$(Y_0, Y_1) \perp\!\!\!\perp D \mid X,$$
$$0 < \Pr(D = 1 \mid X) < 1$$

where $\perp\!\!\!\perp$ denotes the conditional independence between the treatment and the joint
distribution of the counterfactual outcomes given $X$. Two random variables are conditionally
independent given a third variable if and only if they are independent in their conditional
distribution given the conditioning variable. The above assumption, essentially saying that all
needed covariates are measured, has a few different names: strongly ignorable treatment
assignment (Rosenbaum and Rubin 1983), selection on observables (Heckman and Robb 1985),
conditional independence, no hidden bias (only overt bias due to $X$), no unmeasured confounders
(in the epidemiology literature), or the absence of omitted variable bias (in the econometrics
literature).
To assume strongly ignorable treatment assignment in a medical application is to go a bit against
intuition. In practice, the analyst has access to some subset of the recorded information from
the patients’ interaction with the health system. Currently, most analysts do not have access to
many measurements the medical decision makers have (e.g., results of labs, biometric
information), so they are forced to use transactional information (e.g., insurance claims) which
is good for indicating the presence of a condition but not necessarily the severity. It is
possible that as electronic health records become more readily available, this problem will
diminish, but currently this should be a source of great skepticism for methods relying on the
assumption of no unobserved bias. But the issue does not stop here. The health analyst should be
aware that medical practitioners are keen observers and intuitively adept at identifying issues
which may go either unrecorded or may even be unquantifiable (e.g., practitioners will regularly
refer to the frailty of a patient, which seems to be a generally understood yet unmeasurable
quality of a patient). Given the additional information the medical decision
makers have, and their desire to choose an optimal outcome, medical decision makers are actively
working against the reasonableness of strongly ignorable treatment assignment.
It is unfortunate that methods which were designed only to address overt bias have become the
default tools of choice in the literature. Given the complexity of health decisions, it strains
credibility that all variables which influence treatment and outcome are recorded and available
to the analyst. The default for health analysts (and critically minded reviewers) should be to
assume unobserved selection is occurring and to look for ways of mitigating it.

Instrumental Variables: A Framework to Address Overt Bias and Bias Due to Omitted Variables
Regression, propensity score matching, and any methods predicated on only overt bias do not
address selection on unobserved covariates. It is important to be aware of this because a
well-informed researcher needs to judge if available covariates are enough to make a compelling
argument for the absence of omitted variables. This is often a dubious claim because (1) a clever
reviewer will find several variables missing from your data set and/or (2) there are intangible
variables that are difficult, or perhaps inconceivable, to measure. Instrumental variable (IV)
techniques are one way of addressing unobserved selection bias.
It is important to note that IV techniques do not come for free; they rest on hefty assumptions.
It is important to consider these assumptions carefully before deciding to use an IV analysis.
An instrumental variable (IV) design takes advantage of randomness which occurs in the treatment
assignment to help address imbalances in the unobserved variables. An instrument is a haphazard
nudge toward acceptance of a treatment that affects outcomes only to the extent that it affects
acceptance of the treatment. In settings in which treatment assignment is mostly deliberate and
not random, there may nevertheless exist some essentially random nudges to accept treatment, so
that use of an instrument might extract bits of random treatment assignment from a setting that
is otherwise quite biased in its treatment assignments.
There have been many different formulations of IV, reflecting the diverse academic traditions
that use IV. Though IVs existed in the literature for quite some time, Angrist et al. (1996) used
the potential outcome framework to bring greater clarity to the math of IV. For the health
analyst, perhaps Holland (1988) offers the most intuitive introduction to IVs, framing IV as a
randomized trial with noncompliance. The frameworks for IV discussed in both Angrist et al.
(1996) and Holland (1988) enhance the classic econometric presentation of IVs where the focus is
on correlation with the error term. Health analysts will likely find these introductions most
engaging.
To illustrate IVs, consider the NICU example from earlier.

Instrumental Variables: NICU Example Revisited

Neonatal intensive care units (NICUs) have been established to deliver high-intensity care for
premature infants (those infants born before 37 weeks of gestation). Considering all of the
preemies that were delivered in Pennsylvania between 1995 and 2005, 2.26% of the preemies
delivered at high-level NICUs died, while only 1.25% of the preemies who were delivered at
low-level NICUs died. No one believes the difference in outcomes reported above is solely
attributable to the difference in level of intensity of treatment. People believe it is due to
difference in covariates. Based on the observable covariates, this is plausible because the
preemies delivered at high-level NICUs weighed almost 250 g less than the preemies which were
delivered at low-level NICUs (2,454 g at high-level NICUs vs. 2,693 g at low-level NICUs).
Similarly, preemies delivered at high-level NICUs were born a week earlier than their
counterparts at low-level NICUs on average (34.5 vs. 35.5 weeks).
Unfortunately, full medical records were not available for this study. Only birth and death
certificates and a form UB-92 that hospitals provided were available. It is quite likely that not
all
necessary covariates in our dataset are available, so assuming only overt bias is likely to lead
to biased estimates. To attempt to deal with this problem, Baiocchi et al. (2010) and Lorch et al.
(2012a) used an instrumental variable approach. They used distance to treatment facility as an
instrument, because travel time largely determines the likelihood that a mother will deliver at a
given facility but appears to be largely uncorrelated with the level of severity a preemie
experiences.
To help visualize the problem, look at Fig. 1 below. This is an example of a directed acyclic
graph (Pearl 2009). The arrows denote causal relationships. Read the relationship between
variables D and Y like so: changing the value of D causes Y to change. In our example, Y
represents mortality. The variable D indicates whether or not a baby attended a high-level NICU.
Our goal is to understand the arrow connecting D to Y. In order to keep the current example
simple, assume there are no observed covariates (which would be denoted using an X in Fig. 1). In
general, IV techniques are able to adjust for variation in observed covariates (see section
“Estimation with Observed Covariates”).
The U variable causes consternation as it represents the unobserved level of severity of the
preemie, and it is causally linked to both mortality (sicker babies are more likely to die) and
to which treatment the preemie receives (sicker babies are more likely to be delivered in
high-level NICUs). Because U is not directly observed, it cannot be precisely adjusted for using
statistical methods such as propensity scores or regression. If the story stopped with just D, Y,
and U, then the effect of D on Y could not be estimated.

Fig. 1 Directed acyclic graph for the relationship between an instrumental variable Z, a treatment
D, unmeasured confounders U, and an outcome Y

Instrumental variable estimation makes use of an uncomplicated form of variation in the system.
What is needed is a variable, typically called an instrument (represented by Z in Fig. 1), that
has very special characteristics. It takes some practice to understand exactly what constitutes a
good instrumental variable.
Consider excess travel time as a possible instrument. Excess travel time is defined as the time
it takes to travel from the mother’s residence to the nearest high-level NICU minus the time it
takes to travel to the nearest low-level NICU. If the mother lives closest to a high-level NICU,
then excess travel time will take on negative values. If she lives closest to a low-level NICU,
excess travel time will be positive.
There are three key features a variable must have in order to qualify as an instrument (see
section “IV Assumptions and Estimation for Binary IV and Binary Treatment” for mathematical
details on these features and additional assumptions for IV methods). The first feature
(represented by the directed arrow from Z to D in Fig. 1) is that the instrument causes a change
in the treatment assignment. When a woman becomes pregnant she has a high probability of
establishing a relationship with the proximal NICU, regardless of the level, because she is not
anticipating having a preemie. Proximity as a leading determinant in choosing a facility has been
discussed in Phibbs et al. (1993). By selecting where to live, mothers assign themselves to be
more or less likely to deliver in a high-level NICU. The fact that changes in the instrument are
associated with changes in the treatment is verifiable from the data.
The second feature (represented by the crossed out arrow from Z to U) is that the instrument is
not associated with variation in unobserved variables U that also affect the outcome. That is, Z
is not connected to the unobserved confounding that was a worry to begin with. In our example,
this would mean unobserved severity is not caused by variation in geography. Since high-level
NICUs
tend to be in urban areas and low-level NICUs tend to be the only type in rural areas, this
assumption would be dubious if there were high levels of pollutants in urban areas (think of
Manchester, England circa the Industrial Revolution) or if there were more pollutants in the
drinking water in rural areas than in urban areas. The pollutants may have an impact on the
unobserved levels of severity. The assumption that the instrument is not associated with
variation in the unobserved variables, while most certainly an assumption, can at least be
corroborated by looking at the values of variables that are perhaps related to the unobserved
variables of concern (see section “Assessing the IV Assumptions”).
The third feature (represented by the crossed out line from Z to Y in Fig. 1) is that the
instrument does not cause the outcome variable to change directly. That is, it is only through
its impact on the treatment that the instrumental variable affects the outcome. In our case,
presumably a nearby hospital with a high-level NICU affects mortality only if the baby receives
care at that hospital. That is, proximity to a high-level NICU in and of itself does not change
the probability of death for a preemie, except through the increased probability of the preemie
being delivered at the high-level NICU. This is often referred to as the exclusion restriction
and can be a slippery concept to get a hold of. See Angrist et al. (1996) for discussion of the
exclusion restriction. In our case it seems quite reasonable.

Sources of Instruments in Health Services Research Studies

In this section, common types of IVs that have been used in health services research studies
will be described, and issues to consider in assessing their validity will be discussed. One way
to study the effect of a treatment when that treatment cannot be controlled is to conduct a
randomized encouragement trial. In such a trial, some subjects are randomly chosen to get extra
encouragement to take the treatment and the rest of the subjects receive no extra encouragement
(Holland 1988). For example, Permutt and Hebel (1989) studied the effect of maternal smoking
during pregnancy on an infant’s birthweight using a randomized encouragement trial in which some
mothers received extra encouragement to stop smoking through a master’s level staff person
providing information, support, practical guidance, and behavioral strategies (Sexton and Hebel
1984). For a randomized encouragement trial, the randomized encouragement assignment (1 if
encouraged, 0 if not encouraged) is a potential IV. The randomized encouragement is independent
of unmeasured confounders because it is randomly assigned by the investigators and will be
associated with the treatment if the encouragement is effective. The only potential concern with
the randomized encouragement being a valid IV is that the randomized encouragement might have a
direct effect on the outcome not through the treatment. For example, in the randomized
encouragement trial to encourage expectant mothers to stop smoking, the encouragement could have
a direct effect if the staff person providing the encouragement also encouraged expectant
mothers to stop drinking alcohol during pregnancy. To minimize a potential direct effect of the
encouragement, Sexton and Hebel (1984) asked the staff person providing encouragement to avoid
recommendations or information concerning other habits that might affect birthweight such as
alcohol or caffeine consumption and also prohibited discussion of maternal nutrition or weight
gain.
When comparing two treatments, one of which is only provided by specialty care providers and one
of which is provided by more general providers, the distance a person lives from the nearest
specialty care provider has often been used as an IV. Proximity to a specialty care provider
particularly enhances the chance of being treated by the specialty care provider for acute
conditions. For less acute conditions, patients/providers have more time to decide and plan where
to be treated, and proximity may have less of an influence on treatment selection. For treatments
that are stigmatized such as substance abuse treatment, proximity could have a negative effect on
the chance of being treated. A classic example of the use of distance as an IV is McClellan et
al.’s study of the
effect of cardiac catheterization for patients suffering a heart attack (McClellan et al. 1994);
the IV used in the study was the differential distance the patient lives from the nearest
hospital that performs cardiac catheterization to the nearest hospital that does not perform
cardiac catheterization. Another example is the study of the effect of high-level versus
low-level NICUs (Lorch et al. 2012a) that was discussed in section “Instrumental Variables: NICU
Example Revisited.” Because distance to a specialty care provider is often associated with
socioeconomic characteristics, it will typically be necessary to control for socioeconomic
characteristics in order for distance to potentially be independent of unmeasured confounders.
The possibility that distance might have a direct effect because the time it takes to receive
treatment affects outcomes needs to be considered in assessing whether distance is a valid IV.
A general strategy for finding an IV for comparing two treatments A and B is to look for
naturally occurring variation in medical practice patterns at the level of geographic region,
hospital, or individual physician, and then use whether the region/hospital/individual physician
has a high or low use of treatment A as the IV. Brookhart and Schneeweiss (2007) termed these IVs
“preference-based instruments” because they are derived from the assumption that different
providers or groups of providers have different preferences or treatment algorithms dictating
how medications or medical procedures are used. Examples of studies using preference-based IVs
are Brooks et al. (2004), who studied the effect of surgery plus irradiation versus mastectomy
for breast cancer patients using geographic region as the IV; Johnston (2000), who studied the
effect of surgery versus endovascular therapy for patients with a ruptured cerebral aneurysm
using hospital as the IV; and Brookhart et al. (2006), who studied the benefits and risks of
selective cyclooxygenase 2 inhibitors versus nonselective nonsteroidal antiinflammatory drugs for
treating gastrointestinal problems using individual physician as the IV. For proposed
preference-based IVs, it is important to consider that the patient mix may differ between the
different groups of providers with different preferences, which would make the preference-based
IV invalid unless patient mix is fully controlled for. It is useful to look at whether measured
patient risk factors differ between groups of providers with different preferences. If there are
measured differences, there are likely to be unmeasured differences as well; see section
“Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions” for further
discussion. Also, for proposed preference-based IVs, it is important to consider whether the IV
has a direct effect; a direct effect could arise if the group of providers that prefers treatment
A treats patients differently in ways other than the treatment under study compared to the
providers who prefer treatment B. For example, Newman et al. (2012) studied the efficacy of
phototherapy for newborns with hyperbilirubinemia and considered the frequency of phototherapy
use at the newborn’s birth hospital as an IV. However, chart reviews revealed that hospitals that
use more phototherapy also have a greater use of infant formula; use of infant formula is also
thought to be an effective treatment for hyperbilirubinemia. Consequently, the proposed
preference-based IV has a direct effect (going to a hospital with higher use of phototherapy also
means a newborn is more likely to receive infant formula even if the newborn does not receive
phototherapy) and is not valid. The issue of whether a proposed preference-based IV has a direct
effect can be studied by looking at whether the IV is associated with concomitant treatments like
use of infant formula (Brookhart and Schneeweiss 2007). A related way in which a proposed
preference-based IV can have a direct effect is that the group of providers who prefer treatment
A may have more skill than the group of providers who prefer treatment B. Also, providers who
prefer treatment A may deliver treatment A better than those providers who prefer treatment B
because they have more practice with it; for example, doctors who perform surgery more often may
perform better surgeries. Korn and Baumrind (1998) discuss a way to assess whether there are
provider skill effects by collecting data from providers on whether or not they would have
treated a
different provider’s patient with treatment A or B based on the patient’s pretreatment records.
Another common source for an IV is calendar time. Variations in the use of one treatment versus
another could result from changes in guidelines, changes in formularies or reimbursement
policies, changes in physician preference (e.g., due to marketing activities by drug makers),
release of new effectiveness or safety information, or the arrival of new treatments to the
market (Brookhart et al. 2010). For example, Shetty et al. (2009) studied the effect of hormone
replacement therapy (HRT) on cardiovascular health among postmenopausal women using calendar time
as an IV. HRT was widely used among postmenopausal women until 2002; observational studies had
suggested that HRT reduced cardiovascular risk, but the Women’s Health Initiative randomized
trial reported opposite results in 2002, which caused HRT use to drop sharply. A proposed IV
based on calendar time could be associated with confounders that change in time such as the
characteristics of patients who enter the cohort, changes in other medical practices, and changes
in medical coding systems (Brookhart et al. 2010). The most compelling type of IV based on
calendar time is one where a dramatic change in practice occurs in a relatively short period of
time (Brookhart et al. 2010).
Another general source for potential IVs is genetic variants which affect treatment variables.
For example, Voight et al. (2012) studied the effect of HDL cholesterol on myocardial infarction
using as an IV the genetic variant LIPG 396Ser allele, for which carriers have higher levels of
HDL cholesterol but similar levels of other lipid and non-lipid risk factors compared with
noncarriers. Another example is Wehby et al. (2011), who studied the effect of maternal smoking
on orofacial clefts in babies using genetic variants that increase the probability that a mother
smokes as IVs. The approach of using genetic variants as an IV is called Mendelian randomization
because it makes use of the random assignment of genetic variants conditional on parents’ genes
discovered by Mendel. Although genetic variants are randomly assigned conditional on parents’
genes, genetic variants need to satisfy additional assumptions to be valid IVs:

1. Not associated with unmeasured confounders through population stratification. Most Mendelian
   randomization analyses do not condition on parents’ genes, creating the potential of the
   proposed genetic variant IV being associated with unmeasured confounders through population
   stratification. Population stratification is a condition where there are subpopulations, some
   of which are more likely to have the genetic variant, and some of which are more likely to
   have the outcome through mechanisms other than the treatment being studied. For example,
   consider studying the effect of alcohol consumption on hypertension. Consider using the ALDH2
   null variant, which is associated with alcohol consumption, as an IV (individuals who are
   homozygous for the ALDH2 null variant have severe adverse reactions to alcohol consumption and
   tend to drink very little (Lawlor et al. 2008)). The ALDH2 null variant is much more common in
   people with Asian ancestry than other types of ancestry (Goedde et al. 1992). Suppose ancestry
   was not fully measured. If ancestry is associated with hypertension through means other than
   differences in the ALDH2 null variant (e.g., through different ancestries tending to have
   different diets), then ALDH2 would not be a valid IV because it would be associated with an
   unmeasured confounder.
2. Not associated with unmeasured confounders through genetic linkage. Genetic linkage is the
   tendency of genes that are located near to each other on a chromosome to be inherited together
   because the genes are unlikely to be separated during the crossing over of the mother’s and
   father’s DNA (Sham 1998). Consider using a gene A as an IV where gene A is genetically linked
   to a gene B that has a causal effect on the outcome through a pathway other than the treatment
   being studied. If gene B is not measured and controlled for, then gene A is not a valid IV
   because it is associated with the unmeasured confounder gene B.
3. No direct effect through pleiotropy. Pleiotropy refers to a gene having multiple functions. If
   the genetic variant being used as an IV affects the outcome through a function other than
   affecting the treatment being studied, this would mean the genetic variant has a direct
   effect. For example, consider the use of the APOE genotype as an IV for studying the causal
   effect of low-density lipoprotein cholesterol (LDLc) on myocardial infarction (MI) risk. The
   ε2 variant of the APOE gene is associated with lower levels of LDLc but is also associated
   with higher levels of high-density lipoprotein cholesterol, less efficient transfer of very
   low-density lipoproteins and chylomicrons from the blood to the liver, greater postprandial
   lipemia, and an increased risk of type III hyperlipoproteinemia (the last three of which are
   thought to increase MI risk) (Lawlor et al. 2008). Thus, the gene APOE is pleiotropic,
   affecting myocardial infarction risk through different pathways, making it unsuitable as an IV
   to examine the causal effect of any one of these pathways on MI risk.

Didelez and Sheehan (2007) and Lawlor et al. (2008) provide good reviews of Mendelian
randomization methods.
Another source of IVs for health services research studies is timing of admission variables. For
example, Ho et al. (2000) used day of the week of hospital admission as an IV for waiting time
for surgery to study the effects of waiting time on length of stay and inpatient mortality among
patients admitted to the hospital with a hip fracture. Day of the week of admission is associated
with waiting time for surgery because many surgeons only do non-emergency operations on weekdays,
and therefore patients admitted on weekends may have to wait longer for surgery. In order for
weekday versus weekend admission to be a valid IV, patients admitted on weekdays versus weekends
must not differ on unmeasured characteristics (i.e., the IV is independent of unmeasured
confounders) and other aspects of hospital care that affect the patients’ outcomes besides
surgery must be comparable on weekdays versus weekends (i.e., the IV has no direct effect).
Another example of a timing of admission variable used as an IV is hour of birth as an IV for a
newborn’s length of stay in the hospital (Goyal et al. in press; Malkin et al. 2000).
An additional general source of potential IVs for health services research studies is insurance
plans which may vary in the amount of reimbursement they provide for different treatments. For
example, Cole et al. (2006) used drug co-payment amount as an IV to study the effect of β-blocker
adherence on clinical outcomes and health-care expenditures after a hospitalization for heart
failure. In order for variations in insurance plan like drug co-payment amount to be a valid IV,
insurance plans must have comparable patients after controlling for measured confounders (i.e.,
the IV is independent of unmeasured confounders), and insurance plans must not have an effect on
the outcome of interest other than through influencing the treatment being studied (i.e., the IV
has no direct effect).

IV Assumptions and Estimation for Binary IV and Binary Treatment

In this section, the simplest setting of a binary instrument and a binary treatment will be
considered. The main ideas in instrumental variable methods are most easily understood in this
setting, and the ideas will be expanded to more complicated settings later.

Framework and Notation

The Neyman-Rubin potential outcome framework will be used to describe causal effects (Neyman
1990; Rubin 1974). Let $Z_i$ denote the IV for subject $i$, where $Z_i = 0$ or $1$ for a binary
IV. Level 1 of the IV is assumed to mean the subject was encouraged to take level 1 of the
treatment, where the treatment has levels 0 and 1. Let $D_i^z$ be the potential treatment
received for subject $i$ if she were assigned level
21 Instrumental Variable Analysis 491
z of the IV – D1i is the treatment that subject 5. Stable unit treatment value assumption
i would receive if she were assigned level 1 of (SUTVA). This assumption says that the treat-
the IV and D0i is treatment that i would receive if ment affects only the subject taking the treat-
she were assigned level 0 of the IV. The ment and the treatment effect is stable through
observed treatment received for subject i is Di time (see Angrist et al. 1996; Rubin 1990 for
z, d
DZii . Let Y i be the potential outcome for details). The first part of this assumption that
subject i if she were assigned level z of the IV the treatment affects only the subject taking the
and level d of the treatment – there are four such treatment is called the no interference
potential outcomes Y 1i , 1 , Y 1i , 0 , Y 0i , 1, and Y 0i , 0 : assumption.
However, only one of them will be observed in
practice. The observed outcome for subject i is The first three assumptions are the assumptions
Z , DZi depicted in Fig. 1.
Y i Y i i i : Let Xi denote observed covariates
The fourth assumption, monotonicity, plays a
for subject i.
role in interpreting the standard IV estimate as a
Angrist et al. (1996) considered an IV to be a
causal effect for a certain subpopulation. A sub-
variable satisfying the following five
ject in a study with binary IV and treatment can
assumptions:
be classified into one of four latent compliance
classes based on the joint values of potential
1. IV is correlated with treatment received treatment received (Angrist et al. 1996):
Ci = never taker (nt) if D0i , D1i ¼ ð0, 0Þ, com-
E D1i j Xi > E D0i j Xi . 0 1
2. IV is independent of unmeasured confounders plier (co) if D i , Di ¼ ð0, 1Þ , always taker
0 1 i i ¼ ð1, 1Þ , and defier (de) if
0 1
(conditional on covariates). (at) if D , D
Di , Di ¼ ð1, 0Þ . Table 1 shows the relation-
ship between observed groups and latent com-
Zi is independent of D1i , D0i , Y i1, 1 , Y i0, 1 , Y i0, 0 j X:
pliance classes. Under the monotonicity
assumption, the set of defiers will be empty.
3. Exclusion restriction (ER). This assumption The never takers and always takers do not
says that the IV affects outcomes only change their treatment status when the instru-
through its effect on treatment received: ment changes, so under the ER assumption, the
0
Y z, d ¼ Y z , d . Under the ER, write Y d ¼ Y z, d
i i i i potential treatment and potential outcome under
for any z, that is, Y 1i is the potential outcome for either level of the IV (Zi = 1 or 0) is the same.
subject i if she were to receive level 1 of the Consequently, the IV is not helpful for learning
treatment (regardless of her level of the IV), about the treatment effect for always takers or
and Y 0i is the potential outcome if she were to never takers. Compliers are subjects who change
receive level 0 of the treatment. This assump- their treatment status with the instrument, that is,
tion is called the no direct effect assumption. the subjects would take the treatment if they were
4. Monotonicity assumption. This assumption encouraged to take it by the IV but would not
says that there are no subjects who are otherwise take the treatment. Because these subjects
“defiers,” who would only take level 1 of the change their treatment with the level of the IV, the IV
treatment if not encouraged to do so, that is, no is helpful for learning about their treatment effects.
subjects with D1i ¼ 0, D0i ¼ 1: The average causal effect for this subgroup,
E Y 1i Y 0i j Ci ¼ co , is called the complier aver- for compliers. When monotonicity does not hold,
age causal effect (CACE) or the local average treat- the standard IV estimator Eq. 3 discussed in section
ment effect (LATE). It provides the information on “Two Stage Least Squares (Wald) Estimator” esti-
the average causal effect of receiving the treatment mates the quantity (Angrist et al. 1996).
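The form of this quantity can be sketched from the compliance classes. The display below is a reconstruction under Assumptions 1–3 and 5 that follows the argument of Angrist et al. (1996); the symbols $\pi_{co}$ and $\pi_{de}$ for the proportions of compliers and defiers are notation assumed for this sketch only.

```latex
% Sketch: what the ratio of intention-to-treat effects identifies without monotonicity.
% \pi_{co} and \pi_{de} (complier and defier proportions) are notation assumed here.
\frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)}
  = \frac{\pi_{co}\, E[Y_i^1 - Y_i^0 \mid C_i = co] \;-\; \pi_{de}\, E[Y_i^1 - Y_i^0 \mid C_i = de]}
         {\pi_{co} - \pi_{de}}
```

As a hypothetical illustration, with $\pi_{co} = 0.5$, $\pi_{de} = 0.3$, a complier effect of 1, and a defier effect of 3, the ratio is $(0.5 - 0.9)/0.2 = -2$, which is negative although both effects are positive. If compliers and defiers share the same average effect, the ratio reduces to that common effect.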
Equation 1 could potentially be negative even if the treatment has a positive effect for all subjects (Angrist et al. 1996). However, the IV method estimate of the CACE is not generally sensitive to small violations of the monotonicity assumption (Angrist et al. 1996). Additionally, if the treatment has the same effect for compliers and defiers, the monotonicity assumption is not needed, as Eq. 1 equals the CACE, $E[Y_i^1 - Y_i^0 \mid C_i = co]$ (Robins and Greenland 1996). For further discussion of understanding the treatment effect that the IV method estimates, see section "Understanding the Treatment Effect That IV Estimates."

The fifth IV assumption, SUTVA, also plays a role in interpreting what the standard IV method estimate Eq. 3 estimates. Consider in particular the no interference assumption part of SUTVA, that subject A receiving the treatment affects only subject A and not other subjects. In the NICU study, the no interference assumption is reasonable: if preemie A is treated at a high-level NICU, this does not affect preemie B's outcome. If there were crowding effects (e.g., treating additional babies at a hospital decreases the quality of care for babies already under care at that hospital), this assumption might not be true. SUTVA is also not appropriate for situations like estimating the effect of a vaccine on an individual because herd immunization would lead to causal links between different people (Hudgens and Halloran 2008). When no interference fails to hold, the IV method is roughly estimating the difference between the effect of the treatment and the spillover effect of some units being treated on those units left untreated (see Sobel 2006 for a precise formulation and details).

In economics, a latent index model is often considered for causal inference about the effect of a binary treatment based on a structural equation model or two-stage linear model, for example,

$$D_i^* = \alpha_0 + \alpha_1 Z_i + e_{i1}$$
$$Y_i = \beta_0 + \beta_1 D_i + e_{i2}$$

where

$$D_i = \begin{cases} 1 & \text{if } D_i^* > 0 \\ 0 & \text{if } D_i^* \le 0 \end{cases}, \qquad Z_i \perp\!\!\!\perp (e_{i1}, e_{i2}).$$

Vytlacil (2002) shows that a nonparametric version of the latent index model is equivalent to Assumptions 1–5 above that Angrist et al. (1996) use to define an IV.

Two-Stage Least Squares (Wald) Estimator

Let us first consider IV estimation when there are no observed covariates X. For binary IV and treatment variable, Angrist et al. (1996) show that under the framework and assumptions in section "Framework and Notation," the CACE is nonparametrically identified by
$$E[Y_i^1 - Y_i^0 \mid C_i = co] = \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)}, \qquad (2)$$

and $\pi_a$, $\pi_c$, and $\pi_n$ denote the proportion of always takers, compliers, and never takers, respectively. Note that by Assumptions 1–5 and the mixture structure of the outcomes of the four observed groups shown in Table 1,
i ¼
pc0 pni ¼ 1, pc0
i , pi 0, i ¼ 1, . . . , n0 ,
of and locations of the pieces. n
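The ratio in Eq. 2 can be illustrated with a small simulation. The following is a minimal sketch, not the chapter's own analysis: the class proportions, the treatment effects by class, and the names `wald_estimate` and `simulate` are all hypothetical choices made for illustration. Always takers are given a higher baseline outcome so that a naive treated-versus-untreated comparison is confounded, while the ratio of intention-to-treat effects should be close to the complier effect.

```python
import numpy as np

def wald_estimate(y, d, z):
    """Ratio of the IV's effect on the outcome to its effect on treatment (Eq. 2)."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    itt_y = y[z == 1].mean() - y[z == 0].mean()  # numerator of Eq. 2
    itt_d = d[z == 1].mean() - d[z == 0].mean()  # denominator of Eq. 2
    return itt_y / itt_d

def simulate(n=200_000, seed=0):
    """Hypothetical data satisfying monotonicity (no defiers)."""
    rng = np.random.default_rng(seed)
    # Assumed class proportions: 50% compliers, 30% never takers, 20% always takers.
    c = rng.choice(["co", "nt", "at"], size=n, p=[0.5, 0.3, 0.2])
    z = rng.integers(0, 2, size=n)        # randomly assigned binary IV
    d0 = (c == "at").astype(int)          # treatment received if Z = 0
    d1 = (c != "nt").astype(int)          # treatment received if Z = 1
    d = np.where(z == 1, d1, d0)          # observed treatment
    # Assumed treatment effects by class; the complier effect (CACE) is 2.0.
    effect = np.select([c == "co", c == "at", c == "nt"], [2.0, 5.0, -1.0])
    y0 = rng.normal(size=n) + (c == "at")  # baseline outcome differs by class
    y = y0 + effect * d                    # exclusion restriction: Z enters only via D
    return y, d, z

y, d, z = simulate()
cace_hat = wald_estimate(y, d, z)                # should be near the assumed CACE of 2.0
naive = y[d == 1].mean() - y[d == 0].mean()      # confounded comparison, for contrast
print(f"Wald estimate: {cace_hat:.2f}, naive difference: {naive:.2f}")
```

In this setup the Wald ratio targets the complier effect only; the always-taker and never-taker effects do not enter it, which matches the interpretation of the CACE given above.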
$$E[Y^1 - Y^0] = P(C = co)\,E[Y^1 - Y^0 \mid C = co] + P(C = at)\,E[Y^1 - Y^0 \mid C = at] + P(C = nt)\,E[Y^1 - Y^0 \mid C = nt].$$

The IV method provides no direct information on the average treatment effect for always takers ($E[Y^1 - Y^0 \mid C = at]$) or the average treatment effect for never takers ($E[Y^1 - Y^0 \mid C = nt]$). However, the IV method can provide useful bounds on the average treatment effect for the whole population if a researcher is able to put bounds on the difference between the average treatment effect for compliers and the average treatment effects for never takers and always takers based on subject matter knowledge. For example, suppose a researcher is willing to assume that this difference is no more than $b$, then

$$E[Y^1 - Y^0 \mid C = co] - b[1 - P(C = co)] \le E[Y^1 - Y^0] \le E[Y^1 - Y^0 \mid C = co] + b[1 - P(C = co)], \qquad (6)$$

where the quantities on the left and right-hand sides of Eq. 6 other than $b$ can be estimated as discussed in section "IV Assumptions and Estimation for Binary IV and Binary Treatment." For binary or other bounded outcomes, the boundedness of the outcomes can be used to tighten bounds on the average treatment effect for the whole population or other treatment effects (Balke and Pearl 1997; Cheng and Small 2006). Qualitative assumptions, such as that the average treatment effect is larger for always takers than compliers, can also be used to tighten the bounds (e.g., Cheng and Small 2006; Bhattacharya et al. 2008; Siddique 2009).

The IV method estimates the average treatment effect for the subpopulation of compliers. Who are these compliers and how do they compare to noncompliers? To understand this better, it is useful to characterize the compliers in terms of their distribution of observed covariates (Angrist and Pischke 2009; Brookhart and Schneeweiss 2007). The mean of a covariate $X_i$ among the compliers is

$$E[X_i \mid C = co] = \frac{E[\kappa_i X_i]}{E[\kappa_i]}, \qquad (7)$$

where

$$\kappa_i = 1 - \frac{D_i (1 - Z_i)}{1 - P(Z_i = 1 \mid X_i)} - \frac{(1 - D_i) Z_i}{P(Z_i = 1 \mid X_i)}$$

(Abadie 2003). The prevalence ratio of a binary characteristic X among compliers compared to the full population is $P(X = 1 \mid C = co)/P(X = 1)$. Table 2 shows the mean of various characteristics X among compliers versus the full population and also shows the prevalence ratio (where the sample estimates of $P(Z_i = 1 \mid X_i)$, $E[\kappa_i X_i]$, and $E[\kappa_i]$ are plugged into Eq. 7). Babies whose mothers are college graduates are slightly underrepresented (prevalence ratio = 0.87), and African-Americans are slightly overrepresented (prevalence ratio = 1.14) among compliers. Very low birthweight (<1500 g) and very premature babies (gestational age ≤ 32 weeks) are substantially underrepresented among compliers, with prevalence ratios around one-third; these babies are more likely to be always takers, that is, delivered at high-level NICUs regardless of mother's travel time. Babies whose mothers have comorbidities such as diabetes or hypertension are slightly underrepresented among compliers. Overall, Table 2 suggests that higher risk babies are underrepresented among the compliers. If the effect of high-level NICUs is greater for higher risk babies, then the IV estimate will underestimate the average effect of high-level NICUs for the whole population.

Table 2 Complier characteristics for NICU study. The second column shows the estimated proportion of compliers with a characteristic X, the third column shows the estimated proportion of the full population with the characteristic X, and the fourth column shows the estimated ratio of compliers with X compared to the full population with X

Characteristic X                  Prevalence of X     Prevalence of X       Prevalence ratio of X among
                                  among compliers     in full population    compliers to full population
Mother college graduate           0.23                0.26                  0.87
African-American                  0.17                0.15                  1.14
Birthweight < 1,500 g             0.03                0.09                  0.33
Gestational age ≤ 32 weeks        0.04                0.13                  0.34
Gestational diabetes              0.05                0.05                  0.91
Diabetes mellitus                 0.02                0.02                  0.77
Pregnancy-induced hypertension    0.08                0.10                  0.82
Chronic hypertension              0.02                0.02                  0.89

For an encouragement that is uniformly delivered, such as when patients who made an appointment at a psychiatric outpatient clinic are sent a letter encouraging them to attend the appointment (Kitcheman et al. 2008), it is clear that a subject is either a complier, always taker, never taker, or defier with respect to the encouragement. However, sometimes encouragements that are not uniformly delivered are used as IVs. For example, in the NICU study, consider the IV of whether the mother's excess travel time to the nearest high-level NICU is more than 10 min. If a mother whose excess travel time to the nearest high-level NICU was more than 10 min moved to a new home with an excess travel time less than 10 min, whether the mother would deliver her baby at a high-level NICU might depend on additional aspects of the move, such as the location and availability of public transportation at her new home (Joffe 2011) and the exact travel time to the nearest high-level NICU at her new home. Consequently, a mother may not be able to be deterministically classified as a complier or not a complier; she may be a complier with respect to certain moves but not others. Another example of nondeterministic compliance arises when physician preference for one drug versus another is used as the IV (e.g., Z = 1 if a patient's physician prescribes drug A more often than drug B): whether a patient receives drug A may depend on how strongly the physician prefers drug A (Brookhart and Schneeweiss 2007; Hernán and Robins 2006). Another situation in which nondeterministic compliance status can arise is that the IV may not itself be an encouragement intervention but a proxy for an encouragement intervention. Consider the case of Mendelian randomization, in which the IV is often a single nucleotide polymorphism (SNP) that might be part of a gene A. The SNP may be a marker for a gene B on the same chromosome that actually affects the level of the exposure D. The encouragement intervention is receiving the gene B that actually affects the level of the exposure D, and the SNP is just a proxy for this encouragement. Consequently, even if a subject's exposure level would change as a result of a change in gene B, whether the subject is a complier with respect to a change in the SNP depends on whether the change in the SNP leads to a change in the gene B, which is randomly determined through the process of recombination (Joffe 2011).

Brookhart and Schneeweiss (2007) provide a framework for understanding how to interpret the IV estimate when compliance status is not deterministic. Suppose that the study population can be decomposed into a set of κ + 1 mutually exclusive groups of patients based on clinical, lifestyle, and other characteristics such that within each group of patients, whether a subject receives treatment is independent of the effect of the treatment. All of the common causes of the potential treatment received $D^1$, $D^0$, and the potential outcomes $Y^1$, $Y^0$ should be included in the characteristics used to define these groups. For example, if there are L binary common causes of $(D^1, D^0, Y^1, Y^0)$,
then the subgroups can be the $\kappa + 1 = 2^L$ possible values of these common causes. Denote patient membership in these groups by the set of indicators $S = \{S_1, S_2, \ldots, S_\kappa\}$. Consider the following model for the expected potential outcome:

$$E[Y^d \mid S] = \alpha_0 + \alpha_1 d + \alpha_2^T S + \alpha_3^T S d.$$

The average effect of treatment in the population is $\alpha_1 + \alpha_3^T E[S]$, and the average effect of treatment in subgroup $j$ is $\alpha_1 + \alpha_{3,j}$. Under the IV assumptions 1–3 and 5 in section "Framework and Notation," that is, all the assumptions except monotonicity, the IV estimator estimates the following quantity:

$$\frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)} = \alpha_1 + \sum_{j=1}^{\kappa} \alpha_{3,j}\, E[S_j]\, w_j, \qquad (8)$$

where

$$w_j = \frac{E[D \mid Z = 1, S_j = 1] - E[D \mid Z = 0, S_j = 1]}{E(D \mid Z = 1) - E(D \mid Z = 0)}.$$

The IV estimator Eq. 8 is a "weighted average" of treatment effects in different subgroups, where the subgroups in which the instrument has a stronger effect on the treatment get more weight. Note that when the compliance class is deterministic, then the subgroups can be defined as the compliance classes and Eq. 8 just says that the IV estimator is the average treatment effect for compliers. In the NICU study, where compliance class may not be deterministic, Table 2 suggests that babies in lower-risk groups, for example, not very low birthweight or not very low gestational age, are weighted more heavily in the IV estimator. If there are subgroups for whom the instrument has no effect on their treatment level, then that subgroup gets zero weight. For example, mothers or babies with severe preexisting conditions may virtually always be delivered at a high-level NICU, so that the IV of excess travel time has no effect on their treatment level (Lorch et al. 2012a). If there are subgroups for whom the encouraging level of the instrument makes them less likely to receive the treatment, then this subgroup would get "negative weight" and Eq. 8 is not a true weighted average, potentially leading the IV estimator to have the opposite sign of the effect of the treatment. For example, Brookhart and Schneeweiss (2007) discussed studying the safety of metformin for treating type II diabetes versus other antihyperglycemic drugs among patients with liver disease using physician preference as the IV (Z = 1 if a physician is more likely to prescribe metformin than other antihyperglycemic drugs). Metformin is contraindicated in patients with liver disease, as it can cause lactic acidosis, a potentially fatal side effect. Brookhart and Schneeweiss (2007) speculated that physicians who infrequently use metformin will be less likely to understand its contraindications and would therefore be more likely to misuse it. If this hypothesis is true, then for estimating the effect of metformin on lactic acidosis, the IV estimator could mistakenly make metformin appear to prevent lactic acidosis, as patients of physicians with Z = 1 are at lower risk of being inappropriately treated with metformin. When the compliance class is deterministic, a subgroup getting negative weight means that there are defiers, violating the monotonicity assumption.

Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions

Assessing the IV Assumptions

This section will discuss assessing the two key IV assumptions: (1) the IV is independent of unmeasured confounders; (2) the IV affects outcome only through treatment received (the exclusion restriction).

One way of assessing whether the proposed IV is independent of unmeasured confounders conditional on measured confounders is to look at whether the proposed IV is associated with measured confounders. Although measured confounders can be controlled for, if the measured
confounder is only a proxy for the true confounder, then an association between the proposed IV and the measured confounder suggests that there will be an association between the IV and the unmeasured part of the true confounder. If there are two or more sources of confounding, then it is useful to examine if the observable part of one source of confounding is associated with the IV after controlling for the other sources of confounding. These ideas will be illustrated using the NICU study described in section "Instrumental Variables: NICU Example Revisited." Table 3 shows the imbalance of measured covariates across levels of the IV. The racial composition is very different between the near (Z = 1) and far (Z = 0) babies, with near babies being much more likely to be African-American. Since race has a substantial association with neonatal outcomes (Demissie et al. 2001; Lorch et al. 2012b), it is sensible to examine the association of other measured confounders with the IV after controlling for race. Table 4 shows the association of the IV with measured confounders for whites. The clinical measured confounders such as low birthweight, gestational age ≤ 32 weeks, and maternal comorbidities (diabetes and hypertension) are generally similar between near and far babies although there are some significant associations. This similarity between the clinical status of near and far babies and mothers after controlling for race provides some support that the IV is approximately, although not exactly, valid for whites. However, whether the mother is a college graduate differs substantially between white near and far mothers, suggesting that there may be residual confounding due to
Table 3 Imbalance of measured covariates across levels of the instrument for the NICU data. The prevalence difference ratio is the ratio of the imbalance of the measured covariates across levels of the instrument to the imbalance across levels of the treatment. The estimated proportion of compliers is P(D = 1|Z = 1) − P(D = 1|Z = 0) = 0.447, so that a prevalence difference ratio less than 0.447 for an X indicates that there would be less bias in the IV method from failing to adjust for X than from ordinary least squares that failed to adjust for X

Characteristic X                  P(X|near) (%)    P(X|far) (%)    p-value    Prevalence difference ratio
Birthweight < 1,500 g             9.4              7.7             <0.01      0.02
Mother college graduate           25.9             26.1            0.26       0.04
African-American                  25.6             4.6             <0.01      0.64
Gestational age ≤ 32 weeks        14.3             11.7            <0.01      0.23
Gestational diabetes              5.2              5.2             0.47       0.12
Diabetes mellitus                 1.8              1.9             0.07       0.16
Pregnancy-induced hypertension    10.6             10.1            <0.01      0.13
Chronic hypertension              1.9              1.3             <0.01      0.61
Table 4 Imbalance of measured covariates across levels of the instrument for babies born to white mothers in the NICU data. The prevalence difference ratio is the ratio of the imbalance of the measured covariates across levels of the instrument to the imbalance across levels of the treatment. The estimated proportion of compliers is P(D = 1|Z = 1, white) − P(D = 1|Z = 0, white) = 0.418, so that a prevalence difference ratio less than 0.418 for an X indicates that there would be less bias in the IV method from failing to adjust for X than from ordinary least squares that failed to adjust for X

Characteristic X                  P(X|near) (%)    P(X|far) (%)    p-value    Prevalence difference ratio
Birthweight < 1,500 g             7.5              7.2             0.07       0.04
Mother college graduate           34.4             26.8            <0.01      0.72
Gestational age ≤ 32 weeks        11.8             11.1            <0.01      0.07
Gestational diabetes              5.6              5.3             0.02       0.34
Diabetes mellitus                 1.8              1.9             0.08       0.17
Pregnancy-induced hypertension    10.6             10.1            <0.01      0.05
Chronic hypertension              1.6              1.3             <0.01      0.43
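The prevalence difference ratio described in the captions of Tables 3 and 4 can be computed directly from individual-level data. The following is a minimal sketch under stated assumptions: the function names and the eight-subject toy arrays are hypothetical and are not the NICU data, and the covariate, instrument, and treatment are all taken to be binary.

```python
import numpy as np

def prevalence_difference_ratio(x, z, d):
    """Imbalance of X across IV levels divided by its imbalance across treatment levels."""
    x, z, d = (np.asarray(a, dtype=float) for a in (x, z, d))
    imbalance_iv = x[z == 1].mean() - x[z == 0].mean()
    imbalance_trt = x[d == 1].mean() - x[d == 0].mean()
    return imbalance_iv / imbalance_trt

def iv_strength(z, d):
    """P(D = 1 | Z = 1) - P(D = 1 | Z = 0), the estimated proportion of compliers."""
    z, d = np.asarray(z, dtype=float), np.asarray(d, dtype=float)
    return d[z == 1].mean() - d[z == 0].mean()

# Hypothetical toy data: eight subjects, four at each level of the instrument.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
x = [1, 0, 1, 0, 1, 0, 0, 0]

pdr = prevalence_difference_ratio(x, z, d)   # (0.50 - 0.25) / (0.75 - 0.00)
strength = iv_strength(z, d)                 # 0.75 - 0.25
print(f"PDR = {pdr:.3f}, IV strength = {strength:.3f}")
print("PDR below IV strength:", pdr < strength)
```

In this toy example the ratio falls below the strength of the instrument, which is the situation the table captions describe as favoring the IV method over ordinary least squares when X is left unadjusted.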
The term $E(U \mid Z = 1) - E(U \mid Z = 0)$ is the difference in the prevalence of the risk factor U between levels of the IV. The total bias in the IV estimator is this difference multiplied by the excess risk of the outcome among patients with U = 1 divided by the strength of the IV. For the IV estimator to have less asymptotic bias than ordinary least squares (OLS), the following condition must hold (Brookhart and Schneeweiss 2007)

$$\frac{E[U \mid Z = 1] - E[U \mid Z = 0]}{E[U \mid D = 1] - E[U \mid D = 0]} < E(D \mid Z = 1) - E(D \mid Z = 0). \qquad (11)$$

In other words, the difference in the prevalence of U between levels of Z relative to the difference in the prevalence of U between levels of D must be less than the strength of the IV (Brookhart and Schneeweiss 2007). The left-hand side of Eq. 11 is called the prevalence difference ratio (PDR). In order for us to think that the IV analysis is likely to be less biased than OLS, the PDR should be less than the strength of the IV ($E[D \mid Z = 1] - E[D \mid Z = 0]$), particularly for those variables clearly related to the outcome. Table 4 shows that the PDRs are generally less than the strength of the IV (0.418) for whites, but the PDRs are often greater than the strength of the IV (0.503) for African-Americans, suggesting that the IV analysis reduces bias for whites compared to OLS but not for African-Americans.

A way of testing whether the two key IV assumptions (i.e., (i) the IV is independent of unmeasured confounders conditional on the measured confounders and (ii) the IV affects outcomes only through treatment received) hold is to find a subpopulation for whom the link between the IV and treatment received is thought to be broken and then test whether the IV is associated with the outcome in this subpopulation. The only way in which the IV could be associated with the outcome in such a subpopulation is if the IV was associated with unmeasured confounders or directly affected the outcome through a pathway other than treatment received. Figure 2 shows an example. Kang et al. (2013) study the effect of children in Africa getting malaria on their becoming stunted (having a height that is two standard deviations below the expected height for the child's age) and consider the sickle cell trait as a possible IV. The sickle cell trait is that a person inherits a copy of the hemoglobin variant HbS from one parent and normal hemoglobin from the other. While inheriting two copies of HbS results in sickle cell disease and substantially shortened life expectancy, inheriting only one copy (the sickle cell trait) is protective against malaria and is thought to have little detrimental effect on health (Aidoo et al. 2002). To test whether the sickle cell trait indeed does not affect stunting in ways other than reducing malaria and is not associated with unmeasured confounders, Kang et al. (2013) considered whether the sickle cell trait is associated with stunting among African-American children; the sickle cell trait has high prevalence among African-Americans but does not affect malaria because malaria is not present in the United States. Rehan (1981) and Kramer et al. (1978) found no evidence that sickle cell trait is associated with growth and development in African-American children. This provides evidence that the dashed lines in Fig. 2 are indeed absent, which would mean that the proposed IV of the sickle cell trait does indeed satisfy the two key IV assumptions of being independent of unmeasured confounders and affecting outcomes only through treatment received.

Angrist and Krueger (1991) also employed this strategy of finding a subpopulation for whom the link between the IV and treatment received is broken to test their IV of quarter of birth for studying the effect of education on earnings. The reason that quarter of birth is associated with education is that for students who plan to drop out of school as soon as they have reached the age at which they are no longer compelled to be in school (e.g., age 16), quarter of birth affects how much education these students will get before they drop out because children start school at different ages depending on their quarter of birth. However, for students who plan to go to college, quarter of birth does not affect their amount of schooling. Consequently, Angrist and Krueger (1991) looked at whether there was an absence of an association between quarter of birth and earnings among students who went to college to test the IV assumptions.

Newcomers to IV methods often think that the validity of the IV can be tested by regressing the outcome on treatment received, the IV, and measured confounders, and testing whether the coefficient on the IV is significant. However, this is not a valid test, as even if the IV assumptions hold, the coefficient on the IV would typically be nonzero. One way to see this is that if there are no measured confounders, the test amounts to testing whether (i) $E[Y \mid Z = 1, D = 1] - E[Y \mid Z = 0, D = 1] = 0$ and (ii) $E[Y \mid Z = 1, D = 0] - E[Y \mid Z = 0, D = 0] = 0$. These are the differences between (i) the average potential outcome of the group of always takers and compliers together when these subjects are encouraged to receive treatment and receive treatment versus those of always takers alone when they are not encouraged to receive treatment but do receive treatment and (ii) the average potential outcome of never takers when encouraged to receive treatment but do not receive treatment versus those of the group of never takers and compliers when they are not encouraged to receive treatment and do not receive treatment. If the IV assumptions hold, that is, the IV is not associated with unmeasured confounders and has no direct effect on the outcome other than through treatment received, then (i) is equal to zero if and only if the average potential outcomes of compliers and always takers are the same when both groups receive treatment, and (ii) is equal to zero if and only if the average potential outcomes of compliers and never takers are the same when both groups do not receive treatment. Typically, the average potential outcome of compliers and always takers (compliers and never takers) will not be the same when both groups receive (do not receive) treatment even if the IV assumptions hold.

Sensitivity Analysis

A sensitivity analysis seeks to quantify how sensitive conclusions from an IV analysis are to plausible violations of key assumptions. Sensitivity analysis methods for IV analyses have been developed by Angrist et al. (1996), Brookhart and Schneeweiss (2007), Small (2007), Small and Rosenbaum (2008), and Baiocchi et al. (2010), among others. Here an approach will be presented to sensitivity analysis for violations of the assumption that the IV is independent of unmeasured confounders. Assume that the concern is that the IV may be related to an unmeasured confounder U which has mean 0 and variance 1 and is independent of the measured confounders X (U can always be taken to be the residual of the unmeasured confounder given
the measured confounders to make this assumption plausible). Consider the following model:

$$Y_i^d = \alpha + \beta d + \gamma^T X_i + \delta U_i + e_i$$
$$U_i = \rho + \eta Z_i + v_i \qquad (12)$$
$$E(v_i \mid X_i, Z_i) = 0, \quad E(e_i \mid X_i, Z_i) = 0.$$

β is the causal effect of increasing D by one unit. The sensitivity parameters are δ, the effect of a one standard deviation increase in the unmeasured confounder on the mean of the potential outcome under no treatment, and η, how much higher the mean of the unmeasured confounder $U_i$ is in standard deviation units for $Z_i = 1$ versus $Z_i = 0$. Model (12) says that $Z_i$ would be a valid IV if both the measured confounders $X_i$ and the unmeasured confounder $U_i$ were controlled for. Under model (12), the following holds:

$$Y_i = \alpha + \beta D_i + \gamma^T X_i + \delta U_i + e_i$$
$$Y_i - \delta\eta Z_i = \alpha + \delta\rho + \beta D_i + \gamma^T X_i + e_i + \delta v_i$$
$$E(v_i \mid X_i, Z_i) = 0, \quad E(e_i \mid X_i, Z_i) = 0.$$

Consequently, a consistent estimate of and inferences for β can be obtained by carrying out a two-stage least squares analysis with $Y_i - \delta\eta Z_i$ as the outcome variable, $D_i$ as the treatment variable, $X_i$ as the measured confounders, and $Z_i$ as the IV. Table 6 shows a sensitivity analysis for the NICU study. If there was an unmeasured confounder U that decreased the death rate by 0.1% for a one standard deviation increase in U and was 0.5 standard deviations higher on average in subjects with Z = 1 versus Z = 0, then there would still be strong evidence that high-level NICUs reduce mortality substantially (lower end of 95% CI: 0.14% reduction). However, if there was an unmeasured confounder U that decreased the death rate by 0.5% for a one standard deviation increase in U and was 0.5 standard deviations higher in subjects with Z = 1 versus Z = 0, then there would no longer be strong evidence that high-level NICUs reduce mortality substantially. It can be useful to calibrate the effect of a potential unmeasured confounder U to that of a measured confounder. For example, an increase in gestational age from 30 to 33 weeks, which is a one standard deviation increase in gestational age, is associated with a reduction in the death rate of 2.2%, and the mean gestational age is 0.093 standard deviations smaller among near (Z = 1) versus far (Z = 0) babies. For a comparable U that reduced the death rate by 2.2% for a one standard deviation increase in U and was 0.093 standard deviations smaller in babies with Z = 1 versus Z = 0, there would still be strong evidence that high-level NICUs reduce mortality substantially (see the last row of Table 6).

A sensitivity analysis for violations of the assumption that the IV has no direct effect on the outcome can be carried out as follows. Suppose that the IV has a direct effect of λ but the IV is independent of unmeasured confounders, that is,

$$Y_i^{z,d} = \alpha + \beta d + \gamma^T X_i + \lambda z + e_i, \qquad E(e_i \mid X_i, Z_i) = 0. \qquad (13)$$

Then, a consistent estimate of and inferences for β can be obtained by carrying out a two-stage least squares analysis with $Y_i - \lambda Z_i$ as the outcome
Table 6 Estimates and 95% confidence intervals for b, deviation increase in the unmeasured confounder on the
the risk difference effect of a premature baby being deliv- mean of the potential outcome under no treatment, and ,
ered in a high-level NICU, for different values of the how much higher the mean of the unmeasured confounder
sensitivity parameters d, the effect of a one standard Ui is in standard deviation units for Zi = 1 versus Zi = 0
δ η β^ 95% CI for β
0 0 0.0059 (0.0091, 0.0027)
0.001 0.5 0.0046 (0.0079, 0.0014)
0.005 0.5 0.0004 (0.0029, 0.0036)
0.001 0.5 0.0071 (0.0104, 0.0039)
0.005 0.5 0.0121 (0.0154, 0.0089)
0.022 0.093 0.0110 (0.0142, 0.0078)
504 M. Baiocchi et al.
variable, Di as the treatment variable, Xi as the same. Under the assumption that the variance
measured confounders, and Zi as the IV. When a of the outcomes for the always takers,
proposed IV Z is thought to be independent of never takers, compliers under treatment,
unmeasured confounders but there is concern and compliers under control is the same σ 2
that Z might have a direct effect on the outcome, for each group, the asymptotic variance of
pffiffiffiffi
Joffe et al. (2008) proposed an extended instru- ^ 2SLS CACE , where CACE
N CACE ^ 2SLS is
mental variable strategy for obtaining an unbi- the two-stage least squares estimator Eq. 3, is
ased estimate of the causal effect of treatment
that requires having a covariate W which inter-
acts with Z in affecting treatment but for which σ 2 Var ðZ Þ
the direct effect of Z does not depend on W. This CovðD, ZÞ
method is described in section “Extended σ2
¼ ,
Instrumental Variable Method for When Pro- ½PðD ¼ 1j Z ¼ 1Þ PðD ¼ 1j Z ¼ 0Þ2
posed IV Has a Direct Effect.” (14)
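The sensitivity analysis for an unmeasured confounder described above (two-stage least squares with $Y_i - \delta\eta Z_i$ as the outcome) can be sketched in a few lines. This is a minimal illustration on simulated data, not the NICU analysis: the function names, the simulation settings, and the parameter values are all assumptions made for the example, and only point estimates are computed.

```python
import numpy as np

def two_stage_least_squares(y, d, z, x=None):
    """2SLS point estimate of the coefficient on d, using z as the IV."""
    n = len(y)
    ones = np.ones((n, 1))
    covs = ones if x is None else np.column_stack([ones, x])
    first = np.column_stack([covs, z])            # first stage: D on 1, X, Z
    gamma, *_ = np.linalg.lstsq(first, d, rcond=None)
    d_hat = first @ gamma                         # predicted treatment
    second = np.column_stack([covs, d_hat])       # second stage: Y on 1, X, D_hat
    coef, *_ = np.linalg.lstsq(second, y, rcond=None)
    return coef[-1]

def sensitivity_estimate(y, d, z, delta, eta, x=None):
    """2SLS estimate of beta after replacing Y with Y - delta * eta * Z."""
    return two_stage_least_squares(y - delta * eta * z, d, z, x)

# Simulated data (hypothetical values): U is shifted by eta when Z = 1,
# so Z is not a valid IV unless U is accounted for.
rng = np.random.default_rng(0)
n = 100_000
beta_true, delta, eta = 1.0, 0.4, 0.5
z = rng.integers(0, 2, size=n).astype(float)
u = eta * z + rng.normal(size=n)
d = (0.8 * z + 0.5 * u + rng.normal(size=n) > 0.5).astype(float)
y = beta_true * d + delta * u + rng.normal(size=n)

naive = two_stage_least_squares(y, d, z)                 # ignores U
adjusted = sensitivity_estimate(y, d, z, delta, eta)     # uses true delta, eta
```

In practice δ and η are unknown, so the adjusted estimate would be recomputed over a grid of plausible (δ, η) pairs, as in Table 6, rather than at a single value.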
confounders in a sample, and thus, some confounding will get transferred to the predicted value of the treatment. This will result in some unmeasured confounding getting transferred to the second-stage estimate of the treatment effect. Stock et al. (2002) studied what strength of IV is needed to ensure that two-stage least squares provides reliable inferences. They suggested looking at the first-stage partial F statistic for testing that the coefficient on the IV(s) is zero. For one IV, if this first-stage partial F statistic is less than about 10, the two-stage least squares inferences are misleading in the sense that the type I error rate of a nominal 0.05 level test is actually greater than 0.15. If more than one IV is used, then the first-stage partial F statistic needs to be larger to avoid misleading inferences: greater than 12 for two IVs, greater than 16 for five IVs, and greater than 21 for ten IVs.
A number of methods have been developed that provide accurate inferences when the IV is weak. One method is to use the permutation inference developed in Imbens and Rosenbaum (2005) and illustrated in Small and Rosenbaum (2008). Another method developed by Moreira (1990) is to consider the conditional distribution of the likelihood ratio statistic, conditioning on the value of nuisance parameters. This method is implemented in a Stata program CLRv2.
3. Highly sensitive to bias from unmeasured confounders. Recall formula (10) for the bias in the IV estimator when the proposed IV is associated with an unmeasured confounder U. The numerator measures the association between the IV and the unmeasured confounder (multiplied by how much the unmeasured confounder affects the outcome). The denominator is the proportion of compliers and reflects the strength of the IV. Thus, when the IV is weak (i.e., the proportion of compliers is small), the effect of the IV being invalid from being associated with an unmeasured confounder is greatly exacerbated, and even a minor association between the IV and an unmeasured confounder can lead to substantial bias if the IV is weak (Bound et al. 1995; Small and Rosenbaum 2008).

In summary, when the IV is weak, the IV estimate may have high variance, and if it is weak enough (i.e., partial F statistic less than 10), it is important to use inference methods other than two-stage least squares to provide accurate inferences. These inference methods may inform us that the confidence interval for the treatment effect is very wide, but it is possible that even when the IV is weak, if the treatment effect is large enough and the sample size is big enough, there may still be a statistically significant treatment effect assuming the IV is valid. The third problem with weak IVs is that they are very sensitive to bias from being slightly invalid, that is, being slightly correlated with unmeasured confounders. This problem does not go away with a larger sample size. A slightly biased but strong IV may be preferable to a less biased but weak IV (Small and Rosenbaum 2008).

Binary Outcomes

Often in health services research, the outcomes of interest take values which are not continuous and thus are not amenable to common techniques such as two-stage least squares (2SLS). In this section, methods appropriate for binary outcomes will be discussed. In the next section, methods appropriate for other noncontinuous outcome settings will be introduced. For good general reviews of estimating IV effects in the binary outcome case, see Clarke and Windmeijer (2012), Vansteelandt et al. (2011), and Angrist (2001) (along with associated comments).
In 2SLS, one regression is run predicting the treatment, and then the estimated value of the treatment from this model is put into a second regression of the outcome on the covariates and the predicted treatment. This type of estimator, where the predictions from one model are substituted into a second model, is often referred to as a two-stage predictor substitution (2SPS).
When first encountering situations with binary outcomes, most analysts will recognize that regular 2SLS is problematic because it will not respect boundary conditions (i.e., the functional form imposes no constraints on parameter space,
meaning 2SLS can produce logical absurdities such as probabilities greater than one or even negative). Through analogy to 2SLS, the naive analyst may consider changing the second-stage regression to be a logistic model (or perhaps a probit) in lieu of the linear model. This would be a 2SPS. Unfortunately, in general, 2SPS models do not have the nice orthogonality properties of 2SLS and produce biased estimates (Angrist and Pischke 2009; Wooldridge 1997). Other approaches should be considered. These approaches include the parametric approaches of Hirano et al. (2000) and the semiparametric approaches of Abadie (2003), Tan (2006), and Vansteelandt et al. (2011). Two other widely used approaches (two-stage residual inclusion and a binary probit model) and a relatively new approach (effect ratios) will be considered in detail below.

additivity of the terms on the right-hand side of the regression to separate the endogeneity of the treatment and allow unbiased estimation of the treatment effect. If M(·) is nonlinear, though, generally 2SPS will not maintain the separability of the confounding variables through the substitution method.
Another approach here is to use a two-stage residual inclusion (2SRI) model. The idea in a 2SRI is to model the unobserved covariates using the instrument, not the treatment, and thereby remove the endogeneity. The first stage in a 2SRI model is the same in that you model the treatment selection. The difference is that in the second stage you substitute in the residuals from the first stage, not the predicted treatment. As a formula:

$$
E(Y \mid D, X, U) = M\left(D\beta_D + X^{T}\beta_X + U^{T}\hat{\beta}_U\right)
\tag{17}
$$
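The two stages of 2SRI described above can be sketched as follows. This is an illustrative sketch only, under assumptions that are not from the text: the treatment is continuous and generated by a linear first stage, M is taken to be the logistic function, the logistic fit is a hand-written Newton-Raphson routine, and all names and simulation values are hypothetical.

```python
import numpy as np

def ols(design, target):
    """Ordinary least squares coefficients."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def logistic_fit(design, y, iterations=50):
    """Logistic regression coefficients by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(design @ beta)))
        weights = p * (1.0 - p)
        grad = design.T @ (y - p)
        hess = design.T @ (design * weights[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def two_stage_residual_inclusion(y, d, z, x):
    """Stage 1: regress D on (1, X, Z). Stage 2: logistic model for Y
    on (1, D, X, first-stage residual). Returns second-stage coefficients."""
    n = len(y)
    first = np.column_stack([np.ones(n), x, z])
    resid = d - first @ ols(first, d)
    second = np.column_stack([np.ones(n), d, x, resid])
    return logistic_fit(second, y)

# Simulated data (hypothetical values) with an unmeasured confounder u.
rng = np.random.default_rng(1)
n = 50_000
beta_d_true = 0.5
z = rng.integers(0, 2, size=n).astype(float)
x = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.8 * z + 0.5 * x + u
linear_predictor = -0.5 + beta_d_true * d + 0.3 * x + 1.0 * u
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-linear_predictor))).astype(float)

coef_2sri = two_stage_residual_inclusion(y, d, z, x)   # coef_2sri[1] is for D
coef_naive = logistic_fit(np.column_stack([np.ones(n), d, x]), y)
```

In this simulation the first-stage residual recovers the unmeasured confounder almost exactly, which is why the second stage is correctly specified; with a binary treatment or a misspecified first stage that need not hold, and standard errors would also have to account for the estimated residual.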
bivariate distribution of the error terms from the selection model and the error terms from the outcome model (Bhattacharya et al. 2006; Muthen 1979). This model leans on the parametric assumptions of the error terms, leaving the conclusions sensitive to modifications of the assumptions. Additionally, these models suffer from difficulty in maximizing the likelihood functions and trouble with calculating appropriate standard errors (Freedman and Sekhon 2010).

Matching-Based Estimator: Effect Ratio

Coming out of a different tradition, a class of estimator has been proposed which is also capable of dealing with binary outcomes in an IV setting. Proposed in Baiocchi et al. (2010), the “effect ratio” in a binary setting can be thought of as a risk difference estimator for the compliers. The effect ratio is predicated on having matched sets. In Baiocchi et al. (2010) matched pairs were constructed using a study design-based approach called near-far matching. Near-far matching will be discussed in the next section.
First, the notation required to discuss the effect ratio will be introduced. Assume there are I matched pairs, i = 1, ..., I, with 2 subjects, j = 1, 2, one treated subject and one control, or 2I subjects in total. If the jth subject in pair i receives the treatment, write $Z_{ij} = 1$, whereas if this subject receives the control, write $Z_{ij} = 0$, so $1 = Z_{i1} + Z_{i2}$ for i = 1, ..., I. The matched pairs were formed by matching for an observed covariate $x_{ij}$ but may have failed to control an unobserved covariate $u_{ij}$; that is, $x_{ij} = x_{ik}$ for all i, j, k, but possibly $u_{ij} \neq u_{ik}$.
For any outcome, each subject has two potential responses, one seen when the instrument encourages the subject to take the treatment, $Z_{ij} = 1$, the other seen when the instrument randomly assigns the subject to be encouraged to take the control, $Z_{ij} = 0$. Here, there are two responses, the potential outcomes $Y^{(Z_{ij}=1)}$, $Y^{(Z_{ij}=0)}$ and the potential treatment selections $D^{(Z_{ij}=1)}$, $D^{(Z_{ij}=0)}$. Abbreviate these as $(Y^{1}_{ij}, Y^{0}_{ij})$ and $(D^{1}_{ij}, D^{0}_{ij})$.

The effect ratio, λ, is the parameter

$$
\lambda = \frac{\sum_{i=1}^{I}\sum_{j=1}^{2}\left(Y^{1}_{ij} - Y^{0}_{ij}\right)}{\sum_{i=1}^{I}\sum_{j=1}^{2}\left(D^{1}_{ij} - D^{0}_{ij}\right)},
\tag{18}
$$

where it is implicitly assumed that $0 \neq \sum_{i=1}^{I}\sum_{j=1}^{2}\left(D^{1}_{ij} - D^{0}_{ij}\right)$. Here, λ is a parameter of the finite population of 2I individuals, and because $(Y^{0}_{ij}, Y^{1}_{ij})$ and $(D^{0}_{ij}, D^{1}_{ij})$ are not jointly observed, λ cannot be calculated from observable data, so inference is required.
To test the null hypothesis $H_0: \lambda = \lambda_0$, construct the following statistics:

$$
T(\lambda_0) = \frac{1}{I}\sum_{i=1}^{I}\left\{\sum_{j=1}^{2} Z_{ij}\left(Y_{ij} - \lambda_0 D_{ij}\right) - \sum_{j=1}^{2}\left(1 - Z_{ij}\right)\left(Y_{ij} - \lambda_0 D_{ij}\right)\right\}
= \frac{1}{I}\sum_{i=1}^{I} V_i(\lambda_0), \text{ say,}
\tag{19}
$$

where, because $Y_{ij} - \lambda_0 D_{ij} = Y^{1}_{ij} - \lambda_0 D^{1}_{ij}$ if $Z_{ij} = 1$ and $Y_{ij} - \lambda_0 D_{ij} = Y^{0}_{ij} - \lambda_0 D^{0}_{ij}$ if $Z_{ij} = 0$, write

$$
V_i(\lambda_0) = \sum_{j=1}^{2} Z_{ij}\left(Y^{1}_{ij} - \lambda_0 D^{1}_{ij}\right) - \sum_{j=1}^{2}\left(1 - Z_{ij}\right)\left(Y^{0}_{ij} - \lambda_0 D^{0}_{ij}\right).
\tag{20}
$$

Also, define

$$
S^2(\lambda_0) = \frac{1}{I(I-1)}\sum_{i=1}^{I}\left\{V_i(\lambda_0) - T(\lambda_0)\right\}^2.
$$

As shown in Baiocchi et al. (2010), under reasonable conditions, the hypothesis $H_0: \lambda = \lambda_0$ may be tested by comparing the test statistic $T(\lambda_0)/S(\lambda_0)$ to the standard normal.
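The statistics $T(\lambda_0)$, $V_i(\lambda_0)$, and $S^2(\lambda_0)$ defined above translate directly into code. The following is a small sketch on a made-up data set of four matched pairs; the function names and the data are hypothetical, and the point estimate is simply the value of λ that solves $T(\lambda) = 0$, which follows algebraically from Eq. 19.

```python
import math
import numpy as np

def effect_ratio_estimate(z, y, d):
    """Value of lambda solving T(lambda) = 0: the sum of encouraged-minus-control
    outcome differences divided by the sum of treatment differences."""
    diff_y = np.sum(z * y, axis=1) - np.sum((1 - z) * y, axis=1)
    diff_d = np.sum(z * d, axis=1) - np.sum((1 - z) * d, axis=1)
    return diff_y.sum() / diff_d.sum()

def effect_ratio_test(z, y, d, lambda0):
    """Return T(lambda0), S(lambda0), T/S, and a two-sided normal p-value."""
    n_pairs = z.shape[0]
    adjusted = y - lambda0 * d
    v = np.sum(z * adjusted, axis=1) - np.sum((1 - z) * adjusted, axis=1)
    t_stat = v.mean()
    s = math.sqrt(np.sum((v - t_stat) ** 2) / (n_pairs * (n_pairs - 1)))
    ratio = t_stat / s
    p_value = math.erfc(abs(ratio) / math.sqrt(2.0))
    return t_stat, s, ratio, p_value

# Hypothetical data: rows are pairs i, columns are subjects j = 1, 2.
z = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)  # encouragement
y = np.array([[1, 0], [0, 0], [1, 1], [0, 0]], dtype=float)  # binary outcome
d = np.array([[1, 0], [0, 1], [0, 0], [1, 1]], dtype=float)  # treatment received

lambda_hat = effect_ratio_estimate(z, y, d)
t0, s0, stat0, p0 = effect_ratio_test(z, y, d, lambda0=0.0)
t_hat, s_hat, stat_hat, p_hat = effect_ratio_test(z, y, d, lambda0=lambda_hat)
```

A confidence interval can be formed by inverting the test, that is, by collecting the values of $\lambda_0$ that are not rejected; four pairs are far too few for the normal approximation to be trusted, so the data here serve only to make the arithmetic easy to follow.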
Multinomial, Survival and Distributional Outcomes

Multinomial Outcome

Multinomial outcomes (i.e., nominal or ordinal outcomes) are common in health services research. For example, Bruce et al. (2004) conducted a randomized trial to improve adherence to prescribed depression treatments among depressed elderly patients in primary care practices; the outcomes of interest included continuous outcomes as well as multinomial outcomes such as the number of depression symptoms, ranging from 0 to 9, and the depression class (major, minor, or no depression). There was noncompliance in this trial, and Ten Have et al. (2004) used random assignment as an IV to estimate the effect of receiving treatment on continuous outcomes. Cheng (2009) considered how to estimate the effect of receiving treatment on the multinomial outcomes using random assignment as an IV.
For ordinal outcomes, the CACE is a function of coding scores and probabilities with respect to the categories:

$$
\begin{aligned}
\mathrm{CACE} &= E\left(Y^{1}_{i} - Y^{0}_{i} \mid C_i = co\right) \\
&= \sum_{j} W_j t_j - \sum_{j} W_j v_j \\
&= \sum_{j} W_j t_j - \frac{1}{\pi_c}\left[\sum_{j} W_j q_j - (1 - \pi_c)\sum_{j} W_j s_j\right]
\end{aligned}
$$

where $W_j$ is the coding score; $t_j$, $v_j$, and $s_j$ are the probabilities for compliers under treatment and control and never takers, respectively; and $q_j$ is the probability for observed group $Z_i = 0$, $D_i = 0$ for the jth category. For estimating the CACE for ordinal outcomes, the coding score needs to be chosen. Equally spaced scores or linear transformations of them, midranks, and ridit scores are among the options. A sensitivity analysis can be performed with different choices of scores to see how the results differ.
In addition to the CACE, Cheng (2009) considered some other functions of outcome distributions for understanding the causal effect for ordinal outcomes, including the measure of stochastic superiority of treatment over control for compliers:

$$
\begin{aligned}
\mathrm{SSC} &= P\left(Y^{1}_{i} > Y^{0}_{i} \mid C_i = \text{complier}\right) + \frac{1}{2} P\left(Y^{1}_{i} = Y^{0}_{i} \mid C_i = \text{complier}\right) \\
&= \sum_{j=1}^{J-1}\sum_{k=1}^{J-j} t_{j+k} v_j + \frac{1}{2}\sum_{j=1}^{J} t_j v_j \\
&= \sum_{j=1}^{J-1}\sum_{k=1}^{J-j} t_{j+k}\,\frac{q_j - (1 - \pi_c) s_j}{\pi_c} + \frac{1}{2}\sum_{j=1}^{J} t_j\,\frac{q_j - (1 - \pi_c) s_j}{\pi_c}
\end{aligned}
\tag{21}
$$

SSC = 0.5 indicates no causal effect, and SSC > 0.5 indicates a beneficial effect of the treatment for compliers if a higher value of the outcome is a better result. Compared to the CACE, SSC is easy to interpret and avoids the problem of choosing scores $W_j$, but without use of weighting scores, it may not describe the strength of the effect well when some specific categories are known to be more important than other categories in measuring the treatment effect.
For nominal outcomes, it is difficult to get a summary measure of the causal effect such as the CACE or SSC for ordinal outcomes. Instead, the treatment effect on the entire outcome distributions of compliers with and without treatment can be evaluated, that is, by comparing $t_j$ to $v_j$, j = 1, ..., J, and testing the equality of $t_j$ and $v_j$, j = 1, ..., J. Cheng (2009) estimated those causal effects with the likelihood method and proposed a bootstrap/double bootstrap version of a likelihood ratio test for the inference when the true values of parameters are on the boundary of the parameter spaces under the null.

Survival Outcome

Compared to trials with continuous, binary, and multinomial outcomes, randomized trials with survival outcomes often have an issue of
administrative censoring in addition to noncompliance. For those studies, Robins and Tsiatis (1991) considered a structural accelerated failure time model and developed semiparametric estimators for this model. Joffe (2001) provided a good discussion of their approach and comparisons with other survival analysis methods. Loeys and Goetghebeur (2003) and Cuzick et al. (2007) considered a structural proportional hazards model in which the hazard of the potential failure time under treatment for a certain group of subjects is proportional to the hazard of the potential failure time under control for these same subjects. Both the structural accelerated failure time model and the structural proportional hazards model are semiparametric models, where the effect of the treatment on the distribution of failure times is modeled parametrically.
Baker (1998) extended the models and assumptions for discrete-time survival data and derived closed form expressions for estimating the difference in the hazards at a specific time between compliers under treatment and control based on maximum likelihood. Baker (1998)’s estimator is analogous to the standard IV estimator for a survival outcome. Nie et al. (2011) discussed this standard IV approach and parametric maximum likelihood methods for the difference in survival at a specific time between compliers under treatment and control.
Here, the standard IV approach of Baker (1998) will be reviewed. Let $S_{c1}(V)$, $S_{c0}(V)$, $S_{at}(V)$, and $S_{nt}(V)$ be the potential survival functions at time V of compliers in the treatment and control groups and of always takers and never takers, respectively, $S_z(V)$ be the survival probabilities at time V for the group with assignment Z = z, and $S_{zd}(V)$ be the survival probabilities at time V for the group with assignment Z = z and treatment received D = d. By Table 1, the following holds:

$$
\begin{aligned}
S_{1}(V) &= \pi_c S_{c1}(V) + \pi_{at} S_{at}(V) + \pi_{nt} S_{nt}(V), \\
S_{11}(V) &= \frac{\pi_c}{\pi_c + \pi_{at}} S_{c1}(V) + \frac{\pi_{at}}{\pi_c + \pi_{at}} S_{at}(V), \\
S_{10}(V) &= S_{nt}(V), \\
S_{0}(V) &= \pi_c S_{c0}(V) + \pi_{at} S_{at}(V) + \pi_{nt} S_{nt}(V),
\end{aligned}
$$

Similar to the standard IV estimator for CACE, the standard IV estimator for the compliers' difference in survival probabilities is

$$
\hat{S}_{c1}(V) - \hat{S}_{c0}(V) = \frac{\hat{S}_{1}(V) - \hat{S}_{0}(V)}{\hat{E}(D \mid Z = 1) - \hat{E}(D \mid Z = 0)},
$$

which is the difference in the estimated survival probabilities at time V between the groups assigned to treatment and control, divided by the estimated proportion of compliers. $\hat{S}_z(V)$ is the Kaplan-Meier estimator under assignment z. In addition to the five IV assumptions discussed in section “Framework and Notation,” an additional assumption is needed to ensure that the estimator based on Kaplan-Meier estimates is consistent:

Independence Assumption of Failure Times and Censoring Times. The distributions of potential failure times T and administrative censoring times C are independent of each other. Type I censoring (i.e., censoring times are the same for all subjects) and random censoring are two special cases.

Although the standard IV estimator is very useful, it may give negative estimates for hazards and be inefficient because it does not make full use of the mixture structure implied by the latent compliance model. When the survival functions follow some parametric distributions, Nie et al. (2011) used the EM algorithm to obtain the MLE of the difference in survival probabilities for compliers. However, the MLEs could be biased when the parametric assumptions are not valid. To address this concern, Nie et al. (2011) developed a nonparametric estimator based on empirical likelihood that makes use of the mixture structure to gain efficiency over the standard IV method while not depending on parametric assumptions to be consistent.

Effect of Treatment on Distribution of Outcomes

focuses on estimating the effect of treatment on average outcomes, for example, the CACE (Imbens and Angrist 1994; Angrist et al. 1996). However, in addition to the average effect, knowledge of the causal effect of a treatment on the outcome distribution and its general functions can often provide additional insights into the impact of the treatment and therefore can be of significant interest in many situations (Poulson et al. 2012). For example, in a study of the effect of school subsidized meal programs on children’s weight, both low weight and high weight are adverse outcomes; therefore, knowing the effect of the program on the entire distribution of outcomes rather than just average weight is important for understanding the impact of the program. For an individual patient deciding which treatment to take, the patient must weigh the effects of the possible treatments on the distribution of outcomes, the costs of the treatments, and the potential side effects of the treatments (Hunink et al. 2001). Therefore, making the best decision requires information on the treatment’s effect on the entire distribution of outcomes rather than just the average effect because a patient’s utility over outcomes may be nonlinear over the outcome scale (Karni 2009; Pliskin et al. 1980). Hogan and Lee (2004), Saigal et al. (1999), and Sommers et al. (2007) provide examples in HIV care, neonatal care, and cancer care, respectively.
For distributional treatment effects on nondegenerate outcome variables with bounded support, without any parametric assumption, Abadie (2002) used the standard IV approach to estimate the counterfactual cumulative distribution functions (cdfs) of the outcome of compliers with and without the treatment and proposed a bootstrap procedure to test distributional hypotheses with the Kolmogorov-Smirnov statistic. However, Abadie (2002) and Imbens and Rubin (1997a) pointed out that the standard IV estimates of the potential cdfs for compliers may not be nondecreasing functions:

$$
\begin{aligned}
\hat{H}_{c1}(y)^{SIV} &= \frac{\hat{E}\{1(Y_i \le y) D_i \mid Z_i = 1\} - \hat{E}\{1(Y_i \le y) D_i \mid Z_i = 0\}}{\hat{E}(D_i \mid Z_i = 1) - \hat{E}(D_i \mid Z_i = 0)} \\
\hat{H}_{c0}(y)^{SIV} &= \frac{\hat{E}\{1(Y_i \le y)(1 - D_i) \mid Z_i = 1\} - \hat{E}\{1(Y_i \le y)(1 - D_i) \mid Z_i = 0\}}{\hat{E}\{(1 - D_i) \mid Z_i = 1\} - \hat{E}\{(1 - D_i) \mid Z_i = 0\}},
\end{aligned}
$$
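The standard IV estimates of the complier cdfs above are simple differences of sample means, so the possibility of a non-monotone estimate is easy to see numerically. The sketch below uses a tiny invented data set (eight subjects, chosen so that the estimate dips below zero); the function name and the data are hypothetical and serve only to illustrate the two formulas.

```python
import numpy as np

def siv_complier_cdfs(y, d, z, grid):
    """Standard IV estimates of the complier outcome cdfs under treatment
    (h_c1) and under control (h_c0), evaluated at each point of grid."""
    y, d, z = (np.asarray(a, dtype=float) for a in (y, d, z))
    enc, ctl = z == 1, z == 0
    denom1 = d[enc].mean() - d[ctl].mean()
    denom0 = (1 - d[enc]).mean() - (1 - d[ctl]).mean()
    h_c1, h_c0 = [], []
    for t in grid:
        below = (y <= t).astype(float)          # indicator 1(Y_i <= y)
        treated = below * d
        untreated = below * (1 - d)
        h_c1.append((treated[enc].mean() - treated[ctl].mean()) / denom1)
        h_c0.append((untreated[enc].mean() - untreated[ctl].mean()) / denom0)
    return np.array(h_c1), np.array(h_c0)

def is_nondecreasing(values):
    """True if the sequence never decreases."""
    return bool(np.all(np.diff(values) >= 0))

# Hypothetical data: four subjects with Z = 1, four with Z = 0.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [2, 3, 4, 1, 1, 2, 3, 4]
grid = [0, 1, 2, 3, 4]

h_c1, h_c0 = siv_complier_cdfs(y, d, z, grid)
monotone_c1 = is_nondecreasing(h_c1)
```

Here the one always taker observed under Z = 0 has a smaller outcome than any treated subject under Z = 1, so the estimated cdf under treatment falls below zero at y = 1 before rising to one, which is exactly the kind of violation the text describes.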
by Anderson (1979) to relate the densities of the latent compliance classes by an exponential tilt:

$$
\frac{h_j(y)}{h_0(y)} = \exp\left(\alpha_j + \beta_j y\right), \quad j = 1, 2, 3
\tag{22}
$$

where $h_0(y)$ is unspecified and $h_0(y) = P(Y_i = y \mid Z_i = 0, C_i = co)$, $h_1(y) = P(Y_i = y \mid C_i = nt)$,

Under the density ratio model (22), the log likelihood is

$$
\ell = n_{01}\log\phi_a + n_{00}\log(1 - \phi_a) + n_{10}\log\phi_n + n_{11}\log(1 - \phi_n) + \sum_{i=1}^{n}\Big[ I(Z_i = 0, D_i = 1)(\alpha_3 + \beta_3 y_i) \cdots
$$
$$
\xi' = \begin{Bmatrix} \xi'_1 \\ \xi'_2 \\ \xi'_3 \end{Bmatrix}
= \begin{Bmatrix} \delta\phi_n + (1 - \delta)(1 - \phi_a)(1 - \lambda) \\ \tau\delta(1 - \phi_n) \\ (1 - \delta)\phi_a + \delta(1 - \phi_n)(1 - \tau) \end{Bmatrix}
$$

Then, the maximum semiparametric empirical likelihood estimate of $\eta = (\phi_a, \phi_n, \alpha_1, \beta_1, \alpha_2, \beta_2, \alpha_3, \beta_3)$ can be obtained by maximizing the profiled log likelihood through the EM algorithm. The outcome densities (masses) of compliers under control ($h_0(y)$) and treatment ($h_2(y)$) can then be estimated by $\hat{h}_0(y_i)$ (see Eq. 25) and $\hat{h}_0(y_i)\exp(\hat{\alpha}_2 + \hat{\beta}_2 y_i)$, respectively, and their corresponding cdfs $H_0(y)$ and $H_2(y)$ are estimated by $\hat{H}_0(y) = \sum_i \hat{h}_0(y_i) I(y_i \le y)$ and $\hat{H}_2(y) = \sum_i \hat{h}_0(y_i)\exp(\hat{\alpha}_2 + \hat{\beta}_2 y_i) I(y_i \le y)$, respectively. To examine the causal effect of actually receiving treatment on the outcome distribution for compliers, the equality of $h_0(y)$ and $h_2(y)$ can be tested by testing $H_0: \alpha_2 = \beta_2 = 0$ with the semiparametric empirical likelihood ratio statistic

$$
R = 2\left[\max_{\eta}\ell(\eta) - \max_{\eta_1}\ell(\eta_1, \alpha_2 = \beta_2 = 0)\right], \quad \eta_1 = (\alpha_1, \beta_1, \alpha_3, \beta_3, \phi_a, \phi_n)
$$

where $\alpha_2$ must equal 0 when $\beta_2$ equals 0 because of constraint (24). Under regularity conditions, R follows a chi-squared distribution with one degree of freedom asymptotically under the null hypothesis.
In addition to investigating the distributional treatment effect, some function of the outcome distributions, g(η), where g is a real-valued function with nonzero first partial derivatives, can also be estimated. For example, under the semiparametric setting in Cheng et al. (2009a), the CACE can be estimated by using

$$
\widehat{\mathrm{CACE}}^{SEM} = \sum_{i=1}^{n} y_i\,\hat{h}_0(y_i)\left\{\exp\left(\hat{\alpha}_2 + \hat{\beta}_2 y_i\right) - 1\right\}.
$$

One can also compare the ι quantiles of outcome distributions of compliers with and without treatment (marginal distributions of $Y^1$ and $Y^0$):

$$
\widehat{\mathrm{CQCE}}^{SEM} = \hat{H}_2^{-1}(\iota) - \hat{H}_0^{-1}(\iota)
$$

When ι = 0.5, it is the difference of the medians for the compliers under treatment and control.
The goodness of fit of the density ratio model can be tested by comparing estimated outcome cdfs based on the density ratio model to the empirical distribution function estimates (Qin and Zhang 1997):

$$
\Delta_{zd} = \sup_{-\infty < y < \infty} \sqrt{n}\,\left|\hat{F}_{zd}(y) - \tilde{F}_{zd}(y)\right|, \quad z, d = 0, 1.
\tag{28}
$$

The p-value of the goodness-of-fit test can be estimated by a bootstrap p-value

$$
\hat{P}^{B}_{zd} = \hat{P}^{B}\left(\Delta_{zd} \ge \Delta^{obs}_{zd}\right)
\tag{29}
$$

where $\Delta^{obs}_{zd}$ is obtained from the actually observed data and $\Delta_{zd}$ is calculated from B bootstrap samples generated under the null hypothesis that the density ratio model (22) is true.

Study Design IV and Multiple IVs

Study Design IV: Near-Far Matching

Study design focuses attention on the data which is to be analyzed. The manner in which the data are structured largely determines the statistical procedures appropriate for analysis. The separation between study design and statistical analysis is quickly illustrated by considering a uniform randomized paired analysis. The process of matching individual units of observation into pairs based on observed, pretreatment covariates, and then randomizing one unit within each pair to treatment and the other to control, is study design. The researcher constructs the pairs by carefully controlling the assignments to increase efficiency by decreasing within-pair variation (by constructing matched pairs) as well as to minimize unobserved bias (by randomization). These steps increase the validity of the results and go a
long way toward reassuring the audience of the reliability of the reported conclusions. Only the manner in which the data are prepared has thus far been described. This is the design of the study.
Once the experiment is run and the data are recorded, the results need to be analyzed. Given the study design, most analysts would select a paired t-test, perhaps using Student’s t. But that is not the only choice; one could justifiably use a permutational test or, with some additional assumptions, a model-based approach (e.g., regression) to adjust for potential covariate imbalances which routinely occur in finite sample randomizations. This is the statistical inference phase of the study. Statistical inference is distinct from, though predicated on and preceded by, the study design. The better understood the study design, the more credibility the statistical inference is likely to have. This is true in experimentation and even truer in the observational setting.
In observational settings data is often plentiful, especially compared to the experimental setting. The trouble with observational data is that estimates of treatment effects tend to be plagued by confounding by both observed and unobserved covariates. The goal of study design in the observational setting can be thought of as finding the subset of the data which will produce the best study given the limitations of the data (usually in the sense of internal validity).
In the literature, study design is also sometimes referred to as “preprocessing” (Ho et al. 2007). For those new to study design, perhaps the most unintuitive insight is that the analysis can actually be improved by removing observations from consideration before performing the statistical inference. This is unintuitive because, loosely speaking, it seems like the study with the most observations is the most informative. This is a recognized problem in the observational literature. For example, it has become standard practice to use propensity scores to limit the analysis to only the observation units which have corresponding propensity score values in either the treated or control group, removing from inference the observational units with extreme values close to 1 or 0 (Rosenbaum 2002, 2009).

Analogously for instrumental variables, it is known that if the goal is to have greater power and results which are more robust to small violations of the IV assumptions, then a smaller data set with a stronger instrument is preferable to a larger data set with a weaker instrument (Small and Rosenbaum 2008). The trade-off between bigger but weaker and smaller but stronger was thought to be informative, but not useful once the analyst has committed to using a particular data set. Contrary to this belief, Baiocchi et al. (2010) demonstrated that even within a particular data set, the analyst may use near-far matching to go from a weaker-but-bigger study to a more robust smaller-but-stronger study.
There are two objectives in near-far matching. As in a randomized controlled trial (RCT) with a matched-pair design, one objective in near-far matching is to create matched pairs where the covariates are similar within a pair. Creating pairs with very similar covariate values (i.e., pairs which are near each other in covariate space) is used to improve efficiency. The other objective in near-far matching is to separate observations’ instrument values within a matched pair. In the neonatal intensive care example outlined in the introduction, within a matched pair, one wants one mother to be highly encouraged to deliver at a high-level NICU and the other to be highly encouraged to deliver at a low-level NICU. This is similar to the matched-pair design when there is the potential for noncompliance. If the level of encouragement can be varied, then it is preferable to have two mothers who are highly dissimilar (far) in their levels of randomly assigned encouragement because it is then more likely that within the pair, one mother will comply with the encouragement and take the treatment and the other will comply with the lack of encouragement and take the control. As outlined in Baiocchi et al. (2010), algorithms exist which will construct pairs which maximize both of these objectives at the same time.
In most real-world examples, there will be a trade-off between the “near” and the “far” part of the matching. The technical aspects of this trade-off, and how to construct such pairs, are context specific; for guidance see Baiocchi et al. (2010, 2012). The intuition is that as the analyst forces
separation in the instrument values between pairs of patients it becomes more difficult to find patients with quite dissimilar instrument values but very similar covariates. The Baiocchi et al. (2010) paper outlines both theoretical arguments and practical reasons for designing studies with greater separation in the instrument.
It should be noted that pair matching is being referred to, but all of these arguments hold for larger block designs. Near-far matching would work with k:1 matching and other more exotic designs. The primary difference would be the optimization algorithm used to construct the sets.
This process is similar to propensity score matching and other matching techniques in general. The goal is to prepare the data, by finding the parts of the data set which lend themselves to causal inference, so as to improve the reliability of the statistical analysis to be performed. Note that, just as with propensity score matching, the analyst may decide to use whichever statistical method of analysis is appropriate post-matching. That is, after performing near-far matching, the analyst may then decide to use a 2SRI model if that is appropriate for the given data set. But the selection of the statistical method must be made with justification, not out of convenience. This is why most analysts will decide to use the effect ratio (discussed in section “Binary Outcomes”) after performing near-far matching, as the study design leads naturally into the statistical analysis.

effect for subjects who would take the treatment if the IV was equal to z but not take the treatment if the IV was a little less than z is $\lim_{\epsilon \to 0} E\left(Y^{d=1}_{i} - Y^{d=0}_{i} \mid D^{z}_{i} = 1, D^{z-\epsilon}_{i} = 0\right)$; Heckman and Vytlacil (1999) refer to this as the marginal treatment effect at z. Treatment effects of interest can all be expressed as a weighted average of these marginal treatment effects (Heckman and Vytlacil 1999). For example, the treatment effect estimated by dichotomizing the IV as 1 or 0 according to whether the IV is above some cutoff or the treatment effect estimated by two-stage least squares using the continuous IV can be expressed as a weighted average of the marginal treatment effects. The average treatment effect over the whole population can also be expressed as a weighted average of the marginal treatment effects. Identification of the average treatment effect over the whole population requires identification of all the marginal treatment effects. In order for all the marginal treatment effects to be identified using the IV (and thus the average treatment effect identified), it is required that for large values of Z, P(D = 1|Z) approaches 1 and for small values of Z, P(D = 1|Z) approaches 0 (Heckman and Vytlacil 1999). Basu et al. (2007) show how to estimate marginal treatment effects and the average treatment effect when this condition is satisfied.

Multiple IVs
their mothers. Each IV identifies the treatment effect for a different set of compliers. If treatment effects are heterogeneous, the complier average causal effects may differ. For example, newborns who would only stay an extra day if born in the a.m. compared to the p.m. may differ in their risk characteristics compared to newborns who would only stay an extra day if delivered by C-section compared to vaginal delivery, and length of stay may have a different effect on newborns with different risk characteristics.

Two-stage least squares can be used to combine the IVs: in the first stage, regress D on both $Z_1$ and $Z_2$ (as well as X) and then use the predicted D as usual in the second stage. Under the assumption of homogeneous treatment effects and constant variance, the two-stage least squares estimate is the optimal way to combine the IVs (White 1984). When treatment effects are heterogeneous, two-stage least squares estimates a weighted average of the complier average causal effects for the IVs, with stronger IVs getting greater weight (Imbens and Angrist 1994; Angrist and Imbens 1995). When there are two or more distinct IVs, it is useful to report the estimates from the individual IVs in addition to the combined IVs since the IVs may be estimating treatment effects for different types of people.

When there are multiple IVs and treatment effects are homogeneous, the overidentifying restrictions test can be used to test the validity of the IVs (Davidson and MacKinnon 1993; Sargan 1958). The overidentifying restrictions test tests whether the estimates from the different IVs are the same. When treatment effects are homogeneous, if the estimates from two different IVs converge to different limits, this would show that at least one of the IVs is invalid. There are two problems with using the overidentifying restrictions test to test the validity of IVs. First, if treatment effects are heterogeneous, then the complier average causal effects for the two IVs may be different even though both IVs are valid; in this case, the overidentifying restrictions test would falsely indicate that at least one of the IVs is invalid. Second, even if treatment effects are homogeneous, two IVs A and B may both be biased but in the same way so that the asymptotic limit of the estimators based on IV A and B, respectively, is the same; in this case, the overidentifying restrictions test would give false assurance that the IVs are valid (Small 2007).

Multilevel and Continuously Valued Treatments

The treatment under study may take on multiple or continuous values, for example, the dose of a medication. Two-stage least squares can still be applied. Angrist and Imbens (1995) present the following formula that shows that the two-stage least squares estimator converges to a weighted average of the effect of one unit changes in the treatment level. Suppose the treatment can take on levels $0, 1, \ldots, \bar{d}$ and that monotonicity holds in the sense that $D_i^{z=1} \ge D_i^{z=0}$. Assume there are no covariates. Then, the two-stage least squares estimator converges to

$$\frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)} = \sum_{d=1}^{\bar{d}} \omega_d \, E\left[ Y^{d} - Y^{d-1} \mid D^{z=1} \ge d > D^{z=0} \right], \tag{30}$$

where $\omega_d = \dfrac{P(D^{z=1} \ge d > D^{z=0})}{\sum_{j=1}^{\bar{d}} P(D^{z=1} \ge j > D^{z=0})}$. The numerator of $\omega_d$ is the proportion of compliers at point d, that is, the proportion of individuals driven by the encouraging level of the IV from a treatment intensity less than d to at least d. The $\omega_d$'s are nonnegative and sum to one. The quantity $E[Y^{d} - Y^{d-1} \mid D^{z=1} \ge d > D^{z=0}]$ in Eq. 30 is the causal effect of a one unit increase in the treatment from d − 1 to d for compliers at point d. Equation 30 shows that the two-stage least squares estimator converges to a weighted average of the causal effects of one unit increases in the treatment from d − 1 to d for compliers at point d, where the points d at which there are more compliers get greater weight. The weights $\omega_d$ can be estimated since under monotonicity and the assumption that the IV is independent of the potential treatment received, $P(D^{z=1} \ge d > D^{z=0}) = P(D^{z=1} \ge d) - P(D^{z=0} \ge d) = P(D \ge d \mid Z = 1) - P(D \ge d \mid Z = 0)$. See Angrist and Imbens (1995) for an extension of these formulas to the setting where there are covariates X that are controlled for.

Researchers often dichotomize multilevel or continuous treatments. However, using IV methods with a dichotomized continuous treatment can lead to an overestimate of the treatment effect. Let $\beta$ denote the weighted average causal effect in Eq. 30 to which the two-stage least squares estimator for a multilevel treatment converges. Angrist and Imbens (1995) show that if this treatment is dichotomized as B = 1 if $D \ge l$, B = 0 if $D < l$ for some $1 \le l \le \bar{d}$, then the two-stage least squares estimator using the binary treatment B converges to $\phi\beta$ where

$$\phi = \frac{E(D \mid Z = 1) - E(D \mid Z = 0)}{E(B \mid Z = 1) - E(B \mid Z = 0)} = \frac{\sum_{j=1}^{\bar{d}} P(D^{z=1} \ge j > D^{z=0})}{P(D^{z=1} \ge l > D^{z=0})} \ge 1.$$

The only situation when $\phi = 1$ is when the IV has no effect other than to cause people to switch from D = l − 1 to D = l. Otherwise, when a multilevel treatment is incorrectly parameterized as binary, the resulting estimate tends to be too large relative to the average per-unit effect of the treatment. The problem with dichotomizing a multilevel treatment is that the IV has a direct effect because the encouraging level of the IV can push a person to a higher level of treatment even if B is 1 under both the non-encouraging and encouraging levels of the IV.

Although dichotomizing a continuous treatment results in a biased IV estimate, the sign of the treatment effect is still consistently estimated.

If the treatment effect for compliers is linear, that is, the causal effect of a one unit increase in the treatment from d − 1 to d for compliers at point d is the same for all d, then the two-stage least squares estimator estimates this linear treatment effect. If the treatment effect is nonlinear, then with a binary IV, it is not possible to estimate anything other than the weighted treatment effect (30). If the IV is continuous, then the IV can be used to form multiple IVs (e.g., $Z$, $Z^2$, $Z^3$, etc.), and a nonlinear treatment effect can be estimated (Kelejian 1971). For example, suppose $Y^{D=d} = Y^{0} + \beta_1 d + \beta_2 d^2$. Then, $\beta_1$ and $\beta_2$ can be consistently estimated with a continuous IV Z by using two-stage least squares where $\hat{D}$ is estimated by regressing $D$ on $Z$ and $Z^2$, $\widehat{D^2}$ is estimated by regressing $D^2$ on $Z$ and $Z^2$, and $\beta_1$ and $\beta_2$ are estimated by regressing $Y$ on $\hat{D}$ and $\widehat{D^2}$. Tan (2010) discusses other estimation approaches for estimating nonlinear treatment effects.

A common setting is to have a treatment with three levels that may not be strictly ordered by dose. Cheng and Small (2006) consider the setting of a treatment with three levels, control (0) and two active levels A and B, where A and B are not ordered by dose and some subjects may prefer A to B and some may prefer B to A. Subjects are randomly assigned to one of the three arms 0, A and B, and then could either take the assigned treatment or not take it and receive the control (for the control arm, all subjects receive the control 0). The effect of treatment A versus control for subjects who would take treatment A if offered it (i.e., compliers with treatment A) is identified by analyzing only subjects who were either assigned to the control arm or the treatment A arm. But for this setting, Cheng and Small (2006) showed that the effect of treatment A for subjects who would take treatment A if assigned to it but not treatment B and the effect of treatment A for subjects who would take treatments A or B if assigned to A or B, respectively, are not point identified. However, the data provide information that can be used to narrow bounds on these treatment effects. These treatment effects are of interest for individuals making decisions about which treatment to take, for example, for a very compliant subject who knows she would take either treatment A or B if offered it, she would like to know whether treatment A or B is better among very compliant subjects like herself; the treatment effects are also of interest for clinicians deciding which treatment to offer first and for health policymakers anticipating what would happen were the treatment(s) to be introduced into general practice in a
setting in which compliance patterns are expected to differ from those of the trial (Cheng and Small 2006).

Extended Instrumental Variable Method for When Proposed IV Has a Direct Effect

When a proposed IV Z is thought to be independent of unmeasured confounders but there is concern that Z might have a direct effect on the outcome, Joffe et al. (2008) proposed an extended instrumental variables strategy for obtaining a consistent (i.e., asymptotically unbiased) estimate of the causal effect of treatment that requires having a covariate W for which:

• (eiv-a1). The covariate W interacts with Z in affecting treatment.
• (eiv-a2). The direct effect of Z does not depend on W.

For such a setting, Joffe et al. (2008) show that a consistent estimate of the treatment effect can be obtained under the additional assumption that the treatment effect is constant across subjects by using two-stage least squares where $Z \times W$ is the IV and Z and W are included as measured covariates (other covariates can also be included in addition). As an example of this approach, Card (1995) studied the effect of education on earnings and considered having grown up near a 4-year college as an IV, but was concerned that growing up near a college might have a direct effect on earnings, for example, through the presence of a college being associated with higher school quality at nearby elementary and secondary schools. Card considered the covariate W = whether the person grew up in a low-income household. The interaction between growing up near a 4-year college and being from a low-income household predicts going to college, because college proximity lowers the cost of higher education and this cost lowering has a bigger effect on going to college for children from low-income families. In order for the extended instrumental variable strategy to produce a consistent estimate, the effect of higher elementary/secondary school quality on earnings would have to be the same for children from low-income and high-income families; this is assumption (eiv-a2).

Software

Software for implementing IV analyses is available in R, SAS, and Stata. Here an IV analysis will be illustrated using the AER package in the freely available software R. Consider estimating the causal effect of military service during the World War II era on men’s future earnings using data from the 5% public use 1980 Census. The Census data contain information on a man’s race and Census division of birth, but are missing information on variables such as health and criminal behavior, which were important barriers to serving in the war and are important determinants of earnings. Motivated by this concern about unmeasured confounding, Angrist and Krueger (1994) proposed to use time of birth as an IV; see also Small and Rosenbaum (2008) for follow-up analyses. Time of birth is associated with military service because a man only becomes eligible to serve in the military when he turns 18; men who turned 18 after World War II was over are substantially less likely to have served in the military. Here, consider the binary IV, Z = 1 if a man was born between 1925 and 1927 (most men born in these time periods turned 18 during World War II) and Z = 0 if a man was born in 1928 (so turned 18 after World War II was over). The data set used in the analysis, military-earnings.csv, is available at www-stat.wharton.upenn.edu/dsmall/military-earnings.csv, and the data are described in the file www-stat.wharton.upenn.edu/dsmall/military-earnings-readme.txt.

library(AER)
dataset = read.csv("military-earnings.csv", header = TRUE)
attach(dataset)
# earnings = earnings in 1980
# veteran = 1 if World War II veteran, 0 if not
Angrist J, Krueger A. Does compulsory school attendance affect schooling and earnings? Q J Econ. 1991;106:979–1014.
Angrist J, Krueger A. The effect of age at school entry on educational attainment: an application of instrumental variables with moments from two samples. J Am Stat Assoc. 1992;87:328–36.
Angrist J, Krueger A. Why do World War II veterans earn more than nonveterans? J Labor Econ. 1994;12:74–97.
Angrist J, Pischke J-S. Mostly harmless econometrics: an empiricist’s companion. Princeton/Oxford: Princeton University Press; 2009.
Angrist J, Imbens G, Rubin D. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:444–55.
Baiocchi M, Small D, Lorch S, Rosenbaum P. Building a stronger instrument in an observational study of perinatal care for premature infants. J Am Stat Assoc. 2010;105:1285–96.
Baiocchi M, Small D, Yang L, Polsky D, Groeneveld P. Near/far matching: a study design approach to instrumental variables. Health Serv Outcome Res Methodol. 2012;12:237–53.
Baker S. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. J Am Stat Assoc. 1998;93:929–34.
Balke A, Pearl J. Bounds on treatment effects for studies with imperfect compliance. J Am Stat Assoc. 1997;92:1171–6.
Basu A, Heckman J, Navarro-Lozano S, Urzua S. Use of instrumental variables in the presence of heterogeneity and self-selection: an application to treatments of breast cancer patients. Health Econ. 2007;16:1133–57.
Bhattacharya J, Goldman D, McCaffrey D. Estimating probit models with self-selected treatments. Stat Med. 2006;25:389–413.
Bhattacharya J, Shaikh A, Vytlacil E. Treatment effect bounds under monotonicity assumptions: an application to Swan-Ganz catheterization. Am Econ Rev. 2008;98:351–6.
Bound JD, Jaeger DA, Baker RM. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. J Am Stat Assoc. 1995;90:443–50.
Brookhart M, Schneeweiss S. Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. Int J Biostat. 2007;3:14.
Brookhart M, Wang P, Solomon D, Schneeweiss S. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology. 2006;17:268–75.
Brookhart M, Rassen J, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19:537–54.
Brooks J, Chrischilles E, Scott S, Chen-Hardee S. Was breast conserving surgery underutilized for early stage breast cancer? Instrumental variables evidence for stage II patients from Iowa. Health Serv Res. 2004;38:1385–402.
Bruce M, Ten Have T, Reynolds C III, Katz I, Schulberg H, Mulsant B, Brown G, McAvay G, Pearson J, Alexopoulos G. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients: a randomized trial. J Am Med Assoc. 2004;291:1081–91.
Cai B, Small D, Ten Have T. Two-stage instrumental variable methods for estimating the causal odds ratio: analysis of bias. Stat Med. 2011;30:1809–24.
Cai B, Hennessy S, Flory JH, Sha D, Ten Have TR, Small DS. Simulation study of instrumental variable approaches with an application to a study of the antidiabetic effect of bezafibrate. Pharmacoepidemiol Drug Saf. 2012;21:114–20.
Card D. Using geographic variation in college proximity to estimate the return to schooling. Toronto: University of Toronto Press; 1995. p. 201–22.
Cheng J. Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics. 2009;65:96–103.
Cheng J, Small D. Bounds on causal effects in three-arm trials with noncompliance. J R Stat Soc Ser B. 2006;68:815–36.
Cheng J, Qin J, Zhang B. Semiparametric estimation and inference for distributional and general treatment effects. J R Stat Soc Ser B Stat Methodol. 2009a;71:881–904.
Cheng J, Small D, Tan Z, Ten Have T. Efficient nonparametric estimation of causal effects in randomized trials with noncompliance. Biometrika. 2009b;96:19–36.
Clarke P, Windmeijer F. Instrumental variable estimators for binary outcomes. J Am Stat Assoc. 2012;107:1638–52.
Cole J, Norman H, Weatherby L, Walker A. Drug copayment and adherence in chronic heart failure: effect on costs and outcomes. Pharmacotherapy. 2006;26:1157–64.
Cox D. Planning of experiments. New York: Wiley; 1958.
Cuzick J, Sasieni P, Myles J, Tyler J. Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. J R Stat Soc Ser B Methodol. 2007;69:565–88.
Davidson R, MacKinnon J. Estimation and inference in econometrics. New York: Oxford University Press; 1993.
Demissie K, Rhoads G, Ananth C, Alexander G, Kramer M, Kogan M, Joseph K. Trends in preterm birth and neonatal mortality among blacks and whites in the United States from 1989 to 1997. Am J Epidemiol. 2001;154:307–15.
Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16:309–30.
Durbin J. Errors in variables. Rev Inst Int Stat. 1954;22:23–32.
Fisher R. Design of experiments. Edinburgh: Oliver and Boyd; 1949.
Freedman D. Statistical models: theory and practice. Cambridge: Cambridge University Press; 2009.
Freedman D, Sekhon J. Endogeneity in probit response models. Polit Anal. 2010;18:138–50.
Goedde H, Agarwal D, Fritze G, Meier-Tackmann D, Singh S, Beckmann G, Bhatia K, Chen L, Fang B, Lisker R. Distribution of ADH2 and ALDH2 genotypes in different populations. Hum Genet. 1992;88:344–6.
Goyal N, Zubizarreta J, Small D, Lorch S. Length of stay and readmission among late preterm infants: an instrumental variable approach. Hosp Pediatr. In press.
Heckman J, Robb R. Alternative methods for evaluating the impacts of interventions: an overview. J Econ. 1985;30:239–67.
Heckman J, Vytlacil E. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci. 1999;96:4730–4.
Hernán M, Robins J. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360.
Hernán M, Robins J. Causal inference; 2013.
Hirano K, Imbens G, Rubin D, Zhou X. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics. 2000;1:69–88.
Ho V, Hamilton B, Roos L. Multiple approaches to assessing the effects of delays for hip fracture patients in the United States and Canada. Health Serv Res. 2000;34:1499–518.
Ho D, Imai K, King G, Stuart E. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit Anal. 2007;15:199–236.
Hogan J, Lee J. Marginal structural quantile models for longitudinal observational studies with time-varying treatment. Stat Sin. 2004;14:927–44.
Holland P. Causal inference, path analysis, and recursive structural equations models. Sociol Methodol. 1988;18:449–84.
Hudgens M, Halloran M. Towards causal inference with interference. J Am Stat Assoc. 2008;103:832–42.
Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, Weinstein M. Decision making in health and medicine: integrating evidence and values. Cambridge: Cambridge University Press; 2001.
Imbens G. Nonadditive models with endogenous regressors. New York: Cambridge University Press; 2007.
Imbens G, Angrist J. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467–75.
Imbens G, Rosenbaum P. Robust, accurate confidence intervals with weak instruments: quarter of birth and education. J R Stat Soc Ser A. 2005;168:109–26.
Imbens G, Rubin D. Bayesian inference for causal effects in randomized experiments with noncompliance. Ann Stat. 1997a;25:305–27.
Imbens G, Rubin D. Estimating outcome distributions for compliers in instrumental variables models. Rev Econ Stud. 1997b;64:555–74.
Inoue A, Solon G. Two-sample instrumental variables estimators. Rev Econ Stat. 2010;92:557–61.
Joffe M. Administrative and artificial censoring in censored regression models. Stat Med. 2001;20:2287–304.
Joffe M. Principal stratification and attribution prohibition: good ideas taken too far. Int J Biostat. 2011;7(1):1–22.
Joffe M, Small D, Brunelli S, Ten Have T, Feldman H. Extended instrumental variables estimation for overall effects. Int J Biostat. 2008;4.
Johnston S. Combining ecological and individual variables to reduce confounding by indication: case study – subarachnoid hemorrhage treatment. J Clin Epidemiol. 2000;53:1236–41.
Kang H, Kreuels B, Adjei O, May J, Small D. The causal effect of malaria on stunting: a Mendelian randomization and matching approach, Working Paper.
Karni E. A theory of medical decision making under uncertainty. J Risk Uncertain. 2009;39:1–16.
Kaushal N. Do food stamps cause obesity? Evidence from immigrant experience. J Health Econ. 2007;26:968–91.
Kelejian H. Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variables. J Am Stat Assoc. 1971;66:373–4.
Kitcheman J, Adams C, Prevaiz A, Kader I, Mohandas D, Brookes G. Does an encouraging letter encourage attendance at psychiatric outpatient clinics? The Leeds PROMPTS randomized study. Psychol Med. 2008;38:717–23.
Korn E, Baumrind S. Clinician preferences and the estimation of causal treatment differences. Stat Sci. 1998;13:209–35.
Kramer M, Rooks Y, Pearson H. Growth and development in children with sickle-cell trait. N Engl J Med. 1978;299:686–9.
Lawlor D, Harbord R, Sterne J, Timpson N, Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27:1133–63.
Little R, Yau L. Statistical techniques for analyzing data from prevention trials: treatment of no-shows using Rubin’s causal model. Psychol Methods. 1998;3:147–59.
Loeys T, Goetghebeur E. A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics. 2003;59:100–5.
Lorch S, Baiocchi M, Ahlberg C, Small D. The differential impact of delivery hospital on the outcomes of premature infants. Pediatrics. 2012a.
Lorch S, Kroelinger C, Ahlberg C, Barfield W. Factors that mediate racial/ethnic disparities in US fetal death rates. Am J Public Health. 2012b;102:1902–10.
Malkin J, Broder M, Keeler E. Do longer postpartum stays reduce newborn readmissions? Analysis using instrumental variables. Health Serv Res. 2000;35:1071–91.
McClellan M, McNeil B, Newhouse J. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA. 1994;272:859.
Moreira M. A conditional likelihood ratio test for structural models. Econometrica. 1990;71:463–80.
Muthen B. A structural probit model with latent variables. J Am Stat Assoc. 1979;74:807–11.
Newman T, Vittinghoff E, McCulloch C. Efficacy of phototherapy for newborns with hyperbilirubinemia: a cautionary example of an instrumental variable analysis. Med Decis Mak. 2012;32:83–92.
Neyman J. On the application of probability theory to agricultural experiments. Stat Sci. 1990;5:463–80.
Nie H, Cheng J, Small D. Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics. 2011;67:1397–405.
O’Malley A, Frank R, Normand S. Estimating cost-offsets of new medications: use of new antipsychotics and mental health costs for schizophrenia. Stat Med. 2011;30:1971–88.
Okui R, Small D, Tan Z, Robins J. Doubly robust instrumental variables regression. Stat Sin. 2012;22:173–205.
Owen A. Empirical likelihood. Boca Raton: Chapman & Hall/CRC; 2002.
Pearl J. Causality. Cambridge: Cambridge University Press; 2009.
Permutt T, Hebel J. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics. 1989;45:619–22.
Phibbs C, Mark D, Luft H, Peltzman-Rennie D, Garnick D, Lichtenberg E, McPhee S. Choice of hospital for delivery: a comparison of high-risk and low-risk women. Health Serv Res. 1993;28:201.
Pliskin J, Shepard D, Weinstein M. Utility functions for life years and health status. Oper Res. 1980;28:206–24.
Poulson R, Gadbury G, Allison D. Treatment heterogeneity and individual qualitative interaction. Am Stat. 2012;66:16–24.
Qin J, Zhang B. A goodness-of-fit test for logistic regression models based on case–control data. Biometrika. 1997;84:609–18.
Rehan N. Growth status of children with and without sickle cell trait. Clin Pediatr. 1981;20:705–9.
Robins J, Greenland S. A comment on Angrist, Imbens and Rubin: Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:456–8.
Robins J, Tsiatis A. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Commun Stat Theory Methods. 1991;20:2609–31.
Rosenbaum P. Observational studies. New York: Springer; 2002.
Rosenbaum P. Design of observational studies. New York: Springer; 2009.
Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
Rubin D. Estimating causal effects of treatments in randomized and non-randomized studies. J Educ Psychol. 1974;66:688–701.
Rubin D. Formal modes of statistical inference for causal effects. J Stat Plan Inference. 1990;25:279–92.
Saigal S, Stoskopf B, Feeny D, Furlong W, Burrows E, Rosenbaum P, Hoult L. Differences in preferences for neonatal outcomes among health care professionals, parents, and adolescents. J Am Med Assoc. 1999;281:1991–7.
Sargan J. The estimation of economic relationships using instrumental variables. Econometrica. 1958;26:393–415.
Sexton M, Hebel J. A clinical trial of change in maternal smoking and its effect on birth weight. J Am Med Assoc. 1984;251:911–5.
Sham P. Statistics in human genetics. London: Arnold; 1998.
Shea J. Instrument relevance in multivariate linear models: a simple measure. Rev Econ Stat. 1997;79:348–52.
Shetty K, Vogt W, Bhattacharya J. Hormone replacement therapy and cardiovascular health in the United States. Med Care. 2009;47:600–6.
Siddique Z. Partially identified treatment effects under imperfect compliance: the case of domestic violence. IZA Discussion Paper No. 4565. 2009.
Small D. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. J Am Stat Assoc. 2007;102:1049–58.
Small D, Rosenbaum P. War and wages: the strength of instrumental variables and their sensitivity to unobserved biases. J Am Stat Assoc. 2008;103:924–33.
Sobel M. What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. J Am Stat Assoc. 2006;101:1398–407.
Sommers BD, Beard CJ, Dahl D, D’Amico AV, Kaplan IP, Richie JP, Zeckhauser RJ. Decision analysis using individual patient preferences to determine optimal treatment for localized prostate cancer. Cancer. 2007;110:2210–7.
Stock J, Wright J, Yogo M. A survey of weak instruments and weak identification in generalized method of moments. J Bus Econ Stat. 2002;20:518–29.
Tan Z. Regression and weighting methods for causal inference using instrumental variables. J Am Stat Assoc. 2006;101:1607–18.
Tan Z. Marginal and nested structural models using instrumental variables. J Am Stat Assoc. 2010;105:157–69.
Ten Have T, Elliott M, Joffe M, Zanutto E, Datto C. Causal models for randomized physician encouragement trials in treating primary care depression. J Am Stat Assoc. 2004;99:16–25.
Terza J, Basu A, Rathouz P. Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. Health Econ. 2008;27:527–43.
Vansteelandt S, Bowden J, Babnezhad M, Goetghebeur E. On instrumental variables estimation of causal odds ratios. Stat Sci. 2011;26:403–22.
Contents
Introduction
Defining Causal Effects
Two Concepts: SUTVA and Assignment Mechanism
Careful Design
Strategies for Estimating Causal Effects
Randomized Experiments
Natural Experiments: Instrumental Variables
Regression Discontinuity
Difference-in-Difference and Interrupted Time Series Designs
Propensity Scores and Other Matching Methods
Conclusions
References
may be no treatment, and in others, it may be an established treatment. For example, if the research question of interest is on the impact of electronic health records on behavioral health screening rates in pediatric clinics, the “treatment” condition could be using electronic health records, and the “control” condition could be using paper health records (Hacker et al. 2012). Often the first step in clearly stating causal questions (and thus providing causal answers) is in clearly stating what is the intervention of interest and what is the appropriate comparison condition. In CER, for example, a researcher may have to decide whether to compare a particular treatment to another (e.g., one drug vs. another) or one drug versus “usual practice” (which may be a mix of treatments). Although concepts can be defined and some methods are available for cases where there are more than two treatment conditions of interest, for simplicity this chapter focuses on binary treatment (treatment vs. control or treatment 1 vs. treatment 2) comparisons.
The next key concept is the units: the entities that the treatment could be given to or withheld from at a particular point in time. Units can be individuals, medical clinics, or communities, but the units should correspond to the level of the treatment being evaluated. In the study of electronic health records mentioned above, the units would be pediatric clinics. In a study of a new therapy for diabetes, the units would be individual patients.
The treatment, units, and potential outcomes form the framework for causal inference. The “fundamental problem of causal inference,” however, is that only one of the potential outcomes for each individual or unit can be observed (Holland 1986). For individuals/units in the treatment condition, Y(1) is observed, and for individuals in the control group, Y(0) is observed; at a given point in time, each individual is either in Medicare or not. For the individuals in Medicare, Y(1) is observed and interest is in predicting what Y(0) would have been if they had not been in Medicare. Similarly, for individuals not in Medicare, Y(0) is observed, and interest is in predicting what Y(1) would have been had they been in Medicare. Causal inference can thus be thought of as a missing data problem, where interest is in predicting these missing potential outcomes.
The causal effect of interest is the difference in potential outcomes (Y(1) and Y(0)) for the same individual. The statistical problem of causal inference relates to how we can best predict those missing potential outcomes to make estimates. An important distinction is that these effects are defined in relation to potential outcomes and become the estimand of interest (i.e., the quantity we are interested in estimating), independent of what study design we might use to learn about them (e.g., randomized vs. nonrandomized). The estimand is the comparison of the potential outcomes that defines the causal effect of interest.
As a prelude to concepts discussed further below, there may be different estimands of interest, and different methods may estimate different estimands. Two common estimands are the “average treatment effect on the treated” (ATT) and the “average treatment effect” (ATE). The ATE is the effect of some treatment if everyone in the population receives the treatment versus no one receiving the treatment. The ATT, in contrast, is the effect for the treatment group: the difference between the average outcome if everyone in the treatment group is treated and the average outcome if everyone in the treatment group actually receives the control (this is the “counterfactual” condition). Which estimand is of more interest will depend on the substantive question. For example, when investigating the effects of potentially harmful “treatments” (such as adolescent drug use), the ATT may be more relevant since that treatment would never be imposed on the full population; instead, interest is on what the effects are for those people who are actually drug users. In contrast, the ATE is a useful estimand when it is plausible that treatment could be disseminated to the entire population, for example, fluoride in the public water system. Note that in a randomized experiment, these quantities are equal in expectation, and so this distinction does not arise. Other methods, such as regression discontinuity and instrumental variables, estimate other “local average treatment effects,” which are effects for a particular subgroup of individuals and are discussed further
526 E. A. Stuart and S. Naeger
below. Imai et al. (2008) also differentiate “sample” versus “population” effects, a distinction not further discussed in this chapter.
As mentioned earlier, in most causal frameworks, the treatments of interest are thought of as (at least hypothetical) “interventions” that one can imagine giving or withholding. As stated by Holland (1986, p. 959), “No causation without manipulation.” In part this is to ensure that the estimand is clear and that everyone has the same understanding of what “treatment” versus “control” means. However, some of the methods discussed in this chapter have also been used to examine noncausal questions, such as to investigate racial disparities, using the framework of “balanced comparisons,” where we want to compare two groups that are as similar as possible on a set of observed characteristics. Zaslavsky et al. (2012) discuss these ideas in more detail. In this chapter, the focus is on the “effects of causes” rather than the “causes of effects,” as delineated by Holland (1986). In this way, interest is in questions regarding what are the effects of particular policies, interventions, or “treatments,” rather than broader (and perhaps less specific) questions about causal mechanisms or causal models more generally.
Two Concepts: SUTVA and Assignment Mechanism
Since causal effects are estimated at the group level but potential outcomes are defined as individual level phenomena, several assumptions are required (Little and Rubin 2000). The first is the stable unit treatment value assumption (SUTVA). SUTVA has two components: first, that there is only one “version” of each treatment: that the “treatment” is well defined in that there are not two different types of treatments within the “treatment” condition, and second, that there is no interference in that the treatment assignment of one unit does not affect the potential outcomes of any other units. An implication of this assumption is that each unit has a unique potential outcome under the treatment and control conditions; i.e., if person i is treated, Yi(1) is observed, and if person i is in the comparison group, Yi(0) is observed (Holland 1986), and these quantities each take on a single value, regardless of other conditions, such as the treatment assignments of other individuals. This is known as “consistency” in the epidemiology literature (Cole and Frangakis 2009).
The assignment mechanism is the process by which individuals are assigned to receive treatment or not. In randomized experiments, the assignment mechanism is the randomization process; knowing the assignment mechanism frees the researcher from making any further assumptions about the distribution of the data. This is because, in a randomized experiment, treatment assignment is independent of the individual’s potential outcome. In observational studies, the researcher must infer the mechanism or process by which individuals end up in the treatment or comparison group. In the example above, the researcher would need to model the process through which some individuals receive Medicaid coverage and some do not. This relates back to the problem of missing data, in that the process that creates the missing potential outcomes must be accounted for when estimating causal effects (Greenland 2005; Little and Rubin 2000).
Careful Design
One theme of this chapter is the importance of careful design. Randomized experiments have a particularly useful design; individuals are assigned to receive the treatment or control condition randomly. (The benefits of randomization will be discussed further below). When the assignment mechanism is known, it is possible to obtain unbiased estimates of treatment effects with no assumptions (for now assuming away any noncompliance or missing data).
In contrast, any nonexperimental study must rely on some (mostly untestable) assumptions. Those assumptions are discussed briefly below for each method and in more detail in the accompanying chapters. For that reason nonexperimental studies require smart choices, “choice as an alternative to control” in the words of Paul Rosenbaum (1999, 2005a) and thoughtful designs to isolate the effects
22 Introduction to Causal Inference Approaches 527
of the treatments of interest. In other words, when you can’t randomize, make smart choices to yield robust causal inferences.
Many of these choices will involve selecting an appropriate control group, as stressed in Rosenbaum (2010) and Cook et al. (2008). The key feature of a randomized experiment is that it produces comparable or balanced treatment and control groups that lead to unbiased and consistent estimates of causal effects. Therefore, in terms of internal validity, when experimental designs are not available, creating comparable treatment and control groups in observational studies is more important than creating samples that represent a population. A study by Lehman et al. (1987) on the long-term effects of the sudden and unexpected loss of a spouse or a child provides an example. Individuals who had lost either a spouse or a child in a car accident in the 4–7 years prior to the study made up the treatment group. In order to isolate the bereavement effects, the authors created a control group by identifying 7581 individuals through driver’s license renewals and then matched one control subject to each treatment group member on gender, age, family income in 1976 (i.e., the time period before the crash), education level, and the number and ages of children (Lehman et al. 1987). By carefully creating balanced treatment and control groups, the authors were able to demonstrate that psychological distress was significantly greater in the treatment subjects (Lehman et al. 1987; Rosenbaum 2005a). In an example from road safety, Rosenbaum (2010) describes a study that was looking at the association of road features with accidents; the “treatment” conditions were accident sites, and the comparison conditions were sites exactly one mile prior to the accident at the same time as the accident, with the idea that the car in the accident passed by that site (with no problem) just before the accident, thus controlling for factors such as weather and characteristics of the drivers. Because of the need to rely on untestable assumptions, sensitivity analyses are particularly crucial in nonexperimental studies: assessing the robustness of results to other (plausible) assumptions and considering other possible designs.
This chapter aims to give researchers some tools to start thinking about those possible designs, outlining the basics of study designs with a focus on nonexperimental studies. Readers who are interested in learning more about the careful design of nonexperimental studies should refer to the discussion of threats to validity in Shadish et al. (2002), the discussion of the importance of careful design and methods of design sensitivity in Rosenbaum (1999, 2010), and the discussion of the role of design versus analysis in Rubin (2007).
Strategies for Estimating Causal Effects
This section provides an overview of common study designs that aim to estimate causal effects. These descriptions are not meant to be fully detailed but rather to provide a broad understanding of the approach, when it can be used, and what its underlying assumptions are. Examples of how each design has been used in health services research are provided.
Randomized Experiments
First formalized by Fisher (1926), randomized experiments are considered the gold standard of causal inference, since, as mentioned above, (when “clean”) they yield unbiased estimates of treatment effects (at least for the sample at hand) with no additional assumptions. In contrast, all of the nonexperimental methods discussed below rely on at least some assumptions. Intuitively, randomization to treatment or control groups means that the groups are equivalent on everything at baseline, except which treatment they receive. This means that any difference in outcomes between groups can be attributed to the treatment and not to any preexisting differences. Mathematically it can be shown that the average of the potential outcomes observed in each group (treatment or control) provides an unbiased estimate of the average potential outcome
under that condition for the population (Neyman 1923, 1934).
The three key properties of randomized experiments that ensure estimates of causal effects are unbiased are as follows. First, the treatment assignment is “unconfounded” which means the randomization process is independent of the potential outcomes. Second, each individual or unit in the experiment has a positive probability of receiving each treatment condition (i.e., each person could potentially be in either the treatment or control group). And finally, the study is designed without any knowledge of the potential outcomes.
Examples of randomized experiments in health services research include the Oregon Medicaid Coverage experiment (Baicker and Finkelstein 2011). Researchers used a lottery to randomly allocate low-income adults between 19 and 64 years old to either receive Medicaid or be assigned to a waiting list for Medicaid. Although not originally implemented for this purpose, the lottery process allowed researchers to estimate the causal effects of Medicaid enrollment compared to being uninsured. Preliminary results for the study indicated that Medicaid coverage increases health-care use (Baicker and Finkelstein 2011).
However, as has been widely discussed (Gluud 2006; Marcus et al. 2012; Rothwell 2005), randomized trials do have their own complications. These include noncompliance, where people do not take their assigned treatments (Frangakis and Rubin 2002; Marasinghe and Amarasinghe 2007; Peduzzi et al. 1993), missing outcome data (Frangakis et al. 2007), worries that the people who enroll in a trial may be different from those of broader interest (Marcus et al. 2012; Zimmerman et al. 2005), and ethical concerns about randomization (Crawford et al. 2011; De Melo-Martín et al. 2011; Hughes 2009). Because of these concerns, nonexperimental studies are sometimes used to estimate the causal effects of “treatments,” interventions, or exposures of interest. We will see that many of these designs attempt to replicate key features of experiments.
Natural Experiments: Instrumental Variables
In some cases researchers do not have power over the treatments individuals (or providers or communities) do or do not receive but can identify some naturally occurring randomness in who receives which treatment. These methods rely on finding an “instrument” that is (or can be thought of as) randomly assigned, affects the treatment individuals receive, but does not affect their outcomes directly. Instrumental variable designs are sometimes referred to as “encouragement designs” as the instrument can be thought of as something that encourages individuals to take the treatment of interest (or not). Examples of instrumental variables (IVs) in HSR include Bao et al. (2006), who, in examining the effect of providers giving smoking cessation advice, used whether or not the provider provided diet/nutrition or physical activity advice as an instrument. Linden and Adams (2006) use zip code as an instrument for participation in disease management programs, since not all geographic areas are covered by such programs. Geography is commonly used as an instrument, as it takes advantage of the fact that many medical treatments are more accessible in some geographic areas than others (e.g., McClellan et al. 1994).
IV methods essentially work by fitting two models: first, a model of treatment received as a function of the instrument and covariates and, second, a model of outcome as a function of treatment received and the covariates. The “exclusion restriction” (described further below) means that the instrument is “excluded” (not in) the second-stage model. Because these two equations are related (and the error terms therefore correlated), the models are generally fit using two-stage least squares models (Angrist and Imbens 1995, 1996).
There are two primary assumptions on which IV methods rely (in addition to the SUTVA assumption described above). The first is known as “monotonicity” and basically implies that there are no “defiers:” no people who go against the instrument in terms of what treatment they receive. In other words, no one who would take
the treatment if not “encouraged to” by the instrument but who would not take the treatment when “encouraged” to do so by the instrument. The second set of assumptions are what are known as the “exclusion restrictions.” These say that there is no effect of the instrument on individuals whose behavior is not changed by the instrument. In other words, there is no effect of the instrument on outcomes for people who would either always take the treatment (whether encouraged to or not by the instrument) or for people who would never take the treatment (whether encouraged to or not). This is sometimes stated as that there is “no direct effect” of the instrument on the outcomes; the only way the instrument can change outcomes is by changing the treatment that individuals receive. This assumption is often questionable.
To illustrate these two assumptions, consider treatment assignment and actual treatment status. In a randomized experiment, these two conditions are typically one and the same and are manipulated by the researcher. In the context of an IV design, the instrument influences (encourages) an individual’s treatment assignment, but other factors, such as individual-level covariates, influence compliance with the assignment (i.e., treatment status). The monotonicity assumption means that there is a positive correlation between treatment assignment and status. As an example, in the Long et al. (2005) study of the impact of Medicaid on improving access to care, treatment status was defined as being privately insured, having Medicaid coverage, or being uninsured. The four instrumental variables (i.e., the treatment assignment variables) included accessibility of private insurance, availability of public coverage, and family and community attitudes toward public assistance; under the monotonicity assumption, the influence of these variables can only increase the likelihood that an individual is privately insured or has Medicaid coverage. The exclusion restrictions require that accessibility of private insurance, availability of public coverage, and attitudes toward public assistance only influence insurance coverage and do not have any effect on health-care utilization directly (Long et al. 2005).
One important point about IV methods is that they technically estimate what is known as the “local average treatment effect,” also known as the “complier average causal effect:” the effect of the treatment for the “compliers,” those individuals whose behavior is affected by the instrument and who will take the treatment when “told” to do so (when encouraged by the instrument) but not when not encouraged. In the example above, compliers would be individuals who seek out health insurance coverage when private or public options are available and there are positive attitudes toward public assistance, but who do not seek out insurance coverage when these are not operating (Long et al. 2005). The complier average causal effect is also known as a “marginal treatment effect” in the economics literature (Carneiro et al. 2011).
Another consideration when using IV methods is what is known as the “strength” of the instrument: how correlated the treatment assignment (instrument) is with the actual treatment status (the treatment received). A strong instrument is highly correlated with the actual treatment received. A weak instrument, in contrast, is only weakly associated with the actual treatment received (i.e., it is a poor predictor of treatment status). Weak instruments lead to reduced power and biased IV estimates (Bound et al. 1995).
Regression Discontinuity
Introduced by Thistlethwaite and Campbell (1960), regression discontinuity (RD) is a particularly strong nonexperimental design that can be used when the treatment of interest is assigned on the basis of some “assignment variable” and cutoff. For example, individuals with cholesterol levels above 200 may be put into a care management program, whereas those with lower cholesterol are not given access to the program. The idea is to compare individuals just below and just above the cutoff, who should be otherwise similar but with one group receiving the treatment of interest and the other not. The analysis examines whether there is a “discontinuity” in the outcome variable at the cutoff, which would indicate an effect of the treatment. RD is similar to randomized experiments in that the assignment
mechanism is known, and that is what allows us to obtain reliable treatment effect estimates.
Examples of RD designs in HSR include studies of disease management programs (Linden et al. 2006), which may be a particularly good setting for RD since eligibility for the program is often determined by clinical measures to ensure that the program is provided to those most in need. RD designs may also be appropriate when resources permit serving only a portion of the population and those most in need are served first, in which case there may be a discontinuity at the point at which resources are gone. This sort of idea was used by Ludwig and Miller (2006) in estimating the effects of Head Start, who used a discontinuity in grant writing support for original Head Start grants, with that support given to the 300 poorest counties in the country.
This section highlights a few assumptions and requirements of the RD method, as described by Trochim (1984). First, for the most basic form of RD analyses, the cutoff must be followed. (In fact, more advanced “fuzzy” RD designs can be used if there is some “noncompliance,” where some individuals who were eligible didn’t receive the treatment and some individuals who were not eligible did receive it; see Imbens and Lemieux (2008)). Second, accurate modeling of the relationship between the assignment variable and the outcome is crucial, for example, allowing for a nonlinear relationship or other flexible models. Ludwig and Miller (2006) consider a variety of functional forms in order to assess sensitivity to the model. Third, the sample size around the cutoff must be large enough to fit those models reliably and with sufficient precision. Goldberger (2008) indicates that sample sizes 2.75 times larger than would be required for adequate power in an RCT are needed for RD designs.
Threats to the validity of RD designs include cases where the assignment variable is manipulated because of the treatment assignment process, for example, clinicians manipulating the assignment variable so that patients they want to have participate in the program are seen as eligible.
Similar to the idea of the “local average treatment effect” in instrumental variables analyses, a limitation of the RD design is that it formally estimates the effect only for those just around the cutoff. This arguably, however, is the group for whom the effect is most relevant as presumably these are the people who may or may not receive the intervention (i.e., those with very high or very low scores may not be reasonable candidates for the intervention under investigation). The design is not appropriate for estimating the effect of the treatment for individuals with assignment variables nowhere near the cutoff.
Sensitivity analyses are important in RD designs. Important sensitivity analysis options include “zero checks” where the analysis is repeated using fake cutoffs, to confirm that no “effect” is seen there, as well as assessing sensitivity to the model specification, as mentioned above. It is also important to note that RD designs only work when the treatment was in fact given out on the basis of the cutoff variable; they cannot be used in a “post hoc” way if that was not in fact how the treatment was administered.
For more information on RD designs, see Imbens and Lemieux (2008). Wong et al. (2012) provide discussion of extensions for studies with multiple assignment variables or cutoff points. The appendix of Linden et al. (2006) provides a relatively easy-to-read description of the actual models run to estimate effects in RD designs.
Difference-in-Difference and Interrupted Time Series Designs
A common approach for estimating the effects of discrete policy changes is interrupted time series (ITS) analyses (or a simplified version, difference in differences). These methods rely on sophisticated before-after analyses to compare observed trends in the presence of an intervention with the time trends that would have been predicted in the absence of the intervention. Briefly, at its most basic level, the treatment effect is estimated by modeling the “outcome” of interest in the pre-period, extrapolating that model fit to the post-period, and estimating the effect as the difference between the expected values (from that model fit) and the observed values. Interest may be in determining whether the intervention leads
to a jump at the time of implementation (an “interruption”) or also possibly a change in the slope of the time series trend. The simpler model, difference in differences, essentially collapses the “pre” and “post” time periods, comparing the change in the outcome from pre-intervention to post-intervention between the intervention group and a comparison group (see O’Malley et al. 2006, for an example).
ITS designs abound in HSR. Campbell et al. (2009) use an ITS design to evaluate the effect of pay for performance on the quality of care in primary care practices. They collected data from 42 primary care practices at two time points prior to the policy implementation and at two time points post policy implementation. Data on patient care, patient perception of access to care, and continuity of care were used to determine if care for patients with asthma, diabetes, or coronary heart disease improved after the pay-for-performance plan was implemented (Campbell et al. 2009). As another example, Andersson et al. (2006) use interrupted time series to investigate the effects of changes in the pharmaceutical reimbursement schedule in Sweden on costs and volumes of pharmaceuticals.
ITS methods are most useful when (1) there is an abrupt policy change (e.g., a new law) and (2) there is sufficient pre-change data with which to estimate trends reliably. And while not required, a comparison group that did not experience the policy change can be very useful in terms of providing accurate results. In particular, comparative interrupted time series designs are particularly strong since they provide information on trends in the post-period in comparison units (e.g., states) that did not experience the policy change. Without such a comparison group, the results are more reliant on the time series models themselves; this can be misleading, for example, when there are strong time trends even in the absence of the intervention (e.g., increasing test scores in education research). Linden and Adams (2010) provide an example of combining ITS methods with propensity score weighting (discussed more below) to create a particularly good comparison group for the ITS analysis. Their study estimates the effect of California’s Proposition 99, which in 1988 raised the cigarette excise tax by 25 cents per pack in order to fund anti-smoking initiatives across the state. Similarly, O’Malley et al. (2006) discuss the careful choice of comparison groups in the context of a difference-in-difference analysis of interventions aimed to encourage the use of generic drugs.
An important consideration in ITS models is serial correlation and accounting for the correlation of measures across time. Since the error terms in the regression models will likely be correlated, it is important to test for autocorrelation using a test such as Durbin’s test (Durbin 1970) and appropriately model that autocorrelation, for example, using AR(1) models (Mills 1990). See Wagenaar et al. (2009) for an example.
Propensity Scores and Other Matching Methods
The final nonexperimental method discussed is that of propensity score methods, which broadly are used to equate two groups and ensure that the treatment effect is being estimated among treated and comparison subjects who are otherwise similar. In this respect, propensity score methods aim to replicate two key features of a randomized experiment: (1) create groups that are similar on background characteristics (or at least the observed ones) and (2) the outcome is not used in setting up the “design” of the study. The propensity score itself is defined as the probability of receiving the treatment and is estimated by modeling treatment status as a function of baseline characteristics. Because of the properties of the propensity score (Rosenbaum and Rubin 1983), propensity scores are particularly useful for creating groups that look similar with respect to a large set of characteristics; researchers can then match, subclassify, or weight using just the propensity score itself, rather than having to deal with each variable individually. See Stuart (2010) for more details.
Propensity score methods involve two stages: (1) fitting a propensity score model and (2) using those propensity scores to create balanced samples. Common propensity score estimation methods include logistic regression as well as
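As a minimal sketch of the two propensity score stages described above (fitting a propensity score model, then using the scores to create balanced groups), the following Python example may help. Everything in it is an illustrative assumption rather than an analysis from this chapter: the data are simulated with a single observed confounder x and a constant true treatment effect of 2, the function names are hypothetical, the propensity model is a one-covariate logistic regression fit by Newton's method, and weighting is used as the balancing step (matching or subclassification would be alternatives).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=20000, seed=0):
    """Hypothetical data: one observed confounder x drives treatment and outcome."""
    rng = random.Random(seed)
    x, t, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        ti = 1 if rng.random() < sigmoid(xi) else 0     # higher x: more likely treated
        yi = 2.0 * ti + 3.0 * xi + rng.gauss(0.0, 1.0)  # true effect is 2 for every unit
        x.append(xi)
        t.append(ti)
        y.append(yi)
    return x, t, y


def fit_propensity(x, t, iters=20):
    """Stage 1: logistic regression of treatment on x, fit by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ti in zip(x, t):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += ti - p            # gradient of the log-likelihood
            g1 += (ti - p) * xi
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0, b1 = (b0 + (h11 * g0 - h01 * g1) / det,
                  b1 + (h00 * g1 - h01 * g0) / det)
    return [sigmoid(b0 + b1 * xi) for xi in x]


def wmean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_estimates(x, t, y):
    """Stage 2: weight by the estimated propensity score e and compare groups."""
    e = fit_propensity(x, t)
    treated = [i for i, ti in enumerate(t) if ti == 1]
    control = [i for i, ti in enumerate(t) if ti == 0]
    y1 = [y[i] for i in treated]
    y0 = [y[i] for i in control]

    naive = sum(y1) / len(y1) - sum(y0) / len(y0)
    # ATE weights: 1/e for treated units, 1/(1-e) for controls.
    ate = (wmean(y1, [1.0 / e[i] for i in treated])
           - wmean(y0, [1.0 / (1.0 - e[i]) for i in control]))
    # ATT weights: 1 for treated units, e/(1-e) for controls.
    att_w = [e[i] / (1.0 - e[i]) for i in control]
    att = sum(y1) / len(y1) - wmean(y0, att_w)
    # Balance check: gap in the mean of x between groups after ATT weighting.
    x_gap = (sum(x[i] for i in treated) / len(treated)
             - wmean([x[i] for i in control], att_w))
    return naive, ate, att, x_gap


x, t, y = simulate()
naive, ate, att, x_gap = weighted_estimates(x, t, y)
print(f"naive={naive:.2f}  ATE={ate:.2f}  ATT={att:.2f}  x gap after weighting={x_gap:.3f}")
```

In this simulation the naive difference in means is confounded by x, while the weighted ATE and ATT estimates should land near the simulated effect of 2. The two estimands coincide here only because the simulated effect is constant across units; with effects that vary with x they would generally differ. Note that the outcome y is not used until after the weights are built, which mirrors the point that the outcome plays no role in the "design" stage.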
the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc. 1995;90(430):443–50.
Campbell SM, Reeves D, Kontopantelis E, Sibbald B, Roland M. Effects of pay for performance on the quality of primary care in England. N Engl J Med. 2009;361(4):368–78. https://doi.org/10.1056/NEJMsa0807651.
Carneiro P, Heckman JJ, Vytlacil EJ. Estimating marginal returns to education. Am Econ Rev. 2011;101(6):2754–81.
Cole SR, Frangakis CE. The consistency statement in causal inference: a definition or an assumption? Epidemiology. 2009;20(1):3–5.
Cook TD, Shadish WR, Wong VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Policy Anal Manage. 2008;27(4):724–50. https://doi.org/10.1002/pam.20375.
Crawford MJ, Thana L, Methuen C, Ghosh P, Stanley SV, Ross J, Gordon F, et al. Impact of screening for risk of suicide: randomized controlled trial. Br J Psychiatry. 2011;198(5):379–84.
De Melo-Martín I, Sondhi D, Crystal RG. When ethics constrains clinical research: trial design of control arms in “greater than minimal risk” pediatric trials. Hum Gene Ther. 2011;22(9):1121–7.
Dowd BE. Separated at birth: statisticians, social scientists, and causality in health services research. Health Serv Res. 2011;46(2):397–420.
Durbin J. Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. Econometrica. 1970;38(3):410–21.
Escarce JJ, Flood AB. Introduction to special section: causality in health services research. Health Serv Res. 2011;46(2):394–6. https://doi.org/10.1111/j.1475-6773.2011.01255.x.
Finkelstein EA, Fiebelkorn IC, Wang G. State-level estimates of annual medical expenditures attributable to obesity. Obes Res. 2004;12(1):18–24. https://doi.org/10.1038/oby.2004.4.
Fisher R. The arrangement of field experiments. Journal of Ministry of Agriculture. 1926;33:500–13.
Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–9.
Frangakis CE, Rubin DB, An MW, MacKenzie E. Principal stratification designs to estimate input data missing due to death. Biometrics. 2007;63(3):641–9.
Gluud LL. Bias in clinical intervention research. Am J Epidemiol. 2006;163(6):493–501. https://doi.org/10.1093/aje/kwj069.
Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29(4):722–9.
Greenland S. Epidemiologic measures and policy formulation: lessons from potential outcomes. Emerging Themes in Epidemiology. 2005;2(1):5.
Hacker K, Penfold R, Zhang F, Soumerai SB. Impact of electronic health record transition on behavioral health screening in a large pediatric practice. Psychiatr Serv. 2012;63(3):256–61.
Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–60.
Hughes JR. Ethical concerns about non-active conditions in smoking cessation trials and methods to decrease such concerns. Drug Alcohol Depend. 2009;100(3):187–93.
Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51–71.
Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. J R Stat Soc Ser A Stat Soc. 2008;171(2):481–502.
Imbens GW, Lemieux T. Regression discontinuity designs: a guide to practice. J Econ. 2008;142(2):615–35.
Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29(3):337–46.
Lehman DR, Wortman CB, Williams AF. Long-term effects of losing a spouse or child in a motor vehicle crash. J Pers Soc Psychol. 1987;52(1):218–31.
Linden A, Adams JL. Evaluating disease management programme effectiveness: an introduction to instrumental variables. J Eval Clin Pract. 2006;12(2):148–54. https://doi.org/10.1111/j.1365-2753.2006.00615.x.
Linden A, Adams JL, Roberts N. Evaluating disease management programme effectiveness: an introduction to the regression discontinuity design. J Eval Clin Pract. 2006;12(2):124–31.
Linden A, Adams JL. Using propensity score-based weighting in the evaluation of health management programme effectiveness. J Eval Clin Pract. 2010;16(1):175–9.
Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health. 2000;21:121–45. https://doi.org/10.1146/annurev.publhealth.21.1.121.
Liu W, Kuramoto SK, Stuart EA. An introduction to sensitivity analysis for unobserved confounding in non-experimental prevention research. Prev Sci. 2013;14(6):570–80. PMCID:3800481.
Long SK, Coughlin T, King J. How well does medicaid work in improving access to care? Health Serv Res. 2005;40(1):36–58. https://doi.org/10.1111/j.1475-6773.2005.00341.x.
Goldberger A. Selection bias in evaluating treatment Ludwig J, Miller DL. Does head start improve children’s
effects: some formal illustrations. In: Modelling and life chances? Evidence from a regression discontinuity
evaluating treatment effects in econometrics, Advances design. Institute for the Study of Labor (IZA). 2006.
in econometrics. Bingley: Emerald Group Publishing Retrieved from http://ideas.repec.org/p/iza/izadps/
Limited; 2008. p. 1–31. dp2111.html
22 Introduction to Causal Inference Approaches 535
Marasinghe JP, Amarasinghe AAW. Noncompliance in Rosenberg L. Comparative effectiveness research: making
randomized controlled trials [4]. CMAJ. 2007; it work for those we serve. J Behav Health Serv Res.
176(12):1735. 2009;36(3):283–4.
Marcus SM, Stuart EA, Wang P, Shadish WR, Steiner Rothwell PM. External validity of randomised controlled
PM. Estimating the causal effect of randomization ver- trials? To whom do the results of this trial apply??
sus treatment preference in a doubly randomized pref- Lancet. 2005;365(9453):82–93.
erence trial. Psychol Methods. 2012;17(2):244–54. Rubin DB. The design versus the analysis of observa-
McClellan M, McNeil BJ, Newhouse JP. Does more inten- tional studies for causal effects: parallels with the
sive treatment of acute myocardial infarction in the design of randomized trials. Stat Med. 2007;
elderly reduce mortality? Analysis using instrumental 26(1):20–36.
variables. JAMA. 1994;272:859–66. Schneeweiss S. Sensitivity analysis and external adjust-
Mills TC. Time series techniques for economists. Cam- ment for unmeasured confounders in epidemiologic
bridge: Cambridge University Press; 1990. database studies of therapeutics. Pharmacoepidemiol
Mullins CD, Abdulhalim AM, Lavallee DC. Continuous pat- Drug Saf. 2006;15(5):291–303.
ient engagement in comparative effectiveness research. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H,
JAMA J Am Med Assoc. 2012;307(15):1587–8. Brookhart MA. High-dimensional propensity score
Neyman J. On the application of probability theory to adjustment in studies of treatment effects using health
agricultural experiments. Essay on principles. Stat care claims data. Epidemiology. 2009;20(4):512–22.
Sci. 1923;5(4):465–80. Shadish WR, Cook TD, Campbell DT. Experimental and
Neyman J. On the two different aspects of the representa- quasi-experimental designs for generalized causal
tive method: the method of stratified sampling and the inference. 2nd ed. Belmont: Wadsworth Publishing;
method of purposive selection. J R Stat Soc. 2002.
1934;97:558–606. Steiner PM, Cook TD, Shadish WR, Clark MH. The
Oliver S, Armes DG, Gyte G. Public involvement in setting importance of covariate selection in controlling for
a national research agenda: a mixed methods evaluation. selection bias in observational studies. Psychol
Patient Patient-Cent Outcomes Res. 2009;2(3):179–90. Methods. 2010;15(3):250–67.
O’Malley AJ. Commentary on Bryan Dowd’s paper “Sep- Stuart EA. Matching methods for causal inference: a
arated at birth: statisticians, social scientists, and cau- review and a look forward. Stat Sci. 2010;25(1):1–21.
sality in health services research”. Health Serv Res. Thistlethwaite DL, Campbell DT. Regression-
2011;46(2):430–6. discontinuity analysis: an alternative to the ex post
O’Malley AJ, Frank RG, Kaddis A, Rothenberg BM, facto experiment. J Educ Psychol. 1960;51(6):309–17.
McNeil BJ. Impact of alternative interventions on Trochim W. Research design for program evaluation; the
changes in generic dispensing rates. Health Serv Res. regression-discontinuity design. Beverly Hills: Sage;
2006;415(5):1876–94. 1984.
Pearl J. Statistics and causality: Separated to reunite – Wagenaar AC, Maldonado-Molina MM, Wagenaar
commentary on Bryan Dowd’s “Separated at birth”. BH. Effects of alcohol tax increases on alcohol-related
Health Serv Res. 2011;46(2):421–9. disease mortality in Alaska: time-series analysis
Peduzzi P, Wittes J, Detre K, Holford T. Analysis from 1976 to 2004. Am J Public Health. 2009;
as-randomized and the problem of non-adherence: 99(8):1464–70.
an example from the veterans affairs randomized Werner RM, Konetzka RT, Stuart EA, Norton EC, Polsky D,
trial of coronary artery bypass surgery. Stat Park J. Impact of public reporting on quality of Postacute
Med. 1993;12(13): 1185–95. https://doi.org/10.1002/ care. Health Serv Res. 2009;44(4): 1169–87. https://doi.
sim.4780121302. org/10.1111/j.1475-6773.2009.00967.x.
Rosenbaum PR. Choice as an alternative to control in Wong VC, Steiner PM, Cook TD. Analyzing regression-
observational studies. Stat Sci. 1999;14(3):259–304. discontinuity designs with multiple assignment
Rosenbaum PR. Observational study. In: Everitt B, variables: a comparative study of four estimation
Howell D, editors. Encyclopedia of statistics in behav- methods. J Educ Behav Stat. 2012; https://doi.org/
ioral science. Chichester: Wiley; 2005a. 10.3102/1076998611432172.
Rosenbaum PR. Sensitivity analysis in observational stud- Zaslavsky AM, Ayanian JZ, Zaborski LB. The validity
ies. In: Everitt BS, Howell DC, editors. Encyclopedia of race and ethnicity in enrollment data for medicare
of statistics in behavioral science, vol. 4. Chichester: beneficiaries. Health Serv Res. 2012;47(3 Part 2):
Wiley; 2005b. p. 1809–14. 1300–21.
Rosenbaum PR. Design of observational studies, Springer Zimmerman M, Chelminski I, Posternak MA. Generaliz-
series in statistics. New York: Springer; 2010. ability of antidepressant efficacy trials: differences
Rosenbaum PR, Rubin DB. The central role of the propen- between depressed psychiatric outpatients who would
sity score in observational studies for causal effects. or would not qualify for an efficacy trial. Am J
Biometrika. 1983;70(1):41–55. Psychiatr. 2005;162(7):1370–2.
23 Measurement of Patient-Reported Outcomes of Health Services
Joseph C. Cappelleri and Andrew G. Bushmakin
Contents
Introduction
Research Basis and Goals
  Background and Rationale
  Research Objectives
Selection of Subjects
Longitudinal Designs
  Event- or Condition-Driven Designs
  Time-Driven Designs
  Timing of the Initial PRO Assessment
  Timing of Follow-Up PRO Assessments
  Frequency of Evaluations
Selection and Evaluation of the Measurement Instrument
  Step 1: Formulating Study Objectives
  Step 2: Developing or Selecting an Instrument
  Step 3: Developing Data Collection Strategies
  Step 4: Analyzing Data
  Step 5: Reporting Data
  Interpreting Study Findings
References
elements are central to this chapter as they relate to the design and measurement of PROs. These elements are discussed and framed within the five characteristics that tend to be associated with PROs: missing and incomplete data, psychometric validation, interpretation, multiple testing, and longitudinal analysis. Special consideration is given to developing a PRO measurement strategy in a regulatory context where the intent is to have a labeling claim on a PRO.

Introduction

A patient-reported outcome (PRO) is any report on the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else (Food and Drug Administration 2009). Patient-reported outcome is an umbrella term that includes a whole host of subjective outcomes such as pain, fatigue, depression, aspects of well-being (e.g., physical, functional, psychological), treatment satisfaction, health-related quality of life, and physical symptoms such as nausea and vomiting. Patient-reported outcomes are often relevant in studying a variety of conditions – including pain, erectile dysfunction, fatigue, migraine, mental functioning, physical functioning, and depression – that cannot be assessed adequately without a patient’s evaluation and whose key questions require the patient’s input on the impact of a disease or a treatment. After all, who knows better than the patient herself/himself? To be useful to patients and other decision-makers (e.g., physicians, regulatory agencies, reimbursement authorities), who are stakeholders in medical care, a PRO must undergo a validation process to confirm that it is reliably measuring what it is intended to measure.

In general, the same clinical trial design principles that apply to directly assessable clinical endpoint measures, like blood pressure, also apply to PROs. Although not necessarily unique to PROs, at least five characteristics tend to be associated with PROs (Fairclough 2004). One characteristic is that, by definition, PROs require the subject’s (patient’s) active participation, resulting in the potential for missing data from not only missed assessments on an entire PRO but also non-response to some items on a PRO used in a study. A second characteristic is that, being subjective and not a so-called “hard” endpoint like death, PROs require their measurement properties to be assessed, leading to additional steps of validation (reliability and validity) prior to their analysis on treatment effect. A third characteristic, related to the second one, is that the interpretation of PROs may require methods that can enrich and enhance their interpretation. A fourth characteristic is that most PROs are multidimensional and hence produce multiple scores on various aspects of what is being measured, engendering multiple comparisons and testing of outcomes that need to be methodologically and statistically addressed. The fifth characteristic is that the outcomes are generally repeated over time, calling for methods that effectively handle longitudinal data in the context of the research question.

Identifying which components of a PRO are relevant to measuring the impact of a disease and its treatment is essential to good study design and subsequent scientific scrutiny. Successful measurement of PROs begins with the development of a protocol to provide a recipe for the conduct of the study. The protocol provides not only key elements of the study design but also the scientific rationale and planned analysis for the study, which are inextricably linked to study design.

Because the validation of PROs is an ongoing process, multiple protocols, each having its specific purpose, may often be necessary. A protocol for a study, be it a clinical trial or a method study, should contain several essential elements. A clinical trial protocol should describe the following: the rationale for the specific aspect of the PRO being measured, explicit research objectives and endpoints, strategies for minimizing the exclusion of subjects from the study, rationale for timing of assessments and off-study rules, rationale for instrument selection, details for administration of PRO assessments to minimize bias and missing data, sample size estimation, and analytic plan. A method study protocol involves by definition methodological considerations, such as which measurement properties of a PRO will be
tested, and these considerations will define the design of the study. For example, if an objective is to obtain test-retest reliability data, data should be collected on at least two occasions. In contrast to a clinical trial design, which includes a preselected diseased population at baseline, method studies may not involve any treatment and may include a variety of subjects, from healthy to severely ill, whom the PRO is designed to assess.

The aforementioned protocol elements are central to this chapter as they relate to the design and measurement of PROs. These elements are discussed and framed within the five characteristics that tend to be associated with PROs: missing data, validation, interpretation, multiple testing, and handling of longitudinal data.

Specifically, Section “Research Basis and Goals” covers the research basis surrounding PROs, with focus on the background and rationale and also on research objectives. Section “Selection of Subjects” centers on selection of subjects. Section “Longitudinal Designs” focuses on longitudinal designs. It discusses event- or condition-driven designs, time-driven designs, timing of the initial PRO assessments, timing of the follow-up PRO assessments, and frequency of evaluation. Section “Selection and Evaluation of the Measurement Instrument” concentrates on the selection and evaluation of the measurement instrument: formulating study objectives, developing or selecting an instrument (its relevance, psychometric properties, and feasibility), developing data collection strategies, analyzing data (multiple testing, missing data), reporting data, and interpreting study findings. Moreover, in this chapter, special consideration is given to developing a PRO measurement strategy in a regulatory context where the intent is to have a labeling claim on a PRO.

Research Basis and Goals

Background and Rationale

Providing sufficient background and rationale to justify the resources required for an investigation of PROs will contribute to the success of the investigation. The background to why PROs are of relevance in assessing outcomes of interest in relation to the disease, as well as the characteristics of the patient population under consideration, needs to be described and linked to previous research and to the planned treatment. The reason for using the PRO component in relation to the research question needs to be lucid, and the PROs need to be clearly defined in the study (Wiklund 2004). A rationale should be given for not only why a PRO is being studied but also which specific aspect of a PRO is central and especially worthwhile.

Inherent in PROs is their ability to assist in providing a better understanding of disease and treatment outcomes from the patient’s perspective, and PROs do so by translating clinical improvement, stability, or deterioration into patient-centered outcomes. As such, PROs represent a unique indicator of the impact of disease and its treatment by enabling physicians and other health-care professionals to rely significantly on patient reports in evaluating disease activity and symptoms.

In the management and monitoring of certain chronic conditions – such as arthritis, neuropathic pain, irritable bowel syndrome, sexual dysfunction, and chronic obstructive pulmonary disease – PROs have become the central outcomes of choice. In other chronic diseases, such as cancer and cardiovascular disease, increased attention has been paid to PROs in order to highlight the humanistic side of the disease and its treatment. In oncology studies, for instance, the impact of treatment on survival and tumor shrinkage is often accompanied by and weighed against the impact of the treatment on aspects of a patient’s health-related quality of life, for example, the impact of chemotherapy on toxicity and adverse events. Since 1985, the FDA has recommended patient-centered evaluations in clinical trials relating to cancer research (Johnson and Temple 1985).

The rationale for measuring PROs needs to be made explicit in the planning and documenting of clinical trials in order to put forward labeling or promotional claims on PROs. From an industry and regulatory perspective, a well-defined and reliable PRO instrument in suitably designed investigations can be used to support a claim provided that the medical product labeling of the
claim is consistent with the suitably documented measurement capability of the instrument (Food and Drug Administration 2009). (“Instrument,” as defined here, refers to a questionnaire plus all the information and documentation that supports its use, including the method of administration, instructions for administration, the scoring algorithm, analysis, and interpretation.) The Food and Drug Administration, the regulator and approver of medicines in the United States, has produced a guidance document for use in medical product development to support labeling claims (Food and Drug Administration 2009).

Data from a PRO instrument can be used to highlight any distinctive treatment advantages and disadvantages of a drug that cannot be measured in other ways. Conversely, without PRO data, a drug’s profile may be incomplete and as such does not represent the full base of potential benefits or harms patients would experience when using the medicine under investigation.

Research Objectives

The most critical components of a study are its research objectives and goals. The implementation of a study is successful only when its goals and research objectives are well defined with sufficient detail to guide its design, conduct, and analysis. The development of a clear and explicit a priori objective is vital for subsequent trial design and study conduct, especially if a sponsor wishes to seek a label claim or promote benefits of an intervention.

Stated objectives should be concrete and specific, not vague and ambiguous. For example, the objective “To compare PROs between regimen A and regimen B” fails to provide specific information about the patient population of interest, time of assessment, and which aspects of a patient’s condition will be assessed and compared. The stated objectives should refer to what is being measured and not the measurement instrument. Clear specifications of the details can help to better design study protocols and are vital to the ultimate success of a clinical study. Here is an example of a specific and concrete objective: “Moodlift 20 mg, taken once daily, will lead to improvement in symptoms of depression, psychological function, and social function by 2 weeks among adult men with major depressive disorder” (Luo and Cappelleri 2008; Rothman et al. 2007). Based on the aspects of a patient’s condition under investigation, relevant PRO instruments and relevant domains of those instruments should be identified.

In addition to identifying the relevant domains of a PRO, the population of interest, and the time frame of interest, objectives should have clear hypotheses as to whether the intent is to obtain a label claim or not, demonstrate superiority or non-inferiority, and seek confirmatory or exploratory evidence. Different endpoints may serve different purposes; for example, one PRO endpoint may be sought for a label claim and require confirmatory evidence, whereas another PRO endpoint may be considered exploratory with no intention of a label claim.

In seeking a PRO label claim in the United States, sponsors of medicines are advised to place their research objectives and goals in terms of a conceptual framework, which may be useful in developing and refining the goals for PRO measurement. Guided by an appropriate conceptual model, which identifies and describes the PRO concepts and hypotheses that underlie a PRO-based product labeling claim, a conceptual framework explicitly defines or depicts the relationships between the items in a PRO instrument and the concept measured (Food and Drug Administration 2009; Rothman et al. 2007; Snyder et al. 2007). The concept is the specific measurement goal, that is, the attribute or characteristic measured by a PRO instrument.

If the desired overall claim, for instance, is “product X reduces problems with swallowing and speaking to others and improves daily activities for individuals with head and neck cancer,” the diagram in Fig. 1 depicts a plausible conceptual framework of a PRO instrument where a set of items is associated with a specific domain, such as “swallowing,” “speaking,” and “basic activities of daily living”; moreover, the domains represent related but separate concepts (Patrick et al. 2007).
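As an illustration only, the item-to-domain relationships that a conceptual framework makes explicit can be written down as a simple data structure. The sketch below uses the three domains named in the head and neck cancer example; the item identifiers, the 0 to 4 response scale, and the rule of scoring a domain as the mean of its answered items are all hypothetical assumptions made for the example, not features of any actual instrument or of Fig. 1.

```python
from statistics import mean

# Hypothetical conceptual framework: each domain (concept) lists the items
# intended to measure it. Domain names follow the head and neck cancer
# example in the text; the item identifiers are invented for illustration.
FRAMEWORK = {
    "swallowing": ["swallow_liquids", "swallow_solids"],
    "speaking": ["speak_clearly", "speak_on_phone"],
    "basic_activities_of_daily_living": ["eating", "dressing"],
}


def domain_scores(responses, framework=FRAMEWORK):
    """Return one score per domain as the mean of its answered items.

    Assumes every item uses the same numeric scale (here 0-4, an assumption).
    A domain with no answered items is scored None rather than guessed.
    """
    scores = {}
    for domain, items in framework.items():
        answered = [responses[i] for i in items if responses.get(i) is not None]
        scores[domain] = mean(answered) if answered else None
    return scores


# One hypothetical patient's responses; one speaking item was left blank.
example = {
    "swallow_liquids": 1, "swallow_solids": 3,
    "speak_clearly": 2, "speak_on_phone": None,
    "eating": 0, "dressing": 0,
}
scores = domain_scores(example)
```

Keeping the framework as data separate from the scoring rule mirrors the distinction in the text between the concept being measured and the instrument's scoring algorithm; a real instrument would specify its own scoring and missing-item rules in its documentation.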
An instrument may create a single score, thereby measuring a single concept, or, as in Fig. 1, may be developed with multiple domain scores each represented by a concept, possibly within a more general concept of measurement, represented by the “head and neck cancer-specific function” domain. The conceptual model of a PRO instrument will evolve and be confirmed over the course of measurement development as a sponsor gathers empiric evidence to support item grouping and scores (Food and Drug Administration 2009).

Related to the conceptual framework, an endpoint model should be described and depicted if a label claim is to be sought in the United States. It represents a diagram of the hierarchy of relationships among endpoints, both PRO and non-PRO, that corresponds to the clinical trial’s objectives, design, and data analysis plan (Food and Drug Administration 2009). Figure 2 depicts a hypothetical endpoint model for a head and neck cancer example (Patrick et al. 2007). Primary endpoints here include overall survival which, if statistically significant and medically important, would be sufficient for a claim. If this survival endpoint showed a statistically significant and clinically meaningful treatment benefit, the domains of the PRO instrument – “swallowing,” “speaking,” and “basic activities of daily living” – are subsequently listed in order of importance as complementary endpoints that may result in a claim.

Selection of Subjects

It is strongly recommended that protocol eligibility, whenever possible, be restricted to patients willing and able to participate in the PRO assessment. This challenging recommendation is motivated by the following two fundamental rationales (Gotay et al. 1992). The first is practical. Study implementation is easier and more efficient when all patients require the same assessments (PROs as well as non-PROs). The second is scientific.
Credibility and interpretation of the overall results, and their overall conclusions, are enhanced when all subjects are available for all endpoints.

Fig. 2 Hypothetical endpoint model for head and neck cancer example

Concept                                           Endpoints
Indication: treatment of head and neck cancer     Primary: overall survival
Supportive concepts: other treatment benefit      Secondary: swallowing; speaking; basic activities of daily living

Assessments on PROs should not be seen as optional by physicians or patients. Optional assessments would jeopardize the ability of study results to be generalized to the study population and, with randomization compromised, would likely lead to selection bias. The goal here is to avoid differential assessments on different patients because otherwise results can be biased, and likely seriously so. All measurements over time should be sought for all patients, not just some, in order to maintain validity and extension of inferences.

Physical, cognitive, or language barriers may make the evaluation of PROs impossible in practice for specific groups of patients (Gotay et al. 1992). In this case, alternative strategies for collecting PRO data should be considered. Such strategies include translation and cultural fine-tuning of PRO instruments, assistance for patients with visual or auditory impairments, and proxy assessments for patients with cognitive deficits. However, given the need to include patients who are elderly or in minority populations, a preferred course of action is to make eligible patients a top priority and to have them complete all of their assessments in the same manner.

Longitudinal Designs

Patient-reported outcomes are often incorporated into a study by administering questionnaires at multiple time points with the goal of characterizing the outcome over time (Fairclough 2004, 2005, 2010). Such longitudinal data arise in most PRO investigations because interest centers on how a disease or intervention affects an individual’s functioning and well-being over time. The number and timing of PRO assessments are influenced by the study objectives, such as when meaningful change is expected, and practical considerations, such as patient burden. Key considerations in the design of a longitudinal study follow.

Event- or Condition-Driven Designs

When the objective of a study is to compare a PRO in subjects who experience the same type
23 Measurement of Patient-Reported Outcomes of Health Services 543
of condition during a given phase of treatment, interferon alpha (control) (Cella et al. 2008).
assessments can be planned at times when clini- Doses were adjusted in response to symptoms of
cally relevant events are expected to occur or at toxicity. Treatment in both groups was continued
times that correspond to a distinct, meaningful until the occurrence of death, unacceptable
phase of the intervention or disease. Such assess- adverse events, or withdrawal of consent. Patients
ment is more common for a design with a rela- were asked to complete the PRO questionnaires
tively short duration. Many variations exist. before any clinical activities during visits to the
Among them, for example, is when differences study clinics at screening, on days 1 and 28 of
in PRO values are expected during only the early each 42-day treatment cycle, and at the end of
period of therapy. A breast cancer trial of adjuvant treatment or study withdrawal.
therapy in which a 16-week dose-intensive regi- Time-driven designs are associated with
men was compared with a more traditional mixed-effect models for studies where time is
24-week regimen is an illustration of such an often conceptualized and taken as a continuous
event-driven design (Fetting et al. 1998). Three variable. Mixed-effect models are useful when the
assessments were planned – prior to (baseline timing of assessment differs widely among indi-
assessment), during, and after therapy – where viduals, studies have a large number of PRO
each phase of the disease or its treatment was assessments, or changes over time are to be
considered distinct with respect to the PRO of interest.
In event-driven designs where each assessment is conceptually identified with a landmark
event, repeated measures models for longitudinal data (with time taken as a categorical
covariate) are an appropriate choice. Note that assessments for all subjects should be
taken at the same points in time (e.g., week 6, week 10, and week 24), where points in
time need not be equally spaced. Repeated measures models may also be useful in some
studies with only a few assessments.
Time-Driven Designs
When the scientific questions involve a more extended period, or when the phases of the
disease or its treatment are not distinct, the longitudinal designs are based on or
driven by time (Fairclough 2005, 2010). These designs are appropriate for chronic
conditions where therapies are given over extended periods, such as diabetes and
arthritis.
In time-driven designs, the duration of therapy may be indeterminate at study onset, with
therapy intended to be given to a patient until it is no longer efficacious or produces
unacceptable toxicity. For instance, patients with advanced renal cell carcinoma were
randomized to receive either repeated 6-week cycles of sunitinib (experimental) or

modeled with a smaller number of parameters than that required for a repeated measures
model (with time as a categorical covariate).
Timing of the Initial PRO Assessment
The initial assessment is the first and one of the most important assessments in a
study. This initial assessment, usually referred to as a baseline assessment, plays a
crucial role in the estimation of changes on PRO outcomes. If the baseline assessment is
missing, all other data for that subject could be useless in modeling differences
between treatments. It is also critical that the initial assessment occur prior to
randomization in randomized trials. Because the measurement of a PRO is generally based
on self-evaluation, an initial assessment that follows randomization runs the risk that
a subject’s responses are influenced by knowledge of treatment assignment (Brooks et al.
1998). This risk becomes especially evident when one of the interventions is a new,
promising therapy.
Sometimes multiple assessments from daily patient diaries, taken before randomization,
are collected and averaged to arrive at an overall baseline score. Such averaging
increases the reliability (precision) of measurement relative to a single assessment. In
two randomized, double-blind, placebo-controlled trials of pregabalin for fibromyalgia,
a patient’s baseline score on self-
544 J. C. Cappelleri and A. G. Bushmakin
reported sleep quality was computed as the average rating over the 7 days prior to
taking study medication (Russell et al. 2009). In this daily diary assessment, patients
completed the rating in the morning upon awakening and reported the quality of their
sleep over the past 24 h on an 11-point numeric rating scale ranging from 0 (“best
possible sleep”) to 10 (“worst possible sleep”).
Timing of Follow-Up PRO Assessments
As with the timing of the initial PRO assessment, the timing of follow-up assessments
should receive careful consideration (Fairclough 2010). A tenet of appropriate timing
for follow-up assessments is that they should be made consistently across the treatment
arms. It is important not to choose a particular time that will bias the results against
one treatment or another. Measuring immediately after an untoward event such as toxicity
will emphasize that experience and de-emphasize the potential benefits of treatment and
disease symptoms. When follow-up assessments on PROs are to be collected, they are
usually positioned at all or some of the visits at which other clinical assessments or
lab measurements are collected.
A major factor when deciding on the timing of the PRO assessment, both initially and
subsequently, is the recall period of the PRO questionnaire. Because individuals have
better recall for major events and more recent experiences, the period of accurate
recall for measuring certain areas (e.g., erectile dysfunction, physical well-being) is
between 1 and 4 weeks, whereas the period of recall for the frequency and severity of
symptoms (e.g., pain, fatigue) is accurate over shorter periods such as at the time of
patient completion of the PRO or the past 24 h. That said, the recall period established
by the developers of the PRO instrument should be used. It is inadvisable to change the
recall period of a PRO instrument to fit a particular study design; rather, a PRO
instrument should be selected (or perhaps even newly developed) to fit the study design.
If the recall period for a PRO instrument is changed, some aspects, such as test-retest
reliability, should be reevaluated. To be considered statistically independent
observations, the timing of one assessment should not have a recall period that overlaps
with the timing of another assessment on the same instrument; assessments should be
based on distinct recall periods.
Frequency of Evaluations
The frequency of the assessments depends on the natural history of the disease, the
likelihood of meaningful changes during the study period, the recall period of a PRO (if
the PRO is based on recall over the previous month, assessments should not be made
weekly or daily), and how discontinuation of therapy relates to the research objective.
All of these considerations should be balanced with practical considerations such as the
burden placed on individuals who complete questionnaires and the timing of therapeutic
and diagnostic interventions. Hence the assessments on PROs should be frequent enough to
capture meaningful change over a sufficient duration but not so frequent as to cause
excessive burden on participants.
In long-term studies with mortality as the primary endpoint, as in chronic heart failure
trials, it is often useful to have more frequent assessments at the end of the study to
enable detection of deterioration. If, on the other hand, rapid change is expected
during the early part of a study, as is typically the case for renal cell carcinoma
studies, more frequent assessments earlier on may be needed.
Assessments should not be more frequent than the period of recall defined for the PRO
instrument. Instruments on satisfaction, functioning, and well-being are often based on
the last 7 days or 4 weeks. Symptom assessment scales often use the last 24 h or ask
about the severity right now. Shorter periods of recall are generally more appropriate
when the severity of symptoms is being evaluated, with more rapid changes in symptoms
requiring a shorter recall duration, while the same or longer periods may be required to
assess the impact of those symptoms on activities of daily living. Such was the case in a
23 Measurement of Patient-Reported Outcomes of Health Services 545
non-small cell lung cancer trial where the severity of multiple symptoms and the impact
of those symptoms on daily functioning from chemoradiation were evaluated during the
last 24 h before the start of this intervention and weekly for 12 weeks during and after
it (Wang et al. 2006).
In many cases what is of real interest is not the integrated effect over a short period
(e.g., 2-week period), but the effect at regular intervals (e.g., 2, 4, and 6 weeks),
similar to how measurements might be made every 2 weeks in a blood pressure trial (Food
and Drug Administration 2009). For regulatory claims on a PRO, the recall period with
the shortest time frame consistent with the instrument’s purpose or intended use (e.g.,
when feasible, a recall period referenced to the patient’s current or recent state) is
preferable to a recall period that is based on a longer period, a comparison of a
patient’s current state with an earlier period, or a self-reported average over time
(Food and Drug Administration 2009).
Patients who drop out of a study prematurely are generally more likely to have a less
favorable score on a PRO because of side effects or no effect of treatment. A treatment
arm with a high rate of dropout is likely to give an artificially more favorable outcome
because only the healthiest of the patients remain on treatment, leading to selection
bias and overly optimistic estimates of treatment effect. It is therefore desirable to
have a PRO assessment in conjunction with premature withdrawal from the study. If the
research objective extends to off-therapy assessments, then they can be made by
continuing the PRO assessments after discontinuation. The off-therapy assessments can
always be excluded if deemed uninformative or irrelevant to the research question.
Including the off-therapy assessments after discontinuation allows them to be available
should they be determined to be of interest.
Selection and Evaluation of the Measurement Instrument
The PRO measurement strategy should be operationalized according to what study questions
are to be answered. It is necessary to understand and justify the relevance of the
selected PROs to the target disease, patient population, and study setting. Information
on relevance can be obtained from the medical literature, previous studies, and direct
input from patients and other stakeholders like families and health-care professionals.
What is also needed is an understanding of the epidemiology and burden of disease from
the patient’s perspective and the postulated and empirical relationships between
treatment, PROs, and other clinical outcomes.
The FDA guidance on PROs for a label claim in clinical trials recommends a wheel and
spoke diagram as a way to organize the development process and provide the path by which
the PRO can lead to a claim (Food and Drug Administration 2009; Patrick et al. 2007).
The diagram is reproduced in Fig. 3. The five major steps highlighted in the diagram,
which summarizes the iterative process used in developing a PRO instrument for use in
clinical trials, apply regardless of whether sponsors use an existing instrument, modify
an existing instrument, or develop a new instrument. This diagram encapsulates why the
standards and preparations required for a PRO label claim are much more involved than
when a label claim is not sought.
What follows is a series of key steps on good research practices that center around the
common theme of selecting and evaluating a PRO measurement instrument, be it for a
regulatory claim or not (Luo and Cappelleri 2008).
Step 1: Formulating Study Objectives
The evaluation of PROs begins with the formulation of study objectives (Fig. 4). If a
sponsor wishes to seek a label claim or promote benefits of a drug, the development of a
clear and explicit a priori objective is critical for subsequent trial design and study
conduct. Stated objectives should convey concreteness and specificity, not vagueness and
ambiguity, as stated in the section “Research Basis and Goals.”
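The selection bias from dropout noted earlier in this section (a treatment arm with a high dropout rate can look artificially favorable because the patients who remain tend to be the healthiest) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate_arm`, the score scale (higher is better, mean 50, standard deviation 10), and the linear dropout model are hypothetical choices made for this example and do not come from any of the cited studies.

```python
import random
import statistics


def simulate_arm(n, mean_score, dropout_slope, seed):
    """Simulate end-of-study PRO scores (higher = better) for one arm.

    Assumed (hypothetical) dropout model: a patient's probability of
    dropping out rises linearly as the score falls below 50, mimicking
    withdrawal because of side effects or lack of effect.
    Returns (all_scores, completer_scores).
    """
    rng = random.Random(seed)
    all_scores, completers = [], []
    for _ in range(n):
        score = rng.gauss(mean_score, 10.0)
        p_drop = min(1.0, max(0.0, dropout_slope * (50.0 - score) / 10.0))
        all_scores.append(score)
        if rng.random() >= p_drop:  # patient stays in the study
            completers.append(score)
    return all_scores, completers


# Two arms with the SAME true mean score but different dropout intensity.
low_all, low_completers = simulate_arm(2000, 50.0, dropout_slope=0.1, seed=1)
high_all, high_completers = simulate_arm(2000, 50.0, dropout_slope=0.6, seed=2)

low_completer_mean = statistics.mean(low_completers)
high_completer_mean = statistics.mean(high_completers)
high_true_mean = statistics.mean(high_all)

print(f"low-dropout arm:  completers' mean = {low_completer_mean:.1f}")
print(f"high-dropout arm: completers' mean = {high_completer_mean:.1f} "
      f"(all patients: {high_true_mean:.1f})")
```

Under these assumptions, an analysis restricted to completers makes the high-dropout arm look better than the low-dropout arm even though the two arms have the same underlying distribution of scores, which is why an assessment at the time of premature withdrawal is desirable.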
Fig. 3 Development of a patient-reported outcome instrument for a label claim in an FDA application: an iterative process
(Source: Food and Drug Administration 2009)
Fig. 4 Key steps for selecting and evaluating patient-reported outcomes (Source: Reprinted with permission from Luo
and Cappelleri 2008)
what extent the items move in tandem to measure different aspects of the same concept.
The assessment of internal consistency reliability is usually carried out using
Cronbach’s alpha coefficient. Test-retest reliability measures to what degree an
instrument gives similar scores when it is repeatedly administered to the same patient
under a stable condition. It is often based on an intraclass correlation coefficient.
For Cronbach’s alpha and the intraclass correlation coefficient, a minimum value of 0.7
is considered acceptable (Fayers and Machin 2007).
Assessing reliability is not sufficient for the validation of an instrument. An
instrument may be reliable (accurate and precise in measuring something) yet not measure
what it is supposed to measure and hence not be valid. There are at least three major
types of validity: content validity, construct validity, and criterion validity.
Criterion validity is not assessed when there is no criterion or “gold standard”
measure, as is often the case for most diseases.
Content validity concerns the extent to which the constituent items reflect the intended
concept. The assessment of content validity usually involves critical examination of
whether the items are comprehensive enough and clearly cover, without ambiguity, the
concept of interest. Content validity is often evaluated by consulting with patients
having the disease of interest, physicians, and specialists to ensure that the included
items are clear, comprehensive, and acceptable.
Construct validity is another fundamental characteristic of a measurement instrument and
assesses to what extent an instrument measures the construct or concept it is supposed
to measure. The assessment of construct validity often begins with postulating a
relationship between the concept (construct) of interest and other related or unrelated
measures or characteristics. Data are then collected, and the assessment is conducted.
If the results confirm the postulated relationship, evidence exists to support construct
validity.
Different methods can be used to establish construct validity. For example, construct
validity can be assessed by comparing instrument scores among different groups of
patients that are clinically distinct and anticipated to score differently (discriminant
validity). Construct validity can also be assessed by correlating instrument scores with
other measures that are theoretically related (convergent validity) or unrelated
(divergent validity) to the underlying concept measured by the instrument.
In addition to corrected item-to-total correlations (correlations between an item and
the sum of the other items on the same domain), items in multi-item scales are often
evaluated and confirmed by factor analysis. A “factor” is a latent variable, that is, an
unobserved or hidden variable; the term “factor” may be used interchangeably with the
terms “domain,” “construct,” or “concept.” A latent variable is a hypothetical construct
that is not directly observed but whose existence is inferred from the way it influences
the observed or manifest variables. Examples of a latent variable include depression and
anxiety.
The statistical technique used to quantify those interrelationships is factor analysis.
Factor analysis is a multivariate statistical method concerned with detecting and
analyzing patterns based on the correlations among quantitative variables. For PRO
assessment, it attempts to identify groups of items such that there are strong
correlations among all items within the same domain and weaker correlations among items
in different domains. The main purposes of factor analysis are the structural
development and validation of scales.
Exploratory and confirmatory factor analyses are two major approaches to factor analysis
(Brown 2006; Cappelleri and Gerber 2010; Fayers and Machin 2007). In factor analysis,
the underlying structure of a set of measured items is summarized by a smaller set of
latent (unobserved) factors that manifest themselves via the measured items. An
objective is to identify the number and the nature of the factors that are responsible
for covariation in the data and to determine the domain structure of a questionnaire
(which items represent which domains), which is what exploratory factor analysis
addresses. The domain structure may be unidimensional or multidimensional with several
factors or domains (sometimes also called subscales). A further objective may be to
confirm an existing domain structure in a separate,
independent group of individuals from the same population, which is what confirmatory
factor analysis addresses.
It is difficult to fully and completely prove construct validity. Instead, researchers
rely on accumulating evidence to demonstrate that an instrument is valid in measuring
the concept of interest.
Responsiveness, which can also be viewed as another type of validity, is the ability of
an instrument to detect small but important changes within a group over time.
Responsiveness is one of the most essential characteristics of an instrument; a
nonresponsive instrument is of little use in discerning true drug effects. Two of the
most commonly used measures of responsiveness are the standardized response mean and the
effect size. The standardized response mean is the ratio of the mean change to the
standard deviation of that change. The effect size is the ratio of the mean change to
the standard deviation of the initial measurement. The effect size measure is commonly
considered more appropriate than the standardized response mean because the effect size
uses natural variability stemming from patients’ baseline values, which are not
influenced by the effects of treatment, in order to help quantify what magnitude of
change would be important. Measures of responsiveness like the effect size, being
dimensionless, can be used to compare the responsiveness of a new instrument with that
of existing ones.
Related to responsiveness is sensitivity: the ability to detect known differences
between treatment groups over time or at a specific time. Its standardized measures of
effect correspond to those for responsiveness except that the mean change is between
groups instead of within group.
With the exception of content validity, which is based on qualitative methods,
measurement properties are grounded in quantitative analysis usually involving
correlations, means, and regression methods, as well as theoretical expectations.
Table 1 summarizes key measurement properties of a PRO.
Feasibility
The final consideration on instrument selection is feasibility. Issues related to
feasibility include language availability, time required to complete the instrument,
patient ability to complete the questionnaire, the rate of refusal, and percentage of
missing items. All of these issues, each an important element in itself, should be
thought out when selecting an instrument.
Step 3: Developing Data Collection Strategies
After determining which instrument will be used in an evaluation on PROs, a carefully
planned data collection strategy should be built into the study design and research
protocol to ensure high quality of data (Fig. 4). Although this is true of any serious
study design and research, the fact that PROs are based on a patient’s self-report makes
it even more important to develop a judicious strategy in order to prevent or minimize
bias or missing data. An important consideration when developing the data collection
strategies is the time intervals at which PROs are assessed, as discussed in the section
“Longitudinal Designs.”
Time intervals of assessment should be based on disease progression, treatment response,
drug side effects, duration of the study, and number of questionnaires. At a minimum,
assessments of PROs should be performed at baseline and at the end of study. But
intermediate follow-up measurements may be required to more fully capture changes within
group and between groups over time. Therefore, a reasonable number of assessments to
capture this trajectory should be planned in a clinical trial.
Assessments of PROs are usually performed at the same time as clinical visits and are
best completed before professional encounters with non-PRO measures, which may influence
a patient’s response on PROs. PROs can be administered by paper and pencil, computer
administration, electronic devices, or in-person or phone interviews. The same PRO
should use the same mode of administration throughout the study.
Standardized data collection procedures need to be established to ensure that the data
are collected consistently among different patients and
investigators and across various study sites. Before the start of the trial, data
collection personnel and study monitors should be carefully and uniformly trained. A
detailed guideline on the assessment of PROs should be prepared and serve as a reference
book for study monitors and data collection personnel in order to handle issues arising
from the assessment.
Missing data can occur at the item level for at least one but not all items on the
questionnaire or at the questionnaire level for all of its items. The reasons for
missing data should be recorded at the time of occurrence and later considered to lend
insight into the potential patterns for why data are missing. Because data quality is
directly linked to the validity of study findings, researchers should
have a thorough understanding about the data collection process along with potential
issues and biases inherent in this process. Such knowledge can help facilitate the
development of appropriate data analysis plans to understand and minimize potential
bias.
If missing data do occur for some but not all items on the questionnaire, the
non-missing data may still be used for analysis based on some prespecified criteria,
usually recommended by the developers of the questionnaire. For example, the EORTC
QLQ-C30 (European Organization for Research and Treatment of Cancer Quality of Life
Questionnaire Core 30) consists of five functional scales (physical, role, cognitive,
emotional, and social), three symptom scales (fatigue, pain, nausea and vomiting), a
global health status scale, and six single-item scales (Fayers et al. 2001). The EORTC
QLQ-C30 Scoring Manual has specified that under certain conditions, missing values will
be imputed for multi-item scales. Specifically, if at least half of the items from the
scale have been answered, the missing items are assumed to have values equal to the
average of those items which are present for the respondent. For example, the physical
function subscale consists of 5 items, and this scale can be estimated whenever at least
3 of its 5 constituent items are present. More is said about missing data in the section
“Missing Data.”
Sample size estimation is an indispensable part of a data collection strategy and
depends on the study objective. In principle, there are no major differences in planning
studies for a comparison between treatment groups using PROs compared with using non-PRO
clinical measures such as blood pressure levels. As such, sample size estimation for
PROs will require specification of the significance level, statistical power,
anticipated difference or effect size, expected dropout rate, and type of data and
method of analysis (Fayers and Machin 2007). As already stressed, it is important and
necessary to clearly state and limit the major PROs of interest in the study protocol.
Doing so is especially relevant for sample size purposes.
Sample size estimation for PROs becomes specialized for psychometric techniques like
factor analysis and reliability where the objective is on an instrument’s measurement
properties, rather than a comparison between treatment groups. Factor analysis is a
large-sample procedure, and a valid factor analysis typically involves hundreds of
subjects. Sample size estimation for factor analysis depends on several elements such as
the distribution of items and correlations between them. One suggested rule of thumb is
a sample size of at least ten times the number of items for an exploratory factor
analysis (Fayers and Machin 2007) and at least ten times the number of parameters
(measurement-error variances, covariances among domains, factor loadings) for
confirmatory factor analysis (Brown 2006). Sample size estimation for test-retest
reliability can be based on Fisher’s Z transformation for an intraclass correlation
using a confidence interval approach (Streiner and Norman 2008).
Although repeated measures and mixed-effect models are often used in the analysis of PRO
measurements over time, sample size estimation is most commonly based on calculating the
expected difference in the group means at a single time point rather than over time.
This calculation amounts to sample size estimation for a univariate analysis and in most
cases provides a conservative (larger than necessary) estimate of the sample size.
Procedures are also available for the estimation of sample size in a longitudinal
analysis with a repeated measures model or mixed-effect model (Fairclough 2010;
Fitzmaurice et al. 2011).
Step 4: Analyzing Data
The next step in the evaluation of PROs is to develop a prespecified, comprehensive, and
detailed plan for data analysis (Fig. 4). For a clinical trial, the statistical analysis
plan (SAP) on PROs is best integrated with other study endpoints as part of an overall
analytic strategy. Gains in efficiency arise when PROs are integrated and unified with
other endpoints in the SAP.
The part of the SAP on PROs should be clear and concise, and yet complete and
comprehensive, about the stated objective. In addition to the data
analysis on PROs, the SAP should also include a brief description of how the instruments
are selected, how domains belonging to an instrument are scored, and how missing items
of an instrument are handled. The development of the data analysis plan should be based
on study objectives and may vary among different phases of clinical trials.
For example, for a phase II trial intended to explore the potential impact of a specific
drug treatment on PROs, the analysis plan can focus on a comprehensive descriptive
analysis and, if suitable, an inferential analysis. Basic statistics such as instrument
compliance rate, the observed mean of domain scores (along with confidence intervals
such as a 95% confidence interval), and the observed mean change from baseline (and its
95% confidence interval) to each follow-up time should be included within each group.
Additionally, if a trial has multiple arms, a comparison of the domain scores between
arms is typically worthwhile to include by analyzing (and then reporting) the
between-group difference in changes from baseline to each follow-up time, along with the
corresponding difference in mean changes and its 95% confidence interval.
For a phase III trial, especially one intended for a label claim based on a PRO outcome,
inferential statistics (hypothesis testing and confidence intervals) should be the focus
of the analysis plan, along with a detailed descriptive summary. Regardless of the phase
of the study, data on PROs should be treated just like any other study endpoints and
analyzed with the same analytical rigor.
As discussed in the section “Longitudinal Designs,” event-driven designs are generally
associated with a repeated measures longitudinal model, where time is a categorical
covariate. Restricted maximum likelihood estimation of repeated measures models can
account for incomplete data and time-varying covariates. Time-driven designs are
associated with mixed-effect longitudinal models via growth curve models, where time is
taken as a continuous covariate. It is generally sufficient for these models to include
polynomial or piecewise linear models and typically to allow one to three random effects
(intercept; intercept and slope; intercept, slope, and additional variation over time).
Both repeated measures models and mixed-effect models incorporate all available data and
assume that data are missing at random.
Inferential testing of data on PROs should consider the analytical issues specific to
the evaluation of PROs in a clinical trial. For example, many instruments have multiple
domains, and each instrument may be measured a number of times. Multiple comparisons
then become an important issue that deserves special consideration. Missing data usually
occur in PRO studies. How to handle the missing data also requires special
consideration. More detail on these two issues follows.
Multiple Testing
It has been well recognized that multiple comparisons of drug treatments can result in
falsely significant results. Because data on a particular PRO are usually measured over
a number of time points, and because the same study may comprise multiple PROs (or
multiple subscales within the same PRO instrument), it becomes important to describe in
the SAP how to deal with this multiplicity issue, especially if the evaluation in the
clinical trial is intended for label claims based on PRO outcomes. Several methods can
be applied to address multiple testing (Fairclough 2010).
One of the methods is to use summary measures or summary statistics. For many
instruments, a single score can be constructed by aggregating data across different
domains on the same questionnaire. Such a summary score can be used as the primary
endpoint for hypothesis testing and, consequently, avoids the concern of repeated
testing on multiple domains of the same instrument.
Summary measures can also be constructed on a particular subscale or domain of an
instrument to summarize the repeated observations over time on an individual and then
across individuals in the same treatment group. Examples include, for each treatment
group, the average of within-subject posttreatment values, area under the growth curve,
and time to reach a peak or prespecified value. The use of these summary measures begins
with the construction of a summary measure for each individual, follows with the
analysis of a summary measure across individuals for a within
group, and then continues with a corresponding between-group comparison. For instance,
it is possible to construct summary statistics on the repeated measures within a group
of individuals by taking the average rate of change over time for a treatment group and
then comparing these summary statistics between groups.
A potential problem with the use of the summary score is that significant changes in
some specific domains may be masked and what is really measured may become clouded or
convoluted, resulting in low confidence about the effect of treatment as measured by the
summary score. A drawback of summary measures across time is that they do not fully
capture the weighted and correlated nature of repeated observations on PROs over time.
Another way to minimize the problem of multiplicity is to restrict the number of key
domains and time points to no more than a few. These key domains at specific time points
should be prespecified in the SAP as primary endpoints for statistical inference. Other
domains at other time points may be regarded as secondary endpoints. While this
recommendation provides a straightforward way to handle the multiplicity issue, a major
challenge is how to select the most appropriate domains and time points. One way to
address this challenge is to rely on substantive knowledge, well-grounded theory, and
research objectives in tandem with the nature of the disease and the intended effects of
the interventions.
Often multiple endpoints, both PRO and non-PRO endpoints, would be of clinical interest.
One suitable method is to test them using a gatekeeping strategy whereby secondary
endpoints are analyzed and tested inferentially in a prespecified sequential order only
after success on a primary endpoint (Food and Drug Administration 2009). More generally,
the key endpoints are ranked from most important to least important from the list of
endpoints considered most relevant. This process can be done using a sequential method
by testing additional endpoints in a defined sequence, each at the usual alpha level of
0.05. The analyses cease when a failure occurs. It is important that the clinical trial
protocol specifies all relevant primary and secondary endpoints and their order for
inferential analysis and testing.
The problem of multiplicity can also be addressed in several other ways including
through p-value adjustment. Three types of p-value adjustment are commonly considered:
(1) Bonferroni, (2) Bonferroni-Holm (step-down) procedure, and (3) Hochberg’s (step-up)
method. Of the three methods, the Bonferroni procedure is the most conservative. In
contrast, Holm’s procedure and Hochberg’s method are less conservative and may be
preferable.
Missing Data
Missing data on PROs can have at least two major repercussions. At a minimum, the
missing data will result in wider confidence intervals and reduced statistical power for
detecting a treatment effect. The larger, more troublesome issue is the likelihood that
missing data are closely linked to patients’ health and treatment, leading possibly to a
biased estimation of treatment effects. Given these potential impacts, the SAP should
clearly describe how to handle missing data, especially if the evaluation on PROs is
intended for label claims or promotional use.
Missing data on PROs can occur as missing items or missing questionnaires. Missing items
involve the lack of responses for some specific items; missing questionnaires involve
patients who may fail to complete and return the whole questionnaire. Many instruments
include well-documented procedures by their developers on how to handle missing items.
Such recommendations by developers are typically the preferred way to address missing
items.
Missing questionnaires are a more complex situation than missing items. Missing
questionnaires can happen as a result of dropout from the study or randomly failing to
fill out an entire questionnaire. In any of these situations, it is important to first
analyze the rates (proportions) and reasons for missing data. Such information will help
to gauge the severity of the nonresponse problem and the underlying mechanisms for
missing data.
There are at least four approaches to address the missing data problem (Fairclough
2010). One
approach is to remove patients with missing or incomplete forms from the analysis and only analyze complete cases. While simple, this method is usually not recommended because it can break down initial randomization and reduce sample size and, in doing so, may produce biased results if the missing data are not missing completely at random. (Missing completely at random occurs when the missingness is unrelated to PRO value as when, e.g., a patient moves out of town or a staff member forgets to administer the questionnaire.)

A second approach is to impute the missing data. Different methods can be used for the imputation. The simplest way is to substitute the mean scores of patients with observed data for those with missing data (mean imputation). Unless the missing data are missing completely at random (MCAR), this mean imputation method may result in biased estimates and should be used cautiously. Another commonly used method is last observation carried forward, which replaces a patient's missing value with his or her last completed observation. In the event that data on PROs may not remain stable over time, last observation carried forward may also be suspect and result in a biased representation (Mallinckrodt et al. 2008). Analogous to the last observation carried forward approach is the baseline observation carried forward approach, in which all missing values for a subject are replaced by his or her baseline observation. Relative to the method based on last observation carried forward, this method can produce more conservative results for treatment differences.

Some more sophisticated techniques have been developed including regression imputation, hot deck imputation, and cold deck imputation. All of these techniques, like the simple mean imputation and last observation carried forward, belong to a single imputation category in which a single value is imputed for a specific missing point. A major limitation with single imputation methods is that estimated errors are generally too small, as the imputed values are treated as actual data when in fact they are not. However, this obstacle can be overcome by multiple imputation, whereby several values are imputed instead of just one. The multiple imputation method, which improves the accuracy of standard errors, assumes that the missing data are missing at random (MAR), where the missingness depends only on the observed data such as the most recently observed PRO value.

A third approach to address the problem of missing data is through the application of a likelihood-based approach using repeated measures models or mixed-effect models (Fairclough 2010; Fitzmaurice et al. 2011; Mallinckrodt et al. 2008). In this approach, every subject would contribute his or her available (observed) measurements. Repeated measures models and mixed-effect models employ a likelihood-based approach that is considered attractive because it can provide valid estimates of treatment effects if missing data are MCAR or MAR, where the missing data are said to be ignorable.

The fourth approach is especially relevant when missing data are not MAR and hence depend on the (unknown) missing value, in which case the missing data are said to be non-ignorable. In this case, selection models or pattern-mixture models, which do not assume that data are either MCAR or MAR, should be considered as secondary models in sensitivity analyses. For the analysis of longitudinal data, it is generally preferred to consider, depending on the circumstances, a repeated measures model or mixed-effect model as the main model and multiple imputation or pattern-mixture models (or both) as secondary models.

The National Research Council has produced an authoritative account on the prevention and handling of missing data in clinical trials (National Research Council 2010), which can be relevant to prevention and handling of missing PRO data.

Step 5: Reporting Data

The reporting of data on PROs is a critical component to their evaluation (Fig. 4). Data on PROs should be presented clearly, concisely, and sufficiently to foster clarity, transparency, and comprehension. While a table is a useful way to summarize study results, graphical presentations
[Figure: cumulative distribution of change from baseline by treatment group (line = experimental group, dashed line = control group). Vertical axis: Cumulative Proportion. Horizontal axis: Change From Baseline.]
usually useful to display individual responses, often using an a priori responder definition (i.e., the individual PRO score change over a predetermined time period that should be interpreted as a treatment benefit). The responder definition is determined empirically and may vary by target population or other clinical trial design characteristics. The empirical evidence for any responder definition is derived using anchor-based methods, which explore the association between the targeted concept of the PRO instrument and the concept measured by the anchor (or anchors). To be useful, the anchors chosen should be easier to interpret than the PRO measure itself.

A cumulative distribution function can display a continuous plot of the change from baseline on the horizontal axis and the cumulative percent of patients experiencing up to that change on the vertical axis. Consider a situation where lower change or more negative scores are better or more favorable (Fig. 5). In Fig. 5, 70% of the subjects in the experimental group had scores of 10 or less (i.e., 10 or better) compared with 55% of the subjects in the control group. The consistent horizontal separation between the distribution functions suggests that the treatment was beneficial relative to control over the entire range of changes.

Responder analysis and cumulative distribution functions are best suited as descriptive displays and as an adjunct to (that is, as a complement and supplement to) the main analysis based on the full original scale of measurement using established statistical methods (e.g., repeated measures models or mixed-effect models when the data are longitudinal).

References

Alcion L, Petersen JL, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.
Brooks MM, Jenkins LS, Schron EB, Steinberg JS, Cross JA, Paeth DS. Quality of life at baseline: is assessment after randomization valid? Med Care. 1998;26:1515–9.
Brown TA. Confirmatory factor analysis for applied research. New York: The Guilford Press; 2006.
Cappelleri JC, Gerber RA. Exploratory factor analysis. In: Chow S-C, editor. Encyclopedia of biopharmaceutical statistics. 3rd ed., revised and expanded. New York: Informa Healthcare; 2010. p. 480–5.
Cella D, Li JZ, Cappelleri JC, Bushmakin A, Charbonneau C, Kim ST, Chen I, Michaelson MD, Motzer RJ. Quality of life in patients with metastatic renal cell carcinoma treated with sunitinib versus interferon-alfa: results from a phase III randomized trial. J Clin Oncol. 2008;26:3763–9.
Fairclough DL. Patient reported outcomes as endpoints in medical research. Stat Methods Med Res. 2004;13:115–38.
Fairclough DL. Analysing longitudinal studies of QoL. In: Fayers P, Hayes R, editors. Assessing quality of life in clinical trials. Oxford: Oxford University Press; 2005. p. 149–65.
Fairclough DL. Design and analysis of quality of life studies in clinical trials. 2nd ed. Boca Raton: Chapman & Hall/CRC; 2010.
Fayers FM, Machin D. Quality of life: the assessment, analysis and interpretation of patient-reported outcomes. 2nd ed. Chichester: Wiley; 2007.
Fayers PM, Aaronson NK, Bjordal K, Groenvold M, Curran D, Bottomley A. On behalf of the EORTC quality of life group. In: EORTC QLQ-C30 scoring manual. 3rd ed. Brussels: EORTC; 2001.
Fetting JJ, Gray R, Fairclough DL, Smith TJ, Margolin KA, Citron ML, Grove-Conrad M, Cella D, Pandya K, Robert N, Henderson C, Osborne K, Abeloff MD. A 16-week multidrug regimen versus cyclophosphamide, doxorubicin and 5-flurouracil as adjuvant therapy for node-positive, receptor negative breast cancer: an intergroup study. J Clin Oncol. 1998;16:2382–91.
Fitzmaurice GH, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2011.
Food and Drug Administration. Guidance for industry on patient-reported outcome measures: Use in medical product development to support labeling claims. Fed Regist. 2009;74(235):65132–3.
Gotay CC, Korn EL, McCabe MS, Moore TD, Cheson BD. Building quality of life assessment into cancer treatment studies. Oncology. 1992;6:25–8.
Johnson JR, Temple R. Food and drug administration requirements for approval of new anticancer drugs. Cancer Treat Rep. 1985;69:1155–9.
Luo X, Cappelleri JC. A practical guide on interpreting and evaluating patient-reported outcomes in clinical trials. Clin Res Regul Aff. 2008;25:197–211.
Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Inf J. 2008;42:303–19.
National Research Council. The prevention and treatment of missing data in clinical trials. Washington, DC: The National Academies Press; 2010.
Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, O’Neill R, Kennedy DL. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health. 2007;10:S125–37.
Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61:102–9.
Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes: conceptual issues. Value Health. 2007;10:S66–75.
Russell IJ, Crofford LJ, Leon T, Cappelleri JC, Bushmakin AG, Whalen E, Barrett JA, Sadosky A. The effects of pregabalin on sleep disturbance symptoms among individuals with fibromyalgia syndrome. Sleep Med. 2009;10:604–10.
Snyder CF, Watson ME, Jackson JD, Cella D, Halyard MY, Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes instruction selection: designing a measurement strategy. Value Health. 2007;10:S76–85.
Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. New York: Oxford University Press; 2008.
Wang XS, Fairclough DL, Liao Z, Komaki R, Chang JY, Mobley GM, Cleeland CS. Longitudinal study of the relationship between chemoradiation therapy for non-small-cell lung cancer and patient symptoms. J Clin Oncol. 2006;24:4485–91.
Wiklund I. Assessment of patient-reported outcomes in clinical trials: the example of health-related quality of life. Fundam Clin Pharmacol. 2004;18:351–63.
24 Micro-simulation Modeling

Carolyn M. Rutter

Contents
Introduction
Development of a Microsimulation Model
Step 1: Define the Decision Problem
Step 2: Specify the Model Structure
Step 3: Identify Data Sources
Step 4: Select Model Parameters
Implementation
Example: Comparison of Two Tests to Screen for Colorectal Cancer
Sensitivity Analysis
Exploration and Description of Model Uncertainty
Model Validation
In Conclusion
References
improvement in methods for developing and applying microsimulation models. This chapter reviews the process of developing and applying a microsimulation model, drawing from guidelines for best practices for simulation outlined by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the Society for Medical Decision Making (SMDM) (Caro et al. 2012).

Introduction

Microsimulation models for health policy are a type of decision analytic model that describes disease processes by simulating key events that occur as disease develops. Their purpose is to help decision makers identify trade-offs associated with different policy decisions. For example, the National Cancer Institute has advanced the use of models for cancer outcomes through the Cancer Intervention and Surveillance Modeling Network (CISNET) (2014). CISNET models have been used to inform policy recommendations regarding use of newer colorectal cancer screening tests (fecal immunochemical tests, stool-based DNA, and computed tomography colonography) and to assist in development of guidelines for breast and colorectal cancer screening. As early as the 1980s, models were used by the American Cancer Society to aid in guideline development for cervical cancer screening and by the US Congress’s Office of Technology Assessment for evaluation of cervical and breast cancer screening policy (Eddy 1987; Muller et al. 1990).

Models have also been used to inform policy and clinical practice related to medications, radiology, vaccination, and HIV screening (Mandelblatt et al. 2012). Examples of policy-relevant findings from models include overdiagnosis of prostate cancer among PSA-detected cases (Etzioni et al. 2002); identification of efficient cervical cancer screening policies (van der Akker-van Marle et al. 2002); and the impact of modifiable risk factors, screening, and treatment on colorectal cancer (CRC) mortality rates (Vogelaar et al. 2006).

Models can help decision makers choose among competing courses of action by structuring and combining a wide range of evidence, including information about disease process and clinical and economic outcomes, and then predicting patient outcomes based on this evidence. Microsimulation models are used to predict outcomes under different policy scenarios and are especially useful for outcomes that cannot readily be studied via direct observation for ethical or practical reasons. Model predictions may extend cross-sectional results to longitudinal predictions, extend results to different patient populations, or make direct comparisons not made in available randomized trials. For example, randomized trials demonstrate that both fecal occult blood testing (FOBT) (Hardcastle et al. 1996; Kronborg et al. 1996; Towler et al. 1998) and flexible sigmoidoscopy (Atkin et al. 2010) reduce CRC mortality. There is no direct evidence that either optical colonoscopy or CT colonography reduces mortality, though several studies have estimated their sensitivity and specificity for detecting colorectal adenomas (the primary precursor of colorectal cancer) (Hixson et al. 1990; Johnson et al. 2008; Rex et al. 1997). Microsimulation models for colorectal cancer have been used to combine available information about the natural history of disease and screening tests to compare the effectiveness and cost-effectiveness of all four of these screening modalities (Knudsen et al. 2010; Lansdorp-Vogelaar et al. 2010).

Development of a Microsimulation Model

Table 1 shows the steps in developing a microsimulation model.

Step 1: Define the Decision Problem

The first job of the modeler is to define the decision problem, that is, the modeling objectives. It is important to be clear about the objectives, because these will drive model structure and complexity. Modeling is a collaborative process. Consulting with experts knowledgeable about the targeted
disease from the outset will help to ensure development of a useful model that addresses important policy questions. Clinicians and epidemiologists who are familiar with the disease process can help inform the model structure to ensure face validity of the model and can provide insight into key questions that cannot readily be addressed through direct observation. Policy makers and other end users can help to determine necessary model output and provide additional insight into policy questions.

Table 1 Steps in developing a microsimulation model
Step 1: Define the decision problem
    What interventions will be modeled?
    What events are of interest?
    What is the target population and what subgroups are of interest?
Step 2: Conceptualize the model structure
    Will models describe events in discrete or continuous time?
    What disease states and characteristics will the model describe?
    When (and how) do individuals transition between states?
Step 3: Identify and select data sources
    Which data will inform the model?
    How will each data source inform the model: as an input, calibration target, or validation target?
Step 4: Select model parameters
    Which parameters are “inputs” and which parameters will be calibrated?
    Which goodness of fit measure will be used to guide calibration?
    Which calibration method will be used for parameter selection?

Three key questions, described below, need to be considered when defining the decision problem.

What Interventions Will Be Modeled?
Interventions can include primary prevention of disease, screening for purposes of early detection, methods for diagnosing disease, and treatment after diagnosis. The action of the intervention will determine key health states that need to be included in the model structure. For example, models for screening need to describe disease states that occur before clinical (symptomatic) presentation, because screening affects the disease process through detection of preclinical states.

What Events Are of Interest?
Events that are outcomes, such as cases of and deaths from the disease of interest, need to be described by the model. All-cause death is another event that is almost always modeled, because it enables calculation of life-years gained (or lost) that result from intervening on the disease process. The events that are modeled are closely related to the interventions of interest. Models for prevention and screening need to describe preclinical (asymptomatic) disease processes. In contrast, models that focus on treatment of detected disease need to describe remission and recurrence.

What Is the Target Population and What Subgroups Are of Interest?
Those eligible for intervention often define the target population, with the earliest age of intervention defining the beginning of the age range, which may extend through the entire simulated life span. For example, models for cervical cancer screening focus on women who are 18 years and older, while models for breast cancer screening generally focus on women who are 40 years and older. Models examining treatment focus on patients diagnosed with disease. Specific subgroups may be defined by risk factors, such as race/ethnicity and family history or disease severity.

Some models are developed for very specific decision problems, while others are developed to address multiple problems. General purpose models tend to describe disease processes in greater detail, enabling modeling of the action of a wide range of possible interventions and capture of a wide range of possible outcomes. Therefore, models that are used for multiple decision problems tend to be more complex than more focused models. It can be difficult to choose the level of detail that will be described by the model. The modeler must strike a balance between simplicity, which eases communication of model assumptions, and complexity, which may increase face validity.
perfect specificity. The sensitivity of colonoscopy was assumed to depend on the size of the lesion; the probability of missing a lesion that is s mm in diameter is given by $P(\text{miss} \mid \text{size} = s,\ s < 20) = 0.34 - 0.0349s + 0.0009s^2$, with perfect sensitivity for adenomas 20 mm and larger. The associated miss rates for lesions that are 1 mm, 5 mm, 10 mm, and 15 mm in size are 31%, 19%, 8%, and 2%, respectively (for example, for s = 10, 0.34 − 0.349 + 0.09 = 0.081, or about 8%).

What Events Are of Interest?
For this question, the key outcome event is colorectal cancer death. However, other-cause death also needs to be modeled to enable estimation of life-years saved and accurate description of the screened population. In addition, models need to describe the preclinical disease processes because screening can reduce mortality by its effect on two preclinical processes: (1) by detecting cancer at an earlier stage, before it is clinically detected (through presentation with symptoms), and (2) by preventing disease through detection and removal of precancerous lesions (adenomas). It will be important to describe adenoma size in this model because both the probability of screen-detecting an adenoma and the probability that an adenoma transitions to cancer increase with increasing adenoma size.

What Is the Target Population and What Subgroups Are of Interest?
The decision problem in this example focuses on average risk individuals, who begin screening at age 50. Individuals at high risk for colorectal cancer, because of family history of colorectal cancer or diagnosis with genetic conditions, often begin screening at earlier ages.

Step 2: Specify the Model Structure

Once the decision problem is defined, the modeler must specify the model structure (Roberts et al. 2012). The structure of the model is driven by the decision problem in combination with an understanding of the disease process, which may be rooted in empirical data representing the cumulative scientific knowledge. In this way, data may indirectly inform the model; however, data availability should not necessarily determine a model’s structure. The structure of the model must be sufficient to address the decision problem, and this may require description of processes that cannot be directly observed (such as tumor growth). If the model structure is not supported by data, this limited understanding of the underlying disease process should be noted. Processes that are not well supported by data can be explored through sensitivity analysis.

When specifying a microsimulation model, the modeler must choose whether to model time as discrete or continuous, the distinct health states that the model will describe, and rules for transitioning between states.

Will Models Describe Events in Discrete or Continuous Time?
The decision to model time as continuous or discrete is closely tied to the type of model used for simulations. Different types of health policy models are described below, including some models that are not used for microsimulation.

Decision trees are relatively simple models that are used to describe outcomes for groups of individuals (Petitti 2000). At each branching point, the tree specifies the probability of each subsequent outcome, for example, whether an individual has disease and, among people who have disease, whether a test is positive or negative. Using a decision tree, alternative courses of action are compared by calculating the expected value of the outcome resulting from each pathway (i.e., multiplying the value assigned to each potential outcome by the probability that it occurs and summing over outcomes). Because they do not explicitly incorporate time, decision trees are useful for simple decision problems with short time horizons, such as the short-term effects of diagnostic assessment, but they are not well suited to modeling of repeated events, such as a regimen of screening.

State transition models are more complex than decision trees and are useful for describing events over longer time frames than decision trees. State transition models incorporate time by updating state membership at discrete time intervals or cycles. Because only a single transition can
occur in each cycle, cycle length should be selected with the understanding that only one event can occur within a cycle. For example, if there are disease-free, preclinical disease, and clinical disease states, and individuals are required to pass through the preclinical state, then in one cycle individuals could transition from disease-free to preclinical disease, or from preclinical disease to clinical disease, but not from disease-free to clinical disease. Cycle length does not need to be uniform; it can depend on the state. However, shortening the cycle length for a given timeframe increases the total number of simulated transitions, increasing computational time.

State transition models that describe the transition of groups of individuals are called Markov process models (Beck and Pauker 1983; Siebert et al. 2012). Markov models assume that the probability of the transition from one state to the next depends only on the current state and is independent of prior history (i.e., how members got to the state). Because of this, Markov models are commonly described as “memory-less.” For example, when using a Markov model for screening, the probability of transition to the next screening test depends only on the outcome of the current test rather than the entire simulated screening history.

Markov process models assume that individuals who occupy the same state are homogeneous, that is, they are governed by the same rules for transitioning into the next health state. The number of states must be increased when there is interest in patient subgroups with different transition probabilities that reflect differences in disease characteristics. The number of states can also increase when modelers relax the Markov assumption by carrying past health state information forward. Because of this, the number of disease states needed to adequately describe a disease process can quickly increase, a problem known as “state explosion.” As the number of states increases, Markov process models become intractable.

Fig. 1 Bubble graph showing the states and allowed transitions between states for the colorectal cancer model

State transition models that describe the transition of individuals are a type of microsimulation model. Simulated individuals can be assigned characteristics (such as age, sex, or race), and the model can allow transitions to depend on these characteristics. By explicitly allowing individuals in the same state to be governed by different transition rules, microsimulation models are able to limit the total number of states. For example, consider the colorectal cancer model shown in Fig. 1, which includes six states: (1) alive and disease-free; (2) alive with one or more adenomas, but no cancer; (3) alive with preclinical cancer; (4) alive with detected cancer; (5) dead from colorectal cancer; and (6) dead from other causes. Suppose now that the model needs to allow all transitions to depend on sex. Using a Markov process model, this would require expansion to a 10-state model (assuming death states are the
same for men and women). In contrast, a state modeling approach can be used to simulate indi-
transition microsimulation model could describe viduals with specific characteristics (such as age,
this process using the same six states, by allowing sex, or race).
transitions to depend on the sex of the simulated
individuals, and some transitions could be What Distinct Disease States
modeled as identical for both men and women. and Characteristics Will the Model
Discrete event simulation (DES) models are Describe?
another type of microsimulation model that All models require specification of a set of mutu-
describe the movement individuals through dis- ally exclusive disease states that reflect the
tinct disease states in continuous time (Karnon disease processes of interest, such as the six
et al. 2012). Discrete event simulation models states shown for colorectal cancer in Fig. 1.
are useful when modelers can better characterize This basic model must be expanded to evaluate
transitions with time-to-event models than with endoscopic tests because large adenomas are
transition probabilities over fixed periods. For easier to detect than small adenomas. Both state
example, when modeling disease incidence, a transition and DES models could address the
state transition model would specify incidence need for adenoma size information by expanding
probabilities that are tied to the model’s cycle the model to include the size of the largest ade-
length (e.g., annual incidence probabilities), noma (e.g., diminutive (<5 mm), small
while DES models could use time-to-event (sur- (5 10 mm), or large (10 mm)). Alternatively,
vival) models to simulate the age at disease DES models can describe adenoma growth as a
incidence. continuous process, which essentially describes
Models for infectious diseases are more complicated because they describe transmission of disease between individuals, and therefore individuals are not independently simulated (Pitman et al. 2012). Two broad types of models are used to simulate infectious disease at the population level: dynamic transition models and agent-based models. Dynamic transition models for infectious disease model groups of individuals and describe transitions using differential equations (Brauer and Castillo-Chavez 2013). These are also known as compartmental models, and they describe the transitions of individuals between compartments (or states) in continuous time. Agent-based models are an extension of discrete event simulation that allows interactions between individuals (Hunt et al. 2013; Luke and Stamatakis 2012). This chapter focuses on models that are useful for noninfectious diseases. However, many of the issues associated with DES and state transition microsimulation also apply to agent-based models.

In summary: State transition models describe individual disease trajectories in discrete time, with time periods given by cycle lengths. Discrete event simulation (DES) models describe individual disease trajectories in continuous time. Either

the time to reach various sizes. Modeling continuous growth requires assumptions about the nature of adenoma growth but allows flexibility in how adenoma size is incorporated into an intervention examined in the decision problem.

When (and How) Do Simulated Individuals Transition Between States?
Rules for moving individuals between states in a state transition model are based on cycle length, that is, how often state memberships are updated, and are given by probabilities for each possible transition.

Rules for moving individuals between states in DES models are based on time-to-event distributions, life tables that characterize the time between successive events or, possibly, continuous growth. Time-to-event distributions take only positive values and include distributions typically used in survival analysis, such as exponential and Weibull distributions. While state transition models have a single type of parameter (transition probabilities), DES models can incorporate a range of parameter types that are associated with different time-to-event distributions.
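The contrast between the two transition rules can be illustrated with a minimal Python sketch. It is not taken from any published model: the function names, the constant hazard, and the Weibull parameters are hypothetical and chosen only for illustration. The state transition version checks for an event once per cycle using a transition probability, while the DES version draws the time to event directly from a time-to-event distribution.

```python
import math
import random


def cycle_probability(rate, cycle_length):
    """Per-cycle transition probability implied by a constant hazard `rate`."""
    return 1.0 - math.exp(-rate * cycle_length)


def simulate_state_transition(rate, cycle_length, rng, max_cycles=10_000):
    """Discrete time: state membership is updated once per cycle."""
    p = cycle_probability(rate, cycle_length)
    for cycle in range(1, max_cycles + 1):
        if rng.random() < p:
            # The event is only known to have occurred by the end of this cycle.
            return cycle * cycle_length
    return math.inf  # no transition within the simulated horizon


def simulate_des(shape, scale, rng):
    """Continuous time: draw the time to event from a Weibull distribution.

    random.weibullvariate takes (scale, shape); shape = 1 gives an exponential.
    """
    return rng.weibullvariate(scale, shape)


if __name__ == "__main__":
    rng = random.Random(2024)  # hypothetical seed
    print("state transition event time:", simulate_state_transition(0.1, 1.0, rng))
    print("DES event time:", simulate_des(1.0, 10.0, rng))
```

In the state transition version every event time is a multiple of the cycle length, so a shorter cycle gives a finer approximation to continuous time at the cost of more updates per simulated individual.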
566 C. M. Rutter
• There is no possible benefit for trajectories D, E, F, or G. Although screening has the potential to detect and remove adenomas (D) or to detect preclinical cancer (E), both trajectories simulate death before the cancer becomes symptomatic. For trajectory F, the simulated adenoma that could be detected at screening does not develop into preclinical cancer before other-cause death. For trajectory G, no disease events are simulated to occur.

When (and How) Do Individuals Transition Between States?
This component of the model is made up of the mathematical functions and probability distributions that govern between-state transitions. The following section describes between-state transition rules for the CRC-SPIN model. Additional details are provided in Rutter and Savarino (2010).

The model describes the initiation of adenomas using a nonhomogeneous Poisson process that allows adenoma risk to vary systematically by gender and age and to vary randomly across individuals. Under this model, the log-risk of developing an adenoma for the ith simulated individual is given by

$$\alpha_{0i} + \alpha_1 \mathrm{sex}_i + \sum_{k=1}^{4} \delta\left(A_k < \mathrm{age}_i(t) \le A_{k+1}\right) \left\{ \mathrm{age}_i(t)\,\alpha_{2k} + \sum_{j=2}^{k} A_j \left(\alpha_{2,j-1} - \alpha_{2j}\right) \right\}$$

Here, $\delta(\cdot)$ is an indicator function with $\delta(x) = 1$ when $x$ is true and $\delta(x) = 0$ otherwise, and $\mathrm{age}_i(t)$ is the ith individual's simulated age at time $t$. Increases in adenoma risk with age are modeled with a piecewise linear function, with changes in slope at the ages $A_k$, where $A_1 = 20$, $A_2 = 50$, $A_3 = 60$, $A_4 = 70$, and $A_5 = 100$.

Once an adenoma is initiated, the model assigns two characteristics: a location in the colorectum (colon/rectum) and a growth rate. Adenomas grow based on the Janoschek growth curve model, given by $d_{ij}(t) = d_1 - (d_1 - d_0)\exp(-\lambda_{ij} t)$, where $d_{ij}(t)$ is the maximum diameter of the jth adenoma in the ith individual at time $t$ after initiation. The minimum detectable adenoma size is set to $d_0 = 1$ mm, and the maximum adenoma size is set to $d_1 = 50$ mm. Variation in growth across adenomas is allowed by varying the time it takes to reach 10 mm, given by $t_{10} = -\ln\left((d_1 - 10)/(d_1 - d_0)\right)/\lambda$, allowing $t_{10}$ to follow a type I extreme value distribution. Individuals transition out of the adenoma state in two ways: (1) any adenoma transitions to preclinical cancer, or (2) all adenomas are detected and removed during a colonoscopy exam.

Simulated individuals transition into the preclinical cancer state when any one of their adenomas becomes cancerous. For each adenoma, the model assigns a size at transition based on the lognormal distribution, with an expected size at transition that depends on location in the colon and rectum, gender, and age at initiation. Adenomas do not transition to preclinical cancer if the individual dies before the adenoma reaches transition size. Once in the preclinical cancer state, disease can be screen-detected, perhaps at an earlier stage than if it becomes clinical cancer, but the person cannot transition back to the disease-free or adenoma-only states.

Simulated individuals transition into the clinical cancer state when any preclinical cancer becomes clinically detected. Once the model simulates a preclinical cancer, the lesion is assigned a time to clinical cancer, based on a lognormal distribution that depends on location of the preclinical cancer (colon or rectum).

Once cancer is detected (clinically or through screening), the model assigns a stage at detection. For clinically detected cancers, stage is assigned based on the observed (SEER) stage distribution of clinically detected cancers. The model specifies that screen detection finds cancer at the same stage or earlier than clinical detection. Simulated individuals can only die from colorectal cancer after cancer is detected. Time from colorectal cancer diagnosis to death is based on survival probabilities derived from analysis of SEER data and is a function of age at diagnosis, gender, stage at diagnosis, and year of diagnosis.
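The adenoma risk and growth rules described in this section can be written out as a short Python sketch. This is an illustration only, not the CRC-SPIN code: the function names, the 0/1 coding of sex, and all numeric parameter values in the usage lines are hypothetical. The sketch assumes that the growth curve and the time to 10 mm carry negative signs in the exponent and the logarithm (so that adenomas grow from d0 toward d1), and that the upper limit of each age interval is inclusive.

```python
import math

AGE_KNOTS = [20.0, 50.0, 60.0, 70.0, 100.0]  # A_1 ... A_5 from the text
D0, D1 = 1.0, 50.0                           # minimum and maximum diameter (mm)


def log_adenoma_risk(age, sex, alpha0, alpha1, alpha2):
    """Piecewise-linear log-risk of adenoma initiation.

    alpha2 holds the four age slopes (alpha_21 ... alpha_24). The inner sum
    keeps the function continuous where the slope changes. Outside the
    interval (20, 100] no age term is added in this sketch.
    """
    value = alpha0 + alpha1 * sex
    for k in range(1, 5):  # k = 1, ..., 4
        if AGE_KNOTS[k - 1] < age <= AGE_KNOTS[k]:
            offset = sum(
                AGE_KNOTS[j - 1] * (alpha2[j - 2] - alpha2[j - 1])
                for j in range(2, k + 1)
            )
            value += age * alpha2[k - 1] + offset
    return value


def adenoma_diameter(t, lam):
    """Janoschek growth curve: diameter (mm) at time t after initiation."""
    return D1 - (D1 - D0) * math.exp(-lam * t)


def time_to_10mm(lam):
    """Time for an adenoma with growth rate lam to reach 10 mm."""
    return -math.log((D1 - 10.0) / (D1 - D0)) / lam


if __name__ == "__main__":
    slopes = [0.03, 0.02, 0.01, 0.005]  # hypothetical slopes
    print(log_adenoma_risk(55.0, 1, -6.0, 0.2, slopes))
    lam = 0.05                          # hypothetical growth rate
    print(time_to_10mm(lam), adenoma_diameter(time_to_10mm(lam), lam))
```

Evaluating the growth curve at the computed time to 10 mm returns 10 mm, which is a quick consistency check between the two formulas.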
used to measure how close the model predictions are to the observed data. At least three measures can be used to measure goodness of fit (GOF): least squares, chi-squared, or likelihood methods (Vanni and Karnon 2011). Least squares minimizes the sum of squared differences between predicted values, $P_i$, and observed values, $O_i$. The chi-square approach scales these differences, for example, by dividing by the standard deviation of the observed data, $\sigma_i$: $\sum_i \left( (O_i - P_i)/\sigma_i \right)^2$. The goal of calibration is to minimize the distance between the observed and predicted values, that is, to minimize the least squares or chi-square statistics. A third common approach is to use the likelihood of the data at a specific parameter value, $\hat{\theta}$, that is, the probability of the observed data at $\hat{\theta}$. The goal of calibration is to maximize the likelihood. The likelihood approach requires specification of a probability distribution for observed data as a function of model parameters or simulation-based estimation of the likelihood at $\hat{\theta}$ (Rutter et al. 2009).

Which Calibration Method Will Be Used for Parameter Selection?
The next step in model calibration is selection of a search strategy. There are two primary approaches to model calibration: undirected and directed searches (Rutter et al. 2010).

sample of parameters might miss regions of good fit.

Directed Searches
Directed searches move through the parameter space by “hill climbing,” that is, moving in a direction of improving goodness of fit. If the functional form of the likelihood is available, then the algorithm can take steps in directions that are based on the derivative of the likelihood function, with movements in the direction of most rapid increase (“up the hill”). In general, microsimulation models do not have closed-form expressions for these derivatives. This can be addressed by using approximations to the derivative or by using the Nelder-Mead algorithm, which does not require derivatives. Directed searches may find parameter values that provide locally, but not globally, good fit to calibration targets. To avoid this problem, directed searches should be initiated at multiple widely dispersed points within the parameter space. Directed searches for model calibration are generally more computationally efficient than grid search approaches, requiring fewer model runs for calibration.

Implementation
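As a minimal sketch of a directed search, the Python code below calibrates a toy two-parameter model to three targets by minimizing the chi-square statistic from several random starting points. Everything here is hypothetical: the "model" is a straight line standing in for a microsimulation, the targets are made up, and the search is a simple derivative-free compass search rather than the Nelder-Mead algorithm mentioned above. It is meant only to show the structure (a goodness-of-fit function, a hill-climbing step, and multiple dispersed starts).

```python
import random

# Hypothetical calibration targets: observed rates at three ages.
AGES = [50.0, 60.0, 70.0]
OBSERVED = [1.5, 2.0, 2.5]
SIGMA = [0.1, 0.1, 0.1]


def chi_square(observed, predicted, sigma):
    """Sum of squared differences, each scaled by its standard deviation."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, sigma))


def toy_model(theta, ages=AGES):
    """Hypothetical stand-in for a microsimulation run: predicted rate by age."""
    a, b = theta
    return [a + b * (age - 60.0) / 10.0 for age in ages]


def objective(theta):
    """Goodness of fit of the toy model to the targets (smaller is better)."""
    return chi_square(OBSERVED, toy_model(theta), SIGMA)


def hill_climb(func, start, step=1.0, min_step=1e-6):
    """Derivative-free directed search.

    Tries a step up and down along each parameter, keeps any move that
    improves fit, and halves the step size when no move helps.
    """
    current = list(start)
    best = func(current)
    while step > min_step:
        improved = False
        for i in range(len(current)):
            for delta in (step, -step):
                trial = list(current)
                trial[i] += delta
                value = func(trial)
                if value < best:
                    current, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return current, best


def calibrate(func, bounds, n_starts=5, seed=0):
    """Run the directed search from several dispersed random starting points."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(hill_climb(func, start))
    return min(results, key=lambda result: result[1])


if __name__ == "__main__":
    theta_hat, gof = calibrate(objective, bounds=[(0.0, 5.0), (-2.0, 2.0)])
    print("calibrated parameters:", theta_hat, "chi-square:", gof)
```

For a real microsimulation each call to the objective requires a full model run, and the simulated output is itself noisy, so the number of starts and the stopping rule become practical trade-offs.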
Table 2 Simulated effect of screening for colorectal cancer, based on a cohort of individuals screened at age 50. The table below shows predictions per 100,000 individuals screened

Outcome                                  No screening   FIT every year   Colonoscopy every 10 years
Screen-detected colorectal cancers       0              0.49             0.13
Clinically detected colorectal cancers   5.73           0.64             0.42
Colorectal cancer deaths                 2.08           0.30             0.17
Colorectal cancers prevented             0              5.09             5.31
Colorectal cancer deaths prevented       0              1.78             1.91
Life-years gained                        0              19.05            20.92
Number of screening colonoscopies        0              173.4            429.1
include costs, the costs assigned to various actions and events in the model need to be clearly described.
• Calibration Targets and Model Fit to Targets: This describes the observed data that the model is able to simulate accurately and how accurately the model simulates these data.

Technical documentation should be sufficiently detailed to enable others to reproduce the model, if they wish. This documentation should include:

• Mathematical formulae for transition rules: if the model is based entirely on fixed transition probabilities, then these should be provided.
• Methods used for model calibration, as this would enable others to reproduce the model.

While release of computer code is seemingly the most transparent approach, this strategy is time consuming and ultimately uninformative to the vast majority of end users, so that code release may obscure rather than clarify the model assumptions.

Internal Validity
Internal validity, or verification, refers to coding accuracy. Verification of code is a process that takes place within a modeling team and can be facilitated by modular programming to allow testing of specific blocks of code.

Cross-Validation
Cross-validation, also known as comparative modeling, is based on comparing results obtained from different models and is the primary method for evaluating structural variability. Cross-validation provides a way to assess model predictions in the absence of observed or “gold standard” information and also provides a way of exploring unobserved or unobservable phenomena that are predicted by the model but cannot be validated against observed data, such as predicted disease incidence in future years. Cross-validation may be reassuring when model predictions are similar, but when there are differences, cross-validation does not provide a method for choosing the correct or best model.

The National Cancer Institute, through the CISNET group (National Cancer Institute), has championed the comparative modeling approach by funding more than one modeling group to address policy questions. Examples of comparative modeling include estimation of the combined effects of screening and treatment on breast cancer mortality based on seven CISNET models for breast cancer (Berry et al. 2006) and the Mt. Hood Challenge comparing diabetes models (The Mount Hood 4 Modeling Group 2007). Each of these groups compared models only after standardizing the calibration targets. Without such cooperation, with each group simulating and presenting results under the same conditions, it can be difficult to directly compare model results. Cross-model comparisons can be very time consuming, involving coordination across modeling groups, and so are generally only practical for major policy questions.

External Validation
External validation refers to how well the model is able to predict (or “fit”) existing data that were not used for model calibration. Predictive validation takes this idea a step further and refers to how well the model is able to predict study outcomes before they are observed. Among the validity measures discussed, external validity and predictive validity most closely correspond to the models’ purpose and therefore are critical to model confidence. Yet it is uncommon for models to carry out external or predictive validation exercises, largely because of data limitations.

Both external and predictive validation exercises require new data. For a model to be immediately validated after development, some data would have to be held out for validation. But because models are complex, modelers often need to use all available data to inform parameters. In some cases, modelers may validate to data that are partially dependent on calibration data, which represents a gray area between goodness of fit to calibration targets (sometimes referred to as internal calibration) and external validation. For example, a model may use overall disease
incidence rates by decades of age as a calibration target and then validate the model by predicting incidence rates by sex and age in years. To maintain trust in a model, it is critical that modelers be transparent about their validation approaches, clearly stating when partially dependent data are used for validation.

In Conclusion

Microsimulation models are a powerful tool for systematically combining evidence from a variety of sources to provide critical information to health policy decision makers. Decision problems can be unconstrained, assuming unlimited resources, or they can be constrained to restrict resources such as total costs or treating physicians. The use of models to inform policy is increasing, partly due to increasing computational power but also because of increasing interest in evidence-based medicine. Yet there remain concerns about credibility of model predictions. These concerns are a natural consequence of the complexity of models and their focus on prediction, which requires extrapolation beyond available data. One way to build model credibility is to make model assumptions as transparent as possible. Another way to build credibility is through model predictions, that is, by comparing model predictions to observed data and, when possible, allowing end users to examine model predictions under different hypothetical scenarios.

References

Atkin WS, Edwards R, Kralj-Hans I, et al. Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: a multicentre randomised controlled trial. Lancet. 2010;375(9726):1624–33.
Beck J, Pauker S. The Markov process in medical prognosis. Med Decis Mak. 1983;3:419–58.
Berry DA, Inoue L, Shen Y, et al. Modeling the impact of treatment and screening on U.S. breast cancer mortality: a Bayesian approach. J Natl Cancer Inst Monogr. 2006;36:30–6.
Brauer F, Castillo-Chavez C. Mathematical models for communicable diseases. Philadelphia: Society for Industrial and Applied Mathematics; 2013.
Briggs AH, Weinstein MC, Fenwick EA, et al. Model parameter estimation and uncertainty analysis: a report of the ISPOR-SMDM modeling good research practices Task Force Working Group-6. Med Decis Mak. 2012;32(5):722–32.
Cancer Incidence – Surveillance, Epidemiology, and End Results (SEER) Registries Research Data [database on the Internet]. National Cancer Institute, Surveillance Systems Branch. 2012. Available from: http://seer.cancer.gov/data/seerstat/nov2011/.
Caro JJ, Briggs AH, Siebert U, et al. Modeling good research practices – overview: a report of the ISPOR-SMDM modeling good research practices Task Force-1. Med Decis Mak. 2012;32(5):667–77.
Church JM. Clinical significance of small colorectal polyps. Dis Colon Rectum. 2004;47(4):481–5.
CISNET. 2014. Available at: http://cisnet.cancer.gov. Accessed 30 Apr 2014.
Cronin KA, Legler JM, Etzioni RD. Assessing uncertainty in microsimulation modelling with application to cancer screening interventions. Stat Med. 1998;17(21):2509–23.
Doubilet P, Begg CB, Weinstein MC, et al. Probabilistic sensitivity analysis using Monte Carlo simulation. A practical approach. Med Decis Mak. 1985;5(2):157–77.
Eddy D. Breast cancer screening for Medicare beneficiaries: effectiveness, costs to Medicare and medical resources required. Washington, DC: U.S. Congress, Health Program, Office of Technology Assessment; 1987.
Eddy DM, Hollingworth W, Caro JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM modeling good research practices Task Force-7. Med Decis Mak. 2012;32(5):733–43.
Etzioni R, Penson DF, Legler JM, et al. Overdiagnosis due to prostate-specific antigen screening: lessons from U.S. prostate cancer incidence trends. J Natl Cancer Inst. 2002;94(13):981–90.
Hardcastle JD, Chamberlain JO, Robinson MH, et al. Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet. 1996;348(9040):1472–7.
Hixson LJ, Fennerty MB, Sampliner RE, et al. Prospective study of the frequency and size distribution of polyps missed by colonoscopy. J Natl Cancer Inst. 1990;82(22):1769–72.
Hunt CA, Kennedy RC, Kim SH, et al. Agent-based modeling: a systematic assessment of use cases and requirements for enhancing pharmaceutical research and development productivity. Wiley Interdiscip Rev Syst Biol Med. 2013;5(4):461–80.
Imperiale TF. Sigmoidoscopy screening: understanding the trade-off between detection of advanced neoplasia and diagnostic efficiency. J Natl Cancer Inst. 2013;105(12):846–8.
Imperiale TF, Wagner DR, Lin CY, et al. Risk of advanced proximal neoplasms in asymptomatic adults according to the distal colorectal findings. N Engl J Med. 2000;343(3):169–74.
Johnson CD, Chen MH, Toledano AY, et al. Accuracy of CT colonography for detection of large adenomas and cancers. N Engl J Med. 2008;359(12):1207–17.
Karnon J, Stahl J, Brennan A, et al. Modeling using discrete event simulation: a report of the ISPOR-SMDM modeling good research practices Task Force-4. Value Health. 2012;15(6):821–7.
Knudsen AB, Lansdorp-Vogelaar I, Rutter CM, et al. Cost-effectiveness of computed tomographic colonography screening for colorectal cancer in the Medicare population. J Natl Cancer Inst. 2010;102(16):1238–52.
Kronborg O, Fenger C, Olsen J, et al. Randomised study of screening for colorectal cancer with faecal-occult-blood test. Lancet. 1996;348(9040):1467–71.
Lansdorp-Vogelaar I, Kuntz KM, Knudsen AB, et al. Stool DNA testing to screen for colorectal cancer in the Medicare population. A cost-effectiveness analysis. Ann Intern Med. 2010;153(6):368–77.
Levin B, Lieberman DA, McFarland BG, et al. Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-society task force on Colorectal Cancer, and the American College of Radiology. Gastroenterology. 2008;134(5):1570–95.
Lieberman DA, Weiss DG, Bond JH, et al. Use of colonoscopy to screen asymptomatic adults for colorectal cancer. Veterans affairs cooperative study group 380. N Engl J Med. 2000;343(3):162–8.
Loeve F, Boer R, Zauber AG, et al. National Polyp Study data: evidence for regression of adenomas. Int J Cancer. 2004;111(4):633–9.
Luke DA, Stamatakis KA. Systems science methods in public health: dynamics, networks, and agents. Annu Rev Public Health. 2012;33:357–76.
Mandelblatt J, Schechter C, Levy D, et al. Building better models: if we build them, will policy makers use them? Toward integrating modeling into health care decisions. Med Decis Mak. 2012;32(5):656–9.
Muller CM, Mandelblatt J, Schechter C. The cost and effectiveness of cervical cancer screening in elderly women. Washington, DC: Congress of the United States, Office of Technology Assessment; 1990.
National Cancer Institute. Cancer Intervention and Surveillance Modeling Network (CISNET). n.d. Available at: http://cisnet.cancer.gov/. Accessed 2008.
National Center for Health Statistics. US Life Tables. 2000. Available at: www.cdc.gov/nchs/products/pubs/pubd/lftbls/life/1966.htm. Accessed 2013.
Odom SR, Duffy SD, Barone JE, et al. The rate of adenocarcinoma in endoscopically removed colorectal polyps. Am Surg. 2005;71(12):1024–6.
Parmigiani G. Measuring uncertainty in complex decision analysis models. Stat Methods Med Res. 2002;11(6):513–37.
Petitti DB. Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine. 2nd ed. New York: Oxford University Press; 2000. 306 p.
Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med. 2003;349(23):2191–200.
Pitman R, Fisman D, Zaric GS, et al. Dynamic transmission modeling: a report of the ISPOR-SMDM modeling good research practices Task Force Working Group-5. Med Decis Mak. 2012;32(5):712–21.
Rex DK, Cutler CS, Lemmel GT, et al. Colonoscopic miss rates of adenomas determined by back-to-back colonoscopies. Gastroenterology. 1997;112(1):24–8.
Rex DK, Johnson DA, Anderson JC, et al. American College of Gastroenterology guidelines for colorectal cancer screening 2009 [corrected]. Am J Gastroenterol. 2009;104(3):739–50.
Roberts M, Russell LB, Paltiel AD, et al. Conceptualizing a model: a report of the ISPOR-SMDM modeling good research practices Task Force-2. Med Decis Mak. 2012;32(5):678–89.
Rutter CM, Savarino JE. An evidence-based microsimulation model for colorectal cancer. Cancer Epidemiol Biomark Prev. 2010;19(8):1992–2002.
Rutter CM, Yu O, Miglioretti DL. A hierarchical non-homogenous Poisson model for meta-analysis of adenoma counts. Stat Med. 2007;26(1):98–109.
Rutter CM, Miglioretti DL, Savarino JE. Bayesian calibration of microsimulation models. J Am Stat Assoc. 2009;104(488):1338–50.
Rutter CM, Zaslavsky AM, Feuer EJ. Dynamic microsimulation models for health outcomes: a review. Med Decis Mak. 2010;31(1):10–8.
Siebert U, Alagoz O, Bayoumi AM, et al. State-transition modeling: a report of the ISPOR-SMDM modeling good research practices Task Force-3. Med Decis Mak. 2012;32(5):690–700.
Strul H, Kariv R, Leshno M, et al. The prevalence rate and anatomic location of colorectal adenoma and cancer detected by colonoscopy in average-risk individuals aged 40–80 years. Am J Gastroenterol. 2006;101(2):255–62.
The Mount Hood 4 Modeling Group. Computer modeling of diabetes and its complications: a report on the 4th Mount Hood challenge meeting. Diabetes Care. 2007;30:1638–46.
Towler B, Irwig L, Glasziou P, et al. A systematic review of the effects of screening for colorectal cancer using the faecal occult blood test, hemoccult. BMJ. 1998;317(7158):559–65.
U. S. Preventive Services Task Force. Screening for colorectal cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2008;149(9):627–37.
van der Akker-van Marle ME, van Ballegooijen M, van Ootmarssen GJ, et al. Cost-effectiveness of cervical cancer screening: comparison of screening policies. J Natl Cancer Inst. 2002;94:193–204.
Vanni T, Karnon J, Madan J, et al. Calibrating models in economic evaluation: a seven-step approach. PharmacoEconomics. 2011;29(1):35–49.
Vogelaar I, Van Ballegooijen M, Schrag D, et al. How much can current interventions reduce colorectal cancer mortality in the U.S.? Cancer. 2006;107:1623–33.
Network Meta-analysis
25
Georgia Salanti, Deborah Caldwell, Anna Chaimani, and
Julian Higgins
Contents
Introduction
  Example: Incident Diabetes with Antihypertensive Drugs
  A Roadmap to the Chapter
Meta-analysis of Head-to-Head Comparisons
  Types of Data that Feed into a Meta-analysis
  Meta-analysis and Meta-regression as Linear Model
  Meta-analysis as Hierarchical Model
  Fitting the Meta-analysis Model
Indirect and Mixed Comparison
  Theory and Formulae for Indirect Comparisons
  Theory and Formulae for Mixed Comparisons
  Assumptions Underlying Indirect and Mixed Comparisons
Models for Network Meta-analysis
  Consistency Models
  Assumptions of Network Meta-analysis
  Statistical Methods to Detect Inconsistency in a Network of Interventions
  Inconsistency Models
[Figure: network plot with nodes β-blockers, CCB, ARB, ACE inhibitors, Diuretics, and Placebo; lines between nodes are labeled with the number of studies]
Fig. 1 Plot of network for incidence of diabetes. The size of the nodes is proportional to the number of studies that evaluate each intervention and the thickness of the lines is proportional to the frequency of each comparison in the network. The numbers represent the number of studies including each comparison (CCB calcium channel blockers, ARB angiotensin receptor blockers, ACE angiotensin converting enzyme)
from Network Meta-analysis” reviews numerical and graphical methods for presenting results from network meta-analysis to assist clinicians with interpretation of findings.

Meta-analysis of Head-to-Head Comparisons

Pairwise meta-analysis summarizes the relative effectiveness of two interventions across N studies. Two basic parametric models are usually used: the fixed-effect model and the random effects model. Under the fixed-effect assumption, it is considered that all studies estimate the same underlying treatment effect. In the random effects model, it is assumed that there is a study-specific treatment effect underlying each study and that the observations from different studies estimate these different underlying effects. The study-specific underlying effects can be different yet related, and it is assumed that they “belong” to the same distribution. The variance of this distribution is the heterogeneity parameter describing the magnitude of the between-study variation. Meta-analysis can be viewed as a special case of a weighted linear regression or as a hierarchical model. The two formulations are equivalent, although linear regression is the most common approach in a frequentist implementation when treatment effect estimates are the starting point of the analysis (known as a “contrast-based” approach), and hierarchical models are usually encountered when summary data from each treatment group are the starting point of the analysis (an “arm-based” approach), often fitted in a Bayesian framework. These ideas are discussed in detail in the following three subsections.

Types of Data that Feed into a Meta-analysis

The systematic review process first requires identification and appraisal of studies that address the research question of interest. Then, relevant data are extracted from the studies that fulfill the predefined inclusion criteria. Consider that N studies, indexed with i = 1, ..., N and comparing two treatments A and B, are included that contribute data for a particular outcome of interest. The data from each study can be arm based or contrast based. The first term refers to data that apply to each arm; for dichotomous outcomes these can be the number of successes $r_{iA}$, $r_{iB}$ out of the total randomized $n_{iA}$, $n_{iB}$ for the arms A and B, respectively. For continuous outcomes, the arm-based data are the outcome means $m_{iA}$, $m_{iB}$, standard deviations $sd_{iA}$, $sd_{iB}$, and total numbers of participants $n_{iA}$, $n_{iB}$ per arm.

Instead of presenting the outcome in each arm separately, a study can report the difference in the outcome between the two arms using a statistic. The contrast-based approach refers to study-specific statistics that compare the two arms. With dichotomous outcomes, the statistics are usually odds ratios (OR), risk ratios, risk differences, or hazard ratios, whereas for continuous outcomes, they are usually mean differences, standardized mean differences, or ratios of means. The logarithmic transformation of ratio measures (e.g., odds and risk ratios) is typically applied in practice. Let $y_{iAB}$ be generic notation for one of these statistics, which will be referred to as “the effect size.” The sample variance of an effect size will be denoted with $s^2_{iAB}$. Of course, the arm-specific data can be transformed into contrast-based data before the start of the analysis. However, modeling arm-specific data is often an advantage in terms of model fit, and therefore detailed data, if available, should be given preference.

Meta-analysis and Meta-regression as Linear Model

Meta-analysis can be viewed as a linear regression model with no covariates. As each observation represents a study and these studies typically have different sample sizes, it is reasonable to weight the observations accordingly; hence, meta-analyses are fitted as a weighted linear
$$y_i = \mu_1 + \mu_2 x_i + \delta_i + e_i$$
$$\theta_i \sim N\left(\mu, \tau^2\right) \qquad (5)$$
calculate probabilities of one treatment being better than the other, or outperforming another by a specific magnitude. This is an important advantage when many treatments need to be compared and pairwise presentation of effect sizes becomes cumbersome. Calculation of probabilities is possible in a frequentist setting via resampling techniques, but this typically requires specialized routines or extra programming for the user.

Several software options exist that fit meta-analysis models in a frequentist setting. Freely available software includes RevMan and packages in R; a popular commercial option is STATA. The available routines and software frame the flexibility of models; for instance, it is not possible to fit arm-specific data using their exact likelihood with the existing meta-analysis specific routines.

With network meta-analysis increasing in popularity, Bayesian approaches have become popular as they offer greater flexibility, and WinBUGS is the most common software used. Meta-analysis can be fitted as a linear or hierarchical model, and both arm-specific and contrast-specific data can be modelled, giving Bayesian fitting a practical advantage compared to the frequentist approach.

Example: Subgroup Meta-analysis for ACE Inhibitors and CCB Versus β-Blockers
To exemplify the methods outlined above, consider the two comparisons CCB versus β-blockers and ACE inhibitors versus β-blockers from the network introduced earlier relating to incident diabetes. Firstly, a meta-regression model will be fitted, with dummy variables, to carry out subgroup analysis on contrast-based data (the ln(OR) for diabetes from each study), using the treatments being compared to define two subgroups. There are three studies comparing ACE inhibitors versus β-blockers and five comparing CCB versus β-blockers. Although a regression model is usually written with an intercept and one or more regression terms, it can also be written with no intercept as in Eq. 3. The eight observed ln(OR) estimates are denoted as $y_i$ using study indices, i = 1, 2, ..., 8. Each $y_i$ is then written as a function of the variables $x_{i,\mathrm{ACE\,BB}}$ and $x_{i,\mathrm{CCB\,BB}}$. These variables take values $x_{i,\mathrm{ACE\,BB}} = 1$ if study i compares ACE inhibitors versus β-blockers and $x_{i,\mathrm{ACE\,BB}} = 0$ otherwise, and $x_{i,\mathrm{CCB\,BB}} = 1$ for CCB versus β-blockers and zero otherwise. The meta-regression model that gives the summary effects for these two comparisons is

$$y_i = \mu_{\mathrm{ACE\,BB}}\, x_{i,\mathrm{ACE\,BB}} + \mu_{\mathrm{CCB\,BB}}\, x_{i,\mathrm{CCB\,BB}} + \delta_i + e_i$$

where $\delta_i$ is the study-specific random effect. Fitting this model in STATA using the command metareg and specifying the method of moments as the method to estimate the heterogeneity parameter produces the results shown in the upper part of Table 1.

The coefficients μ of the regression are the subgroup-specific summary effects $\mu_{\mathrm{ACE\,BB}}$, $\mu_{\mathrm{CCB\,BB}}$ on the ln(OR) scale. The heterogeneity parameter was estimated as $\tau^2 = 0.01$ and the proportion of variability due to heterogeneity rather than sampling error (after accounting for subgroup differences) as $I^2 = 59\%$.

Table 1 Results of subgroup analysis for ACE inhibitors versus β-blockers and CCB versus β-blockers. Log-odds ratios (μ) with their standard error SE(μ) and odds ratios (OR) with their 95% confidence or credible interval (CI/CrI) estimated from meta-regression and hierarchical models are reported

Model                                           Comparison                         μ       SE(μ)   OR     95% CI/CrI for OR
Linear model in frequentist implementation      ACE inhibitors versus β-blockers   -0.17   0.10    0.84   (0.69, 1.03)
                                                CCB versus β-blockers              -0.21   0.07    0.81   (0.71, 0.93)
Hierarchical model in Bayesian implementation   ACE inhibitors versus β-blockers   -0.18   0.12    0.84   (0.66, 1.06)
                                                CCB versus β-blockers              -0.21   0.09    0.81   (0.68, 0.97)
584 G. Salanti et al.
The subgroup meta-analysis can also be fitted with the 2 × 2 tables as the starting point rather than the ln(OR), and it is convenient to write this implementation as a hierarchical model. The outcome in each study is the number of patients diagnosed with diabetes and therefore the binomial likelihood can be used in a hierarchical model. This means that the number of events (patients with diabetes) in each study arm, $r_{iBB}$ and $r_{iACE}$ for the first three studies comparing ACE inhibitors to β-blockers or $r_{iBB}$ and $r_{iCCB}$ for the five studies comparing CCB to β-blockers, follow a specific binomial distribution with a respective probability of success:
The estimates obtained from the two approaches are very similar. The major difference between the two approaches is in the estimation of heterogeneity. The subgroup meta-analysis fitted within a Bayesian setting with the binomial likelihood gives a posterior median of $\tau^2$ equal to 0.02 with 95% CrI (0.001, 0.12), slightly larger than the point estimate from the frequentist meta-regression.

Indirect and Mixed Comparison

Theory and Formulae for Indirect Comparisons
Fig. 2 Chain of comparisons network of chemotherapy treatments for second-line treatment of breast cancer
Table 2 Findings from the manufacturer’s submission for gemcitabine STA. Median difference in survival and 95%
confidence intervals (Adapted from: Eli Lilly 2006 and Jones et al. 2006)
Treatment comparison                                  Trials   Median difference (MD) (95% CI)   SE (MD)   Variance (MD)
Gemcitabine + paclitaxel (A) versus paclitaxel (B)    1        2.8 (0.01, 5.6)                   1.42      2.02
Paclitaxel (B) versus docetaxel (C)                   1        2.7 (0.3, 5.1)                    1.24      1.54
Docetaxel (C) versus capecitabine + docetaxel (D)     1        3.0 (0.6, 5.4)                    1.20      1.44
The usual measures of statistical variability can be derived for the indirect estimate. As $\hat{\mu}^{I}_{BC}$ is formed as a difference between two independent estimates, its variance, $\hat{v}^{I}_{BC}$, is equal to the sum of the variances, $\hat{v}^{D}_{AC}$ and $\hat{v}^{D}_{AB}$, estimated from the direct AC and AB comparisons:

\[ \hat{v}^{I}_{BC} = \hat{v}^{D}_{AC} + \hat{v}^{D}_{AB} \]

A single head-to-head randomized trial is as precise as an indirect comparison based on four trials of the same size. To see this, suppose each trial produces an estimate with variance $\sigma^2$. A meta-analysis of $s$ trials with direct estimates of A versus B will have variance $\hat{v}^{D}_{AB} = \sigma^2/s$ (based on inverse variance weights). The indirect estimate of B versus C via A based on $s$ AB and $s$ AC trials will have variance $\hat{v}^{I}_{BC} = \hat{v}^{D}_{AC} + \hat{v}^{D}_{AB} = \sigma^2/s + \sigma^2/s = 2\sigma^2/s$. With $s = 2$, that is four trials in total, this equals $\sigma^2$, the variance of a single head-to-head trial.
A common misconception is that for an indirect comparison to be valid, every trial must include a common comparator (Hughes 2010). In truth, indirect estimates can be derived via many routes. The only requirement is that the network is “connected” and not necessarily via a common comparator. Consider the network shown in Fig. 2, which is adapted from a 2006 submission to NICE which included the four distinct regimens for the second-line treatment of metastatic breast cancer (Eli Lilly 2006).
Table 2 reports the results for the difference in median years of survival. Note there are direct estimates available for gemcitabine + paclitaxel versus paclitaxel ($\hat{\mu}^{D}_{AB} = 2.8$ years), paclitaxel versus docetaxel ($\hat{\mu}^{D}_{BC} = 2.7$ years) and docetaxel versus capecitabine + docetaxel ($\hat{\mu}^{D}_{CD} = 3.0$ years), which form a “chain” of evidence A-B-C-D. The comparison of interest to the decision-maker was gemcitabine + paclitaxel (A) versus capecitabine + docetaxel (D) (Jones et al. 2006), an indirect comparison of which can be formed as

\[ \hat{\mu}^{I}_{AD} = \hat{\mu}^{D}_{AB} + \hat{\mu}^{D}_{BC} + \hat{\mu}^{D}_{CD} = 2.9 \text{ years} \]

\[ SE(\hat{\mu}^{I}_{AD}) = \sqrt{\hat{v}_{AB} + \hat{v}_{BC} + \hat{v}_{CD}} = 2.23 \]
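The variance rules for indirect comparisons can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `indirect_via_common` and `chain_standard_error` are hypothetical, the per-trial variance `sigma2` is an arbitrary illustrative value, and the three link variances are the ones listed in Table 2.

```python
import math

def indirect_via_common(mu_ac, v_ac, mu_ab, v_ab):
    """Indirect B-versus-C estimate through the common comparator A."""
    mu_bc = mu_ac - mu_ab   # difference of the two direct estimates
    v_bc = v_ac + v_ab      # variances of independent estimates add
    return mu_bc, v_bc

def chain_standard_error(variances):
    """Standard error of an indirect estimate formed along a chain of links."""
    return math.sqrt(sum(variances))

# Assumption: every trial has the same variance sigma2 (illustrative value).
sigma2, s = 1.0, 2
v_direct = sigma2 / s   # inverse-variance meta-analysis of s direct trials
_, v_indirect = indirect_via_common(0.0, v_direct, 0.0, v_direct)
# With s = 2 (four trials in total) v_indirect equals sigma2,
# the variance of one head-to-head trial.

# Variances of the A-B, B-C and C-D links as listed in Table 2.
se_ad = chain_standard_error([2.02, 1.54, 1.44])
print(v_indirect, se_ad)
```

The point estimates are deliberately left as function inputs, since their sign depends on the direction in which each comparison is expressed.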
If both direct and indirect estimates are available for the same comparison, they can be combined
The variance of the indirect estimate $\hat{\mu}^{I}_{ACE\,CCB}$ is the sum of the variances of $\hat{\mu}_{ACE\,BB}$ and
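Combining a direct and an indirect estimate of the same comparison can be sketched as follows. The sketch assumes the usual inverse-variance weighting of two independent estimates; the function name `mixed_estimate` and the input values are hypothetical and chosen only for illustration.

```python
def mixed_estimate(mu_direct, v_direct, mu_indirect, v_indirect):
    """Inverse-variance weighted combination of a direct and an indirect estimate.

    Assumption: the two estimates are independent and consistent.
    """
    w_direct = 1.0 / v_direct
    w_indirect = 1.0 / v_indirect
    mu_mixed = (w_direct * mu_direct + w_indirect * mu_indirect) / (w_direct + w_indirect)
    v_mixed = 1.0 / (w_direct + w_indirect)
    return mu_mixed, v_mixed

# Hypothetical log-odds ratios and variances, not taken from the chapter.
mu_m, v_m = mixed_estimate(-0.20, 0.04, -0.10, 0.04)
print(mu_m, v_m)
```

Because the weights add, the variance of the mixed estimate is always smaller than that of either input, which is why the mixed interval is narrower than the direct one.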
the confidence interval for the mixed estimate is narrower than the confidence interval for the direct estimate. By combining direct evidence with indirect evidence, the variance is reduced by 45%.
This approach is intuitive and will be familiar to meta-analysts; however, it is labor-intensive. In a three treatment network, A-B-C, the meta-analytic estimates $\hat{\mu}^{D}_{AB}$, $\hat{\mu}^{D}_{AC}$, $\hat{\mu}^{D}_{BC}$ must be obtained, the indirect estimates $\hat{\mu}^{I}_{AB}$, $\hat{\mu}^{I}_{AC}$, $\hat{\mu}^{I}_{BC}$ derived, and then the mixed estimates $\hat{\mu}^{M}_{AB}$, $\hat{\mu}^{M}_{AC}$, $\hat{\mu}^{M}_{BC}$ computed. As the number of treatments increases and the network expands, this approach quickly becomes untenable and more sophisticated approaches can be used (Caldwell et al. 2005). Section “Models for Network Meta-analysis” discusses methods for simultaneously combining direct and indirect evidence in a single analysis. The next section discusses the underlying assumptions needed to undertake an indirect or mixed comparison.

Assumptions Underlying Indirect and Mixed Comparisons

Current hierarchies of evidence place indirect and “mixed” comparisons below direct evidence regardless of whether the constituent effect estimates have been obtained from meta-analyses of RCTs (currently “level one” evidence). Several HTA organizations have expressed doubts about indirect comparisons and state that if direct evidence exists, it should take precedence. For example, the Cochrane Handbook (Higgins and Green 2008) states “indirect comparisons may suffer the biases of observational studies” and advises that direct and indirect evidence only be combined as a supplemental analysis. In England and Wales, NICE (2008) uses direct evidence as the reference case for appraisals of new technologies, only allowing indirect and “mixed” comparisons as a supporting analysis. Similarly, CADTH (Wells et al. 2009) adopts a cautious stance and the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia have expressed skepticism about the use of indirect evidence (PBAC 2008).
This caution is based on concerns regarding the key assumption underpinning indirect comparisons, which is reflected mathematically in the consistency equation $\mu_{BC} = \mu_{AC} - \mu_{AB}$. The consistency equation relates to the true (or average) effectiveness of B versus C rather than to each individual study. It states that the effect of B versus C can be estimated either indirectly via A (right part of the equation) or directly (left part of the equation) and that these two pieces of evidence will, on average, give the same result. Rearranging the parts of the equation shows that one consistency equation is sufficient to reflect consistency for all three comparisons in a three treatment network. Such that
\[ \mu_{BC} = \mu_{AC} - \mu_{AB} \;\Leftrightarrow\; \mu_{AB} = \mu_{AC} - \mu_{BC} \;\Leftrightarrow\; \mu_{AC} = \mu_{AB} + \mu_{BC} \]

The validity of the consistency equation is fundamental to the validity of indirect comparisons. In considering the validity of the assumption for the combination of direct and indirect evidence, some authors have found it instructive to separate the notion of similarity (Song et al. 2009; Donegan et al. 2010) or transitivity (Baker and Kramer 2002) from the notion of consistency. In the current chapter, these notions are interpreted as the distinction between clinical or epidemiological considerations on the one hand, and statistical considerations on the other. Transitivity refers to the genuine ability to learn about a pairwise comparison via an intermediate treatment, that is, via indirect comparison. As will be discussed below, it requires the intermediate treatment to be equivalent when compared against each of the treatments of interest and that the actual studies contributing to the indirect comparison do not differ in important ways. Specifically, when $\hat{\mu}^{I}_{BC}$ is calculated, it is assumed that we can learn about B versus C via A. The common comparator A might be said to be “transitive” when it allows valid comparison of the treatments to which it is linked. Note that transitivity is not a property of the common treatment A alone but of the two sets of studies it links.
Consistency is a statistical notion that can be considered at the level of the parameters or the level of the data. The consistency equation defines relationships among the parameters. The validity of the assumption embodied in the equation can be assessed only when data from different sources form a “closed loop” of evidence in the network (a path that starts and ends at the same treatment node). When the consistency assumption does not hold or when there is evidence of disagreement between direct and indirect evidence, then the evidence is said to be inconsistent (or show inconsistency).
When transitivity holds and there are multiple sources of evidence, the consistency equation should hold. The consistency equation may hold in a statistical sense, however, even when the studies do not allow valid indirect comparisons, due to important differences between studies that prevent an assumption of transitivity from holding. If there is inconsistency in the data, the assumption of transitivity is clearly challenged. For an “open loop,” that is one for which there is indirect evidence but no direct evidence, consistency cannot be evaluated statistically, and the validity of the indirect comparison must rest entirely on clinical and epidemiological judgements regarding the plausibility of transitivity. It can be shown mathematically (Lu and Ades 2009) that consistency is a consequence of the assumption of exchangeability that forms the basis of the Bayesian network meta-analysis models which is, in turn, an extension of the usual assumption made in a pairwise meta-analysis (Dias et al. 2010). The assumption of transitivity is essentially equivalent to the assumption of exchangeability in this sense, since it relates to similarity of studies. The term “transitivity” might, however, be preferred to “similarity” (Donegan et al. 2010) because (i) it better describes the aim of the assumption to compare two treatments via a third one; (ii) it clearly refers to more than two comparisons whereas the term “similarity” reduces to homogeneity when we refer to a single head-to-head comparison; and (iii) “similarity” may be misinterpreted as necessitating all trial and patient characteristics to be similar, when in truth a valid indirect comparison can be obtained even when studies are dissimilar, so long as such characteristics do not modify treatment effect.

Requirements for Transitivity
Transitivity requires some particular characteristics of the studies contributing to the indirect comparison, as follows:

• The two sets of trials AB and AC do not differ with respect to the distribution of effect modifiers.

In order for an indirect comparison to be valid, the distribution of treatment effect modifiers should be similar in AB and AC trials. Before
25 Network Meta-analysis 589
example, consider that the common comparator A is a treatment which can be given at different doses, but there is no systematic difference in the average dose of A between AC and AB trials. In this case the assumption can hold although there could be heterogeneity within AC and AB comparisons. Consequently, the “anchor” treatment A can be represented by a single node allowing the indirect comparison of B and C. If, however, A is given via a different administration route in all AC and AB trials, then it is questionable whether the two types of A can form a common node and an indirect comparison of B versus C via A would be impossible. For example, when comparing different fluoride treatments, comparison between fluoride toothpaste and fluoride rinse can be made via placebo. However, placebo toothpaste and placebo rinse might not be comparable as the mechanical function of brushing might have a different effect on the prevention of caries. If this is the case, the transitivity assumption is doubtful (Salanti et al. 2009).

Estimating Inconsistency in Mixed Comparisons
In theory, the consistency equation $\mu_{BC} = \mu_{AC} - \mu_{AB}$ must hold if transitivity is deemed to hold. However, in practice, there may be inconsistency in the evidence base. In a three-treatment network, three independent direct estimates, $\hat{\mu}^{D}_{AB}$, $\hat{\mu}^{D}_{AC}$ and $\hat{\mu}^{D}_{BC}$ (assuming there are no trials with more than two arms), and three indirect estimates, $\hat{\mu}^{I}_{AB}$, $\hat{\mu}^{I}_{AC}$ and $\hat{\mu}^{I}_{BC}$, can be obtained. Assuming the treatment comparison of interest is B versus C, the discrepancy (difference) between the direct and indirect estimates forms the measure of inconsistency. This discrepancy is called the inconsistency factor (IF) which is estimated as

\[ \widehat{IF}_{ABC} = \left| \hat{\mu}^{D}_{BC} - \hat{\mu}^{I}_{BC} \right| \]

Note that the direction of the difference might be clinically important but mathematically is unimportant for the statistical evaluation of consistency. Consequently only absolute differences are taken. In a three-treatment network, only one measure of inconsistency is possible (and hence the subscript denoting the loop) (Lu and Ades 2006) as it can be shown that the same inconsistency factor will be obtained whichever edge of the triangle is of interest.
The variance of the inconsistency factor is

\[ \mathrm{var}\left(\widehat{IF}_{ABC}\right) = \hat{v}^{D}_{BC} + \hat{v}^{I}_{BC} \]

A 95% confidence interval can be obtained for the inconsistency factor as $\widehat{IF}_{ABC} \pm 1.96 \sqrt{\mathrm{var}(\widehat{IF}_{ABC})}$. The null hypothesis of evidence consistency, $IF = 0$, can then be tested by deriving a z-test (Bucher et al. 1997).

\[ z = \frac{\widehat{IF}_{ABC}}{\sqrt{\mathrm{var}\left(\widehat{IF}_{ABC}\right)}} \]

If consistency holds, it is reasonable to combine across $\hat{\mu}^{D}_{BC}$ and $\hat{\mu}^{I}_{BC}$ to form $\hat{\mu}^{M}_{BC}$. However, if there is evidence of a “statistically significant” discrepancy ($p < 0.05$), the fundamental assumption is not fulfilled, and one may say that there is evidence of inconsistency.
Claims have been made that indirect comparisons may systematically over- (Bucher et al. 1997; Mills et al. 2011) or underestimate treatment effects compared with direct comparisons. Since inconsistency is a property of a “loop” of evidence, apparent overestimation of a treatment effect on one side of a triangle network (e.g., $\hat{\mu}^{I}_{BC}$) corresponds to underestimation of another (e.g., $\hat{\mu}^{I}_{AC}$). Thus, any assessment of consistency needs to take account of the particular circumstances of the problem. Until recently, empirical investigation of the extent of inconsistency has been limited. In a recent review, Song et al. (2011) examined 112 independent three-treatment networks and detected 16 cases of statistically significant discrepancies between direct and indirect estimates. However, there was no consistent direction as to over- or underestimation. Of course, the test for inconsistency may have low power to detect true inconsistency should it exist, as with other interaction effects. The analyst must therefore be extremely cautious in their interpretation even if inconsistency is not detected.
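The inconsistency factor, its confidence interval and the z-test can be sketched as a single function. This is a minimal sketch, not the authors' code: the function name `inconsistency_test` is hypothetical, the p-value is assumed to be two-sided under a standard normal reference distribution, and the inputs are the direct and indirect log-odds ratios and variances used in the chapter's ACE inhibitors versus CCB example.

```python
import math

def inconsistency_test(mu_direct, v_direct, mu_indirect, v_indirect):
    """Inconsistency factor for one closed loop, with 95% CI, z and p-value."""
    inc = abs(mu_direct - mu_indirect)          # absolute discrepancy
    se = math.sqrt(v_direct + v_indirect)       # independent estimates
    ci = (inc - 1.96 * se, inc + 1.96 * se)
    z = inc / se
    # Assumption: two-sided p-value, 2 * (1 - Phi(z)), written via erfc.
    p = math.erfc(z / math.sqrt(2.0))
    return inc, se, ci, z, p

# Direct and indirect ln(OR) for ACE inhibitors versus CCB and their variances.
inc, se, ci, z, p = inconsistency_test(-0.22, 0.0144, 0.04, 0.0225)
print(inc, se, ci, z, p)
```

A confidence interval that includes zero and a large p-value both indicate that the loop gives no statistical evidence of inconsistency, although the test may have low power.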
Note that the discovery of inconsistency does not necessarily mean that all indirect comparisons in the loop are invalid. For example, suppose that AC and BC trials are similar regarding the distribution of effect modifiers (e.g., all studies are carried out in adults with a similar distribution in age), so that $\hat{\mu}^{I}_{AB}$ is a valid estimate of the relative effectiveness of A versus B for the given setting and population. If now the AB studies have all been carried out in younger populations (e.g., in adolescents) then the consistency assumption does not hold; both $\hat{\mu}^{I}_{AB}$ and $\hat{\mu}^{D}_{AB}$ are valid but answer different questions; hence computation of a mixed estimate, $\hat{\mu}^{M}_{AB}$, would be inappropriate.

Example: Inconsistency in the Evidence Triangle ACE Inhibitors Versus CCB Versus β-Blockers
Inconsistency can be evaluated by calculating the difference between the direct and indirect estimates for the same comparison. In the case of ACE inhibitors versus CCB, the inconsistency factor reflects the disagreement in the triangle formed by the three sets of trials ACE inhibitors versus CCB versus β-blockers and is calculated as

\[ \widehat{IF}_{ACE\,CCB\,BB} = \left| \hat{\mu}^{D}_{ACE\,CCB} - \hat{\mu}^{I}_{ACE\,CCB} \right| = \left| -0.22 - 0.04 \right| = 0.26 \]

The standard error of the inconsistency factor is obtained as

\[ SE\left(\widehat{IF}_{ACE\,CCB\,BB}\right) = \sqrt{\hat{v}^{D}_{ACE\,CCB} + \hat{v}^{I}_{ACE\,CCB}} = \sqrt{0.0144 + 0.0225} = 0.192 \]

A 95% CI for the inconsistency factor is obtained as $\widehat{IF} \pm 1.96\, SE(\widehat{IF}) = (-0.12, 0.64)$.
The z-test for the hypothesis $H_0: IF_{ACE\,CCB\,BB} = 0$ is

\[ z = \frac{\widehat{IF}_{ACE\,CCB\,BB}}{SE\left(\widehat{IF}_{ACE\,CCB\,BB}\right)} = \frac{0.26}{0.192} = 1.35 \]

leading to a two-sided p-value of approximately 0.18. Note that this result applies to the entire triangle: the same inconsistency factor and p-value could have been obtained by calculating the difference between direct and indirect evidence for the ACE versus β-blockers or CCB versus β-blockers comparisons. As the 95% CI includes zero, there is no indication of important statistical inconsistency between the direct and indirect estimates, which is also supported by the p-value.

Models for Network Meta-analysis

Extensions of the ideas above to more than three treatments lead to a general framework for network meta-analysis. Consider a set of $T$ treatments of interest that we want to evaluate according to their relative effectiveness on a single outcome measure. The treatments are studied collectively in $N$ studies. Each study may provide evidence about some of the treatments; it will include only a subset of $T$, $T_i \subseteq T$. The study data can be arm-based or contrast-based. In the contrast-based approach, the effect sizes $y_{ijk}$ from each study are available, and they refer to the relative effectiveness of a treatment $k$ relative to $j$ with $j, k \in T_i$.
Network meta-analysis can be viewed as a special case of meta-regression (linear model), as a hierarchical model or as a multivariate meta-analysis model. The estimation methods that arise from these approaches are essentially equivalent and can be employed under the assumption of consistency or under assumptions that impose fewer restrictions.
A key issue in all methods for fitting network meta-analysis is the minimization of the parameters’ space by selecting a minimum set of basic parameters. This is a set of comparisons (as many as the total number of treatments minus one) that are sufficient to generate all possible comparisons between the treatments via the consistency equations. Under consistency, the choice of the basic parameters does not affect the results but typically the basic parameters are defined by taking the comparisons of all treatments versus a common reference to simplify interpretation. Examples to follow should make this clear.
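How a set of basic parameters generates every other comparison can be sketched in Python. This is an illustration under stated assumptions: the function name `all_contrasts`, the treatment labels and the numerical values are hypothetical, and the consistency relation is taken in the form used in this chapter, where the effect of k relative to j is the effect of k versus the reference minus the effect of j versus the reference.

```python
from itertools import combinations

def all_contrasts(basic, reference):
    """Derive every pairwise contrast from T - 1 basic parameters.

    basic maps each non-reference treatment to its effect versus the reference.
    The returned dict is keyed by (j, k) and holds the effect of k relative to j.
    """
    effects = dict(basic)
    effects[reference] = 0.0                     # reference versus itself
    treatments = sorted(effects)
    return {(j, k): effects[k] - effects[j]      # consistency equation
            for j, k in combinations(treatments, 2)}

# Hypothetical basic parameters for T = 4 treatments with reference A.
basic = {"B": 0.3, "C": -0.1, "D": 0.5}
contrasts = all_contrasts(basic, "A")
for pair, value in contrasts.items():
    print(pair, round(value, 2))
```

With T = 4 treatments, three basic parameters are enough to produce all six pairwise comparisons, including those that no study compared directly.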
meta-analysis summary effects for all other comparisons, say B versus D, can be obtained by considering the consistency equations relating the $\hat{\mu}_{Aj}$ to the functional parameters. Their variances can be obtained by combining standard errors and covariances (from the variance-covariance matrix for the estimated regression coefficients). For instance, $\hat{\mu}_{BD} = \hat{\mu}_{AD} - \hat{\mu}_{AB}$ and $SE^2(\hat{\mu}_{BD}) = SE^2(\hat{\mu}_{AD}) + SE^2(\hat{\mu}_{AB}) - 2\,\mathrm{cov}(\hat{\mu}_{AD}, \hat{\mu}_{AB})$.
Note that the random effects follow a normal distribution $\delta_i \sim N(0, \tau^2)$, with heterogeneity variance assumed to be equal for every comparison. This may be a strong assumption as different comparisons might include studies with different between-study variability. Assuming a common heterogeneity might impose an inappropriate $\tau^2$ value for some comparisons. Although assuming comparison-specific heterogeneities can be desirable in many cases, it presents practical difficulties. Estimation of the parameter $\tau^2$ can be challenging if few studies are available. Even with large network meta-analyses including many treatments, it is often the case that some of the comparisons include only a few studies; some comparisons might even be informed by a single study. Nevertheless, assuming a common heterogeneity parameter allows comparisons to “borrow strength” from each other in the estimation of the common $\tau^2$, overcoming computational problems that are encountered both with frequentist and Bayesian fitting of models.

Application: Network Meta-analysis Using Meta-regression for Incident Diabetes
Standard meta-regression methods can only be applied to networks that contain two-arm studies. The following analysis treats the 30 pairwise comparisons in the incident diabetes data set as if they came from 30 (rather than the true 22) independent studies. A meta-regression model is employed where again the different comparisons define the subgroups. First the $T - 1$ “basic contrasts” need to be selected, to be included as covariates in the model. Several combinations of basic contrasts are possible, and for $T = 6$, five parameters need to be selected. For ease of interpretation, it is convenient to choose the comparisons of each treatment versus a common reference treatment. Here, placebo (P) is chosen to be the reference treatment and basic contrasts are defined for each treatment versus placebo. Then, to specify the design matrix all comparisons in the network need to be written as functions of the basic parameters. The first two columns of Table 3 list all comparisons in the network for which direct estimates are available and the number of studies involving each comparison. Then, for the five comparisons belonging to the basic contrasts (e.g., β-blockers (BB) vs. P), the respective variable $x_i$ ($x_{i\,BB\,P}$) takes the value 1 and the variables of the other four basic contrasts take the value 0. For any other treatment comparison (e.g., diuretics (D) vs. BB) $x_i$ takes the value 1 for the first treatment ($x_{i\,D\,P}$) and −1 for the second treatment ($x_{i\,BB\,P}$), based on the consistency equations ($\mu_{D\,BB} = \mu_{D\,P} - \mu_{BB\,P}$).
The full meta-regression model is

\[ y_i = \mu_{BB\,P}\, x_{i\,BB\,P} + \mu_{D\,P}\, x_{i\,D\,P} + \mu_{CCB\,P}\, x_{i\,CCB\,P} + \mu_{ACE\,P}\, x_{i\,ACE\,P} + \mu_{ARB\,P}\, x_{i\,ARB\,P} + \delta_i + e_i . \]

Fitting the model in STATA using metareg produces the regression coefficients in Table 4. The common heterogeneity parameter of the network was estimated as 0.02. The variance-covariance matrix of the regression coefficients is saved by STATA as the “e(v)” matrix and can be obtained after fitting the meta-regression model (Table 5).
Then any head-to-head comparison can be derived by applying the consistency equations again to the point estimates. For example, the ln(OR) of diuretics versus β-blockers is $\hat{\mu}_{D\,BB} = \hat{\mu}_{D\,P} - \hat{\mu}_{BB\,P} = 0.32 - 0.24 = 0.08$, and its standard error is

\[ SE(\hat{\mu}_{D\,BB}) = \sqrt{\hat{v}_{D\,P} + \hat{v}_{BB\,P} - 2\,\mathrm{Cov}(\hat{\mu}_{D\,P}, \hat{\mu}_{BB\,P})} = \sqrt{0.008 + 0.0076 - 2 \times 0.004} = 0.09 \]

All other functional contrast estimates are derived the same way. The network meta-analysis estimates for all comparisons are presented as the black diamonds in Fig. 4.
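The derivation of a functional contrast and its standard error can be reproduced with a short Python sketch. The function name `contrast_with_se` is hypothetical; the inputs are the estimates, variances and covariance quoted above for diuretics versus placebo and β-blockers versus placebo.

```python
import math

def contrast_with_se(mu_k, mu_j, var_k, var_j, cov_kj):
    """Effect of k versus j derived from two basic parameters sharing a reference."""
    estimate = mu_k - mu_j
    se = math.sqrt(var_k + var_j - 2.0 * cov_kj)
    return estimate, se

# Diuretics versus beta-blockers from the two placebo-referenced coefficients.
est, se = contrast_with_se(0.32, 0.24, 0.0080, 0.0076, 0.0040)

# For comparison only: the standard error if the covariance were ignored.
se_naive = math.sqrt(0.0080 + 0.0076)
print(est, se, se_naive)
```

Because the two basic parameters share the placebo reference and are positively correlated, ignoring the covariance term would overstate the uncertainty of the derived contrast.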
Table 3 Parameterization of design matrix for the five basic contrasts when placebo is the reference treatment for incident diabetes
Covariates: x_iBBP, β-blockers versus placebo; x_iDP, diuretics versus placebo; x_iCCBP, CCB versus placebo; x_iACEP, ACE inhibitors versus placebo; x_iARBP, ARB versus placebo

Comparison in study i               Number of studies   x_iBBP   x_iDP   x_iCCBP   x_iACEP   x_iARBP
β-blockers versus placebo           1                    1        0       0         0         0
diuretics versus placebo            3                    0        1       0         0         0
diuretics versus β-blockers         2                   −1        1       0         0         0
CCB versus placebo                  1                    0        0       1         0         0
CCB versus β-blockers               5                   −1        0       1         0         0
CCB versus diuretics                2                    0       −1       1         0         0
ACE inhibitors versus placebo       2                    0        0       0         1         0
ACE inhibitors versus β-blockers    3                   −1        0       0         1         0
ACE inhibitors versus CCB           2                    0        0      −1         1         0
ARB versus placebo                  3                    0        0       0         0         1
ARB versus β-blockers               3                   −1        0       0         0         1
ARB versus diuretics                1                    0       −1       0         0         1
ARB versus CCB                      1                    0        0      −1         0         1
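The rows of such a design matrix can be generated mechanically from the consistency equations. The sketch below is illustrative only: the helper `design_row` and the list `BASIC` are hypothetical names, and the sign convention (1 for the first-named treatment, −1 for the second) is an assumption chosen to match the relation that the effect of diuretics versus β-blockers equals diuretics versus placebo minus β-blockers versus placebo.

```python
# Order of the five basic contrasts, each defined versus placebo "P".
BASIC = ["BB", "D", "CCB", "ACE", "ARB"]

def design_row(first, second, reference="P"):
    """Design-matrix row for the comparison 'first versus second'."""
    row = [0] * len(BASIC)
    if first != reference:
        row[BASIC.index(first)] += 1     # first-named treatment enters with +1
    if second != reference:
        row[BASIC.index(second)] -= 1    # second-named treatment enters with -1
    return row

for first, second in [("BB", "P"), ("D", "BB"), ("ACE", "CCB"), ("ARB", "D")]:
    print(first, "versus", second, design_row(first, second))
```

Multiplying a row by the vector of basic parameters gives the expected effect for that comparison, which is how the meta-regression ties every observed contrast to the five coefficients.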
Table 4 Results of network meta-analysis as meta-regression for incident diabetes. Log-odds ratios (μ̂) with their standard error SE(μ̂) and odds ratios (OR) with their 95% confidence interval (CI) for all basic and functional contrasts are reported

Comparison                       μ̂       SE(μ̂)   OR     95% CI for OR
β-blockers versus placebo         0.24    0.09    1.27   (1.07, 1.52)
diuretics versus placebo          0.32    0.09    1.38   (1.15, 1.64)
CCB versus placebo                0.08    0.08    1.08   (0.93, 1.27)
ACE inhibitors versus placebo    −0.11    0.08    0.90   (0.77, 1.05)
ARB versus placebo               −0.17    0.10    0.84   (0.69, 1.03)
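The odds ratios and intervals in such a table follow from the log-odds ratios and their standard errors by exponentiation. A minimal sketch, assuming a normal approximation on the ln(OR) scale with the 1.96 multiplier; the function name `odds_ratio_ci` is hypothetical and the inputs are the β-blockers versus placebo row.

```python
import math

def odds_ratio_ci(log_or, se, z=1.96):
    """Odds ratio with a confidence interval computed on the ln(OR) scale."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# Beta-blockers versus placebo: ln(OR) = 0.24, SE = 0.09.
or_bb, lo_bb, hi_bb = odds_ratio_ci(0.24, 0.09)
print(round(or_bb, 2), round(lo_bb, 2), round(hi_bb, 2))  # 1.27 1.07 1.52
```

Rows recomputed from the rounded table entries may differ from the published intervals in the last digit, because the published values were presumably derived from unrounded estimates.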
Note that the confidence intervals for the comparison ACE inhibitors versus CCB have further reduced compared with the direct or mixed estimate previously calculated. Also, an estimate for the comparison ARB versus ACE inhibitors is obtained for which no studies exist. Figure 4 also shows the ranking of the treatments by ranking the mean OR of each treatment versus placebo.

Network Meta-analysis as a Hierarchical Model
An alternative way to fit the network meta-analysis model is by extending the hierarchical
Table 5 Variance-covariance matrix of the five basic parameters used in meta-regression approach for network meta-analysis for incident diabetes. The diagonal includes the variances of the parameters and the other cells the covariances between the two corresponding parameters

           μ̂_BBP    μ̂_DP     μ̂_CCBP   μ̂_ACEP   μ̂_ARBP
μ̂_BBP     0.0076
μ̂_DP      0.0040   0.0080
μ̂_CCBP    0.0052   0.0040   0.0070
μ̂_ACEP    0.0038   0.0034   0.0035   0.0058
μ̂_ARBP    0.0037   0.0024   0.0037   0.0022   0.0098
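With the full variance-covariance matrix available, the standard error of any functional contrast is the quadratic form c'Vc for the corresponding contrast vector c. The sketch below is illustrative: the names `V_LOWER`, `MU` and `contrast` are hypothetical, the matrix entries are the lower triangle of Table 5, and the negative signs of the ACE and ARB coefficients are an assumption inferred from their odds ratios being below 1.

```python
import math

ORDER = ["BB", "D", "CCB", "ACE", "ARB"]          # basic contrasts versus placebo
V_LOWER = [                                       # lower triangle of Table 5
    [0.0076],
    [0.0040, 0.0080],
    [0.0052, 0.0040, 0.0070],
    [0.0038, 0.0034, 0.0035, 0.0058],
    [0.0037, 0.0024, 0.0037, 0.0022, 0.0098],
]
# Assumed signs: ACE and ARB coefficients negative (odds ratios below 1).
MU = [0.24, 0.32, 0.08, -0.11, -0.17]

def contrast(first, second):
    """Estimate and standard error of 'first versus second' via c'Vc."""
    n = len(ORDER)
    c = [0] * n
    c[ORDER.index(first)] = 1
    c[ORDER.index(second)] = -1
    # Symmetric lookup into the lower-triangular storage.
    v = lambda a, b: V_LOWER[max(a, b)][min(a, b)]
    estimate = sum(c[a] * MU[a] for a in range(n))
    variance = sum(c[a] * c[b] * v(a, b) for a in range(n) for b in range(n))
    return estimate, math.sqrt(variance)

# ARB versus ACE inhibitors: a comparison informed only indirectly.
est_arb_ace, se_arb_ace = contrast("ARB", "ACE")
print(est_arb_ace, se_arb_ace)
```

The same function reproduces the diuretics versus β-blockers contrast, so one routine covers every pair of treatments in the network.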
Fig. 4 Results from network meta-analysis conducted as meta-regression in STATA (black) ignoring correlations from multi-arm trials and as hierarchical model in WinBUGS that account for correlations (red). Diamonds are the point estimates of summary odds ratios (OR) and the horizontal lines represent the corresponding 95% confidence intervals (CI)
model. For the simplest case of three treatments A, B, C, assume there are studies that inform all possible comparisons. The effect size for a study that compares A versus B is denoted by $y_{iAB}$. When only two-arm trials are included in the network, the likelihood for the observations is specific to the comparison being presented, i.e.,

\[ y_{iAC} \sim N(\theta_{iAC}, s^2_{iAC}), \quad y_{iAB} \sim N(\theta_{iAB}, s^2_{iAB}), \quad y_{iBC} \sim N(\theta_{iBC}, s^2_{iBC}), \]

and similarly for the random effects

\[ \theta_{iAC} \sim N(\mu_{AC}, \tau^2), \quad \theta_{iAB} \sim N(\mu_{AB}, \tau^2), \quad \theta_{iBC} \sim N(\mu_{BC}, \tau^2). \]

This situation is depicted in Fig. 5.
The model so far is a collection of three independent meta-analyses. The three distributions relate to three different means, one mean per comparison. The consistency assumption claims that the three means are related via $\mu_{BC} = \mu_{AC} - \mu_{AB}$. This constraint results in the indirect estimation of B versus C and, if there are studies making the comparison directly, results also in the synthesis of the indirect
[Figure 5: forest plots for the BC, AB and AC comparisons; AB = AC − BC]
Fig. 5 Hypothetical example from three sets of meta-analyses that form a closed loop of evidence. The diamonds represent the summary effects using a random effects meta-analysis and a common heterogeneity parameter. The diamond in dashed line is the indirect estimate for the AB comparison
evidence with the direct evidence. Indirect estimation is represented by the dashed diamond in Fig. 5. Of course the consistency relationship works by estimating indirect and mixed estimates for all comparisons, not just B versus C.
Extending the idea to more than three treatments is straightforward. For any two treatments $j, k = \{A, B, C, D, E\}$ compared in study $i$, the likelihood and the random effects distributions can be written as in a subgroup analysis, i.e., specific to the comparison $j$ versus $k$

\[ y_{ijk} \sim N(\theta_{ijk}, s^2_{ijk}) \qquad (7) \]

\[ \theta_{ijk} \sim N(\mu_{jk}, \tau^2) \qquad (8) \]

Assuming consistency, the means of the random effects distributions are related. Selecting again $T - 1$ basic parameters $\mu_{Aj}$, all means are related via $\mu_{jk} = \mu_{Ak} - \mu_{Aj}$. There are as many consistency equations as comparison-specific meta-analyses, that is, $N_{comp}$, each equation expressing every comparison that appears in the data as a combination of the basic parameters. This gives the model $N_{comp} - (T - 1)$ degrees of freedom.
Note that the formulation above assumes that the different comparisons have the same heterogeneity variance, as in the meta-regression model in section “Network Meta-analysis as a Linear Model.” The impact of this assumption can be better visualized in Fig. 5, where the three forest plots appear to reflect different degrees of heterogeneity. Nevertheless, the three random effects distributions have the same dispersion as a result of imposing a common heterogeneity parameter for their variance $\tau^2$. This will “inflate” the uncertainty in the summary estimates in the more homogeneous sets of studies BC and AB and “deflate” the uncertainty in the AC estimate by attaching a lower $\tau^2$ value for this comparison. A notable consequence is that the network meta-analysis summary effect size for a particular comparison can be less precise than the summary effect size from direct evidence alone. This can happen when a comparison with very low or no heterogeneity enters a network that consists of heterogeneous comparisons. Then, the estimated common heterogeneity parameter (which will be higher than the true value for the homogeneous comparison) will impose greater uncertainty in the estimate for the homogeneous comparison,
which might outweigh the gain in precision offered by including the indirect evidence.
Fitting the hierarchical model within a Bayesian framework makes the use of the true likelihood for the data easier. In the case of dichotomous outcomes, each study reports numbers of successes per arm and the likelihood (Eq. 7) is substituted by two arm-specific binomial distributions

\[ r_{ij} \sim B(p_{ij}, n_{ij}) \]

\[ r_{ik} \sim B(p_{ik}, n_{ik}) \]

Then the probabilities $p_{ik}$, $p_{ij}$ are parameterized to produce a treatment effect measure $\theta_{ijk}$ (e.g., for log(OR), $\theta_{ijk} = \mathrm{logit}(p_{ij}) - \mathrm{logit}(p_{ik})$). The hierarchical network meta-analysis model is mathematically equivalent to the meta-regression as long as the contrast-specific data are used: they both have the same number of degrees of freedom and the same number of parameters.

Application: Network Meta-analysis as Hierarchical Model for Incident Diabetes
As in the meta-regression approach, all comparisons included in the four three-arm trials are assumed to be evaluated in three independent two-arm studies, and this results in 30 comparisons indexed with $i$. The same five basic parameters $\mu_{Aj}$ are chosen, with placebo as reference: $\mu_{BB\,P}$, $\mu_{D\,P}$, $\mu_{CCB\,P}$, $\mu_{ACE\,P}$, $\mu_{ARB\,P}$. Arm-specific data will be modelled using the binomial likelihood. A categorical covariate needs to be specified for each arm, with $x_{ij}$ showing the intervention given to arm $j$ of study $i$ and $x_{ik}$ the intervention of arm $k$ ($x_{ij}, x_{ik} = \{P, BB, D, CCB, ACE, ARB\}$). Fitting the model in WinBUGS, and using a half-normal prior distribution $\tau \sim N(0, 1)$, $\tau > 0$ for the common heterogeneity, gives the results in Table 6.

Table 6 Results of network meta-analysis as hierarchical model for incident diabetes. Log-odds ratios (μ̂) with their standard error SE(μ̂) and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported

Comparison                       μ̂       SE(μ̂)   OR     95% CI for OR
β-blockers versus placebo         0.24    0.10    1.27   (1.04, 1.55)
diuretics versus placebo          0.32    0.10    1.38   (1.13, 1.68)
CCB versus placebo                0.07    0.10    1.07   (0.88, 1.30)
ACE inhibitors versus placebo    −0.09    0.08    0.91   (0.78, 1.07)
ARB versus placebo               −0.19    0.10    0.83   (0.68, 1.01)

These estimates are comparable to the effect sizes obtained by the meta-regression approach (Table 4, Fig. 4). The most important difference is, as in subgroup analysis, in the estimation of heterogeneity. Although both meta-regression and the hierarchical model result in the same point estimate for heterogeneity of $\hat{\tau}^2 = 0.02$, the Bayesian approach accounts for uncertainty in this value with a 95% CrI (0.01, 0.07) and provides estimates of the ORs with slightly wider confidence intervals.
One advantage of conducting network meta-analysis as a hierarchical model compared with a meta-regression approach is that ranking of all interventions included in the network is easier. This will be discussed in the next section, on the results from the model that accounts properly for the multi-arm trials.

Models for Data that Include Multi-arm Trials
When trials involve more than two arms, the network meta-analysis models described in sections “Network Meta-analysis as a Linear Model” and “Network Meta-analysis as a Hierarchical Model” are further complicated for two reasons. The first is the need to account for correlations induced by the fact that multi-arm trials inform more than one comparison. The second is that multi-arm studies are inherently consistent; if A, B, and C are all included within the same study $i$ then it is plainly the case that $y_{iBC} = y_{iAC} - y_{iAB}$, where $y_{ikj}$ is the effect size in study $i$ for the contrast $k$ versus $j$. This means that if a study has $\alpha_i$ arms, then only $\alpha_i - 1$ of the $\alpha_i(\alpha_i - 1)/2$ possible comparisons are linearly independent, and so only $\alpha_i - 1$ need to be modelled. This inherent consistency also
598 G. Salanti et al.
makes the calculation of the number of degrees of freedom difficult and the formula $N_{comp} - (T - 1)$ no longer holds (see also section “Statistical Methods to Detect Inconsistency in a Network of Interventions”).

Consider the case of three treatments and four studies as presented in Table 7. In study four, only two of the three contrasts need to be included in the model as the third effect size $y_{iBC}$ can be simply computed as $y_{iAC} - y_{iAB}$. Thus, the study will contribute directly to two out of the three meta-analyses in Fig. 5. The two observed effect sizes $y_{iAC}$, $y_{iAB}$ are correlated as they both include the common treatment C. This covariance needs to be taken into account in the analysis.

Note that in Table 7, the data for study 4 include the sample covariance $cov(y_{4AC}, y_{4AB})$, denoted also as $c$. The covariance can be estimated from the data as the variance of the outcome in the common arm. For example, if the outcome is continuous and the effect size is the mean difference, it turns out that $c$ is the sample variance of the outcome in the common arm C, that is, $sd_{iC}^2 / n_{iC}$. When the outcome is dichotomous and the effect size is the ln(OR), the covariance is $c = 1/r_C + 1/(n_C - r_C)$. When the outcome is measured on the risk ratio scale (RR), then the covariance for ln(RR) is $c = 1/r_C - 1/n_C$, and for risk difference it is $c = r_C(n_C - r_C)/n_C^3$.

The meta-regression model as presented in section “Network Meta-analysis as a Linear Model” does not account for the dependence between the observations in study 4. Moreover, correlations are present not only in the observations $y_{4AC}$, $y_{4AB}$ but also in their underlying random effects $\delta_{4AC}$, $\delta_{4AB}$.

Using matrix notation, the meta-regression model in section “Network Meta-analysis as a Linear Model” will have the form

$$
\begin{pmatrix} y_{1AC} \\ y_{2BC} \\ y_{3AB} \\ y_{4AC} \\ y_{4AB} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_{AC} \\ \mu_{AB} \end{pmatrix}
+
\begin{pmatrix} \delta_{1AC} \\ \delta_{2BC} \\ \delta_{3AB} \\ \delta_{4AC} \\ \delta_{4AB} \end{pmatrix}
+
\begin{pmatrix} e_{1AC} \\ e_{2BC} \\ e_{3AB} \\ e_{4AC} \\ e_{5AB} \end{pmatrix}
\qquad (9)
$$

To account for the fact that the random errors and the random effects that belong to the same study are correlated, it is assumed that $e \sim N(0, S^2)$ and $\delta \sim N(0, T^2)$, where $e$, $\delta$ are the vectors of the random errors and random effects, $S^2$ is the within-studies variance-covariance matrix (estimated from the data), and $T^2$ is the between-studies variance-covariance matrix (and consists of unknown parameters to be estimated from the model). For the data in Table 7, the within-studies variance-covariance matrix is

$$
S^2 =
\begin{pmatrix}
s^2_{1AC} & 0 & 0 & 0 & 0 \\
0 & s^2_{2BC} & 0 & 0 & 0 \\
0 & 0 & s^2_{3AB} & 0 & 0 \\
0 & 0 & 0 & s^2_{4AC} & c \\
0 & 0 & 0 & c & s^2_{5AB}
\end{pmatrix}
$$

whereas the between-studies variance-covariance matrix is
25 Network Meta-analysis 599
$$
T^2 =
\begin{pmatrix}
\tau^2_{AC} & 0 & 0 & 0 & 0 \\
0 & \tau^2_{BC} & 0 & 0 & 0 \\
0 & 0 & \tau^2_{AB} & 0 & 0 \\
0 & 0 & 0 & \tau^2_{AC} & cov(\delta_{4AC}, \delta_{4AB}) \\
0 & 0 & 0 & cov(\delta_{4AC}, \delta_{4AB}) & \tau^2_{AB}
\end{pmatrix}
=
\begin{pmatrix}
\tau^2 & 0 & 0 & 0 & 0 \\
0 & \tau^2 & 0 & 0 & 0 \\
0 & 0 & \tau^2 & 0 & 0 \\
0 & 0 & 0 & \tau^2 & \tau^2/2 \\
0 & 0 & 0 & \tau^2/2 & \tau^2
\end{pmatrix}
$$

As discussed before, it is often the case that a common heterogeneity parameter is assumed; that is, $\tau^2_{AC} = \tau^2_{BC} = \tau^2_{AB} = \tau^2$. This assumption offers an advantage in the case of multi-arm studies as it considerably simplifies the between-studies variance-covariance matrix $T^2$. It can be shown that when heterogeneity is equal across comparisons, the covariance of any two random effects is $\tau^2/2$. Then the matrix $T^2$ has $\tau^2$ in the diagonal and $\tau^2/2$ in the cells that refer to pairs of effects from the same study.

Similar considerations need to be made for the hierarchical model. The distributions (7) and (8) apply only to studies $i = 1, 2, 3$. For $i = 4$ the likelihood of the two-dimensional vector of effect sizes is

$$
\begin{pmatrix} y_{4AC} \\ y_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} \theta_{4AC} \\ \theta_{4AB} \end{pmatrix},
\begin{pmatrix} s^2_4 & cov(y_{4AC}, y_{4AB}) \\ cov(y_{4AC}, y_{4AB}) & s^2_5 \end{pmatrix}
\right)
$$

and the random effects are distributed assuming equal heterogeneities as

$$
\begin{pmatrix} \theta_{4AC} \\ \theta_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} \mu_{AC} \\ \mu_{AB} \end{pmatrix},
\begin{pmatrix} \tau^2 & \tau^2/2 \\ \tau^2/2 & \tau^2 \end{pmatrix}
\right).
\qquad (10)
$$

The consistency equations remain as presented in section “Network Meta-analysis as a Hierarchical Model”.

With arm-specific data and a hierarchical structure, no correlations need to be accounted for in the likelihood, as the observations in arms are independent. For example, if study 4 presents the number of successes $r_{4A}$, $r_{4B}$, $r_{4C}$ for a dichotomous outcome out of the total $n_{4A}$, $n_{4B}$, $n_{4C}$ randomized, then the likelihood of the data consists of three binomial distributions with event probability parameters $p_{4A}$, $p_{4B}$, $p_{4C}$ which, once parameterized, give two effect sizes $\theta_4$, $\theta_5$ that correspond to underlying relative effects for treatments A and B compared to C (see section “Meta-analysis as Hierarchical Model”). So, for studies $i = 1, 2, 3$, the underlying random effects $\theta_{ijk}$ follow independent distributions as described in Eq. 8, but the random effects $\theta_{4AC}$, $\theta_{4AB}$ from the fourth study will follow the multivariate normal distribution (10).

Technical note: the multivariate normal distribution above can be decomposed into a series of conditional distributions; this offers computational advantages. Distribution (10) can be written as a set of one unconditional and one conditional distribution:

$$\theta_{4AB} \sim N(\mu_{AB}, \tau^2)$$

and

$$\theta_{4AC} \mid \theta_{4AB} \sim N\left(\mu_{AC} + \tfrac{1}{2}(\theta_{4AB} - \mu_{AB}),\ \tfrac{3\tau^2}{4}\right)$$

More generally, if a study $i$ has $a_i$ arms that correspond to treatments $T_i = \{A, B, C, D, \ldots\}$ in this presented order, the $(a_i - 1)$-dimensional normal distribution of all treatments versus A can be “decomposed” by writing the independent distribution for $\theta_{iAB}$, then the conditional $\theta_{iAC} \mid \theta_{iAB}$, then $\theta_{iAD} \mid \theta_{iAB}, \theta_{iAC}$, and so on. The distribution of the random effect $\theta_{iAj}$ conditional on all “previous” comparisons $\theta_{iAk}$ has mean:

$$\mu_{Aj} + \frac{1}{a_i - 1} \sum_{k<j} (\theta_{iAk} - \mu_{Ak})$$

with variance

$$\frac{a_i}{a_i - 1}\,\frac{\tau^2}{2}$$

where $k < j$ means that comparison Ak has been modelled before Aj.

Application: Network Meta-analysis with Multi-Arm Trials as Hierarchical Model for Incident Diabetes
In this application index $i$ refers to studies ($i = 1, \ldots, 22$). There are 18 studies that compared only two interventions and thus have only
two arms ($\alpha_i = 2$), and there are four studies with three arms ($\alpha_i = 3$). The variable $\alpha_i = \{2, 3\}$ needs to be specified for each study $i$ and then a binomial likelihood is assumed for the number of patients in all arms of each study. Using the index $j$ to show the arm (treatment) within a study, the binomial likelihood is written as

$$r_{ij} \sim B(p_{ij}, n_{ij}), \quad i = 1, \ldots, 22, \quad j \in \{P, BB, D, CCB, ACE, ARB\}$$

The probabilities $p_{ij}$ can be parameterized to model $\alpha_i - 1$ effect sizes as
$\operatorname{logit}(p_{i1}) = u_i$ for the “first” (reference) arm in each study that pertains to treatment $j$
$\operatorname{logit}(p_{ij}) = u_i + \theta_{ijk}$ for the other arms in the study

The underlying ln(OR), $\theta_{ijk}$, compares treatments $k$ and $j$ (reported in the first arm) where $j, k \in \{P, BB, D, CCB, ACE, ARB\}$. For the multi-arm trials the correlation between $\theta_{ijk}$ and $\theta_{ijl}$, $l \neq j \neq k$, in the same trial is taken into account by the conditional mean and variance of their distribution. Table 8 shows the results of fitting this model in WinBUGS against placebo (basic parameters and prior distribution for heterogeneity are as in the application of hierarchical model that does not account for multi-arm trials).

Table 8 Results of network meta-analysis as hierarchical model for incident diabetes taking into account multi-arm trials. Log-odds ratios ($\hat{\mu}$) with their standard error SE($\hat{\mu}$) and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported

Comparison                        $\hat{\mu}$   SE($\hat{\mu}$)   OR     95% CrI for OR
β-blockers versus placebo          0.24          0.10              1.27   (1.04, 1.55)
diuretics versus placebo           0.32          0.10              1.38   (1.13, 1.68)
CCB versus placebo                 0.07          0.10              1.07   (0.88, 1.30)
ACE inhibitors versus placebo     -0.09          0.08              0.91   (0.78, 1.07)
ARB versus placebo                -0.19          0.10              0.83   (0.68, 1.01)

The estimate of common heterogeneity is 0.02 with 95% CrI (0.01, 0.07). Very little change is observed compared with the analyses above in which the correlations between multiple arms were ignored; this is probably due to the fact that multi-arm trials represent only 18% of our data. All pairwise ORs are presented in Fig. 4.

The posterior deviance from the analysis is $D = 53.26$ which, when compared to the number of data points (48), suggests a rather poor fit of the model to the data. The DIC of the model was estimated as 91.4.

Network Meta-analysis as a Multivariate Meta-analysis
Multivariate meta-analysis is an extension of meta-analysis that simultaneously synthesizes data on more than one outcome per study. For example, studies which compare antihypertensive interventions might measure the two related outcomes fatal stroke and nonfatal stroke. Some studies will only report fatal or only nonfatal stroke, others will report both. Because these two outcomes are correlated, there are important benefits in analyzing them jointly via multivariate meta-analysis, including improved precision and calculation of confidence regions for both outcomes (Jackson et al. 2011; Riley 2009).

Multiple treatment comparisons reported by multi-arm studies may be viewed in a similar way to multiple outcomes. Specifically, the basic contrasts can be considered analogous to different outcomes, where the basic contrasts are the set of necessary comparisons to represent all comparisons under the consistency assumption (e.g., the contrasts Aj of each treatment versus a common reference treatment A). Studies may report on many, all, or a single basic contrast. In the example of Table 7, the basic contrasts are the contrasts AC and AB. So, study 1 reports on the first “outcome” AC, study 3 reports on the second “outcome” AB, and study 4 reports on both “outcomes.”

A departure from the analogy arises for study 2, which compares B and C. This study gives
placebo as reference implies that in studies without a placebo arm, we need to “impute” data for a very small sample size for an assumed placebo arm, and here the values $r_{iP} = 0.001$ and $n_{iP} = 0.01$ are used. Then all studies will report on the relative effectiveness of the included treatments versus placebo, $y_{iPj}$ where $j \in \{BB, D, CCB, ACE, ARB\}$. The sample variance-covariance matrix $S$ of all $y_{iPj}$ needs to be specified. As the outcome is measured using the odds ratio (OR), the variances of all observations are calculated using the formula:

$$s^2_{iPj} = \frac{1}{r_P} + \frac{1}{n_P - r_P} + \frac{1}{r_j} + \frac{1}{n_j - r_j}$$

and the covariances are calculated as

$$cov(y_{iPj}, y_{iPk}) = \frac{1}{r_P} + \frac{1}{n_P - r_P}$$

The variance-covariance matrix of the random effects can be modelled in various ways. The most flexible structure is to estimate different heterogeneity variances $\tau^2_{Pj}$ for each comparison ($j$ vs. P). In the analyses that follow, a much more restricted structure is used, following the assumption of a common heterogeneity variance as has been used in previous analyses in the chapter. This sets $\tau^2_{Pj} = \tau^2$ so all covariances between random effects are $\tau^2/2$. This model can be implemented in STATA using the mvmeta command with the option bscov(), which gives the results of Table 10.

Table 10 Results of network meta-analysis as multivariate meta-analysis for incident diabetes. Log-odds ratios ($\hat{\mu}$) with their standard error SE($\hat{\mu}$) and odds ratios (OR) with their 95% confidence interval (CI) for all basic contrasts are reported

Comparison                        $\hat{\mu}$   SE($\hat{\mu}$)   OR     95% CI for OR
β-blockers versus placebo          0.21          0.08              1.24   (1.05, 1.44)
diuretics versus placebo           0.28          0.08              1.32   (1.12, 1.56)
CCB versus placebo                 0.04          0.08              1.04   (0.89, 1.21)
ACE inhibitors versus placebo     -0.12          0.07              0.88   (0.77, 1.10)
ARB versus placebo                -0.19          0.09              0.83   (0.70, 0.98)

Estimates for all functional comparisons can be derived with the use of consistency equations. There are small differences between the results of this approach and the corresponding results of the hierarchical model. Using the restricted maximum likelihood estimator in mvmeta results in $\hat{\tau}^2 = 0.01$, which is the same as the heterogeneity estimated in the hierarchical model.

Assumptions of Network Meta-analysis
As presented in section “Estimating Inconsistency in Mixed Comparisons,” inconsistency in a network can manifest as a disagreement between different sources of evidence for the same comparison and can be identified statistically. For example, an indirect estimate of A versus B via a treatment C can be in conflict with the direct estimate or with another indirect estimate, e.g., A versus B via a treatment D.

Both the likelihood of transitivity (based on clinical and epidemiological considerations) and any evidence of (in)consistency (based on statistical considerations) should be evaluated in a network as part of a network meta-analysis. Conceptual evaluation involves a priori judgements about the comparability of the studies across comparisons with respect to the distribution of potential confounders, considering whether treatments were all given for the same indication and considering whether anchor treatments are equivalent. Such judgements should ideally be made before the outcome data are extracted but after the studies and their characteristics are collected.

Although transitivity and consistency are interwoven concepts and are often thought of as one, it can be useful to consider them separately for ease of evaluation. Consider, for instance, the network presented in Fig. 6 where all treatments have been compared with placebo but not with each other. In
problem is that the loop-specific tests are not independent as they share groups of studies. Consider, for example, the network in Fig. 7 and imagine that the AB comparison is informed by a single study in which an unobserved characteristic produced an estimate very different from what would be expected in the other studies. Then both ABC and ADB loops will present inconsistency because the respective $\widehat{IF}$s share the same deviant AB study.

The loop-based approach does not provide a network-specific estimate of the inconsistency. The multiple dependent tests cannot be summarized into a global network-specific test. It is also unclear how to treat multi-arm trials, which are inherently consistent. Because of the dependence between the loops and the multiple testing nature of the approach, the results should be interpreted with caution; the absence of inconsistent loops may be reassuring for the assumption of consistency (notwithstanding the lack of power of such tests), but the presence of statistically significant loops cannot be used to infer the magnitude of inconsistency in a network.

In the special case where the loops share a single comparison as in Fig. 7, a chi-squared test can be applied (Caldwell et al. 2010). For the same comparison AB, there are three estimates: the direct estimate $\hat{\mu}^{D}_{AB}$ and the two indirect estimates via C and D, $\hat{\mu}^{I}_{ABviaC}$ and $\hat{\mu}^{I}_{ABviaD}$, respectively, with their estimated variances noted as $\hat{v}^{D}_{AB}$, $\hat{v}^{I}_{ABviaC}$, $\hat{v}^{I}_{ABviaD}$. The mixed estimate $\hat{\mu}^{M}_{AB}$ is the weighted average of the three estimates with weights being the inverse of the variance. To test both ABC and ADB loops (and therefore provide a global test for the network) the following chi-squared test can be applied

$$
\frac{(\hat{\mu}^{D}_{AB} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{D}_{AB}}
+ \frac{(\hat{\mu}^{I}_{ABviaC} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{I}_{ABviaC}}
+ \frac{(\hat{\mu}^{I}_{ABviaD} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{I}_{ABviaD}}
\sim \chi^2_2
$$

This can be generalized to combine testing for disagreement between direct estimate and $l$ independent indirect sources; the weighted sum of the difference of each estimate from the mixed estimate will follow a chi-squared distribution with $l$ degrees of freedom.

The results of the loop-based approach can vary substantially depending on the method used to derive the pairwise estimates and their variances. In the presence of heterogeneity, the uncertainty of $\widehat{IF}$ will be larger in a random effects analysis compared with a fixed-effect analysis, and therefore there will be less chance of identifying statistically significant inconsistencies. The random effects approach will also give different results depending on which method will be used to estimate the heterogeneity parameter $\tau$ (e.g., method of moments, restricted maximum likelihood). Some approaches will give larger estimates than others, resulting in different estimates for the uncertainty of $\widehat{IF}$. Moreover, the estimated pairwise variances will change depending on whether the same or different heterogeneity parameters are assumed in the loop.

There is currently limited empirical evidence about the occurrence of statistical inconsistency. A study evaluated 112 triangular networks of which only 16 were found inconsistent (Song et al. 2011). O’Regan et al. (2009) empirically evaluated the agreement between indirect and mixed estimates that appear in networks of at least four treatments. Using a fixed-effect approach, they concluded that the two indirect and mixed estimates did not show important differences, although the 51 comparisons they examined came from only seven reviews.

Approaches that evaluate inconsistency globally in a network rather than testing each loop have gained in popularity but are typically cumbersome to apply and have limitations. For network models fitted within a Bayesian framework, the consistency assumption can be evaluated by comparing a model that assumes consistency with one that does not, using the DIC (Spiegelhalter et al. 2002). The model without consistency is the model described in section “Consistency Models” but without the consistency equations to derive indirect and mixed estimates. The inconsistency model relies only on direct evidence and is equivalent to a
series of pairwise meta-analyses (usually assuming, however, that they share the same heterogeneity parameter). The assumption of consistency is challenged when the inconsistency model presents, for the same data, a better trade-off between model fit and complexity; this is the case when the DIC for the inconsistency model is lower than the DIC for the consistency model by more than three units. An important drawback with this method is that results may depend on the parameterization of the multi-arm trials, from which only some of the study-specific effect sizes enter the model. Approaches that simultaneously test and account for inconsistency are discussed in the section “Inconsistency models”.

Application: Statistical Evaluation of Inconsistency in Each Loop of Incident Diabetes Network
The network includes 16 “triangles” that can be evaluated for inconsistency. For the calculation of all inconsistency factors, the formulae of section 3.3.3 are employed. Then the estimates with their 95% CI can be plotted in a forest plot. The pairwise effect sizes were estimated using the random effects model assuming different and loop-common heterogeneity parameters. There are no important differences between the two forest plots; both include two inconsistent loops.

The hierarchical model is fitted as described in section “Network Meta-analysis as a Hierarchical Model” but omitting the consistency equations (i.e., an inconsistency model); this is essentially a sequence of pairwise meta-analyses. The value of the posterior deviance was $D = 50.85$ and DIC = 93.6. Comparing the $D$ value to that obtained from the consistency model, since the difference in DIC is smaller than three points, this suggests that the inconsistency model fits the data better and might also be the most parsimonious model.

Inconsistency Models
Two major approaches have been proposed so far to address inconsistency. The first approach was proposed in (Lu and Ades 2006) and is based on the idea that inconsistency is a property of closed loops and a network can have as many inconsistencies as functional parameters. Recently, an approach has been proposed which extends the idea of inconsistency: it applies not only to the disagreement between direct and indirect estimates in a loop but also to disagreement between studies that report the same comparison but include different sets of treatments. The two approaches are outlined below, starting from the data in Table 7.

The loop-based inconsistency model assumes that inconsistency arises when the consistency equations between functional and basic parameters do not hold. Hence, an obvious solution is to “relax” the assumption by adding an extra term to account for inconsistencies. In the example of Table 7, there are two basic parameters $\mu_{AC}$, $\mu_{AB}$ and one functional $\mu_{BC} = \mu_{AC} - \mu_{AB}$. This reflects the closed loop ABC. Inconsistency in this loop can be accounted for if it is assumed that

$$\mu_{BC} = \mu_{AC} - \mu_{AB} + w_{ABC}$$

where $w_{ABC}$ measures the amount of inconsistency in the loop. The term is also called an inconsistency factor and, in the absence of multiple correlated loops, is analogous to the simple $\widehat{IF}$. In complex networks where many inconsistency factors exist, the parameters $w_{jkf}$ are assumed to be randomly distributed with expectation zero:

$$w_{jkf} \sim N(0, \sigma^2)$$

The variance $\sigma^2$ is often referred to as the inconsistency variance in analogy with the heterogeneity variance $\tau^2$ in the distribution of the study-specific random effects $\delta_i \sim N(0, \tau^2)$. The inconsistency variance $\sigma^2$ describes the amount of variability across loops in the conflict between direct and indirect evidence. Monitoring the individual $w_{jkf}$s for large values will reveal loops with important inconsistency, whereas comparison of $\sigma^2$ to $\tau^2$ will show how much inconsistency exists compared with the heterogeneity.

As the degrees of freedom in a network describe the number of functional parameters, there are
$N_{com} - (T - 1)$ many inconsistency factors. Problems arise with this approach when there are multi-arm trials. The ABC trials in Table 7 are inherently consistent, and therefore the BC comparison reported in these studies does not contribute to the inconsistency as much as the BC comparison in an independent study. Lu and Ades suggested adjusting the inconsistency degrees of freedom to $ICDF = N_{com} - (T - 1) - S$, where $S$ is the number of independent inconsistency relations in which the corresponding parameters are supported by no more than two independent sources of evidence. In practice, $S$ is the number of functional comparisons where two out of the three parameters are only estimated in multi-arm trials.

The difficulties in fully defining loop inconsistency when there are multi-arm studies motivated the concept of “design inconsistency.” Design inconsistency reflects the belief that studies which include different treatments might give different estimates for the same comparison. For example, an AB study and ABC study might provide different estimates because of their different design. Design inconsistency can be thought of as a special case of source-specific heterogeneity: variation between the estimates for the same comparison due to differences in the total treatments included. In the data of Table 7, this means adding an inconsistency factor for the disagreement between the three estimates in the ABC study and the AB, AC, and BC studies. The model with both loop and design inconsistency has $N_{CompDesign} - (T - 1)$ inconsistency factors, where $N_{CompDesign}$ is the number of independent comparisons per design. In Table 7 there is one independent comparison for each two-arm trial and two independent comparisons for the three-arm trials. This results in a total of three inconsistency factors for the network. These inconsistency factors are comparison-specific and are attached to every study reporting that comparison. For instance, one inconsistency factor is attached to each AB, AC, and BC study, respectively ($w_{AB}$, $w_{AC}$, $w_{BC}$). As the inconsistency factors derived in this way are independent, they can be summarized in a single test for the entire network (see White 2011 for details).

One further approach for detecting inconsistency in a network meta-analysis is “node splitting” (Dias et al. 2010) where a “node” refers to each summary effect generated from the network meta-analysis. This approach is based on the separation of the information contributing to each node into the direct and indirect evidence, within a single model. The node-splitting approach allows the analyst to split the network-wide information contributing to the summary estimate into the evidence directly comparing B versus C ($\hat{\mu}^{D}_{BC}$) and all the remaining “indirect” evidence for B versus C ($\hat{\mu}^{I}_{BC}$) after the studies directly comparing B to C have been removed. The extent of agreement between the direct and indirect estimates defines the magnitude of consistency. Note that this is a computationally intensive approach involving models that can be difficult to parameterize; care should be taken to ensure that multi-arm trials are handled correctly and to ensure that split nodes are actually from contrasts contributing to suspect loops.

Application: Hierarchical Inconsistency Model for Network Meta-analysis in Incident Diabetes
The application of a hierarchical inconsistency model requires careful choice of the basic parameters $\mu_{Aj}$ and the inconsistency factors $w_{jkf}$, as well as the appropriate parameterization of multi-arm trials. First, all basic contrasts need to be informed directly from at least one study. Choosing placebo as reference treatment (A) satisfies this condition, because all other treatments are compared directly with placebo in at least one study. Second, the four multi-arm trials included in the data may modify the number of ICDF that should be included in the model. However, as all consistency equations are informed by at least three independent sources of evidence, it is

$$ICDF = N_{com} - (T - 1) - S = 14 - (6 - 1) - 0 = 9$$

The consistency relations can be relaxed to include the nine inconsistency parameters:
$$\mu_{D,BB} = \mu_{D,P} - \mu_{BB,P} + w_{P,D,BB}$$
$$\mu_{CCB,D} = \mu_{CCB,P} - \mu_{D,P} + w_{P,CCB,D}$$
$$\mu_{CCB,BB} = \mu_{CCB,P} - \mu_{BB,P} + w_{P,CCB,BB}$$
$$\mu_{ACE,CCB} = \mu_{ACE,P} - \mu_{CCB,P} + w_{P,ACE,CCB}$$
$$\mu_{ACE,D} = \mu_{ACE,P} - \mu_{D,P} + w_{P,ACE,D}$$
$$\mu_{ACE,BB} = \mu_{ACE,P} - \mu_{BB,P} + w_{P,ACE,BB}$$
$$\mu_{ARB,CCB} = \mu_{ARB,P} - \mu_{CCB,P} + w_{P,ARB,CCB}$$
$$\mu_{ARB,D} = \mu_{ARB,P} - \mu_{D,P} + w_{P,ARB,D}$$
$$\mu_{ARB,BB} = \mu_{ARB,P} - \mu_{BB,P} + w_{P,ARB,BB}$$

where $w_{Pjk} \sim N(0, \sigma^2)$ for $j, k \in \{BB, D, CCB, ACE, ARB\}$. The rest of the model is the same as the consistency hierarchical model (accounting for multi-arm trials). Moreover, contrasts that are informed only from multi-arm trials need to be expressed in model parameters. Such contrasts are β-blockers versus placebo, included in a D-BB-P study, and ACE inhibitors versus CCB, included in two ACE-CCB-BB and one ACE-CCB-D studies. Since the model considers treatment 1 as baseline treatment (A) of each study, we need placebo to be the first treatment in the data for the D-BB-P trial, and CCB or ACE inhibitors to be the first treatment for the other three studies.

Table 11 shows the results of fitting this model in WinBUGS employing a half-normal prior distribution on the inconsistency variance $\sigma^2$ (the same as for the heterogeneity $\tau^2$).

Table 11 Results of inconsistency hierarchical model for network meta-analysis for incident diabetes. Inconsistency factors ($w$), log-odds ratios ($\hat{\mu}$) with their standard error SE($\hat{\mu}$), and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported. Missing values of $w$ correspond to basic contrasts or functional contrasts without direct estimates available

Comparison                          $w_{Pjk}$   $\hat{\mu}$   SE($\hat{\mu}$)   OR     95% CI for OR
β-blockers versus placebo                        0.23          0.11              1.26   (1.03, 1.62)
diuretics versus placebo                         0.31          0.10              1.36   (1.13, 1.71)
CCB versus placebo                               0.06          0.10              1.06   (0.89, 1.32)
ACE inhibitors versus placebo                   -0.13          0.08              0.88   (0.75, 1.03)
ARB versus placebo                              -0.20          0.10              0.82   (0.66, 1.00)
diuretics versus β-blockers         0.02         0.08          0.12              1.08   (0.86, 1.36)
CCB versus diuretics                0.00        -0.25          0.11              0.78   (0.62, 0.97)
CCB versus β-blockers               0.01        -0.17          0.11              0.84   (0.68, 1.04)
ACE inhibitors versus CCB           0.01        -0.19          0.11              0.83   (0.65, 1.00)
ACE inhibitors versus diuretics     0.01        -0.44          0.11              0.65   (0.50, 0.78)
ACE inhibitors versus β-blockers    0.04        -0.36          0.12              0.70   (0.53, 0.85)
ARB versus ACE inhibitors                       -0.07          0.12              0.93   (0.73, 1.18)
ARB versus CCB                      0.00        -0.26          0.12              0.77   (0.59, 0.95)
ARB versus diuretics                0.01        -0.51          0.13              0.60   (0.45, 0.76)
ARB versus β-blockers               0.02        -0.43          0.13              0.65   (0.49, 0.81)

Heterogeneity and inconsistency variances were estimated as 0.02 and 0.01, with 95% CrI (0, 0.06) and (0, 0.13), respectively. Some $w$-factors are quite large in relation to the treatment effect estimates, indicating that there is probably some inconsistency in the network. Note this is in agreement with the loop-specific approach. The loop placebo versus ACE inhibitors versus β-blockers presents the largest inconsistency value (0.04) followed by the loops placebo versus ARB versus β-blockers (0.02) and placebo versus diuretics versus β-blockers (0.02). The first two loops were also identified as inconsistent in Fig. 8, and the last appeared marginally consistent. There are no large differences in the point estimates of the summary ORs compared to those from the consistency model. However, the 95% CrI from the inconsistency model are wider to account for inconsistency.
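The counting behind the nine inconsistency parameters in this application can be sketched in a few lines of Python. The sketch recomputes ICDF = N_com - (T - 1) - S for the incident diabetes network and evaluates one relaxed consistency relation. The set of directly compared pairs is read off the five basic contrasts and the nine relaxed equations; the helper names `icdf` and `functional_effect` and the log-odds ratios in the last lines are hypothetical, chosen for illustration, and are not part of the WinBUGS analysis.

```python
from itertools import combinations

TREATMENTS = ["P", "BB", "D", "CCB", "ACE", "ARB"]

# Pairs with direct evidence: the five basic contrasts versus placebo plus
# the nine functional contrasts that receive an inconsistency factor.
DIRECT = {frozenset(pair) for pair in [
    ("P", "BB"), ("P", "D"), ("P", "CCB"), ("P", "ACE"), ("P", "ARB"),
    ("D", "BB"), ("CCB", "D"), ("CCB", "BB"),
    ("ACE", "CCB"), ("ACE", "D"), ("ACE", "BB"),
    ("ARB", "CCB"), ("ARB", "D"), ("ARB", "BB"),
]}


def icdf(n_com, n_treatments, s=0):
    """Inconsistency degrees of freedom: N_com - (T - 1) - S."""
    return n_com - (n_treatments - 1) - s


def functional_effect(mu_basic, j, k, w=0.0):
    """Relaxed consistency relation mu_jk = mu_jP - mu_kP + w_Pjk."""
    return mu_basic[j] - mu_basic[k] + w


# Treatment pairs that are never compared head to head in the network.
ALL_PAIRS = {frozenset(p) for p in combinations(TREATMENTS, 2)}
NOT_DIRECT = ALL_PAIRS - DIRECT

print(icdf(len(DIRECT), len(TREATMENTS)))     # 14 - (6 - 1) - 0
print(sorted(sorted(p) for p in NOT_DIRECT))  # pair without a w factor

# Hypothetical log-odds ratios versus placebo, for illustration only.
mu = {"D": 0.30, "BB": 0.20}
print(functional_effect(mu, "D", "BB", w=0.05))
```

Setting `w=0.0` in `functional_effect` recovers the ordinary consistency equation, which is how the functional contrast without direct evidence (ARB versus ACE inhibitors) is obtained.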
[Figure 8: two forest-plot panels, (a) and (b), showing the inconsistency factor for each triangular loop; horizontal axis: Inconsistency Factor, 0 to 4]
Fig. 8 Inconsistency factors of all triangles of incident diabetes network with (a) a different heterogeneity estimate for each comparison and (b) a common heterogeneity estimate within each triangle. Triangles with statistically significant inconsistency factors (their 95% CI does not include 0) are considered as inconsistent
The DIC of the model was 92.1 and $D = 53.15$, showing that accounting for inconsistency does not improve the fit of the model, as the consistency model resulted in almost the same values.

When heterogeneity is found in a pairwise meta-analysis, subgroup analysis or meta-regression is employed to explore possible sources. Network meta-regression is an extension of network meta-analysis to include covariates and can be used to explore heterogeneity and/or inconsistency. Covariates typically include study-specific variables such as setting or length of follow-up, within-trial bias characteristics such as the quality of randomization, of allocation concealment and blinding, or patient-level characteristics such as age or sex. Meta-regression is equivalent to subgroup analysis for dichotomous or categorical explanatory variables. Characteristics such as differences in baseline risk (if there is a common comparator) and sample size (as a single proxy for study quality) can also be considered.

The network meta-regression model as a hierarchical model is

$$y_{ijk} \sim N(\theta_{ijk}, s^2_{ijk})$$

where $b_{ijk}$ are the regression coefficients for study $i$ and comparison $jk$ and $C_{ijk}$ the explanatory variable. The regression coefficients can be assumed to be fixed across studies ($b_{ijk} = \beta_{jk}$) or, if there are many studies per comparison, as exchangeable across studies ($b_{ijk} \sim N(\beta_{jk}, \gamma^2)$). The model can be applied to multi-arm trials and also extended to account for inconsistency as described in previous sections.

Consistency can be imposed for the regression coefficients by choosing a reference treatment A and defining $\beta_{jk} = \beta_{Ak} - \beta_{Aj}$ (Cooper et al. 2009). To improve power, the independent $\beta_{Aj}$ can be assumed exchangeable; $\beta_{Aj} \sim N(B, \varphi^2)$. Adjusting for factors that can vary across comparisons may reduce heterogeneity and improve the likelihood of transitivity. The
importance and impact of the adjustment can be judged by monitoring changes in the heterogeneity variance (compare $\tau_r^2$ to $\tau^2$) and inconsistency variance (compare $\sigma_r^2$ to $\sigma^2$), by monitoring the magnitude and significance of the coefficients $\beta_{jk}$, and by comparing the goodness of fit and parsimony of adjusted and unadjusted models using DIC and $\bar{D}$.

Network meta-regression suffers from the same problems as simple meta-regression. These include ecological bias when aggregated patient-level data are used as covariates, low power with few studies, and high false-positive rates if heterogeneity not explained by the covariates is ignored (Higgins and Thompson 2004).

Adjusting for bias in a network of interventions offers the advantage of increased power compared with traditional meta-analysis sensitivity analysis, because the regression coefficients share information via the consistency equations. Suppose, for example, that comparison B versus C is informed by very few studies, or by studies that all have the same characteristic (e.g., they all have poor allocation concealment). Then, conducting sensitivity analysis or adjusting the meta-analysis result of BC for allocation concealment is suboptimal or impossible. However, when these studies are part of a network meta-regression model, the bias coefficient $\beta_{BC}$ for allocation concealment is linked to the other regression coefficients via $\beta_{BC} = \beta_{AC} - \beta_{AB}$ and $\beta_{Aj} \sim N(B, \varphi^2)$.

A special application of network meta-regression is to address small study effects in a network of interventions. The association between sample size, effect size heterogeneity, and the probability of publication (which is often manifested as funnel plot asymmetry) has long been a challenging issue in meta-analysis. In a pairwise meta-analysis, the presence of small study effects (possibly due to publication bias) has been explored by regressing the underlying effect on a measure of the study precision. The same approach applies to networks of interventions to explore situations where comparisons that do not give significant results may be underrepresented or missing in the network and their relative effectiveness will be informed primarily by the indirect evidence. The covariate $C_{ijk}$ can be the sample standard error, variance, or inverse of sample size (references). However, significant associations between effect sizes and precision can be taken only as an indication of publication bias, as other explanations, including genuine heterogeneity, are possible. As publication bias and selective reporting will affect interventions and comparisons in different ways depending on the clinical context, the problem of selection bias in the network should be considered carefully. Further methodological development is needed to better address selection bias in network meta-analysis.

Because network meta-analysis combines studies that compare a treatment against a variety of comparators, it enables researchers to explore biases that are not identifiable in a head-to-head meta-analysis. “Optimism bias” associated with the use of novel interventions has been a concern that is difficult to address in a pairwise meta-analysis (Djulbegovic et al. 2011; Heres et al. 2006; Soares et al. 2005). However, in a network of interventions, the same treatment C can be the newer and hence the “favored” treatment in a comparison A versus C but the older in another comparison B versus C. This enables us to explore apparent changes in the effectiveness of C because of optimism (Salanti et al. 2010).

Application: Network Meta-regression for Incident Diabetes Using Year of Publication as Covariate

A network meta-regression analysis of the incident diabetes data set will investigate whether differences in the publication year of included studies have an impact on the estimated treatment effects, and hence whether they can explain any of the heterogeneity and inconsistency of this network. Two meta-regression models will be used: one estimating a common fixed coefficient across all studies and all treatment comparisons, and a second imposing consistency in coefficients. More specifically, in the first model, it is assumed that $b_{ijk} = B$ ($i = 1, \ldots, 22$), and a vague normal prior distribution $N(0, 10000)$ on the fixed coefficient is employed. In the second model, the coefficients are assumed to be consistent, $b_{ijk} = \beta_{Ak} - \beta_{Aj}$ (A = P), and exchangeable, $\beta_{Aj} \sim N(B, \varphi^2)$ ($j$ = {BB, D,
Table 12 Medians and 95% CrI of regression coefficients for comparisons of all treatments versus placebo estimated by network meta-regression model for incident diabetes with consistent and exchangeable coefficients

Comparison                        β        95% CrI
β-blockers versus placebo         −0.02    (−0.06, 0.01)
diuretics versus placebo          −0.02    (−0.06, 0.01)
CCB versus placebo                −0.03    (−0.07, 0.02)
ACE inhibitors versus placebo     −0.03    (−0.08, 0.01)
ARB versus placebo                −0.03    (−0.10, 0.03)
Table 13 Results of network meta-regression model with a common fixed coefficient for incident diabetes using year of publication as covariate. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% credible interval (CrI) for all comparisons are reported

Comparison                        $\hat{\mu}$    $SE(\hat{\mu})$    OR      95% CrI for OR
β-blockers versus placebo         0.23     0.09     1.26    (1.06, 1.50)
diuretics versus placebo          0.26     0.09     1.29    (1.07, 1.56)
CCB versus placebo                0.06     0.09     1.07    (0.90, 1.26)
ACE inhibitors versus placebo     −0.10    0.07     0.91    (0.78, 1.05)
ARB versus placebo                −0.16    0.10     0.86    (0.70, 1.04)
Table 14 Results of network meta-regression model with consistent and exchangeable coefficients for incident diabetes using year of publication as covariate. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% credible interval (CrI) for all comparisons are reported

Comparison                        $\hat{\mu}$    $SE(\hat{\mu})$    OR      95% CrI for OR
β-blockers versus placebo         0.25     0.10     1.28    (1.06, 1.55)
diuretics versus placebo          0.30     0.10     1.34    (1.11, 1.64)
CCB versus placebo                0.05     0.10     1.05    (0.86, 1.29)
ACE inhibitors versus placebo     0.09     0.10     1.09    (0.91, 1.33)
ARB versus placebo                −0.06    0.09     0.94    (0.79, 1.12)
CCB, ACE, ARB}), where $B$ and $\varphi^2$ are the mean and variance, respectively, of the distribution of all $\beta_{Aj}$ with normal ($B \sim N(0, 10000)$) and half-normal prior distributions. In both models a centered covariate $C_i - \bar{C}$ is used instead of $C_i$ (the year of publication of study $i$) for computational reasons (e.g., convergence of the models), where $\bar{C}$ is the mean publication year.

The estimate of the fixed regression coefficient from the first model ($B$) was −0.01, corresponding to an odds ratio that is multiplied by $e^{-0.01} = 0.99$ for each additional year of publication. However, the 95% CrI of $B$ is (−0.03, 0.01), implying that there is no statistically significant effect of study publication year on the treatments' effectiveness.

The same inference is derived from the second meta-regression model, which estimates the mean ($B$) of the distribution of regression coefficients to be −0.02 (−0.07, 0.01) with variance $\varphi^2 < 0.001$. Table 12 shows the consistent coefficients ($\beta_{Aj}$) of all treatments versus placebo.

The estimated treatment effects by the two models are presented in Tables 13 and 14.
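As a quick check of the interpretation of the fixed coefficient, the following minimal Python sketch converts a coefficient on the log-odds-ratio scale into a multiplicative change in the odds ratio. It takes the coefficient as −0.01 with interval (−0.03, 0.01), consistent with the stated $e^{-0.01} = 0.99$; the function name `odds_ratio_multiplier` and the 10-year shift are illustrative assumptions, not part of the original analysis.

```python
import math

# Fixed coefficient for (centered) publication year and its 95% CrI bounds,
# taken as -0.01 and (-0.03, 0.01) on the log-odds-ratio scale.
B = -0.01
CRI = (-0.03, 0.01)


def odds_ratio_multiplier(coef: float, years: float = 1.0) -> float:
    """Factor by which the odds ratio changes when publication year shifts by `years`."""
    return math.exp(coef * years)


per_year = odds_ratio_multiplier(B)
per_decade = odds_ratio_multiplier(B, years=10)  # hypothetical 10-year shift
cri_per_year = tuple(odds_ratio_multiplier(bound) for bound in CRI)

print(f"multiplier per year:   {per_year:.3f}")
print(f"multiplier per decade: {per_decade:.3f}")
print(f"95% CrI per year:      ({cri_per_year[0]:.3f}, {cri_per_year[1]:.3f})")
```

Because the interval for the per-year multiplier spans 1, the sketch agrees with the conclusion that publication year has no statistically significant effect.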
Both meta-regression models resulted in heterogeneity estimates $\hat{\tau}^2 = 0.02$, the same as for the consistency hierarchical model (accounting for multi-arm trials), showing that year of publication as a covariate does not adequately explain the heterogeneity in the network.

The meta-regression with a fixed coefficient also does not improve the fit of the model ($\bar{D} = 53.85$, DIC = 92.2) compared with the hierarchical consistency model without any covariates, while the model with consistent coefficients shows a slightly better fit ($\bar{D} = 51.57$, DIC = 91.3).

The inconsistency model (as described in section “Consistency Models” but omitting the consistency equations) was also fitted with a fixed coefficient to investigate if differences in year of publication can explain the identified inconsistency. The value of the posterior deviance was $\bar{D} = 50.43$ and DIC = 93.4, the same as for the inconsistency model not including any covariates. However, using the estimates of this model, the two inconsistent loops ACE-BB-P and ARB-BB-P become consistent with IF = 0.20 (−0.44, 0.85) and 0.18 (−0.53, 0.89), implying that year of publication is a possible explanation of inconsistency.

Numerical and Graphical Presentation of Results from Network Meta-analysis

Network meta-analysis involves many treatments and consequently results in a plethora of pairwise effect sizes. When presenting results from a network meta-analysis, it is useful to show both the direct and the mixed estimates.

Ranking measures and probabilities have become popular as they provide an understandable gateway to the results, particularly when there are many competing treatments. The probability of each treatment being the best is often calculated when the network model is fitted within the Bayesian framework. Methods are also available for similar ranking of treatments in a frequentist framework (White 2011). The probability of being the best treatment has the disadvantage that it does not reflect the spread of rankings for the treatments and may thus be misleading. An obvious solution is to calculate the probabilities for all ranks. The probability of each treatment achieving each possible rank can be plotted to yield “rankograms.” Presentation of the cumulative ranking curves in a single plot and a numerical summary of the area below the cumulative ranking curve for each treatment is useful as it gives a clear ordering of all treatments based on a summary of the rank probabilities. A review of graphical and numerical methods along with software code is presented in Chaimani et al. (2013).

Application: Presentation of Results for Incident Diabetes

The results of the consistency hierarchical model (accounting for multi-arm trials) will be used to illustrate the use of rankograms. The hierarchical model is fitted, and the ordering of the treatments according to their effectiveness is collected in each MCMC cycle using the equation:

$$\mathrm{order}_k = \sum_{j=1}^{6} I\left(\mu_{Aj} \le \mu_{Ak}\right)$$
Table 15 Ranking probabilities for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)

Order    Placebo    β-Blockers    Diuretics    CCB     ACE inhibitors    ARB
1        0.01       0.00          0.00         0.00    0.22              0.77
2        0.07       0.00          0.00         0.02    0.71              0.20
3        0.65       0.00          0.00         0.27    0.06              0.02
4        0.27       0.01          0.01         0.70    0.01              0.00
5        0.01       0.79          0.20         0.01    0.00              0.00
6        0.00       0.20          0.80         0.00    0.00              0.00
[Figure 9: six panels (Rank of Placebo, Rank of β-Blockers, Rank of Diuretics, Rank of CCB, Rank of ACE inhibitors, Rank of ARB); horizontal axis: rank 1 to 6; vertical axis: Probability, 0.0 to 1.0.]
Fig. 9 “Rankograms” for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)
Table 16 Numerical summary of area below the cumulative ranking curve for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)

Placebo           0.59
β-Blockers        0.16
Diuretics         0.04
CCB               0.46
ACE inhibitors    0.81
ARB               0.96

The numerical summary of area below the cumulative ranking curve for each treatment $k$ is calculated as $\left(\sum_{j=1}^{5} \mathrm{cum.}P_{jk}\right)/5$. The results are presented in Table 16 and the plots in Fig. 10. These results suggest that the best treatment appears to be ARB followed by ACE inhibitors, placebo, CCB, β-blockers, and last diuretics.
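The area summary can be reproduced approximately from the rank probabilities in Table 15. The sketch below is a minimal Python illustration; the function name `sucra` is an assumption introduced here, and because the inputs are the two-decimal values printed in Table 15, the resulting scores can differ slightly from those in Table 16, which were presumably computed from unrounded MCMC output.

```python
from itertools import accumulate
from typing import Dict, List


def sucra(rank_probs: List[float]) -> float:
    """Mean of the cumulative rank probabilities over the first T - 1 ranks."""
    cumulative = list(accumulate(rank_probs))
    return sum(cumulative[:-1]) / (len(rank_probs) - 1)


# Rank probabilities for ranks 1..6, copied (rounded) from Table 15.
table15: Dict[str, List[float]] = {
    "Placebo":        [0.01, 0.07, 0.65, 0.27, 0.01, 0.00],
    "β-Blockers":     [0.00, 0.00, 0.00, 0.01, 0.79, 0.20],
    "Diuretics":      [0.00, 0.00, 0.00, 0.01, 0.20, 0.80],
    "CCB":            [0.00, 0.02, 0.27, 0.70, 0.01, 0.00],
    "ACE inhibitors": [0.22, 0.71, 0.06, 0.01, 0.00, 0.00],
    "ARB":            [0.77, 0.20, 0.02, 0.00, 0.00, 0.00],
}

scores = {name: sucra(p) for name, p in table15.items()}
ordering = sorted(scores, key=scores.get, reverse=True)
for name in ordering:
    print(f"{name}: {scores[name]:.2f}")
```

A treatment that is certain to be ranked first scores 1 and one that is certain to be ranked last scores 0, so sorting by the score gives the ordering of treatments described in the text.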
[Figure 10: six panels, one per treatment; horizontal axis: rank 1 to 6; vertical axis: Cumulative Probability, 0.0 to 1.0.]
Fig. 10 Plot of area below the cumulative ranking curve for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)
Acknowledgments GS and AC received funding from the European Research Council (ERC starting grant IMMA 260559). DC is supported by a UK MRC Population Health Scientist Fellowship (G0902118). JPTH is funded by Medical Research Council grant U105285807.

References

Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Med Res Methodol. 2002;2:13.
Barbui C, Cipriani A, Furukawa TA, et al. Making the best use of available evidence: the case of new generation antidepressants: a response to: are all antidepressants equal? Evid Based Ment Health. 2009;12:101–4.
Bucher HC, Guyatt GH, Griffith EL, et al. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683–91.
Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331:897–900.
Caldwell DM, Welton NJ, Ades AE. Mixed treatment comparison analysis provides internally coherent treatment effect estimates based on overviews of reviews and can reveal inconsistency. J Clin Epidemiol. 2010;63(8):875–82.
Chaimani A, Higgins JP, Mavridis D, Spyridonos P, Salanti G. Graphical tools for network meta-analysis in STATA. PLoS One. 2013;8(10):e76654.
Cipriani A, Furukawa TA, Salanti G, et al. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet. 2009;373:746–58.
Cooper NJ, Sutton AJ, Morris D, et al. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat Med. 2009;28(14):1861–81.
Cooper NJ, Peters J, Lai MC, et al. How valuable are multiple treatment comparison methods in evidence-based health-care evaluation? Value Health. 2011;14:371–80.
Dias S, Welton NJ, Caldwell DM, et al. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29:932–44.
Djulbegovic B, Kumar A, Magazin A, et al. Optimism bias leads to inconclusive results-an empirical study. J Clin Epidemiol. 2011;64:583–93.
Donegan S, Williamson P, Gamble C, et al. Indirect comparisons: a review of reporting and methodological quality. PLoS One. 2010;5:e11054.
Edwards SJ, Clarke MJ, Wordsworth S, et al. Indirect comparisons of treatments based on systematic reviews of randomised controlled trials. Int J Clin Pract. 2009;63:841–54.
Eli Lilly and Company. Gemcitabine for the treatment of metastatic breast cancer: Single technology appraisal submission to the National Institute for health and Clinical Excellence. 2006. Available from http://www.nice.org.uk
Elliott WJ, Meyer PM. Incident diabetes in clinical trials of antihypertensive drugs: a network meta-analysis. Lancet. 2007;369:201–7.
Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technol Assess. 2005;9:26.
Guyatt GH, Sackett DL, Sinclair JC, et al. Users’ guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274:1800–4.
Heres S, Davis J, Maino K, et al. Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine: an exploratory analysis of head-to-head comparison studies of second-generation antipsychotics. Am J Psychiatry. 2006;163:185–94.
Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. 5.0.1 ed. The Cochrane Collaboration; 2008; John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, England.
Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med. 2004;23:1663–82.
Hoaglin DC, Hawkins N, Jansen JP, et al. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR task force on indirect treatment comparisons good research practices-part 2. Value Health. 2011;14:429–37.
Hughes S. First “comparison” of prasugrel and ticagrelor. 2010 Sep 16. Available from http://www.theheart.org/article/1122713.do. Accessed 27 Apr 2011.
Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30:2481–98.
Jansen JP, Schmid CH, Salanti G. Directed acyclic graphs can help understand bias in indirect and mixed treatment comparisons. J Clin Epidemiol. 2012;65:798–807.
Jones A, Takeda A, Tan SC, Cooper K, Loveman E, Clegg A, Murray N. Gemcitabine for metastatic breast cancer: evidence review group report. 2006. Available from www.nice.org.uk
Lambert PC, Sutton AJ, Burton PR, Abrams KR, et al. How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Stat Med. 2005;24:2401–28.
Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105–24.
Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101:447–59.
Lu G, Ades AE. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics. 2009;10(4):792–805.
McAlister FA, Laupacis A, Wells GA, et al. Users’ guides to the medical literature: XIX. Applying clinical trial results B. Guidelines for determining whether a drug is exerting (more than) a class effect. JAMA. 1999;282:1371–7.
Mills EJ, Ghement I, O’Regan C, et al. Estimating the power of indirect comparisons: a simulation study. PLoS One. 2011;6:e16237.
NICE. Methods for the development of NICE public health guidance. 2nd ed. Evidence Synthesis National Institute of Health and Clinical Excellence; 2008.
O’Regan C, Ghement I, Eyawo O, et al. Incorporating multiple interventions in meta-analysis: an evaluation of the mixed treatment comparison with the adjusted indirect comparison. Trials. 2009;10:86.
PBAC. Report of the indirect comparisons working group to the pharmaceutical benefits advisory committee: assessing indirect comparisons. Pharmaceutical Benefits Advisory Committee; 2008. http://www.health.gov.au/internet/main/publishing.nsf/Content/B11E8EF19B358E39CA25754B000A9C07/$File/ICWG%20Report%20FINAL2.pdf
Piccini JP, Kong DF. Mixed treatment comparisons for atrial fibrillation: evidence network or bewildering entanglement? Europace. 2011;13:295–6.
Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. J R Stat Soc Ser A. 2009;172:789–811.
Salanti G, Marinho V, Higgins JP. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. J Clin Epidemiol. 2009;62:857–64.
Salanti G, Dias S, Welton NJ, et al. Evaluating novel agent effects in multiple-treatments meta-regression. Stat Med. 2010;29:2369–83.
Soares HP, Kumar A, Daniels S, et al. Evaluation of new treatments in radiation oncology: are they better than standard treatments? JAMA. 2005;293:970–8.
Song F, Altman D, Glenny AM, et al. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ. 2003;326:472.
Song F, Loke YK, Walsh T, et al. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ. 2009;338:b1147.
Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ. 2011;343:d4909.
Spiegelhalter DJ, Best NG, Bradley PC, et al. Bayesian measures of model complexity and fit. J R Stat Soc Ser B. 2002;64:583–639.
Spiegelhalter DJ, Abrams KR, Myles PJ. Bayesian approaches to clinical trials and health-care evaluation. Chichester: Wiley; 2004.
Sutton AJ, Abrams KR. Bayesian methods in meta-analysis and evidence synthesis. Stat Methods Med Res. 2001;10:277–303.
Thijs V, Lemmens R, Fieuws S. Network meta-analysis: simultaneous meta-analysis of common antiplatelet regimens after transient ischaemic attack or stroke. Eur Heart J. 2008;29:1086–92.
Uhtman OA, Abdulmalik J. Comparative efficacy and acceptability of pharmacotherapeutic agents for anxiety disorders in children and adolescents: a mixed treatment comparison meta-analysis. Cur Med Res Opin. 2010;26(1):53–9.
Viechtbauer W. Confidence intervals for the amount of heterogeneity in meta-analysis. Stat Med. 2007;26:37–52.
Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian random effects meta-analysis of trials with binary outcomes: methods for the absolute risk difference and relative risk scales. Stat Med. 2002;21:1601–23.
Wells GA, Sultan SA, Chen L, et al. Indirect evidence: indirect treatment comparisons in meta-analysis. Ottawa: Canadian Agency for Drugs and Technologies in Health; 2009.
White IR. Multivariate random-effects meta-regression: updates to mvmeta. Stata J. 2011;11(2):255–70.
26 Introduction to Social Network Analysis
Alistair James O’Malley and Jukka-Pekka Onnela
Contents
Part I: Introduction and Background
Historical Note
Representation of Networks
Network Data
Representation of Network Data
Descriptive Measures
Unipartite or One-Mode Networks
Bipartite or Two-Mode Networks
Part II: Statistical Models
Network Influence Models
Relational Analyses
Part III: Network Science
Generative Models of Network Formation
Network Communities
Part IV: Discussion and Glossary
Glossary of Terms
Terms Used in Social Networks
Terms Used in Network Science
References
A. J. O’Malley (*)
The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
e-mail: James.OMalley@Dartmouth.edu

J.-P. Onnela
Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
e-mail: onnela@biostat.hsph.edu; onnela@hsph.harvard.edu

Abstract
This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Although several instances of social network analysis in health services research have appeared recently, the majority involve only the most basic methods and thus scratch the surface of what might be accomplished. Cutting-edge methods using relevant examples and illustrations in health services research are provided.

Part I: Introduction and Background

Social network analysis is the study of the structure of relationships linking individuals (or other social units, such as organizations) and of interdependencies in behavior or attitudes related to configurations of social relations. The observational units in a social network are the relationships between individuals and their attributes. Whereas studies in medicine typically involve individuals whose observations can be thought of as statistically independent, observations made on social networks may be simultaneously dependent on all other observations due to the social ties and pathways linking them. Accordingly, different statistical techniques are needed to analyze social network data. The focus of this chapter is sociocentric data, the case when relational data is available for all pairs of individuals, allowing a fully-fledged review of available methods.

Two major questions in social network analysis are: (1) do behavioral and other mutable traits spread from person to person through a process of induction (also known as social influence, peer effects, or social contagion); (2) what exogenous factors (e.g., shared actor traits) or endogenous factors (e.g., internal configurations of actors such as triads) are important to the overall structure of relationships among a group of individuals.

The first problem has affinity to medical studies in that individuals are the observational units. In medicine, the health of an individual is paramount and so individual outcomes have historically been used to judge the effectiveness of an intervention. A study of social influence in medicine may involve the same outcome, but the treatment or intervention is the same variable evaluated on the peers of the focal individual (referred to as alters). An important characteristic of studies of social influence is that individuals may partly or fully share treatments and one individual’s treatment may depend on the outcome of another. For example, an intervention that encourages person A to exercise in order to lose weight might also influence the weight of A’s friends (B and C) because they exercise more when around A. Hence, A’s weight intervention may also affect the weight of B and C. A consequence is that the total effect of A’s treatment must also consider its effect on B and C, the benefit to individuals to whom B and C are connected, and so on. Such interference between observations violates the stable-unit treatment value assumption (SUTVA) that one individual’s treatment does not affect another’s outcome (Rubin 1978), which presents challenges for identification of causal effects. Interference is likely to result in an incongruity between a regression parameter and the causal effect that would be estimated in the absence of interference.

The second problem is important in sociology as social networks are thought to reveal the structure of a group, organization, or society as a whole (Freeman 2004). For example, there has always been great interest in determining whether the triad is an important social unit (Simmel 1908; Heider 1946). If the existence of network ties A-B and A-C makes the presence of network tie B-C more likely, then the network exhibits transitivity, commonly described as “a friend of a friend is a friend.” Thus, just as an individual may influence or be influenced by multiple others, the relationship status of one dyad (pair of individuals) may affect the relationship status of another dyad, even if no individuals are common to multiple dyads. Accounting for between-dyad dependence is a core component of many social network analyses and has entailed much methodological research.

Network science is a parallel field to social network analysis in that there is very little overlap between researchers in the respective fields despite the similarity of the problems. Whereas solutions to problems in social networks have tended to be data-oriented in that models and statistical tests are based on the data, those in network science have tended to be phenomenon-oriented with analogies to problems in the physical sciences often providing the backbone for solutions. Methods for social network analysis often have causal hypotheses (e.g., does one individual have an effect on another, does the presence of a common friend make friendship
They can also convey social influence by defining norms about such health-related behaviors as smoking or diet, or via social controls promoting (for example) adherence to medication regimes (Marsden 2006). Networks are also channels through which certain communicable diseases, notably sexually transmitted ones, spread (Klovdahl 1985) and certain network structures have been hypothesized to reduce exposure to stressors (Haines and Hurlbert 1992).

A field known as mathematical sociology complemented social theory by attempting to derive results using mathematical rather than intuitive arguments. In particular, statistical and probability methods are used to test for the presence of various structural features in the network. Other key areas of mathematics that have been used in network analysis include graph theory and algebraic models. Katz and Powell (1955) develop tests of dependence within dyads (pairs of actors) while Harary (1953) and Harary (1955) develop tests of triadic dependence. In general, results were descriptive or based on simple models making strong assumptions about the network. With the advent of powerful computers, mathematical contributions have taken on more importance as so much more can be implemented than in the past. For example, computer simulation has recently been used to test and develop theoretical results (Centola 2009).

In the mid-late 1990s, network science emerged as a discipline. Whereas social networks were the domain of social scientists and a growing number of statisticians, network scientists typically have backgrounds in physics, computer science, or applied mathematics. The use of physical concepts to generate solutions to problems is common as evinced by the large domains of research focusing on the adaptation of (e.g.,) a particular physical equation to network data. For example, several procedures for partitioning a network into

methods to provide valuable insight on important practical problems.

Representation of Networks

Social networks are comprised of units and the relationships between them. The units are often individuals (also referred to as actors) but can include larger (e.g., countries, companies) and smaller (e.g., organisms, genes) entities.

Network Data

In sociocentric studies, data is assembled on the ties linking all units or actors within some bounded social collective (Laumann et al. 1983). For example, the collection of data on the network of all children in a classroom or on all pairs of physician collaborations within a medical practice constitutes a sociocentric study. Relationships can be shared or directional, and quantified by binary (tie exists or not), scale (or valued), or multivariate variables. By measuring all relationships, sociocentric data constitutes the highest level of information collection and facilitates an extensive range of analyses including accounting for the effects of multiple actors on actor outcomes or the structure of the network itself to be studied (O’Malley and Marsden 2008). A weaker form of relational data is collected in egocentric studies where individuals (“egos”) are sampled at random and information is collected on at least a sample of the individuals with direct ties to the egos (“alters”). Because standard statistical methods such as regression analysis can generally be used to analyze egocentric data (O’Malley et al. 2012), herein egocentric data are not featured.
disjoint groups of individuals (“communities”) Relational data is often binary (e.g., friend or
rely on the modularity equation, which was devel- nonfriend). One reason is that other types of rela-
oped in the context of spin-theory to model the tional data (e.g., nominal, ordinal, interval-
interaction of electrons. While much of the initial valued) are often transformed to binary due to
work focused on the properties of the solution at the convenience of displaying binary networks.
different values of the parameters, there recently Another is the greater range of models available
has been increased attention to using these for modeling binary data.
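The transformation of valued relational data to binary ties mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `dichotomize`, the strict "greater than" rule, the threshold of 1, and the `contacts` matrix are all hypothetical and not taken from the chapter.

```python
import numpy as np

def dichotomize(valued, threshold=0.0):
    """Return a binary network with a tie wherever the valued tie exceeds `threshold`.

    The strict '>' rule is an assumption; other cut-off rules are equally possible.
    """
    valued = np.asarray(valued, dtype=float)
    binary = (valued > threshold).astype(int)
    np.fill_diagonal(binary, 0)  # self-ties are treated as absent
    return binary

# Hypothetical valued network: entry [i, j] is the contact frequency from i to j.
contacts = np.array([[0, 3, 0],
                     [1, 0, 5],
                     [0, 2, 0]])

binary = dichotomize(contacts, threshold=1)
print(binary)
```

Note that the choice of threshold changes which ties survive, so any analysis of the binary network is conditional on that choice.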
Many studies involve two distinct types of units, such as patients and physicians, or physicians and hospitals, authors and journal articles or books, etc. In these two-mode networks, the elementary relationships of interest usually refer to affiliations of units in one set with those in the other, e.g., of patients with the physician(s) responsible for their care, or of physicians with the hospital(s) at which they are admitted to practice. Two-mode networks are also known as affiliation or bipartite networks. They can be viewed as a special case of general sociocentric network data in that the relationship of interest is between heterogeneous types of actors.

The advent of high-powered computers has enabled the analysis of large networks, which has benefitted fields such as health services research that regularly encounter large data sets. A challenge facing analyses of large networks is that it may be infeasible for all actors to be exposed to each other actor and thus for a relationship to have formed. Therefore, statistical analyses for large networks essentially use relational data representing the joint event of individuals meeting and then forming a tie, not the network of ties that would be observed if all pairs of individuals actually met. Accordingly, analyses of large networks may underestimate effect sizes unless information on the likelihood of two individuals meeting is incorporated.

Representation of Network Data

Let the status of the relationship from i to j be denoted by $a_{ij}$, element ij of the adjacency matrix A. In a directed network $a_{ij}$ may differ from $a_{ji}$ while in a nondirected network $a_{ij} = a_{ji}$, implying $A = A^{T}$. A network constructed from friendship nominations is likely to be directed while a network of coworkers is nondirected. In the case of immutable relationships (e.g., siblings), A will only change as actors are added or removed (e.g., through birth or death), as relationship status is otherwise invariant. In the following, assume the network is binary unless otherwise stated (Fig. 1).

Matrices and graphs are two common ways of representing the status of a network at a fixed time. In a matrix representation, rows and columns correspond to units or actors; the matrix is square for one-mode and rectangular for two-mode networks. Elements of the matrix contain the value of the relationship linking the corresponding units or actors, so that element ij represents the relationship from actor i to actor j. With binary ties (1 = tie present, 0 = tie absent), the matrix representation is known as an adjacency matrix. Irrespective of how the network is valued, the diagonal elements of the matrix representing the network equal 0 as self-ties are not permitted. Several network properties can be computed through matrix operations.

In graphical form, units or actors are vertices and nonnull relationships are lines. Nondirected relationships are known as “edges” and directed ones as “arcs”; arrows at the end(s) of arcs denote their directionality. Value-weighted graphs can be constructed by displaying nonnull tie values along arcs or edges, or by letting thinner and thicker lines represent line values. Such graphical imagery is a hallmark of social network analysis (Freeman 2004).

Two-mode (or bipartite) networks may be represented in set-theoretic form as hypergraphs consisting of a set of actors of one type, together with a collection of subsets of the actors defined on the basis of a common actor of the second type (Wasserman and Faust 1994). This representation highlights the multiparty relationships that may exist among those actors of one type that are linked to a given actor of the other type, e.g., the set of all physicians affiliated with a particular clinic or service. In matrix form, element ij of an affiliation matrix A indicates that actor i of the first type is linked to actor j of the second type. Affiliation networks may usefully be represented as bipartite graphs in which nodes are partitioned into two disjoint subsets and all lines link nodes in different sets.

An induced one-mode network A may be obtained by multiplying an affiliation matrix B by its transpose, $A = BB^{T}$; entry ij of the outer-product $BB^{T}$ gives the number of affiliations shared by a pair of actors of one type (see Fig. 2, which emulates a figure in Landon et al. (2012)). Dually, the inner-product $B^{T}B$ yields a one-mode network of shared affiliations among actors of the second type (Breiger 1974). The diagonals of the
622 A. J. O’Malley and J.-P. Onnela
outer and inner matrix products give the degree of the actors (i.e., the number of ties to actors of the other mode).

In health services applications, an investigator is often interested in a one-mode network that is not directly observed but rather is induced from a two-mode network. Such one-mode projection networks are motivated theoretically by a claim that shared actors from the other mode act as surrogates for ties between the actors. For example, physicians with many patients in common might have heightened opportunities for contact through consultations or sharing of information about those patients, and thus the number of shared patients is a surrogate for the actual extent of interaction between pairs of physicians.

Bipartite adjacency matrix B (rows: physicians A to F; columns: patients 1 to 3) and its one-mode projection $A = BB^{T}$ (rows and columns: physicians A to F):

         B                    A = BB^T
       1  2  3              A  B  C  D  E  F
   A   1  1  0          A   2  1  2  1  1  0
   B   1  0  0          B   1  1  1  0  0  0
   C   1  1  0          C   2  1  2  1  1  0
   D   0  1  1          D   1  0  1  2  2  1
   E   0  1  1          E   1  0  1  2  2  1
   F   0  0  1          F   0  0  0  1  1  1

Fig. 2 A schematic illustrating a projection from a two-mode (bipartite) to a one-mode (unipartite) network. For example, Medicare records link each doctor to a number of patients, defining a bipartite network consisting of two types of nodes, doctors and patients. An edge can only exist between different types of nodes (a doctor and a patient), and the network is fully described by the (in this case 6 × 3) bipartite adjacency matrix B. A one-mode projection of the doctor-patient network is obtained by multiplying the bipartite adjacency matrix B by its transpose, $B^{T}$, to yield a 6 × 6 symmetric one-mode adjacency matrix A, whose elements indicate the number of patients the two physicians have in common. The diagonal elements of A correspond to the number of patients the given physician “shares with themselves” (i.e., the number of patients they care for)
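The projection in Fig. 2 can be reproduced with a few lines of NumPy. The sketch below uses the 6 × 3 matrix B from the figure; the variable names (`A`, `P`, `degree_physicians`, `degree_patients`) are illustrative choices, not notation from the chapter apart from A and B.

```python
import numpy as np

# Bipartite adjacency matrix B from Fig. 2:
# rows are physicians A-F, columns are patients 1-3.
B = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 1, 1],
              [0, 0, 1]])

A = B @ B.T   # physician-by-physician: number of shared patients
P = B.T @ B   # patient-by-patient: number of shared physicians

# The diagonals of the two products give the degrees in the bipartite network.
degree_physicians = np.diag(A)   # patients per physician
degree_patients = np.diag(P)     # physicians per patient

print(A)
print(degree_physicians, degree_patients)
```

The off-diagonal entries of `A` match the right-hand matrix in Fig. 2, and the diagonals equal the row and column sums of `B`.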
Examples of provider (physician, hospital, health service area) networks obtained as one-mode projections of bipartite networks in health services research are given in (Barnett et al. 2011, 2012a, b; Pham et al. 2009).

An often overlooked feature of bipartite network analysis is the mechanism by which network data is obtained. Networks obtained from one-mode projections have different statistical properties from directly observed one-mode networks. Consider a patient-physician bipartite network and suppose a threshold is applied to the physician one-mode projection such that true social ties are assumed to exist or not according to whether one or more patients are shared. Then a patient that visits three physicians induces ties between all three physicians. The same complete set of ties between the three physicians is also induced by three patients that each visit different pairs of the three physicians. However, the projection does not preserve the distinction (see section “Bipartite or Two-Mode Networks” for further comment).

Descriptive Measures

Unipartite or One-Mode Networks

The number of units or actors (N) is known as the order of the network. A common network statistic is network density (D), defined as the number of ties across the network (L) divided by the number of possible ties; for directed networks $D = L/(N(N-1))$ and for nondirected networks, where there are $N(N-1)/2$ possible ties, $D = 2L/(N(N-1))$. Thus, density equals the mean value of the binary (1, 0) ties across the network. The same definition can be used for general relational data, in which case the resulting measure is sometimes referred to as strength. While results in this chapter are generally presented for binary networks, corresponding measures for weighted networks often exist (Opsahl et al. 2010).

The tendency for relationships to form between people having similar attributes is known as homophily (McPherson et al. 2001). Homophily involves subgroup-specific network density statistics. With high homophily according to some attribute, networks tend toward segregation by that attribute; the extreme case occurs when the network consists of separate components (i.e., no ties between actors in different components) defined by levels of the attribute. In the other direction, one obtains a bipartite network where all ties are between different types of actors (extreme heterophily).

The out- and in-degree for an actor i are the number of ties from actor i, $a_{i+} = \sum_{j=1}^{N} a_{ij}$ (the sum of row i of A), and to actor i, $a_{+i} = \sum_{j=1}^{N} a_{ji}$ (the sum of column i of A). These are also referred to as expansiveness and popularity, respectively. For example, a positive correlation between out- and in-degree suggests that popular individuals are expansive.

The number of ties (or value of the ties) in a directed network is given by $L = N\bar{d}$, where $\bar{d}$ denotes the mean degree (or strength) of an individual, implying the density of the network is given by $D = \bar{d}/(N-1)$. This result is not specific to in- or out-degree due to the fact that the total number of inward ties must equal the total number of outward ties, implying mean in-degree equals mean out-degree. (In a nondirected network each tie contributes to the degree of two actors, so $L = N\bar{d}/2$ and the same expression for D results.)

The variance of the degree distribution measures the extent to which tie-density (or connectedness) varies across the network (Snijders 1981). Often actors having higher degree have prominent roles in the network (Freeman 1979). A special type of homophily is the phenomenon where individuals form ties with individuals of similar degree, commonly referred to as assortative mixing. In directed networks, assortative mixing can be defined with respect to both out-degree and in-degree (Piraveenan et al. 2010). The opposite scenario to a network with the same degree for all actors is a k-star, a network configuration in which k relationships are incident to the focal actor (Fig. 3) and there are no ties between the other actors.

The length of a path between two actors through the network is defined as the number of ties traversed to get from one actor to the other. The elements of the adjacency matrix multiplied by itself k − 1 times, denoted $A^{k}$, equal the number
as those of the actors to which an actor is linked (Bonacich 1987). It assumes that connections to central actors indicate greater prominence than do (similar-strength) connections to peripheral actors. The key component of the measure is the largest eigenvalue of an adjacency or other matrix representation of the network (Bonacich 1987).

Network-level centrality indices (Freeman 1979) are network-level statistics that resemble the degree variance; their values grow larger to the extent that a single actor is involved in all relationships (as in the “star” network shown in Fig. 3).

Cliques, Components, and Communities

The assignment of actors to groups is an important and growing field within social networks. The rationale for grouping actors is that it may reveal salient social distinctions that are not directly observed. The general statistical principle adhered to is that individuals within a group are more alike than individuals in different groups. Groups are typically formed on the basis of network ties alone, the rationale being that the similarity of individuals’ positions in the network is in part revealed by the pattern of ties involving them. Thus, actors in densely connected parts of the network are likely to be grouped together. A related concept to a group is a clique, a maximal subset of actors having density 1.0 (i.e., ties exist between all pairs of individuals in a binary network). The larger the clique the stronger the evidence that the collective individuals are in the same group. Grouping algorithms based on maximizing the ratio of within-group to between-group ties are unlikely to split large cliques as doing so creates a lot of between-group ties. However, a clique need not be its own group.

Components of a network are defined by the nonexistence of any paths between actors in different components. Often a network is comprised of one large component and several small components containing few individuals. A more practical way of grouping individuals than by cliques is through k-connected components (White and Harary 2001), a maximal subset of actors mutually linked to one another by at least k node-independent paths (i.e., paths that involve disjoint sets of intermediary actors who also lie within the subgraph). Such a criterion is related to k-coreness, a measure of the extent to which subgraphs with all internal degrees at least k occur (Seidman 1983) in a network.

There are several other ways for grouping the actors in a network. Model-based methods include mixed-membership stochastic block models (Airoldi et al. 2008) and latent-class models in which the group is treated as a categorical individual-level latent variable (Handcock et al. 2007) while nonparametric methods used in network science include modularity and its variants. These methods are discussed in section “Network Communities”, where the grouping of actors is referred to as community detection.

Bipartite or Two-Mode Networks

In practice two-mode networks are rarely directly analyzed. If one of the modes instigates ties or is of primary interest, the network involving just those actors is often analyzed as a single-mode network. For example, in a physician-patient referral network, the physicians often instigate ties through patient referrals while patients are chiefly responsible for who they see first. The projection from a two-mode network to a one-mode network links nodes in one mode (e.g., physicians) if they share a node of the other mode (e.g., patients). A weighted network can be formed with the number of shared actors of the other mode (or a function thereof) as weights.

In describing networks obtained from a projection of a two-mode network, the usual practice is to use unipartite descriptive measures. However, several layers of information are lost by treating a one-mode projection as an actual network, including the number of actors in the other mode underlying a tie and the degree distribution of the actors in the other mode. Even if the two-mode network is completely random, ties in a one-mode projection that arise from a single (e.g.,) patient with ties to (e.g.,) three physicians are not separate events. More generally, a patient who visits k physicians generates a k-clique among those physicians and tells us nothing about whether physician sharing of one patient is correlated with physician sharing of another patient, which is the question of primary interest in the study of the diffusion of treatment practices. Thus, k-cliques for k > 2 may be excluded from measures of transitivity in two-mode networks.

Descriptive measures for two-mode networks may be computed that parallel those for one-mode networks (Wasserman and Faust 1994). Centrality measures based on the bipartite network representation are covered in Faust (1997). Borgatti and Everett (1997) review visualization, subgroup detection, and measurement of centrality for two-mode network data. More descriptive measures for two-mode networks have recently been proposed. For example, a two-mode measure of transitivity is defined as the ratio of the total number of six cycles (closed paths of six ties through six nodes) in the two-mode network to the total number of open five-paths through six nodes (Opsahl 2011). In the context of the patient-physician network, physician transitivity exists if physicians A and B sharing a patient and physicians B and C sharing a patient makes it more likely for physicians A and C to share a patient. It is only if the two pairs of physicians have different patients in common that the physician triad may be transitive and only if the third pair share a different patient from the first two that the event can be attributed to transitivity. The involvement of distinct patients makes the physician-physician ties distinct events and thus informative about clustering of physicians (and patients).

In general, the matrix equation $A = BB^{T}$ in which a bipartite network adjacency matrix B is multiplied by its transpose yields a weighted one-mode network (the elements contain the number of shared actors of the other mode). To avoid losing information about the number of actors leading to a tie between primary nodes, weights can be retained or monotonically transformed in the projected network. Weighted analogs of descriptive measures of binary networks can be evaluated on the weighted one-mode projection. For example, the calculation of degree is emulated by summing the weights of the edges involving an individual, yielding their strength. Degree and strength together distinguish between actors with many weak ties and those with a few strong ties. Analogous measures of centrality can also be computed for the weighted one-mode projection (Opsahl et al. 2010). However, whether ties between k physicians arise through them all treating the same patient, from each pair of physicians sharing a unique patient, or some in-between scenario cannot be determined post-transformation; thus, the projection transformation loses information.

A further strategy is to set weights for the bipartite network prior to forming the projection. For example, in coauthorship networks, the tie connecting an author to a publication might receive a weight of $1/(N_j - 1)$ where $N_j$ is the number of authors on paper j (Newman 2001). (Only papers with at least two authors are used to form such networks.) The rationale is that the greater the number of authors the lower the expected interaction between any pair (a similar logic underlies the example weight matrix described in section “Network Influence Models”). The sum of the weights across all publications common to two authors is then the basis of their relationship in the author network.

If the events defining the bipartite network occur at different times (e.g., medical claims data often contain time-stamps for each patient-physician encounter), a directed one-mode network may be formed. The value of the A-B and B-A ties in the physician-physician network could be the number of patients who visited A before B and B before A, respectively. In the resulting directed network each physician has a flow to and from each other physician. Subsequent transformation of the flows to binary values yields dyads with states null, directed, and mutual as in a directed unipartite binary network.

Because medical claims and surveys are frequent sources of information about one entity’s experience (e.g., a patient) with another entity (e.g., a health plan or physician), bipartite network analysis is an area that promises to have enormous applicability to health services research. Hence, new methods for bipartite network analysis are needed.
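Two points from this subsection can be illustrated with a short sketch: that the projection cannot distinguish one shared patient from several pairwise-shared patients, and that weights can be applied before projecting. The helper `project`, the example matrices, and the reading of the coauthorship weighting as a sum of $1/(N_j - 1)$ over the papers two authors share are assumptions made for illustration only.

```python
import numpy as np

def project(B, weights=None):
    """Weighted one-mode projection of a bipartite matrix B.

    Rows of B are the primary actors (e.g., physicians, authors); columns are
    the actors of the other mode (e.g., patients, papers). `weights` holds one
    weight per column; by default every shared column counts as 1.
    """
    B = np.asarray(B, dtype=float)
    if weights is None:
        weights = np.ones(B.shape[1])
    A = (B * weights) @ B.T     # sum of column weights over shared columns
    np.fill_diagonal(A, 0.0)    # drop the "shared with themselves" entries
    return A

# (a) One patient seen by three physicians ...
one_patient = np.array([[1], [1], [1]])
# ... versus three patients, each seen by a different pair of the physicians.
three_patients = np.array([[1, 1, 0],
                           [1, 0, 1],
                           [0, 1, 1]])
same_projection = np.array_equal(project(one_patient), project(three_patients))

# (b) Coauthorship weighting: hypothetical authors a, b, c and three papers.
B_auth = np.array([[1, 1, 0],
                   [1, 1, 0],
                   [0, 1, 1]])
n_authors = B_auth.sum(axis=0)
keep = n_authors >= 2           # single-author papers are dropped
W_auth = project(B_auth[:, keep], weights=1.0 / (n_authors[keep] - 1))

print(same_projection)
print(W_auth)
```

In case (a) the two bipartite networks give identical off-diagonal projections, so the distinction between them is not recoverable from the projected network alone.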
Part II: Statistical Models

We now consider the use of statistical models in social network analysis. Particular emphasis is placed on methods for estimating social influence or peer effects and models for analyzing the network itself, including accounting for social selection through the estimation of effects of homophily.

In social influence analyses the weight matrix, $W = [w_{ij}]$ in Fig. 4, apportions the total influence acting on an individual evenly across the individuals with whom they have a network tie. Typically

1. $w_{ij} \geq 0$: nonnegative weights.
2. $w_{ii} = 0$: no self-influence.
3. $\sum_{j} w_{ij} = 1$: weights give relative influences (because its row-sums equal 1, W is said to be row-stochastic).

[Fig. 4, left panel: directed graph on nodes A to G. Right panel: weight matrix W, shown below.]

         A     B     C     D     E     F     G
   A     0    1/2   1/2    0     0     0     0
   B     0     0     0     0     0     1     0
   C    1/2    0     0     0    1/2    0     0
   D     0     0     1     0     0     0     0
   E    1/3    0    1/3    0     0     0    1/3
   F     0     0     0     0     1     0     0
   G    1/6   1/6   1/6   1/6   1/6   1/6    0

Fig. 4 Construction of a network weight matrix W (right). A directed edge from i to j means that node (or individual) i has a relationship to node j while element ij of W quantifies the extent that individual i is influenced by individual j. Although the mathematical form of influence depicted here assumes that influence only acts in the direction of the edge, influence may in general act in the absence of a tie (e.g., people who consider me as a friend might influence me even if I do not consider them a friend)
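The weight matrix in Fig. 4 can be obtained by row-normalizing a binary adjacency matrix, as sketched below. The tie list is read off the nonzero pattern of W in the figure (an assumption, since the graph itself is not reproduced here), and the helper `row_normalize` with its error for actors without ties is a hypothetical design choice rather than something specified in the chapter.

```python
import numpy as np

nodes = list("ABCDEFG")
# Ties inferred from the nonzero entries of W in Fig. 4 (ego -> alters).
ties = {"A": "BC", "B": "F", "C": "AE", "D": "C",
        "E": "ACG", "F": "E", "G": "ABCDEF"}

idx = {name: k for k, name in enumerate(nodes)}
adj = np.zeros((len(nodes), len(nodes)))
for ego, alters in ties.items():
    for alter in alters:
        adj[idx[ego], idx[alter]] = 1.0

def row_normalize(adj):
    """Spread each actor's total influence evenly over the actors they are tied to."""
    row_sums = adj.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        # Assumption: actors without ties need separate handling by the analyst.
        raise ValueError("an actor with no ties has no defined influence weights")
    return adj / row_sums

W = row_normalize(adj)

# The three conditions listed above: nonnegative, zero diagonal, rows sum to 1.
print(np.all(W >= 0), np.all(np.diag(W) == 0), np.allclose(W.sum(axis=1), 1.0))
```

Other weighting schemes (for example, weights proportional to tie strength) would satisfy the same three conditions; equal apportionment is simply the case depicted in the figure.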
from Eq. 3, its elements can potentially be used as IVs (Fletcher 2008). However, IV methods can be problematic if the instrument is weak or if the assumption that the IV does not directly impact $y_{it}$ (the exclusion restriction) is violated, an untestable assumption. Thus, in fitting a model with contemporaneous peer effects, one faces a choice between assuming a multivariate distribution holds, relying on the nonexistence of unmeasured confounding variables, or relying on the validity of an IV. None of these assumptions can be evaluated unconditionally on the observed data.

While joint modeling and IV methods provide theoretical solutions to the estimation of contemporaneous peer effects, the notion of causality is philosophically challenged when the cause is not known to occur prior to effect. Therefore, longitudinal data provide an important basis for the identification of causal effects, in particular in negating concerns of reverse causality. If the observation times are far apart the use of lagged alter predictors may, however, substantially reduce the power of an analysis.

Dyadic Influence Analyses

If the dyads consist of mutually exclusive or isolated pairs of actors there are no interdyad ties and influence only acts within dyads. An example of such a situation occurs when individuals can have exactly one relationship and the relationship is reciprocated, as is the case with spousal dyads. The network influence models of section “Network Influence Models” reduce to dyadic influence models in which the predictors are based on individual alters. For example, the dyadic influence model analogous to Eq. 3 is obtained by replacing the subscript i with j. That is,

$$y_{it} = \alpha_0 y_{jt} + \alpha_1 y_{j(t-1)} + \alpha_x^{T} x_{jt} + \beta_1 y_{i(t-1)} + x_{it}^{T}\beta_2 + e_{it}. \tag{4}$$

The model in Eq. 4 may be estimated using generalized estimating equations (GEE), avoiding specifying a distribution for $e_{it}$. However, if any relationships are bidirectional, standard software packages will yield inconsistent estimates of the peer effects as they do not account for the statistical dependence introduced by individuals who play the dual role of ego and alter at time t (VanderWeele et al. 2012).

Frontiers in Social Influence

There has recently been a lot of interest and discussion concerning causal peer effects. Issues that have been discussed include the use of ordinary least squares (OLS) for the estimation of contemporaneous peer effects (Lyons 2011) and the identification of peer effects independent of homophily (Shalizi and Thomas 2011). The discussion has helped elevate social network methodology to the forefront of many disciplines. For example, VanderWeele et al. (2012) show that OLS still provides a valid test of the null hypothesis that the peer effect is zero when the true peer effect is zero. Therefore, OLS can be used to test for peer effects despite the fact that OLS estimates are inconsistent under the alternative hypothesis.

Christakis and Fowler (2007) use tie directionality to account for unmeasured confounding variables under the assumption that their effect on relationship status is the same for all types of relationships. The rationale is that the estimated peer effect in dyads where the relationship is not expected to be conducive to peer influence (“control relationships”) provides a baseline against which to identify the peer effect for other types of relationships. However, this test fails to offer complete protection against unmeasured homophily (Shalizi and Thomas 2011), reflecting the vulnerability of observational data to unmeasured sources of bias. Nevertheless, sensitivity analyses that evaluate the effect-size needed to overturn the results may be conducted to help support a conclusion by illustrating that the confounding effect must be implausibly large to reverse the finding (VanderWeele 2011).

Instrumental variable (IV) methods have also been used to estimate peer effects. A common source of instruments is alters’ attributes other than the one for which the peer effect is estimated (Fletcher 2008; Fletcher and Lehrer 2009). Potential IVs must predict the attribute of interest in the alter but must not be a cause of the same attribute in other individuals. Attributes that are invisible such as an individual’s genes appear to be ideal candidates. For instance, an individual with two risk alleles of an obesity gene is at more risk of increased BMI but conditional on that individual’s BMI their obesity genes should not affect the BMI of other individuals. However, if the obesity genes are revealed through another behavior (a phenomenon known as pleiotropy) that is associated with BMI then, unless such factors are conditioned on, genes will not be valid IVs.

Relational Analyses

Sociocentric network studies assemble data on the ties representing the relationship linking a set of individuals, such as all physicians within a medical practice. Models for such data posit that global network properties are the result of phenomena involving subgroups of (most commonly) four or fewer actors (Robins et al. 2005). Examples of such regularities are actor-level tendencies to produce or attract ties (homophily and heterophily), dyadic tendencies toward reciprocity, and triadic tendencies toward closure or transitivity. A relational model, in essence, specifies a set of microlevel rules governing the local structure of a network. In this section, models for cross-sectional relational data are considered first, followed by their longitudinal counterparts.

The simplest models for sociocentric data assume dyadic independence. Under the random model, all ties have equal probability of occurring and the status of one has no impact on the status of another (Erdős and Rényi 1959). More general dyadic models were developed in Holland and Leinhardt (1981) and later were extended in Wang and Wong (1987). Because independence is still assumed between dyads, the information from the data about the model parameters accumulates in the form of a product of the probability densities for the status of the dyadic observation over each dyad:

$$L = \prod_{i<j}^{N} \mathrm{pr}\left(a_{ij}, a_{ji} \mid \alpha, \gamma, x_{ij}, x_{ji}\right), \tag{5}$$

where $\alpha = (\alpha_1, \ldots, \alpha_N)^{T}$ and $\gamma = (\gamma_1, \ldots, \gamma_N)^{T}$ are vectors of actor-specific parameters representing the actors’ expansiveness (propensity to send ties) and popularity (propensity to receive ties), respectively, and $x_{ij}$ is a vector of covariates relevant to $a_{ij}$ (this may include covariates specific to either actor and combined traits of both actors). It is important to realize that covariates can be directional; thus, $x_{ij}$ need not equal $x_{ji}$. Although the model may include other parameters, $\alpha$ and $\gamma$ play an important role in network analysis due to their relationship to the degree distribution of the network and so are explicitly denoted.

When relationship status is binary, the distribution of $(a_{ij}, a_{ji})$ is a four-component multinomial distribution. The probabilities are typically represented in the form of a generalized logistic regression model (an extension of the logistic regression model to two or more categories) having the form

$$\mathrm{pr}\left(a_{ij}, a_{ji} \mid \alpha, \gamma\right) = \kappa_{ij}^{-1} \exp\left(\mu_{ij} a_{ij} + \mu_{ji} a_{ji} + \rho_{ij} a_{ij} a_{ji}\right), \tag{6}$$

where

$$\kappa_{ij} = 1 + \exp(\mu_{ij}) + \exp(\mu_{ji}) + \exp(\mu_{ij} + \mu_{ji} + \rho_{ij}),$$

and $\mu_{ij}$, $\mu_{ji}$, and $\rho_{ij}$ are functions of $(\alpha_i, \alpha_j, \gamma_i, \gamma_j)$ and $(x_{ij}, x_{ji})$. The term $\mu_{ij}$ includes factors associated with the likelihood that $a_{ij} = 1$ but not necessarily the likelihood that $a_{ji} = 1$. In a directed network the predictors can be directional and so it is likely that $\mu_{ij} \neq \mu_{ji}$. However, any covariates included in $\rho_{ij}$ must be nondirectional as they affect the likelihood of $(a_{ij}, a_{ji}) = (1, 1)$; the sign of $\rho_{ij}$ indicates whether a mutual tie is more (if $\rho_{ij} > 0$) or less (if $\rho_{ij} < 0$) likely to occur than predicted by the density terms and so is a measure of reciprocity or mutuality. Null mutuality is implied by $\rho_{ij} = 0$.

In dyadic models, the terms $\mu_{ij}$, $\mu_{ji}$, and $\rho_{ij}$ account for the local network about actors i and j through the inclusion of $(\alpha_i, \alpha_j, \gamma_i, \gamma_j)$. Furthermore, other effects can be homogeneous across actors or actor-specific. For example, the p1 model
(Holland and Leinhardt 1981) assumes $\mu_{ij} = \mu + \alpha_i + \gamma_j$ and $\rho_{ij} = \rho$, implying the covariate-free joint probability density function of the network given by

$$p_1(A) \propto \exp\left\{ \mu s_1(A) + \sum_{i}^{N} \alpha_i s_{2i}(A) + \sum_{j}^{N} \gamma_j s_{3j}(A) + \rho s_4(A) \right\},$$

where $s_1(A) = \sum_{i \neq j} a_{ij}$, $s_{2i}(A) = a_{i+}$, $s_{3j}(A) = a_{+j}$, and $s_4(A) = \sum_{i \neq j} a_{ij} a_{ji}$. Thus, the $p_1$ model depends on $2N + 2$ network statistics and associated parameters. If the $p_1$ model holds within (ego, alter)-shared values of categorical attributes, a stochastic block model is obtained by allowing block-specific modifications to the density and reciprocity of ties (Fienberg and Wasserman 1981; Holland et al. 1983; Wang and Wong 1987). An extension would allow reciprocity to also vary between blocks. Because the stochastic blockmodel extension of the $p_1$ model is saturated at the actor level due to the expansiveness and popularity fixed effects, no assumption is made about differences in the degree distributions of the actors in different blocks. Stochastic block models are the basis of mixed-membership and other recent statistical approaches for node-partitioning social network data (Goldenberg et al. 2009; Choi et al. 2010; Karrer and Newman 2011). Individuals in the same block of a stochastic block model are often referred to as being structurally equivalent.

Models of Networks as Single Observations

A criticism of dyadic independence models is that they fail to account for interdependencies between dyads. The $p^*$ or exponential random graph model (ERGM) generalizes dyadic independence models (Frank and Strauss 1986; Wasserman and Pattison 1996). An ERGM has the form

$$\Pr(A; \theta) = \kappa(\theta)^{-1} \exp\left( \sum_{k} \theta_k s_k(A) \right), \qquad (7)$$

where $A$ denotes a possible state of the network, $s_k(A)$ denotes a network statistic evaluated over $A$ (e.g., the number of ties, the number of reciprocated ties), $\kappa(\theta) = \sum_{A \in \mathcal{A}} \exp\left(\sum_k \theta_k s_k(A)\right)$, and $\mathcal{A}$ is the set of all $2^{N(N-1)}$ possible realizations of a directed network. In general, the scale factor $\kappa(\theta)$ that sums over each distinct network does not factor into a product of analogous terms. As a result, it is computationally infeasible to exactly evaluate the likelihood function of dyadic dependent ERGMs for even moderately sized $N$ (e.g., $N > 20$ is problematic (Hunter and Handcock 2006)). The key feature of the $p_1$ model that allows the probability of the network to decompose into the product of dyadic-state probabilities is that it only depends on network statistics $s_k(A)$ that sum individual ties or pairs of ties from the same dyad.

If dyads are independent unless they share an actor, the network is a Markov Random Graph (Frank and Strauss 1986). Markov Random Graphs may include terms for density, reciprocity, transitivity and other triadic structures, and k-stars (equivalent to the degree distribution); these terms contain sums of the products of no more than three ties. Such terms may be multiplied with actor attribute variables to define interaction effects. (An interaction is the effect of the product of two or more variables, e.g., if males and females have different tendencies to reciprocate ties then gender is said to interact with reciprocity.)

Networks that extend Markov Random Graphs by allowing four-cycles but no fifth- or higher-order terms are partially conditionally dependent. In such networks, a sufficient condition for dependence of $a_{ij}$ and $a_{kl}$ is that $a_{ik} = a_{jl} = 1$ or $a_{il} = a_{jk} = 1$ (Wang et al. 2009). Thus, two edges may be dependent despite not having any actors in common. Partial conditional dependence is the basis of the new parameterizations of network statistics developed by Snijders (2006) that have led to better fitting ERGMs (see below).

Under ERGMs, the conditional likelihood of each tie given the other ties in the network has the logistic form:
provide evidence for bipartite closure. However, physicians E and F have patient 3 in common; despite being eligible to exhibit bipartite closure they do not, and hence they provide evidence against bipartite closure.

Analogues of ERGMs and solutions to problematic issues exist for bipartite networks. For example, to avoid problems of high collinearity between the k-star terms, alternating k-star statistics can be used in place of them (Wang et al. 2009). Let $\mathrm{SD}(B)$ denote the number of ties from one mode to the other, $\mathrm{AS1}(B)$ and $\mathrm{AS2}(B)$ denote the alternating k-star statistics for each mode, $\mathrm{S3P}(B)$ denote the number of three-paths, and $\mathrm{S4C}(B)$ denote the number of closed four-cycles for a network $B$. The resulting bipartite ERGM for $B$ has the form:

$$\Pr(B; \theta) = \kappa(\theta)^{-1} \exp\left( \theta_0\, \mathrm{SD}(B) + \theta_1\, \mathrm{AS1}(B) + \theta_2\, \mathrm{AS2}(B) + \theta_3\, \mathrm{S3P}(B) + \theta_4\, \mathrm{S4C}(B) \right), \qquad (9)$$

where $\kappa(\theta)$ sums over the $2^{MN}$ possible bipartite graphs. The statistic $\mathrm{S4C}(B)/\mathrm{S3P}(B)$ is the proportion of times that two patients each visit the same two physicians out of all the occurrences where two patients both have one visit to one physician and one patient visits the other physician. The coefficient $\theta_4$ is the effect associated with this lowest-order form of closure in a two-mode sense (but should not be thought of as reciprocity because the network is nondirected).

Longitudinal ERGMs

The development of relational models has primarily focused on cross-sectional data. However, extensions of ERGMs to longitudinal scenarios have been developed, most often involving a Markov assumption to describe dependence across time. The first longitudinal ERGMs treated tie-formation and tie-dissolution as equitable events in the evolution of the network (Hanneke et al. 2010). A more general formulation treats tie-formation (attractiveness in the context of network science) and tie-dissolution (the complement of tie-duration, which is referred to as fitness in network science) as separable processes, thereby allowing the same network statistic to impact tie-formation and tie-dissolution differently (Krivitsky and Handcock 2010).

Like ERGMs for cross-sectional data, longitudinal ERGMs are defined by statistics that count the number of occurrences of substructures in the network. However, in addition to the current state of the network, such statistics may also depend on previous states. Under Markovian dependence, network statistics only depend on the current and the most recent state; for example, the number of ties that remain intact from the preceding observation. The recently released TERGM ("temporal exponential random graph model") package in the Statnet suite in R estimates ERGMs for discrete temporal (i.e., longitudinal) sociocentric data (Hanneke et al. 2010).

Actor-Oriented Approaches

An alternative approach for modeling network evolution is the actor-oriented model (Snijders 1996, 2001, 2005). This centers on an objective function that actors seek to maximize and which may be sensitive to multiple network properties, including reciprocity, closure, homophily, or contact with high-degree actors. The model assumes that actors control their outgoing ties and change them in order to increase their satisfaction with the network in one or more respects as quantified by the objective function. It resembles a rational choice model in which each agent attempts to maximize their own utility function. Estimated parameters indicate whether changes in a given property raise or lower actor satisfaction.

An important distinction of actor-oriented models from ERGMs is that the relevant network statistics in the actor-oriented model are specific to individuals rather than being aggregations across the network. However, like ERGMs, estimation is computationally intensive. The SIENA package in StOCNET (Huisman and Van Duijn 2004, 2005) uses a stochastic approximation algorithm but struggles with networks of appreciable size (e.g., thousands of individuals). Because they only resemble ERGMs in the limiting steady-state case, actor-oriented models may also suffer from degeneracy but the problem is less profound (Goldenberg et al. 2009).
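The idea of an actor changing outgoing ties to raise an objective function can be illustrated with a small sketch. The code below is not the SIENA algorithm; it assumes, for illustration only, a linear objective with three hypothetical effects (density, reciprocity, transitivity) and a multinomial-logit choice among the single-tie changes available to the focal actor. The names `objective`, `ministep_probabilities`, and the entries of `beta` are invented for this example.

```python
import math


def objective(adj, i, beta):
    """Objective of actor i: weighted sum of actor-level network statistics.

    adj is a 0/1 directed adjacency matrix (list of lists, zero diagonal).
    The three statistics are an illustrative choice, not a prescribed set.
    """
    n = len(adj)
    out_ties = sum(adj[i])  # outgoing ties of actor i
    reciprocated = sum(adj[i][j] * adj[j][i] for j in range(n))
    # transitive triplets i -> j, j -> k, i -> k
    transitive = sum(adj[i][j] * adj[j][k] * adj[i][k]
                     for j in range(n) for k in range(n))
    return (beta["density"] * out_ties
            + beta["reciprocity"] * reciprocated
            + beta["transitivity"] * transitive)


def ministep_probabilities(adj, i, beta):
    """Probability that actor i toggles its tie to each j (j == i: no change).

    Assumes a multinomial-logit choice over the resulting objective values.
    """
    n = len(adj)
    scores = []
    for j in range(n):
        candidate = [row[:] for row in adj]
        if j != i:
            candidate[i][j] = 1 - candidate[i][j]  # toggle tie i -> j
        scores.append(objective(candidate, i, beta))
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Actor 1 already nominates actor 0; actor 0 decides what to change.
adj = [[0, 0, 0],
       [1, 0, 0],
       [0, 0, 0]]
beta = {"density": -1.0, "reciprocity": 2.0, "transitivity": 0.0}
probs = ministep_probabilities(adj, 0, beta)
```

With a negative density weight and a positive reciprocity weight, actor 0 is most likely to return the tie from actor 1, less likely to leave the network unchanged, and least likely to create an unreciprocated tie to actor 2.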
$$\xi(z_i, z_j) = \begin{cases} \lambda_{z_i, z_j} & \text{where } z_i, z_j \in \{1, \ldots, K\} \text{ and } \lambda_{z_i, z_j} = \lambda_{z_j, z_i} \\ -\lvert z_i - z_j \rvert^{c} & \text{where } c > 0 \text{ and } z_i, z_j \text{ have } K \text{ elements} \\ z_i^{T} U z_j & \text{where } z_i \sim N(0, \Sigma_z) \text{ and } U \text{ is a } K\text{-dimensional diagonal matrix} \end{cases} \qquad (11)$$

which is included as an additional predictor in $\mu_{ij}$. In Eq. 11 the form and interpretation of $z_i$ changes from denoting a scalar categorical latent variable in the latent class model (first row) to a position in a continuously valued multidimensional space in the latent distance and latent eigenmodels
(second and third rows, respectively). The term $\xi(z_i, z_j)$ can be added to either the $\mu_{ij}$ or $\rho_{ij}$ components of the $p_2$ model to allow higher-order dependence to moderate the effect of density and reciprocity, respectively.

In the latent class specification the array of values of $\lambda_{z_i, z_j}$ forms a symmetric $K \times K$ matrix $\Lambda$. A basic specification is $\lambda_{z_i, z_j} = \lambda_0$ if $z_i = z_j$ (nodes in same partition) and $\lambda_{z_i, z_j} = 0$ if $z_i \neq z_j$ (nodes in different partitions) (Nowicki and Snijders 2001; Airoldi et al. 2008). Latent class models extend stochastic block models to allow latent clusters as well as observed clustering variables. This family of models is suited to network data exhibiting structural equivalence, that is, under the model individuals are hypothesized to belong to latent groups such that members of the same group have similar patterns of relationships.

In the latent distance specification the most common values for $c$ are 1 and 2, corresponding to absolute and Cartesian distance, respectively. The distance metric accounts for latent homophily, the effect of unobserved individual characteristics that induce ties between individuals. In this model, $z_i$ can be interpreted as the position of individual $i$ in a social space (Hoff et al. 2002). This model accounts for triadic dependence (e.g., transitivity) by requiring that latent distances between individuals obey the triangle inequality. Latent distance models are available in the LatentNet package in R (Krivitsky and Handcock 2008).

The latent eigenmodel is the most general specification and accounts for both structural equivalence and latent homophily. Furthermore, the parameter space of the latent eigenmodel of dimension $K$ generalizes that of the latent class model of the same dimension and weakly generalizes the latent distance model of dimension $K - 1$. Conversely, the latent distance model of dimension $K$ does not generalize the one-dimensional latent eigenmodel (Hoff 2008). The closeness of the latent factors $U^{1/2} z_i$ and $U^{1/2} z_j$ quantifies the structural equivalence of the positions of actors $i$ and $j$ in the network; a tie is more likely if $U^{1/2} z_i$ and $U^{1/2} z_j$ have a similar direction and magnitude, allowing for more clustering than under Eq. 10. On the other hand, latent homophily is accounted for by the diagonal elements of $U$, which can be positive or negative (allowing for heterophily as well as homophily). The model constrains the extent to which the quadratic forms $z_i^{T} U z_j$, $z_i^{T} U z_k$, and $z_j^{T} U z_k$ constructed from the latent vectors vary from one another. The greater the magnitude of $\Sigma_z = \mathrm{cov}(z_i)$ the greater the extent to which ties are expected to cluster and form cliques. The latent eigenmodel is appropriate if a network exhibits clustering due to both structural equivalence and unmeasured homophily.

In Hoff (2005) and (2008) models are specified at the tie level with reciprocity (in directed networks) represented as the within-dyad correlation between two tie-specific latent variables. Modeling reciprocity as a latent process differs from the $p_2$ model, in which reciprocity is represented as a direct effect (Paul and O'Malley 2013). Therefore, an alternative family of latent variable models for networks is obtained by augmenting the density term in the $p_2$ model with Eq. 11. An advantage of specifying a joint model at the dyad level is that the resulting (extended-$p_2$) model involves $N(N - 1)$ fewer latent variables, possibly alleviating computational issues such as nonidentifiability of parameters or multiple local optima.

The challenges of estimating models involving latent variables resemble those of factor analysis or other dimension-reduction methods. First, it may not be possible to specify an appropriate value of $K$ from existing knowledge of the network, and estimating $K$ from the data is not straightforward. Second, computational challenges in estimating the latent variables can make the method difficult to apply to large networks. However, such issues are more easily overcome than degeneracy in ERGMs. Degeneracy is avoided in these models as the model for a dyad determines the distribution of the network. In other words, the factorization of the likelihood into a product of like terms ensures that networks sampled under the model are almost surely in the neighborhood of the observed network, increasingly so as $N$ increases (i.e., asymptotically).

Another contrast with ERGMs is that the model describes a population as opposed to the single observed network. Thus, in latent variable models the data-generating process is modeled whereas ERGMs are specific to the observed network and so have more in common with finite population inference.
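The three latent terms of Eq. 11 can be written out as a short sketch, which may help in comparing what each specification takes as input. The function names are invented for this illustration. The distance term is assumed here to be the negative Euclidean norm raised to the power $c$, so that closer positions raise the tie propensity; the source equation may intend a different norm.

```python
import math


def xi_latent_class(zi, zj, lam):
    """Latent class term: zi and zj are class labels in 0..K-1 and
    lam is a symmetric K x K matrix (list of lists)."""
    return lam[zi][zj]


def xi_latent_distance(zi, zj, c=1):
    """Latent distance term, assumed to be -(Euclidean distance) ** c.

    zi and zj are positions with K coordinates each.
    """
    return -(math.dist(zi, zj) ** c)


def xi_latent_eigen(zi, zj, u_diag):
    """Latent eigenmodel term zi' U zj with U diagonal.

    u_diag holds the K diagonal elements of U, which may be negative.
    """
    return sum(u * a * b for u, a, b in zip(u_diag, zi, zj))


# Basic latent class specification: lambda_0 within a class, 0 between.
lam0 = 1.5
lam = [[lam0, 0.0],
       [0.0, lam0]]

same_class = xi_latent_class(0, 0, lam)
different_class = xi_latent_class(0, 1, lam)
distance_term = xi_latent_distance([0.0, 0.0], [3.0, 4.0], c=1)
eigen_term = xi_latent_eigen([1.0, 2.0], [3.0, -1.0], [2.0, -1.0])
```

In the eigenmodel call the second diagonal element of `u_diag` is negative, so opposite signs in that coordinate contribute positively to the term, which is one way of seeing how the model admits heterophily.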
Another advantage of conditional independence models over ERGMs is that the same types of models can be applied to valued relational data. Analogous to generalized linear models, the link function and any parametric distributional assumptions that define a conditional independence network model can be tailored to the type of relationship variable (scale, count, ratio, categorical, multivariate). However, a recent adaptation of ERGMs has been proposed for modeling count-valued sociocentric data (Krivitsky 2012).

Offsetting the above advantageous features of conditional independence models is that terms such as $\xi(z_i, z_j)$ are limited from the hypothesis testing and interpretational standpoint in that they do not distinguish particular forms of social equivalence or latent homophily. For example, the effect of transitivity is not distinguished from that of cyclicity or higher-order clustering, such as tetradic closure. Therefore, the choice of model in practice might depend on the importance of testing specific hypotheses about higher-order effects relative to obtaining a model whose generative basis allows it to make predictions beyond the data set on which the model was estimated.

Longitudinal Conditional Independence Models

Longitudinal counterparts of conditional independence models are obtained by introducing terms that account for longitudinal dependence (e.g., past states of the dyad). A simple Markov transition model was developed in O'Malley and Christakis (2011) with tie-formation and tie-dissolution treated as unrelated processes. Conditional on the past state of the dyad and the sender and receiver random effects, the value of each tie is assumed to be statistically independent of that of any other tie. A more general formulation extends the $p_2$ model, allowing dependence between ties within a dyad (reciprocity), heterogeneous effects in the formation and dissolution of ties, and the inclusion of higher-order effects (e.g., third-order interactions to account for transitivity) as lagged predictors (Paul and O'Malley 2013).

The approach in Paul and O'Malley (2013) is notable for attempting to capture the best of both worlds: it allows localized (actor or dyadic) versions of the higher-order predictors available in ERGMs to be included as predictors, but avoids degeneracy by using their lagged values as opposed to their current values as predictors. Therefore, conditional on the observed and latent predictors, dyads are cross-sectionally independent but longitudinally dependent on prior states of other dyads (in addition to their own past states) in the network. An extension that builds on Paul and O'Malley (2013) is to incorporate the latent class, distance, or eigenfactor terms in Eq. 11 in the model. Such a model was entertained in Westveld and Hoff (2011) but has not yet been developed.

Part III: Network Science

We now switch attention to methods that have been derived and used in the field of network science. In general, network science approaches avoid assumptions about distributions in models. For example, to test whether a network exhibits a certain property, the commonly employed approach is to use a permutation test to develop a null distribution for a statistic that embodies the property in question and then evaluate how extreme the observed value of the statistic is with respect to the null distribution. This technique is the cornerstone of the procedure used to evaluate the degree of separation to which social clustering can be detected in Szabo and Barabasi (2007).

Network science focuses not only on social networks but also covers information networks, transportation networks, biological networks, and many others. Most of the networks studied within network science are nondirected as ties are typically thought of as connections as opposed to measures for which the distinction between instigator and receiver is relevant. Thus, the networks in this section are assumed to be nondirected unless stated otherwise.
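The permutation-test approach described above can be sketched in a few lines. The example is a minimal illustration under stated assumptions rather than the procedure of any cited study: the property tested is whether ties fall within groups more often than expected, the statistic is the count of same-group ties, and the null distribution is built by shuffling group labels across nodes while holding the ties fixed. The function names and the toy network are invented for this example.

```python
import itertools
import random


def same_group_ties(edges, labels):
    """Number of ties whose two endpoints carry the same group label."""
    return sum(1 for u, v in edges if labels[u] == labels[v])


def permutation_test(edges, labels, n_perm=2000, seed=0):
    """One-sided permutation p-value for an excess of same-group ties.

    Labels are shuffled across nodes; the network itself is held fixed.
    """
    rng = random.Random(seed)
    observed = same_group_ties(edges, labels)
    shuffled = list(labels)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if same_group_ties(edges, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    p_value = (at_least_as_extreme + 1) / (n_perm + 1)
    return observed, p_value


# Toy nondirected network: two fully connected groups of four nodes
# joined by a single bridging tie.
edges = (list(itertools.combinations(range(0, 4), 2))
         + list(itertools.combinations(range(4, 8), 2))
         + [(3, 4)])
labels = [0, 0, 0, 0, 1, 1, 1, 1]

observed, p_value = permutation_test(edges, labels)
```

Because only the labels are permuted, the null distribution preserves the degree of every node and the overall density; a different null (for example, rewiring ties) would test a different hypothesis.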
and the latter to $N_k(t + 1) = N_k(t) - 1$. Transitions larger than one, e.g., from $k$ to $k + 2$ or from $k$ to $k - 2$, are very unlikely and can be ignored. The value of $N_m(t)$ increases by one per time step as each incoming node has degree $m$, which also means there are no nodes with degree less than $m$, and hence the equations used to model the evolution of quantities like $N_k(t)$ are not valid for $k < m$. The resulting degree distribution has the form

$$P(k) = \frac{2m(m + 1)}{k(k + 1)(k + 2)},$$

which for large $k$ behaves as $P(k) \sim k^{-3}$.

The preferential attachment model of Barabasi and Albert has attracted a tremendous amount of scientific interest in the past few years, and consequently numerous modifications of the model have been introduced. For example, extensions of the model allow:

• Ties to appear and disappear between any pairs of vertices (the original formulation only considers the addition of ties between the incoming vertex and the set of vertices already in existence).
• Vertices to be deleted either uniformly at random or based on their connectivity.
• The attachment probability $\Pi(k_i)$ to be super-linear or sub-linear in degree, or to consist of several terms.
• Nodal attributes, such as the attractiveness (the propensity with which new ties form with the node) or fitness (the propensity with which established ties remain intact) of a node, so that the attachment probability can incorporate these attributes in addition to degree.
• Edges to assume weights instead of {0, 1} binary values to codify connection strength between any pair of elements.

In the context of physician networks, a preferential attachment model could be used to examine the process of new physicians seeking colleagues to ask for advice upon joining a medical organization, such as a hospital. Under the preferential attachment hypothesis, new physicians would be more likely to form ties with and thus seek advice from popular established physicians or physicians in the same cohort (e.g., medical school or residency program).

Social Network Models

The class of models known as network evolution models can be defined via three properties: (i) the models incorporate a set of stochastic attachment rules which determine the evolution of the network structure explicitly on a time-step-by-time-step basis; (ii) the network evolution starts from an empty network consisting of nodes only, or from a small seed network possessing arbitrary structure; and (iii) the models incorporate a stopping criterion, which for growing network models is typically in the form of the network size reaching a predetermined value, and for dynamical (nongrowing) network models the convergence of network statistics to their asymptotic values. Many network evolution models do not reference intrinsic properties or attributes of nodes, and in this sense they are similar to the various implementations of preferential attachment models that do not postulate node-specific fitness or attractiveness.

Most network evolution models that are intended to model social networks employ some variants of focal closure and cyclic closure (see, e.g., Kossinets and Watts (2006)). Focal closure refers to the formation of ties between individuals based on shared foci, which in a medical context could correspond to a group of doctors who practice in a particular hospital (the focus). The concept of shared foci in network science is analogous to homophily in social network analysis. More broadly, ties could represent any interest or activity that connects otherwise unlinked individuals. In contrast, cyclic closure refers to the idea of forming new ties by navigating and leveraging one's existing social ties, a process that results in a cycle in the underlying network. Because the network is nondirected, the term cycle is used interchangeably with closure. This differs from when the network is directional and a cycle is a specific form of closure, with transitivity being another form. Triadic closure, which is the
special case of cyclic closure involving just three individuals, refers to the process of getting to know friends of friends, leading to the formation of a closed triad in the nondirected network. Most social networks are expected to (i) have skewed and fat-tailed degree distributions, (ii) be assortatively mixed (high-degree individuals are connected to high-degree individuals), (iii) be highly clustered, (iv) possess the small-world property (average shortest path lengths are short, or more precisely, scale as $\log(N)$), and (v) exhibit community structure.

The models by Davidsen et al. (2002) and Marsili et al. (2004) exemplify dynamic (nongrowing) network evolution models for social networks. Both have a mechanism that starts by selecting a node $i$ in the network uniformly at random. In the model of Davidsen et al., if node $i$ has fewer than two connections, it is connected to a randomly chosen node in the network; otherwise two randomly chosen neighbors of node $i$ are connected together. In the model of Marsili et al., node $i$ (regardless of its degree) is connected with probability $\eta$ to a randomly chosen node in the network; then a second-order neighbor of node $i$, i.e., a friend's friend, is connected with probability $\xi$ to node $i$. The first mechanism in each model, the random connection, emulates focal closure, because there are no nodal attributes signifying shared interests. The point is that the formation of these connections is not driven by the structure of existing connections but, from the point of view of network structure, is purely random. The second mechanism, the notion of triadic closure, is implemented in slightly different ways across the models. If these mechanisms were applied indefinitely, the result would be a fully connected network. To avoid this outcome, the models also delete ties at a constant rate, which makes it possible for network statistics of interest to reach stationary distributions. In the model of Davidsen et al., tie deletion is accomplished by choosing a node in the network uniformly at random, and then removing all of its ties with some probability; Marsili et al. accomplish the same outcome by selecting a tie uniformly at random, and then deleting it with probability $\lambda$. Growing network evolution models, such as those by Vázquez (2003) and Toivonen et al. (2006), do not usually incorporate link deletion, but instead grow the network to a prespecified size, which obviates the need for link deletion.

Marsili et al. use extensive numerical simulations, as well as a master equation approach applied to a mean-field approximation of the model, to explore the impact of varying the probabilities $\eta$ (global linking), $\xi$ (neighborhood linking), and $\lambda$ (link deletion) on average degree and average clustering coefficient. Consider a situation where the value of $\xi$ (neighborhood linking) is increased while keeping the value of $\lambda$ (link deletion) fixed. At first, for small values of $\xi$, components with more than two nodes are rare, and the network can be said to be in the sparse phase. Upon increasing the value of $\xi$ up to a specific point, a large connected component emerges, and the value of the average degree suddenly jumps up. This point equals $\xi_2/\lambda$ and is known as the critical point; it marks the beginning of the dense phase in the phase diagram of the system. As $\xi$ is increased further, the network becomes more densely connected. Reversing the process by slowly decreasing the value of $\xi$ identifies a range of values $\xi_1 \leq \xi \leq \xi_2$ where the largest connected component remains densely connected and the average degree remains high. Only when the value of $\xi$ is decreased below a point denoted by $\xi_1$ does the network "collapse" and reenter the sparse phase. This phenomenon, which demonstrates some of the connections between network science and statistical physics, is typical of first-order or discontinuous phase transitions in statistical physics, and it demonstrates how hysteresis, the effect of the system remembering its past state, can arise in networked systems. Although Markov dependence is a special case of hysteresis, its use is generally restricted to probabilistic models whereas hysteresis is typically aligned with nonlinear models of physical phenomena having a continuous state-space. From the social network point of view this means that the network can remain in a connected phase even if the current rate of establishing new connections would not be sufficient for getting the network to that phase in the first place. In more practical terms, this
finding implies that it is possible to maintain a highly connected network with a relatively low "effort" (the $\xi$ parameter in the model) once the network has been established, but that same low level of effort would not be sufficient for establishing the dense phase of network evolution in the first place. (The analogy in social network analysis is that the threshold for forming, for example, a friendship is greater than that needed for it to remain intact.)

Fig. 6 Network structures produced by the model of Kumpula et al. by varying the reinforcement parameter as follows: (a) δ = 0, (b) δ = 0.1, (c) δ = 0.5, and (d) δ = 1. Figure adapted from Kumpula et al. (2007)

The model by Kumpula et al. (2007), which is another dynamical (nongrowing) network evolution model for social networks, implements cyclic closure and focal closure (see Fig. 6) in a manner similar to the models of Davidsen et al. and Marsili et al., but introduces a minor modification. Unlike the previous models, which produce binary networks with $A_{ij} \in \{0, 1\}$, this model produces weighted networks with $A_{ij} \geq 0$. The main modification deals with the triadic closure
step, which here is implemented as a weighted two-step random walk. Starting from a randomly chosen node $i$, this node chooses one of its neighbors $j$ with probability $w_{ij}/s_i$, where $s_i = \sum_j w_{ij}$ is the strength of node $i$, i.e., the sum of the edge weights connecting it to its neighbors. If node $j$ has neighbors other than $i$, such a node $k$ will be chosen with probability $w_{jk}/(s_j - w_{ij})$, where there is a requirement that $k \neq i$. The weights $w_{ij}$ and $w_{jk}$ on the edges just traversed will be increased by a value $\delta$. In addition, if there is a link connecting node $i$ and node $k$, the weight $w_{ik}$ on that link is similarly increased by $\delta$; otherwise a new link is established between node $i$ and $k$ with $w_{ik} = 1$. When $\delta = 0$, there is no clear community structure present, but as the value of $\delta$ is increased, very clear nucleation of communities takes place. This phenomenon occurs when $\delta > 0$ because a type of positive feedback or memory gets imprinted on the network, which reinforces existing connections and makes future traversal of those connections more likely. This is not unlike the models of cumulative advantage or preferential attachment discussed above, but now applies to individual links as opposed to nodes. If one inspects the community structure produced by the model, most of the strong links appear to be located within communities, whereas links between communities are typically weak. This type of structural organization is compliant with the so-called weak ties hypothesis, formulated in Granovetter (1973), which states, in essence, that the stronger the tie connecting two individuals, the higher the fraction of friends they have in common. Onnela et al. showed that a large-scale social network constructed from the cell phone communication records of millions of people was in remarkable agreement with the hypothesis: only the top 5% of ties in terms of their weight deviated noticeably from the prediction. The networks produced by the model of Kumpula et al. are clearly reminiscent of observed real-world social networks, and the inclusion of the tuning parameter $\delta$ makes it straightforward to create networks with sparser or denser communities. The downside is that the addition of weights to the model appears to make it analytically intractable.

Nodal attribute models, in stark contrast to network evolution models, specify nodal attributes for each node, which could be scalar or vector valued. The probability of linkage between any two nodes is typically an increasing function of the similarity of the nodal attributes of the two nodes in consideration. This is compatible with the notion of homophily, the tendency for like to attract like. Nodal attribute models can also be interpreted as spatial models, where the idea is that each node has a specific location in a social space. The models by Boguñá et al. (2004) and Wong et al. (2006) serve as interesting examples. Nodal attribute models do not specify attachment rules at the level of the network, and in some sense can be seen as latent variable models for social network formation. These types of models have been studied less in the network science literature than network evolution models.

Clearly, nodal attribute models have a strong resemblance to models developed and studied in the social network literature that treat dyads as independent conditional on observed attributes of the individuals, other covariates, and various latent variables (individual-specific random effects in the case of the $p_2$ model, categorical latent variables in the case of latent class models, continuous latent variables under the latent-space and latent eigenmodels in section "Latent Independence Approaches"). Unlike in network science, work on such models in the social network literature has been more prominent than work on network evolution. A difference in the approach of some nodal attribute models and social network models is that the former may use specific rules for determining whether a tie is expected, such as a threshold function (in a sense emulating formal decision making), whereas the latter rewards values of parameters that make the model most consistent with the observed network(s).

Network Communities

Many network characteristics are either microscopic or macroscopic in nature; the value of a microscopic characteristic depends on local network structure only, whereas the value of a
Fig. 8 Schematic of a multislice network. Each slice represents a network encoded by the adjacency tensor A_{ijs}, where subscripts i and j are used to index the nodes and subscript s is used to index the slices. Each node is coupled to itself in the other slices, and the structure of this coupling, encoded by the C_{jrs} tensor, depends on whether the slices correspond to snapshots taken at different times (time-dependent network), to communities detected at different resolution levels (multiscale network), or to a network consisting of multiple types of interactions (multiplex network). For time-dependent and multiscale networks, the slice-to-slice coupling extends for each node a tie to itself across neighboring slices only, as exemplified for the node in the upper right corner of the slices; for multiplex networks, the slice-to-slice coupling extends a tie from each node to itself in all the slices, as exemplified for the node in the lower left corner. Whatever the form of this coupling, it is applied the same way to each node, although for visual clarity the slice-to-slice couplings are shown just for two nodes. Figure adapted from Mucha et al. (2010)
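As a rough illustration of the two coupling patterns described in the caption of Fig. 8, the following Python sketch builds a nested-list stand-in for the coupling tensor C_{jrs}. The function name `slice_coupling`, the labels "neighboring" and "all", and the uniform coupling weight `omega` are assumptions made for this example; they are not taken from Mucha et al. (2010).

```python
def slice_coupling(n_nodes, n_slices, kind, omega=1.0):
    """Return C[j][r][s], the coupling of node j between slices r and s.

    kind="neighboring": couple each node to itself in adjacent slices only
                        (time-dependent and multiscale networks).
    kind="all":         couple each node to itself in every other slice
                        (multiplex networks).
    omega is a hypothetical uniform coupling weight.
    """
    if kind not in ("neighboring", "all"):
        raise ValueError(f"unknown coupling kind: {kind!r}")

    def weight(r, s):
        if r == s:
            return 0.0  # no coupling of a slice to itself
        if kind == "neighboring" and abs(r - s) != 1:
            return 0.0  # non-adjacent slices stay uncoupled
        return omega

    pattern = [[weight(r, s) for s in range(n_slices)] for r in range(n_slices)]
    # The same pattern is applied to every node, as stated in the caption.
    return [[row[:] for row in pattern] for _ in range(n_nodes)]


# Four slices, two nodes: compare the two coupling patterns for node 0, slice 0.
ordered = slice_coupling(n_nodes=2, n_slices=4, kind="neighboring")
multiplex = slice_coupling(n_nodes=2, n_slices=4, kind="all")
print(ordered[0][0])    # [0.0, 1.0, 0.0, 0.0]
print(multiplex[0][0])  # [0.0, 1.0, 1.0, 1.0]
```

Storing one pattern per node is redundant when the coupling is identical for all nodes, but it keeps the three indices j, r, s of the caption visible in the data structure.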
where σ_l indicates the spin of particle l and {σ} denotes the configuration of all N spins. Finding the minimum energy state (the ground state) of the system corresponds to finding {σ} such that H({σ}) is minimized. The states of the particles (spins) correspond to community assignments of nodes in the network problem, and minimizing H({σ}) is mathematically identical to maximizing modularity Q_M(P). In the physical system, depending on the interaction energies, the spins seek to align with other spins (interact ferromagnetically) or they seek to have different orientations (interact antiferromagnetically). In the community detection problem, two nodes seek to be in the same community if they are connected by an edge that is stronger than expected; otherwise they seek to be in different communities. This correspondence between the two problems has enabled computational techniques developed for the study of spin systems and other physical systems to be applied to modularity optimization and, more broadly, to the optimization of other quality functions. Simulated annealing, greedy algorithms, and spectral methods serve as examples of these methods. More details and references are available in community detection review articles (Porter et al. 2009; Fortunato 2010).
Although there are several extensions of modularity maximization, only one such generalization is described here. Mucha et al. developed a generalized framework of network quality functions that allows the study of community structure of arbitrary multislice networks (see Fig. 8), which are combinations of individual networks coupled through links that connect each node in one slice to the same node in other slices (Mucha et al. 2010). This framework allows studies of community structure in time-dependent, multiscale, and multiplex networks. Much of the work
in the area of community detection is motivated by the observation that the behavior of dynamical processes on networks is driven or constrained by their community structure. The approach of Mucha et al. is based on a reversal of this logic: it introduces a dynamical process on the network, and the behavior of that process is used to identify the (structural) communities. The outcome is a quality function
was interested in examining the community structure of the students at three different scales using, say, γ_s ∈ {0.5, 1, 2}, this would result in a threefold replication of the 4 × 2 slice array with each of the three layers having a distinct value for γ_s. Taken together, this would lead to a three-dimensional 4 × 2 × 3 array of slices.
Clique Percolation
Cliques are (usually small) fully connected sub-
The k-clique percolation algorithm is an example of a local community-finding method. One obtains a network's global community structure by considering the ensemble of communities obtained by looping over all of its k-cliques. Some nodes might not belong to any community (because they are never part of any k-clique), and others can belong to several communities (if they are located at the interface between two or more communities). The nested nature of communities is recovered by considering different values of k, although k-clique percolation can be too rigid because focusing on cliques typically causes one to overlook other dense modules that are not quite as tightly connected.
The advantage of k-clique percolation is that it provides a successful way to consider community overlap. Allowing the detection of network communities that overlap is especially appealing in the social sciences, as people may belong simultaneously to several communities (colleagues, family, friends, etc.). However, the case can be made that it is the underlying interactions that are different, and one should not combine interactions that are of fundamentally different types. In statistics, this is analogous to using composite variables or scales that combine multiple items in, e.g., health surveys or questionnaires. If the nature of the interactions is known, the system might be more appropriately described as a multiplex network, where one tie type encodes professional interactions, another tie type corresponds to personal friendships, and a third tie type captures family memberships. The multislice framework discussed above is able to accommodate memberships in multiple communities as long as distinct interaction types are encoded with distinct (multiplex) ties.
Comparison to Social Network Approaches to "Community Detection"
The latent class models in section "Latent Independence Approaches" partition the actors in a network into disjoint groups that can be thought of as communities. The clustering process can be thought of as a search for structural equivalence in that individuals are likely to be included in the same community if the network around them is similar to that of their neighbors. The criterion for judging the efficacy of the partition of nodes into communities is embedded in the statistical model implied for the network and as such is a balance between all of the terms in the model. This contrasts with a non-model-based objective function such as modularity, which focuses on maximizing, in some sense, the ratio of the density of ties within communities to the density of ties between communities. To illustrate the difference, consider a k-star. The greater the value of k, the greater the discrepancy in the degree of the actors. Therefore, if k-stars occur frequently, the members of the same k-star are likely to be included in the same group by the latent class model but, due to the difference in degree, are unlikely to be grouped under modularity maximization. However, an advantage of the network science approach is that results are likely to be more robust to model misspecifications than under the social network approach.
In the future it is possible to imagine a bridging of the two approaches to community detection. For example, a model for the network, or the component of the model involving the key determinants of network ties, could be incorporated in the modularity function in (7.1). Depending on the specification, the result might be a weighted version of modularity in which a higher penalty is incurred if individuals with similar traits (or in structurally equivalent positions with respect to k-stars, triadic closure, or other local network configurations) are included in different communities than if individuals with different traits are in different communities. However, to the best of the authors' knowledge, such a procedure is not available.
Part IV: Discussion and Glossary
In this chapter, the dual fields of social networks and network science have been described, with particular focus on sociocentric data. Both fields are growing rapidly in both methodological results and the breadth of applications to which they are applied.
In health applications, social network methods for evaluating whether individuals' attributes
spread from person-to-person across a population (social influence) and for modeling relationship or tie status (social selection) have been described. Models of relationship status have not been applied as frequently in health applications, where focus often centers on the patient. However, Keating et al. (2007) is a notable exception. Due to the ever-growing availability of data, the interest in peer effects, and the need to design support mechanisms, the role of social network analysis in health care and medicine is likely to undergo continued growth in the future.
A novel feature of this chapter is the attention given to network science. Although network science is descriptively inclined and thus is removed from mainstream translational medical research seeking to identify causes of medical outcomes, the increasing availability of complex systems data provides an opportunity for network science to play a more prominent role in medical research in the future. For example, Barabási and others have created a Human Disease Network by connecting all hereditary diseases that share a disease-causing gene (Goh et al. 2007). In other work, they created a Phenotypic Disease Network (PDN) as a map summarizing phenotypic connections between diseases (Hidalgo et al. 2009). These networks provided important insights into the potential common origins of different diseases, whether diseases progress through cellular functions (phenotypes) associated with a single diseased (mutated) gene or with other phenotypes, and whether patients affected by diseases that are connected to many other diseases tend to die sooner than those affected by less connected diseases. Such work has the potential to provide insights into many previously untested hypotheses about disease mechanisms.
For example, they may ultimately be helpful in designing "personalized treatments" based on the network position held by an individual's combined genetic, proteomic, and phenotypic information. In addition, they may suggest conditions for which treatments found to be effective on another condition might also be tried.
There are several important topics that have not been discussed, notably including network sampling. In gathering network data, adaptive methods such as link-tracing designs are often used to identify individuals who are more likely to know each other, and thus to have formed a relationship with other sampled individuals, than would be the case in a random-probability design. Link-tracing and other related designs are often used to identify hard-to-reach populations (Thompson and Seber 1996; Thompson and Frank 2000; Thompson 2006). However, the sampling probabilities corresponding to link-tracing designs may be difficult to evaluate (generally requiring the use of simulation), and it may not be obvious how they should be incorporated in the analysis. The development of statistical methods that account for the sample design in the analysis of social network data has lagged behind the designs themselves. However, progress has recently been made on statistical inference for sampled relational network data (Handcock et al. 2010).
In the future it is likely that more bridges will form between the social network and the network science fields, with models or methods developed in one field used to solve problems in the other. Furthermore, as these two fields become more entwined, it is likely that they will also become more prominent in the solution to important problems in medicine and health care.
Acknowledgments The time and effort of Dr. O'Malley and Dr. Onnela on researching and developing this chapter were supported by NIH/NIA grant P01 AG031093 and Robert Wood Johnson Award #58729. The authors thank Mischa Haider, Brian Neelon, and Bruce E Landon for reviewing an early draft of the manuscript and providing several useful comments and suggestions.
Glossary of Terms
To help readers familiar with social networks understand the network science component of the chapter and conversely for readers familiar with network science to understand the social network component, the following glossary contains a comprehensive list of terms and definitions.
indicate more central positions. There are numerous measures of centrality. Four common ones are degree, closeness, betweenness, and eigenvector centrality. Degree and eigenvector centrality are extremes in that degree centrality is determined solely from an actor's degree (it is internally focused) while eigenvector centrality is based on the centrality of the actors connected to the focal actor (it is externally focused).
28. Structural balance: A theory which suggests actors seek balance in their relationships; for example, if A likes B and B likes C then A will endeavor to like C as well to keep the system balanced. Thus, the existence of transitivity is implied by structural balance.
29. Structural equivalence: The network configuration (arrangement of ties) around one actor is similar to that of another actor. Even though actors may not be connected, they can still be in structurally similar situations.
30. Structural power: An actor in a dominant position in the network. Such an actor may be one in a strategic position, such as the only bridge between otherwise distinct components.
31. Network component: A subset of actors having no ties external to themselves.
32. Graph theory: The mathematical basis under which theoretical results for networks are derived and empirical computations are performed.
33. Digraph: A directed graph, i.e., a graph in which each edge has a direction, so a tie from i to j need not be accompanied by a tie from j to i. Unlike social networks, digraphs can contain self-ties. Each edge of a graph joins a pair of nodes.
34. Hypergraph: A generalization of a graph in which a single edge can join three or more nodes.
35. Maximal subset: A set of actors for whom all ties are intact in a binary network (i.e., has density 1.0). If the set contains k actors, the maximal subset is referred to as a k-clique.
36. Scalar, vector, matrix: Terms from linear and abstract algebra. A scalar is a 1 × 1 matrix, a vector is a k × 1 matrix, and a matrix is k × p, where k, p > 1.
37. Adjacency matrix: A matrix whose off-diagonal elements contain the value of the relationship from one actor to another. For example, element ij contains the relationship from actor i to actor j. The diagonal elements are zero by definition.
38. Matrix transpose: The operation whereby element ij is exchanged with element ji for all i, j.
39. Row stochastic matrix: A matrix with nonnegative elements whose rows each sum to 1. Thus, each row represents a probability distribution of a discrete-valued random variable.
40. Random variable: A variable whose value is not known with certainty. It can relate to an event or time period that is yet to occur, or it can be a quantity whose value is fixed (i.e., has occurred) but is unknown.
41. Parametric: A term used in statistics to describe a model with a specific functional form (e.g., linear, quadratic, logarithmic, exponential) indexed by unknown parameters or an estimation procedure that relies on specification of the complete distribution of the data.
42. Nonparametric: A model or estimation procedure that makes no assumption about the specific form of the relationship between key variables (e.g., whether the predictors have linear or additive effects on the outcome) and does not rely upon complete specification of the distribution of the data for estimation.
43. Outcome, Dependent variable: The variable considered causally dependent on other variables of interest. This will typically be a variable whose value is believed to be caused by other variables.
44. Independent, Predictor, Explanatory variable, Covariate: A variable believed to be a cause of the outcome.
45. Contextual variable: A variable evaluated on the neighbors of, or other members of a set containing, the focal actor. For example, the proportion of females in a neighboring county, the proportion of friends with college degrees.
46. Interaction effect: The extent to which the effect of one variable on the outcome varies across the levels of another variable.
47. Endogenous variable: A variable (or an effect) that is internal to a system.
Predictors in a regression model that are correlated with the unobserved error are endogenous; they are determined by an internal as opposed to an external process. By definition outcome variables are endogenous.
48. Exogenous variable: A variable (or an effect) that is external to the system in that its value is not determined by other variables in the system. Predictors that are independent of the error term in a regression model are exogenous.
49. Instrumental variable (IV): A variable with a non-null effect on the endogenous predictor whose causal effect is of interest (the "treatment") that has no effect on the outcome other than that through its effect on treatment. Often-used sufficient conditions for the latter are that the IV is (i) marginally independent of any unmeasured confounders and (ii) conditionally independent of the outcome given the treatment and any unmeasured confounders. In an IV analysis a set of observed predictors may be conditioned on as long as they are not effects of the treatment and the IV assumptions hold conditional on them. While subject to controversy, IV methods are among the few methods of estimating the true (causal) effect of an endogenous predictor on an outcome.
50. Linear regression model: A model in which the expected value of the outcome (or dependent variable) conditional on one or more predictors (or explanatory variables) is a linear combination of the predictors (an additive sum of the predictors multiplied by their regression coefficients); the observed outcome equals this expected value plus an unobserved random error.
51. Longitudinal model: A model that describes variation in the outcome variable over time as a function of the predictors, which may include prior (i.e., lagged) values of the outcome. Observations are typically only available at specific, but not necessarily equally spaced, times. Longitudinal models make the direction of causality explicit. Therefore, they can distinguish between the association between the predictors and the outcome and the effect of a change in the predictor on the change in the outcome.
52. Cross-sectional model: A model of the relationship between the values of the predictors and outcomes at a given time. Because one cannot discern the direction of causality, cross-sectional models are more difficult to defend as causal.
53. Stochastic block model: A conditional dyadic independence model in which the density and reciprocity effects differ between blocks defined by attributes of the actors comprising the network. For example, blocks for gender accommodate different levels of connectedness and reciprocity for men and women.
54. Logistic regression: A member of the exponential family of models that is specific to binary outcomes. It utilizes a link function that maps expected values of the outcome onto an unrestricted scale to ensure that all predictions from the model are well-defined.
55. Multinomial distribution: A generalization of the binomial distribution to three or more categories. The sum of the probabilities of each category equals 1.
56. Exponential random graph model: A model in which the state of the entire network is the dependent variable. Provides a flexible approach to accounting for various forms of dependence in the network. Not amenable to causal modeling.
57. Degeneracy: An estimation problem encountered with exponential random graph models in which the fitted model might reproduce observed features of the network on average but each individual draw bears no resemblance to the observed network. Often degenerate draws are empty or complete graphs.
58. Latent distance model: A model in which the statuses of dyads are independent conditional on the positions of the actors, and thus the distance between them, in a latent social space.
59. Latent eigenmodel: A model in which the statuses of dyads are independent conditional on the product of the (weighted) latent positions of the actors in the dyad.
60. Latent variable: An unobserved random variable. Random effects and pure error terms are latent variables.
61. Latent class: An unobserved categorical random variable. Actors with the same value of the variable are considered to be in the same latent class.
62. Factor analysis: A statistical technique used to decompose the correlation (or covariance) matrix of a set of random variables into groups of related items.
63. Generalized estimating equation (GEE): A statistical method that corrects estimation errors for dependent observations without necessarily modeling the form of the dependence or specifying the full distribution of the data.
64. Random effect: A parameter for the effect of a unit (or cluster) that is drawn from a specified probability distribution. Treating the unit effects as random draws from a common probability distribution allows information to be pooled across units for the estimation of each unit-specific parameter.
65. Fixed effect: A parameter in a model that reflects the effect of an actor belonging to a given unit (or cluster). By virtue of modeling the unit effects as unrelated parameters, no information is shared between units and so estimates are based only on information within the unit.
66. Ordinary least squares: A commonly used method for estimating the parameters of a regression model. The objective is to minimize the sum of squared distances between the fitted model and the observed values of the dependent variable.
67. Maximum likelihood: A method of estimating the parameters of a statistical model that typically embodies parametric assumptions. The procedure is to seek the values of the parameters that maximize the likelihood function of the data.
68. Likelihood function: An expression that quantifies the total information in the data as a function of model parameters.
69. Markov chain Monte Carlo: A numerical procedure used to fit Bayesian statistical models.
70. Steady state: The steady-state distribution of a Markov chain describes the long-run proportion of time the random variable being modeled is in each state. Often Markov chains iterate through a transient phase in which the current state of the chain depends less and less on the initial state of the chain. The steady state phase occurs when successive samples have the same distribution (i.e., there is no dependence on the initial state).
71. Colinearity: The correlation between two predictors after conditioning on the other observed predictors (if any). When predictors are colinear, distinguishing their effects is difficult, and the statistical properties of the estimated effects are more sensitive to the validity of the model.
72. Normal distribution: Another name for the Gaussian distribution. Has a bell-shaped probability density function.
73. Covariance matrix: A matrix in which the ijth element contains the covariance of items i and j.
74. Absolute or Geodesic distance: The length of the shortest path along the edges of the network from one actor to another.
75. Cartesian distance: The distance between two points on a two-dimensional surface or grid. Adheres to the Pythagorean theorem.
76. Count data: Observations made on a variable with the whole numbers (0, 1, 2, . . .) as its state space.
77. Statistical inference: The process of establishing the level of certainty of knowledge about unknown parameters (or hypotheses) from data subject to random variation, such as when observations are measured imperfectly with no systematic bias or a sample from a population of interest is used to estimate population parameters.
78. Null model: A model of a network statistic that typically represents what would be expected if the feature of interest were nonexistent (effect equal to 0) or outside the range of interest.
79. Permutation test: A statistical test of a null hypothesis against an alternative implemented by randomly reshuffling the labels (i.e., the
subscripts) of the observations. The significance level of the test is evaluated by resampling the observed data 50–100 times and computing the proportion of times that the null hypothesis is rejected.
Terms Used in Network Science
1. Network science: The approach developed from 1995 onwards mostly within statistical physics and applied mathematics to study networked systems across many domains (e.g., physical, biological, social). Usually focuses on very large systems; hence, theoretical results derived in the thermodynamic limit are good approximations to real-world systems.
2. Thermodynamic limit: In statistical physics refers to the limit obtained for any quantity of interest as system size N tends to infinity. Many analytical results within network science are derived in this limit due to analytical tractability.
3. Statistical physics: The branch of physics dealing with many-body systems where the particles in the system obey a fixed set of rules, such as Newtonian mechanics, quantum mechanics, or any other rule set. As the number of bodies (particles) in a system grows, it becomes increasingly difficult (and less informative) to write down the equations of motion, a set of differential equations that govern the motion of the particles over time, for the system. However, one can describe these systems probabilistically. The word "statistical" is somewhat misleading as there is no statistics in the sense of statistical inference involved; instead everything proceeds from a set of axioms, suggesting that "probabilistic" might be a better term. Statistical physics, also called statistical mechanics, gives a microscopic explanation to the phenomena that thermodynamics explains phenomenologically.
4. Generative model: Most network models within network science belong to this category. Here one specifies the microscopic rules governing, for example, the attachment of new nodes to the existing network structure in models of network growth.
5. Cumulative advantage: A stylized modeling mechanism introduced by Price in 1976 to capture phenomena where "success breeds success." Price applied the model to study citation patterns, where power-law or power-law-like distributions are observed for the number of citations and are successfully reproduced by the model.
6. Polya urn model: A stylized sampling model in probability theory where the composition of the system, the contents of the urn, changes as a consequence of each draw from the urn.
7. Power law: Refers to the specific functional form $P(x) \sim x^{-\alpha}$ of the distribution of quantity x. Also called Pareto distribution. See scale-free network.
8. Preferential attachment: A stylized modeling mechanism introduced by Barabási and Albert in 1999 where the probability that a new node attaches itself to an existing node i of degree k_i is an increasing function of k_i; in the case of linear preferential attachment, this probability is directly proportional to k_i. In short, the higher the degree of a node, the higher the rate at which it acquires new connections (increases its degree).
9. Weak ties hypothesis: A hypothesis developed by sociologist Mark Granovetter in his extremely influential 1973 paper "The strength of weak ties." The hypothesis, in short, states the following: The stronger the tie connecting persons A and B, the higher the fraction of friends they have in common.
10. Modularity: Modularity is a quality function used in network community detection, where its value is maximized (in principle) over the set of all possible partitions of the network nodes into communities. Standard modularity reads as $Q = (2m)^{-1} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$, where c_i is the community assignment of node i and δ is the Kronecker delta; other quantities as defined in the text.
11. Rate equations: Rate equations, commonly used to model chemical reactions, are similar
to master equations but instead of modeling the count of objects (e.g., number of nodes) in a collection of discrete states (e.g., the number of k-degree nodes N_k(t) for different values of k), they are used to model the evolution of continuous variables, such as average degree, over time.
12. Master equations: Widely used in statistical physics, these differential equations model how the state of the system changes from one time point to the next. For example, if N_k(t) denotes the number of nodes of degree k, given the model, one can write down the equation for N_k(t + 1), i.e., the number of k-degree nodes at time t + 1.
13. Fitness or affinity or attractiveness: A node attribute introduced to incorporate heterogeneity in the node population in a growing network model. For example, in a model based on preferential attachment, this could represent the inherent ability of a node to attract new edges, a mechanism that is superimposed on standard preferential attachment.
14. Community: A group of nodes in a network that are, in some sense, densely connected to other nodes in the community but sparsely connected to nodes outside the community.
15. Community detection: The set of methods and techniques developed fairly recently for finding communities in a given network (graph). The number of communities is usually not specified a priori but, instead, needs to be determined from data.
16. Critical point: The value of a control parameter in a statistical mechanical system where the system exhibits critical behavior: previously localized phenomena now become correlated throughout the system, which at this point behaves as one single entity.
17. Phase diagram: A diagram displaying the phase (liquid, gas, etc.) of the system as one or more thermodynamic control parameters (temperature, pressure, etc.) are varied.
18. Phase transition: Thermodynamic properties of a system are continuous functions of the thermodynamic parameters within a phase; phase transitions (e.g., liquid to gas) happen between phases where thermodynamic functions are discontinuous.
19. Network diameter: The longest of the shortest pairwise paths in the network, computed for each dyad (node pair).
20. Hysteresis: The behavior of a system depends not only on its current state but also on its previous state or states.
21. Quality function: Typically a real-valued function with a high-dimensional domain that specifies the "goodness" of, say, a given network partitioning. For example, given the community assignments of N nodes, which can be seen as a point in an N-dimensional hypercube, the standard modularity quality function returns a number indicating how good the given partitioning is.
22. Dynamic process: Any process that unfolds on a network over time according to a set of prespecified rules, such as epidemic processes, percolation, diffusion, synchronization, etc.
23. Slice: In the context of multislice community detection, refers to one graph in a collection of many within the same system, where a slice can capture the structure of a network at a given time (time-dependent slice), at a particular resolution level (multiscale slice), or can encode the structure of a network for one tie type when many are present (multiplex slice).
24. Scale-free network: Network with a power-law (Pareto) degree distribution.
25. Erdős-Rényi model: Also known as Poisson random graph (after the fact that the degree distribution in the model follows a Poisson distribution), Bernoulli random graph (after the fact that each edge corresponds to an outcome of a Bernoulli process), or the random graph (as the progenitor of all random graphs). Starting with a fixed set of N nodes, one considers each node pair in turn independently of the other node pairs and connects the nodes with probability p. Erdős and Rényi first published the model in 1959, although Solomonoff and Rapoport published a similar model earlier in 1951.
26. Watts-Strogatz model: A now canonical model by Watts and Strogatz that was
introduced in 1998. Starting from a regular lattice structure characterized by high clustering and long paths, the model shows how randomly rewiring only a small fraction of edges (or, alternatively, adding a small number of randomly placed edges) leads to a small-world network characterized by high clustering and short paths. The model is conceptually appealing, and shows how to interpolate, using just one parameter, from a regular lattice structure in one extreme to an Erdős-Rényi graph in the other.
27. Mean-field approximation: Sometimes called the zero-order approximation, this approximation replaces the value of a random variable by its average, thus ignoring any fluctuations (deviations) from the average that may actually occur. This approach is commonly used in statistical physics.
28. Ensemble: A collection of objects, such as networks, that have been generated with the same set of rules, where each object in the ensemble has a certain probability associated with it. For example, one could consider the ensemble of networks that consists
Barnett ML, Keating NL, Christakis NA, O'Malley AJ, Landon BE. Reasons for referral among primary care and specialist physicians. J Gen Intern Med. 2012b;27:506–12.
Berkman L, Glass T. Social integration, social networks, social support, and health. In: Social epidemiology. New York: Oxford University Press; 2000. p. 137–73.
Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A. Models of social networks based on social distance attachment. Phys Rev E. 2004;70:056122. https://doi.org/10.1103/PhysRevE.70.056122.
Bonacich P. Power and centrality: a family of measures. Am J Sociol. 1987;92:1170–82.
Borgatti S, Everett M. Network analysis of 2-mode data. Soc Networks. 1997;19:243–69.
Breiger R. The duality of persons and groups. Soc Forces. 1974;53:181–90.
Cartwright D, Harary F. A generalization of Heider's theory. Psychol Rev. 1956;63:277–92.
Centola D. Failure in complex social networks. Math Sociol. 2009;33:64–8.
Choi D, Wolfe P, Airoldi E. Stochastic blockmodels with growing number of classes. Arxiv preprint. 2010; arXiv:1011.4644.
Christakis N, Fowler J. The spread of obesity in a large social network over 32 years. N Engl J Med. 2007;357:370–9.
Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med. 2013;32:556–77.
Coleman J, Katz E, Menzel H. The diffusion of innovations
of six nodes and two edges, each begin among physicians. Sociometry. 1957;20:253–70.
Coleman J, Katz E, et al. Medical innovation: a diffusion
equiprobable. study. Indianapolis: Bobbs-Merrill; 1966.
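The constructions described in the Erdős-Rényi and Watts-Strogatz entries can be illustrated with a short Python sketch. It is not taken from the chapter: the function names, the specific rewiring rule (keep one endpoint of an edge and choose the other uniformly among nodes not yet tied to it), and the values n = 100 and k = 4 are assumptions made only for illustration. The sketch builds a Bernoulli random graph by visiting each node pair once, and it interpolates between a ring lattice and a random graph with a single rewiring probability `beta`, reporting average clustering and mean shortest-path length along the way.

```python
import random
from collections import deque
from itertools import combinations


def erdos_renyi(n, p, rng):
    """Visit every node pair once; connect it independently with probability p."""
    return {frozenset(pair) for pair in combinations(range(n), 2)
            if rng.random() < p}


def ring_lattice(n, k):
    """Regular ring: each node is tied to its k nearest neighbors (k even, n > k)."""
    return {frozenset((i, (i + j) % n))
            for i in range(n) for j in range(1, k // 2 + 1)}


def watts_strogatz(n, k, beta, rng):
    """Rewire each lattice edge with probability beta (assumes n > k)."""
    edges = ring_lattice(n, k)
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = frozenset((i, (i + j) % n))
            if rng.random() >= beta:
                continue  # this edge keeps its lattice position
            # Candidate endpoints exclude self-loops and existing ties.
            free = [v for v in range(n)
                    if v != i and frozenset((i, v)) not in edges]
            if free:
                edges.remove(old)
                edges.add(frozenset((i, rng.choice(free))))
    return edges


def neighbors(n, edges):
    """Adjacency sets for an undirected edge set on nodes 0..n-1."""
    nbrs = {v: set() for v in range(n)}
    for a, b in map(tuple, edges):
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def average_clustering(n, edges):
    """Mean over nodes of the share of neighbor pairs that are themselves tied."""
    nbrs = neighbors(n, edges)
    total = 0.0
    for v in range(n):
        d = len(nbrs[v])
        if d >= 2:
            closed = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
            total += 2 * closed / (d * (d - 1))
    return total / n


def mean_path_length(n, edges):
    """Average shortest-path length over connected node pairs, via breadth-first search."""
    nbrs = neighbors(n, edges)
    total = pairs = 0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in nbrs[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


rng = random.Random(0)  # fixed seed so repeated runs give the same graphs
for beta in (0.0, 0.1, 1.0):
    g = watts_strogatz(100, 4, beta, rng)
    print(beta, round(average_clustering(100, g), 3),
          round(mean_path_length(100, g), 2))
```

With `beta = 0` the function returns the unchanged lattice, and with `beta = 1` every edge is rewired while the total number of edges stays fixed, so the one parameter moves the graph between the two extremes named in the glossary. Path lengths are averaged over connected pairs only, since heavy rewiring can disconnect a sparse graph.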
Davidsen J, Ebel H, Bornholdt S. Emergence of a small
world from local interactions: modeling acquaintance
networks. Phys Rev Lett. 2002;88:128701. https://doi.
org/10.1103/PhysRevLett.88.128701.
References Dorogovtsev SN, Mendes JFF, Samukhin AN. Structure
of growing networks with preferential linking.
Airoldi EM, Fienberg SE, Xing EP. Mixed membership Phys Rev Lett. 2000;85:4633–6. https://doi.org/
stochastic blockmodels. J Mach Learn Res. 10.1103/PhysRevLett.85.4633.
2008;9:1981–2014. Duijn MV, Snijders TAB, Zijlstra B. P2: a random effects
Anselin L. Spatial econometrics: methods and models. model with covariates for directed graphs. Statistica
Dordrecht: Kluwer; 1988. Neerlandica. 2004;58:234–54.
Barabasi A-L, Albert R. Emergence of scaling in random Erdős P, Rényi A. Random graphs. Publ Math.
networks. Science. 1999;286:509–12. http://www. 1959;6:290–7.
sciencemag.org/content/286/5439/509.abstract Faust K. Centrality in affliation networks. Soc Networks.
Barabasi A-L, Albert R, Jeong H. Mean-field theory for 1997;19:157–91.
scale-free random networks. Phys A Stat Mech Appl. Feller W. An introduction to probability theory and its
1999;272:173–87. http://www.sciencedirect.com/sci applications, vol. 2. New York: Wiley; 1966.
ence/article/pii/S0378437199002915. Festinger L. The analysis of sociograms using matrix alge-
Barnett ML, Landon BE, O’Malley AJ, Keating NL, bra. Hum Relat. 1949;2:153–8.
Christakis NA. Mapping physician networks with Fineberg S, Wasserman S. Categorical data analysis of
self-reported and administrative data. Health Serv single sociometric relations. In: Sociological method-
Res. 2011;46:1592–609. ology. New Jersey: Jossey-Bass; 1981. p. 156–92.
Barnett ML, Christakis NA, O’Malley AJ, Onnela J-P, Fletcher JM. Social interactions and smoking: evidence
Keating NL, Landon BE. Physician patient-sharing using multiple student cohorts, instrumental
networks and the cost and intensity of care in US variables, and school fixed effects. Health Econ.
hospitals. Med Care. 2012a;50:152–60. 2008;19:466–84.
658 A. J. O’Malley and J.-P. Onnela
Fletcher JM, Lehrer SF. The effect of adolescent health on Hoff PD, Raftery AE, Handcock MS. Latent space models
educational outcomes: causal evidence using genetic for social networks analysis. J Am Stat Assoc.
lotteries between siblings. Canadian labor market and 2002;97:1090–8.
skills researcher network, working paper no. 32. 2009. Holland P, Leinhardt S. An exponential family of
Fortunato S. Community detection in graphs. Phys probability-distributions for directed-graph. J Am Stat
Reports. 2010;486:75–174. Assoc. 1981;76:33–50.
Frank O, Strauss D. Markov graphs. J Am Stat Assoc. Holland P, Laskey K, Leinhardt S. Stochastic blockmodels:
1986;81:832–42. some first steps. Soc Networks. 1983;5:109–37.
Freeman L. Centrality in social networks, I. Conceptual House J, Kahn R. Measures and concepts of social support.
clarification. Soc Networks. 1979;1:215–39. In: Social support and health. Orlando: Academic;
Freeman L. The development of social network analysis: a 1985. p. 83–108.
study in the sociology of science. Vancouver: Empirical Huisman M, Van Duijn M. Software for statistical analysis
Press; 2004. of social networks. In: The Sixth International Confer-
Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabasi ence on Logic and Methodology; Amsterdam: 2004.
A-L. The human disease network. Proc Natl Acad Sci. Huisman M, Van Duijn M. Software for social networks
2007;104:8685–90. http://www.pnas.org/content/104/ analysis. In: Models and methods in social network anal-
21/8685.abstract ysis. Cambridge: Cambridge University Press; 2005.
Goldenberg A, Zheng AX, Fineberg SE, Airoldi EM. A Hunter D. Curved exponential family models for social
survey of statistical network models. Found Trends networks. Soc Networks. 2007;29:216–30.
Mach Learn. 2009;2:129–233. Hunter DR, Handcock MS. Inference in curved exponen-
Goodreau S. Advances in exponential random graph tial family models for networks. J Comput Graph Stat.
(p*) models applied to a large social network. Soc 2006;15:565–83.
Networks. 2007;29:231–48. Iwashyna TJ, Chang VW, Zhang JX, Christakis
Granovetter MS. The strength of weak ties. Am J Sociol. AN. Physician social networks and variation in prostate
1973;78:1360–80. cancer treatment in three cities. Health Serv Res.
Guimera R, Nunes Amaral LA. Functional cartography of 2002;37:1531–51.
complex metabolic networks. Nature. 2005;433:895–900. Karrer B, Newman MEJ. Stochastic blockmodels and
Haines V, Hurlbert J. Network range and health. J Health community structure in networks. Phys Rev E. 2011;
Soc Behav. 1992;33:254–66. 83:016107. https://doi.org/10.1103/PhysRevE.83.
Handcock MS, Robins GL, Snijders TAB, Moody J, Besag 016107.
J. Assessing degeneracy in statistical models of social Katz L. On the matrix analysis of Sociometric data. Soci-
networks. J Am Stat Assoc. 2003;76:33–50. ometry. 1947;10:233–41.
Handcock M, Raftery A, Tantrum J. Model-based cluster- Katz L. A new status index derived from sociometric
ing for social networks. J Roy Stat Soc A. 2007; analysis. Psychometrika. 1953;18:39–43.
170:301–54. Katz L, Powell JH. Measurement of the tendency toward
Handcock MS, Hunter DR, Butts CT, Goodreau SM, reciprocation of choice. Sociometry. 1955;18:659–65.
Krivitsky PN, Morris M. ergm: A package to fit, sim- Keating NL, Ayanian JZ, Cleary PD, et al. Factors affecting
ulate and diagnose exponential-family models for net- influential discussions among physicians: a social net-
works, http://CRAN.R-project.org/package=ergm. work analysis of a primary care practice. J Gen Intern
Version 2.2-6. 2010. Project home page at http:// Med. 2007;22:794–8.
statnetproject.org Klovdahl A. Social networks and the spread of infectious
Hanneke S, Fu W, Xing EP. Discrete temporal models of diseases. Soc Sci Med. 1985;21:1203–16.
social networks. Electron J Stat. 2010;4:585–605. Kossinets G, Watts DJ. Empirical analysis of an evolving
Harary F. On the notion of balance of a signed graph. Mich social network. Science. 2006;311:88–90. http://www.
Math J. 1953;2:143–6. sciencemag.org/content/311/5757/88.abstract
Harary F. The number of linear, directed rooted and con- Krapivsky PL, Redner S, Leyvraz F. Connectivity of grow-
nected graphs. Trans Am Math Soc. 1955;78:445–63. ing random networks. Phys Rev Lett. 2000;85:4629–32.
Heider F. Attitudes and cognitive orientation. J Psychol. https://doi.org/10.1103/PhysRevLett.85.4629.
1946;21:107–12. Krivitsky PN. Exponential-family random graph models
Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. A for valued networks. 2012. arXiv preprint, 1101.
dynamic network approach for the study of human 1359v2 [stat.ME] 19 Jan 2012.
phenotypes. PLoS Comput Biol. 2009;5:e1000353. Krivitsky PN, Handcock MS. Fitting position latent cluster
https://doi.org/10.1371/journal.pcbi.1000353. models for social networks with latentnet. J Stat Softw.
Hoff PD. Bilinear mixed effects models for dyadic data. J 2008;24. http://statnetproject.org
Am Stat Assoc. 2005;100:286–95. Krivitsky PN, Handcock MS. A separable model for
Hoff P. Modeling homophily and stochastic equivalence in dynamic networks. 2010. arXiv preprint, 1011.1937v1
symmetric relational data. In: Advances in neural infor- [stat.ME].
mation processing systems, vol. 20. Cambridge, MA: Kumpula JM, Onnela J-P, Saramäki J, Kaski K, Kertész
MIT Press; 2008. p. 657–64. J. Emergence of communities in weighted networks.
26 Introduction to Social Network Analysis 659
Phys Rev Lett. 2007;99:228701. https://doi.org/ O’Malley AJ, Christakis NA. Longitudinal analysis of
10.1103/PhysRevLett.99.228701. large social networks: estimating the effect of health
Landon BE, Keating NL, Barnett ML, Onnela JP, Paul S, traits on changes in friendship ties. Stat Med.
OˆaMalley AJ, Keegan T, Christakis NA. Variation in 2011;30:950–64.
patient-sharing networks of physicians across the O’Malley AJ, Marsden PV. The analysis of social networks.
United States. JAMA. 2012;308:265–73. Health Serv Outcome Res Methodol. 2008;8:222–69.
Laumann E, Marsden P, Prensky D. The boundary specifi- O’Malley AJ, Arbesman S, Steiger DM, Fowler JH,
cation problem in network analysis. In: Burt R, Christakis NA. Egocentric social network structure,
Minor M, editors. Applied network analysis: a meth- health, and pro-social behaviors in a National Panel
odological introduction. Beverly Hills: Sage; 1983. Study of Americans. PLoS One. 2012;7:e36250.
p. 18–34. https://doi.org/10.1371/journal.pone.0036250.
Lorrain F, White H. Structural equivalence of individuals Opsahl T. Triadic closure in two-mode networks: redefining
in social networks. J Math Sociol. 1971;1:49–80. the global and local clustering coefficients. Soc Networks.
Lyons R. The spread of evidence-poor medicine via flawed 2011; 34. https://doi.org/10.1016/j.socnet.2011.07.001.
social-network analyses. Stat Polit Policy. 2011;2:1–26. Opsahl T, Agneessens F, Skvoretz J. Node centrality in
Manski CA. Identification of endogenous social effects: the weighted networks: generalizing degree and shortest
reflection problem. Rev Econ Stud. 1993;60:531–42. paths. Soc Networks. 2010;32:245–51.
Marsden P. Network methods in social epidemiology. In: Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the
Methods in social epidemiology. New York: Jossey- overlapping community structure of complex networks
Bass; 2006. p. 267–86. in nature and society. Nature. 2005;435:814–8. https://
Marsden PV, Friedkin NE. Network studies of social influ- doi.org/10.1038/nature03607.
ence. Sociol Methods Res. 1993;22:127–51. Paul S, O’Malley AJ. Hierarchical longitudinal models of
Marsili M, Vega-Redondo F, Slanina F. The rise and fall of relationships in social networks. J R Stat Soc Ser C
a networked society: a formal model. Proc Natl Acad Appl Stat. 2013;62:705–22.
Sci USA. 2004;101:1439–42. Pham HH, O’Malley AS, Bach PB, Saiontz-Martinez C,
McPherson ML, Smith-Lovin C, et al. Birds of a feather: Schrag D. Primary care physicians’ links to other phy-
homophily in social networks. Annu Rev Sociol. sicians through Medicare patients: the scope of care
2001;27:415–44. coordination. Ann Intern Med. 2009;150:236–42.
Moreno JL. Who shall survive? Nervous and mental dis- Piraveenan M, Prokopenko M, Zomaya AY. Assortative
ease processing. The University of Michigan, Ann mixing in directed biological networks. IEEE Trans
Arbor; 1934. Comput Biol Bioinform. 2010;9:66–78. To appear.
Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J-P. Pollack CE, Weissman G, Bekelman J, Liao K, Armstrong
Community structure in time-dependent, multiscale, and K. Physician social networks and variation in prostate
multiplex networks. Science. 2010;328:876–8. http:// cancer treatment in three cities. Health Serv Res.
www.sciencemag.org/content/328/5980/876.abstract 2012;47:380–403.
Newcomb TM. An approach to the study of communica- Porter MA, Onnela J-P, Mucha PJ. Communities in net-
tive acts. Psychol Rev. 1953;60:393–404. works. Not Am Math Soc. 2009;56(1082–1097):1164–6.
Newman ME. Scientific collaboration networks. Price DDS. A general theory of bibliometric and other
II. Shortest paths, weighted networks, and centrality. cumulative advantage processes. J Am Soc Inf Sci.
Phys Rev. 2001;64:016132. 1976;27:292–306. https://doi.org/10.1002/asi.
Newman MEJ. Modularity and community structure in 4630270505.
networks. Proc Natl Acad Sci. 2006;103:8577–82. Robins G, Pattison P, Woolcock J. Small and other worlds:
Newman M. Networks: an introduction. New York: global network structures from local processes. Am J
Oxford University Press; 2010. Sociol. 2005;110:894–936.
Newman MEJ. Communities, modules and large-scale Robins GL, Snijders TAB, Wang P, Handcock MS,
structure in networks. Nat Phys. 2012;8:25–31. Pattison PE. Recent developments in exponential ran-
Newman MEJ, Girvan M. Mixing patterns and community dom graph ( p) models for social networks. Soc Net-
structure in networks. In: Pastor-Satorras R, Rubi J, works. 2007;29:192–215.
Diaz-Guilera A, editors. Statistical mechanics of com- Robins GL, Pattison PE, Wang P. Closure, connectivity
plex networks. Berlin: Springer; 2003. and degree distributions: exponential random graph
Newman MEJ, Girvan M. Finding and evaluating commu- (p*) models for directed social networks. Soc Net-
nity structure in networks. Phys Rev E. 2004;69:026113. works. 2009;31:105–7.
https://doi.org/10.1103/PhysRevE.69.026113. Rubin D. Bayesian inference for causal effects: the role of
Nowicki K, Snijders TAB. Estimation and prediction for randomization. Ann Stat. 1978;6:34–58.
stochastic blockstructures. J Am Stat Assoc. 2001; Seidman SB. Network structure and minimum degree. Soc
96:1077–87. Networks. 1983;5:269–87.
O’Malley AJ. The analysis of social network data: an Shalizi RR, Rinaldo A. Consistency under sampling of
exciting frontier for statisticians. Stat Med. 2013; exponential random graph models. 2012. arXiv pre-
32:539–55. print. arXiv:1111.3054v3
660 A. J. O’Malley and J.-P. Onnela
Shalizi CR, Thomas AC. Homophily and contagion are Traud AL, Mucha PJ, Porter MA. Social structure of
generically confounded in observational social network Facebook networks. Phys A Stat Mech Appl.
studies. Sociol Methods Res. 2011;40:211–39. 2012;391:4165–80. http://www.sciencedirect.com/sci
Simmel G. The sociology of Georg Simmel. New York: ence/article/pii/S0378437111009186
The Free Press; 1908. VanderWeele TJ. Sensitivity analysis for contagion effects in
Snijders T. The degree variance: an index of graph hetero- social networks. Sociol Methods Res. 2011;40:240–55.
geneity. Soc Networks. 1981;3:163–74. VanderWeele TJ, Ogburn EL, Tchetgen Tchetgen EJ. Why
Snijders T. Stochastic actor-oriented models for network and when “Flawed” social network analyses still yield
change. J Math Sociol. 1996;21:149–72. valid tests of no contagion. Stat Polit Policy.
Snijders TAB. The statistical evaluation of social network 2012;3:1050. https://doi.org/10.1515/2151-7509.1050.
dynamics. In: Sociological methodology. Oxford, UK: Vázquez A. Growing network with local rules: preferential
Basil Blackwell; 2001. p. 361–95. attachment, clustering hierarchy, and degree correla-
Snijders TAB. Models for longitudinal social network data. tions. Phys Rev E. 2003;67:056104. https://doi.org/
In: Models and methods in social network analysis. Cam- 10.1103/PhysRevE.67.056104.
bridge: Cambridge University Press; 2005. p. 215–47. Wang W, Wong G. Stochastic Blockmodels for directed
Snijders TAB. Statistical methods for network dynamics. graphs. J Am Stat Assoc. 1987;82:8–19.
In: Luchini SR et al., editors. Proceedings of the XLIII Wang P, Sharpe K, Robins GL, Pattison PE. Exponential
Scientific Meeting, Italian Statistical Society, Basil random graph (p*) models for affiliation networks. Soc
Blackwell, Ltd; 2006. p. 281–96 Networks. 2009;31:12–25.
de Solla Price DJ. Networks of scientific papers. Science. Wasserman SS, Faust K. Social network analysis: methods
1965;149:510–5. http://www.sciencemag.org/content/ and applications. Cambridge: Cambridge University
149/3683/510.short. Press; 1994.
Steglich C, Snijders TAB, Pearson M. Dynamic networks Wasserman S, Pattison P. Logit models and logistic regres-
and behavior: separating selection from influence. sions for social networks: I. An introduction to Markov
Sociol Methodol. 2010;40:329–93. graphs and p. Psychometrika. 1996;61:401–25.
Szabo G, Barabasi AL. Network effects in service usage. Westveld AH, Hoff PD. A mixed effect model for longitu-
2007. Arxiv preprint. http://lanl.arxiv.org/abs/physics/ dinal relational and network data, with applications to
0611177 international trade and conflict. Ann Appl Stat.
Thompson S. Adaptive web sampling. Biometrics. 2011;5:843–72.
2006;62:1224–34. White D, Harary F. The cohesiveness of blocks in social
Thompson S, Frank O. Mode-based estimation with link- networks: node connectivity and conditional density.
tracing sampling designs. Survey Methodol. Sociol Methodol. 2001;31:305–59.
2000;26:87–98. Wong LH, Pattison P, Robins G. A spatial model for social
Thompson S, Seber GAF. Adaptive sampling. New York: networks. Phys A Stat Mech Appl. 2006;360:99–120.
Wiley; 1996. http://www.sciencedirect.com/science/article/pii/S0378
Toivonen R, Onnela J-P, Saramäki J, Hyvönen J, Kaski 437105004334
K. A model for social networks. Phys A Stat Mech Zijlstra BJH, Duijn MV, Snijders TAB. The multilevel P2
Appl. 2006;371:851–60. http://www.sciencedirect. model: a random effects model for the analysis of
com/science/article/pii/S0378437106003931 multiple social networks. Methodology. 2006;2:42–7.
27 Survey Methods in Health Services Research
Steven B. Cohen

Contents
Introduction
Designing National Health-Care Surveys to Inform Health Policy and Health Services Research
Types of Health and Health-Care Surveys
Objectives and Content
Survey Design Framework
Cross-Sectional and Longitudinal Survey Designs
Use of Complex Nationally Representative Survey Designs
Sample Size Determination
Controlling for Sampling Error and Bias in Survey Estimates
Sample Size Targets and Precision Requirements
Building Survey Response Rates
Survey Procedures to Facilitate Respondent Cooperation
Estimation of Health-Care Parameters
Development of Sampling Weights
Adjustments for Unit Nonresponse
Adjustments for Survey Attrition
Post-stratification Adjustments
Variance Estimation Considerations
Integrated Survey Designs: Analytical Enhancements Achieved through the Linkage of Surveys and Administrative and Secondary Data
An Example of Survey Integration: The Medical Expenditure Panel Survey
Advantages of Integrated Survey Designs
Linked Provider Data on Expenditures Improves the Accuracy of National Medical Expenditure Estimates in the MEPS
Integrated Design Expands Capacity for Longitudinal Analyses
Integrated Design of MEPS Facilitates Examination of Response Error
Constraints

S. B. Cohen (*)
Division of Statistical and Data Sciences, RTI International, Washington, DC, USA
e-mail: scohen@rti.org
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019
A. Levy et al. (eds.), Health Services Evaluation, Health Services Research, https://doi.org/10.1007/978-1-4939-8715-3_38

and optimal sample designs. This also necessitates that subsequent applications of estimation strategies to the survey data, as well as analytical techniques and interpretations of resultant research findings, are guided by well-grounded statistical theory. The chapter also features important sample design considerations, with coverage given to topics that include frame development, sample size specifications, precision requirements, and sample selection schemes. The chapter adheres to a total survey error framework, identifies challenges that characterize health services research efforts, and reinforces the interdependence among the analysts, the health-care survey designers, and the statisticians. In this context, the methods that are discussed are illustrated with examples from national health-care survey efforts, though the techniques are also applicable to subnational or population-subgroup-specific target populations.

Designing National Health-Care Surveys to Inform Health Policy and Health Services Research

Surveys are a critical source of information for the development, implementation, and evaluation of policies and practices addressing health and health care. When properly designed, surveys can provide accurate, unbiased, and generalizable information on population characteristics, risk factors, health status, health-care access, utilization and insurance coverage, and the health-care system itself. To be most useful, surveys must be designed according to sound statistical and methodological principles. Health surveys are data collection efforts designed to acquire information on the nation’s health and health-care characteristics. Several general, though by no means exhaustive, uses of health and health-care survey data include identification of public health problems; program planning and evaluation; health education and health promotion; epidemiological, biomedical, and health services research; measurement of the extent and impact of illness; and the measurement of the use of health-care services, related medical expenditures, and sources of payment for care.

Generally, surveys are operationalized by the selection of a representative sample of the population or universe of interest, referred to as the target population, and the acquisition of information from the sample units in a structured manner through administration of a well-developed questionnaire. The universe of interest is often a population but can be any identifiable group of individual units such as health-care providers or events such as health-care visits. If the sample is selected as a probability sample, in which a frame exists for sample enumeration and every unit in the frame has a known, nonzero probability of selection for the sample, the findings from the sample are generalizable to the population. This is a powerful attribute and enhances the integrity of the data collected. Surveys can have relatively simple or extremely complex designs, but the basic principles of sample design and data collection methodology remain the same. The complexity of the survey often reflects the complexity of the subject under study. As health and health care encompass a wide range of phenomena and relate directly and indirectly to many other domains, it is necessary to develop a range of health surveys to respond to the differing needs for information. Each of these surveys is often based on complex designs and sophisticated data collection mechanisms.

Types of Health and Health-Care Surveys

There are three main types of health surveys: population-based surveys that obtain information directly from the subject (or a suitable proxy), surveys that obtain information about entities such as health-care providers, and surveys that are based on administrative records. Population-based surveys are used when it is essential to describe the characteristics of a defined population. Often the population of interest is the general US population and specific subpopulations as defined by such characteristics as age, sex, race/ethnicity, or socioeconomic status. However, a population may also be defined by occupation or
any other well-defined characteristics. For example, physicians could be the ultimate sample units of a population-based survey if the information sought related directly to the physician and his or her practice characteristics. Population-based surveys are most frequently adopted when the sample unit is the best source for providing the required information. When a well-developed sample frame is available, samples for population-based surveys can be selected from lists of all eligible subjects. For example, surveys of Medicare or health plan beneficiaries can be drawn from a list of enrollees. When such lists are not available, other methods such as area-based probability samples are used. Whatever method is adopted, it is important that the sample is selected as a probability sample and can be evaluated for potential coverage and response biases. Once a sample is selected, different data collection modes can be used to collect the necessary information, including mail, phone, Web-based, and in-person modes. The nature of the content and the sample will affect the mode chosen. Of critical importance is the survey instrument itself. The questionnaire or other data collection instruments need to be developed so that accurate and valid information is obtained. The identification of the respondent is also an important step in the process. In general, obtaining information directly from the survey subject provides more reliable and valid health data (Madans and Cohen 2005).

Many population-based health surveys obtain health information directly from the subject (or an appropriate proxy) through in-person interviews, telephone interviews, or mail questionnaires. Supplemental information in the form of medical records is often added to the information obtained from subjects to enhance completeness and quality. To obtain objective standardized information on health characteristics, including undiagnosed conditions, surveys rely on direct examination of populations. These surveys are extremely complex and expensive to undertake but add value in accurately describing the health status of the population, particularly for those subpopulations who lack medical care.

In contrast, surveys of the components of the health-care system provide information on the structure, capacity, and functioning of that system. These components range from private physicians’ offices to hospitals, nursing homes, and home health-care agencies. To fully understand the system, it is necessary to cover all components. In order to select representative samples of these components, it is necessary to have sampling frames, equivalent to the list frames mentioned above, that identify each member of each type of health-care provider. Surveys of providers can provide information on different aspects of the health-care system. Questions can be targeted at describing the number of components in a sector as well as their organizational, legal, or financial characteristics. Information can be obtained on the individual provider or on the interactions among related providers or between providers and patients. Interactions with patients can focus on the delivery of care or on how care is paid for.

Sometimes the most accurate source of information is an administrative record that was generated as part of the routine operation of a system. This clearly would be the case when the focus of the research is the system producing the records. For example, utilization of Medicare services is most easily obtained from Medicare administrative records. The entire census of records is usually available for these purposes, but often samples of records are taken when the entire universe is not needed. When possible, information from administrative records is often sought as a way to improve the accuracy of information available from the subject in population-based surveys.

Objectives and Content

Health surveys used for policy and program development can either focus on a particular health or health-care issue or be multipurpose in nature. The latter surveys tend to be conducted by public entities, are designed to provide ongoing, descriptive information on a range of topics, and tend to be based on larger samples. While the information from these surveys can track changes in the population, they are less effective in obtaining detailed information on a
27 Survey Methods in Health Services Research 665
particular subject or in evaluating the success of the survey. Such information is more appropriately obtained from focused surveys. In order to allow for comprehensive studies of the current health-care system, information is needed on the population’s access to health care, their utilization of and expenditures for health-care services, and their health insurance coverage. In a similar vein, an evaluation of the system requires an understanding of the patterns and trends in the use of health-care services and their associated costs and sources of payment. To effectively address these issues, researchers and policymakers need accurate nationally representative data to better permit an understanding of how individual characteristics, behavioral factors, and financial and institutional arrangements affect health-care utilization and expenditures in a rapidly changing health-care market. Health surveys are often designed to acquire this information at both the national and subnational levels and for policy-relevant population subgroups of interest (Madans and Cohen 2005).

Access to Care

The population’s access to health-care services is an important factor that may influence patterns of health-care utilization and associated health outcomes. Measures of access to care have also been used as indicators to assess the quality of the nation’s health-care delivery system. In addition to facilitating determinations of the availability of a usual source of care for the provision of necessary medical care, access to care measures serve to identify barriers to care, which include shortages of health-care providers, financial restrictions, limitations in proximity to services, and constraints associated with waiting times. Population-based national health-care surveys such as the Medical Expenditure Panel Survey (MEPS), cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and the Centers for Disease Control and Prevention’s National Center for Health Statistics (CDC/NCHS), collect information on several dimensions of access to health care in America. The survey was designed to yield estimates of the proportion of the population lacking a usual source of care as well as the types and characteristics of providers used by those who do have a usual source of care. Measures of satisfaction with the usual source of health care are also collected in the survey through in-person interviews, in addition to information on experiencing difficulty or delay in obtaining health care or not receiving needed health-care services.

In addition to national population estimates of access to care derived from this survey, the analytical objectives include a capacity to permit specific comparisons of these measures by age, race/ethnicity, sex, perceived health status, health insurance coverage, and place of residence. These analyses permit the identification of potential disparities in access to care, with particular attention given to individuals with low incomes, persons with disabilities or chronic illness, minorities, women and children, elderly, rural, and inner-city populations. Evaluation of the effects of changes in the US health-care system on access to care for these populations will remain a critical issue for policymakers in the next few years.

Use of Health-Care Services

An understanding of the patterns and trends in the use of health-care services is essential to facilitate evaluations of the current health-care system, in addition to informing proposals for modification. Assessments of the degree of equity in the distribution of health-care services and the identification of health-care disparities require an examination of health-care use across vulnerable population subgroups and how it has changed over time. These investigations are essential to discern how service utilization varies according to the characteristics of the population, their health plans, and their providers and to identify other behavioral and institutional factors associated with disparities in service use.

An examination of the variations in the use of health-care services also helps determine the adequacy of access to care across the population. Underutilization of health-care services may be attributable to limitations in access to care as a consequence of the lack of adequate health insurance, financial resources, or limited availability of services in certain areas. Detailed comparisons of patterns of use by subpopulations presumed to require more care (e.g., the elderly, those in poor
health, or the terminally ill) relative to their less vulnerable counterparts help discern whether those most in need of care are receiving it.

The utilization measures that are required for these analyses typically consist of counts of the number of visits or events for specific health-care services that occur in a given calendar year. More specifically, health-care services include office-based visits, ambulatory hospital-based visits, inpatient hospital stays, dental visits, home health visits, and prescribed medicine purchases. This information is acquired through population-based surveys, surveys of providers, and surveys based on administrative records. Health-care surveys are designed to acquire this information at both the national and subnational levels and for policy-relevant population subgroups of interest. The visible national data collection efforts that acquire this type of health-care utilization information include the MEPS and the National Health Interview Survey (population based), the National Ambulatory Medical Care Survey (provider based), and the Medicare Current Beneficiary Survey (primarily population based and supplemented with administrative records).

Cost of Medical Care and Coverage

Health-care expenditures represent one-sixth of the US gross domestic product, exhibit a rate of growth that exceeds other sectors of the economy, and constitute one of the largest components of the Federal and states’ budgets. Although the rate of growth in health-care costs slowed in the mid-1990s, it began to rise again shortly afterward, fueled primarily by increasing costs for hospital care and prescription medications. To effectively address the issue of rising costs, researchers and policymakers need accurate nationally representative data to better permit an understanding of how individual characteristics, behavioral factors, financial incentives, and institutional arrangements affect health-care utilization and expenditures in a rapidly changing health-care market.

The continuing rise in the number of persons without private health insurance has made access to health insurance coverage a critical public policy issue. Informed public policy requires precise estimates of the size and composition of the insured and uninsured populations, as well as information on how demographic characteristics, economic factors, and health status affect health plan eligibility and decisions to enroll in health insurance plans. The demand for accurate and reliable information on the population’s health-care expenditures, insurance coverage, and sources of payment is met by health-care surveys such as the Medical Expenditure Panel Survey (MEPS) cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and NCHS.

Survey Design Framework

Once the underlying survey objectives are articulated, greater specificity is required in order to finalize the underlying survey design. With several competing analytic objectives under consideration, priorities need to be established which will serve to guide the necessary precision specifications for the core study estimates of the target population parameters. A final set of survey objectives is then developed that provides details of the core population domains of interest and the required levels of precision for domain estimates. Underlying study hypotheses to be tested also need to be well specified. The precision requirements for the survey estimates will then be subject to further evaluation and subject to re-specification based on cost constraints.

Cross-Sectional and Longitudinal Survey Designs

National health-care sample surveys are generally characterized by cross-sectional or longitudinal designs. The cross-sectional surveys are designed to provide a snapshot of population characteristics that relate to a fixed point or interval in time. Alternatively, longitudinal surveys collect data on more than one occasion from the sample members of the population of analytical interest in order to measure change and to obtain data for time periods too long to recall accurately in a single interview. Longitudinal observations are essential for characterizing variations in the
population attributes that are sensitive to changes in time.

Longitudinal survey designs are primarily adopted to provide the necessary information to assess changes in the behavior of the population over a specific time period. Often referred to as panel designs, they have the capacity to permit measurement of seasonal and annual variations in population characteristics and behavior. These longitudinal designs are essential to permit the acquisition of the necessary data that will support analyses that measure the impact of changes in health status over time for individuals with specific conditions with respect to their use of health-care services and related expenditures. Well-specified sample size requirements for these surveys that are achieved will also permit comparable studies for different economic groups or special populations of interest, such as the poor, elderly, veterans, the uninsured, or racial/ethnic groups. This type of survey design also allows for the development of economic models designed to produce national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy over time, as well as estimates of who benefits and who bears the cost of such changes in policy.

Longitudinal designs are particularly attractive and well suited for studies that examine the extent of changes in health insurance coverage over time as well as the persistence of catastrophic medical expenditures over time. A cross-sectional survey design can provide accurate national survey estimates of the percent of the population with private coverage, public coverage, or the uninsured at a fixed point in time. Alternatively, the most accurate population estimates of the percent of population ever uninsured in a given year or without coverage for an entire year’s duration come from data collection efforts that have adopted a longitudinal survey design.

Use of Complex Nationally Representative Survey Designs

Many of the large national health-care surveys are characterized by a complex design structure with several stages of sampling. Cluster sampling is also a common feature of these national samples that consider area samples. In these multistage sample designs, the first stage of sampling requires the development of a sampling frame in which the land mass of the nation is partitioned into primary sampling units (PSUs) defined as counties or groups of contiguous counties. The eligible set of units is then stratified based on available geographic and sociodemographic information, and a first-stage sample of these primary units is then selected. This process of subsampling areas continues until sample segments consisting of 100–200 housing units are identified and subsampled. The final stage of sampling is often characterized by the selection of a representative sample of housing units, which are then interviewed to obtain the essential survey information on which subsequent health services research will be based.

This type of sample design has the following attractions. The specification of the sampling frame is both cost-effective and less labor intensive, where the list frames of target population members need to only be constructed for the sampled areas. In addition, the interviewing activity is restricted to the sample areas, achieving efficiencies in travel time and cost for in-person interviewing. In contrast, these efficiencies are achieved at the expense of a loss in precision of survey estimates based on the specified sample size relative to the precision that would be achieved based upon a simple random sample selection scheme. The increased variance in survey estimates in a multistage sample design relative to simple random sampling is the result of the greater likelihood of geographically clustered units to have more homogeneous responses. This within-cluster homogeneity is measured by the intra-cluster correlation coefficient, which measures the correlation between units from the same cluster. Overall, the differential in the variance of a survey estimate of a population mean $\bar{y}$ based on a complex multistage sample design, $\mathrm{Var}_{Design}(\bar{y})$, with disproportionate sampling relative to a simple random sample, $\mathrm{Var}_{srs}(\bar{y})$, is specified as the design effect.
$$\text{Design effect} = \mathrm{Var}_{Design}(\bar{y}) \,/\, \mathrm{Var}_{srs}(\bar{y})$$

In addition, the effective sample size for a design that departs from simple random sampling assumptions is specified as the underlying sample size, n, divided by the design effect.

$$\text{Effective sample size} = n \,/\, \left[\mathrm{Var}_{Design}(\bar{y}) \,/\, \mathrm{Var}_{srs}(\bar{y})\right]$$

Sample Size Determination

Stratification is used in sample designs to improve the precision of survey estimates and also provide greater control of the sample distribution. For less complex designs, when a fixed set of strata ($h = 1, 2, \ldots, H$) is defined and data collection costs for surveying units from each distinct stratum and the associated variance estimates for a core criterion variable have been determined, optimum sample allocation strategies have been developed. The values of the sample sizes for each stratum, n(h), may be selected to minimize the variance $\mathrm{Var}(\bar{y})$ of the survey estimate of the criterion variable y, expressed as a mean, for a fixed cost (C) or to minimize the cost for a specified level of precision $V(\bar{y})$. Considering a cost function of the form

$$\text{Data collection cost } \{C\} = C(o) + \sum_{h=1}^{H} C(h)\, n(h)$$

where C(o) represents an overhead cost and C(h) is the data collection cost per unit.

The variance of the estimated mean of a criterion variable will be minimized when n(h) is proportional to $N(h)S(h)/\sqrt{C(h)}$, where N(h) is the population in stratum h and S(h) is the standard deviation for the criterion variable.

When cost is fixed, the overall sample size specification to minimize the variance of survey estimate $\bar{y}$ when considering stratified sampling is

$$n = \left(C - C(o)\right) \left\{ \sum_{h=1}^{H} N(h)S(h)/\sqrt{C(h)} \right\} \Big/ \left\{ \sum_{h=1}^{H} N(h)S(h)\sqrt{C(h)} \right\}$$

Alternatively, when the precision level V is fixed, the overall sample size specification to minimize cost under stratified sampling assumptions is

$$n = \left[ \sum_{h=1}^{H} W(h)S(h)\sqrt{C(h)} \right] \left[ \sum_{h=1}^{H} W(h)S(h)/\sqrt{C(h)} \right] \Big/ \left\{ V + (1/N) \sum_{h=1}^{H} W(h)S^{2}(h) \right\},$$

where $W(h) = N(h)/N$ and $N = \sum_{h=1}^{H} N(h)$ (Cochran 1977).

In practice, few health-care surveys are conducted with the primary objective of optimizing the design based on a single parameter estimate. When the design specifications require attention to competing precision specifications for a variety of survey estimates, the optimization process becomes much more complex. Often, sample size optimization for multiple variance constraints does not have a closed form solution. Conventional approaches under these circumstances rely on iterative approaches to sample size determination that provide an optimal solution when convergence criteria are satisfied (Chromy 1981).

For national health-care surveys, the precision requirements may be articulated by specifying the amount of error that may be tolerated in the survey estimates. To illustrate this process, assume some margin of error, d, in the estimated survey mean of a criterion variable of interest y from the survey has been established, and there is a small risk (α) that the sponsors are willing to incur that the actual error is larger than d. This can be expressed as $\Pr\left(|\bar{y} - \bar{Y}| \geq d\right) = \alpha$.

For large samples, n is approximated by (Design effect) $[z^{2}S^{2}/d^{2}]$, where z is the cutoff point on a standardized normal distribution that cuts off an area α at the tails and $S^{2}$ is the variance of y.

Another way to determine the sample size is to specify the relative standard error (RSE) required for the resultant survey estimate. The RSE of a survey estimate is defined as the ratio of the standard error of the survey estimate $SE(\bar{y})$ divided by the estimate $\bar{y}$, or $RSE(\bar{y}) = SE(\bar{y})/\bar{y}$.

Since $RSE(\bar{y}) = S/(\bar{y}\sqrt{n})$, then $n = S^{2} / \left[\bar{y}^{2}\, RSE^{2}(\bar{y})\right]$.

For example, if one was attempting to obtain an estimate of the proportion of the population under age 65 uninsured in a given year, p, with a
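As a minimal sketch of the stratified allocation formulas in this section, the Python functions below compute the stratum shares (proportional to N(h)S(h) divided by the square root of the per-unit cost), the overall n for a fixed budget, and the overall n for a fixed variance V. The function names and the three-stratum inputs are hypothetical and chosen only for illustration; sample sizes are left unrounded and no minimum stratum size is enforced.

```python
import math


def _check(N, S, c):
    if not N or not (len(N) == len(S) == len(c)):
        raise ValueError("N, S and c must be non-empty and of equal length")


def allocation_shares(N, S, c):
    """Fraction of the sample per stratum: n(h) proportional to N(h)S(h)/sqrt(c(h))."""
    _check(N, S, c)
    raw = [Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c)]
    total = sum(raw)
    return [r / total for r in raw]


def n_for_fixed_cost(N, S, c, budget, overhead):
    """Overall n that spends (budget - overhead) under the optimum allocation."""
    _check(N, S, c)
    num = sum(Nh * Sh / math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c))
    den = sum(Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N, S, c))
    return (budget - overhead) * num / den


def n_for_fixed_variance(N, S, c, V):
    """Overall n that attains variance V of the stratified mean at minimum cost."""
    _check(N, S, c)
    total_N = sum(N)
    W = [Nh / total_N for Nh in N]
    a = sum(Wh * Sh * math.sqrt(ch) for Wh, Sh, ch in zip(W, S, c))
    b = sum(Wh * Sh / math.sqrt(ch) for Wh, Sh, ch in zip(W, S, c))
    fpc = sum(Wh * Sh ** 2 for Wh, Sh in zip(W, S)) / total_N
    return a * b / (V + fpc)


def stratified_variance(N, S, n_h):
    """Variance of the stratified mean, with finite population correction."""
    total_N = sum(N)
    return sum((Nh / total_N) ** 2 * Sh ** 2 * (1.0 / nh - 1.0 / Nh)
               for Nh, Sh, nh in zip(N, S, n_h))


# Hypothetical strata: population sizes, standard deviations, unit costs.
N = [6000, 3000, 1000]
S = [10.0, 20.0, 40.0]
c = [25.0, 100.0, 400.0]

shares = allocation_shares(N, S, c)
n_cost = n_for_fixed_cost(N, S, c, budget=200_000, overhead=30_000)
n_var = n_for_fixed_variance(N, S, c, V=0.25)
print("shares:", shares)
print("n for fixed cost:", n_cost)
print("n for fixed variance:", n_var)
```

The helper `stratified_variance` is included so that a fixed-variance result can be checked by plugging the resulting allocation back into the variance of the stratified mean.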
standard errors for the following survey estimates (Table 1):

• A 20% population estimate at the person level for each specified domain (e.g., a percent population estimate such as the rate of uninsured for the population under age 65)
• Mean estimates of the following measures of health-care utilization and expenditures at the person level (precision requirement specified as an average relative standard error):
  – Total health expenditures
  – Utilization and expenditure estimates for inpatient hospital stays
  – Utilization and expenditure estimates for ambulatory physician visits
  – Utilization and expenditure estimates for dental visits
  – Utilization and expenditure estimates for prescribed medicines

To meet these requirements, the survey must include a minimum number of persons in each domain of interest. The sample sizes necessary to satisfy these precision requirements for the survey estimates are then derived, adjusting for survey nonresponse targets and assumptions regarding the survey’s sample design and estimated design effects. The necessary sample sizes required to meet the precision targets for survey estimates presented in Table 1 are specified in the following table (Table 2; Cohen 2000).

Table 1 Targeted average relative standard errors (RSEs) for subpopulations of analytic interest in the 1997 Medical Expenditure Panel Survey Household Component

Subpopulation                                             Average RSE for a         Average RSE for
                                                          population estimate of    mean use and
                                                          20% (e.g., percent        expenditure
                                                          uninsured)                estimates
Persons with family income less than 200% of              .020                      .035
  poverty level
Persons ages 18–64 predicted to incur high medical        .040                      .070
  expenditures
Persons 65 years and over                                 .042                      .070
Adults (18 and over) with functional impairments          .080                      .135
  measured in terms of ADLs (a)
Adults (18 and over) with other impairments               .080                      .135
  measured in terms of IADLs (b)
Children (under age 18) with activity limitations         .080                      .135
Overall population                                        .015                      .023

Source: Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality: Medical Expenditure Panel Survey Household Component, 1997
(a) Need help in one or more activities of daily living (ADLs), such as bathing and dressing
(b) Need help in one or more instrumental activities of daily living (IADLs), such as shopping or paying bills

Table 2 Targeted sample yields at the end of three core data collection rounds for 1997 for subpopulations of analytic interest: 1997 Medical Expenditure Panel Survey Household Component

Subpopulation                                             Targeted sample yield
Persons with family income less than 200% of              15,000
  poverty level
Persons ages 18–64 predicted to incur high medical        4,000
  expenditures
Persons 65 years and over                                 3,700
Adults (18 and over) with functional impairments          1,000
  measured in terms of ADLs (a)
Adults (18 and over) with other impairments               1,000
  measured in terms of IADLs (b)
Children (under age 18) with activity limitations         1,000
Overall population                                        34,000

Source: Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality: Medical Expenditure Panel Survey Household Component, 1997
(a) Need help in one or more activities of daily living (ADLs), such as bathing and dressing
(b) Need help in one or more instrumental activities of daily living (IADLs), such as shopping or paying bills
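The link between an RSE target and a sample size can be sketched in a few lines of Python. For a proportion p, the variance is S² = p(1 − p) and the estimate is p, so the RSE relationship n = S²/(ȳ² RSE²) reduces to n = (1 − p)/(p · RSE²) under simple random sampling, which is then inflated by a design effect. This is an illustrative sketch only: the function names are hypothetical, the design effect of 1.6 is an assumed value and not one reported in the source, and `kish_design_effect` uses the common approximation 1 + (m − 1)ρ for equal-sized clusters, which the chapter does not state.

```python
import math


def srs_n_for_rse(p, rse):
    """Sample size under simple random sampling so a proportion p meets the RSE target."""
    if not 0.0 < p < 1.0 or rse <= 0.0:
        raise ValueError("need 0 < p < 1 and rse > 0")
    return (1.0 - p) / (p * rse ** 2)


def design_n_for_rse(p, rse, deff):
    """Inflate the simple-random-sampling size by the design effect."""
    if deff <= 0.0:
        raise ValueError("design effect must be positive")
    return deff * srs_n_for_rse(p, rse)


def effective_sample_size(n, deff):
    """Underlying sample size divided by the design effect."""
    return n / deff


def kish_design_effect(cluster_size, rho):
    """Approximate design effect for equal-sized clusters (assumption, not from the source)."""
    return 1.0 + (cluster_size - 1.0) * rho


# RSE targets for a 20% population estimate, taken from Table 1.
targets = {
    "Family income < 200% of poverty": 0.020,
    "Ages 18-64, high predicted expenditures": 0.040,
    "65 years and over": 0.042,
    "Overall population": 0.015,
}
ASSUMED_DEFF = 1.6  # hypothetical value for illustration only

for label, rse in targets.items():
    n_srs = srs_n_for_rse(0.20, rse)
    n_design = design_n_for_rse(0.20, rse, ASSUMED_DEFF)
    print(f"{label}: SRS n = {math.ceil(n_srs)}, with assumed deff = {math.ceil(n_design)}")
```

The yields in Table 2 also reflect nonresponse targets and design assumptions that are not reproduced here, so the figures printed by this sketch are not expected to match that table.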
The current MEPS sample consists of approximately 14,000 households and 32,000 individuals and includes oversampling of African-Americans, Hispanics, Asians, and low-income households. With respect to desired levels of precision for survey estimates, a relative standard error (RSE) specification of less than or equal to 10% is recommended for survey estimates that characterize policy-relevant population subgroups which include racial and ethnic minorities (RSE(Y) = standard error(Y) divided by the estimate Y).

Building Survey Response Rates

In national household health-care surveys, significant amounts of resources are allocated to obtain the participation of households that constitute the last 5–10% of the overall survey response rate. A substantial number of households that respond toward the end of the survey field period are characterized by an initial refusal to participate. When the specified response rates are in jeopardy of not being met, concerted use of nonresponse conversion techniques is employed in tandem with occasional extensions of the length of the field period. Applications of these “ninth inning” field force engagements to achieve target survey response rates are not cost neutral and often result in significant increments to data collection costs.

The primary objective of this approach is to enhance overall longitudinal survey response rates and achieve a reduction in survey error attributable to nonresponse. It has also been noted that reluctant respondents occasionally differ from the more cooperative survey respondents on sociodemographic characteristics, which may translate to significant differences in the core analytic measures obtained from the survey (Stinchcombe et al. 1981; Cohen et al. 2000; Lynn 2009). These differences are a key reason for continuing to spend resources on following up with reluctant respondents. Alternatively, findings from the European Social Survey and a number of state-level health-related surveys in the USA suggest there are few statistically significant differences between the sample obtained before and after refusal conversion (Stoop et al. 2010). Reluctant respondents are also more likely to attrite over the course of the survey. Within fixed survey budget constraints, these costly late-stage call-back interviews impact on overall data quality, timeliness of data release, and overall sample size specifications.

Several studies have demonstrated the utility of subsampling nonrespondents in a survey to help minimize nonresponse bias and achieve efficiencies in data collection efforts. Many of these applications are modeled after the technique proposed by Hansen and Hurwitz (1946) to select a subsample from the nonrespondents to get an estimate for the subpopulation represented by the nonrespondents (Vartivarian et al. 2006). Variants of the procedure include application of double sampling for ratio and regression estimation with a subsampling of the nonrespondents.

The subsampling of nonrespondents is considered in order to limit survey costs while maintaining a nationally representative sample. In this vein, the National Survey of Family Growth has implemented a multiphase design which employs the subsampling of nonrespondents. These approaches are increasingly attractive to survey designers because they allow for methods to control the costs at the end of a data collection period while addressing concerns about nonresponse rates and errors. For many national in-person household surveys, large costs are incurred for travel to sample segments to interview a small set of sample units, usually those extremely difficult to contact in prior visits or repeatedly displaying some reluctance to respond to the survey. By restricting these expensive visits to a sample of the nonrespondents at the end of the study, a more cost-effective method concentrates remaining resources on increasing response rates. Additional examples of this approach are found in the General Social Survey, the National Comorbidity Survey Replication, the National Survey of America’s Families, and the National Survey of Recent College Graduates. Related efforts focused on subsampling callbacks to improve survey efficiency have yielded mixed results, with trivial savings achieved in applications to the National Comorbidity Survey, contrasted with more cost-effective results attained in the
American Community Survey. Adaptive survey designs have also been considered as a related framework to improve the efficiency of survey data collection through the application of more tailored data collection treatments for different households identified using paradata. A special case of an adaptive survey design is the responsive survey design, where alternative treatments or data collection strategies are identified (Groves and Heeringa 2006).

record all contacts (in-person, telephone, by mail) that are made with the household and whether they were successful or not. Where appropriate, the conversion attempt may involve reassigning work to a more experienced interviewer.

Estimation of Health-Care Parameters

Development of Sampling Weights
selecting the h-th primary sampling unit; $P_{i|h}$ is the second-stage conditional probability of selecting the i-th segment, given the h-th primary sampling unit is selected; $P_{j|hi}$ is the third-stage conditional probability of selecting the j-th housing unit, given the i-th segment in the h-th primary sampling unit is selected; and $P_{k|hij}$ is the final-stage conditional probability of selecting the k-th individual, given the j-th housing unit in the i-th segment in the h-th primary sampling unit is selected.

Generally, $P_{k|hij} = 1$, as all members of a sampled household are selected to participate in the survey with certainty. These sampling weights may be interpreted as inflation factors to represent the number of units in the target population associated with the respective sample unit.

Adjustments for Unit Nonresponse

Once the data collection effort is concluded, care must be taken to further adjust the survey unit sampling weights to correct for survey nonresponse. In general, the greater the difference among subgroups in response rates and the analytic characteristic(s) of interest, the greater is the need to adjust survey weights for nonresponse. In practice, weighting class nonresponse adjustments are implemented under the assumption that nonresponding sampling units would have responded in a manner similar to that of respondents with similar sociodemographic and economic characteristics within the same adjustment class. Properly designed, a weighting class nonresponse adjustment strategy can result in reduced nonresponse bias. The technique requires that the sample be partitioned into mutually exclusive and exhaustive classes, with classification information available for both responding and nonresponding units that are correlated with response propensity and the core criterion variables of the study (Cox and Cohen 1985).

In national health-care surveys, analyses are conducted of characteristics associated with differential nonresponse. These analyses help identify the most important measures to use in developing a nonresponse adjustment to the survey sampling weights to correct for potential nonresponse bias, most often applied at the housing-unit level. To facilitate these analyses, the demographic, socioeconomic, health-related, and interview-specific profiles of respondents and nonrespondents are examined, based on available data for both groups (Groves et al. 2009). Based on the results of these analyses, weighting classes are specified to adjust for housing unit nonresponse. For illustrative purposes, consider weighting classes defined by cross-classifications of the following measures from the Medical Expenditure Panel Survey (Cohen et al. 1999):

• Family income of primary reporting unit (less than $10,000; $10,000–19,999; $20,000–34,999; $35,000 or more; unknown)
• Size of dwelling unit (one, two, three, four, five or more)
• MSA size (MSA, population 500,000 or more; MSA, population less than 500,000; non-MSA)
• Region (Northeast, Midwest, South, West)
• Employment classification of reference person (government, private sector, not in labor force/never worked/worked without pay, unknown or under 18 years of age)
• DU-level personal help measure (units with at least one member unable to perform personal care activities or other routine needs, remaining units with person 70 and over, remaining units with no limitations)
• Propensity to cooperate, based on providing phone number during NHIS (phone number provided, phone present but no number provided, no phone, unknown)
• Age of reference person (under 25, 25–34, 35–44, 45–64, 65 and over)
• Race/ethnicity of reference person (Hispanic, black non-Hispanic, other)
• Sex of reference person
• Marital status (married, spouse present; other)

Overall, C cells were identified based on cross-classifications of these measures, with cell collapsing often specified according to a hierarchy determined by significance level to ensure adequate sample representation of the cell. Following
this approach, the nonresponse adjustment for the health care of individuals and families, and the
c-th weighting class takes the form health needs of specific population groups such as
P P the elderly and children. In longitudinal survey
BðcÞ ¼ iec EðiÞDUPSWT ðiÞ= iec RðiÞ designs with multiple rounds of data collection,
DUPSWT ðiÞ the overall survey response rate is a multiplicative
function of the round-specific response rates. In
where DUPSWT(i) is the initial housing unit addition to adjusting for survey nonresponse at the
weight for the i-th sample housing unit, which first round of a longitudinal survey with multiple
reflects the reciprocal of the housing unit’s overall rounds of data collection, additional adjustments
selection probability for the sample survey, to the estimation weights are necessary to
E(i) ¼ 1 for all survey housing units selected for help mitigate the potential influence of survey
interviews, E(i) ¼ 0 otherwise, R(i) ¼ 1 for all attrition on bias in estimates. When the rate of
selected housing units responding in the survey, R(i) = 0 otherwise, and i ∈ c represents eligible housing units classified in weighting class c. Consequently, the estimation weight adjusted for the respective survey's housing unit nonresponse, WGTHU1(i), for the i-th housing unit associated with class c, takes the form WGTHU1(i) = B(c) × DUPSWT(i). Generally, survey participation is an all-or-none decision for the entire household, so surveys that interview all members of sampled households will assume this nonresponse-adjusted household sampling weight, WGTSP1(i) = WGTHU1(i). Alternatively, when there is differential nonresponse within households, an additional weighting class adjustment should be implemented to correct for this additional level of person-level nonresponse in the survey.

Detailed studies of unit nonresponse in national health-care surveys have revealed that survey nonrespondents were more likely to consist of smaller households, reside in metropolitan areas, and have higher incomes.

Adjustments for Survey Attrition

Some of the large annual national health-care surveys also are characterized by a longitudinal design. The data collected in these ongoing longitudinal surveys may be designed to permit studies of the determinants of health insurance coverage and the use of health services and expenditures over time and to identify changes in the provision of health care in relation to social and demographic factors such as employment or income, the health status and satisfaction with

partial response is modest, it is often preferable to treat the partial respondents as complete nonrespondents. In this case, an additional weighting class adjustment to the survey estimation weight to control for survey attrition is appropriate. For example, if a survey required three rounds of data collection to obtain calendar year information for the population, the first-round person-level estimation weights would be adjusted for survey attrition in the following manner:

$$\mathrm{WGTSP2}(i) = F(c)\,\mathrm{WGTSP1}(i)$$

for the i-th person associated with class c, where the nonresponse adjustment for the c-th weighting class takes the form

$$F(c) = \frac{\sum_{i \in c} E(i)\,\mathrm{WGTSP1}(i)}{\sum_{i \in c} R(i)\,\mathrm{WGTSP1}(i)}$$

and WGTSP1(i) is the round 1 nonresponse-adjusted person-level weight for the i-th round 1 respondent; E(i) = 1 for all round 1 respondents with positive values of WGTSP1(i); E(i) = 0 otherwise; R(i) = 1 for all persons with E(i) = 1 who responded for their entire period of eligibility in the calendar year covered by the survey over all three data collection rounds; R(i) = 0 otherwise; and i ∈ c represents all full- and part-year respondents classified in weighting class c.

Often, a logistic regression analysis is used to identify the most important measures to include in specifying a nonresponse adjustment to the
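The attrition adjustment WGTSP2(i) = F(c) WGTSP1(i) described in this section can be illustrated with a minimal sketch. The record layout (the keys `cls`, `wgtsp1`, and `responded_all_rounds`) and the function name are hypothetical, and setting the adjusted weight of attrition nonrespondents to zero is an assumption of this sketch rather than something the text states.

```python
from collections import defaultdict


def attrition_adjusted_weights(persons):
    """Weighting-class attrition adjustment (illustrative sketch).

    persons: list of dicts with hypothetical keys
      'cls'                  -- weighting class c
      'wgtsp1'               -- round 1 weight WGTSP1(i)
      'responded_all_rounds' -- True if the person responded for the
                                entire period of eligibility
    Returns (factors, weights): F(c) per class and WGTSP2(i) per person.
    """
    eligible_sum = defaultdict(float)    # sum over class of E(i) * WGTSP1(i)
    respondent_sum = defaultdict(float)  # sum over class of R(i) * WGTSP1(i)
    for p in persons:
        e = 1 if p["wgtsp1"] > 0 else 0
        r = 1 if e and p["responded_all_rounds"] else 0
        eligible_sum[p["cls"]] += e * p["wgtsp1"]
        respondent_sum[p["cls"]] += r * p["wgtsp1"]

    # F(c) is only defined for classes with at least one full respondent.
    factors = {c: eligible_sum[c] / respondent_sum[c]
               for c in eligible_sum if respondent_sum[c] > 0}

    # Assumption: persons lost to attrition receive an adjusted weight of zero.
    weights = [
        factors[p["cls"]] * p["wgtsp1"]
        if p["wgtsp1"] > 0 and p["responded_all_rounds"] else 0.0
        for p in persons
    ]
    return factors, weights


sample = [
    {"cls": "A", "wgtsp1": 100.0, "responded_all_rounds": True},
    {"cls": "A", "wgtsp1": 100.0, "responded_all_rounds": False},
    {"cls": "A", "wgtsp1": 200.0, "responded_all_rounds": True},
    {"cls": "B", "wgtsp1": 50.0, "responded_all_rounds": True},
    {"cls": "B", "wgtsp1": 150.0, "responded_all_rounds": True},
]
factors, adjusted = attrition_adjusted_weights(sample)
```

In this toy example the respondents in class A carry the weight of the person lost to attrition, so the adjusted weights sum to the same class total as the round 1 weights.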
27 Survey Methods in Health Services Research 675
survey sources may still occur. Several factors can contribute to differences in estimates of health-care parameters across surveys. These factors include survey content and questionnaire design, definitions of the criterion measures, survey design and methods, and post-data collection processing such as editing, imputation, and estimation techniques. Survey design features such as length of recall period, sample design, and response rates affect the accuracy and precision of survey estimates of coverage. Alternative methodologies for editing the survey data, imputation procedures, and adjustments for survey nonresponse can also affect the final survey estimates that are generated. In addition, estimates within and across surveys differ depending on the duration of the time period that the survey estimates cover.

Variance Estimation Considerations

To obtain accurate estimates from complex survey data, for either descriptive statistics or more sophisticated analyses based on multivariate models, the survey design complexities need to be taken into account. This is achieved by applying the survey estimation weights to produce the survey estimates and using an appropriate technique to derive standard errors associated with the weighted estimates. Several methods for estimating standard errors for estimates from complex surveys have been developed, including the Taylor series linearization method, balanced repeated replication, and the jackknife method. The national health-care survey public use files generally include variables to obtain weighted estimates and to implement a Taylor series approach to estimate standard errors for weighted survey estimates. These variables, which jointly reflect the underlying survey design, include the estimation weight, sampling strata, and primary sampling unit (PSU) (Korn and Graubard 1999). The documentation and codebook for the public use files should contain these survey design variables. For example, the documentation should include the person weight, stratum, and PSU variables.

Statistical software packages that are commonly used to estimate standard errors from complex multistage designs using the Taylor series linearization method include SAS® (version 8.2 or higher), SUDAAN®, Stata®, and SPSS® (version 12.0 or higher). The software packages vary with respect to the specific types of estimates and models that can be produced accounting for the complex survey design and the treatment of missing data. For complete information on the capabilities of each package, analysts need to refer to the appropriate software user documentation manuals. The websites for SAS, SUDAAN, Stata, and SPSS are http://www.sas.com, http://www.rti.org, http://www.stata.com, and http://www.spss.com, respectively. The R language also has a package for complex survey analysis. Information on this package can be found in the June 2003 R News newsletter available on the R website at http://www.r-project.org.

Standard errors for these national survey estimates are most accurate when the analytic file contains all of the sample persons (e.g., those with positive values for the person weight variable) and the appropriate syntax is used to analyze population subgroups. The table above provides examples of basic programming code for SAS, SUDAAN, Stata, and SPSS to generate estimates from MEPS person-level files for the survey variable that measures annual health-care expenditures, totexp (Table 3).

Integrated Survey Designs: Analytical Enhancements Achieved through the Linkage of Surveys and Administrative and Secondary Data

The analytical capacity, quality, and data content of household-specific health and health-care surveys are visibly enhanced through integrated designs that feature one-to-one data linkages between surveys, administrative and secondary data, and future connectivity to electronic health records. The data linkages include direct matches to additional health and socioeconomic measures acquired for the same set of sample units from
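As a rough illustration of the Taylor series linearization approach described under Variance Estimation Considerations, the following sketch computes a weighted mean and its linearized standard error from the three design variables (estimation weight, stratum, and PSU). The tuple layout and function name are hypothetical. The sketch also assumes PSUs are treated as sampled with replacement within strata, with no finite population correction, which is a common simplification and not necessarily what any particular package does.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_with_se(records):
    """Weighted mean and Taylor-linearized standard error (sketch).

    records: iterable of (stratum, psu, weight, y) tuples; this layout
    is hypothetical and chosen only for illustration.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized variate for a ratio mean: z_i = w_i * (y_i - mean) / sum(w),
    # accumulated to PSU totals within each stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        zbar = sum(psus.values()) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, sqrt(variance)


# Toy design: two strata, two PSUs each, one person per PSU, equal weights.
design = [(1, 1, 1.0, 1.0), (1, 2, 1.0, 3.0), (2, 1, 1.0, 5.0), (2, 2, 1.0, 7.0)]
mean, se = weighted_mean_with_se(design)
```

For subgroup estimates, the text's advice to keep all sample persons in the analytic file matters here: dropping records can remove whole PSUs from a stratum and distort the between-PSU variance term.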
An Example of Survey Integration: The Medical Expenditure Panel Survey

One of the core health-care surveys in the USA, the Medical Expenditure Panel Survey (MEPS), is characterized by an integrated survey design. Since its inception, the primary analytical focus of the MEPS has been directed to the topics of health-care access, coverage, cost, and use. Over the past several years, the MEPS data have supported a highly visible set of descriptive and behavioral analyses of the US health-care system (Cohen et al. 2009). These include studies of the population's access to, use of, and expenditures and sources of payment for health care, the availability and costs of private health insurance in the employment-related and non-group markets, the population enrolled in public health insurance coverage and those without health-care coverage, and the role of health status in health-care use, expenditures, household decision making, and health insurance and employment choices. As a consequence of its breadth, the data have informed the nation's economic models and their projections of health-care expenditures and utilization. The level of the cost and coverage detail collected in the MEPS has enabled public and private sector economic models to develop national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy, as well as estimates of who benefits and who bears the cost of a change in policy. The MEPS consists of a family of three interrelated surveys: the Household Component (HC), the Medical Provider Component (MPC), and the Insurance Component (IC). The survey is sponsored by the Agency for Healthcare Research and Quality (AHRQ).

The MEPS Household Component was designed to provide annual national estimates of the health-care use, medical expenditures, sources of payment, and insurance coverage for the US civilian noninstitutionalized population. In addition to collecting data to yield annual estimates for a variety of measures related to health-care use and expenditures, MEPS also provides estimates of measures related to health status, demographic characteristics, employment, and access to health care. Estimates can be provided for individuals, families, and population subgroups of interest. The data collected in this ongoing longitudinal study also permit studies of the determinants of the use of services and expenditures and changes in the provision of health care in relation to social and demographic factors such as employment or income, the health status and satisfaction with health care of individuals and families, and the health needs of specific population groups such as the elderly and children.

The set of households selected for the Household Component is a subsample of those participating in the National Health Interview Survey (NHIS), an ongoing annual household survey of approximately 40,000 households (100,000 individuals) conducted by the National Center for Health Statistics and Centers for Disease Control and Prevention, to obtain national estimates of health-care utilization, health conditions, health status, insurance coverage, and access (Botman et al. 2000). In addition to the cost savings achieved by eliminating the need to independently list and screen households, selecting a subsample of NHIS participants has resulted in an enhancement in analytical capacity of the resultant survey data. The use of the NHIS data in concert with the data collected for the MEPS provides an additional capacity for longitudinal analyses not otherwise available. Furthermore, the large number and dispersion of the primary sampling units (~200) in MEPS has resulted in improvements in precision over prior expenditure survey designs.

The survey consists of an overlapping panel design in which any given sample panel is interviewed a total of five times in person over 30 months to yield annual use and expenditure data for two calendar years. These rounds of interviewing are spaced about 5–6 months apart. The interview is administered through a computer-assisted personal interview mode of data collection and takes place with a family respondent who reports for him/herself and for other family members. Currently, the MEPS sample consists of 14,000 families and 32,000 individuals and reflects an oversample of the following policy-relevant population subgroups:
Hispanics, blacks, and Asians. Data from two panels are combined to produce estimates for each calendar year.

The MEPS Medical Provider Component is a survey of the medical providers, facilities, and pharmacies that provided care or services to sample persons. The primary objective is to collect detailed data on the expenditures and sources of payment for the medical services provided to individuals sampled for the MEPS. Such data are essential to improve the accuracy of the national medical expenditure estimates derived from the MEPS, since household respondents are not always the most reliable source of information on medical expenditures. The data also serve as a primary imputation source of medical expenditure data to correct for the item nonresponse on this measure by the MEPS household sample participants.

The MEPS Insurance Component was designed to produce national and state-level estimates of the cost of employer-sponsored coverage. National, regional, and state-level estimates can be made of the amount, types, and costs of job-related health insurance. Interviews are conducted annually via mail with 30,000 establishments to obtain national and state-specific estimates of the availability of health insurance at the workplace, the type of coverage provided by employers, and the associated costs of coverage.

Advantages of Integrated Survey Designs

The original MEPS sample design called for an independent screening interview to identify a nationally representative sample and facilitate oversampling of policy-relevant population subgroups. Detailed information was to be obtained on sociodemographic, economic, and health status measures to support an oversample of the following policy-relevant groups:

• Adults (18 years and older) with functional impairments
• Children with limitations of activity
• Individuals 18–64 years who were predicted to incur high medical expenditures
• Individuals predicted to have family income less than 200% of the poverty level

Detailed probabilistic models were to be used to target the oversample of individuals likely to incur high levels of expenditures in addition to those with family incomes less than 200% of the poverty level. Data collection and training costs associated with this independent screening interview were projected to exceed several million dollars. As part of the DHHS Survey Integration Plan, this separate screening interview was eliminated. Instead, NHIS was specified as the sampling frame for MEPS. In addition to the cost savings achieved by substituting NHIS as the MEPS sample frame, the design modification resulted in enhanced analytic capacity of the resultant survey data. The use of the NHIS data in concert with the MEPS data provides an additional capacity for longitudinal analyses not available in the original design. Furthermore, the greater number and dispersion of the sample primary sampling units that comprise the MEPS national sample resulted in improvements in precision over the original design specifications. These features are in clear contrast to new frame construction and/or independent screening interviews that characterize unlinked survey design efforts.

The integrated survey design model also provides additional features with respect to improving data collection strategies tied to the core survey to better ensure that target response rates are achieved. When the core survey is linked to a larger host survey, the survey operations and field staff that are armed with detailed record of calls data from the host survey will be better poised to commit and target necessary nonresponse conversion techniques to those cases that included reluctant or hard to reach respondents in the prior data collection effort.

Capacity to Reduce Bias Attributable to Survey Nonresponse

As a consequence of the complex design of the MEPS HC, the MEPS sample data must be appropriately weighted to obtain
680 S. B. Cohen
approximately unbiased national estimates for the US civilian noninstitutionalized population. The MEPS estimation weights are built from the estimation weights developed for the NHIS. The use of a sampling weight that has already incorporated the selection probabilities of the sample design and appropriate nonresponse and poststratification adjustments is an added feature of the integrated survey design. Since survey nonresponse is potentially a significant source of bias in survey estimates, the MEPS dwelling unit sampling weights included an adjustment to help reduce its potential for bias. In general, the greater the difference among subgroups in response rates and the analytic characteristic(s) of interest, the greater is the need to adjust survey weights for nonresponse. In the absence of an integrated survey design, the nonresponse adjustment strategy adopted for the MEPS would be constrained to sociodemographic and economic information that was available at the geographic level (e.g., county, state, division, and region), rather than the detailed information available for each household participant in the NHIS sample selected for the MEPS. This is typical of standard household surveys, which use aggregate data at the geographic level to inform the nonresponse adjustments (e.g., per capita income for the county based on secondary data available from the Census, physicians per 1,000 population and other health manpower statistics at the county level available from the Area Resources File). In the absence of an integrated survey design for the MEPS, none of the household-specific information that factored into the nonresponse adjustments would be available, other than the measures of MSA size and region. Clearly, the MEPS linkage to the NHIS makes it possible to specify more direct nonresponse adjustments that better correct for survey nonresponse.

Another survey that benefits from this integrated design model is the Medicare Current Beneficiary Survey (MCBS) sponsored by the Centers for Medicare and Medicaid Services. The MCBS is a continuous, multipurpose survey of a nationally representative sample of aged, disabled, and institutionalized Medicare beneficiaries. It provides a comprehensive source of information on the health status, health-care use and expenditures, health insurance coverage, and socioeconomic and demographic characteristics of the entire spectrum of Medicare beneficiaries. Rather than being linked to a larger survey, the sample for MCBS is drawn from administrative records in CMS's Medicare enrollment file. The Medicare enrollment files also provide mailing addresses for the sample. Medicare administrative files provide not only the sample frame but also service, diagnosis, and charge details for covered events, month-by-month information on enrollment status, payments for Medicaid buy-ins and HMO membership, and data for nonrespondents to the interview.

Linked Provider Data on Expenditures Improves the Accuracy of National Medical Expenditure Estimates in the MEPS

The MEPS Medical Provider Component (MPC) was primarily designed to reduce the bias associated with national medical expenditure estimates derived from household reported data. The estimation strategy that has been considered to support the data replacement strategy is comprehensive in nature, making full use of MPC data to correct for missing and poor-quality household reported expenditure data. In addition, it provides the basis for a recalibration of household reported data, if significant reporting differentials are observed in expenditure data between households and medical providers.

Integrated Design Expands Capacity for Longitudinal Analyses

The MEPS survey integration with the National Health Interview Survey (NHIS) permits an enhanced capacity for longitudinal analyses of trends in health-care utilization, coverage, access, and health status. The parallel structures of the two surveys make their integration for longitudinal
analyses easier to accomplish. To facilitate the conduct of longitudinal cohort analyses using the NHIS and MEPS data in tandem, NHIS/MEPS linkage files have been developed. These NHIS/MEPS linkage files allow users to link persons in the MEPS public use files to the records of the same persons in the previous NHIS public use files. Examples of enhanced longitudinal analyses based on the NHIS-MEPS linked files include studies of the long-term uninsured and the conduct of episodes of illness studies over an extended time interval.

Integrated Design of MEPS Facilitates Examination of Response Error

In addition to serving as the primary source for the expenditures in the MEPS, the design of the Medical Provider Component provides data that could potentially facilitate adjustments to household reported utilization data that correct for reporting errors (both underreporting and overreporting (telescoping errors)), under the assumption that the medical provider reports are the "gold standard." Within a given event type, the number of reported events can be aggregated up to the person-provider pair level. The distribution of the difference in utilization counts between the medical provider and household reports can then be examined. For each event type at the person-provider level (ij), a difference measure, DIFFij, may be computed, where:

$$\mathrm{DIFF}_{ij} = \mathrm{MPSCOUNT}_{ij} - \mathrm{HHSCOUNT}_{ij}$$

MPSCOUNTij = the number of events for the person-provider pair reported in provider survey
HHSCOUNTij = the number of events for the person-provider pair reported in household survey

The use of MPC data to develop adjustment factors that recalibrate or correct household reported data to reflect utilization counts based on MPC data offers a capacity to inform a utilization adjustment to correct for potential response error associated with household reports. While the development of adjustment factors that correct for both underreporting and overreporting of health-care utilization by household respondents is permissible, which would allow for household event counts to be either scaled down or up, based on reported or imputed MPS information, an alternative approach would be to limit the adjustment to correct the outlier cases (the poorest household reporters of utilization).

Constraints

It is important to note that several of the desired features of an integrated survey design are the sources of its most prominent limitations. As a consequence of acquiring more information on survey respondents through data augmentation and data linkages over time, these analytical enhancements also increase the potential for disclosure of confidential information. To guard against this, it is necessary to impose greater restrictions on the release of data to the public. The sponsorship and operation of a data center to ensure that confidential data are held in a secure environment while permitting more detailed analyses to be conducted with the nonpublicly available data offers a compromise between greater data access and confidentiality protection of data. However, this investment in the development and operation of a secure data center requires additional funds that may compete with sample size enhancements or planned research efforts.

An integrated survey design also requires greater coordination across data sources and organizations. There are often competing demands on the host sample frames that may prevent the full benefits of an integrated design from being realized. Furthermore, the enhanced longitudinal data that comes with an integrated survey design will often be characterized by more frequent survey contacts and rounds of data collection, which will impact the overall survey response rate. When properly designed and coordinated, as implemented for the MEPS, the integrated survey design remains an attractive model for consideration and adoption.
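The person-provider difference measure DIFFij discussed under the examination of response error can be sketched as follows. This is a minimal illustration that assumes each reported event of a given type is available as a (person, provider) pair; the identifiers and the function name are hypothetical. The sign convention (provider count minus household count) follows the text's description of the difference between medical provider and household reports.

```python
from collections import Counter


def utilization_differences(provider_events, household_events):
    """DIFF_ij = MPSCOUNT_ij - HHSCOUNT_ij for each person-provider pair.

    Each argument is an iterable of (person_id, provider_id) tuples,
    one tuple per reported event of a single event type (hypothetical layout).
    """
    mps_count = Counter(provider_events)   # events reported by providers
    hhs_count = Counter(household_events)  # events reported by households
    pairs = set(mps_count) | set(hhs_count)
    # Counter returns 0 for pairs that appear in only one of the sources.
    return {pair: mps_count[pair] - hhs_count[pair] for pair in pairs}


provider = [("p1", "d1"), ("p1", "d1"), ("p2", "d1")]
household = [("p1", "d1"), ("p2", "d1"), ("p2", "d1"), ("p3", "d2")]
diff = utilization_differences(provider, household)
```

Under the "gold standard" assumption for provider reports, positive values point to household underreporting and negative values to household overreporting for that pair.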
Policy-Relevant Examples from the Medical Expenditure Panel Survey (MEPS)

Design of the MEPS to Inform Health Policy and Health Services Research

The MEPS research program, broadly defined to encompass data collection, data development, research, and the translation of research into practice, is directly tied to the strategic goal of identifying strategies to improve access, foster appropriate use, and reduce unnecessary expenditures. Few other surveys provide the foundation for estimating the impact of changes on different economic groups or special populations of interest, such as the poor, elderly, veterans, uninsured, and racial/ethnic groups. The public sector relies upon the MEPS research findings to evaluate health reform policies, the effect of tax code changes on health expenditures and tax revenue, and proposed changes in government health programs such as Medicare. In the private sector, these data are also used to develop economic projections.

The Medical Expenditure Panel Survey (MEPS), initiated in 1996, is designed as a continuous ongoing survey to permit annual estimates of health-care utilization, expenditures, insurance coverage, and sources of payment for the US civilian noninstitutionalized population. Over the past several years, the MEPS data and associated research findings have quickly become a linchpin for the nation's economic models and their projections of health-care expenditures and utilization. This combination of breadth and depth of the data enables public and private sector analysts to develop economic models designed to produce national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy, as well as estimates of who benefits and who bears the cost of a change in policy. Since 1977, AHRQ's expenditure surveys have been an important and unique resource for public and private sector decision makers. The survey is unique in the level of detail of information obtained on the health-care services used by Americans at the household level and their associated expenditures (for families and individuals); the cost, scope, and breadth of private health insurance coverage held by and available to the US population; and the specific services purchased through out-of-pocket and/or third-party payments.

The MEPS data support a wealth of basic descriptive and behavioral analyses of the US health-care system. These include studies of the population's access to, use of, and expenditures and sources of payment for health care, the availability and costs of private health insurance in the employment-related and non-group markets, the population enrolled in public health insurance coverage and those without health-care coverage, and the role of health status in health-care use, expenditures, household decision making, and health insurance and employment choices (Cohen et al. 2009; Cohen 2003).

Efforts to address inequities in the availability of private health insurance and to control health insurance premiums and medical care costs must necessarily focus on the employment-related health insurance market. Historically, the analyses of data from the MEPS family of surveys have figured prominently in this arena. As evidence, the recent Institute of Medicine (IOM) report "Health Insurance is a Family Matter" notes that "the most comprehensive data on who uses what health-care service and how much is paid for those services comes from the Medical Expenditure Panel Survey". MEPS-related analyses are prominently used to inform components of this IOM report focused on issues of insurance coverage and cost.

MEPS-derived estimates of the health insurance status of the US civilian noninstitutionalized population are critical to policymakers and others concerned with access to medical care and the cost and quality of that care. Health insurance helps people get timely access to medical care and protects them against the risk of expensive and unanticipated medical events. When estimating the size of the uninsured population, it is critical to consider the distinction between those uninsured for short periods of time and those who are long-term uninsured across several years. Compared to people with health-care coverage, uninsured people are less likely to visit a doctor,
have a usual source of medical care, receive preventive services, or have a recommended test or prescription filled. Consequently, individuals who experience extended periods of being uninsured are particularly at risk for restrictions in access to care and exposure to serious illness and significant financial jeopardy. Since many individuals undergo transitions in the acquisition and loss of health insurance coverage over time, an important consideration is the duration of spells of uninsurance and the capacity of this lack of coverage to lead to less efficient use of health-care services and facilities. In this regard, MEPS research efforts have demonstrated that individuals who experience short spells of being uninsured differ significantly from those who have been uninsured for more than a year on several dimensions, which include access to employer-sponsored coverage, their attitudes and preferences regarding the need for coverage, and their sensitivity to the cost of acquiring coverage. In addition to providing cross-sectional estimates of health insurance coverage each year, the MEPS has the added analytical capacity to identify individuals with gaps in coverage over time as well as the duration of the spells of being uninsured for up to 4 years.

In addition to measuring actual out-of-pocket financial burdens for health care, MEPS provides the only nationally representative data that can be used to measure the extent of underinsurance in the USA. Underinsurance is defined as being at risk of spending more than a certain amount of family income on out-of-pocket expenses in the event of a catastrophic medical illness. Estimates of the underinsured require linked information on families' health insurance benefits, family income, and risk of experiencing catastrophic medical events.

With health care absorbing increasing amounts of the nation's resources, the question of how to implement health system design innovations that encourage the provision of high-quality and efficient health-care delivery is a sentinel concern of both private and public payers. To effectively address this issue, researchers and policymakers have benefited from MEPS research findings to better understand how individual characteristics, behavioral factors, financial incentives, and institutional arrangements affect health-care expenditures in a rapidly changing health-care market. Research findings for the MEPS have also served to provide health-care decision makers with a better understanding of the highly concentrated nature of health-care expenditures and the persistence of these high expenditures over time. MEPS studies that examine the persistence of high levels of expenditures over time have been essential to help discern the factors most likely to drive health-care spending and the characteristics of the individuals who incur them.

Recently, greater attention and prioritization have been given to data collection procedures, predictive modeling, and estimation strategies that help improve the precision and quality of the survey estimates that characterize this policy-relevant population subgroup of individuals with high levels of medical expenditures. Research findings from MEPS also provide clear evidence of the utility and appropriateness of probabilistic models as prediction tools for identifying individuals likely to incur high levels of medical expenditures in future years. To the extent that this policy-relevant subset of the population is amenable to successful prediction through the application of well-developed models, the methodology continues to find several venues for application. Prominent examples of applications ripe for implementation include adoption of oversampling strategies for national health-care surveys and the identification of individuals whose health status improvements through disease management programs could most significantly result in potential reductions in overall future year health-care expenditures.

Given the growing attention focused on achieving a better understanding of the impact of rising prescribed medicine costs on health and the consumption of health services, it is also important to note the utility of the MEPS to inform studies examining the association between the use of newer medicines and morbidity, mortality, and health spending. Using this data resource, researchers have been able to determine the direction of the association between the use of newer drugs and all other types of nondrug medical spending. Attention has also focused on studies that identify inappropriate medication use, which
is a major patient safety concern and has significant consequences with respect to health-care
costs. With its wealth of data on health conditions, prescribed medication utilization and
expenditures, and associated therapeutic drug classifications, the MEPS data have also been
helpful to researchers attempting to identify potentially inappropriate medication use in the
community.

Issues on Measuring and Estimating Health Insurance Coverage in Surveys

Testing for the Impact of Survey Attrition on Health Insurance Coverage Estimates in the MEPS

The following study illustrates a test to assess the quality of the nonresponse adjustments
employed in the MEPS to adjust for potential nonresponse bias attributable to survey attrition.
The overlapping panel design of the MEPS survey is particularly well suited to inform these
studies. This comparison of the stability of national estimates of health insurance coverage,
subject to varying levels of survey attrition, made use of a model-based analysis that included
additional controls for other predispositional factors. More specifically, a multivariate
analysis was conducted to discern the influence of survey attrition on predicting the likelihood
of being uninsured after controlling for sociodemographic and economic factors associated with
this coverage measure. Building on previous research efforts that have identified salient
factors associated with the presence or absence of health insurance coverage, a logistic
regression model was developed to consist of the subset of significant predictors that
distinguished the uninsured from those with either public or private coverage (Cohen et al.
2006b; Cohen 2003).

Using data from the 2002 MEPS for individuals between the ages of 18 and 64, the following
factors were determined to be significant correlates in distinguishing between individuals
likely to be uninsured for the entire calendar year from their counterparts with some coverage:
age, gender, race/ethnicity, living in MSA, marital status, health status, presence of
limitation in activity, level of education, poverty status, born in USA, and total health-care
expenditures (Cohen 2003). Once these measures were controlled for in the logistic regression
model, it was possible to determine whether an individual’s classification with respect to MEPS
panel (year 1 of panel 7 vs. year 2 of panel 6), which varied significantly in terms of level of
survey attrition, influenced the prediction of the likelihood of being uninsured in a calendar
year. Under the assumption that the two distinct MEPS panels that are combined to produce annual
survey estimates were characterized by the same survey response rates, one would not expect to
observe a significant panel effect. Given the higher level of nonresponse across MEPS panels,
where the older panel is affected by greater levels of survey attrition, a test for a MEPS panel
effect affords the opportunity to assess the influence of unadjusted components of survey
attrition on health insurance coverage estimates in a modeling context. The results of the
logistic regression analysis reveal no significant effect for MEPS panel classification in
distinguishing the full-year uninsured individuals from their insured counterparts (Table 4),
when testing at an alpha level of .05. These results serve to further reinforce the efficacy of
the estimation strategies adopted in the MEPS to correct for the impact of survey attrition on
health insurance coverage estimates and related model-based studies.

Analyses Based on NHIS to MEPS Linkage

In addition to the within MEPS studies, the linkage of the MEPS to the NHIS permits a related
set of analyses to be conducted to discern the impact of survey attrition on national estimates.
The design permits appending to the MEPS sample the data profiles collected in the NHIS for the
prior year. Using the NHIS data in concert with the restricted sample of MEPS respondents
permits the derivation of national estimates for the prior year based on a NHIS subsample
characterized by a lower response rate. Using this design feature, the national estimates
derived from the MEPS sample, affected by survey attrition, may be compared to the national
estimates obtained from the full NHIS, prior to its linkage to MEPS.
Table 4 Logistic regression analysis of the uninsured, testing for panel effects, US civilian noninstitutional population,
ages 18–64, 2002

Contrast                          Degrees of freedom   Wald F   P-value (Wald F)
Overall model                     27                   137.91   0.0000
Model minus intercept             26                   72.69    0.0000
Panel classification              1                    1.08     0.2989
Sex                               1                    63.48    0.0000
Race/ethnicity                    3                    18.60    0.0000
Health status                     4                    3.52     0.0082
Limitation in activity            1                    14.71    0.0002
Marital status                    4                    20.59    0.0000
Highest year of education         4                    18.01    0.0000
Poverty status                    4                    62.87    0.0000
USBORN                            1                    83.88    0.0000
MSA status                        1                    4.02     0.0462
Income                            1                    34.34    0.0000
Total health-care expenditures    1                    34.91    0.0000

-2 * Normalized Log-Likelihood Full Model: 14,862.53
Pseudo Model R-Square: 0.175588
Source: Center for Financing, Access, and Cost Trends, AHRQ, Household Component of the Medical Expenditure
Panel Survey, 2002
In the National Health Interview Survey, three distinct measures of health insurance coverage
are collected as part of the annual survey. These measures determine insurance coverage status
at the time of the interview, whether there was a period of being uninsured during the 12-month
time frame preceding the interview, and the likelihood of being uninsured for durations that
exceed a year from the time of the interview. Each year, CDC’s National Center for Health
Statistics releases national estimates of the uninsured based on these measures, determining the
percent of the population uninsured at the time of the interview, uninsured for at least part of
the past year, and uninsured for more than a year.

The cross-sectional nature of the NHIS design, and its status as the initial baseline interview
for the MEPS, help facilitate the achievement of a survey response rate that has often exceeded
90%. Given the nationally representative nature of the subsample of the NHIS used for the MEPS
each year, one may produce national estimates of health insurance coverage using the NHIS
measures for the reserved MEPS subsample (prior to the conduct of MEPS interviews) that are
convergent with the estimates obtained from the full sample NHIS. Alternatively, national
estimates based on the same NHIS measures from the linked full-year 2002 MEPS survey will be
characterized by a response rate subject to three additional rounds of interviewing and
associated sample attrition. A comparison of the health insurance estimates, based on the NHIS
variables derived from the sample restricted to MEPS with the full sample NHIS national
estimates, permits another assessment of the impact of survey attrition on the resultant health
insurance coverage estimates.

Table 5 provides a summary of the national health insurance estimates derived from the NHIS for
calendar year 2001. In addition to including the overall estimates of health insurance coverage
for the nation, and for the population under age 65, the table includes further breakdowns
distinguished by age groups <18 and 18–64. National estimates of these NHIS measures from the
MEPS are derived from the MEPS full-year responding sample linked to the prior year NHIS. Based
on the full sample 2001 NHIS, the national estimates of being uninsured by specific time periods
for the entire US civilian noninstitutional population were 14.2% at the time of the interview,
17.8% for at least part of the past year, and 8.8% for being uninsured for more than a year
since the time of the interview.
Table 5 Comparison of 2001 national estimates of the uninsured derived from the 2001 NHIS and the 2002 MEPS
Percent of uninsured individuals, civilian noninstitutionalized population (standard error)

2001 estimates derived from the NHIS
Age group         Uninsured at time   Uninsured for at least   Uninsured for more
                  of interview        part of the past year    than a year
All ages          14.2 (0.23)         17.8 (0.26)              8.8 (0.17)
Under 65 years    15.9 (0.25)         20.0 (0.29)              9.9 (0.19)
18–64 years       18.0 (0.26)         22.0 (0.28)              11.6 (0.21)
Under 18 years    10.9 (0.34)         15.1 (0.41)              6.0 (0.24)

2001 NHIS estimates based on 2002 MEPS
Age group         Uninsured at time   Uninsured for at least   Uninsured for more
                  of interview        part of the past year    than a year
All ages          13.9 (0.52)         17.9 (0.58)              8.9 (0.45)
Under 65 years    15.6 (0.57)         20.0 (0.64)              10.0 (0.50)
18–64 years       17.6 (0.62)         22.2 (0.70)              11.7 (0.55)
Under 18 years    10.7 (0.76)         14.9 (0.89)              5.8 (0.60)

Sources: Center for Financing, Access, and Cost Trends, AHRQ, Household Component of the Medical Expenditure
Panel Survey, 2002
National Center for Health Statistics, CDC, National Health Interview Survey, 2001
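
The comparisons in Table 5 can be approximated directly from the published estimates and standard errors. The sketch below computes a simple z statistic for the "All ages" row. It assumes the two estimates are independent, which is a simplification: the MEPS-linked sample is a subsample of the NHIS, and the chapter does not state which test was actually applied. The function and variable names are illustrative only.

```python
import math

# "All ages" row of Table 5: (full 2001 NHIS estimate, SE) and
# (NHIS measure restricted to 2002 MEPS respondents, SE), in percent.
ALL_AGES = {
    "at time of interview": ((14.2, 0.23), (13.9, 0.52)),
    "at least part of past year": ((17.8, 0.26), (17.9, 0.58)),
    "more than a year": ((8.8, 0.17), (8.9, 0.45)),
}


def z_statistic(est_a, se_a, est_b, se_b):
    """z statistic for a difference of two estimates.

    Assumption: the estimates are treated as independent, so the variance
    of the difference is the sum of the two squared standard errors.
    """
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_values = {
    measure: z_statistic(full[0], full[1], linked[0], linked[1])
    for measure, (full, linked) in ALL_AGES.items()
}

for measure, z in z_values.items():
    print(f"{measure}: z = {z:.2f}, p = {two_sided_p(z):.2f}")
```

Under this simplified test, every |z| falls well below 1.96, which agrees with the reported absence of significant differences at the .05 level.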
Restricting the sample to the full-year MEPS respondents for the subsequent year, the
corresponding NHIS-specific national estimates of the uninsured were 13.9% at the time of the
interview, 17.9% for at least part of the past year, and 8.9% for being uninsured for more than
a year since the time of the interview. As can be observed from a review of the comparisons of
the MEPS and NHIS-generated estimates of the uninsured, no significant differences in estimates
are evident when testing at an alpha level of .05. A comparison of the NHIS-derived and the
MEPS-derived coverage estimates for the population under age 65 and for age groupings <18 and
18–64 revealed similar levels of convergence. Once again, the results present no evidence of
nonresponse bias attributable to survey attrition affecting the national coverage estimates when
subject to more restrictive response rate requirements in MEPS.

The Utility of Prediction Models to Oversample the Long-Term Uninsured

Estimates of the health insurance status of the US civilian population are critical to
policymakers and others concerned with access to medical care and the cost and quality of that
care. Health insurance helps people get timely access to medical care and protects them against
the risk of expensive and unanticipated medical events. When estimating the size of the
uninsured population, it is important to consider the distinction between those uninsured for
short periods of time and those who are uninsured for several years. Given the risk of exposure
to high out-of-pocket medical expenditures faced by the long-term uninsured and associated
economic and health-related consequences, this population subgroup is of particular relevance to
health policy considerations. Consequently, a prediction model that can accurately identify the
long-term uninsured is an important analytical tool. These models have particular relevance as
statistical tools to facilitate efficient sampling strategies that permit the selection of an
oversample of individuals likely to be uninsured for long periods in the future. This discussion
provides a summary of the development of prediction models to identify the long-term uninsured
adults under age 65 and includes an evaluation of their potential utility as an oversampling
strategy for use in the Medical Expenditure Panel Survey, a core national longitudinal medical
care expenditure survey with comprehensive data on health insurance status. This type of
modeling effort also enhances the ability to discern the causes of being uninsured and the
characteristics of the individuals who are without
coverage. This feature also applies to prediction models that can accurately identify those
individuals with transitions in coverage or with no gaps in coverage over a given time interval.

To improve the precision of survey estimates that characterize policy-relevant population
subgroups in a cost-efficient manner, oversampling strategies are traditionally included as a
core survey design component and implemented in the sample selection phase. When the
characteristics of the population that are targeted for an oversample are static in nature, and
the sampling frame that will be utilized contains the essential data to facilitate accurate
identification of the respective target subpopulation, the underlying conditions permit a
straightforward application of disproportionate sampling techniques. Alternatively, when the
characteristic of the population targeted for an oversample is subject to transitions over time,
the oversampling strategy is subject to much greater uncertainties in terms of achieving the
desired sample size enhancements. The greater the departure from a static characteristic, the
more challenging the effort and the less certain the outcome. Other obstacles that further limit
the successful application of oversampling strategies relate to the level of availability of the
key measures essential for the identification of the targeted population subgroup. Consequently,
when attention is directed to an effort that attempts to increase the sample yield in a survey
of individuals likely to be long-term uninsured in the future, the operation is subject to both
constraints at its inception: (1) the focus on a characteristic that is subject to change and
(2) a restricted set of predictor measures available on a sampling frame.

Analytical Framework: Model Development

Given the analytical and substantive importance of those individuals that are without health
insurance coverage for extended periods of time (in a given year or longer period duration), the
development and specification of accurate models to predict the future likelihood of the
occurrence of this event are highly desirable. At the outset, the specification of a clear
definition of what constitutes the long-term uninsured is critical. For this study, the ultimate
objective was to develop the best model to predict the set of adults under the age of 65 who are
without any health insurance coverage for two consecutive calendar years (Cohen and Yu 2009).
With these parameters set, a logistic model specification was considered as most relevant for
predicting the set of adults under age 65 most likely to be continuously uninsured for two
consecutive calendar years. The longitudinal design of the MEPS, with two consecutive years of
data on health-care coverage, use, and expenditures, was ideally suited to permit model
development and evaluation.

The logistic model under consideration classified individuals without coverage for two
consecutive calendar years as Y = 1, with all other individuals classified as Y = 0. Alternative
definitions of the long-term uninsured, such as lacking coverage for more than a year or being
continuously uninsured for more than 2 years, are likewise viable. All the predispositional
variables included as potential predictors were based on an individual’s data profile prior to
the 2-year period of interest. This modeling effort for predicting future health insurance
coverage status builds off related efforts that were likewise limited to consideration of the
immediate prior year’s predispositional characteristics.

Several studies using MEPS data have identified factors associated with distinguishing
individuals most likely to be characterized as the long-term uninsured (Selden and Hudson 2006;
Short and Graefe 2003). Given the rare classification of children under the age of 18 as
long-term uninsured (only 2% of children were continuously uninsured over the period 2002–2005),
the modeling effort was further restricted to adults between the ages of 18 and 64. The
precursor information characterizes an individual’s status at a baseline period, which is
defined as the year prior to the 2-year period of analytical focus and interest. In developing
the prediction model, a core set of potential predispositional measures was identified that were
applicable to health insurance take-up models and readily available from a screener interview.
These included age, gender, race/ethnicity, health status, limitations
in ability to work, marital status, education level (as measured by highest year of education
completed), region, MSA status, presence of hospitalization, nativity in USA, family size,
poverty status, and health insurance coverage status at time of screening (prior status). More
specifically, the measure of prior coverage distinguished whether the individual was covered in
the prior year or, if not, the period of time without coverage (<6 months, 6 months to <1 year,
1 year to <3 years, or 3 years or more).

As part of this study, three alternative prediction models were fit to the longitudinal data
from the MEPS, the 2004–2005 panel linked to the 2003 NHIS which served as both the MEPS sampling
frame and screening interview. In this setting, Model 1 makes use of the full set of potential
predictors that are available from the National Health Interview Survey for purposes of
facilitating an oversample of individuals predicted to be long-term uninsured in the MEPS. To
assess the performance of the fully specified model relative to a model based on a more
restricted set of measures, two additional models were considered for comparative purposes. The
second model that is considered (Model 2) is restricted to a single measure of one’s insurance
status at baseline, further distinguished by length of time without coverage for those uninsured
at baseline. From a survey operations perspective, the straightforward application and limited
data requirements of this model have particular appeal. Alternatively, the third model (Model 3)
replicates the set of measures considered for Model 1 with the exclusion of the insurance status
measure at baseline.

Likelihood of Being Continuously Uninsured in 2004–2005, Based on 2003 Profiles

In the final logistic regression model developed for predicting adults between the ages of 18
and 64 likely to be continuously uninsured for two subsequent years, baseline health insurance
status, race/ethnicity, marital status, education level (as measured by highest year of
education completed), nativity in USA, income, and gender were determined to be significant
predictors when testing at a .05 level of significance among the potential set of explanatory
measures under consideration (Table 6). The standard errors of all the survey estimates derived
from the MEPS in this study and associated test statistics have been adjusted for the impact of
clustering due to the multistage survey design and unequal weighting.

Individuals with the longest durations of prior spells without coverage were significantly more
likely to be continuously uninsured over the subsequent 2-year period. Hispanics, males, and
individuals born outside of the USA were also more likely to be continuously uninsured in the
future. Furthermore, low income, less than 12 years of education, residence in the South or
Midwest, and never having been married as of 2003 were associated with a greater likelihood of
being classified as long-term uninsured for the period 2004–2005. Finally, a likelihood ratio
test for the goodness of fit for this model rejected the null hypothesis that the model’s
coefficients were jointly equal to zero. The pseudo-R2 for the model was 0.228, and it had the
lowest Akaike information criterion (AIC = 4572.3). A receiver operating characteristic (ROC)
analysis was also performed for each model, examining the area under the curve (AUC). The
selected model also exhibited the highest AUC (.880).

Remarkably, the second model under consideration (Model 2) that was restricted to a single
measure of one’s insurance status, further distinguished by length of time without coverage for
those uninsured at baseline, exhibited a relatively comparable goodness of fit and a pseudo-R2 of
0.195 (Table 7). Alternatively, the third model (Model 3), which replicated the set of measures
considered for Model 1 with the exclusion of the insurance status measure at baseline, exhibited
a weaker goodness of fit and the lowest pseudo-R2 of 0.130.

Determination of the Cutoff Threshold in Predicted Probability to Facilitate Oversampling

Once these predictive models to identify the likelihood of being continuously uninsured have
been developed, additional analyses are necessary to
Table 6 Logistic regression model to identify individuals aged 18–64 likely to be continuously uninsured in 2004–2005,
based on 2003 profiles (2004–2005 MEPS, 2003 NHIS)

Independent variables and effects    Beta coeff.   SE beta   T-test B = 0   P-value T-test B = 0
Intercept                            2.30224       0.24615   9.35282        0.00000
Sex
  Female                             0.52614       0.08397   6.26584        0.00000
Race/ethnicity recode
  Hispanic                           0.53787       0.14531   3.70152        0.00027
  Non-Hispanic Black                 0.05864       0.17903   0.32753        0.74355
  Non-Hispanic Others                0.36421       0.30747   1.18455        0.23737
Region
  Midwest                            0.54791       0.18237   3.00437        0.00294
  South                              0.86221       0.17992   4.79216        0.00000
  West                               0.37245       0.20332   1.83179        0.06822
MARITL
  Married/DK                         0.71323       0.11881   6.00321        0.00000
  Widowed/divorced/separated         0.33652       0.14520   2.31754        0.02132
  Living w/partner                   0.32202       0.17812   1.80787        0.07188
EDUCYR
  12 years/GED                       0.19730       0.11301   1.74583        0.08212
  Some college/DK                    0.37671       0.12214   3.08426        0.00228
  BA/BS degree                       0.74324       0.19376   3.83588        0.00016
  Adv degree                         0.91486       0.22327   4.09753        0.00006
USBORN
  No/DK                              0.58786       0.13694   4.29267        0.00003
INCOME
  $20K–$75K                          0.38947       0.10632   3.66309        0.00031
  $75K+                              0.72273       0.21587   3.34795        0.00095
How long since last had health coverage covered
  <= 6 months                        1.72635       0.17993   9.59470        0.00000
  6 months–1 year                    2.00910       0.19350   10.38289       0.00000
  1–3 years                          2.32315       0.14945   15.54488       0.00000
  3 years+/DK                        2.93583       0.11909   24.65196       0.00000

Sample size: 8888
Pseudo-R2: 0.228
-2 * Normalized Log-Likelihood Full Model: 4528.34
Approximate chi-square (-2 * Log-L Ratio): 2298.75
Degrees of freedom: 21
Source: 2004–2005 Medical Expenditure Panel Survey, Center for Financing, Access and Cost Trends, Agency for
Healthcare Research and Quality; 2003 NHIS, NCHS, CDC
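
The summary statistics reported beneath Tables 6 and 7 can be cross-checked against one another. The sketch below makes three assumptions that the chapter does not state explicitly: the "Normalized Log-Likelihood Full Model" values are read as -2 log-likelihoods, the pseudo-R2 is taken to be of the Cox-Snell form, 1 - exp(-chi-square / n), and the AIC is computed as -2 log-likelihood plus twice the number of parameters (21 effects plus the intercept for Model 1). Under those assumptions, the reported figures are reproduced.

```python
import math

N = 8888  # sample size reported for both models

# Values transcribed from the footers of Table 6 (Model 1) and Table 7 (Model 2).
MODEL_1 = {"neg2ll": 4528.34, "chi_square": 2298.75, "df": 21}
MODEL_2 = {"neg2ll": 4894.40, "chi_square": 1932.69, "df": 4}


def cox_snell_r2(chi_square, n):
    """Cox-Snell pseudo-R2 (assumed form): 1 - exp(-LR chi-square / n)."""
    return 1.0 - math.exp(-chi_square / n)


def aic(neg2ll, df):
    """AIC = -2 log-likelihood + 2 * (number of slopes + intercept)."""
    return neg2ll + 2 * (df + 1)


def intercept_only_neg2ll(model):
    """-2 log-likelihood of the intercept-only model implied by the LR test."""
    return model["neg2ll"] + model["chi_square"]


r2_model_1 = cox_snell_r2(MODEL_1["chi_square"], N)
r2_model_2 = cox_snell_r2(MODEL_2["chi_square"], N)
aic_model_1 = aic(MODEL_1["neg2ll"], MODEL_1["df"])

print(f"Model 1 pseudo-R2: {r2_model_1:.3f}")
print(f"Model 2 pseudo-R2: {r2_model_2:.3f}")
print(f"Model 1 AIC: {aic_model_1:.1f}")
print(f"Implied intercept-only -2LL: {intercept_only_neg2ll(MODEL_1):.2f} "
      f"vs {intercept_only_neg2ll(MODEL_2):.2f}")
```

Both models imply the same intercept-only value, as they should when fit to the same 8888 individuals.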
identify the appropriate cutoff threshold in predicted probability for screening purposes to
facilitate an oversample of this target population. To accomplish this determination of an
operational cutoff point for each model, the predicted probabilities of being identified as
continuously uninsured were determined for each sample individual based on the underlying model
specification and rank ordered from highest probability to lowest. The predicted probability of
being uninsured for two consecutive years in the future (P = exp(y)/(1 + exp(y))) was derived
from a transformation of an individual’s predicted log odds (y) based on the respective
prediction model under consideration. Based on MEPS longitudinal data, 12.9% of the US civilian
Table 7 Logistic regression model to identify individuals ages 18–64 likely to be continuously uninsured in 2004–2005,
based on 2003 coverage profiles (2004–2005 MEPS, 2003 NHIS)

Independent variables and effects    Beta coeff.   SE beta   T-test B = 0   P-value T-test B = 0
Intercept                            2.98189       0.07370   40.45741       0.00000
How long since last had health coverage covered
  <= 6 months                        1.99909       0.16837   11.87347       0.00000
  6 months–1 year                    2.33658       0.17966   13.00526       0.00000
  1–3 years                          2.69189       0.13720   19.61994       0.00000
  3 years+/DK                        3.48061       0.11050   31.49911       0.00000

Sample size: 8888
Pseudo-R2: 0.195
-2 * Normalized Log-Likelihood Full Model: 4894.40
Approximate chi-square (-2 * Log-L Ratio): 1932.69
Degrees of freedom: 4
Source: 2004–2005 Medical Expenditure Panel Survey, Center for Financing, Access and Cost Trends, Agency for
Healthcare Research and Quality; 2003 NHIS, NCHS, CDC
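
The link between a predicted log odds y and the predicted probability P = exp(y)/(1 + exp(y)) can be illustrated with the cutoff probabilities reported for the three models (0.355, 0.268, and 0.428). The sketch below is illustrative only, and the function names are not taken from the chapter. Because all three cutoff probabilities are below 0.5, the corresponding log odds are necessarily negative.

```python
import math


def inv_logit(y):
    """Predicted probability from log odds: P = exp(y) / (1 + exp(y))."""
    return math.exp(y) / (1.0 + math.exp(y))


def logit(p):
    """Log odds from a probability: y = ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Cutoff probabilities reported in the text for each prediction model.
CUTOFF_PROBABILITY = {"Model 1": 0.355, "Model 2": 0.268, "Model 3": 0.428}

cutoff_log_odds = {m: logit(p) for m, p in CUTOFF_PROBABILITY.items()}

for model, y in cutoff_log_odds.items():
    print(f"{model}: P = {CUTOFF_PROBABILITY[model]:.3f}, log odds = {y:.3f}")
```

In a screening interview, an individual would be flagged for the oversample when the log odds computed from the model coefficients meet or exceed the model-specific cutoff.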
noninstitutionalized population between the ages of 18 and 64 were continuously uninsured for
the period 2004–2005. Consequently, the initial cutoff point for prediction classification was
established by determining the value of the predicted probability above which the sum of the
estimation weights associated with the MEPS sample participants represented the top 12.9% of the
distribution of the ranked prediction probabilities of being long-term uninsured. As a
consequence of the disproportionate sampling scheme adopted in the MEPS to facilitate
oversampling of policy-relevant population subgroups, and additional adjustments to the
estimation weights to adjust for nonresponse and poststratification, it was necessary to
determine the cutoff point based on an estimated population-based distribution of predicted
probabilities to ensure greater applicability of the approach beyond the MEPS setting. This
cutoff translated to a predicted probability of 0.355 (or log odds of -0.598) based on the fully
specified model (Model 1). Similarly, when considering the model that was restricted to a single
measure of one’s insurance status at baseline (Model 2), the cutoff translated to a predicted
probability of 0.268 (or log odds of -1.006). Alternatively, for the model which excludes the
insurance status measure at baseline (Model 3), the cutoff translated to a predicted probability
of 0.428 (or log odds of -0.290).

With respect to those who were long-term uninsured, 59.5% were correctly identified by the
model, based on the initial cutoff rule applied to the Model 1 predicted likelihood (model
sensitivity; Table 8). In addition, of those with some coverage over the 2-year period, 94.0%
were correctly identified (model specificity), based on their predicted likelihood relative to
the cutoff threshold. When examining predictive capacity, 59.5% of individuals predicted to be
long-term uninsured were correctly classified by the model. It was also observed that when
considering higher values for the threshold cutoff (top 10%, top 5%), the potential predictive
capacity of the model in identifying the long-term uninsured increased (Table 8). Using the top
5% as the threshold, the percent of those predicted to be long-term uninsured who were correctly
classified rose to 73.7%. However, this gain in model predictive capacity was at the expense of
potential sample yield, given the greater restriction on the resultant eligible sample that fell
above the threshold. When simultaneously considering model performance on predictive capacity,
sensitivity, and specificity, while efficiently achieving accurate targeted yields from
oversampling subject to fixed overall sample size constraints, adoption of the initial cutoff
rule was the preferred approach. By establishing a cutoff rule in this manner, one has the
capacity to implement a sample selection scheme permitting the oversampling of the long-term
uninsured in “real time,” via a screening
Table 8 Examination of the sensitivity, specificity, and predictive capacity of alternative cutoff values – model 1

Likelihood of lower      Logit     Predicted
pred. prob. of long-     cutoff    probability                               True       False
term uninsured                     cutoff        Sensitivity   Specificity   positive   negative
0.8000                   -1.8167   0.1398        0.7350        0.8789        0.4730     0.0427
0.8100                   -1.6711   0.1583        0.7194        0.8882        0.4876     0.0446
0.8200                   -1.5156   0.1801        0.7042        0.8975        0.5037     0.0465
0.8300                   -1.3715   0.2024        0.6905        0.9068        0.5227     0.0480
0.8400                   -1.1730   0.2363        0.6723        0.9157        0.5411     0.0502
0.8500                   -0.9506   0.2788        0.6479        0.9236        0.5561     0.0534
0.8600                   -0.7902   0.3121        0.6275        0.9320        0.5771     0.0558
0.8700                   -0.6228   0.3491        0.5988        0.9392        0.5929     0.0594
0.8714                   -0.5975   0.3549        0.5951        0.9401        0.5948     0.0599
0.8800                   -0.4734   0.3838        0.5694        0.9463        0.6106     0.0630
0.8900                   -0.3303   0.4182        0.5335        0.9526        0.6244     0.0675
0.9000                   -0.1804   0.4550        0.4924        0.9579        0.6337     0.0726
0.9100                   -0.0699   0.4825        0.4531        0.9636        0.6481     0.0774
0.9200                   0.0788    0.5197        0.4180        0.9698        0.6719     0.0815
0.9300                   0.2582    0.5642        0.3765        0.9751        0.6912     0.0864
0.9400                   0.4694    0.6152        0.3317        0.9801        0.7112     0.0916
0.9500                   0.6293    0.6523        0.2869        0.9849        0.7371     0.0967

Source: 2004–2005 Medical Expenditure Panel Survey, Center for Financing, Access and Cost Trends, Agency for
Healthcare Research and Quality; 2003 NHIS, NCHS, CDC
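
The quantities in Table 8 follow from two steps: locating the predicted probability that marks off a given weighted share at the top of the ranked distribution, and then tabulating weighted classification rates at that cutoff. The sketch below illustrates both steps on a small made-up data set; the probabilities, weights, and outcomes are hypothetical and are not MEPS data, and the function names are illustrative. Design-based variance estimation is outside its scope.

```python
def weighted_cutoff(probs, weights, top_share):
    """Predicted probability at which the weighted share of cases ranked
    at or above it first reaches top_share of the total weight."""
    total = sum(weights)
    running = 0.0
    for p, w in sorted(zip(probs, weights), reverse=True):
        running += w
        if running >= top_share * total:
            return p
    return min(probs)


def classification_rates(probs, weights, outcomes, cutoff):
    """Weighted sensitivity, specificity, and predictive capacity
    (share of flagged cases that are truly long-term uninsured)."""
    tp = fp = fn = tn = 0.0
    for p, w, y in zip(probs, weights, outcomes):
        flagged = p >= cutoff
        if flagged and y:
            tp += w
        elif flagged:
            fp += w
        elif y:
            fn += w
        else:
            tn += w
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "predictive_capacity": tp / (tp + fp),
    }


# Hypothetical screening sample: predicted probability, estimation weight,
# and observed outcome (1 = continuously uninsured for two years).
probs = [0.90, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.05]
weights = [1, 1, 2, 1, 2, 1, 1, 1]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0]

cutoff = weighted_cutoff(probs, weights, top_share=0.4)
rates = classification_rates(probs, weights, outcomes, cutoff)
print(cutoff, rates)
```

Raising the cutoff in this toy example, as in Table 8, trades sensitivity for predictive capacity.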
Table 9 Required sample size of adults 18–64 to yield a sample of 1760 individuals continuously without health
insurance coverage over 2 years (50% increase)

                                No model-     Model-based          Model-based               Model-based
                                based         oversample:          oversample:               oversample:
                                oversample    model 1 – fully      model 2 – single          model 3 – excludes
                                              specified model      baseline coverage         baseline coverage
                                                                   measure                   measure
Required overall sample size    15,000        11,058               11,028                    11,512
Oversampling rate               N.A.          1.80                 1.75                      2.58
Model prediction rate –
% correct predictions           N.A.          55.5%                57.1%                     38.8%

Assumes base sample size of 10,000 individuals aged 18–64 in a MEPS panel responding for their entire 2-year period of
eligibility in the survey
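
The required sample sizes in Table 9 appear to follow from simple arithmetic on the model prediction rates. The sketch below is an inferred reconstruction, not a formula stated in the chapter: it assumes that the 587 additional long-term uninsured cases are obtained by adding individuals flagged by the model, of whom only the model's prediction rate turn out to be long-term uninsured. The "oversampling rate" row is not reproduced here.

```python
BASE_SAMPLE = 10_000   # base sample of adults aged 18-64 (Table 9 footnote)
BASE_YIELD = 1_173     # long-term uninsured expected in the base sample
TARGET_YIELD = 1_760   # 50% increase over the base yield

# Model prediction rates (share of flagged individuals correctly predicted).
PREDICTION_RATE = {"Model 1": 0.555, "Model 2": 0.571, "Model 3": 0.388}

# Required overall sample sizes as published in Table 9.
TABLE_9 = {"Model 1": 11_058, "Model 2": 11_028, "Model 3": 11_512}


def required_sample(prediction_rate):
    """Base sample plus the flagged individuals needed to find the extra cases.

    Assumption: each additional flagged individual is long-term uninsured
    with probability equal to the model prediction rate.
    """
    extra_cases = TARGET_YIELD - BASE_YIELD
    return BASE_SAMPLE + extra_cases / prediction_rate


# Without a model, the whole sample must grow in proportion to the target.
no_model_sample = BASE_SAMPLE * TARGET_YIELD / BASE_YIELD

required = {m: required_sample(r) for m, r in PREDICTION_RATE.items()}

print(f"No model: {no_model_sample:.0f}")
for model, n in required.items():
    print(f"{model}: {n:.0f} (Table 9: {TABLE_9[model]})")
```

Under this reading, the computed values agree with Table 9 to within one sampled individual, the small differences being attributable to rounding.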
interview that collects the necessary input information required for the prediction model under
consideration.

Examination of the Sensitivity, Specificity, and Predictive Capacity of Alternative Probabilistic Models

Once a parsimonious model was identified, which consisted of the subset of predictors that were
all significant at the .05 level, the model was ready to be evaluated in terms of its accuracy
in predicting those adults under age 65 who would be continuously without coverage for the
subsequent 2-year period. The performance of the model was assessed based upon an independent
representative sample that characterizes the nation’s health insurance coverage experience. In
this setting, the design of the MEPS is uniquely suited to this more rigorous criterion to
assess model performance. This condition was satisfied through development of the prediction
model using data from one specific MEPS longitudinal panel and then applying the model to an
independent MEPS longitudinal panel to assess model performance. Since the model was developed
using MEPS data from the 2004 to 2005 longitudinal panel, the model was then applied to a
different MEPS panel to assess performance. In addition, the model’s performance was also
assessed in relation to the two alternative prediction models under consideration (Model 2:
prior coverage status only; and Model 3: no inclusion of prior coverage status).

Model performance was then evaluated based upon predictive capacity, sensitivity, and
specificity, using the distinct predicted probability cutoff thresholds established with the
2004–2005 MEPS longitudinal panel for the three models. For the fully specified model (Model 1),
the threshold cutoff point for selecting an oversample of long-term uninsured individuals was
.355. Using the 2001 NHIS data in tandem with the associated model coefficients to derive a
predicted probability of being continuously uninsured in 2002–2003, all individuals with a
predicted probability of 0.355 or greater were targeted for an oversample. In the same manner,
the threshold cutoff points established for Models 2 and 3 were 0.268 and 0.428, respectively.
Based on MEPS longitudinal data, 11.7% of the US civilian noninstitutionalized population
between the ages of 18 and 64 were continuously uninsured for the period 2002–2003. Using the
predetermined cutoff points for the respective models, the overall percent of the population
predicted to be classified as long-term uninsured by Model 1 was most consistent with the
population estimate of 11.7% (11.6%), with both Models 2 and 3 yielding predicted population
estimates below 10% based on the preestablished cutoff thresholds.

An assessment of the sensitivity of the alternative models in correctly identifying individuals
likely to be continuously uninsured in 2002–2003 indicated that the logistic model that included
prior year insurance coverage profiles in tandem with the significant sociodemographic
predictors (Model 1) was superior. More specifically, Model 1 correctly identified 54.9% of
those individuals who were continuously uninsured throughout 2002–2003. This was significantly
better than the sensitivity of Model 3, the prediction model that included the same
sociodemographic predictors but excluded prior year insurance coverage status. Model 3 only
correctly identified 31.2% of the long-term uninsured. Surprisingly, the model that was
restricted to only measuring the prior year’s health insurance coverage status (Model 2) was
able to correctly identify 45.1% of the long-term uninsured. Generally comparable performance
was observed when examining the alternative models with respect to specificity. Model 1
performed well with a specificity level of 94.1%, with Model 2 at 95.5% and Model 3 at 93.5%.

The next set of comparisons focused on the predictive capacity of the respective models,
measured as the percent of individuals with predicted probabilities of being long-term uninsured
above the threshold cutoff point who were correctly classified. More specifically, of those
individuals predicted to be continuously uninsured throughout 2002–2003, Model 1 correctly
predicted 55.5% of the target population. Again, this performance was significantly better than
the predictive capacity of Model 3, the prediction model that included the same sociodemographic
predictors but excluded prior year insurance coverage status. Model 3 only correctly predicted
38.8% of the target population. Alternatively, the model that was restricted to only measuring
the prior year’s health insurance coverage status (Model 2) exhibited the best performance in
predictive capacity, with a correct prediction rate of 57.1% for the long-term uninsured.

The final set of comparisons in model performance is directly focused on the expected sample
necessary to support a 50% increase in sample yield of individuals between the ages of 18 and
64 who are continuously uninsured over two consecutive calendar years. This enhanced sample size
would yield significant improvements in the precision of survey estimates which characterize the
long-term uninsured and associated population subgroups. The use of this metric facilitated an
evaluation of the efficiency of a model-based oversampling strategy to yield the targeted
sample, standardizing the comparison in terms of
sample size requirements under different model specifications. Using an assumption of a base sample requirement of 10,000 individuals aged 18–64 in a MEPS panel responding for their entire 2-year period of eligibility in the survey, the required sample size necessary to achieve a 50% sample size increase above the 1173 long-term uninsured survey participants was derived based on the estimated predictive capacity observed for the alternative models. This sample size specification calls for the inclusion of an additional 587 individuals with the characteristic, resulting in overall target sample yield of 1760 individuals who are long-term uninsured in the survey.

A summary of the overall sample size requirements to achieve a target sample of 1760 chronically uninsured individuals aged 18–64 in the survey is provided in Table 9. Model 1 performed quite well in terms of the necessary overall sample size to meet the target for the policy-relevant population subgroup under consideration. A sample design without access to the predictor variables from a screening interview such as the NHIS or a design without application of oversampling techniques would require an overall sample of 15,000 adults ages 18–64 to achieve the target. Alternatively, all of the model-based oversampling strategies were substantially more effective than the constrained approach, each requiring a substantially lower overall sample to achieve the targeted sample. In addition, the expected overall sample size specification for the model-based oversampling approach inherent in Model 1 was substantially more modest (11,058) and significantly more efficient than the model which excluded a baseline measure of health insurance status (Model 3). Remarkably, the model-based oversampling strategy that required the lowest overall sample was the model that considered only a single baseline insurance coverage status measure (Model 2).

Summary

Policymakers, health-care leaders, and decision makers are particularly sensitive to recent trends in health-care costs, coverage, access, and health-care quality and are dependent on accurate, reliable national estimates of these health-care parameters to help inform policy and practice. Health-care surveys serve as a critical source of this essential information, and the descriptive and analytical findings they generate are key inputs to facilitate the development, implementation, and evaluation of policies and practices addressing health care and health behaviors. To ensure their utility and integrity, it is essential that these health-care surveys are designed according to high-quality, effective, and efficient statistical and methodological practices and optimal sample designs.

This chapter serves to illustrate several survey methods that enhance the performance and utility of health services research efforts. Attention has been given to the topics of sample and survey designs, nonresponse and attrition, estimation, precision, sample size determination, and analytical techniques to control for survey design complexities in analysis. Several of the topics that are featured in this chapter are further connected by their substantive focus on the measurement of trends in health-care costs, coverage, access, and health-care utilization. In addition to highlighting underlying survey operations, estimates, and outputs, the topics that have been covered also serve to identify potential enhancements that facilitate improvements in design, data collection, estimation strategies, and ultimately analytical capacity for health services research efforts.

A well-designed health-care survey imposes an interdependence between the survey sponsors, the survey designers, the associated statisticians and methodologists, the survey operations, field and management staff, the data processing staff, and the end users, who are primarily the health researchers, policymakers, and the public. The survey methods covered in this chapter should help serve as a roadmap to help realize and strengthen these connections. When all the essential health-care survey contributors work in concert, following the methods covered in this chapter, the overall quality and utility that is achieved in the conduct of health services research should be much greater than the sum of the individual successful components.
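
The sample-size arithmetic quoted in the oversampling comparison earlier in this chapter (1173 long-term uninsured in a base sample of 10,000, a 50% increase to 1760, and roughly 15,000 adults needed without oversampling) can be checked with a short sketch. The variable names are illustrative only, and the calculation assumes the no-oversampling design simply scales the base sample in proportion to the observed rate; it does not reproduce the model-based figures in Table 9.

```python
import math

# Figures taken from the text; variable names are illustrative assumptions.
base_sample = 10_000        # adults aged 18-64 responding for the full 2-year period
observed_uninsured = 1_173  # long-term uninsured participants in the base sample

# Rate of long-term uninsurance implied by the base sample (about 11.7%).
prevalence = observed_uninsured / base_sample

# A 50% increase in yield, rounded up to whole persons.
target_yield = math.ceil(1.5 * observed_uninsured)
additional_needed = target_yield - observed_uninsured

# Without oversampling, the overall sample must grow in proportion to the target.
sample_without_oversampling = target_yield / prevalence

print(f"target yield: {target_yield}")
print(f"additional long-term uninsured needed: {additional_needed}")
print(f"overall sample without oversampling: {sample_without_oversampling:,.0f}")
```

Under these assumptions the unassisted design needs about 15,000 adults, which is the benchmark against which the 11,058 reported for Model 1 is compared.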
Contents
Introduction
Two-Part Models for Zero-Modified Count Data
Hurdle Models
Zero-Inflated Count Models
Regression Models for Zero-Modified Count Data
Recent Developments
Two-Part Models for Semicontinuous Data
Two-Part Regression Models for Semicontinuous Data
Recent Developments
Model Fitting, Testing, and Evaluation
Zero-Modified Count Models
Semicontinuous Models
Model Comparison and Assessment
Software
Conclusion
References
Abstract

Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. More precisely, the data are zero inflated if there is an overabundance of zeros, and zero deflated if there are fewer zeros than expected. A similar phenomenon arises with semicontinuous data, which are characterized by a spike at zero followed by a right-skewed continuous distribution of positive values. When dealing with zero-modified count and semicontinuous data, flexible two-part mixture distributions are often needed to accommodate both the excess zeros and the skewed distribution of nonzero values. A broad array of two-part models has been introduced over the past three decades to accommodate such data. These include hurdle models, zero-inflated models, and two-part semicontinuous models. While these models differ in their distributional assumptions, they each incorporate a two-part structure in which the zero and nonzero observations are modeled in distinct but related ways. This chapter describes recent developments in two-part modeling of zero-modified count and semicontinuous data and highlights their application in health services research.

B. Neelon (*)
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
e-mail: brian.neelon@duke.edu

A. J. O’Malley
The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
e-mail: Alistair.J.O'Malley@Dartmouth.edu

Introduction

In health services research, it is common to encounter data with an abundance of zeros. For example, in studies examining outpatient clinic visits, patients who report no visits will be assigned a count of zero. Likewise, in studies examining the frequency of screening mammography, patients who have never received a screening mammogram will have a response value of zero. Count-valued outcomes, like those in the previous two examples, are typically modeled using discrete distributions, such as the Poisson or negative binomial distribution. However, there are times when the proportion of zeros is greater or less than what a standard count distribution would predict, and in such cases the data are said to be zero modified relative to an ordinary count model. A related phenomenon occurs with semicontinuous outcomes, such as medical expenditures, which are characterized by a point mass at zero (representing, say, no expenditures) followed by a right-skewed continuous distribution for the positive values (representing positive expenditures). When dealing with zero-modified count and semicontinuous data, parametric mixture distributions known as two-part models are typically needed to address both the abundance of zeros and the often highly skewed distribution of nonzero values.

Various two-part models have been developed in recent years to address zero-modified count and semicontinuous data, including hurdle models, zero-inflated models, and two-part semicontinuous models. While these models vary in terms of their distributional assumptions and parametric forms, they all incorporate an underlying, two-part structure in which the zero and nonzero observations are modeled through distinct (although sometimes overlapping) sets of parameters.

Sections “Two-Part Models for Zero-Modified Count Data” and “Two-Part Models for Semicontinuous Data” of this chapter provide overviews of zero-modified count and semicontinuous models, respectively. Section “Model Fitting, Testing, and Evaluation” discusses model fitting and evaluation strategies and highlights software packages commonly used to fit such models. The final section provides a summary, discusses potential limitations of two-part models, and points to directions for future research.

Two-Part Models for Zero-Modified Count Data

Zero-modified count data arise frequently in health services research. Consider, for example, a recent study by Neelon et al. (2012) examining emergency department (ED) visits in Durham, North Carolina, during the 2009 calendar year. Figure 1 presents a partial histogram of the visits up to ten visits. The actual number of visits per patient ranged from 0 to 95, with an average of 0.65 visit per patient. Nearly 70% of the patients made no ED visits during the year, 19% had exactly one visit, 5% had exactly two visits, and the remaining 6% had more than two visits.

Now, suppose one is interested in building a statistical model to describe these data. A first step might be to assume that the data were generated according to a Poisson distribution with mean
28 Two-Part Models for Zero-Modified Count and Semicontinuous Data 697
[Figure 1: histogram; x-axis: Number of Emergency Department Visits (0–10); y-axis: Percent]
parameter μ = 0.65, the average number of ED visits in the sample. That is,

\[
\Pr(Y = y) = \frac{0.65^{y} e^{-0.65}}{y!}, \quad y = 0, 1, \ldots, \tag{1}
\]

where Y denotes the number of ED visits. Although this seems like an intuitive (albeit somewhat basic) modeling choice, the model is not especially compatible with the observed data. Under this model, for instance, one would expect 52% zeros and 34% 1’s – far fewer zeros and more 1’s than were actually observed. When the number of zeros is greater than would be predicted under a standard count distribution, the data are said to be zero inflated relative to the standard model. Note that the abundance of zeros by itself is not necessarily problematic. For example, under a Poisson model with mean μ = 0.35, one would expect approximately 70% zeros as observed in Fig. 1. However, this same model would predict fewer than 1% of the counts to be greater than two, clearly in conflict with the 6% observed in the data. Ordinary count distributions, therefore, become problematic primarily when there is an abundance of zeros coupled with a longer than predicted right-tailed distribution of positive counts, since these features impose competing influences on the model. In the Poisson case, for example, the high proportion of zeros tends to lower the mean parameter, μ, while large nonzero values tend to increase it. The term “zero inflation,” then, is customarily used to describe data in which a high proportion of zeros, together with a skewed distribution of nonzero counts, leads to a poor-fitting standard count model. More generally, the term zero modification is used to encompass both zero inflation and zero deflation (i.e., fewer than expected zeros). In the presence of zero modification, special two-part mixture distributions are often needed to provide adequate fit to the data. This section reviews common two-part models for zero-modified count data.

Hurdle Models

The hurdle model (Mullahy 1986; Heilbron 1994) is a two-part mixture model consisting of a point mass at zero followed by a zero-truncated count distribution for the positive observations:
698 B. Neelon and A. J. O’Malley
\[
\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \quad 0 \le \pi \le 1 \\
\Pr(Y = y) &= \pi \, \frac{p(y; \theta)}{1 - p(0; \theta)}, \quad y = 1, 2, \ldots,
\end{aligned} \tag{2}
\]

where p(y; θ) denotes the base count distribution with parameter vector θ. The mean and variance of the hurdle model are

\[
E(Y) = \eta = \frac{\pi \mu}{1 - p(0; \theta)} \quad \text{and} \quad V(Y) = \eta(\mu - \eta) + \frac{\pi \sigma^{2}}{1 - p(0; \theta)}, \tag{3}
\]

where μ and σ² denote the mean and variance of the base distribution, respectively. In health services research, π is known as the utilization probability – i.e., the probability of using services at least once. When 1 − π = p(0; θ), the hurdle model reduces to its base distribution; when (1 − π) > p(0; θ), the data are zero inflated relative to the base distribution; and when (1 − π) < p(0; θ), there is zero deflation. In the extremes, π = 0 or 1. When π = 1, there are no zero counts, and the model reduces to a truncated count distribution; when π = 0, there are no users (i.e., all counts equal zero), and the model is degenerate at zero. Typically, one assumes that π is strictly between 0 and 1, so that there is a nonzero utilization probability for all individuals under study, and hence all subjects are viewed as “potential” users, even if some do not actually use services during the study period.

Perhaps the most common choice for the base distribution is the Poisson distribution, which gives rise to the Poisson hurdle model:

\[
\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \quad 0 \le \pi \le 1 \\
\Pr(Y = y) &= \frac{\pi}{1 - e^{-\mu}} \, \frac{\mu^{y} e^{-\mu}}{y!}, \quad \mu > 0;\ y = 1, 2, \ldots
\end{aligned} \tag{4}
\]

Taking the negative binomial as the base distribution gives the negative binomial hurdle model, with Pr(Y = 0) = 1 − π and

\[
\Pr(Y = y) = \frac{\pi}{1 - \left(\frac{r}{\mu + r}\right)^{r}} \, \frac{\Gamma(y + r)}{\Gamma(r)\, y!} \left(\frac{\mu}{\mu + r}\right)^{y} \left(\frac{r}{\mu + r}\right)^{r}, \quad r, \mu > 0;\ y = 1, 2, \ldots \tag{5}
\]

The negative binomial base distribution is appealing if there is evidence of overdispersion relative to the ordinary Poisson – that is, a variance exceeding the mean. The mean and variance of the negative binomial base distribution are given by μ and μ(1 + μ/r), respectively; hence, (1 + μ/r) is a measure of overdispersion. As r → ∞ the negative binomial converges to a Poisson distribution with mean and variance equal to μ. The connection between the negative binomial and Poisson distributions goes even further, since the former can be derived as a Poisson-gamma mixture. In particular, if W | λ ~ Poi(λ) and λ ~ Ga(r, μ/r), then the marginal distribution of W is negative binomial with mean μ and variance μ(1 + μ/r). Thus, the gamma prior, or “mixing,” distribution for λ induces excess variation relative to the Poisson. More generally, it can be shown that hurdle models are more overdispersed than their base distributions if and only if (1 − π) > p(0; θ), since in this case V(Y)/E(Y) > σ²/μ, where Y is distributed according to Eq. 2 and μ and
σ² are the mean and variance of the base distribution, respectively. For example, the negative binomial hurdle distribution is more overdispersed than the ordinary negative binomial if (1 − π) > [r/(μ + r)]^r or equivalently π < 1 − [r/(μ + r)]^r. As a corollary, it follows that the Poisson hurdle model is overdispersed relative to the ordinary Poisson if and only if (1 − π) > exp(−μ) and underdispersed when (1 − π) < exp(−μ). Thus, the Poisson hurdle model allows for both over- and underdispersion. Underdispersion arises when there are fewer zeros than expected under the ordinary Poisson model (Winkelmann 2008). As μ → ∞, the number of zeros expected under the ordinary Poisson model decreases, and the potential for underdispersion diminishes. For detailed discussions of over- and underdispersion in zero-modified count models, see Heilbron (1994), Gschlößl and Czado (2008), and Winkelmann (2008).

Zero-Inflated Count Models

Zero-inflated count models are two-part mixtures consisting of a degenerate distribution at zero and an untruncated count distribution. These include the zero-inflated Poisson (ZIP) model (Lambert 1992) and the zero-inflated negative binomial (ZINB) model (Green 1994; Mwalili et al. 2008). The ZIP model is given by

\[
\begin{aligned}
\Pr(Y = 0) &= (1 - \phi) + \phi e^{-\mu}, \quad 0 \le \phi \le 1 \\
\Pr(Y = y) &= \phi \, \frac{\mu^{y} e^{-\mu}}{y!}, \quad \mu > 0;\ y = 1, 2, \ldots;
\end{aligned}
\]

or, alternatively,

\[
Y \sim (1 - \phi)\, \mathbf{1}_{(Z=0)} + \phi\, \mathrm{Poi}(y; \mu)\, \mathbf{1}_{(Z=1)}, \tag{6}
\]

where Z is a (latent) indicator variable that takes the value 1 with probability ϕ. The mean and variance of the ZIP model are E(Y) = ϕμ and V(Y) = ϕμ[1 + (1 − ϕ)μ], respectively, and hence V(Y) > E(Y) and the model is overdispersed when ϕ < 1. When ϕ = 1, there is no zero inflation, and the model reduces to the ordinary Poisson with Pr(Y = 0) = exp(−μ). Conversely, when ϕ < 1, exp(−μ) < (1 − ϕ) + ϕ exp(−μ), and the zeros are inflated relative to an ordinary Poisson distribution. Thus, unlike the Poisson hurdle model, the ZIP model accommodates only zero inflation. In fact, because zero-inflated count models can be rewritten as hurdle models with mixing probability π = ϕ[1 − p(0; θ)] (Neelon et al. 2010), they can be viewed as special cases of hurdle models in which only zero inflation and overdispersion are allowed. As with hurdle models, other base distributions can be chosen to model the counts in zero-inflated models. For example, the ZINB model is given by Y ~ (1 − ϕ)1(Z=0) + ϕNB(y; r, μ)1(Z=1). For a comprehensive review of zero-inflated models, see Ridout et al. (1998).

Because each part of the mixture accommodates zeros, zero-inflated models such as the ZIP explicitly partition the zeros into two types: structural or ineligibility zeros (e.g., those that occur because a patient is ineligible for health services) and chance or sampling zeros (those that occur by chance among eligible patients). In the health services setting, the parameter ϕ is known as the eligibility probability, and hence the random variable Z can be viewed as an “eligibility” indicator taking the value 1 if an individual is eligible for services and 0 otherwise. In this context, the parameter μ represents the mean count among eligible subjects (i.e., given Z = 1). In other settings, such as infectious disease epidemiology, ϕ is known as the “at-risk” or “susceptibility” probability – i.e., the probability of belonging to an at-risk or susceptible population (Albert et al. 2011; Preisser et al. 2012). Note that the random variable Z is unobserved, since the observed outcome, Y, provides no direct information about individuals’ eligibility status, only whether they eventually used services as indicated by Y = 0 or Y > 0. If Z were actually observed (e.g., through an eligibility screening process), then ϕ could be estimated using the sample proportion of eligible patients and μ by fitting a count model to the subsample of those eligible. The fact that Z is unobserved means that it is not possible to condition on the eligible group, which, from a policy standpoint, may be the subpopulation of greatest interest. Fortunately, zero-inflated models allow one to estimate ϕ and μ even when Z is
unobserved, a topic discussed in greater detail in section “Model Fitting, Testing, and Evaluation.”

The choice between ZIP and hurdle models is dictated in large part by the aims of the investigator. If zeros can arise in only one way, then a hurdle model may be desirable. For example, in a study of outpatient service use, it may happen that patients either decline services, in which case Y = 0, or they use services one or more times, in which case Y > 0. Here, a hurdle model might reasonably capture the underlying distribution of the counts. In contrast, if patients only use services when they perceive themselves to be “at risk,” then zeros can arise in two ways: among those who are not at risk or among those who are at risk but nevertheless choose not to use services. In this case, a zero-inflated model would seem more appropriate. In some situations, the choice between models is not clear-cut. In these circumstances, Min and Agresti (2005) suggest that hurdle models might provide better fit if there is evidence of zero deflation among subgroups of the population (e.g., among nonsmoking males). Zero-inflated models, on the other hand, imply zero inflation at all covariate values.

Regression Models for Zero-Modified Count Data

Suppose interest lies in modeling the association between a set of predictors x (e.g., age, race, etc.) and a zero-modified response Y. Hurdle models can be extended to the regression setting by modeling each component of the model as a function of x:

\[
\begin{aligned}
g[\Pr(Y_i > 0)] &= g(\pi_i) = x_i'\beta_1 = \beta_{10} + \beta_{11} x_{1i} + \ldots + \beta_{1p} x_{pi} \\
\ln(\mu_i) &= x_i'\beta_2 = \beta_{20} + \beta_{21} x_{1i} + \ldots + \beta_{2p} x_{pi}, \quad i = 1, \ldots, n,
\end{aligned} \tag{7}
\]

where g(.) is a binary link function, such as the logit or probit link, Yi denotes the response for the i-th observation, xi is a p × 1 vector of predictors, and β1 and β2 are corresponding p × 1 vectors of regression coefficients for each component. Note that Eq. 7 models π = Pr(Y > 0) rather than 1 − π = Pr(Y = 0), since the former is typically of interest. Moreover, for simplicity, identical predictors are assumed for both parts of the model. In general, one might allow for unique predictors for the two components if the goal is to obtain a parsimonious model by removing extraneous variables in one component or if there is a priori scientific reason to believe that the two components are associated with unique sets of predictors.

Choosing a logit link for g(.) gives rise to the logistic hurdle regression model:

\[
\begin{aligned}
\operatorname{logit}(\pi_i) &= \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta_1 \\
\ln(\mu_i) &= x_i'\beta_2, \quad i = 1, \ldots, n.
\end{aligned} \tag{8}
\]

Under model (8), the l-th regression coefficient, β1l (1 ≤ l ≤ p), represents the effect of a one-unit change in the l-th predictor, xli, on the log odds of service utilization, adjusting for other predictors. The precise interpretation of β2l is less straightforward, since, conditional on Y > 0, the counts are modeled via a truncated distribution rather than an ordinary count distribution. Generally speaking, however, β2l > 0 implies that the expected count among health services users increases as xli increases.

Zero-inflated regression models have a similar form:

\[
\begin{aligned}
g[\Pr(Z_i = 1)] &= g(\phi_i) = x_i'\beta_1 \\
\ln(\mu_i) &= x_i'\beta_2, \quad i = 1, \ldots, n,
\end{aligned} \tag{9}
\]

where Zi is the eligibility indicator for the i-th subject as defined in the previous section. Note that the first equation of (9) models ϕi, the eligibility probability for the i-th individual, rather than the utilization probability, which is represented by πi = ϕi[1 − p(0; μi)]. If a logit link is assumed for g, then β1l denotes the effect of a one-unit change in covariate l on the log odds of eligibility, while β2l represents the effect of a one-unit change in predictor l on the log-mean count given eligibility. Or, put another way, for every one-unit change in predictor l, the mean
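
As a rough numerical illustration of the zero-inflated Poisson regression in Eq. 9 with a logit link, the sketch below turns a set of coefficients into the eligibility probability ϕ, the Poisson mean μ, the utilization probability π = ϕ[1 − exp(−μ)], and the ZIP mean and variance, and then compares them with a simulation. The coefficient values, the single covariate, and the function name `zip_summary` are hypothetical and chosen only for illustration; they are not taken from any study cited in this chapter.

```python
import numpy as np

def zip_summary(phi, mu):
    """Closed-form ZIP quantities for eligibility probability phi and Poisson mean mu."""
    p_zero = (1.0 - phi) + phi * np.exp(-mu)        # Pr(Y = 0)
    utilization = phi * (1.0 - np.exp(-mu))         # pi = phi * [1 - p(0; mu)]
    mean = phi * mu                                 # E(Y)
    variance = phi * mu * (1.0 + (1.0 - phi) * mu)  # V(Y)
    return p_zero, utilization, mean, variance

# Hypothetical coefficients: intercept and one predictor for each component.
beta1 = np.array([0.5, -1.0])   # logit(phi_i) = x_i' beta1
beta2 = np.array([0.2, 0.3])    # ln(mu_i)     = x_i' beta2
x = np.array([1.0, 1.0])        # intercept term and predictor value 1

phi = 1.0 / (1.0 + np.exp(-(x @ beta1)))  # inverse logit
mu = np.exp(x @ beta2)
p_zero, utilization, mean, variance = zip_summary(phi, mu)

# Simulate from the latent-indicator form: Z ~ Bernoulli(phi), Y | Z = 1 ~ Poisson(mu).
rng = np.random.default_rng(0)
n = 200_000
z = rng.random(n) < phi
y = np.where(z, rng.poisson(mu, size=n), 0)

print(f"phi = {phi:.3f}, mu = {mu:.3f}, utilization probability = {utilization:.3f}")
print(f"Pr(Y = 0): formula {p_zero:.3f}, simulated {(y == 0).mean():.3f}")
print(f"E(Y):      formula {mean:.3f}, simulated {y.mean():.3f}")
print(f"V(Y):      formula {variance:.3f}, simulated {y.var():.3f}")
```

Because ϕ < 1 here, the variance exceeds the mean, and the utilization probability is smaller than the eligibility probability, since some eligible subjects record zero counts by chance.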
where Yij is the j-th response for subject (or cluster) i, xij is a corresponding vector of predictors for the ij-th observation, b1i and b2i are random intercepts for the i-th subject/cluster, and N2(0, Σ) denotes a bivariate normal distribution with mean 0 = (0, 0)′ and 2 × 2 variance-covariance matrix Σ. Higher dimensional random effects, such as random slopes, can be incorporated as well. The correlated random effects model is appealing if one believes that the process giving rise to a nonzero count is related to the expected count given Y > 0. For example, returning to the ED study presented at the beginning of the section, it might be reasonable to hypothesize that patients with a high propensity to use the ED at least once are also likely to make repeat visits given some utilization. In such cases, the correlated random effects model can lead to improved model fit over uncorrelated random effects, single random effect, and fixed-effects models – all of which arise as special cases of the correlated model. The correlated zero-inflated model has a comparable form to the hurdle model, but as noted in the previous subsection, the interpretation of the parameters differs. For overviews of zero-modified count models for repeated measures, see Min and Agresti (2005) and Neelon et al. (2010). For a more general discussion of count regression models, including zero-modified models, see Cameron and Trivedi (1998), Winkelmann (2008), and Zuur et al. (2012).

Recent Developments

Two-part count models have been adapted to cover a wide range of statistical applications, including latent growth curve models, finite mixture models, generalized additive models, variable selection methods, multivariate analysis, and spatial data analysis. For example, Liu (2007) developed a zero-inflated growth model that allows for correlated random intercepts and slopes for both components. Roeder et al. (1999), Dalrymple et al. (2003), and Min and Agresti (2005) developed finite mixture zero-modified models that cluster subjects into distinct classes defined by latent response trajectories. DeSantis and Bandyopadhyay (2011) proposed a two-state, hidden Markov ZIP model to analyze cocaine dependence, with hypothesized latent states corresponding to “high” or “low” cocaine use. Dobbie and Welsh (2001) and Hall and Zhang (2004) used generalized estimating equations (GEE) to fit population-average (or “marginal”) Poisson hurdle models. Fahrmeir and Osuna Echavarría (2006) developed a generalized additive ZINB model, using penalized splines to model nonlinear trends among the predictors. Lam et al. (2006) proposed a related semiparametric ZIP model. Hsu (2005) introduced a weighted ZIP (W-ZIP) model to predict the time to recurrence of colorectal polyps among patients randomized to high- and low-fiber diets. Buu et al. (2011) developed a variable selection method for ZIP models that allows for component-specific penalties. Williamson et al. (2007) derived power and sample size calculations for studies involving zero-inflated data. For time series analysis, Hasan and Sneddon (2009) developed first-order autoregressive (AR(1)) and moving average (MA(1)) ZIP models. More recently, Silva et al. (2011) proposed a ZIP model for quantitative trait loci (QTL) mapping.

Several authors have introduced zero-inflated models for the analysis of spatially correlated data. Agarwal et al. (2002) developed a spatial ZIP model that incorporated spatially correlated random effects into the Poisson component. Rathbun and Fei (2006) proposed a similar model in which the structural zeros were fitted using a spatial probit model. Ver Hoef and Jansen (2007) extended the approach to include distinct spatial random effects for both model components. Recently, Neelon et al. (2012) developed a spatial Poisson hurdle model for “areal-referenced” data in which the spatial units consist of aggregated regions of space, such as counties or Census tracts. They introduced spatial random effects for both components of the hurdle model and linked the random effects via a bivariate conditionally autoregressive (CAR) prior that induces dependence between the model components and provides spatial smoothing across neighboring regions. As such, their model can be viewed as a
spatial analogue to the correlated hurdle model given in Eq. 13.

There have been a number of other recent developments as well. These include zero-inflated binomial (ZIB) models (Hall 2000; Hall and Zhang 2004), pattern-mixture Poisson hurdle models for non-ignorable missing data (Hasan et al. 2009; Maruotti 2011), the k-ZIG model for extreme zero inflation (Ghosh et al. 2012), zero-inflated generalized Poisson (ZIGP) models (Gschlößl and Czado 2008; Gupta et al. 1996), zero-inflated power series models (Ghosh et al. 2006), and multivariate extensions of zero-inflated models (Li et al. 1999; Walhin 2001; Majumdar and Gries 2010; Arab et al. 2011). These recent applications highlight the growing use of two-part models for the analysis of complex zero-modified count data.

As with zero-modified count data, semicontinuous data can be viewed as arising from two distinct stochastic processes: one governing the occurrence of zeros and the second determining the observed value given a nonzero response. The first process is commonly referred to as the “occurrence” or “binary” part of the data, and the second is often termed the “intensity” or “continuous” part. Two-part mixture models are an ideal choice for such data, since they explicitly accommodate both data-generating processes. A lognormal distribution is frequently chosen to model the nonzero values, giving rise to the Bernoulli-lognormal two-part model (Manning et al. 1981):

\[
f(y) = (1 - \phi)\, \mathbf{1}_{(y=0)} + \phi\, \mathrm{LN}(y; \mu, \sigma^{2})\, \mathbf{1}_{(y>0)}, \quad y \ge 0,\ 0 \le \phi \le 1, \tag{14}
\]
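
The two-part structure of Eq. 14 can be illustrated with a small simulation: draw the occurrence part from a Bernoulli distribution, draw the intensity part from a lognormal distribution, and then recover each set of parameters from its own part of the data. This is a minimal sketch under assumed parameter values (ϕ = 0.3, μ = 5.0, σ = 1.2), which are hypothetical and not estimates from the expenditure data discussed in this chapter.

```python
import numpy as np

# Hypothetical parameter values for the Bernoulli-lognormal model in Eq. 14.
phi, mu, sigma = 0.3, 5.0, 1.2
n = 100_000

rng = np.random.default_rng(1)

# Occurrence ("binary") part: is the response positive?
positive = rng.random(n) < phi
# Intensity ("continuous") part: lognormal value for positive responses, zero otherwise.
y = np.where(positive, rng.lognormal(mean=mu, sigma=sigma, size=n), 0.0)

# Each part is estimated from its own piece of the data.
phi_hat = np.mean(y > 0)            # proportion of nonzero responses
log_pos = np.log(y[y > 0])          # log of the positive values only
mu_hat = log_pos.mean()
sigma_hat = log_pos.std(ddof=1)

# Model-based overall mean, using the lognormal mean exp(mu + sigma^2 / 2).
mean_model = phi_hat * np.exp(mu_hat + sigma_hat**2 / 2)

print(f"phi_hat = {phi_hat:.3f}, mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
print(f"overall mean: model {mean_model:.1f}, sample {y.mean():.1f}")
```

In this independent-sample setting the two parts separate cleanly, so the binary and continuous components can be fitted one after the other.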
Fig. 2 Distribution of annual mental health expenditures among federal employees (x-axis: Spending ($); y-axis: Density)
costs. More recently, Liu et al. (2012) compared generalized gamma, log-skew-normal, and Box-Cox-transformed two-part models and found that the generalized gamma model provided superior fit in their analysis of daily alcohol consumption.

A related model is the Tobit model (Tobin 1958) in which the zeros represent the censoring of an underlying continuous variable Y below a detection limit, L:

model accommodates two sources of zeros (true zeros and censored zeros), it can be viewed as a semicontinuous version of the zero-inflated count models described in section “Zero-Inflated Count Models.”

Two-Part Regression Models for Semicontinuous Data
take exp(β2l), which denotes the multiplicative change in the median of Yi | Yi > 0 per unit change in xil. Because the expected value of Yi | Yi > 0 is given by exp(xi′β2 + σ²/2), inference involving the untransformed mean response entails estimation of both β2 and σ². If the log-normality assumption fails, nonparametric methods can be used to estimate the untransformed mean in the continuous component of the model (Duan 1983). This topic is discussed in greater detail in section “Model Fitting, Testing, and Evaluation.”

Two-part regression models have also been used to analyze longitudinal and clustered semicontinuous data (Olsen and Schafer 2001; Tooze et al. 2002; Cooper et al. 2007). The most common approach is to introduce correlated random effects for each component, as in model (13) of section “Regression Models for Zero-Modified Count Data.” Assuming a logit link for g(·) and a lognormal distribution for the positive values leads to the logistic-lognormal correlated random effects model:

$$
\begin{aligned}
\operatorname{logit}(\phi_{ij}) &= \operatorname{logit}\left[\Pr(Y_{ij} > 0 \mid b_{1i})\right] = x_{ij}'\beta_1 + b_{1i} \\
\mu_{ij} &= E\left[\ln(Y_{ij}) \mid Y_{ij} > 0, b_{2i}\right] = x_{ij}'\beta_2 + b_{2i}, \quad j = 1, \ldots, n_i;\ i = 1, \ldots, n; \\
b_i &= (b_{1i}, b_{2i}) \sim N_2(0, \Sigma),
\end{aligned}
\tag{17}
$$

where Yij denotes the j-th response for the i-th subject (or cluster); b1i and b2i are correlated subject-specific random intercepts for the binary and continuous components, respectively; and Σ is a 2 × 2 variance-covariance matrix. The model can be easily extended to include higher-order random effects.

As in the count setting, the correlated model is appealing if one believes that the process giving rise to the positive values is related to the observed value given a positive response. For example, in a study of hospital length of stay, patients who are likely to be admitted to the hospital may also tend to have longer stays than those with lower propensities for admission. This would imply a positive association between the probability of admission (component 1 of the model) and the length of stay given admission (component 2).

Modeling the correlation between b1i and b2i directly accommodates the between-component association, thus providing a realistic characterization of the underlying data-generating process. There are other advantages to modeling the between-component association as well. Most importantly, ignoring the between-component association can lead to biased estimates in the second part of the model (Su et al. 2009). To see this, consider the two-part lognormal model given in Eq. 17, which can be recoded in terms of two random variables:

$$
R = \begin{cases} 0 & \text{if } Y = 0 \\ 1 & \text{if } Y > 0 \end{cases}
\qquad
V = \begin{cases} \text{undefined} & \text{if } R = 0 \\ \log(Y) & \text{if } R = 1, \end{cases}
\tag{18}
$$

where subscripts have been omitted to simplify notation. The random variable R = 1(Y > 0) can be viewed as a response indicator for the second component of the model.

Recall that the target population for the continuous part is the set of all subjects with positive responses – that is, for whom Y > 0 (or equivalently for whom R = 1). Valid inferences can be achieved by selecting a random sample V = {V1, V2, ..., Vn} from this target population. However, when some individuals have a response value of 0, a subset of V is undefined. These undefined observations can be viewed as akin to missing data. If the two components are truly uncorrelated, then V is missing completely at random (MCAR). In this case, the model for R includes only an intercept and therefore has no bearing on the model for V. Consequently, a model fitted to the observed values of V will yield population-representative estimates.

If the association between the components can be explained entirely by observed data, then the elements of V are missing at random (MAR). In other words, R and V are conditionally independent given the observed data. Modeling R and V separately will once again yield unbiased estimates as long as the model for V is correctly specified and includes all predictors relevant to R. In some instances, investigators may wish to
include only a subset of the predictors in part 2 of the model, in which case the model for V will, by necessity, exclude predictors associated with R. Here, one can use the model for R to form sampling weights and fit a weighted regression to V. Alternatively, one can impute V and base the analysis on the observed and imputed data. The key point is that if V is MAR, then modeling R and V separately will not induce bias so long as the model for V is correct and incorporates, in some fashion, the relevant predictors for R.

If, however, R and V remain correlated after adjusting for covariates, then V is not missing at random (NMAR). Here, fitting separate models for R and V induces selection bias in the parameter estimates for V. For example, if the two components are positively correlated, higher-valued V’s will tend to have increased nonzero response probability, Pr(R = 1), conditional on observed covariates. As a result, at fixed values of the observed covariates, there will be an overrepresentation of large response values among the observed cases in V. Ignoring the association between the two components and basing the part-2 analysis solely on observed cases will bias the fixed-effects intercept upward and may lead to bias in other part-2 parameters as well, depending on the structure of the between-component association (Su et al. 2009). One way to correct for this bias is to fit a correlated two-part model analogous to Eq. 17. The resulting model can be viewed as a shared-parameter model (Wu and Carroll 1988) that accounts for unmeasured subject-level factors that induce correlation between R and V. Note that this approach again relies on a conditional independence assumption whereby, this time, R and V are assumed to be stochastically independent given both the observed data and the random effects. While it is impossible to verify whether V is MAR or NMAR, it is often safer to assume NMAR, unless enough covariates have been measured to reasonably account for the dependence between R and V.

For further details on selection bias in two-part semicontinuous models, see Su et al. (2009). For a related discussion regarding selection bias in hurdle count models, see Neelon et al. (2012). For further discussion of missing data mechanisms, see Little and Rubin (2002). For a general overview of shared-parameter models for non-ignorable missing data, see Albert and Follmann (2009).

Recent Developments

There have been a number of recent developments in semicontinuous regression modeling. Liu et al. (2008) developed a multilevel two-part model that incorporates correlated random effects at multiple levels of clustering – for example, longitudinal measurements on patients clustered within clinics. Here, clinics constitute the first clustering level, since patients are nested within clinics; patients then form the second level of clustering, since there are repeated measurements for each subject.

Another active area of research involves two-part growth mixture models for examining longitudinal trends among latent subgroups of individuals (Neelon et al. 2011; Muthén 2001). Growth mixture models assume that the data are generated through a two-step process: first, individuals are placed into one of K latent classes defined by a set of average trajectory curves – one for each component of the two-part model; then, around these average trajectories, individuals are randomly assigned their own, subject-specific curves defined by a set of random effects with class-specific variance parameters. As such, these models can be viewed as finite mixtures of the two-part correlated random effects model expressed in Eq. 17.

Other recent developments include bivariate two-part models (Su et al. 2012), two-part models for the joint analysis of longitudinal and survival outcomes (Liu 2009; Hatfield et al. 2011), two-part models for estimating expected cumulative cost of illness in the presence of censoring (Basu and Manning 2010), and Bayesian extensions of two-part semicontinuous models (Neelon et al. 2011; Liu 2009; Hatfield et al. 2011; Zhang et al. 2006). This recent work highlights a growing interest in parametric two-part modeling and solidifies its current role as a vibrant area of statistical research.
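The selection-bias argument made earlier for the correlated model in Eq. 17 can be illustrated with a small simulation. The sketch below is a deliberate simplification and not the analysis of Su et al. (2009): it uses intercept-only components, and the "naive" part-2 estimate is simply the pooled mean of the observed log responses. The function name, parameter values, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np

def naive_part2_intercept(rho, n_subjects=20000, n_obs=5, beta1=0.0, beta2=2.0,
                          sd_b1=1.5, sd_b2=1.0, sigma=0.5, seed=0):
    """Pooled mean of V = log(Y) over observed positives (R = 1), ignoring part 1.

    rho is the correlation between the random intercepts b1i and b2i.
    beta2 is the true part-2 intercept that the naive mean tries to recover.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[sd_b1 ** 2, rho * sd_b1 * sd_b2],
                    [rho * sd_b1 * sd_b2, sd_b2 ** 2]])
    b = rng.multivariate_normal(np.zeros(2), cov, size=n_subjects)
    b1, b2 = b[:, [0]], b[:, [1]]                      # column vectors, one row per subject
    p_pos = 1.0 / (1.0 + np.exp(-(beta1 + b1)))        # Pr(Y_ij > 0 | b1i), logit link
    r = rng.random((n_subjects, n_obs)) < p_pos        # response indicators R
    v = beta2 + b2 + sigma * rng.standard_normal((n_subjects, n_obs))  # log(Y) if positive
    return v[r].mean()                                 # uses only the observed cases
```

Under these assumed settings, `naive_part2_intercept(rho=0.0)` should land close to the true intercept of 2.0, whereas `naive_part2_intercept(rho=0.8)` should be noticeably larger, in line with the upward bias in the fixed-effects intercept described in the text.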
and truncated count components. Dobbie and Welsh (2001) used GEE to fit a Poisson hurdle model to clustered count data. Hall and Zhang (2004) extended the approach to zero-inflated models by combining GEE with an EM-type expectation step, resulting in a two-step “expectation-solution” (ES) procedure (Rosen et al. 2000). In the E-step, the expectation of the complete data log-likelihood with respect to the latent indicator Z is computed; in the S-step, GEE is used in lieu of maximum likelihood to obtain parameter estimates and robust standard errors separately for each component of the model.

For the zero-modified random effects models described in Eq. 13, Min and Agresti (2005) proposed a two-stage approach in which numerical integration, such as Gaussian quadrature, is first used to estimate the marginal likelihood integrated across the random effects; then, in the second stage, Fisher scoring is used to maximize the estimated marginal likelihood. More recently, Kim et al. (2012) used restricted maximum quasi-likelihood (RMQL) to fit a correlated negative binomial hurdle model.

Several authors have used the EM algorithm for fitting longitudinal finite mixture (or “latent class”) models. Roeder et al. (1999) used EM to fit a latent class trajectory model as part of a study examining risk factors for long-term criminal behavior. Dalrymple et al. (2003) adopted a similar approach to study longitudinal trends in sudden infant death syndrome, or SIDS. Min and Agresti (2005) used the EM algorithm to fit a discrete random effects model in an analysis of pharmaceutical side effects.

Bayesian methods are also well suited for inference involving zero-modified count data. In Bayesian inference, model parameters are treated as random variables and assigned prior distributions that quantify one’s uncertainty about their values prior to observing the data. Common prior distributions for regression models include normal distributions for fixed-effect parameters, inverse-gamma distributions for error variances, and inverse-Wishart distributions for random effect covariance matrices. These prior distributions are then combined with the current data via Bayes’ theorem to obtain posterior distributions. In this way, Bayesian methodology provides a natural scheme for learning from prior experience. For zero-modified count models, the posterior distributions generally do not have closed forms, and hence Markov chain Monte Carlo (MCMC) algorithms, such as Gibbs sampling (Gelfand and Smith 1990), are often used for posterior inference. At convergence, the MCMC draws form a Monte Carlo sample from the joint posterior distribution of all model parameters, which can then be used to obtain parameter estimates and corresponding interval estimates (credible intervals), thus avoiding the need for asymptotic assumptions. Moreover, because MCMC produces draws from the entire joint posterior distribution of the model parameters, estimation of complex functions of parameters is straightforward. For example, the Bayesian framework is ideal for estimating and obtaining uncertainty intervals for quantities such as the population IRR given in Eq. 10. In the maximum likelihood setting, one would have to perform bootstrapping or derive a Taylor series approximation to obtain standard errors and confidence intervals for such quantities.

In recent years, there has been growing interest in Bayesian methods for fitting zero-modified models. Rodrigues (2003) proposed a data-augmented Gibbs sampling algorithm to fit a ZIP model. Ghosh et al. (2006) used a similar approach to fit zero-inflated generalized power series models, which include the ZIP as a special case. Neelon et al. (2010) developed Bayesian model fitting strategies for repeated measures hurdle, ZIP, and ZAP models and compared various prior specifications, model comparison strategies, and approaches to assessing model fit. Ghosh et al. (2012) used Gibbs sampling to fit the k-ZIG model, which accommodates extreme zero inflation. Several authors have proposed Bayesian methods for analyzing zero-modified spatial data (Neelon et al. 2012; Rathbun and Fei 2006; Ver Hoef and Jansen 2007). In particular, Neelon et al. (2012) used hybrid Gibbs and Metropolis-Hastings steps to fit a spatially correlated Poisson hurdle model. For more on Bayesian estimation of zero-inflated count models, see Winkelmann (2008), Neelon et al. (2010, 2012), and Zuur et al. (2012).
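To make the idea of data-augmented Gibbs sampling concrete, the following is a minimal sketch for an intercept-only ZIP model. It is written in the spirit of the data-augmentation approach mentioned above but is not a reproduction of any cited algorithm. The notation is an assumption made here: `pi` is the probability of a structural zero, `lam` is the Poisson mean, and the priors are Beta(a, b) for `pi` and Gamma(c, rate d) for `lam`.

```python
import numpy as np

def gibbs_zip(y, n_iter=2000, burn_in=500, a=1.0, b=1.0, c=0.01, d=0.01, seed=0):
    """Data-augmented Gibbs sampler for an intercept-only zero-inflated Poisson model.

    Returns an array of post-burn-in draws with columns (pi, lam).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = y.size
    is_zero = y == 0
    pi, lam = 0.5, max(float(y.mean()), 0.1)      # crude starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Step 1: latent indicators. Only observed zeros can be structural zeros.
        p_struct = pi / (pi + (1.0 - pi) * np.exp(-lam))
        z = np.zeros(n, dtype=bool)
        z[is_zero] = rng.random(int(is_zero.sum())) < p_struct
        n_struct = int(z.sum())
        # Step 2: conjugate Beta update for the structural-zero probability.
        pi = rng.beta(a + n_struct, b + n - n_struct)
        # Step 3: conjugate Gamma update for the Poisson mean, using the
        # observations currently assigned to the Poisson component.
        lam = rng.gamma(shape=c + y[~z].sum(), scale=1.0 / (d + n - n_struct))
        draws[t] = pi, lam
    return draws[burn_in:]
```

Posterior means and credible intervals can then be read off the retained draws, for example with `draws.mean(axis=0)` and `np.percentile(draws, [2.5, 97.5], axis=0)`. Regression versions of the model replace the conjugate updates with steps for the coefficient vectors, which typically require Metropolis-Hastings or further augmentation.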
of Y, only that E[g(Yi) | Yi > 0, xi] is a linear function of β2 and that the errors are independent and identically distributed with mean zero and homogeneous variance σ². When the errors are heteroscedastic – for example, when they depend on covariates – the smearing estimator is biased (Manning 1998). Three approaches have been proposed to account for heteroscedasticity when constructing a smearing estimator: (1) estimate unique smearing factors for different covariate subgroups (Manning 1998), (2) apply separate smearing factors to different parts of the response distribution (Buntin and Zaslavsky 2004), or (3) use $\hat{S} = E[\exp(\hat{e}_i) \mid Y_i > 0, x_i]$ as a corrected smearing factor (Jones 2011), which can be obtained by regressing the exponentiated estimated residuals on x and using the predicted values as the smearing factors at the corresponding values of x. Recently, Welsh and Zhou (2006) developed a heteroscedastic smearing estimator for the untransformed marginal mean, E(Yi | xi), averaged over both the zero and nonzero observations.

Note that retransformations to the Y-scale pose no difficulty for Bayesian inference: after drawing MCMC samples of model parameters on the transformed scale, simply retransform and take the average to estimate the posterior mean on the original data scale. However, unless advanced Bayesian nonparametric techniques are employed (Ferguson 1973), an explicit parametric form for the likelihood must be assumed.

Quasi-likelihood generalized linear models (GLMs) offer an alternative approach to estimating the untransformed mean in part 2 of the model (Manning and Mullahy 2001; Buntin and Zaslavsky 2004; Blough et al. 1999). Here, the untransformed mean is modeled as ψ(xi) = h(xi′β2), where h is an inverse-link function (e.g., the exponential function). By modeling ψ(xi) directly, GLMs avoid the need to transform Y altogether. Next, the variance of Y | Y > 0 is modeled as a function of covariates, typically using a power function of the form V(Yi | Yi > 0, xi) ∝ ψ(xi)^λ. The approach does not specify a distribution for Y, making it robust to mis-specifications that might otherwise occur. The method can also be used to directly estimate the marginal mean, E(Yi | xi), yielding a simpler, one-part GLM that incorporates both zero and nonzero values.

Estimation for GLMs proceeds by nonlinear weighted least squares, with weights proportional to the inverse variances of the observations (Buntin and Zaslavsky 2004). The choice of λ is important, since it can affect the efficiency of the parameter estimates. Choosing λ = 0 implies constant variance; λ = 1 implies a “Poisson-type” variance proportional to the mean; and λ = 2 results in a “gamma-type” variance. To help guide this choice, one can apply the Park test (Park 1966), which exploits the fact that

$$
\ln\left[V(Y_i \mid Y_i > 0, x_i)\right] = \text{constant} + \lambda \ln\left[\psi(x_i)\right].
\tag{22}
$$

To apply the Park test, the log-transformed squared residuals from a candidate model are regressed on the log-transformed predicted values, $\hat{y}_i$:

$$
\ln\left[(y_i - \hat{y}_i)^2 \mid y_i > 0\right] = \alpha + \lambda \ln(\hat{y}_i) + e_i, \quad i = 1, \ldots, n,
\tag{23}
$$

where ei is a mean-zero error term. An estimate of λ close to zero suggests constant variance, an estimate close to 1 suggests a Poisson-like variance, and an estimate close to 2 suggests a gamma-type structure.

In deciding between a GLM and a transformed parametric model, one can employ the following decision procedure, adapted from Manning and Mullahy (2001):

1. Fit an OLS regression to the transformed positive values.
2. If the residuals are highly kurtotic, then the parametric two-part model is generally preferable, since high kurtosis can lead to imprecision (high variability) in quasi-likelihood GLM parameter estimates. To guard against model mis-specifications, smearing should be applied when estimating the untransformed mean. In the presence of heteroscedasticity, multiple covariate- or response-dependent smearing factors should be applied to reduce bias.
3. If there is minimal kurtosis, fit a series of quasi-likelihood GLMs and apply the Park test to determine the optimal value of λ.
4. To avoid over-fitting, use penalized model comparison or cross validation techniques, such as split-sample analyses, to choose between competing models.

The choice between models is guided by non-statistical considerations as well. For example, if there is interest in estimating both the probability of a positive response and the mean response among positive observations, then a two-part model (either parametric or quasi-likelihood GLM) may be preferable to a one-part model. Further, if it is reasonable to assume that the two components are correlated, then a correlated parametric two-part model, as in Eq. 17, might be most appropriate. For a more detailed comparison of quasi-likelihood GLMs and transformed parametric models, see Manning and Mullahy (2001) and Buntin and Zaslavsky (2004).

Model Comparison and Assessment

There are several model comparison measures that can be used to select among competing two-part models, including the Akaike information criterion (AIC) (Akaike 1974) and the Bayesian information criterion (BIC), also known as the Schwarz criterion (Schwarz 1978). AIC and BIC are referred to as “penalized” criteria because they combine a measure of model fit, typically twice the negative log-likelihood, with a penalty for model complexity, expressed as a function of the number of parameters. Smaller values of AIC and BIC are considered preferable. A related measure for quasi-likelihood and GEE models is the quasi-likelihood under the independence model criterion, or QIC (Pan 2001). In the Bayesian setting, a common model comparison statistic is the deviance information criterion (DIC) (Spiegelhalter et al. 2002), which can be used to compare Bayesian hierarchical (i.e., random effect) models. As with the other selection criteria, DIC balances an assessment of model fit with a penalty for complexity. For random effects models, the dimension of the parameter space depends on the degree of heterogeneity between individuals, with greater heterogeneity implying more “effective” parameters. DIC was specifically designed to estimate the number of effective parameters in Bayesian hierarchical models. Celeux et al. (2006) recently adapted the measure to accommodate additional latent variable models, such as finite mixtures.

A second Bayesian comparison measure is the Bayes factor (Kass and Raftery 1995), which offers perhaps the most principled approach to Bayesian model selection. However, because Bayes factors rely on the marginal likelihood of the data under a presumed model, they are not well defined for improper (non-integrable) prior distributions. To accommodate improper priors, alternative criteria such as the intrinsic Bayes factor (Berger and Pericchi 1996) have been proposed. The pseudo Bayes factor (Gelfand and Dey 1994) offers a computationally convenient numerical approximation to the Bayes factor, but it has been criticized recently due to its reliance on the computationally unstable harmonic mean (Raftery et al. 2007). Several other Bayesian comparison measures have been proposed specifically in connection with zero-inflated count models, including the group-marginalized DIC (Millar 2009) and the predictive log-score loss function (Ghosh et al. 2012).

To further assess model fit in the Bayesian setting, one can apply Bayesian posterior predictive checks, whereby the observed data are compared to data replicated from the posterior predictive distribution (Gelman et al. 1996). If the model fits well, the replicated data should resemble the observed data. To quantify the degree of similarity, one typically chooses a “discrepancy statistic,” such as a sample moment or quantile, which captures some important aspect of the data. The Bayesian predictive p-value denotes the probability that the model-predicted statistic is more extreme than the observed sample value (i.e., the value expected under the correct model). A Bayesian p-value close to 0.50 represents adequate model fit, while p-values near 0 or 1 indicate lack of fit.
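A posterior predictive check of this kind can be sketched in a few lines. The example below is illustrative only: it assumes the proportion of zeros as the discrepancy statistic and, to keep the code self-contained, checks a deliberately simple one-part Poisson model whose conjugate Gamma posterior can be sampled directly. All function names and the Gamma(c, rate d) prior are assumptions introduced here.

```python
import numpy as np

def posterior_predictive_pvalue(y, posterior_draws, simulate_replicate, statistic, rng):
    """Share of replicated data sets whose statistic is at least the observed one."""
    t_obs = statistic(y)
    t_rep = np.array([statistic(simulate_replicate(theta, y.size, rng))
                      for theta in posterior_draws])
    return float(np.mean(t_rep >= t_obs))

def prop_zeros(y):
    """Discrepancy statistic: proportion of zeros in a data set."""
    return np.mean(y == 0)

def poisson_posterior_draws(y, n_draws, rng, c=0.01, d=0.01):
    """Exact posterior draws of a Poisson mean under a Gamma(c, rate d) prior."""
    return rng.gamma(shape=c + y.sum(), scale=1.0 / (d + y.size), size=n_draws)

def simulate_poisson(lam, n, rng):
    """One replicated data set from the Poisson model given a posterior draw."""
    return rng.poisson(lam, size=n)
```

If the observed counts contain many more zeros than a Poisson model can produce, the replicated proportions of zeros will almost never reach the observed value, and the resulting p-value will sit near 0, signalling lack of fit. The same helper could be reused with draws from a two-part model by swapping in a suitable `simulate_replicate` function.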
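Returning briefly to the retransformation and variance-function diagnostics discussed earlier for part 2 of the model, the Park test regression in Eq. 23 can be sketched as follows. The candidate model here is an OLS fit to log(y) retransformed with a single smearing factor (the mean of the exponentiated residuals), which is one reasonable choice among several; the function names and the NumPy-only implementation are assumptions made for illustration.

```python
import numpy as np

def duan_smearing_predictions(X, y):
    """OLS on log(y) for positive y, retransformed with one smearing factor."""
    log_y = np.log(y)
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    resid = log_y - X @ coef
    smear = np.mean(np.exp(resid))          # average exponentiated residual
    return np.exp(X @ coef) * smear         # predictions on the original scale

def park_test_lambda(y, y_hat):
    """Slope from regressing ln((y - y_hat)^2) on ln(y_hat), as in Eq. 23."""
    Z = np.column_stack([np.ones_like(y_hat), np.log(y_hat)])
    coef, *_ = np.linalg.lstsq(Z, np.log((y - y_hat) ** 2), rcond=None)
    return coef[1]                          # estimate of lambda
```

Applied to positive responses whose variance is proportional to the squared mean, the estimated slope should be near 2, pointing toward a gamma-type variance function; values near 0 or 1 would instead suggest constant or Poisson-type variance.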
For more information on Bayesian model comparison and assessment strategies, see Millar (2009), Neelon et al. (2010, 2011), and Ando (2010).

Software

There are a number of software packages that can be used for fitting zero-modified count and semicontinuous models. The statistical software program R (R Development Core Team 2012) has several packages for fitting zero-modified count models, including the pscl package (Zeileis et al. 2008; Jackman 2012), which performs ML estimation of zero-inflated and hurdle models; glmmADMB (Fournier et al. 2012; Skaug et al. 2012) for fitting random effect zero-inflated and hurdle models; and MCMCglmm (Hadfield 2010) for Bayesian estimation of hurdle, zero-inflated, and zero-altered models. SAS (SAS 9.1.3 Help and Documentation 2000) offers PROC COUNTREG for fitting zero-inflated count regressions, PROC GENMOD for GEE models, and PROCs NLMIXED and GLIMMIX for random effect zero-modified count models. Stata (Stata Statistical Software 2011) uses the zip and zinb commands for fitting ZIP and ZINB models, hplogit and hnblogit for hurdle models (Hilbe 2005a, b), and gllamm for fitting random effect ZIP models (Rabe-Hesketh et al. 2005). For Bayesian inference, the freeware package WinBUGS (Lunn et al. 2000) can be used to fit various zero-modified count models, including hierarchical models. See Neelon et al. (2010, 2012) for examples.

Many of these packages can also be used to fit semicontinuous models. For example, SAS PROC NLMIXED can be used to fit random effect semicontinuous models (Tooze et al. 2002). The freeware package aML (Lillard and Panis 1998) can be used to fit multilevel two-part models; see Liu et al. (2008) for an application. Mplus software (Muthén and Muthén 1998) is useful for fitting finite mixture and growth mixture two-part models; Muthén (2001) provides an illustration. SAS PROC GENMOD and the Stata command glm can be used to fit quasi-likelihood one- and two-part models. Buntin and Zaslavsky (2004) provide example code for fitting such models. Finally, WinBUGS can be used to fit Bayesian two-part semicontinuous models; see Cooper et al. (2003, 2007) and Ghosh and Albert (2009) for examples. Readers should visit the appropriate software websites for updates and current versions of these packages.

Conclusion

Two-part models play an important role in health services research settings where data are characterized by both a high proportion of zeros and a skewed distribution of positive values. By modeling the zero and nonzero values in distinct ways, two-part models offer a flexible parametric approach to the analysis of zero-modified count and semicontinuous data. In many cases, such flexibility can yield improved model fit over traditional one-part models. At the same time, the reliance on parametric assumptions can be a liability, particularly in the case of semicontinuous data. Misguided assumptions about the response distribution will naturally lead to biased inferences. As in any regression analysis, careful attention to modeling assumptions is paramount to achieving unbiased parameter estimates. If these assumptions appear to be violated, distribution-free quasi-likelihood or other semiparametric approaches may be preferable.

There are a number of active areas of research involving two-part models. These include two-part spatial and spatiotemporal models for semicontinuous data, shared-parameter models for informatively censored zero-modified counts, and inverse-probability weighting methods for population-average two-part models. These developments highlight just a few of the potential opportunities for methodological research involving two-part models.

Lastly, given the scope of the methods described above, this chapter should be viewed as an introductory overview of two-part modeling. Readers are encouraged to consult the references cited herein for further discussions of two-part models and their ongoing application to health services research.
Ridout M, Demétrio C, Hinde J. Models for count data with many zeros. Proceedings from the International Biometric Conference, Cape Town; 1998. Available from https://www.kent.ac.uk/smsas/personal/msr/webfiles/zip/ibc_fin.pdf
Ridout M, Hinde J, Demétrio CGB. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219–23. Available from https://doi.org/10.1111/j.0006-341X.2001.00219.x
Rodrigues J. Bayesian analysis of zero-inflated distributions. Commun Stat Theory Methods. 2003;32(2):281–9. Available from http://www.tandfonline.com/doi/abs/10.1081/STA-120018186
Roeder K, Lynch KG, Nagin DS. Modeling uncertainty in latent class membership: a case study in criminology. J Am Stat Assoc. 1999;94(447):766–76. Available from http://www.jstor.org/stable/2669989
Rosen O, Jiang W, Tanner M. Mixtures of marginal models. Biometrika. 2000;87(2):391–404. Available from http://biomet.oxfordjournals.org/content/87/2/391.abstract
SAS 9.1.3 Help and Documentation. Cary; 2000–2004. Available from http://sas.com/
Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461–4. Available from http://www.jstor.org/stable/2958889
Silva FF, Tunin KP, Rosa GJM, Silva MVBd, Azevedo ALS, Verneque RdS, et al. Zero-inflated Poisson regression models for QTL mapping applied to tick-resistance in a Gyr x Holstein F2 population. Genet Mol Biol. 2011;34:575–82. Available from http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572011000400008&nrm=iso
Skaug H, Fournier D, Nielsen A, Magnusson A, Bolker B. glmmADMB: generalized linear mixed models using AD Model Builder; 2012. R package version 0.7.2.12. Available from http://glmmadmb.r-forge.r-project.org
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639. Available from https://doi.org/10.1111/1467-9868.00353
Stata Statistical Software: Release 12. College Station; 2011. Available from http://stata.com/
Su L, Tom BDM, Farewell VT. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics. 2009;10(2):374–89. Available from http://biostatistics.oxfordjournals.org/content/10/2/374.abstract
Su L, Brown S, Ghosh P, Taylor K. Modelling household debt and financial assets: a Bayesian approach to a bivariate two-part model; 2012.
Tobin J. Estimation of relationships for limited dependent variables. Econometrica. 1958;26(1):24–36. Available from http://www.jstor.org/stable/1907382
Tooze JA, Grunwald GK, Jones RH. Analysis of repeated measures data with clumping at zero. Stat Methods Med Res. 2002;11(4):341–55. Available from http://smm.sagepub.com/content/11/4/341.abstract
Ver Hoef JM, Jansen JK. Space-time zero-inflated count models of harbor seals. Environmetrics. 2007;18(7):697–712. Available from https://doi.org/10.1002/env.873
Vuong QH. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica. 1989;57(2):307–33. Available from http://www.jstor.org/stable/1912557
Walhin JF. Bivariate ZIP models. Biom J. 2001;43(2):147–60. Available from 10.1002/1521-4036(200105)43:2<147::AID-BIMJ147>3.0.CO;2-5
Welsh AH, Zhou XH. Estimating the retransformed mean in a heteroscedastic two-part model. J Stat Plann Infer. 2006;136(3):860–81. Available from http://www.sciencedirect.com/science/article/pii/S0378375804003337
Williamson JM, Lin HM, Lyles RH. Power calculations for ZIP and ZINB models. J Data Sci. 2007;5:519–34. Available from http://www.jds-online.com/v5-4
Winkelmann R. Econometric analysis of count data. 5th ed. Berlin: Springer; 2008. Available from http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+368353176&sourceid=fbw_bibsonomy
Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44(1):175–88. Available from http://www.jstor.org/stable/2531905
Xiang L, Lee AH, Yau KKW, McLachlan GJ. A score test for overdispersion in zero-inflated Poisson mixed regression model. Stat Med. 2007;26(7):1608–22. Available from https://doi.org/10.1002/sim.2616
Xie H, McHugo G, Sengupta A, Clark R, Drake R. A method for analyzing longitudinal outcomes with many zeros. Ment Health Serv Res. 2004;6:239–46. Available from https://doi.org/10.1023/B:MHSR.0000044749.39484.1b
Yau KKW, Lee AH. Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat Med. 2001;20(19):2907–20. Available from https://doi.org/10.1002/sim.860
Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Softw. 2008;27(8):1–25. Available from http://www.jstatsoft.org/v27/i08/
Zhang M, Strawderman RL, Cowen ME, Wells MT. Bayesian inference for a two-part hierarchical model: an application to profiling providers in managed health care. J Am Stat Assoc. 2006;101(475):934–45. Available from http://www.jstor.org/stable/27590773
Zuur AF, Saveliev AA, Ieno EN. Zero inflated models and generalized linear mixed models with R. Newburgh: Highland Statistics Ltd; 2012. Available from http://www.highstat.com/book4.htm
Data Confidentiality
29
Theresa Henle, Gregory J. Matthews, and Ofer Harel
Contents
Introduction
Introducing the Basics
Types of Disclosures and an Overview of Terms
Privacy for Different Types of Data
Balancing Privacy Versus Utility
Privacy-Preserving Techniques
Unperturbed and Perturbed Methods
Basic Methods for Limiting Disclosure Risk
More Sophisticated SDC Approaches
Measuring Privacy
K-Anonymity
Differential Privacy
Conclusion
References
on a combination of only three demographics (ZIP code, gender and date of birth)” (Sweeney 2000, 1). Attribute disclosure occurs when an adversary is able to learn and reveal sensitive attributes about an individual in the data set. Identity disclosure is often a precursor to attribute disclosure: first a record in the data set is linked to an individual, and then a private attribute about that individual is learned. When data are in tabular form, attribute disclosure is most likely to occur in a column that contains a degenerate distribution of cell counts, as opposed to a column with more uniformly distributed counts. In general, “a row or column with large cell counts would have less risk of identity or attribute disclosure as compared to a row or column with small counts” (Shlomo 2015, 215).

While attribute disclosure often takes place after an identity disclosure has occurred, there are other ways in which attribute disclosure can occur, such as group attribute disclosure, disclosure by differencing, and disclosure by linking tables. In these scenarios, individuals need not be identified in order for disclosure risk to occur, such as in the case of group attribute disclosure, where sensitive information is exposed about a group within the data set, rather than an individual person. For example, say that in a certain data set, all people within a small ZIP code have a diagnosis of high blood pressure. If you know of an individual in this data set who lives in that particular ZIP code, you now also know that they have a high blood pressure diagnosis. Note that no identity disclosure has taken place here as no particular record was matched to any individual; however, an attribute disclosure has still taken place.

Another way for attribute disclosure to occur is from what is called disclosure by differencing. An example of this is where two nested tables (i.e., one table is a subset of another table) are subtracted from one another, exposing sensitive information previously unknown (Shlomo 2015, 214). This is often a problem when data are accessed through flexible table generation. In such a scenario, a user is not given access to the data set as a whole but must submit queries to a database. This is problematic from a privacy perspective when a user can difference two separate query results to gain confidential information about a single person in the study. For example, a user might submit a query based on all men under the age of 34, and then subsequently submit a query based on all men under the age of 35. If the difference is 1, then the user has identified a unique combination of attributes of an individual in the data set. Note that if the user were to submit a query based ONLY on men who were EXACTLY 34 (in this case only one record), many query systems would suppress the resulting cell, since they suppress any cell whose value is below some prespecified threshold precisely because a small cell count can lead to attribute disclosures. By submitting nested queries and differencing, a user is gaining access to information that would potentially be suppressed on its own. The best way to avoid disclosure by differencing is to release a single data set as opposed to providing a system that allows for flexible table generation.

Similar to disclosure by differencing, disclosure by linking tables occurs when two tables originate from the same source and therefore have the potential to be linked by common cells or common margins. This can potentially allow an adversary to discover the SDC technique which guards the data and with it the original data values. The best way to avoid disclosure by linking tables is to ensure that the margins and cells of tables be made consistent (Shlomo 2015, 214).

The last type of disclosure risk is inferential disclosure. Inferential disclosure relies on probability and/or modelling to expose attributes with a high degree of confidence. One way in which inferential disclosure can occur is by way of a regression model if the model has “very high predictive power” (i.e., the dependent and explanatory variables are highly correlated). This specific case of inferential disclosure is called model, or predictive, disclosure. Willenborg and de Waal (2001) explain predictive disclosure using microdata containing information about an individual’s gender, age, occupation, location, and income. If an adversary knows certain characteristics (i.e., gender, age, occupation, and location) about a specific individual in the data set, they can build a regression model to predict an unknown value
(say income). Through modelling, the adversary has achieved the “predictive distribution of the target’s income” (Willenborg and de Waal 2001); however, many would argue that such inferences are the goal of statistical modelling, and that preventing them therefore extends beyond reasonable privacy protection.

Disclosure by differencing, which was previously discussed, can also be considered a type of inferential disclosure. For this reason, inferential disclosure is often associated with web-based interactive data. The recent emergence of interest in inferential disclosure has catalyzed a need for stricter forms of privacy guarantees and has formed the basis for differential privacy, a popular privacy measurement discussed later in this chapter (Shlomo 2015, 203).

There are examples showing that an individual does not even need to be in a data set in order to be at risk of inferential disclosure. For example, say a person were to release their genetic sequencing data (as was done in the Human Genome Project) and they are found to have a specific gene. Later, a subsequent study with new participants reveals that this specific gene makes a person extremely likely to develop a rare form of cancer. The person who released their genetic data was not a part of the study which determined the effects of this specific gene, but they are now subject to its inference: that they will likely develop this rare form of cancer. However, according to the current privacy guidelines, researchers are only beholden to protect the privacy of those individuals included in the published data set.

Privacy for Different Types of Data

There are two common structures for how published data sets are presented: microdata and tabular data. Microdata are “data containing observations on the individual level” such as social surveys or general health surveys. Tabular data “contains frequency counts or magnitude data,” which is more typical of business surveys (Shlomo 2015, 202). Methods for protecting privacy in tabular data can be classified as either pre-tabular, post-tabular, or some combination of these two methods.

Pre-tabular methods are implemented on original microdata before they are transformed into a tabular data set. Post-tabular methods modify data for privacy purposes after the data set is already in its tabular form. The most common forms of post-tabular disclosure control techniques are methods utilizing random rounding and cell suppression.

Data collection techniques can also impact the type and degree of vulnerability associated with a data set. Data can be generated either through sampling from a larger population or by collecting complete information on the population through what is called a census. Though sampling is more common because it requires fewer resources, census data are popular for government publications. Census data pose unique challenges in preventing identity and attribute disclosure because there is no uncertainty about membership in the data. Conversely, sampling as part of the data collection process obscures the ability to make inference on frequency counts, thereby reducing the possibility of identity disclosure.

Lastly, the introduction of new types of data has opened the door to new concerns over data vulnerability. With advancements in technology, large quantities of data are being generated from processes that simply did not exist until recently, such as location data collected from cell phones or individuals’ genome-wide data. With the Global Positioning System (GPS) capability of modern cell phones, it has become easy to track the locations of millions of people at once, yielding massive quantities of location data. Location data are problematic from a privacy perspective as they have the potential to jeopardize confidential information about individuals such as where they live or where they work.

Genetic data, such as data used in genome-wide association studies (GWAS), are another type of data with unique and ever-expanding complexities. Since the inception of the Human Genome Project in 1990, and its completion in 2003, scientists have made great strides in understanding the human genetic structure. The availability of human genetic data is crucial for the continued growth of genetic research. However,
this new source of personal information has been accompanied by its very own set of privacy concerns. Homer et al. (2008) demonstrated “the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture” (Homer et al. 2008, 1). Homer et al. argued the need for more stringent methods for sharing and combining individual genotype data across studies, since “sharing only summary data do not completely mask identity” (Homer et al. 2008, 9). Gymrek et al. (2013) demonstrated a shocking breach of privacy when they were able to identify a man whose genetic data had been used in the Human Genome Project. By using a sequence of that man’s genetic data along with information about his location and age, they were able to cross-reference genealogical databases and public records to discover the identity of the individual (Gymrek et al. 2013).

Balancing Privacy Versus Utility

Protection of privacy is not the only concern a data distributor must consider when preparing a data set. Establishing privacy protections can often come at a loss of data utility, causing the data to become unusable or inaccurate. This is true regardless of the type of data used or how a publisher defines privacy in their data set. In general, “it is always possible to increase the privacy of any specific data release, but this almost assuredly comes with a loss of data utility” (Matthews et al. 2010). Therefore, publishers should think carefully about the balance between privacy and utility when preparing a data set for publication. Further, if the distributor knows that the data will be used for a specific purpose, this is often helpful information in choosing an appropriate disclosure control method.

There are two major frameworks for how to measure data utility on data sets where SDC techniques have been applied. The more general of the two is information loss measures, which do not presume any specific intended use for the data. Alternatively, the utility-constrained approach considers the way the data are intended to be used and preserves data utility for that task specifically.

When the intended use of the data is unknown, publishers can utilize information loss measures to quantify data utility. These measures seek to minimize data distortion in a broad sense, making the data more versatile but relinquishing the promise of utility for any specific task. Information loss measures compare the utility of the altered data set to that of the original data set. This difference in utility is task specific, meaning the altered data set may perform accurately for some desired analysis but inaccurately for others. For example, if a data set was altered such that marginal totals remained constant, it is likely that when testing for average values, one would achieve perfect retention of data utility. However, in the same case, the relationship between variables may not be maintained, and therefore a subsequent regression analysis would be ineffective at reflecting the true nature of the data. This is often the case when electronic health records and data sets that contain multiple variables of interest need to be shared (e.g., demographics and diagnosis codes). The two attribute types cannot be anonymized separately; however, it is difficult to “preserve data utility when anonymizing both of the attribute types together” (Gkoulalas-Divanis and Loukides 2015, 30). For this reason, it is desirable that publishers reveal the type of privacy-preserving method used and that researchers consider the effect that method may have on the tests they wish to perform.

When the intended use of the data is known, the data distributor will likely opt for a utility-constrained approach when measuring data utility. The specific type of utility constraints employed depends entirely upon the intended use of the data, but in general a utility constraint prevents the anonymization procedure from generating data that will produce vastly different results when compared to the original data. For example, a data publisher may want to add noise to a variable but may check that the resulting sample mean of the modified data is relatively close to the true sample mean of the original data set. Further, constraints preventing combinations of variables that are not possible (e.g., a record of a woman with prostate cancer) could also be considered here.
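
The mean check described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a procedure taken from the cited literature: the function names, the choice of Gaussian noise, the tolerance, and the redraw loop are all illustrative, and the record fields `sex` and `diagnosis` are hypothetical.

```python
import random
import statistics


def add_noise_with_mean_constraint(values, noise_sd, tolerance, seed=0, max_tries=100):
    """Add zero-mean Gaussian noise to each value, accepting a draw only if the
    perturbed sample mean stays within `tolerance` of the original sample mean.
    (Illustrative utility constraint; the redraw loop is an assumption.)"""
    rng = random.Random(seed)
    original_mean = statistics.fmean(values)
    for _ in range(max_tries):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
        if abs(statistics.fmean(noisy) - original_mean) <= tolerance:
            return noisy
    raise RuntimeError("no perturbation satisfied the utility constraint")


def violates_logical_constraint(record):
    """Flag an impossible combination of values (hypothetical field names)."""
    return record["sex"] == "F" and record["diagnosis"] == "prostate cancer"


# Hypothetical ages; the protected copy keeps the sample mean within 1 year.
ages = [34, 41, 29, 57, 62, 45, 38, 50]
protected = add_noise_with_mean_constraint(ages, noise_sd=2.0, tolerance=1.0)
```

Redrawing until the constraint holds is only one simple way to enforce a utility constraint. Conditioning on the outcome changes the distribution of the noise, so its effect on disclosure risk would need to be assessed separately.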
diagnosis codes for different forms of cancer and risk small counts that are more easily exposed, generalization would bin all cancer diagnoses into one or more subsets that contain higher counts. The generalized term is still semantically consistent for the specific diagnosis, such as replacing lymphoma with “cancer.” However, the generalized term does not offer as much detail, thereby obscuring sensitive values and preventing an attacker from distinguishing a specific diagnosis code from within the generalized term. Generalization is best implemented when the number of quasi-identifier attributes is small and when the intended use applies to a range of data rather than a specific class. The more attributes involved, the greater the number of generalized terms required to ensure privacy, which will lead to the degeneration of data utility. When a user seeks information about a group or range of values, such as people from a certain geographic area, generalization provides privacy without any utility loss.

Generalization is susceptible to composition attacks when multiple independent data sets are available. If two equivalence classes share only one sensitive value, an adversary can deduce sensitive information by differencing. For example, the raw data set may contain information about the age of an individual. Rather than reporting exact age, generalization would report, for instance, the age group (e.g., 20–29, 30–39, etc.).

Suppression
Generalization is a favorite technique due to its “faithful” information properties. Although the granularity of detail may not be fine, the accuracy of values is pristine, and the relationship between variables is not disturbed. Suppression is an extreme case of generalization where the most generalized term is utilized. Therefore, where possible, generalization is preferred because it is a superior technique in preserving data utility. Top-bottom coding is another specialized case of generalization that applies specifically to extreme values. For example, there may only be one person in the study who is 99 years old; however, there may be ten individuals over the age of 80. An agency may record specific age for individuals in the study less than 80 years old but utilize the generalized term “over 80” for those ten individuals. Generalization provides the basis for more complex partitioning privacy preservation models such as k-anonymity, l-diversity, and t-closeness (Li et al. 2015, 187). These techniques will be discussed in the following section.

Sampling
The last of the unperturbed methods is sampling. A familiar technique for data collection, sampling is also very useful in privacy preservation. Skinner et al. (1994) make the case that “population uniqueness will be a sufficient condition for an exact match to be verified as correct.” In other words, samples obscure population uniqueness and stifle an adversary’s ability to cross-reference uniqueness between data sets in a linkage attack. Sampling also does extremely well in balancing privacy with utility, as proper sampling techniques should yield data that are an accurate representation of the population. Moreover, sampling is an “easy technique to implement and the resulting sampled data are relatively easy to analyze” (Matthews and Harel 2011).

Randomization
Perturbation techniques work by modifying the contents of the data in some way as the basis for privacy preservation. Randomization is the most basic perturbation technique and can be used for both microdata and tabular data sets. In randomization, noise is randomly added to the original values (or aggregated values), obscuring the true values contained within an individual’s record and making it difficult for an adversary to infer sensitive information. The simplest application of randomization would be random noise drawn independently from an identical distribution with a mean of zero and a positive variance. In this case, the addition of random noise “will not change mean of the variable for large data sets, but will introduce more variance,” (Shlomo 2015, 210) which may harm the ability of a researcher to make accurate statistical inferences. Randomization is best used within “small homogenous sub-groups in order to use different initiating perturbation variance for each sub-group” (Li et al. 2015, 180). The use of subgroups for noise
addition is also beneficial in maintaining accurate relationships between variables in the data (Matthews et al. 2010).

Rounding
Rounding is another perturbation method generally applied to tabular data sets. As the name implies, in rounding, observations are rounded up or down to the nearest multiple of a predetermined rounding base. For example, if the rounding base was 1 and the observed value was 0.3, the probability of rounding up would be 0.3, whereas the probability of rounding down would be 0.7. Another method is controlled rounding, “which allows the sum of the rounded values to be the same as the rounded value of the sum of the original data” (Shlomo 2015, 218). A problem with random rounding occurs, however, when cells generated in different tables lack consistency. When this happens, “the true cell count can be learned by generating many tables containing the same cell and observing the perturbation patterns” (Shlomo 2015, 218). An alternative to controlled rounding is semi-controlled random rounding, which “ensures that rounded internal cells aggregate to the controlled rounded total” (Shlomo 2015, 218), thereby enforcing consistency across all generated tables.

More Sophisticated SDC Approaches

Micro-agglomeration, Substitution, Subsampling, and Calibration
MASSC (Micro-agglomeration, Substitution, Subsampling, and Calibration) combines various simple techniques to create a more robust approach to data privatization. The name of the procedure lays out the four steps: micro-agglomeration, substitution, subsampling, and calibration. In micro-agglomeration, records are sorted by the level of risk, dependent on the presence of identifying variables. High-risk identifying variables are called core variables, as compared to noncore identifying variables, which generally pose less risk to privacy. Core identifying variables pose a greater risk because they are generally easier for an intruder to obtain. A record is at greater risk when core identifying variables are present and are unique. Disclosure control techniques are then applied to groups of records based on their risk category. Substitution techniques are used to perturb the data. Substitution methods include random rounding, randomization, data swapping, and synthetic data (the last two methods mentioned here are discussed in detail below). The data are then sampled from the perturbed data set to add another layer of privacy protection and to “help reduce the bias caused by substitution” (Singh et al. 2003). A unique and desirable property of MASSC is “that both disclosure risk and information loss can be controlled for simultaneously” (Matthews et al. 2010).

Data Swapping
Data swapping is a privacy-preserving technique popular for its ease of use. Although the technique was originally intended for use on contingency tables, it has become a popular technique for microdata as well. The procedure involves “the swapping of values of variables for records that match on a representative key” (OECD 2008, 126). In other words, given a data set with a sensitive variable, such as cancer diagnoses, where it is necessary to protect against attribute disclosure, some records containing that diagnosis code will swap with another record exclusively within that variable. Variables that are not considered sensitive will be untouched by this process, for the record swapping applies only to the variable of concern. An example of this can be viewed through the following table. In the real data set, the sensitive variable is the participant’s cancer diagnosis. In the swapped data, the second and third rows are swapped within the cancer column, so that the participant in row 2 now is associated with a cancer diagnosis and the participant in row 3 no longer is (Fig. 1).

Data swapping is best used when one is simply interested in univariate statistics. Since records are swapped one for one, the marginal totals remain intact, making univariate statistics unchanged. Multivariate relationships, on the other hand, between the affected variable and the other variables in the data set may not be correctly
maintained in the data swapping process. However, because only the sensitive variable is affected, multivariate analysis can be effectively conducted by simply excluding the sensitive variable.

When implementing data swapping, one must be wary of swaps that may result in impossible or improbable records. An example of this would be if a data set contained the variables gender and diagnosis code and swapping resulted in a record suggesting that a female was diagnosed with prostate cancer.

As mentioned, data swapping is effective for both tabular data and microdata sets. However, implementation procedures may differ depending on the type of data used. Since microdata provide subject-level information rather than variable aggregates, “many more swaps must be made to preserve the level of privacy” (Matthews et al. 2010). Determining the number of swaps necessary was deemed “computationally impractical” by Fienberg and McIntyre (2004), and therefore totals should be preserved only approximately for best practice. As previously stated, arguably the greatest advantage to this method is that it is very easy to implement. All that is required to utilize this method is microdata and a random number generator (Moore 1996).

Rank-Based Proximity Swapping
A more contemporary alternative to the traditional data swapping method is rank-based proximity swapping, proposed by Greenberg (1987) and popularized by Moore (1996). Unlike traditional data swapping, values of sensitive variables are swapped with records where the value of the sensitive variable falls within a certain range of the original record. This restriction allows the relationships between the sensitive variable and the other variables in the data set to be more effectively maintained than in traditional data swapping, where the process of swapping is strictly random.

Data Shuffling
Sarathy and Muralidhar (2002) proposed a further extension of data swapping called data shuffling. Data shuffling utilizes a conditional distribution approach where all of the marginal distributions remain intact. Moreover, pairwise monotonic relationships in the original data are maintained. They are therefore able to increase the privacy protection without sacrificing the high level of utility achieved through data swapping (Sarathy and Muralidhar 2002). For this reason, this method has become standard for many, including the United States Bureau of the Census and the Office for National Statistics in the UK (Lauger et al. 2014).

Randomized Response
Randomized response is a technique for survey data closely related to the previously discussed technique of randomization (Warner 1965; Greenberg et al. 1969). In randomized response, respondents will answer a question truthfully with some given probability (e.g., a coin flip). Otherwise, they are instructed to answer the question with the opposite of the truthful answer.
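
The truthful-or-opposite design just described can be simulated to see how an analyst might still recover a population proportion from the randomized answers. The following is a minimal sketch, assuming a yes/no question and a known truth probability p: the expected share of reported "yes" answers is p·π + (1 − p)·(1 − π), where π is the true proportion, which can be solved for π whenever p ≠ 0.5. The function names, the simulated population, and the value p = 0.75 are assumptions made for illustration. A fair coin (p = 0.5) is avoided here because, under this particular design, it would make the reported answers carry no information about π.

```python
import random


def randomized_response(true_answers, p_truth, seed=0):
    """Report each true yes/no answer with probability p_truth, else its opposite."""
    rng = random.Random(seed)
    return [a if rng.random() < p_truth else (not a) for a in true_answers]


def estimate_prevalence(reported, p_truth):
    """Estimate the true 'yes' proportion from the randomized reports by
    inverting E[reported yes] = p*pi + (1 - p)*(1 - pi)."""
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes the reports uninformative")
    observed_yes = sum(reported) / len(reported)
    return (observed_yes - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


# Hypothetical population in which 30% would truthfully answer "yes".
truth = [True] * 3000 + [False] * 7000
reports = randomized_response(truth, p_truth=0.75)
estimate = estimate_prevalence(reports, p_truth=0.75)
```

No individual report reveals that respondent's true answer with certainty, yet the aggregate estimate should land close to the true proportion when the sample is large. Values of p nearer 0.5 give respondents more protection at the cost of a noisier estimate.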
researchers can compute analysis across several synthetic data sets and pool their results for a combined estimate. However, since synthetic data relies on artificial data, this can leave researchers pondering the validity of their findings. Raghunathan et al. (2003) and Reiter (2005) set out to assure researchers that synthetic data have merit by showing that for accurate imputation models, resulting analyses yield almost identical results to those from the original data. However, “if the model for imputation is incorrect or inaccurate, the resulting analysis from the synthetic data will yield parameter estimates that are much different than those estimated from the actual data. As such, synthetic data sets are only as good as the models used for imputation” (Matthews et al. 2010, 10).

Though the idea of synthetic data sets was slow to catch on, it has become a widely used and highly successful disclosure control technique. The most highly visible user of this technique is the United States Census Bureau. They have used partially and fully synthetic data in several of their publicly released data sets, including the yearly release of “On the Map” data (Shlomo 2015, 228). These data, generated by personal GPS devices, provide information on the locations of individuals. It would be rather easy to identify individuals based on their home and place of work, making this a statistical disclosure concern. However, through the use of synthetic data sets, the Census Bureau has been able to release these data without risking the privacy of the individuals involved (Shlomo 2015, 228).

Measuring Privacy

Statistical disclosure techniques are designed to protect the privacy of individuals by masking sensitive attributes and preventing disclosure risk and, at the same time, producing data sets that are useful for analysis and inference. While assessing data utility is relatively straightforward (i.e., how similar is the analysis when using the raw data vs. the analysis when using the protected data), the assessment of privacy is substantially more difficult. This is due to the many different kinds of disclosures that exist and the fact that measures of privacy will be different depending on the type of disclosure.

Measures of privacy based on reidentification assess the probability of accurately identifying a subject in the published data set. Spruill (1982) studied the privacy of some masking procedures (e.g., normal random error, random rounding, data swapping, etc.). They proposed a measure of confidentiality based on the percentage of records in the published data set that could be linked to the original record. Paass (1988) discusses a measure of privacy based on matching subjects in the published data set to some additional available information, and their proposed measure of privacy is based on the percentage of records that are at risk for identification. They concluded that the best way to protect privacy is to release as few variables as possible, since the greater the number of variables, the more difficult it is to protect against a privacy attack. Larger data sets (i.e., data sets with many variables) require substantial modifications to the data in order to maintain a robust level of privacy, though this comes at the cost of potentially dramatic reductions in data utility. They also note that the addition of random noise does little to protect privacy in this framework.

K-Anonymity

K-anonymity is an additional privacy measure for data that has had suppression and generalization techniques applied to it. In general, k-anonymity promises a level of anonymization for any given record in the data by focusing on quasi-identifiers. As previously mentioned, quasi-identifiers are “a set of attributes in a data set that could be used for matching with an external database” (Matthews et al. 2010, 16). Quasi-identifiers put an individual at greatest risk for disclosure when a certain combination of attributes is rare or, in the worst case, unique. More formally, k-anonymity states that every set of quasi-identifiers that appears in the
data set must appear at least k times. Thus there is at most a 1/k chance of reidentifying a particular record (Sweeney 2002a).

In application, take the following two data sets with variables age group, gender, and surgical procedure. Generalization has been applied to the age group variable in the “Privacy Preserved Data” so that there are fewer overall age groups and less potential to uniquely identify an individual based on their age. In the raw data set, the combination of age group = 70–75 and gender = F is a unique combination. However, after the data are generalized, every combination of quasi-identifiers (age and sex) appears at least two times. Therefore, the privacy of this data can be measured as 2-anonymous by the principle of k-anonymity. Note, however, that both 70–80-year-old men had the Whipple procedure, thus causing an attribute disclosure even though no record was uniquely identified (Fig. 2). Extensions of k-anonymity include l-diversity (Machanavajjhala et al. 2007) and t-closeness (Li et al. 2007).

Differential Privacy

Differential privacy was proposed in Dwork (2006) and provides formal privacy guarantees and results in one of the strongest versions of privacy. The basic idea of differential privacy is that no single observation in a data set should be overly influential in terms of a function of the data. This means that for a given function of the data, the value of this function will not change “very much” if ANY one single record in the data is modified. Data sets that differ by only one record are referred to as neighboring data sets. (There are actually two distinct meanings for neighboring data sets: one refers to a record being modified, and the other refers to a record being removed. Here the second definition is used.) Exactly how much values of the function are allowed to change is controlled by the parameter epsilon (ϵ), with smaller values guaranteeing more privacy and larger values guaranteeing less. Guaranteeing that the result of a function of the data does not change “very
much” is accomplished by releasing a randomized version of the function rather than its exact value. This results in very strong privacy. Practically speaking, this type of privacy guarantees that if an adversary knows all records in the data set except for one, they will still not be able to learn very much about the last unknown observation, and this would be true for ANY set of observations.

Example data set: 1, 2, 3, 4, 100

As an example, imagine that a data set contained five observations, and one of these observations was a large outlier. The mean of this data set is 22. However, rather than release the value of 22, a randomized version of the mean is released by simply adding some noise to the true value of the sample mean. If no noise were added and the true value of the sample mean were released, an intruder who knew the first four values in this data set and the mean of 22 could learn the exact value of the remaining data value. However, since the released value of the mean is random, the exact value of the remaining data point is uncertain. The exact amount of noise that is necessary to add is based on the data releaser’s choice of the ϵ parameter and what is referred to as the sensitivity of the function. The sensitivity of the function is the absolute value of the largest possible difference between the function computed on the actual data and on a neighboring data set, taken across ALL neighboring data sets.

As an example of sensitivity, if we consider the neighboring database with the outlier removed, the mean is now 2.5. This yields a sensitivity of |22 − 2.5| = 19.5, as this is the largest difference across all neighboring databases.

One of the simplest and most popular ways to achieve ϵ-differential privacy is to add Laplace noise to the true value of the function of interest calculated on the full data set, where the mean of the Laplace distribution is 0 and the variance is determined by the value of ϵ and the sensitivity of the function.

Extensions of differential privacy include several relaxed versions, including (ϵ, δ)-indistinguishability (Nissim et al. 2007) and probabilistic differential privacy (Machanavajjhala et al. 2008). Matthews et al. (2010) and Matthews and Harel (2012) place the problem of measuring privacy in a hypothesis testing framework and use the receiver-operating characteristic (ROC) curve to assess the privacy of a database.

Conclusion

It is estimated that 2.5 quintillion bytes of data are collected every day (DN Capital 2015). These massive quantities of data allow researchers and businesses to perform analyses that were previously unthinkable. However, as the amount of data collected increases, concerns about data privacy will naturally follow. Malicious data users often possess the capabilities to expose sensitive attributes and reveal the identities of individuals in a publicly available data set. This is especially problematic in medical data, where sensitive attributes might refer to a serious illness or diagnosis. Therefore, it is of the utmost importance that proper consideration be given to protecting patient privacy prior to releasing medical data, which requires consideration beyond simply removing direct identifiers. It is imperative that statistical disclosure control techniques be applied to data to ensure a standard of privacy.

References

DN Capital – Venture Capital. Beyond ‘big data’ to data driven decisions. 2015. Dncaptical.com/thoughts/beyond-big-data-to-data-driven-decisions/.
Dwork C. Differential privacy. In: ICALP. Springer Verlag; 2006. p. 1–12. MR2307219.
Fellegi IP. On the question of statistical confidentiality. J Am Stat Assoc. 1972;67(337):7–18.
Fienberg SE, McIntyre J. Data swapping: variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer J, Torra V, editors. Privacy in statistical databases. Vol. 3050 of lecture notes in computer science. Berlin/Heidelberg: Springer; 2004. p. 519. https://doi.org/10.1007/978-3-540-25955-8_2.
Gkoulalas-Divanis A, Loukides G. A survey of anonymization algorithms for electronic health records. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 17–34.
Greenberg B. Rank swapping for masking ordinal microdata. Technical report, U.S. Bureau of the Census (unpublished manuscript), Suitland; 1987.
Greenberg BG, Abul-Ela A-LA, Simmons WR, Horvitz DG. The unrelated question randomized response model: theoretical framework. J Am Stat Assoc. 1969;64(326):520–39. MR0247719.
Gymrek M, McGuire AL, Golan D, Halperin E, Erlich Y. Identifying personal genomes by surname inference. Science. 2013;339:321–4.
Harel O, Zhou X-H. Multiple imputation: review and theory, implementation and software. Stat Med. 2007;26:3057–77. MR2380504.
Health Insurance Portability and Accountability Act (HIPAA); Pub.L. 104–191, 110 Stat. 1936, enacted August 21, 1996.
Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008;4(8):e1000167. https://doi.org/10.1371/journal.pgen.1000167.
Lauger A, et al. Disclosure avoidance techniques at the U.S. census bureau: current practices and research. Research report series. 2014. www.census.gov/srd/CDAR/cdar2014-02_Discl_Avoid_Techniques.pdf
Li N, Li T, Venkatasubramanian S. t-closeness: privacy beyond k-anonymity and l-diversity. In: Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on; 2007. p. 106–15.
Li H, et al. Differentially private histogram and synthetic data publication. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 35–58.
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M. L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data. 2007;1(1):3.
Machanavajjhala A, Kifer D, Abowd J, Gehrke J, Vilhuber L. Privacy: theory meets practice on the map. In: International Conference on Data Engineering. Cornell University Computer Science Department, Cornell; 2008. p. 10.
Matthews GJ, Harel O. Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy. Statist Surv. 2011:1–29. https://doi.org/10.1214/11-SS074.
Matthews GJ, Harel O. Assessing the privacy of randomized vector valued queries to a database using the area under the receiver-operating characteristic curve. Health Serv Outcome Res Methodol. 2012;12(2–3):141–55.
Matthews GJ, Harel O, Aseltine RH. Assessing database privacy using the area under the receiver-operator characteristic curve. Health Serv Outcome Res Methodol. 2010;10(1):1–15.
Moore Jr R. Controlled data-swapping techniques for masking public use microdata. Census Tech Report. 1996.
Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis. In: STOC ‘07: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing; 2007. p. 75–84. MR2402430.
OECD Statistics. Glossary of statistical terms. OECD glossary of statistical terms – data swapping definition, stats. 2008. Oecd.org/glossary/detail.asp?ID=6904
Paass G. Disclosure risk and disclosure avoidance for microdata. J Bus Econ Stat. 1988;6(4):487–500.
Raghunathan TE, Reiter JP, Rubin DB. Multiple imputation for statistical disclosure limitation. J Off Stat. 2003;19(1):1–16.
Reiter JP. Inference for partially synthetic, public use microdata sets. Surv Methodol. 2003;29(2):181–8.
Reiter JP. Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study. J Royal Stat Soc Series A Stat Soc. 2005;168(1):185–205. MR2113234.
Rubin DB. Multiple imputation for nonresponse in surveys. Hoboken: Wiley; 1987. MR0899519.
Rubin DB. Comment on “statistical disclosure limitation”. J Off Stat. 1993;9:461–8.
Sarathy R, Muralidhar K. The security of confidential numerical data in databases. Inf Syst Res. 2002;13(4):389–403.
Shlomo N. Statistical disclosure limitation for health data: a statistical agency perspective. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 201–30.
Singh A, Yu F, Dunteman G. MASSC: a new data mask for limiting statistical information loss and disclosure. In: Proceedings of the Joint UNECE/EUROSTAT Work Session on Statistical Data Confidentiality; 2003. p. 373–94.
Skinner C, Marsh C, Openshaw S, Wymer C. Disclosure control for census microdata. J Off Stat. 1994;10:31–51.
Spruill NL. Measures of confidentiality. Proceedings of the section on survey research methods, American Statistical Association. 1982.
Sweeney L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh; 2000.
Sweeney L. Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertainty Fuzziness Knowledge Based Syst. 2002a;10(5):571–88. MR1948200.
Sweeney L. Simple demographics often identify people uniquely. Carnegie Mellon University, data privacy working paper 3. 2002b.
Sweeney L. K-anonymity: a model for protecting privacy. Int J Uncertainty Fuzziness Knowledge Based Syst. 2002c;10(5):557–70. MR1948199.
Warner SL. Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc. 1965;60(309):63–9.
Willenborg L, de Waal T. Elements of statistical disclosure control. New York: Springer; 2001. MR1866909.
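
As a rough illustration of the differential privacy discussion in this chapter, the following Python sketch reproduces the worked example (data 1, 2, 3, 4, 100) and releases a noisy mean. It is a minimal sketch under stated assumptions, not the authors' implementation. The function names are hypothetical. Neighboring data sets are formed by removing one record (the second definition given in the chapter), and the Laplace scale is set to the sensitivity divided by ϵ, the usual choice for the Laplace mechanism. The sensitivity here is computed from the observed data, as in the chapter's example; a formal ϵ-differential privacy guarantee generally requires the worst case over all data sets the function could be applied to, which for a mean usually means bounding the range of the values in advance.

```python
import random


def leave_one_out_sensitivity(data):
    """Largest absolute change in the mean when any single record is removed.

    Assumption: neighbors are formed by record removal, as in the chapter's example.
    """
    full_mean = sum(data) / len(data)
    diffs = []
    for i in range(len(data)):
        rest = data[:i] + data[i + 1:]
        diffs.append(abs(full_mean - sum(rest) / len(rest)))
    return max(diffs)


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(data, epsilon, rng):
    """Release the sample mean plus Laplace noise with scale sensitivity / epsilon."""
    scale = leave_one_out_sensitivity(data) / epsilon
    return sum(data) / len(data) + laplace_noise(scale, rng)


data = [1, 2, 3, 4, 100]
sensitivity = leave_one_out_sensitivity(data)  # |22 - 2.5| = 19.5, from dropping the outlier
rng = random.Random(0)                         # seeded only to make the sketch repeatable
released = noisy_mean(data, epsilon=1.0, rng=rng)
print(sensitivity, released)
```

Smaller values of ϵ give a larger noise scale and therefore a noisier, more private release; larger values of ϵ give a release closer to the true mean of 22.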
Qualitative Research
30
Cynthia Robins
Contents
Introduction
What Is Qualitative Research?
A Sampling of Qualitative Health Research Studies
Methods of Qualitative Data Collection
Informed Consent
Data Collection Approaches
To Record or Not to Record?
Data Analysis
Simplifying the Data
Summary
References
C. Robins (*)
Westat, Rockville, MD, USA
e-mail: cynthiarobins@westat.com
tive data collection and analysis that can serve as templates for their own qualitative health studies. The overarching goal of the chapter is to argue that when conducted systematically by well-trained scholars, qualitative research has the potential to offer us valuable insights into the socio-cultural factors that underlie the interpretation of diseases, the illness experience, and the search for meaningful and effective treatments.

Introduction

Qualitative research involves the analysis of nonnumeric data obtained through data collection methods such as in-depth interviews, focus groups, and observations. Although they are gaining traction in the field of health research, qualitative methods are a relatively recent addition, and many health researchers are still unsure about the value they bring to the field. This chapter thus has a threefold objective. First, it briefly sets forth how qualitative research differs philosophically from quantitative research. Although readers may be aware that qualitative and quantitative data collection methods are different, it is also important to understand that the epistemologies that drive each approach are very distinct. Second, this chapter will review the recent history of the use of qualitative methods in health research. It will look at the social and political processes that co-occurred with the rise of qualitative research in health and healthcare, as well as briefly describe some of the signal studies in the field over the last 20–25 years. Finally, the chapter will provide the reader with an overview of the fundamentals of qualitative research, including data collection techniques and the basics of the analytic process. The goal of this chapter is to demonstrate that when skillfully and appropriately implemented, qualitative research can offer critical insights into such phenomena as patients’ experiences, service providers’ views of disease processes and treatments, as well as key socio-cultural factors that underlie the structure and delivery of health care in different communities.

What Is Qualitative Research?

Much ink has been spilled over the years on what scholars have often referred to as the “qualitative-quantitative divide” or the “science wars,” with scholars arguing about the supremacy of one approach over the other (classic examples include Popper 1934; Kuhn 1962; Sokal 1996). Those arguments will not be revisited here; instead, this chapter starts from the assertion shared by Hopper (2008), Morse (1991), and others that qualitative research is not “better” or “worse” than its scientific counterparts, but legitimate in its own right and on its own terms. In research, as in life, some things simply serve different ends. To ask, “Which is better, an electric drill or a reciprocating saw?” is a pointless question without knowing the project that is to be undertaken. Once the project objectives are clearly defined, however, there is a right answer. Selecting the wrong tool for the job, perhaps because it is the one the researcher likes or knows the best, can have disastrous consequences for the work at hand.

For the reader who is not well versed in philosophy and who simply wants to know if he or she should consider using qualitative methods on a project, the following brief distinction is worth considering. Quantitative research is rooted in a philosophy of positivism, i.e., the belief that there is objective truth in the world that can be discovered through the application of the scientific method (see discussion in Ponterotto 2005, 128–129). Through controlled experiments, careful measurements, and agreed-upon numeric indicators (e.g., p-values, confidence intervals), science aims to gather increasing amounts of information about the world. Scientific advances, thus, are viewed as getting us closer to a full understanding of an empirical reality.

Qualitative research, by contrast, has its roots in a philosophy of knowledge often called phenomenology or social constructivism (Morse and Field 1996; Alvesson and Skoldberg 2009). This philosophical position is quite distinct from positivism in that it asserts that human beings’ interactions with the world are always mediated by a socially or culturally provided system of symbols: language, beliefs, values, and rules for behavior.
For example, cultural anthropologists, such as Geertz (1973) and others, operate from the fundamental position that because all of our experiences are filtered through a cultural lens, there is no way to get at what some might refer to as “truth”: What we believe to be truth is someone else’s heresy. Qualitative research thus aims not to uncover “the” truth, but rather “their” truth, often with the explicit aim of creating a foundation of understanding between populations in conflict. The objective of qualitative research generally “is to experience, reflect, organize, understand, and communicate” (Estroff 1981, xvi).

Different paradigms lead to different questions and thus different ways to answer those questions. The reader should not wonder if quantitative or qualitative research is “better,” even if a dissertation advisor prefers one approach or the other. The question really is: Which is the right tool to meet the research objectives? If the research questions seek to understand volumes or counts (e.g., “How much...,” “How many...,” “How often...”), the reader should look to quantitative data collection techniques. If the interest is in what the world looks like from another’s point of view, perhaps with an eye towards understanding motivations (e.g., “Why do...,” “How do...”), then qualitative approaches are likely the best option. Once the research objectives are clearly defined, the choice, or even choices (a researcher may use multiple methods), will become obvious. What remains is for the researcher to learn how to use the tool properly.

A Sampling of Qualitative Health Research Studies

The use of qualitative methods to learn how people make sense of their medical experiences, either as recipients or providers of health care, is a fairly new phenomenon, dating back only to the mid-1980s. Arguably one of the greatest contributors to this epistemological shift was the patients’ rights movement, which sought to challenge the hegemony of the medical system. The mental health consumer rights movement, for example, was an early catalyst for change, as former patients of psychiatric hospitals decried some of the abuses they had endured under the guise of psychiatric medicine. Members of this movement, such as activist-writer Judi Chamberlain, wanted to tell their side of the story, i.e., to share their experiences and perspectives being “treated” under lock and key. Chamberlain’s landmark work, On Our Own: Patient-Controlled Alternatives to the Mental Health System (Chamberlain 1978), generously incorporated first-person accounts from mental health consumer/survivors and, in so doing, made a compelling argument that there could be two sides to the medical story.

At roughly the same time, qualitative research methods were being incorporated more broadly into health research. One of the earliest such endeavors was Making It Crazy (Estroff 1981), anthropologist Sue Estroff’s account of how individuals with psychiatric disorders were putting together their lives outside of the state mental hospital. Estroff’s research methods included participant observation, in-depth interviews, and ad hoc encounters in the community with formerly hospitalized psychiatric patients. The result of her time “in the field” is an ethnography that offers the reader critical insights into the patients’ perspectives on psychiatric medications, work, and their relationships with others in the community. Other important works were Emily Martin’s (1987) The Woman in the Body, which examined how the language used to describe women’s reproductive systems influences the medical establishment’s approach to pregnancy, childbirth, and menopause, and Joan Cassell’s (1991) Expected Miracles, an ethnography of surgeons and their perceptions of and behaviors around their work. Cassell’s study offered one of the earliest examples of what she referred to as “studying up,” i.e., research into the lives of powerful members of a society rather than the dispossessed.

Occurring about the same time was the adoption of anthropological methods by scholars in other fields, notably within the field of nursing research. This movement is perhaps best epitomized by the work of Janice Morse, a registered nurse who went on to receive advanced degrees in both nursing and anthropology. In the mid- to
late-1980s, Morse edited several seminal works (Morse 1988, 1989a, b) that introduced nursing scholars to the epistemology and methods of qualitative research. These approaches proved essential to cross-cultural nursing, where the nurses’ and patients’ understanding of illness and appropriate treatment might be worlds apart. Effective care could best be provided, nursing scholars argued, when these different perspectives were taken into account.

By the early 1990s, qualitative health research, while still not fully accepted by the health research establishment, was becoming both ubiquitous and highly influential. Efforts to combat the rapid spread of HIV/AIDS both in the USA and in other countries and cultures demanded research methods that could uncover how a group’s behaviors and beliefs about the disease were contributing to transmission. Anthropologists and other social scientists using qualitative methods rose to the occasion. Paul Farmer’s (1993) ethnography of the interpretation of HIV/AIDS in Haiti early in the epidemic was a landmark work, demonstrating the critical role of both history and culture in people’s illness experiences. The rapid pace of globalization over the last two decades, and the concomitant potential for pandemics, has only made qualitative research methods more essential in the health fields (Ramin 2009; Ebola Anthropology Response Platform).

Seminal journals, such as Qualitative Health Research, first published in 1990, and, roughly a decade later, the International Journal of Qualitative Methods as well as the online Forum for Qualitative Research, have provided important avenues for scholars to share their health research findings and learn about new and innovative approaches to qualitative methods. There are also increasingly well-attended research conferences, including the Qualitative Methods Conference and the Qualitative Health Research Conference (held alternating years), both sponsored by the International Institute of Qualitative Methodology at the University of Alberta, Canada, and the International Congress of Qualitative Inquiry held annually at the University of Illinois in Urbana. These forums are testament to the acceptance that qualitative methods have something valuable to offer health care practitioners and researchers.

Methods of Qualitative Data Collection

Informed Consent

Before collecting any data, the researcher must ensure that he or she follows the guidelines for the protection of human subjects. Key to this is the informed consent process, whereby the study participant is told what his or her rights are as a research subject before any data are collected. The core elements of the informed consent process are provided in Fig. 1 and include a description of the study sponsor and how the data will be used, the risks and benefits to the participants, and the voluntary nature of participation, among others.

These elements must be provided to participants in a written informed consent form that the study participant and researcher will both sign and date before data are collected. It is also good practice to review these key elements verbally with participants before beginning an interview or focus group discussion. Examples of how the information can be verbally reviewed with participants can be found in the sample in-depth interview protocol (Fig. 2) and sample focus group guide (Fig. 3).

Data Collection Approaches

There are three primary qualitative data collection strategies that will help researchers understand how the study subjects experience the world and, in turn, make meaning of those experiences: focus groups, in-depth interviews, and participant observation. Each of these is described in turn.

In-Depth Interviews

In-depth interviews (IDIs) are known by a number of other terms, including semi-structured interviews, unstructured interviews, one-on-one interviews, and guided conversations, among
All study participants need to be given the following information, as applicable, before any data are
collected.
A checklist of the elements of informed consent can be obtained from the U.S. Department of
Health and Human Services, Office for Human Research Protections website:
https://www.hhs.gov/ohrp/regulations-and-policy/guidance/checklists/index.html - accessed
9.5.17
others. All of these terms, however, can mislead the outside observer, who may believe that the researcher and interviewee are having an hour-long discussion bounded by few, if any, parameters. Although the researcher may use an IDI guide that at first glance appears lean, each guide must be carefully crafted to clearly and narrowly frame the topic for the respondent. The guide must also include targeted probes to help the interviewer ensure that, within the general frame, the respondent addresses the areas of greatest interest to the study. IDIs thus require a skilled interviewer who has superior active listening skills and who fully understands how the interviews are intended to support the goals and objectives of the study. He or she must have the intellectual flexibility to move simultaneously between the respondent’s narrative and the study aims, gently guiding the narrative back to the frame when needed, but also listening for new and relevant information that may merit additional probing.
A hypothetical example will help to illustrate the process. Sjogren’s syndrome has been characterized as an “invisible illness,” a disease that may be disabling to the individual who has it, but that offers few visible symptoms to the outside observer (Donohue and Siegel 2000). Sjogren’s symptoms can range from the annoying, such as dry eyes, mouth, and skin, to the disabling, including crippling fatigue, joint pain, and even lymphoma (http://www.sjogrens.org). People living
Introduction
Hello, my name is [NAME ]. Thank you for agreeing to talk with me today about how Sjogren’s has
impacted your social life and experiences. This study, which I am conducting for my dissertation at
University, is being funded by [ORGANIZATION].
Informed Consent
Before we get started there are a couple of things I need to mention. First, this is a research project
and your participation is voluntary. You can stop the interview at any time; if I ask you a question
you would prefer not to answer, just tell me and we’ll move on to the next one. Second, I will do
everything I can to maintain your confidentiality. I will not attach your name to any data files and I
will never use your name in any of my writings from this study. I may use quotes from the people I
interview, but the names of interviewees will not be attached to those quotes. I will also remove any
information from that quote that might identify you to others.
There are no direct benefits to you from participating in this research, although your story will
contribute to my efforts to create a resource manual for others living with Sjogren’s. The main risk
to you from participating in this study is that you might experience some emotional distress from
telling your story. I have a list of resources I will give you at the end of the interview if you would
like.
Finally, with your permission, I would like to audio record our interview today. This is so that I do
not have to take many notes while we are talking and I can focus on the story you are sharing with
me. The recording will also help me to be more accurate when analyzing all of the interviews.
Do you have any questions before we get started? [ANSWER ANY QUESTIONS]
Do I have your permission to audio record the interview? [IF YES, TURN ON THE AUDIO
RECORDER]
Interview Questions
I am interested in learning what it is like to live with Sjogren’s, which some people have referred to
as an “invisible illness.” By that they mean the disease can have profound effects on the person who
has it, but it offers few obvious clues to outside observers that the person is ill. What I’d like to do
today is have you tell me a story about your experiences living with Sjogren’s in a world that may not
know you are sick. You can start your story wherever you like, and you can talk as long as you like,
Fig. 2 (continued)
but tell me everything you think is important for me to fully understand your experiences living and
coping with this invisible illness.
What have you done that has been most effective in getting your work colleagues, family, and
friends to understand what it’s like living with Sjogren’s?
What, if anything, do you wish you had had – or would still like to have – to help others understand
your experiences?
Is there anything else about your experience living with an invisible illness like Sjogren’s that you
haven’t talked about, but that you think is important for me to hear to fully understand your
experiences?
with Sjogren’s may have to make a number of significant lifestyle changes, but often without the support of family or friends, who think the person looks “perfectly healthy.” This illness captures the attention of a hypothetical researcher, who wants to interview people with Sjogren’s to understand their experiences working and living with a disease that no one can see. She hopes to develop a guidebook that can offer sufferers some coping strategies, including talking points that will help the person with the disease explain the illness to people in their social network. Thus, in addition to hearing about her subjects’ social experiences, she also wants to hear from her interviewees what steps they have taken that have been successful in explaining their condition to others, as well as any additional supports they might like to have.

The first question in the IDI guide must set the parameters of the interview for study participants, but also give them sufficient leeway to be able to share their experiences and their points of view. Thus it may look like the following:

I am interested in learning what it is like to live with Sjogren’s, which some people have referred to as an “invisible illness.” By that they mean the disease can have profound effects on the person who has it, but it offers few obvious clues to outside observers that the person is ill. What I’d like to do today is have you tell me a story about your experiences living with Sjogren’s in a world that may not know you are sick. You can start your story wherever you like, and you can talk as long as you like, but tell me everything you think is important for me to fully understand your experiences living and coping with this invisible illness.
This opening statement is by no means “unstructured” or even “semi-structured” because the interviewee is told precisely the bounds within which her narrative should remain: She is being asked to describe the social aspects of the illness, i.e., what it is like to live with a disease that others cannot see. She is not being asked to give a full accounting of her symptoms, the specialists she sees, or the treatments she is undergoing.

In a perfect world, each interviewee would spontaneously relate a story that fully addresses all areas of interest to the researcher. But because this is an imperfect world, the protocol should include probes so that the interviewer makes sure the respondent addresses the key domains of the research. Possible probes for this hypothetical study might include the following:

• In what ways, if any, has this unseen illness affected
  – ...your professional life?
  – ...your home life with family members?
  – ...your social life?

Notice that these three probes cover the key dimensions of interest (work, family, friends), but
Introduction
Hello, my name is [NAME ]. Thank you all for agreeing to participate in this focus group discussion
today about how Sjogren’s has impacted your social lives and experiences. This study, which I am
conducting for my dissertation at University, is being funded by [ORGANIZATION].
Informed Consent
Before we get started there are a couple of things I need to mention. First, this is a research project
and your participation is voluntary. If you decide you no longer want to participate, you can leave
the discussion at any time; if I ask you a question you would prefer not to answer, just tell me and
I’ll move on to the next person. Second, there are no right or wrong answers to any of the questions
that I ask today. You may disagree with what someone else says during the group, and that’s ok. It’s
important that I hear different perspectives. Third, I will do everything I can to maintain your
confidentiality. I will not attach your names to any data files and I will never use your names in any
of my writings from this study. I may use quotes from the focus groups, but the names of
interviewees will not be attached to those quotes. I will also remove any information from that quote
that might identify you to others.
There are no direct benefits to you from participating in this research, although your story will
contribute to my efforts to create a resource manual for others living with Sjogren’s. The main risk
to you from participating in this study is that you might experience some emotional distress from
telling your story. I have a list of resources I will give you at the end of the interview if you would
like.
Finally, with your permission, I would like to audio record our interview today. This is so that I do
not have to take many notes while we are talking and I can focus on the story you are sharing with
me. The recording will also help me to be more accurate when analyzing all of the interviews.
Do you have any questions before we get started? [ANSWER ANY QUESTIONS]
Do I have your permission to audio record the interview? [IF YES, TURN ON THE AUDIO
RECORDER]
Fig. 3 (continued)
30 Qualitative Research 741
Warm-Up Exercise
I’d like to start off by taking just a couple of minutes for us to get to know each other. So if you
would, please tell us just your first name and, briefly, something that you think is unique about
yourself – an interesting hobby, somebody famous that you once met, or an interesting place that
you have visited. [GO AROUND THE ROOM; MODERATOR SHOULD GO LAST]
Discussion Questions
First, I’d like to get a sense of how long each of you has been living with Sjogren’s.
Symptoms
Sjogren’s has often been called an “invisible illness,” that is, a disease in which the symptoms can
have profound effects on the individual who has it, but in ways that may not be obvious to outside
observers. Let’s talk about this idea for a little bit. What symptoms do you all regularly experience
that may affect your daily life, but that you don’t think are noticed by people you work, live, or
socialize with?
Social Impacts
Think for a moment about your professional lives, your home life and family, or your social
activities with friends: Tell me about an instance in which you had to make a lifestyle adjustment to
accommodate your symptoms, but that you didn’t think was fully understood by others, such as
your work colleagues or family and friends.
Strategies
What have any of you done that has been effective in getting your work colleagues, family, and
friends to understand what it’s like living with Sjogren’s?
What, if anything, do you wish you had had – or would still like to have – to help others understand
your experiences?
Close
Is there anything else about your experiences living with an invisible illness like Sjogren’s that you
haven’t talked about, but that you think is important for me to hear to fully understand your
experiences?
also remind the interviewee that the focus of the research is on the impact of the invisibility of the illness, in short, the social effects. As an example, perhaps in responding to the third probe, the interviewee describes how she can no longer do a variety of physical activities because of extreme joint pain: she can no longer garden, take weekend hikes with friends, go for her morning run, or walk the dog. Clearly the loss of an array of activities that she once enjoyed is important to the respondent. The researcher can be sympathetic to this wider loss, but needs the respondent to home in on the one activity that relates to the social impact of the illness, namely, hiking with friends. Thus, an appropriate probe at this juncture might be: How well do your friends understand why you stopped going on weekend hikes? A simple probe such as this is respectful of the
respondent’s need to describe these myriad losses, but in a way that steers the narrative back to the research focus.
The researcher should also be aware that the above probes may not be exhaustive and that interviewees may add a dimension to their experiences that the researcher did not anticipate. Perhaps three of the first four interviewees start off their narratives by recounting how many years it took for a doctor to finally recognize their symptoms and provide a diagnosis. The research team is not interested in hearing about the clinical manifestations of the illness per se, but these narratives suggest that the symptoms may be invisible to the medical community as well. Thus, two new questions for subsequent interviews might be:

• How long did it take for you to get a diagnosis after you first began experiencing symptoms of the disease? Why do you think that was?
• How well do you think the medical community recognizes symptoms of the disease?

Qualitative researchers should always be alert to the possibility that data collection will add entirely new dimensions to their understanding of the issue and be prepared to modify the interview protocol, as needed.
Recall, too, that in this example, the researcher’s aim is to create a guidebook for people living with Sjogren’s syndrome, one that includes successful coping strategies and other resources that readers might find useful. Two additional questions might be included in this protocol:

• What have you done that has been most effective in getting your work colleagues, family, and friends to understand what it’s like living with Sjogren’s?
• What, if anything, do you wish you had had – or would still like to have – to help others understand your experiences?

Finally, because this is a bounded narrative, one steered in a particular direction both by the questions and the interviewer, it is always a good idea to give the respondent a last opportunity to talk about something that may have been given short shrift during the interview:

• Is there anything else about your experience living with an invisible illness like Sjogren’s that you haven’t talked about, but that you think is important for me to hear to fully understand your experiences?

Interviewees generally will not take this as an open invitation to talk about their illness experiences for another hour, for two key reasons: First, the protocol was structured so as to give them sufficient latitude to tell their stories; and, second, this summary question reiterates that the boundaries of the discussion are around the social invisibility of the illness. The full interview guide, along with the critical elements of informed consent, is shown in Fig. 2.
With a skilled interviewer, the above example should generate 45 minutes to an hour’s worth of rich, detailed narrative. And after conducting another 12–15 such interviews, likely the researcher will have sufficient information to at least begin to create the desired end-product. Should there be critical information gaps, additional in-depth interviews can be conducted to complete the product.

Focus Groups
Focus groups are small group (6–10 person) discussions in which a moderator uses a carefully designed protocol to elicit participants’ input on the topic of interest (Morgan and Krueger 1997). While IDIs offer depth on an issue, focus groups provide the breadth necessary when beginning to explore a particular issue. This is a particularly valuable data collection approach in the formative stages of a project, when the study team is still learning the range of study participants’ experiences and perspectives on the topic. Focus groups may also be the data collection method of choice when project resources (money, time) are limited. Sometimes this is unavoidable, although the
researcher should remain cognizant that the lack of depth necessarily limits what one can say about the findings.
Two aspects of the group dynamic need to be considered when developing the discussion protocol for a focus group (see Fig. 3). The first is that even though they have consented to participate, some participants may be a little nervous, uncertain how much they want to reveal about themselves in this group of strangers. Thus, the protocol should include a brief (5-minute) “icebreaking” exercise to get rid of any lingering participant butterflies and to begin to create connections between those in the room. A particularly effective strategy is to ask participants to tell the group something unique or interesting about themselves, such as a hobby they have, someone famous they once met, or some unusual place they have visited. Having the moderator also participate in this exercise is an excellent way for him or her to establish rapport with the group members before reassuming control of the discussion.
The second aspect of the group dynamic that must be taken into account is that the protocol questions – and the moderator – must balance the desire for detailed information against the need to hear from as many participants as possible. In the hypothetical Sjogren’s study, the initial questions to a focus group may look something like the following:

• First, I’d like to get a sense of how long each of you has been living with Sjogren’s.
• [Next] Sjogren’s has often been called an “invisible illness,” that is, a disease in which the symptoms can have profound effects on the individual who has it, but in ways that may not be obvious to outside observers. Let’s talk about this idea for a little bit. What symptoms do you all regularly experience that may affect your daily life, but that you don’t think are noticed by people you work, live, or socialize with?
• Think for a moment about your professional lives, your home life and family, or your social activities with friends: Tell me about an instance in which you had to make a lifestyle adjustment to accommodate your symptoms, but that you didn’t think was fully understood by others, such as your work colleagues or family and friends.

These questions endeavor to get at the same issues as those covered in the IDI, but in a way that does not allow any one person to tell his or her life story. For example, the second question about symptom experience is clearly directed to the group (“What symptoms do you all regularly experience”) and implies that some of these symptoms may be shared and so discussed. The third question also restricts any participant’s input to a single example – enough to give the group (and the research team) a sense of the breadth of experiences of people living with Sjogren’s. Summary questions can be roughly identical to those used in the IDI:

• What have any of you done that has been effective in getting your work colleagues, family, and friends to understand what it’s like living with Sjogren’s?
• What, if anything, do you wish you had had – or would still like to have – to help others understand your experiences?
• Is there anything else about your experiences living with an invisible illness like Sjogren’s that you haven’t talked about, but that you think is important for me to hear to fully understand your experiences?

An important thing to remember is that because the researcher must necessarily limit each person’s input to the discussion, the depth of any one person’s contribution to the research topic will also be limited – often, some important details about a person’s story may be missing. This is the trade-off of conducting focus groups instead of in-depth interviews, so make sure this is the right data collection strategy to answer the research questions. If the researcher has to conduct focus groups because there are constraints on project resources, there may be a temptation to over-interpret the data, e.g., the analyst may see differences between groups that are, at best, lightly supported by the evidence. Analysts should remember to work with the information they do have and let unanswered questions serve as the basis for their next data collection effort.

Participant Observation/Ethnography
This data collection strategy is invaluable when the researcher believes that subjects’ experiences and perceptions can best be understood in the context in which those experiences occur. The researcher gains an understanding of how the world looks through their eyes by observing their behaviors in the location of interest and asking countless questions, some targeted, some spontaneous (Murchison 2010).
A new researcher may find it tricky trying to create interview protocols for this kind of study, in part because so much about the context is unknown; anticipating what specific questions to ask and of whom can feel like an exercise in futility. In addition, the field site oftentimes is not in a location that lends itself to scheduled in-depth interviews or focus groups. That said, the researcher does know the core study goals and, very generally, the roles of those within the context who might be able to address them. Instead of trying to develop a series of interview guides applicable to every conceivable situation, the researcher might consider developing a table of question domains by interviewee role. The table ensures that the researcher will remain focused on the goals and objectives of the study, but in a way that provides the latitude required for ad hoc encounters in the field. In addition, having a single, focused study document can prove helpful if the work is being conducted by a team.
Another hypothetical example can illustrate this approach: A community clinic is struggling to meet the needs of local residents because residents are reluctant to go there. Community members say they are often treated rudely by staff, and avoid the clinic altogether so as not to be subjected to the abuse. Without an alternative source of care nearby, however, many residents end up not receiving any medical care at all. Indeed, surveys conducted with community members show high rates of morbidity from otherwise very treatable conditions, such as diabetes and high blood pressure.
Participant observation would be an excellent research strategy for trying to understand what is happening in these aversive encounters, why it is occurring, and whether the findings point to a possible solution. Locations where the researcher might consider conducting observations could include the clinic waiting room, intake stations where staff make the initial patient contact, weekly team meetings of various staff (e.g., administrators, clinicians, and support staff), and locations throughout the community where the researcher can hear from local residents (e.g., senior centers, community library). An example of the kinds of question domains that might be relevant to this hypothetical study and the categories of people who might be able to speak to each domain is illustrated in Fig. 4.
This example table is by no means exhaustive, but suggests areas where there may be a disconnect between the various participants. For example, the administration may need a high patient volume to ensure sufficient reimbursements to keep the clinic operating; clinicians, however, may find the required volume overwhelming because it severely limits the amount of time they can spend with each patient. Intake staff and clinicians may get frustrated with patients who repeatedly return to the clinic with the same issues, clearly not having followed the treatment recommended during the last visit. At the same time, patients do not understand why physicians expect them to be able to follow through on medication regimens when the community does not have a pharmacy. Moreover, patients with mobility challenges are not always able to drive to the closest pharmacy to pick up their prescriptions.
The data produced through participant observation are not as neat and tidy as those produced through IDIs or focus groups. Although the field researcher may be able to conduct the occasional audio-recorded interview and have it transcribed, much of the resulting data will be in the form of comprehensive observation notes written by the researcher on a daily basis. Notes should include some obvious things, such as observations made
Community Residents
Clinic Administrators
Intake Staff
Clinicians
QUESTION DOMAINS
CLINIC ENVIRONMENT
EXPECTATIONS
Patient volume X X X
Length of appointments X X
Treatment adherence X X
COMMUNITY CONTEXT
by the researcher while sitting in the waiting room: What were interactions like between patients and intake staff? Did the participants seem to be polite with each other or was tension evident? And what was the evidence for either of these observations? How long were patients sitting in the waiting room? What did it look, feel, and smell like while sitting there, i.e., did the researcher find it to be a welcoming environment or not so much? Why? Notes should also be recorded of any ad hoc interviews, whether in the clinic or in the community. Although it likely will not be possible to write verbatim notes while talking with people during these spontaneous encounters, the researcher should write up as complete a recounting of the conversation as memory allows and as soon as possible after the interview.
Finally, the researcher should include her own thoughts and feelings in the observation notes. Perhaps she finds the clinic staff insufferable, believing them to be rude to the patients. Conversely, perhaps she finds the patients themselves unpleasant, believing them to be demanding too much from harried physicians. Regardless, it is important that the researcher keep in mind the biases she brings to the work as well as how those biases can easily color her interpretation of the data. Realistically, it is
highly unlikely that one clinic managed to hire all of the unbearable doctors, nurses, physicians’ assistants, and administrative staff in the area. The researcher must then ask herself, what might be the structural contributors to the staff’s bad behavior? Are they overworked? Is the pay lower than other similar positions in the area? Do they feel like they are unable to make a positive difference in their patients’ lives? The researcher may not like – indeed, should not feel compelled to like – the individuals with whom she is working. But it is critical to acknowledge those feelings and move beyond them so that systemic challenges – and thus possible solutions to the problems – can be identified.

To Record or Not to Record?

Researchers new to qualitative methods often express discomfort about using an audio recorder during an interview or even a focus group discussion. Particularly when interviewing people about sensitive subjects (e.g., illness, sexuality), the recorder can seem like a monstrous intrusion on the interviewee’s private experiences. Nevertheless, recording is the best way to create an accurate record of what was said during the interview and thus ensure that the analysis is based not on secondary data (e.g., the interviewer’s notes and remembrances), but on the primary results of the data collection effort (e.g., the recording itself or interview transcripts). Edward Ives’s The Tape-Recorded Interview (Ives 1995) is a particularly useful guide for researchers, but the following brief tips may facilitate the reader’s use of an audio recording device.

Discuss the Desire to Record Early in the Process
Except on holidays and birthdays, many people do not care to be surprised. If the study plan is to record the interviews, respondents should be told this at the recruitment stage of the project: “I will be conducting an approximately one-hour interview that, with your permission, I would like to audio record.” If the recording is optional, this gives the respondent some time to consider if he or she is okay with being recorded. If the recording is not optional (e.g., the funder/client may stipulate in a contract that focus groups are to be recorded), this allows potential participants the opportunity to opt out early if they do not wish to be recorded.

Allow Interviewees or Participants to State Things off the Record
Interviews can be very cathartic at times, leading respondents to get something off their chests that they then wish they hadn’t. The researcher should let participants know that if they end up saying something they want to have expunged, it will be deleted from the recording, any notes about it will be scrubbed, and that information will never make it into the report. Sometimes respondents may say, “I need to say something, but it has to be off the record.” The interviewer should turn off the audio recorder, let them say what they need to say, and then ask permission to turn the recorder back on. Interviewees can be much more comfortable knowing they have some editorial control over what they say.

Let Participants Create a Pseudonym
Because study participants’ names will never be used in final reports or journal articles, it makes no difference to the researcher whether they use their real name when being interviewed or not. But some individuals feel more comfortable with the added layer of anonymity that a pseudonym can bring. If a topic, particularly in a focus group discussion, is especially sensitive, the researcher should consider offering participants the option of coming up with their own names for purposes of the discussion.

Store Audio Files in a Secure Location
Neither the researcher nor his/her interviewees should feel confident that an audio file on a portable recording device will not be accessed by others. Not only do such devices lack security features, but also they are small and easily misplaced or lost. Study participants should be told
how the researchers will secure their information, including where the file will be maintained and how quickly the file will be deleted from the portable device.

with a description of the fundamentals of the analytic process, more details of which can be found in the previously cited references.
(4.a.) effective coping strategies, (4.b.) strategies that did not work so well for them, (4.c.) resources that they wish they had available to them, and (4.d.) text still not yet categorized. As the reader can see, in this example, simply using the concepts covered in the interview guide (which, not coincidentally, parallel the research questions), the analyst can readily parse hundreds of pages of data into smaller, more manageable “units” of data for analysis.
Distillation by mapping extant categories onto the data, essentially a deductive approach, is not so much “analysis” as it is a necessary precursor to the analytic process. That is, there is nothing particularly analytic about locating all of the transcript sections in which interviewees describe how Sjogren’s has affected their social lives. However, it is only by reading through all of this similar text that the analyst can then begin to discern patterns in interviewees’ descriptions – i.e., analyze – the ways in which this invisible disease leaves its social mark. Indeed, the analyst ultimately may find at least two threads in these narratives: those cases in which interviewees were no longer able to participate in their peer group’s activities and their social lives collapsed and those, perhaps fewer, instances in which interviewees described a strengthening of their core social relationships. This bifurcate finding may lead the analyst down a further analytic path as he or she endeavors to determine the factors that contribute to any individual experiencing one social trajectory or the other. In sum, deductive data simplification does not preclude inductive (see below) data analysis.

Inductive Simplification
What if the research questions are not nearly so clear-cut as the ones proposed in this article? What if, instead of wondering how Sjogren’s affects the work, home, and social dimensions of people’s lives, the research goal is simply to capture the broad experience of living with Sjogren’s? Instead of asking interviewees to describe what it is like “living with an invisible illness” (which, as noted previously, necessarily implies asking how the person with the illness interfaces with others), the researcher asks, “Please tell me what it has been like for you living with Sjogren’s?” Your probes may be less directive than ours, asking, “How does the illness affect you day-to-day?” rather than, “How has living with this illness affected your social life?”
In this case, simplifying the data requires reliance on an inductive analytic approach, in which the meaningful categories emerge from the reading of the analyst’s data rather than being predetermined by the research questions. Inductive simplification may mean the analyst needs to read all of the interviewees’ transcripts, at least once, possibly twice, before he or she can begin to find recurring themes in their narratives. Many may describe impacts of the illness on their work and social lives and, as a result, an initial cut in the data is created along these two dimensions. But the analyst may also find that interviewees often describe being disappointed in themselves when they find they are no longer able to do not only high-energy activities, such as hiking or playing tennis, but even simple tasks to which they once gave not a moment’s thought. Carrying a basket of laundry, turning a wrench to release the oil drain bolt on the car, even walking up a flight of stairs – once effortless activities have become onerous, if not impossible, to perform. Interviewees describe a loss of self-efficacy that to them is as disturbing as, if not more so than, the loss of their social lives. After reading several similar descriptions, the researcher might create a provisional category, perhaps called “Sense of Self,” and begin to look for additional text that recounts similar feelings and experiences on the part of the narrator. As with deductive simplification, once these subsets of data are defined, the analyst can dive further into each, looking for additional similarities and differences in how individuals describe these like experiences.
Unlike a purely deductive approach, where data reduction is a precursor to analysis, data analysis is part and parcel of inductive data simplification. Meaningful cuts in the data are not predetermined by the research questions or interview guides, but must be determined by the researcher through multiple careful readings of the data and their subsequent interpretation.
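The difference between the two simplification routes can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the category names, trigger words, and example segments are all invented, and simple keyword matching stands in for the analyst's careful reading, which no keyword list can replace. The deductive pass sorts text into categories fixed in advance; the inductive step adds a provisional category only after the leftover text has been read.

```python
from collections import defaultdict

# Hypothetical codebook. The category names echo the interview guide domains;
# the trigger words are invented purely for illustration.
CODEBOOK = {
    "work_impact": ["boss", "colleague"],
    "social_impact": ["friends", "hike"],
    "coping_strategy": ["explained", "brochure"],
}


def simplify(segments, codebook):
    """Group transcript segments into like units, one list per category.

    Segments that match no category are set aside as "uncategorized" so the
    analyst can read them for themes the codebook did not anticipate.
    """
    units = defaultdict(list)
    for segment in segments:
        lowered = segment.lower()
        matches = [
            code
            for code, words in codebook.items()
            if any(word in lowered for word in words)
        ]
        for code in matches or ["uncategorized"]:
            units[code].append(segment)
    return dict(units)


# Invented example segments, not quotations from any real interview.
segments = [
    "My friends stopped inviting me on weekend hikes.",
    "My boss thinks I am just tired.",
    "I cannot even carry a basket of laundry anymore.",
]

# Deductive pass: the categories are fixed before any reading takes place.
deductive = simplify(segments, CODEBOOK)

# Inductive step: after reading the leftover text, the analyst adds a
# provisional category and sorts the data again.
revised = {**CODEBOOK, "sense_of_self": ["carry", "stairs"]}
inductive = simplify(segments, revised)
```

In the first pass the laundry segment lands in the uncategorized pile; in the second it moves into the provisional "sense_of_self" category, mirroring the "Sense of Self" example in the text.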
Nevertheless, data reduction is still only the first step in the process, whether it begins through induction or deduction. Subsequent analytic efforts will explore the data for additional patterns, such as themes or concepts that are shared by all respondents, or multiple different perspectives on the same issue (e.g., differential impacts on one’s social life). Whenever possible, finding a potential explanation for such differences is the next step in the analytic process. The previous social impacts example described hypothetical interviewees who said their social worlds came undone as a result of their illness and others who said they grew even closer to their core group of friends. The analyst might first look for demographic differences in each of these groups as a possible way to account for the different effects: perhaps the latter interviewees are significantly older than the former or perhaps the first group are single while the second group are married. The analyst might also look to each speaker’s narrative for additional clues that could account for the differences: words such as “outgoing,” “active,” “gregarious,” and “social” may characterize the first group’s narratives, while such terms are largely absent among the second group.

A Note on Data Coding
Over the last 20 years, qualitative researchers have increasingly incorporated software into their approach to data analysis. Sophisticated programs such as NVivo, Atlas.ti, Dedoose, and others are allowing researchers to analyze more data, more quickly, and in a way that is far more transparent than the old-fashioned paper-and-colored-markers approach. Importantly, though, the fundamentals of qualitative data analysis do not change simply because a computer is involved. The analyst must still read through all of the data; reduce the reams of information into manageable, “like” units through either deductive or inductive simplification processes; read the like units to identify narrative themes that are shared by the interviewees or that diverge; and, when possible, seek an explanation to account for those differences. The software does, however, make several of these steps easier.
First, because the analyst is using electronic codes for data reduction rather than colored markers, extracting similarly coded text can be as quick and easy as the click of a button or the writing of a simple program (data query). The analyst thus can spend less time looking for text and more time reading it to see if there are important nuances in interviewees’ narratives. Second, automation allows studies to collect and analyze much larger volumes of data than would be feasible if the work were being done by hand. In 2010, for example, the U.S. Department of Defense supported a Comprehensive Review Working Group (CRWG) to examine active-duty and reserve service members’ views about the potential impact on unit cohesion, morale, and readiness if Don’t Ask, Don’t Tell (the 1993 law barring openly gay individuals from serving in the military) were repealed. In addition to conducting the largest-ever survey of service members and military spouses, the effort included the analysis of hundreds of focus group transcripts, two thousand open-ended survey comments, and literally thousands of comments sent to a DOD inbox. All data collection and analysis took place within a ten-month timeframe, a feat that was possible only with the support of an excellent qualitative data analysis program (Robins and Eisen 2017).
Third, these programs allow the users to link respondent characteristics (e.g., demographic data, geographic location, organizational affiliation) to interview documents such that the analyst can quickly examine the data for any patterns by respondent type. In the Don’t Ask, Don’t Tell study, for example, the team was able to explore respondent sentiment regarding repeal (positive, negative, does not care) by respondent gender, service (e.g., Marine Corps, Army), officer or enlisted status, or pay grade, or any combination of those characteristics (e.g., female Army officers compared to male Army officers). This type of analysis can possibly be done without a computer, but it would be tedious and time-consuming, at best.
Finally, and importantly, qualitative analysis software supports the development of an “audit trail,” a time- and date-stamped description of the decisions and actions of the analytic team.
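The software features just described (retrieving like-coded text, linking respondent characteristics to documents, and keeping an audit trail) can be illustrated with a minimal Python sketch. It does not reproduce how NVivo, Atlas.ti, Dedoose, or any other package works internally; every name, respondent attribute, and excerpt below is a hypothetical placeholder, and nothing here is data from the study cited above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CodedSegment:
    respondent_id: str
    code: str   # label the analyst assigned to the excerpt
    text: str


# Hypothetical respondent attributes and placeholder excerpts.
RESPONDENTS = {
    "R1": {"gender": "female", "service": "Army"},
    "R2": {"gender": "male", "service": "Army"},
    "R3": {"gender": "female", "service": "Army"},
}
SEGMENTS = [
    CodedSegment("R1", "positive", "placeholder excerpt 1"),
    CodedSegment("R2", "negative", "placeholder excerpt 2"),
    CodedSegment("R3", "positive", "placeholder excerpt 3"),
]

AUDIT_TRAIL = []


def log_decision(note):
    """Append a time- and date-stamped note describing an analytic action."""
    AUDIT_TRAIL.append((datetime.now(timezone.utc).isoformat(), note))


def retrieve(segments, code):
    """Return every excerpt that carries the given code (a simple data query)."""
    log_decision(f"retrieved segments coded '{code}'")
    return [s for s in segments if s.code == code]


def tabulate(segments, respondents, *attributes):
    """Count codes within each combination of respondent attributes."""
    counts = Counter()
    for s in segments:
        profile = respondents[s.respondent_id]
        counts[tuple(profile[a] for a in attributes) + (s.code,)] += 1
    log_decision(f"tabulated codes by {', '.join(attributes)}")
    return counts


positive = retrieve(SEGMENTS, "positive")
by_group = tabulate(SEGMENTS, RESPONDENTS, "gender", "service")
```

The retrieval step only gathers the text; the analyst still has to read it. The tabulation step shows why attribute links matter: any combination of characteristics can be compared without re-sorting the transcripts by hand, and each call leaves a stamped entry in the audit trail.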
This is important documentation for clients, some of whom may be uncertain about the rigor with which the qualitative analysis is being done. It is also an invaluable check for the analysts, ensuring that both new and seasoned researchers are able to support both their decisions and their findings with data.

Donohue PJ, Siegel ME. Sick and tired of feeling sick and tired: living with invisible chronic illness. New York: WW Norton & Company; 2000.
Ebola Anthropology Response Platform. http://www.ebola-anthropology.net/
Estroff SE. Making it crazy: an ethnography of psychiatric clients in an American community. Berkeley: University of California Press; 1981.
Farmer P. AIDS and accusation: Haiti and the geography of blame. Berkeley: University of California Press; 1993.
Forum: Qualitative Social Research. Accessible at: http://www.qualitative-research.net/index.php/fqs/index.
Summary Geertz C. The interpretation of cultures. New York: Basic
Books; 1973.
Qualitative health researchers have helped to shed Hopper K. Qualitative and quantitative research: two cul-
tures. Psychiatr Serv. 2008;59(7):711.
light on how both patients and clinicians under- International Congress of Qualitative Inquiry. Information
stand states of health, disease, and what consti- available at: http://icqi.org/qualitative-health-townhall-
tutes appropriate treatment. The insights meeting/
generated from their work have contributed to International Institute for Qualitative Methodology. Infor-
mation available at: https://www.ualberta.ca/interna
reduced disease transmission, understanding of tional-institute-for-qualitative-methodology
patients’ lived experiences, improved communi- International Journal of Qualitative Methods. Accessible at:
cation between clinicians and the people they https://us.sagepub.com/en-us/nam/international-journal-
treat, and better patient health care experiences. of-qualitative-methods/journal202499#description
Ives ED. The tape-recorded interview: a manual for
There has always been the potential for misunder- fieldworkers in folklore and oral history. Knoxville:
standings to emerge between patients and clini- University of Tennessee Press; 1995.
cians, who have very different funds of Kuhn TS. The structure of scientific revolutions. Chicago:
knowledge and assumptions about the world. University of Chicago Press; 1962.
Martin E. The woman in the body: a cultural analysis of
The rise in globalization only exacerbates the reproduction. Boston: Beacon Press; 1987.
potential for conflict in the midst of a medical Miles MB, Huberman AM. Qualitative data analysis. 2nd
crisis, resulting in increased disease burden on ed. Newbury Park: Sage; 1994.
patients and the systems trying to treat them. Morgan DL, Krueger RA. The focus group kit. Los
Angeles: Sage; 1997.
Health researchers interested in contributing to Morse JM, editor. Recent advances in cross-cultural nurs-
the development of constructive dialogues in the ing. Edinburgh: Churchill Livingstone; 1988.
clinical encounter may well find that qualitative Morse JM, editor. Cross-cultural nursing: anthropological
research methods are the right tool for the job. approaches to nursing research. New York: Gordon &
Breach; 1989a.
Morse JM, editor. Qualitative nursing research: a contem-
porary dialogue. Rockville: Aspen Press. Rev ed.,
Newbury Park: Sage; 1989b.
References Morse, JM. Getting Started: Labels, Camps, and Teams.
Qualitative Health Research, Volume 1991;1(1):3–5.
Alvesson M, Skoldberg K. Reflexive methodology: new Morse JM, Field PA. Nursing research: the application of
vistas in qualitative research. Los Angeles: Sage; 2009. qualitative approaches. Cheltenham: Stanley Thornes
Bernard HR, Ryan GW. Analyzing qualitative data: sys- Ltd; 1996.
tematic approaches. Los Angeles: Sage; 2010. Murchison JM. Ethnography essentials: designing,
Boeije H. Analysis in qualitative research. Los Angeles: conducting, and presenting your research. San
Sage; 2010. Francisco: Josey-Bass; 2010.
Cassell J. Expected miracles: surgeons at work. Philadel- Ponterotto JG. Qualitative Research in counseling psychol-
phia: Temple University Press; 1991. ogy: a primer on research paradigms and philosophy of
Chamberlain J. On our own: patient-controlled alternatives science. J Couns Psychol. 2005;52(2):126–36.
to the mental health system. New York: Haworth Press; Popper K. The logic of scientific discovery. London:
1978. Routledge; 1934.
30 Qualitative Research 751
Qualitative Health Research. Accessible at https://us. Roller MR, Lavrakas PJ. Applied qualitative research
sagepub.com/en-us/nam/journal/qualitative-health-re design: a Total quality framework. New York: Guilford
search#description Press; 2015.
Ramin, B.M. & McMichael, A.J. Climate Change and Sokal AD. Transgressing the boundaries: towards a trans-
Health in Sub-Saharan Africa: A Case-Based Perspec- formative hermeneutics of quantum gravity. Social
tive. EcoHealth. 2009;6:52. https://doi.org/10.1007/ Text. 1996;#46/47:217–52.
s10393-009-0222-4 Thorne S, Reimer Kirkham S, O’Flynn-Magee K. The
Robins CS, Eisen K. Strategies for the effective use of analytic challenge in interpretive description. Int J
NVivo in a Large-Scale Study: qualitative analysis Qual Methods. 2004;3(1):1–11.
and the repeal of don’t ask, don’t tell. Qual Inq. Volume
2017;23(10):768–778.
Part III
Health Care Systems and Policies
31 Assessing Health Systems
Irene Papanicolas and Peter C. Smith
Contents
Introduction
What Is Performance Measurement for?
Defining and Measuring Performance
Defining the Unit of Analysis
Defining Key Performance Objectives
Methodological Issues
Conclusions
References
important role not only as an intrinsic element of the health system but also as a key component of a great deal of health services research. Underlying all of these efforts is the role it plays in enhancing the decisions that various stakeholders, such as patients, clinicians, managers, governments, and citizens, take in identifying performance improvements and steering the health system toward better outcomes overall.
The use of performance measurement for health system improvement has been strongly advocated by pioneers in the field such as Florence Nightingale and Ernest Codman since the late 1800s. Yet only in the past decades have health systems seen a substantial growth in health system performance measurement and reporting to this end. The new growth in performance information and its use for improvement has been the result of multiple factors on both the demand and supply side. On the demand side, increasing demands for accountability and transparency by the public have created a growing culture requiring proof and accountability. On the supply side, great advances in technology have made it possible to develop and store increasing amounts of information, allowing stakeholders instant access to large volumes of data (Smith et al. 2009).
While these factors give major impetus to the use of information for performance improvement, a large number of key debates and barriers remain. Health systems are still experimenting with performance measurement, and large steps are still needed to coordinate efforts and identify what works. The policy agenda has moved from concerns with whether data collection should be undertaken, and in what areas, to concerns of how to summarize and present data and how to coordinate key interests in order to develop firmly based policies and tangible improvements.
This chapter seeks to summarize some of the main issues emerging in the performance measurement debate. The chapter will begin by considering what the key aims of performance measurement are and what performance measurement seeks to evaluate. This section will draw upon some of the debates which have arisen as a result of the different ways in which the health system and its objectives are conceptualized by different stakeholders and frameworks. The chapter will then consider some of the methodological considerations which have arisen in the use and evaluation of performance information. Finally, it will conclude by discussing the major challenges found in presenting and using performance measures and by presenting key lessons and future priorities.
What Is Performance Measurement for?
Health systems are complex entities with many different stakeholders, including patients, health-care professionals, health-care providers, purchaser organizations, regulators, the government, and the broader citizenry. As outlined by an early report in the area of health information (Rigby et al. 1999), information can be identified as having five key roles in health care (Table 1) relating to the different accountability relationships that exist between the many stakeholders in the system. Through the collection and use of information for decision-making in health systems, stakeholders can hold each other to account, thereby facilitating improvements in effectiveness and efficiency. Thus, the fundamental role of performance measurement is to help enable accountability relationships to function, by enabling stakeholders to make informed decisions. It is therefore noteworthy that, if the accountability relationships are to function properly, no system of performance information should be viewed in isolation from the broader system design within which the measurement is embedded.
Each of the key roles of information described in Table 1 relates to a separate function or role of the health-care system, such as providing patient care or planning and developing health services. Each entails different information needs in terms of the nature of information, the level of detail and timeliness, and the level of aggregation required, in order to function effectively. For example, in choosing which provider to use, a patient may
need detailed comparative data on health outcomes for a specific intervention. In contrast, in holding a government to account, and deciding for whom to vote, a citizen may seek out highly aggregate summaries and trends. Many intermediate needs arise. In order to contribute to operational management, more aggregate information and detailed assurance on safety aspects may be necessary. This variety of uses highlights greatly different information needs in terms of the nature, detail, timeliness, and level of aggregation information users require. A fundamental challenge in performance measurement is to create information systems that are able to cater efficiently for these diverse needs, both in terms of data collection and data presentation and interpretation.
In practice the development of performance measurement has rarely been pursued with a clear picture of who the information users are or what their information needs might be. Instead performance measurement systems have often developed opportunistically, usually seeking to inform a variety of users and presenting a wide range of data in the hope that some of the information collected will be useful to various parties. Yet, given the diverse information needs of the different stakeholders in health systems, it is unlikely that a single method of performance reporting will be useful for everybody. Instead data sources should be designed and exploited with the needs of different users clearly in mind. This may often involve using data from the same sources in different forms. A major challenge for health systems is therefore to develop more nuanced ways of collecting and presenting performance measures for the different stakeholders without imposing a huge burden of new data collection and analysis.
The starting point of most performance assessments is the creation of a conceptual framework on which to base the collection of information and to use as a heuristic for the understanding of the entity being assessed (whether it be the entire health system, a provider organization, or an individual practitioner). A theoretical framework is necessary to help define a set of measures that reflect key organizational objectives and in turn allow for an appropriate assessment of its performance. In the past decade, numerous conceptual frameworks have been created for health system performance assessment at the international level and national level. In many cases countries have developed more than one performance framework, reflecting variations in national and/or local priorities or the performance of different areas of the health system. While existing frameworks have varied purposes, they all aim to provide a better understanding as to what constitutes “good” performance by identifying the entity whose performance is being assessed, its key objectives, and the underlying structures and factors that drive performance (Papanicolas and Smith 2014).
Defining and Measuring Performance
The role of performance measurement is to measure, analyze, and report the extent to which the health system is achieving its key objectives. In order to assess performance successfully, it is important to be able to unambiguously define the entity being assessed (whether this be the health system, an organization, or an individual), as well as the key performance objectives of this entity.
Defining the Unit of Analysis
One of the main areas of debate across this field of study involves clearly defining the unit under scrutiny, whatever the level of analysis. At the system level, differences exist between national and international stakeholders in determining where the health system boundaries lie and what responsibilities lie within the jurisdiction of the health system. In particular, there is no consensus as to whether a definition of the “health system” should encompass the wider determinants of health outcomes and whether it should include activities which impact health outcomes such as public health, health promotion, and targeting social determinants of health (Papanicolas et al.
2013). There can be no right answer to this question, as institutional arrangements differ between countries, and there are arguments for promoting the use of both wider and narrower boundaries depending on the purpose of the analysis. However, lack of consensus on this issue makes international comparison of performance assessment difficult (Papanicolas and Smith 2014).
At the organizational level, boundaries between different sectors of care such as primary care, hospital care, and long-term care are rarely clearly defined. Part of the difficulty in producing a coherent definition of these services and organizations emerges from differences in remits within and across systems. For example, something like rehabilitation after surgery may be provided in the hospital sector in some systems and in long-term care facilities in others. However, it would be misleading to compare the performance of the two hospital sectors without considering this difference. Whatever the chosen definition, in any evaluation of performance, the crucial objective is to ensure that the achievements being assessed accurately represent the contribution attributable to the entities under scrutiny. For example, in the performance assessment of a hospital, it is essential to isolate the contribution of hospital care to the attainment of performance objectives (e.g., health improvement) and where necessary to adjust for any contribution of other activities such as primary care provision, public health, and contextual factors such as the economic, political, and demographic environment. It is thus necessary for one to consider what range of services falls within the accountability of the hospital and how the contribution of these services can be assessed while controlling for other factors external to the responsibilities of the hospital.
Defining Key Performance Objectives
Section “What Is Performance Measurement for?” above outlines the main objectives of performance assessment and the potential that information holds to ensure that the accountability relationships within the health system can operate in a manner that enables the health-care system to achieve its overarching goals. Thus, to be able to assess the performance of the health system, it is important to articulate clearly its key objectives. There exists a substantial literature which outlines the main goals of the health system (Aday et al. 2004; Atun 2008; Commonwealth Fund 2006; Hurst and Jee-Hughes 2001; IHP 2008; Jee and Or 1999; Kelley and Hurst 2006; Klassen et al. 2009; Murray and Frenk 2000; Roberts et al. 2008; Sicotte et al. 1998), and while there are differences related to the definitions of what particular objectives entail, there seems to be relative consensus on the objectives themselves. These objectives can usually be grouped under a limited number of headings, broadly summarized as:
• The health conferred on citizens by the health system
• The extent to which the health system is equitable
• The extent to which patients and their families are protected from the direct costs of needed health care
• The patient experience offered by the health system
• The efficiency and productivity with which health resources are utilized
The fundamental goal of all health systems is to improve the health of patients and the general public. However, aside from being concerned with the absolute level of health improvement in each system, a number of performance frameworks highlight the importance of distributional (or equity) issues, expressed in terms of inequity in health outcomes. Most health systems today are concerned not only with the ability of the health system to improve health but also with doing so across all groups in the population. Related to this concept is the issue of equity of access to health care or equity of access to and financing of health care; most health systems also seek to protect citizens from the impoverishment that can arise from health-care expenditure and to ensure all groups of the population have access to at least a basic package of health services (Papanicolas et al. 2013).
In 2000, the World Health Report “Health Systems: Improving Performance” highlighted health system “responsiveness” as an intrinsic goal of the health system (Murray and Frenk 2000; WHO 2000). The WHO definition refers to “responsiveness to the legitimate expectations of the population for their interaction with the health system,” and it captures dimensions unrelated to health outcomes such as dignity, communications, autonomy, prompt services, access to social support during care, quality of basic services, and choice of provider. Often this goal is also referred to as patient or population satisfaction or patient experience, yet while there is overlap across these three concepts, they do not all encompass the same characteristics but almost always relate to the underlying expectations of patients and the population. As with health outcomes, it is not only the absolute level of responsiveness/satisfaction or good experience in a system that is of interest but how this is distributed among different groups in the population.
Finally, efficiency and productivity, or the extent to which health resources are used to produce valued outcomes, is also a key objective of health systems. Reflecting the wide range of potential perspectives, economists and policy makers have adopted different conceptualizations of efficiency when analyzing different levels of the health system. Systems-level efficiency is concerned with understanding how well a specific system is using the resources at its disposal to improve health and secure related objectives (Papanicolas and Smith forthcoming). At the organizational level, definitions usually refer to the extent to which health service objectives, such as hospital objectives, have been achieved compared to the maximum that could be attained, given the resources available and the external constraints on attainment. At the very micro level, efficiency can be related to decisions of individual clinicians on how to distribute health-care resources across treatment options in order to maximize valued outputs. The study of this type of efficiency often takes the form of a systematic analysis of the effects and costs of alternative methods or programs for achieving the same objective (e.g., improving quality of life, extending years of life lived, or providing services).
As stated above, the overall aim of performance measurement is to measure, analyze, and report the extent to which the health system is achieving its key objectives. However, we have also seen that information requirements necessary to measure performance vary across the key roles of the health system, the different stakeholders, and the different levels of analysis. Table 2 considers some of the key types of measures relating to the objectives discussed above at different units of analysis, in particular relating to (1) the system level, (2) the organizational level, and (3) the individual level. Information at the systems level is aggregated information that allows stakeholders to consider how performance objectives are being met at the population level. This information can be useful for national or regional benchmarking exercises or to gauge overall performance on particular goals or to assess the impact of system-level reforms. Organizational-level performance information can be crucial for many of the key roles of the health system, such as allocating resources, patient choice, treatment, and policy evaluation. Finally, information at the individual level can be very important for physicians and managers to ensure that safe and effective services are delivered to patients.
Methodological Issues
The diverse set of users and information needs in a health system calls for a wide variety of measurement techniques and indicators. Various approaches toward data collection are needed to assemble the necessary information, such as national surveys, patient surveys, administrative databases, and routinely collected clinical information. The domain of performance being examined will in part determine the most appropriate data collection technique (Table 3). For example, when measuring responsiveness, household or individual surveys are likely to be the best sources of patients’ experiences and perspectives, whereas when looking at specific clinical outcomes,
clinical registries may be a more informative and cost-effective source of information. In practice, although performance measurement efforts have progressed over recent years, many health systems still rely on readily available data as a basis for performance measurement. An important research agenda is to determine where new or revised data collection initiatives would be most valuable.
Regardless of the data sources used, a fundamental issue that arises when seeking to interpret performance data is: What has caused the observed performance and to what practitioners, organizations, or agencies should variations in performance be attributed? The key performance objectives outlined in section “Defining and Measuring Performance” are often the product of numerous determinants. An individual’s health status, for example, depends on some factors that can be directly influenced in the short term by actors in the health services (e.g., improving medical care), others that require long-term action by actors not directly associated with health services (e.g., environmental policy), and yet others that depend primarily on the actions of individuals and their families (e.g., diet).
Various statistical methods can be used to adjust information for different risk factors, such as differences in resources, case mix, and environmental factors, to make performance more comparable across organizations or practitioners.
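As a rough illustration of the kind of adjustment described above, the following sketch applies indirect standardization, one simple form of case-mix adjustment: each hospital's observed deaths are compared with the deaths that would be expected if a set of reference rates applied to its own mix of patients. The hospitals, risk strata, reference rates, and function names are all hypothetical and are not taken from the chapter; applied work typically adjusts for many more patient attributes, often with regression models.

```python
# Hypothetical illustration of indirect standardization (observed / expected).
# All hospitals, strata, and figures below are invented for the example.

def crude_rate(case_mix):
    """Unadjusted death rate: total deaths divided by total patients."""
    patients = sum(n for n, _ in case_mix.values())
    deaths = sum(d for _, d in case_mix.values())
    return deaths / patients


def expected_deaths(case_mix, reference_rates):
    """Deaths expected if the reference rates applied to this case mix."""
    return sum(n * reference_rates[stratum]
               for stratum, (n, _) in case_mix.items())


def standardized_ratio(case_mix, reference_rates):
    """Observed deaths divided by expected deaths (1.0 means 'as expected')."""
    observed = sum(d for _, d in case_mix.values())
    expected = expected_deaths(case_mix, reference_rates)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed / expected


# Assumed reference (for example, system-wide) death rates per risk stratum.
reference_rates = {"low_risk": 0.02, "high_risk": 0.10}

# Each entry maps a stratum to (patients treated, observed deaths).
hospital_a = {"low_risk": (800, 16), "high_risk": (200, 20)}
hospital_b = {"low_risk": (200, 4), "high_risk": (800, 80)}

for name, mix in (("A", hospital_a), ("B", hospital_b)):
    print(name,
          round(crude_rate(mix), 3),
          round(standardized_ratio(mix, reference_rates), 3))
```

In this invented example, hospital B's crude death rate (84 per 1,000 patients) is more than twice that of hospital A (36 per 1,000), yet both standardized ratios are approximately 1, because the difference arises entirely from hospital B treating a larger share of high-risk patients.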
These methods are known as “risk adjustment” techniques. Where variations in performance measures are known to be influenced by factors beyond the control of the entities under scrutiny, it becomes essential to employ methods of risk adjustment when using and comparing indicators to help account for these variations. For example, when measuring hospital outcomes as an indication of quality, it may become crucial to adjust for patient attributes such as their age, comorbidities, or socioeconomic class. Failure to risk-adjust outcome measures before comparing performance may result in drawing misleading conclusions and can have serious implications for policy and quality improvement (Iezzoni 2013). However, many methods of risk adjustment remain highly contested. Therefore, whenever risk adjustment is undertaken, it should be presented in a clear and transparent manner together with the final performance data. Furthermore, when performance assessment is used for health service improvement, it is essential that causality for observed measures is attributed to the correct sources or parties (Terris and Aron 2009).
When collecting and assessing performance information, two types of error should be recognized and controlled for to the extent possible. The first of these is random error, which emerges with no systematic pattern and is always present in quantitative data. Random error can give rise to two types of false inference, commonly known as type 1 errors (false positive) and type 2 errors (false negative). The traditional way of controlling for these errors is to apply statistical tests to data at a conventional significance level (usually 0.05 or 0.01), which bounds the probability of a type 1 error. Although well understood, this statistical approach is essentially arbitrary and ignores the relative cost of making either type of error. The second type of error is systematic error which may
occur if there have been errors in measurement approaches, such as flawed sampling methods. Systematic errors of this sort will lead to erroneous conclusions concerning a variable’s true value. In order to avoid systematic errors, it is critical that data collection methods are carefully designed, implemented, and audited.
Traditionally, performance measures have been classified as structure, outcome, or process measures. Outcome measures reflect the eventual objective of the system. However, certain process measures may be more realistic indicators of quality if they are known to be associated with good future outcomes. Different types of indicators will be appropriate depending on the setting. For example, outcome measures such as mortality may be more useful when looking at population health, while process measures will be more indicative of the quality of care for a specific procedure. It is critical that designers of performance measurement schemes are aware of the advantages and disadvantages of different types of indicators when using them to assess performance. Table 4 summarizes the main advantages and disadvantages of using outcome and process indicators and the areas of performance measurement where they are most useful.
Experience indicates that a balanced approach with multiple aggregated and disaggregated indicators is most desirable to cater for the information needs of different stakeholders and to allow more informed policy decisions. For this reason, composite indicators – indicators which combine separate performance indicators into a single
index or measure – are often used to rank or compare the performance of different practitioners, organizations, or systems by providing a “bigger picture” and offering a more rounded view of performance (Goddard and Jacobs 2009). The main virtue of composite indicators is that they capture attention in a way that a mass of separate indicators cannot. However, critics of composite measures argue that reducing the measurement of objectives, or entire dimensions, to one indicator runs the risk of being too simplistic and masking many of the variations in performance that should be studied.
Indeed, if composite indicators are not carefully designed, they may be misleading and could lead to serious failings if used for health system policy making or planning (Smith 2002). One of the main challenges encountered in the creation of composite indicators is selecting which measures to include in the indicator and with what weights, particularly in areas where there is little choice of data, and questionable sources may be used for some components of the indicator. Thus, when using composite indicators, it is prudent to give a full description of all the information that is summarized in the indicator, to provide an insight into the performance of each component and help pinpoint the reasons for variation. In addition, the composite and its inputs should be presented with proper uncertainty measures, which may be more informative than measures of central tendency (Jacobs et al. 2005; Naylor et al. 2002).
It is important to note that rapid progress is being made in all areas of health system data collection, including areas such as the design, collection, governance, linkage, and dissemination of data. These developments have the potential to add further value to the existing data collected, particularly by extending the application of what is already available and by collecting new data in a more coordinated, timely, and reliable fashion. Data linkage is allowing researchers and policy makers to create a more complete record of all factors that contribute to health, facilitating the creation of less noisy indicators and a more holistic picture of health determinants. The adoption of IT systems in health-care organizations and the systematization of classifications within and across countries (using tools such as diagnostic resource groupings and/or ICD codes) also allow more robust comparisons across organizations. Finally, another very large area of development is that of information and communication technologies (ICT), often described within the EU context in particular as “e-health,” which has the potential to improve greatly the scope, volume, and quality of performance data.
Conclusions
The ultimate aim of performance measurement is to help hold the various agents to account, given the organization and structure of the health system, by enabling these stakeholders to make informed decisions. In order for these accountability relationships to function properly, no performance information system should be viewed outside the broader context within which the measurement is fixed. Where possible, performance measurement should provide information for all the relevant accountability relationships present in the health system.
If undertaken carefully, performance measurement can offer a powerful resource for identifying weaknesses and suggesting relevant reforms. The progress that has been achieved is impressive, both in the scope of areas for which data is now available and in the degree to which comparability across different entities has been improved. Table 5 outlines the key developments that have been made across some health service performance domains and also highlights some of the main challenges that remain.
The data collection techniques and methodological tools used for performance measurement have developed considerably in the past decade. The debates raised by the WHO 2000 report in particular have spurred the development of datasets, which are updated regularly with new surveys, process indicators, or outcome indicators in order to best operationalize theoretical concepts. Considerable progress has also been made in the measurement of patient-reported outcomes, patient satisfaction measures, and patient
Table 5 Challenges and developments for the measurement of health service performance domains

Performance domain: Health improvement
Challenges for measurement:
- Many aggregate measures fail to distinguish the contribution of the health system
- Problems of comparability among over time, reflecting changes in and differences between coding rules
- Large gaps in availability of evidence on the effectiveness of treatments reducing mortality
- Limited set of dimensions captured by outcome measures with a marked lack of measures on disabilities or discomfort
- Lack of available, good-quality, and comparative data at the patient level
Developments in measurement:
- The development of electronic health records (EHRs) provides more complete information on all factors influencing outcomes
- Increase in registry data, which identifies individual patients and traces them through the care process
- Increase in measures of outcomes that are not defined in terms of cure, which are important for the measurement of chronic disease and long-term care

Performance domain: Equity
Challenges for measurement:
- Lack of existing datasets which provide a longitudinal perspective
- Limited evidence has been recorded on how sensitive inequalities are to the inclusion of environmental effects
- Limited understanding of the factors explaining the health production process and sources of inequalities, including the role of mental conditions along with cognitive biases in measuring self-reported health
- Inadequate identification of what stands behind measures of socioeconomic position, namely, different income sources and measures of wealth and social environmental controls which differ across the life cycle
Developments in measurement:
- Better collection of indicators on determinants of health
- Investing in data linkages to allow disaggregation by socioeconomic status and better monitoring of health inequalities

Performance domain: Patient experience
Challenges for measurement:
- Lack of conceptual clarity as to what is the difference between satisfaction, patient experience, and responsiveness
- Lack of clarity as to whose experiences/satisfaction should be measured (population vs. patient vs. general experts)
- Surveys on satisfaction are very sensitive to question wording, sampling, and demographic factors
Developments in measurement:
- Developing more research to understand determinants of satisfaction, patient experience, and responsiveness
- Developing more precise questions of experience and standardized questionnaires for the evaluation of health services

Performance domain: Efficiency
Challenges for measurement:
- The production process underlying health systems is intrinsically complex and poorly understood. Most measures make simplifying assumptions that may sometimes result in misleading data
- Outputs are generally multidimensional, and therefore preference weights are needed if they are aggregated into a single measure of attainment. The choice of such weights is intrinsically political and contentious
- A fundamental challenge in developing an efficiency measure is ensuring that the output that is being captured is directly and fully dependent on the inputs that are included in the measurement
- Environmental factors, policy constraints, population characteristics, and other factors may be largely responsible for determining health outcomes, yet it is difficult to incorporate all possible determinants appropriately into an efficiency assessment
- From an accounting perspective, the assignment of inputs and associated costs to specific health system activities is fundamentally problematic, often relying on arbitrary accounting rules or other questionable assignments
- Although researchers have developed indicators that seek to measure full production processes, these measures are often not the most informative for policy makers looking to identify and address inefficiencies
- Many outputs are the results of years of health system endeavor and cannot be attributed to inputs in a single period
Developments in measurement:
- Research to find suitable metrics that measure organizational factors and administrative structures, which influence inputs and outputs
- Improve clarification on the type of efficiency being measured by different indicators
- Improve the conceptualization of the production process in order to better harmonize data collection efforts
- Improve collection of high-quality comparable data on outputs, inputs, and environmental factors necessary for risk adjustments
- Invest in research to refine methodologies for whole-system efficiency measurement
- Find a balance between whole-system measures and more fragmented efficiency measures
- More consideration of how indicators take static and dynamic elements of inputs and outputs into account
experience measures. Indicators such as avoidable mortality, which seek to measure the contribution of health care to health, are also being better developed and more frequently used. Indeed, indicators are being selected through rigorous selection mechanisms that aim to identify how appropriate they are, rather than how readily available they are. In addition, risk adjustment techniques have become more advanced and allow us to better control for exogenous factors that may lead to changes in performance.

References

Aday LA, et al. Evaluating the healthcare system: effectiveness, efficiency, and equity. Chicago: Health Administration Press; 2004.
Atun R, Mendabde N. Health systems and systems thinking. In: Coker R, Atun R, McKee M, editors. Health systems and the challenge of communicable diseases: experiences from Europe and Latin America. European Observatory on Health Systems and Policies Series; 2008.
Commonwealth Fund. Framework for a high-performance health system for the United States. New York: The Commonwealth Fund; 2006.
Davies H. Measuring and reporting the quality of health care. NHS Quality Improvement Scotland; 2005.
Hurst J, Jee-Hughes M. Performance measurement and performance management in OECD health systems. OECD Labour Market and Social Policy Occasional Papers No. 47. Paris: Organisation for Economic Co-operation and Development; 2001.
Iezzoni L. Risk adjustment for measuring health outcomes. Arlington: Health Administration Press/AUPHA; 2013.
IHP. Monitoring performance and evaluating progress in the scale-up for better health: a proposed common framework. Document prepared by the monitoring and evaluating working group of the International Health Partnership and Related Initiatives (IHP+) Led by the WHO and the World Bank; 2008.
Jacobs R, Goddard M, Smith P. How robust are hospital ranks based on composite performance measures? Med Care. 2005;43(12):1177–84.
Jee M, Or Z. Health outcomes in OECD countries: a framework of health indicators for outcome oriented policymaking. OECD Labour Market and Social Policy Occasional Papers No. 36. Paris: Organisation for Economic Co-operation and Development; 1999.
Mant J. Process versus outcome indicators in the assessment of quality of health care. International J Qual Health Care. 2001;13(6):475–480.
Naylor DC, Iron K, Handa K. Measuring health system performance: problems and opportunities in the era of assessment and accountability. In: Organization of Economic Co-operation and Development (OECD), editor. Measuring up: improving health system performance in OECD countries. Paris: OECD Publications; 2002.
Papanicolas I, Kringos D, Klazinga NS, Smith PC. Health system performance comparison: new directions in research and policy. Health Policy. 2013;112(1–2):1–3. ISSN 0168–8510.
Papanicolas I, Smith PC. Theory of system level efficiency in health care. In: Culyer AJ, editor. Encyclopedia of health economics. Philadelphia: Elsevier; 2014. p. 386–394. ISBN 9780123756787.
Rigby M, Roberts R, Purves I, Robins S. Realising the fundamental role of information in health care delivery & management: reducing the zone of confusion. Research report. Nuffield Trust; 1999.
Roberts MJ, et al. Getting health reform right: a guide to improving performance and equity. Oxford: Oxford University Press; 2008.
Sicotte C, et al. A conceptual framework for the analysis of health care organizations’ performance. Health Serv Manage Res. 1998;11:24–48.
Smith PC. Developing composite indicators for assessing health system efficiency. In: Smith PC, editor. Measuring up: improving the performance of health systems in OECD countries. Paris: Organization for Economic Cooperation and Development; 2002.
Smith PC, et al., editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press; 2009.
Terris DD, Aron DC. Attribution and causality in health care performance measurement. In: Smith PC, Mossialos E, Leatherman S, Papanicolas I, editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press; 2009.
WHO. World health report 2000. Health systems: improving performance. Geneva: World Health Organization; 2000.
Health System in Canada
32
Gregory Marchildon
Contents
Introduction
Organization and Governance
Financing
Physical and Human Resources
Delivery of Health Services
Reforms
Assessment
References
safety, as well as report outcomes to the general public. However, no government has given a provincial quality council the power to regulate quality or set enforceable standards.

The federal government through Health Canada regulates medical devices; determines the initial approval and labeling of all prescription drug therapies, herbal medicines, and homeopathic preparations; and prohibits direct-to-consumer advertising of pharmaceuticals. Pharmaceutical advertising targeting physicians is subject to federal law as well as to codes established by industry associations. The federal government has exclusive jurisdiction over the patenting of new inventions, including pharmaceuticals, and patent protection is set at the 20-year OECD norm. Provincial governments use a number of regulatory tools, including reference pricing, licensing of generics, bulk purchasing, tendering, and discounting, to contain the cost of their respective prescription drug plans (Paris and Docteur 2006).

Due to a high degree of health system decentralization, physician autonomy, and onerous privacy laws, Canada has been slower than other countries in integrating information and communications technology (ICT) into health delivery. In a 2009 survey of 11 OECD countries by the Commonwealth Fund, Canadian family doctors scored the lowest in terms of using electronic health records (EHRs) and had the lowest electronic functionality (Schoen et al. 2009). Although the evidence is limited and now somewhat dated, it does appear that hospitals in Canada are also behind in their adoption and use of ICT (Urowitz et al. 2008).

Three provincial organizations and one national-level organization provide health technology assessments (HTA) to provincial and
federal ministries of health and delegated health authorities. As the sole pan-Canadian HTA agency, CADTH’s mandate is to provide evaluations of new prescription drugs, as well as medical devices, procedures, and systems, to federal, provincial, and territorial governments. CADTH’s recommendations are advisory in nature, and it is up to the governments in question to decide whether or not to introduce these technologies.

The patient rights movement is relatively underdeveloped in Canada compared to similar movements in the United States and Western Europe. While there are patient-based organizations focusing on particular diseases, there are only a handful of more broadly based, rights-oriented patient groups. In recent years, patient rights have been exercised through the courts, relying on the constitutional “right to life, liberty and security of the person” in the Canadian Charter of Rights and Freedoms, although most attempts to extend this to a right of access to quality health care within a reasonable time have failed (Jackman 2010).

Patients and their respective physicians have been more successful in using such Charter rights to create a right to private health care and private health insurance. In 2005, the Supreme Court of Canada provided a limited form of this right in a situation where the majority of the court interpreted public waiting lists for certain types of elective surgery as unreasonable (Flood et al. 2005).

Financing

Every provincial and territorial government provides universal coverage to medically necessary hospital, diagnostic, and medical care services (Taylor 1987). These 13 governments act as single payers in providing full coverage for their respective provincial and territorial residents. In return for receiving federal transfers, provincial and territorial benefits are provided on a first-dollar basis and on the same terms and conditions to all residents as stipulated in the Canada Health Act. Moreover, these benefits are portable among the provinces and territories. Beyond this so-called Medicare coverage, federal, provincial, and territorial governments offer their own categorical programs in, and targeted benefits for, long-term care and prescription drugs.

Based on 2011 data, federal, provincial, and territorial governments were responsible for funding 70.4% of all health spending in Canada, the majority of which is raised through general taxation. Three provinces supplement their revenues through annual health-care premiums, but these too flow into provincial general revenue funds. The remaining health financing comes from out-of-pocket payments (14.7%), private health insurance (11.8%), and other sources (3.1%) (CIHI 2013).

Since the Canada Health Transfer constitutes roughly 20% of total provincial government health expenditures, the provincial governments are responsible for raising the lion’s share of revenues for health (CIHI 2013). Provincial tax revenues come from a number of sources, including (in rough order of importance) individual income taxes, consumption taxes (including “sin” taxes on alcohol and gambling), and corporation taxes. In those provinces benefitting from an abundance of natural resources, resource royalties and taxes are significant sources of revenue (Marchildon 2013).

Consistent with being a tax-based Beveridge-style health system, there is limited pooling of funds in the Canadian system. However, there is a type of pooling through cash transfers: from the federal government (which collects tax at the national level) to the provincial and territorial governments, and from provincial governments (which pool federal transfers with own-source revenues) to RHAs, which, as public non-governmental bodies, have no autonomous powers of taxation.

Physical and Human Resources

From the 1940s until the 1960s, Canada experienced a boom in hospital building encouraged by the introduction and expansion of universal hospital
long-term care vary considerably among the provinces and territories. Facility-based long-term care (LTC) ranges from residential care with some assisted living services to chronic care facilities (originally known as nursing homes) with 24-hour-a-day nursing supervision. Most residential care is privately funded, whereas high-acuity LTC (requiring 24-hour-a-day nursing supervision) is heavily subsidized by provincial and territorial governments (Canadian Healthcare Association 2009).

Until the 1960s, the locus of most mental health care was in large, provincially run psychiatric hospitals, which in turn had evolved out of the nineteenth century asylum and the twentieth century mental hospital. With the introduction of pharmaceutical therapies and a greater focus on reintegration into the community, mental health conditions have since been mainly treated on an outpatient basis or, in the case of severe episodes, in the psychiatric wards of hospitals. GPs provide the majority of primary mental health care, in part because medical care is an insured service with first-dollar coverage, whereas psychological services are provided largely on a private basis.

While drugs administered in hospitals are fully covered as an insured service under the Canada Health Act, every provincial and territorial government has a prescription drug plan that covers a portion of the cost for outpatient prescription drugs. The majority of these drug plans target low-income or retired residents. The federal government provides pharmaceutical coverage for eligible First Nations and Inuit. These public insurers depend heavily on health technology assessment to determine which drugs should be included in their respective formularies.

Almost all dental care is delivered by independent practitioners, and 95% of these services are paid privately. Dental services are paid for through private health insurance (provided mainly through employment-based benefit plans) or out of pocket. As a consequence of access being largely based on income, outcomes are highly inequitable.

For historical reasons, the federal government finances a host of health service programs targeting Aboriginal Canadians, in particular eligible First Nation and Inuit citizens. These services include health promotion, disease prevention, and public health programs as well as coverage for medical transportation, dental services, and prescription drug therapies. Despite these targeted efforts, the health gap between these Aboriginal citizens and the majority of society remains large. Since the 1990s, there has been a series of health-funding transfer agreements between the federal government and First Nation governments, largely based on reserves in rural and remote regions of Canada. At the same time, there has been an Aboriginal health movement advocating for a more uniquely Aboriginal approach to health and health care (Marchildon 2013).

Reforms

There have been no major pan-Canadian health reforms in the past decade. However, individual provincial governments have concentrated on two categories of reforms: (1) structural change involving the governance and management of health services as a more integrated health system, mainly through the reorganization and fine-tuning of their regional health systems, and (2) process-type reforms, aimed at addressing bottlenecks in delivery, improving patient responsiveness, and elevating both quality and safety.

The introduction of RHAs allowed provincial governments to directly manage the health system through arm’s-length delegated bodies. RHAs manage services as purchaser-providers except in Ontario, where the local health integration networks (LHINs) fund (purchase) but do not deliver services directly. The purpose of the reform was to gain the benefits of vertical integration by managing facilities and providers across a broad continuum of health services and to improve the coordination of “downstream” curative services with more “upstream” public and population health services and interventions. In the last decade, there has been a trend to reduce the number of RHAs, thereby increasing the geographic and population size of RHAs in each province,
in order to capture greater economies of scale and scope.

Influenced chiefly by quality improvement initiatives in the United States and the United Kingdom, provincial ministries of health have established new institutions, mechanisms, and tools to improve the quality, safety, timeliness, and responsiveness of health services. Six provinces have established health quality councils to accelerate quality improvement initiatives at the provincial, regional, and clinical levels. Some provinces have also launched patient-centered care initiatives aimed at improving the experience of patients and informal caregivers. Patient dissatisfaction with long wait times for elective surgery as well as specialist and diagnostic services has triggered efforts in all provinces to better manage and reduce wait times.

In contrast, the federal government has largely removed itself from engaging the provinces in any pan-Canadian reform efforts. This is in part the consequence of the perceived failure of the “10-Year Plan to Strengthen Health Care,” signed by the Prime Minister and the Premiers of all provinces and territories in 2004.

The “10-Year Plan” ends in the fiscal year 2013–2014. In December 2011, the federal government announced its reconfiguration of the Canada Health Transfer for the decade following the 10-Year Plan. After 2014, increases in the transfer to the provinces, originally 6% per annum, will be held to the rate of economic growth with a minimum floor of 3%, and all transfers will be made on a pure per capita basis, without taking into consideration the tax capacity of the provinces. The removal of any equalization component in the transfer will make it more difficult for lower-income provinces to continue to ensure coverage is maintained at the standard enjoyed in higher-income provinces.

Assessment

The model of universal Medicare has been effective in protecting Canadians against high-cost hospital and medical care. At the same time, the narrow scope of the benefit package has resulted in larger gaps in coverage, as pharmaceutical therapies and LTC have grown in importance over time. Since 70% of financing for health care in Canada comes from general taxation, that share is financed relatively equitably, but there is less equity in the remaining 30%, which comes from out-of-pocket sources and employment-based insurance benefits associated with better-paid jobs.

There are disparities in terms of access to health care, but outside of a few areas such as dental care and pharmaceuticals, they do not appear to be large. For example, there appears to be a pro-poor bias in terms of primary care and a pro-rich bias in the use of specialist physician services, although the gap in either case is not large.

There is also an historic east-west economic gradient dividing the less wealthy provinces in eastern Canada and the wealthier provinces in the more western parts of the country from Ontario to British Columbia. At present, the economic division is more between those provinces rich in natural resources (particularly petroleum-producing provinces such as Alberta, Saskatchewan and Newfoundland) and those provinces without such resources. These differences are addressed through equalization payments from federal revenue sources to “have-not” provinces that ensure the latter have the revenues necessary to provide comparable levels of public services, including health care, without resorting to prohibitively higher tax rates.

While Canadians are generally satisfied with the financial protection offered by Medicare, they are less satisfied with their access to particular services. Beginning with the budget cuts to health care in the 1990s, emergency rooms became overcrowded and waiting times for nonurgent care became lengthier (Tuohy 2002). Based on a survey of patients in selected OECD countries conducted in 2010, Canada ranked poorly in terms of waiting times for physician care and nonurgent surgery (Schoen et al. 2010). However, based on relevant mortality and morbidity indicators of health system performance, such as amenable mortality, Canada fares considerably better, posting better
Contents
Introduction
China’s Current Health System Reform
Organization, Governance, and Accountability
Organization of the Health System
Accountability Within Government and to the Population
Planning, Regulation, and Monitoring
Monitoring Progress: China’s Health Information Systems and Technology
Financing
Sources of Funding and Accountability for Its Use
Difficulties Using Available Health Financing for Policy Implementation
Health Expenditure and Sources of Revenue
Collection and Pooling of Funds
Coverage, Benefit, and Cost Sharing
Payment Methods for Health Services
Physical and Human Resources
Infrastructure and Its Funding
Health Workforce and Trends
Remuneration of Health Workers
Health Services Delivery and Outcomes
Primary Care and Public Health
Clinical Services
D. Hipgrave (*)
UNICEF, New York, NY, USA
Nossal Institute for Global Health, University of
Melbourne, Melbourne, VIC, Australia
e-mail: dhipgrave@gmail.com
Y. Mu
UNICEF China, Beijing, China
e-mail: ymu@unicef.org
[Figure: line chart of life expectancy in years, 1962 to 2017; series shown: China, World, Low & middle income, Low income]
Fig. 1 Life expectancy in years: China, the world, and low-income and middle-income nations (Source: World Bank data available at http://data.worldbank.org/)
early 1970s, due to the formerly high fecundity of Bank analysis of NCDs in China (The World Bank
its women alongside a rapid fall in the crude death Human Development Unit 2011) concluded that “a
rate due to communicable disease control (CDC) reduced ratio of healthy workers to sicker, older
and basic public health measures. Life expectancy dependents will certainly increase the odds of a
also rose rapidly during this period (Fig. 1) future economic slowdown and pose a significant
(Hipgrave 2011a). social challenge in China” (page 2). Equally chal-
The second familiar aspect is China’s meteoric lenging is the provision of new services for the
economic development, with an average annual prevention and management of chronic illness
growth rate of around 10% for most of the last and the government’s averred commitment to
30 years, only falling to 7–8% since the global equity and universal health coverage. These chal-
financial crisis. lenges and commitments were among the stimuli to
These familiar aspects of China have depended the major health system reform (HSR) that China
on the health of its population improving dramati- commenced in 2009 (State Council 2009).
cally during the first 30 years of the People’s Repub-
lic of China (PRC) since its establishment in 1949.
By the mid-1970s, China was already undergoing China’s Current Health System Reform
the epidemiologic transition, years ahead of other
nations of similar economic status, and by 1980, life China’s most recent HSR was a response to deep
expectancy in low-income China (67 years) inequity resulting from three decades of marketiza-
exceeded that of most similarly low-income nations tion and de facto privatization of the health sector.
by 7 years (Jamison et al. 1984). It was the culmination of many years of debate
However, with CDC (Hipgrave 2011a), eco- (Tang et al. 2014a) after acknowledged inaction
nomic development, rapid urbanization, and a on the heavy burden of healthcare on household
dramatically ageing population, China’s health expenditure (Blumenthal and Hsiao 2005; Huang
system now faces a vastly different range of 2011; Liu 2004; Liu et al. 2003; Tang et al. 2008). It
issues. China will soon become the first large comprises initiatives in five main areas:
nation to age before achieving developed nation
status. Noncommunicable diseases (NCDs) now 1. Expanding the coverage and benefit of health
account for over 80% of deaths in China and insurance schemes in urban and rural areas
almost 70% of its total disease burden (The World 2. Establishing a national essential medicines
Bank Human Development Unit 2011). A World scheme to ensure the availability of affordable
782 D. Hipgrave and Y. Mu
medicines and reduce the ability of health providers to profit from the sale of drugs
3. Improving basic service availability and quality while also reducing referrals to specialist care and hospitals
4. Ensuring the availability of basic public health services for all populations
5. Piloting public hospital reform, particularly in order to separate hospital management and clinical service provision

The current HSR builds on earlier, equity-enhancing initiatives including the reestablishment of rural health insurance (Meng et al. 2012) and subsidized hospital maternity services (Feng et al. 2010a). Early progress on the first phase of China’s current HSR (2009–2012) was extensively reviewed, both internally by domestically commissioned teams of international (unpublished) and national experts (Wu and Yang 2013; Li and Chen 2012) and externally (Yip et al. 2012). The Reform is planned to continue to 2020, with gradual achievement of its overarching objectives on universal and equitable access to health services; the second phase (2012–2015) was announced in early 2012 (Ministry of Health 2012a), and a major additional pronouncement on county hospital reform was made in early 2014 (State Council 2014). Monitoring and evaluation of the Reform is slated to prioritize its different hierarchical elements (Figs. 2 and 3), although detailed plans for such evaluation have not been released.

China’s commitment to HSR indicates its ongoing priority for the highest echelons of government (Ministry of Health 2012a). The four-year plan for phase 2 reiterates the goal of universal access to basic health services and seeks to resolve constraints to the supply of China’s increasing and diverse health needs. It again commits to expanding insurance benefits and introduces priority to unifying China’s several health insurance schemes; it encourages development of commercial insurance and the introduction of capitation and other payment reforms to separate doctors from the financial management of hospitals; it suggests that the private sector should manage 20% of health services by 2015; family general practice is promoted alongside expanding community and public health services, and the drug production, prescription, and pricing will be further consolidated and regulated; performance-based funding of health staff is also mentioned. These individual areas are discussed further below.
The plan is encouragingly specific but not prescriptive on strategy and avoids the issue of local accountability for financing various health programs, stipulating only that government spending on health should gradually increase as a proportion of total government expenditure. This vagueness hints at a major problem for China’s health sector, the reliance on local government support for the national equity objective (Hipgrave et al. 2012). Another major problem remains the difficulty of reforming hospital management, effectively undoing the private, for-profit system that evolved over recent decades. As a result, China’s
Fig. 3 Focus of monitoring and evaluation of health reforms in China (Source: WHO China)
HSR has not yet reduced the proportional financial burden of healthcare on households or their risk of catastrophic expenditure on health (Meng et al. 2012).

Organization, Governance, and Accountability

Organization of the Health System

China’s former Ministry of Health (MoH) was recently merged with the body previously responsible for family planning to form the National Health and Family Planning Commission (NHFPC). The Commission contains 23 different departments, offices, and bureaux responsible for setting standards and for the planning, administration, oversight, and reporting on China’s health sector. However, as with most of China’s social sectors, there is a heavy decentralization of responsibility for local planning, financing, and implementation of health services in China (Wong 2010; Zhou 2010a). In China’s decentralized system, policies and reform guidelines are set at national level but implementation is delegated to local authorities at provincial and lower levels. A hierarchy of health authorities oversees these issues at province, prefecture, county, and township levels.
In China’s political economy and governance structure, local health authorities are more responsive to local government than to higher-level cadres within the health sector, meaning that uptake of national policies and recommendations is only guaranteed if there is broad agreement across all sectors of government and at local government level. In the past, when the health sector was of low priority, this severely limited the implementation of national laws relevant to the health sector. For example, the 1989 Law on Control of Infectious Diseases conferred on local governments responsibility for various forms of reporting and action, but was weakly implemented, culminating in the wake-up call of SARS in 2003, redrafting of the law and major reform of CDC (Hipgrave 2011a; Wang et al. 2008a). Initiatives depending on countrywide uptake such as the 2010 national measles vaccination campaign still rely heavily on local funding and prioritization; recent environmental degradation and food and drug safety scandals are further evidence of the lack of cross-sectoral priority given to the health sector in China. The partial
rollback of the one-child policy announced at national level in 2013 remains subject to interpretation and optional implementation by provincial governments. Despite its evident high priority (Tang et al. 2014a), many aspects of the HSR itself are dependent on the same support and follow-up by provincial and even county governments (Hipgrave et al. 2012; Brixi et al. 2012).
To ensure that HSR would receive adequate local priority despite this structure and accountability, in early 2010 the HSR Leading Group in the State Council signed “accountability contracts” with provinces on key reform areas, for subsequent delegation and implementation at lower levels (China News Network 2010). In some provinces, a few key HSR targets such as health insurance coverage were incorporated into subnational officials’ performance evaluation criteria, which has been effective in ensuring progress. However, in other, more complicated reform areas, such as strengthening primary healthcare, public hospital reforms, and others, ensuring progress has been more difficult. Indeed, the reform of public hospitals suffers from a lack of consensus or clear national guidance on direction, limiting its prioritization and implementation outside pilot areas, particularly at low levels.

Accountability Within Government and to the Population

Figure 4 illustrates the ideal accountability relationships among government, healthcare providers, and citizens (society) in the delivery of healthcare. However, in China, such relationships have not yet been forged. While there are promising moves to make local government generally more accountable to the public (such as measurement of “green gross domestic product (GDP)” and independent surveys of public opinion on local government performance in some provinces), the main motivation for subnational authorities remains economic development and revenue generation (Zhou 2010b). Moreover, while banking, communications, etc. are carefully regulated and monitored from above, like most social sectors, health services are largely organized and monitored at the local level. It is too costly for China’s undermanned central government to independently monitor and evaluate subnational health performance (Wong 2010; Zhou 2010b). These circumstances explain the limited ability of national health officials to ensure the HSR is fully pursued at grassroots level.
In theory, all government plans represent the will of the people as they are ratified by the National People’s Congress. However, many Congress members are unelected (in the western democratic sense) appointees, and the People’s Congress generally rubber-stamps the documents presented. However, with the increasing attention of the Party and government in China to public comment through social media, albeit increasingly censored (Osnos 2014), and local protests, there is growing acknowledgment of their answerability to the general public. Therefore, while during local planning there is almost no formal process for the public to make input, there are opportunities for the general population to voice
Fig. 4 Accountability relationships for healthcare (Source: Adapted from The World Development Report 2004 (The World Bank 2003))
[Diagram: Government, Health Providers, and Patients, linked by regulation, monitoring and evaluation, funding, provision of care, and mechanisms for citizens’ feedback to inform policy. As active purchasers of care, patients monitor provider performance.]
33 Health System in China 785
concerns through the courts, social media, petitions, protests, etc., especially when issues affect a significant proportion of a community. Although the process is usually slow (the HSR took many years to be formalized (Tang et al. 2014a)), there is usually gradual recognition and acknowledgment of the need to act. On the other hand, implementation of plans usually requires higher-level pressure on the various lower tiers of government, and this pressure progressively dissipates further down the hierarchy; it may be ignored for issues that don’t have high-level and cross-sectoral support and the support of local government. Hence, targets for insurance coverage and drug price control are accepted, but controlling the environmental impact of local industry is often ignored (Human Rights Watch 2011). In this process, public influence is rather indirect and can be ignored if local economic, political, or vested interests override it.
Patients’ concerns in healthcare delivery may be channeled formally through the National People’s Congress at different levels (although usually only major complaints reach this level) or informally through social media. However, mechanisms to tap the feedback of patients, as the end users of health services, have not been established. There is no ombudsman or independent regulator in China’s health system, and senior appointments are normally approved by the ruling Party organization. However, since launching the HSR, government is learning that empowering patients and regularly collecting their feedback on key parameters such as service prices and quality strengthens accountability across the government levels and can help achieve the overall goals of the reform (State Council 2014). Patient satisfaction and feedback is increasingly incorporated into the performance evaluation framework for HSR implementation (Ma 2013). However, this practice has not yet been standardized, systematized, and regularized throughout China.
An example of the problem China is having in effecting the most difficult aspect of the HSR, the reform of public hospitals, was recently summarized by eminent researchers on China (Yip et al. 2012), who noted the complex web of relationships that govern this endeavor (Fig. 5). It seems likely that China will need all the years up to 2020
Fig. 5 Dispersion of power between ministries and public hospitals in China. *MOH Ministry of Health, NDRC National Development and Reform Commission, MOF Ministry of Finance, MOHRSS Ministry of Human Resources and Social Security, MOCA Ministry of Civil Affairs, CCP Org Dept Organizational Department of Chinese Communist Party, NCMS New Cooperative Medical Scheme, UEBMI Urban Employee Basic Medical Insurance, and URBMI Urban Residents Basic Medical Insurance (Based on Yip et al. 2012)
to make progress in this area of reform, although some commentators doubt this will be achieved in the current context (Zhang and Navarro 2014).

Planning, Regulation, and Monitoring

The normal sector-planning practice in China follows the National Five-Year Plan for Social and Economic Development, with different social sectors (including health) developing their respective plans at five-yearly intervals with annual updates. However, the special need for health reform did not allow China’s HSR to fall neatly in line with regular national development planning, which covers two five-yearly periods per calendar decade: the first three-year phase of the HSR covered 2009–2011, while the second overlaps with the latter part of the government’s 12th Five-Year Plan period: 2012–2015. Moreover, the HSR was developed as a cross-sectoral endeavor led by the national planning ministry (the National Development and Reform Commission or NDRC) to address long-accumulated concerns of the population (State Council 2009; Tang et al. 2014a). While it overlapped with a MoH planning and development activity, Healthy China 2020, the HSR was not only a MoH initiative.
As part of the government’s regular planning, the new NHFPC drafts annual national health work plans with annual targets and submits annual budget proposals for approval by the Ministry of Finance and the NDRC, which approves major construction initiatives such as health infrastructure development. With major events such as the HSR, new changes and innovations are often seen in the plans year on year. At subnational levels, health-related authorities (not only health bureaux) in provinces, prefectures, and counties submit annual planning and budget proposals in line with health service delivery needs and stewardship to the development planning and finance authorities at the corresponding tier. Implementation is financed by local budget supplemented by transfers from higher tiers of government (explained below). Local data should be used in formulating plans, but as there is little tradition of regular, independent, or audited data gathering in China, desensitization of administrative and economic data is suspected (Cai 2008; Hu et al. 2011; Walter and Howie 2011; Kaiman 2013; Anonymous 2012).
Regulation of the health sector follows the accountability structure outlined above and appraises progress and achievement against high-level targets set at national and local levels. Performance assessment tends to be quantitative (relating to coverage or throughput of health services), although assessment on more subtle measures such as patient satisfaction, service quality, and disease management has commenced (as outlined in a Guidance on Performance Assessment of Basic Public Health Services Delivery, jointly promulgated by the Ministry of Health and the Ministry of Finance in January 2011). At management level, government officials are also increasingly being appraised according to efficiency and innovations in rolling out reform initiatives at local level.

Monitoring Progress: China’s Health Information Systems and Technology

With around 20% of the world’s people, population-level changes in China’s health status or indeed any globally important indicator have a major influence on corresponding global progress. For example, China’s progress toward regional and global achievement of the Millennium Development Goal (MDG) targets will impact any final evaluation of the MDGs in 2015.
However, global statistics in any of the biological, physical, and social sciences can only be calculated if China’s data is included and considered to be reasonably accurate, and data from China is not always available. Many lists of global indicators lack an entry from China, and the accuracy of what is released has been questioned (Cai 2008; Mulholland and Temple 2010). Usually, this is simply because China itself does not collect national statistics on the relevant indicators or not in ways comparable with other nations (e.g., see http://www.countdown2015mnch.org/documents/2012Report/2012/2012_China.pdf). However, as long ago as 2000, perspectives on China’s
mortality data were quite positive (Banister and Hill 2000).
The overall lack of data from China rouses suspicion. But while China’s official statistics often lack breakdowns on key indicators (e.g., until recently, child mortality by gender or cause of death; nutrition status by province) or vary widely from one official source to the next (such as the annual birth cohort (Cai 2008) or number of road deaths (Hu et al. 2011)), these issues distract from China’s efforts to improve the content, frequency, quality, and public availability of official data in recent decades (Banister and Hill 2000). Indeed, UNICEF’s “Atlas on Children in China” publishes a wide range of official and recent data (http://www.unicefchina.org/en/index.php?m=content&c=index&a=lists&catid=60), and health statistics and other yearbooks are published annually (Ministry of Health 2012b; National Bureau of Statistics 2012, 2016) with a great degree of detail and disaggregation.
An increasing number of official and peer-reviewed publications on maternal and child health (MCH) in China report official government data (Wang et al. 2011, 2012; Rudan et al. 2010; Ministry of Health 2011a; Feng et al. 2010b, 2011), and this is contributing to summaries of global progress on the world’s health status and MDGs 4 and 5. China relies on several different sources to provide health administrators, the public and academia with information on the health sector. While it has never conducted a demographic and health survey, and its last multi-indicator cluster survey was in 1995, China’s national health services survey has been undertaken with a reasonably consistent methodology on a five-yearly basis since 1993. Many publications have used this source to assess progress in aspects of China’s health system (Meng et al. 2012) and on its health indicators (Wang et al. 2012).
As an example of the other sources used, China’s official MCH management information system (MIS) and the China Health Statistics Yearbook (Ministry of Health 2012b) rely on data from the following:

2. Maternal and Child Mortality Surveillance network, which has been summarized elsewhere (Wang et al. 2011).
3. The China Food and Nutrition Surveillance System, which surveys 40 surveillance sites on a five-yearly basis, most recently in 2010.
4. The ten-yearly National Nutrition Survey, a comprehensive, age-stratified, sex-stratified, and geographically stratified survey with a sample size of almost 200,000 (last completed in 2012).
5. The China Immunization Registration and Information System, a newly computerized administrative system that reports vaccination coverage to the NHFPC.
6. Data gathered on health facilities, human resources, equipment, and services provided to outpatients and inpatients at various subnational levels and collected by the MoH Center for Health Statistics and Information.
7. China’s National Notifiable Disease Reporting System, through which each county reports on 35 notifiable diseases. After SARS, this reporting system was massively upgraded to become web-based with reporting in real time (Fig. 6).
8. Disease Surveillance Points on births, deaths, and on cases of 35 notifiable diseases at 145 selected points around the nation.
9. China’s Vital Registration System, which covers around 8% of the nation’s population but is biased toward urban and eastern locations.
10. National Health Services Survey, which focuses on health status, service uptake, and health financing (Meng et al. 2012); it was last conducted in 2013.
11. National Census, last conducted in 2010 (National Bureau of Statistics 2012), including substantive demographic information.
12. National one percent (inter-census) Household Survey, conducted between the ten-yearly national censuses, last conducted in 2005.
Fig. 6 Web-based national notifiable disease reporting since 2004 (Source: China Centre for Disease Control, Beijing
(with permission))
[Scatter plot: provincial expenditure on health per capita (vertical axis, 0–800) against provincial GDP per capita in RMB (horizontal axis, 10,000–80,000). Labelled provinces: Qinghai, Shanghai, Ningxia, Tianjin, Xinjiang, Inner Mongolia, Zhejiang, Yunnan, Gansu, Guizhou, Henan, Chongqing, Guangdong, and Jiangsu.]
Fig. 7 Provincial expenditure on health per capita in relation to provincial gross domestic product per capita, 2010
(Source: Ministry of Health, China Health Statistics Yearbook, 2011 (Ministry of Health 2012b))
grants and tax sharing and have significant autonomy in what they do with these funds. Prefecture governments in turn have similar autonomy. In this system, funding for public service delivery by poorer townships and counties tends to be insufficient (Wong 2010; Zhou 2010b).
Apart from earmarked transfers from the MoH and funds for selected nationwide priorities, local governments may withhold resources for lower levels or favor spending in more populous areas or on issues strategic to their career (Zhou 2010a; Liu 2007). This kind of bias at subnational levels can undermine progress on national development goals (Yang 2011; Uchimura and Jütting 2007).
To supplement resources received from the higher levels, subnational governments raise resources from various fees, the sale of land use rights, and taxes on real estate transactions (World Bank 2012). However, poor localities tend to have limited scope for such revenue generation. The imbalance between resources and expenditure responsibilities, particularly in poor jurisdictions, impacts on health service quality (Yang 2011) and on household health expenditure (Blumenthal and Hsiao 2005; Meng et al. 2012; World Bank 2012).
Moreover, income disparities have widened across localities and population groups within local jurisdictions (Xing et al. 2008; Zheng et al. 2008; UNDP China and China Institute for Reform and Development 2008). The national urban-rural ratio of income per capita has risen from 2.4 in 1991 to 3.2 (up to 4 within certain provinces) in 2010 (Fig. 7) (National Bureau of Statistics 2011). At subnational level, only four provinces (Sichuan, Tibet, Xinjiang, and Yunnan) bucked this trend due to large subsidies to stimulate economic development and poverty reduction. Subsidies for these provinces impact the shape of the line of best fit in Fig. 7, which depicts provincial expenditure on health in relation to provincial GDP, per capita.

Difficulties Using Available Health Financing for Policy Implementation

As mentioned, in China’s decentralized environment, local government expenditures are not
aligned with policy priorities across sectors and programs. There are four distinct components of the national budget system, two of which impact on social sector spending: the general government budget (which relies on various taxation revenues and allocates funds to publicly funded services and activities) and the social security budget. The first of these allocates funds at the sectoral level; line ministries can then decide on and allocate earmarked transfers to the provinces (Wong 2010; Zhou 2010b). However, subnational government spending also relies on off-budget revenues (such as local taxes) for off-budget programs.
Monitoring is limited and there is little effort to align subnational budgets or plans with higher-level priorities. Moreover, apart from some individually monitored earmarked transfers, little information is available on whether governments actually spend money according to budgetary allocations or whether government expenditures and programs lead to the desired outputs and expected outcomes. Achievement of high-profile input and output HSR targets masks the absence of substantive analysis of outcome-level impact (Meng et al. 2012; Yip et al. 2012). Audits tend to focus on detecting malfeasance, not program performance.
Additionally, China’s budget and expenditure cycles are not synchronous. The fiscal year starts with the calendar year, but the budget is not endorsed by the National People’s Congress until the end of March. This delay reduces the budget’s operational significance for subnational governments and central ministries (World Bank 2012). Fragmentation, information limitations, and delays in budget execution limit the ability of national authorities to transform policy priorities into resource allocation and results at the local levels (World Bank 2012).

Health Expenditure and Sources of Revenue

Total health expenditure (THE) in China was US$445.5bn in 2012, at US$329 per capita, and 5.41% of GDP (China National Health Development Research Centre 2013). THE/GDP is modest compared with industrialized countries, which averaged 9.7% in 2010 (OECD 2013), but is average among low- and middle-income countries (LMIC), whose THE/GDP ranges from 2.6% to 10% (e.g., Indonesia 2.6%, Thailand 3.9%, India 4.1%, Russia 5.1%, Vietnam 6.8%, South Africa 8.9%, and Brazil 9.0%) (see data at http://apps.who.int/nha/database). Health expenditure as a proportion of GDP has increased from ~3% to ~5% since 1980, but numeric growth has been enormous due to China’s rapid economic growth (Figs. 8, 9, and 10).
The sources of THE have changed dramatically over time, reflecting changes in the role of government. Marketization beginning in the 1980s led to historically high out-of-pocket expenditure in 2001 (60%), but this had decreased to ~34% in 2012 (China National Health Development Research Centre 2013), mostly through public subsidies for primary health programs, for health providers and for the social insurance schemes.
In 2011, tax-based government expenditures accounted for 30.7% of THE, social health expenditure 34.6%, and out of pocket 34.7% (Fig. 8). Overall, public expenditure on health as a share of THE is similar to that of many other LMIC and also to the United States (even higher if the government contribution to social health insurance is considered), but most high-income countries average around 71% (Tangcharoensathien et al. 2011). WHO calculates this figure differently and has China’s figure at 56%; most nations in South and East Asia average around 41% (see http://apps.who.int/nha/database and Hipgrave and Hort 2014).

Collection and Pooling of Funds

To provide essential health services, reduce inequity, and provide financial protection against catastrophic health expenditure, governments must mobilize sufficient resources via: (1) collecting revenues, (2) pooling of risk, and (3) purchasing goods and services (Gottret and Schieber 2006). Globally, three models of basic healthcare financing are practiced: nationalized health services, social insurance, and private insurance. China’s
[Chart: government health expenditure, social health expenditure, and out-of-pocket health expenditure, each as a percentage of GDP (vertical axis, 0.0–5.5)]
Fig. 8 Government, social and out-of-pocket expenditure on health, 1978–2011 (Source: China Health Statistics
Yearbook (Ministry of Health 2012b))
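The expenditure figures quoted in the text can be cross-checked with simple arithmetic. The sketch below is illustrative only: it takes the 2012 totals (US$445.5bn, US$329 per capita, 5.41% of GDP) and the 2011 source shares (30.7%, 34.6%, 34.7%) as given in the text, and back-calculates the population and GDP they imply. The variable names are hypothetical, and the implied values are rough consistency checks, not official statistics.

```python
# Illustrative cross-check of figures quoted in the text (not official statistics).
THE_USD_2012 = 445.5e9       # total health expenditure (THE), 2012, in US$
THE_PER_CAPITA_USD = 329.0   # THE per capita, 2012, in US$
THE_SHARE_OF_GDP = 0.0541    # THE as a share of GDP, 2012

# Sources of THE in 2011, in percent, as quoted in the text
shares_2011 = {"government": 30.7, "social": 34.6, "out_of_pocket": 34.7}

# Back-calculated quantities implied by the quoted totals
implied_population = THE_USD_2012 / THE_PER_CAPITA_USD
implied_gdp_usd = THE_USD_2012 / THE_SHARE_OF_GDP
total_share = sum(shares_2011.values())

print(f"Implied population: {implied_population / 1e9:.2f} billion")
print(f"Implied GDP: US${implied_gdp_usd / 1e12:.2f} trillion")
print(f"Sum of 2011 source shares: {total_share:.1f}%")
```

The implied population of roughly 1.35 billion is in line with the size of China's population at the time, which suggests that the three quoted totals are mutually consistent.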
[Chart: total health expenditure in units of 100 million renminbi (left axis, 0–30000) and as a percentage of GDP (right axis, 0–6), by year, 1978–2010]
Fig. 9 Total health expenditure (THE) in China, numerically and as a percentage of gross domestic product (Source: China National Health Accounts Report 2012 (China National Health Development Research Centre 2012) (2012: US$1 = ~6 renminbi [RMB]))
[Chart: per capita THE in RMB (vertical axis, 0–2000), by year, 1978–2011]
Fig. 10 China’s per capita THE (Source: China National Health Account Report 2012 (China National Health
Development Research Centre 2012))
healthcare financing has evolved to a structure dominated by three social insurance schemes with almost universal population coverage: the urban employees basic medical insurance (UEBMI) (financed by formal sector employers and employee contributions), the rural cooperative medical (insurance) scheme (RCMS), and urban residents’ basic medical insurance (URBMI). The latter two receive heavy government subsidization in addition to individual contributions (in a roughly 4:1 ratio).
Government health expenditure stems from tax revenue, as described above. China does not have tax instruments specifically designated to health expenses; the funds are allocated from overall tax revenue. These funds are used to pay the salaries of health workers, purchase equipment, and build infrastructure at various levels and for various specific programs such as public health subsidies or other schemes earmarked by the MoH. Government also funds a social assistance program (the medical financial assistance scheme), which provides cash for designated poor households to purchase health services. There also remains “free medical treatment” for those on the government payroll and for retired military and Party cadres; these arrangements are slated for phasing out. However, government does not as yet contribute substantively to the funding of hospital care, which remains predominantly managed in-house from various sources of revenue (in particular, out-of-pocket payments and insurance) (State Council 2014; Barber et al. 2014).

Coverage, Benefit, and Cost Sharing

Table 1 summarizes the current basic health financing arrangements and benefit provided by the various health insurance schemes in China. It is evident that the major challenge remains fragmentation of the schemes and arrangements and the associated inequity and inefficiency. This is also highlighted in Fig. 11, which depicts the large variation in average numeric benefit and other information about the various schemes. In this context, and given China’s highly mobile population and the limited access of migrant populations to urban health services (Di Martino 2011), the Government is prioritizing integration of the various insurance schemes (Ministry of Health 2012a), but this is a difficult and complex proposition.
… sector workers; … not covered by the UEBMI; … with catastrophic health expenses or recognized recipients of China’s social security payments; … employees and special groups such as retired military and Party cadres; purchasers of private insurance; … another scheme or having to choose a different health provider (includes many migrant workers)
Population coverage: 14.8%; 69.5%; 9.5%; all citizens; 809 million incidents funded in 2011; 0.7%; 0.3%; 5.2% (excludes most migrant workers)
Benefit covered: outpatient (OP) and inpatient (IP); mainly IP, but some counties experiment to cover some care for OPs (e.g., treatment of chronic diseases); mainly IP, but some cities experiment to cover some care for OPs (e.g., treatment of chronic diseases); all services; IP incurring catastrophic cost to individuals; OP and IP; OP and IP
Co-payment arrangement: practices vary across regions; co-payment occurs for (1) expenses below scheme thresholds and also above ceilings, (2) expenses on high-end or special services excluded from schemes, and (3) the percentage not reimbursable for different services (OP roughly 30–50%, IP 10–20%; IP roughly 50–60%; IP roughly 60–70%); N.A.; practices vary across regions; very minimal; varies across schemes; 100%
Source: Authors’ own compilation; figures from the 2012 China Health Statistics Digest (Ministry of Health 2012b)
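The three sources of co-payment listed in the table (expenses below a threshold or above a ceiling, excluded services, and a non-reimbursable percentage) can be combined into a single calculation. The sketch below is a simplified illustration under stated assumptions: the function name, its parameters, and the example values are hypothetical and do not describe the rules of any actual scheme, which vary across regions.

```python
def split_cost(expense, threshold, ceiling, copay_share, excluded=0.0):
    """Split one claim into (patient_pays, scheme_pays).

    Hypothetical rules, assuming 0 <= excluded <= expense and threshold <= ceiling:
    - excluded services are paid entirely by the patient;
    - eligible expenses below the threshold are paid by the patient;
    - eligible expenses above the ceiling are paid by the patient;
    - within the band between threshold and ceiling, the patient pays
      copay_share and the scheme pays the remainder.
    """
    eligible = expense - excluded
    below = min(eligible, threshold)          # part under the threshold
    above = max(eligible - ceiling, 0.0)      # part over the ceiling
    band = eligible - below - above           # part that is cost-shared
    patient = excluded + below + above + copay_share * band
    scheme = (1.0 - copay_share) * band
    return patient, scheme


# Hypothetical inpatient claim: 10,000 RMB bill, 500 RMB threshold,
# 8,000 RMB ceiling, 50% co-payment within the reimbursable band.
patient_pays, scheme_pays = split_cost(10000.0, 500.0, 8000.0, 0.5)
print(patient_pays, scheme_pays)
```

Under these assumed parameters the patient bears well over half of the bill even though the nominal co-payment share is 50%, which illustrates how thresholds and ceilings add to the household burden.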
[Bubble chart: per capita government funding (RMB) in 2008 or 2010 (vertical axis) by year of launch (horizontal axis: 1950s, 1990s, 2002, 2003, 2007). Schemes shown, with participants in parentheses: GFYL; Programme for Employees of Public Administrative Units and Organizations (10 million); Programme for Employees of Public Sector Services Units and Organs (39 million); RCMS (832 million); Urban MFA (16.1 million); Rural MFA (50.4 million); Urban Residents’ BMI (195 million). Funding values legible on the chart: 2629, 533, 279, 171, 126, 120, and 100 (bubble covered).]
Fig. 11 Government financing per participant across health security schemes introduced during 1950–2007. Note: Bubble size is equivalent to the number of participants. Number of participants is shown in parentheses. Government spending per participant is shown in red. Government funding figures are annual per person, except for the rural and urban medical financial assistance, reported per case (Source: National Health Account Report 2011 and China Health Statistical Digest 2011)
Payment Methods for Health Services

Before the HSR, to ensure financial accessibility, the Chinese government priced primary healthcare services at below cost, but allowed providers to charge high prices for diagnostic tests using high-tech equipment, effectively cross-subsidizing primary services. Providers could also levy a 15% profit on drug sales. Under the prevailing fee-for-service payment modality, this created an incentive for providers to maximize profit by ordering tests and over-prescribing drugs. Cost-effective and efficient primary healthcare services were ignored by providers because they were not profitable; those who could not pay for services often chose to forego them (Tang et al. 2008).
The recent reforms to provider payment, and those mooted for the future, aim to: (1) encourage the provision of cost-effective and efficient primary healthcare services, (2) reduce provider reliance on drug income and curb over-prescription, and (3) curb cost inflation.
Innovative provider payment methods, such as capitation (mostly for primary healthcare), gross budget, diagnosis-related groups (for hospitals), as well as performance-based payment for health workers, are being piloted at county and district level. Other related policy reforms include a zero markup policy (for essential drugs), implementation of an essential drug list, and so on (Yang et al. 2013a).

Physical and Human Resources

Infrastructure and Its Funding

By international standards China’s average health infrastructure level has been poor. For example, the number of hospital beds per 1000 population in 2011 was around 4, among the lowest in the world (Ministry of Health 2012b). Health infrastructure in China also suffered from a major urban-rural divide in the earlier stages of social and economic development. Not only did urban
health infrastructure enjoy greater public financial support, it attracted loans and other financial instruments because it was profitable and boosted the local economy. For many years, rural facilities received very limited government subsidy and relied on collective funding among farmers. Rural health infrastructure lagged seriously, in terms of both the basic condition of health facilities (buildings, beds, etc.) and the equipment, while big urban hospitals acquired technical equipment of high quality. In 2005, there were 3.6 hospital beds per 1000 urban residents, but only 0.78 in rural townships (Ministry of Health 2007). This inequity was recognized by national government, and in 2006 the majority of a national bond issue was used to finance a project earmarked for rural health, specifically to finance the rebuilding, renovation, and updating of medical equipment for rural providers, including primary health facilities such as CDC and MCH institutions. The NDRC and its local branches approved the funding proposals for physical health infrastructure. More recently, the 2009 HSR allocated large sums to further improve physical health sector infrastructure (focusing on remote rural areas, but also urban community health centers). Progress on this aspect of the Reform has been very positive (Yip et al. 2012).
Health Workforce and Trends
For the majority of China’s population, access to western and formally regulated traditional Chinese medicine (TCM) only commenced with the introduction of China’s famed “barefoot doctors” in the mid-1960s. These cadres numbered 1.8 million at their peak (around one per 600 people), but numbers fell rapidly with economic marketization and liberalization of population movement (Bien 2008). Moreover, village-level care lost its funding base with the dismantling of the rural cooperatives in the early 1980s, and training and supervision of the quality of care provided fell off. As recently as the late 1990s, many doctors lacked training to the level suggested by their rank and title (Youlong et al. 1997), and overprescribing of drugs and inappropriate use of parenteral preparations continue to exemplify the low quality of care, especially in rural areas (Blumenthal and Hsiao 2005; Bloom and Xingyuan 1997; Zhan et al. 1998; Pavin et al. 2003; Dong et al. 2008; Chen et al. 2010).
With economic marketization, medicine at all levels became privatized, physician salaries were paltry, standard consultation fees were fixed below cost (Eggleston et al. 2008), and over 40% of doctors’ and health facilities’ income derived from the sale of drugs (Hu 2010). As a result, doctors worked where they could be assured of income, patients became disillusioned with the care at rural clinics, self-referral to urban clinics increased, and the distribution of doctors, nurses, and health facilities was heavily biased to urban areas (Yip et al. 2012; UNDP China and China Institute for Reform and Development 2008; Youlong et al. 1997; Anand et al. 2008) (Table 2). Residents of urban areas in China, particularly in the large eastern cities, enjoy physical access to health services to the same level as in most developed nations. However, like many other Asian nations, China has trained more doctors than nurses or midwives, and there are progressively fewer staff with formal health training in progressively poorer rural areas (Youlong et al. 1997; Anand et al. 2008) (Table 3). China includes TCM practitioners (13%) in headcounts of health staff (Anand et al. 2008).
China is still paying for the interruption of university education during the Cultural Revolution of 1966–1976, and the paucity of new village doctors trained since the breakup of the village cooperatives in the late 1970s. As of 2005, 67.2% of China’s doctors and 97.5% of nurses had only completed junior college or secondary technical school level training, and 6% and 8% respectively had just high school or lower education (Anand et al. 2008). The duration and standard of professional education vary widely across the country (Youlong et al. 1997). Village doctors are an ageing cohort, with a likely high attrition rate in the coming decade (Xu et al. 2014).
However, with massive increases in the number of formal trainees since 1998, the distribution and quality of personnel are probably bigger
Table 3 Distribution of doctors and nurses by education level and health institution type, in 2011
Education level                 In hospitals (%)      In community health centers (%)   In township hospitals/clinics (%)
                                Doctors    Nurses     Doctors    Nurses                 Doctors    Nurses
University and above            62.7       11.5       31.7       5.7                    3.9        0.4
Secondary school and college    36.3       86.4       64.6       91                     83         87.9
High school or less             1          2.1        3.7        3.3                    13.1       11.7
Note: University and above refers to those with at least a bachelor’s degree. Secondary schools include technical or professional high schools
Source: China Health Statistics Yearbook 2012 (Ministry of Health 2012b)
problems than the overall number of China’s health human resources. Indeed, some data suggest an excess of trainees and the likelihood that many health graduates do not take up professional service. Nonetheless, inequality and inequity in the distribution of doctors and especially nurses between and particularly within provinces remain extreme and have been linked to key health outcomes including infant mortality (Anand et al. 2008).
Authorities in China recognize the prevailing inequity in distribution of health human resources and have initiated training and other schemes to increase the number of qualified personnel and improve their distribution. The 12th Five-Year Plan for health sector development, released in 2012, sets targets for assistant physicians (1.88/1000 population) and nurses (2.07) and lays out plans for increased priority of staffing in rural areas and at community level, for personnel and financial support for poor rural and western health facilities by wealthier urban and eastern facilities, for intensive efforts to fill known human resource gaps among various health and allied health providers, and for tiered registration for doctors that first requires a period of rural service. A focus on community general practice is reiterated in the plan, with a target of 150,000 newly trained or upgraded personnel to provide such services.
In addition, in a 2011 “Guidance” the State Council announced new roles for village doctors, recommending a wide range of tasks (Government of China 2011). By 2020, these cadres should be providing standardized primary care (following new clinical guidelines), implementing public health programs, undertaking disease surveillance, conducting community education, participating in health financing schemes, and maintaining individual e-health dossiers. In theory, it will be possible for the national HMIS to monitor their work. The official engagement of village doctors in a national system is a positive development and should improve public confidence in their services. However, payment for
the planned elevation of village doctors’ responsibilities will derive from a complex mix of funding streams (Government of China 2011; Ministry of Health 2011b) overseen and additionally funded by county-level authorities (Government of China 2011) whose accountability for this national initiative will be to local government (Wong 2010; Zhou 2010b), not health authorities.
Remuneration of Health Workers
It is well established that marketization and the de facto privatization of clinical care by salaried doctors working in public facilities had, by 2000, resulted in China having one of the least equitable health systems in the world (The World Health Organization 2000), with over 60% of THE being out of pocket (Blumenthal and Hsiao 2005; Ho and Gostin 2009; Wang et al. 2007). One of the main objectives of China’s HSR is to regulate the remuneration of doctors and to separate their income from choices on clinical care. However, while China has reduced the level of out-of-pocket expenditure on health to around 35% through increases in public funding and insurance initiatives (Yip et al. 2012), household health expenditure has not decreased either numerically or as a proportion of total household expenditure (Meng et al. 2012). Although there is indirect evidence of increased non-health expenditure by insured households in comparison to before the schemes were introduced (Bai and Wu 2014), this objective of the HSR is proving to be the most difficult to achieve. China’s THE is increasing at around 17% per year, and a large proportion of the increase is due to payment of health facilities, doctors, and other providers by individuals or insurers. As patient expectations rise but out-of-pocket expenses remain numerically high, an increasing number of assaults on doctors by patients’ families is being reported.
On the other hand, the scheduled fees payable to doctors for listed services are set below cost, forcing clinicians and facilities to make up income by charging for other services, investigations, procedures, and drugs (including those not on the essential drugs list with unregulated prices) (Blumenthal and Hsiao 2005; Ho and Gostin 2009; Wang et al. 2007; Tian et al. 2008) or by accepting bribes and kickbacks (Yang and Fan 2012). While the government has committed to improving the quality of care provided by health providers and is exploring remunerating them through capitation, diagnosis-related groups, and performance-based incentives (Ministry of Health 2012a), separating hospital management from doctors’ income is proving to be the most difficult element of the current HSR (Yip et al. 2012).
Health Services Delivery and Outcomes
Primary Care and Public Health
As reviewed elsewhere (Hipgrave 2011a), public health services in China suffered badly under the marketization of the 1980s and 1990s. CDC in particular was weak, culminating in the SARS epidemic in 2003. Public funding for preventive health services fell dramatically and was insufficient to even cover salaries. Public health authorities were left to raise their own income through charging fees for services, including vaccination (for which fees were only completely dropped in 2007) and various inspections and screening. Community approaches to disease control were abandoned in favor of vertical programs reliant on national or external funding, and disease surveillance was poor.
SARS and health authorities’ realization of the epidemic of NCDs due to ageing, urbanization, and decreasingly active lifestyles have led to major changes to public health programming in China. Disease surveillance is now conducted online, in real time, and funding for CDC and preventive health has increased dramatically. New vaccines were introduced in 2008, although globally recommended vaccines against Haemophilus influenzae type B, pneumococci, human papilloma viruses, and rotaviruses are only available privately (ironically, through government providers).
The largest boost to public health came with the 2009 HSR, when government introduced a
minimum 15 renminbi (RMB)/capita subsidy for public health/screening activities to be conducted across the nation. This had been pre-dated by various vertical preventive health programs, such as funding of hepatitis B vaccine since 2002 (Cui et al. 2007) and national funding of the EPI since 2007. The HSR public health funding is provided by a mix of national and local authorities according to their ability to pay (problematic for poor counties in rich provinces) and the RMB15 was increased to RMB25 in 2011; it is much higher in wealthy areas. The funds pay providers to conduct the following services, notionally free of charge: (1) maintenance of individual electronic health records, (2) health education, (3) vaccination, (4) infectious diseases’ prevention and treatment, (5) screening and management of chronic diseases such as hypertension and diabetes, (6) mental healthcare, (7) child healthcare, (8) pregnancy and maternity care, and (9) healthcare for the aged.
For the elderly and those with chronic diseases, this kind of screening, along with the introduction of zero markup and full reimbursement for drug treatment of NCDs (Yang et al. 2013a), has made a huge difference to their care. However, rollout of this initiative is slow, and although most targets are being met (Yip et al. 2012), monitoring is hampered by the absence of local denominators. Moreover, some of the programs, such as management of mental illness, have not been founded upon a training program for staff ill-equipped to provide them. In addition, unpublished evidence gathered by UNICEF in 2010 suggests that some of the funds are being used as salary supplements to support the new responsibilities of village doctors (in public health and other programs) and that the volume of money allocated to some rural localities is actually too high, due to out-migration to cities. Meanwhile, the increasing proportion of China’s population living in urban areas, including most rural-urban migrants, cannot access such services.
Another boost to public health came with the MoH’s program, also introduced in 2009, to prioritize interventions for certain vulnerable populations. These include: (1) catch-up hepatitis B vaccination for those aged <15 years; (2) cervical and breast cancer screening for women in rural areas; (3) an expansion of the hospital delivery subsidies first introduced in 2000, to cover women in all rural counties; (4) free cataract surgery for the poor; (5) free folic acid supplementation for rural women before and during pregnancy; (6) improved stoves and fuel to reduce fluorosis; and (7) introduction of eco-friendly toilets. Again, targets for introduction of these measures have been set and rollout is proceeding (Yip et al. 2012).
Finally, although firm evidence of impact is scant, local authorities in most Chinese cities have introduced public education and health literacy programs to enhance awareness on issues like diet, exercise, cigarette smoking, appropriate care of women before and during pregnancy, infants and young children, and the elderly. As usual, implementation of national guidelines on such activities depends on uptake and funding by other sectors and local authorities. The regular occurrence of outbreaks of food (Xinhua 2011) and environmental contamination (Human Rights Watch 2011) and other scandals with public health implications indicates the difficulty faced by national authorities in China’s decentralized context.
Clinical Services
Recent high-profile summaries of China’s health system tend to focus on its administration and financing and neglect the considerable improvements in clinical care available to the local population. While standards at all levels of the service hierarchy vary very widely, health authorities have augmented the care available at virtually all public facilities across the nation. Moreover, access to services has improved for all the population, albeit at high cost to both government and individuals (Meng et al. 2012).
Clinical services in China are conducted through a hierarchically arranged network of facilities ranging from tertiary referral centers in the large cities (most having high-quality diagnostic and laboratory equipment) to second-tier hospitals at county and district level. Rural townships
and urban communities are served by clinics or hospitals with varying capacity for inpatient care and surgery. At village or neighborhood level, public or (mostly) private facilities provide basic outpatient care, usually with an attached dispensary and possibly with links to a laboratory or radiology service. Concern about the standard of care provided by local facilities has resulted in many patients self-referring to higher-level facilities and hospitals (Table 4). As a result, hospitals in China tend to provide care for all levels of illness, resulting in inefficiency and overcrowding. Expenditure on hospital-based care as a proportion of THE in China far exceeds that in many OECD nations (Barber et al. 2014), resulting in the high priority given to improving primary care, community general practice, and lower-level facilities in the HSR (Yip et al. 2012; Ministry of Health 2012a) and to moving outpatient care in particular from hospitals to primary care facilities (Barber et al. 2014).
Table 4 Number of outpatient visits and inpatients in health institutions in China in 2011
Health institution type                                    Total visits (100 million      Total inpatients (10,000
                                                           person-times) (n = 62.7)       persons) (n = 15,298)
Hospitals, n (%)                                           22.6 (36)                      10,755 (70.3)
General-acute hospitals                                    16.74                          8431
Hospitals specialized in TCM                               3.61                           1349
Specialty hospitals                                        1.88                           844
Sanitaria                                                  0.05                           98
Community health institutions (%)                          38.05 (60.7)                   3775 (24.7)
Health centers                                             8.8                            3472
Urban health centers                                       0.11                           23
Rural township hospitals                                   8.7                            3449
Outpatient department                                      0.7                            13
Clinics, health centers, and nurse stations                5.2
MCH centers (stations), n (%)                              1.76 (2.8)                     682 (4.28)
Specialized disease prevention and treatment institutes    0.2                            38
Source: China Health Statistical Yearbook 2012 (Ministry of Health 2012b)
As would be expected for a nation of this size and variation, clinical services in China vary widely, from the world-class care available to residents in Shanghai, Beijing, Guangzhou, and similar cities to the most basic care in rural clinics in far western China. Similarly, models for the care of chronic illness and the use of day-care and hospital in the home vary widely, but in general these options are not yet well developed in China. The average length of inpatient stay is high in China compared to OECD nations (Meng et al. 2012), particularly in public hospitals, which account for 89% of total beds and 92% of hospital admissions (Barber et al. 2014). Clinicians at community level have usually had training in TCM and many practice both western medicine and Chinese medicine.
However, the preparedness of clinicians in primary care for the wide range of conditions they treat varies widely. For example, China’s current HSR acknowledges that the system’s clinical focus has been ill-suited to the screening and outpatient care of chronic illness, an increasing priority as rates of noncommunicable diseases rise (The World Bank Human Development Unit 2011). Similarly, the high-volume model of clinical care in China is poorly suited to the management of mental illness (Qin et al. 2008), aged care and dementia, and prevention of tobacco-related illness and alcohol consumption, all of which are needed in China (The World Bank Human Development Unit 2011; Phillips et al. 2009; Yang et al. 2013b; Zhou et al. 2011; Chan et al. 2013).
With respect to quality of care, in the last decade China has moved to standardize many clinical pathways and practices, and the concept of evidence-based medicine is gaining acceptance. However, attention to such standards and their influence on clinical care is perceived to be low (Yang and Fan 2012). Moreover, funding for and the quality and independence of clinical research, access to information, and the ability of clinicians to practice independent of the profit motive are
major obstacles to the use of evidence-based guidelines in clinical care in China (Barber et al. 2014; Wang 2010).
Pharmaceutical Care
China’s pharmaceutical sector has been one of the most problematic for health authorities over recent decades and the focus of major reform efforts in the last few years. In 2008, 42.7% of China’s THE was on drugs (Hu 2010), compared to 17% in developed nations (Seiter et al. 2010). Excessive drug prescription was common in rural China (Zhan et al. 1998; Pavin et al. 2003; Dong et al. 2008; Chen et al. 2010; Yu et al. 2010), and there is evidence that China’s rural health insurance scheme was encouraging overprescription (Chen et al. 2010; Sun et al. 2009). Drug sales continue to provide the largest income source for China’s county health facilities; doctors have a pecuniary incentive to prescribe more, and more expensive, drugs (Chen et al. 2010; Yu et al. 2010). Hospitals and doctors profit significantly from the sale of drugs (Yu et al. 2010; The World Bank Group East Asia Pacific Region 2010), affecting financial access to healthcare (Tang et al. 2008; Meng et al. 2012). Weak regulation of drug manufacture and distribution raises safety concerns (Yu et al. 2010; Guan et al. 2011).
Previous efforts to improve the pharmaceutical sector had limited effect. The impact of laws, decrees, and 24 separate price reductions over 1996–2007 was constrained by hospital financing/income generation, market influences, and patient preferences (Chen et al. 2010; Yu et al. 2010). Price controls were undermined by manufacturers, wholesalers, and retailers and by hospitals and physicians controlling the prescription of price-controlled drugs (Hu 2010; Yu et al. 2010; Chen and Schweitzer 2008). New drug approvals were issued at astonishing rates (Ho and Gostin 2009) and the former head of the national drug administration authority was executed in 2007 for accepting bribes. Kickbacks and corruption continue to mar the sector (Yip et al. 2012; Yang and Fan 2012).
Acknowledging these problems, China’s HSR included establishment of a National Essential Medicines Scheme (NEMS) to improve population access to and reduce the cost of essential medicines (State Council 2009), particularly at grassroots (township and village) level. The Scheme covers drug production, pricing, distribution, procurement, prescribing, and payment (Hu 2010) and a new National Essential Drugs List (NEDL) for primary healthcare institutions. The 2012 NEDL comprises 317 western drugs and 203 TCM commodities (increased from 205 western and 102 in 2009) for storage and use by grassroots facilities. Bidding prices for 296 NEDL drugs were capped (Schatz and Nowlin 2010), and a “zero markup” (no profit) policy was introduced, although markups remain allowed at county-level and higher facilities. By late January 2012, 99.8% of township hospitals and 58.1% of village clinics had implemented the policy (Ministry of Health 2012d). In addition, most (urban) districts and (rural) counties had made NEDL medicines reimbursable by the various health insurance schemes, with higher reimbursement rates than for nonessential medicines (Ministry of Health 2011c). Finally, to regulate the pharmaceutical market and distribution of essential drugs, the NEMS introduced province-wise, collective, internet-based public bidding and procurement for NEDL medicines.
These four elements – the NEDL, zero markup, reimbursement of certain drug costs by insurers, and public procurement – were designed by the government to wrest control of the public pharmaceutical sector from the private sector. However, the official HSR documents encourage local adaptation of the broad design (Ho 2010), including the NEDL (which has indeed been widely augmented (Guan et al. 2011; Shi et al. 2011)) and strategies to compensate providers for the zero markup policy. Few evaluations of the impact of the Scheme have emerged. Very early indications suggested little change in prescribing practices (Yip et al. 2012), but a small field evaluation found that while drug procurement had been systematized and the cost of care had declined coincident with reduced drug prices, manufacturers have not
uniformly supported the changes, and some drug prices have actually increased. Provider compensation for reduced income was mostly ineffective, forcing some to seek alternative sources of income within and outside the health sector. Rational drug prescribing had improved in this study. The loss of drug income had forced health facilities to rely more on public financing, and providers complained of higher workload and lower incomes (Yang et al. 2013a). Similar issues were found in another study in different locations (Xiao et al. 2013).
The NEMS particularly impacts small rural health facilities and will again rely on considerable local support for its implementation. Meanwhile, provinces are continuing to augment even a revised version of the NEDL (Tang et al. 2014a), and zero markup has not yet been applied in county or higher-level facilities. While insurance reimbursement and capitation may help to improve prescribing practices and reduce patient outlays, more control of procurement, manufacturer, and prescriber practices is required. The recently announced reforms of county hospital funding and administration include a major focus on drug procurement, prescription, management, and pricing (State Council 2014).
Private Healthcare
As a consequence of the marketization of China’s health sector in the 1990s, provision of health services was opened significantly to private providers. The number of private providers increased rapidly and now comprises a significant proportion of the market. For example, in 2005, private hospitals accounted for only 17.2% of total hospitals, but the share had increased to 38.4% by 2011. In 2011 among all 954,389 health facilities (hospitals, clinics, and other institutions), 47% operated as “private” entities. Reports indicated that private health providers can offer services at a cheaper price and shorter physical distance and waiting time for patients (Deng et al. 2013) and are highly active in the provision of healthcare in China. However, most private facilities are small and poorly equipped, and collectively they only employed 17.5% of the total labor force, owned 9.7% of total medical beds, and received 9.1% of total patient hospital visits (Ministry of Health 2012b). Compared with public facilities, a large percentage of elderly physicians and new laborers in the health market are practicing in private clinics (Tang et al. 2014b). This staffing structure could have a negative impact on quality of services.
In general, despite rapid development in recent years, private health services are at an early stage of development in China. One major reason is that the evolution and current standing of national policy generally still favors public providers in terms of resource allocation, stewardship (entry and registration control), opportunities for promotion, and social insurance entitlements. This accounts for common challenges in the private sector, i.e., lack of technical capacity, poor infrastructure, and thus compromised service quality. Health authorities are now promoting a robust private sector to encourage competition and efficiency within the health sector, aiming for 20% of beds and services to be privately provided by 2015. However, subsidization of grassroots level public institutions may prevent moves in this direction.
Health Outcomes
While China’s progress on major health indicators during the 30 years immediately following the foundation of the PRC is unparalleled (Jamison et al. 1984), marketization and the unaffordability of healthcare for a large proportion of the population stymied progress in the 1980s and 1990s. There are even suggestions that child mortality rates in China actually rose in the 1980s (Banister and Hill 2000), with the breakup of the commune-based health cooperatives. Moreover, improvement in certain indicators has been slow. For example, urban maternal mortality has been slow to fall, almost certainly because reductions in maternity risk for urban residents have been diluted by the much higher risk of death in pregnancy among urban migrants (Fig. 12) (Zhang et al. 2014). Geographic disparities also remain great, particularly between eastern and western
[Figure 12: line chart with series for Urban, Rural, and Total.]
Fig. 12 Maternal mortality per 100,000 live births by urban-rural location (Source: China Health Statistics Year Book (Ministry of Health 2012b) and NHFPC (China National Health and Family Planning Commission 2012))
provinces (Wang et al. 2012). In general, the priority given to China’s recent HSR acknowledges that progress in its population’s health status was less than could have occurred, given the nation’s economic growth since the 1980s (Yip et al. 2012). One acknowledgement of this is the government target of a one-year increase in life expectancy by 2015 (Ministry of Health 2012a). The most comprehensive analysis of the causes of death and disability in China, published in mid-2013, highlighted the dramatic evolution of its demographic transition, with NCDs now making up all but two of the top 30 causes of lost life years, and most infectious diseases having fallen precipitously. The report also noted the contribution of air and household pollution to mortality and morbidity and the need for cross-sectoral action to tackle the major causes of ill-health in China (Yang et al. 2013b).
Nonetheless, in 2010, average life expectancy in China was 74.8 years, and in 2012 the maternal mortality ratio was 24.5/100,000 live births, infant mortality rate 10.3‰, and under-five mortality rate 13.2‰ (China National Health and Family Planning Commission 2012). These figures compare favorably with other developing countries, and China’s performance in reducing rural maternal and neonatal mortality has been outstanding (Feng et al. 2010b, 2011). China has already achieved all the health targets in MDGs 4, 5, and 6 and achieved the target on reducing child underweight in the early 2000s. Urban-rural disparity in under-five and particularly maternal mortality has declined since 1990, but remains high for child underweight and stunting and especially for child micronutrient deficiency (UNICEF China, unpublished data; Hipgrave et al. 2014).
Challenges to population health status have been alluded to already and include the rise of NCDs, especially smoking-related illness (The World Bank Human Development Unit 2011), illness due to environmental damage and air pollution (The World Bank Human Development Unit 2011; Millman et al. 2008), urbanization, and the provision of services for newly arrived migrants (Gong et al. 2012). The prevention of accidents and injury will also play an increasing role in maintaining China’s trajectory on reducing preventable death and ill-health (Wang et al. 2008b). As the population ages,
private and institutional care of the elderly is another major issue for China’s health and other social sectors.

Assessment

China’s progress in maternal and child health, urban health, and communicable disease control is very encouraging, but the nation’s health system now faces a vastly different range of issues from those it faced before.
In addition to health insurance reforms that commenced in 2003, in many ways the comprehensive health system reforms announced in 2009 have been highly successful. Insurance coverage is almost universal, and the benefit package is gradually expanding, even for outpatient services, although a system for ensuring coverage for the huge population of rural-urban migrants remains under development. Introduction of public health screening and management, building of new health infrastructure and expansion of community-based services, measures to control profiteering from the sales of drugs, scale-up training of health personnel, and other measures were both needed and are being implemented. On the other hand, the reform of hospital management and financing remains at the pilot stage, with suggestions but no formal guidance on the model to be followed.
China’s HSR is encouragingly specific but not prescriptive on strategy. Monitoring the reform remains predominantly output-based at macro-level; no detailed independent assessments have been undertaken, and population-level studies of health outcomes related to the reforms have not been undertaken. Moreover, mechanisms to incorporate patient feedback into health service provision have not been established and may be ignored if local economic, political, or vested interests override such input, as has been observed in relation to China’s natural environment. Public financing of the health sector, although modest by global standards, has improved, particularly in relation to the proportion of THE that is out of pocket. But costs are rising faster than government inputs, and poorer constituencies remain least able to fund public services, despite having the greatest needs. As a result, proportional household expenditure on healthcare has not declined.
Urban residents of China’s industrialized eastern provinces enjoy a high quality of healthcare and access to trained personnel. This is not the case for poorer rural residents, particularly in the nation’s vast western region. The official engagement of village doctors to provide publicly funded health services in rural areas should improve the standard of and public confidence in their care, but the burden on this ageing cadre of staff is rising and may be untenable; again, accountability for this national initiative will be to local government and health officials unused to the application of treatment algorithms, performance-based assessment, and clinical audit. Concern about the care provided by community providers continues to result in many patients self-referring to higher-level facilities and hospitals.
Population health in China is threatened by the rise of NCDs, especially illness due to diabetes, cardiovascular disease, overweight, tobacco smoking, environmental damage, and air pollution. The prevention of accidents and injury and management of mental illness will also play an increasing role in maintaining China’s trajectory on reducing preventable death and ill-health. The required focus of the health sector on chronic illnesses, aged care, and outpatient services requires a dramatic increase in the engagement and stewardship of community providers.
This has been a major focus of China’s health reforms, now well into their second phase, and it is likely that further major policy and financial inputs will be announced before this phase concludes in 2015. The private sector will play an increasing role in the provision of health services in China, but a higher level of stewardship and the use of financial mechanisms to rein in escalating costs will almost certainly be required, especially for hospital care. To ensure consistency and transferability, this may involve stronger oversight by and involvement of national health policy and financing authorities, notwithstanding the power vested in subnational authorities in China’s system of government.
Hipgrave D. Perspectives on the progress of China’s 2009 – 2012 health system reform. J Glob Health. 2011b;1(2):142–7. Epub 1 Dec 2012.
Hipgrave D, Hort K. Will current health reforms in south and east Asia improve equity? Med J Aust. 2014;200(9):514.
Hipgrave D, Guo S, Mu Y, Guo Y, Yan F, Scherpbier RW, et al. Chinese-style decentralization and health system reform. PLoS Med. 2012;9(11):1–4.
Hipgrave DB, Fu X, Zhou H, Jin Y, Wang X, Chang S, et al. Poor complementary feeding practices and high anaemia prevalence among infants and young children in rural central and western China. Eur J Clin Nutr. 2014;68:916.
Ho CS. Health reform and de facto federalism in China. China Int J. 2010;8:33–62.
Ho CS, Gostin LO. The social face of economic growth: China’s health system in transition. JAMA. 2009;301(17):1809–11. Epub 7 May 2009.
Hu S. Financing, pricing and utilisation of pharmaceuticals in China: the road to reform. Beijing: The World Bank East Asia and Pacific Region; 2010. Contract No.: 58410.
Hu G, Baker T, Baker SP. Comparing road traffic mortality rates from police-reported data and death registration data in China. Bull World Health Organ. 2011;89(1):41–5. Epub 25 Feb 2011.
Huang YZ. The sick man of Asia. Foreign Aff. 2011;90:119–36.
Human Rights Watch. “My children have been poisoned”: a public health crisis in four Chinese provinces. New York: Human Rights Watch; 2011.
Jamison DT, Evans JR, King T, Porter I, Prescott N, Prost A. China: the health sector. Washington, DC: The World Bank; 1984.
Kaiman J. Chinese statistics bureau accuses county of faking economic data. The Guardian. 7 Sept 2013.
Li L, Chen Q-L. A rational evaluation of China’s health sector reform over three years. Health Econ Res. 2012;5:7–12.
Liu Y. China’s public health-care system: facing the challenges. Bull World Health Organ. 2004;82(7):532–8. Epub 27 Oct 2004.
Liu MD. Sub-provincial intergovernmental fiscal transfers. 2006 Annual China Fiscal Reform Forum. Beijing: UNDP; 2007.
Liu Y, Rao K, Hsiao WC. Medical expenditure and rural impoverishment in China. J Health Popul Nutr. 2003;21(3):216–22. Epub 14 Jan 2004.
Ma XVC. National Commission for Health and Family Planning. Quoted comments given at press conference during 13th National People Congress 2013 (in Chinese). 2013. Available at: http://news.sina.com.cn/c/2013-03-15/035926536113.shtml. Last viewed 24 Oct 2014.
Meng Q, Xu L, Zhang Y, Qian J, Cai M, Xin Y, et al. Trends in access to health services and financial protection in China between 2003 and 2011: a cross-sectional study. Lancet. 2012;379(9818):805–14.
Millman A, Tang D, Perera FP. Air pollution threatens the health of children in China. Pediatrics. 2008;122(3):620–8. Epub 3 Sept 2008.
Ministry of Health. The National Health Statistics reporting system (in Chinese). Beijing: Chinese Academy of Medical Science; 2007.
Ministry of Health. Report on women and children’s health development in China. Beijing: China Ministry of Health; 2011a.
Ministry of Health. China’s Minister of Health: rural doctors will continue to serve the masses indefinitely (in Chinese). 2011b. Available at: http://www.gov.cn/gzdt/2011-02/18/content_1805889.htm. Last viewed 24 Oct 2014.
Ministry of Health. China 2010 health statistical yearbook. Beijing: China Ministry of Health; 2011c.
Ministry of Health. China’s State Council announcement on deepening medical and health system planning and implementation of the program during the 12th Five Year Plan. 2012a. Available at: http://www.wpro.who.int/health_services/china_nationalhealthplan.pdf. Last viewed 24 Oct 2014.
Ministry of Health. China health statistics yearbook. Beijing: Chinese Academy of Medical Science; 2012b.
Ministry of Health. Three years of significant progress in health reform (in Chinese). 2012c. Formerly available at: http://www.moh.gov.cn/publicfiles/business/htmlfiles/mohbgt/s3582/201201/53883.htm. Last viewed 20 Aug 2012 – MoH website now deleted.
Ministry of Health. Health statistical monthly reports. Beijing: Ministry of Health; 2012d.
Ministry of Health Centre for Health Statistics and Information. An analysis report of the fourth national health services survey in China in 2008. Beijing: China Union Medical University Press; 2009.
Mulholland K, Temple B. Causes of death in children younger than 5 years in China in 2008. Lancet. 2010;376(9735):89.
National Bureau of Statistics. China statistical yearbook (in Chinese). Beijing: National Bureau of Statistics; 2011. Available at: http://www.stats.gov.cn/tjsj/ndsj/2011/indexch.htm. Last viewed 24 Oct 2014.
National Bureau of Statistics. Tabulation of the 2010 population census of People’s Republic of China. Beijing: China Statistics Press; 2012.
National Bureau of Statistics. China statistical yearbooks. Beijing: Published annually; 2016.
OECD. Health at a Glance: OECD Indicators, OECD Publishing. 2013. https://doi.org/10.1787/health_glance-2013-en
Osnos E. China’s censored world. New York Times. 2 May 2014.
Pavin M, Nurgozhin T, Hafner G, Yusufy F, Laing R. Prescribing practices of rural primary health care physicians in Uzbekistan. Trop Med Int Health. 2003;8(2):182–90.
Phillips MR, Zhang J, Shi Q, Song Z, Ding Z, Pang S, et al. Prevalence, treatment, and associated disability of mental disorders in four provinces in China during
and implications. Hum Resour Health. 2014;12:36. Epub 30 June 2014.
Yang DL. The central-local relations dimension. In: Freeman CW, Lu XQ, editors. Implementing health care reform policies in China. Washington, DC: Center for Strategic and International Studies; 2011. p. 21–9.
Yang ZP, Fan DM. How to solve the crisis behind Bribegate for Chinese doctors. Lancet. 2012;379(9812):e13–5.
Yang L, Cui Y, Guo S, Brant P, Li B, Hipgrave D. Evaluation, in three provinces, of the introduction and impact of China’s National Essential Medicines Scheme. Bull World Health Organ. 2013a;91:184–94.
Yang G, Wang Y, Zeng Y, Gao GF, Liang X, Zhou M, et al. Rapid health transition in China, 1990–2010: findings from the Global Burden of Disease Study 2010. Lancet. 2013b;381(9882):1987–2015. Epub 12 June 2013.
Yip WC-M, Hsiao WC, Chen W, Hu S, Ma J, Maynard A. Early appraisal of China’s huge and complex health-care reforms. Lancet. 2012;379(9818):833–42.
Youlong G, Wilkes A, Bloom G. Health human resource development in rural China. Health Policy Plan. 1997;12(4):320–8. Epub 3 Nov 1997.
Yu X, Li C, Shi Y, Yu M. Pharmaceutical supply chain in China: current issues and implications for health system reform. Health Policy. 2010;97(1):8–15. Epub 24 Mar 2010.
Zhan SK, Tang SL, Guo YD, Bloom G. Drug prescribing in rural health facilities in China: implications for service quality and cost. Trop Doct. 1998;28(1):42–8. Epub 3 Mar 1998.
Zhang W, Navarro V. Why hasn’t China’s high-profile health reform (2003–2012) delivered? An analysis of its neoliberal roots. Crit Soc Policy. 2014;34:175–98.
Zhang J, Zhang X, Qiu L, Zhang R, Hipgrave D, Wang Y, et al. Maternal deaths among rural-urban migrants in China: a case-control study. BMC Public Health. 2014;14:512.
Zheng M, Fu Q, Wang X. Comparative study on structural changes in income disparities in urban households in Chongqing Municipality, Shanghai Municipality and Sichuan Province. J Reform Strategy. 2008;5:98–101.
Zhou LA. Reforming China’s local government governance. In: Incentives and governance: China’s local governments. Singapore: Cengage Learning Asia Pte. Ltd.; 2010a.
Zhou LA. Incentives and governance: China’s local governments. Singapore: Cengage Learning Asia Pte. Ltd.; 2010b.
Zhou L, Conner KR, Caine ED, Xiao S, Xu L, Gong Y, et al. Epidemiology of alcohol use in rural men in two provinces of China. J Stud Alcohol Drugs. 2011;72(2):333–40. Epub 11 Mar 2011.
Health System in Egypt
34
Christian A. Gericke, Kaylee Britain, Mahmoud Elmahdawy,
and Gihan Elsisi
Contents
Introduction
Organization and Governance
Overview
Historical Background Until 2011
Public System
Private System
Information Systems
Financing
Overview
Expenditure
External Sources of Financing
Insurance Coverage
Health Payments
Paying Health Workers
C. A. Gericke (*)
Anton Breinl Centre for Health Systems Strengthening,
James Cook University, Cairns, Australia
University of Queensland School of Public Health,
Brisbane, Australia
e-mail: c.gericke@uq.edu.au
K. Britain
University of Queensland School of Public Health,
Brisbane, Australia
e-mail: kayleebritain@gmail.com
M. Elmahdawy
Ministry of Health, Cairo, Egypt
e-mail: mahmoud77@yahoo.com
G. Elsisi
Ministry of Health, Cairo, Egypt
Faculty of Pharmacy, Heliopolis University, Cairo, Egypt
e-mail: gihanhamdyelsisi@hotmail.com
Fig. 1 Population pyramids for Egypt (2000 and 2013). (Source: United States Census Bureau, International Programs 2013)
1991 to counter some of these undesired side effects had some positive impact through its microcredit and community financing initiatives (Abou-Ali et al. 2010). Despite some social improvements and a growing economy, widespread public dissatisfaction with basic living conditions and high levels of poverty remained and spurred the Arab Spring revolution in 2011.
Table 1 Socioeconomic and demographic indicators for Egypt
Indicator                               Year   Value
Socioeconomic
Total population                        2013   85,294,388
Population living in urban areas (%)    2010   43%
Gross national income per capita        2010   6120
Gross domestic product                  2010   $255 billion
GDP growth rate (%)                     2012   2.00%
Poverty rate (%)                        2010   25.20%
Unemployment rate (%)                   2012   12.50%
Rate of urbanization                    2010   2.10%
Literacy rate males (%)                 2010   80%
Literacy rate females (%)               2010   64%
Demographic
Total fertility rate (per woman)        2013   2.9
Population 0–14 (%)                     2013   32.30%
Population 65 years and over (%)        2013   4.80%
Death rate (per 1000 population)        2013   4.79
Birth rate (per 1000 population)        2013   23.79
Sources: CIA Factbook 2013, WHO 2010, World Bank 2013

The uprising has caused economic growth to slow down (CIA 2013) in the past few years due to the political uncertainty along with a significant reduction in tourism (Haley and Beg 2012). At the same time, the revolution has resulted in increased social spending to address public dissatisfaction and has also led toward a reduction in foreign exchange reserves contributing to a rising deficit (CIA 2013).
Overall health in Egypt prior to 2011 had been steadily improving over time with marked increases in life expectancy and decreases in infant mortality rates since 1990 (Table 2). Communicable disease control, in particular, for endemic tropical diseases such as schistosomiasis has also made great improvements during this time; however, diarrheal diseases, acute respiratory infections, and hepatitis are still reported from health facilities (CIA 2013). Compared to other MENA countries, the population percentage of communicable diseases is in fact low, while noncommunicable diseases in Egypt are higher than other countries in the geographic region (WHO 2013). Of the noncommunicable diseases, like in many other countries, obesity is a growing factor with over 33% of the population being obese as of 2008 (WHO 2013). HIV/AIDS has also been an increasing health issue with over 11,000 known persons with the infection as of 2009. Today the top three diseases causing mortality are essential primary hypertension, intracerebral hemorrhage, and fibrosis/cirrhosis (WHO 2013). In contrast, Egypt has put little emphasis on controlling environmental risks to health and well-being (Anwar 2003; Gericke 2006).

Organization and Governance

Overview

Egypt’s health-care system is pluralistic and complex combining both public and private providers and financers. The government has committed to provide health care to the poor; however, with a system pluralistic in nature, health-care providers compete, and clients are free to choose services based on their needs along with the ability to pay (WHO-EMRO 2006). Subsequently, the health-care system relies upon four financing agents:

• Government sector
• Public sector
• Private organizations
• Household payments (out-of-pocket)

The government sector represents the various ministries and departments of the government financed primarily through the Ministry of Finance (MOF). Other government agents are the Ministry of Health and Population (MOHP), the Ministry of Higher Education, and the Ministries of Interior and Defense. The MOHP is responsible for policy formulation and the regulation of the health sector including public, nongovernmental, and private organizations
Sector Reform Program (HSRP) established through the Family Health Fund along with the HIO (Salem 2002). To aid this development, Egypt received substantial foreign aid and assistance by the World Bank, USAID, and the European Commission (Salem 2002). Egypt also became a party to International Health Regulations (IHR) to improve practice, surveillance, and preparedness for health issues (WHO 2010). Egypt’s HSRP was officially introduced in 1997 to address how health in Egypt is organized, financed, and delivered (Haley and Beg 2012). This program has worked to improve upon the disjointed and complex health system through the private and public sector that existed then and now in Egypt. A few years after its establishment, the Healthy Egyptians 2010 Initiative was launched in 2000 to foster disease prevention and control (Anwar 2003).
The accumulation of reforms has benefited the health system in Egypt by implementing a social health insurance model, successfully increasing surveillance, and reducing communicable disease incidence and prevalence (WHO 2010). However, given Egypt’s lower-middle-income status, its overall population health is relatively poor in comparison with other lower-middle-income countries. Furthermore, despite some improvements, the burden of noncommunicable diseases has increased, putting further strain on Egypt’s health system (Roberts et al. 2013). Universal health care still has to be achieved due in large part to the privatization and its subsequent reduction in public spending which forced an increase in prepaid private and in out-of-pocket health expenditure (WHO-EMRO 2006). Compared to other lower-middle-income countries, Egypt spends comparatively little on health care: only 4.75% of GDP (2007–2008) (Ministry of Health, Egypt 2010).

Public System

The primary organization behind the public system in Egypt is the MOHP. The MOHP offers health service free of charge to every Egyptian citizen covering all inpatient and outpatient care (Elgazzar 2009). This organization is headed by the minister and further employs over 5,000 personnel in managing and delivering public health services (WHO-EMRO 2006). However, due to poor salary bases for doctors along with income-based inequality in service utilization, the quality of public health care in Egypt is known to be poor which shifts both suppliers and demand to private health care (Elgazzar 2009). Despite this, the MOHP is the major provider of primary, preventative, and curative care with over 4,500 health facilities throughout the country (WHO-EMRO 2006). The MOHP delivers its functions through four separate levels which correlate to the following levels of health care (WHO-EMRO 2006):

• Central
• Health directorates (governorate level)
• Health districts
• Health-care providers

Centrally, the MOHP is divided into ten sectors (MOHP 2013) depicted in Fig. 2.
These sectors, in accumulation, control the policy and regulation of health and health services in all of Egypt. The governorate level of the MOHP operates in purchasing and financing health care for the Egyptian population by balancing income and expenditure in compliance with health sector regulations (WHO-EMRO 2006). The district health structure is simply a replication of the governorate level in functionality except on smaller scales (WHO-EMRO 2006). Finally, the provider level of the MOHP is divided based on services as well as location (WHO-EMRO 2006). Despite a consistent discrepancy between rural and urban health care in Egypt, the MOHP does try to provide a large variation of all necessary services to all populated areas of Egypt.
A main component of the public sector of the Egyptian health system is the HIO. While most Egyptians rely on private care provision in addition to the services provided by the MOHP, the HIO is the largest health insurer in Egypt with continuous increases to its utilization through the years (Haley and Beg 2012). From 1990 to 2008, the percentage of population insured by the HIO
increased from 10% to 55% (Table 4) showing not only its growing use but also improvements to the public sector by increased access (Ministry of Health, Egypt 2010). However, the provision of health from all public-sector services has suffered from the government and MOHP’s inability to keep up with increasing costs (WHO 2010). This has turned not only patients but also doctors to the private health system which can provide both better salaries and physical resources.

Private System

Increased privatization along with poor maintenance of public care has driven substantial development of the private health-care system in Egypt. Moreover, the private system has achieved a perception of high quality within the country. However, the system prior to 2011 had not set up sufficient regulations on governing its service and finance, forcing much of the service to be provided through purely out-of-pocket payments (WHO-EMRO 2006). This increases the inequality of health-care access within the country as the private services are only for those who can afford them. Furthermore, because there are fewer regulations, more doctors are relying on private care work as supplemental payment which has been a key factor in the private system’s perceived better quality of care (WHO 2010). The lack of governmental regulation along with competing health insurers and providers has resulted in a severe absence in communication between the private and public sectors. This has been a key source of Egypt’s health system’s continuous dysfunction.
Financing

Overview

In 2007/2008, Egypt invested 42.5 billion Egyptian pounds (LE) on health. For a middle-income country in the region, this amount of spending is relatively low (see Table 3 for comparisons) (Ministry of Health, Egypt 2010). Breaking this down, financing derives from direct tax revenues, HIO premium payments and direct out-of-pocket spending from private households, private health insurance premiums, and health spending from employers to employees. Finally, assistance also comes from a cigarette tax as well as minor donor assistance (Ministry of Health, Egypt 2010).

Expenditure

In Egypt, public funding for the health system flows to financial agents and then onto providers under mutually exclusive tracks known as silos. This impedes care coordination and effective allocation of resources between the public and private sectors (Ministry of Health, Egypt 2010). From this, expenditure moves into various parts of the health system ranging from both private and public service providers to pharmaceuticals. The largest part of health financing is expenditure for pharmaceuticals (Fig. 3). Pharmaceuticals account for 25.9% of total health expenditure which is a relatively high percentage in comparison with comparable health systems (Ministry of Health, Egypt 2010). Funding for pharmaceuticals mostly
Table 3 Comparison of health spending in Egypt to other WHO’s Middle Eastern countries (2007/2008)
Columns: (1) percent of GDP spent on health (%); (2) government spending as the percentage (%); (3) health spending as the percentage of total government budget (%); (4) out-of-pocket expenditure as the percentage (%); (5) per capita health spending (constant 2005 US$)
Country    (1)    (2)     (3)     (4)     (5)
Algeria    4.49   83.85   10.65   15.30   205
Djibouti   8.54   76.07   14.15   23.60   81
Egypt      4.75   33.00   5.00    60.00   111
Iran       6.30   45.72   11.40   51.68   294
Jordan     9.10   62.20   11.35   33.40   273
Lebanon    8.76   48.99   12.39   39.95   551
Libya      2.80   75.88   5.38    24.12   383
Morocco    5.33   34.87   6.17    56.13   133
Syria      3.23   45.13   6.01    54.87   76
Tunisia    5.95   49.17   8.90    42.52   213
Sources: WHO NHA data, Egypt NHA results, Jordan NHA report, cited in Egypt MOH 2010
The goal of the HIO is to be a provider of health services to its beneficiaries under a low and fixed premium structure with an extensive benefit package (WHO-EMRO 2006). These benefits include transplants, plastic surgery, and treatments abroad with no cap on the quantity of services (WHO-EMRO 2006). However, there remains inadequate management of HIO service providers resulting in poor care and low responsiveness in the public system (Mosallam et al. 2013). Therefore, with the
higher demand for private care, the benefits of social insurance force most patients insured with the HIO to continue to pay out-of-pocket for most of their health care.
Private health insurance is not significant in Egypt in terms of financing or population coverage, but it is on offer for the select few who can afford it, for example, Egycare. The regulations have made it difficult for organizations to profit from health insurance schemes, which remain the biggest barrier to their spread. However, from time to time, new private health insurance programs appear which benefit upper-middle and upper-class populations (WHO-EMRO 2006).
A major development post the 2011 Arab Spring revolution era was the enactment of a new health insurance law, a project in the pipeline that was hotly debated at the ministerial cabinet’s level, in addition to the parliament floor. Unfortunately, political instability with frequent changes in the government executive has delayed the project launch and implementation. A new draft of the law was presented to the Higher Health Council (HHC) in April 2012 which requested some modifications. The goal of this development is to improve universal coverage within the country in a cost-effective way. The new social health insurance (SHI) would be intending to cover 90% of the population and reduce out-of-pocket payments to 35% at the end of implementation phase.

Health Payments

Out-of-pocket payments have always been the largest source of service payment. Regardless of insurance status, there are formal user fees for both outpatient and inpatient public services, the MOHP facilities having the smallest fees due (WHO 2010). Overall there have been three separate pathways in provider payment (WHO-EMRO 2006):

1. MOF funding
   (a) Funds to government care providers
2. Social insurance
   (a) Funds services as a combined provider and commissioner (half of revenues to finance services by itself and the other half to purchase services and goods from other providers)
3. Direct household funding
   (a) Over 90% of this goes directly to private health-care providers.

Another source of health revenues has been the Family Health Fund (FHF). The FHF pays performance-based incentives to health workers in the public sector (WHO-EMRO 2006).

Paying Health Workers

Prior to 2011, over 50% of health professionals were employed by MOHP facilities (WHO-EMRO 2006). With the limited funds that the MOHP receives, salaries for individuals are limited forcing most professionals to practice privately for further sources of income. In turn, 89% of medical doctors had been found to hold more than one job prior to 2011 (WHO-EMRO 2006). This allows for a balance in salary payments as a result of out-of-pocket payments to the private facilities.
Post January 2011, the health ministry has worked closely with health-care practitioners (HCPs) and their respective syndicates, in addition to the MOF, to establish a new payroll system for government-employed health-care workers. It is designed to reduce the gap between different payroll categories. In addition, to further appreciate those willing to serve in distant geographical locations, new incentive schemes are being developed. Once implemented, this would encourage the recruitment of more competitive health-care practitioners into the government health-care system. In addition, it would promote more health-care workers to serve in remote locations. The new payroll system needs parliament approval prior to implementation.

Physical and Human Resources

Physical Resources

The number of health facilities has been growing rapidly over the last two decades (Table 5). These consist of both public and private facilities with
varying services and amenities. Most facilities offer at a minimum basic structural needs (i.e.,
electricity and water) along with at least one doctor (MOHP 2004). Maternal, child, and
reproductive health services have continued to increase with urban areas showing the highest
percentage (MOHP 2004). Overall the private sector has access to more/better medical equipment
compared to public facilities which have been continuously underfunded (WHO 2010;
Ministry of Health, Egypt 2010; WHO-EMRO 2006). This has not changed since the Arab
Spring revolution.

Table 5 Summary of health facilities (2005)
                                              Number      Beds
MOHP                                           1,166    78,502
Rural                                            669    11,093
Rural (integrated) hospitals                     439     8,509
Urban                                            497    67,406
General and district hospitals                   233    34,656
Obstetric and pediatric hospitals                 10       752
Mental hospitals                                  17     6,415
Teaching hospitals and institutes (THO)           18     5,639
Curative care organization (CCO)                  11     2,129
Health insurance organization (HIO)               40     9,828
Other ministries’ hospitals                      119    29,851
Medical schools                                   71    25,891
Police and prison                                 26     1,382
Private hospitals                              1,329    15,302
Source: Egypt MOHP data, cited by WHO 2006

Human Resources

Historically in efforts to make the health-care system more independent, health professions
were encouraged and looked upon highly in society. However, with that status, many
health-care workers have been unwilling to practice in rural areas of Egypt. This has continued
the cycle of a poor distribution of health services in these areas (Jabbour 2012). Efforts
had been made by previous governments to encourage practitioners to work in rural areas,
but discontent among health professionals with these policies has hindered their implementation
(Jabbour 2012). Progress has been made in eliminating a job guarantee following medical
school. However, there has always been a lack of communication between the universities
producing doctors and the government overseeing policies. Prior to 2011, there were 6.53
physicians and 13.75 nurses per 10,000 people registered in Egypt (WHO-EMRO 2006).
There is little to no data as to whether the size of the workforce has been adequate to this point
(WHO-EMRO 2006). Little has changed in human resources apart from new payroll incentives
to encourage health professionals to practice in rural areas (Table 6).

Table 6 Summary of health workers (2005)
                       Number
Physicians             12,917
Dentists                3,885
Pharmacists             1,277
Nurses                 44,300
Lab technicians         3,575
Source: CAI HC data, cited by WHO 2006

Provision of Services

Overview

The utilization of health services in Egypt is highly reliant upon the division of public and
private sectors of health care. For the most part, the majority of health facilities are run by the
MOHP. However, the dichotomous system does not allow for health provision completely by
the public sector because the latter is chronically underfunded. In contrast, inpatient care is
mainly provided by the MOHP/public sector, while ambulatory and pharmaceutical care is
mostly private (WHO-EMRO 2006). While universal health-care coverage has not yet
been achieved within the country, 100% of the Egyptian population has access to basic health
services (WHO 2010).
For the most part, the MOHP oversees hospitals and outpatient facilities. Other public facilities
822 C. A. Gericke et al.
establishing a clinical pharmacy unit and drug information center in every public and private
hospital in order to empower and educate patients on medication issues. Furthermore, the
use of pharmaceuticals will continue to develop as the MOHP health technology assessment
and pharmacoeconomics unit will work to better utilize pharmaceutical resources and expenditure.

The Arab Spring Revolution

Taking inspiration from Tunisia, in January of 2011, Egyptian protesters working alongside the
only organized opposition force, the Muslim Brotherhood, stormed the streets of major cities
in Egypt in order to protest against the current Egyptian regime. This revolution succeeded in
overthrowing President Hosni Mubarak. Under the Mubarak presidency between 1981 and 2011,
there were many grievances over questions of freedom of expression, other human rights issues,
as well as social and economic issues. The revolution followed a number of years with high
unemployment rates, low wages, as well as food price inflation. The overall goal of the revolution
was to end the president’s regime along with the country’s policy on emergency law, lack of
freedom of speech, and overall corruption from the government. The protests varied from peaceful
to violent and lasted a total of 28 days until the president was finally overthrown.
In efforts to disassemble the protests, the Egyptian government attempted to eliminate social
media the night before the protests started. While this was somewhat successful, the protests still
filled the streets the next day resulting in President Mubarak dismissing his government,
appointing a new cabinet and vice president Omar Suleiman (the first in 30 years) all in hopes of
satisfying the uprising masses. However, protests did not desist until President Mubarak handed
power over to the Armed Forces placing Egypt in a truly transitional state (Abou-El-Fadl 2012).
Under the Armed Forces oversight, a new prime minister, Essam Sharaf, was announced, the
Egyptian Parliament was dissolved, and the Egyptian constitution was put on hold. Following the
revolution, Egypt worked to recover with the help of other countries such as financial assistance
from the United States which further increased Egypt’s already large debt (Hamilton 2013).
It was not until June 2012 that Egypt finally elected a new president. Promising to end years of
presidential abuse of power, Mohammed Morsi was sworn in (Hamilton 2013). Within his first
year of office, Egypt began importing natural gas. This investment was to the benefit of the
nation’s richest businessmen and increased public spending on fuels to 25% of all public
expenditure, more than what the country spends on health and education combined (Hamilton
2013). Also within this term, with influence from the Muslim Brotherhood, Morsi broke a number
of electoral promises. In summary, his actions brought no improvements to social issues nor
fulfilled the goals of a new constitution to be improved following the revolutionary demands.
Because of this, the Egyptian people once again took to the streets in order to overthrow their new
president along with the newly developed constitution. The Armed Forces sided with the people.
On 3 July 2013, President Morsi was overthrown by the military’s coup d’état, and he, along with
other leaders of the Muslim Brotherhood, was arrested and put on trial. After a series of violent
demonstrations and bombings on police and military institutions as well as on Coptic Christians
and churches, the Muslim Brotherhood was declared a terrorist organization in December
2013. The return to a military government has led to new uncertainty and a continuation of an
economically and socially unstable condition. In 2014, General Abdel Fattah Elsisi was elected as
the sixth president of Egypt.

Reforms

Overview

The 2011 revolution made way for huge changes within the country. For the most part, the
population recognizes the challenges caused by a rapidly growing population alongside an
out-of-date public-sector health-care provision (Devi
Contents
Introduction
Organization and Governance
Financing
Physical and Human Resources
Delivery of Health Services
Primary Care
Hospital Care
Integrated Care
Long-Term Care
Disabled Adults and Children
Mental Health Care
Pharmaceutical Care
Public Health
Reforms
Assessment
getting sick, being injured in the workplace, getting old, and growing families.

Introduction

The overall picture of the state of health in France contains apparent contradictions. On the one
hand, indicators such as life expectancy, life expectancy without disability, and healthy life
expectancy show that the health of the population is good. The French average life expectancy is
now over 80 years and is the second highest in the world for women. Moreover, the French
population is aging, and from 2020 onwards, those aged over 60 will outnumber those aged under 20
(accounting for 27% and 23% of the population, respectively). The aging of the population is not
due to a decreasing fertility rate as in other European countries. Indeed, France has the third
highest fertility rate in the EU. In addition, older people remain in better health than in many
other European countries.
The main causes of death in France are cancer, cardiovascular diseases, accidents, and diseases of
the respiratory system. However, France also compares well with regard to cardiovascular diseases,
while its relative position with respect to mortality caused by alcoholism, cirrhosis, and cancer
of the cervix is improving. Nonetheless, France suffers from a high rate of premature male deaths
from accidents and unhealthy habits such as smoking and alcoholism that are the most common causes
of avoidable mortality in France. Additionally, France has long reported health inequalities across
socioeconomic groups that are wider than in most other European countries. These inequalities
result not only from risk factors, but also from disparities in access to health services that
require the highest out-of-pocket expenditure by patients.

Organization and Governance

The French health care system is of a mixed type, structurally based on a Bismarckian approach with
Beveridge goals reflected in the single public payer model, the increasing importance of tax-based
revenue for financing and strong state intervention.
There is statutory health insurance (SHI), which covers virtually 100% of the resident population
under various noncompeting schemes.
The delivery of care is shared among private, fee-for-service physicians, private profit-making
hospitals, private non-profit-making hospitals, and public hospitals. In addition to the health
care sector and the social sector, there is a health and social care sector, known as the third
sector, which provides care and services to elderly and disabled people.
Jurisdiction in terms of health policy and regulation of the health care system is divided among
the state (parliament, government, and the Administration of Health and Social Affairs),
SHI, and local authorities, particularly at the regional level. Reforms over the last two decades
have attempted to devolve a greater remit in governance and health policy decision-making to the
regional level, particularly with respect to planning. This trend culminated in the 2009 Hospital,
Patients, Health and Territories Act (loi hôpital, patients, santé et territoires; HPST), which
merged institutions representing the main stakeholders (the state, SHI schemes, health
professionals, and public health actors) at the regional level into “one-stop shops,” the 26
regional health agencies (agences régionales de santé; ARS). Cutting across the traditional
boundaries of the health care sector, the public health preventive sector, and the health and
social care sector for disabled and elderly persons, the ARSs are responsible for ensuring that
health care provision meets the needs of the population by improving coordination between the
ambulatory and hospital sectors and health and social care sector services while respecting
national health expenditure objectives.
Planning and regulation involve negotiations among provider representatives (hospitals and
health professionals), the state, represented by both the Ministry in charge of Health and the
Ministry in charge of the Economy and Finances, and SHI. The outcome of these negotiations is
translated into administrative decrees and laws passed by the parliament. These include public
health acts, social security funding acts, and reform acts. In the context of increasing health
35 Health System in France 829
care expenditure and the increasing SHI deficit, the role of the state in planning and regulation
has increased over the past two decades. The responsibility for capacity planning is shared by the
central and regional levels. At the regional level, the ARSs coordinate ambulatory and hospital
care and health and social care for the elderly and disabled through a regional strategic health
plan (plan stratégique régional de santé; PRS) based on population needs. Each sector’s planning
process must comply with the PRS which, starting in 2010, represents the first attempt at regional
planning of the ambulatory care sector.
Providers are paid by SHI (or directly by patients who are later reimbursed). The statutory
tariffs are set through negotiations between providers and SHI and are approved by the Ministry
in charge of Health. Quality of care is regulated at the national level. Hospitals must undergo a
certification process every four years, but there is no formal re-certification or re-licensing
process for health professionals. However, doctors, pharmacists, dentists, and midwives are
required to follow lifelong learning activities through continuing professional development.
The role of patients in regulation and planning has slowly increased in recent years, although
their participation remains marginal. The 2009 HPST law created the Regional conference on
health and autonomy (Conférence régionale de la santé et de l’autonomie; CRSA) through
which patients and their representatives may participate in defining public health priorities at
the regional level, including development of the PRS. Patient input is stronger at the services
level.
Health information systems and technologies have been developed to help in planning and
regulation. The SHI inter-schemes system (système national d’information interrégimes de
l’assurance maladie; SNIIR-AM) was established in 2003. It encompasses information on patient
health care consumption for which a claim has been sent to SHI, regardless of the type of care
(hospital inpatient stays, self-employed doctor visits, drugs, etc.) as long as it is covered by
SHI. This system has been facilitated by the development of electronic billing, which has been
implemented in the ambulatory sector since the mid-1990s via an individual health insurance
electronic card (carte Vitale) on the patient side and an electronic identification card for health
workers (carte de professionnel de santé; CPS) on the provider side.
Additionally, in order to improve quality of care and decrease redundancy in consumption,
the development of an electronic patient record (dossier médical personnel; DMP) to group
medical information and care consumption in ambulatory and hospital settings for patients on a
voluntary basis was initiated in 2004. Implementation has not been smooth due to both technical
and patient privacy concerns. However, by June 2013, nearly 350,000 patients had DMPs, which
are now used by 4800 health professionals in the ambulatory sector and 350 institutions in the
hospital sector.

Financing

Financial responsibility for health care in France is mainly borne by SHI. However, SHI only funds
around three-quarters of health spending, leaving considerable scope for complementary sources of
funding, such as private voluntary health insurance (VHI). Moreover, long-term care for the
elderly and disabled is financed differently. It is partly financed by a dedicated fund created in
2004, the National Solidarity Fund for Autonomy (Caisse nationale de solidarité pour l’autonomie;
CNSA). Its resources come from SHI and the “solidarity and autonomy contribution” that is
generated from the revenue of an unpaid working/solidarity day (journée de solidarité)
contributed by the French working population. Local authorities and households also participate in
financing these categories of care.
SHI resources mainly come from an earmarked tax called the “general social contribution”
(contribution sociale généralisée) based on total income and not only on earned income as was
previously the case. Additional revenue accounts for around 13% and comes from specific taxes
such as “sin” taxes or taxes on the pharmaceutical companies’ turnover. Funds are pooled at the
national level, and there is no formal allocation mechanism in France.
830 K. Chevreul and K. B. Brigham
SHI coverage is established according to resident status, and entitlement is based on employment,
unemployment, student, or retiree status. Since the introduction of universal medical coverage
(couverture maladie universelle; CMU) in 2000, the state has covered the health care costs of
residents not otherwise eligible for SHI. Illegal residents who have applied for residency are
covered by a special program (aide médicale de l’état; AME).
SHI covers a broad range of services and goods that are provided in hospital or defined in
positive lists for outpatient care. In Europe, the level of coverage is considered quite generous,
offering rapid access to the latest innovations. The rate of coverage varies across goods and
services; for example, the co-insurance rate is 30% for physician and dentist care, 40% for
ancillary services and laboratory tests, and 20% for hospitalization. For most drugs, co-insurance
amounts to either 35% or 70% but ranges from 0% for nonsubstitutable or expensive
drugs to 85% for “convenience medications.” However, there are several conditions for which
patients are exempted from co-insurance, such as chronic conditions covered under the ALD
scheme (affections de longue durée) or pregnancy after the fifth month. Co-insurance
amounts are generally covered by VHI, which provides reimbursement for co-payments and
better coverage for medical goods and services that are poorly covered. However, deductibles
introduced after 2004 with the aim of improving coordination of care and reducing patient
consumption cannot be covered by VHI or else the insuring entity will be subject to financial
penalties.
Over recent decades, VHI has gained an important role in ensuring equity of access and financing
of health care. It covers 88% of the population on a private basis. Since 2000, in order to ensure
that the measures increasing patients’ co-insurance would not result in increased social inequities
in access, public complementary insurance (couverture maladie universelle complémentaire,
CMU-C) has been offered on a voluntary basis to lower socioeconomic groups and covers 6% of the
population.
SHI pays for hospital acute care by means of a DRG-type payment method (tarification à
l’activité; T2A). In addition to the 20% co-insurance amount, a hospital catering flat
fee amounting to €18 per day is the responsibility of patients or their VHI. Self-employed
professionals are paid on a fee-for-service basis and patients are reimbursed based on official
tariffs. However, certain self-employed doctors are allowed to practice extra-billing, which
impairs the equity of access objective of the system. Financial incentives to improve the quality
and efficiency of doctors’ practices and to decrease the level of extra-billing exist. Individual
contracts with general practitioners that include pay-for-performance targets were initially
implemented in 2009 and extended to specialists in 2012. From 2012, measures designed to rein
in excessive extra-billing include a new voluntary “Access to health care” contract. In exchange
for maintaining their extra-billing fee practices at 2012 levels, doctors benefit from social and
fiscal advantages.
In 2012, total expenditure on health in France was estimated at €243 billion or 12% of gross
domestic product (GDP). Expenditure on personal health care accounted for three-quarters of total
health expenditure (€183.6 billion), representing an average €2806 per person. Of this, 75.5% was
publicly funded, with complementary voluntary health insurance (VHI) financing 13.7% and
households covering 9.6% in out-of-pocket costs. As in other European countries, health
care expenditure has steadily increased. As a result, since the late 1990s, SHI annual expenditure
has been capped by a national ceiling on SHI expenditure (objectif national des dépenses
d’assurance maladie; ONDAM) approved by the parliament. It is split into subtargets that cover
hospital expenditure, social and health care services for the elderly and disabled, and privately
delivered care. While there is no formal allocation mechanism, this has provided SHI with a tool
to allocate health care expenditure between broad sectors. If the health care system is found to
exceed its projected budget by more than 1%, a special parliamentary Alert Committee can ask the
head of the Directorate of Social Security (the watchdog for
all social security branches) to present a financial rescue plan.

Physical and Human Resources

In France, there is a high level of facilities, equipment, and other physical resources. However,
there are strong disparities in geographic distribution, and France is well below the EU average
for MRI units (7 per million population, compared to the EU23 average of 10.3) and CT scanners
(11.8 per million population, compared to 20.4).
There are four main categories of hospitals: regional hospitals, general hospitals, local
hospitals, and psychiatric hospitals. Capital investment is either covered by reimbursements for
service delivery or funded through specific programs. Two nationwide investment plans were
launched in the last decade in order to improve quality and safety standards. The ARSs are
responsible for the control of capital investment and purchases of major medical equipment.
Following the general trend in European countries, the number of full-time acute beds per 1000
inhabitants has steadily declined over the last 20 years. In 2010, it was 6.4, which is above the
EU27 average of 5.3. Reduction in acute care capacity was accompanied by the transformation
of acute beds into rehabilitation and long-term care units and the development of day surgery
and hospitalization at home.
Nurses and nursing aides form the largest group of professionals, accounting for approximately
half of the health care workforce. Registered health professionals also include medical
professionals (physicians, dentists, and midwives), pharmacists, professionals involved in
rehabilitation (physiotherapists, speech therapists, vision therapists, psychomotor therapists,
occupational therapists, and chiropodists) and technical paramedical professions (hearing aid
specialists, opticians, and radiographers). The other professions usually identified as
contributing to health care include clerical and technical staff working in hospitals, laboratory
technicians, pediatric auxiliaries, dieticians, psychologists, and ambulance drivers.
About 7% of the French population works in the health care sector. The number of practicing doctors
per 1000 population is slightly lower than the EU27 average (3.3 vs. 3.4), although in France the
number includes not only those providing direct patient care, but also managers, educators,
researchers, etc. The number of practicing nurses exceeds the EU27 average (8.5 vs. 7.9), and the
ratio of nurses to physicians is 2.6, just above the EU average. Workforce forecasting and careful
planning of educational capacity is mostly done at the national level through the use of numerus
clausus for medical professionals. It seeks to prevent shortages or oversupply of health
professionals. However, it does not control for the geographical distribution of medical
professionals, as self-employed professionals are free to choose where they practice. In
order to solve the resulting great disparities in the distribution of medical professionals, there
has been increasing transfer of tasks from medical to other professionals such as nurses and
development of incentives for attracting health professionals to under-served areas.

Delivery of Health Services

The delivery of care is shared among private physicians, private profit-making hospitals, private
non-profit-making hospitals, and public hospitals. In addition to the health care sector and the
social sector, there is a so-called “third sector” which provides both care and social services to
elderly and disabled people.

Primary Care

Primary care is mostly delivered in the ambulatory care sector by self-employed professionals who
are paid on a fee-for-service basis by patients who receive partial reimbursement from the SHI
funds (i.e., co-insurance payments apply). Since the late 1990s, GPs have gained a major role in
the coordination of care with the implementation of a semi-gatekeeping system that provides
incentives to people to visit their GP prior to consulting a specialist.
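
As a rough illustration of how the partial reimbursement mentioned above works, the following minimal sketch splits a statutory tariff between SHI and the patient (or the patient's VHI) using the co-insurance rates quoted in the Financing section. The function name, the category labels, and the €25 tariff are hypothetical and chosen only for illustration; the sketch ignores deductibles, extra-billing, the hospital catering flat fee, and drug-specific rates.

```python
from typing import Tuple

# Co-insurance rates quoted in the Financing section: the share of the
# statutory tariff left to the patient or the patient's VHI.
# Category labels are hypothetical keys chosen for this sketch.
COINSURANCE_RATES = {
    "physician": 0.30,
    "dentist": 0.30,
    "ancillary": 0.40,
    "laboratory": 0.40,
    "hospital": 0.20,
}


def split_tariff(tariff_eur: float, category: str, exempt: bool = False) -> Tuple[float, float]:
    """Return (shi_share, patient_share) in euros for one service.

    `exempt` models the exemptions described in the text (for example,
    chronic conditions under the ALD scheme), where co-insurance is waived.
    """
    rate = 0.0 if exempt else COINSURANCE_RATES[category]
    patient_share = round(tariff_eur * rate, 2)
    shi_share = round(tariff_eur - patient_share, 2)
    return shi_share, patient_share


if __name__ == "__main__":
    # Hypothetical EUR 25 consultation tariff, not a figure from the chapter.
    print(split_tariff(25.0, "physician"))
    print(split_tariff(25.0, "physician", exempt=True))
```

Under these assumptions, a 30% co-insurance rate on a €25 tariff leaves €7.50 to the patient or VHI and €17.50 to SHI, while an exempted patient would have the full tariff covered by SHI.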
services and patient education not otherwise covered. Physicians receive additional compensation
for coordinating the care of patients with certain chronic diseases (€40 per patient per year).

goal of offering respite care for families and day care for patients with Alzheimer’s disease and
other dementias.
form of both public and private outpatient and inpatient care.
Adult public mental health care is provided within around 800 geographical areas that cover
theoretically equivalent populations of approximately 60,000 inhabitants aged 16 or more,
called mental health care areas (secteurs de soins de santé mentale; MHC). Care within
each area is coordinated by a hospital (a public hospital in more than 90% of the cases) and
includes a wide range of preventive, diagnostic, and therapeutic services, which are provided in
both inpatient and outpatient settings. In particular, ambulatory care centers (centres
médico-psychologiques; CMP) are present in almost every MHC area; they provide primary
ambulatory mental health care, including home visits, and direct the patients towards appropriate
services. The size and resources of MHC areas are quite heterogeneous.
Public mental health care for children follows a similar territorial organization, with 321 areas
covering an average of 46,000 people aged under 20 years (corresponding to an average of
210,000 inhabitants). These MHC areas for children show even wider geographical inequalities.

Pharmaceutical Care

France is the fourth largest market for pharmaceutical drugs in the world and the second in Europe
after Germany. Drugs are dispensed by self-employed pharmacists, while the price of drugs
is set administratively for all drugs covered by SHI. Pharmacies have a monopoly on the
dispensing of medicines. As a general rule, retail pharmacies must be owned by a qualified
pharmacist or by a group of pharmacists associated in a company; these pharmacists or companies
cannot be proprietors of more than one pharmacy. The number of pharmacies is regulated by a
numerus clausus that takes into account both the size of the population to be served and the
distance involved in getting to the nearest pharmacy. There were about 22,000 retail pharmacies in
2012. Since June 2008, pharmacies have been allowed to sell a limited range of nonprescription
drugs “over the counter” on shelves directly accessible to patients.
A number of measures have been taken to try to improve and limit the prescribing behavior of
physicians as well as the consumption patterns of patients. The promotion of generic drugs,
largely nonexistent until recently owing to the relatively low price of drugs in France, first
occurred in the 1990s. The rate of generic substitution increased to 83% in 2012 from 76% in
2011. The volume of drug consumption has slowed since 2010 due to fewer prescriptions,
the effect of publicity campaigns, including those to reduce antibiotic use, and removal of
certain drugs from the positive list.

Public Health

Public health policy and practice in France have historically been difficult to describe because
they involve numerous actors and sources of funding, and large discrepancies exist between
legislative texts and actual practice, which relies on the initiative of local actors. The 2004
Public Health Act provided a new framework for public health policy, firmly establishing the
responsibility of the state in public health matters and emphasizing the role of the regional level
for organizational issues. The Act also created a quantitative assessment framework for health
policies encompassing public health objectives for 5-year periods that must be monitored on an
annual basis and set 5-year targets for most of the related indicators. In order to meet some of
these goals, several national plans have been established, such as those related to cancer;
violence, addictions, and risky behaviors; environment and health; quality of life of patients
with chronic diseases; and the provision of health care for patients with rare diseases.

Reforms

The main objectives of the reforms to the health care system of the last decade were to contain SHI
expenditures without damaging equity in financial access, to increase geographic equity in access to
care, and to meet the increasing demand for long-term care. Decentralization and a change in the balance of power between the state and SHI were the main instruments used to achieve these objectives.
To contain SHI expenditure, two categories of measures were used. The first, called the “strict accounting cost-containment policy,” primarily focused on decreasing the size of the benefit basket and levels of coverage, resulting in a shift towards VHI coverage. After 2004, several new mechanisms were introduced. A coordinated care pathway was implemented with higher co-insurance for patients consuming care out of this pathway, and new categories of co-payment for patients were created with the introduction of deductibles on some categories of care such as drug packages, doctor and nurse consultations, or patient transportation. Finally, there was stricter control of statutory tariffs, and starting in 2013 economic considerations have been introduced in health technology assessment of innovations.
The second category of measures was called the “medically based cost-containment policy”; it was developed in the 1990s after a long period of strict accounting policies that led to ongoing conflicts between doctors and SHI. Medically based cost-containment focuses on the reduction of financial and equity loss due to medical practice variations and aims to improve medical practice. The main tools used are the implementation of lifelong learning, the development of practice guidelines by national agencies, and the introduction of good practice commitments within professionals’ collective agreements with SHI. At first, coercive measures such as fines for not following continuous education were used to enforce this new policy, but this was slowly abandoned for a move towards the development of incentives, most recently the introduction of payment for performance for individual doctors based on meeting good practice targets. Overall, it appears that the coercive medically based cost-containment policy did not lead to major improvements in collective practice and much is expected from the pay-for-performance approach.
In order to facilitate geographical equity in access to care, the HPST reinforced local planning and simplified regional governance of the health care system by creating the ARSs. In addition to creating the PRS, which should lead to a common approach in planning for the hospital, ambulatory, and health and social care sectors, it made formal legal provisions for the transfer of tasks between professionals. It also linked the regional medical numerus clausus to needs. In order to optimize the distribution of doctors without impairing freedom of settlement, incentives to increase the attractiveness of underrepresented specialties and medically under-served areas are being developed. For instance, wages for hospital doctors will possibly increase in contexts where there is a high need for their specialties, and contracts with medical students and self-employed health professionals with financial incentives to practice in under-served areas will be implemented on a voluntary basis.
The increasing demand for long-term care is a major concern, as the need for public funding in the coming decades is estimated to be three times higher than the expected growth of the population, thereby threatening equity in financing. Since 2005, various financing reform proposals have been debated, ranging from a newly covered risk under the social security system to targeted subsidies for private long-term care insurance. However, to date no reform measure has been enacted.
Assessment
The French health care system has long enjoyed the reputation of being one of the best in the world. It has become synonymous with universal health coverage and a generous supply of health services. This reputation comes in large part from success in meeting its goals of full coverage, access without waiting lists, patient choice, and satisfaction. The combination of a basic universal public health insurance system and voluntary complementary private insurance, which provides reimbursement for co-payments required by the public system as well as coverage for medical goods and services that are poorly covered by the public system, results in low out-of-pocket costs and high medical care utilization. France’s average life
expectancy of over 80 years is in part testament to the strong combination of good health care and good public health policies in France.
Despite these positives, there also are some shortcomings, especially when considering efficiency and socioeconomic inequality in health. Major problems include lack of coordination between hospital and ambulatory services, between private and public provision of care, and between health care and public health. Health expenditures per capita are higher than the OECD average, ranking usually third or fourth after the United States, Germany, and Switzerland, depending on the data used and year. The high level of health expenditure has become increasingly important at a time when the public system is facing chronic deficits, which are likely to increase with the current economic downturn.
Health System in Japan
36
Ryozo Matsuda
Contents
Introduction
Organization and Governance
Stewardship/Governance in Health System
Dimensions of Coverage (Breadth, Scope, Depth)
Typologies of Health System
Regulating and Planning; Actors and Responsibilities
Financing
Sources and Collection of Revenue
Pooling of Funds and Resource Allocation
Purchasing Process and Paying for Health Services
Health Spending
Physical and Human Resources
Physical Resources
Intermediate Care Facilities
The Health Workforce
Provision of Services: Providers, Services, Access, and Quality
Public Health
Primary Care/Ambulatory Care
Specialized Ambulatory Care/Hospital Care
Pharmaceuticals
Long-Term Care
Mental Health Care
Dental Care
Complementary and Alternative Medicines
Assessment
References
R. Matsuda (*)
Ritsumeikan University, Kyoto, Japan
e-mail: ryozo.matsuda@gmail.com
before the establishment of that principle. To administer the system, the national and local governments have developed complex regulations and incentives over more than half a century.
Following the structure set by the volume, this case study focuses on basic issues of the health system and mostly excludes detailed descriptions of differences between statutory insurances and of innovations in policy making. Its descriptions are also limited to the period up to 2013. In translating from Japanese, words are chosen to be clear for international readers, with reference to previous articles (Ikegami et al. 2011); some English names of insurance plans, acts, and organizations therefore differ from the official translations.
Organization and Governance
Stewardship/Governance in Health System
Health-care policies are developed predominantly by the national government with the involvement of concerned actors, including statutory health insurers, medical professions, and experts. Within the government, the ruling party, the Cabinet, and the Ministry of Health, Labour and Welfare (MHLW), as well as the Ministry of Finance and other ministries, take part in health policy making. Statutory and non-statutory councils and committees of the government, involving concerned actors, usually discuss policy options to build consensus for enacting legislation and ministerial ordinances (Rodwin 2011).
The national government develops and enforces laws and regulations on health and long-term care. Coverage rates and policies are usually decided by bills that must be passed by the National Diet. The Social Security Council within the MHLW develops national strategies on quality and safety, cost control, and payment reforms in health care. The Minister of Health, Labour and Welfare decides which services are covered and their prices, which pharmaceuticals are covered and the rule for setting the price of each pharmaceutical, and other payment rules in the statutory health insurance systems. These decisions usually follow those of the Central Social Insurance Medical Council, which is a major arena for policy debates with representatives from insurers, providers, ministry officials, researchers, and other experts. Meanwhile, technology assessment of pharmaceuticals and medical devices is conducted by the Pharmaceuticals and Medical Devices Agency, a regulatory agency of the government.
Two-tiered local governments implement policies established by the national government as well as develop their own policies. Forty-seven prefectures, at the upper level, develop strategies for health-care development and health promotion, implement regulations on health facilities, monitor activities of providers, and collect data on health and health care. There are more than 1,700 municipalities at the lower level, each of which operates community-based health insurance for residents who are not covered by other statutory insurers, as well as the long-term care insurance. Prefectures and municipalities also implement regulations on clinics and home care providers and hospitals, respectively. Meanwhile, since local governments have broad powers to develop new policies as long as these do not conflict with current law, they occasionally develop innovative policies for collaboration and supplemental measures to decrease cost-sharing for children in their jurisdictions.
Dimensions of Coverage (Breadth, Scope, Depth)
The SHIS covers all residents in Japan except those receiving social assistance (or livelihood protection) and some exceptional cases. In practice, the insurance is operated through the following three types of compulsory insurance: employment-based health insurance (EHI), community-based health insurance (CHI), and health insurance for the elderly (HIE).
The EHI covers employees and their dependents under age 75. It is operated by more than 1,400 societies established at large companies for their employees, by more than 75 mutual aid associations for public servants and other defined
groups, and by the National Health Insurance Association (NHIA) for those working at medium to small companies (Ikegami et al. 2011). Some groups of professionals (e.g., doctors in private practice) are covered by associations that they have purposely established themselves.
Municipalities are in charge of administering both the CHI and the HIE, in different ways. They operate the CHI themselves, while they delegate their responsibilities for the HIE to 47 statutory insurers, also purposely established, at the prefecture level. An HIE insurer is governed by representatives of all municipalities in a prefecture.
The SHIS provides the same national benefit package, which covers hospital care, ambulatory care, mental health care, approved prescription drugs, home care, physiotherapy, and most dental care. Health checks, health education, and counseling are delivered by statutory insurers to those ages 40 and older. Social assistance provides similar coverage. Cancer screenings are delivered by municipalities outside the SHIS.
The co-payment rate is 30% in general but 20% for children under 3 years old and 10% for people ages 70 and over with lower incomes. To mitigate high financial burdens, catastrophic insurance covers most co-payments over a monthly threshold, which varies according to the enrollee’s age and income. Cost-sharing is also reduced for those with low income, disabilities, mental illness, and specified chronic conditions. A part of expenditure on health services and goods can be deducted from taxable income.
Providers are prohibited from charging extra fees in general, although they can charge extra fees for some services specified by the MHLW, including amenity beds, “experimental treatments,” the outpatient services of large multi-specialty hospitals, after-hours services, and hospitalizations of 180 days or more.
Catastrophic coverage stipulates a monthly out-of-pocket threshold, which varies according to enrollee age and income (e.g., 80,100 yen for people under age 75 with an average income); above this threshold, a 1% co-payment rate is applied. Alternatively, the threshold works as a ceiling for low-income people, who did not pay more than 35,400 yen a month in 2013.
Since 2000, the long-term care insurance (LTCI) has covered all residents ages 40 and over. It is compulsory and covers both institutional care and home care. The co-payment rate was 10% in 2013.
Typologies of Health System
The health system in Japan is principally a social insurance-based system, but since the government has been involved in making decisions on some details of the system and more than a third of its funds comes from tax, state involvement is far stronger than in most social insurance-based systems in Western countries (Blank and Burau 2010).
On the one hand, it has been partly based on statutory health insurance: the EHIs are funded by contributions from both employees and employers, and the CHI by contributions from beneficiaries and subsidies from tax. On the other hand, the government has held strong power, particularly over the payment system and payment levels. Although the system is operated by more than 3,000 statutory insurers, financing administration is highly concentrated, with little discretion left to each insurer except on limited issues. Provision of health-care services is based on market mechanisms without gatekeeping, where public and not-for-profit providers compete with each other as well as collaborate.
Lee et al. (2008) describe the system as a hybrid model between social health insurance and national health insurance, where the financing administration of the health system is concentrated in a national entity and the private sector is dominant in health-care provision.
Private voluntary health insurance, historically developed as a supplement to life insurance, appears to play a marginal role (Paris et al. 2010). Traditional plans usually pay a lump sum when insured persons are hospitalized over a defined period and/or diagnosed with cancer or any of other specified chronic diseases. In recent decades, however, the variety of complementary private insurance policies, sold separately from life insurance, has been increasing.
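As a minimal sketch of the cost-sharing arithmetic described under Dimensions of Coverage (co-payment rates of 30%, 20%, and 10%, and the monthly catastrophic threshold), the following Python functions reproduce the figures quoted in the text for 2013. The function and parameter names are hypothetical, rounding down to whole yen is an assumption, and the sketch assumes that the threshold applies to one enrollee's co-payments in a single month. Further age and income bands that the text does not list are not modeled.

```python
def copay_rate_pct(age: int, lower_income: bool = False) -> int:
    """Co-payment rate in percent, using only the bands quoted in the text."""
    if age < 3:
        return 20  # children under 3 years old
    if age >= 70 and lower_income:
        return 10  # people ages 70 and over with lower incomes
    return 30      # general rate


def monthly_out_of_pocket(
    total_cost_yen: int,
    rate_pct: int = 30,
    threshold_yen: int = 80_100,
    rate_above_pct: int = 1,
) -> int:
    """Monthly co-payment in yen after applying the catastrophic threshold.

    Below the threshold the patient pays the ordinary co-payment. Once the
    co-payment reaches the threshold, only `rate_above_pct` percent of the
    remaining cost is charged. Integer division (rounding down to whole yen)
    is an assumption of this sketch, not a rule stated in the text.
    """
    ordinary = total_cost_yen * rate_pct // 100
    if ordinary <= threshold_yen:
        return ordinary
    # Cost of care at which the ordinary co-payment equals the threshold.
    cost_at_threshold = threshold_yen * 100 // rate_pct
    extra = (total_cost_yen - cost_at_threshold) * rate_above_pct // 100
    return threshold_yen + extra


if __name__ == "__main__":
    rate = copay_rate_pct(age=45)
    print(monthly_out_of_pocket(100_000, rate_pct=rate))
    print(monthly_out_of_pocket(1_000_000, rate_pct=rate))
    # Low-income ceiling: nothing is charged above 35,400 yen.
    print(monthly_out_of_pocket(1_000_000, rate_pct=rate,
                                threshold_yen=35_400, rate_above_pct=0))
```

Under these assumptions, care costing 1,000,000 yen in one month leaves an average-income adult with 80,100 yen plus 1% of the cost above 267,000 yen, that is, 87,430 yen, rather than the 300,000 yen that a flat 30% rate would imply.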
Regulating and Planning; Actors and Responsibilities
The national, prefectural, and municipal governments regulate health care and conduct planning activities in various fields within the SHIS, as structured by law. Statutory insurers are responsible for operating themselves within the framework and regulations stipulated by acts and ordinances.
The national government decides which services and pharmaceuticals are covered by the SHIS and the rules for paying for them. It revises the rules every 2 years by building consensus between providers, insurance organizations, and experts in health policy. Once the rules are proclaimed, they are valid countrywide in the SHIS. The government also sets requirements and quality standards for health-care facilities, most of which local governments enforce in their jurisdictions.
The national government is both required and empowered to develop its plans on health promotion and health care. Their objectives include promotion of healthy behavior and environments, higher utilization of personal preventive services, increased efficiency in health-care delivery, and higher utilization of generic drugs (OECD 2009). It also makes guidelines for implementing regulations. It directly supervises the operation of the largest insurer, the NHIA. Seven regional bureaus of health and welfare supervise the operation of the insurance societies, local branches of the NHIA, and the CHI insurers.
Prefectural governments supervise and support the CHI insurers in their jurisdictions in both financial and technical terms. They must develop their own plans on health promotion and health care, and they are usually expected to consider policy and technical guidelines developed by the national government. Each prefecture must develop and publish a health-care plan for its jurisdiction, which must include an assessment of needs, directions for strategic development, and descriptions of providers. The power and capacity of prefectures for implementing the plans have been limited to placing a cap on hospital beds (Hashimoto et al. 2011). Prefectures implement regulations on the quality of hospital services and can develop their own policy measures, including subsidies and regulations, with their own budgets. Prefectures must have public health centers, to which many of the regulatory responsibilities concerned with health care and public health in their jurisdictions are usually delegated from the governor’s office (Tatara and Okamoto 2009).
Municipalities are responsible for operating the CHI and the LTCI, delivering home and welfare services, and promoting health in the population. Large cities that have more autonomy than ordinary municipalities must also establish public health centers.
The public can participate at every level of political decision-making. In the last two decades, critical committees concerned with health care have become more likely to include members who put patients’ interests first.
Financing
Sources and Collection of Revenue
In 2010, 82.1% of total health expenditure was financed through the SHIS and 14.4% by out-of-pocket payments (OECD 2013). The national and local governments paid around a quarter and a ninth of national health spending, respectively. Contributions are collected by each insurer. Each CHI insurer decides its own complex method of calculating premiums for households. Usually it is based on the number of CHI members in the household and the members’ household income. Rates, therefore, vary between municipalities. Each HIE insurer in a prefecture levies premiums on a per-capita and income basis.
The EHI insurers levy premiums on wages. Employers pay half of these premiums for their employees. Premium rates of the EHI societies vary between 3% and 10% of income, whereas rates of the NHIA, which differ between branches, are around 10%.
There are various types of direct and indirect tax at both the national and local levels, politically controlled. By law, the national
and local governments have obligations to pay funds, calculated on actual spending, to the SHIS.
Pooling of Funds and Resource Allocation
Each insurer in the SHIS is expected to be financially healthy. Subsidies from the national and local governments are granted mainly to the CHI insurers and the HIE insurers and, to a lesser extent, the JHIA. There are cross-subsidies from the CHI and the EHI insurers to the HIE insurers, calculated by factoring in the number of enrollees ages 65–74.
Purchasing Process and Paying for Health Services
Providers are paid according to a national payment rule, which combines various kinds of activity-based funding methods: fee-for-service payments, per-diem payments, and per-month payments for chronic outpatient care.
Providers send claims for the CHIs to the Central Federation of National Health Insurance and claims for the EHIs to the Health Insurance Claims Review & Reimbursement Services, a statutory body to manage claims in the SHIS.
Health Spending
Total health expenditure (THE) as a percentage of GDP is similar to the average of OECD countries. It has increased continuously in recent decades. In 2010, 63.3%, 9.1%, 21.4%, 3.0%, and 1.6% of the THE were spent on curative and rehabilitative care, long-term nursing care, medical goods, prevention and public health services, and administration, respectively (OECD 2013). Hospitals, nursing and residential care facilities, and ambulatory care providers accounted for 47.1%, 3.8%, and 27.1% of the THE, respectively. More than 20% of the THE was spent on pharmaceuticals.
Increasing health-care demand, partly due to demographic changes and the introduction of new technologies, is considered a cost driver in the Japanese health-care system (Ikegami and Anderson 2012).
Physical and Human Resources
Physical Resources
Hospitals, clinics, intermediary facilities, long-term care facilities, and other facilities have been developed. The number of hospitals and hospital beds per population is high compared to other OECD countries (Tatara and Okamoto 2009). Health facilities are owned and managed both publicly and privately. Private providers include health facilities owned by physicians as well as medical corporations, which are not-for-profit private legal entities for health-care provision, usually controlled by physicians.
To decrease geographical variations, the national government increased the number of medical courses in the 1960s and 1970s, with a policy that every prefecture should have at least one university with a medical faculty and teaching hospital. Also, since 1956, the government has developed and implemented its Rural Healthcare Plan with subsidies to local governments.
Health facilities need to announce such specialties and/or subspecialties as “internal medicine,” “surgery,” “orthopedics,” and “circulatory medicine.” The nomenclature of specialties and subspecialties that can be announced is regulated by the government. It has not so far included “general practice,” “family practice,” or “primary care.” Whether to make “general practice” or “primary care” recognizable has been discussed recently.
Health facilities can install licensed medical devices with their own resources and, in some cases, with subsidies from the governments. Since there have been no regulations on their diffusion, magnetic resonance imaging (MRI) and computed tomography (CT) scanners have spread widely (Anderson et al. 2005).
employed and get salaries from their employers. Approximately 60% of nurses work at hospitals, and most of the others at clinics. Some nurses operate home nursing service providers, in which case they earn money as owners of providers. The Japanese Nursing Association has developed certification programs (Japanese Nursing Association 2011). Public health nurses, who are supposed to work in the field of public health, and midwives also need licenses to practice. One must take courses for these two professions in addition to qualifying as a nurse. Qualifications for long-term care, including home helper and care worker at caring institutions, exist alongside nursing qualifications.
Other qualified professionals in health care include physical therapists, occupational therapists, radiology technologists, and clinical medical technologists. For alternative medicines, licenses are needed to practice therapeutic massage, acupuncture, moxa cautery, and judo chiropractic treatment (Tatara and Okamoto 2009).
Provision of Services: Providers, Services, Access, and Quality
Public Health
Public health administration has been a part of the general administrative structure of the governments and has been separate from the SHIS. According to legislation by the national government, prefectural governments are responsible for public health and environmental health in their jurisdictions (Tatara and Okamoto 2011). Large cities, designated by ministerial ordinances, have the same responsibility. Those prefectures and cities also have an obligation to establish and operate public health centers and delegate most of their responsibilities and powers on public health to the directors of those centers.
Municipalities delivered almost all personal preventive services, including vaccination, health checks, and cancer screenings, until 2008. Since the 2008 Reform, statutory health insurers deliver health checks and behavioral modification programs, while municipalities continue to deliver other personal preventive services (Matsuda 2008). The aim of current health checks, delivered by the insurers, is not to check general health but to detect possible metabolic syndrome so that insurers can intervene to decrease health-care expenditures. The government established targets for uptake rates of health checks and introduced a financial incentive: insurers that fail to achieve the target have to pay more cross-subsidies to the HIE.
Regarding health promotion, the national government has a national plan and strategies for health promotion, “Health Japan 21,” and municipalities organize health activities for their residents using their local health centers.
Primary Care/Ambulatory Care
Ambulatory care is provided by clinics and hospital outpatient departments. The number of ambulatory patients at medical clinics is 2.5 times that at hospitals (Health Statistics Office, Ministry of Health, Labour and Welfare 2012a).
Since physicians in Japan are trained as specialists and primary care or family medicine has not been established as a specialty in clinical medicine, it is difficult to distinguish primary care physicians, although it is easy to recognize such specialists as ophthalmologists, otolaryngologists, and dermatologists. It has been argued that “general practice” should be a specialty and be included in the nomenclature of specialties (Matsuda 2008).
There is no gatekeeping. Patients are free to choose either clinics or outpatient departments of hospitals when they need medical consultations. Meanwhile, highly specialized hospitals can make extra charges when patients visit them without referral from other providers. Physicians at clinics or outpatient departments deal with first-contact patients, although their performance might not be satisfactory by the standards of trained family physicians.
Specialized Ambulatory Care/Hospital Care
Specialized ambulatory care is provided both at clinics and at outpatient departments of hospitals.
Patients can use this care directly without referral in principle, although they must pay extra charges for direct utilization. There has been a financial incentive for patients to avoid direct utilization of specialist care: hospitals with highly specialized care functions can charge extra fees to patients. Hospitals vary in scale from small hospitals with 20 beds to large hospitals with more than 1,000 beds.
Remuneration for specialist physicians depends on their status, i.e., whether they are employed physicians or owners or executives of health-care organizations, as described above.
The payment method for hospital inpatient care is based on activities but has been gradually changing from payment on a fee-for-service basis to payment on a per-diem basis with case-mix modifications using the Diagnostic Procedure Combination (DPC), a case-mix classification system similar to the Diagnosis-Related Groups (Matsuda et al. 2008; Okamura et al. 2005). However, the payment system with the DPC is unusual because it includes both a DPC component and a fee-for-service component. The former is a per-diem payment that declines as the length of the hospital stay increases and covers services other than such specified services as surgical procedures and the rehabilitation basic charge, which are covered by the fee-for-service component (OECD 2009). A hospital-specific coefficient, by which the DPC rate is multiplied, is determined in consideration of the different scales and functions of hospitals. Hospitals using the DPCs must submit detailed data on their services. In 2012, more than half of beds were paid with the DPC. The government uses the data to analyze hospital behaviors and the impacts of financial incentives on them.
Integration or coordination of care has been emphasized in health and long-term care policy. Particular policies toward integration of care include the development of disease-oriented clinical care pathways (Okamoto et al. 2011).
Pharmaceuticals
Prescribed pharmaceuticals for outpatients and inpatients are covered by the SHIS. In principle, patients bring physicians’ prescriptions to pharmacies in the community, which dispense the prescribed pharmaceuticals to patients. Some pharmacies deal only in prescribed pharmaceuticals in the SHIS; the others sell OTC drugs and other goods in addition to providing prescribed pharmaceuticals. There has been a tradition in Japan that physicians dispense pharmaceuticals themselves at their offices, and the tradition remains: 41% of outpatient prescriptions were still dispensed by physicians in 2008 (OECD 2009).
Patients pay the same proportions of cost-sharing for prescribed drugs as described above. Pharmacists can replace prescribed brand-name drugs with generic drugs unless physicians explicitly prohibit it on their prescriptions. Generic drugs account for 47.9% by quantity and 11.4% in monetary terms of prescribed drugs dispensed at pharmacies.
Long-Term Care
With the mandatory Long-Term Care Insurance, established in 2000, persons with disabilities can use monthly budgets, allocated according to their assessed needs, to purchase long-term care services. Long-term care services are broadly classified into institutionalized care and community care. The government prohibits private companies from operating institutionalized care in the LTCI, although they can do so outside the LTCI. Most providers of institutionalized care, therefore, are not-for-profit organizations. Private for-profit companies can enter the community care market and account for around half of all community care providers (Olivares-Tirado and Tamiya 2013).
Mental Health Care
Psychiatric hospitals, psychiatric departments of general hospitals, and psychiatric clinics provide mental health care covered by the SHIS. In addition to those providers, prefectures have mental health centers, which are mostly funded by tax, to support providers with expertise and develop collaboration between concerned organizations.
Community mental health care has been developed.
Dental Care
Dental care for children as well as adults is covered by the SHIS. Some common services, including orthodontics and expensive artificial teeth, are excluded from the SHIS coverage.
To become a dentist, one must graduate from a dental school and pass the national board examination. Most dentists own and operate their clinics, are paid on a fee-for-service basis, and employ dental hygienists and technicians who work with them.
Complementary and Alternative Medicines
The government issues licenses to massage therapists, acupuncturists, moxa cauterists, and judo chiropractitioners for providing care. The licensed practitioners can provide defined services in the SHIS provided that physicians order them.
Assessment
One difficulty for anyone trying to assess the Japanese health-care system quantitatively is that the system is fragmented and robust system-level data are lacking. The long life expectancy in Japan suggests that the system works at least fairly well even if strong health consciousness and prevalent healthy behaviors are taken into consideration (Ikeda et al. 2011). Looking at parts of the system, however, inefficiency in delivering health care and imbalances between regions have been pointed out. Health expenditure has been fairly well controlled, but its projected increase in the near future jeopardizes the sustainability of the system (OECD 2010).
Access to health care has been good, since patients can in principle choose any provider. However, in some rural areas, patients have difficulty finding physicians, particularly such specialists as obstetricians and pediatricians. Although there have been large differences in health-care resources between prefectures, the reasons for the differences and whether they are inequitable have not been firmly assessed. Furthermore, in an era of increasing poverty, fair and good access to quality health services has encountered new challenges. Those challenges include delinquency in paying premiums to the CHI and cost-related access problems with the current co-insurance rates, particularly in ambulatory care (Matsuda 2016; Murata 2010; OECD 2009).
Quality of care is another area lacking systematic evidence. However, new institutions for hospital certification and policy incentives have been developed since 2000. More and more hospitals publish their clinical indicators, a practice supported by the government. With increasing financial pressures on health-care resources, mechanisms to monitor and regulate quality of care are becoming more important (Hashimoto et al. 2011).
References
Anderson GF, Hussey PS, Frogner BK, Waters HR. Health spending in the United States and the rest of the industrialized world. Health Aff. 2005;24(4):903–14. https://doi.org/10.1377/hlthaff.24.4.903.
Blank RH, Burau V. Comparative health policy. Basingstoke: Palgrave Macmillan; 2010.
Hashimoto H, Ikegami N, Shibuya K, et al. Cost containment and quality of care in Japan: is there a trade-off? Lancet. 2011;378(9797):1174–82. https://doi.org/10.1016/s0140-6736(11)60987-2.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2010 static/dynamic surveys of medical institutions and hospital report. Tokyo: Ministry of Health, Labour and Welfare; 2011.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2011 patient survey. Tokyo: Ministry of Health, Labour and Welfare; 2012a.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2011 static/dynamic surveys of medical institutions and hospital report (in Japanese). Tokyo: Ministry of Health, Labour and Welfare; 2012b.
Ikeda N, Saito E, Kondo N, et al. What has made the population of Japan healthy? Lancet. 2011;378(9796):1094–105.
Ikegami N, Anderson GF. In Japan, all-payer rate setting under tight government control has proved to be an effective approach to containing costs. Health Aff. 2012;31(5):1049–56. https://doi.org/10.1377/hlthaff.2011.1037.
Ikegami N, Yamauchi K, Yamada Y. The long term care insurance law in Japan: impact on institutional care facilities. International Journal of Geriatric Psychiatry. 2003;18(3):217–221.
Ikegami N, Yoo B-K, Hashimoto H, et al. Japanese universal health coverage: evolution, achievements, and challenges. Lancet. 2011;378(9796):1106–15.
Ishizaki T, Kobayashi Y, Tamiya N. The role of geriatric intermediate care facilities in long-term care for the elderly in Japan. Health Policy. 1998;43(2):141–151.
Japanese Nursing Association. Nursing in Japan. Tokyo: Japanese Nursing Association; 2011. Available at: http://www.nurse.or.jp/jna/english/pdf/nursing-in-japan2011.pdf
Lee S-Y, Chun C-B, Lee Y-G, Seo NK. The National Health Insurance system as one type of new typology: the case of South Korea and Taiwan. Health Policy. 2008;85(1):105–13.
Matsuda R. Arguments for instituting “general physicians”. Health Policy Monitor, April 2008. 2008. Available at: http://www.hpm.org/en/Surveys/Ritsumeikan_University_-_Japan/11/Arguments_for_Instituting__General_Physicians_.html
Matsuda R. Public/Private Health Care Delivery in Japan: and Some Gaps in Universal Coverage. Global Social Welfare. 2016;3:201. https://doi.org/10.1007/s40609-016-0073-1
Matsuda S, Ishikawa KB, Kuwabara K, Fujimori K, Fushimi K, Hashimoto H. Development and use of the Japanese case-mix system. Eurohealth. 2008;14(3):25–30.
Matsumoto M, Inoue K, Kajii E, Takeuchi K. Retention of physicians in rural Japan: concerted efforts of the government, prefectures, municipalities and medical schools. Rural Remote Health. 2010;10(2):1432.
Ministry of Health, Labour and Welfare. Pharmaceutical expenditures at dispensing pharmacies, FY2013 (in Japanese). Tokyo: Ministry of Health, Labour and Welfare; 2014.
Miyai I, Sonoda S, Nagai S, Takayama Y, Inoue Y, Kakehi A, Kurihara M, Ishikawa M. Results of New Policies for Inpatient Rehabilitation Coverage in Japan. Neurorehabilitation and Neural Repair. 2011;25(6):540–547.
Murata C, Yamada T, Chen CC, Ojima T, Hirai H, Kondo K. Barriers to Health Care among the Elderly in Japan. International Journal of Environmental Research and Public Health. 2010;7(4):1330–1341.
OECD. Health-care reform in Japan: controlling costs, improving quality and ensuring equity. In: OECD, editor. OECD economic surveys: Japan 2009. OECD Publishing; 2009. https://doi.org/10.1787/eco_surveys-jpn-2009-6-en.
OECD. Value for money in health spending. 2010. https://doi.org/10.1787/9789264088818-en.
OECD. Health data 2013 [database on the Internet]. 2013.
Okamoto E, Miyamoto M, Hara K, et al. Integrated care through disease-oriented clinical care pathways: experience from Japan’s regional health planning initiatives. Int J Integr Care. 2011. Available at: http://www.ijic.org. URN:NBN:NL:UI:10-1-101572.
Okamura S, Kobayashi R, Sakamaki T. Case-mix payment in Japanese medical care. Health Policy. 2005;74(3):282–6.
Olivares-Tirado P, Tamiya N. Trends and Factors in Japan’s Long-Term Care Insurance System: Japan’s 10-year Experience. Dordrecht: Springer; 2013.
Paris V, Devaux M, Wei L. Health systems institutional characteristics. OECD health working papers, No. 50. Paris: OECD; 2010.
Rodwin MA. Conflicts of interest and the future of medicine: the United States, France, and Japan. New York: Oxford University Press; 2011.
Tatara K, Okamoto E. Japan. Health system review. Health Syst Transit. 2009;11(5):1.
Tatara K, Okamoto A. Public health of Japan 2011. Tokyo: Japan Public Health Association; 2011.
Teo A. The current state of medical education in Japan: a system under reform. Med Educ. 2007;41(3):302–8. https://doi.org/10.1111/j.1365-2929.2007.02691.x.
Tsutsui T. Implementation process and challenges for the community-based integrated care system in Japan. International Journal of Integrated Care. 2014;14(1).
Health System in Mexico
37
Julio Frenk and Octavio Gómez-Dantés
Contents
Health Conditions
History of the Mexican Health Care System
Organization and Governance
Organization
Planning and Regulation
Health Information Systems and Technology
Role of Patients
Financing
Coverage and Benefits
Sources of Revenue, Collection, and Pooling
Health Expenditure
Physical and Human Resources
Pharmaceuticals
Delivery of Personal and Public Health Services
Quality of Care
Recent Reforms
Assessment
References
President of the University of Miami and former Minister of Health of Mexico (2000–2006)
Senior researcher, Center for Health Systems Research, National Institute of Public Health, Mexico
J. Frenk (*)
University of Miami, Coral Gables, FL, USA
e-mail: president@miami.edu
O. Gómez-Dantés
National Institute of Public Health, Cuernavaca, MOR, Mexico
e-mail: ocogomez@yahoo.com
Abstract
This chapter discusses the Mexican health system. We first describe the general characteristics of Mexico and the health conditions of the Mexican population, with emphasis on noncommunicable diseases, which are now the main cause of death and disability. The following section is devoted to the description of the basic structure of the system: its history; its main institutions; the population coverage; the health benefits of those affiliated to the different health institutions; its financial sources; the availability of physical, material, and human resources for health; the delivery of personal and public health services; and the stewardship functions displayed by the Ministry of Health and other actors. This part also discusses the role of citizens in the monitoring and evaluation of the health system, as well as the levels of satisfaction with the rendered health services. In part three, the most recent innovations and their impact on the performance of the health system are discussed. Salient among them are the System of Social Protection in Health and the Popular Health Insurance. The chapter concludes with a discussion of the most recent health initiatives and reforms, and a brief analysis of the short- and medium-term challenges faced by the Mexican health system.
Mexico is the largest Spanish-speaking country in the world. It covers 1.9 million km² of land in North America (Central Intelligence Agency). It borders the USA to the north and Guatemala and Belize to the south.
Mexico is an upper middle income country with a GDP of US$ppp 1.788 trillion (2012) and a per capita GDP of US$ppp 15,100.1. Its human development index is 0.775 (2012), above the world average of 0.694 and ranking 61 out of 187 countries (UNDP). Inequality, as measured by the Gini index, is 47.2, higher than all other high human development countries except for Brazil (The World Bank). Its principal source of income is services (61.8%), with industry running second (34.2%) and agriculture representing a small and waning portion (4.1%) (Central Intelligence Agency). Its annual economic growth rate during the period 1990–2010 was 2.8% (The World Bank).
Mexico has a population of 116.2 million (2013 est.) that is witnessing (Central Intelligence Agency; Partida 1999):
• A decline in general mortality explained mostly by a reduction in infant mortality from 79 per 1000 live births in 1970 to 16.2 in 2013 (2013 est.)
• An increase in life expectancy at birth from 49.6 years in 1950 to 79.8 years in women and 74.0 in men in 2013 (2013 est.)
• A reduction in fertility from 6.8 children per woman of reproductive age in 1970 to 2.2 in 2013 (2013 est.)
The rapid decline in fertility is driving an aging process which implies an increasing proportion of older adults in the population structure. Children under 5 will represent less than 10% of the total population in 2050, while older adults will account for over 20% of the total population (Ham-Chande 2012).
Mexico is also going through an accelerated process of urbanization. Eight out of every 10 Mexicans now live in urban areas (Central Intelligence Agency). This is associated with a parallel process of rural population dispersion which increases the problems of access to health care of a population with major health needs (Reyna-Bernal and Hernández-Esquivel 2006).
Health Conditions
The increase in life expectancy and a growing exposure to unhealthy lifestyles in urban dwellings are modifying the main causes of disease, disability, and death. Mexico is going through a health transition characterized by an increasing predominance of noncommunicable diseases (NCD) and injuries. In 1950 around 50% of all deaths in the country were due to common infections, reproductive events, and diseases related to undernutrition (Fig. 1) (Secretaría de Salud 2001). Today, these ailments account for less than 12% of total deaths, while NCDs and injuries are responsible for almost 90% of national mortality (World Health Organization 2012).
The contribution to mortality of the different age groups is also changing. In 1950, half of total deaths were concentrated in children under 5 and only 15% were concentrated in persons 65 years of age and older (Secretaría de Salud 2007). Nowadays, more than 50% of deaths are concentrated in older adults and less than 10% in children under 5 (Zúñiga and García 2008).
Fig. 2 The Mexican health system has a public and private sector providing services to overlapping population groups
own staff, except for Seguro Popular, which buys services for its affiliates from the MoH, SESA, and IMSS-O. The private sector includes facilities and providers offering services mostly on a for-profit basis financed either through insurance premiums or out-of-pocket payments.
Planning and Regulation
The MoH is in charge of most stewardship functions, including strategic planning, policy design, intra- and inter-sectoral coordination, regulation of personal health services, sanitary regulation, and evaluation of policies and programs. The regulation of personal health services includes the accreditation of medical and nursing schools, the certification of health professionals, and the accreditation of health facilities. These activities are carried out in coordination with several professional bodies and NGOs, including the National Academy of Medicine and the National Association of Medical Schools and Faculties. The protection of health service users is the responsibility of the National Commission for Medical Arbitration (CONAMED) (Comisión Nacional de Arbitraje Médico).
Regulation is the responsibility of the Federal Commission for Health Risk Protection (COFEPRIS), charged with assuring food safety, defining environmental standards, promoting occupational health and safety, regulating the pharmaceutical industry, and controlling hazardous substances like alcohol and tobacco (Comisión Federal de Protección contra Riesgos Sanitarios).
The MoH also has an evaluation unit which evaluates the main policies and programs and publishes an annual report on the performance of the Mexican health system and its various components (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México).
Health Information Systems and Technology
Health information is the responsibility of the General Directorate for Health Information based at the MoH (Dirección General de
Información en Salud, Secretaría de Salud, México). In collaboration with other public institutions, this office created the National Health Information System (SINAIS), which generates information on births, deaths, cases of disease, health infrastructure, health services, and financial and human resources (Sistema Nacional de Información en Salud, México). SINAIS has several subsystems including the Epidemiological Surveillance System, the Automated Hospital Discharge System, and the National and State Health Accounts System.
The MoH has an area for the evaluation of medical technology, the National Center for Health Technology Excellence, whose main purpose is to produce and disseminate information on the appropriate selection, incorporation, and use of medical technologies based on evidence of their safety, effectiveness, and efficiency (National Center for Health Technology Excellence).
Role of Patients
Patients in Mexico did not start playing a role in the operation of the Mexican health system until very recently, through the “citizen endorsement groups” created in 2001 as part of a quality program, the “National Crusade for Quality in Health Care.” The purpose of these groups is to train community volunteers to assess the responsiveness of health care facilities (Ruelas 2006). In 2006, there were 1764 active citizen groups that had endorsed over 1100 health units.
Besides these groups, citizens have traditionally played a limited role in the design and operation of health services, programs, and policies. The main exceptions are the HIV/AIDS and women’s health advocacy groups.
Financing
Coverage and Benefits
The Mexican health system is segmented along three broad categories of beneficiaries: (i) salaried workers and retired population, along with their families; (ii) self-employed workers and unemployed population, along with their families; and (iii) the population with the ability to pay.
As mentioned above, salaried workers are the beneficiaries of social security institutions, which in 2010 covered 50.7 million people (Table 1; Crónica). IMSS covered 80% of this population, and the rest was covered by ISSSTE and the social security institutions for oil workers and the armed forces.
The second category (self-employed and unemployed, and their families) was covered until 2003 by services of the MoH, SESA, and IMSS-O. The recently created Seguro Popular was covering 43.5 million individuals in this category by 2010 (Comisión Nacional de Protección Social en Salud; Comisión Nacional de Protección Social en Salud). By the end of 2011, affiliation to Seguro Popular reached 52 million. This means that Mexico is on
track to reach universal health coverage in the near future.
Finally, the third category includes the users of private health services, mostly upper- and middle-class individuals. However, the poor and those affiliated to social security institutions also use them on a regular basis. According to the National Health and Nutrition Survey 2012 (ENSANUT 2012), over 30% of the insured population regularly use private health services, mostly ambulatory care, for which they usually pay out-of-pocket (Instituto Nacional de Salud Pública 2013). The penetration of private insurance is low. Only six million people in Mexico are covered by private health insurance, half of whom are also covered by public insurance (CNNExpansión).
Those affiliated to social security institutions have access to a broad, but not explicitly defined, package of health services that includes ambulatory and hospital care, including high specialty care. Coverage includes drugs as well. Those affiliated to Seguro Popular have access to a comprehensive and explicit package of 270 essential interventions and the respective drugs. They also have access to a package of over 60 high-cost interventions for the treatment of acute neonatal conditions, cancer in children, cervical and breast cancer, and HIV/AIDS, among other diseases. Finally, the uninsured population has access to a limited package of benefits that varies considerably depending on the type of population (urban or rural).
Sources of Revenue, Collection, and Pooling
As shown in Fig. 2, social security institutions are financed with contributions from the government, the employer (which in the case of ISSSTE, PEMEX, SEDENA, and SEMAR is also the government in its role as employer), and the employee. The MoH and the SESA are financed mostly with federal and state government resources coming from general taxation. IMSS-O, which is directed to the rural poor of 17 states, is financed with federal resources but operated by IMSS. Finally, Seguro Popular is financed with federal and state government contributions and family contributions, with total exemption for those families in the bottom 40% of the income distribution.
Private services are financed mostly out-of-pocket. A very small portion of private health expenditure comes from private insurance premiums.
Health Expenditure
Total health expenditure as a percentage of GDP in Mexico in 2010 was 6.3%, well below the OECD average (9.3%) and below the Latin American average (6.8%), but up from 5.1% in 2000 (World Health Organization; Organization for Economic Cooperation and Development; World Health Organization). Health expenditure per capita in that same year was US$ppp 603, up from US$ppp 328 in 2000.
Mexico’s public expenditure on health as a percentage of total health expenditure in 2010 was 49%, up from 46.6% in 2000 but still the third lowest of OECD countries (World Health Organization; Organization for Economic Cooperation and Development).
Private expenditure accounts for 51% of total health expenditure in Mexico, a much larger portion than in the average OECD country (17%) and a larger portion than in Argentina (35.6%), Colombia (25.4%), and Uruguay (34.7%) but lower than in Brazil (53.0%) (World Health Organization; Organization for Economic Cooperation and Development; World Health Organization).
Ninety-two percent of private health expenditure is out-of-pocket (World Health Organization). The remaining 8% corresponds to private insurance premiums (World Health Organization). In Argentina, Brazil, Colombia, and Uruguay, out-of-pocket expenditure accounts for 60%, 57.8%, 67.7%, and 39.6% of total private health expenditure, respectively (World Health Organization). This means that Mexico has the highest level of out-of-pocket expenditure of middle-income countries in Latin America. This exposes households to catastrophic financial events. In 2000, an estimated
three million Mexican families suffered catastrophic or impoverishing health expenditures (Frenk et al. 2006). However, several studies showed that by 2006 this figure had begun to decline due to the implementation of both several programs to combat poverty and Seguro Popular (Knaul et al. 2006, 2011).
Physical and Human Resources
Excluding medical offices of the private sector, the Mexican health system has about 27,000 health units, 3976 of which are hospitals, for a rate of 3.5 hospitals per 100,000 population (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México). Of the total number of hospitals, 1386 (33.6%) are public and 2590 are private (66.4%). Of the total number of public hospitals, 2147 (54%) belong to SESA and MoH and 1829 (44%) to social security institutions.
In 2010 the three main public institutions (MoH/SESA, IMSS, and ISSSTE) had 74,064 hospital beds and 2900 operating rooms for a rate of 6.5 beds per 10,000 population and 2.5 operating rooms per 100,000 population (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México).
Private hospitals have 34,000 hospital beds. Most of them are general hospitals and are concentrated in the largest cities of the country. Most of them have 20 beds or fewer. Some of these units, in fact, can hardly be considered hospitals at all since they have no laboratories, no radiology and imaging services, and no blood banks.
The Mexican health system also has over 20,000 public ambulatory units, most of which belong to SESA (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México 2000).
Regarding high specialty medical equipment and procedures, Mexico has a rate of 3.9 computed tomography units (CTU) and 1.3 radiotherapy units (RTU) per million population, the lowest and second lowest figures for OECD countries, respectively, which on average have 8.2 CTU and 6.9 RTU per million population (World Health Organization 2013; OECD. OECD Health Data 2013).
Regarding human resources, there are 1.96 doctors per 1000 population, below the OECD average (3.0) and other Latin American countries, such as Argentina (3.0) and Uruguay (3.7) (World Health Organization 2013). The scarcity of these resources is particularly acute when it comes to human resources for mental health: in Mexico there are only 0.02 psychiatrists per 1000 population (World Health Organization 2013). The availability of nurses, 2.7 per 1000 population, is also below the OECD average of 8.6 (OECDiLibrary).
Pharmaceuticals
The Mexican market of pharmaceutical products is the 12th largest market in the world and the second largest in Latin America, just below Brazil (Massachusetts Office of International Trade and Investment; Chhabara). Mexico spends 27% of its total expenditure on health on pharmaceuticals, the third highest figure for OECD countries (OECD). About 80% of total expenditure on pharmaceuticals is concentrated in generic drugs, a market that has shown significant growth rates in the past decade.
Around 80% of total expenditure on pharmaceuticals is private and 90% is out-of-pocket, one of the highest figures in the world (Moïse and Docteur 2008). The public sector accounts for 20% of the national expenditure on pharmaceuticals and 35% of its volume. This difference is due to the fact that most of the drugs purchased by public institutions are generics, which are considerably cheaper than patented drugs.
Delivery of Personal and Public Health Services
Health care services in public institutions are provided at social security, MoH, SESA, and IMSS-O facilities. Those in the formal, private sector of the economy receive health services at IMSS clinics and hospitals. Those in the formal, public sector of the economy receive services at ISSSTE,
PEMEX, SEDENA, or SEMAR facilities. Those affiliated to Seguro Popular receive health care at the MoH, SESA, and IMSS-O facilities. The latter institutions also provide services to the uninsured. All these public providers run their health care network with their own personnel.
Private providers offer services through a very heterogeneous network that includes large hospitals offering high-quality but expensive care in a few metropolitan areas and a large number of small hospitals/clinics (general hospitals providing mostly obstetric care) offering services of poor quality.
Social security institutions and Seguro Popular are allowed to hire private providers to supply services for their affiliates when demand surpasses capacity or when there is a lack of personnel, equipment, or other inputs to provide any covered service. In 2012 IMSS contracted out dialysis and hemodialysis services for almost US$ 340 million (Instituto Mexicano del Seguro Social).
Furthermore, as mentioned above, due to problems of access and quality of public services, many individuals affiliated both to social security institutions and Seguro Popular make regular use of private out-patient services, paying out-of-pocket. ENSANUT 2012 indicates that 39% of total out-patient services are offered by private providers.
The use of private hospital services by those affiliated to social security or Seguro Popular is less common for two reasons: the quality of services offered by public providers tends to increase with the level of care, and middle-class and poor households seldom have the resources needed to make use of private hospital facilities. ENSANUT 2012 indicates that only 17% of total hospitalizations in Mexico occur in private facilities, down from 23.9% in 2000 and 20.9% in 2006 (Instituto Nacional de Salud Pública 2013). This trend matches the upward trend in hospitalizations observed in units of the MoH, which increased from 25.9% of total hospitalizations in Mexico in 2000 to 38.3% in 2012, a clear effect of the implementation of Seguro Popular.
Public health services are provided by the MoH to all the population, regardless of its affiliation status to any particular health institution. These services include health promotion, risk control, and disease prevention activities, including vaccination, and epidemiological surveillance.
Quality of Care
Quality has been a concern of the Mexican health system for a long time. A quality assessment conducted at the end of the past century in more than 1900 public health centers and 214 general public hospitals documented problems with waiting times, drug supply, medical equipment, and use of medical records. Historically, public institutions have operated as monopolies with no choice, poor responsiveness to consumer needs, and lack of concern for quality. Furthermore, health care facilities were not subject to a formal accreditation process.
In the past decade two national quality programs were implemented: the National Crusade for Quality in Health Care and Sícalidad. These initiatives were designed to improve standards of personnel and technical quality in service delivery and enhance the capacity of citizens to demand accountability.
A central component of these initiatives was the strengthening of the certification process for public and private health units, which is now coordinated by the National Health Council (NHC), an institution created in 1917 as the highest policymaking body in the sector. This process was reinforced by a provision incorporated into the General Health Law in 2003 requiring the accreditation of all units providing services to Seguro Popular.
Initiatives to monitor and improve the availability of drugs in public institutions were also implemented in the early 2000s. External measurements have shown major improvements in drug availability in all public institutions, especially in ambulatory facilities.
A national system of indicators, Indica, was also put in place to monitor quality of care by state and institution. This monitoring system includes indicators for waiting times for ambulatory and emergency care, waiting times for elective
interventions, and distribution and dispensing of pharmaceuticals, among other indicators.
Several external surveys have measured the levels of satisfaction with health care in Mexico. Regarding overall satisfaction with hospital care, ENSANUT 2012 indicates that 80.6% of health service users consider health care services either “good” or “very good” (Instituto Nacional de Salud Pública 2013). Social security institutions providing services to oil workers and the armed forces show the highest satisfaction levels (97%), followed by private facilities (92%).
Recent Reforms
The creation of the SSPH in 2004 allowed for the expansion of health care coverage for the nonsalaried population while also improving the quality of the available services and the protection against health risks. This system was able to reorganize and increase public funding by a full percentage point of GDP over 8 years in order to provide universal health insurance. The vehicle for achieving this aim was Seguro Popular. By December 2012, 52 million people were enrolled in it (Comisión Nacional de Protección Social en Salud). If we add to these figures those affiliated to social security institutions and those with private health insurance, we can reasonably state that Mexico is on track to achieve universal health coverage.
The reform also contemplated quality-oriented initiatives, including the organization of training programs on quality improvement tools for health professionals; the monitoring of quality indicators through the regular information systems and external satisfaction and responsiveness surveys; and the establishment of compulsory accreditation for all units willing to provide services to those affiliated to Seguro Popular.
Regarding public health, the Mexican reform established a protected fund for community health services targeting health promotion and disease prevention interventions, which allowed, among other things, for a major expansion of the basic immunization scheme; additional public health investments to enhance human security through epidemiological surveillance and improved preparedness to respond to emergencies, natural disasters, and the threats related to globalization, including potential pandemics; and a major reorganization leading to the establishment of a new public health agency (COFEPRIS) charged with protection against health risks.
Another crucial component of the health reform was an external evaluation that used a quasi-experimental design. This community trial, implemented in 2005–2006 in over 38,000 households and taking advantage of the phase-in implementation of the intervention, showed that Seguro Popular was reducing out-of-pocket expenditures and providing protection against catastrophic health expenditures, especially to the poorest households (King et al. 2009). Additional studies also showed improvements in health service utilization and effective coverage of both preventive and curative interventions, including interventions for the main causes of disease, such as diabetes and breast cancer (Lozano et al. 2006; Gakidou et al. 2006).
Assessment
As shown in this chapter, Mexico has made progress in the three main objectives of health systems: improving health conditions, enhancing responsiveness to the legitimate expectations of the population, and providing financial protection (Murray and Frenk 2000). However, the country is facing emerging challenges.
Efforts to control pretransition ailments have yielded significant progress. However, as immunization coverage expanded and deaths due to diarrhea and acute respiratory infections declined, NCDs began to exert increasing pressure on the health of the population and the health system. Salient among these challenges is a critical need for additional public funding to extend access to costly interventions for NCDs, such as cardiovascular diseases, cancer, diabetes and its complications, and mental health problems.
Lozano R, Soliz P, Gakidou E, et al. Benchmarking of performance of Mexican states with effective coverage. Lancet. 2006;368:1729–41.
Massachusetts Office of International Trade and Investment. Mexican pharmaceutical industry. Available at: http://www.moiti.org/pdf/Mexican%20Pharmaceutical%20Industry.pdf. Accessed 15 Oct 2013.
Moïse P, Docteur E. Las políticas de precios y reembolsos farmacéuticos en México, OCDE, 2007. Salud Publica Mex. 2008;50(suplemento 4):s504–10.
Murray CJL, Frenk J. A framework for assessing the performance of health systems. Bull WHO. 2000;78(6):717–31.
National Center for Health Technology Excellence. Mission. Available at: http://www.cenetec.salud.gob.mx/descargas/folletoingles.pdf. Accessed 15 Oct 2013.
OECD. OECD Health Data 2013. How does Mexico compare. Available at: http://www.oecd.org/els/health-systems/Briefing-Note-MEXICO-2013.pdf. Accessed 16 Oct 2013.
OECD. OECDiLibrary. Pharmaceutical expenditure. Available at: http://www.oecd-ilibrary.org/social-issues-migration-health/pharmaceutical-expenditure_pharmexp-table-en. Accessed 17 Oct 2013.
OECDiLibrary. Health: key tables from OECD. Practising nurses. Available at: http://www.oecd-ilibrary.org/social-issues-migration-health/practising-nurses_nursepract-table-en. Accessed 16 Oct 2013.
Organization for Economic Cooperation and Development. OECD StatExtracts. Available at: http://stats.oecd.org/index.aspx?DataSetCode=HEALTH_STAT. Accessed 15 Oct 2013.
Partida V. Veinticinco años de transición epidemiológica en México. In: CONAPO. La situación demográfica de México 1999. Mexico City: CONAPO; 1999.
Reyna-Bernal A, Hernández-Esquivel JC. Poblamiento, desarrollo rural y medio ambiente. Retos y prioridades de la política de población. In: CONAPO. La situación demográfica de México 2006. Mexico City: CONAPO; 2006.
Ruelas E. Citizen’s quality councils: an innovative mechanism for monitoring and providing social endorsement of healthcare providers’ performance. Healthcare Papers. 2006;6(3):33–7.
Secretaría de Salud. Programa Nacional de Salud 2001–2006. La democratización de la salud en México. Hacia un sistema universal de salud. Mexico City: Secretaría de Salud; 2001. p. 33.
Secretaría de Salud. Programa Nacional de Salud 2007–2012. Mexico City: Secretaría de Salud; 2007.
Sistema Nacional de Información en Salud, México. Información por temas. Available at: http://sinais.salud.gob.mx/estadisticasportema.html. Accessed 15 Oct 2013.
Soberón G. El cambio estructural en la salud. Salud Publica Mex. 1987;29(2):127–40.
The World Bank. Gini index. Available at: http://data.worldbank.org/indicator/SI.POV.GINI. Accessed 14 Oct 2013.
The World Bank. Data. GDP growth (annual %). Available at: http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG. Accessed 14 Oct 2013.
UNDP. International human development indicators. Mexico. Available at: http://hdrstats.undp.org/en/countries/profiles/MEX.html. Accessed 14 Oct 2013.
World Health Organization. World health report 2000. Health systems: improving performance. Geneva: WHO; 2000.
World Health Organization. Non-communicable diseases. Country profiles 2011. Geneva: WHO; 2012. p. 124.
World Health Organization. World Health Statistics 2013. Geneva: WHO; 2013.
World Health Organization. National health accounts. Mexico. Available at: http://apps.who.int/nha/database/StandardReport.aspx?ID=REP_WEB_MINI_TEMPLATE_WEB_VERSION&COUNTRYKEY=84027. Accessed 15 Oct 2013.
World Health Organization. National health accounts. Available at: http://www.who.int/nha/en. Accessed 15 Oct 2013.
Ham-Chande R. Diagnóstico socio-demográfico del envejecimiento en México. In: Consejo Nacional de Población, México. Mexico City: CONAPO; 2012a. p. 141–55.
Further Reading
Three publications by the same authors were particularly useful for the development of this chapter:
Frenk J, Gómez-Dantés O. Para entender el sistema de salud. Mexico City: Nostra Editores; 2008.
Gómez-Dantés O. Mexico. In: Johnson JA, Stoskopf CH, editors. Comparative health systems. Global perspectives. Boston: Jones and Bartlett Publishers; 2009. p. 337–47.
Gómez-Dantés O, Sesma S, Becerril V, et al. The health system of Mexico. Salud Publica Mex. 2011;53(suppl 2):S220–32.
Health System in the Netherlands
38
Madelon Kroneman and Willemijn Schäfer
Contents
Introduction
Organization and Governance
Organization of the System
Planning and Regulation
The Role of Patients and the Population
Financing
Dimensions of Coverage of Curative Care
Long-Term Care
Pooling of Funds
Purchasing Process
Health Spending and Cost Control
Physical and Human Resources
Physical Resources: Hospitals
Paying the Hospital
Medical Specialists
General Practitioners
Pharmacists
Nurses
Other Information on Health-Care Personnel
Delivery of Health Services
Public Health
Primary/Ambulatory Care
Specialized Ambulatory Care/Inpatient Care
Pharmaceutical Care
Long-Term Care
Mental Health Care
Dental Care
Out-of-Hour and Emergency Care
death. The burden of disease is higher among immigrants than among native Dutch inhabitants. Important risk factors affecting the health of the Dutch population are smoking and overweight. Between 2000 and 2010, the average percentage of regular daily smokers was slightly below the EU average (World Health Organisation 2013). According to self-reported data, almost half of the population is overweight (Statistics Netherlands 2013).
Organization and Governance
Organization of the System
A lengthy process of policy efforts to reform the health-care system and to introduce managed competition into the system finally resulted in the new Health Insurance Act (Zorgverzekeringswet) in 2006. With the introduction of a single compulsory health insurance scheme, the former dual system of public and private insurance for curative care was abandoned. Managed competition for providers and insurers became a major driver in the health-care system. The new system introduced new roles for patients, health insurers, health-care providers, and the government. Three markets exist: the health insurance market, the health provision market, and the health purchasing market. Within the health-care purchasing market, insurers have to negotiate with providers on price, quality, and volume of care. In the health-care provision market, patients can choose the provider they prefer. In the health insurance market, citizens can join the health insurance policy that best fits their needs and requirements. The system of managed competition is currently in place for the curative health-care sector and part of the mental health-care sector (ambulatory mental care and institutional mental health care up to 1 year). Since 2006, the role of the national government has changed from directly steering the system to safeguarding the proper functioning of the health-care markets. With the introduction of market mechanisms in the health-care sector and the privatization of former public health insurance funds (sickness funds), the Dutch system represents an innovative and unique variant of a social health insurance system.
The Dutch population aged 18 years and older is obliged to take out health-care insurance for the basic health-care package. Children under the age of 18 are included in the policy of one of their parents, and their premium is paid by the government. Health insurers are obliged to accept applicants without restrictions. Differentiation of premiums for different risk conditions (such as age, sex, and chronic diseases) is not allowed. Health insurers are free to set community-rated premiums and to contract health-care providers, under the condition that they have to operate within the national health-care budget set by the government and that they have to contract sufficient providers to ensure good access to care for their insured population. Health insurers are compensated for their insured with high risk for health-care costs via a risk adjustment scheme. In addition to the basic insurance package, health insurers offer voluntary complementary insurance for care that is not covered by the Health Insurance Act. For instance, (partial) coverage of glasses or dental care is often part of the voluntary insurance.
General practice plays a central role in the Dutch health-care system. All citizens are listed with a general practitioner (GP) or GP practice. GPs serve as gatekeepers: patients have to visit their GPs first for their health complaints, and only upon referral can they go to a medical specialist. About 96% of all contacts are dealt with within primary care (Cardol et al. 2004). An important prerequisite is that GP care in the Netherlands is freely accessible and exempted from the compulsory deductible which is currently in place for other forms of care.
Long-term care (nursing care and long-term mental care) is regulated by the Exceptional Medical Expenses Act (AWBZ). This Act was intended originally (1968) to provide care for those with chronic conditions requiring continuous care that involves considerable financial consequences. Since the introduction of the Act, many types of care have been added, resulting in a rapid growth in expenditure in such a way that
the affordability was at risk and thus the call for reform became urgent. During the past decade, social support for disabled and chronically ill people and several forms of home care were already transferred to municipalities. In 2015 the long-term care in the Netherlands was completely reformed. Home nursing care for people who require 24-hour supervision per day is now regulated by the Long-term Care Act (Wet langdurige zorg). People in need of care who live at home receive care through the Social Support Act (Wet maatschappelijke ondersteuning), which is the responsibility of municipalities. Home nursing care became part of the Health Insurance Act and is now the responsibility of health insurers.
Planning and Regulation
The role of the Dutch government is steering from a distance. It defines the framework in which health care can be developed. Responsibilities have been transferred to insurers, providers, and patients, and the government only supervises quality, accessibility, and affordability of health care. The establishment of new supervisory agencies in the health sector aims to avoid undesired market effects in the new system. Traditionally, self-regulation has been an important characteristic of the Dutch health-care system. Professional associations are responsible for reregistration schemes and are involved in quality improvement, for instance, by developing professional guidelines.
Responsibilities of the National Government
The government should ensure that managed competition results in safe, accessible, and affordable health care of good quality. Only a few instruments have been left to the government to directly interfere in the health-care system. An essential competence of the government is setting the budget for health-care expenditures. Other important competences of the central government are taking decisions on the content of the basic health insurance package and on cost-sharing. Furthermore, in order to prevent preferred risk selection, the government sets the rules for risk adjustment among health insurers. In the care sector, the central government has a number of explicit responsibilities. These include creating the preconditions for quality, accessibility, safety, and affordability of the care for people with chronic conditions; strengthening the position of citizens, in particular patients and their representatives; and stimulating innovation. To meet these responsibilities, the government has supervisory and advisory bodies in place. Furthermore, at national level, there is legislation which describes the conditions in which the markets have to operate.
Supervisory Bodies
Independent supervisory bodies take care of safeguarding accessibility, affordability, and quality of care:
• The Dutch Healthcare Authority (NZa) supervises the compliance of actors with the Health Insurance Act (Zvw) and the Health Care Market Regulation Act (Wmg). The NZa intervenes with restrictions or obligations when an actor (a health insurer, health-care provider, or consumer), together or alone, hinders fair competition in (part of) the health-care market. Furthermore, the NZa establishes tariffs and performance directions for those health services that are not subject to free negotiations. Lastly, the NZa monitors health-care markets and promotes their transparent and fair operation. In addition, the NZa imposes on tariffs for health services that are not freely negotiable and on extending the share of freely negotiable services.
• The National Healthcare Institute (Zorginstituut Nederland) advises the Ministry of Health, Welfare and Sport on the content of the basic health insurance package. Furthermore, it supplies information to insurers (but also consumers and providers) on the nature, content, and scope of the basic health insurance. The Healthcare Institute also administers the Health Insurance Fund and operates the risk adjustment scheme.
• The Health and Youth Care Inspectorate (IGJ) supervises the health-care providers in the areas of quality and safety.
The Role of Patients and the Population
Within the Dutch health-care system, the population is free to choose a health insurer. The idea is that people will choose those insurers with the best price/quality performance. In practice, the main reason for people to switch is the level of the premium for the basic insurance package, and competition on quality of care seems to be absent (see www.hspm.org). Patients are expected to choose providers based on quality (for instance, through providers selected by their health insurer and/or by comparing providers on quality on the website www.kiesbeter.nl). In practice, patients follow the recommendation of their GP in choosing a health-care provider (Dautzenberg et al. 2012; Reitsma et al. 2012).
Financing
Dimensions of Coverage of Curative Care
Basic health insurance is obligatory for all Dutch residents. Children under the age of 18 are insured free of charge but have to be included in one of the parents’ policies. The nominal premium for children is paid by the government. For persons aged 18 or over, there is a compulsory deductible of €385 in 2018. Excluded from this deductible are GP care, maternity care, and dental care under the age of 18. In addition to the compulsory deductible, people can opt for a voluntary deductible. This voluntary deductible may range from €100 to a maximum of €500 in exchange for a reduction on the premium.
The basic health insurance covers all curative (somatic and mental) health care that is considered essential, effective, cost-effective, and unaffordable for individuals. “Essential” refers to its capacity to prevent loss of quality of life or to treat life-threatening conditions. The affordability criterion states that no services need to be included that are affordable for individual citizens and for which they can take responsibility (Brouwer 2004). The content of the benefit package is defined by the government and covers more or less all primary and secondary curative care. Excluded are dental care for persons older than 18 years of age and some elective procedures such as plastic surgery without medical indication and, since 2013, simple walking aids. Partly covered are, for instance, allied health care, some medicines, and in vitro fertilization.
Citizens pay for their health insurance through a community-rated premium and an income-dependent contribution. For 2013, the community-rated premium varied from €92 to €112 per month. Health insurers are free to set the premium level. The insured persons pay these premiums directly to their health insurer. For children below the age of 18, the government covers the premium through a contribution into the Health Insurance Fund. Insurers are not allowed to differentiate the premium of one specific policy for the basic benefit package for different groups of people. There is one exception: insurers may offer collective contracts. Collective contracts are established between groups of insured (e.g., a company with employees) and the insurance company. Insurance companies are allowed to offer a reduction of at most 10% on the individual premium. Insured people are free to join a collective policy or buy an individual policy. The system of collective policies was established to give the insured more influence (“voice”) on the insurance companies. The threat of the loss of a large number of insured persons may persuade insurers to satisfy the collectivity and compete on price and quality of care. In addition, successful negotiations may lead to more demand-driven care and care that is tailored to the needs of the target group of the collective. In 2012, 67% of the insured persons participated in a collective insurance policy.
The income-dependent contribution is collected by the Tax Office, which levies the contribution from salary together with payroll taxes. After collecting all the contributions, the Tax Office transfers the money to the Health Insurance Fund (Zorgverzekeringsfonds), where the money is allocated after risk adjustment to the health insurers.
To ensure access to basic health insurance under a system with flat rate premiums and to
compensate for undesired income effects for lower-income groups, a “health-care allowance” funded from general tax was created. In 2011, six out of ten households received a health-care allowance averaging €85 per month. People with chronic diseases or a handicap received a compensation of €99 per year for the compulsory deductible in 2013.
Voluntary Health Insurance (VHI)
Most insurance companies offer voluntary packages in combination with the basic benefit basket. In 2012, 88% of the insured took out complementary VHI (Ten Hove et al. 2012). VHI covers care that is not included in the basic package, for instance, dental care, glasses, or physical therapy (for persons without a chronic indication). In addition, some co-payments may be covered, for instance, for ambulatory mental care. Contrary to basic health insurance, health insurers are free to set premium levels and may apply preferred risk selection for complementary VHI based on medical criteria or other risk factors. Insurers are obliged to offer VHI independently of the basic health insurance, but some insurers discourage taking VHI without a basic insurance by increasing the premium or by stating that VHI can only be taken when a basic insurance is taken at the same insurer (Roos and Schut 2009).
Long-Term Care
Exceptional Medical Expenses Act (AWBZ)
Long-term care is insured under the Long-term Care Act (Wlz). This is a social health insurance scheme that is intended to provide care for those with chronic conditions (physical and/or mental) requiring 24-hour supervision (either physically, mentally, or medically) per day. Everyone who is legally residing in the Netherlands or pays payroll tax in the Netherlands is compulsorily insured. At present (2018), long-term care at home is provided by municipalities under the Social Support Act (Wmo). Home nursing is provided by health insurers under the Health Insurance Act. Under certain conditions, people can receive a personal budget to buy the care they need.
To cover the expenses for the Wlz, a contribution of 9.65% is levied on the salary of the citizens, with a maximum of €3,280 per year (2018). The revenues are collected by the Tax Office and transferred to the Long-term Care Fund, administered by the Netherlands Healthcare Institute. The expenses for the Social Support Act are covered by general taxes and are transferred to the municipalities through the municipality fund. The budget is not earmarked.
Pooling of Funds
In the Netherlands, administering and providing basic health insurance are delegated to private health insurers. These insurers are funded by the nominal premium directly received from clients and a contribution from the Health Insurance Fund, which pools the income-dependent employer contributions (collected by the Tax Office) and the state contribution (e.g., to cover children under 18). The allocation among the health insurers is based on the health risk profile of their insured population. The government sets the level of the income-dependent contribution, with the notion that, at national level, the total income-dependent contributions for adults should amount to approximately 50% of the total funding of basic health insurance, while the nominal premiums should account for the other 50%.
Purchasing Process
Health insurers buy health care for their insured population (possibly by selective contracting). They negotiate contracts with hospitals (on volume and quality but also lump sum) and with committees that represent GPs. In practice, the negotiations with GPs are hardly about tariffs but more about activities aimed at increasing GP care and shifting care from secondary to primary care (modernization and innovation activities).
Apart from these direct costs, indirect costs such as education, research, and overhead are also included. The reimbursement for each DBC is not influenced by a longer or shorter hospital stay or an atypical number of diagnostic procedures for a certain patient.
Since the introduction of the DBC system, there have been two segments: the freely negotiable segment and the regulated segment. To ease the transition to the new system, in which health insurers and hospitals had to negotiate prices for the DBCs, only a small part (10% in 2005) was initially freely negotiable, and the prices for the regulated part were based on the former system of paying the hospital. Gradually the freely negotiable part increased. In 2012, the former system of paying the hospital was abolished, with a transition model for the years 2012 and 2013. Now there is a freely negotiable part (about 70% of the DBC turnover) in which hospitals and insurers are free to set prices and a regulated part for which the Dutch Healthcare Authority (one of the supervisory organizations) establishes maximum prices. In practice, some insurers do not negotiate prices for the DBCs but negotiate a lump sum amount with the hospitals.
Because compensation for investments is included in the tariffs (since 2008 for hospitals and since 2009 for long-term care institutions), health institutions are fully responsible for the realization of their (re)constructions and the purchase of equipment. No external approval of building plans applies, although the quality of premises is externally assessed every 5 years.
Medical Specialists
Medical specialists are either independent professionals organized in partnerships working in a hospital (55% in 2010) (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] & DBC-onderhoud 2012) or in salaried service of a hospital. Since 2008, medical specialists are paid through the DBC system. The independent partnerships have to negotiate their tariffs with the hospital they work in.
General Practitioners
In 2012, there were 8,879 GPs (53 per 100,000 inhabitants), 43% of whom were female. GPs work in independent practices, either alone (26%) or with two or more other GPs (74%). Patients are listed with a GP practice. About 11% of the GPs work in salaried service for other GPs; the majority of these salaried GPs are female (87%) (Van Hassel and Kenens 2013).
GPs receive a capitation fee per patient per year. For older patients and patients from deprived areas, a higher fee is applicable, but this is only paid if there is an agreement with the health insurer (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] 2011). Per patient contact the GP receives a fee, differentiated by practice consultations, home visits, telephone consultations, and prescription refills. Practice nurses take part in the routine care in general practice for persons with chronic conditions such as diabetes, hypertension, and COPD/asthma. Fees for practice nurses are freely negotiable or are part of integrated care agreements. Integrated care agreements are financed via bundled payments. Integrated care addresses the care for patients with the following chronic conditions: diabetes type II and COPD, and for persons at high risk for cardiovascular diseases. According to the system of bundled payments, a care group organizes all care that is necessary for managing these diseases. Care groups are owned by GPs in a certain region; they vary in size from 4 to 150 GPs. The care group coordinates the care and pays the different care providers who are involved in the care. Patients are free to participate in integrated care or to organize the necessary care themselves. Besides the abovementioned payment methods, GPs may negotiate with insurers for the financing of activities for improvement of efficiency or substitution of care. These activities are only reimbursed if this is negotiated in a contract with the health insurer.
Out-of-hour services for GP care are mostly provided by GP out-of-hour cooperatives. GPs who participate in this system receive a per hour compensation. The majority of GPs participate in a GP out-of-hour cooperative (approximately 97% in 2013).
38 Health System in the Netherlands 869
Oral health care is provided in primary care by private dentists and dental hygienists. Most citizens register with a dentist. Most dentists work in small independent practices (about 70%). Dental hygienists are specialized in preventive care and can be visited directly or upon referral from the dentist. Preventive tasks and relatively simple dental care are increasingly being shifted to dental hygienists. Nine out of ten dentists regularly refer to a dental hygienist, either in their own practice, in the practice of a colleague, or in an independent dental hygienist practice.

Out-of-Hour and Emergency Care

Patients with non-life-threatening conditions go to the special GP cooperatives for out-of-hour care. For life-threatening conditions or upon referral by the GP in the GP post, patients can go to the 24-h emergency department of the hospital.

Informal Care

The estimates of the number of people who provide informal care vary from approximately 1.7 million people (Oudijk et al. 2010) to 3.7 million (Houben-van Herten and Te Riele 2011). Informal carers (60% women, about half aged 45–65 years) provided care (emotional support, household work, accompanying during visits to family) to ill or disabled people, mostly to parents (40%) or spouses (18%). It is the policy of the government to stimulate informal care, in order to keep healthcare affordable.

The main reform in the Dutch health-care system took place in 2006. The dual system in which two thirds of the population (earning an income below a certain threshold) was insured publicly and one third privately was abolished. Since 2006, there has been one insurance system for all citizens, with a community-rated premium that cannot be differentiated by risk group. Insurers are obliged to accept citizens who apply for a health insurance policy. Together with this reform, the financing system changed. Although some aspects of market forces were already incorporated into the system before the reform, market mechanisms were officially introduced into the system in 2006. This imposed a new role, especially on health insurers and health-care providers. They had to learn to negotiate on price, volume, and quality. To ensure a smooth transition, in the first years, only a small part of the provided care was freely negotiable. This share increased over the years, and in 2012, about 70% of the hospital care expenditure was freely negotiable, with the remaining 30% being regulated and covering care that is too difficult or not suitable for free market negotiations, such as intensive care in hospitals. The Dutch Healthcare Authority defines the care activities that are subject to remuneration. The prices for these activities in the free segment can be negotiated by the market parties, although for some issues, maximum prices are set. For instance, for the remuneration of independent medical specialists, a maximum hourly tariff is set. Selective contracting by insurers is allowed, as long as insurers can assure sufficient care for their clients. However, until recently, none of the large insurers opted for selective contracting. There has been one
872 M. Kroneman and W. Schäfer
attempt by a large insurer to refrain from contracting a large hospital in the Dutch capital in 2012, which got a lot of attention in the Dutch newspapers. The hospital finally agreed to the lower budget and thus can still provide care to its patients.
Another important reform concerns the Exceptional Medical Expenses Act (AWBZ). This Act regulated long-term care in the Netherlands up to 2015. However, over the years, the act encompassed more and more care activities, leading to a strong increase in expenditure. The main target of the reform is to reduce the care insured under the act to the care it was initially meant for: care that is unaffordable for individual citizens and their insurers. This is, for instance, care in a medical home for the elderly. The following care was transferred from the AWBZ to other acts. Home help and social support became a responsibility of municipalities under the Social Support Act (WMO). Curative mental care was transferred to the Health Insurance Act and became part of the basic insurance package (for the first three years of care). Youth care was transferred to municipalities under the Youth Act. Home nursing care was transferred to the Health Insurance Act. The most important consequence of this choice is that under the Health Insurance Act, citizens have a right to certain care, whereas under the municipalities, the emphasis will be on individual responsibility. Municipalities have the obligation to compensate citizens in such a way that they can participate in society. The individual circumstances of the citizen may be taken into account. This is called the compensation principle: tailor-made measures instead of rules. The reform came with a major reduction in the budget, since municipalities were considered to be closer to their citizens and thus better able to organize the care efficiently.

Assessing the Health System

Some Indicators of Health and Health Care in the Netherlands

The Dutch government stipulated in the explanatory note accompanying the health-care budget in 2013 that essential care of good quality should be available and affordable for all citizens. The increasing demand for care and increasing costs as a result of technological and demographic developments may result in fundamental changes in health care. People are encouraged to stay at home as long as possible, with the aid of informal carers and volunteers. One example of a new initiative is institutional care providers that aim to agree by contract with informal carers that they provide a minimum of 4 h of informal care per month. This led to a lot of societal commotion. Furthermore, mild forms of institutional care are no longer provided, and new patients needing this type of care will receive it at home.
Dutch citizens are on average very satisfied with their health-care providers (they give a score of 7.7–7.9 on a scale of 1–10) (Statistics Netherlands 2012). Healthy persons are slightly more satisfied than persons with ill health, and lower-educated people are more satisfied than young people and higher-educated people.
In 2011, life expectancy for males was 79.2 years and for females 82.9 years. In the past decade, life expectancy increased by 3.4 years for men and by 2.2 years for women. Healthy life expectancy increased significantly for men (from 9.2 to 10.9 healthy years for 65-year-olds) but not for women (Statistics Netherlands 2012).
Mortality from cardiovascular diseases has steadily decreased over the past decade. Several factors have contributed to this decrease, such as better treatment of high cholesterol and high blood pressure and more attention to a healthy lifestyle. Furthermore, more people are aware that they have high blood pressure, making treatment possible. In addition, the development of technological options to treat cardiovascular diseases resulted in more patients surviving the disease (Statistics Netherlands 2012). Mortality due to cancer increased slightly in the past decade. In 2008, cancer overtook cardiovascular diseases as the most important cause of mortality.
Affordability of health care is still a cause of debate in the Netherlands. Expenditure on health care continues to increase over the years, both due
to increasing prices and an increase in volume of care. The government wishes to diminish especially the increase in volume of care. From 2006 to 2011, the expenditure on care under the Health Insurance Act and the Exceptional Medical Expenses Act increased by 4.4% per year on average. In 2011, the expenditure increased by 3.6%.
Citizens find accessibility of and solidarity in health care important. However, citizens appear to have little insight into health-care expenditure. They are aware of the compulsory deductible and of the community-rated premium for the Health Insurance Act, but they are hardly aware of the income-related premiums for the Health Insurance Act and the Exceptional Medical Expenses Act that are paid by their employer directly to the government (Kooiker et al. 2012). Competition in care is not popular among Dutch citizens; it is associated with a profit orientation, expensive managers, and a large overhead (Kooiker et al. 2012).
The accessibility of the Dutch health-care system is excellent. Nearly all citizens are insured, and waiting times are on average acceptable. A few specialisms have a longer waiting time than the norm of 4 weeks for a first appointment, and only for a few treatments does the waiting time exceed what is seen as acceptable.
Competition in the health insurance market seems to be present. In 2012, 6% of the citizens switched insurers, and in 2013, this was 8.3%, which can be seen as an indicator that competition in this market is present. In the health-care purchasing market, nearly all general practitioners are contracted by the health insurers for the maximum tariff (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] 2012). Health insurers managed to contract 90% of the hospitals for the year 2011 before the end of that year, which is rather late, considering that health insurers have to publish their premiums in November of the year before. To evaluate quality of care, several indicators have been developed by the Dutch Healthcare Authority, but these are not yet published because they cannot yet be corrected for casemix. The development in prices for the freely negotiable part of hospital care showed a decrease in 2010 of 3% and in 2011 of 1.3%. However, these decreases are mainly due to the tariff caps for medical specialists that were the result of the large increase in medical specialists’ income in the years before. Selective contracting in the health purchasing market is currently still in its infancy. In 2012, a large insurer decided not to contract a large hospital, but later that year, the hospital accepted the lower tariffs proposed by the insurer. In the health-care provision market, patients mainly go to the medical specialist who is advised by their general practitioner. There is information available on the Internet on quality of care, but consumers find it difficult to use this information (Damman et al. 2012).

The Dutch Health-Care System in International Perspective

When looking at health-care supply, the Netherlands has a low number of acute care hospital beds, with 301 beds per 100,000 inhabitants in 2010, below the EU average, but 10 countries have a lower number of beds, with Finland on top at about 180 beds per 100,000 inhabitants. The supply of long-term beds (in nursing and elderly homes) is large compared to most European countries, with 1,036 beds per 100,000 inhabitants. For those countries where information is available, only Finland and Malta had a higher supply of long-term care beds in 2011. The Netherlands has nearly the lowest number of physicians in Europe (58 physicians per 100,000 inhabitants), with only Denmark and Ireland having even lower numbers. The number of general practitioners is also below the EU average, with 72 GPs per 100,000 inhabitants, the EU average being 82 GPs per 100,000 inhabitants (World Health Organisation 2013).
Acute care hospital admission rates are among the lowest in Europe, with 11.4 admissions per 100 inhabitants in 2009. Since 2001, when there were 8.8 admissions per 100 inhabitants, the number of admissions has been increasing. The average
Contents
Introduction
Organization and Governance
Organization and Planning
Regulation
Health Information Systems and Technology
The Role of Patients
Financing
Funding
Coverage and Subsidies
Sources of Revenue
Cost Control
Pooling of Funds and Purchasing
Physical and Human Resources
Healthcare Infrastructure
2012 Singapore Healthcare Professional Workforce
Workforce Trends
Paying Healthcare Professionals
Delivery of Health Services
Primary Care
Community Health Assist Scheme
Care Coordination
Long-Term Care
Breakdown of operators for various long-term care services
Mental Healthcare
Pharmaceutical Care
The Private Hospitals
Reforms
Main Reforms
Recent Reforms
Planned Reforms
Assessment
User Experience
Health Outcomes
Transparency and Accountability
References
Among the citizens and permanent residents (i.e., excluding nonresidents in the country), approximately 23% are under the age of 20, 67% are between 20 and 64, and 10% are 65 or older. The median age is 38.4 (Department of Statistics, Singapore 2013).
Until 1959, the year it achieved internal self-government, Singapore was a colonial outpost of the British Empire. At the time of the British withdrawal, the country was impoverished, with no industrial base or natural resources upon which to build its economic future. After a brief and unsuccessful merger (1963–1965) with Malaysia – its much larger neighbor to the north – Singapore became a fully independent nation under a government controlled by the People’s Action Party or PAP.
The People’s Action Party has been the majority party ever since, and its longevity in power has provided Singapore with a remarkable era of political stability. This stability has over the years nurtured a consistent political vision, a constancy of purpose and action, and a culture of cooperation among all government ministries. As a result, its policymakers have been able to develop and implement extremely long-range plans that reflect the nation’s desire for collective

Organization and Governance

Organization and Planning

Singapore’s Ministry of Health has overall government responsibility for addressing the healthcare needs of the people. Key ongoing activities include: assessment of needs and planning for services and for manpower, governance, and financing.
Assessing needs: The ministry makes regular projections of the disease burden and determines whether the current levels of service are sufficient. Service gaps that are detected are prioritized at the national and the regional levels.
Services planning: The ministry projects facility requirements for primary care locations, acute and community hospitals, nursing homes, and other services. Local care models are assessed to ensure they remain up to date with the latest medical advances as well as local developments. The ministry is also responsible for planning and developing the system’s IT capability.
Manpower planning: The ministry projects manpower demand and responds with training and education, attracting talent, and overseas recruitment as necessary to meet demand. It is
39 Health System in Singapore 879
also responsible for workforce management including retention and upgrading of skills.
Governance and financing of the system: The ministry is also responsible for financing policies and governance, including a performance management system. It also creates feedback mechanisms to drive continual improvement in all areas of responsibility.

Regulation

The healthcare system is regulated by the Ministry of Health through legislation, regulation, and enforcement. One of its agencies, the Health Sciences Authority, regulates health products, including medicines. Professional bodies, including the Singapore Medical Council, Singapore Dental Council, Singapore Nursing Board, and Singapore Pharmacy Board, self-regulate their healthcare professionals through codes of ethics and conduct, practices, and guidelines.
One of the core regulatory functions of the ministry is the licensing of healthcare institutions under the Private Hospitals and Medical Clinics Act and conducting regular inspections and audits. These institutions provide services that aid in or provide medical diagnosis, treatment, rehabilitation, and management of patients. Laboratory and radiology services are two examples. Public and private hospitals, clinics, laboratories, and nursing homes are required to submit applications to the ministry for the license to operate. Pre-licensing inspections are conducted to ensure standards. Complaints, surveillance, and analysis of advertisements are used to identify potential problems, and they are followed up with compliance audits and possible prosecutions. Marketing by these licensed facilities is also regulated in order to safeguard the public against false or unsubstantiated claims and to prevent inducement to use nonessential services such as aesthetic medicine.
The ministry also works closely with professional bodies such as the Academy of Medicine and the College of Family Physicians and with union-associations such as the Singapore Medical Association as well as industry groups to discuss a wide range of issues such as their practice, ethics, and standards of care and to consult on policy and operational matters. The ministry also engages them to explain policy rationale and garner their support in implementing various initiatives.
The Health Sciences Authority regulates the manufacture, import, supply, presentation, and advertisement of health products – including medicines, complementary medicines (traditional medicine and health supplements), cosmetic products, medical devices, tobacco products, and medicinal products for clinical trials. Its mission is to ensure that all meet internationally benchmarked standards of safety, quality, and efficacy.
The insurance industry is regulated by the Monetary Authority of Singapore as part of its role as the financial regulatory authority of Singapore. The Ministry of Health regulates the segment of the health insurance market for plans that are paid by Medisave.

Health Information Systems and Technology

Singapore benefits from an information management system that collects, reports, and analyzes information to aid in the formulation of policy as well as the monitoring of implementation.
Sources of information include administrative data and survey-based data, articles and reports from professional journals, and reports from external organizations.
The Singapore healthcare system is heavily invested in IT infrastructure and in the development of information systems for processing and storing large volumes of data in support of policy research, planning, operations, and monitoring. High-quality data standards, IT security, and audits are utilized to ensure accuracy and reliability of all information collected. In addition, external data is carefully screened to ensure that sources are reputable and trustworthy.
Both public and private healthcare providers are required to report their service statistics to the Ministry of Health, including two types of information: inpatient capacity and utilization, such as number of inpatient beds, beds in service, bed
880 W. A. Haseltine and C. Liu
occupancy rates, inpatient discharges, and average lengths of stay, and surgical procedures, including inpatient and day surgeries, and deliveries.
In addition, public providers are required to report on their polyclinic, specialist outpatient, and emergency department attendances.

approach their healthcare choices knowing that they will pay a part of the bill. Still, national savings accounts, insurance programs, and a safety net help to ameliorate the financial burden.

Coverage and Subsidies

with a corresponding range of amenities, choices, and prices but access to the same doctors and assurance of the same quality of care. There are four classes: A, B1, B2, and C. A is the most costly, with C the least costly. A-level wards contain private rooms with bath, air conditioning, and access to private doctors of the patient’s choice. C patients are in open wards, with eight or nine patients in a room, sharing a bath, and usually without air conditioning. Doctors are assigned to these patients.
As amenities increase, subsidies decrease. Patients in the A wards receive no subsidy, while C-ward patients receive subsidies of up to 80% – depending on their income – of their ward charges, drugs, and medical treatment. C-ward patients also receive subsidies on surgical procedures and physicians’ fees. In the wards between A and C, subsidies increase as amenities and choices decrease.

Class ward    Subsidy level
A             0%
B1            20%
B2            65–50% (a)
C             80–65% (a)
(a) Financial means testing determines eligibility for subsidy for patients in C and B2 wards

Patients do have a choice in the matter of ward classes. Individuals with high incomes can choose the C ward, but their subsidy would be much lower than what a low-income individual receives. Conversely, low-income patients can choose to stay in a class A ward if they can pay for it.

Sources of Revenue

Government Healthcare Budget
Funding of the healthcare system takes place through the Ministry of Health. The ministry’s budget for fiscal year 2013 is $5.7 billion. The ministry’s budget is used for healthcare subsidies, promoting good health practices in the population, developing manpower, training of healthcare professionals, and infrastructure. A total of $4 billion is allocated for subsidies to Singaporeans receiving medical care at the public hospitals, polyclinics, community hospitals, and institutions providing intermediate and long-term care. A sampling of other budget allocations includes $177 million for initiatives addressing obesity prevention, tobacco control, childhood preventive health services, chronic disease management, and public education and $70 million for Medisave grants to newborn Singapore citizens (Ministry of Health, Singapore 2013c).

Private Expenditure on Healthcare
The other major source of funding for the system is private financing and expenditure on healthcare. Singaporeans pay co-payments and deductibles that are often higher than in other nations.

Means testing in public hospitals as of 1 January 2009
                                              Citizens’ subsidy (d)
Average monthly income of patient (SGD) (a)   Class C ward (%)   Class B2 ward (%)
$3,200 and below (b)                          80                 65
$3,201–$3,350                                 79                 64
$3,351–$3,500                                 78                 63
$3,501–$3,650                                 77                 62
$3,651–$3,800                                 76                 61
$3,801–$3,950                                 75                 60
$3,951–$4,100                                 74                 59
$4,101–$4,250                                 73                 58
$4,251–$4,400                                 72                 57
$4,401–$4,550                                 71                 56
$4,551–$4,700                                 70                 55
$4,701–$4,850                                 69                 54
$4,851–$5,000                                 68                 53
$5,001–$5,100                                 67                 52
$5,101–$5,200                                 66                 51
$5,201 and above (c)                          65                 50
(a) Monthly income is defined as average monthly wage based on last available 12 month data (including bonuses)
(b) No income and property with annual value (estimated value of a property if it were rented out) $13,000 and below
(c) No income and property with annual value exceeding $13,000
(d) Subsidies for Singapore permanent residents in most income bands will receive half the corresponding subsidy that citizens receive (Ministry of Health, Singapore)
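The means-testing schedule in the table can be read as a simple band lookup. The following is a minimal Python sketch of that reading, not an official calculator. The band limits and percentages are transcribed from the table (schedule as of 1 January 2009) and from the ward-class subsidy levels given earlier in this section. The names `UPPER_BOUNDS`, `CLASS_C_SUBSIDY`, and `citizen_subsidy` are hypothetical, and the footnoted cases (patients with no income, permanent residents) are deliberately left out.

```python
from bisect import bisect_left

# Upper limit (SGD per month) of each income band in the table, lowest band first.
UPPER_BOUNDS = [3200, 3350, 3500, 3650, 3800, 3950, 4100, 4250,
                4400, 4550, 4700, 4850, 5000, 5100, 5200]
# Class C subsidy (%) for the band ending at the matching upper limit.
CLASS_C_SUBSIDY = [80, 79, 78, 77, 76, 75, 74, 73,
                   72, 71, 70, 69, 68, 67, 66]
CLASS_C_MINIMUM = 65   # band "$5,201 and above"
B2_OFFSET = 15         # in every row of the table, B2 is 15 points below C


def citizen_subsidy(monthly_income, ward):
    """Return the citizen subsidy (%) for a ward class and average monthly income."""
    if ward == "A":
        return 0           # A wards receive no subsidy
    if ward == "B1":
        return 20          # flat level, not means tested
    if ward not in ("B2", "C"):
        raise ValueError(f"unknown ward class: {ward!r}")
    # Index of the first band whose upper limit is >= the income.
    band = bisect_left(UPPER_BOUNDS, monthly_income)
    if band < len(UPPER_BOUNDS):
        class_c = CLASS_C_SUBSIDY[band]
    else:
        class_c = CLASS_C_MINIMUM
    return class_c if ward == "C" else class_c - B2_OFFSET
```

For example, `citizen_subsidy(4000, "C")` falls in the $3,951–$4,100 band and returns 74, while `citizen_subsidy(5201, "B2")` returns 50. A sorted list with `bisect_left` keeps the schedule in one place, so a revised schedule would only require editing the two lists.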
According to the World Health Organization, private expenditure amounts to almost 70% (68.6%) of the nation’s total expenditure on care. This statistic reflects the government’s guiding philosophy that healthcare is not free and, as stated earlier, that consumers of care must pay a portion of the cost of their care. Of the private expenditure, 74.2% represents out-of-pocket expenditure versus 8% from Medisave and 6% from MediShield and Integrated Shield Plans (World Health Organization 2013; Ministry of Health, Singapore).
At the heart of Singapore’s system of private financing and expenditure are mandated savings and insurance programs that help consumers pay for care. They are known as the “3Ms” – Medisave, MediShield, and Medifund. They play a critical role in maintaining the health and welfare of Singapore’s people and the success of the healthcare system itself. The most critical component of the trio is Medisave, a mandatory, individual medical savings account to which workers contribute a percentage of their wages, which employers match. Medisave grew out of the nation’s Central Provident Fund, a mandatory savings program originally created by the British during their rule of Singapore to help workers pay for their retirement. Contributions to the accounts are tax exempt, as are withdrawals. The account is used to pay for health services and health insurance for the account’s owner as well as for family members.
MediShield, the second of the 3Ms, is a low-cost insurance program paid for by the insured for coverage against catastrophic inpatient bills and selected outpatient care. MediShield premiums can be paid for from the individual’s Medisave account. Singaporeans are automatically enrolled in the program but are able to opt out if they so desire. Soon to be introduced is an extension of this program called MediShield Life, which will cover all Singaporeans.
Private health insurance is also available. While affordable, the plans also include deductibles and co-payments in accordance with the healthcare system’s requirement that consumers of care must contribute to the cost of their care. Catastrophic insurance is widely held and covers partial costs of expensive or long-term treatment. Insured patients must usually pay 20% of the cost of such care.
Private, Medisave-approved insurance plans, called Integrated Shield Plans, are meshed together with MediShield to form an integrated plan for users. Such private plans give patients additional benefits and coverage for paying the costs of private hospitals or Class A and B1 wards in the public hospitals. Policyholders keep the benefits and coverage afforded them by their basic MediShield plans. In addition, Medisave can be used to pay the premiums of the approved private plans, subject to a limit. Like MediShield, they also include deductibles and co-payments in accordance with the healthcare system’s requirement that consumers of care must contribute to the cost of their care. Catastrophic insurance is widely held and covers partial costs of expensive or long-term treatment. Insured patients must usually pay 10–20% of the cost of such care (Ministry of Health, Singapore).
Medifund, the third “M,” is an endowment program funded by the government as a healthcare safety net that helps the poor pay for their care. Medifund was set up in 1993 to assist Singaporeans who could not pay their medical bills. Needy citizens can apply for assistance and are means tested before their applications are approved.
In addition to the 3Ms, another program, called ElderShield, was introduced in 2002 to provide insurance coverage for the costs of long-term care necessitated by very serious disabilities in the elderly. ElderShield is an opt-out program that commences for individuals when they turn 40 years of age. The insurance is offered by private insurers only, who are selected through competitive bidding that takes place every 5 years. Premiums are fixed at a flat rate based on the age of the individual joining the program and are paid by the insured until age 65. Benefits are set at fixed payouts of $400 per month.

Cost Control

Singapore is a leader in keeping costs under control, and it does so while providing world-
39 Health System in Singapore 883
class healthcare. The nation spends 4.5% of market. At the same time, government sets subsidy
GDP on care versus, for example, 17.9% of and cost-recovery targets for each ward class, which
GDP in the United States and 9.3% in the indirectly keeps the public sector hospitals from
United Kingdom. Here are some examples of producing excess profits. Hospitals are also given
private and public spending on healthcare annual budgets for patient subsidies, so they can
for several nations. All data as of 2010. plan accordingly, knowing in advance the levels of
reimbursement they will receive for patient care.
United They are required to break even within this budget.
Singapore States India China The entire system functions successfully because
Total 4.5 17.6 3.7 5 the quality of care in the public hospitals is
expenditure extremely high and is scrupulously maintained.
on health as
Singapore also regulates the number of medi-
% of GDP
General 31.4 48.2 28.2 54.3
cal students studying in the country, as well as the
government number of foreign medical schools’ degrees rec-
expenditure ognized in the country. In this way, the number of
on health as practicing physicians is controlled, preventing an
% of total
expenditure
oversupply of medical services and avoiding
on health induced demand. The medical savings programs,
Private 68.6 51.8 71.8 45.7 the insurance programs, and the subsidies to pub-
expenditure lic hospitals are continually adjusted. The num-
on health as bers of beds in the public hospitals are carefully
% of total
expenditure controlled. Government regulates and limits the
on health private insurance programs available to
(World Singaporeans. Wages of doctors in the public sec-
Health tor are kept reasonable and not sky-high and are
Organization
2013) periodically reviewed with the goal of keeping
them competitive with the private sector.
Singapore controls the costs of healthcare in a The private sector operates and thrives in this
number of ways, perhaps first and foremost in the quasi-capitalist environment, serving patients
manner by which it both fosters and controls com- who wish to pay more for certain services or
petition. The nation approaches healthcare as a amenities and competing with public sector facil-
quasi-capitalist market. Amid concerns in the early ities on price and quality.
1990s of soaring health costs, the government
issued a white paper entitled “Affordable Health
Care” that, among other issues, set the goal of Price Transparency
engaging competition and market forces to improve Another factor controlling costs is price and out-
service and raise efficiency. It was established that come transparency. The Ministry of Health makes
government would intervene directly in the available on its website the hospital bills for com-
healthcare sector when the market failed to keep mon illnesses (arranged alphabetically from ane-
costs down. This became the guiding policy of the mia to urinary stone), treatments, and ward classes:
system. Public and private hospitals exist side by http://www.moh.gov.sg/content/moh_web/home/
side in this market, with the public sector having the costs_and_financing/HospitalBillSize.html.
advantage of patient incentives and subsidies. Patients can look up the costs of specific sur-
Because it can regulate the number of public hospi- geries, the number of cases treated in each hospi-
tals and beds, the government is able to shape the tal, tests, and more. The data is complete for
environment of the marketplace. Within that envi- public sector hospitals while private hospital
ronment, market forces regulate the private sector, data is voluntary and may not carry the detail of
which must be careful to not price itself out of the the public sector information. Armed with pricing
884 W. A. Haseltine and C. Liu
information, consumers of care can better shop for the services they require.

Pooling of Funds and Purchasing

Currently, there is no framework to pool funds to purchase provider services and goods, although a system does exist that aggregates demand to obtain bulk purchasing prices. The Group Purchasing Office (GPO Pharma) consolidates drug purchases at the national level. One goal of this system is to keep drug prices affordable for the elderly and lower-income groups and to contain the costs of pharmaceutical-related expenditure. GPO also purchases medical supplies, equipment, and IT services for the healthcare system.

Physical and Human Resources

Healthcare Infrastructure

The data below provide a clear snapshot of the main components of Singapore’s healthcare infrastructure as of December 2012:

Number of public acute hospitals (beds): 7 (6,985)
Number of public specialty centers (beds): 8
Number of private acute hospitals (beds): 9 (1,555)
Number of private other hospitals (beds): 1 (20)
Number of public polyclinics: 18
Number of private medical clinics for primary care: about 2,400
Number of community health centers: 2
Number of nursing homes: 66
Number of hospices: 4

2012 Singapore Healthcare Professional Workforce

                          Public    Private    Total (in active practice)
Total no. of doctors       6,131      3,515     9,646
Specialists                2,342      1,293     3,635
Nurses                    20,911      8,348    29,259
Midwives                      89         65       154
Dentists                     357      1,215     1,572
Optometrists/opticians       155      2,124     2,279
Pharmacists                  934      1,048     1,982

Information on the number of occupational therapists, psychologists, and medical lab technicians is not available at this time (Ministry of Health, Singapore).

Workforce Trends

Anticipating growth in demand, Singapore will expand its healthcare professional workforce by 20,000 by the year 2020. This increase covers doctors, nurses, dentists, pharmacists, and allied health professionals, representing a 50% increase from 2011. The nation is also expanding its training pipeline, encouraging mid-career professionals to join the healthcare sector, and supporting older healthcare staff who wish to continue working for as long as they can.

Singapore is also looking to greater use of technology, such as tele-consultations, and equipment such as patient mobility aids to raise the productivity levels of its professional workforce.

Paying Healthcare Professionals

The data below show the gross monthly wage of healthcare professionals in Singapore in 2012.

2012 gross monthly wage (a)

Healthcare professionals                                   25th percentile ($)   Median (b) ($)   75th percentile ($)
Primary care doctors: General practitioners/physicians                  9,058           11,398                16,358
Specialist doctors: Specialist medical                                  9,919           20,516                30,300
(continued)
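The 2012 workforce table above reports public, private, and total counts for each profession, so its rows can be cross-checked with simple arithmetic. The following is a minimal sketch of such a check, assuming the figures are transcribed exactly as printed; the names `workforce`, `inconsistent_rows`, and `private_share` are illustrative and do not come from the chapter.

```python
# Rows transcribed from the 2012 workforce table:
# label -> (public, private, total in active practice)
workforce = {
    "Total no. of doctors": (6131, 3515, 9646),
    "Specialists": (2342, 1293, 3635),
    "Nurses": (20911, 8348, 29259),
    "Midwives": (89, 65, 154),
    "Dentists": (357, 1215, 1572),
    "Optometrists/opticians": (155, 2124, 2279),
    "Pharmacists": (934, 1048, 1982),
}


def inconsistent_rows(table):
    """Return labels of rows where public + private differs from the total."""
    return [
        label
        for label, (public, private, total) in table.items()
        if public + private != total
    ]


def private_share(table, label):
    """Return the fraction of a profession working in the private sector."""
    _public, private, total = table[label]
    return private / total


if __name__ == "__main__":
    print("Inconsistent rows:", inconsistent_rows(workforce))
    for label in workforce:
        print(f"{label}: {private_share(workforce, label):.1%} private")
```

Run as a script, this lists any row whose parts do not add up to its total and prints the private-sector share for each profession, which makes the contrast between professions concentrated in the public sector (nurses) and those concentrated in the private sector (dentists, optometrists/opticians) easy to see.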
and community-based services has been shown to improve health and social outcomes while reducing system cost. Components of the community care program include multidisciplinary shared care teams that provide treatment and care to the mentally ill through service networks in the community, support for caregivers to cope with caregiving, and a community safety network for people with dementia and depression and their caregivers. There are also community-based, targeted mental health programs for youths, adults, and the elderly.

Psychiatric Intermediate and Long-Term Care

The majority of psychiatric long-term care services, where individuals require residential care or a period of transition and close supervision after discharge, are provided by the Institute of Mental Health and by voluntary welfare organizations (supported by the Ministry of Health and the Ministry of Social and Family Development) such as the Singapore Association for Mental Health and Singapore Anglican Community Services. Types of long-term care facilities include psychiatric nursing homes, rehabilitation homes, and day care centers.

Pharmaceutical Care

In Singapore, pharmacists are now involved in providing more direct patient care as members of multidisciplinary healthcare teams. In the public sector, pharmacy services and pharmaceutical care by pharmacists are provided through the Departments of Pharmacy at each public hospital/institution.

Pharmacists dispense and review medications, counsel patients on their medication upon discharge, and perform specialized clinical pharmacy services in hospitals, such as those of a dedicated ICU pharmacist.

In the outpatient and community setting, pharmacists also undertake health management and disease prevention counseling, provide patient medication management and adherence services, and run specialized pharmacy clinics, such as an anticoagulation clinic.

In the intermediate- and long-term care setting as well as in the home, programs have been introduced in which pharmacists visit nursing homes and help manage residents’ medication needs more effectively. With the Pharmacist Outreach Program, pharmacists visit the homes of referred patients to check medication compliance and to identify and address drug-related problems in consultation with the primary physician.

Pharmacists are also involved in the supply of medicines and in medication safety at the institutional level through reviewing drug formularies and monitoring the use of drugs. They also take part in medication safety initiatives at the institutional or national level, in medication error reporting and monitoring frameworks, and in the monitoring and reporting of adverse drug events.

The Private Hospitals

Private hospitals account for approximately 20% of inpatient beds. Patients may use either the public or private system, as long as they can pay the costs of their preferred provider. Luxury amenities are available in some of the private hospitals. Private hospitals are also more involved in medical tourism than are the public facilities. Parkway Pantai is the main private hospital group in Singapore.

There is a trend toward tapping private hospitals’ spare capacity for treating subsidized patients from the public system. Private hospitals’ bed occupancy rate averages about 55% (MOH 2012 Committee of Supply Speech).

Reforms

Main Reforms

Several main reforms in the Singapore system are aimed at making healthcare more affordable for consumers.

The Community Health Assist Scheme, which provides subsidized healthcare services at private (as
• Life expectancy for women: currently 84.5 years versus 65 years in 1960
• Life expectancy for men: currently 79.9 years versus 61.2 years in 1960

Singapore also has a vastly improved survival rate among newborns and infants, a rate better than that of most developed countries:

• Neonatal mortality rate per 1,000 births is now 1.1 versus 17.7 in 1960.
• Infant mortality rate per 1,000 births is now 1.8 versus 34.9 in 1960.

Other outcomes:

• Under 5 mortality rate (per 1,000 live births, both sexes) is 2.8 versus 7.5 in 1990

Source: Singapore registry of births and death report (2012).

In addition, its cancer survival rates are similar to Europe’s, and its cardiovascular disease death rate is half that of the rest of the Asia/Pacific region.

Efficiency

Singapore uses a performance measurement and management process to help healthcare providers assess and benchmark their performance against their peers. The National Health System Scorecard uses internationally established performance indicators to compare performance in Singapore. The Public Acute Hospital Scorecard is used to measure institutional-level performance. Its indicators cover clinical quality and patient perspectives. Similar scorecards for providers are being rolled out in primary care facilities and in community hospitals.

The scorecards lay out the standards of service and key deliverables required of the public healthcare institutions, and they are monitored to ensure compliance. They incorporate internationally accepted indicators and definitions where possible, such as the Centers for Medicare & Medicaid Services Joint Commission-aligned measures for acute myocardial infarction and stroke. The inclusion of these evidence-based and validated indicators allows for comprehensive benchmarking, enabling identification of areas of strong performance as well as areas where improvements are needed.

In 2008 Singapore introduced a set of National Standards for Healthcare, which is used to set priorities for improvement efforts and alignment with planning initiatives. It focuses on key areas of concern and promotes a culture of continuous quality improvement.

National Standards for Healthcare is implemented through a network of Healthcare Performance Offices, each chaired by a senior clinical leader who reports directly to the institution’s chief executive officer/chairman, medical board. Resulting quality improvement outputs can then be incorporated into the National Health System Scorecard and the Public Acute Hospital Scorecard for performance analysis and monitoring.

Transparency and Accountability

Regarding policy development and implementation, Singapore’s Ministry of Health consults stakeholders and the public before policies are enacted. Stakeholders are engaged through dialogue and the public through public consultation. A set of principles and processes guides the public consultation, ensuring that public sentiment, concerns, feedback, and diverse views are taken into account.

The Ministry of Health also gathers data on consumer needs and determines actionable insights that might improve healthcare policies. It also engages in extensive face-to-face conversations through visits to private and public sector institutions, town halls, and feedback sessions. The ministry also identifies potential issues and concerns from the complaint and appeal letters it receives from customers or their Members of Parliament. Quarterly Customer Feedback reports are brought to senior management meetings for discussion. The corporate planning cycle incorporates the review of customer feedback as a key process to guarantee policy responsiveness.
Some concrete actions taken as a result of public consultation include: extension of Medisave use for pneumococcal vaccination, treatment of schizophrenia and major depression, and expanded coverage for major chronic diseases; raised withdrawal limits for community hospital stays and day rehabilitation center visits; and Medisave use for mammograms and colonoscopies. Directly as a result of customer feedback, Medisave withdrawals were also extended to palliative care, including palliative care in the home.

References

Department of Statistics, Singapore. http://www.singstat.gov.sg/statistics/latest_data.html#8/ http://www.singstat.gov.sg/publications/publications_and_papers/cop2010/cop2010adr.html. Accessed Oct 2013.
Yong Gan Kim. Straitstimes.com http://www.straitstimes.com/mnt/html/parliament/mar6-GanKimYong-pt1.pdf
Ministry of Health, Singapore. MOH (APO). 2012.
Ministry of Health, Singapore. 2013. http://www.moh.gov.sg/content/dam/moh_web/Publications/Educational%20Resources/2009/MT%20pamphlet%20%28English%29.pdf. Accessed Oct 2013.
Ministry of Health, Singapore. 2013. http://www.moh.gov.sg/content/moh_web/home/costs_and_financing/schemes_subsidies/Medishield/Medisave-approved_Insurance.html. Accessed Oct 2013.
Ministry of Health, Singapore. Expenditure overview. 2013c. http://www.singaporebudget.gov.sg/budget_2013/expenditure_overview/moh.html. Accessed Oct 2013.
Ministry of Health. All Singapore residents to enjoy universal coverage under MediShield Life, with no exclusions. 2015a. https://www.google.com.sg/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwikkpWSzPDRAhVJMo8KHWjXAhoQFggaMAA&url=https%3A%2F%2Fwww.moh.gov.sg%2Fcontent%2Fmoh_web%2Fhome%2FpressRoom%2FpressRoomItemRelease%2F2015%2Fall-singapore-residents-to-enjoy-universal-coverage-under-medish0.html&usg=AFQjCNFJfEHv-OmUHPEjw-RiELmI8afPFw
Ministry of Health. Better health, better future for all. Ministry of Health Initiatives for 2015. 2015. https://www.google.com.sg/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiq4P-GzPDRAhXCRY8KHZr9BQYQFggaMAA&url=https%3A%2F%2Fwww.moh.gov.sg%2Fcontent%2Fdam%2Fmoh_web%2FPressRoom%2FResources%2FMOH%2520Factsheet.pdf&usg=AFQjCNFMoyT0VGwMdxLjebX-UVSJsu5EOg
Ministry of Manpower, Singapore. Median, 25th and 75th percentile of monthly gross wages of Common Occupations in Health Industry. 2012. Ministry of Manpower (Table 4.9, http://stats.mom.gov.sg/Pages/Occupational-Wages-Tables-2012.aspx#)
Weizhen T. Today Online. Thursday Oct 17 2013. http://www.todayonline.com/singapore/medishield-life-more-sustainable-private-medical-schemes-gan
World Health Organization. World health statistics 2013. http://www.who.int/gho/publications/world_health_statistics/2013/en/. Accessed Oct 2013.
Health System in the USA
40
Andrew J. Barnes, Lynn Y. Unruh, Pauline Rosenau, and
Thomas Rice
Contents
Introduction
Organization and Governance
Public and Private Organizations
Financing of Major Insurance Programs
Coverage
Sources of Revenue
Financing and Financial Flows
Medicare
Medicaid
Private Insurance
Physical and Human Resources
Physical Resources
Human Resources
A. J. Barnes (*)
Department of Health Behavior and Policy, School of
Medicine, Virginia Commonwealth University, Richmond,
VA, USA
e-mail: andrew.barnes@vcuhealth.org
L. Y. Unruh
Department of Health Management and Informatics,
College of Health and Public Affairs, University of Central
Florida, Orlando, FL, USA
e-mail: lynn.unruh@ucf.edu
P. Rosenau
Division of Management, Policy and Community Health,
School of Public Health, University of Texas Health
Science Center at Houston, Houston, TX, USA
e-mail: pauline.rosenau@uth.tmc.edu
T. Rice
Department of Health Policy and Management, Fielding
School of Public Health, University of California, Los
Angeles, CA, USA
e-mail: trice@ucla.edu
Unruh LY, Barnes AJ, Saltman RB, van Ginneken E, Health Syst Transit 15(3):1–431, 2013.

Introduction

The US is a large, wealthy country, with double the gross domestic product of any other in the world. It is a federal, constitutional democracy, with decision-making authority divided between the federal and state governments. In 2016 nearly one-fifth (17.9%) of its economy was spent on health care ($3.3 trillion), amounting to $10,348 per capita (Hartman et al. 2017). As with many such national averages in this report, there are wide variations across the states, with spending per capita in 2014 ranging from about $5,982 per person in Utah to more than $11,944 in the District of Columbia (Kaiser Family Foundation 2014a). Tax rates are lower than in almost all other high-income countries, consistent with the fact that its public sector provides fewer social services. Despite being a high-income nation, the US ranks poorly, compared to other high-income countries, on measures of income equality. Because the US birth rate is higher than that of most developed countries, its dependency ratio (those too young or too old to work, divided by the working age population) is expected to grow more slowly than in most other countries.

The racial and ethnic makeup of the US population is quite varied, with approximately 61.3% non-Hispanic White, 17.8% Hispanic or Latino, 13.3% non-Hispanic Black or African American, and the remainder other and/or mixed racial and ethnic groups (US Census Bureau 2017). Hispanics and Latinos are the fastest-growing group, with a 49% population increase between 2000 and 2010, compared to just 5% for others (Ennis et al. 2011). This proportional relationship also continues to change: Asians have replaced Hispanics and Latinos to be the fastest-growing group, with a total population of 21 million as of 2015, representing a 3.4% increase compared with 2014 (U.S. Census Bureau 2016). Moreover, in California, there are now almost twice as many Hispanics and Latinos age 18 and younger than there are whites (Kidsdata.org 2015).

Historically, the US has resisted central planning or control at both the federal and state levels. The US health-care system reflects this wider context, having developed largely through the private sector and combining high levels of spending with distinctively low levels of government regulation. The US spends far more money on health care per person than any other country.

International comparison shows a varied picture with respect to access to health care, health behaviors, and outcomes. The US is unusual among high-income OECD countries in that most Americans still receive their coverage from private health insurance, and more than 12% of non-elderly adults are uninsured, although this proportion has been reduced significantly through implementation of the Affordable Care Act (Kaiser Family Foundation, 2016a). With regard to health behaviors, the picture is again varied; the USA has been notably effective in reducing smoking rates and has one of the lowest smoking rates internationally. But it has been less effective in grappling with nutritional health and obesity. The US does well on some disease indicators (e.g., certain cancers) but poorly on others (e.g., asthma). Compared to other developed countries, life expectancy is lower and mortality is higher (World Bank 2017).

Organization and Governance

Public and Private Organizations

In the US health-care system, public and private payers purchase health-care services from providers subject to regulations imposed by federal, state, and local governments as well as by private regulatory organizations. Figure 1 illustrates the interplay between four main actors: (1) government, (2) private insurance, (3) providers, and (4) regulators, as well as the types of relationships that connect them.
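As a rough consistency check on the 2016 spending figures quoted in the Introduction ($3.3 trillion in total, $10,348 per capita, 17.9% of the economy), the sketch below backs out the population and GDP those figures imply. The function names are illustrative rather than taken from the chapter, and because the $3.3 trillion total is rounded, the results are approximations and not official statistics.

```python
# Figures quoted in the Introduction (Hartman et al. 2017), for 2016.
TOTAL_HEALTH_SPENDING_USD = 3.3e12   # rounded total, so results are approximate
PER_CAPITA_SPENDING_USD = 10_348
HEALTH_SHARE_OF_GDP = 0.179


def implied_population(total_spending, per_capita_spending):
    """Population implied by dividing total spending by per capita spending."""
    return total_spending / per_capita_spending


def implied_gdp(total_spending, share_of_gdp):
    """GDP implied by total spending and its share of the economy."""
    return total_spending / share_of_gdp


if __name__ == "__main__":
    population = implied_population(TOTAL_HEALTH_SPENDING_USD, PER_CAPITA_SPENDING_USD)
    gdp = implied_gdp(TOTAL_HEALTH_SPENDING_USD, HEALTH_SHARE_OF_GDP)
    print(f"Implied population: {population / 1e6:.0f} million")
    print(f"Implied GDP: ${gdp / 1e12:.1f} trillion")
```

The implied population of a little under 320 million and the implied GDP of a little over $18 trillion show that the three quoted figures are mutually consistent to within rounding.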
[Fig. 1: diagram not reproduced. Labels recoverable from the figure: federal government (legislative (Congress), executive, judicial), state government, and local government; Office of Veterans Affairs, Department of Defense, and Department of Health and Human Services (DHHS); selected other DHHS agencies (AHRQ, CDC, FDA, NIH) and the Centers for Medicare & Medicaid Services; private insurance (larger employer sponsored, small employer sponsored, individuals, exchanges); charities; aid to the poor. Relationship types: regulation, hierarchical, contracts.]
Government actors include those at the federal, state, and local levels. Both the federal and state governments have executive, legislative, and judicial branches (although the figure only shows this for the federal government). Under the executive branch of the federal government, the Department of Health and Human Services (HHS) plays the largest administrative role in the US health-care system. HHS includes agencies such as the Centers for Medicare and Medicaid Services (CMS), which administers the two major public health insurance programs: (1) Medicare, which provides near-universal coverage for those 65 and older as well as the disabled and those with end-stage renal disease, and (2) Medicaid and the Children’s Health Insurance Program (CHIP), which primarily provide insurance for some low-income families and those with disabilities. Medicaid also covers long-term care services after individuals have used up all their own income and assets and, along with Medicare, low-income seniors (referred to as “dual eligibles”). Other agencies within HHS include research and regulatory agencies such as the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the National Institutes of Health (NIH). The Office of Veterans Affairs, which oversees the Veterans Health Administration to provide care to military veterans, is a federal agency independent of HHS.

Public purchasers include federal and state agencies. Medicare is the largest public purchaser. State governments, along with funds provided by the federal government, purchase health-care services through Medicaid and CHIP, although both programs are state-administered. Both state and local governments are also involved in providing health care in a number of ways, making it possible for low-income and other disadvantaged individuals and families to obtain care. These include operating public hospitals as well as providing medical and preventive services through state and local health departments and their associated clinics and community health centers.

In addition to government purchasers, private insurers and individuals also purchase health care in the US. Private insurance plans have historically been categorized into three types: health maintenance organization (HMO) plans that provide or contract to provide managed care, preferred provider organization (PPO) plans that contract with a preferred network of providers to provide care at lower costs, and high-deductible health plans (HDHPs) that typically offer lower premiums but higher deductibles than HMOs and PPOs. The vast majority of Americans with private insurance obtain it through an employer. The Patient Protection and Affordable Care Act (ACA), signed into law on March 23, 2010, is resulting in significant changes in the US health-care system. As shown in Fig. 1, these include the establishment of federal and state-based insurance exchanges where individuals without access to public or employer-based insurance can purchase private coverage as mandated by law. The ACA also allows providers that organize into Accountable Care Organizations (ACOs) to share in savings they achieve in the Medicare program.

Planning

There is a range of public and private organizations that undertake health system planning in the US. In spite of this, coordinated health planning by the various actors outlined in Fig. 1 is not highly developed. In part this reflects the pluralist and market-oriented nature of the US health-care system. Planning for emergencies and natural disasters, however, is given serious consideration in both the government and private sector. For example, the CDC plans for national and international response to public health emergencies.

Regulation

All actors in the health-care system are subject to regulation, often from multiple government and nongovernment agencies. Major federal regulatory organizations fall under the umbrella of HHS and include CMS, which regulates public payments to private providers and provider quality; the CDC, which focuses on prevention and control of communicable and noncommunicable diseases; and the FDA, which regulates food and drug safety. State regulatory bodies include public health departments, provider licensing boards,
896 A. J. Barnes et al.
and insurance commissioners. Local counties and cities also regulate health care through their public health and health service departments, including regulating communicable diseases and restaurant safety. Independent nongovernment and provider organizations such as the American Medical Association (for physicians) and the Joint Commission (for hospitals) also play a regulatory role in the US health-care system.

Patient Rights

The US does not have a national comprehensive Patient Bill of Rights (WHO August 2007). The right to health care is not in the US Constitution, and it remains controversial, though some states have enacted a Patient Bill of Rights. Some patient rights in the US have been initiated by the court system. For example, the Supreme Court ruled that individuals with disabilities have the right to receive services in non-institutional settings whenever possible. Since the 1990 passage of the Americans with Disabilities Act (ADA), those in the US with physical and/or mental disabilities have been granted additional civil rights. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 governs the security and confidentiality of patient information. As a result of this legislation, how patient information is collected, stored, and transferred is subject to careful protection.

implementation of the ACA and the expansion of the individual private insurance market through income-based subsidies, nearly 16 million Americans have individually purchased coverage, at least half of whom purchased private insurance through one of the federal or state-based exchanges. In 2016, 2 years after the implementation of the ACA’s major coverage expansion efforts, approximately 9% of all Americans were uninsured (28 million), including many young adults, minorities, and low-income households (Kaiser Family Foundation, 2017a; Kaiser Family Foundation, 2018a).

Sources of Revenue

The sources of revenues in the US health-care system have changed considerably over the past 40 years. In 1970, one-third of funding was from out-of-pocket payments. Currently, public sources constitute 37% of spending, private sources 34%, and out-of-pocket payments 11% (CMS 2016). While out-of-pocket payments have fallen as a percentage of the total, real out-of-pocket spending per person has actually risen considerably. This is because the size of the health-care system has grown so rapidly.

Financing and Financial Flows
[Figure: financial flows in the US health-care system. Legend: direct payments, transfer flows, service flows. Parties shown: individuals, charities, patients, primary care physicians, specialists, hospitals, prescriptions, other providers. Flow labels: premiums; income, sales, property, and corporate taxes; payroll taxes; co-payment, self-pay; negotiated discounts; formularies; FFS; Cap; Salary; DRGs; per diem; CR; DSH; various.]
sharing. Individuals may purchase non-group coverage outside of the employment market. In addition to payroll taxes, individuals contribute to general federal and state revenue funds to finance public health-care coverage through income, sales, and property taxes. There is no value-added tax (VAT) in the US.

In the past, care for low-income and uninsured individuals has been financed through private charities, a safety net system of public and community clinics, as well as by hospitals and physicians. Additional funding came from general tax revenues, but in many cases the care received was uncompensated and therefore was borne by providers. Prior to the ACA, it was estimated that of the $57 billion in uncompensated care expenditures, hospitals contributed 61% and physicians 14%, with the remainder coming from a variety of community organizations (Kaiser Family Foundation 2013). In 2011, the federal government, through the Medicaid Disproportionate Share Hospital (DSH) program, allotted $11.2 billion to hospitals serving a disproportionate number of uninsured and Medicaid patients (Kaiser Family Foundation 2013). These payments were expected to decrease as the ACA was fully implemented and many of the uninsured and those with preexisting conditions acquired health insurance. However, many states have not expanded Medicaid, leaving a number of uninsured continuing to require uncompensated hospital care, and subsequent legislation delayed reducing DSH payments to hospitals (Kaiser Family Foundation 2016c).
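As a rough check on the uncompensated care figures, the percentages quoted above can be converted into dollar amounts. The sketch below uses only the numbers in the text ($57 billion, 61%, 14%); the remainder is attributed to community organizations as the text states, and the function and variable names are illustrative.

```python
# Split of pre-ACA uncompensated care, using the figures quoted in the text.

TOTAL_BILLION = 57          # total uncompensated care, $ billions (from the text)
SHARE_PCT = {
    "hospitals": 61,        # from the text
    "physicians": 14,       # from the text
}
# The text attributes the remainder to a variety of community organizations.
SHARE_PCT["community organizations"] = 100 - sum(SHARE_PCT.values())


def amounts_in_billions(total: float, shares_pct: dict) -> dict:
    """Convert percentage shares into dollar amounts (same unit as total)."""
    return {name: round(total * pct / 100, 2) for name, pct in shares_pct.items()}


if __name__ == "__main__":
    for name, amount in amounts_in_billions(TOTAL_BILLION, SHARE_PCT).items():
        print(f"{name}: ${amount} billion")
```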
898 A. J. Barnes et al.
In the US, how health services are paid for depends on the service provided, the type of health worker providing it, the funder, as well as where the service is provided (e.g., hospital or ambulatory care center, California or New York). Given this complexity, the payment mechanisms for each type of health service are shown according to the payer involved (e.g., Medicare, insurers, and health plans) in Table 1.

Medicare

The Medicare program provides health insurance coverage to nearly all Americans age 65 and older as well as to many disabled Americans and people with end-stage renal disease – a total of about 55 million people. It covers medically necessary care with the exception of extended long-term care and dental care. Medicare is divided into four parts, labeled Parts A, B, C, and D. Part A, hospital coverage, includes not only hospital care but also some post-acute nursing home, home health care, and hospice care. Part B, supplemental medical insurance, is a voluntary program with essentially the same eligibility requirements as Part A. It covers physicians’ services (both inpatient and outpatient); outpatient care; medical equipment, tests, and X-rays; home health care; some preventive care; and a variety of other medical services. Despite its voluntary nature, about 95% of those eligible enroll in it because it is heavily subsidized.

Part C, Medicare Advantage, is an alternative to Parts A and B. Enrollment is voluntary. It provides coverage for the same services and, at the discretion of the organization offering coverage, sometimes additional benefits such as vision or hearing. One of the main differences between Part C and the preceding two parts, which are sometimes called “traditional Medicare,” is that Part C coverage is offered through private organizations (e.g., insurers and HMOs). In 2017, 33% of Medicare beneficiaries were enrolled in Medicare Advantage plans, but aspects of the ACA could lead to reductions in enrollment in the future (Kaiser Family Foundation 2017b).

Part D, prescription drug coverage, began in 2006 and is also voluntary. Like Part C, Part D benefits are provided through private insurers. There are dozens of Part D plans in each state – in addition to dozens of Medicare Advantage plans providing drug coverage in many urban areas. Also like Part C, premiums and benefits vary by plan, with competition occurring based not only on premium differences but also on differences in benefits and, in particular, the drugs that are included on a plan’s formulary that are listed as “preferred” drugs and which therefore are subject to lower patient co-payments. Over 70% of Medicare beneficiaries are covered under Part D. Most other beneficiaries have drug coverage from another source, such as coverage from a former employer, but 12% do not have any drug coverage (Kaiser Family Foundation 2017c).
40 Health System in the USA 899
February 2018, 33 states and the District of Columbia had expanded their Medicaid coverage in accordance with the ACA, and 18 had not (Kaiser Family Foundation 2018b). In those states that have chosen to expand, all adults and children below 138% of the federal poverty level (FPL) are now eligible for Medicaid. (In 2017, the federal poverty level was $12,060 for a single individual and $24,600 for a family of four (Healthcare.gov 2018).)

In the other states, children and pregnant women have the most liberal eligibility requirements. States are required to cover pregnant women and children up to age six if their incomes are at or below 138% of the federal poverty level (FPL) and children ages 6–18 up to 100% of the FPL. Many states employ even higher, or more generous, income eligibility thresholds. When combined with CHIP coverage, the median state provides coverage to children up to 235% of the FPL and pregnant women up to 185%. To illustrate the critical role that Medicaid plays for pregnant women, the program pays for 45% of all births in the US. Coverage is somewhat narrower for seniors and the disabled, however, with eligibility mandated up to 75% of the FPL.

In the 18 states that have not expanded coverage, low-income parents of dependent children face the most stringent eligibility requirements. Nine states cover them only if their incomes are below 40% of the FPL – with Alabama and Texas providing such coverage only up to 18% of the FPL (i.e., an annual income even as low as $2,200 would disqualify an individual from coverage in those states). In contrast, Connecticut and the District of Columbia cover these adults at 200% of the FPL or higher, taking advantage of the joint funding by the federal government. Recently, several states have either considered or passed legislation that would also impose work requirements on many Medicaid recipients of working age (Kaiser Family Foundation 2018b). This illustrates the large variation in breadth of coverage that currently exists between states, although this variation has been reduced considerably as a result of the ACA.

Beginning in 2014, states that choose to expand their Medicaid coverage will receive 100% of the costs from the federal government to add all poor people and the near poor up to 138% of the poverty level to Medicaid rolls for 4 years. The federal contribution will gradually decrease to 90%.

Several states have petitioned the federal government for special arrangements in their Medicaid expansion, and they have received approval to proceed. These are called “1115 demonstration waivers” and typically involve exceptions to the usual Medicaid rules that are budget neutral for CMS. Examples include charging a co-pay or premium to recipients for services, imposing a penalty for nonpayment of premiums, including work requirements, offering “wellness incentive” programs, and structuring the program like a health savings account (HSA). As of February 2018, 35 states have received waivers from CMS to tailor their own Medicaid programs (Kaiser Family Foundation 2018c).

The initial evidence on the effectiveness of these innovations to save money, improve the quality of care, and/or improve population health is limited. However, states are required by CMS to report such evidence during the demonstration waiver. Almost all of the waivers add to the complexity of the Medicaid program and could increase the cost of administration. This will be evaluated by CMS going forward. In the tradition of American federalism, successful innovations could spread to other states.

The scope of coverage under Medicaid is generally wide but varies by state. Federal law requires that states provide the following services: inpatient and outpatient hospital, physician, nurse practitioner, laboratory and radiology, nursing home and home health care for those age 21 and older, health screening for those under age 21, family planning, and transportation. Other services are optional for states. This designation means that if a state chooses to cover the service, it will receive matching funds from the federal government. Optional services include some major services such as prescription drugs and dental care but also such things as care provided by professionals besides physicians and nurse practitioners, durable medical equipment, eyeglasses, rehabilitation, various types of
institutional care, home- and community-based services, personal care services, and hospice.

In general, those eligible for Medicaid receive services at little or no cost. However, states sometimes put restrictions on the number of services that are covered per year. Moreover, payments to physicians are usually low. In 2013, about 30% of physicians reported that they would not take new Medicaid patients (Decker 2013). Psychiatrists were the most likely to reject new Medicaid patients (56%), and cardiovascular disease specialists see the most, with only 9% rejecting such patients (Decker 2013).

One development with the potential to provide more mainstream access to physician office care is the movement toward the use of managed care in the Medicaid program. Over 70% of Medicaid beneficiaries are in managed care plans. The exact nature of these arrangements varies from state to state. Some include capitation (rather than fee-for-service) for providers and/or primary care case management. States often prefer managed care both as a means of enhancing quality and controlling costs and are likely to rely on it as the program expands through provisions in the ACA.

Private Insurance

In 2016, 179 million Americans were covered by private insurance; 157 million of these had employer-sponsored coverage (Kaiser Family Foundation 2016d). While having employer-sponsored insurance is almost always advantageous – employers generally subsidize premiums – it is not available to everyone. First, it is necessary to be employed or be a family member of someone employed. Second, the employer has to offer coverage; until 2015 or 2016, it was completely voluntary on the part of the employer. Third, if coverage is offered, the employee has to be eligible for it. And fourth, even if eligible, the employee has to be willing to pay the employee’s share of the premiums, which can be considerable. It is the people who are better-off economically who are able to meet the four conditions mentioned above. Individuals and families without an entry into the employer insurance market, and who are not eligible for Medicare and Medicaid, often seek coverage individually. Historically, individual coverage has had several disadvantages over employer group coverage and therefore was normally purchased only if the employer-sponsored coverage was unavailable. Prior to the ACA, plans purchased in the individual private market were usually unsubsidized; administrative costs tended to be high (25–40%); health examinations were often necessary; cost-sharing requirements were, on average, higher; and fewer types of services tended to be covered. However, the individual market is changing substantially with the creation of the health insurance exchanges under the ACA.

Some employers, particularly larger ones, offer a choice of health insurance products to their employees. Among firms offering a choice, only about 20% of employees nationally can choose among three or more plans (California HealthCare Foundation 2009). For federal government employees, there can be dozens of choices. Employees with a choice can generally switch to a different plan irrespective of their health history or status once per year.

Historically the most common arrangement offered by employers was a PPO. Among all covered workers, in 2017 48% were enrolled in PPOs, 14% in HMOs, 10% in point of service plans (POS – a blend of HMO and PPO arrangements that allow members to seek care from non-network providers at a higher cost), 28% in high-deductible plans (note that some of these may be PPOs or HMOs), and less than 1% in conventional insurance (traditional fee-for-service) plans (Kaiser Family Foundation 2017d). The biggest change in recent years has been the relatively rapid rise of high-deductible plans with a savings option, many of which are classified as health savings accounts (HSAs). In HSAs, the policy holder agrees to purchase insurance with a high deductible (currently averaging about $2,200 annually for individual coverage and twice that for family coverage). Premium contributions can be made by the individual and/or employer. These contributions are tax deductible, can accumulate year to year if unspent, and therefore can be used for future medical
expense. They can be withdrawn to pay for eligible medical care.

Market share in health insurance is dominated by larger firms that generally market nationally. (Blue Cross Blue Shield plans, while having a national presence, usually market in individual states.) In 2013, three of the largest insurers covered 80% of people enrolled in individual, small group, and large group private insurance markets in at least 37 states (US Government Accountability Office 2014).

Prior to January 2014, insurers priced their products in two ways: experience rating and community rating. Under experience rating, the most common technique used, insurers charged employers (or individuals) on the basis of the past cost experiences or, when data were lacking, on predicted expenditures. In contrast, community rating entailed charging the same amount to all groups (or even individuals). In the individual insurance market, premiums were generally experience-rated. Each individual went through medical underwriting in which their risks were assessed.

Under the ACA, state-based exchanges combined with the individual mandate to purchase insurance are intended to reduce adverse selection problems in the individual and small group market by requiring plans selling in exchanges to use community rating (older individuals can be charged more than the younger, but differences within age cohorts will be prohibited), rather than experience rating, and by increasing risk pooling to a far greater extent than has been the case in the past in the US. Exchanges will also reduce or eliminate the need for individuals to purchase insurance through agents or brokers, whose fees can absorb 20% of the total premium during the first year of enrollment (Whitmore et al. 2011). One of the key requirements of the ACA is that individuals purchase coverage or pay a penalty. Similarly, firms with more than 50 employees will also have to provide coverage or pay a penalty. These “sticks,” combined with the “carrots” of subsidies for individuals to purchase coverage, will, it is hoped, lead to a system where community rating will be viable.

There are significant user charges associated with private insurance. Beginning with premiums, the average cost of employer-based single coverage was $6,690 in 2017, 18% of which was paid by the employee. For family coverage, it was 31% of the total cost of $18,764. The percentage of family coverage paid by the employee has risen considerably over the past decade – by 6.8% per year compared to 4.8% for the share paid by the employer (Kaiser Family Foundation 2017a). This is one of several examples of how employers have shifted more costs onto employees as health-care costs have risen.

As is the case in many high-income countries, there are often substantial co-payments for prescription drugs. In most employer-sponsored plans, there are multiple “tiers,” each of which has its own cost-sharing requirements. Their purpose is mainly to encourage the use of cheaper drugs, particularly generics, the use of which has grown substantially in recent years. One way in which employer coverage tends to be more generous than Medicare’s is that there is usually a limit on annual out-of-pocket expenditures. Over 80% of employer-sponsored health plans establish such a maximum. In 2014 the median out-of-pocket maximum for an employee with individual coverage was approximately $6,000 (Kaiser Family Foundation 2014a).

Administrative costs tend to be higher in private insurance than government-sponsored programs like Medicare and Medicaid. This is a result of several factors in addition to the need for profits. Private insurers engage in “underwriting” activities, which involve examining past claim expenses to determine a competitive, yet still profitable premium to charge. They also need to market and advertise since, unlike government programs, they do not have a captive audience. Finally, to protect themselves against unexpectedly high claims, insurers often need to factor in a risk premium. Estimates vary on the size of administrative costs (including profits and taxes). Most agree, however, that administrative costs are much higher for insurance policies covering individuals and small firms. One study, conducted by a US actuarial firm, estimated that in 2003, private insurers spent 16.7% on administrative costs. Among the latter, administrative costs were estimated to be 30% in the individual
[Figure 3: line chart of beds per 1,000 population by year (1970, 1980, 1990, 1995, 2000, 2005, 2010, 2011, 2012); series: psychiatric institutions, community hospitals, skilled nursing homes]

Fig. 3 Number of beds in US community hospitals, psychiatric institutions, and nursing homes per 1,000 population, 1970–2012 (Notes: Community hospitals are defined as nonfederal, short-term general, and other specialized hospitals. The types of facilities included in the category of community hospitals have changed over time. Psychiatric institutions are defined as all 24-h psychiatric hospitals and residential treatment organizations. Skilled nursing homes are those that are certified with the Centers for Medicare and Medicaid Services. Sources: (1) For community hospitals: Health United States, 2006, 2007, 2008, 2009, 2011. (2) For psychiatric hospitals: Foley et al. (2004), DHHS pub. no. (SMA)-06-4195, chap. 19; Health, United States, 2009, Table 119; Health, United States, 2011, Table 117. (3) For skilled nursing homes: Health, United States, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011)
Health Information Technology for Economic and Clinical Health (HITECH) Act was passed. It provides $30 billion to hospitals to adopt EHRs. Hospitals must build systems that have “meaningful use” in stages of increasingly advanced requirements (Adler-Milstein et al. 2014). In addition, the ACA has incentivized physicians and hospitals to adopt EHRs by encouraging innovations such as ACOs, which are difficult to run without an EHR (Adler-Milstein et al. 2014).

Human Resources

Health-Care Workforce
Table 3 presents the numbers of workers employed in several health-care occupations between 1990 and 2014. Increases in employment occurred with most health-care diagnosing and treating practitioners, such as physicians, chiropractors, registered nurses (RNs), and therapist occupations. Employment also increased with most of the technologist and technician occupations and all of the support occupations. Employment fell for dentists, physician assistants, and clinical laboratory personnel.

International Mobility
The numbers of US health-care professionals include immigrants to the US and exclude emigrants from the US. In 2014, 26% of physicians and 24% of residents in specialty programs in the US were international medical graduates (Ranasinghe 2015). Over 8% of the US nursing workforce in 2004 consisted of international nursing graduates (US DHHS 2010).

Although immigrants add to the health-care workforce supply, there is no evidence that they improve distributional issues. Furthermore, a reliance on immigration reduces the incentive to
Table 3 Employed US health-care personnel per 1,000 population, 1990–2014 (selected occupations)
Occupation                                            1990  1995  2000  2005  2010  2011  2012  2013  2014  % chng
Health-care diagnosing and treating practitioners
Chiropractors                                            –     –  0.15  0.28  0.18  0.18  0.19  0.18  0.21    0.56
Dentists                                              0.64  0.59  0.61  0.55  0.57  0.58  0.53  0.58  0.60    0.01
Optometrists                                          0.09  0.13  0.12  0.14  0.12  0.09  0.11  0.13  0.15    0.27
Pharmacists                                           0.69  0.65  0.80  0.84  0.83  0.88  0.91  0.88  0.92    0.15
Physicians and surgeons                               2.32  2.64  2.62  2.81  2.82  2.64  2.91  2.95  3.18    0.20
Physician assistants                                     –     –  0.15  0.25  0.32  0.26  0.35  0.41  0.26    0.84
Podiatrists                                           0.06  0.04  0.02  0.04  0.04  0.02  0.03  0.04  0.03    0.01
Registered nurses                                     6.70  7.52  7.79  8.17  9.21  8.68  9.19  9.15  9.06    0.16
Occupational therapists                               0.15  0.20  0.20  0.29  0.35  0.36  0.38  0.35  0.35    0.59
Physical therapists                                   0.37  0.49  0.51  0.60  0.61  0.71  0.67  0.71  0.77    0.45
Respiratory therapists                                0.25  0.36  0.27  0.32  0.42  0.43  0.35  0.35  0.35    0.27
Speech-language therapists (pathologists)             0.25  0.35  0.31  0.33  0.43  0.40  0.47  0.43  0.43    0.35
Health-care technologists and technicians
Clinical laboratory technologists and technicians     1.20  1.42  1.02  1.13  1.11  1.03  1.02  1.08  0.92    0.10
Dental hygienists                                     0.35  0.36  0.39  0.45  0.46  0.47  0.52  0.58  0.55    0.35
Licensed practical and licensed vocational nurses     1.77  1.52  1.81  1.72  1.86  1.80  1.70  1.77  2.01    0.11
Medical records and health information technicians    0.28  0.08  0.31  0.41  0.38  0.37  0.29  0.28  0.43    0.36
Health-care support occupations
Nursing, psychiatric, and home health aides           5.87  6.69  5.24  6.42  6.24  6.36  6.77  6.75  6.21    0.16
Dental assistants                                     0.76  0.80  0.76  0.88  0.97  0.98  0.88  0.88  0.86    0.12
Sources: Current Population Survey (CPS), Bureau of Labor Statistics, HRSA, DHHS; US Census Bureau, Census 1990,
2000, 2010, and population estimates 2011–2014
Notes: Dashes indicate data are not available. % change is from 1990 to 2014 or from the earliest year. A new occupational
classification system for occupational employment (SOC) was introduced by the CPS in 2003. The 1990 and 1995 data are
based on the old classification system and may not be fully comparable to later data. The table reports numbers employed
rather than full-time equivalents (FTEs), so the actual amount of human resources employed may be less than that
reflected in the table due to part-time employment. On the other hand, since these are employment numbers, the total
number of individuals in each occupation would be larger if unemployed individuals were counted
Calculations: Employment and population were rounded to three decimal places
expand educational capacity, raise wages, and improve working conditions (Flynn and Aiken 2002). Finally, migration from low-income countries is a “brain drain” for those countries (Aiken 2007).

Distribution
The US has a high proportion of specialists to primary care physicians (around 1.5 times as many in 2012) (Hing and Hsiao 2014). Further, the primary care physician to population ratio in rural areas is only 4/5 that of urban areas (Hing and Hsiao, 2014). In nursing, the biggest distributional issue is the low number of RN faculty (AACN 2017). This creates bottlenecks in the educational process and contributes to nursing shortages (AACN 2017). The ACA includes policies aimed at improving supply and distribution issues related to primary care including scholarships and loan repayment programs for primary care physicians, short-term increases in primary care payment rates for Medicaid, and additional
support for Federally Qualified Health Centers to provide essential health services to more uninsured and low-income patients.

Adequacy
Projections of the adequacy of physicians using several forecasting models indicate a future shortage of physicians of 5–20% by 2020 (COGME 2005; BHPr 2008). Other projections indicate that a smaller increase in supply would be needed if distributional issues were improved or if there was an increased use of nonphysician providers and osteopaths (Weiner 2007). In nursing, forecasters unanimously predict a large future shortage (BHPr 2010).

Provision of Health-Care Services

The US has several major health-care sectors, including public health, primary, specialty, acute inpatient, dental, mental health, pharmaceutical, post-acute, long-term, and palliative care. Access to these services and navigation through the US health-care system differs depending upon the care that is needed and whether an individual is insured or uninsured. Insured individuals tend to enter the health-care system through a primary care or specialty provider. Uninsured individuals often do not have a regular primary care provider but instead may visit community health centers and emergency departments. Due to out-of-pocket costs, they may be reluctant or unable to seek care unless they are experiencing an emergency.

Public Health

Public health focuses on promoting health at the population level through investigating and intervening in the environmental, social, and behavioral factors in health and disease. It emphasizes prevention and health promotion (Shi and Singh 2012). Public health is promoted mostly through public agencies. At the federal level, public health services are headed by the US Public Health Service (USPHS), a division of HHS. There are several subdivisions within the USPHS, such as the CDC. Federal laws allow state health agencies to determine the scope and amount of services and to establish the vehicles for providing those services. As a result, the services vary significantly across the states. Local public health agencies at the county or city levels (“health departments”) carry out many public health functions (Salinsky 2010).

Public health services include communicable disease control, environmental hazard prevention, emergency terrorism preparedness and response, occupational health, health promotion and screening, and licensing, regulation, and planning of health-care facilities and providers.

Outpatient Services

Primary Care
In 2010, 55% of the visits to physicians in the US were to a primary care physician (US Department of Health and Human Services 2014). Primary care practitioners are physicians, nurse practitioners, physician assistants, and nurse midwives who are generalists or who specialize in family medicine, internal medicine, pediatrics, obstetrics, and gynecology (Bodenheimer and Pham 2010).

Access to primary care requires that patients have the ability to pay for care, adequate transportation to care, and the health literacy to demand and use the care; it also requires that the supply, distribution, and time of providers are adequate (Shi and Singh 2012). For these reasons, the uninsured and those with insurance but unable to afford high out-of-pocket costs due to inadequate coverage have difficulty accessing primary care. Additionally, those covered by Medicaid may experience problems accessing primary care due to their inability to find a private physician that accepts Medicaid patients (Shi and Singh 2012).

Specialty Care
Forty-five percent of visits to physicians in the US in 2010 were to specialists (US Department of Health and Human Services 2014, Tables 91, 92). Many of the issues with access to primary
care are even more of a concern with specialty care. Care coordination among primary care and specialist providers is a growing issue in the US, where the typical Medicare beneficiary sees two primary care physicians and five specialists a year, and patients with multiple conditions may see up to sixteen physicians (Bodenheimer 2008). This can lead to over-, under-, and conflicting treatment and polypharmacy. Two initiatives to improve care coordination in the US are patient-centered medical homes (PCMHs) and ACOs (Phillips and Bazemore 2010; CMS 2012). In PCMHs each patient has an ongoing relationship with a primary care provider, who directs the medical team, and the patient’s care is coordinated across all health-care settings, with patients actively participating in decision-making (Rittenhouse et al. 2011). In ACOs payment from Medicare is tied to the performance of the provider organization, thus conferring financial risks and rewards for care management and patient outcomes to providers.

Emergency Care
Emergency departments (EDs) are a major part of the US health-care safety net (Shen and Hsia 2010). EDs in hospitals that receive payment from Medicare are required by the Emergency Medical Treatment and Active Labor Act (EMTALA) to provide care to anyone needing emergency treatment. Hospitals must care for the individuals until they are stable. This allows under- and uninsured persons access to the ED for emergency conditions.

EDs tend to be overused for nonurgent problems and for serious problems that could have been prevented with better primary and specialty care. ED overcrowding, long wait times, hospital diversions, the lack of ED space and staff, and patient boarding have been problems for many years (GAO 2009).

Urgent Care
Urgent care is walk-in care provided outside the ED setting in centers that are open in the evening on weekdays and at least 1 day over the weekend (Weinick et al. 2009). Services focus on acute episodic care for minor illnesses and emergencies such as upper respiratory infections, lacerations, and fractures. Medical care is typically performed by family physicians, nurse practitioners, and physician assistants (Weinick et al. 2009).

In 2011 there were more than 9,000 urgent care centers (UCCs) in the US (Yee et al. 2013). Urgent care services have expanded in response to difficulties in seeing primary care practitioners on an urgent basis and after-hours, high ED costs, and long ED wait times (Yee et al. 2013). Some individuals use UCCs because they do not have a regular source of primary care. An individual must have insurance or pay out-of-pocket for care.

Retail Clinics
Located in pharmacies, grocery stores, and department stores, retail clinics are emerging as places to go for treatment of minor medical conditions (RAND 2010). They tend to be staffed by nonphysician practitioners, such as nurse practitioners or physician assistants, and they treat a limited number of conditions and needs, such as skin conditions, sore throats, pregnancy testing, infections, diabetes screening, and immunizations (Mehrotra et al. 2008).

Acute Inpatient Care

Individuals who are acutely ill and need to have round-the-clock care require inpatient care provided in hospitals. The availability of hospital services depends upon the insurance status of the individual seeking care, the type of hospital, and the geographic area. For those who have private or public insurance, care is accessed through a physician referral to a hospital that the physician recommends and that is in the insurance provider network. For those without insurance, access to care depends upon how sick they are.

When an uninsured patient’s condition is not an emergency (such as planned surgery), access to hospital care becomes dependent upon hospital ownership. Government-owned hospitals must provide charity care to those who do not have insurance or cannot pay for out-of-pocket portions of their care (Weissman et al. 2003). These hospitals provide the majority of charity care in the US (Weiner et al. 2008). Charity care is also provided
40 Health System in the USA 909
problem and is best treated with pharmaceuticals or other products, or has been pharmaceuticalized, is controversial (Metzl and Herzig 2007).

Long-Term Care

the elderly are fully covered by Medicare, the number of uninsured individuals needing hospice care is quite small (Lorenz et al. 2003). For the small number of individuals without insurance coverage, hospices may provide care regardless of ability to pay (Pietroburgo 2006).
family size, and tobacco use. A calculator available on the health insurance exchange website allows those seeking insurance to determine the approximate amount of subsidy they will receive (Kaiser Family Foundation 2018d).

Health insurance exchanges have been set up by states or the federal government to make it easier for consumers to compare and choose health insurance policies by providing information in a standardized form. Policies are regulated as to what they must cover. Insurers selling through the exchanges cannot reject an applicant due to health status, nor can they charge more to those with a history of preexisting medical conditions. Premiums can, however, vary based on age, smoking status, and geographic location. No annual or lifetime limits can be placed on the value of insurance coverage. There are also minimums on the percent of premiums insurers must use for the health benefits of those who purchase policies.

The ACA also set Medicaid eligibility standards that were more generous than those in effect in many states. The law made the federal government responsible for most of the cost of this expansion of Medicaid (90–100%) in states that were below the new national standard. However, as a result of the Supreme Court ruling in 2012, states were given the option of not expanding Medicaid. As of early 2018, 32 states and D.C. have expanded Medicaid with the others working on waivers or not taking action at this time (Kaiser Family Foundation 2018c, 2018d). They may, however, choose to participate in subsequent years. In June of 2015, the Supreme Court ruled on the King v. Burwell case. King challenged the legality of federal subsidies awarded to those purchasing health insurance on federal insurance exchanges. When the ACA was drafted and adopted into law, wording indicated that subsidies would be available to those who enrolled in an exchange “established by the state,” and King argued that the federal exchanges were not established by a state and therefore could not offer subsidies. The case was critical to the survival of the ACA because initially most states (34) failed to establish their own exchange. The federal government had stepped in to set one up in each of these states. In some cases the federal government was invited to do this by the state itself, but in other cases the state refused to set up its own exchange as a means to protest against the ACA. The Supreme Court sided with the Obama administration (Burwell) and ruled that the intent of Congress had been to provide subsidies on all exchanges across the USA.

Medicare benefits were enhanced by the ACA. Preventive services are covered without a co-payment from the patient. Over time, the coverage gap (“doughnut hole”) for prescription drug coverage is being removed. Medicare Advantage plans (private, outsourced forms of managed care Medicare) are experiencing reductions in how much they are paid by the federal government to take care of Medicare patients because of evidence that they have been paid much more than their costs in the past. Those achieving higher quality scores for care receive bonuses and those with lower scores, financial penalties.

Employers with 50 or more employees must offer health insurance or face a penalty. This mandate became effective in 2015. Employers with fewer employees do not have to provide coverage. Some small employers receive tax credits to offer coverage.

Providers who choose to organize into ACOs have the opportunity to share in any savings they accrue, initially from Medicare, but eventually other payers may participate as well. The ACA includes experiments with innovative payment systems that avoid the problems inherent in fee-for-service reimbursement. Bundled service payments are an example. Scholarships and loans included in the ACA are intended to encourage more primary care physicians to work in underserved rural and urban areas. Cost control policies in the ACA included the formation of an Independent Payment Advisory Board to keep Medicare spending in line with economic growth. Additionally, while the ACA forbids the use of cost-effectiveness research in determining service coverage and reimbursement under Medicare, the law established the Patient-Centered Outcomes Research Institute to spur comparative effectiveness research in the health-care sector.

The ACA was designed to be budget neutral. To help pay for the ACA, high-income individuals
912 A. J. Barnes et al.
and families pay higher taxes on unearned and investment income, and they pay higher payroll taxes to finance Medicare. A tax was added to some medical devices and to services offered by tanning salons. There is also a tax on “Cadillac” or high-benefit health insurance plans offered by employers, although numerous postponements in Congress have delayed levying the tax until at least 2020. In the end the ACA is redistributive from the healthier to the sicker and from the wealthier to the poorer.

The ACA was adopted by a small margin in Congress, and opposition to this reform remains strong. But today it is the law and it is unlikely that it will be completely reversed. Voters and stakeholders become accustomed to the benefits they receive, and removing them is increasingly difficult as time passes. Revisions to the ACA will be ongoing; health system reform is never final. New legislation may be necessary to resolve dilemmas that were overlooked or impossible to resolve at the time the ACA was adopted by Congress. While the current Republican President Donald J. Trump made repealing and replacing the ACA a central focus of his 2016 presidential campaign, widespread opposition to repealing the benefits of the ACA undermined efforts to remove some of its protections. Nonetheless, Congress repealed the individual mandate to purchase health insurance (effective in 2019). Other strategies to reduce ACA protections included a 2017 Executive Order by President Trump for agencies to explore options that would expand short-term health insurance and other less comprehensive forms of health coverage, relax rules about associations offering less comprehensive coverage to members, shorten the sign-up period for individual coverage, reduce outreach for enrollment for individual coverage, and attempt to cut spending on federal subsidies offered to help individuals purchase health insurance through the federal exchange. Despite these efforts, and the uncertainty and increased costs they created in many state exchanges, enrollment in the exchanges fell only 5% in 2018 compared to the previous year (Kaiser Family Foundation 2018a). This suggests that the popularity of the expanded coverage afforded by the ACA endures, creating challenges as legislators from both parties try to shape the U.S. health care system moving forward.

Assessment

Overview

The US health system has both considerable strengths and notable weaknesses. These are discussed in the following sections in the context of access, quality and outcomes, and expenditures from US and international perspectives.

Access

In 2013, just prior to the main provisions of the ACA being implemented, it was estimated that 44.6 million Americans under the age of 65 (16.7%) were uninsured (US Department of Health and Human Services 2014, Table 114). This rate had been relatively steady since 2000 except for an uptick during the Great Recession.

The distribution of uninsured was skewed toward those who were economically most vulnerable. In 2013, nearly 30% of the non-elderly with incomes below twice the federally designated poverty level were uninsured, compared to just 5% of those whose income exceeded 400% of the poverty level. Coverage varied considerably by race/ethnicity as well. Among those under age 65, about 16% of non-Hispanic whites, 19% of African Americans, and 14% of Asians were uninsured. This compares to 31% of Hispanics/Latinos (US Department of Health and Human Services 2014, Table 114). Poor and near-poor children were the one group that had increasing insurance coverage over the years. Their uninsurance rate in 2013 was about 7%, less than half that of poor and near-poor parents as well as adults without children. The lower uninsurance rates for poor and near-poor children reflected the success of CHIP.

After nearly 4 years, the 2014 public and private insurance expansions brought about by the ACA have reduced the number of uninsured
Fig. 4 Barriers to health care among non-elderly adults by insurance status, 2016 (Kaiser Family Foundation 2017b)
contrast, the figure was just 7% in the UK and in Germany (High et al. 2017).

A final set of access metrics concerns how promptly care is received. Table 5 shows several indicators of waiting times in 11 high-income countries. The US performed well internationally with regard to seeing a specialist and getting elective surgery, with Germany and France performing best and Norway and Canada worst. The picture is different for primary care. The US ranked 8th out of the 11 countries for seeing a doctor or nurse on the same or next day. This is not surprising. Access to specialty care and surgery is relatively high because there are ample resources and few restrictions on what and how much medical equipment hospitals, other health facilities, and physicians can purchase and own. In contrast, primary care efforts in the US fall behind many other high-income countries (Starfield and Shi 2002).

Outcomes and Quality

The US performs well on some measures of quality and outcomes from an international perspective, while it does not perform so well on others. Performance on some of these measures is discussed next.

Mortality

US life expectancy at birth was 81.2 years in 2015 (Worldbank 2015). It tied for 26th out of the 32 high-income OECD countries, at about 2 years below the median. With respect to infant mortality, US rates have declined substantially over the past two decades but not as fast as in other countries. As a result, the US ranks the highest among the 31 high-income OECD countries in infant mortality (OECD 2015).

Amenable mortality is defined as “premature deaths from causes that should not occur in the presence of timely and effective health care” (Nolte and McKee 2011). Figure 5, adapted from a 2017 Commonwealth Fund report, illustrates that in the 2014 period, the USA had the highest amenable mortality rate among all countries, nearly double that of Switzerland, the country with the lowest figure (Schneider et al. 2017). Typical explanations for the poor US performance compared to other countries with respect to mortality rates
Fig. 5 Mortality amenable to health care (Source: Adapted from Schneider et al. 2017). Data from: European Observatory on Health Systems and Policies (2017). Trends in amenable mortality for selected countries, 2004 and 2014. Data for 2014 in all countries except Canada (2011), France (2013), the Netherlands (2013), New Zealand (2012), Switzerland (2013), and the U.K. (2013). Amenable mortality causes based on Nolte and McKee (2004). Mortality and population data derived from WHO mortality files (Sept. 2016); population data for Canada and the U.S. derived from the Human Mortality Database. Age-specific rates standardized to the European Standard Population (2013).
Unless otherwise noted, all data are from OECD (2015).

Prevention and Screening: The US immunization rates in 2015 were diphtheria, tetanus, and pertussis, 84.6%; measles, 91.9%; hepatitis B, 92.6%; and influenza, 67%. The US is among the lower half of countries for DTP, measles, and hepatitis B. It is, however, among the countries with the highest rates for influenza vaccination. Of the 14 countries the OECD compared on screening rates for breast cancer (mammography) and cervical cancer (Pap smears), the US has the second highest mammography rate for women age 50–69, at 81% (after the Netherlands) among 12 countries, and the highest cervical cancer screening rate for women age 20–69, at 85%, among 11 countries.

Cancer Survival: Cancer survival is often considered a good measure of the quality of a medical care system because high survival rates are related both to preventive (screening) care and to treatment success. The US has been very successful with regard to breast cancer treatment, in part due to the high mammography screening rates. The 5-year survival rate, 89%, is the highest of 18 OECD countries. The US survival rate for cervical cancer of 62%, in contrast, is the third lowest of the 18 countries. For colorectal cancer, however, with a 5-year survival rate of 64%, the US ranks in the top third of the countries.

Asthma Admissions: The hospital admission rate for asthma in the US is one of the highest among the 32 high-income OECD countries, at 89.7 per 100,000 population, with only the Slovak Republic and Korea higher. This is likely the result of a high uninsurance rate and poor preventive care.

Subjective Measures of Quality

The leading source of these data for international comparisons is the Commonwealth Fund, using annual surveys of patients or physicians that have been conducted in up to 11 countries since 2007. The 2011 survey focused on adults with a history of illness, while the 2013 survey examined
nationally representative samples of all adults. The data below are from the 2014 report (Davis et al. 2014).

With regard to care coordination, compared to the other countries, sicker adults in the US had among the highest rates of problems with test results or records not being available when they saw their doctor as well as having duplicate tests ordered. One area in which the US did well was patients receiving a written plan for care after hospital discharge or surgery: at 92%, the US rate was well above those of the other ten countries.

Five metrics of patient safety are shown in Table 6: whether the patient believes a medical mistake was made in treatment, received the wrong medication or dose, received incorrect test results, experienced delays in obtaining abnormal test results, or (among those hospitalized) reported an infection from the hospital stay. For the first four measures, the US ranked near the bottom in patient safety among the 11 countries. However, for the last measure (hospital infections), the US figure was the best (Davis et al. 2014).

Equity of Outcomes

The US suffers from major inequities or disparities in access to health care as well as in health outcomes. A few of the more noteworthy disparities are discussed here (unless noted, all figures are from the US Department of Health and Human Services (2016)). Beginning with infant mortality, the overall rate in 2015 was 5.9 deaths per 1000 live births. The rates for both whites (4.9) and Hispanics/Latinos (5.01) are considerably higher than they are for Asian/Pacific Islanders (3.7). The rate for African Americans, however, is more than double that of whites, at 10.9. The infant mortality rate for American Indians and Alaskan Natives is also high, at 7.7, above the rates for whites, Hispanics, and Asians. Infant mortality also varies considerably by state, with the rate in Massachusetts (4.3) about half that in several states in the South. Given the racial differences just noted, it is not surprising that the states with the highest rates tend to have higher proportions of African American residents. Life expectancy at birth shows similar patterns: In 2015, whites had, on average, a 3.8-year longer life expectancy than African Americans. This gap has narrowed considerably in recent years; in 2006, it was 5.1 years.

This disparity between African Americans and other races also holds for certain diseases. Diabetes rates, for example, are 80% higher among African Americans than whites. For end-stage renal disease, African American incidence and prevalence rates are about three times those of whites. There are disparities by income as well. In the case of diabetes, rates for those below 200% of the FPL are twice those of people above 400% of the FPL. While diet and genetic factors play a strong role in diabetes, disparities in treatment relate to both the medical care system itself and access to it. Similarly, there are different cancer survival rates according to race. Overall 5-year survival rates in the 1999–2006 period were 69% for whites compared to 59% for African Americans. Among ten of the most common types of cancer, whites had higher survival rates for nine of them (all but stomach cancer).

One of the stated objectives of the ACA is to improve quality and outcomes. First, preventive care is encouraged because such services will not be subject to patient co-payments under Medicare and Medicaid. Medicare will also cover one comprehensive risk assessment. Second, ACOs, some believe, can increase quality by encouraging coordination of currently disparate providers and discouraging the provision of unnecessary services. Third, additional comparative effectiveness research will be funded, and fourth, a number of financial incentives based on quality and outcomes are initiated under the legislation. These include reimbursement incentives for hospital performance and value-based payments to providers.

Expenditures

The US spends far more on health care per person than any other country. There is little agreement on why the US is an outlier in this regard. Those on the left often point to what they see as several contributing factors: lack of consolidated
purchasing power among buyers of care, the lack of universal insurance coverage, high marketing and administrative costs among private insurers, too many specialists and not enough primary care doctors, and direct-to-consumer advertising of prescription drugs. Those on the right point to a bloated government bureaucracy and a myriad of regulations that stifle competition, along with medical liability laws that encourage over-provision and overutilization of services. Other factors that observers on both sides point out are high unit prices paid to providers, particularly in the fee-for-service system, proliferation of medical technologies, and unhealthy behaviors.

Per capita spending is more than double the median level for OECD countries, nearly 40% more than the second most expensive country, Switzerland, and health-care expenses constitute
Fig. 6 Cumulative increases in health insurance premiums, workers’ contributions to premiums, inflation, and workers’
earnings, 1999–2014
over one-sixth of the US economy (Hartman et al. 2014). The rate of growth in health-care spending exceeded the GDP growth rate every year from at least the 1960s until 2010, which has increasingly squeezed the finances of all levels of government, employers, and individuals.

Employers and employees also have seen large increases in their contributions to the health-care costs of employer-sponsored health insurance. Between 1999 and 2014, total premiums rose by 191% and the workers’ share by 212%. In contrast, wages rose by only 54% over this period (Fig. 6).

Looking now at changes over time, Fig. 7 illustrates growth in national health expenditure per capita expressed in US purchasing power parities for six countries: Canada, Germany, Japan, the Netherlands, the UK, and the US from 2000 to 2016. Growth rates in the Netherlands and Japan exceed those of the other countries. However, in 2016, US spending was more than double that in the UK because the UK started at such a low level of spending. Thus, when one combines both level of spending and rate of growth, the US is an international outlier.

There are two overall ways in which the ACA may help contain expenditures. First, it includes a number of initiatives that have the potential to change the financing and delivery system. These include encouraging the development and/or growth of ACOs; bundled payment systems, which provide payment for a set of related services usually related to an episode of illness (as opposed to fee-for-service); medical homes (a physician-directed organization that oversees the provision of access to comprehensive care across health-care facilities and over a patient’s life); electronic medical records; and the linking of reimbursement to performance outcomes (initially, for Medicare hospital stays).

In addition, the ACA includes a number of direct mechanisms that could control expenditures, including large cuts in previously expected payment levels to Medicare Advantage (usually, managed care) plans, which in 2012 were estimated to have been paid 7% more than it would have cost for the same individuals to have been enrolled in the traditional fee-for-service Medicare program (Medicare Payment Advisory
Fig. 7 National health expenditures, per capita in six countries, 2000–2016 (Source: OECD 2017)
Commission 2012), the tax on “Cadillac” or high-benefit health insurance plans, and the Independent Payment Advisory Board, which is to recommend ways to reduce Medicare costs if they exceed a certain threshold.

The ACA does not include a number of cost-containment methods that have been employed in some other countries. These include global budgets, coordinating provider payment among public and private insurers (i.e., an “all-payers” system), controlling the supply of resources (e.g., through expenditure targets or technology controls), and using cost-effectiveness research to determine which services should be reimbursed and, if so, how much.

Conclusions

In summary, the US health-care system is among the best in the world in some respects while suffering from significant shortcomings in others. The US is distinguished from its counterparts by its historic distaste for health planning, lack of control over the dissemination of medical technologies, reluctance to take advantage of the potential bargaining power afforded through large government insurers, the lack of centralized prices and prospective budgeting, and, most importantly, the absence of guaranteed insurance coverage.

With the adoption of the Affordable Care Act in 2010, and subsequent legal and policy challenges to its core provisions, the US health care system continues to change. Nonetheless, despite many legal and political challenges, the core provisions of the ACA have endured. The ACA addresses major challenging issues such as geographic variation in the use of services and a bias toward subspecialty rather than primary care services, but mainly through small programs and pilot studies. The types of changes needed in health-care delivery are unlikely to result from legislation. Rather, they need to be innovated and supported by both the public and private sectors as each grapples with the cost, quality, and access issues it faces. They also hinge on changing individual and provider behaviors. Solving the most vexing health-care financing, delivery, and policy issues depends as much on finding a common ground among US policymakers and, more broadly, the American public, as it does on medical, social, behavioral, and organizational sciences.
Gallup. U.S. Uninsured Rate Steady at 12.2% in Fourth Quarter of 2017. 2017. http://news.gallup.com/poll/225383/uninsured-rate-steady-fourth-quarter-2017.aspx. Accessed 8 Feb 2018.
GAO. Hospital emergency departments: crowding continues to occur, and some patients wait longer than recommended time frames. Washington, DC: US Government Accountability Office; 2009. http://www.gao.gov/new.items/d09347.pdf. Accessed 19 Apr 2013.
GAO. Drug Industry: Profits, Research and Development Spending, and Merger and Acquisition Deals. 2017. https://www.gao.gov/assets/690/688472.pdf. Accessed 5 Apr 2018.
Harrison TD. Consolidations and closures: an empirical analysis of exits from the hospital industry. Health Econ. 2007;16(5):457–74.
Hartman M, et al. National Health Care Spending In 2016: Spending And Enrollment Growth Slow After Initial Coverage Expansions. 2017. Health Aff, p.10.1377/hlthaff. http://www.healthaffairs.org/doi/10.1377/hlthaff.2017.1299
Healthcare.gov. Federal Poverty Level. 2018. Available at: https://www.healthcare.gov/glossary/federal-poverty-level-FPL/. Accessed 14 Feb 2018.
Hersh W. A stimulus to define informatics and health information technology. BMC Med Inform Decis Mak. 2009;9:24.
High E, Schneider C, Sarnak DO. Appendix 1. Eleven-Country Summary Scores on Health System Performance. Mirror, Mirror 2017: International Comparison Reflects Flaws and Opportunities for Better U.S. Health Care. 2017. http://www.commonwealthfund.org/interactives/2017/july/mirror-mirror/assets/Schneider_mirror_mirror_2017_Appendices.pdf. Accessed 18 Feb 2018.
Hing E, Hsiao C. State Variability in Supply of Office-based Primary Care Providers: United States 2012. 2014. US Department of Health and Human Services.
Hogan SO, Kissam SM. Measuring meaningful use. Health Aff. 2010;29(4):601–6.
Hsiao C, Hing E. Use and characteristics of electronic health record systems among office-based physician practices: United States, 2001–2013. NCHS Data Brief. 2014;143:1–8.
Kaiser Family Foundation. Kaiser slides. 2012. http://facts.kff.org/. Accessed 19 Apr 2013.
Kaiser Family Foundation. Federal Disproportionate Share (DSH) hospital allotments. 2013. http://kff.org/medicaid/state-indicator/federal-dsh-allotments. Accessed 11 Oct 2013.
Kaiser Family Foundation. Employer health benefits: 2014 annual survey. 2014a. http://files.kff.org/attachment/2014-employer-health-benefits-survey-full-report. Accessed 9 Aug 2015.
Kaiser Family Foundation. Health Care Expenditures per Capita by State of Residence. 2014b. https://www.kff.org/other/state-indicator/health-spending-per-capita/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 9 Aug 2015.
Kaiser Family Foundation. Uninsured Rates for Non-elderly Adults by Gender. 2016a. https://www.kff.org/uninsured/state-indicator/rate-by-gender/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 8 Jul 2018.
Kaiser Family Foundation. Health insurance coverage of total population. 2016b. https://www.kff.org/other/state-indicator/total-population/?currentTimeframe=0&selectedDistributions=medicaid–medicare–other-public&sortModel=%7B%22colId%22. Accessed 21 Feb 2018.
Kaiser Family Foundation. Federal Medicaid Disproportionate Share Hospital (DSH) Allotments. 2016c. https://www.kff.org/medicaid/state-indicator/federal-dsh-allotments/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 21 Feb 2018.
Kaiser Family Foundation. Health Insurance Coverage of the Total Population. 2016d. https://www.kff.org/other/state-indicator/total-population/?dataView=1&currentTimeframe=0&selectedDistributions=employer–non-group–uninsured&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 21 Feb 2018.
Kaiser Family Foundation. Key facts about the uninsured population. 2017a. https://www.kff.org/uninsured/fact-sheet/key-facts-about-the-uninsured-population/. Accessed 8 Feb 2018.
Kaiser Family Foundation. Medicare advantage. 2017b. http://files.kff.org/attachment/Fact-Sheet-Medicare-Advantage. Accessed 21 Mar 2018.
Kaiser Family Foundation. The Medicare Part D Prescription Drug Benefit. 2017c. http://files.kff.org/attachment/Fact-Sheet-The-Medicare-Part-D-Prescription-Drug-Benefit. Accessed 21 Feb 2018.
Kaiser Family Foundation. 2017 Employer Health Benefits Survey. 2017d. https://www.kff.org/report-section/ehbs-2017-summary-of-findings/. Accessed 21 Feb 2018.
Kaiser Family Foundation. Health Insurance Coverage of the Total Population. 2018a. https://www.kff.org/other/state-indicator/total-population/?dataView=0&currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D
Kaiser Family Foundation. Status of State Action on the Medicaid Expansion Decision. 2018b. https://www.kff.org/health-reform/state-indicator/state-activity-around-expanding-medicaid-under-the-affordable-care-act/?currentTimeframe=0&sortModel=%7B%22colId%22:%22Location%22,%22sort%22:%22asc%22%7D. Accessed 14 Feb 2018.
Kaiser Family Foundation. Medicaid waiver tracker: Which states have approved and pending section 1115 Medicaid waivers? 2018c. https://www.kff.org/medicaid/issue-brief/which-states-have-approved-and-pending-section-1115-medicaid-waivers/. Accessed 14 Feb 2018.
Kaiser Family Foundation. Subsidy calculator. 2018d. http://kff.org/interactive/subsidy-calculator/. Accessed 18 Feb 2018.
Kaiser Family Foundation. Marketplace Enrollment, 2014- OECD. OECD.Stat. 2015. http://stats.oecd.org/index.
2018. 2018. https://www.kff.org/health-reform/state- aspx?DataSetCode=HEALTH_STAT
indicator/marketplace-enrollment-2014-2017/?current OECD. OECD.Stat. 2017. http://stats.oecd.org/OECDStat_
Timeframe=0&sortModel=%7B%22colId%22:%22 Metadata/ShowMetadata.ashx?Dataset=SHA&Coords
Location%22,%22sort%22:%22asc%22%7D. Accessed =%5BLOCATION%5D.%5BDEU%5D&ShowOnWeb
21 Mar 2018. =true&Lang=en. Accessed 18 Feb 2018.
Kidsdata.org. Child population, by race/ethnicity. 2015. Phillips RL, Bazemore AW. Primary care and why it mat-
http://www.kidsdata.org/topic/33/child-population-race/ ters for U.S. health system reform. Health Aff. 2010;29
table#fmt=144&loc=2,127,347,1763,331,348,336,171, (5):806–10.
321,345,357,332,324,369,358,362,360,337,327,364, Pietroburgo J. Charity at the deathbed: impacts of public
356,217,353,328,354,323,352,320,339,334,365,343, funding changes on hospice care. Am J Hosp Palliat
330,367,344,355,366,368,265,349,361,4,273,59,370, Med. 2006;23(3):217–23.
326,333,322,341,338,350,342,329,325,359,351,363, Ranasinghe PD. International medical graduates in the US
340,335&tf=79&ch=7,11,726,10,72,9,939&sortCo physician workforce. J Am Osteopath Assoc. 2015;115
lumnId=0&sortType=asc. Accessed 3 Aug 2015. (4):236–41.
Kovner AR, Knickman JR. Health care delivery in the RAND. Health care on aisle 7: the growing phenomenon of
United States. 9th ed. New York: Springer; 2011. retail clinics. RAND Health Research Highlights. Clin
Ling DC, Berndt ER, Frank RG. Economic incentives and Sch Rev. 2010;3(1):10–3.
contracts: the use of psychotropic medications. Rittenhouse D, et al. Small and medium-size physician
Contemp Econ Policy. 2008;26(1):49–72. practices use few patient-centered medical home pro-
Longtermcare.gov. Costs of Care. 2018. https:// cesses. Health Aff (Proj Hope). 2011;30(8):1575–84.
longtermcare.acl.gov/costs-how-to-pay/costs-of-care. Salinsky E. Governmental public health: an overview of
html. Accessed 5 Apr 2018. state and local public health agencies, National Health
Lorenz K, et al. Charity for the dying: who receives Policy Forum, background paper no. 77. Washington,
unreimbursed hospice care? J Palliat Med. 2003;6 DC: George Washington University; 2010. http://www.
(4):585–91. nhpf.org/library/background-papers/BP77_GovPublic
Medicare Payment Advisory Commission. Health care Health_08-18-2010.pdf. Accessed 19 Apr 2013.
spending and the Medicare program. 2012. http:// Schlesinger M, Mitchell S, Gray B. Measuring community
www.medpac.gov/documents/Jun12DataBookEntire benefits provided by nonprofit and for-profit HMOs.
Report.pdf. Accessed 19 Apr 2013. Inquiry. 2003;40(2):114–32.
Medicare.gov. Your Medicare Coverage. 2018a. Centers Schneider EC, et al. Mirror, Mirror 2017. International
for Medicare and Medicaid Services. https://www. Comparison Reflects Flaws and Opportunities for Bet-
medicare.gov/coverage/hospital-care-inpatient.html. ter US Health Care. 2017. Commonwealth Fund. http://
Accessed 18 Feb 2018. www.commonwealthfund.org/interactives/2017/july/
Medicare.gov. Part B Costs. 2018b. Centers for Medicare mirror-mirror/assets/Schneider_mirror_mirror_2017.
and Medicaid Services. https://www.medicare.gov/your- pdf. Accessed 5 Apr 2018.
medicare-costs/part-b-costs/part-b-costs.html. Accessed Schoenbaum SC, et al. Mortality amenable to health care in
21 Mar 2018. the United States: the roles of demographics and health
Mehrotra A, Wang M, Lave J, Adams J, McGlynn E. Retail systems performance. J Public Health Policy. 2011;32
clinics, primary care physicians, and emergency depart- (4):407–29.
ments: a comparison of patients’ visits. Health Aff. Shen Y, Hsia R. Changes in emergency department access
2008;27(5):1272–82. between 2001 and 2005 among general and vulnerable
Metzl JM, Herzig RM. Medicalisation in the 21st century: populations. Am J Public Health. 2010;100(8):1462–9.
introduction. Lancet. 2007;369(9562):697–8. Shi L, Singh DA. Delivering health care in America: a
Milliman Inc. Medicare versus private health insurance: systems approach. 5th ed. Boston: Jones & Bartlett;
the cost of administration. 2006. http://www.cahi.org/ 2012.
cahi_contents/resources/pdf/CAHIMedicareTechnical Starfield B, Shi L. Policy relevant determinants of health:
Paper.pdf. Accessed 19 Apr 2013. an international perspective. Health Policy. 2002;60
Misurski DA, Lipson DA, Changolkar AK. Inappropriate (3):201–18.
antibiotic prescribing in managed care subjects with Tunis SR, Kang JL. Improvement in Medicare coverage of
influenza. Am J Manag Care. 2011;17(9):601–9. new technology: how Medicare has responded to the
NHPCO. NHPCO facts and figures: hospice care in Amer- need to improve access to beneficial technologies.
ica 2010. National Hospice and Palliative Care Orga- Health Aff. 2001;20(5):83–5.
nization. 2010. http://www.nhpco.org/files/public/ U.S. Census Bureau. NAICS 6211, Offices of physicians.
Statistics_Research/Hospice_Facts_Figures_Oct-2010. 2010. http://www.census.gov/econ/census02/data/
pdf. Accessed 19 Apr 2013. industry/E62111.HTM#bridge. Accessed 19 Apr 2013.
Nolte E, McKee M. Variations in amenable mortality – U.S. Census Bureau. 2014. http://factfinder.census.gov/
trends in 16 high-income nations. Health Policy. faces/tableservices/jsf/pages/productview.xhtml?src=
2011;103:47–52. bkmk. Accessed 4 Jul 2015.
U.S. Census Bureau. Sumter County, Fla., is Nation’s Weiner J. Expanding the US medical workforce: global
Oldest, Census Bureau Reports. 2016. Press Release: perspectives and parallels. BMJ. 2007;335
CB16-107. https://www.census.gov/newsroom/press- (7613):236–8.
releases/2016/cb16-107.html. Accessed 21 Mar 2018. Weiner S, et al. Managing the unmanaged: a case
U.S. Census Bureau. Quickfacts US, Population estimates study of intra-institutional determinants of
2017. 2017. Available at: https://www.census.gov/ uncompensated care at health care institutions
quickfacts/fact/table/US/PST045217#viewtop. Accessed with differing ownership models. Med Care.
21 Mar 2018. 2008;46(8):821–8.
U.S. Centers for Disease Control and Prevention. Cancer Weinick RM, Bristol SJ, DesRoches CM. Urgent
screening and test use – United States, 2013. Morb care centers in the U.S.: findings from a national
Mortal Wkly Rep. 2015. http://origin.glb.cdc.gov/ survey. BMC Health Serv Res. 2009;9:79.
mmwr/preview/mmwrhtml/mm6417a4.htm?s_cid= Weissman J, Gaskin DJ, Reuter J. Hospitals’ care of
mm6417a4_w. Accessed 6 Aug 2015. uninsured patients during the 1990s: the relation of
U.S. Department of Health and Human Services. Health, teaching status and managed care to changes in market
U.S., 2014. 2014. http://www.cdc.gov/nchs/data/hus/ share and market concentration. Inquiry. 2003;40
hus14.pdf. Accessed 19 Aug 2015. (1):84–93.
U.S. Department of Health and Human Services, Health Whitmore H, et al. The individual insurance market before
Resources and Services Administration. The registered reform: low premiums and low benefits. Med Care Res
nurse population: findings from the 2008 National Rev. 2011;68(5):594–606.
Sample Survey of Registered Nurses. 2010. Retrieved WHO. The right to health – fact sheet. 2007. http://www.
from http://bhpr.hrsa.gov/healthworkforce/rnsurveys/ who.int/mediacentre/factsheets/fs323_en.pdf. Accessed
rnsurveyfinal.pdf 19 Apr 2013.
US Department of Health and Human Services. Health, Williams SJ, Martin P, Gabe J. The pharmaceuticalisation
U.S., 2016. 2016. https://www.cdc.gov/nchs/data/hus/ of society? A framework for analysis. Sociol Health
hus16.pdf. Accessed 8 Feb 2018. Illn. 2011;33(5):710–25.
U.S. Government Accountability Office. Private health World Bank. Life Expectancy at Birth, total (years). 2017.
insurance: concentration of enrollees among individ- https://data.worldbank.org/indicator/SP.DYN.LE00.IN.
ual, small group, and large group insurers from 2010 Accessed 21 Feb 2018.
through 2013. 2014. http://www.gao.gov/assets/670/ Yee T, Lechner AE, Boukus ER. The surge in urgent
667245.pdf. Accessed 2 Aug 2015. care centers: emergency department alternative
Van der Hooft C, et al. Inappropriate drug prescribing in or costly convenience? Center for Studying
older adults: the updated 2002 Beers criteria–a popula- Health System Change. Res Brief. (26). July
tion-based cohort study. Br J Clin Pharmacol. 2013. www.hschange.com/CONTENT/1366/1366.
2005;60(2):137–44. pdf
Health System Typologies
41
Claus Wendt
Contents
Introduction
Typologies
The Role of Actors and Institutions in Healthcare
How Do Healthcare Systems Work?
Discussion
References
healthcare systems. In healthcare, typologies have so far mainly been used to contrast different types of healthcare systems, to group countries into types, and to identify similarities and differences among countries. Recently, typologies of welfare states and of healthcare systems have been used for combining macro- and micro-research in healthcare. Comparative scholars have, for instance, studied macro-level effects on patients’ access to healthcare, health status, and satisfaction.
With respect to the triangular model of healthcare systems as the backbone of this volume, health system typologies generally refer to one, two, or even all three dimensions: financing agencies, healthcare providers, and patients. Typologies can roughly be divided into two areas of research. The first concentrates on actors and institutions by asking who finances, provides, and regulates healthcare services. The second area of research is interested in the what and captures levels and structures of financing, provision, and regulation.
Typologies are rooted in Max Weber’s methodology of ideal-types. According to Weber (1949: 90; italics in original), an ideal-type “is formed by the one-sided accentuation of one or more points of view and by the synthesis of a great many diffuse, discrete, more or less present and occasionally absent concrete individual phenomena, which are arranged according to those one-sidedly emphasized viewpoints into a unified analytical construct.” This method can be used as a tool for grouping countries (real-types) into health system types to identify similarities and differences among healthcare systems, analyze changes over time, and study the effects of healthcare systems characterized by different institutional setups.
Social health insurance and National Health Service have been used as terms for contrasting healthcare systems that have different traditions and institutional designs. The first social health insurance (SHI) was implemented in Germany in 1883 by Bismarck; later, countries such as Austria, Hungary, and France followed the German example, and their systems are often labeled as SHI or Bismarckian healthcare systems. The first National Health Service (NHS) system was introduced in Britain in 1946 on the basis of the Beveridge-plan and provided an example for countries such as New Zealand, Sweden, and Denmark, which have since been labeled as having NHS or Beveridgian healthcare systems (e.g., Hassenteufel and Palier 2007). In a 1987 study by the OECD, the labels National Health Service, social insurance, and private insurance were used to form a more coherent analytical concept for healthcare system comparison. This concept, however, has been criticized for essentially referring to the real cases of Britain, Germany, and the USA instead of ideal-types. While sharing certain characteristics (such as tax financing vs. social insurance financing vs. private financing), the three types are designed neither for covering all developed healthcare systems nor for capturing changes over time. The social insurance countries of Central and Eastern Europe (CEE), for instance, differ in many respects (e.g., the weak position of corporate actors) from West European social health insurance, and Southern European NHS systems differ (e.g., regarding administrative capacities) from the British and Nordic NHS systems.
Due to the lack of a commonly used health system typology, some scholars of healthcare systems use the typology of welfare states introduced by Esping-Andersen (1990) as a reference. In the original version, social democratic welfare states were separated from conservative-corporatist and from liberal welfare states. In healthcare research, welfare regimes have been used for analyzing public support of healthcare systems (Gelissen 2002), the health status (Conley and Springer 2001), and health inequalities (Eikemo et al. 2008). Arguing that the concept of de-commodification (central to the welfare regime typology) is not designed for capturing characteristics with great importance to health, Bambra (2005) introduced a health de-commodification index consisting of private health expenditure, private hospital beds, and the overall coverage of the healthcare system. The country grouping is slightly different compared with the “classic” welfare state typology. Both concepts, however, were
first and foremost designed to capture social rights and do not reveal the main characteristics of modern healthcare systems.
This chapter summarizes some of the more recent health system typologies related to “organization and governance of health systems,” “health financing,” and “provision of services.” It discusses concepts, country groupings, and findings from studies that use health system typologies. Studies are available that analyze the effects of different health system types on cost containment, access to care, public opinion, health, and health inequality (for an overview, see Beckfield et al. 2013; Burau et al. 2015). Most typologies, however, remain descriptive with the primary task of identifying similarities and differences among today’s healthcare systems. Such studies are critical and have commonly been used for selecting countries for small-n comparative studies in the welfare state discourse. Health system typologies have taken data from data sets such as the OECD Health Data (various years), the WHO Health for All Database (various years), and other international and national sources. Most comparative researchers make use of the OECD data, which is, however, less useful for the detection of institutional boundaries within countries or for the analysis of inequalities (Beckfield et al. 2013). If the effects of different health system types are examined, macro data could be matched with micro data from sources such as the Eurobarometer, the European Social Survey (ESS), the International Social Survey Programme (ISSP), and the Survey of Health, Ageing and Retirement in Europe (SHARE).
Typologies
Health system typologies can basically be divided into frameworks that concentrate on the role of actors and the type of governance on the one hand and into frameworks that try to understand how healthcare systems work, what they invest in the people’s health, and what services they provide on the other hand.
The Role of Actors and Institutions in Healthcare
With respect to the triangular concept of this book, typologies in this area of research ask “who” is responsible for governance and organization as well as the purchasing and provision of healthcare services. The 1987 OECD study was one of the first attempts to classify healthcare systems according to preselected dimensions. The analytical dimensions used in OECD 1987 were “coverage,” “financing,” and “ownership,” and the study investigated “who” was responsible in these areas. The creation of types and the classification of countries, however, did not take place on the basis of available comparative data but on the basis of informed reasoning. The OECD study identified (1) a National Health Service (NHS) model with universal coverage, tax financing, and public ownership of healthcare provision; (2) a social insurance model with universal coverage, social insurance financing, and a combination of public and private ownership; and (3) a private insurance model with selective private insurance coverage, private insurance financing, and private ownership (OECD 1987).
Moran (1999) developed a typology of “healthcare states” by asking “who” governs the “consumption,” “provision,” and “production” of healthcare. Governance of “consumption” refers to patients’ eligibility to access healthcare and to the allocation of financial resources to the healthcare system; governance of “provision” refers to the control of doctors and hospitals; and governance of “production” refers to the regulation of medical innovations. On the basis of these dimensions, Moran (1999) constructed four families of healthcare states: (1) the “entrenched command and control state,” in which the state is distinctive in all three governing areas (e.g., the UK and the Scandinavian countries); (2) the “corporatist healthcare states,” in which “consumption” is dominated by public law bodies and the field of outpatient healthcare is dominated by panel doctors’ associations (e.g., Germany); (3) “supply states,” which are dominated by provider interests (e.g., the USA); and (4) “insecure command and
control states,” in which administrative capacities are much lower and private healthcare provision is higher than in the first type (e.g., Italy, Greece, Portugal, Spain). By using Moran’s concept, Burau and Blank (2006) analyzed nine healthcare systems and identified four cases that fully fit one of Moran’s types. Sweden and the UK are perfect examples of the “command and control state”; however, New Zealand and the Netherlands share important characteristics of this type as well. Germany represents the “corporatist healthcare state,” and Australia, Japan, and again the Netherlands match this type in two dimensions. The USA is an example of the “supply healthcare state,” and Singapore also shows major characteristics of this type in addition to corporatist elements.
Wendt et al. (2009) suggest a typology with 27 healthcare system types, 3 of which are ideal-types. These healthcare system types are constructed by combining the dimensions of regulation, financing, and service provision with the involvement of state, nongovernmental (societal), and private actors. In “state healthcare systems,” the state is decisive in all three dimensions; in “societal healthcare systems,” societal and corporate actors are decisive; and in “private healthcare systems,” private actors dominate regulation, financing, and healthcare provision. For each ideal-type, Wendt et al. (2009) identified six combinations in which either the state, societal actors, or private actors are dominant in two dimensions and therefore come close to the respective ideal-type. Six additional combinations do not approach any of the three ideal-types. Based on this typology, Wendt et al. (2009) suggested that the UK and Denmark form “state healthcare systems,” in which the state is decisive in all three dimensions. Germany is classified as a (societal-based) mixed type due to the great importance of private provision, and the USA is labeled a (private-based) mixed type due to the growing importance of public financing through public programs such as Medicare and Medicaid.
Using Wendt et al.’s model, Böhm et al. (2013) compared and classified 30 OECD countries and found 6 health system types for which real cases could be identified: (1) national health service with regulation, financing, and service provision by state actors and institutions (e.g., the UK, the Scandinavian countries, and the Southern European countries of Portugal and Spain); (2) national health insurance with regulation and financing by the state and with private healthcare provision (e.g., Australia, Canada, Ireland, New Zealand, and Italy); (3) a societal-based mixed type with regulation and financing by societal actors such as social insurance and public healthcare provision (e.g., Slovenia); (4) social health insurance with regulation and financing by societal actors and private healthcare provision (e.g., Austria, Germany, Luxembourg, and Switzerland); (5) a private health system with private regulation, financing, and service provision (e.g., the USA); and (6) etatist social health insurance with state regulation, social insurance financing, and private provision (e.g., Belgium, Estonia, France, the Czech Republic, Hungary, the Netherlands, Poland, Slovakia, Israel, Japan, and Korea). Böhm et al. therefore identified two of the ideal-types proposed by Wendt et al. (a state healthcare system and a private healthcare system), while according to this study, an ideal-type societal healthcare system does not exist in today’s OECD world. While corporate actors such as social health insurance and doctors’ associations can (and sometimes do) run their own services, most healthcare systems that are financed by social health insurance contributions rely on private provision.
Most typologies that concentrate on the role of the state and other actors in healthcare (i.e., “who” is governing and regulating, financing, and providing healthcare) have identified one type of system in which the state plays a dominant role and which includes the UK and the Scandinavian countries. Furthermore, in all typologies, the private US healthcare system forms a type of its own. All other empirical and theoretical observations are far from uniform. Most typologies have identified the German healthcare system as representative of a “societal core”; however, while Burau and Blank (2006) cluster the German case together with Australia, Japan, and the Netherlands, Böhm et al. (2013) place Germany in the same group as Austria, Luxembourg, and Switzerland (see Table 1 below).
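The combinatorics behind the typology of Wendt et al. (2009) described above can be checked with a short sketch. It assumes only what the text states: three dimensions (regulation, financing, provision), each dominated by one of three actor types. The names `ACTORS`, `DIMENSIONS`, `classify`, and `BOEHM_TYPES` are hypothetical and chosen for illustration; the six real-world types are transcribed from the description of Böhm et al. (2013) above, with public provision coded as "state".

```python
from itertools import product

ACTORS = ("state", "societal", "private")
DIMENSIONS = ("regulation", "financing", "provision")


def classify(combo):
    """Label one (regulation, financing, provision) combination."""
    distinct = len(set(combo))
    if distinct == 1:
        return "ideal-type"        # one actor decisive in all three dimensions
    if distinct == 2:
        return "near ideal-type"   # one actor dominant in two dimensions
    return "mixed"                 # a different actor in every dimension


# All 3 x 3 x 3 = 27 combinations of actors across the three dimensions.
combos = list(product(ACTORS, repeat=len(DIMENSIONS)))

counts = {}
for combo in combos:
    label = classify(combo)
    counts[label] = counts.get(label, 0) + 1

# Six types with real cases, keyed as (regulation, financing, provision),
# transcribed from the chapter's summary of Boehm et al. (2013).
BOEHM_TYPES = {
    ("state", "state", "state"): "national health service",
    ("state", "state", "private"): "national health insurance",
    ("societal", "societal", "state"): "societal-based mixed type",
    ("societal", "societal", "private"): "social health insurance",
    ("private", "private", "private"): "private health system",
    ("state", "societal", "private"): "etatist social health insurance",
}

if __name__ == "__main__":
    print(len(combos), counts)
    for combo, name in BOEHM_TYPES.items():
        print(f"{name}: {dict(zip(DIMENSIONS, combo))} -> {classify(combo)}")
```

Under this coding, the sketch reproduces the counts given in the text: 3 ideal-types, 18 combinations close to an ideal-type (6 per actor), and 6 combinations that approach none of them. Etatist social health insurance falls into the last group.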
Table 1 (continued)
Authors | Dimensions | Data | Types | Country grouping | Main goal
(continued from previous row) | | | (regulation: state; financing, societal; provision, private | Hungary, the Netherlands, Poland, Slovakia, Israel, Japan, Korea |
How do healthcare systems work?
Bambra (2005) | Private health expenditure; private hospital beds; coverage of the public system | OECD health data; WHO data | (1) High public healthcare index (50 or higher); (2) Middle public healthcare index (around 40); (3) Low public healthcare index (20–30); (4) Very low public healthcare index (below 10) | Grouping suggested by the author and based on Bambra (2005), Table 8: (1) Finland, Sweden, Norway, the UK; (2) Austria, Belgium, France, Ireland, New Zealand, Canada, Denmark, Italy; (3) Australia, Germany, the Netherlands, Switzerland, Japan | (Construction of types) and grouping of countries
Reibling (2010) | Gatekeeping; cost-sharing; supply | OECD health data; HiT reports (a); MISSOC (b) | (1) Financial incentives states; (2) Strong gatekeeping and low supply states; (3) Weakly regulated and high supply states; (4) Mixed regulation states | (1) Austria, Belgium, France, Sweden, Switzerland; (2) Denmark, the Netherlands, Poland, Spain, the UK; (3) The Czech Republic, Germany, Greece; (4) Finland, Italy, Portugal | Construction of types and grouping of countries
Wendt (2009) | Health expenditure; public-private mix of financing; privatization of risk; healthcare provision; entitlement to care; payment of doctors; patients’ access to providers | OECD health data; HiT reports (a) | (1) Health service provision-oriented type; (2) Universal coverage – controlled access type; (3) Low budget – restricted access type | (1) Austria, Belgium, France, Germany, Luxembourg; (2) Denmark, Italy, Ireland, Sweden, the UK; (3) Portugal, Spain, Finland | Construction of types and grouping of countries
Wendt (2014) | Health expenditure; public-private mix of financing; privatization of risk; healthcare provision; payment of doctors; patients’ access to providers | OECD health data; HiT reports (a) | (1) Health service provision-oriented type; (2) Universal coverage – controlled access type; (3) Universal coverage – controlled supply type; (4) Low supply type | (1) Austria, Belgium, Canada, France, Germany, Japan, Luxembourg, New Zealand; (2) Australia, the Czech Republic, Denmark, Estonia, Hungary, Ireland, Italy, the Netherlands, Poland, Slovak Republic, Slovenia, the UK; (3) Finland, Iceland, Portugal, Spain, Sweden; (4) Greece (in 2001), Israel, Turkey | Construction of types and grouping of countries
(a) HiT reports: European Observatory of Health Care Systems, Healthcare Systems in Transitions series, see http://www.euro.who.int/en/about-us/partners/observatory/health-systems-in-transition-hit-series
(b) MISSOC: The EU’s Mutual Information System on Social Protection, see http://ec.europa.eu/social/main.jsp?catId=815&langId=en
particularly for minor diseases. Supply, as a major precondition for access, is assessed by provider density and medical technology. By using gatekeeping, cost-sharing, provider density (GPs, specialists, and nurses), and medical technology (magnetic resonance imaging units/MRI, computed tomography scanners/CT), four types of European healthcare systems were constructed: (1) “financial incentive states” that regulate patients’ access to medical care first and foremost by cost-sharing (Austria, Belgium, France, Sweden, Switzerland); (2) “strong gatekeeping and low supply states” that are characterized by low cost-sharing (but where access is controlled by extensive gatekeeping), low numbers of healthcare providers, and medical technology (Denmark, the Netherlands, Poland, Spain, the UK); (3) “weakly regulated and high supply states” with low legal access regulation and a high supply of healthcare providers (the Czech Republic, Germany, Greece); and (4) “mixed regulation states” that use both gatekeeping and cost-sharing.
In two publications, Wendt (2009, 2014) additionally focused on gatekeeping, cost-sharing, and supply and combined these dimensions with information on entitlement to healthcare, the level of healthcare expenditure, the public-private mix of healthcare financing, and doctors’ remuneration. Healthcare provision is captured by service provider numbers in inpatient and outpatient healthcare, gatekeeping by a healthcare regulation index, and doctors’ remuneration by the payment of general practitioners in the outpatient sector (fee-for-service, capitation, salary). The 2009 article compares European countries, whereas the 2014 article covers both European and non-European countries. By applying cluster analyses in the 2009 typology, Wendt arrived at three types of healthcare systems:
1. The “health service provision-oriented type,” which captures Austria, Belgium, France, Germany, and Luxembourg. This type is characterized by a high level and unquestioned importance of service provision. Patients often have direct access and a choice of both general practitioners and specialists. Cost-sharing is comparatively low, and self-employed doctors are generally paid fee-for-service.
2. The “universal coverage – controlled access type,” represented by Denmark, Italy, Ireland, Sweden, and the UK. While these healthcare systems provide universal coverage, access to care is strictly regulated. Patients typically have to sign up on a general practitioner’s list for a longer period of time, and a referral is required if specialist care is needed. Access to care is further restricted by a comparatively low level of healthcare provision in the outpatient sector. General practitioners are mainly paid on a capitation basis.
3. The “low budget – restricted access type,” which includes Finland, Portugal, and Spain. This type of system is characterized by a low level of healthcare expenditure. Patients’ access is controlled not only by strict access regulation but also by high private co-payments. Most general practitioners receive a salary, and the degree of doctors’ autonomy can therefore be considered to be even lower than in the “universal coverage – controlled access type.”
In Wendt (2014), the number of countries was extended, and the research now covers both European and non-European healthcare systems. When using the same dimensions (except entitlement to care) and newer data, the “health service provision-oriented type” can be confirmed and now also covers Canada, Japan, and New Zealand. The “universal coverage – controlled access type” has also been confirmed and now additionally includes Australia and countries from Central and Eastern Europe. A third type identified in Wendt (2014) is the “universal coverage – controlled supply type,” represented by Finland, Iceland, Portugal, Spain, and Sweden. In this type, the control of doctors’ remuneration is even stricter, and cost-sharing is even higher than in the “universal coverage – controlled access type.” In the publication from 2014, the “low supply type” has been identified as a fourth type of healthcare system, represented by Israel, Turkey, and (in 2001) Greece. This type is characterized by both very low levels of total health
expenditure and low public financing. Levels of healthcare provision in both inpatient and outpatient healthcare are quite low. Patients’ access to medical doctors, however, is hardly controlled by instruments of regulation.
Discussion
The typologies summarized in Table 1 cover two different areas of research. The first group is more focused on types of governance and on the role of the state and other actors in healthcare. The dimensions used are “coverage,” “financing,” “consumption,” and “provision,” and the focus is on “who” is responsible in these areas of the healthcare arena. In almost all typologies, Germany (and to some extent Australia, Japan, and the Netherlands), the UK (often together with the Scandinavian countries and to some extent with New Zealand and the Netherlands), and the USA (with no other countries representing this type) are contrasted. Böhm et al. (2013) put forward one of the first empirical classifications of healthcare systems that covers a larger number of countries. As in earlier “role of actors and institutions” typologies, the UK and the Scandinavian countries are grouped into the same type; however, this time they are together with the Southern European countries. Furthermore, Germany is grouped together with Austria, Luxembourg, and Switzerland. This grouping is much in line with arguments laid down in the OECD 1987 study and in Moran’s comparative work but had not previously been demonstrated empirically. Two other types that have not been suggested in earlier studies are the “national health insurance type,” represented by Australia, Canada, Ireland, New Zealand, and Italy, and the “etatist social health insurance type,” represented by countries from Central and Eastern Europe as well as by Belgium, France, the Netherlands, Israel, Japan, and Korea.
The second group of typologies is more focused on “how” healthcare systems work, what services they provide, and how patients access necessary healthcare services. Both areas of research are necessarily interrelated, for the way healthcare systems are governed influences the way they function. “Command and control states” should be characterized by lower healthcare spending and stronger access regulation. “Supply states,” in which doctors’ associations and other corporate actors are involved in the governance of healthcare, should be characterized by higher levels of healthcare provision, greater doctors’ autonomy, and lower access regulation. However, strong state actors could also use their power and financial capacities to invest more in healthcare. If we want to know how healthcare systems actually work (e.g., for analyzing healthcare systems’ effects on health, health inequalities, and healthcare utilization), dimensions with a stronger focus on healthcare provision and patients’ access to healthcare providers are required.
The different focus of the two concepts becomes clear when comparing two typologies that include the largest number of countries (see Table 2). We almost always find the Scandinavian countries in the same type of healthcare system, irrespective of whether the focus is on governance (Böhm et al. 2013) or on how healthcare systems work (Wendt 2014). Since the mid-2000s, Portugal and Spain have appeared to be close to the Scandinavian group. Almost all CEE countries can be found in a common type of healthcare system. However, while the form of governance seems to be close to that of some Western social health insurance systems (Belgium, France, the Netherlands) and of the Japanese social health insurance, levels of financing and healthcare provision as well as patients’ access to medical care are more similar to the situation in NHS countries such as Denmark, Ireland, Italy, and the UK (see Table 2). The Western social health insurance countries of Austria, Germany, and Luxembourg are similar in both their governance and the way they work. When focusing on levels of financing, healthcare provision, and patients’ access, Germany, Austria, and Luxembourg are close to Belgium, Canada, France, Japan, and New Zealand, which, according to Böhm et al., represent different governance types. The USA seems to be distinct from any other type of healthcare system, both in the way it is regulated and in its level of financing, provision, and patients’ access to care.
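The comparison of the two typologies discussed above can be retraced with a small cross-tabulation. The sketch below is illustrative only: the variable names are hypothetical, the country lists are transcribed from this chapter's text and Table 1, "Slovak Republic" is normalized to "Slovakia," and the Scandinavian countries are left out of the Böhm et al. (2013) national health service group because the text does not enumerate them.

```python
# Governance-based grouping, following the chapter's summary of Boehm et al. (2013).
BOEHM_2013 = {
    # Scandinavian countries omitted: the text names them only as a group.
    "national health service": ["UK", "Portugal", "Spain"],
    "national health insurance": ["Australia", "Canada", "Ireland", "New Zealand", "Italy"],
    "societal-based mixed type": ["Slovenia"],
    "social health insurance": ["Austria", "Germany", "Luxembourg", "Switzerland"],
    "private health system": ["USA"],
    "etatist social health insurance": [
        "Belgium", "Estonia", "France", "Czech Republic", "Hungary",
        "Netherlands", "Poland", "Slovakia", "Israel", "Japan", "Korea",
    ],
}

# Grouping by how systems work, following Table 1's row for Wendt (2014).
WENDT_2014 = {
    "health service provision-oriented": [
        "Austria", "Belgium", "Canada", "France", "Germany",
        "Japan", "Luxembourg", "New Zealand",
    ],
    "universal coverage - controlled access": [
        "Australia", "Czech Republic", "Denmark", "Estonia", "Hungary", "Ireland",
        "Italy", "Netherlands", "Poland", "Slovakia", "Slovenia", "UK",
    ],
    "universal coverage - controlled supply": [
        "Finland", "Iceland", "Portugal", "Spain", "Sweden",
    ],
    "low supply": ["Greece", "Israel", "Turkey"],
}

# Invert the governance grouping: country -> governance type.
governance_of = {
    country: gtype
    for gtype, countries in BOEHM_2013.items()
    for country in countries
}


def governance_types_within(functional_type):
    """Governance types found among the countries of one Wendt (2014) type."""
    countries = WENDT_2014[functional_type]
    return sorted({governance_of[c] for c in countries if c in governance_of})


if __name__ == "__main__":
    for functional_type in WENDT_2014:
        print(functional_type, "->", governance_types_within(functional_type))
```

With these transcribed lists, the “health service provision-oriented type” draws its members from three different governance types, which is the pattern the paragraph above describes for Germany, Austria, and Luxembourg alongside Belgium, Canada, France, Japan, and New Zealand.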
936 C. Wendt
This overview of health system typologies suggests that the way healthcare systems are governed does not directly dictate the way they function. Even if very similar actors are involved in the regulation, financing, and provision of healthcare, the results can be very different levels of financing, healthcare provision, and access regulation among individual countries. It is therefore essential to construct health system typologies for both areas of research. Which typological category is more useful depends on the specific research question at hand. Classifications capturing the role of actors and modes of governance are better suited to analyzing reform options, cost containment, and physical and human resource strategies in different health system types, whereas classifications capturing how healthcare systems actually work are better suited to assessing health systems and their influence on health, inequalities in health, and utilization of healthcare services.
The triangular model of health systems is of importance for health system typologies not only with respect to the main players and their interactions in the three health markets (the health insurance market, the healthcare purchasing market, and the healthcare provision market) but also with respect to the way patients can use the healthcare system, which is related to factors such as the resources spent on healthcare, cost-sharing arrangements, the level of healthcare services actually provided, and how patients can use these healthcare services.
Health system typologies also have limitations that are in part related to their strength of simplification. The identification of health system types always depends on the indicators chosen, and therefore the selection of indicators and their theoretical justification is key to healthcare system typologies. Furthermore, the correct definition of indicators is not always an easy task. For instance, does health insurance offered by highly regulated private organizations in the Netherlands count as private or as social health insurance? Also, so far typologies have used national averages that conceal regional differences. Due to the trend of decentralization, future typologies may have to take geographic inequalities into account (Reibling 2010). More generally, according to Freeman and Frisina (2010) and Burau et al. (2015), a trade-off between simplification and accuracy is inherent to typologies.

References

Bambra C. Cash versus services: ‘worlds of welfare’ and the decommodification of cash benefits and health care services. J Soc Policy. 2005;34(2):195–213.
Beckfield J, Olafsdottir S, Sosnaud V. Healthcare systems in comparative perspective: classification, convergence, institutions, inequalities, and five missed turns. Annu Rev Sociol. 2013;39:127–46.
Böhm K, Schmid A, Götze R, Landwehr C, Rothgang H. Five types of OECD healthcare systems: empirical results of a deductive classification. Health Policy. 2013;113(3):258–69.
Burau V, Blank RH. Comparing health policy: an assessment of typologies of health systems. J Comp Policy Anal. 2006;8(1):63–76.
41 Health System Typologies 937
Burau V, Blank RH, Pavolini E. Typologies of healthcare systems and policies. In: Kuhlmann E, Blank RH, Bourgeault IL, Wendt C, editors. The Palgrave international handbook of healthcare policy and governance. Basingstoke: Palgrave Macmillan; 2015. p. 101–15.
Conley D, Springer KW. Welfare state and infant mortality. Am J Sociol. 2001;107(3):768–807.
Eikemo TA, Bambra C, Judge K, Ringdal K. Welfare state regimes and differences in self-perceived health in Europe: a multilevel analysis. Soc Sci Med. 2008;66:2281–95.
Esping-Andersen G. The three worlds of welfare capitalism. Cambridge: Polity Press; 1990.
Freeman R, Frisina L. Health care systems and the problem of classification. J Comp Policy Anal Res Pract. 2010;12(1):163–78.
Frenk J, Donabedian A. State intervention in medical care: types, trends and variables. Health Policy Plan. 1987;2(1):17–31.
Gelissen J. Worlds of welfare, worlds of consent? Public opinion on the welfare state. Leiden: Brill; 2002.
Hassenteufel P, Palier B. Towards neo-Bismarckian health care states? Comparing health insurance reforms in Bismarckian welfare systems. Soc Policy Adm. 2007;41(6):574–96.
Marmor T, Wendt C. Conceptual frameworks for comparing healthcare politics and policy. Health Policy. 2012;107(1):11–20.
Moran M. Governing the health care state. A comparative study of the United Kingdom, the United States and Germany. Manchester: Manchester University Press; 1999.
OECD. Financing and delivery of health care. A comparative analysis of OECD countries. Paris: OECD; 1987.
Reibling N. Healthcare systems in Europe: towards an incorporation of patient access. J Eur Soc Policy. 2010;20(1):5–18.
Weber M. The methodology of the social sciences. New York: The Free Press; 1949.
Wendt C. Mapping European healthcare systems. A comparative analysis of financing, service provision, and access to healthcare. J Eur Soc Policy. 2009;19(5):432–45.
Wendt C. Changing healthcare system types. Soc Policy Adm. 2014;48(7):864–88.
Wendt C, Frisina L, Rothgang H. Health care system types. A conceptual framework for comparison. Soc Policy Adm. 2009;43(1):70–90.

Further Reading

Freeman R. The politics of health in Europe. Manchester: Manchester University Press; 2000.
Freeman R, Moran M. Reforming health care in Europe. West Eur Polit. 2000;23(2):35–59.
Gauld R. The new health policy. Maidenhead: Open University Press; 2009.
Giaimo S, Manow P. Adapting the welfare state – the case of health care reform in Britain, Germany, and the United States. Comp Pol Stud. 1999;32(8):967–1000.
Immergut EM. Health politics: interests and institutions in Western Europe. Cambridge: Cambridge University Press; 1992.
Marmor T, Wendt C, editors. Reforming healthcare systems. Two Volumes. Cheltenham/Northampton: Edward Elgar Publishing; 2011.
Montanari I, Nelson K. Social service decline and convergence: how does healthcare fare? J Eur Soc Policy. 2012;23(1):102–16.
Moran M. Understanding the welfare state: the case of health care. Br J Polit Int Relat. 2000;2(2):135–60.
Rothgang H, Cacace M, Frisina L, Grimmeisen S, Schmid A, Wendt C. The state and healthcare. Comparing OECD countries. Basingstoke: Palgrave Macmillan; 2010.
Smith P, Anell A, Busse R, Crivelli L, Healy J, Lindahl AK, et al. Leadership and governance in seven developed health systems. Health Policy. 2012;106:37–49.
Tuohy C. Accidental logics: the dynamics of change in the health care arena in the United States, Britain, and Canada. New York: Oxford University Press; 1999.
42 Organization and Governance: Stewardship and Governance in Health Systems
Scott L. Greer
Contents
Introduction
Definitions: Into the Mire
Comparing and Measuring Governance
Good Enough, or Better, Governance
Attributes of Governance
Transparency
Accountability
Participation
Integrity
Policy Capacity
A Diagnostic Approach
Conclusion
References
As a result, both have been stretched by academics, governments, international organizations, consultants, and other ideological entrepreneurs who want the power that comes with their invocation.
This chapter will first separate out stewardship and governance, providing key definitions and making the point that while they might be in the hands of political rivals, they are not intellectually rivalrous concepts. It then presents the results of our review of concepts: the five attributes of governance (which are also among many desirable objectives of stewardship) that emerged as mutually exclusive and able to cover the many activities and ideas classified as “governance.”

Definitions: Into the Mire

Governance has several kinds of meaning. On the one hand, it has spread across multiple fields that use it in different ways to discuss topics as different as the proper constitution of a company board and the nature of public management in the Internet age. On the other hand, it is used for a variety of normative, empirical, and mixed projects.
While there have been sporadic uses of the word for many years, it became a common modern concept first in the discussion of management, specifically corporate governance, the organization of power within commercial firms. In the 1980s, it started to pick up a second usage; it was used in political economy research to discuss arrangements in which organizations such as unions, professions, and government collectively coordinated activity (e.g., Campbell and Lindberg 1991). In the aftermath of the Cold War, more academics became interested in it as a descriptive term for systems that produced collective decisions without having clear centers of hierarchical power (as distinct, in some once-fashionable formulations, from “government”). In this capacity, the term drew on and partially displaced perfectly good older terms such as “networks.” In the hands of these scholars, governance came to mean almost anything that generated order without hierarchy; its meanings in transaction cost economics (Williamson 1996), European studies (Marks et al. 1996), international relations (Rosenau and Czempiel 1992), and public management (Rhodes 1997), for example, differed greatly.
International organizations became particularly interested as part of the backlash against structural adjustment lending and, in particular, their role in the Asian financial crisis and its aftermath. Fifteen years of increasingly invasive policy conditionality in the service of structural adjustment failed to produce the desired effects in the structurally adjusted countries (Greer 2013; Woods 2006). They turned to good governance as a solution (e.g., World Bank 1992, 1994). The essential logic was simple enough: reforms, especially those imposed through conditional loans, frequently had serious noncompliance problems, faced serious implementation problems, and had the wrong effects. The response was to blame these problems on the governance – the organization, probity, competence, and coordination – of the countries involved and try to improve that as a part of development or financial rescue (Nunnenkamp 1995).
In 2013, all three preoccupations are alive and well: we have governance as a field of management, including corporate governance and clinical governance in health (Walshe and Smith 2011), governance as a sprawling and contested term applied in endless different ways by social scientists in analyzing the world (Kjaer 2004; Bevir 2013), and governance as a normative concept used when policymakers speak about improving, essentially, international public management (Fukuyama 2013).
In each of these incarnations, governance-speak has two essential uses. One is empirical: the description and analysis of what is. The other is normative: calls for how it ought to be.
Empirically, governance in almost any account is some form of authoritative coordination, which means decisionmaking and implementation. Such analyses tend to try to capture the mechanisms by which authoritative decisions are made, analyzing the powers, responsibilities, and coordination of professions, insurers, providers, governments at different levels, and the other actors who make and implement decisions in health systems.
Normatively, governance can be termed good, or better or worse, and the parallel normative,
Good Enough, or Better, Governance

Two words, three broad traditions of their use, a plethora of international comparative enterprises, and both normative and empirical applications: this is a dispiriting starting point for a discussion of how the vocabulary of governance and stewardship may be used to understand or improve health systems.
The first problem to address is the confusion created by political analysts of many stripes, ranging from entrepreneurial consultants to entrepreneurial academics, who sought to distinguish governance as a type of organization from government. This approach defined governance in terms of self-organization, networks, and a blend of public, nongovernmental, and private actors, rather than “government,” which connoted hierarchy, legalism, and inflexibility. The essential distinction was spurious and misleading; networks were hardly new forms of political organization, in the West or anywhere else, and the hierarchical authority of states and other big organizations such as corporations remained very powerful and effective (Bevir 2013). Here, following on current usage and the international institutions, governance is a description of overall decisionmaking and implementation rather than an ideal-type rendering of a particular form of public administration.
The next problem is with the concept of “good governance.” If governance can be better or worse, then it seems reasonable to seek to identify and generalize practices of good governance, whether it is corporate governance activists trying to generalize good recruitment practices for boards or international financial institutions trying to generalize good governance for the recipients of their funds. Two difficulties arise. The first is revealed by the syllogism: if governance is how decisions are made and implemented, then good governance is good decisionmaking and implementation throughout a whole society. The likelihood that the same things, defined with any level of specificity, will constitute good governance in every society on earth seems limited (Andrews 2013). Excessive concreteness is a besetting problem in advice about good governance.
The third problem is that governance, being a power word (Frederickson 2005) whose invocation strengthens all sorts of arguments and claims, has had a wide range of attributes added to it. These are often self-contradictory or hard to derive from either data or first principles. For example, some international organizations view “conflict prevention” as an important aspect of good governance, and others do not (Barbazza and Tello 2014). Does this mean that the WHO regards conflict as part of good governance? Obviously not. Rather, what it shows is that lists of attributes of good governance have a tendency to be arbitrary and utopian. Defining the aspects of good governance is tantamount to defining the good society, and that is questionable on matters of taste and practicality.
Notably, few if any systems show all the attributes that have been assigned to “good governance,” and many highly functional systems have aspects of poor governance – opacity, corruption, nepotism, clientelism, and other problems occur in many places. Few if any countries vaulted into high-income brackets while enjoying good governance as many define it today (Greer and Jarman 2011; Brewer et al. 1999), and a few practices we associate with bad governance have proved flexible and effective; for example, clientelism can mean disruption and bad administration by political jobbers but also allows reformers to put technically skilled people into important posts (Grindle 2012).
The problem, therefore, is the one noted by Tolstoy: all happy families are the same, but all unhappy families are different. So many things have to go right to produce a happy family that the variation within the category of happy families is limited. Unhappy families have many more degrees of freedom. And it is in the realm of unhappy families that policy scholars and policymakers must operate.
The solution lies in the simple concept of “good enough governance.” Good enough governance is a concept formulated by Merilee Grindle, who pointed out that many lists of governance attributes have an arbitrary and utopian character (Grindle 2004, 2007; Thomas 2015).
Drawing from this, a more intellectually and practically satisfying approach to governance is to view governance not as a desirable end state but rather as an activity that can be carried out in different ways with different effects. This diagnostic approach views governance as a phenomenon that exists in essentially all societies and sometimes causes a problem for something else. Governance problems can be diagnosed as a reason for policy failures, and strengthening one aspect or another of governance can remedy policy failures. Likewise, some policies are just not sustainable in some systems; governance that is good enough for maintaining basic public health functions might not be good enough to operate sophisticated quasi-markets for health care.
In other words, rather than insistently defining good governance, it makes more sense to identify aspects of governance that improve the ability of health systems to achieve a sustainable balance of equity, access, and cost containment. So, then, what are aspects of governance that influence the ability of health systems to achieve their goals, and which can in some cases be improved? Or, on the other side of the coin, what is a governance problem (as distinct from some other kind of problem), and what is a detailed taxonomy of governance problems that might need understanding or remedy?

Attributes of Governance

The first question in using governance analysis to improve policies and systems is this: is the challenge, or problem, or opportunity one of governance? There are other reasons programs fail. They can be fundamentally bad ideas (though high-capacity, participative, transparent governance might reduce the odds of bad ideas being adopted). They can be underfunded. They can also lack political support.
By a process of elimination, a workable, funded, and supported policy that fails suggests a governance issue. More positively, do problems appear to lie in the decisionmaking and implementation systems of society? If so, that means the problems lie in governance.
More specifically, our review found five key aspects of governance that matter and in many cases can be strengthened. They are not a list of attributes to which every society should aspire; they are, rather, five aspects of health systems that influence the success or failure of policies. One of the remarkable aspects of the governance literature is that, beneath a level of apparent conceptual confusion, the same words and concepts constantly recur. In other words, despite many different terms and many different lists with different inclusions and exclusions, and many different conceptual hierarchies, the same five issues recur. We sorted them into groups with minimal overlap that scholars or policymakers interested in governance should consider (Greer et al. 2016; Greer et al. 2017). The result is the TAPIC framework, named for its domains of transparency, accountability, participation, integrity, and capacity. Any of the five might be the first or most important issue, and all can exist relatively independently of each other (accountability without transparency, for example, is the norm in both medical care and automobile repair). The literature review and analysis are presented in Greer et al. (2016). Case studies exploring and showing the uses of the TAPIC framework can be found in that book and in Jarman (2017), Wolfe et al. (2017), Exworthy et al. (2017), Trump (2017), Vasev (2017), Willison (2017), and Greer et al. (2017).

Transparency

Transparency involves two things: making decisions clear and making clear the grounds on which decisions were made (Woods 1999). At a minimum, this means the kind of basic publicity long familiar in functional governments – official notifications, open meetings, and latterly informative websites that make policies and policy processes understandable.
There are a variety of problems with such a simple form of transparency, however; for a start, as every consumer knows, “fine print” can look transparent and effectively hide companies’ actions. Transparency can be taken too far; decisionmaking necessarily involves both deals
and ambiguity, and problems arise if transparency displaces real decisionmaking into shadows or becomes a weapon for those who want to replace argument and prioritization with something more mechanistic (Best 2005). It also has the problem that policy information can be intricate, and efforts to simplify it can also distort it (as frequently happens with both politics and website redesigns). The result is that simple notification should probably be flanked by devices that permit informed access to the policy process so that informed journalists, NGOs, citizens, and experts can contest decisions and their grounds. These mechanisms can include inspectorates, ombuds procedures, public data releases, and freedom of information laws.
Effective transparency should improve policy by enhancing accountability and participation, deterring or quickly identifying corruption and incompetence, and making policies more predictable. The result, in theory, will be trust that an organization will not be erratic and is under constant pressure to be competent.

Accountability

Accountability is a relationship in which an actor (such as a government agency) must account for its actions to a forum (such as a legislature) which can sanction it. In other words, it has three key attributes: actions, reporting, and sanction. A good accountability relationship means that the interests of the forum (legislature, population) are always in the mind of the actor, but the actor has autonomy to formulate superior solutions. It can also allow productive innovation; holding somebody accountable for outcomes within limits rather than process can produce learning and better policy outcomes in general (Sabel 2001; Behn 2001).
Mechanisms that policymakers use to achieve accountability are diverse, including contracts; reporting requirements; financial mechanisms such as pay for performance; laws that specify objectives, reporting, and mechanism; competitive bidding; organizational separation such as purchaser/provider splits; conflict of interest policies; ombuds processes; legislative oversight and committees of oversight; and regulation including the establishment of dedicated regulatory agencies. Each of these focuses on increasing the extent of reporting and the ability of the forum to sanction the actor.
Accountability is not the same thing as a principal-agent relationship, which is a favored form of economic modeling. In a public sector principal-agent relationship, a principal chooses an agent to carry out its wishes (Smith et al. 1997; Besley and Coate 2003). Governance, in this analysis, is better insofar as it shortens and clarifies principal-agent relationships. There are two key problems with this style of analysis. The first is that frequently the relationship is hard to characterize in that way – it might actually be a fiduciary model rather than an agency relationship. The second is that it is essentially normative rather than political; it assumes that there should be a clear principal, agent, and instructions. A quick reflection on, for example, the many missions of a hospital shows the empirical limits (Marmor 2001).

Participation

Participation means that affected parties have access to decisionmaking and power so that they acquire a meaningful stake in the work of an institution (Woods 1999). Participation has many normatively desirable aspects – it is the basis of democracy, after all – but there is also a pragmatic case for participation of affected parties in decisions that spans political regimes. That is simple: participation helps to reduce or avoid the problems that emerge when key affected groups resist a policy or when a policy is made without knowing what they know. For example, complex medical payment incentive systems do not work as intended if they are made without understanding how doctors work and are paid (a common problem in “pay for performance” schemes). In the worst case, it makes it clear what depth of opposition a policy will face once enacted.
There are a variety of well-established participation mechanisms, as well as a very large and notably confused literature on public participation
in health that rarely explains the point of participation (for a critical discussion, see Stewart 2013) and some experiments in novel forms of public participation, such as participatory budgeting, whose popularity outside their places of origin is clearer than their effectiveness (Seekings 2013). Established mechanisms of participation include stakeholder forums, public consultations, elections, appointed community representatives on boards, and legal remedies (e.g., legislation that allows aggrieved outsiders to litigate processes). They can also include research, e.g., surveys of local opinion about a given option. When affected bodies are other governments or organizations, advisory committees, partnerships, joint budgets, and special forums for consultation are effective mechanisms for ensuring that different governments will be aware of decisions and make their views clear.
The benefit of participation is the potential creation of “ownership,” i.e., a sense among affected parties that they have a stake in the success of an initiative. Without ownership, there is a real risk of sabotage, lassitude, or simple ignorance, all of which amount to implementation failure. There is also the potential benefit of increased legitimacy – the sense that decisions are taken in ways that reflect the relevant interests.

Integrity

Integrity is one of many words for the key attributes of a well-run modern bureaucracy: processes of representation, decisionmaking, and enforcement should be clearly specified; all members should be able to understand and predict the processes by which an institution will take decisions and apply them; and individuals should have clear roles and responsibilities. In other words, an organization with a high level of integrity is meritocratic, separates the person and the office, and is not corrupt. These are the bases for well-functioning, long-lasting, trustworthy organizations.
Mechanisms policymakers can use to promote or entrench organizational integrity include internal audit (so that money moves as intended and can be traced), clear personnel policies (regular hiring, job descriptions, and procedures to weed out flawed people), a clear mandate for each organization, a clear and reliable budgeting process, administrative procedures such as document management and minuted meetings, external audit (to put a check on people within the organization), and a clear sense of organizational roles and purposes. Many of these policies, if added together, are bureaucracy – for better or for worse. The challenge of public management is to gain the benefits of bureaucracy in terms of merit, impartiality, and efficiency without risking too much wasted effort or incompetence.

Policy Capacity

Finally, most accounts of effective health governance include a discussion of policy capacity: the ability to develop policy that is aligned with resources in pursuit of societal goals. Policy capacity is a property of what Edward Page calls the “policy bureaucracy,” that part of an organization, especially a government, whose purpose is to produce policy (Page and Jenkins 2005). Just as a health policy initiative can run into trouble for a lack of medical staff, it can run into trouble for a lack of policy staff who are capable of identifying, synthesizing, and analyzing a wide variety of information in order to spot problems, make the case against ill-considered policies, and work through the procedural and practical challenges of implementation. It can look good to reduce policy capacity – civil servants at the heart of the state do not always have public sympathy – but it can have negative consequences in the form of poorly thought-out policies.
The development and improvement of policy capacity is a central preoccupation of public management scholarship, and the list of tools for doing it is long. It includes mechanisms to produce intelligence on developments in the system and its performance, so that policymakers can identify and react to problems; intelligence on process such as budgetary and legal issues (all too often neglected in health policy analysis); research and analysis capacity (trained staff who can conduct or commission research and deal with literature and
Greer SL, Wismar M, Figueras J, editors. Strengthening health system governance: better policies, stronger performance. Brussels/Philadelphia: European Observatory on Health Systems and Policies/Open University Press; 2016.
Greer SL, Vasev N, Wismar M. Fences and ambulances: Intersectoral governance for health. Health Policy. 2017;121(11):1101–1104.
Grindle MS. Good enough governance: poverty reduction and reform in developing countries. Governance. 2004;17(4):525–48.
Grindle MS. Good enough governance revisited. Dev Policy Rev. 2007;25(5):533–74.
Grindle MS. Jobs for the boys: patronage and the state in comparative perspective. Cambridge, MA: Harvard University Press; 2012.
Jarman H. Trade Policy Governance: What Health Policymakers and Advocates Need to Know. Health Policy. 2017;121(11):1105–1112.
Kjaer AM. Governance. New York; 2004.
Marks G, Hooghe L, Blank K. European integration from the 1980s: state-centric v. multi-level governance. J Common Mark Stud. 1996;34(3):341–78.
Marmor T. Fads in medical care policy and politics: the rhetoric and reality of managerialism. London: The Nuffield Trust; 2001. [Rock Carling Fellowship Lecture 2001].
Nunnenkamp P. What donors mean by good governance: heroic ends, limited means, and traditional dilemmas of development cooperation. IDS Bull. 1995;26(2):9–16.
Page EC, Jenkins B. Policy bureaucracy: government with a cast of thousands. Oxford: Oxford University Press; 2005.
Rhodes RAW. Understanding governance: policy networks, governance, reflexivity and accountability. Philadelphia: Open University Press; 1997.
Rosenau JN, Czempiel E-O. Governance without government: order and change in world politics. Cambridge: Cambridge University Press; 1992.
Smith PC, Stepan A, Valdmanis V, Verheyen P. Principal-agent problems in health care systems: an international perspective. Health Policy. 1997;41(1):37–60.
Smith PC, Mossialos E, Papanicolas I. Performance measurement for health systems improvement: experiences, challenges and prospects. Copenhagen: WHO Regional Office for Europe; 2008.
Stewart E. What is the point of citizen participation in health care? J Health Serv Res Policy. 2013;18(2):124–6.
Thomas MA. Govern Like Us: U.S. Expectations of Poor Countries. Columbia University Press; 2015.
Travis P, Egger D, Davies P, Mechbal A. Towards better stewardship: concepts and critical issues. In: Global programme on evidence for health policy discussion papers. 2002. www.who.int/healthinfo/paper48.pdf
Travis P, Egger D, Davies P, Mechbal A. Towards better stewardship: concepts and critical issues. In: Murray CJ, Evans DB, editors. Health systems performance assessment: methods, debate and empiricism. Geneva: World Health Organization; 2003.
Trump BD. Synthetic biology regulation and governance: Lessons from TAPIC for the United States, European Union, and Singapore. Health Policy. 2017;121(11):1139–1146.
Uslaner EM. Corruption, inequality, and the rule of law. Cambridge: Cambridge University Press; 2008.
Vasev N. Governing energy while neglecting health - The case of Poland. Health Policy. 2017;121(11):1147–1153.
Walshe K, Smith J. Leadership and governance. In: Healthcare management. 2nd ed. Maidenhead: Open University Press; 2011.
Williamson OE. The mechanisms of governance. New York: Oxford University Press; 1996.
Willison C. Shelter from the Storm: Roles, responsibilities, and challenges in United States housing policy governance. Health Policy. 2017;121(11):1113–1123.
Wolfe I, Mandeville K, Harrison K, Lingam R. Child
Rothstein B. The quality of government: corruption, social survival in England: strengthening governance for
trust and inequality in international perspective. health. Health Policy. 2017;121(11):1131–1138.
Chicago: University of Chicago Press; 2011. Woods N. Good governance in international organizations.
Sabel C. A quiet revolution of democratic governance: Glob Gov. 1999;5:39–61.
towards democratic experimentalism. In: OECD, Woods N. The globalizers: the IMF, the World Bank, and
editor. Governance in the 21st century. Paris: OECD; their borrowers. Ithaca: Cornell University Press; 2006.
2001. World Bank. Governance and development.
Sabet DM. Police reform in Mexico: informal politics Washington, DC: World Bank; 1992.
and the challenge of institutional change. Stanford: World Bank. Governance: the World Bank’s experience.
Stanford University Press; 2012. Washington, DC: World Bank; 1994.
Seekings J. Is the south Brazilian’? The public realm in World Health Organization. The world health report 2000:
urban Brazil through a comparative lens. Policy Polit. health systems: improving performance. Geneva:
2013;41(3):351–70. WHO; 2000.
43 Provision of Health Services: Long-Term Care
Vincent Mor and Anna Maresso
Contents
Introduction
Who Uses Long-Term Care Services and Supports?
Background to Long-Term Service and Support “Systems”
Structure of Chapter
Financing of Long-Term Care
Expenditure on Long-Term Care
Coverage
Paying for Long-Term Care Services
Structure of the Delivery System
The Long-Term Care Services and Supports Continuum
Long-Term Care Bed Capacity
Community-Based Service Capacity
Informal Care Provision and Cash Payments for Dependent Care Allowances
Regulating Quality
Different Regulatory Approaches to Quality Assurance
The Regulatory Reach of Quality Monitoring
Challenges Facing Quality Monitoring
Summary and Conclusions
References
Abstract

This chapter examines the financing, organization and regulation of long-term care in OECD countries. Historically, long-term care services and supports constitute a blending of social welfare benefits and health care provision. Depending on the complexity and severity of care recipients’ needs, delivery is characterized by both specialized nursing and medical care and personal and home-help services such as assistance with meals, grooming and household chores. The delivery of long-term care is accomplished via institutional (residential) care, formal home care services, as well as through informal care provided by family members or hired care givers. In line with the preferences of older people to remain in their own homes, the past decade has seen a substantial shift in most OECD countries towards more home and community-based care. This trend has regulatory and cost implications for monitoring the quality of care, which in the past has focused predominantly on institutions. Moreover, increased demand for formal services, in both residential and home care settings, due to ageing population pressures, also has implications for the long-term care workforce, with shortages anticipated over the next 20–40 years.

While funding of long-term care services comes mainly from public sources, there are very large variations between OECD countries in the resources dedicated to this sector. Eligibility for coverage also varies between countries, ranging from universal systems - based solely on need and not on income - to long-term care systems that apply means testing and safety-net principles to determine who qualifies for publicly-provided long-term care services and benefits. However, irrespective of financing model, all countries use some form of needs assessment to judge an applicant’s level of functional impairment and care needs. Financial support is provided via in-kind services or through cash benefits to recipients to purchase the services they need (with varying degrees of restrictions). Cost-sharing, in the form of user charges, plays a role in all countries, to different degrees, with service users, unless they are destitute, having to meet a proportion of the cost of their care from their own private resources.

The chapter also looks at the regulatory mechanisms used across a selection of countries to monitor the quality of long-term care, particularly in residential facilities, identifying three broad quality assurance approaches. The chapter ends with a discussion of key challenges in quality monitoring and its role in enhancing user choice and stimulating improvements in providers’ performance.

V. Mor (*)
Department of Health Services, Policy and Practice, Brown University School of Public Health, Providence, RI, USA
Providence Veterans Administration Medical Center, Center on Innovation, Providence, RI, USA
e-mail: vincent_mor@brown.edu

A. Maresso
European Observatory on Health Systems and Policies, London School of Economics and Political Science, London, UK
e-mail: a.maresso@lse.ac.uk

Introduction

The network of long-term care services and supports that provide assistance, both financial and personal, to frail and disabled individuals in society is not really a system. While many who study long-term care, and who document and compare the policies and practices that characterize a country’s service structure, talk about the “long-term care system,” in most countries it is best to view long-term services and supports as an amalgam of laws, policies, rules, practices, and service providers that emerged over the decades as a response to social and demographic changes in developed and developing societies. Unlike medical care or even public health structures, historically long-term care services and supports constitute a blending of social welfare supports and health care provision. The supports required by frail older, or seriously disabled, individuals include the additional finances needed to buy the help required to sustain daily life, or the services of appropriate agencies that provide that help. Since the principal cause of frailty and/or disability is compromised health, it is almost always the case that more, and more complex and comprehensive, medical care is needed in conjunction with support services.

Another factor that differentiates long-term care services and supports from the provision of health care services is that long-term care is most often a family affair, made possible by a person’s spouse and children and, less often, extended family. Indeed, evidence suggests that, depending upon the country, between 10 and 40 times more care is provided by informal carers than by formal agency staff, whether institutionally based or providing home-based services (Columbo et al. 2011). Unlike demands for primary care medicine, as demand for long-term care services and supports increases it can be met
by policies favoring provision of care by families or by formal sources, the former more reminiscent of how long-term care has been historically provided.

The history of formal long-term care in western societies is closely bound up with the emergence of state-sponsored social welfare efforts ranging from “almshouses” to outdoor relief efforts designed to support paupers and others unable to care for themselves (Kellog 1883; Katz 1996). Almshouses in Britain, the Netherlands, France, and Belgium housed indigent elderly and disabled persons unable to care for themselves without family members to whom they or local authorities or charities could appeal for support. In European and Anglo-Saxon countries these facilities emerged from a tradition of sectarian or local charitable organizations but were not infrequently conflated with support for the poor, the destitute, and the alcoholic. Local authorities, not just in England where the poor laws prevailed, established almshouses or hospices to care for the unfortunate and dying as a civic responsibility.

There are two overlapping dualities that characterize the scope and delivery of long-term services and supports. First, services and supports represent both financial support for basic food and shelter and the provision of physical support and care for those unable to do even the simplest daily tasks without help. Second, because care recipients are in need largely due to the complexity and severity of their medical conditions, services generally involve both unskilled homemaker services and specialized nursing and medical care. In addition, different countries have adopted varying mechanisms to meet the long-term care needs of their populations, ranging from cash payments to eligibility determination processes, which fundamentally define each country’s long-term services and supports structure. As will be observed in the paragraphs below, many countries make all these different dimensions of services and supports available under public funding with or without means testing the client and/or her family.

Who Uses Long-Term Care Services and Supports?

Most users of formal long-term care services (institutional or community-based) are women aged 80 and over. However, according to OECD health data, there is considerable country-to-country variation in the proportion of women aged 80 and over who use long-term care services, from a low of 2% in Poland to a high of over 45% in Norway (Columbo et al. 2011). Interestingly, depending upon the country, a sizeable minority of long-term care service recipients are under age 65, with Poland leading the way with almost half (48%) of formal care recipients being under 65. However, the substantial variation in the availability of home versus institutionally based long-term care services, and in how cash allowances provided to frail elderly persons and their families are counted in long-term care user statistics, makes it difficult to be too precise in comparing rates of use across OECD countries. This is a theme which will be revisited throughout this chapter since, without reliable data to characterize the nature of the services provided and the characteristics of the recipient population, it is difficult to have a great deal of confidence in many of the statistics used to compare the long-term care systems of one country with another.

Background to Long-Term Service and Support “Systems”

As noted, the term long-term care “system” is in almost all cases a misnomer because it is the exceptional country that actually has an integrated system. Regardless of its “system-ness,” in general, long-term care can be conceptualized as three interlocking sets of policies and forces which apply regardless of the country. These three features include: (1) financing and reimbursement, that is, who pays and how the services rendered are reimbursed; (2) the organization of the delivery system, that is, how the providers of long-term care services and supports are organized and coordinated, since clients often receive a multiplicity of services from different providers and sources;
and (3) the regulatory or quality assurance system, that is, the regulations, rules, and procedures governing licensure and quality standards for agencies serving the long-term care population.

These three components are interdependent; changes made in each affect the implementation and impact the others have, either directly or indirectly. For example, over the past decade most OECD countries have increased their emphasis on home and community-based services to meet the preferences of a new generation of seniors who are less willing to be relegated to an institutional setting (Grabowski et al. 2010; Damiani et al. 2011). Indeed, the movement toward “consumer directed care” represents the epitome of the shift toward home and community-based care since the underlying assumption is that older persons will use the new discretion to remain in the community and outside of institutions (Alakeson 2010).

Shifting payments to home care providers, away from the past dominance of institutions, has immediate implications for the structure of the delivery system as well as how it is regulated. To implement policies stimulating the development of home care services, regulatory structures that have historically been oriented toward monitoring quality in institutions must be realigned to manage a much more diverse and complex oversight process. Assuring that agencies charged with meeting the needs of the elderly in their homes actually are providing the care for which they are paid requires visiting clients and their families in their homes and/or demanding extensive, auditable care management documentation. This means that realigning long-term care services from the institution to the community will require a very different, and costly, regulatory and oversight structure. Furthermore, financing home and community services represents a substantial departure from the institutional approach, where purchasing a day of care in a nursing home is well understood. In the case of home care services, whether to pay by the hour, by the skill level of the staff person, or even to bundle payment with other post-acute care services or via capitation are all decisions that have different implications for how payments are made. To the extent that financing changes are also designed to give the eligible service recipient some choice as to how their needs are to be met in the form of “consumer direction” (Doty et al. 2010), additional operational complications arise related to personal care workers’ compensation, indemnification, and even whether family members can be paid to provide the care. As is obvious, these wrinkles in the financing rules and allowances introduce a further complication in the regulatory control structures since it is difficult for government to regulate the quality of familial relationships.

Changes in financing also have implications for the organization of the long-term care delivery system. For example, countries that instituted universal long-term care insurance policies that include cash transfers must determine whether those funds can be used to purchase home care services, regardless of the licensure status of the agency or worker employed to provide the service. However, policies which only reimburse recipients and their families for services rendered by licensed or professionally supervised staff necessarily mean costs will be higher. Without such requirements, institutional care providers, who are required to adhere to professional licensure requirements and labor laws including tax withholdings, would have a legitimate complaint about there not being a “level playing field,” since cash transfer payments that result in families hiring illegal immigrants can be seen as undermining the formal health and social care services labor market.

Understanding how changes in one component of the system affect the others is further complicated by the fact that it spans the health care sector as well as the formal and informal labor market. The emphasis on home care places increased pressure on family caregivers who, in rich countries, often supplement direct family care time with undocumented workers’ time, thereby violating labor laws and possibly endangering the frail older person. Recent efforts within the OECD to better characterize the variation in long-term care
Table 1 Funding for publicly provided long-term care, selected OECD countries

| Country [c] | Per capita spending on LTC (US$ PPP) [a] | LTC funding as % of GDP [a] | Total government/state component (%) of public LTC expenditure (incl. taxes and social insurance) [b, d] | Private share/out-of-pocket component (%) of public LTC expenditure [b, d] | Private insurance component (%) of public LTC expenditure [b, d, e] |
|---|---|---|---|---|---|
| Slovak Republic | 42 | 0.2 | – | – | – |
| Czech Republic | 59 | 1.4 | 100 | 0 | 0 |
| Poland | 68 | 0.4 | 92.3 | 0.3 | 0 |
| Korea | 73 | 0.3 | 76.9 | 17.8 | 0 |
| Hungary | 108 | 0.3 | 90.3 | 2.4 | 0.9 |
| Spain | 271 | 0.6 | 71.9 | 28.1 | 0 |
| Slovenia | 302 | 0.8 | 75.4 | 24 | 0.5 |
| Australia | 367 | 0.8 | 88.9 | 8.5 | 0.3 |
| New Zealand | 383 | 1.3 | 92 | 4.4 | 1.3 |
| United States | 455 | 0.6 | – | – | – |
| Germany | 470 | 0.9 | 67.2 | 30.4 | 1.7 |
| Austria | 497 | 1.1 | 81.8 | 17.1 | 0 |
| Japan | 527 | 1.4 | 88.9 | 7.1 | 4 |
| France | 564 | 1.7 | 99.2 | 0.4 | 0.4 |
| Canada | 574 | 1.2 | 82 | 16.8 | 0.4 |
| Iceland | 638 | 1.7 | 100 | 0 | 0 |
| Belgium | 707 | 1.7 | 90 | 0.2 | 9.8 |
| Denmark | 724 | 1.8 | 89.6 | 10.4 | 0 |
| Finland | 790 | 1.8 | 84.4 | 14.2 | 0 |
| Luxembourg | 822 | 1.4 | – | – | – |
| Norway | 1276 | 2 | 89.3 | 10.7 | – |
| Sweden | 1332 | 3.6 | 99.2 | 0.8 | 0 |
| The Netherlands | 1431 | 3.5 | 99.9 | 0 | 0 |
| Portugal | – | 0.1 | 53.4 | 45.4 | 1.1 |
| Switzerland | – | 0.8 | 38.8 | 58.4 | 0.4 |

Source: Adapted from Columbo et al. (2011)
Notes
[a] Data from 2008.
[b] Data from 2007.
[c] Countries are listed from lowest to highest per capita expenditure on long-term care.
[d] Funding from government sources, private out-of-pocket expenditures, and private insurance do not always add up to 100% as the following other minor funding sources are excluded from this table: nonprofit institutions serving households, corporations (other than health insurance), and “other”.
[e] Data on out-of-pocket spending for some of the countries are underestimated. For example, in the Netherlands, cost-sharing on long-term care services is estimated to account for 8% of the total long-term care expenditure. The share of out-of-pocket spending for Switzerland is overestimated as cash benefits granted for care in care facilities are not considered.
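The note to Table 1 on funding shares not summing to 100% can be checked with simple arithmetic. The following is a minimal sketch in Python, using a handful of rows copied from Table 1; the names `SHARES`, `residual_share`, and `residuals` are illustrative only and do not come from the source. It computes, for each country, the share of public long-term care expenditure left unexplained by the three listed components, which the table notes attribute to excluded minor funding sources.

```python
# Shares (%) of public LTC expenditure, copied from Table 1 (2007 data):
# (government/state, private out-of-pocket, private insurance).
# Only a few rows are included here for illustration.
SHARES = {
    "Poland": (92.3, 0.3, 0.0),
    "Korea": (76.9, 17.8, 0.0),
    "Hungary": (90.3, 2.4, 0.9),
    "Germany": (67.2, 30.4, 1.7),
}


def residual_share(shares):
    """Percentage not covered by the three listed components.

    Rounded to one decimal place, matching the precision of the table.
    """
    return round(100.0 - sum(shares), 1)


def residuals(table=SHARES):
    """Map each country to its residual (other minor sources) share."""
    return {country: residual_share(s) for country, s in table.items()}


if __name__ == "__main__":
    for country, gap in residuals().items():
        print(f"{country}: {gap}% attributable to sources not shown")
```

A non-zero residual is expected under the table's own notes and does not by itself indicate a data error.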
Types of Public Long-Term Care Systems
The scope of entitlement provides a useful way to classify countries’ public long-term care systems since this approach captures whether entitlement is universal or whether access to services is means-tested and thus reserved for the poorest individuals who are protected through a public safety-net. In addition, such coverage may be financed through a single program (such as general taxation or a mandatory long-term care insurance scheme) or through multiple programs and benefits. Using these criteria, Columbo et al. (2011) identify three long-term care models: (1) universal coverage systems with a single program, (2) mixed systems, and (3) means-tested safety-net systems. Table 2 classifies a number of OECD countries according to this typology.

The main feature of single-program universal systems, as found in Germany, Japan, Luxembourg, the Netherlands, and South Korea, which have mandatory long-term care insurance schemes, and the Nordic countries, which have tax-based financing programs, is that they provide public long-term care services to everyone who is assessed as needing care, based on their dependency level and regardless of income. That is, access to services is not dependent on the income level or assets of beneficiaries. Mixed systems, as seen in Australia, Austria, France, and Spain, typically have a number of different programs and benefit schemes operating side by side, which can be either universal or means-tested, with the amount of the benefit adjusted downwards as the recipient’s income level increases. The countries in this group may also have medical-related or nursing benefits covered universally (free) through the health system. Finally, means-tested safety-net systems (found in the United Kingdom and USA) use income or asset tests to set a threshold for entitlement to publicly provided long-term care services and benefits. Income and asset-testing is used to target those with the highest care needs and to protect those who otherwise would not have the means to purchase care privately. However, if means-testing thresholds are set quite low, a large proportion of elderly people in need of long-term care may be excluded from receiving publicly provided services, until they become impoverished paying for such services privately. The only other way that these individuals’ long-term care needs can be met is if services are provided as part of the health care system (as in the case of nursing care in the United Kingdom), which sets up the dynamic of cross-subsidy between the health and social services sector, the latter generally being a more costly means of meeting the same need.

It should be noted, however, that whatever long-term care coverage model a country has, needs assessments to judge an applicant’s level of functional impairment and care needs are a central component of determining eligibility. Nor is it the case that countries’ approach to coverage necessarily corresponds with their level of spending. For example, while the Netherlands, Sweden, Norway, and Luxembourg dedicate the highest per capita spending to long-term care services, other universal-system countries such as Germany, Japan, South Korea, as well as Denmark and Finland fall within the mid to upper-mid expenditure range. Similarly, all mixed system countries have mid-range long-term care spending, as does the USA, which belongs to the means-tested safety-net group.

What Long-Term Care Services Are Covered?
Most public long-term care systems cover both institutional and home-based services, although the range of services covered varies, as does the proportion of the cost (see Table 2, as well as the subsection on cost-sharing below). Universal long-term care systems tend to provide comprehensive long-term care packages encompassing institutional/residential services, home care nursing, domestic assistance as well as sheltered housing schemes, assistive devices, home modification, and transport to community services. However, in some universal systems, such as those in Germany and South Korea, a notable omission from the long-term care package is accommodation (all rooms in Germany and private rooms in South Korea) and meal costs in nursing homes; these must be paid for out-of-pocket. In Japan, lodging and meals in nursing homes are only partially covered. In mixed
Table 2 (continued)

| Country | Type of system based on eligibility (universal; mixed; means-tested/low income) | Financing source: tax, social security contribution | Government levels contributing to financing | Program characteristics | Who is covered | Income/means-testing to determine eligibility? | Needs assessment | Types of benefits provided |
|---|---|---|---|---|---|---|---|---|
| Italy | Mixed | Tax | National and regional | Institutional care benefits part of the health system; cash care allowance covers home care | All people in need | No | Yes | Cash and in-kind home and institutional care |
| Japan | Universal | Social insurance, plus personal contributions | National | LTC insurance system. Insured individuals aged 40–65 pay 30% of total LTC costs | Over 65, or 40–65 with age-related disease | No | Yes | In kind only: home and institutional care |
| Korea | Universal | Social insurance and taxes | National | LTC insurance system | Over 65s, or under 65 suffering from geriatric diseases | No | Yes | Cash and in-kind home and institutional care |
| Luxembourg | Universal | Social insurance, tax and a special tax | National | Single LTC insurance system part of health insurance system | All people in need | No | Yes | Cash and in-kind home and institutional care |
| The Netherlands | Universal | Social insurance | National | LTC insurance system with multiple insurers | All people in need | No | Yes | Cash and in-kind home and institutional care |
| New Zealand | Mixed | Tax-based | National | Health funding authority responsible for LTC provision; Residential Care Subsidy | All people in need | Yes | Yes | In kind only: home and institutional care |
| Norway | Universal | Tax-based | National and local | Single program | All people in need | No | Yes | Cash and in-kind home and institutional care |
| Spain | Mixed | Tax-based | Central and regional | National long-term care system administered by regions | All people in need | Yes | Yes | Cash and in-kind home and institutional care |
| Sweden | Universal | Tax-based | Local and national (11–12%) | Single program | All people in need | No | Yes | Cash, in-kind and vouchers: home and in-kind care varies across municipalities |
| Switzerland | Mixed | Social (health insurance), state budget | National and cantons | Mandatory health insurance program plus complementary cash benefits under Disability Insurance | All people in need | Asset tested (for some benefits) | Yes | Cash and in-kind institutional care; home care mainly provided by private organizations |
| United Kingdom | Means-tested safety-net | Tax-based | National and local | Various programs and allowances | Social care benefits to all adults in need; specific allowances for the disabled and elderly disabled | Asset tested (for some benefits) | Yes | Cash and in-kind home and institutional care |
| United States | Means-tested safety-net | Tax-based | National and state | Medicaid and Medicare programs | People of low income (Medicaid); seniors (Medicare) | Medicaid is means-tested; Medicare is universal for seniors | Yes | Mainly in-kind: institutional benefits. Optional state home care benefits |

Sources: Adapted from Fernandez et al. (2009); Columbo et al. (2011); Swartz (2013)
Note: LTC – long-term care.
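The three-model typology used in Table 2 can be summarized as a small decision rule. The sketch below is a hypothetical illustration in Python, not a description of any country's actual rules: the function `public_benefit`, the `Applicant` class, and all numeric values (benefit amount, income threshold, taper rate) are assumptions introduced here. It only encodes what the chapter states: every model gates on a needs assessment, universal systems ignore income, mixed systems reduce the benefit as income rises, and means-tested systems apply a threshold.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    needs_care: bool  # outcome of the needs assessment
    income: float     # income in arbitrary currency units (hypothetical)


def public_benefit(system, applicant, full_benefit=1000.0,
                   income_threshold=20000.0, taper_rate=0.25):
    """Return a hypothetical public benefit under one of the three models.

    All amounts and the linear taper are illustrative assumptions.
    """
    if not applicant.needs_care:
        # Every coverage model relies on a needs assessment.
        return 0.0
    if system == "universal":
        # Entitlement is based on need, regardless of income.
        return full_benefit
    if system == "mixed":
        # Benefit is adjusted downwards as income increases
        # (modeled here as a linear taper above a threshold).
        excess = max(0.0, applicant.income - income_threshold)
        return max(0.0, full_benefit - taper_rate * excess)
    if system == "means-tested":
        # An income threshold determines entitlement.
        return full_benefit if applicant.income <= income_threshold else 0.0
    raise ValueError(f"unknown system type: {system}")


if __name__ == "__main__":
    person = Applicant(needs_care=True, income=22000.0)
    for model in ("universal", "mixed", "means-tested"):
        print(model, public_benefit(model, person))
```

Under these assumed parameters, the same applicant receives the full benefit in a universal system, a reduced benefit in a mixed system, and nothing in a means-tested system, which is the contrast the typology is meant to capture.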
systems, typically nursing care, either in home or institutional settings, is financed on a universal basis by the parallel health system while personal (social) care is covered under separate benefit schemes. For example, in Italy, special nursing homes for elderly people are covered via the health system budget while home care services are mainly financed by a non-means-tested cash care allowance whose modest level means it is most often used to pay for informal care. In Canada, most provinces cover nursing and personal care (such as help with bathing and grooming) in home settings, but other assistance such as domestic help and meal preparation may require the user to pay a fee. In the group of means-tested safety-net countries, the United States sets a basic mandatory basket of long-term care services (such as nursing facility services and home health-related services) through its Medicaid program for people on low incomes, but individual states determine what other services may be covered. In most states, while benefit structures cover support for daily living activities in home-care settings as well as accommodation and meals in nursing homes, the latter services are only available to those who meet strict means-testing and who have exhausted their own resources before becoming eligible for public support (Columbo et al. 2011).

Paying for Long-Term Care Services

Comparing countries’ approach to paying for services is complicated by the lack of comparable international data on the different reimbursement mechanisms used to pay providers for different types of care, whether it be fee-for-service payments, capitation, or day-rates for nursing costs. The importance of having data on the impact of different reimbursement vehicles may be illustrated by the case of how institutional services (i.e., in nursing homes) are paid for. If a country’s reimbursement mechanism does not recognize, and adjust payment levels proportionally for, clients/patients who have more complex needs and require more care, there will be a disincentive for providers to admit such individuals as their greater impairment will cost the facility more in terms of the time, labor, and skills required to care for them. In this example, a form of case-mix reimbursement (such as the Resource Utilization Groups case-mix system used in many US states and in Ontario, Canada) that provides an incentive to care for sicker patients would be more appropriate than a flat-rate reimbursement model that pays the same amount per nursing home resident, regardless of the intensity of their care needs. While most often applied to the institutional setting, it is possible to devise case-mix reimbursement models for home care services. Such considerations impact on both the efficiency of the long-term care system as well as on its capacity to meet the growing care needs of the population requiring long-term care services.

Cash Benefit Schemes
There is some cross-national information on cash-benefit schemes, which offer recipients the choice to purchase care services that they feel best meet their needs from the provider they prefer. While most countries offer a combination of in-kind services and cash benefits, a few, like Austria, France, and the Czech Republic, use cash benefits as the main type of long-term care purchasing mechanism. These schemes differ among countries as to whether they are available alongside in-kind benefits or whether recipients must choose either one or the other, whether the level of the cash benefit is determined through means/income testing, and whether any restrictions are placed on how the benefit may be used. For example, some countries require that only accredited formal services be hired while others have very few restrictions and allow the benefit to be used to pay family members or other informal carers for services rendered in the home. There is some evidence that the use of unregulated cash payments seems to incentivize the hiring of migrant care workers in countries such as Austria and Italy, who either substitute or complement personal care and domestic assistance traditionally provided by the family (van Hooren 2008; Columbo et al. 2011; Phillips and Schneider 2007; see also section “Structure of The Delivery System” on Provision). Table 3 provides an overview of the cash
Table 3 Cash for care schemes for long-term care services, selected OECD countries
Country | Benefits available | Cash benefit programs | Income/asset tested | Use restrictions
Austria | Both in-kind and cash | 1) Cash Allowance for Care (Pflegegeld); 2) 24-hour care benefit; 3) Dementia care benefit | 1) No; 2) Income; 3) No | No. Can be used to pay for care by relatives or other carer
Czech Republic | Only cash benefits | Care allowance | No | No. For services or care by relatives
Denmark | In-kind, cash, and vouchers | BPA (Citizen Controlled Personal Assistance) | No | Yes. Not for nursing care
France | In-kind and cash benefits are separate | Allocation personnalisée d’autonomie (APA) | Income | Yes. Use of APAs is strictly controlled. Can be used to pay for care by relatives but not a spouse
Germany | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme: 52% of users opt for cash benefits | No | Yes. Cannot be used to pay for care by relatives or for some services (such as GP services)
Italy | In-kind and cash benefits are separate | Indennità di accompagnamento (Carer/Companion allowance) | No | No. Can be used to pay relative or other carer
Korea | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme | No | Only available to users who live in remote areas with few facilities, are unable to use LTC facilities due to national disasters, or are unsuitable for institutional LTC due to physical or mental condition. Cannot be used to pay for care by relatives
Luxembourg | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme: Cash Allowance for Care | No | Cash for the first 10.5 hours of care per week
The Netherlands | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits (Personal Care Budgets) are part of LTC insurance scheme: 12% of users opt for Personal Care Budgets | No | 98.5% of expenses must be justified and unspent funds returned. Personal Care Budgets can be used to pay for care by relatives but they must have a contract
Spain | Users must choose between either in-kind or cash benefits (the latter vary according to program) | 1) Allowance for user to hire services; 2) Allowance for user receiving informal care; 3) Allowance for Personal Assistance | 1) Income; 2) Income; 3) Income | 1) Hire through accredited centers; 2) To compensate informal carers who must be a relative or in rural areas; a neighbor can qualify; 3) Expenses must be justified; carer must have professional qualifications
Sweden | In-kind and cash benefits are complementary; also vouchers | 1) Attendance Allowance; 2) Assistance Allowance | 1) No; 2) No | Yes. Cannot be used to cover medical expenses or to pay for care by relatives
United Kingdom | In-kind and cash benefits are complementary | 1) Attendance Allowance; 2) Direct Payments; 3) Individual (social care) Budgets | 1) Income and asset tested; 2) Income; 3) Income and asset tested | 1) No; 2) Yes. Spending record required; 3) Yes. Cannot be used to pay for care by relatives
Sources: Adapted from van Hooren (2008); Columbo et al. (2011); Swartz (2013); Wirrmann Gadsby (2013)
benefit schemes available in a selection of countries, highlighting these major differences.
Cost Sharing
Cost-sharing, in the form of copayments, deductibles, or user charges, applies to all long-term care systems, whether they are universal, mixed, or means-tested, safety-net systems. Commentators (Swartz 2013) have noted that rising long-term care costs, aging populations, and pressure on public sector spending due to structural deficits and the financial crisis in Europe since 2008 have led to a shift to greater cost-sharing among users of long-term care services or their relatives. In most cases, cost-sharing is subject to income thresholds, with exemptions available for those meeting set criteria, such as low-income status (Columbo et al. 2011; Swartz 2013). For example, in the Nordic countries with universal systems, cost-sharing mechanisms account for relatively low shares of publicly financed formal long-term care services, and in Sweden and Norway such contributions are capped. In contrast, beneficiaries in South Korea are required to pay a coinsurance rate of 20% for residential care and 15% for home care (Jung et al. 2014). Similarly, in Australia, those eligible for public long-term care services still need to contribute to the cost of their personal care in both residential and home settings, with the amount determined through means-testing (Columbo et al. 2011). Table 4 summarizes a number of cost-sharing approaches to long-term care and provides some country examples. Like user charges and cost-sharing approaches in health care, the different approaches found in long-term care systems reflect not only the incentive structures that modulate access to care but also countries’ different emphasis on social protection for vulnerable or low-income groups.
Structure of the Delivery System
As can be seen in Fig. 1, based upon OECD data on long-term care, there is substantial variation in the percentage of the population over 65 using long-term care services (OECD 2013b). Consistent with its origins in the medieval almshouse and hospice, long-term care was traditionally synonymous with residential arrangements provided in an institution. Indeed, in spite of the concerted effort that most OECD governments have made in “rebalancing” long-term services and supports from institutions to community-based services, spending on long-term care in institutions was higher than spending at home in virtually all OECD countries in 2008. On the other hand, many more users of long-term care services reside at home. While in the average OECD country 12.9% of those 65 and over receive formal long-term care services, less than half (under 5%) receive care in a residential or institutional setting. Indeed, most countries report that about twice as many in the population of long-term care users are receiving those services at home (Columbo 2011).
Figure 2, which reflects OECD Health Statistics data, shows that among countries with data on the distribution of home-based and institutional care over time, the share of long-term care users receiving home care has increased in most countries and in the OECD average.
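As a worked illustration of the coinsurance arrangement reported for South Korea under Cost Sharing above, the following minimal sketch splits a care bill between the user and the insurance scheme. Only the two rates (20% for residential care, 15% for home care) come from the text; the function name `split_cost` and the monthly cost of 1000 currency units are hypothetical, chosen purely for illustration, and the sketch ignores caps, exemptions, and means-testing.

```python
# Coinsurance rates (in percent) reported in the text for South Korea
# (Jung et al. 2014). Everything else in this sketch is a hypothetical
# illustration, not part of any actual scheme's rules.
COINSURANCE_PERCENT = {"residential": 20, "home": 15}


def split_cost(setting: str, total_cost: float) -> tuple[float, float]:
    """Return (user_share, scheme_share) of a bill for the given care setting."""
    rate = COINSURANCE_PERCENT[setting]
    user_share = total_cost * rate / 100
    return user_share, total_cost - user_share


# Hypothetical monthly cost of 1000 currency units (not a figure from the source).
for setting in ("residential", "home"):
    user, scheme = split_cost(setting, 1000.0)
    print(f"{setting}: user pays {user:.2f}, scheme pays {scheme:.2f}")
```

Under these assumptions the same bill leaves the user paying more in residential care than in home care, which is the incentive difference the two rates create.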
The Long-Term Care Services and Supports Continuum
As noted, presently most OECD countries have a higher percentage of long-term care users receiving care at home than receiving institutionally based care. While many countries have had a range of different types of long-term care residential arrangements for frail older persons for many decades, the full array of community-based services has been a relatively recent development. This required the development
Fig. 1 Percentage of population aged 65 or over receiving long-term care services, by country, 2011 (Source: OECD 2013a)
of a continuum of long-term services and supports, ranging from household chores to intensive, medically oriented nursing home care to serve individuals as their needs increase. The different levels of intensity of nursing home care offered range from facilities that manage chronically bed-bound patients requiring oxygen, artificial feeding, and intravenous care to facilities specializing in short-term rehabilitation to independent small apartments offering congregate meals. Nonresidential long-term care services can range from intensive round-the-clock “respite” services to weekly chore and cleaning services, and include the full range of services, from nursing to meals, offered in the homes of dependent elders or in community settings. Day care programs, with and without medical and nursing support, increasingly serve frail older individuals who otherwise live with caregiver children. Finally, some have argued that even the differentiation between residential and home care services can be false since, regardless of where people live, needed services can be provided to meet their needs (Kane et al. 1998). Indeed, some have argued that financing rules and contradictory regulatory controls are the major drawbacks to having more comprehensive and responsive long-term care delivery systems.
Long-Term Care Bed Capacity
Even though home care is more prevalent than institutional care, the best data regarding long-term care services across the OECD refer to the availability of residential long-term care beds per elderly person. Figure 3 reveals the substantial intercountry variation in the number of residential care home beds per 1000 elderly, ranging from under 20 in Italy, Poland, and Korea to over 60 in the Nordic and northern European countries. (In the OECD data, Japan has very few nursing home beds but many long-stay hospital beds, which are not counted as nursing homes although they serve a very similar population (Ikegami et al. 2014).) Not included in these figures are OECD countries that do not report data on long-term care use such
Fig. 2 Share of long term care recipients aged 65 and over receiving care at home, 2000 and 2011 (Source: Columbo et al.
2011)
as Mexico and Chile, which must be presumed to have even less well-developed resources than countries like Italy, Korea, and Poland. In spite of the higher prevalence of home care recipients, spending on long-term care in institutions is higher than spending on home care in all reporting OECD countries with the exception of Denmark (Columbo et al. 2011). This reflects two phenomena: first, institutional care is more costly than home care, and second, a higher proportion of those receiving institutional care are very impaired, particularly in the absence of very involved family caregivers (Carpenter and Hirdes 2013).
In the absence of standardized definitions for what constitutes a “long-term care bed,” OECD data on the rates of such beds per 1000 elderly are necessarily vague. Many countries have different definitions for what constitutes a long-term care bed. For example, Japan has many small, long-stay hospitals licensed quite differently from other Japanese acute hospitals but serving populations that are not all that different from those of licensed nursing facilities (Ikegami et al. 1994; Ikegami et al. 1997). On the other hand, the emergence in the USA over the last several decades of a new class of residential long-term care home, Assisted Living Facilities, which are not nursing homes per se, has made it appear as if the number of nursing home beds per elderly person has fallen dramatically, even though the average impairment level of Assisted Living residents now is as great as it used to be among nursing home residents two decades ago (Sloane et al. 2005; Smith et al. 2007; Stevenson and Grabowski 2010). Thus, the recorded number of long-term care beds per 1000 elderly is very sensitive to the different definitions of what constitutes a long-term care bed, suggesting that in both Japan and the USA, OECD data undercount the number of long-term care beds, although for different reasons. Indeed, according to the first national survey of long-term care providers done by the US National Center for Health Statistics, in 2012 there were some 15,700 nursing homes but 22,000 Assisted Living Facilities, with 39 nursing home beds per 1000 elderly and 20 Assisted Living beds per 1000 elderly. Adding these together (59 beds per 1000 elderly) places the USA at the same level of total long-term care beds per thousand elderly as exists in the Netherlands and Belgium (but still below Sweden) and substantially higher than Germany or France. However, it is not known whether certain classes
Fig. 3 Residential Care Home Beds per 1000 Elderly aged 65+, selected OECD Countries, 2009 (Source: OECD 2013a)
and cultural histories. Additionally, in many OECD countries the organization and provision of long-term care services is a local matter for municipalities operating under national guidelines that still allow considerable local discretion in how long-term care policies are implemented (Tarricone and Touros 2008). This means that national governments may not have the kind of detailed data that would make it possible to characterize the supply of services across the whole country. For example, a 2008 report on home care included a section on the supply of home care without offering any data regarding the number of agencies, workers, or services available to the elderly population (Tarricone and Touros 2008).
A report compiled as part of a conference hosted at the University of Amsterdam by Professor Duyvendak and his colleagues included a series of detailed case studies regarding the structure of the long-term care service delivery systems in Greece, Germany, Italy, Poland, the Netherlands, England, Sweden, and Norway. The project was designed to assess the adequacy of political commitment to providing support services to the frail elderly and others with long-term care needs (Duyvendak et al. 2009). No consistent information about the supply of home care services was available across all the countries, other than statements that formal home nursing and care aide services provided by established and authorized entities were largely unavailable in countries like Greece, Italy, and Poland, whereas in countries like Sweden and Norway, all municipalities have these kinds of service agencies. This is consistent with OECD data indicating that the proportion of the elderly population receiving any formally provided long-term services and supports was very low in the southern European countries but much higher in northern European countries.
One of the recent issues facing all OECD countries is the projected number of long-term care workers needed relative to the growing number of frail, aged individuals in the population (Columbo et al. 2011). Projections for the numbers of workers relative to the size of the population in need conducted by the European Network of Economic Policy Research Institutes strongly point to the fact that most countries will face a significant labor shortage (Mot and Willeme 2012). Indeed, demand for formal institutional care, as well as formal home care support services, is anticipated to increase by 100% to over 200% between 2010 and 2060 in the face of a relative flattening of the available number of informal caregivers and a projected decline in the number of long-term care workers employed in the formal sector. These projections for the four countries examined as part of the ANCIEN project are consistent with other European countries, and the authors suggest that policy makers face major challenges in the coming years as demand outstrips the supply of caregivers, both informal family members and formal agency employees. The only way to increase the supply of formal care workers is to substantially alter their compensation and improve their working conditions, both of which will dramatically increase the costs of services that are already projected to bankrupt countries based only upon changing demography.
Informal Care Provision and Cash Payments for Dependent Care Allowances
While policies offering cash payments to frail elders and their family members are discussed under the financing section above, these policies have a direct bearing on the structure of the market for long-term care services for several reasons (Ungerson and Yeandle 2007). First, many countries which have some form of cash payments provided to eligible frail elders and their families allow those funds to be used by family members, ostensibly as compensation for forgone labor force activity (Wiener 2007). Second, unless there are explicit limitations on the use of such cash transfers, according to the limited empirical research that has been done on the issue, recipients and their families appear to be more likely to purchase unskilled household and personal care help from the unregulated labor market. That is, there are many reports documenting that this kind of work is frequently done by undocumented workers (Bettio and Solinas 2009). Third, the interrelationship between cash transfer programs
from long-term care insurance and the role of informal care and the undocumented “grey” labor force for domestic help has begun to receive considerable attention in EU and OECD countries in the context of the raging debate regarding illegal immigration and the cost of employment. While it is not the place of this chapter to address these issues thoroughly, the implications for the organization of long-term services and supports are considerable since, many would argue, the growth of the illegal labor market for domestic help that accompanies the aging of the population undermines the development of a robust home care services system. It is this issue we address in the final paragraphs of this segment of the chapter (van Hooren 2008).
Informal care is the dominant source of support for most community-dwelling elderly throughout the developed world. While many have argued that the availability of formal agency support undermines and substitutes for endogenous informal care from families, the evidence from both microeconomic and macroeconomic studies does not support this contention (Tarricone and Touros 2008; Rothgang 2003; Foster et al. 2007). While the proportion of the frail elderly in northern Europe who receive some form of formal care services from municipalities is much higher than in southern Europe, the proportion receiving assistance from families and friends is similar, although the relative share of support may be more highly weighted toward formal service providers (OECD 2013a).
As noted, numerous OECD countries have some form of cash payment system for frail elders and their families who require long-term care services and supports. While there are many details that differentiate the manner in which the funds are provided and the conditions under which they can be used, it is clear that these are extremely popular programs (Columbo et al. 2011). Indeed, the popularity of the cash option is best seen in the fact that a majority of German households eligible for support under the long-term care insurance law elect to receive cash rather than services even though the value of the cash is far less than the replacement cost of the service. In the USA the success of the “cash and counseling” demonstration in three states was the stimulus for major expansions of this option for Medicaid programs across the country since both family members serving as informal caregivers and the workers hired by the family and patients reported improved satisfaction with their circumstances when compared to the control group (Foster et al. 2007; RWJF 2013).
One of the key provisions of these cash allowance programs is the extent to which the use of the funds by recipients and their families is regulated, that is, whether how the money is spent is predetermined and/or whether there are restrictions on the kinds of workers that can be hired (van Hooren 2008). In a comparative policy analysis, van Hooren appears to suggest that countries with little restriction on how such cash allowances are used have higher proportions of eligible households using illegal domestic workers. Although the data are necessarily limited, comparisons of Italian families’ use of undocumented domestic workers to care for the elderly and the virtually nonexistent use of such workers in the Netherlands suggest that families in countries with better developed formal agency community-based services will rely less upon the unregulated labor market to care for frail elders (van Hooren 2008; Simmonazzi 2009).
In the USA, recent policies have expanded the applicability of “cash and counseling” programs, which encourage income- and functionally eligible elders and their families to use their cash allowance to arrange for personal care attendants and home care assistants directly, thereby making their money stretch. As with many of the European cash transfer programs, the autonomy inherent in such self-directed care seems to promote an informal economy with workers, whether legally documented or undocumented, being paid without standard employer benefits such as holidays, vacation, and sick time, which would be expected by a worker employed by an agency. For agencies, having to compete for labor when they have to withhold taxes and other payments from employees (not necessarily required for cash payments), as well as for customers who do not necessarily want to pay for the higher cost of agency-supervised
workers, places them in a highly disadvantageous position. Many US states committed to implementing such “consumer-directed care” programs are exploring ways of creating a labor market that elderly and disabled clients and their families and advocates can rely upon and that also has provisions for paying workers’ benefits while indemnifying the care recipient. (See the “Cash and Counseling” Resource Center Web site for presentations and discussions of care worker training, benefits payment, and liability insurance. http://www.bc.edu/schools/gssw/nrcpds/cash_and_counseling.html.)
Over the next decades, OECD countries will have to devise more cohesive policies to support the population’s desire to receive care at home without inadvertently stimulating demand for undocumented workers and illegal immigration which, in turn, undermine the viability of a well-functioning formal market in long-term care services. Northern European countries, like wealthier US states, have well-developed home care agency structures, whether publicly or privately operated, precisely because they have invested in this sector of the long-term care system, whereas southern European countries, like their US southern state counterparts, have relied more extensively on family caregivers who, when unable to meet elders’ needs, seek to purchase assistance from the informal labor market. How to devise financing and reimbursement policies as well as regulatory structures to address these issues presents a major challenge to developed countries.
Regulating Quality
Ensuring the quality of long-term care involves more than just putting into place regulatory rules and procedures to govern the licensure (registration) and certification of providers of long-term care services. It also involves having systems to monitor the safety, effectiveness, and success of those services in terms of maintaining the well-being, health outcomes, and dignity of long-term care recipients. In practice, the latter goal is more difficult to achieve and requires increased investment in time, skilled staff, and resources to ensure appropriate reporting dimensions, standardized data gathering procedures, and consistent assessment protocols that can then feed into quality improvement measures. However, just as there is a great deal of variation in how countries organize and finance long-term care, there are also differences in the regulatory approaches to assuring quality.
Different Regulatory Approaches to Quality Assurance
In a recent comparative study that we conducted on the regulation of long-term care quality in 14 countries, we identified three main approaches that underpin the quality assurance frameworks in the countries with relatively well-developed long-term care systems (Mor et al. 2014). The first approach, as seen in countries such as Austria, Germany, Japan, and Switzerland, delegates the main responsibility for upholding standards, training, and staff certification requirements for the long-term care workforce, as well as for monitoring quality, to professional organizations. In this approach, government is a partner in quality assurance rather than assuming a primary “policing” role. While government is still involved in setting standards for long-term care via legislation, the “professionalism-based” approach to quality regulation places considerable trust in “self-regulation.” This position is predicated on the assumption that associations of professionals involved in long-term care have distinctive expertise that the state can rely upon to ensure their commitment to training and ethical good practices in caring for the elderly.
In contrast, a second approach (followed in countries such as Australia, England, the Netherlands, and Spain) is much more empirical and inspection-based, with government authorities assuming the primary role in rule-making and monitoring providers’ compliance with statutorily defined regulations. This “inspection-based” approach stresses the need for close oversight
by central authorities as there is generally less societal confidence that professionals, or providers, will always act in the interests of frail elderly people using long-term care services. A third approach, in place in Canada, Finland, New Zealand, and the United States, builds upon the existing inspection-oriented approach to licensing, inspection, and complaints investigation by adding quality measurement and public reporting protocols based on intensive data gathering and analysis. This “data management and public reporting” approach emphasizes standardization and reporting of data so that long-term care users, ideally, can act as consumers and choose the best services suited to their needs, with quality being boosted by market competition among providers. The best example of this approach can be seen in the United States, where the RAI Minimum Data Set (MDS) is used and the government’s Nursing Home Compare Web site reports information on a range of MDS-based assessment measures for short- and long-stay nursing home residents (see http://www.medicare.gov/nursinghomecompare/search.html). (The MDS is a Resident Assessment Instrument (RAI) which is required in all US nursing homes in order to ensure that a resident’s care plan is based upon a comprehensive assessment of their needs.)
The Regulatory Reach of Quality Monitoring
Monitoring the quality of long-term care can address structures, processes, and, less often, outcomes. It is also useful to divide quality regulation functions in terms of three broad domains: (1) standard setting and initial inspection and licensure, (2) ongoing surveillance and enforcement, and (3) reporting and/or rewarding performance. Table 5 summarizes a wide range of these regulatory functions in a selection of OECD countries, with the check marks in the columns indicating that a particular function is an integral part of the quality regulation regime in the particular country. It is important to note that the table rows include quality assurance functions that are mainly applicable to the residential care sector as quality regulation of home care agencies is very underdeveloped. Moreover, a check mark refers only to the fact that the regulatory function takes place and does not purport to indicate the effectiveness of the regulations or the overall quality of care.
As can be seen, the first four rows of Table 5 look at structural standards that are relevant to the licensing of long-term care providers. All countries in this sample require providers to register with designated authorities and, in the case of residential facilities, to demonstrate that requirements for the physical plant (such as fire and safety arrangements and quality of life considerations such as room size) are met. Moreover, the OECD reports that in two-thirds of its member countries accreditation or certification of care facilities is either compulsory, a condition for reimbursement and contracting, or common practice (OECD 2010). In addition, formal regulations govern the level of education and training that groups of long-term care workers (e.g., registered nurses, personal care workers) must attain in order to be employed by a long-term care provider. However, it is noteworthy that the levels of required training as well as experience vary markedly among countries. For example, certified care workers need 75 h of training and experience in the United States, 430 h in Australia, 75 weeks in Denmark, and 3 years in Japan (OECD 2010; OECD/European Commission 2013; Table 4). In addition, market conditions, such as local unemployment rates or the availability of excess labor (such as illegal immigrants), have a strong influence on the strictness with which providers apply these professional standards. Another consideration is the cost implications of mandating minimum training of staff in formal care settings. Along with minimum wages, social security contributions, and other labor-related overhead, training requirements contribute to higher wage costs in the formal care sector, making formal long-term care services too expensive for many users, particularly in countries where public coverage is limited. In such cases, informal care from relatives or hiring cheaper care workers from an available pool of migrant workers in the “grey labor market” becomes the
Table 5 Long-term care regulatory functions, selected countries
970
13. Complaint X X X X X X X Xb X X
collection and
monitoring
system
14. Telephone X X X X X X X
or Web-based
action-line
complaint
process
15. Public X X X X X Xn X X X
reporting
16. Consumer X X X X
choice data
17. Pay-for- X X
Performance
quality
Provision of Health Services: Long-Term Care
assurance
Source: Mor et al. (2014)
Notes:
(a) Functions can vary slightly across nursing homes and community-based options.
(b) Varies across regions.
(c) Regulations pertaining to structural aspects are very broadly specified for the most part.
(d) Standards for all groups exist but enforceability depends on the staff group (e.g., nurse, social worker, care worker).
(e) Industry associations of aged care providers and nursing professional bodies.
(f) Some provider regulations refer to aspects of the care process but these are broadly specified. Within the regulations, cross-references are made to best practice guidelines, but inspectors seem to have a good deal of leeway on how to interpret this standard. NICE is developing quality standards which set out care process minimum standards, but it is not yet clear to what extent these are enforceable or merely guidelines.
(g) Legislation sets out a Schedule of Specified Care and Services and requires providers to deliver care of “an appropriate standard.” However, the legislation does not set out minimum staff-resident ratios or hours of care.
(h) Poorly or variably enforced.
(i) There are no fines for poor care per se. However, poor providers can be sanctioned by having government funding withheld for new residents until they meet care standards. This is a form of financial sanction.
(j) These exist but are poorly enforced and with very low level of fines.
(k) Contracts can be terminated by the Long-term Care Fund.
(l) Appeals can be brought against registration or outcome of inspections.
(m) Providers have appeal rights against regulatory decisions. Care recipients have appeal rights against a decision by the government not to approve them for subsidized care.
(n) Limited to inspections.
only viable alternative, especially where unregulated cash benefits are available to further incentivize this solution (van Hooren 2008) (see also section “Structure of the Delivery System” above). Obviously, the use of unskilled informal or hired carers may have consequences for the quality of care provided, although many positive benefits of informal or hired care, such as fostering empowerment in the care user and building strong relations of trust, have also been reported (Columbo et al. 2011; Dale et al. 2005).

One final aspect in this group of functions is the role of professional organizations and/or independent, nongovernment organizations in helping to set standards for long-term care providers (Row 4). Again, there is wide variability among countries in both the participation rates of such organizations and the rigor with which they pursue their roles in standard setting, i.e., whether they actively participate in developing benchmarks for best practice (e.g., in Austria and Japan) or whether they tend to limit their role to advocating in favor of minimum standards (USA).

Rows 5 to 12 consider different functions associated with ongoing monitoring and enforcement as captured by inspection regimes. Such monitoring focuses in particular on process standards that are applied to encourage positive aspects of care (such as weight monitoring, wound monitoring, fall prevention, and infection control) or to prohibit practices that often have a negative impact (e.g., the use of physical restraints or the use of antipsychotic medications), as well as the sanctions that can be imposed in cases of poor performance and/or noncompliance. Such standards exist in several countries to minimize the use of these behavior control schemes by requiring extensive documentation to justify their use and making them the subject of inspection. In contrast, the monitoring of residents’ health or well-being outcomes as an aspect of quality control is a relatively advanced and complex objective, involving the definition of such “outcomes” for frail elderly individuals and then establishing standardized data systems that carers and inspectors can use to determine whether such outcomes have been achieved.

The best example of such an assessment tool is the Resident Assessment Instrument Minimum Data Set version (RAI-MDS) applied to nursing home contexts and the versions developed for the assessment of individuals receiving home care (interRAI-HC) and care in community settings (interRAI CHA). First developed in the United States in the late 1980s and subsequently extended through an international consortium of experts, the current comprehensive suite of assessment instruments consists of standardized tools to detect long-term care users’ strengths, needs, and potential risks to enable individualized monitoring and care planning. In addition, collected data are aggregated to produce quality indicators on processes and outcomes at both the individual and the facility/organizational level (Mor et al. 2010; Hutchinson et al. 2010). The mandated use or testing of RAI assessment instruments internationally has been growing over the last decade, with a presence in several countries in North America, Europe, South-East Asia, and Australasia (see http://www.interrai.org/worldwide.html).

Focusing directly on approaches to inspection, we can see from Table 5 that after a provider has been certified or licensed, routine inspections are almost universally carried out, albeit according to different time-frames and conditions that may trigger an inspection (Rows 7–9). For example, inspections may take place every few years, or they may be less frequent based on a provider’s good performance in previous inspections; the latter kind of “risk-based” regulation relies on historical inspection data, using it to shape the regularity and intensity of subsequent inspections. Alternatively, a desk audit of data submitted in advance by a provider may take place in some circumstances, either in lieu of or prior to an on-site inspection. In addition, ad hoc inspections may be triggered by a complaint by a resident or family member, and the assessor will often investigate the source of the particular complaint as well as seek to document other problems in the same care domain as the complaint. On-site inspections may also be carried out according to regular schedules or be random (unannounced) with the aim of observing providers in carrying
out their day-to-day care duties with no prior notification. To date, however, there is no empirical data on the efficacy of one approach over the other in terms of stimulating better quality of care (Mor et al. 2014).

Once inspections uncover a problem with an aspect of care, different regulatory frameworks employ various means to rectify the problem (Rows 10–12). Some countries prefer to view inspections as collaborative, compliance-based exercises in which inspectors first work in tandem with providers to find solutions to the identified deficiencies. This may take the form of informal negotiations and persuasion. In other cases, more formalized “deterrence-based” procedures are used, such as issuing warnings to return to compliance within a given timeframe. Most systems, however, do have some form of official sanctions if providers fail to respond adequately or if repeated cases of noncompliance are found. The ability of regulators to levy fines or other penalties against poorly performing providers links quality to financial consequences and represents one way of incentivizing improvements. This can be done by fining the provider directly, restricting further admissions (and therefore potential revenue), and/or, in countries with public long-term care financing, withholding reimbursement until the specific problem is fixed.

A last-resort and very rarely applied sanction is decertification, or revocation, of a provider’s license to operate (Angelelli et al. 2003). Regulators are often reluctant to impose this ultimate penalty, as relocating frail elderly residents from a facility that has been sanctioned with closure would mean finding other suitable and available places in the same area and may also impose “relocation” stress on residents. It is also worth noting that all of the compliance enforcement methods mentioned above may be subject to lengthy procedural requirements that regulators have to adhere to and/or legal appeal processes open to sanctioned providers. These processes often involve considerable periods of time before a noncompliance issue is resolved.

Finally, the quality of long-term care can be monitored via various means to report on providers’ performance, whether through established complaints channels or by making available systematic data on provider quality performance. In some countries, financial incentives (such as pay-for-performance tariffs) may be in place through public funders of long-term care to encourage providers to participate in quality assurance programs (Rows 13–17). Although complaints monitoring data are scarce, systems for submitting complaints about the treatment of long-term care recipients exist in most countries, again with substantial variation in the means available (e.g., written complaints or telephone/internet action-lines) and the requirements for responding to such complaints.

Of equal salience is making information about providers’ quality performance available to consumers. This could take the form of presenting the results of recent inspections or supplying specific performance data on providers through easily accessible media such as the Internet. For example, in the USA various measures of quality, ranging from staffing levels to inspection results to indicators of process and outcome quality, are computerized and posted on government Web sites. In Finland, data on residents’ outcomes are voluntarily fed back to providers with the intention that ultimately this information (particularly if it is positive) might be used by the providers themselves to inform potential long-term care consumers in their areas. A similar structure is in place in New Zealand for its home care agencies, with plans to extend the practice to residential care facilities. This transparency can have several advantages. Firstly, if directly available to consumers, performance data based on various indicators can inform choices in selecting a long-term care provider (Werner et al. 2009). Secondly, public reporting of providers’ quality can exert pressure on them to address problems and maintain standards, particularly if they are performing poorly and do not wish to compromise their reputation in the local long-term care market. Thirdly, having access to performance data, particularly on comparable quality indicators across a number of providers (either local or national), can feed into individual providers’ voluntary quality improvement strategies (Werner and Konetzka 2009).
Challenges Facing Quality Monitoring

While comparative international information is still quite scarce, available sources (Mor et al. 2014) highlight that regardless of how developed or underdeveloped a long-term care system is, regulatory frameworks and some form of monitoring activities always exist for residential facilities (nursing homes). In contrast, due to the existence of informal care arrangements as well as less developed quality assurance programs applicable to formal home care agencies, there is much less regulation and knowledge about the regulation of home care, although some countries (such as the Netherlands, New Zealand, Canada, the USA, and Switzerland) have made inroads into monitoring the quality of home care. Given the relative growth of the latter sector and the general preference by long-term care users to remain in their own home for as long as feasible (European Commission 2008), addressing the relative scarcity of quality assurance frameworks for home care settings will be a major but necessary challenge for governments in the future.

A second key challenge is the availability of data that can be used for quality assurance purposes, not only in the residential care sector, where data gathering can often be sporadic and not standardized across facilities, but also in the home care sector, where data are even more limited. As far as European countries go, it is still the case that standardized information derived from the inspection process is not routinely collected or archived for subsequent use. One reason for this is the difficulty of standardizing inspections, or assessments, across different regions, particularly if this function is decentralized to lower levels of government administration (such as municipalities or local agencies) or to providers themselves. Another hurdle is variability in the interpretations and evaluations of individual assessors. Thus, the lack of consistency hinders meaningful comparisons across providers. Indeed, even in the USA, where standardized inspection protocols are in place and computerized, there is substantial interstate variation in the conduct and results of inspection (Mukamel et al. 2012).

Related to this is the issue of whether any collected data are made publicly available. One stumbling block is the opposition of providers to sharing information not only with the public but specifically with competitors, particularly if there is a reputational risk involved in releasing information about poor performance. While some countries have started to collect inspection-based data, very few follow the example of the United States, where the availability of such data on residential services helps would-be residents to make choices about what nursing home facilities would best suit their needs or to vote with their feet if their existing facility falls short. The exercise of choice based on quality data also extends to health and long-term care insurers, who in the future will increasingly have to make purchasing decisions about competing service suppliers based on both cost-effectiveness and the quality of care. Indeed, this process is already taking place in the USA, Canada, Finland, and New Zealand.

A final consideration is effecting a sea change in the attitude to quality monitoring and its role in incentivizing improvements in the quality of long-term care. While the primary purpose of measuring long-term care structures, processes, and outcomes is to ensure the safety and dignity of service users, the information harnessed by this process can be an invaluable tool for providers to assess the relative quality of their performance against relevant benchmarks, be they officially set standards or industry averages. Armed with such meaningful data, and skilled staff to interpret it, long-term care providers would then be in a much better position to shape their improvement strategies, not only to enhance their marketability but, more importantly, for the benefit of the elderly clients in their care.

Summary and Conclusions

Governments of the rapidly aging industrialized countries are just beginning to be aware of the enormous challenges they will face in meeting the care needs of the frail elderly. Over the last decade or so, most countries have begun the difficult process of rebalancing the provision of long-term care from a system that was almost entirely supporting residential or institutional care to one in which the majority of service recipients are cared for in their homes. This shift was accelerated, or made possible, in many countries by the introduction of direct cash payments to eligible individuals and their families, allowing them to direct their long-term care mix of services using entitlement funds for which they are eligible because of their need for functionally based support.

There are several consequences that these shifts in care orientation have brought about. First, giving older consumers and their family members control over who they hire has substantially altered the labor market for long-term care services, particularly in those areas where well-developed agency-based long-term home care services do not exist. Indeed, the availability of some financial support may allow late middle-aged children of frail elderly persons to remain out of the formal workforce or provide another reason for these individuals to retire early, becoming full-time, partially paid caregivers to their aged and frail parent. Second, as we have seen, monitoring the quality of care and services rendered to frail older persons is difficult enough when only provided in large residential care settings. Administrative procedures for reporting staffing levels and quality as well as documenting services rendered are sufficiently burdensome that many countries do not require this form of reporting. Furthermore, hiring independent inspectors to monitor the performance of these institutional providers constitutes a large expense even if facilities are only inspected annually. However, truly monitoring quality issues requires more frequent inspections, unannounced inspections, and inspections instituted in response to residents’ and families’ complaints. While this constitutes a very difficult task in the case of residential care, monitoring quality in the home care setting, much less policing family members’ own provision of care to the frail older person in their own homes, is considerably more complex and costly, requiring close collaboration with what in the USA is known as “adult protective service” given the real potential for abuse.

Extending quality measurement to the home care setting, particularly if including frail older persons receiving cash payments which they apply to paying family or undocumented and unlicensed workers, presents numerous challenges. Home care providers in the USA, many Canadian provinces, and New Zealand have implemented individualized quality metrics as part of a routine client assessment process, and these data have been used to report on provider quality (Mor et al. 2014). These experiences suggest that the use of this kind of “microlevel” information is certainly a feasible approach to quality performance measurement of home care services, but they necessarily depend upon professionals periodically assessing the client and using those data to calculate indicators of quality performance. In the case of cash payments to family and informal labor market participants, this approach is not viable without introducing a mandatory assessment in recipients’ homes, a process that may be perceived as excessively onerous and as entailing an invasion of privacy. Furthermore, since most OECD countries have not even established a solid data reporting system in reference to institutional care provision and quality, it would be hard to imagine that most would be willing to institute an even more complicated and costly data-based approach to quality oversight of home care services.

A newer challenge about which policy makers in the health care delivery space are increasingly worrying is the linkage between reimbursement and quality measurement. Strategies to assure quality by applying “value-based purchasing” have been tried with limited success in the USA but are likely to emerge as the next policy debate. To even consider this approach, however, there is a critical need for consistent data about patients’ outcomes in selected areas and providers’ characteristics and services. To date, these types of data exist in only a few countries, and the complexities of introducing such systems are substantial even after the data collection and assembly challenges have been met. It is likely, however, that as demand for various types of long-term care services increases due to
population aging and inadequate private savings among the elderly, public support for long-term care services may well be contingent upon those services being viewed as value for money.

References

Alakeson V. International development in self-directed care. Issue Brief (Commonw Fund). 2010;78:1–11.
Angelelli J, Mor V, et al. Oversight of nursing homes: pruning the tree or just spotting bad apples? Gerontologist. 2003;43(2):67–75.
Bettio F, Solinas G. Which European model for elderly care? Equity and cost-effectiveness in home based care in three European countries. Econ Lavoro. 2009;43(1):53–71.
Carpenter I, Hirdes J. A good life in old age: monitoring and improving quality in long term care. OECD Health Policy Studies, OECD Publishing; 2013. https://doi.org/10.1787/9789264194564-en
Columbo F, Llena-Nozal A, Mercier J, Tjadens F. Help wanted? Providing and paying for long-term care. Paris: OECD Publishing; 2011.
Dale S, Brown R, Phillips B, Carlson BL. How do hired workers fare under consumer-directed personal care? Gerontologist. 2005;45(5):583–92.
Damiani G, Farelli V, Anselmi A, Sicuro L, Solipaca A, Burgio A, Iezzi DF, Ricciardi W. Patterns of long term care in 29 European countries: evidence from an exploratory study. BMC Health Serv Res. 2011;11:316.
Doty P, Mahoney KJ, Sciagaj M. New state strategies to meet long-term care needs. Health Aff. 2010;29(1):49–56.
Duyvendak JW, Grootegoed E, Savernije MT, Tonkens E. Day 1: long-term care in Europe, the state of the art. Presentation given at Does Europe Care? European Conference on Long-Term Care and Diversity, Amsterdam; 2009. http://www.careconference.eu/site/sites/default/files/Part201.pdf.
European Commission. Long-term care in the European Union. Brussels: Commission of the European Communities, DG Employment, Social Affairs and Equal Opportunities; 2008.
Feng Z, Lepore M, Clark MA, Tyler D, Smith DB, Mor V, Fennell ML. Geographic concentration and correlates of nursing home closures: 1999–2008. Arch Intern Med. 2011;171(9):806–13.
Fernandez JL, Forder J, Trukeschitz B, Rokosová M, McDaid D. How can European States design efficient, equitable and sustainable funding systems for long-term care for older people? Copenhagen: World Health Organization and World Health Organization on behalf of the European Observatory on Health Systems and Policies; 2009.
Foster L, Dale S, Brown R. How caregivers and workers fared in Cash and Counseling. Health Serv Res. 2007;42(1 Pt 2):510–32.
Grabowski DC, Cadigan RO, Miller EA, Stevenson DG, Clark M, Mor V. Supporting home- and community-based care: views of long-term care specialists. Med Care Res Rev. 2010;67(Suppl 4):82S–101S.
Hutchinson AM, Milk DL, Maisey S, Johnson C, Squires JE, Teare G, Estabrooks CA. The resident assessment instrument-minimum data set 2.0 quality indicators: a systematic review. BMC Health Serv Res. 2010;10:166. https://doi.org/10.1186/1472-6963-10-166.
Ikegami N, Fries BE, Takagi Y, Ikeda S, Ibe T. Applying RUG-III in Japanese long-term care facilities. Gerontologist. 1994;34(5):628–39.
Ikegami N, Morris JN, Fries BE. Low-care cases in long-term care settings: variation among nations. Age Ageing. 1997;26(Suppl 2):67–71.
Ikegami N, Ishibashi T, Amano T. Japan’s long-term care regulations focused on structure – rationale and future prospects. In: Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Jung H-Y, Jang S-N, Seok J-E, Kwon S. Quality monitoring of long-term care in the Republic of Korea. In: Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Kane RA, Kane RL, Ladd RC. The heart of long-term care. New York: Oxford University Press; 1998.
Katz MB. In the shadow of the Poorhouse: a social history of welfare in America. Tenth anniversary edition. New York: Basic Books; 1996.
Kellogg DO. The pauper question. Atl Mon. 1883;51(307):638–652.
Larson Allen L. Mapping the future: estimating Florida aging service needs 2008–2030. Tallahassee: Agency for Health Care Administration; 2008.
Miller EA, Mor V, Clark M. Reforming long-term care in the United States: findings from a national survey of specialists. Gerontologist. 2010;50(2):238–52.
Miller EA, Tyler DA, Rozanova J, Mor V. National newspaper portrayal of U.S. nursing homes: periodic treatment of topic and tone. Milbank Q. 2012;90(4):725–61.
Mor V, Miller EA, Clark M. The taste for regulation in long-term care. Med Care Res Rev. 2010;67(Suppl 4):38S–64S.
Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Mot E, Willemé P, editors. Assessing needs of care in European nations, ENEPRI policy brief no. 14, vol. 2012. Centre for European Policy Studies: Brussels; 2012.
Mukamel DB, Weimer DL, Harrington C, Spector WD, Ladd H, Li Y. The effect of state regulatory stringency on nursing home quality. Health Serv Res. 2012;47(5):1791–813.
OECD. Ensuring quality long-term care for older people. Paris: OECD Publishing; 2010. Policy Brief.
OECD. Recipients of long-term care. In: Health at a glance 2013: OECD indicators. Paris: OECD Publishing; 2013a. https://doi.org/10.1787/health_glance-2013-75-en
OECD. OECD health data: long-term care resources and utilisation. Paris: OECD; 2013b.
OECD/European Commission. A good life in old age? Monitoring and improving quality in long-term care, OECD health policy studies. Paris: OECD Publishing; 2013. https://doi.org/10.1787/9789264194564-en
Phillips B, Schneider B. Commonalities and variations in the Cash and Counseling programs across the three demonstration States. Health Serv Res. 2007;42(1 Pt 2):397–413.
Rothgang H. Long-term care for older people in Germany. In: Comas-Herrera A, Wittenberg R, editors. European study of long-term care expenditure. Investigating the sensitivity of projections of future long-term care expenditure in Germany, Spain, Italy and the United Kingdom to changes in assumptions about demography, dependency, informal care, formal care and unit costs. Report to the European Commission, Employment and Social Affairs DG: 24–42. 2003. http://ec.europa.eu/employment_social/soc-prot/healthcare/ltc_study_en.pdf
RWJF – Robert Wood Johnson Foundation. Executive summary: cash and counseling program. Princeton: Robert Wood Johnson Foundation; 2013. Available at: http://www.rwjf.org/content/dam/farm/reports/program_results_reports/2013/rwjf406468/subassets/rwjf406468_1
Simmonazzi A. Home care and cash transfers. Effects on the elderly care-female employment trade-off. Cost Conference. Rome; 2009.
Sloane PD, Zimmerman S, Gruber-Baldini AL, Hebel JR, Magaziner J, Konrad TR. Health and functional outcomes and health care utilization of persons with dementia in residential care and assisted living facilities: comparison with nursing homes. Gerontologist. 2005;45 Spec No 1(1):124–32.
Smith DB, Feng Z, Fennell ML, Zinn JS, Mor V. Separate and unequal: racial segregation and disparities in quality across US nursing homes. Health Aff. 2007;26(5):1448–58.
Stevenson DG, Grabowski DC. Sizing up the market for assisted living. Health Aff. 2010;29(1):35–43.
Swartz K. Searching for a balance of responsibilities: OECD countries’ changing elderly assistance policies. Annu Rev Public Health. 2013;34:397–412.
Tarricone R, Touros AD, editors. The solid facts: home care in Europe. Copenhagen: World Health Organization Regional Office for Europe and Università Commerciale Luigi Bocconi; 2008.
Ungerson C, Yeandle S. Cash for care in developed welfare states. Houndmills: Palgrave Macmillan; 2007.
van Hooren F. Bringing policies back in: How social and migration policies affect the employment of immigrants in domestic care for the elderly in the EU-15. Paper presented at Transforming elderly care at local, national and transnational level, International Conference at the Danish National Centre for Social Research (SFI), Copenhagen; 2008.
Werner RM, Konetzka RT. What drives nursing home quality improvement under public reporting? An examination of post-acute care. Chicago: AcademyHealth; 2009.
Werner RM, Konetzka RT, Stuart EA, Norton EC, Polsky D, Park J. Impact of public reporting on quality of postacute care. Health Serv Res. 2009;44(4):1169–87.
Wiener JM. Commentary: cash and counseling in an international context. Health Serv Res. 2007;42(1 Pt 2):567–76.
Wirrmann Gadsby E. Personal budgets and health: a review of the evidence. London: PruComm. Policy Research Unit in Commissioning and the Health Care System, Department of Health; 2013.
44 Provision of Health Services: Mental Health Care

Jon Cylus, Marya Saidi, and Martin Knapp

Contents
Introduction: Why Is Mental Health Important?
Definitions and Spectrum of Mental Health Disorders
Direct and Indirect Costs
Stigma
Comorbidity
Provision of Mental Health Care: How Is Care Delivered?
Who Delivers Care: Medical Professionals, Unpaid Caregivers
Financing Mental Health Services: How Is Care Financed?
Key Policy Dimensions/Recent Policies and Trends
Personalization and Empowerment
Carer and Family Impact
Prevention, Promotion, Public Mental Health (e.g., Campaigning)
Aging and Dementia
Employment
New Advancements in Treatments and Technologies
Discussion
References
These comprise different types of depression and anxiety, and their symptoms include low mood and a loss of interest and enjoyment in usual activities. Anxiety disorders include generalized anxiety disorder, panic disorder, phobias, and obsessive-compulsive disorder (OCD). OCD, the most severe form of anxiety disorder, is characterized by a combination of obsessive thoughts and compulsive behaviors, where obsessions are defined as recurrent and persistent thoughts, impulses, or images that are intrusive and inappropriate and cause anxiety or distress, while compulsions are repetitive, purposeful, and ritualistic behaviors or mental acts that are performed in response to obsessive intrusion and according to a set of rigidly prescribed rules (National Centre for Social Research 2007). Psychoses are disorders of the mind that can produce disturbances in thinking as well as perceptions severe enough to produce distortions in perceptions of reality. Psychoses may also impair motivation and may be associated with affective dysregulation (depression, mania), as well as alterations in information processing (cognitive impairment) (Van Os et al. 2010). Van Os et al. conclude that overall, psychotic outcomes are associated with living in an urban area, being part of a minority group, cannabis use, and developmental trauma, and hence are linked to the three risk factors described above.
An important systematic review of the literature on the epidemiology of schizophrenia (McGrath et al. 2004) found a lifetime prevalence rate of 0.5–1%. Rates varied across the dimensions Van Os et al. suggested, as well as by gender: schizophrenia was more common in males than in females.
Kessler et al. (2009) studied the prevalence rates reported in the first 17 World Mental Health Surveys and found lifetime prevalence estimates of any DSM-IV disorder to be 18.1–36.1%. These were the highest in Colombia, France, New Zealand, Mexico, the Netherlands, and South Africa and the lowest in China and Nigeria. Kessler et al. (2009) commented that the low prevalence rates in the last two countries may be downwardly biased. Anxiety disorders were consistently found to be the most prevalent class of mental disorder in the general population (16%), again higher in Western developed countries. Mood disorders were also very common (12%) and were also mainly reported in Western countries. Kessler et al. suggested that prevalence rates may be underestimated in some countries because the DSM categories are less relevant to symptom expression in some countries than in others.
Direct and Indirect Costs
The latest data from the Global Burden of Disease study (2013) estimate that mental and behavioral disorders accounted for 198.3 million disability-adjusted life years (DALYs), with unipolar depressive disorders accounting for 37.8% of them. Anxiety disorders were the second biggest contributor, at 13.6% of mental health DALYs. The WHO (2013) ranked mental and behavioral disorders the sixth leading cause of DALYs worldwide for 2011, surpassing respiratory diseases, neurological and sense organ conditions, musculoskeletal diseases, and endocrine, blood, immune disorders and diabetes.
Importantly, the burden of mental and substance abuse disorders has increased significantly since 1990. However, almost a third of countries, surprisingly perhaps, still do not have a designated budget for mental health; and 21% of the countries that do have a specific mental health budget spend less than 1% of their total health budgets on mental health (World Health Organization 2008).
People with mental health problems experience high rates of unemployment. For example, in OECD countries and depending on level of severity, people with mental health problems are from two to three times up to six to seven times more likely to be unemployed than people without such conditions (OECD 2012). One reason for this difference is that illness can make it difficult to perform a job, but perhaps bigger problems are stigma and discrimination.
People with a history of mental health problems still face problems in the open employment market, including stigma and a reluctance from employers to give them a job (McDaid 2008). The
fact that some people with mental health problems receive social security benefits may also hinder their chances of seeking and obtaining employment (OECD 2011).
Stigma
Stigma can be a “mark of disgrace associated with a particular circumstance, quality or person” (Oxford Dictionaries 2010), yet it is no longer physical or bodily in nature (Goffman 1963; Wahl 1999); it is now viewed as personal, psychological, and social. People are no longer physically branded but labeled by society as poor, homosexual, criminal, or, in this case, mentally ill. These labels have influenced perceptions and behaviors and led to the devaluation and denigration of those who are so labeled (Thornicroft 2007).
Research on stigma in mental health has largely relied on attitude surveys and has been descriptive; very few studies have investigated this aspect from the standpoint of a person with mental illness. Discrimination against people with mental health problems remains consistent across different parts of the world (Thornicroft et al. 2009). For example, in Ethiopia, key informants were asked about their perceptions of several different health and mental health conditions: they judged schizophrenia to be the most severe, and mental illness was frequently associated with talkativeness, aggression, and strange behavior (Alem et al. 1999). In the Arab world, there is much stigma associated with mental health services (Al-Krenawi and Graham 2000; Savaya 1995). Thornicroft et al. (2009) have reported high and consistent rates of experienced discrimination among people with schizophrenia across countries of various income levels. A cross-sectional survey conducted in 27 countries using face-to-face interviews with 732 participants with schizophrenia found that across all countries, the most common areas of negative experienced discrimination were seeking or maintaining friendships, discrimination by relatives, keeping and finding a job, and intimate or sexual relationships. Examining self-stigma and discrimination across several European countries, Brohan et al. (2010) found that people with schizophrenia reported the highest self-stigma scores and perceived discrimination in Greece and the lowest empowerment scores in Ukraine. In this case, empowerment and increased social contact were significantly associated with reduced self-stigma scores.
The greatest barriers to social inclusion for people with mental health problems are said to be stigma and discrimination (Baldwin and Marcus 2011; Social Exclusion Unit 2004). Indeed, misconceptions about mental health can also lead to the belief that these diseases are untreatable and that people who have them are not valued members of their communities, subsequently leading to appropriate support and resources not being delivered (Funk et al. 2012). Lack of access to proper judicial mechanisms that would protect their rights (World Health Organization 2005b) means that people with mental health problems may often experience human rights violations in the community (Drew et al. 2005; Funk et al. 2005), and sometimes major life-changing decisions are made on their behalf with regard to housing or treatment, for example (World Health Organization 2005b). Where institutions still exist, living conditions are poor and present risks to people’s physical health (Drew et al. 2011). Sharma (1999), reporting on the state of mental hospitals, found that many had undergone no structural transformations after previously having been jails. Other hospitals were at risk of serious overcrowding as single-person cells were used to house several patients. Others lacked any sanitary facilities, and patients received inappropriate treatment.
Public education campaigns have produced mixed results (Thornicroft 2007, p. 244) and have perhaps been confined to certain countries. A concerted program in Australia called “beyondblue” aimed at conveying accurate information about depression, and its initial evaluations showed a series of benefits, including better community recognition of people with depression, reforms in life insurance and income protection, as well as intervention programs in schools (Ellis et al. 2002).
Combatting mental health stigma has been at the forefront of mental health policy in England.
Fig. 1 Prevalence of major depression in patients with physical illnesses (World Health Organization 2003a)
The Time to Change campaign led by two mental health charities promoted public mental health awareness. A study measuring its efficacy (Evans-Lacko et al. 2013c) suggested that its marketing tools, which promoted social contact between members of the public and people with mental health problems, had positive outcomes on social stigma, and perhaps more so on behavior and attitudes than on knowledge (Evans-Lacko et al. 2013a). Smith (2013) commented that although the economic analysis seems to indicate benefits from the program, the assumptions used in the model lead to uncertain conclusions, ranging from a net cost to a benefit of £223 million.
Comorbidity
Comorbidity is common within the mental health population, as 30% of all people with a long-term health condition also have a mental health problem (Cimpean and Drake 2011). Other estimates have shown that people with long-term conditions were up to three times more likely to experience a mental health problem than the general population (Naylor et al. 2012). Although much of the evidence relates specifically to affective disorders such as depression and anxiety (Naylor et al. 2012), studies have shown higher rates of conditions such as asthma, arthritis, cancer, and HIV/AIDS (Chapman et al. 2005; Sederer et al. 2006) among people with mental health problems, compared to people without. There are also risk factors associated with the development of mild cognitive impairment or Alzheimer’s disease (Velayudhan et al. 2010); studies in Japan (Ohara et al. 2011) and Sweden (Xu et al. 2009) have also shown associations between diabetes and dementia.
In England, recent policy has focused on the association between poor health and mental health (Department of Health 2011), given the considerable costs (it is estimated that between £8 and £13 billion of NHS spending in England is due to comorbid mental health problems and long-term conditions; Naylor et al. 2012) and the burden to society (Department of Health 2011). Comorbidity is also associated with lower quality of life: utilizing data from the World Health Surveys, Moussavi et al. (2007) showed that people who suffered from depression as well as a long-term condition reported lower quality of life scores than people who only suffered from long-term conditions (Fig. 1).
Provision of Mental Health Care: How Is Care Delivered?
Who Delivers Care: Medical Professionals, Unpaid Caregivers
The occurrence of mental illness does not always imply a need for treatment (Bebbington 1990). Nevertheless, much like its determinants, the treatment can be categorized into biological (such as psychotropic drugs), psychological
(or psychotherapies), and psychosocial (like case management and family interventions) (Lehtinen et al. 2007, p. 128).
Antipsychotics act mainly by dampening the effect of dopamine, a neurotransmitter in the brain, an excess of which may play a part in producing hallucinations, delusions, and thought disorder; they are hence mainly used for the treatment of schizophrenia. Older antipsychotics, such as haloperidol and chlorpromazine, depending on their dosage, have side effects which include stiffness and shakiness. In comparison, newer drugs, the most popular of which are clozapine and olanzapine, have side effects which include sleepiness and slowness, weight gain, sexual problems, increased risk of diabetes, and some risk of Parkinson’s disease; long-term use can produce movements of the face and, rarely, of the arms and legs. Both are administered in the form of a pill. Increasingly, the use of depot antipsychotics, where medication is given as an injection every 2–4 weeks, has become more common: medication is hence released slowly over the course of time. Depots are usually administered at the local GP surgery, at a community mental health center, at a special outpatient clinic or by a nurse at home (Royal College of Psychiatrists Public Education Editorial Board 2012).
Antidepressants are also frequently administered; their main effect is to increase the amount of serotonin and/or noradrenaline in the brain (Lehtinen et al. 2007, p. 131). Because of the potentially adverse side effects of older drugs, “new” antidepressants were introduced in the 1990s to curb these. Overall, the uptake of antidepressants has been on the rise in the last decades.
Cross-country variations are also apparent, with the USA leading the pack in terms of drug prescribing. More recent data show a continuation of this upward trend, especially with regard to antidepressants in the USA (Olfson and Marcus 2009), New Zealand (Exeter et al. 2009), and Italy (Deambrosis et al. 2010) as well as antipsychotics in various countries (Verdoux et al. 2010). A study of the trends of antipsychotic prescribing in the USA for anxiety disorders among a representative sample of visits to office-based psychiatrists (Comer et al. 2011) found an increase from 10.6% (1996–1999) to 21.3% (2004–2007). Data from England showed average annual increases in prescriptions of 6.8% from 1998 to 2010: antidepressant prescriptions rose by 10% per year, while antipsychotics grew by 5.1%.
Antipsychotics overtook antidepressants as the most costly class of psychiatric drug, with costs rising by 22% (Ilyas and Moncrieff 2012). Similarly, data from the USA in recent years show that antipsychotics, antidepressants, and drugs for attention deficit hyperactivity disorder have been consistently ranked as the most expensive prescription drugs (IMS Health 2010).
The first point of contact for mental health care in many countries is usually primary health care, and a majority of countries allow primary health care (PHC) doctors to prescribe and/or continue prescribing medicines for mental and behavioral disorders either without restrictions (56%) or with some legal restrictions (40%), such as allowing prescriptions only in certain categories of medicines or only in emergency settings. In other cases, psychiatrists or neurologists would take responsibility for prescribing for patients with more severe or treatment-resistant symptoms. Only 3% of respondent countries in a WHO survey did not allow any form of prescription by PHC doctors (World Health Organization 2011).
Treatment, care, and support for people with mental health problems are managed by primary, secondary (and tertiary) health-care settings, with much treatment and care delivered in the community by non-medics. The most comprehensive form of mental health care, which comprises a balance between hospital and community-based services, has only been achieved in a few high-income countries (Saxena et al. 2007). Only half of the countries in Africa, the eastern Mediterranean, and southeast Asia provide community-based care (World Health Organization 2005). Within-country differences also exist in terms of the availability of community-based care: this type of care is restricted to only a few areas in China, India, Paraguay, and Zambia. In general, about 52% of low-income countries and about 97% of high-income countries provide
community-based care (Saxena et al. 2007; World Health Organization 2005).
Hospital inpatient beds were the mainstay of mental health provision in many high-income countries for many decades and remain crucially important, but in many countries, the specialist (institutional) asylums are being or have been closed.
The global median number of facilities per 100,000 population is 0.61 outpatient facilities, 0.05 day treatment facilities, 0.01 community residential facilities, and 0.04 mental hospitals. In terms of psychiatric beds in general hospitals, the global median is 1.4 beds per 100,000 population. Higher-income countries typically have more facilities and higher admission/utilization rates.
Deinstitutionalization is the process of shifting the care and support for patients with mental illness from custodial asylums to community-based settings and saw its real beginnings in the USA and then in England in the 1970s (Shorter 1997). This period also saw a shift in treatment, in terms of becoming demedicalized, as non-physician specialists began to assume a role (Shorter 2007, pp. 21, 22).
In England, generally, studies have demonstrated that deinstitutionalization has had positive outcomes for service users (see, e.g., the TAPS studies). However, systematic data on the preferences and situations of people with mental health problems are sorely lacking, with no existing European overview (Anderson et al. 2007). Data from the UK show that although the majority of people with mental health problems live in mainstream housing (Boardman 2010; Social Exclusion Unit 2004), many live in residential care homes (Health and Social Care Information Centre 2013) or in supported housing services or in independent flats where they receive “floating support” (Centre for Housing Research 2013), which is support for a set number of hours a week within a person’s home.
Data from Priebe et al. (2005) show that in fact, in most of the nine selected European countries (Austria, Denmark, England, Germany, Republic of Ireland, Italy, the Netherlands, Spain, and Switzerland), the number of psychiatric hospital beds has been generally on the decline between 2002 and 2006. On the other hand, forensic bed spaces have been on the increase (except in Ireland, Italy, and Switzerland), as well as places in supported and supportive housing (except in Ireland and Switzerland) and in prisons. More specifically, in Iceland, Italy, and Sweden, psychiatric hospitals no longer exist and care is provided in beds in general hospitals or in community-based facilities (Medeiros et al. 2008).
However, community-based residential services are not available in all countries. Turkey and most cantons in Switzerland do not possess such facilities.
Deinstitutionalization is advancing at different paces in different countries, mainly due to national traditions and the sociocultural context, the availability of resources, as well as financial incentives (Fakhoury and Priebe 2002). Within Europe, the rates of the closure of asylums have been uneven between countries, and sometimes gaps have been reported between the closure of institutions and the provision of alternative services (Medeiros et al. 2008).
Research conducted in 2000 has shown that, for example, in Asia, and specifically in Japan (Kuno and Asukai 2000) and Hong Kong (Yip 2000), deinstitutionalization has yet to occur. In Japan, Kuno and Asukai (2000) comment that deinstitutionalization is unlikely to happen in the near future since people with mental health problems are not valued as members of society. More recently in Japan, the Sasagawa Project (Mizuno et al. 2005) aimed to move people with mental health problems from hospital to community living; this project claimed to be the first of its kind in the country. The study on the closure of Sasagawa hospital and the subsequent relocation of patients into Sasagawa “village” reported positive outcomes; however, there is much to be said about the segregation of people with mental health problems.
Deinstitutionalization is currently under way in several South American countries (Larrobla and Botega 2000). In Australia, Moxham and Pegg (2000) commented that the shift to community care was not met with systematic and adequate planning and the delivery of appropriate housing services or placements.
Recently, a new project, EMERALD, was launched to improve mental health outcomes by enhancing health system performance, and to identify potential barriers, specifically in low- and middle-income countries (EMERALD 2014); results of this program have yet to be disseminated.
Financing Mental Health Services: How Is Care Financed?
Though the cost of poor mental health has been estimated to be between 3% and 4% of GDP in many European countries, no country dedicates a proportionate level of resources to treating mental health disorders (Gabriel and Liimatainen 2000). Just over two thirds of countries across the world have a budget that is specifically dedicated to mental health, and many countries spend less than 1% of their total health budget on mental health-care services (Thornicroft and Maingay 2002). According to the 2005 WHO Mental Health Atlas, South East Asia had the highest proportion of countries with a specified budget for mental health care (90%); the Western Pacific had the lowest proportion of countries (59%). European countries often allocate funds specifically for mental health, despite not always having a specific line item within their national budgets (World Health Organization 2005a).
Generally, mental health budget information is scarce in low-income countries (Raja et al. 2010). A study by Raja et al. (2010) found, however, that national ring-fenced budgets for mental health as a percentage of national health spending for 2007–2008 were less than 4% in Sri Lanka, Ghana, and Kerala (India) and less than 7% in Uganda. Even in countries that dedicate substantial resources to mental health, coverage for mental health-care services may be more limited than for other health-care services (Knapp et al. 2007).
Worldwide, government funds such as those generated by taxes are the most common source of mental health financing (World Health Organization 2003b). In countries where the government pays for the bulk of mental health care, care is often financed in a similar way to the mechanism of funding general health care in that particular country. Out-of-pocket payments are also an important source of funding for mental health care in some countries, particularly outside of Europe. Even so, nearly half of western European countries levy user charges for specialist mental health-care services, even within their publicly funded system (Knapp et al. 2006).
Generally, voluntary health insurance does not play a major role in funding mental health care. However, in some countries like the UK and Germany, there has been some expansion of mental health-care coverage within voluntary health insurance (Knapp 2007). In the USA there have also been recent efforts to ensure that private health insurers cover mental health conditions no differently than they cover physical conditions.
Naturally, each country allocates different levels of funding to the treatment of mental health. Historically, spending has been directed toward psychiatric hospitals; for example, three quarters of spending in Sri Lanka, Ghana, Kerala (India), and Uganda was on psychiatric hospital care. Recently though, there have been shifts in many countries toward allocating funds to community-based services as opposed to psychiatric hospitals. As a result of this move into the realm of social and community care, in some cases there has also been a trend to shift mental health-care funding away from health budgets and onto social protection budgets. This intersectoral approach to financing mental health care is not exclusive to high-income countries; for example, according to the WHO, the Burundi Ministry of Finance requested a social sector loan from the World Bank for work on early childhood development, which had an explicit mental health component (World Health Organization 2003b).
Key Policy Dimensions/Recent Policies and Trends
Several key policy dimensions have dominated the global conversation on mental health.
consequences for them, as well as for governments and society as a whole. For example, in England, the total cost of schizophrenia has been estimated at almost £12 billion, and this includes a cost to the public sector of more than £7 billion (Andrew et al. 2012). Many relatives or other unpaid carers of people with schizophrenia may give up employment (4.8% of carers) or take time off work (15.5% took a mean 12.5 days off) in order to provide care and support. In economic terms, this translates into a loss of £517 (in 2011/2012 prices) per individual with schizophrenia living in a household (Mangalore and Knapp 2007).
The WHO has emphasized that more support is required for unpaid (sometimes called informal) carers, as usually their expenses as well as their opportunity costs (e.g., from lost employment) are not covered by the State or by insurance (World Health Organization 2003a). In addition to the emotional strain of caring, relatives can also be exposed to the stigma and discrimination associated with mental ill health. This in turn often translates into social isolation and exclusion from their communities, friends, and relatives.
Prevention, Promotion, Public Mental Health (e.g., Campaigning)
Given the huge psychological, economic, and societal burdens, much emphasis has been placed on prevention and on the promotion of mental health (World Health Organization 2004). (Much of the discussion on prevention can be found in Jané-Llopis and Anderson (2007).) In addition to targeted interventions, the WHO distinguishes macro-strategies that may reduce risk and improve quality of life. These include improving nutrition (especially in impoverished countries); improving housing and its quality; improving access to education; reducing economic insecurity; strengthening community networks through, for example, the Communities That Care program, already in force in the USA, the Netherlands, the UK, and Australia (Hawkins et al. 2002); and reducing the harm from addictive substances, through interventions such as taxes and bans on underage drinking. For example, a comprehensive anti-smoking campaign can reduce smoking by up to 6% (Saffer 2000).
More specifically, Jané-Llopis and Anderson (2007, pp. 191–192) carefully lay out an integrated policy framework for the promotion of mental health and the prevention of mental disorders. These are subdivided by age categories (childhood and adolescence, adulthood, and older groups) as well as by type of approach, whether public or mental health policy. Starting with fetal development, it is important to raise awareness among expectant mothers of the risk of substance use during pregnancy; for example, smoking while pregnant doubles the risk of low birth weight. Educational programs in some countries to help pregnant women cease smoking have had immediate and long-term mental health gains for infants (Institute of Medicine 2001). Other interventions during childhood include parenting interventions. These target basic reading skills or other parenting skills and are said to improve literacy as well as emotional and language growth (Jané-Llopis and Anderson 2007, p. 193). Indeed, poor school performance increases the risk of social and mental health problems. School prevention programs involve general cognitive, problem-solving, and social skill-building, resulting in 50% reductions in depressive symptoms (Greenberg et al. 2001). Unfortunately, however, most low-income countries lack appropriate child and adolescent mental health services (Patel et al. 2008).
Funk et al. (2012) also focused on similar aspects to those of the WHO (2010) with regard to mental health interventions to improve development, by employing targeted poverty-alleviation programs in order to break the cycle between mental illness and poverty.
Funk et al. (2012) discuss many interventions, including pharmacological, psychosocial, and care-management strategies for schizophrenia, depression, alcohol misuse, epilepsy, and suicide prevention, that have been associated with positive outcomes across the world, regardless of wealth. Suicide prevention should be highlighted through comprehensive public health programs and should at least comprise the following
interventions in low- and middle-income countries (LMICs): reducing the access to means for suicide, responsible and deglamorized media reporting, and early identification and treatment of people with mental and substance use disorders.
An important point to consider in working-age adults is employment and associated stress factors that may lead to anxiety, depression, or stress-related problems. Interventions to improve mental health in the workplace have centered on task and technical interventions (e.g., lowering workload or ergonomic improvements) and clarifying job role expectations as well as improving social environment (e.g., conflict resolution) (Price and Kompier 2006). There is now evidence that many of these prevention and promotion initiatives can be not only effective but also cost-effective. Andrew et al. (2012) assessed the various interventions in schizophrenia in terms of effectiveness and cost-effectiveness. One intervention for which the authors found strong evidence of cost-effectiveness was individual placement and support, which aims to help people with schizophrenia find competitive employment.
Aging and Dementia
With the world population aging rapidly, and people living longer, the prevalence rate of age-related disorders is increasing. One such disorder is dementia, which often has an overwhelming effect on the individual with the illness, their family, and society more generally, prompting the WHO to promote it as a major public health priority (World Health Organization 2012). Dementia is a chronic and progressive syndrome, caused by a variety of brain illnesses and affecting memory, thinking, behavior, and ability to perform everyday activities. The latest figures from the WHO (2012) estimated the total number of people with dementia worldwide in 2010 to be 35.6 million, and this number is projected to nearly double every 20 years, to 65.7 million in 2030 and 115.4 million in 2050. The worldwide annual incidence of dementia is nearly 7.7 million new cases, implying one new case every 4 seconds.
Total estimated worldwide costs of dementia were US$ 604 billion in 2010. In high-income countries, informal care (45%) and formal social care (40%) make up the majority of costs, in comparison to direct medical costs (15%), which are much lower. In low-income and lower-middle-income countries, direct social care costs are small, and the costs of unpaid care provided by the family dominate (World Health Organization 2012). Given the expected growth over the coming decades in the number of people with dementia, the costs of supporting and treating them can also be expected to increase rapidly. For example, a study comparing future dementia costs in Italy, Spain, the UK, and Germany suggested that the proportion of GDP spent on long-term care would more than double between 2000 and 2050 (Comas-Herrera et al. 2006).
These projected future trends have prompted much discussion and also some real action across many countries. One of the first countries to develop such a plan was Canada in 1999, and its “Alzheimer Strategy – Preparing for our future” runs until 2014. A good example of an integrated action plan for dementia comes from France, which was one of the first European countries to launch such a program (in 2008). Based on 44 measures to combat dementia and related disorders (République Française 2013), the key aims are to improve diagnosis, to provide better treatment and support through establishing “coordinators” throughout the country and through encouraging treatment at home by skilled support staff, and to provide more effective help through developing and diversifying respite structure and through the use of technology (such as a telephone line or a website). A final aim was to create a foundation for scientific cooperation to stimulate and coordinate research through memory clinics and diagnostics centers, with a lesser reliance on antipsychotic drugs (République Française 2008). It also aimed to change the way dementia is viewed, by raising awareness at the national and international level. The plan pledged 1.6 billion euros over this period.
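As a rough arithmetic check on the WHO (2012) dementia figures quoted in this section, the short Python sketch below converts the annual incidence into an average interval between new cases and compares successive prevalence estimates. The function names are illustrative only, and the calculation assumes an average year of 365.25 days and new cases spread evenly through the year; it is not part of the WHO methodology.

```python
"""Back-of-the-envelope check of the dementia figures quoted in the text."""

# Figures taken from the text (World Health Organization 2012).
PEOPLE_WITH_DEMENTIA_MILLIONS = {2010: 35.6, 2030: 65.7, 2050: 115.4}
NEW_CASES_PER_YEAR = 7.7e6

# Assumption: an average year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def seconds_between_cases(new_cases_per_year: float) -> float:
    """Average time between new cases if they were spread evenly over a year."""
    return SECONDS_PER_YEAR / new_cases_per_year


def growth_factor(earlier: float, later: float) -> float:
    """Ratio of a later estimate to an earlier one (2.0 would be an exact doubling)."""
    return later / earlier


if __name__ == "__main__":
    interval = seconds_between_cases(NEW_CASES_PER_YEAR)
    print(f"One new case about every {interval:.1f} seconds")
    years = sorted(PEOPLE_WITH_DEMENTIA_MILLIONS)
    for start, end in zip(years, years[1:]):
        factor = growth_factor(
            PEOPLE_WITH_DEMENTIA_MILLIONS[start],
            PEOPLE_WITH_DEMENTIA_MILLIONS[end],
        )
        print(f"{start}-{end}: growth factor {factor:.2f}")
```

Under these assumptions the interval comes out at roughly 4 seconds, and the two 20-year growth factors are a little below 2, which is consistent with the phrase "nearly double every 20 years".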
More recently (December 2013), the Health Ministers from the G8 countries met in London for a Dementia Summit, following which they jointly issued a declaration and communiqué, spelling out clearly the challenges so often experienced by family and other carers of people with dementia and the need for action. Further joint action is planned to tackle what has become a major global mental health challenge.
Employment
Previous studies have shown the beneficial effect employment can have within the mental health population. However, poor-quality jobs can be detrimental to mental health. This is problematic, because people with mental health problems often find themselves in low-skilled jobs, which can add strain to their emotional well-being, as well as not being suitable to their needs and preferences (OECD 2011).
However, despite these gains in outcomes, employment rates among people with mental health problems vary by severity of diagnosis and are relatively low compared to the general population.
People with a history of mental health problems face problems in the open employment market, including stigma and a reluctance from employers to give them a job (Manning and White 1995), with some even alluding to their perceived risk of violence (Roberts et al. 2004). A recent study using Eurobarometer surveys of 2006 and 2010 has demonstrated that the economic crisis has widened the gap even more in terms of unemployment rates between people with and without mental health problems; those who were particularly affected were men and people with lower educational attainment (Evans-Lacko et al. 2013b). Additionally, people living within countries with higher levels of stigmatizing attitudes toward people with mental health problems were particularly vulnerable to unemployment in 2010.
Nevertheless, it is generally agreed that motivation to work has a significant influence on whether people with severe mental illness gain competitive employment (Catty et al. 2008); antipsychotic medication also plays a role here as
[Figure: Employment/population ratio (employed people as a proportion of the working-age population), by severity of mental disorder, ten OECD countries, latest available year (late 2000s) (OECD 2011)]
[Figure: Incidence of absenteeism and presenteeism (in percentage) and average absence duration (in days), by mental health status, average over 21 European OECD countries in 2010
Panel A. Sickness absence incidence: percentage of persons who have been absent from work in the past 4 weeks (apart from holidays). Bar values shown: 42, 28, 19.
Panel B. Average duration of sickness absence: average number of days absent from work in the past 4 weeks (of those who have been absent). Bar values shown: 7.3, 5.6, 4.8.
Panel C. Presenteeism incidence: percentage of workers not absent in the past 4 weeks but who accomplished less than they would like as a result of an emotional or physical health problem. Bar values shown: 88, 69, 26.
Categories in each panel: severe disorder, moderate disorder, no mental disorder.
Note: Averages are represented by dashed lines
Source: OECD calculations based on Eurobarometer (2010)]
Stigma appears to be pervasive across the board: people in low-income and possibly middle-income countries still experience basic human rights violations, whereas policies in high-income countries focus more on social inclusion and integration. Indeed, the promotion of personalized care and service user empowerment, although it has produced mixed results, is a step in the right direction.
Deinstitutionalization has yet to occur in many countries, although the process has been under way in some parts of Europe and the USA since the 1970s. Regardless of the financial resources and funding arrangements in place, stigma and discrimination may play a role in the continued existence of asylums and institutions, alluding to the so-called NIMBY ("not in my back yard") phenomenon (Thornicroft 2007).
Evidence on anti-stigma campaigns is mixed, and a more integrated approach to mental health promotion, with a focus on prevention as well, may be needed.
E-technologies may prove to be innovative and, perhaps more importantly, cost-effective solutions, but they should still be regarded as complementary therapies.
References
Alem A, Jacobsson L, Araya M, Kebede D, Kullgren G. How are mental disorders seen and where is help sought in a rural Ethiopian community? A key informant study in Butajira, Ethiopia. Acta Psychiatr Scand. 1999;397:40–7.
Al-Krenawi A, Graham JR. Culturally sensitive social work practice with Arab clients in mental health settings. Health Soc Work. 2000;25(1):9–22.
Anderson R, Wynne R, McDaid D. Housing and employment. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe. Maidenhead: Open University Press/McGraw-Hill Education; 2007.
Andrew A, Knapp M, McCrone P, Parsonage M, Trachtenberg M. Effective interventions in schizophrenia: the economic case: a report prepared for the Schizophrenia Commission. London: Rethink Mental Illness; 2012.
Angermeyer MC, Matschinger H. Reporting of isolated violent attacks by people with schizophrenia in the media changes attitudes towards people with mental illness. Soc Sci Med. 1996;43(12):1721–8.
Angermeyer MC, Matschinger H. Causal beliefs and attitudes to people with schizophrenia. Trend analysis based on data from two population surveys in Germany. [Research Support, Non-U.S. Gov’t]. Br J Psychiatry. 2005;186:331–4.
Ayuso-Mateos JL. Global burden of bipolar disorder in the year 2000. Geneva: WHO; 2000a.
Ayuso-Mateos JL. Global burden of obsessive-compulsive disorder in the year 2000. Geneva: WHO; 2000b.
Ayuso-Mateos JL. Global burden of panic disorder in the year 2000: version 1 estimates. Geneva: WHO; 2000c.
Ayuso-Mateos JL. Global burden of schizophrenia in the year 2000: version 1 estimates. Geneva: WHO; 2000d.
Baldwin ML, Marcus SC. Stigma, discrimination, and employment outcomes among persons with mental health disabilities. In: Schultz IZ, Rogers ES, editors. Work accommodation and retention in mental health. New York: Springer; 2011.
Bebbington P. Population surveys of psychiatric disorder and the need for treatment. Soc Psychiatry Psychiatr Epidemiol. 1990;25(1):33–40.
Bennett D. The value of work in psychiatric rehabilitation. Soc Psychiatry. 1970;5(4):224–30.
Boardman J. How are people with mental health problems excluded? In: Boardman J, Currie A, Killaspy H, Mezey G, editors. Social inclusion and mental health. London: Royal College of Psychiatrists; 2010.
Bond GR. Principles of individual placement and support. Psychiatr Rehabil J. 1998;27:345–59.
Bond GR, Drake RE, Mueser KT, Becker DR. An update on supported employment for people with severe mental illness. Psychiatr Serv. 1997;48(3):335–46.
Brohan E, Elgie R, Sartorius N, Thornicroft G, GAMIAN-Europe Study Group. Self-stigma, empowerment and perceived discrimination among people with schizophrenia in 14 European countries: the GAMIAN-Europe study. Schizophr Res. 2010;122(1–3):232–8.
Catty J, Lissouba P, White S, Becker T, Drake RE, Fioritti A, et al. Predictors of employment for people with severe mental illness: results of an international six-centre randomised controlled trial. Br J Psychiatry. 2008;192:224–31.
Centre for Housing Research. Supporting people. 2013. From https://supportingpeople.st-andrews.ac.uk/index.cfm
Chapman DP, Perry GS, Strine TW. The vital link between chronic disease and depressive disorders. Prev Chronic Dis. 2005;3(2):1–3.
Cimpean D, Drake RE. Treating co-morbid chronic medical conditions and anxiety/depression. [Review]. Epidemiol Psychiatr Sci. 2011;20(2):141–50.
Comas-Herrera A, Wittenberg R, Costa-Font J, Gori C, Di Maio A, Patxot C, et al. Future long-term care expenditure in Germany, Spain, Italy and the United Kingdom. Ageing Soc. 2006;26:285–302.
Cook J, Razzano L. Vocational rehabilitation for persons with schizophrenia: recent research and implications for practice. Schizophr Bull. 2000;26(1):87–103.
Crosby C, Barry M, Carter MF, Lowe CF. Psychiatric rehabilitation and community care: resettlement from a North Wales Hospital. Health Soc Care. 1993;1:355–63.
Crowther RE, Marshall M, Bond GR, Huxley P. Helping people with severe mental illness to obtain work: systematic review. Br Med J. 2001;322:204–8.
Cuyun Carter GB, Milton DR, Ascher-Svanum H, Faries DE. Sustained favorable long-term outcome in the treatment of schizophrenia: a 3-year prospective observational study. BMC Psychiatry. 2011;11:143.
Davey V, Fernández J-L, Knapp M, Vick N, Jolly D, Swift P, et al. Direct payments: a national survey of direct payments policy and practice. London: London School of Economics; 2007.
Deambrosis P, Chinellato A, Terrazzani G, Pullia G, Giusti P, Skaper SD, et al. Antidepressant drug prescribing patterns to outpatients of an Italian local health authority during the years 1998 to 2008. [Comparative Study Letter Research Support, Non-U.S. Gov’t]. J Clin Psychopharmacol. 2010;30(2):212–5.
Department for Education and Employment. Towards full employment in a modern society. 2001.
Department of Health. Our health, our care, our say: a new direction for community services. 2006. Retrieved from http://webarchive.nationalarchives.gov.uk/+/www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4127453
Department of Health. Direct payments. 2008. From http://www.dh.gov.uk/en/SocialCare/Socialcarereform/Personalisation/Directpayments/DH_080273
Department of Health. No health without mental health: a cross-government mental health outcomes strategy for people of all ages. 2011.
Drew N, Funk M, Pathare S, Swartz L. Mental health and human rights. In: Herrman H, Saxena S, Moodie R, editors. Promoting mental health: concepts, emerging evidence, practice. Geneva: World Health Organisation; 2005.
Ellis PM, Smith DA, beyond blue: the national depression initiative. Treating depression: the beyond blue guidelines for treating depression in primary care. “Not so much what you do but that you keep doing it”. [Guideline Meta-Analysis Practice Guideline Research Support, Non-U.S. Gov’t]. Med J Aust. 2002;176(Suppl):S77–83.
EMERALD. EMERALD. 2014.
Eurobarometer. Mental health eurobarometer. 2010.
Evans-Lacko S, Henderson C, Thornicroft G. Public knowledge, attitudes and behaviour regarding people with mental illness in England 2009–2012. Br J Psychiatry. 2013a;202:51–7.
Evans-Lacko S, Knapp M, McCrone P, Thornicroft G, Mojtabai R. The mental health consequences of the recession: economic hardship and employment of people with mental health problems in 27 European countries. [Research Support, Non-U.S. Gov’t]. PLoS One. 2013b;8(7):e69792.
Evans-Lacko S, Malcolm E, West K, Rose D, London J, Rusch N, et al. Influence of Time to Change’s social marketing interventions on stigma in England 2009–2011. [Evaluation Studies Research Support, Non-U.S. Gov’t]. Br J Psychiatry Suppl. 2013c;55:s77–88.
Fakhoury W, Priebe S. The process of deinstitutionalisation: an international overview. Curr Opin Psychiatry. 2002;15:187–92.
Fakhoury W, Priebe S. Deinstitutionalization and reinstitutionalization: major changes in the provision of mental healthcare. Psychiatry. 2007;6(8):313–6.
Fernandez-Aranda F, Jimenez-Murcia S, Santamaria JJ, Gunnard K, Soto A, Kalapanidas E, et al. Video games as a complementary therapy tool in mental disorders: PlayMancer, a European multicentre study. J Ment Health. 2012;21(4):364–74.
Foroushani PS, Schneider J, Assareh N. Meta-review of the effectiveness of computerised CBT in treating depression. [Comparative Study Research Support, Non-U.S. Gov’t Review]. BMC Psychiatry. 2011;11:131.
Funk M, Saraceno B, Drew N. Global perspective on mental health policy and service development issues. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe: the future direction of mental health care. Maidenhead: Open University Press; 2005.
Gabriel P, Liimatainen M-R. Mental health in the workplace. Geneva: International Labour Organisation; 2000.
Godard C, Chevalier A, Lecrubier Y, Lahon G. APRAND programme: an intervention to prevent relapses of anxiety and depressive disorders – first results of a medical health promotion intervention in a population of employees. Eur Psychiatry. 2006;21(7):451–9.
Goffman E. Stigma: notes on the management of spoiled identity. Englewood Cliffs: Prentice-Hall; 1963.
Gournay K. Commentaries and reflections on mental health nursing in the UK at the dawn of the new millennium: commentary 2. J Ment Health. 2000;9(6):621–3.
Greenberg MT, Domitrovich C, Bumbarger B. The prevention of mental disorders in school-aged children: current state of the field. Prev Treat. 2001;4.
Grove B. Mental health and employment. Shaping a new agenda. J Ment Health. 1999;8:131–40.
Hawkins JD, Catalano RF, Arthur MW. Promoting science-based prevention in communities. Addict Behav. 2002;27(6):951–76.
Health and Social Care Information Centre. Community care statistics, social services activity – England, 2011–12, Final release. 2013. From https://catalogue.ic.nhs.uk/publications/social-care/activity/comm-care-soci-serv-act-eng-11-12-fin/comm-care-stat-eng-2011-12-soci-serv-act-rep.pdf
Institute of Medicine. Clearing the smoke. Washington, DC: National Academy Press; 2001.
Jané-Llopis E, Anderson P. A policy framework for the promotion of mental health and the prevention of
mental disorders. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe. Maidenhead: McGraw-Hill; 2007.
Kessler RC, Aguilar-Gaxiola S, Alonso J, Chatterji S, Lee S, Ormel J, et al. The global burden of mental disorders: an update from the WHO World Mental Health (WMH) surveys. [Research Support, N.I.H., Extramural]. Epidemiol Psichiatr Soc. 2009;18(1):23–33.
Killaspy H. From the asylum to community care: learning from experience. Br Med Bull. 2006;79:245–58.
Knapp M. Mental health policy and practice across Europe: the future direction of mental health care. Maidenhead: Open University Press; 2007.
Knapp M, McDaid D. Financing and funding mental health care services. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe. Maidenhead: McGraw-Hill/Open University Press; 2007.
Knapp M, McDaid D, Amaddeo F. Financing arrangements for mental health in Western Europe. J Ment Health. 2006.
Knapp M, McDaid D, Amaddeo F, Constantopoulos A, Oliveira MD, Salvador-Carulla L, Zechmeister I, the MHEEN Group. Financing mental health care in Europe. J Ment Health. 2007;16(2):167–80.
Kohn R, Saxena S, Levav I, Saraceno B. The treatment gap in mental health care. [Research Support, Non-U.S. Gov’t Review]. Bull World Health Organ. 2004;82(11):858–66.
Kuno E, Asukai N. Efforts toward building a community-based mental health system in Japan. Int J Law Psychiatry. 2000;23(3–4):361–73.
Larrobla C, Botega NJ. Psychiatric care policies and deinstitutionalization in South America. Actas Esp Psiquiatr. 2000;28(1):22–30.
Lawrie SM. Newspaper coverage of psychiatric and physical illness. Psychiatr Bull. 2000;24:104–6.
Leff J. The outcome for long-stay non-demented patients. In: Leff J, editor. Care in the community: illusion or reality? London: Wiley; 1997.
Leff J, Warner R. Social inclusion of people with mental illness. Cambridge: Cambridge University Press; 2006.
Leff J, Dayson D, Gooch C, Thornicroft G, Wills W. Quality of life of long stay patients discharged from two psychiatric institutions. Psychiatr Serv. 1996;47:62–7.
Lehtinen V, Katschnig H, Kovess-Masféty V, Goldberg D. Mental health policy and practice across Europe. Maidenhead: McGraw Hill; 2007.
Lelliott P, Tulloch S, Boardman J, Harvey S, Henderson M, Knapp M. Mental health and work. London: Royal College of Psychiatrists; 2008.
Limb M. Digital technologies offer new ways to tackle mental health problems. Br Med J. 2012;345:e5163.
Mangalore R, Knapp M. Cost of schizophrenia in England. [Research Support, Non-U.S. Gov’t]. J Ment Health Policy Econ. 2007;10(1):23–41.
Manning C, White PD. Attitudes of employers to the mentally ill. Psychiatr Bull. 1995;19:541–3.
Marrone J, Golowka E. If work makes people with mental illness sick, what do unemployment, poverty, and social isolation cause? Psychiatr Rehabil J. 2000;23(2):187–93.
McCourt CA. Life after hospital closure: users’ views of living in residential ‘resettlement’ projects. A case study in consumer-led research. Health Expect. 2000;3:192–202.
McDaid D, Knapp M, Medeiros H, the MHEEN Group. Employment and mental health: assessing the economic impact and the case for intervention. London: Personal Social Services Research Unit; 2008.
Medeiros H, McDaid D, Knapp M, the MHEEN Group. Shifting care from hospital to the community in Europe: economic challenges and opportunities. London: Personal Social Services Research Unit; 2008.
Mental Disability Advocacy Centre. Guardianship and human rights in Bulgaria: analysis of law, policy and practice. 2007a.
Mental Disability Advocacy Centre. Guardianship and human rights in Russia: analysis of law, policy and practice. 2007b.
Mindout for mental health. Working minds: making mental health your business. 2000.
Mizuno M, Sakuma K, Ryu Y, Munakata S, Takebayashi T, Murakami M, et al. The Sasagawa project: a model for deinstitutionalisation in Japan. Keio J Med. 2005;54(2):95–101.
Moessner M, Bauer S. Online counselling for eating disorders: reaching an underserved population? J Ment Health. 2012;21(4):336–45.
Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V, Ustun B. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet. 2007;370(9590):851–8.
Moxham LJ, Pegg SA. Permanent and stable housing for individuals living with a mental illness in the community: a paradigm shift in attitude for mental health nurses. Aust N Z J Ment Health Nurs. 2000;9(2):82–8.
National Centre for Social Research. Adult psychiatric morbidity in England, 2007. Results of a household survey. 2007. Retrieved from http://www.ic.nhs.uk/webfiles/publications/mental%20health/other%20mental%20health%20publications/Adult%20psychiatric%20morbidity%2007/APMS%2007%20(FINAL)%20Standard.pdf
Naylor C, Parsonage M, McDaid D, Knapp M, Fossey M, Galea A. Long-term conditions and mental health. The cost of co-morbidities. London: The King’s Fund and Centre for Mental Health; 2012.
OECD. Sickness, disability and work: breaking the barriers. 2010.
OECD. Sick on the job. 2011. From http://www.oecd.org/els/emp/sickonthejob2011.htm
OECD. Sick on the job? Myths and realities about mental health and work. OECD Publishing; 2012.
Ohara T, Doi Y, Ninomiya T, Hirakawa Y, Hata J, Iwaki T, et al. Glucose tolerance status and risk of dementia in the community: the Hisayama study. [Comparative Study Research Support, Non-U.S. Gov’t]. Neurology. 2011;77(12):1126–34.
Olfson M, Marcus SC. National patterns in antidepressant medication treatment. [Comparative Study Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S.]. Arch Gen Psychiatry. 2009;66(8):848–56.
Oxford Dictionaries. Stigma. Oxford Dictionaries. 2010. From http://oxforddictionaries.com/definition/stigma
Paykel ES, Hart D, Priest RG. Changes in public attitudes to depression during the Defeat Depression Campaign. [Research Support, Non-U.S. Gov’t]. Br J Psychiatry. 1998;173:519–22.
Price R, Kompier M. Work stress and unemployment: risks, mechanisms, and prevention. In: Hosman C, Jané-Llopis E, Saxena S, editors. Prevention of mental disorders: effective strategies and policy options. Oxford: Oxford University Press; 2006.
Priebe S, Badesconyi A, Fioritti A, Hansson L, Kilian R, Torres-Gonzales F, et al. Reinstitutionalisation in mental health care: comparison of data on service provision from six European countries. Br Med J. 2005;330:123–6.
Raja S, Wood SK, de Menil V, Mannarath SC. Mapping mental health finances in Ghana, Uganda, Sri Lanka, India and Lao PDR. Int J Ment Heal Syst. 2010;4:11.
Read J, Baker S. Not just sticks and stones: a survey of the discrimination experienced by people with mental health problems. 1996.
République Française. National plan for “Alzheimer and related diseases” 2008–2012. 2008. Available from http://www.plan-alzheimer.gouv.fr/IMG/pdf/Plan_Alzheimer_2008-2012_uk.pdf
République Française. 44 measures in order to fight Alzheimer’s disease and related disorders. 2013. From http://www.plan-alzheimer.gouv.fr/-44-measures-.html
Roberts S, Heaver C, Hill K, Rennison J, Stafford B, Howat N, et al. Disability in the workplace: employers and service providers’ response to the Disability Discrimination Act in 2003 and preparation for 2004 changes. 2004.
Royal College of Psychiatrists Public Education Editorial Board. Antipsychotics. 2012. From http://www.rcpsych.ac.uk/mentalhealthinfo/treatments/antipsychoticmedication.aspx
Saffer H. Tobacco advertising and promotion. In: Jha P, Chaloupka F, editors. Tobacco control in developing countries. Oxford: Oxford Medical Publications; 2000. p. 215–36.
Sainsbury Centre for Mental Health. Briefing 27. Benefits and work for people with mental health problems: a briefing for mental health workers. 2004. Retrieved from http://www.centreformentalhealth.org.uk/pdfs/briefing_27.pdf
Sartorius N, Jablensky A, Korten A, Ernberg G, Anker M, Cooper JE, et al. Early manifestations and first-contact incidence of schizophrenia in different cultures. A preliminary report on the initial evaluation phase of the WHO Collaborative Study on determinants of outcome of severe mental disorders. Psychol Med. 1986;16(4):909–28.
Savaya R. Attitudes towards family and marital counseling among Israeli Arab women. J Soc Serv Res. 1995;21(1):35–51.
Secker J, Grove B, Seebohm P. Challenging barriers to employment, training and education for mental health service users: the service user’s perspective. J Ment Health. 2001;10(4):395–404.
Sederer LI, Silver L, McVeigh KH, Levy J. Integrating care for medical and mental illnesses. [Comment]. Prev Chronic Dis. 2006;3(2):A33.
Shepherd G. Institutional care and rehabilitation. London: Longman; 1984.
Shepherd G, Muijen M, Dean R, Cooney M. Inside residential care. London: The Sainsbury Centre for Mental Health; 1995.
Social Exclusion Unit. Mental health and social exclusion social exclusion unit report. 2004. Retrieved from http://www.socialinclusion.org.uk/publications/SEU.pdf
Spandler H, Vick N. Direct payments, independent living and mental health. London: Health and Social Care Advisory Service; 2004.
Tansella M. Community psychiatry without mental hospitals – the Italian experience: a review. J R Soc Med. 1986;79:664–9.
Thornicroft G. Shunned. Discrimination against people with mental illness. Oxford: Oxford University Press; 2007.
Thornicroft G, Bebbington PE. Deinstitutionalisation – from hospital closure to service development. Br J Psychiatry. 1989;155:739–53.
Thornicroft G, Maingay S. The global response to mental illness – an enormous health burden is increasingly being recognised. Br Med J. 2002;325(7365):608–9.
Thornicroft G, Brohan E, Rose D, Sartorius N, Leese M. Global pattern of experienced and anticipated discrimination against people with schizophrenia: a cross-sectional survey. Lancet. 2009;373:408–15.
Van Dongen CJ. Quality of life and self-esteem in working and nonworking persons with mental illness. Community Ment Health J. 1996;32(6):535–48.
Velayudhan L, Poppe M, Archer N, Proitsi P, Brown RG, Lovestone S. Risk of developing dementia in people with diabetes and mild cognitive impairment. [Research Support, Non-U.S. Gov’t]. Br J Psychiatry. 2010;196(1):36–40.
Verdoux H, Tournier M, Begaud B. Antipsychotic prescribing trends: a review of pharmaco-epidemiological studies. [Review]. Acta Psychiatr Scand. 2010;121(1):4–10.
Wahl OF. Telling is risky business. New Brunswick: Rutgers University Press; 1999.
Wang PS, Patrick A, Avorn J, Azocar F, Ludman E, McCulloch J, et al. The costs and benefits of enhanced
depression care to employers. [Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t]. Arch Gen Psychiatry. 2006;63(12):1345–53.
Wang PS, Simon GE, Avorn J, Azocar F, Ludman EJ, McCulloch J, et al. Telephone screening, outreach, and care management for depressed workers and impact on clinical and work productivity outcomes – a randomized controlled trial. JAMA. 2007;298(12):1401–11.
Warner R. Recovery from schizophrenia: psychiatry and political economy. Oxford: Oxford University Press; 1994.
Whiteford HA, Degenhardt L, Rehm J, Baxter AJ, Ferrari AJ, Erskine HE, et al. Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010. Lancet. 2013;382(9904):1575–86.
Wicks P. E-mental health: a medium reaches maturity. [Comment Editorial]. J Ment Health. 2012;21(4):332–5.
Wing L, Gould J, Gillberg C. Autism spectrum disorders in the DSM-V: better or worse than the DSM-IV? Res Dev Disabil. 2011;32(2):768–73.
World Health Organisation. The ICD-10 classification of mental and behavioural disorders: clinical descriptions and clinical guidelines. Geneva: WHO; 1992.
World Health Organization. The world health report 2001: mental health: new understanding, new hope. Geneva: World Health Organization; 2001.
World Health Organisation. Investing in mental health. Geneva: WHO; 2003a.
World Health Organisation. Mental health financing. 2003b.
World Health Organisation. Prevention of mental disorders. Effective interventions and policy options. Geneva: WHO; 2004.
World Health Organisation. Mental health atlas 2005. Geneva: WHO; 2005a.
World Health Organisation. WHO resource book on mental health, human rights and legislation. Stop exclusion, dare to care. Geneva; 2005b.
World Health Organisation. Scaling up care for mental, neurological, and substance use disorders. Geneva: World Health Organisation; 2008.
World Health Organisation. Eurobarometer 73.2. Mental health. Brussels: WHO; 2010.
World Health Organisation. Mental health atlas 2011. Geneva: WHO; 2011.
World Health Organisation. Dementia: a public health priority. Geneva: WHO; 2012.
World Health Organisation. Mental health action plan 2013–2020. Geneva: WHO; 2013.
Xu W, Qiu C, Gatz M, Pedersen NL, Johansson B, Fratiglioni L. Mid- and late-life diabetes in relation to the risk of dementia: a population-based twin study. [Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Twin Study]. Diabetes. 2009;58(1):71–7.
Yip KS. Have psychiatric services in Hong Kong been impacted by the deinstitutionalization and community care movements? Adm Policy Ment Health. 2000;27(6):443–9.