Health Services Research (2019) PDF

Health Services Research
Series Editor: Boris Sobolev
Adrian Levy · Sarah Goring
Constantine Gatsonis · Boris Sobolev
Ewout van Ginneken · Reinhard Busse
Editors
Health Services Evaluation
Health Services Research
Series Editor
Boris Sobolev
University of British Columbia
Vancouver, BC, Canada
Health services research is the study of the organization, uses and outcomes of
health care. The societal value of health services research lies in identifying the
ways in which health care can best be organized, financed, and delivered.
This ambitious agenda brings together researchers from a wide range of
disciplinary backgrounds that are required for evaluating the effectiveness of
diagnostic technologies, treatment procedures, and managerial solutions. The
series is envisaged as a collection that overviews the established knowledge
and provides access to accepted information in the field. The content is
grouped into six major areas.
1. Clinical evaluation of health care outcomes
2. Medical practice variations
3. Research methods
4. Health care systems and policies
5. Sources of data
6. Health economics in health services research.
The series will be of significant interest for healthcare professionals, program
directors, service administrators, policy and decision makers, as well as for
graduate students, educators, and researchers in healthcare evaluation.
More information about this series at http://www.springer.com/series/13490

Adrian Levy • Sarah Goring
Constantine Gatsonis • Boris Sobolev
Ewout van Ginneken • Reinhard Busse
Editors
Health Services
Evaluation
With 142 Figures and 137 Tables

Editors

Adrian Levy
Community Health and Epidemiology
Dalhousie University
Halifax, NS, Canada

Sarah Goring
ICON plc
Vancouver, BC, Canada

Constantine Gatsonis
Department of Biostatistics
Brown University
Providence, RI, USA

Boris Sobolev
University of British Columbia
Vancouver, BC, Canada

Ewout van Ginneken
Berlin University of Technology
Berlin, Germany
European Observatory on Health Systems and Policies, Department of Health Care Management
Berlin University of Technology
Berlin, Germany

Reinhard Busse
Technische Universität Berlin
Berlin, Germany
Department Health Care Management
Faculty of Economics and Management
Technische Universität
Berlin, Germany
ISSN 2511-8293 ISSN 2511-8307 (electronic)

ISBN 978-1-4939-8714-6 ISBN 978-1-4939-8715-3 (eBook)
ISBN 978-1-4939-8716-0 (print and electronic bundle)
https://doi.org/10.1007/978-1-4939-8715-3
Library of Congress Control Number: 2018960887
© Springer Science+Business Media, LLC, part of Springer Nature 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or
part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way,
and transmission or information storage and retrieval, electronic adaptation, computer software, or
by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt
from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with
regard to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Science+Business Media,
LLC part of Springer Nature.
The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Series Preface
Health Services Research has experienced explosive growth in the past three
decades. The new field was formed at the interface of a number of disciplines,
including medicine, statistics, economics, management science, and the social
and behavioral sciences, which came together around the study of health care
practice, delivery and outcomes. The rich, multidisciplinary research enterprise
that developed from this fusion has already produced a growing and
sophisticated body of subject matter research and has also defined a body of
methodology that is integral to the field. True to the multidisciplinary origins
of the field, its methods continue to benefit from developments in diverse
disciplines, while formulating and addressing scientific questions that are
unique to health care and outcomes research.
The societal value of health services research lies in identifying the ways in
which health care can best be organized, financed, and delivered. This ambitious
agenda brings together researchers from a wide range of disciplinary
backgrounds who are required for evaluating the effectiveness of diagnostic
technologies, treatments, procedures, and health delivery systems, as no single
discipline provides a full perspective on how health systems operate.
A fundamental discovery was the persistent variation in health care utilization
across providers, regions and countries, variation that cannot be
explained by population illness level, known benefit or patient preference.
Another discovery was that treatments and procedures that are meant to benefit
patients may produce adverse events and unintended consequences. We have
learned that results of randomized clinical trials cannot always be generalized
to clinical practice because patients enrolled in trials can be highly selected.
Researchers have been able to identify patients who may benefit from a
treatment, but there are groups of patients for whom the optimal treatment is
not well defined or may depend on their personal preferences. Learning what
works in real life gave rise to comparative effectiveness research.
The Health Services Research series addresses the increasing need for a
comprehensive reference in the field of inquiry that welcomes interdisciplinary
collaborations. This major reference work aims to be a source of information
for everyone who seeks to develop an understanding of health services and
health systems, and learn about the historic, political, and economic factors
that influence health policies at global, national, regional and local levels. The
intended readership includes graduate students, educators, researchers,
healthcare professionals, policy makers and service administrators.
The main reason for public support of health services research is the
common understanding that new knowledge will lead to more effective health
care. Over the past decades, we have witnessed the increased prominence of
health services and health policy research since the knowledge, skills and
approaches required for ground-breaking work distinguish it from other specialties.
An important step towards the formation of the profession is a
comprehensive reference work of established knowledge. The Health Services
Research series is intended to provide the health services researcher a home for
the foundations of the profession.
The Health Services Research series is available in both printed and online
formats. The online version will serve as a web-based conduit of information
that evolves as knowledge content expands. This innovative depository of
knowledge will offer various search tools, including cross-referencing across
chapters and linking to supplementary data, other Springer reference works and
external articles.
July 2015 Boris Sobolev

Contents
Part I Data and Measures in Health Services Research . . . . . . . 1
1 Health Services Data: Big Data Analytics for Deriving
Predictive Healthcare Insights . . . . . . . . . . . . . . . . . . . . . . . . 3
Ankit Agrawal and Alok Choudhary
2 Health Services Data: Managing the Data Warehouse:
25 Years of Experience at the Manitoba Centre for
Health Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Mark Smith, Leslie L. Roos, Charles Burchill, Ken Turner,
Dave G. Towns, Say P. Hong, Jessica S. Jarmasz,
Patricia J. Martens, Noralou P. Roos, Tyler Ostapyk,
Joshua Ginter, Greg Finlayson, Lisa M. Lix, Marni Brownell,
Mahmoud Azimaee, Ruth-Ann Soodeen, and J. Patrick Nicol
3 Health Services Data, Sources and Examples: The Institute
for Clinical Evaluative Sciences Data Repository . . . . . . . . . 47
Karey Iron and Kathy Sykora
4 Health Services Data: The Centers for Medicare and
Medicaid Services (CMS) Claims Records . . . . . . . . . . . . . . . 61
Ross M. Mullner
5 Health Services Data: Typology of Health Care Data . . . . . . 77
Ross M. Mullner
6 Health Services Information: Application of Donabedian’s
Framework to Improve the Quality of Clinical Care . . . . . . 109
A. Laurie W. Shroyer, Brendan M. Carr, and
Frederick L. Grover
7 Health Services Information: Data-Driven Improvements
in Surgical Quality: Structure, Process, and Outcomes . . . . 141
Katia Noyes, Fergal J. Fleming, James C. Iannuzzi, and
John R. T. Monson
8 Health Services Information: From Data to Policy Impact
(25 Years of Health Services and Population Health Research
at the Manitoba Centre for Health Policy) . . . . . . . . . . . . . . . 171
Leslie L. Roos, Jessica S. Jarmasz, Patricia J. Martens,
Alan Katz, Randy Fransoo, Ruth-Ann Soodeen, Mark Smith,
Joshua Ginter, Charles Burchill, Noralou P. Roos,
Malcolm B. Doupe, Marni Brownell, Lisa M. Lix,
Greg Finlayson, and Maureen Heaman
9 Health Services Information: Key Concepts and
Considerations in Building Episodes of Care from
Administrative Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
Erik Hellsten and Katie Jane Sheehan
10 Health Services Information: Lessons Learned from the
Society of Thoracic Surgeons National Database . . . . . . . . . 217
David M. Shahian and Jeffrey P. Jacobs
11 Health Services Information: Patient Safety Research Using
Administrative Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Chunliu Zhan
12 Health Services Information: Personal Health Records as a
Tool for Engaging Patients and Families . . . . . . . . . . . . . . . . 265
John Halamka
13 A Framework for Health System Comparisons: The Health
Systems in Transition (HiT) Series of the European
Observatory on Health Systems and Policies . . . . . . . . . . . . . 279
Bernd Rechel, Suszy Lessof, Reinhard Busse, Martin McKee,
Josep Figueras, Elias Mossialos, and Ewout van Ginneken
14 Health Services Knowledge: Use of Datasets Compiled
Retrospectively to Correctly Represent Changes in Size of
Wait List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
Paul W. Armstrong
15 Waiting Times: Evidence of Social Inequalities in Access for
Care . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
Luigi Siciliani
16 Health Services Data: The Ontario Cancer Registry
(a Unique, Linked, and Automated Population-Based
Registry) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
Sujohn Prodhan, Mary Jane King, Prithwish De, and
Julie Gilbert
17 Challenges of Measuring the Performance of
Health Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Adrian R. Levy and Boris G. Sobolev
Part II Methods in Health Services Research ................ 403
18 Analysis of Repeated Measures and Longitudinal Data in
Health Services Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
Juned Siddique, Donald Hedeker, and Robert D. Gibbons
19 Competing Risk Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
Melania Pintilie
20 Modeling and Analysis of Cost Data . . . . . . . . . . . . . . . . . . . 447
Shizhe Chen and XH Andrew Zhou
21 Instrumental Variable Analysis . . . . . . . . . . . . . . . . . . . . . . . 479
Michael Baiocchi, Jing Cheng, and Dylan S. Small
22 Introduction to Causal Inference Approaches . . . . . . . . . . . . 523
Elizabeth A. Stuart and Sarah Naeger
23 Measurement of Patient-Reported Outcomes of
Health Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
Joseph C. Cappelleri and Andrew G. Bushmakin
24 Micro-simulation Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
Carolyn M. Rutter
25 Network Meta-analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
Georgia Salanti, Deborah Caldwell, Anna Chaimani, and
Julian Higgins
26 Introduction to Social Network Analysis . . . . . . . . . . . . . . . . 617
Alistair James O’Malley and Jukka-Pekka Onnela
27 Survey Methods in Health Services Research . . . . . . . . . . . . 661
Steven B. Cohen
28 Two-Part Models for Zero-Modified Count and
Semicontinuous Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
Brian Neelon and Alistair James O’Malley
29 Data Confidentiality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
Theresa Henle, Gregory J. Matthews, and Ofer Harel
30 Qualitative Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 733
Cynthia Robins
Part III Health Care Systems and Policies ................... 753
31 Assessing Health Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755
Irene Papanicolas and Peter C. Smith
32 Health System in Canada . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
Gregory Marchildon
33 Health System in China . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 779
David Hipgrave and Yan Mu
34 Health System in Egypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809
Christian A. Gericke, Kaylee Britain, Mahmoud Elmahdawy,
and Gihan Elsisi
35 Health System in France . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
Karine Chevreul and Karen Berg Brigham
36 Health System in Japan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
Ryozo Matsuda
37 Health System in Mexico . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 849
Julio Frenk and Octavio Gómez-Dantés
38 Health System in the Netherlands . . . . . . . . . . . . . . . . . . . . . . 861
Madelon Kroneman and Willemijn Schäfer
39 Health System in Singapore . . . . . . . . . . . . . . . . . . . . . . . . . . 877
William A. Haseltine and Chang Liu
40 Health System in the USA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891
Andrew J. Barnes, Lynn Y. Unruh, Pauline Rosenau, and
Thomas Rice
41 Health System Typologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927
Claus Wendt
42 Organization and Governance: Stewardship and
Governance in Health Systems . . . . . . . . . . . . . . . . . . . . . . . . 939
Scott L. Greer
43 Provision of Health Services: Long-Term Care . . . . . . . . . . . 949
Vincent Mor and Anna Maresso
44 Provision of Health Services: Mental Health Care . . . . . . . . 979
Jon Cylus, Marya Saidi, and Martin Knapp
About the Series Editor
Boris Sobolev is a health services researcher from the University of British
Columbia. He is author of Analysis of Waiting-Time Data in Health Services
Research and Health Care Evaluation Using Computer Simulation: Concepts,
Methods and Applications.
Dr. Sobolev started an academic career at the Radiation Epidemiology
Institute in Kiev, studying the risk of cancer in relation to exposure resulting
from the Chernobyl accident. In 1996, he came to Canada to work at Queen’s
University in Kingston, where he studied how people get access to health care,
what services they use, and what happens to patients as a result. There, he
pioneered the epidemiological approach to studying the risk of adverse events
in relation to time of receiving medical services.
Later, Dr. Sobolev joined the University of British Columbia, Canada,
where he is a Professor at the School of Population and Public Health.
There, he has taught a variety of courses and introduced into the curriculum
a new course on causal inferences in health sciences. He was awarded a
Canada Research Chair in Statistics and Modelling of the Health Care System,
a distinction he held through 2013. Currently, he serves as principal investigator
for the Canadian Collaborative Study on Hip Fractures.
Dr. Sobolev also leads the Health Services and Outcomes Research Program
at the Centre for Clinical Epidemiology and Evaluation at the Vancouver
General Hospital. The program’s mission is closing the gap between health
care that is possible and health care that is delivered. This ambitious agenda
brings together researchers from a wide range of disciplinary backgrounds that
are required for evaluating the effectiveness of diagnostic technologies, treatment
procedures, and managerial solutions. The program’s investigators
empirically assess the benefits and harms of therapeutic and health care
interventions in the acute and primary care setting, using patient registries
and data from routine medical care. By learning what works in everyday
clinical practice the program generates knowledge that helps physicians and
patients to make shared decisions about the best approach to treatment.
Dr. Sobolev promotes and advances the causality perspective in health
services research for informing policy and decision-making. In particular, his
recent work helped to estimate the reduction in postoperative mortality
expected from providing timely cardiac surgical care; the health effects of
receiving hip fracture surgery within the government benchmark; the proportion
of hospital readmissions that could be avoided had patients undergone
medication review in emergency departments rather than in hospital wards;
and the expected reduction of mortality had all chronic obstructive pulmonary
disease patients had their second exacerbation prevented.
About the Editors
Adrian Levy is professor of epidemiology and health services research
working at Dalhousie University in Halifax, Nova Scotia. Dr. Levy commenced
his academic career working for the Quebec Council for Health
Technology Assessment doing applied health research on real-world use of
health technologies such as extracorporeal shock wave lithotripsy and complex
operations. His doctoral dissertation in epidemiology was completed at
McGill University (1998) followed by postgraduate training in economic
evaluation at McMaster University (2000). In 2000, Dr. Levy joined the
faculty in the School of Population and Public Health at the University of
British Columbia and was awarded British Columbia Michael Smith Foundation
for Health Research Scholar (2001) and Senior Scholar (2006) awards and
a New Investigator Award from the Canadian Institutes of Health Research
(2004). There, he linked administrative health databases with patient and
treatment registries to study access, quality, and cost of care in cardiac surgery,
HIV, and transplant.
In 2009, Dr. Levy joined Dalhousie University in Halifax, Nova Scotia,
Canada, to serve as head of the Department of Community Health and
Epidemiology. As an integral part of the Medical School of the Maritimes,
the Department’s collective purpose is to enhance the capacity to improve the
health of individuals, patients, communities, populations, and systems, by
serving as leaders who generate evidence and apply critical thinking to the
health challenges of today and tomorrow. The Department’s faculty generate
evidence and engage in knowledge exchange that advances effective and
sustainable systems for health services access and delivery.
As nominated principal investigator, Dr. Levy led the development and
implementation of the Maritime Strategy for Patient-Oriented Research SUPPORT
Unit. This initiative, co-funded by the Canadian Institutes of Health
Research, offers research infrastructure designed to promote patient-centered
outcomes and health services research in Canada’s three Maritime provinces.
The Unit’s mission is to lead the development and application of patient-centered
outcomes research, and the vision is to enhance the health and well-being
of individuals and populations in the Maritimes and across Canada. The
central goals include advancing research on health systems, knowledge translation
and implementation of healthcare transformation, and implementing
research at the point of care.
Sarah Goring has an M.Sc. in healthcare and epidemiology from the University
of British Columbia and more than 10 years of experience consulting in the
private sector, where she focuses on pharmacoepidemiology, evidence synthesis
methods, and health services research.
Constantine Gatsonis is Henry Ledyard Goddard University Professor and
founding chair of the Department of Biostatistics and the Center for Statistical
Sciences at the Brown University School of Public Health. Dr. Gatsonis is a
leading authority on the evaluation of diagnostic and screening tests and has
made major contributions to statistical methods for medical technology assessment
and health services and outcomes research. His current research activity
spans the spectrum of evidence-based diagnostic medicine, addresses both
methodology and subject matter, and has a major focus on the comparative
effectiveness of screening and diagnostic modalities. As the founding network
statistician of the American College of Radiology Imaging Network (ACRIN)
and a group statistician for the ECOG-ACRIN collaborative group, he has
decades-long experience in the clinical evaluation of modalities for diagnosis
and prediction in cancer and other chronic diseases. Dr. Gatsonis has served on
numerous review and advisory panels. He chaired the NAS Committee on
Applied and Theoretical Statistics and is a member of the NAS Committee on
National Statistics. He served on the IOM Committee on Comparative Effectiveness
Research Prioritization and the NAS Committee on Reproducibility
and Replicability in Science and was the founding editor-in-chief of Health
Services and Outcomes Research Methodology. Dr. Gatsonis was educated at
Princeton and Cornell, was elected fellow of the American Statistical Association,
and received a Long-Term Excellence Award from the Health Policy
Statistics Section of ASA.
Ewout van Ginneken is coordinator of the Berlin office of the European
Observatory on Health Systems and Policies at the Berlin University of
Technology. He holds a master’s degree in health policy and administration
from Maastricht University in the Netherlands and a Ph.D. in public health
from the Berlin University of Technology. His expertise is in comparative
international health systems research and health policy research. His main
interests include health financing, insurance competition, care purchasing,
integrated care, cross-border care, and migrants’ access to care. He has edited
several Health Systems in Transition (HiT) reviews including on the healthcare
systems of Bulgaria, the Czech Republic, Estonia, Lithuania, the Netherlands,
Slovakia, Slovenia, and the United States. He has published widely on these
topics in international peer-reviewed literature and the wider literature. Before
joining the Observatory, Ewout was a senior researcher at the Berlin University
of Technology and a 2011–2012 Commonwealth Fund Harkness Fellow
in Health Care Policy and Practice at the Harvard School of Public Health.
Reinhard Busse is department head for healthcare management in the Faculty
of Economics and Management at Technische Universität Berlin, Germany.
He is also a faculty member of Charité, Berlin’s medical faculty, co-director
and head of the Berlin hub of the European Observatory on Health Systems
and Policies, member of several scientific advisory boards, as well as regular
consultant for the WHO, the EU Commission, the World Bank, the OECD, and
other international organizations within Europe and beyond as well as national
health and research institutions. From 2006 to 2009, he served as dean of his
faculty.
His research focuses on methods and contents of comparative health system
analysis and assessment as well as health services research (with emphasis on
hospitals, human resources, cross-border care, health reforms in Germany, role
of the EU, financing and payment mechanisms, as well as disease management),
health economics, and health technology assessment (HTA).
His regular master-level teaching courses at TU Berlin include “Managing
and Researching Health Care Systems”; “Health Technology Assessment”
(blended learning, i.e., mainly online); “Health Care Management I, Insurance
Management”; “Health Care Management II, Provider Management”; “Health
Care Management III, Industry Management” (pharmaceuticals and medical
devices); and “Health Care Management IV, Health Economic Evaluation.”
He is the principal editor of the German textbook on healthcare management
(published with Springer, fourth edition 2017), author of a book on the German
health system (fourth edition 2017), as well as co-editor of German textbooks
on public health (third edition 2012) and on HTA (second edition 2014). Since
2015, he has been speaker of the board of the newly founded inter-university Berlin
School of Public Health.
Professor Busse is the director of the annual Observatory Summer School in
Venice, which is directed at policy-makers and has covered a wide range of
topics: human resources for health; hospital reengineering; innovation and
health technology assessment; EU integration and health systems; the aging
crisis; performance assessment for health system improvement; innovative
ways of improving population health; integrated care – moving beyond the
rhetoric; primary care – innovating for integrated, more effective care; and
quality of care – improving effectiveness, safety, and responsiveness.
He was the PI/coordinator of the EU-funded project “EuroDRG: Diagnosis-Related
Groups in Europe: Towards Efficiency and Quality” (Seventh Framework;
2009–2011). He has been and is also involved in several other
EU-funded projects under the Seventh Framework, e.g., on the relationship
between nursing and patient outcome (RN4Cast; 2009–2011), mobility of
health professionals (PROMeTHEUS; 2009–2011), evaluating care across
borders (ECAB; 2009–2013), on healthcare data for cross-country comparisons
of efficiency and quality (EuroREACH; 2010–2013), on the impact of
new roles for health professionals (Munros; 2012–2016), and on advancing
and strengthening HTA (Advance HTA; 2013–2015). Previously, he was
PI/scientific coordinator of the EU-funded project “Health Benefits and Service
Costs in Europe” (HealthBASKET; 2004–2007).
Since 2011, he has been editor-in-chief of the international peer-reviewed journal
Health Policy. Since 2012, he has been the director of the Berlin Health Economics
Research Centre (BerlinHECOR, overarching topic “Towards a Performance
Assessment of the German Health Care System”), one of four centers in
Germany funded by the Federal Ministry of Research. In 2016–2017, he was
president of the German Health Economics Association (DGGÖ).
Professor Busse studied medicine in Marburg, Germany; Boston, USA; and
London, UK, as well as public health in Hannover, Germany. Prior to his
appointment at TU Berlin in 2002, he was head of the Observatory’s hub in
Madrid, Spain (1999–2002); a senior research fellow in the Department of
Epidemiology, Social Medicine and Health Systems Research (1994–1999,
finishing with his “habilitation”/second Ph.D.) and a resident physician in the
Department of Rheumatology (1992–1994), both at the Hannover Medical
School; and a researcher in the Planning Group for a Problem-Based Medical
Curriculum at the Freie Universität Berlin (1991–1992). In 1993, he earned a
“Dr. med.” (Ph.D. in medicine) from Philipps-Universität in Marburg.
Contributors
Ankit Agrawal Department of Electrical Engineering and Computer
Science, Northwestern University, Evanston, IL, USA
Paul W. Armstrong London, UK
Mahmoud Azimaee ICES Central, Toronto, ON, Canada
Michael Baiocchi Department of Statistics, Stanford University, Stanford,
CA, USA
Andrew J. Barnes Department of Health Behavior and Policy, School of
Medicine, Virginia Commonwealth University, Richmond, VA, USA
Karen Berg Brigham University of Washington, Seattle, WA, USA
Kaylee Britain University of Queensland School of Public Health, Brisbane,
Australia
Marni Brownell Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Charles Burchill Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Andrew G. Bushmakin Global Product Development, Pfizer Inc, Groton,
CT, USA
Reinhard Busse Technische Universität, Berlin, Germany
Department Health Care Management, Faculty of Economics and Management,
Technische Universität, Berlin, Germany
Deborah Caldwell School of Social and Community Medicine, University
of Bristol, Bristol, UK
Joseph C. Cappelleri Global Product Development, Pfizer Inc, Groton,
CT, USA
Brendan M. Carr Department of Emergency Medicine, Mayo Clinic,
Rochester, MN, USA
Anna Chaimani Department of Hygiene and Epidemiology, University of
Ioannina School of Medicine, Ioannina, Greece
Shizhe Chen Department of Biostatistics, University of Washington, Seattle,
WA, USA
Jing Cheng Department of Preventive and Restorative Dental Sciences,
University of California, San Francisco School of Dentistry, San Francisco,
CA, USA
Karine Chevreul Health Economics and Health Services Research Unit,
URC ECO Ile de France, Paris, France
Alok Choudhary Department of Electrical Engineering and Computer
Science, Northwestern University, Evanston, IL, USA
Steven B. Cohen Division of Statistical and Data Sciences, RTI International,
Washington, DC, USA
Jon Cylus The London School of Economics and Political Science, London,
UK
Prithwish De Surveillance and Ontario Cancer Registry, Cancer Care
Ontario, Toronto, ON, Canada
Malcolm B. Doupe Manitoba Centre for Health Policy, University of
Manitoba, Winnipeg, MB, Canada
Mahmoud Elmahdawy Ministry of Health, Cairo, Egypt
Gihan Elsisi Ministry of Health, Cairo, Egypt
Faculty of Pharmacy, Heliopolis University, Cairo, Egypt
Josep Figueras European Observatory on Health Systems and Policies,
Brussels, Belgium
Greg Finlayson Finlayson and Associates Consulting, Kingston, ON,
Canada
Fergal J. Fleming University of Rochester Medical Center, Rochester,
NY, USA
Randy Fransoo Manitoba Centre for Health Policy, University of Manitoba,
Winnipeg, MB, Canada
Julio Frenk University of Miami, Coral Gables, FL, USA
Octavio Gómez-Dantés National Institute of Public Health, Cuernavaca,
MOR, Mexico
Christian A. Gericke Anton Breinl Centre for Health Systems Strengthening,
James Cook University, Cairns, Australia
University of Queensland School of Public Health, Brisbane, Australia
Robert D. Gibbons Departments of Medicine and Public Health Sciences,
University of Chicago, Chicago, IL, USA
Julie Gilbert Planning and Regional Programs, Cancer Care Ontario,
Toronto, ON, Canada
Joshua Ginter Montreal, QC, Canada
Scott L. Greer Department of Health Management and Policy, University of
Michigan, Ann Arbor, MI, USA
Frederick L. Grover Department of Surgery, School of Medicine at the
Anschutz Medical Campus, University of Colorado, Aurora, CO, USA
John Halamka Department of Emergency Medicine, Harvard Medical
School and Beth Israel Deaconess Medical Center, Boston, MA, USA
Ofer Harel Department of Statistics, University of Connecticut, Storrs,
CT, USA
William A. Haseltine ACCESS Health International, New York, NY, USA
Maureen Heaman College of Nursing, Rady Faculty of Health Sciences,
University of Manitoba, Winnipeg, MB, Canada
Donald Hedeker Department of Public Health Sciences, University of
Chicago, Chicago, IL, USA
Erik Hellsten Health Quality Ontario, Toronto, ON, Canada
Theresa Henle Department of Mathematics and Statistics, Loyola University,
Chicago, IL, USA
Julian Higgins MRC Biostatistics Unit, Cambridge, UK
Centre for Reviews and Dissemination, University of York, York, UK
David Hipgrave UNICEF, New York, NY, USA
Nossal Institute for Global Health, University of Melbourne, Melbourne, VIC,
Australia
Say P. Hong Manitoba Centre for Health Policy, University of Manitoba,
James C. Iannuzzi University of Rochester Medical Center, Rochester,
NY, USA
Karey Iron College of Physicians and Surgeons of Ontario, Toronto, ON,
Canada
Jeffrey P. Jacobs Division of Cardiac Surgery, Department of Surgery, Johns
Hopkins University School of Medicine, Baltimore, MD, USA
Johns Hopkins All Children’s Heart Institute, Saint Petersburg/Tampa,
FL, USA
Jessica S. Jarmasz Manitoba Centre for Health Policy, University of
Alan Katz Manitoba Centre for Health Policy, University of Manitoba,
Mary Jane King Surveillance and Ontario Cancer Registry, Cancer Care
Martin Knapp The London School of Economics and Political Science,
London, UK
Madelon Kroneman Netherlands Institute of Health Services Research
(NIVEL), Utrecht, The Netherlands
Suszy Lessof European Observatory on Health Systems and Policies,
Brussels, Belgium
Adrian R. Levy Community Health and Epidemiology, Dalhousie University,
Halifax, NS, Canada
Chang Liu ACCESS Health International, New York, NY, USA
Lisa M. Lix Department of Community Health Sciences, University of
Gregory Marchildon Institute of Health Policy, Management and Evaluation,
University of Toronto, Toronto, ON, Canada
Anna Maresso European Observatory on Health Systems and Policies,
London School of Economics and Political Science, London, UK
Patricia J. Martens Winnipeg, MB, Canada
Ryozo Matsuda Ritsumeikan University, Kyoto, Japan
Gregory J. Matthews Department of Mathematics and Statistics, Loyola
University, Chicago, IL, USA
Martin McKee London School of Hygiene and Tropical Medicine, London,
UK
John R. T. Monson Florida Hospital System Center for Colon and Rectal
Surgery, Florida Hospital Medical Group Professor of Surgery, University of
Central Florida, College of Medicine, Florida Hospital, Orlando, FL, USA
Vincent Mor Department of Health Services, Policy and Practice, Brown
University School of Public Health, Providence, RI, USA
Providence Veterans Administration Medical Center, Center on Innovation,
Providence, RI, USA
Elias Mossialos London School of Economics and Political Science,
London, UK
Yan Mu UNICEF China, Beijing, China
Ross M. Mullner Division of Health Policy and Administration, School of
Public Health, University of Illinois, Chicago, IL, USA
Patricia J. Martens: deceased.

Sarah Naeger Behavioral Health Research and Policy, IBM Watson Health,
Bethesda, MD, USA
Brian Neelon Department of Biostatistics and Bioinformatics, Duke University
School of Medicine, Durham, NC, USA
J. Patrick Nicol Manitoba Centre for Health Policy, University of Manitoba,
Katia Noyes Department of Surgery, University of Rochester Medical
Center, Rochester, NY, USA
Alistair James O’Malley The Dartmouth Institute for Health Policy and
Clinical Practice, Department of Biomedical Data Science, Geisel School of
Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston,
MA, USA
Jukka-Pekka Onnela Department of Biostatistics, Harvard School of Public
Health, Boston, MA, USA
Tyler Ostapyk University Advancement, Carleton University, Ottawa, ON,
Canada
Irene Papanicolas The London School of Economics and Political Science,
London, UK
Harvard T.H. Chan School of Public Health, Cambridge, MA, USA
Melania Pintilie University Health Network, Toronto, ON, Canada
Sujohn Prodhan Surveillance and Ontario Cancer Registry, Cancer Care
Bernd Rechel European Observatory on Health Systems and Policies,
London School of Hygiene and Tropical Medicine, London, UK
Thomas Rice Department of Health Policy and Management, Fielding
School of Public Health, University of California, Los Angeles, CA, USA
Cynthia Robins Westat, Rockville, MD, USA
Leslie L. Roos Manitoba Centre for Health Policy, University of Manitoba,
Noralou P. Roos Manitoba Centre for Health Policy, University of Manitoba,
Pauline Rosenau Division of Management, Policy and Community Health,
School of Public Health, University of Texas Health Science Center at
Houston, Houston, TX, USA
Carolyn M. Rutter RAND Corporation, Santa Monica, CA, USA
Marya Saidi The London School of Economics and Political Science,
London, UK
Georgia Salanti Department of Hygiene and Epidemiology, University of

Willemijn Schäfer Netherlands Institute of Health Services Research
David M. Shahian Department of Surgery and Center for Quality and Safety,
Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Katie Jane Sheehan School of Population and Public Health, The University
of British Columbia, Vancouver, BC, Canada
A. Laurie W. Shroyer Department of Surgery, School of Medicine, Stony
Brook University, Stony Brook, NY, USA
Luigi Siciliani Department of Economics and Related Studies, University of
York, York, UK
Juned Siddique Department of Preventive Medicine, Northwestern University
Feinberg School of Medicine, Chicago, IL, USA
Dylan S. Small University of Pennsylvania, Philadelphia, PA, USA
Mark Smith Manitoba Centre for Health Policy, University of Manitoba,
Peter C. Smith Imperial College, London, UK
University of York, York, UK
Boris G. Sobolev School of Population and Public Health, University of
British Columbia, Vancouver, BC, Canada
Ruth-Ann Soodeen Manitoba Centre for Health Policy, University of
Elizabeth A. Stuart Department of Mental Health, Johns Hopkins
Bloomberg School of Public Health, Baltimore, MD, USA
Kathy Sykora Toronto, ON, Canada
Dave G. Towns Manitoba Centre for Health Policy, University of Manitoba,
Ken Turner Manitoba Centre for Health Policy, University of Manitoba,
Lynn Y. Unruh Department of Health Management and Informatics, College
of Health and Public Affairs, University of Central Florida, Orlando, FL, USA
Ewout van Ginneken Berlin University of Technology, Berlin, Germany
European Observatory on Health Systems and Policies, Department of Health
Care Management, Berlin University of Technology, Berlin, Germany
Claus Wendt University of Siegen, Siegen, Germany
Chunliu Zhan Department of Health and Human Services, Agency for
Healthcare Research and Quality, Rockville, MD, USA
XH Andrew Zhou Beijing International Center for Mathematical Research,
Peking University, Beijing, China
VA Puget Sound Healthcare System, University of Washington, Seattle,
WA, USA
Part I
Data and Measures in Health Services Research

1 Health Services Data: Big Data Analytics for Deriving Predictive Healthcare Insights
Ankit Agrawal and Alok Choudhary
Contents
Introduction
Big Data Analytics on SEER Lung Cancer Data
Lung Cancer Survival Prediction System
Conditional Survival Prediction
Association Rule Mining
Illustrative Data Mining Results on SEER Data
Lung Cancer Outcome Calculator
Other Applications of Big Data Analytics in Healthcare
Summary
References
Abstract

This chapter describes the application of big data analytics in healthcare, particularly on electronic healthcare records so as to make predictive models for healthcare outcomes and discover interesting insights. A typical workflow for such predictive analytics involves data collection, data transformation, predictive modeling, evaluation, and deployment, with each step tailored to the end goals of the project. To illustrate each of these steps, we shall take the example of recent advances in such predictive analytics on lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program. This includes the construction of accurate predictive models for lung cancer survival, development of a lung cancer outcome calculator deploying the predictive models, and association rule mining on that data for bottom-up discovery of interesting insights. The lung cancer outcome calculator illustrated here is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator.
A. Agrawal (*) · A. Choudhary
Department of Electrical Engineering and Computer
e-mail: ankita@eecs.northwestern.edu; choudhar@eecs.northwestern.edu
© Springer Science+Business Media, LLC, part of Springer Nature 2019
A. Levy et al. (eds.), Health Services Evaluation, Health Services Research,
https://doi.org/10.1007/978-1-4939-8715-3_2

Introduction

The term “big data” has become a ubiquitous buzzword today in practically all areas of science, technology, and commerce. It primarily denotes datasets that are too large, complex, or both, to be
adequately analyzed by traditional processing techniques. Scientific and technological advances in measurement and sensor devices, databases, and storage systems have made it possible to efficiently collect, store, and retrieve huge amounts and different kinds of data. However, when it comes to the analysis of such data, we have to admit that our ability to generate big data has far outstripped our analytical ability to make sense of it. This is true in practically all fields, and the field of medicine and healthcare is no exception, where the fourth paradigm of science (data-driven analytics) is becoming increasingly popular and has led to the emergence of the new field of healthcare informatics. The fourth paradigm of science (Hey et al. 2009) unifies the first three paradigms of science – namely, theory, experiment, and simulation/computation. The need for such data-driven analytics in healthcare has also been emphasized by large-scale initiatives all around the world, such as Big Data to Knowledge (BD2K) and the Precision Medicine Initiative of the National Institutes of Health in the USA, the Big Data for Better Outcomes Initiative in Europe, and so on.

Fig. 1 The various Vs associated with big data. Volume, velocity, and variety are unique features of big data that represent its bigness. Variability and veracity are characteristics of any type of data, including big data. The goal of big data analytics is to unearth the value hidden in the data and appropriately visualize it to make informed decisions

The bigness (amount) of data is certainly the central feature and challenge of dealing with the so-called big data, but it is often accompanied by one or more other features that can make the collection and analysis of such data even more challenging. For example, the data could be from several heterogeneous sources, may be of different types, may have unknown dependencies and inconsistencies within it, parts of it could be missing or not reliable, the rate of data generation could be much more than what traditional systems could handle, and so on. All this can be summarized by the famous Vs associated with big data, as presented in Fig. 1 and briefly described below:

• Volume: This refers to the amount of data. Datasets of sizes exceeding terabytes and even petabytes are not uncommon today in many domains. This presents one of the biggest challenges in big data analytics.
• Velocity: The speed with which new data is generated. The challenge here is to be able to effectively process the data in real time. A good example of a high-velocity data source is Twitter, where more than 5,000 tweets are posted every second.
• Variety: This refers to the heterogeneity in the data. For instance, many different types of healthcare data are generated and collected by different healthcare providers, such as electronic health records, X-rays, cardiograms, genomic sequence, etc. It is important to be able to derive insights by looking at all available heterogeneous data in a holistic manner.
• Variability: The inconsistency in the data. This is especially important since the correct interpretation of the data can vary significantly depending on its context.
• Veracity: This refers to how trustworthy the data is. The quality of the insights resulting from analysis of any data is critically dependent on the quality of the data itself. Noisy data with erroneous values or a lot of missing values can greatly hamper accurate analysis.
• Visualization: This means the ability to interpret the data and resulting insights. Visualization can be especially challenging for big data due to its other features as described above.
• Value: The goal of big data analytics is to discover the hidden knowledge from huge amounts of data, which is akin to finding a needle in a haystack, and can be extremely valuable. For example, big data analytics in healthcare can help enable personalized medicine by identifying optimal patient-specific treatments, which can potentially improve millions of lives, reduce waste of healthcare resources, and save billions of dollars in healthcare expenditure.

The first three Vs above distinguish big data from small data, and the other Vs are characteristics of any type of data, including big data. Further, each application domain can also introduce its own nuances to the process of big data management and analytics. For example, in healthcare, the privacy and security of patients’ data are of paramount importance, and compliance with the Health Insurance Portability and Accountability Act (HIPAA) and institutional review board (IRB) protocols is necessary to work with many types of healthcare data. It is also worth noting here that although the size and scale of healthcare data are not as large as in some other domains of science like high energy physics or in business and marketing, the sheer complexity and variety in healthcare data becoming available nowadays require the development of new big data approaches in healthcare. For example, there are electronic healthcare records (EHRs), medical images (e.g., mammograms), time-series data (e.g., ECG signals), textual data (doctor’s notes, research papers), genome sequence, and related data (e.g., SNPs).

So what can big data analytics do for a real-world healthcare application? A variety of personalized information such as patients’ electronic health records is increasingly becoming available. What if we could intelligently integrate the hidden knowledge from such healthcare data during a real-time patient encounter to complement the physician’s expertise and potentially address the challenges of personalized, safe, and cost-effective healthcare? Note that the challenge here is to make the insights patient specific instead of giving generic population-wide statistics. Why is this important? Let us try to understand with the help of an example. The benefits of medical treatments can vary depending on one’s expected survival, and thus not considering an individual patient’s prognosis can result in poor quality of care as well as nonoptimal use of healthcare resources. Developing accurate prognostic models using all available information and incorporating them into clinical decision support could thus significantly improve quality of healthcare (Collins et al. 2015), both in terms of improving clinical decision support and enhancing informed patient consent. Development of accurate data-driven models can also have a tremendous economic impact. The Centers for Disease Control and Prevention estimates that there are more than 150,000 surgical site infections annually (Magill et al. 2014), and each can cost $11,000–$35,000 per patient, i.e., up to about $5 billion every year. Accurate predictions and risk estimation for healthcare outcomes can potentially avoid thousands of complications, resulting in improved resource management and significantly reduced costs. This requires development of advanced data-driven technologies that could effectively mine all available historical data, extract and suitably store the resulting insights and models, and make them available at the point of care in a patient-specific way.

In the rest of this chapter, we will see one such application of big data analytics on electronic healthcare records so as to make predictive models on it and discover interesting insights. In particular, we will take the example of lung cancer data from the Surveillance, Epidemiology, and End Results (SEER) program to build models of patient survival after 6 months, 9 months, 1 year, 2 years, and 5 years (Agrawal et al. 2011a) and for conditional survival as well (Agrawal et al. 2012). We will also see the application of association rule mining on this dataset for 5-year survival (Agrawal et al. 2011b) and 5-year conditional survival (Agrawal and Choudhary 2011). Finally, we will discuss the online lung cancer outcome calculator that resulted from the described predictive analytics on SEER data and conclude with some examples of big data analytics in other healthcare-related applications.
Big Data Analytics on SEER Lung Cancer Data

Lung (respiratory) cancer is the second most common cancer and the leading cause of cancer-related deaths in the USA. In 2012 alone, over 157,000 people in the USA died from lung cancer. The 5-year survival rate for lung cancer is estimated to be just 15% (Ries et al. 2007). The Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute (NCI) is an authoritative repository of cancer statistics in the USA (SEER 2008). It is a population-based cancer registry covering about 26% of the US population and is the largest publicly available cancer dataset in the USA. It collects cancer data for all invasive and in situ cancers, except basal and squamous cell carcinomas of the skin and in situ carcinomas of the uterine cervix (Ries et al. 2007). The SEER data attributes can be broadly categorized into demographic attributes, diagnosis attributes, treatment attributes, and outcome attributes (see Table 1). The presence of outcome attributes makes the SEER data very useful for doing predictive analytics and making models for cancer survival.

Table 1 SEER data attributes

| Type | Examples |
|---|---|
| Demographic | Age, gender, location, race/ethnicity, date of diagnosis |
| Diagnosis | Tumor primary site, size, extension, lymph node involvement |
| Treatment | Primary treatment, surgical procedure, radiation therapy |
| Outcome | Survival time, cause of death |

Lung Cancer Survival Prediction System

So far we have seen what big data is and what big data analytics can do for healthcare applications. We have also had a brief introduction to SEER and what kind of data is present in the SEER database. So now let us dive deeper into what a typical workflow for predictive analytics looks like, with the specific example of lung cancer survival prediction on SEER data. Figure 2 depicts the overall end-to-end workflow. It is worth mentioning here that this workflow for predictive lung cancer outcome analytics is essentially a healthcare adaptation of existing similar data science workflows in other domains, since most of the advanced techniques for big data management and analytics were developed in the field of computer science and more specifically high-performance data mining (Agrawal et al. 2013a; Xie et al. 1072), via applications in many different domains like business and marketing (Xie et al. 2012), climate science (Ganguly et al. 2014), materials informatics (Agrawal and Choudhary 2016), and social media analytics (Xie et al. 2013), among many others. Here we will only focus on the healthcare application of developing a lung cancer survival prediction system. As shown in Fig. 2, it has five stages described below.

Data Collection

This is the obvious first step. Depending on the project, the kind of data required for it, and the license agreements associated with that data, this can be the easiest or the toughest step in the workflow. SEER has made it easy to get the “SEER limited-use data” from their website on submitting a SEER limited-use data agreement form. It creates a personalized SEER research data agreement for every user that allows the use of the data for research purposes only. In particular, there must be no attempt to identify the individual patients in the database. Of course, obvious identifying information such as patient name, SSN, etc., is excluded from the data released by SEER, but it still has demographic information like age, sex, and race, which is very useful for research purposes but should not be misused to try to identify patients in any way. Such compliance with HIPAA regulations is important to preserve patient privacy.

Data Transformation

Once the data is available, the first step is to understand the data format and representation and do any necessary transformations to make it suitable for modeling. Let us assume the data is in a row-column (spreadsheet) format, such as in the case of SEER data. Each row corresponds to a
patient’s medical record and can also be referred to as an instance, data point, or observation. The columns are the attributes, such as age, race, tumor size, surgery, outcome, etc. Data attributes can be of different types – numeric, nominal, ordinal, and interval – and it is important to have the correct representation of each attribute for analysis, for which some data transformation might be necessary. More broadly, data transformation is needed to ensure the quality of the data ahead of modeling and remove or appropriately deal with noise, outliers, missing values, duplicate data instances, etc.

Fig. 2 A typical workflow for predictive analytics, illustrated with the example of outcome prediction models for lung cancer using SEER data

Data transformation is usually unsupervised, which means that it does not depend on the outcome or target attributes. For example, SEER encodes all attributes as numbers, and many of them are actually nominal, like marital status, where “1” represents “Single,” “2” represents “Married,” “3” represents “Separated,” “4” represents “Divorced,” “5” represents “Widowed,” and “9” represents “Unknown.” Numbers have a natural order, and the operations of addition, subtraction, and division are defined, which may be fine for numeric attributes like “tumor size” but not for nominal attributes like marital status, sex, race, etc. Such attributes need to be explicitly converted to nominal for correct predictive modeling. Even numeric attributes need to be examined carefully. For example, the tumor size attribute in SEER data gives the exact size of the tumor in mm, if it is known. But in some cases, the doctor’s notes may say “less than 2 cm,” in which case it is encoded as “992,” which could easily be misinterpreted as 992 mm if not transformed appropriately. Another example of an unsupervised data transformation required in SEER data is to construct numeric survival time in months from the SEER format of YYMM, so that it can be modeled correctly.

The above data transformations are required due to the way SEER data is represented and may be necessary for almost any project dealing with this data. But there are also problem-specific data transformations that may be necessary for building a model as originally intended. For example, if we are interested in building a predictive model for lung cancer survival, then we should only include those patient records where the cause of the patient’s death was lung cancer, which is given by the “cause of death” attribute. We also need to remove certain attributes from the modeling that directly or indirectly specify the outcome, e.g., cause of death, whether the patient is still alive. Further, for binary class prediction, we also need to derive appropriate binary attributes for survival time, e.g., 5-year survival.

There are also certain data transformation steps that could be supervised in some cases, meaning that they depend on the outcome attribute(s). Examples include feature selection/extraction, discretization, and sampling, and all of these can be supervised or unsupervised. If they are supervised, they should in general be considered together with other supervised analytics so as to avoid over-fitting (more about this later).
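The SEER-specific transformations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names (`marital_status`, `tumor_size`, `survival_yymm`, `cause_of_death`, `vital_status`) are hypothetical and are not the actual SEER column names. Only the marital status codes, the tumor size code 992, and the YYMM survival format are taken from the text. Treating 992 as a missing numeric size plus a note, mapping unlisted marital status codes to “Unknown,” and defining 5-year survival as at least 60 months are choices made for this sketch.

```python
# Sketch of SEER-style data transformations (field names are hypothetical).

# Nominal labels for the marital status codes listed in the text.
MARITAL_STATUS = {
    1: "Single", 2: "Married", 3: "Separated",
    4: "Divorced", 5: "Widowed", 9: "Unknown",
}

# Attributes that directly or indirectly reveal the outcome (assumed names).
OUTCOME_LEAKING_FIELDS = ("cause_of_death", "vital_status")


def decode_marital_status(code):
    """Convert a numeric code to a nominal label.

    Assumption: codes not listed in the text are mapped to 'Unknown'.
    """
    return MARITAL_STATUS.get(int(code), "Unknown")


def decode_tumor_size(code):
    """Return (size_in_mm, note); code 992 means 'less than 2 cm', not 992 mm.

    Any other special codes would need similar handling; only 992 is
    mentioned in the text, so only 992 is handled here.
    """
    code = int(code)
    if code == 992:
        return None, "less than 2 cm"
    return code, "exact"


def survival_months(yymm):
    """Convert survival time in YYMM format to a number of months."""
    text = str(yymm).zfill(4)
    return int(text[:2]) * 12 + int(text[2:])


def transform_record(record, horizon_months=60):
    """Build a modeling-ready copy of one patient record (a dict)."""
    replaced = ("marital_status", "tumor_size", "survival_yymm")
    clean = {
        key: value
        for key, value in record.items()
        if key not in OUTCOME_LEAKING_FIELDS and key not in replaced
    }
    size_mm, size_note = decode_tumor_size(record["tumor_size"])
    months = survival_months(record["survival_yymm"])
    clean["marital_status"] = decode_marital_status(record["marital_status"])
    clean["tumor_size_mm"] = size_mm
    clean["tumor_size_note"] = size_note
    clean["target_survival_months"] = months           # numeric target
    clean["target_survived"] = months >= horizon_months  # binary target
    return clean


# Made-up example record, for illustration only.
record = {
    "age": 67,
    "marital_status": 2,
    "tumor_size": 992,
    "survival_yymm": "0507",
    "cause_of_death": "lung cancer",
    "vital_status": "dead",
}
clean = transform_record(record)
print(clean)
```

Keeping the numeric and binary survival targets in clearly named `target_` fields makes it harder to feed them to a model as input features by accident, which is the kind of outcome leakage the text warns about.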
Predictive Modeling

Once appropriate data transformation has been performed and the data is ready for modeling, we can employ supervised data mining techniques for feature selection and predictive modeling. Caution needs to be exercised here to appropriately split the data into training and testing sets (or use cross validation), or else the model may be subject to over-fitting and give overoptimistic accuracy. If the target attribute is numeric (e.g., survival time), regression techniques can be used for predictive modeling, and if it is categorical (e.g., whether a patient survived at least 5 years), classification techniques can be used. Some techniques are capable of doing both regression and classification. Further, there also exist several ensemble learning techniques that can combine the results from base learners in different ways and in some cases have been shown to improve accuracy and robustness of the final model. Table 2 lists some of the popular predictive modeling techniques.

Table 2 Popular predictive modeling algorithms

| Modeling technique | Brief description |
|---|---|
| Naive Bayes | A probabilistic classifier based on Bayes theorem |
| Bayesian network | A graphical model that encodes probabilistic conditional relationships among variables |
| Logistic regression | Fits data to a sigmoidal S-shaped logistic curve |
| Linear regression | A linear least-squares fit of the data w.r.t. input features |
| Nearest neighbor | Uses the most similar instance in the training data for making predictions |
| Artificial neural networks | Uses hidden layer(s) of neurons to connect inputs and outputs, edge weights learnt using back propagation (called deep learning if more than two layers) |
| Support vector machines | Based on structural risk minimization, constructs hyperplanes in a multidimensional feature space |
| Decision table | Constructs rules involving different combinations of attributes |
| Decision stump | A weak tree-based machine learning model consisting of a single-level decision tree |
| J48 (C4.5) decision tree | A decision tree model that identifies the splitting attribute based on information gain/gini impurity |
| Alternating decision tree | Tree consists of alternating prediction nodes and decision nodes, an instance traverses all applicable paths |
| Random tree | Considers a randomly chosen subset of attributes |
| Reduced-error pruning tree | Builds a tree using information gain/variance and prunes it using reduced-error pruning to avoid over-fitting |
| AdaBoost | Boosting can significantly reduce error rate of a weak learning algorithm |
| Bagging | Builds multiple models on bootstrapped training data subsets to improve model stability by reducing variance |
| Random subspace | Constructs multiple trees systematically by pseudo-randomly selecting subsets of features |
| Random forest | An ensemble of multiple random trees |
| Rotation forest | Generates model ensembles based on feature extraction followed by axis rotations |

Evaluation

Traditional statistical methods such as logistic regression are typically evaluated by building the model on the entire available data and computing prediction errors on the same data, and this has been a common practice in statistical analysis of medical data as well for many years. Although this approach may work well in some cases, it is nonetheless prone to over-fitting and thus can give overoptimistic accuracy. It is easy to see that a data-driven model can, in principle, “memorize” every single instance of the dataset and thus result in 100% accuracy on the same data but will most likely not be able to work well on unseen data. For this reason, advanced data-driven techniques that usually result in black box models need to be evaluated on data that the model has not seen while training. A simple way to do this is to build the model only on a random half of the data and use the remaining half for evaluation. This is called the train-test split setting for model evaluation. Further, the training and testing halves can then also be swapped for another round of
evaluation and the results combined to get predictions for all the instances in the dataset. This setting is called twofold cross validation, as the dataset is split into two parts. It can further be generalized to k-fold cross validation, where the dataset is randomly split into k parts. k − 1 parts are used to build the model, and the remaining one part is used for testing. This process is repeated k times with different test splits, and the results are combined to get predictions for all the instances in the dataset using a model that did not see them while training. Leave-one-out cross validation (LOOCV) is a special case of the more generic k-fold cross validation, with k = N, the number of instances in the dataset. LOOCV is commonly used when the dataset is not very large. To predict the target attribute for each data instance, a separate predictive model is built using the remaining N − 1 data instances, and the whole process is repeated for each data instance. The resulting N predictions can then be compared with the N actual values to calculate various quantitative metrics for accuracy. In this way, each of the N instances is tested using a model that did not see it while training, thereby maximally utilizing the available data for model building. Cross validation is a standard evaluation setting to guard against overoptimistic accuracy estimates caused by over-fitting. Of course, k-fold cross validation necessitates building k models, which may take a long time on large datasets.

Comparative assessments of how closely the models can predict the actual outcome are used to provide an evaluation of the models’ predictive performance. Many binary classification performance metrics are usually used for this purpose such as accuracy, precision, recall/sensitivity, specificity, area under the ROC curve, etc.

1. c-statistic (AUC): The receiver operating characteristic (ROC) curve is a graphical plot of true-positive rate against false-positive rate. The area under the ROC curve (AUC or c-statistic) is one of the most effective metrics for evaluating binary classification performance, as it is independent of the probability cutoff and measures the discrimination power of the model.
2. Overall accuracy: It is the percentage of predictions that are correct. For highly unbalanced classes where the minority class is the class of interest, overall accuracy by itself may not be a very useful indicator of classification performance, since even a trivial classifier that simply predicts the majority class would give high values of overall accuracy:

$$\text{Overall accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP is the number of true positives (hits), TN is the number of true negatives (correct rejections), FP is the number of false positives (false alarms), and FN is the number of false negatives (misses).
3. Sensitivity (recall): It is the percentage of positive labeled records that were predicted positive. Recall measures the completeness of the positive predictions:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

4. Specificity: It is the percentage of negative labeled records that were predicted negative, thus measuring the completeness of the negative predictions:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

5. Positive predictive value (precision): It is the percentage of positive predictions that are correct. Precision measures the correctness of positive predictions:

$$\text{Positive predictive value} = \frac{TP}{TP + FP}$$

6. Negative predictive value: It is the percentage of negative predictions that are correct, thereby measuring the correctness of negative predictions:

$$\text{Negative predictive value} = \frac{TN}{TN + FN}$$

7. F-measure: It is not too difficult to have a model with either good precision or good recall, at the cost of each other. F-measure
combines the two measures in a single metric 12 months, 18 months, and 24 months), and the
such that it is high only if both precision and same binary classification techniques were used to
recall are high: build five new models.
2:precision:recall
F measure ¼
ðprecision þ recallÞ
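The evaluation procedure and the confusion-matrix metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the chapter's implementation: the one-feature threshold model, the function names, and the toy data are all assumptions made for the example, and AUC is left out because it needs predicted probabilities rather than hard labels.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_threshold(xs, ys):
    """Toy stand-in model (an assumption): cutoff halfway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def cross_val_predict(xs, ys, k):
    """Return one prediction per instance, each made by a model trained without it."""
    preds = [None] * len(xs)
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        cutoff = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = 1 if xs[i] > cutoff else 0
    return preds

def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics, following the formulas in the list above."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (zero denominator) are reported as NaN.
        return num / den if den else float("nan")

    precision, recall = ratio(tp, tp + fp), ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "ppv": precision,
        "npv": ratio(tn, tn + fn),
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }

# Toy, well-separated data (made up): 10 negatives and 10 positives.
xs = list(range(10)) + list(range(20, 30))
ys = [0] * 10 + [1] * 10
tenfold = cross_val_predict(xs, ys, k=10)
loocv = cross_val_predict(xs, ys, k=len(xs))  # LOOCV is k-fold with k = N
print(binary_metrics(ys, tenfold))

# Unbalanced labels: always predicting the majority class still scores 70% accuracy,
# while sensitivity for the minority class is zero.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, [0] * 10))
```

The last two lines illustrate the caveat in item 2: on unbalanced data a trivial majority-class predictor looks good on overall accuracy but fails on sensitivity, which is why several metrics are reported together.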
Deployment

After the predictive models have been constructed and properly evaluated, they need to be deployed appropriately to make the resulting healthcare insights available to various stakeholders at the point of care. For the lung cancer survival prediction project, the predictive models were incorporated in a web tool that allows users to enter patient attributes and get patient-specific risk values. More details about the lung cancer outcome calculator are described later in this chapter.

Conditional Survival Prediction

Survival prediction from time of diagnosis can be very useful, as we have seen so far, but for patients who have already survived a period of time since diagnosis, conditional survival is a much more clinically relevant and useful measure, as it tries to incorporate the changes in risk over time. Therefore, the above-described lung cancer survival prediction system was adapted to create additional conditional survival prediction models. Since 5-year survival rate is the most commonly used measure to estimate the prognosis of cancer, the conditional survival models were designed to estimate patient-specific risk of mortality at 5 years after diagnosis of lung cancer, given that the patient has already survived for 3 months, 6 months, 12 months, 18 months, and 24 months.

In order to construct a model for estimating mortality risk at 5 years after diagnosis for patients who have already survived for time T, only those patients who survived at least time T were included in the modeling data. Note that this is equivalent to taking the data used in the calculator to build the 5-year survival prediction model and removing the instances where the survival time was less than T. Thus, five new datasets were created for five different values of T (3 months, 6 months, 12 months, 18 months, and 24 months), and the same binary classification techniques were used to build five new models.

Association Rule Mining

Association rule mining is useful to discover patterns in the data. In contrast with predictive modeling, where one is interested in predicting the outcome for a given patient, here one is interested in bottom-up discovery of associations among the attributes. If a target attribute is specified, such association rule mining can help identify segments (subsets of data instances) in the data defined by specific attributes' values such that those segments have extreme average values of the target attribute. Note that this is tantamount to the inverse question of retrieval in databases, where one gives the segment definition in terms of attribute values, and the database system returns the segment, possibly along with the average value of the target attribute in that segment. However, such database retrieval cannot automatically discover segments with extreme average values of the target attribute, which is exactly what association rule mining can do. Let us take the example of the SEER dataset to make it clear. In this case, we have patient attributes including an outcome/target attribute (survival time). Let us say the average survival time in the data is $t_{avg}$. It would then be of interest to automatically discover from the data under what conditions (as defined by the combination of patient attribute/values) the survival time $t'_{avg}$ is significantly greater or significantly lower than $t_{avg}$. Similarly, if the target attribute is nominal like 5-year survival (whether or not a patient survived for at least 5 years), and the fraction of survived patients in the entire dataset is $f$, then it would be interesting to find segments where this fraction $f'$ is significantly higher or lower than $f$.

Illustrative Data Mining Results on SEER Data

We now present some examples of the results of the above-described big data analytics on lung cancer EHR data from SEER. In Agrawal et al. (2012),
the SEER November 2008 Limited-Use Data files (SEER 2008) were used, which were released in April 2009. They had a follow-up cutoff date of December 31, 2006, i.e., the patients were diagnosed and followed up until this date. Data was selected for the patients diagnosed between 1998 and 2001. Since the follow-up cutoff date for the SEER data in the study was December 31, 2006, and the goal of the project was to predict survival up to 5 years, data of 2001 and before was used. Also, since several important attributes were introduced to the SEER data in 1998 (like RX Summ-Surg Site 98-02, RX Summ-Scope Reg 98-02, RX Summ-Surg Oth 98-02, Summary stage 2000 (1998+)), data of 1998 and after was used. There were a total of 70,132 instances of patients with cancer of the respiratory system between 1998 and 2001, and there were 118 attributes in the raw data from SEER.

The SEER-related preprocessing resulted in modification and splitting of several attributes, many of which were found to have significant predictive power. In particular, 2 out of 11 newly created (derived) attributes were within the top 13 attributes that were eventually selected to be used in the lung cancer outcome calculator. These were (a) the count of regional lymph nodes that were removed and examined by the pathologist and (b) the count of malignant/in situ tumors. These attributes were derived from "Regional Nodes Examined" and "Sequence Number-Central," respectively, from raw SEER data, both of which had nominal values encoded within the same attribute, with the latter also encoding nonmalignant tumors. After performing various steps of data transformation and feature selection, the data was reduced to 46,389 instances of lung cancer patients and 13 attributes (excluding the outcome attribute).

Predictive Analytics

For predictive analytics, binary outcome attributes for 6-month, 9-month, 1-year, 2-year, and 5-year survival were derived from survival time. The dataset of 5-year survival was subsequently filtered to generate five new datasets for modeling conditional survival at 5 years after diagnosis, given that the patient has already survived 3 months, 6 months, 12 months, 18 months, and 24 months.

Many predictive modeling techniques were found to give good accuracy measures that were statistically indistinguishable from the best accuracy. From among those, we chose the model based on alternating decision trees with additional logistic modeling on top for better calibration. Tenfold cross validation was used to estimate the accuracy of all the ten models. Table 3 presents the results for all the models (only accuracy and AUC included here for simplicity), along with the distribution of survived and not-survived patients in the data used to build the corresponding model.

Table 3 Model classification performance (tenfold cross validation)
Model                % Survived   % Not survived   % Model accuracy   AUC
5 year               12.8         87.2             91.8               0.924
2 year               23.4         76.6             85.6               0.859
1 year               40.2         59.8             74.5               0.796
9 month              48.8         51.2             71.0               0.779
6 month              60.1         39.9             69.8               0.765
5 year | 3 month     16.9         83.1             89.8               0.912
5 year | 6 month     21.4         78.6             87.3               0.900
5 year | 12 month    31.9         68.1             82.1               0.875
5 year | 18 month    43.9         56.1             78.1               0.850
5 year | 24 month    54.9         45.1             76.1               0.830

Association Rule Mining

For association rule mining analysis, all records with missing/unknown values were removed, since we are interested in finding segments with precise definitions in terms of patient attributes. The survival time (in months) was chosen as the target attribute for the Hotspot algorithm. The dataset had 13,033 instances, 13 input patient attributes, and 1 target attribute. The average survival time in the entire dataset ($t_{avg}$) was 24.45 months. So it would be interesting to find segments of patients where the
average survival time is significantly higher than or significantly lower than 24.45 months. Two independent analyses were performed to find segments in which average survival time was higher and lower than overall average survival, represented in the form of association rules. Lift of a rule/segment is a multiplicative metric that measures the relative improvement in the target (here survival time) as compared to the average value of the target across the entire dataset.

For association rule mining analysis on conditional survival data, a new dataset was constructed using only the cases in which the patient survived at least 12 months from the time of diagnosis. The conditional survival dataset had 6,788 instances, the same 13 input patient attributes, and 1 target attribute. The average survival time in the conditional survival dataset was 42.54 months. So, the above analysis was repeated on the conditional survival dataset with $t_{avg}$ = 42.54.

Tables 4 and 5 present the nonredundant association rules obtained with "higher" and "lower" mode, respectively. Tables 6 and 7 present the same for the conditional survival dataset.

Lung Cancer Outcome Calculator

The web tool is available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator, and uses the following 13 attributes:
Table 4 Nonredundant association rules denoting segments where average survival time is significantly higher than 24.45 months
Segment description | Avg. survival time (months) | Segment size | Lift
The tumor is well differentiated and localized, regional lymph nodes examined are between 4 and 17, age of the patient at time of diagnosis is less than 79, current tumor is patient’s first or second tumor, and resection of lobe/bilobectomy is performed by the surgeon | 68.18 | 100 | 2.79
The tumor is localized, age of patient is between 39 and 52, number of regional lymph nodes examined is between 1 and 14, and resection of lobe/bilobectomy is performed by the surgeon | 68.11 | 100 | 2.79
Tumor is well differentiated, number of regional lymph nodes examined is less than 15, resection of lobe/bilobectomy is performed, and regional lymph nodes are removed | 66.83 | 101 | 2.73
Tumor is localized, age of patient is between 41 and 52, tumor is confined to one lung, and resection of lobe/bilobectomy is performed | 66.26 | 111 | 2.71
Patient is born in Hawaii, patient’s age is less than 76, there is no lymph node involvement, and resection of lobe/bilobectomy is performed | 64.98 | 106 | 2.66
Tumor is localized, patient is born in Hawaii, patient’s age is less than 83, and surgery is performed | 63.96 | 101 | 2.62
Tumor is well differentiated, number of lymph nodes examined is between 7 and 18, there is no lymph node involvement, and patient’s age is less than 81 | 63.86 | 101 | 2.61
Tumor is localized, patient is born in Connecticut, tumor is confined to one lung, number of lymph nodes examined is greater than two, and resection of lobe/bilobectomy is performed | 63.10 | 103 | 2.58
Tumor is well differentiated, there is no lymph node involvement, patient’s age is less than 76, and intrapulmonary/ipsilateral hilar/ipsilateral peribronchial nodes are removed | 62.16 | 100 | 2.54
Tumor is localized (confined to one lung), patient is born in Hawaii and is less than 82 years old | 60.38 | 101 | 2.47
Tumor is localized (confined to one lung), patient is born in Hawaii, and cancer is confirmed by positive histology | 60.18 | 103 | 2.46
Tumor is localized, patient is born in California, and resection of lobe/bilobectomy is performed by the surgeon | 58.71 | 100 | 2.40
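To make the idea of discovering such segments concrete, the following is a brute-force sketch in Python. It is not the Hotspot algorithm used for these results: it simply enumerates conjunctions of attribute = value tests, keeps segments above a minimum size, and ranks them by lift in "higher" mode. It does not prune redundant rules. The attribute names, thresholds, and the eight toy records are made up for illustration and are not SEER data.

```python
from itertools import combinations

def find_segments(records, target, max_conditions=2, min_size=2, min_lift=1.5):
    """Exhaustively search attribute=value conjunctions whose average target
    value is at least min_lift times the overall average ("higher" mode)."""
    overall = sum(r[target] for r in records) / len(records)
    attrs = sorted(k for k in records[0] if k != target)
    found = []
    for n in range(1, max_conditions + 1):
        for combo in combinations(attrs, n):
            # Every distinct value combination observed for these attributes.
            for values in sorted({tuple(r[a] for a in combo) for r in records}):
                seg = [r for r in records if tuple(r[a] for a in combo) == values]
                if min_size > len(seg):
                    continue  # segment too small to be of interest
                avg = sum(r[target] for r in seg) / len(seg)
                lift = avg / overall
                if lift >= min_lift:
                    found.append((dict(zip(combo, values)), round(avg, 2),
                                  len(seg), round(lift, 2)))
    # Highest lift first.
    return sorted(found, key=lambda s: -s[3])

# Made-up toy records; "months" plays the role of the survival-time target.
toy = [
    {"stage": "localized", "surgery": "yes", "months": 70},
    {"stage": "localized", "surgery": "yes", "months": 62},
    {"stage": "localized", "surgery": "no", "months": 30},
    {"stage": "localized", "surgery": "no", "months": 26},
    {"stage": "distant", "surgery": "yes", "months": 14},
    {"stage": "distant", "surgery": "yes", "months": 10},
    {"stage": "distant", "surgery": "no", "months": 5},
    {"stage": "distant", "surgery": "no", "months": 3},
]
segments = find_segments(toy, target="months")
for description, avg, size, lift in segments:
    print(description, avg, size, lift)
```

Exhaustive enumeration grows combinatorially with the number of attributes and values, which is one reason a dedicated algorithm is used on real data; the sketch is only meant to show what a row of the table represents.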
Table 5 Nonredundant association rules denoting segments where average survival time is significantly lower than 24.45 months
Segment description | Avg. survival time (months) | Segment size | Lift
Tumor has metastasized and is poorly differentiated, lymph nodes are involved in metastasis, and no lymph nodes are removed | 5.21 | 100 | 4.69
Tumor has metastasized and is poorly differentiated, no surgery is performed, and the patient is born in Hawaii | 5.67 | 110 | 4.31
Tumor has metastasized, no surgery is performed, cancer is confirmed by positive histology, and patient is born in Hawaii | 5.73 | 128 | 4.26
Tumor has metastasized, surgery is contraindicated and not performed, and cancer is confirmed by positive histology | 5.78 | 132 | 4.23
Pleural effusion has taken place, tumor is poorly differentiated, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, and no surgery is performed | 7.53 | 205 | 3.25
Pleural effusion has taken place, cancer is confirmed by positive cytology, surgery is not recommended and hence not performed | 8.60 | 112 | 2.84
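The lift values in Tables 4 and 5 can be checked with simple arithmetic. The sketch below assumes that lift in "higher" mode is the segment average divided by the overall average, and that in "lower" mode it is the reciprocal ratio. The second definition is inferred from the tabulated numbers rather than stated in the text, and the function name is hypothetical.

```python
OVERALL_AVG_MONTHS = 24.45  # average survival time in the entire dataset

def lift(segment_avg, overall_avg, mode):
    """Multiplicative improvement of a segment relative to the whole dataset.

    "higher" mode: segment average / overall average.
    "lower" mode: overall average / segment average (inferred from the tables).
    """
    if mode == "higher":
        return segment_avg / overall_avg
    if mode == "lower":
        return overall_avg / segment_avg
    raise ValueError("mode must be 'higher' or 'lower'")

# (segment average in months, mode, published lift) copied from Tables 4 and 5.
rows = [
    (68.18, "higher", 2.79),
    (66.83, "higher", 2.73),
    (58.71, "higher", 2.40),
    (5.21, "lower", 4.69),
    (7.53, "lower", 3.25),
    (8.60, "lower", 2.84),
]
for segment_avg, mode, published in rows:
    computed = lift(segment_avg, OVERALL_AVG_MONTHS, mode)
    print(f"{mode:6s} avg={segment_avg:5.2f} computed={computed:.2f} published={published:.2f}")
```

Because the published averages are themselves rounded, a recomputed lift can differ from the table in the last digit: the Table 5 row with an average of 5.73 months recomputes to 4.27 against a published 4.26.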
Table 6 Nonredundant association rules denoting segments in the conditional survival dataset where average survival time is significantly higher than 42.54 months
Segment description | Avg. survival time (months) | Segment size | Lift
Tumor is well differentiated and localized, patient’s age is less than 71, less than 13 regional lymph nodes are examined, and resection of lobe/bilobectomy is performed | 72.92 | 104 | 1.71
Tumor is well differentiated and localized (confined to one lung), patient’s age is less than 71, surgery is performed, less than eight regional lymph nodes are examined | 72.50 | 103 | 1.70
Tumor is well differentiated, patient’s age is less than 84, regional lymph nodes are removed, no lymph node involvement, no radiation therapy, and resection of lobe/bilobectomy is performed | 71.95 | 100 | 1.69
Tumor is localized (confined to one lung), patient’s age is between 41 and 52, surgery is performed, and resection of lobe/bilobectomy is performed | 69.66 | 105 | 1.64
Tumor is well differentiated, patient’s age is less than 79, no lymph node involvement, between 5 and 9 regional lymph nodes are examined | 68.44 | 100 | 1.61
Tumor is localized (confined to one lung), patient’s age is less than 77, patient is born in Connecticut, and resection of lobe/bilobectomy is performed | 67.99 | 119 | 1.60
Patient’s age is less than 76, patient is born in Hawaii, no lymph node involvement, and resection of lobe/bilobectomy is performed | 67.81 | 101 | 1.59
Patient’s age is less than 75, patient is born in California, no lymph node involvement, and resection of lobe/bilobectomy is performed | 65.37 | 102 | 1.54
Tumor is localized, no regional lymph nodes are removed, and resection of lobe/bilobectomy is performed | 62.14 | 102 | 1.46
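The conditional survival dataset behind Table 6 keeps only patients who survived at least 12 months, and the conditional prediction models use the same filtering idea for each value of T. A minimal sketch of that construction is shown below. The field name `survival_months`, the helper names, and the eight toy survival times are assumptions for illustration. The sketch also assumes that every record has enough follow-up for 5-year status to be known, as is the case for the 1998 to 2001 diagnoses followed through 2006.

```python
def conditional_dataset(records, t_months):
    """Keep only patients who survived at least t_months after diagnosis."""
    return [r for r in records if r["survival_months"] >= t_months]

def with_five_year_label(records, horizon_months=60):
    """Derive a binary outcome attribute (1 = survived the horizon) from survival time."""
    return [dict(r, survived_5y=int(r["survival_months"] >= horizon_months))
            for r in records]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Made-up survival times in months (not SEER data).
toy = [{"survival_months": m} for m in (2, 5, 8, 14, 20, 30, 61, 75)]

# T = 0 is the unconditional dataset; the others mirror the five values of T in the text.
for t in (0, 3, 6, 12, 18, 24):
    subset = with_five_year_label(conditional_dataset(toy, t))
    frac_survived = mean(r["survived_5y"] for r in subset)
    avg_months = mean(r["survival_months"] for r in subset)
    print(f"T={t:2d}  n={len(subset)}  5-year survivors={frac_survived:.2f}  "
          f"avg survival={avg_months:.2f} months")
```

On the toy data, both the fraction of 5-year survivors and the average survival time rise as T grows, which is the same direction of change seen between the unconditional and conditional datasets in the chapter (24.45 versus 42.54 months).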
1. Age at diagnosis: Numeric age of the patient at the time of diagnosis of lung cancer.
2. Birth place: The place of birth of the patient. There are 198 options available to select for this attribute (based on the values observed in the SEER database).
3. Cancer grade: A descriptor of how the cancer cells appear and how fast they may grow and spread. Available options are well-differentiated, moderately differentiated, poorly differentiated, undifferentiated, and undetermined.
4. Diagnostic confirmation: The best method used to confirm the presence of lung cancer. Available options are positive histology, positive cytology, positive microscopic
Table 7 Nonredundant association rules denoting segments in the conditional survival dataset where average survival time is significantly lower than 42.54 months
Segment description | Avg. survival time (months) | Segment size | Lift
Tumor is undifferentiated and has metastasized, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, no regional lymph nodes are removed, and no surgery is performed | 17.18 | 100 | 2.48
Tumor is spread, surgery not recommended, patient is born in Iowa | 20.28 | 137 | 2.10
Tumor is spread and undifferentiated, surgery not recommended, subcarinal/carinal/mediastinal/tracheal/aortic/pulmonary ligament/pericardial lymph nodes are involved, and cancer is confirmed by positive histology | 20.35 | 124 | 2.09
Pleural effusion has taken place, and tumor is poorly differentiated | 22.96 | 101 | 1.85
confirmation (method unspecified), positive laboratory test/marker study, direct visualization, radiology, other clinical diagnosis, and unknown if microscopically confirmed.
5. Farthest extension of tumor: The farthest documented extension of tumor away from the lung, either by contiguous extension (regional growth) or distant metastases (cancer spreading to other organs far from primary site through bloodstream or lymphatic system). There are 20 options available to select for this attribute. The original SEER name for this attribute is “EOD extension.”
6. Lymph node involvement: The highest specific lymph node chain that is involved by the tumor. Cancer cells can spread to lymph nodes near the lung, which are part of the lymphatic system (the system that produces, stores, and carries the infection-fighting cells). This can often lead to metastases. There are eight options available for this attribute. The original SEER name for this attribute is “EOD Lymph Node Involv.”
7. Type of surgery performed: The surgical procedure that removes and/or destroys cancerous tissue of the lung, performed as part of the initial work-up or first course of therapy. There are 25 options available for this attribute, like cryosurgery, fulguration, wedge resection, laser excision, pneumonectomy, etc. The original SEER name for this attribute is “RX Summ-Surg Prim Site.”
8. Reason for no surgery: The reason why surgery was not performed (if not). Available options are surgery performed, surgery not recommended, contraindicated due to other conditions, unknown reason, patient or patient’s guardian refused, recommended but unknown if done, and unknown if surgery performed.
9. Order of surgery and radiation therapy: The order in which surgery and radiation therapies were administered for those patients who had both surgery and radiation. Available options are no radiation and/or surgery, radiation before surgery, radiation after surgery, radiation both before and after surgery, intraoperative radiation therapy, intraoperative radiation with other radiation given before/after surgery, and sequence unknown but both surgery and radiation were given. The original SEER name for this attribute is “RX Summ-Surg/Rad Seq.”
10. Scope of regional lymph node surgery: It describes the removal, biopsy, or aspiration of regional lymph node(s) at the time of surgery of the primary site or during a separate surgical event. There are eight options available for this attribute. The original SEER name for this attribute is “RX Summ-Scope Reg 98-02.”
11. Cancer stage: A descriptor of the extent to which the cancer has spread, taking into account the size of the tumor, depth of penetration, metastasis, etc. Available options are in situ (noninvasive neoplasm), localized (invasive neoplasm confined to the lung), regional (extended neoplasm), distant (spread neoplasm), and unstaged/unknown. The original SEER name for this attribute is “Summary Stage 2000 (1998+).”
12. Number of malignant tumors in the past: An integer denoting the number of malignant tumors in the patient’s lifetime so far. This attribute is derived from the SEER attribute “Sequence Number-Central,” which encodes both numeric and categorical values for both malignant and benign tumors within a single attribute. As part of the preprocessing, the original SEER attribute was split into numeric and nominal parts, and the numeric part was further split into two attributes representing number of malignant and benign tumors, respectively.
13. Total regional lymph nodes examined: An integer denoting the total number of regional lymph nodes that were removed and examined by the pathologist. This attribute was derived by extracting the numeric part of the SEER attribute “Regional Nodes Examined.”

Figure 3 shows a screenshot of the lung cancer outcome calculator. This calculator is widely accessed from more than 15 countries, including many medical schools and hospitals. A previous version of this calculator was presented in Agrawal et al. (2011a). The current calculator incorporates faster models as described in this chapter and has a redesigned interface. It allows the user to enter values for the above-described 13 attributes and get patient-specific risk. For all the ten models, it also shows the distribution of survived and not-survived patients in the form of pie charts. Upon entering the patient attributes on the website, the patient-specific risk calculated by all the ten models is depicted along with the healthy and sick patient risk, which are essentially the median risk of death of patients who actually survived and did not survive, respectively, as calculated by the corresponding model. It generates bar charts corresponding to each of the ten models, and each of them has three bars. The middle bar denotes the patient-specific risk, and the left and right bars denote the healthy and sick patient risk, respectively. The patient-specific risk is thus put in context of the healthy and sick patient risk for an informative comparison.

Any data-driven tool like this in the field of healthcare has a disclaimer about its use, stating that it is meant to complement and not replace the advice of a medical doctor. Many such calculators are becoming popular in healthcare.

Other Applications of Big Data Analytics in Healthcare

We will conclude with a sampling of some other applications of big data in healthcare. There has been abundant work on mining electronic health records in addition to what is described in this chapter. Some of these include mining data from a particular hospital (Mathias et al. 2013), American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) (Agrawal et al. 2013b), and United Network for Organ Sharing (UNOS) (Agrawal et al. 2013c).

Apart from electronic health records, a very important source of healthcare data is social media. We are in the midst of a revolution in which, using social media, people interact, communicate, learn, influence, and make decisions. This data includes multi-way communications and interactions on social media (e.g., Facebook, Twitter), discussion forums, and blogs in the area of healthcare, public health, and medicine. The emergence and ubiquity of online social networks have enriched this data with evolving interactions and communities at mega-scale, and people are turning to social media for various kinds of healthcare guidance and knowledge, including proactive and preventive care. Patients with like conditions, many of them chronic (e.g., flu, cancer, allergy, multiple sclerosis, diabetes, arthritis, and ALS), find patients with the same condition on these networking sites and in public forums. These virtual peers can very much become a key guiding source of data, unlike in the past, when all information emanated from physicians. This big data, being produced in the social media domain, offers a unique opportunity for advancing and studying the interaction between society and medicine, managing diseases, learning best practices, influencing policies, identifying best treatment, and, in general, empowering people. It thus has numerous applications in public health informatics, and we are already seeing
Fig. 3 Screenshot of the lung cancer outcome calculator. (Available at http://info.eecs.northwestern.edu/LungCancerOutcomeCalculator)
several studies in this domain (Lee et al. 2013, 2015; Xie et al. 2013).

Technological advances in sensors, micro- and nano-electronics, advanced materials, mobile computing, etc., have had an immense impact toward enabling future Internet of things (IoT) applications in several fields including healthcare. We are currently witnessing a rapid adoption of wearable devices under the IoT paradigm for a variety of healthcare applications (Andreu-Perez et al. 2015). These wearable and implantable sensors, along with smartphones that are ubiquitously used all over the world, form another source of healthcare big data and provide unprecedented opportunities for continuous healthcare monitoring and management.

The field of genomics is another area where big data analytics can play an important role. It is well recognized that in genomics and life sciences, almost everything is based on complex sequence-structure-function relationships, which are far from being well understood. With genomic sequencing becoming progressively easier and more affordable, we have arrived at a point in time where huge amounts of biological sequence data have become increasingly available, thanks to the advent of next-generation sequencing (NGS). Functional interpretation of genomic data is the major task in fundamental life science. Research results in this area in turn feed research in other important areas such as cell biology, genetics, immunology, and disease-oriented fields. There has been a lot of work in bioinformatics on sequence data in terms of computationally mining the genomic sequences for interesting insights such as homology detection (Agrawal and Huang 2009, 2011). Furthermore, biological sequencing data also ushers in an era of personal genomics, enabling individuals to have their personal DNA sequenced and studied to allow more precise and personalized ways of anticipating, diagnosing, and treating diseases on an individual basis (precision medicine). Genome assembly and sequence mapping techniques (Huang and Madan 1999; Misra et al. 2011) form the first step of this process by compiling the overlapping reads into a single genome. While it is a fact that personalized medicine is becoming more and more common, it is nonetheless in its infancy, and we are still far from realizing the dream of personalized medicine by optimally utilizing the flood of genomic data that we are able to collect now. Clearly, computational sequence analysis techniques are critical to unearth the hidden knowledge from such genomic sequence data, and big data analytics is expected to play a big role in that. For further reading on big data analytics in genomics, the following articles are recommended (Howe et al. 2008; O'Driscoll et al. 2013; Marx 2013).

Summary

Big data has become a very popular term denoting huge volumes of complex datasets generated from various sources at a rapid rate. This big data potentially has immense hidden value that needs to be discovered by means of intelligently designed analysis methodologies that can scale for big data, and all of that falls in the scope of big data analytics. In this chapter, we have looked at some of the big data challenges in general and also what they mean in the context of healthcare. As an example of big data mining in healthcare, some recent works dealing with the use of predictive analytics and association rule mining on lung cancer data from SEER were discussed, including a lung cancer outcome calculator that has been deployed as a result of this analytics. Finally, we also briefly looked at a few other healthcare-related areas where big data analytics is playing an increasingly vital role.

References

Agrawal A, Choudhary A. Association rule mining based hotspot analysis on seer lung cancer data. Int J Knowl Discov Bioinform (IJKDB). 2011a;2(2):34–54.
Agrawal A, Choudhary A. Identifying hotspots in lung cancer data using association rule mining. In: 2nd IEEE ICDM workshop on biological data mining and its applications in healthcare (BioDM); 2011b. p. 995–1002.
Agrawal A, Choudhary A. Perspective: materials informatics and big data: realization of the fourth paradigm of science in materials science. APL Mater. 2016;4(053208):1–10.
Agrawal A, Huang X. Psiblast pairwisestatsig: reordering psi-blast hits using pairwise statistical significance. Bioinformatics. 2009;25(8):1082–3.
Agrawal A, Huang X. Pairwise statistical significance of local sequence alignment using sequence-specific and position-specific substitution matrices. IEEE/ACM Trans Comput Biol Bioinformatics. 2011;8(1):194–205.
Agrawal A, Misra S, Narayanan R, Polepeddi L, Choudhary A. A lung cancer outcome calculator using ensemble data mining on seer data. In: Proceedings of the tenth international workshop on data mining in bioinformatics (BIOKDD), New York: ACM; 2011. p. 1–9.
Agrawal A, Misra S, Narayanan R, Polepeddi L, Choudhary A. Lung cancer survival prediction using ensemble data mining on seer data. Sci Program. 2012;20(1):29–42.
Agrawal A, Patwary M, Hendrix W, Liao WK, Choudhary A. High performance big data clustering. IOS Press; 2013a. p. 192–211.
Agrawal A, Al-Bahrani R, Merkow R, Bilimoria K, Choudhary A. Colon surgery outcome prediction using acs nsqip data. In: Proceedings of the KDD workshop on Data Mining for Healthcare (DMH); 2013b. p. 1–6.
Agrawal A, Al-Bahrani R, Raman J, Russo MJ, Choudhary A. Lung transplant outcome prediction using unos data. In: Proceedings of the IEEE big data workshop on Bioinformatics and Health Informatics (BHI); 2013c. p. 1–8.
Andreu-Perez J, Leff DR, Ip H, Yang G-Z. From wearable sensors to smart implants – toward pervasive and personalized healthcare. IEEE Trans Biomed Eng. 2015;62(12):2750–62.
Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (tripod): the tripod statement. Ann Intern Med. 2015;162(1):55–63.
Ganguly AR, Kodra E, Agrawal A, Banerjee A, Boriah S, Chatterjee S, Chatterjee S, Choudhary A, Das D, Faghmous J, Ganguli P, Ghosh S, Hayhoe K, Hays C, Hendrix W, Fu Q, Kawale J, Kumar D, Kumar V, Liao WK, Liess S, Mawalagedara R, Mithal V, Oglesby R, Salvi K, Snyder PK, Steinhaeuser K, Wang D, Wuebbles D. Toward enhanced understanding and projections of climate extremes using physics-guided data mining techniques. Nonlinear Process Geophys. 2014;21:777–95.
Hey T, Tansley S, Tolle K, editors. The fourth paradigm: data-intensive scientific discovery. Redmond: Microsoft Research; 2009.
Howe D, Costanzo M, Fey P, Gojobori T, Hannick L, Hide W, Hill DP, Kania R, Schaeffer M, Pierre SS, et al. Big data: the future of biocuration. Nature. 2008;455(7209):47–50.
Huang X, Madan A. Cap3: a dna sequence assembly program. Genome Res. 1999;9(9):868–77.
Lee K, Agrawal A, Choudhary A. Real-time disease surveillance using twitter data: demonstration on flu and cancer. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD); 2013. p. 1474–77.
Lee K, Agrawal A, Choudhary A. Mining social media streams to improve public health allergy surveillance. In: Proceedings of IEEE/ACM international conference on Social Networks Analysis and Mining (ASONAM); 2015. p. 815–22.
Magill SS, Edwards JR, Bamberg W, Beldavs ZG, Dumyati G, Kainer MA, Lynfield R, Maloney M, McAllister-Hollod L, Nadle J, Ray SM, Thompson DL, Wilson LE, Fridkin SK. Multistate point-prevalence survey of health care-associated infections. N Engl J Med. 2014;370(13):1198–208.
Marx V. Biology: the big challenges of big data. Nature. 2013;498(7453):255–60.
Mathias JS, Agrawal A, Feinglass J, Cooper AJ, Baker DW, Choudhary A. Development of a 5 year life expectancy index in older adults using predictive mining of electronic health record data. J Am Med Inform Assoc. 2013;20:e118–24. JSM and AA are co-first authors.
Misra S, Agrawal A, Liao W-k, Choudhary A. Anatomy of a hash-based long read sequence mapping algorithm for next generation dna sequencing. Bioinformatics. 2011;27(2):189–95.
O'Driscoll A, Daugelaite J, Sleator RD. Big data, hadoop and cloud computing in genomics. J Biomed Inform. 2013;46(5):774–81.
Ries LAG, Eisner MP. Cancer of the lung. In: Ries LAG, Young JL, Keel GE, Eisner MP, Lin YD, Horner M-J, eds. SEER survival monograph: Cancer survival among adults: U.S. SEER program, 1988–2001, Patient and Tumor Characteristics. NIH Pub. No. 07–6215. Bethesda, Md: National Cancer Institute, SEER Program; 2007:73–80.
SEER, Surveillance, epidemiology, and end results (seer) program (www.seer.cancer.gov) limited-use data (1973–2006). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch; 2008. Released April 2009, based on the November 2008 submission.
Xie Y, Honbo D, Choudhary A, Zhang K, Cheng Y, Agrawal A. Voxsup: a social engagement framework. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD) (Demo paper). ACM; 2012. p. 1556–9.
Xie Y, Chen Z, Zhang K, Cheng Y, Honbo DK, Agrawal A, Choudhary A. Muses: a multilingual sentiment elicitation system for social media data. IEEE Intell Syst. 2013a;99:1541–672.
Xie Y, Chen Z, Cheng Y, Zhang K, Agrawal A, Liao WK, Choudhary A. Detecting and tracking disease outbreaks by mining social media data. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI); 2013b. p. 2958–60.
Xie Y, Palsetia D, Trajcevski G, Agrawal A, Choudhary A. Silverback: scalable association mining for temporal data in columnar probabilistic databases. In: Proceedings of 30th IEEE International Conference on Data Engineering (ICDE), Industrial and Applications Track; 2014. p. 1072–83.
2 Health Services Data: Managing the Data Warehouse: 25 Years of Experience at the Manitoba Centre for Health Policy
Mark Smith, Leslie L. Roos, Charles Burchill, Ken Turner,
Dave G. Towns, Say P. Hong, Jessica S. Jarmasz, Patricia J.
Martens, Noralou P. Roos, Tyler Ostapyk, Joshua Ginter, Greg
Finlayson, Lisa M. Lix, Marni Brownell, Mahmoud Azimaee,
Ruth-Ann Soodeen, and J. Patrick Nicol
Contents
Introduction
Who We Are
What We Do
Our Data Is Our Strength
Privacy
Repository Tools
Glossary
Concept Dictionary
Characteristics of Administrative Data
Data Documentation
Applying for Access
Repository Documentation
The Data Management Process
Step 1: Formulate the Request and Receive the Data
Step 2: Become Familiar with the Data Structure and Content
Step 3: Apply SAS® Programs
Step 4: Evaluate Data Quality
Step 5: Document the Data
Step 6: Release the Data
Percent of Time Spent on Each Data Management Activity
Summary
Data Quality Evaluation Tool for Administration Data
Completeness and Correctness
Assessing Consistency
Referential Integrity
Trend Analysis
Assessing Agreement
Assessing Crosswalk Linking
Summary
Advantages of Using a Population-Based Registry
Expanding Capabilities into Social Policy Research
Using Place-of-Residence Data
Constructing Reliable Social Measures
Identifying Siblings and Twins
Beyond Health Research
Summing Up
References

https://doi.org/10.1007/978-1-4939-8715-3_3

M. Smith (*) · L. L. Roos · C. Burchill · K. Turner · D. G. Towns · S. P. Hong · J. S. Jarmasz · N. P. Roos · M. Brownell · R.-A. Soodeen · J. P. Nicol
Manitoba Centre for Health Policy, University of Manitoba
e-mail: Mark_Smith@cpe.umanitoba.ca; Leslie_Roos@cpe.umanitoba.ca; Charles_Burchill@cpe.umanitoba.ca; Ken_Turner@cpe.umanitoba.ca; Dave_Towns@cpe.umanitoba.ca; Say_PhamHong@cpe.umanitoba.ca; Jessica_Jarmasz@cpe.umanitoba.ca; Noralou_Roos@cpe.umanitoba.ca; Marni_Brownell@cpe.umanitoba.ca; Ruth-Ann_Soodeen@cpe.umanitoba.ca
P. J. Martens

T. Ostapyk
University Advancement, Carleton University, Ottawa, ON, Canada
e-mail: Tyler.Ostapyk@Carleton.ca

J. Ginter
Montreal, QC, Canada
e-mail: joshua.ginter@gmail.com

G. Finlayson
Finlayson and Associates Consulting, Kingston, ON, Canada
e-mail: finlayson.consulting@cogeco.ca

L. M. Lix
Department of Community Health Sciences, University of Manitoba, Winnipeg, MB, Canada
e-mail: lisa.lix@umanitoba.ca

M. Azimaee
ICES Central, Toronto, ON, Canada
e-mail: Mahmoud.Azimaee@ices.on.ca

Abstract
The “Data Repository” at the Manitoba Centre for Health Policy (MCHP) is a cornerstone of the organization and one of the three “pillars” on which it stands (the other two being Research Program and Knowledge Translation). For 25 years, MCHP has maintained one of the most extensive collections of government administrative, survey, and clinical data holdings in the world, including everything from hospital and medical claims to child welfare services and educational enrolment and outcomes. Over 70 different government and clinical databases flow into the organization on an annual basis. This chapter outlines how the data are collected, organized, documented, managed, and accessed in a privacy-protecting fashion for use by researchers in Canada, North America, and around the world. The research conducted by MCHP, which is located in the Rady Faculty of Health Sciences at the University of Manitoba, in addition to being relevant to policy and government decision makers, is regularly published in leading academic journals. The chapter concludes with a discussion of the relative strengths of using a population-based longitudinal registry and some of the challenges faced in organizing and using available data for research purposes.
Introduction

Who We Are
The Manitoba Centre for Health Policy (MCHP) is a research organization located within the Department of Community Health Sciences, Max Rady College of Medicine, Rady Faculty of Health Sciences, at the University of Manitoba (see Fig. 1). MCHP maintains the unique Population Health Research Data Repository (the Repository) that is used by researchers to describe and explain patterns of health care as well as profiles of illness, and to explore other factors that influence health such as socioeconomic status (income, education, employment, social status, etc.). This chapter provides an overview of MCHP, concentrating on the acquisition and preparation of data, and the management of the Repository to support research and to protect the privacy and confidentiality of Manitobans. The chapter that follows concentrates on MCHP’s research production as well as the policy and program impacts of those products over the past 25 years.

Fig. 1 MCHP’s location within the University of Manitoba (University of Manitoba > Rady Faculty of Health Sciences > Max Rady College of Medicine > Dept. of Community Health Sciences > MCHP)

What We Do
MCHP’s mission is to conduct world-class population-based research to support the development of evidence-informed policy, programs, and services that maintain and improve the health and well-being of Manitobans (see Fig. 2).

Fig. 2 MCHP’s mission, goals, vision, and values

Our Data Is Our Strength
MCHP was the first research unit of its kind in Canada. It continues to be recognized for its comprehensive and ever-expanding, linkable population-based data repository; its collaborative models of working with government and health regions; and for the outstanding caliber of its research (Jutte et al. 2011; Wolfson 2011). The Repository (see Fig. 3) is unique in terms of its comprehensiveness, degree of integration, and orientation around an anonymized population registry. All the data files in the Repository are de-identified: names, addresses, phone numbers, and real personal health identification numbers (PHINs) are removed before files are transferred to MCHP by the data provider. MCHP complies with all laws and regulations governing the protection and use of personal information. Strict policies and procedures are implemented to protect the privacy and security of anonymized data.

Fig. 3 The Population Health Research Data Repository. Data that are not yet part of the registry but are currently being acquired are represented by a dotted line

Information in the Repository comes mainly from Manitoba Health and other provincial government departments. The ability to link files and track healthcare use from more than 70 databases, some of which include data as far back as 1970, allows researchers to investigate the health of Manitobans across a wide spectrum of indicators. The data can tell us about Manitobans’ visits to the doctor, hospital stays, home care and nursing-home use, pharmaceutical prescriptions, etc. The Repository is continually expanding into new areas such as education (kindergarten through grade 12 and some post-secondary), social housing, laboratory diagnostic information, in-hospital pharmaceuticals, and justice. Additional area-level data such as the Canadian census indicator of average household income, available from Statistics Canada through the Data Liberation Initiative, are also stored in the repository and available for linkage by postal code.

Some examples of how the data in the repository have been used in the past include:
• MCHP’s research into Manitoba’s aging population has helped estimate future needs for nursing-home beds, so regions can begin strategically to add services over the coming decades (Chateau et al. 2012).
• The results from MCHP’s report Population Aging and the Continuum of Older Adult Care in Manitoba, published in February 2011, were used by the Manitoba Government to invest $216 million to add more home care support; a new rehabilitation program for seniors after surgery; as well as new personal care homes (Doupe et al. 2011).
• A report released in 2010 found that women enrolled in Manitoba’s Healthy Baby Prenatal Benefit program had fewer low birth weight babies and fewer preterm births among other measurable improvements, lending substantial support for the program (Brownell et al. 2010).
• Other MCHP reports document comparative health status and the use of health and social services for groups such as Manitoba’s Francophone and Métis populations or for individual regional health authorities (RHAs) (Chartier et al. 2012; Fransoo et al. 2013; Martens et al. 2010).

MCHP personnel interact extensively with government officials, healthcare administrators, and clinicians to develop a topical and relevant research agenda. The strength of these interactions, along with the application of rigorous academic standards, enables MCHP to make significant contributions to the development of health and social policy. MCHP undertakes five major research projects every year under contract with Manitoba Health. In addition, MCHP investigators secure external funding by competing for research grants. Research completed at MCHP is widely published and internationally recognized (see Fig. 4). MCHP researchers collaborate with a number of highly respected scientists from Canada, the United States, Europe, and Australia.
Fig. 4 Number of documented publications in peer-reviewed journals arising from the use of MCHP Data, 1977–2014
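
The section above describes files that arrive without names, addresses, phone numbers, or real PHINs, yet remain linkable to one another and, by postal code, to area-level census data. The following is a minimal sketch of that general idea, not MCHP's actual procedure: the scrambling method (a keyed HMAC hash), the key, all field names, and all data values are assumptions made purely for illustration.

```python
import hashlib
import hmac

# Assumption: a secret key known only to the party that scrambles identifiers.
SCRAMBLE_KEY = b"illustrative-key-only"

# Assumption: hypothetical field names for the direct identifiers to remove.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "phin"}


def scramble_phin(phin, key=SCRAMBLE_KEY):
    """Return a stable pseudonym for a PHIN using a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, phin.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def deidentify(record):
    """Drop direct identifiers and add a scrambled PHIN in their place."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["scrambled_phin"] = scramble_phin(record["phin"])
    return clean


def link_files(**files):
    """Group records from several de-identified files by scrambled PHIN."""
    histories = {}
    for file_name, records in files.items():
        for record in records:
            key = record["scrambled_phin"]
            histories.setdefault(key, []).append((file_name, record))
    return histories


def attach_area_income(records, income_by_postal_code):
    """Add an area-level income value to each record, matched on postal code."""
    return [
        dict(record, area_income=income_by_postal_code.get(record.get("postal_code")))
        for record in records
    ]


# Fictitious example data (no real people, postal codes, or incomes).
hospital_raw = [
    {"phin": "100000001", "name": "Person A", "address": "1 Example St",
     "phone": "555-0100", "postal_code": "R0A 0A0", "admitted": "2012-03-01"},
]
claims_raw = [
    {"phin": "100000001", "name": "Person A", "address": "1 Example St",
     "phone": "555-0100", "postal_code": "R0A 0A0", "visit": "2012-03-15"},
    {"phin": "100000002", "name": "Person B", "address": "2 Example St",
     "phone": "555-0101", "postal_code": "R0B 0B0", "visit": "2012-04-02"},
]

hospital = [deidentify(r) for r in hospital_raw]
claims = [deidentify(r) for r in claims_raw]
histories = link_files(hospital=hospital, claims=claims)
claims_with_income = attach_area_income(claims, {"R0A 0A0": 61000})
```

Because the same person always maps to the same scrambled value, a hospital stay and a later physician claim can be placed in one history without either file carrying a name or a real PHIN. A record whose postal code has no area-level match simply receives `None` for the income value.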
Privacy

Ensuring privacy and confidentiality of data regarding individuals is a priority. MCHP protects data against loss, destruction, or unauthorized use. The data MCHP receives are de-identified, so researchers and data analysts never know the identity of the individuals in the data. A detailed process has been developed whereby information from trustees can be transferred to MCHP in de-identified and scrambled form (see Fig. 5). Our principles and procedures for ensuring confidentiality go beyond using de-identified data. As a custodian of sensitive information, MCHP adheres to the rules for privacy and protection of personal information outlined in the province’s Freedom of Information and Protection of Privacy Act (FIPPA) and the Personal Health Information Act (PHIA). MCHP implements many security safeguards in its data network, including restricted access, two-factor authentication, and file encryption. Every project requires review and approval from the University of Manitoba’s Health Research Ethics Board (HREB), the Health Information Privacy Committee (HIPC), and relevant data providers. MCHP’s commitment to privacy also includes mandatory accreditation sessions for everyone who has access to MCHP data (see Applying for Access below for details).

Under PHIA, entities that collect data are called trustees. Briefly, demographic data (identifying/personal information), including items such as name, address, and phone number, and an internal reference number are sent from the trustee to Manitoba Health, where the identifying information is used to look up or verify an existing PHIN for each client. This process involves deterministic and probabilistic data linkage. The PHIN is then encrypted and attached to each record, and the identifying information is removed. At the same time, the trustee sends the reference number and the program data to MCHP. When the encrypted PHIN is received by MCHP, the reference number is used to link it to the program data (Fig. 5). Consequently, no single organization has all of the pieces of the linkage puzzle: the trustee does not have access to the scrambled PHIN, Manitoba Health does not have access to the program data, and MCHP does not have access to the identifying information.

Fig. 5 De-identification process diagram. The non-health data provider sends identifying information (e.g., name, address) with an internal reference number (scrambled case ID) to Manitoba Health, and sends the program data with the same reference number to MCHP. Manitoba Health finds the real PHIN, strips off the identifying information, scrambles the PHIN, and sends this crosswalk file to MCHP. At MCHP, files are stored separately and can only be linked for approved research purposes

At MCHP, files are stored separately until all approvals for a project are received and then they are linked. Once a research project is complete, the code and data are retained for up to 7 years, but the archived data cannot be accessed without appropriate approvals.

MCHP also implements small number disclosure control. Non-zero values that are less than six are suppressed in final reports. This helps to ensure that the privacy and confidentiality of individuals are retained while allowing individual-level data to be used for research purposes.

Repository Tools

MCHP has developed a number of web-based resources that document the historical use of information stored in the repository. Much of this “corporate knowledge” is captured in two resources: the MCHP Glossary and the Concept Dictionary.

Glossary
The MCHP Glossary is a compilation of short definitions for key terms used in MCHP publications. It documents terms commonly used in population health and health services research and consists of over 2,300 entries. Each glossary term contains a brief definition (and its source), links to related entries in the glossary and concept dictionary, and links to pertinent external sites and reports.

Concept Dictionary
The MCHP Concept Dictionary contains detailed operational definitions and SAS® program code for variables or measures developed from administrative data. Because data are often complicated to work with and government decisions about definition, collection, and availability of data can change over time, having these resources available helps to communicate historical learning and reduce the probability of future error. Some examples of the many types of concepts that have been developed include:

• Health: Charlson Comorbidity Index, Suicide and Attempted Suicide, the Johns Hopkins Adjusted Clinical Group® (ACG®) Case-Mix System, Complications and Comorbidities, Teenage Pregnancy, Diagnoses and Procedures
• Education: High School Completion, Indices of Educational Achievement, Curriculum Level
• Statistics: Intra-class Correlation Coefficient (ICC), Sensitivity and Specificity, Prevalence and Incidence, Generalized Estimating Equations (GEE)
• Data Management: Record Linkage, Common Exclusions, Duplicate Records
• Geographic Analysis: Regional Health Authorities (RHAs), Winnipeg Community Areas (WCAs)
• Costing: Hospital Stays, Prescriptions, Physician Services, Home Care
• Socioeconomic Status: Income Quintiles, Socioeconomic Factor Index (SEFI)
• Social: Family Structure, Income Assistance (IA), Residential Mobility

Developing new concepts involves collaboration between the research team, a concept developer, and the Concept Dictionary Coordinator. As shown in Fig. 6, the process involves five steps: (1) A request for the development (or update) of a concept; (2) Identification of reference materials and sources; (3) Development of a draft; (4) Review of the draft involving feedback and revisions; and (5) Publication of the concept on the MCHP website.

Fig. 6 Concept development process

The contents of a concept typically include:
– A descriptive title
– The date it was first available or last updated
– An introduction to the topic
– A description of the methods used, including data sources, background information, steps in developing the concept, validation of the process, and hyperlinks to additional information such as findings and results in publications
– The concept’s limitations or any cautions related to its use
– An example of the SAS® code and/or formats associated with the concept
– Links to related terms in the MCHP Glossary or Concept Dictionary
– Links to additional supporting material (both internal and external), and a list of references for the concept

An example concept is shown in Fig. 7.

Fig. 7 A screenshot of the “Income Quintiles” concept. Available at: http://mchp-appserv.cpe.umanitoba.ca/viewConcept.php?conceptID=1161

The MCHP Glossary and Concept Dictionary are available online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/concept_dictionary.html

Characteristics of Administrative Data

Because administrative data are collected primarily for purposes other than research, care is required to ensure accurate results. Potential limitations include clinically imprecise coding, absence of key data on processes and outcomes, and the inconsistent recording of provider information. On the other hand, the administrative data housed in the Repository yield a number of advantages for conducting high-quality research, including:

• Population based: The entire population of the province is covered by the Manitoba Health Services Insurance Plan. Nonparticipation is minimal since residents are not required to pay premiums to register for insured benefits.
• Unique identifiers: Use of a consistent set of identifiers (with identification numbers of both program recipients and providers scrambled to ensure confidentiality) permits researchers to build histories of individuals across time and across government programs. For example, individuals who are discharged from hospital can be linked to the medical claims file in order to determine whether adverse events are being treated in physicians’ offices.
• Longitudinal: Migration into and out of the province as well as mortality can be traced from 1970 onward. Tracking groups of subjects through time can determine if individuals receiving a given intervention truly have no adverse outcomes or if adverse events are not showing up because the individual has left the province or has died.

Some of the key characteristics of the registry and their research relevance are detailed in Table 1.

Table 1 Manitoba research registry: key characteristics and research relevance (Roos 1999)
Characteristics | Research relevance
Very large N | Many physical and statistical controls are feasible; rare events can be analyzed; statistical power is high
Population based for an entire province | Heterogeneity along many variables is provided
Longitudinal data (going back over 30 years) | Many types of longitudinal designs are possible; important variables can be measured more reliably
Loss to follow-up specified | Follow-up critical for cohort studies is accommodated
Place of residence (according to postal code) at any point in time | Length of exposure to geographic areas can be quantified; measures of mobility and migration can be defined; small area variation analyses can be developed
Family composition at any point in time | Nonexperimental designs estimating the importance of different social variables and controlling for unmeasured background characteristics are facilitated

MCHP has created a series of tools to document the content of the data files, the process of gaining access to the data, and techniques for working with the data.

Data Documentation

The MCHP Metadata Repository, currently available to internal users and users at remote access sites, organizes all of the Repository’s documentation. This tool provides a consistent set of documentation components for each group of data files. Components, displayed in the form of six tabbed pages, include an Overview, Data Model, Data Quality Report, Data Dictionary, Additional Documents and Reports, and a Blog (see Fig. 8).

Fig. 8 The MCHP metadata repository data dictionary intranet page

1. Overview – A standardized data description summarizes the data, information on the data provider, purpose and method of data collection, years of available data, size of data files, geographical parameters, data caveats, access requirements, and links to concepts to assist users working with the data. These descriptions provide users a sense of the extent, purpose, scope, and subject of a given database. They can also act as a first stop for researchers attempting to assess the feasibility of an administrative data project. The following list (see Table 2)
shows the standard headers used to describe all databases housed in the Repository.

Table 2 Standard headers used to describe all databases housed in the Repository
Header name | Description
Summary | A brief summary of the data, often used in grant applications, requests for data, and report glossaries. These serve as a very basic and general introduction to the data
Source agency | Data provider. Frequently the same agency from which access permission is required
Type | A conceptual category (domain) that is indicative of the type of record included in the file (e.g., administrative or survey)
Purpose | Provides a brief overview of why the data are collected by the source agency and what use they serve in the originating organization
Scope | The scope of the database; who or what is in, and who or what is not. May also include geographic, age, or program scope
Data collection method | A brief description of the original data collection process at the source
Size | General estimates of numbers of rows (records or observations) and columns (fields or variables)
Data components | The separate tables or sections that make up the data set
Data level | The level at which researchers can effectively and reliably study the data (e.g., individual or aggregate)
Data years | Range of data years and whether acquired by calendar, fiscal, or academic year
Data highlights | Key characteristics applicable for typical analyses
Data cautions | Obvious issues with the data of potential importance to researchers or useful for assessing project feasibility
Access requirements | Who to apply to in order to gain access to the data. Direct links to the source agency’s contact info or website are also included when appropriate
More information | Links to other sources of information such as the glossary, data dictionary, concept dictionary, provider’s webpage, etc.
Previous and potential studies | List of, and links to, MCHP deliverables and other reports or projects using the data
References | Any references used in the description/overview
Date modified | The date the overview was last modified

Before the overview is published, the data provider and a selection of users who frequently work with the data review the document for accuracy and completeness. Overviews have also been sent to external organizations, such as Thomson Reuters’ “Data Citation Index,” that include these documents in their integrated search systems. This facilitates the introduction of the data to external researchers, allows users to track and discover publications using a specific MCHP dataset, and increases the reach of the work produced by MCHP.

2. Data Model – A data model is created to display the structure of data files and how they are linked together in the Repository (see Fig. 9).

Fig. 9 Example of a data model diagram: social housing

3. Data Quality Report – The usability of each field is addressed when data files are stored in the Repository, and evaluations are summarized in a report available in the metadata repository. The data quality framework guiding this effort is available on MCHP’s external website. A complete description of the data quality process is provided in the document A Data Quality Evaluation Tool for Administration Data, available online, and in the MCHP Data Quality Framework: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/Data_Quality_Framework.pdf

4. Data Dictionary – The data dictionary identifies the files and tables held in the Repository. It provides detailed descriptions of individual data elements to assist users in their extraction, management, and understanding of the data. Data file names and locations, field names, field definitions, descriptive labels, formats, a list of responses and frequencies for categorical variables, and means and distributions for numeric variables are provided in a web-based format.

5. Documentation Directory – Original information from the data provider, project documentation, and links to relevant concepts are stored with other documents that may be helpful in the interpretation of data files, such as training manuals, annual reports, and validation studies.

6. Blog – The blog component is a communication tool for analysts and researchers interested in communicating information about the data as it is discovered.

Applying for Access

Under the Personal Health Information Act of Manitoba (PHIA), MCHP acts as a custodian of the data housed in the Repository. Access is based on the principle that the data are owned by the organization contributing the data (the data steward). Data-sharing agreements, negotiated with the provider, spell out the terms of use once the information is housed at MCHP. In addition, the research proposal process, administrative reporting requirements, and data use and disclosure requirements have been documented and are available on the MCHP website on the Applying for Access page: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/access.html
As the number of data files and users has The Data Management Process
grown ensuring a common prerequisite level
of knowledge has become increasingly impor- MCHP’s six-step data management process (see
tant. An accreditation process established in Fig. 13) describes how data are transferred from a
April 2010 provides a consistent overview of source agency, processed, and brought into the
MCHP and its data access and use policies and Repository in order to be used for research purposes.
procedures. The accreditation material covers
the MCHP mission (see Fig. 2), available data
in the Repository, and the requirements for data Step 1: Formulate the Request
use and publication of results. Accreditation is and Receive the Data
required for all researchers, students, and per-
sonnel working on approved projects. Once the A data-sharing agreement must be in place before
initial accreditation session is completed, an any data can be received from the source agency.
online accreditation refresher module is avail- MCHP works in consultation with the source
able and must be completed annually. Accred- agency and the University of Manitoba’s Office
itation information is also available for public of Legal Counsel to produce an agreement. The
access at: http://umanitoba.ca/faculties/health_ data-sharing agreement defines policies and prac-
sciences/medicine/units/community_health_sci tices about data confidentiality, privacy, legislative
ences/departmental_units/mchp/resources/accre and regulatory requirements, data transfer, and
ditation.html ongoing use of the data for research purposes.
Data-sharing agreements are of two types: agree-
ments for data added to the Repository at regular
intervals (typically annually), and agreements for
Repository Documentation data provided for a single research project. For data
added to the Repository at regular intervals,
More general summaries of the Repository con- MCHP assumes responsibility for overseeing its
tents are produced in several formats: use. This involves ensuring that appropriate poli-
cies and procedures governing use are established,
1. Dataflow Diagram documented, and enforced. For data added only for
The dataflow diagram illustrates the flow of data one specific project – called project-specific data –
from its original source into the Repository. A the principal investigator of the project assumes
reduced-scale version is shown in Fig. 10. responsibility for overseeing the use of the data.
2. Data lists – several lists are maintained, each Once a data-sharing agreement is produced, a
serving different purposes: data management analyst is assigned to work with
a. Population Health Research Data Repository the source agency to facilitate the transfer. Initially
List – a searchable and filterable list that indi- this involves meeting with representatives from the
cates the years of available data, the source source agency to acquire background information,
agency for each database, and provides links documentation, data model diagrams, data dictio-
to individual data descriptions. An illustration naries, documentation about historical changes in
of the interface is provided in Fig. 11. the data (including changes in program scope, con-
b. Data Years Chart – Displays the years of tent, structure, and format), existing data quality
available data for each file, with links to reports, and other information relevant to the
data descriptions. Figure 12 provides an description or use of the data. This information is
example of the list. used to: (a) develop a formal data request;
3. Data Repository Slides – PowerPoint slides (b) enhance the metadata repository, which con-
commonly used by researchers that describe or tains database documentation; and (c) prepare the
provide a representation of the MCHP data Data Quality Report. The analyst asks the source
Repository (see Fig. 3). agency for reports or publications that document
Fig. 10 A screenshot of the MCHP dataflow diagram available online. The full-scale diagram is available online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/dataflow_diagram.pdf
Fig. 11 A screenshot of how the Population Health Research Data Repository searchable list appears on the website. The list is available online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/repository/datalist.html
the entities in the data, such as people, places, events, or activities (e.g., annual reports). This information is used to assess the accuracy and validity of the files that are brought into the Repository. Financial data, such as annual budgets and total expenditures for specific programs, are also requested if available.

The initial data request encompasses historical documentation; that is, information that may have gone through multiple revisions over time, particularly in response to health system changes. The initial data request may in fact be a series of requests, one for each generation of source data. Future requests for updates may refer to the most recent generation only. All changes in coding methods, program constraints, and accounting measures are documented and incorporated into the metadata repository.

A sample data file is often prepared by the source agency and transferred to MCHP at the same time as the initial documentation transfer. Ideally, the sample consists of a random anonymized subset of the original data.

Once the documentation and sample data file have been evaluated, a formal data request is prepared and sent to the source agency. The data are then shipped to Manitoba Health for de-identification and data linkage (described above, under Privacy).

Step 2: Become Familiar with the Data Structure and Content

Once MCHP receives the data, a data management analyst reviews the documentation and the organization of files and structures. While data in the Repository are usually organized to reflect the structure of the original source data, sometimes the files must be reorganized to permit addressing questions about different units of analysis that comprise the data, including persons, places, objects, events, and dates.

Tasks undertaken in the process of becoming familiar with the data structure and content include:
Fig. 12 A screenshot of the website and the years of social data available. Also available online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/Available_Years.pdf
1. Standardizing unique record identifiers. If the PHIN is missing, then a unique "placeholder" value is created by MCHP analysts.
2. Standardizing dates of events and correcting incomplete dates, where possible.
3. Standardizing frequently used demographic data elements, including sex and postal code.
4. Identifying and restricting access to data elements not normally made available to researchers without special permission. Examples include registration numbers and hospital chart numbers.
5. Reorganizing and converting files to a different file format, if necessary.

Step 3: Apply SAS® Programs

MCHP uses SAS® for analysis, which performs optimally with data files that have been denormalized (SAS Institute Inc. 2006). Denormalization is a process of adding redundant information to a data file to reduce the processing time required for analysis. Standardized formats are applied to selected fields, such
1. Formulate the Request and Receive the Data: check the data sharing agreement; liaise with the source agency to discuss acquisition of the data and its documentation; prepare the data request letter; receive data and documentation.
2. Become Familiar with Data Structure and Content: review provided documentation; if required, create a data model for the original data; if receiving sample data, test it and send feedback to the source agency.
3. Apply SAS Programs: apply data field and SAS format standards; apply normalization or de-normalize as required; liaise with the source agency as needed; install on SPDS; create metadata.
4. Evaluate Data Quality: test the installed data using standardized protocols; identify solutions to address deficiencies in data quality; prepare Data Quality Report.
5. Document Data: install the documentation in the Metadata Repository.
6. Release Data to Programmer(s) and Researcher(s): meet with programmer(s) and researcher(s) to present data structure and content.
Fig. 13 The six-step data management process
as date fields. Once a data file has been prepared for research use, the SAS Scalable Performance Data Server (SPDS) is used to sort and create indices and other design elements appropriate for the most commonly used applications. During this process, standard naming conventions for data files are applied. SAS® is then used to create a summary of the contents for documentation purposes.

Step 4: Evaluate Data Quality

A Data Quality Report is produced for each dataset in the Repository. This report is housed
in the metadata repository, which provides a single point of access for all documentation concerning a data file. The structure and contents of the Report, and the framework guiding the development of the report, are described below under Data Quality Evaluation Tool for Administration Data.

Step 5: Document the Data

Data dictionaries, which contain information about the name, contents, and format of each field, are created and stored in the metadata repository. The data dictionaries can be used to conduct an initial review of data quality; a cursory review can identify problems such as missing data, incompleteness of labels and descriptors, problems with ranges in numeric values, and/or integrity of data linkage keys.

Before the data are stored in the Repository, the data dictionaries are subjected to an initial assessment of accuracy and completeness. If deficiencies are identified, the analyst will investigate them through further contacts with the source agency, Manitoba Health, or MCHP personnel.

Step 6: Release the Data

If the data files and documentation appear ready, the data can be released internally for use. Release may be informal, in which case analysts are simply notified that the new data and documentation are available for use, or more formal, involving presentations to data analysts and researchers. The latter is useful when a new data source is valuable for multiple research projects, when substantial changes have occurred to existing data, or when the source agency has introduced a new data-capture process or system. New and updated datasets are also announced in the MCHP quarterly newsletter Research Resources Update, published online at: http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/repository/rrupdate.html

Percent of Time Spent on Each Data Management Activity

Now that MCHP has developed a methodological approach to acquiring and installing data, time spent in each of the various categories of activity can be tracked. Figure 14 shows staff
Pie chart categories: 1. Data request; 2. Exploring data; 3. Programming/install; 4. Database Maintenance; 5. Data quality; 6. Documentation; 7. Data release; 8. Application Development; 9. Admin/Other
*There are seven Data Analysts in the Data Management Work
Fig. 14 Average percent of time spent on each data management activity* for 2014
time spent in each category accumulated over a 1-year period. It was instructive to realize that about one third of staff time is spent in either administrative (meetings and presentations, general communication, training) or application development activities. The latter includes such things as the development of data quality macros and tools to implement the metadata repository. As Fig. 14 shows, programming data to be stored (programming/installing data) and documenting data are two of the largest areas of activity, followed by data quality assessments and exploring data on arrival at the center. The smallest areas of activity involve requesting data and performing revisions to existing data (database maintenance). MCHP continues to monitor time spent on each activity in order to track fluctuations over time. At the moment, MCHP does not have a formal data release process; therefore, no time is accruing in that activity. A dissemination strategy will be developed in the coming year.

Summary

The six-step data management process used at MCHP follows standards and practices observed in other similar initiatives as well as recommendations developed by organizations maintaining repositories of anonymized personal health information for research purposes (for examples, see Daas et al. 2008; Holman et al. 1999; Lyman et al. 2008). MCHP's process also reflects some of the distinctive aspects of the political and social environment in which it operates, including relationships with source agencies, the software platform on which the Repository is maintained, and provincial health privacy legislation.

Data Quality Evaluation Tool for Administration Data

Data collected for administrative purposes are not always of the best quality for research, and poor quality data may lead to false conclusions.

To determine the quality of data coming into MCHP, an evaluation tool was developed (see Fig. 15). This tool was implemented using SAS® software and is specifically designed to assess the following characteristics of administrative data:

• Completeness and correctness
• Consistency
• Referential Integrity
• Trends in the data
• Crosswalk linkage assessment
• Agreements using kappa statistics

Completeness and Correctness

Completeness refers to the magnitude of missing values; such values are identified and reported for all data elements. The assessment of correctness includes the fraction of data elements that are valid, invalid (e.g., categorical variables that do not match a reference list, out of range numerical variables, invalid dates such as a living person born in the 1800s), missing data, and outliers for all numeric variables. The process of checking the large number of files that flow into the repository at MCHP would be infeasible if not for the ability to automate the process. Completeness and correctness can be evaluated using an automated set of SAS macros developed at MCHP called META, INVALID CHECK, and VIMO. These macros produce the VIMO table (see Fig. 16) that documents the percentage of valid, invalid, missing, and outlier data. Fields with invalid values are flagged and the total number of invalid records is automatically noted in the comment column.

Assessing Consistency

Consistency refers to the intra-record relationship among data elements. For example, hospital admission dates must precede hospital separation (discharge) dates. Consistency can be assessed using MCHP's VALIDATION macro, which is based on predefined consistency criteria. Each record is checked for consistency, and the results are summarized as a table showing the total number of
Components shown in the diagram: REFERENTIAL INTEGRITY Macro; AGREEMENT Macro; LINK Macro; LINKYR Macro; TREND Macro; DQ GEN Macro; VALIDATION Macro; VIMO Macro; CONTENTS Macro; META Macro; INVALID CHECK Macro; AUTOMATE Macro*; DOCUMENTATION System
*All macros displayed above are SAS® macros with the exception of the AUTOMATE macro, which is an Excel VBA macro
Fig. 15 Generating a data quality report
inconsistent records. For example, the validation macro can be used to check for inconsistencies in reporting the pregnancy indicator (see Fig. 17).

Referential Integrity

In a relational database, referential integrity refers to the quality of linkages existing between database tables. Typically one table contains a unique identifier known as the primary key, which may be a single attribute or a set of attributes that uniquely identify each record. Other tables will contain foreign keys, and each foreign key value must reference a primary key value in the primary table. The REFERENTIAL INTEGRITY macro (see Fig. 18) checks for the number of primary keys having duplicate or missing values as well as the total number of foreign key values that do not reference a valid primary key (orphaned values).

Trend Analysis

A macro has been developed to perform a trend analysis for core data elements. For example, no change across years may indicate a data quality problem if the data are expected to trend naturally upward or downward due to policy, social, or economic changes. Fields such as the diagnosis and treatment of a specific cancer can be assessed over a number of years. The macro plots frequency counts across a specified time period. This macro also fits a set of common regression models and chooses the best-fit model based on the minimum root mean square error (RMSE). With the best regression model selected, studentized residuals with the current observation deleted are calculated. Aggregated observations with absolute studentized residuals greater than t(0.95, n − p − 1) are flagged as potential outliers indicating an unusual change
2
Dataset Label: dataset label    Records: 10000
Dataset Name: dataset name      Period: yyyy
Legend (Potential Data Quality Problems): None or Minimal (< 5%); Moderate (5-30%); Significant (> 30%); Unknown (or N/A)
Other markers: No variance or 100% missing value; Min, Max values based on valid range

Columns: Type | Variable Name | Variable Label | Valid | Invalid | Missing | Outlier | Min | Max | Mean | Median | STD | Comment
(For Char variables, Observed Values are listed in place of the numeric summaries. Values in each row appear in column order; empty cells are omitted.)

ID        VAR1   variable1   100.00 .00
ID        VAR2   variable2   100.00 .00
Num       VAR3   variable3   94.75 4.76 .49 0.83 10.00 8.67 9.23 1.48
Num       VAR4   variable4   70.77 29.23 .00 1.00 99.00 38.63 2.08 46.06
Num       VAR5   variable5   95.09 4.70 .21 0.00 10.00 8.13 9.01 1.96
Num       VAR6   variable6   100.00 .00 .00 0.00 0.00 .00 .00 .00
Num       VAR7   variable7   85.91 .00 14.09 0.00 110.00 6.10 .01 22.99
Char      VAR8   variable8   99.32 .68 .00; Observed Values: -1, 0, 1; Comment: -1 ( 68 Invalid Obs. in total )
Char      VAR9   variable9   .00 100.00
Char      VAR10  variable10  93.41 6.59; Observed Values: 23, 01, 21, 25, 19, 07, 16, 09, 26, 28, 08, 10, 27, 30, 18, 17, 29, 22, 31, 12, 11, 03, 15, 14, 13, 02, 04, 05, 06, 24, 20
Char      VAR11  variable11  100.00 .00; Observed Values: 15, 24, 75, 76, 78, 79, 80, 81, 83, 84, 85, 86, 88, 89, 90, 91, 92, 94, 97, 98, 100, 102, 103, 104, 130, 132, 137, 138, 146, 148, 217, 229, 233, 234, 236, 237, 238, 239, 112, 77, 101, 231, 113, 82, 74, 87, 227, 235, 226, 232
Char      VAR12  variable12  100.00 .00; Observed Values: 2011
Date      VAR13  variable13  28.02 .02 71.96 2001-03-28 2006-03-13; Comment: 1582-10-14 ( 2 Invalid Obs. in total )
Date      VAR14  variable14  99.61 .39 2003-06-28 2006-11-04
Datetime  VAR15  variable15  87.74 12.26 .00 02JAN2001:03:13:36 01APR2006:22:26:52; Comment: 1226 invalid obs. out of [01JAN2001:23:59:59, 01APR2006:23:59:59] range
Time      VAR16  variable16  100.00 .00 0:00:02 23:59:48

Fig. 16 A VIMO table generated by macros which documents the percentage of valid, invalid, missing, and outlier data elements
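The valid/invalid/missing/outlier breakdown reported in a VIMO table can be illustrated with a short sketch. The Python below is not the MCHP SAS macro code. The function name `vimo_summary`, the valid-range arguments, and the three-standard-deviation outlier rule are all assumptions made for illustration; the source does not state how outliers are defined.

```python
from statistics import mean, pstdev


def vimo_summary(values, valid_min, valid_max, outlier_sd=3.0):
    """Return the percentage of valid, invalid, missing, and outlier values.

    Hypothetical rules (not taken from the MCHP macros):
    - None counts as missing;
    - values outside [valid_min, valid_max] count as invalid;
    - in-range values more than `outlier_sd` standard deviations from the
      in-range mean count as outliers;
    - everything else is valid, so the four percentages partition the records.
    """
    total = len(values)
    if total == 0:
        raise ValueError("no records to summarize")

    missing = sum(1 for v in values if v is None)
    present = [v for v in values if v is not None]
    in_range = [v for v in present if valid_min <= v <= valid_max]
    invalid = len(present) - len(in_range)

    outliers = 0
    if len(in_range) > 1:
        centre = mean(in_range)
        spread = pstdev(in_range)
        if spread > 0:
            outliers = sum(
                1 for v in in_range if abs(v - centre) > outlier_sd * spread
            )
    valid = len(in_range) - outliers

    def pct(count):
        return round(100.0 * count / total, 2)

    return {
        "valid": pct(valid),
        "invalid": pct(invalid),
        "missing": pct(missing),
        "outlier": pct(outliers),
    }
```

For example, calling `vimo_summary([5, 6, 7, None, 120, 5, 6, 7, 6, 6], 0, 10)` treats `None` as missing and `120` as out of range, which mirrors the way a single row of the table partitions the records of one field.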
Health Services Data: Managing the Data Warehouse: 25 Years of Experience at the Manitoba. . .
Fig. 17 The validation macro demonstrates inconsistencies
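An intra-record rule of the kind described under Assessing Consistency, such as an admission date that must not fall after the separation date, can be sketched as follows. This is an illustrative Python sketch, not the VALIDATION macro. The field names `admit_date` and `sep_date` are hypothetical, and allowing same-day admission and separation is an assumption.

```python
from datetime import date


def admission_not_after_separation(record):
    """Consistency rule for one record (hypothetical field names).

    Records with a missing date are treated as consistent here, on the
    assumption that missing values are reported by a separate
    completeness check.
    """
    admit = record.get("admit_date")
    sep = record.get("sep_date")
    if admit is None or sep is None:
        return True
    return admit <= sep  # same-day stays allowed (an assumption)


def count_inconsistent(records, rule):
    """Apply a rule to every record; return the failure count and the failures."""
    failures = [r for r in records if not rule(r)]
    return len(failures), failures


sample = [
    {"admit_date": date(2010, 1, 5), "sep_date": date(2010, 1, 9)},
    {"admit_date": date(2010, 3, 2), "sep_date": date(2010, 2, 27)},  # inconsistent
    {"admit_date": date(2010, 4, 1), "sep_date": None},
]
n_bad, bad_records = count_inconsistent(sample, admission_not_after_separation)
```

Passing the rule as a function keeps the counting logic separate from the criteria, so further predefined rules can be added without changing `count_inconsistent`.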
Key: CLIENT_VISIT_GUID

PRIMARY TABLE                 DUPLICATE   MISSING   TOTAL RECORDS
WRHA_EDIS_CLIENT_2007JAN      1 (x3)                1,098,981

FOREIGN TABLE                 ORPHAN VALUES   TOTAL RECORDS
WRHA_EDIS_STATUS_2007JAN      399             2,987,150
WRHA_EDIS_PROVIDER_2007JAN    400             6,133,612
WRHA_EDIS_NACRS_2007JAN       188             586,504

Fig. 18 Output from the referential integrity macro
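The three counts reported in Fig. 18 (duplicate primary keys, missing primary keys, and orphaned foreign key values) can be computed with a few lines of code. The sketch below is an illustrative Python version, not the REFERENTIAL INTEGRITY macro. The function name and the choice to ignore missing foreign keys when counting orphans are assumptions.

```python
from collections import Counter


def referential_integrity(primary_keys, foreign_keys):
    """Summarize key quality between a primary table and one foreign table.

    primary_keys: key value of each record in the primary table (None = missing)
    foreign_keys: key value of each record in the foreign table (None = missing)
    """
    missing = sum(1 for k in primary_keys if k is None)
    counts = Counter(k for k in primary_keys if k is not None)

    # Key values that occur more than once, with their number of occurrences
    duplicates = {k: n for k, n in counts.items() if n > 1}

    # Foreign key values with no matching primary key. Missing foreign keys
    # are not counted as orphans in this sketch (an assumption).
    known = set(counts)
    orphans = sum(1 for k in foreign_keys if k is not None and k not in known)

    return {
        "duplicate_keys": duplicates,
        "missing_keys": missing,
        "orphan_values": orphans,
        "total_primary": len(primary_keys),
        "total_foreign": len(foreign_keys),
    }


report = referential_integrity(
    ["a", "b", "b", "b", None, "c"],
    ["a", "z", "b", "y", "c"],
)
```

In this toy input the key `"b"` appears three times, one primary key is missing, and the foreign values `"z"` and `"y"` have no matching primary key.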
for a particular year of data. Typical output is illustrated in Fig. 19. Deviations from expected trends are typically used as indicators that further exploration is necessary.

Assessing Agreement

Since many of the MCHP data linkages are based on probabilistic matching, rates of agreement for sex and date of birth between the incoming data and MCHP's population-based longitudinal health registry are evaluated using kappa statistics and the AGREEMENT macro.

Assessing Crosswalk Linking

Before data arrive at MCHP, they are first sent to Manitoba Health for data linkage. Manitoba Health then removes all personal and identifying information, and the data are sent to MCHP with an encrypted PHIN that can be linked with MCHP databases for research purposes. The viability of linking incoming data with other MCHP databases can be assessed using the LINK and LINKYR macros (see Fig. 20).

Summary

The data quality report generated by these macros is useful in several ways. First, potential data quality issues are flagged so that researchers and data analysts are aware of potential pitfalls when performing data analyses. Second, sharing the report with data providers draws their attention to potential issues so that action can be taken to improve the quality of data over time. Third, it provides a useful starting point for discussing the data, both with new users who may have no idea of the content and among data acquisition staff so that they can spot
Trend Analysis for Dataset: project burns_trauma_trans_2000_2010; Date Variable: ADMDATE; By Variable: ALL (All Records). Vertical axis: Frequency (0 to 500); horizontal axis: Fiscal Year (2000/01 to 2009/10); fitted model: Y=Beta0 + Beta*(1/X). Legend: Significant Outliers; Identical Subsequent Frequencies; Suppressed Small Frequencies (between 0 to 6)
Fig. 19 Typical output from the trend analysis macro
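The trend check described under Trend Analysis (fit several regression models, keep the one with the smallest RMSE, then flag observations by their studentized deleted residuals) can be sketched in plain Python. This is not the MCHP macro. The candidate model family below is an assumption, since the source names only "common regression models" and Fig. 19 shows the form Y=Beta0 + Beta*(1/X). The critical value `t_crit` is supplied by the caller, for example from a t table with n − p − 1 degrees of freedom.

```python
import math

# Candidate transformations g(x) for models y = b0 + b1 * g(x).
# This family is hypothetical; the source does not list the models fitted.
CANDIDATES = {
    "linear": lambda x: float(x),
    "inverse": lambda x: 1.0 / x,
    "log": lambda x: math.log(x),
}


def fit_simple(gx, y):
    """Least-squares fit of y = b0 + b1 * gx; returns residuals and leverages."""
    n = len(y)
    g_bar = sum(gx) / n
    y_bar = sum(y) / n
    sxx = sum((g - g_bar) ** 2 for g in gx)
    b1 = sum((g - g_bar) * (v - y_bar) for g, v in zip(gx, y)) / sxx
    b0 = y_bar - b1 * g_bar
    residuals = [v - (b0 + b1 * g) for g, v in zip(gx, y)]
    leverages = [1.0 / n + (g - g_bar) ** 2 / sxx for g in gx]
    return residuals, leverages


def flag_trend_outliers(counts, t_crit):
    """Return (best model name, indices flagged as potential outliers)."""
    n, p = len(counts), 2  # p = number of fitted parameters (b0 and b1)
    if n - p - 1 < 1:
        raise ValueError("need at least four periods")
    x = range(1, n + 1)  # time index 1..n

    best = None
    for name, g in CANDIDATES.items():
        residuals, leverages = fit_simple([g(v) for v in x], counts)
        sse = sum(e * e for e in residuals)
        rmse = math.sqrt(sse / n)
        if best is None or rmse < best[0]:
            best = (rmse, name, residuals, leverages, sse)
    _, name, residuals, leverages, sse = best

    flagged = []
    for i, (e, h) in enumerate(zip(residuals, leverages)):
        # Studentized deleted residual: e * sqrt((n-p-1) / (SSE*(1-h) - e^2))
        denom = sse * (1.0 - h) - e * e
        if denom <= 0.0:
            continue  # degenerate case (remaining points fit exactly); skipped here
        if abs(e) * math.sqrt((n - p - 1) / denom) > t_crit:
            flagged.append(i)
    return name, flagged
```

With ten yearly counts and two fitted parameters there are 7 degrees of freedom, so a call such as `flag_trend_outliers(counts, 1.895)` uses the 0.95 quantile of the t distribution with 7 degrees of freedom as the threshold.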
discrepancies and anomalies in the data and correct or document them before the data is released.

Anyone interested in implementing the Data Quality assessment tools developed at MCHP can download the source code, examples, and documentation at http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/repository/dataquality.html. This software is freely available for use under a GNU General Public License.

Advantages of Using a Population-Based Registry

As illustrated in Fig. 3, a central component of the Repository is an anonymized population-based registry: a longitudinal registry of individuals covered by the provincial health insurance plan. It provides an opportunity for preparing data, improving quality, and understanding error through linkage to files with independent information on relevant variables. For example, comparing date of death from the Manitoba Health Insurance Registry with the date recorded in the government's Vital Statistics files allows for error correction.

The population-based registry has been critically important for many studies since 1977 (Roos et al. 1977). Besides computing geographically based rates, researchers have located individuals within families to determine the health and health-service use of particular ethnic groups (Martens et al. 2005, 2011). The registry has also been critical for longitudinal studies, being used for relatively short-term follow-up of surgical outcomes and multi-year birth-cohort research (Brownell et al. 2014; Oreopoulos et al. 2008; Roos et al. 1992).
Vertical axis: 50% to 100%. Legend (Manitoba Health Cadham Provincial Laboratory Datasets): MHCPL_CMORGANISM_19922010; MHCPL_CMRESULTS_19922010; MHCPL_CMSECTION_19922010; MHCPL_SPSEROTESTS_19922010; MHCPL_SPPARATESTS_19922010; MHCPL_SPSECTION_19922010
Fig. 20 Output from the LINKYR macro
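Fig. 20 plots percentages for several datasets. One plausible reading, assumed here and not confirmed by the source, is that LINKYR reports the share of incoming records in each year that can be linked to the registry. Under that assumption, a minimal Python sketch of such a per-year link rate might look like this; the function name and the input layout are hypothetical.

```python
from collections import defaultdict


def link_rate_by_year(records, registry_ids):
    """Percent of records per year whose identifier is found in the registry.

    records: iterable of (year, identifier) pairs; identifier may be None
    registry_ids: set of identifiers present in the registry
    """
    totals = defaultdict(int)
    linked = defaultdict(int)
    for year, ident in records:
        totals[year] += 1
        if ident is not None and ident in registry_ids:
            linked[year] += 1
    return {
        year: round(100.0 * linked[year] / totals[year], 1)
        for year in sorted(totals)
    }


rates = link_rate_by_year(
    [(1992, "a"), (1992, "b"), (1992, None), (1993, "a"), (1993, "x")],
    {"a", "b"},
)
```

A year with a markedly lower rate than its neighbours would be a natural candidate for follow-up with the source agency.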
Expanding Capabilities into Social Policy Research

Canada's population-based data on families, neighborhoods, and schools are increasingly being used to study individuals cost-effectively in their social context. However, considerable preparatory work is necessary to move from health research to social research. This work includes:

• Developing scales to measure new outcomes such as educational achievement at the population level.
• Using place-of-residence data (at any given point in time) to calculate the number of moves, number of years in certain neighborhoods (poor vs. wealthy), and upward and downward mobility (as defined by neighborhood median income).
• Building reliable social measures applicable across studies. In some cases this may mean tracking family composition over time (marriages, marriage break-up, remarriage, family size, and ages of children).
• Identifying siblings and twins to facilitate more complex research methodologies.

Using Place-of-Residence Data

Postal codes allow users of the MCHP research registry to infer the location of individuals at any specified date after 1970. Semi-annual updates allow capturing information on individuals who have notified Manitoba Health of a change in residence or provided an updated address as part of an encounter with the healthcare system. Consequently, research designs that involve linking individuals to their neighborhood context are relatively easy to implement.
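Residential mobility measures of the kind listed above, such as the number of moves and the number of years spent in poorer or wealthier neighborhoods, can be derived from a dated residence history. The sketch below is illustrative only. It assumes one area code per person per year and a lookup from area code to an income group; the function name, the area codes, and the group labels are all made up for the example.

```python
def mobility_summary(yearly_areas, area_income_group):
    """Summarize one person's residence history.

    yearly_areas: list of (year, area_code) pairs sorted by year, one per year
    area_income_group: dict mapping area_code to a label such as "low" or "high"
    """
    # A move is counted whenever the area differs from the previous year's area
    moves = sum(
        1
        for (_, previous), (_, current) in zip(yearly_areas, yearly_areas[1:])
        if current != previous
    )

    # Years of "exposure" to each neighborhood income group
    years_by_group = {}
    for _, area in yearly_areas:
        group = area_income_group.get(area, "unknown")
        years_by_group[group] = years_by_group.get(group, 0) + 1

    return {"moves": moves, "years_by_group": years_by_group}


# Made-up area codes and income groups, for illustration only
history = [(2000, "A1A"), (2001, "A1A"), (2002, "B2B"), (2003, "B2B"), (2004, "C3C")]
groups = {"A1A": "low", "B2B": "low", "C3C": "high"}
summary = mobility_summary(history, groups)
```

The same counting could be restricted to age intervals by filtering `yearly_areas` before the call.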
Variables pertaining to residential mobility, years living in a neighborhood with particular characteristics, and so on can be generated. Such recording of "exposure" is methodologically superior to relying on cross-sectional variables from one point in time. Variables measured at various times over relatively long periods may help resolve disagreements as to when, in the early life course, different factors might be occurring. To treat periods in the life course separately, counts of years in a neighborhood or in a particular social situation can be generated for different intervals (e.g., ages 0–1, 2–4, 5–9, etc.).

Constructing Reliable Social Measures

The MCHP research registry's capacity to place a Manitoba resident within a family structure at any given time permits development of a number of social variables:

• Number of years on income assistance
• Average household income
• Number of children in the family
• Birth order (particularly being first-born)
• Mother's marital status at birth of first child
• Number of years living in a single-parent family
• Age of mother at birth of first child
• Family structure (or number of years in different types of families)
• Number of family structure changes (parental separations, remarriages)
• Number of years living with a disabled parent
• Number of household location moves
• Immigrant status
• Neighborhood characteristics

The longitudinal nature of the data and the repeated measurements that it facilitates constitute a real strength. For example, since the short-term effects of income assistance and welfare recipiency differ from long-term effects, being able to differentiate between the two can be critical. A few survey-based studies have counted years in particular types of families or neighborhoods in an effort to quantify the impact of various social environments, but longitudinal administrative data may make collecting this data more efficient and reliable (Roos et al. 2008).

One of the advantages in using administrative social variables is that they can help adjust for differences in family background. For example, in one study it was found that nine variables (gender, income assistance, receiving services/children in care, family structure, number of siblings, birth order, mother's age at first birth, residential mobility, and the neighborhood-based Socioeconomic Factor Index) accounted for as much variance in the Manitoba Language Arts achievement test as a similarly sized set of variables from survey data (Roos et al. 2008, 2013). That is, administrative data were as good at predicting the outcome as were the survey data.

Identifying Siblings and Twins

Birth cohorts, siblings, and twins may be defined from one or more sources of administrative data. In Manitoba, two hospital separation (discharge) abstracts – one for the mother and one for the infant – are produced for each in-hospital birth. These records can be checked against each other and against the Manitoba Health Insurance Registry. Two siblings with the same birth date are designated as twins.

Sibling and twin studies are important because omitted variables and measurement error that occur in studies that do not examine siblings and twins are likely to bias the coefficients attached to measured variables. The effects of certain variables such as birth weight may be overestimated when other variables associated with the family are not appropriately controlled for. In Canada, the availability of statistical power (a large N), heterogeneity – e.g., wide variations in population characteristics across areas – and place-of-residence data enhances the possibility for sophisticated research designs employing siblings and twins (Roos et al. 2008).

Beyond Health Research

Since early childhood conditions are known to be important, yet childhood disease is
relatively rare, information on school performance and income assistance can provide a window on well-being during childhood and adolescence. Attention to the socioeconomic gradient over the early life course builds on the hypothesis that the relatively affluent will disproportionately take advantage of and benefit from health and educational programs. In other words, wealthy individuals are more likely to be exposed to and take advantage of new initiatives and opportunities that make them healthier, and low-income people are less likely to do so.

Summing Up

The Population Health Research Data Repository housed at MCHP is one of the most established and comprehensive provincial repositories of health and social data in Canada. Currently, more than 200 research projects are being conducted using these data. In addition to the policy-relevant research produced in the form of deliverables to the Manitoba government (discussed in the next chapter), numerous high-quality academic papers are published in areas of health services research and population health. Increasingly, studies are focusing on the social determinants of health as more social data becomes available (some of these are listed in Roos et al. 2008).

Research units like MCHP that house large databases accessed by many investigators and graduate students can benefit from the creation of web-based research resources to compile and disseminate common organizational knowledge. Creating a single point of access to the knowledge generated from a wide range of projects is important for ensuring a high level of productivity and methodological excellence.

References

Brownell M, Chartier M, Au W, Schultz J. Evaluation of the healthy baby program. 2010. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP-Healthy_Baby_Full_Report_WEB.pdf. Accessed 29 May 2013.
Brownell MD, Nickel NC, Chateau D, et al. Long-term benefits of full-day kindergarten: a longitudinal population-based study. Early Child Dev Care. 2014;185:291–316.
Chartier M, Finlayson G, Prior H, McGowan K-L, Chen H, de Rocquigny J, Walld R, Gousseau M. Health and healthcare utilization of francophones in Manitoba. 2012. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP_franco_report_en_20120513_WEB.pdf. Accessed 29 May 2013.
Chateau D, Doupe M, Walld R, Soodeen RA, Ouelette C, Rajotte L. Projecting personal care home bed equivalent needs in Manitoba through 2036. 2012. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP_pch_days_report_WEB.pdf. Accessed 29 May 2013.
Daas PJH, Arends-Tóth J, Schouten B, Kuijvenhoven L, Statistics Netherlands. Quality framework for the evaluation of administrative data. 2008. http://www.pietdaas.nl/beta/pubs/pubs/21Daas.pdf
Doupe M, Fransoo R, Chateau D, Dik N, Burchill C, Soodeen R-A, Bozat-Emre S, Guenette W. Population aging and the continuum of older adult care in Manitoba. 2011. http://mchp-appserv.cpe.umanitoba.ca/reference/LOC_Report_WEB.pdf. Accessed 29 May 2013.
Fransoo R, Martens P, The Need to Know Team, Prior H, Burchill C, Koseva I, Bailly A, Allegro E. The 2013 RHA indicators atlas. 2013. http://mchp-appserv.cpe.umanitoba.ca/reference//RHA_2013_web_version.pdf. Accessed 20 Nov 2013.
Holman CDJ, Bass AJ, Rouse IL, et al. Population-based linkage of health records in Western Australia: development of a health services research linked database. Aust N Z J Public Health. 1999;23:453–9.
Jutte DP, Roos LL, Brownell MD. Administrative record linkage as a tool for public health research. Annu Rev Public Health. 2011;32:91–108.
Lyman JA, Scully K, Harrison JH. The development of health care data warehouses to support data mining. Clin Lab Med. 2008;28:55–71.
Martens PJ, Sanderson D, Jebamani L. Mortality comparisons of First Nations to all other Manitobans: a provincial population-based look at health inequalities by region and gender. Can J Public Health. 2005;96:S33–8.
Martens PJ, Bartlett J, Burland E, Prior H, Burchill C, Huq S, Romphf L, Sanguins J, Carter S, Bailly A. Profile of metis health status and healthcare utilization in Manitoba: a population-based study. 2010. http://mchp-appserv.cpe.umanitoba.ca/reference/MCHP-Metis_Health_Status_Full_Report_(WEB)_(update_aug11_2011).pdf. Accessed 29 May 2013.
Martens PJ, Bartlett JG, Prior HJ, et al. What is the comparative health status and associated risk factors for the Metis? A population-based study in Manitoba, Canada. BMC Public Health. 2011;11:814.
Oreopoulos P, Stabile M, Walld R, et al. Short, medium, and long term consequences of poor infant health: an analysis using siblings and twins. J Hum Resour. 2008;43:88–138.
Roos NP. Establishing a population data-based policy unit. Med Care. 1999;37:JS15–26.
Roos NP, Roos LL, Henteleff PD. Elective surgical rates: Roos LL, Hiebert B, Manivong P, et al. What is most
do high rates mean lower surgical standards? N Engl J important: social factors, health selection, and adoles-
Med. 1977;297:360–5. cent educational achievement. Soc Indic Res.
Roos LL, Fisher ES, Brazauskas R, et al. Health and 2013;110:385–414.
surgical outcomes in Canada and the United States SAS Institute Inc. SAS data integration studio 3.3: user’s
Summer. Health Aff (Millwood). 1992;11 guide. 2006. http://support.sas.com/documentation/
(Summer):56–72. onlinedoc/etls/usage33.pdf. Accessed 12 Aug 2014.
Roos LL, Brownell M, Lix L, et al. From health research to Wolfson M. A shining light in Canada’s health information
social research: privacy, methods, approaches. Soc Sci system. Healthcare Policy. 2011;6:8–13.
Med. 2008;66:117–29.
3 Health Services Data, Sources and Examples: The Institute for Clinical Evaluative Sciences Data Repository

Karey Iron and Kathy Sykora
Contents
Introduction
Strengths and Challenges of Using Health Administrative Data for Health Services Research
The ICES Data Repository
Privacy, Data Governance, and Access to Data at ICES
Record Linkage and Desensitizing the Data for Research
Data Documentation, Metadata, and Data Quality Assessment
Data Quality Assessment in the Literature
New Data, New Uses, and New Ideas
References
Abstract

Under the 1984 Canada Health Act, health services deemed essential for all residents are universally paid for by the provinces. Canadian provinces, and others around the world, routinely collect data that allow them to administer health services provided to their populations. Generally, this spectrum of health administrative data includes information about people and their use of the health system, such as physicians’ billing claims, hospital discharges, emergency and ambulatory care, home care, complex continuing and long-term care, and claims for prescription drugs, for example. When linked to each other, these highly comprehensive data may be used to answer health system and research questions such as: Are those who require care getting the care they need? Is the care provided timely and based on evidence? What organizational aspects of the healthcare system could improve care? This chapter describes the uses of health administrative data for research, its benefits and limitations compared to traditional research data, the concept of linking datasets for health services research, emerging data quality scientific methods, and the caveats in interpreting administrative data. Issues of data governance and privacy, data documentation, and quality assessment are presented. These concepts will be illustrated through the example of the data held in the Institute for Clinical Evaluative Sciences (ICES) Data Repository in Ontario.

K. Iron (*)
College of Physicians and Surgeons of Ontario, Toronto, ON, Canada
e-mail: kiron@cpso.on.ca

K. Sykora
Toronto, ON, Canada

https://doi.org/10.1007/978-1-4939-8715-3_4

Introduction

Under the 1984 Canada Health Act, health services deemed essential for all residents must be paid for by the provinces and territories. In order to manage, administer, and pay for health services for their populations, the provinces and territories routinely collect information about health system transactions. Generally, this spectrum of health administrative data includes information about people and their use of the health system, physicians’ billing claims, hospital discharges, emergency and ambulatory care, home care, complex continuing and long-term care, and claims for publicly funded prescription drugs, to name a few. Other large and routinely collected datasets are also generated and used by various organizations throughout the health system to understand how health services are being used. Examples include public health program information, agency-level client information, population-based registries and surveys, electronic medical records, and, most recently, large genomic biobank data. The power of these data is amplified when they are linked to each other to understand the whole picture of healthcare delivery. According to Friedman et al. (2005), when these data are used to generate “health statistics,” they create “fundamental knowledge about the health of populations” that informs the health system, “influences on health” that guide policy decisions, and “interactions among those influences” that guide program development and clinical care (Friedman et al. 2005). For example, linked data may answer health system, population-based, and clinical research questions such as: Are patients getting the care they need? Is the care timely and based on optimal evidence? How might the system be better organized to optimize care? Is the care provided equitable across the province?
This chapter focuses on the following areas:

• Strengths and challenges of using health administrative data for health services research
• Privacy and data governance
• Record linkage and desensitizing the data for research
• Data documentation and data quality assessment
• New data, new uses, and new ideas

These concepts will be illustrated through the example of the data held in the Data Repository at the Institute for Clinical Evaluative Sciences (ICES), a research organization in Toronto, Ontario, Canada, that collects and manages a large data repository that is used to generate evidence to improve health and the health system in Ontario.

Strengths and Challenges of Using Health Administrative Data for Health Services Research

In Ontario and elsewhere in Canada, health administrative data are used not only for managing the health system but also for health services research, policy development, and healthcare planning. Since most residents are eligible for healthcare, the data reflect full coverage of publicly funded service transactions. The data represent actual encounters with the healthcare system and are therefore population based, free from recall bias, readily available, consistent over time, and inexpensive to collect and use for secondary purposes compared to traditional research data. Generally, health administrative data are collected using standardized coding metrics, especially when the data are collected by a single source (such as a provincial health authority or ministry). Using one dataset alone is useful for health system surveillance and monitoring, but the real power of using administrative data lies in the ability to link multiple datasets at the individual person level and across healthcare sectors. In his seminal work, Dunn describes record linkage as follows:
Each person in the world creates a book of life. This book starts with birth and ends with death. Its pages are made of the records of principal events in life. Record linkage is the name given to the process of assembling the pages into a volume. (Dunn 1946)

The linkage of data enables researchers to answer questions based on information from different parts of the healthcare system. Without linkage, we can look at hospitalization data and ask: “How many people were admitted to hospital with a heart attack and what hospital care did they receive?” But with linked data, we can answer more involved questions, such as: “Of the people who were hospitalized with a heart attack, who received appropriate follow-up with a specialist? Who was prescribed the appropriate medication on a follow-up basis? What were their comparative mortality rates 5 or 10 years later?”
Linked data also allow for the creation of algorithms that generate cohorts of people with similar health conditions (such as diabetes, asthma, congestive heart failure, or opioid use) and/or healthcare experiences (such as mammography or hip replacement). These algorithms can be enriched when linked data, such as physician claims and hospital inpatient records, are used. Typically, algorithms are validated by primary data collection from medical charts at physician offices or in hospitals. Validated algorithms applied to annual or updated administrative data provide an efficient way to generate cohorts that would otherwise be very expensive to collect over time.
Using administrative health data for research has some challenges, however. Since the data are collected for administrative purposes, they are observational and therefore usually retrospective. They usually do not contain the clinical or sociodemographic detail (such as smoking, socioeconomic status, or medical test results) necessary to answer some research questions or to account for potential confounders of health outcomes. Administrative data may be prone to misclassifying individuals assigned to disease-based cohorts without adequate physician or hospital chart-abstracted person-level record validation. Finally, special legal authorities, privacy laws, and permissions are required to collect and access these datasets because even when the identifiers in these records are encoded, in rare cases, individual linked records could potentially identify individuals if proper methodologies and access controls are not employed. As noted by Chamberlayne et al., “The ethical issues surrounding access to a resource made up of linked data are more complex than those pertaining to access to a single data source” (Chamberlayne et al. 1998).
Comprehensive and routinely updated documentation, or metadata, is required to fully understand the rationale for the original collection of each variable, yet such documentation is elusive at best and not always available to researchers. Comprehensive metadata is necessary to develop an accurate analytic plan, to assess face validity, and to ensure a reasonable interpretation of the data once analyzed. Currently, there are methodologies in the emerging field of “data quality science” to better standardize the assessment of administrative health data quality and to understand whether the data are “fit” to answer the intended research questions (Lix et al. 2012).

The ICES Data Repository

The Institute for Clinical Evaluative Sciences (ICES) in Ontario, Canada, is a not-for-profit research institute and the steward of a secure and accessible data repository that allows for the development of evidence that makes “policy better, health care stronger and people healthier” (from ICES website www.ices.on.ca; March 2014). ICES is funded primarily by the Ontario Ministry of Health and Long-Term Care with special initiative funds and investigator-driven peer-reviewed grants. As of April 2014, there were approximately 180 affiliated faculty from around Ontario and about 160 staff whose expertise includes data linkage and analysis, biostatistics, health informatics, epidemiology, project management, research administration, information technology, and database development and support. ICES science is organized across clinical program areas: cancer, cardiovascular, primary care and population health, chronic disease and
pharmacotherapy, health system planning and evaluation, kidney, dialysis and transplantation, and mental health.
Most of the ICES staff are located at ICES Central on the campus of Sunnybrook Health Sciences Centre in Toronto, Ontario, and other affiliated ICES scientists and staff are located across the province: Downtown Toronto, Queen’s University in Kingston, the University of Ottawa, Western University in London, and new sites developing at McMaster University in Hamilton and at the Northern Ontario School of Medicine in Thunder Bay.
ICES is the steward of a large comprehensive and linkable data repository used for research and evaluation. The ICES Data Repository consists primarily of health administrative data that are created in the day-to-day interactions with the healthcare system – billings of physicians to the Ontario Health Insurance Plan (OHIP), drug claims to the Ontario Drug Benefit (ODB) Program, discharge summaries of hospital stays (DAD) and emergency department visits (NACRS), and much more. With almost complete health services data coverage of the annual Ontario population from 1991 across most publicly funded healthcare sectors, ICES scientists, analysts, and staff apply scientific methods to advance the evidence for improvements in health and healthcare. The collection and use of these administrative data is authorized by ICES’ designation as one of four prescribed entities in Ontario under the Personal Health Information Protection Act 2004 (PHIPA, s.45) – this means that ICES may collect and use personal health information for the purposes of evaluating and monitoring the health system, with adequate data governance permissions and controls.
The ICES Data Repository has the following attributes:

• Individual level: The data reflect people and their health and healthcare experiences, similar to data repositories in British Columbia, Manitoba, Quebec, Nova Scotia, and Newfoundland.
• Longitudinal: Like other jurisdictions, the ICES Data Repository includes most healthcare experiences over time. The ICES Repository goes back to 1991 and in some cases, earlier.
• Population based: In 2013, there were over 13 million people in Ontario, and since most of the people who are eligible for healthcare are represented, this makes the ICES Repository the largest repository of its type in Canada.
• Comprehensive health sector data: Much of the administrative data in the ICES Repository represent publicly funded physician, hospital and health-based community care, as well as claims for prescription drugs for people aged 65 and over. Population and condition-specific registries are also included, where available. In some provincial data repositories, such as at the Manitoba Centre for Health Policy at the University of Manitoba, additional government administrative data outside the health sector, such as education and social support, are routinely included. At ICES, discussions to broaden the collection and use of data beyond the health sector have begun.
• Desensitized and linkable with coded identifiers: Individuals in the Repository are uniquely identified with an ICES-specific key number (IKN) which is obtained by encoding the Ontario health card number using a proprietary encoding algorithm. ICES in-house professionals replace any direct identifiers attached to the incoming data with a unique IKN that is used to link person-level records from one dataset to another. This in-house expertise that spans informatics and research has allowed for the easy integration of data with high data quality standards.
• Easy to use: All data are in an SAS format and ready to use in an analytic environment – these data are linkable to each other using a unique person-level identifier and ready to use after appropriate data access approvals. Having the data repository organized in this manner creates efficiencies for research as the data are already in record-level format.
• Secure and privacy protected: Ontario privacy legislation (Personal Health Information Protection Act – PHIPA 2004) allows for ICES to
collect direct identifiers from data custodians for the purpose of assigning an IKN to each data record. ICES’ privacy policies, practices, and procedures and its prescribed entity status under PHIPA allow ICES to function with the approval of the Ontario Information and Privacy Commissioner (IPC). A full review of ICES privacy and security policies and procedures is undertaken every 3 years, with the approval letter from the IPC published on the ICES website (more detail on this below). Expert information and technology staff are on site to ensure the security and smooth maintenance of the research platform.
• Professional data management: Data quality and informatics experts apply the highest data quality standards and are leading in developing metadata and other documentation for the analysts and scientists to use.

The comprehensive collection in the ICES Data Repository is the basis of population-based examination of groups of people with particular health conditions (such as diabetes or cancer) or people who have had similar health services experiences (such as hip or knee surgery) or how the health system is working (performance indicators or continuity of care) and outcomes (length of hospital stay, emergency department visits, or death) over time.
The records in the ICES Data Repository include:

• Records of Ontarians’ day-to-day interactions with the healthcare system: Physician claims submitted to the Ontario Health Insurance Plan, medical drug claims to the Ontario Drug Benefit Program, discharge summaries of hospital stays and emergency department visits, claims for home care, information about long-term care, and more.
• Special registry collections include Ontario Cancer Registry (Cancer Care Ontario), the Ontario Stroke Registry (ICES collection), Registry of the Cardiac Care Network, federal immigration information, an Ontario birth outcomes registry (Better Outcomes Registry and Network – BORN), and others.
• Derived chronic condition cohorts have been developed at ICES using linked data algorithms that have been validated by using primary data collection as a gold standard.
• Detailed clinical data have been extracted from electronic medical records and through ICES primary data collection projects.
• Population and demographic data through the Ministry of Health’s Registered Persons Database (RPDB) are used to characterize study subjects and to generate denominators for rate calculation.
• Additional clinical data, agency client-level data, and research data collections that are linkable to longitudinal outcome data are included on a project-by-project basis.

A full listing of the data in the ICES Data Repository can be found on the ICES website.

Privacy, Data Governance, and Access to Data at ICES

ICES is designated as a prescribed entity under the Ontario Personal Health Information Protection Act (PHIPA 2004 (s. 45[1] and O. Reg 329/04 section 18[1])). Because ICES is a prescribed entity, health information custodians (HICs), such as healthcare practitioners, hospitals, laboratories, nursing homes, and community care access centers, including the Ministry of Health and Long-Term Care, may disclose personal health information (PHI) and associated information relating to their patients to ICES for purposes of “analysis or compiling statistical information with respect to the management of, evaluation or monitoring of, the allocation of resources to or planning for all or part of the health system, including the delivery of services” (PHIPA s.45(1)). Health Information Custodians and other data partners may also disclose personal health information and associated clinical data to ICES that are collected through approved research projects under the appropriate oversight of a Research Ethics Board (REB) and the authorities prescribed under PHIPA (s. 44(1)). As with all prescribed entities in Ontario, ICES security and privacy standard
operating procedures and policies are reviewed and approved by the Information and Privacy Commissioner of Ontario every 3 years.
The authority for ICES to hold and integrate data lies within detailed data sharing agreements or memoranda of understanding with every data partner. A data sharing agreement executed for every dataset integrated into the Repository outlines the legal authorities, the data collection and transfer methods, the desensitization procedures, and the use for each new dataset that ICES collects. The most comprehensive data sharing agreement is with the Ontario Ministry of Health and Long-Term Care, and this agreement outlines ICES’ responsibility in using the Ontario health administrative data.
ICES’ policies, practices, and procedures that prescribe the governance of the Repository overall and of each dataset at ICES are strictly followed: the use of the data at ICES is limited to the agreed-upon purpose and use defined in the data sharing agreement under which the data is authorized for ICES to collect.

Access to ICES Data
Research at ICES is generally managed within clinical program areas: cancer, cardiovascular, population health and primary care, chronic disease and pharmacotherapy, health system planning and evaluation, kidney, dialysis and transplantation, and mental health and addictions. As well, ICES currently has four active satellite sites: ICES UofT at the University of Toronto, ICES Queen’s in Kingston, ICES uOttawa, and ICES Western (ICES at McMaster University and ICES North at Lakehead/Laurentian University are being developed). Scientists and staff are affiliated with these programs. When a fully formed project is contemplated by an ICES scientist, the feasibility and rationale for its implementation is vetted by ICES program leads and management staff: Is the project aligned with the ICES mission? Can the question be answered with the data available (or new data collected)? What is the human resource capacity to implement the project (analyst and project management or coordination resources)? Are there adequate funds to implement the project? After these criteria are vetted, a privacy impact assessment (PIA) is completed by research teams outlining the project research protocol, the data being contemplated for the project, the output of the research, and the foreseeable privacy impacts or risks. The ICES privacy office reviews all privacy impact assessments and provides recommendations and final approval before any data can be accessed for projects. In some cases and according to data sharing agreements, the data custodian is notified or approves the use of their data for ICES projects and they receive a copy of reports that utilized their data. All ICES projects at a minimum undergo Research Ethics Board (REB) retrospective review; currently, the Sunnybrook Health Sciences Centre REB is the overseeing body for most ICES projects.

Record Linkage and Desensitizing the Data for Research

The ICES Data Repository is continuously growing. Mostly, the data collected at ICES initially contains direct identifiers so that the records attributed to a unique individual can be assigned the correct ICES key number (IKN) and the direct identifiers removed. This process of desensitizing data for research at ICES may be facilitated by record linkage (also known as record matching), a process by which records from two files are combined so that an individual’s information from one file can be merged with the same individual’s information from another file. For example, you may have one file of demographic data and another file of diagnostic patient information, and you want to combine and analyze them together. If both files contain a precise identifier that refers to the same person (such as health card number or social insurance number), the linkage task is relatively easy. This is called deterministic record linkage.
At ICES, not all individual-level data received contain Ontario health card numbers. Frequently, individuals are identified in the data records by their name, postal code, and other “soft” identifiers. Before data can be used for research, the IKN for these records must be found. Linkage to other fields may be used to match individuals from
different files. These, as listed below, come with some challenges.

Last name:
– Not unique between people (common names may be shared by numerous individuals)
– Subject to misspelling
– May change over time (e.g., at marriage)
First name:
– Similar issues as last names
– Nicknames may be used in one file and full names in the other
Date of birth:
– Subject to transcription and other errors
– Imprecise when supplied by someone other than the individual (e.g., family member at hospitalization)
– May be incomplete
– Not unique
Date of death:
– Similar issues as date of birth
– May only be applicable to a portion of the file
Location of personal residence such as postal code:
– Subject to change over time (as people move)
– Nonunique, in particular within families

To combine files that only contain imprecise direct identifiers such as those above, probabilistic record linkage (PRL) may be used. Another common term for PRL is “fuzzy matching.”
Probabilistic record linkage methodologies incorporate the relative frequencies of field values to compute their sensitivity and the positive predictive value and then combine these to form linkage weights for each pair of records. For example, if two records contain the same name, a greater weight is given if that name is rare in the population being studied. Conversely, two records sharing the same value that is quite common (e.g., birth year or female gender) may not contribute much to the linkage weight. Various encoding algorithms and string comparators are used to deal with alternate spellings, nicknames, and common transcription errors. Blocking is used to reduce the total number of comparisons; and clerical review is applied to pairs that did not yield a conclusive weight.
The Registered Persons Database (RPDB) was described earlier in this chapter. ICES receives a number of RPDB files monthly and thus has a cumulative record of the names, postal codes, and other demographic information for all health card holders in Ontario over time. This file is an essential component of making files without health card numbers (HCNs) useable for research.
Figure 1 illustrates the process of assignment of the ICES key number. Once an IKN is assigned to a record and the original direct identifiers are removed, that record is considered “desensitized” and can be (deterministically) linked to all other records in the ICES Data Repository that pertain to the same person. This facilitates the creation of analytic datasets that are prepared to answer specific research questions.
Other institutions that may not have the equivalent of the RPDB may find other solutions. For example, Chamberlayne et al. (1998) describe the creation of a Linkage Coordinating File (LCF) at the Centre for Health Services and Policy Research at the University of British Columbia. This file was created by applying probabilistic record linkage to data from various sources and contains personal identifiers and a unique person-level index. The file can be used to facilitate the linkage of other person-level files, in a way similar to the RPDB.

Data Documentation, Metadata, and Data Quality Assessment

Data Quality Assessment in the Literature

There are many frameworks and evaluation strategies for data quality, and many are created for specific purposes and types of data. Data quality assessment has been defined as “the whole of planned and systematic procedures that take place before, during and after data collection to be able to guarantee the quality of data in a database ... for its intended use” (Arts et al. 2002).
Fig. 1 Process for assignment of ICES key number at ICES, with and without Ontario health card number
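The two paths in Fig. 1 (records with and without a health card number) can be sketched in Python. This is a minimal illustration under stated assumptions, not the ICES implementation. The actual encoding algorithm is proprietary, so a keyed hash (HMAC-SHA256) stands in for it here. The registry rows, the field names, the agreement probability `M_PROB`, and the threshold are all hypothetical. The pair weights use the Fellegi-Sunter log-ratio form with value-specific frequencies, which is one common formulation of probabilistic record linkage and is swapped in for illustration; string comparators and blocking are omitted.

```python
import hashlib
import hmac
import math
from collections import Counter

SECRET_KEY = b"hypothetical-secret"  # assumption: stand-in for the proprietary encoding

def encode_hcn(hcn):
    """Deterministically encode a health card number into a key (stand-in for an IKN)."""
    return hmac.new(SECRET_KEY, hcn.encode(), hashlib.sha256).hexdigest()[:16]

# Hypothetical registry playing the role of the RPDB: one row per card holder.
REGISTRY = [
    {"hcn": "1000000001", "last": "smith", "first": "ann", "dob": "1950-02-01", "postal": "M4N3M5"},
    {"hcn": "1000000002", "last": "smith", "first": "john", "dob": "1950-02-01", "postal": "K7L3N6"},
    {"hcn": "1000000003", "last": "zielinski", "first": "ann", "dob": "1981-11-30", "postal": "M4N3M5"},
]
FIELDS = ["last", "first", "dob", "postal"]
M_PROB = 0.95  # assumed probability that a field agrees when two records truly match

# Relative frequency of each value, used as the chance-agreement probability (u).
FREQ = {f: Counter(row[f] for row in REGISTRY) for f in FIELDS}

def pair_weight(record, candidate):
    """Sum per-field log2 weights; agreement on a rare value contributes more."""
    total = 0.0
    for f in FIELDS:
        if not record.get(f):
            continue  # a missing field contributes nothing
        u = min(FREQ[f][candidate[f]] / len(REGISTRY), 0.99)  # clamp to avoid division by zero
        if record[f] == candidate[f]:
            total += math.log2(M_PROB / u)
        else:
            total += math.log2((1 - M_PROB) / (1 - u))
    return total

def assign_ikn(record, threshold=3.0):
    """Return (key, method): deterministic if an HCN is present, else probabilistic."""
    if record.get("hcn"):
        return encode_hcn(record["hcn"]), "deterministic"
    scored = sorted(((pair_weight(record, c), c) for c in REGISTRY),
                    key=lambda pair: pair[0], reverse=True)
    best_weight, best = scored[0]
    unique_best = len(scored) == 1 or best_weight > scored[1][0]
    if best_weight >= threshold and unique_best:
        return encode_hcn(best["hcn"]), "probabilistic"
    return None, "clerical review"  # inconclusive weight: leave for manual review

def desensitize(record, ikn):
    """Drop direct identifiers and keep only the key plus the remaining fields."""
    kept = {k: v for k, v in record.items() if k not in ("hcn", *FIELDS)}
    return {"ikn": ikn, **kept}
```

In this sketch, a record carrying a health card number is keyed directly, a record with only name, date of birth, and postal code is keyed through its best registry match, and a record whose best weight is low or tied is routed to clerical review. Once keyed, `desensitize` leaves a record that can be joined to any other keyed record on `ikn` alone.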
Holt and Jones suggested that “data quality is not so much an absolute property of a statistical estimate but is related to the purpose for which the estimate is used” (Holt and Jones 1999, p. 24). When using administrative data, it is difficult to “guarantee” data quality; however, a robust assessment focusing on the linked data’s intended use and purpose will at least characterize the quality in an interpretable way.
Generally, the following domains and questions need to be examined when assessing data quality:

• Accuracy: Do the data reflect the truth?
• Validity: Do the data reflect what they were designed to reflect?
• Completeness: Do the data include all records that are collected? Have the fields been well populated?
• Comprehensiveness and coverage: Do the data cover 100% of the intended population? Alternately, do they constitute a representative sample?
• Reliability: Are the data reproducible?
• Timeliness: Is there a short lag between data collection and use?
• Linkability: Can the data be connected to other data to reflect healthcare system complexity?
• Privacy: Do the data adhere to jurisdictional privacy laws? Are there appropriate and auditable privacy preserving procedures and practices? Has the risk been sufficiently reduced by removing sensitive information?
• Usability: Are the data organized, accessible, and provided in a format that can be easily used?
• Currency: What is the time lag between the time period reflected in the data and the time that data are ready for use?

A number of organizations have developed data quality frameworks to assess the data in their repositories. For example, the Canadian Institute for Health Information (CIHI) framework includes dimensions of relevance, timeliness, usability, accuracy, and comparability within an envelope of planning, implementing, and assessing (CIHI 2009).
Researchers at the Manitoba Centre for Health Policy have developed a data quality framework that has been broadly adopted by ICES (Azimaee et al. 2013). In that framework, dimensions of data quality are divided between those that can be assessed at the database level, versus those that can be assessed at the research level. In particular they described database-specific data quality dimensions as:
• Accuracy: Completeness (rate of missing values) and correctness (invalid codes, invalid dates, out of range, outliers, and extreme observations)
• Internal validity: Internal consistency, stability across time, and linkability
• External validity: Level of agreement with literature and available reports
• Timeliness: Currency of posted data, time to acquisition, and time to release for research purposes
• Interpretability: Availability, quality and ease of use of documentation, policies and procedures, format libraries, metadata, and data model diagrams

Data quality should also be assessed within a specific research project, where conclusions may be drawn about the accuracy and reliability of data, measurement error, bias, and agreement with other databases and other sources. According to Roos and others (Roos et al. 1989, 2005), data quality assessment can be carried out through comparisons of linked information across datasets used for the project.

In 2007 ICES researchers undertook an environmental scan of data quality assessments (Iron and Manuel 2007). Their conclusions from the environmental scan were:

• Quality should be routinely and systematically evaluated for all generally-used data.
• Data quality is contextual
• The evaluation and interpretation of data quality depends on the purpose for which the data are being used.
• The constructs of accuracy and validity are often confused.
• Accuracy (or truth) is an elusive construct and should not be expected.
• The most common ways to evaluate validity are concordance, comparability and inter-database reliability.
• Linked data, where available, should be used to evaluate data quality (when primary data collection is not feasible).
• There is a need for more investigation into evaluating data quality.
• The relevance of every data quality assessment requires full discussion. (p. 8)

They proposed an end-to-end data quality framework for projects using linked data. The Quality Assessment of Administrative Data (QuAAD) framework leverages the traditional data quality framework and adds a number of domains that aim to help develop data partnerships and improvements for health data:

• Context: What is the purpose of the project and data evaluation? Who are the key stakeholders and who is using the data? What is the purpose of the data collection? What is the political environment?
• Issues: Who is the target population and where do they live? What are the outcomes of the project (for example, quality of care, appropriateness, timeliness, mortality, and service use)? What are the predictors of and influences that may affect the outcomes of the project, such as system characteristics?
• Data and sources: What data are being used? Who are the data custodians? What data elements are being used? Will the data be linked? What are the authorities for data use?
• Measurement: These are the usual data quality indicators: e.g., timeliness, reliability, completeness.
• Appraisal: Summary of data quality; stakeholder report; identification of data improvement opportunities
• Implementation: If opportunities or gaps are identified, how will these be addressed? (Discussions and next steps with data custodians)

Data Documentation, Data Quality Assessment, and Metadata at ICES

Currently at ICES, a systematic and holistic approach is taken to documenting each dataset and assessing its completeness, correctness, stability, and linkability. Figure 2 summarizes the approach taken. Metadata information (such as the description of datasets, variables, and valid values) is extracted from the data repository into a metadata repository. This information in turn is used to produce the data dictionary, as well as data quality assessments. By utilizing a “single source of truth,” consistency between the data, the data dictionary, and the data quality assessments is assured.

Data Documentation at ICES

A detailed data dictionary is an essential tool in research. The IBM Dictionary of Computing defines a data dictionary as “a centralized repository of information about data such as meaning, relationships to other data, origin, usage, and
format” (ACM 1993). The information in the data dictionary should contain, at a minimum:

• The name and brief description of each file
• A list of fields and their description
• For each field, a list and description of valid values
• Unstructured or semi-structured comments with additional information

The files in the ICES Data Repository are stored as SAS datasets. ICES leverages certain features of the SAS software to create data dictionaries that correspond dynamically to the data files. In particular:

• Dataset labels are used to describe the contents of each file.
• Similarly, variable labels describe each field.
• A central format catalogue contains descriptions of all valid discrete values of each field of all datasets.

Internal experts for each of the datasets have been identified. These individuals enrich the data dictionary with their insights and experience by means of comments.

A .NET application displays this information in a user-friendly online data dictionary. Additional information, such as the expected date of the next update, is added manually. Most of the information is displayed for the public on the ICES website. Certain fields (such as the name of the internal expert) are only available internally. Figure 3 is an excerpt from the ICES Data Dictionary describing the variable admission date (ADMDATE) in the OHIP data library.

Fig. 2 Holistic approach to data documentation and data quality assessment at ICES (diagram boxes: Data Repository, Metadata Repository, Data Quality, Data Dictionary)

Fig. 3 Excerpt of ICES Data Dictionary (Source: ICES Data Dictionary https://datadictionary.ices.on.ca/Applications/DataDictionary/Variables.aspx?LibName=OHIP&MemName=&Variable=ADMDATE)

This approach to data documentation at ICES has a number of advantages over the more
traditional manual methods. Since information is based on actual data elements, there is internal consistency between the data and the documentation. The process of creating a data dictionary for a new dataset is automated and quick, so that a data dictionary can be made available as soon as the data are posted. Finally, if errors are discovered, they are corrected in both the data and the documentation.

Data Quality Assessment at ICES

ICES’ holistic approach to assessing data quality includes a variety of tools that are used to assess and document data quality, including:

• All the data elements in a dataset are displayed in a “VIMO report,” which summarizes the valid, invalid, missing, and outlier rates. Examples of invalid values are listed. Simple descriptive statistics are also displayed and frequencies or histograms are linked to each field. ID variables are highlighted, and their uniqueness status is described.
• A trend analysis of the number of observations over time is performed, and the results are displayed graphically.
• The percent of records that are linkable to the rest of the ICES Data Repository is displayed over time.
• Missing values over time are presented visually, so substantial changes can be easily detected.
• Content experts are identified for each of the datasets. These content experts are expected to be familiar with the data quality assessment for their dataset and detect any issues that need to be addressed.
• All data users participate in a data blog, in which questions and issues are discussed and, when appropriate, acted upon.

Figure 4 illustrates a VIMO assessment of a client intake dataset. Variable names are hyperlinked to additional univariate descriptions. For example, for numeric values, a histogram is presented, and for nonunique ID variables, frequencies of the number of records per ID are displayed.

New Data, New Uses, and New Ideas

Health administrative data, particularly in the context of universal healthcare coverage, present a tremendous opportunity to conduct health and healthcare research. Linkable population-based data, with the appropriate privacy and security safeguards, are a resource for examining population- and disease-based cohorts, trends in health services utilization, prevalence and incidence trends, and effects of policy and system changes, among others. Expertise and care must be applied to use such data effectively and optimally.

Administrative data are also collected outside the health sector for managing social programs or educational systems. As with many similar data repositories in Canada and around the world, ICES is exploring the expansion of its linkable data holdings to include non-health administrative data from across the provincial and federal government and social service agencies. For example, a new research program at ICES focusing on mental health and addictions (MHA) was launched in 2013. In this program, integrating community addictions and mental health agency data with health data is critical to understanding prevention, early detection, and timely and sustained appropriate care, which in many cases is delivered in a community setting outside the medical model. Although much of the routine health data to support this program already exists at ICES, a comprehensive evaluation of the full spectrum of MHA care requires linkable person-level data generated from, for example, the education, social support, youth justice, and child and youth services sectors.

Around the world, discussions about linking biobank and genomic data, electronic medical record data, and other large data collections with each other and with administrative data are propelling the field of big data repositories and analytics into new and uncharted paradigms. Innovative data collection tools, dynamic and privacy-protecting record linkage models, data use and governance frameworks, and technologies are quickly advancing to keep up with the amount and the scope of data being generated and the research and private sector demands that depend on linking disparate datasets.
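Since these uses depend on linkage, the linkable-record rate that ICES tracks over time can be made concrete with a small example. The following is only a minimal sketch in Python, not the ICES implementation (which the text says is built on SAS): the record layout (`year`, `key`), the function names, and the 5-point drop threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict

def linkable_rate_by_year(records):
    """Return {year: percent of records that carry a usable linkage key}."""
    totals = defaultdict(int)
    linked = defaultdict(int)
    for rec in records:
        year = rec["year"]
        totals[year] += 1
        # Assumption: a record is "linkable" when its key is neither None nor blank.
        key = rec.get("key")
        if key is not None and str(key).strip():
            linked[year] += 1
    return {y: round(100.0 * linked[y] / totals[y], 2) for y in sorted(totals)}

def flag_drops(rates, threshold=5.0):
    """List years whose rate fell by more than `threshold` points from the prior year."""
    years = sorted(rates)
    return [y for prev, y in zip(years, years[1:]) if rates[prev] - rates[y] > threshold]

# Hypothetical toy data: two records in 2012, four in 2013.
records = [
    {"year": 2012, "key": "A1"}, {"year": 2012, "key": "B2"},
    {"year": 2013, "key": "C3"}, {"year": 2013, "key": None},
    {"year": 2013, "key": ""},   {"year": 2013, "key": "D4"},
]
rates = linkable_rate_by_year(records)
print(rates)
print(flag_drops(rates))
```

Plotting `rates` by year would give the kind of visual trend described above; the same pattern applies to missing-value rates per field.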
VIMO for dataset INTAKE (N=5,168,489)

ID variables (2)
Variable Name Variable Label % Valid % Missing Comments
IKN ICES Key Number 99.90 0.01 Not unique
RefId Encrypted ID 100.00 0.00 Unique
Numeric Variables (3)

Variable Name Variable Label % Valid % Invalid % Missing % Outlier Min Max Mean Median STD
AGE Age in years 100.00 0.00 0.00 0.00 999.00 47.80 52.90 29.40
ABS_CC Agressive Behaviour Scale 76.24 14.72 9.04 0.00 12.00 1.10 0.00 2.20
CPS_CC Cognitive Performance Scale 86.78 13.22 0.00 0.00 6.00 2.90 3.00 2.20
Character Variables (6)

Variable Name Variable Label % Valid % Invalid % Missing Values Comments
AssessReas Reason for Assessment 100.00 0.00 A, N
LHIN Local health integration network (LHIN) at admission 100.00 0.00 01,…,14
LivArr Living arrangement at admission 98.15 1.85 1,2,3,4,5,6,7,8,9,10,11
RefBy Source of referral 100.00 0.00 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
ProgType Program request 100.00 0.00 09, 10, 11,12, 17, 18, 20 Invalid codes: 22, 65
Sex Client's gender 100.00 0.00 F, M , O, U
Date Variables (3)

Variable Name Variable Label % Valid % Missing Date Range
AssessStDate Assessment Start Date 77.66 22.34 10-Oct-1997 to 31-Aug-2014
AssessEndDate Assessment End Date 90.15 1.85 11-Nov-1997 to 21-Sep-2014
RefDate Referral Date 72.57 27.43 04-Jan-1997 to 30-Sep-2014
Fig. 4 Example of VIMO (valid, invalid, missing, outlier) data quality assessment at ICES
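The kind of per-variable summary shown in Fig. 4 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the ICES tool: the function name `vimo`, its parameters, and the rule that out-of-range numeric values count as outliers rather than invalid are hypothetical, and the sample values are made up (the code list loosely mirrors the ProgType row).

```python
def vimo(values, valid=None, low=None, high=None):
    """Classify each value as valid, invalid, missing, or outlier; return percentages.

    valid: optional set of allowed codes (for coded variables).
    low/high: optional plausible range (for numeric variables); values outside
    it are counted as outliers in this sketch.
    """
    counts = {"valid": 0, "invalid": 0, "missing": 0, "outlier": 0}
    for v in values:
        if v is None or v == "":
            counts["missing"] += 1
        elif valid is not None and v not in valid:
            counts["invalid"] += 1
        elif (low is not None and v < low) or (high is not None and v > high):
            counts["outlier"] += 1
        else:
            counts["valid"] += 1
    n = len(values)
    return {k: (round(100.0 * c / n, 2) if n else 0.0) for k, c in counts.items()}

# Hypothetical coded variable: 22 and 65 are not in the allowed code list.
codes = {"09", "10", "11", "12", "17", "18", "20"}
prog_type = ["09", "10", "22", None, "11", "65", "17", "18"]
print(vimo(prog_type, valid=codes))

# Hypothetical numeric variable: 999 falls outside the assumed 0 to 120 range.
ages = [34, 52, 999, 71, None]
print(vimo(ages, low=0, high=120))
```

Each value lands in exactly one category, so the four percentages sum to 100 for a non-empty variable; a production report might instead count outliers as a subset of valid values.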
K. Iron and K. Sykora
References

ACM. IBM dictionary of computing, 10th edn; 1993. http://www-03.ibm.com/ibm/history/documents/pdf/glossary.pdf
Arts D, de Keizer NF, Scheffer GJ. Defining and improving data quality in medical registries: a literature review, case study, and generic framework. J Am Med Inform Assoc. 2002;9:600–11. http://jamia.bmj.com/content/9/6/600.full. Accessed 17 June 2014.
Azimaee M, Smith M, Lix L, Ostapyk T, Burchill C, Hong SP. MCHP data quality framework. Winnipeg: Manitoba Centre for Health Policy, University of Manitoba; 2013. www.umanitoba.ca/faculties/medicine/units/community_health_sciences/departmental_units/mchp/protocol/media/Data_Quality_Framework.pdf
Canadian Institute for Health Information. The CIHI data quality framework, 2009. Ottawa: CIHI; 2009. http://www.cihi.ca/CIHI-ext-portal/pdf/internet/DATA_QUALITY_FRAMEWORK_2009_EN
Chamberlayne R, Green B, Barer ML, Hertzman C, Lawrence WJ, Sheps SB. Creating a population-based linked health database: a new resource for health services research. Can J Public Health. 1998;89(4):270–3.
Dunn HL. Record Linkage. Am J Public Health Nation Health. 1946;36:1412.
Friedman D, Hunter E, Parrish II G. Defining health statistics and their scope. In: Friedman DJ, Hunter EL, Gibson Parrish II R, editors. Health statistics: shaping policy and practice to improve the population’s health. New York: Oxford University Press; 2005. p. 16.
Holt T, Jones T. Quality work and conflicting quality objectives. In: Quality work and quality assurance within statistics. Eurostat Proceedings; 1999. p. 15–24. http://epp.eurostat.ec.europa.eu/portal/page/portal/quality/documents/DGINS%20QUALITY%20Q98EN_0.pdf. Accessed 17 June 2014.
ICES. website www.ices.on.ca
Iron K, Manuel DG. Quality assessment of administrative data (QuAAD): an opportunity for enhancing Ontario’s health data. ICES investigative report. Toronto: Institute for Clinical Evaluative Sciences; 2007.
Lix LM, Neufeld SM, Smith M, Levy A, Dai S, Sanmartin C, Quan H. Quality of administrative data in Canada: a discussion paper 2012. Background for science of data quality invitational workshop. Ottawa; 2012. http://www.usaskhealthdatalab.ca/wp-content/uploads/2012/09/Workshop_Background-Document-updated-AppendixD.pdfLisaLix
Personal Health Information Protection Act (PHIPA). 2004. http://www.e-laws.gov.on.ca/html/statutes/english/elaws_statutes_04p03_e.htm
Roos LL, Sharp SM, Wajda A. Assessing data quality: a computerized approach. Soc Sci Med. 1989;28(2):175–82.
Roos LL, Gupta S, Soodeen RA, Jebami L. Data quality in a data rich environment: Canada as an example. Can J Aging. 2005;24 Suppl 1:153–70.
4 Health Services Data: The Centers for Medicare and Medicaid Services (CMS) Claims Records

Ross M. Mullner

Contents
Introduction
Major Healthcare Programs
Medicare
Medicaid
Children’s Health Insurance Program
Information and Data Products
Information Products
Publications
Data Navigator
Interactive Dashboards
Data Products
Medicare and Medicaid Public Use Data File
Chronic Conditions Data Warehouse
Medicare Current Beneficiary Survey
Medicare Qualified Entity Program
Conclusion
References
R. M. Mullner (*)
Division of Health Policy and Administration, School of Public Health, University of Illinois, Chicago, IL, USA
e-mail: rmullner@comcast.net

https://doi.org/10.1007/978-1-4939-8715-3_5

Abstract

The US Centers for Medicare and Medicaid Services (CMS) is the largest purchaser of healthcare in the nation, serving almost 123 million people, more than one in three Americans. CMS is responsible for administering and overseeing three of the nation’s largest ongoing healthcare programs: Medicare, Medicaid, and the Children’s Health Insurance Program (CHIP). The Medicare program provides government-sponsored health insurance for people 65 or older and under age 65 with certain diseases and disabilities. The Medicaid program, which is a joint state-federal program, provides healthcare for the poor. CHIP is a grant program that provides health insurance to targeted low-income children in families with incomes above Medicaid eligibility levels. CMS sponsors many data and information initiatives for health
services researchers, policymakers, educators, students, and the general public. In 2014, CMS established the Office of Enterprise Data and Analytics (OEDA) to better oversee and coordinate its large portfolio of data and information. The office also funds the privately run Research Data Assistance Center (ResDAC), which provides training and technical assistance to individuals requesting the agency’s data files. CMS information products include an online research journal Medicare and Medicaid Research Review (MMRR); other publications including Medicare and Medicaid Statistical Supplement, Statistics Reference Booklet, and CMS Fast Facts; a data navigator; and several interactive dashboards. Its data products include numerous Medicare and Medicaid public use data files, the Chronic Conditions Data Warehouse (CCW), the Medicare Current Beneficiary Survey (MCBS) files, and the Medicare Qualified Entity (QE) Program. Many examples of CMS’ information and data products are highlighted and discussed.

Introduction

The Centers for Medicare and Medicaid Services (CMS) is a major agency within the US Department of Health and Human Services (DHHS). CMS (previously known as the Health Care Financing Administration or HCFA) is responsible for administering and overseeing three of the nation’s largest ongoing healthcare programs: Medicare, Medicaid, and the Children’s Health Insurance Program (CHIP). In addition, CMS is responsible for implementing various provisions of the Patient Protection and Affordable Care Act (ACA) of 2010, including the construction of an insurance exchange or marketplace, consumer protections, and private health insurance market regulations. In 2015, CMS through its various programs served almost 123 million people, more than one in three Americans, making it the single largest purchaser of healthcare in the United States.

CMS’ stated mission is “as an effective steward of public funds, CMS is committed to strengthening and modernizing the nation’s health care system to provide access to high quality care and improved health at lower cost” (CMS 2015).

Headquartered in Baltimore, Maryland, with other offices in Bethesda, Maryland, and Washington, DC, ten regional offices located throughout the nation, and three antifraud field offices, CMS employs about 5,900 federal employees. CMS employees in Baltimore, Bethesda, and Washington, DC, develop healthcare policies and regulations, establish payment rates, and develop national operating systems for programs. Regional office employees provide services to Medicare contractors; accompany state surveyors to hospitals, nursing homes, and other facilities to ensure health and safety standards; and assist state CHIP and Medicaid programs. CMS employees also work in offices in Miami, Los Angeles, and New York, cities known to have high incidences of healthcare fraud and abuse.

Operationally, CMS consists of 15 major divisions, including seven centers: Center for Strategic Planning, Center for Clinical Standards and Quality, Center for Medicare, Center for Medicaid and CHIP Services, Center for Program Integrity, Center for Consumer Information and Insurance Oversight, and Center for Medicare and Medicaid Innovation.

CMS also has a number of operational offices. One office that will play an increasingly important role in data and information initiatives is the Office of Enterprise Data and Analytics (OEDA). Established in 2014 and managed by CMS’ first chief data officer (CDO), the OEDA is tasked with overseeing improvements in the agency’s data collection and dissemination activities. It will work to better harness CMS’ vast data resources to guide decision-making and to promote greater access to the agency’s data in support of higher-quality, patient-centered care at lower cost. The OEDA also manages the CMS-funded Research Data Assistance Center (ResDAC) at the University of Minnesota, which conducts education and training programs and provides assistance to researchers who want to access the agency’s data files (Brennan et al. 2014).

In 2015, CMS’ budget totaled an estimated $602 billion (CMS 2015).
Major Healthcare Programs

Medicare

Established in 1965, Medicare (Title XVIII of the Social Security Act) is a federal health insurance program for people 65 or older, those under age 65 with certain disabilities, people of any age with end-stage renal disease (permanent kidney failure requiring dialysis or a kidney transplant), and individuals with amyotrophic lateral sclerosis (ALS), commonly known as Lou Gehrig’s disease. In 2015, a total of 55.2 million individuals were enrolled in Medicare in the nation.

Medicare consists of four separate parts: Medicare Part A (Hospital Insurance), Medicare Part B (Medical Insurance), Medicare Part C (Medicare Advantage plans), and Medicare Part D (Medicare Prescription Drug Coverage).

Medicare Part A provides insurance coverage for hospital inpatient care (covering stays in a semiprivate room, meals, general nursing and other hospital services, and supplies), skilled nursing facility care (covering up to 100 days in a semiprivate room, skilled nursing and rehabilitation services, and other services and supplies, following a hospital stay), home health care services (covering part-time or intermittent skilled nursing care, physical therapy, speech language pathology, and occupational therapy), and hospice care (covering drugs for pain relief and medical and support services).

Medicare Part B provides insurance coverage for necessary medical services (covering physician services, outpatient medical and surgical services and supplies, diagnostic tests, and durable medical equipment (DME)), clinical laboratory services (covering blood tests, urinalysis, and other screening tests), home health care services (covering part-time or intermittent skilled nursing care, physical therapy, speech language pathology, and occupational therapy), and outpatient hospital services (covering hospital services and supplies).

Medicare Parts A and B are known as Original Medicare. Most healthcare services provided to beneficiaries enrolled in Original Medicare are paid for on a fee-for-service basis. Most Medicare beneficiaries enroll in both Part A and Part B. Often beneficiaries do not pay premiums for Part A, because they have worked for 40 quarters and paid into the Social Security System. However, they do have to pay monthly premiums for Part B.

Medicare Part C, or Medicare Advantage, is a managed care program. Medicare Advantage plans combine Medicare Part A and Part B and often provide additional benefits that Original Medicare does not cover, such as dental, hearing, vision care, and prescription drug coverage. Depending upon the particular managed care plan, Medicare Advantage can cost beneficiaries less and provide more benefits than Original Medicare. Medicare Advantage plans are run by private companies that contract with CMS to provide covered services. The types of plans include Health Maintenance Organizations (HMOs), Preferred Provider Organizations (PPOs), Private Fee-for-Service (PFFS) plans, and Special Needs Plans (SNPs) (Office of the Assistant Secretary for Planning and Evaluation 2014).

Medicare Part D is a voluntary outpatient prescription drug benefit for Medicare beneficiaries. The Medicare program provides the drug benefit through either Medicare Advantage plans or private standalone prescription drug plans approved by CMS. The prescription drug plans vary in terms of the type of drugs they cover and their coinsurance and deductible costs. In 2015, there were 1,001 prescription drug plans in the nation (Blumenthal et al. 2015; Henry J. Kaiser Family Foundation 2014).

Medicaid

Established in 1965 along with the federal legislation that created the Medicare program, Medicaid (Title XIX of the Social Security Act) is a joint federal-state healthcare program for the poor. The federal government provides the states with matching contributions to help fund the various Medicaid programs. States design and administer their own Medicaid programs, determining eligibility standards, benefit packages, and payment rates under broad federal guidelines. As a result, state Medicaid programs vary greatly in size,
scope, and generosity. For example, a low-income individual may be eligible for Medicaid in one state, but not in another. In 2015, an estimated 66.7 million individuals were receiving Medicaid benefits in the nation.

Medicaid originally only provided healthcare services for certain categories of the poor such as pregnant women, children, parents with young children, the elderly, and blind and disabled individuals. The Affordable Care Act (ACA) of 2010 greatly expanded the Medicaid program to cover millions of uninsured Americans. Under the new law, many states have expanded their Medicaid programs to cover nearly all non-elderly poor adults (Henry J. Kaiser Family Foundation 2015; Orentlicher 2015).

Medicaid is a very important payer for infants, the elderly, and younger individuals with significant disabilities. It pays for about half of all births in the nation. Medicaid is also the nation’s only safety net for people who need long-term care services. About a third of Medicaid spending pays for personal assistance in nursing homes and at home for people who need help with the basic tasks of daily living (Feder and Komisar 2012).

Some individuals, known as dual eligible beneficiaries, receive both Medicaid and Medicare benefits. They are enrolled in Medicare Part A and/or Part B and receive some form of Medicaid benefits. In 2015, about 9.6 million individuals were dually eligible in the United States (Cohen et al. 2015; Henry J. Kaiser Family Foundation 2015).

Children’s Health Insurance Program

Established in 1997 and reauthorized several times, the state Children’s Health Insurance Program (CHIP) (Title XXI of the Social Security Act) is a program that provides federal funds to states and matches state contributions to provide health insurance to children who do not qualify for Medicaid. Specifically, CHIP provides health insurance for children less than 19 years of age whose families are ineligible for Medicaid. While state benefit plans vary, all CHIP plans cover immunizations, prescription medications, routine physician visits, dental care, medically necessary orthodontics, mental and behavioral health, hospitalizations, home health care, rehabilitation care, medical equipment, and laboratory and x-ray services. In 2015, about 6.2 million children were enrolled in CHIP (Ewing 2008; National Conference of State Legislature 2014).

Information and Data Products

Each year CMS collects and processes enormous amounts of data. For the Medicare program alone, CMS and its contractors process more than 1.3 billion claims a year and generate billions of other non-claims data, such as eligibility checks, queries from telephone contacts through its toll-free 1-800-MEDICARE help line, patient experience surveys, and enrollment information. Additionally, CMS collects data on its Medicare and Medicaid Electronic Health Record (EHR) Incentive Programs and on health insurance exchange or marketplace coverage.

In the past, CMS tended to view the data and information it produced as only by-products of its operations. Today, however, the development, management, use, and dissemination of data and information resources have become one of CMS’ core functions. To become more transparent and accountable, CMS is making more of its data and information available to researchers, policymakers, educators, students, and the general public. By releasing these resources, CMS is attempting to leverage its data and information to better evaluate and improve its programs, facilitate healthcare innovation, develop new products and analysis tools, and highlight actionable information for internal and external policy- and decision-makers (CMS 2012).

Information Products

CMS produces many information products that are readily available to researchers and the general public. These products include numerous publications, a data navigator, and several interactive dashboards. Examples of some of the major information products are described below.
Publications

For health services researchers and policy analysts, CMS publishes a peer-reviewed online journal, the Medicare and Medicaid Research Review (MMRR). The journal (previously titled the Health Care Financing Review) publishes research articles throughout the year on a continuous basis. The articles address various topics such as trends in Medicare, Medicaid, and CHIP, access and quality of care issues, healthcare insurance coverage, and payment for health services. It also includes CMS News and Data Briefs. Issues of MMRR, as well as the entire run of the Health Care Financing Review (Vols. 1–39; 1979–2009), can be accessed at: www.ncbi.nlm.nih.gov/pmc/journals/2404.
CMS publishes annual data in its Medicare and Medicaid Statistical Supplement. This comprehensive statistical supplement is updated on an ongoing basis by section as the data become available. Consisting of 14 chapters, including 115 tables and 67 charts, the supplement provides detailed tables on the personal healthcare expenditures for the entire US population; characteristics of the Medicare program including enrollment, program payments, cost sharing, utilization of short-stay hospitals, skilled nursing facilities, home health agencies, hospices, physician services, hospital outpatient services, end-stage renal disease services, managed care, and Medicare Part D; and characteristics of the Medicaid program including the number of persons served, their demographic characteristics, and the types of services they received. Current and past statistical supplements (2001 to the present) can be accessed at: www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/MedicareMedicaidStatSupp/2013.html.
CMS also publishes an abridged version of the statistical supplement entitled CMS Statistics Reference Booklet. This quick reference guide summarizes information about national healthcare expenditures and the Medicare and Medicaid programs. Published in June of each year, the booklet provides the most current information available. Booklets are available online for 2003 through the most recent complete calendar year, at: www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/CMS-Statistics-Reference-Booklet/2014.html.
The briefest statistical summary of annual CMS program and financial data is published in CMS Fast Facts. It includes summary information on total national health expenditures; Medicare enrollment, utilization, and expenditures; and the number of Medicaid recipients and payment by selected types of service. CMS Fast Facts can be found at: www.cms.gov/fastfacts.

Data Navigator

An important tool for finding CMS information and data is the agency’s data navigator. The data navigator is an easy-to-use, menu-driven search tool that guides the user to CMS’ information and data on the World Wide Web, including the agency’s data housed on external websites such as the Henry J. Kaiser Family Foundation, the National Institute of Medicine, and the Health Indicators Warehouse. The navigator enables the user to organize data into categories, such as by CMS program, setting/type of care, topic, geography, and document type. It also contains a comprehensive glossary of terms, a list of frequently asked questions, and a place to subscribe for email updates. The CMS data navigator’s address is: https://dnav.cms.gov.

Interactive Dashboards

To make its information more accessible, CMS has developed several interactive dashboards. For example, the Medicare Geographic Variation Dashboard provides users with an easy-to-use, customizable tool to find, compare, and analyze state- and county-level variations in Medicare per capita costs. Data used in the dashboard are based on CMS claims data for Medicare beneficiaries enrolled in the fee-for-service programs during the 5-year period 2008–2012. Users of the dashboard can compare state and county Medicare costs to those of the nation and identify year-to-year trends compared to national trends over the same time period. Specifically, users can compare
Medicare’s total per capita costs, inpatient per capita costs, post-acute care per capita costs, hospice per capita costs, physician/outpatient department per capita costs, durable medical equipment per capita costs, Medicare Part B drug per capita costs, outpatient dialysis facility per capita costs, and the total number of Medicare beneficiaries in the state or county. The dashboard can be found at: www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_Dashboard.html.
Another example is the Medicare Chronic Condition Dashboard, which presents information for 2012 on the prevalence, utilization, and Medicare spending for 17 chronic disease conditions. The conditions include Alzheimer’s disease/dementia, arthritis, asthma, atrial fibrillation, autism spectrum disorders, cancer, chronic kidney disease, chronic obstructive pulmonary disease (COPD), depression, diabetes, heart failure, hyperlipidemia, hypertension, ischemic heart disease, osteoporosis, schizophrenia/psychoses, and stroke. The information is presented by geographic areas such as federal government region, state, county, and hospital referral region. Users of the dashboard can filter by gender, age group, Medicare-only beneficiaries, and dual eligible beneficiaries (individuals receiving both Medicare and Medicaid). The dashboard is located at: www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Chronic-Conditions/CCDashboard.html.

Data Products

CMS produces many data products that are available to researchers as well as the general public. These data products include many Medicare and Medicaid public use data files, the Chronic Conditions Data Warehouse (CCW), the Medicare Current Beneficiary Survey (MCBS), and the Medicare Data Sharing Program.

Medicare and Medicaid Public Use Data File

Many of CMS’ Medicare and Medicaid data files may be very useful to health services researchers. Because they contain specific patient- and condition-identifiable data, some of these files are restricted and difficult to obtain; however, other de-identified files are readily available as public use data files, which are free and can be easily downloaded.
Table 1 presents a list of 23 CMS public use data files and systems and the years for which they are available. The files are divided into nine broad categories: healthcare organization cost data files, Medicare claims data files, physician and supplier Medicare charges, program evaluation and health outcomes, Medicare prescription drug program, Medicare electronic medical records program files, Medicaid data files, geographic regions and hospital service areas, and directories of providers and coding systems. The categories are discussed below, and individual files are highlighted.

Table 1 List of CMS’ public use data files and the years for which they are available

Healthcare organization cost data files
1. Healthcare Cost Report Information System (HCRIS)
   Community Mental Health Centers, 2010–2015
   Health Clinics, 2009–2015
   Home Health Agencies, 1994–2014
   Hospices, 1999–2015
   Hospitals, 1996–2015
   Renal Dialysis Facilities, 1994–2015
   Skilled Nursing Facilities, 1996–2014

Medicare claims data files
2. Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs)
   Carrier Line Items PUF, 2008, 2010
   Durable Medical Equipment (DME) Line Items PUF, 2008, 2010
   Home Health Agency (HHA) Beneficiary PUF, 2008, 2010
   Hospice Beneficiary PUF, 2008, 2010
   Inpatient Claims PUF, 2008
   Outpatient Procedures PUF, 2008, 2010
   Prescription Drug Events PUF, 2008
   Skilled Nursing Facility (SNF) Beneficiary PUF, 2008, 2010
   Chronic Conditions PUF, 2008, 2010
   Institutional Providers and Beneficiary Summary PUF, 2013
   Prescription Drug Profiles PUF, 2008, 2010
3. Data Entrepreneurs’ Synthetic Public Use Files (DE-SynPUF), 2008–2010
   Beneficiary Summary
   Carrier Claims
   Inpatient Claims
   Outpatient Claims

Physician and supplier Medicare charges
4. Medicare Provider Utilization and Payment Data
   Medicare Physician and Other Suppliers, 2012
   Medicare Provider Utilization and Payment Data: Inpatient, 2011–2012
   Medicare Provider Utilization and Payment Data: Outpatient, 2011–2012

Program evaluation and health outcomes
5. Consumer Assessment of Healthcare Providers and Systems (CAHPS), years vary
   Hospital CAHPS
   Home Health CAHPS
   Fee-for-Service CAHPS
   Medicare Advantage and Prescription Drug Plan CAHPS
   In-Center Hemodialysis CAHPS
   Hospice
6. Healthcare Effectiveness Data and Information Set (HEDIS), 1997–2015
7. Medicare Compare
   Dialysis Facility Compare, 2010–2014
   Home Health Compare, 2003–2014
   Hospital Compare, 2005–2014
   Nursing Home Compare, 2002–2014
   Physician Compare, 2010–2014
8. Medicare Health Outcome Survey (HOS), varies by cohort group, 1998–2015
   Base Line PUFs
   Follow-Up PUFs
   Analytic PUFs

Medicare prescription drug program
9. Prescription Drug Plan Formulary and Pharmacy Network Files, 2005 – Current
   Beneficiary Cost File
   Formulary File
   Geographic Locator File
   Pharmacy Network File
   Plan Information File
   Pricing File
   Record Layout

Medicare electronic medical records program files
10. Medicare Electronic Health Record (EHR) Incentive Program Eligible Professionals Public Use File (PUF), 2013
   Eligible Professionals PUF
   Eligible Hospitals PUF

Medicaid data files
11. Medicaid Analytic Extract (MAX) Provider Characteristics File, 2009–2011
12. Medicaid/CHIP Environmental Scanning and Program Characteristics (ESPC) File, 2005–2013
13. Medicaid State Drug Utilization File, 1991–2014
14. Medicaid Statistical Information System (MSIS) Datamart
   MSIS State Summary Datamart, 1999–2012
   MSIS Drug Utilization Datamart, 2004–2010

Geographic regions and hospital service areas
15. Hospital Service Area File, 1992–2013
16. Medicare Geographic Variation Files, 2007–2013
   Hospital Referral Region (HRR) Report – All Beneficiaries
   Hospital Referral Region (HRR) Report – Beneficiaries Under 65
   Hospital Referral Region (HRR) Report – Beneficiaries 65 and Older
   Hospital Referral Region (HRR) Table – All Beneficiaries
   Hospital Referral Region (HRR) Table – Beneficiaries Under 65
   Hospital Referral Region (HRR) Table – Beneficiaries 65 and Older
   State/County Report – All Beneficiaries
   State Report – Beneficiaries Under 65
   State Report – Beneficiaries 65 and Older
   State/County Table – All Beneficiaries
   State Table – Beneficiaries Under 65
   State Table – Beneficiaries 65 and Older

Directories of providers and coding systems
17. Health Care Information System (HCIS) Data File, 2009–2011
18. Medicare Part B Summary Data Files
   Carrier File, 2005–2011
   National File, 2000–2013
19. National Provider Identifier (NPI) Downloadable File, 2007 – Current
20. Physician Supplier Procedure Summary Master File, 1991–2013
21. Provider of Services (POS) File, 1991–2014
22. Unique Physician Identification Number (UPIN) Directory, 2003–2007
23. Unique Physician Identification Number (UPIN) Group File, 2005–2007

Healthcare Organization Cost Data Files
Some of the most widely used CMS public use files are those containing Medicare cost reports.
Specifically, these reports are included in the Healthcare Cost Report Information System (HCRIS). The various files in HCRIS contain annual mandatory cost reports submitted to CMS from all healthcare facilities that accept Medicare funds. Nearly all of the nation’s hospitals, skilled nursing homes, hospices, renal dialysis facilities, independent rural health clinics, and freestanding federally qualified health centers submit these reports. The cost reports consist of a series of forms that collect descriptive, financial, and statistical data to determine if the Medicare program over- or underpaid the facility. These files are frequently used by health services researchers to examine various facility characteristics, calculate costs and charges, and determine the financial viability of the facility (Asper 2013; Holmes et al. 2013; Kane and Magnus 2001). More information on the various files can be found at: www.resdac.org/cms-data/files/hcris.

Medicare Claims Data Files
Another widely used data source is the Medicare Claims Data Files. These files are part of the Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs), which consist of 11 separate public use files. Most of these files contain non-identifiable claims-specific data derived from a 5% sample of all Medicare beneficiaries. The files are often used by health services researchers, and they are increasingly being used to conduct public health surveillance (Erdem and Concannon 2012; Stein et al. 2014; Erdem et al. 2014). Additional information on the files and how health services researchers use them can be found at: www.academyhealth.org/Training/ResourceDetail.cfm?ItemNumber=7097.
To encourage researchers to use the Medicare claims files, CMS has constructed the Data Entrepreneurs’ Synthetic Public Use Files (DE-SynPUF). The DE-SynPUF allows researchers to develop and create software applications for Medicare claims data, train individuals to analyze claims data using the actual files, and support safe data mining innovations. Data contained in the DE-SynPUF are based on a 5% sample of Medicare beneficiaries, including beneficiary summary data and inpatient, outpatient, carrier, and prescription drug event claims. More information can be found at: www.resdac.org/event/webinar-introduction-data-entrepreneurs-synthetic-public-use-file-de-synpuf.

Physician and Supplier Medicare Charges
The next category includes the Medicare Provider Utilization and Payment Data files. These files contain data on the services and procedures provided to Medicare beneficiaries by physicians and other healthcare professionals on an inpatient and outpatient basis. They also include all final-action physician/supplier Part B noninstitutional line items for the Medicare fee-for-service population. For more information on these files, go to www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data.

Program Evaluation and Health Outcomes
CMS offers researchers many program evaluation and health outcome public use data files. One such set of files is contained in the Consumer Assessment of Healthcare Providers and Systems (CAHPS). CAHPS consists of a family of patient experience surveys. These surveys ask patients, or in some cases family members, about their experiences with, and ratings of, the care they received. In many cases, the surveys are the only source of information on the care patients received. CAHPS surveys have been developed for hospitals, home health, Medicare fee-for-service care, Medicare Advantage and Prescription Drug plans, in-center hemodialysis, and hospices. Results from the surveys are contained in various public use files. Copies of the CAHPS survey instruments can be found at: www.cms.gov/Research-Statistics-Data-and-Systems/Research/CAHPS/index.html. More information on the CAHPS public use data files can be found at: www.resdac.org/cms-data/files/cahps-puf.
A number of other CMS public use files are also derived from CAHPS. Data from various CAHPS surveys are used to produce Medicare Compare files and related websites, which contain data on individual facilities and physicians. These files provide contact information, quality of care measures, lists of services offered, and a five-star rating system.
The Medicare Compare files are available for kidney dialysis facilities (www.medicare.gov/dialysisfacilitycompare/), home health care agencies (www.medicare.gov/homehealthcompare/), hospitals (www.medicare.gov/hospitalcompare/search.html), skilled nursing facilities (www.medicare.gov/nursinghomecompare/search.html), and physicians (www.medicare.gov/physicianscompare/search.html). Many health services researchers have used these files to measure the quality of care provided at various healthcare facilities (Werner and Bradow 2006; Saunders and Chin 2013; Lutfiyya et al. 2013; Williams et al. 2014). More information on the public use files can be found at www.resdac.org/cms-data/files/medicare-compare.
Another public use file dealing with quality of healthcare is the Healthcare Effectiveness Data and Information Set (HEDIS) public use file. CMS uses HEDIS to compare health plans providing Medicare and Medicaid services. HEDIS, which was developed by the independent not-for-profit National Committee for Quality Assurance (NCQA), is a widely used tool to measure the performance of health plans. It currently consists of 81 measures across five domains of care and service. HEDIS, which is used by more than 90% of America’s health plans, enables researchers to compare the performance of the plans. HEDIS has been used to compare different quality measures of care (Pugh et al. 2013; Bundy et al. 2012). Information on HEDIS and its performance measures can be found at: www.ncqa.org/HEDISQualityMeasurement.aspx. Information on the public use file is available at: www.resdac.org/cms-data/files/hedis-puf.
Lastly, the Medicare Health Outcome Survey (HOS) public use files provide a rich source of outcome data on Medicare beneficiaries enrolled in Medicare Advantage programs. The Medicare HOS consists of Base Line, Follow-Up, and Analytic Public Use Files. The survey, which measures quality improvement activities, health plan performance, and outcomes of care, is administered to cohorts of individuals who are repeatedly sampled over time. Results from the Medicare HOS have been used by health services researchers and quality improvement professionals to explore functional status measurement issues and identify ways to improve healthcare practices (Haffer and Bowen 2004; Bowen 2012). More information can be found at www.resdac.org/cms-data/file-family/Health-Outcomes-Survey-HOS.

Medicare Prescription Drug Program
The next category includes the Prescription Drug Plan Formulary and Pharmacy Network Files. It consists of seven separate files: Beneficiary Cost File, Formulary File, Geographic Locator File, Pharmacy Network File, Plan Information File, Pricing File, and Record Layout. These files contain data on Medicare prescription drug plans and Medicare Advantage prescription drug plans. The various files are updated weekly, monthly, and quarterly. For more information see: www.resdac.org/cms-data/files/pharmacy-network.

Medicare Electronic Medical Records Program Files
CMS encourages the greater use of electronic medical records by all healthcare providers. It has established an incentive program that provides payments to hospitals and healthcare professionals to adopt, implement, upgrade, or demonstrate the use of electronic health record technology. As of February 2015, more than 438,000 healthcare providers had received funds for participating in the program. To identify eligible hospitals and professionals, CMS has constructed the Medicare Electronic Health Record (EHR) Incentive Program Eligible Professional Public Use File (Wright et al. 2014). More information on the program and the files can be obtained at: www.cms.gov/Regulations-and-Guidance/Legislation/EHRIncentivePrograms/DataAndReports.html.

Medicaid Data Files
The next category identifies four CMS Medicaid public use files. The Medicaid Analytic Extract (MAX) Provider Characteristics File contains data on state Medicaid programs including the number of individuals enrolled, demographic characteristics (age, gender, ethnicity, and race), basis of eligibility (aged, disabled, children, and adults), and maintenance assistance status (medically needy, poverty, waiver, and other). However,
after several years of data collection, the files were discontinued. They were last updated in 2011. The MAX files have been used by researchers to study medication adherence (Rust et al. 2013) and the maternal and infant outcomes of multistate Medicaid populations (Palmsten et al. 2014). A chartbook summarizing 2010 MAX data is also available (Borck et al. 2014). For more information about the public use files, see www.resdac.org/cms-data/files/max-provider-characteristics.
The second public use file is the Medicaid/CHIP Environmental Scanning and Program Characteristics (ESPC) File. This file was created by CMS to encourage cross-state analysis of Medicaid programs. It is now part of CMS’ Environmental Scanning and Program Characteristics (ESPC) Database. The ESPC can be linked to the Medicaid Analytic Extract (MAX) files and other Medicaid data. More information can be found at: www.resdac.org/cms-data/files/medicaidchip-espc.
Another public use file is the Medicaid State Drug Utilization File. This file contains data for covered outpatient drugs paid for by state Medicaid agencies since the start of the federal Drug Rebate Program in 1990. Currently, all states and the District of Columbia participate in the program, as well as about 600 drug manufacturers. For more information see: www.resdac.org/cms-data/files/medicaid-state-drug-utilization.
Lastly, the Medicaid Statistical Information System (MSIS) Datamart contains two public use data files: State Summary Datamart and the Drug Utilization Datamart. Both of these files can be used to produce tables covering a wide range of Medicaid program statistics on eligibility and claims data. These files contain data on Medicaid eligibles, beneficiaries, and payments, as well as maintenance assistance status, age group, gender, race/ethnicity, service category, and program type. For more information go to: www.resdac.org/cms-data/files/msis-datamart.

Geographic Regions and Hospital Service Areas
The next category includes two geographic public use files. The first file is the Hospital Service Area File. It contains summary data on hospital discharges, length of stay, and total charges by CMS provider numbers and zip codes of the Medicare beneficiaries. Using these data, hospital service areas can be determined for various services. More information on the file can be found at: www.resdac.org/cms-data/files/hsaf.
The largest set of CMS geographic public use files is the Medicare Geographic Variation Files. They include 12 separate files: two files with state- and county-level data, four files with state-level data, and six files with hospital referral region (HRR) data. The files are divided into report and table formats for all Medicare beneficiaries, those under 65 years of age, and those 65 years of age and older. These geographic files contain demographic, spending, utilization, and quality of care indicators for the Medicare fee-for-service population at the state, county, and hospital referral region levels. The hospital referral regions were developed by the Dartmouth Atlas of Health Care Project and have been widely used by health services researchers to investigate regional differences in access, cost, quality, and the outcomes of care (Baker et al. 2014; Chen et al. 2014; Wennberg 2010). Detailed information on the files can be found at: www.resdac.org/cms-data/files/medicare-geographic-variation.

Directories of Providers and Coding Systems
The last category includes seven public use data files covering directories of providers and medical procedure coding systems. These files contain a listing of the unique CMS healthcare facility and healthcare professional provider identifiers and lists of CMS-recognized medical procedure codes. The lists and procedure codes are primarily used for billing and payment purposes.
The public use Health Care Information System (HCIS) Data File contains information on each Medicare Part A and B institutional provider by type of facility and state. Specifically, it lists CMS provider identifiers, facility characteristics, total payment amounts, total number of Medicare beneficiaries served, and total utilization for hospitals, skilled nursing facilities, home health agencies, and hospices. For more information see: www.resdac.org/cms-data/files/hcis.
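As a rough illustration of how a provider-level file with the HCIS layout described above might be summarized, the following Python sketch computes Medicare payment per beneficiary by state and facility type. It is a minimal sketch under stated assumptions: the field names (`provider_id`, `state`, `facility_type`, `total_payment`, `beneficiaries`) and all values are hypothetical and are not taken from the actual HCIS record layout.

```python
from collections import defaultdict

# Hypothetical rows: field names and values are invented for illustration
# and do not reproduce the real HCIS record layout or real data.
EXAMPLE_ROWS = [
    {"provider_id": "P001", "state": "IL", "facility_type": "hospital",
     "total_payment": 1_200_000.0, "beneficiaries": 400},
    {"provider_id": "P002", "state": "IL", "facility_type": "hospital",
     "total_payment": 800_000.0, "beneficiaries": 100},
    {"provider_id": "P003", "state": "IL", "facility_type": "hospice",
     "total_payment": 90_000.0, "beneficiaries": 30},
    {"provider_id": "P004", "state": "WI", "facility_type": "hospital",
     "total_payment": 500_000.0, "beneficiaries": 250},
]


def payment_per_beneficiary(rows):
    """Return {(state, facility_type): total payment / total beneficiaries}."""
    totals = defaultdict(lambda: [0.0, 0])  # [payment sum, beneficiary sum]
    for row in rows:
        key = (row["state"], row["facility_type"])
        totals[key][0] += row["total_payment"]
        totals[key][1] += row["beneficiaries"]
    # Skip groups with no beneficiaries to avoid dividing by zero.
    return {key: pay / n for key, (pay, n) in totals.items() if n > 0}


if __name__ == "__main__":
    for key, value in sorted(payment_per_beneficiary(EXAMPLE_ROWS).items()):
        print(key, round(value, 2))
```

Summing payments and beneficiaries before dividing weights each provider by its size; averaging provider-level ratios instead would give small facilities the same weight as large ones.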
The Medicare Part B Summary Data Files consist of two separate public use files: Carrier File and National File. These files contain data summaries by Healthcare Common Procedure Coding System (HCPCS) code ranges. HCPCS codes are medical codes used to report supplies, equipment, and devices provided to patients. The files include allowed services, allowed charges, and payment amounts. More information on the files can be found at: www.resdac.org/cms-data/files/part-b-summary-data.
The next public use file is the National Provider Identifier (NPI) Downloadable File. The NPI is a unique, ten-digit identification number for each CMS-covered healthcare provider. By federal law, the NPI must be used in all administrative and financial healthcare transactions. The file contains NPI data on the name, gender, business address, and medical license number of each provider. For more information see: www.resdac.org/cms-data/files/nppes.
The Physician Supplier Procedure Summary Master File contains data on all Medicare Part B carrier and durable medical equipment regional carrier (DMERC) claims that were processed by CMS. Carriers are private companies that have contracts with Medicare to process Part B claims. Durable medical equipment (DME) is equipment that can withstand repeated use and is appropriate for home use, for example wheelchairs, oxygen equipment, and hospital beds. The file includes data on each carrier; pricing locality; HCPCS procedure code; type and place of service; submitted, allowed, and denied services and charges; and payment amounts. More information can be found at: www.resdac.org/cms-data/files/psps.
The Provider of Services (POS) File contains a record of each Medicare provider, including all institutional providers, ambulatory surgical centers, and clinical laboratories. The file, which is updated quarterly, includes CMS provider identification numbers and the characteristics of hospitals and other types of facilities, including the name, address, and type of Medicare services the facility provided. For further information see: www.resdac.org/cms-data/files/pos.
The last two files in this category have been discontinued and replaced by the National Provider Identifier (NPI) Downloadable File, which was previously discussed. These two files, which may be of interest to researchers investigating physicians in the mid-2000s, include the Unique Physician Identification Number (UPIN) Directory and the Unique Physician Identification Number (UPIN) Group File. The first file contains the name, specialty, license number, and zip code of physicians, limited licensed practitioners, and some nonphysician practitioners who were enrolled in the Medicare program. The second file provides data on group practices and the physicians who were members of them. Both files were discontinued in 2007 with the implementation of the NPI. Information on the two files can be obtained at: www.resdac.org/cms-data/files/upin-directory and www.resdac.org/cms-data/files/upin-group.

Chronic Conditions Data Warehouse

Another important CMS data product is the Chronic Conditions Data Warehouse (CCW). Established in 2006, the CCW is a national Medicare and Medicaid research database containing claims and assessment data linked by beneficiary across the continuum of care. It also includes Medicare Part D prescription drug event data, with plan, pharmacy, and prescriber characteristics, and a formulary file.
The CCW is designed to promote the use of current, easy-to-use Medicare and Medicaid analytic data files by researchers and policy analysts, promote longitudinal research using data already linked by beneficiary across the continuum of care, identify areas to improve the quality of care provided to chronically ill beneficiaries, identify possible ways to reduce program spending, and provide thorough documentation so these data may be used accurately (General Dynamics Information Technology 2013; CCW website, www.ccwdata.org/web/guest/about-ccw).
The CCW uses various computer algorithms to identify various conditions. The database includes 27 chronic disease conditions, 9 mental health and tobacco use conditions, and 15 conditions that are related to physical and intellectual disability and developmental disorders.
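To give a sense of what a claims-based condition algorithm of this kind can look like, the Python sketch below flags a beneficiary as having a condition when enough qualifying claims carry a listed diagnosis code inside a look-back window. This is a hedged illustration only: the rule structure, the placeholder codes `DX1` and `DX2`, the 730-day window, and the claim-count thresholds are all assumptions made for the example and are not the CCW's published definitions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Claim:
    beneficiary_id: str
    service_date: date
    claim_type: str          # e.g. "inpatient" or "carrier"
    diagnosis_codes: tuple   # diagnosis codes recorded on the claim


# Hypothetical rule: placeholder codes, window, and thresholds are invented
# for illustration and are NOT an actual CCW condition definition.
EXAMPLE_RULE = {
    "codes": {"DX1", "DX2"},
    "lookback_days": 730,
    "min_claims": {"inpatient": 1, "carrier": 2},
}

EXAMPLE_CLAIMS = [
    Claim("A", date(2012, 3, 1), "carrier", ("DX1",)),
    Claim("A", date(2012, 9, 15), "carrier", ("DX1", "OTHER")),
    Claim("B", date(2012, 5, 2), "carrier", ("DX2",)),
    Claim("C", date(2009, 1, 10), "inpatient", ("DX1",)),
]


def has_condition(claims, beneficiary_id, as_of, rule):
    """True if the beneficiary meets any claim-count threshold in the window."""
    window_start = as_of - timedelta(days=rule["lookback_days"])
    counts = {}
    for claim in claims:
        if claim.beneficiary_id != beneficiary_id:
            continue
        if not (window_start <= claim.service_date <= as_of):
            continue  # outside the look-back window
        if rule["codes"].isdisjoint(claim.diagnosis_codes):
            continue  # no qualifying diagnosis on this claim
        counts[claim.claim_type] = counts.get(claim.claim_type, 0) + 1
    return any(counts.get(kind, 0) >= needed
               for kind, needed in rule["min_claims"].items())


if __name__ == "__main__":
    for bene in ("A", "B", "C"):
        print(bene, has_condition(EXAMPLE_CLAIMS, bene,
                                  date(2012, 12, 31), EXAMPLE_RULE))
```

In this toy data, beneficiary A qualifies through two carrier claims, B has only one, and C's inpatient claim falls outside the window. Anyone working with real CCW flags should consult the warehouse's own condition documentation for the actual code lists, reference periods, and claim requirements.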
Specifically, the CCW’s chronic disease conditions include acquired hypothyroidism, acute myocardial infarction, Alzheimer’s disease, Alzheimer’s or related dementia, anemia, asthma, atrial fibrillation, benign prostatic hyperplasia, cataract, chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), heart failure, depression, diabetes, glaucoma, hip/pelvic fracture, hyperlipidemia, hypertension, ischemic heart disease, osteoporosis, rheumatoid/osteoarthritis, stroke/transient ischemic attack (TIA), breast cancer, colorectal cancer, lung cancer, prostate cancer, and endometrial cancer.
The CCW’s mental health and tobacco conditions include conduct disorders and hyperkinetic syndrome, anxiety disorders, bipolar disorder, depressive disorders, personality disorders, posttraumatic stress disorder (PTSD), schizophrenia, schizophrenia and other psychotic disorders, and tobacco use disorder.
Lastly, the CCW’s physical and mental disability conditions include autism spectrum disorder; cerebral palsy; cystic fibrosis and other metabolic developmental disorders; epilepsy; intellectual disabilities and related conditions; learning disabilities and other developmental delays; mobility impairments; multiple sclerosis and transverse myelitis; muscular dystrophy; sensory – deafness and hearing impairment; sensory – blindness and visual impairment; spina bifida and other congenital anomalies of the nervous system; spinal cord injury; traumatic brain injury and nonpsychotic mental disorders due to brain damage; and other developmental delays.
General information on the CCW can be obtained at www.ccwdata.org/web/guest/home. A current detailed user guide (Buccaneer Computer Systems and Service 2015) can be found at: www.ccwdata.org.

Medicare Current Beneficiary Survey

A very widely used CMS data product is the Medicare Current Beneficiary Survey (MCBS). Since the survey’s inception in 1991, the MCBS data files have been used to estimate the health status, healthcare use and expenditures, health insurance coverage, satisfaction with the care they received, and socioeconomic and demographic characteristics of Medicare beneficiaries. It also has been used to study the occurrence and treatment of specific chronic conditions of the elderly such as depression, dementia, hip fractures, glaucoma, osteoporosis, and rheumatoid arthritis. A bibliography and copies of over 800 research articles published from 1992 to 2013, which used MCBS data, can be found at www.cms.gov/Research-Statistics-Data-and-Systems/Research/MCBS/Bibliography.html.
The MCBS is a continuous, in-person, longitudinal panel survey of a representative national sample of the Medicare population. Survey respondents are interviewed three times a year over a period of 4 years to form a continuous profile of their healthcare experience. Two types of interviews are conducted: a community interview done at the respondent’s residence and a healthcare institutional interview of knowledgeable staff on behalf of the beneficiary. An important feature of the MCBS is that respondents are followed into and out of long-term care facilities during their panel participation. About 16,000 Medicare beneficiaries are interviewed every year (Adler 1994; Briesacher et al. 2012).
Two data products are derived each year from the MCBS: the Access to Care data file and the Cost and Use data file. The Access to Care file represents all persons enrolled in Medicare throughout the entire data collection year, which is referred to as the “always enrolled” beneficiary population. The file contains data on the beneficiaries’ access to healthcare, satisfaction with care, and usual source of care. The Access to Care file is released within a year of the survey (Petroski et al. 2014).
The Cost and Use file represents all persons enrolled in Medicare at any point during the data collection year, which is referred to as the “ever-enrolled” beneficiary population. The file links Medicare claims data to survey-reported events and provides complete expenditure and source of payment data on all healthcare services, including those not covered by Medicare. The file contains data on the beneficiaries’ use and cost of healthcare services, information supplementary
health insurance, living arrangements, income, health status, and physical functioning. The Cost and Use file is released within 2 years of the survey.
More information on the MCBS and its two files can be obtained at: www.cms.gov/Research-Statistics-Data-and-Systems/Research/MCBS/index.html?redirect=/MCBS. Additionally, an informative free webinar presentation, “Getting and Using the Medicare Current Beneficiary Survey (MCBS) for Health Services Research: Guidance from the Experts,” is available from AcademyHealth at: www.academyhealth.org/Training/ResourceDetail.cfm?ItemNumber=11031.

Medicare Qualified Entity Program

The last data product to be discussed is CMS’ Medicare Qualified Entity Program. This program, which was mandated by the Affordable Care Act of 2010, requires CMS to give qualified entities (QEs) access to Medicare claims data so that they can produce public performance reports on physicians, hospitals, and other healthcare providers. The program enables the QEs to combine Medicare claims data with commercial insurance and Medicaid claims data. To become a QE, an organization must demonstrate existing expertise in performance measurement, the ability to combine Medicare data with other claims data, a process for allowing providers to review and correct their performance reports, and adherence to data privacy and security procedures (Hostetter and Klein 2013).
As of June 2014, CMS has certified 12 regional and one national QE: Oregon Health Care Quality Corporation (Q-Corp), Health Improvement Collaborative of Greater Cincinnati, Kansas City Quality Improvement Consortium, Maine Health Management Coalition Foundation, Health Insight (covering five counties in New Mexico), California Healthcare Performance Information System, Pittsburgh Regional Health Initiative, Minnesota Community Measurement, Wisconsin Health Information Organization, Center for Improving Value in Health Care (covering Colorado), Minnesota Department of Health, Division of Health Policy, Midwest Health Initiative (covering the St. Louis area and 16 counties in Missouri), and the Health Care Cost Institute (covering all 50 states and the District of Columbia).
The QEs are beginning to release public reports using the combined Medicare and other payer data. The first report was published by the Oregon Health Care Quality Corporation, Information for a Healthy Oregon: Statewide Report on Health Care Quality 2014 (www.qcorp.org/reports/statewide-reports). It includes information on Oregon’s chronic disease care, preventive services, and ambulatory and hospital resource use.
More information on CMS’ Qualified Entity Program is available at: www.resdac.org/cms-data/request/qualified-entity-program; www.cms.gov/QEMedicareData; and www.QEMedicareData.org.

Conclusion

In the future, CMS will release more information and data products that will be useful to health services researchers, policymakers, educators, students, and the general public. CMS will continue to collect data on the Medicare, Medicaid, and Children’s Health Insurance Program (CHIP). At the same time, CMS will also expand its data collection efforts to measure its many new initiatives, which are attempting to improve the quality of patient care, provide a greater emphasis on prevention and population health, and expand healthcare coverage. These initiatives will encourage all of the nation’s healthcare providers to use electronic health records, establish more Accountable Care Organizations (ACOs), increase value-based purchasing, better coordinate care for dual eligible beneficiaries, and reduce unnecessary hospital readmissions. As CMS moves from being a volume payer of healthcare services to a value-based payer, it will need much more data to identify the best ways to increase the quality of care while at the same time lowering its costs (Burwell 2015; CMS Strategy 2013).
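
As a rough illustration of the Qualified Entity Program described above, the following Python sketch pools claims from Medicare, commercial insurance, and Medicaid and computes a simple provider-level performance rate. The record layout, the field names (provider_id, payer, met_measure), the sample values, and the function provider_rates are all hypothetical and chosen only for illustration; they do not reflect actual CMS file formats or the methodology of any QE.

```python
from collections import defaultdict

# Hypothetical, heavily simplified claim records (assumption: real claims
# files contain many more fields and require careful linkage and cleaning).
medicare_claims = [
    {"provider_id": "P1", "payer": "medicare", "met_measure": True},
    {"provider_id": "P1", "payer": "medicare", "met_measure": False},
    {"provider_id": "P2", "payer": "medicare", "met_measure": True},
]
commercial_claims = [
    {"provider_id": "P1", "payer": "commercial", "met_measure": True},
    {"provider_id": "P2", "payer": "commercial", "met_measure": False},
]
medicaid_claims = [
    {"provider_id": "P2", "payer": "medicaid", "met_measure": True},
]


def provider_rates(*claim_sources):
    """Pool claims from one or more payers and return, for each provider,
    the share of claims that met a (hypothetical) quality measure."""
    met = defaultdict(int)
    total = defaultdict(int)
    for source in claim_sources:
        for claim in source:
            total[claim["provider_id"]] += 1
            met[claim["provider_id"]] += int(claim["met_measure"])
    return {pid: met[pid] / total[pid] for pid in total}


# Rates based on Medicare claims alone versus all three payers combined.
medicare_only = provider_rates(medicare_claims)
rates = provider_rates(medicare_claims, commercial_claims, medicaid_claims)

for pid in sorted(rates):
    print(pid, round(medicare_only[pid], 2), round(rates[pid], 2))
```

In this toy data the two providers look quite different on Medicare claims alone but identical once all payers are pooled, which is one way to see why combining payer data can change a performance report.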
References

Adler GS. A profile of the Medicare current beneficiary survey. Health Care Financ Rev. 1994;15(4):153–63. Available at: www.cms.gov/Research-Statistics-Data-and-Systems/Research/HealthCareFinancingReview/Downloads/CMS1191330dl.pdf
Asper F. Introduction to Medicare cost reports. Slide presentation. Minneapolis: Research Data Assistance Center; 2013. Available at: www.resdac.org/sites/resdac.org/files/IntroductiontoMedicare Cost Reports (Slides).pdf
Baker LC, Bundorf MK, Kessler DP. Patients’ preferences explain a small but significant share of regional variation in Medicare spending. Health Aff. 2014;33(6):957–63.
Blumenthal D, Davis K, Guterman S. Medicare at 50 – moving forward. N Engl J Med. 2015;372(7):671–7. Available at: www.nejm.org/doi/full/10.1056/NEJMhpr1414856
Borck R, Laura R, Vivian B, Wagnerman K. The Medicaid analytic extract 2010 chartbook. Baltimore: Centers for Medicare and Medicaid Services; 2014. Available at: www.mathematica-mpr.com//media/publications/pdfs/health/maxchartbook_2010.pdf
Bowen SE. Evaluating outcomes of care and targeting quality improvement using Medicare Health Outcomes Survey data. J Ambul Care Manage. 2012;35(4):260–2.
Brennan N, Oelschlaeger A, Cox C, Tavenner M. Leveraging the big-data revolution: CMS is expanding capabilities to spur health system transformation. Health Aff. 2014;33(7):1195–202.
Briesacher BA, Tjia J, Doubeni CA, Chen Y, Rao SR. Methodological issues in using multiple years of the Medicare current beneficiary survey. Medicare Medicaid Res Rev. 2012;2(1):E1–19. Available at: www.cms.gov/mmrr/downloads/mmrr2012_002_01_a04.pdf
Buccaneer Computer Systems and Service. Chronic conditions data warehouse: Medicare administrative data user guide. Version 3.1. 2015. Available at: www.ccwdata.org
Bundy DG, Solomon BS, Kim JM, et al. Accuracy and usefulness of the HEDIS childhood immunization measures. Pediatrics. 2012;129(4):648–56. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3313643/
Burwell SM. Setting value-based payment goals – HHS efforts to improve U.S. health care. N Engl J Med. 2015;372(10):897–9. Available at: www.nejm.org/doi/full/10.1056/NEJMp1500445
Centers for Medicare and Medicaid Services (CMS). CMS announces data and information initiative. Fact Sheet. 2012. Available at: www.cms.gov/Research-Statistics-Data-and-Systems/Research/ResearchGeninfo/Downloads/OIPDA_Fact_Sheet.pdf
Centers for Medicare and Medicaid Services (CMS). Centers for Medicare and Medicaid services: justification of estimates for appropriations committees. Baltimore: Centers for Medicare and Medicaid Services; 2015, p. 2. Available at: www.cms.gov/About-CMS/Agency-Information/PerformanceBudget/Downloads/FY2015-CJ-Final.pdf
Centers for Medicare and Medicaid Services (CMS). CMS strategy: the road forward: 2013–2017. Baltimore: Centers for Medicare and Medicaid Services; 2013. Available at: www.cms.gov/About-CMS/Agency-Information/CMS-Strategy/Downloads/CMS-Strategy.pdf
Centers for Medicare and Medicaid Services (CMS). Medicare and you, 2015. Baltimore: Centers for Medicare and Medicaid Services; 2015. Available at: www.medicare.gov/Pubs/pdf/10050.pdf
Chen C, Petterson S, Phillips R, et al. Spending patterns in region of residency training and subsequent expenditures for care provided by practicing physicians for Medicare beneficiaries. JAMA. 2014;312(22):2385–92.
Chronic Condition Data Warehouse. About chronic condition data warehouse. Available at: www.ccwdata.org/web/guest/about-ccw
Cohen AB, Colby DC, Wailoo K, Zelizer J, editors. Medicare and Medicaid at 50: America’s entitlement programs in the age of affordable care. New York: Oxford University Press; 2015.
Erdem E, Concannon TW. What do researchers say about proposed Medicare claims public use files? J Comp Eff Res. 2012;1(6):519–25.
Erdem E, Korda HH, Haffer SC, Sennett C. Medicare claims data as public use files: a new tool for public health surveillance. J Public Health Manag Pract. 2014;20(4):445–52.
Ewing MT, editor. State Children’s Health Insurance Program (SCHIP). New York: Nova; 2008.
Feder J, Komisar HL. The importance of federal financing to the nation’s long-term care safety net. 2012. Available at: www.thescanfoundation.org
General Dynamics Information Technology. Centers for Medicare and Medicaid Services Chronic Condition Data Warehouse (CCW): national Medicare and Medicaid research database. Fairfax: General Dynamics Information Technology; 2013. Available at: www.gdit.com/globalassets/health/6978_ccw.pdf
Haffer SC, Bowen SE. Measuring and improving health outcomes in Medicare: the Medicare HOS program. Health Care Financ Rev. 2004;25(4):1–3. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC4194894/
Henry J. Kaiser Family Foundation. Medicaid moving forward. Kaiser Commission on Medicaid and the Uninsured, Fact Sheet. 2015. Available at: www.kff.org
Henry J. Kaiser Family Foundation. The Medicare Part D prescription drug benefit. Fact Sheet. 2014. Available at: www.kff.org
Henry J. Kaiser Family Foundation. State demonstration proposals to align financing and/or administration for dual eligible beneficiaries. February 2015. Fact Sheet. 2015. Available at: www.kff.org
Holmes GM, Pink GH, Friedman SA. The financial performance of rural hospitals and implications for elimination of the critical access hospital program. J Rural Health. 2013;29(2):140–9.
Hostetter M, Klein S. Medicare data helps fill in picture of health care performance. Quality Matters: The Commonwealth Fund Newsletter. 2013. Available at: www.commonwealthfund.org/publications/newsletters/quality-matters/2013/april-may/in-focus
Kane NM, Magnus SA. The Medicare cost report and the limits of hospital accountability: improving financial accounting data. J Health Polit Policy Law. 2001;26(1):81–106.
Lutfiyya MN, Gessert CE, Lipsky MS. Nursing home quality: a comparative analysis using CMS nursing home compare data to examine differences between rural and non-rural facilities. J Am Med Dir Assoc. 2013;14(8):593–8.
National Conference of State Legislatures. Children’s health: trends and options for covering kids. Washington, DC: National Conference of State Legislatures; 2014. Available at: www.ncsl.org/documents/health/coveringkids914.pdf
Office of the Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services. The Medicare advantage program in 2014. ASPE Issue Brief. 2014. Available at: http://aspe.hhs.gov
Orentlicher D. Medicaid at 50: no longer limited to the ‘Deserving’ poor? Yale J Health Policy Law Ethics. 2015;15(1):185–95.
Palmsten K, Huybrechts KF, Kowal MK, et al. Validity of maternal and infant outcomes within nationwide Medicaid data. Pharmacoepidemiol Drug Saf. 2014;23(6):646–55.
Petroski J, Ferraro D, Chu A. Ever enrolled Medicare population estimates from the MCBS access to care files. Medicare Medicaid Res Rev. 2014;4(2):E1–16. Available at: www.cms.gov/mmrr/Downloads/MMRR2014_004_02_a05.pdf
Pugh MJV, Marcum ZA, Copeland LA, et al. The quality of quality measures: HEDIS quality measures for medication management in the elderly and outcomes associated with new exposure. Drugs Aging. 2013;30(8):645–54. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3720786/
Rust G, Zhang S, Reynolds J. Inhaled corticosteroid adherence and emergency department utilization among Medicaid-enrolled children with asthma. J Asthma. 2013;50(7):769–75. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC4017346/
Saunders MR, Chin MH. Variation in dialysis quality measures by facility, neighborhood, and region. Med Care. 2013;51(5):413–7. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3651911/
Stein BD, Pangilnan M, Sorbero MJ, et al. Using claims data to generate clinical flags predicting short-term risk of continued psychiatric hospitalizations. Psychiatr Serv. 2014;65(11):1341–6.
U.S. Government Accountability Office. Health care transparency: actions needed to improve cost and quality information for consumers. Washington, DC: U.S. Government Accountability Office; 2014. Available at: www.gao.gov/products/GAO-15-11
Wennberg JE. Tracking medicine: a researcher’s quest to understand health care. New York: Oxford University Press; 2010.
Werner RM, Bradow ET. Relationship between Medicare’s hospital compare performance measures and mortality rates. JAMA. 2006;296(22):2694–702.
Williams A, Straker JK, Applebaum R. The nursing home five star rating: how does it compare to resident and family views of care? Gerontologist. 2014.
Wright A, Feblowitz J, Samal L, et al. The Medicare electronic health record incentive program: provider performance on core and menu measures. Health Serv Res. 2014;49(1 Pt 2):325–46. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC3925405/

5 Health Services Data: Typology of Health Care Data
Ross M. Mullner
Contents
Introduction
Basic Units of Analysis
Individuals
Households
Groups/Populations
Health Care Organizations
Health Care Programs
National Health Care Systems
Collection Methods
Literature Reviews
Observations
Focus Groups
Surveys
Medical Records, Administrative, and Billing Sources
Registries
Vital Records
Data Sources and Holdings
Government Organizations
Private Organizations
Conclusion
References

Abstract

Health services researchers study access, cost, quality, and the outcome of health care. These researchers frequently use existing data collected by government agencies and private organizations to monitor and evaluate current health care programs and systems and to predict the consequences of proposed new health policies. Primarily focusing on US data sources, this chapter outlines a practical typology, or classification framework, of health care data that is often used by these researchers when they are gathering data and conducting their studies. The typology addresses three important inextricably linked questions. First, what is the basic unit of analysis for the study? These units include individuals, households, groups/populations, health care organizations, health care programs, and national health care systems. Second, how were these data collected? The methods used to collect data include literature reviews, observations, focus groups, surveys, medical records and administrative and billing sources, registries, and vital records. Third, which government agency or private organization collected and is currently holding these data? Government data collection and holding agencies include US health information clearinghouses and libraries, US registries, US government agencies and departments, health programs and systems of other (non-US) nations, and government sponsored international organizations. Private data collecting and holding organizations include health information clearinghouses and libraries; accreditation, evaluation, and regulatory organizations; associations and professional societies; foundations and trusts; health insurance and employee benefits organizations; registries; research and policy organizations; and survey research organizations. To illustrate each of the questions and classifications, many examples are provided and discussed. And many US and other public use data files are identified and described.

R. M. Mullner (*)
Division of Health Policy and Administration, School of Public Health, University of Illinois, Chicago, IL, USA
e-mail: rmullner@comcast.net

https://doi.org/10.1007/978-1-4939-8715-3_6

Introduction

Health services research is a growing field of study that is becoming increasingly important to society. As medicine progresses and increasingly saves lives, becomes more technologically complex, and is ever more costly and demands a greater share of society’s resources, a growing number of people are conducting health services research studies. These researchers include physicians, nurses, epidemiologists, demographers, health economists, medical sociologists, political scientists, public policymakers, hospital administrators, insurance executives, senior business managers, and consultants.
Health services research can be broadly defined as a multidisciplinary field of study that focuses on access, cost, quality, and the outcome of health care (Mullner 2009). Access to health care, which can be defined as encompassing everything that facilitates or impedes the use of health care services, is a basic requirement of any health care facility, program, or system. A number of factors influence an individual’s access to health care including the environment, population characteristics, health behavior, and outcomes. Environmental factors include the health care system (e.g., whether it is acceptable to the individual or not) and external environmental factors (e.g., geographic distance, physical barriers, and political barriers). Population characteristics include predisposing characteristics (e.g., age and gender), enabling resources (e.g., income and health insurance), and perceived need (e.g., health beliefs). Health behavior includes personal health practices and previous use of health services. Lastly, outcomes include perceived health status, evaluated health status, and consumer satisfaction with care (Andersen 1995).
Health services researchers studying access to health care investigate various topics such as identifying ethnic and racial disparities in medical care; determining the geographic locations of health professional shortage areas; studying the factors associated with the diffusion and use of new medical technology and facilities; measuring access to hospitals and other health care facilities; and identifying the availability of health insurance coverage and determining its impact on the use of health care services (Agency for Healthcare Research and Quality 2014).
Cost of health care, which can be defined as the amount that has to be paid or spent to buy or obtain health care, can be differentiated and measured in many ways including average cost, fixed cost, incremental cost, marginal cost, total cost, and variable cost, as well as direct and indirect cost, avoided cost, cost of lost productivity, and the societal cost of illness (Culyer 2010; Feldstein 2011). It should be noted that health care cost frequently differs greatly from the price of health care, because the price is often not determined by cost, but rather it is greatly distorted by what health insurers are willing to pay (Painter and Chernew 2012).
Health services researchers studying the cost of health care investigate a large number of topics
such as conducting international comparisons of health care cost in various nations, determining the cost-benefit and cost-efficiency of medical procedures and drugs, investigating the impact of different methods of financing care, determining the impact of new payment reform models (e.g., pay-for-performance), identifying the impact of health care rationing, estimating the economic value of life, and identifying the economic and societal cost of particular medical conditions and diseases (Health Care Cost Institute 2014).
Quality of health care, which can be defined as getting the right care to the right patient at the right time – every time, is evaluated using three dimensions: structure, process, and outcome. Structure includes the characteristics of the care setting, such as type and size of the health facility, expertise of the medical staff, sophistication of the medical equipment, and the policies related to care delivery. Process consists of the methods of providing patients with consistent, appropriate, effective, safe, coordinated, timely, and patient-centered high quality care. Outcome evaluates the end result of care the patient received (Clancy 2009; Donabedian 1980).
Health services researchers studying the quality of health care investigate such topics as identifying the impact of accreditation and licensing of health care facilities and professionals; estimating the overuse, underuse, and misuse of health care services; determining the occurrence of preventable medical errors; identifying the frequency of health care-associated infections; studying patient safety problems; and developing and testing new medical quality indicators of care (Agency for Healthcare Research and Quality 2014; National Committee for Quality Assurance 2006).
Lastly, the outcome of health care reflects the interrelated issues of access, cost, and quality of care. Outcome of health care can be broadly defined and includes the occurrence and change in the number and rate of death, disease, disability, discomfort, and dissatisfaction with health care. Death or mortality also includes changes in longevity. Disease or morbidity addresses acute and chronic disease and complications with medical care. Disability deals with the change in physical functional status and psychosocial functioning. Discomfort includes various levels of pain from “no pain” to “worst pain imaginable” and its duration. Dissatisfaction, which reflects the level of satisfaction, measures the specific and overall experience with care (Kane and Radosevich 2011).
Health services researchers studying the outcome of health care tend to investigate such topics as estimating the number of preventable deaths of enrollees in various health programs; determining the factors leading to the increase in longevity; identifying the health services provided to children and adolescents with chronic diseases and disabilities; developing and testing new pain scales; and analyzing and reporting the results of health satisfaction surveys (Halsey 2015; Perrin 2002; Williamson and Hoggart 2008).
Ideally, a health care facility, program, or system should provide the greatest access to health care, at the lowest possible cost, with the greatest level of quality, and achieve the best possible outcome of care. To work toward this very difficult ideal, health services researchers frequently study the equity, efficiency, and effectiveness of health care. Equity can be broadly defined as fairness, efficiency as the ratio of inputs to outputs, and effectiveness as meeting stated objectives and goals, such as the US national health goals contained in Healthy People 2020 (Aday et al. 2004).
The overall aim of health services research is to influence health policy and to improve the practice of medicine and public health. Health services researchers do this by monitoring and evaluating current health care facilities, programs, and systems and by predicting the consequences of proposed future health care policies.
Health services researchers frequently conduct studies using existing data sources. They typically conduct secondary data analysis of large databases that were collected by various government agencies and private organizations. There are many advantages in using existing data: they are readily available, inexpensive, and save time in collection, and they may be used to conduct longitudinal and international comparisons (Huston and Naylor 1996).
Primarily focusing on US data sources, this chapter outlines a practical typology, or classification
framework, of health care data that is frequently used by these researchers. The typology addresses three important inextricably linked questions. First, what is the basic unit of analysis of the study? Second, how were these data collected? Third, which government agency or private organization collected and is currently holding these data?

Basic Units of Analysis

After identifying a particular study area of interest, and a specific topic, a health services researcher must determine: What will be the basic unit of analysis of my proposed study? Table 1 shows a list of these units; it also presents some relevant questions that may be addressed for each unit. The basic units of analysis include individuals, households, groups/populations, health care organizations, health care programs, and national health care systems.

Table 1 Basic units of analysis
Individuals
  Identify the general demographic and social characteristics of individuals
  Determine the overall health status of individuals
  Measure the occurrence of specific diseases and medical conditions
Households
  Identify the demographic and social characteristics of households
  Measure the total household income and education levels
  Determine the household’s overall use of health care services
Groups/populations
  Identify the demographic, economic, and social characteristics of specific ethnic and minority groups
  Determine the overall health status of high risk and vulnerable populations
  Measure the gaps in health care among various groups
  Identify health professional shortage areas
Organizations
  Identify the total number of health care organizations in a region
  Assess the operating characteristics of hospitals
  Determine the number of long-term care facilities in an area
  Measure the service areas and degree of competition between healthcare organizations
Health care programs
  Identify the characteristics of Medicare beneficiaries
  Assess the number and type of providers of Medicaid services
  Determine the unwarranted use of services
National health care systems
  Compare the access, costs, quality, and outcomes of various national health care systems
  Determine how each system rations care
  Identify by country the highest and lowest levels of care

Individuals

Many health services researchers conduct their studies focusing on individuals. Information on individuals may be obtained from many sources such as patient health care records, birth and death certificates, insurance claim forms, and various national health surveys. Data on them may include a very large number of potential variables including the person’s age, sex, height, weight, race, ethnicity, place of birth, language most often spoken, marital status, highest level of education attained, main occupation, current work status, health insurance coverage, past medical history, current overall health status, physical activities, degree of mobility, disability status, individual risk factors (tobacco use, alcohol use, and poor nutrition), environmental risk factors (air pollution, ground water contamination, and lack of sanitation), self care, the level of pain and discomfort experienced, cognition problems, interpersonal activities, sleep and energy level, inventory of medicines and drugs, health seeking behaviors, health screenings, reproductive and sexual health care, maternal health care, child health preventive care, and health goals. An example of a widely used individual health questionnaire is the “World Health Survey, 2002,” which was implemented in 70 member states (countries) to gather data on a sample of 300,000 adults. Data from the surveys were used to strengthen each country’s capacity to monitor critical health outcomes and systems. A copy of the long- and short-survey instruments can be found on WHO’s website, www.who.int/healthinfo/survey/en/ (WHO 2002).
Another very important large-scale survey of individuals is the US Centers for Disease Control
and Prevention’s (CDC) Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS is a nationwide surveillance system that is conducted to monitor state-level prevalence of the major behavioral risks (e.g., exercise, alcohol consumption, tobacco use, immunizations, and various cancer screenings) among adults who have conditions associated with premature morbidity and mortality. To collect data, the CDC works together with state health departments and conducts monthly telephone surveys. Currently, more than 500,000 interviews are conducted annually, making the BRFSS the world’s largest telephone survey. Data from the survey are published in various reports (Xu et al. 2014), and annual survey data for 1984–2013 can be downloaded. The BRFSS also offers a statistical tool, the Web Enabled Analysis Tool (WEAT), which lets researchers conduct cross tabulations and logistic regression analysis, and an interactive mapping program to compare data across geographic areas. More information on the BRFSS can be obtained at: www.cdc.gov/brfss/.

Households

Health services researchers often study the health and health care seeking behavioral characteristics of households. In the USA, they frequently use data collected by the Centers for Disease Control and Prevention’s (CDC) National Center for Health Statistics (NCHS), the nation’s principal government health statistics agency. Many of NCHS’ surveys collect data on the demographic, socioeconomic, and health characteristics of households (NCHS “Summary” 2014b).
The oldest and arguably the most important National Center for Health Statistics’ household survey is the National Health Interview Survey (NHIS). The NHIS, which is considered the principal source of information on the health of the US population, has been used to continuously monitor the nation’s health since 1957. This large-scale household survey collects data on a statistically representative sample of the US civilian noninstitutional population. Each year interviewers visit 35,000–40,000 households across the nation and collect data on about 75,000–100,000 individuals. The survey collects data on a broad range of topics including access to health care services, health insurance coverage, physical and mental health status, chronic medical conditions, health-related behaviors, functioning and activity limitations, immunizations, and injuries and poisonings. Current and past NHIS public use data files, questionnaires, documentation, and analytic reports are readily available and can be downloaded for free from NCHS’ website www.cdc.gov/nchs/nhis.htm (NCHS 2010).

Groups/Populations

When general sample surveys of individuals or households do not adequately yield reliable health care data on specific groups or populations, supplements may be added to existing surveys or new surveys may be developed to obtain data on those groups or populations. Some of these groups or populations may include racial and ethnic minority groups (American Indians and Alaska Natives, Asians, Native Hawaiian and Pacific Islanders, Blacks, and Hispanic/Latinos), high risk and vulnerable populations (infants, children under 5 years of age, pregnant women, and the elderly), and groups with a specific disease or medical condition (blindness, hearing loss, and severe disability).
To obtain information on a group or population, the National Health Interview Survey (NHIS) often adds supplements to its standard survey and expands the number of households sampled. These supplements are sponsored by various government agencies and nonprofit organizations. In 2014, for example, the NHIS added 4,000 additional households to its survey to obtain more data on the health of Native Hawaiian and Pacific Islanders (NCHS 2014a).

Health Care Organizations

Health services researchers frequently study health care organizations. They study many types of organizations such as medical group practices, outpatient surgery centers, home care
82 R. M. Mullner
organizations, Health Maintenance Organizations (HMOs), and Accountable Care Organizations (ACOs). But they particularly study hospitals and nursing homes.

The hospital is arguably the single most important institution for the delivery of modern health care, while the nursing home is the major institution caring for the elderly. The most widely used data source on US hospitals is the American Hospital Association’s (AHA) annual survey of hospitals. The most important source of data on nursing homes is the US Department of Health and Human Services, Centers for Medicare and Medicaid Services’ (CMS) Nursing Home Compare data program.

The American Hospital Association (AHA) conducts an annual survey of the nation’s approximately 6,000 hospitals, which account for 920,000 staffed beds and 36 million admissions. The survey, which is the most comprehensive and authoritative source on US hospitals, collects almost 900 variables on each hospital. These data include the hospital’s address, bed size, ownership (for-profit, not-for-profit, government), type of hospital (community, psychiatric, long-term care, federal, and units of institutions), membership in a multihospital system or network, teaching status, type of facilities and services offered, physician arrangements, information technology, total number of inpatients and outpatient visits, Medicare/Medicaid utilization, revenues and expenses, and number of hospital staff. Data from the survey are published in the annual AHA Guide to the Health Care Field and AHA Hospital Statistics, and the proprietary data can be purchased on CD (AHA Data Viewer 2015; AHA 2013).

The Centers for Medicare and Medicaid Services (CMS), which administers the nation’s Medicare program and works in partnership with state governments to administer Medicaid programs, continuously gathers data on the country’s nearly 16,000 certified nursing homes. These nursing homes provide services to over 1.4 million residents, corresponding to nearly 3 % of the nation’s over 65 population and 10 % of the over 85 population. Because CMS pays for nursing home services provided to Medicare beneficiaries and Medicaid recipients, it continuously monitors and updates its files on them. CMS collects data on the address of each nursing home; the facilities’ bed size, ownership type, and certification; number of nursing home residents; demographic and medical characteristics of the residents, including cognitive and functional impairments; and the number, type, and level of deficiencies these facilities experienced. The deficiencies include citations for substandard quality of care; abuse; improper restraint use; pressure sores; actual harm or worse; and the immediate jeopardy threat to the health or life of one or more nursing home residents. Data on individual nursing homes can be obtained at CMS’s Medicare.gov Nursing Home Compare website www.medicare.gov/nursinghomecompare, the entire database can be downloaded, and a summary nursing home data compendium is published annually, which is also available on the website (CMS 2014).

Health Care Programs

Many health services researchers study large national health care programs. One of the most widely studied is the US Medicare program. This federal government administered national program provides health insurance for over 50 million people, including those 65 years of age or older, those with certain disabilities, and people of any age with End-Stage Renal Disease (ESRD) (permanent kidney failure requiring dialysis or a kidney transplant).

The Medicare program consists of four different parts: Part A (hospital insurance covering inpatient care, nursing home, hospice, and home health care), Part B (medical insurance covering physician services, outpatient and home health care, and durable medical equipment), Part C (Medicare Advantage, a managed care program covering Part A and B), and Part D (covering prescription drugs).

The program collects data on its various parts including claims for services provided to each beneficiary admitted to a certified hospital and nursing home. It codes the beneficiaries’ address,
where they received care, their medical diagnoses, admission date, what services were provided, discharge date, discharge status, cost of each service, and the total cost of care. If the beneficiary dies after receiving care, the death is coded for up to 3 years after discharge. One widely used CMS database is the Medicare Provider Analysis and Review (MEDPAR) file, which can be obtained from CMS’ website www.cms.gov/Research-Statistics-Data-and-Systems/IdentifiableDataFiles/MedicareProviderAnalysisandReviewFile.html (CMS 2014).

An exemplar of the innovative use of the MEDPAR database is the research conducted by the Dartmouth Atlas of Health Care Project. Health services researchers working on the project, which is housed at Dartmouth College’s Institute for Health Policy and Clinical Practice, have studied a wide range of medical practice patterns at the national, regional, state, and local levels. For more than 20 years, these researchers have found and documented glaring unwarranted variations in surgeries, diagnostic testing, imaging exams, physician visits, referrals to specialists, hospitalizations, and stays in intensive care units. They have consistently found that more health care is not necessarily better care (Dartmouth Atlas of Health Care 2015).

Using Medicare data, the Dartmouth researchers have identified three broad categories of medical care: effective or necessary care, preference-sensitive care, and supply-sensitive care. Effective or necessary care includes services that are based on sound medical evidence, which work better than any alternative treatment (e.g., surgery for hip fractures and colon cancer). They estimate that this category of care accounts for no more than 15 % of total Medicare spending. Preference-sensitive or elective care includes interventions for which there are several options and where the outcomes vary depending on the option used (e.g., elective surgeries, mammography screening tests, and prostate specific antigen tests). This accounts for about 25 % of Medicare spending. Lastly, supply-sensitive care includes everyday medical care used to treat patients with acute and chronic diseases (e.g., physician visits, imaging exams, and admissions to hospitals). This accounts for about 60 % of all Medicare spending.

To remedy the unwarranted variations in preference-sensitive care, the Dartmouth researchers argue for the greater use of evidence-based medicine to identify the best option, and they call for a fundamental reform of the physician-patient relationship, with greater shared decision-making and informed patient choice. To remedy the variations in supply-sensitive care, they argue that the common physician assumption that “more care is better” needs to change and there must be a new emphasis on improving the science of health care delivery (Wennberg 2010).

Published reports of the Dartmouth Atlas of Health Care Project as well as the data they used in many of their studies can be downloaded from their website www.dartmouthatlas.org.

National Health Care Systems

Lastly, some health services researchers conduct cross-national studies of health care systems, such as comparing the US health system to that of Canada, the United Kingdom, and other industrialized nations. It is hoped that these multinational comparisons may help health policymakers learn from the experiences of other nations, lead to new insights and perspectives, help in evaluating existing policies, and identify possible new solutions to shared problems.

Three important sources of data on national health care systems are the World Health Organization (WHO), Organisation for Economic Co-operation and Development (OECD), and the Commonwealth Fund.

The World Health Organization (WHO), which is the directing and coordinating authority for health within the United Nations (UN), collects health-related data on its 194 member states (nations). These data on the states are published in its series World Health Statistics. Issued annually since 2005, World Health Statistics is the definitive source of information on the health of the world’s people. The series is compiled using publications and databases produced and
maintained by the WHO’s technical programs and regional offices, and from various databases of the UN and World Bank. Data in the publication provide a comprehensive summary of the current health status and health system of each member state. These data include nine areas: life expectancy and mortality, cause-specific morbidity and mortality, selected infectious diseases, health service coverage, risk factors, health systems, health expenditures, health inequities, and demographic and socioeconomic statistics. WHO’s data in published form are available on its website www.who.int (WHO 2014).

The Organisation for Economic Co-operation and Development (OECD) is an international membership organization representing 34 industrialized nations that are committed to democracy and a free market economy. The OECD, working with its member nations, produces data and reports on a wide variety of economic and social topics, including health care. Each year it releases data comparing the health care systems of its member nations including: health care spending – average spending per capita, spending as a percentage of GDP, spending per hospital discharge, and pharmaceutical spending per capita; supply and utilization – number of practicing physicians per population, average number of physician visits per capita, Magnetic Resonance Imaging (MRI) machines per population, hospital discharges per population, and hip replacement inpatient cases per population; health promotion and disease prevention efforts – cervical cancer screening rates, flu immunization among adults 65 or older, and adults who report being daily smokers; quality and patient safety – mortality amenable to health care, breast cancer 5-year survival rate, and diabetes lower extremity amputation rates; prices – total hospital and physician prices for appendectomy and bypass surgery, and diagnostic imaging prices; and long-term care and social supports – percent of population age 65 or older, beds in residential long-term care facilities per population age 65 or older, and health and social care spending as a percentage of GDP. OECD data and its reports, which are frequently used by health services researchers (Anderson 2014; Anderson and Squires 2010), can be downloaded from their website www.oecd.org/statistics/ (OECD 2013).

The Commonwealth Fund, a private, nonpartisan foundation headquartered in New York City that supports independent research on health care issues to improve health care practice and policy, conducts annual cross-national studies. Starting in 1998, its International Health Policy Center has conducted multinational surveys of patients and their physicians to identify their experiences with their health care systems. The surveys focus on various aspects of access, costs, and quality of health care.

One of the center’s recent surveys was the “2014 Commonwealth Fund International Health Policy Survey of Older Adults,” a telephone interview survey of more than 15,000 people age 65 or older in 11 industrialized countries (Australia, Canada, France, Germany, the Netherlands, New Zealand, Norway, Sweden, Switzerland, the United Kingdom, and the United States). The survey’s major finding was that older adults in the US were sicker and more likely to have problems paying their medical bills and getting needed health care than those in the other 10 countries (Osborn et al. 2014).

The center has also conducted five surveys to monitor changes in multinational health care system performance, and the results have been published in a series of reports entitled Mirror, Mirror on the Wall (2004, 2006, 2007, 2010, 2014). Over the years, these reports have consistently found that among industrialized nations the US health care system has been the most expensive, but underperforms relative to other nations on most dimensions of access, efficiency, and equity (Davis et al. 2014).

Collection Methods

The second question of this typology of health care data is: How were these data collected? This question is important, because the way the data were collected may limit the type of statistical methodology that can be used to analyze them, and it may greatly affect the reliability and validity of the results of the study. Each data collection method has advantages and disadvantages and the researcher should be well aware of
them. Table 2 shows the various data collection methods, and it also lists some relevant questions that may be addressed by each method. The methods include literature reviews; observations; focus groups; surveys; medical records, administrative, and billing sources; registries; and vital records.

Table 2 Data collection methods

Literature reviews
  Identify what is known about a particular health care topic
  Determine what are the gaps in knowledge on the topic
  Conduct a meta-analysis to assess the clinical effectiveness of a health care intervention
  Answer a research question
Observations
  Observe patients taking their treatments
  Measure the degree of hand hygiene adherence at a health care organization
  Conduct a clinical observation, or shadowing, to determine how health care professionals actually provide patient care
Focus groups
  Determine the perceptions, opinions, beliefs, and attitudes towards a health program
  Identify specific problems with a health facility
  Present options to a group and see which ones are viewed favorably
Surveys
  Determine the past medical history of individuals
  Identify the experiences of patients in receiving care
  Measure the workload of physicians and other health care professionals
Medical records, administrative, and billing sources
  Identify and implement best practices of care
  Determine regional variations in the provision of health care
  Measure the average costs of various health care services
Registries
  Identify the occurrence of a disease within a population
  Assess the natural history of a disease, its management, and its outcomes
  Support health economic research
  Collect postmarketing safety data on medical products and pharmaceuticals
Vital records
  Determine trends in fetal and perinatal mortality
  Identify the relationship between infant birth weight and health care problems
  Determine trends in low-risk Cesarean delivery
  Identify trends in drug-poisoning deaths involving opioid analgesics and heroin

Literature Reviews

One of the easiest, fastest, and most economical ways to obtain data and information on a research topic or a specific research question is to conduct a literature review. A comprehensive literature review can help identify what is known and not known about a topic or question; what data sources are available; what variables were found to be important; what statistical methods were employed; what populations were studied; what sample sizes were used; and what are the gaps or possible errors in the studies.

A major resource in conducting literature reviews is the US National Library of Medicine’s (NLM) PubMed search engine. PubMed accesses MEDLINE and other databases of citations and abstracts in the fields of medicine, nursing, public health, and health care systems. Currently, PubMed contains more than 24 million citations from over 5,600 worldwide journals and thousands of books and reports. PubMed is easy to use; it can be searched by entering Medical Subject Headings (MeSH), the NLM’s controlled vocabulary, author names, title words or phrases, journal names, or any combination of these. It also links to many full-text articles and reports. PubMed’s website is: www.ncbi.nlm.nih.gov/pubmed.

Another important source for conducting literature reviews is the Cochrane Collaboration. Consisting of a network of 14 centers around the world, the Cochrane Collaboration is a nonprofit international organization that promotes and disseminates systematic reviews of health care interventions, particularly clinical trials. Collaborators from over 120 countries conduct these systematic reviews. The Cochrane Library contains a number of useful databases including Cochrane Database of Systematic Reviews (CDSR); Cochrane Central Register of Controlled Trials (CENTRAL); Database of Abstracts of Reviews of Effectiveness (DARE); Cochrane Methodology Register; Health Technology Assessment Database (HTA); and the
National Health Service Economic Evaluation Database (NHS EED). The Cochrane Collaboration’s website is www.cochrane.org.

Many of the Cochrane Collaboration’s systematic reviews include a meta-analysis of studies. Meta-analysis is a statistical technique that combines the findings from multiple research studies to develop a single conclusion that has greater statistical power. By pooling a number of independent studies, researchers can make a more accurate estimate of an effect (Borenstein et al. 2009; Higgins and Green 2008).

Observations

Health services researchers sometimes conduct observational studies to obtain data. In these types of studies, individuals are observed or certain outcomes are measured, but no attempt is made to affect the outcome. They do not involve an experiment or intervention. Observational studies may be either cross-sectional or longitudinal. Cross-sectional studies are short, quick snapshot studies, and they do not provide definitive information about a cause-and-effect relationship. However, longitudinal studies that are conducted over long periods of time with many observations can determine changes in individuals and populations. They can establish the sequence of events and suggest a cause-and-effect relationship.

Observational studies can vary greatly in size, scope, and complexity. Some observational studies are very small, inexpensive, quickly conducted, cross-sectional studies. An example of such a study would be a researcher investigating the waiting times of patients at a health care clinic. He or she might conduct the study by unobtrusively sitting in the waiting room for a few days observing and coding the demographic characteristics of each patient and the number of minutes they waited to be seen.

In contrast, other observational studies are very large, expensive, lengthy, longitudinal studies. One of the most famous longitudinal observational studies in modern medicine is the Framingham Heart Study. Begun in 1948 and continuing to the present, this study has followed large cohorts of individuals from Framingham, Massachusetts, to determine their risk of developing cardiovascular disease. Today, much of what is now common knowledge concerning the major risk factors of developing heart disease (hypertension, high “bad” cholesterol, diabetes, smoking, obesity, and a sedentary lifestyle) is based on the Framingham Study (Levy and Brink 2005).

Focus Groups

Occasionally, health services researchers conduct focus groups to obtain data. Focus groups generally consist of five to ten participants who are asked their opinions about a topic in a group interview. Although the interviews are informal, open-ended, and relatively broad, a moderator asks the group a series of questions to help direct the discussion. Focus groups may be used to explore new research areas, topics that are difficult to observe, and very sensitive topics. They may also be used to gather preliminary data, aid in survey development and more formal structured interviews, and clarify complex research findings. As the focus group session is occurring, it is audio- and/or video-recorded. These recordings are then transcribed, reviewed, and studied.

Focus groups have advantages as well as disadvantages. They may generate new ideas and allow clarification of issues, and the group members may stimulate each other. However, members and the moderator can bias responses; some members may dominate the group; and the results of the focus group may be difficult to analyze or quantify (Krueger and Casey 2009).

Recently, the Robert Wood Johnson Foundation conducted a series of focus groups to gather information on what consumers think about the rising cost of health care in the USA. The foundation convened eight focus groups in four cities: Philadelphia; Charlotte, North Carolina; Chicago; and Denver. The participants included individuals with employer-sponsored insurance, those who purchased their insurance on the private market, those enrolled in Medicare, and those without any health insurance coverage. The major findings of
the focus groups were that the participants were very aware of their actual health care costs; they were aware of the rising costs of care, but did not understand why it was happening; the rising costs were affecting their daily lives and purchases; and they were increasingly angry about the increasing costs, but felt helpless in reversing the trends (Robert Wood Johnson Foundation 2013).

Surveys

Some health services researchers rely heavily on surveys to gather data for their studies. They occasionally conduct their own health care surveys, but more often use data from surveys conducted by others. Using these data, they conduct health needs assessments, develop health profiles of groups/populations, monitor the health of cohorts and populations, and collect pre- and posttest health care measures.

Health care surveys are a very effective and efficient method of estimating the characteristics of large groups/populations using representative samples. Most health surveys are conducted with a large number of participants who are randomly selected to reduce the risk of selection bias. The surveys collect data in a structured, standardized manner from each respondent. Lastly, these data are typically summarized as counts of persons or events.

Health survey data are collected using two broad strategies: respondents are asked to reply to questions presented in questionnaires or read aloud by interviewers. These two strategies may be employed individually or in combination.

The most widely used type of survey is the self-administered mailed survey, whereby a questionnaire and an introductory cover letter are sent via standard mail to a sample of persons. The respondents are asked to complete the questionnaire and return it to the researcher using a preaddressed return envelope enclosed with the questionnaire. With the increasing use of home computers, self-administered surveys are also increasingly being sent to respondents via e-mail and the Internet.

Some health surveys are conducted by interviews, which may be completed over the telephone or face-to-face. Telephone interviews are more frequently used because of their versatility, data quality, and cost and time efficiency. In contrast, face-to-face interviews are generally considered to provide the very best data quality, but they are the most expensive and time-consuming surveys to complete (Aday and Cornelius 2006; Johnson 2014).

To collect longitudinal data to measure changes over time, health services researchers periodically send surveys to a panel of individuals or organizational respondents. An example of such a survey is the US Agency for Healthcare Research and Quality’s (AHRQ) Medical Expenditure Panel Survey (MEPS). Begun in 1996, MEPS is a set of surveys of individuals and households, their medical providers, and employers across the nation. MEPS collects data to estimate the frequency and use of specific health services, the cost and payment for these services, and the health insurance coverage held by and available to US workers.

Specifically, MEPS consists of three components: household, insurance, and other. The household component collects panel data from a sample of families and individuals using several rounds of interviewing conducted over 2 years. Data from the interviews make it possible for researchers to identify how the changes in the respondent’s health status, income, employment, health insurance, use of services, and payment of care are related. The insurance component gathers data by surveying employers about the health insurance coverage they offer their workers. The other component collects data on the hospitals, physicians, home health care providers, and pharmacies that provided care to respondents. It is used to supplement and/or replace information received from the respondents.

Data obtained from MEPS are published in various statistical briefs, which can be downloaded. Recent briefs have reported on the access to health care by adult men and women, ages 18–64 (Davis 2014); the number and characteristics of the long-term uninsured (Rhoades and Cohen 2014); and national health care expenses
by type of service and source of payment (Stagnitti and Carper 2014). MEPS household component public use data files and insurance component summary data tables are released on AHRQ’s MEPS website on a regular annual schedule, http://meps.ahrq.gov/mepsweb/about_meps/releaseschedule.jsp.

Medical Records, Administrative, and Billing Sources

A rich source of health care data can be obtained from medical records, administrative, and billing sources. The most widely used and easily accessible source of this type of data is the Medicare claims files. These data files have been widely used by health services researchers to identify: the factors that influence hospitalization; the geographic variations in the type of care patients receive, such as the previously discussed Dartmouth Atlas of Health Care Project; the cost-effectiveness of various clinical procedures; and the effect of health reform efforts such as the Affordable Care Act (ACA) on Medicare utilization rates.

CMS has numerous data files available to researchers. However, because of privacy concerns, some of the files are more restricted than others. CMS classifies its files into three categories: Research Identifiable Files (RIF), which are the most restricted files because they contain patient and condition identifiable data; Limited Data Sets (LDS), which are less restricted files because their patient-specific data are ranged or encrypted; and Public Use Files (PUF)/Non-identifiable Files, which are the least restricted files of all, are readily available, and can be easily downloaded.

CMS has released a number of public use data files. These “Basic Stand Alone (BSA) Medicare Claims Public Use Files (PUFs)” mainly consist of 5 % random samples of all Medicare beneficiaries from a reference year. Examples of these data files include: Hospital Inpatient Claims file, containing the variables: age, gender, base DRG, ICD-9 procedure code, length of stay, and the amount paid; Durable Medical Equipment (DME) Line Items file, containing a list of equipment provided such as oxygen equipment, hospital beds, and wheelchairs; Prescription Drug Events file, containing the variables: age, gender, drug name, dose, cost, and payment by patient; Hospice Beneficiary file, containing the variables: age, gender, and length of stay; Carrier Line Items file, containing physician/supplier medical claims data, dates of service, and reimbursement amounts; Home Health Agency (HHA) Beneficiary file, containing demographic and claim-related variables; Outpatient Procedures file, containing demographic variables and procedures provided; Skilled Nursing Facility (SNF) Beneficiary file, containing demographic and nursing home claims; Chronic Conditions file, containing age, gender, various chronic conditions, and dual-eligibility status; Institutional Provider and Beneficiary Summary file, containing data on Medicare institutional claims paid during the calendar year and a summary of other measures; Prescription Drug Profiles file, containing demographic variables, plan, drug, and prescriber characteristics, and payment data; and the Geographic Variation Public Use file, containing demographic, spending, utilization, and quality indicators at the state, hospital referral region, and county level.

Further information about the data files can be obtained from the CMS-funded Research Data Assistance Center (ResDAC), which is located at the University of Minnesota, Minneapolis. Its website is www.resdac.org.

Registries

Health services researchers occasionally use data from registries to conduct their studies. Registries are tools that systematically collect a defined set of exposures, health conditions, and demographic data about individuals, with the data held in a central database for a specific purpose. They are used for a multitude of purposes including monitoring treatment benefits and risks, understanding the natural history of diseases, identifying unmet medical needs, and determining the quality of care. Registries can vary greatly in size, scope, and duration. Some registries collect data at a
single clinic for a few weeks, while others are international in scope and collect data for many decades. Registries may be sponsored by government agencies, nonprofit organizations, health care facilities, and/or private for-profit companies (Arts et al. 2002).

It is difficult to classify the various types of registries because of their great diversity and scope. Also, they may collect overlapping sets of data. However, they can be very roughly divided into product registries, disease or condition registries, and health services registries.

Product registries gather data on individuals who received a specific drug or medical device. To ensure safety, these registries have been established to monitor individuals who received such drugs as thalidomide, and those who were given medical devices such as implantable cardioverter defibrillators. Registries have also been established to monitor possible drug exposures during pregnancy and the neonatal consequences.

Disease or condition registries gather data on individuals with specific disorders. These registries may identify the natural history of a disease, evaluate possible treatments, and stimulate new research on the cause and outcome of the disorder. Diseases included in these registries can vary from rare diseases such as cystic fibrosis, to relatively common chronic diseases such as heart failure.

Health services registries tend to gather data on individual clinical encounters such as physician office visits, hospitalizations, clinical procedures, and total episodes of care. Some registries include all patients undergoing a procedure such as an appendectomy or those admitted to a hospital for a particular diagnosis such as community-acquired pneumonia. Many of these registries are used to evaluate the outcome of care and the associated quality of health care services (Gliklich and Dreyer 2010).

An example of a unique health services registry is the Health Resources and Services Administration’s (HRSA) National Practitioner Data Bank (NPDB). The NPDB is a critical tool in the US’ efforts to protect patients from incompetent, unprofessional, and often dangerous health care practitioners. Since 1986, the NPDB has collected reports on medical malpractice payments, medical review actions, and sanctions by Board of Medical Examiners. It collects information from medical malpractice payments and adverse licensures, Drug Enforcement Administration (DEA) reports, and Medicare and Medicaid exclusion actions concerning physicians, dentists, and other licensed health care practitioners. The NPDB provides this information to health care providers, hospitals, and state and federal agencies to use when making important hiring or licensing decisions. This helps protect the public by preventing physicians and other practitioners from hiding their past when they move to a new state (Wakefield 2011).

The NPDB public use data file, which does not include any information that identifies individuals or reporting entities, is available for statistical analysis at www.npdb.hrsa.gov/resources/publicData.jsp.

Vital Records

Vital records include birth certificates, marriage licenses and divorce decrees, and death certificates. In the USA, counties and state governments collect, manage, and disseminate vital records, not the federal government. Health services researchers frequently use data from birth and death certificates in their studies. They use these data to track health trends to determine changing public health and research priorities, identify racial and ethnic disparities, measure the impact of various diseases, ascertain the use of health care services, and to address quality of care issues (Children’s Health Care Quality measures Core Set Technical Assistance and Analytic Support Program 2014; National Research Council 2009).

The US Standard Certificate of Live Birth contains a wealth of information on the newborn, as well as the mother and father. Data on the newborn include name, sex, time and place of birth, birth weight, Apgar scores, abnormal conditions, and congenital anomalies of the newborn. Data on the mother include name; address; education level; whether of Hispanic origin or not; race; date of first and last prenatal care visit; total number of prenatal visits; number of other pregnancy
outcomes; the degree of cigarette smoking before and during pregnancy; whether the mother was transferred for maternal medical or fetal indications for delivery; principal source of payment for the delivery; risk factors in the pregnancy such as diabetes, hypertension, and previous preterm birth; obstetric procedures used; onset of labor; characteristics of the labor and delivery; method of delivery; and maternal morbidity. Data on the father include: name; age; education level; whether of Hispanic origin or not; and race.

The US Standard Certificate of Death records the decedent’s: legal name; age; sex; social security number; birthplace; residence; marital status at the time of death; place of death; place of disposition; date of death; cause of death including the immediate and underlying cause; manner of death; if the injury led to death, the date and time of injury; and the location of injury.

There is also a separate certificate for fetal deaths. The US Standard Report of Fetal Death collects data on: the name of the fetus; sex; date and place where delivery occurred; initiating cause/condition; other significant causes or conditions; risk factors in the pregnancy; infections present and/or treated during the pregnancy; method of delivery; maternal morbidity; and congenital anomalies of the fetus.

Although birth, death, and fetal death certificates are confidential legal records, they can be obtained for research purposes from state public health departments. Summary data on births, deaths, fetal deaths, and linked birth/infant deaths can also be obtained from the National Center for Health Statistics (NCHS). Its data release and access policy for microdata and compressed vital statistics files can be found at www.cdc.gov/nchs/nvss/dvs_data_release.htm.

Data Sources and Holdings

The third question of this typology is: Which government agency or private organization collected and is currently holding these data? A large number of government and private organizations collect and disseminate health care data. Many private organizations, sometimes with government support through contracts and grants, also collect health care data for research purposes, to monitor health policies, and to identify their members’ views and opinions on various issues. Table 3 shows the classification of health care data collection organizations and holding sources, including a list of various representative organizations and their websites.

Government Organizations

Federal, state, and local governments collect data on the health care programs they conduct and manage. These data are often readily available to researchers at little or no cost. From the perspective of health services research, government data collection and holding agencies can be broadly classified into the following categories: US health information clearinghouses and libraries; US registries; US government agencies and departments; health programs and systems of other (non-US) nations; and government sponsored international organizations.

US Health Information Clearinghouses and Libraries

The federal government maintains many clearinghouses and libraries that are valuable resources for health services research. For example, the National Institutes of Health’s (NIH) National Library of Medicine (NLM) is the world’s largest biomedical library. The NLM maintains and makes available its vast print collection and produces and continuously updates its electronic information resources such as PubMed/MEDLINE. PubMed comprises more than 24 million citations from MEDLINE. The NLM also contains the National Information Center on Health Services Research and Health Care Technology (NICHSR). This center maintains databases and provides outreach and training, and information and publications on health services research. Its website is www.nlm.nih.gov/nichsr/.

The US government’s principal health statistical agency is the US Centers for Disease Control and Prevention’s (CDC) National Center for Health Statistics (NCHS). Since 1960, the NCHS has conducted numerous national health
Table 3 Data collection organizations and holding sources

Government Organizations
US Health Information Clearinghouses and Libraries
Area Health Resource Files (AHRF), www.ahrf.hrsa.gov
Congressional Research Service, www.loc.gov/crsinfo/
Data.gov, www.data.gov
HealthCare.Gov, www.healthcare.gov
National Center for Health Statistics (NCHS), www.cdc.gov/nchs/
National (Evidence-Based Clinical Practice) Guideline Clearinghouse, www.guideline.gov
National Health Information Center (NHIC), www.health.gov/nhic/
National Information Center on Health Services Research and Health Care Technology (NICHSR), www.nlm.nih.gov/nichsr/
National Institute on Deafness and Other Communication Disorders (NIDCD), www.nidcd.nih.gov/health/misc/pages/clearinghouse.aspx
National Library of Medicine (NLM), www.nlm.nih.gov
National Mental Health Information Center, www.samhsa.gov
National Oral Health Clearinghouse, www.nidcr.nih.gov
U.S. Registries
FDA Adverse Event Reporting System (FAERS), www.fda.gov
Global Rare Diseases Patient Registry Data Repository (GRDR), www.rarediseases.info.nih.gov
National Practitioner Data Bank (NPDB), www.npdb.hrsa.gov
National Registry of Evidence-Based Programs and Practices (NREPP), www.nrepp.samhsa.gov
National Vital Statistics System (NVSS), www.cdc.gov/nchs/nvss/
NIH Genetic Testing Registry (GTR), www.ncbi.nlm.nih.gov/gtr/
Surveillance, Epidemiology, and End Results (SEER) Cancer Registry, www.seer.cancer.gov
US Government Agencies and Departments
Agency for Healthcare Research and Quality (AHRQ), www.ahrq.gov
Centers for Disease Control and Prevention (CDC), www.cdc.gov
Centers for Medicare and Medicaid Services (CMS), www.cms.gov
Congressional Budget Office (CBO), www.cbo.gov
Employee Benefits Security Administration (EBSA), www.dol.gov/ebsa/
Federal Trade Commission (FTC), www.ftc.gov
Food and Drug Administration (FDA), www.fda.gov
Government Accountability Office (GAO), www.gao.gov
Health Resources and Services Administration (HRSA), www.hrsa.gov
Internal Revenue Service (IRS), www.irs.gov
Medicare Payment Advisory Commission (MedPAC), www.medpac.gov
National Institute on Aging (NIA), www.nia.nih.gov
National Institute on Drug Abuse (NIDA), www.drugabuse.gov
National Institutes of Health (NIH), www.nih.gov
Office of the National Coordinator for Health Information Technology (ONC), www.hhs.gov/healthit
Presidential Commission for the Study of Bioethical Issues, www.bioethics.gov
Substance Abuse and Mental Health Services Administration (SAMHSA), www.samhsa.gov
US Agency for International Development (USAID), Bureau for Global Health, www.usaid.gov
US Census Bureau, www.census.gov
US Department of Health and Human Services (HHS), www.hhs.gov
US Department of Justice, www.usdoj.gov
US Bureau of Labor Statistics, www.bls.gov
US Department of Veterans Affairs (VA), www.va.gov
US House of Representatives, www.house.gov
US Senate, www.senate.gov
US Social Security Administration (SSA), www.ssa.gov
White House, www.whitehouse.gov
Health Programs and Systems of Other (non-U.S.) Nations
Australian Commission on Safety and Quality in Health Care, www.safetyandquality.gov.au
Australian Government Department of Human Services, www.humanservices.gov.au
Canadian Agency for Drugs and Technologies in Health, www.cadth.ca
Canadian Institute for Health Information, www.cihi.ca
Canadian Institutes of Health Research, www.cihr-irsc.gc.ca
Health Canada, www.hc-sc.gc.ca
United Kingdom’s National Health Service (NHS), www.nhs.uk
United Kingdom’s National Institute for Health and Care Excellence (NICE), www.nice.org.uk
Government Sponsored International Organizations
European Commission, www.ec.europa.eu
European Observatory on Health Systems and Policies, www.euro.who.int/en/about-us/partners/observatory/about-us
Organisation for Economic Co-operation and Development (OECD), www.oecd.org
Pan American Health Organization (PAHO), www.paho.org
United Nations Children’s Fund (UNICEF), www.unicef.org
United Nations (UN), www.un.org
World Bank, www.worldbank.org
World Health Organization (WHO), www.who.int
Private Organizations
Health information clearinghouses and libraries
Centre for Evidence-Based Medicine (CEBM), www.cebm.net
Cochrane Collaboration, www.cochrane.org
Cornell Disability Research Group, www.disabilitystatistics.org
Dartmouth Atlas of Health Care Project, www.dartmouthatlas.org
Data Resource Center for Child and Adolescent Health, www.childhealthdata.org
Health Care Cost Institute (HCCI), www.healthcostinstitute.org
Health Data Consortium, www.healthdataconsortium.org
IMS Health, www.imshealth.com
Inter-University Consortium for Political and Social Research (ICPSR), www.icpsr.umich.edu
National Association of Health Data Organizations (NAHDO), www.nahdo.org
National Implementation Research Network (NIRN), www.preventionaction.org
National Rehabilitation Information Center (NARIC), www.naric.com
National Rural Health Resource Center, www.ruralcenter.org
Accreditation, evaluation, and regulatory organizations
Accreditation Association for Ambulatory Health Care (AAAHC), www.aaahc.org
Accreditation Canada, www.accreditation.ca
Accreditation Commission for Health Care (ACHC), www.achc.org
Association of American Medical Colleges (AAMC), www.aamc.org
Board of Certification/Accreditation (BOC), www.bocusa.org
Center for Improvement in Healthcare Quality (CIHQ), www.cihq.org
Community Health Accreditation Partner (CHAP), www.chapinc.org
Det Norske Veritas (DNV) Healthcare, www.dnvglhealthcare.com
Health Grades, www.healthgrades.com
Healthcare Facilities Accreditation Program (HFAP), www.hfap.org
Healthcare Quality Association on Accreditation (HQAA), www.hqaa.org
Intersocietal Accreditation Commission (IAC), www.intersocietal.org
Joint Commission, www.jointcommission.org
Leapfrog Group, www.leapfroggroup.org
Medical Travel Quality Alliance (MTQUA), www.mtqua.org
National Business Group on Health (NBGH), www.businessgrouphealth.org
National Committee for Quality Assurance (NCQA), www.ncqa.org
National Quality Forum (NQF), www.qualityforum.org
URAC, www.urac.org
Associations and professional societies
Disease/condition associations
ALS Association, www.alsa.org
American Association for Cancer Research (AACR), www.aacr.org
American Cancer Society (ACS), www.cancer.org
American Chronic Pain Association, www.theacpa.org
American Diabetes Association (ADA), www.diabetes.org
American Heart Association (AHA), www.heart.org
American Stroke Association, www.strokeassociation.org
American Trauma Society (ATS), www.amtrauma.org
Canadian Mental Health Association (CMHA), www.cmha.ca
CORD (Canadian Organization for Rare Disorders), www.raredisorders.ca
EURORDIS (European Organisation for Rare Diseases), www.eurordis.org
Mental Health America, www.mentalhealthamerica.net
National Alliance on Mental Illness (NAMI), www.nami.org
National Health Council, www.nationalhealthcouncil.org
NORD (National Organization for Rare Disorders), www.rarediseases.org
Unite for Sight, www.uniteforsight.org
Demographic and population group associations
AAPD (American Association of People with Disabilities), www.aapd.com
AARP, www.aarp.org
American Correctional Health Services Association (ACHSA), www.achsa.org
National Alliance for Hispanic Health, www.hispanichealth.org
National Association of Counties (NACO), www.naco.org
National Coalition for the Homeless, www.nationalhomeless.org
National Medical Association (NMA), www.nmanet.org
National Rural Health Association (NRHA), www.ruralhealthweb.org
NCAI (National Congress of American Indians), www.ncai.org
Population Association of America, www.populationassociation.org
Health care organizations and trade associations
AAMI (Association for the Advancement of Medical Instrumentation), www.aami.org
Advanced Medical Technology Association (AdvaMed), www.advamed.org
Ambulatory Surgery Center Association, www.ascassociation.org
American Association of Accountable Care Organizations (AAACO), www.aaaco.org
American Association of Blood Banks (AABB), www.aabb.org
American Association of Eye and Ear Centers of Excellence (AAEECE), www.aaeece.org
American Association of Homes and Services for the Aging (AAHSA), www.aahsa.org
American Association of Preferred Provider Organizations (AAPPO), www.aappo.org
American Health Care Association (AHCA), www.ahca.org
American Health Information Management Association (AHIMA), www.ahima.org
American Hospital Association (AHA), www.aha.org
Association for Behavioral Health and Wellness (ABHW), www.abhw.org
Association of the British Pharmaceutical Industry (ABPI), www.abpi.org.uk
Association of Clinical Research Organizations (ACRO), www.acrohealth.org
Catholic Health Association of the United States (CHAUSA), www.chausa.org
Children’s Hospital Association, www.childrenshospitals.net
Federation of American Hospitals (FAH), www.fah.org
HealthCareCAN, www.healthcarecan.ca
HOPE: European Hospital and Healthcare Federation, www.hope.be
International Hospital Federation (IHF), www.ihf-fih.org
Medical Device Manufacturers Association (MDMA), www.medicaldevices.org
National Association of ACOs (NAACOS), www.naacos.com
National Association of Community Health Centers (NACHC), www.nachc.com
National Association for Home Care and Hospice (NAHC), www.nahc.org
PhRMA (Pharmaceutical Research and Manufacturers of America), www.phrma.org
Trauma Center Association of America, www.traumacenters.org
UHC (University Health System Consortium), www.uhc.edu
World Medical Association (WMA), www.wma.net
Professional societies
AcademyHealth, www.academyhealth.org
American Academy of Family Physicians (AAFP), www.aafp.org
American Academy of Pediatrics (AAP), www.aap.org
American Academy of Physician Assistants (AAPA), www.aapa.org
American Board of Medical Specialties (ABMS), www.abms.org
American College of Emergency Physicians (ACEP), www.acep.org
American College of Healthcare Executives (ACHE), www.ache.org
American College of Surgeons (ACS), www.facs.org
American College of Radiology (ACR), www.acr.org
American College of Wound Healing and Tissue Repair (ACWHTR), https://acwound.org
American Dental Association (ADA), www.ada.org
American Medical Association (AMA), www.ama-assn.org
American Nurses Association (ANA), www.nursingworld.org
American Osteopathic Association, www.osteopathic.org
American Psychiatric Association (APA), www.psychiatry.org
American Psychological Association (APA), www.apa.org
American Public Health Association (APHA), www.apha.org
American Society of Anesthesiologists, www.asahq.org
American Society of Health Economists (ASHE), www.healtheconomics.us
American Society of Plastic Surgeons (ASPS), www.plasticsurgery.org
Canadian Medical Association (CMA), www.cma.ca
European Society for Health and Medical Sociology (ESHMS), www.eshms.eu
Health Services Research Association of Australia and New Zealand, www.hsraanz.org
International Health Economics Association (iHEA), www.healtheconomics.org
National Association of Chronic Disease Directors, www.chronicdisease.org
National Association of Medicaid Directors (NAMD), www.medicaiddirectors.org
National Cancer Registrars Association (NCRA), www.ncra-usa.org
National Governors Association (NGA), www.nga.org
National League for Nursing (NLN), www.nln.org
Society of General Internal Medicine (SGIM), www.sgim.org
Society for Medical Decision Making (SMDM), www.smdm.org
Foundations and trusts
Canadian Foundation for Healthcare Improvement, www.cfhi-fcass.ca
Commonwealth Fund, www.commonwealthfund.org
Ford Foundation, www.fordfoundation.org
Gates (Bill and Melinda) Foundation, www.gatesfoundation.org
Health Research and Educational Trust (HRET), www.hret.org
Kaiser (Henry J.) Family Foundation, www.kff.org
Kellogg (WK) Foundation, www.wkkf.org
Kresge Foundation, www.kresge.org
MacArthur (John D. and Catherine T.) Foundation, www.macarthur.org
Milbank Memorial Fund, www.milbank.org
National Patient Safety Foundation (NPSF), www.npsf.org
New America Foundation, www.newamerica.net
NIHCM (National Institute for Health Care Management) Foundation, www.nihcm.org
Pew Charitable Trusts, www.pewtrusts.org
Physicians Foundation, www.physiciansfoundation.org
Pfizer Foundation, www.pfizer.com
Public Health Foundation, www.phf.org
Robert Wood Johnson Foundation (RWJ), www.rwjf.org
Wellcome Trust, www.wellcome.ac.uk
Health insurance and employee benefits organizations
American Academy of Insurance Medicine (AAIM), www.aaimedicine.org
America’s Health Insurance Plans (AHIP), www.ahip.org
American Insurance Association (AIA), www.aiadc.org
Association for Community Affiliated Plans (ACAP), www.communityplans.net
Blue Cross and Blue Shield Association (BCBS), www.bcbs.com
Canadian Life and Health Insurance Association (CLHIA), www.clhia.ca
Employee Benefit Research Institute (EBRI), www.ebri.org
Healthcare Financial Management Association (HFMA), www.hfma.org
Insurance – Canada, www.insurance-canada.ca
Medicaid Health Plans of America (MHPA), www.mhpa.org
National Academy of Social Insurance (NASI), www.nasi.org
National Association of Health Underwriters (NAHU), www.nahu.org
National Association of Insurance Commissioners (NAIC), www.naic.org
Physicians for a National Health Program (PNHP), www.pnhp.org
Registries
Alzheimer’s Prevention Registry, www.endalznow.org
American Burn Association, National Burn Repository, www.ameriburn.org
Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR), www.aoa.org.au
British Society for Rheumatology Rheumatoid Arthritis Register (BSRBR-RA), www.inflammation-repair.manchester.ac.uk
Congenital Muscle Disease International Registry (CMDIR), www.cmdir.org
Cystic Fibrosis Foundation (CFF) Patient Registry, www.cff.org
DANBIO Registry of Biologics Used in Rheumatoid Arthritis Patients, www.danbio-online.dk
Danish Hip Arthroplasty Register (DHR), www.kea.au.dk
EPIRARE (European Platform for Rare Disease Registries), www.epirare.eu
International Society of Heart and Lung Transplantation (ISHLT), www.ishlt.org
Kaiser Permanente Autoimmune Disease Registry, www.kaiserpermanente.org
NAACCR (North American Association of Central Cancer Registries), www.naaccr.org
National Cancer Data Base, www.facs.org/quality-programs/cancer/ncdb/
National Cardiovascular Data Registry, www.cardiosource.org
National Marrow Donor Program’s Be the Match Registry, www.bethematch.org
National Trauma Data Bank, www.ntdsdictionary.org
Register of Information and Knowledge about Swedish Heart Intensive-care Admissions (RIKS-HIA), www.ucr.uu.se
Scientific Registry of Transplant Recipients (SRTR), www.srtr.org
Swedish Childhood Cancer Registry, www.cceg.ki.se
Swedish Hip Arthroplasty Register (SHAR), www.shpr.se
Swedish National Cataract Register (NCR), www.kataraktreg.se
Swedish Rheumatology Quality Register (SRQ), www.srq.nu/en/
United Kingdom Cataract National Data Set for Adults, www.rcophth.ac.uk
United Kingdom Myocardial Ischaemia National Audit Project (MINAP), www.hqip.org.uk
United Network for Organ Sharing (UNOS), www.unos.org
Research and policy organizations
Abt Associates, www.abtassociates.com
American Enterprise Institute for Public Policy (AEI), www.aei.org
American Health Policy Institute, www.americanhealthpolicy.org
American Research Institute for Policy Development (ARIPD), www.aripd.org
Battelle Memorial Institute, www.battelle.org
Brookings Institution, www.brookings.edu
Canadian Association for Health Services and Policy Research (CAHSPR), www.cahspr.ca
Cato Institute, www.cato.org
Deloitte Center for Health Solutions, www2.deloitte.com
ECRI Institute, www.ecri.org
Families USA, www.familiesusa.org
Galen Institute, www.galen.org
George Washington University Center for Health Policy Research, www.publichealth.gwu.edu
Institute for Clinical Evaluative Sciences (Ontario, Canada), www.ices.on.ca
Institute for e-Health Policy, www.e-healthpolicy.org
Institute for the Future (IFTF), www.iftf.org
Institute for Healthcare Improvement (IHI), www.ihi.org
Institute of Medicine (IOM), www.iom.edu
International Health Economics Association (iHEA), www.healtheconomics.org
Lewin Group, www.lewin.com
Manitoba Centre for Health Policy, www.umanitoba.ca/centres/mchp/
Mathematica Policy Research, www.mathematica-mpr.com
McMaster University Centre for Health Economics and Policy Analysis (CHEPA), www.chepa.org
National Bureau of Economic Research (NBER), www.nber.org
National Center for Policy Analysis (NCPA), www.ncpa.org
National Center for Public Policy Research, www.nationalcenter.org
National Coalition on Health Care (NCHC), www.nchc.org
National Health Policy Forum (NHPF), www.nhpf.org
National Health Policy Group (NHPG), www.nhpg.org
National Institute for Health Care Reform (NIHCR), www.nihcr.org
Nuffield Trust, www.nuffieldtrust.org.uk
Patient-Centered Outcomes Research Institute (PCORI), www.pcori.org
RAND Corporation, www.rand.org
RTI International, www.rti.org
Stanford University Center for Health Policy/Center for Primary Care and Outcomes Research, www.stanford.edu
Transamerica Center for Health Studies, www.transamericacenterforhealthstudies.org
University of British Columbia, Centre for Health Services and Policy Research, www.chspr.ubc.ca
University of California Los Angeles Center for Health Policy Research, www.healthpolicy.ucla.edu
University of Illinois at Chicago Institute for Health Research Policy, www.ihrp.uic.edu
University of Nebraska Center for Health Policy Analysis and Rural Health Research, www.unmc.edu
Urban Institute, www.urban.org
Westat, www.westat.com
Survey research organizations
AAPOR (American Association for Public Opinion Research), www.aapor.org
AASRO (Association of Academic Survey Research Organizations), www.aasro.org
American Statistical Association, Survey Research Methods Section, www.amstat.org/sections/srms
CASRO (Council of American Survey Research Organizations), www.casro.org
ESRA (European Survey Research Association), www.europeansurveyresearch.org
Gallup, Inc., www.gallup.com
GESIS – Leibniz Institute for the Social Sciences, Mannheim, Germany, www.gesis.org/en/institute/
Harris Interactive, www.harrisinteractive.com
Institute for Social Research, York University, www.isr.yorku.ca
NORC at the University of Chicago, www.norc.org
ORC (Opinion Research Corporation) International, www.orcinternational.com
Population Research Laboratory, University of Alberta, www.ualberta.ca/PRL/
Rasmussen Reports, www.rasmussenreports.com
Roper Center for Public Opinion Research, University of Connecticut, www.ropercenter.uconn.edu
Survey Health Care, www.surveyhealthcare.com
Survey Research Laboratory, University of Illinois at Chicago, www.srl.uic.edu
University of Virginia Center for Survey Research, www.virginia.edu/surveys/moreinfo.htm
surveys, and it has also worked closely with state governments to gather vital records. The NCHS currently has four major data collection programs: National Vital Statistics System (NVSS); National Health Interview Survey (NHIS); National Health and Nutrition Examination Survey (NHANES); and the National Health Care Surveys. A summary of NCHS’ surveys and data collection systems can be found at www.cdc.gov/nchs/data/factsheets/factsheet_summary2.pdf.

An important source of health care reports and issue briefs is the US Library of Congress’ (LOC) Congressional Research Service (CRS). The CRS, which is the research arm of the US Congress, conducts its analysis at the request of Congressional committees and individual members of both the House and Senate. CRS reports are unbiased, timely, and comprehensive. Recent reports have addressed the various aspects of the Affordable Care Act (ACA), the most extensive reform of the American health care system in the last 50 years. CRS’ reports are posted on its website, www.loc.gov/crsinfo/.

US Registries

The federal government conducts, and/or sponsors, a number of health care registries. For example, the National Cancer Institute’s (NCI) Surveillance, Epidemiology, and End Results (SEER) Program is the nation’s premier source of cancer statistics. The SEER Program is a coordinated system of population-based cancer registries located across the United States. It currently collects cancer incidence and survival data from 20 geographic areas of the nation, which together represent about 28 % of the US population. The SEER Program provides essential data to track the nation’s medical progress against cancer. It also enables researchers to study access, quality, and outcomes of health care, geographic patterns of cancer care, and health disparities. SEER’s data are widely available through factsheets, reports, databases, analytical software, websites, and linkages to other national data sources (National Cancer Institute 2010). More information can be obtained at www.seer.cancer.gov.
The US Food and Drug Administration (FDA) also has many registries that may be useful to health services researchers. There are registries that identify all US registered drugs and medical devices and their manufacturers; record the occurrence of adverse drug events and medication errors; and list drugs in short supply and the reasons for the drug shortages. The FDA also has a registry of all new and generic drug approvals. More information on these and other FDA registries can be found at www.fda.gov.

US Government Agencies and Departments

All of the 15 executive departments of the US federal government are involved in some way with collecting health care data. For example, the US Department of Labor (DOL) collects data on the nation’s health care workers and makes projections of future needs; the US Department of Defense (DOD) records the health care services it provides to military personnel and their families at bases throughout the world; and the US Department of Homeland Security (DHS) works with and collects data on the nation’s hospitals and public health departments to prepare for natural disasters and possible terrorist attacks.

The US Department of Health and Human Services (HHS) is by far the largest and arguably the most important collector of health care data of all. The HHS, with a budget of $1 trillion in fiscal year (FY) 2015, has many staff offices and operating divisions, which implement national health care policy, manage health care programs, deliver health care services, conduct medical research, and collect health care data. The major data collection systems sponsored by HHS, by agency and division, are listed in Table 4 (Office of the Assistant Secretary for Planning and Evaluation n.d.):

Table 4 List of major U.S. Department of Health and Human Services data sources by division/agency

Agency for Healthcare Research and Quality (AHRQ)
  Healthcare Cost and Utilization Project (HCUP)
  Medical Expenditure Panel Survey (MEPS)
Centers for Disease Control and Prevention (CDC)
  Behavioral Risk Factor Surveillance System (BRFSS)
  National Ambulatory Medical Care Survey (NAMCS)
  National Health Interview Survey (NHIS)
  National Health and Nutrition Examination Survey (NHANES)
  National Home and Hospice Care Survey (NHHCS)
  National Hospital Ambulatory Medical Care Survey (NHAMCS)
  National Hospital Care Survey (NHCS)
  National Hospital Discharge Survey (NHDS)
  National Immunization Survey (NIS)
  National Nursing Home Survey (NNHS)
  National Survey of Children’s Health (NSCH)
  National Survey of Family Growth (NSFG)
  National Survey of Residential Care Facilities (NSRCF)
  National Vital Statistics System (NVSS)
  State and Local Area Integrated Telephone Survey (SLAITS)
  Youth Risk Behavior Surveillance System (YRBSS)
Centers for Medicare and Medicaid Services (CMS)
  CMS Administrative Datasets
  Home Health Outcome and Assessment Information Set (OASIS)
  Medicare Current Beneficiary Survey (MCBS)
Health Resources and Services Administration (HRSA)
  Area Health Resource File (AHRF)
National Institutes of Health (NIH)
  Health and Retirement Study (HRS)
  National Children’s Study (NCS)
Substance Abuse and Mental Health Services Administration (SAMHSA)
  National Survey on Drug Use and Health (NSDUH)

Health Programs and Systems of Other (Non-U.S.) Nations

Some nations provide data and information about their health care programs and systems, which may be useful for health services research. This information can be quickly obtained via the Internet. Some of the best and most accessible English-language websites are: United Kingdom’s National Health Service (NHS) at www.nhs.uk, including its National Institute for Health and Care Excellence (NICE), www.nice.org.uk; Health Canada, www.hc-sc.gc.ca, and the Canadian Institutes of Health Research, www.cihr-irsc.gc.ca; and Australian Government Department of Human Services, www.humanservices.gov.au.
Government Sponsored International Organizations
Nearly all of the world’s nations are members of international organizations that address health care policy issues. For example, the World Health Organization (WHO), which is the directing and coordinating authority for health within the United Nations (UN), represents 194 member states (nations). Established in 1948 and headquartered in Geneva, Switzerland, the WHO provides leadership on health matters worldwide, and it sets norms and standards on health issues. Much of the WHO’s work is concentrated on supporting research and providing technical advice to the health departments and ministries of governments. The WHO compiles health statistics on its members and publishes numerous reports in print and on the Internet (Lee 2008). Its website can be accessed at www.who.int.
Another important international health organization is the Pan American Health Organization (PAHO). Founded in 1902 and headquartered in Washington, D.C., PAHO is the oldest international public health agency. PAHO provides technical cooperation and mobilizes partnerships to improve the health and quality of life in the nations of the Americas. It represents 50 nations and territories in North and South America and also serves as the regional office for the Americas of the World Health Organization. PAHO collects health statistics on its members, publishes reports, and posts them on the Internet at www.paho.org.
Other government-sponsored international organizations that collect health statistics include: European Commission (EC), www.ec.europa.eu; Organisation for Economic Co-operation and Development (OECD), www.oecd.org; United Nations (UN), www.un.org; and the World Bank, www.worldbank.org.

Private Organizations
Private data collecting and holding organizations include health information clearinghouses and libraries; accreditation, evaluation, and regulatory organizations; associations and professional societies; foundations and trusts; health insurance and employee benefits organizations; registries; research and policy organizations; and survey research organizations.

Health Information Clearinghouses and Libraries
There are a number of private health information clearinghouses and libraries. Examples include the Cochrane Collaboration and the Dartmouth Atlas of Health Care Project, which have previously been discussed. Two other examples are IMS Health and the Inter-university Consortium for Political and Social Research (ICPSR).
IMS Health is a large, for-profit global information and technology services corporation whose stock is traded on the New York Stock Exchange. Established in 1954 and headquartered in Danbury, Connecticut, IMS Health operates in more than 100 countries. It maintains several very large databases (more than 10 petabytes) on various diseases, treatments, costs, and outcomes of care. Using annual data from 100,000 suppliers, which include physicians who report on the number and type of drug prescriptions they write, and more than 55 billion health care transactions, the company serves over 5,000 clients globally. IMS Health customers include health care manufacturers, medical providers, government agencies, policymakers, and researchers. More information on the company can be obtained at www.imshealth.com.
The Inter-university Consortium for Political and Social Research (ICPSR) is a unit of the Institute for Social Research at the University of Michigan with offices in Ann Arbor. Established in 1962, the ICPSR acquires, preserves, and distributes original social science research data to an international consortium of more than 700 university and research institution members. ICPSR maintains a very large data archive of more than 500,000 research files. These data span many academic disciplines including economics, sociology, political science, demography, gerontology, and public health. It has several special topic collections that relate to health services research: health and medical care archive; minority data resource center; national archive of computerized data on aging; substance abuse and mental health data
archive; and the terrorism and preparedness data resource center. Faculty, staff, and students of member institutions have full access to ICPSR’s data archives and to all of its services. Data files are available in SAS, SPSS, Stata, and R format. ICPSR’s website is www.icpsr.umich.edu.

Accreditation, Evaluation, and Regulatory Organizations
To ensure that patients receive safe, high-quality care, health care professionals, laboratories, programs, and health care facilities are accredited and regulated.
One of the most important accrediting organizations is the Joint Commission. Founded in 1951, the Joint Commission, an independent, not-for-profit organization, is the largest and oldest accrediting health care organization in the USA. It accredits and certifies more than 20,500 health care organizations and programs in the nation including: all types of hospitals; home care organizations, medical equipment services, pharmacy, and hospice services; nursing homes and rehabilitation centers; behavioral health care and addiction services; ambulatory care organizations, group practices and office-based surgery practices; and independent and freestanding clinical laboratories.
To receive Joint Commission accreditation, hospitals, for example, must meet certain evidence-based process standards that are closely linked to positive patient outcomes. These process or accountability measures include heart attack care, pneumonia care, surgical care, children’s asthma care, inpatient psychiatric services, venous thromboembolism care, stroke care, immunization, and perinatal care (Chassin et al. 2010). The Joint Commission grants accreditation based on periodic reviews by its survey teams, who conduct unannounced onsite visits, and on quarterly self-assessment reports submitted by the hospitals. The quality and safety results for specific hospitals are available at www.qualitycheck.org.
A more recently established popular health care evaluation organization is the Healthgrades Operating Company, which is simply known as Healthgrades. Founded in 1998 in Denver, Colorado, Healthgrades has amassed data on over three million US health care providers. Healthgrades provides online data to consumers on physicians, hospitals, and dentists. For example, its website identifies the names of physicians in a city or zip code, the conditions they treat, the procedures they perform, the physician’s qualifications and patient feedback, and other criteria. In terms of qualifications, the site identifies whether the physician is board certified and has sanctions or malpractice claims against them, and it reports eight measures of care and patients’ willingness to recommend the physician to their family and friends. Today, nearly one million people a day use the Healthgrades website. It should be noted that some physicians have criticized Healthgrades for having erroneous data and for not screening for false reviews. The Healthgrades website is www.healthgrades.com.

Associations and Professional Societies
The largest category of private health care organizations is associations and professional societies. This category, which includes hundreds of organizations (Swartout 2014), can be roughly subdivided into disease/condition associations, demographic and population group associations, health care organizations and trade associations, and professional societies.
There are associations for nearly every disease and medical condition. These associations help individuals and their families suffering from the disease, and they advocate on their behalf, educate the general public, and work to prevent and end the disease. An example of this type of association is the American Cancer Society (ACS). Founded in 1913, the American Cancer Society is one of the largest voluntary health organizations in the USA. With its headquarters in Atlanta, Georgia, the ACS also has over 350 local offices nationwide. The ACS works to prevent cancer and detect it as early as possible. The society offers free information, programs, and services, and it provides community referrals to patients, survivors, and caregivers. It funds research to identify the causes of cancer, to determine the best way to prevent cancer, and to discover new ways to cure the disease. It also works with lawmakers to
promote policies, laws, and regulations to prevent cancer. The ACS has a National Cancer Information Center, which is open 24 hours a day, every day of the year, to answer questions from individuals. It also offers advice online. The ACS website is www.cancer.org.
Some associations represent specific demographic and population groups. For example, the National Rural Health Association (NRHA) works on behalf of the rural population of the USA. Nearly 25% of the nation’s population lives in rural areas. Rural residents tend to be poorer, have higher suicide rates, and experience more deaths and serious injuries from accidents than their urban counterparts; many also face physician shortages and have to travel long distances to health facilities. The NRHA works to improve the health and well-being of rural Americans. It provides leadership on health issues through advocacy, communications, education, and research. Founded in 1980, with headquarters in Leawood, Kansas, the NRHA has more than 21,000 individual and organizational members, all sharing a common interest in rural health. Its website is www.ruralhealthweb.org.
Other associations represent health care organizations and trade associations. The Pharmaceutical Research and Manufacturers of America (PhRMA) is an example of a large influential trade association. Founded in 1958 and headquartered in Washington, D.C., PhRMA represents the nation’s largest biopharmaceutical research and biotechnology companies, such as Amgen, Bayer, Eli Lilly, Merck, and Pfizer. Since 2000, PhRMA member companies have invested more than $550 billion in drug development, including an estimated $51.1 billion in 2013. PhRMA is an advocate for public policies to encourage the discovery of new medicines. To accomplish this, PhRMA is dedicated to achieving broad patient access to medicines through a free market, without price controls; strong intellectual property incentives; and effective regulation and a free flow of information to patients. PhRMA publishes policy papers, profiles and reports, fact sheets, newsletters, and speeches (Pharmaceutical Research and Manufacturers of America 2014). These publications are available at its website, www.phrma.org.
Another type of health care association is professional societies. These societies advocate and lobby for their members, provide continuing education, and attempt to advance the field. They typically publish newsletters, fact sheets, and journals, and hold local meetings and an annual convention for their members.
For example, one of the oldest professional medical societies is the American Medical Association (AMA). Founded in 1847 and incorporated in 1897, the AMA is the largest association of physicians and medical students in the USA. Starting as a small association, the AMA would become the single most influential organization on the practice of medicine in the nation. The AMA gained national prominence by publishing its flagship Journal of the American Medical Association and by reorganizing into local and state-level constituent societies, a national House of Delegates, a Board of Trustees, and national officers. With these changes, the membership of the AMA grew from around 8,000 in 1900 to approximately 220,000 today. During the 1960s, the membership market share of the AMA reached its zenith, representing about 70% of the nation’s physicians, but today it represents only about 25%. Its membership, and to some degree its influence, has declined because of the profusion of competing national specialty medical societies, the decline of solo practices, and the rise of salaried physicians who work for various organizations (American Medical Association 1997).
Today, the stated mission of the AMA is to promote the art and science of medicine and the betterment of public health. Headquartered in Chicago and with an office in Washington, D.C., the AMA advocates for its members by developing health care policies. The top items on the AMA’s current policy agenda include modification of the Affordable Care Act (ACA), the improvement of diabetes care delivery, changes in drug reporting, and increasing Medicaid payments to make them comparable to those paid by Medicare. The AMA also produces a number of important products and services. The association is one of the largest publishers of medical information in the world. For example, its weekly Journal of the American Medical Association
(JAMA) is published in 10 languages and print editions are circulated in over 100 countries. The AMA also publishes a number of other specialty journals. Another of its publications is the Current Procedural Terminology (CPT), a guidebook for physicians’ offices on how to classify and code medical procedures and services for reimbursement from Medicare and other insurers. An important AMA resource that supports membership services, marketing activities, and research is the Physician Masterfile, a large database that contains biographic, medical education and training, contact, and practice information on more than 1.4 million physicians, residents, and medical students in the USA. The file, which is updated continuously, also contains information on medical schools, graduate medical education programs, teaching institutions, and medical group practices. More information on the AMA can be found on its website at www.ama-assn.org.

Foundations and Trusts
Foundations and trusts are an important source of health care data and information. They publish health care reports, policy briefs, and newsletters. They also often support and fund projects on various health services research topics.
One of the largest health care foundations in the USA is the Robert Wood Johnson Foundation (RWJF). Located in Princeton, New Jersey, RWJF was founded in 1936, and it became a national philanthropy in 1972. Its mission is to improve the health and health care of all Americans. Over the past 40 years, RWJF has become the nation’s largest philanthropy devoted solely to the public’s health. It currently provides grant funds primarily to public agencies, universities, and public charities in six broad areas of focus: child and family well-being, childhood obesity, health insurance coverage, healthy communities, health leadership and the workplace, and health system improvement. RWJF attempts to fund innovative projects that will have a measurable impact and that can create meaningful, transformational change, such as service demonstrations, gathering and monitoring of health statistics, public education, training and fellowship programs, policy analysis, health services research, technical assistance, communications activities, and evaluations. RWJF funds both projects it proposes, which are issued through calls for proposals (CFPs), and unsolicited proposals. Each year, RWJF makes hundreds of awards, with funds ranging from $3,000 to $23 million. However, most awards range from $100,000 to $300,000 for a period of 1–3 years. A list of 1,225 awards RWJF has given from 1972 to 2015, totaling $789,305,241, can be found at www.rwjf.org.

Health Insurance and Employee Benefits Organizations
Many health services researchers study the function of health insurance, the various types of insurance plans, and the impact of insurance on the use of health care services. In the USA, most people obtain their health insurance coverage through their workplace, with health insurance being one of the most important employment-based benefits.
An example of one organization that studies health insurance and other benefits is the Employee Benefit Research Institute (EBRI). Founded in 1978 and located in Washington, D.C., the EBRI is a nonprofit, nonpartisan organization that conducts research relating to employee benefit plans, compiles and disseminates information on employee benefits, and sponsors educational activities such as lectures, roundtables, forums, and study groups on employee benefit plans. The EBRI publishes a number of special reports, books, and monthly issue briefs. It also conducts annual health and retirement benefit surveys. In terms of health benefits, the EBRI has four research centers: the Center for Research on Health Benefits Innovation, which focuses on helping employers measure the impact of new benefit plan designs in terms of cost, quality, and access to health care; the Center for Studying Health Coverage and Public Policy, which monitors the trends in the availability of health coverage and the impact of public policy on employment-based health benefits; the Center for Research on Health Care in Retirement, which studies the trends in retiree health benefits and their impact upon retirees; and the Center for Survey Research, which conducts the Health Confidence Survey and the Consumer Engagement in Health Care Survey. More information can be found at www.ebri.org.

Registries
This category is very similar to government registries, which have been discussed previously. However, it differs mainly in terms of sponsorship and funding source. Most of the private organization registries are funded, constructed, and maintained by professional medical societies.
For example, the National Trauma Data Bank (NTDB), which is managed by the American College of Surgeons’ (ACS) Committee on Trauma, is the largest aggregation of trauma patient data in the USA. The NTDB obtains its data from trauma registries maintained by hundreds of hospitals across the nation. Currently, the NTDB contains more than five million trauma patient records. Since 2003, the NTDB has published an annual report summarizing these data. The 2013 report contains data based on 833,311 trauma patient records submitted by 805 hospitals. The report contains a wealth of summary information on the patient’s age; gender; primary payment source; alcohol and drug use; mechanism of injury; injury severity score; pre-hospital time; hospital geographic location, bed size, and trauma level; patient transfer information; number of ICU and ventilator days; hospital complications; length of hospital stay; place of discharge; and the number of deaths, including those who were dead on arrival (DOA), and specific and overall mortality rates (American College of Surgeons 2013). Data contained in the NTDB are available to qualified researchers in two forms: a dataset containing all records sent to the NTDB for each admission year and national estimates for adult patients seen in Level I and II trauma centers. More information is available at the NTDB website at www.ntdb.org.

Research and Policy Organizations
There are many private research and policy organizations in the USA. Most of them conduct contract or grant funded research studies for the federal government. Depending upon the scope of work, these contracts and grants can amount to hundreds of thousands to hundreds of millions of dollars. Some of the largest contract research and policy organizations that often receive these funds include Abt Associates, Brookings Institution, Mathematica Policy Research, RAND Corporation, RTI International, and the Urban Institute.
The RAND Corporation is the largest, and one of the most prestigious, research and policy organizations in the USA. Incorporated in 1948, the RAND (a contraction of “research and development”) Corporation is an independent, nonprofit organization that conducts research and analysis for many US government departments, foreign governments, international organizations, professional associations, and other organizations. Headquartered in Santa Monica, California, the RAND Corporation has a professional staff of 1,700 people. It annually receives over $250 million in contracts and grants and works on about 500 projects at any given time.
One of the RAND Corporation’s largest research divisions is RAND Health. With a staff of 280 health care experts, about 70% of RAND Health’s research is supported by contracts and grants from the US federal government, with the remainder coming from professional associations, universities, state and local governments, and foundations.
Over the years, RAND Health has conducted hundreds of health care research studies, including the very famous Health Insurance Experiment (HIE). The RAND HIE was one of the largest and most comprehensive social science experiments ever conducted in the USA. Funded by the federal government, HIE addressed two key questions: How much more medical care will people use if it is provided free of charge? What are the consequences for their health? To answer these questions, in 1971 the HIE randomly assigned several thousand households in different geographic regions in the USA to health insurance with varying levels of co-insurance, and then followed them for 5 years to evaluate the effect on their medical utilization and health. The HIE, which took 15 years to complete at a cost of about $200 million, remains the largest health policy study ever conducted in US history. Its rich findings are still being discussed today (Aron-Dine et al. 2013; Newhouse 1993). More information about the HIE can be found on the project’s home page at: www.rand.org/health/projects/hie.html.
Currently, RAND Health is working on several major projects including developing global
HIV/AIDS prevention strategies using antiretroviral drugs in South Africa, India, and the USA; measuring the total costs of dementia in the USA; determining the impact of lowering the costs of healthy foods in supermarkets on the diet patterns of households in South Africa; identifying the effect of the Affordable Care Act (ACA) on hospital emergency department use by young adults who remained on their parents’ health insurance; and developing new models of patient-centered medical homes and nurse-managed health centers to help alleviate the growing shortage of primary care physicians in the USA (RAND Corporation 2013).
The RAND Corporation publishes all of its reports on its website. Further, RAND Health makes all of its surveys publicly available without charge. Examples of available surveys by topic are shown in Table 5. More information can be found at www.rand.org/health/surveys_tools.html.

Table 5 RAND Corporation health surveys by topic
Aging and health
  Assessing Care of Vulnerable Elders (ACOVE)
  Vulnerable Elders Survey (VES-13)
Diversity and health
  Homelessness survey
Health economics
  Hospital competition measures
  Managed health care survey
HIV, STDs, and sexual behavior
  HIV Cost and Services Utilization Study (HCSUS)
  HIV Identification, Prevention, and Treatment Services Surveys
  HIV Patient-Assessed Report of Status and Experience (HIV-PARSE)
Maternal, child, and adolescent health
  Pediatric Asthma Symptom Scale
  Pediatric Quality of Life Inventory (PedsQL Measurement Model)
Mental health
  Mental health inventory
  Depression screener
  Improving Care for Depression in Primary Care (Partners in Care)
Military health policy
  Chronic Illness Care Evaluation Instruments (ICICE website)
  Dialysis Patient Satisfaction Survey (DPSS)
  Patient Satisfaction Questionnaires (PSQ-III and PSQ-18)
  Patient Satisfaction Survey for the Unified Medical Group Association
Quality of life
  Epilepsy Surgery Inventory Survey (ESI-55)
  Kidney Disease Quality of Life Instrument (KDQOL)
  Medical Outcomes Study (MOS)
  Measures of quality of life
  Measures of patient adherence
  Mental health inventory
  Sexual problems measures
  Sleep scale
  Social support survey
  National Eye Institute Refractive Error Quality of Life Instrument
  Pediatric Quality of Life Inventory (PedsQL Measurement Model)
  Quality of Life in Epilepsy Inventory (QOLIE-89 and QOLIE-31)
  RAND Negative Impact of Asthma on Quality of Life
  Visual Function Questionnaire (VFQ-25)
Research methods
  Socially Desirable Response Set Five-Item Survey (SDRS-5)
  The Homelessness Survey

Survey Research Organizations
Academic and commercial survey research organizations frequently collect health care data. They often conduct health care surveys for various government agencies, commercial companies, and research and public policy organizations. Sometimes they also add health care questions to the general population surveys they conduct to determine changing attitudes, beliefs, and public opinions. Data from these surveys are often archived by the survey organizations and eventually are made available to researchers. Many of these organizations also provide lists of the survey questions they have used. This can be a valuable resource for researchers, because it is difficult to design nonbiased questions, and they can judge the validity and reliability of the questions already used. Researchers may include these questions in the surveys they are designing.
An example of one of the oldest independent academic-based survey research organizations is NORC at the University of Chicago. Founded in 1941, NORC, which originally stood for National Opinion Research Center, is headquartered in downtown Chicago with additional offices on the University of Chicago’s campus and in Washington, D.C. During the past 70 years, NORC has conducted many landmark national
large-scale health surveys including: National Ambulatory Medical Care Survey, the first-ever survey of medical care delivered to patients by office-based physicians; National Children’s Study, the largest study of children’s health and development, tracking 100,000 children from before birth through age 21; and the National Social Life, Health and Aging Project, a longitudinal study of the health of older Americans.
One of NORC’s flagship surveys and longest-running projects is its General Social Survey (GSS). Begun in 1972, and continuing today, this annual survey is widely regarded as the single best source of data on societal trends. Hundreds of researchers, policymakers, and students have used the survey’s data to study a wealth of topics. The GSS contains a standard set of demographic, behavioral, and attitudinal questions, plus various topics of special interest. For more than 40 years the GSS has been tracking the opinions of Americans. Over the years, many health care questions have been included in the survey asking about choice of physicians, difficulty receiving care, health insurance coverage, coverage changes, use of Medicare/Medicaid, incentives for physicians, opinions on HMOs, and whether they sought medical care for mental health problems. Data from the GSS and its various questionnaires and codebooks can be downloaded. A cross-tabulation program is also available (NORC at the University of Chicago 2011). More information on NORC and its surveys, including the GSS, can be obtained at www.norc.org.

Conclusion

This chapter has presented a practical typology of health care data, and it has identified and described many important data sources and public use files. Although much health care data are currently available, in the future much more data will be needed. The demand for more accessible, transparent, and comprehensive health care data will be driven by advances in medical science, rising public expectations, the continuing growth of the Internet and social media, and the ever-increasing cost of health care.
In the future, patients, health care providers, policymakers, and health services researchers will all increasingly demand more health care data. Patients will need these data to help them make better evidence-based informed decisions. They need to know: Who are the best physicians for the care I need? What innovative treatments are available? What are the benefits and risks of the treatments? Which hospitals are the best providers of the treatments? Where can I get a second or even a third medical opinion? How much will the treatments cost? And which treatments are covered by my current health insurance policy?
Health care providers will need more data to better monitor the care they provide. They will need to hold down their costs, provide high quality services, and justify what they charge to health care insurers. They also will have to increasingly deal with patients demanding more data on the cost and quality of the care they received. Already many hospitals and clinics, insurers, and employers enable patients to access their electronic medical and billing records online.
Policymakers will need more data to develop new, more effective policies to help bend the cost curve. They will use these data to construct and test new medical care reimbursement models, which will hopefully lower costs and at the same time increase the quality of care. They will also develop policies to encourage more disease prevention and wellness programs.
Health services researchers will demand more data to better evaluate existing health care programs. They will increasingly conduct research to compare the relationship between the cost and quality of health care to determine its value to patients and society. Over time, using these data sources, health services researchers will forge an important new evidence-based science of health care delivery, a new science that will continue to build on the crucial concepts of access, cost, quality, and the outcome of health care.

References

Aaron HJ, Schwartz WB, Cox MA. Can we say no?: the challenge of rationing health care. Washington, DC: Brookings Institution Press; 2005.
Aday LA, Cornelius LJ. Designing and conducting health surveys: a comprehensive guide. 3rd ed. San Francisco: Jossey-Bass; 2006.
Aday LA, Begley CE, Lairson DR, Balkrishnan R. Evaluating the healthcare system: effectiveness, efficiency, and equity. 3rd ed. Chicago: Health Administration Press; 2004.
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare disparities report. Rockville: Agency for Healthcare Research and Quality; 2014a. Available at: www.ahrq.gov/research/findings/nhqrdr/index.html
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare quality report. Rockville: Agency for Healthcare Research and Quality; 2014b. Available at: www.ahrq.gov/research/findings/nhqrdr/index.html
American College of Surgeons. National Trauma Data Bank 2013: annual report. Chicago: American College of Surgeons; 2013. Available at: www.ntdb.org
American Hospital Association (AHA). AHA guide to the health care field, 2014. Chicago: Health Forum; 2013a.
American Hospital Association (AHA). AHA hospital statistics, 2014. Chicago: Health Forum; 2013b.
American Hospital Association (AHA). AHA Data Viewer website. 2015. www.ahadataviewer.com
American Medical Association. Caring for the country: a history and celebration of the first 150 years of the American Medical Association. Chicago: American Medical Association; 1997.
Andersen RM. Revisiting the behavioral model and access to medical care: does it matter? J Health Soc Behav. 1995;36:1–10. Available at: www.mph.ufl.edu/files/2012/01/session6april2RevisitingBehavioralModel.pdf
Anderson C. Multinational comparisons of health system data, 2014. New York: Commonwealth Fund; 2014. Available at: www.commonwealthfund.org
Centers for Medicare and Medicaid Services (CMS). Medicare and you, 2015. Baltimore: Centers for Medicare and Medicaid Services; 2014a. Available at: www.cms.gov
Centers for Medicare and Medicaid Services (CMS). Nursing home data compendium 2013 edition. Baltimore: Centers for Medicare and Medicaid Services; 2014b. Available at: www.cms.gov
Chassin MR, Loeb JM, Schmaltz SP, Wachter RW. Accountability measures – using measurement to promote quality improvement. N Engl J Med. 2010;363(7):683–88. Available at: www.nejm.org/doi/full/10.1056/NEJMsb1002320
Children’s Health Care Quality Measures Core Set Technical Assistance and Analytic Program. Strategies for using vital records to measure quality of care in Medicaid and CHIP programs. Medicaid/CHIP Health Care Quality Measures: Technical Assistance Brief 4: Jan 2014, 1–11. Available at: www.medicaid.gov/Medicaid-CHIP-Program-information/By-Topics/Quality-of-Care/Downloads/Using-Vital-Records.pdf
Clancy CM. What is health care quality and who decides? Statement before the Committee on Finance, Subcommittee on Health Care, U.S. Senate, 18 Mar 2009. Available at: www.hhs.gov/asl/testify/2009/03/t20090318b.html
Culyer AJ. The dictionary of health economics, second edition. Northampton: Edward Elgar; 2010.
Dartmouth Atlas of Health Care Project. 2015. www.dartmouthatlas.org
Davis KE. Access to health care of adult men and women, ages 18–64, 2012. Medical Expenditure Panel Survey (MEPS) Statistical Brief #461. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st461/stat461.pdf
Davis K, Stremikis K, Squires D, Schoen C. Mirror, mirror on the wall: how the performance of the U.S. health care system compares internationally. Pub. No. 1755. New York: Commonwealth Fund; 2014. Available at: www.commonwealthfund.org
Anderson GF, Squires DA. Measuring the U.S. health care system: a cross-national comparison. Issues in International Health Policy, Commonwealth Fund, Pub. 1412,
Vol. 90, June 2010. Available at: www. Donabedian A. The definition of quality and approaches to
commonwealthfund.org its assessment. Vol. 1. Explorations in quality assess-
Aron-Dine A, Einav L, Finkelstein A. The RAND Health ment and monitoring. Ann Arbor: Health Administra-
Insurance Experiment, three decades later. J Econ tion Press; 1980.
Perspect. 2013;27(1):197–222. Available at: http:// Feldstein PJ. Health care economics. 7th ed. New York:
economics.mit.edu/files/8400 Thomson Deimar Learning; 2011.
Arts DGT, de Keizer NF, Scheffer G-J. Defining and Gliklich RE, Dreyer NA, editors. Registries for evaluating
improving data quality in medical registries: a literature patient outcomes: a user’s guide. 2nd ed. AHRQ Pub-
review, case study, and generic framework. J Am Med lication No. 10-EHC049. Rockville: U.S. Agency for
Inform Assoc. 2002;9(6):600. Available at: www.ncbi. Healthcare Research and Quality; 2010. p. 15–16.
nlm.nih.gov/pmc/articles/PMC349377 Available at: www.effectivehealthcare.ahrq.gov/ehc/
Black N. Why we need observational studies to evaluate products/74/531/Registries2nd ed Final to Eisenberg
the effectiveness of health care. BMJ. 1996;312 9-15-10.pdf
(7040):1215–18. Available at: www.bmj.com/content/ Halsey MF, Albanese SA, Thacker M, The Project of the
312/7040/1215 POSNA Practice Management Committee. Patient sat-
Borenstein M, Hedges LV, Higgins JPT, Rothstein isfaction surveys: an evaluation of POSNA members’
HR. Introduction to meta-analysis. Chichester: Wiley; knowledge and experience. J Pediatr Orthop. 2015;
2009. 35(1):104–7.
Health Care Cost Institute (HCCI). 2013 health care cost and utilization report. Washington, DC: Health Care Cost Institute; 2014. Available at: www.healthcostinstitute.org
Healthy People 2020. www.healthypeople.gov
Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Chichester: Wiley-Blackwell; 2008.
Huston P, Naylor CD. Health services research: reporting on studies using secondary data sources. Can Med Assoc J. 1996;155(12):1697–1702. Available at: www.ncbi.nlm.nih.gov
Johnson TP, editor. Handbook of health survey methods. New York: Wiley; 2014.
Kane RL, Radosevich DM. Conducting health outcomes research. Sudbury: Jones and Bartlett Learning; 2011.
Krueger RA, Casey MA. Focus groups: a practical guide for applied research. 4th ed. Thousand Oaks: Sage; 2009.
Larsson S, Lawyer P, Garellick G, Lindahl B, Lundstrom M. Use of 13 disease registries in 5 countries demonstrates the potential to use outcome data to improve health care’s value. Health Aff. 2012;31(1):220–7.
Lee K. Global institutions: the World Health Organization (WHO). New York: Routledge; 2008.
Levy D, Brink S. A change of heart: how the people of Framingham, Massachusetts, helped unravel the mysteries of cardiovascular disease. New York: Knopf; 2005.
Mullner RM, editor. Encyclopedia of health services research. 2 Vol. Thousand Oaks: Sage; 2009, xxix.
National Cancer Institute. SEER as a research resource. NIH Publication No. 10-7519. Bethesda: SEER Program, National Cancer Institute; 2010. Available at: www.seer.cancer.gov/about/factsheets/SEER_Research_Brochure.pdf
National Center for Health Statistics (NCHS). National health survey: the principal source of information on the health of the U.S. population. Hyattsville: National Center for Health Statistics; 2010. Available at: www.cdc.gov/nchs/data/nhis/brochure2010January.pdf
National Center for Health Statistics (NCHS). Health, United States, 2013: with special feature on prescription drugs. Hyattsville: National Center for Health Statistics; 2014a. Available at: www.cdc.gov/nchs/data/hus/hus13.pdf
National Center for Health Statistics (NCHS). Summary of current surveys and data collection systems. National Center for Health Statistics; 2014b. Available at: www.cdc.gov/nchs/data/factsheets/factsheet_summary1.pdf
National Committee for Quality Assurance (NCQA). The essential guide to health care quality. Washington, DC: National Committee for Quality Assurance; 2006. Available at: www.ncqa.org
National Research Council, Committee on National Statistics. Vital statistics: summary of a workshop. Washington, DC: National Academies Press; 2009. Available at: www.ncbi.nlm.nih.gov/books/NBK219877/
Newhouse JP, The Insurance Experiment Group. Free for all?: lessons from the RAND health experiment. Cambridge, MA: Harvard University Press; 1993.
NORC at the University of Chicago. Social science research in action. Chicago: NORC at the University of Chicago; 2011. Available at: www.norc.org/PDFs/Brochures-Collateral/NORC_Book_Social_Science_Research_in_Action.pdf
Office of the Assistant Secretary for Planning and Evaluation (ASPE). U.S. Department of Health and Human Services (HHS). Guide to HHS surveys and data resources. Washington, DC: U.S. Department of Health and Human Services; n.d. Available at: www.aspe.hhs.gov/sp/surveys/index.cfm
Organisation for Economic Co-operation and Development (OECD). Health at a glance 2013: OECD indicators. Paris: Organisation for Economic Co-operation and Development; 2013. Available at: https://doi.org/10.1787/health_glance-2013-en
Osborn R, Moulds D, Squires D, et al. International survey of older adults finds shortcomings in access, coordination, and patient-centered care. Health Aff. 2014;33(12):2247–55.
Painter MJ, Chernew ME. Counting change: measuring health care prices, costs, and spending. Princeton: Robert Wood Johnson Foundation; 2012. Available at: www.rwjf.org
Perrin JM. Health services research for children with disabilities. Milbank Q. 2002;80(2):303–24. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC2690116/
Pharmaceutical Research and Manufacturers of America. 2014 Biopharmaceutical Research Industry Profile. Washington, DC: Pharmaceutical Research and Manufacturers of America; 2014. Available at: www.phrma.org/sites/default/files/pdf/2014_PhRMA_PROFILE.pdf
RAND Corporation. RAND Corporation: annual report 2013. Santa Monica: RAND Corporation; 2013. Available at: www.rand.org/pubs/corporate_pubs/CP1-2013.html
Rhoades JA, Cohen SB. The long-term uninsured in America, 2009–12 (selected intervals): estimates for the U.S. civilian noninstitutionalized population under age 65. Medical Expenditure Panel Survey (MEPS) Statistical Brief #464. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st464/stat464.pdf
Robert Wood Johnson Foundation. Consumer attitudes on health care costs: insights from focus groups in four U.S. cities. Princeton: Robert Wood Johnson Foundation; 2013. Available at: www.rwjf.org/content/dam/farm/reports/issue_briefs/2013/rwjf403428
Stagnitti MN, Carper K. National health care expenses in the U.S. civilian noninstitutionalized population, distributions by types of service and source of payment, 2012. Medical Expenditure Panel Survey (MEPS) Statistical Brief #456. Rockville: U.S. Agency for Healthcare Research and Quality (AHRQ); 2014. Available at: www.meps.ahrq.gov/mepsweb/data_files/publications/st456/stat456.pdf
Swartout KA. Encyclopedia of associations: national organizations of the U.S. Farmington Hills: Gale Cengage Learning; 2014. Available at: www.gale.cengage.com
Wakefield MK. Statement by HRSA administrator Mary K. Wakefield, Ph.D., R.N. on the National Practitioner Data Bank Public Use File, 9 Nov 2011. Available at: www.npdb.hrsa.gov/resources/publicDataStatement.jsp
Wennberg JE. Tracking medicine: a researcher’s quest to understand health care. New York: Oxford University Press; 2010. p. 1–13.
Williamson A, Hoggart B. Pain: a review of three commonly used pain rating scales. J Clin Nurs. 2008;14(7):798–804. Available at: www.onlinelibrary.wiley.com/doi/10.1111/j.1365-2702.2005.01121.x/pdf
Workman TA. Engaging patients in information sharing and data collection: the role of patient-powered registries and research networks. AHRQ Community Forum White Paper, AHRQ Publication No. 13-EHC124-EF. Rockville: U.S. Agency for Healthcare Research and Quality; 2013. Available at: www.effectivehealthcare.ahrq.gov/ehc/assets/File/Patient-Powered-Registries-white-paper-130911.pdf
World Health Organization (WHO). World health survey, 2002, B – individual questionnaire. 2002. Available at: www.who.int/healthinfo/survey/en/
World Health Organization (WHO). World health statistics, 2014. Geneva: WHO Press; 2014. Available at: www.who.int
Xu F, Mawokomatanda T, Flegel D, et al. Surveillance for certain health behaviors among states and selected local areas – behavioral risk factor surveillance system, United States, 2011. Morb Mortal Wkly Rep (MMWR). 2014;63(9):1–149. Available at: www.cdc.gov/mmwr/pdf/ss/ss6309.pdf
6 Health Services Information: Application of Donabedian’s Framework to Improve the Quality of Clinical Care
A. Laurie W. Shroyer, Brendan M. Carr, and Frederick L. Grover
Contents
Introduction
National Committee for Quality Assurance
Dr. Ernest Amory Codman’s Data-Driven Approach to Defining and Measuring Quality of Care
Dr. Avedis Donabedian’s Process-Structure-Outcome Model for Quality of Care
Processes of Care
Structures of Care
Outcomes of Care
Process-Structure-Outcomes in Cardiac Surgery
Risk Adjustment
Uncertainty
Implementation of VA National Quality Improvement Programs
The Processes, Structures, and Outcomes of Cardiac Surgery Study
A. L. W. Shroyer (*)
Department of Surgery, School of Medicine, Stony Brook
University, Stony Brook, NY, USA
e-mail: annielaurie.shroyer@stonybrookmedicine.edu
B. M. Carr
Department of Emergency Medicine, Mayo Clinic,
Rochester, MN, USA
F. L. Grover
Department of Surgery, School of Medicine at the
Anschutz Medical Campus, University of Colorado,
Aurora, CO, USA
© This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019
https://doi.org/10.1007/978-1-4939-8715-3_7
Hypotheses of the PSOCS Study
Methods of the PSOCS Study
Findings of the PSOCS Study
The CICSP-X Program
Measuring Processes of Care
Monitoring Trends Over Time
Implementation of National Quality Improvement Programs
Uncovering Quality Trends
The Michigan Society of Thoracic and Cardiovascular Surgeons Quality Collaborative
The American College of Surgeons’ Private Sector Initiative
Implementation Challenges: Dilemmas Faced by Quality Measurement Projects
Summary
References
Abstract

This chapter provides a summary of the well-established conceptual models related to measuring and improving the overall quality of medical care as described by the Institute of Medicine and founded upon Donabedian’s historical triad for quality measures. The subcomponents required for quality measurement are first identified, including (1) patient risk factors, (2) processes of care, (3) structures of care, (4) clinical outcomes, and (5) resource utilization or costs of care. The key challenges associated with applying this quality of care conceptual model to designing and implementing new research projects are then discussed, including the following cutting-edge measurement-related topics: (1) dealing with missing data (e.g., clinical substitution versus statistical imputation), (2) differentiating planned versus unplanned processes of care (e.g., distinguishing between interventions used as a matter of routine and those interventions that were initiated in response to observed changes in the patient’s status), (3) evaluating the differential impact of sequential versus nonsequential timing of events (e.g., cascading of outcomes), and (4) assessing the relative impact of medical versus nonmedical care influences upon the quality of patient medical care rendered. Historical projects designed to define, measure, and evaluate the quality of cardiac surgical care in the Department of Veterans Affairs Continuous Improvement in Cardiac Surgery Program and Society of Thoracic Surgeons National Adult Cardiac Database are presented to illustrate how these quality of care concepts can be applied. The challenges in using clinical databases to evaluate quality of care are then summarized. Finally, several innovative approaches are described toward advancing the future practice of quality measurement research.

Introduction

Within the healthcare field, a diversity of approaches has been used to define, to measure, and to improve the quality of medical care services, attempting to optimize the opportunities to improve clinical care outcomes while
evaluating whether the actual outcomes incurred achieve the original expectations. As perhaps one of the earliest documented descriptions related to defining or measuring the quality of medical care, King Hammurabi’s Code (circa 1700 BC) provided insights as to what were considered unacceptable care outcomes as compared to the expectations, providing clear instructions as to the direct consequences to clinicians for the delivery of substandard care:

If a physician performed a major operation on a nobleman with a bronze lancet and caused the nobleman’s death, or he opened the eye-socket of a nobleman and destroyed the nobleman’s eye, they shall cut off his hand. (Magno 1975)

To optimize quality of medical care, many facilities have patient safety initiatives focused on engaging healthcare professionals, organizations, and patients toward the attainment of a healthcare system that reduces errors, consistently improves the care provided (based on previously identified challenges), and creates an institutional culture that treats patient safety as a top priority.

Institutional patient safety cultures can foster and support the design and implementation of ideal clinical practices. This would be exemplified by an institutional culture that focuses on reducing the risk of adverse events occurring. Even with application of the best evidence available, unforeseen adverse consequences of the medical care provided unfortunately still do occur. As an example, the perioperative administration of prophylactic antibiotic therapy for patients undergoing surgery is commonly cited as a patient safety practice employed to prevent surgery-related infections in the postoperative period (van Kasteren et al. 2007). In spite of this important intervention, however, postoperative infections still remain an outstanding challenge faced by many healthcare institutions, with multiple approaches implemented to keep postoperative infection rates low (e.g., conscientious handwashing techniques used routinely, combined with sterile techniques for wound dressing changes).

National Committee for Quality Assurance

Toward the goal of providing quality rankings, the National Committee for Quality Assurance (NCQA) provides infrastructure support for a broad array of programs and services focused on measuring, analyzing, and continually improving the healthcare provided by US-based health plans. The National Committee for Quality Assurance has defined quality metrics that can be used to identify opportunities for quality improvement. The routine reporting of quality metrics has been useful to inform decisions at the clinical program, facility, health plan, and policy levels. Through publicly available statistical reports evaluating health plan performance, important quality improvements have been documented and translated into reduced adverse event rates impacting patient care. For example, the use of beta-blockers for the subgroup of patients with a prior acute myocardial infarction (aka AMI or “heart attack”) has been documented in the peer-reviewed literature to reduce the chance of a repeat AMI by 40% (National Committee for Quality Assurance 2014a). Thus, beta-blocker use has been cited as a successful NCQA metric used to facilitate positive trends documented for quality of care outcomes.

Moreover, as part of the National Committee for Quality Assurance, the Healthcare Effectiveness Data and Information Set (HEDIS) was developed, and, as of 2014, the vast majority of US-based health plans submitted HEDIS metrics (which consisted of 81 measures for quality of care across five different care domains). The National Committee for Quality Assurance HEDIS requires that plans report the continued post-AMI use rates for beta-blocker medications for their eligible population. That is, health plans must calculate the proportion of their eligible enrollees (aged 18 years or older) who received persistent beta-blocker treatment for 6 months after discharge following their AMI hospitalization over the past year. Although this specific HEDIS metric is most relevant to a relatively small subgroup of ischemic heart disease patients, the National Committee for Quality Assurance
trends over time documented in facility performance are impressive. Specifically, the improvements in this HEDIS metric over time for the National Committee for Quality Assurance health plans (including commercial, Medicaid, and Medicare populations), for both health maintenance organizations (HMO) and preferred provider organizations (PPO), are shown in Fig. 1 (National Committee for Quality Assurance 2014b).

Persistence of beta-blocker treatment rate (%)

        Commercial        Medicaid    Medicare
Year    HMO     PPO       HMO         HMO     PPO
2013    83.9    81.4      84.2        90.0    89.4
2012    83.9    79.5      82.0        88.9    88.5
2011    81.3    77.0      80.5        87.3    86.2
2010    75.5    71.3      76.3        83.1    82.5
2009    74.4    69.6      76.6        82.6    78.9
2008    75.0    68.8      73.6        79.7    76.7
2007    71.9    62.9      62.0        75.5    70.4
2006    72.5    65.5      68.1        69.6    70.9
2005    70.2    64.3      69.8        65.4    58.5

Fig. 1 National Committee for Quality Assurance Healthcare Effectiveness Data and Information Set 2014 report on persistence of beta-blocker treatment

Importantly, the National Committee for Quality Assurance seal of approval is given to health plans that meet their published requirements, including adherence to more than 60 preestablished standards, as well as reported performance for more than 40 quality metrics. The National Committee for Quality Assurance provides the Quality Rating System (QRS) Measure Technical Specifications, a technical manual that details each quality metric’s specification and provides guidelines for data capture. Thus, the National Committee for Quality Assurance Quality Rating System fulfills the reporting requirements set forth as part of the recent Affordable Care Act, leading the way to providing publicly available information to aid consumers in selection of qualified health plans (QHPs), as well as routinely monitoring qualified health plan quality.

Dr. Ernest Amory Codman’s Data-Driven Approach to Defining and Measuring Quality of Care

As one of the “founding fathers” of the historical healthcare quality movement, Dr. Ernest Amory Codman (1869–1940) conceived of the “end result” hospital, where the long-term outcomes following surgery would be documented and evaluated for each patient to identify opportunities for future medical care improvements (Codman 2009). As part of his original hand-tallied quality-of-care report card, Dr. Codman tabulated the findings for over 600 abdominal surgical cases over a decade, classifying his findings by individual surgeon operators and by diagnosis and treatment approaches used. As a tribute to his outcomes-measurement legacy, the Joint Commission created the “Ernest Amory Codman Award” in 1996, designed to enhance knowledge and encourage the use of performance measurement to improve healthcare quality and safety (The Joint Commission 2014).

In his multiple roles as a clinician, a leader, and an advocate for quality management, Dr. Codman emphasized the need for monitoring and improving surgical quality of care. His original concerns related to data-driven improvement of quality-of-care decisions remain critically relevant even today (Nielsen 2014). The distinguishing factor between the historical and current healthcare debates, however, relates to the plethora of current data available versus the scarcity of data that had been gathered historically. In spite of the overabundance of data captured to meet government, insurer, and accreditation requirements, an outstanding challenge remains to identify relevant, meaningful information that can be used to improve the quality of care and to further advance the field of quality measurement.
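
To make concrete how patient-level records are reduced to a plan-level quality metric, such as the HEDIS persistence of beta-blocker treatment rate reported in Fig. 1, the following is a minimal Python sketch. It is illustrative only and is not the NCQA specification: the `Enrollee` record, the 180-day approximation of "6 months after discharge," and the 75% days-covered cutoff for "persistent" treatment are all assumptions introduced here. Only the age criterion (18 years or older) and the idea of a proportion of eligible enrollees come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumptions for illustration only (not taken from the chapter):
FOLLOW_UP_DAYS = 180         # "6 months after discharge" approximated as 180 days
MIN_COVERED_FRACTION = 0.75  # hypothetical cutoff for calling treatment "persistent"
MIN_AGE = 18                 # eligibility age stated in the text


@dataclass
class Enrollee:
    """Hypothetical record for one plan member discharged after an AMI hospitalization."""
    age_at_discharge: int
    beta_blocker_days: int  # days of beta-blocker supply within the follow-up window


def is_eligible(e: Enrollee) -> bool:
    """Denominator criterion: adult enrollees only."""
    return e.age_at_discharge >= MIN_AGE


def is_persistent(e: Enrollee) -> bool:
    """Numerator criterion: enough of the follow-up window covered by treatment."""
    return e.beta_blocker_days / FOLLOW_UP_DAYS >= MIN_COVERED_FRACTION


def persistence_rate(enrollees: Iterable[Enrollee]) -> Optional[float]:
    """Percentage of eligible enrollees with persistent treatment (None if none eligible)."""
    eligible = [e for e in enrollees if is_eligible(e)]
    if not eligible:
        return None
    persistent = sum(1 for e in eligible if is_persistent(e))
    return 100.0 * persistent / len(eligible)


plan = [
    Enrollee(age_at_discharge=67, beta_blocker_days=180),
    Enrollee(age_at_discharge=54, beta_blocker_days=150),
    Enrollee(age_at_discharge=71, beta_blocker_days=60),   # stopped treatment early
    Enrollee(age_at_discharge=45, beta_blocker_days=135),  # exactly at the cutoff
    Enrollee(age_at_discharge=17, beta_blocker_days=180),  # excluded: under 18
]
print(persistence_rate(plan))  # prints 75.0
```

Separating the eligibility rule (denominator) from the persistence rule (numerator) mirrors how such a metric is described in the text, and it keeps each assumption in one place so that it can be swapped for the published specification.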
Dr. Avedis Donabedian’s Process-Structure-Outcome Model for Quality of Care

As a more contemporary leader shaping today’s quality of healthcare paradigm, Dr. Avedis Donabedian (1919–2000) established a conceptual model for quantifying healthcare quality improvements. Specifically, Dr. Donabedian noted that “. . .quality may be judged based on improvements in patient status obtained, as compared to those changes reasonably anticipated based on the patient’s severity of illness, presence of comorbidity, and the medical services received” (Donabedian 1986). As part of his approach to defining and measuring quality of care, Dr. Donabedian described two related processes of care domains for assessing the quality of medical care: interpersonal excellence and technical excellence (Donabedian 1980).

Excellence in the interpersonal quality domain is associated with a patient-centered focus to how the care is provided, and the degree of excellence achieved is based on the degree to which the care meets the patient’s unique needs (including information, physical, and emotional needs) in a manner consistent with the patient’s expectations and preferences. As part of the interpersonal care domain, incorporating the patient’s decisions directly in the decision-making process is recognized as an important component of excellent quality care. Further, this implies that patient satisfaction with the care provided may not mirror quality if the patient reports high satisfaction while the degree to which their needs are met is less than desired, such that the quality of care rendered would be classified as “low.” Thus, there may not always be a concordance between patient satisfaction and patient-centered quality of care assessments.

The second domain, the technical quality of care, is related to the degree of alignment between the care that was provided and the care that might have been provided based upon the current professional care standards, as well as the degree of improvements in patient outcomes that occurred (as compared with the changes in outcomes that may have been otherwise anticipated). To systematically evaluate quality, Dr. Donabedian put forward an approach that is now commonly used based on his “Process-Structure-Outcomes” framework that incorporated the domains of (1) processes of care, (2) structures of care, and (3) clinical outcomes (Donabedian 1988). In 1997, Dr. Donabedian was recognized by the Joint Commission as the first recipient in the individual category to receive the Ernest Amory Codman Award (The Joint Commission 2014).

Processes of Care

Dr. Donabedian’s approach to assessing quality focused first on evaluating the processes of care. Processes of care may be defined as the “set of procedures and/or skills with which health care technology of proven or accepted efficacy is delivered to individual patients” (Shroyer et al. 1995). This includes the processes associated with care provider actions, as well as the patient’s activities related to seeking and obtaining care. Thus, processes of care assessments verify that patients received what is known to be the appropriate care, by the standards of evidence-based medicine. Processes of care may include communication processes, such as a post-discharge telephone follow-up call, as well as social or emotional support-related activities. In evaluating processes of care, it is important to recognize that the patient’s characteristics and historical medical care received need to be factored into this assessment. Specific patient subpopulations may be targeted for specific processes of care, whereas these same processes of care may be contraindicated for other patient subgroups. Finally, the patient-based or family-based actions taken, that is, the patient’s or their family’s actions to seek care, to adhere to the treatment plan, or to decline the care offered (e.g., by requesting “do-not-resuscitate” orders), are important considerations to complement the provider-based processes of care evaluated (National Quality Forum 2009).

For example, the use of the left internal mammary artery (LIMA) as a conduit for coronary artery bypass grafting (CABG) surgical
procedures has been documented to improve long-term survival in comparison to the use of other conduits, e.g., saphenous vein grafts (SVGs) (Goldman et al. 2004). The earlier cardiac surgery clinical guidelines identified that the use of LIMA for CABG surgery should be considered where longer-term survival may be an important consideration. More recent published literature has extended the LIMA benefits documented to include the elderly population. As with any surgical procedure, there are risks and benefits associated with LIMA use. The use of a LIMA graft generally takes more time; therefore, a LIMA graft may be contraindicated for emergent/urgent patients where surgical cross-clamp time may be critically important. Hence, the CABG LIMA use rates may be used as a quality of care metric for elective patients, but may not be a meaningful measure of quality of care for the emergent/urgent patient subgroups (Karthik and Fabri 2006).

Structures of Care

Structures of care, as another important metric to assess quality, were defined by Dr. Donabedian as being related to the “overall context or environment in which care is rendered to a group of patients,” including the characteristics of healthcare team members (e.g., credentials and experience) and healthcare facilities (e.g., the type and age of equipment) (Shroyer et al. 1995). Representing an important arm of Donabedian’s triad, structures of care include the manner in which healthcare facilities are organized and operated, the approaches used for care delivery, and the policies and procedures related to care including quality oversight processes.

For example, structures of surgical care may involve the physicians’ provider-specific characteristics, e.g., international medical graduate (IMG) or board certification status. Though not definitive, studies have shown no difference in mortality outcomes among hospitalized patients treated by graduates of US medical schools versus IMGs. There may, however, be a correlation between board certification and outcomes (Sharp et al. 2002; Norcini et al. 2010).

Facility characteristics, such as a hospital’s affiliation (academic versus community) or location (urban versus rural versus frontier hospitals), have also been studied as structural characteristics that may impact the quality of care provided. Academic affiliation, for instance, has not been shown to be a predictor of better outcomes (Papanikolaou et al. 2006). Location may be important, however, as rural hospitals have been shown to have worse performance on quality of care indicators than urban hospitals, in spite of studies showing their outcomes to not be inferior to those at urban hospitals (Nawal Lutfiyya et al. 2007; Dowsey et al. 2014; Tran et al. 2014). The identification of disparities such as this not only demonstrates the important role of structures of care in affecting the quality of care but may serve as an impetus to identify changes that can be made in the structures themselves to improve patient care.

Importantly, the entire process associated with accreditation, including the Joint Commission, is intended to coordinate a quality oversight mechanism, which, in theory, should validate the importance of structural measures for care. For example, the field of cardiac surgery has established minimal acceptable standards for nurse staffing ratios to be coordinated in critical care units for immediate post-CABG patient care. In order to be deemed of “acceptable” quality, standards for the number and type of nurse staffing must be met to assure that a high quality of care may be provided. For example, a study by VillaNueva and colleagues looked at risk-adjusted outcomes of cardiac surgery patients in relation to (1) “the demographics, education, experience, and employment of operation room (OR) and surgical intensive care unit (ICU) nurses involved in their care” and (2) “the staffing and vacancy ratios of OR and surgical ICU nurses involved in their care.” Significant variations were observed in processes of care between participating cardiac surgery centers, but there was insufficient data to draw conclusions on their effect on patient outcomes (VillaNueva et al. 1995). For this study,
between board certification and better clinical therefore, the theoretical link between structures
of care and outcomes of care could not be confirmed directly. Within Donabedian’s quality triad, there is a fundamental assumption underlying the assessment of structural quality elements; that is, the healthcare setting in which the care is rendered is a very important factor influencing the quality of medical care provided. In spite of the data-driven evidence being sparse, this assumption extends to the current Joint Commission accreditation assessments focused on evaluating the adequacy of healthcare facility basic structure of care.

Outcomes of Care

Finally, outcomes of care, the third piece of the triad, were defined by Dr. Donabedian as the measurable end points of the healthcare process (Malenka and O’Connor 1998).

Ideally, a broad range of clinically relevant outcomes should be assessed including (but not limited to) traditional measures of mortality and morbidity, health-related quality of life, condition-specific or disease-specific metrics of symptom status or functionality, general health status, general overall functionality, and patient satisfaction. The outcomes measured should be related to the full range of care end points salient to the patients impacted by the treatment received. Prioritized in importance based upon the nature of the question raised, outcomes may reflect a patient’s status at a single point in time (e.g., 30-day operative mortality) or changes over points in time (e.g., pre-CABG angina frequency compared to post-CABG 6-month follow-up angina frequency). For quality assessment purposes, moreover, outcomes may be subclassified in many ways including (1) planned (intended) or not planned (unintended) (Mavroudis et al. 2014), (2) preventable versus not preventable (Lee et al. 2003), (3) major versus minor in importance, and (4) related or not related to the medical care rendered (Shann et al. 2008).

Figure 2 illustrates the hypothetical interactions between processes, structures, and changes in the patient’s outcomes of care, where patients present to the healthcare system with an illness, in the context of their other patient risk characteristics. The medical care interventions received represent processes of care, as well as the actions taken by patients themselves to address their illness state. These actions are coordinated within a healthcare environment, representing the structures of care. Pending the passage of time, the patient emerges from their episode of care with a changed relationship to their illness, which is the outcome of care measured. This “Process-Structure-Outcome” paradigm can be extended from a single episode of care to the full series of care encounters, in order to assess and to improve the quality of patient care received.

Fig. 2 Theoretical “process-structure-outcome” framework (diagram: acts of care, the processes, act on a patient living with illness within the environment for care, the structures of care; the patient with a changed relationship to illness represents the outcomes of care)

Process-Structure-Outcomes in Cardiac Surgery

Supporting these different outcome-based classification systems, multiple examples have been reported within the field of cardiothoracic surgery. Delayed sternal closure, for example, may be planned or unplanned (1). In pediatric cardiac surgery in particular, the surgeon may plan to leave the sternum open at the end of the procedure because this may allow for better heart function in certain patients. In other cases, however, the
surgeon may have initially planned to close the patient’s chest, but found himself or herself unable to as a result of bleeding, myocardial edema, or arrhythmia (Yasa et al. 2010; Ozker et al. 2012). It is important to distinguish between the two when investigating the incidence of delayed sternal closure as a surgical complication, because planned delays in closure could inflate the apparent incidence of surgical complications. On the other hand, reintubation is rarely planned, but may be preventable (2) if it is brought about by unplanned extubation or as a complication of a neuromuscular blocking agent rather than a non-iatrogenic respiratory problem (Lee et al. 2003). Major and minor outcomes (3) are easily envisioned based on the degree to which they impact the patient (e.g., death or nonfatal myocardial infarction versus new-onset atrial fibrillation after CABG surgery, respectively). Finally, outcomes may be unrelated to the medical care rendered (4) when they are accepted as a normal consequence of a procedure in a certain fraction of patients. Microembolic events, for example, are known to be an unpreventable consequence of the use of extracorporeal circulation (i.e., cardiopulmonary bypass during cardiac surgery), while an embolic stroke involving a territory of brain circulation is not (Shann et al. 2008).

To support clinical decision-making, the outcomes identified for the medical care rendered should focus on the most clinically relevant end points or changes and may be judged in comparison with the best possible outcomes anticipated with the use of good processes and structures of care. Outcomes are often reported as rates, for example, the rate of a serious adverse event following a surgical procedure. For coronary artery bypass graft (CABG-only) procedures, for example, the national rate reported for 30-day operative mortality by the Society of Thoracic Surgeons (STS) for the period from 1996 to 2009 was 2.24% (Puskas et al. 2012). For an outcome to be useful, it must be compared across different populations that have the potential to achieve this desired end point and also compared to reference standards determined from the expected ideal outcome rate to be achieved. For example, the rates for Department of Veterans Affairs (VA) CABG 30-day operative mortality may be compared to non-VA/STS hospital rates (Public Law 99–166 1985), or these rates can be compared across time, by examining the metric for different periods. To be most useful as quality assessment metrics, it may be important to make comparisons of different outcome rates across key patient subgroups that did or did not receive specific treatments (e.g., rates of mediastinitis during the 30-day perioperative period for post-CABG patients treated versus not treated with a prophylactic antibiotic therapy). Moreover, goals for specific procedure-based outcomes can be proactively established, such as the STS national objective to achieve a 1% 30-day operative mortality rate for lower-risk CABG-only patients in the future (Mack 2012).

As part of a National Institutes of Health (NIH) initiative in 2004, a new repository entitled the Patient Reported Outcomes Measurement Information System (PROMIS®) was established. The PROMIS® metrics included patient self-reported mental, physical, and social health status as assessments of the patient’s perception of their overall well-being. The PROMIS® surveys identified how patients reacted and described how patients felt during specific times during care received for a preestablished set of conditions (National Institutes of Health 2014). To evaluate treatment effectiveness, PROMIS® assessments can be used as primary or secondary end points in clinical studies.

Intermediate outcomes, as observations in the pathway that directly lead to the final longer-term outcomes, have also been commonly measured. Specifically, intermediate outcomes may be commonly associated with processes of care, as key steps in the journey to obtaining a desired longer-term health state. For example, the current ischemic heart disease guidelines promulgated by the American Heart Association recommend that CABG patients be discharged from the hospital receiving lipid-lowering medications. At discharge, the use of lipid-lowering medications can be documented, as well as the patient’s current total cholesterol level (as well as high-density lipoprotein and low-density lipoprotein subcomponents). As an important marker related to post-
CABG patient’s long-term survival, therefore, both lipid-lowering medication use (as a process of care measure) and patient cholesterol measures following the CABG hospital discharge over time (as intermediate outcomes) may be assessed as part of a quality assurance program (Hiratzka et al. 2007).

Risk Adjustment

Although outcomes are considered by many to be the ultimate measure of quality of care, they are, to a large degree, influenced by the patient’s pretreatment condition and their unique characteristics (e.g., risk factors that influence outcomes). The goal of risk adjustment (a statistical analysis isolating the relationship between the outcomes of interest and the treatment effects of interest) is to control for the effects of other patient-relevant factors, although a patient’s pretreatment status may not be easy to measure. Specifically, patient risk factors may be defined as those characteristics that “place patients at an enhanced risk that an unfavorable event may occur” (Blumberg 1986). Generally, risk factors may be classified as modifiable (e.g., related to lifestyle or health behavior choices) or non-modifiable (e.g., related to the patients’ demographic characteristics, socioeconomic status, or their genetic propensity to incur disease-related adverse conditions). In evaluating a patient’s risk profile, it is of paramount importance to identify the patient’s severity of disease and comorbidities (e.g., other diseases that may impact a patient’s likelihood of experiencing an adverse event related to the primary disease being considered). In the realm of cardiac surgery risk adjustment of outcomes, patients’ demographic factors (e.g., age or gender) and socioeconomic status (e.g., highest educational level attained) along with the severity of their coronary disease and complexities of their comorbidities have been demonstrated to be related to risk-adjusted outcomes. Moreover, patient-based choices related to healthy behaviors (e.g., body mass index) and lifestyle (e.g., smoking status) may also influence their probability for having a major adverse event (Nashef et al. 2012).

Importantly, patient risk factors may predispose patients to appropriately receive different types of treatments or be excluded from consideration for a specific treatment or set of treatments. Based on these same risk characteristics, therefore, patients may be pre-selected by providers to be eligible to receive care or differential types of treatments. The patient characteristics that relate to the propensity of a provider (or set of providers) to select them for treatment may be considered in a slightly different modeling approach, related to a propensity analysis (Blackstone 2001, 2002). Based on a patient population’s likelihood to receive a specific treatment (which may also be related to their risk characteristics) using a propensity analysis, a risk-adjusted analysis can be performed to identify the quality of care rendered to the patient.

The risk-adjustment process, using a statistical modeling approach, calculates an “expected” risk (“E”) for each patient uniquely. Based on aggregating patient data, the sum of the “observed” adverse outcomes (“O”) may be compared to the sum of the “expected” risks for an adverse outcome to identify a patient subpopulation O/E ratio. Any specific patient subpopulation or provider-based O/E ratio that is statistically different from the value of 1.0 (i.e., where the preestablished confidence interval around the ratio of the “observed” event rate to the “expected” event rate excludes 1.0) may be classified as an outlier. A “high” outlier is one whose O/E ratio is statistically significantly higher than the value of 1.0. Similarly, a “low” outlier can be identified based on an O/E ratio that is statistically significantly lower than the value of 1.0. In general, the risk-adjusted outcome “high” outliers are identified for more intensive quality reviews (e.g., expanded chart reviews or site visits) for potential quality challenges by oversight groups. In contrast, the “low” outliers may serve as potential opportunities to identify differential processes or structures of care that may be exemplary to serve as a “benchmark” for others, as a template to consider for quality improvement. In general, quality assurance processes may tend to use more liberal confidence intervals (e.g., 90% confidence intervals, which are narrower than the conventional 95% intervals) in order to be sensitive, that is, to screen in additional patients or provider
subgroups for closer quality assurance (QA) review activities (Shroyer et al. 2008).

Fig. 3 Example: observed/expected ratio comparison for Department of Veterans Affairs coronary artery bypass graft 30-day operative mortality (plot by center, 1 to 44 plus center X, of the observed/expected (O/E) ratio with its high and low 90% confidence limits; left axis O/E ratio, right axis risk-adjusted mortality; adjusted VA mean 3.0%; high and low outliers marked)

As illustrated in Fig. 3 (which is an example report), VA medical center #3 would likely be identified in preliminary reviews as a “high-outlier” facility and may subsequently be screened for potential quality of care concerns, given that the observed rate for 30-day operative mortality is statistically significantly higher than the rate that would have been expected based on evaluating the patient risk characteristics for a CABG procedure. As documented, many quality assurance reports commonly will use a liberal p-value threshold (such as p < 0.10) to attempt to screen in more facilities for an in-depth quality review, casting a broader net for the next step in the review process. As O/E ratios (in and of themselves) are not definitive measures of quality of care, VA medical center #3 potentially might be selected for a detailed chart review and possibly a site visit (pending the results of the chart review) to explore for possible quality of care challenges. In contrast, center X had no statistically significant difference identified between their O and E rates, indicating no need for further quality investigation related to this specific end point. Finally, there are no “low-outlier” facilities in this example, as the confidence intervals for the O/E ratios for facilities #38–44 encompass the value of 1.0. If, however, there were “low outliers” identified, then these may be facilities to explore further with in-depth chart reviews and/or site visits to identify “benchmark” care activities that may be useful to share and disseminate to other VA medical centers as “best practices.”

Though important, outcomes do have inherent limitations when used as quality of care metrics. Outcomes only indirectly provide information that a potential challenge may exist related to quality of care, but generally outcomes do not identify the specific actions needed to improve the quality of patient care. Moreover, outcomes do not usually provide an adequate level of information to guide the required changes as “action items” that can be
taken by providers directly. Hence, Donabedian’s triad assessment for quality of care, which complements outcomes with processes and structures, is required.

Uncertainty

For many quality-of-care endeavors, there is no adequate understanding of the relative impact of the patient risk factors upon adverse outcomes, nor adequate understanding of what might be the natural course of events had the patients not received any treatment or an alternative course of treatment. With a variety of care alternatives often available, the best approach to address a patient’s unique risk factor profile is not always clear. For example, in treating patients with ischemic heart disease, there is strong evidence suggesting CABG to be the best care strategy for patients with two- or three-vessel disease. However, the situations where medical management should be used to optimize long-term survival versus manage angina symptoms versus revascularization may not be completely clear, particularly for high-risk patient subgroups (e.g., patients with two prior heart surgical procedures, as well as current severe angina symptoms).

Given that clinical guidelines may provide evidence-based care strategies for some but not all patient subpopulations (particularly the highest-risk patient subgroups), compliance with state-of-the-art evidence provides an important indicator of quality of care, that is, a process-based assessment to augment the risk-adjusted outcomes assessments that may be coordinated. Unfortunately, there is not always an adequate evidence basis to coordinate guidelines: a recent evaluation identified that for the current ACC/AHA guidelines promulgated from 1994 to 2008, only 11% of the recommendations were based on rigorous scientific, high-quality data-driven evidence (based on a review of 53 guidelines on 22 topics, with a total of 7,196 recommendations evaluated) (Tricoci et al. 2009).

To improve quality of care, it is important not only to identify and to monitor outcomes but also to subject these risk-adjusted outcome reports to critical review by the academic, industry, patient, and public targeted audiences. Over the past two decades, there has been an increasing emphasis placed on improving the public transparency as well as sharing reports of risk-adjusted provider-specific and facility-specific outcomes. As a case in point, the Society of Thoracic Surgeons has partnered with Consumer Reports to provide online provider-specific outcome reports, with risk-adjusted outcomes (The Society of Thoracic Surgeons 2012). Given that the availability of risk-adjusted outcomes information is increasing, it will be very interesting to observe the changes in both referral patterns and patient-provider choices that may occur over time in cardiac surgery utilization rates, revealing to what degree changes in patient patterns in obtaining care may be or may not be related to the use of risk-adjusted outcomes-based reports.

Emphasizing the clinician’s role in quality improvement, Dr. Donabedian noted that “An ideal physician is defined as one who selects and implements the strategy of care that maximizes health status improvement without wasted resources” (Donabedian et al. 1982). Toward this goal, new quality of care metrics may be added to evaluate “timeliness” of care rendered. For example, Dr. Boris Sobolev and his Canadian-based research team have forged the way to identify patterns in surgery wait times, evaluating the impact of the timeliness of care rendered for patients upon both their short-term and longer-term outcomes (Sobolev and Fradet 2008). Dr. Sobolev has also done similar research in other surgical fields (e.g., general surgery and orthopedics) that has demonstrated that longer wait times do appear to have detrimental effects on patient outcomes across a variety of surgical fields and procedures (Sobolev et al. 2003; Garbuz et al. 2006). Moreover, the referral patterns related to the risk-adjusted outcomes may be stratified based on wait time delays, taking into consideration the patient’s disease-related care processes, not just focusing on a patient’s single cardiac surgical care encounter. Although early in the evolutionary process, the current focus of quality of care, which uses the patient encounter as the primary unit of analysis, is beginning to
transition to a disease management focus (e.g., evaluating the care provided related to the patient’s ischemic heart disease) and toward a patient-based holistic health perspective (Fihn et al. 2012).

Implementation of VA National Quality Improvement Programs

In 1972, the Department of Veterans Affairs (VA) established the Cardiac Surgery Consultants Board (CSCB) to provide quality assurance oversight for all VA-based cardiac surgery programs. Initially, the Cardiac Surgery Consultants Board review focus was placed on evaluating descriptive reports of observed mortality cases, as well as monitoring rates for both mortality and major morbidity outcomes. Chart audits and site visits were performed by the Cardiac Surgery Consultants Board to assure that minimum standards for quality of cardiac surgery were met by means of a peer-review process (Veterans Health Administration 2008).

In 1985, the Health Care Financing Administration (HCFA) release of hospital report cards raised the public’s awareness of the wide variations experienced by hospitals for their surgical outcomes reported. Additionally, the Veterans’ Administration Health Care Amendments Act was passed, requiring that the VA establish a new quality assurance program which would identify significant deviations in risk-adjusted and unadjusted mortality and morbidity rates for surgical procedures when compared with prevailing national rates (Public Law 99–166 1985). Accordingly, the VA also had to determine if any discrepancies that were identified were related to differences in the quality of the VA-based healthcare services (Grover et al. 1990).

To address these legislative requirements, Drs. Hammermeister and Grover implemented in 1987 a new program entitled the “Continuous Improvement in Cardiac Surgery Program” (CICSP), gathering data related to each cardiac surgical patient’s unique set of risk factors, surgical procedural details, and 30-day operative death outcomes. In December 1987, the first risk-adjusted reports for 30-day operative mortality were produced; these were further refined in June 1990. With the VA CICSP fully implemented, the first risk-adjusted outcomes reports (focused on mortality and major perioperative complications) were produced comparing performance across all VA-based cardiac surgery programs.

Before the end of 1990, the CICSP data form (originally comprising 54 elements on a single sheet of paper) with associated definitions for risk, procedure-related, and outcome variables was mandated nationally by the VA as a new quality assurance requirement for all cardiac surgery programs. Based on the CICSP endeavor, a new noncardiac surgical quality improvement program, entitled the National Surgical Quality Improvement Program (NSQIP), was initiated in 1991 by Drs. Shukri Khuri and Jennifer Daley (Khuri et al. 1998). Expanding the focus to include a diversity of general surgical procedures, the VA NSQIP initiative partnered with the CICSP to obtain funding for local nurse or data coordinators to prospectively gather the patient preoperative risk characteristics, the detailed surgical processes of care, and the mortality and perioperative morbidity-related outcomes to be able to coordinate risk-adjusted mortality reports. Similar to the CICSP oversight coordinated by the Cardiac Surgery Consultants Board, the NSQIP established an Executive Committee (EC) with key analytical support coordinated by Dr. William Henderson. Working in concert, the VA Central Office of Surgical Services (under the leadership and guidance of Drs. Gerald McDonald and Ralph DePalma) synchronized the CICSP and NSQIP efforts to provide data-driven reports routinely to both the national oversight committees (Cardiac Surgery Consultants Board and NSQIP Executive Committee) as well as to share these reports with local and regional surgical program leaders (including Cardiothoracic Division Chiefs, Chiefs of Surgical Services, Medical Center leaders, and VA Regional Office leaders). As a primary focus, both CICSP and NSQIP chose to make their top priority the provision of good information to drive good local and regional decisions, to support internal VA-based self-assessment and self-improvement initiatives.
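
The risk-adjusted reports circulated by these programs rest on the comparison of observed with expected events described under Risk Adjustment. As a minimal sketch only, the Python fragment below computes an O/E ratio for one center and labels it against the value of 1.0. The function name oe_report, the example counts, and in particular the log-based approximate interval (the standard error of log(O/E) taken as 1/sqrt(O)) are assumptions made for illustration; the chapter does not state which interval method the VA reports used.

```python
import math
from statistics import NormalDist


def oe_report(observed_events, expected_risks, confidence=0.90):
    """Summarize one center: O/E ratio, approximate interval, outlier label.

    observed_events: count of observed adverse outcomes (O).
    expected_risks:  per-patient expected risks from a risk model; their sum is E.
    """
    expected_events = sum(expected_risks)
    if expected_events <= 0:
        raise ValueError("expected events must be positive")
    ratio = observed_events / expected_events

    if observed_events == 0:
        # The log-based interval below is undefined with zero observed events.
        return {"oe": ratio, "low": None, "high": None, "label": "not classified"}

    # Assumption: O is treated as Poisson, so SE of log(O/E) is about 1/sqrt(O).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    spread = math.exp(z / math.sqrt(observed_events))
    low, high = ratio / spread, ratio * spread

    if low > 1.0:
        label = "high outlier"      # interval lies entirely above 1.0
    elif high < 1.0:
        label = "low outlier"       # interval lies entirely below 1.0
    else:
        label = "not an outlier"    # interval encompasses 1.0
    return {"oe": ratio, "low": low, "high": high, "label": label}


if __name__ == "__main__":
    # Hypothetical center: 200 patients, each with a 3% expected risk (E = 6).
    risks = [0.03] * 200
    for deaths in (1, 7, 14):
        print(deaths, oe_report(deaths, risks))
```

Lowering the confidence argument narrows the interval, so more centers are screened in for review; this mirrors the liberal screening threshold discussed for Fig. 3. A flagged center is a candidate for chart review, not a verdict on its quality of care.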
With directives and continuous improvement communications coordinated by Drs. McDonald and DePalma, they were able to successfully provide the right information at the right time to the right individuals, as key decision-makers, to empower them to take the right actions to improve the safety and the quality of patient care.

As the first national comprehensive surgical quality improvement endeavor, the efforts of these key VA leaders, including Drs. Hammermeister, Grover, Shroyer, Khuri, Daley, and Henderson, radically shifted the quality-of-care paradigm away from crisis identification, which focused on uncovering problem facilities or providers where urgent action was needed to address deficiencies in care. The new goal was to improve the quality of care for all facilities and focused on evaluating metrics comprehensively over time (Itani 2009a; Rodkey and Itani 2009). These data-driven quality improvement programs have made major impacts. The NSQIP program has identified risk factors for morbidity and mortality across a wide range of surgical subspecialties, including general surgery, orthopedics, neurosurgery, and many others (Itani 2009b). These risk factors have set the stage for continuous improvement in the field of surgery by providing tools with which to better evaluate the role of surgery in individual patients’ care and better identify patients for prophylactic measures or closer monitoring in the intra- and postoperative periods.

Having established the initial CICSP and NSQIP legacy, these VA programs provided an impetus, serving as models for others (such as the Northern New England Cardiovascular Consortium) to follow and to expand upon with innovative enhancements (Malenka and O’Connor 1998).

The Processes, Structures, and Outcomes of Cardiac Surgery Study

During the early CICSP implementation period (1987–1991), however, it is important to realize that both Drs. Hammermeister and Grover recognized that there were inherent limitations in focusing on risk-adjusted outcome metrics as the ultimate quality of care metrics. Mortality, in and of itself, was a relatively rare event (under 3% mortality rate for CABG procedures). Given that the chart reviews and site visits performed by the VA Cardiac Surgery Consultants Board members often provided meaningful insights into the challenges that occurred with processes and structures of care, they initiated a new VA Health Services Research and Development Study entitled Processes, Structures, and Outcomes of Cardiac Surgery (PSOCS) to identify the important components of the cardiac surgical care rendered to veterans that may benefit by closer quality monitoring and reporting (Shroyer et al. 1995).

Funded in late 1991, the PSOCS study was initiated in May 1992 at 14 VA Medical Centers with active cardiac surgery programs (out of the 44 total VA cardiac surgery programs). The PSOCS study was a prospective cohort study, with funded research nurses and data support personnel. They gathered an extensive set of detailed data related to processes of care (including preoperative, intraoperative, postoperative, and post-discharge), structures of care related to the entire care provider team (e.g., team members’ educational background, specialty training, years of experience, and level of certification), and the environment in which the care was rendered. The environment was comprehensively assessed, including data about the key features of the operating room, recovery room, intensive care units, telemetry monitoring, staffing levels, and the quality and scope of oversight mechanisms. Additionally, the care provider interactions and communications were assessed via surveys. Finally, the nature and scope for surgical resident training were assessed, including the degree of supervision provided to the residents engaged in cardiac surgical patient care.

To complement the traditional mortality and morbidity outcome metrics routinely monitored by CICSP, a very broad array of outcomes was incorporated into the PSOCS study assessments. Focusing on the primary end points of death and major perioperative complications, outcome assessments were made at both 30 days following surgery or at the completion of the inpatient
hospitalization (whichever came sooner) and at 6 months post-CABG procedure. Compared with
baseline assessments, both a generic health-related quality-of-life instrument (i.e., the
Veterans’ version of the Short-Form 36 health-related quality of life survey, the VR-36) and a
disease-specific survey (i.e., the Seattle Angina Questionnaire) were used to assess the PSOCS
veterans’ self-perceptions of changes in physical, emotional, and social functionality related
to changes in health status. Additionally, patient satisfaction with care was assessed to
identify the concordance of patient self-reported outcomes with clinical outcomes of care, as
well as to identify the factors that may influence a patient’s CABG surgical care-related
experiences.

Hypotheses of the PSOCS Study

As the overarching research question, the PSOCS study identified the specific processes and
structures of care that could be revised in the future to improve the quality of cardiac
surgery patient care. Importantly, the PSOCS study established a vision that was based on a
clinically relevant, conceptual framework of the wide diversity of processes and structures of
care that may be related to patient risk-adjusted outcomes. Specifically, the PSOCS study
evaluated comprehensively the literature for all factors known in surgery to be directly or
indirectly related to changes in patient outcomes, coordinating these findings into a
conceptual model that measured the variables identified. There were six specific process and
three specific structure hypotheses, with corresponding sets of sub-hypotheses, that were
related to the dimensions (and correspondingly the subdimensions) of the PSOCS conceptual
model, tying each variable for which data was gathered into an organized hierarchical
relationship of sets of variables, which could be analyzed in concert to address the specific
research questions raised. For example, one PSOCS hypothesis focused on the intraoperative
processes of care performed that may influence the short-term and intermediate-term patient
outcomes. For intraoperative processes of care, there were ten different sub-hypotheses
evaluating a variety of the different intraoperative care dimensions, including operation
duration, hemodynamic and physiologic monitoring techniques, management of hemodynamic
function, anesthesia techniques used, blood management approaches, myocardial preservation
technique, the use of the cardiopulmonary bypass machine, the surgeon-specific operative
techniques used, the completeness of the documentation for intraoperative care provided, and
the use of early extubation approaches. Given that a research nurse was located in the
operating room for the duration of the procedure to independently record the care provided, the
medical chart’s completeness and quality of the documentation (e.g., the completeness of the
surgeon’s dictated operative note) could be assessed (O’Brien et al. 2004).

Each PSOCS hypothesis (or sub-hypothesis) was action driven; that is, the goal was to identify
the specific actions that care providers or healthcare administrators or healthcare
policymakers would be able to take to improve the quality of future cardiac surgery patients’
care. The PSOCS research questions raised were based on the following assumptions:

1. A significant proportion of post-CABG patients’ risk-adjusted healthcare outcomes could be
explained by processes and/or structures of care that could be improved.
2. The processes of cardiac surgical care that were most likely to impact risk-adjusted
outcomes included the completeness and quality of the preoperative care processes, the
intraoperative care processes, and the post-CABG processes of care, as well as the continuity
of follow-up care in the post-discharge period.
3. The structures of cardiac surgical care that were most likely to impact risk-adjusted
patient outcomes included the degree of supervision by senior physicians, the degree and
effectiveness of communications both among care provider team members as well as between team
members and the patient and family, and the nature and scope of the quality-related oversight
coordinated as part
of the medical staff organization and regulatory activities that were performed as part of the
hospital’s quality integrating system.
4. The structures of care that may impact outcomes also included the number, education,
experience, and specialty training of the physician provider team members (e.g., the surgeon,
cardiologist, and anesthesiologist). Fundamentally, the provider team member characteristics,
mix of providers providing care, and staffing levels, along with hospital and physician
experience, were important structures that were hypothesized to impact patient outcomes, after
holding patient-specific baseline risk factors constant (Shroyer et al. 1995).

Building on Dr. Donabedian’s paradigm for quality of care, the PSOCS study assumed that good
processes and good structures of care were very likely to lead to improved patient outcomes.
Uncovering problems with specific processes of care or structure-related weaknesses in the
provider-based characteristics, the clinical care team mix, or facility-based characteristics,
could indicate targets for scrutiny, where different actions could be taken to improve care.

Methods of the PSOCS Study

Given that PSOCS outcomes included assessments at 6 months post-discharge, a series of
“interval events” was monitored, including both health-related and non-health-related life
events during this post-discharge time period. The sequence and timing of post-discharge events
were gathered to evaluate the potential for interactions between post-discharge healthcare and
non-healthcare events upon risk-adjusted 6-month patient outcomes of care.

Importantly, a comprehensive array of patient-specific risk factors was gathered. Risk factors
were classified in four dimensions assessed at baseline, including severity of cardiac disease,
comorbidities (i.e., noncardiac diseases), demographic and socioeconomic factors, and health
status evaluations performed by both the care provider team and by the patients themselves (for
both cardiac disease-specific and general health status domains). The risk factors were also
analyzed to evaluate to what degree modifiable risk factors (e.g., patient’s alcohol use,
smoking, and exercise habits) had a differential impact as compared to the non-modifiable risk
factors (e.g., the patient’s age, gender, or race/ethnicity). Finally, a series of control
variables was used (e.g., provider identifier, facility identifier, date/time sequencing
variables) to coordinate the complex analyses required.

In total, there were 1,453 variables gathered for each PSOCS patient, including 249
outcome-related dependent variables (which were ultimately used to calculate three short-term
and five intermediate 6-month outcomes) along with 1,102 independent variables (209 patient
risk variables, 509 process-of-care variables, and 303 structure-of-care variables) and 23
interval events with 153 “control” variables used for analytical purposes. Across the 14
participating medical centers, the PSOCS study enrolled 3,988 patients during the period from
1992 to 1996, with follow-ups coordinated through early 1997 (O’Brien et al. 2004).

Due to the large number of variables, an initial task was data reduction, addressing the
missing data and evaluating patterns of data completeness across surgeons and VA medical
centers. Because intraoperative complications directly impacted outcomes, these were addressed
analytically. As a first step, statistical risk models were built to predict the 30-day
operative and 6-month outcomes. Within domains and coordinated in a nested analysis across
sub-domains, the impact of processes of care upon risk-adjusted outcomes was evaluated.
Specifically, processes of care related to operative duration (i.e., increased operative
time), the use of inotropic agents, the use of transesophageal echocardiographic (TEE)
monitoring and systemic temperature monitoring, and the use of hemoconcentration/ultrafiltration
systems were powerful predictors of adverse composite outcomes. Since some of these processes
of care may be initiated in response to adverse intermediate outcomes (e.g., intraoperative
complications), a more complex analytical approach was
used to evaluate the main effects (rather than interaction-related effects) for processes of
care. Following these adjustments, the use of intraoperative transesophageal echocardiography
and the use of hemoconcentration/ultrafiltration remained significantly associated with
increased risk for an adverse outcome (O’Brien et al. 2004), which was likely driven by patient
complexity.

Findings of the PSOCS Study

An important finding of this study, unanticipated in the original PSOCS design, was that,
retrospectively, it is extremely difficult to differentiate planned versus unplanned processes
of care. Intermediate outcomes, such as intraoperative complications, may cause providers to
initiate new processes, previously unplanned, to address unforeseen challenges. Thus,
differentiating between a planned process of care (i.e., a process of care that would be
generally initiated for all patients) versus an unplanned process of care (i.e., a process of
care that was initiated in response to an unforeseen challenge) is a critically important
distinction for meaningful quality assessments. Quite simply, capturing the unplanned processes
of care may be – in and of itself – an important indicator as a quality metric. With this
important concept documented by PSOCS, it became clear that the use of state-of-the-art
techniques and equipment for monitoring may provide for the early identification of potential
adverse events.

To facilitate future quality-related research, the PSOCS study successfully built upon the
historical literature basis, denoting that inotropic use, transesophageal echocardiography use,
and the use of hemoconcentration/ultrafiltration appear to potentially impact post-CABG
risk-adjusted outcomes. The PSOCS found that there was a consistent relationship documented
between key times (i.e., cardiopulmonary bypass time or operative time) and risk-adjusted
adverse outcomes, for which there is an association with the surgeon-specific and/or
facility-specific practices. Not surprisingly, therefore, the PSOCS study identified that
processes (e.g., operative times) were intertwined with structures of care (e.g.,
surgeon-specific years of experience). Moreover, the PSOCS study challenged the ability of
research to isolate process-specific or structure-specific impacts on adverse risk-adjusted
outcomes, as well as identified the need to differentiate unplanned versus planned processes of
care, an important advancement forging forward the frontier of quality assessment. Finally, the
PSOCS study documented that the statistical risk modeling approaches used may need to evolve,
to be process- or structure-specific, in order to identify the unique risk factors that emerged
(e.g., a new intraoperative complication) directing the change from planned to unplanned
approaches (O’Brien et al. 2004).

The CICSP-X Program

Having recently completed the PSOCS study’s data capture and preliminary analyses, the VA CICSP
was dramatically expanded (entitled CICSP-X [as an expansion of CICSP], under the leadership of
Dr. Shroyer) in 1997 as a clinical national quality improvement database to identify the
interrelationships of risk factors with processes and structures of care, as well as to include
a broader set of clinical outcomes (Shroyer et al. 2008). The CICSP-X program established the
feasibility of coordinated multidimensional quality database reports to address a more
comprehensive set of quality of care metrics, with a comprehensive “dashboard” of summary
metrics reported for different quality of care dimensions, including a series of preestablished
outcome metrics, as well as processes and structures of care measures.

In 1997, Department of Defense (DoD) and VA guidelines for Ischemic Heart Disease (IHD) became
an impetus for additional changes to the VA Criteria and Standards, where new post-CABG
hospital medication-use requirements were established (Veterans Health Administration and
Department of Defense 1997). As a key process-of-care measure, CABG-only patients’ use of key
evidence-based medical therapies was required for (1) lipid-lowering agents,
(2) beta-blockers for patients with a prior myocardial infarction, and (3)
angiotensin-converting enzyme (ACE) or angiotensin II receptor blocker (ARB) medications for
patients with a baseline low ejection fraction (40%). For CABG-only patients in high-risk
subgroups, monitoring extended to additional guidelines, measuring compliance with standards
including the use of diabetic agents for diabetic patients and antihypertensive medications for
those with hypertension.

Due to the VA’s extensive Pharmacy Benefits Management (PBM) program (and outstanding leadership
of the Pharmacy Benefits Management enterprise), the rates of guideline-based medication use
could be identified for a CABG-only patient based on their preoperative risk profile. Although
limited to identification of medications filled via the VA pharmacy (medications filled at
non-VA pharmacies could not be easily ascertained), the compliance rates for all of the
guideline-required medications (using an “all-or-none” evaluation) were routinely coordinated
to assess overall cardiac surgery program performance. By improving compliance with Department
of Defense/VA guidelines, the goal was to improve long-term survival post-CABG surgery, as well
as to optimize veterans’ long-term health status and quality of life (Veterans Health
Administration CARE-GUIDE Working Group et al. 1996).

Measuring Processes of Care

During the late 1990s, a wide variety of national watchdog agencies arose with the goal of
providing quality of care oversight such as the Leapfrog initiative (Milstein et al. 2000). The
National Quality Forum was developed (Miller and Leatherman 1999) and published a set of
performance indicators that were intended to serve as internal quality improvement metrics
(National Quality Forum 2004). At that time, the National Quality Forum metrics represented the
best data-driven evidence (or in the cases where evidence was lacking, the best clinical
consensus) about the optimal approaches to provide cardiac surgical care to patients. As
reference, these National Quality Forum quality metrics specified what would be anticipated
“best practices” as well as established goals for surgeons to strive for in coordinating the
care for their patients. For example, the use of internal mammary artery (IMA) conduits for a
CABG graft placed to the left anterior descending (LAD) artery was generally preferred based on
improved long-term survival rates, as well as reduced rates for repeat revascularization
procedures. Since it may take slightly longer to take down the internal mammary artery,
compared to harvesting a saphenous vein graft (SVG) conduit, this approach may not be
advantageous for emergent patients. Similarly, elderly patients may not live long enough to
document the internal mammary artery survival benefit (Ferguson et al. 2002). Based on the
National Quality Forum standards combined with literature-based evidence and the feasibility of
data to be captured, the CSCB identified as “best practice” the use of an internal mammary
artery graft for CABG-only procedures, particularly emphasizing that this practice should be
used for the subgroup of non-emergent patients (e.g., elective and urgent cases). Starting in
2008, therefore, the VA Criteria and Standards for Cardiac Surgery Programs specified that a
CSCB review would be performed for cardiac surgery programs that performed fewer than 80% of
their CABG-only procedures using internal mammary artery grafts during a 6-month reporting
period. Figure 4, a sample report, illustrates the variability in internal mammary artery graft
use across VA medical centers. Within this 6-month reporting period, center “X” had a CABG-only
procedure internal mammary artery graft use rate of >80%. Hence, no quality reviews of center
“X” would normally be required for this preestablished internal mammary artery graft use
quality threshold.

Fig. 4 Example: rate of internal mammary artery graft use at Veterans Affairs Medical Centers

In addition to assessing that the right processes of care were provided to the right patient,
the VA CICSP-X reports were expanded to also evaluate cardiac surgical resource utilization,
toward the goal of improving the efficiency of the VA care provided (Shroyer et al. 2008). The
resource utilization metrics included evaluating the rates of the same-day surgery, the
preoperative length of
stay, the operating room times, the postoperative length of stay, and the total length of stay
for the veterans served. Because some patients underwent preoperative cardiac catheterizations
during the CABG hospitalization and others did not, these two groups were considered
separately, since this difference could impact both the rates for same-day surgery and the
total length of stay.

As an example of important resource use metrics routinely evaluated by CICSP historically, the
proportion of patients with same-day surgery, the preoperative length of stay (both for
patients with and without a cardiac catheterization procedure during the CABG hospitalization),
the postoperative length of stay, and the total length of stay were monitored. For example,
Fig. 5 (which is a sample report) illustrates the types of resource consumption profiles
provided by center. Within this example 6-month reporting period, center “X” might have had
several areas that were flagged for potential efficiency reviews to examine practices of
discharge-related processes and structures of care (e.g., early discharge planning and social
work support systems).

Fig. 5 Example: Veterans Affairs coronary artery bypass grafting procedural resource consumption
dashboard report (CICSP Cardiac Surgery Dashboard for all centers, resource use measures, report
period FY07-2, 21 centers including center “X”; columns: percent same-day surgery (no cath),
preoperative length of stay, postoperative length of stay, and total length of stay in median
days, each coded as upper quartile, mid range, or lower quartile)

Recent studies have attempted to further characterize the importance and utility of these types
of resource utilization metrics. For example, the Virginia Cardiac Surgery Quality Initiative
(VCSQI) database of over 42,000 patients undergoing CABG was recently analyzed to investigate
the relationship between quality (as determined by various risk-adjusted measures of morbidity
and mortality) and resource utilization (i.e., costs and length of stay) at individual
hospitals. The VCSQI research team documented strong correlation between risk-adjusted
morbidity and mortality with length of stay but not directly with costs. This appears to
support the importance of these types of process of care and outcome measures in assessing the
value of services rendered at cardiac surgical centers. Further, it was shown that both
preoperative and postoperative factors (e.g., comorbidities and complications, respectively)
influence both length of stay and costs, reinforcing the importance of healthcare quality
initiatives in containing the costs associated with healthcare and increasing the value of the
care rendered (Osnabrugge et al. 2014a, b).

Monitoring Trends Over Time

Across all processes of care, structures of care, resource use, and risk-adjusted outcomes,
reports for the most recent 6-month period, trends over time for the most recent 3-year period,
and trends over time for the entire period monitored (from 1991 to the current reporting
period) were coordinated. These “Time Series Monitors of Outcome” (TSMO) metrics were evaluated
to identify if a cardiac surgery program might be a “high outlier,” “not an outlier,” or “low
outlier” based on preestablished statistically driven thresholds (e.g., high and low outliers
were generally more than two standard deviations beyond the mean).
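
The two-standard-deviation rule of thumb quoted above can be illustrated with a minimal Python sketch. It is not the TSMO implementation, whose exact thresholds and reference distribution are not given in the text; it simply assumes that the mean and standard deviation are computed across programs for a single reporting period. The function name `classify_outliers` and the observed/expected ratios below are hypothetical.

```python
from statistics import mean, stdev


def classify_outliers(rates, n_sd=2.0):
    """Label each program as a high outlier, low outlier, or not an outlier.

    Assumption (not from the source): the reference mean and standard
    deviation are taken across all programs in one reporting period.
    """
    values = list(rates.values())
    center = mean(values)
    spread = stdev(values)  # sample standard deviation across programs
    labels = {}
    for program, value in rates.items():
        if spread > 0 and value > center + n_sd * spread:
            labels[program] = "high outlier"
        elif spread > 0 and value < center - n_sd * spread:
            labels[program] = "low outlier"
        else:
            labels[program] = "not an outlier"
    return labels


# Hypothetical observed/expected mortality ratios for ten programs.
oe_ratios = {
    "A": 1.00, "B": 0.90, "C": 1.10, "D": 1.00, "E": 0.95,
    "F": 1.05, "G": 1.00, "H": 0.98, "I": 1.02, "J": 2.50,
}
labels = classify_outliers(oe_ratios)
for program, label in labels.items():
    print(program, label)
```

With these made-up ratios only program J exceeds the upper threshold. A fixed cutoff of this kind is sensitive to the number of programs and to the outlier's own influence on the standard deviation, which is one reason a program would pair it with trend information rather than rely on it alone.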
Additionally, the trend line slope was evaluated for “upward” versus “downward” trending, versus
“no trend identified.” The subgroup of VA cardiac surgery programs with upwardly oriented
trends identified (i.e., a trend toward increasing adverse event rates or increased resource
use or problems with guideline compliance) or “high-outlier” status (potential challenges in
overall performance) was identified for intensive review, with potential site visits performed
when these indicators clustered in a manner to raise potential quality of care concerns.
Summary reports across all quality metrics (called “dashboards”) were developed, as the number
of quality indicators increased. These dashboards provided a quick and easy identification of
the subgroup of VA cardiac surgery programs with challenges identified. Similarly, a focus was
placed on identifying exemplary performance, that is, when clusters of positive performance
indicators were identified, particularly if positive trends over time were identified, as well
as sustained positive performance over time (Marshall et al. 1998).

With the expanded focus on multidimensional quality reports, the original CICSP report had
grown from six pages to over 200 pages. The use of dashboards addressed the information
overload by providing summaries of the findings identified in these detailed process,
structure, outcome, and resource reports (Shroyer et al. 2008). Based on the dashboard reports,
very busy VA Central Office leadership team members, regional directors, hospital directors,
and local VA cardiac surgery program directors could coordinate informed data-driven decisions
to address any challenges identified, as well as work proactively to improve future VA cardiac
surgery program quality of care. Thus, as an infrastructure quality reporting resource, the VA
CICSP-X program set forth a dashboard framework that continues today as part of the
consolidated VA Surgical Quality Improvement Program (VA SQIP), setting the VA as a leader in
identifying, monitoring, and reporting quality for cardiac surgical care. As an example of
this, Fig. 6 documents that there was a statistically significant downward trend observed for
30-day CABG operative mortality (a 2.1% reduction) from 1988 to 2007, indicative of continuing
improvements over time for the CABG-only in-hospital surgical care and early post-discharge
care provided.

Fig. 6 Example: Veterans Affairs time series monitors of outcome summary report evaluating
trends in observed/expected ratios over time

As the VA historically invested substantial support at both the national level (in the
CICSP-X and NSQIP programs) and at the local level (for the local nurses or data coordinators
used originally to gather the data required), it is important to pause to evaluate the return
on this investment. Based upon VA findings to date, these quality improvement endeavors appear
to have positively impacted short-term and longer-term rates of adverse cardiac surgical
outcomes, with dramatic improvements and statistically significant downward-sloping trends in
the mortality and morbidity rates over the 20+-year period reported (Grover et al. 2001;
Shroyer et al. 2008). Based on the trends in risk-adjusted outcomes reported, moreover, these
positive improvements do not seem to be related to the VA taking on easier cardiac surgical
cases, as the risk profile for veterans basically remained the same (with the exception that
the average age of the veterans served increased slightly over the period of time evaluated)
(Shroyer et al. 2008). Moreover, the markers of VA efficiency similarly documented substantial
improvements, with same-day surgery rates rising from 0% (1987) to 40% (1997).

Although no causal impact could be identified (as many changes in both surgical practices and
medical management of ischemic heart disease occurred during these same periods), these
positive trends in risk-adjusted outcomes support the continuation of quality improvement
efforts and the expansion of these programs beyond cardiac surgical patient care.

Implementation of National Quality Improvement Programs

Under the leadership and guidance of Dr. Richard E. Clark, the Society of Thoracic Surgeons
(STS) initiated the National Adult Cardiac Surgery Database (ACSD) in February 1991 with 330
surgeon members at 81 centers throughout the United States participating initially in this
quality improvement endeavor (The Society of Thoracic Surgeons 2014c). Although the original
goal was to initiate databases also for Congenital Heart Surgery (CHSD) and General Thoracic
Surgery (GTSD), the development of these two databases was delayed until the full
implementation of the STS Adult Cardiac Surgery Database was successfully coordinated.

As background, the purpose of the STS Adult Cardiac Surgery Database was to gather data on
mortality, morbidity, and resource-use outcomes, as well as patient risk factors, to allow the
evaluation of risk-adjusted cardiac surgical outcomes across providers and to report trends
over time. By 1995, Dr. Clark had reported that the Adult Cardiac Surgery Database had grown to
include 1500 surgeons at 706 centers across 49 states, with decreasing postoperative length of
stay trends documented and modest reductions in operative mortality rates in spite of
increasing patient risk over time (Clark 1995).

By the late 1990s, a wide variety of STS initiatives had been coordinated related to the
enhancement of the Adult Cardiac Surgery Database and the initiation of the Congenital Heart
Surgery and General Thoracic Surgery endeavors. The STS databases were distributed to the
participants by means of licensed software products via vendors, with centralized database
management, analysis, and reporting functions coordinated by the Duke Clinical Research
Institute (DCRI) team. Long-term goals were preestablished for the STS databases to become the
main repositories to support improvements in local clinical decision-making, cardiac surgery
program management, and policy decisions. Toward these goals, the existing database data forms
and definitions were expanded to ensure that future comparisons might be coordinated across a
broader array of outcomes (e.g., health-related quality of life, functional status, longer-term
survival, and costs of care). Additionally, comparisons of cardiac surgical procedures to
alternative treatments (e.g., cardiology-based interventions, such as the placement of stents)
were planned.

By the early 2000s, the STS Adult Cardiac Surgery Database was viewed as the largest clinical
repository of data available in the country, used to guide both health policy discussions and
debates on reimbursement at congressional hearings. Database reports were generated
semiannually, with local site reports compared to regional and national profiles. As STS
National Database
Committee members, Drs. Bruce Keogh (United Kingdom) and Paul Sargent (Belgium) worked with
their European colleagues to build upon the STS Adult Cardiac Surgery Database structure a new
European Association for Cardio-Thoracic Surgery Adult Cardiac Surgery Database (EACTS),
transforming the STS template into a structure that could be used to support quality
improvement efforts globally. As of late 2008, this database was reported to include over one
million patient records from 366 hospitals across 29 countries in Europe (Nashef et al. 1999;
Head et al. 2013).

The STS worked with the National Quality Forum and the American Medical Association’s
Performance Improvement Physician’s Consortium to coordinate new quality of care metrics for
national reporting from 1999 to 2001. These external collaborations, beyond the STS-based
quality reporting endeavors, were very important to establish the external credibility of the
STS Adult Cardiac Surgery Database. Even today, the National Quality Forum metrics reported for
adult cardiac surgery include the STS Adult Cardiac Surgery Database-based metrics used widely
in program-based quality of care assessments (The Society of Thoracic Surgeons 2014a).

Focused upon the importance of high-quality, accurate, and reliable STS data to generate
reports, the STS Adult Cardiac Surgery Database Committee (chaired by Dr. Rich Prager) began a
new quality improvement process in 2006, randomly selecting STS participating sites to audit
and validate the number of cardiac surgical records and outcomes submitted by participating
surgeons and sites. For a random sampling of Adult Cardiac Surgery Database participating sites
from 2007 to 2013, each audited site’s submitted risk, operative procedure, and outcome data
were compared with data obtained independently by an external audit company. The number of
Adult Cardiac Surgery Database sites audited increased from 24 in 2007 (3% of sites) to 86 in
2013 (8% of sites). Over 92% of audited STS sites provided positive audit feedback, noting that
the audit process had positively impacted their data accuracy. Across all risk, process of
care, and outcome variable categories, the aggregate agreement rates ranged from 94.5% (2007)
to 97.2% (2012), with improvements in the variable-specific agreement trends over time.
Although the operative mortality agreement rate was reportedly lower in earlier years, the rate
of reliability for death reporting has consistently remained above 95% since 2008. The STS
external audit process established that Adult Cardiac Surgery Database data integrity is high,
with data concordance reported at 97.2% (2012). By means of this external audit process, the
STS Adult Cardiac Surgery Database can be interpreted with confidence, with independent
external auditor verifications confirming that the data submitted by STS participating surgeons
and centers is of the highest integrity (Winkley Shroyer et al. 2015, Member of STS Adult
Cardiac Surgery Database Workgroup, “personal communication”).

Uncovering Quality Trends

Important quality improvement trends over time have been documented using the STS Adult Cardiac
Surgery Database, including procedure-specific or population-specific reductions in the rate of
adverse events reported. The overall rate of reoperations and correspondingly the rate of
30-day operative death have been documented to be diminishing (6.0% down to 3.4% and 6.1% down
to 4.6%, respectively) over the 10-year period of 2000–2009 (The Society of Thoracic Surgeons
2014c). Importantly, the field of cardiothoracic surgery has documented substantial quality
improvements over time, with diminishing rates of mortality and morbidity (Ferguson et al.
2002). As noted by Dr. Ferguson, remarkable strides to improve cardiac surgical care have been
initiated by the surgeons (e.g., the use of new techniques for improved myocardial
preservation) and the pharmaceutical industry (providing new medications). Other improvements
include the implementation of care pathways, the formation of cardiac surgery dedicated teams
(e.g., including a dedicated cardiac anesthesiologist), better approaches used for patient
selection, as well as innovations to improve the efficiency of care (e.g., “fast-track” cardiac
surgery early
extubation protocols). Even though the popula- consideration. Studies on atrial fibrillation have
tion of cardiac surgical patients has grown older demonstrated that certain prophylactic measures
and sicker over time, risk-adjusted outcomes have (e.g., amiodarone, beta-blockers, magnesium,
improved. Another major change over time was atrial pacing) do significantly reduce the rate of
the growing reliance on the STS Adult Cardiac postoperative atrial fibrillation after cardiac sur-
Surgery Database by key national US-based deci- gery, as well as shorten hospital stays and decrease
sion-makers, including legislators. The STS Adult the cost of hospital care by over $1,200. No sig-
Cardiac Surgery Database was used to identify, nificant effects on mortality or the incidence of
monitor, report, and target future cardiac surgical stroke have been demonstrated, however
improvements, shifting the national quality (Arsenault et al. 2013). Similarly, a new module
debates from a conceptual framework to data- related to documentation of the details of cardiac
driven patient care, program management, and anesthesiology was added in July 2013 to identify
policy discussions (Ferguson et al. 2002). the anesthesiology-related processes of care that
As a major transformation to multidimensional may be targeted for future quality improvement
quality metrics, the STS has led the way in the initiatives (The Society of Thoracic Surgeons
development of composite scores, which were 2013). Most importantly, the focus on STS cardiac
adopted by the National Quality Forum as new and thoracic procedural outcomes has been
quality metrics in 2008. Specifically, Dr. David shifted to evaluate long-term outcomes, such as
Shahian and the STS National Database Commit- long-term survival. Toward this goal, database
tee worked to coordinate an STS coronary artery matches with the national death registry were
bypass graft (CABG) composite score. The com- performed, with the first long-term follow-up
posite score was comprised of risk-adjusted mor- risk models predicting survival completed
tality, risk-adjusted morbidity, a surgeon-related in 2012.
process of care metric (i.e., the use of the internal
mammary artery as a conduit), and a facility-
related process of care metric (i.e., the use of The Michigan Society of Thoracic
beta-blocker medications perioperatively) and Cardiovascular Surgeons Quality
(O’Brien et al. 2007). In combination, these Collaborative
multidimensional composite metrics are used to
categorize STS facilities and surgeons into “star The Michigan Society of Thoracic and Cardiovas-
ratings” for quality, based on a three-star, two-star, cular Surgeons Quality Collaborative (MSTCVS-
and single-star rating system, differentiating high- QC), as an example of a regional STS initiative, is
versus low-quality centers based on the composite led by Dr. Richard Prager. The MSTCVS-QC is a
metric (The Society of Thoracic Surgeons 2014b). consortium of 33 cardiac surgery programs
Based on the success of the CABG-only compos- throughout the state of Michigan focused on iden-
ite score, an isolated aortic valve replacement tifying intraoperative and postoperative opportu-
(AVR) composite score was designed and nities to improve the quality of cardiac surgical
implemented in 2012, as well as a combined aortic care. As one of their recent endeavors, they exam-
valve replacement-CABG composite score ined the use of blood transfusions as a potential
in 2014. quality of care metric, examining the relationship
Most recently, the STS has added new modules between blood product use and clinical outcomes.
to enhance focused quality endeavors for high- The MSTCVS-QC found that quality collabora-
risk patient subgroups. For example, a new mod- tive educational approaches may have very posi-
ule related to prophylaxis and treatment of cardiac tive impacts, as the blood product utilization was
surgery patients that experience atrial fibrillation documented to decrease dramatically after routine
was added. As atrial fibrillation is a very common quarterly reporting of program-identified transfu-
post-cardiac surgical complication, its prevention sion rates was implemented. The quarterly
and early treatment is an important quality MSTCVS-QC incorporated very frank
discussions about the potential adverse effects (i.e., increased risk of mortality and morbidity) associated with transfusions. Under the leadership of Dr. Prager, the Michigan team’s persistent and continued focus on this topic has dramatically revised clinical practice and enhanced blood product conservation approaches used throughout the state of Michigan (Paone et al. 2013).

Another recent MSTCVS-QC endeavor looked at how to reduce hospital-acquired infections (HAI) related to CABG procedures. Hospital-acquired infections include complications such as pneumonia, sepsis, septicemia, wound-related infections, as well as other infections reported. As of early 2008, Medicare has not reimbursed hospitals for post-CABG mediastinitis-related treatments, as infections (such as mediastinitis) are perceived to be directly related to a lower quality of surgical care provided during the initial CABG hospitalization. Interestingly, Dr. Prager and his MSTCVS-QC colleagues found that on average 5.1% of CABG patients developed hospital-acquired infection postoperatively. Moreover, there was a tremendous variation in the reported rates of post-CABG hospital-acquired infections (ranging from 0.9% to 19.1%). Differences in cardiac surgery program-based patient risk characteristics did not account for much of this dramatic difference in program-based hospital-acquired infection rates observed. Within this analysis, four centers appeared to be high outliers (i.e., had a hospital-acquired infection O/E ratio that was statistically significantly higher than 1.0). Based on in-depth evaluations of the CABG care rendered at these four “high-outlier” centers, the MSTCVS-QC team concluded that the largest variations were found for pneumonia and multiple infection end points. Based on their reviews, they thought a multidisciplinary care team approach was needed to address the challenges identified, ideally to bridge across traditional specialty-based silos of care, facilitating team-based care approaches for heart patients in the future. Working collaboratively as an STS regional society, therefore, the MSTCVS-QC team provides research on quality improvements that extend beyond the STS Adult Cardiac Surgery Database capabilities, enhancing the data-driven approaches used to assess and to improve cardiac surgical patients’ quality of care (Shih et al. 2014).

The American College of Surgeons’ Private Sector Initiative

As a separate endeavor, the American College of Surgeons (ACS) coordinated an NSQIP Private Sector initiative, building upon the VA-based historical work by Dr. Shukri Khuri’s team. The first step in this process was a feasibility study conducted in 1999 at three non-VA hospitals (University of Kentucky, University of Michigan, and Emory University) (Fink et al. 2002). Based on the initial success of this feasibility project, the NSQIP was expanded in 2001 to include 18 centers as part of a pilot project funded by the Agency for Healthcare Research and Quality (AHRQ) (Hall et al. 2009). Subsequently, the American College of Surgeons’ NSQIP pilot was expanded in 2004 to include other private hospitals’ reporting.

As background, the VA-based NSQIP had been documented to improve risk-adjusted mortality and morbidity across a diversity of surgical disciplines. For the period from 1991 to 2004, the surgical 30-day operative mortality rate improved by 31%, and the surgical 30-day perioperative major morbidity rate improved by 45% (Khuri 2005). During this time period, the VA NSQIP findings reported were deemed to be the “best in the nation” by the Institute of Medicine in 2003 for evaluating the quality of surgery across a broad range of surgical specialties (Khuri 2005). The “Patient Safety in Surgery” (PSS) study was initiated during 2001–2004 to evaluate the impact of a uniform quality improvement system and to compare VA and non-VA-based outcomes of care (where care-related details were gathered contemporaneously using a standardized set of data forms, definitions, and analyses). With nearly 185,000 surgical patient records gathered across 128 VA medical centers and 14 private sector hospitals, there were significant differences in the types of surgical procedures performed and patient baseline risk characteristics across the VA
versus non-VA hospitals. In spite of these differences in patient risk factors and procedures performed, the O/E ratios for 30-day operative death were remarkably similar between the VA and non-VA facilities (correlation coefficient = 0.98). Similar to the VA trends identified earlier, the non-VA private sector hospitals had an 8.7% decrease in major perioperative complications over the 3 years of the study, documenting an important and substantive quality improvement (Khuri et al. 2008).

The Agency for Healthcare Research and Quality provided a grant to Dr. Khuri’s team, based in part upon these promising findings, to evaluate the “Structures and Processes of Surgical Care Study” in late 2003 to relate the processes and structures of surgical care to postoperative risk-adjusted outcomes. For this NSQIP-based endeavor, surveys were sent out to the 123 VA sites and 14 private sector sites that participated in the Patient Safety in Surgery study earlier. The survey included many questions, but specifically asked for information as to the organization of the preoperative, intraoperative, and postoperative care services. Additionally, there was information gathered on hospital-specific surgical program-based characteristics such as surgical program size, surgeon-specific volumes at the VA and non-VA affiliates, patterns in surgical staffing ratios, the nature of the organizational structure, the use of local facility-based quality improvement efforts, the types of novel equipment/technology available (e.g., ultrasonography used in the operating room, the use of a harmonic scalpel, the use of radio-frequency ablation, or availability of ultrasound-guided aspiration devices), available information systems, the use of coordination/communication processes, as well as residency training program characteristics. The published results from the VA-based surveys (with responses sent back by the local Chiefs of Surgical Service) identified that there were tremendous variations in the processes and structures of general surgical care. As documented by the descriptive survey findings, the process and structure variables that appeared to be associated with risk-adjusted morbidity (14 variables) and risk-adjusted mortality (four variables) were preliminarily identified. Specifically, a higher O/E ratio (a potential marker for quality of metrics concerns) was found to be associated with several factors including anesthesia organized as a separate service, a larger number of operating rooms, more frequent reports of short staffing, and a higher rate for staff surgeons to be paid in part by the affiliated medical center. As a key process of care identified, changes in the anesthesia provider during the case (i.e., from across the pre-, intra-, and postoperative time periods) were associated with worse risk-adjusted mortality rates. A negative relationship between surgical volume (e.g., fewer cases per surgeon per month) and risk-adjusted morbidity (e.g., higher rates of perioperative complications) was identified. Overall, the self-reported survey findings for processes and structures of care appeared to be more strongly associated with the risk-adjusted morbidity rates observed, rather than risk-adjusted mortality rates documented. Importantly, the VA self-survey findings identified that a more integrated surgical service appeared to improve communication and coordination of surgical care, as well as the effectiveness of surgical team performance. Thus, these preliminary survey findings provided an impetus for the documentation of surgery-specific processes and structures of care, as well as the development of a more comprehensive set of quality metrics that are currently evaluated by NSQIP (Main et al. 2007).

In 2004, the Private Sector Study (conducted at 14 academic non-VA hospitals) was expanded and opened to other private sector hospitals. By 2008, the American College of Surgeons’ NSQIP market penetration for private hospitals included over 200 facilities with diverse characteristics located throughout the United States. The initial evaluation of the first 3 years (2005–2007) documented dramatic improvements in quality of surgical care rendered, with 66% of the hospitals documenting improved risk-adjusted mortality rates and 82% of the hospitals documenting improved risk-adjusted morbidity rates. In spite of the increasing patient risk characteristics reported (e.g., average patient age increased over time), the results were impressive, with 9,598 potential complications avoided at 183 private sector hospitals (Hall et al. 2009).
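
The observed-to-expected (O/E) ratio that recurs in this section (for the "high-outlier" infection analysis and for the VA versus non-VA mortality comparison) can be illustrated with a minimal sketch. It assumes that the expected count for a program is the sum of risk-model predicted probabilities for its patients, and it uses Byar's approximation to a Poisson confidence interval to judge whether the ratio exceeds 1.0. The source does not state which statistical test the MSTCVS-QC or NSQIP analyses used, so the interval method, the function names, and the example numbers below are illustrative assumptions only.

```python
import math


def oe_ratio(outcomes, expected_probs, z=1.96):
    """Return (O/E, lower, upper) for one program.

    outcomes:       0/1 indicator per patient (1 = event occurred)
    expected_probs: risk-model predicted probability per patient (assumed input)
    z:              normal quantile for the interval (1.96 for about 95%)
    """
    if len(outcomes) != len(expected_probs):
        raise ValueError("one predicted probability is needed per patient")
    observed = sum(outcomes)
    expected = sum(expected_probs)
    if expected <= 0:
        raise ValueError("expected count must be positive")

    # Byar's approximation to the Poisson limits for the observed count
    # (an assumed choice; other interval methods exist).
    if observed == 0:
        lower_count = 0.0
    else:
        lower_count = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3
    k = observed + 1
    upper_count = k * (1 - 1 / (9 * k) + z / (3 * math.sqrt(k))) ** 3

    return observed / expected, lower_count / expected, upper_count / expected


def is_high_outlier(outcomes, expected_probs):
    """Flag a program whose lower O/E confidence limit lies above 1.0."""
    _, lower, _ = oe_ratio(outcomes, expected_probs)
    return lower > 1.0


if __name__ == "__main__":
    # Hypothetical program: 200 patients, each with a 5% predicted risk.
    probs = [0.05] * 200
    many_events = [1] * 19 + [0] * 181   # 19 observed events
    few_events = [1] * 10 + [0] * 190    # 10 observed events
    print(oe_ratio(many_events, probs), is_high_outlier(many_events, probs))
    print(oe_ratio(few_events, probs), is_high_outlier(few_events, probs))
```

Under these assumptions, the hypothetical program with 19 observed events against roughly 10 expected is flagged as a high outlier, while the program with 10 observed events is not. A ratio near 1.0 indicates that outcomes match what the risk model predicts for that patient mix.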
Although many factors likely contributed to these important and positive changes, the use of a data-driven quality improvement initiative was identified as a major factor that appeared to lead to better outcomes, cost savings, as well as improvements in safety across patient subgroups (Maggard-Gibbons 2014). Several publications were coordinated evaluating the usefulness of different types of process and structural interventions. Reducing the rate of adverse clinical outcomes, the documented set of effective interventions included the use of protocols to manage postoperative blood glucose for diabetic patients, the use of venous thrombosis risk evaluations for high-risk patient subgroups, standardized approaches for wound care management, the use of physician order entry templates, the helpfulness of clinical pathways (e.g., a standardized approach to remove Foley catheters), enhanced tracking, and the use of more detailed patient tracking/monitoring tools for postoperative pulmonary management. Hence, changes in Medicare payment reforms were initiated to provide positive reimbursement incentives for surgeons and hospitals to participate in national quality improvement reporting endeavors such as ACS-NSQIP and the STS national database endeavors. Most importantly, the use of clinical databases developed by surgeons for surgeons’ use in self-assessment and self-improvement endeavors gained momentum, with clinician-leaders rising to the ranks of government organizations (e.g., Dr. Jeff Rich, a cardiothoracic surgeon taking on a top-level leadership role with the Centers for Medicare and Medicaid Services) to advance the science of quality measurement and management.

Implementation Challenges: Dilemmas Faced by Quality Measurement Projects

In evaluating the optimal quality metric or set of metrics to use for a project, researchers must consider many factors. The purpose of the project as well as the type of questions raised will direct which types of assessments are most important (e.g., process, structure, and/or outcomes). To evaluate outcomes, there must be a plausible conceptual relationship (if not actual data) that would identify any other quality of care factors that could be associated with the outcomes selected for evaluation.

Different clinical fields are at different stages of maturation in selecting the “best” quality metrics. For surgical services, it has been demonstrated that the use of processes, structures, and risk-adjusted outcomes (as a comprehensive set of quality metrics) would be the most appropriate to consider. In other fields (e.g., psychiatry), however, simply defining the frequency of a broad array of clinical outcomes (along with the variety of risk factors that may be related to these outcomes) may be a more appropriate starting place for a project.

A good outcomes assessment instrument should be:

• Valid (reflect variations in quality that are consistent with expectations)
• Reliable (have reproducible findings across multiple raters for similar assessments of quality of care)
• Timely (measure a sufficient time sequence to evaluate the impact of medical care provided)
• Sensitive to change (reflect changes associated with the care impacts provided)
• Feasible to implement (reasonable to capture given time and cost constraints)
• Clinically relevant (reflect “best practice” and be useful to guide clinical decisions and/or actions)

(MacDermid et al. 2009). Additionally, the accurate documentation of risk factors is critical to allow risk-adjusted outcomes for meaningful comparison across provider subgroups, facilities, or patient subgroups (Shahian et al. 2004).

Although many advancements have been made in identifying approaches to implement Dr. Donabedian’s triad for assessing quality of patient care, many challenges remain that cause difficulties in achieving these goals. Specifically, there are issues related to handling missing data (Hamilton et al. 2010; Parsons et al. 2011). Although different statistical approaches can be
used to address missing data challenges, the distribution of missing data is unlikely to be random. Based on the nature and distribution of the missing data, therefore, it may be appropriate to clinically substitute specific values. For example, substituting negative findings for missing complications may be appropriate, as the medical chart does not uniformly document complications that did not occur. Pending the need for a statistical imputation approach, there are ways to reduce uncertainty associated with imputation. Whatever the approach used, the assumptions and methodological details should be documented. Where possible, sensitivity analyses should be conducted to evaluate the impact of the different imputation approaches upon the study-specific findings (as well as potential decisions to be drawn from these findings) (Hamilton et al. 2010).

Another challenge that arises in quality of care assessments is differentiating between planned and unplanned processes or structures of care, as well as to what degree these processes were coordinated in response to interim outcomes. For example, Dr. Guyatt and his team conducted a systematic review and meta-analysis of the factors associated with unplanned readmission for randomized, controlled, clinical trials of heart failure interventions (Gwadry-Sridhar et al. 2004). They found that targeted heart failure patients who received an educational intervention experienced a significantly decreased rate of unplanned hospital readmissions. As part of their review and analyses, they identified that unplanned readmission (as an adverse process of care that occurred relatively infrequently following targeted heart failure interventions) was a potential quality of care metric that was clinically relevant to monitor. However, unplanned readmission for congestive heart failure patients who received targeted educational interventions did not correspond with a decrease in longer-term patient survival (in the 6 months to 1 year post-intervention period). Thus, appropriate treatments coordinated at the time of the unplanned readmission may have mitigated any adverse impact upon the longer-term survival end point. In summary, unplanned processes of care that occur may be related to interim outcomes and may not necessarily result in adverse longer-term outcomes.

Additional difficulties in evaluating quality of patient care may be related to the uncertainty in documenting the sequence and timing of events. As a case in point, the NSQIP database was used to evaluate the impact of the timing of major perioperative complications upon mortality. Interestingly, early wound infections resulted in a higher risk of mortality, in spite of adjusting for patient risk factors and other complication burdens. Somewhat surprisingly, the early occurrence of cardiac arrest or unplanned intubation was associated with lower risk of mortality after adjustment for other factors. However, late occurrence of pneumonia, acute myocardial infarction, or cerebrovascular accident was associated with higher risk of mortality (Wakeam et al. 2014). Although these study findings were preliminarily based on NSQIP database records, the timing and sequence of perioperative complications do appear to matter when identifying the interrelationships of different adverse events, such as complications and mortality.

Finally, there are many factors that impact patient longer-term outcomes including both medical events and nonmedical factors that occur after the main medical intervention studied. Specifically, the VA PSOCS study evaluated the factors that influenced 6-month mortality and 6-month health-related quality of life (Rumsfeld et al. 2001, 2004). The variations in the occurrence of interval events following post-CABG discharge, including both medical and nonmedical life events, were substantial. Similarly, Dr. Murphy and colleagues found that living alone following CABG surgery was a major risk factor for readmission, when such solitary patients were compared to those who were married or lived with others (Murphy et al. 2008).

Summary

In summary, the goal of improving quality of care is an elusive one. The end point may appear to be in sight but, like a distant horizon, it cannot be reached. Great achievements have been
accomplished in implementing Dr. Donabedian’s framework, particularly in the cardiac surgery and general surgery fields. This process for defining, measuring, and improving the quality of patient care is the mechanism that advances best practices and approaches optimum outcomes.

In a pluralistic society, the top priorities for quality of care initiatives are often difficult to ascertain. Clinical outcomes may not correspond with patients’ self-reported outcomes, and the demand for cost containment may conflict with both.

Future electronic medical record systems (with a greater proportion of encoded data elements) may provide enhanced information, and statistical data reduction techniques combined with more sophisticated risk modeling analyses may identify the details for the best practices to improve patient outcomes. The next generation of clinicians and scientists will advance the frontier, with multidisciplinary, collaborative investigative teams leading the way. Ultimately, the focus may be expanded beyond the simplistic avoidance of major adverse events to encompass more subtle aspects of healing and health.

Acknowledgments This book chapter was supported, in part, by the Offices of Research and Development at the Northport and the Eastern Colorado Health Care System, Department of Veterans Affairs Medical Centers, as well as by the Stony Brook University School of Medicine’s Department of Surgery and the Stony Brook University Health Science Center Library. Additionally, special thanks are extended to Ms. Sarah Miller (University of Colorado at Denver), Ms. Carol Wollenstein (Nursing Editor, Stony Brook University), and Ms. Jennifer Lyon (Reference Librarian, Stony Brook University) for their proofreading and editorial assistance.

References

Blackstone EH. Comparing apples and oranges. J Thorac Cardiovasc Surg. 2002;123(1):8–15.
Blumberg MS. Risk adjusting health care outcomes: a methodologic review. Med Care Rev. 1986;43(2):351–93.
Codman EA. The classic: the registry of bone sarcomas as an example of the end-result idea in hospital organization. 1924. Clin Orthop Relat Res. 2009;467(11):2766–70.
Donabedian A. Explorations in quality assessment and monitoring vol. 1. The definition of quality and approaches to its assessment. Ann Arbor: Health Administration Press; 1980.
Donabedian A. Criteria and standards for quality assessment and monitoring. QRB Qual Rev Bull. 1986;12(3):99–108.
Donabedian A. The quality of care. How can it be assessed? JAMA. 1988;260(12):1743–8.
Donabedian A, Wheeler JR, Wyszewianski L. Quality, cost, and health: an integrative model. Med Care. 1982;20(10):975–92.
Ferguson TB Jr, Hammill BG, Peterson ED, DeLong ER, Grover FL, S. T. S. N. D. Committee. A decade of change – risk profiles and outcomes for isolated coronary artery bypass grafting procedures, 1990–1999: a report from the STS National Database Committee and the Duke Clinical Research Institute. Society of Thoracic Surgeons. Ann Thorac Surg. 2002;73(2):480–9; discussion 489–90.
Grover FL, Hammermeister KE, Burchfiel C. Initial report of the Veterans Administration Preoperative Risk Assessment Study for Cardiac Surgery. Ann Thorac Surg. 1990;50(1):12–26; discussion 27–18.
Grover FL, Shroyer AL, Hammermeister K, Edwards FH, Ferguson Jr TB, Dziuban Jr SW, Cleveland Jr JC, Clark RE, McDonald G. A decade’s experience with quality improvement in cardiac surgery using the Veterans Affairs and Society of Thoracic Surgeons national databases. Ann Surg. 2001;234(4):464–72; discussion 472–464.
Khuri SF, Henderson WG, Daley J, Jonasson O, Jones RS, Campbell Jr DA, Fink AS, Mentzer Jr RM, Neumayer L, Hammermeister K, Mosca C, Healey N, S. Principal Investigators of the Patient Safety in Surgery. Successful implementation of the Department of Veterans Affairs’ National Surgical Quality Improvement Program in the private sector: the Patient Safety in Surgery study. Ann Surg. 2008;248(2):329–36.
O’Brien MM, Shroyer AL, Moritz TE, London MJ, Grunwald GK, Villanueva CB, Thottapurathu LG, MaWhinney S, Marshall G, McCarthy Jr M, Henderson WG, Sethi GK, Grover FL, Hammermeister KE, S. Va Cooperative Study Group on Processes and S. Outcomes of Care in Cardiac. Relationship between processes of care and coronary bypass operative mortality and morbidity. Med Care. 2004;42(1):59–70.
Shroyer AL, London MJ, VillaNueva CB, Sethi GK, Marshall G, Moritz TE, Henderson WG, McCarthy Jr MJ, Grover FL, Hammermeister KE. The processes, structures, and outcomes of care in cardiac surgery study protocol. Med Care. 1995;33(10 Suppl):OS17–25.
Shroyer AL, McDonald GO, Wagner BD, Johnson R, Schade LM, Bell MR, Grover FL. Improving quality of care in cardiac surgery: evaluating risk factors, processes of care, structures of care, and outcomes. Semin Cardiothorac Vasc Anesth. 2008;12(3):140–52.
Sobolev B, Fradet G. Delays for coronary artery bypass surgery: how long is too long? Expert Rev Pharmacoecon Outcomes Res. 2008;8(1):27–32.

Further Readings

Arsenault KA, Yusuf AM, Crystal E, Healey JS, Morillo CA, Nair GM, Whitlock RP. Interventions for preventing post-operative atrial fibrillation in patients undergoing heart surgery. Cochrane Database Syst Rev. 2013;1, CD003611.
Blackstone EH. Breaking down barriers: helpful breakthrough statistical methods you need to understand better. J Thorac Cardiovasc Surg. 2001;122(3):430–9.
Clark RE. The STS Cardiac Surgery National Database: an update. Ann Thorac Surg. 1995;59(6):1376–80; discussion 1380–1371.
Dowsey MM, Petterwood J, Lisik JP, Gunn J, Choong PF. Prospective analysis of rural–urban differences in demographic patterns and outcomes following total joint replacement. Aust J Rural Health. 2014;22(5):241–8.
Ferguson TB Jr, Coombs LP, Peterson ED. Internal thoracic artery grafting in the elderly patient undergoing coronary artery bypass grafting: room for process improvement? J Thorac Cardiovasc Surg. 2002;123(5):869–80.
Fihn SD, Gardin JM, Abrams J, Berra K, Blankenship JC, Dallas AP, Douglas PS, Foody JM, Gerber TC, Hinderliter AL, King 3rd SB, Kligfield PD, Krumholz HM, Kwong RY, Lim MJ, Linderbaum JA, Mack MJ, Munger MA, Prager RL, Sabik JF, Shaw LJ, Sikkema JD, Smith Jr CR, Smith Jr SC, Spertus JA, Williams SV, Anderson JL, F. American College of Cardiology Foundation/American Heart Association Task. 2012 ACCF/AHA/ACP/AATS/PCNA/SCAI/STS guideline for the diagnosis and management of patients with stable ischemic heart disease: a report of the American College of Cardiology Foundation/American Heart Association task force on practice guidelines, and the American College of Physicians, American Association for Thoracic Surgery, Preventive Cardiovascular Nurses Association, Society for Cardiovascular Angiography and Interventions, and Society of Thoracic Surgeons. Circulation. 2012;126(25):e354–471.
Fink AS, Campbell Jr DA, Mentzer Jr RM, Henderson WG, Daley J, Bannister J, Hur K, Khuri SF. The National Surgical Quality Improvement Program in non-veterans administration hospitals: initial demonstration of feasibility. Ann Surg. 2002;236(3):344–53; discussion 353–344.
Garbuz DS, Xu M, Duncan CP, Masri BA, Sobolev B. Delays worsen quality of life outcome of primary total hip arthroplasty. Clin Orthop Relat Res. 2006;447:79–84.
Goldman S, Zadina K, Moritz T, Ovitt T, Sethi G, Copeland JG, Thottapurathu L, Krasnicka B, Ellis N, Anderson RJ, Henderson W, V. A. C. S. Group. Long-term patency of saphenous vein and left internal mammary artery grafts after coronary artery bypass surgery: results from a Department of Veterans Affairs Cooperative Study. J Am Coll Cardiol. 2004;44(11):2149–56.
Gwadry-Sridhar FH, Flintoft V, Lee DS, Lee H, Guyatt GH. A systematic review and meta-analysis of studies comparing readmission rates and mortality rates in patients with heart failure. Arch Intern Med. 2004;164(21):2315–20.
Hall BL, Hamilton BH, Richards K, Bilimoria KY, Cohen ME, Ko CY. Does surgical quality improve in the American College of Surgeons National Surgical Quality Improvement Program: an evaluation of all participating hospitals. Ann Surg. 2009;250(3):363–76.
Hamilton BH, Ko CY, Richards K, Hall BL. Missing data in the American College of Surgeons National Surgical Quality Improvement Program are not missing at random: implications and potential impact on quality assessments. J Am Coll Surg. 2010;210(2):125–39, e122.
Head SJ, Howell NJ, Osnabrugge RL, Bridgewater B, Keogh BE, Kinsman R, Walton P, Gummert JF, Pagano D, Kappetein AP. The European Association for Cardio-Thoracic Surgery (EACTS) database: an introduction. Eur J Cardiothorac Surg. 2013;44(3):e175–80.
Hiratzka LF, Eagle KA, Liang L, Fonarow GC, LaBresh KA, Peterson ED, C. Get With the Guidelines Steering. Atherosclerosis secondary prevention performance measures after coronary bypass graft surgery compared with percutaneous catheter intervention and nonintervention patients in the Get With the Guidelines database. Circulation. 2007;116(11 Suppl):I207–12.
Itani KM. A celebration and remembrance. Am J Surg. 2009a;198(5 Suppl):S1–2.
Itani KM. Fifteen years of the National Surgical Quality Improvement Program in review. Am J Surg. 2009b;198(5 Suppl):S9–18.
Karthik S, Fabri BM. Left internal mammary artery usage in coronary artery bypass grafting: a measure of quality control. Ann R Coll Surg Engl. 2006;88(4):367–9.
Khuri SF. The NSQIP: a new frontier in surgery. Surgery. 2005;138(5):837–43.
Khuri SF, Daley J, Henderson W, Hur K, Demakis J, Aust JB, Chong V, Fabri PJ, Gibbs JO, Grover F, Hammermeister K, Irvin 3rd G, McDonald G, Passaro Jr E, Phillips L, Scamman F, Spencer J, Stremple JF. The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg. 1998;228(4):491–507.
Lee PJ, MacLennan A, Naughton NN, O’Reilly M. An analysis of reintubations from a quality assurance database of 152,000 cases. J Clin Anesth. 2003;15(8):575–81.
MacDermid JC, Grewal R, MacIntyre NJ. Using an National Institutes of Health. PROMIS: Patient Reported
evidence-based approach to measure outcomes in clin- Outcomes Measurement Information System. 2014.
ical practice. Hand Clin. 2009;25(1):97–111, vii. Retrieved 25 Oct 2014, from http://www.nihpromis.
Mack MJ. If this were my last speech, what would I say? org/
Ann Thorac Surg. 2012;94(4):1044–52. National Quality Forum. National voluntary consensus
Maggard-Gibbons M. The use of report cards and outcome standards for cardiac surgery. Washington, DC:
measurements to improve the safety of surgical care: National Quality Forum; 2004.
the American College of Surgeons National Surgical National Quality Forum. NQF patient safety terms and
Quality Improvement Program. BMJ Qual Saf. definitions. Washington, DC: National Quality Forum;
2014;23(7):589–99. 2009.
Magno G. The healing hand; man and wound in the ancient Nawal Lutfiyya M, Bhat DK, Gandhi SR, Nguyen C,
world. Cambridge, MA: Harvard University Press; 1975. Weidenbacher-Hoper VL, Lipsky MS. A compari-
Main DS, Henderson WG, Pratte K, Cavender TA, son of quality of care indicators in urban acute care
Schifftner TL, Kinney A, Stoner T, Steiner JF, Fink hospitals and rural critical access hospitals in the
AS, Khuri SF. Relationship of processes and structures United States. Int J Qual Health Care. 2007;19
of care in general surgery to postoperative outcomes: a (3):141–9.
descriptive analysis. J Am Coll Surg. 2007;204 Nielsen ME. The legacy of Ernest A. Codman in the 21st
(6):1157–65. century. J Urol. 2014;192(3):642–4.
Malenka DJ, O’Connor GT. The Northern New England Norcini JJ, Boulet JR, Dauphinee WD, Opalek A, Krantz
Cardiovascular Disease Study Group: a regional col- ID, Anderson ST. Evaluating the quality of care pro-
laborative effort for continuous quality improvement in vided by graduates of international medical schools.
cardiovascular disease. Jt Comm J Qual Improv. Health Aff (Millwood). 2010;29(8):1461–8.
1998;24(10):594–600. O’Brien SM, Shahian DM, DeLong ER, Normand SL,
Marshall G, Shroyer AL, Grover FL, Hammermeister Edwards FH, Ferraris VA, Haan CK, Rich JB, Shewan
KE. Time series monitors of outcomes. A new dimen- CM, Dokholyan RS, Anderson RP, Peterson
sion for measuring quality of care. Med Care. 1998;36 ED. Quality measurement in adult cardiac surgery:
(3):348–56. part 2 – statistical considerations in composite measure
Mavroudis C, Mavroudis CD, Jacobs JP, Siegel A, scoring and provider rating. Ann Thorac Surg. 2007;83
Pasquali SK, Hill KD, Jacobs ML. Procedure-based (4 Suppl):S13–26.
complications to guide informed consent: analysis of Osnabrugge RL, Speir AM, Head SJ, Jones PG,
society of thoracic surgeons-congenital heart surgery Ailawadi G, Fonner CE, Fonner E Jr, Kappetein AP,
database. Ann Thorac Surg. 2014;97(5):1838–49; dis- Rich JB. Cost, quality, and value in coronary artery
cussion 1849–51. bypass grafting. J Thorac Cardiovasc Surg.
Miller T, Leatherman S. The National Quality Forum: a 2014a ;148(6):2729-35.
‘me-too’ or a breakthrough in quality measurement Osnabrugge RL, Speir AM, Head SJ, Jones PG,
and reporting? Health Aff (Millwood). 1999;18 Ailawadi G, Fonner CE, Fonner Jr E, Kappetein AP,
(6):233–7. Rich JB. Prediction of costs and length of stay in
Milstein A, Galvin RS, Delbanco SF, Salber P, Buck Jr coronary artery bypass grafting. Ann Thorac Surg.
CR. Improving the safety of health care: the leapfrog 2014b;98(4):1286–93.
initiative. Eff Clin Pract. 2000;3(6):313–6. Ozker E, Saritas B, Vuran C, Yoruker U, Ulugol H, Turkoz
Murphy BM, Elliott PC, Le Grande MR, Higgins RO, R. Delayed sternal closure after pediatric cardiac oper-
Ernest CS, Goble AJ, Tatoulis J, Worcester ations; single center experience: a retrospective study. J
MU. Living alone predicts 30-day hospital readmission Cardiothorac Surg. 2012;7:102.
after coronary artery bypass graft surgery. Eur J Paone G, Brewer R, Likosky DS, Theurer PF, Bell GF,
Cardiovasc Prev Rehabil. 2008;15(2):210–5. Cogan CM, Prager RL, T. Membership of the Michigan
Nashef SA, Roques F, Michel P, Gauducheau E, Society of and S. Cardiovascular. Transfusion rate as a
Lemeshow S, Salamon R. European system for cardiac quality metric: is blood conservation a learnable skill?
operative risk evaluation (EuroSCORE). Eur J Ann Thorac Surg. 2013;96(4):1279–86.
Cardiothorac Surg. 1999;16(1):9–13. Papanikolaou PN, Christidi GD, Ioannidis JP. Patient out-
Nashef SA, Roques F, Sharples LD, Nilsson J, Smith C, comes with teaching versus nonteaching healthcare: a
Goldstone AR, Lockowandt U. EuroSCORE II. Eur J systematic review. PLoS Med. 2006;3(9), e341.
Cardiothorac Surg. 2012;41(4):734–44; discussion Parsons HM, Henderson WG, Ziegenfuss JY, Davern M,
744–735. Al-Refaie WB. Missing data and interpretation of can-
National Committee for Quality Assurance. About NCQA. cer surgery outcomes at the American College of Sur-
2014a. Retrieved 21 Nov 2014, from http://www.ncqa. geons National Surgical Quality Improvement
org/AboutNCQA.aspx Program. J Am Coll Surg. 2011;213(3):379–91.
National Committee for Quality Assurance. Persistence of Public Law 99–166. Veterans’ Administration Health-Care
beta-blocker treatment after a heart attack. 2014b. Amendments of 1985. Public Law. 1985;99–166.
Retrieved 21 Nov 2014, from http://www.ncqa.org/ Puskas JD, Kilgo PD, Thourani VH, Lattouf OM, Chen E,
ReportCards/HealthPlans/StateofHealthCareQuality/ Vega JD, Cooper W, Guyton RA, Halkos M. The soci-
2014TableofContents/BetaBlockers.aspx ety of thoracic surgeons 30-day predicted risk of
mortality score also predicts long-term survival. Ann The Society of Thoracic Surgeons. STS CABG Composite
Thorac Surg. 2012;93(1):26–33; discussion 33–25. Score. 2014b. Retrieved 30 Oct 2014, from http://www.
Rodkey GV, Itani KM. Evaluation of healthcare quality: a sts.org/sts-public-reporting-online/cabg-composite-
tale of three giants. Am J Surg. 2009;198(5 Suppl):S3–8. score
Rumsfeld JS, Magid DJ, O’Brien M, McCarthy Jr M, The Society of Thoracic Surgeons. STS National Database.
MaWhinney S, Scd ALS, Moritz TE, Henderson WG, 2014c. Retrieved 30 Oct 2014, from http://www.sts.
Sethi GK, Grover FL, Hammermeister KE, org/national-database
S. Department of Veterans Affairs Cooperative Study Tran C, Wijeysundera HC, Qui F, Tu JV, Bhatia
in Health Services: Processes and S. Outcomes of Care RS. Comparing the ambulatory care and outcomes for
in Cardiac. Changes in health-related quality of life rural and urban patients with chronic ischemic heart
following coronary artery bypass graft surgery. Ann disease: a population-based cohort study. Circ
Thorac Surg. 2001;72(6):2026–32. Cardiovasc Qual Outcomes. 2014;7(6):8, 35–43.
Rumsfeld JS, Ho PM, Magid DJ, McCarthy Jr M, Shroyer Tricoci P, Allen JM, Kramer JM, Califf RM, Smith Jr
AL, MaWhinney S, Grover FL, Hammermeister SC. Scientific evidence underlying the ACC/AHA
KE. Predictors of health-related quality of life after clinical practice guidelines. JAMA. 2009;301
coronary artery bypass surgery. Ann Thorac Surg. (8):831–41.
2004;77(5):1508–13. van Kasteren ME, Mannien J, Ott A, Kullberg BJ, de Boer
Shahian DM, Blackstone EH, Edwards FH, Grover FL, AS, Gyssens IC. Antibiotic prophylaxis and the risk of
Grunkemeier GL, Naftel DC, Nashef SA, Nugent WC, surgical site infections following total hip arthroplasty:
Peterson ED, S. T. S. w. o. e.-b. surgery. Cardiac sur- timely administration is the most important factor. Clin
gery risk models: a position article. Ann Thorac Surg. Infect Dis. 2007;44(7):921–7.
2004;78(5):1868–77. Veterans Health Administration. VHA handbook 1102.3:
Shann KG, Giacomuzzi CR, Harness L, Myers GJ, Paugh criteria and standards for cardiac surgery programs.
TA, Mellas N, Groom RC, Gomez D, Thuys CA, Washington, DC: Veterans Health Administration;
Charette K, Ojito JW, Tinius-Juliani J, Calaritis C, 2008.
McRobb CM, Parpard M, Chancy T, Bacha E, Cooper Veterans Health Administration and Department of
DS, Jacobs JP, Likosky DS. Complications relating to Defense. VA/DoD clinical practice guideline for the
perfusion and extracorporeal circulation associated with management of ischemic heart disease. Washington,
the treatment of patients with congenital cardiac disease: DC: Veterans Health Administration, Department of
consensus definitions from the Multi-Societal Database Defense; 1997.
Committee for Pediatric and Congenital Heart Disease. Veterans Health Administration CARE-GUIDE Working
Cardiol Young. 2008;18 Suppl 2:206–14. Group, Denver VA Medical Center CARE-GUIDE
Sharp LK, Bashook PG, Lipsky MS, Horowitz SD, Miller Coordinating Team and United States Veterans Health
SH. Specialty board certification and clinical outcomes: Administration Office of Quality Management, Denver
the missing link. Acad Med. 2002;77(6):534–42. VA Medical Center CARE-GUIDE Coordinating Team,
Shih T, Zhang M, Kommareddi M, Boeve TJ, Harrington United States Veterans Health Administration Office of
SD, Holmes RJ, Roth G, Theurer PF, Prager RL, Likosky Quality Management. Veterans Health Administration
DS, T. Michigan Society of and C. Cardiovascular Sur- CARE-GUIDE for ischemic heart disease. Washington,
geons Quality. Center-level variation in infection rates DC: Department of Veterans Affairs; 1996.
after coronary artery bypass grafting. Circ Cardiovasc VillaNueva CB, Ludwig ST, Shroyer AL, Deegan NI,
Qual Outcomes. 2014;7(4):567–73. Steeger JE, London MJ, Sethi GK, Grover FL,
Sobolev B, Mercer D, Brown P, FitzGerald M, Jalink D, Hammermeister KE. Variations in the processes and
Shaw R. Risk of emergency admission while awaiting structures of cardiac surgery nursing care. Med Care.
elective cholecystectomy. CMAJ. 2003;169(7):662–5. 1995;33(10 Suppl):OS59–65.
The Joint Commission. Ernest Amory Codman Award. Wakeam E, Hyder JA, Tsai TC, Lipsitz SR, Orgill DP,
2014. Retrieved 23 Oct 2014, from http://www. Finlayson SR. Complication timing and association
jointcommission.org/codman.aspx with mortality in the American College of Surgeons’
The Society of Thoracic Surgeons. Consumer Reports and National Surgical Quality Improvement Program data-
STS Public Reporting. 2012. Retrieved 27 Oct 2014, base. J Surg Res. 2014,193(1):77–87.
from http://www.sts.org/news/consumer-reports-and- Winkley Shroyer AL, Bakaeen F, Shahian DM, Carr BM,
sts-public-reporting Prager RL, Jacobs JP, Ferraris V, Edwards F, Grover FL.
The Society of Thoracic Surgeons. Adult Cardiac Anes- The society of thoracic surgeons adult cardiac surgery
thesia Module. 2013. Retrieved 30 Oct 2014, from database: the driving force for improvement in cardiac
http://www.sts.org/sts-national-database/adult-cardiac- surgery. Semin Thorac Cardiovasc Surg. 2015 Sum-
anesthesia-module mer;27(2):144–51. PubMed PMID: 26686440.
The Society of Thoracic Surgeons. NQF# 0696: The STS Yasa H, Lafci B, Yilik L, Bademci M, Sahin A, Kestelli M,
CABG Composite Score. NQF: Quality Positioning Yesil M, Gurbuz A. Delayed sternal closure: an effec-
System. 2014a. Retrieved 30 Oct 2014, from http:// tive procedure for life-saving in open-heart surgery.
www.qualityforum.org/QPS Anadolu Kardiyol Derg. 2010;10(2):163–7.
7 Health Services Information: Data-Driven Improvements in Surgical Quality: Structure, Process, and Outcomes

Katia Noyes, Fergal J. Fleming, James C. Iannuzzi, and John R. T. Monson
Contents
Introduction
Stakeholders for Surgical Outcome Assessment
Types of Data for Surgical Outcome Assessment
  Existing Data Sources
  Data Quality
  Changes in Surgical Procedures and Practices Over Time
  Individual Surgeon Variation (Preferences, Techniques, and Skills)
  Timing of Complications
  Limited Information on Socioeconomic Drivers of Health
  Need for Linked Data
  Data Management and Big Data
Structure-Process-Outcome Assessment in Surgery
  Theoretical Framework of Quality Assessment in Healthcare
  Structure
  Process
  Surgical Outcomes
  Risk Adjustment
From Data to Quality Improvement
  Understanding Hospital Billing Data
  Focusing on Modifiable Factors
  Identifying Actionable Goals
  Presenting Results
References

K. Noyes (*)
Department of Surgery, University of Rochester Medical Center, Rochester, NY, USA
e-mail: katia_noyes@urmc.rochester.edu

F. J. Fleming · J. C. Iannuzzi
University of Rochester Medical Center, Rochester, NY, USA

J. R. T. Monson
Florida Hospital System Center for Colon and Rectal Surgery, Florida Hospital Medical Group Professor of Surgery, University of Central Florida, College of Medicine, Florida Hospital, Orlando, FL, USA
e-mail: john.monson.md@flhosp.org

https://doi.org/10.1007/978-1-4939-8715-3_8
Abstract

The barriers to surgical quality improvement in the United States are significant. The fee-for-service reimbursement approach does not encourage provider communication and drives volume, not value. Quality report cards and pay-for-performance strategies have been implemented to reflect the performance of individual providers at specific healthcare settings, but they have not been very effective at enforcing continuity of care and integration. In this chapter we describe how, using the Donabedian approach to quality assessment, one can develop reliable and useful quality indicators for surgical services. We review the main sources of relevant data and discuss the practical implications of working with each of the databases. Finally, we provide an overview of current knowledge gaps and challenges affecting surgical care delivery and provide recommendations for future research and quality improvement interventions.

Introduction

Quality assessment and public reporting are powerful approaches to improving quality of care, whether in preventive services, acute surgical care, or chronic illness management. We can learn a lot from the 20 years of experience with coronary artery bypass grafting (CABG) surgery report cards (Hannan et al. 2012). It is also widely recognized that the chief factor in the success of the cardiac surgery report cards is the development of the New York State (and then national) coronary angioplasty reporting system to ensure collection of high-quality clinical data, including data elements not routinely available from administrative databases. Establishment of the Percutaneous Coronary Interventions Reporting System (PCIRS) in 1992 allowed for development and ongoing calibration of the cardiac surgery risk-adjusted mortality model, which in turn provides meaningful reports that local practices can use to compare their performance with similar groups and national benchmarks, without the fear of being penalized for treating high-risk patients. In the last 20 years, largely due to the publicly available CABG report cards, the outcomes of CABG and other cardiac surgical procedures have improved dramatically (Mukamel and Mushlin 1998; Hannan et al. 1994, 1995).

In recent decades, as life expectancy has continued to grow around the world, the illness profile of highly populated countries in the Middle East and Asia has undergone an epidemiologic transition from predominantly infectious diseases to primarily chronic illness, vastly expanding the role and importance of surgical services. Surgical procedures that were previously extremely rare as well as “simple, ineffective, and relatively safe” became common, “complex, effective, and potentially dangerous” (Chantler 1999). On average, an American patient is expected to undergo about 10 surgical procedures in a lifetime, and an estimated 234 million operations are performed annually worldwide (Weiser et al. 2008; Lee et al. 2008). While surgery can be extremely beneficial, often saving lives, surgical procedures are also associated with the risk of complications, infection, and death. Furthermore, surgical interventions are the key treatment modalities for many prevalent conditions including cancer, trauma, and obstetrics, positioning surgical quality and safety as one of the top public health concerns.

Public worry and focus on medical outcomes is entirely warranted. The Institute of Medicine (IOM) in the landmark 1999 patient safety report
“To Err is Human” concluded that healthcare in the United States is not as safe as it should be. One of the report’s main revolutionary conclusions was that the majority of medical errors in the United States did not result from individual recklessness. More commonly, errors are caused by faulty systems, processes, and underlying conditions that lead people to either make mistakes or fail to prevent them. The report advocated reducing harm through system-based initiatives rather than increasing pressure on individual providers (Brown and Patterson 2001). A focus on surgical outcomes is thus all the more important, because in surgery any small slip can quickly lead to disastrous consequences.

While the IOM report led to some system-level improvements, including expansion of health insurance coverage through the Patient Protection and Affordable Care Act (PPACA) in 2010, many problems remained or even worsened. In 2013, the IOM convened a committee of experts to examine the quality of cancer care in the United States and formulate recommendations for improvement. Delivering High-Quality Cancer Care: Charting a New Course for a System in Crisis presented the committee’s findings and recommendations. The committee concluded that the cancer care delivery system is in crisis due to a growing demand for cancer care, increasing treatment complexity (including surgical procedures), a shrinking workforce, and rising costs (Levit et al. 2013).

While it is widely recognized and accepted that assessment of surgical quality and outcomes should be a continuous process alongside care delivery, there is no clear consensus on how, when, and what outcomes should be measured. The problem is fueled by the fact that the definition of quality changes depending on the stakeholder’s perspective. For instance, surgeons evaluate each other’s quality based on technical skills, board certifications, and morbidity that is under their perceived direct control, characteristics that are often invisible and hence meaningless to patients. Instead, patients prefer clinicians with excellent communication skills who are always on time, regardless of whether or not the surgeon is a board-certified Fellow of the American College of Surgeons (FACS). Similar discrepancies and misalignments can be observed with respect to surgical outcomes. The vast majority of surgical oncologists will consider clean margins as synonymous with being “cured of cancer,” despite the fact that a patient may still have to endure many months of exhausting and toxic chemotherapy and radiation, temporary or permanent colostomy, fatigue, depression, and undesirable cosmetic changes. Successful quality improvement in clinical practice requires a common vision, multidisciplinary plans, and cooperation among all involved stakeholders, across the spectrum of all clinical providers including healthcare administrators, payers, social services, community organizations, and patient advocates.

Hurtado et al. (2001) define quality as “the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge,” but such broad definitions can have limited direct applications. A more useful definition of quality measures it over six domains: effectiveness, timely access, capacity, safety, patient centeredness, and equity (Leatherman and Sutherland 2003). Within each of these domains, it is possible to measure various elements, and so from this paradigm, a picture of a service’s quality of care can be outlined. However, such comprehensive assessment can be too burdensome and thus not practical for frequent monitoring and real-time evaluation.

In addition, there have been significant efforts to identify and assess important elements of care pathways, rather than individual procedures, which may lead to better outcomes and higher quality (Donabedian 1966; Hurtado et al. 2001; Maxwell 1984; Schiff and Rucker 2001; Sitzia and Wood 1997). Many countries have made significant progress with the implementation of national quality programs (Department of Health Office 1995; Department of Health 2000) including NSQIP (Agency for Healthcare Research and Quality 2009; Australian Commission on Safety and Quality in Healthcare 2008; American College of Surgeons 2014a), but further research is required to accurately and affordably improve assessments of surgical quality.
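The risk-adjusted comparison against benchmarks described earlier in this introduction is often summarized as an observed-to-expected (O/E) mortality ratio, the benchmarking measure listed for ACS-NSQIP in Table 1. The following is a minimal sketch in Python, not the New York State or NSQIP model itself. It assumes that a separate risk model has already assigned each case a predicted probability of death. The class and function names, the case mix, and the 3% benchmark rate are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SurgicalCase:
    """One operation; both fields are illustrative, not taken from any registry."""
    predicted_risk: float  # probability of death from an assumed, pre-fitted risk model
    died: bool             # observed outcome


def observed_to_expected(cases: Iterable[SurgicalCase]) -> float:
    """Ratio of observed deaths to the sum of predicted risks (expected deaths)."""
    cases = list(cases)
    observed = sum(1 for c in cases if c.died)
    expected = sum(c.predicted_risk for c in cases)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed / expected


def risk_adjusted_rate(cases: Iterable[SurgicalCase], benchmark_rate: float) -> float:
    """Scale a benchmark mortality rate by the provider's O/E ratio."""
    return observed_to_expected(cases) * benchmark_rate


# Hypothetical hospital: 40 low-risk cases and 10 high-risk cases, 2 deaths.
hospital_cases = (
    [SurgicalCase(0.02, False)] * 40
    + [SurgicalCase(0.10, False)] * 8
    + [SurgicalCase(0.10, True)] * 2
)

oe_ratio = observed_to_expected(hospital_cases)
adjusted = risk_adjusted_rate(hospital_cases, benchmark_rate=0.03)
print(f"O/E ratio: {oe_ratio:.2f}")
print(f"Risk-adjusted mortality: {adjusted:.1%}")
```

In this made-up example the crude mortality is 4% (2 of 50 cases), but 1.8 deaths were expected given the case mix, so the O/E ratio is about 1.11 and the risk-adjusted rate is about 3.3%. This illustrates why a provider treating many high-risk patients is not automatically penalized under risk adjustment.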
Stakeholders for Surgical Outcome Assessment

There are many stakeholders that actively participate in surgical quality initiatives. When there is common purpose between these groups, progress can easily be made; however, agendas often do not align, making advancement difficult. Understanding the key stakeholders, their perspectives, and their roles is fundamental to quality improvement.

Medical societies and professional groups have long been the leaders in developing clinical practice guidelines, supporting provider accreditation, and both auditing and providing clinical training as well as continuing medical education activities. While heavily dominated by surgeons, the field of surgical outcome assessment also includes medical and radiation oncologists, imaging scientists, primary care providers, other advanced care partners, and allied health professionals. These societies and groups include, but are not limited to, the American College of Surgeons (ACS), the Commission on Cancer (CoC), the Consortium for Optimizing Surgical Treatment of Rectal Cancer (OSTRiCh), American Society of Colon and Rectal Surgeons (ASCRS), Society for Surgery of the Alimentary Tract (SSAT), Society of Surgical Oncology, and others (American College of Surgeons 2014b, c; Optimizing the Surgical Treatment of Rectal Cancer 2014; Society for Surgery of the Alimentary Tract 2016; Society for Surgical Oncology 2014).

The provider stakeholder structure can take many forms and can work at every level of the healthcare system. For instance, the American College of Surgeons represents an umbrella organization that pushes an overarching quality agenda. Its purpose is to be broad, as the organization spans multiple disciplines. While the ACS's activities include lobbying initiatives in Congress, it has also recently employed benchmarking for hospitals and now individual providers through data collection and risk adjustment. Other broad organizations, such as the National Comprehensive Cancer Network (NCCN), release specific consensus guidelines aimed at improving care through utilizing the best available evidence. Other societies with a narrower focus also contribute to determining guidelines aimed at standardizing care for specific biologic systems, as demonstrated by the American Society of Colon and Rectal Surgeons, which releases guidelines about colon screening recommendations, prophylaxis, and other elements of cancer care. There are also disease-specific groups such as the Consortium for Optimizing Surgical Treatment of Rectal Cancer (OSTRiCh), or regional groups such as the Upstate New York Surgical Quality Initiative (UNYSQI), which currently focuses on improving the quality of colon resections. Quality improvement also occurs at the hospital and surgical division level, aimed at more specific interventions such as thromboprophylaxis protocols or surgical site infection prevention bundles that are more applicable to single providers or individual hospital systems. This hierarchical structure, however, is not partitioned or independent: there is extensive overlap between organizations, societies, disease-specific coalitions, and locoregional initiatives. Collaborations between all groups can propel initiatives; however, their recommendations are not always aligned with one another, and nuanced differences can create confusion and potentially hinder quality improvement efforts.

In the current environment post-PPACA, accountable care organizations are frequently the key drivers of clinical quality improvement. This is because, according to the Triple Aim principle developed by Don Berwick and the Institute for Healthcare Improvement (IHI), high-quality care overall is less expensive than poor care. Accountable Health Partners LLC (AHP) is one of the accountable care organizations in the Greater Rochester area. It was organized to create a partnership between the University of Rochester Medical Center (URMC) and community physicians, to enable them to succeed in the looming era of value-based contracts through creative initiatives to deliver high-quality care at a lower cost. The goals and interests of the AHP are parallel to those of PPACA: to engage specialty providers in the delivery of integrated care pathways; to establish efficient communication between care managers in medical homes, primary care, and specialist practices; to develop an integrated information system capable of monitoring quality of care measures; and to
develop a payment mechanism to facilitate such Medicaid eligible individuals through its
engagement. partnering organization, Monroe Plan. Over the
Other community-based stakeholders may years, Excellus partnered with many other com-
include medical societies, public health and safety munity stakeholders (e.g., Kodak, MCMS,
providers and agencies, social and aging services, URMC) to lead several area-wide initiatives
and educational organizations. Stakeholders out- aimed to improve quality of care and population
side of the healthcare system and non-for-profit health and reduce necessary variation in care and
world may include patient support groups and services overuse.
organizations, payers, large self-insured corpora-
tions, and business alliances who are also inter-
ested in improving overall community health at a Types of Data for Surgical Outcome
lower cost (Blackburn 1983; Brownson et al. Assessment
1996; Group 1991; Fawcett et al. 1997; Goodman
et al. 1995; Howell et al. 1998; Johnston et al. Existing Data Sources
1996; Mayer et al. 1998; Zapka et al. 1992;
Roussos and Fawcett 2000). In Upstate There are multiple types of medical data available,
New York, the Greater Rochester and Finger and each have their own set of complexities that
Lakes regions are well recognized for their long while answering important questions also leave
history of community-wide collaborations includ- gaps that require further analysis from alternative
ing University of Rochester Medical Center, Fin- perspectives found through other data sources.
ger Lakes Health Systems Agency (FLHSA), Typical datasets are comprised of the following:
Monroe County Medical Society (MCMS), Roch- hospital discharges, claims, registry, and survey
ester Business Alliance, Rochester regional office results. Other administrative types of data include
of American Cancer Society (2014), local payers hospital discharge data or billing data as recorded
(e.g., Excellus Blue Cross Blue Shield), account- and provided by the hospital itself. These datasets
able care organizations, and others. The FLHSA is are highly dependent on local practices and can
an independent community health planning orga- vary between institutions. It can be linked with
nization working collaboratively with multi- other subject data providing an in-depth chart
stakeholder groups to improve healthcare quality review; however, it is limited by the cases
and access and eliminate healthcare disparities in performed at an individual hospital. Some states
the nine-county Finger Lakes region. Its mission have statewide discharge census data, including
is to bring into focus community health issues via California and New York (Hannan et al. 1994,
data analysis and community engagement and to 1995, 2012, CA Society of Thoracic Surgeons
implement solutions through community collabo- 2014). These datasets provide billing data at a
ration and partnership. It has become the convener larger level, which includes ICD-9 codes by diag-
and facilitator of multi-stakeholder community nosis, with the ability to track hospital and sur-
initiatives to measure and improve the health, geon level variation, subject linking
healthcare, and cost of care. In the initial round longitudinally across in-state and charges (in con-
of the CMMI Innovation Challenge, the FLSHA trast to claims paid out) (Table 1).
was awarded with a $26.6 million initiative Claims data are available at a national as well
“Transforming the Delivery of Primary Care: A as local levels and include Medicare data that can
Community Partnership.” be linked to other datasets and insurance claims
Excellus Blue Cross Blue Shield is a nonprofit (i.e., Excellus-blue shield, large self-insured cor-
health plan, whose mission is to work collabora- porations (Xerox, Kodak), and data warehouses
tively with local hospitals, doctors, employers, (Thompson Reuters)). Registry data can be quite
and community leaders to offer affordable detailed, albeit specific to the registry’s purpose.
healthcare products. For instance, Excellus Examples of registry datasets include tumor reg-
administers its managed care products for istries like SEER that can be linked to Medicare
146 K. Noyes et al.
Table 1 Types of data used to assess surgical outcomes, quality, and safety
Types of data | Databases | Examples
Cancer registry | SEER, NCDB | (Mack et al. 2013; Rutter et al. 2013)
Hospital registry | Case series | (Sinclair et al. 2012; Aquina et al. 2014b)
Observational | SPARCS, statewide data, Medicare/Medicaid, UHC | (Rickles et al. 2013; Aquina et al. 2014a)
Randomized controlled trials | CEA/CAS (NASCET), colonoscopy trial, breast cancer Z0011 | (Ferguson et al. 1999; Grube and Giuliano 2001; Whitlock et al. 2008; Atkin 2003)
Cost data | PharMetrics, hospital billing, Medicare charges, Tufts Cost-Effectiveness Registry | (Iannuzzi et al. 2014b; Jensen et al. 2012; Tufts 2014)
Process measures | SCIP, WHO Surgical checklist, inpatient smoking, VTE prophylaxis | (The Joint Commission Core Measure Sets 2014a; American College of Surgeons, Commission on Cancer, Surgical Care Improvement Project 2014b; Safety 2008)
Satisfaction | HCAHPS, Press Ganey | (Systems 2014; Press Ganey Associates 2014)
Benchmarking | ACS-NSQIP observed to expected mortality ratio (United States, thoracic, transplant; United Kingdom, all surgeons), hospital compare, creating centers of excellence (Medicaid Centers of Excellence for breast cancer) | (Centers for Medicare and Medicaid Services 2014; Department of Health 2000; Cohen et al. 2009a, b; Medicare.gov 2014)
 | AMA provider survey | (Etzioni et al. 2010, 2014)
 | AHA (ICU/staffing/nursing) | (Nallamothu et al. 2006; Solomon et al. 2002)
SEER Surveillance, Epidemiology, and End Results Tumor Registry; NCDB National Cancer Data Base; SPARCS New York Statewide Planning and Research Cooperative System; UHC University HealthSystem Consortium; SCIP Surgical Care Improvement Project; VTE venous thromboembolism; HCAHPS Hospital Consumer Assessment of Healthcare Providers and Systems; ACS-NSQIP American College of Surgeons National Surgical Quality Improvement Program; CMS Centers for Medicare and Medicaid Services; AHA American Hospital Association; AMA American Medical Association; ICU intensive care unit
for more robust analysis, NCDB that expands cancer data beyond the identified cancer centers that are included within SEER, and the National Surgical Quality Improvement Program (NSQIP) registry that samples approximately 20% of all cases performed at participating hospitals. Other registries include those maintained by provider organizations (AMA, AHA).

Finally, survey data can provide the patient perspective that is lacking from other large dataset analyses. Two prime examples are the Medicare Current Beneficiary Survey and the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Survey.

The first database for surgical outcomes was developed in NYS for cardiothoracic surgery (Hannan et al. 1990), leading to substantial quality improvement, facilitating development of the field of quality assessment and risk adjustment in medicine, and challenging the traditional approach of confidential reporting of adverse events. Based on its success, this was expanded to the STS National Database established in 1989. The STS states that “physicians are in the best position to measure clinical performance accurately and objectively” (Surgeons 2014), serving as a mandate for surgeon participation in these initiatives.

While cardiac surgery has long maintained a similar database for tracking quality, this approach was expanded nationally to help improve surgical outcomes. The National Surgical Quality Improvement Program (NSQIP) has been a major development within the surgical community as it provides more detailed surgical information at a national level than was ever previously available. The main purpose of this program was to improve quality through benchmarking, where hospitals were given risk-adjusted data comparing outcomes nationally to other hospitals of similar size. Based on the depth of data, numerous research studies have been conducted, describing surgical risk factors and comparing operative approaches. While this has been very useful for expanding our understanding of surgical quality as a whole, it was quickly realized that different operations needed specific in-depth data in order to design meaningful quality improvement strategies. One approach to providing more detailed data has been the rollout of procedure-targeted variables, which institutions can add to the traditional NSQIP data for additional cost. This allows for a more detailed approach to individual procedures. It was first made available with the release of the 2012 NSQIP dataset, and the impact remains to be seen. Targeted variables have required consensus from experts, which can be difficult to obtain and limited in scope. This in-depth approach also requires more resources, limiting participation.

Another specialty-specific approach includes the Organ Procurement and Transplantation Network (OPTN) database aimed at monitoring transplant programs nationally. This is monitored and run by the US Department of Health and Human Services (National Cancer Institute 2014). The desire for more detailed data has led to a number of subspecialty datasets modeled after NSQIP. A few examples include a vascular surgery-specific dataset, the Vascular Quality Initiative (2014), Pediatric NSQIP, and an endocrine surgery-specific dataset (Collaborative Endocrine Surgery Quality Improvement Collective 2014). The methods of data collection vary: NSQIP employs a clinical nurse reviewer, whereas CESQIP does not yet have the same infrastructure, requiring the surgeon or the surgeon’s designee to input data.

Another approach has been the creation of regional collaboratives, which require a high level of collaboration between academic and nonteaching hospitals alike. Regional collaboratives will likely play a role in decreasing unnecessary variability and tracking quality at a more manageable, regional level, where it is easier to implement change than at the national level. Thus far, the regional approach has been seen in both Michigan and Central New York. The central New York collaborative, called UNYSQI (Upstate New York Surgical Quality Initiative), has focused predominantly on colorectal surgery and more specifically on addressing the question of readmissions. NSQIP allows for 40 additional variables, and given this narrow limitation, specific questions must be addressed.

Participation in data collection programs is promoted as it meets criteria for both maintenance of certification (MOC) and the Physician Quality Reporting System (PQRS) as part of CMS (EHealth University: Centers for Medicare & Medicaid Services 2014). This section for maintaining credentials requires that providers evaluate their performance based upon specialty-established requirements, which must include national benchmarking. The MOC outlines six core competencies, one of which is practice-based learning and improvement. Part IV of the process for continuous learning includes practice performance assessment. For the American Board of Surgery, diplomates must participate in a national, regional, or local surgical outcome database or quality assessment program. The PQRS is a part of CMS and is the second specific incentive promoting the use of outcome data collection programs, as it uses both payment adjustments as penalties and incentive payments to ensure that providers report quality data (Table 2).

Data Quality

A common saying in large database analysis is “garbage in garbage out,” and while there are methods to account for missing data, a major limitation remains with extensive missing data points. One approach might be to limit case inclusion to only those with a full set of data; however, this quickly limits patient inclusion. This approach may be appropriate for some major data points such as sex, where it can be assumed that if subject sex is not included then other variables are likely to be of questionable quality. Missing data may also be secondary to the data collection process. For instance, in NSQIP, preoperative laboratory values are gathered; however,
Table 2 Databases and outcomes used to assess surgical outcomes, quality, and safety
Dataset | Description | Sample and outcomes
ACS-NSQIP http://site.acsnsqip.org/ | Maintained by the American College of Surgeons. Participation through annual fees by hospital | 30-day data based on postoperative outcomes. Provides benchmarking
Pediatric NSQIP http://www.pediatric.acsnsqip.org/ | Subset of overall NSQIP | 30-day follow-up for surgical procedures performed on pediatric patients
VQI (Vascular Quality Initiative) www.vascularqualityinitiative.org | Vascular procedure-specific data (including those performed by radiologists, cardiologists, and vascular surgeons). Follow-up through 1 year. Governed by the Society for Vascular Surgery (SVS) Patient Safety Organization | 255 participating centers. Uses cloud computing to allow multiple users to enter data and does not depend on a full-time data entry specialist. Can be integrated into electronic medical records
CESQIP (Collaborative Endocrine Surgery Quality Improvement Program) http://cesqip.org/ | Since 2012, through the American Association of Endocrine Surgeons (AAES) | Patient-centered data collection, ongoing performance feedback to clinicians, and improvement based on analysis of collected data and collaborative learning
STS National Database http://www.sts.org/national-database | Society of Thoracic Surgeons run program that makes quality scores available to institutions and the public at large. National data for research requires specific application to the STS and is not released to participating hospitals by virtue of inclusion in data gathering | Focuses on three areas: adult cardiac, general thoracic, and congenital heart surgery
The Surveillance, Epidemiology, and End Results (SEER) program funded by the National Cancer Institute http://seer.cancer.gov/about/overview.html | 1973–2011 cancer incidence and survival data from population-based cancer registries covering approximately 28% of the US population | Includes data on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and 12-month survival
Hospital discharge data: Statewide Planning and Research Cooperative System (SPARCS); California Patient Discharge Dataset; National Inpatient Sample (US) http://www.hcup-us.ahrq.gov/nisoverview.jsp; Hospital Episode Statistics (UK) http://www.hscic.gov.uk/hes | Comprehensive all-payer data reporting system. The system was initially created to collect information on discharges from hospitals | Patient-level data on patient characteristics, diagnoses and treatments, services, and charges for each hospital inpatient stay and outpatient (ambulatory surgery, emergency department, and outpatient services) visit, and each ambulatory surgery and outpatient service visit to a hospital extension clinic and diagnostic and treatment center licensed to provide ambulatory surgery services
The Centers for Medicare & Medicaid Services (CMS) claims and survey data http://www.resdac.org/cms-data/file-directory | CMS is responsible for administering the Medicare, Medicaid, and State Children’s Health Insurance Programs. CMS gathers and formats data about Medicare beneficiaries, Medicare claims, Medicare providers, clinical data, and Medicaid eligibility and claims. CMS also collects additional survey data on health behavior and utilization (Medicare Current Beneficiary Survey, MCBS) and satisfaction with care (Consumer Assessment of Healthcare Providers & Systems, CAHPS) | Data on acute, psychiatric and skilled nursing inpatient admissions, outpatient services, procedures and tests, use of prescription medications, skilled nursing, durable medical equipment, and hospice
American Hospital Association (AHA) Annual Hospital Survey http://www.aha.org/research/rc/stat-studies/data-and-directories.shtml | Hospital-specific data on approximately 6,500 hospitals and 400-plus systems | 1,000 data fields covering organizational structure, personnel, hospital facilities and services, and financial performance
American Medical Association (AMA) Physician Masterfile http://www.ama-assn.org/ama/pub/about-ama/physician-data-resources/physician-masterfile.page | Established in 1906, current and historical data for more than 1.4 million physicians, residents, and medical students in the United States, including approximately 411,000 graduates of foreign medical schools | Information about demographics, practice type, significant education, training and professional certification on virtually all Doctors of Medicine (MD) and Doctors of Osteopathic Medicine (DO)
there remains extensive variation in timing of preoperative labs, as well as whether a specific blood level is checked at all. One particular example is albumin level. Albumin level has demonstrated associations with nutrition and overall health status. Studies have shown associations with surgical outcomes as well; however, this laboratory value is not always checked preoperatively. In fact, there may be a bias toward checking this value in patients who may be at risk for malnutrition or have other major comorbidities. This may bias results, leading to concern about its inclusion in multivariable analysis, even though it holds clinical value. Some suggest it should not be included at all, while others suggest it requires a more nuanced approach. Albumin, for instance, is reported as a continuous variable, but can be transformed into a binary variable using a clinically meaningful cutoff, previously described as 3.5 g/dL. By assuming all missing values fall within the normal range, one creates a misclassification that underestimates the true effect, as some in this group may in fact have low albumin levels. Thus, if an association is observed, it is likely true, albeit an underestimate. The data can then still be useful for clinical decision making even though many values are in fact missing. Another approach to this same problem is to assess whether those with missing data differ from the others with respect to the endpoint. This specifically tests whether there is differential misclassification. If there is, then one can treat the missing data group as its own categorical level without making any assumptions about whether there is an observed effect compared to subjects with data. Another method is imputation of data. These methods are beyond the scope of this chapter, but briefly involve a separate analysis predicting that specific data point based on the subject’s other characteristics.

Missing data of the first type (missing sex) can be avoided through auditing processes. Many data collection programs employ auditing processes to ensure quality data, and sites are not included if they demonstrate inability to conform to predetermined standards.

Another major limitation of all large datasets is changing variable definitions over time. While this process is necessary to some extent, as clinically meaningful definitions may change with time, it can drastically limit the number of subjects available for analysis of that endpoint. One such example is postoperative transfusion within NSQIP. Initially, the number of transfused units was recorded intraoperatively, and postoperative transfusion was defined as greater than 4 units. Researchers were then able to describe this endpoint as major postoperative bleeding and specifically describe the extent of intraoperative blood loss. This changed in 2011 when the number of intraoperative units of blood was removed altogether and postoperative transfusion was changed to 2 units or more of packed red blood cells. The first limitation is the danger of merging datasets across years without
understanding these changes. First, if ignored, researchers may erroneously code these missing intraoperative transfusions as no transfusion given and draw conclusions that will clearly be mistaken. Second, the postoperative transfusion variable in the newer dataset has a different clinical meaning. Two units of blood can be given merely for low hematocrit levels in patients with comorbidities, in order to optimize them, and so no longer represent a postoperative bleeding event. These two transfusion variables are not comparable over time, and the changes limit analysis.

Changes in Surgical Procedures and Practices Over Time

Other issues regarding data collection include the constantly evolving process of case definition and even the addition of new surgical procedures over time. For instance, the change from ICD-9 to ICD-10 is looming, and how this will impact data collection remains to be seen. The nuanced changes between the two systems will likely impact some areas more than others, and a deep understanding of these nuances will be necessary to compare cases between these two time periods. The last major ICD coding change was in 1975, and the medical arena has changed dramatically in that time, including the advent of the electronic record.

Some databases only include ICD-9 coding, where numerous different procedures may be relevant for treatment of a given diagnosis. For instance, appendicitis can be treated by an open approach, making an incision in the right lower quadrant, or by laparoscopic techniques, using three small incisions and a camera for appendix extraction. Where only ICD-9 codes are available, such datasets lack discrimination, preventing comparison of operative approaches. The introduction of laparoscopic procedures is one example of how surgical procedures change over time; while the first report of laparoscopic appendectomy was published in 1981, this practice did not become ubiquitous until the turn of the century and now represents the preferred technique (Korndorffer et al. 2010).

These changes can significantly impact research as each procedure has specific complications; however, there may be limits in the available data due to changes not captured by the coding systems. For instance, CPT coding does not capture robotic techniques, lumping them with laparoscopic procedures. This has limited observational studies comparing or even tracking robotics usage over the past decade. Another example of the limits of CPT coding is the absence of codes for transanal endoscopic microsurgery (TEMS), used for distal rectal cancer resections in tumors with sufficiently minimal rectal wall invasion. This minimally invasive approach spares the rectum and the sphincter, allowing for essentially full rectal function in low-grade tumors; however, these cases are lumped in with other rectal cancer resections, which often include complete rectal resections with end colostomy or loss of sphincter. The difference in quality of life and even in the types of complications is huge. While this clearly makes it impossible to perform observational studies on TEMS within large datasets, it also adds variation and error into any assumptions about outcomes after low rectal cancer resections. There are some ways to exclude TEMS from a dataset, by selecting cases where the tumor stage was sufficiently high to make TEMS contraindicated; however, this does not help elucidate specifically the advantages of TEMS. Another example where CPT coding fails is differentiating between some specific laparoscopic approaches. Although open inguinal hernia repair has been a bread-and-butter surgical operation, within the last decade surgeons have increasingly applied their laparoscopic skills to hernia repair. There are two available laparoscopic approaches: totally extraperitoneal (TEP) and transabdominal preperitoneal (TAPP). The TAPP approach enters the abdominal cavity in standard laparoscopic fashion, repairing the hernia from the inside using tacks, whereas the TEP approach enters a space above the peritoneum, placing the mesh between layers, and usually does not require tacks to keep the mesh in place. Both approaches
may have different risk profiles and long-term sequelae; however, observational evaluation is limited since neither CPT nor ICD-9 coding differentiates between them.

There also remain many processes that are not coded in most databases. This includes many data points that may impact outcomes, such as patient follow-up strategies, staffing, utilization of trainees, and even postdischarge medications. While large datasets evolve, opportunities to expand the data as research questions arise may be available. UNYSQI is one example where, through the ACS-NSQIP, institutions can track their own specific data points, which may help answer specific questions.

The surgical field is constantly progressing, not just with new procedures but also with the introduction of entirely new specialties. For example, endocrine surgery is starting to become a major surgical subspecialty; although not yet a board-certified specialty, the presence of these more specialized surgeons may impact outcomes. Other major changes in surgery that have not been included in current databases may also impact outcomes. For example, resident work hour restrictions by the ACGME continue to change and become increasingly strict. Previously, it was not unheard of for surgical residents to work 100–120 h weekly, whereas now work hours are capped at 80 per week and interns are prevented from taking 24-h call. These changes have drastically changed patient coverage and in some cases required supplementing staffing through advanced practice providers or moonlighters. These changes have not been tracked, and it is unclear how changing the workforce structure has impacted outcomes. Although controversial, this question holds some urgency as more and more restrictions are being implemented. In fact, a new randomized controlled trial will observe how these restrictions impact care; one arm of the trial will require surgical residents to follow the new regulations, while the other will function without work hour restrictions. However, such data is largely absent from current datasets.

Other major changes include the advent of telemedicine, and with robotics, even remote operations are now possible; the first transatlantic cholecystectomy, the so-called “Lindbergh” operation, was performed in 2001 (Marescaux et al. 2002). These changes were only possible through improvements in electronic communication that decreased the lag time sufficiently to allow such an operation.

The role that virtual communication will have in the future remains unclear, but its use will likely increase in the coming decades. Currently, such approaches are not tracked; however, including such practices in large healthcare databases may be useful in understanding their uptake and impact on clinical care. Other adjunct advances also impact surgical care, although they are largely unappreciated, such as major advances in and availability of high-quality imaging. Where 20 years ago computed tomography was limited, it is now ubiquitous and high-quality scans are available within minutes. These advances change the diagnostic paradigms and the quality of surgical decision making, although availability of such high-quality CT scans is not included in databases, even those that track whether CT scanning was done at all. Other technological advances include intraoperative imaging through 3D laparoscopy and the development of new instruments that make previously unthinkable operative approaches possible, such as single incision surgery or natural orifice transluminal endoscopic surgery that allows surgeons to perform cholecystectomy through the vagina.

There are many other changes to the structure of healthcare that may drastically impact outcomes, including advances in patient monitoring or quality of care in the intensive care unit. While it would be onerous to include all of these changes in any given dataset, it is important to remember the many forces that impact outcomes. A projectile in physics has many forces that alter its course, such as friction, rotation, and wind, and many of these can be ignored to provide an estimated course using only the major forces of velocity and gravity on the object. Keeping the other forces in mind remains important, however, as they may have the potential to be key forces in surgical care.
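Two of the data cautions raised above, a laboratory value that is often missing and a variable whose definition shifts between data releases, can be made concrete in a short Python sketch. Everything named here is hypothetical and chosen for illustration: the record fields (`year`, `intraop_units`, `postop_units_gt4`, `postop_units_ge2`) are not actual NSQIP variable names, "low" is assumed to mean below the 3.5 g/dL cutoff, and records from 2011 onward are assumed to follow the newer transfusion definition.

```python
from typing import Optional

ALBUMIN_CUTOFF_G_DL = 3.5      # cutoff quoted in the text; direction is an assumption
DEFINITION_CHANGE_YEAR = 2011  # year the transfusion definition changed


def albumin_category(albumin_g_dl: Optional[float]) -> str:
    """Map a preoperative albumin value to 'low', 'normal', or 'missing'."""
    if albumin_g_dl is None:
        # Keep missing values as their own level rather than assuming normal.
        return "missing"
    return "low" if albumin_g_dl < ALBUMIN_CUTOFF_G_DL else "normal"


def harmonize_transfusion(record: dict) -> dict:
    """Label a record's transfusion fields with the definition in force that year.

    The two postoperative definitions are kept apart instead of being merged
    into one variable, because they do not carry the same clinical meaning.
    """
    if record["year"] < DEFINITION_CHANGE_YEAR:
        return {
            "definition": "pre-2011",
            "intraop_units": record.get("intraop_units"),
            # Older definition: more than 4 units given postoperatively.
            "postop_transfusion": record.get("postop_units_gt4"),
        }
    return {
        "definition": "2011-onward",
        # No longer collected: unknown, which is not the same as zero units.
        "intraop_units": None,
        # Newer definition: 2 or more units of packed red blood cells.
        "postop_transfusion": record.get("postop_units_ge2"),
    }


if __name__ == "__main__":
    print(albumin_category(None))
    print(harmonize_transfusion({"year": 2012, "postop_units_ge2": True}))
```

Carrying the `definition` label through to analysis lets a researcher stratify by it, or restrict to one era, instead of silently pooling endpoints that are not comparable.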
Individual Surgeon Variation (Preferences, Techniques, and Skills)

Even if there is a single code and an agreed-upon surgical treatment or practice, its implementation can vary considerably. Laparoscopic cholecystectomy, for instance, one of the most commonly performed operations, has considerable variation in the way the procedure itself is performed. The absence of this precise detail is an obstacle to standardizing procedures nationally. There are statistical techniques for controlling for variation at the surgeon level, specifically hierarchical modeling with random effects. Hierarchical random effect modeling also addresses an issue that most multivariable models ignore: independence assumptions are violated in healthcare studies, as patients are treated by surgeons within hospitals, both of which have been shown to impact quality. Surgeon volume is one surgeon factor that was initially noted in 1979, where complex procedures such as pancreatectomy and coronary artery bypass graft have better outcomes when performed by higher-volume surgeons (Solomon et al. 2002; Birkmeyer et al. 2002; Katz et al. 2004). This may in part reflect standardization of technique, evidence-based practice, and skill, which may be a function of practice. Teasing out how outcomes depend on technique variation is virtually impossible in current large datasets, although one could argue this variation might explain quality to a much greater degree than even risk adjustment based on patient factors.

Timing of Complications

Even if a reasonable outcome is chosen, it is essential to understand the interplay of that complication with the hospital course. Incorrect assumptions about this can lead to incorrect answers. Recent studies on readmissions have suffered from major errors when they attempt to include complications as a risk factor for readmission (Aquina et al. 2014b). Some studies suggest that complications are the biggest risk factor for readmission, and while this may seem reasonable, they often confuse the reason the patient was readmitted with a risk factor for readmission. This has led to disastrous consequences, as inclusion of such reasons for readmission in the model can make all other risk factors no longer statistically significant, and in one model, the authors came to the incorrect conclusion that the only risk factor for readmission was postoperative complications, although subsequent studies have demonstrated this to be false. This can be avoided by using complication timing to distinguish complications occurring during the inpatient stay from those occurring after discharge. While predischarge complications have been associated with readmissions, the effect estimates have been much lower than previously described when all complications are considered together.

Limited Information on Socioeconomic Drivers of Health

Analyses of patterns and outcomes of care require an assessment of the complex relationships among patient characteristics, treatments, and outcomes. Furthermore, according to the Andersen healthcare utilization model (Aday and Andersen 1974), usage of health services (including inpatient care, outpatient physician visits, imaging, etc.) is determined by three dynamics: predisposing factors, enabling factors, and need. Predisposing factors can be characteristics such as race, age, and health beliefs. For instance, an individual who believes surgery is an effective treatment for cancer is more likely to seek surgical care. Examples of enabling factors could be familial support, access to health insurance, one’s community, etc. Need represents both perceived and actual need for healthcare services. To conduct and interpret outcome analyses properly, researchers should both understand the strengths and limitations of the primary data sources from which these characteristics are derived and have a working knowledge of the strategies used to translate primary data into the categories available in public databases. For instance, SEER-Medicare documents details on individual cancer diagnoses, demographics (age, gender, race), Medicare eligibility and program enrollment by month, and
aggregate measures of the individual’s “neighborhood” (e.g., average income and years of education presented at the zip-code and census-tract level) as determined through a linkage to recent US Census data. However, census-level data do not allow for assessment of differences among individuals within those zip-code areas.

Many analyses of large databases focus on the patient’s race or ethnicity as a confounder, a predictor of outcome, or a marker for other unobserved factors (disadvantaged geographic area or low health literacy). Information on race is generally available, while information on ethnicity is often missing or inappropriately coded. While most US data surveys allow only one category for Hispanic ethnicity (yes/no), the NCDB classifies cancer patients into seven categories (Mexican, Cuban, Puerto-Rican, Dominican, South/Central American, Hispanic by name, and Other). In our analysis of treatment patterns for Hispanic cancer patients in NCDB, we demonstrated persistent disparities in receipt of guideline-recommended care. The care in the Hispanic group as a whole was not significantly different from that in the non-Hispanic group, while individual subgroups demonstrated significant differences, highlighting a critical need to acknowledge Hispanic subgroups in outcome research.

Need for Linked Data

Surgical safety and quality are multifactorial issues with more than one risk factor and hence multiple potential mechanisms for improvement. For instance, reduction in postsurgical complications could be partially achieved by more efficient patient education about early symptoms, improvement in surgeons’ skills, changes in nursing and hospital practices, use of surgical visiting nurse services, and other interventions. Similarly, one quality improvement intervention may have impact on multiple stakeholders including patients and their caregivers, clinic personnel, and health insurance. Hence, a comprehensive evaluation may require information about all involved parties. Such data are rarely available in one dataset, and therefore, many surgical outcomes and quality improvement studies are using multiple merged sources of data.

The SEER-Medicare data is a product of a linkage between two large population-based datasets: the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute and beneficiaries’ healthcare claims data collected by the Centers for Medicare and Medicaid Services for billing and enrollment purposes. The linked dataset includes Medicare beneficiaries with cancer from selected states participating in the SEER Program, with the unit of observation being one healthcare utilization event. This includes all Medicare-covered healthcare services from the time of a person’s Medicare eligibility (before or after cancer diagnosis) until their death. Because of the complex sampling design, the number of included variables, and specific data reporting practices for tumor characteristics and services utilization, the investigator considering a SEER-Medicare-based study or proposal should spend time understanding SEER-Medicare data limitations (National Institute of Health 2014) and learning about data layout and coding (manuals and training are available at the NCI and other cancer research organizations).

The Medicare Current Beneficiary Survey (MCBS) is a longitudinal survey of a nationally representative sample of the Medicare population. The MCBS contains data about sociodemographics, health and medical history, healthcare expenditures, and sources of payment for all services for a randomly selected representative sample of Medicare beneficiaries (Centers for Medicare and Medicaid Services 2014). For every calendar year, there are two separate MCBS data files released, the Access to Care and the Cost and Use files, which can be ordered directly from the CMS with assistance from the Research Data Assistance Center at the University of Minnesota (Research Data Assistance Center 2014). The MCBS Access to Care file contains information on beneficiaries’ healthcare access, healthcare satisfaction, and their usual sources of care (Goss et al. 2013; Research Data Assistance Center 2014). The MCBS Cost and Use file offers a complete summary of all healthcare expenditure and source of payment data on all healthcare services including
expenditures not covered by Medicare (CMS Research Data Assistance Center 2015). The information collected in the surveys is combined with the claims data on the use and cost of services. Medicare claims data include information on the utilization and cost of a broad range of services including inpatient hospitalizations, outpatient hospital care, skilled nursing home services, and other medical services. Because payment information for the Cost and Use file must be collected, summarized, and validated, the release of the C&U file is usually delayed by 2 years compared to the MCBS AC file.

In addition to publicly available merged datasets, individual investigators can create their own aggregated databases by linking together information from multiple sources and combining existing data with prospectively collected and patient-reported information. Examples of such studies include a NSQIP-based evaluation of preoperative use of statins and whether it is associated with decreased postoperative major noncardiac complications in noncardiac procedures (Iannuzzi et al. 2013c), a study of recipients of abdominal solid organ transplant (ASOT) using additional data from patient medical records (Sharma et al. 2011), and a retrospective review of the data from medical records of patients diagnosed with hepatocellular carcinoma compared to patients in the California Cancer Registry (CCR) (Atla et al. 2012).

Data Management and Big Data

More and more data are being collected for different purposes and are available to be linked together, including electronic memberships, online purchasing and consumer behavior records, electronic transactions, and others. The datasets become so large and complex that they become difficult to manage using traditional resources, and organizations have to increase their resources in order to be able to manage them. Before we know what to do with it, we have entered into a new era of big data. Big data is high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization (Gartner 2013). The challenges of working with big data include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations, among many others. Innovative solutions such as cloud computing chip away at some challenges while remaining limited by others. For instance, outside cloud computing services such as Amazon EC2, Box, Dropbox, Internet2, etc. provide storage or processing capabilities, but without internal infrastructure or agreements with the outside services, there is the potential for privacy violations. Yet, just like with the administrative data several decades earlier, the opportunities provided by big data potentially outweigh the risks and, in time, data-driven analytics may become as routine as EMR and digital image sharing.

Structure-Process-Outcome Assessment in Surgery

Theoretical Framework of Quality Assessment in Healthcare

According to Donabedian (1966), if there is evidence that good structure leads to appropriate processes which in turn result in good outcomes, quality of healthcare intervention could be measured in terms of either structures (S), processes (P), or outcomes (O) (Fig. 1).

These indicators can be measured using electronic, readily available data from the organizational health information systems, data collected by cancer trackers, and other regional data systems, like Rochester RHIO. It is important to work closely with each hospital’s clinical quality assessment team to avoid redundancy in data collection and other quality assessment and reporting initiatives (e.g., Hospital Scorecard, the Clinical Service Scorecard, and the Management Plan Tracking Reports, SCIP, HCAHPS), and others (Hospital Consumer Assessment of Healthcare Providers and Systems 2014; The Joint Commission Core Measure Sets 2014a). Additional financial and pre- and postadmission
Fig. 1 Donabedian approach for evaluating outcomes
cost and utilization information about patients can be obtained from CMS claims data for Medicare fee-for-service beneficiaries and Excellus BCBS claims for commercially insured and Medicare HMO patients (Medicare Health Insurance Claim (HIC) number or health insurance ID will be abstracted from the patients’ medical charts).

The bundles of care for surgical patients can be defined by multidisciplinary care teams for specific diagnoses and surgical service lines. A care bundle identifies a set of key interventions from evidence-based guidelines that, when implemented, are expected to improve patient outcomes (Institute for Healthcare Improvement 2006). The aim of care bundles is to change patient care processes and thereby encourage guideline compliance in a number of clinical settings (Brown et al. 2002; Burger and Resar 2006; Pronovost et al. 2006).

Using regional or national healthcare utilization and expenditure data with Medicare or private plan reimbursement schedules, clinicians and hospital administrators can estimate the annual cost of care for surgical patients receiving various care bundles, by disease stage. These bundled cost estimates can be used internally (e.g., for budgeting projections or to calculate return on investment for new programs and interventions) or externally, to provide a foundation for contract negotiations with payers, regional healthcare systems, and accountable care organizations (Froimson et al. 2013; Ugiliweneza et al. 2014).

While it is tempting to seek out a single perfect metric of surgical quality, anybody familiar with the complexity and variation in patient risks and the delivery of surgical care would agree that such a metric could not possibly exist. More suitable would be a multidimensional measure similar to the six-domain definition of healthcare quality suggested by the World Health Organization (WHO). These dimensions require that healthcare be:

• Effective: delivering healthcare that is adherent to an evidence base and results in improved health outcomes for individuals and communities
  Example: each cancer case is reviewed by a specialty multidisciplinary team at least once before the final decision about treatment is reached.
• Efficient: delivering healthcare in a manner that maximizes resource use and avoids waste
  Example: avoid unnecessary imaging for colorectal cancer (CRC) patients such as PET scans or multiple CT scans.
• Accessible: delivering healthcare that is timely, geographically reasonable, and provided in a setting where skills and resources are appropriate to the medical need
  Example: providing a hub-and-spoke model for chemotherapy delivery for CRC patients residing far from major cancer centers
• Acceptable/patient centered: delivering healthcare which takes into account the preferences and aspirations of individual service users and the cultures of their communities
  Example: offering palliative care to all patients with advanced cancer
• Equitable: delivering healthcare that does not vary in quality because of personal characteristics such as gender, race, ethnicity, geographical location, or socioeconomic status
  Example: providing financial assistance to low-income cancer patients, assuring that out-of-pocket expenses do not represent a barrier to adequate treatment
• Safe: delivering healthcare that minimizes risks and harm to service users
  Example: following the WHO surgical checklist to minimize the risk of surgical complications and never events

As illustrated by the examples above, this definition of healthcare quality provides the link between the organization of care, care processes, surgical quality, and outcomes. Hence, it enables all participating stakeholders (e.g., clinicians, researchers, payers, and hospital administrators) to rely on Donabedian’s framework when assessing quality of surgical services. According to Donabedian, if there is evidence that good structure leads to appropriate processes which in turn result in good outcomes, quality of healthcare intervention could be measured based on the presence of appropriate structures (S) or processes (P). Below we provide several examples of evidence-based measures of quality in surgical care.

Structure

Lord Darzi, international expert on quality and innovation in cancer care, world-leading colorectal surgeon, the former Minister of Health in the United Kingdom, and the lead author of the UK Darzi Plan to redesign care delivery, encouraged healthcare agencies to “localize care where possible, and centralize services where necessary” for efficacy and safety. This implies that routine healthcare, like cancer survivorship services, should take place as close to home as possible, while more complex care, like active cancer treatment, should be centralized to ensure it is carried out by the most skilled professionals with cutting-edge equipment and high volume/experience.

There exist several validated care delivery models to improve access to specialty care for patients with complex chronic disease living in underserved or remote communities (for instance, using videoconferencing technology for enhanced care coordination). There is a large body of literature demonstrating that standardized care pathways, use of multidisciplinary teams (MDTs), resident involvement (Iannuzzi et al. 2013a, b), availability of specialized providers (e.g., board-certified surgical specialists, surgical nurses, and PA) and services (e.g., stoma care, wound care, surgical ICU), and receiving care in a high-volume center of excellence are associated with better outcomes (Reames et al. 2014; Howell et al. 2014).

Evidence that hospital volume influences outcomes has been verified in nearly every major type of surgery (Begg et al. 1998; Birkmeyer et al. 2002; Katz et al. 2004). This body of work highlighted important and previously unrecognized variations in hospital performance and ignited efforts to improve surgical quality among poorly performing hospitals. In an effort to reduce these variations among hospitals, new health policy and quality improvement initiatives, such as public reporting, pay-for-performance, and surgical checklists, have been implemented to promote best practice and improve standards of care (Hannan et al. 1990, 2012; Haynes et al. 2009; Lindenauer et al. 2007). Over the last decade, surgical mortality rates have significantly decreased throughout the country, possibly due to such measures (Weiser et al. 2011; Finks et al. 2011; Birkmeyer 2012). While surgical/facility volume is easy to measure, the mechanism of
association between procedure volume and outcomes remains poorly understood. Possible explanations highlight the importance of surgical expertise, specialized services, and infrastructure that tend to be associated with large-volume centers.

Patient management following multidisciplinary principles consistently leads to superior outcomes at much lower costs. Published supporting evidence for improved cancer-specific outcomes with the use of multidisciplinary teams is available for a range of cancers, including breast, lung, head and neck, esophageal, and colorectal (Chang et al. 2001; Coory et al. 2008; Gabel et al. 1997; Stephens et al. 2006; Wille-Jorgensen et al. 2013; Burton et al. 2006).

Process

Many factors that constitute the structure and organization of surgical services contribute to the processes of care and, ultimately, affect patient outcomes. For instance, in addition to knowing structural features, such as whether a hospital has a surgical ICU, it is also important to identify processes of care, such as how the ICU is staffed and what policies, regulations, and checklists the SICU personnel adhere to, including failure to rescue, escalation of care, communication, use of imaging and antibiotics, and patient nutritional protocols. If a residency program is housed in a hospital (structure), what, when, and how surgical residents are required to perform during cases (processes) may vary by institution and have a serious impact on institutional outcomes.

There is also a growing interest regarding the potentially detrimental impact of interruptive operating room (OR) environments on surgical performance (Healey et al. 2006; Wiegmann et al. 2007). Previous investigations showed that interruptions occur frequently in ORs, across various surgical specialties (Weigl et al. 2015).

In an effort to improve surgical outcomes and potentially lower costs, recent attention has been placed on efficiency of care delivery and the surgical volume-outcome relationship. Luft et al. first explored this concept in 1979 showing that there was a relation between hospital volume and mortality for complex procedures such as open-heart surgery or coronary bypass (Luft et al. 1979). Since then, Birkmeyer et al. expanded on this idea by showing a significant relationship between both hospital volume and surgeon volume and operative mortality for many different procedures, including resections for lung, bladder, esophageal, and pancreatic cancer (Birkmeyer et al. 2002). Subsequent surgical oncology studies have shown an association between volume and negative margin status, superior nodal harvest, and both short-term and long-term survival. Recently, the volume-outcome relationship has been demonstrated even for less specialized procedures, such as incisional hernia repair (Aquina et al. 2014a).

Evidence of the volume-outcome relationship, along with financial pressures, implementation of surgical bundled payments, and the shift to accountable care organizations, brought to light the importance of efficient and coordinated models of care delivery. With the increase in the number of surgical subspecialties and nonsurgical specialties performing surgical procedures (e.g., interventional radiology and cardiology, urogynecology), there is an increase in the involvement of advanced practice providers (APPs) in patient care delivery (e.g., nurse practitioners (NP), physician assistants (PA), technicians, and therapists) and growing acceptance of multidisciplinary care pathways (oncology, geriatrics, orthopedics, among others). For example, high-volume bariatric surgery practices can hire psychologists, nutritionists, exercise therapists, and specialty nurses to provide additional supportive services. This approach can free the surgeon’s time and improve care coordination and patient experience. There are other situations when the specialty and training of the provider is important, such as procedures that could be performed by different types of providers, for instance, placement of an inferior vena cava filter (IVC filter), a type of vascular filter that is implanted to prevent life-threatening pulmonary emboli (PEs). IVC filters could be placed by a number of different types of providers (vascular surgeons, general surgeons, cardiologists, interventional radiologists) for various indications. The outcomes of the intervention
(mortality, complications, PE) could potentially depend on the specialty and skill of the provider.

In general, clinic staff rarely bill for their services and often are employed by the institution. Multidisciplinary consultations for cancer patients are also not reimbursable and often count toward “academic time” for faculty physicians. As a result, these services may be “invisible” from insurance claims or medical records. In fact, only one provider can be associated with each billable service (procedure or hospital admission). For any service delivered by more than one provider (e.g., resident participating in a surgical case, several APPs involved in hospital discharge process), additional data may need to be included (e.g., operating notes, individual provider claims).

Surgical Outcomes

A choice of optimal outcome for each study or evaluation depends on the goal of the assessment as well as factors that may be driving this outcome (causal pathway) and resources available to the investigators, as some of the outcome collection processes may be very costly and time consuming (e.g., health utility and quality of life measurement) (Drummond et al. 2005; Iezzoni 2004). Below we describe some of the most common types of outcomes used in surgical outcome research and quality assessment and discuss their applications, limitations, and sources of data.

Clinical Outcomes

Mortality: When defining mortality, it is important to be specific about the duration of the observation period (e.g., in-hospital vs. 30-day mortality) as well as the starting point for the observation period (e.g., day when the procedure was performed for 30-day postsurgical mortality versus 30 days after hospital discharge for 30-day hospital mortality). Using hospital discharge abstracts and publicly available software, one can measure in-hospital mortality using the most appropriate definitions for the needs of the project. For instance, if there is a significant variation in the hospital length of stay between patients in the study, it may be more accurate to define hospital mortality based on the 30-day postadmission interval rather than postdischarge time (Borzecki et al. 2010; Hannan et al. 1990, 2013).

Cancer Survival: For surgical oncology studies, cancer survival rate is often a more appropriate outcome metric than surgical mortality because the vast majority of cancer patients receive multimodal therapy. Cancer survival is reported by most tumor registries or can be calculated from pathology reports. Cancer survival is defined as the percentage of people who have survived a certain type of cancer for a specific amount of time (e.g., 12 months, 2 or 5 years). Certain cancers can recur many years after first being diagnosed and treated (e.g., breast cancer). During this time, a former cancer patient (also called survivor) may die from a different condition (oncologic or benign), and hence, the most appropriate choice of reported statistics in this case would be tumor site-specific mortality. For instance, a patient may be successfully treated for thyroid cancer but die from colon cancer 20 years later. Other types of survival rates that give more specific information include disease-free survival rate (the proportion of cancer patients who are cancer-free), progression-free survival rate (the proportion of cancer patients who are not cured but whose cancer is not progressing), and cancer recurrence (cancer that has returned after treatment and after a period of time during which the cancer was not detected). Sometimes without detailed pathology data, it is impossible to distinguish cancer recurrence from cancer progression. An example of the recurrence versus progression dilemma can be observed in rectal cancer patients who received nonsurgical neoadjuvant treatment. Following neoadjuvant chemoradiotherapy (CRT) and interval proctectomy, 15–20% of patients are found to have a pathological complete response (pCR) to combined multimodal therapy, but controversy persists about whether this yields a survival benefit (Martin et al. 2012).

Surgical Complications: Incisional Hernia. An incisional hernia occurs when abdominal wall fascia fails to heal. Incisional hernia is a common postoperative complication following major abdominal surgery. Data on incidence of incisional hernia are highly variable with reported values ranging
from 0% to 91%. Incisional hernias are typically diagnosed within the first 3 years after initial laparotomy (Yahchouchy-Chouillard et al. 2003; Rosen et al. 2003; Rea et al. 2012); however, they may take up to 10 years to become evident after the initial surgery (LeBlanc et al. 2000; Akinci et al. 2013). This wide variation in the reported rates of incisional hernia is not unexpected, given the heterogeneity of the patients included in the studies, the surgery performed, and the length of follow-up (Caglià et al. 2014). Several outcome measures could be appropriate for a study on incisional hernia including incidence, prevalence, rates of hospital admission, and reoperation.

Surgical Complications: Surgical Site Infection (SSI) (Schweizer et al. 2014). In addition to pain, discomfort, and high risk for readmission, surgical site infections (SSIs) are associated with substantial morbidity and mortality. The costs of SSIs have been the focus of quality improvement and safety efforts ever since the Centers for Medicare and Medicaid Services halted compensation for the growing costs linked with SSIs after some surgical operations (so-called potentially preventable infections) (Aquina et al. 2014b). Prior studies have reported cost of hospitalizations after SSIs in the range from $24 000 to $100 000 (Schweizer et al. 2014).

Patient-Reported Outcome Measures (PROMs)

Patient-Reported Outcomes Measurement Information System (PROMIS®): Measures included in PROMIS® are intended for standardized assessment of various patient-reported outcome domains, including pain, fatigue, emotional distress, physical functioning, and social role participation (Devlin and Appleby 2010). PROMIS® is a new set of tools intended to be used in routine clinical practice as part of an electronic medical record (EMR) system (Cella et al. 2007). PROMIS® was established in 2004 with funding from the National Institutes of Health (NIH). PROMIS® measures are based on common validated metrics to ensure a computerized and burden-free data collection process in any healthcare setting that yields accurate measurement of patient health status domains over time with few items (National Institute of Health 2015a).

Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) (Systems 2014): Just like with any other consumer goods and services, many providers and organizations have collected information on patient satisfaction with healthcare. However, prior to HCAHPS, there was no national standard for collecting and publicly reporting patients’ perspectives on their healthcare experience that would enable valid comparisons to be made across providers. In May 2005, the National Quality Forum (NQF), an organization responsible for standardization of healthcare quality measurement and reporting, formally endorsed the CAHPS® Hospital Survey (Press Ganey Associates Inc 2014).

The HCAHPS survey is mailed to a random sample of hospital patients after a recent discharge. The survey asks patients to rate 21 aspects of their hospital care combined into nine key topics: communication between patients and doctors, communication between patients and nurses, responsiveness of the hospital staff, pain management, communication with patients about medicines, discharge information, hospital’s cleanliness, hospital environment’s noise levels, and transition of care. Patients’ perception of care is a key performance metric and is used to determine payments to hospitals (Hospital Consumer Assessment of Healthcare Providers and Systems 2014). The Hospital Compare database (4605 hospitals) can be used to examine complication rates and patient-reported experience for hospitals across the nation. Prior studies have demonstrated an inverse relationship between patient experience and complication rates. This negative correlation suggests that reducing these complications can lead to a better hospital experience. Overall, these results suggest that patient experience is generally correlated with the quality of care provided.

Depending on the type of surgery and patient population, other outcome measures may also be relevant (e.g., pain, functional status, and cognitive ability). Quality of life is a multidomain
indicator that combines all aspects of health relevant to patients and, hence, may serve as an aggregate outcome measure.

Quality of Life and Subjective Well-Being (Lee et al. 2013): Quality continues to be placed at the heart of discussions about healthcare. This raises important questions about how quality of care should be measured and from whose perspective: the patient’s, the provider’s, or the payer’s. Subjective well-being (SWB) is a measure of the overall “wellness” of an individual and as such has the potential to be used as a global marker for how treatments affect people in the experience of their lives. SWB links all stages in the treatment and care process, thus allowing the overall quality of care to be determined and valued according to its direct effect on people’s lives. SWB has been shown to have an effect on outcomes at all stages of the treatment experience, and improved health and quality outcomes are shown to consistently enhance SWB (Lee et al. 2013). Furthermore, SWB measures have been shown to be a suitable method to value the impact of healthcare on the families and caregivers of patients and, in this way, can join up health outcomes to show wider effects of treatment on patients’ lives. Measuring an individual’s SWB throughout his or her treatment experience can enable a full appraisal of the quality of care that they receive. This could facilitate service improvements at the microlevel and help value treatments for resource allocation purposes at the macrolevel.

Surrogate Outcomes

Although everybody recognizes the importance of measuring patient outcomes and several valid and accurate measures (as described above) are available, there are several practical barriers to measuring patient outcomes. These include time (waiting for cancer recurrence or mortality to occur while maintaining regular follow-up with a patient), personnel costs (to perform routine surveillance and follow-ups), and patient burden (repeated follow-up, evaluations, and surveys). One of the potential solutions to these problems is the use of surrogate outcomes. A surrogate outcome (or endpoint) is a measure of effect of a specific treatment that may substitute for a real clinical endpoint but does not necessarily have a guaranteed relationship (Cohn 2004). Surrogate markers are also used when the number of events is very small, thus making it impractical to conduct a clinical trial to detect a statistically significant effect (e.g., instead of measuring VTE events which have an incidence of less than 1%, studies often use ultrasound-detected blood clots which are much more prevalent but do not always result in PE or VTE) (Fleming and DeMets 1996).

A correlate does not make a surrogate. It is a common misconception that if an outcome is a correlate (i.e., correlated with the true clinical outcome), it can be used as a valid surrogate endpoint (i.e., a replacement for the true clinical outcome). However, proper justification for such replacement requires that the effect of the intervention on the surrogate endpoint predicts the effect on the clinical outcome, a much stronger condition than correlation. Other examples of commonly used surrogate outcomes in surgery include costs of care as a measure of poor outcomes and disability, positive surgical margins, carcinoembryonic antigen (CEA), and number of lymph nodes retrieved as a measure of long-term cancer recurrence and mortality (Nussbaum et al. 2014).

Composite Outcomes: Episode of Care or Care Bundles

The value of quality reporting in surgical care, however, is limited by problems with existing measures of quality, mainly, that existing quality indicators are designed to measure the quality of a specific facility (e.g., hospital) or a specific provider (e.g., surgeon). This, however, does not reflect the current paradigm of care delivery when a patient may be diagnosed in the community, referred to a regional center of excellence for neoadjuvant chemoradiation, followed up for 6 months by an academic colorectal surgeon, before returning to the community for years of posttreatment surveillance. Regional standardized pathways of care and the multidisciplinary team (MDT) approach have been recommended by all clinical societies to better identify, coordinate, deliver, and monitor the optimal treatment on an individual patient-by-patient basis (Chang et al.
2001; Coory et al. 2008; Stephens et al. 2006; Abbas et al. 2014; Wille-Jorgensen et al. 2013; Morris et al. 2006; Gatt et al. 2005; Adamina et al. 2011).

Risk Adjustment

Risk adjustment is a set of analytic tools used for an array of functions in healthcare (Iezzoni and Long-Bellil 2012; Schone and Brown 2013). One of the primary uses of risk adjustment is providing fair comparison between different patient populations, providers, or programs. Risk adjustment is also necessary to set payments for health plans that reflect the expected treatment expenses of their specific membership group. Because health and treatment needs differ between individuals, the cost and outcomes of healthcare may differ from person to person. Without risk adjustment, plans or providers have an incentive to enroll and treat healthier patients (so-called cream skimming or cherry-picking) and avoid sick, frail, or complex patients. After appropriate risk adjustment, plans and providers receive a larger amount of reimbursement for members with numerous chronic illnesses than for members with few or no health problems at all. In addition to costs, risk adjustment is also applied to health outcomes when comparing performance across providers (e.g., risk-adjusted mortality is reported by the STS National Database, NSQIP, the New York State CABG Report Cards, and UK surgical mortality reports) (National Health Services 2015; The Society of Thoracic Surgeons National Database 2014). The methodology used for risk adjustment varies, depending in part on healthcare market regulations, the populations served, and the source of payments. Risk adjustment is used in all major public programs offering health coverage in the United States, including Medicare Advantage (MA), Medicare Part D, and state Medicaid managed care programs. The STS National Database, with its three million patient records, has long used risk adjustment to provide more accurate patient outcomes. If not risk adjusted, the records of surgeons who perform operations on higher-risk patients would always look worse than the records of surgeons who treat low- or average-risk patients.

From Data to Quality Improvement

Understanding Hospital Billing Data

For many hospital and outpatient services, there is a wide difference between billed charges and the amounts that providers expect to receive for services. Hospital charges are usually determined by hospital administrators depending on prior history and demand. Reimbursement rates, on the other hand, or the payments that hospitals are actually willing to accept for a specific service or product, vary by payer and specific plan. On average, hospitals billed Medicare 3.77 times (standard deviation = 1.83) what they were actually reimbursed, with a range of 0.42 to 16.23 (Muhlestein 2013). The ratio may vary for private payers.

High hospital charges, though, do have some important consequences. First, since the charges do not correlate with the amount being paid and hospital expenditures required to produce a specific service (i.e., true cost), it becomes difficult, if not impossible, to compare process between hospitals, and draw conclusions about financial sustainability of various service lines. Second, and potentially devastating for some, those who are uninsured who receive care at a hospital, or those who are insured and receive care at an out-of-network hospital, may face a bill that exceeds by many times the negotiated price paid by any payer.

Focusing on Modifiable Factors

One of the major paradoxes that limits our ability to improve practice based on the results of published studies is that most available predictors are not modifiable (readmissions: patient severity, comorbidities), while most modifiable factors are not routinely collected through standard clinical data systems (SES, organizational structure). Furthermore, the reported statistical associations do not equal causation (although causation is often assumed) and hence,
modifying a predictor may not result in a desired change in the outcome of interest. Let’s consider the example below.

Failure to rescue (FTR) refers to mortality among patients with serious complications (Johnston et al. 2014; Pucher et al. 2014; Almoudaris et al. 2013). Typically, it is hospitals with greater FTR rates (not greater complication rates) that have the highest mortality rates. Thus, although complications may occur, outcomes can still be improved by optimizing the quality of care provided to the patient post-complication. Although several studies have highlighted the importance of FTR as a marker for quality of care, these have considered only organizational aspects of healthcare. Few have explored the underlying human factors that lead up to this critical event. Two main factors may contribute toward an FTR event: first, a failure to recognize a sick patient and, second, a failure to act promptly once deterioration has been detected. In both situations, an escalation of care (EOC) process is required if FTR is to be avoided.

EOC involves a nurse recognizing a change in patient status and communicating it to a postgraduate year 1 (PGY1) resident, who subsequently reviews the patient and then escalates care further for advice and/or management. Escalation is a difficult process, as the first doctor called by the nurses will usually be the most junior; this is the traditional hierarchy. After initial assessment, the junior doctor must then contact his or her senior to explain why help is needed and how urgently a response is required. All of this places a premium on communication between team members. However, failures in communication are ubiquitous in the postoperative phase. Although this EOC process lies at the center of FTR and is critically important for the safety and quality of surgical care, it remains difficult to measure and quantify and, hence, relatively unexplored in the research literature.

Identifying Actionable Goals

Despite the soundest study design and state-of-the-art statistical methodology, outcome studies do not always lead to meaningful improvement in care quality and patient outcomes. Is this grounds for skepticism? Not at all. Just like many investigations in basic biomedical sciences, outcomes and quality assessment projects often fall short of their potential impact by simply reporting barriers to high-quality care without considering strategies for systematically overcoming these limitations and obstacles. Another common mistake is assuming that just because some risk factors are statistically associated with poor quality or outcomes, they represent a target for improvement. For instance, if low patient education is associated with poor cancer prognosis, it may be naïve to assume that more education would improve outcomes in cancer patients without a high school diploma. In this case, low education is likely to be a marker for social and economic deprivation in this demographic group. Addressing this issue may require developing a system-wide solution such as providing a care navigator, graphics-based rather than text-based decision support tools, and phone-based rather than internet-based communication with care providers.

Sometimes when large administrative datasets are used for the analysis, statistically significant risk factors are not necessarily clinically significant. Before considering any change in clinical practice, it may be beneficial to review the results for face validity with all stakeholders involved in the care process. One approach is to use a systematic, quantitative, validated method to assess risks in the process of information transfer across all phases of surgical care. The method is known as failure mode and effect analysis (FMEA) and was originally developed by engineers to accomplish proactive risk analyses (McDermott et al. 1996). The National Center for Patient Safety of the US Department of Veterans Affairs adapted FMEA for use in healthcare, resulting in healthcare FMEA (HFMEA) (DeRosier et al. 2002). Healthcare FMEA is a multistep process (Fig. 2) that uses a multidisciplinary team to proactively evaluate a healthcare process. The team uses process flow diagrams, hazard scoring, and decision trees to identify potential vulnerabilities and to assess their potential effect on patient care. The method captures the likelihood of risks, the severity of consequences, and the probability that they
may be detected and intercepted before causing harm. Healthcare FMEA has so far been applied to medication administration (Fletcher 1997; McNally et al. 1997; Kunac and Reith 2005; Weir 2005), intravenous drug infusion (Adachi and Lodolce 2005; Apkon et al. 2004; Wetterneck et al. 2006), blood transfusions (Burgmeier 2002), equipment problems (Weinstein et al. 2005; Wehrli-Veit et al. 2004), and surgery (Nagpal et al. 2010).

Fig. 2 Main steps in surgical healthcare failure mode and effect analysis (HFMEA) (Adapted from the Veterans Affairs National Center for Patient Safety, DeRosier et al. 2002)

Presenting Results

Quality outcome research results may be presented in a variety of ways, depending in part upon the endpoint and how the data will be used. Standard statistical approaches using Student’s t-test for continuous data and the chi-square test for categorical data, for instance, have long been noted to produce biased results depending on how patient factors are distributed. This is particularly
important for observational studies using data where patients have not been randomized. Higher-level statistical packages using multivariable approaches to adjust for patient-level factors are now readily available, providing adjusted estimated effects in terms of odds ratios. Despite the ubiquity of such methods, if not well thought out, results can be drastically skewed. Only confounding factors and covariates not on the causal pathway should be included. If one controls for factors on the causal pathway, one may find that no presumed risk factors are associated with the outcome, because they have been effectively controlled for in the multivariable analysis. This will be discussed further below. Confounders such as comorbidities may also be highly collinear, and grouping or using already established practices for comorbidity adjustment may be helpful in decreasing the number of variables, particularly if the research question concerns a comparison of two surgical approaches, where one desires only to adjust for comorbidities rather than ascertain their independent contribution to the risk of a poor outcome.

While multivariable analyses are presented with odds ratios, even this relatively straightforward result presentation requires some additional thought in terms of the desired interpretation. One particular nuance is whether to choose a reference group that makes the odds ratio greater than one (suggesting increased risk) or one that makes the odds ratio suggest a protective effect. It is often more intuitive to present odds ratios suggesting increased risk; however, this is not always appropriate.

As quality data becomes more prevalent, multiple metrics reportedly measuring the same poor outcome may exist. Auditing these results and comparing which approach is more reliable and measures the underlying disease state is of utmost importance, particularly if this data is to lead to clinical change. For instance, using Pearson’s correlation coefficient, a study of NSQIP data when compared to regional data measuring anastomotic leaks found that the traditional approach of “organ space infection” correlated poorly with the more specifically defined anastomotic leak variable. These findings suggest that prior reports are based on identifying organ space infection as an anastomotic leak in colorectal surgery.

Odds ratios may be difficult to put into clinically meaningful terms other than demonstrating relative importance. Another approach to taking multivariable analysis to the next step is the creation of risk scores aimed at guiding clinical decision making. This approach effectively operationalizes the data available in multivariable analysis by weighting risk factors. The approach to these analyses is slightly different, as they are aimed at predicting an event rather than identifying all potential risk factors. This changes which variables are included in the analysis, as only those that improve the predictive ability should be used. There may be a high degree of crossover; however, risk scores are most useful when they are simple, and so one may desire to make a parsimonious model, that is, a model with the fewest covariates that still maximizes the predictive power of the model (Iannuzzi et al. 2013d, 2014a; Kelly et al. 2014a). In order to perform a predictive analysis, data should be split into a development and a validation dataset so the risk score can be tested on naive subjects, estimating its ability to be applied to new patients. Another similar approach is the use of nomograms, which are simply another way to organize risk score-type data.

With the advent of the electronic record, some of this risk scoring can now be integrated directly into the clinical record, alerting physicians to patients at high risk of readmission or DVT and prompting some action such as a prophylaxis prescription. This approach has increased the use of guideline-based approaches and may be an effective tool moving forward. NSQIP also provides individual patient risk calculators for many complications, which allow in-office estimates of risk based on individual patient factors. This tool anecdotally has a high degree of satisfaction for patients and providers alike and likely improves the consent process.
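
The development and validation workflow for risk scores described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the cited studies: the patient records are synthetic, the factor names (diabetes, smoker, steroid_use) and the outcome field (event) are hypothetical, and the weights come from crude log odds ratios for each factor rather than from a fitted multivariable model, which a real analysis would use.

```python
import math
import random


def split_cohort(records, validation_fraction=0.3, seed=0):
    """Randomly split records into (development, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def log_odds_ratio(records, factor, outcome="event"):
    """Crude log odds ratio of the outcome for one binary factor.

    0.5 is added to every cell of the 2x2 table so that an empty
    cell cannot cause a division by zero.
    """
    exposed_event = exposed_none = unexposed_event = unexposed_none = 0.5
    for r in records:
        if r[factor] and r[outcome]:
            exposed_event += 1
        elif r[factor]:
            exposed_none += 1
        elif r[outcome]:
            unexposed_event += 1
        else:
            unexposed_none += 1
    return math.log((exposed_event * unexposed_none)
                    / (exposed_none * unexposed_event))


def points_from_weights(weights):
    """Turn positive log-odds weights into integer points.

    Each weight is divided by the smallest positive weight and rounded;
    factors with non-positive weights are dropped to keep the score simple.
    """
    positive = {name: w for name, w in weights.items() if w > 0}
    if not positive:
        return {}
    smallest = min(positive.values())
    return {name: int(round(w / smallest)) for name, w in positive.items()}


def risk_score(record, points):
    """Sum the points of every factor present in the record."""
    return sum(p for name, p in points.items() if record[name])


def c_statistic(records, points, outcome="event"):
    """Chance that a patient with the event outscores one without (ties count 0.5)."""
    with_event = [risk_score(r, points) for r in records if r[outcome]]
    without_event = [risk_score(r, points) for r in records if not r[outcome]]
    if not with_event or not without_event:
        raise ValueError("need patients both with and without the event")
    concordant = 0.0
    for s1 in with_event:
        for s0 in without_event:
            if s1 > s0:
                concordant += 1.0
            elif s1 == s0:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))


if __name__ == "__main__":
    # Synthetic cohort: all names and effect sizes below are made up.
    rng = random.Random(42)
    factors = ["diabetes", "smoker", "steroid_use"]
    true_effect = {"diabetes": 0.8, "smoker": 0.4, "steroid_use": 1.2}
    cohort = []
    for _ in range(2000):
        patient = {f: rng.random() < 0.3 for f in factors}
        logit = -2.5 + sum(true_effect[f] for f in factors if patient[f])
        patient["event"] = rng.random() < 1 / (1 + math.exp(-logit))
        cohort.append(patient)

    development, validation = split_cohort(cohort)
    weights = {f: log_odds_ratio(development, f) for f in factors}
    points = points_from_weights(weights)
    print("points per factor:", points)
    print("validation c-statistic:", round(c_statistic(validation, points), 3))
```

The points are derived only from the development records, and the c-statistic is computed only on the validation records, so the reported discrimination reflects patients the score has not seen. In practice one would also compare the validation value with the development value to check for overfitting.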
References

Primary Sources

American College of Surgeons (ACS). National Surgical Quality Improvement Program. American College of Surgeons. 2014a. http://site.acsnsqip.org/. Accessed 19 Sept 2014.
Andersen R, Newman J. Societal and individual determinants of medical care utilization in the United States. Milbank Q. 2005;83(4):1–28.
Birkmeyer J. Progress and challenges in improving surgical outcomes. Br J Surg. 2012;99(11):1467–9.
Cohen M, Dimick J, Bilimoria K, Clifford K, Richards K, Hall B. Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling. J Am Coll Surg. 2009a;209(6):687–93.
Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44:166–206.
Fleming F, Thomas R, DeMets D. Surrogate end points in clinical trials: are we being misled? Ann Intern Med. 1996;125(7):605–13.
Hospital Consumer Assessment of Healthcare Providers and Systems. HCAHPS: Hospital Consumer Assessment of Healthcare Providers and Systems. 2014. http://www.hcahpsonline.org/home.aspx. Accessed 5 May 2015.
Maxwell R. Quality assessment in health. Br Med J. 1984;288(6428):1470.
Medicare.gov. The official U.S. Government Site for Medicare. Safe Surgery Checklist Use. In: Medicare.gov. 2014. http://www.medicare.gov/hospitalcompare/hospital-safe-surgery-checklist.html?AspxAutoDetectCookieSupport=1. Accessed 4 May 2015.
Pucher P, Rajesh A, Pritam S, Ara D. Enhancing surgical performance outcomes through process-driven care: a systematic review. World J Surg. 2014;38(6):1362–73.
Schiff GD, Rucker D. Beyond structure–process–outcome: Donabedian’s seven pillars and eleven buttresses of quality. Jt Comm J Qual Patient Saf. 2001;27(3):169–74.
Sinclair A, Schymura M, Boscoe F, Yung R, Chen K, Roohan P, Tai E, Schrag D. Measuring colorectal cancer care quality for the publicly insured in New York State. Cancer Med. 2012;1(3):363–71. https://doi.org/10.1002/cam4.30.
The Joint Commission Core Measure Sets. 2014a. http://www.jointcommission.org/core_measure_sets.aspx. Accessed 19 Sept 2014.
The Society of Thoracic Surgeons National Database. The Society of Thoracic Surgeons National Database. 2014. http://www.sts.org/national-database. Accessed 19 Sept 2014.
Tufts Medical Center. Cost-Effectiveness Analysis Registry. In: Cost-Effectiveness Analysis Registry. 2014. https://research.tufts-nemc.org/cear4/. Accessed 5 May 2015.

Secondary Sources

Abbas MA, Chang GJ, Read TE, Rothenberger DA, Garcia-Aguilar J, Peters W, Monson JR, Sharma A, Dietz DW, Madoff RD, Fleshman JW, Greene FL, Wexner SD, Remzi FH. Optimizing rectal cancer management: analysis of current evidence. Dis Colon Rectum. 2014;57(2):252–9. https://doi.org/10.1097/dcr.0000000000000020.
Adachi W, Lodolce AE. Use of failure mode and effects analysis in improving the safety of iv drug administration. Am J Health-Syst Pharm. 2005;62(9):917–22.
Adamina M, Kehlet H, Tomlinson G, Senagore A, Delaney C. Enhanced recovery pathways optimize health outcomes and resource utilization: a meta-analysis of randomized controlled trials in colorectal surgery. Surgery. 2011;149(6):830–40. https://doi.org/10.1016/j.surg.2010.11.003.
Aday L, Andersen R. A framework for the study of access to medical care. Health Serv Res. 1974;9(3):208.
Agency for Healthcare Research and Quality. National Healthcare Quality & Disparities Report, 2008. US Department of Health and Human Services. 2009.
Akinci M, Yilmaz KB, Kulah B, Seker GE, Ugurlu C, Kulacoglu H. Association of ventral incisional hernias with comorbid diseases. Chirurgia. 2013;108:807–11.
Almoudaris A, Burns E, Bottle A, Aylin P, Darzi A, Vincent C, Faiz O. Single measures of performance do not reflect overall institutional quality in colorectal cancer surgery. Gut. 2013;62(3):423–9.
American Cancer Society. What is cancer recurrence? In: When cancer comes back: cancer recurrence. 2014. http://www.cancer.org/treatment/survivorshipduringandaftertreatment/understandingrecurrence/whenyourcancercomesback/when-cancer-comes-back-what-is-recurrence. Accessed 7 Jul 2016.
American College of Surgeons. American College of Surgeons (ACS). In: American College of Surgeons. 2014b. https://www.facs.org/. Accessed 19 Sept 2014.
American College of Surgeons, Commission on Cancer, Surgical Care Improvement Project. Core measure sets. In: The Joint Commissions. 2014b. http://www.jointcommission.org/surgical_care_improvement_project/. Accessed 10 May 2015.
American College of Surgeons. Commission on Cancer. In: American College of Surgeons. 2014c. https://www.facs.org/quality-programs/cancer. Accessed 19 Sept 2014.
Apkon M, Leonard J, Probst L, DeLizio L, Vitale R. Design of a safer approach to intravenous drug infusions: failure mode effects analysis. Qual Saf Health Care. 2004;13(4):265–71.
Aquina C, Kelly K, Probst C, Noyes K, Langstein H, Monson JR, Fleming F. Surgeon and facility volume play significant role in hernia recurrence and reoperation after open incisional hernia repair. SSAT 55th annual meeting, Chicago;2014a. 2–6 May 2014.
Aquina C, Rickles A, Iannuzzi JC, Kelly K, Probst C, Noyes K, Monson JR, Fleming FJ. Centers of
excellence have lower ostomy-relatedNsquip. Tripartite Birmingham;2014b. 30 June 30–3 July 2014.
Atkin W. Options for screening for colorectal cancer. Scand J Gastroenterol. 2003;38(237):13–6.
Atla P, Sheikh M, Mascarenhas R, Choudhury J, Mills P. Survival of patients with hepatocellular carcinoma in the San Joaquin Valley: a comparison with California Cancer Registry data. Ann Gastroenterol. 2012;25(2):138.
Australian Commission on Safety and Quality in Health Care. Windows into Safety and Quality in Health Care. 2008.
Begg C, Cramer L, Hoskins W, Brennan M. Impact of hospital volume on operative mortality for major cancer surgery. JAMA. 1998;280(20):1747–51.
Birkmeyer JD, Siewers AE, Finlayson EVA, Stukel TA, Lee Lucas F, Batista I, Gilbert Welch H, Wennberg DE. Hospital volume and surgical mortality in the United States. N Engl J Med. 2002;346(15):1128–37.
Blackburn H. Research and demonstration projects in community cardiovascular disease prevention. J Public Health Policy. 1983;4:398–421.
Borzecki A, Christiansen C, Chew P, Loveland S, Rosen A. Comparison of in-hospital versus 30-day mortality assessments for selected medical conditions. Med Care. 2010;48(12):1117–21.
Brown A, Patterson D. To err is human. In: Proceedings of the first workshop on evaluating and architecting system dependability (EASY’01). 2001.
Brown M, Riley G, Schussler N, Etzioni R. Estimating health care costs related to cancer treatment from SEER-Medicare data. Med Care. 2002;40(8):IV104–17. https://doi.org/10.2307/3767931.
Brownson R, Smith C, Pratt M, Mack N, Jackson-Thompson J, Dean C, Dabney S, Wilkerson. Preventing cardiovascular disease through community-based risk reduction: the Bootheel Heart Health Project. Am J Public Health. 1996;86(2):206–13.
Burger CD, Roger RK. “Ventilator bundle” approach to prevention of ventilator-associated pneumonia. Mayo Clin Proc. 2006;81(6):849–50. https://doi.org/10.4065/81.6.849.
Burgmeier J. Failure mode and effect analysis: an application in reducing risk in blood transfusion. Jt Comm J Qual Patient Saf. 2002;28(6):331–9.
Burton S, Brown G, Daniels I, Norman A, Mason B, Cunningham D. MRI directed multidisciplinary team preoperative treatment strategy: the way to eliminate positive circumferential margins? Br J Cancer. 2006;94(3):351–7.
CA Society of thoracic Surgeons. California Cardiac Surgery and Intervention Project (CCSIP). In: California Cardiac Surgery Intervention project. 2014. http://www.californiacardiacsurgery.com/CCSIP-2012/index.html. Accessed 19 Sept 2014.
Caglià P, Tracia A, Borzì L, Amodeo L, Tracia L, Veroux M, Amodeo C. Incisional hernia in the elderly: risk factors and clinical considerations. Intern J Surg. 2014;12(Suppl 2):S164–9.
Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries J, Bruce B, Rose M. The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007;45(5 Suppl 1):S3.
Cella D, Gershon R, Bass M, Rothrock N. What is assessment center. In: Assessment Center. 2014. https://www.assessmentcenter.net/. Accessed 7 July 2016.
Centers for Medicare and Medicaid Services (CMS). Physician Quality Reporting System (PQRS): maintenance of Certification Program Incentive. In: eHealth University. 2014. https://www.cms.gov/eHealth/downloads/eHealthU_PQRSMaintenanceCertification-.pdf. Accessed 6 Jul 2016.
Chang J, Vines E, Bertsch H, Fraker D, Czerniecki B, Rosato E, Lawton T, Conant E, Orel S, Schuchter L, Fox K, Zieber N, Glick J, Solin L. The impact of a multidisciplinary breast cancer center on recommendations for patient management: the University of Pennsylvania experience. Cancer. 2001;91(7):1231–7.
Chantler C. The role and education of doctors in the delivery of health care*. The Lancet. 1999;353(9159):1178–81. https://doi.org/10.1016/S0140-6736(99)01075-2.
Cohen M, Bilimoria K, Ko C, Hall B. Development of an American College of Surgeons National Surgery Quality Improvement Program: morbidity and mortality risk calculator for colorectal surgery. J Am Coll Surg. 2009b;208(6):1009–16.
Cohn JN. Introduction to surrogate markers. Circulation. 2004;109(25 Suppl 1):IV-20–21.
Collaborative Endocrine Surgery Quality Improvement Collective. Collaborative Endocrine Surgery Quality Improvement Program (CESQIP). In: The American Association of Endocrine Surgeons. 2014. http://cesqip.org/. Accessed 19 Sept 2014.
Coory M, Gkolia P, Yang I, Bowman R, Fong K. Systematic review of multidisciplinary teams in the management of lung cancer. Lung Cancer. 2008;60(1):14–21.
Department of Health Office/Welsh. A policy framework for commissioning cancer services (Calman-Hine report). London: Department of Health; 1995.
Department of Health. The NHS Cancer plan: a plan for investment, a plan for reform. In: Publications. 2000. http://webarchive.nationalarchives.gov.uk/+/www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyandGuidance/DH_4009609. Accessed 19 Sept 2014.
DeRosier J, Stalhandske E, Bagian JP, Nudell T. Using health care failure mode and effect analysis™: the VA National Center for Patient Safety’s prospective risk analysis system. Jt Comm J Qual Patient Saf. 2002;28(5):248–67. http://www.patientsafety.va.gov/professionals/onthejob/hfmea.asp.
Devlin N, Appleby J. Getting the most out of PROMs: putting health outcomes at the heart of NHS decision-making. London: The King’s Fund; 2010.
Drummond MF, Sculpher M, Torrance GW, O’Brien BJ, Stoddart G. Methods for the economic evaluation of health care programmes. 3rd ed. New York: Oxford University Press; 2005.
EHealth University: Centers for Medicare & Medicaid Services. Physician Quality Reporting System (PQRS): Maintenance of Certification Program Incentive. CMS. 2014.
Etzioni DA, Cannom RR, Madoff RD, Ault GT, Beart Jr RW. Colorectal procedures: what proportion is performed by American Board of Colon and Rectal Surgery–certified surgeons? Dis Colon Rectum. 2010;53(5):713–20.
Etzioni DA, Young-Fadok TM, Cima RR, Wasif N, Madoff RD, Naessens JM, Habermann EB. Patient survival after surgical treatment of rectal cancer: impact of surgeon and hospital characteristics. Cancer. 2014;120(16):2472–81.
Fawcett S, Lewis R, Paine-Andrews A, Francisco V, Richter K, Williams E, Copple B. Evaluating community coalitions for prevention of substance abuse: the case of project freedom. Health Educ Behav. 1997;24(6):812–28.
Ferguson G, Eliasziw M, Barr H, Clagett P, Barnes R, Wallace C, Taylor W, Haynes B, Finan J, Hachinski V, Barnett H, for the North American Symptomatic Carotid Endarterectomy Trial Collaborators. The North American Symptomatic Carotid Endarterectomy Trial: surgical results in 1415 patients. Stroke. 1999;30(9):1751–8. https://doi.org/10.1161/01.str.30.9.1751.
Finks J, Osborne N, Birkmeyer J. Trends in hospital volume and operative mortality for high-risk surgery. N Engl J Med. 2011;364(22):2128–37.
Fletcher C. Failure mode and effects analysis: an interdisciplinary way to analyze and reduce medication errors. J Nurs Adm. 1997;27(12):19–26.
Froimson M, Rana A, White R, Marshall A, Schutzer S, Healy W, Naas P, Daubert G, Lorio R, Parsley B. Bundled payments for care improvement initiative: the next evolution of payment formulations: AAHKS Bundled Payment Task Force. J Arthroplasty. 2013;28(8):157–65.
Gabel M, Hilton N, Nathanson S. Multidisciplinary breast cancer clinics. Do they work? Cancer. 1997;79(12):2380–4.
Gartner. Big Data. In: It Glossary. 2013. http://www.gartner.com/it-glossary/big-data/. Accessed 19 Sept 2014.
Gatt M, Anderson A, Reddy B, Hayward-Sampson P, Tring I, MacFie J. Randomized clinical trial of multimodal optimization of surgical care in patients undergoing major colonic resection. Br J Surg. 2005;92(11):1354–62. https://doi.org/10.1002/bjs.5187.
Goodman R, Wheeler F, Lee P. Evaluation of the Heart To Heart Project: lessons from a community-based chronic disease prevention project. Am J Health Promot. 1995;9(6):443–55.
Goss P, Lee B, Badovinac-Crnjevic T, Strasser-Weippl K, Chavarri-Guerra Y, Louis J, Villarreal-Garza C, Unger-Saldana K, Ferreyra M, Debiasi M, Liedke P, Touya D, Werutsky G, Higgins M, Fan L, Vasconcelos C, Cazap E, Vallejos C, Mohar A, Knaul F, Arreola H, Batura R, Luciani S, Sullivan R, Finkelstein D, Simon S, Barrios C, Kightlinger R, Gelrud A, Bychkovsky V, Lopes G, Stefani S, Blaya M, Souza F, Santos F, Kaemmerer A, Azambuja E, Zorilla A, Murillo R, Jeronimo J, Tsu V, Carvalho A, Gil C, Sternberg C, Duenas-Gonzalez A, Sgroi D, Cuello M, Fresco R, Reis R, Masera G, Gabus R, Ribeiro R, Knust R, Ismael G, Rosenblatt E, Roth B, Villa L, Solares A, Leon M, Torres-Vigil I, Covarrubias-Gomez A, Hernandez A, Bertolino M, Schwartsmann G, Santillana S, Esteva F, Fein L, Mano M, Gomez H, Hurlbert M, Durstine A, Azenha G. Planning cancer control in Latin America and the Caribbean. Lancet Oncol. 2013;14(5):391–436. https://doi.org/10.1016/S1470-2045(13)70048-2.
Group, COMMIT Research. Community Intervention Trial for Smoking Cessation (COMMIT): summary of design and intervention. J Natl Cancer Inst. 1991;83(22):1620–8.
Grube B, Giuliano A. Observation of the breast cancer patient with a tumor-positive sentinel node: implications of the ACOSOG Z0011 trial. Semin Surg Oncol. 2001;20(3):230–7.
Hannan E, Kilburn H, O’Donnell J, Lukacik G, Shields E. Adult open heart surgery in New York State: an analysis of risk factors and hospital mortality rates. JAMA. 1990;264(21):2768–74.
Hannan E, Siu A, Kumar D, Kilburn H, Chassin M. The decline in coronary artery bypass graft surgery mortality in New York State: the role of surgeon volume. JAMA. 1995;273(3):209–13.
Hannan E, Cozzens K, King S, Walford G, Shah N. The New York State cardiac registries history, contributions, limitations, and lessons for future efforts to assess and publicly report healthcare outcomes. J Am Coll Cardiol. 2012;59(25):2309–16.
Hannan E, Farrell L, Wechsler A, Jordan D, Lahey S, Culliford A, Gold J, Higgins R, Smith C. The New York risk score for in-hospital and 30-day mortality for coronary artery bypass graft surgery. Ann Thorac Surg. 2013;95(1):46–52.
Hannan EL, Kilburn H, Racz M, Shields E, Chassin MR. Improving the outcomes of coronary artery bypass surgery in New York State. JAMA. 1994;271(10):761–6.
Haynes A, Weiser T, Berry W, Lipsitz SR, Breizat A, Dellinger P, Herbosa T, Joseph S, Kibatala P, Lapitan M. A surgical safety checklist to reduce morbidity and mortality in a global population. N Engl J Med. 2009;360(5):491–9.
Healey AN, Sevdalis N, Vincent CA. Measuring intra-operative interference from distraction and interruption observed in the operating theatre. Ergonomics. 2006;49(5–6):589–604.
Hospital Consumer Assessment of Healthcare Providers and Systems. HCAHPS: Hospital consumer assessment of healthcare providers and systems. In: Hospital Consumer Assessment of Healthcare Providers and
Systems. 2014. http://www.hcahpsonline.org/home.aspx. Accessed 19 Sept 2014.
Howell E, Devaney B, McCormick M, Raykovich K. Back to the future: community involvement in the healthy start program. J Health Polit Policy Law. 1998;23(2):291–317.
Howell A, Panesar S, Burns E, Donaldson L, Darzi A. Reducing the burden of surgical harm: a systematic review of the interventions used to reduce adverse events in surgery. Ann Surg. 2014;259(4):630–41.
Hurtado M, Swift E, Corrigan J. Crossing the quality chasm: a new health system for the 21st century. Institute of Medicine, Committee on the National Quality Report on Health Care Delivery. 2001.
Iannuzzi J, Chandra A, Rickles A, Kumar N, Kelly K, Gillespie D, Monson J, Fleming F. Resident involvement is associated with worse outcomes after major lower extremity amputation. J Vasc Surg. 2013a;58(3):827–31.e1. https://doi.org/10.1016/j.jvs.2013.04.046.
Iannuzzi J, Rickles A, Deeb A, Sharma A, Fleming F, Monson J. Outcomes associated with resident involvement in partial colectomy. Dis Colon Rectum. 2013b;56(2):212–8. https://doi.org/10.1097/DCR.0b013e318276862f.
Iannuzzi J, Rickles A, Kelly K, Rusheen A, Dolan J, Noyes K, Monson J, Fleming F. Perioperative pleiotropic statin effects in general surgery. Surgery. 2013c. https://doi.org/10.1016/j.surg.2013.11.008.
Iannuzzi J, Young K, Kim M, Gillespie D, Monson J, Fleming F. Prediction of postdischarge venous thromboembolism using a risk assessment model. J Vasc Surg. 2013d;58(4):1014–20.e1. https://doi.org/10.1016/j.jvs.2012.12.073.
Iannuzzi J, Chandra A, Kelly K, Rickles A, Monson J, Fleming F. Risk score for unplanned vascular readmissions. J Vasc Surg. 2014a. https://doi.org/10.1016/j.jvs.2013.11.089.
Iannuzzi J, Rickles A, Kelly K, Fleming F, Dolan J, Monson J, Noyes K. Defining high risk: cost-effectiveness of extended-duration thromboprophylaxis following major oncologic abdominal surgery. J Gastrointest Surg. 2014b;18(1):60–8. https://doi.org/10.1007/s11605-013-2373-4.
Iezzoni L. Risk adjusting rehabilitation outcomes: an overview of methodologic issues. Am J Phys Med Rehabil. 2004;83(4):316–26.
Iezzoni L, Long-Bellil L. Training physicians about caring for persons with disabilities: “Nothing about us without us!”. Disabil Health J. 2012;5(3):136–9.
Institute for Healthcare Improvement. In: Raising the bar with bundles: treating patients with an all-or-nothing standard. Institute for Healthcare Improvement. 2006. www.ihi.org/IHI/Topics/CriticalCare/Intensive. Accessed 9 June 2014.
Jensen C, Prasad L, Abcarian H. Cost-effectiveness of laparoscopic vs open resection for colon and rectal cancer. Dis Colon Rectum. 2012;55(10):1017–23.
Johnston J, Marmet P, Coen S, Fawcett S, Harris K. Kansas LEAN: an effective coalition for nutrition education and dietary change. J Nutr Educ. 1996;28(2):115–8.
Johnston M, Arora S, King D, Stroman L, Darzi A. Escalation of care and failure to rescue: a multicenter, multiprofessional qualitative study. Surgery. 2014;155(6):989–94.
Katz J, Barrett J, Mahomed N, Baron J, Wright J, Losina E. Association between hospital and surgeon procedure volume and the outcomes of total knee replacement. J Bone Joint Surg. 2004;86(9):1909–16.
Kelly K, Iannuzzi J, Rickles A, Monson J, Fleming F. Risk factors associated with 30-day postoperative readmissions in major gastrointestinal resections. J Gastrointest Surg. 2014a;18(1):35–44. https://doi.org/10.1007/s11605-013-2354-7.
Kelly KN, Fleming FJ, Aquina CT, Probst CP, Noyes K, Pegoli W, Monson JRT. Disease severity, not operative approach, drives organ space infection after pediatric appendectomy. Ann Surg. 2014b;260(3):466–73.
Korndorffer J, Fellinger E, Reed W. SAGES guideline for laparoscopic appendectomy. Surg Endosc. 2010;24(4):757–61.
Kunac D, Reith D. Identification of priorities for medication safety in neonatal intensive care. Drug Saf. 2005;28(3):251–61.
Leatherman S, Sutherland K. The quest for quality in the NHS: a mid-term evaluation of the ten-year quality agenda. London: Stationery Office; 2003.
LeBlanc K, Booth W, Whitaker J, Bellanger D. Laparoscopic incisional and ventral herniorrhaphy in 100 patients. Am J Surg. 2000;180(3):193–7.
Lee P, Regenbogen S, Gawande A. How many surgical procedures will Americans experience in an average lifetime? Evidence from three states. Massachusetts Chapter of the American College of Surgeons 55th Annual Meeting, Boston. 2008.
Lee H, Vlaev I, King D, Mayer E, Darzi A, Dolan P. Subjective well-being and the measurement of quality in healthcare. Soc Sci Med. 2013;99:27–34.
Levit L, Balogh E, Nass S, Ganz P. Delivering high-quality cancer care: charting a new course for a system in crisis. Washington, DC: National Academies Press; 2013.
Lindenauer P, Remus D, Roman S, Rothberg M, Benjamin E, Ma A, Bratzler D. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356(5):486–96.
Luft H, Bunker J, Enthoven A. Should operations be regionalized? The empirical relation between surgical volume and mortality. N Engl J Med. 1979;301(25):1364–9.
Mack J, Chen K, Boscoe F, Gesten F, Roohan P, Weeks J, Schymura M, Schrag D. Underuse of hospice care by medicaid-insured patients with stage IV lung cancer in New York and California. J Clin Oncol. 2013;2012(45):9271.
Marescaux J, Leroy J, Rubino F, Smith M, Vix M, Simone M, Mutter D. Transcontinental robot-assisted remote
telesurgery: feasibility and potential applications. Ann Surg. 2002;235(4):487.
Martin S, Heneghan H, Winter D. Systematic review and meta-analysis of outcomes following pathological complete response to neoadjuvant chemoradiotherapy for rectal cancer. Br J Surg. 2012;99(7):918–28.
Mayer J, Soweid R, Dabney S, Brownson C, Goodman R, Brownson R. Practices of successful community coalitions: a multiple case study. Am J Health Behav. 1998;22(5):368–77.
Mayo Clinic Staff. Cancer survival rate: What it means for your prognosis. In: Diseases and Conditions Cancer. 2016. http://www.mayoclinic.org/diseases-conditions/cancer/in-depth/cancer/art-20044517. Accessed 7 Jul 2016.
McDermott R, Mikulak R, Beauregard M. The basics of FMEA: quality resources. New York: CRC Press; 1996.
McNally K, Page M, Sunderland V. Failure-mode and effects analysis in improving a drug distribution system. Am J Health-Syst Pharm. 1997;54(2):171–7.
Morris E, Haward R, Gilthorpe M, Craigs C, Forman D. The impact of the Calman-Hine report on the processes and outcomes of care for Yorkshire’s colorectal cancer patients. Br J Cancer. 2006;95(8):979–85. https://doi.org/10.1038/sj.bjc.6603372.
Muhlestein D. What type of hospitals have high charge-to-reimbursement ratios? In: Health Affairs Blog. 2014. 2013. http://healthaffairs.org/blog/2013/07/15/what-types-of-hospitals-have-high-charge-to-reimbursement-ratios/. Accessed 19 Sept 2014.
Mukamel D, Mushlin A. Quality of care information makes a difference: an analysis of market share and price changes after publication of the New York State Cardiac Surgery Mortality Reports. Med Care. 1998;36(7):945–54.
Nagpal K, Vats A, Ahmed K, Smith A, Sevdalis N,

appliedresearch.cancer.gov/seermedicare/. Accessed 15 May 2015.
National Institute of Health. PROMIS: patient-reported outcomes measurement information system. In: Programs. 2015a. http://commonfund.nih.gov/promis/index. Accessed 15 May 2015.
National Institute of Health. SEER-Medicare: brief description of the SEER-Medicare database. In: Healthcare Delivery Research. 2015b. http://healthcaredelivery.cancer.gov/seermedicare/overview/. Accessed 7 Jul 2016.
New York State Department of Health. Cardiovascular disease data and statistics. In: New York State Department of Health. 2014. https://www.health.ny.gov/statistics/diseases/cardiovascular/. Accessed 7 Jul 2016.
Nussbaum D, Speicher P, Ganapathi A, Englum B, Keenan J, Mantyh C, Migaly J. Laparoscopic versus open low anterior resection for rectal cancer: results from the national cancer data base. J Gastrointest Surg. 2014;19(1):124–31. https://doi.org/10.1007/s11605-014-2614-1.
Optimizing the Surgical Treatment of Rectal Cancer (OSTRiCh). OSTRiCh consortium. In: OSTRiCh Consortium. 2014. http://www.ostrichconsortium.org/news_archive.htm#.VByQjBYgIhU. Accessed 19 Sept 2014.
Press Ganey Associates, Inc. HCAHPS Insights. In: Our solutions. 2014. http://www.pressganey.com/ourSolutions/patient-voice/regulatory-surveys/hcahps-survey.aspx. Accessed 19 Sept 2014.
Pronovost P, Needham D, Berenholtz S, Sinopoli D, Chu H, Cosgrove S, Sexton B, Hyzy R, Welsh R, Roth G, Bander J, Kepros J, Goeschel C. An intervention to decrease catheter-related bloodstream infections in the ICU. N Engl J Med. 2006;355(26):2725–32. https://doi.org/10.1056/NEJMoa061115.
Rea R, Falco P, Izzo D, Leongito M, Amato B. Laparoscopic ventral hernia repair with primary transparietal
Jonannsson H, Vincent C, Moorthy K. A systematic closure of the hernial defect. BMC Surg. 2012;12 Suppl
quantitative assessment of risks associated with poor 1:S33.
communication in surgical care. Arch Surg. 2010;145 Reames B, Ghaferi A, Birkmeyer J, Dimick J. Hospital
(6):582–8. volume and operative mortality in the modern era.
Nallamothu B, Bates E, Wang Y, Bradley E, Krumholz H. Annal Surg. 2014;260(2):244–51.
Driving times and distances to hospitals with percuta- Research Data Assistance Center. MCBS access to care. In
neous coronary intervention in the United States impli- Research Data Assistance Center. 2014. http://www.
cations for prehospital triage of patients with ST- resdac.org/cms-data/files/mcbs-access-care. Accessed
elevation myocardial infarction. Circulation. 2006;113 19 Sept 2014.
(9):1189–95. Research Data Assistance Center. MCBS cost and use.
National Cancer Institute. SEER-Medicare: Brief descrip- In: Research Data Assistance Center. 2015. http://
tion of the SEER-medicare database. In: Healthcare www.resdac.org/cms-data/files/mcbs-cost-and-use.
Delivery Research. 2014. http://healthcaredelivery.can Accessed 5 May 2015.
cer.gov/seermedicare/overview/. Accessed 5 May Rickles A, Iannuzzi J, Kelly K, Cooney R, Brown D,
2014. Davidson M, Hellenthal N, Max C, Johnson J,
National Health Services. Consultant outcome data. In: My DeTraglia J, McGurrin M, Kimball R, DiBenedetto A,
NHS. 2015. http://www.nhs.uk/choiceintheNHS/ Galyon D, Esposito S, Noyes K, Monson J, Fleming F.
Yourchoices/consultant-choice/Pages/consultant-data. Anastomotic leak or organ space surgical site infection:
aspx. Accessed 14 May 2015. what are we missing in our quality improvement pro-
National Institute of Health. SEER-Medicare linked data- grams? Surgery. 2013;154(4):680–7. https://doi.org/
base. In: Healthcare Delivery Research. 2014. http:// 10.1016/j.surg.2013.06.035; discussion 687–9.
170 K. Noyes et al.
Rosen M, Brody F, Ponsky J, Walsh R, Rosenblatt S, small intestinal adenocarcinoma. Int J Colorectal Dis.
Duperier F, Fanning A, Siperstein A. Recurrence after 2013. https://doi.org/10.1007/s00384-013-1689-6.
laparoscopic ventral hernia repair. Surg Endosc Other Wayne A, Lodolce A. Use of failure mode and effects
Intervent Tech. 2003;17(1):123–8. analysis in improving the safety of IV drug administra-
Roussos S, Fawcett S. A review of collaborative partner- tion. Am J Health-Syst Pharm. 2005;62(9):917–22.
ships as a strategy for improving community health. Wehrli-Veit M, Riley J, Austin J. A failure mode effect
Ann Rev Public Health. 2000;21(1):369–402. https:// analysis on extracorporeal circuits for cardiopulmonary
doi.org/10.1146/annurev.publhealth.21.1.369. bypass. J Extra Corpor Technol. 2004;36(4):351–7.
Rutter C, Johnson E, Feuer E, Knudsen A, Kuntz K, Schrag Weigl M, Antoniadis S, Chiapponi C, Bruns C, Sevdalis N.
D. Secular trends in colon and rectal cancer relative The impact of intra-operative interruptions on sur-
survival. J Natl Cancer Inst. 2013;105:1806–13. geons’ perceived workload: an observational study in
Schone E, Brown R. Risk adjustment: what is the current state elective general and orthopedic surgery. Surg Endosc.
of the art and how can it be improved? In: Robert Wood 2015;29(1):145–53.
Johnson Foundation. 2013. http://www.rwjf.org/en/ Weinstein R, Linkin D, Sausman C, Santos L, Lyons C,
library/research/2013/07/risk-adjustment—what-is-the- Fox C, Aumiller L, Esterhai J, Pittman B, Lautenbach
current-state-of-the-art-and-how-c.html. Accessed 19 Sept E. Applicability of healthcare failure mode and effects
2014. analysis to healthcare epidemiology: evaluation of the
Schweizer M, Cullen J, Perencevich E, Vaughan S. Costs sterilization and use of surgical instruments. Clin Infect
associated with surgical site infections in veterans Dis. 2005;41(7):1014–9.
affairs hospitals. JAMA Surg. 2014. https://doi.org/ Weir V. Best-practice protocols: preventing adverse drug
10.1001/jamasurg.2013.4663. events. Nurs Manage. 2005;36(9):24–30.
Sharma R, Hawley C, Griffin R, Mundy J, Peters P, Shah P. Weiser T, Regenbogen S, Thompson K, Haynes A, Lipsitz
Cardiac surgical outcomes in abdominal solid organ S, Berry W, Gawande A. An estimation of the global
(renal and hepatic) transplant recipients: a case matched volume of surgery: a modelling strategy based on avail-
study. Heart Lung Circ. 2011;20(12):804–5. able data. Lancet. 2008;372(9633):139–44.
Sitzia J, Wood N. Patient satisfaction: a review of issues Weiser T, Semel M, Simon A, Lipsitz S, Haynes A, Funk L,
and concepts. Soc Sci Med. 1997;45(12):1829–43. Berry W, Gawande A. In-hospital death following inpa-
Society for Surgery of the Alimentary Tract. The society tient surgical procedures in the United States,
for surgery of the alimentary tract. In: The Society for 1996–2006. World J Surg. 2011;35(9):1950–6.
Surgery of the Alimentary Tract. 2016. http://www. Wetterneck T, Skibinski K, Roberts T, Kleppin S,
ssat.com/. Accessed 6 Jul 2016. Schroeder M, Enloe M, Rough S, Hundt A, Carayon
Society for Surgical Oncology. SSO: Society for surgical P. Using failure mode and effects analysis to plan
oncology. In: Society for Surgical Oncology. 2014. implementation of smart IV pump technology. Am J
http://www.surgonc.org/. Accessed 19 Sept 2014. Health-Syst Pharm. 2006;63(16):1528–38.
Solomon D, Losina E, Baron J, Fossel A, Guadagnoli E, Whitlock E, Lin J, Liles E, Beil T, Fu R. Screening for
Lingard E, Miner A, Phillips C, Katz J. Contribution of colorectal cancer: a targeted, updated systematic review
hospital characteristics to the volume–outcome rela- for the US Preventive Services Task Force. Ann Intern
tionship: dislocation and infection following total hip Med. 2008;149(9):638–58.
replacement surgery. Arthritis Rheum. 2002;46 Wiegmann D, ElBardissi A, Dearani J, Daly R, Sundt III T.
(9):2436–44. Disruptions in surgical flow and their relationship to
Stephens M, Lewis W, Brewster A, Lord I, Blackshaw G, surgical errors: an exploratory investigation. Surgery.
Hodzovic I, Thomas G, Roberts S, Crosby T, Gent C, 2007;142(5):658–65.
Allison M, Shute K. Multidisciplinary team management Wille-Jorgensen P, Sparre P, Glenthoj A, Holck S,
is associated with improved outcomes after surgery for Norgaard Petersen L, Harling H, Stub Hojen H,
esophageal cancer. Dis Esophagus. 2006;19(3):164–71. Bulow S. Result of the implementation of multidis-
https://doi.org/10.1111/j.1442-2050.2006.00559.x. ciplinary teams in rectal cancer. Colorectal Dis.
U.S. Department of Health & Human Services. Data. In: 2013;15(4):410–3. https://doi.org/10.1111/codi.12013.
Organ Procurement and Transplantation Network. 2014. World Alliance for Patient Safety. WHO surgical safety
http://optn.transplant.hrsa.gov/data/. Accessed 7 Jul 2016. checklist and implementation manual. In: World Health
Ugiliweneza B, Kong M, Nosova K, Huang BA, Babu R, Organization. 2014. http://www.who.int/patientsafety/
Klad SP, Boakye M. Spinal surgery: variations in safesurgery/ss_checklist/en/. Accessed 7 Jul 2016.
healthcare costs and implications for episode-based Yahchouchy-Chouillard E, Aura T, Picone O, Etienne J,
bundled payments. Spine. 2014;39:1235–42. Fingerhut A. Incisional hernias. Digest Surg. 2003;20
Vascular Quality Imitative. Improving vascular care. In: (1):3–9.
Society for Vascular Surgery. 2014. http://www. Zapka J, Marrocco G, Lewis B, McCusker J, Sullivan J,
vascularqualityinitiative.org/. Accessed 19 Sept 2014. McCarthy J, Birch F. Inter-organizational responses to
Wang Y, Jiang C, Guan J, Yang G, Yue J, Chen H, Xue J, AIDS: a case study of the Worcester AIDS Consortium.
Xu Z, Qian Q, Fan L. Molecular alterations of EGFR in Health Educ Res. 1992;7(1):31–46.
8 Health Services Information: From Data to Policy Impact (25 Years of Health Services and Population Health Research at the Manitoba Centre for Health Policy)

Leslie L. Roos, Jessica S. Jarmasz, Patricia J. Martens, Alan Katz, Randy Fransoo, Ruth-Ann Soodeen, Mark Smith, Joshua Ginter, Charles Burchill, Noralou P. Roos, Malcolm B. Doupe, Marni Brownell, Lisa M. Lix, Greg Finlayson, and Maureen Heaman
Contents
Introduction
The Deliverable Process
What Is a Deliverable?
Negotiating the Deliverable Topics
The Approval Process
Meetings
Presentations During the Project
Deliverable Measures and Indicators
Highlights of Selected Deliverables
The “Need to Know” Team Deliverables
Manitoba’s Indigenous Population
Hospitals, Emergency Departments, ICUs, and Long-Term Care
Maternal and Child Health
Knowledge Translation (KT)
The Repository
Other KT Activities
Impact of Large Integrated Data Repositories
Looking Ahead
Summing Up
References

L. L. Roos (*) · J. S. Jarmasz · A. Katz · R. Fransoo ·
R.-A. Soodeen · M. Smith · C. Burchill · N. P. Roos · M. B.
Doupe · M. Brownell
Manitoba Centre for Health Policy, University of
e-mail: Leslie_Roos@cpe.umanitoba.ca
P. J. Martens
J. Ginter
Montreal, QC, Canada
L. M. Lix
Department of Community Health Sciences, University of
G. Finlayson
Finlayson and Associates Consulting, Kingston, ON,
Canada
M. Heaman
College of Nursing, Rady Faculty of Health Sciences,
University of Manitoba, Winnipeg, MB, Canada

https://doi.org/10.1007/978-1-4939-8715-3_9
Abstract

The impact of the Manitoba Centre for Health Policy (MCHP) on policy development has resulted from an integrated approach to knowledge translation (KT), combined with a close relationship between the proposed/ongoing research and those working on provincial programs. Under a 5-year funding agreement with Manitoba Health, the director of MCHP negotiates five new major projects (called “deliverables”) annually with the Deputy Minister of Health. Researchers interact among themselves and with the provincial government in several ways: through forums, advisory group meetings, knowledge translation workshops, and Need to Know (NTK) team meetings. Need to Know representatives are from all the regional health authorities (RHAs), from Manitoba Health, and also from MCHP staff. This and other activities related to knowledge translation are discussed. This chapter outlines steps in the deliverable process. MCHP researchers retain publication rights over the content of the deliverable with government input being advisory only. Several deliverables over the past 15 years, and their program and policy impacts, are discussed. Finally, linking information from various government departments with longitudinal and familial data has created a large, integrated data repository. Looking ahead, life stage analyses and intervention studies have great potential. In keeping with past success, MCHP believes information-rich environments should continue to facilitate opportunities for new types of research and policy analysis.

Introduction

The Manitoba Centre for Health Policy’s (MCHP) impact on policy and program development is the result, in large part, of an integrated approach to knowledge translation. This chapter focuses on this integrated approach which has become one of the key factors underlying MCHP’s success. This chapter begins with a description of the deliverable process and the numerous ways researchers interact with provincial government personnel. MCHP enjoys an arm’s-length relationship with the provincial government, which has no involvement in the interpretation of data or drafting of deliverables (reports), and MCHP retains rights to publish all of its work. Next, the impact several deliverables have had on government policies and programs will be highlighted. Following this, an overview of the knowledge translation (KT) activities that have resulted in so many of MCHP’s impacts is provided. To conclude, important and interesting research opportunities as well as challenges that lie ahead for scientists using information-rich repositories like ours will be discussed.

The Deliverable Process

What Is a Deliverable?

MCHP works under a 5-year funding agreement with Manitoba Health to undertake five new major research projects a year plus KT events that ensure the research is understood by
policy-makers and planners. These projects – termed deliverables – address health and social questions that can best be answered using data from the Population Health Research Data Repository (Repository) which is developed, housed, and maintained at MCHP (see ▶ Chap. 2, “Health Services Data: Managing the Data Warehouse: 25 Years of Experience at the Manitoba Centre for Health Policy”).

Each deliverable takes approximately 2 years to complete. Deliverables are produced by teams that typically include a principal and co-principal investigator (PI and Co-PI), a research coordinator (RC), research support (RS), and data analysts (DAs). Team members are typically chosen based on their area of expertise. Teams typically meet weekly or biweekly throughout the course of a deliverable to discuss the direction and progress of the study, interpret results, and determine how best to “tell the stories” that emerge from the data. A few times over the course of a deliverable, the team also meets with an “advisory group” made up of representatives from government and other stakeholders who have relevant expertise and can provide valuable feedback at different points in the research process (see section “Meetings” for more details).

Negotiating the Deliverable Topics

Topics for deliverables are jointly determined by the Deputy Minister of Manitoba Health in negotiation with the director of MCHP. Consultations with assistant deputy ministers, MCHP scientists, and regional health authorities (RHAs) are undertaken when appropriate. The final list of topics is signed off by the Minister of Manitoba Health. Ideas are solicited from a broad range of stakeholders. If the research seems feasible using repository data, the idea is added to a list. Specific topics are also put forward by Manitoba Health and the Healthy Child Committee of Cabinet (Healthy Child Manitoba is Manitoba’s long-term, cross-departmental strategy for putting families and children first). Negotiations typically start in the fall with final decisions made by the following spring. At that time, Manitoba Health provides MCHP with a brief description of each deliverable. These descriptions are posted on the MCHP website in the area called “Upcoming MCHP Reports” http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/upcoming_deliverables.html.

The associate director of research at MCHP works with the director to assign the investigators for each project. Soon after, a similar process is undertaken by the lead research coordinator, the associate director of data access and use, and the research support coordinator to identify the remaining team members from their respective workgroups (research coordinators, data analysts, and research support). Occasionally, deliverable teams will include graduate students or members outside of MCHP because of their expertise or interest in the topic.

The Approval Process

The PI works with the deliverable team to develop an initial analysis plan, which is then presented and critically reviewed in a research-scientist forum held at MCHP. This forum is attended by internal researchers and team members who help refine the plan. The RC, in collaboration with the PIs, then prepares and submits the Health Information Privacy Committee (HIPC) and Health Research Ethics Board (HREB) applications for approval. Depending on the datasets to be used in the deliverable, additional approvals from other data providers may also be required. Throughout the life of the project, changes to the analysis plan (“amendments”) and annual progress reports must be submitted to HREB in order for the project to maintain its approved status.

Meetings

Meetings of the Advisory Group
An advisory group (AG) is also formed for each deliverable. It includes data providers, clinicians,
health or social service experts, provincial planners, policy-makers, RHA representatives, and other stakeholders with an interest in the topic. This group meets two to three times over the life of the project to review progress, discuss findings, suggest alternative strategies or approaches where necessary, provide clarifications based on their area of expertise, and review the final draft of the deliverable. It is also not uncommon for AG members to be contacted between meetings for their advice on specific issues. A strong relationship with policy-makers and other stakeholders also facilitates access to data and other nonfinancial resources that are important for the success of the research MCHP conducts.

The AG is a critically important group for MCHP; many times the real expertise concerning issues of data collection, history, and use lies with members of the AG. Their input provides an important check on any assumptions the deliverable team may have formed. Occasionally, depending on their contributions, AG members may also be recognized with authorship on the final report.

Meetings with the Associate Director of Research
Throughout the project, PIs meet with the associate director of research regularly to discuss their projects and enlist support if projects are progressing slowly or running into problems. Two common challenges addressed at these meetings include the acquisition of new data or human resource issues (lack of resources, inappropriate skills or expertise, workload conflicts, etc.). These meetings also help to ensure that steady progress is being made and that expectations concerning deadlines are achievable.

Meetings with the Need to Know (NTK) Team
A small number of deliverables involve the Need to Know Team (NTK Team), a collaborative researcher/senior-level-planner group that includes representatives from all RHAs, several representatives of Manitoba Health, and MCHP staff. The NTK Team was established in 2001 through funding from the Canadian Institutes of Health Research (CIHR) and has continued with support from various other sources. The NTK Team meets three times a year for 2-day workshops, together creating knowledge of relevance to regional planners, informing the research, building capacity among the partners, and devising dissemination and application strategies to promote research uptake. Its foundation and goals are simple; by having researchers work with decision-makers, research may be brought closer to policy. In other words, the hope is to smooth the transition between analysis and application, between paper and practice. In 2005 the national “CIHR Knowledge Translation Award” was awarded to the NTK Team for regional impact on health research.

Presentations During the Project

During the life of a typical deliverable, there are numerous opportunities to discuss the project, present preliminary results, and report on progress. Such opportunities include:

• MCHP knowledge translation workshop days – where invited guests consisting of government stakeholders meet with MCHP scientists and support staff to discuss deliverables
  • Provincial RHA Day
  • Winnipeg RHA Day
  • Manitoba Health Day
  • Manitoba Government Day
• Research forums – meetings where invited participants discuss the substantive merits of various research proposals and progress updates
  • Held weekly on Wednesday afternoons at MCHP
• NTK meetings (held two to three times a year, as discussed above)
• MCHP Advisory Board meetings (held biannually)
  • The board consists of five deputy ministers plus leading experts, other academic representatives, and the MCHP executive group.

The main steps in the deliverable process are presented in Table 1.
Table 1 Steps in the deliverable process

Analysis plan and approvals
• Develop draft analysis plan
• Present to researchers for discussion
• Finalize analysis plan
• Apply for approvals (HIPC, HREB, other data approvals as required)

Meetings (ongoing throughout project)
• Weekly/biweekly team meetings begin and continue until draft writing stage
• Advisory group – two to three meetings over course of project

Data preparation and methodology planning (ongoing throughout project)
• Data cleaning/validation for new and established datasets
• Start defining inclusion/exclusion criteria and defining outcome measures and independent variables, statistical methods, etc. (this is a somewhat iterative process throughout project)

Documentation (ongoing throughout project)
• DA(s) document SAS programs and output
• RC documents methodology based on information provided in meetings, email correspondence, and annotated output
• Team identifies concepts; DA/PI (with RC support as necessary) develops by end of project
• RC/DA identify and define key glossary terms

Presentations (various times throughout project)
• MCHP research-scientist forums (two to three per deliverable)
• MCHP government knowledge transfer workshop days
• Academic conferences

Draft report writing and prep for review
• Writing may be ongoing during the course of the deliverable but often occurs close to the end of the analysis
• Internal review by deliverable team and senior reader (an MCHP researcher); feedback sent to PI and modified by PI
• Identification of external reviewers

Delivery to Manitoba Health and External Review (minimum 60 days before release)
• Finalized draft for Manitoba Health to review for factual accuracy and comments regarding the content
• PI, in collaboration with the associate director of research, identifies external reviewers
• Concurrently, copies sent to MCHP researchers, advisory group members, team members, senior reader, external reviewer(s), and relevant data providers

Four-page (or two-page) deliverable lay summary (iterative process between PI and writer)
• Identify writer of lay summary
• Copy of draft report provided to summary writer

Briefings
• Deputy Minister of Health
• Manitoba Health – senior management and assistant deputy minister (other depts. also invited or they may request separate briefing)
• Minister of Health (if requested)
• WRHA (if their data were used and they requested a briefing)
• Other briefings as requested or required

Editing and final report
• Draft revised per reviewers’ feedback
• In-house editor performs content edits and works with PI to finalize deliverable report

Final production
• Review of printers’ proof; approval of printing
• Similar process followed for deliverable lay summary
• Layout and preparations for publishing by RS; report sent to printers

Release date and requirements
• PI and associate director of research consult with communications officer and research support lead to determine deliverable release date and if a media conference is necessary
• Manitoba Health advised of release date

Dissemination
• Embargoed copies to Manitoba Health and select provincial government ministers ~1 week prior
• Communications officer prepares media release
• Release circulated to University’s Public Affairs office and to local media
• Communications officer handles all media-related requests for questions and interviews
• Deliverable and all related content uploaded to MCHP website for public access
• PI responds to interview requests
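
The timing rules in Table 1 (delivery to Manitoba Health a minimum of 60 days before release, embargoed copies about 1 week prior) amount to simple date arithmetic. The following is a minimal Python sketch and not part of MCHP's actual tooling: the function names, the constants, and the example release date are hypothetical, and the 7-day embargo lead is an assumption standing in for the table's "~1 week".

```python
from datetime import date, timedelta

# Assumed values read off Table 1; the names are hypothetical.
MIN_REVIEW_DAYS = 60    # report reaches Manitoba Health at least 60 days before release
EMBARGO_LEAD_DAYS = 7   # assumption: "~1 week prior" taken as exactly 7 days


def release_milestones(release: date) -> dict:
    """Work backwards from a planned release date to the key hand-off dates."""
    return {
        "latest_delivery_to_manitoba_health": release - timedelta(days=MIN_REVIEW_DAYS),
        "embargoed_copies": release - timedelta(days=EMBARGO_LEAD_DAYS),
        "public_release": release,
    }


def review_window_ok(delivered: date, release: date) -> bool:
    """True if the government review window is at least the 60-day minimum."""
    return (release - delivered).days >= MIN_REVIEW_DAYS


if __name__ == "__main__":
    planned_release = date(2015, 6, 1)  # hypothetical release date
    for name, when in release_milestones(planned_release).items():
        print(f"{name}: {when.isoformat()}")
```

Working backwards from the release date, rather than forwards from the draft, matches how the table frames the constraint: the 60 days are a minimum lead time that the release date must respect.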
An important component of the process is engaging communication during the course of each deliverable. The following summarizes most of the important KT activities:

1. Research-based communication
• An advisory group consisting of academics, clinicians (where appropriate), and policy- or program-oriented stakeholders is involved in developing the content of each project.
• During final document review, at least one and possibly two external reviewers who are experts in the field are recruited to review the document.
• Presentations are made at academic conferences.

2. Dissemination to key decision-makers
• Consultations with the Deputy Minister (DM) of Health, Healthy Child Committee of Cabinet, and KT forums are used to disseminate results and collect ideas for future research.
• Core research teams frequently include clinical and policy- or program-oriented content experts.
• The MCHP director briefs the Deputy Minister of Health during regular bimonthly meetings.
• Prior to release, the PI briefs the assistant DM, Manitoba Health, and other stakeholders.
• During the project, numerous briefings are given at government KT workshops.

3. Public dissemination
• A four-page (or two-page) deliverable summary aimed at a lay audience who may be interested in the project is developed.
• A one-page “physician briefing” may be developed if relevant.
• An infographic is designed and produced if relevant.
• A media release is prepared for the release date.
• The PI responds to media requests for information, comments, or interviews.
• Announcements are made through social media.
• All completed deliverables as well as research in progress are posted on the MCHP website: http://mchp-appserv.cpe.umanitoba.ca/deliverablesList.html

In addition, MCHP delivers the report to Manitoba Health at least 60 days before release. This gives the government time to prepare a response to the findings. From an academic perspective, MCHP researchers retain full publication rights over the contents of the deliverable once it has been released and any input from government is advisory only. This arm’s-length relationship with government helps to maintain academic rigor in the development of the final product. Once released, most deliverables form the basis of peer-reviewed articles in academic journals. Prior to any dissemination, all publications and presentations are reviewed by the Manitoba Government (through the HIPC coordinator) (and if necessary by other government departments who have provided data) for privacy and confidentiality issues.

Deliverable Measures and Indicators

Generally, MCHP analyzes data at the population level. This provides an opportunity to present results at a geographic level (i.e., by RHA and/or Winnipeg Community Areas, see Figs. 1 and 2). RHAs are given important information that allows them to improve practices, policies, and healthcare services in their particular region and to make comparisons between regions or with the province as a whole. Reports often consist of common indicators of population health status, healthcare use, and quality of care that are presented by socioeconomic status (SES). This allows policy-makers to compare populations that are less well-off (low socioeconomic status) to those who are better-off (high socioeconomic status) and to design programs and
practices that address inequities in the healthcare system. Table 2 provides a list of frequently included study indicators.

Fig. 1 Manitoba’s Five Regional Health Authorities (RHAs) (former RHAs are shown in brackets). [Map: Northern (formerly Burntwood and Nor-Man); Prairie Mountain (formerly Parkland, Assiniboine, and Brandon); Interlake-Eastern (formerly Interlake and North Eastman); Southern (formerly Central and South Eastman); Winnipeg (with the former Churchill RHA, now part of Winnipeg Regional Health)]

Highlights of Selected Deliverables

This section provides an overview of MCHP deliverables (see Table 3) that have had specific or ongoing impacts on policy and programs in the Manitoba community. The deliverables highlighted were published within the last 15 years (2000–2014) and there were no major criteria for their selection. Only deliverables with a concrete example of impact on policy and programs in Manitoba were described.

The “Need to Know” Team Deliverables

As described above, a small number of all deliverables involve the Need to Know Team (NTK Team), a collaborative researcher/senior-level-planner group that includes representatives from all RHAs, several representatives of Manitoba Health, and MCHP staff.

The RHA Indicators Atlas Reports
The NTK Team is an important component of the RHA Atlas deliverables. The Manitoba RHA Indicators Atlas reports provide regional and subregional data on over 50 indicators of population health status, health service use,
Fig. 2 Winnipeg Community Areas (WCAs). Map labels: Seven Oaks (N, W, E); River East (N, W, E, S); Inkster (W, E); Point Douglas (N, S); St. James-Assiniboia (W, E); Downtown (W, E); Transcona; St. Boniface (W, E); River Heights (W, E); St. Vital (N, S); Assiniboine South; Ft. Garry (N, S)
* Churchill is also part of the Winnipeg Health Region (not shown in this map)
and quality of care. These reports provide RHAs with information on which to plan, increasing the likelihood that they will achieve their goals, and allow all RHAs to compare their health status with regional and provincial averages. The three atlases (see Table 3 a–c) were commissioned by Manitoba Health to inform the Comprehensive Community Health Assessment (CHA) reports required by provincial legislation every 5 years.
The atlases for CHA reporting are also used to develop RHA strategic plans. Over the years, numerous regions have told MCHP that resource allocation plans have been informed by evidence from our reports (e.g., the need to increase resources or support in some areas, while reducing them in others).
The establishment and early work of the NTK Team also resulted in organizational effects in all three partners (academic, provincial government, and RHAs). Several RHAs revised job descriptions and responsibilities to allocate more time and energy to finding and using evidence to inform decisions. At least one RHA actually created a new full-time position for this type of work. RHA representatives on the NTK Team are extremely valuable members of advisory groups for other deliverables, as they already have an established appreciation of the repository’s data and its possible uses. The team also increased the effectiveness and efficiency of the CHA network group, which has many representatives in common with the NTK Team. Each atlas has resulted in a round
Table 2 Frequently used health indicators in MCHP deliverables

Mortality
Total Mortality; Premature Mortality; Causes of Mortality; Life Expectancy; Potential Years of Life Lost (PYLL); Suicide

Illnesses, Diseases and Chronic Conditions
Diabetes; Lower Limb Amputations Among Diabetics; Hypertension; Total Respiratory Morbidity (TRM); Asthma; Arthritis; Osteoporosis; Multiple Sclerosis; Stroke; Congestive Heart Failure (CHF); Coronary Heart Disease (CHD)/Ischemic Heart Disease (IHD); Acute Myocardial Infarction (AMI); Inflammatory Bowel Disease (IBD); Dialysis Initiation; Obesity; Cancer

Mental Illness
Substance Abuse; Depression; Mood and Anxiety Disorders; Personality Disorders; Schizophrenia; Dementia; Suicide Attempt (several of these disorders are often grouped together as “Cumulative Mental Health Disorders”)

Physician Services
Use of physicians/physician visits; Reasons for physician visits; Ambulatory Visits; Ambulatory Consultations; Majority of Care; Continuity of Care; Location of Visits of General and Family Practitioners; Location of Visits to Specialists

Hospital Services
Hospital Bed Supply; Use of Hospitals; Inpatient Hospitalization; Day Surgery; Hospital Days Used in Short Stays; Hospital Days Used in Long Stays; Injury Hospitalizations; Causes of Injury Hospitalizations; Intentional Injury Hospitalizations; Unintentional Injury Hospitalizations; Causes of Hospitalization; Causes of Hospital Days Used; Hospital Separation (Discharge); Hospital Readmission; Factors Associated with Readmissions; Hospitalization Rates for Ambulatory Care Sensitive (ACS) Conditions; Hospital Location: Where Residents Were Hospitalized (Hospitalizations); Hospital Location: Where Residents Were Hospitalized (Days); Hospital Catchment: Where Patients Using RHA Hospitals Came From (Hospitalizations); Hospital Catchment: Where Patients Using RHA Hospitals Came From (Days); Alternate Level of Care (ALC); Intensive Care Unit (ICU)

High Profile Surgical and Diagnostic Services
Cardiac Catheterization; Percutaneous Coronary Interventions (PCI); Coronary Artery Bypass Surgery; Total Hip Replacement; Total Knee Replacement; Cataract Surgery; Caesarean Section; Cholecystectomy; Hysterectomy; Dental Extractions Among Young Children*; Computed Tomography (CT) Scans; Magnetic Resonance Imaging (MRI) Scans

Quality of Primary Care
Antidepressant Prescription Follow-Up; Asthma Care: Controller Medication Use; Diabetes Care: Eye Examinations; Post-AMI Care: Beta-Blocker Prescribing; Benzodiazepine Prescribing for Community-Dwelling Seniors; Benzodiazepine Prescribing for Residents of Personal Care Homes (PCH)

Immunizations and Prescription Drug Use
Influenza Immunization (Vaccination)**; Pneumococcal Immunization (Vaccination)**; Complete Immunization (Vaccination) Schedule (Ages 1, 2, 7, and 11)***; Number of Different Types of Drugs Dispensed per User; Pharmaceutical Use; Cost of Prescription Drug Use; Antibiotic Prescriptions; Antidepressant Prescriptions; Antipsychotic Prescriptions; Opioid Prescriptions; Benzodiazepine Prescriptions

Preventive Care and Screening
Complete Physicals; Breast Cancer Screening (Mammography)**; Cervical Cancer Screening (Papanicolaou (Pap) test)**

Long Term Care and Home Care
Supply of PCH Beds; Personal Care Home (PCH) Admissions; Personal Care Home (PCH) Residents; Median Waiting Times for PCH Admission from Hospital; Median Waiting Times for PCH Admission from the Community; Level of Care on Admission to PCH; Median Length of Stay by Level of Care on Admission to PCH; Location: Where Residents Went for Personal Care Home (PCH); Catchment: Where Patients Came From Prior to Admission to Personal Care Home (PCH); Home Care; Days of Home Care Service Received

Maternal and Child Health
Prenatal and Family Risk Factors (Family First Data); Level of Maternal Education; Teen Pregnancy; Maternal Depression/Anxiety; Social Isolation; Relationship Distress; Sexually Transmitted Infections; Prenatal Care; Prenatal Alcohol Use; Prenatal Smoking; Breastfeeding Initiation; Infant and Child Mortality; Size for Gestational Age; Newborn Hospital Readmission; Children Not Ready for School (in One or More Early Development Instrument (EDI) Domains); Attention-Deficit Hyperactivity Disorder (ADHD); Asthma Prevalence

Education
Early Development Instrument (EDI); Number of School Changes; High School Completion; Grade Repetition; Grade 3 Reading; Grade 3 Numeracy; Grade 7 Mathematics; Grade 8 Reading and Writing; Grade 9 Achievement Index; Grade 12 Language Arts Standards Tests; Grade 12 Math Standards Tests

*also considered as child health indicators
**also considered as quality of primary care indicators
***also considered as child health and quality of primary care indicators
Note: further indicator information can be found in the MCHP Glossary / Concept Dictionary online:
http://umanitoba.ca/faculties/health_sciences/medicine/units/community_health_sciences/departmental_units/mchp/resources/concept_dictionary.html
of site visits. Almost every region has invited MCHP scientists to workshops in their RHAs to explore local results in depth and to discuss implications for policy and planning. Feedback from these regional workshops suggests that the impacts are significant and long-lasting.
Several NTK Team members are also public health officers who train medical students and
Table 3 Impact – the list of deliverables highlighted in this chapter

# | Deliverable | Authors | Year published
(a) | The 2013 RHA Indicators Atlas | Fransoo R et al. | 2013
(b) | Manitoba RHA Indicators Atlas 2009 | Fransoo R et al. | 2009
(c) | The Manitoba RHA Indicators Atlas: Population-Based Comparison of Health and Health Care Use | Martens PJ et al. | 2003
(d) | Sex Differences in Health Status, Health Care Use, and Quality of Care: A Population-Based Analysis for Manitoba’s Regional Health Authorities | Fransoo R et al. | 2004
(e) | Patterns of Regional Mental Illness Disorder Diagnoses and Service Use in Manitoba: A Population-Based Study | Martens PJ et al. | 2004
(f) | Profile of Metis Health Status and Healthcare Utilization in Manitoba: A Population-Based Study | Martens PJ, Bartlett J et al. | 2010
(g) | The Health and Health Care Use of Registered First Nations People Living in Manitoba: A Population-Based Study | Martens PJ et al. | 2002
(h) | The Epidemiology and Outcomes of Critical Illness in Manitoba | Garland A et al. | 2012
(i) | Population Aging and the Continuum of Older Adult Care in Manitoba | Doupe M et al. | 2011
(j) | An Initial Analysis of Emergency Departments and Urgent Care in Winnipeg | Doupe M et al. | 2008
(k) | Using Administrative Data to Develop Indicators of Quality Care in Personal Care Homes | Doupe M et al. | 2006
(l) | Assessing the Performance of Rural and Northern Hospitals in Manitoba: A First Look | Stewart D et al. | 2000
(m) | Perinatal Services and Outcomes in Manitoba | Heaman M et al. | 2012
(n) | Next Steps in the Provincial Evaluation of the Baby First Program: Measuring Early Impacts on Outcomes Associated with Child Maltreatment | Brownell M et al. | 2007
(o) | Assessing the Health of Children in Manitoba: A Population-Based Study | Brownell M et al. | 2001
(p) | How Do Educational Outcomes Vary With Socioeconomic Status? Key Findings from the Manitoba Child Health Atlas 2004 | Brownell M et al. | 2004

Note: All deliverables are available on our website: http://mchp-appserv.cpe.umanitoba.ca/deliverablesList.html
residents in their communities. In these regions, trainees may develop reports on the health of the communities in which they are working; they are frequently referred to the RHA atlas reports as a key source of information. Other NTK Team members have used atlases at their regional board of directors meetings, tackling one or two chapters of the report at each of a series of meetings. This provides valuable education for board members and the opportunity for discussion with senior management.
The two most recent RHA atlases are also listed 2nd and 16th on the list of the top 20 downloaded deliverables from MCHP’s website over a 5-year period (from April 1, 2009, to March 31, 2014) (see Table 4).

Other NTK Team Deliverables
The sex differences report (see Table 3-d) may have also played some role in the Winnipeg Regional Health Authority’s (WRHA) deliberations regarding heart health services in the mid-to-late 2000s. There had been some movement toward creating a women’s heart health center, based on other evidence (not coming from MCHP) demonstrating that female heart attack patients were not receiving the same level of service as their male counterparts. The MCHP report showed that this apparent sex bias was not real. Within every 5-year age group, female and male heart attack patients received the same level of care. The difference in intervention rates was driven solely by the fact that female patients are known to experience heart attacks at a much older age (8–10 years older) than males. Males were not being treated more aggressively than females; rather, younger patients received more treatments than older patients, and the younger patients were more likely to be male.
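The age effect described above can be illustrated with a small calculation. The following is a minimal sketch only: the counts are invented for illustration and are not MCHP data, and the names `STRATA`, `crude_rate`, and `age_specific_rates` are hypothetical. It shows how two groups can have identical intervention rates within every age group and still differ in their overall (crude) rates when one group is older.

```python
# Hypothetical counts, invented purely for illustration (not MCHP data).
# Layout: age group -> sex -> (heart attack patients, patients given an intervention)
STRATA = {
    "55-59": {"female": (100, 60), "male": (400, 240)},
    "75-79": {"female": (400, 120), "male": (100, 30)},
}


def crude_rate(strata, sex):
    """Intervention rate for one sex, ignoring age."""
    patients = sum(groups[sex][0] for groups in strata.values())
    treated = sum(groups[sex][1] for groups in strata.values())
    return treated / patients


def age_specific_rates(strata, sex):
    """Intervention rate for one sex within each age group."""
    return {age: groups[sex][1] / groups[sex][0] for age, groups in strata.items()}


if __name__ == "__main__":
    for sex in ("female", "male"):
        print(sex,
              "crude:", round(crude_rate(STRATA, sex), 2),
              "by age:", age_specific_rates(STRATA, sex))
```

In this invented example the age-specific rates are the same for both sexes, yet the crude rate is lower for females because more of the female patients fall in the older age group, where interventions are less frequent for everyone.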
Table 4 The top 20 downloaded deliverables April 1, 2009, to March 31, 2014

Rank | Deliverable | Year published/available online | Page views per year
1 | Perinatal Services and Outcomes in Manitoba | November 2012 | 104,735
2 | The 2013 RHA Indicators Atlas | October 2013 | 54,460
3 | Social Housing in Manitoba: Part I and Part II | June 2013 | 46,841
4 | Projecting Personal Care Home Bed Equivalent Needs in Manitoba Through 2036 | October 2012 | 27,572
5 | Profile of Metis Health Status and Healthcare Utilization in Manitoba: A Population-Based Study | June 2010 | 15,941
6 | Health Inequities in Manitoba: Is the Socioeconomic Gap in Health Widening or Narrowing Over Time? | September 2010 | 11,195
7 | Pharmaceutical Use in Manitoba: Opportunities to Optimize Use | December 2010 | 9,331
8 | The Additional Cost of Chronic Disease in Manitoba | April 2010 | 6,432
9 | Manitoba Child Health Atlas Update | November 2008 | 6,400
10 | What Works? A First Look at Evaluating Manitoba’s Regional Health Programs and Policies at the Population Level | March 2008 | 6,118
11 | Effects of Manitoba Pharmacare Formulary Policy on Utilization of Prescription Medications | December 2009 | 6,107
12 | Defining and Validating Chronic Diseases: An Administrative Data Approach | July 2006 | 6,031
13 | Patterns of Regional Mental Illness Disorder Diagnoses and Service Use in Manitoba: A Population-Based Study | September 2004 | 5,334
14 | Assessing The Health Of Children In Manitoba: A Population-Based Study | February 2001 | 5,213
15 | Who is in our hospitals and why | September 2013 | 5,103
16 | Manitoba RHA Indicators Atlas 2009 | September 2009 | 4,975
17 | The Health and Health Care Use of Registered First Nations People Living in Manitoba: A Population-Based Study | March 2002 | 4,906
18 | How are Manitoba’s Children Doing? | October 2012 | 4,832
19 | Composite Measures/Indices of Health and Health System Performance | August 2009 | 4,756
20 | Population Aging and the Continuum of Older Adult Care in Manitoba | February 2011 | 3,068

Note: PDF copies of all deliverables became available on the MCHP website in 1999
Averaged page views per year, over the 5-year period
The mental illness report (see Table 3-e) was important for documenting and spreading the word about the high prevalence of mental illness in Manitoba and the high use of healthcare services by people with mental illness. This topic was identified as a high priority by the rural and northern RHAs and by the Deputy Minister of Health and assistant deputy ministers. Between 1997 and 2002, more than one in four Manitobans had at least one mental illness diagnosis and used nearly half of the days people spent in hospitals. Most of the services used were not for mental illness alone but spanned the entire spectrum of physical illness as well. This added important evidence to the understanding of the comorbidity of physical and mental illness. The timeliness and prominence of the report also resulted in its principal investigator, Dr. Patricia Martens, being invited to join the first Scientific Advisory Board for the Mental Health Commission of Canada.
The Mental Health Commission of Canada has used MCHP research in launching its national research project to find sustainable solutions for homeless people with mental health issues. MCHP was included as a key partner in the Winnipeg demonstration project: http://www.mentalhealthcommission.ca/sites/default/files/At%252520Home%252520Report%252520Winnipeg%252520ENG_0.pdf.
The mental illness report (see Table 3-e) also revealed that close to 83 % of nursing-home residents have at least one mental illness diagnosis, yet the most frequent users of psychiatrists are people 35–55 years old. The report indicated that planners may want to ensure that facility staff are trained to provide care to address mental health as well as physical health needs and that people in personal care homes are referred for treatment. This finding may have contributed to the decision by the provincial health Minister at the time to invest more than $40 million to implement a comprehensive strategy to improve the quality of care in Manitoba’s personal care homes. The funding was pledged to hire 250 registered nurses, registered psychiatric nurses, and licensed practical nurses, 100 personal healthcare aides, and 50 allied healthcare professionals to increase the direct hours of care, strengthen the work environment for staff, and provide dementia education to staff and families: http://news.gov.mb.ca/news/index.html?archive=&item=2707.

Manitoba’s Indigenous Population

The Métis community makes up roughly 6 % of Manitoba’s population. The Metis Health deliverable (see Table 3-f) explored the Metis community’s health status and healthcare use, as well as many social indicators of health. Overall, Métis people living in Northern Manitoba were found to be less healthy compared to those living in the southeast region (South Eastman) (see Fig. 1). This deliverable drew the attention of the Manitoba Metis Federation (MMF), who were concerned with identifying regions and health areas needing improvement in order to better the health and well-being of the Métis community. The MMF worked alongside MCHP to produce this report as one element in the regional planning profiles and to provide a springboard for other studies. This was the first attempt in Canada to do a population-level Metis health assessment.
The Health of First Nations deliverable (see Table 3-g), with the approval and collaborative support of the Health Information and Research Committee of the Assembly of Manitoba Chiefs, studied the health of Manitoba’s Registered First Nations people, identifying factors that contribute to differences in health. The study focused on the First Nations population as a group, as well as by Tribal Council and by on-reserve versus off-reserve populations. Comparisons were made to the Manitoba population across various health-related indicators. Compared to all other Manitobans, a Registered First Nations person’s life expectancy was 8 years shorter, the chance of dying at a young age was more than doubled, the chance of developing diabetes was more than quadrupled, and the chance of having an amputation as a result of diabetes increased 16-fold. Hospitalization rates were doubled for Registered First Nations persons compared to all other Manitobans, and they were three times higher for hospitalizations due to injury. Overall, health status rates varied across tribal councils. However, premature mortality rates were lowest in the north and highest in the south. This finding was surprising due to the “reversed” association with geography; in many previous MCHP studies and other reports, the health of residents of Northern Manitoba was usually shown to be worse than that of residents in the south. However, this report showed the opposite to be true: First Nations residents of the north were healthier than their counterparts in the south. These findings have been extensively used by the Assembly of Manitoba Chiefs (AMC) health councils for planning.

Hospitals, Emergency Departments, ICUs, and Long-Term Care

The epidemiology and outcomes of critical illness in Manitoba report (see Table 3-h) allowed linkage of the extensive clinical database created by the Department of Critical Care Medicine to the repository. This combination of data sources is unique, allowing a first-ever population-based exploration of the use of intensive care units (ICUs) and fostering the development of an ongoing research group. In this report, the entire population of Manitoba and all hospitals were assessed from 1999/2000 to 2007/2008. About
0.6 % of Manitoba adults are admitted to an ICU each year, which means that about 8 % of those in hospitals are assessed as needing ICU care. Over a 9-year period, ICU beds in Winnipeg were full less than 5 % of the time. Outside of Winnipeg, ICU beds were full less than 1 % of the time. The average age for ICU patients was 64 years and admission rates peaked at those 80 years of age. Overall, about two-thirds of adult ICU care was for patients 60 years and older and the annual number of ICU admissions has dropped slightly; however, the length of stay in ICUs has increased over time. Repeated need for ICU care was surprisingly common (15 %) and previous ICU patients were almost four times more likely to be admitted again to an ICU in the year after discharge. Finally, the most common reason for ICU admission was cardiovascular conditions, followed by sepsis, lung disorders, accidents or traumas, and poisonings. This exploratory deliverable was the first of its kind to link clinical data on ICU patients into a population-based repository; thus it created a globally unique and flexible research tool. This tool is being leveraged for use in research projects and graduate student theses. The results on ICU bed utilization confirmed that the number of ICU beds in the Winnipeg RHA was within the recommended range. The report has resulted in four published manuscripts (Garland et al. 2013, 2014a, b; Olafson et al. 2014), with one more underway. It has also fostered several related research projects which have received peer-reviewed funding and provided additional publications.
The population in Manitoba, as it is in other parts of Canada and the developed world, is rapidly aging. The population aging deliverable (see Table 3-i) looked at the use of home care, supportive housing, and personal care homes (also known as nursing homes) in Winnipeg, MB, from several perspectives. First, past rates in nursing-home use were used to create two scenarios which showed that nursing-home use will increase by 30–50 % by 2031, emphasizing the importance of developing strategies to continually reduce rates of nursing-home use. This work also revealed the clinical profile of current day nursing-home residents, showing the potential for supportive housing to offset nursing-home use. While about 50 % of newly admitted nursing-home residents required weight-bearing help to complete activities of daily living (ADLs), about a quarter of new residents had at most moderate challenges across several clinical domains (e.g., ADLs, behavior, continence, cognitive performance). Furthermore, about 12 % of newly admitted nursing-home residents had the same clinical profile as supportive housing clients (i.e., minor ADL and/or cognitive challenges, with few needs in other clinical areas), suggesting the potential of supportive housing to offset nursing-home use, now and into the future. Collectively, these findings emphasized the need to develop appropriate transitional strategies across the older adult continuum of care, ensuring that people have access to the right care at the right time. Subsequently the Manitoba government announced two initiatives which may have been informed by this work:

• Advancing Continuing Care – A Blueprint to Support System Change
http://news.gov.mb.ca/news/?item=31246
• Manitoba’s Framework for Alzheimer’s Disease and Other Dementias
http://news.gov.mb.ca/news/index.html?item=31385

The analysis of emergency departments (see Table 3-j) has had several impacts. Manitoba Health approved funding for the Eastman RHA (see Fig. 1) to hire 2.1 full-time equivalent staff to support mental health services. This is due to the report’s finding that 54 % of frequent emergency department (ED) users (seven or more ED visits per year) have been diagnosed with two or more mental illnesses. The funding was approved for the placement of Registered Psychiatric Nurses in EDs. Manitoba Health designated a total of $165,302 for the 2008/09 and 2009/10 budget years: http://news.gov.mb.ca/news/index.html?archive=&item=4458. The Canadian Health Services Research Foundation (CHSRF) included some of the primary findings of this deliverable in their publication on emergency room overcrowding: http://www.cfhi-fcass.ca/sf-docs/default-source/mythbusters/Myth-Emergency-Rm-Overcrowding-EN.pdf?sfvrsn=0.
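The definition of a frequent ED user quoted above (seven or more ED visits per year) can be turned into a simple rule applied to visit records. The sketch below is an assumption-laden illustration, not the method used in the deliverable: the record layout, the sample data, and the names `frequent_ed_users` and `share_with_multiple_diagnoses` are all hypothetical.

```python
from collections import Counter

# Threshold taken from the text: seven or more ED visits per year.
FREQUENT_VISIT_THRESHOLD = 7


def frequent_ed_users(visits, year):
    """Return the set of person IDs with at least the threshold number of visits in `year`.

    `visits` is an iterable of (person_id, year) pairs, one pair per ED visit.
    """
    counts = Counter(person for person, visit_year in visits if visit_year == year)
    return {person for person, n in counts.items() if n >= FREQUENT_VISIT_THRESHOLD}


def share_with_multiple_diagnoses(users, diagnoses, minimum=2):
    """Proportion of `users` having at least `minimum` distinct diagnoses on record."""
    if not users:
        return 0.0
    flagged = sum(1 for person in users if len(diagnoses.get(person, set())) >= minimum)
    return flagged / len(users)


# Invented sample data for illustration only.
visits = [("A", 2008)] * 7 + [("B", 2008)] * 3 + [("C", 2008)] * 9 + [("A", 2009)] * 2
diagnoses = {"A": {"depression", "anxiety"}, "C": {"depression"}}

frequent = frequent_ed_users(visits, 2008)
share = share_with_multiple_diagnoses(frequent, diagnoses)
```

With these invented records, persons A and C meet the threshold in 2008, and one of the two has two or more diagnoses on record.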
The CHSRF also wrote about MCHP’s ability to transform data into quality care and transfer information down the chain of command to those who could make the appropriate changes and improvements. Their report highlighted the approach the principal investigator Dr. Malcolm Doupe took in explaining the deliverable “Using Administrative Data to Develop Indicators of Quality Care in Personal Care Homes” (see Table 3-k) to the Brandon RHA personal care homes’ managers and policy-makers. Results were seen immediately in the quality of care: a pneumonia care map was introduced; the region’s “personal care forum” became more productive, setting goals and action plans and updating each other on their progress; and a program for better managing medications of new residents was introduced: http://www.cfhi-fcass.ca/sf-docs/default-source/building-the-case-for-quality/TRANSFORMING_DATA_ENG_1.pdf?sfvrsn=0.
The performance of rural and northern hospitals deliverable (see Table 3-l) showed that rural Manitobans do not use nearby hospitals. Across 68 rural hospitals, occupancy rates were below 60 % and some hospitals and health centers were keeping admitted patients for too long (low scores on discharge efficiency). In 2002 the Manitoba Government announced a pilot project with the Southeast Manitoba RHA to serve more surgery patients at two local hospitals in an effort to make better use of rural facilities and provide patient care closer to home: http://www.gov.mb.ca/chc/press/top/2002/07/2002-07-09-01.html.

Maternal and Child Health

The Perinatal Services and Outcomes deliverable (see Table 3-m) has been the number one deliverable downloaded from the MCHP website (see Table 4). The WRHA Women’s Health Program used the report to validate their initiatives and reiterate the importance of the prenatal period in promoting optimal early childhood development. Inadequate prenatal care is being addressed through the “Partners in Inner-city Integrated Prenatal Care (PIIPC)” initiative, stimulated in part by the high rates of inadequate care found in the Winnipeg Community Areas of Point Douglas, Downtown, and Inkster (see Fig. 2). The deliverable included new information on rates of postpartum depression/anxiety in Manitoba, revealing that women who experienced anxiety or depression during their pregnancy were eight times more likely to experience it postpartum. The WRHA reaffirmed the Women’s Health Program’s efforts to ensure that information and resources are continuously available in the postpartum period to foster mental health. Staff in the Population Health and Health Equity and Public Health Program, administered by Manitoba Health, noted that the perinatal deliverable influenced their thinking about potential positive impacts of early public health engagement with families in the prenatal period; findings from the deliverable have been used to inform development of the provincial public health nursing standards. The WRHA is actively interested in reducing health inequities. They have been particularly interested in breastfeeding initiation. The perinatal deliverable highlighted variations in initiation rates across the city (e.g., over 90 % in an affluent neighborhood and approximately 65 % in a less affluent one). These variations were significant in motivating the WRHA to begin tracking breastfeeding initiation and duration rates across Winnipeg.
The Baby First deliverable (see Table 3-n) evaluated how well the Manitoba Baby First screening program (established in 1999, now called “Families First”) works with regard to identifying children at risk. About 75 % of babies had a Baby First screening form filled out; the screen was reasonably successful in picking out children who eventually ended up in foster care. The strongest predictors of a child ending up in care were having a file with local child protection services, being on income assistance, having a mother who did not finish high school, and living in a one-parent family with no social support. Because the age of the mother at the birth of her first child was also found to be highly predictive (and was not currently being asked on the screening form), Healthy Child Manitoba responded to preliminary drafts of the report by adding this item to the screening form (see Fig. 3). In addition,
Fig. 3 The revised 2007 Families First screening form (Reproduced with permission from the Manitoba Government)
child maltreatment and assault injury rates in children up to 3 years of age declined after the Baby First home visiting program was initiated.
Poor health during childhood raises the risk of poor adult health. The Child Health Atlas (see Table 3-o) found that infant mortality was double for the lowest-income areas compared to the highest-income areas, and the leading cause of death for children was injury due to motor vehicle crashes. Children living outside of Winnipeg are twice as likely to die from injuries and almost two-and-a-half times as likely to be hospitalized for injuries. Because of these findings, Manitoba Health announced a new public initiative aimed at preventing childhood injuries in the home: http://news.gov.mb.ca/news/index.html?item=25659&posted=2002-02-26.
The children’s educational outcomes and socioeconomic status deliverable (see Table 3-p), which stemmed from the second Child Health Atlas, revealed some very surprising findings. This deliverable looked at performance on Grade 12 standard tests by socioeconomic status (SES) (see Fig. 4). The left side of Fig. 4 shows, for youths who wrote the test, that students from the poorest families (those receiving provincial income assistance) had a passing rate of 75 %, whereas students residing in the city’s highest-income neighborhoods had a passing rate of 95 %. The right side of the graph includes not only those students who wrote the test but, more importantly, also those students born in the same year who are still residing in Winnipeg and who should have written the test had they progressed through the school system as expected. This population-based analysis shows a much steeper gradient, with the passing rates for youth in families on provincial income assistance dropping to 16 %. The two panels differ in that the one on the right includes those who have been held back a grade or more or who have withdrawn from school. Such surprising findings demonstrate the need for better educational programs and initiatives for students from low-income families. This report, along with the Child Health Atlas, led to the development of two initiatives:
Fig. 4 Grade 12 language arts (LA) test performance by Winnipeg socioeconomic status, 2001/02. Youths born in Manitoba in 1984. Left panel: Pass/Fail Rates of Test Writers Only. Right panel: Pass/Fail Rates of Test Writers + Those who Should Have Written Test. Horizontal axis in both panels: Socioeconomic Status (SES), with categories Income Assistance, Low, Low-Middle, Middle, High. Vertical axis: 0% to 100%. Legend: Pass; Fail; Drop Course, Absent, Exempt, Incomplete; In Grade 12 but no LA test mark; In Grade 11 or lower; Withdrawn. Pass-rate labels appearing in the figure: 95%, 91%, 88%, 88%, 80%, 75%, 71%, 64%, 52%, 16%
Note: A version of this figure has also been published in Roos, NP et al., 2010, Milbank Quarterly, 88(3):382-403 and in Brownell, M et al., How Do Educational Outcomes Vary With Socioeconomic Status? June 2004, Manitoba Centre for Health Policy
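The contrast between the two panels of Fig. 4 comes down to the choice of denominator: the same number of passes is divided either by the students who wrote the test or by everyone in the birth cohort who should have written it. The following is a minimal sketch of that arithmetic. The function name `pass_rates` and the counts are hypothetical; the counts were chosen only so that the two rates match the 75 % and 16 % quoted in the text for youths in families on income assistance.

```python
def pass_rates(passed, wrote_test, birth_cohort):
    """Return (pass rate among test writers, pass rate among the whole birth cohort)."""
    if wrote_test == 0 or not 0 <= passed <= wrote_test <= birth_cohort:
        raise ValueError("expected 0 <= passed <= wrote_test <= birth_cohort and wrote_test > 0")
    return passed / wrote_test, passed / birth_cohort


# Hypothetical counts, chosen only to reproduce the two percentages quoted in the text.
writers_rate, cohort_rate = pass_rates(passed=120, wrote_test=160, birth_cohort=750)

print(f"Test writers only: {writers_rate:.0%}")
print(f"Whole birth cohort: {cohort_rate:.0%}")
```

The numerator is identical in both rates; only the denominator changes, which is why including students who were held back or who withdrew makes the socioeconomic gradient so much steeper.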
• The “Community School Investigators (CSI) program”
http://www.bgcwinnipeg.ca/system/resources/W1siZiIsIjIwMTQvMDEvMTYvMTgvMDQvMzUvNDE3L0NTSV9SZXBvcnRfMjAxMi5wZGYiXV0/CSI%20Report%202012.pdf (p. 6)
• The Community Schools Partnership Initiative (CSPI)
http://www.edu.gov.mb.ca/cspi/

Two additional child health atlases have been produced at MCHP since the 2001 and 2004 atlases: The Child Health Atlas Update (2008) (#9 in Table 4), which provided much needed information on child health for the annual Community Health Assessments, and How Are Manitoba’s Children Doing? (2012) (#18 in Table 4), which was a companion report to the legislated 5-year Healthy Child Manitoba report.

Knowledge Translation (KT)

Situating MCHP within the University of Manitoba’s Max Rady College of Medicine with ongoing, renewable core funding from the provincial government has allowed academic freedom, intellectual curiosity, and a high degree of research skill to combine with grounded work relevant to the questions of top-level decision-makers. The university also supports the work of MCHP through tenured or tenure-track faculty who work in the centre. Government input continues to be integral to the process of deciding the five deliverables funded by Manitoba Health annually. This model has been called “integrated knowledge translation (KT)” (Canadian Institute of Health Research (CIHR) 2014; Graham et al. 2007, 2009) and reflects the fact that users of the research are involved at the outset. If those individuals looking for answers have helped frame relevant questions with experienced researchers who know the limitations of the data, the scope of the literature, and what has already been done in the area, the findings are more likely to draw attention and result in action. Not only does the research have its feet on the ground, but it begins to walk (so to speak) because of the people involved. The findings are disseminated through the natural interest of the decision-makers involved in the programs or policies for which they are relevant. Although research evidence is not the only influence on policy (often other pressures, such as economic or political realities, override the evidence), if policy-makers and planners understand the research, there is a good chance it will be important in the decision-making process (Martens 2011).

Some people have expressed concerns about having policy- and decision-makers involved in the process from start to finish. What if they bias the results? What if they ask the wrong questions? What if they don’t like the results? Such questions echoed our own fears in the early years. Through a combination of research funded from deliverables and our external grant-funded research from peer-reviewed granting agencies such as CIHR, Research Manitoba, and others, MCHP has learned that the best questions come from an exchange of ideas, both among researchers and between researchers and research users (Martens 2011).

The Repository

MCHP continues to show leadership in the management of administrative data as it becomes the custodian of ever increasing numbers of de-identified (“anonymized”) but linkable datasets. Currently, the repository consists of more than 70 routinely collected administrative and clinical databases that are updated on an annual basis (see Fig. 5). The repository yields incredible opportunities to advance the understanding of complex relationships between population health and the use of health and social services (Martens 2011).

Documenting the repository and the concepts used in research projects are other forms of knowledge translation (KT). MCHP continuously dedicates resources to expand and improve documentation-related KT. Other researchers and policy analysts can read about, and even request, the statistical coding for various concepts that were derived using administrative data – such as how MCHP defines “continuity of care,” an “episode of care,” “comorbidity,” or “high school
Fig. 5 The total count of databases in the repository by year [Figure: chart with y-axis “Number”, ticks from 10 to 80]
completion.” Accessibility in this respect continues to grow, evidenced by the fact that our concept dictionary and glossary receive more than 1.5 million hits a year (excluding bots and Web crawlers). This is a remarkably high frequency for a small academic unit (Martens 2011).

Other KT Activities

MCHP has established a highly successful set of annual workshops attended by top-level planners, policy-makers, healthcare CEOs, VPs of planning, board members for RHAs, and front-line workers. These activities are based upon an interactive model of roundtable discussions concentrating on one or two MCHP reports. Attendees are encouraged to look for the stories in the data. Key to these workshop days is the presence of MCHP scientists to explain how to read the reports. In his book In Arabian Nights, Tahir Shah describes his father explaining the importance of stories to him as a child. “‘Stories are a way of melting the ice,’ [his father] said gently, ‘turning it into water. They are like repackaging something – changing its form – so that the design of the sponge can accept it’” (Shah 2007: 298). This is an apt metaphor for telling research stories. Sometimes providing a written report may not be enough. In these workshops, MCHP turns written reports into stories by explaining how to read the graphs, how to look for connections, or how to relate data to real-life settings. Repackaging the research allows it to be understood and incorporated into the audience’s way of thinking (Martens 2011).

Impact of Large Integrated Data Repositories

Creation of a large integrated repository of data across multiple government domains has facilitated groundbreaking innovative research. Record
linkage has merged information from different departments while, at the same time, extensive longitudinal and familial data have allowed new types of studies and facilitated interdisciplinary work. The opportunities presented are unique advantages of large repositories.

Looking Ahead

As seen in the discussion of deliverables, the very large numbers of cases that accumulate when such data are routinely gathered facilitate complicated multivariate analyses and allow studying low-prevalence conditions or events. Because these data are typically collected over long periods of time, pre- and post-observations can be organized around different life events at the individual level, and also before and after key program implementation – with a time frame extending for over 40 years in the case of the MCHP repository. Merging data across different ministerial departments can bring together individual information from several subject areas to create predictors useful in a variety of contexts (e.g., population-based research on ethnicity, developing risk assessment tools) and permit examining important connections affecting the lives of individuals and patients. Data documenting the use or lack of contact with the healthcare system and residential mobility data can be put together for any interval from 1 day to many years. A real but relatively unexplored advantage of the MCHP repository would be to follow those born in the 1970s, where the ability to track family structure events and health outcomes over the first decades of life is outstanding. This line of inquiry provides the possibility of life stage analysis: does a diagnosis of attention deficit disorder which first occurs at age 4–8 have a different impact on educational outcomes than a diagnosis which first occurs at age 9–12? How important is a chronic disease diagnosis, one which continues over time, compared with the same diagnosis occurring during only one age period?

There is great interest in improving both observational and interventional studies. In addition, the population-based repository has great potential for “natural experiments” where administrative data may be used to consider the impact of policy and program changes. And research designs can be improved by building on the types of data available in Manitoba to construct control groups using propensity scores, sibling comparisons, and fine-grained ecological information. To date, such efforts are basically unexplored but have great potential for the future.

Summing Up

Research platforms lend themselves to forming an “ecosystem,” “an intertwined set of products and services that work together” (El Akkad and Marlow 2012). The MCHP ecosystem involves relations with people, including key decision-makers, software (for data cleaning, record linkage, and analysis), the extensive documentation accessible through our concept dictionary and glossary, predictors and outcome measures derived from multiple files, and a methodological/statistical tool kit. New data in the Manitoba repository have expanded the type and number of studies being carried out. These capabilities foster useful interactions with a diversity of investigators, helping to avoid an overreliance on a single funding source and bringing in valuable new perspectives.

The approaches forwarded here seem generally relevant to “big data” where more attention needs to be paid to questions of design and analysis. The significant effort required to clean and prepare the databases should not be underestimated; Cukier and Mayer-Schoenberger have both noted the messiness of big data and highlighted the potential benefits of interagency collaboration in improving public services (Cukier and Mayer-Schoenberger 2013). The uses of population-based data are being more widely recognized. Information-rich environments should continue to facilitate opportunities for the next generation of researchers. That’s the real impact of MCHP’s academic and research history: building a culture where evidence informs policy in a way that works.
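
The life stage question raised under Looking Ahead (does the age band at first diagnosis matter for a later educational outcome?) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the records, the pass/fail flags, and the helper names `age_in_years`, `first_diagnosis_band`, and `pass_rate_by_band` are invented for illustration and do not describe MCHP's data or methods. The sketch shows only the bookkeeping; a real analysis would still need the kinds of control-group designs mentioned above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical linked records: (child_id, birth_date, diagnosis_date). Not real data.
DIAGNOSES = [
    ("c1", date(1984, 3, 1), date(1990, 5, 10)),    # first diagnosis at age 6
    ("c1", date(1984, 3, 1), date(1994, 9, 2)),     # later record, not the first
    ("c2", date(1984, 7, 15), date(1995, 1, 20)),   # first diagnosis at age 10
    ("c3", date(1984, 11, 30), date(1989, 12, 1)),  # first diagnosis at age 5
    ("c4", date(1984, 1, 10), date(1994, 6, 1)),    # first diagnosis at age 10
]
# Hypothetical educational outcome: did the child pass a Grade 12 test?
PASSED = {"c1": True, "c2": True, "c3": False, "c4": False}

AGE_BANDS = [(4, 8), (9, 12)]  # inclusive age bands taken from the text


def age_in_years(birth, on):
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def first_diagnosis_band(records):
    """Map each child to the age band of their earliest diagnosis (None if outside all bands)."""
    first = {}
    for child, birth, dx in records:
        if child not in first or dx < first[child][1]:
            first[child] = (birth, dx)
    bands = {}
    for child, (birth, dx) in first.items():
        age = age_in_years(birth, dx)
        bands[child] = next((b for b in AGE_BANDS if b[0] <= age <= b[1]), None)
    return bands


def pass_rate_by_band(bands, passed):
    """Share of children passing, grouped by age band at first diagnosis."""
    counts = defaultdict(lambda: [0, 0])  # band -> [passes, total]
    for child, band in bands.items():
        if band is None or child not in passed:
            continue
        counts[band][0] += passed[child]
        counts[band][1] += 1
    return {band: p / n for band, (p, n) in counts.items()}


bands = first_diagnosis_band(DIAGNOSES)
rates = pass_rate_by_band(bands, PASSED)
```

The toy outcome flags are deliberately balanced so that the example suggests no finding. Comparing a chronic diagnosis with one confined to a single age period would need one more step: counting how many bands each child's diagnoses span.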
References

Canadian Institute of Health Research (CIHR). More about knowledge translation at CIHR. 2014. http://www.cihr-irsc.gc.ca/e/39033.html. Accessed 31 Oct 2014.
Cukier K, Mayer-Schoenberger V. The rise of big data. Foreign Aff. 2013;92:28–40.
El Akkad O, Marlow I. Apple at the summit: the trouble with being no. 1. 2012. http://www.theglobeandmail.com/technology/apple-at-the-summit-the-trouble-with-being-no-1/article4546745/?page=all
Garland A, Olafson K, Ramsey CD, et al. Epidemiology of critically ill patients in intensive care units: a population-based observational study. Crit Care. 2013;17:R212.
Garland A, Olafson K, Ramsey CD, et al. A population-based observational study of ICU-related outcomes: with emphasis on post-hospital outcomes. Ann Am Thorac Soc. 2014a;12:202–208.
Garland A, Olafson K, Ramsey CD, et al. Distinct determinants of long-term and short-term survival in critical illness. Intens Care Med. 2014b;40:1097–105.
Graham ID, Tetroe J, Gagnon M. Lost in translation: just lost or beginning to find our way? Ann Emerg Med. 2009;54:313–4.
Graham ID, Tetroe J, KT Theories Research Group. Some theoretical underpinnings of knowledge translation. Acad Emerg Med. 2007;14:936–41.
Martens PJ. Straw into gold: lessons learned (and still being learned) at the Manitoba Centre for Health Policy. Healthcare Policy. 2011;6:44–54.
Olafson K, Ramsey C, Yogendran M, et al. Surge capacity: analysis of census fluctuations to estimate the number of intensive care unit beds needed. Health Serv Res. 2014;50:237–252.
Shah T. In Arabian Nights: A Caravan of Moroccan Dreams. 1st ed. New York: Bantam Books; 2007.
9 Health Services Information: Key Concepts and Considerations in Building Episodes of Care from Administrative Data

Erik Hellsten and Katie Jane Sheehan

E. Hellsten (*)
Health Quality Ontario, Toronto, ON, Canada
e-mail: erik.hellsten@hqontario.ca
K. J. Sheehan
School of Population and Public Health, The University of
https://doi.org/10.1007/978-1-4939-8715-3_10

Contents

Introduction
Health-Care Data and Defining the Unit of Analysis: Historical Perspective
The Episode of Care: A Unifying Concept
Episodes as an Analytical Tool: Advantages
  Flexibility
  Comprehensiveness
  Clinical Meaningfulness
Episodes as an Analytical Tool: Challenges
  Data Requirements
  Complexity
  Time and Resources Required
  Methodological Challenges
Constructing an Episode of Care: Key Components
  Data Sources Required
  Individual-Level Record Linkage
  Information on Type of Service
  Diagnosis Information
  The Date/Time of the Service Delivered
  Core Elements of the Episode
  Defining the Index Event and/or Starting Point
  Defining the Endpoint
  Selecting the Scope of Services Included
  Outcome Measures
Constructing an Episode of Care: A Hip Fracture Example
  Research Question
  Data Source: Canadian Institute for Health Information Discharge Abstract Database
  Defining the Index Event
  Defining the Endpoint
  Scope of the Services Included
  Data Model
  Use of the Data
Constructing an Episode of Care: A Cardiac Example
  Research Question
  Data Sources
  Capturing Events by Linking Data Sources
  Linkage of Cardiac Registry, Hospital Separations, and Death Files
  Use of the Data
Expanding on and Applying Episodes of Care: Further Considerations
  Building Episode-Based Case Mix Classification Systems
  Risk Adjustment and Severity Classification
  Attributing Episodes to Providers
  Policy Applications
References
Abstract

Health-care utilization data are traditionally presented in discrete, itemized formats that offer a fragmented view of the total picture of services delivered to treat an individual patient’s health condition. In response, health services researchers have struggled for 150 years to define more meaningful units of analysis from the outputs of health-care services that are suitable for investigation. Beginning with Florence Nightingale in 1863, the basis for and application of an alternate conceptual approach – the episode of care construct – for organizing health-care events into a clinically meaningful unit of analysis has evolved. In recent decades this approach has been operationalized to support a variety of health services research and policy applications. To construct an episode, researchers must define three key elements including the index event, the scope of services included, and the endpoint. How these elements are defined is dependent on the objective of the episode construction and the data that are available. Here, the history of the episode of care concept, the core elements of an episode, and the researcher’s key considerations and decision points in determining appropriate parameters for these elements are described. Episode-based case mix classification systems, risk adjustment, and attribution rules are also examined. Lastly, two examples of episode of care construction and policy applications are discussed.

Introduction

Health services researchers routinely face the task of organizing and making sense out of data on health service utilization in order to tell the story behind it. Crucially, the types of health-care data that researchers typically work with are more often than not reported and presented in ways that obscure or fragment the underlying medical narrative they represent. Health services researchers often rely on data points collected for administrative purposes, representing discrete units of service such as physician claims for individual services provided, discharge abstracts from hospitalizations, or records for drug prescriptions filled. While these individual observations are undoubtedly important both as individual health-care events and in aggregated form – for example, a researcher may be interested in the total annual number of hospitalizations for heart failure in a
particular hospital and how this sum compares to that in previous years – presenting health-care utilization data in this discrete, itemized fashion typically captures only fragments of the total picture of services delivered to treat a patient’s health condition.

The challenges of organizing health-care data into a coherent narrative stem in part from the unique nature of the health-care “product”: unlike most other commodities, health care is often delivered through a series of separate but related encounters, rather than through a single stand-alone service (Feldstein 1966; Hornbrook et al. 1985). A patient presenting with a health condition may receive health-care services that span multiple different health-care providers over several points in time. The interrelated nature of this variety of providers and services in providing care for a health condition for an individual patient is typically not readily apparent in standard itemized or index-based presentations of health-care data.

Figure 1 provides an illustrative example of a series of individual health-care service data points, which on closer inspection are revealed to be a single patient’s journey through treatment with a total knee replacement for osteoarthritis of the knee. Beginning with a consultation with a primary care physician for chronic knee pain that has failed to respond to conservative treatment, the patient is referred for a radiograph several days later and booked for a consultation with an orthopedic surgeon in their office 4 weeks following. During this consultation, the patient and surgeon decide on a total knee replacement surgery, which is scheduled at a local hospital approximately 2 months after the consultation. Several days before the patient enters hospital, they are assessed at a preoperative clinic to prepare for the surgery. The patient is then admitted to hospital, receives a total knee replacement on the day of admission, and is discharged home 3 days later without incident. Following their discharge home, the patient receives three weekly visits from a physiotherapist contracting with a local home care agency to assist with their rehabilitation. Three weeks later, the patient has a follow-up visit with the surgeon in their office to assess their recovery. Satisfied with the patient’s progress, the surgeon decides no further follow-up is needed; the patient’s care journey can now be considered to be at an end.

This complex series of encounters typifies a routine, simplified pathway for a patient receiving a successful total knee replacement. In some instances, the same patient’s journey might well be further complicated by additional health-care events, such as the appearance of in-hospital or postoperative complications, the need for readmission to hospital or revision surgery, and other potential sequelae.

For the health services researcher, the hypothetical knee replacement example likely produces over a dozen data points in the form of a series of individual encounters recorded between several health service providers and provider organizations over a span of several months. In many cases, this encounter data will also be housed across several discrete – and frequently disconnected – datasets: primary care physician and specialist billings, inpatient hospital discharge abstracts, home care agency records, and so on. The health service researcher faces the challenge of stitching these discrete observations
Fig. 1 Example episode of care for osteoarthritis of the knee. Illustrative example; timeline not to scale
together to form a meaningful and comprehensive picture of a patient’s knee replacement journey through the health-care system.

This chapter explores the conceptual basis for and application of an alternate conceptual approach for organizing health-care events in a clinically meaningful unit of analysis known as the episode of care. Nearly half a century ago, health economist Jerry Solon published the seminal paper “Delineating Episodes of Medical Care,” which put forward the following definition of the concept:

“An episode of medical care is a block of one or more medical services received by an individual during a period of relatively continuous contact with one or more providers of service, in relation to a particular medical problem or situation.” (Solon et al. 1967)

This intuitive definition, while later nuanced and expanded upon by other researchers, still provides the basic foundation for the application of this concept today.

This chapter begins with a brief history of the evolution of the episode of care construct, from its conceptual origins to its operationalization in health services research applications to its use in modern policy applications. The core elements of an episode are described, along with the researcher’s key considerations and decision points in determining appropriate parameters for these elements in order to define a meaningful episode of care. Case mix classification systems, risk adjustment, and severity classifications are also examined. Lastly, examples of recent research and policy applications using episodes of care are discussed.

Health-Care Data and Defining the Unit of Analysis: Historical Perspective

For over 150 years, health services researchers have struggled to define meaningful units of analysis suitable for investigation from the outputs of health-care services. In the nineteenth century, Florence Nightingale produced what may have been the first outcome-oriented classification system for hospital care, labeling patients leaving hospital as either “relieved,” “unrelieved,” or “dead” (Nightingale 1863). Following Nightingale, the Boston surgeon Ernest Amory Codman published his lecture “The Product of a Hospital” in 1914 (Codman 1914), which set out an early framework for classifying the outputs of hospitals such as counts of patients treated, beds, bed days, and student hours.

Throughout most of the twentieth century – and indeed, still largely today – health-care utilization data continued to be aggregated on the basis of Codman-esque sum totals or indices of individual services and outputs. Coinciding with the emergence of health services research in its modern form in the late 1950s and early 1960s, researchers began to note the inadequacy of routinely collected health-care data for the purposes of understanding the nature of health-care utilization. Whether presented as physician visits, bed days, or entire hospital stays, these isolated encounters were often insufficient in themselves for understanding the nature of a patient’s encounters with health-care providers for treatment and the course of medical services delivered. In their 1960 paper “Delineating Patterns of Medical Care,” Jerry Solon and colleagues noted the need to learn about the “patterns” of health-care utilization, “rather than merely documenting isolated incidents of use.” Solon et al. proposed an approach for “consolidating detailed information on use of medical resources into meaningful, integrated forms” and translating “the vast spectrum of utilization into representative patterns” (Solon et al. 1960). Rather than presenting a “procession of chaotic data” from isolated medical encounters, this consolidation would enable the systematic organization of health-care data to better inform research and policy on the utilization of medical services (Solon et al. 1960).

In their classic 1961 paper “The Ecology of Medical Care,” Kerr White and colleagues similarly encapsulated this issue (White et al. 1961). They identified the patient as the primary unit of observation, rather than the disease, visit, or admission (White et al. 1961). By following a patient’s progression through all their encounters with health-care providers for treatment of a particular health condition, the course of medical
services delivered may be captured. White et al. suggested “the natural history of the patient’s medical care” may be the most relevant primary unit of observation and proposed some approaches for disaggregating the data found in traditional health-care indexes into more meaningful forms, such as employing time windows of weeks or months rather than years, and better understanding the decision-making process that unfolds between patients and medical care practitioners over the course of a particular illness. The paper is perhaps also the first to describe the “episode of ill health or injury” as its unit of observation (White et al. 1961).

In his 1966 article “Research on the Demand for Health Services,” Paul Feldstein extended Codman’s original work to define the “product” of health care, noting that in order to define a meaningful unit of output for analysis, researchers required “a better understanding of how the various components of care are used in its production” (Feldstein 1966). Feldstein emphasized the importance of comprehensively accounting for the entire combination of service inputs – such as hospital care and physician visits – used to treat a particular illness and considering differences in the relative contributions of these services in the production of treatment products between groups of providers and over time. He noted the limitations of the aggregate indices conventionally applied to quantify national medical production in terms of outputs such as numbers of visits or bed days.

The Episode of Care: A Unifying Concept

Following these foundational papers’ assessment of the gaps in contemporary methods for analyzing health-care utilization data, 1967 saw the publication of three seminal health services research papers that each put forward a different perspective on establishing an operational definition for White et al.’s “natural history of the patient’s medical care.” In their series “The Development of Standards for the Audit and Planning of Medical Care,” Isidore Falk and colleagues took a clinical practice perspective to the issue, defining a unit of analysis suitable for the development of “standards for the content of good clinical performance” in particular diseases, against which providers’ medical practices could be evaluated “from preventive to postclinical after-care” (Falk et al. 1967). Within these units, which they presciently termed “pathways” in a subsequent paper in the same series (Schonfeld et al. 1968), the authors consulted expert physicians to arrive at quantitative judgments on what constituted appropriate medical utilization, such as the average time required for a first diagnostic visit or the average hospital length of stay for various diseases.

Published the same year, “Changes in the Costs of Treatment of Selected Illnesses, 1951–1965” by Anne Scitovsky (1967) extended earlier work developing an alternate approach to address the inadequacies of the Bureau of Labor Statistics’ medical care price index – which was based on the prices of individual medical items and offered a limited and fragmented view of changes in medical spending – to introduce a “cost-per-episode-of-illness” approach that enabled the construction of a medical care price index based on the average costs of treatment of selected illnesses rather than the costs of discrete items. By demarcating patient episodes of illness within a claims dataset that included all relevant services delivered between an initial diagnosis or presentation for a health issue and either a service-defined endpoint (e.g., the last chemotherapy treatment following breast cancer treatment) or a prescribed follow-up time period that varied by disease, Scitovsky was able to compare changes in service utilization and cost for particular diseases between two time periods. The episode unit enabled Scitovsky to both comprehensively capture the full range of services delivered to treat a specified disease and examine changes in the provision of care, such as a reduction in the rate of home visits and the shift of forearm fracture repairs from office-based general practice to hospital-based specialty care.

While White et al. provided a clinical practice construct of the episode of care and anticipated the use of clinical pathways, and Scitovsky made
operational use of the concept for analyzing and comparing costs and utilization (an application that continues to see widespread use today), it was Jerry Solon and colleagues who provided the first comprehensive definition of this new concept in “Delineating Episodes of Medical Care” (Solon et al. 1967). The authors described three essential features found in any medical care episode: a beginning point, a course of services directed toward an objective, and a point of termination or suspension of the service. Episodes could be constructed around a variety of issues, including a general health-related complaint, a set of defined symptoms, a diagnosed disease, or the achievement of a particular health objective (such as preventive care) where no active morbidities are presented.

Solon et al. touched on a range of important (and still relevant) methodological issues such as the definition of clinically meaningful time intervals for different medical conditions between service encounters to mark the end of a previous episode and the beginning of a new one. They discussed the conceptual challenges posed by chronic conditions that require ongoing medical management without a definite closure and expounded on the relationships between health services contained within a single episode, such as a chain of related physician visits. They identified potential interactions between multiple related episodes within the same individual, such as periodic exacerbations, remissions or acute sequelae linked to an underlying chronic condition, concurrent episodes for comorbid conditions, or iatrogenic events resulting from the treatment delivered for an initial health problem. They suggested that concurrent conditions in a patient might be treated as either part of a single episode or multiple distinct episodes, depending on whether the physician chooses to focus on one illness at a time or treat several within the same encounter (Solon et al. 1967).

Solon et al. distinguished between episodes of care, which are defined based on reported health services, and episodes of illness, which may occur without the provision of health services. While the episode of illness is an important concept for understanding the etiology of sickness and disease apart from medical care, practically speaking, researchers typically face significant challenges in gathering precise data on episodes of illness that occur without corresponding provision of health services as these typically must be identified based on patient recollection. In their broadest definition, the episode of care may overlap with the episode of illness by including diagnostic follow-up after the point where medical care ceases, in order to understand the effect on a patient’s trajectory of illness (Solon et al. 1967).

Solon et al. also sketched out some potential applications of the episode concept in their 1967 paper, including using episodes as an organizing structure for clinicians planning a patient’s care and as a frame of reference for the development of standards of care for different medical conditions. They further applied the concept in their 1969 study “Episodes of Medical Care: Nursing Students’ Use of Medical Services,” analyzing and comparing the details of several years of health services received by nursing students and comparing episode-based utilization measures such as the volume and distribution of visits, diagnostic tests, and admissions within each episode (Solon et al. 1969).

After Solon’s codification of the essential elements of the episode of care, further refinements to and applications of the concept followed. In 1977, Moscovice first implemented episodes of care using computerized routines, constructing disease-specific algorithms to define episodes for several tracer conditions based on patient visit information (Moscovice 1977). The algorithms identified an initial encounter with the recorded incidence of a specified diagnosis code (the index event) and then tracked subsequent encounters by the same patient with reported codes for the same diagnosis or specified related comorbidities. For each condition, based on physician input, a maximum time interval was defined between service encounters to assign services to either part of an existing episode or as the start of a new episode. Services and resources expended for each health condition were similarly defined based on information contained in medical directives and through clinician input. Moscovice compared measures of utilization between providers and
treatment sites, including volumes of visits, laboratory procedures, prescription patterns, and total relative charges. Moscovice’s computerized approach – using clinician input to define meaningful condition-specific parameters – is much the same as that used by episode grouping software today.

In 1985, Mark Hornbrook and colleagues published perhaps the most comprehensive paper on the subject, the widely cited “Health Care Episodes: Definition, Measurement and Use” (Hornbrook et al. 1985). Expanding upon Solon et al.’s original definitions, their paper distinguished between episodes of illness (a symptom, sign, or health complaint experienced by the patient), episodes of disease (the morbidity or pathology as viewed from the provider’s perspective), and episodes of care (“a series of temporally contiguous health-care services related to treatment of a given spell of illness or provided in response to a specific request by the patient or other relevant entity”). They further differentiated episodes of care from health maintenance episodes, which are health-care services delivered with the goal of enhancing wellness or preventing disease, or for cosmetic or contraceptive purposes, rather than toward the resolution of an existing pathology. Finally, Hornbrook et al. suggested that episodes of care may be delivered for the treatment of more than one episode of disease or illness concurrently.

Subsequent research on the episode concept has largely expanded on these earlier efforts and made incremental refinements in areas such as methods for risk adjustment and complexity stratification within episodes, methods for estimating episode costs at the system and provider levels, rules for attributing episodes to health-care providers, and the development of episode-based case mix classification systems which establish rules for comprehensively assigning all reported health-care services to mutually exclusive episodes. With these methodological advancements has come an impressively diverse array of applications of the concept, operationalizing episodes for use in a variety of research purposes, utilization review, provider profiling, and provider payment model design.

Episodes as an Analytical Tool: Advantages

As a unit of observation, the episode of care offers several advantages for the health services researcher over other commonly used methods:

Flexibility

Episodes do not have preset boundaries based on historical – and often arbitrary – observation units used in health-care administrative claims data such as hospitalizations or physician visits. The flexibility of the episode model allows for parameters such as the index event, endpoint, and types of services included to be customized based on the objectives of the study and the nature of the health conditions examined.

Comprehensiveness

Episodes support the inclusion of all relevant health-care services for a particular condition or procedure, which may be delivered across multiple care settings, numerous individual providers, and over extended time frames. This broad, inclusive framework enables the researcher to present an integrated, comprehensive picture of the health-care services delivered to treat a specific issue, with the ability to cross historical silos existing between health-care providers, care settings, and subsystems. This also makes them an attractive analytical vehicle for policies aimed at promoting integration and coordination of care between providers and over time, such as payment models and performance reporting initiatives.

Clinical Meaningfulness

Because the episode design parameters can be customized to the nature of a particular disease or procedure, they support the design of a more clinically meaningful unit of analysis than traditional service counts or indices. Episodes allow for the analysis and comparison of specific health
issues or treatments between providers, settings, or points in time, using a format that is both more clinically homogeneous and more reflective of the underlying clinical reality of the health problem studied. The parameters of episodes used to develop analyses can also be set with the explicit input of clinicians who have understanding and expertise in the particular condition or intervention of focus, strengthening the credibility of the analysis.

Episodes as an Analytical Tool: Challenges

There are also a number of important challenges associated with designing, implementing, and interpreting episode-based approaches:

Data Requirements

Developing episodes requires the availability of several essential data elements, most notably the use of unique identifiers spanning multiple health-care encounters involving the same patient. Unique health service identifiers are still not commonly available in some health-care jurisdictions or datasets; in such cases, probabilistic matching algorithms may be used as a potential substitute to link service encounters involving the same patient. The researcher will typically have to merge multiple health-care datasets using unique identifiers in order to develop a comprehensive episode-based analysis and subsequently develop algorithms for defining episodes based on diagnoses, services, and calendar dates.

Complexity

The episode of care is a multidimensional concept that can be challenging to define appropriately, particularly if the health issue under study is in itself complex or heterogeneous. In order to develop meaningful parameters for the episode, the researcher is advised to either seek clinician input or draw from previously published literature outlining such parameters. Finally, researchers may encounter difficulties in communicating episode-based analyses to others who may not be familiar with the concept.

Time and Resources Required

The increased complexity of the episode approach over traditional silo-based forms of analyses leads to increased time and resources required for tasks such as defining episodes, preparing datasets, and troubleshooting analyses. Many episode-based analyses also require substantial computing power to run.

Methodological Challenges

Episode-based approaches also bring more complex methodological issues that need to be addressed, such as developing methods that measure variables across time, providers, and settings; attributing services within overlapping episodes; adjusting for patient case mix; and dealing with outliers. Some of these issues and their implications, as well as potential solutions, are described in the following sections.

Notwithstanding these challenges, when an episode-based analytic approach is well aligned with the intended research questions or analysis objectives and applied carefully and thoughtfully, it can be a powerful tool for both research and policy applications.

Although the episode of care concept is far from new, it has experienced a surge in popularity in recent years due to its growing use in high-profile applications such as provider payment policies and profiling efforts. Traditionally, while regarded as conceptually attractive, episode-based approaches were often difficult to implement in practice outside of research efforts. The increasing availability of linked health-care services datasets suitable for constructing episodes, ready-made episode grouping software, and advances in computing power have enabled episode-based approaches to become a viable option in a growing range of applications.
Constructing an Episode of Care: Key Components

Health services researchers seeking to construct analyses using episode-based approaches should familiarize themselves with several basic requirements in terms of the nature of the data required and the essential elements for defining the episode. The growing body of literature around this approach also provides insight into a variety of methodological challenges and considerations that are frequently encountered in developing episodes of care.

Data Sources Required

From an operational perspective, a researcher seeking to construct an episode of care requires data on individual health service encounters that contain several core elements necessary for defining the key parameters for an episode of care.

Individual-Level Record Linkage

The temporal nature of the episode and its organization of related health-care events around an individual’s health issue require that data for analysis contain an identifier at the individual level that can be linked across records and over time. Typically, data elements from health-care datasets (such as hospital discharges, physician billings, and home care services) are merged and linked using either a unique patient identifier, probabilistic matching algorithms that match on some combination of variables (e.g., age, place of residence, and time of the encounter), or a combination of the two approaches. Health-care datasets with individual-level record linkage are made available through government sources such as the Centers for Medicare and Medicaid Services or research institutions such as the Institute for Clinical Evaluative Sciences in Ontario.

Information on Type of Service

The type of service delivered (e.g., the type of physician procedure, hospital inpatient admission, or drug prescription filled). Sometimes, a single record may contain multiple instances of this, such as a hospital inpatient stay with multiple procedures performed.

Diagnosis Information

The patient’s diagnosis. This may take the form of either a Principal or Most Responsible Diagnosis (diagnosis responsible for the majority of care provided) or a Primary (preadmission), Secondary (comorbidity), or Complication (postadmission) diagnosis.

The Date/Time of the Service Delivered

This element is crucial in order to be able to assign encounters around a particular period of time or to arrange encounters in a medically logical order (e.g., initial diagnosis followed by disease staging followed by surgery followed by follow-up assessments). For hospitalization episodes, this may include an admission date, discharge date, and sometimes the date of procedures performed within the hospitalization.

Core Elements of the Episode

As first described by Solon et al., every episode of care has a set of three core elements that must be defined in order to set the parameters for analysis: the index event and/or starting point for the episode, the episode endpoint, and the scope of services included (Solon et al. 1967). In parallel with the definition of these elements, the researcher must select the outcome measures of interest to be examined using the episode construct.

In defining these core elements, the researcher is advised to consider the research or policy applications of the analysis and to solicit clinician input on these definitions. One of the most attractive features of the episode of care approach is its resonance as a meaningful measurement unit for clinicians.
Defining the Index Event and/or Starting Point

An episode of care requires an index encounter or event that triggers the start of the episode. This index event may be a specific health-care service (such as a knee replacement procedure), a particular diagnosis or health condition (such as a diabetes diagnosis, assigned by any provider in any setting), or some combination of these two, such as an admission to hospital for treatment of a congestive heart failure exacerbation. In most cases, the index event will also mark the start of the episode. Exceptions to this rule include examples where the episode definition employs a “look-back” period from the index event, as in the case where the incidence of a surgical hospitalization triggers an episode window preceding the admission with a defined period of presurgical care.

An index event may also take the form of a point in time rather than a particular health-care encounter. Some episode methodologies use this approach for defining episodes for chronic diseases such as diabetes or chronic obstructive pulmonary disease. These are expected to be lifelong conditions, but for ease of analysis, they may be annualized into year-long episodes. A point in time might also be used as the index event and starting point in the case of an incomplete episode, where the analyst’s dataset begins at a particular date in time and there may be encounters related to a particular episode falling before the start of the dataset, rendering it impossible to establish a definite starting point.

Some episode approaches may also employ index events that “shift” an existing episode category into a different category when they occur. For example, an ICU admission by a patient registered with a COPD episode might shift the patient’s episode into a higher severity level; a patient with an ongoing coronary artery disease episode that experiences a heart attack might be shifted into an acute myocardial infarction episode or a coronary artery bypass graft episode if they seek surgical treatment. Similarly, the occurrence of a complication such as a pressure ulcer during hospitalization for treatment of hip fracture might trigger a separate, concurrent episode for the complication.

Examples

Scitovsky (1967), Moscovice (1977), and others used an index event for their episodes defined by the first recorded instance of prespecified diagnoses (Moscovice 1977; Scitovsky 1967).

The American Board of Medical Specialties Research and Education Foundation defined the index event for the episode of care for colonoscopy as the provision of a colonoscopy procedure, but defined the episode to also include services provided in the 3 days preceding the colonoscopy (High Value Health Care Project 2011).

Defining the Endpoint

Each episode has an event, time window (either a fixed time window from the episode index event or a window of time where any related services are absent), or other decision rule triggering the conclusion of the episode. Researchers may select a clinically logical event for concluding an episode such as a specific health-care event. This is more common in cases of elective and trauma procedures where a defined sequence of health-care events is expected to take place. For example, in Fig. 2, the patient arrives at the emergency department and is assessed for the presence of a tibia fracture (physician exam, diagnostic testing). The fracture is confirmed or refuted. If confirmed, the patient is referred for orthopedic assessment and identified as a surgical or nonsurgical candidate. The patient receives surgery or conservative management, followed by restorative care before discharge. The patient then receives follow-up in the community. Health-care events which may end the episode include a patient’s death, discharge from hospital, or a follow-up appointment after surgery.

Fig. 2 Episode of care for patient arriving to an emergency department with a suspected fracture of the tibia

However, a clinically logical event is not always available. The original definition of the episode of care put forward by Scitovsky (1967) and Solon et al. (1967) suggested that episodes for a particular health issue concluded with the discontinuation of services for that health issue (Scitovsky 1967; Solon et al. 1967). Often described as a “clean period,” this generally takes the form of a specified window of time where no services related to the episode are provided. For example, in the case of chronic bronchitis, this might be 45 days without any services related to bronchitis treatment such as x-rays or relevant medication. Theoretically, under this definition, episodes for a particular condition can have any duration, so long as relevant services continue to be provided for treatment of the condition. As with the duration of a fixed time window, the duration of a clean period should be condition or procedure specific and defined based on clinical input. Typically, episodes for acute conditions such as appendicitis – where a defined, time-limited course of treatment can be expected – will have shorter clean periods than episodes for chronic diseases or acute conditions with chronic sequelae like stroke, where follow-on care can sometimes last for years. It should be noted that with endpoints based on clean periods, the same considerations apply in terms of “open” episodes: active episodes where a dataset or claims history is censored before the full duration of the clean period elapses are considered “open” at that point.

Alternatively, an endpoint can be a fixed point in time, such as 30 days following a hospital admission or discharge. These sorts of calendar-based episode endpoints are commonly used for outcome measures that seek to compare “apples to apples” across providers that might have different discharge practices. The current public reporting principles adopted by the Centers for Medicare and Medicaid Services to report on hospital mortality, readmission, and other outcomes stipulate the use of a standardized time period to facilitate
comparison. A point in time approach may also be adopted in the case of chronic disease episodes based around an annualized analysis period or where a dataset is censored at a particular date and truncates “open” episodes. Using migraine episodes, Schulman et al. (1999) put forward a novel approach to empirically defining the length of an episode of care (Schulman et al. 1999). The study used administrative claims data to determine the point in time following the index event where elevated weekly charges returned to their original pre-episode levels.

Finally, the start of a new episode may trigger the close of an existing one. For example, a patient suffering from osteoarthritis of the knees who receives a total knee replacement may have an ongoing osteoarthritis management episode replaced with a total knee replacement procedural episode. Following the surgery, should their osteoarthritis be completely addressed, the patient would not be expected to continue the original disease episode.

Examples of Endpoints

Moscovice drew on published medical directives and clinician expert opinion concerning “reasonable periods of follow-up” to define a time period for each condition where the absence of services related to the condition would mark the beginning of a new episode (Moscovice 1977). Scitovsky used a similar condition-specific approach to defining episode duration (Scitovsky 1967).

Health Quality Ontario used input from clinical expert panels, informed by analysis of linked administrative data on utilization, to define the typical duration of services provided in episodes of hip fracture care (Health Quality Ontario 2013).

Symmetry’s Episode Treatment Groups use the approach of “annualizing” the episode of care for chronic diseases with indefinite durations (Optum 2015).

Selecting the Scope of Services Included

Episodes of care can be as comprehensive or as specific in their inclusion of services as a researcher desires and as is feasible given available data. The scope of services included requires a decision on the part of the researcher: a more holistic episode approach might capture all services provided during the episode window, regardless of whether they appear to be directly related to a condition. This approach is being employed by the Centers for Medicare and Medicaid Services’ Bundled Payments for Care Improvement initiative (Centers for Medicare & Medicaid Services 2014). A more limited episode may include only those services directly related to a particular condition. For example, in defining services to be included in episodes of diabetes care, the Netherlands’ bundled payment initiative has included only community-based professional services, excluding drugs and hospitalizations (Struijs et al. 2012a).

Ultimately, the scope of services included in the episode depends on the objectives of the analysis and its intended applications and the nature of the data available. Payment applications, for example, may suggest the utility of a single episode payment that covers multiple different types of services over a fixed period of time, in order to prevent any risk of “double counting” payment (Struijs et al. 2012b). A truly comprehensive episode might even include services beyond those delivered by health-care providers: for an episode of care around complex patients with functional needs, it may be ideal to also include social care services delivered – to the extent that they are captured in databases.

If the researcher elects to use a more clinically focused approach or a categorically based approach to service inclusion, clinical input is imperative. Input from clinical panels is required to identify the services that are related to the episode of care and the types of services that would likely not be related.

Examples

Moscovice used published medical directives and clinical input to define lists of medical services that could “realistically be used in the treatment of a particular problem or related comorbidity.” In the case of otitis media, this list of services included lab tests such as throat cultures that
might be used to rule out plausible related comorbidities. Based on these lists, Moscovice defined a set of “patterns of care” based on the most common combinations of services delivered to treat each episode. For otitis media, 20.6% of episodes analyzed consisted of a single visit, while 13.8% consisted of an initial visit, administration of an antibiotic, throat culture, and then a follow-up visit (Moscovice 1977).

Solon et al. examined nursing students’ utilization of health-care services within episodes of care through separating encounters into “universal” visits – those services, such as vaccinations, provided to all students – and “individual” visits specific to treating the nursing student’s episode (Solon et al. 1969).

Outcome Measures

Ultimately, the episode of care is intended to serve as a clinically relevant unit of analysis for measuring particular aspects of care or outcomes delivered. In the broadest sense, any outcome measured at a standard time frame (e.g., 30-day mortality) might be considered an application of the episode-based approach. However, most episode-based studies have focused largely on process- or utilization-related measures. Following Falk et al.’s concept of the episode or pathway as a unit of analysis for auditing quality of care (Falk et al. 1967), Lohr and Brook (1980) compared providers’ use of appropriate therapy for respiratory infection, while Nutting et al. (1981) used episodes of care to compare health systems’ performance in terms of preventative services, timely diagnoses, continuity of care, and other factors. By far the most common use of episodes since their earliest uses has been for examination and comparison of health-care costs and utilization: studies by Scitovsky (1967) and Solon et al. (1969) examined measures of total episode costs and number of visits by different health professionals, respectively. A popular use of episode-based cost measures involves the comparison of different physicians or physicians’ practices in terms of the total downstream health-care costs of their patients – a practice known as profiling (Cave 1995). More recent studies have used episodes for similar cost and utilization profiling approaches with hospitals as the central unit of analysis (Birkmeyer et al. 2010), as well as exploring regional comparisons (Reschovsky et al. 2014). Regardless of the unit of analysis for comparison, the episode construct provides an “apples to apples” mechanism that allows for comparison of the total treatment “product” between different providers or regions.

The vast majority of episode-based costing analyses have been conducted in the United States, where the predominant use of itemized claims data for reimbursing health-care services naturally lends itself to the aggregation of such claims into episodes of care. In countries such as Canada or some European nations that make greater use of global budgets for funding health-care services, constructing episode of care investigations of health-care costs requires the development of methodological approaches that serve as surrogates for “pricing.” In Ontario, such approaches have been developed using a combination of case mix cost estimation methodologies for globally budgeted hospital sectors and claims schedules for physicians and other fee-for-service providers (Sutherland et al. 2012).

Examples

Sutherland et al. compared the total costs (including hospital, physician, and inpatient and community-based rehabilitation costs) of hip and knee replacement episodes between regions in Ontario, correlating higher costs with the use of less efficient care settings (Sutherland et al. 2012).

After defining the most common combinations of services (or “patterns of care”) used for each type of episode, Moscovice evaluated the proportion of episodes delivered according to these patterns and compared the results between different care providers and settings (Moscovice 1977).

Scitovsky used episode-based measures of total health-care costs per treated condition to assess differences in costs (and the changes in service mix driving these differences) for episodes of care over time (Scitovsky 1967).
Lohr and Brook (1980) used an episode-based analysis to compare quality of care for respiratory conditions before and after the publication of guidelines on the use of injectable antibiotics, with quality defined as the percentage of episodes that included appropriate use of antibiotic therapy (Lohr and Brook 1980).

Constructing an Episode of Care: A Hip Fracture Example

Research Question

As described earlier, the episode of care can be as comprehensive or as specific in its inclusion of services as is desired and feasible. Here, an example is presented of a more focused episode construction to address the question of the effect of timing of hip fracture surgery on patient outcomes. Many argue that patients presenting to hospital with hip fracture should receive surgery as early as possible; however, the literature detailing the benefits of accelerated access to the procedure is inconclusive. Furthermore, little is known as to causes of delay: some patients wait to be medically stabilized, while others are delayed due to administrative factors such as hospital type, transfers, and date and time of admission.

The literature identifies the following pathways on the basis of treatment patients receive during acute hospitalization with hip fracture: surgical treatment (Menzies et al. 2010), nonsurgical treatment (Jain et al. 2003), or palliative care (Meier 2011). Most patients undergo surgical treatment either during their initial hospitalization or after transfer from the hospital where they were initially admitted. While in the hospital, some patients are medically stabilized before surgery. Patients remain in the hospital after surgery until they are fit to be discharged home or to an alternative level of care. Some patients receive nonsurgical management of their hip fracture as their risk of complications and death is too high.

Data Source: Canadian Institute for Health Information Discharge Abstract Database

The Canadian Institute for Health Information (CIHI) is an independent, not-for-profit organization that provides information on Canada’s health system and the health of Canadians (Canadian Institute for Health Information 2015). CIHI facilitates collection of standardized administrative, clinical, and demographic data from acute hospitalizations through the Discharge Abstract Database (DAD). The data (2003–2012) are presented as a series of flat comma-delimited files with multiple abstracts for some patients. To prepare data for analysis, researchers develop a relational database to facilitate combining abstracts into episodes of care. In the following sections, a conceptual framework for constructing an episode of hip fracture care and the approach for operationalizing it using the CIHI abstracts are described.

Here a method for constructing an episode of care to study the effects of timing of hip fracture surgery using acute care discharge abstracts is described, and therefore, the episode is confined to patients admitted to the hospital and outcomes occurring in-hospital. Data relating to emergency department wait times or post-acute care utilization were not provided.

Defining the Index Event

The ideal index event is injury time. This event enables researchers to capture all hip fracture patients, includes events preceding hospital admission such as prehospital death, and captures the time from injury to admission which contributes to delays (Sheehan et al. 2015). However, injury time is not available through administrative databases and therefore alternative index events must be considered. When identifying the index event for the episode from administrative data, researchers may select the hip fracture surgery
These patients are medically stabilized and procedure, the hip fracture diagnosis, or admis-
discharged home or to an alternative level of sion with a diagnosis of hip fracture (Fig. 3).
care. Palliative care is offered to patients at the A procedure approach captures outcomes which
end stage of a terminal illness. occur postoperatively implying that time at risk
Fig. 3 Approaches to defining the index event for a hip fracture episode of care. Thick vertical lines indicate the index event for constructing each care episode. Dashed boxes and arrows represent events and their timings ascertained retrospectively. Solid box and arrows represent events and their timing ascertained prospectively
begins at the time of surgery. A diagnosis approach includes patients who incur a hip fracture in an acute hospital following admission for another diagnosis. Here an admission approach is adopted as it allows researchers to capture outcomes which occur before surgery, including preoperative death, while excluding patients who incur a hip fracture in the hospital after admission for another diagnosis (Sheehan et al. 2015).

Defining the Endpoint

In this example, a clinically logical event defines the endpoint: death, discharge home, or discharge to an alternative level of care. A fixed point in time is also considered an endpoint as the dataset is censored at March 2012.

Scope of the Services Included

In this example, the services included are specific to the effect of surgical timing on outcomes of acute hip fracture care. First, researchers define how time to surgery is measured. Where the index event is surgery, time to surgery is measured from admission to surgery within a single discharge abstract. Where the index event is diagnosis, time to surgery is measured from diagnosis (preadmission or postadmission) to surgery. Where the index event is admission, time to surgery is measured from the earliest admission time to surgery time, preoperative death, or discharge without surgery. This approach is inclusive of transfers which occurred between admission and discharge, a potential administrative factor for delay (Fransoo et al. 2012).

Transfers from one acute care facility to another appear in the data as a single patient with multiple records for hip fracture. Here, contiguous abstracts linked by transfers are combined in one episode; the earliest admission date and the latest discharge date are designated as the beginning and the end of the episode (Fig. 4). To determine whether multiple records for a given patient reflect transfer before definitive care, the following rules are applied:

1. Less than 6 h between discharge on one abstract and admission on another abstract (12 h if at least one institution codes the transfer)
Fig. 4 Conceptual framework for constructing hip fracture acute episodes of care. A patient is admitted to acute hospital for their first episode of hip fracture care. They may be transferred from one acute care facility to another before definitive care – surgery or conservative management. Once acute care is completed, they are discharged home or to an alternate level of care. On completion of the first episode of care, a patient may return to acute care for a related episode – revision surgery, readmission, or for a change in care. Alternatively a patient may return to acute care with an entirely new subsequent hip fracture
2. Admission before 6:00 (12:00 if at least one institution codes the transfer), when discharge and admission occur on the same day but discharge time is unknown
3. Discharge after 18:00 (12:00 if at least one institution codes the transfer), when discharge and admission occur on the same day but admission time is unknown

After discharge from acute care, some patients return to the hospital for an episode related to their hip fracture. They may return to acute care with a complication that requires revision surgery, such as a failed fixation/prosthesis. Alternatively they may return to acute hospital for treatment of medical complications related to their hip fracture. Patients discharged without surgery may also return for surgery to alleviate pain or if they are no longer considered unfit for surgery (Fig. 4). Finally, patients may present to the hospital with an entirely new subsequent hip fracture (Fig. 4). Following consultation with orthopedic surgeons, the following rules are created for patients with multiple discharge abstracts to identify related episodes as revision, readmission, change in care, or subsequent:

• Revision: surgical admission within 90 days of discharge after initial surgical episode
• Readmission: nonsurgical admission within 90 days of discharge after initial surgical/nonsurgical episode
• Change in care: surgical admission within 30 days of admission for initial nonsurgical episode
• Subsequent: hip fracture admission more than 90 days after the initial episode
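The four classification rules above can be expressed as a small function. The following is a minimal sketch rather than the authors' implementation: the `Episode` record and the `classify_return` function are hypothetical names, every episode is assumed to be a hip fracture episode, "within N days" is assumed to be inclusive, and the 90-day threshold for a subsequent fracture is assumed to run from discharge of the initial episode.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Episode:
    # Hypothetical record: one episode of hip fracture care after transfers are combined.
    admitted: date
    discharged: date
    surgical: bool  # True if the episode included hip fracture surgery


def classify_return(initial: Episode, later: Episode) -> Optional[str]:
    """Label a later episode relative to the initial one; None if no rule applies."""
    days_from_discharge = (later.admitted - initial.discharged).days
    days_from_admission = (later.admitted - initial.admitted).days

    # Revision: surgical admission within 90 days of discharge after a surgical episode.
    if later.surgical and initial.surgical and 0 <= days_from_discharge <= 90:
        return "revision"
    # Change in care: surgical admission within 30 days of admission for a nonsurgical episode.
    if later.surgical and not initial.surgical and 0 <= days_from_admission <= 30:
        return "change in care"
    # Readmission: nonsurgical admission within 90 days of discharge.
    if not later.surgical and 0 <= days_from_discharge <= 90:
        return "readmission"
    # Subsequent: admission more than 90 days after the initial episode (assumed: after discharge).
    if days_from_discharge > 90:
        return "subsequent"
    return None
```

Under these assumptions some combinations fall outside every rule, for example a surgical admission 60 days after admission for a nonsurgical episode; the function returns `None` for those so that they can be reviewed rather than silently assigned.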
After the application of the rules, some adjacent abstracts remain unassigned because their admission and discharge dates are in reverse order. Only the abstracts with the earlier admission date are used for constructing care episodes.

Data Model

For patients with a single discharge abstract, the abstract represents the first episode of hip fracture care. Multiple abstracts for a given patient could represent the first episode of hip fracture care, revision surgery, readmission, change in care, or a subsequent episode of hip fracture care. As such, the data fields from multiple discharge abstracts are used to construct new fields or update information in the same field but from a different abstract. A data model is developed to relate multiple abstracts of hip fracture care for a given patient, which explicitly defines how data fields relate to each other (Table 1). In particular, the data model establishes relationships among tables containing discharge abstracts of the first episode of hip fracture care, revision, readmission, change in care, and subsequent hip fracture episodes. This involves creating a series of data tables and establishing relationships between them:

• Episode of hip fracture care table contains discharge abstracts of the first and subsequent episodes of hip fracture care, uniquely identified by patient id and hip fracture number. The episode may combine information from abstracts linked by transfers.
• Revision surgery table contains discharge abstracts of surgical hospitalization following first or subsequent episodes of hip fracture care.
• Readmission table contains discharge abstracts of nonsurgical hospitalization following first or subsequent episodes of hip fracture care, whether surgical or medical.
• Change in care table contains discharge abstracts of surgical hospitalization following first or subsequent nonsurgical episodes of hip fracture care.
• Other tables contain demographic and comorbidity data.

Normalization is used to organize the CIHI discharge abstracts. First, repeating data fields with similar data in individual tables are eliminated, a separate table for each set of related data is created, and each set of related data is identified by a two-part primary key: patient id and hip fracture number. This normalization helps avoid multiple
Table 1 Algorithm for identifying and classifying episodes of hip fracture care
Step 1 Remove duplicates from CIHI records
Step 2 For patients with single record, convert their records into episodes of initial hospitalization
Step 3 For patients with multiple records, combine records linked by transfers into care episodes:
  (a) Designate the earliest unlinked record as the start of a new episode
  (b) Combine contiguous records into an episode of care if transfer is identified
  (c) If records remain, go to 3a
Step 4 For each patient, classify the episode with earliest admission as initial hospitalization
Step 5 Classify episodes of surgical hospitalization with admission within 90 days of discharge from initial surgical hospitalization as revision
Step 6 Classify episodes with admission within 90 days of discharge from initial nonsurgical hospitalization as readmission
Step 7 Classify episodes of surgical hospitalization with admission within 30 days of admission from initial nonsurgical hospitalization as change in care
Step 8 For each patient, classify the episode with earliest admission beyond 90 days of discharge from initial surgical hospitalization as initial hospitalization with a new fracture
Step 9 Mark episodes with admission for open, pathological, and post-admit fracture
Step 10 Mark records not assigned to any episode as unassigned
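Step 3 of Table 1 can be illustrated with a short sketch for a single patient's abstracts. It is illustrative only and makes several assumptions: the `Abstract` and `Episode` records, the `transfer_coded` flag, and the function names are hypothetical; admission and discharge times are assumed to be known, so only the first transfer rule (a gap of less than 6 h, or 12 h if at least one institution codes the transfer) is applied; and duplicates are assumed to have been removed already (Step 1).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Abstract:
    admitted: datetime
    discharged: datetime
    transfer_coded: bool = False  # hypothetical flag: the institution coded a transfer


@dataclass
class Episode:
    admitted: datetime    # earliest admission among the linked abstracts
    discharged: datetime  # latest discharge among the linked abstracts
    n_abstracts: int


def is_transfer(prev: Abstract, nxt: Abstract) -> bool:
    """Rule 1 only: gap under 6 h, or under 12 h if either abstract codes a transfer."""
    limit_hours = 12 if (prev.transfer_coded or nxt.transfer_coded) else 6
    gap = nxt.admitted - prev.discharged
    return timedelta(0) <= gap < timedelta(hours=limit_hours)


def build_episodes(abstracts: List[Abstract]) -> List[Episode]:
    """Combine contiguous abstracts linked by transfers into episodes (Steps 3a to 3c)."""
    episodes: List[Episode] = []
    prev: Optional[Abstract] = None
    for current in sorted(abstracts, key=lambda a: a.admitted):
        if prev is not None and is_transfer(prev, current):
            # Step 3b: extend the open episode with the linked abstract.
            episode = episodes[-1]
            episode.discharged = max(episode.discharged, current.discharged)
            episode.n_abstracts += 1
        else:
            # Step 3a: the earliest unlinked record starts a new episode.
            episodes.append(Episode(current.admitted, current.discharged, 1))
        prev = current
    return episodes
```

A negative gap (admission recorded before the previous discharge) is not treated as a transfer here, so the abstract starts a new episode; a fuller implementation would instead mark such records as unassigned, as Step 10 describes.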
fields storing similar data in one table. Second, separate tables for groups of data fields that apply to multiple abstracts are created, and these tables are related with a foreign key. This normalization maintains records that only depend on a table’s primary key.

Use of the Data

The dataset was created for estimating the frequency of preoperative deaths, postoperative complications, and in-hospital deaths following complications among patients exposed to various times before surgery. More specifically, the dataset creation enabled capturing events and durations associated with hip fracture care delivery. By operationalizing patient pathways in terms of data available from the CIHI, preoperative transfers, surgery, postoperative transfers, and outcomes of admission (preoperative death, postoperative complications, and death), as well as events following discharge (readmissions, revisions, subsequent hip fractures), were captured. From this dataset the durations of hospital stay, preoperative stay, and postoperative stay were estimated.

Patient and administrative factors for delay, including demographic, clinical, and injury data fields and hospital type, date, and time of admission, were also captured. These data facilitate the assessment of potential causes of delay. Combining discharge abstracts of all patients, whether they have surgical or nonsurgical treatment or die before surgery, facilitates assessment of the total harm from delays by considering deaths in those who did not make it to surgery.

Constructing an Episode of Care: A Cardiac Example

Research Question

A patient identified as in need of coronary artery bypass graft (CABG) while a hospital inpatient or as an outpatient is registered on a wait list for the elective procedure. A patient may encounter different clinically logical events to define the endpoint: death, change in surgical candidacy, or the procedure itself. Sobolev and Kuramoto studied outcomes of surgical cardiac care according to time to surgery (Sobolev and Kuramoto 2007).

Data Sources

Data on patients registered to undergo CABG are obtained from the British Columbia Cardiac Registry (BCCR) (Volk et al. 1997). This prospective database contains dates of registration on the list, procedure, and withdrawal from the list, along with disease severity and other risk factors, for all patients who are registered to undergo CABG in any of the four tertiary care hospitals that provide cardiac care to adult residents of British Columbia. Additional information on access to CABG is obtained from the BC Linked Health Database Hospital Separations File (Chamberlayne et al. 1998) and on deaths from the provincial Death File (Sobolev et al. 2006).

Capturing Events by Linking Data Sources

The care episode begins with a cardiac surgeon’s assessment and includes hospital inpatients and outpatients registered on a wait list for elective CABG. A series of events takes place preoperatively outside the hospital; preoperatively, perioperatively, and postoperatively in the hospital; and postoperatively outside the hospital. The care episode ends with death, change in surgical candidacy, or the procedure itself.

For patients registered on a wait list for elective CABG, a preoperative assessment, which may include additional tests, may occur prior to admission or in the hospital. Their surgical candidacy is then confirmed or refuted by an anesthesiologist. Once a patient is identified as a surgical candidate, their access to the procedure is determined through scheduling of operating room time. Patients are selected from hospital admissions and from the wait list on the basis of urgency, resource availability, and plan for discharge from
the hospital. The allocated time may change if emergent cases arise, if cancelations occur prior to the scheduled time, or if a patient’s status changes during their wait. The patient is assessed again preoperatively, receives their surgery, is monitored postoperatively in the postanesthesia care unit, and is transferred to the ward or intensive care unit. The patient’s postoperative recovery is managed in the hospital until they are suitable for discharge home or to an alternate level of care. On discharge the patient is followed up in the community until their recovery is complete or death occurs.

Administrative health databases may hold multiple records for one patient. Patient records may be organized in two different formats – the “person-level” format or the “person-episode” format. The person-level format contains a single record per patient. In the current example, this approach would enable researchers to capture the time from inpatient registration on a wait list for elective CABG to the procedure, discharge, or transfer to an alternate level of care from a single hospitalization record. The person-episode format contains multiple records per patient. In the current example, this approach would enable researchers to capture the time from inpatient or outpatient registration on a wait list for elective CABG to the procedure, discharge, or transfer to an alternate level of care from multiple administrative records. As the aim of the present study is to determine the impact of waits on outcomes in cardiac care, all events contributing to the wait and potential outcomes of waiting should be captured. In order to achieve this, the person-episode approach is adopted whereby multiple data sources are linked.

The series of events during the care episode and patient characteristics are captured with administrative data entry. A data model which chronologically relates events captured by data elements is created. Events of interest include registration and removal from the wait list, hospital admission and discharge, scheduled surgery and unplanned emergency surgery, and preoperative, in-hospital, or follow-up death. Each event has an associated time stamp which allows researchers to sequence the events and to determine the interval (wait time) between events. Once the events are sequenced, a person-episode record is created which includes a de-identified patient number and an event number. This combination uniquely determines the patient-episode related to a specific event.

Linkage of Cardiac Registry, Hospital Separations, and Death Files

A patient’s Provincial Health Number is used to link BCCR records with the BC Linked Health Database Hospital Separations File and to the Death File. Events including hospital admission, comorbidities, surgery, hospital separation, and discharge type (home, alternate level of care, or death) are retrieved from the BC Linked Health Database Hospital Separations File. Deaths which do not occur in the hospital are captured by the Death File. Adopting a person-episode approach, the BCCR records are linked to the BC Linked Health Database Hospital Separations Files and the Death Files to create an analytical dataset. An analytical data dictionary is created to describe the variables created to represent events and patient characteristics (Table 2).

Use of the Data

The dataset was created for estimating outcomes of registration for elective (nonemergency) procedures in surgical cardiac care. These outcomes included preoperative death, postoperative death, change in urgency status, and unplanned emergency surgery among patients exposed to various times before CABG. More specifically, the dataset creation enables capturing events and durations associated with registration on a wait list for CABG.

By operationalizing patient pathways in terms of the data available from the cardiac registry, hospital separations, and death file, preoperative events (delay to surgery, change in urgency status, unplanned emergency surgery, death) and postoperative death were captured. Furthermore, the durations of time spent on the wait list for elective
Table 2 Analytical dataset data dictionary for records of patients awaiting elective coronary artery bypass grafting
Variable | Description | Source | Code
BCCR_ID | Patient identifier | BCCR | <Text>
AGECAT | Age decade | BCCR | 1 – 20–29 years; 2 – 30–39 years; ...; 8 – 90 years
SEXF | Sex | BCCR | 0 – man; 1 – woman
ANATOM | Coronary anatomy | BCCR | 1 – left main disease; 2 – 2- or 3-vessel disease, with PLAD; 3 – 3-vessel disease, with no PLAD; 4 – 1-vessel disease, with PLAD; 5 – 1- or 2-vessel disease, no PLAD; U – otherwise and unknown
UR_BR | Urgency at booking | BCCR | 0 – emergency; 1 – urgent; 2 – semiurgent; 3 – nonurgent; U – unknown
CM_CH | Comorbidities from Charlson index | Hospital separations | 0, 1, 2, 3, or 4 (≥4)
CM_BK | Major comorbidities | Hospital separations | 1 – CHF or diabetes or COPD or rheumatism or cancer; 0 – other
INST_BK | Location at registration | BCCR | Hospital 1, 2, 3, or 4
WL_ST | Wait-list registration date | BCCR | mm/dd/yyyy
WL_EN | Wait-list removal date | BCCR | mm/dd/yyyy
WL_RM | Reason for removal | BCCR | 0 – underwent surgery; 1 – death; 2 – medical treatment; 3 – at patient request; 4 – transfer to other hospital; 5 – otherwise removed from list; 6 – no surgical report; 7 – still on wait list; 8 – other surgery; 9 – death recorded in BCCR, not in Deaths File
DTHDATE | Death date | Death file | mm/dd/yyyy; < . > – no date recorded
EXIT_CODE | Type of hospital discharge | Hospital separations | D – discharged alive; S – left against medical advice; X – died in the hospital; N/A – not applicable
ADDATE | Hospital admission date | Hospital separations | mm/dd/yyyy; < . > – no date recorded
SEPDATE | Hospital separation date | Hospital separations | mm/dd/yyyy; < . > – no date recorded
With kind permission from Springer Science + Business Media: Analysis of Waiting-Time Data in Health Services Research, Waiting-time data used in this book, volume 1, 2008, 21–22, Boris Sobolev and Lisa Kuramoto, Table 2.1
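As an illustration of how the variables in Table 2 might be used, the sketch below computes time on the wait list from WL_ST and WL_EN and summarizes it by urgency at booking (UR_BR) for patients removed because they underwent surgery (WL_RM code 0). It is a minimal example under stated assumptions: records are assumed to be dictionaries keyed by the Table 2 variable names with codes stored as strings, the function names are hypothetical, and the median is used only as a simple summary. A full analysis would also need to account for patients removed from the list for other reasons.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

DATE_FMT = "%m/%d/%Y"  # Table 2 lists dates as mm/dd/yyyy


def wait_days(record: dict) -> int:
    """Days between wait-list registration (WL_ST) and removal (WL_EN)."""
    start = datetime.strptime(record["WL_ST"], DATE_FMT)
    end = datetime.strptime(record["WL_EN"], DATE_FMT)
    return (end - start).days


def median_wait_by_urgency(records):
    """Median wait in days by UR_BR, restricted to WL_RM == '0' (underwent surgery)."""
    waits = defaultdict(list)
    for record in records:
        if record["WL_RM"] == "0":
            waits[record["UR_BR"]].append(wait_days(record))
    return {urgency: median(days) for urgency, days in waits.items()}
```

For example, two semiurgent patients who waited 14 and 30 days before surgery would give a median of 22 days for urgency code 2.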
surgery were estimated by urgency status. These data enabled researchers to answer questions such as:

• What is the variation in time spent waiting for elective surgery?
• What is the effect of delays in scheduling an operation?
• Do longer delays contribute to preoperative mortality among patients with less urgent need for surgery?
• What is the survival benefit of cardiac surgery?
• What is the risk of death associated with delayed surgical treatment?

Combining data of all patients registered on the CABG wait list, whether they went on to receive surgery or not, facilitates assessment of the total harm from delays by considering change in urgency status and deaths in those who did not make it to surgery.

Expanding on and Applying Episodes of Care: Further Considerations

Building Episode-Based Case Mix Classification Systems

While most of the studies conducting episode-based analyses reviewed in this chapter focus on a limited set of conditions, episode grouping software such as the Symmetry Episode Treatment Groups (ETGs) (Optum 2015), Thomson Reuters Medical Episode Groups (MEGs) (MaCurdy et al. 2009), and the Centers for Medicare and Medicaid Services’ episode grouping algorithms (Centers for Medicare and Medicaid Services 2015b) seek to assign all patient health-care encounters to mutually exclusive episodes based on their diagnosis and procedure combinations. Such systems are developed with the objective of establishing a comprehensive episode-based case mix classification system, analogous to the long-established diagnosis-related groups (DRGs) and other similar classification systems that categorize hospital inpatient stays into one of several hundred preestablished case mix groups that share similar clinical and resource utilization characteristics (Fetter et al. 1980).

Developing a case mix classification system is a significant endeavor. Rather than development being limited to a few particular types of episodes of interest, case mix systems operate under the principles of being mutually exclusive and collectively exhaustive: thus, an effective episode grouping system (also known as a “grouper”) would feature logic to assign every health-care service claim or encounter record to a particular type of episode, selected from a limited list of episode categories.

From the researcher’s perspective, the decision on the appropriate approach here depends on the objectives of the analysis: if the objective is to develop an episode-based payment system that provides payments for all health-care services through an “episode bundle,” a full case mix system will be required to ensure all patients are assigned to a particular category. If the idea is simply to focus on analyzing a few different types of episodes, a full case mix system will not be required, although an existing public domain or commercial episode grouping product could be applied to define any number of episodes based on preexisting grouping algorithms. If an existing episode grouping solution is applied, the researcher is advised to acquire a thorough understanding of the underlying clinical logic of the software.

Risk Adjustment and Severity Classification

A key enhancement made in the 1990s over the basic concept of episode grouping and classification systems was the development of episode-based risk adjustment models. Wingert et al. (1995) first noted the need to incorporate severity adjustment into episode-based analyses, beyond that offered by a diagnosis-based classification system (Wingert et al. 1995).

Some episode grouping methodologies such as the ETGs employ a hierarchy of subcategories within each type of episode to differentiate between episodes of different severity levels.
These subcategories may be defined with a variety of proxy data points, including patient characteristics such as comorbidities or the type of health-care services received. For example, a diabetes episode restricted to ambulatory services may be assigned to a lower severity level than a diabetes episode that includes a hospitalization for complications of diabetes. The use of different severity categories within episode groups allows for the expected cost (or sometimes, price) of the episode to differ by severity level, in order to compare “apples to apples” in performance profiling applications or to ensure fair reimbursement levels in funding applications.

Even with the use of severity levels within episode groups, there may still be challenges with episode heterogeneity: MaCurdy et al. conducted an extensive series of simulation analyses using proprietary episode groupers and found substantial residual variation in unexplained costs within each severity grouping (MaCurdy et al. 2009). Certain types of health-care utilization that may potentially be included in the scope of the episode have been found to contribute substantial portions of this unexplained cost variation: Vertrees and other researchers with 3M Inc. examined a variety of different sets of parameters for defining post-acute episode windows and found that by excluding readmissions from the episode, the performance of existing case mix systems in terms of predicting total episode costs was vastly improved (Vertrees et al. 2013). In addition to methods for risk adjustment within episode groups, some commercial groupers such as the ETG and MEG methodologies also enable the user to calculate an aggregate risk score for an individual based on their total episode history in a given time period. In such applications, a total risk score is calculated based on the sum of individual risk scores assigned to each type of episode experienced by an individual.

Attributing Episodes to Providers

Episodes of care may be used in applications that involve assigning an episode to a particular provider entity: for example, comparing the relative cost performance of physicians or determining which providers would be eligible to receive a share of a bundled payment. In such applications, business rules must be defined for the attribution of the episode to one or more providers. A variety of approaches to this task are possible and have been explored in the literature. Using a retrospective approach to assigning episodes to providers based on historical fee-for-service claims data, Hussey et al. (2009) examined the impacts of alternate rules for assigning episodes of care to physicians and facilities, with options including attribution to a single physician or facility with the highest total charges in retrospective claims, assignment to a group of physicians or facilities that met a minimum threshold of 25% of total charges, and assignment to the physician with the highest proportion of evaluation and management claim charges, using the rationale that this physician was likely to be the “most responsible” for managing the patient’s care. They concluded that the performance of alternate rules depended significantly on the trajectory of the condition studied: for example, a largely hospital-based episode such as myocardial infarction was more easily assigned to a single facility and physician than a largely ambulatory-based episode such as diabetes, where facilities played a relatively minor role and a larger number of providers were involved in providing care to individual patients (Hussey et al. 2009).

Policy Applications

Up until the 1990s, the use of episode of care methods was mainly confined to research-oriented applications and focused on a small set of conditions or procedures. In parallel, in the 1980s the US health policy landscape was transformed with the development and wide-scale use of the DRGs acute inpatient case mix classification system (Fetter et al. 1980). This was first developed for the purposes of utilization review and then subsequently, and most importantly, applied for the purposes of Medicare hospital payment.

In the 1990s, the first commercial episode-based case mix classification systems emerged and began to be employed by insurers and health maintenance
organizations for comparing efficiency across payment” providers effectively received for
groups of providers (Wingert et al. 1995). These such incidents under the fee-for-service payment
early efforts evolved into well-developed commer- system.
cial platforms such as the ETGs (Optum 2015) and Building on the success of some earlier bun-
the MEGs (MaCurdy et al. 2009). The ETGs and dled payment pilot programs that employed lim-
MEGs both use a flexible time window used to ited episodes of care focused on hospital and
delineate different episodes. Episode-based classi- physician services within single acute care stays,
fication software enabled commercial insurers to in 2011 the Centers for Medicare and Medicaid
assign all their claims and encounter data to distinct Services announced the “Bundled Payments for
episodes, advancing the practical use of episodes of Care Improvement” (BPCI) initiative, a landmark
care for policy applications such as payment and demonstration project that allowed providers to
physician profiling. volunteer for participation in a suite of bundled
In the past decade, episode-based payment and payment options, including episodes indexed by
performance measurement approaches have gath- an acute inpatient hospitalization for a set of eli-
ered huge momentum in the United States, in gible conditions and extending into either 30, 60,
large part due to the Medicare Payment Advisory or 90 days of post-acute care and episodes limited
Committee (MedPAC)’s endorsement of bundled to post-acute care only with similar 30, 60, or
payment approaches as a transformative alterna- 90 day time window options. All Medicare Part
tive to the predominantly fee-for-service payment systems employed in the United States. In their influential 2008 report, Reforming the Medicare Delivery System, MedPAC put forward a strong case backed by extensive analysis for a nationwide shift toward bundled payments for episodes of care defined by an acute hospitalization and a fixed window of post-acute care services (Medicare Payment Advisory Committee 2008). Such a payment approach, MedPAC argued, would have the promise of overcoming several important limitations of Medicare’s fee-for-service payment approaches. Payments for episodes of care shared across groups of providers would offer strong financial incentives for groups of physicians, hospitals, and post-acute care providers to work together, coordinate services, and redesign patient pathways to improve efficiency across the episode. Bundled payments would also target observed unwarranted regional variations in the provision of post-acute care services for similar types of patients, where some areas made much more use of more costly and intensive settings such as inpatient rehabilitation beds and skilled nursing facilities than others. Finally, bundled payments would drive improved quality of care by ensuring that providers would be forced to absorb the costs of unplanned readmissions and complications occurring following discharge from acute care, as opposed to the “double

A and Part B services are included in the episode. For each episode, a single payment is determined for the group of providers based on their historical service claims for similar episodes previously provided and adjusted for regional and national spending levels.

The majority of BPCI participants are enrolled in “retrospective” models, where providers continue to be paid on a fee-for-service basis followed by an episode-based reconciliation against the target total “price” for the episode by all providers participating in the demonstration project. Thus, groups of providers that are able to deliver episodes at a significantly lower cost than their target price are eligible for a share in the savings, whereas providers that exceed the target price may be required to return a share of the overspending to Medicare. As of July 2015, there were over 2000 provider entities that had contracted to participate in one of the BPCI models (Centers for Medicare and Medicaid Services 2015).

As these and other current major episode-driven policy initiatives in the United States, the Netherlands, Sweden, and elsewhere make abundantly clear, the episode of care is currently experiencing a renaissance in terms of its use as a foundational analytic construct to support payment system design, performance measurement initiatives, and a wide variety of health services research applications.
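The retrospective reconciliation described above can be illustrated with a short sketch. It is not the BPCI settlement formula: the function name, the `share_rate` parameter, and the dollar figures are hypothetical, and the text does not state the actual sharing rules. The sketch only shows the direction of the settlement when fee-for-service spending falls below or above the target price.

```python
from dataclasses import dataclass


@dataclass
class Reconciliation:
    target_price: float   # target total "price" for the episode
    actual_spend: float   # sum of fee-for-service payments made during the episode
    difference: float     # target minus actual; positive means savings
    settlement: float     # > 0: paid to providers; < 0: returned to Medicare


def reconcile_episode(target_price, ffs_payments, share_rate=0.5):
    """Compare an episode's fee-for-service spending with its target price.

    share_rate is a hypothetical parameter: the fraction of any savings or
    overspending that is assigned to the provider group.
    """
    actual = sum(ffs_payments)
    difference = target_price - actual
    return Reconciliation(target_price, actual, difference, share_rate * difference)


# Illustrative figures only (not BPCI data).
under = reconcile_episode(20000.0, [12000.0, 3500.0, 2500.0])
over = reconcile_episode(20000.0, [15000.0, 4000.0, 3000.0])
print(under.settlement)  # positive: the provider group shares in the savings
print(over.settlement)   # negative: the provider group returns part of the overspend
```

Keeping the settlement as a signed amount lets one function cover both the shared-savings and the repayment case.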
References

Birkmeyer JD, Gust C, Baser O, et al. Medicare payments for common inpatient procedures: implications for episode-based payment bundling. Health Serv Res. 2010;45(6 Pt 1):1783–95.
Canadian Institute for Health Information. Canadian Institute for Health Information. 2015. www.cihi.ca. Accessed 27 Oct 2015.
Cave DG. Profiling physician practice patterns using diagnostic episode clusters. Med Care. 1995;33(5):463–86.
Centers for Medicare and Medicaid Services. Bundled Payments for Care Improvement (BPCI) initiative: general information. 2014. https://innovation.cms.gov/initiatives/bundled-payments/. Accessed 27 Oct 2015.
Centers for Medicare and Medicaid Services. Supplemental QRURs and episode-based payment measurement. 2015a. https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/PhysicianFeedbackProgram/Episode-Costs-and-Medicare-Episode-Grouper.html. Accessed 28 Oct 2015.
Centers for Medicare and Medicaid Services. Bundled payments for care improvement (BPCI) initiative fact sheet. 2015b. https://www.cms.gov/Newsroom/MediaReleaseDatabase/Fact-sheets/2015-Fact-sheets-items/2015-08-13-2.html. Accessed 28 Oct 2015.
Chamberlayne R, Green B, Barer ML, et al. Creating a population-based linked health database: a new resource for health services research. Can J Public Health. 1998;89(4):270–3.
Codman EA. The product of a hospital. Arch Pathol Lab Med. 1914;1990;114(11):1106–11.
Falk IS, Schonfeld HK, Harris BR, et al. The development of standards for the audit and planning of medical care: I. Concepts, research design, and the content of primary physician’s care. Am J Public Health. 1967;57:1118–36.
Feldstein PJ. Research on the demand for health services. Health Serv Res. 1966;44(3):128–65.
Fetter RB, Shin Y, Freeman JL, et al. Case mix definition by diagnosis-related groups. Med Care. 1980;18((2 Suppl) iii):1–53.
Fransoo R, Yogendran M, Olafson K, et al. Constructing episodes of inpatient care: data infrastructure for population-based research. BMC Med Res Methodol. 2012;12:133.
Health Quality Ontario. Quality-based procedures: clinical handbook for hip fracture. Toronto: Health Quality Ontario; 2013. http://www.hqontario.ca/evidence/publications-and-ohtac-recommendations/clinical-handbooks. Accessed 27 Oct 2015.
High Value Health Care Project. 2011. http://www.rwjf.org/content/dam/farm/reports/program_results_reports/2011/rwjf71110. Accessed 27 Oct 2016.
Hornbrook MC, Hurtado AV, Johnson RE. Health care episodes: definition, measurement and use. Med Care Rev. 1985;42(2):163–218.
Hussey P, Sorbero M, Mehrotra A, et al. Using episodes of care as a basis for performance measurement and payment: moving from concept to practice. Health Aff (Project Hope). 2009;28(5):1406–17.
Jain R, Basinski A, Kreder HJ. Nonoperative treatment of hip fractures. Int Orthop. 2003;27(1):11–7.
Lohr KN, Brook RH. Quality of care in episodes of respiratory illness among Medicaid patients in New Mexico. Ann Intern Med. 1980;92(1):99–106.
MaCurdy T, Kerwin J, Theobald N. Need for risk adjustment in adapting episode grouping software to Medicare data. Health Care Financ Rev. 2009;30(4):33–46.
Medicare Payment Advisory Committee. Report to the congress: reforming the delivery system; 2008. http://www.medpac.gov/documents/reports/Jun08_EntireReport.pdf. Accessed 28 Oct 2015.
Meier DE. Increased access to palliative care and hospice services: opportunities to improve value in health care. Milbank Q. 2011;89(3):343–80.
Menzies IB, Mendelson DA, Kates SL, et al. Prevention and clinical management of hip fractures in patients with dementia. Geriatr Orthop Surg Rehabil. 2010;1(2):63–72.
Moscovice I. Selection of an appropriate unit of analysis for ambulatory care settings. Med Care. 1977;15(12):1024–44.
Nightingale F. Notes on hospitals. London: Longman, Green, Longman, Roberts and Green; 1863.
Nutting PA, Shorr GI, Burkhalter BR. Assessing the performance of medical care. Med Care. 1981;21(3):281–96.
Optum. Symmetry Episode Treatment Groups. 2015. https://www.optum.com/providers/analytics/health-plan-analytics/symmetry/symmetry-episode-treatment-groups.html. Accessed 27 Oct 2015.
Reschovsky JD, Hadley J, O’Malley AJ, et al. Geographic variations in the cost of treating condition-specific episodes of care among Medicare patients. Health Serv Res. 2014;49(1):32–51.
Schonfeld HK, Falk IS, Lavietes PH, et al. The development of standards for the audit and planning of medical care: pathways among primary physicians and specialists for diagnosis and treatment. Med Care. 1968;6(2):101–14.
Schulman KA, Yabroff KR, Kong J, et al. A claims data approach to defining an episode of care. Health Serv Res. 1999;34(2):603–21.
Scitovsky A. Changes in the costs of treatment of selected illnesses, 1951–65. Am Econ Rev. 1967;57(5):1182–95.
Sheehan KJ, Sobolev B, Bohm E, Sutherland J, Kuramoto L, Guy P, Hellsten E, Jaglal S for the Canadian Collaborative on Hip Fractures. Constructing episode of care from acute hospital records for studying effects of timing of hip fracture surgery. J Orthop Res. 2016;34(2):197–204.
Sobolev BG, Kuramoto L. Analysis of waiting-time data in health services research. New York: Springer; 2007.
Sobolev BG, Levy AR, Kuramoto L, et al. Do longer delays for coronary artery bypass surgery contribute to preoperative mortality in less urgent patients? Med Care. 2006;44(7):680–6.
Solon J, Sheps CG, Lee SS. Delineating patterns of medical care. Am J Public Health Nations Health. 1960;50(8):1105–13.
Solon JA, Feeney JJ, Jones SH, et al. Delineating episodes of medical care. Am J Public Health Nations Health. 1967;57(3):401–8.
Solon JA, Rigg RD, Jones SH, et al. Episodes of care: nursing students’ use of medical services. Am J Public Health Nations Health. 1969;59(6):936–46.
Struijs JN, De Jong-Van Til JT, Lemmens LC, Drewes HW, De Bruin SR, Baan CA. Three years of bundled payment for diabetes care in the Netherlands. Impact on health care delivery process and the quality of care. RIVM Report. 2012a;260013001. https://www.researchgate.net/profile/Jeroen_Struijs/publication/233407675_Three_years_of_bundled_payment_for_diabetes_care_in_the_Netherlands._Effect_on_health_care_delivery_process_and_the_quality_of_care/links/09e4150a50b96ad6cb000000.pdf. Accessed 27 Oct 2016.
Struijs JN, Mohnen SM, Molema CC, De Jong-van Til JT, Baan CA. Effects of bundled payment on curative health care costs in the Netherlands: an analysis for diabetes care and vascular risk management based on nationwide claim data, 2007-2010. RIVM Report. 2012b;260013001. http://rivm.openrepository.com/rivm/handle/10029/257206. Accessed 27 Oct 2016.
Sutherland JM, Hellsten E, Yu K. Bundles: an opportunity to align incentives for continuing care in Canada? Health Policy (Amsterdam, Netherlands). 2012;107(2–3):209–17.
Vertrees JC, Averill RF, Eisenhandler J, et al. Bundling post-acute care services into MS-DRG payments. Medicare Medicaid Res Rev. 2013;3(3):E1–E19.
Volk T, Hahn L, Hayden R, et al. Reliability audit of a regional cardiac surgery registry. J Thorac Cardiovasc Surg. 1997;114(6):903–10.
White KL, Williams TF, Greenberg BG. The ecology of medical care. N Engl J Med. 1961;265:885–92.
Wingert TD, Kralewski JE, Lindquist TJ, et al. Constructing episodes of care from encounter and claims data: some methodological issues. Inquiry. 1995;32(4):430–43.

10 Health Services Information: Lessons Learned from the Society of Thoracic Surgeons National Database
David M. Shahian and Jeffrey P. Jacobs
Contents
Introduction
The Evolution of Healthcare Quality Measurement and Clinical Registries
Database Structure
STS Adult Cardiac Surgery Database (STS-ACSD)
STS Congenital Heart Surgery Database (STS-CHSD)
STS General Thoracic Surgery Database (STS-GTSD)
Database Operations
Data Sources
Vendors
STS Staff
Data Warehouse and Analytic Center
Data Quality and Audit
STS Quality Measurement Task Force (STS-QMTF)
STS Quality Initiatives Task Force (STS-QIT)
STS Public Reporting Task Force
STS Research Center
STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR)
Device Surveillance
Summary
References
D. M. Shahian (*)
Department of Surgery and Center for Quality and Safety,
Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
e-mail: dshahian@partners.org
J. P. Jacobs
Division of Cardiac Surgery, Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD, USA
Johns Hopkins All Children’s Heart Institute, Saint
Petersburg/Tampa, FL, USA

https://doi.org/10.1007/978-1-4939-8715-3_11
Abstract

The Society of Thoracic Surgeons (STS) National Database was initiated in 1989 with the goal of providing accurate clinical data to support quality assessment and improvement activities in cardiothoracic surgery. Participation among adult cardiac surgery centers and pediatric cardiac surgery centers in the USA currently exceeds 90 %, and the STS National Database now also includes a general thoracic surgery registry with growing national penetration.

The specific functions of the STS National Database have also evolved, as reflected in its various task forces. Quality assessment remains the primary function, and the STS Quality Measurement Task Force is responsible for developing risk models and performance metrics (often composites) for frequently performed cardiothoracic procedures. Each quarter, participants in the STS Adult Cardiac Database are provided with detailed feedback reports of their practice characteristics, including demographics, risk factor profiles, operative data, and outcomes benchmarked against STS national data. Similar feedback reports are provided to participants in the STS Congenital Heart Surgery Database and the STS General Thoracic Database every 6 months. In addition, given its belief in accountability, STS established a Public Reporting Task Force to coordinate voluntary public reporting initiatives using the Consumer Reports or STS websites.

The ultimate goal of all database activities is to improve patient outcomes, and specific quality improvement projects are developed and led by the STS Quality Initiatives Task Force. The STS Task Force on Longitudinal Follow-Up and Linked Registries coordinates the linkage of STS clinical registry data with complementary information regarding long-term outcomes and costs from other data sources. Additional STS National Database Task Forces include International Relations, Appropriateness, Cost and Resource Use, Dashboards, and Informatics.

In summary, the STS National Database is a uniquely valuable resource that is largely responsible for the dramatic improvements in cardiothoracic surgical outcomes that have occurred over the past quarter century.

Introduction

The Evolution of Healthcare Quality Measurement and Clinical Registries

Valid and reliable assessment of healthcare performance requires high-quality data, appropriate analytical methodologies, modern computing power, and, most importantly, a conceptual framework. Regarding the latter, several healthcare leaders were prescient in their recognition of the need to collect, analyze, and publish the outcomes of medical and surgical care.

Florence Nightingale, best known as the founder of modern nursing, is less well recognized for her seminal contributions to public health research and provider profiling (Spiegelhalter 1999; Iezzoni 2003). Upon her return to England after service in the Crimean War, she published mortality rates of English hospitals, using approaches that were admittedly flawed by today’s standards. However, this publication, which was roughly contemporaneous with the American Civil War, represents the first time that outcome rates for a diverse group of hospitals were compared.

In the early 1900s, Ernest Amory Codman, a surgeon at Boston’s Massachusetts General Hospital, was distraught by the lack of objective data regarding surgeon performance, as well as the lack of correlation between surgeons’ results and their reputations or hospital leadership positions. Codman incurred the wrath of the Boston medical community when he unveiled a large cartoon depicting an ostrich-goose laying golden “humbug” eggs for the well-heeled residents of the Back Bay, who were woefully ignorant of the results actually produced by their doctors. He famously wrote (Spiegelhalter 1999; Codman 1914, 1995; Donabedian 1989; Mallon 2000; Neuhauser 1990; Passaro et al. 1999):
I am called eccentric for saying in public that hospitals, if they wish to be sure of improvement. . .
• Must find out what their results are.
• Must analyze their results to find their strong and weak points.
• Must compare their results with those of other hospitals.
• Must care for what cases they can care for well, and avoid attempting to care for cases which they are not qualified to care for well.
• Must welcome publicity not only for their successes, but for their errors, so that the public may give them their help when it is needed.
• Must promote members of the medical staff on the basis which gives due consideration to what they can and do accomplish for their patients.
Such opinions will not be eccentric a few years hence.

Codman started his own End Result Hospital built upon these principles, but it eventually closed. Although Codman was ridiculed and disdained by many colleagues at the time, his work led directly to the formation of the American College of Surgeons and the Joint Commission on Accreditation of Healthcare Organizations.

The third visionary leader in healthcare quality measurement was Professor Avedis Donabedian at the University of Michigan (Donabedian 1966, 1988). Donabedian was the first to propose that healthcare quality could be measured using structure (e.g., 24/7 intensivist availability, nursing ratios, adoption of computerized physician order entry), process (e.g., achieving an “open artery” within 90 min for patients suffering an acute MI, administering aspirin to acute MI patients), and outcomes (e.g., mortality, complications, readmissions, patient-reported outcomes). Donabedian stressed that “Outcomes, by and large, remain the ultimate validators of the effectiveness and quality of medical care” (Donabedian 1966), anticipating the current emphasis on outcomes measurement as the optimal way to assess quality in healthcare.

The science and technology necessary to actualize the conceptual framework of Nightingale, Codman, and Donabedian did not become widely available until the latter half of the twentieth century. The enactment of Medicare legislation in 1965 resulted in a huge new claims data source which could be used for provider profiling, research, and policy development. The use of statistical techniques such as logistic regression and hierarchical regression expanded dramatically in the latter half of the twentieth century, facilitated in large part by the exponential growth of computing power and mass data storage capacity.

Another essential component for the development of robust quality assessment and improvement was the evolution of clinical data registries. Several seminal events in the mid- and late 1980s provided the proximate stimulus for the development of cardiac surgery databases, including the Society of Thoracic Surgeons (STS) National Database, which was the first large-scale clinical registry developed by a professional society. On March 12, 1986, the Health Care Financing Administration (HCFA, the predecessor of the Centers for Medicare and Medicaid Services or CMS) published a list of hospital mortality rates which were based on administrative claims data and minimally adjusted for patient risk. This was referred to by some as the Medicare “death list,” and it was widely criticized for its methodological shortcomings. However, despite these flaws, it was apparent to some farsighted leaders that this was the beginning of a new era in healthcare transparency. Among those who envisioned this future state were the leaders of STS. The most commonly performed procedure by members of that organization, coronary artery bypass grafting surgery (CABG), was a natural target for early efforts to assess performance. CABG was one of the most frequently performed and costly procedures in healthcare at that time and had well-defined outcomes including mortality, stroke, reoperation, kidney failure, and infections. Owing in part to a torrent of requests from STS members who believed that the HCFA “death list” had mischaracterized their programs as underperforming, STS leaders recognized the inadequacy of using minimally adjusted claims data to evaluate program performance. An ad hoc committee on risk factors was established by STS (Kouchoukos et al. 1988) in order to define those patient factors that would be required to fairly adjust for inherent patient risk.
These were incorporated into what subsequently was to become the STS National Database (STS National Database 2014), which was made available to STS members in 1989 under the direction of Dr. Richard Clark (Clark 1989; Grover et al. 2014).

The 1986 HCFA release of mortality reports also stimulated the development of other cardiac surgery database and performance monitoring initiatives. In New York State, Dr. David Axelrod, the commissioner of health, was aware of a fivefold variation in unadjusted mortality rates for coronary artery bypass grafting surgery (CABG) among the 28 cardiac surgical programs in that state. However, he and the New York Cardiac Advisory Committee recognized that acting upon these data would be challenging, as low-performing hospitals would likely assert that their patients were “sicker,” just as they had when HCFA released its mortality reports. Accordingly, in collaboration with Dr. Edward Hannan, a clinical data registry for CABG was developed (the New York Cardiac Surgery Reporting System or CSRS) (Hannan et al. 2012). Using these data, expected results for each patient were estimated and aggregated to the program and surgeon levels. Comparing observed and expected results made it possible to generate risk-standardized mortality rates and ratios, and these were first released to the public in 1990. These results demonstrated that there was wide variation not only in unadjusted mortality rates but also in risk-adjusted mortality rates. Similarly, the Northern New England Cardiovascular Disease Study Group (O’Connor et al. 1991) found wide variation in the ratio of observed to expected mortality among CABG programs in that region.

In summary, the early development of clinical data registries by STS, as well as a few states and regions, was driven by a desire to produce valid, risk-adjusted results that would allow fair comparisons of performance among providers, accounting for the preoperative risk of their patients. Availability of such data would facilitate quality improvement by providers and might also impact consumer choice of providers, shifting market share to better performing groups, although the latter goal has yet to be achieved.

Over the next quarter century, the STS National Database has expanded from its initial focus on adult cardiac surgery, particularly CABG, to encompass all major cardiac surgical procedures in the adult, as well as congenital heart surgery and general thoracic surgery. By 2014, over 1080 programs participated in the STS Adult Cardiac Surgery Database (90–95 % of all US programs), 114 programs were contributors to the STS Congenital Database (95 % of all US programs), and 244 programs participated in the STS General Thoracic Database. Seven international sites also participated. Figures 1, 2, and 3 demonstrate the geographical distribution of participants in the three STS National Databases.

Fig. 1 STS Adult Cardiac Surgery Database Map, accessed July 2, 2016, at http://www.sts.org/sites/default/files/documents/adultcardiacMap_4.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)

Database Structure

The STS National Database is composed of three clinical specialty databases and 10 functionally oriented, crosscutting task forces (Table 1). Each of the three clinical specialty databases has its own unique features and, in some instances, challenges.

STS Adult Cardiac Surgery Database (STS-ACSD)

The STS Adult Cardiac Surgery Database (STS-ACSD) is the oldest of the three specialty databases and has the largest number of participants (approximately 1080 in the USA). Based on studies by Jacobs and colleagues, center-level penetration (number of CMS sites with at least one matched STS participant divided by the total number of CMS CABG sites) increased from 83 % to 90 % between 2008 and 2012 (Jacobs et al. 2016). In 2012, 973 of 1,081 CMS CABG sites (90 %) were linked to an STS site. Patient-level penetration (number of CMS CABG hospitalizations done at STS sites divided by the total number of CMS CABG hospitalizations) increased from 89 % to 94 % between 2008 and 2012. In 2012, 71,634 of 76,072 CMS CABG hospitalizations (94 %) were at an STS site. Finally, completeness of case inclusion at STS sites (number of CMS CABG cases at STS sites linked to STS records, divided by the total number of CMS CABG cases at STS sites) increased from 97 % to 98 % between 2008 and 2012. In 2012, 69,213 of 70,932 CMS CABG hospitalizations at STS sites (98 %) were linked to an STS record. This suggests that at STS-participating sites that billed CMS for CABG procedures, virtually all these billed cases were captured in the STS National Database. These high degrees of national penetration and completeness, together with high accuracy verified in the ongoing external audits (see “Data Quality and Audit” section below), are of critical importance when STS advocates for the use of its measures, rather than those based on claims data, in various public reporting programs. Lack of high national penetration is, in fact, a commonly used rationale for the continued use of claims-based metrics in many areas; however, this justification for use of claims-based metrics is clearly not applicable to adult cardiac surgery.

The STS-ACSD now encompasses the entire spectrum of adult cardiac surgery. This includes CABG; surgery of the aortic, mitral, tricuspid, and pulmonary valves; surgery of the thoracic aorta; arrhythmia procedures; and less commonly performed procedures such as pulmonary thromboendarterectomy and removal of tumors of the heart and inferior vena cava. Data are collected regarding:

• Patient demographics
• Risk factors that may impact the outcomes of surgery
• Details of the specific disease process that led to surgery (e.g., degree of coronary artery stenosis in each vessel, etiology and severity of valvular lesions, type of thoracic aortic pathology)
• Technical details of the conduct of the procedure that was performed
• Detailed clinical outcomes
• Disposition of the patient (e.g., home, rehabilitation facility, or deceased)
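
The three penetration and completeness measures defined earlier in this section are simple proportions, and the 2012 percentages can be recomputed from the counts quoted in the text. The following is a minimal sketch; the helper name `proportion` and the variable names are illustrative and do not come from the STS or CMS data specifications.

```python
def proportion(numerator, denominator):
    """Return numerator / denominator, rejecting an empty denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# 2012 counts as quoted in the text (Jacobs et al. 2016).
center_level = proportion(973, 1081)       # CMS CABG sites linked to an STS site
patient_level = proportion(71634, 76072)   # CMS CABG hospitalizations done at STS sites
completeness = proportion(69213, 70932)    # hospitalizations at STS sites linked to an STS record

for label, value in [
    ("Center-level penetration", center_level),
    ("Patient-level penetration", patient_level),
    ("Completeness of case inclusion", completeness),
]:
    print(f"{label}: {value:.1%}")
```

Rounded to whole percentages, these reproduce the 90 %, 94 %, and 98 % reported above.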
Fig. 2 STS Congenital Heart Surgery Database Map, […] files/documents/congenitalMap_4.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)

Data from the STS-ACSD are reported back to participants on a quarterly basis (STS National Database 2014). These data include the types of procedures performed, demographics and risk factors of the patients, details about the conduct of the surgical procedure, and outcomes. In each case, this information is benchmarked against aggregate data from all STS-participating programs nationally and also against aggregate data from programs that are similar in terms of teaching intensity and size. Finally, participants are given their last several years of data so that important trends may be recognized. Twice yearly, in addition to the routine harvest feedback reports, participants also receive reports of their performance on National Quality Forum (NQF)-endorsed STS metrics and on the various STS composite performance metrics for specific procedures (e.g., isolated CABG, isolated aortic valve replacement, aortic valve replacement plus CABG) (Shahian et al. 2007a, 2012a, 2014; O’Brien et al. 2007). These performance reports provide numerical point estimates with credible intervals based on a Bayesian hierarchical model, and they also assign participants to a “star rating” category based on the true Bayesian probabilities (e.g., 99 % for isolated CABG) that the provider has worse than expected, as expected, or better than expected performance (see “STS Quality Measurement Task Force [STS-QMTF]” section below). These reports also include guidance as to which areas of performance are most in need of remediation and improvement.

In addition to these regular confidential feedback reports, STS-ACSD data are used for quality assessment, performance improvement initiatives, research, and public reporting and to satisfy regulatory and reimbursement imperatives. Many of these additional functions are discussed in subsequent sections.
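
The idea of assigning a “star rating” from a Bayesian probability can be sketched as follows. This is an illustration, not the STS algorithm. It assumes that posterior draws of a participant's performance score and of the benchmark are already available from some fitted hierarchical model, that higher scores are better, and that the 99 % figure quoted above acts as a cutoff on the posterior probability. The function name, the input format, and the mapping to 1, 2, or 3 stars are all hypothetical.

```python
def star_rating(participant_draws, benchmark_draws, threshold=0.99):
    """Classify performance from paired posterior draws (higher score = better).

    Returns 3 if the posterior probability of being better than the benchmark
    reaches the threshold, 1 if the probability of being worse reaches it,
    and 2 otherwise. All names and conventions here are assumptions.
    """
    n = len(participant_draws)
    if n == 0 or n != len(benchmark_draws):
        raise ValueError("draws must be non-empty and of equal length")
    pairs = list(zip(participant_draws, benchmark_draws))
    p_better = sum(p > b for p, b in pairs) / n
    p_worse = sum(p < b for p, b in pairs) / n
    if p_better >= threshold:
        return 3
    if p_worse >= threshold:
        return 1
    return 2


# Toy draws for illustration only (not STS data).
benchmark = [0.96] * 200
print(star_rating([0.98] * 200, benchmark))        # consistently above the benchmark
print(star_rating([0.97, 0.95] * 100, benchmark))  # above in half the draws, below in half
print(star_rating([0.93] * 200, benchmark))        # consistently below the benchmark
```

With a high threshold, most participants whose posterior overlaps the benchmark fall into the middle category, which matches the intent of reserving the outer categories for cases with strong evidence.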
Fig. 3 STS General Thoracic Surgery Database Map, […] files/documents/thoracicMap_5.pdf. © The Society of Thoracic Surgeons, 2016. All rights reserved (Reprinted with permission from STS)

Table 1 STS National Database Task Forces

Specialty-specific task forces
• STS Adult Cardiac Surgery Database (STS-ACSD) Task Force
• STS Congenital Heart Surgery Database (STS-CHSD) Task Force
• STS General Thoracic Surgery Database (STS-GTSD) Task Force

Functional, crosscutting task forces
• STS Quality Measurement Task Force (STS-QMTF)
• STS Quality Improvement Task Force (STS-QIT)
• STS Access and Publications (A and P) Task Force (within the STS Research Center)
• STS International Relations Task Force
• STS Task Force on Longitudinal Follow-up and Linked Registries (STS-LFLR) (within the STS Research Center)
• STS Appropriateness Task Force
• STS Public Reporting Task Force
• STS Cost and Resource Use Task Force
• STS Dashboards Task Force
• STS Informatics Task Force

STS Congenital Heart Surgery Database (STS-CHSD)

The report of the 2010 STS Congenital Heart Surgery Practice and Manpower Survey, undertaken by the STS Workforce on Congenital Heart Surgery, documented that 125 hospitals in the USA and 8 hospitals in Canada perform pediatric and congenital heart surgery (Jacobs et al. 2011a). In 2014, the STS Congenital Heart
Surgery Database (STS-CHSD) included 114 congenital heart surgery programs representing 119 of the 125 hospitals (95.2 % penetrance by hospital) in the USA and 3 of the 8 centers in Canada.

The analysis of outcomes of patients undergoing pediatric and congenital cardiac surgery presents several unique challenges in the domains of nomenclature and risk adjustment. Unlike adult cardiac surgery where the majority of operations involve CABG, aortic valve replacement, and mitral valve replacement or repair or a combination of these, congenital cardiac surgery involves a much wider variety of procedures.

One of the greatest challenges in the development and application of the STS-CHSD has involved standardization of nomenclature and definitions related to surgery for pediatric and congenital cardiac disease. During the 1990s, both the European Association for Cardio-Thoracic Surgery (EACTS) and STS created databases to assess the outcomes of congenital cardiac surgery. Beginning in 1998, these two organizations collaborated to create the International Congenital Heart Surgery Nomenclature and Database Project. By 2000, a common nomenclature and a common core minimal data set were adopted by EACTS and STS and published in the Annals of Thoracic Surgery (Mavroudis and Jacobs 2000; Franklin et al. 2008). In 2000, The International Nomenclature Committee for Pediatric and Congenital Heart Disease was established. This committee eventually evolved into the International Society for Nomenclature of Paediatric and Congenital Heart Disease (ISNPCHD). By 2005, members of the ISNPCHD crossmapped the nomenclature of the International Congenital Heart Surgery Nomenclature and Database Project of the EACTS and STS with the European Paediatric Cardiac Code (EPCC) of the Association for European Paediatric Cardiology (AEPC) and therefore created the International Pediatric and Congenital Cardiac Code (IPCCC) (Franklin et al. 2008; Jacobs et al. 2008), which is available for free download from the Internet at http://www.IPCCC.NET. This common nomenclature, the IPCCC, and the common minimum database data set created by the International Congenital Heart Surgery Nomenclature and Database Project are now utilized by the STS-CHSD, the EACTS Congenital Heart Surgery Database (EACTS-CHSD), and the Japan Congenital Cardiovascular Surgery Database (JCCVSD). As of January 1, 2014, the STS-CHSD contains data from 292,828 operations; the EACTS-CHSD contains data from over 157,772 operations; and the JCCVSD contains data from over 29,000 operations. Therefore, the combined data set of the STS-CHSD, the EACTS-CHSD, and the JCCVSD contains data from over 479,000 operations performed between 1998 and January 1, 2014, inclusive, all coded with the EACTS-STS-derived version of the IPCCC, and all coded with identical data specifications.

Similar to investigations of data sources used for adult cardiac surgery studies, several studies have examined the relative utility of clinical and administrative nomenclature for the evaluation of quality of care for patients undergoing treatment for pediatric and congenital cardiac disease. Given the far greater diversity of anatomic lesions and procedures compared with adult cardiac surgery, it is not surprising that the superiority of clinically rich data sources is even more apparent in congenital heart disease. Evidence from several investigations suggests inferior accuracy of coding of lesions in the congenitally malformed heart using administrative databases and the ninth revision of the International Classification of Diseases (ICD-9) (Cronk et al. 2003; Frohnert et al. 2005; Strickland et al. 2008; Pasquali et al. 2013; Jantzen et al. 2014). Analyses based on the codes available in ICD-9 are likely to have substantial misclassification of congenital cardiac disease. Furthermore, differences in case ascertainment between administrative and clinical registry data for children undergoing cardiac operations can translate into important differences in outcomes assessment.

Risk modeling is essential when assessing and comparing healthcare performance among programs and surgeons, as this adjusts for differences in the complexity and severity of patients they treat. Reliably accounting for the risk of adverse outcomes mitigates the possibility that providers caring for sicker patients will be unfairly
penalized, as their unadjusted results may be worse simply because of case mix (Shahian et al. 2013a). However, formal risk modeling is challenging for rare operations because sample sizes are small. Risk adjustment in congenital cardiac surgery is particularly challenged by this reality, as the specialty is defined by a very wide variety of operations, many of which are performed at a relatively low volume. Consequently, the STS-CHSD has implemented a methodology of risk adjustment based on complexity stratification. Complexity stratification provides an alternative methodology that can facilitate the analysis of outcomes of rare operations by dividing the data into relatively homogeneous groups (called strata). The data are then analyzed within each stratum.

Three major multi-institutional efforts have used complexity stratification to measure the complexity and potential risk of congenital cardiac surgical operations (Jacobs et al. 2009; O’Brien et al. 2009a):

1. Risk Adjustment in Congenital Heart Surgery-1 methodology (RACHS-1 methodology)
2. Aristotle Basic Complexity Score (ABC Score)
3. STS-EACTS Congenital Heart Surgery Mortality Categories (STS-EACTS Mortality Categories) (STAT Mortality Categories)

RACHS-1 and the ABC Score were developed at a time when limited multi-institutional clinical data were available and were therefore based in large part on subjective probability (expert opinion). The STAT Mortality Categories are a tool for complexity stratification that was developed from an analysis of 77,294 operations entered into the EACTS-CHSD (33,360 operations) and the STS-CHSD (43,934 operations) between 2002 and 2007. Procedure-specific mortality rate estimates were calculated using a Bayesian model that adjusted for small denominators. Operations were sorted by increasing risk and grouped into five categories (the STAT Mortality Categories) that were designed to be optimal with respect to minimizing within-category variation and maximizing between-category variation. STS and EACTS have transitioned from the primary use of Aristotle and RACHS-1 to the primary use of the STAT Mortality Categories for three major reasons:

1. STAT Score was developed primarily based on objective data, while RACHS-1 and Aristotle were developed primarily on expert opinion (subjective probability).
2. STAT Score allows for classification of more operations than RACHS-1 or Aristotle.
3. STAT Score has a higher c-statistic than RACHS-1 or Aristotle.

Data from the STS-CHSD are reported back to participants every 6 months in feedback reports. Similar to the STS-ACSD, the data in these feedback reports include the types of procedures performed, demographics and risk factors of the patients, details about the conduct of the surgical procedure, and outcomes. In each case, this information is benchmarked against aggregate data from all participants in the STS-CHSD. Participants are given their last 4 years of data so that important trends may be recognized. The feedback report also includes an assessment of programmatic performance using the empirically derived 2014 STS Congenital Heart Surgery Database Mortality Risk Model that incorporates both procedural stratification by STAT Mortality Category and patient factors. This 2014 STS-CHSD Mortality Risk Model includes the following covariates:

• STAT Mortality Category
• Age
• Previous cardiovascular operation(s)
• Any noncardiac abnormality
• Any chromosomal abnormality or syndrome
• Important preoperative factors (mechanical circulatory support, shock persisting at time of surgery, mechanical ventilation, and renal dysfunction)
• Any other preoperative factors
• Prematurity (for neonates only)
• Weight (for neonates only)
• Weight-for-age-and-sex Z-score (for infants only)
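
To make concrete how a model with covariates of this kind turns patient and procedural factors into a predicted risk, the following is a minimal sketch that assumes a logistic form. Every name and coefficient in it is hypothetical and chosen only for illustration; none is a published STS-CHSD model parameter, and the weight-related covariates are omitted for brevity.

```python
import math
from dataclasses import dataclass

# All values below are hypothetical log-odds chosen for illustration only;
# they are NOT the published STS-CHSD Mortality Risk Model parameters.
INTERCEPT = -5.5
STAT_CATEGORY = {1: 0.0, 2: 0.8, 3: 1.2, 4: 2.0, 5: 3.0}
AGE_GROUP = {"neonate": 0.9, "infant": 0.4, "child": 0.0, "adult": 0.3}
BINARY_FACTORS = {
    "previous_cv_operation": 0.2,
    "noncardiac_abnormality": 0.4,
    "chromosomal_abnormality_or_syndrome": 0.3,
    "important_preop_factor": 1.1,
    "other_preop_factor": 0.4,
    "premature_neonate": 0.5,
}


@dataclass
class Operation:
    stat_category: int                   # 1 (lowest risk) to 5 (highest risk)
    age_group: str                       # a key of AGE_GROUP
    risk_factors: frozenset = frozenset()  # keys of BINARY_FACTORS that apply


def predicted_mortality(op: Operation) -> float:
    """Predicted probability of operative mortality for one operation."""
    unknown = set(op.risk_factors) - set(BINARY_FACTORS)
    if unknown:
        raise ValueError(f"unknown risk factors: {sorted(unknown)}")
    log_odds = INTERCEPT + STAT_CATEGORY[op.stat_category] + AGE_GROUP[op.age_group]
    log_odds += sum(BINARY_FACTORS[name] for name in op.risk_factors)
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse logit


def expected_deaths(operations) -> float:
    """Expected number of deaths: the sum of the predicted probabilities."""
    return sum(predicted_mortality(op) for op in operations)
```

Summing the predicted probabilities over a program's operations gives the expected number of deaths for its case mix, which is the denominator of an observed-to-expected mortality ratio.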
Centers for which the 95 % confidence interval for observed-to-expected mortality ratio does not include unity (does not overlap with the number one) are identified as one-star (low-performing) or three-star (high-performing) programs with respect to operative mortality. Star ratings are provided for the single category of ‘all ages and all STAT Categories.’ Public reporting of data from the STS-CHSD began in January 2015 using this star rating system, with reporting of both star ratings and the actual numerical mortality data on which the star rating is based. As of March 2016, 68 out of 113 (60.2 %) participants in STS-CHSD from the United States had agreed to publicly report their outcomes using this system.

Data quality in the STS-CHSD is evaluated through intrinsic data verification, including identification and correction of missing/out-of-range values and inconsistencies across fields, and on-site audit. In 2014, approximately 10 % of participants (11 participants) will be randomly selected for audits of their center. The audit is designed to complement the internal quality controls. Its overall objective is to maximize the integrity of the data in the STS-CHSD by examining the accuracy, consistency, and completeness of the data. In 2013, the audit of the STS-CHSD included the following documentation of rates of completeness and accuracy for the specified fields of data:

• Primary diagnosis (completeness = 100 %, accuracy = 96.2 %)
• Primary procedure (completeness = 100 %, accuracy = 98.7 %)
• Mortality status at hospital discharge (completeness = 100 %, accuracy = 98.8 %)

Similar to the STS-ACSD, in addition to regular confidential feedback reports, STS-CHSD data are used for quality assessment, performance improvement initiatives, research, and public reporting (beginning in early 2015) and to satisfy regulatory and reimbursement imperatives. Many of these additional functions are discussed in subsequent sections.


STS General Thoracic Surgery Database (STS-GTSD)

The STS General Thoracic Surgery Database (STS-GTSD) is the newest of the three specialty databases, and it faces a unique challenge. Unlike adult and congenital heart surgery, both of which are practiced almost exclusively by board-certified cardiothoracic (CT) surgeons, general thoracic surgery in the USA is more often performed by general surgeons or by surgical oncologists. These surgeons are allowed to submit data to the STS National Database, but they rarely take advantage of this opportunity. Therefore, there are essentially two populations of patients undergoing noncardiac chest surgery in the USA. In the first group are patients operated upon by board-certified CT surgeons, many of whom are involved in academic or referral centers and most of whom participate in the STS-GTSD. The second group of patients is operated upon by surgeons who are not board-certified thoracic surgeons, who rarely if ever participate in the STS National Database, and who do not receive regular feedback information on their performance from the STS-GTSD. This diverse population of surgeons performing general thoracic surgery is an important consideration when assessing the performance of an STS-GTSD program, as their benchmark population of providers is already preselected to be among the best thoracic surgeons in the nation. An average STS-GTSD participant program may well have performance that substantially exceeds that of procedures performed by non-board-certified surgeons. Potentially useful areas of performance comparison include adequacy of preoperative staging, functional evaluation, intraoperative lymph node sampling, and morbidity and mortality.

Despite this challenge, the STS-GTSD is growing, and in 2015, it enrolled patients from 273 participants. External audit revealed high accuracy (overall 95 %). Mortality and morbidity risk models for lung cancer and esophageal resection have been developed in collaboration with the STS Quality Measurement Task Force (QMTF) (Kozower et al. 2010; Shapiro et al. 2010;
Wright et al. 2009), and performance metrics using these risk models will be used to classify thoracic programs as one, two, or three stars, similar to the approach used in adult cardiac surgery. Because STS-GTSD participants represent a high-performing subset of all US surgeons performing general thoracic procedures, STS has also compared the unadjusted results of STS surgeons with those available from the Nationwide Inpatient Sample (NIS) for all surgeons performing chest operations nationally. This comparison has revealed that surgeons who are actively participating in the STS-GTSD have superior results, likely because of both their specialized training and the feedback reports they receive.

Similar to the efforts by the STS-CHSD to standardize nomenclature internationally (see “STS Congenital Heart Surgery Database [STS-CHSD]” section above), the STS-GTSD continues to update its data specifications and harmonize data definitions with the European Society of Thoracic Surgeons. This work will facilitate joint research and quality improvement initiatives, as well as international comparisons of care.

Members of the STS General Thoracic Surgery Database Task Force are exploring options for obtaining long-term outcomes for cancer resection, including linking the STS-GTSD with Medicare data (see “STS Task Force on Longitudinal Follow-Up and Linked Registries [LFLR]” section below). However, other data sources will also be required, including various cancer registries, as 40 % of lung cancer resections and 60 % of esophageal cancer resections are performed in patients under the age of 65. (Medicare data only includes patients 65 or older and younger patients on dialysis.)


Database Operations

Data Sources

Although many investigators use claims data (e.g., Medicare) for performance evaluation and research, the distinguishing feature of the STS National Database and similar clinical registries is the degree of granularity and specificity of its data elements (STS National Database 2014). Since the inception of the STS National Database, periodic data specification upgrades (typically every 3 years, in a cycle that allows one of the three databases to be updated each year) have occurred based on the evolution of scientific knowledge as well as feedback from database managers, end users, and participants. Every data element collected has an associated sequence number which is mapped to a detailed clinical data specification. This feature of clinical registries, their highly structured and clinically granular data, distinguishes them from alternative data sources such as claims data (not clinically rich) and electronic health record (EHR) data (unstructured, lacking specific definitions used by all institutions).

This unique advantage of clinical registries, including the STS National Database, also poses one of their greatest challenges: data collection burden. Rather than allowing anyone to enter the data that become part of a patient’s STS record, these data are either entered by a trained abstractor, or data entered by caregivers are carefully reviewed by the data abstractor. These data managers work with surgeons, physician assistants, nurse practitioners, and others to ensure that the data entered into the STS National Database adhere to the definitions established by STS and that they are supported by documentation in the patient’s medical record. These data managers have many resources available to them including:

• The detailed written specifications themselves
• A teaching manual that expands upon the formal specifications and often includes clinical examples
• Advice of colleagues in regional collaboratives around the nation
• Biweekly telephone calls with STS National Database and Duke Clinical Research Institute leaders
• Email alerts
• Newsletters
• A 4-day annual national meeting (The Society of Thoracic Surgeons Advances in Quality and Outcomes [AQO] Conference: A Data
Managers Meeting) attended by nearly 500 data managers from around the country (at which data managers and surgeon leaders present educational sessions on challenging coding issues and new developments in data specifications)

Numerous studies have been conducted (Shahian et al. 2007b; Mack et al. 2005) showing that both the number and type of procedures performed and their results differ substantially with the use of detailed clinical data as opposed to claims data sources.

STS is working with EHR vendors to investigate how some STS variables might be automatically extracted from routinely collected EHR data. The most straightforward variables for this type of capture would include demographics, labs, and structured diagnostic testing such as percent coronary artery obstruction, ejection fraction, and valve areas. Other STS data elements which have complex data specifications would be more challenging to map from EHRs, and these complex elements might require the addition of specific fields to the EHR.


Vendors

The Society of Thoracic Surgeons has contractual relationships with a number of vendors who provide the data entry software by which participant programs enter data into the STS National Database. Each vendor differs in the sophistication of the reports they produce, opportunity for customization, cost, and ability to link with other databases such as the American College of Cardiology (ACC) National Cardiovascular Data Registry (NCDR). However, each vendor must achieve basic certification by STS to ensure that their software is capable of producing accurate and consistent results.


STS Staff

Numerous full-time staff at STS headquarters are devoted to database operations and serve multiple functions:

• Assist programs in joining the database.
• Develop and maintain appropriate contractual relationships with vendors, participants, and our warehouse and analytic center.
• Coordinate and staff the various STS National Database Task Forces and their respective conference calls and meetings.
• Develop and maintain budgets.
• Assure compliance with all relevant regulatory processes, including the Health Insurance Portability and Accountability Act of 1996 (HIPAA).
• Serve as the main resource for data managers.
• Arrange the annual STS Advances in Quality and Outcomes [AQO] Conference.
• Work with external organizational partners on issues such as public reporting.
• Coordinate the ongoing upgrades of all three clinical databases.


Data Warehouse and Analytic Center

Since 1998, the Duke Clinical Research Institute (DCRI) has served as the data warehouse and analytic center for the STS National Database. DCRI receives data from participants, which then undergo extensive data quality and consistency checks. Each participant receives a comprehensive harvest feedback report generated by DCRI, as previously described. These feedback reports are distributed every 3 months to participants in the STS-ACSD and every 6 months to participants in the STS-CHSD and the STS-GTSD. These feedback reports include extensive educational and explanatory materials describing how each report and metric are calculated. DCRI also provides statistical support for most of the STS National Database Task Forces, particularly the Quality Measurement Task Force, and they are also involved in the Access and Publications Task Force, the STS Task Force on Longitudinal Follow-Up and Linked Registries (LFLR), and the STS Research Center. DCRI statisticians play an integral role in the design and implementation of all STS risk models and performance measures.
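
As an illustration of what data quality and consistency checks on harvested records might look like, the following is a minimal sketch. The record fields, the acceptable age range, and the rule set are all assumptions made for this example; they are not taken from the STS data specifications or from DCRI procedures.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical acceptable range in years; the actual STS limits are not stated here.
AGE_RANGE = (0.0, 110.0)


@dataclass
class Record:
    age_years: Optional[float]
    surgery_date: Optional[date]
    discharge_date: Optional[date]


def check_record(record: Record) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # Missing-value and out-of-range checks on a single field.
    if record.age_years is None:
        problems.append("age_years: missing")
    elif not AGE_RANGE[0] <= record.age_years <= AGE_RANGE[1]:
        problems.append("age_years: out of range")
    if record.surgery_date is None:
        problems.append("surgery_date: missing")
    # Cross-field consistency check: discharge cannot precede surgery.
    if (record.surgery_date is not None and record.discharge_date is not None
            and record.discharge_date < record.surgery_date):
        problems.append("discharge_date: earlier than surgery_date")
    return problems
```

A record that returns a non-empty list would be sent back to the submitting site for correction rather than silently accepted.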
Data Quality and Audit

Regardless of the granularity and specificity of the data elements in any registry, they are only useful if data are actually inputted in strict conformity with their specifications. A firm belief in the accuracy of data submitted by all programs nationally, and the metrics derived from them, provides the foundation of trust necessary to implement STS programs such as voluntary public reporting.

Data quality checks exist at several stages of the STS data entry process. First, there are internal consistency and out-of-range audits that take place at the time of data entry. For example, an age of 150 years would be rejected because it falls out of the acceptable data input range. Second, submitted data are reviewed at DCRI, and excessive rates of missing data or other irregularities not captured during data submission are reported back to the STS participant for revision. Third, STS participant sites receive a list of their demographics, risk factors, operative data, and outcomes compared to STS nationally and to hospitals of similar size and academic status. Substantial differences from these benchmarks would lead a program to evaluate the accuracy of its submissions.

Finally, STS has an extremely robust annual audit of all three of its databases, all conducted by a highly respected external organization. Ten percent of all STS National Database sites are randomly selected for audit annually. Each audit consists of 20 coronary bypass procedures and 10 valve procedures; approximately 82 data elements are abstracted from each medical record. Previously this process had required on-site visits by the external auditing agency, but a mechanism has been developed to access patient records electronically in a HIPAA-compliant fashion. In addition to validating STS submissions against the medical record (for accuracy of the data), STS submissions are also checked against hospital operative logs in order to ensure that all cases have been collected (for completeness of the data).

Each year, all three clinical databases comprising the STS National Database are audited. An extensive report is generated showing the agreement rate for all audited data elements and an overall assessment of the accuracy at audit sites. In 2013, among nearly 100,000 individual data elements audited, the overall agreement rates in the STS-ACSD averaged nearly 97 %. As described above, similar agreement rates are documented in the STS-CHSD and the STS-GTSD. In the STS-CHSD, an STS congenital heart surgeon volunteer leader also participates in each audit.


STS Quality Measurement Task Force (STS-QMTF)

The STS Quality Measurement Task Force (STS-QMTF) is responsible for all risk model and performance measure development for the Society. These quality measurement activities are fully integrated into the STS National Database, a unique arrangement that has numerous advantages. First, the performance measures are based on readily available STS clinical data. Second, the performance measures are developed through direct collaboration between statistical consultants and surgeons who have both clinical expertise and knowledge of performance measurement and health policy. Third, the performance measures can be tested for reliability and validity by using them in confidential participant feedback reports prior to public reporting. Pilot testing is a difficult process for many measure developers, but it is an inherent capability provided by a clinical registry such as the STS National Database.

After the best available clinical data, the next most important factor in performance measure development is the risk model. Risk models are essential to adjust for inherent differences in patient risk, and they are crucial if performance measures are to have face validity with stakeholder groups, especially the providers (Shahian et al. 2013a). Risk model development typically begins by identifying the most relevant outcomes for a particular type of procedure and specialty. Initial exploratory analyses are performed to determine if an adequate number of cases and endpoints are available and over what period of time these need to be aggregated in order to assure adequate sample size for the outcome in question.
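
The aggregation question raised by these exploratory analyses reduces to simple arithmetic once a target number of endpoint events has been chosen. The sketch below assumes such a target is given; the function name, the inputs, and the example figures are hypothetical and are not STS criteria.

```python
import math


def years_to_aggregate(annual_cases: int, event_rate: float, target_events: int) -> int:
    """Whole years of data needed to expect at least `target_events` endpoints.

    `target_events` is an assumed adequacy threshold supplied by the analyst.
    """
    if annual_cases <= 0 or target_events <= 0 or not 0 < event_rate <= 1:
        raise ValueError("inputs must be positive and event_rate must lie in (0, 1]")
    expected_events_per_year = annual_cases * event_rate
    return math.ceil(target_events / expected_events_per_year)


# Example: 4,000 cases per year with a 2.5 % event rate yields about 100 events
# per year, so a target of 250 events requires 3 years of aggregated data.
print(years_to_aggregate(4000, 0.025, 250))
```

Rare endpoints or low-volume procedures lengthen the required window, which is one reason an outcome may need several years of pooled data before it can be modeled.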
The selection and definition of relevant endpoints is critical to the development of risk models. In both quality assessment activities and clinical research to improve patient care, STS has defined its major outcomes endpoint, mortality, in a unique fashion. Typically, mortality after hospitalizations or procedures has used one of two definitions. In-hospital mortality is collected with high accuracy, but it misses early post-discharge deaths occurring at home or in extended care facilities. Collecting only in-hospital outcomes may also create a perverse incentive to discharge patients earlier than desirable so that potential adverse outcomes do not occur during the index hospitalization. Another approach is to measure adverse outcomes at 30 days, regardless of where the patient is located. This avoids providing an incentive for premature discharge, but it may encourage some providers to keep a severely ill patient alive through artificial support just long enough to meet the 30-day threshold. STS seeks to avoid the disadvantages of either of these approaches alone by combining them. The time period of mortality data collection for all three STS National Databases is based upon the STS definition of operative mortality (Overman et al. 2013), which is now used by all three STS National Databases: operative mortality is defined as (1) all deaths, regardless of cause, occurring during the hospitalization in which the operation was performed, even if after 30 days (including patients transferred to other acute care facilities), and (2) all deaths, regardless of cause, occurring after discharge from the hospital, but before the end of the 30th postoperative day.

As the next step in risk model development, bivariate analyses are performed to study the association between individual risk factors and the outcome. A comprehensive array of candidate risk factors is entered into multivariable risk models, and odds ratios (with 95 % CI) are determined for each. In some instances, certain variables are “forced” into the model regardless of statistical significance because they are regarded by clinical experts as critical for face validity. The output of these models is assessed using measures of calibration, discrimination, and reliability and using actual data from the STS National Database. After endorsement by the Executive Committee of STS, all STS performance measures are published in their entirety in the peer-reviewed literature (Shahian et al. 2009a, b; O’Brien et al. 2009b), including all special considerations discussed during the measurement development process, the final covariates and their parameterization, and the associated intercepts and coefficients of the risk model equations.

Risk-adjusted outcomes based on national benchmark STS data are provided back to participants at each quarterly harvest. Risk models are fully updated every few years, but annually a calibration factor is introduced so that the observed-to-expected ratio for a given year equals one. Multiple STS risk models are publicly available as online calculators on the STS website (STS short-term risk calculator 2014; STS long-term risk calculator 2014), and these sites are visited thousands of times each month.

The appropriate interpretation of risk-adjusted results bears special mention, given both its centrality in performance measurement and the fact that it is often misunderstood by many who view these reports. There are two primary statistical methods by which outcomes results are adjusted for inherent risk (Shahian and Normand 2008). In direct standardization, the stratum-specific rates (e.g., age, sex, ethnicity) for each population of interest (e.g., a particular hospital’s stratum-specific rate of adverse events) are applied to a standard or reference population. This method is often used in epidemiology where there are a limited number of strata to be considered, and the rates for each stratum are available. However, for most provider profiling applications, the number of strata, corresponding to individual risk factors, is too large to standardize in this fashion. Accordingly, almost all healthcare profiling initiatives use another statistical method, indirect standardization, for risk adjustment. In this approach, the rates derived from a reference or standard population of hospitals, often in the form of a risk model with intercepts and coefficients, are applied to the particular case mix of the institutions being studied. The actual results for an individual program’s case mix are compared to what would have been expected had that program’s
population of patients been treated by an average provider from the reference population.

Both methods of standardization provide risk adjustment in a generic sense (they “level the playing field”) so that programs caring for sicker patients are not penalized. However, only direct standardization permits direct comparison of the risk-standardized results of one specific program with those of another. In indirect standardization, the results for any particular program are based solely on its specific mix of patients, and these results can only be compared with the overall results of all providers for a similar case mix (Shahian and Normand 2008). For example, a small community heart surgery program may have a lower risk-adjusted mortality rate than a tertiary/quaternary center. However, using indirect standardization, it cannot be assumed that the community program, if faced with the case mix of the tertiary center, would also have superior results.

The primary motivation for development of the STS National Database was the need to provide accurate performance assessment, and this remains the highest priority of the STS-QMTF. A variety of measures have been developed including structure, process, and outcomes (the Donabedian triad) (Donabedian 1966). Risk-adjusted mortality rates for CABG were the original outcome used to classify cardiac surgery performance, but even this archetypal measure can be inadequate. For example, consider three survivors of coronary artery bypass surgery (CABG), all of whom would be considered to have had identical quality procedures based on mortality alone. One patient receives all the appropriate bypass grafts and medications and sustains no complications. The second patient receives only vein grafts, which have limited longevity, and does not receive postoperative medications to prevent progression of coronary disease. The third patient experiences the new onset of dialysis-dependent renal failure which will markedly impact both longevity and quality of life. Although all three patients survived surgery, the quality of care they received varied markedly.

The STS-QMTF has recognized the inadequacy of using CABG risk-adjusted mortality as the sole quality metric for cardiac surgery, and it has addressed this in a number of ways. First, it has expanded its activities in risk modeling and performance metrics beyond CABG to include other major cardiothoracic procedures such as isolated aortic valve replacement, aortic valve replacement combined with CABG, mitral valve replacement, mitral valve repair, multiple valve procedures, and numerous procedures in general thoracic surgery and congenital cardiac surgery. This expansion of the procedures that are available for risk modeling and performance assessment provides a much more comprehensive assessment of quality than focusing solely on CABG, whose incidence and rate of adverse outcomes have both been declining over the past decade. Second, instead of collecting information only on mortality, the STS-QMTF has developed risk models for more of the individual surgical complications such as stroke, reoperation, prolonged ventilation, infection, renal failure, prolonged length of stay, and a composite of major morbidity and mortality.

Third, in addition to viewing these measures individually, STS has increasingly focused on composite measures using multivariate hierarchical approaches. The first STS composite measure, CABG, included the risk-adjusted mortality, the occurrence of any (any or none) of the five major complications of CABG surgery (stroke, renal failure, prolonged ventilation, reoperation, and infection), the use of at least one internal mammary artery graft, and the provision of all four (all or none) NQF-endorsed medications (preoperative beta blockade, discharge beta blockade, lipid-lowering agents such as statins, and aspirin) (Shahian et al. 2007a; O’Brien et al. 2007). Similar composite measures have been developed for isolated aortic valve replacement (Shahian et al. 2012a) and for aortic valve replacement combined with CABG (Shahian et al. 2014), and a composite measure is currently under development for mitral valve surgery. These latter measures differ from the isolated CABG composite in that they consist solely of outcomes measures (mortality and morbidity) and do not include process measures. This reflects both a shift in healthcare performance measurement toward outcomes measures (rather than structure or process
measures) and the fact that evidence-based, widely accepted process measures suitable for performance measurement are not available for these other procedures.

STS envisions a portfolio of such procedure-specific composite measures and, ultimately, an overall composite of procedural performance encompassing information from all these individual composite metrics (a “composite of composites”). However, even this “composite of composites” will only be one component of an overall STS performance measurement system that will include multiple other domains. For example, just as important as the outcome of a particular procedure is the question of whether that procedure was indicated in the first place. Accordingly, STS has mapped both the ACCF/AHA CABG guidelines (Hillis et al. 2011) and the multi-societal 2012 Appropriate Use Criteria (AUC) for Coronary Revascularization (Patel et al. 2012) to the relevant data elements in the STS-ACSD. This will ultimately allow STS participants to receive immediate documentation that their patient meets one of these CABG guidelines or AUC. Similar mapping is underway for valve procedures. STS has also begun to explore failure to rescue (mortality following the development of a complication of surgery) as an additional new quality metric (Pasquali et al. 2012b). Previous research suggests that the ability to salvage a patient from a serious complication is a distinguishing feature of high-quality programs and complements other metrics such as overall morbidity. Patient-reported outcomes are also increasingly recognized for their value in assessing quality. These include both patient-reported functional outcomes (e.g., return to work and overall functional capacity) and patient satisfaction (e.g., HCAHPS or Hospital Consumer Assessment of Healthcare Providers and Systems, CGCAHPS or Clinician and Group Consumer Assessment of Healthcare Providers and Systems).

STS has also formed a Cost and Resource Task Force within the STS National Database. The objective of this task force is to link the STS National Database with cost data from hospital, commercial, federal, or state payer sources. Such a linkage would provide accurate data regarding variability in resource use among programs and would support the development of risk models for cost, so that programs being evaluated for cost efficiency are not unfairly penalized when they care for particularly complex patients. STS ultimately envisages a comprehensive portfolio of performance measures which might include a composite of multiple procedural composite measures, appropriateness, failure to rescue, patient-centered outcomes, and risk-adjusted resource utilization.

Finally, the most appropriate level of attribution for performance measures is a focus of continuing discussion. STS has historically measured performance only at the participant level (typically a hospital) for a variety of reasons. There are sample size concerns at the individual surgeon level, and cardiac surgery is a “team sport” requiring many participants in addition to the surgeon (e.g., cardiologist, anesthesiologist, perfusionist, nurses, critical care specialists, respiratory therapists). However, notwithstanding these concerns, many commercial payers and governmental agencies are now publishing (or requiring) information about surgeon-level performance, much of which is based on inadequately adjusted administrative claims data and/or flawed analytics. Consequently, STS feels a responsibility to offer a valid, surgeon-level metric. An individual surgeon performance metric has now been developed by STS for adult cardiac surgery. It is a composite measure based on morbidity and mortality data for five of the most commonly performed procedures, aggregated over 3 years. This measure has very high reliability (0.81) because of the large number of endpoints being analyzed (Shahian et al. 2015).

Regardless of the particular performance measure, the general STS-QMTF approach to profiling performance results across providers is similar. Results are estimated in Bayesian hierarchical models, and providers are classified as having expected, better than expected, or worse than expected performance based on true Bayesian probabilities rather than frequentist confidence intervals (Shahian et al. 2007a; O’Brien et al. 2007). Unlike the latter, the Bayesian credible interval has an intuitive probability interpretation. For example, given a database participant’s
observed data, if the lower limit of the 98% Bayesian credible interval is greater than the STS average value, then there is at least 99% probability (98% credible interval plus 1% upper tail) that the participant’s true performance (e.g., in avoiding mortality or morbidity or in using an internal mammary artery graft) exceeds the STS average value for their particular case mix. The Bayesian probability (and corresponding Bayesian credible interval) selected for a particular measure varies depending on factors such as event rates, variation of scores across programs, and sample sizes for typical providers. For procedures such as CABG which are frequently performed, STS has used 99% Bayesian probabilities, which result in approximately 10–15% of STS providers being labeled as low performing and 10–15% classified as high performing, with the remainder being average. For less common procedures such as isolated valve replacement, STS-QMTF has used 95% Bayesian probabilities (97.5% credible intervals), which results in fewer outliers (Shahian et al. 2012a). Even with the lower probability requirement, the smaller number of observations means there is less data upon which to base an estimate of a provider’s performance, and the percentage of outliers is typically lower than for CABG. If the probability criterion were even lower (e.g., 90% Bayesian probability), then more participants would be classified as outliers, but our certainty would also be much lower, jeopardizing face validity with providers and other stakeholders.

Importantly, when estimated in this fashion, there is no requirement for any fixed number of high or low outliers. If, for example, all programs functioned at a high level and were statistically indistinguishable using these criteria, they would all be average (or, in STS parlance, two-star) programs. In contrast to payers and commercial report card developers, who often seem determined to demonstrate differences among providers, STS believes the ideal situation from a societal perspective would be for all programs to be functioning at a very high level and statistically indistinguishable (e.g., the very high safety record of the commercial aircraft industry). Then, consumers could choose surgeons or hospitals based on other criteria, such as convenience, availability, or service.

In reporting its results, STS provides varying levels of granularity. These range from point estimates with credible intervals for statistically sophisticated users to star ratings corresponding to as expected, better than expected, or worse than expected for typical consumers (based on the work of Professor Judith Hibbard (Hibbard et al. 2001)). When a composite measure encompasses multiple procedures or performance domains, STS always provides the ability to drill down to the lowest level of the composite, its constituent elements.

STS Quality Initiatives Task Force (STS-QIT)

The acquisition of healthcare data and their use in performance assessment are not goals in themselves. The primary objective of all these activities is to improve healthcare quality. Just as the Quality Measurement Task Force is an integral part of the STS National Database, the STS Quality Initiatives Task Force (STS-QIT) is similarly fully integrated. This facilitates the use of STS data as the basis for quality improvement projects and allows both baseline and subsequent performance to be measured, thus documenting the effectiveness of interventions. Another advantage of integrating the Quality Initiatives Task Force within the database is to facilitate the identification of gaps and variability in national performance and to focus quality initiatives in these areas.

At the national level, quality improvement initiatives have been conducted using the STS National Database to improve compliance with preoperative beta blockade and use of internal mammary artery bypass grafts for CABG, both of which are NQF-endorsed performance measures (Ferguson et al. 2003). A 2012 report by ElBardissi and colleagues (ElBardissi et al. 2012) suggests that the STS National Database and its quality measurement and improvement activities have dramatically improved cardiac surgery results over the past decade.
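As a brief illustration of the provider classification rule described in the measurement discussion above, the following Python sketch shows how an equal-tailed credible interval maps to a one-sided Bayesian probability (a 98% interval corresponds to 99% probability) and how a star rating could follow from it. It is a minimal sketch under stated assumptions, not the STS implementation: the function names, the use of simple empirical quantiles, and the simulated posterior draws (standing in for output from a Bayesian hierarchical model) are all hypothetical, and scores are assumed to be oriented so that higher is better.

```python
import random


def one_sided_probability(ci_level):
    """Map an equal-tailed interval level to a one-sided probability.

    Example: a 98% interval leaves 1% in each tail, so the probability of
    lying above the lower limit is 98% + 1% = 99%.
    """
    return ci_level + (1.0 - ci_level) / 2.0


def credible_interval(draws, ci_level):
    """Equal-tailed credible interval from posterior draws.

    Uses simple empirical quantiles of the sorted draws (an assumption made
    for brevity; real analyses may use a different quantile estimator).
    """
    if not draws:
        raise ValueError("draws must not be empty")
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - ci_level) / 2.0
    lo_idx = int(tail * n)
    hi_idx = min(n - 1, int((1.0 - tail) * n))
    return ordered[lo_idx], ordered[hi_idx]


def star_rating(draws, sts_average, ci_level=0.98):
    """Classify a provider as 1, 2, or 3 stars (hypothetical helper).

    3 stars: whole interval above the average (better than expected).
    1 star:  whole interval below the average (worse than expected).
    2 stars: interval overlaps the average (as expected).
    """
    lower, upper = credible_interval(draws, ci_level)
    if lower > sts_average:
        return 3
    if upper < sts_average:
        return 1
    return 2


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    # Simulated posterior draws for one provider's composite score.
    draws = [rng.gauss(0.975, 0.004) for _ in range(4000)]
    print("one-sided probability:", one_sided_probability(0.98))
    print("98% interval:", credible_interval(draws, 0.98))
    print("stars:", star_rating(draws, sts_average=0.96))
```

Lowering `ci_level` narrows the interval and flags more providers as outliers, at the cost of less certainty about each classification, which is the trade-off discussed in the text. Nothing in the rule forces a fixed share of one-star or three-star providers.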
STS-QIT has begun to identify key opportunities for improvement within cardiothoracic surgery and has developed focused webinars and online libraries of best practice articles to address these issues. Specific recent webinars (STS Quality Improvement webinars 2014) include blood conservation and transfusion triggers, glucose management, and mediastinal staging prior to lung cancer surgery. The Quality Initiatives Task Force is also exploring the possibility of identifying consistently low-performing programs using STS data and then offering such programs the possibility of external review of their database integrity (to identify potential coding issues that might lead to false outlier classification) and clinical practice (to facilitate quality improvement).

A number of states and regions have also used STS data to improve quality. For example, in a collaborative effort with Blue Cross Blue Shield of Michigan, the Michigan Society of Thoracic and Cardiovascular Surgeons has brought together representatives from all cardiac surgery programs in the state (Prager et al. 2009). They review performance of all programs, identify gaps and variability in outcomes, and review each cardiac surgery death using a standardized phase of care mortality analysis (POCMA). They have also implemented a number of best practice initiatives. Similarly, the Virginia Cardiac Surgery Quality Initiative (Speir et al. 2009) has brought together surgeons from across the state. They have linked STS clinical data to cost data with a focus on reducing both complications and their associated costs.

STS Public Reporting Task Force

Among healthcare professional societies, STS has taken the lead in public reporting by providing easily understandable cardiothoracic surgical outcomes data to the public (Shahian et al. 2011a, b). STS support of public reporting and transparency is based on several principles:

• Public reporting and accountability are our professional responsibilities.
• Patients and their families have a right to know the outcomes of cardiothoracic surgical procedures.
• Public reporting demonstrates commitment to quality improvement.
• Public reporting is one approach to improving quality.
• Public reporting promotes patient autonomy and facilitates shared decision-making.
• If professional medical and surgical societies do not publish accurate information about performance using the best available clinical data and risk adjustment, then the public will be forced to judge our performance based on unadjusted or inadequately adjusted administrative claims data.

The STS Public Reporting Task Force is responsible for the development and maintenance of the web-based platforms for public reporting of data from the STS National Database. STS has implemented voluntary public reporting through its STS Public Reporting Online Initiative [www.sts.org/publicreporting] and through collaboration with Consumers Union [www.consumerreports.org/health]. In each case, these reports are based on the STS composite measures and star ratings (with drill-down capability) described above.

In September 2010, STS began publicly reporting outcomes of isolated CABG surgery based on its NQF-endorsed composite CABG metric. In January 2013, STS began publicly reporting outcomes of isolated aortic valve replacement (AVR) surgery based on its NQF-endorsed AVR composite score. In August 2014, STS began publicly reporting outcomes of combined AVR + CABG surgery, using an NQF-endorsed composite score with the same two domains (risk-adjusted morbidity and mortality) as the isolated AVR composite.

STS plans to expand its portfolio of publicly reported cardiothoracic surgical quality measures by at least one additional new operation every year. Future publicly reported metrics will include pediatric and congenital heart surgery risk-adjusted operative mortality based on the 2014 STS Congenital Heart Surgery Database Mortality Risk Model (planned for public reporting in January 2015), mitral valve replacement (MVR) and mitral valve repair, a multi-domain composite
for pulmonary lobectomy for cancer, and a multi-domain composite for esophagectomy. As of mid-2016, 50% of adult cardiac surgery participants in the STS National Database and 60% of congenital heart surgery participants had consented to voluntary public reporting.

STS Research Center

The initial and still primary purpose of the STS National Database is quality assessment and quality improvement in cardiothoracic surgery. The STS National Database and its quality assessment activities, development of nationally recognized quality measures, and performance improvement initiatives are all built on the foundation of more than five million surgical records (STS National Database 2014; Shahian et al. 2013b). Because it is such a robust source of clinical data, the STS National Database also provides a platform for research to advance the state of the art of cardiothoracic surgery. This research activity is overseen by the STS Research Center (2014).

Launched in 2011, the STS Research Center is a nationally recognized leader in outcomes research. The center seeks to capitalize on the value of the STS National Database and other resources to provide scientific evidence and support cutting-edge research. Such research ultimately helps cardiothoracic surgeons, government, industry, and other interested parties to improve surgical care and outcomes.

All research that is confined to the STS National Database and to its standard period of data collection (the index operative hospitalization and 30 days postoperatively) is vetted through the STS Access and Publications (A and P) Task Force. Research that involves linking the STS National Database to other databases, or longitudinal follow-up beyond the standard period of data collection of the STS National Database, is vetted by the STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR) (see “STS-LFLR” section below). Using this process, research activities based on data from the STS National Database have resulted in more than 300 peer-reviewed publications in the scientific literature and have significantly advanced knowledge in cardiothoracic surgery.

STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR)

The STS Task Force on Longitudinal Follow-Up and Linked Registries (STS-LFLR) is responsible for oversight of research initiatives that involve longitudinal follow-up of patients and linking of the STS National Database to other sources of data. The transformation of the STS National Database to a platform for longitudinal follow-up will ultimately result in higher quality of care for all cardiothoracic surgical patients by facilitating capture of long-term clinical and nonclinical outcomes on a national level. Important strategies include the development of clinical longitudinal follow-up modules within the STS National Database itself and linking the STS National Database to other clinical registries, administrative databases, and national death registries:

1. Using probabilistic matching with shared indirect identifiers, the STS National Database can be linked to administrative claims databases (such as the CMS Medicare Database (Jacobs et al. 2010; Hammill et al. 2009) and the Pediatric Health Information System (PHIS) database (Pasquali et al. 2010, 2012a)) and become a valuable source of information about long-term mortality, rates of re-hospitalization, morbidity, and cost (Shahian et al. 2012b; Weintraub et al. 2012; Pasquali et al. 2012c).
2. Using deterministic matching with shared unique direct identifiers, the STS National Database can be linked to national death registries like the Social Security Death Master File (SSDMF) and the National Death Index (NDI) in order to verify life status over time (Jacobs et al. 2011b).

Through either probabilistic matching or deterministic matching, the STS National Database can link to multiple other clinical registries, such as the ACC NCDR, and to claims data sources, in order to provide enhanced clinical follow-up and
opportunities for comparative effectiveness research. The NIH-funded ASCERT trial (American College of Cardiology Foundation-Society of Thoracic Surgeons Collaboration on the Comparative Effectiveness of Revascularization Strategies trial) exemplifies this approach. ASCERT linked STS and ACC clinical registry data with Medicare data to compare longer-term outcomes for surgical and percutaneous approaches to coronary revascularization (Weintraub et al. 2012). Similarly, the NIH-funded linkage of the STS-CHSD to the Pediatric Health Information System (PHIS) Database used linked clinical and administrative data to facilitate comparative effectiveness research in the domains of perioperative methylprednisolone and outcome in neonates undergoing heart surgery (Pasquali et al. 2012d) and antifibrinolytic medications in pediatric heart surgery (Pasquali et al. 2012e).

Device Surveillance

Another role of the STS National Database is the longitudinal surveillance of implanted medical devices. The use of the STS National Database as a platform for device surveillance is best exemplified by the STS/ACC Transcatheter Valve Therapies (TVT) Registry (Carroll et al. 2013; Mack et al. 2013), which tracks patients who undergo Transcatheter Aortic Valve Replacement (TAVR). Since December 2011, the TVT Registry has collected data for all commercial TAVR procedures performed in the USA. As of mid-2016, it had 457 enrolled sites and had acquired 74,240 patient records (personal communication, Joan Michaels).

The TVT Registry was launched as a joint initiative of STS and ACC in collaboration with CMS, the US Food and Drug Administration (FDA), and the medical device industry. It serves as an objective, comprehensive, and scientifically rigorous resource to improve the quality of patient care, to monitor the safety and effectiveness of TVT devices through post-market surveillance, to provide an analytic resource for TVT research, and to enhance communication among key stakeholders.

Summary

The STS National Database, comprising three specialty-specific registries, is the premier clinical data registry for cardiothoracic surgery. In comparison with other available data sources, the STS National Database and similar clinical registries have the advantages of structured, granular data elements defined by clinical experts, standardized data specifications, high accuracy as confirmed by external audit, and the capability to provide more robust risk adjustment.

Clinical registries like the STS National Database are the best sources for measuring healthcare outcomes. In contrast to many claims data sources, the STS National Database provides “real-world” data from all age groups and payers. Furthermore, as described in this chapter, the ability to accurately measure clinical outcomes requires standardized clinical nomenclature, uniform standards for defining and collecting data elements, strategies to adjust for the complexity of patients, and techniques to verify the completeness and accuracy of data. All of these elements exist in clinical registries such as the STS National Database. Consequently, metrics derived from clinical registries are ideally suited for high-stakes applications such as public reporting, center of excellence designation, and reimbursement. STS performance measures based on the STS National Database have been used to develop more than 30 measures endorsed by the National Quality Forum.

Clinical registries can be linked to other data sources to obtain information about long-term outcomes and risk-adjusted cost and resource utilization, all increasingly important considerations in healthcare. Clinical registries are also used to satisfy regulatory and governmental requirements, as exemplified by Qualified Clinical Data Registries in the CMS Physician Quality Reporting System (PQRS) program, and the use of registries for post-market surveillance of new implantable devices, in collaboration with CMS and FDA, as exemplified by the Transcatheter Valve Therapies (TVT) Registry.

Clinical registries are the ideal platform for developing evidence for best practice guidelines and for documenting appropriateness of procedures.
They are also invaluable for comparative effectiveness research. Although randomized trials have been considered by many to be the gold standard of comparative effectiveness research, recent efforts have examined the possibility of using clinical registries as platforms for randomized trials (Frobert et al. 2013; Lauer and D’Agostino 2013). Performing randomized trials within clinical registries would potentially accomplish the dual objectives of decreasing the cost of these trials and increasing the generalizability of the results (as the included patients are more representative of “real-world” populations).

Clinical registries provide practitioners with accurate and timely feedback of their own outcomes and can benchmark these outcomes to regional, national, or even international aggregate data, thus facilitating quality improvement.

The STS National Database exemplifies the potential value of clinical registries for all of healthcare. High-quality data are collected once and then used for multiple purposes, with the ultimate goal of improving the care of all patients.

References

Carroll JD, Edwards FH, Marinac-Dabic D, et al. The STS-ACC transcatheter valve therapy national registry: a new partnership and infrastructure for the introduction and surveillance of medical devices and therapies. J Am Coll Cardiol. 2013;62(11):1026–34.
Clark RE. It is time for a national cardiothoracic surgical data base. Ann Thorac Surg. 1989;48(6):755–6.
Codman EA. The product of a hospital. Surg Gynecol Obstet. 1914;18:491–6.
Codman EA. A study in hospital efficiency. As demonstrated by the case report of the first two years of a private hospital. Reprint edition (originally published privately 1914–1920) ed. Oakbrook Terrace: Joint Commission on Accreditation of Healthcare Organizations; 1995.
Cronk CE, Malloy ME, Pelech AN, et al. Completeness of state administrative databases for surveillance of congenital heart disease. Birth Defects Res A Clin Mol Teratol. 2003;67(9):597–603.
Donabedian A. Evaluating the quality of medical care. Milbank Mem Fund Q. 1966;44(3):166–206.
Donabedian A. The quality of care. How can it be assessed? JAMA. 1988;260(12):1743–8.
Donabedian A. The end results of health care: Ernest Codman’s contribution to quality assessment and beyond. Milbank Q. 1989;67(2):233–56.
ElBardissi AW, Aranki SF, Sheng S, O’Brien SM, Greenberg CC, Gammie JS. Trends in isolated coronary artery bypass grafting: an analysis of the Society of Thoracic Surgeons adult cardiac surgery database. J Thorac Cardiovasc Surg. 2012;143(2):273–81.
Ferguson Jr TB, Peterson ED, Coombs LP, et al. Use of continuous quality improvement to increase use of process measures in patients undergoing coronary artery bypass graft surgery: a randomized controlled trial. JAMA. 2003;290(1):49–56.
Franklin RC, Jacobs JP, Krogmann ON, et al. Nomenclature for congenital and paediatric cardiac disease: historical perspectives and The International Pediatric and Congenital Cardiac Code. Cardiol Young. 2008;18 Suppl 2:70–80.
Frobert O, Lagerqvist B, Olivecrona GK, et al. Thrombus aspiration during ST-segment elevation myocardial infarction. N Engl J Med. 2013;369(17):1587–97.
Frohnert BK, Lussky RC, Alms MA, Mendelsohn NJ, Symonik DM, Falken MC. Validity of hospital discharge data for identifying infants with cardiac defects. J Perinatol. 2005;25(11):737–42.
Grover FL, Shahian DM, Clark RE, Edwards FH. The STS National Database. Ann Thorac Surg. 2014;97 Suppl 1:S48–54.
Hammill BG, Hernandez AF, Peterson ED, Fonarow GC, Schulman KA, Curtis LH. Linking inpatient clinical registry data to Medicare claims data using indirect identifiers. Am Heart J. 2009;157(6):995–1000.
Hannan EL, Cozzens K, King III SB, Walford G, Shah NR. The New York State cardiac registries: history, contributions, limitations, and lessons for future efforts to assess and publicly report healthcare outcomes. J Am Coll Cardiol. 2012;59(25):2309–16.
Hibbard JH, Peters E, Slovic P, Finucane ML, Tusler M. Making health care quality reports easier to use. Jt Comm J Qual Improv. 2001;27(11):591–604.
Hillis LD, Smith PK, Anderson JL, et al. ACCF/AHA guideline for coronary artery bypass graft surgery: executive summary: a report of the American College of Cardiology Foundation/American Heart Association Task Force on Practice Guidelines. Circulation. 2011;124(23):2610–42.
Iezzoni LI. Risk adjustment for measuring health care outcomes. 3rd ed. Chicago: Health Administration Press; 2003.
Jacobs JP, Jacobs ML, Mavroudis C, et al. Nomenclature and databases for the surgical treatment of congenital cardiac disease–an updated primer and an analysis of opportunities for improvement. Cardiol Young. 2008;18 Suppl 2:38–62.
Jacobs JP, Jacobs ML, Lacour-Gayet FG, et al. Stratification of complexity improves the utility and accuracy of outcomes analysis in a multi-institutional congenital heart surgery database: application of the risk adjustment in congenital heart surgery (RACHS-1) and Aristotle systems in the Society of
Thoracic Surgeons (STS) Congenital Heart Surgery Database. Pediatr Cardiol. 2009;30(8):1117–30.
Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons adult cardiac surgery database to Centers for Medicare and Medicaid Services Medicare data. Ann Thorac Surg. 2010;90(4):1150–6.
Jacobs ML, Daniel M, Mavroudis C, et al. Report of the 2010 Society of Thoracic Surgeons congenital heart surgery practice and manpower survey. Ann Thorac Surg. 2011a;92(2):762–8.
Jacobs JP, Edwards FH, Shahian DM, et al. Successful linking of the Society of Thoracic Surgeons database to social security data to examine survival after cardiac operations. Ann Thorac Surg. 2011b;92(1):32–7.
Jacobs JP, Shahian DM, He X, et al. Penetration, completeness, and representativeness of the Society of Thoracic Surgeons adult cardiac surgery database. Ann Thorac Surg. 2016;101(1):33–41.
Jantzen DW, He X, Jacobs JP, et al. The impact of differential case ascertainment in clinical registry versus administrative data on assessment of resource utilization in pediatric heart surgery. World J Pediatr Congenit Heart Surg. 2014;5(3):398–405.
Kouchoukos NT, Ebert PA, Grover FL, Lindesmith GG. Report of the Ad Hoc Committee on risk factors for coronary artery bypass surgery. Ann Thorac Surg. 1988;45(3):348–9.
Kozower BD, Sheng S, O’Brien SM, et al. STS database risk models: predictors of mortality and major morbidity for lung cancer resection. Ann Thorac Surg. 2010;90(3):875–81.
Lauer MS, D’Agostino Sr RB. The randomized registry trial–the next disruptive technology in clinical research? N Engl J Med. 2013;369(17):1579–81.
Mack MJ, Herbert M, Prince S, Dewey TM, Magee MJ, Edgerton JR. Does reporting of coronary artery bypass grafting from administrative databases accurately reflect actual clinical outcomes? J Thorac Cardiovasc Surg. 2005;129(6):1309–17.
Mack MJ, Brennan JM, Brindis R, et al. Outcomes following transcatheter aortic valve replacement in the United States. JAMA. 2013;310(19):2069–77.
Mallon WJ. Ernest Amory Codman: the end result of a life in medicine. Philadelphia: W.B. Saunders Company; 2000.
Mavroudis C, Jacobs JP. Congenital heart surgery nomenclature and database project: overview and minimum dataset. Ann Thorac Surg. 2000;69(3, Suppl 1):S1–17.
Neuhauser D. Ernest Amory Codman, M.D., and end results of medical care. Int J Technol Assess Health Care. 1990;6(2):307–25.
O’Brien SM, Shahian DM, Delong ER, et al. Quality measurement in adult cardiac surgery: part 2–Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg. 2007;83 Suppl 4:S13–26.
O’Brien SM, Clarke DR, Jacobs JP, et al. An empirically based tool for analyzing mortality associated with congenital heart surgery. J Thorac Cardiovasc Surg. 2009a;138(5):1139–53.
O’Brien SM, Shahian DM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2–isolated valve surgery. Ann Thorac Surg. 2009b;88 Suppl 1:S23–42.
O’Connor GT, Plume SK, Olmstead EM, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. The Northern New England Cardiovascular Disease Study Group. JAMA. 1991;266(6):803–9.
Overman DM, Jacobs JP, Prager RL, et al. Report from the Society of Thoracic Surgeons National Database Workforce: clarifying the definition of operative mortality. World J Pediatr Congenit Heart Surg. 2013;4(1):10–2.
Pasquali SK, Jacobs JP, Shook GJ, et al. Linking clinical registry data with administrative data using indirect identifiers: implementation and validation in the congenital heart surgery population. Am Heart J. 2010;160(6):1099–104.
Pasquali SK, Li JS, Jacobs ML, Shah SS, Jacobs JP. Opportunities and challenges in linking information across databases in pediatric cardiovascular medicine. Prog Pediatr Cardiol. 2012a;33(1):21–4.
Pasquali SK, He X, Jacobs JP, Jacobs ML, O’Brien SM, Gaynor JW. Evaluation of failure to rescue as a quality metric in pediatric heart surgery: an analysis of the STS Congenital Heart Surgery Database. Ann Thorac Surg. 2012b;94(2):573–9.
Pasquali SK, Gaies MG, Jacobs JP, William GJ, Jacobs ML. Centre variation in cost and outcomes for congenital heart surgery. Cardiol Young. 2012c;22(6):796–9.
Pasquali SK, Li JS, He X, et al. Perioperative methylprednisolone and outcome in neonates undergoing heart surgery. Pediatrics. 2012d;129(2):e385–91.
Pasquali SK, Li JS, He X, et al. Comparative analysis of antifibrinolytic medications in pediatric heart surgery. J Thorac Cardiovasc Surg. 2012e;143(3):550–7.
Pasquali SK, Peterson ED, Jacobs JP, et al. Differential case ascertainment in clinical registry versus administrative data and impact on outcomes assessment for pediatric cardiac operations. Ann Thorac Surg. 2013;95(1):197–203.
Passaro Jr E, Organ Jr CH. Ernest A. Codman: the improper Bostonian. Bull Am Coll Surg. 1999;84(1):16–22.
Patel MR, Dehmer GJ, Hirshfeld JW, et al. ACCF/SCAI/STS/AATS/AHA/ASNC/HFSA/SCCT 2012 appropriate use criteria for coronary revascularization focused update: a report of the American College of Cardiology Foundation Appropriate Use Criteria Task Force, Society for Cardiovascular Angiography and Interventions, Society of Thoracic Surgeons, American Association for Thoracic Surgery, American Heart Association, American Society of Nuclear Cardiology, and the Society of Cardiovascular Computed Tomography. J Thorac Cardiovasc Surg. 2012;143(4):780–803.
Prager RL, Armenti FR, Bassett JS, et al. Cardiac surgeons Shahian DM, He X, Jacobs JP, et al. The Society of
and the quality movement: the Michigan experience. Thoracic Surgeons composite measure of individual
Semin Thorac Cardiovasc Surg. 2009;21(1):20–7. surgeon performance for adult cardiac surgery: a
Shahian DM, Normand SL. Comparison of “risk-adjusted” report of the Society of Thoracic Surgeons quality
hospital outcomes. Circulation. 2008;117(15):1955–63. measurement task force. Ann Thorac Surg.
Shahian DM, Edwards FH, Ferraris VA, et al. Quality mea- 2015;100:1315–1325.
surement in adult cardiac surgery: part 1–Conceptual Shapiro M, Swanson SJ, Wright CD, et al. Predictors of
framework and measure selection. Ann Thorac Surg. major morbidity and mortality after pneumonectomy
2007a;83 Suppl 4:S3–12. utilizing the Society for Thoracic Surgeons General
Shahian DM, Silverstein T, Lovett AF, Wolf RE, Normand Thoracic Surgery Database. Ann Thorac Surg.
SL. Comparison of clinical and administrative data 2010;90(3):927–34.
sources for hospital coronary artery bypass graft surgery Speir AM, Rich JB, Crosby I, Fonner Jr E. Regional col-
report cards. Circulation. 2007b;115(12):1518–27. laboration as a model for fostering accountability and
Shahian DM, O’Brien SM, Filardo G, et al. The Society of transforming health care. Semin Thorac Cardiovasc
Thoracic Surgeons 2008 cardiac surgery risk models: Surg. 2009;21(1):12–9.
part 1–coronary artery bypass grafting surgery. Ann Spiegelhalter DJ. Surgical audit: statistical lessons from
Thorac Surg. 2009a;88 Suppl 1:S2–22. Nightingale and Codman. J R Stat Soc (Series A).
Shahian DM, O’Brien SM, Filardo G, et al. The Society of 1999;162(Part 1):45–58.
Thoracic Surgeons 2008 cardiac surgery risk models: Strickland MJ, Riehle-Colarusso TJ, Jacobs JP, et al. The
part 3–valve plus coronary artery bypass grafting sur- importance of nomenclature for congenital cardiac dis-
gery. Ann Thorac Surg. 2009b;88 Suppl 1:S43–62. ease: implications for research and evaluation. Cardiol
Shahian DM, Edwards FH, Jacobs JP, et al. Public Young. 2008;18 Suppl 2:92–100.
reporting of cardiac surgery performance: part STS long-term risk calculator. http://www.sts.org/quality-
1–history, rationale, consequences. Ann Thorac Surg. research-patient-safety/quality/ascert-long-term-survival-
2011a;92 Suppl 3:S2–11. calculator. Accessed 11 July 2014.
Shahian DM, Edwards FH, Jacobs JP, et al. Public reporting STS National Database. http://www.sts.org/sections/
of cardiac surgery performance: part 2–implementation. stsnationaldatabase/. Accessed 26 July 2014.
Ann Thorac Surg. 2011b;92 Suppl 3:S12–23. STS Quality Improvement webinars. http://www.sts.org/
Shahian DM, He X, Jacobs JP, et al. The Society of Thoracic education-meetings/sts-webinar-series. Accessed
Surgeons isolated aortic valve replacement (AVR) com- 12 July 2014.
posite score: a report of the STS Quality Measurement STS Research Center. http://www.sts.org/sites/default/files/
Task Force. Ann Thorac Surg. 2012a;94(6):2166–71. documents/pdf/DirectorSTSResearchCenter_April2014.
Shahian DM, O’Brien SM, Sheng S, et al. Predictors of long- pdf. Accessed 13 July 2014.
term survival following coronary artery bypass grafting STS short term risk calculator. http://www.sts.org/quality-
surgery: results from The Society of Thoracic Surgeons research-patient-safety/quality/risk-calculator-and-models.
Adult Cardiac Surgery Database (The ASCERT Study). Accessed 11 July 2014.
Circulation. 2012b;125(12):1491–500. Weintraub WS, Grau-Sepulveda MV, Weiss JM,
Shahian DM, He X, Jacobs JP, et al. Issues in quality et al. Comparative effectiveness of revasculari-
measurement: target population, risk adjustment, and zation strategies. N Engl J Med. 2012;366(16):
ratings. Ann Thorac Surg. 2013a;96(2):718–26. 1467–76.
Shahian DM, Jacobs JP, Edwards FH, et al. The Society of Wright CD, Kucharczuk JC, O’Brien SM, Grab JD, Allen
Thoracic Surgeons National Database. Heart. 2013b; MS. Predictors of major morbidity and mortality after
99(20):1494–501. esophagectomy for esophageal cancer: a Society of
Shahian DM, He X, Jacobs JP, et al. The STS AVR + CABG Thoracic Surgeons General Thoracic Surgery Database
composite score: a report of the STS Quality Measure- risk adjustment model. J Thorac Cardiovasc Surg.
ment Task Force. Ann Thorac Surg. 2014;97(5):1604–9. 2009;137(3):587–95.
11 Health Services Information: Patient Safety Research Using Administrative Data
Chunliu Zhan
Contents
Introduction
Administrative Data: Definition, Data Resources, and Potential Patient Safety Measures
  Medical Claims, Discharge, and Other Health Encounter Abstracts
  Medical Records and Electronic Health Records
  Reports and Surveillance of Patient Safety Events
  Surveys of Healthcare Encounters and Healthcare Experiences
  Other Data Sources and Data Linkage
Patient Safety Research Using Administrative Data: General Framework, Methods, and Tools
  General Framework for Administrative Data-Based Patient Safety Research
  Methodological Considerations
  AHRQ Patient Safety Indicators: An Exemplary Tool for Administrative Data-Based Patient Safety Research
Patient Safety Research Using Administrative Data: Potentials and Limitations
  Screen Patient Safety Events for In-depth Examination
  Epidemiological Study
  Public Reporting on Patient Safety Events
  Advantages and Challenges in Administrative Data-Based Patient Safety Research
References
Abstract

A wide variety of data is routinely collected by healthcare providers, insurers, professional organizations, and government agencies for administrative purposes. Readily available, computer readable, and covering large populations, these data have become valuable resources for patient safety research. A large number of exemplary studies have been conducted that examined the nature and types of patient safety problems, offered valuable insights into the impacts and risk factors, and, to some extent, provided benchmarks for tracking progress in patient safety efforts at local, state, or national levels. Various methods and tools have been developed to aid such research. The main disadvantage is that these administrative data are often collected without following any research design, protocol, or quality assurance procedure; therefore health services researchers using these data sources must make extra efforts in devising proper methodologies and must interpret their findings with extra caution. As more and more administrative data are collected and digitized and more tailored methodologies and tools are developed, health services researchers will be presented with ever-greater opportunity to extract valid information and knowledge on patient safety issues from administrative data.

C. Zhan (*)
Department of Health and Human Services, Agency for Healthcare Research and Quality, Rockville, MD, USA
e-mail: chunliu.zhan@ahrq.hhs.gov

https://doi.org/10.1007/978-1-4939-8715-3_12

Introduction

A guiding principle for medical professionals is the Hippocratic oath: First, Do No Harm. But, inevitably, patient harms occur, and research is needed to understand why and how to prevent them. Since the Institute of Medicine (IOM) published its landmark report, To Err Is Human: Building a Safer Health System (Kohn et al. 1999), in 1999, the importance of vigorous, systematic research on patient safety has been recognized worldwide, and patient safety research has become a prominent domain of health services research. Using a variety of definitions, taxonomies, methods, and databases, health services researchers have addressed a wide range of patient safety-related questions, producing a large body of literature.

To the general public, patient safety is self-defined. As a research topic, its definition is far from universally agreed. IOM defines patient safety as “the prevention of harm to patients”, and its emphasis is placed on “the system of care delivery that (1) prevents errors; (2) learns from the errors that do occur; and (3) is built on a culture of safety that involves health care professionals, organizations, and patients” (Kohn et al. 1999). The Agency for Healthcare Research and Quality (AHRQ), the US federal agency charged with improving patient safety, defined patient safety as “freedom from accidental or preventable injuries produced by medical care.” The literature is littered with systems of definitions, taxonomies, categorizations, terms, and concepts associated with patient safety. The National Quality Forum’s list of “never events” or “serious reportable events” offers concrete examples of the types of issues patient safety research is concerned with:

• Surgical events: surgery or other invasive procedure performed on the wrong body part or the wrong patient, the wrong surgical or other invasive procedure performed on a patient, and unintended retention of a foreign object in a patient after surgery or other procedure
• Product or device events: such as patient death or serious injury associated with the use of contaminated drugs, devices, or biologics
• Patient protection events: discharge or release of a patient/resident of any age, who is unable to make decisions, to other than an authorized person and patient suicide, attempted suicide, or self-harm resulting in serious disability while being cared for in a healthcare facility
• Care management events: such as patient death or serious injury associated with a medication error (e.g., errors involving the wrong drug, wrong dose, wrong patient, wrong time, wrong rate, wrong preparation, or wrong route of administration), patient death or serious injury associated with unsafe administration of blood products, maternal death or serious injury associated with labor or delivery, and patient death or serious injury resulting from failure to follow up or communicate laboratory, pathology, or radiology test results
• Environmental events: patient or staff death or serious injury associated with a burn incurred in a healthcare setting and patient death or serious injury associated with the use of restraints or bedrails while being cared for in a healthcare setting
• Radiologic events: death or serious injury of a patient or staff associated with introduction of a metallic object into the MRI area
• Criminal events: any instance of care ordered by or provided by someone impersonating a physician, nurse, pharmacist, or other licensed healthcare provider, abduction of a patient/resident of any age, sexual abuse/assault on a patient within or on the grounds of a healthcare setting, and death or significant injury of a patient or staff member resulting from a physical assault (i.e., battery) that occurs within or on the grounds of a healthcare setting

To focus on the subject at hand, that is, how to use administrative data to conduct patient safety research, this chapter will refer to all events with patient safety implications as patient safety events without distinction.

Patient safety research can be done in a number of ways, such as follow-up of cohorts of patients as they come into contact with healthcare systems and randomized trials to examine whether a certain intervention works to reduce patient safety events. However, such studies are uncommon because patient safety events are accidental in nature and therefore rare; to gather a sufficient number of cases, a researcher must collect a very large study sample. Unsurprisingly, most studies on patient safety were conducted using administrative data, the type of data collected routinely and processed in large volume for administrative purposes.

Health services researchers have used administrative data to study a variety of patient safety issues, from the prevalence to risk factors and effectiveness of interventions to reduce patient safety events. The apparent advantage of administrative data is in its large volume and its computerization, which make the most laborious and expensive part of research, data collection, relatively easy and cheap. Another advantage is that, because there is little risk of interrupting patient care in the data collection process and little risk of patient privacy breach with patient identifiers stripped, data acquisition can be done without jumping through many hoops. The apparent disadvantage is that these administrative data are collected without following any research design, protocol, or quality assurance procedure; therefore, researchers using these data must make extra efforts in devising methodologies and must interpret their findings with extra caution.

Patient safety as a research domain is relatively new compared with other health services research domains, and the issues are diverse and constantly evolving. Administrative data are also expanding fast, with more and more data collected and accumulated as computer technologies progress and interest in mining big data increases. Consequently, patient safety research using administrative data does not follow any clearly defined agenda, methodologies, or processes, giving researchers great room for creativity and innovation and also greater room for error.

This chapter provides a review of the administrative data sources currently available for patient safety research, the common methodologies and tools employed, and the types of patient safety research that can be conducted using administrative data. By going through some well-developed concepts, tools, and examples, the chapter intends to offer health services researchers a road map on how to use administrative data to generate information and knowledge to advance their patient safety agenda.

Administrative Data: Definition, Data Resources, and Potential Patient Safety Measures

Administrative data refer to data collected for administrative purposes. Such data are essential for running any kind of business, and the business of healthcare is no exception. Hospitals, outpatient clinics, nursing homes, home care providers, pharmacists, and all other healthcare providers collect and compile data on patients, medical conditions, treatments, and patient directives, create bills for patients and submit claims to insurers and other third-party payers for reimbursements, and compile business data for governance, internal audits, credentialing, and statistical reports. Health insurance companies deal with medical claims in addition to enrolling patients, generating enormous amounts of data on a daily basis. Drug companies collect data on drug sales, establish drug registries for postmarket research, and
compile data on drug safety to meet regulatory needs. Professional societies, such as the American Medical Association and the American Hospital Association, also compile extensive data on their members for membership management, licensing, accreditation, and other administrative purposes. Many employers, especially large and traditional companies, offer extensive health benefits, and, for management purposes, compile extensive data to track their employees’ use of health benefits and expenses. Last but not least, government agencies compile extensive data, including claims in order to pay the bills for patients covered by government programs, data from healthcare providers to monitor this important sector of the economy, and regular surveys to generate national statistics and track changes over time. Together, tremendous amounts of administrative data are produced and maintained by various entities, and these data hold great potential for research on a wide range of issues, including patient safety issues.

In general, any data source that records personal encounters or experiences with healthcare systems has the potential to contribute information and knowledge on patient safety. Many other data sources containing no patient care data can also be useful when merged with patient encounter data. Table 1 provides a brief summary of the types of administrative data sources that are available and that have been used by health services researchers to study patient safety.

It should be noted that, in health services research literature, claims data are often treated as synonymous with administrative data. This is because medical claims, which record individual patients’ individual episodes of care for insurance claims, are the most voluminous data, the first to be extensively computerized, and the first administrative data sources extensively used in health services research. However, similar data on individual healthcare encounters are also collected in many countries or programs under universal insurance coverage, and these data are sometimes called discharge abstracts. Following the basic definition of “administrative data,” this review also includes other data sources that are collected for administrative purposes, but may be smaller in scale, less computerized, and less often used in health services research. The basic characteristics of these data sources and the potential patient safety measures that can be derived from these data sources are discussed in detail below.

Medical Claims, Discharge, and Other Health Encounter Abstracts

Data Sources
A healthcare provider must collect and compile data on each service rendered to each patient, for record keeping, patient tracking, billing, and other administrative purposes. At minimum, the data
Table 1 Administrative data sources and potential patient safety measures

Data source | Potential patient safety measures
Medical care claims and abstracts | Screening algorithms based on ICD codes, interactive drug-drug pairs, contraindicative drug-event pairs, utilization-condition pairs indicative of inappropriate, over- and underuse of specific medications or procedures
Medical records, electronic medical records | Screening algorithms above, expanded to include more clinical data and text narratives
Reports of medical errors and adverse events, malpractice claims | Each report describes a patient safety event and contextual factors
Survey of healthcare encounters or experiences | Screening algorithms based on ICD codes, interactive drug-drug pairs, contraindicative drug-event pairs, utilization-condition pairs indicative of inappropriate, over- and underuse of specific medications or procedures
Other administrative databases such as census, provider databanks, geo-eco-political databases | Contain no patient safety measures but expand research into population, provider, and regional statistics in relation to patient safety events
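
As an illustration of the first row of Table 1, the following is a minimal Python sketch of a screening algorithm based on ICD codes. It uses the ICD-9-CM codes for a foreign object left in the body that this chapter lists under Potential Patient Safety Measures (998.4, 998.7, and E8710 to E8719). The record layout (a list of diagnosis codes with the primary diagnosis first), the removal of decimal points before matching, and the three result labels are assumptions made for the example, not part of any published algorithm.

```python
# Codes taken from the chapter's list, stored without decimal points.
FOREIGN_OBJECT_CODES = {"9984", "9987"} | {f"E871{d}" for d in range(10)}


def normalize(code):
    """Strip whitespace and decimal points and uppercase the code (assumed convention)."""
    return code.replace(".", "").strip().upper()


def classify(diagnoses):
    """Classify one discharge record; `diagnoses` lists the primary diagnosis first.

    Returns one of three hypothetical labels:
      "present_on_admission"       - a screening code is the primary diagnosis
      "possible_hospital_acquired" - a screening code appears only as a secondary diagnosis
      "no_event"                   - no screening code is recorded
    """
    codes = [normalize(c) for c in diagnoses]
    if not codes:
        return "no_event"
    # The chapter treats a code in the primary position as present on admission.
    if codes[0] in FOREIGN_OBJECT_CODES:
        return "present_on_admission"
    if any(c in FOREIGN_OBJECT_CODES for c in codes[1:]):
        return "possible_hospital_acquired"
    return "no_event"
```

A record flagged as "possible_hospital_acquired" is only a candidate event. As the chapter stresses, coded data alone rarely establish that a patient safety event occurred, so flagged records would normally be confirmed against other sources.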
include some patient demographics, medical conditions, diagnoses, treatments, discharge or disposition status, and charges and payments. As mentioned earlier, the most important use of such data is to make insurance claims; therefore this type of data is often called “claims data” and further categorized as inpatient claims, outpatient claims, pharmacy claims, and so on. In many countries other than the United States, health encounters are similarly recorded and compiled but not for insurance claims purposes, and this type of administrative record may be called a discharge abstract, for example. Regardless of terms, data on individual healthcare encounters are universal and are available in various capacities for research use.

Researchers rarely have the need to deal with individual hospitals, primary care institutions, nursing homes, outpatient surgical centers, or home care agencies to access such data. Government agencies, insurers, health systems, and many commercial companies compile the data and offer them to various end users. In the United States, the Centers for Medicare and Medicaid Services (CMS) has been a major source of such administrative data. Medicare, a national social insurance program, guarantees access to health insurance for about 50 million Americans aged 65 and older and younger people with disabilities. Medicaid, a social healthcare program jointly funded by the state and federal governments and managed by the states, provides coverage for families and individuals with low income and resources. Together, Medicare and Medicaid process millions of claims each day. CMS has made great efforts to make these claims available to researchers and to standardize the data release process. The latest incarnation of these efforts is called the CMS Data Navigator (CMS 2014), intended to be the one stop for all CMS data sources, through standard processes that include formulated requests, approval, pricing, and payment procedures to ensure proper use and security of the data. The CMS data suite covers enrollment, outpatient care, hospitalization, pharmacy, and services delivered by other types of providers, and the data can be linked to form a rather complete history of an individual’s healthcare encounters.

Besides CMS, other federal agencies, state health departments, health plans, and private data institutions have also compiled claims data into research databases. One prominent example is AHRQ’s Healthcare Cost and Utilization Project (HCUP), a partnership of the federal government and states that compiles uniform hospital discharge records for research purposes (HCUP 2014). At the time of writing, HCUP includes databases covering all hospital admissions from 47 states, emergency department visits from 31 states, and ambulatory surgery claims from 32 states. It has derived research databases with a sampling design to yield national estimates and developed various tools to reliably and effectively use these databases. On the private side, Truven Health Analytics MarketScan® databases contain complete claims for more than 199 million unique patients, and IMS Health compiles information from 100,000 suppliers from over 100 countries, with more than 45 billion healthcare transactions processed annually.

With the government paying for all health services provided by mostly private providers, Canada collects data on individual health encounters for almost the entire population. Some provinces have data on virtually all records of hospitalizations, pharmacy, physician visits, emergency department visits, and so on for every resident. Many efforts are made to make such data easy for researchers to access and use. For example, the Canadian Institute for Health Information maintains discharge abstract databases of administrative, clinical, and demographic information on hospital discharges received directly from acute care facilities or from their respective health authority or department of health. In the United Kingdom, Hospital Episode Statistics comprises an administrative database of all inpatients in England, covering about 13 million episodes of care annually. Similar databases exist, in various forms, in almost all nations, most of which are available for research purposes.

Regardless of country, healthcare system, or purpose, administrative data of this sort record patient encounters with the healthcare system and capture similar sets of data elements:
• Patient demographics such as age, sex, race/ethnicity, county of residence and zip code, and expected payer
• Admission status including admission date, admission source and type, and primary and secondary diagnoses
• Treatments such as procedures and medications
• Discharge status entailing discharge date, patient disposition, or death
• Charges and payments

In addition, some identifiers for patients and providers, usually encrypted, are included, allowing for linking an individual patient’s claims from multiple care settings.

Potential Patient Safety Measures
A coding system for diagnosis and procedures is essential for recording patient encounters and for generating bills. The United States currently uses International Classification of Diseases, the ninth revision, Clinical Modification (ICD-9-CM), a coding system with three-digit numbers (i.e., 001–999) followed by a decimal point and up to two digits, supplemented by a group of E codes (E000–E999) capturing external causes of injury (Iezzoni et al. 1994). Canada, Australia, New Zealand, and many European and Asian countries have been using ICD-10, an alphanumeric system in which each code starts with a letter (i.e., A–Z), followed by two numeric digits, a decimal point, and a digit (Quan et al. 2008).

Some of the codes specifically identify a patient safety event, and some codes suggest that there may be an event of patient safety concern. For example, there is an ICD-9-CM diagnosis code for “foreign object accidentally left in body during a procedure”: 998.4. Some other codes may also suggest such occurrence, including:

998.7: postoperative foreign substance reaction
E8710: post-surgical foreign object left in body
E8711: postinfusion foreign object left in body
E8712: postperfusion foreign object left in body
E8713: postinjection foreign object left in body
E8714: postendoscopy foreign object left in body
E8715: postcatheter foreign object left in body
E8716: post heart catheter foreign object left in body
E8717: post catheter removal foreign object left in body
E8718: foreign object left in body during other specified procedure
E8719: foreign object left in body during non-specified procedure

The corresponding ICD-10 codes for foreign object accidentally left in body during a procedure may include:

T81.509A: unspecified complication of foreign body accidentally left in body following unspecified procedure, initial encounter
T81.519A: adhesions due to foreign body accidentally left in body following unspecified procedure, initial encounter
T81.529A: obstruction due to foreign body accidentally left in body following unspecified procedure, initial encounter
T81.539A: perforation due to foreign body accidentally left in body following unspecified procedure, initial encounter

The process of identifying the right codes and eligible patients to measure patient safety is a mix of science and art. It is rarely clear that one code specifically records a specific patient safety event. The art of the process includes not only selection of relevant codes but also exclusion of patients for whom the codes are not likely to be relevant. Another consideration is whether a recorded event occurred during the current hospitalization (i.e., hospital-acquired condition) or whether it was already present on admission (i.e., comorbid condition). If the code appears as the first, or primary, diagnosis in a claim or discharge abstract, then it can be considered to record an event that is present on admission. But as many as 25 secondary diagnosis codes are recorded in some claims data, and only recently was a code introduced in Medicare claims to indicate whether a diagnosis is present on admission. A great deal of effort in administrative data-based patient safety research goes into the artistic process with the dual purpose to maximize specificity (i.e., an
event flagged by the codes is truly a patient safety event) and sensitivity (i.e., all patient safety events are flagged). This process is further illustrated in later sections, in conjunction with the discussion of the methods and tools used in administrative data-based patient safety research.

Algorithms can also be built based on coded data other than ICD codes. Claims for medications can be used to screen harmful drug-drug interactions and contraindicative drug-condition interactions. With data linked from multiple settings and over time, certain measures of inappropriate use, underuse, or overuse of care with safety implications can be studied.

Medical Records and Electronic Health Records

Data Sources
Medical records are as numerous as claims but much richer in information on patients and their healthcare experiences. Each healthcare encounter has a medical record associated with it to support diagnosis and justify services provided. Broadly speaking, a medical record may contain:

• Patient demographic information: name, address, date of birth, sex, race and ethnicity, legal status of any patient receiving behavioral healthcare services, and language and communication needs, including the preferred language for discussing healthcare issues
• Patient clinical information: reason(s) for admission; initial diagnosis; assessments; allergies to food or latex or medication; medical history; physical examination; diagnoses or conditions established during the patient’s course of care, treatment, and services; consultation reports; observations relevant to treatment; patient’s response to treatment; progress notes; medications ordered or prescribed; medications administered, including the strength, dose, frequency, and route; adverse drug reactions; treatment goals; plan of care and revisions to the plan of care; results of diagnostic and therapeutic tests and procedures; medications dispensed or prescribed on discharge; discharge diagnosis; and discharge plan and discharge planning evaluation
• Other information: such as advance directives, informed consent, and records of communication with the patient, such as telephone calls or email

Medical records can be handwritten, typed, or electronic and can be coded or written in open-text narratives. The rich clinical information makes medical records a good source for patient safety research, allowing identification of various medical injuries, adverse events, errors, and near misses and allowing analysis of circumstances and causes of various patient safety events. Earlier research on patient safety used medical records predominantly as the primary data source (Kohn et al. 1999). Those earlier studies mostly had to work with medical records in paper format or electronic format that was not readily usable for research and had to rely on medical experts to transform medical records into research data, a process that was resource intensive and required exceptional knowledge and skills in medical context and research. As a result, earlier patient safety research with medical records was usually limited in scope and statistical power.

The wide adoption of electronic medical records (EMRs) offers great promise for patient safety research. In the United States, a substantial percentage of hospitals and physicians have started to use EMR systems, with various levels of capacity and usability. In the United Kingdom, the National Health Service collects and stores data electronically on primary care encounters in the clinical information management system. Great efforts are being made in Canada and all over the world to move the healthcare industry into the Information Age.

Potential Patient Safety Measures
In theory, EMRs hold much of what claims data can offer and much more. EMRs contain a great deal of information in structured, coded data similar to administrative data. The allure of EMR data in patient safety research lies with its rich clinical data, such as lab values, and narratives that record
248 C. Zhan
Table 2 Medical record-based screening for patient safety events: adverse drug events associated with warfarin
Description | Screening algorithm
Numerator | The subset of the denominator who during the hospital stay experienced:
  1. An INR ≥4.0 with one or more of the following symptoms: cardiac arrest/emergency measures to sustain life, death, gastrointestinal bleeding, genitourinary bleeding, a hematocrit drop of three or more points more than 48 h after admission, intracranial bleeding (subdural hematoma), a new hematoma, other types of bleeding, or pulmonary bleeding
  2. An INR >1.5 and an abrupt cessation/hold of warfarin with one or more of the above symptoms
  3. An INR >1.5 and administration of vitamin K or fresh frozen plasma (FFP) with one or more of the above symptoms
  4. An INR >1.5 and a blood transfusion absent a surgical procedure with one or more of the above symptoms
Denominator | All patients who received warfarin during hospitalization and had a documented INR result during the hospital stay
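The screening logic in Table 2 can be sketched in Python as follows. This is a minimal illustration, not the algorithm as implemented in any particular system: the record layout (`HospitalStay`), the symptom labels, and the field names are hypothetical, the four numerator criteria are assumed to be alternatives (any one qualifies), the 4.0 threshold is treated as inclusive, and the peak INR during the stay is used without modeling the timing of INR results relative to symptoms or interventions.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical shorthand labels for the symptoms listed in Table 2.
QUALIFYING_SYMPTOMS = {
    "cardiac_arrest_or_emergency_measures",
    "death",
    "gi_bleeding",
    "gu_bleeding",
    "hematocrit_drop_3_points_after_48h",
    "intracranial_bleeding",
    "new_hematoma",
    "other_bleeding",
    "pulmonary_bleeding",
}


@dataclass
class HospitalStay:
    """One abstracted hospital stay (hypothetical record layout)."""
    received_warfarin: bool
    inr_results: List[float] = field(default_factory=list)
    symptoms: Set[str] = field(default_factory=set)
    warfarin_abruptly_held: bool = False
    vitamin_k_or_ffp_given: bool = False
    transfusion_without_surgery: bool = False


def in_denominator(stay: HospitalStay) -> bool:
    """Received warfarin and has at least one documented INR result."""
    return stay.received_warfarin and len(stay.inr_results) > 0


def in_numerator(stay: HospitalStay,
                 high_inr: float = 4.0,
                 elevated_inr: float = 1.5) -> bool:
    """Flag a possible warfarin-associated adverse drug event."""
    if not in_denominator(stay):
        return False
    # Every criterion requires at least one qualifying symptom.
    if not (stay.symptoms & QUALIFYING_SYMPTOMS):
        return False
    peak_inr = max(stay.inr_results)
    if peak_inr >= high_inr:  # criterion 1 (inclusive threshold assumed)
        return True
    # Criteria 2-4: moderately elevated INR plus a rescue-type intervention.
    intervention = (stay.warfarin_abruptly_held
                    or stay.vitamin_k_or_ffp_given
                    or stay.transfusion_without_surgery)
    return peak_inr > elevated_inr and intervention


def event_rate(stays: List[HospitalStay]) -> float:
    """Numerator count divided by denominator count for a set of stays."""
    denominator = [s for s in stays if in_denominator(s)]
    if not denominator:
        raise ValueError("no stays meet the denominator definition")
    return sum(in_numerator(s) for s in denominator) / len(denominator)
```

A reviewer-facing tool would likely return which criterion fired rather than a single Boolean, so that flagged records can be confirmed against the chart.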
medical providers’ observations, judgments, treatment details, and outcomes. Screening algorithms can be designed to search for patient safety events in coded data as well as in text narratives. The search can look for falls, retrieve lab data on toxic serum levels of digoxin, or screen for international normalized ratios (INRs) greater than 6 in patients on warfarin. It can entail a sophisticated, explicit, structured query of entire medical records. Table 2 shows an example that screens EMRs for possible adverse drug events in patients on warfarin.

Such algorithms can be used in manual review of medical records and can also be used to design automatic review of EMRs.

There are many challenges in implementing such explicit screening algorithms, and compromises are made. The Institute for Healthcare Improvement (Griffin and Resar 2009) has developed a set of global trigger tools that screen medical records for possible adverse events, including groups of triggers for medical, surgical, and medication-related patient harms. The tools screen coded data; look for the most significant, easy-to-detect signs; and can be applied by healthcare organizations to review paper-based as well as electronic medical records. The trigger tools have been adopted by many countries and health systems. For example, Adventist Health System used the tools to gauge the number, types, and severity levels of adverse events in 25 hospitals that used a common EMR system and developed a centralized process to do so uniformly, including quarterly reports to participating facilities to communicate findings and case studies illustrating the most egregious harms.

With regard to rich notes and other narratives in EMRs, there has been much hype but little real progress. The method used to identify, extract, and encode relevant information from tremendous volumes of text narratives is called natural language processing (NLP). In general, EMR narratives are stored following an internal structure; information extraction involves selecting the relevant sections of the EMR and then performing targeted text data processing. NLP systems, such as MEDSYNDIKATE, MetaMap, SemRep, MedLEE, and BioMedLEE, can extract data pertaining to patient safety events. In a recent study of adverse drug events attributable to six drugs, Wang et al. (2009) demonstrated the general process, which consists of five stages: (1) collecting the set of EMRs to be mined, (2) processing the summaries using NLP to encode clinical narrative data, (3) selecting data in which a specific drug co-occurs with its potential adverse drug events, (4) filtering data by excluding confounding information such as diseases or symptoms that occurred before the use of the drug, and (5) analyzing and determining the drug-adverse drug event association.

In theory, any type of error or adverse event that can be recognized by a clinician going through a medical record can be captured electronically. However, this theory is far from being realized. There are many EMR systems that vary substantially in structure, format, and content, and there are legal and practical obstacles to data
sharing. However, some healthcare systems have started to pull together EMR data for research. It is expected that in the near future, research databases composed of large volumes of medical records from many providers and across care settings, databases resembling HCUP or CMS data navigator, will be created and made available to health services researchers.

Reports and Surveillance of Patient Safety Events

Data Sources

Alternative data sources for patient safety research include mandatory and voluntary reports of medical errors or adverse events; drug safety or nosocomial infection surveillance systems; and other data systems that government agencies and nongovernmental organizations use specifically to monitor patient safety. Since the 1960s, spontaneous reporting systems have served as the primary means of providing postmarket safety information on drugs, and some systems have also covered patient safety events due to inappropriate use of drugs. Such systems exist all over the world under various names and with various mandates.

This type of data source records individual incidents of patient safety events and varies tremendously in format and content. One prominent example of such a reporting system is the US Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS). FAERS contains information on adverse event and medication error reports submitted to the FDA voluntarily by healthcare professionals and consumers, as well as by drug manufacturers, who are required to send all adverse event reports they receive from healthcare providers and consumers. The database is designed to support the FDA’s postmarketing safety surveillance program for drug and therapeutic biologic products, to help the FDA look for new safety concerns that might be related to a marketed product, to evaluate a manufacturer’s compliance with reporting regulations, and to respond to outside requests for information. Besides regulatory use, the FDA provides raw data consisting of individual case reports extracted from the FAERS database to researchers inside and outside of the FDA. Similar to the FDA FAERS, the UK’s Medicines and Healthcare Products Regulatory Agency operates a Yellow Card Scheme that allows patients and health professionals to report suspected side effects. The reports are continually assessed by medicine safety experts, together with additional sources of information such as clinical trial data, the medical literature, and data from international medicines regulators, in order to identify previously unidentified safety issues or side effects.

MEDMARX is a similar system of voluntary reports but focuses on medication errors. Currently, MEDMARX contains over 1.3 million medication error records reported by over 400 healthcare facilities that voluntarily participate. The program collects information on medication errors, categorizing them into nine severity levels, ranging from errors that do not reach patients to errors that cause death. The reporting system contains up to 13 required data elements and 29 optional data elements to describe error types, causes, locations, staff involved, products involved, and patient characteristics. The system also asks about actions taken in response to the errors, including both individual procedural activities (i.e., actions to recover from the error) and practice-based changes (i.e., actions to prevent future errors). Most data elements are coded fields allowing single or multiple selection, and some data fields are for textual descriptions.

Some surveillance systems collect similar data but make reporting mandatory in order to accurately track the incidence of patient safety events. The Centers for Disease Control and Prevention (CDC) National Nosocomial Infections Surveillance System is a prominent example of such a data source; it has gathered reports on nosocomial infections from a sample of hospitals in the United States since the 1970s. Another example is the National Electronic Injury Surveillance System (NEISS) at the CDC, composed of a national probability sample of hospitals in the United States that collect patient information for every emergency visit involving an injury associated with consumer products, including medical products. More recently, to address
heightened public concerns over drug safety, the system started a Cooperative Adverse Drug Event Surveillance Project (NEISS-CADES) to capture cases, defined as those occurring in persons who sought emergency care for injuries linked by the treating physician to the outpatient use of a drug or drug-specific adverse effects. Using NEISS-CADES, Budnitz et al. (2011) were able to estimate that adverse drug events in older Americans accounted for about 100,000 emergency hospitalizations a year in the United States, and four medications (warfarin, insulins, oral antiplatelet agents, oral hypoglycemic agents) were implicated alone or in combination in two thirds of the cases.

Patient Safety Measures

Because each record of this type is intended to provide details for one specific patient safety event, no effort is needed to identify or validate the reported event. The data allow various kinds of targeted research, such as studies of the types of errors or adverse events most frequently occurring, the circumstances, the possible causes as reported, and the follow-up actions. But this type of data has some obvious limitations for patient safety research. First, the reported event (adverse event or medication error) may not have been caused by the product; a report does not establish a causal relationship with the product. Second, the reports do not always contain enough detail to properly evaluate an event. Third, because of the voluntary nature of data submission, the system does not receive reports for every adverse event or medication error that occurs; therefore, the data cannot be used to calculate the incidence of an adverse event or medication error in a population. Lastly, this type of data contains no controls (i.e., patients without patient safety events), severely limiting its use in epidemiological research.

Surveys of Healthcare Encounters and Healthcare Experiences

Data Sources

Many government agencies conduct routine surveys to collect data in order to produce national statistics and track changes in the healthcare sector. Some of the surveys collect data on personal encounters with healthcare systems and, therefore, are potential data sources for patient safety research.

In the United States, the National Center for Health Statistics, under the CDC, conducts a wide array of national surveys that contain healthcare encounter experiences. The National Ambulatory Medical Care Survey collects information about the provision and use of ambulatory medical care services, drawing a random sample of visits to nonfederal, office-based physicians who are primarily engaged in direct patient care. The National Hospital Ambulatory Medical Care Survey collects similar data on the utilization and provision of ambulatory care services in hospital emergency and outpatient departments from a national sample of visits to the emergency departments and outpatient departments of noninstitutional, general, and short-stay hospitals. The National Hospital Discharge Survey collects data from a national sample of hospital discharges from nonfederal, short-stay hospitals. The National Hospital Care Survey, a relatively new database, integrates inpatient data formerly collected by the National Hospital Discharge Survey with the emergency department, outpatient department, and ambulatory surgery center data collected by the National Hospital Ambulatory Medical Care Survey, with personal identifiers linking care provided to the same patient in the emergency departments, outpatient departments, ambulatory surgical centers, and inpatient departments.

Besides surveys of healthcare encounters as listed above, some surveys ask patients and families directly for information on their healthcare experiences. The CMS Medicare Current Beneficiary Survey is such a data source, containing survey responses from a random sample of Medicare beneficiaries and linking to their administrative data covering inpatient, outpatient, and other claims. The AHRQ Medical Expenditure Panel Survey is a set of large-scale surveys of families and individuals, their medical providers, and employers on healthcare use and spending.

Similar surveys of healthcare encounters, residents, or families exist in various forms in
many other countries. For example, the Canadian Community Health Survey resembles the Medical Expenditure Panel Survey in general purposes and methods, collecting information annually from a large sample of the Canadian population on health status, healthcare utilization, and health determinants.

Patient Safety Measures

Surveys of healthcare encounters and healthcare use usually contain data on medical conditions, diagnoses, and procedures, coded by ICD-9-CM or other similar coding systems. As with claims data, some patient safety indicators can be derived from the coded data. Depending on the data collected, other screening algorithms can be designed. For example, many surveys collect data on medication prescriptions, and measures of inappropriate medication prescriptions can be derived by screening for medications that generally should not be prescribed to patients of advanced age or with certain medical conditions. Once a patient safety event is identified with moderate specificity and sensitivity, survey data support a wide range of patient safety research, especially with national statistics, variation across regions and social strata, and changes over time.

Other Data Sources and Data Linkage

Many other administrative data sources besides the four types discussed earlier contain information on individual events of patient safety concern. Malpractice claims, for example, contain rich data for patient safety research. A malpractice claim is a written demand for compensation for a medical injury, alleging that an attending physician or a care provider is responsible for the injury due to missed, delayed, or wrong diagnosis or treatment. A claims file captures information on an entire litigation, including the statement of claim, depositions, interrogations, reports of internal investigations, root cause analyses, expert opinions from both sides, medical records and analysis, and final resolution and payments. Working with malpractice insurance companies, researchers can access closed malpractice claims to study the nature, causes, and circumstances of the underlying errors and identify potential strategies to improve patient safety.

Combining multiple data sources for research has been a significant trend in recent years. The FDA’s Mini-Sentinel Project is an example. Tasked with monitoring the safety of approved medical products, the postmarket surveillance system consists of claims data from 18 private health plans covering about 100 million people, supplemented by EMR data from 18 healthcare organizations, designed to answer the FDA’s questions on postmarket safety. The claims data capture the complete records of individuals’ exposure to the specific medical product in question and limited measures of patient outcomes such as death and major, codified complications. The linked EMR data are then used to confirm diagnoses and adverse events. The data are hosted locally with individual participants to protect privacy and confidentiality and are aggregated through common data formats and analytical modules. This complicates the data analysis somewhat, but with flexible design and proper stratification, such combined data can answer a great number of patient safety questions efficiently.

Some administrative data sources that are not concerned with patient safety events can be of great value to patient safety research. Data collected from providers for statistics, membership management, or licensing purposes can be merged with patient encounter data capable of identifying patient safety events. The American Hospital Association’s Annual Survey, for example, contains hospital-specific data on approximately 6,500 hospitals and 400-plus systems, including as many as 1,000 data fields covering organizational structure, personnel, hospital facilities and services, and financial performance. By linking these data with data on personal healthcare encounters, researchers can study a variety of hospital-level factors in relation to patient safety events. The American Medical Association maintains a suite of membership data, including the Physician Masterfile that contains extensive personal and practice-related data for more than 1.4 million physicians, residents, and medical students in the United States. By
linking this file with other data, researchers are able to examine physician-related factors in relation to patient safety events. Other types of organizations, such as nursing homes, home care agencies, hospices, and primary care practices, all maintain similar membership data, and, in theory, all can be linked to amplify patient safety research.

Population census data and geopolitical data can make similar contributions to patient safety research. Population surveys can provide denominator information such as total population and subpopulations by age, racial, economic, and other categories. The Area Resource File, compiled by the US government, contains information on health facilities, health professions, measures of resource, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. By linking this file with other patient safety data through geographic codes, researchers can explore geographic variation in patient safety events and related economic and geopolitical factors.

Access to many of the above data sources can be challenging, but the challenges are fewer and less restrictive than those of other data-gathering efforts. Government-owned data are usually available following straightforward processes. Data owned by private organizations can be obtained in many ways, including through collaboration with the data owners or with researchers who work closely with the data owners.

Patient Safety Research Using Administrative Data: General Framework, Methods, and Tools

Because administrative data are not collected or compiled following an a priori study design, efforts in choosing appropriate methods and in presenting the results in light of inherent limitations of various data sources are of great importance in generating valid information and knowledge on patient safety questions. This section offers a brief review of the general framework, methods, and tools for patient safety research using administrative data.

General Framework for Administrative Data-Based Patient Safety Research

Generally speaking, there are two types of research: estimation and hypothesis testing. Since patient safety research is a relatively new field, most published studies since the landmark 1999 IOM report have been about estimating prevalence and incidence of patient safety events and distributions by categories, settings, causes, and circumstances. It is well recognized that each administrative data source has an inherent population, such as Medicare beneficiaries from Medicare claims, which is further refined by exclusion and inclusion criteria defined by the patient safety screening algorithms employed. The focus for a robust estimation study is to correctly identify the numerators (i.e., patient safety events) and the denominators (i.e., the underlying population at risk for the patient safety events), a seemingly straightforward but in reality rather tenuous process.

To test hypotheses, administrative data-based patient safety research usually follows the general framework of regression analysis in epidemiology, in which the occurrence of a patient safety event Y is related to possible causes being examined or interventions evaluated X and confounding factors Z. Within this framework two types of questions can be addressed. The first type of question is why a patient safety event occurs, and the second type of question is what the consequences of such an event are.

In answering both questions, the most critical task is to build an analytical dataset out of one or more administrative data sources for a specific patient safety research question. This step involves the correct identification and measurement of X, Y, and Z in the context of study cohorts of selected study subjects and time-stamp data, matching the data sources (e.g., who is in the dataset and what X, Y, and Z can be correctly measured and time-stamped) and the research questions to be answered. The second step is relatively easier,
using established statistical models or more advanced data-mining techniques to estimate the relational parameters in the equation. The third step, interpreting the results and making valid inferences in the full acknowledgment of data limitations, also demands great attention.

Methodological Considerations

Identification of Patients with Patient Safety Events

The previous section went through the list of potential administrative data sources and potential patient safety measures these data sources may offer. It is clear that the usefulness of an administrative data source in patient safety research depends, first of all, on the ability of the data source to correctly identify patient safety events. The validity of derived patient safety measures depends on carefully designed and validated indicators, screening algorithms, or triggers. Therefore, with the exception of medical error reports and malpractice claims where each record is, by definition, a patient safety event, a robust patient safety research project starts with the most critical task of screening, determining, and ascertaining patient safety events. This is a process of science, rooted in the researchers’ understanding of the relevant medical knowledge, the data-generating process, the structure of the specific databases, and the specific purposes of the relevant research. It is also an art since there is usually no set formula for health services researchers to follow in completing this first step.

In general, the validity of an administrative data-based patient safety measure can be evaluated by specificity and sensitivity, with medical record review serving most often as the gold standard. Here, specificity is measured by the positive predictive value (PPV), which is the proportion of patients flagged in the administrative data as having patient safety events who actually had such events, as confirmed by medical record review or other ascertaining methods. Sensitivity is the proportion of the patients with patient safety events who are actually flagged in the administrative data. Table 3 shows the calculation.

Zhan and his colleagues (2009) demonstrated the complexity of this issue in a study that attempted to determine the validity of identifying hospital-acquired catheter-associated urinary tract infections (CAUTIs) from Medicare claims, using medical record review as the gold standard. They found that ICD-9-CM procedure codes for urinary catheterization appeared in only 1.4 % of Medicare claims for patients who had urinary catheters. As a result, using Medicare claims to screen for UTIs cannot be limited to claims that have a procedure code for urinary catheterization. Using major surgery as the denominator, Medicare claims had a PPV of 30 % and sensitivity of 65 % in identifying hospital-acquired CAUTIs. Because 80 % of the secondary diagnosis codes indicating UTIs were present on admission (POA), adding POA indicators in the screening algorithm would increase the PPV to 86 % and sensitivity to 79 % in identifying hospital-acquired CAUTIs. This study indicates that the screening algorithm based on the selected ICD-9-CM codes and POA code and confined to major surgery patients is a valid way to identify patients with hospital-acquired CAUTIs in Medicare claims data. Claims from private insurance do not currently contain POA codes and, therefore, are not suitable for research aimed at estimating CAUTI prevalence or hypothesis testing due to the 70 % false-positive rate.
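The role of the POA indicator in a claims-based screen of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the two ICD-9-CM codes shown (599.0 and 996.64) are examples chosen here and are not the study's code list, the claim layout (a list of code and POA-flag pairs plus a major surgery flag) is hypothetical, and a POA flag of "N" is taken to mean that the condition was not present on admission.

```python
from typing import List, Tuple

# Illustrative codes only (assumption): 599.0 urinary tract infection, site
# not specified; 996.64 infection due to indwelling urinary catheter.
UTI_CODES = {"599.0", "996.64"}


def flag_hospital_acquired_uti(secondary_dx: List[Tuple[str, str]],
                               major_surgery: bool,
                               use_poa: bool = True) -> bool:
    """Screen one claim for a possible hospital-acquired UTI.

    secondary_dx holds (icd9_code, poa_flag) pairs for secondary diagnoses.
    The screen is confined to major surgery patients. With use_poa=False,
    any UTI code flags the claim, as when POA indicators are unavailable.
    """
    if not major_surgery:
        return False
    for code, poa_flag in secondary_dx:
        if code in UTI_CODES and (not use_poa or poa_flag == "N"):
            return True
    return False


# A UTI coded as present on admission is flagged only when POA is ignored.
claim = [("599.0", "Y"), ("401.9", "Y")]
print(flag_hospital_acquired_uti(claim, major_surgery=True, use_poa=False))
print(flag_hospital_acquired_uti(claim, major_surgery=True, use_poa=True))
```

Ignoring the POA flag flags every coded UTI, including infections that predate the stay, which is the mechanism behind the low PPV reported for screening without POA indicators.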
Table 3 Calculation of specificity and sensitivity of a patient safety measure based on administrative data, using medical record review as the gold standard
Administrative data screening | Medical record review: with patient safety event | Medical record review: without patient safety event
With patient safety event | True positive (TP) | False positive (FP)
Without patient safety event | False negative (FN) | True negative (TN)
Validity calculation: PPV = TP/(TP + FP); Sensitivity = TP/(TP + FN)
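As a worked illustration of these two measures, the short Python sketch below follows the definitions given in the text: PPV is the share of flagged patients who truly had the event, and sensitivity is the share of true events that were flagged. The function names and the counts are hypothetical and are not taken from any study.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: true positives among all flagged records."""
    flagged = tp + fp
    if flagged == 0:
        raise ValueError("no records were flagged by the screen")
    return tp / flagged


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity: true positives among all records with a confirmed event."""
    events = tp + fn
    if events == 0:
        raise ValueError("no confirmed events in the validation sample")
    return tp / events


# Hypothetical validation sample: 100 flagged claims, of which chart review
# confirms 30, and 16 confirmed events that the screen missed.
tp, fp, fn = 30, 70, 16
print(f"PPV = {ppv(tp, fp):.2f}")
print(f"Sensitivity = {sensitivity(tp, fn):.2f}")
print(f"False positives among flagged = {1 - ppv(tp, fp):.2f}")
```

With these counts, 70 of every 100 flagged claims are false positives, which is the complement of the PPV.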
Because medical record review is labor intensive and expensive, researchers often cannot validate the screening algorithms they use and have to rely on what has been reported in the literature. In many cases, validity data are entirely unavailable. Nonetheless, researchers need to have a clear understanding of the specificity and sensitivity of the case identification algorithms they use, based on relevant literature, context analysis, or experience; decide whether the patient safety measures are valid enough for their research purposes; and discuss their results in light of these limitations.

Construction of Analytical Dataset

Only with confidence that patient safety events can be identified with an acceptable level of specificity and sensitivity from an administrative data source should a researcher proceed to construct an analytical dataset. As discussed earlier, most administrative data contain measures of basic personal information, medical conditions, diagnosis, treatment, and disposition, and the administrative data can be expanded by linking to other data sources on patients (e.g., National Death Index), providers (e.g., AHA hospital surveys), local socioeconomic data (e.g., Area Resource Files), and so on (e.g., census population statistics), to form analytical files. From these extended datasets, arrays of variables of interest, such as dependent variables, explanatory variables, or confounding controls, can be constructed, including:

• Patient characteristics: age, sex, insurance coverage, etc.
• Medical conditions and diagnoses: primary diagnosis, secondary diagnoses, comorbidities, etc.
• Treatment or utilization: medical and surgical procedures, medications, outpatient visits, etc.
• Patient outcomes: disposition (including death), length of stay, charges or payments, complications, etc.
• Provider characteristics: ownership, practice size and composition, financial status, etc.
• Area characteristics: population statistics, market competitiveness, managed care market share, etc.

These variables support a wide range of cross-sectional analyses and, when the variables are time-stamped, longitudinal studies. Many claims databases, such as Medicare claims, allow researchers to build the complete profile of a patient’s healthcare experiences from multiple settings (e.g., inpatient, outpatient, pharmacy) over multiple years. Researchers can identify not only cases of patient safety events and controls but also cohorts to follow retrospectively over time, greatly expanding the capacity of any single administrative data source.

Besides identifying administrative databases with variables of interest, one crucial consideration in analytical data construction is the linkage of multiple data sources. The simplest kind of record linkage is through a unique identification number, such as a social security number, or multiple variables that accurately identify a person, such as name, age, date of birth, gender, address, phone number, and so on. This method is called deterministic or rules-based record linkage. Sometimes, a personal identifier is combined with some personal demographic data in databases with missing data or errors in the identifier. Administrative data sources often do not contain or share common identifiers, and in such cases a newer method called probabilistic record linkage can be used. Probabilistic record linkage takes into account a wider range of potential identifiers, computes weights for each identifier based on its estimated ability to correctly identify a match or a non-match, and uses these weights to calculate the probability that two given records refer to the same entity. Record pairs with probabilities above a certain threshold are considered to be matches, while pairs with probabilities below another threshold are considered to be non-matches; pairs that fall between these two thresholds are considered to be “possible matches” and can be dealt with accordingly (e.g., human reviewed, linked, or not linked, depending on the requirements).

Data Analysis

For most patient safety studies using administrative data, the methods are simple and straightforward; the common statistical methods for observational studies, such as logistic regressions
with the dichotomous variable of having a patient safety event or not as the dependent variable and ordinary least-squares regression with a continuous outcome as the dependent variable, apply. As with observational studies, administrative data-based patient safety research can fall into the following broad categories:

• Cross-sectional study, involving studying a population at one specific point in time
• Case-control study, in which two existing groups differing in outcome are identified and compared on the basis of some hypothesized causal attribute
• Longitudinal study, involving repeated observations of the same variables over long periods of time
• Cohort study, a particular form of longitudinal study where a group of patients is closely monitored over a span of time

However, administrative data-based patient safety research is unique in many ways. First, the number of observations is substantially larger than in studies of experimental design or studies involving primary data collection. Second, because patient safety events are, by definition, unintended or unexpected, the cases of interest (i.e., patient safety events) are usually few in number and low in rate. The standard approaches to causal inference or risk adjustment easily produce statistically significant findings that are small in magnitude and clinically meaningless. Third, the cases of interest are identified with a certain level of uncertainty or misclassification error, as discussed earlier. These particulars should be borne in mind when devising analytical approaches.

The following general methods have been used in administrative data-based patient safety research:

• Matching: matching is a conceptually straightforward strategy, whereby confounders are identified and patients among the cases (e.g., those with patient safety events) are matched to the controls (e.g., those without safety events) on the basis of these factors so that, in the end, the case group and control group are “the same” with regard to these factors. Matching can be done on either a one-to-one or a one-to-many basis, and patients can be matched with respect to a single confounder or multiple confounders. This method is particularly applicable to administrative data-based patient safety research because patients with safety events are few and potential controls are many; therefore, it is relatively easy to find one or more matching controls for each case.
• Stratification: once a confounding variable is identified, the cohort is grouped by levels of this factor. The analysis is then performed on each subgroup within which the factor remains constant, thereby removing the confounding potential of that factor.
• Multivariable regression: regression analysis, the most commonly used analytical technique, is based on modeling the mathematical relationships between two or more variables in observed data. In the context of administrative data-based patient safety research, there are four types of outcome measures. The first type is a binary outcome, such as surgical site infections complicating total hip replacement, where multivariable logistic regression is the proper method to identify factors associated with the infections. The second type is a continuous outcome, such as functional status or costs, where multivariable linear regression is applicable to study the influence of various predictors of the outcomes. The third type is an incidence rate, such as nosocomial infection rates at individual hospitals, where Poisson regression may be the best method to identify hospital-level factors that predict higher or lower nosocomial infection rates. The fourth type is a time-to-event outcome, such as reoperation following initial operation, where a Cox proportional hazards model may be most appropriate to study risk factors.
• Propensity score analysis: propensity score analysis entails two steps. In the first step, it summarizes multiple confounding variables into a probability or “propensity” of having a patient safety event or falling into an intervention group, usually generated by a logistic
regression model, with the propensity score ranging from 0 to 1. In the second step, the propensity score is used for matching, for stratified analysis, or as a covariate in multivariable regression to estimate the impact of a patient safety event or an intervention.
• Instrumental variable analysis: the instrumental variable approach is a method for confounding control that has been used by economists for decades but has only recently been implemented in health services research. The basic idea is that if a variable (the instrumental variable) can be identified that has the ability to cause variation in the treatment of interest but has no impact on the outcome (other than through its direct influence on treatment), then the variable can be used as an instrument in the regression analysis to control for unobserved or unobservable confounding variables on the outcome variable.
• Data-mining methodologies: data mining refers to an analytic process designed to explore data (usually large amounts of data, known as “big data”) in search of consistent patterns and systematic relationships between variables and then to validate the findings by applying the detected patterns to new subsets of data. One example of data-mining methods used in administrative data-based patient safety research is called disproportionality analysis, which uses algorithms that calculate observed-to-expected ratios. For example, to find the link between a drug and a suspected adverse event, researchers can compare each potential drug-adverse event pair to the background across all other drugs and events in the database and flag those pairs with disproportional ratios for further causal investigation. Unsupervised machine learning is another example, encompassing many data-mining methods purported to discover meaningful relationships between variables in large databases.
• Contextual analysis: some administrative data sources contain extensive narrative data. Screening text data for information on patient safety events is costly and, sometimes, unproductive even with advanced NLP techniques. By cascading steps through coded data, researchers can narrow down the text data and read selected text narratives to gain valuable insights. For example, in their analysis of warfarin-related medication errors, Zhan et al. (2008) found that one hospital reported dispensing errors four times higher than average, two thirds of the errors occurred in the hospital’s pharmacy department, and 65 % of the errors were caused by inaccurate/omitted transcriptions. The textual descriptions in these reports clearly revealed the difficulties the pharmacists were having with the hospital’s new medication administration record system, therefore pinpointing the fix.

In summary, all methods for observational studies in epidemiology, sociology, and economics are applicable to administrative data-based patient safety research. Health services researchers should consult textbooks in these fields and also follow the advancement of methodologies in data mining, pattern recognition, and machine learning that are being developed and increasingly applied to extract information and knowledge from “big data” in the Information Age.

Interpreting the Results

The results from administrative data-based patient safety research must be interpreted in light of the limitations implicit both in the data and in the methods. First of all, the specificity and sensitivity of the methods or algorithms used to screen or identify patient safety events must be adequately explained, and the potential bias due to misclassification of cases needs to be discussed. Similar measurement errors may also occur in other important variables derived from administrative data, and similar discussions need to be made.

Second, administrative data-based patient safety research shares the same flaws that all observational studies have. Regardless of what methods are used, there is always the possibility that confounding remains in the results, due to a wide range of possible causes, from unobserved or missed confounders to measurement errors and mis-specifications of analytical models.
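
The misclassification concern raised above can be made concrete. The sketch below shows how imperfect sensitivity and specificity distort an observed event rate, and how low the positive predictive value of a screening algorithm can be when events are rare. The formulas are standard; the function names and the input values (a true rate of 1 %, sensitivity of 0.96, specificity of 0.91) are illustrative assumptions, not results taken from this chapter.

```python
def observed_rate(true_rate: float, sensitivity: float, specificity: float) -> float:
    """Share of records flagged by the algorithm (true plus false positives)."""
    true_pos = sensitivity * true_rate
    false_pos = (1.0 - specificity) * (1.0 - true_rate)
    return true_pos + false_pos


def positive_predictive_value(true_rate: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that a flagged record contains a real event."""
    flagged = observed_rate(true_rate, sensitivity, specificity)
    if flagged <= 0:
        return float("nan")
    return sensitivity * true_rate / flagged


if __name__ == "__main__":
    # Assumed, illustrative inputs: rare event, good but imperfect algorithm.
    p, se, sp = 0.01, 0.96, 0.91
    print(f"observed (flagged) rate: {observed_rate(p, se, sp):.4f}")
    print(f"positive predictive value: {positive_predictive_value(p, se, sp):.4f}")
```

With these assumed inputs, almost 10 % of records are flagged although only 1 % contain a true event, so roughly nine in ten flagged records would be false positives. This is one reason flagged cases are usually confirmed by record review before conclusions are drawn.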
Furthermore, multiple other criteria are required to establish causation. For example, multivariable adjustment cannot establish causation unless factors such as appropriate temporal ordering of predictors and outcome are ensured. Finally, health services researchers must completely report how the analyses were undertaken. From choice of confounders to the statistical procedure used, adequate information should be provided so that an independent analyst can reliably reproduce the reported results.

AHRQ Patient Safety Indicators: An Exemplary Tool for Administrative Data-Based Patient Safety Research

The AHRQ patient safety indicators (AHRQ PSIs) are one of the most popular measurement tools for screening patient safety events in administrative data (AHRQ 2014). Developed in the United States in the context of claims data using the ICD-9-CM coding system, this toolkit has been adopted worldwide. A case study of AHRQ PSIs serves to illustrate the general process, the potentials, the challenges, and the limitations of administrative data-based patient safety research.

AHRQ PSIs started with Iezzoni and colleagues’ 1994 complication screening program (CSP) that relied on ICD-9-CM codes in claims data to identify 27 potentially preventable inhospital complications, such as postoperative pneumonia, hemorrhage, medication incidents, and wound infection. In the mid-1990s, AHRQ broadened the CSP to include a set of administrative data-based quality indicators, including several measures of avoidable adverse events and complications. Realizing the potential value of administrative data-based measures in identifying patient safety events, AHRQ contracted with the Evidence-based Practice Center at the University of California, San Francisco, and Stanford University to further expand, test, and refine these measures as well as improve the evidence behind their use with extensive literature reviews and broad clinical consensus panels. The research team developed AHRQ PSIs through a five-step process (Romano et al. 2003). First, they reviewed literature to develop a list of candidate indicators and collected information about their performance. Second, they formed several panels of clinician experts to solicit their judgment of clinical sensibility and their suggestions for revisions to the candidate indicators. Third, they consulted ICD-9-CM coding experts to ensure that the definition of each indicator reflects the intended clinical situation. Fourth, they conducted empirical analysis of the promising indicators using HCUP data. Last, they produced the software and documentation for public release by AHRQ.

Since their inception, AHRQ PSIs have been constantly validated and updated. The latest PSIs (AHRQ 2014) include 23 indicators and one composite indicator with reasonable face and construct validity, specificity, and potential for fostering quality improvement. Most of the indicators, listed below, use per 1,000 discharges as the denominator; some are designed to capture event rates within a community:

PSI 02 Death Rate in Low-Mortality Diagnosis Related Groups (DRGs)
PSI 03 Pressure Ulcer Rate
PSI 04 Death Rate among Surgical Inpatients with Serious Treatable Conditions
PSI 05 Retained Surgical Item or Unretrieved Device Fragment Count
PSI 06 Iatrogenic Pneumothorax Rate
PSI 07 Central Venous Catheter-Related Blood Stream Infection Rate
PSI 08 Postoperative Hip Fracture Rate
PSI 09 Perioperative Hemorrhage or Hematoma Rate
PSI 10 Postoperative Physiologic and Metabolic Derangement Rate
PSI 11 Postoperative Respiratory Failure Rate
PSI 12 Perioperative Pulmonary Embolism or Deep Vein Thrombosis Rate
PSI 13 Postoperative Sepsis Rate
PSI 14 Postoperative Wound Dehiscence Rate
PSI 15 Accidental Puncture or Laceration Rate
PSI 16 Transfusion Reaction Count
PSI 19 Obstetric Trauma Rate-Vaginal Delivery Without Instrument
PSI 21 Retained Surgical Item or Unretrieved Device Fragment Rate
Table 4 describes, as an example, the definition of the numerator, denominator, and key exclusions for PSI #13, postoperative sepsis.

AHRQ created software that implements evidence-based and consensus-approved algorithms; calculates raw rates, risk-adjusted rates that reflect the US hospitalized population in age, sex, DRGs, and comorbidities; and estimates smoothed rates that dampen random fluctuations over time. Thirty comorbidity categories are automatically generated by the software and used as risk adjusters along with variables available in
Table 4 Claims-based screening for patient safety events: AHRQ PSI #13, postoperative sepsis

Description    Screening algorithm

Numerator      Discharges, among cases meeting the inclusion and exclusion rules for the denominator, with any secondary ICD-9-CM diagnosis codes for sepsis.
               ICD-9-CM sepsis diagnosis codes:
                 0380   STREPTOCOCCAL SEPTICEMIA
                 0381   STAPHYLOCOCCAL SEPTICEMIA
                 03810  STAPHYLOCOCC SEPTICEM NOS
                 03811  METH SUSC STAPH AUR SEPT
                 03812  MRSA SEPTICEMIA
                 03819  STAPHYLOCC SEPTICEM NEC
                 0382   PNEUMOCOCCAL SEPTICEMIA
                 0383   ANAEROBIC SEPTICEMIA
                 78552  SEPTIC SHOCK
                 78559  SHOCK W/O TRAUMA NEC
                 9980   POSTOPERATIVE SHOCK
                 99800  POSTOPERATIVE SHOCK, NOS
                 99802  POSTOP SHOCK,SEPTIC
                 03840  GRAM-NEGATIVE SEPTICEMIA NOS
                 03841  H. INFLUENAE SEPTICEMIA
                 03842  E COLI SEPTICEMIA
                 03843  PSEUDOMONAS SEPTICEMIA
                 03844  SERRATIA SEPTICEMIA
                 03849  GRAM-NEG SEPTICEMIA NEC
                 0388   SEPTICEMIA NEC
                 0389   SEPTICEMIA NOS
                 99591  SEPSIS
                 99592  SEVERE SEPSIS

Denominator    Elective surgical discharges, for patients ages 18 years and older, with any-listed ICD-9-CM procedure codes for an operating room procedure. Elective surgical discharges are defined by specific DRG or MS-DRG codes with admission type recorded as elective (SID ATYPE=3)
               Exclude cases:
                 With a principal ICD-9-CM diagnosis code (or secondary diagnosis present on admission) for sepsis (see above)
                 With a principal ICD-9-CM diagnosis code (or secondary diagnosis present on admission) for infection
                 With any-listed ICD-9-CM diagnosis codes or any-listed ICD-9-CM procedure codes for immunocompromised state
                 With any-listed ICD-9-CM diagnosis codes for cancer
                 With length of stay of less than 4 days
                 MDC 14 (pregnancy, childbirth, and puerperium)
                 With missing gender (SEX=missing), age (AGE=missing), quarter (DQTR=missing), year (YEAR=missing), or principal diagnosis (DX1=missing)
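
The screening logic in Table 4 can be sketched in a few lines of Python. This is a simplified illustration, not the AHRQ software: the `Discharge` record and its field names are hypothetical, the infection, cancer, and immunocompromised-state code lists (which Table 4 does not reproduce) are stood in for by boolean flags, and the missing-data exclusions are reduced to a single flag.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Sepsis codes as listed in Table 4 (ICD-9-CM, decimal points omitted).
SEPSIS_CODES = frozenset({
    "0380", "0381", "03810", "03811", "03812", "03819", "0382", "0383",
    "78552", "78559", "9980", "99800", "99802",
    "03840", "03841", "03842", "03843", "03844", "03849",
    "0388", "0389", "99591", "99592",
})


@dataclass
class Discharge:
    """Hypothetical, simplified discharge abstract (field names are assumptions)."""
    age: int
    elective: bool        # admission type recorded as elective
    or_procedure: bool    # any operating room procedure code present
    principal_dx: str
    # (code, present_on_admission) pairs for the secondary diagnoses
    secondary_dx: List[Tuple[str, bool]] = field(default_factory=list)
    length_of_stay: int = 0
    mdc: int = 0
    infection_on_admission: bool = False  # stand-in for the infection code list
    immunocompromised: bool = False       # stand-in for the code lists
    cancer: bool = False                  # stand-in for the cancer code list
    missing_key_fields: bool = False      # sex, age, quarter, year, or DX1 missing


def in_denominator(d: Discharge) -> bool:
    """Elective surgical adult discharges that survive the Table 4 exclusions."""
    if not (d.elective and d.or_procedure and d.age >= 18):
        return False
    sepsis_on_admission = d.principal_dx in SEPSIS_CODES or any(
        code in SEPSIS_CODES and poa for code, poa in d.secondary_dx
    )
    excluded = (
        sepsis_on_admission
        or d.infection_on_admission
        or d.immunocompromised
        or d.cancer
        or d.length_of_stay < 4
        or d.mdc == 14
        or d.missing_key_fields
    )
    return not excluded


def in_numerator(d: Discharge) -> bool:
    """Denominator cases with any secondary diagnosis code for sepsis."""
    return in_denominator(d) and any(
        code in SEPSIS_CODES for code, _ in d.secondary_dx
    )


def rate_per_1000(discharges: List[Discharge]) -> float:
    """Observed (unadjusted) rate per 1,000 eligible discharges."""
    denom = sum(in_denominator(d) for d in discharges)
    numer = sum(in_numerator(d) for d in discharges)
    return 1000.0 * numer / denom if denom else float("nan")
```

Calling `rate_per_1000` on a list of `Discharge` records gives the observed rate only; risk adjustment and smoothing, which the AHRQ software also performs, are outside this sketch.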
most administrative data systems. The PSI website also provides software (in Windows and SAS), benchmark tables, and risk-adjustment data for individual hospitals, hospital systems, health plans, states, and other interested parties to calculate their own risk-adjusted rates and compare them with national benchmarks. Researchers can download the documentation and software for free (AHRQ 2014).

The specificity and sensitivity of these indicators have been evaluated, accounting for a substantial portion of published literature on AHRQ PSIs. It appears that the validity of AHRQ PSIs varies substantially from indicator to indicator, depending also on the data sources and gold standards used.

Broadly speaking, AHRQ PSIs have been used for:

• Internal hospital quality improvement: individual hospitals use them as a case finding trigger, to do root cause analyses, to identify clusters of potential safety lapses, to evaluate impact of local interventions, and to monitor performance over time.
• External hospital accountability to the community: local government, health systems, and insurance carriers such as Blue Cross/Blue Shield of Illinois produce hospital profiles to support consumers.
• National, state, and regional analyses: governments and researchers have used them to produce aggregate statistics (e.g., for AHRQ’s National Healthcare Quality/Disparities Reports), for surveillance of trends over time, and for assessing disparities across areas, socioeconomic strata, ethnicities, and so on.
• Testing research hypotheses related to patient safety: researchers have used the PSIs to test various hypotheses on patient safety risk factors, such as those that support house staff work hours reform and nurse staffing regulation.
• Public reporting by hospital: several states (e.g., Texas, New York, Colorado, Oregon, Massachusetts, Wisconsin, Florida, and Utah) include AHRQ PSI measures in their public reporting of hospital quality.
• Pay-for-performance by hospital: some reform initiatives, such as CMS/Premier Demonstration, include AHRQ PSI measures in pay-for-performance determination.

AHRQ PSIs continue to evolve. Besides periodic refinements, one development hinges on the addition of time stamps on diagnosis codes (i.e., present-on-admission code) in claims or discharge abstracts. This code helps to separate hospital-acquired adverse events (i.e., events that occurred after admission) from comorbidities (i.e., conditions present on admission). Another development is to include basic clinical data, such as lab data, to improve risk adjustments, recognizing that such data exist alongside administrative data in many healthcare systems. The third direction is the conversion of ICD-9-CM based AHRQ PSIs to ICD-10, which most European countries use, with country-specific modifications (e.g., ICD-10-AM for Australian modification and ICD-10-GM for German modification).

These improvements, combined with advancements in administrative databases and computing technologies, will make AHRQ PSIs more useful in patient safety research in the future.

Patient Safety Research Using Administrative Data: Potentials and Limitations

Administrative data-based patient safety research started with a very simple expectation: to flag the infrequent cases with potential patient safety concerns in the large volume of claims in order to guide further, in-depth investigation. As administrative data sources became more available and screening algorithms improved, researchers began to produce a variety of estimates and statistics and test various hypotheses related to patient safety. More recently, attempts are being made to create safety performance reports from administrative data for individual providers or healthcare systems, study variations across regions, and track progress over time. Previous sections have touched on many examples of such work. This section offers a more
detailed review of the types of patient safety studies, with examples, that administrative data can support and their limitations.

Screen Patient Safety Events for In-depth Examination

First and foremost, AHRQ PSIs, the global trigger tools, and most screening algorithms are considered indicators, not definitive measures, of patient safety concerns. These indicators are proposed to screen claims data for adverse events and to guide subsequent medical record reviews to determine whether safety concerns exist. AHRQ PSIs, for example, enable institutions to quickly and easily identify a manageable number of medical records for closer scrutiny. Ackroyd-Stolarz et al. (2014) developed an algorithm to screen the discharge abstract database of a Nova Scotia hospital for fall-related injuries. They compared cases identified in administrative data against cases identified in structured medical record review, finding that administrative data could identify fall-related injuries with sensitivity of 96 % and specificity of 91 %. Their work provided the hospital with a powerful tool to locate records for patients with fall-related injuries, explore causes, and search for solutions to the problem.

Screening cases of patient safety concerns is especially advantageous when the targeted events are rare. For example, it is not likely that one hospital provides enough data to study patterns, causes, or circumstances of foreign objects left in during surgery, because the events occur in fewer than 1 in 10,000 surgeries (Zhan and Miller 2003). Screening claims with AHRQ PSIs could quickly identify such rare events, and associated medical records could be obtained and abstracted for in-depth analysis. This two-step approach is particularly useful for individual providers or health systems in their search for localized safety lapses and improvement strategies.

Epidemiological Study

A large proportion of administrative data-based patient safety research is aimed at discovering the epidemiology of patient safety events, categorizing the events, assessing the prevalence, and understanding the causes and impacts, following the general framework and methodologies discussed earlier.

Prevalence of Patient Safety Events

Because administrative data cover large populations, they are often the only available data sources to estimate national or state rates of patient safety events. The National Healthcare Quality Reports (AHRQ 2013), released annually, include, for example, the rate of postoperative sepsis based on nationwide inpatient claims and the rates of ambulatory care visits due to adverse events based on the National Ambulatory Medical Care Survey and the National Hospital Ambulatory Medical Care Survey. The Medicare Current Beneficiary Survey, the Medical Expenditure Panel Survey, the National Ambulatory Medical Care Survey, and the National Hospital Ambulatory Medical Care Survey have been used to examine the prevalence of inappropriate use of medications in the United States (e.g., Zhan et al. 2001).

Similar studies on the prevalence of patient safety events are numerous in medical literature, covering all settings of care and types of problems. A more recent example is a study conducted by Owens et al. (2014). By examining claims of hospitalizations and ambulatory surgical visits for infections following ambulatory surgery, the authors were able to estimate the incidence of surgical site infections after ambulatory surgery procedures, highlighting safety concerns in the fast-growing outpatient surgery centers in the United States.

Causes of Patient Safety Events

Many administrative data-based studies address the causes and circumstances of patient safety events. Gandhi et al. (2006) intended to find out how missed and delayed diagnoses in the outpatient setting led to patient injuries. For their purpose, the authors chose closed malpractice claims from four malpractice insurance companies. They selected 181 claims where patients sued doctors for injuries stemming from
diagnosis errors and had a team of doctors review the closed documents, including statement of claims, depositions, interrogatories, reports of internal investigations, root cause analyses, expert opinions on both sides of the litigations, medical records, and other documents in the closed file to determine what kind of errors happened and what the possible causes were. They found that failure to order appropriate diagnostic tests, failure to create a proper follow-up plan, and failure to obtain adequate history or perform adequate physical examination (55 %, 45 %, and 42 %, respectively) were the leading types of diagnosis errors that resulted in the malpractice cases.

Zhan et al. (2008) examined warfarin-related medication errors voluntarily reported to the MEDMARX database. By tabulating and cross-tabulating coded variables in a cascading way and screening open-ended narratives in selected reports, the authors were able to construct a comprehensive understanding of errors in warfarin prescriptions and administration in hospitals and clinics. They found that, in outpatient settings, 50 % of errors in warfarin medication occurred in pharmacies and 50 % were intercepted by a pharmacist, indicating the critical role of pharmacists in helping patients with warfarin use.

Impact of Patient Safety Events

Once patient safety events are identified with an acceptable level of validity in administrative data, it is relatively easy to examine the impacts of the events on various patient and social outcomes identifiable in the data. Using AHRQ PSIs, Zhan and Miller (2003) screened nationwide hospital claims and estimated the impacts of the selected patient safety events on length of stay, charges, and mortality. The authors found that postoperative sepsis, for example, extended hospital stay by about 11 days, added $58,000 extra charges to the patients’ hospital bills, and increased the inhospital mortality rate by 22 %. In another study, Zhan et al. (2006) showed that when a case of postoperative sepsis occurred, Medicare actually paid $9,000 extra. Taking the two studies together, it is easy to see that, once a postoperative sepsis occurs, a hospital loses financially, establishing a case for collaboration among hospitals, payers, and patients or patient advocates to reduce postoperative sepsis. This type of study is common in health services research literature.

Interventions and Policies to Improve Safety

Administrative data have been used to evaluate many system-wide interventions aimed at improving patient safety. Many studies have been conducted in the United States, Canada, and the United Kingdom, for example, to evaluate how various levels of nurse staffing, different staffing models, and nursing hours affect patient safety, by linking safety estimates from hospital claims or abstracts to nurse staffing data from hospital surveys. Rafferty et al. (2007) did such a study using data from 30 English hospital trusts. They used data from three sources: hospital structure (e.g., size and teaching status) from hospital administrative databases; patient outcomes, specifically, patient mortality and failure to rescue, from hospital discharge abstracts; and data on nursing staffing and nurse job satisfaction from surveys of the participating hospitals. Their finding that higher patient-to-nurse ratios were associated with worse patient outcomes could help hospitals plan their nurse staffing.

A study by Dimick et al. (2013) is an example of how administrative data can be useful to evaluate national health policies. Starting in 2006, CMS has restricted coverage of bariatric surgery to hospitals designated as centers of excellence by two major professional organizations. The authors wanted to explore whether this coverage policy change improved patient safety as intended. It would be difficult to design a study based on primary data collection or data sources other than nationwide administrative data to evaluate this policy. Using claims from 12 states covering 2004–2009, Dimick et al. (2013) were able to estimate risk-adjusted rates of complications and reoperations of bariatric surgery before versus after the implementation of the national policy restricting coverage, finding that the policy has had no impact with regard to patient safety.
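
A before-versus-after comparison of risk-adjusted rates, as in the policy evaluation just described, can be sketched with indirect standardization: the observed number of events in each period is divided by the number expected from patient-level risks and scaled by a reference rate. This is one common approach and an assumption here; the text does not specify the model the authors used, and the predicted risks, event counts, and reference rate below are invented for illustration.

```python
from typing import Sequence


def risk_adjusted_rate(observed_events: int,
                       predicted_risks: Sequence[float],
                       reference_rate: float) -> float:
    """Indirectly standardized rate: (observed / expected) * reference rate."""
    expected_events = sum(predicted_risks)
    if expected_events <= 0:
        raise ValueError("expected number of events must be positive")
    return observed_events / expected_events * reference_rate


if __name__ == "__main__":
    reference = 0.05  # hypothetical overall complication rate
    # Hypothetical predicted risks from a risk model fitted elsewhere.
    before_risks = [0.04] * 500 + [0.08] * 500  # 1,000 patients, 60 expected events
    after_risks = [0.04] * 800 + [0.08] * 200   # 1,000 patients, 48 expected events
    before = risk_adjusted_rate(66, before_risks, reference)
    after = risk_adjusted_rate(53, after_risks, reference)
    print(f"risk-adjusted rate before: {before:.4f}, after: {after:.4f}")
```

In this made-up example the crude rate falls from 6.6 % to 5.3 %, yet the risk-adjusted rates are nearly identical (about 5.5 % in both periods) because the later period has a lower-risk case mix. A crude comparison alone would have suggested an improvement that the adjustment does not support.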
Public Reporting on Patient Safety Events

Using administrative data to measure patient safety of individual providers has been controversial. However, administrative data-based patient safety measures are increasingly used to profile hospitals and to support pay-for-performance programs. Ten of the AHRQ PSIs are endorsed by the National Quality Forum as valid measures for public reporting. Many US states have used these measures as components of their hospital quality reports. CMS annually calculates seven AHRQ PSI rates as parts of publicly reported outcome measures based on claims and administrative data, aimed at increasing the transparency of hospital care, providing information for consumers choosing care, and assisting hospitals in their quality improvement efforts.

There are many legitimate arguments against such use, such as coding differences across institutions, lack of specificity and sensitivity in the safety indicators, and lack of sufficient confounding adjustments, to list a few. These reasons raise doubts about whether differences between hospitals in administrative data-based patient safety event rates reflect true differences in patient safety. Because of these limitations, public reporting of such rates for institutions and regions may lead to contentions over technicalities rather than facilitate quality improvement. Developers of AHRQ PSIs and similar administrative data-based indicators in general have expressed caution with regard to the use of the indicators for public reporting at an institutional level. Health services researchers must exercise similar caution when using these measures as hospital performance measures in their research.

Advantages and Challenges in Administrative Data-Based Patient Safety Research

Table 5 summarizes the major advantages and limitations of administrative data-based patient safety research.

The greatest advantage is that the data already exist and are mostly computerized, and the research effort requires properly acquiring the data, creating valid screening algorithms, and conducting robust analysis. Administrative data usually cover large populations, allowing estimations at county, state, or national levels and comparisons across different subpopulations. Many administrative databases allow linkage of patient records from multiple settings and over time, and researchers can construct large retrospective
Table 5 Advantages and disadvantages of administrative data for patient safety research

Advantages of administrative data
• Already collected for administrative purposes and therefore no additional costs of collection (besides data acquisition and cleaning costs)
• Large coverage of population of interest allowing estimation and comparison at regional and national levels
• Collection process not intrusive to target population
• Regularly, continuously updated
• Mostly computerized
• Can be linked to form individual patient’s complete healthcare experiences
• Malpractice claims and spontaneous reports contain rich contextual data not available elsewhere

Disadvantages of administrative data
• Information collected is restricted to data required for administrative purposes
• Collection process does not follow any research design, protocol, or procedure; lack of researcher control over content
• Algorithms, triggers, or indicators with variable validity, subject to coding errors and coding variation across institutions
• Claims, abstracts, and surveys lack contextual, clinical information, while malpractice claims and spontaneous reports lack data on denominator or population at risk
• Results often statistically significant but clinically meaningless
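
The last disadvantage in Table 5 is easy to demonstrate. The sketch below applies a standard two-proportion z-test to two invented event rates that differ by only 0.5 per 1,000 discharges; the sample sizes and rates are illustrative assumptions, not data from this chapter.

```python
import math


def two_proportion_z_test(events_a: int, n_a: int, events_b: int, n_b: int):
    """Return (difference in proportions, z statistic, two-sided p-value)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
    return p_a - p_b, z, p_value


if __name__ == "__main__":
    # Hypothetical event rates of 1.05 % versus 1.00 % at two sample sizes.
    for n in (2_000, 2_000_000):
        diff, z, p = two_proportion_z_test(round(0.0105 * n), n,
                                           round(0.0100 * n), n)
        print(f"n per group={n:>9,}  diff={diff:.4f}  z={z:.2f}  p={p:.3g}")
```

With 2,000 discharges per group the difference is nowhere near statistical significance; with 2 million per group the same difference gives p < 0.001, although its clinical importance is unchanged.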
cohorts that mimic a prospective study design and test a wide range of hypotheses from risk factors to potential interventions.

The greatest limitation lies with the fact that the data were not collected with a research purpose, study protocol, or quality assurance procedure. Researchers have to creatively repurpose the data to meet their research needs and make great efforts in methodology design to minimize potential biases.

As discussed earlier, the most critical task of administrative data-based patient safety research is to design valid patient safety screening algorithms or indicators. Most of the indicators developed to date have relied on coded data in the administrative databases. Using ICD-9-CM codes as an example, many concerns exist. First, researchers can only find events for which there are corresponding ICD-9-CM codes. Second, there may be a substantial number of coding errors, due to misunderstanding of codes, or errors by physicians and coders, or miscommunications between them. Third, coding is very likely to be incomplete because of limited slots for coding secondary diagnoses and other reasons. Fourth, assignment of ICD-9-CM codes is variable because of the absence of precise clinical definitions and context. Last but not least, diagnoses are not dated in most administrative data systems, making it difficult to determine whether a secondary diagnosis occurs prior to admission (i.e., a comorbid disease) or during a hospitalization (i.e., a complication or medical error).

Administrative data have been repeatedly shown to have low sensitivity but fair specificity in identifying patient safety events. Focusing on specific adverse events for specific patient populations, as is built into the AHRQ PSIs, improves specificity appreciably. But, in most cases, researchers have to work with indicators that have modest validity in their research.

Lack of clinical details is another major limitation of most administrative data such as claims and discharge abstracts. Of special concern is the severity of illness that affects patient outcomes and conceivably affects the likelihood of patient safety events. Analyses of outcomes and risk factors associated with patient safety events are limited to variables available from administrative data. On the other hand, malpractice claims and spontaneous medical error reports contain extensive details on specific events, but the denominator populations (i.e., patients at risk for those reported events) are unknown, severely limiting the data’s ability to support estimation and hypothesis testing research.

There are also many analytical challenges. The sheer size of administrative data can give the illusion of great precision and power. Often the differences found are statistically significant but of little clinical meaning. Coupled with missing important confounding variables and difficulty in choosing correct statistical models that fit the data, clinically insignificant but statistically significant results could lead to biased inferences and erroneous conclusions. Health services researchers must bear in mind these limitations when designing their administrative data-based patient safety studies and must interpret the results with full acknowledgment of these limitations.

References

Ackroyd-Stolarz S, Bowles SK, Giffin L. Validating administrative data for the detection of adverse events in older hospitalized patients. Drug Healthc Patient Saf. 2014;13(6):101–8.
Agency for Healthcare Research and Quality (AHRQ). 2013 National healthcare quality report. http://www.ahrq.gov/research/findings/nhqrdr/nhqr13/2013nhqr.pdf. Accessed 1 Sept 2014.
Agency for Healthcare Research and Quality (AHRQ). Patient safety indicators. http://www.qualityindicators.ahrq.gov/Modules/psi_resources.aspx. Accessed 1 Sept 2014.
Budnitz DS, Lovegrove MC, Shehab N, et al. Emergency hospitalizations for adverse drug events in older Americans. N Engl J Med. 2011;365(21):2002–12.
Centers for Medicare and Medicaid Services (CMS). CMS data navigator. http://www.cms.gov/Research-Statistics-Data-and-Systems/Research-Statistics-Data-and-Systems.html. Accessed 1 Sept 2014.
Dimick JB, Nicholas LH, Ryan AM, et al. Bariatric surgery complications before vs after implementation of a national policy restricting coverage to centers of excellence. JAMA. 2013;309(8):792–9.
Gandhi T, Kachalia A, Thomas E, et al. Missed and delayed diagnoses in the ambulatory setting: a study of closed malpractice claims. Ann Intern Med. 2006;145:488–96.
Griffin FA, Resar RK. IHI global trigger tool for measuring adverse events (Second Edition). IHI innovation series white paper. Cambridge, MA: Institute for Healthcare Improvement; 2009.
Healthcare Cost and Utilization Project (HCUP). http://www.hcup-us.ahrq.gov/. Accessed 1 Sept 2014.
Iezzoni LI, Daley J, Heeren T, et al. Using administrative data to screen hospitals for high complication rates. Inquiry. 1994;31(1):40–55.
Kohn LT, Corrigan JM, Donaldson M, et al. To err is human: building a safer health system. Washington, DC: Institute of Medicine; 1999.
Owens PL, Barrett ML, Raetzman S, et al. Surgical site infections following ambulatory surgery procedures. JAMA. 2014;311(7):709–16.
Quan H, Drösler S, Sundararajan V, et al. Adaptation of AHRQ patient safety indicators for use in ICD-10 administrative data by an international consortium. In: Henriksen K, Battles JB, Keyes MA, et al., editors. Advances in patient safety: new directions and alternative approaches. Rockville: Agency for Healthcare Research and Quality; 2008.
Rafferty AM, Clarke SP, Coles J, et al. Outcomes of variation in hospital nurse staffing in English hospitals: cross-sectional analysis of survey data and discharge records. Int J Nurs Stud. 2007;44(2):175–82.
Romano PS, Geppert J, Davies S, et al. A national profile of patient safety in US hospitals based on administrative data. Health Aff. 2003;22(2):154–66.
Wang X, Hripcsak G, Markatou M, et al. Active computerized pharmacovigilance using natural language processing, statistics and electronic health records: a feasibility study. JAMIA. 2009;16:328–37.
Zhan C, Miller M. Excess length of stay, costs, and mortality attributable to medical injuries during hospitalization: an administrative data-based analysis. JAMA. 2003;190(4):1868–74.
Zhan C, Sangl J, Bierman A, et al. Inappropriate medication use in the community-dwelling elderly: findings from 1996 Medical Expenditure Panel Survey. JAMA. 2001;286(22):2823–9.
Zhan C, Friedman B, Mosso A, et al. Medicare payment for selected adverse events under the prospective payment system: building the business cases for investing in patient safety improvement. Health Aff. 2006;25(5):1386–93.
Zhan C, Smith SR, Keyes MA, et al. How useful are voluntary medication error reports? The case of warfarin-related medication errors. Joint Comm J Qual Patient Saf. 2008;34(1):36–44.
Zhan C, Elixhauser A, Richards C, et al. Identification of hospital-acquired catheter-associated urinary tract infections from Medicare claims: sensitivity and positive predictive value. Med Care. 2009;47(3):364–9.

12 Health Services Information: Personal Health Records as a Tool for Engaging Patients and Families
John Halamka
Contents
Introduction
A Short History of Personal Health Records
Policies
Products in the Marketplace
The Regulatory Environment: ARRA/HITECH, the HIPAA Omnibus Rule, and FDASIA
Myths
Digital Divide
Data Standards
The Role of Personal Medical Devices
Research: OpenNotes, ICU Harm Reduction, Care Plans, and Clinical Trials
Conclusion
References

Abstract

Personal Health Records have evolved from stand-alone websites requiring manual entry of data to automated mobile applications fully integrated into care management workflow. Technology issues such as interoperability, security, and patient identification have matured. Policies such as who can see what for what purpose have been enumerated. Regulations now require a deeper level of interaction between care teams and patients. Many myths about the risks of engaging patients and families have been shattered. Research continues to expand the scope of information shared with families, enhance the usability of patient-facing applications, and improve the utility of solutions automating patient/provider workflow.

J. Halamka (*)
Department of Emergency Medicine, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, MA, USA
e-mail: jhalamka@bidmc.harvard.edu

https://doi.org/10.1007/978-1-4939-8715-3_13
Introduction

A key enabler to delivering safe, high-quality, efficient care is engaging patients and families by sharing healthcare records, codeveloping plans of treatment, and communicating preferences for care among the entire care team. Over the past 20 years, patient portals, personal health records, electronic consumer education resources, wellness devices, and health-focused social networks have offered more transparency, shared decision-making, and communication than ever before.

This chapter examines the history of technologies that empower patients and families while also identifying important foundational policies and speculating how future innovations will provide even greater functionality. The chapter also reviews the evolving regulatory environment and discusses the impact of US national “Meaningful Use” requirements on the adoption of new tools.

A Short History of Personal Health Records

Personal health records (PHRs) have the potential to make patients the stewards of their own medical data. PHRs may contain data from payer claims databases, clinician electronic health records, pharmacy-dispensing records, commercial laboratory results, and personal medical device data. They may include decision support features, convenience functions such as appointment making/referral requests/medication refill workflow, and bill paying.

Early personal health records were deployed at Beth Israel Deaconess Medical Center (PatientSite), Children’s Hospital (Indivo), and Palo Alto Medical Clinic (MyChart) in the late 1990s and early 2000s. In the mid-2000s, direct-to-consumer vendors such as Microsoft (HealthVault) and Google (Google Health) offered products. Since that time, most electronic health record (EHR) vendors (Epic, Meditech, Cerner, eClinicalWorks, Athena) have included patient portals in their products.

In 1999, a group of clinicians and patient advocates in New England suggested that Beth Israel Deaconess Medical Center (BIDMC) should share all of its electronic records with patients, since all healthcare data ultimately belongs to the patient. In 2000, BIDMC went live with a hospital-based personal health record, PatientSite (http://www.patientsite.org). PatientSite includes full access to problem lists, medications, allergies, visits, laboratory results, diagnostic test results, and microbiology results from three hospitals and numerous ambulatory care practices. In addition to these hospital and ambulatory clinic-provided data, patients can amend their own records online, adding home glucometer readings, over-the-counter medications, and notes. Secure patient-doctor messaging is integrated into the system. Convenience functions such as appointment making, medication renewal, and specialist referral are automated and easy to use. Clinical messaging is the most popular feature, followed by prescription renewals and followed by appointment making and referrals.

In 1998, researchers at the Children’s Hospital Informatics Program (CHIP) at Children’s Hospital Boston developed the concept of the Indivo Personally Controlled Health Record in a planning grant and began implementation in 1999. Critical to the success of the model, the code base of Indivo has always been open source, the application programming interface (API) is fully published, and all communication/messaging protocols adhere to freely implementable standards. Indivo enables patients to maintain electronically collated copies of their records in a centralized storage site. Access, authentication, and authorization all occur on one of several available Indivo servers, which are also responsible for encryption of the record. Individuals decide who can read, write, or modify components of their records.

In 1999, Epic Systems, an established vendor of EHR systems, decided to develop a patient portal, which they called MyChart. The Palo Alto Medical Foundation (PAMF) worked with Epic to develop the functionality requirements for a PHR that was integrated with their EHR. PAMF became the first customer of MyChart, which was implemented at the end of 2000.
MyChart enables the patient to review their diagnoses, active medications, allergies, health maintenance schedules, immunizations, test results, radiology results, appointments, and demographics. In many cases, relevant health educational resources are automatically linked to key terms or phrases in the patient’s medical record, such as a diagnosis of diabetes. In addition, patients can communicate with the physician office to request an appointment, request a prescription renewal, update demographic information, update immunization status, or update a health maintenance procedure. The patient can also request advice from an advice nurse or from their own physicians.

Based on the success of these early adopters, many electronic health record companies began offering patient access to electronic records in the late 2000s. As is discussed below, the Federal HITECH Meaningful Use program now requires that patients be able to view, download, and transmit their medical records, accelerating market deployment of personal health record functionality.

Policies

As personal health record technology was deployed, many novel policy questions arose. What information should be shared and when? Who should have access? Should parents have access to the records of their adolescent children? Over time, many best practices have evolved which have answered these questions.

Although the Health Insurance Portability and Accountability Act of 1996 (HIPAA) mandated that patients have access to their medical records, it did not require the release of data electronically. The HIPAA Omnibus Rule of 2013 does require electronic access, but it does not specify how quickly releases should occur. Should a cancer diagnosis be revealed to a patient in real time on a website or wait for a personal conversation with a physician?

At BIDMC, the majority of the record is shared with the patient immediately with minor exceptions, since it is the patient’s data.

A small number of reports are delayed to enable a discussion between provider and patient to occur first. The Commonwealth of Massachusetts has specific regulatory restrictions on the delivery of HIV test results, so they are not shown on PatientSite. The tests and their delays are summarized below:

CT scans (used to stage cancer): 4 days
PET scans (used to stage cancer): 4 days
Cytology results (used to diagnose cancer): 2 weeks
Pathology reports (used to diagnose cancer): 2 weeks

HIV diagnostic tests: never shown

• Bone marrow transplant screen, including:
  HIV-1 and HIV-2 antibody
  HTLV-I and HTLV-II antibody
  Nucleic acid amplification to HIV-I (NHIV)
• HIV-1 DNA PCR, qualitative
• HIV-2 and Western blot. Includes these results:
  HIV-2 AB and EIA
  HIV-2 and Western blot
• HIV-1 antibody confirmation. Includes these results:
  Western blot
  Anti-P24
  Anti-GP41
  Anti-GP120/160

We want the patient to own and be the steward of their own data, but we also want to support the patient/provider relationship and believe that bad news is best communicated in person. Over time, it is likely that even these delays and restrictions will be removed, making all data instantly available to the patient. When the wife of the author of this chapter was diagnosed with breast cancer in 2011, she wanted to see her pathology results immediately, even if they were bad news. In the future, the patient and provider may agree on data-sharing preferences as part of establishing a primary care relationship.

Other issues that arose during early experiences with personal health records included the access granted to adolescents and their parents.
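The release delays listed above for BIDMC amount to a small lookup rule: most results are visible immediately, a few report types are held for a fixed period, and HIV diagnostic tests are never shown. The following is only a minimal sketch of that rule in Python, not a description of how PatientSite is implemented. The report-type keys, the function names, and the choice to count the delay from the date the result is finalized are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional

# Delays copied from the list in the text. The dictionary keys are
# hypothetical labels; None means the result is never shown in the portal.
RELEASE_DELAYS = {
    "ct_scan": timedelta(days=4),
    "pet_scan": timedelta(days=4),
    "cytology": timedelta(weeks=2),
    "pathology": timedelta(weeks=2),
    "hiv_diagnostic": None,
}


def release_date(report_type: str, resulted_on: date) -> Optional[date]:
    """Return the first date a report may appear, or None if it is never shown."""
    if report_type not in RELEASE_DELAYS:
        # Everything not explicitly listed is shared immediately.
        return resulted_on
    delay = RELEASE_DELAYS[report_type]
    return None if delay is None else resulted_on + delay


def is_visible(report_type: str, resulted_on: date, today: date) -> bool:
    """True if the patient may see the report in the portal on `today`."""
    released = release_date(report_type, resulted_on)
    return released is not None and today >= released
```

Keeping the delays in a single table, rather than scattering them through display code, would make it straightforward to shorten or remove them later, which the text suggests is the likely direction.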
As more and more practices and hospitals are making patient portals available to their patients, providers of adolescent patients are encountering a major hurdle: how to handle confidential adolescent information.

While adult patients generally maintain full personal control of their personal health record (PHR), adolescent PHRs are anything but personal. Adolescents rarely have full control of their record, but instead rely on parents and guardians to share control. The details around this shared access change over time, depending on developmental and age-appropriate considerations, as well as guardianship arrangements.

The biggest challenge then becomes how to protect the adolescent’s legal right to privacy and confidentiality within this hybrid/proxy-control model. Many medical encounters with adolescents come with the verbal assurance that what they tell us will (under most circumstances) remain entirely confidential, meaning we will not discuss personal health information pertaining to reproductive health, sexually transmitted diseases, substance abuse, and mental health with their parents or anyone else without their consent. As it turns out, this type of confidential information is pervasive through most EHRs.

Children’s Hospital Boston spent a lot of time thinking about this issue and adolescent access to its patient portal and ultimately developed a custom-built solution to meet its own and its patients’ needs.

Their approach is built around differential access to the patient portal with the goal of mirroring current clinical practice and works as follows:

Access to the patient portal: Separate accounts are created for the patient and parent(s) that are linked. The parent has sole access to the patient’s portal until the patient turns 13, at which point both the parent and the patient can have access. They chose 13 years as the cutoff based on a number of factors, including developmental maturity and other precedents at their institution based on their policies. At 18 years, the patient becomes the sole owner of the portal account, and Children’s deactivates the parent’s link (unless they receive court documents stating that the parent remains the medical guardian).

Health information contained in the patient portal: Children’s has identified and tagged certain information from their EHR that they consider sensitive, such as labs related to pregnancy, sexually transmitted illnesses, genetic results, select confidential appointments, and potentially sensitive problems and medications. This information is currently filtered from both parent and adolescent accounts, but in the near future, the sensitive information will flow to the adolescent account, but not to the parent account. So, even if a patient is less than 13 years, the parent would not have access to this information.

This solution does take a lot of time and effort, but best replicates the current clinical practice. Many current PHR applications in the marketplace do not allow for this type of differential access and only enable full proxy access.

Alternative solutions include the following:

1. Shared access for patient and parent, but filtering of sensitive information. One could then choose the age at which patients would gain access without worrying about the parent seeing sensitive information at any age. This makes the age at which the patient obtains access, whether it is 10 or 13 years, less important. Unfortunately, this option restricts adolescent access to confidential information and creates a fragmented and incomplete record.
2. Adolescent access only. This is trickier, because choosing the appropriate age when parental access is discontinued is difficult and may vary depending on patient characteristics. Many practices choose 12 or 13 years. However, if sensitive information is not being filtered, there may be an occasional 11-year-old with a sexually transmitted infection. Also, some parents object to being cut off from their child’s medical information, and many play an important role in supporting their adolescent children and guiding them through healthcare decisions.
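The differential-access approach described above for Children’s Hospital Boston can be read as a small set of rules keyed on the patient’s age, the viewer’s role, and a sensitivity tag on each item. The sketch below is one hedged way to express those rules in Python. It models the planned behavior in which sensitive items flow to the adolescent but not to the parent. The names `RecordItem` and `can_view`, the role strings, and the `guardianship_order` flag are hypothetical, and the assumption that sensitive items stay hidden from a parent who remains guardian after age 18 is ours, since the text does not address that case.

```python
from dataclasses import dataclass

SHARED_ACCESS_AGE = 13  # patient gains access alongside the parent
SOLE_OWNER_AGE = 18     # parent link is deactivated unless a court order applies


@dataclass(frozen=True)
class RecordItem:
    description: str
    sensitive: bool = False  # e.g., pregnancy labs, STI results, genetic results


def can_view(role: str, patient_age: int, item: RecordItem,
             guardianship_order: bool = False) -> bool:
    """Decide whether a 'parent' or 'patient' account may see one record item."""
    if role == "parent":
        # The parent link ends at 18 unless court documents keep it active.
        if patient_age >= SOLE_OWNER_AGE and not guardianship_order:
            return False
        # Sensitive items never flow to the parent account, at any patient age.
        return not item.sensitive
    if role == "patient":
        # Under 13 the parent has sole access; from 13 the patient sees
        # everything, including sensitive items (the planned behavior).
        return patient_age >= SHARED_ACCESS_AGE
    raise ValueError(f"unknown role: {role!r}")
```

For example, `can_view("parent", 15, RecordItem("STI lab", sensitive=True))` is `False`, while the same call with `role="patient"` is `True`. Changing the two age constants is enough to model the alternative cutoffs (10, 12, or 13 years) mentioned in the list of alternative solutions.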
The issues and solutions involved with adolescent PHRs are certainly complex and will continue to evolve over time. However, I am hopeful that PHRs will start incorporating the unique needs of the adolescent population in the near future, allowing both parents and adolescents to share responsibility and engage in their healthcare.

Products in the Marketplace

Over the nearly two decades that personal health records have been deployed, there have been four basic models.

Provider-hosted patient portal to the electronic health record: In this model, patients have access to provider record data from hospitals and clinics via a secure web portal connected to existing clinical information systems. Examples of this approach include the PatientSite and MyChart applications described above. The funding for provider-based PHRs is generally from the marketing department since PHRs are a powerful way to recruit and retain patients. Also, the Healthcare Quality Department may fund them to enhance patient safety since PHRs can support medication reconciliation workflows. Kaiser’s implementation does not distinguish between the personal health record and electronic health record. Instead they call it a patient-/provider-shared electronic health record.

Payer-hosted patient portal to the payer claims database: In this model, patients have access to administrative claims data such as discharge diagnoses, reimbursed medications, and lab tests ordered. Few payer-hosted systems contain actual lab data, but many payers are now working with labs to obtain this data. Additionally, payers are working together to enable the transport of electronic claims data between payers when patients move between plans, enhancing continuity of care. The funding for payer-based PHRs is based on reducing total claims to the payer through enrollment of patients in disease management programs and enhancing coordination of care. Many Blue Cross affiliates have made such sites available.

Employer sponsored: In this model, employees can access their claims data and benefit information via a portal hosted by an independent outsourcing partner. The funding for employer-based personal health records is based on reducing total healthcare costs to the employer through wellness and coordination of care. A healthy employee is a more productive employee. Keas is an example of an employer-sponsored employee engagement for health application.

Vendor hosted: Several vendors serve as a secure container for patients to retrieve, store, and manipulate their own health records. Microsoft’s HealthVault includes uploading and storage of records as well as a health search engine. Google offered such services from 2007 to 2012, but discontinued the service because of lack of adoption. Humetrix is an example of a consumer-centered technology vendor, focused on mobile apps and healthcare information exchange. The business model for these PHRs is generally based on attracting more users to advertising-based websites, although the PHR itself may be advertising free. Vendor-hosted PHRs include HITECH-mandated privacy protections and must sign business associate agreements and agree to keep data private.

Here is the press release from Beth Israel Deaconess, describing the availability of HealthVault to its patients, which illustrates the value proposition communicated to the patients:

BOSTON: Beth Israel Deaconess Medical Center (BIDMC) is expanding options for users of its secure PatientSite portal by joining forces with Microsoft HealthVault to offer a new way to safely exchange medical records and other health data.

The affiliation follows an earlier commitment to offer a similar service through Google Health.

“We believe that patients should be the stewards of their own data,” says John Halamka, MD,
BIDMC’s chief information officer. BIDMC’s PatientSite is wonderful if all care is delivered at BIDMC. However, many patients have primary care doctors, specialists, labs, pharmacies, and nontraditional providers at multiple institutions.

“Our vision is that BIDMC patients will be able to electronically upload their diagnosis lists, medication lists and allergy lists into a HealthVault account and share that information with health care providers who currently don’t have access to PatientSite.”

PatientSite, which currently has more than 40,000 patient users and 1,000 clinicians, enables patients to access their medical records online, securely email their doctors, make appointments, renew medications, and request referrals.

HealthVault is designed to put people in control of their health data. It helps them collect, store, and share health information with family members and participating health care providers, and it provides people with a choice of third-party applications and devices to help them manage things such as fitness, diet, and health.

HealthVault also provides a privacy- and security-enhanced foundation on which a broad ecosystem of providers – from medical providers and health and wellness device manufacturers to health associations – can build innovative new health and wellness solutions to help put people in increased control of their and their family’s health.

“The end result will be when patients leave the BIDMC area or see a provider outside the area they can have all their medical data located in one safe place,” adds Halamka.

The Regulatory Environment: ARRA/HITECH, the HIPAA Omnibus Rule, and FDASIA

The American Recovery and Reinvestment Act (ARRA) of 2009 included the HITECH provisions which launched the national Meaningful Use program. Meaningful Use includes certification for products, ensuring they are good enough, and attestation for clinicians that they are using the technology wisely.

In stage 1 of Meaningful Use, vendor software was certified to provide basic health information access to patients. Providers were optionally able to attest to use of personal health records as part of meeting criteria for stimulus payment. In stage 2 of Meaningful Use, use of personal health record technology became a mandatory part of attestation. The three provider requirements related to PHRs include:

• Providers must offer online access to health information to more than 50% of their patients with more than 5% of patients actually accessing their information.
• More than 5% of patients must send secure messages to their provider.
• Providers must use the EHR to identify and provide educational resources to more than 10% of patients.

Although some institutions have offered personal health records for many years, others have not yet established the workflow, created the policies, or experienced the cultural changes that are foundational to provider/patient electronic interaction. Many organizations have suggested that requiring actual use of the personal health record by the patient is beyond provider control and thus is unfair.

Beth Israel Deaconess has already achieved patient participation rates of 25% for record viewing and 15% for secure messaging without significant advertising or educational effort. Patients find value in the timeliness and convenience of these transactions, so participate enthusiastically. Admittedly, BIDMC had 15 years to refine the application, modify medical staff bylaws to require PHR use, and overcome some of the doubts and myths described below.

In addition to the Meaningful Use requirements, the HIPAA Omnibus Rule expands an individual’s rights to receive electronic copies of his or her health information and to restrict disclosures to a health plan concerning treatment for which the individual has paid out of pocket in full. Many healthcare organizations are struggling with the self-pay disclosures workflow, since modifying data flows based on how the patient pays is not
currently supported by commercial EHR products. There are also ongoing national efforts to refine the Omnibus Rule language for “accounting of disclosures,” when a patient requests a list of all who have accessed or received copies of their record. Implementing such accounting for all disclosures including treatment, payment, and operations requires capabilities not present in most commercial EHR products.

The Food and Drug Administration issued a report in April 2014 outlining the Food and Drug Administration Safety and Innovation Act (FDASIA) regulatory framework that is relevant to personal health records because of the increasing popularity of using mobile devices to access health-related resources. Mobile devices will be discussed in detail later in this chapter.

The FDA stratified mobile devices/apps into three categories:

Administrative apps – an application that reminds you about an appointment, describes costs/benefits such as co-pays, or helps you find a doctor.
Wellness apps – an application that measures your daily exercise, suggests weight loss strategies, or offers healthcare coaching via a social network.
Medical devices – an application that measures a body parameter such as pulse, blood pressure, or EKG and may offer therapeutic suggestions based on directly gathered diagnostic data.

The FDA reaffirmed its intent to regulate medical devices and not administrative apps/wellness apps.

It is unlikely that the FDA will regulate personal health records in the near future, but it will likely regulate the apps and devices which collect patient telemetry and transmit it to personal health records.

Myths

Many providers and patients have concerns about the impact of increased electronic data sharing and automated workflows. After nearly 20 years of experience with personal health records, it is clear that most of those concerns have not appeared in practice.

Providers were concerned that sharing electronic health records would result in more assertions of malpractice as patients found errors in their records. At BIDMC and other Harvard-associated hospitals, the opposite has been true. Informed and engaged patients do find errors and work with their providers to correct inaccuracies before harms occur. Malpractice assertions decrease when personal health records are deployed.

Providers were concerned that they would be overwhelmed with secure email or other electronic requests from patients. Electronic requests have replaced phone calls and have reduced time spent on “phone tag” and accelerated the resolution of simple administrative matters that can be delegated to others.

Patients were concerned that increased electronic access would create new security risks. While it is true that the Internet is increasingly a mire of viruses and malware, keeping electronic data centrally managed on secure servers is less risky than exchanging paper copies, storing PDFs on laptops, or exchanging electronic copies on USB flash drives because centrally stored information can be better audited and controlled.

Patients and providers were concerned that more transparency could jeopardize the clinician/patient relationship because of misunderstandings in the interpretation of electronic health records. Instead, providers have been careful to write comprehensible summaries with fewer abbreviations because they know a patient is likely to read their work.

There have been lessons learned along the way. Sharing inaccurate or confusing data with patients does not add value. For example, administrative billing data is a coded summary of the clinical care that lacks perfect specificity and time references, i.e., just because you had a diagnosis of low potassium 5 years ago does not imply it is a problem today.

Thus, we must be thoughtful about what data is sent to PHRs and how that data is presented to patients. The problem list is useful clinical
information as long as clinicians keep it current. BIDMC removed the ICD-9 administrative data feed so that the clinician’s problem list is the only data that populates the patient view. Also, BIDMC improved its problem list functionality so that it maps to a standardized terminology, SNOMED CT, enabling BIDMC to provide medical information and decision support based on a controlled vocabulary instead of just free text.

As long as the PHR software is usable and the data presented is relevant, supplemented by educational materials, the experience of provider/patient data sharing will be positive.

Digital Divide

As we offer more electronic resources to patients and encourage the use of mobile technology and home medical devices, we must be careful not to create a digital divide – the technology haves and have nots. In the Boston area, there are many academic and technology professionals with fast Internet connections and the latest mobile devices. There are also Medicaid patients without the funding to purchase personal devices and those who feel technology requires expertise beyond their comfort zone. Research done in the Boston area discovered that the large majority of Medicaid patients have phones capable of receiving text messages and most patients have access to the Internet at work, at a local library, or a community center. We must engineer our personal health records so they run anywhere on anything, but also protect privacy by not leaving behind cached data that could be viewed inappropriately. PatientSite and most vendor applications are web based so they can be accessed regardless of location or platform, with specific protections to ensure data is encrypted and not stored in web browsers. Engineering for those with disabilities, failing eyesight, or limited computer skills is also essential.

Data Standards

The HITECH Meaningful Use program requires the use of specific standards for transition of care summary transmission, public health reporting, and e-prescribing. Although the standards for personal health records are not explicitly stated, it is logical that personal health records should mirror the standards used in electronic health records themselves. Standards can generally be lumped into three different categories.

Vocabulary – the terminology used in each part of the record to communicate meaning between sender and receiver. The Meaningful Use Common Data Set requires LOINC codes for labs, RxNorm codes for medications, SNOMED CT for problem lists, CVX for immunization names, and ISO 639–2 for primary language. The same standards should be used in personal health records and medical devices connecting to personal health records. Mappings to patient friendly terminology, available from the National Library of Medicine’s Value Set Authority Center, are likely to be helpful to patients.

Content – the container used to package a collection of data to be transported between a sender and receiver. The Consolidated Clinical Document Architecture (CCDA) is used for all EHR transition of care summaries and is appropriate to use for sending data to PHRs and collecting data from patients. Medical devices may additionally use the IEEE 11073 standard to transfer data to and from PHRs.

Transmission – the secure protocol to transport content from one place to another without modification or interception. Meaningful Use stage 2 requires the Direct Protocol (SMTP/SMIME or SOAP/HTTPS) to be used for transport. These standards are also appropriate for personal health records and medical devices.

As standards become increasingly constrained, ease of interfacing improves and the value of interoperable products increases. Ideally, Meaningful Use certification should create an ecosystem of personal health record products, leveraging the liquidity of data to foster innovation. Later stages of Meaningful Use will likely encourage “modular” EHR and PHR products that plug into large commercial systems through the use of simple application
programming interfaces (APIs). The April 2014 JASON report, requested by AHRQ and facilitated by the MITRE Corporation, provides a roadmap for evolution of healthcare apps that expand the use of today’s EHRs and PHRs.

The Role of Personal Medical Devices

As Accountable Care Organizations move from fee for service to risk contracts, providers will be reimbursed for keeping patients healthy and not for delivering more care. Personal medical devices that report on patient activities, functional status, and body parameters between clinician visits will be increasingly important.

Such devices include electronic scales for measuring fluid retention in CHF patients, blood pressure measurement for refractory hypertension, glucometers for diabetics, and home spirometry for patients with COPD or asthma.

The current challenge is that home medical devices communicate using proprietary protocols that make interfacing to personal health records and electronic health records very challenging. The Continua Alliance is a group of 60 companies that collaboratively develops standards for incorporation into products with the goal that devices available at the local drugstore will “plug and play” with the diversity of current EHRs and PHRs without complex engineering or custom software development.

Future stages of Meaningful Use will likely include a requirement for patient-generated data. Payers, providers, and patients will all have incentives to include device data from home telemetry in electronic medical records that provide coordinated, optimized care further personalized via access to personal medical devices.

Here’s an example. The father of the author of this chapter had multiple sclerosis for 23 years. His mobility declined but there was no easy way to measure that decline. To complicate the situation, he self-medicated with over-the-counter and prescription medications to episodically reduce his symptoms. During personal visits his level of function seemed very high. Imagine that a Fitbit or other home device provided data about his mobility to an EHR or PHR. It would be clear that on some days he walked 50 ft and other days he walked 5,000 ft. The trend would be clear – fewer good mobility days and more limited function. Care plans, medications, and supportive therapies would be informed by this objective data.

Just as personal computing has evolved from terminals to PCs to mobile smartphones/tablets, it is likely that personal health records will increasingly run on mobile technology with interfaces to home care devices.

Research: OpenNotes, ICU Harm Reduction, Care Plans, and Clinical Trials

When BIDMC’s PatientSite was originally released, it included patient access to the entire health record except for the clinic notes a physician wrote about a patient. That changed in 2011 when notes were added via the OpenNotes project. Here’s the press release about it.

BOSTON – A Beth Israel Deaconess Medical Center-led study has found that patients with access to notes written by their doctors feel more in control of their care and report a better understanding of their medical issues, improved recall of their care plan, and being more likely to take their medications as prescribed.

Doctors participating in the OpenNotes trial at BIDMC, Geisinger Health System in Danville, PA, and Harborview Medical Center in Seattle reported that most of their fears about an additional time burden and offending or worrying patients did not materialize, and many reported enhanced trust, transparency, and communication with their patients.

“Patients are enthusiastic about open access to their primary care doctors’ notes. More than 85% read them, and 99% of those completing surveys recommended that this transparency continue,” says Tom Delbanco, MD, co-first author, a primary care doctor at BIDMC and the Koplow-Tullis Professor of General Medicine and Primary Care at Harvard Medical School. “Open notes may both engage patients far more
actively in their care and enhance safety when the patient reviews their records with a second set of eyes.”

“Perhaps most important clinically, a remarkable number of patients reported becoming more likely to take medications as prescribed,” adds Jan Walker, RN, MBA, co-first author and a Principal Associate in Medicine in the Division of General Medicine and Primary Care at BIDMC and Harvard Medical School. “And in contrast to the fears of many doctors, few patients reported being confused, worried or offended by what they read.”

The findings reflect the views of 105 primary care physicians and 13,564 of their patients who had at least one note available during a year-long voluntary program that provided patients at an urban academic medical center, a predominantly rural network of physicians, and an urban safety net hospital with electronic links to their doctors’ notes.

Of 5,391 patients who opened at least one note and returned surveys, between 77% and 87% reported OpenNotes made them feel more in control of their care, with 60–78% reporting increased adherence to medications. Only 1–8% of patients reported worry, confusion, or offense; three out of five felt they should be able to add comments to their doctors’ notes; and 86% agreed that availability of notes would influence their choice of providers in the future.

Among doctors, a maximum of 5% reported longer visits, and no more than 8% said they spent extra time addressing patients’ questions outside of visits. A maximum of 21% reported taking more time to write notes, while between 3% and 36% reported changing documentation content.

No doctor elected to stop providing access to notes after the experimental period ended.

“The benefits were achieved with far less impact on the work life of doctors and their staffs than anticipated,” says Delbanco. “While a sizeable minority reported changing the way their notes addressed substance abuse, mental health issues, malignancies and obesity, a smaller minority spent more time preparing their notes, and some commented that they were improved.”

“As one doctor noted: ‘My fears? Longer notes, more questions and messages from patients . . . In reality, it was not a big deal.’”

Walker suggests that so few patients were worried, confused, or offended by the note because “fear or uncertainty of what’s in a doctor’s ‘black box’ may engender far more anxiety than what is actually written, and patients who are especially likely to react negatively to notes may self-select to not read them.”

“We anticipate that some patients may be disturbed in the short term by reading their notes and doctors will need to work with patients to prevent such harms, ideally by talking frankly with them or agreeing proactively that some things are at times best left unread.”

“When this study began, it was a fascinating idea in theory,” says Risa Lavizzo-Mourey, MD, president and CEO of the Robert Wood Johnson Foundation, the primary funder of the study. “Now it’s tested and proven. The evidence is in: Patients support, use, and benefit from open medical notes. These results are exciting – and hold tremendous promise for transforming patient care.”

Although PatientSite provides great transparency into ambulatory and inpatient records, the ICU is still an area with limited patient and family engagement. Patient-connected devices in the ICU provide a dizzying array of data but rarely provide an interpretation of that data that is useful to families, especially while making end-of-life decisions. The Moore Foundation recently funded a grant for several hospitals, including BIDMC, to create unique patient dashboards that make the process of care in ICUs more transparent and reduce harms. Here’s an example.

As discussed previously, the father of the author of this chapter had multiple sclerosis for 23 years. He also had myelodysplastic syndrome for 2 years, had 3 myocardial infarctions since 2009, and died in mid-March of 2013.

When the family arrived at his ICU bedside in early March, they spoke with all his clinicians to create a mental dashboard of his progress. It looked something like this:
Cardiac – history of 2 previous myocardial infarctions treated with 5 stents. New myocardial infarction resulting in apical hypokinesis and an ejection fraction of 25%. No further stent placement possible, maximal medical therapy already given
Pulmonary – new congestive heart failure post recent myocardial infarction treated with diuretics, nitroglycerine drip, afterload reduction, upright position, and maximal oxygenation via bilevel positive airway pressure. O2 saturation in the 90s and falling despite maximal therapy (other than intubation)
Hematologic – failing bone marrow resulting in a white count of 1, a platelet count of 30, and a hematocrit of 20
Neurologic – significant increase in muscle spasticity, resulting in constant agitation. Pain medication requirements escalating. Consciousness fading.
Renal – creatinine rising

Although the family did not have real-time access to his records, they gathered enough data to turn this mental dashboard into a scorecard with green, yellow, and red indicators.

Cardiac – red due to irreversible low ejection fraction
Pulmonary – red due to falling O2 saturation despite aggressive therapy
Hematologic – red due to lack of treatment options available for myelodysplastic syndrome and an inability to transfuse given the low ejection fraction and congestive heart failure
Neurologic – yellow due to the potential for successful symptom control with pain medications
Renal – yellow due to treatment options available for renal failure

The patient had expressed his wishes in a durable power of attorney for healthcare – do not intubate, do not resuscitate, no pressors, no feeding tubes, and no heroic measures.

From the combination of the dashboard, scorecard, and his end-of-life wishes, it was clear that hospice was the best course of action.

Ideally, all patients and families should have the tools needed to make such decisions regardless of their medical sophistication.

The Moore Foundation project includes an automated ICU dashboard/scorecard for patients and families updated in real time based on data aggregated from the medical record and patient-connected telemetry. The architecture includes a cloud-hosted decision support web service. Hospitals send data in and the web service returns the wisdom of a graphical display.

Although OpenNotes and the Moore Foundation ICU project implement new ways to share data and its interpretation, we still need additional ways to involve patients and families in shared decision-making through the creation of shared care plans. BIDMC created the Passport to Trust initiative, in collaboration with a commercial PHR software vendor. Patients and doctors use a secure PHR website to develop a shared care plan, and then that plan is sent to the EHR using Meaningful Use standards, made part of the permanent medical record, and integrated into care delivery. This kind of third-party PHR to EHR integration is likely to increase now that Meaningful Use requires EHRs to receive externally generated data. Also, care plan exchange is likely to be part of future stages of Meaningful Use.

Another area in which more patient and family engagement could be beneficial is clinical trial enrollment. Today, most patients are unaware of the new treatments that could provide a cure or breakthrough. Many are willing to enroll in clinical trials but do not know how. Clinicians may be unaware of matching criteria or a patient’s suitability for a given trial. BIDMC has worked with a company called TrialX that enables patients and providers to use PHRs and EHRs with innovative electronic connections to clinical trial databases to facilitate the process. Not only can direct patient involvement in clinical trial enrollment accelerate research, it is likely that patients sharing their experiences with other patients will enable new discoveries to be rapidly disseminated for the benefit of all.
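
As a rough illustration of the scorecard idea described earlier in this section, the green, yellow, and red indicators could be held in a small data structure like the one below. This is only a sketch under stated assumptions: the names `Status`, `ScorecardEntry`, and `summarize` are hypothetical and do not come from the Moore Foundation project or any BIDMC system, and the entries simply transcribe the family's assessments from the text rather than applying any clinical thresholds.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Indicator colors used on the scorecard (hypothetical type)."""
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


@dataclass(frozen=True)
class ScorecardEntry:
    """One organ-system row on the scorecard (hypothetical type)."""
    system: str    # organ system, e.g. "Cardiac"
    status: Status
    reason: str    # plain-language explanation for patients and families


def summarize(entries):
    """Count how many organ systems fall under each indicator color."""
    counts = {status: 0 for status in Status}
    for entry in entries:
        counts[entry.status] += 1
    return counts


# Entries transcribed from the scorecard in the text; nothing here is computed
# from clinical data.
scorecard = [
    ScorecardEntry("Cardiac", Status.RED, "irreversible low ejection fraction"),
    ScorecardEntry("Pulmonary", Status.RED, "falling O2 saturation despite aggressive therapy"),
    ScorecardEntry("Hematologic", Status.RED, "no treatment options; unable to transfuse"),
    ScorecardEntry("Neurologic", Status.YELLOW, "symptom control possible with pain medications"),
    ScorecardEntry("Renal", Status.YELLOW, "treatment options available for renal failure"),
]

counts = summarize(scorecard)
for status in Status:
    print(f"{status.value}: {counts[status]}")
```

Keeping a plain-language reason next to each color reflects the point made in the text: the value of such a display for families lies in the interpretation, not in the raw data.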
Conclusion

From 1999 to the present, personal health records have transitioned from a research project to the mainstream and are now required by several federal programs. Patients and families increasingly expect access to their records, a role in decision-making, and the convenience of using electronic workflows to manage their care. Consumer platforms continue to rapidly evolve, accelerated by market demand and new interoperability standards incorporated into electronic health records.

As important as the technology has been, the breakthroughs of the past 5 years have been in culture and policy. Clinicians no longer fear sharing the record or participating in secure messaging. There are available policy solutions to tricky problems like sharing adolescent records with their parents.

The next few years will be an important turning point for the medical industry as care becomes increasingly focused on continuous wellness rather than episodic sickness. Patient-generated healthcare data and patient involvement in the entire process are essential to achieving our national and international policy goals for quality, safety, and efficiency. Patients, acting as stewards of their own data, will facilitate data sharing, discovery of new therapies, and innovation as part of a connected learning healthcare system.

References

AHIMA e-HIM Personal Health Record Work Group. The role of the personal health record in the EHR. J AHIMA. 2005;76(7):64A–D.
Archer N, Fevrier-Thomas U, Lokker C, McKibbon KA, Straus SE. Personal health records: a scoping review. J Am Med Inform Assoc. 2011;18(4):515–22. https://doi.org/10.1136/amiajnl-2011-000105.
Beard L, Schein R, Morra D, Wilson K, Keelan J. The challenges in making electronic health records accessible to patients. J Am Med Inform Assoc. 2012;19(1):116–20.
Bourgeois FC, Taylor PL, Emans SJ, Nigrin DJ, Mandl KD. Whose personal control? Creating private, personally controlled health records for pediatric and adolescent patients. J Am Med Inform Assoc. 2008a;15(6):737–43.
Bourgeois FC, Taylor PL, Emans SJ, Nigrin DJ, Mandl KD. Whose personal control? Creating private, personally controlled health records for pediatric and adolescent patients. J Am Med Inform Assoc. 2008b;15(6):737–43. https://doi.org/10.1197/jamia.M2865. Epub 2008 Aug 28.
Brennan PF, Downs S, Casper G. Project HealthDesign: rethinking the power and potential of personal health records. J Biomed Inform. 2010;43 Suppl 5:S3–5. https://doi.org/10.1016/j.jbi.2010.09.001.
Britto MT, Wimberg J. Pediatric personal health records: current trends and key challenges. Pediatrics. 2009;123 Suppl 2:S97–9. https://doi.org/10.1542/peds.2008-1755I.
Collins SA, Vawdrey DK, Kukafka R, Kuperman GJ. Policies for patient access to clinical data via PHRs: current state and recommendations. J Am Med Inform Assoc. 2011;18 Suppl 1:i2–7. https://doi.org/10.1136/amiajnl-2011-000400. Epub 2011 Sep 7.
Council on Clinical Information Technology. Policy Statement–Using personal health records to improve the quality of health care for children. Pediatrics. 2009;124(1):403–9. https://doi.org/10.1542/peds.2009-1005.
Forsyth R, Maddock CA, Iedema RA, Lassere M. Patient perceptions of carrying their own health information: approaches towards responsibility and playing an active role in their own health – implications for a patient-held health file. Health Expect. 2010;13(4):416–26. https://doi.org/10.1111/j.1369-7625.2010.00593.x.
Goel MS, Brown TL, Williams A, Cooper AJ, Hasnain-Wynia R, Baker DW. Patient reported barriers to enrolling in a patient portal. J Am Med Inform Assoc. 2011;18 Suppl 1:i8–12. https://doi.org/10.1136/amiajnl-2011-000473. Epub 2011 Nov 9.
Haggstrom DA, Saleem JJ, Russ AL, Jones J, Russell SA, Chumbler NR. Lessons learned from usability testing of the VA’s personal health record. J Am Med Inform Assoc. 2011;18 Suppl 1:i13–7. https://doi.org/10.1136/amiajnl-2010-000082. Epub 2011 Oct 8.
Kaelber J. A research agenda for personal health records. Am Med Inform Assoc. 2008;15:729–36.
Kim EH, Stolyar A, Lober WB, Herbaugh AL, Shinstrom SE, Zierler BK, Soh CB, Kim Y. Challenges to using an electronic personal health record by a low-income elderly population. J Med Internet Res. 2009;11(4):e44. https://doi.org/10.2196/jmir.1256.
Poulton M. Patient confidentiality in sexual health services and electronic patient records. Sex Transm Infect. 2013;89(2):90. https://doi.org/10.1136/sextrans-2013-051014.
Rudd P, Frei T. How personal is the personal health record?: comment on “the digital divide in adoption and use of a personal health record”. Arch Intern Med. 2011;171(6):575–6. https://doi.org/10.1001/archinternmed.2011.35.
Saparova D. Motivating, influencing, and persuading patients through personal health records: a scoping review. Perspect Health Inf Manag. 2012;9:1f. Epub 2012 Apr 1.
Sittig DF, Singh H. Rights and responsibilities of users of electronic health records. CMAJ. 2012;184(13):1479–83.
Sittig DF, Singh H, Longhurst CA. Rights and responsibilities of electronic health records (EHR) users caring for children. Arch Argent Pediatr. 2013;111(6):468–71.
Wynia M, Dunn K. Dreams and nightmares: practical and ethical issues for patients and physicians using personal health records. J Law Med Ethics. 2010;38(1):64–73. https://doi.org/10.1111/j.1748-720X.2010.00467.x.
Yamin CK, Emani S, Williams DH, Lipsitz SR, Karson AS, Wald JS, Bates DW. The digital divide in adoption and use of a personal health record. Arch Intern Med. 2011;171(6):568–74. https://doi.org/10.1001/archinternmed.2011.34.
13 A Framework for Health System Comparisons: The Health Systems in Transition (HiT) Series of the European Observatory on Health Systems and Policies

Bernd Rechel, Suszy Lessof, Reinhard Busse, Martin McKee, Josep Figueras, Elias Mossialos, and Ewout van Ginneken
Contents
Introduction
The Ljubljana Charter: HiTs and Health Systems in Transition
The Observatory Partnership: HiTs and Policy Relevance
The Observatory Functions: HiTs in a Wider Work Plan
The HiT Template: Structuring, Populating, and Signposting a Comparative Framework
Structure
Scope and Content
Signposting
HiT Processes: Making Sure Frameworks Are Used Consistently and Comparably
Data Sources
Authors, Author Teams, and the Role of (Contributing) Editors
Long-Term Relationships
Flexibility, Consistency, and Signaling Gaps
Review
Dissemination and Policy Relevance: Helping Frameworks Achieve Their Objectives
Timeliness
Visibility
Signaling Credibility
Lessons Learned
The Value of a Template
The Importance of Author and Editor Roles
The Need to Build In “Accessibility” and Relevance
The Need to Signal Credibility
The Need to Build in a Review Process
Conclusions
References

B. Rechel (*)
European Observatory on Health Systems and Policies, London School of Hygiene and Tropical Medicine, London, UK
e-mail: Bernd.Rechel@lshtm.ac.uk

S. Lessof · J. Figueras
Brussels, Belgium
e-mail: szy@obs.euro.who.int; jfi@obs.euro.who.int

R. Busse
Technische Universität Berlin, Berlin, Germany
Department Health Care Management, Faculty of Economics and Management, Technische Universität, Berlin, Germany
e-mail: rbusse@tu-berlin.de

M. McKee
London School of Hygiene and Tropical Medicine, London, UK
e-mail: Martin.McKee@lshtm.ac.uk

E. Mossialos
London School of Economics and Political Science, London, UK
e-mail: e.a.mossialos@lse.ac.uk

E. van Ginneken
Berlin University of Technology, Berlin, Germany
Department of Health Care Management, Berlin University of Technology, Berlin, Germany
e-mail: ewout.vanginneken@tu-berlin.de

https://doi.org/10.1007/978-1-4939-8715-3_15
Abstract

Comparing health systems across countries allows policy-makers to make informed decisions on how to strengthen their systems. The European Observatory on Health Systems and Policies produces a series of profiles that systematically describe health systems – the HiTs. These capture how a health system is organized, how funds flow through the system, and what care it provides. They follow a common template and are updated periodically. They allow policy-makers and academics to understand each system individually in light of its previous development and in the context of other European health systems. In effect, the HiTs provide a framework for comparison across countries. This chapter describes the Observatory’s experience in developing the framework. It explores the role of the HiT template, the processes put in place to support consistency and comparability, and the efforts to build in policy relevance. It highlights the lessons learned so far and what they might contribute to the development of other comparative frameworks.

Introduction

The European Observatory on Health Systems and Policies (Observatory) is a partnership of countries, international organizations, and academic institutions that was set up to provide evidence for the policy-makers shaping Europe’s health systems. A central pillar of its work is the Health Systems in Transition (HiT) series – a set of highly structured and analytic descriptions of country health systems that are updated periodically. This experience of monitoring and comparing country health systems and policies, which stretches back over 20 years, provides insights into the challenges researchers face in developing and applying any framework for health system comparisons. Understanding the background to the HiT series and the Observatory
helps explain the specific approach taken to HiTs, but also speaks of the significance of context in developing comparative frameworks.

The Ljubljana Charter: HiTs and Health Systems in Transition

The Observatory can trace its origins to the early 1990s and the challenges Europe faced as western European expectations (and health-care costs) rose and as the countries emerging in the wake of the Soviet Union looked to overhaul their own health systems. The World Health Organization (WHO) Regional Office for Europe facilitated a process that culminated in the 1996 Ljubljana conference on European Health Care Reforms and the Ljubljana Charter, in which health ministers from across the European region committed themselves to a set of principles for health system reform. These reflected a growing understanding of health’s part in the wider society and economy, the importance of people and patients, the need for policy to be “based on evidence where available,” and the role of monitoring and learning from experience (Richards 2009).

The original HiTs were developed as part of the preparations for the Ministerial Conference. They were addressing a postcommunist Europe in which more than 15 new countries had emerged and many more were making a transition from state-managed to market economies with all the accompanying economic upheaval. There were also growing challenges to the sustainability of established and wealthy health systems and to notions of solidarity. The HiTs had therefore to establish a common vocabulary for describing health systems and to make sure that the terms used could be explained and understood in countries with very different traditions. They had also to provide for the fact that the systems to be compared were contending with significant discontinuities and ongoing change. This prompted the development of a template to describe health systems that would set down the bases on which to make comparisons across countries. It was comprehensive, allowed for very different path developments, and offered detailed explanations to guide authors.

The Observatory Partnership: HiTs and Policy Relevance

Many of (what came to be) the Observatory team were involved in developing evidence for Ljubljana. The Observatory, which took formal shape in May 1998, was designed to take forward the approach to evidence for policy, after the Charter was agreed (Box 1). The original Partners were WHO Europe, the government of Norway, the European Investment Bank, the World Bank, the London School of Economics and Political Science (LSE), and the London School of Hygiene & Tropical Medicine (LSHTM). The exact composition of the partnership has changed over the years, so that the Observatory today also includes the European Commission, more national governments (Austria, Belgium, Finland, Ireland, Slovenia, Sweden, and the United Kingdom), a regional government (Veneto), and the French National Union of Health Insurance Funds (UNCAM); but the concept of a partnership that brings different stakeholders together remains the same. The idea is that the Observatory, like a good health system, is informed by the people who use its services as well as those providing them. The Partners have genuine experience of shaping health systems, and this has prompted a focus on policy relevance and how decision-makers can access and use the evidence generated. They have insisted that the HiT series should be “accessible” to a nonspecialist, nonacademic audience and, more specifically, be readable, clearly structured, consistent (so that readers can move from one HiT to another and find comparable information), and timely, that is, available while the data and analysis are still current.

Box 1: The European Observatory on Health Systems and Policies
The core mission of the Observatory is to support and promote evidence-based health policy-making through the comprehensive and rigorous analysis of the dynamics of
(continued)
Box 1: (continued)
health systems in Europe and through brokering the knowledge generated.
The Observatory is overseen by a steering committee, made up of representatives of all its Partners, which sets priorities and emphasizes policy relevance. Work is shared across four teams with a “secretariat” in Brussels that coordinates and champions knowledge brokering and analytic teams in Brussels, London (at LSE and LSHTM), and in Berlin (at the University of Technology).
The core staff team carries out research and analysis but depends (often) on secondary research and (almost always) on the Observatory’s extensive academic and policy networks. Over 600 researchers and practitioners provide country- and topic-specific knowledge, insights, and understanding. Collectively the Observatory and those who contribute to it equip Europe’s policy-makers and their advisors with evaluative and comparative information that can help them make informed policy choices.

The Observatory Functions: HiTs in a Wider Work Plan

The Observatory has four core functions: country monitoring, analysis, comparative health system performance assessment, and knowledge brokering (Box 2). The HiTs are a fundamental part of country monitoring, supported by a (relatively) new initiative to provide online updates – the Health Systems and Policy Monitor (HSPM). They are, to some extent, a stand-alone exercise. However, the fact that the Observatory’s portfolio of work is broader than country monitoring has done much to strengthen the comparative framework. The analysis program runs in-depth studies of issues like governance, insurance mechanisms, staff mobility, hospitals, primary care, care for chronic conditions, and the economics of prevention, using HiTs, but also reviews of the academic literature and secondary data collection. These have provided insights into important health system dimensions and how they impact on each other. At the same time, they create (a positive) pressure on the HiT series to deliver consistent and comparable information that can feed into more in-depth analysis. The performance assessment work has given the Observatory the tools to understand the use (and misuse) of performance measures and address how far systems achieve their goals. The contribution of these “other” functions to the HiT makes clear the value of wide-ranging inputs from different specialist and thematic perspectives in developing a comparative framework.

Box 2: The Observatory’s Core Functions
• Country monitoring generates systematic overviews of health systems in Europe (and in key OECD countries beyond) in the form of Health Systems in Transition (HiT) reviews. All HiTs are available on the web, listed in PubMed, and disseminated at launches and conferences.
  The Health Systems and Policy Monitor (HSPM) is a new initiative to update HiTs online. It is a web platform that hosts 27 “living HiTs.” These are regularly updated by the expert members of the HSPM network with short “reform logs” and longer “health policy updates.” These give users news and insights into policy processes and developments (http://www.hspm.org/mainpage.aspx).
  The HSPM also allows users to extract and merge specific sections from the HiTs for several countries at the same time as a single file, facilitating comparisons (http://www.hspm.org/searchandcompare.aspx).
• Analysis provides for in-depth work on core health system and policy issues. The Observatory brings together teams of academics, policy analysts, and practitioners from different institutions, countries, and disciplines to ensure rigorous meta-analysis and secondary research on
(continued)
Box 2: (continued)
the issues that matter most to decision-makers. All evidence is available “open access” to facilitate its use in practice.
• Performance assessment includes a package of methodological and empirical work designed to respond to country needs. There have been two key studies looking at the policy agenda for performance comparison to improve health services and separate work on the domains that comprise performance (efficiency, population health, responsiveness).
• Knowledge brokering involves engaging with policy-makers to understand what evidence they need and then assembling and communicating the relevant information at the right time. The Observatory combines an extensive publication program with face-to-face and electronic dissemination to convey evidence on what might work better or worse in different country and policy contexts.

The HiT Template: Structuring, Populating, and Signposting a Comparative Framework

HiTs use a standard questionnaire and format to guide authors – referred to as the HiT template. It guides the production of detailed descriptions of health system and policy initiatives so that every HiT examines the organization, financing, and delivery of health services, the role of key actors, and the challenges faced in the same way, establishes a comparable baseline for reviewing the impact of reforms, and takes a standardized approach to health system assessment. This structure is central to the ability of HiTs to inform comparative analysis and facilitates the exchange of reform experiences across countries. Arriving at a robust template is not straightforward, but the Observatory’s experience suggests some elements that can help.

Structure

The HiT template benefits from a clear structure, based on a functional perspective of health systems. It works from the premise that all health systems perform a number of nonnormative core functions (Duran et al. 2012), including the organization, the governance, the financing, the generation of physical and human resources, and the provision of health services. The first HiT template was developed in 1996. It was revised in 2007 and again in 2010, but all iterations have used the notion of core functions and have drawn on the literature and prevailing debate to interpret what those functions are.
All revisions have involved input from staff (editors) and national authors, based on their work on the country profiles, but they have also included consultation with a wider group of users and stakeholders (Observatory Partners, various units of WHO and of the European Commission’s health directorate, and, more recently, members of the HSPM network). These review stages have helped strengthen the template and build some consensus around its structure and approach.
Table 1 shows the changes over time and the very marked structural consistency between versions. This is in part because of a conscious decision to adapt rather than rethink the structure completely so that HiT users can read backwards in time as well as across countries. It is also a testament to the robustness of the first iteration. The adjustments reflect on a wider rethinking on how different elements fit into the whole and on what seemed more or less important at particular times.
The initial template placed more emphasis on the political, economic, and sociodemographic context and on a country’s historical background, because of the proximity to transition for so many eastern European countries. The 2004–2007 revision consolidated financing in one chapter, bringing together the collection and allocation of funds, and split the chapter on organization and management to address planning and regulation separately, reflecting shifts in emphasis at the time in wider academic and policy thinking. In addition, a
new chapter was added, on the assessment of the health system, again a response to the more explicit way this issue was being addressed at the time. The 2010 template condensed organization, governance, planning, and regulation into a single chapter again and revised and extended the section on performance assessment as policy-makers became increasingly interested in understanding and contextualizing the evaluations of their health systems that they were being confronted with.

Table 1 The evolution of the HiT template structure

Version 1: developed 1995–1996 (a) | Version 2: developed 2004–2007 (b) | Version 3: developed 2009–2010 (c)
Introduction and historical background | Introduction | Introduction
Organizational structure and management | Organizational structure | Organization and governance
Health-care finance and expenditure | Financing | Financing
- | Planning and regulation | -
- | Physical and human resources | Physical and human resources
Health-care delivery system | Provision of services | Provision of services
Financial resource allocation | - | -
Health-care reforms | Principal health-care reforms | Principal health-care reforms
- | Assessment of the health system | Assessment of the health system
Conclusions | Conclusions | Conclusions
References | Appendices | Appendices

(a) Figueras and Tragakes (1996)
(b) Mossialos et al. (2007)
(c) Rechel et al. (2010)

Scope and Content

There were of course other changes to the template between iterations in terms of the detail addressed within the relatively stable overall structure. New questions and issues were added because areas like mental health, child health services, and palliative care (2007) or public health and intersectorality (2010) came to the policy fore and as a wide group of experts and users were consulted. The 2007 template was particularly heavily laden with new additions and contributed to longer and more time-consuming HiTs. Certainly there was a marked growth in the length of HiTs in successive iterations with Estonia, for example, growing from 67 pages in 2000, to 137 pages in 2004, and 227 pages in 2008. This was addressed to some extent in 2010 with a tightening of the template (see Box 3) after which the 2013 Estonia HiT dropped to 195 pages, and it is being revisited again in the 2015–2016 update.

Box 3: The 2010 Template, Structure and Contents
1. Introduction: the broader context of the health system, including economic and political context, and population health
2. Organization and governance: an overview of how the health system in the country is organized, the main actors and their decision-making powers, the historical background, regulation, and levels of patient empowerment
3. Financing: information on the level of expenditure, who is covered, what benefits are covered, the sources of health-care finance, how resources are pooled and allocated, the main areas of expenditure, and how providers are paid
4. Physical and human resources: the planning and distribution of infrastructure and capital stock, IT systems, and human resources, including registration, training, trends, and career paths
5. Provision of services: concentrates on patient flows, organization and delivery
(continued)
Box 3: (continued)
of services, addressing public health, primary and secondary health care, emergency and day care, rehabilitation, pharmaceutical care, long-term care, services for informal carers, palliative care, mental health care, dental care, complementary and alternative medicine, and health care for specific populations
6. Principal health reforms: reviews reforms, policies, and organizational changes that have had a substantial impact on health care, as well as future developments
7. Assessment of the health system: provides an assessment based on the stated objectives of the health system, financial protection, and equity in financing; user experience and equity of access to health care; health outcomes, health service outcomes, and quality of care; health system efficiency; and transparency and accountability
8. Conclusions: highlights the lessons learned from health system changes and summarizes remaining challenges and future prospects
9. Appendices: includes references, further reading, and useful web sites

Signposting

The HiT template has also seen a number of significant changes to layout and design. These have aimed firstly to make the template itself more user-friendly for authors and editors and secondly to create easier to read HiTs.
Key changes from the perspective of authors have been clear signposting of sections or sets of questions that are “essential” and of those which are only “discretionary” and some reworking of the glossary elements and examples that characterized the 1996 template. The intention in flagging what is and what is not essential is to help authors and editors to focus and keep HiTs short and easier to read and update. The editorial team also drew up word limits for chapters, although these have not been included in the published template yet; they are used with authors to agree the length of HiTs. The changes in the way terms are explained reflect the fact that they are now familiar to authors and readers alike.
Key changes that have been aimed at readers include the reorganization of several subsections to increase accessibility and clarity and the introduction of summary paragraphs with key messages at the start of chapters, an abstract (of less than one page), and an executive summary (of three to five pages). These pull out (or signpost) findings in a way that allows policy-makers and their advisers quick access and is in line with the Observatory’s growing understanding of knowledge brokering (Catallo et al. 2014) and the testing of “HiT Summaries” between 2002 and 2008.
There is a further round of revision which started in 2015 and is now being piloted, which will fine-tune the HiT template. It will signpost still more explicitly how health systems are doing by integrating more evaluative elements in the broadly “descriptive” sections rather than keeping them all for a single, policy-focused, assessment section.

HiT Processes: Making Sure Frameworks Are Used Consistently and Comparably

The HiT template in its various iterations has guided the writing of country profiles, providing a clear overall structure, as well as detailed notes on what belongs in the various subsections. However, despite its definitions and advice on how to produce a HiT, it is not a tool that can ensure consistency and comparability on its own. This is because health systems are so complex and there are so many layers of information that could be deemed relevant. The Observatory has therefore developed a range of practice over the last 20 years that helps make the template into a framework that supports health system comparisons.
Data Sources

Data is of course a constant issue in seeking to make comparisons, particularly across countries. The Observatory has chosen to supply quantitative data in the form of a set of standard comparative tables and figures for each country, drawing on the European Health for All Database (HFA-DB) of the WHO Regional Office for Europe, as well as the databases from the Organization for Economic Co-operation and Development (OECD), Eurostat, and the World Bank. All of these international databases rely largely on official figures provided and approved by national governments. These are not unproblematic. The WHO Europe HFA database covers the 53 countries of its European region and Eurostat the 28 EU member states and the 4 members of the European Free Trade Association, while OECD Health Statistics covers the 34 OECD countries (of which only 26 are in WHO’s European region and 22 in the EU). There are also differences in definitions and data collection methods. However, they have the merit of being consistently compiled and rigorously checked. National statistics are also used in the HiTs, although they may raise methodological issues, as are national and regional policy documents, and academic and gray literature, although these do not of course have comparability built in. Data in HiTs is discussed and assessed, and there is explicit attention given to discrepancies between national and international sources.

Authors, Author Teams, and the Role of (Contributing) Editors

HiTs are produced by country experts in close collaboration with Observatory (analytic) staff. Having a national author is important because the framework covers so much ground it is extremely difficult to marshal the range of information needed to complete it from “outside.” It also creates ownership within the country and the national academic community which encourages subsequent use of the profile. The choice of national experts is important and needs to reflect research expertise and signal credibility. Appointing small teams of national authors can also be a helpful way of bringing different skills and knowledge into the process. However experienced the author team, writing a HiT is a complex process. The role of the editor is extremely important and a crucial factor in applying the HiT framework so that it can support comparisons. Observatory editors play a proactive role and are expected to address not just the quality of the individual profile they are working on but its fit with the rest of the series. They are often credited as authors because of the contribution they make.

Long-Term Relationships

The HiTs are updated periodically, and the Observatory has found that building long-term relationships with its author teams is efficient in terms of minimizing the learning curve (and costs) of new iterations and, as importantly, is effective in promoting focus and consistency. The template is a complicated instrument and familiarity with it (and a role in shaping it) makes a difference in authors’ ability to use it. It also fosters a sense of co-ownership of and commitment to the outputs. The HSPM initiative (Box 2) has strengthened these links, engaging authors and contributors by sharing ownership, creating publishing opportunities (with its dedicated series in Health Policy http://www.hspm.org/hpj.aspx), and holding annual meetings which let authors and editors meet and exchange ideas. Efforts to properly integrate national experts into thinking on a comparative framework and to acknowledge their contribution are demonstrably worthwhile.

Flexibility, Consistency, and Signaling Gaps

The experience of writing HiTs makes clear that no two health systems are identical. There needs to be an ability, therefore, to apply the template thoughtfully. Each profile should bring out what is important in a country without slavishly rehearsing details that are not pertinent while
simultaneously maintaining comparability with other countries. It has proved to be helpful to flag up where data is missing or an element of a system is not yet developed rather than simply avoiding mention of it, as it helps readers understand gaps. Editors have an important role in steering HiTs between flexibility and consistency and deciding what should be included or omitted. They meet regularly to exchange experience and discuss practice.

Review

Review is an essential element of the HiT process. Each HiT editor works with their supervisor and the Brussels secretariat as needed to resolve issues. When the draft HiT is complete to their satisfaction, the Observatory combines external review by academics (typically one national and one international) with that of policy-makers. This means quality is addressed not only through academic criteria but also in terms of readability, credibility, and policy relevance. The draft is also sent to the Ministry of Health and the Observatory Partners for comment. Ministries of Health are allowed 2 weeks to flag any factual concerns, but they do not approve HiTs. In the same way, Partners can comment but do not have a clearance function. Any feedback provided is handled by the editor and introduced only where it is consistent with the evidence. Completed HiTs are given a final check by one of the Observatory’s codirectors or hub coordinators to ensure that they achieve expectations on quality and objectivity and fulfill the aims of the series.

Dissemination and Policy Relevance: Helping Frameworks Achieve Their Objectives

The HiTs are designed to allow comparisons across countries, but they are not intended purely to feed into (academic) research and analysis. HiT audiences are often national policy-makers who use the HiT to take stock of their own health system and to reach a shared understanding of what is happening which different sectors, ministries, and levels of the health service (primary, secondary, regional, local) can all subscribe to. They use HiTs in considering reforms, as the basis for policy dialogue and to explore policy options, and to set their own health system’s performance in a European context. Other users are foreign analysts or consultants trying to get a comprehensive understanding of a health system, and researchers and students. HiTs are a single source of information and pull together different strands of analysis which otherwise can be surprisingly hard to find in “one place.”

Timeliness

Any comparative evidence will have more impact if it is delivered when it is still “current” and if it can coincide with a window of opportunity for reform. The Observatory tries to turn HiTs around in the shortest possible time, although this is not always easy. The Health Systems and Policy Monitor is, in part, a response to this and provides a log of policy developments and reforms online in between formal HiT updates. Other steps to ensure that material is not superseded by developments before it is published include agreeing a schedule with authors in advance, efforts to keep HiTs short and focused, and quick turnaround on review stages, all of which must be underpinned by strong project management on the part of the HiT editor.
Linking HiTs to an entry point where they are likely to be considered by policy-makers is both a way of motivating authors to deliver on time and a way of securing impact when they do. The Observatory has successfully tied HiTs and HiT launches to EU Presidencies (Denmark 2012, Lithuania 2013, Italy 2014, Luxembourg 2015), to moments of political change (Ukraine 2015), and to major reform programs in countries (Slovenia 2016).

Visibility

HiTs can only be used when potential users are aware of their existence. The Observatory has developed a mix of dissemination approaches to encourage uptake. There are launch events,
typically in the country and in collaboration with national authors, partner institutions, and Ministries of Health. These work particularly well if linked to a policy dialogue (a facilitated debate about policy options for decision-makers) or a major national or international conference (like the Polish annual National Health Fund meeting or the Czech Presidency of the Visegrad Group) or a workshop or meeting held by other agencies (European Commission meeting on health reform in Ukraine).
All HiTs are available as open access online on the Observatory’s web site and there are e-bulletins and tweets to draw attention to new publications (http://www.euro.who.int/en/about-us/partners/observatory). A list of the latest available HiTs for the various countries is shown in Box 4.

Box 4: Latest Available HiTs, September 2016
Albania HiT (2002)
Andorra HiT (2004)
Armenia HiT (2013)
Australia HiT (2006)
Austria HiT (2013)
Azerbaijan HiT (2010)
Belarus HiT (2013)
Belgium HiT (2010)
Bosnia and Herzegovina HiT (2002)
Bulgaria HiT (2012)
Canada HiT (2013)
Croatia HiT (2014)
Cyprus HiT (2012)
Czech Republic HiT (2015)
Denmark HiT (2012)
Estonia HiT (2013)
Finland HiT (2008)
France HiT (2015)
Georgia HiT (2009)
Germany HiT (2014)
Greece HiT (2010)
Hungary HiT (2011)
Iceland HiT (2014)
Ireland HiT (2009)
Israel HiT (2015)
Italy HiT (2014)
Italy, Veneto Region HiT (2012)
Japan HiT (2009)
Kazakhstan HiT (2012)
Kyrgyzstan HiT (2011)
Latvia HiT (2012)
Lithuania HiT (2013)
Luxembourg HiT (2015)
Malta HiT (2014)
Mongolia HiT (2007)
Netherlands HiT (2016)
New Zealand HiT (2001)
Norway HiT (2013)
Poland HiT (2011)
Portugal HiT (2011)
Republic of Korea HiT (2009)
Republic of Moldova HiT (2012)
Romania HiT (2016)
Russian Federation HiT (2011)
Slovakia HiT (2011)
Slovenia HiT (2016)
Spain HiT (2010)
Sweden HiT (2012)
Switzerland HiT (2015)
Tajikistan HiT (2016)
The former Yugoslav Republic of Macedonia HiT (2006)
Turkey HiT (2011)
Turkmenistan HiT (2000)
Ukraine HiT (2015)
United Kingdom HiT (2015)
United Kingdom, England HiT (2011)
United Kingdom, Northern Ireland HiT (2012)
United Kingdom, Scotland HiT (2012)
United Kingdom, Wales HiT (2012)
United States of America HiT (2013)
Uzbekistan HiT (2014)

Translations can also be extremely helpful in facilitating national access, and HiTs have been translated from English into 11 other languages, including Albanian, Bulgarian, Estonian, French, Georgian, Polish, Romanian, Russian, Spanish, and Turkish. However, translation is expensive and requires careful review by national authors as concepts and policy terms often pose problems.
Signaling Credibility

Securing visibility alone cannot ensure uptake. It is helpful also to demonstrate credibility. The Observatory has gone about this in a number of ways. It invests considerable resources in “presentation,” i.e., copy-editing and typesetting, so that the HiTs signal professionalism. It also endorses all aspects of the International Committee of Medical Journal Editors’ Uniform Requirements for Manuscripts Submitted to Biomedical Journals (www.ICMJE.org) that are relevant to HiTs. It has also taken time to make the HiTs compatible with PubMed/Medline requirements, and the Health Systems in Transition series has been recognized as an international peer-reviewed journal and indexed on PubMed/Medline since 2010.

Lessons Learned

The experience of the HiT series suggests a number of lessons for frameworks for health system comparisons. These include:

The Value of a Template

A template that follows a rational and defendable structure, establishes a common vocabulary with clearly defined terms (supported by examples when appropriate), and is mindful of the way researchers from different disciplines and national traditions may understand it is an invaluable tool. It needs to include clear and sensible explanations on how to use it, be sufficiently robust to accommodate change over time, and allow a certain degree of flexibility. It should also reflect on what the final output is expected to be and who will use it.

The Importance of Author and Editor Roles

Comparative work demands data collection and analysis in different settings and national expertise is key to this. Selecting authors with appropriate skills and credibility is therefore essential and is boosted by clear criteria, by using teams rather than single authors, and by building long-term relationships, which is possible through a network like the HSPM. Good authors must be complemented by equally skilful editors who can support the authors and ensure consistency. Bringing editors and authors together to agree expectations around timing and quality can be extremely effective, as is keeping editors in touch with each other.
The experience of the Observatory suggests that it is useful to provide for two roles analogous to national author and HiT editor, to have clear (academic) criteria for guiding the choice of author, to schedule an initial meeting between the editor and author(s) to go through the template and clarify expectations, and to agree a clear timetable. In the case of the HiT template, there is often discussion of how to tailor the HiT to national circumstances (and specifically of which areas will be addressed in more detail and which in less), but this may not apply to other comparative frameworks. The experience with HiTs also suggests there needs to be allowance for numerous drafts and iterations before the overall manuscript is ready for review. While this may be less of an issue in frameworks with a narrower coverage, plans should include sufficient opportunities for authors and editors to exchange views.

The Need to Build In “Accessibility” and Relevance

Users need to be considered in designing the template, the processes to deliver the comparisons, and the way findings are disseminated. Readable, well-structured, well-presented reports that allow users to move from one report to another and find comparable information easily will increase uptake and impact. Abstracts, summaries, and key messages will all help different users access the things they need. An example of a cover and an executive summary of a HiT are shown in Fig. 1 and Box 5. Delivering timely (current) data and analysis is also important if the evidence generated is to have an impact. Reports that are overly long and
detailed can still be useful, but they may tend to be used by academics rather than policy-makers. Furthermore, those developing comparative frameworks need to have an explicit debate as to how best to balance the comprehensive against the manageable and the timely. A mix of approaches to dissemination should be considered, paying attention to ease of access, free download from the Internet, and translation into other languages.

Fig. 1 Cover of the 2014 German HiT (Source: Busse and Blümel 2014)

Box 5: Executive Summary from Germany, Health System Review, 2014
The Federal Republic of Germany is in central Europe, with 81.8 million inhabitants (December 2011), making it by some distance the most populated country in the European Union (EU). Berlin is the country’s capital and, with 3.5 million residents, Germany’s largest city.
In 2012 Germany’s gross domestic product (GDP) amounted to approximately €32 554 per capita (one of the highest in Europe). Germany is a federal parliamentary republic consisting of 16 states (Länder), each of which has a constitution reflecting the federal, democratic, and social principles embodied in the national constitution known as the Basic Law (Grundgesetz).
By 2010, life expectancy at birth in Germany had reached 78.1 years for men and
(continued)
Box 5: (continued) Box 5: (continued)

83.1 years for women (slightly below the the population – either mandatorily or vol-
Eurozone average of 78.3 years for men and untarily. Cover through PHI is mandatory
84.0 years for women, although the gap for certain professional groups (e.g., civil
with other similar European countries has servants), while for others it can be an alter-
been narrowing). Within Germany, the gap native to SHI under certain conditions (e.g.,
in life expectancy at birth between East and the self-employed and employees above a
West Germany peaked in 1990 at 3.5 years certain income threshold). In 2012, the per-
for men and 2.8 years for women, but centage of the population having cover
narrowed following reunification to through such PHI was 11%. PHI can also
1.3 years for men and 0.3 years for women. Moreover, differences in life expectancy in Germany no longer follow a strict east–west divide. The lowest life expectancy for women in 2004, for example, was observed in Saarland, a land in the western part of the country.

A fundamental facet of the German political system – and the health-care system in particular – is the sharing of decision-making powers between the Länder, the federal government, and civil society organizations. In health care, the federal and Länder governments traditionally delegate powers to membership-based (with mandatory participation), self-regulated organizations of payers and providers, known as “corporatist bodies.” In the statutory health insurance (Gesetzliche Krankenversicherung (SHI)) system, these are, in particular, sickness funds and their associations together with associations of physicians accredited to treat patients covered by SHI. These corporatist bodies constitute the self-regulated structures that operate the financing and delivery of benefits covered by SHI, with the Federal Joint Committee (Gemeinsamer Bundesausschuss) being the most important decision-making body. The Social Code Book (Sozialgesetzbuch (SGB)) provides regulatory frameworks; SGB V has details decided for SHI.

Since 2009, health insurance has been mandatory for all citizens and permanent residents, either through SHI or private health insurance (PHI). SHI covers 85% of

provide complementary cover for people with SHI, such as for dental care. Additionally, 4% of the population is covered by sector-specific governmental schemes (e.g., for the military). People covered by SHI have free choice of sickness funds and are all entitled to a comprehensive range of benefits.

Germany invests a substantial amount of its resources in health care. According to the Federal Statistical Office (Statistisches Bundesamt), which provides the latest available data on health expenditure, total health expenditure was €300.437 billion in 2012, or 11.4% of GDP (one of the highest in the EU). This reflects a sustained increase in health-care expenditure even following the economic crisis in 2009 (with total health expenditure rising from 10.5% of GDP in 2008).

Although SHI dominates the German discussion on health-care expenditure and reform(s), its actual contribution to overall health expenditure was only 57.4% in 2012. Altogether, public sources accounted for 72.9% of total expenditure on health, with the rest of public funding coming principally from statutory long-term care insurance (Soziale Pflegeversicherung). Private sources accounted for 27.1% of total expenditure. The proportion of health care financed from taxes has decreased throughout the last decades, falling from 10.8% in 1996 to 4.8% in 2012. The most significant decrease of public expenditure was recorded

for long-term care (over 50%) with the introduction of mandatory long-term care insurance in 1993 shifting financing away from means-tested social assistance.

The 132 sickness funds collect contributions and transfer these to the Central Reallocation Pool (Gesundheitsfonds; literally, “Health Fund”). Contributions increase proportionally with income to an upper threshold (a monthly income of €4050 in 2014). Since 2009 there has been a uniform contribution rate (15.5% of income). Resources are then redistributed to the sickness funds according to a morbidity-based risk-adjustment scheme (morbiditätsorientierter Risikostrukturausgleich; often abbreviated to Morbi-RSA), and funds have to make up any shortfall by charging a supplementary premium.

Sickness funds pay for health-care providers, with hospitals and physicians in ambulatory care (just ahead of pharmaceuticals) being the main expenditure blocks. Hospitals are financed through “dual financing,” with financing of capital investments through the Länder and running costs through the sickness funds, private health insurers, and self-pay patients – although the sickness funds finance the majority of operating costs (including all costs for medical goods and personnel). Financing of running costs is negotiated between individual hospitals and Länder associations of sickness funds and primarily takes place through diagnosis-related groups (Diagnose-bezogene Fallpauschale; DRGs). Public investment in hospital infrastructure has declined by 22% over the last decade and is not evenly distributed; in 2012, hospitals in the western part of Germany received 83% of such public investment.

Payment for ambulatory care is subject to predetermined price schemes for each profession (one for SHI services and one for private services). Payment of physicians by the SHI is made from an overall morbidity-adjusted capitation budget paid by the sickness funds to the regional associations of SHI physicians (Kassenärztliche Vereinigungen), which they then distribute to their members according to the volume of services provided (with various adjustments). Payment for private services is on a fee-for-service basis using the private fee scale, although individual practitioners typically charge multiples of the fees indicated.

In 2012, there were 2017 hospitals with a total of 501 475 beds (6.2 beds per 1000; higher than any other EU country). Of these, 48% of beds were in publicly owned hospitals, 34% in private non-profit, and 18% in private for-profit hospitals. Both SHI and PHI (as well as the two long-term care insurance schemes) use the same providers. Although acute hospital beds have been reduced substantially since 1991, the number of acute hospital beds is still almost 60% higher than the EU15 (15 EU Member States before May 2004) average. The average length of stay decreased steadily between 1991 and 2011, falling from 12.8 to 7.7 days.

Health care is an important employment sector in Germany, with 4.9 million people working in the health sector, accounting for 11.2% of total employment at the end of 2011. According to the WHO Regional Office for Europe’s Health for All Database, 382 physicians per 100 000 were practicing in primary and secondary care. Thus, the density of physicians in Germany was slightly above the EU15 average and substantially higher than the EU28 (Member States at 1 July 2013) average; the relative numbers of nurses and dentists are also higher than the EU average. With the EU enlargements of 2004 and 2007, a growing migration of health professionals to

Germany had been expected. In fact, the number of foreign health workers grew from 2000 and reached its peak in 2003, thus before the enlargements. The extent of migration to Germany is relatively small compared with that to other destination countries in the EU.

Ambulatory health care is mainly provided by private for-profit providers. Patients have free choice of physicians, psychotherapists (including psychologists providing psychotherapy, since 1999), dentists, pharmacists, and emergency room services. Although patients covered by SHI may also go to other health professionals, access to reimbursed care is available only upon referral by a physician. In 2012, of the 121 198 practicing SHI-accredited physicians in Germany (psychotherapists not included), 46% were practicing as family physicians and 54% as specialists. German hospitals have traditionally concentrated on inpatient care, with strict separation from ambulatory care. This rigid separation has been made more permeable in recent years and now hospitals are partially authorized to provide outpatient services and to participate in integrated care models and disease management programs (DMPs).

For pharmaceuticals, while hospitals may negotiate prices with wholesalers or manufacturers, the distribution chain and prices are much more regulated in the pharmacy market. In both sectors, manufacturers are free in theory to set prices without direct price controls or profit controls. However, there is a reference pricing system for SHI reimbursement, which has been steadily strengthened over recent years, whereby “reference” prices are defined nationally for groups of similar pharmaceuticals with reimbursement capped at that level. Although prices can be set higher (with the patient paying the difference), in practice very few drugs exceed the reference price. For pharmaceuticals with an additional benefit beyond existing reference price groups, reimbursement amounts are negotiated between the manufacturer and the Federal Association of Sickness Funds (GKV-Spitzenverband). Patients generally pay co-payments for pharmaceuticals of €5–10; there are also other cost-saving measures, such as provisions for generic substitution. Of the pharmaceutical industry’s total turnover in 2011 of €38.1 billion, €14.3 billion was gained in the domestic market and €23.8 billion from exports (62.5%); Germany is the third largest producer of pharmaceuticals in the world after the United States and Japan.

Public health is principally the responsibility of the Länder, covering issues such as surveillance of communicable disease and health promotion and education. Historically, the Länder have resisted the influence of the federal government on public health, and although some elements of public health have been included in SHI in recent decades (such as cancer screening), and other interventions have separate agreements (e.g., immunizations), a “prevention act” at federal level intended to consolidate and clarify responsibilities in this area in 2005 was ultimately rejected by the Federal Assembly (Bundesrat).

Governmental policy since the early 2000s has principally focused on cost containment and the concept of a sustainable financing system. The government in office at the time of writing, again a grand coalition of Christian Democrats and Social Democrats, has agreed a focus on quality, especially in hospitals.

In international terms, the German health-care system has a generous benefit basket, one of the highest levels of capacity as well as relatively low levels of cost
sharing. Expenditure per capita is relatively high, but expenditure growth since the early 2000s has been modest in spite of a growing number of services provided both in hospital and ambulatory care, an indication of technical efficiency. In addition, access is good – evidenced by low waiting times and relatively high satisfaction with out-of-hours care.

However, the German health-care system also shows areas in need of improvement if compared with other countries. This is demonstrated by the low satisfaction figures with the health system in general; respondents see a need for major reform more often than in many other countries. Another area is quality of care, in spite of all reforms having taken place. Germany is rarely placed among the top OECD or EU15 countries, but usually around average, and sometimes even lower.

In addition, the division into SHI and PHI remains one of the largest challenges for the German health-care system – as risk pools differ and different financing, access, and provision lead to inequalities.

Source: Busse and Blümel (2014)

The Need to Signal Credibility

If evidence is to be used, the reader needs to have confidence in it. Using expert inputs and consultation in developing the template can support this. External review stages of the HiT are of course also important and ideally will include academic and practitioner perspectives. It is also crucial that any review by governments or other authorities with a potential conflict of interests is handled in such a way that it is not seen to compromise the integrity of the work. Professional presentation, launches and links to major events, as well as other efforts to “publicize” the materials may also enhance the reputation of the work. Those developing comparative frameworks will also have to be clear about the sources of data they use, their quality, and the extent to which they are compatible with each other.

The Need to Build in a Review Process

The experience of the Observatory has shown the value of a comprehensive review process for developing templates for health system comparisons. While it is clear that consulting widely brings new perspectives and creates acceptance for a model, it does run the risk of diluting the framework’s focus. The Observatory has found that making it clear in advance that there are space constraints and giving those consulted some explanation of how or why their suggestions have been acted on (or not) lessens the pressure to expand the framework indefinitely and helps those consulted see that their inputs are valued even if they are not always used.

Conclusions

The HiT series is, at least in Europe, in many respects a “gold standard” for comparing health systems. It has a long and positive track record with HiTs for 56 European and OECD countries, often in several editions, and a total of some 130 HiTs overall. It has made information on health systems and policies publicly available in a format that cannot be found elsewhere and supported comparative analysis across countries, including analytic studies, more detailed country case studies, and explicitly comparative works, for example on countries emerging from the Soviet Union (Rechel et al. 2014), the Baltic states (van Ginneken et al. 2012), the central Asian countries (Rechel et al. 2012), or the Nordic countries (Magnussen et al. 2009). HiTs are some of the most downloaded documents held on the WHO web site and are used not just in Europe but beyond. They have served as a guide for the Asia Pacific Observatory on Health Systems and Policies (which was mentored by the European Observatory and launched in 2011) which uses an adapted version of the template to produce
country reviews for its region. The average impact factor of (European Observatory) HiTs, calculated internally using Thomson Reuters methodology, was 3.6 between 2012 and 2014, with a high of 4.26 in 2013 although this only captures citations in journals listed on PubMed/Medline. Google Scholar, which also recognizes the gray literature, shows that some HiTs achieve several hundred citations per edition.

The Observatory’s experience with HiTs has generated insights that others developing frameworks for health system comparison might usefully draw on. It demonstrates the importance of a user-friendly template that helps authors and editors produce accessible, relevant, and credible outputs with a focus on what is expected from the comparisons and on who is going to use them. However, it also suggests that no template is perfect. There are different ways of categorizing and grouping key functions (of a health or any other system) or of conceptualizing systems and different levels of tackling and reporting evaluation. To some extent these are a matter of preference. There are also and always tradeoffs between comprehensiveness and accessibility, completeness and timeliness, and inclusiveness and readability. The current HiT template can be seen as a pragmatic trade-off based on almost 20 years of experience. How other teams choose to balance these will depend on the focus of their comparisons and the people who are to use their work.

The Observatory has also found ways of combining (excellent) national authors with its own technical editors. This is not always straightforward as not all European countries have the same capacity in health system research and national experts with strong analytical and English writing skills can be hard to find (Santoro et al. 2016) and may move on rapidly. Moreover, HiT and HSPM authors are not normally remunerated but, at “best,” receive only small honoraria. The HiT series has addressed these challenges by identifying and linking formally with leading institutions, cultivating long-term relationships with HiT author teams, and, most recently, through its HSPM network. This mix of approaches may have helped build capacity in countries. It has certainly developed the understanding and research (and people) management skills of the editorial team. Other comparative initiatives with limited resources might also want to consider what they can do in terms of sharing ownership and recognition to create non-monetary incentives for national counterparts and to develop their own team.

Comparability is and will remain a challenge, despite the standard template, tables, and figures, and is likely to be an issue for all other comparative projects. This is somewhat obvious when it comes to quantitative data given the divergent geographic coverage of international databases and the differences in definitions and data collection methods, not to mention the challenges at the individual country level. While it is clear that caution must be exercised when comparing quantitative data from different sources, it is also true, if less obvious, that qualitative data and the descriptive elements of the HiTs raise issues of comparability. In some areas there are broadly accepted tools (OECD et al. 2011) that help, but in many there are no agreed standard definitions (with health professionals being a case in point). Other comparative projects will need both to draw on the latest available knowledge and frameworks and to invest in methodological work as the Observatory team has done, for example, with the conceptual model (the three-dimensional cube) to explain coverage (Busse et al. 2007; Busse and Schlette 2007). They will also need to tailor responses to data and evidence availability in parts of Europe (particularly but by no means exclusively in central, eastern, and southeastern Europe) and to hope that EC/OECD/WHO initiatives on data will ultimately fill the gaps. There will still and inevitably be differences in the information available in countries, in the issues which are important to them, and in the interests and strengths of authors. Those developing frameworks for comparison will have to address these tensions in light of their overarching objectives and in the knowledge that health systems are constantly evolving. They may also find, as the Observatory has, that a comparative framework simply cannot capture everything and that analysis for more specialized issues may require separate study.
Despite the challenges, the Observatory would hold that there is real value in a framework for health system comparison, particularly one that relates to a defined “user” need and which can be sustained over time. Much follows from knowing who will use a set of comparisons and why. Longevity allows a framework to evolve – to improve, strengthen comparability, and build up successive levels of knowledge. Combining the two means a framework can move beyond the descriptive to the truly evaluative so that it captures and assesses aspects of health system performance in ways that speak to policy-makers or the research community or, ideally, both.

References

Busse R, Blümel M. Germany: health system review. Health Syst Transit. 2014;16(2):1–296.
Busse R, Schlette S, editors. Health policy developments issue 7/8: focus on prevention, health and aging, and human resources. Gütersloh: Verlag Bertelsmann Stiftung; 2007.
Busse R, Schreyögg J, Gericke CA. Analyzing changes in health financing arrangements in high-income countries: a comprehensive framework approach, Health, Nutrition and Population (HNP) discussion paper. Washington, DC: World Bank; 2007.
Catallo C, Lavis J, The BRIDGE study team. Knowledge brokering in public health. In: Rechel B, McKee M, editors. Facets of public health in Europe. Maidenhead: Open University Press; 2014. p. 301–16.
Duran A, et al. Understanding health systems: scope, functions and objectives. In: Figueras J, McKee M, editors. Health systems, health, wealth and societal well-being: assessing the case for investing in health systems. Maidenhead: Open University Press; 2012. p. 19–36.
Figueras J, Tragakes E. Health care systems in transition: production template and questionnaire. Copenhagen: World Health Organization Regional Office for Europe; 1996.
Magnussen J, Vrangbak K, Saltman RB, editors. Nordic health care systems. Recent reforms and current policy challenges. Maidenhead: Open University Press; 2009.
Mossialos E, Allin S, Figueras J. Health systems in transition: template for analysis. Copenhagen: WHO Regional Office for Europe on behalf of the European Observatory on Health Systems and Policies; 2007.
OECD, Eurostat, WHO. A system of health accounts. Paris: OECD Publishing; 2011. https://doi.org/10.1787/9789264116016-en.
Rechel B, Thomson S, van Ginneken E. Health systems in transition: template for authors. Copenhagen: WHO Regional Office for Europe on behalf of the European Observatory on Health Systems and Policies; 2010.
Rechel B, et al. Lessons from two decades of health reform in Central Asia. Health Policy Plan. 2012;27(4):281–7.
Rechel B, Richardson E, McKee M, editors. Trends in health systems in the former Soviet countries. Copenhagen: World Health Organization; 2014 (acting as the host organization for, and secretariat of, the European Observatory on Health Systems and Policies).
Richards T. Europe’s knowledge broker. BMJ. 2009;339:b3871.
Santoro A, Glonti K, Bertollini R, Ricciardi W, McKee M. Mapping health research capacity in 17 countries of the former Soviet Union and South Eastern Europe: an exploratory study. Eur J Pub Health. 2016;26:349–54.
van Ginneken E, et al. The Baltic States: building on 20 years of health reforms. BMJ. 2012;345:e7348.
14 Health Services Knowledge: Use of Datasets Compiled Retrospectively to Correctly Represent Changes in Size of Wait List

Paul W. Armstrong

Contents
Introduction
Why Does the Waiting List Shrink (or Swell)? The Primary Hypothesis
What Happens to Enrolment and Admission in a Waiting List Initiative?
Does Size Shrink if Admission Exceeds Enrolment (and Does Size Swell if Enrolment Exceeds Admission)?
In South Glamorgan, Wales
In INSALUD, Spain
In England
In Victoria, Australia
In Winnipeg, Canada
In Sweden
In England
The Balance of Enrolments and Admissions (Plus Other Removals) Equals the Change in Size. Why?
If the Model Is Not Complicated, the Data Must Be Simple!
The Number of ‘Starts’ and ‘Stops’ Must Be the Same
Secondary Hypotheses
Inexplicably Complicated
Supplier-Induced Demand
Why has the Effect of Enrolment Confounded Analyses to Date?
Some Assumed Enrolment Was Fixed and Unvarying
Some Only Registered Discharge (and Death)
Some Compiled Returns
Some Made Hay
The Primary Hypothesis Has Not Been Falsified
References

P. W. Armstrong (*)
London, UK
e-mail: P.W.Armstrong@outlook.com

https://doi.org/10.1007/978-1-4939-8715-3_16
Abstract

This chapter introduces two items which are mandatory for any dataset which hopes to support the analysis of waiting lists. It was once understood that the direction and extent of any change in size was determined by the balance of ‘enrolments’ and ‘admissions.’ We can assess the effect on the size of the list of any increase (or decrease) in the number of admissions, if we know the numbers on the list at two points in time (at the beginning, and at the end, of the period) and if we know the number enrolled on the list and the number admitted from it during the interval. In the chapter, we show that one cannot determine how changes in the number of admissions affect the size of the list, if the number of enrolments is not known or we do not know how it has changed.

Introduction

It is not always possible to start treatment the moment that a clinician decides it is desirable. Delay is sometimes unacceptable, and the work of the clinician is arranged to expedite the assessment, investigation, or treatment of such cases wherever possible. But delay is sometimes acceptable. Patients experiencing such delay are said to require assessment, investigation, and treatment on an elective basis and to belong to the waiting list. The waiting list identifies all of those in this state of limbo at any particular moment in time. It functions as an order book, allowing clinicians to keep track of their outstanding obligations.

The limits of each delay are defined by a ‘start date’ and an ‘end date.’ There are a variety of these to choose from. For example, the start date might be the date of receipt of a referral, and the end date might be the date of the relevant consultation, if we are interested in the wait for an expert opinion. Or the start date might be the date on which the clinician recommended – and the patient agreed to – hospitalization, i.e., the date of the clinician’s ‘decision to admit’ to the list, and the end date might be the date of the relevant admission, if we are interested in the wait for investigation or treatment on an inpatient or day-case basis.

The delay is the interval between the start and end dates. This interval tends to lengthen whenever it involves the coordination of multiple players or the scheduling of a scarce resource. It is helpful to visualize the delay as a line connecting the start and end dates. Demographers refer to this as a ‘lifeline’ (Hinde 1998). It is particularly helpful if lifelines are orientated to display the passage of time (on the horizontal axis) and the acquisition of experience (on the vertical axis) in what is known as a Lexis diagram (Hinde 1998).

The number of start dates and end dates in any cohort must be the same because they are two different ways of counting the same set of completed lifelines. There are seven ‘starts’ and seven ‘stops’ in Fig. 1. But it is more difficult to enumerate the relevant start and end dates when we restrict attention to the lifelines eligible over any period.

Some lifelines had a start date during the period of interest (Fig. 2). We will refer to these as the newly ‘enrolled’ ($E = 13$) on the waiting list. Others had a start date which preceded the earlier census: the end date of some of those also preceded the census, and they are of no further interest, but the end date of others succeeded the census. We will refer to these as the lifelines counted at the time of the earlier census ($C^{\text{then}} = 1$). In this example, $C^{\text{then}} + E = 14$ lifelines became eligible, e.g., for elective admission, at some point during the period.

Some lifelines had an end date during the period (Fig. 2). We will refer to these as the newly ‘admitted’ ($A = 11$) from the list. Others had an end date which succeeded the later census: the start date of some of those also succeeded the census, and they are of no further interest, but the start date of others preceded the census. We will
refer to these as the lifelines counted at the time of the later census ($C^{\text{now}} = 3$). In this example, $A + C^{\text{now}} = 14$ lifelines became ineligible at some point during the period.

Fig. 1 Counting the start and end dates in a cohort

Fig. 2 Counting the start and end dates in a period

It is evident that

$$C^{\text{then}} + E = A + C^{\text{now}}. \tag{1}$$

There are several conditions which have to be satisfied if we want our set of records to provide a valid representation of even the simplest waiting list.

• The start date must be complete. If it is the decision to admit which determines whether or not a patient has been added to the waiting list, the date of the decision to admit must be complete; otherwise, the dataset might exclude someone who required elective admission.
• The start and end dates must be accurate. The date recorded as the date of the decision to admit to the list must be the same as that on which the patient and clinician agreed to elective investigation or treatment on an inpatient or day-case basis. The date recorded as the date of admission from the list must be the same as that on which the patient was admitted to hospital for elective investigation or treatment. Otherwise, those considered eligible on a specified date might exclude some who were in fact available and might include others who were not.
• No record must be duplicated.

Additional items of data may be needed if we want to assess whether delay is acceptable (or not), to determine who should be invited ‘to come in’ next or to allow the comparison of like lifelines with like. Our records must show the same entry as the waiting list in each of these fields.

But there may be difficulty demonstrating that the number of starts equals the number of stops (formula (1)), even if the records are entirely accurate (Goldacre et al. 1987). We cannot expect to obtain counts which are consistent unless they enumerate the same lifelines, e.g., those which represent individuals eligible for the same service and in the same period. Moreover, the counts may appear inconsistent if the records are more complicated than the model. We have assumed (a) that there is never more than one record per patient, (b) that no one joins the list without eventually receiving the service desired, and (c) that the interval between the start and end dates is unbroken. So if any lifelines are removed (R) from the list without experiencing the outcome desired (A), counting the stops will not give the same result as counting the starts.

But if we are not able to demonstrate empirically that the number becoming eligible exactly equals the number becoming ineligible (formula (1)), we must doubt our processing of the records. This is not trivial. The implications for the subsequent analysis are important.

The number of starts must equal the number of stops, that is,
$$C^{\text{then}} + E = A + C^{\text{now}}. \tag{1}$$

If we subtract A from both sides of formula (1), we obtain

$$C^{\text{then}} + E - A = C^{\text{now}}. \tag{2}$$

The size of the list when the period closes ($C^{\text{now}}$) is determined by the size of the list when the period opened ($C^{\text{then}}$) and by the balance of enrolments (E) and admissions (A) during the interval. Formula (2) (Moral and de Pancorbo 2001) is a simplified version of what is known as the basic demographic equation (Newell 1988) or the balancing equation (Pressat 1985).

admissions (Naylor 1991). This ‘primary hypothesis’ is often used to justify requests for additional resources when the size of the list is thought to be a problem. But this relationship has proven so difficult to substantiate that health economists (and others) have cast around for ‘secondary hypotheses’ which better fit the available data.

Formula (3), $E - A = C^{\text{now}} - C^{\text{then}}$, indicates that the effect of admissions may be confounded by the effect of enrolments. The size of the list may swell despite an increase in the number of admissions, if enrolments exceed admissions; and the size of the list may shrink despite a reduction in the number of admissions, if admissions exceed enrolments.
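The balance between enrolments and admissions can be made concrete with a short sketch. It is an illustration only, not the author's method: the function names (`on_list`, `count_period`, `closing_size`, `change_in_size`), the use of day numbers in place of calendar dates, the boundary conventions for the two censuses, and the sample lifelines are all assumptions introduced here. The only figures taken from the chapter are the counts quoted for Fig. 2 (1 lifeline at the earlier census, 13 enrolled, 11 admitted, 3 at the later census).

```python
from typing import List, Optional, Tuple

# A lifeline is (start_day, end_day); end_day is None while the patient still waits.
# Day numbers stand in for calendar dates (an assumption made for brevity).
Lifeline = Tuple[int, Optional[int]]


def on_list(start: int, end: Optional[int], day: int) -> bool:
    """Assumed census rule: a lifeline is counted on `day` if start <= day < end."""
    return start <= day and (end is None or end > day)


def count_period(lifelines: List[Lifeline], then: int, now: int) -> Tuple[int, int, int, int]:
    """Return (c_then, enrolled, admitted, c_now) for the period (then, now]."""
    c_then = sum(on_list(s, e, then) for s, e in lifelines)
    c_now = sum(on_list(s, e, now) for s, e in lifelines)
    enrolled = sum(then < s <= now for s, _ in lifelines)
    admitted = sum(e is not None and then < e <= now for _, e in lifelines)
    return c_then, enrolled, admitted, c_now


def closing_size(c_then: int, enrolled: int, admitted: int) -> int:
    """Size at the later census = opening size + enrolments - admissions."""
    return c_then + enrolled - admitted


def change_in_size(enrolled: int, admitted: int) -> int:
    """Change in size over the period = enrolments - admissions."""
    return enrolled - admitted


# Hypothetical lifelines, with censuses on day 1 and day 10.
sample = [(0, 5), (2, 4), (3, 12), (6, 9), (8, None), (11, 15)]
c_then, enrolled, admitted, c_now = count_period(sample, then=1, now=10)
assert c_then + enrolled == admitted + c_now  # starts equal stops
assert closing_size(c_then, enrolled, admitted) == c_now

# Counts quoted in the text for Fig. 2.
print(closing_size(c_then=1, enrolled=13, admitted=11))  # 3

# Hypothetical follow-on periods showing how enrolment confounds admissions.
print(change_in_size(enrolled=15, admitted=12))  # 3: more admissions, list still swells
print(change_in_size(enrolled=8, admitted=10))   # -2: fewer admissions, list still shrinks
```

Treating the period as the half-open interval (then, now] is one way to make sure every start and every end falls in exactly one period, so that counts from consecutive periods can be chained without double counting. Other conventions are possible, provided the same one is applied to all four counts.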
If we subtract Cthen from both sides of formula The literature includes a number of studies
(2), we obtain in which investigators found little evidence of
an inverse relationship between the number of
admissions and changes in the size of the list
E A ¼ Cnow Cthen : (3) (Fowkes et al. 1983; Goldacre et al. 1987; Niinimäki
1991; Harvey et al. 1993; Nordberg et al. 1994;
Formula (3) indicates that it is the net in-flow Newton et al. 1995). Some investigators were
(E A) which determines the direction and unwilling to surrender the hypothesis. They pre-
extent of any change in stock (Cnow Cthen). ferred to attribute the results to the effect of
If we add A to both sides of formula (3), we enrolments and were prepared to infer a pattern
obtain of variation in the number of enrolments consis-
tent with their data. But few have assembled the
E ¼ A þ Cnow Cthen : (4) information needed to test this surmise. Few sup-
plied their readers with a qualitative judgment as
Formula (4) has been used to estimate the to whether the number of enrolments was fixed or
number of enrolments when the relevant count is varying, and few supplied quantitative data such
not available (Naylor et al. 1997). as the numbers enrolled from one period to the
If we subtract E from both sides of formula (4), next (White 1980; Newton et al. 1995; Street and
we obtain Duckett 1996; Armstrong 2000; NAO 2001b;
Moral and de Pancorbo 2001; House of Commons
Health Committee 2010; Armstrong 2010;
0 ¼ A þ Cnow Cthen E: (5)
Kreindler and Bapuji 2010).
We cannot use the relationship implied by for-
If the four counts enumerate the same set of
mula (4) to obtain estimates of the numbers
lifelines and are accurate, formula (5) gives an
enrolled and then use these estimates to test the
‘error of closure’ of zero (Newell 1988).
relationship implied by any of the other formulae.
If we subtract Cnow from both sides of formula
The estimate is necessarily consistent with the count
(1), we obtain
of admissions and with changes in size under
formula (4) and therefore cannot provide indepen-
Cthen þ E Cnow ¼ A: (6) dent verification of another version of the formula,
e.g., formula (3). So the authors of these studies
It is often asserted that the size of the list ought assert that it is not possible to evaluate the effect of
to shrink if there is an increase in the number of admissions without making some allowance for the
effect of enrolment and find that they are unable to test the primary hypothesis having failed to collect the necessary data.

Other investigators chose to infer that the relationship did not have the form hypothesized when they found no evidence of an inverse relationship between the number of admissions and changes in the size of the list (Feldstein 1967; Culyer and Cullis 1976; Snaith 1979; Buttery and Snaith 1980; Kenis 2006).

Culyer and Cullis (1976, 244) invoked “Say’s Law of Hospitals . . . that additions to the supply of inpatient capacity create equal additions to the demand for that capacity.” “[A]s the price of a good or service is lowered . . . the quantity demanded in any . . . period . . . will rise” (p. 244), so the authors hypothesized that an increase in “throughput capacity” would be accompanied by a fall in “mean waiting time” and “time price” which “would lead one to expect demand to increase” (p. 247). As a result, the authors expected “a positive relation between throughput capacity and waiting lists” (p. 247), and they reported significant direct correlations between these two variables for seven (out of 15) “hospital regions” over time (p. 249). Under this hypothesis, an increase in the number of admissions is about the worst line a policy-maker can take if his/her goal is to reduce the size of the list.

Buttery and Snaith (1980, 58) hypothesized “a self-regulating system . . . in which waiting times for patients and waiting lists per surgeon are relatively constant” (Feldstein 1967; Smethurst and Williams 2002). They thought that “[w]aiting times must provide a constraint on unmet need, preventing patients from coming forward or surgeons from putting them on their waiting lists if they do.” In other words, the number of enrolments shrank to counter any increase in wait as a result of any decrease in the number of admissions; and the number of enrolments swelled in response to any decrease in wait as a result of any increase in the number of admissions. They understood that if such a system exists, a “further increase in the number of surgeons will further increase the national waiting list” and “a diminution [will] reduce it.”

Culyer and Cullis (1976) imagine enrolments driven by admissions, i.e., ‘demand’ for a service induced by its ‘supply,’ and Buttery and Snaith (1980) imagine enrolments constrained by the length of wait. The former list is driven by the public’s appetite for consumption and the latter by the clinicians’ desire to limit their commitments. But the mechanisms and the outcomes envisaged are the same. An increase in the number of admissions is thought to reduce the length of wait, a reduction in the length of wait is thought to increase the number of enrolments, and an increase in the number of enrolments is thought to increase the size of the list. It is change in the number of admissions which is thought to evoke change in the number of enrolments in both instances, and both hypotheses predict a direct correlation between admissions and enrolments.

Enrolments are thought to contribute to self-regulation and to supplier-induced demand, but neither pair of investigators assembled data which allowed them to confirm whether the number of enrolments was fixed or to establish whether any variation had the pattern hypothesized. Instead, they assume relationships which are consistent with the primary hypothesis. They expect the size of the list to swell if there has been an increase in the number of enrolments, and they expect the size of the list to shrink if there has been a decrease in the number of enrolments.

The authors of most of these investigations dismissed the primary hypothesis without fair trial. It would have been reasonable to restrict attention to the effect of admissions on size – and to draw conclusions accordingly – when the effect of enrolment had been given no thought. It would also have been reasonable to restrict attention to the effect of admissions on size when counts of enrolments were thought to be unvarying. But it was not reasonable to dismiss the primary hypothesis without attempting to adjust for enrolments (formula 3) once the effect had been surmised and the variation acknowledged.

The relationship between the change in the size of the list and the balance of enrolments and admissions over a period is simple, not complex; exact, not approximate; and mathematical, not behavioral. The relationship is not affected by
the location of the delay (outpatient or inpatient/day case) or its cause (assessment, investigation, or treatment); by diagnosis or procedure; by clinician, specialty, or provider; or by any other classification of the lifelines. So the seeming variety of our case studies contributes a little color to the account but adds nothing to the veracity of the argument presented in formulae (1) through (5).

An increase in the number of admissions may either make matters better, under the primary hypothesis, or else may make matters worse, under the secondary hypothesis. The two views have influenced the thinking of contemporary commentators (Carvel 2004) and policy-makers but are contradictory. We wish to establish which studies provide relevant data, whether the results are trustworthy or suspect and whether they confirm or refute the primary hypothesis.

The literature is clogged with citations. So, in this chapter, we have given the floor to studies which use empirical data to explore the effect of admission on the size of the list while allowing for the effect of enrolment. We have not knowingly omitted any study which offers relevant data. We are widely read, and we have used reviews (Faulkner and Frankel 1993; Sanmartin et al. 1998; Hurst and Siciliani 2003; Finn 2004; Kreindler 2010) and reference lists (Harvey et al. 1993) to identify potentially relevant material. We are looking for one, well-substantiated, exception to the rule, a set of data which invalidates formula (3). But the literature is extensive, and we have not had time to run a systematic search of our own. Nevertheless, it should be easy to find examples given the eagerness with which alternative hypotheses have been adopted and the primary hypothesis dismissed.

Why Does the Waiting List Shrink (or Swell)? The Primary Hypothesis

A number of authorities claim that it is the balance of enrolments and admissions which determines whether there is an increase (or a decrease) in the size of a waiting list (DHSS 1981a; Sykes 1986; Naylor 1991; Worthington 1991; Street and Duckett 1996; Hanning and Lundström 1998; Torkki et al. 2002; NWTU 2003; National Audit Office Wales 2005; Kenis 2006; Kreindler 2010). They represent a variety of stakeholders, e.g., clinicians (DHSS 1981a; Sykes 1986; Naylor 1991; Hanning and Lundström 1998; Torkki et al. 2002), managers (Worthington 1991; Street and Duckett 1996; Hanning and Lundström 1998; Kenis 2006), and policy-makers (DHSS 1981a; NWTU 2003); and they represent a variety of paradigms, e.g., health economics (Street and Duckett 1996; Hanning and Lundström 1998), organizational science (Kenis 2006), and system dynamics (Worthington 1991). We thought it remarkable that they should all agree: when a consensus is the result of independent (and rigorous) evaluation by various stakeholders and different disciplines, their agreement adds weight to the evidence. But independent (and rigorous) evaluation is not the only way in which we reach a consensus. Some authors also claim that the relationship (formula 3) has never been observed in practice (Culyer and Cullis 1976). Given that the proposition of a second hypothesis implies the failure of the first, it is perhaps not surprising that this view is one echoed by many of those who have contributed to this literature. So the variety of the stakeholders provides no assurance of the independence of their judgment if it is the failure of the first hypothesis on which they are all agreed; and the variety of the paradigms provides us with no assurance of the rigor of their evaluation if the failure of the first hypothesis is assumed by each approach. We begin this chapter by conducting a fresh assessment of claims that the primary hypothesis has failed.

What Happens to Enrolment and Admission in a Waiting List Initiative?

Very few researchers have reported the number of enrolments. We know of only seven instances. Four appeared in print having been subject to peer review (White 1980; Street and Duckett 1996; Armstrong 2000, 2010), and three are contributions to the grey literature, two of which are in the public domain (Hamblin et al. 1998; Moral and de Pancorbo 2001) and the other of which is not (Kreindler and Bapuji
2010). This observation indicates that most researchers have not examined the effect of the balance of enrolments and admissions on the size of a waiting list and implies that the validity of the first hypothesis has not been widely evaluated.

Many researchers believe that an increase in the numbers admitted from a waiting list ought to be accompanied by a decrease in its size (Culyer and Cullis 1976; Goldacre et al. 1987; Naylor 1991). They rarely declare that enrolment may be a factor (Newton et al. 1995) or state that the number of enrolments is assumed to be fixed and unvarying, i.e., stationary (MoH 1963a). Instead, the effect of enrolment is conceded only when researchers are obliged to explain how an increase in the numbers admitted from a waiting list appears to have been accompanied by an increase in its size (MoH 1964; Goldacre et al. 1987; Hamblin et al. 1998; Sanmartin et al. 1998). (The size of the list and the numbers admitted may both increase yet still be consistent with the primary hypothesis if enrolment exceeded admission.) But at the close of their investigations, these researchers are unable to substantiate the claims they wish to make because they neither eliminated the variation in enrolments nor did they collect counts which would have allowed them to adjust for it. The limitations of these studies have not been sufficiently appreciated. Their results have nothing to contribute to our understanding of the relationship between enrolments, admissions, and the size of the list. Their discussions have nothing to contribute to our methodology because they fail to acknowledge that variation in enrolment confounds the apparent relationship between admissions and the size of the list (Newton et al. 1995).

Some analysts anticipate that an increase in expenditure, intended to increase the numbers admitted from the waiting list, ought to reduce its size. So the Irish Minister for Health and Children authorized the expenditure of an additional €246 million between 1993 and 2002 (Purcell 2003), on the understanding that this would buy substantial numbers of additional elective procedures and, as a consequence, would reduce the numbers who had been on the list for a long time. The Comptroller and Auditor General found invoices for work in the private sector commissioned by the hospitals he visited but, apart from these (Purcell 2003), was obliged to confess that “it is difficult to verify the reported number of procedures carried out under the [waiting list initiative] and to ascertain the extent to which they are over and above core-funded activity” (Purcell 2003, 26). (© Government of Ireland 2003.) He also reported that “the overall level of elective inpatient treatment . . . fell between 1998 and 2001,” which “suggests that the Initiative did not result in an increase in elective inpatient activity over and above existing levels” (Purcell 2003, 28). (© Government of Ireland 2003.) It is possible that the fault was due to a failure in the system of bookkeeping, that financial control mechanisms were blameless (Purcell 2003), and that additional activity failed to have the effect desired. But before accepting that this is the case, we would like to know whether the funds awarded to each hospital were apportioned in line with the intended contribution of each department and whether these additional resources appeared under the appropriate budget headings in time to pay for the activity planned. If a study does not quantify the effect of the additional expenditure on elective admission, we doubt its ability to provide empirical evidence about the effect of elective admission (or additional expenditure) on the size of the list. Newton et al. (1995, 784) report that “[i]n only six [out of 44 waiting list initiatives] was additional funding followed by a rise in admissions and a fall in list size.”

Some commentators believe that an increase in the amount of a resource, particularly one thought to be in critically short supply, ought to reduce the size of the waiting list (DHSS 1975, 1981a). Our evaluation of the effect of an increase in such a resource proceeds along the same lines as our evaluation of the effect of an increase in expenditure. We expect a reduction in the size of the list only when the number of admissions exceeds the number of enrolments. We therefore want to know what effect the increase in resources had on the number of admissions (and on the number of enrolments). This data is the minimum required for any evaluation, and the effect of a waiting list initiative on the size of the list
cannot be established without it. Regrettably, this information has rarely been assembled for the benefit of readers (Hamblin et al. 1998).

Does Size Shrink if Admission Exceeds Enrolment (and Does Size Swell if Enrolment Exceeds Admission)?

The veracity of the primary hypothesis can only be tested by studies which report the number of enrolments alongside the number of admissions and changes in the size of the list. These studies are therefore rather more important than has hitherto been recognized.

In South Glamorgan, Wales

White (1980) reports a study of the combined elective activity of three or four consultants in one surgical specialty (unspecified) at one public hospital in South Glamorgan, Wales. He thinks that the size of the outpatient waiting list ought to have something to do with the number of new outpatients seen and the number of GP referrals (White 1980), and he uses column charts to explore the relationship. He seems also to have thought that the size of the inpatient waiting list might have something to do with the number of discharges (or deaths) and the number of new outpatients seen.

White (1980) does not make full use of his data. So he looks for a relationship between the size of the outpatient waiting list and new outpatients seen and between the size of the outpatient waiting list and GP referrals, but he does not consider the combined effect on the size of the outpatient waiting list of new outpatients seen and GP referrals. Worse, he looks for a relationship between the size of the inpatient waiting list and deaths and discharges, but he does not consider even the univariate effect of new outpatients seen.

White (1980) does not reason correctly from the data he has assembled. He is not satisfied with the relationship he observes between the size of the outpatient waiting list and the number of new outpatients seen, but he does not express the same dissatisfaction with the relationship he observes between the size of the inpatient waiting list and discharges (or deaths). In the former instance, he attributes inconsistency to those who walk-in without having been entered on the list: in other words, he understands that not every new outpatient seen represented a unit reduction in the numbers waiting (White 1980). In the latter, he does not acknowledge the difference between the number of discharges (or deaths) and the number of elective admissions or between the number of decisions to admit to the list and the number of new outpatients seen. Instead, he expresses himself satisfied that “[f]ewer deaths and discharges in the specialty coincide with a lower in-patient waiting list” which “indicates that long in-patient waiting lists combine with greater in-patient activity” (White 1980, 274).

White (1980) uses surrogate measures to describe activity over 15 periods each of 3 months duration. He counts GP referrals rather than all referrals and referrals received rather than referrals accepted. He counts new outpatients booked rather than decisions to admit to the inpatient waiting list, and he counts discharges (or deaths) rather than elective admissions. Now, if it is the balance of enrolments and admissions which determines whether there is an increase (or a decrease) in the size of a waiting list, we would expect $E - A = C^{\text{now}} - C^{\text{then}}$ (3). But 2.39 new outpatients were booked per discharge (or death). Therefore, where E represents new outpatients booked, A represents discharges (or deaths), and C represents the size of the inpatient waiting list, we do not expect $E - A$ to exactly equal $C^{\text{now}} - C^{\text{then}}$ (Table 1, right-hand side). However, if the surrogate measures have the effect anticipated on the size of the waiting list, we would expect a direct correlation between the two sides of formula (3), i.e., $E - A \propto C^{\text{now}} - C^{\text{then}}$.

We obtained the quantities $E - A$ and $C^{\text{now}} - C^{\text{then}}$ by calculation from counts charted by White (1980), and we used Spearman’s rho to assess the direction and strength of association between them. (We used the number of “deaths and
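Table 1 Was the change in size directly correlated with the balance of enrolments and admissions in South Glamorgan, Wales?

Waiting for out-patient assessment

| Year | Qtr | No. of 'GP referrals', E [3] | No. of 'new out-patients booked', A [4] | Size of 'out-patient waiting list', $C^{\text{now}}$ [6] | Net in-flow, $E - A$ [9] | Change in stock, $C^{\text{now}} - C^{\text{then}}$ [10] |
|---|---|---|---|---|---|---|
| 1976 | 1 | 1,076 | 1,109 | 712 | −33 | |
| 1976 | 2 | 1,112 | 923 | 813 | 189 | 101 |
| 1976 | 3 | 1,197 | 1,296 | 495 | −99 | −318 |
| 1976 | 4 | 1,028 | 1,117 | 105 | −89 | −390 |
| 1977 | 1 | 1,080 | 826 | 346 | 254 | 241 |
| 1977 | 2 | 1,108 | 583 | 857 | 525 | 511 |
| 1977 | 3 | 1,044 | 615 | 1,126 | 429 | 269 |
| 1977 | 4 | 1,068 | 575 | 1,360 | 493 | 234 |
| 1978 | 1 | 1,020 | 672 | 1,649 | 348 | 289 |
| 1978 | 2 | 1,229 | 555 | 2,076 | 674 | 427 |
| 1978 | 3 | 1,092 | 704 | 2,084 | 388 | 8 |
| 1978 | 4 | 1,205 | 1,385 | 1,489 | −180 | −595 |
| 1979 | 1 | 1,237 | 1,381 | 1,247 | −144 | −242 |
| 1979 | 2 | 876 | 583 | 1,505 | 293 | 258 |
| 1979 | 3 | 1,036 | 704 | 1,505 | 332 | 0 |

Waiting for in-patient admission

| Year | Qtr | No. of 'new out-patients booked', E [3] | No. of 'discharges and deaths', A [4] | Size of 'in-patient waiting list', $C^{\text{now}}$ [6] | Net in-flow, $E - A$ [9] | Change in stock, $C^{\text{now}} - C^{\text{then}}$ [10] |
|---|---|---|---|---|---|---|
| 1976 | 1 | 1,109 | 363 | 290 | 746 | |
| 1976 | 2 | 923 | 455 | 271 | 468 | −19 |
| 1976 | 3 | 1,296 | 412 | 237 | 884 | −34 |
| 1976 | 4 | 1,117 | 451 | 233 | 666 | −4 |
| 1977 | 1 | 826 | 423 | 264 | 403 | 31 |
| 1977 | 2 | 583 | 354 | 195 | 229 | −69 |
| 1977 | 3 | 615 | 377 | 231 | 238 | 36 |
| 1977 | 4 | 575 | 385 | 222 | 190 | −9 |
| 1978 | 1 | 672 | 346 | 207 | 326 | −15 |
| 1978 | 2 | 555 | 344 | 114 | 211 | −93 |
| 1978 | 3 | 704 | 321 | 152 | 383 | 38 |
| 1978 | 4 | 1,385 | 300 | 222 | 1,085 | 70 |
| 1979 | 1 | 1,381 | 247 | 237 | 1,134 | 15 |
| 1979 | 2 | 583 | 358 | 203 | 225 | −34 |
| 1979 | 3 | 704 | 323 | 217 | 381 | 14 |

Source: White (1980)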
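The Spearman correlations quoted in the text for Table 1 can be checked directly from the tabulated counts. The following is a minimal sketch in Python, not the author's own procedure: it assumes the counts exactly as printed in Table 1, drops the first quarter (for which no change in stock can be calculated, leaving n = 14), and computes rho as the Pearson correlation of average ranks. All function and variable names are illustrative.

```python
from math import sqrt

# Counts transcribed from Table 1 (White 1980): 15 quarters, 1976 Q1 to 1979 Q3.
gp_referrals = [1076, 1112, 1197, 1028, 1080, 1108, 1044, 1068,
                1020, 1229, 1092, 1205, 1237, 876, 1036]
new_outpatients = [1109, 923, 1296, 1117, 826, 583, 615, 575,
                   672, 555, 704, 1385, 1381, 583, 704]
outpatient_list = [712, 813, 495, 105, 346, 857, 1126, 1360,
                   1649, 2076, 2084, 1489, 1247, 1505, 1505]
discharges_deaths = [363, 455, 412, 451, 423, 354, 377, 385,
                     346, 344, 321, 300, 247, 358, 323]
inpatient_list = [290, 271, 237, 233, 264, 195, 231, 222,
                  207, 114, 152, 222, 237, 203, 217]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def flow_and_stock(enrolments, admissions, sizes):
    """Net in-flow (E - A) and change in stock for periods 2..n."""
    net_in_flow = [e - a for e, a in zip(enrolments[1:], admissions[1:])]
    change_in_stock = [now - then for then, now in zip(sizes, sizes[1:])]
    return net_in_flow, change_in_stock


rho_out = spearman_rho(*flow_and_stock(gp_referrals, new_outpatients, outpatient_list))
rho_in = spearman_rho(*flow_and_stock(new_outpatients, discharges_deaths, inpatient_list))
print(f"outpatient list: rho = {rho_out:+.2f}")
print(f"inpatient list:  rho = {rho_in:+.2f}")
```

Under these assumptions the sketch should return values close to the +0.82 and +0.46 reported in the text. It does not compute p-values.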
discharges” (White 1980, Fig. 14) as our surrogate for the number of admissions from the inpatient waiting list, and we used the number of “new out-patients booked” (White 1980, Fig. 8) as our surrogate for the number of enrolments on it. We used the number of “new out-patients booked” (White 1980, Fig. 8) as our surrogate for the number of admissions from the outpatient waiting list, and we used the number of “GP referrals” (White 1980, Fig. 2) as our surrogate for the number of enrolments on it.)

We found that the correlation for the inpatient waiting list – while not statistically significant – had the direction anticipated (Spearman’s rho = +0.46, n = 14, p = 0.10) and that the correlation for the outpatient waiting list was strong and statistically significant as well as having the direction anticipated (Spearman’s rho = +0.82, n = 14, p < 0.01). White’s data is compatible with formula (3) and the primary hypothesis – an increase in ‘admission’ (net of ‘enrolment’) may accompany a reduction in the size of the list. It is noteworthy that the author overlooked the effect of enrolment despite assembling relevant data when trying to identify “[w]hat [f]actors [i]nfluence [the size of w]aiting [l]ists” (White 1980, 270).

In INSALUD, Spain

Moral and de Pancorbo (2001) report a study of the combined elective activity in six surgical specialties funded by INSALUD, Spain. When the initiative began, the waiting list comprised orthopedics (27%), general surgery (21%), ophthalmology (17%), ENT surgery (10%), urology (7%), gynecology (6%), and other specialties (12%).

Unlike White (1980), the authors do not report the inputs and outputs at some distance from the waiting list down the referral pathway. Instead, they count “entries” to the list, i.e., that number by which the count of patients on the list ought to swell; and they count “exits” from the list, i.e., that number by which the count of patients on the list ought to shrink. Moral and de Pancorbo (2001, 48) use counts of “entries” and “exits” to describe activity over four periods each of 12 months duration.

Moral and de Pancorbo (2001) emulate the approach modeled by White (1980). They look for a relationship between the size of the list and the number of “exits”, but they do not consider the effect of the number of “entries”. We obtained the quantities $E - A$ and $C^{\text{now}} - C^{\text{then}}$ from counts charted by Moral and de Pancorbo (2001), and we
used Spearman’s rho to assess the direction and strength of association between them (Spearman’s rho = +1.00, n = 4, p < 0.01).

The error of closure reports the difference between the number of dates of entry and the number of dates of exit (formula (1)), often as a percentage of all of those eligible for admission during the period of interest, i.e.,

$$\text{error of closure (\%)} = 100 \times \frac{C^{\text{then}} + E - (A + C^{\text{now}})}{(C^{\text{then}} + E + A + C^{\text{now}})/2}. \tag{5.1}$$

The initiative was associated with a reduction in the size of the list in its early years (in 1997 and 1998, according to $E - A$ and to $C^{\text{now}} - C^{\text{then}}$). But if we are prepared to credit the initiative with success in its early years – claiming that the initiative reduced the size of the list (Hanning and Lundström 1998) – we should also be prepared to credit it with failure in its later years – acknowledging that the initiative increased the size of the list (in 1999 and 2000, according to net in-flow and to change in stock).

The number of dates of entry ($C^{\text{then}} + E$) did not equal the number of dates of exit ($A + C^{\text{now}}$): the difference ranges from a shortfall of 15,148 to a surplus of +2009. Although these differences are small, a little less than 2.5% when compared with the number of lifelines enumerated over the period, they should not occur and require some attempt at explanation. If there were a systematic error in one of the three counts, we would expect the direction and the extent of the error to be consistent.

• If the number awaiting surgery is always overreported, e.g., by a factor of 1.05, the apparent change from one census to the next will correctly indicate whether the size of the list decreased or increased, but the size of the apparent change will be exaggerated by a factor of 1.05. If this were the only source of error, $C^{\text{now}} - C^{\text{then}}$ would always be greater than $E - A$ but would have the same direction.
• If the number of entries is always overreported, e.g., by a factor of 1.05, then $E - A$ will always be too positive. When the size of the list is increasing, $E - A$ will maximize the amount, and when the size of the list is decreasing, $E - A$ will minimize the amount sometimes to the extent of reporting an increase in size where there has been a decrease.
• If the number of exits is always overreported, e.g., by a factor of 1.05, then $E - A$ will always be too negative. When the size of the list is increasing, $E - A$ will minimize the amount sometimes to the extent of reporting a decrease in size where there has been an increase, and when the size of the list is decreasing, $E - A$ will maximize the amount.

Unfortunately, none of these scenarios fit Table 2 in which $E - A$ is more negative than $C^{\text{now}} - C^{\text{then}}$ in the first and second periods, less positive in the third, and more positive in the fourth. This implies either that there is systematic error in more than one count or that the error is not systematic.

There are several problems with the counts available. We have not been able to reconcile the
Table 2 Did the balance of “entries” and “exits” adequately account for the change in size in INSALUD, Spain?

Waiting for Admission

| Year | No. of “entries”, E [3] | No. of “exits”, A [4] | Size of list, $C^{\text{now}}$ [6] | Net in-flow, $E - A$ [9] | Change in stock, $C^{\text{now}} - C^{\text{then}}$ [10] | Counting dates of entry, $C^{\text{then}} + E$ [11] | Counting dates of exit, $A + C^{\text{now}}$ [12] | Error of closure: difference [13] | Error of closure (%) [14] |
|---|---|---|---|---|---|---|---|---|---|
| 30-Jun-96 | | | 190,000 | | | | | | |
| 31-Dec-96 | | | 165,735 | | −24,265 | | | | |
| 31-Dec-97 | 445,816 | 478,452 | 148,247 | −32,636 | −17,488 | 611,551 | 626,699 | −15,148 | −2.45 |
| 31-Dec-98 | 489,331 | 509,414 | 132,221 | −20,083 | −16,026 | 637,578 | 641,635 | −4,057 | −0.63 |
| 31-Dec-99 | 557,950 | 552,929 | 141,827 | 5,021 | 9,606 | 690,171 | 694,756 | −4,585 | −0.66 |
| 31-Dec-00 | 616,527 | 598,117 | 158,228 | 18,410 | 16,401 | 758,354 | 756,345 | 2,009 | 0.27 |

Source: Moral and de Pancorbo 2001
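The error-of-closure columns of Table 2 can be recomputed from the counts of entries, exits, and list size. The sketch below is illustrative only: it assumes the counts as printed in Table 2, takes the denominator of the percentage to be the mean of the two tallies (dates of entry and dates of exit), and uses hypothetical names throughout.

```python
# Counts transcribed from Table 2 (Moral and de Pancorbo 2001).
# Each row: (year, entries E, exits A, size of list at year end C_now).
rows = [
    (1997, 445_816, 478_452, 148_247),
    (1998, 489_331, 509_414, 132_221),
    (1999, 557_950, 552_929, 141_827),
    (2000, 616_527, 598_117, 158_228),
]
size_at_start = 165_735  # size of the list on 31-Dec-96 (C_then for 1997)


def error_of_closure(c_then, entries, exits, c_now):
    """Return (difference, percentage) between the two tallies of lifelines."""
    dates_of_entry = c_then + entries   # everyone eligible, counted by start
    dates_of_exit = exits + c_now       # everyone eligible, counted by stop
    difference = dates_of_entry - dates_of_exit
    percent = 100 * difference / ((dates_of_entry + dates_of_exit) / 2)
    return difference, percent


results = []
c_then = size_at_start
for year, entries, exits, c_now in rows:
    difference, percent = error_of_closure(c_then, entries, exits, c_now)
    results.append((year, difference, round(percent, 2)))
    c_then = c_now  # this year's closing size opens the next period

for year, difference, percent in results:
    print(f"{year}: difference = {difference:+,d}, error of closure = {percent:+.2f}%")
```

With these inputs the differences and percentages should match columns [13] and [14] of Table 2.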
heights of the columns representing exits from the target population with the numbers reported in the text. (This undermines our confidence in the authors’ presentation of their data.) The chart records the suspiciously tidy 190,000 as the size of the list in June 1996, whereas the text reports a count of 168,265. (We have chosen to tabulate the numbers obtained from the chart which provides information on entries as well as exits.) As a consequence, we report a change in stock of −24,265 rather than of −2530, but this affects neither the correlation nor the error of closure. More importantly, we have had to read the counts of entries and exits off the printed version of the column chart. We enlarged this so that 1 mm represented 1674 patients on the vertical axis instead of 9412. The errors of closure are therefore equivalent to heights of 9.0, 2.4, 2.7, and 1.2 mm for the periods 1997, 1998, 1999, and 2000, respectively. If a measurement may be out by as much as 1.0 mm, then $E - A$ and $C^{\text{now}} - C^{\text{then}}$ may be out by as much as 2.0 mm, and the error of closure by as much as 4.0 mm. So one of these differences is not trivial. The waiting list initiative claims to have funded an additional 35,883 surgical procedures in 1997, when it recorded 15,148 too many exits (or too few entries) for the change in size observed.

The data provided by Moral and de Pancorbo (2001) is compatible with formula (3) and the primary hypothesis – an increase in “exits” (net of “entries”) may accompany a reduction in the size of the list.

In England

Hamblin et al. (1998) tabulate counts which describe activity over six periods each of 12 months duration and invite their readers to examine “[t]he effects of the Waiting Time Initiative” (1998, 13). They supply three different counts of ‘enrolments,’ two different counts of ‘admissions,’ and a count of the numbers awaiting elective admission on a day-case, or an inpatient, basis. When we used their counts of “[s]pecialist referring . . . with no date” as a measure of enrolment, and “[w]aiting list episodes” as a measure of admission, we obtained a perfect correlation between the change in stock and the net in-flow (Table 3a: Spearman’s rho = +1.00, n = 5, p < 0.01).

Similarly, when we used “[t]otal elective episodes” as a measure of admission, and combined “[s]pecialist referring . . . with no date” and “[s]pecialist referring . . . with date” as a measure of enrolment, we obtained a perfect correlation between the change in stock and the net in-flow (Table 3b: Spearman’s rho = +1.00, n = 5, p < 0.01).

We consider this result suspicious although it is everything we are looking for. If ‘enrolments,’
Table 3a Did the balance of “[s]pecialist referring” and “episodes” adequately account for the change in size in England?

Waiting for Admission to hospital

| Year-end | 'Specialist referring to waiting list with no date', E [3] | 'waiting list episodes', A [4] | 'waiting list size', $C^{\text{now}}$ [6] | Net in-flow, $E - A$ [9] | Change in stock, $C^{\text{now}} - C^{\text{then}}$ [10] | Counting dates of entry, $C^{\text{then}} + E$ [11] | Counting dates of exit, $A + C^{\text{now}}$ [12] | Error of closure: difference [13] | Error of closure (%) [14] | $\hat{E}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1989/90 | 2,189,437 | 2,163,709 | 912,800 | 25,728 | | | 3,076,509 | | | * |
| 1990/91 | 2,094,683 | 2,101,089 | 906,394 | −6,406 | −6,406 | 3,007,483 | 3,007,483 | 0 | 0.00 | 2,094,683 |
| 1991/92 | 2,261,086 | 2,251,873 | 915,607 | 9,213 | 9,213 | 3,167,480 | 3,167,480 | 0 | 0.00 | 2,261,086 |
| 1992/93 | 2,362,393 | 2,283,026 | 994,974 | 79,367 | 79,367 | 3,278,000 | 3,278,000 | 0 | 0.00 | 2,362,393 |
| 1993/94 | 2,455,038 | 2,384,643 | 1,065,369 | 70,395 | 70,395 | 3,450,012 | 3,450,012 | 0 | 0.00 | 2,455,038 |
| 1994/95 | 2,493,649 | 2,514,977 | 1,044,041 | −21,328 | −21,328 | 3,559,018 | 3,559,018 | 0 | 0.00 | 2,493,649 |

Source: Hamblin et al. 1998

*The authors were able to enter a value for 1989/90 in column 3, but we were unable to provide an estimate of the value for
1989/90 in the last column on the right. This suggests that the authors knew the ‘waiting list size’ for 1988/89 but opted
not to report it.
Table 3b Did the balance of “[s]pecialist referring” and “episodes” adequately account for the change in size in England?

Waiting for Admission to hospital

| Year-end | 'Specialist referring to ... with no date' or 'with date', E [3] | 'Total elective episodes', A [4] | 'waiting list size', $C^{\text{now}}$ [6] | Net in-flow, $E - A$ [9] | Change in stock, $C^{\text{now}} - C^{\text{then}}$ [10] | Counting dates of entry, $C^{\text{then}} + E$ [11] | Counting dates of exit, $A + C^{\text{now}}$ [12] | Error of closure: difference [13] | Error of closure (%) [14] | $\hat{E}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 1989/90 | 3,361,737 | 3,336,009 | 912,800 | 25,728 | | | 4,248,809 | | | * |
| 1990/91 | 3,288,594 | 3,295,000 | 906,394 | −6,406 | −6,406 | 4,201,394 | 4,201,394 | 0 | 0.00 | 3,288,594 |
| 1991/92 | 3,684,057 | 3,674,844 | 915,607 | 9,213 | 9,213 | 4,590,451 | 4,590,451 | 0 | 0.00 | 3,684,057 |
| 1992/93 | 3,914,759 | 3,835,392 | 994,974 | 79,367 | 79,367 | 4,830,366 | 4,830,366 | 0 | 0.00 | 3,914,759 |
| 1993/94 | 4,065,606 | 3,995,211 | 1,065,369 | 70,395 | 70,395 | 5,060,580 | 5,060,580 | 0 | 0.00 | 4,065,606 |
| 1994/95 | 4,139,168 | 4,160,496 | 1,044,041 | −21,328 | −21,328 | 5,204,537 | 5,204,537 | 0 | 0.00 | 4,139,168 |

Source: Hamblin et al. 1998

*The authors were able to enter a value for 1989/90 in column 3, but we were unable to provide an estimate of the value for
1989/90 in the last column on the right. This suggests that the authors knew the ‘waiting list size’ for 1988/89 but opted not
to report it.
‘admissions,’ and ‘size’ had enumerated the same lifelines and if ‘admission’ was the inevitable and, therefore, the only outcome of ‘enrolment,’ we might hope for a perfect correlation and for errors of closure of zero. But Hamblin et al. (1998) present counts obtained from Hospital Episode Statistics alongside counts from the KH07 return, i.e., counts of the number of episodes of investigation or treatment alongside counts of people awaiting admission, and they omit to report counts of “removals other than admissions” (CRIR 1998, 3 of KH06). We are told that “specialists . . . may either refer with a date for admission (these patients are known as ‘booked admissions’) or . . . without a date – the true ‘waiting list’ admissions” (Hamblin et al. 1998, 13). But the distinction between “booked admissions” and “waiting list admissions” was made by Hospital Episode Statistics among finished consultant episodes, and the distinction between those “with a date” and those “with no date” was made by the KH07 return in its count of the number of patients awaiting admission. The KH06 return, which counted the number of “decisions to admit” to the list (and the number “admitted” and the number of “removals other than admissions” from it), made no such distinction (CRIR 1998, 3 of KH06).

We know the authors were prepared to fill the gaps in their table by calculation because they indicate that they have done so for two of the eight items. The numbers in the column headed Ê (on the right-hand side of Tables 3a and 3b) are estimates obtained using formula (4): $\hat{E} = A + C^{\text{now}} - C^{\text{then}}$. (The reader can check these by adding the content of columns 4 and 10 in each row. We cannot estimate the number enrolled during 1989/90 without the size of the list at the start of that financial year.) We think that the numbers tabulated as “[s]pecialist referrals . . . with no date” and “[s]pecialist referrals . . . with a date” are estimates rather than counts. If this is correct, then the numbers of ‘enrolments’ presented in Tables 3a and 3b were obtained by assuming that the counts of ‘enrolments,’ ‘admissions,’ and ‘size,’ are perfectly consistent. The results therefore cannot be used to test whether this is true. At best, the table presented by Hamblin et al. (1998) provides an example which shows how the three counts ought to be related were the primary hypothesis true (Mason 1976; Fordham 1987). At worst, the table presented by Hamblin et al. (1998) invites readers to imagine that this is what actually happened to ‘enrolments’ when counts of finished consultant episodes and of patients awaiting admission varied in the manner indicated.

In Victoria, Australia

Street and Duckett (1996) report a study of the combined elective activity of surgeons at public hospitals dealing with patients in categories 1–3 in Victoria, Australia. The authors feared that an
increase in elective procedures would increase the size of the list (Street and Duckett 1996, 4). They use counts of “additions” and “deletions” to describe activity over a single period of 12 months duration (31 July 1993 to 31 July 1994).

Street and Duckett (1996, 12) claim that “hospitals have achieved waiting list reduction in the face of increases in the number of elective surgery patients: the number of additions to the list is . . . offset by increases in the number of patients . . . deleted from the list. . ..” They report that the number of category 1 patients waiting shrank from 1298 on 31 July 1993 to 195 on 31 July 1994 and that the number of category 2 patients waiting shrank from 12,115 on 31 July 1993 to 8506 on 31 July 1994 (Street and Duckett 1996), and they present an intuitively helpful plot of the number of “additions” to, and the number of “deletions” from, the surgical waiting list each month (31 December 1991 to 31 July 1994) (Street and Duckett 1996). This appears to describe the movement of people on and off the combined waiting list, although this is not clearly stated in the text.

It is true that the size of the list has diminished, despite more additions to the list (85,259, 1 Aug 1993–31 Jul 1994 incl.) than in the previous year (77,820, 1 Aug 1992–31 Jul 1993). But the published data permit only a single comparison, i.e., of the change in size between 31 Jul 1993 and 31 Jul 1994, with the difference in additions and deletions over the intervening period. It is therefore not possible to assess the strength of association between change in stock and net in-flow. The error of closure is small (335, or 0.29%, of those on the list at any point during the year).

The authors were unable to verify the number of additions and deletions we obtained from their plot (Street and Duckett 1996) 20 years after its publication but kindly volunteered the additional census counts reported in column 6 of Table 4. This allows us to describe elective activity over 32 periods each of one calendar month duration. The correlation between the change in size and the balance of enrolments and admissions was positive, strong, and statistically significant (Spearman’s rho = +0.99, n = 32, p < 0.01). But the count of dates of entry (Cthen + E) did not equal the count of dates of exit (A + Cnow). If we ignore the grossest error, a shortfall of 2002 cases (6.70%) occurring in December 1991, the difference ranged from a shortfall of 105 (0.29%) to a surplus of +109 (+0.32%) cases and was less than 0.20% in 28 (out of 32) instances.

If a measurement may be out by as much as 0.5 mm, then E − A and Cnow − Cthen may be out by as much as 1.0 mm and the error of closure by as much as 2.0 mm. 28 out of 32 errors cannot be attributed to this level of inaccuracy in reading the number of “additions” and “deletions” off a scale of 1 mm per 37 cases. While Street & Duckett’s data may not be entirely consistent with formula (3) and the primary hypothesis, the difference between “additions” and “deletions” accounts very well for the change in size.

In Winnipeg, Canada

Kreindler and Bapuji (2010) report a study of the elective replacement of hips and knees in Winnipeg, Canada. Winnipeg Regional Health Authority thought that an increase in elective procedures ought to reduce the size of the list (Kreindler and Bapuji 2010). Kreindler and Bapuji (2010) use counts of “arrivals” and “departures” to describe activity over 11 periods each of 3 months duration. They emulate Street and Duckett (1996) in presenting a similarly helpful plot of the number of “arrivals” and the number of “departures” during each quarter (31 Mar 2005–31 Mar 2008) (Kreindler and Bapuji 2010) alongside a plot of the number of joints still awaiting surgery at the close of each month (31 Jan 2005–31 Jan 2008) (Kreindler and Bapuji 2010). They appreciate that they ought to count the arrival and the departure of joints if they are interested in the number of joints requiring surgery (Table 5) or count the arrival and the departure of people if they are interested in the number of people awaiting surgery.

The correlation between E − A and Cnow − Cthen was positive, strong, and statistically significant (Spearman’s rho = +0.90, n = 11, p < 0.01). But the number of dates of entry (Cthen + E) did not equal the number of dates of exit (A + Cnow):
Table 4 Did the balance of “additions” and “deletions” adequately account for the change in size in Victoria, Australia?
Waiting in Victoria, Australia
Columns, left to right: Year; Month-end; [3] No. of 'additions'; [4] No. of 'deletions'; [6] Size of list; [9] Net in-flow; [10] Change in stock; [11] Count of dates of entry; [12] Count of dates of exit; [13] Error of closure (difference); [14] Error of closure (%)
1992 31-Dec 5,988 4,574 26,323 1,414 3,416 28,895 30,897 −2,002 −6.70
31-Jan 4,946 4,686 26,563 260 240 31,269 31,249 20 0.06
29-Feb 6,397 6,248 26,757 149 194 32,960 33,005 −45 −0.14
31-Mar 6,490 6,527 26,689 -37 −68 33,247 33,216 31 0.09
30-Apr 5,671 6,322 26,025 −651 −664 32,360 32,347 13 0.04
31-May 6,118 6,545 25,539 −427 −486 32,143 32,084 59 0.18
30-Jun 6,136 6,136 25,532 0 −7 31,675 31,668 7 0.02
31-Jul 6,545 6,025 26,098 520 566 32,077 32,123 −46 −0.14
1993 31-Aug 6,192 5,969 26,299 223 201 32,290 32,268 22 0.07
30-Sep 6,322 6,360 26,206 −38 −93 32,621 32,566 55 0.17
31-Oct 6,564 6,322 26,463 242 257 32,770 32,785 −15 −0.05
30-Nov 6,564 5,541 27,436 1,023 973 33,027 32,977 50 0.15
31-Dec 6,601 4,426 29,634 2,175 2,198 34,037 34,060 −23 −0.07
31-Jan 5,002 5,002 29,671 0 37 34,636 34,673 −37 −0.11
28-Feb 6,471 6,471 29,776 0 105 36,142 36,247 −105 −0.29
31-Mar 7,271 6,955 30,121 316 345 37,047 37,076 −29 −0.08
30-Apr 6,341 6,694 29,827 −353 −294 36,462 36,521 −59 −0.16
31-May 6,192 6,899 29,088 −707 −739 36,019 35,987 32 0.09
30-Jun 7,085 7,550 28,618 −465 −470 36,173 36,168 5 0.01
31-Jul 7,215 7,122 28,745 93 127 35,833 35,867 −34 −0.09
1994 31-Aug 6,917 7,847 27,740 −930 −1,005 35,662 35,587 75 0.21
30-Sep 7,494 7,810 27,391 −316 −349 35,234 35,201 33 0.09
31-Oct 6,843 7,140 27,113 −297 −278 34,234 34,253 −19 −0.06
30-Nov 7,178 7,736 26,549 −558 −564 34,291 34,285 6 0.02
31-Dec 7,029 6,360 27,164 669 615 33,578 33,524 54 0.16
31-Jan 5,839 6,285 26,678 −446 −486 33,003 32,963 40 0.12
28-Feb 7,252 7,940 25,881 −688 −797 33,930 33,821 109 0.32
31-Mar 7,903 7,959 25,850 −56 −31 33,784 33,809 −25 −0.07
30-Apr 6,583 7,308 25,093 −725 −757 32,433 32,401 32 0.10
31-May 7,624 7,921 24,776 −297 −317 32,717 32,697 20 0.06
30-Jun 7,512 8,014 24,271 −502 −505 32,288 32,285 3 0.01
31-Jul 7,085 7,308 24,041 −223 −230 31,356 31,349 7 0.02
Source: Street and Duckett 1996
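
The derived columns of Table 4 follow from columns 3, 4, and 6 by simple arithmetic. The sketch below, in Python, recomputes them for the rows dated 31-Jan and 29-Feb that follow the first row of the table. The function name `closure` is illustrative. The percentage is taken on the mean of the entry and exit counts; this is an assumption inferred from the tabulated values, since this section does not state the denominator.

```python
def closure(e, a, c_then, c_now):
    """Recompute columns 9 to 14 of Table 4 for one period."""
    net_inflow = e - a                # column 9
    change_in_stock = c_now - c_then  # column 10
    entries = c_then + e              # column 11: count of dates of entry
    exits = a + c_now                 # column 12: count of dates of exit
    difference = entries - exits      # column 13: error of closure
    # Assumption: the percentage uses the mean of the two counts.
    percent = 100 * difference / ((entries + exits) / 2)
    return (net_inflow, change_in_stock, entries, exits,
            difference, round(percent, 2))


# Additions, deletions, and list sizes transcribed from Table 4.
january = closure(e=4_946, a=4_686, c_then=26_323, c_now=26_563)
february = closure(e=6_397, a=6_248, c_then=26_563, c_now=26_757)
print(january)   # (260, 240, 31269, 31249, 20, 0.06)
print(february)  # (149, 194, 32960, 33005, -45, -0.14)
```

The error of closure is also the net in-flow minus the change in stock, so it is zero exactly when the balance of additions and deletions accounts for the whole change in size.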
the error of closure ranged from a shortfall of 62 (1.70%) to a surplus of +62 (+1.58%) cases and was less than 1.00% in 7 (out of 11) instances.

Kreindler and Bapuji’s data is compatible with formula (3) and the primary hypothesis – an increase in “departures” (net of “arrivals”) may accompany a reduction in the size of the list. But we used a scale of 1 mm per 9.5 cases to estimate the size of the list and a scale of 1 mm per 6.5 cases to estimate the number of “arrivals” and “departures,” so nine out of 11 errors cannot be attributed to inaccuracy in reading the relevant plot.

When entry (Cthen + E) and exit (A + Cnow) dates are used to enumerate the same lifelines (Fig. 2), it is inconceivable that they give different counts. It is therefore reasonable to suspect the data when the counts appear inconsistent. Kreindler and Bapuji (2010, 76) recognized that their count of new “arrivals” might be considered inflated if admission was the only outcome of interest, so they calculated net “arrivals” (2005–2007) by deducting those “removed from the wait list without surgery” (2005–2007).

Kreindler and Bapuji (2010) may have deducted the number “removed” from the list during a 3 months period from the number known to have enrolled on the list in the same quarter. It is likely that some of those deducted in this fashion had enrolled earlier. If so, the net
Table 5 Was the change in size directly correlated with the balance of “arrivals” and “departures” in Winnipeg, Canada?
Waiting in Winnipeg, Canada
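Columns, left to right: Year; Month-end; [3] No. of 'arrivals'; [4] No. of 'departures'; [6] Size of list; [9] Net in-flow; [10] Change in stock; [11] Count of dates of entry; [12] Count of dates of exit; [13] Error of closure (difference); [14] Error of closure (%)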
2005 31-Jan 3,076
28-Feb 3,171
31-Mar 800 600 3,200 200 3,800
30-Apr 3,276
31-May 3,271
30-Jun 797 710 3,338 87 138 3,997 4,048 −51 −1.27
31-Jul 3,352
31-Aug 3,390
30-Sep 745 681 3,400 64 62 4,083 4,081 2 0.05
31-Oct 3,424
30-Nov 3,414
31-Dec 674 739 3,371 −65 −29 4,074 4,110 −36 −0.88
2006 31-Jan 3,352
28-Feb 3,271
31-Mar 679 892 3,190 −213 −181 4,050 4,082 −32 −0.79
30-Apr 3,133
31-May 3,062
30-Jun 769 868 3,029 −99 −161 3,959 3,897 62 1.58
31-Jul 3,043
31-Aug 2,957
30-Sep 769 816 3,024 −47 −5 3,798 3,840 −42 −1.10
31-Oct 2,995
30-Nov 2,957
31-Dec 677 790 2,881 −113 −143 3,701 3,671 30 0.81
2007 31-Jan 2,881
28-Feb 2,867
31-Mar 732 842 2,833 −110 −48 3,613 3,675 −62 −1.70
30-Apr 2,771
31-May 2,681
30-Jun 716 865 2,662 −149 −171 3,549 3,527 22 0.62
31-Jul 2,629
31-Aug 2,581
30-Sep 616 677 2,614 −61 −48 3,278 3,291 −13 −0.40
31-Oct 2,562
30-Nov 2,562
31-Dec 685 748 2,519 −63 −95 3,299 3,267 32 0.97
2008 31-Jan 2,500
Source: Kreindler and Bapuji 2010
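
The rank correlation between net in-flow and change in stock can be recomputed from columns 9 and 10 of Table 5. The following is a minimal sketch in plain Python, assuming that tied values share their average rank and that Spearman's rho is the Pearson correlation of the ranks; the function names are illustrative. Only the 11 quarters with both a net in-flow and a change in stock are used.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Columns 9 and 10 of Table 5, quarters ending 30 Jun 2005 to 31 Dec 2007.
net_inflow = [87, 64, -65, -213, -99, -47, -113, -110, -149, -61, -63]
change_in_stock = [138, 62, -29, -181, -161, -5, -143, -48, -171, -48, -95]

rho = spearman_rho(net_inflow, change_in_stock)
print(round(rho, 2))
```

Under these assumptions the result rounds to 0.90, in line with the value reported in the text.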
“arrivals” will sometimes underestimate (and sometimes overestimate) the number which actually enrolled (E) and proceeded to receive surgery. As a result, E − A would sometimes yield too positive, and sometimes too negative, a value. Moreover, Kreindler and Bapuji (2010) do not report deducting those “removed” from each census which followed their enrolment so the balance of net “arrivals” and “departures” could not entirely account for any change in the size of the list even if it were correct.

In Sweden

Armstrong (2010) reports a study of cataract extraction across Sweden. He claims that “[t]he stock-flow model . . . predicts that the size of the list will increase when there is a decrease in admissions (and removals) net of enrolment, and vice versa” (Armstrong 2010, 113). Armstrong (2010) uses counts of enrolments and admissions to describe activity over 64 periods each of 3 months duration.
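
Counts of enrolments, admissions, and list size can be obtained from a register that holds a start date and an end date for each record. The sketch below is a minimal illustration in Python and is not the procedure used by Armstrong (2010): the four records are hypothetical, the function names are illustrative, and the convention that a period runs from just after one census up to and including the next is an assumption.

```python
from datetime import date

# Hypothetical records: (start date = enrolment, end date = admission).
records = [
    (date(1991, 11, 20), date(1992, 2, 10)),
    (date(1992, 1, 15), date(1992, 5, 3)),
    (date(1992, 2, 1), date(1992, 3, 30)),
    (date(1992, 4, 12), date(1992, 9, 1)),
]


def count_enrolments(records, then, now):
    """Records whose start date falls after `then`, up to and including `now`."""
    return sum(then < start <= now for start, _ in records)


def count_admissions(records, then, now):
    """Records whose end date falls after `then`, up to and including `now`."""
    return sum(then < end <= now for _, end in records)


def census(records, when):
    """Records that have started by `when` and have not yet ended."""
    return sum(start <= when < end for start, end in records)


then, now = date(1991, 12, 31), date(1992, 3, 31)
c_then = census(records, then)
c_now = census(records, now)
e = count_enrolments(records, then, now)
a = count_admissions(records, then, now)
print(c_then, e, a, c_now)
print(c_then + e == a + c_now)
```

Because all four counts enumerate the same records under one convention, the number of dates of entry equals the number of dates of exit for any choice of census dates.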
The change in stock correlated perfectly with net in-flow (Spearman’s rho = +1.00, n = 64, p < 0.01) (Armstrong 2010). The number of dates of entry (Cthen + E) equals the number of dates of exit (A + Cnow), and there was no error of closure in any of the quarters studied.

It seems that the National Cataract Register for Sweden is entirely consistent with formula (3) and the primary hypothesis – the relationship between enrolments, admissions, and the size of the list was found to be mathematically exact. None of the numbers presented in columns 3, 4, and 6 of Table 6 were obtained by calculation. The count of enrolments was obtained by enumerating records with a start date in the period of interest, and the count of admissions was obtained by enumerating records with an end date in the relevant period. The count of those awaiting admission was obtained by enumerating records where the start date preceded, and where the end date succeeded, the date and time of the relevant census.

It is helpful, on this occasion, that the dataset registers extractions and is compiled retrospectively. It does not contain any record where a patient was removed from the list without having received treatment, and it does not contain any record where the outcome is not yet known. So if we want to know how many cataracts were enrolled during a particular quarter, or how many – at a specified date – were still awaiting extraction, we have to allow sufficiently lengthy follow-up to ensure that each of them received treatment. (Armstrong (2010) restricted his analysis to the set of cataracts extracted less than 2 years after enrolment.) But no count has to be adjusted in the manner described by Kreindler and Bapuji (2010) to exclude those removed from the list. As a result, the records are consistent with the model.

In England

The four studies cited here provide different compilations from the same series of counts. These counts were obtained from the Patient Administration System for each provider and used to complete a set of standard forms, which described the size of the list at the close of each quarter (the KH07) and the amount of activity over its course (the KH06 and KH07A). These central returns were collated by the Department of Health and used to produce aggregate counts for England.

Twelve Periods Each of 3 Months Duration
Newton et al. (1995) report a study of elective inpatient activity combined across NHS hospitals in England. The authors acknowledge that “studies . . . have so far failed to show a strong inverse correlation between admission rates and list size” (Newton et al. 1995, 784). Newton et al. (1995) describe activity over 12 periods each of 3 months duration using counts of additions and admissions from the KH06 return and counts of the number still waiting from the KH07 return. They report that “changes in the number of admissions correlated inversely with changes in list size (r = –0.62; P < 0.001) . . . [a]fter adjusting for changes in the number of additions to lists” (Newton et al. 1995, 783). They obtain an inverse relationship because they model the effect on changes in size of admission (adjusting for enrolments) rather than the effect of enrolment (adjusting for admissions). The correlation is significant but not perfect, which means the errors of closure cannot be zero. Regrettably, the authors plotted the number of admissions and the number still waiting but not the number of additions, so we are not able to construct a suitable table for ourselves.

We think this result is due – at least in part – to a mismatch between their model and the records. The KH07 census counted some people who were subsequently removed from the list without having been admitted. Street and Duckett (1996) recognized that the size of their waiting list diminished as a result of deletion from the list, and they counted other reasons for deletion alongside treatment, but Newton et al. (1995) did not supplement their counts of admissions with the counts of other removals though these were also available from the KH06 return.

If we modify formula (3) to allow for an outcome other than admission, we obtain
Table 6 Did the balance of enrolments and admissions adequately account for the change in size in Sweden?
Waiting for cataract extraction

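Columns, left to right: Year; Period-end; [3] No. of enrolments; [4] No. of admissions; [6] Size of list; [9] Net in-flow; [10] Change in stock; [11] Count of dates of entry; [12] Count of dates of exit; [13] Error of closure (difference); [14] Error of closure (%)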
31-Dec 10,169
1992 31-Mar 8,074 8,360 9,883 −286 −286 18,243 18,243 0 0.00
30-Jun 7,259 7,307 9,835 −48 −48 17,142 17,142 0 0.00
30-Sep 7,879 7,237 10,477 642 642 17,714 17,714 0 0.00
31-Dec 9,522 8,706 11,293 816 816 19,999 19,999 0 0.00
1993 31-Mar 9,863 9,844 11,312 19 19 21,156 21,156 0 0.00
30-Jun 8,525 8,580 11,257 −55 −55 19,837 19,837 0 0.00
30-Sep 8,273 7,700 11,830 573 573 19,530 19,530 0 0.00
31-Dec 9,933 9,988 11,775 -55 −55 21,763 21,763 0 0.00
1994 31-Mar 10,515 10,172 12,118 343 343 22,290 22,290 0 0.00
30-Jun 9,966 9,435 12,649 531 531 22,084 22,084 0 0.00
30-Sep 9,285 8,207 13,727 1,078 1,078 21,934 21,934 0 0.00
31-Dec 10,940 10,709 13,958 231 231 24,667 24,667 0 0.00
1995 31-Mar 11,235 11,330 13,863 -95 −95 25,193 25,193 0 0.00
30-Jun 9,550 9,081 14,332 469 469 23,413 23,413 0 0.00
30-Sep 9,195 7,765 15,762 1,430 1,430 23,527 23,527 0 0.00
31-Dec 10,740 9,917 16,585 823 823 26,502 26,502 0 0.00
1996 31-Mar 11,732 11,637 16,680 95 95 28,317 28,317 0 0.00
30-Jun 10,970 10,180 17,470 790 790 27,650 27,650 0 0.00
30-Sep 10,310 8,859 18,921 1,451 1,451 27,780 27,780 0 0.00
31-Dec 12,669 11,287 20,303 1,382 1,382 31,590 31,590 0 0.00
1997 31-Mar 12,598 11,713 21,188 885 885 32,901 32,901 0 0.00
30-Jun 12,504 10,886 22,806 1,618 1,618 33,692 33,692 0 0.00
30-Sep 11,026 9,076 24,756 1,950 1,950 33,832 33,832 0 0.00
31-Dec 13,584 12,893 25,447 691 691 38,340 38,340 0 0.00
1998 31-Mar 13,749 14,006 25,190 −257 −257 39,196 39,196 0 0.00
30-Jun 13,693 12,415 26,468 1,278 1,278 38,883 38,883 0 0.00
30-Sep 11,974 10,867 27,575 1,107 1,107 38,442 38,442 0 0.00
31-Dec 15,191 16,211 26,555 −1,020 −1,020 42,766 42,766 0 0.00
1999 31-Mar 15,368 15,412 26,511 −44 −44 41,923 41,923 0 0.00
30-Jun 15,556 14,319 27,748 1,237 1,237 42,067 42,067 0 0.00
30-Sep 12,372 11,111 29,009 1,261 1,261 40,120 40,120 0 0.00
31-Dec 16,187 15,466 29,730 721 721 45,196 45,196 0 0.00
2000 31-Mar 16,729 15,982 30,477 747 747 46,459 46,459 0 0.00
30-Jun 15,102 13,888 31,691 1,214 1,214 45,579 45,579 0 0.00
30-Sep 12,315 11,545 32,461 770 770 44,006 44,006 0 0.00
31-Dec 15,651 16,972 31,140 −1,321 −1,321 48,112 48,112 0 0.00
2001 31-Mar 16,924 18,027 30,037 −1,103 −1,103 48,064 48,064 0 0.00
30-Jun 15,428 15,665 29,800 −237 −237 45,465 45,465 0 0.00
30-Sep 14,280 12,775 31,305 1,505 1,505 44,080 44,080 0 0.00
31-Dec 19,128 20,186 30,247 −1,058 −1,058 50,433 50,433 0 0.00
2002 31-Mar 19,272 20,330 29,189 −1,058 −1,058 49,519 49,519 0 0.00
30-Jun 17,992 18,399 28,782 −407 −407 47,181 47,181 0 0.00
30-Sep 14,865 14,150 29,497 715 715 43,647 43,647 0 0.00
31-Dec 19,508 20,222 28,783 −714 −714 49,005 49,005 0 0.00
2003 31-Mar 19,966 20,820 27,929 −854 −854 48,749 48,749 0 0.00
30-Jun 18,366 18,469 27,826 −103 −103 46,295 46,295 0 0.00
30-Sep 15,152 14,116 28,862 1,036 1,036 42,978 42,978 0 0.00
31-Dec 19,893 19,968 28,787 −75 −75 48,755 48,755 0 0.00
2004 31-Mar 20,492 21,577 27,702 −1,085 −1,085 49,279 49,279 0 0.00
30-Jun 18,639 19,406 26,935 −767 −767 46,341 46,341 0 0.00
30-Sep 14,776 14,264 27,447 512 512 41,711 41,711 0 0.00
31-Dec 20,181 19,474 28,154 707 707 47,628 47,628 0 0.00
2005 31-Mar 19,061 20,739 26,476 −1,678 −1,678 47,215 47,215 0 0.00
30-Jun 18,658 21,244 23,890 −2,586 −2,586 45,134 45,134 0 0.00
30-Sep 13,640 13,866 23,664 −226 -226 37,530 37,530 0 0.00
31-Dec 18,106 20,877 20,893 −2,771 −2,771 41,770 41,770 0 0.00
2006 31-Mar 18,435 21,213 18,115 −2,778 −2,778 39,328 39,328 0 0.00
30-Jun 16,486 18,559 16,042 −2,073 −2,073 34,601 34,601 0 0.00
30-Sep 13,858 13,110 16,790 748 748 29,900 29,900 0 0.00
31-Dec 20,026 19,164 17,652 862 862 36,816 36,816 0 0.00
2007 31-Mar 19,855 20,370 17,137 −515 −515 37,507 37,507 0 0.00
30-Jun 18,094 18,749 16,482 −655 −655 35,231 35,231 0 0.00
30-Sep 14,693 13,231 17,944 1,462 1,462 31,175 31,175 0 0.00
31-Dec 20,490 19,699 18,735 791 791 38,434 38,434 0 0.00
Source: Armstrong 2010
$$E - (A + R) = C^{\text{now}} - C^{\text{then}}, \qquad (3.1)$$

which shows the relationship between the change in size and the balance of enrolments and admissions (plus other removals).

If we add A + R to both sides, we obtain

$$E = A + R + C^{\text{now}} - C^{\text{then}}. \qquad (4.1)$$

Formula (4.1) has been used to estimate the number of enrolments when the relevant count is not available (Naylor et al. 1997).

If we add Cthen to both sides of formula (4.1), we obtain

$$C^{\text{then}} + E = (A + R) + C^{\text{now}}, \qquad (1.1)$$

which allows us to compare the dates of entry and the dates of exit of those on the list at any point between the two censuses (Armstrong 2000).

Nevertheless, the data used by Newton et al. (1995) is compatible with formula (3) and the primary hypothesis – an increase in “admissions” (net of ‘additions’) may accompany a reduction in the size of the list.

One Period of 3 months Duration
The National Audit Office (NAO 2001a) reports a study of all elective inpatient and day-case activity combined across the NHS in England. It uses counts of decisions to admit and of the number admitted or removed to describe activity over one period of 3 months duration. (It is therefore not possible to assess the strength of association between change in stock and net in-flow.)

The NAO (2001a, 21) “was unable to reconcile aggregated changes in [the size of] the waiting list.” It found 24,312† more patients still on the list at the close of the quarter than were accounted for by additions and “admissions” plus “removals” (Table 3c). “The Department of Health explain the discrepancy by acknowledging that they do not measure every flow onto and off of the waiting list, but focus on the major ones such as hospital admissions and suspensions” (NAO 2001a, 21). It is noteworthy that the patients removed from the list are a substantial flow but are not mentioned, and the patients suspended are mentioned but are neither substantial, accounting for a reduction in size of another 74 cases†, nor a flow – as recorded in the available returns.

E − (A + R) must exactly equal Cnow − Cthen, if E − (A + R) accounts for all of those who joined the list or who left it in the interval between Cthen and Cnow; if enrolments, admissions, removals, and size enumerate the same lifelines (whether these are episodes of investigation or treatment, the conditions which prompted those, or the patient in possession of one or more of these); and if all four counts are accurate. This is why the National Audit Office (2001a) was not happy with
Table 3c Did the balance of decisions to admit and of “admissions” and “removals” adequately account for the change in
size in England?
Waiting for Admission (in-patient or day case)
Columns, left to right: Year; Month-end; [3] E: No. of 'decisions-to-admit'; [4] A: No. of elective 'admissions'; [5] R: No. of other 'removals'; [6] C now: Size of waiting list; [9] Net in-flow, E−(A+R); [10] Change in stock, C now−C then; [11] Count of dates of entry, C then+E; [12] Count of dates of exit, A+R+C now; [13] Error of closure (difference); [14] Error of closure (%)
2000 31-Dec 1,034,381
2001 31-Mar 992,918 872,188 172,696 1,006,727 −51,966 −27,654 2,027,299 2,051,611 −24,312 −1.19
Source: NAO 2001a
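
The single row of Table 3c offers a compact check of formulas (3.1), (4.1), and (1.1). The following is a minimal sketch in Python using the tabulated counts; the variable names are illustrative and the code is an arithmetic check, not a reconstruction of the NAO's own method.

```python
# Counts transcribed from Table 3c (quarter ending 31 March 2001).
e = 992_918        # decisions to admit
a = 872_188        # elective admissions
r = 172_696        # other removals
c_then = 1_034_381 # list size on 31 December 2000
c_now = 1_006_727  # list size on 31 March 2001

# Formula (3.1): the two sides should agree if the counts are consistent.
net_inflow = e - (a + r)
change_in_stock = c_now - c_then

# Formula (1.1): dates of entry against dates of exit.
entries = c_then + e
exits = (a + r) + c_now
error_of_closure = entries - exits

# Formula (4.1): enrolments implied by the other three counts.
implied_enrolments = a + r + c_now - c_then

print(net_inflow, change_in_stock)   # -51966 -27654
print(entries, exits)                # 2027299 2051611
print(error_of_closure)              # -24312
print(implied_enrolments - e)        # 24312
```

The same gap of 24,312 appears whichever formula is used, because formulas (3.1), (4.1), and (1.1) are rearrangements of one identity.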
Table 3d Did the balance of decisions to admit and of admissions and removals adequately account for the change in
size in England?

Columns, left to right: Year-end; [3] E: No. of “decisions to admit”; [4] A: No. of elective “admissions”; [5] R: No. of other “removals”; [6] C now: Size of waiting list; [9] Net in-flow, E−(A+R); [10] Change in stock, C now−C then; [11] Count of dates of entry, C then+E; [12] Count of dates of exit, A+R+C now; [13] Error of closure (difference); [14] Error of closure (%)
31-Mar-89 2,783,298 2,632,085 200,677 922,676 −49,464 3,755,438
31-Mar-90 2,943,658 2,768,482 260,503 958,976 −85,327 36,300 3,866,334 3,987,961 −121,627 −3.10
31-Mar-91 2,964,836 2,761,005 306,899 948,243 −103,068 −10,733 3,923,812 4,016,147 −92,335 −2.33
31-Mar-92 3,257,615 2,993,532 387,980 917,717 −123,897 −30,526 4,205,858 4,299,229 −93,371 −2.20
31-Mar-93 3,480,268 3,111,627 412,299 994,974 −43,658 77,257 4,397,985 4,518,900 −120,915 −2.71
31-Mar-94 3,501,715 3,110,477 451,559 1,065,369 −60,321 70,395 4,496,689 4,627,405 −130,716 −2.87
31-Mar-95 3,765,407 3,376,016 521,320 1,044,051 −131,929 −21,318 4,830,776 4,941,387 −110,611 −2.26
31-Mar-96 3,968,825 3,500,353 547,863 1,048,029 −79,391 3,978 5,012,876 5,096,245 −83,369 −1.65
31-Mar-97 4,111,511 3,549,074 551,999 1,158,004 10,438 109,975 5,159,540 5,259,077 −99,537 −1.91
31-Mar-98 4,192,037 3,543,634 558,242 1,297,662 90,161 139,658 5,350,041 5,399,538 −49,497 −0.92
31-Mar-99 4,189,323 3,826,507 672,432 1,072,860 −309,616 −224,802 5,486,985 5,571,799 −84,814 −1.53
31-Mar-00 4,159,078 3,682,180 622,787 1,037,066 −145,889 −35,794 5,231,938 5,342,033 −110,095 −2.08
31-Mar-01 3,935,930 3,467,338 613,931 1,006,727 −145,339 −30,339 4,972,996 5,087,996 −115,000 −2.29
31-Mar-02 3,781,437 3,244,185 581,534 1,035,365 −44,282 28,638 4,788,164 4,861,084 −72,920 −1.51
31-Mar-03 3,778,390 3,330,981 601,353 992,075 −153,944 −43,290 4,813,755 4,924,409 −110,654 −2.27
31-Mar-04 3,802,744 3,391,644 621,345 905,753 −210,245 −86,322 4,794,819 4,918,742 −123,923 −2.55
31-Mar-05 3,787,713 3,390,694 612,004 821,722 −214,985 −84,031 4,693,466 4,824,420 −130,954 −2.75
31-Mar-06 4,031,519 3,577,104 613,626 784,572 −159,211 −37,150 4,853,241 4,975,302 −122,061 −2.48
31-Mar-07 4,154,486 3,746,666 613,886 700,624 −206,066 −83,948 4,939,058 5,061,176 −122,118 −2.44
31-Mar-08 4,355,950 4,043,307 646,394 531,520 −333,751 −169,104 5,056,574 5,221,221 −164,647 −3.20
31-Mar-09 4,979,682 4,418,090 647,550 565,954 −85,958 34,434 5,511,202 5,631,594 −120,392 −2.16
Source: House of Commons Health Committee 2010
any discrepancy between the two figures and why the Department of Health concurred (CRIR 1997).

The NAO’s data is compatible with formula (3) and the primary hypothesis – an increase in ‘admission’ (plus “removal” net of enrolment) may accompany a reduction in the size of the list.

Twenty Periods each of 12-months Duration
The House of Commons Health Committee (2010) published an extended series of counts obtained from the Department of Health. It uses counts of decisions to admit and of the number admitted or removed from the list to describe activity over 20 periods each of 12-months duration (Table 3d). These counts were obtained from the same returns used by the NAO (2001a).

The correlation between E − (A + R) and Cnow − Cthen was positive, strong, and statistically significant (Spearman’s rho = +0.97, n = 20, p < 0.01). But the number of dates of entry (Cthen + E) did not equal the number of dates of exit (A + R + Cnow): the discrepancy ranges from –164,647 (–3.20%) to
Table 7 Does the balance of enrolments and admissions (plus other removals) correctly predict the direction of any
change in the size of the list?
a) Street & Duckett, 1996
                     E−A: +    E−A: −    Total
C now−C then: +        11         2        13
C now−C then: −         0        19        19
LB = 84.62 (95% C.I. = 45−100)

b) Health Committee, 2009
                     E−A: +    E−A: −    Total
C now−C then: +         2         6         8
C now−C then: −         0        12        12
LB = 25.00

c) Armstrong, 2000
                     E−A: +    E−A: −    Total
C now−C then: +         2         5         7
C now−C then: −         0         2         2
LB = 0.00
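
The LB values in Table 7 can be reproduced as a proportional reduction in the error of prediction. The sketch below, in Python, assumes that LB corresponds to Goodman and Kruskal's lambda with the direction of change in size as the predicted variable; this reading is inferred from the tabulated values and is not stated in the table. The function name is illustrative.

```python
def goodman_kruskal_lambda(table):
    """Proportional reduction in prediction error for the row variable.

    table[i][j] holds the count for row category i (direction of the
    change in size) and column category j (direction of net in-flow).
    """
    total = sum(sum(row) for row in table)
    # Errors when always guessing the most common row category.
    errors_without = total - max(sum(row) for row in table)
    # Errors when guessing the most common row category within each column.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_without - errors_with) / errors_without


street_duckett = [[11, 2], [0, 19]]
health_committee = [[2, 6], [0, 12]]
armstrong = [[2, 5], [0, 2]]

for name, table in [("a", street_duckett),
                    ("b", health_committee),
                    ("c", armstrong)]:
    print(name, round(100 * goodman_kruskal_lambda(table), 2))
```

For panel (b) the errors fall from eight to six out of 20, a reduction of 25%, which is the figure discussed in the text.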
–49,497 (–0.92%) patients. The counts systematically overestimate the number of exits from the English waiting list (or systematically underestimate the number of entries on it).

Had we predicted that the size of the list would shrink, we would have been mistaken only eight times out of 20 (Table 7). Had we used net in-flow to predict the direction of change in stock, we would have predicted an increase on two occasions and a decrease on 18, i.e., we would have been mistaken on six out of 20 occasions. This reduction in the error of prediction of 25% (LB = 0.25) is not significant. So the direction of any change in size appears to have had little to do with the efforts made during the course of the year. Results such as this might go some way to explaining the frustration of at least one former Minister of Health (Powell 1966).

The six exceptions in this data might be thought consistent with hypotheses of self-regulation and of supplier-induced demand – the size of the list showed an increase when it ought to have shown a decrease. But it should be noted that the exceptions in the data provided by Street and Duckett (1996) occur only when E = A, i.e., when E − A = 0, and that there are no exceptions in the data presented by other researchers (White 1980; Moral and de Pancorbo 2001; Kreindler and Bapuji 2010; Armstrong 2010), i.e., the direction of net in-flow (E − A) perfectly predicts the direction of any change in size (Cnow − Cthen). More importantly, exceptions (Street and Duckett 1996; House of Commons Health Committee 2010) are observed only because the number of dates of entry does not equal the number of dates of exit in the KH06 and KH07 returns.

The Health Committee’s data is compatible with formula (3) and the primary hypothesis – an increase in admission (plus removal) net of enrolment may accompany a reduction in the size of the list.

Nine Periods each of 6-months Duration
Armstrong (2000) reports a study of elective inpatient and day-case activity combined across NHS hospitals in England. He describes nine periods each of 6-months duration using counts of decisions to admit and of the number “admitted” or “removed”, who “self-deferred”, “failed to attend”, or were “suspended”. These counts were obtained from the same returns used by the NAO (2001a) and by the House of Commons Health Committee (2010).

In Table 3e, the change in size is always more positive than the net in-flow by between 68,237 and 32,115 patients, so the error of closure ranged between 2.27% and 1.15%. Armstrong asserts that “[t]he number of patients waiting at the start of a calendar period of interest or who counted as new ‘decisions-to-admit’ or as those ‘reset-to-zero’ or ‘reinstated’ during it, must be reconciled with the numbers admitted, removed, self-deferred, failed, medically deferred or suspended during the calendar period of interest or still awaiting admission at its close” (Armstrong 2000, 2043). But he was unable to account for this discrepancy by allowing for other flows “onto and off of the waiting list” for which there were data, i.e., those who were suspended from the list, those who canceled arrangements for their own admission or who simply did not attend, those who were reinstated to the list, and those whose start date was reset to zero (Armstrong 2000, 2043–2045).

The correlation between E − (A + R) and Cnow − Cthen was positive, strong, and statistically
Table 3e Did the balance of “decisions to admit” and of “admissions” and “removals” adequately account for the change in size in England?

| Year | Censused 30 June | 'Decisions-to-admit' | 'Reset-to-zero' | 'Reinstated' | Admitted | Removed | Self-deferred | Failed | Medically deferred | Suspended | Censused 31 December | Net in-flow | Change in stock | Counting dates of entry | Counting dates of exit | Difference | Error of closure (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | C_then | E | | | A | R | | | | | C_now | E − (A + R) | C_now − C_then | C_then + E | A + R + C_now | | |
| Column | | [3] | * | † | [4] | [6] | | [5] | | | | [9] | [10] | [11] | [12] | [13] | [14] |
| 1988 | 878,306 | 1,389,133 | 298,687 | - | 1,286,087 | 95,431 | 95,508 | 203,179 | - | - | 931,495 | 7,615 | 53,189 | 2,566,126 | 2,611,700 | −45,574 | −1.8 |
| 1989 | 922,877 | 1,446,243 | 307,945 | - | 1,323,492 | 122,104 | 99,189 | 208,756 | - | - | 971,845 | 647 | 48,968 | 2,677,065 | 2,725,386 | −48,321 | −1.8 |
| 1990 | 955,786 | 1,485,021 | 210,352 | - | 1,373,394 | 154,738 | 101,028 | 109,324 | - | - | 965,520 | −43,111 | 9,734 | 2,651,159 | 2,704,004 | −52,845 | −2.0 |
| 1991 | 964,050 | 1,614,328 | 190,474 | - | 1,463,869 | 196,526 | 93,065 | 97,409 | - | - | 950,098 | −46,067 | −13,952 | 2,768,852 | 2,800,967 | −32,115 | −1.2 |
| 1992 | 937,054 | 1,748,716 | 204,380 | - | 1,553,237 | 202,358 | 111,373 | 93,007 | - | - | 977,189 | −6,879 | 40,135 | 2,890,150 | 2,937,164 | −47,014 | −1.6 |
| 1993 | 1,019,341 | 1,731,690 | 225,203 | - | 1,531,449 | 222,034 | 133,802 | 91,401 | - | - | 1,065,785 | −21,793 | 46,444 | 2,976,234 | 3,044,471 | −68,237 | −2.3 |
| 1994 | 1,077,497 | 1,861,754 | 257,577 | 50,008 | 1,665,747 | 251,393 | 160,343 | 97,234 | - | 50,008 | 1,070,492 | −55,386 | −7,005 | 3,246,836 | 3,295,217 | −48,381 | −1.5 |
| 1995 | 1,052,958 | 1,972,067 | 288,143 | 92,966 | 1,739,917 | 273,491 | 182,723 | 105,420 | - | 92,966 | 1,054,948 | −41,341 | 1,990 | 3,406,134 | 3,449,465 | −43,331 | −1.3 |
| 1996 | 1,056,122 | 2,067,520 | 306,572 | 123,383 | 1,799,013 | 273,861 | 193,345 | 113,227 | - | 123,383 | 1,104,984 | −5,354 | 48,862 | 3,553,597 | 3,607,813 | −54,216 | −1.5 |
| 1997 | 1,207,515 | ‡ | ‡ | ‡ | ‡ | ‡ | ‡ | ‡ | - | ‡ | 1,261,915 | | 54,400 | | | | |

Note: The numbers in italics contribute nothing to the difference between those becoming eligible for admission, and those becoming ineligible, so the error of closure is really a comparison of C_then + E and A + R + C_now
Adapted from Armstrong (2000)
* Estimated as the number who self-deferred or failed-to-attend for admission to hospital that quarter
† Estimated as the number temporarily suspended or deferred on medical grounds that quarter
‡ The quarterly counts were not collected in 1997/98
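The note to Table 3e can be checked by hand. The following is a minimal sketch in Python, not part of the original analysis, using the 1994 row as transcribed from the table. The dictionary keys and the function name are our own labels, and the dash printed under "medically deferred" is assumed to mean zero. It suggests that the estimated flows ('reset-to-zero', 'reinstated', self-deferred, failed, suspended) cancel, leaving the same difference as the comparison of C_then + E with A + R + C_now.

```python
# Counts for 1994 transcribed from Table 3e.
# Assumption: the "-" printed for "medically deferred" is treated as 0.
row_1994 = {
    "c_then": 1_077_497,
    "decisions_to_admit": 1_861_754,
    "reset_to_zero": 257_577,
    "reinstated": 50_008,
    "admitted": 1_665_747,
    "removed": 251_393,
    "self_deferred": 160_343,
    "failed": 97_234,
    "medically_deferred": 0,
    "suspended": 50_008,
    "c_now": 1_070_492,
}


def closure_difference(row, include_interruptions=True):
    """Dates of entry minus dates of exit for one row of Table 3e.

    With include_interruptions=False only C_then + E and A + R + C_now
    are compared, as the note to the table suggests.
    """
    entries = row["c_then"] + row["decisions_to_admit"]
    exits = row["admitted"] + row["removed"] + row["c_now"]
    if include_interruptions:
        entries += row["reset_to_zero"] + row["reinstated"]
        exits += (row["self_deferred"] + row["failed"]
                  + row["medically_deferred"] + row["suspended"])
    return entries - exits


full = closure_difference(row_1994)
core = closure_difference(row_1994, include_interruptions=False)
print(full, core)
```

Both calls return the difference printed in column 13 for 1994, because the entries added on one side are matched exactly by the exits added on the other.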
Health Services Knowledge: Use of Datasets Compiled Retrospectively to Correctly. . .
318 P. W. Armstrong
significant (Spearman’s rho = +0.95, n = 9, p < 0.01). The data (Armstrong 2000) is compatible with formula (3.1) and the primary hypothesis – an increase in admission plus other removals (net of enrolment plus other additions) may accompany a reduction in the size of the list.

Armstrong (2000) concludes that “[a]lthough the NHS is obliged to produce complete and accurate reports of how it used public monies, the same standard has yet to be applied to accounts of what became of patients enrolled on the waiting list for England” (Armstrong 2000, 2045). If we combine the numbers who self-deferred, failed to attend, or were suspended from the list, we find that they accounted for between 10.3% and 17.8% of flows off the list. The Department of Health acknowledges that “they do not measure every flow onto and off of the waiting list, but focus on the major ones” (NAO 2001a, 21).

No counts were collected of the numbers ‘reset to zero’ having previously deferred admission or having failed to attend on the date in question, and no counts were collected of the numbers ‘reinstated’ to the list having previously been suspended from it, i.e., the data model was more complex than the central returns allowed. So we do not know the size of the error of closure or whether it is systematic; and we do not know whether E − (A + R) predicts C_now − C_then or not.

A Problem of Our Own Making

Under the primary hypothesis, we expect the size of the list to change from one census to another by any difference in enrolments and admissions during the interval. But researchers do not appear to have had sufficient confidence in the hypothesis to subject it to rigorous testing.

• Many researchers omit enrolments (Feldstein 1967; Culyer and Cullis 1976; Snaith 1979; Frost 1980; Buttery and Snaith 1980; Fowkes et al. 1983; Goldacre et al. 1987; Niinimäki 1991; Harvey et al. 1993; Nordberg et al. 1994; Kenis 2006). So White (1980, 273–274) examines the effect of admissions, i.e., of “discharges and deaths,” on the size of the inpatient waiting list without considering the effect of enrolments. This is more than usually incongruous because White expects changes in the number of enrolments, i.e., “GP referrals” (White 1980, 271), to affect the size of the outpatient waiting list as well as changes in the number of admissions, e.g., “new outpatients booked” (White 1980, 271–272).
• Other researchers mismatch the timing of the counts. Street and Duckett (1996) present the number of additions and deletions each month for use with annual censuses of the list, and Kreindler and Bapuji (2010) present the number of arrivals and departures each quarter for use with censuses taken 1 month apart.
• A few researchers draw conclusions so reluctant as to falsify what the data otherwise verifies. So Hamblin et al. (1998) present counts which seem to confirm the existence of a perfect mathematical relationship between the changes in size and the balance of enrolments and admissions (but for the error discussed above in connection with Tables 3a and 3b). Yet rather than evaluating the primary hypothesis, in which the variation in enrolment confounds the effect of variation in admission on the size of the list and for which they appear to have data, they advocate “the acceptable wait hypothesis” in which the variation in enrolments duplicates variation in admissions (Hamblin et al. 1998, 37, 42, 59, 64) in order to maintain the length of wait for which they do not have data.

Instead, rigorous testing has been left to auditors untroubled by secondary hypotheses of, for example, supplier-induced demand. So the National Audit Office for England (2001a) expects to “reconcile” the two counts of lifelines, C_then + E and (A + R) + C_now, because it appreciates that – as in double-entry bookkeeping – the relationship ought to be exact.

It is difficult to obtain consistent counts of stock and flow if the population (or waiting list) is narrowly defined. These difficulties are exaggerated if members move from one population to another and if the methods of data capture are felt to be unduly onerous. So the error of closure allows demographers to assess whether the registration of vital events (births and deaths) and of migration (in and out) has yielded counts
consistent with periodic censuses of population. It ought to be relatively easy to obtain counts of enrolments, admissions, and removals which are consistent with periodic censuses of the list.

We can cross-examine the paper, or digital, records rather than the individuals they represent: the records are retrieved and dismissed at the researchers’ convenience, and their details are always available for inspection and analysis. It ought therefore to have been possible to report an error of closure of 0% under the Körner Reporting System (CRIR 1997) which counted relevant records from a hospital’s Patient Administration System. One part of the error observed was due to the use of inconsistent definitions (Newton et al. 1995), another was due to incomplete flows (NAO 2001a), and still another was, we think, the result of allowing the data model to become too elaborate (Armstrong 2000). If our systems do not allow us to identify who was eligible for admission in the period between two censuses, and if they do not allow us to demonstrate that there are as many dates of exit for this set of records as there are dates of entry, the apparent complexity of the waiting list is a problem of our own making.

The Balance of Enrolments and Admissions (Plus Other Removals) Equals the Change in Size. Why?

We attribute our success (Armstrong 2010) in demonstrating this relationship to two things.

If the Model Is Not Complicated, the Data Must Be Simple!

The first is our assumption that each wait started and ended on the start and end dates of the record. This implies (a) that the dataset is complete, i.e., that no record was omitted, and (b) that both dates were entirely accurate. It also implies (c) that everyone, having once enrolled, was eventually admitted and (d) that no wait was ever broken. Items (c) and (d) are implied by the data definitions and tables of Working Group A (DHSS 1981b). In other words, we assumed that the waiting list had all of the attributes implied by our use of these two variables. (We did not modify the size of the list, deducting any patient who was suspended or deferred at that point; and we did not modify the length of wait, deducting any period when a patient was considered to be unfit or thought to be unavailable (Armstrong 2010)).

Like ourselves, other researchers are obliged to assume that the data are complete (or are at least representative) and that the data are accurate (or are at least not distorted) if they wish to proceed with their enquiries. Our success seems to suggest that the difficulties experienced by others (Armstrong 2000; NAO 2001a) may be due to a mismatch between the model and the data. The dataset is simple (IMG 1992); the data model is elaborate.

So when the Steering Group on Health Services Information (1984) proposed what became the KH06 and KH07 returns, they envisaged that patients would join the list as the result of a ‘decision to admit’ authorized by a clinician (Steering Group 1984, 85; IMG 1992, 5/3 & 5/8) and that patients would leave the list either as the result of “hav[ing] been admitted” (Steering Group 1984, 85) or as the result of “no longer needing to be admitted” (Steering Group 1984, 86). The only complication which seems to have been envisaged relates to those patients whose arrangements for admission miscarry. These fall into four categories: (a) those who did not attend, i.e., who neither declined the arrangement in advance nor presented themselves on the day, (b) those who deferred admission by contacting the hospital in advance, (c) those whose admission was canceled by the hospital, and (d) those who were admitted but subsequently discharged without having undergone investigation or treatment.

The Working Group recommended that information be collected about the “[n]umber of patients for whom arrangements to admit were made but [who] were not admitted” (DHSS 1981b, 125), i.e., it did not distinguish between the first, second, and third categories. The Steering Group recommended that information be collected about the “[n]umber of patients . . .
who were not admitted because they failed to attend” (Steering Group 1984, 87), i.e., it did not distinguish between the first and second categories. But the Steering Group also recommended that information be collected about the “[n]umber of patients for whom . . . admission did not take place because of cancellation by the hospital” (Steering Group 1984, 87), i.e., about the third category. But the KH07A return, developed in the 6 months prior to implementation of the system (DHSS 1986), asked for counts of the number of patients who deferred their own admission (the second category) rather than counts of the number whose admission was canceled by the hospital.

The earliest version of the KH06 reported four “events occurring during the quarter” (DHSS 1986, 4) namely, the “decisions to admit” which marked addition to the list and three mutually exclusive outcomes which marked subtraction from it. It was anticipated that a patient might be admitted from the waiting list to undergo investigation or treatment on an elective basis prior to discharge, that a patient might not be admitted although arrangements for this had been made, or else that a patient might be removed from the waiting list as no longer requiring the elective admission intended.

The three outcomes were subsequently defined by the Data Manual (version 1.0) so as to subsume other possibilities:

1. Patients who were admitted as emergencies were not to be counted as having been admitted from the waiting list as arranged (IMG 1992). Rather, they were to be counted as having been removed from the waiting list as no longer requiring elective admission (CRIR 1997).
2. Patients, who were admitted from the waiting list but were then discharged from hospital without undergoing the investigation or treatment planned, were not to be counted as having been admitted from the waiting list as arranged (IMG 1992).
3. Patients who were not admitted from the waiting list because the arrangement had been canceled by the hospital were not to be counted as “not admitted” (IMG 1992, para. 41; CRIR 1997).
4. Patients who were not admitted from the waiting list because they declined an offer or canceled an arrangement were also not to be counted as “not admitted” (IMG 1992, para. 41; CRIR 1997).

The instructions assert that “patients should only be taken off the elective admission list when they have been treated – unless the treatment is no longer required” (IMG 1992, para. 41), as though this had always been self-evident. But the examples given seem to suggest that practice was in need of correction. “Patients should not be removed from the waiting list, because of self-deferrals or deferral by the hospital. For example, a patient admitted but sent home because treatment has been deferred . . . should not be removed from the elective admission list” (IMG 1992, para. 41). Those who “failed to arrive” (IMG 1992, para. 48) are carefully distinguished from “self-deferred admissions . . . or admissions cancelled by the hospital” (IMG 1992, para. 87). They have neither been admitted from the waiting list as arranged nor removed from the waiting list as no longer requiring elective admission. So they appear to constitute a third class of event in the earliest version of the return in addition to the two expressly authorized.

The waiting list envisaged by the Steering Group on Health Services Information (1984) appears to have been one in which the arrangement of admission fulfilled the hospital’s entire responsibility to the patient. Such a view seems scarcely credible and therefore needs to be substantiated:

• Some of the instructions in the Data Manual (version 1.0) seem to confirm such an attitude toward the patient. So if a patient “failed to arrive” without giving notice of her intentions, her details are to be returned to the GPFH who will determine whether she requires a fresh referral, another consultation, and a new decision to admit (IMG 1992, para. 71; CRIR 1997). But the patient who declines an offer or cancels an arrangement in good time receives a degree of consideration. She is counted as waiting “with [a] date” until the intended
admission has passed, and she is then given a start date the same as that on which she ought to have left the list (IMG 1992). In other words, the hospital authorizes the patient’s return to the list without forwarding her details to the GPFH, waiting for another letter of referral, and organizing a fresh consultation in due course. The consideration extended to the exception – the patient who self-deferred admission – seems to confirm the rule about the patient who gave no warning but failed to attend.
• This attitude also seems to be confirmed by instructions in the Data Manual (version 4.0) about patients discharged without having been investigated or treated. “Patients are taken off the elective admission list once they are admitted into hospital. If treatment is then deferred because of lack of facilities or for medical reasons . . . the patient is discharged . . . A new decision to admit and a new elective admission list entry will then be made for the patient” (CRIR 1997, 16). So the wait is considered to be completed upon admission regardless of what happens next (CRIR 1998), and the patient who has not received the elective investigation or treatment promised will need “[a] new decision to admit and a new [entry on the] elective admission list” if she wishes to try again. The size of such a list shrinks not only as a result of admissions which are followed by investigation and treatment but also as a result of admissions which are not.
• The Working Group recommended that “waiting lists [be] regularly reviewed to remove patients no longer needing or wishing to be admitted” (DHSS 1981b, 127), acknowledging that some would never be admitted from the waiting list. But it did not recommend counting the “[n]umber of patients . . . removed from a list for reasons other than elective admission” (Steering Group 1984, 87). This suggests that the Working Group felt no responsibility toward those removed. One of the members of the group expressed an appropriate concern that the number of those still waiting should not be exaggerated by including anyone no longer eligible for elective admission. He noted that they “no longer need . . . or wish . . . to be admitted” at the time of the review. But he thought their eventual removal from the list implied that they were never really available for admission. He infers that they were not eligible at the time of any census in which they appeared and that the decision to admit ought never to have been authorized (Lee et al. 1987). He recommends deducting their contribution to counts of decisions to admit and of the numbers still waiting. We think this view seriously flawed. He rejects the possibility that these patients could have received investigation or treatment had it been made available more promptly.

There are grounds therefore for imagining that the balance of decisions to admit less the three outcomes (KH06) ought to have accounted for differences between the number waiting (KH07) at the close of this quarter and the number waiting at the close of the last quarter in the earliest days of the Körner Reporting System. If this were the case, the simplest model would require the insertion of an additional variable in formula (3.1) so that E − (A + N + R) = C_now − C_then, where N represents the number “not admitted” during the interval between C_now and C_then.

Table 3f allows us to assess the consistency of these counts. The correlation between E − (A + N + R) and C_now − C_then was strong, but it was not statistically significant (Spearman’s rho = −0.96, n = 4, p = 0.20), and it did not have the direction desired: the net in-flow indicates that the size of the list was getting smaller, while the change in stock indicates that the size of the list was getting bigger. (The counts of stock (KH07) and flow (KH06) do not appear to describe the same waiting list.) There was a substantial error of closure ranging from 10.40% to 4.90% of those eligible for admission at any point over the relevant 6 months.

The discrepancy in Table 3f might be explained in a number of ways. Apart from simple underreporting of the number of patients added to the list or overreporting of the numbers admitted from the list or removed, this might occur where individuals are reported as contributing more than
Table 3f Did the balance of “decisions to admit” and of those “admitted,” “not admitted,” or “removed” adequately account for the change in size in England?

| Year | No. of 'decisions-to-admit' | No. who were 'admitted' | No. who were 'not admitted' | No. who were 'removed' | Size of list, 31-Dec | Size of list, 30-Jun | Net in-flow | Change in stock | Counting dates of entry | Counting dates of exit | Difference | Error of closure (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | E | A | N | R | C_now | C_then | E − (A + N + R) | C_now − C_then | C_then + E | A + N + R + C_now | | |
| Column | [3] | [4] | [5] | [6] | [7] | [8] | [9] | [10] | [11] | [12] | [13] | [14] |
| 1988 | 1,389,133 | 1,286,087 | 203,179 | 95,431 | 931,495 | 878,306 | −195,564 | 53,189 | 2,267,439 | 2,516,192 | −248,753 | −10.40 |
| 1989 | 1,446,243 | 1,323,492 | 208,756 | 122,104 | 971,845 | 922,877 | −208,109 | 48,968 | 2,369,120 | 2,626,197 | −257,077 | −10.29 |
| 1990 | 1,485,021 | 1,373,394 | 109,324 | 154,738 | 965,520 | 955,786 | −152,435 | 9,734 | 2,440,807 | 2,602,976 | −162,169 | −6.43 |
| 1991 | 1,614,328 | 1,463,869 | 97,409 | 196,526 | 950,098 | 964,050 | −143,476 | −13,952 | 2,578,378 | 2,707,902 | −129,524 | −4.90 |
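The derived columns of Table 3f can be recomputed from the six reported counts. The sketch below is illustrative only: the names `TABLE_3F` and `closure` are ours, and the percentage is taken relative to the mean of the two counts (dates of entry and dates of exit), which is an assumption on our part that appears to reproduce the printed values to two decimal places.

```python
# Counts transcribed from Table 3f, in the order E, A, N, R, C_now, C_then.
TABLE_3F = {
    1988: (1_389_133, 1_286_087, 203_179, 95_431, 931_495, 878_306),
    1989: (1_446_243, 1_323_492, 208_756, 122_104, 971_845, 922_877),
    1990: (1_485_021, 1_373_394, 109_324, 154_738, 965_520, 955_786),
    1991: (1_614_328, 1_463_869, 97_409, 196_526, 950_098, 964_050),
}


def closure(e, a, n, r, c_now, c_then):
    """Return net in-flow, change in stock, difference, and error of closure (%)."""
    net_inflow = e - (a + n + r)          # column 9
    change_in_stock = c_now - c_then      # column 10
    entries = c_then + e                  # column 11
    exits = a + n + r + c_now             # column 12
    difference = entries - exits          # column 13
    # Assumption: the denominator is the mean of the two counts.
    error_pct = 100 * difference / ((entries + exits) / 2)
    return net_inflow, change_in_stock, difference, round(error_pct, 2)


results = {year: closure(*counts) for year, counts in TABLE_3F.items()}
for year, (net, change, diff, pct) in results.items():
    print(year, net, change, diff, pct)
```

In every row the difference in column 13 equals the net in-flow minus the change in stock, so a nonzero error of closure is simply the amount by which formula (3.1), extended with N, fails to hold.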
one outcome but no more than one decision to admit. For example, where a patient is transferred from a list at another hospital and is duly admitted or removed without a local decision to admit having been made (IMG 1992). Or where a patient is removed from the list as not medically fit for elective admission (CRIR 1997) and is subsequently reinstated without a fresh decision to admit having been made (IMG 1992).

Other possibilities are more complicated and appear to be capable of accounting only for a part of the problem. So if a patient is temporarily suspended from the list on medical grounds at the close of a quarter, he will either be omitted from the decisions to admit over that quarter or else be omitted from those still waiting at its close. In the first instance, there will appear to have been fewer dates of entry (column 11, Table 3f) to the period of interest and the reported difference (column 13) in counts of dates of entry and dates of exit and the error of closure (column 14) – being negative – will appear larger. In the second, there will appear to have been fewer dates of exit (column 12) from the period of interest and the reported difference (column 13) in counts of dates of entry and dates of exit and the error of closure (column 14) – being negative – will appear smaller.

The Data Manual presents a complicated series of rules about what parts of which records contribute data on the official wait for elective admission. But version 1.0 also asserts that “patients should only be taken off the elective admission list when they have been treated – unless the treatment is no longer required” (IMG 1992, para. 41). Version 4.0 claims that “[t]he . . . KH06 . . . relate[s] to elective admission list events – all the additions to the waiting list (i.e., the number of decisions to admit) and removals from the waiting list that have taken place during the quarter” and also asserts that “[t]he change in the total numbers waiting should reflect this activity” (CRIR 1997, para. 144). Despite the fact that “failed to attend” is classed as an event on the KH06 return (CRIR 1997, para. 148), the simplest explanation for the discrepancy within Table 3f is that there are two outcomes which end enrolment not three. We obtain a better account of the stock and flow of the English waiting list if we omit the “failed to attend” (Table 3g).

Table 3g shows the consistency of the counts if the relationship is, in practice, best described by formula (3.1). The correlation between E − (A + R) and C_now − C_then was perfect and had the direction desired, but it was not statistically significant (Spearman’s rho = +1.00, n = 4, p = 0.20). There was a small error of closure ranging from 2.14% to 1.24% of those eligible for admission at any point over the relevant 6 months.

This is a little disconcerting. The data model used in practice appears to be simpler (CRIR 1997) than the Data Manual would have us believe. Within a short time of implementation, the Government Statistical Service began to modify the KH06, KH07, and KH07A returns. Now we sympathize with the performance analyst who wishes to restrict attention to that part of the list, and that portion of the wait, for which a manager (or a clinician) might reasonably be held responsible. But we think the returns were changed without considering the effect on the consistency of the counts.

Neither the DHSS (1981b), nor the Steering Group (1984), nor the authors of the first set of
Table 3g Did the balance of “decisions to admit” and of those “admitted” or “removed” adequately account for the change in size in England?

| Year | No. of 'decisions-to-admit' | No. who were 'admitted' | No. who were 'removed' | Size of list, 31-Dec | Size of list, 30-Jun | Net in-flow | Change in stock | Counting dates of entry | Counting dates of exit | Difference | Error of closure (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Symbol | E | A | R | C_now | C_then | E − (A + R) | C_now − C_then | C_then + E | A + R + C_now | | |
| Column | [3] | [4] | [6] | [7] | [8] | [9] | [10] | [11] | [12] | [13] | [14] |
| 1988 | 1,389,133 | 1,286,087 | 95,431 | 931,495 | 878,306 | 7,615 | 53,189 | 2,267,439 | 2,313,013 | −45,574 | −1.99 |
| 1989 | 1,446,243 | 1,323,492 | 122,104 | 971,845 | 922,877 | 647 | 48,968 | 2,369,120 | 2,417,441 | −48,321 | −2.02 |
| 1990 | 1,485,021 | 1,373,394 | 154,738 | 965,520 | 955,786 | −43,111 | 9,734 | 2,440,807 | 2,493,652 | −52,845 | −2.14 |
| 1991 | 1,614,328 | 1,463,869 | 196,526 | 950,098 | 964,050 | −46,067 | −13,952 | 2,578,378 | 2,610,493 | −32,115 | −1.24 |
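Formula (3.1) treats each record as contributing one date of entry and, eventually, one date of exit. The sketch below uses a handful of invented records (the dates and the `suspended` flag are hypothetical and are not drawn from any return) to illustrate why counts built in this way should close exactly, and how excluding a suspended patient from the closing census, without recording a matching flow, opens a gap of the kind seen in Tables 3f and 3g.

```python
from datetime import date

# Hypothetical records: (date enrolled, date admitted or removed, suspended at the close?).
# An exit date of None means the patient is still waiting.
records = [
    (date(1990, 5, 1), date(1990, 9, 12), False),
    (date(1990, 6, 20), None, False),
    (date(1990, 8, 3), date(1990, 11, 30), False),
    (date(1990, 10, 15), None, True),   # waiting, but suspended at the closing census
    (date(1990, 12, 1), date(1991, 2, 1), False),
]


def census(records, when, exclude_suspended=False):
    """Count records enrolled on or before `when` and not yet admitted or removed."""
    return sum(
        1
        for start, end, suspended in records
        if start <= when
        and (end is None or end > when)
        and not (exclude_suspended and suspended)
    )


def flows(records, then, now):
    """Return (enrolments, exits) dated after `then` and on or before `now`."""
    enrolments = sum(1 for start, _, _ in records if then < start <= now)
    exits = sum(1 for _, end, _ in records if end is not None and then < end <= now)
    return enrolments, exits


then, now = date(1990, 6, 30), date(1990, 12, 31)
e, x = flows(records, then, now)
net_inflow = e - x
change_complete = census(records, now) - census(records, then)
change_reported = (census(records, now, exclude_suspended=True)
                   - census(records, then, exclude_suspended=True))
print(net_inflow, change_complete, change_reported)
```

With every record counted, the net in-flow and the change in size agree. Once the suspended patient is dropped from the closing census, the enrolment is still counted among the flows but no longer appears in the stock, and the two counts differ by one.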
returns (DHSS 1987) mention the possibility of suspension from the waiting list either on medical grounds or for social reasons. But version 1.0 of the Data Manual instructed the NHS to suspend from the list those “patients who are not medically ready for admission” (IMG 1992, 16), and version 4.0 of the Data Manual advised the NHS that this was consistent with the practice of not adding patients to the list until they are “likely to be fit for surgery when offered” (CRIR 1997, 17). Version 4.0 also advised the NHS that “[p]atients may also be suspended from [a] . . . list for social reasons such as holidays or family commitments which may be notified in advance” (CRIR 1997, 17).

The IMG (1992, 9 & 18) asserted that “[p]atients who are currently not medically ready should not be included in the national returns” and emphasized that “patients . . . who are not medically ready for admission are excluded from all waiting list central returns.” Now counts of enrolments, admissions, and size ought to be consistent if each of them exclude all of those removed from the list (Lee et al. 1987; Kreindler and Bapuji 2010). In the same way, counts of enrolments, admissions (plus removals), and size ought to be consistent if each of them exclude all of those ever temporarily suspended from the list.

But these are patients whose admission to the list was authorized because they were thought “likely to be fit for surgery when offered.” It is likely therefore that counts of decisions to admit enumerated some who were subsequently excluded from a census, so the size of the list is too small for the number enrolled. Moreover, (most of) those excluded from the census because they were not medically ready will be reinstated to the list and will subsequently contribute to the relevant count of admissions or removals, so the size of the list is also too small for the number admitted or removed. The publication of well-worded definitions may have improved the consistency of meaning attached to the various items, and the suspension of some (IMG 1992) who were not medically ready may have improved the homogeneity of the group requiring investigation or treatment. But omitting those reinstated during the quarter, and those suspended at its close, did not improve the consistency of counts of enrolments and admissions (plus removals), with size.

Insistence on a model with more carefully specified outputs ought to have prompted the development of a dataset with more carefully defined classes and counts. The National Audit Office (2001a, 21) reports the Department of Health as “acknowledging that they do not measure every flow onto and off of the waiting list.” But insistence on a model which introduces a break anywhere between the beginning and end of the patient’s time on the list demands another level of complexity from the dataset.

• In some instances, the wait continues to accrue. The patient who is suspended from the list on medical grounds becomes invisible to enumeration in the census, but there is no outcome or end date before the census to account for the disappearance, and there is no start date or reinstatement after the census to account for the reappearance (IMG 1992). The effect on the Körner Reporting System is to make the counts of stock and flow less consistent. (The Data Manual (version 4.0) acknowledges the problem. The number of patients suspended
from the list – on social grounds – is to be added back to the number still waiting before assessing whether the counts are consistent (CRIR 1997).)
• In other instances, the accumulated wait is discounted. The patient who declines an offer or cancels an arrangement accrues time on the list until the date offered or arranged. This then becomes the effective date of the patient’s addition to the list and the wait accumulated to date is reset to zero. But no outcome marks the end of the first wait, and no decision to admit marks the beginning of the second, so there is no record of flows which can account for the changes within the relevant waiting time categories.

A more elaborate definition of the wait for investigation or treatment requires a more complicated dataset, with additional variables to provide a start date and an end date for the latest of those occasions on which the patient is classed as “not being medically ready” (IMG 1992, 9). A still more elaborate definition requires a still more complicated dataset, with variables to provide start dates and end dates for each occasion on which the patient is suspended (CRIR 1997) and for each occasion (first, second, etc.) when a patient deferred admission. But what has not been recognized is that the occurrence of a break between enrolment and admission (or removal) has to be accounted for by flows other than enrolment and admission (or removal). The definitions adopted under the Körner Reporting System soon became so complicated that there were not variables enough to represent all of those thought to be eligible, or ineligible, for admission over a period of interest (Armstrong 2000; NAO 2001a). Some patients who had been temporarily suspended as “not medically ready” (IMG 1992, 9) were subsequently removed from the list without having first been reinstated (CRIR 1997).

Data definitions have sometimes become so elaborate that it has proven impossible to reconstruct the state of the records as they stood on a particular date, even with the most up-to-date versions of the relevant software (Farquharson 2011). We think this reprehensible. The result is a list in which two counts – ostensibly of the same thing – give different answers (Armstrong 2000; NAO 2001a) and in which a simple relationship has been made to appear complicated. If the dataset is to be used to develop insight as well as to manage performance, then it must satisfy the requirements of researchers as well as those of analysts.

The Number of ‘Starts’ and ‘Stops’ Must Be the Same

The second reason for our success is that a simple relationship exists.

We identify all of those waiting – at a given moment – to be admitted for elective investigation or treatment, and we conduct a count. The only people on the list are those whose date (and time) of enrolment preceded the date (and time) of the census and whose date of admission (or removal) succeeded it. (If obtaining this count is complicated, it is because the list has been so narrowly defined that a great number of characteristics have to be evaluated in order to decide whether a particular record should be included or not.)

The count varies from one time to another. It is not difficult to apprehend that a unit increase in its size must follow each enrolment over the interval and that a unit decrease in its size must follow each admission (or removal), if no one contributes more than one record to the dataset. It follows that the balance of enrolments and admissions (plus removals), E − (A + R), must exactly equal any change in the size of a list, C_now − C_then, and that the completeness, accuracy, and validity of the counts ought to be questioned whenever it fails to do so.

There is nothing original about the assertion that the balance of enrolments and admissions ought to equal any change in the size of the waiting list. Mason (1976) constructed a hypothetical example which – though it was incomplete – indicated that any difference in the numbers of enrolments and admissions was expected to account for any change in size, and Fordham (1987) provided a complete example which showed the behavior of two hypothetical lists over four quarters. The Department of Health
instructed those responsible for completing the KH06, KH07, and KH07A returns to check the consistency of their submissions for each provider each quarter. “Patients waiting at the end of the quarter should be equivalent to patients waiting at the end of the last quarter plus the number of additions . . . minus the number of patients admitted in the quarter or removed from the elective admission list for other reasons. For the figures to balance, suspended patients must also be taken into account” (CRIR 1997, 32). The National Audit Office (2001b) used the relationship to verify the purported reduction in the size of the list at Surrey and Sussex Healthcare NHS Trust (England), 1998–1999: they suspected a reduction of 1800 patients where the number of elective admissions was known to have reduced, and they found – among other things – 700 new patients and 300 transfers from other hospitals who had not been added to the list.

Secondary Hypotheses

Inexplicably Complicated

One of the secondary hypotheses offered by the literature is attributed to the field of organization science. Kenis (2006, 296) claims that “[e]mpirical studies carried out in The Netherlands and elsewhere show . . . that the input of extra resources does not automatically lead to a shortening of the waiting list,” and he declares that “[w]aiting lists seem . . . to be an . . . example of a problem . . . characterized by a high level of complexity.” Neither observation is new. We don’t know who first suggested that the size of the list is influenced by many factors. But Sanmartin et al. (1998) drew attention to a plethora of factors which appeared to account for a part of the variation in size (DHSS 1975; Newton et al. 1995; Hanning and Lundström 1998) and advocated the use of complex models to evaluate their interaction and combined effect (DeCoster et al. 2007; Kreindler and Bapuji 2010).

Kenis (2006) does not tell us whether the extra resources had the intended effect on the number of admissions, and he does not tell us whether allowance was made for the effect of variation in the number of enrolments. In other words, he has neither established that the first hypothesis needs to be replaced nor has he justified the assertion that “decisions are often taken which are based on a simplified vision [sic] of the problem, which [are] inappropriate” (Kenis 2006, 296). Kenis (2006, 296) asserts that “[g]iven a certain level of complexity of a problem[,] it will become impossible to react in an equally complex way,” and he claims that this is properly the domain of organization science. But he does not substantiate the claim that waiting lists possess the requisite level of complexity, and he has not demonstrated that the paradigm fits. Instead, he classifies the first hypothesis as an example of “our modernist-rationalist way of thinking” (Kenis 2006, 296) and – perhaps as a consequence – anticipates its failure; he does not recognize the first hypothesis as an example of double-entry bookkeeping and – perhaps as a consequence – does not anticipate its success.

Supplier-Induced Demand

Another of the secondary hypotheses offered by the literature comes from the field of health economics. It is unfortunate that ‘supplier-induced demand’ (Culyer and Cullis 1976) envisaged a direct association between the number of admissions (or its surrogate) and the size of the list (Culyer and Cullis 1976) because the notion lay ready to hand and provided what some would think a plausible explanation. But the first hypothesis anticipates a relationship between the number of enrolments, the number of admissions, and the size of the waiting list which is mathematically exact, so there is no room for a second hypothesis until the first has proven false. Moreover, it is still necessary – once the primary hypothesis has proven false – for the secondary hypothesis to prove true.

In a cross-sectional study, we might expect to see variations between one hospital and another that are the result of differences in size of the two populations they serve. Let us imagine that there are no differences that would invalidate a simple comparison, e.g., no differences in the mix of age,
326 P. W. Armstrong
sex, and other salient factors and no differences in the indications for treatment or in the thresholds at which a patient is added to, or admitted from, the list, etc. Let us imagine that comparison reveals no difference in the rates of diagnosis specified on a suitable cross-classification of salient factors. If the only difference between one hospital and another is one of scale, then large hospitals serving large populations would report large numbers of admissions and large numbers waiting, while small hospitals serving small populations would report small numbers of admissions and small numbers waiting, i.e., we would expect a direct association between the number of admissions and the size of the list. The same reasoning would also lead us to expect a direct association between the number of admissions and the number of enrolments (Newton et al. 1995).

It is not enough to show a direct association between the number of admissions and the size of the list and attribute it to supplier-induced demand. This does not allow us to distinguish the effect of supplier-induced demand from the effect of the flow of patients on the stock (when the number of enrolments is not fixed and unvarying). It is also not enough to show a direct correlation between the number of admissions and the number of enrolments. This does not allow us to distinguish between the effect of supplier-induced demand and the effect of scale.

The use of the term supplier-induced demand suggests the futility of making additional resources available for elective treatment and – despite assurances to the contrary – implies that clinicians have been complicit. The way had been prepared for the notion long before the term entered the literature. Commentators viewed the waiting list “as a kind of iceberg” (Powell 1966, 39), likened the waiting list to a “bottomless pit” (Haywood 1974, 38), and thought that “trying to ‘get the waiting lists down’ [was] an activity about as hopeful as filling a sieve” (Powell 1966, 40); and the conviction that a plentiful supply might prompt burgeoning demand is (we think) older than any of these (Culyer and Cullis 1976). But the hypothesis of supplier-induced demand will prove to have been counterproductive – a diversion of attempts to understand the dynamics of the waiting list – if we find there is no need for a second hypothesis, whether as a result of empirical data or of mathematical proof. The same will be true if the second hypothesis is found not to fit: e.g., if the number of enrolments is found to determine the number of admissions rather than vice versa or if the financial transaction, which serves to authorize enrolment and underwrite admission, is found to occur at some other point in the market without any further exchange in the stock-cupboard.

Why Has the Effect of Enrolment Confounded Analyses to Date?

Commentators, analysts, and researchers have shown very little interest in the effect of enrolment on the size of the waiting list. We wonder how this important confounder came to be overlooked and what might provide a sufficient incentive to correct the fault.

We assert that it is the relationship between the balance of enrolments and admissions and changes in the size of the list which is of primary concern, although it is the relationship between admissions and size which dominates the relevant literature. Such a view seems to imply that commentators, analysts, and researchers were wrong-footed at the start of the debate and that the early error has been reproduced in most of the work conducted since. Neither the scope of this chapter nor the extent of our scholarship allows this standpoint close consideration at present, but a few waymarks may be enough to indicate the route proposed.

Some Assumed Enrolment Was Fixed and Unvarying

In 1963, the then Ministry of Health (MoH) for the UK published what was only its fifth memorandum on the NHS waiting list (MoH 1963b, 1). The author claims that a stationary waiting list “normally represents not a deficiency of resources . . .” – there is no imbalance of enrolments and
admissions – “but a backlog of cases . . .,” a result of the accumulated imbalances of the past. The author also says that “[a] growing waiting list may often indicate a deficiency of resources,” i.e., there is an imbalance of enrolments and admissions. But he obscures matters by asserting that the “growing waiting list . . . will generally also include an element of backlog” (MoH 1963b, 1), insisting that “[a] continuous effort will be needed to prevent a backlog from arising again” (MoH 1963b, 3). His use of the words “generally,” also “normally,” and “often” implies doubt where there is, in fact, ground for none.

Whether an individual is on the list as the result of an historic backlog or as the result of its continuing growth, the additional case can only be cleared if additional means allow the number of admissions to exceed the number of enrolments however briefly. This is what the memorandum asserts. The term “backlog” is useful if it is confined to those who are awaiting admission from a list that is stationary: if any one of these is cleared, the reduction in size is permanent. The individual will never be replaced because the number of admissions equals the number of enrolments. But if we clear anyone from a waiting list that is growing, the reduction in size is momentary. This individual will shortly be replaced by another because the number of admissions does not equal the number of enrolments, and our efforts have to be never ending.

In an earlier memorandum, the Ministry expressed the view that “the hospital service is roughly keeping pace with demand but is not appreciably succeeding in reducing the very large waiting numbers” (MoH 1954, 1). (For the sake of the narrative, we shall assume that the same author wrote both memoranda.) He seems to have thought that the size of the list was approximately stationary, that is, fixed and unvarying. As a result, he sees the problem as one of clearing the backlog (DHSS 1981a; Naylor 1991). (According to Culyer and Cullis (1976), the waiting list for all specialties (excluding psychiatry), England and Wales, showed an increase in size of 4.3% over 7 years from 444.0 thousand on 31 December 1955 to 462.9 thousand on 31 December 1962.) This is helpfully confirmed in the memoirs of the then Minister of Health, Enoch Powell, who refers to “the circulars enjoining such devices as the use of mental hospital beds and theatres, or of military hospitals” (MoH 1963b, 1 & 3), to “the ‘waiting list at 31st December’ in the Ministry of Health’s annual reports . . . [as] . . . a reliably stable feature in an otherwise changing scene” (Culyer and Cullis 1976), and to “the special operations to ‘strafe’ the waiting lists, urged on the . . . ground that a stationary waiting list is not evidence of deficient capacity – otherwise it would lengthen – but of a backlog which, once ‘cleared off’, ought not . . . to recur” (Powell 1966, 40). The Minister confirms the understanding of his staff but considers the ground of their reasoning to have been “fallacious.” He no longer views the stationary waiting list in the same light. We disagree. The Minister’s error was in thinking the list stationary when there had been substantial variation in one at least of the factors thought to determine size, i.e., in admissions.

Had the size of the list in fact been stationary, the number of enrolments ought to have equaled the number of admissions. So it is not clear to us why anyone would expect the number of enrolments to be stationary, that is, fixed and unvarying, when “the total annual number of in-patients treated in hospitals has increased by one-sixth [16.7%], . . . since the early days of the service” (MoH 1954, 1). (Culyer and Cullis (1976) report that throughput capacity, their surrogate for elective admissions, showed an increase of 24.2% – from 11,547 cases/day in 1955 to 14,336 cases/day in 1962.) Nevertheless, the author of the memoranda feels no need to discuss the effect of variation in the number of enrolments, but he expects there to be a decrease in the size of the list if there is any increase in the number of admissions. A subsequent Secretary of State for Health and Social Services, Barbara Castle, presents her analysis in very similar terms. She knows that the list has both shrunk and swelled since MoH (1963b), but she chooses to describe it as approximately stationary: “over the past 10 years the total surgical waiting list in England and Wales has hovered at the half million
mark, with little change from 1 year to another” (DHSS 1975, 2). She seems to think it incongruous that “the number of admissions nevertheless increased by more than 7%” (DHSS 1975, 2) but like her predecessor feels no need to discuss the possibility of underlying variation in the number of enrolments.

According to Culyer and Cullis (1976, 244), “HM(63)22 . . . emphasized that a long waiting list that was numerically stationary is not normally an indication of resource deficiency in any permanent sense but represents instead a ‘backlog’ of cases which could, and should, be removed by determined short-term efforts”. The “situation is one in which the system has settled down into a kind of long-run administrative equilibrium producing a constant addition to the waiting list . . . each time period which is just sufficient to offset the numbers called from the existing waiting list during the period” (Culyer and Cullis 1976, 245). They think the Ministry envisaged a situation in which the number of enrolments “is just sufficient to offset” the number of admissions.

Frost (1980) traces this to the Annual Report of the Chief Medical Officer for the year 1962, which asserts that “a long but steady waiting list is an indication only of a backlog of work remaining from the past” and that “[i]t is only if the waiting list is steadily increasing that one has any justification for deducing . . . from waiting list data alone . . . that there is a shortage of beds” (MoH 1963a, 205). We might conclude that the list was not “steadily increasing” (Culyer and Cullis 1976) in the absence of any data on the number of elective admissions. Indeed, we would think it stationary were we to compare the size of the list in 1964 with the size of the list in 1960 (475,863/475,643 = 1.000) or the size of the list in 1965 with the size of the list in 1951 (498,972/496,131 = 1.006) (Powell 1966). But according to Frost (1980), the waiting list for general surgery and related specialties, England and Wales, showed an increase in size of 23.0% from 126,000 on 31 December 1949 to 155,000 on 31 December 1962.

But while Culyer and Cullis (1976) and Frost (1980) agree with our reading of the Ministry’s position, neither attributes the failure of initiatives to the correct cause. The number of enrolments was not stationary, so a brief excess of admissions was not capable of effecting a permanent reduction in size.

Some Only Registered Discharge (and Death)

The first dataset, which was intended to inform the administration of the NHS across England and Wales, provided even less evidence of insight. When it was implemented across the two countries in 1958, the Hospital In-Patient Enquiry required the completion of a printed form (HIP 1A) for a one-in-ten sample of discharges from, and deaths in, hospitals (MoH and GRO 1961a). (Several categories of discharges (and deaths) were excluded such as those originating from maternity units and psychiatric wards.) The form allowed hospitals to record the dates on which the patient had been “put on the list or booked” for the condition and had been “first sent for” to come in to hospital (MoH and GRO 1961a). Successive iterations were intended to improve the coverage, completeness, and consistency of the data.

Doubtful Definitions

The second version of the form, which was introduced in 1967 (DHSS and OPCS 1970), established the pattern of data capture for the 18 years that followed. It allowed hospitals to continue recording the date of admission, the date of first operation, and the date of discharge (or death), but it omitted the date “first sent for.” The original definition of the “waiting time” was “[t]he interval between the date a case is placed on the waiting list, or booked, and the date of admission (or the date first sent for if the patient did not come into hospital when first offered a bed)” (MoH and GRO 1961a, 264). This suggests that length was calculated using either the date of admission, or else the date “first sent for,” depending on which gave the shorter answer. If this is correct, then the definition of length and the method of calculation subsequently changed:
Table 8 Whose origin is acknowledged when admissions are enumerated?

The first definition was used to collect data in the years 1959–1973.
“A patient for whom the hospital had previously agreed to arrange an admission in due course, it not being possible at that time to define in advance the exact day of admission, and who comes in when sent for by the hospital” (MoH and GRO 1961a, 262).

The second definition was used to collect data in the years 1974–1975.
“A patient for whom the hospital had previously arranged an admission in due course. Booked cases (non-maternity) are included with those who come in when sent for by the hospital” (DHSS et al. 1978, ix).

The third definition was used to collect data in 1976–1985.
“A patient for whom the hospital had previously agreed to arrange an admission in due course, and who comes in when sent for by the hospital. Booked cases, that is those for whom an admission date has been reserved, are excluded, as are patients whose admission has been deferred whether for medical or personal reasons” (DHSS and OPCS 1987, xi–xii).
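
The two definitions of the “waiting time” discussed in this section (the original one, which allows the date “first sent for” to end the wait, and the later one, which counts the interval in weeks from listing to admission) can be contrasted in a minimal sketch. It assumes, as the chapter only conjectures, that the original calculation used whichever end date gave the shorter answer, and it assumes that “weeks” means completed weeks. The function names and the patient's dates are hypothetical and are not drawn from the Hospital In-Patient Enquiry.

```python
from datetime import date
from typing import Optional


def wait_days_original(listed: date, admitted: date,
                       first_sent_for: Optional[date] = None) -> int:
    """Wait in days under the original definition, as the chapter reads it.

    Assumption: the wait ends at the date "first sent for" when that date
    is recorded and precedes admission; otherwise it ends at admission.
    """
    end = admitted
    if first_sent_for is not None and first_sent_for < admitted:
        end = first_sent_for
    return (end - listed).days


def wait_weeks_later(listed: date, admitted: date) -> int:
    """Wait under the later definition.

    Assumption: "interval in weeks" is taken to mean completed weeks
    between the date of listing and the date of admission.
    """
    return (admitted - listed).days // 7


# Hypothetical patient: listed 1 March 1960, first sent for on 12 April
# (did not come in), eventually admitted on 7 June.
listed = date(1960, 3, 1)
sent_for = date(1960, 4, 12)
admitted = date(1960, 6, 7)

original_days = wait_days_original(listed, admitted, sent_for)
later_weeks = wait_weeks_later(listed, admitted)
print(original_days, later_weeks)
```

For this hypothetical patient, the original definition gives 42 days (6 weeks), whereas the later definition gives 14 weeks, because the 8 weeks that followed the declined offer are no longer discounted.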
The later definition of the “waiting time” was “[t]he interval in weeks between the date a case is placed on the waiting list and the date of admission” (DHSS and OPCS 1970, 1987, xii), so the length of wait reported in 1967–1985 was longer – by definition – than in 1955–1966. We do not know why it was thought necessary to discount a part of the completed wait in the early years of the dataset, if a patient declined a reasonable offer of admission, and we do not know why the practice was abandoned in the later years of the dataset.

The definition of a “waiting list case” used in the later tabulations also differed from that used in the earlier tabulations (see Table 8).

Booked cases are included under the second definition but excluded explicitly under the third and implicitly under the first: a case cannot be booked, “it not being possible at that time to define in advance the exact day of admission.” “[P]atients whose admission has been deferred” are excluded under the third definition but are not excluded under the first or second. If this is correct, then there was a change in the mix of those included in official statistics over the 31 years of the Hospital In-Patient Enquiry: the discharges (and deaths) which follow elective admission were more narrowly defined and made to appear less numerous in 1967–1985 than in 1955–1966. We do not know why the entire waits, of each of those temporarily suspended at any point “for medical or personal reasons,” were included in the earlier version of the dataset but not in the later.

Event-Based Data Capture Makes Some Vanish

The number of admissions should exactly equal the number of discharges in every subset of records defined on geography, or demography, or diagnostic group if the lengths of stay were always zero, and the number of admissions should approximately equal the number of discharges if the lengths of stay were short compared with the period of data capture. But not everyone admitted to hospital was eventually discharged with an appropriate diagnosis, having completed the series of investigations or the relevant course of treatment. Death accounted for 5.67% of the records submitted for 1958 (MoH and GRO 1961a, 107). Fortunately, those responsible for designing the Hospital In-Patient Enquiry thought it important to record the frequency and distribution of fatalities among those admitted so there were no outcomes of admission not represented in the dataset. The authors were able to claim “[a]lthough strictly related to discharges, in the majority of cases the data will approximately correspond to admissions” (MoH and GRO 1961a, 3).

It was not possible to collect information on the length of wait for admission until the HIP 1A was implemented as the first revision of the transcription form in 1952 (Registrar General 1959). Regrettably, the item “date put on the list or booked” (MoH and GRO 1961a, 298) appears to have been added without fully appreciating its implications for the dataset (Douglas 1962). The authors warn “that the . . . data presented here only
give details of those patients who are admitted to hospital” (MoH and GRO 1961b, 12). Just as discharges underestimate admissions by the number of deaths, so booked admissions and admissions from the waiting list underestimate enrolments by the number removed. “Nothing is known of those patients who did not obtain admission” (MoH and GRO 1961b, 12). But whether it is the discharges (and deaths) of the Hospital In-Patient Enquiry (1952–1987) or the finished consultant episodes of Hospital Episode Statistics (1987 to date), using an end date associated with elective admission to define the set of records, does not allow us to establish the frequency of occurrence of other outcomes or the length of wait with which they are typically associated.

Had the designers chosen to accumulate lengths of wait by sampling all of the outcomes of enrolment, the dataset would have allowed other researchers to identify cohorts of additions to the list, e.g., in 1958, and would have allowed us to examine what happened to their members prospectively. But the designers chose instead to accumulate lengths of wait by sampling only those patients who had experienced the event of interest and only those records where this had occurred within a specified period. This has left subsequent analysts and researchers with very little choice. If they want to use the existing datasets, they must be ready to assume that removal from the list is infrequent, or that it has nothing to do with the length of wait, or that the experience of this group of patients doesn’t matter. If they want to use the latest accessions to the dataset and present timely analyses, they must be prepared to examine the prior wait of the quarter’s admissions instead of the subsequent wait of the quarter’s enrolments.

It is regrettable that the event-based and period-specific data capture modeled by the Hospital In-Patient Enquiry has been emulated so widely. It means there are few examples where the date of an event at the start, rather than at the end, of the wait is used to define the set of records, so there has been little opportunity to demonstrate the consequences of the approach empirically. We think those responsible for funding enquiry in this area too suspicious of novelty and too content with the existing state of affairs.

Period-Specific Cross-sections Estimate the Probability of Enrolment

The dataset was constructed by combining samples from cross-sections of records where membership was defined by the date of discharge (or death), i.e., the dataset was period, rather than cohort, specific. But having used the end date to determine whether a record ought to be included or not, we are obliged to use the start date to discover the length of wait. In other words, the Hospital In-Patient Enquiry supplied measures which were retrospective rather than prospective – it calculated the length of wait backward. (The same is true of most of the datasets currently available to health services researchers.)

The technical terms fail to convey the incongruity of substituting one approach for the other: if we want to know how long a patient might expect to wait, the retrospective approach is akin to putting the cart in front of the horse. This is seldom appreciated because we seldom take sufficient care in defining what it is that we have calculated. Let us imagine that the dataset allows us to count all of those who were admitted as booked or waiting list cases during 1952, and to identify that proportion of these which had a prior wait of less than 3 months. Strictly speaking, it allows us to estimate the probability of being “put on the list” 0–2 months prior to being admitted. But we want to know the probability of being admitted 0–2 months after being “put on the list.” So we need to count all of those who were “put on the list” during 1952 and to identify what proportion of these had a subsequent wait of less than 3 months. Now the prior waits for the period will have the same distribution as the subsequent waits of the cohort if the waiting list happens to be stationary (and closed). But publication of the length of the prior wait for 32 out of 34 years would seem to imply very great confidence in the veracity of this assumption.

It is likely that the design of the first dataset owed something to the preferences, practices, and technologies of the day. Each form
represented a finished spell in hospital. The details of admission, investigation, diagnosis, treatment, and discharge ought to have been a matter of record. It should therefore have been possible to complete the transcription form by handling the case notes once. It should never have been necessary to submit a partially completed form with the rest of the details to follow on a second copy at a later date. This kept the work of completing the forms to a minimum. It avoided the problem of matching two (or more) forms which described the same spell; it simplified the sorting, selection, and counting of relevant discharges (and deaths); and it eliminated the possibility of double counting.

But the submission of electronic records in 1965 implies that some of the work could have been done by computer. The dataset could have been amended at this point to derive some of its inputs from those admitted and the remainder from those discharged (or dead). It would have required the submission of two records for each spell (Steering Group 1984) as a matter of routine. The first would have registered admission to hospital with all of the details known at that time, and the second would have supplemented these with the additional details established by the time of discharge. The computer would have then been used to find the appropriate admission for each discharge (or death), and to merge the two, creating a single record for each finished spell.

It would have been possible to restrict attention to the discharges (and deaths) in the dataset by selecting only those records which met the relevant criterion, e.g., a date of discharge (or death) during 1952. But it would also have been possible to restrict attention to the admissions by selecting only those records which met the relevant criterion, e.g., a date of admission during 1952 regardless of whether the spell was finished or not. Once, the submission of two records would have meant returning to the same case notes on a second occasion, with a commensurate increase in the clerical workload. But that need no longer be so. The production of an initial record about admission and a subsequent record about discharge (or death) reflects the sequence of data entry on the Patient Administration System. As a result, it ought to have been possible to extend the usefulness of the Hospital In-Patient Enquiry with very little increase in labor once the submission of electronic records was sufficiently widespread. But the stakeholders who chose to compile records of discharges (and deaths) rather than of admissions continued to influence the design of the Hospital In-Patient Enquiry when it was no longer necessary to choose one rather than the other.

The usefulness (and coverage) of the dataset could have been extended had a further departure from the original design been allowed to provide information about enrolments as well as admissions and discharges (or deaths). This would have required the submission of a preliminary record, which would have registered the decision to admit and provided relevant details known at the time. Many of these patients were eventually admitted to hospital and subsequently discharged (or died), but some were removed from the list without having been admitted. In these instances, we would have wanted the second extract from the Patient Administration System to record the fact that the patient had been removed from the list and to record the date on which this occurred.

Had such a modification been introduced, we would now be able:

• To calculate the length of the subsequent wait (without needing 2 or more years follow-up of those most recently enrolled) (Armstrong 2010)
• To describe the characteristics and experience of a group of patients which is currently excluded from most of the available statistics

We would be able to do this without any loss of data about discharges (and deaths) and without any loss of data about admissions. We would also be able to identify all of those who were on the list and to calculate the length of each individual’s wait to date (Armstrong 2010), at any specified date and time.

The construction of the Hospital In-Patient Enquiry changed very little between 1952 and
1985. In 1957, the Ministry invited non-participating hospitals to extend coverage by submitting forms for a one-in-ten sample of inpatients discharged (or dead). In 1974, hospitals were invited to extend coverage by submitting forms for a one-in-ten sample of all whose discharge (or death) followed treatment (or investigation) as a day case.

We do not know whether the waiting list was thought to be stationary, or not, and we do not know whether there was an understanding of the consequences of assuming that the list is stationary, when it is not. We have found no documentation which alerts users to the fact that the prior waits for a period do not have the same distribution as the subsequent waits of the cohort unless the list is stationary (and closed). There is therefore no evidence that the Government Statistical Service considered the published measures to be erroneous when the waiting list was not, in fact, stationary.

Design, Analysis, and Interpretation are Constrained

The Hospital In-Patient Enquiry was compiled from period-specific cross-sections of those who had died in hospital, or been discharged, having been admitted electively. This method of data capture is analogous to drawing samples from each year’s contribution to the filing cabinets. It is easy to understand and implement, and it is widely used and familiar. It may provide inexpensive data for the purposes of research if items are collected as a matter of routine for other purposes, but the advantage of this has always to be set against the disadvantage that records were not constructed and items not collected with the aim of this particular investigation clearly in mind. As a result, the dataset may not contain all of the necessary records, i.e., the representation it provides may be biased (Berkson 1946; MoH and GRO 1961b; Cornfield and Haenszel 1960). The dataset may not contain all of the necessary variables, i.e., the analyses it permits may not allow for confounding and effect modification. And, where the dataset seems to include the necessary variables, the data may prove insufficiently reliable, valid, sensitive, or complete.

Datasets have been constructed which make use of the inputs of hospital administration, under standard definitions and across many hospitals, in order to meet the needs of researchers as well as those of analysts. The investment which their development represents is sometimes justified in part by the benefit – unspecified and intangible – which the designers expect to accrue from subsequent investigations. But the usefulness of these datasets for the purposes of research depends upon the goals and design of investigations not yet envisaged and on the extent to which the designers have succeeded in anticipating their requirements.

The dates of compilation, the list of contributors, and the stated inclusion and exclusion criteria indicate some of the more obvious limitations of these datasets. But most also constrain researchers in a way that is not obvious. Although the datasets supply records of the wait for elective admission, researchers may not use these to conduct cohort analyses – prospective or retrospective – of all of those who were added to the waiting list. The event-based (and period-specific) method of data capture used to compile the dataset obliges researchers to examine the prior waits of those admitted and the probabilities of enrolment, e.g., 0–2 months, prior to admission when they might have preferred to examine the subsequent waits of those enrolled and the probabilities of admission, e.g., 0–2 months, after enrolment.

This constraint is an artifact of the method of data capture. The Hospital In-Patient Enquiry aimed to compile information about hospital morbidity. It opted to do this by collating records of discharges (and deaths) instead of records of admissions or enrolments because case notes were more likely to include diagnoses, investigations, and treatments at the later of the three events. By definition, those who were removed from the list were not admitted, and their omission from the dataset may have been quite unintentional. Their case notes contained little information about diagnoses, investigations, or procedures, no date of admission, and no date of discharge (or death). So it would have been easy to class them with incomplete records and other examples of missing
data and to assume that the error was random rather than systematic.

We do not think the designers of the Hospital In-Patient Enquiry fully appreciated the consequences of appending the “date put on the list or booked” (MoH and GRO 1961a, 298) to form HIP 1A (Douglas 1962). Nevertheless, they established a precedent which resulted in the popularization of a defective method and widespread publication of biased estimates. Existing methods of data capture should be amended to include outcomes of enrolment other than admission (Armstrong 2000), and new datasets should define the set of interest – wherever possible – by using the date of an event at the start of the record rather than the date of an event at the end.

An Apparent Lack of Candor

The Ministry of Health (1963b) discussed the numbers waiting as reported in the SH3 return at the close of each year in its memorandum, HM(63)22, but it made no mention of the length of wait although the tables from the Hospital In-Patient Enquiry for 1955, 1956–1957, and 1958 were all available at the close of 1961. We think it unlikely that any data on the length of wait would have been ignored when the Hospital In-Patient Enquiry was intended to inform the administration of the NHS and the Ministry of Health was preparing to issue guidance (MoH 1963b). But the tables published during 1963 (for the 1959 and 1960 datasets) were the only ones in the series (1955–1985) which failed to report the length of wait despite collecting the dates needed to do so. The omission of appropriate statistics from the tables for 1959 (MoH and GRO 1963a) and 1960 (MoH and GRO 1963b) implies a lack of candor in the run-up to the British General Election of 1964 (Conservative, 1951–1964; Labour, 1964–1970).

The Government Statistical Service said nothing about the length of wait in 1959 and 1960 when it published its collection of historical tables in 1972. But it drew attention to an increase in “the median waiting time” and to an increase in “the proportion of those admitted who had been waiting six months or more,” when it examined the data for 1957–1960 as part of a longer series after the British General Election of 1979 (Labour: 1974–1979; Conservative: 1979–1997). It inferred “that hospitals were losing ground, . . . between 1957 and 1967, against increasing pressure on their resources” (DHSS et al. 1979, 266). This observation in 1979 is consistent with the views expressed in HM(63)22. Had the number of enrolments been stationary in the early 1960s, the Government Statistical Service expected a decrease in the length of wait to accompany an increase in the number of admissions. But “the proportion of those admitted who had been waiting six months or more” and “the median waiting time” were observed to increase despite an increase in the number of admissions, which suggests “increasing pressure on resources,” i.e., that the number of enrolments increased.

Some Compiled Returns

A judgment was passed on the set of discharges (and deaths), which resulted in abolition of the Hospital In-Patient Enquiry after 31 December 1985 and in implementation of the Körner Reporting System on 1 April 1987. It was asserted that “[t]his survey is being replaced by the Körner data system” (DH and OPCS 1989, 1), i.e., that the Körner Reporting System replaced records of discharges (and deaths) with aggregate counts, sometimes of those admitted (or removed) from the list, sometimes of those still awaiting admission, and sometimes of those enrolled on the list. This might seem to suggest that the work of compiling the records of discharges (and deaths) had become too burdensome, even on the basis of a one-in-ten sample (MoH and GRO 1961a), or else that the English NHS had decided that a series of aggregate counts could better meet its needs and had identified those it thought necessary. But this is not the whole story. The Körner Reporting System replaced a number of returns in addition to the Hospital In-Patient Enquiry, e.g., the SBH 203 and the EDP4 and EDP5 of the SH3 (Steering Group 1984); and, even as the assertion was being published, the first records of inpatient episodes were being compiled into Hospital Episode Statistics. It appears that none of the criticisms made
by Working Group A on hospital clinical activity have to do with items supplied by the Hospital In-Patient Enquiry (DHSS 1981b).

Nevertheless, it was the Körner Reporting System which introduced the count of decisions to admit each quarter, the first data on the number of enrolments, additions, or accessions to be collected in almost 39 years of the UK NHS (DHSS 1986, 4; Newton et al. 1995). Counts were also proposed of the number of patients admitted, and of the number of patients removed, from the list each quarter and of the number of patients awaiting admission at the quarter’s end (Steering Group 1984). The four counts seem to imply that the stock-flow model, or some version of the basic demographic equation (Newell 1988; Pressat 1985), may have informed the design of the relevant returns. But this is doubtful. Working Group A used a different model to justify its proposals to the NHS in 1981, one which claimed to provide information about demand (expressed, met, and unmet) and about attempts to supply demand (DHSS 1981b; Steering Group 1984).

We think that this is why its recommendations were presented under the heading “Information about demand for hospital facilities” (DHSS 1981b, 120) and why ‘demand’ was mentioned 42 times in the relevant chapter while ‘stock’ and ‘flow’ were not mentioned at all (DHSS 1981b). We think that this is why the forms were first implemented as returns about the “demand for elective admission” (DHSS 1987, 1) and why ‘demand’ is mentioned 13 times (and ‘stock’ and ‘flow’ are not mentioned at all) in the penultimate “DataSet Change Notice (DSCN)” of the series. We think that this interest in supply and demand is why Working Group A proposed the counting of “admission decisions” (DHSS 1981b, 129) despite the confusion of these with “admissions arranged” (DHSS 1987, 1) and why it coined the term “decision to admit” (DHSS 1981b, 123, 125–6 & 130) instead of “patients added to the list” (CRIR 1997, 2–5 of 7). We think that this is why Working Group A proposed a count of patients who were not admitted (despite arrangements having been made) as well as a count of patients who were (DHSS 1981b), and we think that this is why the Steering Group proposed counts of those who failed to attend, counts of admissions canceled by the hospital, and counts of patients removed from a list for any reason other than elective admission (Steering Group 1984).

We know that the design of the relevant returns was not solely dependent upon the members of Working Group A. So the Steering Group added the count of patients removed from the list to the KH06 return on “events occurring during [the] period” (1984, 90) and published its recommendations before it was realized that the additional counts of the KH07A return would be required. Later versions of the KH06 return (CRIR 1997; CRIR 1998) instructed NHS Trusts to check that the counts on the KH06, KH07, and KH07A returns were consistent, although the possibility of doing this was not mentioned by Working Group A, the Steering Group, or those responsible for the development of the earliest versions of the returns (DHSS 1981b; Steering Group 1984; DHSS 1986).

Despite the addition to the KH06 return of an instruction to evaluate the consistency of the data, we have found little evidence (in 40 sets of returns submitted by each provider) that the stock-flow model, or any version of the basic demographic equation, has been used to do this. (The instruction was added no later than 1 April 1996 (CRIR 1997) and remained in force until the return was abolished on 1 April 2006 (ISB 2006).)

• The version of the KH06 return, which was issued for use from 1 April 1998 (CRIR 1998, 7 of KH06), added “[e]xplanations may be given in the box below” to the second paragraph of instructions about checking consistency, and it also added a box with the invitation “[t]his area can be used for your notes and maybe [sic] used to explain any special features which have affected this return.” These changes might imply that the eight previous sets of submissions contained inconsistencies large enough to warrant explanation. But there were numerous changes in this version of the return – most having to do
with format and layout and very few having any effect on the counts. (The addition of pain management to the list of main specialty functions will have generated an additional series of counts, and the counts against one (or more) of the existing categories might have diminished as a consequence.) Given that previous versions of the return invited comment on counts of ordinary (or inpatient) admissions and counts of day-case admissions, the invitation to explain any inconsistency may reflect a desire for consistent presentation rather than grounds for concern.
• The National Audit Office (2001a, 21) “was unable to reconcile” the counts. It found 24,312† more patients on the list at the close of the quarter than were accounted for by enrolments less admissions and removals (Table 3c), and the Department of Health was unable to explain the discrepancy when asked to do so. The National Audit Office (2001b) also queried an inconsistent reduction in the size of the list at Surrey and Sussex Healthcare NHS Trust (England), 1998–1989. It is not likely that this Trust had checked the consistency of its returns.
• We have been informed that “[t]he NHS Data Model and Dictionary team are not aware . . . of any reviews or audits that [were] commissioned by the Department of Health into the internal consistency of the KH06 and KH07 returns” (personal communication, Mayet M, 24 January 2016).

Discussing attempts “to tackle waiting-list problems,” Yates (1987, 71) claimed “there is no tradition of writing up managerial work of this type in medical, or even in management journals.” (Copyright © John Yates 1987.) The paper by White (1980) appears to be the only example of its type which survived peer review and made it into print, but it is scarcely possible that he was the only analyst in England and Wales who was interested in the relationship between inputs, outputs, and the size of outpatient and inpatient waiting lists. So while the lack of documentary evidence suggests that NHS Trusts and District Health Authorities did not check the internal consistency of the KH06 and KH07 returns, this is a conclusion we are not yet ready to draw.

The four counts used to describe the inpatient waiting list might have been consistent when first proposed (Steering Group 1984; DHSS 1986; IMG 1992). The Steering Group (1984, 87) recommended counting the “[n]umber of patients for whom a decision-to-admit has been made,” the “[n]umber of patients admitted electively,” the “[n]umber of patients . . . removed from a list,” and the “[n]umber of patients still awaiting admission.” It appears to have discounted – at least for the purposes of the narrative – the possibility that an individual might require elective investigation or treatment more than once a quarter. Instead, it claims that “a cohort of all the patients for whom a decision to admit has been made during a specified time period can be followed up at regular intervals and the number in the cohort admitted at different times recorded” (Steering Group 1984, 86). The members of the cohort are “patients for whom a decision to admit has been made,” which seems to imply a single decision to admit per patient. Moreover, the cohort is “followed up at regular intervals” to identify those no longer awaiting the outcome of interest, i.e., “the number admitted,” which indicates that a member either has, or has not, been admitted “at different times” and seems to imply a single outcome per patient. The narrative does not mention removal from the list for reasons other than admission.

We do not think the Steering Group ignorant of the possibilities. It understood that while the counts describing the outpatient waiting list might be correlated, they were not consistent. Alluding to the decision to admit to the list, the Steering Group claims that “[p]iloting and consultation have shown the practical difficulty of capturing and recording any requests other than those made in writing. It is however feasible to record the number of written requests made by general practitioners and changes in this statistic should reflect changes in the total number of requests” (Steering Group 1984, 87).

But the Steering Group (1984, 87) also recommended regular reports of the “[n]umber of patients for whom arrangements to admit were
made but who were not admitted because they failed to attend” and of the “[n]umber of patients for whom arrangements were made but admission did not take place because of cancellation by the hospital.” If these are understood to be alternative outcomes of enrolment, then admission and removal by definition cannot provide a consistent account for the change in the size of the list.

And the definitions of the four counts used to describe the inpatient waiting list were not wholly consistent in subsequent iterations of the Körner Reporting System (CRIR 1997; CRIR 1998).

The CRIR Secretariat (1997, 32) asserts that “[p]atients waiting at the end of the quarter should be equivalent to patients waiting at the end of the last quarter plus the number of additions and minus the number of patients admitted in the quarter or removed from the elective admission list for other reasons.” This is what we would expect if (a) the date of addition marked the start of each wait, (b) the date of admission (or of removal) marked the end of each wait, and (c) if everyone waiting was eligible for admission on any and all of the intervening dates. But not everyone was considered eligible for admission on any and all of the dates separating their addition to the list from their removal.

“For the figures to balance,” providers were told, “suspended patients must also be taken into account” (CRIR 1997, para. 164). There are two ways of doing this.

The first of these handles the count of those suspended as though it was a flow. The number suspended that quarter is added to decisions to admit this quarter as though that number were reinstated this quarter (Armstrong 2000), and the number suspended this quarter is added to the number removed. So we expect

\[ E + S_{\mathrm{then}} - (A + R + S_{\mathrm{now}}) = C_{\mathrm{now}} - C_{\mathrm{then}}, \tag{3.2} \]

where $S_{\mathrm{then}}$ represents those reinstated to the list, and $S_{\mathrm{now}}$ represents those removed from the list, this quarter. The Körner Reporting System does not tell us how many were suspended over the course of the quarter. $S_{\mathrm{now}}$ estimates the count in question by assuming that each suspension lasts one quarter on average. Moreover, the Körner Reporting System does not tell us how many were reinstated over the course of the quarter. $S_{\mathrm{then}}$ estimates the count in question by assuming that each suspension lasts one quarter on average and that everyone suspended is duly reinstated (CRIR 1997).

The second handles the count of those suspended as though it was a stock. The number suspended at the end of that quarter is added to the count of those awaiting admission at that date, and the number suspended at the end of this quarter is added to the count of those awaiting admission at this date. So we expect

\[ E_{\mathrm{now}} - (A_{\mathrm{now}} + R_{\mathrm{now}}) = (C_{\mathrm{now}} + S_{\mathrm{now}}) - (C_{\mathrm{then}} + S_{\mathrm{then}}), \tag{3.3} \]

where $S_{\mathrm{now}}$ represents those suspended from the list at the time of this census, and $S_{\mathrm{then}}$ represents those suspended from the list at the time of that census. We do not need to make any assumptions about the length of suspension or the frequency of reinstatement under this approach. Instead, we expect the balance of enrolments less admissions (and removals) to account for the difference between the censuses once we have corrected those counts by adding back the suspended.

Formulae (3.2) and (3.3) are equivalent. But formula (3.2) tells us that enrolment and reinstatement cause the official list ($C_{\mathrm{now}}$, $C_{\mathrm{then}}$) to swell and that admission, removal, and suspension cause it to shrink, whereas formula (3.3) provides a simpler account – the number waiting increases as a result of enrolment and decreases as a result of admission and removal – but the list ($C_{\mathrm{now}} + S_{\mathrm{now}}$, $C_{\mathrm{then}} + S_{\mathrm{then}}$) is not the one reported in the Press. With a little rearrangement, both formulae yield the relationship which providers were to use to check the consistency of their counts of inpatients and of day cases (CRIR 1997, para. 164 & p. 6 of KH06), namely,

\[ C_{\mathrm{now}} = C_{\mathrm{then}} + S_{\mathrm{then}} + E_{\mathrm{now}} - (A_{\mathrm{now}} + R_{\mathrm{now}}) - S_{\mathrm{now}} \tag{2.1} \]
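The check described by formula (2.1) can be sketched in a few lines of Python. This is an illustration only: the class and field names are our own and are not taken from the KH06 or KH07 forms, the counts in the example are invented, and the `tolerance` argument is an assumption about how a permitted margin might be applied. The sketch also confirms that formulae (3.2) and (3.3) leave the same residual for any set of counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuarterCounts:
    """Counts for one quarter. Field names are illustrative assumptions,
    not the labels printed on the KH06 or KH07 forms."""
    enrolled: int        # E_now: decisions to admit during the quarter
    admitted: int        # A_now: elective admissions during the quarter
    removed: int         # R_now: removals for reasons other than admission
    waiting_then: int    # C_then: official list at the previous census
    suspended_then: int  # S_then: suspended at the previous census
    waiting_now: int     # C_now: official list at this census
    suspended_now: int   # S_now: suspended at this census


def expected_waiting_now(q: QuarterCounts) -> int:
    """Right-hand side of formula (2.1)."""
    return (q.waiting_then + q.suspended_then + q.enrolled
            - (q.admitted + q.removed) - q.suspended_now)


def discrepancy(q: QuarterCounts) -> int:
    """Reported official list minus the list implied by formula (2.1)."""
    return q.waiting_now - expected_waiting_now(q)


def residual_flow_form(q: QuarterCounts) -> int:
    """Left side minus right side of formula (3.2), suspensions as flows."""
    flows = q.enrolled + q.suspended_then - (q.admitted + q.removed + q.suspended_now)
    return flows - (q.waiting_now - q.waiting_then)


def residual_stock_form(q: QuarterCounts) -> int:
    """Left side minus right side of formula (3.3), suspensions as stocks."""
    flows = q.enrolled - (q.admitted + q.removed)
    stocks = (q.waiting_now + q.suspended_now) - (q.waiting_then + q.suspended_then)
    return flows - stocks


def is_consistent(q: QuarterCounts, tolerance: int = 0) -> bool:
    """True when the discrepancy lies within the (assumed) tolerance."""
    return abs(discrepancy(q)) <= tolerance


if __name__ == "__main__":
    # Invented counts: the first set balances, the second does not.
    balanced = QuarterCounts(500, 420, 30, 1000, 40, 1045, 45)
    unbalanced = QuarterCounts(500, 420, 30, 1000, 40, 1060, 45)
    for q in (balanced, unbalanced):
        print(discrepancy(q), residual_flow_form(q),
              residual_stock_form(q), is_consistent(q))
```

The two residuals always agree because (3.2) and (3.3) differ only in the side of the equation on which the suspended are placed; each residual is the negative of the discrepancy from formula (2.1).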
So the two approaches give identical results. The CRIR Secretariat claims that “[t]he change in the total numbers waiting should reflect this activity,” that is, “all the additions to the waiting list (i.e., the number of decisions to admit) and removals from the waiting list that have taken place during the quarter” (CRIR 1997, para. 144). If we understand “the total numbers waiting” to include those suspended, i.e., $C_{\mathrm{now}} + S_{\mathrm{now}}$ and $C_{\mathrm{then}} + S_{\mathrm{then}}$, this statement would seem to imply the relationships of formula (3.3). But if we understand the “total . . . of all patients waiting for admission” to exclude those suspended (CRIR 1997, para. 155), the statement would seem to imply the relationships of formula (3.2). Given that the data about suspensions ($S_{\mathrm{now}}$, $S_{\mathrm{then}}$) were obtained by taking a census (CRIR 1997), formula (3.3) is the model which ought to be used.

Having demonstrated the consistency of the data by adjusting for suspensions (CRIR 1997), we ought to be willing to acknowledge – in the first instance – that it is “the total numbers waiting” and not the official numbers which reflect the balance of enrolments less removals (and admissions) and, in the second instance, that it is the balance of enrolments and reinstatements less admissions (and removals and suspensions) which changes the official numbers and not “the total numbers waiting.”

Now some patients will require elective treatment (or investigation) on more than one occasion (IMG 1992). Some will require treatment (or investigation) for the same condition, will undergo the same procedure, and will appear on the same list, on two (or more) occasions. The NHS accepts that the manager ought not to be held responsible for that part of any wait over which she can be expected to exercise no control. So if a patient is admitted to the same waiting list twice (CRIR 1997), e.g., for extraction of two cataracts, she is not considered as waiting for the second operation until she has been discharged from hospital after the first. But the data model implied by this is more complicated than that in which each patient is (assumed) to require just one admission or in which we count, for example, the number of decisions to admit – rather than the number of individuals added – to the list. The instructions for the KH07 return are simple: by definition, no patient can wait for more than one procedure at a time, so no patient may be counted more than once in the census at the end of the quarter. But the instructions for the KH06 return are not simple: if the dates of the decisions to admit fall in the same quarter for both procedures, the count of decisions to admit must not include the second of them; and if the dates of the admissions do not fall in the same quarter for both procedures, the count of admissions must include the second of them. We think that the date of admission (or removal) for the subsequent procedure will be counted more often than the date of the decision to admit which preceded it. So the consistency of the four counts was impaired when the KH07 was modified to exclude all of those ‘awaiting’ an additional procedure and the KH06 was modified to exclude those ‘awaiting’ a second procedure only when the first procedure had not yet been completed.

While the terms stock and flow have not been used in any document about the KH06, KH07, and KH07A returns or in any of the official commentary, they were introduced as labels for the datasets which took their place. DSCN 09/2006, which announced the “data flow” intended to replace the tabulated content of the returns (ISB 2006, 1), mentioned ‘stock’ 29 times and did not mention demand once. (It also mentioned ‘flow’ 41 times, but not all of these were to do with the events previously recorded by the KH06.) Despite this, there seems to be little understanding of the relationships implied by the stock-flow model even when the terms are used extensively. The definitions of the four counts represented either as a ‘stock’ or as a “flow” are not perfectly consistent (ISB 2006, 44 & 46).

Dr. A. Mason, who had previously demonstrated an excellent understanding of the relationship between stock and flow (Mason 1976), was a member of the Secretariat and therefore party to the deliberations both of Working Group A and the Steering Group. Now Working Group A claimed that “information is required about the balance between referrals and the number seen . . . [t]o identify whether the number of patients waiting for an out-patient appointment is increasing or
decreasing” (DHSS 1981b, 122). It also claimed that “information is required about the balance between expressed and met demand” (DHSS 1981b, 123), presumably in order to determine whether the number of patients waiting for an inpatient admission is increasing or decreasing. Nevertheless, we fear that neither the stock-flow model nor the basic demographic equation had much influence on the analysis of the data. The English NHS appears to have collected relevant counts for 24 years (1 April 1987–31 March 2010) without ever testing its convictions about the effect of enrolment on the size of the list, and it appears to have done so for 10 of these despite instructions to check the consistency of the counts (1 April 1996–1 April 2006).

The KH06, KH07, and KH07A returns were abolished on 31 March 2010, on the grounds that the suite of 18-week referral to treatment times adequately met the needs of users. But this dataset has failed to provide any information about the number of enrolments, additions, or accessions for 5½ years (31 March 2010–1 October 2015 (Analytical Services 2015)). The deficiency has now been rectified, ostensibly to allow the reintroduction of a check on the consistency of the four counts, i.e., of “new RTT clock starts” (E), “completed RTT pathways” (A), “validation removals” (R), and changes in the size of the list ($C_{\mathrm{now}} - C_{\mathrm{then}}$). But Analytical Services did not explain why we expect start dates and end dates to yield exactly the same count of those eligible for admission at any point during the month of interest (Analytical Services 2015, 8). It is perhaps not surprising that it permits “a reasonable tolerance” for the consistency check as did the CRIR Secretariat before it (CRIR 1997, 6 of KH06).

Some Made Hay

Culyer and Cullis (1976) note that the size of the waiting list for England and Wales has not decreased as a result of increases in the number of admissions. They claim “that no one has to date succeeded in formulating a systematic and testable model to explain the phenomena . . . satisfactorily” (Culyer and Cullis 1976, 251), and they advocate “[a]n alternative approach, likely to appeal to those who prefer not to reject the supply/demand approach entirely” (Culyer and Cullis 1976, 247). But “despite very diligent searching” (Culyer and Cullis 1976, 264), and despite emphasizing the “one behavioural law that has never been refuted” (Culyer and Cullis 1976, 244), they are obliged to confess that “we have been unable to uncover any systematic and reliable empirical relationships among the relevant variables, nor have we been able to devise a plausible ‘behavioural’ model that has led to the specification of such a set of relationships” (Culyer and Cullis 1976, 264). Culyer and Cullis (1976) claim that the first hypothesis has failed without realizing that it has not been subject to a fair trial. They attempted to construct a model without considering the effect of variation in the number of enrolments.

Researchers continue to find fresh evidence of the direct association between the number of admissions (or an appropriate surrogate) and the size of the list (Buttery and Snaith 1980; Frost 1980), which Culyer and Cullis (1976) viewed as indicating the failure of the first hypothesis. Moreover, there appears to have been little diminution in the popularity of the “one behavioural law that has never been refuted” as a result of Culyer and Cullis’s inability to implement it satisfactorily. The direct relationship continues to be explained by the appetites of those who enter the marketplace to sell (supplier-induced demand) rather than the appetites of those who enter the marketplace to buy.

The Institute of Social and Economic Research received support from the Department of Health and Social Security “for . . . research into the economics of waiting lists.” It received a grant, and Culyer and Cullis (1976, 239) “benefited enormously from discussions with DHSS officials,” which may be why the DHSS turned to the Institute for advice. But it is likely that the enquiry was also prompted by prevailing opinion, e.g., by “Parkinson’s Law of Hospital Beds” (Powell 1966, 43) and “Say’s Law of Hospitals” (Culyer and Cullis 1976, 244), and by the expressions of other economists (Feldstein 1967) earlier on the scene. Whatever the reason, the DHSS
chose to consult economists rather than the mem- can determine how many of these were admitted
bers of any other school of social science. It is during the period of interest, and we can calculate
perhaps no surprise that supplier-induced demand the length of their completed stay. But we have no
has become the dominant paradigm in the litera- information about those who have yet to be
ture from the UK. discharged (or to die): we cannot determine how
many of them were admitted during the period of
interest, and we cannot calculate the length of
The Primary Hypothesis Has Not their incomplete stay. And if we have chosen to
Been Falsified register admissions, we can count them and cal-
culate the length of stay with ease, i.e., we know
The Ministry of Health (1954, 5) recommended which of those admitted during the period of
“the careful and regular study of such figures as . . . interest have yet to (die or) be discharged, but
size of waiting list in proportion to number . . . of we have little information about the outcome of
patients treated, degree of urgency of need of their admission, e.g., diagnosis, treatment, and
patients on the waiting list, numbers waiting for destination of discharge.
defined periods and such other indices as are available in published documents.” In other words, the Ministry expected the compilation of information about the numbers waiting and the numbers admitted, but it did not expect the compilation of information about the numbers enrolled. It is therefore not surprising that the Hospital In-Patient Enquiry did not provide counts of the numbers enrolled in England and Wales.

The omission was hallowed by successive datasets, first by those that relied on printed forms and second by those that relied on electronic media for their inputs. The national dataset compiled records after investigation or treatment had been completed, and these records were collated by the period in which the event was registered, i.e., by the date of discharge (or death) (Registrar General 1959). This architecture facilitated the counting and cross-classification of discharges (and deaths), and it reflected our need for data on morbidity (Registrar General 1959).

But we are also interested in the use made of the costlier resources. This has expressed itself in an interest in the length of stay and therefore in the occurrence of admission as well as discharge (or death). Extracting information about admissions from a collection of discharges (and deaths) is a little involved. It is not difficult to obtain the information we require when we have both the date of admission and the date of discharge (or death), but we face the problem of our choices while we await the date of the second event. If we have chosen to register discharges and deaths, we

Extracting information about enrolments from a collection of discharges (and deaths) is more involved. If we are to obtain a complete set of enrolments, we must:

• Identify those who have been discharged (or who died) following admission from the waiting list.
• Identify those who have not been discharged (or have not died) following admission from the list.
• Identify those who have not been admitted from the list.

This third group includes (i) some who will be admitted from the list and who will, in due course, be numbered among the discharged (or dead), and it includes (ii) others who – having been removed from the list – will never be admitted and will therefore never be numbered among the discharged (or dead).

We face the problem of our choices. If we had chosen to register patients immediately after their enrolment on the list, instead of after their discharge from hospital, it would be easy to determine the size of a cohort and to cross-classify its members. But the architecture of successive datasets in England prized economy of effort: it set about capturing the requisite variables, and relevant records, in a single pass. This can only be done using discharges (and deaths). If we attempt to compile our records on admission, some data about the outcome of admission will be missing.
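
The three groups listed above can be made concrete with a small sketch. The Python example below is illustrative only: the record layout (`enrolled`, `admitted`, `discharged`, `removed`), the function names, and the sample dates are all assumptions introduced here and do not describe any actual NHS dataset. It classifies each enrolment in a cohort by what is known on a chosen census date, and it shows how few of the cohort a register built from discharges alone would have captured by then.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Enrolment:
    """One hypothetical waiting-list enrolment (field names are assumptions)."""
    enrolled: date
    admitted: Optional[date] = None
    discharged: Optional[date] = None  # discharge or death after admission
    removed: Optional[date] = None     # removal from the list without admission


def classify(rec: Enrolment, census: date) -> str:
    """Return the status of one enrolment as it is known on the census date."""
    if rec.discharged is not None and rec.discharged <= census:
        return "discharged"               # group 1: visible to a discharge register
    if rec.admitted is not None and rec.admitted <= census:
        return "admitted_not_discharged"  # group 2: admitted, outcome still pending
    if rec.removed is not None and rec.removed <= census:
        return "removed_never_admitted"   # group 3 (ii): will never be discharged
    return "still_waiting"                # group 3: not admitted, outcome not yet known


def cohort_status(records: Iterable[Enrolment], start: date, end: date,
                  census: date) -> Counter:
    """Count statuses for everyone enrolled between start and end (inclusive)."""
    cohort = [r for r in records if start <= r.enrolled <= end]
    return Counter(classify(r, census) for r in cohort)


# Made-up sample records for illustration.
records = [
    Enrolment(date(2000, 1, 10), admitted=date(2000, 3, 1), discharged=date(2000, 3, 5)),
    Enrolment(date(2000, 2, 1), admitted=date(2000, 6, 20), discharged=date(2000, 7, 10)),
    Enrolment(date(2000, 2, 15), removed=date(2000, 5, 1)),
    Enrolment(date(2000, 3, 20)),
    Enrolment(date(2000, 5, 1)),  # enrolled after the cohort window, so excluded
]

status = cohort_status(records, date(2000, 1, 1), date(2000, 3, 31), date(2000, 6, 30))
cohort_size = sum(status.values())
print(cohort_size, dict(status))
```

In this made-up example only one of the four cohort members has a discharge record on the census date, so a count based on discharges alone would understate the size of the cohort until the records were revisited at a later date.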
These details could be supplied by taking a second pass at a later date and replacing any record which was incomplete with the now completed version.

There has been no attempt to construct a national dataset from enrolments in England using repeated passes to upload the latest details from the most recent accessions. And there has been no attempt to construct an equivalent dataset out of discharges (and deaths) for the purposes of longitudinal research, where timeliness is much less of an issue. But the relationship between enrolments, admissions (and removals), and the size of the list cannot be assessed empirically using the dataset available (Hospital Episode Statistics). It would not be reasonable however to attribute a lack of interest in the effect of enrolment to the lack of relevant data. The Department of Health and Social Security instructed hospitals to report the number of enrolments as aggregate counts between 1 April 1987 and 31 March 2010, by completing the KH06 return on a quarterly basis. Nonetheless, there is little evidence that this data has been used to check the reliability of the counts or the validity of the relationship hypothesized.

References

Analytical Services. Aligning the publication of performance data – statistics consultation. Leeds: NHS England; 2015. p. 8. https://www.engage.england.nhs.uk/consultation/aligning-publication-performance-data. Accessed 11 July 2016. Contains public sector information licensed under the Open Government Licence v3.0.
Armstrong PW. First steps in analysing NHS waiting times: avoiding the ‘stationary and closed population’ fallacy. Stat Med. 2000;19:2037–2051. By permission of John Wiley and Sons. https://doi.org/10.1002/1097-0258(20000815)19:15<2037::AID-SIM606>3.0.CO;2-R/pdf.
Armstrong PW. Spotting the pantomime villain: do the usual approaches correctly indicate when waiting times got shorter? Health Serv Manag Res. 2010;23:103–115. By permission of SAGE. https://doi.org/10.1258/hsmr.2009.009021.
Berkson J. Limitations of the application of fourfold table analysis to hospital data. Biometrics. 1946;2:47–53.
Buttery RB, Snaith AH. Surgical provision, waiting times and waiting lists. Health Trends. 1980;12:57–61.
Carvel J. Tories doubt fall in hospital waits. Guardian, 10 Jan 2004, p. 6.
Committee for Regulating Information Requirements (CRIR) Secretariat. Central returns: waiting times. DSCN: 10/98/P10. Birmingham: NHS Executive; 1998. p. 3, 7 of KH06. Contains public sector information licensed under the Open Government Licence v3.0.
Committee for Regulating Information Requirements (CRIR) Secretariat. Patients awaiting elective admission. In: The Data Manual. Hospital services module, version 4.0. Birmingham: Information Management Group, NHS Executive; 1997. p. 7, 12–4, 16–7, 29–32, 2–6 of 7, 3 of 4. Contains public sector information licensed under the Open Government Licence v3.0.
Cornfield J, Haenszel W. Some aspects of retrospective studies. J Chronic Dis. 1960;11:523–34.
Culyer AJ, Cullis JG. Some economics of hospital waiting lists in the NHS. J Soc Policy. 1976;5(3):239–64. By permission of Cambridge University Press.
DeCoster C, Chateau D, Dahl M, et al. Waiting times for surgery, Manitoba 1999/2000 to 2003/04. Winnipeg: Manitoba Centre for Health Policy; 2007. p. 6, 37–8, 53, 59. http://mchp-appserv.cpe.umanitoba.ca/reference/swt_3web.pdf. Accessed 11 July 2016.
Department of Health and Social Security (DHSS). A report of the working groups A to the steering group on health services information. London: NHS/DHSS Steering Group on health services information; 1981b. p. 120–30. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Management services. Demand for elective admission: statistical returns KH06, KH07 and KH07A. SM(87)2/8. Blackpool: Statistics and Research Division 2A; 1987. p. 1. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Management services. Post Korner aggregate statistical returns. SM(86)2/11. Blackpool: Statistics and Research Division 2A Fylde; 1986. p. 4. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security (DHSS). Orthopaedic services: waiting time for out-patient appointments and in-patient treatment. Report of a working party to the Secretary of State for Social Services. London: DHSS; 1981a. p. 11, 24, 33, 42, 76, 80–1. http://nhsreality.wordpress.com/2015/01/. Accessed 11 July 2016.
Department of Health and Social Security (DHSS). Reduction of waiting times for in-patients admission: management arrangements. HSC(IS)181. London: DHSS; 1975. p. 1–4. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys (DHSS & OPCS). Hospital in-patient enquiry, summary tables. Based on a one in ten sample of NHS patients in hospitals in England, 1985. MB4 no. 26. London: HMSO; 1987.
p. xi–xii. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys (DHSS & OPCS). Report on hospital in-patient enquiry for the year 1967. Part I. Tables. London: HMSO; 1970. p. 298–9. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys, Welsh Office. Hospital in-patient enquiry. Main tables. Based on a one in ten sample of NHS patients in hospitals in England and Wales, 1974, Series MB4, no. 2. London: HMSO; 1978. p. ix. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health and Social Security, Office of Population Censuses and Surveys, Welsh Office. Hospital in-patient enquiry. Patterns of morbidity. Based on a one in ten sample of NHS patients in hospitals in England and Wales, 1962–67, Series MB4, no. 3. London: HMSO; 1979. p. 266. Contains public sector information licensed under the Open Government Licence v3.0.
Department of Health, Office of Population Censuses and Surveys (DH & OPCS). Hospital in-patient enquiry in-patient and day case trends. Based on a nominal one in ten sample of NHS patients in hospitals in England 1979–1985, Series MB4, no. 29. London: HMSO; 1989. p. 1. Contains public sector information licensed under the Open Government Licence v3.0.
Douglas JWB. Ministry of Health and General Register Office: report on hospital in-patient enquiry for the year 1958: Part II. London: HMSO; 1961. p. 301. 17s. 6d. Popul Stud (Camb) 1962; 16(2):196.
Farquharson D. Waiting times management in Lothian. Edinburgh: NHS Lothian; 2011. p. 7. http://www.scottish.parliament.uk/S4_HealthandSportCommittee/General Documents/2012.01.09_to_DM_-_report_from_NHS_Lothian_on_waiting_times_management.pdf. Accessed 11 July 2016.
Faulkner A, Frankel S. Delayed access to non-emergency NHS services. A review of NHS waiting times and waiting list research issues. Bristol: Health Care Evaluation Unit, University of Bristol; 1993. p. 23, 84.
Feldstein MS. Economic analysis for health service efficiency. Amsterdam: North-Holland; 1967. p. 152, 200.
Finn C. The management, collection and publication of acute day and inpatient waiting lists. Dublin: Institute for the Study of Social Change, University College Dublin; 2004. p. 12–7.
Fordham R. Managing orthopaedic waiting lists. Discussion paper no. 27. York: Centre for Health Economics, University of York; 1987. p. 9. http://www.york.ac.uk/che/pdf/dp27.pdf. Accessed 11 July 2016.
Fowkes FGR, Page SM, Phillips-Miles D. Surgical manpower, beds and output in the NHS: 1967–1977. Br J Surg. 1983;70:114–6.
Frost CEB. How permanent are NHS waiting lists? Soc Sci Med. 1980;14C:1–11.
Goldacre MJ, Lee A, Don B. Waiting list statistics. I: relation between admissions from waiting list and length of waiting list. Br Med J (Clin Res Ed). 1987;295:1105–8.
Hamblin R, Harrison A, Boyle S. Access to elective care: why waiting lists grow? London: King’s Fund; 1998. p. 12–5, 26, 58. http://kingsfund.koha-ptfs.eu/cgi-bin/koha/opac-detail.pl?biblionumber=20657. Accessed 17 Aug 2016. By permission of The King’s Fund.
Hanning M, Lundström M. Assessment of the maximum waiting time guarantee for cataract surgery. The case of a Swedish policy. Int J Technol Assess. 1998;14:180–93.
Harvey I, Webb M, Dowse J. Can a surgical treatment centre reduce waiting lists? Results of a natural experiment. J Epidemiol Community Health. 1993;47:373–6.
Haywood SC. Managing the health service. London: Allen & Unwin; 1974. p. 38.
Hinde A. The lexis chart. In: Demographic methods. London: Arnold; 1998. p. 12–3.
House of Commons Health Committee. Public expenditure on Health and Personal Social Services 2009. Memorandum received from the Department of Health containing replies to a written questionnaire from the Committee. London: The Stationery Office; 2010. p. 132–4. http://www.publications.parliament.uk/pa/cm200910/cmselect/cmhealth/269/269i.pdf. Accessed 11 July 2016. Contains public sector information licensed under the Open Government Licence v3.0.
Hurst J, Siciliani L. Tackling excessive waiting times for elective surgery: a comparison of policies in twelve OECD countries. Paris: OECD; 2003. https://doi.org/10.1787/108471127058. Accessed 11 July 2016.
Information Management Group (IMG). Patients awaiting elective admission. In: The Data Manual. Hospital services module, version 1.0. Birmingham: NHS Management Executive, Department of Health; 1992. p. 5, 8–10, 14–8. Contains public sector information licensed under the Open Government Licence v3.0.
Kenis P. Waiting lists in Dutch health care. An analysis from an organization theoretical perspective. J Health Organ Manag. 2006;20(4):294–308. By permission of Emerald. https://doi.org/10.1108/14777260610680104.
Kreindler SA. Policy strategies to reduce waits for elective care: a synthesis of international evidence. Br Med Bull. 2010;95:7–32.
Kreindler SA, Bapuji SB. Evaluation of the WRHA prehabilitation program. Winnipeg: Winnipeg Regional Health Authority; 2010. p. 73–9.
Lee A, Don B, Goldacre MJ. Waiting list statistics. II: an estimate of inflation of waiting list length. Br Med J (Clin Res Ed). 1987;295:1197–8.
Mason A. An epidemiological approach to the monitoring of hospital waiting list statistics. Proc R Soc Med. 1976;69:939–42.
Ministry of Health (MoH). National Health Service. The more effective use of hospital beds. HM(54)89. London: Ministry of Health; 1954. p. 1.
Ministry of Health (MoH). On the state of the public health. The annual report of the Chief Medical Officer of the Ministry of Health for the year 1962. London: HMSO; 1963a. p. 205–7.
Ministry of Health (MoH). Reduction of waiting lists, surgical and general. HM(63)22. London: Ministry of Health; 1963b.
Ministry of Health (MoH). Report of the Ministry of Health for the year ended 31st December 1963. The health and welfare services. 1963–64 Cmnd. 2389. London: HMSO; 1964. p. 44.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the two years 1956–1957. London: HMSO; 1961b. p. 12.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1958. Part II. Detailed tables and commentary. London: HMSO; 1961a. p. 107, 262, 264, 298–99.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1959. Part II. Detailed tables and commentary. London: HMSO; 1963a.
Ministry of Health, General Register Office (MoH & GRO). Report on hospital in-patient enquiry for the year 1960. Part II. Detailed tables and commentary. London: HMSO; 1963b.
Moral L, de Pancorbo CM. Surgical waiting list reduction programme. The Spanish experience. In: HOPE sub-committee on coordination, editor. Waiting lists and waiting times in health care. Managing demand and supply. Leuven: European Hospital and Healthcare Federation (HOPE); 2001. p. 7, 10–7, 48–9. http://www.hope.be/documents-library/. Accessed 11 July 2016.
National Audit Office. Inappropriate adjustments to NHS waiting lists. Report by the Comptroller and Auditor General. HC452. Session 2001–2002: 19 December 2001. London: The Stationery Office; 2001b. p. 27. https://www.nao.org.uk/report/inappropriate-adjustments-to-nhs-waiting-lists/. Accessed 11 July 2016.
National Audit Office. Inpatient and outpatient waiting in the NHS. Report by the Comptroller and Auditor General. HC 221. Session 2001–2002: 26 July 2001. London: The Stationery Office; 2001a. p. 21. https://www.nao.org.uk/report/inpatient-and-outpatient-waiting-in-the-nhs/. Accessed 11 July 2016. By permission of the National Audit Office.
National Audit Office Wales. NHS waiting times in Wales. Volume 1 – the scale of the problem. Cardiff: The Stationery Office; 2005. p. 7, 11, 18–9, 32–4, 43.
National Waiting Times Unit (NWTU). Managing waiting times. A good practice guide. Edinburgh: Scottish Executive; 2003. p. 4–5. http://www.gov.scot/Publications/2003/09/18035/25483. Accessed 11 July 2016.
Naylor CD. A different view of queues in Ontario. Health Aff (Millwood). 1991;10(3):110–28.
Naylor CD, Slaughter P, Sykora K, et al. Waits and rates: the 1997 ICES report on Coronary Surgical capacity for Ontario. Toronto: Institute for Clinical Evaluative Sciences; 1997. p. 14.
Newell C. The basic demographic equation. In: Methods and models in demography. London: Belhaven Press; 1988. p. 8.
Newton JN, Henderson J, Goldacre MJ. Waiting list dynamics and the impact of earmarked funding. BMJ. 1995;311:783–5.
NHS Information Standards Board (ISB). Measuring and recording of waiting times. DSCN: 09/2006. Birmingham: NHS Management Executive; 2006. p. 1, 44, 46. Contains public sector information licensed under the Open Government Licence v3.0.
NHS/DHSS Steering group on Health Services Information (Steering Group). A report on the collection and use of information about hospital clinical activity in the National Health Service. London: HMSO; 1984. p. 27–8, 86–90, 131. Contains public sector information licensed under the Open Government Licence v3.0.
Niinimäki T. Increasing demands on orthopedic services. Acta Orthop Scand. 1991;62(S241):42–3.
Nordberg M, Keskimäki I, Hemminki E. Is there a relation between waiting-list length and surgery rate? Int J Health Plann Manage. 1994;9:259–65.
Powell JE. Supply and demand. In: A new look at medicine and politics. London: Pitman Medical; 1966. p. 39–40. http://www.sochealth.co.uk/national-health-service/healthcare-generally/history-of-healthcare/a-new-look-at-medicine-and-politics-4/. Accessed 11 July 2016.
Pressat R. Balancing equation. In: Wilson C, editor. The dictionary of demography. Oxford: Blackwell; 1985. p. 15.
Purcell J. The waiting list initiative. Report on value for money examination. Dublin: Office of the Comptroller and Auditor General; 2003. p. 8, 17, 23, 26, 28. http://www.audgen.gov.ie/ViewDoc.asp?DocId=-1&CatID=5. Accessed 11 July 2016. By permission of the Office of the Comptroller and Auditor General.
Registrar General. Statistical review of England and Wales for the year 1955. Supplement on hospital in-patient statistics. London: HMSO; 1959. p. 2.
Sanmartin C, Barer ML, Sheps SB. Health care waiting lists and waiting times: a critical review of the literature. In: Waiting lists and waiting times for health care in Canada: more management, more money. Ottawa: Health Canada; 1998. p. 196, 198, 241–54, 270, 281. http://publications.gc.ca/site/eng/9.647111/publication.html. Accessed 11 July 2016.
Smethurst DP, Williams HC. Self-regulation in hospital waiting lists. J R Soc Med. 2002;95:287–9.
Snaith AH. Supply and demand in the NHS. Br Med J. 1979;1(6171):1159–60.
Street A, Duckett S. Are waiting lists inevitable? Health Policy. 1996;36:1–15. By permission of Elsevier. https://doi.org/10.1016/0168-8510(95)00790-3.
Sykes PA. DHSS waiting list statistics – a major deception? Br Med J (Clin Res Ed). 1986;293:1038–9.
Torkki M, Linna M, Seitsalo S, et al. How to report and monitor the performance of waiting list management. Int J Technol Assess. 2002;18(3):611–8.
White A. Waiting lists. A step towards representation, clarification and solving of information problems. Hosp Health Serv Rev. 1980;76(8):270–4.
Worthington D. Hospital waiting list management models. J Oper Res Soc. 1991;42(10):833–43.
Yates J. Why are we waiting? An analysis of hospital waiting lists. Oxford: Oxford University Press; 1987. p. 71. By permission of Oxford University Press.
15 Waiting Times: Evidence of Social Inequalities in Access for Care
Luigi Siciliani
Contents
Introduction
Sources of Inequalities in Waiting Times
Data and Empirical Methods
  Data
  Methods
A Review of the Evidence
  International Studies: Evidence from SHARE and the Commonwealth Fund
  United Kingdom
  Australia
  Norway
  Sweden
  Canada
  Germany
  Spain
  Italy
Conclusions and Implications for Policy
References
L. Siciliani (*)
Department of Economics and Related Studies, University of York, York, UK
e-mail: luigi.siciliani@york.ac.uk

https://doi.org/10.1007/978-1-4939-8715-3_17

Abstract

Equity is a key policy objective in many publicly funded health systems across OECD countries. Policymakers aim at providing access based on need and not ability to pay. This chapter focuses on the use of waiting times for studying inequalities of access to care. Studies of inequalities in waiting times by socioeconomic status are relatively rare, the traditional focus being on measurement of inequalities in healthcare utilization. Waiting time data are readily available for the analysis through administrative databases. They are commonly used for reporting on health system performance. Within publicly funded health systems, the duration of the wait is supposed to be the same for patients with different socioeconomic status for a given level of need. Patients with higher need or urgency are supposed to wait less based on implicit or explicit prioritization rules. A recent empirical literature
seems, however, to suggest that within several publicly funded health systems, nonprice rationing does not guarantee equality of access by socioeconomic status. Individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital care than those with lower socioeconomic status. This negative gradient between waiting time and socioeconomic status may be interpreted as evidence of inequity within publicly funded systems which favors rich and more-educated patients over poorer and less-educated ones. The chapter provides an overview of methods and data to investigate the presence of social inequalities in waiting times and highlights key results.

Introduction

Equity is a key policy objective in publicly funded health systems. In many OECD countries, this takes the form of payments towards health care funding being related to ability to pay, not the use of medical care; access to health care being based on patients’ need, not patients’ ability to pay; and overall reduction in health inequalities.

An extensive empirical literature has been devoted to documenting inequalities in healthcare financing, access, and health (see Wagstaff and van Doorslaer 2000 for a review). This chapter focuses on one form of inequalities in access.

The principle that “access should be based on need” seems both intuitive and desirable. However, the words “access” and “need” are subject to different interpretations. “Access” can simply refer to healthcare utilization, i.e., whether a patient has received treatment or not. But it could also refer to the opportunity to receive treatment, when monetary and nonmonetary costs that people incur have been taken into account. Money costs involve any copayment the patient has to pay or monetary expenses to reach a healthcare provider (a patient from a rural area may, for example, face significant travel costs). Nonmonetary costs can take the form of waiting times, if the patient has to wait several weeks for an elective treatment (e.g., a hip replacement) or a few hours in the emergency room. “Need” can be interpreted as ill health or severity, but also as the ability (or capacity) to benefit. The two concepts differ since ill patients may have low capacity to benefit from treatment (as for some cancer patients).

An extensive empirical literature has been devoted to testing whether, controlling for need, individuals with different socioeconomic status differ in healthcare utilization (see Wagstaff and van Doorslaer 2000 for a review). In most studies, the level of healthcare utilization is measured by the number of visits to a specialist or a family doctor, while need is measured by self-reported health. Comparative international studies suggest that in many OECD countries there is generally pro-rich inequity for physician contacts, in particular in relation to specialist visits and, to a lesser extent, family-doctor consultations (where in some instances pro-poor inequities may be present) (see van Doorslaer et al. 2000, 2004; Devaux 2015 for a recent analysis).

This chapter focuses on inequalities of access as measured by waiting times for nonemergency treatments. Studies of inequalities in waiting times by socioeconomic status are relatively infrequent. This is perhaps surprising given that waiting times are a major health policy issue in many OECD countries. Average waiting times can reach several months for common procedures like cataract and hip replacement (Siciliani et al. 2014). Where prices are absent or of limited use and supply is constrained, publicly funded systems are often characterized by excess demand. Since the number of patients demanding treatment exceeds supply, patients are added to a waiting list and have to wait before receiving treatment (Martin and Smith 1999).

Waiting times generate dissatisfaction for patients since they postpone benefits from treatment, may induce a deterioration of the health status of the patient, prolong suffering, and generate uncertainty. A number of policies have been introduced across the globe to reduce or tackle waiting times (see Siciliani et al. 2013a for a review).

From an equity perspective, one possible advantage of rationing by waiting times is that
within publicly funded health systems, access to services is not supposed to depend on the ability to pay, unlike any form of price rationing where access is dependent on income. For a given level of need, the duration of the wait is supposed to be the same for patients with different income. Patients with higher need or urgency are supposed to wait less based on implicit or explicit prioritization rules.

A recent empirical literature, reviewed in this chapter, seems, however, to suggest that within several publicly funded health systems, nonprice rationing does not guarantee equality of access by socioeconomic status. Individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital care than those with lower socioeconomic status.

This negative gradient between waiting time and socioeconomic status may be interpreted as evidence of inequity within publicly funded systems which favors rich and more-educated patients over poorer and less-educated ones. Therefore, rationing by waiting times may be less equitable than it appears.

The chapter focuses on studies employing large samples either from administrative or survey data. The study is organized as follows. Possible sources of inequalities in waiting times are first discussed. Second, appropriate data and empirical methods are presented which can be usefully employed to investigate inequalities in waiting times. Third, the existing evidence is reviewed. Fourth, possible policy implications are drawn.

Sources of Inequalities in Waiting Times

This section describes different mechanisms that generate inequalities in waiting times. Several health systems are publicly funded and characterized by universal health coverage (e.g., Australia, Italy, New Zealand, Spain, and the United Kingdom). These often coexist with a parallel private sector for patients who are willing to pay out of pocket or who are covered by private health insurance. A key feature of this private sector is that waiting times are negligible: the promise of a short wait is indeed the main way to attract patients from the public to the private sector. For several elective treatments, patients therefore can wait and obtain treatment in the public sector for free (or by paying a small copayment) or opt for the private sector and obtain care more swiftly if they are willing to pay the price (or prospectively insure themselves privately).

Since it is individuals with higher income who are more likely to be able to afford private care, this generates inequalities in waiting times by socioeconomic status within a country. The extent of such inequalities due to the presence of the private sector is likely to depend on its relative size. For example, about 50% of treatments are private in Australia, but private treatments tend to be negligible in the Nordic countries, where the option of going private is much more limited.

Within publicly funded systems, access to care should be based exclusively on need, not on ability to pay (in contrast to contributions to the funding of health systems, which are instead based on ability to pay, not need). Therefore, waiting times for patients on the list should reflect need and not socioeconomic status. Indeed, patients on the list are prioritized by doctors. Patients with higher severity and urgency are supposed to wait less than less severe and urgent patients.

In practice, it is possible that variations in waiting times for publicly funded patients also reflect non-need factors. Waiting-time inequalities may be due to hospital geography and therefore arise “across” hospitals. This could be due to some hospitals having more capacity (number of beds and doctors) and being able to attract a more skilled workforce. This may be the case for hospitals located in an urban as opposed to a rural area. Also, some geographical areas may be underfunded compared to others. If individuals with higher socioeconomic status live in areas where hospitals are better funded or have higher capacity, then this may contribute to inequalities in waiting times by socioeconomic status.

Inequalities in waiting times may also arise “within” a hospital as opposed to “across” hospitals. Individuals with higher socioeconomic status may engage more actively with the health
system and exercise pressure when they experience long delays. They may be able to express their needs better. They may also have better social networks (know someone) and use them to have priority over other patients (attempt to jump the queue). They may have a lower probability of missing scheduled appointments (which would increase the waiting time). They may search more actively for hospitals with lower waiting times and be willing to travel further.

Data and Empirical Methods

Data

Administrative and Survey Data

Two main data sources have been employed in the existing literature: administrative data and survey data. Each of these has relative merits. Registry data have also been employed but to a lesser extent.
The key advantage of (mainly hospital) administrative data is that they cover the whole population of patients admitted to hospitals for treatment. Moreover, waiting times can be measured at disaggregated level, i.e., for specific conditions, treatments or surgeries (such as cataract or hip replacement) with large sample sizes. Administrative data contain detailed control variables on patients’ severity (as proxied by number of comorbidities, number of secondary diagnoses or the Charlson index) and information on the hospital which provided the treatment.
The key disadvantage of administrative data is the difficulty in linking patients’ wait with detailed and precise information on patients’ socioeconomic status. Ideally, the researcher would like to access measures of income and educational attainment for each patient who was admitted to hospital. This would involve linking health administrative data with fiscal ones (generally for tax purposes). Except for Nordic countries, this link is not easily available in most countries. Researchers therefore have to use proxies. Since the patient’s postcode is usually available, the waiting time experienced by the patient can be linked with socioeconomic variables measured at small-area level (income deprivation, individuals living on benefits, proportion of individuals with primary, secondary, and tertiary educational attainments, etc.).
The key advantage of survey data is that socioeconomic status (such as income and highest educational attainment) is routinely recorded at individual level. However, the sample tends to be smaller and more heterogeneous: patients’ treatment can range from less urgent ones (e.g., a cataract surgery) to more urgent ones (e.g., cancer treatment). Detailed measures of severity are generally also missing. A measure of self-reported health tends to be used as a proxy of health needs, which is in line with previous literature on measuring social inequalities in healthcare utilization (Wagstaff and Van Doorslaer 2000).

Measures of Waiting Times

Waiting times in health care can be measured in different ways (Siciliani et al. 2013a, Chap. 2). For many elective surgical procedures and medical treatments, the most common measure is the inpatient waiting time. This measures the time elapsed from the specialist addition to the list to treatment for all publicly funded patients treated in a given year (Siciliani et al. 2014). Collecting data for all publicly funded patients implies that patients can receive treatment from either publicly or privately (nonprofit and for-profit) owned providers. Waiting times for privately funded patients are generally not collected on a routine basis.
The definition of inpatient waiting time does not include the outpatient waiting time, i.e., the time elapsed from the date of referral by the general practitioner to the date of specialist assessment.
Some countries (like Denmark and England) have started to collect a third measure known as referral-to-treatment waiting time. This measures the time elapsed between family doctor referral and treatment for patients treated in a given year. This measure therefore also includes the time elapsed from the family doctor referral to the specialist visit. It is approximately the sum of outpatient and inpatient waiting times, though it allows for gaps between the specialist visit and the addition to the list which could be significant.
An alternative measure to the inpatient waiting time of patients treated is the inpatient waiting
times (from specialist addition to the list) of patients on the list at a census date. This measure is analogous to the definition provided above but refers to the patients on the list at a given census date (as opposed to patients treated in a given year). Similarly, the referral-to-treatment waiting time of patients on the list can be defined.
The distribution of waiting time of patients treated measures the full duration of the patient’s waiting time experience (from entering to exiting the list). The distribution of the waiting times of patients on the list refers to an incomplete duration since, if on the list, patients are still in the process of waiting. The waiting time of patients treated has the advantage of capturing the full duration of a patient’s journey, but it is retrospective in nature. It also does not capture the wait of the patients who never received treatment since they died while waiting, changed their mind, received a treatment in the private sector, etc. The two distributions of waiting times are different but related. Both distributions can be used to compute the probability of being treated (i.e., of waiting time ending) as time passes, i.e., the hazard rate in terms of survival analysis. The hazard rate derived under the two distributions will be the same if the system is in steady state and if each patient on the list is ultimately treated. Both conditions are unlikely to hold in reality. This emphasizes some of the differences between the two distributions (but see Armstrong 2000, 2002; Dixon and Siciliani 2009 for a fuller discussion of these issues).
Table 1 below provides comparative figures of median and mean waiting times across OECD countries in 2011. It illustrates how some countries report inpatient waiting time from specialist addition to the list to treatment, some report inpatient waiting time for patients on the list, and some report both measures. Among the countries included, waiting times appear lowest in Denmark and the Netherlands. It is also evident that mean waiting times are longer than the median ones, and this is due to the skewed distribution of waiting times with a small proportion of patients having a very long wait. As an example, Fig. 1 provides the distribution of waiting times for hip replacement for several OECD countries (Australia, New Zealand, Portugal, Finland, and the United Kingdom).
It is important to emphasize that such measures of waiting times refer to elective (nonemergency) conditions where the wait is generally long (in the order of weeks or months) though they can be shorter for more urgent elective care (e.g., cancer care). Emergency care is therefore often excluded from the empirical analyses.
Most empirical analyses making use of administrative data surveyed in this chapter have employed data that measure the inpatient waiting time, which is computed retrospectively once the patient has received treatment. Those with survey data have included both the inpatient and the outpatient waiting time (for a specialist visit). Waiting-time measures from survey data are typically self-reported. Surveyed individuals are asked questions of the type “if you had an inpatient (outpatient) care in the last year, how long did you wait to be treated (to see a specialist)?” Answers may therefore suffer from recall bias.

Methods

The empirical analyses are interested in testing whether patients with higher socioeconomic status wait less than patients with lower socioeconomic status when admitted to hospital. This section first presents a simple model specification which can be estimated with the Ordinary Least Squares (OLS) method and then proceeds to more sophisticated models such as duration analysis.

Model Specification with Administrative Data
Suppose that the researcher has at her disposal a sample of I patients receiving treatment in J hospitals. The sample includes all patients who received a specific treatment (e.g., hip and knee replacement, cataract surgery, coronary bypass, varicose veins). Each patient receives treatment only in one hospital. Each hospital in the sample treats at least one patient. Define w as the inpatient waiting time for patients receiving treatment in a public hospital. It is assumed that waiting times are measured in days and that
Table 1 Median (mean) waiting times for common surgical procedures: 2011
Patients treated – inpatient (time from specialist addition to list to treatment)
Hip replacement Knee replacement Cataract Hysterectomy Prostatectomy Cholecystectomy Hernia CABG PTCA
Australia 108 173 90 49 47 54 57 17
Canada 89 107 49 7
Finland 108 127 122 84 49 69 76 35 21
(125) (141) (125) (98) (72) (90) (96) (45) (31)
Netherlands (46) (43) (32) (34) (32) (35) (36) (26) (16)
New Zealand 90 96 84 98 63 62 57 28 51
(104) (112) (94) (109) (86) (86) (82) (37) (66)
Portugal 92 192 49 90 61 80 82 2
(149) (231) (67) (125) (115) (134) (120) (29)
Spain (127) (89) (91) (89) (87)
UK-England 81 85 57 61 31 70 60 52 35
(91) (96) (65) (70) (41) (81) (71) (62) (40)
UK-Scotland 75 80 62 48 51 61 63 35 29
(90) (94) (70) (53) (55) (77) (82) (47) (33)
Patients treated – Referral to treatment (time from family doctor referral to treatment)
Denmark 39 46 70 35 36 38 45 13
(51) (59) (99) (49) (56) (46) (56) (19)
Patients on the list – Inpatient
Ireland 103 119 118 96 81 93 98 77 54
(130) (153) (144) (131) (127) (132) (128) (102) (78)
New Zealand 60 65 51 65 51 58 54 46 38
(78) (84) (63) (73) (66) (75) (69) (60) (51)
Portugal 129 156 67 82 103 117 95 93
(189) (200) (100) (111) (185) (178) (147) (118)
Spain (93) (71) (74) (74) (71)
Sweden 43 45 40 25
Slovenia 340 495 58 90 90 240
(354) (512) (63) (122) (132) (275)
Source: Siciliani et al. (2013b)
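The gap between mean and median waits noted in the text can be illustrated with a small sketch. The seven patient waits below are hypothetical and chosen only to show how a single very long wait raises the mean but not the median; the hip-replacement pairs are the median (mean) values reported in Table 1 for patients treated, restricted to rows where all nine procedures are reported. The variable names are illustrative.

```python
import statistics

# Hypothetical waits in days for seven patients; one patient waits very long.
waits = [30, 45, 60, 75, 90, 120, 400]
mean_wait = statistics.mean(waits)      # pulled upward by the 400-day wait
median_wait = statistics.median(waits)  # depends only on the middle observation

# (median, mean) hip-replacement waits in days, as reported in Table 1
# for patients treated (inpatient measure).
table1_hip = {
    "Finland": (108, 125),
    "New Zealand": (90, 104),
    "UK-England": (81, 91),
    "UK-Scotland": (75, 90),
}
# A mean-to-median ratio above one is consistent with a right-skewed distribution.
ratios = {country: mean / median for country, (median, mean) in table1_hip.items()}

print(mean_wait, median_wait)
print(ratios)
```

A ratio above one in every row is what a right-skewed distribution of waits would produce, although the ratio alone does not identify the shape of the distribution.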
Fig. 1 Distribution of waiting times of patients treated: percentage of patients waiting 0-3, 3-6, 6-9, 9-12, and over 12 months in UK-Scotland, UK-England, Portugal, New Zealand, Finland, and Australia (Source: Siciliani et al. (2013b))
waiting time is a continuous variable. The following linear model can be specified:

$$ w_{ij} = d_j' \beta_j + y_{ij}' \beta_y + s_{ij}' \beta_s + e_{ij} \qquad (1) $$

where wij is the waiting time of patient i in public hospital j. Waiting times are a function of (and additively separable in) the determinants outlined on the right-hand side of Eq. 1.
sij is a vector of patients’ characteristics capturing patients’ severity. These could include age, gender, and number of comorbidities. These factors control for the severity of the patient’s health condition. In many countries, patients on the list are prioritized on the basis of their severity and more severe patients wait less relative to nonsevere ones. The coefficients βs are therefore expected to be negative. They provide a measure of the extent to which patients with higher severity wait less.
yij is a variable (or a vector of variables) which captures socioeconomic status, as measured by the income in the area where the patient lives. Inequalities in waiting time across patients with different socioeconomic status arise if βy ≠ 0. If βy is negative then individuals with higher (lower) socioeconomic status wait less (more), keeping other variables (including severity) constant.
dj is a vector of hospital dummy variables (fixed effects), one for each hospital. These are included to control for systematic differences in waiting times across hospitals which arise from differences in supply (beds, doctors, efficiency) or in demand (e.g., proportion of the elderly). Hospitals with higher βj have longer waiting times on average.
eij is the idiosyncratic error term. This can be interpreted as any variation in waiting time which is not captured by the other variables (this includes coding and measurement error, or dimensions of severity unobserved by the researcher).
The simplest way to estimate Eq. 1 is with ordinary least squares (OLS). OLS minimizes the sum of the squared distances between the observed data and the predicted ones based on linear approximation, i.e., the sum of the squared errors (Cameron and Trivedi 2010, Chap. 3). OLS relies on a number of assumptions, including the exogeneity of the regressors, the error terms having the same variance (homoscedasticity) and conditionally uncorrelated observations. Under the assumption that the error terms are normally distributed, the hypothesis can be tested on whether the estimated coefficients are statistically different from zero.
For the coefficients βy to provide an unbiased (correct) estimate of whether patients with higher socioeconomic status wait more or less than other patients, either socioeconomic status has to be uncorrelated with other determinants of waiting times (which seems implausible) or, if it is correlated, all other determinants of waiting times have to be controlled for. Otherwise, the estimates of βy will be prone to so-called omitted variable bias.
For example, more severe patients are more likely to have lower socioeconomic status (Wagstaff and Van Doorslaer 2000). Patients’ severity may therefore be correlated negatively with both waiting time and socioeconomic status. Failure to control for patient severity might generate biased results. Without controlling for severity, a positive correlation between waiting time and income may be observed, while such correlation may disappear once controls for severity are added.
Similarly, hospitals with high supply (and lower waiting times) are likely to be located in urban areas where high-income patients are concentrated, leading to a correlation between hospital characteristics and socioeconomic characteristics of the patient’s area of residence. Omitting hospital dummies (fixed effects) might overestimate inequalities. Including hospital fixed effects allows interpreting socioeconomic inequalities in waiting times “within” a hospital, rather than across hospitals. If researchers are interested in explaining waiting-time inequalities across hospitals, a range of supply variables (e.g., number of beds and doctors, length of stay) can be employed instead of hospital fixed effects.
In summary, inequalities in waiting time across patients with different socioeconomic status arise if βy ≠ 0, i.e., when differences in waits are statistically significant even after controlling for patients’ severity and hospital fixed effects.
Hypothesis testing requires the error terms to be normally distributed. Given that waiting times have a skewed distribution, the error terms in Eq. 1 are unlikely to be normal. To address this issue, the dependent variable wij is typically transformed by the logarithmic function, so that the dependent variable becomes log(wij). If the covariates (regressors) on the RHS of Eq. 1 are also transformed in logs, then each estimated OLS coefficient can be interpreted as an elasticity. For example, if socioeconomic status is measured with income and βy = −0.5, then a 10 % increase in income reduces waiting times by approximately 5 %. If instead the covariates are dummy variables, then the estimated coefficient can be interpreted (approximately) as the proportionate change in waiting times (semielasticity). For example, suppose that socioeconomic status is measured through the highest level of education attained by the patient and patients either went to university or not. Suppose further that the estimated coefficient associated with the dummy variable (equal to one if the patient has a university degree) is equal to βy = −0.1. Then, patients with a university degree wait approximately 10 % less.
Estimating Eq. 1 by OLS treats hospital effects as fixed. This approach generates unbiased but inefficient estimates due to the inclusion of a large number of regressors (therefore introducing the possibility of not identifying a gradient when there is one). An alternative approach is to assume that hospital effects are random. Under the assumption that hospital effects are uncorrelated with other covariates, the coefficients in Eq. 1 will be estimated more efficiently. However, a random effects model will generate biased coefficients if hospital effects are correlated with other covariates. Whether the random effects generate different estimated coefficients compared to the fixed effects can be tested through a Hausman test (Cameron and Trivedi 2010, Chap. 8).

Model Specification with Survey Data
Studies that employ survey data typically have smaller samples. Investigating waiting times by treatment or procedure is often precluded. An analysis can still be conducted by pooling the sample across different treatments and conditions. In such studies, additional dummy variables have to be introduced to control for systematic differences in waiting times across conditions (e.g., waiting for a cataract surgery tends to be longer than for coronary bypass). Moreover, survey data rarely have information on the provider (e.g., the hospital) where the patient received the treatment.
It is therefore not possible to control for hospital fixed effects.
The model in Eq. 1 can be modified in the following way. Define again w as the inpatient or outpatient waiting time for patients who received treatment in a given year. The model specification is:

$$ \ln(w_{ik}) = d_k' \beta_k + y_i' \beta_y + s_i' \beta_s + e_{ik} \qquad (2) $$

where wik is the waiting time of patient i receiving treatment k; yi is a variable (or a vector of variables) which captures socioeconomic status (income and/or educational attainment), si measures, in addition to age and gender, self-reported health or whether the patient has chronic conditions. dk is a vector of dummy variables controlling for different types of treatment (e.g., cataract, coronary bypass etc.) or speciality (orthopedic, ophthalmology). eik is the error term. Again, inequalities in waiting time across patients with different socioeconomic status arise if βy ≠ 0.
Depending on the survey employed, waiting time can be measured separately for publicly funded and privately funded patients. The availability of this information is critical. If public and private patients are pooled together, then an obvious reason for patients with higher socioeconomic status to wait less is that they can afford to go private. If only publicly funded patients are included in the analysis, then other mechanisms are responsible for the estimated gradient.
If waiting times are long and measured in days, they may be treated as a continuous variable (like in Eq. 1). However, if waiting times are short and/or measured in weeks or months, then waiting times should be treated as a discrete variable. Given that waiting times’ distributions are skewed, a negative binomial model (NBM) can be employed to investigate the determinants of waiting times. The NBM gives a useful generalization of the Poisson model (PM), allowing for heterogeneity in the mean function, thereby relaxing the restriction on the variance (Cameron and Trivedi 2005; Jones 2007). In the PM, the dependent variable follows a Poisson distribution and the variance is set equal to the mean. The NBM reduces to the PM in the special case when there is no overdispersion in the data.
If measured in weeks or months, waiting-time data are discretized: the variable is observed discretely, whereas the underlying process generating waiting times is intrinsically continuous. An alternative to the NBM is the interval regression model which is specifically designed for discretized continuous variables.

Duration Analysis

Duration models are an alternative method to investigate the determinants of waiting times. They can be employed to test for differences in waits between socioeconomic groups over the whole distribution of time waited (see Laudicella et al. 2012; Dimakou et al. 2009; and Appleby et al. 2005 for other applications of duration analysis to waiting times).
A key concept in duration analysis is the hazard rate, h(t). This measures the instantaneous probability of leaving the waiting list at time t (and therefore of being treated) conditional on having waited on the list until time t.
A popular duration model is the Cox regression model. This model is semiparametric since it does not require assumptions about the distribution of the time waited. The Cox model identifies the effect of each covariate on waiting time in terms of hazard ratios, i.e., the ratio between the hazard rates of different groups of patients. The Cox model calculates the conditional hazard rate, h(t; x), as:

$$ h(t; x) = h_0(t) \exp\left( \sum_k \beta_k x_k \right) \qquad (3) $$

where xk = y, s (with k indexing the covariates in each vector) and h0(t) is the baseline hazard rate, i.e., the probability of leaving the list when all covariates are zero. The estimated coefficients βk provide the effect of an increase in socioeconomic status and severity on the probability of leaving the waiting list, and therefore of being admitted for treatment. Suppose that socioeconomic status is measured by education with a dummy variable equal to one if the patient does not have a university degree. Then a coefficient
which is less than one (when expressed as a hazard ratio, exp(βk)) will imply that less-educated patients have a lower probability of exiting the list (and therefore of being treated within a given time).
The Cox model assumes the hazard ratio between two different groups, for example, those treated in hospital j and hospital j′, $\exp\left[ \sum_k \beta_k (x_{jk} - x_{j'k}) \right]$, is constant with time waited (Cameron and Trivedi 2005, Chap. 17.8). If this assumption is violated, then the stratified Cox model and the extended Cox model may be more appropriate. The former introduces group-specific baseline hazards, h0j(t). Therefore, the conditional hazard rate becomes:

$$ h(t; x) = h_{0j}(t) \exp\left( \sum_k \beta_k x_k \right) $$

The main advantage of the stratified Cox model is that it relaxes the common baseline hazard assumption. The main disadvantage is that hazard ratios between the stratified groups cannot be identified. The extended Cox model introduces time dependency by interacting covariates with the time waited, gk(t) (Pettitt and Daud 1990; Fisher and Lin 1999):

$$ h(t; x(t)) = h_0(t) \exp\left[ \sum_k \beta_k x_k + \sum_k \delta_k x_k g_k(t) \right] \qquad (4) $$

where δk are the coefficients of the time interactions.

Other Methods
Another useful regression method for investigating waiting times is quantile regression (Cameron and Trivedi 2010, Chap. 7). Estimating Eq. 1 by OLS allows estimating the effect of socioeconomic status at the sample mean. Since patients differ in the degree of urgency, it may be interesting to estimate whether such effect is persistent also when waiting times are high or low, i.e., across different cut-off points in the waiting time distributions (say the 20th and 80th percentile, and at the median) through a quantile regression model. Doctors prioritize patients based on their degree of urgency. Some dimensions of urgency may however remain unobservable to the researcher. Whether a larger socioeconomic gradient should be expected at low or high waiting times is in principle indeterminate. Since waiting times are short when the condition is more urgent, richer and more-educated people may be keener to obtain reductions in waiting times when they perceive delays to affect their health more critically. On the other hand, precisely because waiting times are short, there may be less scope for influencing them.
Finally, a concern may be raised that estimates in Eq. 1 are contaminated by what is known as sample selection based on unobserved factors (to the researcher). For example, patients with higher income who expect to wait a long time are more likely to afford and opt for the private sector. It may therefore arise that public hospitals treat poor patients with expected high and low waiting times but only rich patients with low waiting times. In turn, this may generate an apparent negative gradient between income and waiting time for patients receiving treatment within publicly funded hospitals. If the researcher observes whether patients went for public or private treatment, then a Heckman selection model can be estimated to adjust for sample-selection bias (Heckman 1979). Such a model involves estimating a selection equation for the choice of the patient between opting for private care versus public care, which can include socioeconomic status among its determinants. For the model to perform well, an identification variable is recommended, i.e., a variable which predicts the choice of going public versus private but does not directly affect waiting times (distance to the hospital may be such an identifying variable; see Sharma et al. 2013).

A Review of the Evidence

This section first reviews key results from international studies and then from studies that focus on individual countries.
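When reading the percentage effects reported in the studies below, it may help to recall how a coefficient from a model with a log dependent variable (such as Eq. 2), or a hazard model, maps into a percentage change. The sketch below shows the standard conversions under the assumption of a log-linear specification. The function names are illustrative, and the studies reviewed here may have computed their reported percentages differently.

```python
import math

def pct_change_dummy(beta):
    """Exact % change in waiting time when a dummy regressor switches from 0 to 1
    in a model whose dependent variable is log waiting time."""
    return 100.0 * (math.exp(beta) - 1.0)

def pct_change_elasticity(beta, pct_increase):
    """Exact % change in waiting time for a given % increase in a regressor
    that enters the model in logs (beta is then an elasticity)."""
    return 100.0 * ((1.0 + pct_increase / 100.0) ** beta - 1.0)

def hazard_ratio(beta):
    """Hazard ratio implied by a Cox model coefficient."""
    return math.exp(beta)

# A dummy coefficient of -0.1 is read approximately as "10 % less";
# the exact figure is slightly smaller in absolute value.
print(round(pct_change_dummy(-0.1), 2))
# An elasticity of -0.5 and a 10 % income increase: approximately "5 % less".
print(round(pct_change_elasticity(-0.5, 10.0), 2))
# A negative Cox coefficient gives a hazard ratio below one.
print(round(hazard_ratio(-0.1), 3))
```

The approximation is close for small coefficients and becomes less accurate as the absolute value of the coefficient grows.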
International Studies: Evidence from SHARE and the Commonwealth Fund

Using survey data from the Survey of Health, Ageing and Retirement in Europe (SHARE), Siciliani and Verzulli (2009) test whether waiting times for specialist consultation and nonemergency surgery differ by socioeconomic status. The sample includes nine European countries: Austria, Denmark, France, Germany, Greece, Italy, the Netherlands, Spain and Sweden. The survey covers 22,000 respondents across these European countries. The analysis controls for severity as proxied by age, gender, and self-reported health (and type of specialist care and treatment). Privately funded patients are excluded from the analysis (a minority of the sample). Therefore, the analysis can be interpreted in terms of inequalities among publicly funded patients. Since waiting times are measured in weeks and months, a negative binomial model is employed.
For specialist consultation, they find that individuals with high education experience a reduction in waiting times of 68 % in Spain, 67 % in Italy and 34 % in France (compared with individuals with low education). Individuals with intermediate education report a waiting-time reduction of 74 % in Greece (compared with individuals with low education). There is also evidence of a negative and significant association between education and waiting times for nonemergency surgery in Denmark, the Netherlands, and Sweden. High education reduces waits by 66 %, 32 %, and 48 %, respectively. There is some evidence of income effects, although generally modest. An increase in income of 10,000 Euro reduces waiting times for specialist consultation by 8 % in Germany and waiting times for nonemergency surgery by 26 % in Greece. Surprisingly, an increase in income of 10,000 Euro increases waits by 11 % in Sweden.
Schoen et al. (2010) use data from the 13th annual health policy survey conducted in 2010 by the Commonwealth Fund in eleven countries (Australia, Canada, France, Germany, New Zealand, the Netherlands, Norway, Sweden, Switzerland, the United Kingdom, and the United States). Waiting times are measured for a specialist visit and for elective surgery. Socioeconomic status is proxied by a dummy variable equal to one if income is above average. Control variables include age, health status, and, for the USA, private insurance status.
Employing logistic regression, the study shows that individuals with above-average income have a lower probability of waiting more than 2 months for a specialist visit in Australia, New Zealand, and the Netherlands. They also have a higher probability of waiting less than 4 weeks for a specialist visit in Australia, Canada, New Zealand, and the United States. No marked differences in waiting times by socioeconomic status are found for elective surgery. Since no control variable is included for patients going to a private provider, differences in waiting times by socioeconomic status could to some extent be explained by richer patients opting for the private sector when waiting times are high.

United Kingdom

Using administrative data, Cooper et al. (2009) investigate the presence of inpatient waiting-time inequalities in England for the following elective procedures: hip and knee replacement and cataract surgery. They also examine whether such inequalities varied during the Labour government between 1997 and 2007. Waiting time was much higher in the early years but then gradually fell. The analysis refers to publicly funded patients only, i.e., patients treated by the National Health Service. Patients who do not want to wait can opt for treatment in the private sector, but they will have to pay or hold private health insurance.
The regression analysis (similar to Eq. 1) controls for patients’ age, gender, area type (e.g., city, town and fringe, isolated village), but not for hospital fixed effects. The regressions are run for three periods corresponding to different government policies (1997–2000, 2001–2004, and 2005–2007). Socioeconomic status was measured through an index of income deprivation (the 2001 Carstairs index at the output area level then
transformed into five income deprivation quintiles). The Carstairs index is based on car ownership, unemployment, overcrowding, and social class within output areas, calculated by the Office for National Statistics.
The study finds that compared to patients with lowest income deprivation (highest socioeconomic status) patients in other groups tend to wait longer, up to about 2 weeks longer. For some procedures and years, the effect is non-monotonic, with patients with middle-income deprivation waiting longest. Inequalities in waiting times tend to decrease over time. This is probably due to waiting times falling over the considered period. The authors conclude that equity improved over time. In the period 2005–2007, very few differences existed in waiting times across patients with differing deprivation.
The analysis by Cooper et al. (2009) does not account for hospital fixed effects. Therefore, inequalities in waiting times may reflect variations “across hospitals” due, for example, to different resources or variations “within the hospital” due, for example, to some patients being able to get ahead of the queue. Laudicella et al. (2012) extend the analysis by introducing hospital fixed effects but focus on hip replacement only in 2001. They split the deprivation index between “income” deprivation (based on individuals on benefits) and “education” deprivation. They provide evidence of inequalities in waiting times favoring more-educated and richer individuals. More precisely, a patient who is least skill deprived in education waits 9–14 % less than other patients; patients in the fourth and fifth most income-deprived quintiles wait about 7 % longer than other patients. The analysis provides evidence that most inequalities occur within hospitals rather than across hospitals (failure to control for hospital fixed effects results in underestimation of the income gradient). The key insights are similar when the Cox semiparametric model is employed. More educated patients have a higher probability of leaving the list (the inverse of the hazard ratio) by 2–6 %. Richer patients have a higher probability of leaving the list by 4–9 %.
Pell et al. (2000) investigate inequalities in waiting times for cardiac surgery in Scotland. They employ administrative data measuring the inpatient waiting time. Similarly to Cooper et al. (2009), socioeconomic status is proxied through the Carstairs deprivation index. They find that the most deprived patients waited 24 days longer than the least deprived ones. This was in part due to less deprived patients being more likely to be classified as urgent.

Australia

Sharma et al. (2013) investigate the presence of inequalities in waiting times in the State of Victoria (which accounts for 25 % of the Australian population) in 2005. The study employs administrative data on inpatient waiting time for publicly funded patients. Several surgical procedures are employed (including eye, hip and knee procedures, hysterectomy, and prostatectomy).
A key institutional feature of the Australian system is that although everyone has public insurance, about half of the population has private health insurance and about half of the care is provided by private hospitals. More precisely, patients who seek treatment in a public hospital receive treatment for free under Medicare (Australia’s universal public health insurance scheme) but have to wait. Patients who seek treatment in a private hospital incur the full cost of treatment, which is paid by the patient either directly or through her private health insurer.
Given such an institutional feature, one explanation for a potential observed gradient between waiting time and socioeconomic status for publicly funded patients is the possibility of sample selection: rich patients who expect to wait are more likely to afford and opt for the private sector, generating a negative gradient between income and waiting time in the public system. In other words, public hospitals treat poor patients with expected high and low waiting times, but only rich patients with low waiting times are treated in public hospitals. This is of potential importance for policy. If the gradient is explained by sample
selection, then it should not be interpreted as evidence of inequity.

Since private hospitals have to report the same data as public hospitals, detailed administrative data are available for both the public and private sectors (unlike in many other countries). These data are therefore suitable for testing for sample selection generated by the private sector through a Heckman sample-selection model (the distances to the nearest public and private hospitals are used as identifying variables).

Like the English studies, socioeconomic status is measured through an index which captures economic resources at small-area level (suburbs), known as the SEIFA (Socio-Economic Indexes for Areas) for economic resources. Examples of variables which generate the SEIFA index are the proportions of: people with low income, single-parent families, occupied private housing with no car, households renting from a community organization, the unemployed, and households owning the house they occupy.

The analysis suggests that individuals who live in richer areas wait less. Compared to patients living in areas with the lowest income, patients living in areas with the highest income wait 13 % less. With an average wait of 89 days, this implies an average reduction of 11 days. Patients in almost every decile of income have a progressively lower waiting time than those in the decile below. Once selection is taken into account, the gradient between waiting times and socioeconomic status reduces significantly in size but does not disappear. Compared to patients in the lowest income decile, patients whose income falls between the 2nd and 7th deciles wait 3–4 % less, and patients whose income falls between the 8th and 10th deciles wait 5–7 % less. Therefore, the analysis still suggests evidence of inequity, though a reduced one compared to the case when selection is not taken into account. The results from quantile regression models confirm that inequalities persist at different points of the waiting time distribution.

Johar et al. (2013) use administrative data from New South Wales in Australia to decompose variations in waiting times that are due to clinical need, supply factors, and nonclinical factors such as socioeconomic status. They measure inpatient waiting times for publicly funded patients in public hospitals in 2004–2005 and include all acute illnesses. Socioeconomic status is measured through the SEIFA index (mentioned above) split into five groups. Without controlling for supply factors, they find that more deprived patients wait 30 % longer than those in the least deprived group (they wait about a month more, with an average wait of about 3 months across all patients included in the sample). These differences reflect inequalities both within and across hospitals.

Once the authors control for supply factors (such as bed occupancy rate, length of stay, ratio of clinical staff to beds, proportion of emergency admissions), more deprived patients wait 16–24 % longer than patients in the highest socioeconomic group. This implies that richer patients live in areas with better supply of hospital services. However, inequalities within the hospital persist after controlling for supply factors. Quantile regression results confirm that inequalities are present at all quantiles of the waiting time distribution.

Norway

Monstad et al. (2014) use data from the Norwegian Arthroplasty Register for patients in need of hip replacement in Norway in 2002–2003 to test whether patients with higher socioeconomic status wait less. Income and education are measured at individual level. The sample covers 98 % of all hip replacements. Since every patient has a unique personal identification code, the registry data can be perfectly matched with other registers at Statistics Norway.

The healthcare system in Norway is largely publicly funded with a negligible private sector (therefore, the possibility to opt out is limited). Waiting times for hip replacement were on average 170 days. The analysis is presented separately for men and women. All specifications control for hospital fixed effects. Therefore, results can be interpreted as inequalities arising “within the hospital.” The study finds that richer men and more-educated women tend to wait less: a 10 %
increase in income reduces waiting times by 8 %; women with 3 years of upper secondary education wait 7 % less compared to those with compulsory schooling only.

Carlsen and Kaarboe (2014) use administrative data (the Norwegian patient registry) from all elective inpatient and outpatient hospital stays in Norway for 2004–2005. The waiting time is measured from the referral (from the family doctor) until the patient meets with a hospital specialist. Socioeconomic status is measured at small-area level (about 31,000 cells). Since the register contains information about hospital stay, gender, year of birth, and resident municipality, patients can be uniquely assigned to population cells that combine gender, age, and municipality. For each population cell, Statistics Norway computed a set of variables that describe the income and educational levels of the cell population in 2004.

The study finds that men with tertiary education wait about 15 % less than men with primary education only. Women in the lowest income quintile wait 11 % longer than women in the highest income quintile. However, once controls are added for hospital-specific factors (whether they went to the local hospital, travel time, and choice of hospital), most of the inequalities disappear. Whether the patient goes to the “local hospital” and travel distance are key factors explaining the gradient. Since hospitals in low-income regions have longer waiting times than hospitals located in high-income and middle-income regions, controlling for local hospitals makes the income gradient flatter. Travel distance also weakens the association between income and waiting time. Patients’ income decreases with traveling distance, whereas waiting time increases with distance.

Sweden

Tinghög et al. (2014) use administrative hospital data on all elective surgeries performed in Östergötland in Sweden in 2007. These data were linked to national registers containing socioeconomic variables. The study finds that patients with low disposable household income have 27 % longer waiting times for orthopedic surgery and 34 % longer waiting times for general surgery. No differences on the basis of ethnicity and gender were found. Income mattered more at the upper tail of the waiting time distribution.

Canada

Alter et al. (1999) employ a large administrative dataset to investigate whether publicly funded waiting times for patients in need of a coronary angiography in 1993–1997 in Ontario (Canada) differ by socioeconomic status. The latter is proxied by neighborhood income as determined by the Canadian census. The study controlled for a number of supply factors such as hospital volume, distance from hospital, and type of hospital, in addition to clinical ones capturing patients’ severity. The study finds that patients in the highest income quintile wait 45 % less compared to patients in the lowest income quintile.

Carrière and Sanmartin (2010) investigate determinants of waiting times for specialist consultation using the 2007 Canadian Community Health Survey. As with other surveys, the analysis does not control for hospital variables. On the other hand, socioeconomic status (household income and educational attainment) is measured at individual level. The key finding is that, compared with men in the top income quintile, those in the lowest were less likely to see a specialist within a month (after controlling for possible confounders). This was not the case for women.

Germany

Using survey data between 2007 and 2009, Roll et al. (2012) investigate the impact of income and type of insurance on waiting times to see a family doctor and a specialist. Type of insurance is a critical control variable since Germany has a multipayer health system divided into two main components: statutory health insurance and private health insurance. While the first is financed by income-related contribution rates, private insurance is financed by risk-based rates. The
vast majority of the population is covered by statutory insurance. However, individuals with an annual income above approximately 50,000 Euro (in 2011) can opt out and take private insurance, which covers about 11 % of the population.

After controlling for insurance type, severity (mild or severe), chronic conditions, and type of care needed, the study finds that income reduces waiting time for both an appointment with the GP and the specialist. A household income of more than 2,000 Euro per month was associated with a reduction in waiting time for a GP appointment of 1 day, or 28 %, compared to respondents with an income of less than 500 Euro (with a sample mean of about 3 days). For the waiting time for an appointment with the specialist, a household income of more than 5,000 Euro per month was associated with significantly lower waiting time (28 % or 5 days less; sample mean of 30 days). Individuals with private insurance also obtain faster access to health services.

Spain

Abasolo et al. (2014) use the 2006 Spanish National Health Survey to test for waiting time inequalities. The Spanish health system is characterized by universal coverage and tax funding. Waiting time is measured for the last specialist visit and is measured separately for a first visit and for a review visit. Like other studies employing survey data, household income and education are measured at individual level. Only public patients are included in the analysis. Public patients have no or limited copayments for specialist services. Average waiting time was about 2 months.

The analysis controls for type of speciality, self-assessed health, existing conditions (such as hypertension and heart problems), whether the patient has private insurance, employment status, living in a rural area, and region, in addition to demographic variables. The study finds that a 10 % increase in income reduces waiting times for diagnosis visits by 2.6 %. Individuals with primary education wait 28 % longer than individuals with university studies.

Italy

Petrelli et al. (2012) employ administrative data in Piedmont (a large Italian region) in 2006–2008 to investigate inequalities in waiting times for selected surgical procedures, such as coronary bypass, angioplasty, coronarography, endarterectomy, hip replacement, and cholecystectomy. Waiting time is measured for publicly funded patients. It refers to the inpatient wait, from the specialist’s addition of the patient to the list to admission for treatment. Socioeconomic status was measured by education only, not income.

The Italian health system has universal coverage with limited or no copayments for inpatient hospital care. The analysis controls for severity (as proxied by the Charlson index) in addition to demographic variables. The results from Cox regression suggest that more-educated patients are more likely to wait less for all procedures except for coronary bypass (where the difference is not statistically significant).

Conclusions and Implications for Policy

Within publicly funded systems, access to services is supposed to depend on need and not ability to pay (or, more broadly, socioeconomic status). The recent empirical literature reviewed in this chapter seems, however, to suggest that this is not necessarily the case. The chapter focuses on elective (i.e., nonemergency) services and does not cover the literature on waiting times in the emergency room. There is empirical evidence from several countries suggesting that individuals with higher socioeconomic status (as measured by income or educational attainment) tend to wait less for publicly funded hospital elective services than those with lower socioeconomic status. Combined with the empirical literature reviewed in the Introduction, it suggests that individuals with higher socioeconomic status tend to see doctors not only more frequently but also more swiftly.

Waiting-time inequalities within public systems may be due to a number of different reasons. They may be due to hospital geography with some
hospitals having more capacity and being located in more affluent areas. Inequalities in waiting times may also arise “within” the hospital if individuals with higher socioeconomic status engage more actively with the health system, exercise pressure when they experience long delays, are able to express their needs better, have better social networks (and attempt to jump the queue), miss scheduled appointments less frequently, and are willing to travel further in search of lower waits.

Although there is significant evidence on social inequalities in waiting times, it is still not known which of their possible determinants are the most critical. The methods and data outlined in this chapter could be usefully employed in future research to further uncover evidence on the presence of such inequalities in a number of countries and, perhaps most importantly, their key determinants. The degree to which these inequalities are unjust depends on the exact mechanisms, for example, whether richer patients exercise more active choice among public providers (a policy which is encouraged in many countries) or make more deliberate attempts to jump the queue. Therefore, rationing by waiting times may be less equitable than it appears.

In some countries, universal health coverage coexists with a parallel private sector for patients who are willing to pay out of pocket or who are covered by private health insurance. Individuals with higher income are more likely to be able to afford private care, generating inequalities in waiting times by socioeconomic status within a country. In such circumstances, it is much less surprising that such inequalities exist.

Uncovering the exact mechanisms that explain the socioeconomic gradient in waiting times is also critical for policy design. For example, if the gradient is due to hospitals having access to different resources, then policymakers may want to improve allocation formulas so that they appropriately reflect need. If instead the gradient arises within the hospital with some patients attempting to jump the queue, more robust mechanisms to regulate waiting list management may be required. If poorer people are struggling to keep up with the health booking systems, then simplifications and greater transparency could be considered.

References

Abasolo I, Negrin-Hernandez MA, Pinilla J. Equity in specialist waiting times by socioeconomic groups: evidence from Spain. Eur J Health Econ. 2014;15:323–34.
Alter DA, Naylor CD, Austin P, Tu JV. Effects of socioeconomic status on access to invasive cardiac procedures and on mortality after acute myocardial infarction. N Engl J Med. 1999;341(18):1359–67.
Appleby J, Boyle S, Devlin N, Harley M, Harrison A, Thorlby R. Do English NHS waiting time targets distort treatment priorities in orthopaedic surgery? J Health Serv Res Policy. 2005;10(3):167–72.
Armstrong PW. First steps in analysing NHS waiting times: avoiding the ‘stationary and closed population’ fallacy. Stat Med. 2000;19(15):2037–51.
Armstrong PW. The ebb and flow of the NHS waiting list: how do recruitment and admission affect event-based measures of the length of ‘time-to admission’? Stat Med. 2002;21:2991–3009.
Cameron AC, Trivedi PK. Microeconometrics: methods and applications. Cambridge: Cambridge University Press; 2005.
Cameron AC, Trivedi PK. Microeconometrics using Stata. Rev. ed. College Station: Stata Press; 2010.
Carlsen F, Kaarboe O. Waiting times and socioeconomic status. Evidence from Norway. Health Econ. 2014;23:93–107.
Carrière G, Sanmartin C. Waiting time for medical specialist consultations in Canada, 2007. Statistics Canada, Catalogue no. 82-003-XPE. Health Rep. 2010;21(2):7–14.
Cooper ZN, McGuire A, Jones S, Le Grand J. Equity, waiting times, and NHS reforms: retrospective study. Br Med J. 2009;339:b3264.
Devaux M. Income-related inequalities and inequities in health care services utilisation in 18 selected OECD countries. Eur J Health Econ. 2015;16(1):21–33.
Dimakou S, Parkin D, Devlin N, Appleby J. Identifying the impact of government targets on waiting times in the NHS. Health Care Manag Sci. 2009;12(1):1–10.
Dixon H, Siciliani L. Waiting-time targets in the healthcare sector. How long are we waiting? J Health Econ. 2009;28:1081–98.
Fisher LD, Lin DY. Time-dependent covariates in the Cox proportional hazards regression model. Annu Rev Public Health. 1999;20:145–57.
Heckman JJ. Sample selection bias as a specification error. Econometrica. 1979;47(1):153–61.
Johar M, Jones G, Keane M, Savage E, Stavrunova O. Differences in waiting times for elective admissions in NSW public hospitals: a decomposition analysis by non-clinical factors. J Health Econ. 2013;32:181–94.
Jones AM. Applied econometrics for health economists: a practical guide. Oxford: Radcliffe Medical Publishing; 2007.
Laudicella M, Siciliani L, Cookson R. Waiting times and socioeconomic status: evidence from England. Soc Sci Med. 2012;74(9):1331–41.
Martin S, Smith PC. Rationing by waiting lists: an empirical investigation. J Public Econ. 1999;71:141–64.
Monstad K, Engeaeter LB, Espehaug B. Waiting time and socioeconomic status – an individual level analysis. Health Econ. 2014;23:446–61.
Pell J, Pell A, Norrie J, Ford I, Cobbe S. Effect of socioeconomic deprivation on waiting time for cardiac surgery: retrospective cohort study. Br Med J. 2000;321:15–8.
Petrelli A, De Luca G, Landriscina T, Costa G. Socioeconomic differences in waiting times for elective surgery: a population-based retrospective study. BMC Health Serv Res. 2012;12:268.
Pettitt AN, Daud IB. Investigating time dependence in Cox’s proportional hazards model. J R Stat Soc Ser C (Appl Stat). 1990;39(3):313–29.
Roll K, Stargardt T, Schreyogg J. Effect of type of insurance and income on waiting time for outpatient care. The Geneva Papers. Int Assoc Study Insur Econ. 2012;37:609–32.
Schoen C, Osborn R, Squires D, Doty MM, Pierson R, Applebaum S. How health insurance design affects access to care and costs, by income, in eleven countries. Health Aff. 2010;29(12):2323–34.
Sharma A, Siciliani L, Harris A. Waiting times and socioeconomic status: does sample selection matter? Econ Model. 2013;33:659–67.
Siciliani L, Verzulli R. Waiting times and socioeconomic status among elderly Europeans: evidence from SHARE. Health Econ. 2009;18(11):1295–306.
Siciliani L, Borowitz M, Moran V, editors. Waiting time policies in the health sector. What works? Paris: OECD Book; 2013a.
Siciliani L, Moran V, Borowitz M. Measuring and comparing health care waiting times in OECD countries. OECD health working papers, 67. OECD Publishing; 2013b. https://doi.org/10.1787/5k3w9t84b2kf-en.
Siciliani L, Moran V, Borowitz M. Measuring and comparing health care waiting times in OECD countries. Health Policy. 2014;118(3):292–303.
Tinghög G, Andersson D, Tinghög P, Lyttkens CH. Horizontal inequality when rationing by waiting lists. Int J Health Serv. 2014;44(1):169–84.
van Doorslaer E, Wagstaff A, et al. Equity in the delivery of health care in Europe and the US. J Health Econ. 2000;19(5):553–83.
Van Doorslaer E, Koolman X, Jones AM. Explaining income-related inequalities in doctor utilization in Europe. Health Econ. 2004;13(7):629–47.
Wagstaff A, van Doorslaer E. Equity in health care financing and delivery. Chapter 34. In: Culyer AJ, Newhouse JP, editors. Handbook of health economics, vol. 1. 1st ed. Amsterdam: Elsevier Science/North-Holland; 2000. p. 1803–62.
16 Health Services Data: The Ontario Cancer Registry (a Unique, Linked, and Automated Population-Based Registry)

Sujohn Prodhan, Mary Jane King, Prithwish De, and Julie Gilbert

Contents
Introduction
History of Cancer Registration in Ontario
Automation and OCRIS
EDW Reconstruction
Who Uses OCR Data and for What Purpose?
Examples of Provincial Stakeholders
Examples of National Stakeholders
Examples of International Stakeholders
Data Sources
Pathology
Activity Level Reporting
DAD and NACRS
Death Certificates
Data Systems and Consolidation
OCRIS and the EDW Successor
Patient Linkage
Case Resolution
Data Elements
Data Quality
Other Factors Affecting Quality
The OCR Adopts a New Approach to Counting Cancers
Topography and Morphology
Laterality
Timing
Implications of Counting Rules on Data and Analysis
Best Practices for Analysis
Cancer Stage at Diagnosis
CS Automation and Integration
Source of Staging Data
Stage Capture Rates
Linkage of the OCR to Other Datasets
Other Linkage Processes
CCO’s Other Data Holdings
Health Services Research Using the OCR
Examples of Health Services Research Using the OCR
Patient Contact
Data Privacy and Access
Privacy
Data Request Process
Technical Appendix
ePath, eMaRC, and ASTAIRE
References

S. Prodhan (*) · M. J. King · P. De (*)
Surveillance and Ontario Cancer Registry, Cancer Care
e-mail: prithwish.de@cancercare.on.ca

J. Gilbert
Planning and Regional Programs, Cancer Care Ontario, Toronto, ON, Canada

© Her Majesty the Queen in Right of Canada 2019
https://doi.org/10.1007/978-1-4939-8715-3_18
Abstract

Since its creation in 1964, the Ontario Cancer Registry (OCR) has been an important source of high-quality information on cancer incidence and mortality. As a population-based registry, the OCR can be used to assess the provincial burden of cancer, track the progress of cancer control programs, identify health disparities among subpopulations, plan and improve healthcare, perform health services research, verify clinical guideline adherence, evaluate screening effectiveness, and much more. With over one third of Canadians residing in Ontario, the OCR is the nation’s largest provincial cancer registry and a major contributor to the Canadian Cancer Registry. In 2015 alone, the OCR collected data on an estimated 83,000 malignant cases.

Through its active participation in Canadian, North American, and international standard-setting bodies, the OCR adopts the latest methods for registry data collection and reporting. The OCR is created entirely from records generated for purposes other than cancer registration. These records include pathology reports, treatment-level activity, hospital discharges, surgery data, and death certificates. Electronic records are linked at the person level and then “resolved” into incident cases of cancer using a unique computerized medical logic. Recent technological updates to the OCR have further modernized the registry and prepared it for future developments in the field of cancer registration.

This chapter describes the evolution of the OCR, its basic processes and components of automation, data elements, data quality measures, linkage processes, and other aspects of the registry that make it of particular interest to health services researchers and more broadly to the healthcare and public health community.

List of Abbreviations

AJCC      American Joint Committee on Cancer
ALR       Activity Level Reporting
CCO       Cancer Care Ontario
CIHI      Canadian Institute for Health Information
CS        Collaborative Stage
DAD       CIHI’s Discharge Abstract Database
DCO       Death certificate only
DSA       Data sharing agreement
eCC       Electronic Cancer Checklist
EDW       Enterprise Data Warehouse
EDW-OCR   Enterprise Data Warehouse based OCR
eMaRC     Electronic Mapping, Reporting, and Coding Plus
ePath     Electronic pathology data collection system
IACR      International Association of Cancer Registries
IARC      International Agency for Research on Cancer
ICBP      International Cancer Benchmarking Partnership
ICD       International Classification of Diseases
ICD-O     International Classification of Diseases for Oncology
MPH       Multiple Primary and Histology
NAACCR    North American Association of Central Cancer Registries
NACRS     CIHI’s National Ambulatory Care Reporting System
OCR       Ontario Cancer Registry
OCRIS     Ontario Cancer Registry Information System
OCTRF     Ontario Cancer Treatment and Research Foundation
RCC       Regional Cancer Center
SEER      Surveillance, Epidemiology, and End Results program
SSF       Site-specific factors
TNM       Tumor Node Metastasis staging

Introduction

The purpose of this chapter is to describe the evolution of the Ontario Cancer Registry (OCR) and explore its many purposes, processes, and applications that make it of particular interest to researchers. The chapter also emphasizes how the registry has established itself as an effective population-based surveillance and research tool.

The OCR is the official provincial cancer incidence registry for Ontario and is managed by Cancer Care Ontario (CCO). CCO is an agency of the Ontario Ministry of Health and Long-Term Care (the Ministry) and its advisor on the cancer and renal systems, as well as on access to care for key health services. CCO strives for continuous improvement in disease prevention, screening, the delivery of care, and the patient experience.

CCO works with Regional Cancer Programs across the province, cancer experts, community advisory committees, hospitals, provincial agencies and government, public health units, the Ontario Hospital Association, the not-for-profit sector, as well as with cancer agencies in other provinces and the federal government, among others, in order to achieve its mandate. Authority for CCO’s programs and functions is provided in the provincial Cancer Act, the Personal Health Information Protection Act (PHIPA 2016), and a Memorandum of Understanding between the Ministry and CCO (Cancer Act 2006).

In accordance with Ontario’s PHIPA legislation, CCO is defined as a “prescribed entity” for certain functions. This designation authorizes CCO to collect, use, and disclose personal health information for the purposes of cancer management and planning. The OCR is a prescribed entity to support this goal. The OCR team at CCO is composed of pathology coders, standards advisors, stage abstractors, quality assurance and data analysts, and a management team. The OCR team’s responsibilities include:

• Curating and coding source data to identify incident cancer cases
• Deriving population-level cancer staging values
• Working with standard-setting bodies to establish the best practices for the registry
• Setting direction for the management of the registry
• Collaborating with partners and stakeholders to enable use of OCR data for surveillance and research, and more generally for cancer prevention and control

The goal of the registry is to collect and disseminate timely and high-quality information describing all cases of cancer diagnosed among Ontario residents using measures of cancer
[Figure: new cases, deaths, incidence rate, and mortality rate by year of diagnosis or death, 2002 to 2011. Left axis: number of new cases or deaths. Right axis: age-standardized rate per 100,000.]
Fig. 1 Cancer incidence and mortality counts and age-standardized rates per 100,000 (adjusted to the 1991 Canadian Standard population) from 2002 to 2011. The sharp rise in incidence from 2010 onward is attributed to the adoption by the OCR of new counting rules for multiple primary cancers (Source: Ontario Cancer Registry, 2014)
burden such as incidence and mortality. The OCR is the largest provincial cancer registry in Canada, covering a population that comprises almost 40 % of the Canadian population. With Ontario’s growing and aging population, the OCR is expected to have collected information on a projected 83,000 new cases of cancer in 2015 (Fig. 1).

History of Cancer Registration in Ontario

Recognition of the importance of population-based cancer registration in Ontario goes back to 1943, with the passing of the provincial Cancer Act and the establishment of the Ontario Cancer Treatment and Research Foundation (OCTRF). The OCTRF, which became Cancer Care Ontario in 1997, was established to “conduct a program of research, diagnosis and treatment in cancer” (Cancer Act 2006). While the Cancer Act did not make the reporting of cancer diagnoses a legal obligation, it permitted organizations and individuals in the healthcare system to provide such information to the OCTRF. This led to the formation of Ontario’s cancer registry in 1964. Initially managed by the Ontario Department of Health, the cancer registry began tracking cancer incidence in 1964 and retrospectively collected cancer mortality data from as far back as 1950. In 1970, the OCTRF took ownership of the cancer registry. A more complete description of the historical milestones of the registry is provided elsewhere (Clarke et al. 1991).

Automation and OCRIS

One major transformation in the long history of the OCR was the adoption of the Ontario Cancer Registry Information System (OCRIS) in the early 1980s. Previously, the cancer registry was curated manually using records received from hospitals and cancer centers. This approach resulted in significant delays in the processing of case records. With the advent
of OCRIS and its automatic record delivery, the OCTRF was capable of receiving records almost instantly. OCRIS’ improvements to data collection and the development of case resolution, the sophisticated system of computerized medical logic, further established the province’s cancer registry as an important tool in cancer control. Enhancements to OCRIS were later made in the 1990s, and for the next 20 years, it continued to be an integral component of cancer registration in the province.

EDW Reconstruction

In 2014, many years of work culminated in the first major reconstruction of the cancer registry since the adoption of OCRIS. The registry was rebuilt within the newly adopted technology of the Enterprise Data Warehouse (EDW). This change also coincided with the adoption of new standards for the registration of cancer cases, specifically the Multiple Primary and Histology (MPH) coding rules of the Surveillance, Epidemiology, and End Results Program (SEER). The need to modernize the OCR through technological improvements was prompted by greater demand for the registry’s business intelligence capabilities. The new EDW-based OCR was officially launched in October 2014.

Who Uses OCR Data and for What Purpose?

In recent years, the community of users of cancer registry data has expanded beyond the traditional audience of epidemiologists, cancer surveillance analysts, public health researchers, and policy analysts. Increasingly, the healthcare provider community, health services researchers, and cancer system planners are turning to population-based cancer registries like the OCR for foundational data to address questions related to clinical care and healthcare planning. The following sections highlight several examples of the OCR’s stakeholders.

Examples of Provincial Stakeholders

The Cancer Quality Council of Ontario is an arm’s length agency of the Ontario government tasked with measuring the performance of the Ontario cancer system. The Council relies on OCR data to generate the Cancer System Quality Index, which reports on quality measures aimed at stimulating improvement in the cancer system.

Informing program delivery is another example of cancer registry data use. In partnership with CCO, Ontario’s Regional Cancer Programs administer programs and services for cancer prevention and care in all 14 of the province’s local health authorities, the Local Health Integration Networks (Fig. 2). OCR data are a source of information used by these networks in the planning, integration, and funding of local healthcare, as well as in improving access and the patient experience.

CCO also regularly shares OCR data with its provincial partners and collaborators, including:

• Pediatric Oncology Group of Ontario, Ontario’s lead agency on childhood cancer surveillance, research, care, and support
• Institute for Clinical Evaluative Sciences, a research institute that performs many leading evaluative studies on healthcare delivery and outcomes, often by linking together health data such as physician billing claims and hospital discharge abstracts with cancer data
• Cancer Research Institute of Queen’s University, which undertakes studies of cancer etiology, tumor biology, clinical trials, as well as outcomes and health services research
• Public Health Ontario, an agency dedicated to protecting and promoting the health of all Ontarians and reducing inequities in health through surveillance and research related to chronic and communicable diseases.

Examples of National Stakeholders

The OCR is 1 of 13 provincial and territorial cancer registries that populate the Canadian Cancer Registry managed by Canada’s statistical agency (Statistics Canada). The Canadian Cancer
Fig. 2 Map of Ontario’s Local Health Integration Networks. 1. Erie St. Clair, 2. South West, 3. Waterloo Wellington, 4. Hamilton Niagara Haldimand Brant, 5. Central West, 6. Mississauga Halton, 7. Toronto Central, 8. Central, 9. Central East, 10. South East, 11. Champlain, 12. North Simcoe Muskoka, 13. North East, 14. North West
Registry is the main source of cancer statistics used in cancer health planning and decision-making at the national level. The OCR represents the Canadian Cancer Registry’s largest provincial source of cancer data and, as a result, greatly influences national cancer statistics. The provincial and territorial cancer registries work with the Canadian Cancer Registry program to establish national standards for registry operations and data collection.

CCO also collaborates with the Canadian Partnership Against Cancer, a national agency that leads the performance measurement of Canada’s cancer system. The partnership uses OCR and other data from CCO and other provincial cancer agencies to identify disparities in cancer care and management at the national and provincial levels.

Examples of International Stakeholders

CCO actively shares OCR data with international organizations such as the North American Association of Central Cancer Registries (NAACCR) and the International Agency for Research on Cancer (IARC). The registry data are also used in numerous international research initiatives, including but not limited to the International Cancer Benchmarking Partnership (ICBP) and the CONCORD studies on cancer survival.

Established in 1987, NAACCR is an umbrella organization for North American cancer registries, governmental agencies, professional associations, and private groups interested in the dissemination of cancer data. NAACCR achieves its mission through the active participation of selected US state cancer registries and Canadian provincial and territorial cancer registries. As with other member registries, the OCR shares its data with NAACCR annually. The compiled data are used to present North American cancer statistics in NAACCR’s annual publication (Cancer Incidence in North America).

The OCR is one of several provincial cancer registries that submit their data every 5 years to IARC for inclusion in a compendium of cancer incidence data from internationally recognized cancer registries called Cancer Incidence in Five Continents. Data on childhood cancer incidence are also submitted by the OCR for inclusion in IARC’s International Incidence of Childhood Cancer report.
Table 1 The OCR’s four main data sources for incident record creation
Source | Type(s) of information | Relative rank of importance in record creation | Load frequency into EDW-OCR
Pathology (from public and private laboratories) | Pathology reports and diagnostic test results | 1 | Weekly
ALR (from Regional Cancer Centers) | Treatment, past medical history and out-of-province records | 2 | Monthly
DAD and NACRS (from CIHI) | Admissions, discharge and surgery data | 3/4 | Monthly
Death certificates (from the Registrar General of Ontario) | Cause of death; fact of death | 3 | Cause of death: typically every 18–24 months; fact of death: every quarter
ALR Activity Level Reporting, DAD Discharge Abstract Database, NACRS National Ambulatory Care Reporting System, CIHI Canadian Institute for Health Information
The ICBP is a global initiative that combines the OCR with 12 comparable population-based cancer registries. The ICBP’s registry data spans six countries across three continents. Open only to registry jurisdictions with universal access to healthcare and similar levels of healthcare spending, the ICBP aims to optimize the cancer policies and services of its partners. To date, the OCR has participated in three of five of the ICBP’s research modules, exploring the topics of cancer survival, delays between treatment and diagnosis, and short-term survival (ICBP booklet 2014).

The CONCORD study was the first worldwide analysis of its kind to systematically compare cancer survival across five continents, involving 101 cancer registries from 31 countries (Coleman et al. 2008). Canadian data in the study came from the OCR and four other provincial and territorial cancer registries. The OCR was used again in the follow-up CONCORD-2 study, which assessed survival across 279 population-based cancer registries from 67 countries (Allemani et al. 2015).

Data Sources

OCR records are created using data collected for purposes other than cancer registration. The data come from various administrative databases, laboratory reports, and clinical records, including:

• Pathology reports
• Activity Level Reporting from Regional Cancer Centers (RCCs)
• Surgery and discharge data from the Canadian Institute for Health Information (CIHI)
• Death certificates
• Notification of out-of-province diagnosis or treatment of Ontario residents

Each data source is managed differently by the OCR and serves a unique purpose in record creation (Table 1).

It is uncommon to have a single data source for any given cancer case (Fig. 3), but certain sources are more commonly available than others. For example, of the 233,020 incident cases recorded between 2010 and 2012, 84 % included a pathology report. In 7 % of all cases, pathology reports were the only given source record. By comparison, 60 % of all cases had a corresponding NACRS record. However, in less than 0.1 % of all cases, NACRS was the only provided source record.

Pathology

Pathology reports are the main diagnostic source for new case record creation (Table 1). Through the
Fig. 3 OCR data sources for 2010–2012 incident cases, showing the proportion of case records that included a particular data source. Also shown, percent of case records generated from a single source (Source: Ontario Cancer Registry, 2015)
ePath electronic pathology reporting system, CCO receives over one million pathology reports each year, sent in from 47 provincial facilities. In 2014, 237,834 of these reports were cancer relevant to 173,226 unique reports. To efficiently handle this large volume of information, pathology data is loaded into the EDW-OCR on a weekly basis.

Pathology reports are delivered to CCO in one of two forms: as narrative reports or as combined narrative/synoptic reports. Narrative reports describing a patient’s pathology test results are those that have been written in sentence form or orally transcribed. While these types of reports can be submitted electronically, they cannot be handled automatically and are difficult to query. Coders must manually review narrative reports to derive relevant information and verify whether there is indeed a cancer diagnosis.

Narrative reports currently account for approximately 70 % of all pathology reports received by CCO. The other 30 % of reports are received in synoptic form, a highly structured and standardized format of data submission that is sent electronically. These reports improve overall completeness, ease of data exchange, treatment-related decision-making, and turnaround time. First implemented in 2009, the synoptic pathology reporting system in Ontario is derived from the Electronic Cancer Checklist (eCC) developed by the College of American Pathologists. Checklists and standard data fields in the eCCs eliminate the descriptive language found in narrative reports. Synoptic reports can be submitted in real time, making them a significantly more efficient method of pathology reporting.

One promising development is the inclusion of biomarkers in synoptic reporting. Biomarkers are laboratory indicators that can help identify abnormal processes, conditions, or disease. With respect to cancer care, biomarkers are of particular interest as they can provide information on cancer etiology, prognosis, and diagnosis. Examples of commonly used biomarkers include HER2 for breast cancer, KRAS for colorectal cancers, and ALK for lung cancer. In collaboration with the College of American Pathologists’ Pathology Electronic Reporting Committee, CCO is working to create biomarker templates for synoptic reporting. By September 2016, all 19 of Ontario’s genetic facilities are expected to implement eCC biomarker reporting. In preparation, Ontario has mandated 5 biomarker eCCs for lung, colorectal, breast, and stomach cancers and melanoma. CCO is also equipped to handle optional-use biomarker eCCs for endometrial, gastrointestinal stromal tumor, myeloid, lymphoid, and CNS tumors.
Activity Level Reporting

Data submitted by RCCs include Activity Level Reporting (ALR). ALR consists of patient records pertaining to radiation and systemic therapy services as well as oncology clinic visits. Sixty-two percent of new cancer cases in the OCR from 2010 to 2012 included ALR as a reporting source (Fig. 3). Some out-of-province data are collected for patients who access cancer services outside of Ontario (e.g., in neighbouring provinces). The loading of ALR data into the OCR occurs on a monthly cycle. ALR data can be reported in either ICD-10 or ICD-O-3 coding systems.

DAD and NACRS

CIHI supplies data from the Discharge Abstract Database (DAD) and National Ambulatory Care Reporting System (NACRS). DAD includes administrative, clinical, and demographic data pertaining to all hospital in-patient discharges. NACRS reports all hospital- and community-based ambulatory care in day surgery, outpatient clinics, and emergency departments. As of 2002, all CIHI data are coded in ICD-10-CA.

Death Certificates

Death certificates are obtained by the OCR from the Registrar General of Ontario. This information is used to track the vital status of patients in the registry and ensure that all incident cancer cases have been identified, particularly those that are only identified upon death. This process is known as death clearance.

Coded death certificates are received between 18 and 24 months after death. In lieu of death certificates, CCO also accepts fact of death for death clearance. CCO receives fact of death records approximately every quarter, describing deaths that have occurred approximately 6 months prior to the current quarter. Unlike death certificates, fact of death does not provide any insight into an individual’s diagnosis of cancer and can only be used to close existing cases in the OCR.

Data Systems and Consolidation

OCRIS and the EDW Successor

OCRIS had served as CCO’s cancer registry information system since the 1980s. In an effort to modernize the registry and align it with current standards, OCRIS was formally decommissioned and replaced by the Enterprise Data Warehouse (EDW)-based OCR in late 2014. The EDW was initially designed to store ALR data for examining treatment and financial metrics, but in 2005 the decision was made to reconstruct the cancer registry within the EDW.

The EDW is composed of numerous data holdings, three of which are primarily related to cancer registration (see “Technical Appendix” for more details):

• Pathology/source data mart
• Ontario Cancer Registry (EDW-OCR)
• Collaborative Staging (CS) data mart

CCO’s IT team is responsible for EDW support, data load, linkages, .NET support, and technical quality assurance.

Patient Linkage

Through the key processes of patient linkage and case resolution, the OCR registrars are able to generate linkable records that combine all relevant data while eliminating redundant records. The EDW-OCR also permits any manual correction of cases at the record level, something not previously possible with OCRIS. Although the OCR relies on various automatic processes, manual review and input are still required to verify the completeness and accuracy of information for cancer registration.

Patient linkage is one of cancer registration’s most fundamental processes and involves a combination of deterministic and probabilistic linkage routines to aggregate a person’s source records into a “best” linked person record, which is a composite record representing the individual. This
entails the linking of new source records to existing person records. Source records that do not match to existing person records are consolidated and added to the OCR as new person records. However, there are several challenges with the linkage process. Aside from administrative errors like the misspelling of names or varying date formats, not all reports contain identical data elements. Unlike data from CIHI, ALR, or ePath, death certificates fail to provide patient Health Insurance Numbers. Because of the inconsistency in source data, deterministic linkage is ruled out as a major method for creating patient records and probabilistic linkage is used instead. Nonetheless, deterministic linkage is used to supply names to CIHI records (via health card number) and some other identifiers to other source records, using the provincial client registry.

Probabilistic linkage allows matching of data where the completeness of matching variables is not 100 % and tolerates typing errors, transpositions, initials, nicknames, etc. Through probabilistic linkage, matches are assigned a total match score (weight). Matches with the highest weights are automatically accepted, matches with low weights are rejected, and links falling between the high and low thresholds are manually reviewed. The Master Patient Linkage links incoming CIHI, ALR and Pathology data to existing OCR persons. Incoming data that does not link to existing persons results in the addition of ‘new’ OCR persons. The Death Linkage links incoming death certificates to the OCR. Death certificates with a cancer cause of death that do not link to an existing OCR person result in the addition of a ‘new’ OCR person and a ‘Death Certificate Only’ cancer case.

Incorrect linkage would have several implications. For example, if multiple reports were not linked to their respective patient, redundant “persons” or cases would be generated. This would result in the over-reporting of cancer incidence. Similarly, if death certificates were not linked to the correct person record, the existing case would not pass death clearance and the OCR would over-report cancer survival and prevalence.

Case Resolution

While the goal of patient linkage is to tie patient records together, case resolution works to consolidate these data into individual cancer cases. The immense volume of data received by CCO for the purpose of curating the OCR necessitates a highly competent system to handle information and pare it down into discrete cases. Case resolution does this by identifying individuals to process, reviewing their source data records, and identifying any primary cancers. A rigorous set of rules is then used to automatically produce a “best” diagnosis from the available data. At this point, only incident cases that have passed the various checks and filters remain.

Unlike patient linkage, case resolution is an automatic process without concurrent manual review. Automated logic processes all source records for a person, making cases for reportable neoplasms. Any case found to be non-incident, problematic, or outside the interest of CCO is appropriately flagged.

Non-incident cases are legitimate cases which do not qualify because the specific diagnosis is not covered by the OCR definition of “incident,” which normally includes only invasive, reportable cancers. Non-incident cases include in situ cases as well as benign and borderline brain and central nervous system tumors.

Problematic cases are those that either conflict with the system or do not meet the basic criteria for a proven case. An example of the latter is a case consisting only of hospital discharge records. Because discharge data alone is not indicative of a diagnosis or outcome, a definite case cannot be created. After a follow-up review, problematic cases can be identified as incident or non-incident or combined with already existing cases.

Some cases are deemed as “out of OCR range” or not of interest. These cases do not qualify as
Fig. 4 Diagram of interrelated OCR processes (except for mortality data; death certificates and fact of death are processed separately). Patient linkage and case resolution processes are scheduled to run bimonthly. ePath electronic pathology data collection system, ALR Activity Level Reporting, CIHI Canadian Institute for Health Information, CS Collaborative Stage, NAACCR North American Association of Central Cancer Registries
incident cases because they fall outside of CCO’s rules on geography and timing. The rules on geography exclude patients with a residence at diagnosis listed as outside of Ontario. However, patients without a listed residence are still treated as incident cases and considered as living in Ontario. Timing rules dictate that any cases diagnosed prior to 1964 be labeled as “out of OCR range” and ignored. Any cases that are not flagged by these rules are considered incident cases.

Resolved cases provide cancer-specific information such as the conditions of the diagnosis (ICD code, age, date of diagnosis, etc.), incidence status (in situ, invasive, etc.), cancer stage, and other data pertinent to oncologists and researchers.

All non-pathology source data come to CCO precoded. Because of the divergent coding sometimes used by the sources, case resolution logic may mistakenly create multiple cases for a single person. Manual reviewers examine source records and merge any such cases. All cases are subject to manual review 6 months after their creation.

The successful completion of these processes allows for the creation of an OCR minimum dataset for a given year (Fig. 4).

Data Elements

Information in the OCR spans several domains of data including demographic and vital statistics, tumor characteristics, treatment, and patient identifiers (Table 2).
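As a rough illustration only, a case record carrying a handful of these elements, together with the geography and timing rules described under Case Resolution, might be sketched as follows. The class name, field names, the province code "ON", and the function name are hypothetical and are not taken from the EDW-OCR itself; only the rules (residence outside Ontario is excluded, a missing residence is treated as Ontario, diagnoses before 1964 are out of range) come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# From the text: cases diagnosed prior to 1964 are "out of OCR range".
OCR_FIRST_YEAR = 1964
# Assumption: residence is stored as a province code; "ON" stands for Ontario.
HOME_PROVINCE = "ON"


@dataclass
class CaseRecord:
    """A few illustrative elements drawn from the data domains of the OCR."""
    date_of_birth: date                       # demographic and vital statistics
    date_of_diagnosis: date                   # tumor characteristics
    primary_site: str                         # ICD-O-3 site code
    residence_at_diagnosis: Optional[str]     # None when no residence is listed
    treatment_facility: Optional[str] = None  # treatment


def is_out_of_ocr_range(case: CaseRecord) -> bool:
    """Apply the timing and geography rules described in the text."""
    # Timing rule: diagnoses before 1964 are out of range.
    if case.date_of_diagnosis.year < OCR_FIRST_YEAR:
        return True
    # Geography rule: a missing residence is treated as living in Ontario.
    if case.residence_at_diagnosis is None:
        return False
    # Geography rule: a residence listed outside Ontario is excluded.
    return case.residence_at_diagnosis != HOME_PROVINCE
```

Under these assumptions, a case diagnosed in 2011 with no listed residence stays in range, while the same case with a Manitoba residence would be flagged.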
Table 2 Data domains and elements in the OCR
Data domain | Available data elements
Demographic and vital statistics | Date of birth; Age at diagnosis; Sex of patient; Census tract, division, and subdivision; Last known place of residence; Date of death
Tumor characteristics | Date of diagnosis; Non-incident status; Method of diagnosis/confirmation; Type of pathology report; Stage at diagnosis; Stage (overall, clinical and pathological); Primary site (ICD-O-3 site code); Histology (ICD-O-3 histology code); Morphology; Topography; Site-specific factors; Laterality; SEER diagnosis group; Clinical practice group; Place of residence at diagnosis
Treatment | Local Health Integration Network; Public health unit; Treatment facility; Treatment date; Date of last contact; Care site ID; Discharge count (DAD); Surgery count (NACRS); ALR/RCC; Number of pathology reports
Identifiable/linkable information | Place of residence at diagnosis; Patient name; Ontario Health Insurance Plan number; Health card number
SEER Surveillance, Epidemiology, and End Results Program; DAD Discharge Abstract Database, NACRS National Ambulatory Care Reporting System, ALR Activity Level Reporting, RCC Regional Cancer Center

Because numerous source records are often tied to a single case, some data elements such as date of diagnosis must be derived using algorithms. In this case, establishing the date of diagnosis is an automated activity. First, all source records linked to a case are chronologically ordered. Then, the earliest date is selected as the date of diagnosis, regardless of record type. The complexity of the algorithms used varies depending on the nature of the element. For example, the methods used to generate stage data are considerably more complex (see section “Cancer Stage at Diagnosis”).

Data Quality

The quality of cancer incidence data in the OCR compares favorably with that of other provincial and national cancer registries. The OCR adheres to four dimensions of data quality: comparability, completeness, accuracy, and timeliness (Parkin and Bray 2009).

Comparability is defined as the extent to which registry practices adhere to standard guidelines, which include the criteria for registration, coding systems such as ICD-O-3, multiple primary counting rules, and more. The standardization of OCR procedures ensures its comparability and compatibility with other cancer registries.

Completeness refers to how well incident cancer cases are registered, specifically how closely registry values for incidence and survival reflect the population’s true values. The OCR’s ability to draw upon multiple data sources to register cases, often with multiple sources per case, is conducive to a high level of completeness. OCR completeness is further verified through case-finding audits, record linkage with national and provincial databases, and comparisons with historic values.

Accuracy pertains to how well case records resemble their actual values. Just as with completeness, the OCR maintains a high level of accuracy thanks to its use of multiple data sources. Accuracy is further improved with re-abstraction studies and recoding audits, histological verification of cases, examination of “death certificate only” cases, and analyses of missing information and internal inconsistencies.
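The date-of-diagnosis rule described under Data Elements (order a case’s source records chronologically, then take the earliest date regardless of record type) can be sketched in a few lines. This is a minimal illustration, not the registry’s implementation: the `SourceRecord` type, its field names, and the function name are assumptions introduced here.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class SourceRecord(NamedTuple):
    """Hypothetical source record linked to a case."""
    source: str        # e.g. "pathology", "ALR", "DAD", "NACRS"
    record_date: date


def derive_date_of_diagnosis(records: Iterable[SourceRecord]) -> Optional[date]:
    """Return the earliest record date, ignoring the record type."""
    # Step 1: order all linked source records chronologically.
    ordered = sorted(records, key=lambda r: r.record_date)
    # Step 2: the earliest date becomes the date of diagnosis.
    return ordered[0].record_date if ordered else None


linked = [
    SourceRecord("ALR", date(2011, 3, 14)),
    SourceRecord("pathology", date(2011, 2, 2)),
    SourceRecord("DAD", date(2011, 2, 20)),
]
diagnosis_date = derive_date_of_diagnosis(linked)  # date(2011, 2, 2)
```

Returning `None` for a case with no linked records is a choice made for this sketch; the text does not say how such cases are handled.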
Table 3 Data quality indicators (NAACCR standard) for OCR 2008–2012 data years (a)
Indicator (% of all cases) | 2008 | 2009 | 2010 | 2011 | 2012
Completeness of case ascertainment | 94.9 | 95.0 | 96.1 | 99.1 | 94.8
Missing age | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
Missing sex | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
Death certificate cases only (DCO) | 1.0 | 1.3 | 1.5 | 1.4 | 1.8
Passing edits checks | 100 | 100 | 100 | 100 | 100
(a) Current as of Nov 2015
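For readers who want to see how percentage indicators of the kind shown in Table 3 could be tabulated, the sketch below computes three of them (missing age, missing sex, and death certificate only) as simple proportions of all cases. It is an assumption-laden illustration: the `Case` type and function names are hypothetical, and NAACCR’s formal definitions, in particular for completeness of case ascertainment, are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """Hypothetical minimal case record for indicator tabulation."""
    age: Optional[int]
    sex: Optional[str]
    death_certificate_only: bool


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 for an empty denominator."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def quality_indicators(cases: List[Case]) -> Dict[str, float]:
    """Share of all cases with missing age, missing sex, or DCO status."""
    n = len(cases)
    return {
        "missing_age_pct": percent(sum(c.age is None for c in cases), n),
        "missing_sex_pct": percent(sum(c.sex is None for c in cases), n),
        "dco_pct": percent(sum(c.death_certificate_only for c in cases), n),
    }


# Made-up data for illustration: 200 cases, 3 of them DCO.
sample = [Case(60, "F", False)] * 197 + [Case(82, "M", True)] * 3
indicators = quality_indicators(sample)
```

The sample data are invented for the example and are unrelated to the OCR figures in Table 3.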
Timeliness is the speed with which a registry can collect, process, and report complete and accurate cancer data. OCR timeliness is contingent upon two variables: the time until receipt and the time to process. The time until receipt refers to the time elapsed from diagnosis to delivery to CCO. With the exception of cause of death information through death certificates, which are typically received after 18–24 months, CCO receives and loads data into the EDW on a regular schedule (see section “Data Sources”).

Every year the OCR shares its data with NAACCR as part of its annual call for data, which is one of several calls for data by other organizations throughout the year to which the OCR responds. The measures of quality using the NAACCR data quality standard are shown in Table 3 for reference.

Other Factors Affecting Quality

Registration System

One concern regarding data quality pertains to the recent transition from OCRIS to the EDW-OCR. With each bimonthly case resolution cycle, EDW case data evolves. Existing cases expire and are replaced with a new case file. Cases tied to OCRIS, namely, all data from before 2010, remain unaffected and are listed as “frozen.” In order to mitigate variability, the data mart also tracks new versions of old case files. When new and old case files maintain a fixed degree of similarity, the two are linked in a process called case chaining. Case chaining ensures that current case files can be found once an older case is retired. Variability can also arise in instances where registrars manually edit EDW-OCR data by merging cases together or modifying diagnosis codes and dates.

Data Auditing

CCO practices routine data audits to verify the accuracy of its data holdings. One such audit is for inter-rater reliability, aimed at assessing the level of agreement among coders or staging abstractors. These audits are necessary to minimize the loss of data integrity as a result of human error and to establish consistency.

In a 2015 audit for stage quality, the inter-rater reliability among 16 CCO analysts was assessed. Each analyst independently staged an identical set of 96 randomly chosen cases diagnosed from 2012 to 2013. The “de-identified” set of cases included an equal number of breast, colorectal, lung, and prostate primaries. Restrictions placed on the analysts prevented them from consulting each other or accessing full patient records, case histories, or pathology reports. Audits such as these allow CCO to discover any issues in data quality and promptly rectify them. Among the group of 16 analysts, an overall agreement rate of 93.5 % was found. In such audits, CCO strives to maintain a crude agreement rate of at least 90 %.

Data Sources and Timing of Loads

As part of the transition from OCRIS to the EDW-OCR, some of the data source rankings have changed. In particular, pathology reports have replaced ALR data for the highest rank. This can be attributed to the more reliable and efficient nature of some sources, which makes them more valuable. Case data quality can be
further examined by performing NAACCR Edit Checks (Table 3). These checks identify cases that warrant further review. Often times such cases are coded incorrectly, with invalid topography and morphology combinations, unconfirmed multiple primaries, and other errors which are easily rectified.

Delays in the delivery and handling of source data mean that case resolution and registration cycles can on occasion be out of sync. As previously mentioned, DAD, NACRS, and ALR data are loaded on a monthly basis. In comparison, pathology (ePath) data is loaded weekly. However, as the case resolution and registration cycles operate on a bimonthly schedule, any misrepresentations of data become negligible over time.

Ontario Patients Treated Outside Ontario - Removal of Duplicates

Statistics Canada conducts a national duplicate resolution process with the provincial and territorial cancer registries each year to account for multiple reporting of cases (e.g., due to patients moving between jurisdictions). The exchange of data between provincial and territory cancer registries is necessary to resolve duplicate cases and identify cases that may be missed, such as among individuals who access cancer services outside of their home province. For example, residents of northwestern Ontario will often use out-of-province cancer services in the neighboring province of Manitoba. Data sharing agreements exist between provinces for the exchange of such information.

Death Clearance

Death certificates are used for the purpose of death clearance, a process that uses the coded cause of death (where cancer is the underlying cause) to identify individuals who were not previously recognized as having cancer. These “death certificate only” cases represent under 2 % of incident cases (Table 3). Unless fact of death is established otherwise, death certificates are necessary to keep accurate survival and prevalence rates. Currently, there are no routinely scheduled releases of death certificates from the Registrar General of Ontario. As a result, the death clearance process may occur long after incident cases from other sources have been identified.

The OCR Adopts a New Approach to Counting Cancers

Counting practices for OCRIS incident cases were modeled after standards set by the International Agency for Research on Cancer (IARC) and the International Association of Cancer Registries (IACR). These counting rules were very conservative and inflexible for patients diagnosed with multiple primaries. Given that approximately 10–14 % of cases with a single primary will develop a subsequent cancer within 25 years, cancer incidence counts would be under-reported by overlooking such subsequent primaries. The modified IARC/IACR rules used by OCRIS did not recognize paired organs (e.g., left or right breast or lung) or colon and skin melanoma subsites, nor did it have timing rules to recognize new, subsequent primary cancers in the same organ. As a result, OCRIS likely reported lower rates of multiple primaries than other registries with more liberal rules, including those using the SEER Multiple Primary and Histology (MPH) coding rules. However, starting with cases diagnosed in 2010, the OCR implemented the SEER MPH coding rules, which use four criteria for counting multiple primaries: topography, morphology, laterality, and timing (Johnson et al. 2012).

Topography and Morphology

Topography refers to a cancer’s anatomic site of origin, while morphology describes the type of cell and its biological activity. The morphology of cancers is recorded with two codes, describing the cancer’s histology and behavior. In OCRIS, additional primaries were only added to the registry when cancers expressed both a different topography and morphology from the initial primary
cancer. As shown in Table 4, the OCR accepts cancers that are morphologically identical but have different topography, and vice versa, as being multiple primaries.

Laterality

Laterality applies mainly to paired organs and differentiates similar cancers by organ subsite. The IARC/IACR rules do not recognize laterality. In cases where both paired organs, such as the left and right kidney, were reported with invasive tumor, only a single primary would be recognized. Using the SEER rules, paired sites are considered in the registration of multiple primaries. As outlined in Table 5, only specific topographic sites are subject to the rules on laterality. With cancers of the central nervous system, the laterality rule only applies to benign and borderline tumors. Malignant central nervous system tumors remain exempt.

Timing

The diagnosis of multiple primaries can be typically described as synchronous or metachronous. Synchronous cancers are those that develop at the same time or within a small time frame, while metachronous cancers occur in sequence of one another. Data on metachronous cancers are of particular importance to researchers as they provide insight into causal mechanisms involved in the formation of subsequent neoplasia. IARC/IACR rules dictate that the existence of two or more primary cancers does not depend on time and are therefore recognized as a single primary case. The SEER rules on timing allow metachronous cancers to exist as multiple primaries. As shown in Table 5, a cancer must have developed after a specified period of time to qualify as a multiple primary.

Table 4 Criteria for classifying cancers as multiple primaries under the modified IARC/IACR rules in OCRIS compared to SEER Multiple Primary and Histology rules in the OCR

                                             Multiple primary rule
Criteria                                     IARC/IACR (in OCRIS)    SEER MPH (in OCR)
Same topography and different morphology     No                      Yes (in general)
Different topography and same morphology     No                      Yes (in general)
Laterality                                   No                      Yes
Timing                                       No                      Yes (in general)

IARC International Agency for Research on Cancer, IACR the International Association of Cancer Registries, SEER Surveillance, Epidemiology, and End Results Program, MPH multiple primary and histology
Table 5 Applicable SEER multiple primary counting rules for laterality and timing (Source: SEER Multiple Primary and Histology Coding Rules Manual, 2007)

Cancer type                                               Laterality                        Timing
Breast                                                    Yes                               5 years
Head and neck                                             Yes                               5 years
Kidney                                                    Yes                               3 years
Lung and pleura                                           Yes                               3 years
Urinary                                                   Yes                               3 years
Colon                                                     Yes                               1 year
Melanoma                                                  Yes                               60 days
Benign and borderline central nervous system tumors       Yes                               Does not apply
Malignant central nervous system tumors                   Does not apply                    Does not apply
Other sites                                               Yes, if considered a paired site  1 year, if applicable
Invasive diagnosis 60 days after an in situ diagnosis     Does not apply                    60 days
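
The laterality and timing criteria in Table 5 lend themselves to a small decision sketch. The Python below is a simplified illustration only, not the SEER MPH manual or the OCR's implementation: the `Tumor` class, the site labels, the 365-day year, and the strict "more than" comparison are all assumptions made for the example, and it considers only two tumors of the same site and morphology.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Timing windows from Table 5, in days. Treating a year as 365 days is an
# assumption of this sketch; the site labels are hypothetical keys.
TIMING_WINDOW_DAYS = {
    "breast": 5 * 365,
    "head_and_neck": 5 * 365,
    "kidney": 3 * 365,
    "lung_and_pleura": 3 * 365,
    "urinary": 3 * 365,
    "colon": 365,
    "melanoma": 60,
}

# Sites for which Table 5 lists the laterality rule as applicable.
LATERALITY_SITES = set(TIMING_WINDOW_DAYS) | {"benign_cns"}


@dataclass
class Tumor:
    """Hypothetical minimal record: site label, diagnosis date, optional side."""
    site: str
    diagnosed: date
    side: Optional[str] = None  # "left", "right", or None if not recorded


def is_separate_primary(first: Tumor, second: Tumor) -> bool:
    """Return True if `second` would count as a new primary next to `first`.

    Assumes both tumors share the same site and morphology, so only the
    laterality and timing criteria are evaluated.
    """
    if first.site != second.site:
        raise ValueError("this sketch only compares tumors of the same site")

    # Laterality: tumors on opposite sides of a listed site count separately.
    if (first.site in LATERALITY_SITES
            and first.side is not None and second.side is not None
            and first.side != second.side):
        return True

    # Timing: a later tumor counts once the site's window has elapsed.
    window = TIMING_WINDOW_DAYS.get(first.site)
    if window is None:
        return False  # no timing window listed for this site in the sketch
    gap_days = abs((second.diagnosed - first.diagnosed).days)
    return gap_days > window  # strict ">" is an assumption of this sketch
```

In this sketch, a left and a right breast tumor diagnosed five months apart are counted separately under the laterality criterion, whereas two colon tumors with no recorded side diagnosed five months apart fall inside the one-year window and are treated as a single primary.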

[Figure 5: line chart. Y-axis: Number of new cases (5,000 to 11,000). X-axis: Year of diagnosis (2001 to 2011). Series: Female Breast, Colorectal, Lung, Prostate. Annotation: SEER MPH rules implemented in OCR.]

Fig. 5 Effect of implementing the SEER Multiple Primary and Histology counting rules on incidence, by cancer type in Ontario, 2002–2011. Note the seemingly disproportionate rise in the incidence of breast, colorectal, and lung cancers while the incidence of prostate cancer remained relatively fixed (Source: Ontario Cancer Registry, 2015 (Cancer Care Ontario))

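
The effect summarized in Fig. 5 can be expressed as a simple relative increase between two counts of the same diagnosis year. The sketch below is a hedged illustration: the function names and example counts are hypothetical, and the single overall factor of 5.8 % is the chapter's reported overall difference for 2010 and 2011 cases (Candido et al. 2015), which the text notes varies by morphology, topography, sex, and age.

```python
def relative_increase(count_iarc: float, count_seer: float) -> float:
    """Fractional increase in cases when counting under SEER MPH
    rather than IARC/IACR rules."""
    if count_iarc <= 0:
        raise ValueError("count_iarc must be positive")
    return (count_seer - count_iarc) / count_iarc


def apply_correction(projected_iarc: float, factor: float) -> float:
    """Scale an IARC/IACR-based projection by a counting-rule correction factor."""
    return projected_iarc * (1.0 + factor)


# Hypothetical counts, chosen so the increase equals the reported 5.8 % overall figure.
increase = relative_increase(10_000, 10_580)

# Hypothetical projection of 12,000 cases under IARC/IACR rules.
corrected = apply_correction(12_000, increase)

print(f"{increase:.1%}", round(corrected))
```

A real analysis would likely need separate factors by cancer type, sex, and age group rather than one overall value.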
Implications of Counting Rules on Data and Analysis

There was a substantial increase in incident cases following the adoption of the SEER MPH rules by the OCR. This change is due to how cancers were being counted rather than indicating that more people in Ontario were being diagnosed with or dying of cancer. The new rules allow for a more complete accounting of cancer incidence, which improves the ability for regions and communities in Ontario to plan for the future needs of the cancer system.

To further examine the change imposed by the new counting rules, IARC/IACR and SEER MPH rules were compared for 2010 and 2011 incident cases (Candido et al. 2015). According to this analysis, overall there were 5.8 % more cases using the SEER MPH rules, the increase varying by morphology, topography, sex, and age. The greatest change was observed in older age groups and those diagnosed with melanoma of the skin, female breast cancer, and colorectal cancer. The incidence of colorectal, female breast and lung cancers rose considerably following the implementation of the SEER MPH counting rules (Fig. 5). However, the incidence of prostate cancer remained largely the same.

Best Practices for Analysis

From an analytic perspective, if an analysis spans the OCRIS and OCR datasets, special care must be taken to reconcile the two. More specifically, data from 2010 onward must first be made IARC/IACR-compatible by using those multiple primary counting rules, which then allow for trend analyses under a common rule.

For cancer projections, the projections must be undertaken based on incidence counts using the IARC/IACR rules and then be modified with a
correction factor that accounts for the effect of the SEER MPH counting rules.

Cancer Stage at Diagnosis

Cancer stage at diagnosis reports the extent of a cancer’s invasion and spread beyond the primary site. Factors used in staging include the tumor’s topographic site, size, multiplicity, invasiveness, lymphatic penetration, and metastases. In a clinical setting, this information helps determine a patient’s appropriate course of treatment and provides an estimate of their prognosis. The dominant clinical staging methods are the tumor, node, and metastasis (TNM) staging system and the collaborative staging (CS) framework (which is based on TNM) used by North American cancer registries.

TNM staging reports cancer stage as a function of tumor, node, and metastasis. First, the primary tumor is classified by type, size, and extent. Next, the level of lymph node involvement is determined. Lastly, any metastases are examined to assess the cancer’s spread from the primary site. By taking these data into consideration, an overall stage value, ranging from 0 to IV, can be assigned.

CS is a unified data collection framework designed to use a set of data elements based on the extent of disease and clinically relevant factors. The primary objective of CS was to reconcile the American Joint Committee on Cancer’s (AJCC) TNM staging system, Union for International Cancer Control TNM staging system, and the SEER Summary Staging system. This change brought about a significant reduction in data redundancy and duplicate reporting. Furthermore, it retained data integrity for both clinical and public health researchers while improving accessibility and compatibility.

The input data items for CS that are collected from the medical record include both clinical diagnostic results, like imaging, biopsy, and other tests, and any cancer resection surgery results. Each data element has an additional field that identifies whether it was collected from clinical or resection surgery findings and an indicator if neoadjuvant therapy was performed prior to surgery. The CS algorithm then automatically derives a single set of T, N, M, and stage group, which will be clinical or pathologic, depending on how the extent of disease was discovered within the diagnostic and treatment process.

The staging guidelines for CS require significantly more information than is included in clinical or pathological TNM reports. CCO staging analysts require data on tumor size, depth of invasion, the number and location of positive lymph nodes, as well as site-specific factors (see details below). TNM often does not specify these raw data elements, nor does it provide a cancer stage indicator that combines clinical and pathology data. In order to derive the CS, staging analysts must also review patient pathology and medical records in addition to TNM reports. One calendar year is typically dedicated to the CS capture process for a given diagnosis year.

One significant change that accompanied the adoption of CS was the introduction of site-specific factors (SSFs). SSFs provide supplementary information unique to a cancer type to assist in the staging process. This information expands the understanding of tumor characteristics, prognosis, and predicted treatment response. SSFs for several cancer types were introduced with AJCC 7th edition for cases diagnosed in 2010 onward. Furthermore, the implementation of SSFs allows registries to collect data on biomarkers and other factors that were previously not collected.

CS Automation and Integration

CS is generated in a two-part hybrid system (see section “Data Systems and Consolidation,” Fig. 4). The first part, CS automation, is an automated process that populates CS abstracts by identifying stageable registry cases and linking them to synoptic pathology reports. Stageable cases are those that contain data sufficient to derive a TNM stage, either using the CS data collection system or by manual TNM staging. CS abstracts are required to organize and summarize case data pertinent to the staging process. Staging analysts remotely access hospital electronic patient records to determine if clinical diagnostic tests need to be added to the CS input information. This also
occurs when synoptic pathology data are insufficient for abstraction purposes or are unavailable.

The second process called CS integration requires a more fine-tuned approach. It involves a probabilistic tumor linkage between case and abstract followed by manual review of unlinked abstracts. CS integration involves reviewing abstracts to determine a “best stage” and linking it back to cases in the OCR. This process necessitates remote access to the electronic patient record at hospitals. Currently, CCO is the only organization in Ontario authorized to exercise this level of direct access to electronic patient records.

Source of Staging Data

In 2005, CS was captured for only a subset of patients from outside RCCs, representing less than 15 % stage capture in the first year of data collection. It has since expanded to include both RCCs and non-RCC hospitals. Staged TNM data is received from Ontario RCCs, while non-RCC hospitals have made patient records available to the OCR’s staging analysts.

Stage Capture Rates

Stage capture refers to the completeness of stage information on all stageable cancer cases identified by the registry. Of the 65,816 cases identified in 2013, approximately 72 % were stageable (Fig. 6). CS was derived for 85 % of those stageable cases, a significant improvement from its introduction in 2008.

[Figure 6: stacked chart titled "Stage capture rates 2008-2013, all disease sites". Y-axis: Stage capture % (0% to 100%). X-axis: Year, with case counts 2008* N = 54,789; 2009* N = 56,520; 2010 N = 67,717; 2011 N = 70,957; 2012 N = 69,757; 2013 N = 65,816. Series: CS, TNM, Unknown.]

Fig. 6 Stage capture rates using Collaborative Stage (CS) and Tumor Node Metastasis (TNM), Ontario, 2008–2013 (Source: Ontario Cancer Registry, 2015 (Cancer Care Ontario). * 2008 and 2009 capture rates determined using modified IARC/IACR rules for counting multiple primaries)

The proportion of CS cases increased substantially after 2010, when a national initiative led by the Canadian Partnership Against Cancer supported the Canada-wide adoption of CS. The rise in stage capture rates can also be attributed to the progressive rollout by CCO of CS to a greater number of cancer types (Table 6).

CS was officially implemented in the OCR for the four most common cancers in 2010. In 2011, the use of CS grew to include melanoma of the skin and gynecologic cancers, followed by
further expansion in 2013 to include thyroid cancer.

Table 6 Progressive rollout by OCR of collaborative staging by cancer site

Cancer type            Year of full implementation of collaborative stage by OCR
Breast                 2010
Lung                   2010
Colorectal             2010
Prostate               2010
Gynecologic cancers    2011
Melanoma of skin       2012
Thyroid                2013

In 2015, there was a decision to retire the CS system and to implement TNM for population-based staging in North America. The change in staging system is expected to take effect in Canadian provincial registries with cases diagnosed in 2017. Despite the decision to return to TNM staging, the AJCC has stated that it is committed to keeping SSFs an integral part of the staging process. Discussions are still underway in Canada about additional factors relevant to population-based staging that may need to be collected.

Linkage of the OCR to Other Datasets

By linking the OCR with other datasets, researchers can obtain a more comprehensive understanding of healthcare issues. Whether linked with CCO or external datasets, the OCR can serve as the basis for research studies, especially when patient-level data is available.

Datasets regularly linked with the OCR for the purpose of health services research include the following healthcare utilization databases:

• Ontario Health Insurance Plan claims
• Ontario Drug Benefit claims
• CIHI’s Discharge Abstracts Database (DAD) for inpatient hospitalizations
• CIHI’s National Ambulatory Care Reporting System (NACRS)

This section describes the dataset linkage process within the OCR and outlines several of CCO’s other data holdings that may be of interest to health services researchers.

Other Linkage Processes

Pending approval by CCO’s data disclosure team, CCO may process cohort files submitted by external researchers (see section “Data Privacy and Access” for more information on data access). At minimum, these cohort files must include names and birthdates of all patients to be processed. Additional identifiable information such as health card numbers (HCNs), postal codes, and gender may be included in the cohort file to expedite the linkage procedure and any necessary manual resolutions. After a suitable cohort file has been received by CCO, a linked file may be produced. In the interest of efficiency, cohort files for less than 300 individuals are linked to the OCR manually through a name search function. Cohort files for over 300 individuals necessitate a probabilistic linkage in the same manner as described in section “Data Systems and Consolidation,” but with the use of Automatch software. The software compares records from client files to the OCR and assigns a total score corresponding to how closely the records match. Matches on uncommon names will receive a higher score than matches on common names, indicating greater confidence in the link.

These linkages are to a subset of OCR data. Subsets are pared down to comply with research parameters. For example, if the cohort represents females enrolled in a research survey which commenced in 2002, the subset will not contain female patients who died prior to 2002 or any male patients. Typical information released from the OCR includes person key (a unique identifier for an OCR person), date of diagnosis, topography, morphology, vital status, and date of last contact or death.

The probabilistic linkage will match the cohort file to the OCR person records and assign a total match score (weight). Matches with the highest weights will be automatically accepted, matches with low weights will be rejected and links falling
between the high/low thresholds are manually reviewed. The high/low thresholds will be determined by an OCR data analyst through analysis of the data. The unique identifier for an OCR person will then be used to select case level data from the OCR for the cohort members that were identified as matches to the OCR. The final product of the linkage is a file of matched records which typically includes information related to the cancer diagnosis and vital status information.

CCO’s Other Data Holdings

CCO data holdings store information collected from healthcare service providers across the province. This information enables the planning and funding of cancer and other healthcare services, development of guidelines, and management of the cancer and renal care systems in Ontario. The major data holdings are shown in Table 7. Details about the data held within each of these repositories can be found on CCO’s website, www.cancercare.on.ca.

Table 7 CCO’s major data holdings as of September 2015 (Source: CCO, 2015)

Data holding: Activity Level Reporting (ALR)/Cancer Activity Datamart
Description: Provides an integrated set of data elements from Regional Cancer Centers (RCC) related to systemic treatment and radiotherapy that cannot be obtained from other providers. This information is used to support management decision-making, planning, accountability, and performance management at the RCC, regional, and corporate level.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Patient Information Management System (PIMS)/Pathology Datamart
Description: Database comprised of patient and tumor information for cancer and cancer-related pathology reports (tissue, cytology), submitted from public hospital (and some commercial) laboratories. PIMS documents patient, facility, report identifiers, and tumor identifiers, such as site, histology, and behavior. This information is used to support management decision-making, planning, disease surveillance and research, as well as contributing to resolved incidence case data in the Ontario Cancer Registry.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: New Drug Funding Program (NDFP)
Description: The NDFP database stores patient and treatment information about systemic therapy drug utilization at RCCs and other Ontario hospitals, for which reimbursement is being sought through the NDFP according to strict eligibility criteria.
Type of data: This dataset contains: administrative data, clinical data (eligibility criteria) and demographic data.

Data holding: Ontario Breast Screening Program (OBSP)
Description: The associated Integrated Client Management System database provides an integrated set of data for each client screened in the OBSP for the purposes of program administration, management, and evaluation.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Colorectal Screening Data – Colonoscopy Interim Reporting Tool (CIRT)
Description: The data collected through CIRT will be used to understand current colonoscopy activities conducted within participating hospitals from both volume and quality perspectives. It will also be used to validate incremental volume allocations across the province.
Type of data: This dataset contains: administrative care and clinical data.

Data holding: Laboratory Reporting Tool (LRT)
Description: LRT contains CCC program FOBT (fecal occult blood test) kit distribution and results data from the CCC partner labs.

Data holding: Ontario Cervical Screening Program
Description: Cytobase is comprised of cervical cytology data (“Pap Test” results) collected from participating community laboratories. This cervical cancer screening database contains patient, physician, and laboratory information. This information is used to administer and evaluate the performance of CCO’s Cervical Screening Program, for cancer planning and management and for cancer surveillance research.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Brachytherapy Funding Program
Description: Stores patient and treatment information about prostate cancer patients at RCC hospitals, for which reimbursement is being sought.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Symptom Management Reporting Database
Description: The Symptom Management Reporting Database data is comprised of three components: patient registration, symptom screening using the Edmonton Symptom Assessment System (ESAS) and functional assessment using the Palliative Performance Scale and/or Eastern Cooperative Oncology Group Performance Status. This information is captured by participating sites using the Interactive Symptom Assessment and Collection system and then submitted on a monthly basis to the Symptom Management Reporting Database.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Interim Annotated Tumor Project (ATP) Database
Description: The Interim ATP provides an integrated set of data, combining tumor information from the Ontario Institute for Cancer Research’s Tumor Bank with CCO’s Cancer Registry, for the purpose of increasing the accuracy and utility of the information for both researchers and CCO planners. For example, researchers may use this information to study the association between genetics and response to cancer drugs; in turn, CCO may use this information to create clinical guidelines for the care and treatment of cancer patients in Ontario.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Data holding: Wait Times Information System (WTIS)
Description: The Wait Time Information System is the first-ever information system for Ontario to collect accurate and timely wait time data. This system has been implemented in 82 Ontario hospitals. Work is underway to enhance this system to track wait times for all surgical procedures in Ontario. This web-based system performs several functions, which include:
• Enabling the collection of data related to wait times
• Providing clinicians and other health professionals with the tools required to effectively assess patient urgency according to a defined wait times standard
• Measuring and reporting wait times and data regarding utilization of procedures
• Supplying clinicians, administrators, and managers with near real-time information for use in monitoring and managing wait lists
• Reporting wait time information to the public on a website enabling patients to manage their own care and the public to assess progress on reducing wait times.
Type of data: This dataset contains administrative data, clinical data, and demographic data.

Other provincial organizations with which CCO maintains a data sharing agreement (DSA) will sometimes create linkages with OCR data. One such example is the Institute for Clinical Evaluative Sciences, which uses their version of the OCR data received from CCO to perform in-house linkages for research purposes. The differences in dataset versions between CCO and organizations that receive CCO data through data sharing agreements can be identified through their respective data dictionaries, which are often available online.

Health Services Research Using the OCR

The OCR has been a source of data for projects across the cancer continuum. A review of the peer-reviewed literature suggests that use of Ontario cancer data in health research dates back to the 1970s. A series of papers in the 1970s by MacKay and Sellers reported on the burden of cancer by using the OCR. Such cancer surveillance reports eventually evolved to describe province-wide patterns and trends in healthcare delivery aimed at managing and planning for the cancer system, allocating resources, as well as evaluating and monitoring the cancer system. Between 1973 and 2014, more than 460 peer-reviewed articles were published using data from the OCR. The frequency of OCR data use grew substantially following the 1990s (Fig. 7). In the last 2 years of available data (2013–2014), 120 peer-reviewed research articles were published citing use of the OCR. This growth may be attributed to improvements in information capture in the healthcare sector and the growing availability of healthcare data. For instance, within CCO, ALR has evolved in its ability to measure and monitor activity related to systemic treatment, including chemotherapy and radiation therapy. Similarly, CCO’s recent implementation of the Wait Time Information System increases the scope of data on the patient experience in the healthcare system. Moreover, as healthcare-related information has become more readily available in electronic format, the potential for data linkage and exploration of research topics has continued to grow.

This section presents specific examples of how the OCR has been used for health services research. The works cited in this section provide some recent examples of data linkage between the OCR and other administrative data sources or linkage with primary data collected by the research study.

Examples of Health Services Research Using the OCR

Using date of diagnosis, geography, and demographic information, researchers frequently extract data from the OCR for descriptive purposes, to explore trends over time, patterns of care, and potential gaps in access and equity. Using this approach, researchers have described the postoperative mortality risk among the elderly (Nanji et al. 2015), wait times from abnormal mammography to surgery among breast cancer patients
(Cordeiro et al. 2015), and rates of thyroid cancer among children, adolescents, and young adults (Zhukova et al. 2015). Through linkage with a dataset identifying 140,000 registered or “Status Indians” in Ontario, researchers have been able to describe the cancer experience among the First Nations population in Ontario and study their survival rates over a 30-year time frame (Nishri et al. 2015). The ability of the OCR to identify patients in specific clinical subgroups has also enabled research studies to test concordance with clinical practice guidelines, such as the treatment of patients with stage II colon cancer (Biagi et al. 2009), follow-up surveillance of patients treated for Hodgkin’s lymphoma (Hodgson et al. 2010), and the use of single fraction radiotherapy for uncomplicated bone metastases (Ashworth et al. 2014).

[Figure 7: chart. Y-axis: Publication frequency (0 to 100). X-axis: Year of publication (1973 to 2013).]

Fig. 7 Distribution of peer-reviewed publications using the OCR as a data source 1973–2014 (Source: CCO Surveillance and OCR, July 2015)

Population-based retrospective cohort studies have used the OCR to identify cohorts of patients who were diagnosed during a given period of time, underwent particular therapeutic courses, or experienced a particular model of care. This approach has been used to carry out research to look at healthcare costs among colorectal cancer patients (Mittmann et al. 2013), the impact of active surveillance in prostate cancer (Richard et al. 2015) and the use of adjuvant chemotherapy among patients with early breast cancer (Enright et al. 2012). These studies make use of noncancer comparison groups or population-based comparisons through strategies such as random digit dialing. They may also use comparison groups consisting of individuals with cancer who have experienced standard care, in which comparisons may also be derived from the OCR using treatment-based criteria. The OCR is able to provide covariates necessary for the statistical control of potential confounders in these comparative analyses (e.g., stage at diagnosis or date of diagnosis).

Studies incorporating survival analysis and modeling have been able to uncover factors associated with survival on a population level. Such studies have uncovered clinicopathological factors linked to survival among patients diagnosed with pancreatic adenocarcinoma (Kagedan et al. 2015), survival among bladder cancer patients receiving various treatment modalities (Leveridge et al. 2015; MacKillop et al. 2014a), survival among Ontario men who underwent radical prostatectomy, and general survival trends among individuals with laryngeal cancer (Macneil et al. 2015). By coupling OCR-defined cohorts with clinical data from sources such as surgical pathology reports, researchers have been able to associate the prognostic importance of specific clinical factors and provide direction for best practice in clinical
reporting (e.g., Berman et al. 2015). Other investigators have looked at variability in survival among patients visiting different RCCs in Ontario by linking the OCR with stage and treatment data (e.g., head and neck cancer – MacKillop et al. 2014b).

The OCR is also useful to health services researchers who are interested in the effectiveness of preventive strategies to control cancer, such as population-based screening programs. In this type of research design, the OCR data provides the clinical endpoint that will determine the effectiveness of the screening intervention. The OCR has been used to capture rates of colorectal cancer among those individuals who had a positive guaiac fecal occult blood screening test as part of Ontario’s Colon Cancer Check program and assess their risk of colorectal cancer over a 30-month time frame (Tinmouth et al. 2015). The OCR has also been used to ascertain the rates of cervical cancer before and after the introduction of a human papillomavirus immunization program for girls in grade 8 (Smith et al. 2015). The OCR has been used widely to look at the effects of breast cancer screening and its various aspects through linkage with the data from the Ontario Breast Screening Program. This research has shed light on the role of mammographic density in screening outcomes (Boyd et al. 2007), the contribution of clinical breast examination to breast screening (Chiarelli et al. 2009), and the performance of digital compared with screen-film mammography (Chiarelli et al. 2013).

Patient Contact

Research teams will occasionally approach CCO to gain access to the OCR for the purpose of identifying individuals eligible to participate in cancer-related research studies. In these instances, analysts at CCO will work with research investigators to refine a set of criteria for participation in the study and extract a cohort from the OCR.

Until 2014, CCO had been in the practice of providing cohorts to the research team, who would then make contact with patients to request their participation, often via their physician. The current process for patient contact is initiated with a letter from CCO. These letters are used to confirm that the individual has been diagnosed with cancer, inform said individual about the research being performed, and obtain consent for the release of their contact information to the researcher. Individuals are also provided the option to opt out of any such studies in the future.

The new and more standardized approach to patient contact minimizes the risk associated with erroneous identification of cancer patients and contacting patients who do not or do not yet know they have cancer, as well as patients who are deceased. The approach also ensures a more consistent and effective process for obtaining informed consent from study participants.

Examples of patient-contact studies using OCR-identified patients as a sampling frame have included a case–control study to identify risk factors associated with ovarian tumors (McGee and Narod 2010), a study of quality of life and health utilities among prostate cancer patients (Krahn et al. 2013), a dietary study among breast cancer patients (Boucher et al. 2012), and a survey of men with prostate cancer about decision-making around the use of complementary and alternative medicine (Boon et al. 2005).

Data Privacy and Access

Privacy

As a prescribed entity under the Ontario Personal Health Information Protection Act, CCO is permitted to collect, use, and disclose personal health information. By way of comparison, other prescribed entities within Ontario include the Pediatric Oncology Group of Ontario, Canadian Institute for Health Information, and the Institute for Clinical Evaluative Sciences.

CCO has robust information management practices, outlined within the Privacy Program, in place to ensure the protection of personal health information within the OCR and its other data holdings. These information management practices are audited on a triennial basis by the Office of the Information and Privacy Commissioner of Ontario.
CCO’s Privacy Program includes privacy policies, standards, procedures, and guidelines. Its privacy assurance and risk management activities involve:

• Privacy impact assessments and risk mitigation plans
• Data sharing agreements
• Standard operating procedures

Staff privacy training and awareness activities are in place to maintain a culture of privacy across the organization. A data access program, described below, is implemented to review and approve external and internal requests for access to OCR data.

Data Request Process

CCO understands the value of health services research and has therefore implemented the data disclosure team to assist researchers and other data requestors in accessing its data holdings. Outlined in Table 8 are the four types of data requests typically received by CCO.

Figure 8 outlines CCO’s data disclosure process and the various internal groups involved. The Data Disclosure Subcommittee oversees all research data requests and occasionally some general data requests, in adherence with the Personal Health Information Protection Act and CCO’s Data Use and Disclosure Standard. This group also reviews CCO’s data disclosure policies and procedures. Before obtaining final approval by the Data Disclosure Subcommittee, research data requests must undergo an extensive review by subject matter experts in the data disclosure working group.

Technical Appendix

Synoptic pathology reports are an integral component of the EDW and feed the Pathology Data Mart, which is needed for CS integration (Fig. 9). Synoptic pathology reports from the Pathology Data Mart, OCR case files and CS abstracts are utilized by the Registry Plus service to drive CS integration and populate the CS data mart (see section “Cancer Stage at Diagnosis” for more information on CS and its processes). Registry Plus is a suite of publicly available free software programs for collecting and processing cancer registry data (Centers for Disease Control and Prevention 2015).

ePath, eMaRC, and ASTAIRE

All pathology reports are handled through CCO’s ePath electronic pathology reporting system. The system receives, processes and stores pathology reports, connecting the diagnostic laboratories to the OCR. ePath is comprised of several major subsystems, including the Electronic Mapping, Reporting, and Coding (eMaRC) and the Automated Synoptic Template Analysis Interface and Rule Engine (ASTAIRE).
Table 8 The four types of data requests received by CCO

Request type | Description
Research data requests | Requests from external researchers for record-level data (personal health information or de-identified data) for the purposes of conducting scientific studies. This type of request also includes patient-contact studies, where CCO contacts prospective participants to obtain consent for permission to be contacted for a research study
General data requests | Nonresearch requests for record-level or aggregate data for a variety of health system planning purposes, including regional performance management, quality assurance, and information dissemination. Currently this type of request also includes private company requests for record-level or aggregate data, for purposes such as marketing and economic analyses
Genetic requests | Requests from genetic counselors for pathology reports, with consent from the individual in question or substitute decision-maker, to facilitate the genetic counseling process
SEER*Stat requests | Requests from external partners for the latest SEER*Stat de-identified data software package to facilitate the production of aggregate cancer incidence and mortality statistics
Fig. 8 Data disclosure process at CCO. DD WG data disclosure working group, DDSC data disclosure subcommittee

Fig. 9 Diagram of pathology-driven processes at CCO. LIS laboratory information system, EPR electronic patient record, CS Collaborative Stage
[Diagram labels: Hospital (pathologist, hospital LIS, hospital EPR and health records); Cancer Care Ontario (ePath, CCO Enterprise Data Warehouse, Pathology Data Mart, registry coder, Ontario Cancer Registry, CS abstractors, Registry Plus, CS integration with semi-automated population of CS data, CS data mart). Report types shown: synoptic and non-synoptic pathology reports.]
CCO eMaRC is a subcomponent of the ePath system that processes and stores pathology reports received in HL7 messaging format. CCO eMaRC automatically filters cancer vs non-cancer reports and non-reportable reports, provides partial automation for numerous ICD-O-3 diagnosis codes and collaborative staging elements, and creates NAACCR-compatible abstract records. The system also merges multiple reports for a single patient so as to prevent the creation of extra cases in the OCR.
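The idea of merging multiple reports for one patient can be illustrated with a minimal Python sketch. It is not CCO's eMaRC logic, which the chapter does not describe; it only assumes that each report arrives as a pipe-delimited HL7 v2 message and that reports are grouped on the first component of the PID-3 field (patient identifier). The function names and the sample messages are hypothetical, and a production registry would rely on far more careful record linkage than an exact identifier match.

```python
from collections import defaultdict


def patient_id(message: str) -> str:
    """Return the first component of PID-3 from a pipe-delimited HL7 v2 message."""
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    for segment in message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        if fields[0] == "PID" and len(fields) > 3 and fields[3]:
            # PID-3 may hold several components separated by '^'; keep the first.
            return fields[3].split("^")[0]
    raise ValueError("message has no PID segment with an identifier")


def group_reports_by_patient(messages):
    """Group raw messages by patient identifier, preserving arrival order."""
    grouped = defaultdict(list)
    for message in messages:
        grouped[patient_id(message)].append(message)
    return dict(grouped)


# Fabricated example messages for illustration only (not real patient data).
reports = [
    "MSH|^~\\&|LAB|HOSP\rPID|1||A123^^^HOSP||DOE^JANE\rOBR|1\rOBX|1|TX|||biopsy",
    "MSH|^~\\&|LAB|HOSP\rPID|1||B456^^^HOSP||ROE^RICHARD\rOBR|1\rOBX|1|TX|||resection",
    "MSH|^~\\&|LAB|HOSP\rPID|1||A123^^^HOSP||DOE^JANE\rOBR|1\rOBX|1|TX|||resection",
]
merged = group_reports_by_patient(reports)
```

In this sketch the two reports carrying identifier A123 end up under one key, so they would contribute to a single case rather than two.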
A data quality assessment tool, ASTAIRE ensures that synoptic data is compliant with the College of American Pathologists standards. ASTAIRE is made up of three components: GINGER, FRED, and ADELE. Combined, GINGER and FRED ensure that synoptic data is sufficiently complete and in line with current eCC versions. ADELE then cleans data so that it may be admitted to the EDW.

In the interest of privacy and efficiency, data handled through ePath is coded in Health Level Seven V2 format, which is a secure method of data transmission designed to protect sensitive health information. This data contains three main elements: patient ID (PID), observation report ID (OBR), and observations (OBX). Patient ID contains personal and identifiable information, such as a patient’s name, sex, and address. The observation report ID pertains to the pathology report and provides information regarding the pathologist, surgeon, referrals, and specimen collection. The observations data element conveys information regarding the clinical diagnosis, clinical history, gross pathology, submitted tissues, and full diagnosis.

References

Allemani C, Weir HK, Carreira H, Harewood R, Spika D, Wang XS. Global surveillance of cancer survival 1995–2009: analysis of individual data for 25,676,887 patients from 279 population-based registries in 67 countries (CONCORD-2). Lancet. 2015;385(9972):977–1010.

Ashworth A, Kong W, Chow EL, Mackillop W. The fractionation of palliative radiation therapy for bone metastases in Ontario. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014.

Berman DM, Kawashima A, Peng Y, Mackillop WJ, Siemens DR, Booth CM. Reporting trends and prognostic significance of lymphovascular invasion in muscle-invasive urothelial carcinoma: a population-based study. Int J Urol. 2015;22(2):163–70.

Biagi JJ, Wong R, Brierley J, Rahal R, Ross J. Assessing compliance with practice treatment guidelines by treatment centers and the reasons for noncompliance. Paper presented at: The 2009 Annual Meeting of the American Society of Clinical Oncology; Orlando; May 2009.

Boon H, Westlake K, Deber R, Moineddin R. Problem-solving and decision-making preferences: no difference between complementary and alternative medicine users and non-users. Complement Ther Med. 2005;13(3):213–6.

Boucher BA, Cotterchio M, Curca IA, Kreiger N, Harris SA, Kirsh VA, et al. Intake of phytoestrogen foods and supplements among women recently diagnosed with breast cancer in Ontario, Canada. Nutr Cancer. 2012;64(5):695–703.

Boyd NF, Guo H, Martin LJ, Sun L, Stone J, Fishell E, et al. Mammographic density and the risk and detection of breast cancer. N Engl J Med. 2007;356(3):227–36.

Cancer Act, R.S.O 1990, c. C.1 [Internet]. 22 June 2006 [cited 28 Oct 2015]. Available from: http://www.ontario.ca/laws/statute/90c01

Candido E, Young S, Nishri D. One cancer or two? The impact of changes to the rules for counting multiple primary cancers on estimates of cancer burden in Ontario [Internet]. In: Proceedings of the 2015 Canadian Society for Epidemiology and Biostatistics Conference; 1–4 June 2015; Mississauga/Toronto: Cancer Care Ontario; 2015. Available at: http://csebca.ipage.com/wordpress/wp-content/uploads/2014/06/June-2_1430_SouthStudio_C1.2-Candido.pdf

Centers for Disease Control and Prevention. Registry Plus, a suite of publicly available software programs for collecting and processing cancer registry data [Internet]. Atlanta: National Center for Chronic Disease Prevention and Health Promotion; Jan 2015 [cited 28 Oct 2015]. Available at: http://www.cdc.gov/cancer/npcr/

Chiarelli AM, Majpruz V, Brown P, Theriault M, Shumak R, Mai V. The contribution of clinical breast examination to the accuracy of breast screening. J Natl Cancer Inst. 2009;101(18):1236–43.

Chiarelli AM, Edwards SA, Prummel MV, Muradali D, Majpruz V, Done SJ, et al. Digital compared with screen-film mammography: performance measures in concurrent cohorts within an organized breast screening program. Radiology. 2013;268(3):684–93.

Clarke EA, Marrett LD, Kreiger N. Cancer registration in Ontario: a computer approach. IARC Sci Publ. 1991;95:246–57.

Coleman MP, Quaresma M, Berrino F, Lutz JM, De Angelis R, Capocaccia R, et al. Cancer survival on five continents: a worldwide population-based study (CONCORD). Lancet Oncol. 2008;9(8):730–56.

Cordeiro ED, Dixon M, Coburn N, Holloway C. A patient-centered approach toward wait times in the surgical management of breast cancer in the province of Ontario. Ann Surg Oncol. 2015;22(8):2509–16.

Enright K, Grunfeld E, Yun L, Moineddin R, Dent SF, Eisen A, et al. Acute care utilization (ACU) among women receiving adjuvant chemotherapy for early breast cancer (EBC). Paper presented at: The 2012 Breast Cancer Symposium; San Francisco; Sept 2012.

Hodgson DC, Grunfeld E, Gunraj N, Del Giudice L. A population-based study of follow-up care for Hodgkin lymphoma survivors: opportunities to improve surveillance for relapse and late effects. Cancer. 2010;116(14):3417–25.

International Cancer Benchmarking. Showcasing our findings and impacts. London: Cancer Research; Dec 2014 [cited 28 Oct 2015]. Available from: http://www.cancerresearchuk.org/sites/default/files/icbp_pb_1012214_booklet_final.pdf

Johnson CH, Peace S, Adamo P, Fritz A, Percy-Laurry A, Edwards BK. The 2007 Multiple Primary and Histology Coding Rules [Internet]. Bethesda: National Cancer Institute’s Surveillance Epidemiology and End Results Program; Aug 2012. Available at: http://seer.cancer.gov/tools/mphrules/

Kagedan DJ, Raju R, Dixon M, Shin E, Li Q, Liu N, et al. Predictors of actual survival in resected pancreatic adenocarcinoma: A population-level analysis. Paper presented at: The 15th Annual Americas Hepato-Pancreato-Biliary Congress; Miami Beach; Sept 2015.

Krahn MD, Bremner KE, Alibhai SM, Ni A, Tomlinson G, Laporte A, et al. A reference set of health utilities for long-term survivors of prostate cancer: population-based data from Ontario, Canada. Qual Life Res. 2013;22(10):2951–62.

Leveridge MJ, Siemens DR, Mackillop WJ, Peng Y, Tannock IF, Berman DM, et al. Radical cystectomy and adjuvant chemotherapy for bladder cancer in the elderly: a population-based study. Urology. 2015;85(4):791–8.

MacKillop W, Siemens R, Zaza K, Kong W, Peng P, Berman D, et al. The outcomes of radiation therapy and surgery for bladder cancer: a population-based study. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014a.

MacKillop W, Kong W, Zaza K, Owen T, Booth C. Volume of practice and the outcomes of radiation therapy for head and neck cancer. Paper presented at: The 56th Annual Meeting of the American Society for Radiation Oncology; San Francisco; Sept 2014b.

Macneil SD, Liu K, Shariff SZ, Thind A, Winquist E, Yoo J, et al. Secular trends in the survival of patients with laryngeal carcinoma, 1995–2007. Curr Oncol. 2015;22(2):85–99.

McGee J, Narod S. Low-malignant-potential tumor risk factor analysis: a matched case–control analysis. Paper presented at: The 41st Annual Meeting of the Society of Gynecologic Oncologists; San Francisco; Mar 2010.

Mittmann N, Isogai PK, Saskin R, Liu N, Porter J, Cheung MC, et al. Homecare utilization and costs in colorectal cancer. Paper presented at: Healthcare Cost, Quality, and Policy: Driving Stakeholder Innovation in Process and Practice Conference; Toronto; Nov 2013.

Nanji S, Mackillop WJ, Wei X, Booth CM. Management and outcome of colorectal cancer (CRC) liver metastases in the elderly: A population-based study. Paper presented at: The 15th Annual Americas Hepato-Pancreato-Biliary Congress; Miami Beach; Sept 2015.

Nishri ED, Sheppard AJ, Withrow DR, Marrett LD. Cancer survival among First Nations people of Ontario, Canada (1968–2007). Int J Cancer. 2015;136(3):639–45.

Parkin DM, Bray F. Evaluation of data quality in the cancer registry: principles and methods part II: Completeness. Eur J Cancer. 2009;45:756–64.

Personal Health Information Protection Act; June 2016 [cited July 2016]. Available from: https://www.ontario.ca/laws/statute/04p03

Richard PO, Alibhai S, Urbach D, Fleshner NE, Timilshina N, Klotz L, et al. The uptake of active surveillance in prostate cancer: Results of a population based-study. Paper presented at: The 2015 Annual Meeting of the American Urological Association; New Orleans; Apr 2015.

Smith LM, Strumpf EC, Kaufman JS, Lofters A, Schwandt M, Levesque LE. The early benefits of human papillomavirus vaccination on cervical dysplasia and anogenital warts. Pediatrics. 2015;135(5):1131–40.

Tinmouth JM, Lim T, Kone A, Mccurdy B, Dube C, Rabeneck L. Risk of colorectal cancer among those who are gFOBt positive but have had a recent prior colonoscopy: experience from an organized screening program. Paper presented at: Digestive Disease Week 2015; Washington, DC; May 2015.

Zhukova N, Pole J, Mistry M, Fried I, Bartels U, Huang A, et al. Clinical and molecular determinants of long-term survival in children with low grade glioma; a population based study. Paper presented at: The 16th International Symposium on Pediatric Neuro-Oncology in Conjunction with the 8th St. Jude-VIVA Forum; Singapore; 28 June 2015–2 July 2015.
17 Challenges of Measuring the Performance of Health Systems

Adrian R. Levy and Boris G. Sobolev

Contents
Introduction
Background
Performance Measurement in the Canadian Health-Care System
Data Requirements
A Case Study on Performance Measurement: Health Technology Assessment
Existing Research on Performance Measurement in Health Technology Assessment
Data Sources for Performance Measurement in Health Technology Assessment
Discussion
Recommendations
References
Abstract

Improving the measurement of the performance of health systems is a wise policy option for federal, provincial, and territorial governments because it provides essential information for understanding the inevitable trade-offs involved in trying to reduce costs while striving to improve quality of care, access, and the health of the population. Performance measurement – monitoring, evaluating, and communicating the degree to which health-care systems address priorities and meet specific objectives – is also garnering increased attention from many stakeholders at other levels of the system.

A. R. Levy (*)
Community Health and Epidemiology, Dalhousie University, Halifax, NS, Canada
e-mail: adrian.levy@dal.ca

B. G. Sobolev (*)
School of Population and Public Health, University of
e-mail: bgsobolev@gmail.com

https://doi.org/10.1007/978-1-4939-8715-3_19

Introduction

In 2010, the 11th in a series of annual reports was published and presented the most recent health indicator data from the Canadian Institute for
Health Information and Statistics Canada on a broad range of performance measures. Each indicator falls into one of the four dimensions of the health indicator framework: (1) health status provides insight on the health of Canadians, including well-being, human function, and selected health conditions; (2) nonmedical determinants of health reflect factors outside of the health system that affect health; (3) health system performance provides insight on the quality of health services, including accessibility, appropriateness, effectiveness, and patient safety; and (4) community and health system characteristics provide useful contextual information, rather than direct measures of health status or quality of care. The aims of this chapter are to characterize the goals of a high-functioning health-care system and provide a typology for performance measures in health care. Both of these will be done within the context of the renewal of the First Ministers’ Accord.

The intent of the 1984 Canada Health Act was to ensure that all residents of Canada have access to medically necessary health care on a prepaid basis. However, the act has not been uniformly applied across provinces and territories, leading to variability in available services and treatment in different jurisdictions. The federal government’s determination to adhere to national standards while reducing funding to the provinces has recently produced additional challenges. (For more details about federalism and health care in Canada, see Wilson (2000)).

The system currently used to measure the performance of the health system in Canada lags behind that of other countries such as the United States and the United Kingdom, both in terms of standardized indicators and research in the area. As a result, there is evidence indicating that the values of Canadians are misaligned with the funding and performance of the health-care system (Snowdon et al. 2012).

In this chapter, the authors review the current state of knowledge about performance measurement in health care and examine current efforts in Canada. We describe the structural, political, conceptual, and methodological challenges of performance measurement in the field of health technology assessment. We argue that without more clarity around ethics and perspectives and a more systematic approach to performance measurement, it will not be possible to develop a coherent strategy for informing policy-making and decision-making throughout the entire health-care system.

Background

At a fundamental level, “the primary aim of evaluation is to aid stakeholders in their decision making on policies and programs” (Alkin 2004). It is intended to provide evidence on the degree to which government policies and spending are effectively addressing specific issues identified by bureaucrats and elected officials.

Performance management in the public sector became a focus of interest in the late 1980s, starting with the reinventing government movement (Osbourne and Gaebler 1992). In the United States, the 1993 Government Performance and Results Act obligated all federal departments and agencies to present 5-year strategic plans linked to performance measures; annual performance plans were required after 1998. In the United Kingdom, the financial management initiative was introduced in the early 1980s.

In Canada, the federal government introduced a centralized evaluation policy in 1977. Using evidence from peer-reviewed sources and from reports of the auditor general, Shepherd argued that, between 1977 and 2009, Canada’s evaluation policy was focused on operational issues directed primarily toward program managers (Shepherd 2012). In 2009, federal evaluation policy was refocused on fiscal prudence and accountability.

Performance Measurement in the Canadian Health-Care System

Over the past 25 years, there has been an increase in measuring and reporting on the performance of the Canadian health-care system at the federal, provincial, and territorial levels. On the demand side, provincial and territorial governments and health authorities have been subjected to intense pressure to contain costs; patients have greater expectations to be involved in decisions about
their treatment; and health-care professionals and health authorities expect more oversight and accountability to be built into the health-care system. On the supply side, the information revolution and progress in information technology have made it less expensive and more straightforward to collect, process, and disseminate data.

There have been several attempts to define the problem of how to measure health-care performance in Canada, the necessary first step toward aligning goals and objectives. In 2000, the First Ministers’ Meeting Communiqué on Health directed Canada’s health ministers to collaborate on the development of a comprehensive framework to report on health status, health outcomes, and quality of service using jointly agreed-upon comparable indicators. The intent was that such reporting would meet several objectives by providing information to Canadians on government performance, as well as assisting individuals, governments, and health-care providers to make more informed health choices. In September 2002, all fourteen federal, provincial, and territorial governments released comparable indicator reports on a set of 67 indicators. The 2003 First Ministers’ Accord on Health Care Renewal (Appendix 1) directed health ministers to develop more indicators to supplement work undertaken in response to the September 2000 communiqué and identified the following priority areas for reform: healthy Canadians, primary health care, home care, catastrophic drug coverage and pharmaceutical management, diagnostic and medical equipment, and health human resources. Federal, provincial, and territorial jurisdictions agreed on 70 indicators with 81 sub-indicators and established the Health Indicators Project to have them collated and make them publicly available.

Priorities and directions for the Health Indicators Project were broadly revisited at a second consensus conference in March 2004. The resulting consensus statement established that health indicators must be:

• Relevant to established health goals
• Based on standard (comparable) definitions and methods
• Broadly available and able to be disseminated electronically across Canada at the regional, provincial, and national level

The primary goal of the Health Indicators Project was to support health regions in monitoring progress in improving and maintaining the health of the population and the functioning of the health system for which they are responsible through the provision of good-quality comparative information on:

• The overall health of the population served, how it compares with other regions in the province and country, and how it is changing over time
• The major nonmedical determinants of health in the region
• The health services received by the region’s residents
• The characteristics of the community or the health system

No mention was made of other potential uses for performance indicators, including establishing the competence of organizations and identifying the effectiveness of programs to meet specific objectives.

The communiqué from the 2004 First Ministers’ Meeting on the Future of Health Care, called “A 10-Year Plan to Strengthen Health Care,” included an explicit commitment to “accountability and reporting to citizens” that read: “all governments agree to report to their residents on health system performance including the elements set out in this communiqué.” In so doing, the first ministers agreed that performance indicators would be required and would be used for reporting purposes. The intent of the effort was to hold health ministries accountable for stewardship of the health-care system using performance indicators. The communiqué did not specify whether such reporting would be used in a formative (to improve specific health systems) or in a summative (to implement corrective measures or impose penalties) fashion.
Consultations continued with provincial and regional health authorities to ensure that relevant data were collected and consistent methods were used for performance measurement. In 2012, the 13th in a series of annual reports presented health indicator data from the Canadian Institute for Health Information and Statistics Canada on a broad range of performance measures (CIHI 2012). The data were grouped into four dimensions of health: (1) health status, which provides insight on the health of Canadians, including well-being, human function, and selected health conditions; (2) nonmedical determinants of health, which reflect factors outside of the health system that affect health; (3) health system performance, which provides insight on the quality of health services, including accessibility, appropriateness, effectiveness, and patient safety; and (4) community and health system characteristics, which provide useful contextual information rather than direct measures of health status or quality of care. That report used the following principles to categorize disparities in the health system:

• Same access to available care for the same need
• Same utilization for the same need
• Same quality of care for all

Data Requirements

Those considering performance measurement are faced with many competing needs when designing information systems to serve a range of stakeholders (Table 1). A set of consensus performance measures needs to be developed iteratively, and those involved in the process must have a deep understanding of existing and potential data sources that can be used to create the measures.

The specific circumstances of health care in Canada – such as Canada’s single-payer financing and several provincial and federal initiatives – have led to the development of key elements needed to produce some routine performance measures, including population registries, vital statistics, administrative health databases containing records of patients’ interactions with various elements of the health-care system, and patient and treatment registries. As a result of the large amount of data collected in Canada, this country has been characterized as a data-rich environment (Roos et al. 2005). This is reflected by the activities of provincial data centers, which both serve as data custodians and collate and use administrative health and other databases for research and evaluation (Suissa et al. 2012).

Existing performance measures reported by the Canadian Institute for Health Information depend on information from provincial and territorial population registries, vital statistics, hospital discharge abstracts, and physician claims. Even though performance measures have been reported annually since 2003, there are concerns about the provinces’ ability to produce unbiased performance measures because of data quality; in Manitoba, the auditor was “unable to form an opinion on the accuracy of the data or on the adequacy of disclosure” for 21 of 56 health indicators used in the provincial report (Manitoba Minister of Health and Healthy Living 2004).

A Case Study on Performance Measurement: Health Technology Assessment

In general, three types of outcomes are studied in health-care evaluations: those related to patients, those related to treatments, and those related to the system (Levy 2005). Patient-related outcomes represent the effects of delivering care in a particular system on the patient’s ability to care for himself or herself, physical function and mobility, emotional and intellectual performance, and self-perception of health. Treatment-related outcomes represent the biological and physiological changes in the patient’s condition that occur as a result of administering therapy within the health-care system. System-related outcomes represent the effect on the health-care system produced by the provision of medical services to a patient population.

Examples of the outcomes include performance benchmarks, requirements for pain medication, length of hospital stay, waiting times, frequency of readmission, and frequency and
Table 1 Examples of health-care performance indicators and information needs according to the type of stakeholder

Stakeholder: Citizens
Goals:
• To see evidence that resources on health are being spent efficiently and align with stated priorities
• To have the information they need to hold policy and decision-makers accountable for health policies and health-care delivery that align with societal values
• To be reassured that necessary care will be forthcoming in time of need
Types of needed information:
• Transparent descriptions of stated priorities
• Comparative information on the health of the population versus that in other countries
• Comparative information on the performance of the health-care system versus that in other countries
• Transparent access to indicators of access, quality of care, and resource use in the health-care system

Stakeholder: Patients
Goals:
• To be reassured that they will have access to specific health care when they need it, within a safe timeframe and at adequate proximity
• To obtain information on the intended and unintended consequences of alternative health-care options and on the out-of-pocket expenses associated with these options
Types of needed information:
• Information on available health-care services and modalities
• Information on trade-offs between services in terms of potential intended and unintended health outcomes and out-of-pocket costs

Stakeholder: Health-care professionals
Goals:
• To provide high-quality and appropriate health care to patients
• To maintain and improve their knowledge and skills in health-care delivery
Types of needed information:
• Data on individual performance against benchmarks
• Up-to-date information on best practices, guidelines

Stakeholder: Hospitals
Goals:
• To monitor and improve the use of health-care resources
• To manage local budgets
• To identify and prioritize health technology acquisition and disinvestment
• To ensure patient safety
• To conduct continuous quality improvement
Types of needed information:
• Collective data on health-care quality, including patient safety indicators measured against benchmarks
• Information on distributions of access (utilization, waiting lists, and waiting times) measured against benchmarks
• A transparent health technology assessment process
• Information on patient experience and satisfaction
• Hospital-level costing information

Stakeholder: Health authorities
Goals:
• To ensure that hospitals and health-care professionals provide appropriate and cost-effective health care
• To ensure that patients have access to the specific health care they need, within a safe timeframe and at adequate proximity
• To manage regional budgets
• To assess the impact of health care on the regional health needs of the population
• To ensure equitable distribution of resources
Types of needed information:
• Information on the comparative health of their population versus that of populations served by other health authorities
• Information on the health needs of their region
• Information on the equity of health-care resource distribution
• Information on distributions of access (utilization, waiting lists, and waiting times) across health authority
• Health authority-level costing information

Stakeholder: Governments
Goals:
• To assess the impact of health care on patients and on population health
• To establish current and future health policy goals and priorities
• To set and manage governmental budgets
• To plan for the viability and sustainability of the health-care system
• To demonstrate the adequacy and proper functioning of regulatory procedures for health care
• To provide appropriate assessment and research infrastructure
• To promote investment and innovation in health care
Types of needed information:
• Comparative data on the health of their population versus that of populations in other provinces and territories and in other countries
• Information on the societal value of health care, elicited using transparent citizen engagement processes
• Information on the health needs of the region
• Information on the equity of health-care resource distribution
• Information on distributions of access (utilization, waiting lists, and waiting times) across the jurisdiction
• Aggregate and decomposed expenditure data at the provincial, territorial, and national level
• Information on societal productivity attributable to health and health care

Stakeholder: Regulators
Goals:
• To protect patient safety
• To ensure protection of health-care professionals and other consumers beyond patients
• To uphold their fiduciary responsibility
• To promote efficiency in health-care markets
Types of needed information:
• Safety signals from health care
• Integrity in reporting financial performance
• Information on innovation in health care

severity of secondary health complications. In health-care evaluation, a performance measure summarizes the distribution of a health-care outcome in the patient population. In most studies, the performance measure combines the observed responses for all patients or hospitals into a single number. For example, a performance study might record the timing and occurrence of a clinic appointment for each patient, with the distribution of time to clinic appointment (the health-care outcome) being summarized by the weekly rate of appointments (the performance measure).

There have been large investments in health technology assessment over the past decades, and the use of new health-care technology is an important driver of ongoing increases in health-care expenditures. Before an expensive new technology is implemented and covered in a jurisdiction, the expected impacts are assessed at the provincial level, and the technology’s incremental cost-effectiveness is often assessed by the Canadian Agency for Drugs and Technologies in Health; by several provinces, such as Ontario and Quebec; and by some Canadian hospitals (Levin et al. 2007; McGregor and Brophy 2005).

At the time a new health technology comes to market, there is typically little information on its benefits, safety, and cost implications for the population among whom the technology will be used. As such, health technology assessment provides an incomplete picture. It examines short-term safety, with a focus on the most common, serious (potentially life threatening), and severe (potentially debilitating) unintended consequences; efficacy, often using data from the restricted conditions in randomized trials; the acquisition costs; and, sometimes, estimated cost-effectiveness on the basis of long-term project models drawing on the limited information available at market launch.

Once the technology is marketed, some information becomes available on the geographic distribution of the technology and sometimes its utilization. However, this descriptive information alone is not adequate for assessing the performance of the technology. Decision-makers need to understand how new technologies affect patients once they have been adopted for use in the real world, or they need to understand how they affect the health system in terms of who is actually treated, the long-term clinical benefits, severe unintended consequences, health-related quality of life, and productivity. Even less is known about the impact of less severe unintended consequences, downstream medical and health consequences (for the population to whom the technology is actually applied), population effectiveness, or incremental cost-effectiveness in actual use.

Many innovations have led to less invasive technologies being introduced to treat conditions previously managed surgically, such as percutaneous transluminal coronary angioplasty, which is now being undertaken in patients who were previously managed with coronary artery bypass grafting (Weintraub et al. 2012), and extracorporeal shock-wave lithotripsy, which has displaced surgical removal of kidney stones. Noninvasive technologies typically reduce patient morbidity and the length of hospital stay, often resulting in lower unit costs of treatment, and should therefore result in potential cost savings to the health-care system. However, understanding the long-term consequences of such technologies requires formal assessment because those savings are often not realized. Angioplasty leads to a greater need for repeat revascularization over time, which reduces the cost differential, and, perhaps because of reduced morbidity, the number of patients and treatments may increase after a new technology becomes established (Levy and McGregor 1995).

Although measuring the performance of new health-care technologies once they have been introduced into practice is crucial, it is done only rarely. The work of the Ontario Health Technology Advisory Committee is an exception (Levin et al. 2007). One reason is that there is a lack of indicators on a new health technology and a time lag of at least several years before administrative data become available for analysis in Canada. This knowledge gap is becoming increasingly problematic as governments, health authorities, and hospitals struggle to work within fixed budgets, with the federal government planning on indexing its spending to inflation. Decision-makers in these organizations have said clearly
that they suffer from a lack of straightforward information about which technologies work, on whom, and under what circumstances (Health Technology Assessment Task Group on behalf of the Federal/Provincial/Territorial Advisory Committee on Information and Emerging Technologies 2004). There is no consensus on, or even an understanding of, what should be measured or how performance should be measured.

Existing Research on Performance Measurement in Health Technology Assessment

At least four groups of investigators have proposed methods to measure performance in health technology assessment. A group of investigators from the United Kingdom proposed a framework for describing decision-making systems that use health technology assessment to determine reimbursement of health technologies (Hutton et al. 2006). The framework groups systems under four main headings (constitution and governance, objectives, use of evidence and decision processes, and accountability) and identifies three processes (assessment, decision, and outputs and implementation). Hutton et al. assessed the feasibility of implementing the framework using published information on constitution and governance, methods and processes, the use of evidence, and transparency and accountability, at the stages of assessment, decision-making, and implementation. They found that most of the information needed for their framework was not publicly available.

A group of researchers from l’Université de Montréal proposed a framework for performance assessment in health technology assessment organizations (Lafortune et al. 2008). Their conceptual model includes four functions and organizational needs that must be balanced for a health technology agency to perform well: goal attainment, production, adaptation to the environment and culture, and value maintenance. Although this model has a strong conceptual grounding, it has yet to be applied in practice. It requires analysts to make qualitative judgments, which may make it more useful for improving performance within an organization than for comparing performance between organizations.

More recently, a group of European investigators proposed an input-throughput-outcome model of the health-care system in relation to the different types of health-care technologies (Velasco et al. 2010). The thrust of their argument is that “health technology assessment should develop to increase its focus on the ‘technologies applied to health care’ (i.e., the regulatory and policy measures for managing and organizing health-care systems).” They recommend that health technology assessment should have an increased focus on regulatory, financial, and policy measures for managing and organizing health-care systems. They recommend that “countries embarking on health technology assessment should not consider establishing completely separate agencies for health technology assessment, quality development, performance measurement, and health service development, but should rather combine these agencies into a common knowledge strategy for evidence-informed decision-making in the health services and the health system.” Although ambitious, there would be much to be gained from such a strategy.

The framework closest to assessing some of the performance measures listed in Table 1 was developed in Quebec (Jacob and McGregor 1993). These authors outlined a new methodology for evaluating the impact of health technology assessments on policy and expenditures and applied it to 21 assessments produced by the Quebec Council for Health Technology Assessment between 1990 and 1995. Using published documents, interviews, questionnaires, and administrative health data, the authors sought to evaluate the impact of health technology assessments by addressing three fundamental questions: (1) What impact was intended? (2) To whom was the message directed? (3) To what extent was the hoped-for impact achieved, first in terms of policy and second in terms of actual distribution and the use of the technology? The authors determined that 18 of the 21 assessments had an influence on policy and that there were substantial savings to the health-care system. They concluded that it will
rarely be possible to precisely estimate impact, but systematic documentation of effects can be achieved. The self-stated limitations of their methodology included the identification of what they called critical incidents, systematic categorization of policies about health technology, and the use of documentation, which led to a degree of objectivity but also led to limitations relating to the reliance on analysts’ judgment. The interpretations were improved by consulting with important stakeholders. They also acknowledged that the impact of any health technology assessment is influenced by many other factors, substantially complicating interpretations. (Assessing causality when measuring performance of health technology is among the most pernicious challenges facing the careful analyst. This is made particularly challenging because of the impossibility of randomization in most studies. The thoughtful study by Jacob and McGregor (1993) is notable for its rigor and critical thinking in this area.)

None of the existing frameworks for performance measurement of health technology assessment have gained widespread acceptance or have been used widely to help guide allocation decisions. One reason for this lack of uptake may be that these frameworks are too complicated to be easily applied or understood. Part of the reason the frameworks are complex is that the variables that comprise the frameworks are not clearly defined. Without proper definition it is difficult to access the appropriate indicators, which in turn makes it difficult to examine the outcomes.

Other than the efforts of Jacob and McGregor (1993), existing publications on performance measurement in health technology assessment have focused on processes and not on outcomes. One reason for this is that outcomes are harder to measure in an unbiased fashion. Instead, existing performance measurement systems for health technology assessment are scattered and generated in a nonsystematic fashion. Additionally, health technology assessments must presently rely on data that are made available because it is relatively convenient to do so, such as information generated using routinely collected administrative health data (Roos et al. 2005) and registries (Tu et al. 2007); only rarely is a performance assessment done using a primary data collection procedure (Goeree et al. 2009).

Data Sources for Performance Measurement in Health Technology Assessment

In terms of using existing data sources for performance measurement, investigators in the United Kingdom have proposed a typology of databases according to their potential uses in the following elements of health technology assessment (Raftery et al. 2005):

• Group I databases can be used to identify both health technologies and health states; these, in turn, can be disaggregated into clinical registries, clinical administrative databases, and population-oriented databases. These databases can be used to assess effectiveness, equity, and diffusion.
• Group II databases can be used to identify health technologies but not health states. These databases can be used to assess diffusion only.
• Group III databases can be used to identify health states but not health technologies; these, in turn, can be disaggregated into adverse event reporting, disease-only registries, and health surveys. These databases have restricted scope; they are focused mainly on unintended adverse consequences of treatment or disease.

In the environmental scan that Raftery et al. conducted in England and Wales, 270 databases were identified, of which an estimated six had some potential for health technology assessment, approximately one-half of which could be assigned to group I. These investigators made important recommendations for policy that are applicable in Canada: responsibility for the strategic development of databases should be clarified (in Canada, this might be refocused on the rationalization of data collection efforts within and across health authorities); more resources should be made available; and issues associated with
coding, confidentiality, custodianship and access, maintenance of clinical support, optimal use of information technology, filling gaps, and remedying deficiencies should be clarified.

Discussion

Efforts to measure and assess the performance of the health system in Canada are in the early stages, and the research agenda is enormous. Policy questions about what data to collect, and at what cost, now have equally important parallels in terms of how and when to most usefully summarize and report such information, how to integrate the information into governance and efforts to improve performance, and, ultimately, how to make wise decisions to optimize the health of the population.

Developing performance indicators can be seen as a four-step process consisting of policy, development, implementation, and evaluation phases (Ibrahim 2001). The process must address the conceptual, methodological, practical, and political considerations for developing performance measures for the Canadian health system. The lack of a conceptual framework for performance measurement in health means that research in the area is in its infancy. Methodological challenges are created by the nature of funding mechanisms in the Canadian health system and the potentially long time lags between cause and effect. Practical considerations include the daunting volume of work that would be required for greater performance measurement, including the cost and timing of such work. To date, many unresolved questions remain, such as the following: Who will decide the performance indicators? Who will measure them? How will the results of such measurements be presented? To whom and how often? Performance assessment should not be seen as a one-time effort: regular, ongoing follow-up is required. Political challenges include the different levels of governmental jurisdictions in Canada, with standards for care being laid out by the Canada Health Act; the federal government is responsible for protecting the health of the population by ensuring safety through the regulation of medical products by setting and enforcing maximum reimbursement amounts for medications, whereas provision of health care is mostly a provincial and territorial responsibility. This complicated legislative and regulatory environment means that political and health reform cycles must be considered at an early stage in the development of performance measures (Roberts et al. 2008). Performance indicators would be developed and implemented much more effectively if there were cooperation between the federal, provincial, and territorial governments as well as health authorities and individual hospitals.

It is not possible for any subset of performance measures to capture all of the facets of health care that are needed by different stakeholders. What is required is a process of systematically identifying and prioritizing performance measures that will meet at least some of the needs of each stakeholder. Determining what performance measures should be used is, at the most fundamental level, an ethical question because the output must represent the different values and needs of multiple stakeholders. (Depending on the perspective, performance measures could be developed to represent different perspectives, including the following ones. First, the utilitarian perspective emphasizes the importance of achieving the greatest good for the greatest number. Bureaucrats require performance indicators to provide wise stewardship of the health-care system and to balance equity of access with efficient distribution. For example, some Canadian midsized cities may seek to establish catheterization laboratories to increase the speed of access to angioplasty for treating acute myocardial infarction, and provincial bureaucrats require access to information on both distributive and allocative efficiencies to balance the merit of these claims (Levy et al. 2010). Health-care professionals and hospital administrators use performance indicators to identify the functional competence of individual practitioners and organizations and to decide which technologies to adopt. Surgeons must maintain their skills to minimize operative complications, and health authority decision-makers may seek detailed information on postoperative infection rates when considering a technology for stapling versus
sewing colorectal anastomoses (when closing the opening left after removal of a colostomy bag). This information is needed when making policy decisions about purchasing and planning skills training. Second, the libertarian perspective emphasizes the rights of individuals to access and choose between levels of health care. For example, patients choosing between different treatments may seek detailed comparative information on the intended and unintended consequences of different treatment modalities: for example, when patients are considering angioplasty and stenting or bypass surgery for coronary artery disease, their risk preferences may be elicited if information on benefits and risks is available and synthesized in an understandable fashion. Third, the communitarian perspective emphasizes the need to balance the rights of individuals against the rights of the community as a whole. Organ donation (e.g., with a presumption that all persons are organ donors unless donation is actively opposed by the family), abortion and family planning services, and issues associated with the use of tobacco and intravenous drugs are all health-care matters in which communitarian values may be invoked.) Examples from the literature include performance measurement in the delivery of health-care services (Roski and Gregory 2001), health systems (Evans et al. 2001), and the health of the community (Klazinga et al. 2001).

The inherent complexities of health care, such as the diverse expertise of health-care professionals, the variety of organizational arrangements, the array of treatment protocols, and the myriad interactions between managerial and clinical activities, may necessitate that multiple outcomes be integrated in evaluating the effects of an intervention at the level of the patient, treatment, or health-care system (Sobolev et al. 2012). Table 1 provides examples of health-care performance indicators and information needs according to the type of stakeholder. This list is not intended to be exhaustive, and the categories and information needs overlap between stakeholders.

Once a performance measure comes into practice, it permeates the thinking of decision-makers and becomes normative (Murray and Lopez 1996). In so doing, it has the possibility of influencing policy decisions, spending, and even patterns of thinking about the health system. There is a risk of overreliance on existing performance measures to the detriment of other aspects of care. For instance, in 2004, Canada’s first ministers agreed to reduce waiting times in five priority areas – radiation therapy for cancer, cardiac care, diagnostic imaging, joint (hip and knee) replacement, and cataract surgery for sight restoration – by providing hospitals with cash incentives from a $5.5-billion funding envelope. The Canadian Institute for Health Information now reports on performance measures for waiting times (CIHI 2012b). The current emphasis on these five priority areas means that other necessary procedures not considered a priority are disincentivized. In orthopedics, for example, operations such as surgery to repair feet and ankles are paid for out of a hospital’s global budget and are not eligible for the incentive payments, which creates a financial incentive for hospitals to prioritize hip and knee replacements.

Recommendations

A useful performance measure should always begin with detailed documentation of the indicators that constitute the measure, once definitions have been agreed upon. Given the seemingly widespread acceptance in Canada of the four dimensions discussed earlier, indicators should fall into one of these dimensions: health status, nonmedical determinants of health, health system performance, and community and health system characteristics. There should also be a clarification of responsibility for the strategic development of databases, a greater availability of resources, and clarification of issues associated with coding, confidentiality, custodianship and access, maintenance of clinical support, optimal use of information technology, filling gaps, and remedying deficiencies.

The focus of measurement must be on outcomes as well as processes, and health performance measurement should have an increased
focus on regulatory, financial, and policy measures for managing and organizing health-care systems. There should not be separate agencies for quality development, performance measurement, and service development, but rather these should be combined in a common strategy that will inform decision-making throughout the entire health-care system.

There has been, to date, a lack of focus on strategic evaluations of policy and program coherence, that is, whether policies and programs are addressing the issues and values that are most important to Canadians, such as understanding and improving determinants of health by reducing poverty and aligning health-care spending with the principles embodied in the Canada Health Act.

Acknowledgments This chapter is reprinted from Levy, Adrian R., and Boris G. Sobolev. “The Challenges of Measuring the Performance of Health Systems in Canada.” Health Care Federalism in Canada. Eds. Katherine Fierlbeck and William Lahey. Montreal: McGill-Queen’s University Press, 2013. Print.

References

Alkin M. Evaluation roots: tracing theorists’ views and influences. Thousand Oaks, CA: Sage; 2004.
Canadian Institute for Health Information (CIHI). Health indicators 2012. http://waittimes.cihi.ca/
Evans DB, Edejer TT, Lauer J, et al. Measuring quality: from the system to the provider. Int J Qual Health Care. 2001;13:439–46.
Goeree R, Levin L, Chandra K, et al. Health technology assessment and primary data collection for reducing uncertainty in decision making. J Am Coll Radiol. 2009;6:332–42.
Health Canada – Health Technology Assessment Task Group on behalf of the Federal/Provincial/Territorial Advisory Committee on Information and Emerging Technologies Technology Strategy 1.0. 2004. Available at http://www.hc-sc.gc.ca/hcs-sss/pubs/ehealth-esante/2004-tech-strateg/index-eng.php
Hutton J, McGrath C, Frybourg JM, et al. Framework for describing and classifying decision-making systems using technology assessment to determine the reimbursement of health technologies (fourth hurdle systems). Int J Technol Assess Health Care. 2006;22:10–8.
Ibrahim JE. Performance indicators from all perspectives. Int J Qual Health Care. 2001;13:431–2.
Jacob R, McGregor M. Assessing the impact of health technology assessment. Int J Technol Assess Health Care. 1993;13:68–80.
Klazinga N, Stronks K, Delnoij D, Verhoeff A. Indicators without a cause. Reflections on the development and use of indicators in health care from a public health perspective. Int J Qual Health Care. 2001;13:433–8.
Lafortune L, Farand L, Mondou I, et al. Assessing the performance of health technology assessment organizations: a framework. Int J Technol Assess Health Care. 2008;24:76–86.
Levin L, Goeree R, Sikich N, et al. Establishing a comprehensive continuum from an evidentiary base to policy development for health technologies: the Ontario experience. Int J Technol Assess Health Care. 2007;23:299–309.
Levy AR. Categorizing outcomes of health care delivery. Clin Invest Med. 2005;28:347–50.
Levy AR, McGregor M. How has extracorporeal shock-wave lithotripsy changed the treatment of urinary stones in Quebec? Can Med Assoc J. 1995;153:1729–36.
Levy AR, Terashima M, Travers A. Should geographic analyses guide the creation of regionalized care models for ST-segment elevation myocardial infarction? Open Med. 2010;1:e22–5.
Manitoba, Minister of Health and Healthy Living. Manitoba’s comparable health indicator report. Winnipeg: Manitoba Health; 2004.
McGregor M, Brophy JM. End-user involvement in health technology assessment (HTA) development: a way to increase impact. Int J Technol Assess Health Care. 2005;21:263–7.
Murray CJL, Lopez AD. The global burden of disease: a comprehensive assessment of mortality and disability from diseases, injuries and risk factors in 1990 and projected to 2020. Cambridge, MA: Harv Sch Public Health/WHO/World Bank; 1996; Report No. 1.
Osbourne D, Gaebler T. Reinventing government. Lexington: Addison-Wesley; 1992.
Raftery J, Roderick P, Stevens A. Potential use of routine databases in health technology assessment. Health Technol Assess. 2005;9:1–iv.
Roberts MJ, Hsiao W, Berman P, Reich M. Getting health reform right – a guide to improving performance and equity. Oxford, UK: Oxford University Press; 2008.
Roos LL, Gupta S, Soodeen RA, Jebamani L. Data quality in an information-rich environment: Canada as an example. Can J Aging. 2005;24 Suppl 1:153–70.
Roski J, Gregory R. Performance measurement for ambulatory care: moving towards a new agenda. Int J Qual Health Care. 2001;13:447–53.
Shepherd RP. In search of a balanced Canadian federal evaluation function: getting to relevance. Can J Program Eval. 2012;26:1–45.
Snowdon A, Schnarr K, Hussein A, Alessi C. Measuring what matters: the cost vs. values of health care. Ivey International Centre for Health Innovation. http://sites.ivey.ca/healthinnovation/thought-leadership/white-papers/measuring-what-matters-the-cost-vs-values-of-health-care-november-2012/
Sobolev B, Sanchez V, Kuramoto L. Health care evaluation using computer simulation: concepts, methods and applications. New York: Springer; 2012; 480 pages ISBN: 978-1-4614-2232-7.
Suissa S, Henry D, Caetano P, et al. CNODES: the Canadian network for observational drug effect studies. Open Med. 2012;6, e134.
Tu JV, Bowen J, Chiu M, et al. Effectiveness and safety of drug-eluting stents in Ontario. N Engl J Med. 2007;357:1393–402.
Velasco GM, Gerhardus A, Rottingen JA, Busse R. Developing health technology assessment to address health care system needs. Health Policy. 2010;94:196–202.
Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative effectiveness of revascularization strategies. N Engl J Med. 2012;366:1467–76.

Part II
Methods in Health Services Research
18 Analysis of Repeated Measures and Longitudinal Data in Health Services Research
Juned Siddique, Donald Hedeker, and Robert D. Gibbons
Contents
Introduction
Issues Inherent in Longitudinal Data
Historical Background
Statistical Models for the Analysis of Longitudinal and Repeated Measures Data
Mixed-Effects Regression Models
Matrix Formulation
Covariance Pattern Models
Calculating Effect Sizes
Illustrative Example: The WECare Study
Mixed-Effects Regression Models for Continuous Data Using the WECare Study
Curvilinear Growth Model
Covariance Pattern Models
Effect of Treatment Group on Change
Extensions and Alternatives
Analysis of Longitudinal Data with Missing Values
Generalized Estimating Equation Models
Models for Categorical Outcomes
J. Siddique (*)
Department of Preventive Medicine, Northwestern
University Feinberg School of Medicine, Chicago, IL, USA
e-mail: siddique@northwestern.edu
D. Hedeker
Department of Public Health Sciences, University of
Chicago, Chicago, IL, USA
e-mail: hedeker@uchicago.edu
R. D. Gibbons
Departments of Medicine and Public Health Sciences,
University of Chicago, Chicago, IL, USA
e-mail: rdg@uchicago.edu

https://doi.org/10.1007/978-1-4939-8715-3_1
Growth Mixture Models
Discussion
References
Abstract

This chapter reviews statistical methods for the analysis of longitudinal data that are commonly found in health services research. The chapter begins by discussing issues inherent in longitudinal data and provides historical background on early methods that were used to analyze data of this type. Next, mixed-effects regression models (MRMs) and covariance-pattern models (CPMs) for longitudinal data are introduced with a focus on linear models for normally distributed outcomes. As an illustration of the use of these methods in practice, MRMs and CPMs are applied to data from the Women Entering Care (WECare) study, a longitudinal depression treatment study. Finally, extensions and alternatives to these models are briefly described. Key phrases: mixed-effects models; random-effects models; covariance-pattern models; effect sizes.

Introduction

In health services research, a typical study design is the longitudinal clinical trial in which patients are randomly assigned to different treatments and repeatedly evaluated over the course of the study. Since the pioneering work of Laird and Ware (1982), statistical methods for the analysis of longitudinal data have advanced dramatically. Prior to this time, a standard approach to analysis of longitudinal data principally involved using the longitudinal data to impute end-points (e.g., last observation carried forward) and then to simply discard the valuable intermediate time-point data, favoring the simplicity of analyses of change scores from baseline to study completion (or the last available measurement treated as if it was what would have been obtained had it been the end of the study), in some cases adjusting for baseline severity as well. Laird and Ware (1982) showed that mixed-effects regression models could be used to perform a more complete analysis of all of the available longitudinal data under much more general assumptions regarding the missing data (i.e., missing at random). The net result was a more powerful set of statistical tools for analysis of longitudinal data that led to more powerful statistical hypothesis tests, more precise estimates of rates of change (and differential rates of change between experimental and control groups), and more general assumptions regarding missing data, for example, because of study drop-out. This early work has led to considerable related advances in statistical methodology for the analysis of longitudinal data (see Hedeker and Gibbons 2006; Fitzmaurice et al. 2012; Diggle et al. 2002; Goldstein 2011; Longford 1993; Raudenbush and Bryk 2002; Singer and Willett 2003; Verbeke and Molenberghs 2000 for several excellent reviews of this growing literature).

The following sections provide a general overview of recent advances in statistical methods for the analysis of longitudinal data. The primary focus is on linear models for continuous data. Their application is illustrated using data from the Women Entering Care (WECare) study, a longitudinal depression treatment study of low income minority women with depression. In order to motivate the use of these advanced methods, the first section discusses issues inherent in longitudinal data and some of the history of earlier methods for the analysis of longitudinal data. Next, linear mixed-effects regression models (MRMs) and covariance-pattern models (CPMs) are described in detail and applied to the WECare study. At the end of the chapter, alternatives to and extensions of linear MRMs are briefly discussed and concluding remarks are provided.
Issues Inherent in Longitudinal Data

While longitudinal studies provide far more information than their cross-sectional counterparts, they are not without complexities. The following sections review some of the major issues associated with longitudinal data analysis.

Heterogeneity
Particularly in health services research, individual differences are the norm rather than the exception. The overall mean response in a sample drawn from a population provides little information regarding the experience of the individual. In contrast to cross-sectional studies in which it is reasonable to assume that there are independent random fluctuations at each measurement occasion, when the same subjects are repeatedly measured over time, their responses are correlated over time, and their estimated trend line or curve can be expected to deviate systematically from the overall mean trend line. For example, behavioral and/or biological subject-level characteristics can increase the likelihood of a favorable response to a particular experimental intervention (e.g., a new pharmacologic treatment for depression), leading subjects with those characteristics to have a trend with higher slope (i.e., rate of change) than the overall average rate of change for the sample as a whole. In many cases, these personal characteristics may be unobservable, leading to unexplained heterogeneity in the population. Modeling this unobserved heterogeneity in terms of variance components that describe subject-level effects is one way to accommodate the correlation of the repeated responses over time and to better describe individual differences in the statistical characterization of the observed data. These variance components are often termed “random-effects,” leading to terms like random-effects or mixed-effects regression models.

Missing Data
Perhaps the most important issue when analyzing data from longitudinal studies is the presence of missing data. Stated quite simply, not all subjects remain in the study for the entire length of the study. Reasons for discontinuing the study may be differentially related to the treatment. For example, some subjects may develop side effects to an otherwise effective treatment and must discontinue the study. Alternatively, some subjects might achieve the full benefit of the study early on and discontinue the study because they feel that their continued participation will provide no added benefit. The treatment of missing data in longitudinal studies is itself a vast literature, with major contributions by Laird (1988), Little (1995), Rubin (1976), and Little and Rubin (2002) to name a few. The basic issue is that even in a randomized and well-controlled clinical trial, the subjects who were initially enrolled in the study and randomized to the various treatment conditions may be quite different from those subjects that are available for analysis at the end of the trial. If subjects “drop out” because they already have derived full benefit from an effective treatment, an analysis that only considers those subjects who completed the trial may fail to show that the treatment was beneficial relative to the control condition. This type of analysis is often termed a “completer” analysis. To avoid this type of obvious bias, investigators often resort to an analysis in which the last available measurement is carried forward to the end of the study as if the subject had actually completed the study. This type of analysis, often termed an “end-point” analysis, introduces its own set of problems in that (a) all subjects are treated equally regardless of the actual intensity of their treatment over the course of the study, and (b) the actual responses that would have been observed at the end of the study, if the subject had remained in the study until its conclusion, may, in fact, be quite different from the response made at the time of discontinuation. Returning to the example of the study in which subjects discontinue when they feel that they have received full treatment benefit, an end-point analysis might miss the fact that some of these subjects may have had a relapse had they remained on treatment. Many other objections have been raised about these two simple approaches of handling missing data, which have led to more statistically reasoned approaches for
the analysis of longitudinal data with missing observations.

Irregularly Spaced Measurement Occasions
It is not at all uncommon in real longitudinal studies, either in the context of designed experiments or naturalistic cohorts, for individuals to vary both in the number of repeated measurements they contribute and even in the time at which the measurements are obtained. This may be due to drop-out or simply due to different subjects having different schedules of availability. While this can be quite problematic for traditional analysis of variance based approaches (leading to highly unbalanced designs which can produce biased parameter estimates and tests of hypotheses), more modern statistical approaches to the analysis of longitudinal data are all but immune to the “unbalancedness” that is produced by having different times of measurement for different subjects. Indeed, this is one of the most useful features of the regression approach to this problem, namely the ability to use all of the available data from each subject, regardless of when the data were specifically obtained.

Historical Background

Existing methods for the analysis of longitudinal data are an outgrowth of two earlier approaches for repeated measures data. The first approach, the so-called repeated measures ANOVA, was essentially a random intercept model that assumed that subjects could only deviate from the overall mean response pattern by a constant that was equivalent over time. A more reasonable view is that the subject-specific deviation is both in terms of the baseline response (i.e., intercept) and in terms of the rate of change over time (i.e., slope or set of trend parameters). This more general structure could not be accommodated by the repeated measures ANOVA. The random intercept model assumption leads to a compound-symmetric variance-covariance matrix for the repeated measurements in which the variances and covariances of the repeated measurements are constant over time. In general, it is common to find that variances increase over time and covariances decrease as time-points become more separated in time. Finally, based on the use of least-squares estimation, the repeated measures ANOVA breaks down for unbalanced designs, such as those in which the sample size decreases over time due to subject discontinuation. Based on these limitations, the repeated measures ANOVA and related approaches are mostly no longer used for the analysis of longitudinal data. Mixed-effects regression models, which are described in the next section, build upon the repeated measures ANOVA framework by allowing more than just the intercept term to vary by individual in order to better capture between-subject variability. In addition, mixed-effects regression models use all available data so that not all subjects need to be measured at the same time points.

The second early approach for repeated measures data was multivariate growth curve (or MANOVA) models (Potthoff and Roy 1964; Bock 1975). The primary advantage of the MANOVA approach versus the ANOVA approach is that the MANOVA assumes a general form for the correlation of repeated measurements over time, whereas the ANOVA assumes the much more restrictive compound-symmetric form. The disadvantage of the MANOVA model is that it requires complete data. Subjects with incomplete data are removed from the analysis, leading to potential bias. In addition, both MANOVA and ANOVA models focus on comparison of group means and provide no information regarding subject-specific growth curves. Furthermore, both ANOVA and MANOVA models require that the time-points are fixed across subjects (either evenly or unevenly spaced) and are treated as a classification variable in the ANOVA or MANOVA model. This precludes analysis of unbalanced designs in which different subjects are measured on different occasions. Finally, software for the MANOVA approach often makes it difficult to include time-varying covariates, which are often essential to modeling dynamic relationships between predictors and outcomes. The MANOVA approach has been extended into a set of methods referred to as CPMs which also estimate the parameters of the
repeated measures variance-covariance matrix, but within a regression framework. Additionally, CPMs allow for incomplete data across time, and thus include subjects with incomplete data in the analysis. These methods are discussed in the next section.

Statistical Models for the Analysis of Longitudinal and Repeated Measures Data

In an attempt to provide a more general treatment of longitudinal data, with more realistic assumptions regarding the longitudinal response process and associated missing data mechanisms, statistical researchers have developed a wide variety of more rigorous approaches to the analysis of longitudinal data. Among these, the most widely used include mixed-effects regression models (Laird and Ware 1982) and generalized estimating equation (GEE) models (Zeger and Liang 1986). Variations of these models have been developed for both discrete and continuous outcomes and for a variety of missing data mechanisms. The primary distinction between the two general approaches is that mixed-effects models are “full-likelihood” methods and GEE models are “partial-likelihood” methods. The advantage of statistical models based on partial-likelihood is that (a) they are computationally easier than full-likelihood methods, and (b) they generalize quite easily to a wide variety of outcome measures with quite different distributional forms. The price of this flexibility, however, is that partial-likelihood methods are more restrictive in their assumptions regarding missing data than their full-likelihood counterparts. In addition, full-likelihood methods provide estimates of person-specific effects (e.g., person-specific trend lines) that are quite useful in understanding inter-individual variability in the longitudinal response process and in predicting future responses for a given subject or set of subjects from a particular subgroup (e.g., a county, a hospital, or a community). In the following sections attention is focused on full-likelihood methods, and partial-likelihood methods are only briefly discussed in section “Generalized Estimating Equation Models.”

Mixed-Effects Regression Models

Mixed-effects regression models (MRMs) are now widely used for the analysis of longitudinal data. Variants of MRMs have been developed under a variety of names: random-effects models (Laird and Ware 1982), variance component models (Dempster et al. 1981), multilevel models (Goldstein 1986), two-stage models (Bock 1989), random coefficient models (de Leeuw and Kreft 1986), mixed models (Longford 1987; Wolfinger 1993), empirical Bayes models (Hui and Berger 1983; Strenio et al. 1983), hierarchical linear models (Raudenbush and Bryk 1986), and random regression models (Bock 1983a, b; Gibbons et al. 1988). A basic characteristic of these models is the inclusion of random subject effects into regression models in order to account for the influence of subjects on their repeated observations. These random subject effects thus describe each person’s trend across time, and explain the correlational structure of the longitudinal data. Additionally, they indicate the degree of between-subject variation that exists in the population of subjects.

There are several features that make MRMs especially useful in longitudinal research. First, subjects are not assumed to be measured the same number of times; thus, subjects with incomplete data across time are included in the analysis. The ability to include subjects with incomplete data is an important advantage relative to procedures that require complete data across time because (a) by including all data, the analysis has increased statistical power, and (b) complete-case analysis may suffer from biases to the extent that subjects with complete data are not representative of the larger population of subjects. Because time can be treated as a continuous variable in MRMs, subjects do not have to be measured at the same time-points. This is useful for analysis of longitudinal studies where follow-up times are not uniform across all subjects. Both time-invariant and time-varying covariates can be easily included in the model. Thus, changes in the outcome variable may be due to both stable characteristics of the subject (e.g., their gender or race) as well as characteristics that change across time (e.g., life-events). Finally, whereas traditional
approaches estimate average change (across time) in a population, MRMs can also estimate change for each subject. These estimates of individual change across time can be particularly useful in longitudinal studies where a proportion of subjects exhibit change that deviates from the average trend.

To help fix ideas, consider the following simple linear regression model for the measurement $y$ of individual $i$ ($i = 1, 2, \ldots, N$ subjects) on occasion $j$ ($j = 1, 2, \ldots, n_i$ occasions):

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathit{Trt}_i + e_{ij}. \tag{1}$$

Ignoring subscripts, this model represents the regression of the outcome variable $y$ on the independent variable time (denoted $t$). The subscripts keep track of the particulars of the data, namely whose observation it is (subscript $i$) and when the observation was made (the subscript $j$). The independent variable $t$ gives a value to the level of time, and may represent time in weeks, months, etc. Since $y$ and $t$ carry both $i$ and $j$ subscripts, both the outcome variable and the time variable are allowed to vary by individuals and occasions. The variable $\mathit{Trt}_i$ is a binary variable that indicates the treatment assigned to individual $i$. When $\mathit{Trt}$ is dummy coded as a 1 or 0, with 1 indicating membership in the treatment group, the regression coefficient $\beta_0$ is the mean of $y$ when $t = 0$, $\beta_1$ is the slope or rate of change for the control group, and $\beta_2$ is the difference in slopes between the treatment and control groups.

In linear regression models, the errors $e_{ij}$ are assumed to be normally and independently distributed in the population with zero mean and common variance $\sigma^2$. This independence assumption makes the typical general linear regression model unreasonable for longitudinal data. This is because the outcomes $y$ are observed repeatedly from the same individuals, and so it is much more reasonable to assume that errors within an individual are correlated to some degree. Furthermore, the above model posits that the change across time is the same for all individuals since the model parameters ($\beta_0$, the intercept or initial level, and $\beta_1$, the linear change across time) do not vary by individuals except in terms of treatment assignment. For both of these reasons, it is useful to add individual-specific effects into the model that will account for the data dependency and describe differential time-trends for different individuals. This is precisely what MRMs do. The essential point is that MRMs therefore can be viewed as augmented linear regression models. Note also that here and elsewhere in this chapter, a main effect for treatment is not included in the model. That is, it is assumed that there is no difference in the expected outcomes between treatment groups at baseline. This is a reasonable assumption in a clinical trial where participants are randomized prior to receiving treatment. Alternatively, in an observational study where treatment (or exposure) is not randomized, it usually makes sense to include a main effect for treatment to account for differences between treatment groups at baseline.

Random Intercept Model
A simple extension of the linear regression model described in Eq. 1 is the random intercept model, which allows each subject to deviate from the overall mean response by a person-specific constant that applies equally over time:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathit{Trt}_i + \upsilon_{0i} + e_{ij} \tag{2}$$

where $\upsilon_{0i}$ represents the influence of individual $i$ on his/her repeated observations. Notice that if individuals have no influence on their repeated outcomes, then all of the $\upsilon_{0i}$ terms would equal 0. However, it is more likely that subjects will have positive or negative influences on their longitudinal data, and so the $\upsilon_{0i}$ terms will deviate from 0. Since individuals in a sample are typically thought to be representative of a larger population of individuals, the individual-specific effects $\upsilon_{0i}$ are treated as random effects. That is, the $\upsilon_{0i}$ are considered to be representative of a distribution of individual effects in the population. The most common form for this population distribution is the normal distribution
[Figure: two panels titled “Random intercept model” (left) and “Random intercept and slope model” (right); x-axis: Time (0 to 4); y-axis: Dependent Variable (20 to 26); legend: Average Trend, Individual Trends]
Fig. 1 Simulated longitudinal data based on a random intercept model (left panel) and a random intercept and slope model (right panel). The solid bold line represents the overall population (average) trend. The dashed lines represent individual trends
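
The kind of simulated trends shown in Fig. 1 can be sketched with a few lines of Python (NumPy only). This is an illustrative sketch rather than the code used to produce the figure: the function name `simulate_trends`, the parameter values (intercept 25, slope −1, standard deviations 0.5 and 0.3), and the seed are all assumptions chosen for the example, the treatment effect is set to zero, and measurement error is omitted so that only the subject-specific lines are drawn.

```python
import numpy as np

def simulate_trends(n_subjects=10, times=(0, 1, 2, 3, 4), beta0=25.0, beta1=-1.0,
                    sd_intercept=0.5, sd_slope=0.0, seed=0):
    """Return (average_trend, individual_trends) for error-free subject lines.

    individual_trends[i, j] = (beta0 + v0_i) + (beta1 + v1_i) * t_j,
    where v0_i and v1_i are independent normal deviations (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    v0 = rng.normal(0.0, sd_intercept, size=n_subjects)  # intercept deviations
    v1 = rng.normal(0.0, sd_slope, size=n_subjects)      # slope deviations (all 0 when sd_slope == 0)
    average = beta0 + beta1 * t
    individual = (beta0 + v0)[:, None] + (beta1 + v1)[:, None] * t[None, :]
    return average, individual

avg_ri, lines_ri = simulate_trends(sd_slope=0.0)  # random intercept model
avg_rs, lines_rs = simulate_trends(sd_slope=0.3)  # random intercept and slope model

# Parallel lines: every subject has the population slope in the random intercept model.
print(np.allclose(np.diff(lines_ri, axis=1), -1.0))
# Between-subject spread at each occasion: constant in the first model, changing in the second.
print(lines_ri.var(axis=0))
print(lines_rs.var(axis=0))
```

Plotting each row of `lines_ri` and `lines_rs` against the time points, together with the average trend, would give two panels of the same general form as the figure.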
with mean 0 and variance $\sigma^2_{\upsilon}$. In addition, the model assumes that the errors of measurement ($e_{ij}$) are conditionally independent, which implies that the errors of measurement are independent conditional on the random individual-specific effects $\upsilon_{0i}$. Since the errors now have the influence due to individuals removed from them, this conditional independence assumption is much more reasonable than the ordinary independence assumption associated with the linear regression model in Eq. 1. The random intercept model is depicted graphically in the left panel of Fig. 1.

As can be seen, individuals deviate from the regression of $y$ on $t$ in a parallel manner in this model (since there is only one subject effect $\upsilon_{0i}$) (for simplicity, it is assumed the treatment effect $\beta_2 = 0$). In this figure the solid line represents the population average trend, which is based on $\beta_0$ and $\beta_1$. Also depicted are ten individual trends, both below and above the population (average) trend. For a given sample there are $N$ such lines, one for each individual. The variance term $\sigma^2_{\upsilon}$ represents the spread of these lines. If $\sigma^2_{\upsilon}$ is near zero, then the individual lines would not deviate much from the population trend and individuals do not exhibit much heterogeneity in their change across time. Alternatively, as individuals differ from the population trend, the lines move away from the population trend line and $\sigma^2_{\upsilon}$ increases. In this case, there is more individual heterogeneity in time-trends.

Random Intercept and Trend Model
For longitudinal data, the random intercept model is often too simplistic for a number of reasons. First, it is unlikely that the rate of change across time is the same for all individuals. It is more likely that individuals differ in their time-trends; not everyone changes at the same rate. Furthermore, the compound symmetry assumption of the random intercept model is usually untenable for most longitudinal data. In general, measurements at points close in time tend to be more highly
correlated than measurements further separated in time. Also, in many studies subjects are more similar at baseline due to entry criteria, and change at different rates across time. Thus, it is natural to expect that variability will increase over time.

For these reasons, a more realistic MRM allows both the intercept and time-trend to vary by individuals:

$$y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}\,\mathit{Trt}_i + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij}. \tag{3}$$

In this model, $\beta_0$ is the overall population intercept, $\beta_1$ is the overall population slope for the group with $\mathit{Trt}$ coded 0, and $\beta_2$ indicates how the population slopes vary between treatment groups (by specifically indicating how the slope for $\mathit{Trt}$ coded 1 differs from the slope for $\mathit{Trt}$ coded 0). In terms of the random effects, $\upsilon_{0i}$ is the intercept deviation for subject $i$, and $\upsilon_{1i}$ is the slope deviation for subject $i$ (relative to their treatment group). As before, $e_{ij}$ is an independent error term distributed normally with mean 0 and variance $\sigma^2$. As with the random intercept model, the assumption regarding the independence of the errors is one of conditional independence, that is, they are independent conditional on $\upsilon_{0i}$ and $\upsilon_{1i}$. With two random individual-specific effects, the population distribution of intercept and slope deviations is assumed to be bivariate normal $N(\mathbf{0}, \boldsymbol{\Sigma}_{\upsilon})$, with the random-effects variance-covariance matrix given by

$$\boldsymbol{\Sigma}_{\upsilon} = \begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma_{\upsilon_0 \upsilon_1} \\ \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_1} \end{bmatrix}. \tag{4}$$

The model described in Eq. 3 can be thought of as a personal trend or change model since it represents the measurements of $y$ as a function of time, both at the individual ($\upsilon_{0i}$ and $\upsilon_{1i}$) and population ($\beta_0$ and $\beta_1$, plus $\beta_2$) levels. The intercept parameters indicate the starting point, and the slope parameters indicate the degree of change over time. The population intercept and slope parameters represent the overall (population) trend, while the individual parameters express how subjects deviate from the population trends. The right panel of Fig. 1 represents this model graphically.

As can be seen, individuals deviate from the average trend both in terms of their intercept and in terms of their slope. As with the random intercept model, the spread of the lines around the average intercept is measured by $\sigma^2_{\upsilon_0}$ in Eq. 4. The variance of the slopes around the average trend is measured by $\sigma^2_{\upsilon_1}$ in Eq. 4. By allowing the individual slopes to vary, it is now possible for individual trends to be positive even though the overall trend is negative. The term $\sigma_{\upsilon_0 \upsilon_1}$ in Eq. 4 measures the association (covariance) between the random intercept and slope. When this quantity is negative, individuals with larger intercepts ($\beta_0 + \upsilon_{0i}$) will tend to have more negative slopes ($\beta_1 + \upsilon_{1i}$), that is, steeper declines when the overall trend is negative.

Matrix Formulation

A more compact representation of the MRM is afforded using matrices and vectors. This formulation helps to summarize statistical aspects of the model. For this, the MRM for the $n_i \times 1$ response vector $\mathbf{y}$ for individual $i$ can be written as:

$$\underset{n_i \times 1}{\mathbf{y}_i} = \underset{n_i \times p}{\mathbf{X}_i}\;\underset{p \times 1}{\boldsymbol{\beta}} + \underset{n_i \times r}{\mathbf{Z}_i}\;\underset{r \times 1}{\boldsymbol{\upsilon}_i} + \underset{n_i \times 1}{\mathbf{e}_i} \tag{5}$$

with $i = 1 \ldots N$ individuals and $j = 1 \ldots n_i$ observations for individual $i$. Here, $\mathbf{y}_i$ is the $n_i \times 1$ dependent variable vector for individual $i$, $\mathbf{X}_i$ is the $n_i \times p$ covariate matrix for individual $i$, $\boldsymbol{\beta}$ is the $p \times 1$ vector of fixed regression parameters, $\mathbf{Z}_i$ is the $n_i \times r$ design matrix for the random effects, $\boldsymbol{\upsilon}_i$ is the $r \times 1$ vector of random individual effects, and $\mathbf{e}_i$ is the $n_i \times 1$ residual vector.

For example, in the random intercepts and slopes MRM just considered, for a participant in the treatment group ($\mathit{Trt}_i = 1$) the data matrices are written as
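$$\mathbf{y}_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{in_i} \end{bmatrix}, \quad \mathbf{X}_i = \begin{bmatrix} 1 & t_{i1} & t_{i1} \\ 1 & t_{i2} & t_{i2} \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i} \end{bmatrix}, \quad \text{and} \quad \mathbf{Z}_i = \begin{bmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{bmatrix}$$

and the population and individual trend parameter vectors are written as

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} \quad \text{and} \quad \boldsymbol{\upsilon}_i = \begin{bmatrix} \upsilon_{0i} \\ \upsilon_{1i} \end{bmatrix}$$

respectively. The distributional assumptions about the random effects and residuals are:

$$\boldsymbol{\upsilon}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}_{\upsilon})$$
$$\mathbf{e}_i \sim N(\mathbf{0}, \sigma^2 \mathbf{I}_{n_i}).$$

As a result, it can be shown that the expected value of the repeated measures $\mathbf{y}_i$ is

$$E(\mathbf{y}_i) = \mathbf{X}_i \boldsymbol{\beta} \tag{6}$$

and the variance-covariance matrix of $\mathbf{y}_i$ is of the form:

$$V(\mathbf{y}_i) = \mathbf{Z}_i \boldsymbol{\Sigma}_{\upsilon} \mathbf{Z}_i' + \sigma^2 \mathbf{I}_{n_i}. \tag{7}$$

For example, with $r = 2$, $n_i = 3$, and

$$\mathbf{X}_i = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 2 \end{bmatrix} \quad \text{and} \quad \mathbf{Z}_i = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$$

the expected value of $\mathbf{y}_i$ is

$$\begin{bmatrix} \beta_0 \\ \beta_0 + \beta_1 + \beta_2 \\ \beta_0 + 2\beta_1 + 2\beta_2 \end{bmatrix}$$

and the variance-covariance matrix equals

$$\sigma^2 \mathbf{I}_{n_i} + \begin{bmatrix} \sigma^2_{\upsilon_0} & \sigma^2_{\upsilon_0} + \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} \\ \sigma^2_{\upsilon_0} + \sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} + \sigma^2_{\upsilon_1} & \sigma^2_{\upsilon_0} + 3\sigma_{\upsilon_0 \upsilon_1} + 2\sigma^2_{\upsilon_1} \\ \sigma^2_{\upsilon_0} + 2\sigma_{\upsilon_0 \upsilon_1} & \sigma^2_{\upsilon_0} + 3\sigma_{\upsilon_0 \upsilon_1} + 2\sigma^2_{\upsilon_1} & \sigma^2_{\upsilon_0} + 4\sigma_{\upsilon_0 \upsilon_1} + 4\sigma^2_{\upsilon_1} \end{bmatrix}$$

which allows the variances and covariances to change across time. For example, if $\sigma_{\upsilon_0 \upsilon_1}$ is positive, then clearly the variance increases across time. Diminishing variance across time is also possible if, for example, $-2\sigma_{\upsilon_0 \upsilon_1} > \sigma^2_{\upsilon_1}$, in which case the variance at the second occasion is smaller than at the first. Other patterns are possible depending on the values of these variance and covariance parameters.

Models with additional random effects are also possible, as are models that allow autocorrelated errors, that is, $\mathbf{e}_i \sim N(\mathbf{0}, \sigma^2 \boldsymbol{\Omega}_i)$. Here, $\boldsymbol{\Omega}$ might, for example, represent an autoregressive (AR) or moving average (MA) process for the residuals. Autocorrelated error regression models are common in econometrics. Their application within an MRM formulation is treated by Chi and Reinsel (1989) and Hedeker (1989), and extensively described in Verbeke and Molenberghs (2000). By including both random effects and autocorrelated errors, a wide range of variance-covariance structures for the repeated measures is possible. This flexibility is in sharp contrast to the traditional ANOVA models which assume either a compound symmetry structure (univariate ANOVA) or a totally general structure (MANOVA). Typically, compound symmetry is too restrictive and a general structure is not parsimonious. MRMs, alternatively, provide these two and everything in between, and so allow efficient modeling of the variance-covariance structure of the repeated measures.

Covariance Pattern Models

An alternative to using random effects to model correlated measurements over time is to explicitly model the covariance structure through the use of CPMs. These models are a direct outgrowth of the multivariate growth curve models described in the “Historical Background” section where the covariance structure of the repeated observations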
was assumed to follow a general form and all parameters of the matrix were estimated. Rather than estimating every parameter of the covariance matrix, CPMs assume that the variance-covariance matrix of the repeated observations follows a specific structure. For example, the compound symmetry (CS) covariance model has only two parameters, $\sigma^2$ (variance) and $\rho$ (correlation), and assumes that observations $Y_{ij}$ have constant variance over time and that the correlation between any two observations on the same subject is the same no matter how far apart those observations occurred. A variety of covariance structures exist and are available in most software packages. See Weiss (2005) for detailed descriptions of a number of different covariance matrices.

Using the matrix notation in Eq. 5, a CPM would be

\[
y_i = X_i \beta + e_i \tag{8}
\]

where, instead of assuming the residuals are independent, it is assumed that $e_i \sim N(0, \Omega_i)$. Some common choices for $\Omega_i$ include the previously mentioned compound symmetry, where for three observations on subject $i$ the covariance matrix is

\[
V(y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}
\]

and the parameter $\rho$ is the correlation between any two observations on the same subject. An autoregressive or AR(1) covariance structure also has two parameters, like the compound symmetry structure, but takes on a different form, namely,

\[
V(y_i) = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}.
\]

Thus, the farther apart two observations are in time, the lower the correlation between them (assuming $\rho > 0$). In general, CPMs apply structure by specifying a specific relationship between repeated observations on the same subject and assuming constant (homogeneous) variance over time (though the homogeneity of variance can be relaxed).

When choosing a covariance model for repeated measures data, one wishes to choose the most parsimonious model that fits the data well. This can be done by first modeling the mean of observations over time and then using likelihood ratio tests as well as model fit indices such as the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) to select the model that best fits the correlation and variance structure of the data. More details on methods for assessing and comparing model fit of the variance-covariance structure are described by Wolfinger (1993) and Grady and Helms (1995).

Calculating Effect Sizes

Effect Sizes for Mixed-Effects Models

It is often of interest to summarize results from an intervention in terms of effect sizes. The effect size of an intervention is defined as the difference in means between the intervention and the control (or its comparator) divided by the standard deviation of the outcome. Assume a random intercept and slope MRM as in Eq. 16, that is

\[
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij} \mathrm{Trt}_i + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij}
\]

To estimate the effect size of the treatment effect at time 2, begin by calculating the predicted mean for a subject in the treatment group at time 2 ($\mathrm{Trt}_i = 1$, $t_{ij} = 2$):

\[
E(y_{ij} \mid \mathrm{Trt}_i = 1, t_{ij} = 2) = \beta_0 + 2\beta_1 + 2\beta_2 \tag{9}
\]

and the predicted mean for a control subject at time 2 is

\[
E(y_{ij} \mid \mathrm{Trt}_i = 0, t_{ij} = 2) = \beta_0 + 2\beta_1 \tag{10}
\]

since the random effects and the error terms have mean 0. Thus the difference between the two
groups is $2\beta_2$. The variance for both groups at time 2 is

\begin{align}
\mathrm{Var}(y_{ij} \mid t_{ij} = 2) &= \mathrm{Var}(\upsilon_{0i}) + 2^2\,\mathrm{Var}(\upsilon_{1i}) + 4\,\mathrm{Cov}(\upsilon_{0i}, \upsilon_{1i}) + \mathrm{Var}(e_{ij}) \tag{11} \\
&= \sigma^2_{\upsilon_0} + 4\sigma^2_{\upsilon_1} + 4\sigma_{\upsilon_0 \upsilon_1} + \sigma^2. \tag{12}
\end{align}

In matrix notation, this is written as

\[
\mathrm{Var}(y_{ij} \mid t_{ij} = 2) = \begin{bmatrix} 1 & 2 \end{bmatrix} \Sigma_\upsilon \begin{bmatrix} 1 & 2 \end{bmatrix}^T + \sigma^2.
\]

Thus, because the effect size divides the difference in means by the standard deviation of the outcome, the effect size of the intervention at time 2 is

\[
\text{Effect Size} = \frac{2\beta_2}{\sqrt{\sigma^2_{\upsilon_0} + 4\sigma^2_{\upsilon_1} + 4\sigma_{\upsilon_0 \upsilon_1} + \sigma^2}}.
\]

Effect Sizes for Covariance Pattern Models

Calculating effect sizes for a covariance pattern model is slightly different than for the mixed-effect model in Eq. 16 because, although it is not necessary to take into account the variance of the random effects, the error terms are no longer independent. The model is

\[
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij} \mathrm{Trt}_i + e_{ij} \tag{13}
\]

where $e_i \sim N(0, \Omega_i)$. As in Eqs. 9 and 10, the difference in predicted means between treatments and controls is $2\beta_2$. The variance for both groups at time 2 is simply

\begin{align}
\mathrm{Var}(y_{ij} \mid t_{ij} = 2) &= \mathrm{Var}(e_{i3}) \tag{14} \\
&= \sigma^2_{33} \tag{15}
\end{align}

That is, the variance at time 2 is the third term on the diagonal of the error variance-covariance matrix. Thus, the effect size of the intervention at time 2 is

\[
\text{Effect Size} = \frac{2\beta_2}{\sqrt{\sigma^2_{33}}}.
\]

Illustrative Example: The WECare Study

This section implements and extends the above methods using data from the WECare Study. The WECare Study investigated depression outcomes during a 12-month period in which 267 low-income, mostly minority, women in the suburban Washington, DC, area were treated for depression. The participants were randomly assigned to one of three groups: medication, cognitive behavioral therapy (CBT), or treatment-as-usual (TAU), which consisted of referral to a community provider. Depression was measured every month or every other month through a phone interview using the Hamilton Depression Rating Scale (HDRS). Information on ethnicity, income, number of children, insurance, and education was collected during the screening and baseline interviews. All screening and baseline data were complete except for income, with 10 participants missing data on income. After baseline, the percentage of missing interviews ranged between 24 and 38 percent across months. Outcomes of the study were reported in Miranda et al. (2003, 2006). In these papers, the primary research question was whether the medication and CBT treatment groups had better depression outcomes as compared with the treatment-as-usual (TAU) group.

Table 1 provides mean HDRS scores, percent missing, and cumulative measurement dropout at each time point by treatment group. By month 6, approximately 84% of participants had been retained in the study. By month 12, the retention rate was 76%. The difference in dropout rates across the three treatment groups was not significant (p = 0.27). Figure 2 provides a spaghetti plot of depression trajectories for all 267 participants (top panel) and also plots the mean depression score by treatment group (bottom panel). Two features of the data are readily apparent. First, as shown by the spaghetti plot, there is quite a bit of
between-subject variability in the data. Second, as shown by the plots of means over time, the trends in depression scores do not appear to be linear. Instead, they appear curvilinear, with an initial strong downward trend and then a leveling off over time.

Table 1 WECare mean Hamilton Depression Rating Scale (HDRS) scores, percent missing, and cumulative measurement dropout at each time point

Entries are mean HDRS score (% missing, % cumulative measurement dropout)

Month of study   Medication (n = 88)   CBT (n = 90)        TAU (n = 89)
Baseline         17.95 (0%, 0%)        16.28 (0%, 0%)      16.48 (0%, 0%)
Month 1          14.00 (20%, 2%)       13.11 (27%, 6%)     12.80 (27%, 4%)
Month 2          10.74 (16%, 5%)       11.42 (27%, 7%)     11.30 (29%, 10%)
Month 3           9.60 (28%, 8%)       10.24 (36%, 9%)     13.05 (27%, 11%)
Month 4           9.54 (31%, 9%)        9.07 (38%, 13%)    11.81 (35%, 12%)
Month 5           8.62 (40%, 14%)      10.47 (34%, 14%)    11.85 (40%, 13%)
Month 6           9.17 (28%, 18%)      10.73 (33%, 14%)    11.92 (29%, 15%)
Month 8           8.07 (36%, 24%)       9.62 (30%, 17%)    11.55 (33%, 18%)
Month 10          9.04 (40%, 27%)       8.31 (31%, 20%)    10.92 (31%, 19%)
Month 12          9.71 (30%, 30%)       8.38 (24%, 24%)    10.22 (19%, 19%)

Note. CBT cognitive behavioral therapy, TAU treatment as usual

Mixed-Effects Regression Models for Continuous Data Using the WECare Study

This section illustrates the use of MRMs for continuous data using the WECare data. The section begins by fitting the WECare data using the random intercept and slope model in Eq. 16. Here, time corresponds to the month of the interview and takes on values from 0 to 12. As noted above, the change in depression scores across time does not appear to be linear. For now, time is treated as linear in order to demonstrate the role of diagnostics in addressing model fit. Subsequently, quadratic and cubic terms are incorporated as well as the effect of treatment group in the model. The initial model is

\[
y_{ij} = \beta_0 + \beta_1 t_{ij} + \upsilon_{0i} + \upsilon_{1i} t_{ij} + e_{ij} \tag{16}
\]

where $\beta_0$ is the average month 0 (baseline) HDRS level and $\beta_1$ is the average HDRS monthly linear change. The random effect $\upsilon_{0i}$ is the individual deviation from the average intercept, and $\upsilon_{1i}$ is the individual deviation from the average linear change. Fitting this model yields the results given in Table 2.

Focusing first on the estimated regression parameters, this model indicates that patients start, on average, with an HDRS score of 14.08 and change by −0.51 points each month. Lower scores on the HDRS reflect less depression, so patients are improving over time. The estimated HDRS score at a given month equals 14.08 − (0.51 × month). So, for example, at month 2 the average depression score is 14.08 − (0.51 × 2) = 13.06. Both the intercept and slope are statistically significant (p < 0.0001). The intercept being significant is not particularly meaningful; it just indicates that HDRS scores are different from zero at baseline. However, because the slope is significant, one can conclude that the rate of improvement is significantly different from zero in this study. On average, patients are improving across time.

For the variance and covariance terms of the random effects, there are concerns in using the standard errors in constructing Wald test statistics (estimate divided by its standard error), particularly when the population variance is thought to be near zero and the number of subjects is small (Bryk and Raudenbush 1992). This is because variance parameters are bounded; they cannot be less than zero, and so using the standard normal for the sampling distribution is not reasonable. As a result, statistical significance is not indicated for
Fig. 2 WECare depression scores over the course of the study. The top panel plots the raw HDRS scores for all 267 participants, where each line represents a single individual. The bottom panel plots mean HDRS scores by treatment group (TAU, CBT, Medication). There is substantial heterogeneity in the raw scores and nonlinear trends in the means. (Horizontal axis: Month, 0 to 12; vertical axes: Raw HDRS Values in the top panel, Mean HDRS Scores in the bottom panel.)
the variance and covariance parameters in the tables. However, the magnitude of the estimates does reveal the degree of individual heterogeneity in both the intercepts and slopes. For example, while the average intercept in the population is estimated to be 14.08, the estimated population standard deviation for the intercept is $4.52 = \sqrt{20.44}$. Similarly, the average population slope is −0.51, but the estimated population standard deviation for the slope equals 0.42, and so approximately 95% of subjects are expected to have slopes in the interval −0.51 ± (1.96 × 0.42) = −1.33 to 0.31. That the interval includes positive slopes reflects the fact that not all subjects improve across time. Thus, there is considerable heterogeneity in terms of patients' initial level of depression and in their change across time. Finally, the covariance between the intercept and linear trend is negative; expressed as a correlation it equals −0.13, which is small in size. This suggests that baseline depression level (i.e., intercept) is not related to the amount of linear change over time. Later on, it is seen that baseline level is positively correlated with quadratic trend: patients who are initially more depressed tend to level off over time more than patients who are less depressed at baseline.

Using the estimated population intercept ($\hat{\beta}_0$) and slope ($\hat{\beta}_1$) one can estimate the average HDRS score at each time-point. These are displayed in Fig. 3 along with the observed means at each time-point. As can be seen, a linear trend does not result in close agreement between the observed and estimated means. In particular, there is an initial sharp downward trend that the linear model is unable to capture. For a more quantitative assessment, the interested reader is referred to Kaplan and George (1998), which describes use of econometric forecasting statistics to assess various forms of fit between observed and estimated means. The lack of fit of the estimated means to the observed means suggests the inclusion of curvilinear trends in the model, a point made in the next section.
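The interpretation of the linear model can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: it hard-codes the Table 2 estimates and assumes that the slope and the intercept/slope covariance are negative, as the text describes. The constant and function names are hypothetical and are not part of any software used in the study.

```python
import math

# Estimates copied from Table 2 (linear random intercept and slope model).
# Assumption: the slope and the intercept/slope covariance carry a minus
# sign, as the surrounding text describes (declining scores, negative covariance).
BETA0 = 14.08          # average baseline HDRS score
BETA1 = -0.51          # average monthly change
VAR_INTERCEPT = 20.44  # variance of the random intercepts
COV_INT_SLOPE = -0.25  # covariance of random intercepts and slopes
VAR_SLOPE = 0.18       # variance of the random slopes


def predicted_mean(month):
    """Population-average HDRS score implied by the linear model."""
    return BETA0 + BETA1 * month


def slope_interval(z=1.96):
    """Range expected to contain about 95% of subject-specific slopes."""
    sd = math.sqrt(VAR_SLOPE)
    return BETA1 - z * sd, BETA1 + z * sd


def intercept_slope_correlation():
    """Covariance rescaled by the two random-effect standard deviations."""
    return COV_INT_SLOPE / math.sqrt(VAR_INTERCEPT * VAR_SLOPE)


if __name__ == "__main__":
    for month in (0, 2, 6, 12):
        print(f"month {month:2d}: predicted mean HDRS = {predicted_mean(month):.2f}")
    print(f"SD of intercepts = {math.sqrt(VAR_INTERCEPT):.2f}")
    low, high = slope_interval()
    print(f"95% range of slopes = {low:.2f} to {high:.2f}")
    print(f"intercept/slope correlation = {intercept_slope_correlation():.2f}")
```

Because the sketch uses the unrounded square root of 0.18 rather than 0.42, the slope range it prints can differ from the values quoted in the text in the second decimal place.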
Table 2 MRM regression results for WECare data with random intercepts and slopes and assuming linear change over time

Parameter name                      Symbol                            Estimate   SE     t        p-value
Intercept                           $\beta_0$                         14.08      0.33   42.30    <0.0001
Linear slope                        $\beta_1$                         −0.51      0.04   −12.27   <0.0001
Intercept variance                  $\sigma^2_{\upsilon_0}$           20.44      2.53
Intercept/linear slope covariance   $\sigma_{\upsilon_0 \upsilon_1}$  −0.25      0.23
Linear slope variance               $\sigma^2_{\upsilon_1}$           0.18       0.04
Error variance                      $\sigma^2$                        23.67      0.88

Note. −2 log L = 12305.7.

Curvilinear Growth Model

In many situations, it is too simplistic to assume that the change across time is linear. In the present example, for instance, it appears that the depression scores diminish across time in a curvilinear manner. A curvilinear trend would allow a leveling off of the improvement across time. This is clearly plausible for rating scale data, like the HDRS scores, where values below zero are impossible. Here, a curvilinear growth model is considered by adding both a quadratic and cubic term to the model. A plot of observed versus estimated means using linear and quadratic terms (not shown) did not appear to fit the observed data well, so a cubic term is also added. When random cubic effects were included in
Fig. 3 Observed and predicted WECare mean depression scores. Mean scores based on a linear or quadratic model do not fit the observed data as well as a model that includes cubic effects. (Horizontal axis: Month, 0 to 12; vertical axis: Mean HDRS Scores; series: Raw Means, Linear Estimated Means, Cubic Estimated Means.)
the model, they were perfectly correlated with the random quadratic effects, so the updated model only has random intercepts, slopes, and quadratic slopes. This produces the following model

\[
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 + \upsilon_{0i} + \upsilon_{1i} t_{ij} + \upsilon_{2i} t_{ij}^2 + e_{ij}, \tag{17}
\]

where $\beta_0$ is the average month 0 HDRS level, $\beta_1$ is the average HDRS monthly linear change, $\beta_2$ is the average HDRS monthly quadratic change, and $\beta_3$ is the average HDRS monthly cubic change. Similarly, $\upsilon_{0i}$ is the individual deviation from average intercept, $\upsilon_{1i}$ is the individual deviation from average linear change, and $\upsilon_{2i}$ is the individual deviation from average quadratic change. Fitting this model yields the results given in Table 3.

Focusing first on the estimated regression parameters, this model indicates that patients start off, on average, with an HDRS score of 16.33. Note that this value is higher than the intercept of the linear model of 14.08 and closer to the observed baseline mean of 16.9. The linear, quadratic, and cubic terms in the model are all highly significant (p < 0.0001). The coefficient of the linear effect of month is −2.69, the coefficient of the quadratic term is 0.36, and the coefficient of the cubic term is −0.02. Thus, the estimated depression score at a given month is calculated as 16.33 − (2.69 × month) + (0.36 × month²) − (0.02 × month³). So, for example, at month 2 the average depression score is 16.33 − (2.69 × 2) + (0.36 × 4) − (0.02 × 8) ≈ 12.26. Average HDRS scores at each month are displayed in Fig. 3 along with the observed means and estimated means based on a linear model. Including a cubic effect in the model does a better job capturing trends in depression scores over time. Note that at months 8 and 10, the quadratic term dominates so that mean depression scores begin to increase, and then at month 12 the cubic term dominates so that HDRS scores decrease again. Most of the improvement in depression is occurring during the first few months of the study. Because the scale for each of these terms is different (e.g., the linear effect ranges from 0 to 12, the cubic effect ranges from 0 to 12³ = 1728), it is difficult to compare them to each other in terms of magnitude. The t-statistics provide some evidence of the magnitude and suggest that although the linear effect is strongest, all three effects contribute to the effect of time on depression symptoms.

As before, the variance and covariance terms in Table 3 provide information regarding the amount of heterogeneity in the data. The 95% confidence interval for subject-specific intercepts is 16.33 ± 3.87 and the 95% confidence interval
Table 3 MRM results for the WECare data with cubic trends and random intercept, slope, and quadratic slopes effects

Parameter name                         Symbol                            Estimate   SE      t        p-value
Intercept                              $\beta_0$                         16.33      0.34    47.99    <0.0001
Month                                  $\beta_1$                         −2.69      0.22    −12.03   <0.0001
Month²                                 $\beta_2$                         0.36       0.05    7.97     <0.0001
Month³                                 $\beta_3$                         −0.015     0.003   −6.12    <0.0001
Intercept variance                     $\sigma^2_{\upsilon_0}$           15.02      2.38
Intercept/linear slope covariance      $\sigma_{\upsilon_0 \upsilon_1}$  0.67       0.69
Linear slope variance                  $\sigma^2_{\upsilon_1}$           1.55       0.36
Intercept/quadratic slope covariance   $\sigma_{\upsilon_0 \upsilon_2}$  −0.10      0.05
Linear/quadratic slope covariance      $\sigma_{\upsilon_1 \upsilon_2}$  −0.11      0.03
Quadratic slope variance               $\sigma^2_{\upsilon_2}$           0.01       0.002
Error variance                         $\sigma^2$                        19.75      0.79

Note. −2 log L = 12095.1
for the subject-specific quadratic terms in the model includes zero, reflecting the fact that there is considerable heterogeneity in terms of patients' initial level of depression and in their changes across time.

Finally, the covariance between the linear effect and the quadratic effect is negative; expressed as a correlation it equals −0.94, which is very high. This is partially due to multicollinearity but also suggests that those patients who make the most initial gains (i.e., steep slopes) tend to level off at a greater rate (i.e., greater quadratic effects) than patients who have flatter slopes in the early stages of the study. An alternative explanation is that of a floor effect due to the HDRS rating scale. Simply put, once patients achieve low depression scores they no longer have room to keep improving and thus tend to level off.

An interesting question, at this point, is whether it is necessary to include random effects for the linear and quadratic terms or whether a less complicated model is sufficient. Fitting the more restrictive model with random intercepts and linear terms (not shown) yields −2 log L = 12155.6. Note that both models still include fixed effects for linear slope, quadratic slope, and cubic slope. Because these are nested models, they can be compared using a likelihood-ratio test. For this, one compares the difference in model deviance values (i.e., −2 log L) to a chi-square distribution, where the degrees of freedom equals the number of parameters set equal to zero in the more restrictive model. Comparing the full model to the restricted model with only random intercepts and slopes, $\chi^2_3$ = 12155.6 − 12095.1 = 60.5, p < 0.0001 for $H_0: \sigma_{\upsilon_0 \upsilon_2} = \sigma_{\upsilon_1 \upsilon_2} = \sigma^2_{\upsilon_2} = 0$. It should be noted that the use of the likelihood ratio test for this purpose also suffers from the variance boundary problem mentioned above (Verbeke and Molenberghs 2000). Based on simulation studies it can be shown that the likelihood-ratio test is too conservative (for testing null hypotheses about variance parameters), namely, it does not reject the null hypothesis often enough. This would then lead to accepting a more restrictive variance-covariance structure than is correct. As noted by Berkhof and Snijders (2001), this bias can largely be corrected by dividing the p-value obtained from the likelihood-ratio test (of variance terms) by two. In the present case it does not really matter, but this modification yields p < 0.0001/2 = 0.00005. Thus, there is clear evidence that the assumption of only random intercepts and linear slopes is rejected, and the inclusion of the random quadratic slopes is necessary.

In addition to plots of the overall means over time, estimates of the individual trends, based on the random effects $\hat{\upsilon}_{0i}$, $\hat{\upsilon}_{1i}$, and $\hat{\upsilon}_{2i}$, are often of interest. Figure 4 contains a plot of the individual trend estimates from this model. These are obtained by calculating

\[
\hat{y}_{ij} = \hat{\beta}_0 + \hat{\beta}_1 t_{ij} + \hat{\beta}_2 t_{ij}^2 + \hat{\beta}_3 t_{ij}^3 + \hat{\upsilon}_{0i} + \hat{\upsilon}_{1i} t_{ij} + \hat{\upsilon}_{2i} t_{ij}^2
\]

for t = 0, 1, ..., 12, and then connecting the time point estimates for each individual. For clarity, 50 of the 267 WECare participants were randomly selected to display in Fig. 4.

The plot makes apparent the wide heterogeneity in trends across time, as well as the increasing variance in HDRS scores across time. Some individuals have initial accelerating downward trends suggesting immediate improvement and then a leveling off over time, while others appear to have more modest improvements and then perhaps a slight worsening of symptoms. Some individuals even have positive trends indicating a worsening of their depressive symptoms across time. This is not too surprising given that not all depression interventions work for everyone. At the end of this chapter, growth mixture models are briefly introduced which attempt to classify individuals into discrete latent classes based on the shape of their trajectories.

It is worth noting that the estimates of the individual trends presented in Fig. 4 are empirical Bayes (EB) estimates, which reflect a compromise between an estimate based solely on an individual's data and an estimate for the population of interest. Thus, they are not equivalent to ordinary least squares (OLS) estimates (i.e.,
Fig. 4 Subject-specific estimated WECare HDRS means over time based on a model with cubic fixed effects and random intercept, slope, and quadratic slope effects. For clarity, only a random sample of 50 participants is displayed. (Horizontal axis: Month, 0 to 12; vertical axis: Estimated HDRS Values.)
fitting a regression line for each participant separately), which would rely only upon an individual's data. An important advantage of EB estimates relative to OLS estimates is that they are not as prone to the undue influence of outliers. This is especially true when an individual has few measurements on which to base these estimates. Because of this, the EB estimates are said to be shrunken to the mean, where the mean of the random effects equals zero in the population. The degree of shrinkage depends on the number of measurements an individual has. Thus, if a subject has few measurements, then the EB estimate will be smaller (in absolute value) than the corresponding OLS estimate. Alternatively, if the subject has many measurements across time, then the EB and OLS estimates would be very similar. These EB estimates are readily available from most MRM software programs.

Finally, the fit of the observed variance-covariance matrix of the repeated measures is addressed. These are calculated based on the pairwise data for the covariances and the available data for each of the variances. The observed variance-covariance matrix is
\[
V(y) = \begin{bmatrix}
26.87 \\
16.52 & 42.64 \\
17.19 & 30.54 & 49.54 \\
12.03 & 22.64 & 28.47 & 47.00 \\
12.65 & 28.68 & 29.47 & 32.39 & 52.74 \\
9.37 & 21.22 & 20.28 & 24.95 & 30.09 & 49.88 \\
9.10 & 21.82 & 29.03 & 26.73 & 29.34 & 28.15 & 49.75 \\
7.32 & 23.62 & 23.98 & 26.49 & 24.74 & 27.88 & 31.67 & 50.83 \\
7.93 & 22.11 & 22.79 & 22.69 & 26.19 & 23.96 & 27.05 & 33.33 & 53.32 \\
5.48 & 17.17 & 17.83 & 18.78 & 21.53 & 22.44 & 22.86 & 30.53 & 30.97 & 50.14
\end{bmatrix}
\]

Based on the model estimates, the variance-covariance matrix is

\[
\hat{V}(y) = Z \hat{\Sigma}_\upsilon Z' + \hat{\sigma}^2 I = \begin{bmatrix}
34.76 \\
15.58 & 37.23 \\
15.94 & 19.00 & 41.24 \\
16.09 & 20.10 & 23.43 & 45.80 \\
16.03 & 20.81 & 24.80 & 28.00 & 50.16 \\
15.76 & 21.11 & 25.60 & 29.24 & 32.04 & 53.72 \\
15.29 & 21.01 & 25.84 & 29.80 & 32.87 & 35.06 & 56.12 \\
13.72 & 19.60 & 24.63 & 28.82 & 32.18 & 34.69 & 36.36 & 56.93 \\
11.33 & 16.58 & 21.16 & 25.08 & 28.33 & 30.91 & 32.82 & 34.65 & 53.55 \\
8.11 & 11.95 & 15.44 & 18.56 & 21.32 & 23.72 & 25.76 & 28.75 & 30.29 & 50.14
\end{bmatrix}
\]

where the design matrix of the random effects and the estimates of the random effects variance-covariance matrix are given by

\[
Z' = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 1 & 2 & 3 & 4 & 5 & 6 & 8 & 10 & 12 \\
0 & 1 & 4 & 9 & 16 & 25 & 36 & 64 & 100 & 144
\end{bmatrix}, \qquad
\hat{\Sigma}_\upsilon = \begin{bmatrix}
15.02 & 0.67 & -0.10 \\
0.67 & 1.55 & -0.11 \\
-0.10 & -0.11 & 0.01
\end{bmatrix},
\]
and $\hat{\sigma}^2 = 19.75$. Given that this variance-covariance matrix of 55 elements is represented by seven parameter estimates, the fit is reasonably good. The model is clearly picking up on the increasing variance across time and the diminishing covariance away from the diagonal. Comparing this model to one with a totally general variance-covariance structure (not shown), where every unique parameter of the covariance matrix is estimated, yields a likelihood-ratio $\chi^2_{48} = 78.2$, which is statistically significant. Thus, this curvilinear model with seven variance-covariance parameters ($\sigma^2$ and six unique parameters in $\Sigma_\upsilon$) does not quite provide an adequate fit of the variance-covariance matrix $V(y)$, which, being of dimension 10 × 10, has 55 unique elements. This suggests the use of a parameterized covariance matrix to fit these data. These models offer the possibility of using more parameters to estimate the covariance matrix and the potential of a better fit to the data. They are discussed in the next section.

Covariance Pattern Models

As discussed in the last section, an MRM with random intercept, slope, and quadratic slope terms did not provide an adequate fit to the variance-covariance matrix of the WECare data, as compared to a totally general structure. This was not unexpected, as the MRM was attempting to model a covariance matrix of the repeated measures with 55 unique elements using only seven parameters. CPMs are an alternative to MRMs for modeling longitudinal data. The WECare data were modeled using the fixed linear, quadratic, and cubic effects as described before and fit with a number of different covariance structures. The fit indices AIC and BIC were used to determine the covariance pattern which best fit the data. Likelihood ratio tests were also performed in order to compare each structured covariance pattern to an unstructured covariance where every parameter of the covariance matrix is estimated. Table 4 summarizes the results of the investigation. The rows have been sorted by BIC from smallest to largest.
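To make the idea of a structured covariance pattern concrete, the following Python sketch builds compound symmetry and first-order autoregressive matrices for ten measurement occasions and counts the free elements of an unstructured matrix of the same size. It is illustrative only: the function names and the values of sigma2 and rho are hypothetical, and the AR(1) pattern is indexed by occasion number rather than by calendar month, which is a simplifying assumption given the unequal spacing of the later interviews.

```python
def compound_symmetry(n, sigma2, rho):
    """n x n matrix with constant variance and one common correlation."""
    return [[sigma2 if j == k else sigma2 * rho for k in range(n)]
            for j in range(n)]


def ar1(n, sigma2, rho):
    """n x n matrix whose correlation decays as rho ** |j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(n)]
            for j in range(n)]


def n_unique_elements(n):
    """Number of free elements in an unstructured n x n covariance matrix."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    n_occasions = 10           # baseline plus months 1-6, 8, 10, and 12
    sigma2, rho = 45.0, 0.6    # hypothetical values, not estimates from the study
    cs = compound_symmetry(n_occasions, sigma2, rho)
    ar = ar1(n_occasions, sigma2, rho)
    print("unstructured elements:", n_unique_elements(n_occasions))
    print("CS covariance, first and last occasion:", round(cs[0][9], 2))
    print("AR(1) covariance, first and last occasion:", round(ar[0][9], 2))
```

Both patterns use two parameters, but the compound symmetry covariance between the first and last occasions stays constant while the AR(1) covariance shrinks with the distance between occasions.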
Table 4 Fit indices for various covariance patterns fit to the WECare data

Covariance pattern                 No. of parameters   −2 log L   AIC       BIC       p-value versus unstructured
Autoregressive Moving Average      3                   12115.6    12129.6   12129.6   0.0001
MRM                                7                   12095.1    12117.1   12156.5   <0.0001
Toeplitz                           10                  12108.5    12136.5   12186.7   <0.0001
Heterogeneous Toeplitz             19                  12079.3    12125.3   12207.8   0.004
Factor analytic (2)                29                  12064.8    12130.8   12249.1   0.006
Factor analytic (1)                20                  12130.7    12178.7   12264.8   <0.0001
Heterogeneous CS                   11                  12202.1    12232.1   12286.0   <0.0001
Autoregressive (1)                 2                   12257.1    12269.1   12290.7   <0.0001
Heterogeneous Autoregressive (1)   11                  12227.8    12257.8   12311.7   <0.0001
Antedependence                     19                  12209.6    12255.6   12338.1   <0.0001
Unstructured                       55                  12016.9    12134.9   12346.6   NA
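The AIC and BIC columns of Table 4 can be approximately reproduced from the deviance and the number of parameters. The following Python sketch assumes the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p ln N. It further assumes that p counts the covariance parameters plus four fixed effects and that N is the 267 participants; these two assumptions are inferred from the tabled values rather than stated in the text, and the function names are hypothetical.

```python
import math

N_SUBJECTS = 267       # assumed sample size used in the BIC penalty
N_FIXED_EFFECTS = 4    # intercept, linear, quadratic, and cubic terms (assumed)


def aic(deviance, n_cov_params):
    """Akaike information criterion computed from the deviance (-2 log L)."""
    return deviance + 2 * (n_cov_params + N_FIXED_EFFECTS)


def bic(deviance, n_cov_params):
    """Bayesian information criterion with a log(N) penalty per parameter."""
    return deviance + (n_cov_params + N_FIXED_EFFECTS) * math.log(N_SUBJECTS)


# (pattern, number of covariance parameters, -2 log L) taken from Table 4
ROWS = [
    ("MRM", 7, 12095.1),
    ("Toeplitz", 10, 12108.5),
    ("Heterogeneous Toeplitz", 19, 12079.3),
    ("Unstructured", 55, 12016.9),
]

if __name__ == "__main__":
    for name, k, deviance in ROWS:
        print(f"{name:24s} AIC = {aic(deviance, k):.1f}  BIC = {bic(deviance, k):.1f}")
```

Under the same assumptions, the first row of the table (three covariance parameters, deviance 12115.6) would give a BIC of about 12154.7, whereas the printed BIC for that row equals its AIC.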
As can be seen, while none of the covariance patterns provides a fit statistically comparable to that of the unstructured covariance in terms of a likelihood ratio test, the MRM with random intercepts, slopes, and quadratic slopes has the smallest AIC and the second smallest BIC among all the models. BIC imposes a high penalty on models with many parameters, so it is not surprising that the unstructured covariance has the worst BIC. For this reason, Fitzmaurice et al. (2012) recommend against use of BIC for model selection of (co)variance structure. AIC is more useful for comparing models that are not nested, when a likelihood ratio test is not appropriate. Still, Table 4 suggests that the MRM provides a relatively parsimonious fit to the WECare data. Perhaps a model with both random subject effects and autocorrelated errors could be considered here.

Effect of Treatment Group on Change

At this point, the effect of treatment group on depression outcomes is examined by augmenting the model to include interactions of time with treatment group. Setting the TAU group as the reference group, two new variables are created: $\mathrm{MEDS}_i$, which equals 1 if participant $i$ was randomized to antidepressants and 0 otherwise; and $\mathrm{CBT}_i$, which equals 1 if participant $i$ was randomized to CBT and 0 otherwise. The mixed-effects model is now

\begin{align}
y_{ij} = {} & \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 \notag \\
& + \beta_4 t_{ij} \mathrm{MEDS}_i + \beta_5 t_{ij}^2 \mathrm{MEDS}_i + \beta_6 t_{ij}^3 \mathrm{MEDS}_i \notag \\
& + \beta_7 t_{ij} \mathrm{CBT}_i + \beta_8 t_{ij}^2 \mathrm{CBT}_i + \beta_9 t_{ij}^3 \mathrm{CBT}_i \notag \\
& + \upsilon_{0i} + \upsilon_{1i} t_{ij} + \upsilon_{2i} t_{ij}^2 + e_{ij}. \tag{18}
\end{align}

The parameters $\upsilon_{0i}$, $\upsilon_{1i}$, $\upsilon_{2i}$, and $e_{ij}$ have the same interpretation as in section "Curvilinear Growth Model."

The unstructured covariance model is

\begin{align}
y_{ij} = {} & \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2 + \beta_3 t_{ij}^3 \notag \\
& + \beta_4 t_{ij} \mathrm{MEDS}_i + \beta_5 t_{ij}^2 \mathrm{MEDS}_i + \beta_6 t_{ij}^3 \mathrm{MEDS}_i \notag \\
& + \beta_7 t_{ij} \mathrm{CBT}_i + \beta_8 t_{ij}^2 \mathrm{CBT}_i + \beta_9 t_{ij}^3 \mathrm{CBT}_i + \Sigma_{ij}, \tag{19}
\end{align}

where $\Sigma_{ij}$ represents the $j$th entry on the diagonal of the $n_i \times n_i$ unstructured covariance matrix for subject $i$.

Equations 18 and 19 highlight the difference between a mixed-effects model and a covariance pattern model. The mixed-effects model partitions the variance of $y_{ij}$ into between-subject variance (estimated via the random effects) and within-subject variance (estimated
via the error term). The covariance pattern model does not make this distinction. When the focus of inference is on the fixed effects in the model, this distinction is less important. In other settings, where there is interest in determining the degree of subject heterogeneity and/or examining individual subject trends, it may be more important.

In both models, $\beta_1$, $\beta_2$, and $\beta_3$ represent the linear, quadratic, and cubic effects of time for the TAU group, which has been chosen to be the reference group. The coefficients $\beta_4$, $\beta_5$, and $\beta_6$ are the time by Medication group interactions with the three time effects and indicate the difference in time trends between the Medication and TAU group. The coefficients $\beta_7$, $\beta_8$, and $\beta_9$ are the time by CBT group interactions and indicate the difference in time trends between the CBT and TAU group. A likelihood-ratio test can be used to test the null hypothesis that there is no effect of Medication versus TAU (i.e., $\beta_4$, $\beta_5$, and $\beta_6$ are zero) by fitting model 18 with and without the time by Medication interaction effects. This yields $\chi^2_3$ = 12091.2 − 12067.1 = 24.1, which has a p-value < 0.0001. A similar test for the effect of the CBT group yields $\chi^2_3$ = 12074.0 − 12067.1 = 6.9, which has a p-value of 0.075. In model 19, the corresponding likelihood ratio tests are $\chi^2_3$ = 12012.6 − 11988.9 = 23.7 (p < 0.0001) for the Medication group and $\chi^2_3$ = 11995.2 − 11988.9 = 6.3 (p = 0.10) for the CBT group. Thus, both models give similar results regarding the significance of the Medication and CBT treatment groups versus the TAU group.

Table 5 reports the results from fitting the model described in Eq. 18 to the WECare data and Table 6 reports the results from the model described in Eq. 19. As can be seen, the estimates from both models are similar.

It is interesting to note that among the time by treatment interactions, only the interaction of Medication with linear time is significant. This suggests that the effect of the Medication intervention takes place early on in the study, during the initial sharp decline in depression scores. This is clearer in Fig. 5, which displays the estimated means at each time point by treatment group using the parameter estimates in Table 5. Even though the other treatment by time interactions are not significant, their magnitude is large enough that the three different growth curves have very different shapes.

Once it has been established that the Medication intervention (but not the CBT intervention) produces significantly different outcomes from the TAU group (via likelihood ratio tests), it may be of interest to estimate the mean HDRS scores of these interventions at specific time points, their differences, and their corresponding effect sizes. This can be done using the methods described in section "Calculating Effect Sizes." For example, to calculate the effect size of the Medication intervention versus the TAU intervention at month 6, one begins by estimating the mean HDRS scores for both groups at month 6. For both Eqs. 18 and 19 the difference in mean HDRS scores at month 6 between the Medication and TAU interventions is $6\beta_4 + 6^2\beta_5 + 6^3\beta_6$. The variance at month 6 in the mixed-effects model is

\begin{align}
\mathrm{Var}(y_{ij} \mid t_{ij} = 6)
&= \mathrm{Cov}(\upsilon_{0i} + 6\upsilon_{1i} + 6^2\upsilon_{2i} + e_{i6},\ \upsilon_{0i} + 6\upsilon_{1i} + 6^2\upsilon_{2i} + e_{i6}) \notag \\
&= \mathrm{Var}(\upsilon_{0i}) + 2\,\mathrm{Cov}(\upsilon_{0i}, 6\upsilon_{1i}) + 2\,\mathrm{Cov}(\upsilon_{0i}, 6^2\upsilon_{2i}) + \mathrm{Var}(6\upsilon_{1i}) \notag \\
&\quad + 2\,\mathrm{Cov}(6\upsilon_{1i}, 6^2\upsilon_{2i}) + \mathrm{Var}(6^2\upsilon_{2i}) + \mathrm{Var}(e_{i6}) \notag \\
&= \sigma^2_{\upsilon_0} + 12\sigma_{\upsilon_0 \upsilon_1} + 72\sigma_{\upsilon_0 \upsilon_2} + 36\sigma^2_{\upsilon_1} + 432\sigma_{\upsilon_1 \upsilon_2} + 1296\sigma^2_{\upsilon_2} + \sigma^2 \notag \\
&= 54.57. \tag{20}
\end{align}

In matrix notation, this is written as

\[
\mathrm{Var}(y_{ij} \mid t_{ij} = 6) = \begin{bmatrix} 1 & 6 & 6^2 \end{bmatrix} \Sigma_\upsilon \begin{bmatrix} 1 & 6 & 6^2 \end{bmatrix}^T + \sigma^2.
\]

Using the estimates from Table 5, the effect size based on the mixed-effects model is

\[
\text{Month 6 effect size} = \frac{4.39}{\sqrt{54.57}} = 0.60.
\]

For the covariance pattern model, the variance at month 6 is simply the seventh term on the
Table 5 Results from a mixed-effects regression model fit to the WECare data
Parameter                               Symbol                          Estimate   SE      Test statistic   p-value
Intercept                               $\beta_0$                       16.330     0.34    47.96            <0.0001
Month                                   $\beta_1$                       2.081      0.36    5.84             <0.0001
Month²                                  $\beta_2$                       0.325      0.07    4.38             <0.0001
Month³                                  $\beta_3$                       0.016      0.00    3.88             0.0001
Month*MEDS                              $\beta_4$                       1.356      0.48    2.8              0.005
Month²*MEDS                             $\beta_5$                       0.099      0.10    0.96             0.34
Month³*MEDS                             $\beta_6$                       0.001      0.01    0.11             0.92
Month*CBT                               $\beta_7$                       0.424      0.49    0.87             0.38
Month²*CBT                              $\beta_8$                       0.005      0.10    0.05             0.96
Month³*CBT                              $\beta_9$                       0.002      0.01    0.39             0.70
Intercept variance                      $\sigma^2_{\upsilon_0}$         15.062     2.387
Intercept, slope covariance             $\sigma_{\upsilon_0\upsilon_1}$ 1.052      0.658
Slope variance                          $\sigma^2_{\upsilon_1}$         1.182      0.322
Intercept, quadratic slope covariance   $\sigma_{\upsilon_0\upsilon_2}$ 0.134      0.050
Slope, quadratic slope covariance       $\sigma_{\upsilon_1\upsilon_2}$ 0.078      0.025
Quadratic slope variance                $\sigma^2_{\upsilon_2}$         0.006      0.002
Error variance                          $\sigma^2$                      19.741     0.792
Note. −2 log L = 12067.1
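The month-6 variance and effect size calculation described in the text can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are made up for the example, and the covariance values are hypothetical placeholders rather than the Table 5 estimates. The sketch evaluates the matrix form of the variance for the time vector (1, t, t²) and checks it against the term-by-term expansion of Eq. 20. The last step plugs the difference (4.39) and variance (54.57) reported in the text into the effect size formula; because those inputs are rounded, the result may differ slightly from the reported 0.60.

```python
import math


def marginal_variance(t, sigma_u, sigma2_e):
    """Var(y | time = t) = z' Sigma_u z + sigma2_e with z = (1, t, t^2)."""
    z = [1.0, float(t), float(t) ** 2]
    quad = sum(z[r] * sigma_u[r][c] * z[c] for r in range(3) for c in range(3))
    return quad + sigma2_e


def effect_size(mean_difference, variance):
    """Standardized difference: mean difference divided by the standard deviation."""
    return mean_difference / math.sqrt(variance)


# Hypothetical covariance matrix of the (intercept, slope, quadratic slope)
# random effects and a hypothetical error variance; NOT the Table 5 estimates.
sigma_u = [
    [15.0, -1.0, 0.1],
    [-1.0, 1.2, -0.08],
    [0.1, -0.08, 0.006],
]
sigma2_e = 20.0

var6 = marginal_variance(6, sigma_u, sigma2_e)

# The same quantity written out term by term, mirroring the expansion in Eq. 20.
expanded = (
    sigma_u[0][0] + 12 * sigma_u[0][1] + 72 * sigma_u[0][2]
    + 36 * sigma_u[1][1] + 432 * sigma_u[1][2] + 1296 * sigma_u[2][2]
    + sigma2_e
)

# Effect size computed from the rounded difference and variance given in the text.
es_from_reported_inputs = effect_size(4.39, 54.57)

print(var6, expanded, es_from_reported_inputs)
```

Writing the variance as a quadratic form makes it easy to evaluate at any other time point by changing the first argument.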
Table 6 Results from a covariance-pattern model fit to the WECare data
Parameter        Symbol      Estimate   SE      Test statistic   p-value
Intercept        $\beta_0$   16.817     0.31    54.22            <0.0001
Month            $\beta_1$   2.118      0.39    5.49             <0.0001
Month²           $\beta_2$   0.319      0.08    4.06             <0.0001
Month³           $\beta_3$   0.015      0.00    3.61             0.0004
Month*MEDS       $\beta_4$   1.497      0.53    2.81             0.005
Month²*MEDS      $\beta_5$   0.138      0.11    1.25             0.21
Month³*MEDS      $\beta_6$   0.002      0.01    0.29             0.77
Month*CBT        $\beta_7$   0.438      0.54    0.82             0.41
Month²*CBT       $\beta_8$   0.009      0.11    0.08             0.94
Month³*CBT       $\beta_9$   0.001      0.01    0.18             0.86
Note. −2 log L = 11988.9
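The likelihood ratio tests reported in the text can be reproduced from the −2 log L values: each statistic is the difference in −2 log L between the model without and the model with the three interaction terms, referred to a chi-square distribution with 3 degrees of freedom. The following Python sketch is illustrative only. The helper names are assumptions, and the closed-form tail probability used here is valid only for 3 degrees of freedom.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


def likelihood_ratio_test(dev_reduced, dev_full):
    """Return (statistic, p-value) given -2 log L of the reduced and full models."""
    stat = dev_reduced - dev_full
    return stat, chi2_sf_3df(stat)


# -2 log L values quoted in the text for the mixed-effects model (Eq. 18).
stat_med_18, p_med_18 = likelihood_ratio_test(12091.2, 12067.1)
stat_cbt_18, p_cbt_18 = likelihood_ratio_test(12074.0, 12067.1)

# -2 log L values quoted in the text for the covariance pattern model (Eq. 19).
stat_med_19, p_med_19 = likelihood_ratio_test(12012.6, 11988.9)
stat_cbt_19, p_cbt_19 = likelihood_ratio_test(11995.2, 11988.9)

for label, stat, p in [
    ("Medication, Eq. 18", stat_med_18, p_med_18),
    ("CBT, Eq. 18", stat_cbt_18, p_cbt_18),
    ("Medication, Eq. 19", stat_med_19, p_med_19),
    ("CBT, Eq. 19", stat_cbt_19, p_cbt_19),
]:
    print(f"{label}: chi-square = {stat:.1f}, p = {p:.4f}")
```

For other degrees of freedom, a general chi-square survival function from a statistics library would be needed in place of `chi2_sf_3df`.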
diagonal of the covariance matrix, which is equal to 49.52. Thus, using parameter estimates from Table 6, the effect size based on the covariance pattern model is

\[
\text{Month 6 effect size} = \frac{4.45}{\sqrt{49.52}} = 0.62.
\]

Both effect sizes are similar and suggest a medium effect of the Medication intervention.

Fig. 5 Estimated WECare HDRS means over time by treatment group (y-axis: estimated HDRS scores; x-axis: month; groups: treatment as usual, CBT, medication)

Extensions and Alternatives

Analysis of Longitudinal Data with Missing Values

While longitudinal designs have many benefits, measuring participants repeatedly over time also leads to repeated opportunities for missing data, through failure to answer certain items, missed assessments, or permanent withdrawal
from the study. As noted above, the treatment of missing data in longitudinal studies is itself a vast literature. An important consideration when drawing inferences from longitudinal data is the reason for the missing data, also referred to as the missing data mechanism (Rubin 1976). Most of the methods described in this chapter, with the exception of GEE methods, provide valid estimates under the assumption that the missing data mechanism is missing at random (MAR) as described by Rubin (1976), where the probability that a value is missing does not depend on unobserved information such as the value itself. When data are not missing at random (NMAR), that is, when the probability that a value is missing does depend on unobserved information, it is necessary to model both the outcome and the missing data mechanism itself.
NMAR is an untestable assumption since the mechanism by definition depends on unobserved information. Thus, it is difficult to identify those situations where one is dealing with data that are NMAR. However, one situation where NMAR data are often a concern is participant drop-out, where subjects withdraw from a study and are never heard from again. In this situation, two common approaches for handling drop-outs in longitudinal designs are pattern-mixture models and shared-parameter models. In pattern-mixture models, the data are stratified by the different dropout patterns with distinct model parameters for each stratum. Marginal estimates across the patterns can be derived as a weighted average across pattern-specific estimates (Little 1995) or by using multiple imputation (Demirtas and Schafer 2003). Shared-parameter models are identified by using common random effects to relate the response with the missing-data indicator (Daniels and Hogan 2000; Guo et al. 2004).
Limitations due to space prevent an in-depth discussion of this topic. Instead, readers are referred to recent review articles including Kenward and Molenberghs (1999), Siddique et al. (2008), and Ibrahim and Molenberghs (2009). The books by Little and Rubin (2002), Fitzmaurice et al. (2012), Hedeker and Gibbons (2006), and Daniels and Hogan (2008) also contain useful material on this topic.

Generalized Estimating Equation Models

In the 1980s, alongside development of MRMs and CPMs for incomplete longitudinal data, generalized estimating equations (GEE) models were developed (Liang and Zeger 1986; Zeger and Liang 1986). Essentially, GEE models extend generalized linear models (GLMs) to the case of correlated data. This class of models has become
very popular, especially for the analysis of categorical and count outcomes, though they can be used for continuous outcomes as well. One difference between GEE models and MRMs is that GEE models are based on quasi-likelihood estimation, and so the full likelihood of the data is not specified. GEE models are termed marginal models, and they model the regression of y on x and the within-subject dependence (i.e., the association parameters) separately. The term “marginal” in this context indicates that the model for the mean response depends only on the covariates of interest, and not on any random effects or previous responses. In terms of missing data, GEE assumes that the missing data are missing completely at random (MCAR), where the probability that a value is missing does not depend on either observed or missing values. This is a stricter (and possibly less realistic) assumption than that made by the models employing full-likelihood estimation, which assume missing data are MAR.
Conceptually, GEE reproduces the marginal means of the observed data, even if some of those means have limited information because of subject drop-out. Standard errors are adjusted (i.e., inflated) to accommodate the reduced amount of independent information produced by the correlation of the repeated observations over time. By contrast, mixed-effects models use the available data from all subjects to model temporal response patterns that would have been observed had the subjects all been measured to the end of the study. Because of this, estimated mean responses at the end of the study can be quite different for GEE versus MRM, if the future observations are related to the measurements that were made during the course of the study. If the available measurements are not related to the missing measurements (e.g., following dropout), GEE and MRM will produce quite similar estimates. This is the fundamental difference between GEE and MRM, that is, the assumption that the missing data are dependent on the observed responses for a given subject during that subject’s participation in the study. It is hard to imagine that a subject’s responses that would have been obtained following dropout would be independent of their observed responses during the study. This leads to a preference for full-likelihood approaches over quasi or partial likelihood approaches, and MRM over GEE, at least for longitudinal data. There is certainly less of an argument for such a preference when data are only clustered (e.g., providers nested within clinics), in which case the advantages of MAR over MCAR are not as germane.
A basic feature of GEE models is that the joint distribution of a subject’s response vector $y_i$ does not need to be specified. Instead, it is only the marginal distribution of $y_{ij}$ at each time point that needs to be specified. To clarify this further, suppose that there are two time points and suppose that the outcome is a continuous normal random variable. GEE would only require us to assume that the distributions of $y_{i1}$ and $y_{i2}$ are two univariate normals, rather than assuming that $y_{i1}$ and $y_{i2}$ form a (joint) bivariate normal distribution. Thus, GEE avoids the need for multivariate distributions by only assuming a functional form for the marginal distribution at each time point. This leads to a simpler quasi-likelihood approach for estimating the model parameters, rather than the full-likelihood approach of the MRM and CPM. The disadvantage, as mentioned above, is that because a multivariate distribution is not specified for the response vector, the assumption for the missing data is more stringent for the GEE than for the full-likelihood estimated MRMs and CPMs. A complete treatment of GEE can be found in Hardin and Hilbe (2012).

Models for Categorical Outcomes

Reflecting the usefulness of mixed-effects modeling and the importance of categorical outcomes in many areas of research, generalization of mixed-effects models for categorical outcomes has been an active area of statistical research. For dichotomous response data, several approaches adopting either a logistic or probit regression model and various methods for incorporating and estimating the influence of the random effects have been developed (Gibbons 1981; Stiratelli et al. 1984; Wong and Mason 1985; Gibbons and Bock 1987;
Conaway 1989; Goldstein 1991). Here, a mixed-effects logistic regression model for the analysis of binary data is briefly described. Extensions of this model for analysis of ordinal, nominal, and count data are described in detail by Hedeker and Gibbons (2006).
To set the notation, let i denote individuals and let j denote the repeated measurement occasions within each individual. Assume that there are $i = 1, \ldots, N$ individuals and $j = 1, \ldots, n_i$ measurement occasions nested within each individual. Let $Y_{ij}$ be the value of the dichotomous outcome variable, coded 0 or 1. The logistic regression model is written in terms of the log odds (i.e., the logit) of the probability of a response, denoted $p_{ij}$. Considering first a random-intercept model, augmenting the logistic regression model with a single random effect yields:

\[
\ln\left[\frac{p_{ij}}{1 - p_{ij}}\right] = x'_{ij}\beta + \upsilon_{0i}
\tag{21}
\]

where $x_{ij}$ is the $(p + 1) \times 1$ covariate vector (includes a 1 for the intercept), $\beta$ is the $(p + 1) \times 1$ vector of unknown regression parameters, and $\upsilon_{0i}$ is the random subject effect. These random effects are assumed to be distributed in the population as $N(0, \sigma^2_\upsilon)$. For convenience and computational simplicity, in models for categorical outcomes the random effects are typically expressed in standardized form. For this, $\upsilon_{0i} = \sigma_\upsilon\theta_i$ and the model is given as:

\[
\ln\left[\frac{p_{ij}}{1 - p_{ij}}\right] = x'_{ij}\beta + \sigma_\upsilon\theta_i.
\tag{22}
\]

Notice that the random-effects variance term (i.e., the population standard deviation $\sigma_\upsilon$) is now explicitly included in the regression model. Thus, it and the regression coefficients are on the same scale, namely, in terms of the log-odds of a response.
The model can also be expressed in terms of a latent continuous variable y, with the observed dichotomous version Y being a manifestation of the unobserved continuous y. Here, the model is written as:

\[
y_{ij} = x'_{ij}\beta + \sigma_\upsilon\theta_i + e_{ij}
\tag{23}
\]

in which case the error term $e_{ij}$ follows a standard logistic distribution under the logistic regression model (or a standard normal distribution under the probit regression model). This representation helps to explain why the regression coefficients from a mixed-effects logistic regression model do not typically agree with those obtained from a fixed-effects logistic regression model, or for that matter from a GEE logistic regression model, which has regression coefficients that agree in scale with the fixed-effects model. In the mixed model, the conditional variance of the latent y given x equals $\sigma^2_\upsilon + \sigma^2_e$, whereas in the fixed-effects model this conditional variance equals only the latter term $\sigma^2_e$ (which equals either $\pi^2/3$ or 1 depending on whether it is a logistic or probit regression model, respectively). As a result, equating the variances of the latent y under these two scenarios yields:

\[
\beta_M \approx \beta_F \sqrt{\frac{\sigma^2_\upsilon + \sigma^2_e}{\sigma^2_e}}
\]

where $\beta_F$ and $\beta_M$ represent the regression coefficients from the fixed-effects and (random-intercepts) mixed-effects models, respectively. In practice, Zeger et al. (1988) have found that $(15/16)^2\pi^2/3$ works better than $\pi^2/3$ for $\sigma^2_e$ in equating results of logistic regression models.
Several authors have commented on the difference in scale and interpretation of the regression coefficients in mixed models and marginal models, like the fixed-effects and GEE models (Neuhaus et al. 1991; Zeger et al. 1988). Regression estimates from the mixed model have been termed “subject-specific” to reinforce the notion that they are conditional estimates, conditional on the random (subject) effect. Thus, they represent the effect of a regressor on the outcome controlling for, or holding constant, the value of the random subject effect. Alternatively, the estimates from the fixed-effects and GEE models are “marginal” or “population-averaged” estimates which indicate the effect of a regressor averaging over the population of subjects. This difference of scale
and interpretation only occurs for nonlinear regression models like the logistic regression model. For the linear model this difference does not exist.

Growth Mixture Models

A frequent characteristic of depression clinical trials (such as the WECare study) is that outcomes over time are subject to considerable between-subject heterogeneity due to the fact that patients often follow different trajectories over time. Some participants may see immediate gains, only to relapse at a later date, while others will improve gradually over time. Some participants will not improve at all. When comparing the effectiveness of different treatments, it is important to identify and take into account these different trajectories because the effectiveness of an intervention may depend on the trajectory class of the participants. Despite the fact that heterogeneity of outcomes is common in depression studies, most analyses such as mixed-effects regression models assume that all individuals are drawn from a single population with common population parameters (Muthén 2004). That is, they assume that all individual trajectories vary around a single mean trajectory. This assumption runs counter to clinical observations and empirical data where variation in trajectory shapes is routinely observed. When individuals follow several different trajectory shapes, conventional repeated measures modeling may lead to a distorted assessment of treatment effects.
Growth mixture modeling (Muthén and Shedden 1999; Muthén et al. 2002; Xu and Hedeker 2002) relaxes the single population assumption to allow for parameter differences across several unobserved populations. Instead of considering individual variation around a single trajectory, a growth mixture model (GMM) allows different classes of individuals to vary around several different trajectories. In this way, growth mixture modeling may do a better job of capturing between-subject variability because it does not require that all individuals follow the same average trajectory over time.
Once multiple trajectories have been identified, analyses can be performed to predict trajectory class as a function of other covariates. This approach is particularly useful in randomized trials because it may suggest that for some groups of individuals one treatment may be better than another treatment based on the subject’s predicted trajectory. For example, if a subject’s age, number of children, and ethnicity are predictive of a trajectory where outcomes are more favorable under medication rather than CBT, then one would consider treating a patient with similar characteristics with medication. On the other hand, it may be that a subject’s predicted trajectory suggests that both medication and CBT are effective. In that case, either treatment can be offered. In this way, growth mixture modeling may provide insights on personalized depression treatments that are tailored based on patient characteristics as well as preferences.
More specifically, let $c_i$ be a latent categorical variable representing the unobserved membership in a trajectory class for participant i, where $c_i = 1, 2, \ldots, K$. The variable c is referred to as a trajectory class variable. Define $y_{ij}$ as the outcome for participant i at time j, $j = 0, 1, \ldots, n_i$. Then, conditional on trajectory class k, the GMM augments Eq. 16 as follows

\[
(y_{ij} \mid c_i = k) = \beta_{0k} + \beta_{1k} t_{ij} + \beta_{2k} t_{ij}\,\mathrm{Trt}_i + \upsilon_{0ik} + \upsilon_{1ik} t_{ij} + e_{ijk}
\tag{24}
\]

Both the random and fixed effects have the same interpretation as before, but now they are indexed by trajectory class k, so that they may vary by trajectory class.
Class membership is expressed by a multinomial logistic regression of the form:

\[
P(c_i = k \mid x_i) = \frac{e^{x'_i\delta_k}}{\sum_{s=1}^{K} e^{x'_i\delta_s}}
\tag{25}
\]

where the variable x can represent baseline covariates. When there are only two classes, Eq. 25 is a logistic regression estimating the probability of being in one class versus another.
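To make the class membership model in Eq. 25 concrete, the following Python sketch computes class probabilities from baseline covariates using the multinomial logistic form. It is a minimal illustration under stated assumptions: the function name, the covariate coding, and all coefficient values are hypothetical, and fixing the last class's coefficients at zero (making it the reference class) is one common identification choice assumed here, not something taken from the WECare analysis.

```python
import math


def class_probabilities(x, deltas):
    """P(c = k | x) for each class k under a multinomial logistic model.

    x      : covariate vector, including a leading 1 for the intercept
    deltas : one coefficient vector per class
    """
    scores = [sum(xj * dj for xj, dj in zip(x, d)) for d in deltas]
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients for K = 3 classes; each vector is (intercept, male).
# The last class is the reference class, with coefficients fixed at zero.
deltas = [
    [0.5, 0.7],
    [-0.2, 0.3],
    [0.0, 0.0],
]

p_male = class_probabilities([1.0, 1.0], deltas)
p_female = class_probabilities([1.0, 0.0], deltas)

# Odds of class 1 versus the reference class, compared between the two groups.
odds_male = p_male[0] / p_male[2]
odds_female = p_female[0] / p_female[2]
odds_ratio = odds_male / odds_female

print(p_male, p_female, odds_ratio)
```

With this coding, the ratio of class 1 to reference-class odds for the two covariate values equals the exponentiated coefficient of the binary covariate.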
For binary variables x in Eq. 25, $e^{\delta}$ can be interpreted as the odds ratio of being in one class versus another. For example, if x is gender, then one can estimate the odds of a male participant, relative to a female participant, being in one trajectory class versus another.
The number of trajectories in a GMM must be specified a priori. Typically, several GMMs are fit assuming a different number of trajectory classes and the “correct” number of trajectories is chosen based on model fit criteria such as BIC. See Muthén et al. (2002) and Muthén et al. (2009) for more detail on fitting GMMs in clinical trial settings and Siddique et al. (2012) for an example of a GMM fit to the WECare data.

Discussion

This chapter reviewed methods for the analysis of longitudinal data commonly encountered in health services research. The chapter began by discussing issues inherent in longitudinal data and then described methods for analyzing these data, focusing on linear mixed-effects models and covariance-pattern models for continuous data. These methods were applied to data from a longitudinal depression treatment trial, going into specific detail on model selection, estimation of treatment effects, calculation of effect sizes, and interpretation.
Data from health services research are often missing and/or not continuous. These types of data suggest the use of models in addition to those discussed in this chapter. Due to space limitations, extended models for missing data and nonlinear models for noncontinuous data were only briefly mentioned. As described, MRMs and CPMs do allow for missing data and provide valid results under the assumption of missing at random (MAR). Thus, the extended missing data models are useful to the extent that researchers suspect that the missing data are missing not at random, a situation that is impossible to ascertain with the observed data. Finally, the chapter briefly described generalized estimating equation (GEE) models and growth mixture models (GMMs) for longitudinal data, noting some distinguishing features of these classes of models relative to MRMs and CPMs.
Mixed-effects models, which allow one to estimate subject-specific change over time and provide valid estimates in the presence of data missing at random, should be considered the preferred methodology for analysis of longitudinal data by health services researchers. Most current statistical software packages include functions for estimating MRMs and their various extensions, thus making them easily accessible to the interested researcher.

Acknowledgments The authors wish to thank Jeanne Miranda for use of the WECare data. Dr. Siddique’s work was supported by grant K07 CA154862-01 from the National Cancer Institute and R03 HS018815-01 from the Agency for Healthcare Research and Quality. Dr. Hedeker’s work was supported by Award Number P01 CA098262 from the National Cancer Institute. Dr. Gibbons’ work was supported by R01 MH8012201 from the National Institute of Mental Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Cancer Institute, Agency for Healthcare Research and Quality, or the National Institutes of Health.

References

Berkhof J, Snijders TAB. Variance component testing in multilevel models. J Educ Behav Stat. 2001;26:133–52.
Bock RD. Multivariate statistical methods in behavioral research. New York: McGraw-Hill; 1975.
Bock RD. Within-subject experimentation in psychiatric research. In: Gibbons RD, Dysken MW, editors. Statistical and methodological advances in psychiatric research. New York: Spectrum; 1983a. p. 59–90.
Bock RD. The discrete Bayesian. In: Wainer H, Messick S, editors. Modern advances in psychometric research. Hillsdale: Erlbaum; 1983b. p. 103–15.
Bock RD. Measurement of human variation: a two stage model. In: Bock RD, editor. Multilevel analysis of educational data. New York: Academic; 1989.
Bryk AS, Raudenbush SW. Hierarchical linear models: applications and data analysis methods. Newbury Park: Sage; 1992.
Chi EM, Reinsel GC. Models for longitudinal data with random effects and AR(1) errors. J Am Stat Assoc. 1989;84:452–9.
Conaway MR. Analysis of repeated categorical measurements with conditional likelihood methods. J Am Stat Assoc. 1989;84:53–61.
Daniels MJ, Hogan JW. Reparameterizing the pattern mixture model for sensitivity analyses under informative dropout. Biometrics. 2000;56:1241–8.
Daniels MJ, Hogan JW. Missing data in longitudinal studies: strategies for Bayesian modeling and sensitivity analysis. New York: Chapman & Hall/CRC; 2008.
de Leeuw J, Kreft I. Random coefficient models for multilevel analysis. J Educ Stat. 1986;11:57–85.
Demirtas H, Schafer JL. On the performance of random-coefficient pattern-mixture models for nonignorable dropout. Stat Med. 2003;22:2553–75.
Dempster AP, Rubin DB, Tsutakawa RK. Estimation in covariance component models. J Am Stat Assoc. 1981;76:341–53.
Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of longitudinal data. 2nd ed. New York: Oxford University Press; 2002.
Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2012.
Gibbons RD. Trend in correlated proportions. PhD thesis, University of Chicago, Department of Psychology, 1981.
Gibbons RD, Bock RD. Trend in correlated proportions. Psychometrika. 1987;52:113–24.
Gibbons RD, Hedeker D, Waternaux CM, Davis JM. Random regression models: a comprehensive approach to the analysis of longitudinal psychiatric data. Psychopharmacol Bull. 1988;24:438–43.
Goldstein H. Multilevel mixed linear model analysis using iterative generalized least squares. Biometrika. 1986;73:43–56.
Goldstein H. Nonlinear multilevel models, with an application to discrete response data. Biometrika. 1991;78:45–51.
Goldstein H. Multilevel statistical models. 4th ed. Hoboken: Wiley; 2011.
Grady JJ, Helms RW. Model selection techniques for the covariance matrix for incomplete longitudinal data. Stat Med. 1995;14:1397–416.
Guo W, Ratcliffe SJ, Ten Have TR. A random pattern-mixture model for longitudinal data with dropouts. J Am Stat Assoc. 2004;99:929–37.
Hardin JW, Hilbe JM. Generalized estimating equations. 2nd ed. New York: Chapman and Hall; 2012.
Hedeker D. Random regression models with autocorrelated errors. PhD thesis, University of Chicago, Department of Psychology, 1989.
Hedeker D, Gibbons RD. Longitudinal data analysis. New York: Wiley; 2006.
Hui SL, Berger JO. Empirical Bayes estimation of rates in longitudinal studies. J Am Stat Assoc. 1983;78:753–9.
Ibrahim J, Molenberghs G. Missing data methods in longitudinal studies: a review (with discussion). TEST. 2009;18:1–43.
Kaplan D, George R. Evaluating latent growth models through ex post simulation. J Educ Behav Stat. 1998;23:216–35.
Kenward MG, Molenberghs G. Parametric models for incomplete continuous and categorical longitudinal data. Stat Methods Med Res. 1999;8(1):51–83.
Laird NM. Missing data in longitudinal studies. Stat Med. 1988;7:305–15.
Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–74.
Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
Little RJA. Modeling the drop-out mechanism in repeated-measures studies. J Am Stat Assoc. 1995;90:1112–21.
Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. New York: Wiley; 2002.
Longford NT. A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika. 1987;74:817–27.
Longford NT. Random coefficient models. New York: Oxford University Press; 1993.
Miranda J, Chung JY, Green BL, Krupnick J, Siddique J, Revicki DA, Belin T. Treating depression in predominantly low-income young minority women. J Am Med Assoc. 2003;290:57–65.
Miranda J, Chung JY, Green BL, Krupnick J, Siddique J, Revicki DA. One year outcomes of treating depression in predominantly low-income young minority women. J Consult Clin Psychol. 2006;74:99–111.
Muthén BO. Latent variable analysis: growth mixture modeling and related techniques for longitudinal data. In: Kaplan D, editor. Handbook of quantitative methodology for the social sciences. Newbury Park: Sage; 2004.
Muthén B, Shedden K. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics. 1999;55:463–9.
Muthén B, Brown CH, Masyn K, Jo B, Khoo ST, Yang CC, Wang CP, Kellam SG, Carlin JB, Liao J. General growth mixture modeling for randomized preventive interventions. Biostatistics. 2002;3(4):459–75.
Muthén BO, Brown CH, Leuchter A, Hunter A. General approaches to analysis of course: applying growth mixture modeling to randomized trials of depression medication. In: Shrout PE, editor. Causality and psychopathology: finding the determinants of disorders and their cures. Washington, DC: American Psychiatric Publishing; 2009. Forthcoming.
Neuhaus JM, Kalbfleisch JD, Hauck WW. A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int Stat Rev. 1991;59:25–35.
Potthoff RF, Roy SN. A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika. 1964;51:313–6.
Raudenbush SW, Bryk AS. A hierarchical model for studying school effects. Sociol Educ. 1986;59:1–17.
Raudenbush SW, Bryk AS. Hierarchical linear models. 2nd ed. Thousand Oaks: Sage; 2002.
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.
Siddique J, Brown CH, Hedeker D, Duan N, Gibbons RD, Miranda J, Lavori PW. Missing data in longitudinal trials–part B, analytic issues. Psychiatr Ann. 2008;38(12):793–801.
Siddique J, Chung JY, Brown CH, Miranda J. Comparative effectiveness of medication versus cognitive behavioral therapy in a randomized controlled trial of low-income young minority women with depression. J Consult Clin Psychol. 2012;80:995–1006.
Singer JD, Willett JB. Applied longitudinal data analysis. New York: Oxford University Press; 2003.
Stiratelli R, Laird NM, Ware JH. Random-effects models for serial observations with binary response. Biometrics. 1984;40:961–71.
Strenio JF, Weisberg HI, Bryk AS. Empirical Bayes estimation of individual growth curve parameters and their relationship to covariates. Biometrics. 1983;39:71–86.
Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.
Weiss RE. Modeling longitudinal data. New York: Springer; 2005.
Wolfinger RD. Covariance structure selection in general mixed models. Commun Stat Simul Comput. 1993;22:1079–106.
Wong GY, Mason WM. The hierarchical logistic regression model for multilevel analysis. J Am Stat Assoc. 1985;80:513–24.
Xu W, Hedeker D. A random-effects mixture model for classifying treatment response in longitudinal clinical trials. J Biopharm Stat. 2002;11:253–73.
Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics. 1986;42:121–30.
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988;44:1049–60.
Competing Risk Models
19
Melania Pintilie
Contents
Introduction
Motivation and Examples
The Need to Analyze Time to Event of Interest
The Follicular Lymphoma Example
The Pressure Ulcer Healing (PUH) Example
Estimation of the Probability of Event
Necessity for Special Techniques
Nonparametric Estimation of Probability of Event in the Presence of Competing Risks
The Justification of the Kalbfleisch and Prentice Formula (1)
The Intuitive Justification for Formula (1)
Confidence Intervals
Theoretical Background
General Remarks
A Theoretical Example
Regression Model
Fine and Gray Model
Interpretation of the Fine and Gray Model
Cox Regression in the Presence of Competing Risks
Other Developments
Analyzing Correlated Data
Analyzing Case-Cohort Design
Sample Size and Power
Software
References
M. Pintilie (*)
University Health Network, Toronto, ON, Canada
e-mail: pintilie@uhnresearch.ca

https://doi.org/10.1007/978-1-4939-8715-3_30
434 M. Pintilie
Abstract

In the time-to-event analysis when more than one type of event can occur and not all are of interest, the situation of competing risks appears. In this chapter the competing risks will be defined, and the need for special statistical analysis techniques will be justified. The methodology for estimation and modeling in the presence of competing risks will be presented. The cumulative incidence function and the Fine and Gray model will be introduced as the main methods to analyze competing risks data. The cumulative incidence function will be contrasted to Kaplan-Meier method. For a deeper understanding of the modelling, the subdistribution hazard will be defined.

The importance of considering the competing risks in the process of designing a study will be emphasized, and the steps needed to be taken in the calculation will be presented. For a better understanding of the material and of the interpretation, examples will be given at each step.

Introduction

Motivation and Examples

In the time-to-event analysis, the outcome is given by two pieces of information: a continuous part representing the duration of time under the follow-up and a binary part indicating whether at that time an event was observed or not. The observation for which the event was not observed is called censored. It is assumed that with enough follow-up, the events will be observed for all observations. The obvious example is the time to death. Death is an event that eventually will be observed for each observation. However, in some situations more than one type of event can happen. The occurrence of one type of event can hinder the observation or change the probability of other types of events being observed. Such a situation is called competing risks. Gooley et al. (1999) gave a formal definition of a competing risks situation as:

. . . the event whose occurrence either precludes the occurrence of another event under investigation or fundamentally alters the probability of occurrence of this other event.

As an example, consider a cohort of patients with chronic kidney disease. The interest is to study the time to dialysis. However, patients could die due to comorbidities and never reach the point of dialysis. The death before dialysis initiation is a competing risks event.

The time to local recurrence as the event of interest in cancer treatment is another example. In this case, the occurrence of a distant recurrence could be considered a competing risks event because the treatment for such a recurrence could change the probability of developing a local recurrence.

The existence of competing risks was recognized by David and Moeschberger (1978) in their monograph, and later Kalbfleisch and Prentice (1980) introduced a nonparametric estimation of the probability of the event of interest. And yet in medical research, it was completely ignored until recently. Most of the statistical analyses published before 1990 used inappropriate techniques to analyze the time-to-event data when a competing risk was present. Basically the competing risks event was considered censored as if for that observation the event of interest could still be observed in the future.

The Need to Analyze Time to Event of Interest

There are many examples in medical research when it is important to study a specific event of interest.

In cancer research one of the standard treatments is radiation therapy (RT). RT treats a small part of the body, where the tumor is. Thus, if one is interested in the effect of the treatment, it is fair to think of the effect in the treated area, local control of disease. Yet, a patient could experience other events: a relapse in a different part of the body, another malignancy, or death of a different cause.

Sometimes all the events (event of interest and the competing risks event) are combined in a
19 Competing Risk Models 435
composite end point. However, this approach could diminish the effect on the event of interest or even suggest a totally different conclusion.

Sometimes a composite end point is not feasible. During the treatment a patient needs to be assisted by temporarily inserting a feeding tube. After the treatment and as the patient recovers, the tube is taken out. The time at which the tube is taken out could be considered as a surrogate for response. This end point cannot be considered together with death, for example, as the former is a positive outcome and the latter a negative one.

The following two examples will be utilized throughout this chapter to illustrate the different aspects of the analysis in the presence of competing risks. The datasets were slightly modified to help illustrate competing risks analysis. Clinical conclusions cannot be drawn from these analyses.

The Follicular Lymphoma Example

Consider as an example a cohort of patients with early-stage follicular lymphoma with the follow-up ranging between 1 and 31 years. For this disease, the prognosis is good with 10-year survival of approximately 75%. These patients could experience relapses (local and/or distant), a second malignancy, or die of other causes. Each of these events can be of interest with the rest being competing risks with the exception of death which cannot have any competing risks.

The Pressure Ulcer Healing (PUH) Example

This is a cohort of patients with advanced illness who were admitted to a palliative care center and followed until death (Maida et al. 2008, 2012). All patients had at least one pressure ulcer at the time of admittance, and the time from admittance to complete healing was recorded for all pressure ulcers that healed. The life expectancy for the cohort is low with median survival less than a month. The goal of this analysis is to study the time to pressure ulcer healing as a function of a patient’s Palliative Performance Scale status, an important clinical factor which for this analysis is dichotomized at 40, bedridden vs. ambulatory. If a patient had more than one pressure ulcer, one was chosen at random for analysis to avoid having to deal with the added complexity of correlated observations (see section “Analyzing Correlated Data”). Dr. Vincent Maida and Dr. Marguerite Ennis graciously allowed the use of the pressure ulcer healing data (Maida et al. 2012) for the illustration of the concepts of this chapter.

Estimation of the Probability of Event

Necessity for Special Techniques

In the presence of competing risks, the estimates based on the Kaplan-Meier (KM) method when the competing risk is censored are not probabilities. This concept is illustrated using the cohort of follicular lymphoma described in section “The Follicular Lymphoma Example.” The event of interest is the time to second malignancy following the lymphoma diagnosis. For the moment, the competing risks (the deaths without second malignancy) are ignored and censored. With this assumption, KM estimates can be obtained. KM estimates can also be calculated for the deaths without second malignancy as event and with the second malignancy censored. If the KM estimates can be interpreted as probabilities, then the calculated 1-KM would be the probability for each of the two specific types of event to happen. Since the two types of events are mutually exclusive, the sum of the 1-KM estimates calculated at each time point should be the probability of any of the two events to occur, namely, the probability for either second malignancy or death without second malignancy. In Fig. 1, the broken line is the 1-KM estimate for the second malignancy, while the solid line represents the sum of the 1-KM estimates for second malignancy and the death without second malignancy. The fact that the top line goes beyond the possible upper limit of a probability is a proof that
Fig. 1 (1-KM) estimates for second malignancy and death without second malignancy in the follicular lymphoma dataset (x-axis: Time (years); y-axis: 1-KM estimates; curves: 1-KM for second malignancy, and the sum with 1-KM for death without second malignancy)

Fig. 2 Plot of the CIF and 1-KM for second malignancy in follicular lymphoma (x-axis: Time to second malignancy; y-axis: Estimates for the probability of second malignancy; curves: CIF, 1-KM)
when competing risks are present, the KM estimates cannot be interpreted as probabilities.

Nonparametric Estimation of Probability of Event in the Presence of Competing Risks

Kalbfleisch and Prentice (1980) modified the KM estimator to obtain the probability of event in the presence of competing risks. Briefly, suppose $t_1 < t_2 < \ldots$ are the ordered time points for all types of events, $n_j$ are the number at risk at time $t_j$, and $d_{ev,j}$ are the number of events of interest at time $t_j$. The probability of event can be estimated as:

$$\hat{F}_{ev}(t) = \sum_{\text{all } j,\ t_j \le t} \frac{d_{ev,j}}{n_j}\,\hat{S}(t_{j-1}) \tag{1}$$

Here $\hat{S}(t_{j-1})$ is the KM estimate for the complement of the probability of all types of events. In the literature, $\hat{F}_{ev}(t)$ is sometimes called cumulative incidence function (CIF). Figure 2 shows the estimation based on (1) and on the KM method for the second malignancy in follicular
Fig. 3 Probability of second malignancy and death without second malignancy in the follicular lymphoma (x-axis: Time to second malignancy; y-axis: Estimates for the probability of second malignancy; curves: probability of second malignancy, probability of second malignancy or death; the gap between them is the probability for death without second malignancy)
lymphoma. Clearly the two are different, and the 1-KM estimates are larger than the CIF. It can be proven algebraically that in the presence of competing risks, 1-KM is always larger than the CIF.

The Justification of the Kalbfleisch and Prentice Formula (1)

The well-known formula for the KM estimates can be written as a sum for its complement, the estimator for the probability of all events:

$$\hat{F}(t) = 1 - \hat{S}(t) = 1 - \prod_{t_j \le t} \frac{n_j - d_j}{n_j} = \sum_{t_j \le t} \frac{d_j}{n_j}\,\hat{S}(t_{j-1}) \tag{2}$$

where $t_1 < t_2 < \ldots$ are the ordered time points for the events, $n_j$ are the number at risk at time $t_j$, and $d_j$ are the number of events at time $t_j$. Suppose that there are two types of events which can occur at time $t_j$; then the total number of events that can happen can be written as the sum of the number of events of type 1, $d_{1j}$, and number of events of type 2, $d_{2j}$. Then the probability of all events (2) can be written as a sum of the probabilities of the two types of events:

$$\hat{F}(t) = \sum_{t_j \le t} \frac{d_j}{n_j}\,\hat{S}(t_{j-1}) = \sum_{t_j \le t} \frac{d_{1j} + d_{2j}}{n_j}\,\hat{S}(t_{j-1}) = \sum_{t_j \le t} \frac{d_{1j}}{n_j}\,\hat{S}(t_{j-1}) + \sum_{t_j \le t} \frac{d_{2j}}{n_j}\,\hat{S}(t_{j-1}) = \hat{F}_1(t) + \hat{F}_2(t) \tag{3}$$

It is easy to recognize the formula for the estimation of the probability of the event of interest (1) in the two terms in (3). Thus the probability of all events can be partitioned in the probabilities of the constituent types of events. Figure 3 shows the partition of the probability of second malignancy or death into the probability of second malignancy and the probability of death without second malignancy in the follicular lymphoma dataset.

The Intuitive Justification for Formula (1)

In the absence of censoring or competing risks, the estimation of the probability of event for a time point $t_0$ using the KM method gives an identical result to the intuitive calculation of the ratio between the number of events that occurred before $t_0$ and the total number of subjects. In this sense the KM method can be considered an extension for calculating the probability of event in the presence
of censoring. Along the same lines, the CIF estimation (given by (1)) is an extension of the KM method for calculating the probability of event in the presence of competing risks. Thus, if competing risks are not present, the CIF is identical to 1-KM. If competing risks exist but there is no censoring, the CIF is identical to the ratio of the number of events of interest to the number of subjects. To illustrate, the follow-up of the follicular lymphoma dataset was completed to 10 years in an artificial way.

Note that (Table 1) the CIF is identical to the naïve estimates (the ratio between the number of events up to the time point of interest and the total number of subjects) while the 1-KM estimates are larger. It must be emphasized that the equality between the CIF and the naïve estimates holds only when there are no censored observations with shorter follow-up time than the time point at which the calculation is made.

Table 1 Table of percentages
Time point   CIF (%)   1-KM (%)   Naive estimates (%)
1            1.5       1.5        1.5
2            3.1       3.2        3.1
3            3.7       3.8        3.7
4            4.1       4.2        4.1
5            4.6       4.9        4.6
6            6.1       6.7        6.1
7            6.8       7.6        6.8
8            7.2       8.1        7.2
9            7.8       8.9        7.8
10           7.9       9.2        7.9

Confidence Intervals

As for any estimation, it is desirable to be able to assess the degree of confidence. The customary way is to present the 95% confidence interval of the estimate. This involves the knowledge of the distribution of the estimate and its variance. The ln(1-CIF) can be considered to be normally distributed. The variance can be calculated in several ways, but the differences are minimal (Pintilie 2006). In general the software calculates the variance but may not give the confidence interval. The confidence interval can be calculated using the same technique as in a noncompeting risks situation (Kalbfleisch and Prentice 1980). If cCIF is the complement of CIF (i.e., 1-CIF), then the limits of the confidence interval for cCIF are given by:

$$\mathrm{cCIF}^{\exp(\pm A)}, \quad \text{where } A = \frac{z_{1-\alpha/2}\,\sqrt{\widehat{\mathrm{Var}}(\mathrm{cCIF})}}{\mathrm{cCIF}\,\ln(\mathrm{cCIF})} \tag{4}$$

and $z_{1-\alpha/2}$ is the quantile of the standard normal distribution; for a 95% confidence interval, $z_{1-\alpha/2} = 1.96$.

Theoretical Background

This presentation of the competing risks issue is not intended to be mathematical in nature. However, for a thorough understanding of the subject, it was decided to include some theoretical details. The reader who is not mathematically inclined could skip this section.

General Remarks

In statistics there are two interrelated functions: the density, usually denoted by f, and the distribution function, usually denoted by F. The known bell shape of the normal distribution is the plot of the density function. The integral to a certain point x measures the area under that curve, F(x), and it represents the probability that a number generated from the normal distribution is smaller than x. (See Fig. 4.)

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du \tag{5}$$

In the time-to-event analysis, the distribution function appears usually as its complementary function $(1 - F(x))$ and is called the survivor
Fig. 4 The density function of the standard normal distribution (x-axis: x; y-axis: f(x); the area under the curve up to x is labeled F(x))
function, usually denoted by S(t). An important property of any distribution function, F, is that it is an increasing function ranging between 0 and 1. Another important function in the time-to-event analysis is the hazard function, h(t), which is the instantaneous risk for event. It can be calculated as the ratio between the density and the survivor function. For the exponential distribution, the density, distribution, and the hazard functions are:

$$f(t) = \lambda e^{-\lambda t}, \qquad F(t) = 1 - e^{-\lambda t}, \qquad h(t) = \frac{f(t)}{1 - F(t)} = \lambda \tag{6}$$

The importance of the hazard stems from the way modeling is performed. If the choice is for parametric modeling, the decision on the distribution is based on the shape of the hazard. If the modeling is performed utilizing the ubiquitous Cox proportional hazards model, then the hazard itself is modeled.

A Theoretical Example

It was shown in (3) that the estimator of the probability of all events can be partitioned in the probabilities of the constituent types of events. This can be formulated more generally as:

$$F(t) = P(T \le t) = P(T \le t,\ ev = 1) + P(T \le t,\ ev = 2) = F_1(t) + F_2(t) \tag{7}$$

where it is assumed that there are only two types of events and ev = 1 and ev = 2 refer to events of types 1 and 2, respectively. As mentioned above, F(t) is an increasing function ranging between 0 and 1. Since all terms are probabilities and thus positive, and the probability of all events is at most 1, it follows that each of the probabilities for a specific event can reach only a value p < 1. Thus the probability of one event in the presence of another event ranges between 0 and a value p < 1. It follows that $F_1$ and $F_2$ cannot be regarded as true, proper distributions. They are called subdistributions.

For each of these subdistributions, there is a subdensity ($f_1$ and $f_2$). The hazard for event of type 1 can be defined in two ways:

$$\tilde{h}_1(t) = \frac{f_1(t)}{1 - F(t)} \tag{8}$$

$$\tilde{\gamma}_1(t) = \frac{f_1(t)}{1 - F_1(t)} \tag{9}$$

Each of these hazards can be modeled, and the results could be different as is their interpretation. The hazard from (8) is called the subhazard while the hazard from (9) is called the subdistribution hazard.
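The partition in (3) and (7) can be checked numerically with a direct implementation of formula (1). What follows is only a minimal sketch, not the software discussed in this chapter: the function names, the status coding (0 = censored, 1 = event of interest, 2 = competing risks event), and the eight-subject toy dataset are assumptions made for illustration.

```python
import numpy as np


def one_minus_km(times, status, cause=None):
    """1-KM evaluated on the grid of all event times.

    cause=None counts every event type (complement of overall survival);
    otherwise only `cause` counts as an event and other types are censored.
    """
    times = np.asarray(times, dtype=float)
    status = np.asarray(status)
    grid = np.unique(times[status != 0])
    surv, out = 1.0, []
    for t in grid:
        n_at_risk = np.sum(times >= t)
        if cause is None:
            d = np.sum((times == t) & (status != 0))
        else:
            d = np.sum((times == t) & (status == cause))
        surv *= 1.0 - d / n_at_risk
        out.append(1.0 - surv)
    return grid, np.array(out)


def cumulative_incidence(times, status, cause):
    """CIF for one event type, following formula (1)."""
    times = np.asarray(times, dtype=float)
    status = np.asarray(status)
    grid = np.unique(times[status != 0])
    surv_prev, cif, out = 1.0, 0.0, []
    for t in grid:
        n_at_risk = np.sum(times >= t)
        d_all = np.sum((times == t) & (status != 0))
        d_ev = np.sum((times == t) & (status == cause))
        cif += surv_prev * d_ev / n_at_risk   # S(t_{j-1}) * d_ev,j / n_j
        surv_prev *= 1.0 - d_all / n_at_risk  # update overall KM to S(t_j)
        out.append(cif)
    return grid, np.array(out)


# Hypothetical toy data: 0 = censored, 1 = event of interest, 2 = competing event
times = [1, 2, 3, 4, 5, 6, 7, 8]
status = [1, 2, 0, 1, 2, 1, 0, 2]

grid, cif1 = cumulative_incidence(times, status, cause=1)
_, cif2 = cumulative_incidence(times, status, cause=2)
_, all_events = one_minus_km(times, status)
_, naive_km1 = one_minus_km(times, status, cause=1)

print(np.allclose(cif1 + cif2, all_events))  # the partition in (3) holds
print(np.all(naive_km1 >= cif1 - 1e-12))     # 1-KM is never below the CIF here
```

On this toy dataset the two subdistribution estimates add up to the complement of the overall KM estimate, and the 1-KM curve obtained by censoring the competing event lies on or above the CIF, which mirrors the behavior described for Fig. 2.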
As a theoretical example, consider the subdistribution for an event of interest which is exponentially distributed. Under the latent failure time model in which the event of interest and the competing risks event are independent and exponentially distributed with the parameter $\lambda_1$ and $\lambda_2$, respectively, the subdistribution for the event of interest is:

$$F_1(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2}\left[1 - e^{-(\lambda_1 + \lambda_2)t}\right] \tag{10}$$

Note that the quantity which is in brackets is the distribution function of an exponential distribution with parameter $\lambda_1 + \lambda_2$. As a distribution function, this quantity spans the 0–1 interval. On the other hand, the factor with which this is multiplied is a positive quantity less than 1. Therefore, the maximum of this function is $\frac{\lambda_1}{\lambda_1 + \lambda_2}$, a quantity less than 1. The two hazards are:

$$\tilde{h}_1(t) = \lambda_1 \tag{11}$$

$$\tilde{\gamma}_1(t) = \frac{f_1(t)}{1 - F_1(t)} = \frac{\lambda_1(\lambda_1 + \lambda_2)\,e^{-(\lambda_1 + \lambda_2)t}}{\lambda_2 + \lambda_1 e^{-(\lambda_1 + \lambda_2)t}} \tag{12}$$

Note that the subhazard is the same as the hazard of the marginal distribution. This is always true under the latent failure time assumption and if the two types of event are independent. This lends itself easily to a nice interpretation of the effect in the absence of the other event. However, the assumption of independence cannot be proven and rarely can be made (Tsiatis 1975). In the absence of independence, the analysis of the subhazard cannot be interpreted.

When the two events are not independent, the subhazard is no longer the hazard of the marginal model:

$$\tilde{h}_1(t) = \lambda_1 + \mu t \tag{13}$$

where $\mu$ is the parameter which controls the level of dependence between the two types of events. In contrast, the analysis of the subdistribution hazard does not assume independence, and it can be interpreted as reflecting the observable effect. More on the interpretation is presented in section “Interpretation of the Fine and Gray Model.”

Regression Model

Fine and Gray Model

The Fine and Gray model (1999) is an extension of the Cox model to the situation of competing risks. The effects are estimated by maximizing the pseudo-likelihood, which is a function that depends on the observed covariates and the order in which the events were observed. As in the Cox regression, the hazard is modeled as:

$$\gamma(t \mid x) = \gamma_0(t)\,e^{\beta x} \tag{14}$$

where x is the covariate, $\gamma_0$ is the baseline hazard, and $\beta$ is the coefficient estimated by maximizing the pseudo-likelihood given by:

$$PL(\beta) = \prod_{j=1}^{r} \left( \frac{e^{\beta x_j}}{\sum_{i \in R_j} w_{ij}\,e^{\beta x_i}} \right) \tag{15}$$

where r is the number of events of interest and $R_j$ is the risk set at time $t_j$. This formula is written only for one covariate but it can easily be extended to many covariates. The difference between (15) and the partial likelihood of Cox regression is the weight $w_{ij}$ and the risk set. In Cox regression the risk set is defined as the set of observations with longer observed time than the current event. In addition, for the Fine and Gray model, the risk set also includes all the competing risks events at all time points regardless of the time at which the competing risk was observed. The involvement of the competing risks event is mitigated by the weight: the longer the duration between the current event and the observed competing risks event, the smaller the weight. For example, a competing risks event which happens at 2 years participates fully in the pseudo-likelihood for the terms before 2 years and participates less and less in the pseudo-likelihood for the terms which are farther and farther from
Fig. 5 The risk set for Cox regression (a) and Fine and Gray regression (b) (time axis with events marked j = 1 to j = 6; panel a shows the risk sets R1 to R6; panel b shows risk sets together with the weights w32, w42, w62, and w65)
2 years. The weights are based on the distribution of the censored time.

In the two diagrams in Fig. 5, the horizontal line represents the time axis, the black circles represent the individual for which the event of interest is observed, the vertical lines are for the censored observations, and the purple crosses in diagram B represent the competing risks events. In diagram A there are no competing risks and all the individuals with the observed time larger than the individual for which the partial likelihood is written are in the risk set. For diagram B the competing risks are always in the risk set, every time with a different weight. Thus, the weight for the individual marked with j = 2 is one for the term j = 1, $w_{32}$ for j = 3, $w_{42}$ for j = 4, and $w_{62}$ for j = 6 where $1 \ge w_{32} \ge w_{42} \ge w_{62} \ldots$.

Interpretation of the Fine and Gray Model

The Fine and Gray regression (1999) models the subdistribution hazard (9). The exponent of a coefficient can be interpreted as the subdistribution hazards ratio. As in the Cox regression, the assumption of proportionality of hazards is made and can be checked by visually inspecting the Schoenfeld-type residuals.

Consider the example in section “The Follicular Lymphoma Example” with three types of events: second malignancy, disease failure (relapse), and death without second malignancy or disease failure. Any of these events could be considered as event of interest with the rest of them as competing risks. The types of events and their frequency are listed in Table 2.

Table 2 Types of events
Type of event                                Frequency
Second malignancy                            56          Event of interest
Relapse before second malignancy             260         Competing risks event
Death without relapse or second malignancy   54          Competing risks event
Censored                                     171
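The weighting scheme described for Fig. 5 can be sketched numerically. The chapter states only that the weights are based on the distribution of the censoring time; the specific form used below, w = G(t_j) / G(T_i) for a subject whose competing risks event occurred at T_i before the event time t_j, with G the Kaplan-Meier estimate of the censoring survivor function, is the form commonly attributed to Fine and Gray (1999) and is an assumption here. The function names, the status coding (0 = censored, 1 = event of interest, 2 = competing risks event), and the toy data are likewise hypothetical, and ties and left-limit conventions are ignored.

```python
import numpy as np


def censoring_survival(times, status):
    """Return G(t), the KM estimate of the censoring survivor function.

    Censored observations (status 0) play the role of events here.
    """
    times = np.asarray(times, dtype=float)
    status = np.asarray(status)
    cens_times = np.unique(times[status == 0])

    def G(t):
        g = 1.0
        for u in cens_times[cens_times <= t]:
            n_at_risk = np.sum(times >= u)
            c = np.sum((times == u) & (status == 0))
            g *= 1.0 - c / n_at_risk
        return g

    return G


def fine_gray_weights(times, status, subject, event_code=1):
    """Weight of one competing-risks subject in each pseudo-likelihood term."""
    times = np.asarray(times, dtype=float)
    status = np.asarray(status)
    G = censoring_survival(times, status)
    t_i = times[subject]
    weights = {}
    for t_j in np.unique(times[status == event_code]):
        if t_j <= t_i:
            weights[float(t_j)] = 1.0               # still genuinely at risk
        else:
            weights[float(t_j)] = G(t_j) / G(t_i)   # assumed weight form
    return weights


# Hypothetical toy data; the subject at index 1 has a competing event at time 2
times = [1, 2, 3, 4, 5, 6]
status = [1, 2, 0, 1, 0, 1]
w = fine_gray_weights(times, status, subject=1)
print(w)  # weights at the event times 1, 4 and 6
```

In this sketch the subject with the competing risks event contributes with weight one to the term written before its own event time and with steadily smaller weights afterwards, which is the ordering described for the individual marked j = 2 in diagram B.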
Fig. 6 The probabilities for the three types of event (x-axis: Time to event; y-axis: Probability of event; curves: second malignancy, disease failure, death)
From Table 2 it is apparent that the most frequent type of event is disease failure. Figure 6 shows that the disease failures occur shortly after the initial diagnosis of follicular lymphoma, while the second malignancies and the death without disease failure happen at a more steady rate.

The Fine and Gray model was applied to second malignancy and to disease failure. Tables 3 and 4 show the results of these models.

Thus, age is the only covariate that is significant for both types of events. As expected, the disease-specific factors like stage and residual bulk are significant for the disease failure. Furthermore, chemotherapy is marginally not significant. Those with residual bulk or of stage 2 are about 1.5-fold more likely to have disease failures than the ones without residual bulk or stage 1, respectively. Those receiving chemo are less likely to have a disease failure.

Table 3 The results of the model with second malignancy as event of interest
                     HR     95% conf. int.   p-value
Age                  1.03   1.01–1.05        0.0074
Sex: men vs. women   0.98   0.58–1.67        0.94
Stage: 2 vs. 1       0.78   0.41–1.48        0.44
Residual bulk        0.79   0.45–1.41        0.43
Chemotherapy         1.54   0.78–3.02        0.22

Table 4 The results of the model for disease failure
                     HR     95% conf. int.   p-value
Age                  1.02   1.01–1.02        0.0019
Sex: men vs. women   1.04   0.81–1.33        0.76
Stage: 2 vs. 1       1.57   1.19–2.08        0.0016
Residual bulk        1.49   1.14–1.95        0.004
Chemotherapy         0.72   0.51–1.01        0.055

Table 5 shows the results when all end points are combined. The results in Table 5 are close to those seen in Table 4 although somewhat weaker for stage, bulk, and chemotherapy. The reason for the resemblance between the last two tables is the fact that there are many more relapses than second malignancies: 260 vs. 56. Thus the results in Table 5 are driven by the number of relapses. Some of the effects are weaker because those covariates have an opposite effect for the second malignancy than for disease failure.

Table 5 The results of the model with all end points combined
                     HR     95% conf. int.   p-value
Age                  1.04   1.03–1.04        <0.0001
Sex: men vs. women   1.16   0.94–1.42        0.16
Stage: 2 vs. 1       1.41   1.11–1.79        0.0044
Residual bulk        1.45   1.15–1.82        0.0015
Chemotherapy         0.83   0.63–1.11        0.22

Cox Regression in the Presence of Competing Risks

If the competing risks event is censored, then, from the technical point of view, the analysis could be carried out using the usual Cox model
or Kaplan-Meier estimates, but the interpretation, when possible, is different. In the previous sections, the bias involved in estimating the probability of an event when competing risks are ignored was described. The main question is whether there is a bias when the competing risks are ignored in the modeling process and indeed, if it is possible to predict how large and in which direction this bias is. Another issue is if the results of a model when the competing risks are ignored can be interpreted at all.

In many instances the results of the Cox PH model and Fine and Gray model will be very similar giving the wrong impression that this is a general pattern. However, the two models do not always give similar results. Moreover, the direction of bias cannot be predicted. Finally, the results from the Cox model can be interpreted only under the strict assumption that the distribution of the event of interest and the distribution of competing risks event are independent. This assumption can rarely be made and never substantiated (Tsiatis 1975). The Wound PUH data (given in section “The Pressure Ulcer Healing (PUH) Example”) offers an example when the two models give different results.

Based on the Fine and Gray model (Table 6), the analysis suggests that the performance status is an important prognostic factor with regard to pressure ulcer healing. The patients who are bedridden have a longer time to healing than the ambulatory patients. When the competing risk of death is ignored in the Cox model, the effect is much attenuated, the p-value becomes nonsignificant, and one may reach the wrong conclusion. The probabilities of death and pressure ulcer healing are not independent: knowing that death occurred changes the probability that the pressure ulcer would have healed if the patient could be observed indefinitely. One possible mechanism for this is that the physiological systems needed for wound healing are part of the system failures associated with death. Only in the rare situation when the event of interest can be assumed independent from the competing events can the Cox model results be interpreted as the effect of a covariate when the competing risks do not exist.

Table 6 The prognostic value of palliative performance status for wound healing
                HR    95% conf. int.   p-value
Fine and Gray   3.3   1.7–6.7          0.00078
Cox model       1.7   0.8–3.6          0.13

Other Developments

Analyzing Correlated Data

A notable development is the extension of the Fine and Gray model to accommodate correlated data (strata and/or cluster). For example, in the PUH example, one may wish to analyze all pressure ulcers of a patient rather than just one. This creates clustered data. Zhou et al. (2011, 2012) extended the Fine and Gray model by applying Lee et al.’s (1992) approach.

Analyzing Case-Cohort Design

When the event of interest is rare, the collection of data for the whole cohort is not feasible. The case-cohort design allows one to take advantage of the number of events of interest while including only a fraction of the data without the event of interest. Pintilie et al. (2010) developed a pseudo-likelihood to analyze a case-cohort design in the presence of competing risks based on Barlow’s work (1999).

Sample Size and Power

For the time-to-event analysis, the calculation of the sample size necessary to achieve a certain power involves two steps: (a) the calculation of the necessary number of events and (b) the calculation of the necessary number of patients to observe that number of events. The number of events $n_{ev}$ necessary to detect a specific hazard ratio (HR) is given by:

$$\sqrt{n_{ev}} = \frac{z_{1-\alpha/2} + z_{1-\beta}}{sd(x)\,\ln(HR)} \tag{16}$$
Fig. 7 The increase in the number of patients as the hazard for CR increases (x-axis: Hazard for the competing risks; y-axis: Number of patients necessary to observe 50 events)
where $z_{1-\alpha/2}$ and $z_{1-\beta}$ are the quantiles of the standard normal distribution for $\alpha/2$ and $\beta$. Thus, for $\alpha = 0.05$, $z_{1-\alpha/2} = 1.96$ and for $\beta = 0.2$, $z_{1-\beta} = 0.84$. sd(x) stands for the standard deviation of the covariate to be tested. If a randomized trial with equal allocation in two arms is planned, then $sd(x) = 1/2$. The total number of patients to produce $n_{ev}$ is $n = n_{ev}/P_{ev}$, where $P_{ev}$ is the probability of the event of interest to occur during the study period. When there are no competing risks, $P_{ev}$ can be expressed formulaically as:

$$P_{ev} = 1 - \frac{e^{-\lambda f} - e^{-\lambda(a + f)}}{\lambda a} \tag{17}$$

where $\lambda$ is the hazard rate of the whole cohort, a is the accrual time, and f the follow-up time added to the accrual time.

When competing risks are present the formula changes to:

$$P_{ev} = \frac{\lambda_{ev}}{\lambda_{ev} + \lambda_{cr}} \left[ 1 - \frac{e^{-(\lambda_{ev} + \lambda_{cr})f} - e^{-(\lambda_{ev} + \lambda_{cr})(a + f)}}{(\lambda_{ev} + \lambda_{cr})\,a} \right] \tag{18}$$

where $\lambda_{ev}$ and $\lambda_{cr}$ are the marginal hazards for the events of interest and competing risks event, respectively. It is obvious that if $\lambda_{cr} = 0$, i.e., when competing risks do not exist, the formula (18) becomes (17). A close look at formula (18) shows that as $\lambda_{cr}$ increases, the $P_{ev}$ decreases dramatically. This is equivalent to saying that as $\lambda_{cr}$ increases, the total number of patients necessary to observe a certain number of events of interest increases greatly.

Intuitively, this is obvious since the competing risks hinder the observation of the event of interest. One example is shown in Fig. 7 where an increase of the competing risks from 0 to 0.4 causes a doubling of the final sample size.

The higher the rate of competing risks, the less likely it is to observe the event of interest, and therefore a larger initial sample size is needed. Therefore, ignoring the competing risks in the design stage will create an underpowered study and will result in a waste of effort and money.

Although the independence between the event of interest and the competing risks event cannot usually be assumed in the analysis phase, this assumption needs to be made in this section for mathematical tractability. The second assumption made was that the time to the two types of events follows an exponential distribution.

Example 1 Suppose that the researcher wants to validate the prognostic value of a specific marker in a cohort of patients. The marker is measured as present or absent, and the frequency of a positive marker is about half in this population. The cohort is already assembled, and it is known that there are 50 events of interest. The researcher wants to know if there is enough power to detect an effect size corresponding to a subdistribution hazard
ratio of 2 at the level of significance of 0.05. Solving formula (16) for $z_{1-\beta}$, the power is found to be 69%.

Example 2 A randomized study is being planned to test a new way of delivering radiation for cancer patients. Since radiation is a local treatment, the investigators are interested in testing its effect on local disease. Patients may experience a relapse outside the treated area or death from other causes, both representing competing risks events. It is known from previous studies that the rate of local disease in the standard arm is $\lambda_{ev} = 0.4$ and the rate of other relapses and death from other causes is $\lambda_{cr} = 0.1$. It is expected that the new treatment will not change the rate of competing risks but will decrease the rate of local disease to 0.2. The cancer center can accrue 50 patients per year, and it is desirable that the study accrue its patients in 5 years or less. The analysis will take place 1 year after finishing accrual. The α level is set to 0.05, and the desired power is 80%. Thus, $z_{1-\alpha/2} = 1.96$ and $z_{1-\beta} = 0.84$.

Note that the given rates for the local relapse refer to the marginal distributions; basically these rates are the hazards of the marginal exponential distributions. The ratio of the two rates for the local relapse (0.4 and 0.2) is not the subdistribution hazard ratio which will be detected. Unfortunately, even in the simple situation when all distributions are exponential and independent, the subdistribution hazard ratio is not independent of time. Its formula can be written as:

\[ sHR = \frac{\lambda_1(\lambda_1+\lambda_{cr})\left(\lambda_{cr} + \lambda_2 e^{-(\lambda_2+\lambda_{cr})t}\right) e^{-(\lambda_1+\lambda_{cr})t}\, e^{(\lambda_2+\lambda_{cr})t}}{\lambda_2(\lambda_2+\lambda_{cr})\left(\lambda_{cr} + \lambda_1 e^{-(\lambda_1+\lambda_{cr})t}\right)} \tag{19} \]

where $\lambda_1$ and $\lambda_2$ are the hazard rates for the local relapse for the standard and the new treatment, respectively, and $\lambda_{cr}$ is the hazard rate for the competing event for both arms.

For the time span (0–6 years) of this study, sHR ranges between 2 and 1.1. The approximate average is about 1.66, and with this hazard ratio formula (16) puts the approximate number of events at 122. Formula (18) can be applied to each of the two arms: the probability of event for the standard arm is 0.62 and for the new treatment is 0.41. On average, the probability of event in the study is approximately 0.5. Since the necessary number of events is 122, the total number of patients needs to be 244. This center can accrue 50 patients per year, and thus 244 is a reachable goal. Note that relaxing the accrual effort is not allowed, as the maximum number the center can accrue is very close to the total number of patients needed.

Software

The competing risk analysis can be performed almost entirely within the R environment using the package cmprsk developed by Gray. This package contains functions to estimate the probability of the event of interest at any time point, to plot these estimates, to apply the Fine and Gray model, and to plot the predictive probabilities of the event of interest based on this model. The package crrSC developed by Zhou extends the Fine and Gray model to stratified or clustered data.

The package mstate can be used to modify the data such that the usual Cox model can be applied. This analysis still models the subdistribution hazard, and the obtained coefficients are very close to the results obtained using the function crr from cmprsk. The variance-covariance matrix is slightly different, but for large datasets the differences are minimal.

STATA has a function which allows the user to apply the Fine and Gray model. The plots obtained are the predictive plots from the model.

References

Barlow EW, Ichikawa L, Rosner D, Izumi S. Analysis of case-cohort design. J Clin Epidemiol. 1999;52(12):1165–72.
David HA, Moeschberger ML. The theory of competing risks. London: Griffin; 1978.
Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999;94:496–509.
Gooley TA, Leisenring W, Crowley J, Storer BE. Estimation of failure probabilities in the presence of competing risks: new representations of old estimators. Stat Med. 1999;18:695–706.
Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New York: John Wiley & Sons, Inc.; 1980.
Lee EW, Wei LJ, Amato D, Leurgans S. Cox-type regression analysis for large numbers of small groups of correlated failure time observations. In: Klein JP, Goel PK, editors. Survival analysis: state of the art. Dordrecht: Kluwer; 1992.
Maida V, Corbo M, Dolzhykov M, Ennis M, Irani S, Trozzolo L. Wounds in advanced illness: a prevalence and incidence study based on a prospective case series. Int Wound J. 2008;5(2):305–14.
Maida V, Ennis M, Corban J. Wound outcomes in patients with advanced illness. Int Wound J. 2012;9(6):683–92.
Pintilie M. Competing risks: a practical perspective. Chichester: Wiley & Sons Ltd; 2006.
Pintilie M, Bai Y, Yun LS, Hodgson DC. The analysis of case cohort design in the presence of competing risks with application to estimate the risk of delayed cardiac toxicity among Hodgkin Lymphoma survivors. Stat Med. 2010;29(27):2802–10.
Tsiatis A. Nonidentifiability aspect of problem of competing risks. Proc Natl Acad Sci U S A. 1975;72:20–2.
Zhou BQ, Latouche A, Rocha V, Fine J. Competing risks regression for stratified data. Biostatistics. 2011;67(2):661–70.
Zhou BQ, Fine J, Latouche A, Labopin M. Competing risks regression for clustered data. Biostatistics. 2012;13(3):371–83.
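As a numerical check on the sample size formulas (16) to (19) and on Examples 1 and 2 of the competing risks chapter above, the following is a minimal Python sketch using only the standard library. The function names are hypothetical and chosen for illustration. The accrual time of 5 years and follow-up of 1 year are taken from Example 2, and exponential, independent event times are assumed, as in the text. It is a sketch under those assumptions, not a substitute for the R packages named in the Software section.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and the CDF


def events_needed(hr, sd_x=0.5, alpha=0.05, power=0.80):
    """Formula (16): events of interest needed to detect hazard ratio `hr`."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ((z_alpha + z_beta) / (sd_x * math.log(hr))) ** 2


def power_given_events(n_ev, hr, sd_x=0.5, alpha=0.05):
    """Formula (16) solved for z_{1-beta}, then converted to a power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = math.sqrt(n_ev) * sd_x * math.log(hr) - z_alpha
    return _STD_NORMAL.cdf(z_beta)


def prob_event(lam_ev, lam_cr, accrual, follow_up):
    """Formula (18); with lam_cr = 0 it reduces to formula (17)."""
    lam = lam_ev + lam_cr
    not_observed = (math.exp(-lam * follow_up)
                    - math.exp(-lam * (accrual + follow_up))) / (lam * accrual)
    return lam_ev / lam * (1 - not_observed)


def subdist_hr(t, lam1, lam2, lam_cr):
    """Formula (19): subdistribution hazard ratio (arm 1 vs arm 2) at time t."""
    def sub_hazard(lam):
        total = lam + lam_cr
        surv = math.exp(-total * t)
        return lam * total * surv / (lam_cr + lam * surv)
    return sub_hazard(lam1) / sub_hazard(lam2)


if __name__ == "__main__":
    # Example 1: 50 events, subdistribution hazard ratio of 2.
    print("Example 1 power:", round(power_given_events(50, 2.0), 2))
    # Example 2: average sHR of about 1.66, accrual 5 years, follow-up 1 year.
    print("Example 2 events:", round(events_needed(1.66)))
    print("P(event), standard arm:", round(prob_event(0.4, 0.1, 5, 1), 2))
    print("P(event), new arm:", round(prob_event(0.2, 0.1, 5, 1), 2))
    print("sHR at t = 0 and t = 6:",
          round(subdist_hr(0, 0.4, 0.2, 0.1), 2),
          round(subdist_hr(6, 0.4, 0.2, 0.1), 2))
```

Under these assumptions the sketch gives a power of about 0.69 for Example 1, about 122 events for Example 2, event probabilities of about 0.62 and 0.41 for the two arms, and an sHR that falls from 2 at time 0 to about 1.1 at 6 years, in line with the values quoted in the text.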
20 Modeling and Analysis of Cost Data
Shizhe Chen and XH Andrew Zhou
Contents
Introduction
Methods for Mean Inference
Parametric Methods on Continuous Data
Nonparametric Methods on Continuous Data
Zero-Inflated Data
Two Sample
Applications on a Simple Example
Regression
Parameters of Interest
Linear Regression on Raw Data
Transformation on Y
Transformation on E[Y]
Two-Part Models
Quantile Regression
Prediction
Some Basic Concepts of Prediction Models
Difference from Regression Analysis
Appendix
Concept of General Pivots
Variances and Estimators for Back-Transformations
References
S. Chen
Department of Biostatistics, University of Washington, Seattle, WA, USA

X. A. Zhou (*)
Beijing International Center for Mathematical Research, Peking University, Beijing, China
VA Puget Sound Healthcare System, University of Washington, Seattle, WA, USA
e-mail: azhou@u.washington.edu

https://doi.org/10.1007/978-1-4939-8715-3_31

Abstract
Cost has become an important outcome in health services research. It can be used not only as a measure of health care spending but also as a measure of a part of health care value. Given ever-rising health care expenditure, the value of health care should include not only traditional measures, such as mortality and morbidity, but also the cost of health care. Due to limited resources, a new treatment with a slightly
better efficacy but much higher cost than an existing treatment may not be a choice of treatment for a patient. Hence, it is important to be able to appropriately analyze cost data. However, appropriately analyzing health care costs may be hindered by special distributional features of cost data, including skewness, zero values, clusters, heteroscedasticity, and multimodality. Over the decades, various methods have been proposed to address these features. This chapter is devoted to introducing methods that are able to provide relatively trustworthy results with acceptable efficiency, covering topics on mean inference, regression, and prediction.

Introduction

The rapid rise of health care costs, together with health care reforms aimed at containing those costs, makes cost an important outcome in any health services research. It is not straightforward to analyze cost data because of some of its special distributional features, which prevent us from using traditional statistical methods.

The first feature of cost data is its positive skewness, that is, skewness to the right. The skewness arises because a few patients with high costs account for the major part of the total expenses. In addition, cost data often come with a heavy upper tail, which occurs when the tail of the distribution cannot be bounded by an exponential distribution.

The second feature is discontinuity of the distribution at the zero value, which occurs because not all subjects in the population of interest incur health care costs in a given study period. For example, patients without any hospitalization during a study period have zero in-patient costs. One consequence of the distributional discontinuity at zero is that many standard statistical methods, which require a continuous distribution assumption, cannot be used in the inference of cost data.

The third feature is heteroscedasticity, which occurs when the variance of the cost of a patient is not constant. For example, if the variance of a random variable is a function of the mean, data generated from the distribution of this random variable will exhibit heteroscedasticity. This kind of mean-variance relation can also be observed in many known parametric distributions, such as the Poisson distribution and the lognormal distribution. Many traditional statistical methods, such as ordinary least squares (OLS), require homoscedasticity for valid statistical inference. Ignoring heteroscedasticity in cost data can lead to wrong statistical inferences.

The fourth feature is censoring of the cost outcome, which occurs when the cost of a patient over a study period is only partially observed. For example, a patient drops out of the study before the study ends; as a result, we only observe the partial cost of this patient over the whole study period. Although the problem of censored cost data is related to survival analysis, the analytic techniques are different from traditional survival analysis ones.

The fifth feature is clustering, which occurs due to the effects of clinicians and hospitals. Since a given clinician tends to give patients similar prescriptions and to use similar kinds of drugs and treatments, the medical costs of this clinician's patients are expected to be correlated. The same reasoning applies to clinics and hospitals. Ignoring clustering would lead to invalid statistical inference.

The final feature, though not the least, is multimodality, which occurs when the distribution has more than one mode. This feature may be related to clinician clustering. For example, suppose the cost data are generated from patients who are cared for by two physicians with different treatment strategies: one physician uses a more liberal approach to ordering tests and prescribing drugs, and the other is more conservative in treating his/her patients. The distribution of the cost data is then a mixture of the distributions of the two physicians, which may lead to a bimodal distribution.

In this chapter, we concentrate on a review of statistical methods that can handle the first three distributional features of cost data: (1) skewness, (2) zero values, and (3) heteroscedasticity. We review various methods that have been proposed to address these features. As there is no single method that can handle all features that one might encounter in a health cost study, in this chapter we also provide a rough evaluation of those methods to help researchers in choosing methods
that are most suitable. This chapter is organized as follows. Section “Methods for Mean Inference” focuses on mean inference, which is the very foundation of health cost analysis; section “Regression” is about regression models, which are a more complicated version of mean inference in which covariates are taken into consideration; section “Prediction” is a brief introduction to prediction models and some important concepts about prediction models.

Methods for Mean Inference

Methods and theorems are developed to summarize the distribution of health cost data which, as described in the previous section, does not have the “nice” properties that we usually assume to be true. The choice of the quantity that summarizes the distribution (in other words, the summary measure) should be based on statistical convenience as well as scientific importance. For example, the sample median is known to be a better summary measure for the central location of a skewed distribution than the sample mean, but investigators care about the total cost instead of the median cost most of the time. As will be shown later, a number of methods have been proposed to find consistent and efficient estimators for the population mean.

Generally speaking, methods with more assumptions perform better than others when the assumptions hold or are not violated too much. Studies have shown that using models with inappropriate assumptions on certain data can result in disastrous estimators (Briggs et al. 2005). Some methods depend on few or no assumptions and can be called robust models, but these methods are often low in efficiency. As the famous quote says, “All models are wrong, but some are useful” (Box 1976). The choice of models is especially important in health cost data, where the samples behave poorly, though no clear boundary can be drawn in making this decision. It is recommended to check the assumptions when applying certain methods.

Depending on the target population, medical costs have two possible distributions. It might be a continuous distribution with positive values when the population is defined as subjects who received treatments and paid for them. Such a population is interesting for the study of the revenue of a department. It might also be a distribution with a point mass at zero, when the population is defined as a certain group of people, such as citizens in a city or people in an insurance plan. This kind of distribution is named a zero-inflated distribution, or a delta distribution by Aitchison (1955). The first situation can be seen as a special case of the second one in which the point mass at zero is 0. Hence, methods for continuous distributions can be used for the zero-inflated distribution with some modifications. This section will begin with discussions on continuous distributions and then proceed to the case with positive point mass at zero.

Parametric Methods on Continuous Data

As a classic way of doing statistical analysis, the distribution of data is sometimes assumed to be known and to have a finite number of parameters that characterize the distribution. This kind of assumption is called a parametric assumption. For instance, normality is a well-known example of a parametric assumption, in which the distribution is characterized by two parameters, the expectation and the variance. Unfortunately, this normality assumption does not apply to medical cost data, which are often highly right skewed. A common practice is to transform the data into a more well-behaved form. It is then possible to impose the normality assumption or some other parametric model on the transformed data.

Box (1976) proposed a family of transformations that can be modified to fit various situations:

\[ \frac{y^{\lambda} - 1}{\lambda} = x\beta + e, \ \text{if } \lambda \neq 0; \qquad \log(y) = x\beta + e, \ \text{if } \lambda = 0, \tag{1} \]

where y is the original dependent variable, x is a row vector of covariates, e is an additive error term that is independent of the covariates x, and β
and λ are parameters to be estimated. Box (1976) stated that under an appropriate transformation, the error term can be approximated by a normal distribution or is at least more symmetric than on the original scale. Notice that (1) has a dependent variable y and covariates x in the formula, and mean inference is a degenerate version of it in which x is set to be a row of 1s.

Notice that when λ is set to zero, the transformation takes the logarithm of y. The log transformation is the most widely used transformation in the analysis of expenditure data, not only because it reduces the skewness of samples but also because of its real-world interpretations. Manning (1998) gave several rationales for using the log transformation: “(1) A desire for multiplicative or proportional responses to a covariate of interest;... (2) a desire to generate an estimate that easily yields an elasticity; ... or (5) a need to deal with dependent variables that are badly skewed to the right.”

The same reasoning is applicable to medical cost. The expenditures for users are log transformed to reduce the skewness inherent in health expenditure data. Under certain circumstances (see Duan 1983), inferences based on logged models are much more precise and robust than direct analysis of the original dependent variable. Another attractive property of the log transformation is that it has an explicit expression for the untransformed expectation. The expectation of the (untransformed) dependent variable y in a log model is

\[ E(y \mid x) = e^{x\beta} \int e^{e}\, dF(e). \tag{2} \]

If, after transformation, the residuals follow a normal distribution, then the expected value of y can be written down by straightforward calculation:

\[ E(y \mid x) = \exp\left( x\beta + 0.5\,\sigma^2(x) \right). \tag{3} \]

Notice that (3) shows us that the untransformed mean is a function of both the transformed mean and variance.

Point Estimate

Several articles in the past decade have been published in search of efficient estimators of (3). Some of them are well established and have stood the test of time (see Zhou 1998). Before proceeding to discuss these methods, some notation needs to be set up. As in (2), $\{Y_1, \ldots, Y_n\}$ is a random sample from a lognormal distribution with mean θ and variance $\tau^2$. Define $W_i = \log(Y_i)$ for $i \in \{1, \ldots, n\}$. Then $\{W_1, \ldots, W_n\}$ comes from a normal distribution with mean μ and variance $\sigma^2$. Let $\bar{W}$ be defined as $\sum_{i=1}^{n} W_i / n$, and let $S^2$, the sample variance, be $\frac{1}{n-1}\sum_{i=1}^{n} (W_i - \bar{W})^2$.

1. Maximum Likelihood Estimator (MLE)

\[ \hat{\theta}_m = \exp\left( \bar{W} + 0.5\,\frac{n-1}{n}\, S^2 \right) \tag{4} \]

The MLE for the lognormal distribution is a biased estimator. The bias is

\[ E(\hat{\theta}_m) - \theta = \theta \left[ \exp\left( \frac{-n+1}{2n}\,\sigma^2 \right) \left( 1 - \frac{\sigma^2}{n} \right)^{-(n-1)/2} - 1 \right], \]

if $0 < \sigma^2 < n$. The corresponding mean square error is

\[ E(\hat{\theta}_m - \theta)^2 = \theta^2 \left[ \exp\left( \frac{-n+2}{n}\,\sigma^2 \right) \left( 1 - \frac{2\sigma^2}{n} \right)^{-(n-1)/2} - 2\exp\left( \frac{-n+1}{2n}\,\sigma^2 \right) \left( 1 - \frac{\sigma^2}{n} \right)^{-(n-1)/2} + 1 \right], \tag{5} \]

when $0 < \sigma^2 < n/2$. The MSE in (5) can be estimated by plugging in the estimators of $\sigma^2$ and μ, which are $S^2$ and $\bar{W}$, respectively.

2. Uniformly Minimum-Variance Unbiased Estimator (UMVUE)

\[ \hat{\theta}_u = \exp(\bar{W})\, g_n\!\left( S^2/2 \right), \tag{6} \]

where
\[ g_n(t) = \sum_{r=0}^{\infty} \frac{1}{r!}\, \frac{n-1+2r}{n-1} \left( \frac{(n-1)\,t}{n} \right)^{r} \prod_{i=1}^{r} \frac{n-1}{n-1+2i}. \tag{7} \]

As its name indicates, $\hat{\theta}_u$ is an unbiased estimator of θ. The mean square error of $\hat{\theta}_u$ is

\[ E(\hat{\theta}_u - \theta)^2 = \theta^2 \left[ \exp\left( \frac{1}{n}\,\sigma^2 \right) g_n\!\left( \frac{1}{2n}\,\sigma^4 \right) - 1 \right]. \tag{8} \]

3. Conditionally Minimal Mean Squared Error (MSE) Estimator

\[ \hat{\theta}_c = \exp(\bar{W})\, g_n\!\left( \frac{n-4}{2n-2}\, S^2 \right), \tag{9} \]

where $g_n$ is the same as defined in (7). This estimator is biased; the bias is

\[ E(\hat{\theta}_c) - \theta = \theta \left[ \exp\left( -\frac{3}{2n}\,\sigma^2 \right) - 1 \right]. \]

The MSE of $\hat{\theta}_c$ is

\[ E(\hat{\theta}_c - \theta)^2 = \theta^2 \left[ \exp\left( -\frac{2}{n}\,\sigma^2 \right) g_n\!\left( \frac{(n-4)^2}{2n(n-1)^2}\,\sigma^4 \right) - 2\exp\left( -\frac{3}{2n}\,\sigma^2 \right) + 1 \right]. \tag{10} \]

Simulation results by Zhou (1998) show that the conditionally minimal MSE estimator $\hat{\theta}_c$ is uniformly superior to the alternatives. However, the MSEs of these estimators are almost the same when the sample size is sufficiently large ($n \geq 200$). In this case, the MLE $\hat{\theta}_m$ is recommended because it is easy to compute. With a small sample size, the conditionally minimal MSE estimator $\hat{\theta}_c$ is preferable to the others.

Confidence Intervals

The construction of confidence intervals is more straightforward than that of the estimators, due to the fact that quantiles are invariant under monotone transformation. The confidence intervals of ln(θ) can be turned into the CIs of θ by simply exponentiating the lower and upper bounds. Recall that $\{W_i = \log(Y_i)\}$ are normally distributed, so $\bar{W} + S^2/2$ is the UMVU estimator for ln(θ). The target now is to construct a confidence interval based on $\bar{W} + S^2/2$. Zhou and Gao (1997) summarized several practical procedures for medium or large sample sizes. Krishnamoorthy and Mathew (2003) applied the generalized pivotal quantity to this issue and obtained asymptotically efficient confidence intervals.

In general, confidence intervals and hypothesis tests are not interchangeable, as there are slight differences between them. But in this simple case of one-sample mean inference, hypothesis testing is equivalent to checking whether the mean under the null hypothesis lies inside the 100(1 − α)% confidence interval. Thus, a more desirable confidence interval gives a more reliable approach to hypothesis testing. The null hypothesis will be rejected at significance level α when the null mean lies outside the confidence interval.

1. Cox’s method: The estimator for the variance of $\bar{W} + S^2/2$ is $S^2/n + S^4/(2(n-1))$. Cox, in a personal communication to Land (1972), proposed to construct the confidence interval for ln(θ) by

\[ \bar{W} + \frac{S^2}{2} \pm Z_{1-\alpha/2} \sqrt{ \frac{S^2}{n} + \frac{S^4}{2(n-1)} }, \tag{11} \]

where $Z_{1-\alpha/2}$ is the 100(1 − α/2)% quantile of a standard normal distribution, i.e., a normal distribution with mean zero and standard deviation 1. The corresponding confidence interval for θ is

\[ \left( \exp\left\{ \bar{W} + \frac{S^2}{2} - Z_{1-\alpha/2} \sqrt{ \frac{S^2}{n} + \frac{S^4}{2(n-1)} } \right\},\ \exp\left\{ \bar{W} + \frac{S^2}{2} + Z_{1-\alpha/2} \sqrt{ \frac{S^2}{n} + \frac{S^4}{2(n-1)} } \right\} \right). \]

2. Angus’s conservative method: Although the exact pivotal quantity is not available in this
problem, an approximate pivotal statistic is available as

\[ V(\theta) = \frac{ \sqrt{n} \left( \bar{W} + S^2/2 - \ln(\theta) \right) }{ \sqrt{ S^2 \left( 1 + S^2/2 \right) } }, \tag{12} \]

which, in a finite sample, has the same distribution as

\[ T(\nu) = \frac{ N + \dfrac{\sigma\sqrt{n}}{2} \left( \dfrac{\chi^2_{n-1}}{n-1} - 1 \right) }{ \sqrt{ \dfrac{\chi^2_{n-1}}{n-1} \left( 1 + \dfrac{\sigma^2}{2}\, \dfrac{\chi^2_{n-1}}{n-1} \right) } }, \tag{13} \]

where N and $\chi^2_{n-1}$ are independent random variables from a standard normal distribution and a $\chi^2$ distribution with n − 1 d.f., respectively. The conservative CIs are

\[ L_{1-\alpha} = \bar{W} + \frac{S^2}{2} - \frac{t_{1-\alpha/2}(n-1)}{\sqrt{n}} \sqrt{ S^2 \left( 1 + \frac{S^2}{2} \right) }, \tag{14} \]

\[ U_{1-\alpha} = \bar{W} + \frac{S^2}{2} + \frac{q_{\alpha/2}(n-1)}{\sqrt{n}} \sqrt{ S^2 \left( 1 + \frac{S^2}{2} \right) }, \tag{15} \]

where

\[ q_{\alpha/2}(n-1) = \sqrt{ \frac{n}{2} \left( \frac{n-1}{\chi^2_{\alpha/2}(n-1)} - 1 \right)^{2} } \]

and, in the usual notation, $t_{1-\alpha/2}(n-1)$ is the 1 − α/2 quantile of the t distribution with n − 1 d.f. and $\chi^2_{\alpha/2}(n-1)$ is the α/2 quantile of the $\chi^2$ distribution with n − 1 d.f. Then the 100(1 − α)% confidence interval for θ is $(\exp(L_{1-\alpha}), \exp(U_{1-\alpha}))$. This approach is called conservative because the probability that ln(θ) falls into the CI is no less than 1 − α.

3. Parametric bootstrap method: Notice that in (13), T is determined by N and $\chi^2_{n-1}$. Though T itself is hard to generate, N and $\chi^2_{n-1}$ come from two simple distributions. It is possible to get samples of T by generating a series of N and $\chi^2_{n-1}$. Suppose $N_i \sim N(0, 1)$ and $\chi^2_i \sim \chi^2_{n-1}$, $i = 1, \ldots, B$, where B is a sufficiently large number. Then calculate $T_i$ as in (13), and denote by $t_l$ the 1 − α/2 empirical quantile and by $t_u$ the α/2 empirical quantile. The estimated bounds are

\[ L_{1-\alpha} = \bar{W} + \frac{S^2}{2} - \frac{t_l}{\sqrt{n}} \sqrt{ S^2 \left( 1 + \frac{S^2}{2} \right) }, \tag{16} \]

\[ U_{1-\alpha} = \bar{W} + \frac{S^2}{2} - \frac{t_u}{\sqrt{n}} \sqrt{ S^2 \left( 1 + \frac{S^2}{2} \right) }. \tag{17} \]

So the 100(1 − α)% confidence interval for θ is $(\exp(L_{1-\alpha}), \exp(U_{1-\alpha}))$.

4. A signed likelihood ratio approach: Wu et al. (2003) used the log-likelihood ratio to construct confidence intervals. The signed log-likelihood ratio r is defined as

\[ r(m) = \operatorname{sgn}(\hat{m} - m) \left\{ 2 \left[ l(\hat{m}, \hat{\sigma}^2) - l(m, \hat{\sigma}^2_m) \right] \right\}^{1/2}. \tag{18} \]

The log-likelihood as a function of m = log(θ) and $\sigma^2$ is

\[ l(m, \sigma^2) = -\frac{n}{2} \log \sigma^2 + \left( m - \frac{\sigma^2}{2} \right) \frac{ \sum_{i=1}^{n} W_i }{ \sigma^2 } - \frac{1}{2\sigma^2} \sum_{i=1}^{n} W_i^2 - \frac{n}{2\sigma^2} \left( m - \frac{\sigma^2}{2} \right)^{2}. \]

The overall MLE is

\[ \hat{\sigma}^2 = \omega_2 - \omega_1^2, \qquad \hat{m} = \omega_1 + \frac{1}{2}\,\hat{\sigma}^2, \]

where $\omega_1 = \frac{1}{n} \sum_{i=1}^{n} W_i$ and $\omega_2 = \frac{1}{n} \sum_{i=1}^{n} W_i^2$. For a fixed m, the MLE of $\sigma^2$ is

\[ \hat{\sigma}^2_m = 2 \left[ (m+1)^2 + \omega_2 - 2m\omega_1 - 2m \right]^{1/2} - 2. \]

Thus, the specified form of signed log-likelihood ratio r is
\[ r(m) = \operatorname{sgn}(\hat{m} - m) \left\{ n \log \frac{ \hat{\sigma}^2_m }{ \hat{\sigma}^2 } + n \left( \omega_1 - m + \hat{\sigma}^2_m / 2 \right) \right\}^{1/2}. \tag{19} \]

The 100(1 − α)% confidence interval would be $(\exp(\hat{m}_{\alpha/2}), \exp(\hat{m}_{1-\alpha/2}))$, where $\hat{m}_{\alpha}$ is the solution of $r(m) = z_{\alpha}$, and $z_{\alpha}$ is the αth quantile of a standard normal distribution. The equation has no explicit solution, but it can be solved by numerical methods such as the Newton-Raphson method. An example of constructing the lower bound is given by Wu et al. (2003):

i). Set up accuracy e, differentiation constant δ, and initial value $m_0$.
ii). Estimate m* as

\[ m^{*} = m_0 + \frac{ z_{\alpha/2} - r(m_0) }{ \left[ r(m_0 + \delta) - r(m_0 - \delta) \right] / (2\delta) }. \]

iii). Substitute $m_0$ with m* if $|m^{*} - m_0| > e$ and repeat step (ii) again.

The construction of the upper bound is basically the same except for replacing $z_{\alpha/2}$ with $z_{1-\alpha/2}$.

5. An adjusted signed log-likelihood ratio approach: Wu et al. (2003) also proposed a modified version of the signed likelihood ratio statistic. They defined an adjusted signed log-likelihood statistic as

\[ r^{*}(m) = r(m) + r(m)^{-1} \log \frac{ u(m) }{ r(m) }, \]

where r(m) is defined as in the previous section. The u(m) here is a function of m defined as

\[ u(m) = \sqrt{n}\, (\hat{m} - m)\, \frac{ \hat{\sigma} }{ \hat{\sigma}_m^{3} } \left( \frac{1}{2} + \frac{1}{ \hat{\sigma}^2_m } \right)^{-1/2}. \tag{20} \]

The 100(1 − α)% confidence interval can be constructed in the same fashion as for r(m): $(\exp(\hat{m}_{\alpha/2}), \exp(\hat{m}_{1-\alpha/2}))$, where $\hat{m}_{\alpha}$ is the solution of $r^{*}(m) = z_{\alpha}$. The equations are solved with the same algorithm described in the last section, simply replacing r with r*.

6. A generalized pivot approach: Krishnamoorthy and Mathew (2003) applied the concept of generalized pivotal quantity to lognormal means. The generalized pivotal quantity can be viewed as a new concept of hypothesis test, and it yields the same coverage rate as standard frequentist hypothesis testing asymptotically. For more details about the generalized pivotal quantity, or fiducial quantity, please see the appendix of this chapter. The generalized pivot for ln(θ) is given by

\[ T = \bar{W} - \frac{Z}{ U / \sqrt{n-1} }\, \frac{S}{\sqrt{n}} + \frac{1}{2}\, \frac{ S^2 }{ U^2 / (n-1) }, \tag{21} \]

where $Z \sim N(0, 1)$, $U^2 \sim \chi^2_{n-1}$, and Z and $U^2$ are independent. The same approach as in the parametric bootstrap is used here to estimate the generalized confidence intervals. Suppose $Z_i \sim N(0, 1)$ and $U_i^2 \sim \chi^2_{n-1}$, $i = 1, \ldots, B$, where B is a sufficiently large number. Then calculate $T_i$ as in (21), and denote by $t_l$ the α/2 empirical quantile and by $t_u$ the (1 − α/2) empirical quantile.

The estimated bounds are $(t_l, t_u)$. So the 100(1 − α)% generalized confidence interval for θ is $(\exp(t_l), \exp(t_u))$. However, as pointed out by Krishnamoorthy and Mathew (2003), the type I error and the power of such a test might depend on unknown parameters. It would be necessary to simulate the type I error probability in order to see whether the test controls type I error.

Simulation results by Zhou and Gao (1997) show that Cox’s method has the best performance in moderate to large samples, in terms of both computational simplicity and statistical efficiency. Cox’s method is therefore recommended when the sample size is sufficiently large. With a small sample size, the parametric bootstrap method provides the most satisfactory confidence interval among the methods examined in Zhou and Gao (1997). However, Krishnamoorthy and Mathew (2003) showed that the generalized pivot approach has better performance than the parametric bootstrap approach in one-sided hypothesis testing with small sample sizes. In a following simulation by Wu et al. (2003), the authors
showed that the adjusted signed log-likelihood ratio-based method provided the most satisfactory coverage probability and average biases. Although the computation of the adjusted signed log-likelihood ratio approach is far more complicated than the others, it is recommended when the sample size is too small for other methods.

It is worth noting that all methods above are based on the lognormal assumption, which requires the log-transformed data to be normally distributed. Although the estimators still behave well when the log-transformed data are approximately normally distributed, Briggs et al. (2005) argued that the inference would be invalid and misleading when the sample distribution deviates greatly from the assumed distribution. Hence, checking the normality of the transformed data (with a QQ plot, goodness-of-fit tests, etc.) is always necessary. When the normality assumption is not appropriate, other distributions such as the Gamma are available. And it is always possible to trade efficiency for robustness by using nonparametric methods, which will be introduced later.

Nonparametric Methods on Continuous Data

It is entirely possible to estimate θ, construct confidence intervals, and do hypothesis testing without parametric assumptions. Although the efficiency is often not satisfactory, the central limit theorem guarantees that the sample mean converges to a normal distribution.

Denote the sample mean as

\[ \hat{\theta}_s \equiv \bar{Y}. \tag{22} \]

The first one is to estimate it directly from the sample standard error. In other words, the standard error $s_b$ is the square root of $\frac{1}{n(n-1)} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

The second one is to use the bootstrap approach proposed by Efron (1981). The algorithm can be summarized as below:

1. Resample n observations from the original data with equal weight and replacement.
2. Calculate the sample mean from the newly sampled data, denoted as $\theta_i^s$.
3. Repeat steps 1 and 2 M times, where M is a sufficiently large number chosen by the investigator.
4. Calculate the standard error $s_b$ of $\theta_i^s$.

Based on the central limit theorem, the confidence interval would be $(\bar{Y} + s_b Z_{\alpha/2},\ \bar{Y} + s_b Z_{1-\alpha/2})$, where $Z_q$ is the q-th quantile of a standard normal distribution.

Hall (1992) proposed a monotone transformation of the t-statistic to correct for skewness effects of a positively skewed distribution without assuming any parametric form. The original t-statistic is

\[ T = \frac{ \sqrt{n}\, (\bar{Y} - \theta) }{ \hat{\tau} }, \]

where $\hat{\tau}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$. The transformation is

\[ g(T) = T + n^{-1/2}\, \hat{\gamma}\, (aT^2 + b) + n^{-1} (a\hat{\gamma})^2\, T^3 / 3, \tag{23} \]

where $\hat{\gamma} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^3 / \hat{\tau}^3$, $a = 1/3$, and $b = 1/6$. It is monotone and invertible. The unique inverse function of g is
Central limit theorem and Slutsky’s theorem T ¼ g1 h i

1=3
grant that ¼ n1=2 ða^γ Þ1 1 þ 3a^γ n1=2 x n1 b^γ 1 :
(24)
1 ^
θ s θ ! N ð0, 1Þ,
sn There are several ways to construct the confi-
dence intervals based on the proposed g function.
where sn is the standard error of X. There are It was shown in Zhou and Gao (2000) that the
two ways to estimate this standard error, both of bootstrap approach has the best performance. The
which are straightforward. algorithm is similar to what has been discussed:
1. Resample n observations from the original data with equal weights and with replacement.
2. Calculate g(T) from the resampled data.
3. Repeat steps 1 and 2 M times, where M is a sufficiently large number chosen by the investigator.
4. Denote the sample α/2 and 1 − α/2 quantiles of the resulting values as $g_{\alpha/2}$ and $g_{1-\alpha/2}$.

The resulting 1 − α two-sided confidence interval is

$$\left(\bar Y - n^{-1/2}\hat\tau\, g^{-1}\!\left(g_{1-\alpha/2}\right),\ \bar Y - n^{-1/2}\hat\tau\, g^{-1}\!\left(g_{\alpha/2}\right)\right).$$

Zhou and Gao (2000) recommended the bootstrap version of Hall's method, which yielded the best coverage rate for both upper and lower endpoints in a simulation study of one-sided confidence intervals.

In another simulation study, Zhou (1998) showed that the sample mean has a relatively large mean square error even when the sample size is as large as 200, compared with the other estimators discussed in the previous section. The mean square error increases as σ increases, that is, as the skewness increases. It is important to notice that the simulation study was conducted on lognormal data, where the lognormal assumption actually holds. This might explain part of the poor performance of the sample mean compared with estimators based on the lognormal assumption. Even so, the efficiency of the sample mean on skewed data is very low.

Zero-Inflated Data

As discussed, medical cost data are often accompanied by a considerable number of observations that have zero cost. The proportion of zeros might sometimes reach 30%. This point mass at zero causes extra difficulty in making statistical inference, but it can be handled with small modifications of the methods used on continuous data. The nonparametric methods described in the previous section, in fact, need no modification at all and can be used directly in this situation. For instance, the sample mean is a nonparametric estimator of the population mean, and the bootstrap gives a confidence interval for it. They will therefore not be discussed further in this section, and the focus will be placed on parametric methods.

The most commonly used parametric model for zero-inflated data is a two-part model. A two-part model assumes that the number of zero observations is a random variable from a binomial distribution Bin(n, p), where n is the number of observations and p is the probability that a subject has zero medical cost in the study period. The nonzero observations, conditioned on the fact that they are nonzero, are treated as the continuous data discussed in previous sections. The conditional distribution is assumed to be lognormal in this section.

For each group, the distribution of the samples is a lognormal distribution with a point mass at zero, which was named the delta distribution by Aitchison (1955). Suppose {Y1, ..., Yn} is a random sample from a delta distribution; then the population mean is

$$\theta = (1 - p)\exp\left(\mu + \tfrac{1}{2}\sigma^2\right),$$

where p is the probability that the random variable is zero and μ and σ² are the mean and variance, respectively, of the conditional normal distribution after transformation. Denote the number of zero observations as N0 and the number of nonzero observations as N1. In this section, the parameter of interest is θ, and, again, the construction of confidence intervals for θ is also discussed.

Point Estimate

1. The MVUE of θ is

$$\hat\theta_A = (1 - \hat p)\exp\left(\hat\mu\right) g_n\!\left(\tfrac{1}{2}\hat\sigma^2\right), \qquad (25)$$

where

$$\hat p = \frac{N_0}{n}, \qquad \hat\mu = \frac{1}{N_1}\sum_{i=1}^{N_1} w_i, \qquad \hat\sigma^2 = \frac{1}{N_1 - 1}\sum_{i=1}^{N_1}\left(w_i - \hat\mu\right)^2,$$

and the $w_i$ are the log-transformed nonzero observations.
pffiffiffiffiffiffi pffiffiffiffiffiffi 2 !
2. A bias-corrected MLE for θ is N1 N1 σ N 1 χ ðN1 1Þ
log þZþ 1
σ nð 1 pÞ 2 N1
^ 1 2 T¼
0:5 ,
θ M ¼ ð1 ^p Þ exp μ^ σ^ : (26) χ2 σ2 χ 2
2 ðN1 1Þ
nσ 2 þ N 1
nN 1
1 þ 2N1 ðN 1 1Þ
Notice that in (26), the unbiased estimator σ^ 2 is (28)

used instead of the MLE NN1 1
1
σ^ 2. That is the reason where Z and χ 2ðN1 1Þ are independent random
why it is named a bias-corrected MLE.
variables with standard normal distribution and
Confidence Intervals χ2 distribution with Ni 1 degrees of freedom.
Several methods have been proposed to construct The procedure for bootstrap is to (i) generate
the confidence intervals. the number of zero observations, N0, from a bino-
1. The MVUE intervals: Owen and DeRouen mial distribution Bin(n,p), (ii) generate Z* and χ2*
(1980) derived a minimum-variance unbiased from the distributions described above, (iii) cal-
estimator (MVUE) confidence interval for culate the T* with (28); and (iv) repeat i through iii
the population mean of zero-inflated lognormal for sufficiently many times and get the sample
quantiles tα/2 and t1α/2.
distribution. The asymptotic variance of ^θ A is
So the two-sided 100(1 α)% confidence

V θ^ A ¼ n1 exp 2^μ þ σ^ 2
intervals are
1 2
t ^ σ þ σ^ 4 :
p ð1 ^p Þ þ ð1 ^p Þ 2^
2 ^θ M exp tα=2 SE
^ , ^θ M exp t1α=2 SE
^ :
So the 100(1 α)% confidence intervals of ^θ A 4. A signed likelihood ratio approach: The
can be asymptotically approximated by ML confidence intervals are based on the
pffiffiffiffiffiffi pffiffiffiffi asymptotic normality of MLE, which is ques-
^
θ A z1α=2 V, ^θ A zα=2 V :
tionable with small or moderate samples. An
alternative would be the likelihood ratio inter-
2. The ML confidence intervals: Using delta
val. The log-likelihood as a function of m = log
method and property of MLE, a consistent vari-
(θ), μ, and σ2 is
ance
estimator
of the bias-corrected MLE,
log ^
θ M , can be written as
σ2
l m, μ, σ 2 ¼ N 0 log 1 exp m μ
2 4 2
^ 2 ¼ N 0 þ σ^ þ σ^ :
SE
nN 1 N 1 2N 1 σ2 N1
þ N1 θ μ logσ 2
So the two-sided 100(1 α)% confidence 2 2
intervals are 1 X N1
ðwi μÞ2 :
^ ^ , ^θ M exp z1α=2 SE
θ M exp zα=2 SE ^ : 2σ 2 i¼1
3. A bootstrap approach for ML confidence Since there are nuisance parameters in the
intervals: Similar to the Angus methods in the log-likelihood, the profile likelihood for m will
previous section, an approximate pivotal statistics be used to compute the likelihood ratio statistics.
can be derived: In general, the way to solve this problem is
σ^ 2 σ^ 2 to (i) use iterative algorithm to find the fi and
logð1 ^
pÞ þ μ ^ þ logð1 pÞ μ
T¼ 2 2 : a2 that maximized the log-likelihood given
n o0:5
^ σ^ 2 σ^ 4 m and μ + σ 2 > m; (ii) define lprof ðmÞ ¼
p Þ þ nð1^p Þ þ 2nð1^p Þ
p
nð1^
^ ½m, σ^ 2 ½mÞ, and find the m that maximizes
lðm, μ
(27) this profile log likelihood; (iii) define likelihood
It follows the same distribution as the follow- ratio statistics W ðmÞ ¼ 2 lprof ðm ^ Þ lp ðmÞ ffi ;
profffiffiffiffiffiffiffiffiffiffiffi
ing statistic: and (iv) define Z ðmÞ ¼ sgnðm ^ m Þ W ðm Þ .
The
100(1
α)% confidence
intervals would be where
exp m ^ α=2 , exp m^ 1α=2 , where m
^ α is the solu-

tion of Zðm ^ α Þ ¼ zα . σ^ 2 ðmÞ
exp m μ ^ ðmÞ
5. An adjusted signed log-likelihood ratio 2
â m ¼ ,
approach: Tian and Wu (2006) proposed a mod- σ^ 2 ðmÞ
ified version of the signed likelihood ratio 1 exp m μ ^ ðmÞ
2
statistics. They defined an adjusted signed â m
log-likelihood statistics as ^b m ¼ ,
σ^ 2 ðmÞ
1 exp m μ ^ ðmÞ
uðmÞ 2
Z ðmÞ ¼ Z ðmÞZ1 ðmÞlog , XN1
Z ðmÞ
T ¼ W 2i :
where Z(m) is defined i¼1
pffiffiffiffiffiffiffiffiffiffiffiffi as in the previous
^ mÞ W ðmÞ . The u(m) here is
section: sgnðm
complicated:
The
100(1
α)% confidence
intervals would be
exp m ^ α=2 , exp m ^ 1α=2 , where m
^ α is the solu-

AC tion of Z ðm ^ α Þ ¼ zα .
uðmÞ ¼ (29) 6. A generalized pivot approach: Tian (2005)
BD
applied the concept of generalized confidence
where A, B, C, and D are intervals on the zero-inflated data. Recall that the
models are almost the same except for the excess
μ
^ ðmÞ μ
^ ðmÞ μ ^ zeros. Tian derived a generalized pivot for p using
A¼ ^ am þ 1 þ 2
σ^ ðmÞ 2^ σ 2 σ^ 4 ðmÞ the relationship between binomial distribution
and beta distribution. The author also provided a
1 1 N1
þ log computing algorithm for this method:
σ^ 2 ðmÞ 2^ σ 4 ðm Þ N 0 â m
2 2 2

1 σ^ μ^ μ
^ ðm Þ i). Compute the transformed sample mean W
log 2 2þ 2
2 σ^ ðmÞ 2^
σ 2^σ ðmÞ and sample variance S2.
ii). Generate Z ~ N(0,1), U 2 χ 2N1 1 T p1 ~ beta(N0
1 ^
am 1 1 ^ 2 ðmÞ
μ
þ 2 þ þ 2 4
σ^ ðmÞ 2 2 2^ σ ðmÞ 2^ σ ðmÞ + 1, N1), and T p2 ~ beta(N0, N1 + 1). Com-
h qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffii pffiffiffiffiffiffi

1 1 pute Tθ ¼ W Z=U= N 1 1 s= N 1 þ
2 , 2 2
2^
σ 2 2^ σ ðmÞ S =U =ðN 1 1Þ . Then compute T 1 ¼ log

n 1 T p1 þ T θ and T 2 ¼ logð1T p2 Þ þ T θ .
B¼ ,
2N 0 σ^ 6 iii). Repeat ii for sufficiently many times and get a
nN 31 series of T1’s and T2’s.
C¼ , iv). Take the α/2 sample quantile of T1’s, denoted
2N 0 σ^ 6
" as L, and take the (1 α /2) sample quantile
N 0 ^
b m N1 N 0 ^b m N1 T of T2’s, denoted as U. The 100(1 α)% con-
D¼ 2
þ 4 6
σ^ ðmÞ 4 σ ðmÞ σ^ ðmÞ
2^ fidence intervals would be (L, U).
P
N1
#
2^
μ ðm Þ Wi The simulation by Zhou and Tu (2000) showed
i¼1 N1μ^ 2 ðmÞ that the bootstrap interval yields the best coverage
þ 6
σ^ 6 ðmÞ σ^ ðmÞ probability among the first four methods in small
0 12 to moderate samples, although bias-corrected ML
P
N1
has better accuracy when the skewness is very
BN ^ ^ ðmÞ i¼1 C
W i
B 0 b mn N 1 μ C small. Tian (2005) verified that the generalized
B þ 4 4 C ,
@ 2 σ^ ðmÞ σ^ ðmÞA confidence intervals provide comparable results as
the first four methods. Based on Tian’s simulation,
the generalized confidence intervals seem to be anti-conservative, and their performance is actually worse than that of the other methods when the skewness is low. But when the data are highly skewed, say σ = 10, the coverage probability of the generalized confidence intervals is close to the nominal level. The adjusted signed log-likelihood method has the best performance among all these methods based on the results of Tian and Wu (2006), although no direct comparison has been made between adjusted signed log-likelihood ratio-based intervals and generalized confidence intervals. Another aspect to be considered is the computational difficulty. Both likelihood-based methods are more difficult to compute than the other methods, as can be seen from the description of the methods.

Two Sample

Before proceeding to discuss these methods, some notation needs to be set up. $Y_{1,j}, \ldots, Y_{n_j,j}$, j = 1, 2, are now two sets of observations from distributions with mean $\theta_j$ and variance $\tau_j^2$, respectively. Define $W_{i,j} = \log(Y_{i,j})$ for all $i \in (1, \ldots, n_j)$, j = 1, 2, and denote the variance of $W_{i,j}$ as $\sigma_j^2$ and its mean as $\mu_j$, for j = 1, 2. Let $\bar W_j$ be defined as $\frac{1}{n_j}\sum_{i=1}^{n_j} W_{i,j}$, and let $S_j^2$, the sample variance, be $\frac{1}{n_j - 1}\sum_{i=1}^{n_j}\left(W_{i,j} - \bar W_j\right)^2$.

The difference between the two population means is δ = θ1 − θ2, which is the parameter of interest in this section. With a parametric assumption, i.e., the lognormal assumption, δ can be further specified. Under the lognormal assumption, $W_{1,j}, \ldots, W_{n_j,j}$ come from a normal distribution with mean $\mu_j$ and variance $\sigma_j^2$, j = 1, 2, and the difference of the two lognormal means is

$$\delta \equiv \theta_1 - \theta_2 = \exp\left(\mu_1 + \tfrac{1}{2}\sigma_1^2\right) - \exp\left(\mu_2 + \tfrac{1}{2}\sigma_2^2\right). \qquad (30)$$

Point Estimate

1. Mean difference: A straightforward estimator of the mean difference is the difference of the sample means,

$$\hat\delta = \bar Y_1 - \bar Y_2. \qquad (31)$$

2. The maximum likelihood estimator: When the lognormal assumption is appropriate, the MLE of δ is available in the form of

$$\hat\delta = \exp\left(\hat\mu_1 + \tfrac{1}{2}\hat\sigma_1^2\right) - \exp\left(\hat\mu_2 + \tfrac{1}{2}\hat\sigma_2^2\right).$$

The asymptotic variance of the MLE will be given in (37).

3. Smooth quantile estimation: Dominici et al. (2005) proposed a new kind of smoothing estimator that needs no parametric assumptions. They called it the smooth quantile ratio estimator.

Step 1. Estimate β in

$$\log\frac{y_{1(i)}}{y_{2(i)}} = s(p_i, \beta) + e_i, \quad i = 1, \ldots, n, \qquad (32)$$

where $s(p_i, \beta) = \sum_{j=0}^{\lambda} B_j(p_i)\beta_j$, $p_i = i/(n+1)$, and the $B_j(p)$ are orthonormal basis functions with $B_0(p) = 1$. If the sample sizes are unbalanced, say n1 > n2, a small modification is needed: replace $y_2$ by $q_2$, the linear interpolant of the order statistics $y_{2(i)}$ at the grid of points $p_{1i} = i/(n_1 + 1)$, i = 1, ..., n1. The choice of s is rather flexible; for instance, natural cubic splines, smoothing splines, and polynomials are all available choices. The simulation study by Dominici et al. (2005) showed that the resulting estimates are quite close to each other.

Step 2. Define $u_1 = \left(y_{1(1)}, \ldots, y_{1(n)}, \tilde y_{1(1)}, \ldots, \tilde y_{1(n)}\right)$ and similarly $u_2$, where

$$\tilde y_{1(i)} = y_{2(i)}\exp\left\{s\left(p_i, \hat\beta\right)\right\}, \qquad \tilde y_{2(i)} = y_{1(i)}\exp\left\{-s\left(p_i, \hat\beta\right)\right\},$$

and estimate Δ by

$$\hat\Delta_{SQ}(u_1, u_2, \lambda) = \bar u_1 - \bar u_2. \qquad (33)$$

Notice that the estimator is symmetric in the two samples. Furthermore, it can be viewed as a linear combination of order statistics, but with weights estimated from the data, and thus it is related to L-estimation.

The authors show that under mild conditions, the proposed estimator is asymptotically normal. In other words, $\sqrt{n}\left(\hat\Delta - \Delta\right)$ is asymptotically normal with mean 0 and variance $\sigma_\Delta^2$. The asymptotic variance is given by
ð1 ð1 where
σ 2Δ ¼
p¼0 q¼0
fminðp, qÞ pqgfλ1 η1 ðpÞηðqÞ þ λ2 η2 ðpÞη2 ðqÞgdpdq,
2 3
ð1 X
1 4 1 λ
F1 F ðpÞ þ F1 Bj ðqÞ F1 1 5
1 ð pÞ þ 2 ð pÞ 1 ðqÞ þ F2 ðqÞ dq
2 1 j¼1
0
η k ð pÞ ¼ :
ð1Þ Fg ðpÞf g F1
g 1
g ð p Þ
0
The estimation is achieved by substituting all unknown values with their empirical estimates.

In a simulation study, Dominici et al. (2005) showed that $\hat\Delta(\lambda = 2)$ is more robust than the MLE under the lognormal distribution, and it yields almost the same result when the parametric assumption is met. The choice of λ can also be made by cross validation. However, the computation of the smooth quantile ratio estimator, and especially of its asymptotic variance, is rather difficult compared with that of the MLE.

Confidence Intervals

With no parametric assumption, one can use the bootstrap or the asymptotic distribution of the smooth quantile ratio estimator to construct confidence intervals for the corresponding estimators. There are various ways to construct confidence intervals when the lognormal assumption is applied.

1. A maximum likelihood approach: The maximum likelihood estimate of δ is

$$\hat\delta = \exp\left(\hat\mu_1 + \tfrac{1}{2}\hat\sigma_1^2\right) - \exp\left(\hat\mu_2 + \tfrac{1}{2}\hat\sigma_2^2\right), \qquad (34)$$

where, for group i = 1, 2,

$$\hat\mu_i = \frac{1}{n_i}\sum_{j=1}^{n_i} W_{ij}; \qquad (35)$$

$$\hat\sigma_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i}\left(W_{ij} - \hat\mu_i\right)^2. \qquad (36)$$

It is known that the asymptotic variance of the MLE achieves the variance bound given by

$$\hat v^2 = h(\hat\varphi)'\, \hat I^{-1}\, h(\hat\varphi), \qquad (37)$$

where I is the Fisher information matrix and $\hat I$ denotes its estimator:

$$\hat I = \begin{pmatrix} n_1/\hat\sigma_1^2 & 0 & 0 & 0 \\ 0 & n_1/\left(2\hat\sigma_1^4\right) & 0 & 0 \\ 0 & 0 & n_2/\hat\sigma_2^2 & 0 \\ 0 & 0 & 0 & n_2/\left(2\hat\sigma_2^4\right) \end{pmatrix}.$$

The function h is defined as the partial derivative of δ with respect to $\varphi = (\mu_1, \sigma_1^2, \mu_2, \sigma_2^2)$:

$$h(\varphi) = \frac{\partial\delta}{\partial\varphi} = \left(m_1,\ \tfrac{1}{2}m_1,\ -m_2,\ -\tfrac{1}{2}m_2\right)', \qquad (38)$$

where $m_1 = \exp\left(\mu_1 + \tfrac{1}{2}\sigma_1^2\right)$ and $m_2 = \exp\left(\mu_2 + \tfrac{1}{2}\sigma_2^2\right)$. The 100(1 − α)% confidence interval is given by

$$\left(\hat\delta - z_{\alpha/2}\hat v,\ \hat\delta + z_{\alpha/2}\hat v\right), \qquad (39)$$

where $z_{\alpha/2}$ here denotes the upper α/2 quantile of the standard normal distribution. Since this is an asymptotic property, this CI can be expected to perform poorly in small samples.

2. A bootstrap approach: A parametric bootstrap method can be employed to replace the asymptotic standard normal distribution. The algorithm is summarized below:
(1) Compute $\hat\mu_i$, $\hat\sigma_i^2$ and $\hat\delta$, $\hat v$ from the samples of interest.
(2) Generate $n_i$ samples from $N\left(\hat\mu_i, \hat\sigma_i^2\right)$, i = 1, 2.
(3) Calculate $\hat\delta_j$ and $\hat v_j$ from the bootstrap sample.
(4) Compute the test statistic $t_j = \left(\hat\delta_j - \hat\delta\right)/\hat v_j$.
(5) Repeat steps (2) through (4) m times.

The 100(1 − α)% confidence intervals are constructed as in (39), with the corresponding empirical quantiles of t taking the role of z.

3. A signed log-likelihood ratio approach: Rewrite the log-likelihood function as a function of δ:

$$l(\delta, \lambda) = -n_1\log\sqrt{2\pi} - n_2\log\sqrt{2\pi} - n_1\log\sigma_1 - n_2\log\sigma_2 - \frac{1}{2\sigma_1^2}\sum_{j=1}^{n_1}\left\{y_{1j} - \log\left(\delta + \exp\left(\mu_2 + \tfrac{1}{2}\sigma_2^2\right)\right) + \tfrac{1}{2}\sigma_1^2\right\}^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^{n_2}\left(y_{2j} - \mu_2\right)^2, \qquad (40)$$

where λ is the vector of nuisance parameters (μ2, σ1, σ2) and the $y_{ij}$ are the log-transformed observations. The signed log-likelihood ratio statistic (SLLR) is

$$r(\delta) = \operatorname{sgn}\left(\hat\delta - \delta\right)\left\{2\left[l\left(\hat\delta, \hat\lambda\right) - l\left(\delta, \hat\lambda_\delta\right)\right]\right\}^{1/2}, \qquad (41)$$

where $\hat\delta$ and $\hat\lambda$ denote the maximum likelihood estimators, and $\hat\lambda_\delta$ denotes the constrained maximum likelihood estimator, that is, the MLE of the nuisance parameters at a given value of δ. The distribution of r approximates the standard normal to the first order. Thus, the CI is given by

$$\left\{\delta :\ \left|r(\delta)\right| \le z_{\alpha/2}\right\}.$$

4. A generalized pivotal approach: A generalized pivotal quantity is a statistic that has a distribution free of unknown parameters and an observed value that does not depend on nuisance parameters. In this case, define the generalized pivotal quantity as

$$T_D = \exp(T_1) - \exp(T_2).$$

Notice that this expression depends on two statistics, namely, T1 and T2. They are defined as

$$T_i = \hat\mu_i - \frac{\bar Y_i - \mu_i}{S_i/\sqrt{n_i}}\,\frac{\hat\sigma_i}{\sqrt{n_i}} + \frac{1}{2}\,\frac{\sigma_i^2}{S_i^2}\,\hat\sigma_i^2, \quad i = 1, 2. \qquad (42)$$

This is equivalent to

$$T_i = \hat\mu_i - \frac{Z_i}{U_i/\sqrt{n_i - 1}}\,\frac{\hat\sigma_i}{\sqrt{n_i}} + \frac{1}{2}\,\frac{\hat\sigma_i^2}{U_i^2/(n_i - 1)}, \quad i = 1, 2, \qquad (43)$$

where $Z_i \sim N(0, 1)$ and $U_i^2 \sim \chi^2_{n_i - 1}$. To obtain a CI with the generalized pivotal quantity, draw samples of $Z_i$ and $U_i^2$ and calculate the corresponding values of $T_D$. The CI can then be constructed from the empirical quantiles of a sufficiently large sample of $T_D$ values.

The simulation by Chen and Zhou (2006) showed that the generalized confidence intervals yield the best coverage probability, though their performance in small samples is slightly worse. As an alternative, the ratio of two means is also of some interest. The adjusted signed log-likelihood approach is available for the construction of confidence intervals, and it is the best choice in that case. For more details, see Chen and Zhou (2006).

Hypothesis Testing

The hypothesis to be discussed here is a two-sided hypothesis:

$$H: \delta = 0 \quad \text{vs.} \quad K: \delta \ne 0.$$

A one-sided test can be derived easily from the two-sided tests by taking the upper critical value or the lower critical value.

1. A nonparametric bootstrap approach: Zhou et al. (1997) proposed using the bootstrap to get the p-value of the t-statistic. Unlike the bootstrap method used to construct the confidence interval, this time the method does not require a parametric assumption. The algorithm is summarized below:
1. Calculate the combined sample mean:

$$\hat v = \frac{n_1}{n_1 + n_2}\bar Y_1 + \frac{n_2}{n_1 + n_2}\bar Y_2.$$

2. Transform the samples so that they share a common mean:

$$T_{i,1} = Y_{i,1} - \bar Y_1 + \hat v, \qquad T_{i,2} = Y_{i,2} - \bar Y_2 + \hat v.$$

3. Resample n1 and n2 observations with equal weights from {T_{i,1}} and {T_{i,2}} with replacement, respectively. Denote the bootstrap samples as {Z_{i,1}} and {Z_{i,2}}.

4. Compute the bootstrap statistic:

$$t^* = \frac{\bar Z_1 - \bar Z_2}{\sqrt{\dfrac{\tau_1^2}{n_1} + \dfrac{\tau_2^2}{n_2}}},$$

where $\tau_i^2$ is the sample variance of the i-th bootstrap sample.

5. Repeat steps 3 and 4 B times, where B is a large number chosen by the investigator, and denote the series of test statistics as $\{t_i^*\}_{i=1}^{B}$.

6. Calculate the observed test statistic $t_{obs}$ in the same manner as in step 4, using the original samples.

7. The p-value is

$$p = \frac{\#\left\{t_i^* : \left|t_i^*\right| > \left|t_{obs}\right|\right\}}{B}.$$

2. Z-score test: The test statistic is defined as

$$Z = \frac{\bar W_1 - \bar W_2 + 0.5\left(S_1^2 - S_2^2\right)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2} + 0.5\left(\dfrac{S_1^4}{n_1 - 1} + \dfrac{S_2^4}{n_2 - 1}\right)}}. \qquad (44)$$

The distribution of Z is approximately standard normal under the null hypothesis. Thus, the p-value is min{1 − Φ(Z), Φ(Z)}. This test and the following tests all require the lognormal assumption.

3. Score tests: Gupta and Li (2006) derived the score test for two lognormal means. Let $\hat\lambda_0 = \left(\hat\mu_{20}, \hat\sigma_{10}, \hat\sigma_{20}\right)$ be the constrained MLE under the condition that δ = 0. The test statistic R is

$$R = \left[\frac{n_1\left(\bar W_1 - \hat\mu_{20}\right)}{\hat\sigma_{10}^2} - \frac{n_1\hat\sigma_{20}^2}{2\hat\sigma_{10}^2} + \frac{n_1}{2}\right]^2\left(\frac{\hat\sigma_{20}^4}{2n_2} + \frac{\hat\sigma_{20}^2}{n_2} + \frac{\hat\sigma_{10}^4}{2n_1} + \frac{\hat\sigma_{10}^2}{n_1}\right).$$

The score R follows a $\chi_1^2$ distribution as the sample sizes n1 and n2 go to infinity.

4. Generalized p-value: The generalized p-value can be obtained from the $T_D$ values that were used to construct the generalized confidence interval. Suppose the null hypothesis is θ1 − θ2 = 0; then the generalized p-value is

$$p = \min\left\{P\left(T_D \le 0 \mid \theta_1 - \theta_2 = 0\right),\ P\left(T_D \ge 0 \mid \theta_1 - \theta_2 = 0\right)\right\},$$

which can be estimated from the empirical distribution of the $T_D$ values.

The z-score test is computationally simple and has a straightforward interpretation. The simulation of Zhou et al. (1997) showed that it performs satisfactorily when the sample sizes of both groups are large. However, Krishnamoorthy and Mathew (2003) found that the distribution of the z-score is skewed when the samples are unbalanced between the two groups and when the skewness is high relative to the sample sizes. In that case, the generalized p-value would be a better choice for hypothesis testing. Gupta and Li (2006) addressed the same issue and showed that the score test has better control over type I error and higher power than the z-score test. It is recommended to use the score test or the generalized p-value, especially when the sample sizes are not equal. Again, caution should be taken when interpreting the generalized p-value. All three methods discussed above are based on parametric assumptions; therefore, they are all subject to large errors when the lognormal assumption is violated. In that case, a bootstrap test is preferable since it makes no parametric assumptions. Zhou and Tu (1999) discussed the comparison of multiple population means with zero-inflated distributions.
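The seven steps of the nonparametric bootstrap test described in this section can be sketched in a few lines. This is a minimal illustration rather than the code of Zhou et al. (1997): the function names `unpooled_t` and `bootstrap_mean_test`, the default number of replicates, and the use of the n − 1 divisor for the sample variances are assumptions made here.

```python
import numpy as np


def unpooled_t(a, b):
    """t-statistic with unpooled sample variances (n - 1 divisor; an assumption)."""
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def bootstrap_mean_test(y1, y2, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for H: delta = 0 (steps 1 to 7 of the text)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    rng = np.random.default_rng(seed)
    n1, n2 = y1.size, y2.size

    # Step 1: combined sample mean.
    v = (n1 * y1.mean() + n2 * y2.mean()) / (n1 + n2)
    # Step 2: shift both samples so that they share the common mean v.
    t1 = y1 - y1.mean() + v
    t2 = y2 - y2.mean() + v
    # Step 6: observed statistic from the original samples.
    t_obs = unpooled_t(y1, y2)

    # Steps 3 to 5: resample with replacement and recompute the statistic.
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        z1 = rng.choice(t1, size=n1, replace=True)
        z2 = rng.choice(t2, size=n2, replace=True)
        t_star[b] = unpooled_t(z1, z2)

    # Step 7: proportion of bootstrap statistics more extreme than t_obs.
    p_value = float(np.mean(np.abs(t_star) > abs(t_obs)))
    return float(t_obs), p_value
```

Calling `bootstrap_mean_test(costs_group1, costs_group2)` on two arrays of raw (untransformed) costs returns the observed statistic and the bootstrap p-value; the two array names are placeholders.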
Applications on a Simple Example

Callahan et al. (1997) studied the relationship between depression and the expected cost of diagnostic testing for a patient. A subset of patients who had a chronic medical condition, as defined by the Ambulatory Care Group system, was selected from the entire dataset. The focus of the statistical analysis was on the mean diagnostic testing cost because it can be used to reconstruct the total cost. The data can be summarized in a 124 by 2 matrix, of which the first column records the cost and the second contains the depression indicator. Thirteen patients in this sample were diagnosed with depression (depression = 1). Four of them have zero costs. The corresponding ratio is 17 out of 111 for the non-depression patients (depression = 0).

To see how these methods perform on this dataset, three questions are raised:

1. What is the mean cost of those non-depression patients who have positive cost, and what is the corresponding confidence interval?
2. What is the mean cost of those non-depression patients, and what is the corresponding confidence interval?
3. What is the difference in mean cost between the depression group and the non-depression group among those who have positive cost? And, of course, what are the corresponding confidence intervals and hypothesis tests?

Although they are made up for this example, these questions are commonly seen in real analyses and help illustrate the performance of the various methods.

The estimators based on the lognormal assumption provide similar answers, while the sample mean is separated from the others. In terms of confidence intervals, the nonparametric methods perform similarly to one another, but there is an obvious difference between nonparametric and parametric methods. Nonparametric methods tend to be more conservative and robust. The most conservative parametric CI is Angus's conservative CI, which has a width of about 604, larger than those of the nonparametric methods except for Hall's transformation. Cox's method yields the narrowest (least conservative) interval. The estimates of the lower bound are more alike than the estimates of the upper bound, which might be the result of the right skewness. The lower bound from Hall's transformation is close to those of the parametric methods, but its upper bound is much larger than the other estimates.

The results in Tables 4 and 5 are similar to those on positive data. The sample mean is larger than the other two estimators; nonparametric CIs tend to be more conservative than parametric CIs (Tables 1, 2, 3, 6, 7, 8).

For two-sample inference, there are nine observations with positive costs in the depression group, and the standard error of their costs is 1116.3. This might contribute to the extreme estimate of the upper bound by the generalized pivotal method. Other than that, the results are consistent. Zero is included in all confidence intervals constructed by the different methods. This is consistent with the results of hypothesis testing, where none of the four p-values is significant at conventional levels.

Regression

In some sense, linear regression can be viewed as a generalization of multiple comparison. Consider a simple linear regression with a binary variable as its covariate; the test on the coefficient is the same as a two-sample t-test on the mean difference. When the covariate at hand is continuous, i.e., there are
Table 1 95% confidence intervals of the one-sample mean (1)

Parametric methods
95% confidence intervals   Cox     Angus    Parametric bootstrap   SLR     Adjusted SLR   Generalized pivotal
Lower bounds               407.9   406.6    419.2                  416.3   418.5          420.1
Upper bounds               731.7   1010.2   759.2                  750.6   761.2          767.7
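For comparison with the parametric intervals, the two nonparametric intervals for a one-sample mean described under "Nonparametric Methods on Continuous Data" (the normal approximation with the sample standard error, and the same interval with Efron's bootstrap standard error) can be sketched as follows. The function names and defaults are illustrative assumptions, and the sketch is not claimed to reproduce the tabulated values.

```python
from statistics import NormalDist

import numpy as np


def clt_interval(y, alpha=0.05):
    """Normal-approximation CI using the sample standard error of the mean."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Standard error: square root of sum((Y_i - Ybar)^2) / (n (n - 1)).
    se = np.sqrt(np.sum((y - y.mean()) ** 2) / (n * (n - 1)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return y.mean() - z * se, y.mean() + z * se


def bootstrap_se_interval(y, n_boot=2000, alpha=0.05, seed=0):
    """Normal-approximation CI using a bootstrap estimate of the standard error."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Steps 1 to 3: resample with replacement and store each resampled mean.
    means = np.array(
        [rng.choice(y, size=y.size, replace=True).mean() for _ in range(n_boot)]
    )
    # Step 4: the standard deviation of the bootstrap means estimates the SE.
    se_b = means.std(ddof=1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return y.mean() - z * se_b, y.mean() + z * se_b
```

Both functions take the raw positive costs as a one-dimensional array and return a (lower, upper) pair.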
Table 2 95% confidence intervals of the one-sample mean (2)

NP methods
95% confidence intervals   CLT     NP bootstrap   Hall
Lower bounds               343.5   346.2          420.5
Upper bounds               819     816.2          1692.3

Table 3 Estimates of the one-sample mean

Point estimate   Sample mean   MLE     UMVUE   cm MSE
                 581.3         542.7   540     529

Table 4 Estimates of the zero-inflated mean

Point estimate   Sample mean   Bias-corrected MLE   MVUE
                 492.3         462.7                457.7

infinite categories, a test of the mean relation between the dependent variable and the covariate can be achieved by a linear regression. Both ordinary linear regression and the generalized linear model describe the mean relation between the dependent variable (the outcome) and the covariates. In other words, they give an "on-average" type of description of the data. The other kind of regression that is going to be discussed in this section is quantile regression. As will be explained later, quantile regression is slightly different in interpretation from linear regression.

There is an extensive econometric literature on methodologies and applications of regression on medical costs. The features of cost data are the same as those in the last section: skewness, nonnegative values, and a nontrivial fraction of zero observations. Clustering and multimodality might also affect the validity of results if not properly adjusted for.

The most common way to analyze cost data is log transformation. As discussed in the last section, log-transformed data often have a more symmetric distribution than the original data. The heteroscedasticity found in cost data can sometimes be mitigated by a variance-stabilizing transformation, including the log transformation. Thus, linear regression can be applied to the transformed data. However, a regression on transformed data can only be interpreted as the mean relationship between the transformed outcome and the covariates, which is not of scientific interest. This does not cause any trouble when the relation of interest is multiplicative, for instance, the influence of the inflation rate on wages. But when the quantity of interest is, say, the total cost, a regression on the transformed data is not enough to answer the question. Therefore, back-transformation becomes a problem. The smearing estimator of Duan (1983) is the dominant approach in this area.

Another way to deal with skewness and nonconstant variance is to use a generalized linear model (GLM). In a GLM, the relation between the dependent variable and the covariates is described by two components: the link function and the mean-variance relationship. The flexibility of the link function and the variance structure provides a wide range of models that can be described under the GLM setting. Various methods have been proposed to help researchers choose the model that best fits the data. Manning et al. (2005) found that the GLM and the log-transformed OLS can be summarized in one family of models named the generalized gamma model.

In most studies, the methods described above would not be considered complete without a way to deal with the nontrivial fraction of zeros. The zeros cause a direct problem with the log transformation, since log(0) is undefined. A straightforward, also naive, solution is to add a
Table 5 95% confidence intervals of the zero-inflated mean

                           Parametric methods                                                  NP methods
95% confidence intervals   MLE     MVUE    Parametric bootstrap   SLR     Generalized pivotal   CLT     NP bootstrap
Lower bounds               342.1   317.9   330.6                  347.8   344.2                 287.3   288.9
Upper bounds               625.8   597.4   613.1                  640.8   653.5                 697     695.6
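As a rough illustration of how a bias-corrected MLE and its ML confidence interval for a zero-inflated lognormal mean are computed, the sketch below applies the point estimate $(1 - \hat p)\exp(\hat\mu + \hat\sigma^2/2)$ and a delta-method standard error on the log scale, $N_0/(nN_1) + \hat\sigma^2/N_1 + \hat\sigma^4/(2N_1)$. The function name is hypothetical, the variance expression is an assumption taken from the chapter's description of the ML interval, and the sketch is not claimed to reproduce the values in Tables 4 and 5.

```python
import math
from statistics import NormalDist

import numpy as np


def zero_inflated_lognormal_mean(y, alpha=0.05):
    """Bias-corrected MLE of a delta-distribution mean with an ML-type CI."""
    y = np.asarray(y, dtype=float)
    n = y.size
    w = np.log(y[y > 0])          # log-transformed nonzero observations
    n1 = w.size                   # number of nonzero observations
    n0 = n - n1                   # number of zero observations
    if n1 < 2:
        raise ValueError("need at least two nonzero observations")

    p_hat = n0 / n
    mu_hat = w.mean()
    s2 = w.var(ddof=1)            # unbiased variance estimator (divisor N1 - 1)

    theta = (1 - p_hat) * math.exp(mu_hat + s2 / 2)
    # Delta-method variance of log(theta_hat); assumed form, see lead-in.
    se = math.sqrt(n0 / (n * n1) + s2 / n1 + s2 ** 2 / (2 * n1))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta * math.exp(-z * se), theta * math.exp(z * se))
```

The interval is built on the log scale and exponentiated, so it is asymmetric around the point estimate and cannot fall below zero.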
Table 6 Estimates of the mean difference between two continuous samples

Point estimate   Sample mean   MLE
                 269.1         331.6

Table 7 95% confidence intervals of the mean difference between two continuous samples

95% confidence intervals   MLE      Parametric bootstrap   SLR      Generalized pivotal
Lower bounds               −491.5   −174.9                 −442.1   −195.5
Upper bounds               1154.7   2455.3                 2568.8   8613.1

Table 8 P-value of the hypothesis that mean difference is zero

          Score test   Z-score test   Bootstrap test   Generalized p-value
p-Value   0.85         0.35           0.43             0.15

small constant to the zeros. The constant is often chosen to be the minimum positive value in the sample. As one could easily point out, this method has barely any scientific justification, and its only purpose is to make the model work. Another method is to describe the distribution of cost as a combination of several distributions, which is referred to as a mixture distribution. A common strategy is to model whether a positive cost is observed in the first part of a two-part model and then, in the second part, use the regression methods discussed before to analyze the cost conditional on the observations that have medical costs. The technical problem that arises in a two-part model is the conditional variance of the estimators, which will be explained in detail later in this section.

Parameters of Interest

There are several parameters of the cost data that are of practical interest.

1. The conditional mean μ(x) = E[Y | X = x]. This is the expected cost of a patient given the patient's covariates. It can also be used to make inference about the total cost of a population.

2. The (conditional) marginal effect $\frac{\partial\mu(x)}{\partial x_k} = \frac{\partial E[y \mid x]}{\partial x_k}$. It is a typical measure of how a certain covariate $x_k$ affects the dependent variable Y. In simple regression, it is called the "slope." However, the concept of slope might not be valid in other regression frameworks, which is why the marginal effect is introduced. Whereas the slope in linear regression does not depend on the other covariates, marginal effects in general do depend on the values of the other covariates. Interpretations of marginal effects must not ignore this property.

3. The average marginal effects $\theta_1 = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\mu(x_i)}{\partial x_k}$ for fixed x or $\theta_2 = E\left[\frac{\partial\mu(x)}{\partial x_k}\right]$ for random x. Marginal effects are conditional on the other covariates. The average marginal effects are unconditional values obtained by averaging over the possible values of the covariates. Therefore, they are features of the entire population rather than of any individual.

As an example, in linear regression, the mean of y given x is simply $x^T\beta$, the slope of the mean with respect to $x_k$ is $\beta_k$, and $\theta_1 = \theta_2 = \beta_k$. There are more summary quantities for the data, but the methods introduced in this section focus only on these quantities.

Linear Regression on Raw Data

Despite their low efficiency, the least squares estimators of linear models are applicable to medical cost data. The estimators of the coefficients remain unbiased and consistent, which means they provide acceptable results as long as the sample size is large enough. However, caution should be taken in estimating the variance of the estimated coefficients. It is quite possible that heteroscedasticity exists in cost data. Statistical inference on the coefficients would be invalid without accounting for the heteroscedasticity. Therefore a robust standard error is more plausible than the homoscedastic standard error. The Huber/White
estimate of the variance-covariance matrix is highly recommended to construct the robust standard errors of coefficients. A typical linear model can be written as

\[ Y = X\beta + e, \]

where

\[ Y = (y_1, y_2, \ldots, y_n)^T, \]

and the design matrix

\[
X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} & \cdots & x_{1p} \\
x_{21} & x_{22} & x_{23} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & x_{n3} & \cdots & x_{np}
\end{pmatrix},
\]

and the residuals:

\[ e = (e_1, e_2, \ldots, e_n)^T. \]

Certain assumptions are needed to make the linear regression valid. This section introduces only some crucial assumptions, without further comment. A systematic analysis and description of linear regression can be found in various textbooks, for instance, Seber and Lee (2012) and Hayashi (2000).

Assumption 1: Linearity

\[ E(Y) = X\beta. \]

This assumption is sometimes written as

\[ y_i = \beta_1 x_{i1} + \ldots + \beta_p x_{ip} + e_i, \]

where e has mean zero. An important point to remember is that linearity here refers to linearity in the coefficients, not in the covariates.

Assumption 2: Exogeneity

\[ E(e_i \mid X) = 0, \quad (i = 1, 2, \ldots, n). \]

The statement means that the expectation of the residual is zero conditional on all the covariates. The exogeneity here is actually strict exogeneity. There is also an assumption called weak exogeneity, which is weaker than this one. When X is treated as fixed, the exogeneity holds.

Assumption 3: No Multicollinearity

\[ \operatorname{rank}(X) = p, \]

or, in other words, none of the column vectors of X can be written as a linear combination of the other columns.

Assumption 4: Uncorrelation

\[ \operatorname{cor}(e_i, e_j \mid X) = 0, \quad i \neq j. \]

If this assumption is violated, it is necessary to use estimators of standard errors that adjust for correlations. This is most often observed in spatial or temporal data. A group of observations that has correlations among its members is called a cluster.

Assumption 5: Constant Variance or Homoscedasticity

\[ \operatorname{var}(e_i \mid X) = E(e_i^2 \mid X) = \sigma^2, \quad (i = 1, \ldots, n). \]

The homoscedasticity assumption might be one of the least important assumptions for linear regression: it is too strong to be true in most real-world problems, and well-developed methods are available to estimate the standard errors when it is violated.

Assumption 6: Normality

\[ e_i \mid X \sim N(0, \sigma_i), \quad (i = 1, \ldots, n). \]

This parametric assumption is important in deriving the distribution of the test statistics in hypothesis testing. But most of the tests are still valid as long as the sample size is large enough, even when the errors are not normally distributed.

Linear regressions describe the mean relations between the outcome and covariates. The central limit theorem ensures that a consistent conclusion would be achieved, which means that estimates and inferences from an appropriate linear regression would be correct with infinite samples. However, consistency is only one aspect of data analysis.
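As a minimal sketch of the recommendation in this section to pair least squares with Huber/White robust standard errors, the following Python example fits a simple linear regression to simulated skewed, heteroscedastic costs and computes both the classical and the HC0 (White) standard error of the slope. The simulated gamma costs, the function name `ols_simple_with_robust_se`, and the restriction to a single covariate are illustrative assumptions, not part of the original analysis.

```python
import math
import random


def ols_simple_with_robust_se(x, y):
    """Least squares fit of y = a + b*x with two standard errors for b.

    Returns (intercept, slope, classical SE of slope, HC0 robust SE of slope).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Classical SE: assumes a constant error variance (homoscedasticity).
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC0 (Huber/White) SE: each squared residual stands in for its own
    # error variance, so no constant-variance assumption is needed.
    meat = sum(((xi - xbar) ** 2) * e * e for xi, e in zip(x, resid))
    se_robust = math.sqrt(meat / sxx ** 2)
    return intercept, slope, se_classical, se_robust


# Simulated costs (an assumption for illustration): gamma distributed with
# mean 100 + 50*x, so the variance grows with the mean.
random.seed(0)
n = 5000
x = [random.uniform(0.0, 2.0) for _ in range(n)]
y = [random.gammavariate(2.0, (100.0 + 50.0 * xi) / 2.0) for xi in x]

intercept, slope, se_classical, se_robust = ols_simple_with_robust_se(x, y)
print(f"intercept={intercept:.1f} slope={slope:.1f}")
print(f"classical SE={se_classical:.2f} robust SE={se_robust:.2f}")
```

The point estimates are the same under both approaches; only the standard error, and hence the inference, changes. For several covariates the same idea applies with the matrix form of the sandwich estimator.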
Research is often restricted by sample size, in which case efficiency is of more importance to investigators than consistency. Unfortunately, one major disadvantage of linear regression on raw data is its lack of efficiency. In other words, this regression method needs a greater sample size to reach the same accuracy as some other methods. Recall that skewness and heavy tailedness of the distribution of outcomes (Y) are the main features that are responsible for the low efficiency of ordinary regressions. It is natural to think of transformations on Y to “correct” these features.

Transformation on Y

The intuition of transformation is straightforward: to achieve a better distribution of data by transforming the outcomes with some monotone function. The advantage is also clear: an appropriate transformation would increase the efficiency of estimation (Manning and Mullahy 2001, Briggs et al. 2005). As discussed in section “Parameters of Interest,” an obvious issue of transformation is the change of scale. The inference made on the transformed scale might not have scientific meaning. Moreover, it is inappropriate to transform estimates directly back to the original scale, since this results in biased and inconsistent estimates. Statistical inferences on the transformed scale are very likely to be different from those made on the original scale. Thus, the main difficulty in the methods based on transformation is the back-transformation problem.

A general procedure can be summarized in three steps: transformation, regression, and back-transformation.

The first step, transformation, consists of choosing a transform function h and substituting y with h(y). Various functions can serve as the transform function as long as they are monotone and thus invertible. The Box-Cox transformation is considered a well-defined group of transformations for skewed data. Other variance-stabilizing transformations are also available (Weisberg 2005). For the analysis of cost data, the log transformation is preferred over others in practice for certain practical reasons. For instance, regression analysis on the log-transformed scale reveals the proportional changes of the outcome. It is sometimes the quantity of interest, say, when investigating the association between wages and inflation rate. If so, the problem of back-transformation is avoided. Yet inference on the total mean is often what investigators of medical cost are concerned about, which requires back-transformation. Another issue is that variance-stabilizing transformations can normalize the distribution of the dependent variable while failing to stabilize the variance as they should. Therefore, homoscedasticity might not hold for the transformed data.

The next step is to apply the methods discussed in the last section to the transformed data. It is recommended to employ as few assumptions as possible since there is no a priori knowledge of the transformed data. The inference made on the transformed scale might be adequate to answer the questions as mentioned above, and then there is no need for the back-transformation step. Otherwise, the analysis should be continued.

The last step, back-transformation, is the key step in this method. Transformation is a tool to gain efficiency, but the questions of interest are still on the original scale of the cost data. The back-transformation methods are dominated by Duan's smearing estimators. Duan (1983) proposes a nonparametric estimator that uses the average of the retransformed residuals to estimate the expectation of the dependent variable on the original scale. We estimate $E Y_0$ by substituting the unknown cdf F by its empirical estimate $\hat{F}_n$:

\[ E Y_0 = \frac{1}{n}\sum_{i=1}^{n} h(x_0\beta + \hat{e}_i). \tag{45} \]

Further substituting the regression parameter $\beta$ in (45) by its least squares estimate $\hat{\beta}$, the smearing estimator is thus defined as

\[ E Y_0 = \frac{1}{n}\sum_{i=1}^{n} h(x_0\hat{\beta} + \hat{e}_i). \tag{46} \]

Here, following Duan's notation, h in (45) and (46) denotes the retransformation to the original scale, that is, the inverse of the transformation applied in the first step.

Applications and generalizations of Duan's method have been proposed in recent years. In the rest of this section, three procedures are introduced as examples of transformation-based methods. The first one is the widely used
logarithm transformation by Ai and Norton (2000); the rest are robust, yet efficient, nonparametric methods by Welsh and Zhou (2006) and Zhou et al. (2008).

Example: Log Transformation
Ai and Norton (2000) derived the forms of the standard errors of smearing estimators under log transformations by the delta method. Their methods allow for situations where a nonlinear regression has been applied in the second step. Results for linear regression can be easily obtained from the general conclusions.

Although the normality assumption might not always hold for transformed data, there is no harm in looking at the simplified case in which the residuals are assumed to be normally distributed. Write the model as $\ln(y) = k(x,\beta) + s(x,\gamma)e$, where $k(x,\beta)$ is any model of the expectation of $\ln(y)$ given x and e has mean 0 and unit variance. Imposing the normality assumption on e means assuming that e follows a standard normal distribution. Notice that the square of $s(x,\gamma)$ is the variance of the error term $s(x,\gamma)e$; writing it as a function of x allows for heteroscedasticity. Both $k(x,\beta)$ and $s(x,\gamma)$ need to be specified. For linear models, $k(x,\beta)$ is defined as $x'\beta$. Suppose $\hat{\beta}$ is the estimate from the linear regression, or nonlinear regression, depending on the form of k, on the transformed data, $\hat{e}_i$ is the residual for $x_i$, and $\hat{\Sigma}_\beta$ is the heteroscedasticity-consistent covariance matrix. An additional regression is needed in order to get the estimates, which is to regress $\hat{e}_i^2$ on $s(x_i,\gamma)$. Denote the estimate of $\gamma$ from the second regression as $\hat{\gamma}$ and its heteroscedasticity-consistent covariance matrix as $\hat{\Sigma}_\gamma$. Then the estimate of the expectation of y given x is

\[ \hat{\mu}(x) = \exp\left( k(x,\hat{\beta}) + 0.5\, s^2(x,\hat{\gamma}) \right), \]

with variance

\[ \omega_1(x) = \frac{\partial\mu(x)}{\partial\beta}\,\Sigma_\beta\,\frac{\partial\mu(x)}{\partial\beta'} + \frac{\partial\mu(x)}{\partial\gamma}\,\Sigma_\gamma\,\frac{\partial\mu(x)}{\partial\gamma'}. \]

The incremental effect of the jth element of x is

\[ \frac{\partial\hat{\mu}(x)}{\partial x_j} = \hat{\mu}(x)\left[ \frac{\partial k(x,\hat{\beta})}{\partial x_j} + 0.5\,\frac{\partial s^2(x,\hat{\gamma})}{\partial x_j} \right], \]

with variance

\[ \omega_{2j}(x) = \frac{\partial^2\mu(x)}{\partial x_j\,\partial\beta}\,\Sigma_\beta\,\frac{\partial^2\mu(x)}{\partial x_j\,\partial\beta'} + \frac{\partial^2\mu(x)}{\partial x_j\,\partial\gamma}\,\Sigma_\gamma\,\frac{\partial^2\mu(x)}{\partial x_j\,\partial\gamma'}. \]

The sample average incremental effect or the marginal effect is

\[ \hat{\theta}_j = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial\hat{\mu}(x_i)}{\partial x_j}, \]

with variance

\[ \omega_{3j}(x) = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\mu(x_i)}{\partial x_j\,\partial\beta}\right)\Sigma_\beta\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\mu(x_i)}{\partial\beta'\,\partial x_j}\right) + \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\mu(x_i)}{\partial x_j\,\partial\gamma}\right)\Sigma_\gamma\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\mu(x_i)}{\partial\gamma'\,\partial x_j}\right). \]

The quantities needed in the above formulas are listed in the appendix.

If the normality assumption is inappropriate, estimators are still available, although they are more complicated. A new quantity needs to be defined: $m_i(x,\beta,\gamma) = \exp\left( k(x,\beta) + [(\ln(y_i) - k(x_i,\beta))/s(x_i,\gamma)]\,s(x,\gamma) \right)$, which is the predicted value of $\mu(x)$ based on $x_i$. The intuitive idea is simple: replace the distribution function with its empirical estimate. The three estimators are listed below. The estimated variances can be found in Appendix B.

\[ \hat{\mu}(x) = \frac{1}{n}\sum_{i=1}^{n} m_i(x,\hat{\beta},\hat{\gamma}), \]

\[ \frac{\partial\hat{\mu}(x)}{\partial x_j} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial m_i(x,\hat{\beta},\hat{\gamma})}{\partial x_j}, \]

\[ \hat{\theta}_j = \frac{1}{n}\sum_{k=1}^{n}\left( \frac{1}{n}\sum_{i=1}^{n}\frac{\partial m_i(x_k,\hat{\beta},\hat{\gamma})}{\partial x_j} \right). \]

In this example, the transformation is prespecified and thus known. Transformation
functions would have higher efficiency when the assumptions are more likely to be true. The following example shows how to estimate the transform function from the data.

Example: Estimating Transformation Function
It turns out that the transformation function that satisfies $h(y) = x'\beta + s(x'\gamma)e$ is more restricted than it seems to be. Zhou et al. (2008) showed that the transformation function can be estimated from the data once such a linear model is specified. Now, the model is defined as $h(y) = x'\beta + s(x'\gamma)e$. Notice that it is now a linear regression with heteroscedasticity, but the variance is a known function of the scalar $x'\gamma$. Denoting $x'\beta$ as $z_1$ and $x'\gamma$ as $z_2$, the density function of $z_1, z_2$ as $p(z_1, z_2)$, and the cumulative distribution of y given $z_1, z_2$ as $G(y \mid z_1, z_2)$, Zhou et al. (2008) derived the relation between h and $g_1 = \partial G/\partial z_1$:

\[ h(y) = \int_{y_0}^{y} \frac{\sum_{i=1}^{n} p(u, Z_{1i}, Z_{2i})}{\sum_{i=1}^{n} g_1(u, Z_{1i}, Z_{2i})}\,du. \]

They proposed to estimate it with kernel estimates of the unknown density function and distribution function:

\[ h_n(y) = \int_{y_0}^{y} \frac{\sum_{i=1}^{n} p_n(u \mid Z_{1i}, Z_{2i})\,p_n(Z_{1i}, Z_{2i})}{\sum_{i=1}^{n} g_{1n}(u \mid Z_{1i}, Z_{2i})\,p_n(Z_{1i}, Z_{2i})}\,du, \]

where

\[ p_n(z_1, z_2) = \frac{1}{n b_1 b_2}\sum_{i=1}^{n} K_1\!\left(\frac{Z_{1i} - z_1}{b_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{b_2}\right), \]

\[ g_{1n}(y \mid z_1, z_2) = \partial G_n(y \mid z_1, z_2)/\partial z_1, \]

\[ p_n(y \mid z_1, z_2) = \frac{1}{n b_0 b_1 b_2\, p_n(z_1, z_2)}\sum_{i=1}^{n} K_0\!\left(\frac{Y_i - y}{b_0}\right) K_1\!\left(\frac{Z_{1i} - z_1}{b_1}\right) K_2\!\left(\frac{Z_{2i} - z_2}{b_2}\right). \]

$K_0$, $K_1$, and $K_2$ are kernel functions with bandwidths $b_0$, $b_1$, and $b_2$, respectively. The unknown parameters can be obtained by using the estimating equations:

\[ \sum_{i=1}^{n} \frac{\left( h(y_i) - X_i'\beta \right) X_i}{s^2(X_i'\gamma)} = 0, \]

\[ \sum_{i=1}^{n} \left[ \left( h(y_i) - X_i'\beta \right)^2 - s^2(X_i'\gamma) \right] X_i = 0, \]

and

\[ \beta_n = \left( \sum_{i=1}^{n} \frac{X_i X_i'}{s^2(X_i'\gamma)} \right)^{-1} \sum_{i=1}^{n} \frac{X_i\,h(y_i)}{s^2(X_i'\gamma)}. \]

The conditional mean on the original scale ($\mu(x)$) can be easily estimated by the smearing estimator:

\[ \hat{\mu}(x) = \frac{1}{n}\sum_{i=1}^{n} \hat{h}^{-1}\!\left( x'\hat{\beta} + s(x'\hat{\gamma})\,\frac{\hat{h}(y_i) - X_i'\hat{\beta}}{s(X_i'\hat{\gamma})} \right). \]

Zhou et al. (2008) proved that the estimator $\hat{\mu}(x)$ converges to the true value at the rate $n^{-1/2}$, and, as a nonparametric method, it is suitable for any distribution of y. For more details and the estimate of variance, please see Zhou et al. (2008).

Example: Nonparametric Retransformation
Welsh and Zhou (2006) proposed a method that can estimate the back-transformed mean and its standard error for any transformation function. The model is assumed to be $h(y_i) = x_i'\beta_0 + g_i(\beta_0, \gamma_0)e_i$, where $g_i$ can be a function of $x_i$ and the $e_i$ are
independent and identically distributed random variables. The parameter $\psi = (\beta^T, \gamma^T)^T$ is estimated from estimating equations. Then denote $\eta_i = x^T\beta_0 + g(\psi_0)\,e_i(\psi)$, where $e_i(\psi) = \left( h(y_i) - x_i^T\beta_0 \right)/g_i(\psi_0)$, and the estimated mean on the original scale is $\hat{m} = \frac{1}{n}\sum_{i=1}^{n} h^{-1}(\eta_i(\hat{\psi}))$, which is also a smearing estimator. The idea of this method is to estimate the empirical distribution of the residuals $e_i$ instead of making assumptions. The corresponding standard errors are estimated with the help of the properties of empirical processes. In the original paper, Welsh and Zhou (2006) generalized this method to the situation when there are observations with zero costs.

The idea of the transformation method is to transform the data so that it has a “better” distribution, which is often more symmetric and less heavy tailed and can be better fitted with a linear model. By doing this, one can gain efficiency from transformation and assumptions. A natural alternative is to abandon the requirement about symmetry. For instance, a log-transformed linear model can be interpreted as a lognormal model as well. In the next section, this kind of model, the generalized linear model, and its applications will be discussed.

Transformation on E[Y]

The linear model can be viewed as a parametric model based on the normality assumption, where the mean of the normal distribution is assumed to have a linear relationship with the coefficients. If the model is correct, the dependent variable is normally distributed and therefore symmetric and without a heavy tail. A natural generalization of this traditional linear model is to expand the family of distributions to account for possible skewness and heavy tails, which is called the generalized linear model (GLM) by McCullagh and Nelder (1989). The GLM was first introduced to the area of medical cost analysis by Blough et al. (1999).

Let $\mu$ be $E(Y)$, where Y is an $n \times 1$ vector. $Y_i$, $i = 1, 2, \ldots, n$ are i.i.d. from a common distribution f. Assume that the following relationship exists: $g(\mu) = X\beta$, with g being a monotone increasing function, and that the variance-covariance matrix of Y is a function of $\mu$, $V(\mu)$, which is determined by the density function f. The function g is usually called the link function, and $\operatorname{var}(Y) = V(\mu)$ is called the mean-variance relationship or variance function. The unknown parameter can be estimated by the maximum likelihood estimator since a parametric form of f is available. In short, a GLM describes the relation between a function of the expectation of Y and covariates; variation is addressed by the mean-variance relationship and/or the assumed distribution.

One important advantage of the GLM is that it can handle various types of data. For instance, discrete data can be described by the Poisson distribution with a log link function. Binary data can be analyzed by a Bernoulli distribution with a logit link, which is known as logistic regression.

Recall that in the linear model, the normality assumption is the least important assumption because of central limit theorems. The same thing happens here. The parametric assumption is not necessarily required in setting up a GLM, although it is still popular because of its direct interpretation. In the previous setup, one needs to specify the actual distribution of the dependent variable and then use it to derive the score function. But, in fact, one only needs the mean-variance relationship, which is used to construct an estimating equation that has the same properties as the score function. The estimators from the corresponding estimating equations are still consistent. Therefore, the procedure reduces to specifying (1) the link function and (2) the mean-variance relationship. Notice that the first term concerns the first moment of the dependent variable and the second term concerns the second moment. That is why econometricians also call GLMs generalized moment methods.

With parametric assumptions, the MLE might have explicit solutions. Otherwise, the estimators can be obtained by solving the following estimating equations with a numerical method:
\[ \sum_{i=1}^{N} \frac{\partial\mu(x_i;\beta)}{\partial\beta}\,V^{-1}(\mu(x_i;\beta))\,(y_i - \mu(x_i;\beta)) = 0, \tag{47} \]

where $\mu(x_i;\beta) = g^{-1}(x_i'\beta)$. If the model is specified correctly, the asymptotic variance of the estimator will be the inverse of the Fisher information up to some constant. Alternatively, one can use the sandwich estimator as a robust estimator. A commonly used test for the coefficients is the Wald test.

Care must be taken in interpreting the regression. A GLM describes the relationship between covariates and a function of the expectation of Y. Logistic regression, for example, shows the linear relationship between the covariates and the log odds. In medical cost data, the situation is simpler since the most widely used GLM in analyzing cost data is a gamma distribution with a log link. Or, without the parametric assumption, one can employ a log link and $V(y) = \phi\mu^2$, which is a feature often observed in medical cost data (Blough et al. 1999, Manning and Mullahy 2001).

Flexible Link Function
Basu and Rathouz proposed a method that enables investigators to choose the link function and variance function from a certain family and thus provides an option when there is no a priori knowledge of the link function and the mean-variance relationship. They define a parametric family of link functions indexed by $\lambda$:

\[ h(\mu_i; \lambda) = \begin{cases} (\mu_i^{\lambda} - 1)/\lambda, & \text{if } \lambda \neq 0 \\ \log(\mu_i), & \text{if } \lambda = 0 \end{cases} \]

This family of functions is a modification of the Box-Cox transformation (1). Similarly, the authors define two families of variance functions $g(\mu; \theta_1, \theta_2)$: PV and QV.

The power variance family:

\[ g(\mu_i; \theta_1, \theta_2) = \theta_1\mu_i^{\theta_2}; \]

Quadratic variance family:

\[ g(\mu_i; \theta_1, \theta_2) = \theta_1\mu_i + \theta_2\mu_i^2. \]

Denote the parameters as $\gamma = (\beta^T, \lambda, \theta_1, \theta_2)^T$. Then the estimating equations are

\[ G_{i\beta_j} = (Y_i - \mu_i)\,V_i^{-1}\,\partial\mu_i/\partial\beta_j; \]

\[ G_{i\lambda} = (Y_i - \mu_i)\,V_i^{-1}\,(\partial\mu_i/\partial\lambda); \]

\[ G_{i\theta_1} = \left[ (Y_i - \mu_i)^2 - V_i \right] V_i^{-2}\,(\partial V_i/\partial\theta_1); \]

\[ G_{i\theta_2} = \left[ (Y_i - \mu_i)^2 - V_i \right] V_i^{-2}\,(\partial V_i/\partial\theta_2). \]

They can be combined in a vector form. Let

\[ G_{i\gamma} = \left( G_{i\beta_1}, G_{i\beta_2}, \ldots, G_{i\beta_p}, G_{i\lambda}, G_{i\theta_1}, G_{i\theta_2} \right)^T. \]

The estimating equation is then

\[ \sum_{i=1}^{n} G_{i\gamma} = 0. \]

The additional indexes $\lambda$ and $\theta_1, \theta_2$ can be incorporated into the generalized estimating equations: the variance can be estimated by the sandwich estimator. The marginal effect of $x_j$ is

\[ \frac{\partial\hat{\mu}(x)}{\partial x_j} = \hat{\beta}_j\,\frac{1}{n}\sum_{i=1}^{n} \hat{\mu}\!\left( X_i^T\hat{\beta}; \hat{\lambda} \right)^{1-\hat{\lambda}}. \]

The authors showed that estimating the link function results in some loss of efficiency, but it is partially recovered by estimation of the variance structure. They also recommended the use of the power variance family when treating continuous outcomes and the quadratic variance family for discrete outcomes.

The next example gives a general distribution that is able to cover many other distributions.

A Generalized Gamma Model
Manning et al. (2005) proposed a generalized gamma model (GGM) to analyze skewed and
heavy-tailed data. One key feature of this GGM is that it provides more flexibility than some other models: lognormal models, gamma models, and Weibull models are all special cases of the GGM. The density function of the GGM is

\[ f(y; \kappa, \mu, \sigma) = \frac{\gamma^{\gamma}}{\sigma y \sqrt{\gamma}\,\Gamma(\gamma)} \exp\left( z\sqrt{\gamma} - u \right), \tag{48} \]

where $\gamma = |\kappa|^{-2}$, $z = \operatorname{sign}(\kappa)\,(\log(y) - \mu)/\sigma$, and $u = \gamma\exp(|\kappa| z)$. The parameter $\mu$ is replaced by $X_i^T\beta$. The expected value of y conditional on x is given by

\[ E(y \mid x) = \exp\left[ x^T\beta + \frac{\sigma}{\kappa}\log\left(\kappa^2\right) + \log\Gamma\!\left( \frac{1}{\kappa^2} + \frac{\sigma}{\kappa} \right) - \log\Gamma\!\left( \frac{1}{\kappa^2} \right) \right]. \]

As shown in Manning et al. (2005), (48) is a lognormal model when $\kappa$ is close to zero or a gamma distribution when $\sigma = \kappa > 0$. In other words, the values of the parameters of the GGM can distinguish those special models from each other. A natural benefit of this kind of setting is that a model selection problem can be restated as a hypothesis test on the parameters. From another perspective, it provides a systematic way to evaluate the appropriateness of those models.

In their paper, Manning et al. (2005) compared three versions of the GGM, distinguished by the way they deal with heteroscedasticity, against some existing models including back-transformed linear regression of $\ln(y)$ on x, a GLM with log link and gamma distribution, and a maximum likelihood estimator of the Weibull model. Results showed that the GGM would choose the right model properly, yet the heteroscedasticity in x has to be accounted for. Also, the GGM can better approximate the distribution of the data than other parametric models due to its flexibility.

Two-Part Models

The models discussed above are all based on positive and continuous data. But in real-life research, there is often a considerable fraction of observations that have zero cost. One can choose to keep or drop all zero-cost observations depending on the research interest. The most commonly used modification is to construct a two-part model.

The intuition behind the two-part model is to describe separately the event that a cost occurs and how much the cost is when it occurs. The outcome variable in the first part is a binary variable $\delta_i$, where 0 stands for no cost and 1 stands for positive cost. Most of the methods that are available for binary outcomes are applicable here, and logistic regression is a typical method that one would use. In the second part, all remaining observations have positive costs, which turns the problem into what has already been discussed.

It seems a little complicated, but the short argument below makes clear why a two-part model simplifies the problem. Suppose there is a parametric distribution for the second part. The likelihood function is

\[ L_n = \prod_{i=1}^{n} p(\delta_i \mid x_i)\,f(y_i \mid \delta_i = 1, x_i)^{\delta_i} = \prod_{i=1}^{n} p(\delta_i \mid x_i) \prod_{\delta_i = 1} f(y_i \mid \delta_i = 1, x_i). \tag{49} \]

If the conditional density function f does not depend on $\delta$, then the two factors of the likelihood function can be maximized separately. Recall that all models in previous sections have nothing to do with $\delta$; they can serve as the conditional density function here. Therefore, all one needs to do is to analyze the first part and the second part separately and then combine the results into one. The estimated mean of the population will be

\[ \hat{y} = \hat{p}\,\hat{\mu}, \]

where $\hat{\mu}$ is the estimated mean of the cost in the second part and $\hat{p}$ is the estimated probability that a cost occurs. Blough et al. (1999) estimated the variance of $\hat{y}$ by

\[ \operatorname{Var}(\hat{y}) = \operatorname{Var}(\hat{p}\,\hat{\mu}) = \hat{p}^2\operatorname{Var}(\hat{\mu}) + \hat{\mu}^2\operatorname{Var}(\hat{p}), \tag{50} \]

which is an approximation of the true variance. An alternative to using this equation is to generate
the variance by bootstrap methods. The parameters used in the first part need not be the same as those in the second part. Interpretations of the coefficients are different from the previous section since the inference on the second part is conditional on the event that a cost occurs.

As mentioned earlier, two-part models are quite popular in the analysis of medical cost. An example can be found in Blough et al. (1999), where they used a logistic regression for the first part and a GLM with log link for the second part. If one chooses to transform $y_i$, the back-transformation problem for a two-part model has been studied by Welsh and Zhou (2006).

Mixtures of Distributions
With a point mass at zero, observations gathering around zero can also be viewed as multimodality, which can be explained by the distribution actually being a mixture of several distributions. In fact, the two-part model is a special case of mixture models. A mixture model is helpful in classifying the observations into high-cost groups and low-cost groups. Say the true distribution of medical cost in a certain population is a mixture of several normal distributions with different means due to some unknown features of patients. Then the unknown features can be treated as a latent variable that helps in telling which normal curve the patient belongs to. The expectation maximization (EM) algorithm would give the estimates of the coefficients of interest. More details about mixture models can be found in McLachlan and Peel (2000).

There are other methods to deal with zero-cost observations, which include adding a constant to each observation to force the data to be positive. However, some of these methods have hardly any realistic meaning and only serve as a way to address the zero-cost observations. An advantage of the two-part model is that it makes some sense in terms of real-life interpretation.

Quantile Regression

All of the methods introduced above focus on the relation between covariates and the mean of outcomes. The mean is one quantity that can summarize the conditional distribution of outcome variables. Of course there are more summary quantities, for instance, the median and the 25% and 75% quantiles, all of which can present the distribution in some sense. It is noted that quantiles are more robust summaries than the mean for skewed or heavy-tailed data. However, the quantity of interest in this analysis is the total medical cost, which is directly related to the mean but not to a single quantile. In order to estimate the total medical cost or the mean, a series of quantiles should be estimated so that an empirical estimate of the distribution can be obtained. The regression of quantiles is called quantile regression. Koenker and Hallock (2001) said that “Quantile regression seeks to extend these ideas to the estimation of conditional quantile functions – models in which quantiles of the conditional distribution of the response variable are expressed as functions of observed covariates.”

Quantile regression can be viewed as a generalized median regression. In a median regression, the output of the regression describes the relation between the median, or 50% quantile, and the covariates. In contrast to mean regression, which minimizes squared differences, median regression minimizes the absolute differences between the estimated values and the real values. In other words, median regression estimators minimize the sum of the absolute values of the differences:

\[ \min_{\beta} \sum_{i=1}^{n} \left| y_i - X_i'\beta \right|. \tag{51} \]

Now let $\tau$ range from 0 to 100%. A regression on the $\tau$th quantile is

\[ \min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left( y_i - X_i'\beta \right), \tag{52} \]

where $\rho_\tau$ is defined as $\rho_\tau(u) = u(\tau - I(u < 0))$. When $\tau$ is set to be 0.5, $\rho_\tau$ is equivalent to taking the absolute value up to a scalar factor, and thus (52) has the same solution as (51). This optimization can be easily solved by many algorithms. The
implementation of quantile regression can be achieved through existing software such as quantreg in R by Koenker (2009). A natural estimator of the conditional mean is the average of all conditional quantiles. The marginal effects are now specified with respect to each quantile.

A detailed example of quantile regression analysis can be found in Koenker and Hallock (2001). In general, investigators can set a series of $\tau$, say from 10% to 90% in steps of 10%. That would give nine regression results, each of which stands for the relation between the covariates and the corresponding quantile. A major advantage of quantile regression over linear regression is that it reveals the different behavior of covariates on outcomes. Regression on the mean averages these effects and reports only the averaged value. It is very possible that a feature would behave differently on subjects with relatively low costs and those with relatively high costs, as shown in the example in Koenker and Hallock (2001). A possible explanation is that there are unknown features even after some features are controlled; those features would affect the costs and interact with the controlled features. This concept is similar to mixture models, where there are unknown features that define different models. But quantile regression does not attempt to figure out the classifier; it simply performs the regression on different quantiles.

Prediction

As studied in previous sections, various methods and models can be employed to discover and quantify the association between covariates and medical cost in the target population. Naturally, one would be interested in whether it is possible to predict the future medical cost for an individual, or a group of people, given certain information. It is worth noting that prediction is a very broad subject where methods arise from various disciplines, which is beyond the scope of this chapter. In this section, a brief overview of prediction methods and some important concepts about prediction are introduced, leaving the details to be explored by readers.

Some Basic Concepts of Prediction Models

The primary question of interest is how to accurately predict the response, in this case the medical cost, given other individual information (predictors) and previous knowledge (the observed sample and maybe the theoretical model). A secondary question is how to estimate the accuracy, i.e., the prediction error, of the proposed method. This type of prediction is called supervised learning in the sense that there is a response (or outcome) that can be used to judge how well the method does. Usually this is achieved by specifying a loss function which penalizes the method based on the deviation from the true response, e.g., the squared error and the absolute error. The regression methods described in the last section can be counted as methods of supervised learning, where most of them use squared error loss and quantile regression uses several versions of absolute error loss. Notice that an additional assumption is needed in order to make the prediction valid: the sample being predicted should be from the same population from which the observed sample is drawn. There are many other methods available, to name some, principal component analysis (PCA), support vector machine (SVM), neural networks, random forest, and so on. A general and broad introduction to the methods can be found in Friedman et al. (2001).

As for the measure of accuracy of prediction, several measures are available, for example, root mean squared error (RMSE) and mean squared prediction error (MSPE). RMSE is defined as the square root of the mean squared error (MSE), where the MSE can be estimated by the mean of the squared differences between the fitted values and the true values. MSPE is the mean of the squared differences between the predicted values and the true values. The difference between MSPE and MSE is that, for MSPE, the model used to generate predicted values is fitted on another dataset, while the MSE is calculated with the model fitted on the same dataset. In
other words, it requires two independent datasets to estimate MSPE but only one for MSE. The dataset used to fit the model is called the training set, and the other one is called the test set. MSE is almost always smaller than MSPE. Theoretically, MSPE is a better measure of accuracy than RMSE. However, estimating MSPE requires two independent datasets, which might be a luxury for a study with a small sample size. Meanwhile, relying on MSE might result in overfitting the current dataset, so that the model is not valid for generalization to other datasets.

Recall that the purpose of prediction is to predict the response with the highest accuracy, so the next question is how to choose the best out of all these models, which is called model selection in the literature. The basic idea is to estimate the accuracy measure that each model achieves on the study dataset and choose the one with the best performance. One question that researchers often encounter is how to decide which predictors, and how many of them, should be included in the model. Say the measure of accuracy is MSPE. Ideally, there should be a sufficiently large training set to fit the model and a test set that could give a good estimate of MSPE. However, this might not be the case in a real-world study. There are different approaches to overcome the limitation of sample size and generate an acceptable estimate of MSPE, such as pseudo out-of-sample forecasting and cross validation. Take cross validation, for example: a k-fold cross validation randomly divides the sample into k subsamples. One subsample is kept as the test set and the other (k-1) subsamples are used to fit the model, with each subsample serving as the test set in turn. One can take the average of the k fitted models as the single fitted model. The average of the k MSPEs is then used as a quantity that summarizes how this model performs and also as an estimate of the MSPE. The model that has the lowest average MSPE will then be chosen. A common mistake in doing cross validation is to somehow use the whole dataset in fitting the model, for instance, using the whole dataset to choose predictors and then fitting the model using these predictors by "k-fold cross validation." The MSPE calculated in that manner would be smaller than the true value, and it cannot serve as an estimate of the true MSPE. Also, since the way it is generated is similar to that of MSE, it might also result in overfitting when using it as a measure to choose the best model.

Even with cross validation, overfitting is still a problem. Throwing more predictors into the model will result in smaller MSPE in most cases. The small MSPE presents a problem since it is possible that the fitted model has been modified to describe, and only describe, this observed sample, or training set, and thus the model is limited in being generalized to other samples in the population. Therefore, there is a trade-off between the ability to generalize and the accuracy.

Difference from Regression Analysis

At first sight, prediction and statistical inference are similar to each other in the context of regressions: there is an observed sample, with several predictors (or covariates) and an outcome variable; one builds a model to describe the association between predictors and outcomes so that the mean, quantiles, or the distribution of the outcome can be explained by a function of predictors. However, the focus of these two analyses is different. For statistical inference, the target is to describe the relationship between the covariates and outcomes in the population from which the sample is drawn. For prediction, the major interest lies in the accuracy of the predicted value, regardless of whether the model makes sense or not. For instance, it is acceptable to look at the fitted model and say that the predictive ability of certain predictors is high, but one should not overinterpret relationships discovered in a prediction model. In addition, further assumptions are needed if the regression model is used for prediction. The most important assumption is that the new sample should be from the same population on which the model is fitted, so that it is legitimate to use the model fitted on the observed sample to make predictions. Another point is that the conditional expectation of the response given the predictors has different interpretations under different settings. In regression analysis, it is the average response for those who have the given levels of predictors, the uncertainty of which is estimated by the standard error. For a prediction model, it is
the predicted expectation of the response given the level of predictors, the uncertainty of which is estimated by MSPE. Generally speaking, the prediction error is larger than the standard error.

Appendix

Concept of General Pivots

The concept of the generalized pivot was first introduced by Tsui and Weerahandi (1989). Weerahandi (1993) compared the properties of frequentist confidence intervals and generalized confidence intervals to give an intuitive understanding:

Property 1: Consider a particular situation of interval estimation of a parameter θ. If the same experiment is repeated a large number of times (depending on the required accuracy of the desired coverage) to obtain new sets of observations x, then the confidence intervals by conventional definition will correctly include the true value of the parameter 95% of the time.
Property 2: After a large number of independent situations of setting 95% confidence intervals for certain parameters of interest, the investigator will have correctly included the true values of the parameters in the corresponding intervals 95% of the time.

Property 1 is the property of classic frequentist confidence intervals and it implies Property 2. However, it is not always possible to find confidence intervals that satisfy Property 1, a well-known example of which is the Behrens-Fisher problem. Weerahandi (1993) argued that Property 2 is of direct practical importance because statistical inference is no longer an issue if indeed repeated samples can be obtained from the same experiment. The confidence intervals that have Property 2 are thus called generalized confidence intervals.

In order to construct a confidence interval, a quantity called a pivotal quantity is essential. The discussion of generalized confidence intervals is actually a discussion of the generalized pivotal quantity. Hannig et al. (2006) refined the definition given by Weerahandi (1993) and discovered that a subclass of generalized pivotal quantities is of interest and has good properties. This subclass is named the fiducial generalized pivotal quantity due to its close connection with Fisher's (1935) fiducial argument.

Definition 1 A function of $(\mathbf{S}, \mathbf{S}^{*}, \xi)$ for a parameter θ, denoted as $\mathcal{R}_\theta(\mathbf{S}, \mathbf{S}^{*}, \xi)$, is called a fiducial generalized pivotal quantity (FGPQ) if it satisfies the following two conditions:
(FGPQ1) The conditional distribution of $\mathcal{R}_\theta(\mathbf{S}, \mathbf{S}^{*}, \xi)$, conditional on $\mathbf{S} = s$, is free of ξ.
(FGPQ2) For every allowable $s \in \mathbb{R}^k$, $\mathcal{R}_\theta(s, s, \xi) = \theta$.

Hannig et al. (2006) proved that, under mild conditions, the coverage probability of a generalized confidence interval is correct as the sample size goes to infinity. The authors also gave a structural method to construct the fiducial generalized pivotal quantity. It is briefly described here:

Definition 2 Let $\mathbf{S} = (S_1, \ldots, S_k) \in \mathcal{S} \subset \mathbb{R}^k$ be a k-dimensional statistic whose distribution depends on a p-dimensional parameter $\xi \in \Xi$. Suppose there exist mappings $f_1, \ldots, f_k$, with $f_j : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}$, such that, if $E_i = f_i(\mathbf{S}; \xi)$, for $i = 1, \ldots, k$, then $\mathbf{E} = (E_1, \ldots, E_k)$ has a joint distribution that is free of ξ. We say that $f(\mathbf{S}, \xi)$ is a pivotal quantity for ξ, where $f = (f_1, \ldots, f_k)$.

Definition 3 Let $f(\mathbf{S}, \xi)$ be a pivotal quantity for ξ as described in Definition 2. For each $s \in \mathcal{S}$, define $\mathcal{E}(s) = f(s, \Xi)$. Suppose the mapping $f(s, \cdot) : \Xi \to \mathcal{E}(s)$ is invertible for every $s \in \mathcal{S}$. We then say that $f(\mathbf{S}, \xi)$ is an invertible pivotal quantity for ξ. In this case we write $g(s, \cdot) = (g_1(s, \cdot), \ldots, g_k(s, \cdot))$ for the inverse mapping so that whenever $e = f(s, \xi)$, we have $g(s, e) = \xi$.

Theorem 1 Let $\mathbf{S} = (S_1, \ldots, S_k) \in \mathcal{S} \subset \mathbb{R}^k$ be a k-dimensional statistic whose distribution depends on a p-dimensional parameter $\xi \in \Xi$.
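Suppose there exist mappings $f_1, \ldots, f_k$, with $f_j : \mathbb{R}^k \times \mathbb{R}^p \to \mathbb{R}$, such that $f = (f_1, \ldots, f_k)$ is an invertible pivotal quantity with inverse mapping $g(s, \cdot)$. Define

$$\mathcal{R}_\theta = \mathcal{R}_\theta(\mathbf{S}, \mathbf{S}^{*}, \xi) = \pi\big(g_1(\mathbf{S}, f(\mathbf{S}^{*}, \xi)), \ldots, g_k(\mathbf{S}, f(\mathbf{S}^{*}, \xi))\big) = \pi\big(g_1(\mathbf{S}, \mathbf{E}^{*}), \ldots, g_k(\mathbf{S}, \mathbf{E}^{*})\big) \qquad (53)$$

where $\mathbf{E}^{*} = f(\mathbf{S}^{*}, \xi)$ is an independent copy of $\mathbf{E}$. Then $\mathcal{R}_\theta$ is a FGPQ for $\theta = \pi(\xi)$. When θ is a scalar parameter, an equal-tailed two-sided (1 - α)100% GCI for θ is given by $\mathcal{R}_{\theta,\alpha/2} \le \theta \le \mathcal{R}_{\theta,1-\alpha/2}$. Here $\mathcal{R}_{\theta,\gamma} = \mathcal{R}_{\theta,\gamma}(s)$ denotes the 100γth percentile of the distribution of $\mathcal{R}_\theta$ conditional on $\mathbf{S} = s$. One-sided generalized confidence bounds are obtained in an obvious manner.

This method is only valid in problems where complete statistics exist. For the incomplete cases, the authors gave two generalizations of this method. For more details, please see Hannig et al. (2006).

Variances and Estimators for Back-Transformations

The derivatives:

$$\frac{\partial \hat{\mu}(x)}{\partial \beta} = \hat{\mu}(x)\left[\frac{\partial h(x,\hat{\beta})}{\partial \beta}\right],$$

$$\frac{\partial \hat{\mu}(x)}{\partial \gamma} = \hat{\mu}(x)\left[0.5\,\frac{\partial s^2(x,\hat{\gamma})}{\partial \gamma}\right],$$

$$\frac{\partial^2 \hat{\mu}(x)}{\partial x_j\,\partial \beta} = \frac{\partial \hat{\mu}(x)}{\partial \beta}\left[\frac{\partial h(x,\hat{\beta})}{\partial x_j} + 0.5\,\frac{\partial s^2(x,\hat{\gamma})}{\partial x_j}\right] + \hat{\mu}(x)\left[\frac{\partial^2 h(x,\hat{\beta})}{\partial x_j\,\partial \beta}\right],$$

$$\frac{\partial^2 \hat{\mu}(x)}{\partial x_j\,\partial \gamma} = \frac{\partial \hat{\mu}(x)}{\partial \gamma}\left[\frac{\partial h(x,\hat{\beta})}{\partial x_j} + 0.5\,\frac{\partial s^2(x,\hat{\gamma})}{\partial x_j}\right] + 0.5\,\hat{\mu}(x)\,\frac{\partial^2 s^2(x,\hat{\gamma})}{\partial x_j\,\partial \gamma}.$$

The variances derived from delta methods are

$$\omega_1(x) = \frac{\partial \mu(x)}{\partial \beta}\Sigma_\beta\frac{\partial \mu(x)}{\partial \beta'} + \frac{\partial \mu(x)}{\partial \gamma}\Sigma_\gamma\frac{\partial \mu(x)}{\partial \gamma'} + 2\,\frac{\partial \mu(x)}{\partial \beta}\Sigma_{\beta\gamma}\frac{\partial \mu(x)}{\partial \gamma'} + 2\,\frac{\partial \mu(x)}{\partial \beta}\Sigma_{1D\beta} + 2\,\frac{\partial \mu(x)}{\partial \gamma}\Sigma_{1D\gamma} + \Sigma_{1DD},$$

$$\omega_{2j}(x) = \frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\Sigma_\beta\frac{\partial^2 \mu(x)}{\partial \beta'\,\partial x_j} + \frac{\partial^2 \mu(x)}{\partial x_j\,\partial \gamma}\Sigma_\gamma\frac{\partial^2 \mu(x)}{\partial \gamma'\,\partial x_j} + 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\Sigma_{\beta\gamma}\frac{\partial^2 \mu(x)}{\partial \gamma'\,\partial x_j} + 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \beta}\Sigma_{2D\beta} + 2\,\frac{\partial^2 \mu(x)}{\partial x_j\,\partial \gamma}\Sigma_{2D\gamma} + \Sigma_{2DD},$$

$$\omega_{3j}(x) = \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_\beta\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \beta'\,\partial x_j}\right) + \left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \gamma}\right)\Sigma_\gamma\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \gamma'\,\partial x_j}\right) + 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_{\beta\gamma}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial \gamma'\,\partial x_j}\right) + 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \beta}\right)\Sigma_{3D\beta} + 2\left(\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 \mu(x_i)}{\partial x_j\,\partial \gamma}\right)\Sigma_{3D\gamma} + \Sigma_{3DD}.$$

$$\frac{\partial \hat{\mu}(x)}{\partial \beta} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial m_i(x,\hat{\beta},\hat{\gamma})}{\partial \beta}, \qquad \frac{\partial \hat{\mu}(x)}{\partial \gamma} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial m_i(x,\hat{\beta},\hat{\gamma})}{\partial \gamma},$$

$$\frac{\partial^2 \hat{\mu}(x)}{\partial x_j\,\partial \beta} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 m_i(x,\hat{\beta},\hat{\gamma})}{\partial x_j\,\partial \beta}, \qquad \frac{\partial^2 \hat{\mu}(x)}{\partial x_j\,\partial \gamma} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2 m_i(x,\hat{\beta},\hat{\gamma})}{\partial x_j\,\partial \gamma},$$

$$\hat{\Sigma}_{\beta\gamma} = \left(\sum_{i=1}^{n}\frac{\partial k(x_i,\hat{\beta})}{\partial \beta}\frac{\partial k(x_i,\hat{\beta})}{\partial \beta'}\right)^{-1}\left(\sum_{i=1}^{n}\frac{\partial k(x_i,\hat{\beta})}{\partial \beta}\frac{\partial s^2(x_i,\hat{\gamma})}{\partial \gamma'}\,\hat{e}_i\,\hat{\eta}_i\right)\left(\sum_{i=1}^{n}\frac{\partial s^2(x_i,\hat{\gamma})}{\partial \gamma}\frac{\partial s^2(x_i,\hat{\gamma})}{\partial \gamma'}\right)^{-1}$$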
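The parametric derivatives above can be checked numerically. The following is a minimal Python sketch, not the authors' implementation. It assumes the log-normal retransformed mean μ(x) = exp{h(x, β) + 0.5 s²(x, γ)} that those derivatives imply, with the illustrative choices h(x, β) = x'β and s²(x, γ) = exp(x'γ). The function names, parameter values, and covariance matrices are hypothetical, and only the first three terms of ω1(x) are evaluated.

```python
import numpy as np


def mu(x, beta, gamma):
    """Retransformed mean exp{h + 0.5*s2}, with h = x'beta and s2 = exp(x'gamma).

    Both functional forms are illustrative assumptions.
    """
    return np.exp(x @ beta + 0.5 * np.exp(x @ gamma))


def grad_mu(x, beta, gamma):
    """Analytic derivatives: dmu/dbeta = mu * dh/dbeta, dmu/dgamma = mu * 0.5 * ds2/dgamma."""
    m = mu(x, beta, gamma)
    d_beta = m * x                                # dh/dbeta = x
    d_gamma = m * 0.5 * np.exp(x @ gamma) * x     # ds2/dgamma = exp(x'gamma) * x
    return d_beta, d_gamma


def numeric_grad(f, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at theta."""
    g = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g


# Hypothetical covariate vector, coefficients, and covariance estimates.
x = np.array([1.0, 0.5])
beta = np.array([0.2, -0.1])
gamma = np.array([-1.0, 0.3])
Sigma_beta = np.array([[0.04, 0.01], [0.01, 0.09]])
Sigma_gamma = np.array([[0.02, 0.0], [0.0, 0.02]])
Sigma_beta_gamma = 0.005 * np.eye(2)

d_beta, d_gamma = grad_mu(x, beta, gamma)

# Quadratic-form part of omega_1(x); the Sigma_1D terms are omitted here.
omega1_partial = (
    d_beta @ Sigma_beta @ d_beta
    + d_gamma @ Sigma_gamma @ d_gamma
    + 2 * d_beta @ Sigma_beta_gamma @ d_gamma
)

print("analytic dmu/dbeta :", d_beta)
print("numeric  dmu/dbeta :", numeric_grad(lambda b: mu(x, b, gamma), beta))
print("partial omega_1(x) :", omega1_partial)
```

Comparing the analytic and finite-difference gradients is a quick way to catch sign or factor errors before the derivatives are plugged into the variance expressions.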
Estimated coefficient | Dependent variable | Independent variable
$\hat{\Sigma}_{1D\beta}$ | $m_i(x,\hat{\beta},\hat{\gamma})$ | $[\partial k(x_i,\hat{\beta})/\partial\beta]\,\hat{e}_i$
$\hat{\Sigma}_{1D\gamma}$ | $m_i(x,\hat{\beta},\hat{\gamma})$ | $[\partial s^2(x_i,\hat{\gamma})/\partial\gamma]\,\hat{\eta}_i$
$\hat{\Sigma}_{2D\beta}$ | $\partial m_i(x,\hat{\beta},\hat{\gamma})/\partial x_j$ | $[\partial k(x_i,\hat{\beta})/\partial\beta]\,\hat{e}_i$
$\hat{\Sigma}_{2D\gamma}$ | $\partial m_i(x,\hat{\beta},\hat{\gamma})/\partial x_j$ | $[\partial s^2(x_i,\hat{\gamma})/\partial\gamma]\,\hat{\eta}_i$
$\hat{\Sigma}_{3D\beta}$ | $\frac{1}{n}\sum_l \partial m_i(x_l,\hat{\beta},\hat{\gamma})/\partial x_j$ | $[\partial k(x_i,\hat{\beta})/\partial\beta]\,\hat{e}_i$
$\hat{\Sigma}_{3D\gamma}$ | $\frac{1}{n}\sum_l \partial m_i(x_l,\hat{\beta},\hat{\gamma})/\partial x_j$ | $[\partial s^2(x_i,\hat{\gamma})/\partial\gamma]\,\hat{\eta}_i$

Estimated variance | Sample variance of
$\hat{\Sigma}_{1DD}$ | $m_i(x,\hat{\beta},\hat{\gamma})$
$\hat{\Sigma}_{2DD}$ | $\partial m_i(x,\hat{\beta},\hat{\gamma})/\partial x_j$
$\hat{\Sigma}_{3DD}$ | $\frac{1}{n}\sum_l \partial m_i(x_l,\hat{\beta},\hat{\gamma})/\partial x_j$

References

Ai C, Norton EC. Standard errors for the retransformation problem with heteroscedasticity. J Health Econ. 2000;19(5):697–718.
Aitchison J. On the distribution of a positive random variable having a discrete probability mass at the origin. J Am Stat Assoc. 1955;50(271):901–8.
Blough DK, Madden CW, Hornbrook MC. Modeling risk using generalized linear models. J Health Econ. 1999;18(2):153–71.
Box GEP. Science and statistics. J Am Stat Assoc. 1976;71(356):791–9.
Briggs A, Nixon R, Dixon S, Thompson S. Parametric modelling of cost data: some simulation evidence. Health Econ. 2005;14(4):421–8.
Callahan CM, Kesterson JG, Tierney WM, et al. Association of symptoms of depression with diagnostic test charges among older adults. Ann Intern Med. 1997;126(6):426.
Chen Y-H, Zhou X-H. Interval estimates for the ratio and difference of two lognormal means. Stat Med. 2006;25(23):4099–113. ISSN 1097-0258. https://doi.org/10.1002/sim.2504.
Dominici F, Cope L, Naiman DQ, Zeger SL. Smooth quantile ratio estimation. Biometrika. 2005;92(3):543–57.
Duan N. Smearing estimate: a nonparametric retransformation method. J Am Stat Assoc. 1983;78(383):605–10. ISSN 01621459. URL http://www.jstor.org/stable/2288126
Efron B. Nonparametric estimates of standard error: the jackknife, the bootstrap and other methods. Biometrika. 1981;68(3):589–99.
Fisher RA. The fiducial argument in statistical inference. Ann Hum Genet. 1935;6(4):391–8.
Friedman J, Hastie T, Tibshirani R. The elements of statistical learning, volume 1. Springer Series in Statistics. 2001.
Gupta RC, Li X. Statistical inference for the common mean of two log-normal distributions and some applications in reliability. Comput Stat Data Anal. 2006;50(11):3141–64.
Hall P. On the removal of skewness by transformation. J R Stat Soc Ser B Methodol. 1992;54(1):221–8.
Hannig J, Iyer H, Patterson P. Fiducial generalized confidence intervals. J Am Stat Assoc. 2006;101(473):254–69. https://doi.org/10.1198/016214505000000736.
Hayashi F. Econometrics, vol. 1. Princeton: Princeton University Press; 2000.
Koenker R. Quantreg: quantile regression. R package version 4. 2009.
Koenker R, Hallock KF. Quantile regression. J Econ Perspect. 2001;15(4):143–56.
Krishnamoorthy K, Mathew T. Inferences on the means of lognormal distributions using generalized p-values and generalized confidence intervals. J Stat Plann Infer. 2003;115(1):103–21.
Land CE. An evaluation of approximate confidence interval estimation methods for lognormal means. Technometrics. 1972;14(1):145–58.
Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. 2001;20(4):461–94.
Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk adjustment of skewed outcomes data. J Health Econ. 2005;24(3):465–88.
Manning WG. The logged dependent variable, heteroscedasticity, and the retransformation problem. J Health Econ. 1998;17(3):283–95. ISSN 0167-6296. URL http://ukpmc.ac.uk/abstract/MED/10180919
McCullagh P, Nelder JA. Generalized linear models. Boca Raton: Chapman & Hall/CRC; 1989.
McLachlan GJ, Peel D. Finite mixture models, vol. 299. Hoboken: Wiley-Interscience; 2000.
Owen WJ, DeRouen TA. Estimation of the mean for lognormal data containing zeroes and left-censored values, with applications to the measurement of worker exposure to air contaminants. Biometrics. 1980;36(4):707–19. ISSN 0006341X. URL http://www.jstor.org/stable/2556125
Seber GAF, Lee AJ. Linear regression analysis, vol. 936. Hoboken: Wiley; 2012.
Tian L, Wu J. Confidence intervals for the mean of lognormal data with excess zeros. Biom J. 2006;48(1):149–56.
Tian L. Inferences on the mean of zero-inflated lognormal data: the generalized variable approach. Stat Med. 2005;24(20):3223–32. ISSN 1097-0258. https://doi.org/10.1002/sim.2169.
Tsui K-W, Weerahandi S. Generalized p-values in significance testing of hypotheses in the presence of nuisance parameters. J Am Stat Assoc. 1989;84(406):602–7. ISSN 01621459. URL http://www.jstor.org/stable/2289949
Weerahandi S. Generalized confidence intervals. J Am Stat Assoc. 1993;88(423):899–905. ISSN 01621459. URL http://www.jstor.org/stable/2290779
Weisberg S. Applied linear regression, vol. 528. Wiley; 2005.
Welsh AH, Zhou XH. Estimating the retransformed mean in a heteroscedastic two-part model. J Stat Plann Infer. 2006;136(3):860–81.
Wu J, Wong ACM, Jiang G. Likelihood-based confidence intervals for a log-normal mean. Stat Med. 2003;22(11):1849–60.
Zhou XH. Estimation of the log-normal mean. Stat Med. 1998;17(19):2251–64.
Zhou XH, Gao S. Confidence intervals for the log-normal mean. Stat Med. 1997;16(7):783–90.
Zhou XH, Gao S. One-sided confidence intervals for means of positively skewed distributions. Am Stat. 2000:100–4.
Zhou XH, Tu W. Comparison of several independent population means when their samples contain log-normal and possibly zero observations. Biometrics. 1999;55(2):645–51.
Zhou XH, Tu W. Confidence intervals for the mean of diagnostic test charge data containing zeros. Biometrics. 2000;56(4):1118–25.
Zhou XH, Lin H, Johnson E. Non-parametric heteroscedastic transformation regression models for skewed data with an application to health care costs. J R Stat Soc Ser B Stat Methodol. 2008;70(5):1029–47.
Zhou X-H, Gao S, Hui SL. Methods for comparing the means of two independent log-normal samples. Biometrics. 1997;53(3):1129–35. ISSN 0006341X. URL http://www.jstor.org/stable/2533570

Instrumental Variable Analysis
21
Michael Baiocchi, Jing Cheng, and Dylan S. Small

Contents
Introduction
Example: Neonatal Intensive Care Units
The Fundamentals
Methods to Address Selection Bias
Instrumental Variables: NICU Example Revisited
Sources of Instruments in Health Services Research Studies
IV Assumptions and Estimation for Binary IV and Binary Treatment
Framework and Notation
Two-Stage Least Squares (Wald) Estimator
More Efficient Estimation
Estimation with Observed Covariates
Understanding the Treatment Effect That IV Estimates
Relationship Between Average Treatment Effect for Compliers and Average Treatment Effect for the Whole Population
Characterizing the Compliers
Understanding the IV Estimate When Compliance Status Is Not Deterministic
Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions
Assessing the IV Assumptions
Sensitivity Analysis
Weak Instruments
Binary Outcomes
Two-Stage Residual Inclusion
Bivariate Probit Models
Matching-Based Estimator: Effect Ratio
Multinomial, Survival and Distributional Outcomes
Multinomial Outcome
Survival Outcome
Effect of Treatment on Distribution of Outcomes
Study Design IV and Multiple IVs
Study Design IV: Near-Far Matching
Multilevel and Continuous IVs
Multiple IVs
Multilevel and Continuously Valued Treatments
Extended Instrumental Variable Method for When Proposed IV Has a Direct Effect
Software
References

M. Baiocchi (*)
Department of Statistics, Stanford University, Stanford, CA, USA
e-mail: baiocchi@stanford.edu
J. Cheng
Department of Preventive and Restorative Dental Sciences, University of California, San Francisco School of Dentistry, San Francisco, CA, USA
e-mail: jing.cheng@ucsf.edu
D. S. Small
University of Pennsylvania, Philadelphia, PA, USA
e-mail: dsmall@wharton.upenn.edu

https://doi.org/10.1007/978-1-4939-8715-3_32

Abstract

A goal of many health services research studies is to determine the causal effect of a treatment or intervention on health outcomes. Often, it is not ethically or practically possible to conduct a perfectly randomized experiment, and instead an observational study must be used. A major difficulty with observational studies is that there might be unmeasured confounding, i.e., unmeasured ways in which the treatment and control groups differ before treatment that affect the outcome. Instrumental variable analysis is a method for controlling for unmeasured confounding. Instrumental variable analysis requires the measurement of a valid instrumental variable, which is a variable that is independent of the unmeasured confounding and encourages a subject to take one treatment level versus another, while having no effect on the outcome beyond its encouragement of a certain treatment level. This chapter discusses the types of causal effects that can be estimated by instrumental variable analysis, the assumptions needed for instrumental variable analysis to provide valid estimates of causal effects and sensitivity analysis for those assumptions, methods of estimation of causal effects using instrumental variables, and sources of instrumental variables in health services research studies.

Introduction

The goal of health services research is to provide actionable information for policymakers. Modern policy decision makers are driven by data-backed arguments regarding what might change as a result of an intervention. For analysts, this requires specific attention to determining the causal impact of a given intervention on future outcomes. In order to justify a change in the way medicine is practiced, correlation is not sufficient; detecting and quantifying causal connections is necessary.

Medicine has relied on randomized controlled studies as the gold standard for detecting and quantifying causal connections between an intervention and future outcomes. Randomization offers a clear mechanism for limiting the number of alternate possible explanations for what generates the differences between the treated and control groups. The demand for causal evidence in medicine far exceeds the ability to practically control, finance,
and/or conduct randomized studies. Observational data offers a sensible alternative source of data for developing evidence about the implications of different medical interventions. However, for studies using observational data to be considered as a reliable source for evidence of causal effects, great care is needed to design studies in a way that limits the number of alternative explanations for observed differences in outcomes between intervention and control. This chapter will examine instrumental variables as a framework for designing high-quality observational studies. A few of the common pitfalls to be aware of will be discussed.

Example: Neonatal Intensive Care Units

The development of medical care for premature infants (preemies) has been a spectacular success for modern medicine. This care is offered within neonatal intensive care units (NICUs) of varying intensity of care. Higher-intensity NICUs (those classified as various grades of level 3 by the American Academy of Pediatrics) have more sophisticated medical machinery and highly skilled doctors who specialize in the treatment of tiny preemies.

While establishing value requires addressing questions of both costs and outcomes, our example will focus on estimating the difference in rates of death between the higher level NICUs and the lower level NICUs. Using data from Pennsylvania from the years 1995 to 2005, a simple comparison of death rates at high-level facilities to low-level facilities shows a higher death rate at high-level facilities, 2.26% compared to 1.25%. This higher death rate at high-level facilities is surprising only if one assumes preemies were randomly assigned to either a high- or low-level NICU, regardless of how sick they were. In fact, as in most health applications, the sickest patients were routed to the highest level of intensity. As a result, one cannot necessarily attribute the variation in the outcome to variation in the treatment intensity. Fortunately, our data provide a detailed assessment of baseline severity with 45 covariates including variables such as gestational age, birth weight, congenital disorder indicators, parity, and information about the mother's socioeconomic status. Yet even with this level of detail, our data cannot characterize the full set of clinical factors that a physician or family considers when deciding whether to route a preemie to a high-intensity care unit. As shall be discussed, these missing attributes will cause us considerable problems.

What one wants is not the naïve comparison of rates of death, that is, the percentage of preemies who died at the different types of NICUs, but the difference in probabilities of death for each preemie given whether the preemie was to be delivered at a low-level facility or a high-level facility. This is the causal effect of treatment. This concept is formalized below.

The Fundamentals

The Potential Outcome Framework
The literature has made great use of the potential outcome framework (as described in Neyman 1990; Rubin 1974; Holland 1988) as a systematic, mathematical description of the cause-and-effect relationship between variables. Let us assume there are three variables of interest: the outcome of interest Y, the treatment variable D (the D comes from the notion of "dose" rather than a "treatment"), and X as a vector of covariates. For most of this chapter it will be assumed that there are only two treatment levels (e.g., the new intervention under consideration vs. the old intervention), though this assumption is only for simplicity's sake and treatments with more than two levels are permissible. These two levels will be referred to using the generic terms "treatment" and "control," without much discussion of what those two words mean aside from saying that they serve as contrasting interventions to one another. In the potential outcome framework, the notion is that each individual has two possible outcomes: one which is observed if the person were to take the treatment and one if the person were to take the control. In practice only one of these outcomes can be observed because taking the treatment often precludes taking the control and vice versa. Let subject i taking the treatment be denoted as
$D_i = 1$ and subject i taking the control as $D_i = 0$. To formally denote the outcome subject i would experience under the treatment and control, write $Y(D_i = 1)$ and $Y(D_i = 0)$, respectively. To simplify the notation, let $Y_i^1$ and $Y_i^0$ denote the potential outcome under treatment and the potential outcome under control, respectively. In this chapter, Y will be thought of as a scalar, though it is possible to develop a framework where Y is a vector of outcomes. Excellent resources exist for reading up on the potential outcome framework (e.g., Rosenbaum 2002; Pearl 2009; Hernán and Robins 2013).

The ultimate, often unattainable, quantity of interest, namely, the individual level treatment effect, can be described as

$$\Delta_i = Y_i^1 - Y_i^0$$

Thus, $\Delta_i$ will tell us the difference in outcome, for subject i, between taking the treatment and control. If this quantity could be observed, then the benefit from intervention would be known explicitly. But, in practice only one or the other of the potential outcomes is observed. To see this, write the observed outcome, denoted $Y_i^{obs}$ for the ith individual, as a function of the potential outcomes (Neyman 1990; Rubin 1974):

$$Y_i^{obs} = D_i Y_i^1 + (1 - D_i) Y_i^0$$

Observing one of the potential outcomes precludes observing the other. In all but the most contrived settings, this problem is intractable. Both the treatment and control outcomes cannot be observed. So other parameters of interest must be turned to.

Parameters of Interest
Suppose we, as the analysts, have collected characteristics of the subjects in our study. It is important to stress that these baseline characteristics should be based on the state of the subject prior to the intervention to avoid the potential to bias the treatment effect (see Cox (1958, Sect. 4.2) and Rosenbaum (2002)). For example, say a new drug is being tested for its ability to lower the risk of heart attack. High blood pressure is known to correlate with higher risk of heart attack, so it is tempting to control for this covariate. Controlling for blood pressure is likely to improve the precision of the estimate if a pretreatment blood pressure measure is used. It would be a mistake to use a posttreatment measurement of blood pressure as a control because this measurement may be affected by the drug and would thus result in an attenuated estimated causal effect. Intuitively, this is because the estimation procedure is limiting comparison in outcome not just between people who took the drug and who didn't but between people who took the drug and then had a certain level of blood pressure to people who didn't take the drug and had the same level of blood pressure. The impact from the drug may have already happened via the lowering of the blood pressure.

Let's denote these measured pretreatment characteristics as $X_i$ for the ith subject. Further, the subjects are likely to have characteristics which were not recorded. Let's denote these unobserved characteristics as $U_i$ for the ith subject. To keep things simple, assume that the covariates are linearly related to the outcomes like so

$$Y_i^1 = X_i^T \beta_1 + U_i^T \alpha_1$$
$$Y_i^0 = X_i^T \beta_0 + U_i^T \alpha_0$$

Note that the coefficients need to be indexed by the treatment level in order to account for interactions between the treatment level and the covariates. Also, it may appear strange putting coefficients on the unobserved variables, but this is required at the bare minimum to make the dimensions agree. In practice, let us write $e_i^1$ in place of the clunkier $U_i^T \alpha_1$, but this is a move of convenience rather than discipline. There is likely not just one scalar, unobservable covariate omitted from our dataset, so it is more realistic to write $U_i^T \alpha_1$. Note that this means something a bit magical is happening when an author proposes a parametric distribution for $e_i^1$.

Combining our equations for the observed outcome and the linear models, the observed outcome can be decomposed in terms of covariates, both observed and unobserved, as well as the treatment:

$$Y_i^{obs} = X_i^T \beta_0 + D_i \left( X_i^T \beta_1 - X_i^T \beta_0 + U_i^T \alpha_1 - U_i^T \alpha_0 \right) + U_i^T \alpha_0$$

It is standard in econometrics to think of the above model as a regression, where the coefficient on the treatment variable comes from two sources of variation: the first source is the variation due to the observed covariates, $X_i^T \beta_1 - X_i^T \beta_0$, and the second is the variation due to the unobserved covariates, $U_i^T \alpha_1 - U_i^T \alpha_0$, where $D_i$ may be correlated with $U_i$. It is common to interpret the first source of variation as the gains for the average person with covariate levels $X_i$ and the second source of variation to be referred to as idiosyncratic gains for subject $i$. The idiosyncratic gains are the part of this model which allows persons $i$ and $j$ to differ in gains from treatment even when $X_i = X_j$.

Selection Bias

One of the biggest problems with observational studies is that there is selection bias. Loosely speaking, selection bias arises from how the subjects are sorted (or sort themselves) into the treatment or control groups. The intuition here is the treatment group was different from the control group even before the intervention, and the two groups would probably have had different outcomes even if there had been no intervention at all. Selection bias can occur in a couple of different ways, but one way to write it is

$$f(X, U \mid D = 1) \neq f(X, U \mid D = 0)$$

that is, the joint distribution of the covariates for those who received the treatment is different than for those who received the control. If this is true, that there is selection bias, then

$$E[Y^{1} - Y^{0} \mid X] \neq E[Y^{1} \mid X, D = 1] - E[Y^{0} \mid X, D = 0]$$

This is problematic because the left-hand side of this equation is our unobservable quantity of interest, but the right-hand side is made up of directly observable quantities. But it seems like the above equation is used in other settings, namely, experimentation. Why is that acceptable?

In an experiment, because of randomization, it is known that

$$(X, U) \perp\!\!\!\perp D,$$

where $\perp\!\!\!\perp$ denotes independence. And it follows that

$$E[Y^{1} - Y^{0} \mid X] = E[Y^{1} \mid X, D = 1] - E[Y^{0} \mid X, D = 0]$$

Though it is often a dubious claim, many of the standard observational study techniques require an assumption which essentially says that the only selection between treated and control groups is on levels of the observed covariates, i.e., $U \perp\!\!\!\perp D \mid X$. This is sometimes referred to as overt selection bias. Typically, if overt selection bias is the only form of bias, then either conditioning on observed covariates (e.g., by using a regression) or matching is enough to address overt bias. One particular assumption that is invoked quite often in the current health literature is the absence of omitted variables (i.e., only overt bias).

Hidden bias exists when there are imbalances in the unobserved covariates. Let’s use the observed outcome formula again, rewriting it like so:

$$Y_i^{obs} = X_i^T \beta_0 + D_i \, E[\Delta \mid X] + U_i^T \alpha_0 + D_i \left( U_i^T \alpha_1 - U_i^T \alpha_0 \right)$$

A least squares regression of $Y$ on $D$ based on the model above will tend to produce biased estimates for $E[\Delta \mid X]$ when $D$ is correlated with either $U_i^T \alpha_0$ or $U_i^T \alpha_1 - U_i^T \alpha_0$. This can arise from unobserved covariates which influence both potential outcomes and selection into treatment. This bias is referred to as hidden bias. If the average treatment effects given $X$, $E[\Delta \mid X]$, and the hidden biases given $X$, $E[U_i^T \alpha_1 \mid X, D = 1] - E[U_i^T \alpha_0 \mid X, D = 0]$, are the same for all $X$, then the regression estimate of $E[\Delta \mid X]$ is biased by

$$E[U_i^T \alpha_1 \mid X, D = 1] - E[U_i^T \alpha_0 \mid X, D = 0]$$
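As a rough illustration of the selection-bias discussion above, the following Python sketch simulates the linear potential outcome model with one observed covariate X and one unobserved covariate U. The coefficient values, the logistic selection rule, the sample size, and the function name `simulate` are all assumptions chosen for illustration; none of them come from the chapter. Under randomization the naive difference in means should sit close to the true average effect, while under selection on U it should be shifted by roughly E[U α1 | D = 1] − E[U α0 | D = 0].

```python
import numpy as np


def simulate(n, confounded, seed=0):
    """Return (naive difference in means, true average effect) for one simulated study."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)  # observed covariate X
    u = rng.normal(size=n)  # unobserved covariate U

    # Linear potential outcomes; all coefficients are illustrative assumptions.
    y0 = 1.0 * x + 1.0 * u        # Y0 = X*beta0 + U*alpha0
    y1 = 2.0 + 1.0 * x + 1.5 * u  # Y1 = X*beta1 + U*alpha1 (plus an intercept shift)

    if confounded:
        # Selection on the unobserved covariate: larger U raises the chance of treatment.
        p_treat = 1.0 / (1.0 + np.exp(-2.0 * u))
    else:
        # Randomization: D is independent of (X, U).
        p_treat = np.full(n, 0.5)

    d = rng.random(n) < p_treat
    y_obs = np.where(d, y1, y0)  # Y_obs = D*Y1 + (1 - D)*Y0

    naive = y_obs[d].mean() - y_obs[~d].mean()
    true_effect = (y1 - y0).mean()
    return naive, true_effect


for label, flag in [("randomized", False), ("selection on U", True)]:
    naive, truth = simulate(200_000, confounded=flag)
    print(f"{label}: naive difference = {naive:.3f}, true average effect = {truth:.3f}")
```

Because U never enters the analyst's calculation, no adjustment for X alone can remove the gap seen in the second scenario; this is the hidden bias that the methods discussed next try to address.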
Methods to Address Selection Bias

In a randomized experiment setting, inference on the causal effect of treatment on the outcome requires no further assumption than the method for randomizing subjects into the treatment or control (Fisher 1949). The randomization guarantees independence of assigned treatment from the covariates. This independence is for all covariates, both observed and unobserved. By observed covariates we mean those covariates which appear in the analyst’s data set and unobserved all of those that don’t. If the sample is large enough, then this independence means that the treatment group will almost surely have quite a similar covariate distribution as the control group. Therefore, any variation noted in the outcome is more readily attributed to the variation in the treatment level rather than variation in the covariates.

The primary challenge to observational studies is that selection into treatment is not randomly assigned. Usually there are covariates, both observed and unobserved, which determine who receives treatment and who receives control. In such a case, variation in the outcome is not easily attributable to treatment levels because covariates are different between the different levels as well.

There are techniques which were created to address this selection bias. These methods can be classified (roughly) into two groups: (1) those methods which address only the observed selection bias and (2) those methods which attempt to address selection bias on both the observed as well as unobserved covariates. Falling into the first category are techniques like regression, Bayesian hierarchical modeling, propensity score matching, and inverse probability weighting. The second category includes methods like instrumental variables, regression discontinuity, and difference in differences.

Methods to Address Overt Selection Bias

Only through special justification should methods which address only overt bias be considered valid. Usually, this justification takes the form of an assumption. Informally, this assumption can be thought of as saying: selection into the treatment is occurring only on variables which are observed. Formally, this assumption is often written as

$$(Y^{0}, Y^{1}) \perp\!\!\!\perp D \mid X,$$
$$0 < \Pr(D = 1 \mid X) < 1$$

where $\perp\!\!\!\perp$ denotes the conditional independence between the treatment and the joint distribution of the counterfactual outcomes given $X$. Two random variables are conditionally independent given a third variable if and only if they are independent in their conditional distribution given the conditioning variable. The above assumption, essentially saying that all needed covariates are measured, has a few different names: strongly ignorable treatment assignment (Rosenbaum and Rubin 1983), selection on observables (Heckman and Robb 1985), conditional independence, no hidden bias (only overt bias due to $X$), no unmeasured confounders (in the epidemiology literature), or the absence of omitted variable bias (in the econometrics literature).

To assume strongly ignorable treatment assignment in a medical application is to go a bit against intuition. In practice, the analyst has access to some subset of the recorded information from the patients’ interaction with the health system. Currently, most analysts do not have access to many measurements the medical decision makers have (e.g., results of labs, biometric information), so they are forced to use transactional information (e.g., insurance claims) which are good for indicating the presence of a condition but not necessarily the severity. It is possible that as electronic health records become more readily available, this problem will diminish, but currently this should be a source of great skepticism for methods relying on the assumption of no unobserved bias. But the issue does not stop here. The health analyst should be aware that medical practitioners are keen observers and intuitively adept at identifying issues which may go either unrecorded or may even be unquantifiable (e.g., practitioners will regularly refer to the frailty of a patient, which seems to be a generally understood yet unmeasurable quality of a patient). Given the additional information the medical decision
makers have, and their desire to choose an optimal outcome, medical decision makers are actively working against the reasonableness of strongly ignorable treatment assignment.

It is unfortunate that methods which were designed only to address overt bias have become the default tools of choice in the literature. Given the complexity of health decisions, it strains credibility that all variables which influence treatment and outcome are recorded and available to the analyst. The default for health analysts (and critically minded reviewers) should be to assume unobserved selection is occurring and to look for ways of mitigating it.

Instrumental Variables: A Framework to Address Overt Bias and Bias Due to Omitted Variables

Regression, propensity score matching, and any methods predicated on only overt bias do not address selection on unobserved covariates. It is important to be aware of this because a well-informed researcher needs to judge if available covariates are enough to make a compelling argument for the absence of omitted variables. This is often a dubious claim because (1) a clever reviewer will find several variables missing from your data set and/or (2) there are intangible variables that are difficult, or perhaps inconceivable, to measure. Instrumental variable (IV) techniques are one way of addressing unobserved selection bias.

It is important to note IV techniques do not come for free, without hefty assumptions. It is important to consider these assumptions carefully before deciding to use an IV analysis.

An instrumental variable (IV) design takes advantage of randomness which occurs in the treatment assignment to help address imbalances in the unobserved variables. An instrument is a haphazard nudge toward acceptance of a treatment that affects outcomes only to the extent that it affects acceptance of the treatment. In settings in which treatment assignment is mostly deliberate and not random, there may nevertheless exist some essentially random nudges to accept treatment, so that use of an instrument might extract bits of random treatment assignment from a setting that is otherwise quite biased in its treatment assignments.

There have been many different formulations of IV, reflecting the diverse academic traditions that use IV. Though IVs existed in the literature for quite some time, Angrist et al. (1996) used the potential outcome framework to bring greater clarity to the math of IV. For the health analyst, perhaps Holland (1988) offers the most intuitive introduction to IVs, framing IV as a randomized trial with noncompliance. The frameworks for IV discussed in both Angrist et al. (1996) and Holland (1988) enhance the classic econometric presentation of IVs where the focus is on correlation with the error term. Health analysts will likely find these introductions most engaging.

To illustrate IVs, consider the NICU example from earlier.

Instrumental Variables: NICU Example Revisited

Neonatal intensive care units (NICUs) have been established to deliver high-intensity care for premature infants (those infants born before 37 weeks of gestation). Considering all of the preemies that were delivered in Pennsylvania between 1995 and 2005, 2.26% of the preemies delivered at high-level NICUs died, while only 1.25% of the preemies who were delivered at low-level NICUs died. No one believes the difference in outcomes reported above is solely attributable to the difference in level of intensity of treatment. People believe it is due to difference in covariates. Based on the observable covariates, this is plausible because the preemies delivered at high-level NICUs weighed almost 250 g less than the preemies who were delivered at low-level NICUs (2,454 g at high-level NICUs vs. 2,693 g at low-level NICUs). Similarly, preemies delivered at high-level NICUs were born a week earlier than their counterparts at low-level NICUs on average (34.5 vs. 35.5 weeks).

Unfortunately, full medical records were not available for this study. Only birth and death certificates and a form UB-92 that hospitals provided were available. It is quite likely that not all
necessary covariates in our dataset are available, so assuming only overt bias is likely to lead to biased estimates. To attempt to deal with this problem, Baiocchi et al. (2010) and Lorch et al. (2012a) used an instrumental variable approach. They used distance to treatment facility as an instrument, because travel time largely determines the likelihood that a mother will deliver at a given facility but appears to be largely uncorrelated with the level of severity a preemie experiences.

To help visualize the problem, look at Fig. 1 below. This is an example of a directed acyclic graph (Pearl 2009). The arrows denote causal relationships. Read the relationship between variables D and Y like so: changing the value of D causes Y to change. In our example, Y represents mortality. The variable D indicates whether or not a baby attended a high-level NICU. Our goal is to understand the arrow connecting D to Y. In order to keep the current example simple, assume there are no observed covariates (which would be denoted using an X in Fig. 1). In general, IV techniques are able to adjust for variation in observed covariates (see section “Estimation with Observed Covariates”).

The U variable causes consternation as it represents the unobserved level of severity of the preemie, and it is causally linked to both mortality (sicker babies are more likely to die) and to which treatment the preemie receives (sicker babies are more likely to be delivered in high-level NICUs). Because U is unobserved directly, it cannot be precisely adjusted for using statistical methods such as propensity scores or regression. If the story stopped with just D, Y, and U, then the effect of D on Y could not be estimated.

Fig. 1 Directed acyclic graph for the relationship between an instrumental variable Z, a treatment D, unmeasured confounders U, and an outcome Y

Instrumental variable estimation makes use of an uncomplicated form of variation in the system. What is needed is a variable, typically called an instrument (represented by Z in Fig. 1) that has very special characteristics. It takes some practice to understand exactly what constitutes a good instrumental variable.

Consider excess travel time as a possible instrument. Excess travel time is defined as the time it takes to travel from the mother’s residence to the nearest high-level NICU minus the time it takes to travel to the nearest low-level NICU. If the mother lives closest to a high-level NICU, then excess travel time will take on negative values. If she lives closest to a low-level NICU, excess travel time will be positive.

There are three key features a variable must have in order to qualify as an instrument (see section “IV Assumptions and Estimation for Binary IV and Binary Treatment” for mathematical details on these features and additional assumptions for IV methods). The first feature (represented by the directed arrow from Z to D in Fig. 1) is that the instrument causes a change in the treatment assignment. When a woman becomes pregnant she has a high probability of establishing a relationship with the proximal NICU, regardless of the level, because she is not anticipating having a preemie. Proximity as a leading determinant in choosing a facility has been discussed in Phibbs et al. (1993). By selecting where to live, mothers assign themselves to be more or less likely to deliver in a high-level NICU. The fact that changes in the instrument are associated with changes in the treatment is verifiable from the data.

The second feature (represented by the crossed out arrow from Z to U) is that the instrument is not associated with variation in unobserved variables U that also affect the outcome. That is, Z is not connected to the unobserved confounding that was a worry to begin with. In our example, this would mean unobserved severity is not caused by variation in geography. Since high-level NICUs
tend to be in urban areas and low-level NICUs tend to be the only type in rural areas, this assumption would be dubious if there were high levels of pollutants in urban areas (think of Manchester, England circa the Industrial Revolution) or if there were more pollutants in the drinking water in rural areas than in urban areas. The pollutants may have an impact on the unobserved levels of severity. The assumption that the instrument is not associated with variation in the unobserved variables, while most certainly an assumption, can at least be corroborated by looking at the values of variables that are perhaps related to the unobserved variables of concern (see section “Assessing the IV Assumptions”).

The third feature (represented by the crossed out line from Z to Y in Fig. 1) is that the instrument does not cause the outcome variable to change directly. That is, it is only through its impact on the treatment that the instrumental variable affects the outcome. In our case, presumably a nearby hospital with a high-level NICU affects mortality only if the baby receives care at that hospital. That is, proximity to a high-level NICU in and of itself does not change the probability of death for a preemie, except through the increased probability of the preemie being delivered at the high-level NICU. This is often referred to as the exclusion restriction and can be a slippery concept to get a hold of. See Angrist et al. (1996) for discussion of the exclusion restriction. In our case it seems quite reasonable.

Sources of Instruments in Health Services Research Studies

In this section, common types of IVs that have been used in health services research studies will be described, and issues to consider in assessing their validity will be discussed. One way to study the effect of a treatment when that treatment cannot be controlled is to conduct a randomized encouragement trial. In such a trial, some subjects are randomly chosen to get extra encouragement to take the treatment and the rest of the subjects receive no extra encouragement (Holland 1988). For example, Permutt and Hebel (1989) studied the effect of maternal smoking during pregnancy on an infant’s birthweight using a randomized encouragement trial in which some mothers received extra encouragement to stop smoking through a master’s level staff person providing information, support, practical guidance, and behavioral strategies (Sexton and Hebel 1984). For a randomized encouragement trial, the randomized encouragement assignment (1 if encouraged, 0 if not encouraged) is a potential IV. The randomized encouragement is independent of unmeasured confounders because it is randomly assigned by the investigators and will be associated with the treatment if the encouragement is effective. The only potential concern with the randomized encouragement being a valid IV is that the randomized encouragement might have a direct effect on the outcome not through the treatment. For example, in the randomized encouragement trial to encourage expectant mothers to stop smoking, the encouragement could have a direct effect if the staff person providing the encouragement also encouraged expectant mothers to stop drinking alcohol during pregnancy. To minimize a potential direct effect of the encouragement, Sexton and Hebel (1984) asked the staff person providing encouragement to avoid recommendations or information concerning other habits that might affect birthweight such as alcohol or caffeine consumption and also prohibited discussion of maternal nutrition or weight gain.

When comparing two treatments, one of which is only provided by specialty care providers and one of which is provided by more general providers, the distance a person lives from the nearest specialty care provider has often been used as an IV. Proximity to a specialty care provider particularly enhances the chance of being treated by the specialty care provider for acute conditions. For less acute conditions, patients/providers have more time to decide and plan where to be treated, and proximity may have less of an influence on treatment selection. For treatments that are stigmatized such as substance abuse treatment, proximity could have a negative effect on the chance of being treated. A classic example of the use of distance as an IV is McClellan et al.’s study of the
effect of cardiac catheterization for patients suffering a heart attack (McClellan et al. 1994); the IV used in the study was the differential distance the patient lives from the nearest hospital that performs cardiac catheterization to the nearest hospital that does not perform cardiac catheterization. Another example is the study of the effect of high-level versus low-level NICUs (Lorch et al. 2012a) that was discussed in section “Instrumental Variables: NICU Example Revisited.” Because distance to a specialty care provider is often associated with socioeconomic characteristics, it will typically be necessary to control for socioeconomic characteristics in order for distance to potentially be independent of unmeasured confounders. The possibility that distance might have a direct effect because the time it takes to receive treatment affects outcomes needs to be considered in assessing whether distance is a valid IV.

A general strategy for finding an IV for comparing two treatments A and B is to look for naturally occurring variation in medical practice patterns at the level of geographic region, hospital or individual physician, and then use whether the region/hospital/individual physician has a high or low use of treatment A as the IV. Brookhart and Schneeweiss (2007) termed these IVs “preference-based instruments” because they are derived from the assumption that different providers or groups of providers have different preferences or treatment algorithms dictating how medications or medical procedures are used. Examples of studies using preference-based IVs are Brooks et al. (2004), who studied the effect of surgery plus irradiation versus mastectomy for breast cancer patients using geographic region as the IV; Johnston (2000), who studied the effect of surgery versus endovascular therapy for patients with a ruptured cerebral aneurysm using hospital as the IV; and Brookhart et al. (2006), who studied the benefits and risks of selective cyclooxygenase 2 inhibitors versus nonselective nonsteroidal antiinflammatory drugs for treating gastrointestinal problems using individual physician as the IV. For proposed preference-based IVs, it is important to consider that the patient mix may differ between the different groups of providers with different preferences, which would make the preference-based IV invalid unless patient mix is fully controlled for. It is useful to look at whether measured patient risk factors differ between groups of providers with different preferences. If there are measured differences, there are likely to be unmeasured differences as well; see section “Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions” for further discussion. Also, for proposed preference-based IVs, it is important to consider whether the IV has a direct effect; a direct effect could arise if the group of providers that prefers treatment A treats patients differently in ways other than the treatment under study compared to the providers who prefer treatment B. For example, Newman et al. (2012) studied the efficacy of phototherapy for newborns with hyperbilirubinemia and considered the frequency of phototherapy use at the newborn’s birth hospital as an IV. However, chart reviews revealed that hospitals that use more phototherapy also have a greater use of infant formula; use of infant formula is also thought to be an effective treatment for hyperbilirubinemia. Consequently, the proposed preference-based IV has a direct effect (going to a hospital with higher use of phototherapy also means a newborn is more likely to receive infant formula even if the newborn does not receive phototherapy) and is not valid. The issue of whether a proposed preference-based IV has a direct effect can be studied by looking at whether the IV is associated with concomitant treatments like use of infant formula (Brookhart and Schneeweiss 2007). A related way in which a proposed preference-based IV can have a direct effect is that the group of providers who prefer treatment A may have more skill than the group of providers who prefer treatment B. Also, providers who prefer treatment A may deliver treatment A better than those providers who prefer treatment B because they have more practice with it; for example, doctors who perform surgery more often may perform better surgeries. Korn and Baumrind (1998) discuss a way to assess whether there are provider skill effects by collecting data from providers on whether or not they would have treated a
different provider’s patient with treatment A or B based on the patient’s pretreatment records.

Another common source for an IV is calendar time. Variations in the use of one treatment versus another could result from changes in guidelines, changes in formularies or reimbursement policies, changes in physician preference (e.g., due to marketing activities by drug makers), release of new effectiveness or safety information, or the arrival of new treatments to the market (Brookhart et al. 2010). For example, Shetty et al. (2009) studied the effect of hormone replacement therapy (HRT) on cardiovascular health among postmenopausal women using calendar time as an IV. HRT was widely used among postmenopausal women until 2002; observational studies had suggested that HRT reduced cardiovascular risk, but the Women’s Health Initiative randomized trial reported opposite results in 2002, which caused HRT use to drop sharply. A proposed IV based on calendar time could be associated with confounders that change in time such as the characteristics of patients who enter the cohort, changes in other medical practices, and changes in medical coding systems (Brookhart et al. 2010). The most compelling type of IV based on calendar time is one where a dramatic change in practice occurs in a relatively short period of time (Brookhart et al. 2010).

Another general source for potential IVs is genetic variants which affect treatment variables. For example, Voight et al. (2012) studied the effect of HDL cholesterol on myocardial infarction using as an IV the genetic variant LIPG 396Ser allele for which carriers have higher levels of HDL cholesterol but similar levels of other lipid and non-lipid risk factors compared with noncarriers. Another example is Wehby et al. (2011), who studied the effect of maternal smoking on orofacial clefts in their babies using genetic variants that increase the probability that a mother smokes as IVs. The approach of using genetic variants as an IV is called Mendelian randomization because it makes use of the random assignment of genetic variants conditional on parents’ genes discovered by Mendel. Although genetic variants are randomly assigned conditional on parents’ genes, genetic variants need to satisfy additional assumptions to be valid IVs:

1. Not associated with unmeasured confounders through population stratification. Most Mendelian randomization analyses do not condition on parents’ genes, creating the potential of the proposed genetic variant IV being associated with unmeasured confounders through population stratification. Population stratification is a condition where there are subpopulations, some of which are more likely to have the genetic variant, and some of which are more likely to have the outcome through mechanisms other than the treatment being studied. For example, consider studying the effect of alcohol consumption on hypertension. Consider using the ALDH2 null variant, which is associated with alcohol consumption, as an IV (individuals who are homozygous for the ALDH2 null variant have severe adverse reactions to alcohol consumption and tend to drink very little (Lawlor et al. 2008)). The ALDH2 null variant is much more common in people with Asian ancestry than other types of ancestry (Goedde et al. 1992). Suppose ancestry was not fully measured. If ancestry is associated with hypertension through means other than differences in the ALDH2 null variant (e.g., through different ancestries tending to have different diets), then ALDH2 would not be a valid IV because it would be associated with an unmeasured confounder.

2. Not associated with unmeasured confounders through genetic linkage. Genetic linkage is the tendency of genes that are located near to each other on a chromosome to be inherited together because the genes are unlikely to be separated during the crossing over of the mother’s and father’s DNA (Sham 1998). Consider using a gene A as an IV where gene A is genetically linked to a gene B that has a causal effect on the outcome through a pathway other than the treatment being studied. If gene B is not measured and controlled for, then gene A is not a valid IV because it is associated with the unmeasured confounder gene B.
3. No direct effect through pleiotropy. Pleiotropy refers to a gene having multiple functions. If the genetic variant being used as an IV affects the outcome through a function other than affecting the treatment being studied, this would mean the genetic variant has a direct effect. For example, consider the use of the APOE genotype as an IV for studying the causal effect of low-density lipoprotein cholesterol (LDLc) on myocardial infarction (MI) risk. The ε2 variant of the APOE gene is associated with lower levels of LDLc but is also associated with higher levels of high-density lipoprotein cholesterol, less efficient transfer of very low-density lipoproteins and chylomicrons from the blood to the liver, greater postprandial lipemia, and an increased risk of type III hyperlipoproteinemia (the last three of which are thought to increase MI risk) (Lawlor et al. 2008). Thus, the gene APOE is pleiotropic, affecting myocardial infarction risk through different pathways, making it unsuitable as an IV to examine the causal effect of any one of these pathways on MI risk.

Didelez and Sheehan (2007) and Lawlor et al. (2008) provide good reviews of Mendelian randomization methods.

Another source of IVs for health services research studies is timing of admission variables. For example, Ho et al. (2000) used day of the week of hospital admission as an IV for waiting time for surgery to study the effects of waiting time on length of stay and inpatient mortality among patients admitted to the hospital with a hip fracture. Day of the week of admission is associated with waiting time for surgery because many surgeons only do nonemergency operations on weekdays, and therefore patients admitted on weekends may have to wait longer for surgery. In order for weekday versus weekend admission to be a valid IV, patients admitted on weekdays versus weekends must not differ on unmeasured characteristics (i.e., the IV is independent of unmeasured confounders), and other aspects of hospital care that affect the patients' outcomes besides surgery must be comparable on weekdays versus weekends (i.e., the IV has no direct effect). Another example of a timing of admission variable used as an IV is hour of birth as an IV for a newborn's length of stay in the hospital (Goyal et al. in press; Malkin et al. 2000).

An additional general source of potential IVs for health services research studies is insurance plans, which may vary in the amount of reimbursement they provide for different treatments. For example, Cole et al. (2006) used drug co-payment amount as an IV to study the effect of β-blocker adherence on clinical outcomes and health-care expenditures after a hospitalization for heart failure. In order for variations in insurance plan like drug co-payment amount to be a valid IV, insurance plans must have comparable patients after controlling for measured confounders (i.e., the IV is independent of unmeasured confounders), and insurance plans must not have an effect on the outcome of interest other than through influencing the treatment being studied (i.e., the IV has no direct effect).

IV Assumptions and Estimation for Binary IV and Binary Treatment

In this section, the simplest setting of a binary instrument and a binary treatment will be considered. The main ideas in instrumental variable methods are most easily understood in this setting, and the ideas will be expanded to more complicated settings later.

Framework and Notation

The Neyman-Rubin potential outcome framework will be used to describe causal effects (Neyman 1990; Rubin 1974). Let $Z_i$ denote the IV for subject $i$, where $Z_i = 0$ or 1 for a binary IV. Level 1 of the IV is assumed to mean the subject was encouraged to take level 1 of the treatment, where the treatment has levels 0 and 1. Let $D_i^z$ be the potential treatment received for subject $i$ if she were assigned level
$z$ of the IV: $D_i^1$ is the treatment that subject $i$ would receive if she were assigned level 1 of the IV, and $D_i^0$ is the treatment that $i$ would receive if she were assigned level 0 of the IV. The observed treatment received for subject $i$ is $D_i \equiv D_i^{Z_i}$. Let $Y_i^{z,d}$ be the potential outcome for subject $i$ if she were assigned level $z$ of the IV and level $d$ of the treatment; there are four such potential outcomes, $Y_i^{1,1}$, $Y_i^{1,0}$, $Y_i^{0,1}$, and $Y_i^{0,0}$. However, only one of them will be observed in practice. The observed outcome for subject $i$ is $Y_i \equiv Y_i^{Z_i, D_i^{Z_i}}$. Let $X_i$ denote observed covariates for subject $i$.

Angrist et al. (1996) considered an IV to be a variable satisfying the following five assumptions:

1. IV is correlated with treatment received: $E[D_i^1 \mid X_i] > E[D_i^0 \mid X_i]$.
2. IV is independent of unmeasured confounders (conditional on covariates): $Z_i$ is independent of $(D_i^1, D_i^0, Y_i^{1,1}, Y_i^{1,0}, Y_i^{0,1}, Y_i^{0,0})$ given $X_i$.
3. Exclusion restriction (ER). This assumption says that the IV affects outcomes only through its effect on treatment received: $Y_i^{z,d} = Y_i^{z',d}$ for all $z$, $z'$, and $d$. Under the ER, write $Y_i^d \equiv Y_i^{z,d}$ for any $z$, that is, $Y_i^1$ is the potential outcome for subject $i$ if she were to receive level 1 of the treatment (regardless of her level of the IV), and $Y_i^0$ is the potential outcome if she were to receive level 0 of the treatment. This assumption is also called the no direct effect assumption.
4. Monotonicity assumption. This assumption says that there are no subjects who are "defiers," who would only take level 1 of the treatment if not encouraged to do so, that is, no subjects with $D_i^1 = 0$, $D_i^0 = 1$.
5. Stable unit treatment value assumption (SUTVA). This assumption says that the treatment affects only the subject taking the treatment and the treatment effect is stable through time (see Angrist et al. 1996; Rubin 1990 for details). The first part of this assumption, that the treatment affects only the subject taking the treatment, is called the no interference assumption.

The first three assumptions are the assumptions depicted in Fig. 1.

The fourth assumption, monotonicity, plays a role in interpreting the standard IV estimate as a causal effect for a certain subpopulation. A subject in a study with binary IV and treatment can be classified into one of four latent compliance classes based on the joint values of potential treatment received (Angrist et al. 1996): $C_i$ = never taker (nt) if $(D_i^0, D_i^1) = (0, 0)$, complier (co) if $(D_i^0, D_i^1) = (0, 1)$, always taker (at) if $(D_i^0, D_i^1) = (1, 1)$, and defier (de) if $(D_i^0, D_i^1) = (1, 0)$. Table 1 shows the relationship between observed groups and latent compliance classes. Under the monotonicity assumption, the set of defiers will be empty. The never takers and always takers do not change their treatment status when the instrument changes, so under the ER assumption, the potential treatment and potential outcome under either level of the IV ($Z_i = 1$ or 0) is the same. Consequently, the IV is not helpful for learning about the treatment effect for always takers or never takers. Compliers are subjects who change their treatment status with the instrument, that is, the subjects would take the treatment if they were encouraged to take it by the IV but would not otherwise take the treatment. Because these subjects change their treatment with the level of the IV, the IV is helpful for learning about their treatment effects. The average causal effect for this subgroup,
Table 1 The relation between observed groups and latent compliance classes

Zi    Di    Ci
1     1     Complier or Always taker
1     0     Never taker or Defier
0     0     Never taker or Complier
0     1     Always taker or Defier

$E[Y_i^1 - Y_i^0 \mid C_i = co]$, is called the complier average causal effect (CACE) or the local average treatment effect (LATE). It provides the information on the average causal effect of receiving the treatment for compliers. When monotonicity does not hold, the standard IV estimator Eq. 3 discussed in section "Two Stage Least Squares (Wald) Estimator" estimates the quantity (Angrist et al. 1996)

$$\frac{P(C_i = co)\, E[Y_i^1 - Y_i^0 \mid C_i = co] - P(C_i = de)\, E[Y_i^1 - Y_i^0 \mid C_i = de]}{P(C_i = co) - P(C_i = de)}. \tag{1}$$
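As a numerical illustration, the following minimal Python sketch evaluates the estimand of Eq. 1, read here as the ratio of the intention-to-treat effect on the outcome, P(co)E[Y1 - Y0 | co] - P(de)E[Y1 - Y0 | de], to the intention-to-treat effect on treatment received, P(co) - P(de). The function name and all numerical values are hypothetical and were chosen only for illustration.

```python
def iv_estimand_without_monotonicity(p_co, p_de, effect_co, effect_de):
    """Quantity targeted by the standard IV estimator when defiers may exist (Eq. 1).

    p_co, p_de: proportions of compliers and defiers.
    effect_co, effect_de: average treatment effects within each class.
    """
    if p_co == p_de:
        raise ValueError("instrument has no net effect on treatment; ratio undefined")
    return (p_co * effect_co - p_de * effect_de) / (p_co - p_de)


# Hypothetical values: the treatment helps both classes, yet the ratio is negative.
negative_case = iv_estimand_without_monotonicity(0.30, 0.10, 1.0, 4.0)

# Equal effects for compliers and defiers: the ratio reduces to that common effect.
equal_effects = iv_estimand_without_monotonicity(0.30, 0.10, 2.0, 2.0)

# No defiers (monotonicity holds): the ratio is the complier effect.
no_defiers = iv_estimand_without_monotonicity(0.30, 0.00, 1.0, 4.0)

print(negative_case, equal_effects, no_defiers)
```

With 30% compliers whose average effect is 1 and 10% defiers whose average effect is 4, both effects are positive, yet the ratio is about -0.5. When the two classes share the same effect, or when there are no defiers, the ratio equals the complier effect.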
Equation 1 could potentially be negative even if the treatment has a positive effect for all subjects (Angrist et al. 1996). However, the IV method estimate of the CACE is not generally sensitive to small violations of the monotonicity assumption (Angrist et al. 1996). Additionally, if the treatment has the same effect for compliers and defiers, the monotonicity assumption is not needed, as Eq. 1 equals the CACE, $E[Y_i^1 - Y_i^0 \mid C_i = co]$ (Robins and Greenland 1996). For further discussion of understanding the treatment effect that the IV method estimates, see section "Understanding the Treatment Effect That IV Estimates."

The fifth IV assumption, SUTVA, also plays a role in interpreting what the standard IV estimator Eq. 3 estimates. Consider in particular the no interference assumption part of SUTVA, that subject A receiving the treatment affects only subject A and not other subjects. In the NICU study, the no interference assumption is reasonable: if preemie A is treated at a high-level NICU, this does not affect preemie B's outcome. If there were crowding effects (e.g., treating additional babies at a hospital decreases the quality of care for babies already under care at that hospital), this assumption might not be true. SUTVA is also not appropriate for situations like estimating the effect of a vaccine on an individual because herd immunity would lead to causal links between different people (Hudgens and Halloran 2008). When no interference fails to hold, the IV method is roughly estimating the difference between the effect of the treatment and the spillover effect of some units being treated on those units left untreated (see Sobel 2006 for a precise formulation and details).

In economics, a latent index model is often considered for causal inference about the effect of a binary treatment based on a structural equation model or two-stage linear model, for example,

$$D_i^* = \alpha_0 + \alpha_1 Z_i + e_{i1}$$
$$Y_i = \beta_0 + \beta_1 D_i + e_{i2}$$

where

$$D_i = \begin{cases} 1 & \text{if } D_i^* > 0 \\ 0 & \text{if } D_i^* \le 0 \end{cases}, \qquad Z_i \perp\!\!\!\perp (e_{i1}, e_{i2}).$$

Vytlacil (2002) shows that a nonparametric version of the latent index model is equivalent to the Assumptions 1–5 above that Angrist et al. (1996) use to define an IV.

Two-Stage Least Squares (Wald) Estimator

Let us first consider IV estimation when there are no observed covariates X. For binary IV and treatment variable, Angrist et al. (1996) show that under the framework and assumptions in section "Framework and Notation," the CACE is nonparametrically identified by

$$E[Y_i^1 - Y_i^0 \mid C_i = co] = \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)}, \tag{2}$$

which is the intention-to-treat (ITT) effect divided by the proportion of compliers.

The standard IV estimator or two-stage least squares estimator (2SLS) is the ratio of sample covariances (Durbin 1954):

$$\widehat{CACE}_{2SLS} = \frac{\widehat{cov}(Y_i, Z_i)}{\widehat{cov}(D_i, Z_i)} = \frac{\hat{E}(Y_i \mid Z_i = 1) - \hat{E}(Y_i \mid Z_i = 0)}{\hat{E}(D_i \mid Z_i = 1) - \hat{E}(D_i \mid Z_i = 0)} \quad \text{for binary IV and treatment.} \tag{3}$$

The 2SLS estimator $\widehat{CACE}_{2SLS}$, sometimes called the Wald estimator, is the sample analogue of Eq. 2 and consistently estimates the CACE. The asymptotic standard error for $\widehat{CACE}_{2SLS}$ is given in Imbens and Angrist (1994), Theorem 3.

The 2SLS estimator Eq. 3 can be used when information on Y, Z, and D is not available in a single data set, but one data set has Y and Z and the other data set has D and Z; this is called two-sample instrumental variable estimation (Angrist and Krueger 1992; Inoue and Solon 2010). For example, Kaushal (2007) studied the effect of food stamps on body mass index (BMI) in immigrant families using differences in state responses to a change in federal laws on immigrant eligibility for the food stamp program as an IV. The National Health Interview Survey was used to estimate the effect of state lived in on BMI, and the Current Population Survey was used to estimate the effect of state lived in on food stamp program participation because neither data set contained all three variables.

More Efficient Estimation

Let $\mu^{c1} = E[Y_i^1 \mid C_i = co]$, $\mu^{c0} = E[Y_i^0 \mid C_i = co]$, $\mu^a = E(Y_i \mid C_i = at)$, and $\mu^n = E(Y_i \mid C_i = nt)$, and let $\pi_a$, $\pi_c$, and $\pi_n$ denote the proportion of always takers, compliers, and never takers, respectively. Note that by Assumptions 1–5 and the mixture structure of the outcomes of the four observed groups shown in Table 1,

$$E(Y_i \mid Z_i = 1, D_i = 1) = \frac{\pi_c}{\pi_c + \pi_a}\, \mu^{c1} + \frac{\pi_a}{\pi_c + \pi_a}\, \mu^a, \qquad E(Y_i \mid Z_i = 1, D_i = 0) = \mu^n, \tag{4}$$

$$E(Y_i \mid Z_i = 0, D_i = 0) = \frac{\pi_c}{\pi_c + \pi_n}\, \mu^{c0} + \frac{\pi_n}{\pi_c + \pi_n}\, \mu^n, \qquad E(Y_i \mid Z_i = 0, D_i = 1) = \mu^a, \tag{5}$$

where the quantities on the left-hand side are expectations of observed outcomes and on the right-hand side are functions of expected potential outcomes and proportions for latent compliance classes. The 2SLS or standard IV estimator amounts to using the data in the ($Z_i = 1$, $D_i = 0$) group to get $\hat{\mu}^n$ and then plugging it into Eq. 5 to get $\hat{\mu}^{c0}$, and using the data in the ($Z_i = 0$, $D_i = 1$) group to get $\hat{\mu}^a$ and then plugging it into Eq. 4 to get $\hat{\mu}^{c1}$. However, the information in the mixture groups ($Z_i = 1$, $D_i = 1$) and ($Z_i = 0$, $D_i = 0$) is not used in the 2SLS estimator Eq. 3 even though it can be useful for estimating the average potential outcomes. Similarly, the 2SLS estimator uses only the information in the treatment group ($Z_i = 1$) to estimate $\pi_n$ and only the information in the control group ($Z_i = 0$) to estimate $\pi_a$, but the mixture structure (see Table 1) implies that there is additional information in the control group for estimating $\pi_n$ and additional information in the treatment group for estimating $\pi_a$.

Imbens and Rubin (1997a, 1997b) proposed two approaches using mixture modeling to estimate the CACE. One approach assumes a parametric distribution (normal) for the outcomes and then estimates the CACE by maximum likelihood using the EM algorithm. This estimator provides considerable efficiency gains over the 2SLS estimator when the parametric assumptions hold. However, when the parametric assumptions are wrong, this estimator can be inconsistent, whereas the 2SLS estimator is consistent; see Table 4 of (Cheng
et al. 2009b) for finite-sample results. Imbens and Rubin's other approach to using mixture modeling to estimate the CACE is to approximate the density of the outcome distribution for each compliance class under each randomization group as a piecewise constant function and then estimate the CACE by maximum likelihood (Imbens and Rubin 1997a). This approach is in principle nonparametric as the number of constant pieces in each density function can be increased with the sample size. However, Imbens and Rubin (1997a) do not provide a systematic approach for choosing the number of and locations of the pieces.

To take into account the mixture structure in outcome distribution, Cheng et al. (2009b) developed a systematic and easily implementable approach for inference about the CACE using empirical likelihood (Owen 2002). Empirical likelihood profiles a general multinomial likelihood with support on the observed data points and therefore is an easily constructed random approximation to unknown distributions. Maximum empirical likelihood estimators have good properties. The maximum empirical likelihood estimator for the CACE is robust to parametric distribution assumptions since the empirical likelihood for a parameter such as the CACE is the nonparametric profile likelihood for the parameter.

To explain the methodology of Cheng et al. (2009b), consider a single consent randomized encouragement trial as an example. A single consent trial is a trial in which the group that does not receive encouragement to take the treatment has no access to the treatment so that the set of always takers and defiers is empty in the trial (Zelen 1979). Let the first $n_0$ subjects be from the not encouraged group and the next $N - n_0$ subjects be from the encouraged group. Then the empirical likelihood $L_E$ of the parameters $(\pi_c, \mu^n, \mu^{c1}, \mu^{c0})$ is

$$L_E(\pi_c, \mu^n, \mu^{c1}, \mu^{c0}) = \max \left( \prod_{i=1}^{n_0} q_i \prod_{i=n_0+1}^{N} q_i \right),$$

subject to

$$\sum_{i=1}^{n_0} q_i = 1, \quad \sum_{i=n_0+1}^{N} q_i = 1, \quad q_i \ge 0, \; i = 1, \ldots, N,$$

$$\sum_{i=n_0+1}^{N} q_i D_i = \pi_c, \quad \sum_{i=n_0+1}^{N} q_i Y_i D_i = \mu^{c1} \pi_c, \quad \sum_{i=n_0+1}^{N} q_i Y_i (1 - D_i) = \mu^n (1 - \pi_c),$$

and there exist $p_i^{c0}$, $p_i^n$, $i = 1, \ldots, n_0$, such that

$$\pi_c\, p_i^{c0} + (1 - \pi_c)\, p_i^n = q_i,$$

$$\sum_{i=1}^{n_0} p_i^{c0} = \sum_{i=1}^{n_0} p_i^n = 1, \quad p_i^{c0}, p_i^n \ge 0, \; i = 1, \ldots, n_0,$$

$$\sum_{i=1}^{n_0} p_i^n (Y_i - \mu^n) = 0, \quad \sum_{i=1}^{n_0} p_i^{c0} (Y_i - \mu^{c0}) = 0,$$

where $p_i^{c0}$ and $p_i^n$ are the population probabilities that a randomly chosen complier assigned to the no encouragement group and a randomly chosen never taker assigned to the no encouragement group have the same outcome as subject $i$, respectively.

By maximizing the empirical likelihood with the EM algorithm as described in Cheng et al. (2009b), the maximum empirical likelihood estimator for the CACE is obtained. Cheng et al. (2009b) show that the estimator provides substantial efficiency gains over the 2SLS estimator in finite samples. Cheng et al. (2009b) also extend their methodology to general encouragement trials in which there are always takers.

In addition to the inference on CACE, Cheng et al. (2009a) developed a semiparametric IV method based on the empirical likelihood approach for distributional treatment effects for compliers and other general functions of the compliers' outcome distribution. They showed that their estimators are substantially more efficient than the standard IV estimator for treatment effects on outcome distributions (see section "Effect of Treatment on Distribution of Outcomes" for more details).
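The relationship between the Wald ratio of Eq. 3 and the plug-in reading of Eqs. 4 and 5 can be checked on simulated data. The sketch below is an illustration under stated assumptions, not the procedure of any cited paper: the class proportions, effect sizes, sample size, and all variable names are hypothetical, the instrument is randomized, there are no defiers, and the instrument affects the outcome only through the treatment. It uses only the Python standard library.

```python
import random
from statistics import fmean

random.seed(1)

# Hypothetical design: 20% always takers, 50% compliers, 30% never takers, no defiers.
PI_A, PI_C = 0.2, 0.5
EFFECT = {"at": 3.0, "co": 2.0, "nt": 1.0}  # effect by latent class; true CACE = 2.0

data = []  # observed triples (z, d, y); the latent class is not stored
for _ in range(40_000):
    z = int(random.random() < 0.5)                 # randomized binary instrument
    u = random.random()
    cls = "at" if u < PI_A else ("co" if u < PI_A + PI_C else "nt")
    d = {"at": 1, "co": z, "nt": 0}[cls]           # treatment actually received
    y0 = random.gauss(0.0, 1.0) + (1.0 if cls == "at" else 0.0)
    data.append((z, d, y0 + EFFECT[cls] * d))      # z acts on y only through d


def group_mean(col, z, d=None):
    """Mean of column col (1 = D, 2 = Y) among rows with Z = z and, optionally, D = d."""
    return fmean(r[col] for r in data if r[0] == z and (d is None or r[1] == d))


def cov(a, b):
    """Sample covariance between columns a and b."""
    ma, mb = fmean(r[a] for r in data), fmean(r[b] for r in data)
    return fmean((r[a] - ma) * (r[b] - mb) for r in data)


# Eq. 3 as a difference-in-means ratio and as a covariance ratio.
wald = (group_mean(2, 1) - group_mean(2, 0)) / (group_mean(1, 1) - group_mean(1, 0))
cov_ratio = cov(2, 0) / cov(1, 0)

# Plug-in route: pure groups give mu_n and mu_a; Eqs. 4 and 5 are solved for mu_c1, mu_c0.
pi_n = 1.0 - group_mean(1, 1)     # never takers: untreated share of the Z = 1 group
pi_a = group_mean(1, 0)           # always takers: treated share of the Z = 0 group
pi_c = 1.0 - pi_a - pi_n
mu_n, mu_a = group_mean(2, 1, 0), group_mean(2, 0, 1)
mu_c1 = ((pi_c + pi_a) * group_mean(2, 1, 1) - pi_a * mu_a) / pi_c
mu_c0 = ((pi_c + pi_n) * group_mean(2, 0, 0) - pi_n * mu_n) / pi_c
plug_in = mu_c1 - mu_c0

print(wald, cov_ratio, plug_in)
```

The three numbers agree up to floating-point error, which reflects the algebraic identity between the plug-in route and Eq. 3, and they lie close to the simulated complier effect of 2.0. The sketch does not implement the mixture-model or empirical likelihood estimators discussed above.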
Estimation with Observed Covariates

As discussed above, various methods have been proposed to use IVs to overcome the problem of selection bias in estimating the effect of a treatment on outcomes without covariates. However, in practice, instruments may be valid only after conditioning on covariates. For example, in the NICU study of section "Instrumental Variables: NICU Example Revisited," race is associated with the proposed IV excess travel time, and race is also thought to be associated with infant mortality through mechanisms other than level of NICU delivery such as maternal age, previous Caesarean section, inadequate prenatal care, and chronic medical conditions (Lorch et al. 2012b). Consequently, in order for excess travel time to be independent of unmeasured confounders conditional on measured covariates, it is important that race be included as a measured covariate. To incorporate covariates into the two-stage least squares estimator, regress $D_i$ on $X_i$ and $Z_i$ in the first stage to obtain $\hat{D}_i$ and then regress $Y_i$ on $\hat{D}_i$ and $X_i$ in the second stage. Denote the coefficient on $\hat{D}_i$ in the second-stage regression by $\hat{\lambda}_{2SLS}$. The estimator $\hat{\lambda}_{2SLS}$ estimates a covariate-averaged CACE, as discussed next (Angrist and Pischke 2009). Let $(\lambda, \phi)$ be the minimum mean squared error linear approximation to the average response function for compliers $E(Y \mid X, D, C = co)$, that is,

$$(\lambda, \phi) = \arg\min_{\lambda, \phi} E\left[ (Y - \phi^T X - \lambda D)^2 \mid C = co \right]$$

(where X is assumed to contain the intercept). Specifically, if the complier average causal effect given X is the same for all X and the effect of X on the outcomes for compliers is linear (i.e., $E(Y \mid X, D, C = co) = \phi^T X + \lambda D$), then $\lambda$ equals the CACE. The estimator $\hat{\lambda}_{2SLS}$ is a consistent (i.e., asymptotically unbiased) estimator of $\lambda$. Thus, if the complier average causal effect given X is the same for all X and the effect of X on the outcomes for compliers is linear, $\hat{\lambda}_{2SLS}$ is a consistent estimator of the CACE. The standard error for $\hat{\lambda}_{2SLS}$ is not the standard error from the second-stage regression but needs to account for the sampling uncertainty in using $\hat{D}_i$ as an estimate of $E(D_i \mid X_i, Z_i)$ (see White 1984; Davidson and MacKinnon 1993; Freedman 2009, Chap. 9.8). Other methods besides two-stage least squares for incorporating measured covariates into the IV model are discussed in Little and Yau (1998), Hirano et al. (2000), Angrist and Imbens (1995), Abadie (2003), Tan (2006), O'Malley et al. (2011), Cheng et al. (2009b), and Okui et al. (2012), among others. Little and Yau (1998) and Hirano et al. (2000) introduce covariates in the IV model of Imbens and Angrist (1994) with distributional assumptions and functional form restrictions. Angrist and Imbens (1995) consider settings under fully saturated specifications with discrete covariates. Without distributional assumptions or functional form restrictions, Abadie (2003) develops closed forms for average potential outcomes for compliers under treatment and control with covariates. Cheng et al. (2009b) discuss incorporating covariates with the empirical likelihood approach of section "More Efficient Estimation."

Understanding the Treatment Effect That IV Estimates

Relationship Between Average Treatment Effect for Compliers and Average Treatment Effect for the Whole Population

As discussed in section "IV Assumptions and Estimation for Binary IV and Binary Treatment," the IV method estimates the CACE, the average treatment effect for the compliers ($E[Y^1 - Y^0 \mid C = co]$). The average treatment effect in the population is, under the monotonicity assumption, a weighted average of the average treatment effect for the compliers, the average treatment effect for the never takers, and the average treatment effect for the always takers:

$$E[Y^1 - Y^0] = P(C = co)\, E[Y^1 - Y^0 \mid C = co] + P(C = at)\, E[Y^1 - Y^0 \mid C = at] + P(C = nt)\, E[Y^1 - Y^0 \mid C = nt].$$

The IV method provides no direct information on the average treatment effect for always takers ($E[Y^1 - Y^0 \mid C = at]$) or the average treatment effect for never takers ($E[Y^1 - Y^0 \mid C = nt]$). However, the IV method can provide useful bounds on the average treatment effect for the whole population if a researcher is able to put bounds on the difference between the average treatment effect for compliers and the average treatment effects for never takers and always takers based on subject matter knowledge. For example, suppose a researcher is willing to assume that this difference is no more than $b$; then

$$E[Y^1 - Y^0 \mid C = co] - b\,[1 - P(C = co)] \le E[Y^1 - Y^0] \le E[Y^1 - Y^0 \mid C = co] + b\,[1 - P(C = co)], \tag{6}$$

where the quantities on the left and right-hand sides of Eq. 6 other than $b$ can be estimated as discussed in section "IV Assumptions and Estimation for Binary IV and Binary Treatment." For binary or other bounded outcomes, the boundedness of the outcomes can be used to tighten bounds on the average treatment effect for the whole population or other treatment effects (Balke and Pearl 1997; Cheng and Small 2006). Qualitative assumptions, such as that the average treatment effect is larger for always takers than compliers, can also be used to tighten the bounds (e.g., Cheng and Small 2006; Bhattacharya et al. 2008; Siddique 2009).

Characterizing the Compliers

The IV method estimates the average treatment effect for the subpopulation of compliers. Who are these compliers and how do they compare to noncompliers? To understand this better, it is useful to characterize the compliers in terms of their distribution of observed covariates (Angrist and Pischke 2009; Brookhart and Schneeweiss 2007). The mean of a covariate $X_i$ among the compliers is

$$E[X_i \mid C = co] = \frac{E[\kappa_i X_i]}{E[\kappa_i]}, \tag{7}$$

where

$$\kappa_i = 1 - \frac{D_i (1 - Z_i)}{1 - P(Z_i = 1 \mid X_i)} - \frac{(1 - D_i) Z_i}{P(Z_i = 1 \mid X_i)}$$

(Abadie 2003). The prevalence ratio of a binary characteristic X among compliers compared to the full population is $P(X = 1 \mid C = co) / P(X = 1)$. Table 2 shows the mean of various characteristics X among compliers versus the full population and also shows the prevalence ratio (where the sample estimates of $P(Z_i = 1 \mid X_i)$, $E[\kappa_i X_i]$, and $E[\kappa_i]$ are plugged into Eq. 7). Babies whose mothers are college graduates are slightly underrepresented (prevalence ratio = 0.87), and African-Americans are slightly overrepresented (prevalence ratio = 1.14) among compliers. Very low birthweight (<1500 g) and very premature babies (gestational age ≤ 32 weeks) are substantially underrepresented among compliers, with prevalence ratios around one-third; these babies are more likely to be always takers, that is, delivered at high-level NICUs regardless of mother's travel time. Babies whose mothers have comorbidities such as diabetes or hypertension are slightly underrepresented among compliers. Overall, Table 2 suggests that higher risk babies are underrepresented among the compliers. If the effect of high-level NICUs is greater for higher risk babies, then the IV estimate will underestimate the average effect of high-level NICUs for the whole population.

Understanding the IV Estimate When Compliance Status Is Not Deterministic

For an encouragement that is uniformly delivered, such as a letter sent to patients who made an appointment at a psychiatric outpatient clinic encouraging them to attend the appointment (Kitcheman et al. 2008), it is clear that a subject is either a complier, always taker, never taker, or
Table 2 Complier characteristics for NICU study. The second column shows the estimated proportion of compliers with a characteristic X, the third column shows the estimated proportion of the full population with the characteristic X, and the fourth column shows the estimated ratio of compliers with X compared to the full population with X

Characteristic X                  Prevalence of X    Prevalence of X       Prevalence ratio of X among
                                  among compliers    in full population    compliers to full population
Mother college graduate           0.23               0.26                  0.87
African-American                  0.17               0.15                  1.14
Birthweight < 1,500 g             0.03               0.09                  0.33
Gestational age ≤ 32 weeks        0.04               0.13                  0.34
Gestational diabetes              0.05               0.05                  0.91
Diabetes mellitus                 0.02               0.02                  0.77
Pregnancy-induced hypertension    0.08               0.10                  0.82
Chronic hypertension              0.02               0.02                  0.89
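Complier prevalences of the kind reported in Table 2 come from plugging sample estimates into Eq. 7. The following sketch shows that calculation on simulated data, using only the Python standard library. Everything in it is a hypothetical illustration rather than the NICU analysis: a single binary characteristic X, the class proportions, the sample size, and the variable names are all assumptions, and P(Z = 1 | X) is estimated by the sample proportion within each level of X.

```python
import random
from statistics import fmean

random.seed(2)


def kappa(d, z, p_z1):
    """Weight from Eq. 7 for one subject; p_z1 stands for P(Z = 1 | X)."""
    return 1.0 - d * (1 - z) / (1.0 - p_z1) - (1 - d) * z / p_z1


# Hypothetical population: X = 1 marks a higher-risk subject.
P_X = 0.3
P_Z1 = {0: 0.4, 1: 0.6}   # instrument depends on X, so conditioning on X matters
P_AT = {0: 0.2, 1: 0.6}   # higher-risk subjects are mostly always takers
P_CO = {0: 0.5, 1: 0.2}   # remaining subjects are never takers (no defiers)

rows = []  # (x, z, d, is_complier); the last entry is latent outside a simulation
for _ in range(200_000):
    x = int(random.random() < P_X)
    z = int(random.random() < P_Z1[x])
    u = random.random()
    is_at = u < P_AT[x]
    is_co = (not is_at) and u < P_AT[x] + P_CO[x]
    d = 1 if is_at else (z if is_co else 0)
    rows.append((x, z, d, is_co))

# Plug-in estimate of P(Z = 1 | X) within each level of X.
p_hat = {v: fmean(z for x, z, _, _ in rows if x == v) for v in (0, 1)}

weights = [kappa(d, z, p_hat[x]) for x, z, d, _ in rows]
complier_prev = fmean(k * r[0] for k, r in zip(weights, rows)) / fmean(weights)
population_prev = fmean(r[0] for r in rows)
prevalence_ratio = complier_prev / population_prev

# Available only because the compliance class was simulated.
true_complier_prev = fmean(r[0] for r in rows if r[3])

print(complier_prev, true_complier_prev, prevalence_ratio)
```

In this simulated population the higher-risk group makes up about 30% of all subjects but only about 15% of compliers, so the prevalence ratio is near one-half, and the weighted estimate tracks the latent complier prevalence that would be unobservable in real data.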
defier with respect to the encouragement. However, sometimes encouragements that are not uniformly delivered are used as IVs. For example, in the NICU study, consider the IV of whether the mother's excess travel time to the nearest high-level NICU is more than 10 min. If a mother whose excess travel time to the nearest high-level NICU was more than 10 min moved to a new home with an excess travel time less than 10 min, whether the mother would deliver her baby at a high-level NICU might depend on additional aspects of the move, such as the location and availability of public transportation at her new home (Joffe 2011) and the exact travel time to the nearest high-level NICU at her new home. Consequently, a mother may not be able to be deterministically classified as a complier or not a complier; she may be a complier with respect to certain moves but not others. Another example of nondeterministic compliance arises when physician preference for one drug versus another is used as the IV (e.g., Z = 1 if a patient's physician prescribes drug A more often than drug B): whether a patient receives drug A may depend on how strongly the physician prefers drug A (Brookhart and Schneeweiss 2007; Hernán and Robins 2006). Another situation in which nondeterministic compliance status can arise is that the IV may not itself be an encouragement intervention but a proxy for an encouragement intervention. Consider the case of Mendelian randomization, in which the IV is often a single nucleotide polymorphism (SNP) that might be part of a gene A. The SNP may be a marker for a gene B on the same chromosome that actually affects the level of the exposure D. The encouragement intervention is receiving the gene B that actually affects the level of the exposure D, and the SNP is just a proxy for this encouragement. Consequently, even if a subject's exposure level would change as a result of a change in gene B, whether the subject is a complier with respect to a change in the SNP depends on whether the change in the SNP leads to a change in the gene B, which is randomly determined through the process of recombination (Joffe 2011).

Brookhart and Schneeweiss (2007) provide a framework for understanding how to interpret the IV estimate when compliance status is not deterministic. Suppose that the study population can be decomposed into a set of κ + 1 mutually exclusive groups of patients based on clinical, lifestyle, and other characteristics such that within each group of patients, whether a subject receives treatment is independent of the effect of the treatment. All of the common causes of the potential treatment received $D^1$, $D^0$, and the potential outcomes $Y^1$, $Y^0$ should be included in the characteristics used to define these groups. For example, if there are L binary common causes of ($D^1$, $D^0$, $Y^1$, $Y^0$),
then the subgroups can be the $\kappa + 1 = 2^L$ possible values of these common causes. Denote patient membership in these groups by the set of indicators $S = \{S_1, S_2, \ldots, S_\kappa\}$. Consider the following model for the expected potential outcome:

$$E[Y^d \mid S] = \alpha_0 + \alpha_1 d + \alpha_2^T S + \alpha_3^T S d$$

The average effect of treatment in the population is $\alpha_1 + \alpha_3^T E[S]$, and the average effect of treatment in subgroup $j$ is $\alpha_1 + \alpha_{3,j}$. Under the IV assumptions 1–3 and 5 in section "Framework and Notation," that is, all the assumptions except monotonicity, the IV estimator estimates the following quantity:

$$\frac{E(Y \mid Z = 1) - E(Y \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)} = \alpha_1 + \sum_{j=1}^{\kappa} \alpha_{3,j}\, E[S_j]\, w_j, \tag{8}$$

where

$$w_j = \frac{E(D \mid Z = 1, S_j = 1) - E(D \mid Z = 0, S_j = 1)}{E(D \mid Z = 1) - E(D \mid Z = 0)}.$$

The IV estimator Eq. 8 is a "weighted average" of treatment effects in different subgroups, where the subgroups in which the instrument has a stronger effect on the treatment get more weight. Note that when the compliance class is deterministic, then the subgroups can be defined as the compliance classes and Eq. 8 just says that the IV estimator is the average treatment effect for compliers. In the NICU study, where compliance class may not be deterministic, Table 2 suggests that babies in lower-risk groups, for example, not very low birthweight or not very low gestational age, are weighted more heavily in the IV estimator. If there are subgroups for whom the instrument has no effect on their treatment level, then that subgroup gets zero weight. For example, mothers or babies with severe preexisting conditions may virtually always be delivered at a high-level NICU, so that the IV of excess travel time has no effect on their treatment level (Lorch et al. 2012a). If there are subgroups for whom the encouraging level of the instrument makes them less likely to receive the treatment, then this subgroup would get "negative weight" and Eq. 8 is not a true weighted average, potentially leading the IV estimator to have the opposite sign of the effect of the treatment. For example, Brookhart and Schneeweiss (2007) discussed studying the safety of metformin for treating type II diabetes versus other antihyperglycemic drugs among patients with liver disease using physician preference as the IV (Z = 1 if a physician is more likely to prescribe metformin than other antihyperglycemic drugs). Metformin is contraindicated in patients with decreased liver function, as it can cause lactic acidosis, a potentially fatal side effect. Brookhart and Schneeweiss (2007) speculated that physicians who infrequently use metformin will be less likely to understand its contraindications and would therefore be more likely to misuse it. If this hypothesis is true, then for estimating the effect of metformin on lactic acidosis, the IV estimator could mistakenly make metformin appear to prevent lactic acidosis, as patients of physicians with Z = 1 are at lower risk of being inappropriately treated with metformin. When the compliance class is deterministic, a subgroup getting negative weight means that there are defiers, violating the monotonicity assumption.

Assessing the IV Assumptions and Sensitivity Analysis for Violations of Assumptions

Assessing the IV Assumptions

This section will discuss assessing the two key IV assumptions: (1) the IV is independent of unmeasured confounders; (2) the IV affects outcome only through treatment received (the exclusion restriction).

One way of assessing whether the proposed IV is independent of unmeasured confounders conditional on measured confounders is to look at whether the proposed IV is associated with measured confounders. Although measured confounders can be controlled for, if the measured
confounder is only a proxy for the true confounder, then an association between the proposed IV and the measured confounder suggests that there will be an association between the IV and the unmeasured part of the true confounder. If there are two or more sources of confounding, then it is useful to examine if the observable part of one source of confounding is associated with the IV after controlling for the other sources of confounding. These ideas will be illustrated using the NICU study described in section "Instrumental Variables: NICU Example Revisited." Table 3 shows the imbalance of measured covariates across levels of the IV. The racial composition is very different between the near (Z = 1) and far (Z = 0) babies, with near babies being much more likely to be African-American. Since race has a substantial association with neonatal outcomes (Demissie et al. 2001; Lorch et al. 2012b), it is sensible to examine the association of other measured confounders with the IV after controlling for race. Table 4 shows the association of the IV with measured confounders for whites. The clinical measured confounders such as low birthweight, gestational age ≤ 32 weeks, and maternal comorbidities (diabetes and hypertension) are generally similar between near and far babies although there are some significant associations. This similarity between the clinical status of near and far babies and mothers after controlling for race provides some support that the IV is approximately, although not exactly, valid for whites. However, whether the mother is a college graduate differs substantially between white near and far mothers, suggesting that there may be residual confounding due to
Table 3 Imbalance of measured covariates across levels of the instrument for the NICU data. The prevalence difference ratio is the ratio of the imbalance of the measured covariates across levels of the instrument to the imbalance across levels of the treatment. The estimated proportion of compliers is P(D = 1|Z = 1) - P(D = 1|Z = 0) = 0.447, so that a prevalence difference ratio less than 0.447 for an X indicates that there would be less bias in the IV method from failing to adjust for X than from ordinary least squares that failed to adjust for X

Characteristic X                  P(X|near) (%)   P(X|far) (%)   p-value   Prevalence difference ratio
Birthweight < 1,500 g             9.4             7.7            <0.01     0.02
Mother College Graduate           25.9            26.1           0.26      0.04
African-American                  25.6            4.6            <0.01     0.64
Gestational age 32 weeks          14.3            11.7           <0.01     0.23
Gestational diabetes              5.2             5.2            0.47      0.12
Diabetes mellitus                 1.8             1.9            0.07      0.16
Pregnancy-induced hypertension    10.6            10.1           <0.01     0.13
Chronic hypertension              1.9             1.3            <0.01     0.61
Table 4 Imbalance of measured covariates across levels of the instrument for babies born to white mothers in the NICU data. The prevalence difference ratio is the ratio of the imbalance of the measured covariates across levels of the instrument to the imbalance across levels of the treatment. The estimated proportion of compliers is P(D = 1|Z = 1, white) - P(D = 1|Z = 0, white) = 0.418, so that a prevalence difference ratio less than 0.418 for an X indicates that there would be less bias in the IV method from failing to adjust for X than from ordinary least squares that failed to adjust for X

Characteristic X                  P(X|near) (%)   P(X|far) (%)   p-value   Prevalence difference ratio
Birthweight < 1,500 g             7.5             7.2            0.07      0.04
Mother College Graduate           34.4            26.8           <0.01     0.72
Diabetes mellitus                 1.8             1.9            0.08      0.17
Chronic hypertension              1.6             1.3            <0.01     0.43
Table 5 Imbalance of measured covariates across levels of the instrument for babies born to African-American mothers in the NICU data. The prevalence difference ratio is the ratio of the imbalance of the measured covariates across levels of the instrument to the imbalance across levels of the treatment. The estimated proportion of compliers is P(D = 1|Z = 1, African-American) - P(D = 1|Z = 0, African-American) = 0.503, so that a prevalence difference ratio less than 0.503 for an X indicates that there would be less bias in the IV method from failing to adjust for X than from ordinary least squares that failed to adjust for X

Characteristic X                  P(X|near) (%)   P(X|far) (%)   p-value   Prevalence difference ratio
Birthweight < 1,500 g             13.5            11.9           <0.01     0.41
Mother College Graduate           8.0             10.7           <0.01     1.60
Diabetes mellitus                 1.9             2.6            <0.01     1.35
Chronic hypertension              2.8             2.4            0.12      0.34
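The comparison described in the captions of Tables 3 to 5 can be sketched in a few lines of Python. The prevalence difference ratios and complier proportions below are copied from those tables; the dictionary layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# PDR values and estimated complier proportions copied from Tables 3-5.
# The data layout and function name are hypothetical, chosen for illustration.
TABLES = {
    "all babies (Table 3)": (0.447, {
        "Birthweight < 1,500 g": 0.02,
        "Mother college graduate": 0.04,
        "African-American": 0.64,
        "Gestational age 32 weeks": 0.23,
        "Gestational diabetes": 0.12,
        "Diabetes mellitus": 0.16,
        "Pregnancy-induced hypertension": 0.13,
        "Chronic hypertension": 0.61,
    }),
    "white mothers (Table 4)": (0.418, {
        "Birthweight < 1,500 g": 0.04,
        "Mother college graduate": 0.72,
        "Diabetes mellitus": 0.17,
        "Chronic hypertension": 0.43,
    }),
    "African-American mothers (Table 5)": (0.503, {
        "Birthweight < 1,500 g": 0.41,
        "Mother college graduate": 1.60,
        "Diabetes mellitus": 1.35,
        "Chronic hypertension": 0.34,
    }),
}


def pdr_not_below_strength(strength, pdrs):
    """Covariates whose PDR is not below the IV strength, i.e. covariates for
    which omitting X would not be expected to bias IV less than OLS."""
    return [name for name, pdr in pdrs.items() if pdr >= strength]


for label, (strength, pdrs) in TABLES.items():
    flagged = pdr_not_below_strength(strength, pdrs)
    print(f"{label}: strength = {strength}, PDR >= strength for {flagged}")
```

Covariates that are flagged are the ones for which an unadjusted IV analysis would not be expected to improve on an unadjusted ordinary least squares analysis.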
socioeconomic status. Table 5 shows the association of the IV with measured confounders for African-Americans. For African-Americans, there are more substantial associations than for whites between near/far status and the important clinical status variables low birthweight and gestational age 32 weeks, raising more concern about whether the IV is approximately valid for African-Americans.

The last column of Tables 3, 4, and 5 shows the prevalence difference ratio, a measure of how biased an IV analysis would be from failing to adjust for the confounder as compared to an ordinary least squares analysis (Brookhart and Schneeweiss 2007). The discussion of the prevalence difference ratio below is drawn from Brookhart and Schneeweiss (2007). Denote the confounder by U. Consider the following model for the potential outcome:

\[
Y^{d} = \alpha_0 + \alpha_1 d + \alpha_2 U + \epsilon^{d}, \tag{9}
\]

where $E(\epsilon^{d} \mid U) = 0$. The average treatment effect is $E[Y^{1} - Y^{0}] = \alpha_1$. The observed outcome is

\[
Y = \alpha_0 + \alpha_1 D + \alpha_2 U + \epsilon^{0} + D(\epsilon^{1} - \epsilon^{0}).
\]

Assume that $E(\epsilon^{d} \mid D, U) = 0$ for d = 0 or 1. This assumption means that if U were controlled for, the parameters of Eq. 9 could be consistently estimated by least squares. By iterated expectations, $E[\epsilon^{0} + D(\epsilon^{1} - \epsilon^{0}) \mid D] = 0$. Therefore,

\[
E(Y \mid D = 1) - E(Y \mid D = 0) = \alpha_1 + \alpha_2 \left( E[U \mid D = 1] - E[U \mid D = 0] \right),
\]

so that an ordinary least squares analysis that did not adjust for U would be biased by $\alpha_2 (E[U \mid D = 1] - E[U \mid D = 0])$. To evaluate the IV estimand, consider the further assumption that $E[\epsilon^{0} \mid Z] = 0$ so that the proposed IV can be related to the observed outcome only through its effect on D or association with U; also assume that $E(\epsilon^{1} - \epsilon^{0} \mid C)$ is the same for all compliance classes C so that the complier average causal effect is equal to the overall average causal effect $\alpha_1$. These assumptions together say that if U were controlled for, the IV estimator would consistently estimate the average treatment effect $\alpha_1$. Under these assumptions, the probability limit of the IV estimator that does not control for U can be written as

\[
\frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
= \alpha_1 + \alpha_2 \, \frac{E(U \mid Z = 1) - E(U \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)}.
\]

The asymptotic bias of the IV estimator is thus

\[
\operatorname{Bias}\left(\hat{\beta}^{IV}_{1}\right) = \alpha_2 \, \frac{E(U \mid Z = 1) - E(U \mid Z = 0)}{E(D \mid Z = 1) - E(D \mid Z = 0)}. \tag{10}
\]
The term $E(U \mid Z = 1) - E(U \mid Z = 0)$ is the difference in the prevalence of the risk factor U between levels of the IV. The total bias in the IV estimator is this difference multiplied by the excess risk of the outcome among patients with U = 1 divided by the strength of the IV. For the IV estimator to have less asymptotic bias than ordinary least squares (OLS), the following condition must hold (Brookhart and Schneeweiss 2007):

\[
\frac{E[U \mid Z = 1] - E[U \mid Z = 0]}{E[U \mid D = 1] - E[U \mid D = 0]} < E(D \mid Z = 1) - E(D \mid Z = 0). \tag{11}
\]

In other words, the difference in the prevalence of U between levels of Z relative to the difference in the prevalence of U between levels of D must be less than the strength of the IV (Brookhart and Schneeweiss 2007). The left-hand side of Eq. 11 is called the prevalence difference ratio (PDR). In order for us to think that the IV analysis is likely to be less biased than OLS, the PDR should be less than the strength of the IV ($E[D \mid Z = 1] - E[D \mid Z = 0]$), particularly for those variables clearly related to the outcome. Table 4 shows that the PDRs are generally less than the strength of the IV (0.418) for whites, but the PDRs are often greater than the strength of the IV (0.503) for African-Americans, suggesting that the IV analysis reduces bias for whites compared to OLS but not for African-Americans.

A way of testing whether the two key IV assumptions (i.e., (i) the IV is independent of unmeasured confounders conditional on the measured confounders and (ii) the IV affects outcomes only through treatment received) hold is to find a subpopulation for whom the link between the IV and treatment received is thought to be broken and then test whether the IV is associated with the outcome in this subpopulation. The only way in which the IV could be associated with the outcome in such a subpopulation is if the IV was associated with unmeasured confounders or directly affected the outcome through a pathway other than treatment received. Figure 2 shows an example. Kang et al. (2013) study the effect of children in Africa getting malaria on their becoming stunted (having a height that is two standard deviations below the expected height for the child's age) and consider the sickle cell trait as a possible IV. The sickle cell trait is that a person inherits a copy of the hemoglobin variant HbS from one parent and normal hemoglobin from the other. While inheriting two copies of HbS results in sickle cell disease and substantially shortened life expectancy, inheriting only one copy (the sickle cell trait) is protective against malaria and is thought to have little detrimental effect on health (Aidoo et al. 2002). To test
Fig. 2 Causal diagrams for the effect of the sickle cell trait (the IV) and malaria episodes (the treatment) on stunting (the outcome) in African children and African-American children. If the sickle cell trait is a valid IV, then the dashed lines should be absent and the sickle cell trait will have no effect on stunting among African-American children
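The check illustrated in Fig. 2 (testing whether the IV is associated with the outcome in a subpopulation where the IV cannot affect treatment) can be sketched on simulated data. Everything below is a hypothetical illustration: the data are simulated, the function names are made up, and a simple two-sample z statistic stands in for whatever test a real study would use.

```python
import numpy as np


def iv_outcome_z_stat(y, z):
    """Two-sample z statistic for mean(y | z = 1) - mean(y | z = 0)."""
    y1, y0 = y[z == 1], y[z == 0]
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return (y1.mean() - y0.mean()) / se


def simulate_broken_link_subpopulation(direct_effect, n=10_000, seed=0):
    """Simulated subpopulation in which Z has no effect on treatment.

    `direct_effect` stands in for any pathway from Z to the outcome other
    than treatment (a direct effect or an association with an unmeasured
    confounder); it is zero when the IV assumptions hold.
    """
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    y = direct_effect * z + rng.normal(size=n)
    return y, z


# Valid IV: no association between Z and the outcome is expected.
print(iv_outcome_z_stat(*simulate_broken_link_subpopulation(0.0)))
# Invalid IV: the association shows up even though Z cannot act through treatment.
print(iv_outcome_z_stat(*simulate_broken_link_subpopulation(0.3)))
```

A statistic far from zero in such a subpopulation is evidence against the IV assumptions; a statistic near zero is consistent with them but does not prove them.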
whether the sickle cell trait indeed does not affect stunting in ways other than reducing malaria and is not associated with unmeasured confounders, Kang et al. (2013) considered whether the sickle cell trait is associated with stunting among African-American children; the sickle cell trait has high prevalence among African-Americans but does not affect malaria because malaria is not present in the United States. Rehan (1981) and Kramer et al. (1978) found no evidence that sickle cell trait is associated with growth and development in African-American children. This provides evidence that the dashed lines in Fig. 2 are indeed absent, which would mean that the proposed IV of the sickle cell trait does indeed satisfy the two key IV assumptions of being independent of unmeasured confounders and affecting outcomes only through treatment received.

Angrist and Krueger (1991) also employed this strategy of finding a subpopulation for whom the link between the IV and treatment received is broken to test their IV of quarter of birth for studying the effect of education on earnings. The reason that quarter of birth is associated with education is that for students who plan to drop out of school as soon as they have reached the age at which they are no longer compelled to be in school (e.g., age 16), quarter of birth affects how much education these students will get before they drop out because children start school at different ages depending on their quarter of birth. However, for students who plan to go to college, quarter of birth does not affect their amount of schooling. Consequently, Angrist and Krueger (1991) looked at whether there was an absence of an association between quarter of birth and earnings among students who went to college to test the IV assumptions.

Newcomers to IV methods often think that the validity of the IV can be tested by regressing the outcome on treatment received, the IV and measured confounders, and testing whether the coefficient on the IV is significant. However, this is not a valid test as even if the IV assumptions hold, the coefficient on the IV would typically be nonzero. One way to see this is that if there are no measured confounders, the test amounts to testing whether (i) $E[Y \mid Z = 1, D = 1] - E[Y \mid Z = 0, D = 1] = 0$ and (ii) $E[Y \mid Z = 1, D = 0] - E[Y \mid Z = 0, D = 0] = 0$. These are the differences between (i) the average potential outcome of the group of always takers and compliers together when these subjects are encouraged to receive treatment and receive treatment versus that of always takers alone when they are not encouraged to receive treatment but do receive treatment and (ii) the average potential outcome of never takers when they are encouraged to receive treatment but do not receive treatment versus that of the group of never takers and compliers when they are not encouraged to receive treatment and do not receive treatment. If the IV assumptions hold that the IV is not associated with unmeasured confounders and has no direct effect on the outcome other than through treatment received, then (i) is equal to zero if and only if the average potential outcomes of compliers and always takers are the same when both groups receive treatment and (ii) is equal to zero if and only if the average potential outcomes of compliers and never takers are the same when both groups do not receive treatment. Typically, the average potential outcomes of compliers and always takers (compliers and never takers) will not be the same when both groups receive (do not receive) treatment even if the IV assumptions hold.

Sensitivity Analysis

A sensitivity analysis seeks to quantify how sensitive conclusions from an IV analysis are to plausible violations of key assumptions. Sensitivity analysis methods for IV analyses have been developed by Angrist et al. (1996), Brookhart and Schneeweiss (2007), Small (2007), Small and Rosenbaum (2008), and Baiocchi et al. (2010), among others. Here an approach will be presented to sensitivity analysis for violations of the assumption that the IV is independent of unmeasured confounders. Assume that the concern is that the IV may be related to an unmeasured confounder U which has mean 0 and variance 1 and is independent of the measured confounders X (U can always be taken to be the residual of the unmeasured confounder given
the measured confounders to make this assumption plausible). Consider the following model:

\[
\begin{aligned}
Y_i^{d} &= \alpha + \beta d + \gamma^{T} X_i + \delta U_i + e_i, \\
U_i &= \rho + \eta Z_i + v_i, \\
E(v_i \mid X_i, Z_i) &= 0, \quad E(e_i \mid X_i, Z_i) = 0.
\end{aligned} \tag{12}
\]

$\beta$ is the causal effect of increasing D by one unit. The sensitivity parameters are $\delta$, the effect of a one standard deviation increase in the unmeasured confounder on the mean of the potential outcome under no treatment, and $\eta$, how much higher the mean of the unmeasured confounder $U_i$ is in standard deviation units for $Z_i = 1$ versus $Z_i = 0$. Model (12) says that $Z_i$ would be a valid IV if both the measured confounders $X_i$ and the unmeasured confounder $U_i$ were controlled for. Under model (12), the following holds:

\[
\begin{aligned}
Y_i &= \alpha + \beta D_i + \gamma^{T} X_i + \delta U_i + e_i, \\
Y_i - \delta \eta Z_i &= \alpha + \delta \rho + \beta D_i + \gamma^{T} X_i + e_i + \delta v_i, \\
E(v_i \mid X_i, Z_i) &= 0, \quad E(e_i \mid X_i, Z_i) = 0.
\end{aligned}
\]

Consequently, a consistent estimate of and inferences for $\beta$ can be obtained by carrying out a two-stage least squares analysis with $Y_i - \delta \eta Z_i$ as the outcome variable, $D_i$ as the treatment variable, $X_i$ as the measured confounders, and $Z_i$ as the IV. Table 6 shows a sensitivity analysis for the NICU study. If there was an unmeasured confounder U that decreased the death rate by 0.1% for a one standard deviation increase in U and was 0.5 standard deviations higher on average in subjects with Z = 1 versus Z = 0, then there would still be strong evidence that high-level NICUs reduce mortality substantially (lower end of 95% CI: 0.14% reduction). However, if there was an unmeasured confounder U that decreased the death rate by 0.5% for a one standard deviation increase in U and was 0.5 standard deviations higher in subjects with Z = 1 versus Z = 0, then there would no longer be strong evidence that high-level NICUs reduce mortality substantially. It can be useful to calibrate the effect of a potential unmeasured confounder U to that of a measured confounder. For example, an increase in gestational age from 30 to 33 weeks, which is a one standard deviation increase in gestational age, is associated with a reduction in the death rate of 2.2%, and the mean gestational age is 0.093 standard deviations smaller among near (Z = 1) versus far (Z = 0) babies. For a comparable U that reduced the death rate by 2.2% for a one standard deviation increase in U and was 0.093 standard deviations smaller in babies with Z = 1 versus Z = 0, there would still be strong evidence that high-level NICUs reduce mortality substantially (see the last row of Table 6).

A sensitivity analysis for violations of the assumption that the IV has no direct effect on the outcome can be carried out as follows. Suppose that the IV has a direct effect of $\lambda$ but the IV is independent of unmeasured confounders, that is,

\[
Y_i^{z,d} = \alpha + \beta d + \gamma^{T} X_i + \lambda z + e_i, \qquad E(e_i \mid X_i, Z_i) = 0. \tag{13}
\]

Then, a consistent estimate of and inferences for $\beta$ can be obtained by carrying out a two-stage least squares analysis with $Y_i - \lambda Z_i$ as the outcome
Table 6 Estimates and 95% confidence intervals for β, the risk difference effect of a premature baby being delivered in a high-level NICU, for different values of the sensitivity parameters δ, the effect of a one standard deviation increase in the unmeasured confounder on the mean of the potential outcome under no treatment, and η, how much higher the mean of the unmeasured confounder Ui is in standard deviation units for Zi = 1 versus Zi = 0

δ         η         β^         95% CI for β
0         0         -0.0059    (-0.0091, -0.0027)
-0.001    0.5       -0.0046    (-0.0079, -0.0014)
-0.005    0.5       0.0004     (-0.0029, 0.0036)
0.001     0.5       -0.0071    (-0.0104, -0.0039)
0.005     0.5       -0.0121    (-0.0154, -0.0089)
-0.022    -0.093    -0.0110    (-0.0142, -0.0078)
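The sensitivity analysis summarized in Table 6 amounts to rerunning two-stage least squares with $Y_i - \delta \eta Z_i$ as the outcome for each assumed pair $(\delta, \eta)$. The following is a minimal sketch on simulated data, not the NICU analysis: the data-generating values, the function names, and the hand-rolled 2SLS routine (point estimate only, no standard errors) are all assumptions made for illustration.

```python
import numpy as np


def tsls(y, d, z, x):
    """Two-stage least squares point estimate: treatment d, instrument z, covariate(s) x."""
    ones = np.ones(len(y))
    first = np.column_stack([ones, z, x])           # first stage: D on (1, Z, X)
    d_hat = first @ np.linalg.lstsq(first, d, rcond=None)[0]
    second = np.column_stack([ones, d_hat, x])      # second stage: Y on (1, D_hat, X)
    return np.linalg.lstsq(second, y, rcond=None)[0][1]


def sensitivity_tsls(y, d, z, x, delta, eta):
    """2SLS with Y - delta*eta*Z as the outcome, for assumed sensitivity parameters."""
    return tsls(y - delta * eta * z, d, z, x)


def simulate(n=200_000, beta=1.0, delta=0.5, eta=0.5, seed=0):
    """Hypothetical data with the structure of model (12); all values are made up."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = rng.integers(0, 2, size=n).astype(float)
    v = rng.normal(scale=np.sqrt(1 - eta**2 / 4), size=n)
    u = -eta / 2 + eta * z + v                      # U has mean 0 and variance 1
    d = (0.8 * z + 0.5 * u + 0.3 * x + rng.normal(size=n) > 0.6).astype(float)
    y = 2.0 + beta * d + 0.7 * x + delta * u + rng.normal(size=n)
    return y, d, z, x


def naive_and_adjusted(beta=1.0, delta=0.5, eta=0.5):
    """Return (2SLS ignoring U, 2SLS on the delta*eta-adjusted outcome)."""
    y, d, z, x = simulate(beta=beta, delta=delta, eta=eta)
    return tsls(y, d, z, x), sensitivity_tsls(y, d, z, x, delta, eta)


if __name__ == "__main__":
    naive, adjusted = naive_and_adjusted()
    print(f"naive 2SLS: {naive:.3f}, adjusted 2SLS: {adjusted:.3f} (true beta = 1.0)")
```

In practice $\delta$ and $\eta$ are unknown, so the adjusted estimate is reported over a grid of plausible values, as in Table 6, rather than at a single "true" pair.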
variable, $D_i$ as the treatment variable, $X_i$ as the measured confounders, and $Z_i$ as the IV. When a proposed IV Z is thought to be independent of unmeasured confounders but there is concern that Z might have a direct effect on the outcome, Joffe et al. (2008) proposed an extended instrumental variable strategy for obtaining an unbiased estimate of the causal effect of treatment that requires having a covariate W which interacts with Z in affecting treatment but for which the direct effect of Z does not depend on W. This method is described in section "Extended Instrumental Variable Method for When Proposed IV Has a Direct Effect."

Weak Instruments

The strength of an IV refers to how strongly the IV is associated with the treatment after controlling for the measured confounders X. An IV is weak if this association is weak. When the IV is encouragement (vs. no such encouragement) to accept a treatment, the IV is weak if the encouragement only has a slight impact on acceptance of the treatment. The strength of the IV can be measured by the proportion of compliers or the partial $r^2$ when adding the IV to the first-stage model for the treatment after already including the measured confounders X (Bound et al. 1995; Shea 1997). Studies that use weak IVs face three problems:

1. High variance. The IV method is estimating the complier average causal effect (CACE), and the only subjects that are contributing information about the CACE are the compliers. Thus, the weaker the IV is (i.e., the smaller the proportion of compliers), the larger is the variance of the IV estimate. One might think that for a sample of size N, the variance of the IV estimate would be equivalent to the variance from having a sample of $N \cdot P(C = co)$ known compliers. However, the situation is actually worse because additional variability is contributed from the always takers and never takers having different sample means in the encouraged and unencouraged groups, even though the population means are the same. Under the assumption that the variance of the outcomes for the always takers, never takers, compliers under treatment, and compliers under control is the same $\sigma^2$ for each group, the asymptotic variance of $\sqrt{N}\,(\widehat{CACE}_{2SLS} - CACE)$, where $\widehat{CACE}_{2SLS}$ is the two-stage least squares estimator Eq. 3, is

\[
\frac{\sigma^2 \operatorname{Var}(Z)}{\operatorname{Cov}(D, Z)^2}
= \frac{\sigma^2}{\left[ P(D = 1 \mid Z = 1) - P(D = 1 \mid Z = 0) \right]^2} \tag{14}
\]

(Imbens and Angrist 1994). Thus, for a sample of size N, the variance of the IV estimate is equivalent to the variance from having a sample of $N \cdot P(C = co)^2$ known compliers. For example, for a sample size of 10,000 with 20% compliers, the variance of the IV estimate is equivalent to that from a sample of 400 known compliers as could be obtained from a randomized trial of size 400 with perfect compliance. Thus, weak IVs can drastically reduce the effective sample size, resulting in high variance and potentially low power.

2. Misleading inferences from two-stage least squares. When the IV is weak enough, confidence intervals formed using the asymptotic standard errors for two-stage least squares, that is, Eq. 14, may be misleading. Beginning with Bound et al. (1995), it has been recognized that the most common method of inference with instrumental variables, two-stage least squares, gives highly misleading inferences when the instrument is weak even when the instrument is perfectly valid. The two-stage least squares estimate can have substantial finite sample bias toward the ordinary least squares estimate and the asymptotic variance understates the actual variance. To see this, consider including a random number as an IV (the random number is not a valid IV because it is not correlated with the treatment received). Although the random number is theoretically unrelated to the unmeasured confounding variables, it will have some chance association with the unmeasured
confounders in a sample, and thus, some confounding will get transferred to the predicted value of the treatment. This will result in some unmeasured confounding getting transferred to the second-stage estimate of the treatment effect. Stock et al. (2002) studied what strength of IV is needed to ensure that two-stage least squares provides reliable inferences. They suggested looking at the first-stage partial F statistic for testing that the coefficient on the IV(s) is zero. For one IV, if this first-stage partial F statistic is less than about 10, the two-stage least squares inferences are misleading in the sense that the type I error rate of a nominal 0.05 level test is actually greater than 0.15. If more than one IV is used, then the first-stage partial F statistic needs to be larger to avoid misleading inferences: greater than 12 for two IVs, greater than 16 for five IVs, and greater than 21 for ten IVs.

A number of methods have been developed that provide accurate inferences when the IV is weak. One method is to use the permutation inference developed in Imbens and Rosenbaum (2005) and illustrated in Small and Rosenbaum (2008). Another method developed by Moreira (1990) is to consider the conditional distribution of the likelihood ratio statistic, conditioning on the value of nuisance parameters. This method is implemented in a Stata program CLRv2.

3. Highly sensitive to bias from unmeasured confounders. Recall formula (10) for the bias in the IV estimator when the proposed IV is associated with an unmeasured confounder U. The numerator measures the association between the IV and the unmeasured confounder (multiplied by how much the unmeasured confounder affects the outcome). The denominator is the proportion of compliers and reflects the strength of the IV. Thus, when the IV is weak (i.e., the proportion of compliers is small), the effect of the IV being invalid from being associated with an unmeasured confounder is greatly exacerbated, and even a minor association between the IV and an unmeasured confounder can lead to substantial bias if the IV is weak (Bound et al. 1995; Small and Rosenbaum 2008).

In summary, when the IV is weak, the IV estimate may have high variance, and if it is weak enough (i.e., partial F statistic less than 10), it is important to use inference methods other than two-stage least squares to provide accurate inferences. These inference methods may inform us that the confidence interval for the treatment effect is very wide, but it is possible that even when the IV is weak, if the treatment effect is large enough and the sample size is big enough, there may still be a statistically significant treatment effect assuming the IV is valid. The third problem with weak IVs is that they are very sensitive to bias from being slightly invalid, that is, being slightly correlated with unmeasured confounders. This problem does not go away with a larger sample size. A slightly biased but strong IV may be preferable to a less biased but weak IV (Small and Rosenbaum 2008).

Binary Outcomes

Often in health services research, the outcomes of interest take values which are not continuous and thus are not amenable to common techniques such as two-stage least squares (2SLS). In this section, methods appropriate for binary outcomes will be discussed. In the next section, methods appropriate for other noncontinuous outcome settings will be introduced. For good general reviews of estimating IV effects in the binary outcome case, see Clarke and Windmeijer (2012), Vansteelandt et al. (2011), and Angrist (2001) (along with associated comments).

In 2SLS, one regression is run predicting the treatment, and then the estimated value of the treatment from this model is used and put into a second regression of the outcome on the covariates and the predicted treatment. This type of estimator, where the predictions from one model are substituted into a second model, is often referred to as a two-stage predictor substitution (2SPS).

When first encountering situations with binary outcomes, most analysts will recognize that regular 2SLS is problematic because it will not respect boundary conditions (i.e., the functional form imposes no constraints on parameter space,
meaning 2SLS can produce logical absurdities such as probabilities greater than one or even negative). Through analogy to 2SLS, the naive analyst may consider changing the second-stage regression to be a logistic model (or perhaps a probit) in lieu of the linear model. This would be a 2SPS. Unfortunately, in general, 2SPS models do not have the nice orthogonality properties of 2SLS and produce biased estimates (Angrist and Pischke 2009; Wooldridge 1997). Other approaches should be considered. These approaches include the parametric approaches of Hirano et al. (2000) and the semiparametric approaches of Abadie (2003), Tan (2006), and Vansteelandt et al. (2011). Two other widely used approaches (two-stage residual inclusion and a bivariate probit model) and a relatively new approach (effect ratios) will be considered in detail below.

Two-Stage Residual Inclusion

Two-stage residual inclusion (2SRI) is a two-stage regression method that is equivalent to 2SPS when the outcome is continuous but differs when the outcome is binary. Consider the nonlinear model

\[
E(Y \mid D, X, U) = M\left( D \beta_D + X^{T} \beta_X + U^{T} \beta_U \right), \tag{15}
\]

where $M(\cdot)$ is a known function of the treatment D, a vector of observed covariates X, and a vector of unobserved covariates U. The unobserved covariates U are correlated with the treatment D when there is unmeasured confounding.

In a 2SPS model, the actual treatment is replaced by some predicted values, like so:

\[
E(Y \mid D, X, U) = M\left( \hat{D} \beta_D + X^{T} \beta_X + U^{T} \beta_U \right), \tag{16}
\]

where $\hat{D}$ is estimated using the IV. This is how 2SLS is done. If the model $M(\cdot)$ is linear then, speaking loosely, 2SLS makes use of the additivity of the terms on the right-hand side of the regression to separate the endogeneity of the treatment and allow unbiased estimation of the treatment effect. If $M(\cdot)$ is nonlinear, though, generally 2SPS will not maintain the separability of the confounding variables through the substitution method.

Another approach here is to use a two-stage residual inclusion (2SRI) model. The idea in a 2SRI is to model the unobserved covariates using the instrument, not the treatment, and thereby remove the endogeneity. The first stage in a 2SRI model is the same in that you model the treatment selection. But the difference is that in the second stage you substitute in the residuals from the first stage, not the predicted treatment. In formula this is to say:

\[
E(Y \mid D, X, U) = M\left( D \beta_D + X^{T} \beta_X + \widehat{U^{T} \beta_U} \right), \tag{17}
\]

where $U^{T} \beta_U$ is estimated as the difference between the actual treatment value and the predicted treatment value from the first stage (i.e., the residual). The difference between a 2SPS and a 2SRI is what information from the first stage is used in the second stage. 2SPS and 2SRI produce the same estimates for linear models but not for nonlinear models. For an introduction to 2SRI models and how they differ from 2SPS (of which 2SLS is a special case), see Terza et al. (2008). It was shown using simulation studies in Cai et al. (2012) that for the estimation of the causal odds ratio for compliers, the 2SPS and 2SRI models performed similarly; see also Cai et al. (2011) for an analytical comparison. The simulation studies of Cai et al. (2012) also showed that the generalized structural mean model (GSMM) in an IV framework with binary outcomes tended to perform quite well vis-a-vis 2SPS and 2SRI models. See Vansteelandt et al. (2011) for an introduction to GSMM in an IV framework.

Bivariate Probit Models

The bivariate probit model is a parameterized model that assumes an explicit functional form of the
bivariate distribution of the error terms from the selection model and the error terms from the outcome model (Bhattacharya et al. 2006; Muthen 1979). This model leans on the parametric assumptions of the error terms, leaving the conclusions sensitive to modifications of the assumptions. Additionally, these models suffer from difficulty in maximizing the likelihood functions and trouble with calculating appropriate standard errors (Freedman and Sekhon 2010).

Matching-Based Estimator: Effect Ratio

Coming out of a different tradition, a class of estimators has been proposed which is also capable of dealing with binary outcomes in an IV setting. Proposed in Baiocchi et al. (2010), the "effect ratio" in a binary setting can be thought of as a risk difference estimator for the compliers. The effect ratio is predicated on having matched sets. In Baiocchi et al. (2010) matched pairs were constructed using a study design-based approach called near-far matching. Near-far matching will be discussed in the next section.

First, the notation required to discuss the effect ratio will be introduced. Assume there are I matched pairs, i = 1, ..., I, with 2 subjects, j = 1, 2, one treated subject and one control, or 2I subjects in total. If the jth subject in pair i receives the treatment, write $Z_{ij} = 1$, whereas if this subject receives the control, write $Z_{ij} = 0$, so $1 = Z_{i1} + Z_{i2}$ for i = 1, ..., I. The matched pairs were formed by matching for an observed covariate $x_{ij}$ but may have failed to control an unobserved covariate $u_{ij}$; that is, $x_{ij} = x_{ik}$ for all i, j, k, but possibly $u_{ij} \neq u_{ik}$.

For any outcome, each subject has two potential responses, one seen when the instrument encourages the subject to take the treatment, $Z_{ij} = 1$, the other seen when the instrument randomly assigns the subject to be encouraged to take the control, $Z_{ij} = 0$. Here, there are two responses, the potential outcomes $(Y^{(Z_{ij}=1)}, Y^{(Z_{ij}=0)})$ and the potential treatment selections $(D^{(Z_{ij}=1)}, D^{(Z_{ij}=0)})$. Abbreviate these as $(Y^{1}_{ij}, Y^{0}_{ij})$ and $(D^{1}_{ij}, D^{0}_{ij})$.

The effect ratio, $\lambda$, is the parameter

\[
\lambda = \frac{\sum_{i=1}^{I} \sum_{j=1}^{2} \left( Y^{1}_{ij} - Y^{0}_{ij} \right)}{\sum_{i=1}^{I} \sum_{j=1}^{2} \left( D^{1}_{ij} - D^{0}_{ij} \right)}, \tag{18}
\]

where it is implicitly assumed that $0 \neq \sum_{i=1}^{I} \sum_{j=1}^{2} (D^{1}_{ij} - D^{0}_{ij})$. Here, $\lambda$ is a parameter of the finite population of 2I individuals, and because $(Y^{0}_{ij}, Y^{1}_{ij})$ and $(D^{0}_{ij}, D^{1}_{ij})$ are not jointly observed, $\lambda$ cannot be calculated from observable data so inference is required.

To test the null hypothesis $H_0: \lambda = \lambda_0$, construct the following statistics:

\[
T(\lambda_0) = \frac{1}{I} \sum_{i=1}^{I} \left\{ \sum_{j=1}^{2} Z_{ij} \left( Y_{ij} - \lambda_0 D_{ij} \right) - \sum_{j=1}^{2} \left( 1 - Z_{ij} \right) \left( Y_{ij} - \lambda_0 D_{ij} \right) \right\}
= \frac{1}{I} \sum_{i=1}^{I} V_i(\lambda_0), \text{ say,} \tag{19}
\]

where, because $Y_{ij} - \lambda_0 D_{ij} = Y^{1}_{ij} - \lambda_0 D^{1}_{ij}$ if $Z_{ij} = 1$ and $Y_{ij} - \lambda_0 D_{ij} = Y^{0}_{ij} - \lambda_0 D^{0}_{ij}$ if $Z_{ij} = 0$, write

\[
V_i(\lambda_0) = \sum_{j=1}^{2} Z_{ij} \left( Y^{1}_{ij} - \lambda_0 D^{1}_{ij} \right) - \sum_{j=1}^{2} \left( 1 - Z_{ij} \right) \left( Y^{0}_{ij} - \lambda_0 D^{0}_{ij} \right). \tag{20}
\]

Also, define

\[
S^{2}(\lambda_0) = \frac{1}{I(I - 1)} \sum_{i=1}^{I} \left\{ V_i(\lambda_0) - T(\lambda_0) \right\}^{2}.
\]

As shown in Baiocchi et al. (2010), under reasonable conditions, the hypothesis $H_0: \lambda = \lambda_0$ may be tested by comparing the test statistic $T(\lambda_0) / S(\lambda_0)$ to the standard normal.
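The statistics in Eqs. 19 and 20 translate directly into array operations. The sketch below is an illustration only: the function names and the four-pair toy data set are hypothetical, and the point estimate is simply the value of $\lambda_0$ at which $T(\lambda_0) = 0$, which follows from the fact that $T$ is linear in $\lambda_0$. Each array has shape (I, 2), one row per matched pair.

```python
import math

import numpy as np


def pair_contrasts(y, d, z, lam0):
    """V_i(lam0): encouraged-minus-unencouraged contrast of Y - lam0 * D within each pair."""
    adjusted = y - lam0 * d
    return (z * adjusted).sum(axis=1) - ((1 - z) * adjusted).sum(axis=1)


def effect_ratio_test(y, d, z, lam0):
    """Return (T / S, two-sided standard normal p-value) for H0: lambda = lam0."""
    v = pair_contrasts(y, d, z, lam0)
    n_pairs = v.size
    t = v.mean()
    s = math.sqrt(((v - t) ** 2).sum() / (n_pairs * (n_pairs - 1)))
    stat = t / s
    return stat, math.erfc(abs(stat) / math.sqrt(2))


def effect_ratio_estimate(y, d, z):
    """Value of lam0 solving T(lam0) = 0."""
    sign = 2 * z - 1                       # +1 for the encouraged subject, -1 otherwise
    return (sign * y).sum() / (sign * d).sum()


# Hypothetical toy data: 4 matched pairs, binary outcome and treatment.
Z = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]], dtype=float)
D = np.array([[1, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

print(effect_ratio_estimate(Y, D, Z))      # ratio of summed paired differences
print(effect_ratio_test(Y, D, Z, lam0=0.0))
```

A confidence set for $\lambda$ can be formed by collecting the values of $\lambda_0$ that the test does not reject; with so few pairs the normal approximation used here would not be trustworthy, so the toy data only illustrate the arithmetic.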
Multinomial, Survival and Distributional Outcomes

Multinomial Outcome

Multinomial outcomes (i.e., nominal or ordinal outcomes) are common in health services research. For example, Bruce et al. (2004) conducted a randomized trial to improve adherence to prescribed depression treatments among depressed elderly patients in primary care practices; the outcomes of interest included continuous outcomes as well as multinomial outcomes such as the number of depression symptoms, ranging from 0 to 9, and the depression class (major, minor, or no depression). There was noncompliance in this trial, and Ten Have et al. (2004) used random assignment as an IV to estimate the effect of receiving treatment on continuous outcomes. Cheng (2009) considered how to estimate the effect of receiving treatment on the multinomial outcomes using random assignment as an IV.

For ordinal outcomes, the CACE is a function of coding scores and probabilities with respect to the categories:

$$\begin{aligned}
\mathrm{CACE} &= E\left(Y_i^{1} - Y_i^{0} \mid C_i = co\right) = \sum_j W_j t_j - \sum_j W_j v_j \\
&= \sum_j W_j t_j - \frac{1}{\pi_c}\left[\sum_j W_j q_j - (1 - \pi_c)\sum_j W_j s_j\right],
\end{aligned}$$

where $W_j$ is the coding score; $t_j$, $v_j$, and $s_j$ are the probabilities of the $j$th category for compliers under treatment, compliers under control, and never takers, respectively; and $q_j$ is the probability of the $j$th category for the observed group $Z_i = 0$, $D_i = 0$. For estimating the CACE for ordinal outcomes, the coding score needs to be chosen. Equally spaced scores or linear transformations of them, midranks, and ridit scores are among the options. A sensitivity analysis can be performed with different choices of scores to see how the results differ.

In addition to the CACE, Cheng (2009) considered some other functions of outcome distributions for understanding the causal effect for ordinal outcomes, including the measure of stochastic superiority of treatment over control for compliers:

$$\begin{aligned}
\mathrm{SSC} &= P\left(Y_i^{1} > Y_i^{0} \mid C_i = \text{complier}\right) + \frac{1}{2} P\left(Y_i^{1} = Y_i^{0} \mid C_i = \text{complier}\right) \\
&= \sum_{j=1}^{J-1} \sum_{k=1}^{J-j} t_{j+k} v_j + \frac{1}{2} \sum_{j=1}^{J} t_j v_j \\
&= \sum_{j=1}^{J-1} \sum_{k=1}^{J-j} t_{j+k}\, \frac{q_j - (1 - \pi_c) s_j}{\pi_c} + \frac{1}{2} \sum_{j=1}^{J} t_j\, \frac{q_j - (1 - \pi_c) s_j}{\pi_c}. \qquad (21)
\end{aligned}$$

SSC = 0.5 indicates no causal effect, and SSC > 0.5 indicates a beneficial effect of the treatment for compliers if a higher value of the outcome is a better result. Compared to the CACE, SSC is easy to interpret and avoids the problem of choosing scores $W_j$, but without the use of weighting scores, it may not describe the strength of the effect well when some specific categories are known to be more important than other categories in measuring the treatment effect.

For nominal outcomes, it is difficult to get a summary measure of the causal effect such as the CACE or SSC for ordinal outcomes. Instead, the treatment effect on the entire outcome distributions of compliers with and without treatment can be evaluated, that is, by comparing $t_j$ to $v_j$, $j = 1, \ldots, J$, and testing the equality of $t_j$ and $v_j$, $j = 1, \ldots, J$. Cheng (2009) estimated those causal effects with the likelihood method and proposed a bootstrap/double bootstrap version of a likelihood ratio test for inference when the true values of parameters are on the boundary of the parameter spaces under the null.

Survival Outcome

Compared to trials with continuous, binary, and multinomial outcomes, randomized trials with survival outcomes often have an issue of
administrative censoring in addition to noncompliance. For those studies, Robins and Tsiatis (1991) considered a structural accelerated failure time model and developed semiparametric estimators for this model. Joffe (2001) provided a good discussion of their approach and comparisons with other survival analysis methods. Loeys and Goetghebeur (2003) and Cuzick et al. (2007) considered a structural proportional hazards model in which the hazard of the potential failure time under treatment for a certain group of subjects is proportional to the hazard of the potential failure time under control for these same subjects. Both the structural accelerated failure time model and the structural proportional hazards model are semiparametric models, where the effect of the treatment on the distribution of failure times is modeled parametrically.

Baker (1998) extended the models and assumptions for discrete-time survival data and derived closed form expressions for estimating the difference in the hazards at a specific time between compliers under treatment and control based on maximum likelihood. Baker's (1998) estimator is analogous to the standard IV estimator for a survival outcome. Nie et al. (2011) discussed this standard IV approach and parametric maximum likelihood methods for the difference in survival at a specific time between compliers under treatment and control.

Here, the standard IV approach of Baker (1998) will be reviewed. Let $S_{c1}(V)$, $S_{c0}(V)$, $S_{at}(V)$, and $S_{nt}(V)$ be the potential survival functions at time $V$ of compliers in the treatment and control groups and of always takers and never takers, respectively; let $S_z(V)$ be the survival probability at time $V$ for the group with assignment $Z = z$; and let $S_{zd}(V)$ be the survival probability at time $V$ for the group with assignment $Z = z$ and treatment received $D = d$. By Table 1, the following holds:

$$\begin{aligned}
S_1(V) &= \pi_c S_{c1}(V) + \pi_{at} S_{at}(V) + \pi_{nt} S_{nt}(V), \\
S_{11}(V) &= \frac{\pi_c}{\pi_c + \pi_{at}} S_{c1}(V) + \frac{\pi_{at}}{\pi_c + \pi_{at}} S_{at}(V), \\
S_{10}(V) &= S_{nt}(V), \\
S_0(V) &= \pi_c S_{c0}(V) + \pi_{at} S_{at}(V) + \pi_{nt} S_{nt}(V), \\
S_{00}(V) &= \frac{\pi_c}{\pi_c + \pi_{nt}} S_{c0}(V) + \frac{\pi_{nt}}{\pi_c + \pi_{nt}} S_{nt}(V), \\
S_{01}(V) &= S_{at}(V).
\end{aligned}$$

Similar to the standard IV estimator for CACE, the standard IV estimator for the compliers' difference in survival probabilities is

$$\hat{S}_{c1}(V) - \hat{S}_{c0}(V) = \frac{\hat{S}_1(V) - \hat{S}_0(V)}{\hat{E}(D \mid Z = 1) - \hat{E}(D \mid Z = 0)},$$

which is the difference in the estimated survival probabilities at time $V$ between the groups assigned to treatment and to control, divided by the estimated proportion of compliers. $\hat{S}_z(V)$ is the Kaplan-Meier estimator under assignment $z$. In addition to the five IV assumptions discussed in section "Framework and Notation," an additional assumption is needed to ensure that the estimator based on Kaplan-Meier estimates is consistent:

Independence Assumption of Failure Times and Censoring Times. The distributions of potential failure times $T$ and administrative censoring times $C$ are independent of each other. Type I censoring (i.e., censoring times are the same for all subjects) and random censoring are two special cases.

Although the standard IV estimator is very useful, it may give negative estimates for hazards and be inefficient because it does not make full use of the mixture structure implied by the latent compliance model. When the survival functions follow some parametric distributions, Nie et al. (2011) used the EM algorithm to obtain the MLE of the difference in survival probabilities for compliers. However, the MLEs could be biased when the parametric assumptions are not valid. To address this concern, Nie et al. (2011) developed a nonparametric estimator based on empirical likelihood that makes use of the mixture structure to gain efficiency over the standard IV method while not depending on parametric assumptions to be consistent.

Effect of Treatment on Distribution of Outcomes

As discussed in previous sections, a large literature on methods of analysis for treatment effects
focuses on estimating the effect of treatment on average outcomes, for example, the CACE (Imbens and Angrist 1994; Angrist et al. 1996). However, in addition to the average effect, knowledge of the causal effect of a treatment on the outcome distribution and its general functions can often provide additional insights into the impact of the treatment and therefore can be of significant interest in many situations (Poulson et al. 2012). For example, in a study of the effect of school subsidized meal programs on children's weight, both low weight and high weight are adverse outcomes; therefore, knowing the effect of the program on the entire distribution of outcomes rather than just average weight is important for understanding the impact of the program. An individual patient deciding which treatment to take must weigh the effects of the possible treatments on the distribution of outcomes, the costs of the treatments, and the potential side effects of the treatments (Hunink et al. 2001). Therefore, making the best decision requires information on the treatment's effect on the entire distribution of outcomes rather than just the average effect because a patient's utility over outcomes may be nonlinear over the outcome scale (Karni 2009; Pliskin et al. 1980). Hogan and Lee (2004), Saigal et al. (1999), and Sommers et al. (2007) provide examples in HIV care, neonatal care, and cancer care, respectively.

For distributional treatment effects on nondegenerate outcome variables with bounded support, without any parametric assumption, Abadie (2002) used the standard IV approach to estimate the counterfactual cumulative distribution functions (cdfs) of the outcome of compliers with and without the treatment and proposed a bootstrap procedure to test distributional hypotheses with the Kolmogorov-Smirnov statistic. However, Abadie (2002) and Imbens and Rubin (1997a) pointed out that the standard IV estimates of the potential cdfs for compliers may not be nondecreasing functions:

$$\hat{H}^{SIV}_{c1}(y) = \frac{\hat{E}\{1(Y_i \le y) D_i \mid Z_i = 1\} - \hat{E}\{1(Y_i \le y) D_i \mid Z_i = 0\}}{\hat{E}(D_i \mid Z_i = 1) - \hat{E}(D_i \mid Z_i = 0)},$$

$$\hat{H}^{SIV}_{c0}(y) = \frac{\hat{E}\{1(Y_i \le y)(1 - D_i) \mid Z_i = 1\} - \hat{E}\{1(Y_i \le y)(1 - D_i) \mid Z_i = 0\}}{\hat{E}\{(1 - D_i) \mid Z_i = 1\} - \hat{E}\{(1 - D_i) \mid Z_i = 0\}}.$$
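The lack of monotonicity is easy to reproduce on a toy data set. The following Python sketch evaluates the two standard IV cdf estimators on a grid of cut points, taking the sample mean within each assignment arm as $\hat{E}(\cdot \mid Z_i = z)$. The function names and the eight observations are hypothetical and chosen only to make the problem visible; this is not the authors' implementation.

```python
def _arm_mean(values, z, arm):
    """Sample mean of `values` among subjects with assignment Z == arm."""
    picked = [v for v, zi in zip(values, z) if zi == arm]
    return sum(picked) / len(picked)


def siv_complier_cdfs(y, d, z, grid):
    """Standard IV estimates of the complier cdfs under treatment and control."""
    denom = _arm_mean(d, z, 1) - _arm_mean(d, z, 0)
    if denom == 0:
        raise ValueError("the instrument does not change treatment uptake")
    h_c1, h_c0 = [], []
    for cut in grid:
        treated = [float(yi <= cut) * di for yi, di in zip(y, d)]
        untreated = [float(yi <= cut) * (1 - di) for yi, di in zip(y, d)]
        h_c1.append((_arm_mean(treated, z, 1) - _arm_mean(treated, z, 0)) / denom)
        # E(1 - D | Z = 1) - E(1 - D | Z = 0) equals -denom.
        h_c0.append((_arm_mean(untreated, z, 1) - _arm_mean(untreated, z, 0)) / (-denom))
    return h_c1, h_c0


def is_nondecreasing(values, tol=1e-12):
    """True if the sequence never drops by more than `tol`."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Hypothetical toy data: first four subjects assigned Z = 1, last four Z = 0.
y = [1, 2, 2, 6, 1, 3, 4, 5]
d = [1, 0, 0, 1, 0, 0, 0, 0]
z = [1, 1, 1, 1, 0, 0, 0, 0]
grid = [1, 2, 3, 4, 5, 6]

h_c1, h_c0 = siv_complier_cdfs(y, d, z, grid)
print("treatment cdf:", h_c1, "nondecreasing:", is_nondecreasing(h_c1))
print("control cdf:  ", h_c0, "nondecreasing:", is_nondecreasing(h_c0))
```

In this toy example the estimated control cdf drops between the first two cut points and takes a negative value, so it is not a valid distribution function, while the estimated treatment cdf happens to be well behaved.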
Here, $\hat{H}^{SIV}_{c1}(y)$ and $\hat{H}^{SIV}_{c0}(y)$ are the standard IV estimators for the compliers' cumulative distribution function (cdf) under treatment and control, respectively. Furthermore, as discussed in section "More Efficient Estimation," the standard IV approach does not make full use of the mixture structure (Imbens and Rubin 1997a) implied by the latent compliance class model (see Table 1) and hence could be less efficient. Instead, Imbens and Rubin (1997a) proposed a normal approximation and two multinomial approximations to the outcome distributions. However, the estimator based on a normal approximation could be biased when the outcomes are not normal, and for the approach based on multinomial approximations, a systematic approach for choosing the multinomial approximations is needed.

Cheng et al. (2009a) developed a semiparametric instrumental variable method based on the empirical likelihood approach. Their approach makes full use of the mixture structure implied by the latent compliance class model without parametric assumptions on the outcome distributions, takes into account the nondecreasing property of cdfs, and can be easily constructed from the data. Their method can be applied to general outcomes and general functions of outcome distributions. Cheng et al. (2009a) showed that their estimator has good properties and is substantially more efficient than the standard IV estimator.

For the mixture structure implied by the latent compliance model (see Table 1), Cheng et al. (2009a) adopted a density ratio model proposed
by Anderson (1979) to relate the densities of the latent compliance classes by an exponential tilt:

$$\frac{h_j(y)}{h_0(y)} = \exp(\alpha_j + \beta_j y), \quad j = 1, 2, 3, \qquad (22)$$

where $h_0(y)$ is unspecified and $h_0(y) = P(Y_i = y \mid Z_i = 0, C_i = co)$, $h_1(y) = P(Y_i = y \mid C_i = nt)$, $h_2(y) = P(Y_i = y \mid Z_i = 1, C_i = co)$, $h_3(y) = P(Y_i = y \mid C_i = at)$ are the outcome density (mass) functions of the latent compliance groups: compliers under control, never takers, compliers under treatment, and always takers, respectively. The densities are modeled nonparametrically except for being related by a parametric "exponential tilt." The idea is similar to Cox's proportional hazards model, and many conventional parametric families fall in the exponential tilt model category, including two normals with common variance but different means, two exponential distributions, and two Poissons. The exponential tilt model can provide a good fit to the data even when many conventional parametric models do not fit the data well.

Let $f_{zd}(y) = P(Y_i = y \mid Z_i = z, D_i = d)$ and $F_{zd}(y) = P(Y_i \le y \mid Z_i = z, D_i = d)$ be the probability density (mass) function and cumulative distribution function of the observed group $(Z_i = z, D_i = d)$ for a continuous (discrete) outcome, respectively, where $z, d = 0, 1$. Then, by the IV assumptions and the latent compliance class model (see Table 1), the following holds:

$$\begin{aligned}
f_{11}(y) &= \tau h_2(y) + (1 - \tau) h_3(y), & f_{10}(y) &= h_1(y), \\
f_{00}(y) &= \lambda h_0(y) + (1 - \lambda) h_1(y), & f_{01}(y) &= h_3(y),
\end{aligned} \qquad (23)$$

where

$$\lambda = \frac{\phi_c}{\phi_c + \phi_n} = \frac{1 - \phi_a - \phi_n}{1 - \phi_a}, \qquad \tau = \frac{\phi_c}{\phi_c + \phi_a} = \frac{1 - \phi_a - \phi_n}{1 - \phi_n}.$$

The causal effect of actually receiving the treatment on the outcome distribution for compliers can be examined by considering $h_0(y)$ and $h_2(y)$.

Under the density ratio model (22), the log likelihood is

$$\begin{aligned}
\ell ={}& n_{01} \log \phi_a + n_{00} \log(1 - \phi_a) + n_{10} \log \phi_n + n_{11} \log(1 - \phi_n) \\
&+ \sum_{i=1}^{n} \left[ I(Z_i = 0, D_i = 1)(\alpha_3 + \beta_3 y_i) + I(Z_i = D_i = 0) \log\{\lambda + (1 - \lambda)\exp(\alpha_1 + \beta_1 y_i)\} \right] \\
&+ \sum_{i=1}^{n} I(Z_i = D_i = 1) \log\{\tau \exp(\alpha_2 + \beta_2 y_i) + (1 - \tau)\exp(\alpha_3 + \beta_3 y_i)\} \\
&+ \sum_{i=1}^{n} I(Z_i = 1, D_i = 0)(\alpha_1 + \beta_1 y_i) + \sum_{i=1}^{n} \log h_0(y_i),
\end{aligned}$$

where $h_0(\cdot)$ is unspecified, and

$$h_0 \in C = \left\{ h_0 \;\middle|\; h_0(y_i) \ge 0,\ \sum_{i=1}^{n} h_0(y_i) = 1,\ \sum_{i=1}^{n} h_0(y_i) \exp(\alpha_j + \beta_j y_i) = 1,\ j = 1, 2, 3 \right\}. \qquad (24)$$

Note that $h_0(\cdot)$ will put its support on the observed data points $y_1, \ldots, y_n$ (Owen 2002), and constraint (24) ensures that the estimators for the outcome distributions $H_0$, $H_1$, $H_2$, and $H_3$ are cumulative distribution functions. Similar to Qin and Zhang (1997), after maximizing the log likelihood with constraint (24) through Lagrange multipliers, the following holds:

$$h_0(y_i) = \frac{1}{n} \cdot \frac{1}{1 + \sum_{k=1}^{3} \xi_k \{\exp(\alpha_k + \beta_k y_i) - 1\}}, \qquad (25)$$

where the $\xi_j$'s ($j = 1, 2, 3$) are Lagrange multipliers determined by

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\exp(\alpha_j + \beta_j y_i) - 1}{1 + \sum_{k=1}^{3} \xi_k \{\exp(\alpha_k + \beta_k y_i) - 1\}} = 0, \quad j = 1, 2, 3, \qquad (26)$$

and the limiting values of $\xi$ are
$$\xi^0 = \begin{pmatrix} \xi_1^0 \\ \xi_2^0 \\ \xi_3^0 \end{pmatrix} = \begin{pmatrix} \delta \phi_n + (1 - \delta)(1 - \phi_a)(1 - \lambda) \\ \tau \delta (1 - \phi_n) \\ (1 - \delta)\phi_a + \delta(1 - \phi_n)(1 - \tau) \end{pmatrix}.$$

Then, the maximum semiparametric empirical likelihood estimate of $\eta = (\phi_a, \phi_n, \alpha_1, \beta_1, \alpha_2, \beta_2, \alpha_3, \beta_3)$ can be obtained by maximizing the profiled log likelihood through the EM algorithm. The outcome densities (masses) of compliers under control, $h_0(y)$, and under treatment, $h_2(y)$, can then be estimated by $\hat{h}_0(y_i)$ (see Eq. 25) and $\hat{h}_0(y_i)\exp(\hat{\alpha}_2 + \hat{\beta}_2 y_i)$, respectively, and their corresponding cdfs $H_0(y)$ and $H_2(y)$ are estimated by $\hat{H}_0(y) = \sum_i \hat{h}_0(y_i) I(y_i \le y)$ and $\hat{H}_2(y) = \sum_i \hat{h}_0(y_i)\exp(\hat{\alpha}_2 + \hat{\beta}_2 y_i) I(y_i \le y)$, respectively. To examine the causal effect of actually receiving treatment on the outcome distribution for compliers, the equality of $h_0(y)$ and $h_2(y)$ can be tested by testing $H_0: \alpha_2 = \beta_2 = 0$ with the semiparametric empirical likelihood ratio statistic

$$R = 2\left[\max_{\eta} \ell(\eta) - \max_{\eta_1} \ell(\eta_1, \alpha_2 = \beta_2 = 0)\right], \qquad \eta_1 = (\alpha_1, \beta_1, \alpha_3, \beta_3, \phi_a, \phi_n),$$

where $\alpha_2$ must equal 0 when $\beta_2$ equals 0 because of constraint (24). Under regularity conditions, $R$ asymptotically follows a chi-squared distribution with one degree of freedom under the null hypothesis.

In addition to investigating the distributional treatment effect, some function of the outcome distributions, $g(\eta)$, where $g$ is a real-valued function with nonzero first partial derivatives, can also be estimated. For example, under the semiparametric setting in Cheng et al. (2009a), the CACE can be estimated by using

$$\widehat{\mathrm{CACE}}_{SEM} = \sum_{i=1}^{n} y_i \hat{h}_0(y_i)\left\{\exp(\hat{\alpha}_2 + \hat{\beta}_2 y_i) - 1\right\}.$$

One can also compare the $\iota$ quantiles of the outcome distributions of compliers with and without treatment (marginal distributions of $Y^1$ and $Y^0$):

$$\widehat{\mathrm{CQCE}}_{SEM} = \hat{H}_2^{-1}(\iota) - \hat{H}_0^{-1}(\iota).$$

When $\iota = 0.5$, it is the difference of the medians for the compliers under treatment and control.

The goodness of fit of the density ratio model can be tested by comparing estimated outcome cdfs based on the density ratio model to the empirical distribution function estimates (Qin and Zhang 1997):

$$\Delta_{zd} = \sup_{-\infty < y < \infty} \sqrt{n}\, \left|\hat{F}_{zd}(y) - \tilde{F}_{zd}(y)\right|, \quad z, d = 0, 1. \qquad (28)$$

The p-value of the goodness-of-fit test can be estimated by a bootstrap p-value

$$\hat{P}^{B}_{zd} = \hat{P}^{B}\left(\Delta_{zd} \ge \Delta^{obs}_{zd}\right), \qquad (29)$$

where $\Delta^{obs}_{zd}$ is obtained from the actually observed data and $\Delta_{zd}$ is calculated from $B$ bootstrap samples generated under the null hypothesis that the density ratio model (22) is true.

Study Design IV and Multiple IVs

Study Design IV: Near-Far Matching

Study design focuses attention on the data that are to be analyzed. The manner in which the data are structured largely determines the statistical procedures appropriate for analysis. The separation between study design and statistical analysis is quickly illustrated by considering a uniform randomized paired analysis. The process of matching individual units of observation into pairs based on observed, pretreatment covariates, and then randomizing one unit within each pair to treatment and the other to control, is study design. The researcher constructs the pairs by carefully controlling the assignments to increase efficiency by decreasing within-pair variation (by constructing matched pairs) as well as to minimize unobserved bias (by randomization). These steps increase the validity of the results and go a
long way toward reassuring the audience of the reliability of the reported conclusions. Only the manner in which the data are prepared has thus far been described. This is the design of the study. Once the experiment is run and the data are recorded, the results need to be analyzed. Given the study design, most analysts would select a paired t-test, perhaps using Student's t. But that is not the only choice; one could justifiably use a permutation test or, with some additional assumptions, a model-based approach (e.g., regression) to adjust for potential covariate imbalances, which routinely occur in finite sample randomizations. This is the statistical inference phase of the study. Statistical inference is distinct from, though predicated on and preceded by, the study design. The better understood the study design, the more credibility the statistical inference is likely to have. This is true in experimentation and even truer in the observational setting.

In observational settings data are often plentiful, especially compared to the experimental setting. The trouble with observational data is that estimates of treatment effects tend to be plagued by confounding by both observed and unobserved covariates. The goal of study design in the observational setting can be thought of as finding the subset of the data which will produce the best study given the limitations of the data (usually in the sense of internal validity).

In the literature, study design is also sometimes referred to as "preprocessing" (Ho et al. 2007). For those new to study design, perhaps the most counterintuitive insight is that the analysis can actually be improved by removing observations from consideration before performing the statistical inference. This is counterintuitive because, loosely speaking, it seems that the study with the most observations should be the most informative. This is a recognized problem in the observational literature. For example, it has become standard practice to use propensity scores to limit the analysis to only the observational units which have corresponding propensity score values in either the treated or control group, removing from inference the observational units with extreme values close to 1 or 0 (Rosenbaum 2002, 2009).

Analogously for instrumental variables, it is known that if the goal is to have greater power and results which are more robust to small violations of the IV assumptions, then a smaller data set with a stronger instrument is preferable to a larger data set with a weaker instrument (Small and Rosenbaum 2008). The trade-off between bigger but weaker and smaller but stronger was thought to be informative, but not useful once the analyst has committed to using a particular data set. Contrary to this belief, Baiocchi et al. (2010) demonstrated that even within a particular data set, the analyst may use near-far matching to go from a weaker-but-bigger study to a more robust smaller-but-stronger study.

There are two objectives in near-far matching. As in a randomized controlled trial (RCT) with a matched-pair design, one objective in near-far matching is to create matched pairs where the covariates are similar within a pair. Creating pairs with very similar covariate values (i.e., pairs which are near each other in covariate space) is used to improve efficiency. The other objective in near-far matching is to separate observations' instrument values within a matched pair. In the neonatal intensive care example outlined in the introduction, within a matched pair, one wants one mother to be highly encouraged to deliver at a high-level NICU and the other to be highly encouraged to deliver at a low-level NICU. This is similar to the matched-pair design when there is the potential for noncompliance. If the level of encouragement can be varied, then it is preferable to have two mothers who are highly dissimilar (far) in their levels of randomly assigned encouragement because it is then more likely that within the pair, one mother will comply with the encouragement and take the treatment and the other will comply with the lack of encouragement and take the control. As outlined in Baiocchi et al. (2010), algorithms exist which will construct pairs which maximize both of these objectives at the same time.

In most real-world examples, there will be a trade-off between the "near" and the "far" part of the matching. The technical aspects of this trade-off, and how to construct such pairs, are context specific; for guidance see Baiocchi et al. (2010, 2012). The intuition is that as the analyst forces
separation in the instrument values between pairs of patients, it becomes more difficult to find patients with quite dissimilar instrument values but very similar covariates. The Baiocchi et al. (2010) paper outlines both theoretical arguments and practical reasons for designing studies with greater separation in the instrument.

It should be noted that pair matching is being referred to, but all of these arguments hold for larger block designs. Near-far matching would work with k:1 matching and other more exotic designs. The primary difference would be the optimization algorithm used to construct the sets.

This process is similar to propensity score matching and other matching techniques in general. The goal is to prepare the data, by finding the parts of the data set which lend themselves to causal inference, so as to improve the reliability of the statistical analysis to be performed. Note that, just as with propensity score matching, the analyst may decide to use whichever statistical method of analysis is appropriate post-matching. That is, after performing near-far matching, the analyst may then decide to use a 2SRI model if that is appropriate for the given data set. But the selection of the statistical method must be made with justification, not out of convenience. This is why most analysts will decide to use the effect ratio (discussed in section "Binary Outcomes") after performing near-far matching, as the study design leads naturally into the statistical analysis.

Multilevel and Continuous IVs

In some settings, the IV has multiple levels or is continuous. For example, in the neonatal intensive care example, the mother's excess travel time from the nearest high-level NICU compared to the nearest low-level NICU is continuous. Multiple levels of the IV provide us with the opportunity to identify a richer set of causal effects (Imbens 2007). Suppose the IV is continuous and the following extended monotonicity assumption holds: $D_i^{z} \ge D_i^{z'}$ for all $z \ge z'$, that is, a higher level of the IV always leads to at least as high a level of the treatment. The limit of the treatment effect for subjects who would take the treatment if the IV was equal to $z$ but not take the treatment if the IV was a little less than $z$ is $\lim_{\epsilon \to 0} E\left[Y_i^{d=1} - Y_i^{d=0} \mid D_i^{z} = 1, D_i^{z-\epsilon} = 0\right]$; Heckman and Vytlacil (1999) refer to this as the marginal treatment effect at $z$. Treatment effects of interest can all be expressed as a weighted average of these marginal treatment effects (Heckman and Vytlacil 1999). For example, the treatment effect estimated by dichotomizing the IV as 1 or 0 according to whether the IV is above some cutoff or the treatment effect estimated by two-stage least squares using the continuous IV can be expressed as a weighted average of the marginal treatment effects. The average treatment effect over the whole population can also be expressed as a weighted average of the marginal treatment effects. Identification of the average treatment effect over the whole population requires identification of all the marginal treatment effects. In order for all the marginal treatment effects to be identified using the IV (and thus the average treatment effect identified), it is required that for large values of $Z$, $P(D = 1 \mid Z)$ approaches 1 and for small values of $Z$, $P(D = 1 \mid Z)$ approaches 0 (Heckman and Vytlacil 1999). Basu et al. (2007) show how to estimate marginal treatment effects and the average treatment effect when this condition is satisfied.

Multiple IVs

In some settings, there may be multiple IVs available. For example, Malkin et al. (2000) used IV methods to estimate the effect of longer postpartum stays on newborn readmissions. Malkin et al. (2000) used two IVs, (1) hour of birth and (2) method of delivery (vaginal vs. C-section). Hour of birth influences length of stay because it affects whether a newborn will spend an extra night in the hospital; for example, Malkin et al. (2000) found that newborns born in the a.m. have longer lengths of stay than newborns born in the p.m. Method of delivery influences length of stay because mothers need more time to recuperate after a C-section than following a vaginal delivery, and newborns are rarely discharged before
their mothers. Each IV identifies the treatment effect for a different set of compliers. If treatment effects are heterogeneous, the complier average causal effects may differ. For example, newborns who would only stay an extra day if born in the a.m. compared to the p.m. may differ in their risk characteristics compared to newborns who would only stay an extra day if delivered by C-section compared to vaginal delivery, and length of stay may have a different effect on newborns with different risk characteristics.

Two-stage least squares can be used to combine the IVs: in the first stage, regress $D$ on both $Z_1$ and $Z_2$ (as well as $X$) and then use the predicted $D$ as usual in the second stage. Under the assumption of homogeneous treatment effects and constant variance, the two-stage least squares estimate is the optimal way to combine the IVs (White 1984). When treatment effects are heterogeneous, two-stage least squares estimates a weighted average of the complier average causal effects for the IVs, with stronger IVs getting greater weight (Imbens and Angrist 1994; Angrist and Imbens 1995). When there are two or more distinct IVs, it is useful to report the estimates from the individual IVs in addition to the combined IVs since the IVs may be estimating treatment effects for different types of people.

When there are multiple IVs and treatment effects are homogeneous, the overidentifying restrictions test can be used to test the validity of the IVs (Davidson and MacKinnon 1993; Sargan 1958). The overidentifying restrictions test tests whether the estimates from the different IVs are the same. When treatment effects are homogeneous, if the estimates from two different IVs converge to different limits, this would show that at least one of the IVs is invalid. There are two problems with using the overidentifying restrictions test to test the validity of IVs. First, if treatment effects are heterogeneous, then the complier average causal effects for the two IVs may be different even though both IVs are valid; in this case, the overidentifying restrictions test would falsely indicate that at least one of the IVs is invalid. Second, even if treatment effects are homogeneous, two IVs A and B may both be biased but in the same way so that the asymptotic limit of the estimators based on IV A and B, respectively, is the same; in this case, the overidentifying restrictions test would give false assurance that the IVs are valid (Small 2007).

Multilevel and Continuously Valued Treatments

The treatment under study may take on multiple or continuous values, for example, the dose of a medication. Two-stage least squares can still be applied. Angrist and Imbens (1995) present the following formula, which shows that the two-stage least squares estimator converges to a weighted average of the effects of one unit changes in the treatment level. Suppose the treatment can take on levels $0, 1, \ldots, \bar{d}$ and that monotonicity holds in the sense that $D_i^{z=1} \ge D_i^{z=0}$. Assume there are no covariates. Then, the two-stage least squares estimator converges to

$$\frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{E(D_i \mid Z_i = 1) - E(D_i \mid Z_i = 0)} = \sum_{d=1}^{\bar{d}} \omega_d\, E\left[Y^{d} - Y^{d-1} \mid D^{z=1} \ge d > D^{z=0}\right], \qquad (30)$$

where

$$\omega_d = \frac{P(D^{z=1} \ge d > D^{z=0})}{\sum_{k=1}^{\bar{d}} P(D^{z=1} \ge k > D^{z=0})}.$$

The numerator of $\omega_d$ is the proportion of compliers at point $d$, that is, the proportion of individuals driven by the encouraging level of the IV from a treatment intensity less than $d$ to at least $d$. The $\omega_d$'s are nonnegative and sum to one. The quantity $E[Y^{d} - Y^{d-1} \mid D^{z=1} \ge d > D^{z=0}]$ in Eq. 30 is the causal effect of a one unit increase in the treatment from $d - 1$ to $d$ for compliers at point $d$. Equation 30 shows that the two-stage least squares estimator converges to a weighted average of the causal effects of one unit increases in the treatment from $d - 1$ to $d$ for compliers at point $d$, where the points $d$ at which there are more compliers get greater weight. The weights $\omega_d$ can be estimated since under monotonicity and the assumption that the IV is independent of the potential treatment received, $P(D^{z=1} \ge d > D^{z=0}) = P(D^{z=1} \ge d) - P(D^{z=0} \ge d) = P(D \ge d \mid Z = 1) - P(D \ge d \mid Z = 0)$. See Angrist and Imbens (1995) for an extension of these formulas to the setting where there are covariates $X$ that are controlled for.

Researchers often dichotomize multilevel or continuous treatments. However, using IV methods with a dichotomized continuous treatment can lead to an overestimate of the treatment effect. Let $\beta$ denote the average causal effect (30) that the two-stage least squares estimator for a multilevel treatment converges to. Angrist and Imbens (1995) show that if this treatment is dichotomized as $B = 1$ if $D \ge l$, $B = 0$ if $D < l$ for some $1 \le l \le \bar{d}$, then the two-stage least squares estimator using the binary treatment $B$ converges to $\phi\beta$, where

$$\phi = \frac{E(D \mid Z = 1) - E(D \mid Z = 0)}{E(B \mid Z = 1) - E(B \mid Z = 0)} = \frac{\sum_{j=1}^{\bar{d}} P(D^{z=1} \ge j > D^{z=0})}{P(D^{z=1} \ge l > D^{z=0})} \ge 1.$$

The only situation when $\phi = 1$ is when the IV has no effect other than to cause people to switch from $D = l - 1$ to $D = l$. Otherwise, when a multilevel treatment is incorrectly parameterized as binary, the resulting estimate tends to be too large relative to the average per-unit effect of the treatment. The problem with dichotomizing a multilevel treatment is that the IV has a direct effect because the encouraging level of the IV can push a person to a higher level of treatment even if $B$ is 1 under both the non-encouraging and encouraging levels of the IV.

Although dichotomizing a continuous treatment results in a biased IV estimate, the sign of the treatment effect is still consistently estimated.

If the treatment effect for compliers is linear, that is, the causal effect of a one unit increase in the treatment from $d - 1$ to $d$ for compliers at point $d$ is the same for all $d$, then the two-stage least squares estimator estimates this linear treatment effect. If the treatment effect is nonlinear, then with a binary IV, it is not possible to estimate anything other than the weighted treatment effect (30). If the IV is continuous, then the IV can be used to form multiple IVs (e.g., $Z$, $Z^2$, $Z^3$, etc.), and a nonlinear treatment effect can be estimated (Kelejian 1971). For example, suppose $Y^{D=d} = Y^{0} + \beta_1 d + \beta_2 d^2$. Then, $\beta_1$ and $\beta_2$ can be consistently estimated with a continuous IV $Z$ by using two-stage least squares, where $\hat{D}$ is estimated by regressing $D$ on $Z$ and $Z^2$, $\widehat{D^2}$ is estimated by regressing $D^2$ on $Z$ and $Z^2$, and $\beta_1$ and $\beta_2$ are estimated by regressing $Y$ on $\hat{D}$ and $\widehat{D^2}$. Tan (2010) discusses other approaches for estimating nonlinear treatment effects.

A common setting is to have a treatment with three levels that may not be strictly ordered by dose. Cheng and Small (2006) consider the setting of a treatment with three levels, control (0) and two active levels A and B, where A and B are not ordered by dose and some subjects may prefer A to B and some may prefer B to A. Subjects are randomly assigned to one of the three arms 0, A, and B, and then could either take the assigned treatment or not take it and receive the control (for the control arm, all subjects receive the control 0). The effect of treatment A versus control for subjects who would take treatment A if offered it (i.e., compliers with treatment A) is identified by analyzing only subjects who were either assigned to the control arm or the treatment A arm. But for this setting, Cheng and Small (2006) showed that the effect of treatment A for subjects who would take treatment A if assigned to it but not treatment B and the effect of treatment A for subjects who would take treatments A or B if assigned to A or B, respectively, are not point identified. However, the data provide information that can be used to narrow bounds on these treatment effects. These treatment effects are of interest for individuals making decisions about which treatment to take; for example, a very compliant subject who knows she would take either treatment A or B if offered it would like to know whether treatment A or B is better among very compliant subjects like herself. The treatment effects are also of interest for clinicians deciding which treatment to offer first and for health policymakers anticipating what would happen were the treatment(s) to be introduced into general practice in a
setting in which compliance patterns are expected to differ from those of the trial (Cheng and Small 2006).

Extended Instrumental Variable Method for When Proposed IV Has a Direct Effect

When a proposed IV Z is thought to be independent of unmeasured confounders but there is concern that Z might have a direct effect on the outcome, Joffe et al. (2008) proposed an extended instrumental variables strategy for obtaining a consistent (i.e., asymptotically unbiased) estimate of the causal effect of treatment that requires having a covariate W for which:

• (eiv-a1). The covariate W interacts with Z in affecting treatment.
• (eiv-a2). The direct effect of Z does not depend on W.

For such a setting, Joffe et al. (2008) show that a consistent estimate of the treatment effect can be obtained under the additional assumption that the treatment effect is constant across subjects by using two-stage least squares where Z × W is the IV and Z and W are included as measured covariates (other covariates can also be included in addition). As an example of this approach, Card (1995) studied the effect of education on earnings and considered having grown up near a 4-year college as an IV, but was concerned that growing up near a college might have a direct effect on earnings, for example, through the presence of a college being associated with higher school quality at nearby elementary and secondary schools. Card considered the covariate W = whether the person grew up in a low-income household. The interaction between growing up near a 4-year college and being from a low-income household predicts going to college, because college proximity lowers the cost of higher education and this cost lowering has a bigger effect on going to college for children from low-income families. In order for the extended instrumental variable strategy to produce a consistent estimate, the effect of higher elementary/secondary school quality on earnings would have to be the same for children from low-income and high-income families; this is assumption (eiv-a2).

Software

Software for implementing IV analyses is available in R, SAS, and Stata. Here an IV analysis will be illustrated using the AER package in the freely available software R. Consider estimating the causal effect of military service during the World War II era on men’s future earnings using data from the 5% public use 1980 Census. The Census data contain information on a man’s race and Census division of birth, but are missing information on variables such as health and criminal behavior, which were important barriers to serving in the war and are important determinants of earnings. Motivated by this concern about unmeasured confounding, Angrist and Krueger (1994) proposed using time of birth as an IV; see also Small and Rosenbaum (2008) for follow-up analyses. Time of birth is associated with military service because a man only becomes eligible to serve in the military when he turns 18; men who turned 18 after World War II was over are substantially less likely to have served in the military. Here, consider the binary IV, Z = 1 if a man was born between 1925 and 1927 (most men born in these time periods turned 18 during World War II) and Z = 0 if a man was born in 1928 (so turned 18 after World War II was over). The data set used in the analysis, military-earnings.csv, is available at www-stat.wharton.upenn.edu/dsmall/military-earnings.csv, and the data are described in the file www-stat.wharton.upenn.edu/dsmall/military-earnings-readme.txt.

library(AER)
dataset=read.csv("military-earnings.csv",header=TRUE)
attach(dataset);
# earnings = earnings in 1980
# veteran = 1 if World War II veteran, 0 if not
# yrquarter = year/quarter of birth, e.g., born in 1927 first quarter = 1927,
# born in 1927 second quarter = 1927.25, born in 1927 fourth quarter = 1927.75
# racecat = 1, white; 2, black; 3, other
# Make race into a categorical variable
racecat=as.factor(racecat);
# division = Census division of birthplace,
# Census Division of birthplace
# 1 = New England, 2 = Middle Atlantic, 3 = East North Central,
# 4 = West North Central, 5 = South Atlantic, 6 = East South Central
# 7 = Mountain, 8 = Pacific, 9 = American Territories
# Make division into a categorical variable
division=as.factor(division);
# IV is 1 if born in 1925-1927, 0 if born in 1928
z=(yrquarter<=1927.75)
# Strength of the IV
> mean(veteran[z==1])
[1] 0.7363794
> mean(veteran[z==0])
[1] 0.3169782
# It is estimated that 0.736-0.317=0.419 of the men are compliers
# and the IV is moderately strong
# First stage of the two stage least squares regression
# Find partial F test statistic for IV
fsreg=lm(veteran~racecat+division+z)
reg.without.iv=lm(veteran~racecat+division)
# Smaller model first, so that Model 1 and Model 2 match the table below
anova(reg.without.iv,fsreg)
Analysis of Variance Table
Model 1: veteran ~ racecat + division
Model 2: veteran ~ racecat + division + z
Res.Df RSS Df Sum of Sq F Pr(>F)
<inline figure>
# The partial F statistic is 21,747, much greater than 10, and thus there is
# no concern about the IV being too weak for two stage least squares to be reliable
# Two stage least squares regression using z as the IV and controlling for
# race and Census division of birth
tslsreg=ivreg(earnings~veteran+racecat+division,~z+racecat+division)
summary(tslsreg)
Call:
ivreg(formula = earnings ~ veteran + racecat + division | z + racecat + division)
Residuals:
<inline figure>
Coefficients:
<inline figure>
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12750 on 127073 degrees of freedom
Multiple R-Squared: 0.02788, Adjusted R-squared: 0.02779
Wald test: 408.2 on 11 and 127073 DF, p-value: < 2.2e-16
# It is estimated that military service decreases a man’s earnings by $834 with a
# standard error of $197. There is strong evidence that military service
# decreases earnings (p-value < 0.0001).

Acknowledgments Jing Cheng and Dylan Small were supported by grant RC4MH092722 from the National Institute of Mental Health. The authors thank Scott Lorch for the use of the data from the NICU study.

References

Abadie A. Bootstrap tests for distributional treatment effects in instrumental variable models. J Am Stat Assoc. 2002;97:284–92.
Abadie A. Semiparametric instrumental variable estimation of treatment response models. J Econ. 2003;113:231–63.
Aidoo M, Terlouw D, Kolczak M, McElroy P, ter Kuile F, Kariuki S, Nahlen B, Lal A, Udhayakumar V. Protective effects of the sickle cell gene against malaria morbidity and mortality. Lancet. 2002;359:1311–2.
Anderson J. Multivariate logistic compounds. Biometrika. 1979;66:17–26.
Angrist J. Estimation of limited dependent variable models with dummy endogenous regressors. J Bus Econ Stat. 2001;19:2–28.
Angrist J, Imbens G. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat Assoc. 1995;90:430–42.
Angrist J, Krueger A. Does compulsory school attendance affect schooling and earnings? Q J Econ. 1991;106:979–1014.
Angrist J, Krueger A. The effect of age at school entry on educational attainment: an application of instrumental variables with moments from two samples. J Am Stat Assoc. 1992;87:328–36.
Angrist J, Krueger A. Why do World War II veterans earn more than nonveterans? J Labor Econ. 1994;12:74–97.
Angrist J, Pischke J-S. Mostly harmless econometrics: an empiricist’s companion. Princeton/Oxford: Princeton University Press; 2009.
Angrist J, Imbens G, Rubin D. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:444–55.
Baiocchi M, Small D, Lorch S, Rosenbaum P. Building a stronger instrument in an observational study of perinatal care for premature infants. J Am Stat Assoc. 2010;105:1285–96.
Baiocchi M, Small D, Yang L, Polsky D, Groeneveld P. Near/far matching: a study design approach to instrumental variables. Health Serv Outcome Res Methodol. 2012;12:237–53.
Baker S. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. J Am Stat Assoc. 1998;93:929–34.
Balke A, Pearl J. Bounds on treatment effects for studies with imperfect compliance. J Am Stat Assoc. 1997;92:1171–6.
Basu A, Heckman J, Navarro-Lozano S, Urzua S. Use of instrumental variables in the presence of heterogeneity and self-selection: an application to treatments of breast cancer patients. Health Econ. 2007;16:1133–57.
Bhattacharya J, Goldman D, McCaffrey D. Estimating probit models with self-selected treatments. Stat Med. 2006;25:389–413.
Bhattacharya J, Shaikh A, Vytlacil E. Treatment effect bounds under monotonicity assumptions: an application to Swan-Ganz catheterization. Am Econ Rev. 2008;98:351–6.
Bound JD, Jaeger DA, Baker RM. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variables is weak. J Am Stat Assoc. 1995;90:443–50.
Brookhart M, Schneeweiss S. Preference-based instrumental variable methods for the estimation of treatment effects: assessing validity and interpreting results. Int J Biostat. 2007;3:14.
Brookhart M, Wang P, Solomon D, Schneeweiss S. Evaluating short-term drug effects using a physician-specific prescribing preference as an instrumental variable. Epidemiology. 2006;17:268–75.
Brookhart M, Rassen J, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19:537–54.
Brooks J, Chrischilles E, Scott S, Chen-Hardee S. Was breast conserving surgery underutilized for early stage breast cancer? Instrumental variables evidence for stage II patients from Iowa. Health Serv Res. 2004;38:1385–402.
Bruce M, Ten Have T, Reynolds C III, Katz I, Schulberg H, Mulsant B, Brown G, McAvay G, Pearson J, Alexopoulos G. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients: a randomized trial. J Am Med Assoc. 2004;291:1081–91.
Cai B, Small D, Ten Have T. Two-stage instrumental variable methods for estimating the causal odds ratio: analysis of bias. Stat Med. 2011;30:1809–24.
Cai B, Hennessy S, Flory JH, Sha D, Ten Have TR, Small DS. Simulation study of instrumental variable approaches with an application to a study of the antidiabetic effect of bezafibrate. Pharmacoepidemiol Drug Saf. 2012;21:114–20.
Card D. Using geographic variation in college proximity to estimate the return to schooling. Toronto: University of Toronto Press; 1995. p. 201–22.
Cheng J. Estimation and inference for the causal effect of receiving treatment on a multinomial outcome. Biometrics. 2009;65:96–103.
Cheng J, Small D. Bounds on causal effects in three-arm trials with noncompliance. J R Stat Soc Ser B. 2006;68:815–36.
Cheng J, Qin J, Zhang B. Semiparametric estimation and inference for distributional and general treatment effects. J R Stat Soc Ser B Stat Methodol. 2009a;71:881–904.
Cheng J, Small D, Tan Z, Ten Have T. Efficient nonparametric estimation of causal effects in randomized trials with noncompliance. Biometrika. 2009b;96:19–36.
Clarke P, Windmeijer F. Instrumental variable estimators for binary outcomes. J Am Stat Assoc. 2012;107:1638–52.
Cole J, Norman H, Weatherby L, Walker A. Drug copayment and adherence in chronic heart failure: effect on costs and outcomes. Pharmacotherapy. 2006;26:1157–64.
Cox D. Planning of experiments. New York: Wiley; 1958.
Cuzick J, Sasieni P, Myles J, Tyler J. Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. J R Stat Soc Ser B Methodol. 2007;69:565–88.
Davidson R, MacKinnon J. Estimation and inference in econometrics. New York: Oxford University Press; 1993.
Demissie K, Rhoads G, Ananth C, Alexander G, Kramer M, Kogan M, Joseph K. Trends in preterm birth and neonatal mortality among blacks and whites in the United States from 1989 to 1997. Am J Epidemiol. 2001;154:307–15.
Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16:309–30.
Durbin J. Errors in variables. Rev Inst Int Stat. 1954;22:23–32.
Fisher R. Design of experiments. Edinburgh: Oliver and Boyd; 1949.
Freedman D. Statistical models: theory and practice. Cambridge: Cambridge University Press; 2009.
Freedman D, Sekhon J. Endogeneity in probit response models. Polit Anal. 2010;18:138–50.
Goedde H, Agarwal D, Fritze G, Meier-Tackmann D, Singh S, Beckmann G, Bhatia K, Chen L, Fang B, Lisker R. Distribution of ADH2 and ALDH2 genotypes in different populations. Hum Genet. 1992;88:344–6.
Goyal N, Zubizarreta J, Small D, Lorch S. Length of stay and readmission among late preterm infants: an instrumental variable approach. Hosp Pediatr. In press.
Heckman J, Robb R. Alternative methods for evaluating the impacts of interventions: an overview. J Econ. 1985;30:239–67.
Heckman J, Vytlacil E. Local instrumental variables and latent variable models for identifying and bounding treatment effects. Proc Natl Acad Sci. 1999;96:4730–4.
Hernán M, Robins J. Instruments for causal inference: an epidemiologist’s dream? Epidemiology. 2006;17:360.
Hernán M, Robins J. Causal inference; 2013.
Hirano K, Imbens G, Rubin D, Zhou X. Assessing the effect of an influenza vaccine in an encouragement design. Biostatistics. 2000;1:69–88.
Ho V, Hamilton B, Roos L. Multiple approaches to assessing the effects of delays for hip fracture patients in the United States and Canada. Health Serv Res. 2000;34:1499–518.
Ho D, Imai K, King G, Stuart E. Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Polit Anal. 2007;15:199–236.
Hogan J, Lee J. Marginal structural quantile models for longitudinal observational studies with time-varying treatment. Stat Sin. 2004;14:927–44.
Holland P. Causal inference, path analysis, and recursive structural equations models. Sociol Methodol. 1988;18:449–84.
Hudgens M, Halloran M. Towards causal inference with interference. J Am Stat Assoc. 2008;103:832–42.
Hunink M, Glasziou P, Siegel J, Weeks J, Pliskin J, Elstein A, Weinstein M. Decision making in health and medicine: integrating evidence and values. Cambridge: Cambridge University Press; 2001.
Imbens G. Nonadditive models with endogenous regressors. New York: Cambridge University Press; 2007.
Imbens G, Angrist J. Identification and estimation of local average treatment effects. Econometrica. 1994;62:467–75.
Imbens G, Rosenbaum P. Robust, accurate confidence intervals with weak instruments: quarter of birth and education. J R Stat Soc Ser A. 2005;168:109–26.
Imbens G, Rubin D. Bayesian inference for causal effects in randomized experiments with noncompliance. Ann Stat. 1997a;25:305–27.
Imbens G, Rubin D. Estimating outcome distributions for compliers in instrumental variables models. Rev Econ Stud. 1997b;64:555–74.
Inoue A, Solon G. Two-sample instrumental variables estimators. Rev Econ Stat. 2010;92:557–61.
Joffe M. Administrative and artificial censoring in censored regression models. Stat Med. 2001;20:2287–304.
Joffe M. Principal stratification and attribution prohibition: good ideas taken too far. Int J Biostat. 2011;7(1):1–22.
Joffe M, Small D, Brunelli S, Ten Have T, Feldman H. Extended instrumental variables estimation for overall effects. Int J Biostat. 2008;4.
Johnston S. Combining ecological and individual variables to reduce confounding by indication: case study – subarachnoid hemorrhage treatment. J Clin Epidemiol. 2000;53:1236–41.
Kang H, Kreuels B, Adjei O, May J, Small D. The causal effect of malaria on stunting: a Mendelian randomization and matching approach, Working Paper.
Karni E. A theory of medical decision making under uncertainty. J Risk Uncertain. 2009;39:1–16.
Kaushal N. Do food stamps cause obesity? Evidence from immigrant experience. J Health Econ. 2007;26:968–91.
Kelejian H. Two-stage least squares and econometric systems linear in parameters but nonlinear in the endogenous variables. J Am Stat Assoc. 1971;66:373–4.
Kitcheman J, Adams C, Prevaiz A, Kader I, Mohandas D, Brookes G. Does an encouraging letter encourage attendance at psychiatric outpatient clinics? The Leeds PROMPTS randomized study. Psychol Med. 2008;38:717–23.
Korn E, Baumrind S. Clinician preferences and the estimation of causal treatment differences. Stat Sci. 1998;13:209–35.
Kramer M, Rooks Y, Pearson H. Growth and development in children with sickle-cell trait. N Engl J Med. 1978;299:686–9.
Lawlor D, Harbord R, Sterne J, Timpson N, Smith G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat Med. 2008;27:1133–63.
Little R, Yau L. Statistical techniques for analyzing data from prevention trials: treatment of no-shows using Rubin’s causal model. Psychol Methods. 1998;3:147–59.
Loeys T, Goetghebeur E. A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics. 2003;59:100–5.
Lorch S, Baiocchi M, Ahlberg C, Small D. The differential impact of delivery hospital on the outcomes of premature infants. Pediatrics. 2012a.
Lorch S, Kroelinger C, Ahlberg C, Barfield W. Factors that mediate racial/ethnic disparities in US fetal death rates. Am J Public Health. 2012b;102:1902–10.
Malkin J, Broder M, Keeler E. Do longer postpartum stays reduce newborn readmissions? Analysis using instrumental variables. Health Serv Res. 2000;35:1071–91.
McClellan M, McNeil B, Newhouse J. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA. 1994;272:859.
Moreira M. A conditional likelihood ratio test for structural models. Econometrica. 1990;71:463–80.
Muthen B. A structural probit model with latent variables. J Am Stat Assoc. 1979;74:807–11.
Newman T, Vittinghoff E, McCulloch C. Efficacy of phototherapy for newborns with hyperbilirubinemia: a cautionary example of an instrumental variable analysis. Med Decis Mak. 2012;32:83–92.
Neyman J. On the application of probability theory to agricultural experiments. Stat Sci. 1990;5:463–80.
Nie H, Cheng J, Small D. Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics. 2011;67:1397–405.
O’Malley A, Frank R, Normand S. Estimating cost-offsets of new medications: use of new antipsychotics and mental health costs for schizophrenia. Stat Med. 2011;30:1971–88.
Okui R, Small D, Tan Z, Robins J. Doubly robust instrumental variables regression. Stat Sin. 2012;22:173–205.
Owen A. Empirical likelihood. Boca Raton: Chapman & Hall/CRC; 2002.
Pearl J. Causality. Cambridge: Cambridge University Press; 2009.
Permutt T, Hebel J. Simultaneous-equation estimation in a clinical trial of the effect of smoking on birth weight. Biometrics. 1989;45:619–22.
Phibbs C, Mark D, Luft H, Peltzman-Rennie D, Garnick D, Lichtenberg E, McPhee S. Choice of hospital for delivery: a comparison of high-risk and low-risk women. Health Serv Res. 1993;28:201.
Pliskin J, Shepard D, Weinstein M. Utility functions for life years and health status. Oper Res. 1980;28:206–24.
Poulson R, Gadbury G, Allison D. Treatment heterogeneity and individual qualitative interaction. Am Stat. 2012;66:16–24.
Qin J, Zhang B. A goodness-of-fit test for logistic regression models based on case–control data. Biometrika. 1997;84:609–18.
Rehan N. Growth status of children with and without sickle cell trait. Clin Pediatr. 1981;20:705–9.
Robins J, Greenland S. A comment on Angrist, Imbens and Rubin: Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:456–8.
Robins J, Tsiatis A. Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Commun Stat Theory Methods. 1991;20:2609–31.
Rosenbaum P. Observational studies. New York: Springer; 2002.
Rosenbaum P. Design of observational studies. New York: Springer; 2009.
Rosenbaum P, Rubin D. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
Rubin D. Estimating causal effects of treatments in randomized and non-randomized studies. J Educ Psychol. 1974;66:688–701.
Rubin D. Formal modes of statistical inference for causal effects. J Stat Plan Inference. 1990;25:279–92.
Saigal S, Stoskopf B, Feeny D, Furlong W, Burrows E, Rosenbaum P, Hoult L. Differences in preferences for neonatal outcomes among health care professionals, parents, and adolescents. J Am Med Assoc. 1999;281:1991–7.
Sargan J. The estimation of economic relationships using instrumental variables. Econometrica. 1958;26:393–415.
Sexton M, Hebel J. A clinical trial of change in maternal smoking and its effect on birth weight. J Am Med Assoc. 1984;251:911–5.
Sham P. Statistics in human genetics. London: Arnold; 1998.
Shea J. Instrument relevance in multivariate linear models: a simple measure. Rev Econ Stat. 1997;79:348–52.
Shetty K, Vogt W, Bhattacharya J. Hormone replacement therapy and cardiovascular health in the United States. Med Care. 2009;47:600–6.
Siddique Z. Partially identified treatment effects under imperfect compliance: the case of domestic violence. IZA Discussion Paper No. 4565. 2009.
Small D. Sensitivity analysis for instrumental variables regression with overidentifying restrictions. J Am Stat Assoc. 2007;102:1049–58.
Small D, Rosenbaum P. War and wages: the strength of instrumental variables and their sensitivity to unobserved biases. J Am Stat Assoc. 2008;103:924–33.
Sobel M. What do randomized studies of housing mobility demonstrate? Causal inference in the face of interference. J Am Stat Assoc. 2006;101:1398–407.
Sommers BD, Beard CJ, Dahl D, D’Amico AV, Kaplan IP, Richie JP, Zeckhauser RJ. Decision analysis using individual patient preferences to determine optimal treatment for localized prostate cancer. Cancer. 2007;110:2210–7.
Stock J, Wright J, Yogo M. A survey of weak instruments and weak identification in generalized method of moments. J Bus Econ Stat. 2002;20:518–29.
Tan Z. Regression and weighting methods for causal inference using instrumental variables. J Am Stat Assoc. 2006;101:1607–18.
Tan Z. Marginal and nested structural models using instrumental variables. J Am Stat Assoc. 2010;105:157–69.
Ten Have T, Elliott M, Joffe M, Zanutto E, Datto C. Causal models for randomized physician encouragement trials in treating primary care depression. J Am Stat Assoc. 2004;99:16–25.
Terza J, Basu A, Rathouz P. Two-stage residual inclusion estimation: addressing endogeneity in health econometric modeling. Health Econ. 2008;27:527–43.
Vansteelandt S, Bowden J, Babnezhad M, Goetghebeur E. On instrumental variables estimation of causal odds ratios. Stat Sci. 2011;26:403–22.
Voight B, Peloso G, Orho-Melander M, Frikke-Schmidt R, Barbalic M, Jensen M, Hindy G, Hólm H, Ding E, Johnson T, et al. Plasma HDL cholesterol and risk of myocardial infarction: a Mendelian randomisation study. Lancet. 2012;380:572–80.
Vytlacil E. Independence, monotonicity, and latent index models: an equivalence result. Econometrica. 2002;70:331–41.
Wehby G, Jugessur A, Moreno L, Murray J, Wilcox A, Lie R. Genetic instrumental variable studies of the impacts of risk behaviors: an application to maternal smoking and orofacial clefts. Health Serv Outcome Res Methodol. 2011;11:54–78.
White H. Asymptotic theory for econometricians. 1984.
Wooldridge J. On two stage least squares estimation of the average treatment effect in a random coefficient model. Econ Lett. 1997;56:129–33.
Zelen M. A new design for randomized clinical trials. N Engl J Med. 1979;300:1242–5.
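
As a supplement to the Software section of this chapter, the following is a minimal sketch in base R of the same sequence of steps (strength of the IV, first-stage partial F statistic, two-stage least squares) on simulated data. Everything in it is an assumption made for illustration: the variable names, the data-generating model, the sample size, and the assumed true effect of -1 are hypothetical and are not taken from the Census data. The hand-computed second stage gives only the two-stage least squares point estimate; the standard errors printed by that second-stage fit would not be valid.

```r
# Hypothetical simulated data: u is an unmeasured confounder, x a measured
# covariate, z a binary instrument, d the treatment received, y the outcome.
set.seed(2019)
n <- 5000
u <- rnorm(n)
x <- rbinom(n, 1, 0.3)
z <- rbinom(n, 1, 0.5)
d <- as.numeric(0.4 * z + 0.1 * x + 0.3 * u + rnorm(n, sd = 0.5) > 0.3)
y <- 10 - 1 * d + 0.5 * x + 2 * u + rnorm(n)  # assumed true effect of d is -1

# Strength of the IV: difference in treatment rates between z = 1 and z = 0
strength <- mean(d[z == 1]) - mean(d[z == 0])

# First stage, with and without the instrument, and the partial F statistic
fs_without_iv <- lm(d ~ x)
fs_with_iv <- lm(d ~ x + z)
partial_F <- anova(fs_without_iv, fs_with_iv)$F[2]

# Two-stage least squares by hand: regress y on the first-stage fitted values
d_hat <- fitted(fs_with_iv)
tsls_by_hand <- unname(coef(lm(y ~ d_hat + x))["d_hat"])

# Same point estimate from the just-identified IV formula (Z'X)^{-1} Z'y
X <- cbind(1, d, x)
Zmat <- cbind(1, z, x)
tsls_closed_form <- solve(t(Zmat) %*% X, t(Zmat) %*% y)[2]

# Ordinary least squares for comparison; it ignores the confounder u
ols <- unname(coef(lm(y ~ d + x))["d"])

print(c(strength = strength, partial_F = partial_F,
        tsls = tsls_by_hand, ols = ols))
```

With a real data set, a call of the form ivreg(y ~ d + x | z + x) from the AER package returns the same coefficient together with appropriate standard errors.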
22 Introduction to Causal Inference Approaches

Elizabeth A. Stuart and Sarah Naeger

Contents
Introduction
Defining Causal Effects
Two Concepts: SUTVA and Assignment Mechanism
Careful Design
Strategies for Estimating Causal Effects
Randomized Experiments
Natural Experiments: Instrumental Variables
Regression Discontinuity
Difference-in-Difference and Interrupted Time Series Designs
Propensity Scores and Other Matching Methods
Conclusions
References
Abstract

Many questions in health services research require causal estimates of the effects of policies or programs on a health outcome. Although randomized experiments are seen as the gold standard for estimating causal effects, randomization is often infeasible and/or impractical or will not answer the question of interest. In those cases, rigorous nonexperimental study designs can be used, as highlighted in this chapter. The chapter first carefully defines the causal effects of interest and stresses the importance of careful study design. Overviews of four common nonexperimental study designs are then provided: instrumental variables, regression discontinuity, interrupted time series (and the related approach of difference in differences), and propensity score matching. The emphasis is on applications of these methods in health services research and the assumptions underlying each approach. The chapter concludes with open topics and suggestions for the conduct of studies aiming to estimate causal effects in health services research.

E. A. Stuart (*)
Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
e-mail: estuart@jhsph.edu

S. Naeger
Behavioral Health Research and Policy, IBM Watson Health, Bethesda, MD, USA
e-mail: snaeger@jhsph.edu

https://doi.org/10.1007/978-1-4939-8715-3_33
Introduction

Many questions in health services research require causal estimates of the effects of policies or programs on a health outcome, for example, the effects of expanding access to public health insurance on health (Finkelstein et al. 2004) or the effects of public reporting on quality of care in nursing homes (Werner et al. 2009). Due to either practical or ethical concerns, many of these questions cannot be answered with randomized experiments and require sophisticated nonexperimental methods instead. This is particularly true given recent interest in comparative effectiveness research and patient-centered outcomes research, which are both interested in examining the effects of interventions in real-world settings, among a broad set of patients, including those who may normally be excluded from randomized trials (Berger et al. 2009; Mullins et al. 2012; Oliver et al. 2009; Rosenberg 2009).

Like many other fields in the social sciences, health services researchers are continually faced with the challenge of making causal inferences from nonexperimental data. As stated by Escarce and Flood (2011) in an introduction to a special section of Health Services Research on causality, “Explicit in both definitions [of health services research, by AcademyHealth and the Agency for Healthcare Research and Quality] is the notion that health services researchers should strive to identify and estimate the causal effects on outcomes of interest of alternative organizational structures, management approaches, financing systems, provider practices, and personal choices regarding lifestyle and behavior. Without a focus on causal effects, it would be impossible to identify the most effective ways to achieve the outcomes we seek through clinical, management, or policy interventions” (2011, p. 394). At stake is the need to determine if programs, treatments, and prevention efforts are having a measurable impact on people and the populations being served in the face of declining resources and funds.

Health services research is an inherently interdisciplinary field; researchers come to the field with varying levels of familiarity with various methods for estimating causal effects. Many health researchers, in particular, come from academic traditions that emphasize one method over another (Dowd 2011). This chapter aims to provide an overview of methods for estimating causal effects, providing a brief introduction to the methods commonly used in health services research, including propensity scores, instrumental variables, and interrupted time series. For more information on the history behind some of these methods (and their use in health services research), see Dowd (2011) and O’Malley (2011), and for more description of the role of structural equation models for assessing causality more broadly, see, for example, Pearl (2011).

Defining Causal Effects

To help clarify the concepts and goals of causal inference methods, consider a motivating example where interest is in examining whether access to public health insurance (such as Medicare) improves health outcomes. For simplicity, assume health outcomes are measured by self-reported health status at a particular point in time.

For each individual, there are two “potential outcomes”: their self-reported health status if they are in the Medicare program (the “treatment” condition), denoted Y(1), and their self-reported health status if they are not in Medicare (the “control” condition), denoted Y(0) (these could be indexed by i to denote individual i; for simplicity we leave that out). The causal effect of Medicare enrollment is then a comparison of these two potential outcomes. When the outcome of interest is continuous, the comparison between the sets of potential outcomes is often a difference in means; for binary outcomes, an odds ratio can be used.

The comparison of potential outcomes allows the definition of the causal effects of interest; they are an essential component of the causal inference framework. There are two other key components to this framework. First, the treatment or exposure of interest: a condition that could theoretically be given to or withheld from an individual (or a community). The treatment condition should be defined in relation to a control or reference condition. In some cases, the control condition
may be no treatment, and in others, it may be an established treatment. For
example, if the research question of interest concerns the impact of electronic
health records on behavioral health screening rates in pediatric clinics, the
“treatment” condition could be using electronic health records, and the
“control” condition could be using paper health records (Hacker et al. 2012).
Often the first step in clearly stating causal questions (and thus providing
causal answers) is to state clearly what the intervention of interest is and
what the appropriate comparison condition is. In CER, for example, a researcher
may have to decide whether to compare a particular treatment to another (e.g.,
one drug vs. another) or one drug versus “usual practice” (which may be a mix
of treatments). Although concepts can be defined and some methods are available
for cases where there are more than two treatment conditions of interest, for
simplicity this chapter focuses on binary treatment (treatment vs. control or
treatment 1 vs. treatment 2) comparisons.

The next key concept is the units: the entities that the treatment could be
given to or withheld from at a particular point in time. Units can be
individuals, medical clinics, or communities, but the units should correspond
to the level of the treatment being evaluated. In the study of electronic
health records mentioned above, the units would be pediatric clinics. In a
study of a new therapy for diabetes, the units would be individual patients.

The treatment, units, and potential outcomes form the framework for causal
inference. The “fundamental problem of causal inference,” however, is that only
one of the potential outcomes for each individual or unit can be observed
(Holland 1986). For individuals/units in the treatment condition, Y(1) is
observed, and for individuals in the control group, Y(0) is observed; at a
given point in time, each individual is either in Medicare or not. For the
individuals in Medicare, Y(1) is observed and interest is in predicting what
Y(0) would have been if they had not been in Medicare. Similarly, for
individuals not in Medicare, Y(0) is observed, and interest is in predicting
what Y(1) would have been had they been in Medicare. Causal inference can thus
be thought of as a missing data problem, where interest is in predicting these
missing potential outcomes.

The causal effect of interest is the difference between the potential outcomes
Y(1) and Y(0) for the same individual. The statistical problem of causal
inference relates to how we can best predict those missing potential outcomes
to make estimates. An important distinction is that these effects are defined
in relation to potential outcomes and become the estimand of interest (i.e.,
the quantity we are interested in estimating), independent of what study design
we might use to learn about them (e.g., randomized vs. nonrandomized). The
estimand is the comparison of the potential outcomes that defines the causal
effect of interest.

As a prelude to concepts discussed further below, there may be different
estimands of interest, and different methods may estimate different estimands.
Two common estimands are the “average treatment effect on the treated” (ATT)
and the “average treatment effect” (ATE). The ATE compares the average outcome
if everyone in the population receives the treatment with the average outcome
if no one receives it. The ATT, in contrast, is the effect for the treatment
group: the difference between the average outcome if everyone in the treatment
group is treated and the average outcome if everyone in the treatment group
instead receives the control (this is the “counterfactual” condition). Which
estimand is of more interest will depend on the substantive question. For
example, when investigating the effects of potentially harmful “treatments”
(such as adolescent drug use), the ATT may be more relevant since that
treatment would never be imposed on the full population; instead, interest is
in what the effects are for those people who are actually drug users. In
contrast, the ATE is a useful estimand when it is plausible that treatment
could be disseminated to the entire population, for example, fluoride in the
public water system. Note that in a randomized experiment, these quantities are
equal in expectation, and so this distinction does not arise. Other methods,
such as regression discontinuity and instrumental variables, estimate other
“local average treatment effects,” which are effects for a particular subgroup
of individuals and are discussed further
below. Imai et al. (2008) also differentiate “sample” versus “population”
effects, a distinction not further discussed in this chapter.

As mentioned earlier, in most causal frameworks, the treatments of interest are
thought of as (at least hypothetical) “interventions” that one can imagine
giving or withholding. As stated by Holland (1986, p. 959), “No causation
without manipulation.” In part this is to ensure that the estimand is clear and
that everyone has the same understanding of what “treatment” versus “control”
means. However, some of the methods discussed in this chapter have also been
used to examine noncausal questions, such as to investigate racial disparities,
using the framework of “balanced comparisons,” where we want to compare two
groups that are as similar as possible on a set of observed characteristics.
Zaslavsky et al. (2012) discuss these ideas in more detail. In this chapter,
the focus is on the “effects of causes” rather than the “causes of effects,” as
delineated by Holland (1986). In this way, interest is in questions about the
effects of particular policies, interventions, or “treatments,” rather than
broader (and perhaps less specific) questions about causal mechanisms or causal
models more generally.

Two Concepts: SUTVA and Assignment Mechanism

Since causal effects are estimated at the group level but potential outcomes
are defined as individual-level phenomena, several assumptions are required
(Little and Rubin 2000). The first is the stable unit treatment value
assumption (SUTVA). SUTVA has two components: first, that there is only one
“version” of each treatment, meaning that the “treatment” is well defined in
that there are not two different types of treatments within the “treatment”
condition; and second, that there is no interference, in that the treatment
assignment of one unit does not affect the potential outcomes of any other
units. An implication of this assumption is that each unit has a unique
potential outcome under the treatment and control conditions; i.e., if person i
is treated, Yi(1) is observed, and if person i is in the comparison group,
Yi(0) is observed (Holland 1986), and these quantities each take on a single
value, regardless of other conditions, such as the treatment assignments of
other individuals. This is known as “consistency” in the epidemiology
literature (Cole and Frangakis 2009).

The assignment mechanism is the process by which individuals are assigned to
receive treatment or not. In randomized experiments, the assignment mechanism
is the randomization process; knowing the assignment mechanism frees the
researcher from making any further assumptions about the distribution of the
data. This is because, in a randomized experiment, treatment assignment is
independent of the individual’s potential outcomes. In observational studies,
the researcher must infer the mechanism or process by which individuals end up
in the treatment or comparison group. In the example above, the researcher
would need to model the process through which some individuals receive Medicaid
coverage and some do not. This relates back to the problem of missing data, in
that the process that creates the missing potential outcomes must be accounted
for when estimating causal effects (Greenland 2005; Little and Rubin 2000).

Careful Design

One theme of this chapter is the importance of careful design. Randomized
experiments have a particularly useful design; individuals are assigned to
receive the treatment or control condition randomly. (The benefits of
randomization will be discussed further below.) When the assignment mechanism
is known, it is possible to obtain unbiased estimates of treatment effects with
no assumptions (for now assuming away any noncompliance or missing data).

In contrast, any nonexperimental study must rely on some (mostly untestable)
assumptions. Those assumptions are discussed briefly below for each method and
in more detail in the accompanying chapters. For that reason nonexperimental
studies require smart choices, “choice as an alternative to control” in the
words of Paul Rosenbaum (1999, 2005a), and thoughtful designs to isolate the
effects
of the treatments of interest. In other words, when you can’t randomize, make
smart choices to yield robust causal inferences.

Many of these choices will involve selecting an appropriate control group, as
stressed in Rosenbaum (2010) and Cook et al. (2008). The key feature of a
randomized experiment is that it produces comparable or balanced treatment and
control groups that lead to unbiased and consistent estimates of causal
effects. Therefore, in terms of internal validity, when experimental designs
are not available, creating comparable treatment and control groups in
observational studies is more important than creating samples that represent a
population. A study by Lehman et al. (1987) on the long-term effects of the
sudden and unexpected loss of a spouse or a child provides an example.
Individuals who had lost either a spouse or a child in a car accident in the
4–7 years prior to the study made up the treatment group. In order to isolate
the bereavement effects, the authors created a control group by identifying
7581 individuals through driver’s license renewals and then matched one control
subject to each treatment group member on gender, age, family income in 1976
(i.e., the time period before the crash), education level, and the number and
ages of children (Lehman et al. 1987). By carefully creating balanced treatment
and control groups, the authors were able to demonstrate that psychological
distress was significantly greater in the treatment subjects (Lehman et al.
1987; Rosenbaum 2005a). In an example from road safety, Rosenbaum (2010)
describes a study of the association of road features with accidents; the
“treatment” conditions were accident sites, and the comparison conditions were
sites exactly one mile prior to the accident at the same time as the accident,
with the idea that the car in the accident passed by that site (with no
problem) just before the accident, thus controlling for factors such as weather
and characteristics of the drivers. Because of the need to rely on untestable
assumptions, sensitivity analyses are particularly crucial in nonexperimental
studies: assessing the robustness of results to other (plausible) assumptions
and considering other possible designs.

This chapter aims to give researchers some tools to start thinking about those
possible designs, outlining the basics of study designs with a focus on
nonexperimental studies. Readers who are interested in learning more about the
careful design of nonexperimental studies should refer to the discussion of
threats to validity in Shadish et al. (2002), the discussion of the importance
of careful design and methods of design sensitivity in Rosenbaum (1999, 2010),
and the discussion of the role of design versus analysis in Rubin (2007).

Strategies for Estimating Causal Effects

This section provides an overview of common study designs that aim to estimate
causal effects. These descriptions are not meant to be fully detailed but
rather to provide a broad understanding of the approach, when it can be used,
and what its underlying assumptions are. Examples of how each design has been
used in health services research are provided.

Randomized Experiments

First formalized by Fisher (1926), randomized experiments are considered the
gold standard of causal inference, since, as mentioned above, (when “clean”)
they yield unbiased estimates of treatment effects (at least for the sample at
hand) with no additional assumptions. In contrast, all of the nonexperimental
methods discussed below rely on at least some assumptions. Intuitively,
randomization to treatment or control groups means that the groups are
equivalent on everything at baseline, except which treatment they receive. This
means that any difference in outcomes between groups can be attributed to the
treatment and not to any preexisting differences. Mathematically it can be
shown that the average of the potential outcomes observed in each group
(treatment or control) provides an unbiased estimate of the average potential
outcome
under that condition for the population (Neyman 1923, 1934).

The three key properties of randomized experiments that ensure estimates of
causal effects are unbiased are as follows. First, the treatment assignment is
“unconfounded,” which means the randomization process is independent of the
potential outcomes. Second, each individual or unit in the experiment has a
positive probability of receiving each treatment condition (i.e., each person
could potentially be in either the treatment or control group). And finally,
the study is designed without any knowledge of the potential outcomes.

Examples of randomized experiments in health services research include the
Oregon Medicaid Coverage experiment (Baicker and Finkelstein 2011). Researchers
used a lottery to randomly allocate low-income adults between 19 and 64 years
old to either receive Medicaid or be assigned to a waiting list for Medicaid.
Although not originally implemented for this purpose, the lottery process
allowed researchers to estimate the causal effects of Medicaid enrollment
compared to being uninsured. Preliminary results for the study indicated that
Medicaid coverage increases health-care use (Baicker and Finkelstein 2011).

However, as has been widely discussed (Gluud 2006; Marcus et al. 2012; Rothwell
2005), randomized trials do have their own complications. These include
noncompliance, where people do not take their assigned treatments (Frangakis
and Rubin 2002; Marasinghe and Amarasinghe 2007; Peduzzi et al. 1993), missing
outcome data (Frangakis et al. 2007), worries that the people who enroll in a
trial may be different from those of broader interest (Marcus et al. 2012;
Zimmerman et al. 2005), and ethical concerns about randomization (Crawford
et al. 2011; De Melo-Martín et al. 2011; Hughes 2009). Because of these
concerns, nonexperimental studies are sometimes used to estimate the causal
effects of “treatments,” interventions, or exposures of interest. We will see
that many of these designs attempt to replicate key features of experiments.

Natural Experiments: Instrumental Variables

In some cases researchers do not have power over the treatments individuals (or
providers or communities) do or do not receive but can identify some naturally
occurring randomness in who receives which treatment. These methods rely on
finding an “instrument” that is (or can be thought of as) randomly assigned,
affects the treatment individuals receive, but does not affect their outcomes
directly. Instrumental variable designs are sometimes referred to as
“encouragement designs” as the instrument can be thought of as something that
encourages individuals to take the treatment of interest (or not). Examples of
instrumental variables (IVs) in HSR include Bao et al. (2006), who, in
examining the effect of providers giving smoking cessation advice, used whether
or not the provider provided diet/nutrition or physical activity advice as an
instrument. Linden and Adams (2006) use zip code as an instrument for
participation in disease management programs, since not all geographic areas
are covered by such programs. Geography is commonly used as an instrument, as
it takes advantage of the fact that many medical treatments are more accessible
in some geographic areas than others (e.g., McClellan et al. 1994).

IV methods essentially work by fitting two models: first, a model of treatment
received as a function of the instrument and covariates and, second, a model of
outcome as a function of treatment received and the covariates. The “exclusion
restriction” (described further below) means that the instrument is “excluded”
from (not in) the second-stage model. Because these two equations are related
(and the error terms therefore correlated), the models are generally fit using
two-stage least squares (Angrist and Imbens 1995, 1996).

There are two primary assumptions on which IV methods rely (in addition to the
SUTVA assumption described above). The first is known as “monotonicity” and
basically implies that there are no “defiers”: no people who go against the
instrument in terms of what treatment they receive. In other words, there is no
one who would take
the treatment if not “encouraged to” by the instrument but who would not take
the treatment when “encouraged” to do so by the instrument. The second set of
assumptions is known as the “exclusion restrictions.” These say that there is
no effect of the instrument on individuals whose behavior is not changed by the
instrument. In other words, there is no effect of the instrument on outcomes
for people who would either always take the treatment (whether encouraged to or
not by the instrument) or for people who would never take the treatment
(whether encouraged to or not). This is sometimes stated as there being “no
direct effect” of the instrument on the outcomes; the only way the instrument
can change outcomes is by changing the treatment that individuals receive. This
assumption is often questionable.

To illustrate these two assumptions, consider treatment assignment and actual
treatment status. In a randomized experiment, these two conditions are
typically one and the same and are manipulated by the researcher. In the
context of an IV design, the instrument influences (encourages) an individual’s
treatment assignment, but other factors, such as individual-level covariates,
influence compliance with the assignment (i.e., treatment status). The
monotonicity assumption means that there is a positive correlation between
treatment assignment and status. As an example, in the Long et al. (2005) study
of the impact of Medicaid on improving access to care, treatment status was
defined as being privately insured, having Medicaid coverage, or being
uninsured. The four instrumental variables (i.e., the treatment assignment
variables) included accessibility of private insurance, availability of public
coverage, and family and community attitudes toward public assistance; under
the monotonicity assumption, the influence of these variables can only increase
the likelihood that an individual is privately insured or has Medicaid
coverage. The exclusion restrictions require that accessibility of private
insurance, availability of public coverage, and attitudes toward public
assistance only influence insurance coverage and do not have any effect on
health-care utilization directly (Long et al. 2005).

One important point about IV methods is that they technically estimate what is
known as the “local average treatment effect,” also known as the “complier
average causal effect”: the effect of the treatment for the “compliers,” those
individuals whose behavior is affected by the instrument and who will take the
treatment when “told” to do so (when encouraged by the instrument) but not when
not encouraged. In the example above, compliers would be individuals who seek
out health insurance coverage when private or public options are available and
there are positive attitudes toward public assistance, but who do not seek out
insurance coverage when these are not operating (Long et al. 2005). The
complier average causal effect is also known as a “marginal treatment effect”
in the economics literature (Carneiro et al. 2011).

Another consideration when using IV methods is what is known as the “strength”
of the instrument: how correlated the treatment assignment (instrument) is with
the actual treatment status (the treatment received). A strong instrument is
highly correlated with the actual treatment received. A weak instrument, in
contrast, is only weakly associated with the actual treatment received (i.e.,
it is a poor predictor of treatment status). Weak instruments lead to reduced
power and biased IV estimates (Bound et al. 1995).

Regression Discontinuity

Introduced by Thistlethwaite and Campbell (1960), regression discontinuity (RD)
is a particularly strong nonexperimental design that can be used when the
treatment of interest is assigned on the basis of some “assignment variable”
and cutoff. For example, individuals with cholesterol levels above 200 may be
put into a care management program, whereas those with lower cholesterol are
not given access to the program. The idea is to compare individuals just below
and just above the cutoff, who should be otherwise similar but with one group
receiving the treatment of interest and the other not. The analysis examines
whether there is a “discontinuity” in the outcome variable at the cutoff, which
would indicate an effect of the treatment. RD is similar to randomized
experiments in that the assignment
mechanism is known, and that is what allows us to obtain reliable treatment
effect estimates.

Examples of RD designs in HSR include studies of disease management programs
(Linden and Adams 2006), which may be a particularly good setting for RD since
eligibility for the program is often determined by clinical measures to ensure
that the program is provided to those most in need. RD designs may also be
appropriate when resources permit serving only a portion of the population and
those most in need are served first, in which case there may be a discontinuity
at the point at which resources are gone. This sort of idea was used by Ludwig
and Miller (2006) in estimating the effects of Head Start; they used a
discontinuity in grant writing support for original Head Start grants, with
that support given to the 300 poorest counties in the country.

This section highlights a few assumptions and requirements of the RD method, as
described by Trochim (1984). First, for the most basic form of RD analyses, the
cutoff must be followed. (In fact, more advanced “fuzzy” RD designs can be used
if there is some “noncompliance,” where some individuals who were eligible
didn’t receive the treatment and some individuals who were not eligible did
receive it; see Imbens and Lemieux (2008).) Second, accurate modeling of the
relationship between the assignment variable and the outcome is crucial, for
example, allowing for a nonlinear relationship or other flexible models. Ludwig
and Miller (2006) consider a variety of functional forms in order to assess
sensitivity to the model. Third, the sample size around the cutoff must be
large enough to fit those models reliably and with sufficient precision.
Goldberger (2008) indicates that sample sizes 2.75 times larger than would be
required for adequate power in an RCT are needed for RD designs.

Threats to the validity of RD designs include cases where the assignment
variable is manipulated because of the treatment assignment process, for
example, clinicians manipulating the assignment variable so that patients they
want to have participate in the program are seen as eligible.

Similar to the idea of the “local average treatment effect” in instrumental
variables analyses, a limitation of the RD design is that it formally estimates
the effect only for those just around the cutoff. This arguably, however, is
the group for whom the effect is most relevant as presumably these are the
people who may or may not receive the intervention (i.e., those with very high
or very low scores may not be reasonable candidates for the intervention under
investigation). The design is not appropriate for estimating the effect of the
treatment for individuals whose assignment variable values are nowhere near the
cutoff.

Sensitivity analyses are important in RD designs. Important sensitivity
analysis options include “zero checks,” where the analysis is repeated using
fake cutoffs to confirm that no “effect” is seen there, as well as assessing
sensitivity to the model specification, as mentioned above. It is also
important to note that RD designs only work when the treatment was in fact
given out on the basis of the cutoff variable; they cannot be used in a “post
hoc” way if that was not in fact how the treatment was administered.

For more information on RD designs, see Imbens and Lemieux (2008). Wong et al.
(2012) provide a discussion of extensions for studies with multiple assignment
variables or cutoff points. The appendix of Linden et al. (2006) provides a
relatively easy-to-read description of the actual models run to estimate
effects in RD designs.

Difference-in-Difference and Interrupted Time Series Designs

A common approach for estimating the effects of discrete policy changes is
interrupted time series (ITS) analyses (or a simplified version, difference in
differences). These methods rely on sophisticated before-after analyses to
compare observed trends in the presence of an intervention with the time trends
that would have been predicted in the absence of the intervention. Briefly, at
its most basic level, the treatment effect is estimated by modeling the
“outcome” of interest in the pre-period, extrapolating that model fit to the
post-period, and estimating the effect as the difference between the expected
values (from that model fit) and the observed values. Interest may be in
determining whether the intervention leads
to a jump at the time of implementation (an “interruption”) or also possibly a
change in the slope of the time series trend. The simpler model, difference in
differences, essentially collapses the “pre” and “post” time periods, comparing
the change in the outcome from pre-intervention to post-intervention between
the intervention group and a comparison group (see O’Malley et al. 2006, for an
example).

ITS designs abound in HSR. Campbell et al. (2009) use an ITS design to evaluate
the effect of pay for performance on the quality of care in primary care
practices. They collected data from 42 primary care practices at two time
points prior to the policy implementation and at two time points post policy
implementation. Data on patient care, patient perception of access to care, and
continuity of care were used to determine if care for patients with asthma,
diabetes, or coronary heart disease improved after the pay-for-performance plan
was implemented (Campbell et al. 2009). As another example, Andersson et al.
(2006) use interrupted time series to investigate the effects of changes in the
pharmaceutical reimbursement schedule in Sweden on costs and volumes of
pharmaceuticals.

ITS methods are most useful when (1) there is an abrupt policy change (e.g., a
new law) and (2) there is sufficient pre-change data with which to estimate
trends reliably. And while not required, a comparison group that did not
experience the policy change can be very useful in terms of providing accurate
results. In particular, comparative interrupted time series designs are
particularly strong since they provide information on trends in the post-period
in comparison units (e.g., states) that did not experience the policy change.
Without such a comparison group, the results are more reliant on the time
series models themselves; this can be misleading, for example, when there are
strong time trends even in the absence of the intervention (e.g., increasing
test scores in education research). Linden and Adams (2010) provide an example
of combining ITS methods with propensity score weighting (discussed more below)
to create a particularly good comparison group for the ITS analysis. Their
study estimates the effect of California’s Proposition 99, which in 1988 raised
the cigarette excise tax by 25 cents per pack in order to fund anti-smoking
initiatives across the state. Similarly, O’Malley et al. (2006) discuss the
careful choice of comparison groups in the context of a
difference-in-difference analysis of interventions aimed at encouraging the use
of generic drugs.

An important consideration in ITS models is serial correlation and accounting
for the correlation of measures across time. Since the error terms in the
regression models will likely be correlated, it is important to test for
autocorrelation using a test such as Durbin’s test (Durbin 1970) and
appropriately model that autocorrelation, for example, using AR-1 models (Mills
1990). See Wagenaar et al. (2009) for an example.

Propensity Scores and Other Matching Methods

The final nonexperimental method discussed is that of propensity score methods,
which broadly are used to equate two groups and ensure that the treatment
effect is being estimated among treated and comparison subjects who are
otherwise similar. In this respect, propensity score methods aim to replicate
two key features of a randomized experiment: (1) creating groups that are
similar on background characteristics (or at least the observed ones) and
(2) not using the outcome in setting up the “design” of the study. The
propensity score itself is defined as the probability of receiving the
treatment and is estimated by modeling treatment status as a function of
baseline characteristics. Because of the properties of the propensity score
(Rosenbaum and Rubin 1983), propensity scores are particularly useful for
creating groups that look similar with respect to a large set of
characteristics; researchers can then match, subclassify, or weight using just
the propensity score itself, rather than having to deal with each variable
individually. See Stuart (2010) for more details.

Propensity score methods involve two stages: (1) fitting a propensity score
model and (2) using those propensity scores to create balanced samples. Common
propensity score estimation methods include logistic regression as well as
nonparametric methods such as boosted CART or The validity of the unconfoundedness

random forests (Lee et al. 2010). Common assumption will also likely depend on the set-
methods of using propensity scores include ting, in that, for example, the assumption may be
matching, subclassification, and weighting (Stuart more believable when the treatment assignment
2010). Matching aims to find one (or more) is made by an external party on the basis of
matched comparison subjects for each treated sub- observed characteristics (e.g., a physician,
ject; most matching methods estimate the average using medical records that researchers have
treatment effect on the treated. Subclassification access to, or a teacher selecting students on the
groups subjects into small sets with similar pro- basis of test scores), as compared to studies
pensity score values (e.g., by the deciles of the where individuals self-select into treatments, in
propensity score distribution). Weighting uses ways that may be related to unobserved factors
ideas similar to survey weighting, where individ- such as motivation or an individual’s own assess-
uals are weighted by functions of the propensity ment of how effective they think the treatment
score. The most common weighting approach, will be for them. This may be a particular
known as inverse probability of treatment concern in many HSR studies that rely on
weighting (IPTW), weights the treatment and publicly available data that was not originally
comparison group up to the combined sample designed to answer the question of current
and estimates the average treatment effect. interest and where the variables that predict
Examples of propensity score methods in HSR treatment assignment may not be observed
include Werner et al. (2009), which used propen- (e.g., clinical measures may not be available in
sity score matching combined with ITS to esti- a claims file). In this case, one strategy is to
mate the effect of public reporting of nursing follow the strategy recommended by Rosenbaum
home quality measures on quality of care. In that (2010), which is to deal as well as possible
work, nursing home residents before the policy with the observed characteristics (“overt bias”)
change were matched to residents after the using methods such as propensity scores, and
change, to ensure a comparable case-mix follow that with an analysis of sensitivity to
over time. unobserved confounding (“hidden bias”).
The key assumption underlying propensity While not yet fully disseminated, there are a
score methods is that of unconfounded treatment number of sensitivity analyses that can be done
assignment, also known as “no hidden bias” or to assess sensitivity to an unobserved
“strong ignorability” (Rosenbaum and Rubin confounder; these analyses ask the question
1983). This assumes that there are no unobserved “How strongly related to treatment assignment
differences between the treatment and comparison and the outcome would an unobserved
groups, given the observed covariates. It is crucial confounder have to be to make my observed
to think carefully about the validity of this treatment effect go away?” See Rosenbaum
assumption in any given study and whether the (2005b), Schneeweiss (2006), and Liu et al.
data at hand are sufficient in terms of the variables (2013) for more background on these methods,
observed and available for the propensity score including links to software to implement
model. Propensity score methods generally work these approaches. A second challenge with the
best when there is a large set of variables on which sorts of large datasets often used in HSR is that
to match (demographics are generally not suffi- they are often cross-sectional, without repeated
cient) and in particular when baseline measures of measures of individuals. The challenge in
the outcomes are available (Steiner et al. 2010). this setting is to identify which variables can be
For example, when assessing self-reported health safely considered “pretreatment” and therefore
as an outcome, it is important to have a matched on, versus those that may have
pre-intervention baseline measure of self-reported been affected by the treatment and thus
health status (or of variables highly correlated should not be matched on (known generally as
with such a measure). “posttreatment bias”; Imai et al. 2010).
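
As a rough illustration of the second stage of a propensity score analysis discussed in this section, the following Python sketch computes inverse probability of treatment weights and a subclassification estimate. It assumes the propensity scores have already been estimated in the first stage (for example, by logistic regression). The function names, the toy data, and the decision to skip subclasses that lack either group are assumptions made only for illustration; they are not taken from the studies cited here.

```python
def iptw_weights(treated, propensity):
    """Weights of 1/e for treated units and 1/(1 - e) for comparison units."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t == 1 else 1.0 / (1.0 - e))
    return weights


def iptw_ate(outcome, treated, propensity):
    """Difference between the weighted mean outcomes of the two groups."""
    weights = iptw_weights(treated, propensity)
    weighted_sum = {0: 0.0, 1: 0.0}  # weighted outcome totals by group
    weight_total = {0: 0.0, 1: 0.0}  # sum of weights by group
    for y, t, w in zip(outcome, treated, weights):
        weighted_sum[t] += w * y
        weight_total[t] += w
    if weight_total[0] == 0.0 or weight_total[1] == 0.0:
        raise ValueError("need at least one treated and one comparison unit")
    return weighted_sum[1] / weight_total[1] - weighted_sum[0] / weight_total[0]


def subclass_ate(outcome, treated, propensity, n_subclasses=10):
    """Size-weighted average of within-subclass mean differences.

    Units are ranked by propensity score and cut into n_subclasses groups of
    roughly equal size (deciles by default). Subclasses without both a treated
    and a comparison unit are skipped, which is a simplification of this sketch.
    """
    order = sorted(range(len(propensity)), key=lambda i: propensity[i])
    n = len(order)
    total, used = 0.0, 0
    for k in range(n_subclasses):
        members = order[k * n // n_subclasses:(k + 1) * n // n_subclasses]
        y1 = [outcome[i] for i in members if treated[i] == 1]
        y0 = [outcome[i] for i in members if treated[i] == 0]
        if not y1 or not y0:
            continue
        total += len(members) * (sum(y1) / len(y1) - sum(y0) / len(y0))
        used += len(members)
    if used == 0:
        raise ValueError("no subclass contains both treated and comparison units")
    return total / used


# Hypothetical toy data: four subjects with already-estimated propensity scores.
toy_treated = [1, 1, 0, 0]
toy_propensity = [0.8, 0.5, 0.5, 0.2]
toy_outcome = [10.0, 6.0, 4.0, 2.0]

ate_iptw = iptw_ate(toy_outcome, toy_treated, toy_propensity)
ate_subclass = subclass_ate(toy_outcome, toy_treated, toy_propensity, n_subclasses=2)
```

In practice, covariate balance would be checked after weighting or subclassifying and before any outcome comparison; that step is omitted here for brevity.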
Conclusions

Health services research involves answering many important questions, many of which are causal, aiming to understand the effects of particular policies or programs. This chapter has aimed to provide an overview of the methods available to answer such causal questions, so that researchers know the breadth of methods available. No single method will be best for answering every possible question; the method needs to be chosen in the context of the research question at hand. For example, regression discontinuity and interrupted time series designs can work very well when the data arise from such settings but cannot be used when the data do not.

In many cases, researchers may be left choosing between instrumental variables and propensity score approaches. Again, the optimal choice will depend on the particular question: whether a plausible instrument exists and whether the assumption of unconfounded treatment assignment is believable. In brief: which method’s assumptions are more likely to be satisfied? How much is known about the process that determined who was treated and who was not, and are the characteristics associated with that choice observed? This decision may also relate to who was making the treatment decisions: when an individual is self-selecting the treatment, there may be more concern about the plausibility of unconfounded treatment assignment. In contrast, if another decision maker (e.g., a physician) is making the decision based on variables that are largely observed, unconfounded treatment assignment may be more reasonable.

At a minimum, analyses of sensitivity to those assumptions should be done, such as the sensitivity analyses discussed above for propensity score methods, as well as methods that help assess the validity of the exclusion restrictions in IV (Greenland 2000). In some cases, both analyses may be plausible, and doing the analysis both ways may help provide a sense of the robustness of the results.

There are also many important directions for further research in the field of causal inference relevant for HSR. These include modifications of existing methods to handle very large datasets such as electronic health records or medical claims (e.g., the high-dimensional propensity score approach of Schneeweiss et al. (2009)). A second challenge in the coming years is to identify methods to better detect treatment effect heterogeneity. Some progress has been made recently, but this will be an important area for further work, especially as there is increasing interest in determining “what works for whom” and under what settings treatments are effective.

Methods for causal inference are an important, and expanding, set of tools for health services researchers. Answering causal questions well will ultimately help us better understand how to improve health and health care for people across the globe.

References

Andersson K, Petzold MG, Sonesson C, Lonnroth K, Carlsten A. Do policy changes in the pharmaceutical reimbursement schedule affect drug expenditures? Interrupted time series analysis of cost, volume, and cost per volume trends in Sweden 1986–2002. Health Policy. 2006;79:231–43.
Angrist JD, Imbens GW. Two-stage least squares estimation of average causal effects in models with variable treatment intensity. J Am Stat Assoc. 1995;90(430):431–42. https://doi.org/10.1080/01621459.1995.10476535.
Angrist JD, Imbens GW. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91:444–55.
Baicker K, Finkelstein A. The effects of Medicaid coverage – learning from the Oregon experiment. N Engl J Med. 2011;365(8):683–5.
Bao Y, Duan N, Fox SA. Is some provider advice on smoking cessation better than no advice? An instrumental variable analysis of the 2001 National Health Interview Survey. Health Serv Res. 2006;41(6):2114–35.
Berger ML, Mamdani M, Atkins D, Johnson ML. Good research practices for comparative effectiveness research: defining, reporting and interpreting non-randomized studies of treatment effects using secondary data sources: the ISPOR good research practices for retrospective database analysis task force report – part I. Value Health. 2009;12(8):1044–52.
Bound J, Jaeger DA, Baker RM. Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. J Am Stat Assoc. 1995;90(430):443–50.
Campbell SM, Reeves D, Kontopantelis E, Sibbald B, Roland M. Effects of pay for performance on the quality of primary care in England. N Engl J Med. 2009;361(4):368–78. https://doi.org/10.1056/NEJMsa0807651.
Carneiro P, Heckman JJ, Vytlacil EJ. Estimating marginal returns to education. Am Econ Rev. 2011;101(6):2754–81.
Cole SR, Frangakis CE. The consistency statement in causal inference: a definition or an assumption? Epidemiology. 2009;20(1):3–5.
Cook TD, Shadish WR, Wong VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Policy Anal Manage. 2008;27(4):724–50. https://doi.org/10.1002/pam.20375.
Crawford MJ, Thana L, Methuen C, Ghosh P, Stanley SV, Ross J, Gordon F, et al. Impact of screening for risk of suicide: randomized controlled trial. Br J Psychiatry. 2011;198(5):379–84.
De Melo-Martín I, Sondhi D, Crystal RG. When ethics constrains clinical research: trial design of control arms in “greater than minimal risk” pediatric trials. Hum Gene Ther. 2011;22(9):1121–7.
Dowd BE. Separated at birth: statisticians, social scientists, and causality in health services research. Health Serv Res. 2011;46(2):397–420.
Durbin J. Testing for serial correlation in least-squares regression when some of the regressors are lagged dependent variables. Econometrica. 1970;38(3):410–21.
Escarce JJ, Flood AB. Introduction to special section: causality in health services research. Health Serv Res. 2011;46(2):394–6. https://doi.org/10.1111/j.1475-6773.2011.01255.x.
Finkelstein EA, Fiebelkorn IC, Wang G. State-level estimates of annual medical expenditures attributable to obesity*. Obes Res. 2004;12(1):18–24. https://doi.org/10.1038/oby.2004.4.
Fisher R. The arrangement of field experiments. Journal of Ministry of Agriculture. 1926;33:500–13.
Frangakis CE, Rubin DB. Principal stratification in causal inference. Biometrics. 2002;58(1):21–9.
Frangakis CE, Rubin DB, An MW, MacKenzie E. Principal stratification designs to estimate input data missing due to death. Biometrics. 2007;63(3):641–9.
Gluud LL. Bias in clinical intervention research. Am J Epidemiol. 2006;163(6):493–501. https://doi.org/10.1093/aje/kwj069.
Goldberger A. Selection bias in evaluating treatment effects: some formal illustrations. In: Modelling and evaluating treatment effects in econometrics, Advances in econometrics. Bingley: Emerald Group Publishing Limited; 2008. p. 1–31.
Greenland S. An introduction to instrumental variables for epidemiologists. Int J Epidemiol. 2000;29(4):722–9.
Greenland S. Epidemiologic measures and policy formulation: lessons from potential outcomes. Emerging Themes in Epidemiology. 2005;2(1):5.
Hacker K, Penfold R, Zhang F, Soumerai SB. Impact of electronic health record transition on behavioral health screening in a large pediatric practice. Psychiatr Serv. 2012;63(3):256–61.
Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–60.
Hughes JR. Ethical concerns about non-active conditions in smoking cessation trials and methods to decrease such concerns. Drug Alcohol Depend. 2009;100(3):187–93.
Imai K, Keele L, Yamamoto T. Identification, inference and sensitivity analysis for causal mediation effects. Stat Sci. 2010;25(1):51–71.
Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. J R Stat Soc Ser A Stat Soc. 2008;171(2):481–502.
Imbens GW, Lemieux T. Regression discontinuity designs: a guide to practice. J Econ. 2008;142(2):615–35.
Lee BK, Lessler J, Stuart EA. Improving propensity score weighting using machine learning. Stat Med. 2010;29(3):337–46.
Lehman DR, Wortman CB, Williams AF. Long-term effects of losing a spouse or child in a motor vehicle crash. J Pers Soc Psychol. 1987;52(1):218–31.
Linden A, Adams JL. Evaluating disease management programme effectiveness: an introduction to instrumental variables. J Eval Clin Pract. 2006;12(2):148–54. https://doi.org/10.1111/j.1365-2753.2006.00615.x.
Linden A, Adams JL, Roberts N. Evaluating disease management programme effectiveness: an introduction to the regression discontinuity design. J Eval Clin Pract. 2006;12(2):124–31.
Linden A, Adams JL. Using propensity score-based weighting in the evaluation of health management programme effectiveness. J Eval Clin Pract. 2010;16(1):175–9.
Little RJ, Rubin DB. Causal effects in clinical and epidemiological studies via potential outcomes: concepts and analytical approaches. Annu Rev Public Health. 2000;21:121–45. https://doi.org/10.1146/annurev.publhealth.21.1.121.
Liu W, Kuramoto SK, Stuart EA. An introduction to sensitivity analysis for unobserved confounding in non-experimental prevention research. Prev Sci. 2013;14(6):570–80. PMCID:3800481.
Long SK, Coughlin T, King J. How well does Medicaid work in improving access to care? Health Serv Res. 2005;40(1):36–58. https://doi.org/10.1111/j.1475-6773.2005.00341.x.
Ludwig J, Miller DL. Does Head Start improve children’s life chances? Evidence from a regression discontinuity design. Institute for the Study of Labor (IZA). 2006. Retrieved from http://ideas.repec.org/p/iza/izadps/dp2111.html
Marasinghe JP, Amarasinghe AAW. Noncompliance in randomized controlled trials [4]. CMAJ. 2007;176(12):1735.
Marcus SM, Stuart EA, Wang P, Shadish WR, Steiner PM. Estimating the causal effect of randomization versus treatment preference in a doubly randomized preference trial. Psychol Methods. 2012;17(2):244–54.
McClellan M, McNeil BJ, Newhouse JP. Does more intensive treatment of acute myocardial infarction in the elderly reduce mortality? Analysis using instrumental variables. JAMA. 1994;272:859–66.
Mills TC. Time series techniques for economists. Cambridge: Cambridge University Press; 1990.
Mullins CD, Abdulhalim AM, Lavallee DC. Continuous patient engagement in comparative effectiveness research. JAMA J Am Med Assoc. 2012;307(15):1587–8.
Neyman J. On the application of probability theory to agricultural experiments. Essay on principles. Stat Sci. 1923;5(4):465–80.
Neyman J. On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J R Stat Soc. 1934;97:558–606.
Oliver S, Armes DG, Gyte G. Public involvement in setting a national research agenda: a mixed methods evaluation. Patient Patient-Cent Outcomes Res. 2009;2(3):179–90.
O’Malley AJ. Commentary on Bryan Dowd’s paper “Separated at birth: statisticians, social scientists, and causality in health services research”. Health Serv Res. 2011;46(2):430–6.
O’Malley AJ, Frank RG, Kaddis A, Rothenberg BM, McNeil BJ. Impact of alternative interventions on changes in generic dispensing rates. Health Serv Res. 2006;41(5):1876–94.
Pearl J. Statistics and causality: Separated to reunite – commentary on Bryan Dowd’s “Separated at birth”. Health Serv Res. 2011;46(2):421–9.
Peduzzi P, Wittes J, Detre K, Holford T. Analysis as-randomized and the problem of non-adherence: an example from the Veterans Affairs randomized trial of coronary artery bypass surgery. Stat Med. 1993;12(13):1185–95. https://doi.org/10.1002/sim.4780121302.
Rosenbaum PR. Choice as an alternative to control in observational studies. Stat Sci. 1999;14(3):259–304.
Rosenbaum PR. Observational study. In: Everitt B, Howell D, editors. Encyclopedia of statistics in behavioral science. Chichester: Wiley; 2005a.
Rosenbaum PR. Sensitivity analysis in observational studies. In: Everitt BS, Howell DC, editors. Encyclopedia of statistics in behavioral science, vol. 4. Chichester: Wiley; 2005b. p. 1809–14.
Rosenbaum PR. Design of observational studies, Springer series in statistics. New York: Springer; 2010.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55.
Rosenberg L. Comparative effectiveness research: making it work for those we serve. J Behav Health Serv Res. 2009;36(3):283–4.
Rothwell PM. External validity of randomised controlled trials? To whom do the results of this trial apply? Lancet. 2005;365(9453):82–93.
Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med. 2007;26(1):20–36.
Schneeweiss S. Sensitivity analysis and external adjustment for unmeasured confounders in epidemiologic database studies of therapeutics. Pharmacoepidemiol Drug Saf. 2006;15(5):291–303.
Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20(4):512–22.
Shadish WR, Cook TD, Campbell DT. Experimental and quasi-experimental designs for generalized causal inference. 2nd ed. Belmont: Wadsworth Publishing; 2002.
Steiner PM, Cook TD, Shadish WR, Clark MH. The importance of covariate selection in controlling for selection bias in observational studies. Psychol Methods. 2010;15(3):250–67.
Stuart EA. Matching methods for causal inference: a review and a look forward. Stat Sci. 2010;25(1):1–21.
Thistlethwaite DL, Campbell DT. Regression-discontinuity analysis: an alternative to the ex post facto experiment. J Educ Psychol. 1960;51(6):309–17.
Trochim W. Research design for program evaluation; the regression-discontinuity design. Beverly Hills: Sage; 1984.
Wagenaar AC, Maldonado-Molina MM, Wagenaar BH. Effects of alcohol tax increases on alcohol-related disease mortality in Alaska: time-series analysis from 1976 to 2004. Am J Public Health. 2009;99(8):1464–70.
Werner RM, Konetzka RT, Stuart EA, Norton EC, Polsky D, Park J. Impact of public reporting on quality of postacute care. Health Serv Res. 2009;44(4):1169–87. https://doi.org/10.1111/j.1475-6773.2009.00967.x.
Wong VC, Steiner PM, Cook TD. Analyzing regression-discontinuity designs with multiple assignment variables: a comparative study of four estimation methods. J Educ Behav Stat. 2012; https://doi.org/10.3102/1076998611432172.
Zaslavsky AM, Ayanian JZ, Zaborski LB. The validity of race and ethnicity in enrollment data for Medicare beneficiaries. Health Serv Res. 2012;47(3 Part 2):1300–21.
Zimmerman M, Chelminski I, Posternak MA. Generalizability of antidepressant efficacy trials: differences between depressed psychiatric outpatients who would or would not qualify for an efficacy trial. Am J Psychiatr. 2005;162(7):1370–2.
23 Measurement of Patient-Reported Outcomes of Health Services
Joseph C. Cappelleri and Andrew G. Bushmakin

J. C. Cappelleri (*) · A. G. Bushmakin
Global Product Development, Pfizer Inc, Groton, CT, USA
e-mail: joseph.c.cappelleri@pfizer.com; andrew.g.bushmakin@pfizer.com
https://doi.org/10.1007/978-1-4939-8715-3_34

Contents
Introduction
Research Basis and Goals
  Background and Rationale
  Research Objectives
Selection of Subjects
Longitudinal Designs
  Event- or Condition-Driven Designs
  Time-Driven Designs
  Timing of the Initial PRO Assessment
  Timing of Follow-Up PRO Assessments
  Frequency of Evaluations
Selection and Evaluation of the Measurement Instrument
  Step 1: Formulating Study Objectives
  Step 2: Developing or Selecting an Instrument
  Step 3: Developing Data Collection Strategies
  Step 4: Analyzing Data
  Step 5: Reporting Data
  Interpreting Study Findings
References

Abstract

A patient-reported outcome (PRO) is any report on the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else. The measurement of PROs should address key protocol elements that include the rationale for the specific aspect of PRO being measured, explicit research objectives and endpoints, strategies for minimizing the exclusion of subjects from the study, rationale for timing of assessments and off-study rules, rationale for instrument selection, details for administration of PRO assessments to minimize bias and missing data, sample size estimation, and analytic plan. Another key element involves the measurement properties of a PRO. These protocol elements are central to this chapter as they relate to the design and measurement of PROs. These elements are discussed and framed within the five characteristics that tend to be associated with PROs: missing and incomplete data, psychometric validation, interpretation, multiple testing, and longitudinal analysis. Special consideration is given for developing a PRO measurement strategy in a regulatory context where the intent is to have a labeling claim on a PRO.

Introduction

A patient-reported outcome (PRO) is any report on the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else (Food and Drug Administration 2009). Patient-reported outcome is an umbrella term that includes a whole host of subjective outcomes such as pain, fatigue, depression, aspects of well-being (e.g., physical, functional, psychological), treatment satisfaction, health-related quality of life, and physical symptoms such as nausea and vomiting. Patient-reported outcomes are often relevant in studying a variety of conditions – including pain, erectile dysfunction, fatigue, migraine, mental functioning, physical functioning, and depression – that cannot be assessed adequately without a patient’s evaluation and whose key questions require the patient’s input on the impact of a disease or a treatment. After all, who knows better than the patient herself/himself? To be useful to patients and other decision-makers (e.g., physicians, regulatory agencies, reimbursement authorities), who are stakeholders in medical care, a PRO must undergo a validation process to confirm that it is reliably measuring what it is intended to measure.

In general, the same clinical trial design principles that apply to directly assessable clinical endpoint measures, like blood pressure, also apply to PROs. Although not necessarily unique to PROs, at least five characteristics tend to be associated with PROs (Fairclough 2004). One characteristic is that, by definition, PROs require the subject’s (patient’s) active participation, resulting in the potential for missing data from not only missed assessments on an entire PRO but also nonresponse of some items on a PRO used in a study. A second characteristic is that, being subjective and not a so-called “hard” endpoint like death, PROs require their measurement properties to be assessed, leading to additional steps of validation (reliability and validity) prior to their analysis on treatment effect. A third characteristic, related to the second one, is that the interpretation of PROs may require methods that can enrich and enhance their interpretation. A fourth characteristic is that most PROs are multidimensional and hence produce multiple scores on various aspects of what is being measured, engendering multiple comparisons and testing of outcomes that need to be methodologically and statistically addressed. The fifth characteristic is that the outcomes are generally repeated over time, calling for methods that effectively handle longitudinal data in the context of the research question.

Identifying which components of a PRO are relevant to measuring the impact of a disease and its treatment is essential to good study design and subsequent scientific scrutiny. Successful measurement of PROs begins with the development of a protocol to provide a recipe for the conduct of the study. The protocol provides not only key elements of the study design but also the scientific rationale and planned analysis for the study, which are inextricably linked to study design.

Because the validation of PROs is an ongoing process, multiple protocols, each having its specific purpose, may often be necessary. A protocol for a study, be it a clinical trial or a method study, should contain several essential elements. A clinical trial protocol should describe the following: the rationale for the specific aspect of PRO being measured, explicit research objectives and endpoints, strategies for minimizing the exclusion of subjects from the study, rationale for timing of assessments and off-study rules, rationale for instrument selection, details for administration of PRO assessments to minimize bias and missing data, sample size estimation, and analytic plan. A method study protocol involves by definition methodological considerations, such as which measurement properties of a PRO will be
tested, and these considerations will define the design of the study. For example, if an objective is to obtain test-retest reliability data, data should be collected on at least two occasions. Contrary to a clinical trial design, which includes a preselected diseased population at baseline, method studies may not involve any treatment and may include a variety of subjects, from healthy to severely ill, whom a PRO is designed to assess.

The aforementioned protocol elements are central to this chapter as they relate to the design and measurement of PROs. These elements are discussed and framed within the five characteristics that tend to be associated with PROs: missing data, validation, interpretation, multiple testing, and handling of longitudinal data.

Specifically, Section “Research Basis and Goals” covers the research basis surrounding PROs, with focus on the background and rationale and also on research objectives. Section “Selection of Subjects” centers on selection of subjects. Section “Longitudinal Designs” focuses on longitudinal designs. It discusses event- or condition-driven designs, time-driven designs, timing of the initial PRO assessments, timing of the follow-up PRO assessments, and frequency of evaluation. Section “Selection and Evaluation of the Measurement Instrument” concentrates on the selection and evaluation of the measurement instrument: formulating study objectives, developing or selecting an instrument (its relevance, psychometric properties, and feasibility), developing data collection strategies, analyzing data (multiple testing, missing data), reporting data, and interpreting study findings. Moreover, in this chapter, special consideration is given for developing a PRO measurement strategy in a regulatory context where the intent is to have a labeling claim on a PRO.

Research Basis and Goals

Background and Rationale

Providing sufficient background and rationale to justify the resources required for an investigation of PROs will contribute to the success of the investigation. The background to why PROs are of relevance in assessing outcomes of interest in relation to the disease, as well as the characteristics of the patient population under consideration, needs to be described and linked to previous research and to the planned treatment. The reason for using the PRO component in relation to the research question needs to be lucid, and the PROs need to be clearly defined in the study (Wiklund 2004). A rationale should be given for not only why a PRO is being studied but also which specific aspect of a PRO is central and especially worthwhile.

Inherent in PROs is their ability to assist in providing a better understanding of disease and treatment outcomes from the patient’s perspective, and PROs do so by translating clinical improvement, stability, or deterioration into patient-centered outcomes. As such, PROs represent a unique indicator of the impact of disease and its treatment by enabling physicians and other health-care professionals to rely significantly on patient reports in evaluating disease activity and symptoms.

In the management and monitoring of certain chronic conditions – such as arthritis, neuropathic pain, irritable bowel syndrome, sexual dysfunction, and chronic obstructive pulmonary disease – PROs have become the central outcomes of choice. In other chronic diseases, such as cancer and cardiovascular disease, increased attention has been paid to PROs in order to highlight the humanistic side of the disease and its treatment. In oncology studies, for instance, the impact of treatment on survival and tumor shrinkage is often accompanied by and weighed against the impact of the treatment on aspects of a patient’s health-related quality of life, for example, the impact of chemotherapy on toxicity and adverse events. Since 1985, the FDA has recommended patient-centered evaluations in clinical trials relating to cancer research (Johnson and Temple 1985).

The rationale for measuring PROs needs to be made explicit in the planning and documenting of clinical trials in order to put forward labeling or promotional claims on PROs. From an industry and regulatory perspective, a well-defined and reliable PRO instrument in suitably designed investigations can be used to support a claim provided that the medical product labeling of the
claim is consistent with the suitably documented measurement capability of the instrument (Food and Drug Administration 2009). (“Instrument,” as defined here, refers to a questionnaire plus all the information and documentation that supports its use, including the method of administration, instructions for administration, the scoring algorithm, analysis, and interpretation.) The Food and Drug Administration, the regulator and approver of medicines in the United States, has produced a guidance document for use in medical product development to support labeling claims (Food and Drug Administration 2009).

Data from a PRO instrument can be used to highlight any distinctive treatment advantages and disadvantages of a drug that cannot be measured in other ways. Conversely, without PRO data, a drug’s profile may be incomplete and, as such, may not represent the full range of potential benefits or harms patients would experience when using the medicine under investigation.

Research Objectives

The most critical component of a study is its research objectives and goals. The implementation of a study is successful only when its goals and research objectives are well defined with sufficient detail to guide its design, conduct, and analysis. The development of a clear and explicit a priori objective is vital for subsequent trial design and study conduct, especially if a sponsor wishes to seek a label claim or promote benefits of an intervention.

Stated objectives should convey concreteness and specificity, not vagueness and ambiguity. For example, the objective “To compare PROs between regimen A and regimen B” fails to provide specific information about the patient population of interest, time of assessment, and which aspects of a patient’s condition will be assessed and compared. The stated objectives should refer to what is being measured and not the measurement instrument. Clear specifications of the details can help to better design study protocols and are vital to the ultimate success of a clinical study. Here is an example of a specific and concrete objective: “Moodlift 20 mg, taken once daily, will lead to improvement in symptoms of depression, psychological function, and social function by 2 weeks among adult men with major depressive disorder” (Luo and Cappelleri 2008; Rothman et al. 2007). Based on the aspects of a patient’s condition under investigation, relevant PRO instruments and relevant domains of those instruments should be identified.

In addition to identifying the relevant domains of a PRO, the population of interest, and the time frame of interest, objectives should have clear hypotheses as to whether the intent is to obtain a label claim or not, demonstrate superiority or non-inferiority, and seek confirmatory or exploratory evidence. Different endpoints may serve different purposes; for example, one PRO endpoint may be sought for a label claim and require confirmatory evidence, whereas another PRO endpoint may be considered exploratory with no intention of a label claim.

When a PRO label claim is sought in the United States, sponsors of medicines are advised to place their research objectives and goals in terms of a conceptual framework, which may be useful in developing and refining the goals for PRO measurement. Guided by an appropriate conceptual model, which identifies and describes the PRO concepts and hypotheses that underlie a PRO-based product labeling claim, a conceptual framework explicitly defines or depicts the relationships between the items in a PRO instrument and the concept measured (Food and Drug Administration 2009; Rothman et al. 2007; Snyder et al. 2007). The concept is the specific measurement goal, that is, the attribute or characteristic measured by a PRO instrument.

If the desired overall claim, for instance, is “product X reduces problems with swallowing and speaking to others and improves daily activities for individuals with head and neck cancer,” the diagram in Fig. 1 depicts a plausible conceptual framework of a PRO instrument where a set of items is associated with a specific domain, such as “swallowing,” “speaking,” and “basic activities of daily living”; moreover, the domains represent related but separate concepts (Patrick et al. 2007).
Fig. 1 Diagram of the conceptual framework of a patient-reported outcome instrument
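A conceptual framework of the kind shown in Fig. 1 can be made concrete by writing down which items feed which domain score. The following is a minimal sketch under stated assumptions and not the scoring algorithm of any actual instrument: the item names, the 0 to 4 response range, and the rule that a domain score is the mean of its items are all hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical item-to-domain mapping; the item names are invented.
FRAMEWORK = {
    "swallowing": ["swallow_liquids", "swallow_solids"],
    "speaking": ["speak_phone", "speak_in_person"],
    "basic activities of daily living": ["dressing", "eating", "bathing"],
}


def domain_scores(responses):
    """Return one score per domain: the mean of that domain's item responses.

    `responses` maps item name -> numeric response (assumed 0-4 here).
    A domain with any missing item is scored as None rather than guessed.
    """
    scores = {}
    for domain, items in FRAMEWORK.items():
        values = [responses.get(item) for item in items]
        scores[domain] = None if None in values else mean(values)
    return scores


def overall_score(scores):
    """General-concept score: mean of the domain scores that are available."""
    available = [s for s in scores.values() if s is not None]
    return mean(available) if available else None


# Invented responses for one patient.
patient = {
    "swallow_liquids": 1, "swallow_solids": 3,
    "speak_phone": 2, "speak_in_person": 2,
    "dressing": 0, "eating": 3, "bathing": 0,
}
scores = domain_scores(patient)
overall = overall_score(scores)
print(scores)
print(overall)
```

Keeping the mapping in one place mirrors the role of the framework itself: each item belongs to exactly one domain, and the domains are scored separately before any more general score is formed.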
An instrument may create a single score, thereby measuring a single concept, or, as in Fig. 1, may be developed with multiple domain scores each represented by a concept, possibly within a more general concept of measurement, represented by the “head and neck cancer-specific function” domain. The conceptual model of a PRO instrument will evolve and be confirmed over the course of measurement development as a sponsor gathers empirical evidence to support item grouping and scores (Food and Drug Administration 2009).

Related to the conceptual framework, an endpoint model should be described and depicted if a label claim is to be sought in the United States. It represents a diagram of the hierarchy of relationship among endpoints, both PRO and non-PRO, that corresponds to the clinical trial’s objectives, design, and data analysis plan (Food and Drug Administration 2009). Figure 2 depicts a hypothetical endpoint model for a head and neck cancer example (Patrick et al. 2007). Primary endpoints here include overall survival which, if statistically significant and medically important, would be sufficient for a claim. If this survival endpoint showed a statistically significant and clinically meaningful treatment benefit, the domains of the PRO instrument – “swallowing,” “speaking,” and “basic activities of daily living” – are subsequently listed in order of importance as complementary endpoints that may result in a claim.

Selection of Subjects

It is strongly recommended that protocol eligibility, whenever possible, be restricted to patients willing and able to participate in the PRO assessment. This challenging recommendation is motivated by the following two fundamental rationales (Gotay et al. 1992). The first is practical. Study implementation is easier and more efficient when all patients require the same assessments (PROs as well as non-PROs). The second is scientific.
Fig. 2 Hypothetical endpoint model for head and neck cancer example. Indication (concept): treatment of head and neck cancer, with overall survival as the primary endpoint. Supportive concepts (other treatment benefit): swallowing, speaking, and basic activities of daily living as secondary endpoints
Credibility and interpretation of the overall results, and their overall conclusions, are enhanced when all subjects are available for all endpoints.

Assessments on PROs should not be seen as optional by physicians or patients. Optional assessments would jeopardize the ability of study results to be generalized to the study population and, with randomization compromised, would likely lead to selection bias. The goal here is to avoid differential assessments on different patients because otherwise results can be biased, possibly seriously. All measurements over time should be sought for all patients, not just some, in order to maintain validity and extension of inferences.

Physical, cognitive, or language barriers may make the evaluation of PROs impossible in practice for specific groups of patients (Gotay et al. 1992). In this case, alternative strategies for collecting PRO data should be considered. Such strategies include translation and cultural fine-tuning of PRO instruments, assistance for patients with visual or auditory impairments, and proxy assessments for patients with cognitive deficits. However, given the need to include patients who are elderly or in minority populations, a preferred course of action is to make eligible patients a top priority and to have them complete all of their assessments in the same manner.

Longitudinal Designs

Patient-reported outcomes are often incorporated into a study by administering questionnaires at multiple time points with the goal of characterizing the outcome over time (Fairclough 2004, 2005, 2010). Such longitudinal data arise in most PRO investigations because interest centers on how a disease or intervention affects an individual’s functioning and well-being over time. The number and timing of PRO assessments are influenced by the study objectives, such as when meaningful change is expected, and practical considerations, such as patient burden. Key considerations in the design of a longitudinal study follow.

Event- or Condition-Driven Designs

When the objective of a study is to compare a PRO in subjects who experience the same type
of condition during a given phase of treatment, assessments can be planned at times when clinically relevant events are expected to occur or at times that correspond to a distinct, meaningful phase of the intervention or disease. Such assessment is more common for a design with a relatively short duration. Many variations exist. Among them, for example, is when differences in PRO values are expected during only the early period of therapy. A breast cancer trial of adjuvant therapy in which a 16-week dose-intensive regimen was compared with a more traditional 24-week regimen is an illustration of such an event-driven design (Fetting et al. 1998). Three assessments were planned – prior to (baseline assessment), during, and after therapy – where each phase of the disease or its treatment was considered distinct with respect to the PRO of interest.

In event-driven designs where each assessment is conceptually identified with a landmark event, repeated measures models for longitudinal data (with time taken as a categorical covariate) are an appropriate choice. Note that assessments for all subjects should be taken at the same points in time (e.g., week 6, week 10, and week 24), where points in time need not be equally spaced. Repeated measures models may also be useful in some studies with only a few assessments.

Time-Driven Designs

When the scientific questions involve a more extended period, or when the phases of the disease or its treatment are not distinct, the longitudinal designs are based on or driven by time (Fairclough 2005, 2010). These designs are appropriate for chronic conditions where therapies are given over extended periods, such as diabetes and arthritis.

In time-driven designs, the duration of therapy may be indeterminate at study onset, with therapy intended to be given to a patient until it is no longer efficacious or produces unacceptable toxicity. For instance, patients with advanced renal cell carcinoma were randomized to receive either repeated 6-week cycles of sunitinib (experimental) or interferon alpha (control) (Cella et al. 2008). Doses were adjusted in response to symptoms of toxicity. Treatment in both groups was continued until the occurrence of death, unacceptable adverse events, or withdrawal of consent. Patients were asked to complete the PRO questionnaires before any clinical activities during visits to the study clinics at screening, on days 1 and 28 of each 42-day treatment cycle, and at the end of treatment or study withdrawal.

Time-driven designs are associated with mixed-effect models for studies where time is often conceptualized and taken as a continuous variable. Mixed-effect models are useful when the timing of assessment differs widely among individuals, studies have a large number of PRO assessments, or changes over time are to be modeled with a smaller number of parameters than that required for a repeated measures model (with time as a categorical covariate).

Timing of the Initial PRO Assessment

The initial assessment is the first and one of the most important assessments in a study. This initial assessment, usually referred to as a baseline assessment, plays a crucial role in the estimation of changes on PRO outcomes. If the baseline assessment is missing, all other data for that subject could be useless in the modeling of differences between treatments. It is also critical that the initial assessment occurs prior to randomization in randomized trials. Because the measurement of a PRO is generally based on self-evaluation, an initial assessment that follows randomization runs the risk that a subject’s responses are influenced by knowledge of treatment assignment (Brooks et al. 1998). This risk becomes especially evident when one of the interventions is a new, promising therapy.

Sometimes multiple assessments from daily patient diaries, completed before randomization, are collected and averaged to arrive at an overall baseline score. Such averaging increases the reliability (precision) of measurement relative to a single assessment. In two randomized, double-blind, placebo-controlled trials of pregabalin for fibromyalgia, a patient’s baseline score on self-
reported sleep quality was computed as the average rating over the 7 days prior to taking study medication (Russell et al. 2009). In this daily diary assessment, patients completed the rating in the morning upon awakening and reported the quality of their sleep over the past 24 h on an 11-point numeric rating scale ranging from 0 (“best possible sleep”) to 10 (“worst possible sleep”).

Timing of Follow-Up PRO Assessments

As with the timing of the initial PRO assessment, the timing of follow-up assessments should receive careful consideration (Fairclough 2010). A tenet of appropriate timing for follow-up assessments is that they should be made consistently across the treatment arms. It is important not to choose a particular time that will bias the results against one treatment or another. Measuring immediately after an untoward event such as toxicity will emphasize that experience while de-emphasizing the potential benefits of treatment and disease symptoms. When follow-up assessments on PROs are to be collected, they are usually positioned at all or some of the visits at which other clinical assessments or lab measurements are collected.

A major factor when deciding on the timing of the PRO assessment, both initially and subsequently, is the recall period of the PRO questionnaire. Because individuals have better recall for major events and more recent experiences, the period of accurate recall for measuring certain areas (e.g., erectile dysfunction, physical well-being) is between 1 and 4 weeks, whereas the period of recall for the frequency and severity of symptoms (e.g., pain, fatigue) is accurate over shorter periods such as at the time of patient completion of the PRO or the past 24 h. That said, it should be noted that the recall period established by the developers of the PRO instrument should be used. It is inadvisable to change the recall period of a PRO instrument to fit a particular study design; rather, a PRO instrument should be selected (or maybe even newly developed) to fit the study design. If a recall period for a PRO instrument was changed, some aspects, such as test-retest, should be reevaluated. To be considered statistically independent observations, the timing of one assessment should not have a recall period that overlaps with the timing of another assessment on the same instrument; assessments should be based on distinct recall periods.

Frequency of Evaluations

The frequency of the assessments depends on the natural history of the disease, the likelihood of meaningful changes during the study period, the recall period of a PRO (if the PRO is based on recall over the previous month, assessments should not be made weekly or daily), and how discontinuation of therapy relates to the research objective. All of these considerations should be balanced with practical considerations such as the burden placed on individuals who complete questionnaires and the timing of therapeutic and diagnostic interventions. Hence the assessments on PROs should be frequent enough to capture meaningful change over a sufficient duration but not so frequent as to cause excessive burden on participants.

In long-term studies with mortality as the primary endpoint, as in chronic heart failure trials, it is often useful to have more frequent assessments at the end of the study to enable detection of deterioration. If, on the other hand, rapid change is expected during the early part of a study, as is typically the case for renal cell carcinoma studies, more frequent assessments earlier on may be needed.

Assessments should not be more frequent than the period of recall defined for the PRO instrument. Instruments on satisfaction, functioning, and well-being are often based on the last 7 days or 4 weeks. Symptom assessment scales often use the last 24 h or ask about the severity right now. Shorter periods of recall are generally more appropriate when the severity of symptoms is being evaluated, with more rapid changes in symptoms requiring a shorter recall duration, while the same or longer periods may be required to assess the impact of those symptoms on activities of daily living. Such was the case in a
non-small cell lung cancer trial where the severity of multiple symptoms and the impact of those symptoms on daily functioning from chemoradiation were evaluated during the last 24 h before the start of this intervention and weekly for 12 weeks during and after it (Wang et al. 2006).

In many cases what is of real interest is not the integrated effect over a short period (e.g., 2-week period), but the effect at regular intervals (e.g., 2, 4, and 6 weeks), similar to how measurements might be made every 2 weeks in a blood pressure trial (Food and Drug Administration 2009). For regulatory claims on a PRO, the recall period with the shortest time frame consistent with the instrument’s purpose or intended use (e.g., when feasible, a recall period referenced to the patient’s current or recent state) is preferable to a recall period that is based on a longer period, a comparison of a patient’s current state with an earlier period, and a self-reported average over time (Food and Drug Administration 2009).

Patients who drop out of a study prematurely are generally more likely to have a less favorable score on a PRO because of side effects or no effect of treatment. A treatment arm with a high rate of dropout is likely to give an artificially more favorable outcome because only the healthiest of the patients remain on treatment, leading to selection bias and overly optimistic estimates of treatment effect. It is therefore desirable to have a PRO assessment in conjunction with premature withdrawal from the study. If the research objective extends to off-therapy assessments, then they can be made by continuing the PRO assessments after discontinuation. The off-therapy assessments can always be excluded if deemed uninformative or irrelevant to the research question. Including the off-therapy assessments after discontinuation allows them to be available should they be determined to be of interest.

Selection and Evaluation of the Measurement Instrument

The PRO measurement strategy should be operationalized according to what study questions are to be answered. It is necessary to understand and justify the relevance of the selected PROs to the target disease, patient population, and study setting. Information on relevance can be obtained from the medical literature, previous studies, and direct input from patients and other stakeholders like families and health-care professionals. What is also needed is an understanding of the epidemiology and burden of disease from the patient’s perspective and the postulated and empirical relationships between treatment, PROs, and other clinical outcomes.

The FDA guidance on PROs for a label claim in clinical trials recommends a wheel-and-spoke diagram as a way to organize the development process and provide the path by which the PRO can lead to a claim (Food and Drug Administration 2009; Patrick et al. 2007). The diagram is reproduced in Fig. 3. The five major steps highlighted in the diagram, which summarizes the iterative process used in developing a PRO instrument for use in clinical trials, apply regardless of whether sponsors use an existing instrument, modify an existing instrument, or develop a new instrument. This diagram encapsulates why the standards and preparations required for a PRO label claim are much more involved than when a label claim is not sought.

What follows is a series of key steps on good research practices that center on the common theme of selecting and evaluating a PRO measurement instrument, be it for a regulatory claim or not (Luo and Cappelleri 2008).

Step 1: Formulating Study Objectives

The evaluation of PROs begins with the formulation of study objectives (Fig. 4). If a sponsor wishes to seek a label claim or promote benefits of a drug, the development of a clear and explicit a priori objective is critical for subsequent trial design and study conduct. Stated objectives should convey concreteness and specificity, not vagueness and ambiguity, as stated in the section “Research Basis and Goals.”
Fig. 3 Development of a patient-reported outcome instrument for a label claim in an FDA application: an iterative process
(Source: Food and Drug Administration 2009)
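The elements that Step 1 and the section “Research Basis and Goals” ask an objective to pin down (intervention, population, PRO domains, time frame, and the type of hypothesis and evidence) can be written as a small record that reports what is still unspecified. This is only an illustrative sketch: the class, its field names, and the rule that flags a label claim backed by exploratory evidence are assumptions introduced here, and the claim, hypothesis, and evidence values attached to the Moodlift example are placeholders that the text does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProObjective:
    """One PRO study objective, split into the elements the text asks for."""
    intervention: str
    population: str
    domains: List[str]
    time_frame: str
    label_claim: bool   # is a label claim sought for this endpoint?
    hypothesis: str     # "superiority" or "non-inferiority"
    evidence: str       # "confirmatory" or "exploratory"

    def gaps(self) -> List[str]:
        """Return the elements still unspecified (an empty list means none)."""
        found = []
        for name in ("intervention", "population", "time_frame"):
            if not getattr(self, name).strip():
                found.append(f"{name} not specified")
        if not self.domains:
            found.append("no PRO domains named")
        if self.hypothesis not in ("superiority", "non-inferiority"):
            found.append("hypothesis type not specified")
        if self.evidence not in ("confirmatory", "exploratory"):
            found.append("evidence type not specified")
        # Assumed rule of thumb for this sketch: an endpoint intended for a
        # label claim should be backed by confirmatory evidence.
        if self.label_claim and self.evidence == "exploratory":
            found.append("label claim sought with only exploratory evidence")
        return found


specific = ProObjective(
    intervention="Moodlift 20 mg once daily",
    population="adult men with major depressive disorder",
    domains=["symptoms of depression", "psychological function",
             "social function"],
    time_frame="by 2 weeks",
    label_claim=True,          # placeholder value, not stated in the text
    hypothesis="superiority",  # placeholder value
    evidence="confirmatory",   # placeholder value
)
vague = ProObjective(
    intervention="regimen A vs regimen B",
    population="", domains=[], time_frame="",
    label_claim=False, hypothesis="", evidence="",
)
print(specific.gaps())
print(vague.gaps())
```

Applied to the two objectives quoted earlier, the specific one reports no gaps, while “To compare PROs between regimen A and regimen B” is flagged for every element it leaves open.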
Step 2: Developing or Selecting an Instrument

Instrument development can be an expensive and time-consuming process. It usually involves a number of considerations: qualitative methods (item generation, cognitive debriefing, expert panels, qualitative interviews, focus groups), data collection from a sample in the target population of interest, item reduction, instrument validation, translation, and cultural adaptation. The importance of establishing content validity through qualitative methods – ascertaining that the measured concepts cover what patients consider the important outcomes of the condition and its therapy – cannot be overemphasized (Fig. 4). It is essential that the patients’ perspectives be taken into account when developing PROs, as the whole point of the endeavor is to measure patients’ experiences; measurement properties have little value if the relevant concepts important to patients are not measured.

Adequacy of the development process of a PRO, especially for a label claim, is contingent on the qualitative interview strategy, description of qualitative interviews and focus groups, transcripts, coding procedure, and justification for each step in the development of an instrument (Patrick et al. 2007).

The whole procedure of instrument development and validation can easily require at least 1 year. Therefore, the use of a previously validated instrument is typically preferable to the development of a new instrument that requires validation. For researchers who are not familiar with various instruments, updated information on currently available instruments can be accessed from databases such as
Fig. 4 Key steps for selecting and evaluating patient-reported outcomes (Source: Reprinted with permission from Luo
and Cappelleri 2008)
the Patient-Reported Outcome and Quality of Life Instruments Database (http://www.proqolid.org) and the On-Line Guide to Quality-of-Life Assessment (http://www.OLGA-Qol.com).

With many instruments currently available, the choice of the most appropriate instruments becomes vital to the success of a study in which PROs are included as a key endpoint. In what follows, several issues that need to be taken into consideration for instrument selection are highlighted (Fig. 4).

Relevance of the Selected Instrument

As part of aligning a selected instrument with study objectives, the instrument should reflect the concrete, unambiguous questions being asked that are relevant to the targeted disease and study population. The instrument should also be able to measure intended benefits and harms of a treatment.

Psychometric Properties of an Instrument

The selection of an instrument must also consider the instrument’s measurement properties. Is the instrument measuring what it is intended to measure – is it valid? Does it give consistent measurements – is it reliable? The selected instrument must be psychometrically sound. Measurement characteristics including reliability and validity are fundamental aspects for judging the quality and merits of an instrument (Fayers and Machin 2007; Streiner and Norman 2008).

Reliability measures to what extent an instrument yields reproducible and consistent results. Evidence on two types of reliability is usually required. One is internal consistency reliability, and another is test-retest reliability. The internal consistency reliability assesses to what extent the items of a domain or subscale are correlated – to
what extent the items move in tandem to measure different aspects of the same concept. The assessment of internal consistency reliability is usually carried out using Cronbach’s alpha coefficient. Test-retest reliability measures to what degree an instrument gives similar scores when it is repeatedly administered to the same patient under a stable condition. It is often based on an intraclass correlation coefficient. For both Cronbach’s alpha and the intraclass correlation coefficient, a minimum value of 0.7 is considered acceptable (Fayers and Machin 2007).

Assessing reliability is not sufficient for the validation of an instrument. An instrument may be reliable (consistent and precise in measuring something) yet not measure what it is supposed to measure and hence not be valid. There are at least three major types of validity: content validity, construct validity, and criterion validity. Criterion validity is not assessed when there is no criterion or “gold standard” measure, as is often the case for most diseases.

Content validity concerns the extent to which the constituent items reflect the intended concept. The assessment of content validity usually involves critical examination of whether the items are comprehensive enough and clearly cover, without ambiguity, the concept of interest. Content validity is often evaluated by consulting with patients having the disease of interest, physicians, and specialists to ensure that the included items are clear, comprehensive, and acceptable.

Construct validity is another fundamental characteristic of a measurement instrument and assesses to what extent an instrument measures the construct or concept it is supposed to measure. The assessment of construct validity often begins with postulating a relationship between the concept (construct) of interest and other related or unrelated measures or characteristics. Data are then collected, and the assessment is conducted. If the results confirm the postulated relationship, evidence exists to support construct validity.

Different methods can be used to establish construct validity. For example, construct validity can be assessed by comparing instrument scores among different groups of patients that are clinically distinct and anticipated to score differently (discriminant validity). Construct validity can also be assessed by correlating instrument scores with other measures that are theoretically related (convergent validity) or unrelated (divergent validity) to the underlying concept measured by the instrument.

In addition to corrected item-to-total correlations (correlations between an item and the sum of the other items on the same domain), items in multi-item scales are often evaluated and confirmed by factor analysis. A “factor” is a latent variable, that is, an unobserved or hidden variable; the term “factor” may be defined and interchanged with the terms “domain,” “construct,” or “concept.” A latent variable is a hypothetical construct that is not directly observed but whose existence is inferred from the way it influences the observed or manifest variables. Examples of a latent variable include depression and anxiety.

The statistical technique that can govern and quantify those interrelationships is factor analysis. Factor analysis is a multivariate statistical method concerned with detecting and analyzing patterns based on the correlations among quantitative variables. For PRO assessment, it attempts to identify groups of items such that there are strong correlations among all items within the same domain and weaker correlations among items in different domains. Factor analysis is used mainly for the structural development and validation of scales.

Exploratory and confirmatory factor analyses are two major approaches to factor analysis (Brown 2006; Cappelleri and Gerber 2010; Fayers and Machin 2007). In factor analysis, the underlying structure of a set of measured items is summarized by a smaller set of latent (unobserved) factors that manifest themselves via the measured items. An objective is to identify the number and the nature of the factors that are responsible for covariation in the data and to determine the domain structure of a questionnaire (which items represent which domains), which is what exploratory factor analysis addresses. The domain structure may be unidimensional or multidimensional with several factors or domains (sometimes also called subscales). A further objective may be to confirm an existing domain structure in a separate,
independent group of individuals from the same language availability, time required to complete
population, which is what confirmatory factor the instrument, patient ability to complete the
analysis addresses. questionnaire, the rate of refusal, and percentage
It is difficult to fully and completely prove of missing items. All of these issues, each an
construct validity. Instead, researchers rely on important element itself, should be thought out
accumulating amounts of evidence to demonstrate when selecting an instrument.
that an instrument is valid in measuring the con-
cept of interest.
Responsiveness, which can also be viewed as Step 3: Developing Data Collection
another type of validity, is the ability of an instru- Strategies
ment to detect small but important changes within
a group over time. Responsiveness is one of the After determining which instrument will be used
most essential characteristics of an instrument; a in an evaluation on PROs, a carefully planned data
nonresponsive instrument has little use to discern collection strategy should be built into study
true drug effects. Two of the most commonly used design and research protocol to ensure high qual-
measures of responsiveness are the standardized ity of data (Fig. 4). Although this is true of any
response mean and the effect size. The standard- serious study design and research, the fact that
ized response mean is the ratio of the mean change PROs are based on a patient’s self-report makes
to the standard deviation of that change. The effect it even more important to develop a judicious
size is the ratio of the mean change to the standard strategy in order to prevent or minimize bias or
deviation of the initial measurement. The effect missing data. An important consideration when
size measure is commonly considered more developing the data collection strategies is the
appropriate than the standardized response mean time intervals that PROs are assessed, as
because the effect size uses natural variability discussed in the section “Longitudinal Designs.”
stemming from patients’ baseline values, which Time intervals of assessment should be based
are not influenced by the effects of treatment, in on disease progression, treatment response, drug
order to help quantify what magnitude of change side effects, duration of the study, and number of
would be important. Measures of responsiveness questionnaires. At a minimum, assessments of
like the effect size, being dimensionless, can be PROs should be performed at baseline and at the
used to compare the responsiveness of a new end of study. But intermediate follow-up measure-
instrument with that of existing ones. ments may be required to more fully capture
Related to responsiveness is sensitivity: the changes within group and between groups over
ability to detect known differences between treat- time. Therefore, a reasonable number of assess-
ment groups over time or at a specific time. Its ments to capture this trajectory should be planned
standardized measures of effect correspond to in a clinical trial.
those for responsiveness except that the mean Assessments of PROs are usually performed at
change is between groups instead of within group. the same time as clinical visits and are best com-
With the exception of content validity, which is pleted before professional encounters with
based on qualitative methods, measurement prop- non-PRO measures, which may influence a
erties are grounded in quantitative analysis usu- patient’s response on PROs. The mode of admin-
ally involving correlations, means and regression istration on PROs can be obtained by paper and
methods, as well as theoretical expectations. pencil, computer administration, electronic
Table 1 summarizes key measurement properties devices, or in-person or phone interviews. The
of a PRO. same PRO should use the same mode of adminis-
tration throughout the study.
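The contrast drawn above between the effect size and the standardized response mean can be made concrete with a short sketch. It assumes the conventional definitions, which the text does not spell out: the effect size divides the mean within-group change by the standard deviation of baseline scores, and the standardized response mean divides it by the standard deviation of the change scores. The function names and patient scores are hypothetical and purely illustrative.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Mean within-group change divided by the SD of baseline scores (assumed definition)."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean within-group change divided by the SD of the change scores (assumed definition)."""
    changes = [post - pre for pre, post in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Hypothetical PRO scores for five patients at baseline and end of study.
baseline = [40.0, 50.0, 60.0, 45.0, 55.0]
follow_up = [50.0, 55.0, 75.0, 50.0, 70.0]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"effect size = {es:.2f}, standardized response mean = {srm:.2f}")
```

Both statistics are dimensionless, so either could be compared across instruments; only the denominator differs, and the effect size denominator is untouched by treatment.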
Feasibility Standardized data collection procedures need
The final consideration on instrument selection is to be established to ensure that the data are col-
feasibility. Issues related to feasibility include lected consistently among different patients and
Table 1 Measurement properties for PRO instruments

| Measurement property | Type | What is assessed? | FDA review considerations |
|---|---|---|---|
| Reliability | Test-retest or intra-interviewer reliability (for interviewer-administered PROs only) | Stability of scores over time when no change is expected in the concept of interest | Intraclass correlation coefficient; time period of assessment |
| Reliability | Internal consistency | Extent to which items comprising a scale measure the same concept; intercorrelation of items that contribute to a score; internal consistency | Cronbach’s alpha for summary scores; item-total correlations |
| Reliability | Inter-interviewer reliability (for interviewer-administered PROs only) | Agreement among responses when the PRO is administered by two or more different interviewers | Interclass correlation coefficient |
| Validity | Content validity | Evidence that the instrument measures the concept of interest, including evidence from qualitative studies that the items and domains of an instrument are appropriate and comprehensive relative to its intended measurement concept, population, and use. Testing other measurement properties will not replace or rectify problems with content validity | Derivation of all items; qualitative interview schedule; interview or focus group transcripts; items derived from the transcripts; composition of patients used to develop content; cognitive interview transcripts to evaluate patient understanding |
| Validity | Construct validity | Evidence that relationships among items, domains, and concepts conform to a priori hypotheses concerning logical relationships that should exist with measures of related concepts or scores produced in similar or diverse patient groups | Strength of correlation testing a priori hypotheses (discriminant and convergent validity); degree to which the PRO instrument can distinguish among groups hypothesized a priori to be different (known groups validity) |
| Ability to detect change | | Evidence that a PRO instrument can identify differences in scores over time in individuals or groups (similar to those in the clinical trials) who have changed with respect to the measurement concept | Within person change over time; effect size statistic |

Source: Food and Drug Administration 2009
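As an illustration of one statistic named in Table 1, the following minimal sketch computes Cronbach’s alpha for a multi-item scale. It assumes the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), which the table itself does not give; the function name and the response matrix are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents, each a list of k item scores."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))  # regroup scores so each tuple holds one item
    item_variance_sum = sum(variance(column) for column in items)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: five respondents answering a three-item scale.
responses = [
    [1, 2, 2],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 5],
    [5, 5, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Values closer to 1 indicate that the items vary together; the sketch uses sample variances throughout and does not handle missing item responses.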
investigators and across various study sites. Before the start of the trial, data collection personnel and study monitors should be carefully and uniformly trained. A detailed guideline on the assessment of PROs should be prepared and serve as a reference book for study monitors and data collection personnel in order to handle issues arising from the assessment.
Missing data can occur at the item level for at least one but not all items on the questionnaire or at the questionnaire level for all of its items. The reasons for missing data should be recorded at the time of occurrence and later considered to lend insight into the potential patterns for why data are missing. Because data quality is directly linked to the validity of study findings, researchers should
have a thorough understanding about the data collection process along with potential issues and biases inherent in this process. Such knowledge can help facilitate the development of appropriate data analysis plans to understand and minimize potential bias.
If missing data do occur for some but not all items on the questionnaire, the non-missing data may still be used for analysis based on some prespecified criteria, usually recommended by the developers of the questionnaire. For example, the EORTC QLQ-C30 (European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30) consists of five functional scales (physical, role, cognitive, emotional, and social), three symptom scales (fatigue, pain, nausea and vomiting), a global health status scale, and six single-item scales (Fayers et al. 2001). The EORTC QLQ-C30 Scoring Manual specifies that under certain conditions, missing values will be imputed for multi-item scales. Specifically, if at least half of the items from the scale have been answered, the missing items are assumed to have values equal to the average of those items which are present for the respondent. For example, the physical function subscale consists of 5 items, and this scale can be estimated whenever at least 3 of its 5 constituent items are present. More is said about missing data in the section “Missing Data.”
Sample size estimation is an indispensable part of a data collection strategy and depends on the study objective. In principle, there are no major differences in planning studies for a comparison between treatment groups using PROs compared with using non-PRO clinical measures such as blood pressure levels. As such, sample size estimation for PROs will require specification of the significance level, statistical power, anticipated difference or effect size, expected dropout rate, and type of data and method of analysis (Fayers and Machin 2007). As already stressed, it is important and necessary to clearly state and limit the major PROs of interest in the study protocol. Doing so is especially relevant for sample size purposes.
Sample size estimation for PROs becomes specialized for psychometric techniques like factor analysis and reliability, where the objective concerns an instrument’s measurement properties rather than a comparison between treatment groups. Factor analysis is a large-sample procedure, and a valid factor analysis typically involves hundreds of subjects. Sample size estimation for factor analysis depends on several elements such as the distribution of items and correlations between them. One suggested rule of thumb is a sample size of at least ten times the number of items for an exploratory factor analysis (Fayers and Machin 2007) and at least ten times the number of parameters (measurement-error variances, covariances among domains, factor loadings) for confirmatory factor analysis (Brown 2006). Sample size estimation for test-retest reliability can be based on Fisher’s Z transformation for an intraclass correlation using a confidence interval approach (Streiner and Norman 2008).
Although repeated measures and mixed-effect models are often used in the analysis of PRO measurements over time, sample size estimation is most commonly based on calculating the expected difference in the group means at a single time point rather than over time. This calculation amounts to sample size estimation for a univariate analysis and in most cases provides a conservative (larger than necessary) estimate of the sample size. Procedures are also available for the estimation of sample size in a longitudinal analysis with a repeated measures model or mixed-effect model (Fairclough 2010; Fitzmaurice et al. 2011).


Step 4: Analyzing Data

The next step in the evaluation of PROs is to develop a prespecified, comprehensive, and detailed plan for data analysis (Fig. 4). For a clinical trial, the statistical analysis plan (SAP) on PROs is best integrated with other study endpoints as part of an overall analytic strategy. Gains in efficiency arise when PROs are integrated and unified with other endpoints in the SAP.
The SAP part on PROs should be clear and concise, and yet complete and comprehensive, about the stated objective. In addition to the data
analysis on PROs, the SAP should also include a brief description on how the instruments are selected, how domains belonging to an instrument are scored, and how missing items of an instrument are handled. The development of the data analysis plan should be based on study objectives and may vary among different phases of clinical trials. For example, for a phase II trial intended to explore the potential impact of a specific drug treatment on PROs, the analysis plan can focus on a comprehensive descriptive analysis and, if suitable, an inferential analysis. Basic statistics such as instrument compliance rate, the observed mean of domain scores (along with confidence intervals such as a 95% confidence interval), and the observed mean change from baseline (and its 95% confidence interval) to each follow-up time should be included within each group. Additionally, if a trial has multiple arms, a comparison of the domain scores between arms is typically worthwhile to include by analyzing (and then reporting) the between-group difference in changes from baseline to each follow-up time, along with the corresponding difference in mean changes and its 95% confidence interval.
For a phase III trial, especially one intended for a label claim based on a PRO outcome, inferential statistics (hypothesis testing and confidence intervals) should be the focus of the analysis plan, along with a detailed descriptive summary. Regardless of the phase of the study, data on PROs should be treated just like any other study endpoints and held to the same analytical rigor.
As discussed in the section “Longitudinal Designs,” event-driven designs are generally associated with a repeated measures longitudinal model, where time is a categorical covariate. Restricted maximum likelihood estimation of repeated measures models can account for incomplete data and time-varying covariates. Time-driven designs are associated with mixed-effect longitudinal models via growth curve models, where time is taken as a continuous covariate. It is generally enough for these models to include polynomial or piecewise linear models and typically allow one to three random effects (intercept; intercept and slope; intercept, slope, and additional variation over time). Both repeated measures models and mixed-effect models incorporate all available data and assume that data are missing at random.
Inferential testing of data on PROs should consider the analytical issues specific to the evaluation of PROs in a clinical trial. For example, many instruments have multiple domains, and each instrument may be measured a number of times. Multiple comparisons then become an important issue that deserves special consideration. Missing data usually occur in PRO studies. How to handle the missing data also requires special consideration. More detail on these two issues follows.

Multiple Testing
It has been well recognized that multiple comparisons of drug treatments can result in falsely significant results. Because data on a particular PRO are usually measured over a number of time points, and because the same study may comprise multiple PROs (or multiple subscales within the same PRO instrument), it becomes important to describe in the SAP how to deal with this multiplicity issue, especially if the evaluation in the clinical trial is intended for label claims based on PRO outcomes. Several methods can be applied to address multiple testing (Fairclough 2010).
One of the methods is to use summary measures or summary statistics. For many instruments, a single score can be constructed by aggregating data across different domains on the same questionnaire. Such a summary score can be used as the primary endpoint for hypothesis testing and, consequently, avoids the concern of repeated testing on multiple domains of the same instrument.
Summary measures can also be constructed on a particular subscale or domain of an instrument to summarize the repeated observations over time on an individual and then across individuals in the same treatment group. Examples include, for each treatment group, the average of within-subject posttreatment values, area under growth curve, and time to reach a peak or prespecified value. The use of these summary measures begins with the construction of a summary measure for each individual, follows with the analysis of a summary measure across individuals for a within
group, and then continues with a corresponding between-group comparison. For instance, it is possible to construct summary statistics on the repeated measures within a group of individuals by taking the average rate of change over time for a treatment group and then comparing these summary statistics between groups.
A potential problem with the use of the summary score is that significant changes in some specific domains may be masked and what is really measured may become clouded or convoluted, resulting in low confidence about the effect of treatment as measured by the summary score. A drawback of summary measures across time is that they do not fully capture the weighted and correlated nature of repeated observations on PROs over time.
Another way to minimize the problem of multiplicity is to restrict the number of key domains and time points to no more than a few. These key domains at specific time points should be prespecified in the SAP as primary endpoints for statistical inference. Other domains at other time points may be regarded as secondary endpoints. While this recommendation provides a straightforward way to handle the multiplicity issue, a major challenge is how to select the most appropriate domains and time points. One way to address this challenge is to rely on substantive knowledge, well-grounded theory, and research objectives in tandem with the nature of the disease and the intended effects of the interventions.
Often several endpoints, both PRO and non-PRO endpoints, would be of clinical interest. One suitable method is to test them using a gatekeeping strategy whereby secondary endpoints are analyzed and tested inferentially in a prespecified sequential order only after success on a primary endpoint (Food and Drug Administration 2009). More generally, the key endpoints are ranked from most important to least important from the list of endpoints considered most relevant. This process can be done using a sequential method by testing additional endpoints in a defined sequence, each at the usual 0.05 level of statistical significance. The analyses cease when a failure occurs. It is important that the clinical trial protocol specifies all relevant primary and secondary endpoints and their order for inferential analysis and testing.
The problem of multiplicity can also be addressed in several other ways including through p-value adjustment. Three types of p-value adjustment are commonly considered: (1) Bonferroni, (2) Bonferroni-Holm (step-down) procedure, and (3) Hochberg’s (step-up) method. Of the three methods, the Bonferroni procedure is the most conservative. In contrast, Holm’s procedure and Hochberg’s method are less conservative and may be preferable.

Missing Data
Missing data on PROs can have at least two major repercussions. At a minimum, the missing data will result in wider confidence intervals and reduced statistical power for detecting a treatment effect. The larger, more troublesome issue is the likelihood that missing data are closely linked to patients’ health and treatment, leading possibly to a biased estimation of treatment effects. Given these potential impacts, the SAP should clearly describe how to handle missing data, especially if the evaluation on PROs is intended for label claims or promotional use.
Missing data on PROs can occur as missing items or missing questionnaires. Missing items involve the lack of responses for some specific items; missing questionnaires involve patients who may fail to complete and return the whole questionnaire. Many instruments include well-documented procedures by their developers on how to handle missing items. Such recommendations by developers are typically the preferred way to address missing items.
Missing questionnaires are a more complex situation than missing items. Missing questionnaires can happen as a result of dropout from the study or randomly failing to fill out an entire questionnaire. In any of these situations, it is important to first analyze the rates (proportions) and reasons for missing data. Such information will help to gauge the severity of the nonresponse problem and the underlying mechanisms for missing data.
There are at least four approaches to address the missing data problem (Fairclough 2010). One
approach is to remove patients with missing or incomplete forms from the analysis and only analyze complete cases. While simple, this method is usually not recommended because it can break down initial randomization and reduce sample size and, in doing so, may produce biased results if the missing data are not missing completely at random. (Missing completely at random occurs when the missingness is unrelated to PRO value as when, e.g., a patient moves out of town or a staff member forgets to administer the questionnaire.)
A second approach is to impute the missing data. Different methods can be used for the imputation. The simplest way is to substitute the mean scores of patients with observed data for those with missing data (mean imputation). Unless the missing data are missing completely at random (MCAR), this mean imputation method may result in biased estimates and should be used cautiously. Another commonly used method is last observation carried forward, which replaces a patient’s missing value with his or her last completed observation. In the event that data on PROs may not remain stable over time, last observation carried forward may also be suspect and result in a biased representation (Mallinckrodt et al. 2008). Analogous to the last observation carried forward approach is the baseline observation carried forward approach, in which all missing values for a subject are replaced by his or her baseline observation. Relative to the method based on last observation carried forward, this method can produce more conservative results for treatment differences.
Some more sophisticated techniques have been developed including regression imputation, hot deck imputation, and cold deck imputation. All of these techniques, like the simple mean imputation and last observation carried forward, belong to a single imputation category in which a single value is imputed for a specific missing point. A major limitation with single imputation methods is that estimated standard errors are generally too small, as the imputed values are treated as actual data when in fact they are not. However, this obstacle can be overcome by multiple imputation, whereby several values are imputed instead of just one. The multiple imputation method, which improves the accuracy of the standard error, assumes that the missing data are missing at random (MAR), where the missingness depends only on the observed data such as the most recently observed PRO value.
A third approach to address the problem of missing data is through the application of a likelihood-based approach using repeated measures models or mixed-effect models (Fairclough 2010; Fitzmaurice et al. 2011; Mallinckrodt et al. 2008). In this approach, every subject would contribute his or her available (observed) measurements. Repeated measures models and mixed-effect models employ a likelihood-based approach that is considered attractive because it can provide valid estimates of treatment effects if missing data are MCAR or MAR, where the missing data are said to be ignorable.
The fourth approach is especially relevant when missing data are not MAR and hence depend on the (unknown) missing value, when missing data are said to be non-ignorable. In this case, selection models or pattern-mixture models, which do not assume that data are either MCAR or MAR, should be considered as secondary models in sensitivity analyses. For the analysis of longitudinal data, it is generally preferred to consider, depending on the circumstances, a repeated measures model or mixed-effect model as the main model and multiple imputation or pattern-mixture models (or both) as secondary models.
The National Research Council has produced an authoritative account on the prevention and handling of missing data in clinical trials (National Research Council 2010), which can be relevant to prevention and handling of missing PRO data.


Step 5: Reporting Data

The reporting of data on PROs is a critical component of their evaluation (Fig. 4). Data on PROs should be presented clearly, concisely, and sufficiently to foster clarity, transparency, and comprehension. While a table is a useful way to summarize study results, graphical presentation
is especially appealing in simplifying and depicting the longitudinal and multidimensional nature of data on PROs (Fayers and Machin 2007). Whether a table or graph is used, it is imperative to present information as comprehensively and practically as possible. For example, data on the number of subjects completing the PRO evaluation at each treatment assessment should be reported, as should metrics of variability such as confidence intervals or standard errors of estimates.


Interpreting Study Findings

The data analysis may show a statistically significant difference on scores of PROs between treatment groups at a specific time or a significant change within or between groups over time. In addition to statistical significance, a natural ensuing question is whether the treatment difference or change is clinically meaningful (Fig. 4). It has been well recognized that statistical significance may not imply clinical significance. For example, a small difference on PRO scores between two treatment groups may be statistically significant given a large sample size, but clinical relevance may be scant or difficult to interpret in a meaningful manner. Understanding the degree of difference on scores of PROs that is considered to be clinically meaningful can enhance the application and interpretation of PROs.
A number of methods have been proposed for establishing meaningful change in PROs. These methods can be grouped into two broad categories: anchor-based and distribution-based approaches (Fayers and Machin 2007; Food and Drug Administration 2009; Revicki et al. 2008).
Anchor-based methods are those in which differences at a given time or changes over time in PROs are linked (or anchored) to differences or changes in an external clinical measure (e.g., patients’ global rating of change and clinical rating of disease severity) or to a yardstick value or even to part of the PRO measure under consideration. When used as an external clinical measure, an anchor should bear an appreciable correlation to the PRO and have clinical understanding and import. Anchor-based approaches include percentages based on thresholds (the percentage of patients above and below some specified value); criterion-group interpretation (the comparison of scores from the particular group of interest to a group or external variable worthy of comparison); content-based interpretation (a representative item internal to the multi-item PRO itself); and clinically important difference (a difference on a PRO that is deemed clinically relevant). For example, a criterion-group interpretation would involve a comparison of PRO scores in the population of interest with norm-based PRO scores from a general population or with external variables such as utilization of health-care services and ability to work.
Distribution-based approaches use the statistical characteristics of the sample (e.g., mean and standard deviation) or instrument (e.g., reliability) to suggest a clinically meaningful change. Distribution-based approaches include effect size, probability of relative benefit, and responder analysis and cumulative proportions. A widely used distribution-based method is the effect size, discussed earlier in the section “Psychometric Properties of an Instrument.” The approach based on the probability of relative benefit, which is based on ridit analysis using the Wilcoxon rank-sum test statistic, gives the probability of a randomly selected individual on one treatment arm having a more favorable score than a randomly selected individual on the other treatment arm (Alcion et al. 2006). Because a distribution-based approach like effect size and probability of relative benefit is derived purely from a statistical distribution, and not from patient input, it does not provide an estimation of clinical significance per se.
According to the FDA final guidance on PROs for a label claim, it is recommended to display individual responses using an a priori responder definition: the threshold value on an individual PRO change score that is to be interpreted as a treatment benefit (Food and Drug Administration 2009). The proportion of subjects meeting the responder definition can then be reported for each treatment group and compared between groups. As stated in the FDA guidance, it is
Fig. 5 Illustrative cumulative distribution functions of two treatment groups where more negative change scores are better (solid line = experimental group, dashed line = control group). Vertical axis: Cumulative Proportion (0 to 100); horizontal axis: Change From Baseline (−35 to 35).
usually useful to display individual responses, often using an a priori responder definition (i.e., the individual PRO score change over a predetermined time period that should be interpreted as a treatment benefit). The responder definition is determined empirically and may vary by target population or other clinical trial design characteristics. The empiric evidence for any responder definition is derived using anchor-based methods, which explore the association between the targeted concept of the PRO instrument and the concept measured by the anchor (or anchors). To be useful, the anchors chosen should be easier to interpret than the PRO measure itself.
A cumulative distribution function can display a continuous plot of the change from baseline on the horizontal axis and the cumulative percent of patients experiencing up to that change on the vertical axis. Consider a situation where lower change or more negative scores are better or more favorable (Fig. 5). In Fig. 5, 70% of the subjects in the experimental group had change scores of 10 or less (i.e., 10 or better) compared with 55% of the subjects in the control group. The consistent horizontal separation between the distribution functions suggests that the treatment was beneficial relative to control over the entire range of changes.
Responder analysis and cumulative distribution functions are best suited as descriptive displays and as an adjunct to (that is, as a complement and supplement to) the main analysis based on the full original scale of measurement using established statistical methods (e.g., repeated measures models or mixed-effect models when the data are longitudinal).


References

Alcion L, Petersen JL, Temple S, Arndt S. Probabilistic index: an intuitive non-parametric approach to measuring the size of treatment effects. Stat Med. 2006;25:591–602.
Brooks MM, Jenkins LS, Schron EB, Steinberg JS, Cross JA, Paeth DS. Quality of life at baseline: is assessment after randomization valid? Med Care. 1998;26:1515–9.
Brown TA. Confirmatory factor analysis for applied research. New York: The Guilford Press; 2006.
Cappelleri JC, Gerber RA. Exploratory factor analysis. In: Chow S-C, editor. Encyclopedia of biopharmaceutical statistics. 3rd ed., revised and expanded. New York: Informa Healthcare; 2010. p. 480–5.
Cella D, Li JZ, Cappelleri JC, Bushmakin A, Charbonneau C, Kim ST, Chen I, Michaelson MD, Motzer RJ. Quality of life in patients with metastatic renal cell carcinoma treated with sunitinib versus interferon-alfa: Results from a phase III randomized trial. J Clin Oncol. 2008;26:3763–9.
Fairclough DL. Patient reported outcomes as endpoints in medical research. Stat Methods Med Res. 2004;13:115–38.
Fairclough DL. Analysing longitudinal studies of QoL. In: Fayers P, Hayes R, editors. Assessing quality of life in clinical trials. Oxford: Oxford University Press; 2005. p. 149–65.
Fairclough DL. Design and analysis of quality of life studies in clinical trials. 2nd ed. Boca Raton: Chapman & Hall/CRC; 2010.
Fayers PM, Machin D. Quality of life: the assessment, analysis and interpretation of patient-reported outcomes. 2nd ed. Chichester: Wiley; 2007.
Fayers PM, Aaronson NK, Bjordal K, Groenvold M, Curran D, Bottomley A. On behalf of the EORTC quality of life group. In: EORTC QLQ-C30 scoring manual. 3rd ed. Brussels: EORTC; 2001.
Fetting JJ, Gray R, Fairclough DL, Smith TJ, Margolin KA, Citron ML, Grove-Conrad M, Cella D, Pandya K, Robert N, Henderson C, Osborne K, Abeloff MD. A 16-week multidrug regimen versus cyclophosphamide, doxorubicin and 5-flurouracil as adjuvant therapy for node-positive, receptor negative breast cancer: an intergroup study. J Clin Oncol. 1998;16:2382–91.
Fitzmaurice GM, Laird NM, Ware JH. Applied longitudinal analysis. 2nd ed. Hoboken: Wiley; 2011.
Food and Drug Administration. Guidance for industry on patient-reported outcome measures: Use in medical product development to support labeling claims. Fed Regist. 2009;74(235):65132–3.
Gotay CC, Korn EL, McCabe MS, Moore TD, Cheson BD. Building quality of life assessment into cancer treatment studies. Oncology. 1992;6:25–8.
Johnson JR, Temple R. Food and drug administration requirements for approval of new anticancer drugs. Cancer Treat Rep. 1985;69:1155–9.
Luo X, Cappelleri JC. A practical guide on interpreting and evaluating patient-reported outcomes in clinical trials. Clin Res Regul Aff. 2008;25:197–211.
Mallinckrodt CH, Lane PW, Schnell D, Peng Y, Mancuso JP. Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials. Drug Inf J. 2008;42:303–19.
National Research Council. The prevention and treatment of missing data in clinical trials. Washington, DC: The National Academies Press; 2010.
Patrick DL, Burke LB, Powers JH, Scott JA, Rock EP, Dawisha S, O’Neill R, Kennedy DL. Patient-reported outcomes to support medical product labeling claims: FDA perspective. Value Health. 2007;10:S125–37.
Revicki D, Hays RD, Cella D, Sloan J. Recommended methods for determining responsiveness and minimally important differences for patient-reported outcomes. J Clin Epidemiol. 2008;61:102–9.
Rothman ML, Beltran P, Cappelleri JC, Lipscomb J, Teschendorf B, Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes: conceptual issues. Value Health. 2007;10:S66–75.
Russell IJ, Crofford LJ, Leon T, Cappelleri JC, Bushmakin AG, Whalen E, Barrett JA, Sadosky A. The effects of pregabalin on sleep disturbance symptoms among individuals with fibromyalgia syndrome. Sleep Med. 2009;10:604–10.
Snyder CF, Watson ME, Jackson JD, Cella D, Halyard MY, Mayo/FDA Patient-Reported Outcomes Consensus Meeting Group. Patient-reported outcomes instrument selection: designing a measurement strategy. Value Health. 2007;10:S76–85.
Streiner DL, Norman GR. Health measurement scales: a practical guide to their development and use. 4th ed. New York: Oxford University Press; 2008.
Wang XS, Fairclough DL, Liao Z, Komaki R, Chang JY, Mobley GM, Cleeland CS. Longitudinal study of the relationship between chemoradiation therapy for non-small-cell lung cancer and patient symptoms. J Clin Oncol. 2006;24:4485–91.
Wiklund I. Assessment of patient-reported outcomes in clinical trials: the example of health-related quality of life. Fundam Clin Pharmacol. 2004;18:351–63.
Micro-simulation Modeling
24
Carolyn M. Rutter
Contents
Introduction
Development of a Microsimulation Model
Step 1: Define the Decision Problem
Step 2: Specify the Model Structure
Step 3: Identify Data Sources
Step 4: Select Model Parameters
Implementation
Example: Comparison of Two Tests to Screen for Colorectal Cancer
Sensitivity Analysis
Exploration and Description of Model Uncertainty
Model Validation
In Conclusion
References
C. M. Rutter (*)
RAND Corporation, Santa Monica, CA, USA
e-mail: crutter@rand.org
https://doi.org/10.1007/978-1-4939-8715-3_35

Abstract

Microsimulation models are a tool for informing health policy decisions. Models provide a structure for combining a wide range of evidence that represents the current understanding of both disease and interventions to prevent or treat disease. In the health policy context, microsimulation refers to simulation of an entire population by simulating life histories for individuals within the population. The basic structure of a microsimulation model includes a description of health states that describe key events in a disease process. Individuals occupy these health states, and the model includes rules describing how individuals transition between states. Models are developed by specifying states and transition rules that result in predictions that reproduce observed or expected results. Model parameters are selected to achieve good prediction through a process of model calibration. Once calibrated, models are used to predict population-level outcomes under different policy scenarios. Model predictions are increasingly being used to provide information to guide health policy decisions. This increased use brings with it the need both for better understanding of microsimulation models by policy researchers and continued
improvement in methods for developing and applying microsimulation models. This chapter reviews the process of developing and applying a microsimulation model, drawing from guidelines for best practices for simulation outlined by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) and the Society for Medical Decision Making (SMDM) (Caro et al. 2012).

Introduction

Microsimulation models for health policy are a type of decision analytic model that describes disease processes by simulating key events that occur as disease develops. Their purpose is to help decision makers identify trade-offs associated with different policy decisions. For example, the National Cancer Institute has advanced the use of models for cancer outcomes through the Cancer Intervention and Surveillance Modeling Network (CISNET) (2014). CISNET models have been used to inform policy recommendations regarding use of newer colorectal cancer screening tests (fecal immunochemical tests, stool-based DNA, and computed tomography colonography) and to assist in development of guidelines for breast and colorectal cancer screening. As early as the 1980s, models were used by the American Cancer Society to aid in guideline development for cervical cancer screening and by the US Congress's Office of Technology Assessment for evaluation of cervical and breast cancer screening policy (Eddy 1987; Muller et al. 1990).

Models have also been used to inform policy and clinical practice related to medications, radiology, vaccination, and HIV screening (Mandelblatt et al. 2012). Examples of policy-relevant findings from models include overdiagnosis of prostate cancer among PSA-detected cases (Etzioni et al. 2002); identification of efficient cervical cancer screening policies (van der Akker-van Marle et al. 2002); and the impact of modifiable risk factors, screening, and treatment on colorectal cancer (CRC) mortality rates (Vogelaar et al. 2006).

Models can help decision makers choose among competing courses of action by structuring and combining a wide range of evidence, including information about disease process and clinical and economic outcomes, and then predicting patient outcomes based on this evidence. Microsimulation models are used to predict outcomes under different policy scenarios and are especially useful for outcomes that cannot readily be studied via direct observation for ethical or practical reasons. Model predictions may extend cross-sectional results to longitudinal predictions, extend results to different patient populations, or make direct comparisons not made in available randomized trials. For example, randomized trials demonstrate that both fecal occult blood testing (FOBT) (Hardcastle et al. 1996; Kronborg et al. 1996; Towler et al. 1998) and flexible sigmoidoscopy (Atkin et al. 2010) reduce CRC mortality. There is no direct evidence that either optical colonoscopy or CT colonography reduces mortality, though several studies have estimated their sensitivity and specificity for detecting colorectal adenomas (the primary precursor of colorectal cancer) (Hixson et al. 1990; Johnson et al. 2008; Rex et al. 1997). Microsimulation models for colorectal cancer have been used to combine available information about the natural history of disease and screening tests to compare the effectiveness and cost-effectiveness of all four of these screening modalities (Knudsen et al. 2010; Lansdorp-Vogelaar et al. 2010).

Development of a Microsimulation Model

Table 1 shows the steps in developing a microsimulation model.

Step 1: Define the Decision Problem

The first job of the modeler is to define the decision problem, that is, the modeling objectives. It is important to be clear about the objectives, because these will drive model structure and complexity. Modeling is a collaborative process. Consulting with experts knowledgeable about the targeted
disease from the outset will help to ensure development of a useful model that addresses important policy questions. Clinicians and epidemiologists who are familiar with the disease process can help inform the model structure to ensure face validity of the model and can provide insight into key questions that cannot readily be addressed through direct observation. Policy makers and other end users can help to determine necessary model output and provide additional insight into policy questions.

Three key questions, described below, need to be considered when defining the decision problem.

Table 1 Steps in developing a microsimulation model
Step 1: Define the decision problem
  What interventions will be modeled?
  What events are of interest?
  What is the target population and what subgroups are of interest?
Step 2: Conceptualize the model structure
  Will models describe events in discrete or continuous time?
  What disease states and characteristics will the model describe?
  When (and how) do individuals transition between states?
Step 3: Identify and select data sources
  Which data will inform the model?
  How will each data source inform the model: as an input, calibration target, or validation target?
Step 4: Select model parameters
  Which parameters are "inputs" and which parameters will be calibrated?
  Which goodness of fit measure will be used to guide calibration?
  Which calibration method will be used for parameter selection?

What Interventions Will Be Modeled?

Interventions can include primary prevention of disease, screening for purposes of early detection, methods for diagnosing disease, and treatment after diagnosis. The action of the intervention will determine key health states that need to be included in the model structure. For example, models for screening need to describe disease states that occur before clinical (symptomatic) presentation, because screening affects the disease process through detection of preclinical states.

What Events Are of Interest?

Events that are outcomes, such as cases of and deaths from the disease of interest, need to be described by the model. All-cause death is another event that is almost always modeled, because it enables calculation of life-years gained (or lost) that result from intervening on the disease process. The events that are modeled are closely related to the interventions of interest. Models for prevention and screening need to describe preclinical (asymptomatic) disease processes. In contrast, models that focus on treatment of detected disease need to describe remission and recurrence.

What Is the Target Population and What Subgroups Are of Interest?

Those eligible for intervention often define the target population, with the earliest age of intervention defining the beginning of the age range, which may extend through the entire simulated life span. For example, models for cervical cancer screening focus on women who are 18 years and older, while models for breast cancer screening generally focus on women who are 40 years and older. Models examining treatment focus on patients diagnosed with disease. Specific subgroups may be defined by risk factors, such as race/ethnicity and family history or disease severity.

Some models are developed for very specific decision problems, while others are developed to address multiple problems. General purpose models tend to describe disease processes in greater detail, enabling modeling of the action of a wide range of possible interventions and capture of a wide range of possible outcomes. Therefore, models that are used for multiple decision problems tend to be more complex than more focused models. It can be difficult to choose the level of detail that will be described by the model. The modeler must strike a balance between simplicity, which eases communication of model assumptions, and complexity, which may increase face validity.
Example: CRC Screening

Screening is an effective tool for reducing colorectal cancer incidence and mortality. Screening can detect colorectal cancer at an early stage, when there are better chances of survival (Hardcastle et al. 1996; Imperiale 2013; Kronborg et al. 1996; Towler et al. 1998) and can also detect adenomas, the predominant precursor lesion in colorectal cancer, leading to disease prevention through their removal. Professional societies, including the American Cancer Society, the US Multi-Society Task Force on CRC, and the American College of Radiology, recommend a variety of options for CRC screening, including annual fecal occult blood testing, flexible sigmoidoscopy every 5 years, and colonoscopy every 10 years (Levin et al. 2008; Rex et al. 2009; U.S. Preventive Services Task Force 2008). These tests differ in terms of costs, screening intervals, and invasiveness. A key question faced by patients, providers, and policy makers is how best to screen for colorectal cancer, that is, which test or sequence of tests is most effective for preventing death from colorectal cancer.

In spite of a great deal of accumulated evidence demonstrating the effectiveness of individual colorectal tests, it is difficult to directly compare the effectiveness of different screening regimens. Colorectal cancer is a rare event, and so estimation of the effectiveness of screening to reduce cancer incidence requires large sample sizes, and estimation of the effectiveness of screening to reduce colorectal cancer mortality requires long-term follow-up of this large sample. Direct comparison of multiple screening regimens requires even larger samples. For even short-term outcomes, it is not feasible to directly compare the wide range of potential screening regimens, which include combinations of different tests given at various screening intervals. Models allow researchers to combine available evidence to evaluate a specific decision problem: the effect of different screening regimens on (lifetime) colorectal cancer mortality. The decision problem is further refined by addressing our three questions.

What Interventions Will Be Modeled?

To simplify this example, consider the impact of two screening interventions: colonoscopy every 10 years and annual fecal immunochemical test (FIT). For both interventions, screening begins at age 50 and continues up to and including age 75. Individuals with a positive FIT result are assumed to undergo colonoscopy, with a return to annual FIT screening in 10 years if no adenomas or cancers are detected at colonoscopy. Any adenomas detected at colonoscopy are assumed to be completely removed. Consistent with clinical practice, both screening interventions refer individuals to adenoma surveillance based on findings at colonoscopy: individuals with one or two small (<10 mm) adenomas detected have their next colonoscopy in 5 years; individuals with three or more adenomas or any large (≥10 mm) adenomas have their next colonoscopy in 3 years. These analyses simulate patients who are fully adherent to all tests. However, models could be developed to examine the effect of differential adherence across screening regimens. For example, models could simulate individuals with different overall rates of adherence for each test type or different rates of patient dropout from the two screening regimens over time.

As part of specifying the intervention, the sensitivity and specificity need to be defined for each test and for detection of both precursor lesions and cancer. FIT was assumed to have 0.95 specificity, so that it results in a positive test 5% of the time when no disease is present, including precursor lesions. FIT was assumed to have sensitivity, the probability of detecting disease when it is present, that depends on adenoma size: 0.05 for adenomas 5 mm and smaller, 0.10 for adenomas larger than 5 mm and less than 10 mm, 0.22 for adenomas 10 mm and larger, and 0.70 for preclinical cancers of any size. Colonoscopy is an endoscopic test that visually examines the entire large intestine (colon and rectum). Most but not all colonoscopies are complete, and lesions may be missed because they are beyond the reach of the endoscope. Colonoscopy was assumed to be complete to the cecum for 98% of exams. Tissue that is biopsied during colonoscopy is sent to pathology for definitive diagnosis, so colonoscopy has
perfect specificity. The sensitivity of colonoscopy was assumed to depend on the size of the lesion; the probability of missing a lesion that is s mm in diameter is given by $P(\text{miss} \mid \text{size} = s,\ s < 20) = 0.34 - 0.0349\,s + 0.0009\,s^2$, with perfect sensitivity for adenomas 20 mm and larger. The associated miss rates for lesions that are 1 mm, 5 mm, 10 mm, and 15 mm in size are 31%, 19%, 8%, and 2%, respectively.

What Events Are of Interest?

For this question, the key outcome event is colorectal cancer death. However, other-cause death also needs to be modeled to enable estimation of life-years saved and accurate description of the screened population. In addition, models need to describe the preclinical disease processes because screening can reduce mortality by its effect on two preclinical processes: (1) by detecting cancer at an earlier stage, before it has become clinically detected (through presentation with symptoms), and (2) by preventing disease through detection and removal of precancerous lesions (adenomas). It will be important to describe adenoma size in this model because both the probability of screen-detecting an adenoma and the probability that an adenoma transitions to cancer increase with increasing adenoma size.

What Is the Target Population and What Subgroups Are of Interest?

The decision problem in this example focuses on average-risk individuals, who begin screening at age 50. Individuals at high risk for colorectal cancer, because of family history of colorectal cancer or diagnosis with genetic conditions, often begin screening at earlier ages.

Step 2: Specify the Model Structure

Once the decision problem is defined, the modeler must specify the model structure (Roberts et al. 2012). The structure of the model is driven by the decision problem in combination with an understanding of the disease process, which may be rooted in empirical data representing the cumulative scientific knowledge. In this way, data may indirectly inform the model; however, data availability should not necessarily determine a model's structure. The structure of the model must be sufficient to address the decision problem, and this may require description of processes that cannot be directly observed (such as tumor growth). If the model structure is not supported by data, this limited understanding of the underlying disease process should be noted. Processes that are not well supported by data can be explored through sensitivity analysis.

When specifying a microsimulation model, the modeler must choose whether to model time as discrete or continuous, the distinct health states that the model will describe, and rules for transitioning between states.

Will Models Describe Events in Discrete or Continuous Time?

The decision to model time as continuous or discrete is closely tied to the type of model used for simulations. Different types of health policy models are described below, including some models that are not used for microsimulation.

Decision trees are relatively simple models that are used to describe outcomes for groups of individuals (Petitti 2000). At each branching point, the tree specifies the probability of each subsequent outcome, for example, whether an individual has disease and, among people who have disease, whether a test is positive or negative. Using a decision tree, alternative courses of action are compared by calculating the expected value of the outcome resulting from each pathway (i.e., multiplying the value assigned to each potential outcome by the probability that it occurs and summing). Because they do not explicitly incorporate time, decision trees are useful for simple decision problems with short time horizons, such as the short-term effects of diagnostic assessment, but they are not well suited to modeling of repeated events, such as a regimen of screening.

State transition models are more complex than decision trees and are useful for describing events over longer time frames than decision trees. State transition models incorporate time by updating state membership at discrete time intervals or cycles. Because only a single transition can
occur in each cycle, cycle length should be selected with the understanding that only one event can occur within a cycle. For example, if there are disease-free, preclinical disease, and clinical disease states, and individuals are required to pass through the preclinical state, then in one cycle individuals could transition from disease-free to preclinical disease, or from preclinical disease to clinical disease, but not from disease-free to clinical disease. Cycle length does not need to be uniform; it can depend on the state. However, shortening the cycle length for a given timeframe increases the total number of simulated transitions, increasing computational time.

State transition models that describe the transition of groups of individuals are called Markov process models (Beck and Pauker 1983; Siebert et al. 2012). Markov models assume that the probability of the transition from one state to the next depends only on the current state and is independent of prior history (i.e., how members got to the state). Because of this, Markov models are commonly described as "memory-less." For example, when using a Markov model for screening, the probability of transition to the next screening test depends only on the outcome of the current test rather than the entire simulated screening history.

Markov process models assume that individuals who occupy the same state are homogeneous, that is, they are governed by the same rules for transitioning into the next health state. The number of states must be increased when there is interest in patient subgroups with different transition probabilities that reflect differences in disease characteristics. The number of states can also increase when modelers relax the Markov assumption by carrying past health state information forward. Because of this, the number of disease states needed to adequately describe a disease process can quickly increase, a problem known as "state explosion." As the number of states increases, Markov process models become intractable.

Fig. 1 Bubble graph showing the states and allowed transitions between states for the colorectal cancer model

State transition models that describe the transition of individuals are a type of microsimulation model. Simulated individuals can be assigned characteristics (such as age, sex, or race), and the model can allow transitions to depend on these characteristics. By explicitly allowing individuals in the same state to be governed by different transition rules, microsimulation models are able to limit the total number of states. For example, consider the colorectal cancer model shown in Fig. 1, which includes six states: (1) alive and disease-free; (2) alive with one or more adenomas, but no cancer; (3) alive with preclinical cancer; (4) alive with detected cancer; (5) dead from colorectal cancer; and (6) dead from other causes. Suppose now that the model needs to allow all transitions to depend on sex. Using a Markov process model, this would require expansion to a 10-state model (assuming death states are the
same for men and women). In contrast, a state transition microsimulation model could describe this process using the same six states, by allowing transitions to depend on the sex of the simulated individuals, and some transitions could be modeled as identical for both men and women.

Discrete event simulation (DES) models are another type of microsimulation model that describes the movement of individuals through distinct disease states in continuous time (Karnon et al. 2012). Discrete event simulation models are useful when modelers can better characterize transitions with time-to-event models than with transition probabilities over fixed periods. For example, when modeling disease incidence, a state transition model would specify incidence probabilities that are tied to the model's cycle length (e.g., annual incidence probabilities), while DES models could use time-to-event (survival) models to simulate the age at disease incidence.

Models for infectious diseases are more complicated because they describe transmission of disease between individuals, and therefore individuals are not independently simulated (Pitman et al. 2012). Two broad types of models are used to simulate infectious disease at the population level: dynamic transition models and agent-based models. Dynamic transition models for infectious disease model groups of individuals and describe transitions using differential equations (Brauer and Castillo-Chavez 2013). These are also known as compartmental models, and they describe the transitions of individuals between compartments (or states) in continuous time. Agent-based models are an extension of discrete event simulation that allows interactions between individuals (Hunt et al. 2013; Luke and Stamatakis 2012). This chapter focuses on models that are useful for noninfectious diseases. However, many of the issues associated with DES and state transition microsimulation also apply to agent-based models.

In summary: State transition models describe individual disease trajectories in discrete time, with time periods given by cycle lengths. Discrete event simulation (DES) models describe individual disease trajectories in continuous time. Either modeling approach can be used to simulate individuals with specific characteristics (such as age, sex, or race).

What Distinct Disease States and Characteristics Will the Model Describe?

All models require specification of a set of mutually exclusive disease states that reflect the disease processes of interest, such as the six states shown for colorectal cancer in Fig. 1. This basic model must be expanded to evaluate endoscopic tests because large adenomas are easier to detect than small adenomas. Both state transition and DES models could address the need for adenoma size information by expanding the model to include the size of the largest adenoma (e.g., diminutive (<5 mm), small (5 to <10 mm), or large (≥10 mm)). Alternatively, DES models can describe adenoma growth as a continuous process, which essentially describes the time to reach various sizes. Modeling continuous growth requires assumptions about the nature of adenoma growth but allows flexibility in how adenoma size is incorporated into an intervention examined in the decision problem.

When (and How) Do Simulated Individuals Transition Between States?

Rules for moving individuals between states in a state transition model are based on cycle length, that is, how often state memberships are updated, and are given by probabilities for each possible transition.

Rules for moving individuals between states in DES models are based on time-to-event distributions or life tables that characterize the time between successive events or, possibly, continuous growth. Time-to-event distributions take positive values and include distributions typically used in survival analysis, such as exponential and Weibull distributions. While state transition models have a single type of parameter (transition probabilities), DES models can incorporate a range of parameter types that are associated with different time-to-event distributions.
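The contrast between the two kinds of transition rules can be illustrated with a minimal Python sketch. It is not drawn from any published model: the three states follow the disease-free, preclinical, and clinical example used earlier in this section, while the annual probabilities, the Weibull and exponential parameters, and the function names are hypothetical and chosen only for illustration. The state transition version updates state membership once per one-year cycle, so an individual cannot move from disease-free to clinical disease within a single cycle; the DES version instead draws event times directly from time-to-event distributions.

```python
import random

STATES = ["disease_free", "preclinical", "clinical"]

# Hypothetical annual transition probabilities (illustrative values only).
P_ONSET = 0.02     # disease_free -> preclinical, per one-year cycle
P_CLINICAL = 0.10  # preclinical -> clinical, per one-year cycle


def simulate_state_transition(rng, n_cycles=50):
    """Discrete time: state membership is updated once per cycle."""
    state = "disease_free"
    history = [state]
    for _ in range(n_cycles):
        u = rng.random()
        # At most one transition can occur within a cycle.
        if state == "disease_free" and u < P_ONSET:
            state = "preclinical"
        elif state == "preclinical" and u < P_CLINICAL:
            state = "clinical"
        history.append(state)
    return history


def simulate_discrete_event(rng, horizon=50.0):
    """Continuous time: event times come from time-to-event distributions."""
    events = []
    # Weibull time to preclinical disease (scale 60 years, shape 2; hypothetical).
    t_onset = rng.weibullvariate(60.0, 2.0)
    if t_onset < horizon:
        events.append((t_onset, "preclinical"))
        # Exponential sojourn time in the preclinical state
        # (mean 10 years; hypothetical).
        t_clinical = t_onset + rng.expovariate(1.0 / 10.0)
        if t_clinical < horizon:
            events.append((t_clinical, "clinical"))
    return events


if __name__ == "__main__":
    rng = random.Random(2019)
    history = simulate_state_transition(rng)
    print("state after 50 annual cycles:", history[-1])
    print("DES events (time in years, state):", simulate_discrete_event(rng))
```

In the state transition version, every event time is a multiple of the cycle length and each parameter is a probability; in the DES version, event times are real-valued and each distribution brings its own parameters (here a scale, a shape, and a rate).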
Example: Colorectal Cancer Model

The ColoRectal Cancer Simulated Population model for Incidence and Natural history (CRC-SPIN) (Rutter and Savarino 2010) is used as the primary example in this chapter, with assumptions described in section "Example: CRC Screening."

Will the Model Describe Events in Discrete or Continuous Time?

This example compares two different screening tests for colorectal cancer. Test performance depends on the number and size of adenomas. In addition, cancer incidence and survival both depend on age and sex. The CRC-SPIN model describes events in continuous time (discrete event simulation), enabling description of adenoma size and number using a limited number of states and allowing transitions to depend on age and sex.

What Distinct Disease States and Characteristics Will the Model Describe?

The CRC-SPIN model (as shown in Fig. 1) describes six disease states. Individuals are allowed to accumulate multiple adenomas while in the adenoma state and multiple adenomas and preclinical cancers while in the preclinical cancer state. In addition, the model simulates size and location characteristics for each adenoma. Figure 2 provides an example of the types of event histories the model will simulate, assigning date of birth as time zero across individuals. Figure 3 depicts a single screening event, at the same age for all individuals.

Fig. 2 Line graph showing a hypothetical sequence of simulated events in the colorectal cancer model

Fig. 3 Line graph showing the simulated effect of screening in the colorectal cancer model, using symbols shown in Fig. 2

For these hypothetical trajectories:

• Benefit is possible for trajectories A and B because screening has the potential to prevent disease through adenoma removal (A) or to detect cancer at a potentially earlier stage before it becomes symptomatic (B).
• Benefit is also possible for trajectory C, in terms of cancer incidence, because screening has the potential to avert symptomatic disease. However, for this trajectory screening does not improve survival because other-cause death is simulated to occur before cancer death.
• There is no possible benefit for trajectories D, t after initiation. The minimum detectable ade-
E, F, or G. Although screening has the potential noma size is set to d0 = 1 mm, and the maximum
to detect and remove adenomas (D) or to detect adenoma size is set to d1 = 50 mm.
preclinical cancer (E), both trajectories simu- Variation in growth across adenomas is allo-
late death before the cancer becomes symp- wed by varying the time it takes to reach 10 mm,
tomatic. For trajectory F, the simulated given by t10 = ln((d1 10)/(d1 d0))/λ,
adenoma that could be detected at screening allowing t10 to follow a type I extreme value
does not develop into preclinical cancer before distribution. Individuals can transition out of the
other-cause death. For trajectory G, no disease adenoma state when adenomas are removed dur-
events are simulated to occur. ing colonoscopy. Individuals transition out of the
adenoma state in two ways: (1) any adenoma
transitioning to preclinical cancer, or (2) all ade-
When (and How) Do Individuals Transitions nomas are detected and removed during a
Between States? colonoscopy exam.
This component of the model is made up of the Simulated individuals transition into the pre-
mathematical functions and probability distributions that govern between-state transitions. The following section describes between-state transition rules for the CRC-SPIN model. Additional details are provided in Rutter and Savarino (2010).

The model describes the initiation of adenomas using a nonhomogeneous Poisson process that allows adenoma risk to vary systematically by gender and age and to vary randomly across individuals. Under this model, the log-risk of developing an adenoma for the ith simulated individual is given by

$$\alpha_{0i} + \alpha_1\,\mathrm{sex}_i + \sum_{k=1}^{4} \delta\bigl(A_k < \mathrm{age}_i(t) \le A_{k+1}\bigr) \left\{ \mathrm{age}_i(t)\,\alpha_{2k} + \sum_{j=2}^{k} A_j \left(\alpha_{2,j-1} - \alpha_{2j}\right) \right\}$$

Here, $\delta(\cdot)$ is an indicator function with $\delta(x) = 1$ when $x$ is true and $\delta(x) = 0$ otherwise, and $\mathrm{age}_i(t)$ is the ith individual's simulated age at time $t$. Increases in adenoma risk with age are modeled with a piecewise linear function, with changes in slope at the ages $A_k$, with $A_1 = 20$, $A_2 = 50$, $A_3 = 60$, $A_4 = 70$, $A_5 = 100$.

Once an adenoma is initiated, the model assigns two characteristics: a location in the colorectum (colon/rectum) and a growth rate. Adenomas grow based on the Janoschek growth curve model, given by $d_{ij}(t) = d_\infty - (d_\infty - d_0)\exp(-\lambda_{ij} t)$, where $d_{ij}(t)$ is the maximum diameter of the jth adenoma in the ith individual at time

clinical cancer state when any one of their adenomas becomes cancerous. For each adenoma, the model assigns a size at transition based on the lognormal distribution, with an expected size at transition that depends on location in the colon and rectum, gender, and age at initiation. Adenomas do not transition to preclinical cancer if the individual dies before the adenoma reaches transition size. Once in the preclinical cancer state, disease can be screen-detected, perhaps at an earlier stage than if it becomes clinical cancer, but the person cannot transition back to the disease-free or adenoma-only states.

Simulated individuals transition into the clinical cancer state when any preclinical cancer becomes clinically detected. Once the model simulates a preclinical cancer, the lesion is assigned a time to clinical cancer, based on a lognormal distribution that depends on location of the preclinical cancer (colon or rectum).

Once cancer is detected (clinically or through screening), the model assigns a stage at detection. For clinically detected cancers, stage is assigned using the observed (SEER) stage distribution of clinically detected cancers. The model specifies that screen detection finds cancer at the same stage or earlier than clinical detection. Simulated individuals can only die from colorectal cancer after cancer is detected. Time from colorectal cancer diagnosis to death is based on survival probabilities derived from analysis of SEER data and is a function of age at diagnosis, gender, stage at diagnosis, and year of diagnosis.
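As an illustration, the piecewise-linear log-risk function and the Janoschek growth curve described in this section can be sketched in Python. Only the age knots (20, 50, 60, 70, 100) come from the text. The intercept, sex effect, age slopes, growth rate, the 0/1 coding of sex, and the diameters `d0` and `d_inf` are hypothetical placeholders chosen for illustration, not the calibrated CRC-SPIN values.

```python
import math

# Age knots A_1..A_5 from the text; every other number below is hypothetical.
KNOTS = [20, 50, 60, 70, 100]


def log_risk(age, sex, alpha0, alpha1, slopes, knots=KNOTS):
    """Piecewise-linear log-risk of adenoma initiation.

    slopes[k-1] plays the role of alpha_{2k} for k = 1..4.
    Outside (A_1, A_5] every indicator is zero, so only the
    intercept and sex terms remain.
    """
    value = alpha0 + alpha1 * sex
    for k in range(1, len(knots)):
        lower, upper = knots[k - 1], knots[k]
        if lower < age <= upper:
            # Offset terms keep the function continuous at the
            # interior knots A_2..A_4.
            offset = sum(
                knots[j - 1] * (slopes[j - 2] - slopes[j - 1])
                for j in range(2, k + 1)
            )
            value += age * slopes[k - 1] + offset
    return value


def janoschek_diameter(t, growth_rate, d0=1.0, d_inf=50.0):
    """Adenoma diameter at time t since initiation (d0 and d_inf assumed, in mm)."""
    return d_inf - (d_inf - d0) * math.exp(-growth_rate * t)


# Hypothetical parameters for one simulated individual.
params = dict(alpha0=-6.0, alpha1=-0.3, slopes=[0.03, 0.02, 0.01, 0.005])
risk_at_55 = log_risk(55, sex=1, **params)
size_after_10y = janoschek_diameter(10, growth_rate=0.05)
```

In a full microsimulation, the individual-level intercept and the adenoma-level growth rate would be drawn from probability distributions rather than fixed, which is how the model lets risk and growth vary randomly across individuals and adenomas.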
568 C. M. Rutter
Individuals can transition to other-cause death from any health state (except cancer death). Data from national death registries were used to model other-cause death, as described in the following section.

Step 3: Identify Data Sources

While a model's structure is driven by the decision problem at hand, data are required to inform the model so that it can be used for accurate predictions. The credibility of a model will be affected by the quality of the data that inform the model.

Which Data Will Inform the Model?
Common data sources include registry data, published study results, unpublished study results, and cost data.

Registry Data
Registry data describe death and disease incidence. These data may directly inform transition probabilities. For example, national death registries provide good information about the time to other-cause death. Similarly, disease registries provide key information about incidence in a target population.

Published Data
Published data include results from both randomized trials and observational studies of disease prevalence and characteristics, as well as characteristics of modeled interventions. When selecting data sources, the modeler must consider potential biases. For example, individuals who choose to be screened for disease may be at higher risk because of their family history.

Unpublished Data
Unpublished data can provide a rich source of information at a greater level of detail than is possible from published sources or registry data. While useful for model development and evaluation, inclusion of unpublished data has the potential to reduce model transparency.

Cost Data
Cost data are incorporated into models that predict cost-effectiveness and may be needed for models that assume resource constraints.

How Will Each Data Source Inform the Model?
A model's credibility will also be affected by how data sources are used to inform a model. Data can be incorporated into the model in three key ways: as an input, as a calibration point, or as a validation point. High-quality data sources are based on large sample sizes, are free from biases, and report on health states that are directly relevant to the model they inform. Few data sources meet these criteria, and so the modeler must decide how to best use limited data.

Model Inputs
Model inputs are set by the modeler and can include a range of basic information needed for simulations. Examples of model inputs include the percentage of simulated individuals who are female; characteristics of the intervention, such as the sensitivity and specificity of a screening test; and life tables that provide all-cause survival probabilities by sex and year of birth. Such model inputs are generally tied directly to data, with gender distributions coming from census information, sensitivity and specificity coming from published study results including meta-analysis, and life tables coming from registry data. Model inputs are pieces of information that can be directly integrated into the model.

Calibration Targets
Calibration targets are important statistics that cannot be directly integrated into the model. For example, it is important for a credible model to match observed disease rates as described by disease registry data. However, when disease incidence is the result of an accumulation of events, the modeler cannot directly incorporate this information as an input. Instead, model parameters are selected so that the model is able to reproduce calibration points.
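The distinction between an input and a calibration target can be made concrete with a deliberately simplified sketch. The toy model, its parameter names, and all numbers below are hypothetical and are not part of CRC-SPIN. Incidence is produced by an accumulation of events, so it cannot be set directly. Instead, an unobserved initiation rate is chosen so that the predicted incidence matches the observed target.

```python
def predicted_incidence(initiation_rate, transition_prob=0.05, years=30):
    """Toy model: cancers per person after `years` of adenoma initiation.

    Incidence accumulates from two events (an adenoma is initiated, then a
    fraction of adenomas become cancer), so it is a model output, not an input.
    """
    return initiation_rate * years * transition_prob


def calibrate(target, candidate_rates):
    """Return the candidate rate whose prediction is closest to the target."""
    return min(
        candidate_rates,
        key=lambda rate: (predicted_incidence(rate) - target) ** 2,
    )


observed_incidence = 0.045  # hypothetical registry-style calibration target
candidates = [i / 1000 for i in range(1, 101)]  # rates 0.001 .. 0.100
best_rate = calibrate(observed_incidence, candidates)
```

Here `transition_prob` and `years` act as inputs fixed by the modeler, while `initiation_rate` is the calibrated parameter. A real microsimulation would replace the closed-form prediction with simulated output and would typically calibrate several parameters to several targets at once.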
Validation Targets
Validation targets are similar to calibration targets. However, rather than being used to select model parameters, validation points are used to check the predictive ability of the model for new data, that is, data not used as inputs or for model calibration. Often, all data that are available at the time of model development are used for calibration, and validation points are obtained from new studies published after model development.

Example: Colorectal Cancer Model
The CRC-SPIN model incorporates registry and published data. Two types of registry data inform the model. Data from the National Center for Health Statistics Databases are used to develop life tables that are used as an input to model other-cause death (National Center for Health Statistics 2000). Life tables must be extrapolated to model life spans for individuals born more recently. The second type of registry data comes from the Surveillance, Epidemiology, and End Results (SEER) program (U.S. Department of Health & Human Services 2012). These registry data provide observed incidence of colorectal cancer and the underlying SEER population in 1978, before the advent of colorectal cancer screening, to provide incidence in the absence of screening by age, stage at diagnosis, and cancer location. The number of cancers within the target age range is used as a calibration target. Stage at diagnosis and survival information from SEER are used as model inputs.

Several published data sources are used as calibration points in the model. These include results from studies describing adenoma prevalence (Rutter et al. 2007; Strul et al. 2006), adenoma count and size at detection (Lieberman et al. 2000; Pickhardt et al. 2003), preclinical cancer prevalence (Imperiale et al. 2000), and adenoma size and presence of preclinical cancer (Church 2004; Odom et al. 2005). For example, data from adenoma case series (Church 2004) were used to inform the probability of transition of an adenoma to preclinical cancer as a function of size. These data describe the percentage of adenomatous lesions with preclinical cancer by size: among 666 lesions between 6 and 10 mm, one was found to be a preclinical cancer, and among 673 lesions over 10 mm, 21 were found to be preclinical cancer.

The CRC-SPIN model does not use unpublished data or cost data.

Step 4: Select Model Parameters

Which Parameters Are "Inputs," and Which Parameters Will Be Calibrated?
As mentioned previously, some model parameters are completely specified by the modeler. These are referred to as "inputs." Inputs can include the age range of the target population, the percent of women in the population, or, for more detailed models, the distribution of risk factors in the target population. Model inputs are directly informed by data or, in the absence of data, expert opinion. Other model parameters, which are the focus in the next section, are indirectly informed by observed data and may need to be inferred through a process called model calibration.

Calibration is used to select model parameters that result in predictions that are consistent with (or "fit") calibration targets. Calibration is needed because calibration targets are not directly related to model parameters and therefore cannot be directly incorporated into the model, as inputs can be. Calibration may also be needed to reconcile multiple calibration targets, observed with error, that are not fully concordant. Finally, calibration provides a data-based approach to selecting parameters that describe unobserved processes. For example, information about the number and size of adenomas detected can provide information about two unobserved processes: the rate of adenoma initiation and the growth in adenoma size.

Which Goodness of Fit Measure Will Be Used to Guide Calibration?
After setting calibration targets and identifying model parameters that will be calibrated, the modeler must select a calibration method. An important aspect of the calibration method is the measure of fit, that is, the statistic that will be
used to measure how close the model predictions are to the observed data. At least three measures can be used to measure goodness of fit (GOF): least squares, chi-squared, or likelihood methods (Vanni et al. 2011). Least squares minimizes the sum of squared differences between predicted values, $P_i$, and observed values, $O_i$. The chi-square approach scales these differences, for example, by dividing by the standard deviation of the observed data, $\sigma_i$: $\sum_i \left( \frac{O_i - P_i}{\sigma_i} \right)^2$. The goal of calibration is to minimize the distance between the observed and predicted values, that is, to minimize the least squares or chi-square statistics. A third common approach is to use the likelihood of the data at a specific parameter value, $\hat{\theta}$, that is, the probability of the observed data at $\hat{\theta}$. The goal of calibration is to maximize the likelihood. The likelihood approach requires specification of a probability distribution for observed data as a function of model parameters or simulation-based estimation of the likelihood at $\hat{\theta}$ (Rutter et al. 2009).

Which Calibration Method Will Be Used for Parameter Selection?
The next step in model calibration is selection of a search strategy. There are two primary approaches to model calibration: undirected and directed searches (Rutter et al. 2010).

Undirected Searches
Undirected searches involve exhaustive evaluation of the model at a defined set of points in the parameter space. Models with few parameters may be able to use a grid search. Using this approach, the modeler defines a grid of parameter values. The model is evaluated at every point on the grid. The best parameter set is chosen from these, as the parameter set that provides the closest fit to the observed data. A related approach uses a randomly selected set of parameter values, with evaluation of the model at every point in this selected set. Undirected searches are theoretically easy to apply, but this approach is not computationally feasible for highly parameterized models, because the number of grid nodes grows exponentially with the number of model parameters. Furthermore, even a dense grid or a large random sample of parameters might miss regions of good fit.

Directed Searches
Directed searches move through the parameter space by "hill climbing," that is, moving in a direction of improving goodness of fit. If the functional form of the likelihood is available, then the algorithm can take steps in directions that are based on the derivative of the likelihood function, with movements in the direction of most rapid increase ("up the hill"). In general, microsimulation models do not have closed form expressions for these derivatives. This can be addressed by using approximations to the derivative or by using the Nelder-Mead algorithm, which does not require derivatives. Directed searches may find parameter values that provide locally, but not globally, good fit to calibration targets. To avoid this problem, directed searches should be initiated at multiple widely dispersed points within the parameter space. Directed searches for model calibration are generally more computationally efficient than grid search approaches, requiring fewer model runs for calibration.

Implementation

Once the model is completely specified, it is ready to be used to address decision problems by generating predictions across a range of scenarios. A model run generally refers to a set of predictions associated with a single set of model assumptions, including the parameters associated with transition probabilities and any interventions that the modeler has chosen to explore. A "base case" run generally refers to a run with assumptions that are believed to be most plausible.

Example: Comparison of Two Tests to Screen for Colorectal Cancer

This section continues with the example comparing two approaches to screening for colorectal cancer: annual screening with a fecal
immunochemical stool test (FIT) and screening every 10 years with colonoscopy. Because these analyses focus on screening beginning at a particular age (50), rather than a screening program that begins in a particular year, the model was used to simulate a cohort of individuals who turned 50 in an arbitrarily selected year (2012). All simulated individuals were free of clinically detectable CRC on their 50th birthday. Model predictions are based on a single run of the model with ten million simulated individuals. Model parameters were calibrated using a likelihood-based approach (Rutter et al. 2009; Rutter and Savarino 2010).

Table 2 shows the predicted results for the no screening and two screening scenarios, focusing on the number of colorectal cancers detected and the number of colorectal cancer deaths. These outcomes were also used to predict the number of colorectal cancers prevented, the number of colorectal cancer deaths prevented, and life-years gained. Screening colonoscopies are defined to include primary screening exams, exams indicated because of a positive FIT result, and exams that are part of short-interval follow-up.

The model predicts that screening annually with FIT and screening every 10 years with colonoscopy are both effective at reducing colorectal cancer incidence and deaths from colorectal cancer. Compared to FIT, for every 100,000 50-year-olds entering screening, colonoscopy results in 0.22 fewer colorectal cancer cases, 0.13 fewer colorectal cancer deaths, and 1.6 more life-years gained but requires 255.7 more screening colonoscopies.

Table 2 Simulated effect of screening for colorectal cancer, based on a cohort of individuals screened at age 50. The table below shows predictions per 100,000 individuals screened

| Outcome | No screening | FIT every year | Colonoscopy every 10 years |
|---|---|---|---|
| Screen-detected colorectal cancers | 0 | 0.49 | 0.13 |
| Clinically detected colorectal cancers | 5.73 | 0.64 | 0.42 |
| Colorectal cancer deaths | 2.08 | 0.30 | 0.17 |
| Colorectal cancers prevented | 0 | 5.09 | 5.31 |
| Colorectal cancer deaths prevented | 0 | 1.78 | 1.91 |
| Life-years gained | 0 | 19.05 | 20.92 |
| Number of screening colonoscopies | 0 | 173.4 | 429.1 |

Sensitivity Analysis

In some cases, a model parameter cannot be informed by data. In this case, the modeler may choose to select a specific value for the parameter and explore its effect on predictions through sensitivity analysis. Sensitivity analysis refers to model runs that systematically vary the values of model parameters, and modelers examine the sensitivity of the predictions to the choice of parameter values. Sensitivity analyses can also provide insight into the impact of specific model assumptions. For example, sensitivity analysis can be used to explore whether adenoma regression, which cannot be directly observed, is plausible by comparing predictions under specific scenarios, such as a model with no regression and a model that assumes that 10% of adenomas regress (Loeve et al. 2004).

Probabilistic sensitivity analysis places distributions on unknown parameters, providing a range of possible results. Parameters are sampled from specified distributions, and multiple model runs are used to infer variability in model predictions that results from variability in model parameters (Briggs et al. 2012; Cronin et al. 1998; Doubilet et al. 1985; Parmigiani 2002). Sensitivity analyses are common, largely because most models include unobservable components.

Exploration and Description of Model Uncertainty

Models are used to predict unobserved outcomes based on imperfect knowledge, and
these predictions are uncertain. Several sources of uncertainty have been identified and are described below (Briggs et al. 2012).

Stochastic, or "First-Order," Uncertainty
Stochastic, or "first-order," uncertainty refers to the uncertainty that results from using a stochastic rather than deterministic decision model. Stochastic uncertainty is analogous to random error in regression models. Because modelers report average effects, simulating very large sample sizes can essentially eliminate stochastic error.

Parameter, or "Second-Order," Uncertainty
Parameter, or "second-order," uncertainty refers to the uncertainty that results from having to calibrate model parameters and is related to the data that are available to inform parameters. Assessment of parameter uncertainty requires elimination of stochastic uncertainty, but parameter uncertainty is rarely reported because most model calibration is based on search strategies that do not directly provide standard error estimates. Instead, modelers sometimes use sensitivity analysis to address parameter variability, running the model at different parameter values and describing the relationship between parameter variability and variability in model predictions. Findings from this type of sensitivity analysis can be used to direct model improvement toward reducing variability of those parameters that have the greatest impact on prediction variability, for example, through additional data collection or, when appropriate, modifications to the model structure.

Systematic Variability or "Heterogeneity"
Systematic variability or "heterogeneity" refers to variability that is built into the model. For example, a model may include systematic differences in the disease process or intervention effects that are a function of individual characteristics (age, sex, race, risk factors).

Structural Variability
Structural variability refers to variability that results from the states selected and the rules for transitioning between states that are described by the model. Structural variability can be addressed using a single model through sensitivity analyses, focused on the most uncertain aspects of the model. This approach generally requires recalibration of each unique model. This "single model" approach is complicated because the model states and transition rules are often selected very deliberately and in consultation with experts. Structural variability can also be addressed through cross-validation or comparative modeling.

Model Validation

Model validation is a critical component of model development. Validation is required to gain confidence in the model. There are five types of validity, outlined below: face validity, internal validity, cross-validity, external validity, and predictive validity (Eddy et al. 2012).

Face Validity
Face validity is subjective and refers to whether the model "makes sense." Face validity of the model relates to the model structure and data used to inform the model. Face validity depends on model transparency, the clear description of the model structure and inputs.

To achieve face validity, models need both nontechnical and technical documentation. Nontechnical documentation should provide basic information about:

• Model Structure: The type of model, health states, and nontechnical descriptions of general rules for transitions between states.
• Model Inputs: This should include a description of inputs specified by the modeler to characterize the target population and inputs that are directly informed by observed data. Depending on the model, a description of the model parameters selected using calibration may or may not be useful. When models
include costs, the costs assigned to various actions and events in the model need to be clearly described.
• Calibration Targets and Model Fit to Targets: This provides information about observed information that the model is able to accurately simulate and how accurately the model simulates these data.

Technical documentation should be sufficiently detailed to enable others to reproduce the model, if they wish. This documentation should include:

• Mathematical formulae for transition rules: if the model is based entirely on fixed transition probabilities, then these should be provided.
• Methods used for model calibration: as this would enable others to reproduce the model.

While release of computer code is seemingly the most transparent approach, this strategy is time consuming and ultimately uninformative to the vast majority of end users so that code release may obscure rather than clarify the model assumptions.

Internal Validity
Internal validity, or verification, refers to coding accuracy. Verification of code is a process that takes place within a modeling team and can be facilitated by modular programming to allow testing of specific blocks of code.

Cross-Validation
Cross-validation, also known as comparative modeling, is based on comparing results obtained from different models and is the primary method for evaluating structural variability. Cross-validation provides a way to assess model predictions in the absence of observed or "gold standard" information and also provides a way of exploring unobserved or unobservable phenomena that are predicted by the model but cannot be validated against observed data such as predicted disease incidence in future years. Cross-validation may be reassuring when model predictions are similar, but when there are differences, cross-validation does not provide a method for choosing the correct or best model.

The National Cancer Institute, through the CISNET group (National Cancer Institute), has championed the comparative modeling approach, by funding more than one modeling group to address policy questions. Examples of comparative modeling include estimation of the combined effects of screening and treatment on breast cancer mortality based on seven CISNET models for breast cancer (Berry et al. 2006) and the Mt. Hood Challenge comparing diabetes models (The Mount Hood 4 Modeling Group 2007). Each of these groups compared models only after standardizing the calibration targets. Without such cooperation, with each group simulating and presenting results under the same conditions, it can be difficult to directly compare model results. Cross-model comparisons can be very time consuming, involving coordination across modeling groups, and so are generally only practical for major policy questions.

External Validation
External validation refers to how well the model is able to predict (or "fit") existing data that was not used for model calibration. Predictive validation takes this idea a step further and refers to how well the model is able to predict study outcomes before they are observed. Among the validity measures discussed, external validity and predictive validity most closely correspond to the models' purpose and therefore are critical to model confidence. Yet it is uncommon for models to carry out external or predictive validation exercises, largely because of data limitations.

Both external and predictive validation exercises require new data. For a model to be immediately validated after development, some data would have to be held out for validation. But because models are complex, modelers often need to use all available data to inform parameters. In some cases, modelers may validate to data that is partially dependent on calibration data, which represents a gray area between goodness of fit to calibration targets (sometimes referred to as internal calibration) and external validation. For example, a model may use overall disease
incidence rates by decades of age as a calibration target and then validate the model by predicting incidence rates by sex and age in years. To maintain trust in a model, it is critical that modelers be transparent about their validation approaches, clearly stating when partially dependent data are used for validation.

In Conclusion

Microsimulation models are a powerful tool for systematically combining evidence from a variety of sources to provide critical information to health policy decision makers. Decision problems can be unconstrained, assuming unlimited resources, or they can be constrained to restrict resources such as total costs or treating physicians. The use of models to inform policy is increasing, partly due to increasing computational power but also because of increasing interest in evidence-based medicine. Yet there remain concerns about credibility of model predictions. These concerns are a natural consequence of the complexity of models and their focus on prediction, which requires extrapolation beyond available data. One way to build model credibility is to make model assumptions as transparent as possible. Another way to build credibility is through model predictions, that is, by comparing model predictions to observed data and, when possible, allowing end users to examine model predictions under different hypothetical scenarios.

References

Atkin WS, Edwards R, Kralj-Hans I, et al. Once-only flexible sigmoidoscopy screening in prevention of colorectal cancer: a multicentre randomised controlled trial. Lancet. 2010;375(9726):1624–33.
Beck J, Pauker S. The Markov process in medical prognosis. Med Decis Mak. 1983;3:419–58.
Berry DA, Inoue L, Shen Y, et al. Modeling the impact of treatment and screening on U.S. breast cancer mortality: a Bayesian approach. J Natl Cancer Inst Monogr. 2006;36:30–6.
Brauer F, Castillo-Chavez C. Mathematical models for communicable diseases. Philadelphia: Society for Industrial and Applied Mathematics; 2013.
Briggs AH, Weinstein MC, Fenwick EA, et al. Model parameter estimation and uncertainty analysis: a report of the ISPOR-SMDM modeling good research practices Task Force Working Group-6. Med Decis Mak. 2012;32(5):722–32.
Cancer Incidence – Surveillance, Epidemiology, and End Results (SEER) Registries Research Data [database on the Internet]. National Cancer Institute, Surveillance Systems Branch. 2012. Available from: http://seer.cancer.gov/data/seerstat/nov2011/.
Caro JJ, Briggs AH, Siebert U, et al. Modeling good research practices – overview: a report of the ISPOR-SMDM modeling good research practices Task Force-1. Med Decis Mak. 2012;32(5):667–77.
Church JM. Clinical significance of small colorectal polyps. Dis Colon Rectum. 2004;47(4):481–5.
CISNET. 2014. Available at: http://cisnet.cancer.gov. Accessed 30 Apr 2014.
Cronin KA, Legler JM, Etzioni RD. Assessing uncertainty in microsimulation modelling with application to cancer screening interventions. Stat Med. 1998;17(21):2509–23.
Doubilet P, Begg CB, Weinstein MC, et al. Probabilistic sensitivity analysis using Monte Carlo simulation. A practical approach. Med Decis Mak. 1985;5(2):157–77.
Eddy D. Breast cancer screening for Medicare beneficiaries: effectiveness, costs to Medicare and medical resources required. Washington, DC: U.S. Congress, Health Program, Office of Technology Assessment; 1987.
Eddy DM, Hollingworth W, Caro JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM modeling good research practices Task Force-7. Med Decis Mak. 2012;32(5):733–43.
Etzioni R, Penson DF, Legler JM, et al. Overdiagnosis due to prostate-specific antigen screening: lessons from U.S. prostate cancer incidence trends. J Natl Cancer Inst. 2002;94(13):981–90.
Hardcastle JD, Chamberlain JO, Robinson MH, et al. Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet. 1996;348(9040):1472–7.
Hixson LJ, Fennerty MB, Sampliner RE, et al. Prospective study of the frequency and size distribution of polyps missed by colonoscopy. J Natl Cancer Inst. 1990;82(22):1769–72.
Hunt CA, Kennedy RC, Kim SH, et al. Agent-based modeling: a systematic assessment of use cases and requirements for enhancing pharmaceutical research and development productivity. Wiley Interdiscip Rev Syst Biol Med. 2013;5(4):461–80.
Imperiale TF. Sigmoidoscopy screening: understanding the trade-off between detection of advanced neoplasia and diagnostic efficiency. J Natl Cancer Inst. 2013;105(12):846–8.
Imperiale TF, Wagner DR, Lin CY, et al. Risk of advanced proximal neoplasms in asymptomatic adults according to the distal colorectal findings. N Engl J Med. 2000;343(3):169–74.
Johnson CD, Chen MH, Toledano AY, et al. Accuracy of CT colonography for detection of large adenomas and cancers. N Engl J Med. 2008;359(12):1207–17.
Karnon J, Stahl J, Brennan A, et al. Modeling using discrete event simulation: a report of the ISPOR-SMDM modeling good research practices Task Force-4. Value Health. 2012;15(6):821–7.
Knudsen AB, Lansdorp-Vogelaar I, Rutter CM, et al. Cost-effectiveness of computed tomographic colonography screening for colorectal cancer in the Medicare population. J Natl Cancer Inst. 2010;102(16):1238–52.
Kronborg O, Fenger C, Olsen J, et al. Randomised study of screening for colorectal cancer with faecal-occult-blood test. Lancet. 1996;348(9040):1467–71.
Lansdorp-Vogelaar I, Kuntz KM, Knudsen AB, et al. Stool DNA testing to screen for colorectal cancer in the Medicare population. A cost-effectiveness analysis. Ann Intern Med. 2010;153(6):368–77.
Levin B, Lieberman DA, McFarland BG, et al. Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US Multi-society task force on Colorectal Cancer, and the American College of Radiology. Gastroenterology. 2008;134(5):1570–95.
Lieberman DA, Weiss DG, Bond JH, et al. Use of colonoscopy to screen asymptomatic adults for colorectal cancer. Veterans affairs cooperative study group 380. N Engl J Med. 2000;343(3):162–8.
Loeve F, Boer R, Zauber AG, et al. National Polyp Study data: evidence for regression of adenomas. Int J Cancer. 2004;111(4):633–9.
Luke DA, Stamatakis KA. Systems science methods in public health: dynamics, networks, and agents. Annu Rev Public Health. 2012;33:357–76.
Mandelblatt J, Schechter C, Levy D, et al. Building better models: if we build them, will policy makers use them? Toward integrating modeling into health care decisions. Med Decis Mak. 2012;32(5):656–9.
Muller CM, Mandelblatt J, Schechter C. The cost and effectiveness of cervical cancer screening in elderly women. Washington, DC: Congress of the United States, Office of Technology Assessment; 1990.
National Cancer Institute. Cancer Intervention and Surveillance Modeling Network (CISNET). n.d. Available at: http://cisnet.cancer.gov/. Accessed 2008.
National Center for Health Statistics. US Life Tables. 2000. Available at: www.cdc.gov/nchs/products/pubs/pubd/lftbls/life/1966.htm. Accessed 2013.
Odom SR, Duffy SD, Barone JE, et al. The rate of adenocarcinoma in endoscopically removed colorectal polyps. Am Surg. 2005;71(12):1024–6.
Parmigiani G. Measuring uncertainty in complex decision analysis models. Stat Methods Med Res. 2002;11(6):513–37.
Petitti DB. Meta-analysis, decision analysis, and cost-effectiveness analysis: methods for quantitative synthesis in medicine. 2nd ed. New York: Oxford University Press; 2000. 306 p.
Pickhardt PJ, Choi JR, Hwang I, et al. Computed tomographic virtual colonoscopy to screen for colorectal neoplasia in asymptomatic adults. N Engl J Med. 2003;349(23):2191–200.
Pitman R, Fisman D, Zaric GS, et al. Dynamic transmission modeling: a report of the ISPOR-SMDM modeling good research practices Task Force Working Group-5. Med Decis Mak. 2012;32(5):712–21.
Rex DK, Cutler CS, Lemmel GT, et al. Colonoscopic miss rates of adenomas determined by back-to-back colonoscopies. Gastroenterology. 1997;112(1):24–8.
Rex DK, Johnson DA, Anderson JC, et al. American College of Gastroenterology guidelines for colorectal cancer screening 2009 [corrected]. Am J Gastroenterol. 2009;104(3):739–50.
Roberts M, Russell LB, Paltiel AD, et al. Conceptualizing a model: a report of the ISPOR-SMDM modeling good research practices Task Force-2. Med Decis Mak. 2012;32(5):678–89.
Rutter CM, Savarino JE. An evidence-based microsimulation model for colorectal cancer. Cancer Epidemiol Biomark Prev. 2010;19(8):1992–2002.
Rutter CM, Yu O, Miglioretti DL. A hierarchical non-homogenous Poisson model for meta-analysis of adenoma counts. Stat Med. 2007;26(1):98–109.
Rutter CM, Miglioretti DL, Savarino JE. Bayesian calibration of microsimulation models. J Am Stat Assoc. 2009;104(488):1338–50.
Rutter CM, Zaslavsky AM, Feuer EJ. Dynamic microsimulation models for health outcomes: a review. Med Decis Mak. 2010;31(1):10–8.
Siebert U, Alagoz O, Bayoumi AM, et al. State-transition modeling: a report of the ISPOR-SMDM modeling good research practices Task Force-3. Med Decis Mak. 2012;32(5):690–700.
Strul H, Kariv R, Leshno M, et al. The prevalence rate and anatomic location of colorectal adenoma and cancer detected by colonoscopy in average-risk individuals aged 40–80 years. Am J Gastroenterol. 2006;101(2):255–62.
The Mount Hood 4 Modeling Group. Computer modeling of diabetes and its complications: a report on the 4th Mount Hood challenge meeting. Diabetes Care. 2007;30:1638–46.
Towler B, Irwig L, Glasziou P, et al. A systematic review of the effects of screening for colorectal cancer using the faecal occult blood test, hemoccult. BMJ. 1998;317(7158):559–65.
U.S. Preventive Services Task Force. Screening for colorectal cancer: U.S. Preventive Services Task Force recommendation statement. Ann Intern Med. 2008;149(9):627–37.
van der Akker-van Marle ME, van Ballegooijen M, van Ootmarssen GJ, et al. Cost-effectiveness of cervical cancer screening: comparison of screening policies. J Natl Cancer Inst. 2002;94:193–204.
Vanni T, Karnon J, Madan J, et al. Calibrating models in economic evaluation: a seven-step approach. PharmacoEconomics. 2011;29(1):35–49.
Vogelaar I, Van Ballegooijen M, Schrag D, et al. How much can current interventions reduce colorectal cancer mortality in the U.S.? Cancer. 2006;107:1623–33.
Network Meta-analysis
25
Georgia Salanti, Deborah Caldwell, Anna Chaimani, and
Julian Higgins
Contents
Introduction
  Example: Incident Diabetes with Antihypertensive Drugs
  A Roadmap to the Chapter
Meta-analysis of Head-to-Head Comparisons
  Types of Data that Feed into a Meta-analysis
  Meta-analysis and Meta-regression as Linear Model
  Meta-analysis as Hierarchical Model
  Fitting the Meta-analysis Model
Indirect and Mixed Comparison
  Theory and Formulae for Indirect Comparisons
  Theory and Formulae for Mixed Comparisons
  Assumptions Underlying Indirect and Mixed Comparisons
Models for Network Meta-analysis
  Consistency Models
  Assumptions of Network Meta-analysis
  Statistical Methods to Detect Inconsistency in a Network of Interventions
  Inconsistency Models
G. Salanti (*) · A. Chaimani

Department of Hygiene and Epidemiology, University of
e-mail: georgia.salanti@ispm.unibe.ch;
annachaimani@gmail.com
D. Caldwell
School of Social and Community Medicine, University of
Bristol, Bristol, UK
e-mail: d.m.caldwell@bristol.ac.uk
J. Higgins
MRC Biostatistics Unit, Cambridge, UK
Centre for Reviews and Dissemination, University of York,
York, UK
e-mail: julian.higgins@bristol.ac.uk

https://doi.org/10.1007/978-1-4939-8715-3_36
  Exploring Heterogeneity and Inconsistency: Network Meta-regression
Numerical and Graphical Presentation of Results from Network Meta-analysis
References
Abstract

The increasing number of alternative treatment options for the same condition has created the need to undertake reviews that address complex policy-relevant questions and make inferences about many competing treatments. Such reviews collect data which, under certain conditions, can be statistically synthesized using network meta-analysis. This chapter presents the basic concepts of indirect and mixed comparison of treatments and presents the statistical models for network meta-analysis and their implementation, both theoretically and in examples. The assumption underlying network meta-analysis is discussed extensively, and extensions of the models to account for effect modifiers are presented.

Introduction

Meta-analyses of randomized controlled trials (RCTs) are often considered to provide the most reliable and valid evidence on which to base healthcare decisions, usually ranking above single RCTs in evidence-based medicine (EBM) hierarchies of evidence (Higgins and Green 2008). Meta-analysis is an integral part of EBM, used by international health organizations such as the World Health Organization and The Cochrane Collaboration, and is widely used to inform health technology assessment and clinical guidelines produced by organizations such as the Canadian Agency for Drugs and Technologies in Health (CADTH), the National Institute for Health and Clinical Excellence (NICE), and the Agency for Healthcare Research and Quality (AHRQ). However, as meta-analysis traditionally compares two treatments at a time (a pairwise comparison), its use in the presence of multiple competing treatment options is limited. Moreover, although clinical and policy-making interest lies in comparing active agents, new drugs are commonly compared with placebo in order to obtain marketing authorization. Given that clinical practice changes over time and that licensed or reference treatments differ across countries, it is unrealistic to expect that individual trials and pairwise meta-analyses can provide evidence of comparative effectiveness for every intervention of interest.

The need to compare multiple competing treatments to inform clinical guidelines and health technology appraisals has underpinned the development of network meta-analysis. Also known as multiple treatment meta-analysis or mixed treatment comparison, a network meta-analysis simultaneously combines direct and indirect information across a network of studies to make inferences regarding the relative effectiveness of multiple interventions. An indirect comparison, which underpins the method, is a simple idea: treatment A can be compared with treatment B via a common comparator C, by statistically combining the A versus C (AC) and B versus C (BC) studies. Several applications and methods papers have outlined the benefits of combining direct and indirect evidence in a network meta-analysis (Caldwell et al. 2005; Cooper et al. 2011; Hoaglin et al. 2011; Mills et al. 2011). These include improvement in precision for the estimated effect sizes and the ability to compare treatments that have not been directly compared in any trial. Despite the increasing number of applications, network meta-analysis is far from being an established practice. Many authors emphasize the secondary or supplementary nature of the analyses, giving priority to direct evidence (NICE 2008; Edwards et al. 2009). Network meta-analyses are often considered controversial (Piccini and Kong 2011; Thijs et al. 2008); for example, a recent evaluation of the relative
effectiveness of twelve new-generation antidepressants attracted supporters as well as skeptics (Barbui et al. 2009; Cipriani et al. 2009).

Example: Incident Diabetes with Antihypertensive Drugs

To exemplify all methodologies of this chapter, a published network meta-analysis (Elliott and Meyer 2007) will be used. It is based on a systematic review that aimed to compare antihypertensive drugs with respect to the incidence of diabetes. The review included 22 randomized controlled trials. The competing interventions are placebo (P), β-blockers (BB), diuretics (D), calcium channel blockers (CCB), angiotensin converting enzyme (ACE) inhibitors, and angiotensin II receptor blockers (ARB).

Four studies include three arms and the rest are two-arm trials. All comparisons have been evaluated in at least one study except for the comparison ARB versus ACE inhibitors, for which no studies exist. Figure 1 shows the plot of this network of intervention comparisons.

A Roadmap to the Chapter

The chapter starts by setting up notation and the two most commonly applied models used for meta-analysis and meta-regression of pairwise comparison and discusses the frequentist and Bayesian implementation of the models (section “Meta-analysis of Head-to-Head Comparisons”). Section “Indirect and Mixed Comparison” describes the theory of indirect comparisons and the combination of these with direct comparisons (sometimes called “mixed” comparisons) in a simple three-treatment network consisting of trials that compare any two of the three treatments. The key assumption required to derive valid indirect and mixed estimates is extensively discussed in section “Assumptions Underlying Indirect and Mixed Comparisons.” Section “Models for Network Meta-analysis” is more technical and describes the models used to fit network meta-analysis and discusses statistical methods to detect and account for violation of the key assumption. Section “Models for Network Meta-analysis” concludes by outlining extensions of network meta-analysis to account for the impact of effect modifiers. Section “Numerical and Graphical Presentation of Results
[Figure 1: network plot with nodes for β-blockers, CCB, ARB, ACE inhibitors, diuretics, and placebo; each edge is labelled with the number of studies making that comparison]
Fig. 1 Plot of network for incidence of diabetes. The size of the nodes is proportional to the number of studies that evaluate each intervention and the thickness of the lines is proportional to the frequency of each comparison in the network. The numbers represent the number of studies including each comparison (CCB calcium channel blockers, ARB angiotensin receptor blockers, ACE angiotensin converting enzyme)
from Network Meta-analysis” reviews numerical and graphical methods for presenting results from network meta-analysis to assist clinicians with interpretation of findings.

Meta-analysis of Head-to-Head Comparisons

Pairwise meta-analysis summarizes the relative effectiveness of two interventions across N studies. Two basic parametric models are usually used: the fixed-effect model and the random effects model. Under the fixed-effect assumption, it is considered that all studies estimate the same underlying treatment effect. In the random effects model, it is assumed that there is a study-specific treatment effect underlying each study and that the observations from different studies estimate these different underlying effects. The study-specific underlying effects can be different yet related, and it is assumed that they “belong” to the same distribution. The variance of this distribution is the heterogeneity parameter describing the magnitude of the between-study variation. Meta-analysis can be viewed as a special case of a weighted linear regression or as a hierarchical model. The two formulations are equivalent. Linear regression is the most common approach in a frequentist implementation, when treatment effect estimates are the starting point of the analysis (known as a “contrast-based” approach). Hierarchical approaches are usually encountered when summary data from each treatment group are the starting point of the analysis (an “arm-based” approach) and are often fitted in a Bayesian framework. These ideas are discussed in detail in the following three subsections.

Types of Data that Feed into a Meta-analysis

The systematic review process first requires identification and appraisal of studies that address the research question of interest. Then, relevant data are extracted from the studies that fulfill the predefined inclusion criteria. Consider that $N$ studies, indexed with $i = 1, \ldots, N$ and comparing two treatments A and B, are included and contribute data for a particular outcome of interest. The data from each study can be arm based or contrast based. The first term refers to data that apply to each arm; for dichotomous outcomes these can be the number of successes $r_{iA}$, $r_{iB}$ out of the total randomized $n_{iA}$, $n_{iB}$ for the arms A and B, respectively. For continuous outcomes, the arm-based data are the outcome means $m_{iA}$, $m_{iB}$, standard deviations $sd_{iA}$, $sd_{iB}$, and total numbers of participants $n_{iA}$, $n_{iB}$ per arm.

Instead of presenting the outcome in each arm separately, a study can report the difference in the outcome between the two arms using a statistic. The contrast-based approach refers to study-specific statistics that compare the two arms. With dichotomous outcomes, the statistics are usually odds ratios (OR), risk ratios, risk differences, or hazard ratios, whereas for continuous outcomes they are usually mean differences, standardized mean differences, or ratios of means. The logarithmic transformation of ratio measures (e.g., odds and risk ratios) is typically applied in practice. Let $y_{iAB}$ be generic notation for one of these statistics, which will be referred to as “the effect size.” The sample variance of an effect size will be denoted with $s^2_{iAB}$. Of course, the arm-specific data can be transformed into contrast-based data before the start of the analysis. However, modeling arm-specific data is often an advantage in terms of model fit, and therefore detailed data, if available, should be given preference.

Meta-analysis and Meta-regression as Linear Model

Meta-analysis can be viewed as a linear regression model with no covariates. As each observation represents a study and these studies typically have different sample sizes, it is reasonable to weight the observations accordingly; hence, meta-analyses are fitted as a weighted linear
model. In a random effects meta-analysis, a study’s effect size is given by

$$y_i = \mu + \delta_i + e_i \tag{1}$$

where $\mu$ is the summary treatment effect and the random errors are assumed normally distributed

$$e_i \sim N(0, s_i^2) \tag{2}$$

The quantities $\delta_i$ account for random variation in the treatment effects across studies (heterogeneity) and are assumed to be normally distributed as $\delta_i \sim N(0, \tau^2)$. Setting either the heterogeneity variance $\tau^2$ to zero or all $\delta_i = 0$ reduces the model to the fixed-effect model.

Equation 1 can be extended into meta-regression in order to account for variability in the effect sizes with respect to a trial-specific variable $x_i$:

$$y_i = \mu_1 + \mu_2 x_i + \delta_i + e_i$$

When $x_i$ is a categorical variable, the meta-regression model is equivalent to subgroup analysis. Consider, for example, that the systematic review comprises studies with appropriate and inappropriate blinding (subgroups 1 and 2). Then, using the dichotomous variable $x_i$ as an index variable which takes values 0 for appropriate and 1 for inappropriate blinding, a subgroup analysis via meta-regression can be fitted. The summary effect $\mu_1$ would be the summary estimate from the subgroup of appropriately blinded studies, and $\mu_1 + \mu_2$ would be the summary estimate from the subgroup of inappropriately blinded studies. In a general framework of $F$ subgroups indexed with $f$, a meta-regression model can be fitted without intercept:

$$y_i = \sum_{f=1}^{F} \mu_f x_{if} + \delta_i + e_i \tag{3}$$

where now the regression coefficients $\mu_f$ are the summary estimates in subgroups.

Meta-analysis as Hierarchical Model

An alternative representation of the random effects meta-analysis model is to consider two levels of estimation hierarchy: one level for the observation in each study that estimates the study-specific underlying effect and a second level for all the study-specific underlying effects that arise from a common distribution centered around the meta-analysis summary effect. Specifically, in each study the observed effect size $y_i$ is assumed normally distributed with mean equal to the underlying effect size $\theta_i$, and uncertainty reflected by the sample variance:

$$y_i \sim N(\theta_i, s_i^2) \tag{4}$$

Then, it is assumed that the underlying $\theta_i$ form a common distribution with expectation $\mu$. The variance of the distribution is the heterogeneity:

$$\theta_i \sim N(\mu, \tau^2) \tag{5}$$

The equivalence between the two alternative representations (linear and hierarchical model) is seen by identifying $\theta_i$ with $\delta_i + \mu$. The fixed effects model can be obtained by replacing distribution (5) with $\theta_i = \mu$.

The hierarchical model presented in this section can be used to model arm-specific data instead of effect sizes. This offers the advantage that the true likelihood of the data can be used and bypasses the assumption of normality for the observed effect sizes (as reflected in distributions (2) and (4)), often yielding better fit of the models. For example, when the outcome is dichotomous the normal likelihood in Eq. 4 can be substituted by two binomial likelihoods:

$$r_{iA} \sim B(p_{iA}, n_{iA})$$

$$r_{iB} \sim B(p_{iB}, n_{iB})$$

Then the probabilities of success in the two arms can be parameterized to derive contrast-specific parameters $\theta_i$ using a link function $\varphi$:
$$\varphi(p_{iA}) = u_i$$

$$\varphi(p_{iB}) = u_i + \theta_i$$

For instance, the underlying study-specific treatment effect $\theta_i$ can be the log-odds ratio if $\varphi$ is the logit function or the log-risk ratio if $\varphi$ is the logarithmic function. For more details, see Warn et al. (2002).

When the outcome is continuous, distribution (4) is substituted by the two normal distributions for the means in the two arms, each with variance equal to the squared standard error of the mean:

$$m_{iA} \sim N\left(\lambda_{iA}, \frac{sd_{iA}^2}{n_{iA}}\right)$$

$$m_{iB} \sim N\left(\lambda_{iB}, \frac{sd_{iB}^2}{n_{iB}}\right)$$

Then, the effect size can be derived by parameterizing the two means $\lambda_{iA}$ and $\lambda_{iB}$; for example, the mean difference could be derived as $\theta_i = \lambda_{iA} - \lambda_{iB}$. For either type of data (dichotomous or continuous) and for any statistic, the underlying effects $\theta_i$ are assumed to arise from a common distribution as in (5).

Fitting the Meta-analysis Model

The meta-analysis models above can each be fitted within a frequentist or a Bayesian framework. This section briefly summarizes the practical differences between the two approaches and the implications they might have for the summary estimates. For a more detailed overview of the Bayesian methodology, the reader should refer to Spiegelhalter et al. (2004) and Sutton and Abrams (2001). The choice between the different frameworks depends primarily on familiarity with the required software and methods.

The main practical differences between frequentist and Bayesian implementations relate to how the methods estimate the heterogeneity. In most frequentist implementations, the parameter $\tau$ is assumed “known” and several estimation approaches have been proposed including the method of moments and restricted maximum likelihood (see Viechtbauer 2007). Accounting for uncertainty in the estimation of heterogeneity is possible, but most existing software does not include uncertainty for $\tau$ in standard meta-analysis routines. The frequentist estimates invariably perform poorly when few studies are included in the meta-analysis. In a Bayesian framework, $\tau$ may easily be treated as a random variable and is given a prior distribution which, combined with the likelihood statement, provides inference on the (posterior) distribution of the heterogeneity parameter. Therefore, uncertainty about the estimation of $\tau$ is always introduced and impacts on the results. However, with few studies, Bayesian estimation of heterogeneity is also problematic because the choice of the prior distribution may have considerable impact on the results since little information is provided from the data (Lambert et al. 2005). In such cases, it is particularly advisable to carry out sensitivity analyses.

In a Bayesian framework the fit of the model to the data can be measured by calculating the posterior mean residual deviance $\bar{D}$. The model fits the data adequately when $\bar{D}$ approximates the number of unconstrained data points (e.g., the number of studies when the contrast-based approach is used in a head-to-head meta-analysis). The deviance information criterion (DIC) is the sum of $\bar{D}$ and the effective number of parameters, $p_D$, and provides a measure of model fit penalized for model complexity (Spiegelhalter et al. 2002). It has an interpretation similar to the Akaike information criterion: lower values of the DIC suggest a better compromise between model fit and complexity. A difference in DIC of three units or more is usually considered important. DIC can be used to compare different models as long as they are applied to the same data. For example, DIC can be used to select between different meta-regression models or to choose between consistency and different inconsistency models, as will be discussed later.

An advantage of the Bayesian fitting of the models is that the posterior distribution can be directly interpreted as the probability distribution of the quantity of interest (e.g., summary effect, heterogeneity). Consequently, probabilistic statements follow naturally; it is straightforward to
calculate probabilities of one treatment being better than the other, or outperforming another by a specific magnitude. This is an important advantage when many treatments need to be compared and pairwise presentation of effect sizes becomes cumbersome. Calculation of probabilities is possible in a frequentist setting via resampling techniques, but this typically requires specialized routines or extra programming by the user.

Several software options exist that fit meta-analysis models in a frequentist setting. Freely available software includes RevMan and packages in R; a popular commercial option is STATA. The available routines and software constrain the flexibility of the models; for instance, it is not possible to fit arm-specific data using their exact likelihood with the existing meta-analysis-specific routines.

With network meta-analysis increasing in popularity, Bayesian approaches have become popular as they offer greater flexibility, and WinBUGS is the most common software used. Meta-analysis can be fitted as a linear or hierarchical model and both arm-specific and contrast-specific data can be modelled, giving Bayesian fitting a practical advantage compared to the frequentist approach.

Example: Subgroup Meta-analysis for ACE Inhibitors and CCB Versus β-Blockers

To exemplify the methods outlined above, consider the two comparisons CCB versus β-blockers and ACE inhibitors versus β-blockers from the network introduced earlier relating to incident diabetes. Firstly, a meta-regression model will be fitted, with dummy variables, to carry out subgroup analysis on contrast-based data (the ln(OR) for diabetes from each study), using the treatments being compared to define two subgroups. There are three studies comparing ACE inhibitors versus β-blockers and five comparing CCB versus β-blockers. Although a regression model is usually written with an intercept and one or more regression terms, it can also be written with no intercept as in Eq. 3. The eight observed ln(OR) estimates are denoted as $y_i$ using study indices $i = 1, 2, \ldots, 8$. Each $y_i$ is then written as a function of the variables $x_{i,\mathrm{ACE\,BB}}$ and $x_{i,\mathrm{CCB\,BB}}$. These variables take values $x_{i,\mathrm{ACE\,BB}} = 1$ if study $i$ compares ACE inhibitors versus β-blockers and $x_{i,\mathrm{ACE\,BB}} = 0$ otherwise, and $x_{i,\mathrm{CCB\,BB}} = 1$ for CCB versus β-blockers and zero otherwise. The meta-regression model that gives the summary effects for these two comparisons is

$$y_i = \mu_{\mathrm{ACE\,BB}}\, x_{i,\mathrm{ACE\,BB}} + \mu_{\mathrm{CCB\,BB}}\, x_{i,\mathrm{CCB\,BB}} + \delta_i + e_i$$

where $\delta_i$ is the study-specific random effect. Fitting this model in STATA using the command metareg and specifying the method of moments as the method to estimate the heterogeneity parameter produces the results shown in the upper part of Table 1.

The coefficients $\mu$ of the regression are the subgroup-specific summary effects $\mu_{\mathrm{ACE\,BB}}$, $\mu_{\mathrm{CCB\,BB}}$ on the ln(OR) scale. The heterogeneity parameter was estimated as $\tau^2 = 0.01$ and the proportion of variability due to heterogeneity rather than sampling error (after accounting for subgroup differences) as $I^2 = 59\%$.
Table 1 Results of subgroup analysis for ACE inhibitors versus β-blockers and CCB versus β-blockers. Log-odds ratios (μ) with their standard error SE(μ) and odds ratios (OR) with their 95% confidence or credible interval (CI/CrI) estimated from meta-regression and hierarchical models are reported

Model                                          Comparison                         μ      SE(μ)  OR    95% CI/CrI for OR
Linear model in frequentist implementation     ACE inhibitors versus β-blockers   -0.17  0.10   0.84  (0.69, 1.03)
                                               CCB versus β-blockers              -0.21  0.07   0.81  (0.71, 0.93)
Hierarchical model in Bayesian implementation  ACE inhibitors versus β-blockers   -0.18  0.12   0.84  (0.66, 1.06)
                                               CCB versus β-blockers              -0.21  0.09   0.81  (0.68, 0.97)
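The method-of-moments approach to estimating heterogeneity can be illustrated for a single group of studies. The following is a minimal sketch, assuming the DerSimonian-Laird form of the moment estimator; it is not the metareg routine used for Table 1, the function name `random_effects_pool` is hypothetical, and the study-level ln(OR) values and variances are invented for illustration only.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes with a DerSimonian-Laird (method-of-moments) tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    mu_fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Moment estimate of the heterogeneity variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2

# Hypothetical ln(OR) estimates and sampling variances (not from the chapter)
ln_or = [-0.30, -0.10, -0.25, 0.05]
var = [0.02, 0.03, 0.015, 0.04]

mu, se, tau2 = random_effects_pool(ln_or, var)
print(f"summary ln(OR) = {mu:.3f}, SE = {se:.3f}, tau^2 = {tau2:.4f}")
print(f"OR = {math.exp(mu):.2f}, "
      f"95% CI ({math.exp(mu - 1.96 * se):.2f}, {math.exp(mu + 1.96 * se):.2f})")
```

Setting `tau2` to zero in the weights recovers the fixed-effect summary, which mirrors the reduction of Eq. 1 to the fixed-effect model.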
The subgroup meta-analysis can also be fitted with the 2 × 2 tables as the starting point rather than the ln(OR), and it is convenient to write this implementation as a hierarchical model. The outcome in each study is the number of patients diagnosed with diabetes and therefore the binomial likelihood can be used in a hierarchical model. This means that the number of events (patients with diabetes) in each study arm, $r_{i,\mathrm{BB}}$ and $r_{i,\mathrm{ACE}}$ for the first three studies comparing ACE inhibitors to β-blockers or $r_{i,\mathrm{BB}}$ and $r_{i,\mathrm{CCB}}$ for the five studies comparing CCB to β-blockers, follows a binomial distribution with a respective probability of success:

$$r_{i,\mathrm{BB}} \sim B(p_{i,\mathrm{BB}}, n_{i,\mathrm{BB}}), \quad i = 1, \ldots, 8$$

$$r_{i,\mathrm{ACE}} \sim B(p_{i,\mathrm{ACE}}, n_{i,\mathrm{ACE}}), \quad i = 1, \ldots, 3$$

$$r_{i,\mathrm{CCB}} \sim B(p_{i,\mathrm{CCB}}, n_{i,\mathrm{CCB}}), \quad i = 4, \ldots, 8$$

The log-odds ratios can be written as functions of the arm-specific probabilities; the two ln(OR) are $\mathrm{logit}(p_{i,\mathrm{ACE}}) - \mathrm{logit}(p_{i,\mathrm{BB}})$ and $\mathrm{logit}(p_{i,\mathrm{CCB}}) - \mathrm{logit}(p_{i,\mathrm{BB}})$. In this case the parameterization of the model is:

$$\mathrm{logit}(p_{i,\mathrm{BB}}) = u_i$$

$$\mathrm{logit}(p_{i,\mathrm{ACE}}) = u_i + \theta_{i,\mathrm{ACE\,BB}}$$

if study $i$ compares ACE inhibitors versus β-blockers or

$$\mathrm{logit}(p_{i,\mathrm{BB}}) = u_i$$

$$\mathrm{logit}(p_{i,\mathrm{CCB}}) = u_i + \theta_{i,\mathrm{CCB\,BB}}$$

if study $i$ compares CCB versus β-blockers. Then the study-specific underlying treatment effects $\theta_{i,\mathrm{ACE\,BB}}$ and $\theta_{i,\mathrm{CCB\,BB}}$ are distributed normally with expectations $\mu_{\mathrm{ACE\,BB}}$, $\mu_{\mathrm{CCB\,BB}}$ and common heterogeneity $\tau^2$ in the same way as the previous model. Using a half-normal prior distribution for the heterogeneity ($\tau \sim N(0, 1)$, $\tau > 0$) and fitting the model in WinBUGS produces the estimates presented in the lower part of Table 1.

The estimates obtained from the two approaches are very similar. The major difference between the two approaches is in the estimation of heterogeneity. The subgroup meta-analysis fitted within a Bayesian setting with the binomial likelihood gives a posterior median of $\tau^2$ equal to 0.02 with 95% CrI (0.001, 0.12), slightly larger than the point estimate from the frequentist meta-regression.

Indirect and Mixed Comparison

Theory and Formulae for Indirect Comparisons

In evidence-based medicine, estimates of treatment effect obtained from head-to-head RCTs and combined in a meta-analysis are widely considered the “best available” evidence with which to evaluate the effectiveness of medical interventions (Guyatt et al. 1995; McAlister et al. 1999). Consider two treatments, labelled B and C, which have been compared directly in RCTs and combined in a pairwise meta-analysis. The summary treatment effect estimate is denoted as $\hat{\mu}^{D}_{BC}$, where the superscript denotes the “direct” estimate and the subscript denotes the treatment comparison, where BC is the effect of C relative to B. In the absence of the “level one” evidence for B versus C, it has been suggested that an indirect estimate can be formed via a “common comparator” (Bucher et al. 1997; Song et al. 2003; Glenny et al. 2005), which is assumed to be treatment A. An indirect estimate $\hat{\mu}^{I}_{BC}$ can be derived by combining the meta-analytic effect estimates of A versus B studies, $\hat{\mu}^{D}_{AB}$, and A versus C studies, $\hat{\mu}^{D}_{AC}$, such that

$$\hat{\mu}^{I}_{BC} = \hat{\mu}^{D}_{AC} - \hat{\mu}^{D}_{AB}$$

This method is often referred to as an “adjusted indirect comparison,” so-called because randomization is respected by using the relative effect estimates $\hat{\mu}^{D}_{AC}$, $\hat{\mu}^{D}_{AB}$ obtained from the meta-analyses. Here it is referred to simply as an indirect comparison.
[Figure 2: chain network linking Gemcitabine + Paclitaxel (A), Paclitaxel (B), Docetaxel (C), and Capecitabine + Docetaxel (D)]
Fig. 2 Chain of comparisons network of chemotherapy treatments for second-line treatment of breast cancer
Table 2 Findings from the manufacturer’s submission for gemcitabine STA. Median difference in survival and 95% confidence intervals (Adapted from: Eli Lilly 2006 and Jones et al. 2006)

Treatment comparison                                       Trials  Median difference (MD) (95% CI)  SE (MD)  Variance (MD)
Gemcitabine + paclitaxel (A) versus paclitaxel (B)         1       2.8 (0.01, 5.6)                  1.42     2.02
Paclitaxel (B) versus docetaxel (C)                        1       2.7 (0.3, 5.1)                   1.24     1.54
Docetaxel (C) versus capecitabine + docetaxel (D)          1       3.0 (0.6, 5.4)                   1.20     1.44
The usual measures of statistical variability can be derived for the indirect estimate. As $\hat{\mu}^{I}_{BC}$ is formed as a difference between two independent estimates, its variance, $\hat{v}^{I}_{BC}$, is equal to the sum of the variances, $\hat{v}^{D}_{AC}$ and $\hat{v}^{D}_{AB}$, estimated from the direct AC and AB comparisons:

$$\hat{v}^{I}_{BC} = \hat{v}^{D}_{AC} + \hat{v}^{D}_{AB}$$

A single head-to-head randomized trial is as precise as an indirect comparison based on four trials of the same size. To see this, suppose each trial produces an estimate with variance $\sigma^2$. A meta-analysis of $s$ trials with direct estimates of A versus B will have variance $\hat{v}^{D}_{AB} = \sigma^2/s$ (based on inverse variance weights). The indirect estimate of B versus C via A based on $s$ AB and $s$ AC trials will have variance

$$\hat{v}^{I}_{BC} = \hat{v}^{D}_{AC} + \hat{v}^{D}_{AB} = \sigma^2/s + \sigma^2/s = 2\sigma^2/s.$$

Setting $s = 2$ gives a variance of $\sigma^2$, the same as a single head-to-head trial, while requiring four trials in total.

A common misconception is that for an indirect comparison to be valid, every trial must include a common comparator (Hughes 2010). In truth, indirect estimates can be derived via many routes. The only requirement is that the network is "connected" and not necessarily via a common comparator. Consider the network shown in Fig. 2, which is adapted from a 2006 submission to NICE that included four distinct regimens for the second-line treatment of metastatic breast cancer (Eli Lilly 2006).

Table 2 reports the results for difference in median years survival. Note there are direct estimates available for gemcitabine + paclitaxel versus paclitaxel ($\hat{\mu}^{D}_{AB} = 2.8$ years), paclitaxel versus docetaxel ($\hat{\mu}^{D}_{BC} = 2.7$ years), and docetaxel versus capecitabine + docetaxel ($\hat{\mu}^{D}_{CD} = 3.0$ years), which form a "chain" of evidence A-B-C-D. The comparison of interest to the decision-maker was gemcitabine + paclitaxel (A) versus capecitabine + docetaxel (D) (Jones et al. 2006), an indirect comparison of which can be formed as

$$\hat{\mu}^{I}_{AD} = \hat{\mu}^{D}_{AB} + \hat{\mu}^{D}_{BC} + \hat{\mu}^{D}_{CD} = 2.9 \text{ years}$$

$$SE\left(\hat{\mu}^{I}_{AD}\right) = \sqrt{\hat{v}_{AB} + \hat{v}_{BC} + \hat{v}_{CD}} = 2.23$$
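The precision claim made earlier in this section (one head-to-head trial is as precise as an indirect comparison built from four trials) can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the value of `sigma2` are assumptions, and it assumes every trial has the same variance and that pooling uses inverse variance weights.

```python
def direct_variance(sigma2: float, s: int) -> float:
    """Variance of an inverse-variance pooled estimate from s equally precise trials."""
    return sigma2 / s


def indirect_variance(sigma2: float, s_ab: int, s_ac: int) -> float:
    """Variance of the indirect B-versus-C estimate formed via A.

    The indirect estimate is a difference of two independent pooled
    estimates, so the two variances add.
    """
    return direct_variance(sigma2, s_ab) + direct_variance(sigma2, s_ac)


sigma2 = 1.0  # hypothetical per-trial variance

one_direct_trial = direct_variance(sigma2, 1)          # a single B-versus-C trial
four_indirect_trials = indirect_variance(sigma2, 2, 2)  # two AB trials plus two AC trials

print(one_direct_trial, four_indirect_trials)
```

With two trials on each side of the common comparator, the indirect variance equals that of a single direct trial, which is the sense in which four trials are needed to match one.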
Theory and Formulae for Mixed Comparisons

If both direct and indirect estimates are available for the same comparison, they can be combined by taking the weighted average of $\hat{\mu}^{D}_{BC}$ and $\hat{\mu}^{I}_{BC}$. This has been referred to as a mixed comparison and will be denoted here as $\hat{\mu}^{M}_{BC}$. However, it should not be confused with a "mixed-treatment comparison" (Lu 2004), which refers to the simultaneous comparison of multiple treatments in a single analysis and is synonymous with network meta-analysis. A simple and intuitive approach for combining direct and indirect evidence is the inverse variance method, where

$$\hat{\mu}^{M}_{BC} = \frac{\dfrac{1}{\hat{v}^{D}_{BC}}\,\hat{\mu}^{D}_{BC} + \dfrac{1}{\hat{v}^{I}_{BC}}\,\hat{\mu}^{I}_{BC}}{\dfrac{1}{\hat{v}^{D}_{BC}} + \dfrac{1}{\hat{v}^{I}_{BC}}}$$

with variance

$$\hat{v}^{M}_{BC} = \frac{1}{\dfrac{1}{\hat{v}^{D}_{BC}} + \dfrac{1}{\hat{v}^{I}_{BC}}}$$

A 95% confidence interval for the mixed estimate can be obtained as $\hat{\mu}^{M}_{BC} \pm 1.96\sqrt{\hat{v}^{M}_{BC}}$. Note that in the case of a dichotomous outcome, where $\mu$ is the ln(OR) or ln(RR), mean effect size and confidence intervals for the OR and RR can be obtained by exponentiation.

Example: Indirect and Mixed Comparison for ACE inhibitors Versus CCB

Using the ln(OR) for the comparisons of ACE inhibitors and CCB each versus β-blockers presented in Table 1, an indirect estimate for ACE inhibitors versus CCB can be obtained. The indirect ln(OR) estimate $\hat{\mu}^{I}_{ACECCB}$ is calculated as the difference between the direct ln(OR) for ACE inhibitors versus β-blockers and the direct ln(OR) for CCB versus β-blockers. Using the estimates from the frequentist subgroup analysis described earlier, the indirect estimate is

$$\hat{\mu}^{I}_{ACECCB} = \hat{\mu}^{D}_{ACEBB} - \hat{\mu}^{D}_{CCBBB} = -0.17 - (-0.21) = 0.04$$

The variance of the indirect estimate $\hat{\mu}^{I}_{ACECCB}$ is the sum of the variances of $\hat{\mu}^{D}_{ACEBB}$ and $\hat{\mu}^{D}_{CCBBB}$:

$$\hat{v}^{I}_{ACECCB} = \hat{v}^{D}_{ACEBB} + \hat{v}^{D}_{CCBBB} = 0.10^2 + 0.07^2 = 0.0149$$

Therefore, the indirect OR of ACE inhibitors versus CCB is $\exp\left(\hat{\mu}^{I}_{ACECCB}\right) = 1.04$ with 95% CI $\exp\left(\hat{\mu}^{I}_{ACECCB} \pm 1.96\sqrt{\hat{v}^{I}_{ACECCB}}\right) = (0.82, 1.32)$.

Since there are also three studies that directly compare ACE inhibitors with CCB, they can be combined with the indirect estimate to produce a mixed estimate. Synthesis of the studies provides a direct estimate $\hat{\mu}^{D}_{ACECCB}$ equal to $-0.22$ with standard error 0.11. Then, the mixed estimate can be obtained as the weighted average of the direct and indirect ln(OR):

$$\hat{\mu}^{M}_{ACECCB} = \frac{\dfrac{1}{\hat{v}^{D}_{ACECCB}}\,\hat{\mu}^{D}_{ACECCB} + \dfrac{1}{\hat{v}^{I}_{ACECCB}}\,\hat{\mu}^{I}_{ACECCB}}{\dfrac{1}{\hat{v}^{D}_{ACECCB}} + \dfrac{1}{\hat{v}^{I}_{ACECCB}}} = \frac{\dfrac{1}{0.0121}(-0.22) + \dfrac{1}{0.0149}(0.04)}{\dfrac{1}{0.0121} + \dfrac{1}{0.0149}} = -0.10$$

The variance of this estimate is

$$\hat{v}^{M}_{ACECCB} = \frac{1}{\dfrac{1}{0.0121} + \dfrac{1}{0.0149}} = 0.0067$$

The mixed OR is $\exp\left(\hat{\mu}^{M}_{ACECCB}\right) = 0.90$ with 95% CI (0.77, 1.06).

The indirect and mixed ORs for ACE inhibitors versus CCB are presented in Fig. 3.
Fig. 3 Summary odds ratios (OR) for each comparison using direct, indirect, and mixed evidence. Diamonds represent the point estimates and the horizontal lines the corresponding 95% confidence intervals

Comparison                           OR     95% CI
ACE inhibitors vs. β-blockers        0.84   (0.69, 1.03)
CCB vs. β-blockers                   0.81   (0.71, 0.93)
ACE inhibitors vs. CCB (direct)      0.80   (0.65, 1.00)
ACE inhibitors vs. CCB (indirect)    1.04   (0.82, 1.32)
ACE inhibitors vs. CCB (mixed)       0.90   (0.77, 1.06)

(Horizontal axis: OR from 0.5 to 1.5; values below 1 favor the first treatment, values above 1 favor the second treatment.)
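The indirect and mixed calculations for ACE inhibitors versus CCB can be reproduced numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the inputs are the ln(OR) values and standard errors quoted in the example (with negative signs taken to agree with the ORs below 1 reported in Fig. 3), and a normal approximation with the 1.96 multiplier is assumed.

```python
import math


def indirect_estimate(mu_xb, v_xb, mu_yb, v_yb):
    """Indirect X-versus-Y estimate via a common comparator B.

    The estimate is the difference of two independent direct estimates,
    so the variances add.
    """
    return mu_xb - mu_yb, v_xb + v_yb


def mixed_estimate(mu_d, v_d, mu_i, v_i):
    """Inverse-variance weighted average of a direct and an indirect estimate."""
    w_d, w_i = 1.0 / v_d, 1.0 / v_i
    mu_m = (w_d * mu_d + w_i * mu_i) / (w_d + w_i)
    v_m = 1.0 / (w_d + w_i)
    return mu_m, v_m


def or_with_ci(mu, v, z=1.96):
    """Exponentiate a ln(OR) and its confidence limits."""
    half_width = z * math.sqrt(v)
    return math.exp(mu), math.exp(mu - half_width), math.exp(mu + half_width)


# ln(OR) versus beta-blockers: ACE inhibitors -0.17 (SE 0.10), CCB -0.21 (SE 0.07)
mu_i, v_i = indirect_estimate(-0.17, 0.10 ** 2, -0.21, 0.07 ** 2)

# Direct ACE inhibitors versus CCB: ln(OR) -0.22 (SE 0.11)
mu_m, v_m = mixed_estimate(-0.22, 0.11 ** 2, mu_i, v_i)

indirect_or = or_with_ci(mu_i, v_i)
mixed_or = or_with_ci(mu_m, v_m)

print("indirect:", [round(x, 2) for x in indirect_or])
print("mixed:   ", [round(x, 2) for x in mixed_or])
```

Rounded to two decimals, the results agree with the indirect and mixed rows of Fig. 3.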
Note that the confidence interval for the mixed estimate is narrower than the confidence interval for the direct estimate. By combining direct evidence with indirect evidence, the variance is reduced by 45%.

This approach is intuitive and will be familiar to meta-analysts; however, it is labor-intensive. In a three treatment network, A-B-C, the meta-analytic estimates $\hat{\mu}^{D}_{AB}$, $\hat{\mu}^{D}_{AC}$, $\hat{\mu}^{D}_{BC}$ must be obtained, the indirect estimates $\hat{\mu}^{I}_{AB}$, $\hat{\mu}^{I}_{AC}$, $\hat{\mu}^{I}_{BC}$ derived, and then the mixed estimates $\hat{\mu}^{M}_{AB}$, $\hat{\mu}^{M}_{AC}$, $\hat{\mu}^{M}_{BC}$ computed. As the number of treatments increases and the network expands, this approach quickly becomes untenable and more sophisticated approaches can be used (Caldwell et al. 2005). Section "Models for Network Meta-analysis" discusses methods for simultaneously combining direct and indirect evidence in a single analysis. The next section discusses the underlying assumptions needed to undertake an indirect or mixed comparison.

Assumptions Underlying Indirect and Mixed Comparisons

Current hierarchies of evidence place indirect and "mixed" comparisons below direct evidence regardless of whether the constituent effect estimates have been obtained from meta-analyses of RCTs (currently "level one" evidence). Several HTA organizations have expressed doubts about indirect comparisons and state that if direct evidence exists, it should take precedence. For example, the Cochrane Handbook (Higgins and Green 2008) states "indirect comparisons may suffer the biases of observational studies" and advises that direct and indirect evidence only be combined as a supplemental analysis. In England and Wales, NICE (2008) uses direct evidence as the reference case for appraisals of new technologies, only allowing indirect and "mixed" comparisons as a supporting analysis. Similarly, CADTH (Wells et al. 2009) adopts a cautious stance and the Pharmaceutical Benefits Advisory Committee (PBAC) in Australia has expressed skepticism about the use of indirect evidence (PBAC 2008).

This caution is based on concerns regarding the key assumption underpinning indirect comparisons, which is reflected mathematically in the consistency equation $\mu_{BC} = \mu_{AC} - \mu_{AB}$. The consistency equation relates to the true (or average) effectiveness of B versus C rather than to each individual study. It states that the effect of B versus C can be estimated either indirectly via A (right part of the equation) or directly (left part of the equation) and that these two pieces of evidence will, on average, give the same result. Rearranging the parts of the equation shows that one consistency equation is sufficient to reflect consistency for all three comparisons in a three treatment network. Such that
$$\mu_{BC} = \mu_{AC} - \mu_{AB} \iff \mu_{AB} = \mu_{AC} - \mu_{BC} \iff \mu_{AC} = \mu_{AB} + \mu_{BC}$$

The validity of the consistency equation is fundamental to the validity of indirect comparisons. In considering the validity of the assumption for the combination of direct and indirect evidence, some authors have found it instructive to separate the notion of similarity (Song et al. 2009; Donegan et al. 2010) or transitivity (Baker and Kramer 2002) from the notion of consistency. In the current chapter, these notions are interpreted as the distinction between clinical or epidemiological considerations on the one hand, and statistical considerations on the other. Transitivity refers to the genuine ability to learn about a pairwise comparison via an intermediate treatment, that is, via indirect comparison. As will be discussed below, it requires the intermediate treatment to be equivalent when compared against each of the treatments of interest and that the actual studies contributing to the indirect comparison do not differ in important ways. Specifically, when $\hat{\mu}^{I}_{BC}$ is calculated, it is assumed that we can learn about B versus C via A. The common comparator A might be said to be "transitive" when it allows valid comparison of the treatments to which it is linked. Note that transitivity is not a property of the common treatment A alone but of the two sets of studies it links.

Consistency is a statistical notion that can be considered at the level of the parameters or the level of the data. The consistency equation defines relationships among the parameters. The validity of the assumption embodied in the equation can be assessed only when data from different sources form a "closed loop" of evidence in the network (a path that starts and ends at the same treatment node). When the consistency assumption does not hold or when there is evidence of disagreement between direct and indirect evidence, then the evidence is said to be inconsistent (or show inconsistency).

When transitivity holds and there are multiple sources of evidence, the consistency equation should hold. The consistency equation may hold in a statistical sense, however, even when the studies do not allow valid indirect comparisons, due to important differences between studies that prevent an assumption of transitivity from holding. If there is inconsistency in the data, the assumption of transitivity is clearly challenged. For an "open loop," that is one for which there is indirect evidence but no direct evidence, consistency cannot be evaluated statistically, and the validity of the indirect comparison must rest entirely on clinical and epidemiological judgements regarding the plausibility of transitivity. It can be shown mathematically (Lu and Ades 2009) that consistency is a consequence of the assumption of exchangeability that forms the basis of the Bayesian network meta-analysis models which is, in turn, an extension of the usual assumption made in a pairwise meta-analysis (Dias et al. 2010). The assumption of transitivity is essentially equivalent to the assumption of exchangeability in this sense, since it relates to similarity of studies. The term "transitivity" might be preferred to "similarity" (Donegan et al. 2010), however, because (i) it better describes the aim of the assumption to compare two treatments via a third one; (ii) it clearly refers to more than two comparisons whereas the term "similarity" reduces to homogeneity when we refer to a single head-to-head comparison; and (iii) "similarity" may be misinterpreted as necessitating all trial and patient characteristics to be similar, when in truth a valid indirect comparison can be obtained even when studies are dissimilar, so long as such characteristics do not modify treatment effect.

Requirements for Transitivity

Transitivity requires some particular characteristics of the studies contributing to the indirect comparison, as follows:

• The two sets of trials AB and AC do not differ with respect to the distribution of effect modifiers.

In order for an indirect comparison to be valid, the distribution of treatment effect modifiers should be similar in AB and AC trials. Before
conducting an indirect or mixed comparison, the analyst should therefore ensure they have identified a priori possible effect modifiers and should compare their distributions across treatment comparisons. For example, in a network of treatments for childhood nocturnal enuresis, Caldwell et al. (2010) hypothesize that the age of the children could be a potential effect modifier since a 6-year-old suffering from nighttime bedwetting might have a different underlying pathology from a 12-year-old.

Note that the consistency assumption holds at the level of the mean effect sizes and as such an effect modifier that differs within studies of one comparison but has a similar distribution across both comparisons will not violate the assumption. For example, if age is an effect modifier and AC trials differ in terms of mean age of participants (which will manifest as heterogeneity in AC studies), but the same variability is observed in the set of AB trials, then transitivity could still hold. In contrast, if the distribution of age differs across comparisons such that children in the AC studies tended to be younger and those in AB studies tended to be much older, then the assumption would not hold. Adjustment using regression techniques can be used to account for small differences in the distribution of effect modifiers, but note that covariates must be carefully selected. Only effect modifiers and not "colliders" (variables that influence the choice of comparison and the effectiveness) should be considered for adjustment. Adjustment for colliders, as in classical epidemiology, will introduce bias rather than improve the plausibility of the transitivity assumption (Jansen et al. 2012).

• The interventions are being given for the same indications.

Transitivity could be violated if interventions have different indications, so it is important that sets of studies contributing to the indirect comparison are using the treatments for the same underlying condition. A particularly useful way to think about this requirement is to consider whether the participants included in the network could, in principle, be randomized to any of the three treatments A, B, and C. For example, if treatment A is a chemotherapy regimen typically administered as a second-line treatment, whereas treatments B and C can be used either as first or second line, it cannot be assumed that participants in a BC trial could have been randomized in an AC trial. Although this consideration is a fundamental one and should be addressed when building the evidence network, it might be the case that treatments are comparable in theory but not in practice. For example, interferon, glatiramer acetate, or natalizumab are commonly used in clinical practice for patients with relapsing-remitting multiple sclerosis whereas mitoxantrone, methotrexate, cyclophosphamide, or azathioprine are more frequent for patients with a progressive disease. Evidence to support this clinical "tradition" is not solid, and it would be appealing to compare all these treatments. In practice, however, transitivity will be violated as comparisons will differ in disease severity.

Another way to conceptualize this requirement is to consider treatments not included in each study as missing data (Lu and Ades 2006). Thus, AB trials are missing C arms, and AC trials are missing B arms. The transitivity assumption is likely to hold if these arms are missing in an entirely random way, which guarantees that the choice of treatments is unrelated to the indications for which they are given. In practice, the selection of treatments to be included in a trial is not random. In many clinical trials, the choice of comparator is placebo or an older, suboptimal intervention rather than a realistic alternative such as an established effective treatment. If the choice of comparator is associated, directly or indirectly, with the relative effectiveness of the interventions then the key assumption will be violated.

• The treatment C is similar when it appears in AC and BC trials.

The transitivity assumption is violated when the treatments in question differ systematically between trials. The definition of the nodes in the treatment network is a challenging issue as very often treatments are given at various doses, administration routes, frequencies, etc. For
example, consider that the common comparator A is a treatment which can be given at different doses, but there is no systematic difference on the average dose of A between AC and AB trials. In this case the assumption can hold although there could be heterogeneity within AC and AB comparisons. Consequently, the "anchor" treatment A can be represented by a single node allowing the indirect comparison of B and C. If, however, A is given via a different administration route in all AC and AB trials, then it is questionable whether the two types of A can form a common node and an indirect comparison of B versus C via A would be impossible. For example, when comparing different fluoride treatments, comparison between fluoride toothpaste and fluoride rinse can be made via placebo. However, placebo toothpaste and placebo rinse might not be comparable as the mechanical function of brushing might have a different effect on the prevention of caries. If this is the case, the transitivity assumption is doubtful (Salanti et al. 2009).

Estimating Inconsistency in Mixed Comparisons

In theory, the consistency equation $\mu_{BC} = \mu_{AC} - \mu_{AB}$ must hold if transitivity is deemed to hold. However, in practice, there may be inconsistency in the evidence base. In a three-treatment network, three independent direct estimates, $\hat{\mu}^{D}_{AB}$, $\hat{\mu}^{D}_{AC}$ and $\hat{\mu}^{D}_{BC}$ (assuming there are no trials with more than two arms), and three indirect estimates, $\hat{\mu}^{I}_{AB}$, $\hat{\mu}^{I}_{AC}$ and $\hat{\mu}^{I}_{BC}$, can be obtained. Assuming the treatment comparison of interest is B versus C, the discrepancy (difference) between the direct and indirect estimates forms the measure of inconsistency. This discrepancy is called the inconsistency factor (IF) which is estimated as

$$\widehat{IF}_{ABC} = \hat{\mu}^{D}_{BC} - \hat{\mu}^{I}_{BC}$$

Note that the direction of the difference might be clinically important but mathematically is unimportant for the statistical evaluation of consistency. Consequently only absolute differences are taken. In a three-treatment network, only one measure of inconsistency is possible (and hence the subscript denoting the loop) (Lu and Ades 2006) as it can be shown that the same inconsistency factor will be obtained whichever edge of the triangle is of interest.

The variance of the inconsistency factor is

$$\mathrm{var}\left(\widehat{IF}_{ABC}\right) = \hat{v}^{D}_{BC} + \hat{v}^{I}_{BC}$$

A 95% confidence interval can be obtained for the inconsistency factor as $\widehat{IF}_{ABC} \pm 1.96\sqrt{\mathrm{var}\left(\widehat{IF}_{ABC}\right)}$. The null hypothesis of evidence consistency, $IF = 0$, can then be tested by deriving a z-test (Bucher et al. 1997).

$$z = \frac{\widehat{IF}_{ABC}}{\sqrt{\mathrm{var}\left(\widehat{IF}_{ABC}\right)}}$$

If consistency holds, it is reasonable to combine across $\hat{\mu}^{D}_{BC}$ and $\hat{\mu}^{I}_{BC}$ to form $\hat{\mu}^{M}_{BC}$. However, if there is evidence of a "statistically significant" discrepancy ($p < 0.05$), the fundamental assumption is not fulfilled, and one may say that there is evidence of inconsistency.

Claims have been made that indirect comparisons may systematically over- (Bucher et al. 1997; Mills et al. 2011) or underestimate treatment effects compared with direct comparisons. Since inconsistency is a property of a "loop" of evidence, apparent overestimation of a treatment effect on one side of a triangle network (e.g., $\hat{\mu}^{I}_{BC}$) corresponds to underestimation of another (e.g., $\hat{\mu}^{I}_{AC}$). Thus, any assessment of consistency needs to take account of the particular circumstances of the problem. Until recently, empirical investigation of the extent of inconsistency has been limited. In a recent review, Song et al. (2011) examined 112 independent three-treatment networks and detected 16 cases of statistically significant discrepancies between direct and indirect estimates. However, there was no consistent direction as to over- or underestimation. Of course, the test for inconsistency may have low power to detect true inconsistency should it exist, as with other interaction effects. The analyst must therefore be extremely cautious in their interpretation even if inconsistency is not detected.
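The inconsistency factor, its confidence interval, and the z-test described in this section can be sketched as follows. The function name and the input values are hypothetical, chosen only for illustration, and the two-sided p-value from the standard normal distribution is an assumption about how the test is applied.

```python
import math


def inconsistency_test(mu_direct, v_direct, mu_indirect, v_indirect, z_crit=1.96):
    """Inconsistency factor for one closed loop, with SE, 95% CI, z and p.

    The absolute difference is used, as only the size of the discrepancy
    matters for the statistical evaluation of consistency.
    """
    if_hat = abs(mu_direct - mu_indirect)
    se = math.sqrt(v_direct + v_indirect)  # independent estimates: variances add
    ci = (if_hat - z_crit * se, if_hat + z_crit * se)
    z = if_hat / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(z)) = erfc(z / sqrt(2))
    p = math.erfc(z / math.sqrt(2))
    return if_hat, se, ci, z, p


# Hypothetical direct and indirect estimates on the ln(OR) scale
if_hat, se, ci, z, p = inconsistency_test(
    mu_direct=0.50, v_direct=0.04, mu_indirect=0.10, v_indirect=0.05
)
print(f"IF = {if_hat:.2f}, SE = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A confidence interval that includes zero, as in this hypothetical case, gives no indication of statistical inconsistency, although the low power of the test should be kept in mind.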
Note that the discovery of inconsistency does not necessarily mean that all indirect comparisons in the loop are invalid. For example, suppose that AC and BC trials are similar regarding the distribution of effect modifiers (e.g., all studies are carried out in adults with a similar distribution in age), so that $\hat{\mu}^{I}_{AB}$ is a valid estimate of the relative effectiveness of A versus B for the given setting and population. If now the AB studies have all been carried out in younger populations (e.g., in adolescents) then the consistency assumption does not hold; both $\hat{\mu}^{I}_{AB}$ and $\hat{\mu}^{D}_{AB}$ are valid but answer different questions; hence computation of a mixed estimate, $\hat{\mu}^{M}_{AB}$, would be inappropriate.

Example: Inconsistency in the Evidence Triangle ACE Inhibitors Versus CCB Versus β-Blockers

Inconsistency can be evaluated by calculating the difference between direct and indirect estimate for the same comparison. In the case of ACE inhibitors versus CCB, the inconsistency factor reflects the disagreement in the triangle formed by the three sets of trials ACE inhibitors versus CCB versus β-blockers and is calculated as

$$\widehat{IF}_{ACECCBBB} = \left|\hat{\mu}^{D}_{ACECCB} - \hat{\mu}^{I}_{ACECCB}\right| = \left|-0.22 - 0.04\right| = 0.26$$

The standard error of the inconsistency factor is obtained as

$$SE\left(\widehat{IF}_{ACECCBBB}\right) = \sqrt{\hat{v}^{D}_{ACECCB} + \hat{v}^{I}_{ACECCB}} = \sqrt{0.0144 + 0.0225} = 0.192$$

A 95% CI for the inconsistency factor is obtained as $\widehat{IF} \pm 1.96\,SE\left(\widehat{IF}\right) = (-0.12, 0.64)$. The z-test for the hypothesis $H_0: IF_{ACECCBBB} = 0$ is

$$z = \frac{\widehat{IF}_{ACECCBBB}}{SE\left(\widehat{IF}_{ACECCBBB}\right)} = \frac{0.26}{0.192} = 1.35$$

leading to a p-value equal to 0.91. Note that this result applies to the entire triangle: the same inconsistency factor and p-value could have been obtained by calculating the difference between direct and indirect evidence for the ACE versus β-blockers or CCB versus β-blockers comparisons. As the 95% CI includes zero, there is no indication of important statistical inconsistency between direct and indirect estimate, which is also supported by the p-value.

Models for Network Meta-analysis

Extensions of the ideas above to more than three treatments lead to a general framework for network meta-analysis. Consider a set of T treatments of interest that we want to evaluate according to their relative effectiveness on a single outcome measure. The treatments are studied collectively in N studies. Each study may provide evidence about some of the treatments; it will include only a subset of T, $T_i \subseteq T$. The study data can be arm-based or contrast-based. In the contrast-based approach, the effect sizes $y_{ijk}$ from each study are available, and they refer to the relative effectiveness of a treatment k relative to j with $j, k \in T_i$. Network meta-analysis can be viewed as a special case of meta-regression (linear model), as a hierarchical model or as a multivariate meta-analysis model. The estimation methods that arise from these approaches are essentially equivalent and can be employed under the assumption of consistency or under assumptions that impose fewer restrictions.

A key issue in all methods for fitting network meta-analysis is the minimization of the parameter space by selecting a minimum set of basic parameters. This is a set of comparisons (as many as the total number of treatments minus one) that are sufficient to generate all possible comparisons between the treatments via the consistency equations. Under consistency, the choice of the basic parameters does not affect the results but typically the basic parameters are defined by taking the comparisons of all treatments versus a common reference to simplify interpretation. Examples to follow should make this clear.
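The role of the basic parameters can be illustrated with a small sketch. It assumes the consistency equations hold exactly; the treatment labels and the numerical values of the basic parameters are hypothetical and are not estimates from this chapter.

```python
from itertools import combinations


def all_contrasts(basic, reference):
    """Generate every pairwise contrast from T - 1 basic parameters.

    `basic` maps each non-reference treatment j to mu_{reference,j}.
    Under consistency, mu_{jk} = mu_{reference,k} - mu_{reference,j}.
    """
    effects = {reference: 0.0, **basic}
    treatments = sorted(effects)
    return {(j, k): effects[k] - effects[j] for j, k in combinations(treatments, 2)}


# Hypothetical basic parameters for T = 4 treatments with reference A
basic = {"B": 0.30, "C": 0.10, "D": -0.20}
contrasts = all_contrasts(basic, "A")

for (j, k), mu in contrasts.items():
    kind = "basic" if j == "A" else "functional"
    print(f"mu_{j}{k} = {mu:+.2f} ({kind})")
```

Three basic parameters are enough to generate all six comparisons among four treatments; the three comparisons that do not involve the reference are functional parameters.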
Consistency Models

Network Meta-analysis as a Linear Model

Consider the simplest case of having three treatments of interest T = {A, B, C} and studies that compare all possible pairs of those treatments; i.e., there are AB, AC, and BC studies. For now it is assumed that only two-arm trials are available. In general, $y_{ijk}$ refers to the relative effectiveness of two interventions j and k within study i. When each study has only two treatments, the treatment indices can be dropped and the observed effect written as $y_i$. The treatment indices will be reemployed in section "Network Meta-analysis as a Hierarchical Model."

The two-step process described in sections "Theory and Formulae for Indirect Comparisons" and "Theory and Formulae for Mixed Comparisons" is a simple network meta-analysis. For a given comparison, say B versus C, an indirect estimate $\hat{\mu}^{I}_{BC}$ is derived by combining AC and AB studies. Then, the indirect estimate and direct estimate $\hat{\mu}^{D}_{BC}$ are synthesized to obtain the mixed summary estimate $\hat{\mu}^{M}_{BC}$. The first step of this process was described in the context of a meta-regression in section "Meta-analysis and Meta-regression as Linear Model." This can be done by creating two dummy variables to identify the AC and AB studies and omitting the intercept:

$$y_i = \mu_{AC}\,x_{iAC} + \mu_{AB}\,x_{iAB} + \delta_i + e_i$$

The same model can be used for both stages of the mixed comparison analysis, by careful specification of the covariate values in a way that forces the consistency equation into the analysis as a constraint. As above, if study i compares A and C, then $x_{iAC} = 1$, $x_{iAB} = 0$, and if study i compares A and B then $x_{iAC} = 0$, $x_{iAB} = 1$. Now if study i compares B and C, the consistency equation $\mu_{BC} = \mu_{AC} - \mu_{AB}$ can be introduced by setting $x_{iAC} = 1$, $x_{iAB} = -1$. Note that because of the assumption of consistency, only two comparisons need to be included in the model (here AC and AB), and consequently there are two explanatory variables to be included in the meta-regression model. The two contrasts AC and AB are called the basic contrasts and the parameters $\mu_{AC}$, $\mu_{AB}$ the basic parameters, whereas $\mu_{BC}$ is a functional parameter and can be derived as a linear function of the two basic parameters. The choice of the two out of the three contrasts that enter the meta-regression is arbitrary and does not affect parameter estimation; e.g., $x_{iAC}$ and $x_{iBC}$ could have been chosen as covariates.

Extending the idea to the case of more than three treatments results in a full network meta-analysis. For example, with T = {A, B, C, D, E} treatments included, there are $T(T - 1)/2 = 10$ possible head-to-head comparisons. $T - 1$ basic parameters are selected, such as all treatment comparisons Aj of treatment j = B, C, D, E versus treatment A, relating to regression coefficients $\mu_{Aj}$. The meta-regression model would be

$$y_i = \sum_{j = B, C, D, E} \mu_{Aj}\,x_{iAj} + \delta_i + e_i \qquad (6)$$

with $e_i \sim N\left(0, s_i^2\right)$. The variable $x_{iAj} = 1$ if study i compares A and j, $x_{iAj} = 0$ if study i compares A and k, $k \neq j$. If a study compares treatments j and k and does not include A, then the consistency equations are used to derive the values of $x_{iAj}$. If, for example, a study compares B and D, then $x_{iAj} = 0$ for j = C, E, $x_{iAB} = -1$ and $x_{iAD} = 1$ because $\mu_{BD} = \mu_{AD} - \mu_{AB}$.

In summary, all observed comparisons are reexpressed using the regression covariates $x_{iAj}$. This gives the model $N_{comp} - (T - 1)$ degrees of freedom (number of functional parameters), where $N_{comp}$ is the number of comparisons observed in the network. For example, if the network consists of AC, AB, BC, CD, BE, BD studies, then there are $6 - (5 - 1) = 2$ degrees of freedom. This can also be visualised by the number of independent closed loops in the network diagram (ABC and BCD).

This model can be fitted using any meta-regression software (such as the metareg command in STATA). The estimated regression coefficients $\hat{\mu}_{Aj}$ are network meta-analysis estimates for all treatments versus the reference treatment A and their uncertainty is conveyed by $SE\left(\hat{\mu}_{Aj}\right)$. Network
meta-analysis summary effects for all other comparisons, say B versus D, can be obtained by considering the consistency equations relating the $\hat{\mu}_{Aj}$ to the functional parameters. Their variances can be obtained by combining standard errors and covariances (from the variance-covariance matrix for the estimated regression coefficients). For instance, $\hat{\mu}_{BD} = \hat{\mu}_{AD} - \hat{\mu}_{AB}$ and $SE^2\left(\hat{\mu}_{BD}\right) = SE^2\left(\hat{\mu}_{AD}\right) + SE^2\left(\hat{\mu}_{AB}\right) - 2\,\mathrm{cov}\left(\hat{\mu}_{AD}, \hat{\mu}_{AB}\right)$.

Note that the random effects follow a normal distribution $\delta_i \sim N(0, \tau^2)$, with heterogeneity variance assumed to be equal for every comparison. This may be a strong assumption as different comparisons might include studies with different between-study variability. Assuming a common heterogeneity might impose an inappropriate $\tau^2$ value for some comparisons. Although assuming comparison-specific heterogeneities can be desirable in many cases, it presents practical difficulties. Estimation of the parameter $\tau^2$ can be challenging if few studies are available. Even with large network meta-analyses including many treatments, it is often the case that some of the comparisons include only a few studies; some comparisons might even be informed by a single study. Nevertheless, assuming a common heterogeneity parameter allows comparisons to "borrow strength" from each other in the estimation of the common $\tau^2$, overcoming computational problems that are encountered both with frequentist and Bayesian fitting of models.

Application: Network Meta-analysis Using Meta-regression for Incident Diabetes

Standard meta-regression methods can only be applied to networks that contain two-arm studies. The following analysis treats the 30 pairwise comparisons in the incident diabetes data set as if they came from 30 (rather than the true 22) independent studies. A meta-regression model is employed where again the different comparisons define the subgroups. First the $T - 1$ "basic contrasts" need to be selected, to be included as covariates in the model. Several combinations of basic contrasts are possible, and for T = 6, five parameters need to be selected. For ease of interpretation, it is convenient to choose the comparisons of each treatment versus a common reference treatment. Here, placebo (P) is chosen to be the reference treatment and basic contrasts are defined for each treatment versus placebo. Then, to specify the design matrix all comparisons in the network need to be written as functions of the basic parameters. The first two columns of Table 3 list all comparisons in the network for which direct estimates are available and the number of studies involving each comparison. Then, for the five comparisons belonging to the basic contrasts (e.g., β-blockers (BB) vs. P), the respective variable $x_i$ ($x_{iBBP}$) takes the value 1 and the variables of the other four basic contrasts take the value 0. For any other treatment comparison (e.g., diuretics (D) vs. BB), $x_i$ takes the value 1 for the first treatment ($x_{iDP}$) and $-1$ for the second treatment ($x_{iBBP}$), based on the consistency equations ($\mu_{DBB} = \mu_{DP} - \mu_{BBP}$).

The full meta-regression model is

$$y_i = \mu_{BBP}\,x_{iBBP} + \mu_{DP}\,x_{iDP} + \mu_{CCBP}\,x_{iCCBP} + \mu_{ACEP}\,x_{iACEP} + \mu_{ARBP}\,x_{iARBP} + \delta_i + e_i.$$

Fitting the model in STATA using metareg produces the regression coefficients in Table 4. The common heterogeneity parameter of the network was estimated as 0.02. The variance-covariance matrix of the regression coefficients is saved by STATA as the "e(V)" matrix and can be obtained after fitting the meta-regression model (Table 5).

Then any head-to-head comparison can be derived by applying the consistency equations again to the point estimates. For example, the ln(OR) of diuretics versus β-blockers is $\hat{\mu}_{DBB} = \hat{\mu}_{DP} - \hat{\mu}_{BBP} = 0.32 - 0.24 = 0.08$, and its standard error is

$$SE\left(\hat{\mu}_{DBB}\right) = \sqrt{\hat{v}_{DP} + \hat{v}_{BBP} - 2\,\mathrm{Cov}\left(\hat{\mu}_{DP}, \hat{\mu}_{BBP}\right)} = \sqrt{0.008 + 0.0076 - 2 \times 0.004} = 0.09$$

All other functional contrast estimates are derived the same way. The network meta-analysis estimates for all comparisons are presented in the black diamonds in Fig. 4.
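The arithmetic for the diuretics versus β-blockers contrast can be checked as below. This is a sketch using the rounded values quoted in the text; the variable names are illustrative, and the 95% interval assumes a normal approximation on the ln(OR) scale.

```python
import math

# Basic parameter estimates (ln OR versus placebo) as quoted in the text
mu_dp, mu_bbp = 0.32, 0.24

# Variances and covariance of the two basic parameters as quoted in the text
v_dp, v_bbp, cov_dp_bbp = 0.0080, 0.0076, 0.0040

# Functional contrast via the consistency equation mu_DBB = mu_DP - mu_BBP
mu_dbb = mu_dp - mu_bbp

# Variance of a difference of two correlated estimates
se_dbb = math.sqrt(v_dp + v_bbp - 2 * cov_dp_bbp)

or_dbb = math.exp(mu_dbb)
ci_dbb = (math.exp(mu_dbb - 1.96 * se_dbb), math.exp(mu_dbb + 1.96 * se_dbb))

print(f"ln(OR) = {mu_dbb:.2f}, SE = {se_dbb:.2f}")
print(f"OR = {or_dbb:.2f}, 95% CI = ({ci_dbb[0]:.2f}, {ci_dbb[1]:.2f})")
```

The covariance term matters here: ignoring it would give a standard error of about 0.12 rather than 0.09, because the two basic parameters share the placebo reference.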
Table 3 Parameterization of design matrix for the five basic contrasts when placebo is the reference treatment for incident diabetes

Covariates: xiBBP = β-blockers versus placebo; xiDP = diuretics versus placebo; xiCCBP = CCB versus placebo; xiACEP = ACE inhibitors versus placebo; xiARBP = ARB versus placebo

Comparison in study i               Number of studies   xiBBP   xiDP   xiCCBP   xiACEP   xiARBP
β-blockers versus placebo           1                    1       0      0        0        0
diuretics versus placebo            3                    0       1      0        0        0
diuretics versus β-blockers         2                   -1       1      0        0        0
CCB versus placebo                  1                    0       0      1        0        0
CCB versus β-blockers
CCB versus diuretics
ACE inhibitors versus placebo       2                    0       0      0        1        0
ACE inhibitors versus β-blockers
ACE inhibitors versus CCB
ARB versus placebo                  3                    0       0      0        0        1
ARB versus β-blockers
ARB versus diuretics
ARB versus CCB                      1                    0       0     -1        0        1
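The coding rule behind Table 3 can be written as a small helper. This is an illustrative sketch rather than the code used for the analysis: the treatment abbreviations, the function name `design_row`, and the sign convention (+1 for the first-named treatment, -1 for the second, as implied by the consistency equation for diuretics versus β-blockers) are assumptions made here.

```python
# Basic contrasts, each versus the reference treatment placebo ("P")
BASIC = ["BB", "D", "CCB", "ACE", "ARB"]
REFERENCE = "P"


def design_row(first, second):
    """Covariate values for a study comparing `first` versus `second`.

    The first treatment contributes +1 to its basic contrast and the second
    contributes -1; the reference treatment has no column of its own.
    """
    row = [0] * len(BASIC)
    if first != REFERENCE:
        row[BASIC.index(first)] += 1
    if second != REFERENCE:
        row[BASIC.index(second)] -= 1
    return row


comparisons = [("BB", "P"), ("D", "P"), ("D", "BB"), ("CCB", "P"), ("ARB", "CCB")]
design = {pair: design_row(*pair) for pair in comparisons}

for (first, second), row in design.items():
    print(f"{first} vs {second}: {row}")
```

A comparison against placebo activates a single column, whereas a comparison between two active treatments activates two columns with opposite signs.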
Table 4 Results of network meta-analysis as meta-regression for incident diabetes. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% confidence interval (CI) for all basic and functional contrasts are reported

| Comparison | $\hat{\mu}$ | $SE(\hat{\mu})$ | OR | 95% CI for OR |
|---|---|---|---|---|
| β-blockers versus placebo | 0.24 | 0.09 | 1.27 | (1.07, 1.52) |
| diuretics versus placebo | 0.32 | 0.09 | 1.38 | (1.15, 1.64) |
| CCB versus placebo | 0.08 | 0.08 | 1.08 | (0.93, 1.27) |
| ACE inhibitors versus placebo | −0.11 | 0.08 | 0.90 | (0.77, 1.05) |
| ARB versus placebo | −0.17 | 0.10 | 0.84 | (0.69, 1.03) |
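The OR columns of Table 4 follow from the log-odds ratio and its standard error. A minimal sketch, assuming a Wald-type interval on the log scale with the normal quantile 1.96 (the function name `odds_ratio_with_ci` is illustrative):

```python
import math


def odds_ratio_with_ci(log_or: float, se: float, z: float = 1.96):
    """Back-transform a log-odds ratio and its Wald interval to the OR scale."""
    point = math.exp(log_or)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return point, lower, upper


# beta-blockers versus placebo: log-odds ratio 0.24 with standard error 0.09
point, lower, upper = odds_ratio_with_ci(0.24, 0.09)
print(round(point, 2), round(lower, 2), round(upper, 2))
```

Because the interval is built on the log scale and then exponentiated, it is asymmetric around the OR, as seen in the table.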
Note that the confidence interval for the comparison ACE inhibitors versus CCB has narrowed further compared with the direct or mixed estimate previously calculated. Also, an estimate is obtained for the comparison ARB versus ACE inhibitors, for which no studies exist. Figure 4 also shows the ranking of the treatments by ranking the mean OR of each treatment versus placebo.

Network Meta-analysis as a Hierarchical Model

An alternative way to fit the network meta-analysis model is by extending the hierarchical
Table 5 Variance-covariance matrix of the five basic parameters used in meta-regression approach for network meta-analysis for incident diabetes. The diagonal includes the variances of the parameters and the other cells the covariances between the two corresponding parameters

| | $\hat{\mu}_{BBP}$ | $\hat{\mu}_{DP}$ | $\hat{\mu}_{CCBP}$ | $\hat{\mu}_{ACEP}$ | $\hat{\mu}_{ARBP}$ |
|---|---|---|---|---|---|
| $\hat{\mu}_{BBP}$ | 0.0076 | | | | |
| $\hat{\mu}_{DP}$ | 0.0040 | 0.0080 | | | |
| $\hat{\mu}_{CCBP}$ | 0.0052 | 0.0040 | 0.0070 | | |
| $\hat{\mu}_{ACEP}$ | 0.0038 | 0.0034 | 0.0035 | 0.0058 | |
| $\hat{\mu}_{ARBP}$ | 0.0037 | 0.0024 | 0.0037 | 0.0022 | 0.0098 |
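The standard error of any functional contrast can be read off Table 5 as the square root of the two variances minus twice the covariance. The sketch below hard-codes the lower triangle of Table 5; the helper names `cov` and `functional_contrast_se` are illustrative assumptions, not part of the chapter.

```python
import math

PARAMS = ["BBP", "DP", "CCBP", "ACEP", "ARBP"]

# Lower triangle of the variance-covariance matrix in Table 5.
LOWER = [
    [0.0076],
    [0.0040, 0.0080],
    [0.0052, 0.0040, 0.0070],
    [0.0038, 0.0034, 0.0035, 0.0058],
    [0.0037, 0.0024, 0.0037, 0.0022, 0.0098],
]


def cov(a: str, b: str) -> float:
    """Covariance of two basic-parameter estimates (variance when a == b)."""
    i, j = PARAMS.index(a), PARAMS.index(b)
    return LOWER[max(i, j)][min(i, j)]


def functional_contrast_se(k: str, j: str) -> float:
    """Standard error of the difference between basic parameters k and j."""
    return math.sqrt(cov(k, k) + cov(j, j) - 2 * cov(k, j))


# diuretics versus beta-blockers
print(round(functional_contrast_se("DP", "BBP"), 2))
```

Ignoring the covariance term would overstate the uncertainty here, since the two basic contrasts share the placebo reference and are positively correlated.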
| Comparison | Meta-regression OR | Meta-regression 95% CI | Hierarchical model OR | Hierarchical model 95% CI |
|---|---|---|---|---|
| β-blockers vs. placebo | 1.27 | (1.07, 1.52) | 1.27 | (1.04, 1.55) |
| diuretics vs. placebo | 1.38 | (1.15, 1.64) | 1.38 | (1.13, 1.68) |
| CCB vs. placebo | 1.08 | (0.93, 1.27) | 1.07 | (0.88, 1.30) |
| ACE inhibitors vs. placebo | 0.90 | (0.77, 1.05) | 0.91 | (0.78, 1.07) |
| ARB vs. placebo | 0.84 | (0.69, 1.03) | 0.83 | (0.68, 1.01) |
| diuretics vs. β-blockers | 1.08 | (0.91, 1.29) | 1.08 | (0.89, 1.32) |
| CCB vs. diuretics | 0.79 | (0.67, 0.92) | 0.78 | (0.65, 0.93) |
| CCB vs. β-blockers | 0.85 | (0.76, 0.96) | 0.84 | (0.74, 0.97) |
| ACE inhibitors vs. CCB | 0.84 | (0.71, 0.98) | 0.85 | (0.70, 1.04) |
| ACE inhibitors vs. diuretics | 0.65 | (0.56, 0.76) | 0.66 | (0.56, 0.79) |
| ACE inhibitors vs. β-blockers | 0.70 | (0.60, 0.82) | 0.71 | (0.61, 0.83) |
| ARB vs. ACE inhibitors | 0.93 | (0.75, 1.16) | 0.91 | (0.72, 1.16) |
| ARB vs. CCB | 0.78 | (0.64, 0.95) | 0.79 | (0.66, 0.93) |
| ARB vs. diuretics | 0.61 | (0.49, 0.76) | 0.61 | (0.49, 0.76) |
| ARB vs. β-blockers | 0.66 | (0.55, 0.81) | 0.66 | (0.54, 0.80) |

Fig. 4 Results from network meta-analysis conducted as meta-regression in STATA (black) ignoring correlations from multi-arm trials and as hierarchical model in WinBUGS that accounts for correlations (red). Diamonds are the point estimates of summary odds ratios (OR) and the horizontal lines represent the corresponding 95% confidence intervals (CI)
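Ranking treatments by their mean OR versus placebo amounts to a simple sort. The sketch below uses the meta-regression ORs for the five basic contrasts and assumes that a lower OR is preferable, because the outcome (incident diabetes) is harmful; the dictionary and function names are illustrative.

```python
# Odds ratios versus placebo from the meta-regression; placebo itself has OR = 1.
OR_VS_PLACEBO = {
    "placebo": 1.00,
    "beta-blockers": 1.27,
    "diuretics": 1.38,
    "CCB": 1.08,
    "ACE inhibitors": 0.90,
    "ARB": 0.84,
}


def rank_treatments(or_vs_reference: dict) -> list:
    """Order treatments from lowest to highest OR versus the reference."""
    return sorted(or_vs_reference, key=or_vs_reference.get)


for position, name in enumerate(rank_treatments(OR_VS_PLACEBO), start=1):
    print(position, name, OR_VS_PLACEBO[name])
```

A ranking based only on point estimates ignores the uncertainty in each OR, which is one reason the chapter later prefers ranking within the hierarchical model.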
model. For the simplest case of three treatments A, B, C, assume there are studies that inform all possible comparisons. The effect size for a study that compares A versus B is denoted by $y_{iAB}$. When only two-arm trials are included in the network, the likelihood for the observations is specific to the comparison being presented, i.e.,

$$y_{iAC} \sim N(\theta_{iAC}, s^2_{iAC}), \quad y_{iAB} \sim N(\theta_{iAB}, s^2_{iAB}), \quad y_{iBC} \sim N(\theta_{iBC}, s^2_{iBC}),$$

and similarly for the random effects

$$\theta_{iAC} \sim N(\mu_{AC}, \tau^2), \quad \theta_{iAB} \sim N(\mu_{AB}, \tau^2), \quad \theta_{iBC} \sim N(\mu_{BC}, \tau^2).$$

This situation is depicted in Fig. 5. The model so far is a collection of three independent meta-analyses. The three distributions relate to three different means, one mean per comparison. The consistency assumption claims that the three means are related via $\mu_{BC} = \mu_{AC} - \mu_{AB}$. This constraint results in the indirect estimation of B versus C and, if there are studies making the comparison directly, results also in the synthesis of the indirect
Fig. 5 Hypothetical example from three sets of meta-analyses (BC, AB, AC) that form a closed loop of evidence, with AB = AC − BC. The diamonds represent the summary effects using a random effects meta-analysis and a common heterogeneity parameter. The diamond in dashed line is the indirect estimate for the AB comparison
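The relation AB = AC − BC in Fig. 5 can be turned into an indirect estimate, which is then pooled with direct evidence. A minimal sketch, assuming the AC and BC summaries come from independent sets of studies (so their variances add) and that the mixed estimate is the inverse-variance weighted average; all numbers and function names are hypothetical.

```python
def indirect_estimate(mu_ac, var_ac, mu_bc, var_bc):
    """Indirect AB estimate via AB = AC - BC; variances add for independent sources."""
    return mu_ac - mu_bc, var_ac + var_bc


def mixed_estimate(estimates):
    """Inverse-variance weighted average of (estimate, variance) pairs."""
    weights = [1.0 / variance for _, variance in estimates]
    pooled = sum(w * m for w, (m, _) in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


# Hypothetical summary log-odds ratios and variances for AC and BC.
indirect = indirect_estimate(0.5, 0.04, 0.2, 0.05)
direct = (0.4, 0.09)  # hypothetical direct AB summary
print(mixed_estimate([direct, indirect]))
```

The indirect estimate is always less precise than either of the two summaries it is built from, which is why it adds most when direct evidence is sparse.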
evidence with the direct evidence. Indirect estimation is represented by the dashed diamond in Fig. 5. Of course the consistency relationship works by estimating indirect and mixed estimates for all comparisons, not just B versus C.

Extending the idea to more than three treatments is straightforward. For any two treatments j, k = {A, B, C, D, E} compared in study i, the likelihood and the random effects distributions can be written as in a subgroup analysis, i.e., specific to the comparison j versus k

$$y_{ijk} \sim N(\theta_{ijk}, s^2_{ijk}) \qquad (7)$$

$$\theta_{ijk} \sim N(\mu_{jk}, \tau^2) \qquad (8)$$

Assuming consistency, the means of the random effects distributions are related. Selecting again T − 1 basic parameters $\mu_{Aj}$, all means are related via $\mu_{jk} = \mu_{Ak} - \mu_{Aj}$. There are as many consistency equations as comparison-specific meta-analyses, that is, $N_{comp}$, each equation expressing every comparison that appears in the data as a combination of the basic parameters. This gives the model $N_{comp} - (T - 1)$ degrees of freedom.

Note that the formulation above assumes that the different comparisons have the same heterogeneity variance, as in the meta-regression model in section “Network Meta-analysis as a Linear Model.” The impact of this assumption can be better visualized in Fig. 5, where the three forest plots appear to reflect different degrees of heterogeneity. Nevertheless, the three random effects distributions have the same dispersion as a result of imposing a common heterogeneity parameter for their variance, $\tau^2$. This will “inflate” the uncertainty in the summary estimates in the more homogeneous sets of studies BC and AB and “deflate” the uncertainty in the AC estimate by attaching a lower $\tau^2$ value for this comparison. A notable consequence is that the network meta-analysis summary effect size for a particular comparison can be less precise than the summary effect size from direct evidence alone. This can happen when a comparison with very low or no heterogeneity enters a network that consists of heterogeneous comparisons. Then, the estimated common heterogeneity parameter (which will be higher than the true value for the homogeneous comparison) will impose greater uncertainty in the estimate for the homogeneous comparison,
which might outweigh the gain in precision offered by including the indirect evidence.

Fitting the hierarchical model within a Bayesian framework makes the use of the true likelihood for the data easier. In the case of dichotomous outcomes, each study reports numbers of successes per arm and the likelihood (Eq. 7) is substituted by two arm-specific binomial distributions

$$r_{ij} \sim B(p_{ij}, n_{ij}), \qquad r_{ik} \sim B(p_{ik}, n_{ik})$$

Then the probabilities $p_{ik}$, $p_{ij}$ are parameterized to produce a treatment effect measure $\theta_{ijk}$ (e.g., for log(OR), $\theta_{ijk} = \mathrm{logit}(p_{ij}) - \mathrm{logit}(p_{ik})$). The hierarchical network meta-analysis model is mathematically equivalent to the meta-regression as long as the contrast-specific data are used: they both have the same number of degrees of freedom and the same number of parameters.

Application: Network Meta-analysis as Hierarchical Model for Incident Diabetes

As in the meta-regression approach, all comparisons included in the four three-arm trials are assumed to be evaluated in three independent two-arm studies, and this results in 30 comparisons indexed with i. The same five basic parameters $\mu_{Aj}$ are chosen, with placebo as reference: $\mu_{BBP}$, $\mu_{DP}$, $\mu_{CCBP}$, $\mu_{ACEP}$, $\mu_{ARBP}$. Arm-specific data will be modelled using the binomial likelihood. A categorical covariate needs to be specified for each arm, with $x_{ij}$ showing the intervention given to arm j of study i and $x_{ik}$ the intervention of arm k ($x_{ij}, x_{ik}$ = {P, BB, D, CCB, ACE, ARB}). Fitting the model in WinBUGS, and using a half-normal prior distribution $\tau \sim N(0, 1)$, $\tau > 0$ for the common heterogeneity, gives the results in Table 6.

These estimates are comparable to the effect sizes obtained by the meta-regression approach (Table 4, Fig. 4). The most important difference is, as in subgroup analysis, in the estimation of heterogeneity. Although both meta-regression and the hierarchical model result in the same point estimate for heterogeneity of $\hat{\tau}^2 = 0.02$, the Bayesian approach accounts for uncertainty in this value with a 95% CrI (0.01, 0.07) and provides estimates of the ORs with slightly wider intervals.

Table 6 Results of network meta-analysis as hierarchical model for incident diabetes. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported

| Comparison | $\hat{\mu}$ | $SE(\hat{\mu})$ | OR | 95% CrI for OR |
|---|---|---|---|---|
| β-blockers versus placebo | 0.24 | 0.10 | 1.27 | (1.04, 1.55) |
| diuretics versus placebo | 0.32 | 0.10 | 1.38 | (1.13, 1.68) |
| CCB versus placebo | 0.07 | 0.10 | 1.07 | (0.88, 1.30) |
| ACE inhibitors versus placebo | −0.09 | 0.08 | 0.91 | (0.78, 1.07) |
| ARB versus placebo | −0.19 | 0.10 | 0.83 | (0.68, 1.01) |

One advantage of conducting network meta-analysis as a hierarchical model compared with a meta-regression approach is that ranking of all interventions included in the network is easier. This will be discussed in the next section, on the results from the model that accounts properly for the multi-arm trials.

Models for Data that Include Multi-arm Trials

When trials involve more than two arms, the network meta-analysis models described in sections “Network Meta-analysis as a Linear Model” and “Network Meta-analysis as a Hierarchical Model” are further complicated for two reasons. The first is the need to account for correlations induced by the fact that multi-arm trials inform more than one comparison. The second is that multi-arm studies are inherently consistent; if A, B, and C are all included within the same study i, then it is plainly the case that $y_{iBC} = y_{iAC} - y_{iAB}$ where $y_{ikj}$ is the effect size in study i for the contrast k versus j. This means that if a study has $\alpha_i$ arms, then only $\alpha_i - 1$ of the $\alpha_i(\alpha_i - 1)/2$ possible comparisons are linearly independent, and so only $\alpha_i - 1$ need to be modelled. This inherent consistency also
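Table 7 Data for network meta-analysis with a three-arm trial

| Study i | No. arms $\alpha_i$ | Arms/design | Data | Comparison |
|---|---|---|---|---|
| 1 | 2 | A, C | $y_{1AC}, s^2_{1AC}$ | AC |
| 2 | 2 | B, C | $y_{2BC}, s^2_{2BC}$ | BC |
| 3 | 2 | A, B | $y_{3AB}, s^2_{3AB}$ | AB |
| 4 | 3 | A, B, C | $y_{4AC}, s^2_{4AC}$; $y_{4AB}, s^2_{5AB}$; $\mathrm{cov}(y_{4AC}, y_{4AB}) = c$ | AC, AB |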
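The covariance c listed for study 4 arises because its two effect sizes share one arm. As a minimal sketch, the closed-form expressions the chapter gives for that shared arm (events r out of n randomized, or standard deviation sd for a continuous outcome) can be written as small helpers; the function names are illustrative and the example counts are hypothetical.

```python
def cov_log_odds_ratio(r: float, n: float) -> float:
    """Covariance of two ln(OR) sharing an arm with r events out of n."""
    return 1.0 / r + 1.0 / (n - r)


def cov_log_risk_ratio(r: float, n: float) -> float:
    """Covariance of two ln(RR) sharing an arm with r events out of n."""
    return 1.0 / r - 1.0 / n


def cov_risk_difference(r: float, n: float) -> float:
    """Covariance of two risk differences sharing an arm with r events out of n."""
    return r * (n - r) / n ** 3


def cov_mean_difference(sd: float, n: float) -> float:
    """Covariance of two mean differences sharing an arm with standard deviation sd."""
    return sd ** 2 / n


# Hypothetical shared arm: 10 events among 100 randomized patients.
print(cov_log_odds_ratio(10, 100))
print(cov_log_risk_ratio(10, 100))
print(cov_risk_difference(10, 100))
```

In each case the covariance is the part of the effect-size variance contributed by the shared arm alone, so it never exceeds either of the two variances.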
makes the calculation of the number of degrees of freedom difficult and the formula $N_{comp} - (T - 1)$ no longer holds (see also section “Statistical Methods to Detect Inconsistency in a Network of Interventions”).

Consider the case of three treatments and four studies as presented in Table 7. In study four, only two of the three contrasts need to be included in the model as the third effect size $y_{iBC}$ can be simply computed as $y_{iAC} - y_{iAB}$. Thus, the study will contribute directly to two out of the three meta-analyses in Fig. 5. The two observed effect sizes $y_{iAC}$, $y_{iAB}$ are correlated as they both include the common treatment C. This covariance needs to be taken into account in the analysis.

Note that in Table 7, the data for study 4 include the sample covariance $\mathrm{cov}(y_{4AC}, y_{4AB})$, denoted also as c. The covariance can be estimated from the data as the variance of the outcome in the common arm. For example, if the outcome is continuous and the effect size is the mean difference, it turns out that c is the sample variance of the outcome in the common arm C, that is, $sd^2_{iC}/n_{iC}$. When the outcome is dichotomous and the effect size is the ln(OR), the covariance is $c = 1/r_C + 1/(n_C - r_C)$. When the outcome is measured on the risk ratio scale (RR), then the covariance for ln(RR) is $c = 1/r_C - 1/n_C$ and for risk difference it is $c = r_C(n_C - r_C)/n_C^3$.

The meta-regression model as presented in section “Network Meta-analysis as a Linear Model” does not account for the dependence between the observations in study 4. Moreover, correlations are present not only in the observations $y_{4AC}$, $y_{4AB}$ but also in their underlying random effects $\delta_{4AC}$, $\delta_{4AB}$.

Using matrix notation, the meta-regression model in section “Network Meta-analysis as a Linear Model” will have the form

$$
\begin{pmatrix} y_{1AC} \\ y_{2BC} \\ y_{3AB} \\ y_{4AC} \\ y_{4AB} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 \\ 1 & -1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \mu_{AC} \\ \mu_{AB} \end{pmatrix}
+
\begin{pmatrix} \delta_{1AC} \\ \delta_{2BC} \\ \delta_{3AB} \\ \delta_{4AC} \\ \delta_{4AB} \end{pmatrix}
+
\begin{pmatrix} e_{1AC} \\ e_{2BC} \\ e_{3AB} \\ e_{4AC} \\ e_{5AB} \end{pmatrix}
\qquad (9)
$$

To account for the fact that the random errors and the random effects that belong to the same study are correlated, it is assumed that $e \sim N(0, S^2)$ and $\delta \sim N(0, T^2)$ where e, δ are the vectors of the random errors and random effects, $S^2$ is the within-studies variance-covariance matrix (estimated from the data), and $T^2$ is the between-studies variance-covariance matrix (and consists of unknown parameters to be estimated from the model). For the data in Table 7, the within-studies variance-covariance matrix is

$$
S^2 =
\begin{pmatrix}
s^2_{1AC} & 0 & 0 & 0 & 0 \\
0 & s^2_{2BC} & 0 & 0 & 0 \\
0 & 0 & s^2_{3AB} & 0 & 0 \\
0 & 0 & 0 & s^2_{4AC} & c \\
0 & 0 & 0 & c & s^2_{5AB}
\end{pmatrix}
$$

whereas the between-studies variance-covariance matrix is
$$
T^2 =
\begin{pmatrix}
\tau^2_{AC} & 0 & 0 & 0 & 0 \\
0 & \tau^2_{BC} & 0 & 0 & 0 \\
0 & 0 & \tau^2_{AB} & 0 & 0 \\
0 & 0 & 0 & \tau^2_{AC} & \mathrm{cov}(\delta_{4AC}, \delta_{4AB}) \\
0 & 0 & 0 & \mathrm{cov}(\delta_{4AC}, \delta_{4AB}) & \tau^2_{AB}
\end{pmatrix}
=
\begin{pmatrix}
\tau^2 & 0 & 0 & 0 & 0 \\
0 & \tau^2 & 0 & 0 & 0 \\
0 & 0 & \tau^2 & 0 & 0 \\
0 & 0 & 0 & \tau^2 & \tau^2/2 \\
0 & 0 & 0 & \tau^2/2 & \tau^2
\end{pmatrix}
$$

As discussed before, it is often the case that a common heterogeneity parameter is assumed; that is, $\tau^2_{AC} = \tau^2_{BC} = \tau^2_{AB} = \tau^2$. This assumption offers an advantage in the case of multi-arm studies as it considerably simplifies the between-studies variance-covariance matrix $T^2$. It can be shown that when heterogeneity is equal across comparisons, the covariance of any two random effects is $\tau^2/2$. Then the matrix $T^2$ has $\tau^2$ in the diagonal and $\tau^2/2$ in the cells that refer to pairs of effects from the same study.

Similar considerations need to be made for the hierarchical model. The distributions (7) and (8) apply only to studies i = 1, 2, 3. For i = 4 the likelihood of the two-dimensional vector of effect sizes is

$$
\begin{pmatrix} y_{4AC} \\ y_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} \theta_{4AC} \\ \theta_{4AB} \end{pmatrix},
\begin{pmatrix} s^2_4 & \mathrm{cov}(y_{4AC}, y_{4AB}) \\ \mathrm{cov}(y_{4AC}, y_{4AB}) & s^2_5 \end{pmatrix}
\right)
$$

and the random effects are distributed assuming equal heterogeneities as

$$
\begin{pmatrix} \theta_{4AC} \\ \theta_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} \mu_{AC} \\ \mu_{AB} \end{pmatrix},
\begin{pmatrix} \tau^2 & \tau^2/2 \\ \tau^2/2 & \tau^2 \end{pmatrix}
\right).
\qquad (10)
$$

The consistency equations remain as presented in section “Network Meta-analysis as a Hierarchical Model”.

With arm-specific data and a hierarchical structure, no correlations need to be accounted for in the likelihood as the observations in arms are independent. For example, if study 4 presents the number of successes $r_{4A}$, $r_{4B}$, $r_{4C}$ for a dichotomous outcome out of the total $n_{4A}$, $n_{4B}$, $n_{4C}$ randomized, then the likelihood of the data consists of three binomial distributions with event probability parameters $p_{4A}$, $p_{4B}$, $p_{4C}$ which, once parameterized, give two effect sizes $\theta_4$, $\theta_5$ that correspond to underlying relative effects for treatments A and B compared to C (see section “Meta-analysis as Hierarchical Model”). So, for studies i = 1, 2, 3, the underlying random effects $\theta_{ijk}$ follow independent distributions as described in Eq. 8, but the random effects $\theta_{4AC}$, $\theta_{4AB}$ from the fourth study will follow the multivariate normal distribution (10).

Technical note: the multivariate normal distribution above can be decomposed into a series of conditional distributions; this offers computational advantages. Distribution (10) can be written as a set of one unconditional and one conditional distribution:

$$\theta_{4AB} \sim N(\mu_{AB}, \tau^2)$$

and

$$\theta_{4AC} \mid \theta_{4AB} \sim N\left(\mu_{AC} + \tfrac{1}{2}(\theta_{4AB} - \mu_{AB}), \; \tfrac{3\tau^2}{4}\right)$$

More generally, if a study i has $a_i$ arms that correspond to treatments $T_i$ = {A, B, C, D, ...} in this presented order, the $(a_i - 1)$-dimensional normal distribution of all treatments versus A can be “decomposed” by writing the independent distribution for $\theta_{iAB}$, then the conditional $\theta_{iAC} \mid \theta_{iAB}$, then $\theta_{iAD} \mid \theta_{iAB}, \theta_{iAC}$, and so on. The distribution of the random effect $\theta_{iAj}$ conditional on all “previous” comparisons $\theta_{iAk}$ has mean:

$$\mu_{Aj} + \frac{1}{a_i - 1} \sum_{k < j} (\theta_{iAk} - \mu_{Ak})$$

with variance

$$\frac{a_i}{a_i - 1} \, \frac{\tau^2}{2}$$

where k < j means that comparison Ak has been modelled before Aj.

Application: Network Meta-analysis with Multi-Arm Trials as Hierarchical Model for Incident Diabetes

In this application index i refers to studies (i = 1, ..., 22). There are 18 studies that compared only two interventions and thus have only
two arms ($\alpha_i = 2$), and there are four studies with three arms ($\alpha_i = 3$). The variable $\alpha_i$ = {2, 3} needs to be specified for each study i and then a binomial likelihood is assumed for the number of patients in all arms of each study. Using the index j to show the arm (treatment) within a study, the binomial likelihood is written as

$$r_{ij} \sim B(p_{ij}, n_{ij}), \quad i = 1, \ldots, 22, \quad j \in \{P, BB, D, CCB, ACE, ARB\}$$

The probabilities $p_{ij}$ can be parameterized to model $\alpha_i - 1$ effect sizes as

$\mathrm{logit}(p_{i1}) = u_i$ for the “first” (reference) arm in each study that pertains to treatment j

$\mathrm{logit}(p_{ij}) = u_i + \theta_{ijk}$ for the other arms in the study

The underlying ln(OR), $\theta_{ijk}$, compares treatments k and j (reported in the first arm) where j, k ∈ {P, BB, D, CCB, ACE, ARB}. For the multi-arm trials the correlation between $\theta_{ijk}$ and $\theta_{ijl}$, l ≠ j ≠ k, in the same trial is taken into account by the conditional mean and variance of their distribution. Table 8 shows the results of fitting this model in WinBUGS against placebo (basic parameters and prior distribution for heterogeneity are as in the application of hierarchical model that does not account for multi-arm trials).

Table 8 Results of network meta-analysis as hierarchical model for incident diabetes taking into account multi-arm trials. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported

| Comparison | $\hat{\mu}$ | $SE(\hat{\mu})$ | OR | 95% CrI for OR |
|---|---|---|---|---|
| β-blockers versus placebo | 0.24 | 0.10 | 1.27 | (1.04, 1.55) |
| diuretics versus placebo | 0.32 | 0.10 | 1.38 | (1.13, 1.68) |
| CCB versus placebo | 0.07 | 0.10 | 1.07 | (0.88, 1.30) |
| ACE inhibitors versus placebo | −0.09 | 0.08 | 0.91 | (0.78, 1.07) |
| ARB versus placebo | −0.19 | 0.10 | 0.83 | (0.68, 1.01) |

The estimate of common heterogeneity is 0.02 with 95% CrI (0.01, 0.07). Very little change is observed compared with the analyses above in which the correlations between multiple arms were ignored; this is probably due to the fact that multi-arm trials represent only 18% of our data. All pairwise ORs are presented in Fig. 4.

The posterior deviance from the analysis is D = 53.26 which, when compared to the number of data points (48), suggests a rather poor fit of the model to the data. The DIC of the model was estimated as 91.4.

Network Meta-analysis as a Multivariate Meta-analysis

Multivariate meta-analysis is an extension of meta-analysis that simultaneously synthesizes data on more than one outcome per study. For example, studies which compare antihypertensive interventions might measure the two related outcomes fatal stroke and nonfatal stroke. Some studies will only report fatal or only nonfatal stroke, others will report both. Because these two outcomes are correlated, there are important benefits in analyzing them jointly via multivariate meta-analysis, including improved precision and calculation of confidence regions for both outcomes (Jackson et al. 2011; Riley 2009).

Multiple treatment comparisons reported by multi-arm studies may be viewed in a similar way to multiple outcomes. Specifically, the basic contrasts can be considered analogous to different outcomes, where the basic contrasts are the set of necessary comparisons to represent all comparisons under the consistency assumption (e.g., the contrasts Aj of each treatment versus a common reference treatment A). Studies may report on many, all, or a single basic contrast. In the example of Table 7, the basic contrasts are the contrasts AC and AB. So, study 1 reports on the first “outcome” AC, study 3 reports on the second “outcome” AB, and study 4 reports on both “outcomes.”

A departure from the analogy arises for study 2, which compares B and C. This study gives
information about a combination of the two “outcomes.” To model the BC study, the assumption of consistency is employed. As presented in section “Assumptions Underlying Indirect and Mixed Comparisons” transitivity suggests that the missing arm in a study is missing at random. If study 2 had reported arm A, then the two “outcomes” $y_{2AC}$, $y_{2AB}$ could have been derived. This suggests a simple imputation strategy, whereby data in the “missing” arm can be created via a data augmentation technique (White 2011). The imputed data are designed to provide minimal information, for example, by giving them a very large variance (for continuous outcomes) or a very small sample size less than one (for dichotomous outcomes). The two effect sizes $y_{2AC}$, $y_{2AB}$ are correlated, and together they give information about the direct observed BC contrast. The data can be rewritten as in Table 9:

Table 9 Data for network meta-analysis assuming data augmentation for study 2

| Study i | No. arms $\alpha_i$ | Arms/design | Data | Contrast |
|---|---|---|---|---|
| 1 | 2 | A, C | $y_{1AC}, s^2_{1AC}$ | AC |
| 2 | 2 | A (imputed), B, C | $y_{2AC}, s^2_{2AC}$; $y_{2AB}, s^2_{2AB}$; $\mathrm{cov}(y_{2AC}, y_{2AB}) = c_2$ | BC |
| 3 | 2 | A, B | $y_{3AB}, s^2_{3AB}$ | AB |
| 4 | 3 | A, B, C | $y_{4AC}, s^2_{4AC}$; $y_{4AB}, s^2_{5AB}$; $\mathrm{cov}(y_{4AC}, y_{4AB}) = c_4$ | AC, AB |

Then, following standard multivariate meta-regression techniques and assuming equal heterogeneities:

$$
\begin{pmatrix} y_{1AC} & \cdot \\ y_{2AC} & y_{2AB} \\ \cdot & y_{3AB} \\ y_{4AC} & y_{4AB} \end{pmatrix}
=
\begin{pmatrix} 1 & \cdot \\ 1 & 1 \\ \cdot & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} \mu_{AC} \\ \mu_{AB} \end{pmatrix}
+
\begin{pmatrix} \delta_{1AC} & \cdot \\ \delta_{2AC} & \delta_{2AB} \\ \cdot & \delta_{3AB} \\ \delta_{4AC} & \delta_{4AB} \end{pmatrix}
+
\begin{pmatrix} e_{1AC} & \cdot \\ e_{2AC} & e_{2AB} \\ \cdot & e_{3AB} \\ e_{4AC} & e_{4AB} \end{pmatrix}
$$

with

$$
\begin{pmatrix} \delta_{1AC} & \cdot \\ \delta_{2AC} & \delta_{2AB} \\ \cdot & \delta_{3AB} \\ \delta_{4AC} & \delta_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} 0 & \cdot \\ 0 & 0 \\ \cdot & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} \tau^2 & 0 & 0 & 0 \\ 0 & M & 0 & 0 \\ 0 & 0 & \tau^2 & 0 \\ 0 & 0 & 0 & M \end{pmatrix}
\right)
$$

and

$$
\begin{pmatrix} e_{1AC} & \cdot \\ e_{2AC} & e_{2AB} \\ \cdot & e_{3AB} \\ e_{4AC} & e_{4AB} \end{pmatrix}
\sim MVN\left(
\begin{pmatrix} 0 & \cdot \\ 0 & 0 \\ \cdot & 0 \\ 0 & 0 \end{pmatrix},
\begin{pmatrix} s^2_{1AC} & 0 & 0 & 0 \\ 0 & L_1 & 0 & 0 \\ 0 & 0 & s^2_{3AB} & 0 \\ 0 & 0 & 0 & L_2 \end{pmatrix}
\right)
$$

$$
L_1 = \begin{pmatrix} s^2_{2AC} & c_2 \\ c_2 & s^2_{2AB} \end{pmatrix}, \quad
L_2 = \begin{pmatrix} s^2_{4AC} & c_4 \\ c_4 & s^2_{5AB} \end{pmatrix}
\quad \text{and} \quad
M = \begin{pmatrix} \tau^2 & \tau^2/2 \\ \tau^2/2 & \tau^2 \end{pmatrix}.
$$

It has been shown that the choice of the basic contrasts and the data augmentation parameters do not impact on the estimation of the effects. The model can be fitted using the mvmeta command in STATA, providing estimates of the means and standard errors for the basic parameters $\mu_{AC}$ and $\mu_{AB}$. As described earlier, combinations of the basic parameters can give estimates of all functional parameters through application of the consistency equations, and uncertainty in these can be obtained by incorporating covariances between the estimates.

Application: Network Meta-analysis for Incident Diabetes as a Multivariate Meta-analysis

The use of standard multivariate meta-regression requires that all studies have data for the treatment that has been chosen as reference. When none of the treatments of the network is common to all studies (as in the current example), one of the treatments can be chosen to be the common reference treatment and then the data augmentation technique is applied. More specifically, choosing
placebo as reference implies that in studies without a placebo arm, we need to “impute” data for a very small sample size for an assumed placebo arm, and here the values $r_{iP} = 0.001$ and $n_{iP} = 0.01$ are used. Then all studies will report on the relative effectiveness of the included treatments versus placebo, $y_{iPj}$ where j = {BB, D, CCB, ACE, ARB}. The sample variance-covariance matrix S of all $y_{iPj}$ needs to be specified. As the outcome is measured using the odds ratio (OR), the variances of all observations are calculated using the formula:

$$s^2_{iPj} = \frac{1}{r_P} + \frac{1}{n_P - r_P} + \frac{1}{r_j} + \frac{1}{n_j - r_j}$$

and the covariances are calculated as

$$\mathrm{cov}(y_{iPj}, y_{iPk}) = \frac{1}{r_P} + \frac{1}{n_P - r_P}$$

The variance-covariance matrix of the random effects can be modelled in various ways. The most flexible structure is to estimate different heterogeneity variances $\tau^2_{Pj}$ for each comparison (j vs. P). In the analyses that follow, a much more restricted structure is used, following the assumption of a common heterogeneity variance as has been used in previous analyses in the chapter. This sets $\tau^2_{Pj} = \tau^2$ so all covariances between random effects are $\tau^2/2$. This model can be implemented in STATA using the mvmeta command with the option bscov(), which gives the results of Table 10.

Table 10 Results of network meta-analysis as multivariate meta-analysis for incident diabetes. Log-odds ratios ($\hat{\mu}$) with their standard error $SE(\hat{\mu})$ and odds ratios (OR) with their 95% confidence interval (CI) for all basic contrasts are reported

| Comparison | $\hat{\mu}$ | $SE(\hat{\mu})$ | OR | 95% CI for OR |
|---|---|---|---|---|
| β-blockers versus placebo | 0.21 | 0.08 | 1.24 | (1.05, 1.44) |
| diuretics versus placebo | 0.28 | 0.08 | 1.32 | (1.12, 1.56) |
| CCB versus placebo | 0.04 | 0.08 | 1.04 | (0.89, 1.21) |
| ACE inhibitors versus placebo | −0.12 | 0.07 | 0.88 | (0.77, 1.10) |
| ARB versus placebo | −0.19 | 0.09 | 0.83 | (0.70, 0.98) |

Estimates for all functional comparisons can be derived with the use of consistency equations. There are small differences between the results of this approach and the corresponding results of the hierarchical model. Using the restricted maximum likelihood estimator in mvmeta results in $\hat{\tau}^2 = 0.01$, which is the same as the heterogeneity estimated in the hierarchical model.

Assumptions of Network Meta-analysis

As presented in section “Estimating Inconsistency in Mixed Comparisons,” inconsistency in a network can manifest as a disagreement between different sources of evidence for the same comparison and can be identified statistically. For example, an indirect estimate of A versus B via a treatment C can be in conflict with the direct estimate or with another indirect estimate, e.g., A versus B via a treatment D.

Both the likelihood of transitivity (based on clinical and epidemiological considerations) and any evidence of (in)consistency (based on statistical considerations) should be evaluated in a network as part of a network meta-analysis. Conceptual evaluation involves a priori judgements about the comparability of the studies across comparisons with respect to the distribution of potential confounders, considering whether treatments were all given for the same indication and considering whether anchor treatments are equivalent. Such judgements should ideally be made before the outcome data are extracted but after the studies and their characteristics are collected.

Although transitivity and consistency are interwoven concepts and are often thought of as one, it can be useful to consider them separately for ease of evaluation. Consider, for instance, the network presented in Fig. 6 where all treatments have been compared with placebo but not with each other. In
Fig. 6 Plot of a network for efficacy of pharmacotherapeutic agents for anxiety disorders in children and adolescents (nodes: Placebo, Venlafaxine, Fluoxetine, Sertraline, Fluvoxamine, Paroxetine). The size of the nodes is proportional to the number of studies that evaluate each intervention and the thickness of the lines is proportional to the frequency of each comparison in the network (Uhtman and Abdulmalik 2010)
this network, each given comparison is informed by a single source of evidence (there are no closed loops in the network), and therefore it is impossible to observe inconsistency by statistical means. However, network meta-analysis can be performed and will yield estimates for the relative effects of all active treatments (which in this case are only indirect estimates). These estimates are valid only if transitivity can be assumed for the common “anchor” treatment (placebo). Therefore, judgements about the plausibility of transitivity should be made for the entire network, as indirect evidence is derived for all treatments irrespective of whether they belong to “closed loops” or not.

Statistical Methods to Detect Inconsistency in a Network of Interventions

The assumption of consistency can be evaluated statistically in a full network by extending the idea outlined in section “Estimating inconsistency in Mixed comparisons”. A network often comprises several closed loops (triangles, quadrilaterals, etc.) which bring together evidence for the same comparison from direct and various indirect routes. Within each one of these closed loops, the consistency assumption, seen as agreement between direct and indirect estimates, can be evaluated. A first approach is therefore to evaluate inconsistency in all loops by calculating the loop-specific inconsistency factors $\widehat{IF}$, their confidence intervals, and a z-test for each one. The $\widehat{IF}$s of a network can be presented in a forest-plot-like graph where deviations from the consistency assumption would be reflected in loops with confidence intervals incompatible with zero. Note that not all loops need to be presented and tested; for example, if a quadrilateral consists of two triangles, inconsistency needs to be evaluated only in the triangles. Consider, for example, the network in Fig. 7. Inconsistency can be evaluated in ABC and ADB loops; if these are consistent then the quadrilateral ADBC would be consistent as well.

Fig. 7 Fictional network of interventions (nodes A, B, C, D, E, F) with two triangular loops

The loop-based approach is simple and can be useful for identifying loops that deviate from consistency, but has important limitations. An obvious
problem is that the loop-specific tests are not independent as they share groups of studies. Consider, for example, the network in Fig. 7 and imagine that the AB comparison is informed by a single study in which an unobserved characteristic produced an estimate very different from what would be expected in the other studies. Then both ABC and ADB loops will present inconsistency because the respective $\widehat{IF}$s share the same deviant AB study.

The loop-based approach does not provide a network-specific estimate of the inconsistency. The multiple dependent tests cannot be summarized into a global network-specific test. It is also unclear how to treat multi-arm trials, which are inherently consistent. Because of the dependence between the loops and the multiple testing nature of the approach, the results should be interpreted with caution; the absence of inconsistent loops may be reassuring for the assumption of consistency (notwithstanding the lack of power of such tests), but the presence of statistically significant loops cannot be used to infer the magnitude of inconsistency in a network.

In the special case where the loops share a single comparison as in Fig. 7, a chi-squared test can be applied (Caldwell et al. 2010). For the same comparison AB, there are three estimates: the direct estimate $\hat{\mu}^{D}_{AB}$ and the two indirect estimates via C and D, $\hat{\mu}^{I}_{ABviaC}$ and $\hat{\mu}^{I}_{ABviaD}$, respectively, with their estimated variances noted as $\hat{v}^{D}_{AB}$, $\hat{v}^{I}_{ABviaC}$, $\hat{v}^{I}_{ABviaD}$. The mixed estimate $\hat{\mu}^{M}_{AB}$ is the weighted average of the three estimates with weights being the inverse of the variance. To test both ABC and ADB loops (and therefore provide a global test for the network) the following chi-squared test can be applied

$$
\frac{(\hat{\mu}^{D}_{AB} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{D}_{AB}}
+ \frac{(\hat{\mu}^{I}_{ABviaC} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{I}_{ABviaC}}
+ \frac{(\hat{\mu}^{I}_{ABviaD} - \hat{\mu}^{M}_{AB})^2}{\hat{v}^{I}_{ABviaD}}
\sim \chi^2_2
$$

This can be generalized to combine testing for disagreement between the direct estimate and l independent indirect sources; the weighted sum of the squared differences of each estimate from the mixed estimate will follow a chi-squared distribution with l degrees of freedom.

The results of the loop-based approach can vary substantially depending on the method used to derive the pairwise estimates and their variances. In the presence of heterogeneity, the uncertainty of $\widehat{IF}$ will be larger in a random effects analysis compared with a fixed-effect analysis, and therefore there will be less chance of identifying statistically significant inconsistencies. The random effects approach will also give different results depending on which method is used to estimate the heterogeneity parameter τ (e.g., method of moments, restricted maximum likelihood). Some approaches will give larger estimates than others, resulting in different estimates for the uncertainty of $\widehat{IF}$. Moreover, the estimated pairwise variances will change depending on whether the same or different heterogeneity parameters are assumed in the loop.

There is currently limited empirical evidence about the occurrence of statistical inconsistency. A study evaluated 112 triangular networks of which only 16 were found inconsistent (Song et al. 2011). O’Regan et al. (2009) empirically evaluated the agreement between indirect and mixed estimates that appear in networks of at least four treatments. Using a fixed-effect approach, they concluded that the indirect and mixed estimates did not show important differences, although the 51 comparisons they examined came from only seven reviews.

Approaches that evaluate inconsistency globally in a network rather than testing each loop have gained in popularity but are typically cumbersome to apply and have limitations. For network models fitted within a Bayesian framework, the consistency assumption can be evaluated by comparing a model that assumes consistency with one that does not, using the DIC (Spiegelhalter et al. 2002). The model without consistency is the model described in section “Consistency Models” but without the consistency equations to derive indirect and mixed estimates. The inconsistency model relies only on direct evidence and is equivalent to a
series of pairwise meta-analyses (usually assuming, however, that they share the same heterogeneity parameter). The assumption of consistency is challenged when the inconsistency model presents, for the same data, a better trade-off between model fit and complexity; this is the case when the DIC for the inconsistency model is lower than the DIC for the consistency model by more than three units. An important drawback of this method is that results may depend on the parameterization of the multi-arm trials, from which only some of the study-specific effect sizes enter the model. Approaches that simultaneously test and account for inconsistency are discussed in the section “Inconsistency Models”.

Application: Statistical Evaluation of Inconsistency in Each Loop of Incident Diabetes Network

The network includes 16 “triangles” that can be evaluated for inconsistency. For the calculation of all inconsistency factors, the formulae of section 3.3.3 are employed. Then the estimates with their 95% CI can be plotted in a forest plot. The pairwise effect sizes were estimated using the random effects model assuming different and loop-common heterogeneity parameters. There are no important differences between the two forest plots; both include two inconsistent loops.

The hierarchical model is fitted as described in section “Network Meta-analysis as a Hierarchical Model” but omitting the consistency equations (i.e., an inconsistency model); this is essentially a sequence of pairwise meta-analyses. The value of the posterior deviance was $\bar{D} = 50.85$ and DIC = 93.6. Comparing the $\bar{D}$ value to that obtained from the consistency model, since the difference in DIC is smaller than three points, this suggests that the inconsistency model fits the data better and might also be the most parsimonious model.

Inconsistency Models

Two major approaches have been proposed so far to address inconsistency. The first approach was proposed in (Lu and Ades 2006) and is based on the idea that inconsistency is a property of closed loops and a network can have as many inconsistencies as functional parameters. Recently, an approach has been proposed which extends the idea of inconsistency: it applies not only to the disagreement between direct and indirect estimates in a loop but also to disagreement between studies that report the same comparison but include different sets of treatments. The two approaches are outlined below, starting from the data in Table 7.

The loop-based inconsistency model assumes that inconsistency arises when the consistency equations between functional and basic parameters do not hold. Hence, an obvious solution is to “relax” the assumption by adding an extra term to account for inconsistencies. In the example of Table 7, there are two basic parameters $\mu_{AC}$, $\mu_{AB}$ and one functional $\mu_{BC} = \mu_{AC} - \mu_{AB}$. This reflects the closed loop ABC. Inconsistency in this loop can be accounted for if it is assumed that

\[ \mu_{BC} = \mu_{AC} - \mu_{AB} + w_{ABC} \]

where $w_{ABC}$ measures the amount of inconsistency in the loop. The term is also called an inconsistency factor and in fact in the absence of multiple correlated loops is analogous to the simple $\widehat{IF}$. In complex networks where many inconsistency factors exist, the parameters $w_{jkf}$ are assumed to be randomly distributed with expectation zero:

\[ w_{jkf} \sim N\left(0, \sigma^{2}\right) \]

The variance $\sigma^{2}$ is often referred to as the inconsistency variance in analogy with the heterogeneity variance $\tau^{2}$ in the distribution of the study-specific random effects $\delta_{i} \sim N(0, \tau^{2})$. The inconsistency variance $\sigma^{2}$ describes the amount of variability across loops in the conflict between direct and indirect evidence. Monitoring the individual $w_{jkf}$ for large values will reveal loops with important inconsistency, whereas comparison of $\sigma^{2}$ to $\tau^{2}$ will show how much inconsistency exists compared with the heterogeneity.

As the degrees of freedom in a network describe the number of functional parameters, there are
$N_{com} - (T - 1)$ many inconsistency factors. Problems arise with this approach when there are multi-arm trials. The ABC trials in Table 7 are inherently consistent, and therefore the BC comparison reported in these studies does not contribute to the inconsistency as much as the BC comparison in an independent study. Lu and Ades suggested adjusting the inconsistency degrees of freedom to $ICDF = N_{com} - (T - 1) - S$ where S is the number of independent inconsistency relations in which the corresponding parameters are supported by no more than two independent sources of evidence. In practice, S is the number of functional comparisons where two out of the three parameters are only estimated in multi-arm trials.

The difficulties in fully defining loop inconsistency when there are multi-arm studies motivated the concept of “design inconsistency.” Design inconsistency reflects the belief that studies which include different treatments might give different estimates for the same comparison. For example, an AB study and ABC study might provide different estimates because of their different design. Design inconsistency can be thought of as a special case of source-specific heterogeneity: variation between the estimates for the same comparison due to differences in the total treatments included. In the data of Table 7, this means adding an inconsistency factor for the disagreement between the three estimates in the ABC study and the AB, AC, and BC studies. The model with both loop and design inconsistency has $N_{CompDesign} - (T - 1)$ inconsistency factors, where $N_{CompDesign}$ is the number of independent comparisons per design. In Table 7 there is one independent comparison for each two-arm trial and two independent comparisons for the three-arm trials. This results in a total of three inconsistency factors for the network. These inconsistency factors are comparison-specific and are attached to every study reporting that comparison. For instance, one inconsistency factor is attached to each AB, AC, and BC study, respectively ($w_{AB}$, $w_{AC}$, $w_{BC}$). As the inconsistency factors derived in this way are independent, they can be summarized in a single test for the entire network (see White 2011 for details).

One further approach for detecting inconsistency in a network meta-analysis is “node splitting” (Dias et al. 2010) where a “node” refers to each summary effect generated from the network meta-analysis. This approach is based on the separation of the information contributing to each node into the direct and indirect evidence, within a single model. The node-splitting approach allows the analyst to split the network-wide information contributing to the summary estimate into the evidence directly comparing B versus C ($\hat{\mu}^{D}_{BC}$) and all the remaining “indirect” evidence for B versus C ($\hat{\mu}^{I}_{BC}$) after the studies directly comparing B to C have been removed. The extent of agreement between the direct and indirect estimates defines the magnitude of consistency. Note that this is a computationally intensive approach involving models that can be difficult to parameterize; care should be taken to ensure that multi-arm trials are handled correctly and to ensure that split nodes are actually from contrasts contributing to suspect loops.

Application: Hierarchical Inconsistency Model for Network Meta-analysis in Incident Diabetes

The application of a hierarchical inconsistency model requires careful choice of the basic parameters $\mu_{Aj}$ and the inconsistency factors $w_{jkf}$, as well as appropriate parameterization of multi-arm trials. First, all basic contrasts need to be informed directly from at least one study. Choosing placebo as reference treatment (A) satisfies this condition, because all other treatments are compared directly with placebo in at least one study. Second, the four multi-arm trials included in the data may modify the number of ICDF that should be included in the model. However, as all consistency equations are informed by at least three independent sources of evidence, it is

\[ ICDF = N_{com} - (T - 1) - S = 14 - (6 - 1) - 0 = 9 \]

The consistency relations can be relaxed to include the nine inconsistency parameters:
\[
\begin{aligned}
\mu_{\mathrm{D\,BB}} &= \mu_{\mathrm{D\,P}} - \mu_{\mathrm{BB\,P}} + w_{\mathrm{P\,D\,BB}} \\
\mu_{\mathrm{CCB\,D}} &= \mu_{\mathrm{CCB\,P}} - \mu_{\mathrm{D\,P}} + w_{\mathrm{P\,CCB\,D}} \\
\mu_{\mathrm{CCB\,BB}} &= \mu_{\mathrm{CCB\,P}} - \mu_{\mathrm{BB\,P}} + w_{\mathrm{P\,CCB\,BB}} \\
\mu_{\mathrm{ACE\,CCB}} &= \mu_{\mathrm{ACE\,P}} - \mu_{\mathrm{CCB\,P}} + w_{\mathrm{P\,ACE\,CCB}} \\
\mu_{\mathrm{ACE\,D}} &= \mu_{\mathrm{ACE\,P}} - \mu_{\mathrm{D\,P}} + w_{\mathrm{P\,ACE\,D}} \\
\mu_{\mathrm{ACE\,BB}} &= \mu_{\mathrm{ACE\,P}} - \mu_{\mathrm{BB\,P}} + w_{\mathrm{P\,ACE\,BB}} \\
\mu_{\mathrm{ARB\,CCB}} &= \mu_{\mathrm{ARB\,P}} - \mu_{\mathrm{CCB\,P}} + w_{\mathrm{P\,ARB\,CCB}} \\
\mu_{\mathrm{ARB\,D}} &= \mu_{\mathrm{ARB\,P}} - \mu_{\mathrm{D\,P}} + w_{\mathrm{P\,ARB\,D}} \\
\mu_{\mathrm{ARB\,BB}} &= \mu_{\mathrm{ARB\,P}} - \mu_{\mathrm{BB\,P}} + w_{\mathrm{P\,ARB\,BB}}
\end{aligned}
\]

where $w_{Pjk} \sim N(0, \sigma^{2})$ for j, k = {BB, D, CCB, ACE, ARB}. The rest of the model is the same as the consistency hierarchical model (accounting for multi-arm trials). Moreover, contrasts that are informed only from multi-arm trials need to be expressed in model parameters. Such contrasts are β-blockers versus placebo, included in a D-BB-P study, and ACE inhibitors versus CCB, included in two ACE-CCB-BB and one ACE-CCB-D studies. Since the model considers treatment 1 as the baseline treatment (A) of each study, placebo needs to be the first treatment in the data for the D-BB-P trial, and CCB or ACE inhibitors for the other three studies.

Table 11 shows the results of fitting this model in WinBUGS employing a half-normal prior distribution on the inconsistency variance $\sigma^{2}$ (the same as for the heterogeneity $\tau^{2}$).

Heterogeneity and inconsistency variances were estimated as 0.02 and 0.01, with 95% CrI (0, 0.06) and (0, 0.13), respectively. Some w-factors are quite large in relation to the treatment effect estimates, indicating that there is probably some inconsistency in the network. Note this is in agreement with the loop-specific approach. The loop placebo versus ACE inhibitors versus β-blockers presents the largest inconsistency value (0.04) followed by the loops placebo versus ARB versus β-blockers (0.02) and placebo versus diuretics versus β-blockers (0.02). The first two loops were also identified as inconsistent in Fig. 8, and the last appeared marginally consistent. There are no large differences in the point estimates of the summary ORs compared to those from the consistency model. However, the 95% CrI from the inconsistency model are wider to account for inconsistency.

Table 11 Results of inconsistency hierarchical model for network meta-analysis for incident diabetes. Inconsistency factors (w), log-odds ratios (μ̂) with their standard error SE(μ̂), and odds ratios (OR) with their 95% credible interval (CrI) for all basic and functional contrasts are reported. Missing values of w correspond to basic contrasts or functional contrasts without direct estimates available

Comparison                          w_Pjk   μ̂       SE(μ̂)   OR     95% CrI for OR
diuretics versus β-blockers         0.02    0.08    0.12    1.08   (0.86, 1.36)
CCB versus diuretics                0.00    −0.25   0.11    0.78   (0.62, 0.97)
CCB versus β-blockers               0.01    −0.17   0.11    0.84   (0.68, 1.04)
ACE inhibitors versus CCB           0.01    −0.19   0.11    0.83   (0.65, 1.00)
ACE inhibitors versus diuretics     0.01    −0.44   0.11    0.65   (0.50, 0.78)
ACE inhibitors versus β-blockers    0.04    −0.36   0.12    0.70   (0.53, 0.85)
ARB versus ACE inhibitors                   −0.07   0.12    0.93   (0.73, 1.18)
ARB versus CCB                      0.00    −0.26   0.12    0.77   (0.59, 0.95)
ARB versus diuretics                0.01    −0.51   0.13    0.60   (0.45, 0.76)
ARB versus β-blockers               0.02    −0.43   0.13    0.65   (0.49, 0.81)
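The degrees-of-freedom count and a single relaxed consistency relation from this application can be checked with a short sketch. It is only an illustration of the arithmetic, not the WinBUGS model fitted in the chapter; the function names `icdf` and `relaxed_contrast` and the numeric basic contrasts in the usage lines are assumptions made for the example.

```python
import math


def icdf(n_comparisons: int, n_treatments: int, s: int) -> int:
    """Inconsistency degrees of freedom: ICDF = N_com - (T - 1) - S."""
    return n_comparisons - (n_treatments - 1) - s


def relaxed_contrast(mu_j_ref: float, mu_k_ref: float, w: float = 0.0) -> float:
    """Functional contrast j versus k built from two basic contrasts
    (each versus the reference) plus an inconsistency factor w.
    Setting w = 0 recovers the ordinary consistency equation."""
    return mu_j_ref - mu_k_ref + w


# Counts quoted in the application: 14 comparisons, 6 treatments, S = 0.
print(icdf(14, 6, 0))  # 9

# Hypothetical basic contrasts (log-odds ratios versus placebo).
mu_ccb_p, mu_d_p = 0.05, 0.30
mu_ccb_d = relaxed_contrast(mu_ccb_p, mu_d_p, w=0.0)
# Log-odds ratio and the corresponding odds ratio for CCB versus diuretics.
print(round(mu_ccb_d, 2), round(math.exp(mu_ccb_d), 2))
```

In a fitted model the inconsistency factors are random effects estimated jointly with the basic parameters; here `w` is simply passed in to show where it enters the relation.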
[Figure 8: forest plots in two panels, (a) and (b); x-axis: Inconsistency Factor (0 to 4); rows: BB-D-CCB, P-BB-CCB, BB-CCB-ACE, BB-CCB-ARB, P-BB-D, BB-D-ACE, BB-D-ARB, P-BB-ACE, P-BB-ARB, P-D-CCB, D-CCB-ARB, D-CCB-ARB, P-CCB-ACE, P-CCB-ARB, P-D-ACE, P-D-ARB]

Fig. 8 Inconsistency factors of all triangles of incident diabetes network with (a) a different heterogeneity estimate for each comparison and (b) with a common heterogeneity estimate within each triangle. Triangles with statistically significant inconsistency factors (their 95% CI does not include 0) are considered as inconsistent
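Alongside the loop-specific factors in Fig. 8, the chi-squared statistic described earlier for loops that share a single comparison can be sketched in a few lines. The function name `shared_comparison_chi2` and the numbers in the usage example are hypothetical; the sketch assumes independent estimates and takes the degrees of freedom as the number of estimates minus one, which gives 2 for one direct and two indirect estimates.

```python
def shared_comparison_chi2(estimates, variances):
    """Return (mixed_estimate, chi2_statistic, degrees_of_freedom).

    estimates: direct and indirect log-odds ratios for one comparison.
    variances: their estimated variances, in the same order.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two estimates with matching variances")
    weights = [1.0 / v for v in variances]
    # Mixed estimate: inverse-variance weighted average of all estimates.
    mixed = sum(w * m for w, m in zip(weights, estimates)) / sum(weights)
    # Weighted sum of squared differences from the mixed estimate.
    chi2 = sum((m - mixed) ** 2 / v for m, v in zip(estimates, variances))
    return mixed, chi2, len(estimates) - 1


# Hypothetical direct estimate and two indirect estimates (via C and via D).
mixed, chi2, df = shared_comparison_chi2([-0.30, -0.10, -0.50], [0.04, 0.09, 0.16])
print(round(mixed, 3), round(chi2, 3), df)
```

The statistic would then be compared with the chi-squared distribution on `df` degrees of freedom; with 2 degrees of freedom the 5% critical value is about 5.99.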
The DIC of the model was 92.1 and $\bar{D} = 53.15$, showing that accounting for inconsistency does not improve the fit of the model as the consistency model resulted in almost the same values.

Exploring Heterogeneity and Inconsistency: Network Meta-regression

When heterogeneity is found in a pairwise meta-analysis, subgroup analysis or meta-regression are employed to explore possible sources. Network meta-regression is an extension of network meta-analysis to include covariates and can be used to explore heterogeneity and/or inconsistency. Covariates typically include study-specific variables such as setting or length of follow-up, within-trial bias characteristics such as the quality of randomization, of allocation concealment and blinding, or patient-level characteristics such as age or sex. Meta-regression is equivalent to subgroup analysis for dichotomous or categorical explanatory variables. Characteristics such as differences in baseline risk (if there is a common comparator) and sample size (as a single proxy for study quality) can also be considered.

The network meta-regression model as a hierarchical model is

\[ y_{ijk} \sim N\left(\theta_{ijk}, s^{2}_{ijk}\right) \]
\[ \theta_{ijk} = \theta^{*}_{ijk} + b_{ijk} C_{ijk} \]
\[ \theta^{*}_{ijk} \sim N\left(\mu_{jk}, \tau^{2}_{r}\right) \]

where $b_{ijk}$ are the regression coefficients for study i and comparison jk and $C_{ijk}$ the explanatory variable. The regression coefficients can be assumed to be fixed across studies ($b_{ijk} = \beta_{jk}$) or, if there are many studies per comparison, as exchangeable across studies ($b_{ijk} \sim N(\beta_{jk}, \gamma^{2})$). The model can be applied to multi-arm trials and also extended to account for inconsistency as described in previous sections.

Consistency can be imposed for the regression coefficients by choosing a reference treatment A and defining $\beta_{jk} = \beta_{Ak} - \beta_{Aj}$ (Cooper et al. 2009). To improve power, the independent $\beta_{Aj}$ can be assumed exchangeable; $\beta_{Aj} \sim N(B, \varphi^{2})$. Adjusting for factors that can vary across comparisons may reduce heterogeneity and improve the likelihood of transitivity. The
importance and impact of the adjustment can be judged by monitoring changes in the heterogeneity variance (compare $\tau^{2}_{r}$ to $\tau^{2}$) and inconsistency variance (compare $\sigma^{2}_{r}$ to $\sigma^{2}$), by monitoring the magnitude and significance of the coefficients $\beta_{jk}$ and by comparing the goodness of fit and parsimony of adjusted and unadjusted models using DIC and $\bar{D}$.

Network meta-regression suffers from the same problems as simple meta-regression. These include ecological bias when aggregated patient-level data are used as covariates, low power with few studies and high false-positive rates if heterogeneity not explained by the covariates is ignored (Higgins and Thompson 2004).

Adjusting for bias in a network of interventions offers the advantage of increased power compared with traditional meta-analysis sensitivity analysis, because the regression coefficients share information via the consistency equations. Suppose, for example, that comparison B versus C is informed by very few studies, or by studies that all have the same characteristic (e.g., they all have poor allocation concealment). Then, conducting sensitivity analysis or adjusting the meta-analysis result of BC for allocation concealment is suboptimal or impossible. However, when these studies are part of a network meta-regression model, the bias coefficient $\beta_{BC}$ for allocation concealment is linked to the other regression coefficients via $\beta_{BC} = \beta_{AC} - \beta_{AB}$ and $\beta_{Aj} \sim N(B, \varphi^{2})$.

A special application of network meta-regression is to address small study effects in a network of interventions. The association between sample size, effect size heterogeneity, and the probability of publication (which is often manifested as funnel plot asymmetry) has long been a challenging issue in meta-analysis. In a pairwise meta-analysis, the presence of small study effects (possibly due to publication bias) has been explored by regressing the underlying effect on a measure of the study precision. The same approach applies to networks of interventions to explore situations where comparisons that do not give significant results may be underrepresented or missing in the network and their relative effectiveness will be informed primarily by the indirect evidence. The covariate $C_{ijk}$ can be the sample standard error, variance, or inverse of sample size (references). However, significant associations between effect sizes and precision can be taken only as an indication of publication bias, as other explanations, including genuine heterogeneity, are possible. As publication bias and selective reporting will affect interventions and comparisons in different ways depending on the clinical context, the problem of selection bias in the network should be considered carefully. Further methodological development is needed to better address selection bias in network meta-analysis.

Because network meta-analysis combines studies that compare a treatment against a variety of comparators, it enables researchers to explore biases that are not identifiable in a head-to-head meta-analysis. “Optimism bias” associated with the use of novel interventions has been a concern difficult to address in a pairwise meta-analysis (Djulbegovic et al. 2011; Heres et al. 2006; Soares et al. 2005). However, in a network of interventions, the same treatment C can be the newer and hence the “favored” in a comparison A versus C but the older in another comparison B versus C. This enables us to explore apparent changes in the effectiveness of C because of optimism (Salanti et al. 2010).

Application: Network Meta-regression for Incident Diabetes Using Year of Publication as Covariate

A network meta-regression analysis of the incident diabetes data set will investigate whether differences in the publication year of included studies have an impact on the estimated treatment effects, and hence whether they can explain any of the heterogeneity and inconsistency of this network. Two meta-regression models will be used; one estimating a common fixed coefficient across all studies and all treatment comparisons and a second imposing consistency in coefficients. More specifically, in the first model, it is assumed that $b_{ijk} = B$ (i = 1, . . . , 22), and a vague normal prior distribution on the fixed coefficient, N(0, 10000), is employed. In the second model, the coefficients are assumed to be consistent, $b_{ijk} = \beta_{Ak} - \beta_{Aj}$ (A = P), and exchangeable, $\beta_{Aj} \sim N(B, \varphi^{2})$ (j = {BB, D,
CCB, ACE, ARB}), where $B$ and $\varphi^{2}$ are the mean and variance, respectively, of the distribution of all $\beta_{Aj}$, with normal ($B \sim N(0, 10000)$) and half-normal prior distributions. In both models a covariate $C_{i} - \bar{C}$ is used instead of $C_{i}$ (the year of publication of study i) for computational reasons (e.g., convergence of the models), where $\bar{C}$ is the mean publication year.

The estimate of the fixed regression coefficient from the first model (B) was −0.01, corresponding to an odds ratio that is multiplied by $e^{-0.01} = 0.99$ for each 1 year later of publication. However, the 95% CrI of B is (−0.03, 0.01), implying that there is no statistically significant effect of study publication year on treatments’ effectiveness.

The same inference is derived from the second meta-regression model, which estimates the mean (B) of the distribution of regression coefficients to be −0.02 (−0.07, 0.01) with variance $\varphi^{2}$ < 0.001. Table 12 shows the consistent coefficients ($\beta_{Aj}$) of all treatments versus placebo.

The estimated treatment effects by the two models are presented in Tables 13 and 14.

Table 12 Medians and 95% CrI of regression coefficients for comparisons of all treatments versus placebo estimated by network meta-regression model for incident diabetes with consistent and exchangeable coefficients

Comparison                        β       95% CrI
β-blockers versus placebo         −0.02   (−0.06, 0.01)
diuretics versus placebo          −0.02   (−0.06, 0.01)
CCB versus placebo                −0.03   (−0.07, 0.02)
ACE inhibitors versus placebo     −0.03   (−0.08, 0.01)
ARB versus placebo                −0.03   (−0.10, 0.03)

Table 13 Results of network meta-regression model with a common fixed coefficient for incident diabetes using year of publication as covariate. Log-odds ratios (μ̂) with their standard error SE(μ̂) and odds ratios (OR) with their 95% credible interval (CrI) for all comparisons are reported

Comparison                        μ̂       SE(μ̂)   OR     95% CrI for OR
β-blockers versus placebo         0.23    0.09    1.26   (1.06, 1.50)
diuretics versus placebo          0.26    0.09    1.29   (1.07, 1.56)
CCB versus placebo                0.06    0.09    1.07   (0.90, 1.26)
ACE inhibitors versus placebo     −0.10   0.07    0.91   (0.78, 1.05)
ARB versus placebo                −0.16   0.10    0.86   (0.70, 1.04)

Table 14 Results of network meta-regression model with consistent and exchangeable coefficients for incident diabetes using year of publication as covariate. Log-odds ratios (μ̂) with their standard error SE(μ̂) and odds ratios (OR) with their 95% credible interval (CrI) for all comparisons are reported

Comparison                        μ̂       SE(μ̂)   OR     95% CrI for OR
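The covariate centering and the interpretation of a regression coefficient on the odds-ratio scale can be illustrated with a small sketch. The helper names (`centered`, `odds_ratio_multiplier`, `consistent_coefficient`), the publication years, and the two coefficients versus placebo are hypothetical and chosen only for illustration; the coefficient of −0.01 per year is used because it reproduces the multiplier of 0.99 quoted in the text.

```python
import math


def centered(years):
    """Subtract the mean publication year from each study's year."""
    mean_year = sum(years) / len(years)
    return [y - mean_year for y in years]


def odds_ratio_multiplier(coef: float, delta: float = 1.0) -> float:
    """Factor by which the odds ratio changes when the covariate
    increases by `delta` units, for a log-odds-ratio coefficient."""
    return math.exp(coef * delta)


def consistent_coefficient(beta_ref: dict, j: str, k: str) -> float:
    """Consistency for coefficients: beta_jk = beta_Ak - beta_Aj."""
    return beta_ref[k] - beta_ref[j]


# Hypothetical publication years for three studies.
print(centered([1996, 2000, 2004]))  # [-4.0, 0.0, 4.0]

# One year and ten years later of publication.
print(round(odds_ratio_multiplier(-0.01), 2))  # 0.99
print(round(odds_ratio_multiplier(-0.01, delta=10), 2))

# Hypothetical coefficients versus placebo (reference A = P).
beta_ref = {"BB": -0.02, "CCB": -0.03}
print(round(consistent_coefficient(beta_ref, "BB", "CCB"), 2))
```

Centering leaves the regression coefficient unchanged and only shifts the meaning of the treatment effects, which then refer to a study published in the mean year.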
Both meta-regression models resulted in heterogeneity estimates $\hat{\tau}^{2} = 0.02$, the same as for the consistency hierarchical model (accounting for multi-arm trials), showing that year of publication as a covariate does not adequately explain the heterogeneity in the network.

The meta-regression with a fixed coefficient also does not improve the fit of the model ($\bar{D} = 53.85$, DIC = 92.2) compared with the hierarchical consistency model without any covariates, while the model with consistent coefficients shows a slightly better fit ($\bar{D} = 51.57$, DIC = 91.3).

The inconsistency model (as described in section “Consistency Models” but omitting the consistency equations) was also fitted with a fixed coefficient to investigate if differences in year of publication can explain the identified inconsistency. The value of the posterior deviance was $\bar{D} = 50.43$ and DIC = 93.4, essentially the same as for the inconsistency model not including any covariates. However, using the estimates of this model, the two inconsistent loops ACE-BB-P and ARB-BB-P become consistent with IF = 0.20 (−0.44, 0.85) and 0.18 (−0.53, 0.89), implying that year of publication is a possible explanation of inconsistency.

Numerical and Graphical Presentation of Results from Network Meta-analysis

Network meta-analysis involves many treatments and consequently results in a plethora of pairwise effect sizes. When presenting results from a network meta-analysis, it is useful to show both the direct and the mixed estimates along with their 95% confidence intervals and comment on any disagreements between them (e.g., as in Fig. 3). In a consistency model, all pairwise comparisons are possible and the effect sizes are often presented in the form of a “league table” or in a forest plot against a common comparator (see, e.g., Fig. 4). Presentation of the results using predictive intervals, though infrequent, best conveys the uncertainty due to heterogeneity.

Ranking measures and probabilities have become popular as they provide an understandable gateway to the results, particularly when there are many competing treatments. The probability of each treatment being the best is often calculated when the network model is fitted within the Bayesian framework. Methods are also available for similar ranking of treatments in a frequentist framework (White 2011). The probability of being the best treatment has the disadvantage that it does not reflect the spread of rankings for the treatments and may thus be misleading. An obvious solution is to calculate the probabilities for all ranks. The probability of each treatment to achieve each possible rank can be plotted to yield “rankograms.” Presentation of the cumulative ranking curves in a single plot and a numerical summary of the area below the cumulative ranking curve for each treatment is useful as it gives a clear ordering of all treatments based on a summary of the rank probabilities. A review of graphical and numerical methods along with software code is presented in (Chaimani et al. 2013).

Application: Presentation of Results for Incident Diabetes

The results of the consistency hierarchical model (accounting for multi-arm trials) will be used to illustrate the use of rankograms. The hierarchical model is fitted, and the ordering of the treatments according to their effectiveness is collected in each MCMC cycle using the equation:

\[ \mathrm{order}_{k} = \sum_{j=1}^{6} I\left(\mu_{Aj} \le \mu_{Ak}\right) \]

where $I(\mu_{Aj} \le \mu_{Ak}) = 1$ if $\mu_{Aj} \le \mu_{Ak}$ and 0 otherwise, A = P and j, k = {BB, D, CCB, ACE, ARB}. Then the probability for each treatment k = {P, BB, D, CCB, ACE, ARB} of being the jth (j = 1, . . . , 6) order ($P_{jk}$) is the ratio of MCMC simulations for which $\mathrm{order}_{k} = j$ over the total number of simulations. Table 15 includes the values of the ranking probabilities and Fig. 9 the corresponding rankograms.
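The ranking step can be sketched outside the MCMC software as a post-processing of posterior draws. The sketch below assumes that the indicator counts treatments whose log-odds ratio is no larger than that of treatment k, so that rank 1 is the most favorable (lowest odds of incident diabetes). The function name `rank_probabilities`, the posterior means, and the common standard deviation are hypothetical; real draws would come from the fitted model.

```python
import random


def rank_probabilities(draws, treatments):
    """Estimate P(treatment k has rank j) from posterior draws.

    draws: list of dicts mapping each treatment to its log-odds ratio
    versus the reference (the reference itself is fixed at 0.0).
    Lower values are taken to be better.
    """
    n = len(treatments)
    counts = {t: [0] * n for t in treatments}
    for draw in draws:
        for k in treatments:
            # order_k = number of treatments at least as good as k.
            order_k = sum(1 for j in treatments if draw[j] <= draw[k])
            counts[k][order_k - 1] += 1
    return {t: [c / len(draws) for c in counts[t]] for t in treatments}


# Hypothetical posterior means and a common posterior SD, for illustration.
MEANS = {"P": 0.0, "BB": 0.25, "D": 0.30, "CCB": 0.05, "ACE": -0.10, "ARB": -0.20}
random.seed(1)
draws = [
    {t: (0.0 if t == "P" else random.gauss(m, 0.08)) for t, m in MEANS.items()}
    for _ in range(5000)
]
probs = rank_probabilities(draws, list(MEANS))
for t in MEANS:
    print(t, [round(p, 2) for p in probs[t]])
```

Each printed row is one column of a table such as Table 15, and plotting a row against rank gives the rankogram for that treatment.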
Table 15 Ranking probabilities for all treatments of incident diabetes. Results are based on the consistency hierarchical
model (accounting for multi-arm trials)
Order Placebo β-Blockers Diuretics CCB ACE inhibitors ARB
1 0.01 0.00 0.00 0.00 0.22 0.77
2 0.07 0.00 0.00 0.02 0.71 0.20
3 0.65 0.00 0.00 0.27 0.06 0.02
4 0.27 0.01 0.01 0.70 0.01 0.00
5 0.01 0.79 0.20 0.01 0.00 0.00
6 0.00 0.20 0.80 0.00 0.00 0.00
[Figure 9: six bar-plot panels, one per treatment (Placebo, β-Blockers, Diuretics, CCB, ACE inhibitors, ARB); x-axis: rank (1 to 6); y-axis: Probability (0.0 to 1.0)]

Fig. 9 “Rankograms” for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)
The numerical summary of the area below the cumulative ranking curve for each treatment k is calculated as

\[ \frac{1}{5} \sum_{j=1}^{5} \mathrm{cum}.P_{jk} \]

The results are presented in Table 16 and the plots in Fig. 10. These results suggest that the best treatment appears to be ARB followed by ACE inhibitors, placebo, CCB, β-blockers, and last diuretics.

Table 16 Numerical summary of area below the cumulative ranking curve for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)

Placebo           0.59
β-Blockers        0.16
Diuretics         0.04
CCB               0.46
ACE inhibitors    0.81
ARB               0.96
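The area summary can be computed directly from a table of rank probabilities. The sketch below applies the formula to the rounded values printed in Table 15; because those inputs are rounded to two decimals, the results reproduce the ordering reported in the text but should not be expected to match Table 16 exactly. The function name `sucra` is an assumption used only for this illustration.

```python
# Rank probabilities per treatment (orders 1 to 6), as printed in Table 15.
RANK_PROBS = {
    "Placebo": [0.01, 0.07, 0.65, 0.27, 0.01, 0.00],
    "β-Blockers": [0.00, 0.00, 0.00, 0.01, 0.79, 0.20],
    "Diuretics": [0.00, 0.00, 0.00, 0.01, 0.20, 0.80],
    "CCB": [0.00, 0.02, 0.27, 0.70, 0.01, 0.00],
    "ACE inhibitors": [0.22, 0.71, 0.06, 0.01, 0.00, 0.00],
    "ARB": [0.77, 0.20, 0.02, 0.00, 0.00, 0.00],
}


def sucra(rank_probs):
    """Average of the cumulative rank probabilities over the first
    (number of ranks - 1) ranks; 1 means always best, 0 always worst."""
    cumulative, total = 0.0, 0.0
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (len(rank_probs) - 1)


summary = {t: sucra(p) for t, p in RANK_PROBS.items()}
ordering = sorted(summary, key=summary.get, reverse=True)
for t in ordering:
    print(f"{t}: {summary[t]:.2f}")
```

Sorting by this summary gives a single ordering of the treatments while still using the whole rank distribution rather than only the probability of being best.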
[Figure 10: six cumulative ranking curve panels, one per treatment (Placebo, β-Blockers, Diuretics, CCB, ACE inhibitors, ARB); x-axis: rank (1 to 6); y-axis: Cumulative Probability (0.0 to 1.0)]

Fig. 10 Plot of area below the cumulative ranking curve for all treatments of incident diabetes. Results are based on the consistency hierarchical model (accounting for multi-arm trials)
Acknowledgments GS and AC received funding from the European Research Council (ERC starting grant IMMA 260559). DC is supported by a UK MRC Population Health Scientist Fellowship (G0902118). JPTH is funded by Medical Research Council grant U105285807.

References

Baker SG, Kramer BS. The transitive fallacy for randomized trials: if A bests B and B bests C in separate trials, is A better than C? BMC Med Res Methodol. 2002;2:13.
Barbui C, Cipriani A, Furukawa TA, et al. Making the best use of available evidence: the case of new generation antidepressants: a response to: are all antidepressants equal? Evid Based Ment Health. 2009;12:101–4.
Bucher HC, Guyatt GH, Griffith EL, et al. The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol. 1997;50(6):683–91.
Caldwell DM, Ades AE, Higgins JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331:897–900.
Caldwell DM, Welton NJ, Ades AE. Mixed treatment comparison analysis provides internally coherent treatment effect estimates based on overviews of reviews and can reveal inconsistency. J Clin Epidemiol. 2010;63(8):875–82.
Chaimani A, Higgins JP, Mavridis D, Spyridonos P, Salanti G. Graphical tools for network meta-analysis in STATA. PLoS One. 2013;8(10):e76654.
Cipriani A, Furukawa TA, Salanti G, et al. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet. 2009;373:746–58.
Cooper NJ, Sutton AJ, Morris D, et al. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Stat Med. 2009;28(14):1861–81.
Cooper NJ, Peters J, Lai MC, et al. How valuable are multiple treatment comparison methods in evidence-based health-care evaluation? Value Health. 2011;14:371–80.
Dias S, Welton NJ, Caldwell DM, et al. Checking consistency in mixed treatment comparison meta-analysis. Stat Med. 2010;29:932–44.
Djulbegovic B, Kumar A, Magazin A, et al. Optimism bias leads to inconclusive results-an empirical study. J Clin Epidemiol. 2011;64:583–93.
Donegan S, Williamson P, Gamble C, et al. Indirect comparisons: a review of reporting and methodological quality. PLoS One. 2010;5:e11054.
Edwards SJ, Clarke MJ, Wordsworth S, et al. Indirect comparisons of treatments based on systematic reviews of randomised controlled trials. Int J Clin Pract. 2009;63:841–54.
Eli Lilly and Company. Gemcitabine for the treatment of metastatic breast cancer: Single technology appraisal submission to the National Institute for health and Clinical Excellence. 2006. Available from http://www.nice.org.uk
Elliott WJ, Meyer PM. Incident diabetes in clinical trials of antihypertensive drugs: a network meta-analysis. Lancet. 2007;369:201–7.
Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technol Assess. 2005;9:26.
Guyatt GH, Sackett DL, Sinclair JC, et al. Users’ guides to the medical literature. IX. A method for grading health care recommendations. Evidence-Based Medicine Working Group. JAMA. 1995;274:1800–4.
Heres S, Davis J, Maino K, et al. Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine: an exploratory analysis of head-to-head comparison studies of second-generation antipsychotics. Am J Psychiatry. 2006;163:185–94.
Higgins JPT, Green S. Cochrane handbook for systematic reviews of interventions. 5.0.1 ed. The Cochrane Collaboration; 2008; John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, England.
Higgins JPT, Thompson SG. Controlling the risk of spurious findings from meta-regression. Stat Med. 2004;23:1663–82.
Hoaglin DC, Hawkins N, Jansen JP, et al. Conducting indirect-treatment-comparison and network-meta-analysis studies: report of the ISPOR task force on indirect treatment comparisons good research practices-part 2. Value Health. 2011;14:429–37.
Hughes S. First “comparison” of prasugrel and ticagrelor. 2010 Sep16. Available from http://www.theheart.org/article/1122713.do. Accessed 27 Apr 2011.
Jackson D, Riley R, White IR. Multivariate meta-analysis: potential and promise. Stat Med. 2011;30:2481–98.
Jansen JP, Schmid CH, Salanti G. Directed acyclic graphs can help understand bias in indirect and mixed treatment comparisons. J Clin Epidemiol. 2012;65:798–807.
Lu G, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc. 2006;101:447–59.
Lu G, Ades AE. Modeling between-trial variance structure in mixed treatment comparisons. Biostatistics. 2009;10(4):792–805.
McAlister FA, Laupacis A, Wells GA, et al. Users’ guides to the medical literature: XIX. Applying clinical trial results B. Guidelines for determining whether a drug is exerting (more than) a class effect. JAMA. 1999;282:1371–7.
Mills EJ, Ghement I, O’Regan C, et al. Estimating the power of indirect comparisons: a simulation study. PLoS One. 2011;6:e16237.
NICE. Methods for the development of NICE public health guidance. 2nd ed. Evidence Synthesis National Institute of Health and Clinical Excellence; 2008.
O’Regan C, Ghement I, Eyawo O, et al. Incorporating multiple interventions in meta-analysis: an evaluation of the mixed treatment comparison with the adjusted indirect comparison. Trials. 2009;10:86.
PBAC. Report of the indirect comparisons working group to the pharmaceutical benefits advisory committee: assessing indirect comparisons. Pharmaceutical Benefits Advisory Committee; 2008. http://www.health.gov.au/internet/main/publishing.nsf/Content/B11E8EF19B358E39CA25754B000A9C07/$File/ICWG%20Report%20FINAL2.pdf
Piccini JP, Kong DF. Mixed treatment comparisons for atrial fibrillation: evidence network or bewildering entanglement? Europace. 2011;13:295–6.
Riley RD. Multivariate meta-analysis: the effect of ignoring within-study correlation. J R Stat Soc Ser A. 2009;172:789–811.
Salanti G, Marinho V, Higgins JP. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. J Clin Epidemiol. 2009;62:857–64.
Salanti G, Dias S, Welton NJ, et al. Evaluating novel agent effects in multiple-treatments meta-regression. Stat Med. 2010;29:2369–83.
Soares HP, Kumar A, Daniels S, et al. Evaluation of new treatments in radiation oncology: are they better than standard treatments? JAMA. 2005;293:970–8.
Song F, Altman D, Glenny AM, et al. Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ. 2003;326:472.
Jones A, Takeda A, Tan SC, Cooper K, Loveman E, Song F, Loke YK, Walsh T, et al. Methodological problems
Clegg A, Murray N. Gemcitabine for metastatic breast in the use of indirect comparisons for evaluating
cancer: evidence review group report. 2006. Available healthcare interventions: survey of published system-
from www.nice.org.uk atic reviews. BMJ. 2009;338:b1147.
Lambert PC, Sutton AJ, Burton PR, Abrams KR, et al. Song F, Xiong T, Parekh-Bhurke S, et al. Inconsistency
How vague is vague? A simulation study of the impact between direct and indirect comparisons of competing
of the use of vague prior distributions in MCMC using interventions: meta-epidemiological study. BMJ.
WinBUGS. Stat Med. 2005;24:2401–28. 2011;343:d4909.
Lu G, Ades AE. Combination of direct and indirect evi- Spiegelhalter DJ, Best NG, Bradley PC, et al. Bayesian
dence in mixed treatment comparisons. Stat Med. measures of model complexity and fit. J R Stat Soc Ser
2004;23(20):3105–24. PMID: 15449338” B. 2002;64:583–639.
Spiegelhalter DJ, Abrams KR, Myles PJ. Bayesian Viechtbauer W. Confidence intervals for the amount of
approaches to clinical trials and health-care evaluation. heterogeneity in meta-analysis. Stat Med.
Chichester: Wiley; 2004. 2007;26:37–52.
Sutton AJ, Abrams KR. Bayesian methods in meta- Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian ran-
analysis and evidence synthesis. Stat Methods Med dom effects meta-analysis of trials with binary
Res. 2001;10:277–303. outcomes: methods for the absolute risk
Thijs V, Lemmens R, Fieuws S. Network meta-analysis: difference and relative risk scales. Stat Med.
simultaneous meta-analysis of common antiplatelet 2002;21:1601–23.
regimens after transient ischaemic attack or stroke. Wells GA, Sultan SA, Chen L, et al. Indirect evidence:
Eur Heart J. 2008;29:1086–92. indirect treatment comparisons in meta-analysis.
Uhtman OA, Abdulmalik J. Comparative efficacy and Ottawa: Canadian Agency for Drugs and Technologies
acceptability of pharmacotherapeutic agents for anxiety in Health; 2009.
disorders in children and adolescents: a mixed treat- White IR. Multivariate random-effects meta-
ment comparison meta-analysis. Cur Med Res Opin. regression: updates to mvmeta. Stata
2010;26(1):53–9. J. 2011;11(2):255–70.
26 Introduction to Social Network Analysis
Alistair James O’Malley and Jukka-Pekka Onnela
Contents
Part I: Introduction and Background
Historical Note
Representation of Networks
Network Data
Representation of Network Data
Descriptive Measures
Unipartite or One-Mode Networks
Bipartite or Two-Mode Networks
Part II: Statistical Models
Network Influence Models
Relational Analyses
Part III: Network Science
Generative Models of Network Formation
Network Communities
Part IV: Discussion and Glossary
Glossary of Terms
Terms Used in Social Networks
Terms Used in Network Science
References
A. J. O’Malley (*)
The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
e-mail: James.OMalley@Dartmouth.edu

J.-P. Onnela
Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
e-mail: onnela@biostat.hsph.edu; onnela@hsph.harvard.edu

https://doi.org/10.1007/978-1-4939-8715-3_37

Abstract

This chapter introduces statistical methods used in the analysis of social networks and in the rapidly evolving parallel field of network science. Although several instances of social network analysis in health services research have appeared recently, the majority involve only the most basic methods and thus scratch the surface of what might be accomplished. Cutting-edge methods using relevant examples and illustrations in health services research are provided.

Part I: Introduction and Background

Social network analysis is the study of the structure of relationships linking individuals (or other social units, such as organizations) and of interdependencies in behavior or attitudes related to configurations of social relations. The observational units in a social network are the relationships between individuals and their attributes. Whereas studies in medicine typically involve individuals whose observations can be thought of as statistically independent, observations made on social networks may be simultaneously dependent on all other observations due to the social ties and pathways linking them. Accordingly, different statistical techniques are needed to analyze social network data. The focus of this chapter is sociocentric data, the case when relational data is available for all pairs of individuals, allowing a fully-fledged review of available methods.

Two major questions in social network analysis are: (1) do behavioral and other mutable traits spread from person to person through a process of induction (also known as social influence, peer effects, or social contagion); (2) what exogenous factors (e.g., shared actor traits) or endogenous factors (e.g., internal configurations of actors such as triads) are important to the overall structure of relationships among a group of individuals.

The first problem has affinity to medical studies in that individuals are the observational units. In medicine, the health of an individual is paramount and so individual outcomes have historically been used to judge the effectiveness of an intervention. A study of social influence in medicine may involve the same outcome, but the treatment or intervention is the same variable evaluated on the peers of the focal individual (referred to as alters). An important characteristic of studies of social influence is that individuals may partly or fully share treatments and one individual’s treatment may depend on the outcome of another. For example, an intervention that encourages person A to exercise in order to lose weight might also influence the weight of A’s friends (B and C) because they exercise more when around A. Hence, A’s weight intervention may also affect the weight of B and C. A consequence is that the total effect of A’s treatment must also consider its effect on B and C, the benefit to individuals to whom B and C are connected, and so on. Such interference between observations violates the stable-unit treatment value assumption (SUTVA) that one individual’s treatment does not affect another’s outcome (Rubin 1978), which presents challenges for identification of causal effects. Interference is likely to result in an incongruity between a regression parameter and the causal effect that would be estimated in the absence of interference.

The second problem is important in sociology as social networks are thought to reveal the structure of a group, organization, or society as a whole (Freeman 2004). For example, there has always been great interest in determining whether the triad is an important social unit (Simmel 1908; Heider 1946). If the existence of network ties A-B and A-C makes the presence of network tie B-C more likely, then the network exhibits transitivity, commonly described as “a friend of a friend is a friend.” Thus, just as an individual may influence or be influenced by multiple others, the relationship status of one dyad (pair of individuals) may affect the relationship status of another dyad, even if no individuals are common to multiple dyads. Accounting for between-dyad dependence is a core component of many social network analyses and has entailed much methodological research.

Network science is a parallel field to social network analysis in that there is very little overlap between researchers in the respective fields despite the similarity of the problems. Whereas solutions to problems in social networks have tended to be data-oriented in that models and statistical tests are based on the data, those in network science have tended to be phenomenon-oriented with analogies to problems in the physical sciences often providing the backbone for solutions. Methods for social network analysis often have causal hypotheses (e.g., does one individual have an effect on another, does the presence of a common friend make friendship
formation more likely) motivating them and involving microlevel modeling. In contrast, methods in network science seek models generated from some theoretical basis that reproduce the network at a global or system level and in so doing reveal features of the data-generating process (e.g., is the network scale-free, does the degree distribution follow a power law). One of the goals of this chapter is to address the lack of interaction between the social network and network science fields by providing the first joint review of both. Enlarging the range of methods at the disposal of researchers will hopefully accelerate advances at the frontier of networks and health.

The computer age has enabled widespread implementation of methods for social network and network science analysis, particularly statistical models. At the same time, a diverse range of applications of social network analysis has appeared, including in medicine (Keating et al. 2007; Pham et al. 2009; Barnett et al. 2012a; Iwashyna et al. 2002; Pollack et al. 2012). Because many medical and health-related phenomena involve interdependent actors (e.g., patients, nurses, physicians, and hospitals), there is enormous potential for social network analysis to advance health services research (O’Malley and Marsden 2008).

The layout of the remainder of the chapter is as follows. This introductory section concludes with a brief historical account of social networks and network science. The major types of networks and methods for representing networks are then discussed (section “Representation of Networks”). In section “Descriptive Measures” formal notation is introduced and descriptive measures for networks are reviewed. Social influence and social selection are studied in sections “Network Influence Models” and “Relational Analyses”, respectively. Our focus switches to methods akin to network science in section “Generative Models of Network Formation”, where descriptive methods are discussed. The review of network science methods continues with community detection methods in section “Network Communities”. The chapter concludes in “Discussion and Glossary”.

Historical Note

In the 1930s, a field of study involving human interactions and relationships emerged simultaneously from sociology, psychology, and anthropology. Moreno is credited with inventing the sociogram (Moreno 1934), a visual display of social structure. The appeal of the sociogram led to Moreno being considered a founder of sociometry, a precursor to the field known as social networks. A number of mathematical analyses of network-valued random variables in the form of sociograms followed (Festinger 1949; Katz 1947, 1953; Katz and Powell 1955). Other important contributions were to structural balance (Heider 1946; Newcomb 1953; Cartwright and Harary 1956), the diffusion of medical innovations (Coleman et al. 1957, 1966), structural equivalence (Lorrain and White 1971), and social influence (Marsden and Friedkin 1993). Refer to Wasserman and Faust (1994, chapter 1) for a detailed historical account.

Early network studies involved small networks with defined boundaries such as students in a classroom, or a few large entities such as countries engaging in international trade. Because the typical number of individuals in such studies was small (e.g., 100), relationships could be determined for all possible pairs of individuals, yielding complete sociocentric datasets. Furthermore, the often enclosed nature of the system (e.g., a classroom or commune) reduced the risk of confounding by external factors (e.g., unobserved actors).

Sociological theory developed over time as sociologists provided intuitive reasoning to support various hypotheses involving social networks and society (Freeman 2004). In the specific area of individual health, at least five principal mediating pathways through which social relationships and thus social networks may influence outcomes have been posited (Berkman and Glass 2000). Prominent among these is social support, which has emotional, instrumental, appraisal (assistance in decision making), and informational aspects (House and Kahn 1985). Beyond social support, networks may also offer access to tangible resources such as financial assistance or transportation.
They can also convey social influence by defining norms about such health-related behaviors as smoking or diet, or via social controls promoting (for example) adherence to medication regimes (Marsden 2006). Networks are also channels through which certain communicable diseases, notably sexually transmitted ones, spread (Klovdahl 1985), and certain network structures have been hypothesized to reduce exposure to stressors (Haines and Hurlbert 1992).

A field known as mathematical sociology complemented social theory by attempting to derive results using mathematical rather than intuitive arguments. In particular, statistical and probability methods are used to test for the presence of various structural features in the network. Other key areas of mathematics that have been used in network analysis include graph theory and algebraic models. Katz and Powell (1955) develop tests of dependence within dyads (pairs of actors) while Harary (1953) and Harary (1955) develop tests of triadic dependence. In general, results were descriptive or based on simple models making strong assumptions about the network. With the advent of powerful computers, mathematical contributions have taken on more importance as so much more can be implemented than in the past. For example, computer simulation has recently been used to test and develop theoretical results (Centola 2009).

In the mid-to-late 1990s, network science emerged as a discipline. Whereas social networks were the domain of social scientists and a growing number of statisticians, network scientists typically have backgrounds in physics, computer science, or applied mathematics. The use of physical concepts to generate solutions to problems is common, as evinced by the large domains of research focusing on the adaptation of, for example, a particular physical equation to network data. For example, several procedures for partitioning a network into disjoint groups of individuals (“communities”) rely on the modularity equation, which was developed in the context of spin theory to model the interaction of electrons. While much of the initial work focused on the properties of the solution at different values of the parameters, there recently has been increased attention to using these methods to provide valuable insight on important practical problems.

Representation of Networks

Social networks are composed of units and the relationships between them. The units are often individuals (also referred to as actors) but can include larger (e.g., countries, companies) and smaller (e.g., organisms, genes) entities.

Network Data

In sociocentric studies, data is assembled on the ties linking all units or actors within some bounded social collective (Laumann et al. 1983). For example, the collection of data on the network of all children in a classroom or on all pairs of physician collaborations within a medical practice constitutes a sociocentric study. Relationships can be shared or directional, and quantified by binary (tie exists or not), scale (or valued), or multivariate variables. By measuring all relationships, sociocentric data constitutes the highest level of information collection and facilitates an extensive range of analyses, including accounting for the effects of multiple actors on actor outcomes and studying the structure of the network itself (O’Malley and Marsden 2008). A weaker form of relational data is collected in egocentric studies, where individuals (“egos”) are sampled at random and information is collected on at least a sample of the individuals with direct ties to the egos (“alters”). Because standard statistical methods such as regression analysis can generally be used to analyze egocentric data (O’Malley et al. 2012), egocentric data are not featured herein.

Relational data is often binary (e.g., friend or nonfriend). One reason is that other types of relational data (e.g., nominal, ordinal, interval-valued) are often transformed to binary due to the convenience of displaying binary networks. Another is the greater range of models available for modeling binary data.
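
The transformation of valued ties to binary ties mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tie values (hypothetical contact counts), the actor labels, the function name dichotomize, and the threshold of 2 are all invented for the example and do not come from the chapter.

```python
# Hypothetical valued, directed ties: (sender, receiver) -> number of contacts.
# All values and labels here are assumptions made for illustration.
valued_ties = {
    ("A", "B"): 5,
    ("A", "C"): 1,
    ("B", "A"): 3,
    ("C", "D"): 0,
    ("D", "A"): 2,
}


def dichotomize(ties, threshold):
    """Return a binary network: 1 if the tie value meets the threshold, else 0."""
    return {pair: int(value >= threshold) for pair, value in ties.items()}


# The threshold is an analyst's choice; a different value gives a different network.
binary_ties = dichotomize(valued_ties, threshold=2)

# Ordered pairs for which a binary tie is present after dichotomization.
present = sorted(pair for pair, tie in binary_ties.items() if tie == 1)
print(present)
```

Because the binary network depends on the chosen cut-off, it may be worth repeating an analysis over several thresholds before drawing conclusions.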
Many studies involve two distinct types of units, such as patients and physicians, physicians and hospitals, or authors and journal articles or books. In these two-mode networks, the elementary relationships of interest usually refer to affiliations of units in one set with those in the other, e.g., of patients with the physician(s) responsible for their care, or of physicians with the hospital(s) at which they are admitted to practice. Two-mode networks are also known as affiliation or bipartite networks. They can be viewed as a special case of general sociocentric network data in that the relationship of interest is between heterogeneous types of actors.

The advent of high-powered computers has enabled the analysis of large networks, which has benefitted fields such as health services research that regularly encounter large data sets. A challenge facing analyses of large networks is that it may be infeasible for all actors to be exposed to each other actor and thus for a relationship to have formed. Therefore, statistical analyses for large networks essentially use relational data representing the joint event of individuals meeting and then forming a tie, not the network of ties that would be observed if all pairs of individuals actually met. Accordingly, analyses of large networks may underestimate effect sizes unless information on the likelihood of two individuals meeting is incorporated.

Representation of Network Data

Let the status of the relationship from $i$ to $j$ be denoted by $a_{ij}$, element $ij$ of the adjacency matrix $A$. In a directed network $a_{ij}$ may differ from $a_{ji}$, while in a nondirected network $a_{ij} = a_{ji}$, implying $A = A^T$. A network constructed from friendship nominations is likely to be directed while a network of coworkers is nondirected. In the case of immutable relationships (e.g., siblings), $A$ will only change as actors are added or removed (e.g., through birth or death), as relationship status is otherwise invariant. In the following, assume the network is binary unless otherwise stated (Fig. 1).

Matrices and graphs are two common ways of representing the status of a network at a fixed time. In a matrix representation, rows and columns correspond to units or actors; the matrix is square for one-mode and rectangular for two-mode networks. Elements of the matrix contain the value of the relationship linking the corresponding units or actors, so that element $ij$ represents the relationship from actor $i$ to actor $j$. With binary ties (1 = tie present, 0 = tie absent), the matrix representation is known as an adjacency matrix. Irrespective of how the network is valued, the diagonal elements of the matrix representing the network equal 0 as self-ties are not permitted. Several network properties can be computed through matrix operations.

In graphical form, units or actors are vertices and nonnull relationships are lines. Nondirected relationships are known as “edges” and directed ones as “arcs”; arrows at the end(s) of arcs denote their directionality. Value-weighted graphs can be constructed by displaying nonnull tie values along arcs or edges, or by letting thinner and thicker lines represent line values. Such graphical imagery is a hallmark of social network analysis (Freeman 2004).

Two-mode (or bipartite) networks may be represented in set-theoretic form as hypergraphs consisting of a set of actors of one type, together with a collection of subsets of the actors defined on the basis of a common actor of the second type (Wasserman and Faust 1994). This representation highlights the multiparty relationships that may exist among those actors of one type that are linked to a given actor of the other type, e.g., the set of all physicians affiliated with a particular clinic or service. In matrix form, element $ij$ of an affiliation matrix indicates whether actor $i$ of the first type is linked to actor $j$ of the second type. Affiliation networks may usefully be represented as bipartite graphs in which nodes are partitioned into two disjoint subsets and all lines link nodes in different sets.

An induced one-mode network $A$ may be obtained by multiplying an affiliation matrix $B$ by its transpose, $A = BB^T$; entry $ij$ of the outer-product $BB^T$ gives the number of affiliations shared by a pair of actors of one type (see Fig. 2, which emulates a figure in Landon et al. (2012)). Dually, the inner-product $B^T B$ yields a one-mode network of shared affiliations among actors of the second type (Breiger 1974). The diagonals of the outer and inner matrix products give the degree of the actors (i.e., the number of ties to actors of the other mode).

Fig. 1 Graphical and matrix representation of a social network. Digraph (left) and adjacency matrix (right), which is denoted in the text as A. Note: Self-ties are not relevant in studies involving relationships

Adjacency matrix A of Fig. 1 (rows are senders, columns are receivers):

      A  B  C  D  E  F  G
  A   0  1  1  0  0  0  0
  B   0  0  0  0  0  1  0
  C   1  0  0  0  1  0  0
  D   0  0  1  0  0  0  0
  E   1  0  1  0  0  0  1
  F   0  0  0  0  1  0  0
  G   0  0  0  0  0  0  0

Fig. 2 A schematic illustrating a projection from a two-mode (bipartite) to a one-mode (unipartite) network. For example, Medicare records link each doctor to a number of patients, defining a bipartite network consisting of two types of nodes, doctors and patients. An edge can only exist between different types of nodes (a doctor and a patient), and the network is fully described by the (in this case 6 × 3) bipartite adjacency matrix B. A one-mode projection of the doctor-patient network is obtained by multiplying the bipartite adjacency matrix B by its transpose, B^T, to yield a 6 × 6 symmetric one-mode adjacency matrix A, whose elements indicate the number of patients the two physicians have in common. The diagonal elements of A correspond to the number of patients the given physician “shares with themselves” (i.e., the number of patients they care for)

Bipartite adjacency matrix B of Fig. 2 (rows are physicians A to F, columns are patients 1 to 3):

      1  2  3
  A   1  1  0
  B   1  0  0
  C   1  1  0
  D   0  1  1
  E   0  1  1
  F   0  0  1

One-mode projection A = BB^T of Fig. 2 (physicians by physicians):

      A  B  C  D  E  F
  A   2  1  2  1  1  0
  B   1  1  1  0  0  0
  C   2  1  2  1  1  0
  D   1  0  1  2  2  1
  E   1  0  1  2  2  1
  F   0  0  0  1  1  1

In health services applications, an investigator is often interested in a one-mode network that is not directly observed but rather is induced from a two-mode network. Such one-mode projection networks are motivated theoretically by a claim that shared actors from the other mode act as surrogates for ties between the actors. For example, physicians with many patients in common might have heightened opportunities for contact through consultations or sharing of information about those patients, and thus the number of shared patients is a surrogate for the actual extent of interaction between pairs of physicians.
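
As a minimal sketch of the projection described above, the following Python snippet rebuilds the 6 × 3 physician-by-patient affiliation matrix B of Fig. 2 and forms both products, BB^T and B^T B. The helper names (transpose, matmul) and variable names are illustrative assumptions rather than anything defined in the chapter; plain lists are used so that no external library is required.

```python
# Affiliation matrix B from Fig. 2: rows = physicians A-F, columns = patients 1-3.
B = [
    [1, 1, 0],  # physician A
    [1, 0, 0],  # physician B
    [1, 1, 0],  # physician C
    [0, 1, 1],  # physician D
    [0, 1, 1],  # physician E
    [0, 0, 1],  # physician F
]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Return the matrix product x @ y for matrices stored as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


Bt = transpose(B)
A = matmul(B, Bt)  # physician-by-physician: number of shared patients
P = matmul(Bt, B)  # patient-by-patient: number of shared physicians

# Diagonals give each actor's degree in the two-mode network.
physician_degree = [A[i][i] for i in range(len(A))]
patient_degree = [P[j][j] for j in range(len(P))]

# The projection is nondirected, so A equals its transpose.
is_symmetric = A == transpose(A)

# Illustrative binary version: a tie if at least one patient is shared,
# with the diagonal set to 0 because self-ties are not permitted.
n = len(A)
A_binary = [[1 if i != j and A[i][j] >= 1 else 0 for j in range(n)] for i in range(n)]

print(A)
print(physician_degree, patient_degree, is_symmetric)
```

The printed matrix A can be compared entry by entry with the 6 × 6 matrix shown in Fig. 2. The binary version discards the number of shared patients and is included only to show one possible way of dichotomizing a projection.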
Examples of provider (physician, hospital, health service area) networks obtained as one-mode projections of bipartite networks in health services research are given in Barnett et al. (2011, 2012a, b) and Pham et al. (2009).

An often overlooked feature of bipartite network analysis is the mechanism by which network data is obtained. Networks obtained from one-mode projections have different statistical properties from directly observed one-mode networks. Consider a patient-physician bipartite network and suppose a threshold is applied to the physician one-mode projection such that true social ties are assumed to exist or not according to whether one or more patients are shared. Then a patient that visits three physicians induces ties between all three physicians. The same complete set of ties between the three physicians is also induced by three patients that each visit different pairs of the three physicians. However, the projection does not preserve the distinction (see section “Bipartite or Two-Mode Networks” for further comment).

Descriptive Measures

Unipartite or One-Mode Networks

The number of units or actors ($N$) is known as the order of the network. A common network statistic is network density ($D$), defined as the number of ties across the network ($L$) divided by the number of possible ties; for directed networks $D = L/(N(N-1))$ and for nondirected networks, with each tie counted once, $D = 2L/(N(N-1))$. Thus, density equals the mean value of the binary (1, 0) ties across the network. The same definition can be used for general relational data, in which case the resulting measure is sometimes referred to as strength. While results in this chapter are generally presented for binary networks, corresponding measures for weighted networks often exist (Opsahl et al. 2010).

The tendency for relationships to form between people having similar attributes is known as homophily (McPherson et al. 2001). Homophily involves subgroup-specific network density statistics. With high homophily according to some attribute, networks tend toward segregation by that attribute; the extreme case occurs when the network consists of separate components (i.e., no ties between actors in different components) defined by levels of the attribute. In the other direction, one obtains a bipartite network where all ties are between different types of actors (extreme heterophily).

The out- and in-degree for an actor $i$ are the number of ties from actor $i$, $a_{i+} = \sum_{j=1}^{N} a_{ij}$ (row $i$ of $A$ summed over its columns), and to actor $i$, $a_{+i} = \sum_{j=1}^{N} a_{ji}$ (column $i$ of $A$ summed over its rows). These are also referred to as expansiveness and popularity, respectively. For example, a positive correlation between out- and in-degree suggests that popular individuals are expansive.

The number of ties (or value of the ties) in a directed network is given by $L = N\bar{d}$, where $\bar{d}$ denotes the mean degree (or strength) of an individual, implying the density of the network is given by $D = \bar{d}/(N-1)$. This result is not specific to in- or out-degree because the total number of inward ties must equal the total number of outward ties, implying mean in-degree equals mean out-degree.

The variance of the degree distribution measures the extent to which tie-density (or connectedness) varies across the network (Snijders 1981). Often actors having higher degree have prominent roles in the network (Freeman 1979). A special type of homophily is the phenomenon where individuals form ties with individuals of similar degree, commonly referred to as assortative mixing. In directed networks, assortative mixing can be defined with respect to both out-degree and in-degree (Piraveenan et al. 2010). The opposite scenario to a network with the same degree for all actors is a k-star, a network configuration in which k relationships are incident to the focal actor (Fig. 3) and there are no ties between the other actors.

The length of a path between two actors through the network is defined as the number of ties traversed to get from one actor to the other. The elements of the adjacency matrix multiplied by itself $k-1$ times, denoted $A^k$, equal the number
Fig. 3 Triadic and k-star configurations. Triads (top row, actors A, B, and C): undirected, transitive, and 3-cycle. k-stars (bottom row, focal actor A linked to actors 1, ..., k): undirected, outstar, and instar
of paths of length k between any two actors with the number of k-cycles (including multiple or repeated loops) on the diagonal. The shortest path between two actors is referred to as the geodesic distance.

Clustering
Certain subnetworks have particular theoretical prominence. The first step-up from the trivial single actor subnetwork, also known as an isolated node, is the network comprising two actors (a “dyad”). The presence and magnitude of a tendency toward symmetry or reciprocity in a directed network can be measured by comparing the number of mutual dyads (ties in both directions) to the number expected under a null model that does not accommodate reciprocity. If the number of mutual dyads is higher than expected, there is a tendency towards reciprocation.

A triad is formed by a group of three actors. Figure 3 shows a “transitive triad,” so-named as it exhibits the phenomenon that a “friend of a friend is a friend.” Nonparametric tests for the presence of transitivity or other forms of triadic dependence are based on the distribution of the number of closed and nonclosed triads conditional on the number of null (no ties intact), directed (one tie intact), and mutual dyads (both ties intact), collectively known as the dyad census; the degree distribution; and other lower-order effects (e.g., homophily of relevant individual characteristics) in the observed network. Such tests are described in Wasserman and Faust (1994, chapter 14).

Centrality

Centrality is the most common metric of an actor’s prominence in the network and many distinct measures exist. They are often taken as indicators of an actor’s network-based “structural power.” Such measures are often used as explanatory variables in individual-level regression models (Barnett et al. 2012a).

Different centrality measures are characterized by the aspects of an actor’s position in the network that they reflect. For example, degree-based centrality (the degree of an actor in an undirected network and in- or out-degree in a directed network) reflects an actor’s level of network connectivity or involvement in the network. Betweenness centrality computes the frequency with which an actor is found in an intermediary position along the geodesic paths linking pairs of other actors. Actors with high betweenness centrality have high capacity to broker or control relationships among other actors. A third major centrality measure, closeness centrality, is inversely proportional to the sum of geodesic distances from a given actor to all others. The rationale underlying closeness measures is that actors linked to others via short geodesics have comparatively little need for intermediary units, and hence have relative independence in managing their relationships. Closeness measures are defined only for networks in which all actors are mutually related to one another by paths of finite geodesic distance, i.e., single component networks. Finally, eigenvalue centrality is sensitive to the presence or strength of connections, as well
as those of the actors to which an actor is linked (Bonacich 1987). It assumes that connections to central actors indicate greater prominence than do (similar-strength) connections to peripheral actors. The key component of the measure is the largest eigenvalue of an adjacency or other matrix representation of the network (Bonacich 1987).

Network-level centrality indices (Freeman 1979) are network-level statistics that resemble the degree variance; their values grow larger to the extent that a single actor is involved in all relationships (as in the “star” network shown in Fig. 3).

Cliques, Components, and Communities
The assignment of actors to groups is an important and growing field within social networks. The rationale for grouping actors is that it may reveal salient social distinctions that are not directly observed. The general statistical principle adhered to is that individuals within a group are more alike than individuals in different groups. Groups are typically formed on the basis of network ties alone, the rationale being that the similarity of individuals’ positions in the network is in part revealed by the pattern of ties involving them. Thus, actors in densely connected parts of the network are likely to be grouped together. A related concept to a group is a clique, a maximal subset of actors having density 1.0 (i.e., ties exist between all pairs of individuals in a binary network). The larger the clique, the stronger the evidence that its members are in the same group. Grouping algorithms based on maximizing the ratio of within-group to between-group ties are unlikely to split large cliques, as doing so creates many between-group ties. However, a clique need not be its own group.

Components of a network are maximal sets of actors connected by paths; no paths exist between actors in different components. Often a network comprises one large component and several small components containing few individuals. A more practical way of grouping individuals than by cliques is through k-connected components (White and Harary 2001), maximal subsets of actors mutually linked to one another by at least k node-independent paths (i.e., paths that involve disjoint sets of intermediary actors who also lie within the subgraph). Such a criterion is related to k-coreness, a measure of the extent to which subgraphs with all internal degrees ≥ k occur in a network (Seidman 1983).

There are several other ways of grouping the actors in a network. Model-based methods include mixed-membership stochastic block models (Airoldi et al. 2008) and latent-class models in which the group is treated as a categorical individual-level latent variable (Handcock et al. 2007), while nonparametric methods used in network science include modularity and its variants. These methods are discussed in section “Network Communities”, where the grouping of actors is referred to as community detection.

Bipartite or Two-Mode Networks

In practice two-mode networks are rarely directly analyzed. If one of the modes instigates ties or is of primary interest, the network involving just those actors is often analyzed as a single-mode network. For example, in a physician-patient referral network, the physicians often instigate ties through patient referrals while patients are chiefly responsible for who they see first. The projection from a two-mode network to a one-mode network links nodes in one mode (e.g., physicians) if they share a node of the other mode (e.g., patients). A weighted network can be formed with the number of shared actors of the other mode (or a function thereof) as weights.

In describing networks obtained from a projection of a two-mode network, the usual practice is to use unipartite descriptive measures. However, treating a one-mode projection as an actual network loses several layers of information, including the number of actors in the other mode underlying a tie and the degree distribution of the actors in the other mode. Even if the two-mode network is completely random, ties in a one-mode projection that arise from a single patient (for example) with ties to three physicians are not separate events. More generally, a patient who visits k physicians generates a k-clique among those physicians and tells us nothing about whether
physician sharing of one patient is correlated with physician sharing of another patient, which is the question of primary interest in the study of the diffusion of treatment practices. Thus, k-cliques for k > 2 may be excluded from measures of transitivity in two-mode networks.

Descriptive measures for two-mode networks may be computed that parallel those for one-mode networks (Wasserman and Faust 1994). Centrality measures based on the bipartite network representation are covered in Faust (1997). Borgatti and Everett (1997) review visualization, subgroup detection, and measurement of centrality for two-mode network data. More descriptive measures for two-mode networks have recently been proposed. For example, a two-mode measure of transitivity is defined as the ratio of the total number of six-cycles (closed paths of six ties through six nodes) in the two-mode network to the total number of open five-paths through six nodes (Opsahl 2011). In the context of the patient-physician network, physician transitivity exists if physicians A and B sharing a patient and physicians B and C sharing a patient makes it more likely for physicians A and C to share a patient. It is only if the two pairs of physicians have different patients in common that the physician triad may be transitive, and only if the third pair share a different patient from the first two that the event can be attributed to transitivity. The involvement of distinct patients makes the physician-physician ties distinct events and thus informative about clustering of physicians (and patients).

In general, the matrix equation A = BB^T, in which a bipartite network adjacency matrix B is multiplied by its transpose, yields a weighted one-mode network (the elements contain the number of shared actors of the other mode). To avoid losing information about the number of actors leading to a tie between primary nodes, weights can be retained or monotonically transformed in the projected network. Weighted analogues of descriptive measures of binary networks can be evaluated on the weighted one-mode projection. For example, the calculation of degree is emulated by summing the weights of the edges involving an individual, yielding their strength. Degree and strength together distinguish between actors with many weak ties and those with a few strong ties. Analogous measures of centrality can also be computed for the weighted one-mode projection (Opsahl et al. 2010). However, whether ties between k physicians arise through them all treating the same patient, from each pair of physicians sharing a unique patient, or from some in-between scenario cannot be determined post-transformation; thus, the projection transformation discards information.

A further strategy is to set weights for the bipartite network prior to forming the projection. For example, in coauthorship networks, the tie connecting an author to a publication might receive a weight of 1/(N_j − 1), where N_j is the number of authors on paper j (Newman 2001). (Only papers with at least two authors are used to form such networks.) The rationale is that the greater the number of authors, the lower the expected interaction between any pair (a similar logic underlies the example weight matrix described in section “Network Influence Models”). The sum of the weights across all publications common to two authors is then the basis of their relationship in the author network.

If the events defining the bipartite network occur at different times (e.g., medical claims data often contain time-stamps for each patient-physician encounter), a directed one-mode network may be formed. The value of the A-B and B-A ties in the physician-physician network could be the number of patients who visited A before B and B before A, respectively. In the resulting directed network each physician has a flow to and from each other physician. Subsequent transformation of the flows to binary values yields dyads with states null, directed, and mutual as in a directed unipartite binary network.

Because medical claims and surveys are frequent sources of information about one entity’s experience (e.g., a patient) with another entity (e.g., a health plan or physician), bipartite network analysis is an area that promises to have enormous applicability to health services research. Hence, new methods for bipartite network analysis are needed.
Part II: Statistical Models

We now consider the use of statistical models in social network analysis. Particular emphasis is placed on methods for estimating social influence or peer effects and models for analyzing the network itself, including accounting for social selection through the estimation of effects of homophily.

Network Influence Models

Reported claims about peer effects of health outcomes such as BMI, smoking, depression, alcohol use, and happiness have recently tantalized the social sciences. In large part, the discussion and associated controversies have arisen from the statistical methods used to estimate peer effects (O’Malley 2013; Christakis and Fowler 2013).

Let $y_{it}$ and $x_{it}$ denote a scalar outcome and a vector of variables, respectively, for individual $i = 1, \ldots, N$ at time $t = 1, \ldots, T$ ($x_{it}$ includes 1 as its first element to accommodate an intercept). In this section, the relationship status of individuals $i$ and $j$ from the perspective of individual $i$ (denoted $a_{ij}$) is assumed to be time-invariant. For ease of notation no distinction is made between random variables and realizations of them. The vector $Y_t$ and the matrices $X_t$ and $A$ are the network-wide quantities whose $i$th element, $i$th row, and $ij$th element contain the outcome for individual $i$, the vector of covariates for individual $i$, and the relationship between individuals $i$ and $j$ as perceived by individual $i$, respectively. The representation of an example adjacency matrix, denoted $A$, is depicted in Fig. 1.

Regression models for estimating peer effects are primarily concerned with how the distribution of a dependent variable (e.g., a behavior, attitude, or opinion) measured on a focal actor is related to one or more explanatory variables. When behaviors, attitudes, or opinions are formed in part as the result of interpersonal influence, outcomes for different individuals may be statistically dependent. The outcome for one actor will be related to those for the other actors who influence her or him, leading to a complex correlation structure.

In social influence analyses the weight matrix, $W = [w_{ij}]$ in Fig. 4, apportions the total influence acting on an individual evenly across the individuals with whom they have a network tie. Typically

1. $w_{ij} \ge 0$: nonnegative weights.
2. $w_{ii} = 0$: no self-influence.
3. $\sum_j w_{ij} = 1$: weights give relative influences (because its row sums equal 1, $W$ is said to be row-stochastic).

Let $\bar{y}_{it} = (W Y_t)_i$ denote the influence-weighted average of the outcome $y$ across the network after excluding (i.e., subtracting) individual $i$ from the set of individuals to be averaged over. Similarly, let $\bar{x}_{it}^T = (W X_t)_i$ denote the vector containing the corresponding influence-weighted covariates, often referred to as contextual variables.

The most common choice for $W$ is the row-stochastic version of $A$. For illustration, suppose that $A$ is binary (the elements are 1 and 0). Then the off-diagonal elements on the $i$th row of $W$ equal $a_{ij} a_{i+}^{-1}$ if $a_{i+} > 0$ and $1/(N - 1)$ otherwise (Fig. 4). This framework assumes that an individual’s alters are equally influential. In general, influence might only transmit through outgoing ties (e.g., those individuals viewed as friends by the focal actor, a scenario consistent with Fig. 4), or might only transmit through received ties (e.g., individuals who view the focal actor as a friend), or might act in equal or different magnitude in both directions.

Fig. 4 Construction of a network weight matrix W (right). A directed edge from i to j means that node (or individual) i has a relationship to node j while element ij of W quantifies the extent that individual i is influenced by individual j. Although the mathematical form of influence depicted here assumes that influence only acts in the direction of the edge, influence may in general act in the absence of a tie (e.g., people who consider me as a friend might influence me even if I do not consider them a friend)

      A     B     C     D     E     F     G
A     0    1/2   1/2    0     0     0     0
B     0     0     0     0     0     1     0
C    1/2    0     0     0    1/2    0     0
D     0     0     1     0     0     0     0
E    1/3    0    1/3    0     0     0    1/3
F     0     0     0     0     1     0     0
G    1/6   1/6   1/6   1/6   1/6   1/6    0

Network-related interdependence among the outcomes may be incorporated in two distinct ways. First, an outcome for one actor may depend directly on the lagged outcomes or lagged covariates of the alters to whom she or he is linked. For example, consider the model:

$$y_{it} = \alpha_1 \bar{y}_{i(t-1)} + \alpha_x^T \bar{x}_{i(t-1)} + \beta_1 y_{i(t-1)} + \beta_2^T x_{i(t-1)} + \varepsilon_{it}, \qquad (1)$$

where $\alpha_1$ is a scalar parameter quantifying the peer effect; $\alpha_x$ is a p-dimensional vector of parameters
of peer effects acting through the p covariates in $x$; $\beta = (\beta_1, \beta_2^T)^T$ is a vector of other regression parameters for the within-individual predictors; and $\varepsilon_{it}$ is the independent error assumed to have mean 0 and variance $\sigma^2$. The notation used in Eq. 1 is adopted throughout this section; hence, $\alpha$ and $\beta$ denote peer effects and within-individual effects, respectively.

Equation 1 is known as the “linear-in-means model” (Manski 1993) due to the conduit for peer influence being the trait averaged over the alters of each focal actor. The model has a symmetric appearance in that it contains corresponding peer effects for each of the within-individual predictors. A common alternative model assumes $\alpha_x = 0$; in other words, that peer effects only act through the same variable in the alters as the outcome.

Another set of variants arises when there are multiple types of alters with heterogeneous peer effects. Such a situation may be represented in a model by defining distinct influence matrices for each type of peer. Let $W^{(h)}$ denote the weight matrix formed from the adjacency matrix for the network comprising only alters of type $h$ and let $\bar{y}^{(h)}_{i(t-1)} = (W^{(h)} Y_{t-1})_i$ for $h = 1, \ldots, H$, where $H$ is the number of distinct types of alters. Then an extension of the linear-in-means model to accommodate heterogeneous peer effects is:

$$y_{it} = \sum_{h=1}^{H} \left( \alpha_1^{(h)} \bar{y}^{(h)}_{i(t-1)} + \alpha_x^{(h)T} \bar{x}^{(h)}_{i(t-1)} \right) + \beta_1 y_{i(t-1)} + \beta_2^T x_{i(t-1)} + \varepsilon_{it}. \qquad (2)$$

In the special case where $\alpha_1^{(1)} = \alpha_1^{(2)} = \ldots = \alpha_1^{(H)}$ and $\alpha_x^{(1)} = \alpha_x^{(2)} = \ldots = \alpha_x^{(H)}$, Eq. 2 reduces to Eq. 1. An alternative to Eq. 2 is to fit separate models for each type of peer, which would yield estimates of the overall (or marginal) peer effect for each type of peer as opposed to the independent effect of each type of peer above and beyond that of the other types.

Failing to account for all alters may lead to biased results if the alters are interconnected. Figure 5 presents a simple directed acyclic graph (DAG), which is a device for determining whether or not an effect is identifiable, involving three individuals $i$, $j$, and $k$. The nodes represent the variables of interest (a trait measured on each individual such as their BMI) and the arrows represent causal effects (the origin of the arrow is the cause and the tip is the effect). Consider the peer effect of individual $j$ at $t - 1$ on individual $i$ at $t$. A causal effect is identifiable if it is the only unblocked path between two variables. Because individual $k$ is a cause of both individual $j$ and individual $i$, the peer effect of $j$ on $i$ will be confounded by individual $k$ unless the analysis conditions on $y_{k(t-2)}$.

The scenario depicted in Fig. 5 does not present any major difficulties as long as effects
involving individual $k$ are accounted for. However, if individual $k$ is not known about or is ignored, then the analysis may be exposed to unmeasured confounding. This point has particular relevance to social network analyses as networks are often defined by specifying boundaries or rules for including individuals as opposed to being finite, closed systems (Laumann et al. 1983). In situations where such boundaries break true ties, influential peers may be excluded, potentially leading to biased results.

Fig. 5 Simplified directed acyclic graph (DAG) illustrating confounding of a peer effect by a third individual. The DAG is simplified because it does not explicitly show the variable $y_{k(t-1)}$, which is an intermediary between $y_{k(t-2)}$ and $y_{it}$. (Because the point made here does not depend on $y_{j(t-2)}$ and $y_{i(t-1)}$ they are not depicted.) Unless $y_{k(t-2)}$ (or $y_{k(t-1)}$) is conditioned on, the path $y_{j(t-1)} \leftarrow y_{k(t-2)} (\rightarrow y_{k(t-1)}) \rightarrow y_{it}$ is unblocked and therefore confounds $y_{j(t-1)} \rightarrow y_{it}$, whose effect is the peer effect of interest. Although the DAG looks like a digraph of a network, a DAG is a different construction

Estimation of Contemporaneous Peer Effects
From a practical standpoint, it may be infeasible to use a model with only lagged predictors such as Eq. 1. For instance, the time points might be so far apart that statistical power is severely compromised. Therefore, it is tempting to use a model with contemporaneous predictors such as:

$$y_{it} = \alpha_0 \bar{y}_{it} + \alpha_1 \bar{y}_{i(t-1)} + \alpha_x^T \bar{x}_{it} + \beta_1 y_{i(t-1)} + \beta_2^T x_{it} + \varepsilon_{it}, \qquad (3)$$

where adjusting for $\bar{y}_{i(t-1)}$ seeks to isolate the peer effect acting since $t - 1$. However, because $\bar{y}_{it}$ is correlated with the outcome variables of other observations, OLS will be inconsistent. Therefore, methods are needed to account for endogeneity arising from the correlation between $\bar{y}_{it}$ and $\varepsilon_{jt}$ for $j \ne i$; in network science parlance the state of $\bar{y}_{it}$ is said to be an internal product or consequence of the system as opposed to an external (exogenous) force.

In Christakis and Fowler (2007), the most widely cited of the Christakis-Fowler peer effect papers, the endogeneity problem is addressed using a novel theoretical argument. They argued that it is reasonable to assume in a friendship network that the influence acting on the focal actor (the ego) is greatest for mutual friendships, followed by ego-nominated friendships, followed by alter-nominated friendships, and finally dyads with no friendships. Furthermore, they reasoned that unmeasured common causes should affect each dyad equally. Because the estimated peer effects were large and positive for mutual friendships but close to 0 for alter-nominated and null friendships, consistent with their theory, it was suggested that this constituted strong evidence of a peer effect. Despite the compelling argument, Shalizi and Thomas (2011) showed that unobserved factors affecting tie formation (homophily) may confound the relationship and thus lead to biased effects. The estimation of peer effects is a topic of ongoing vigorous debate in the academic and the popular press. Alternatives to the theory-based approach of Christakis and Fowler are now described.

A parametric model-based solution to endogenous feedback is to specify a joint distribution for $\varepsilon_t = (\varepsilon_{1t}, \ldots, \varepsilon_{Nt})$. The reduced form of the model is then obtained by solving $Y_t = \alpha_0 W Y_t + \alpha_1 W Y_{t-1} + W X_{t-1} \alpha_x + \beta_1 Y_{t-1} + X_{t-1} \beta_2 + \varepsilon_t$ for $Y_t$ to yield $Y_t = (I - \alpha_0 W)^{-1} \{ \alpha_1 W Y_{t-1} + W X_{t-1} \alpha_x + \beta_1 Y_{t-1} + X_{t-1} \beta_2 + \varepsilon_t \}$. The resulting model emulates a spatial autocorrelation model (Anselin 1988). One way of facilitating estimation is by specifying a probability distribution for $\varepsilon_t$. However, relying on the correctness of the assumed distribution for identification may make the estimation procedure sensitive to an erroneous assumed distribution.

A semiparametric solution is to find an instrumental variable (IV), $z_{it}$, a variable that is related to $\bar{y}_{it}$ but, conditional on $\bar{y}_{it}$ and $(\bar{y}_{i(t-1)}, \bar{x}_{it}, y_{i(t-1)}, x_{it})$, does not cause $y_{it}$. If $\bar{x}_{it}$ is excluded
from Eq. 3, its elements can potentially be used as IVs (Fletcher 2008). However, IV methods can be problematic if the instrument is weak or if the assumption that the IV does not directly impact $y_{it}$ (the exclusion restriction) is violated, an untestable assumption. Thus, in fitting a model with contemporaneous peer effects, one faces a choice between assuming a multivariate distribution holds, relying on the nonexistence of unmeasured confounding variables, or relying on the validity of an IV. None of these assumptions can be evaluated unconditionally on the observed data.

While joint modeling and IV methods provide theoretical solutions to the estimation of contemporaneous peer effects, the notion of causality is philosophically challenged when the cause is not known to occur prior to the effect. Therefore, longitudinal data provide an important basis for the identification of causal effects, in particular in negating concerns of reverse causality. If the observation times are far apart, the use of lagged alter predictors may, however, substantially reduce the power of an analysis.

Dyadic Influence Analyses
If the dyads consist of mutually exclusive or isolated pairs of actors there are no interdyad ties and influence only acts within dyads. An example of such a situation occurs when individuals can have exactly one relationship and the relationship is reciprocated, as is the case with spousal dyads. The network influence models of section “Network Influence Models” reduce to dyadic influence models in which the predictors are based on individual alters. For example, the dyadic influence model analogous to Eq. 3 is obtained by replacing the network-averaged alter terms indexed by $i$ with the corresponding values for the single alter $j$. That is,

$$y_{it} = \alpha_0 y_{jt} + \alpha_1 y_{j(t-1)} + \alpha_x^T x_{jt} + \beta_1 y_{i(t-1)} + x_{it}^T \beta_2 + \varepsilon_{it}. \qquad (4)$$

The model in Eq. 4 may be estimated using generalized estimating equations (GEE), avoiding specifying a distribution for $\varepsilon_{it}$. However, if any relationships are bidirectional, standard software packages will yield inconsistent estimates of the peer effects as they do not account for the statistical dependence introduced by individuals who play the dual role of ego and alter at time $t$ (VanderWeele et al. 2012).

Frontiers in Social Influence
There has recently been a lot of interest and discussion concerning causal peer effects. Issues that have been discussed include the use of ordinary least squares (OLS) for the estimation of contemporaneous peer effects (Lyons 2011) and the identification of peer effects independent of homophily (Shalizi and Thomas 2011). The discussion has helped elevate social network methodology to the forefront of many disciplines. For example, VanderWeele et al. (2012) show that OLS still provides a valid test of the null hypothesis that the peer effect is zero when the true peer effect is zero. Therefore, OLS can be used to test for peer effects despite the fact that OLS estimates are inconsistent under the alternative hypothesis.

Christakis and Fowler (2007) use tie directionality to account for unmeasured confounding variables under the assumption that their effect on relationship status is the same for all types of relationships. The rationale is that the estimated peer effect in dyads where the relationship is not expected to be conducive to peer influence (“control relationships”) provides a baseline against which to identify the peer effect for other types of relationships. However, this test fails to offer complete protection against unmeasured homophily (Shalizi and Thomas 2011), reflecting the vulnerability of observational data to unmeasured sources of bias. Nevertheless, sensitivity analyses that evaluate the effect size needed to overturn the results may be conducted to help support a conclusion by illustrating that the confounding effect must be implausibly large to reverse the finding (VanderWeele 2011).

Instrumental variable (IV) methods have also been used to estimate peer effects. A common source of instruments is alters’ attributes other than the one for which the peer effect is estimated (Fletcher 2008; Fletcher and Lehrer 2009). Potential IVs must predict the attribute of interest in the alter but must not be a cause of the same attribute in other individuals. Attributes that are invisible
such as an individual’s genes appear to be ideal candidates. For instance, an individual with two risk alleles of an obesity gene is at more risk of increased BMI, but conditional on that individual’s BMI their obesity genes should not affect the BMI of other individuals. However, if the obesity genes are revealed through another behavior (a phenomenon known as pleiotropy) that is associated with BMI then, unless such factors are conditioned on, genes will not be valid IVs.

Relational Analyses

Sociocentric network studies assemble data on the ties representing the relationship linking a set of individuals, such as all physicians within a medical practice. Models for such data posit that global network properties are the result of phenomena involving subgroups of (most commonly) four or fewer actors (Robins et al. 2005). Examples of such regularities are actor-level tendencies to produce or attract ties (homophily and heterophily), dyadic tendencies toward reciprocity, and triadic tendencies toward closure or transitivity. A relational model, in essence, specifies a set of microlevel rules governing the local structure of a network. In this section, models for cross-sectional relational data are considered first, followed by their longitudinal counterparts.

The simplest models for sociocentric data assume dyadic independence. Under the random model, all ties have equal probability of occurring and the status of one has no impact on the status of another (Erdős and Rényi 1959). More general dyadic models were developed in Holland and Leinhardt (1981) and later were extended in Wang and Wong (1987). Because independence is still assumed between dyads, the information from the data about the model parameters accumulates in the form of a product of the probability densities for the status of the dyadic observation over each dyad:

$$L = \prod_{i<j}^{N} \mathrm{pr}(a_{ij}, a_{ji} \mid \alpha, \gamma, x_{ij}, x_{ji}), \qquad (5)$$

where $\alpha = (\alpha_1, \ldots, \alpha_N)^T$ and $\gamma = (\gamma_1, \ldots, \gamma_N)^T$ are vectors of actor-specific parameters representing the actors’ expansiveness (propensity to send ties) and popularity (propensity to receive ties), respectively, and $x_{ij}$ is a vector of covariates relevant to $a_{ij}$ (this may include covariates specific to either actor and combined traits of both actors). It is important to realize that covariates can be directional; thus, $x_{ij}$ need not equal $x_{ji}$. Although the model may include other parameters, $\alpha$ and $\gamma$ play an important role in network analysis due to their relationship to the degree distribution of the network and so are explicitly denoted.

When relationship status is binary, the distribution of $(a_{ij}, a_{ji})$ is a four-component multinomial distribution. The probabilities are typically represented in the form of a generalized logistic regression model (an extension of the logistic regression model to more than two categories) having the form

$$\mathrm{pr}(a_{ij}, a_{ji} \mid \alpha, \gamma) = \kappa_{ij}^{-1} \exp\left( \mu_{ij} a_{ij} + \mu_{ji} a_{ji} + \rho_{ij} a_{ij} a_{ji} \right), \qquad (6)$$

where

$$\kappa_{ij} = 1 + \exp(\mu_{ij}) + \exp(\mu_{ji}) + \exp(\mu_{ij} + \mu_{ji} + \rho_{ij}),$$

and $\mu_{ij}$, $\mu_{ji}$, and $\rho_{ij}$ are functions of $(\alpha_i, \alpha_j, \gamma_i, \gamma_j)$ and $(x_{ij}, x_{ji})$. The term $\mu_{ij}$ includes factors associated with the likelihood that $a_{ij} = 1$ but not necessarily the likelihood that $a_{ji} = 1$. In a directed network the predictors can be directional and so it is likely that $\mu_{ij} \ne \mu_{ji}$. However, any covariates included in $\rho_{ij}$ must be nondirectional as they affect the likelihood of $(a_{ij}, a_{ji}) = (1, 1)$; the sign of $\rho_{ij}$ indicates whether a mutual tie is more (if $\rho_{ij} > 0$) or less (if $\rho_{ij} < 0$) likely to occur than predicted by the density terms and so is a measure of reciprocity or mutuality. Null mutuality is implied by $\rho_{ij} = 0$.

In dyadic models, the terms $\mu_{ij}$, $\mu_{ji}$, and $\rho_{ij}$ account for the local network about actors $i$ and $j$ through the inclusion of $(\alpha_i, \alpha_j, \gamma_i, \gamma_j)$. Furthermore, other effects can be homogeneous across actors or actor-specific. For example, the $p_1$ model
(Holland and Leinhardt 1981) assumes μij = μ + αi + γj and ρij = ρ, implying the covariate-free joint probability density function of the network given by

$$
p_1(A) \propto \exp\Big\{ \mu s_1(A) + \sum_{i}^{N} \alpha_i s_{2i}(A) + \sum_{j}^{N} \gamma_j s_{3j}(A) + \rho s_4(A) \Big\},
$$

where $s_1(A) = \sum_{i \neq j} a_{ij}$, $s_{2i}(A) = a_{i+}$, $s_{3j}(A) = a_{+j}$, and $s_4(A) = \sum_{i \neq j} a_{ij} a_{ji}$. Thus, the p1 model depends on 2N + 2 network statistics and associated parameters. If the p1 model holds within (ego, alter)-shared values of categorical attributes, a stochastic block model is obtained by allowing block-specific modifications to the density and reciprocity of ties (Fienberg and Wasserman 1981; Holland et al. 1983; Wang and Wong 1987). An extension would allow reciprocity to also vary between blocks. Because the stochastic block model extension of the p1 model is saturated at the actor level due to the expansiveness and popularity fixed effects, no assumption is made about differences in the degree distributions of the actors in different blocks. Stochastic block models are the basis of mixed-membership and other recent statistical approaches for node-partitioning social network data (Goldenberg et al. 2009; Choi et al. 2010; Karrer and Newman 2011). Individuals in the same block of a stochastic block model are often referred to as being structurally equivalent.

Models of Networks as Single Observations
A criticism of dyadic independence models is that they fail to account for interdependencies between dyads. The p* or exponential random graph model (ERGM) generalizes dyadic independence models (Frank and Strauss 1986; Wasserman and Pattison 1996). An ERGM has the form

$$
\Pr(A; \theta) = \kappa(\theta)^{-1} \exp\Big( \sum_{k} \theta_k s_k(A) \Big), \tag{7}
$$

where A denotes a possible state of the network, sk(A) denotes a network statistic evaluated over A (e.g., the number of ties, the number of reciprocated ties), $\kappa(\theta) = \sum_{A \in \mathcal{A}} \exp\big(\sum_k \theta_k s_k(A)\big)$, and $\mathcal{A}$ is the set of all $2^{N(N-1)}$ possible realizations of a directed network. In general, the scale factor κ(θ) that sums over each distinct network does not factor into a product of analogous terms. As a result, it is computationally infeasible to exactly evaluate the likelihood function of dyadic dependent ERGMs for even moderately sized N (e.g., N > 20 is problematic (Hunter and Handcock 2006)). The key feature of the p1 model that allows the probability of the network to decompose into the product of dyadic-state probabilities is that it only depends on network statistics sk(A) that sum individual ties or pairs of ties from the same dyad.
If dyads are independent unless they share an actor, the network is a Markov Random Graph (Frank and Strauss 1986). Markov Random Graphs may include terms for density, reciprocity, transitivity and other triadic structures, and k-stars (equivalent to the degree distribution); these terms contain sums of the products of no more than three ties. Such terms may be multiplied with actor attribute variables to define interaction effects. (An interaction is the effect of the product of two or more variables, e.g., if males and females have different tendencies to reciprocate ties then gender is said to interact with reciprocity.)
Networks that extend Markov Random Graphs by allowing four-cycles but no fifth- or higher-order terms are partially conditionally dependent. In such networks, a sufficient condition for dependence of aij and akl is that aik = ajl = 1 or ail = ajk = 1 (Wang et al. 2009). Thus, two edges may be dependent despite not having any actors in common. Partial conditional dependence is the basis of the new parameterizations of network statistics developed by Snijders (2006) that have led to better fitting ERGMs (see below).
Under ERGMs, the conditional likelihood of each tie given the other ties in the network has the logistic form:

$$
\Pr\left(a_{ij} = 1 \mid A^{c}_{ij}\right) = \left\{ 1 + \exp\left[ -\theta^{T} \delta\left(A^{c}_{ij}\right) \right] \right\}^{-1}, \tag{8}
$$

where $A^{c}_{ij}$ is A with aij excluded and $\delta(A^{c}_{ij}) = S(A^{+}_{ij}) - S(A^{-}_{ij})$ is the vector of changes in network statistics that occur if aij is 1 rather than 0. Thus, the parameters of an ERGM are interpreted as the change in the log of the odds that the tie is present to not being present conditional on the status of the rest of the network (Snijders 2006). A large positive parameter suggests that configurations of the type represented in the network statistic appear in the observed network more often than expected by chance, all else equal (Robins et al. 2009).
Due to the factorization of the likelihood function in Eq. 5, likelihood-based estimators of dyadic independence models have desirable statistical properties such as consistency and statistical efficiency. However, if the model for the network includes predictors based on three or more actors, no such factorization occurs and Markov chain Monte Carlo (MCMC) is required to optimize the likelihood function for Eq. 7, which for each observation involves making computations on $k^{N(N-1)/2}$ (k = 4 if directed and k = 2 if nondirected) distinct networks. ERGMs have been demonstrated to be estimable on networks with N on the order of 1600 (Goodreau 2007), but computational feasibility depends on the terms in the model and the amount of memory available. The ERGM (“Exponential Random Graph Model”) package that is part of the Statnet suite in R, developed by the Statnet project, estimates ERGMs (Handcock et al. 2010).
Other estimation difficulties include failure of the optimization algorithm to converge and the fitted model producing nonsensical “degenerate” predicted networks. Degeneracy arises because for certain specifications of sk(A) the network statistics are highly collinear or there is unaccounted effect heterogeneity across the network. As a result, under the fitted model the local neighborhood of networks around the observed network may have probability close to 0 and those networks with positive probability (often the empty and complete graphs) may be radically different from each other and thus from the observed network (Handcock et al. 2003; Robins et al. 2007). Although the average network over repeated draws has similar network statistics to the observed network, the individual networks generated under the fitted model do not bear any resemblance to the observed network.
Because an actor of degree m contributes k-stars for k ≤ m, k-star configurations are nested within one another and thus are highly correlated. Therefore, when multiple k-stars are predictors, extensive collinearity results. However, the estimated coefficients of successive k-star configurations (e.g., 2-star, 3-star, 4-star) tend to decrease in magnitude and have alternating signs, an observation often seen when multiple highly collinear variables are included in a regression model. This observation led to the development of the alternating k-star (Snijders 2006), given by

$$
AS(\lambda) = \sum_{k=2}^{N-1} (-1)^{k} \frac{S_k}{\lambda^{k-2}} \quad \text{for } \lambda > 1,
$$

where Sk denotes the number of k-stars, being used in place of multiple individual k-star terms in Eq. 7. A positive estimate of the coefficient of AS(λ) suggests that the degree distribution is skewed towards higher degree nodes while a negative coefficient implies large degrees are unlikely. The value of λ can be specified or estimated from the data (Hunter 2007).
Network statistics for triadic configurations (the triangle, a nondirected closed triad, in nondirected networks and transitive triads, three-cycles, closed three-out stars, and closed three-in stars in directed networks) are the most prone to degeneracy. One reason is that heterogeneity in the prevalence of triads across the network leads to heterogeneity in the density of ties across the network (Robins et al. 2009). A model that assumes homogeneous triadic effects across the network is unable to describe networks with regions of high and low density; the generated networks are either dominated by excessive low-density regions or by excessive high-density regions. This observation suggests a hierarchical
modeling strategy where the first step is to use a community detection algorithm (see section “Network Communities”) to partition the network into blocks of nodes. Then fit an ERGM (or other model) to the subnetwork corresponding to each community, allowing the network statistics to have different effects within each community. The just-described modeling strategy combines methods of network science and social network analysis.
A similar approach has been used to overcome severe computational difficulties that often occur when one or multiple triadic (triangle-type) terms are included in the model. A k-triangle is a set of k triangles resting on a common base. For example, if individuals i, j, and k are one closed triad and individuals i, j, and l are another then the four individuals form a 2-triangle with the edge yij common to both. Let Tk denote the number of k-triangles in the network. Thus, T1 denotes the total number of closed triads, T2 the total number of 2-triangles, and so on. The alternating k-triangle statistic

$$
AT(\lambda) = \sum_{k=1}^{N-3} (-1)^{k} \frac{T_{k+1}}{\lambda^{k}} \quad \text{for } \lambda > 1,
$$

was developed to perform for triadic structures what AS(λ) performs for k-stars (Snijders 2006). The presence of λ makes AT(λ) nonlinear in the triangle count, giving lower probability to highly clustered structures. By making the number of actors who share k partners the core term, AT(λ) can be rewritten as a geometrically weighted edgewise shared partner (GWESP) statistic (Goodreau 2007; Hunter 2007).
The AS(λ) and AT(λ) statistics do not differentiate between outward and inward ties. Recently, directed forms of these statistics have been introduced (Robins et al. 2009). The directed versions of the k-star are threefold, corresponding to two paths, shared destination node (activity), shared originator node (popularity). The directed versions of the k-triangle represent transitivity, activity closure, popularity closure, and cyclic closure.

Bipartite ERGMs
An alternative approach to modeling a one-mode projection (by construction a nondirected network) from a two-mode network is to directly model the two-mode network. An advantage of direct modeling is that all the information in the data is used. ERGMs or any other model applied to bipartite data need to account for the fact that ties can only form in dyads including one actor from each mode. In a dyadic independence model this is recognized simply by excluding all same-mode dyads from the dataset. In general, the denominator κ(θ) in Eq. 7 only sums over networks in which there are no within-mode ties. If the numbers of actors in the two modes are N and M, there are $2^{NM}$ distinct nondirected networks.
The density and degree distributions may be represented in a bipartite ERGM as in a unipartite ERGM. However, with two modes it may be that two types of each network statistic and other predictor are needed. Representations of homophily in two-mode networks are defined across modes. Likewise, because there are no within-mode ties, statistics that account for closure must also depend only on inter-mode ties.
The smallest closed structure in a bipartite graph is a four-cycle (closed four-path). An example of a four-cycle is the path A–1–C–2–A in Fig. 2; it includes four distinct actors and four edges are traversed to return to the initial actor. A simple measure of closure contrasts the number of closed four-cycles out of all three-paths containing four unique actors with the overall density of ties. A simple model for testing whether clustering (closure) is present in a bipartite network includes density, both sets of k-stars, three-path, and four-cycle statistics as predictors. A significant positive effect of the four-cycle statistic suggests that two actors of degree two in one mode that have one of the actors in the other mode in common are more likely to also have the second actor in common, relative to two randomly selected actors of degree two from the same mode. For example, in a physician-patient network, clustering implies having one patient in common increases the likelihood of having another patient in common. Physicians A and C both have patients 1 and 2 in common, hence they
provide evidence for bipartite closure. However, physicians E and F have patient 3 in common; despite being eligible to exhibit bipartite closure they do not, and hence they provide evidence against bipartite closure.
Analogs of ERGMs and solutions to problematic issues exist for bipartite networks. For example, to avoid problems of high collinearity between the k-star terms, alternating k-star statistics can be used in place of them (Wang et al. 2009). Let SD(B) denote the number of ties from one mode to the other, AS1(B) and AS2(B) denote the alternating k-star statistics for each mode, S3P(B) denote the number of three-paths, and S4C(B) denote the number of closed four-cycles for a network B. The resulting bipartite ERGM for B has the form:

$$
\Pr(B; \theta) = \kappa(\theta)^{-1} \exp\big( \theta_0 S_D(B) + \theta_1 AS_1(B) + \theta_2 AS_2(B) + \theta_3 S_{3P}(B) + \theta_4 S_{4C}(B) \big), \tag{9}
$$

where κ(θ) sums over the $2^{MN}$ possible bipartite graphs. The statistic S4C(B)/S3P(B) is the proportion of times that two patients each visit the same two physicians out of all the occurrences where two patients both have one visit to one physician and one patient visits the other physician. The coefficient θ4 is the effect associated with this lowest-order form of closure in a two-mode sense (but should not be thought of as reciprocity because the network is nondirected).

Longitudinal ERGMs
The development of relational models has primarily focused on cross-sectional data. However, extensions of ERGMs to longitudinal scenarios have been developed, most often involving a Markov assumption to describe dependence across time. The first longitudinal ERGMs treated tie-formation and tie-dissolution as equitable events in the evolution of the network (Hanneke et al. 2010). A more general formulation treats tie-formation (attractiveness in the context of network science) and tie-duration (the complement of tie-dissolution, referred to as fitness in network science) as separable processes, thereby allowing the same network statistic to impact tie-formation and tie-dissolution differently (Krivitsky and Handcock 2010).
Like ERGMs for cross-sectional data, longitudinal ERGMs are defined by statistics that count the number of occurrences of substructures in the network. However, in addition to the current state of the network, such statistics may also depend on previous states. Under Markovian dependence, network statistics only depend on the current and the most recent state; for example, the number of ties that remain intact from the preceding observation. The recently released TERGM (“temporal exponential random graph model”) package in the Statnet suite in R estimates ERGMs for discrete temporal (i.e., longitudinal) sociocentric data (Hanneke et al. 2010).

Actor-Oriented Approaches
An alternative approach for modeling network evolution is the actor-oriented model (Snijders 1996, 2001, 2005). This centers on an objective function that actors seek to maximize and which may be sensitive to multiple network properties, including reciprocity, closure, homophily, or contact with high-degree actors. The model assumes that actors control their outgoing ties and change them in order to increase their satisfaction with the network in one or more respects as quantified by the objective function. It resembles a rational choice model in which each agent attempts to maximize their own utility function. Estimated parameters indicate whether changes in a given property raise or lower actor satisfaction.
An important distinction of actor-oriented models from ERGMs is that the relevant network statistics in the actor-oriented model are specific to individuals rather than being aggregations across the network. However, like ERGMs, estimation is computationally intensive. The SIENA package in StOCNET (Huisman and Van Duijn 2004, 2005) uses a stochastic approximation algorithm but struggles with networks of appreciable size (e.g., thousands of individuals). Because they only resemble ERGMs in the limiting steady-state case, actor-oriented models may also suffer from degeneracy but the problem is less profound (Goldenberg et al. 2009).
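
As a rough illustration of the actor-oriented idea, the following Python sketch lets a single actor compare every possible change to its outgoing ties under a toy objective function. Everything specific here is an assumption made for illustration: the function names, the choice of two effects (out-degree and reciprocity), their weights, and the multinomial-logit rule that converts objective values into choice probabilities. It is not the SIENA implementation.

```python
import math

def objective(adj, i, beta_out=-1.0, beta_rec=2.0):
    """Hypothetical objective for actor i: weighted out-degree plus weighted reciprocated ties."""
    n = len(adj)
    out_degree = sum(adj[i][j] for j in range(n) if j != i)
    reciprocated = sum(adj[i][j] * adj[j][i] for j in range(n) if j != i)
    return beta_out * out_degree + beta_rec * reciprocated

def choice_probabilities(adj, i, **betas):
    """Probability of each option open to actor i: no change (key None) or toggling tie i -> j (key j)."""
    n = len(adj)
    scores = {None: objective(adj, i, **betas)}
    for j in range(n):
        if j == i:
            continue
        adj[i][j] = 1 - adj[i][j]      # tentatively toggle the outgoing tie i -> j
        scores[j] = objective(adj, i, **betas)
        adj[i][j] = 1 - adj[i][j]      # restore the original network
    top = max(scores.values())         # subtract the maximum for numerical stability
    weights = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Three actors; actor 1 already sends a tie to actor 0.
adj = [[0, 0, 0],
       [1, 0, 0],
       [0, 0, 0]]
probs = choice_probabilities(adj, 0)
```

With these illustrative weights, actor 0 is most likely to create the tie to actor 1, because doing so reciprocates an incoming tie; a positive reciprocity weight therefore plays the role of a parameter indicating that reciprocity raises actor satisfaction.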
Joint Models
A virtue of the actor-oriented modeling framework in SIENA is that an actor’s relationships can be modeled jointly with the social-influence effects of an actor’s peers on their own traits. If the model is correctly specified, it has the potential to account for unmeasured confounding factors that affect both the evolution of relationship status and the values of individuals’ attributes, yielding unbiased estimates of the effects of observed variables affecting social influence and the evolution of the network. Such a model was developed by Steglich and colleagues (Steglich et al. 2010), but to date work in this area is limited.

Latent Independence Approaches
In ERGMs a huge increase in computational complexity occurs between the dyadic-independent and dyadic-dependent models. A second concern about ERGMs is that in general they are not consistent under sampling in the sense that statistical inferences drawn from the network for the sample do not generalize to the full network (Shalizi and Rinaldo 2012). The few ERGMs to exhibit such consistency include the dyadic independent p1 and stochastic block models. An alternative modeling strategy provides a more graduated transition between independence and dependence scenarios by using random effects to model dyadic dependence and also ensures consistency between the results of analyzing the sample and the population of interest. Random effects are used to account for dyadic dependence in the p2 model (Duijn et al. 2004; Zijlstra et al. 2006) introduced below.
The p2 model is much like the p1 model except that the expansiveness αi and popularity γi parameters are random as opposed to fixed effects. Typically, (αi, γi) is assumed to be bivariate normal with covariance matrix Σαγ. Therefore, the p2 model is given by

$$
\operatorname{pr}\left(a_{ij}, a_{ji} \mid x_{ij}, x_{ji}\right) = \kappa_{ij}^{-1} \exp\left( \mu_{ij} a_{ij} + \mu_{ji} a_{ji} + \rho_{ij} a_{ij} a_{ji} \right), \tag{10}
$$

where

$$
\kappa_{ij} = 1 + \exp(\mu_{ij}) + \exp(\mu_{ji}) + \exp(\mu_{ij} + \mu_{ji} + \rho_{ij}),
$$
$$
\mu_{ij} = \mu + \alpha_i + \gamma_j + \beta_1^{T} x_{1ij},
$$
$$
\rho_{ij} = \rho + \beta_2^{T} x_{2ij},
$$

and $(\alpha_i, \gamma_i) \sim \text{Normal}(0, \Sigma_{\alpha\gamma})$. Thus, xij = (x1ij, x2ij) and x2ij includes a subset of covariates that are symmetric (x2ij = x2ji) in reflection of the fact that reciprocity is a symmetric phenomenon. Conditional on (αi, γi) the model implies that the relationship status of one dyad does not depend on that of another. A positive off-diagonal element of Σαγ implies that expansive individuals also tend to be popular.
The p2 model can be extended to account for more general forms of dyadic dependence than the latent propensity of an individual to send or receive ties. Let each individual have a vector of latent variables, denoted zi in the case of individual i, that together with the same for individual j affects the value of the relationship between i and j. The dependence of tie-status on zi is generally represented using a simple mathematical function. The major types of models are latent class models (Nowicki and Snijders 2001; Airoldi et al. 2008), latent distance models (Hoff et al. 2002; Handcock et al. 2007), and latent eigenmodels (Hoff 2005, 2008). These models are characterized by the form of the latent variable

$$
\xi(z_i, z_j) =
\begin{cases}
\lambda_{z_i, z_j} & \text{where } z_i, z_j \in \{1, \ldots, K\} \text{ and } \lambda_{z_i, z_j} = \lambda_{z_j, z_i} \\
-\lvert z_i - z_j \rvert^{c} & \text{where } c > 0 \text{ and } z_i, z_j \text{ have } K \text{ elements} \\
z_i^{T} U z_j & \text{where } z_i \sim N(0, \Sigma_z) \text{ and } U \text{ is a } K\text{-dimensional diagonal matrix}
\end{cases}
\tag{11}
$$

which is included as an additional predictor in μij. In Eq. 11 the form and interpretation of zi changes from denoting a scalar categorical latent variable in the latent class model (first row) to a position in a continuously valued multidimensional space in the latent distance and latent eigenmodels (second and third rows, respectively). The term ξ(zi, zj) can be added to either the μij or ρij components of the p2 model to allow higher-order dependence to moderate the effect of density and reciprocity, respectively.
In the latent class specification the array of values of λzi,zj form a symmetric K × K matrix Λ. A basic specification is λzi,zj = λ0 if zi = zj (nodes in same partition) and λzi,zj = 0 if zi ≠ zj (nodes in different partitions) (Nowicki and Snijders 2001; Airoldi et al. 2008). Latent class models extend stochastic block models to allow latent clusters as well as observed clustering variables. This family of models is suited to network data exhibiting structural equivalence, that is, under the model individuals are hypothesized to belong to latent groups such that members of the same group have similar patterns of relationships.
In the latent distance specification the most common values for c are 1 and 2, corresponding to absolute and Cartesian distance, respectively. The distance metric accounts for latent homophily, that is, the effect of unobserved individual characteristics that induce ties between individuals. In this model, zi can be interpreted as the position of individual i in a social space (Hoff et al. 2002). This model accounts for triadic dependence (e.g., transitivity) by requiring that latent distances between individuals obey the triangle inequality. Latent distance models are available in the LatentNet package in R (Krivitsky and Handcock 2008).
The latent eigenmodel is the most general specification and accounts for both structural equivalence and latent homophily. Furthermore, the parameter space of the latent eigenmodel of dimension K generalizes that of the latent class model of the same dimension and weakly generalizes the latent distance model of dimension K − 1. Conversely, the latent distance model of dimension K does not generalize the one-dimensional latent eigenmodel (Hoff 2008). The closeness of the latent factors $U^{1/2} z_i$ and $U^{1/2} z_j$ quantifies the structural equivalence of the positions of actors i and j in the network; a tie is more likely if $U^{1/2} z_i$ and $U^{1/2} z_j$ have a similar direction and magnitude, allowing for more clustering than under Eq. 10. On the other hand, latent homophily is accounted for by the diagonal elements of U, which can be positive or negative (allowing for heterophily as well as homophily). The model constrains the extent to which the quadratic forms $z_i^{T} U z_j$, $z_i^{T} U z_k$, and $z_j^{T} U z_k$ constructed from the latent vectors vary from one another. The greater the magnitude of Σz = cov(zi) the greater the extent to which ties are expected to cluster and form cliques. The latent eigenmodel is appropriate if a network exhibits clustering due to both structural equivalence and unmeasured homophily.
In Hoff (2005) and (2008) models are specified at the tie level with reciprocity (in directed networks) represented as the within-dyad correlation between two tie-specific latent variables. Modeling reciprocity as a latent process differs from the p2 model, in which reciprocity is represented as a direct effect (Paul and O’Malley 2013). Therefore, an alternative family of latent variable models for networks is obtained by augmenting the density term in the p2 model with Eq. 11. An advantage of specifying a joint model at the dyad level is that the resulting (extended-p2) model involves N(N − 1) fewer latent variables, possibly alleviating computational issues such as nonidentifiability of parameters or multiple local optima.
The challenges of estimating models involving latent variables resemble those of factor analysis or other dimension-reduction methods. First, it may not be possible to specify an appropriate value of K from existing knowledge of the network, and estimating K from the data is not straightforward. Second, computational challenges in estimating the latent variables can make the method difficult to apply to large networks. However, such issues are more easily overcome than degeneracy in ERGMs. Degeneracy is avoided in these models as the model for a dyad determines the distribution of the network. In other words, the factorization of the likelihood into a product of like terms ensures that networks sampled under the model are almost surely in the neighborhood of the observed network, increasingly so as N increases (i.e., asymptotically).
Another contrast with ERGMs is that the model describes a population as opposed to the single observed network. Thus, in latent variable models the data-generating process is modeled whereas ERGMs are specific to the observed network and so have more in common with finite population inference.
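
To make Eq. 10 and the latent distance row of Eq. 11 concrete, the sketch below computes the four dyad-state probabilities for one pair of actors, with the latent term added to the density components μij and μji. The function names and all numerical values are hypothetical, covariates are omitted, and the distance is taken here to be the Euclidean norm raised to the power c, which is one reading of Eq. 11 rather than a definitive one.

```python
import math

def latent_distance_term(z_i, z_j, c=1):
    """Latent distance row of Eq. 11 (assumed form): minus the Euclidean distance raised to the power c."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_i, z_j)))
    return -(dist ** c)

def dyad_state_probabilities(mu_ij, mu_ji, rho_ij):
    """Probabilities of (a_ij, a_ji) over (0,0), (1,0), (0,1), (1,1) under Eq. 10."""
    weights = {
        (0, 0): 1.0,
        (1, 0): math.exp(mu_ij),
        (0, 1): math.exp(mu_ji),
        (1, 1): math.exp(mu_ij + mu_ji + rho_ij),
    }
    kappa = sum(weights.values())      # normalizing constant kappa_ij
    return {state: w / kappa for state, w in weights.items()}

# Illustrative (made-up) parameter values for actors i and j.
mu, rho = -1.0, 0.5
alpha = {"i": 0.3, "j": -0.2}          # expansiveness effects
gamma = {"i": 0.1, "j": 0.4}           # popularity effects
z = {"i": (0.0, 0.0), "j": (3.0, 4.0)} # latent positions in a two-dimensional social space

xi = latent_distance_term(z["i"], z["j"], c=1)
mu_ij = mu + alpha["i"] + gamma["j"] + xi
mu_ji = mu + alpha["j"] + gamma["i"] + xi
probs = dyad_state_probabilities(mu_ij, mu_ji, rho)
```

Setting rho_ij to zero makes the two ties within the dyad independent, which corresponds to the null mutuality case; a larger latent distance lowers both μij and μji and so shifts probability toward the empty dyad.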
Another advantage of conditional independence models over ERGMs is that the same types of models can be applied to valued relational data. Analogous to generalized linear models, the link function and any parametric distributional assumptions that define a conditional independence network model can be tailored to the type of relationship variable (scale, count, ratio, categorical, multivariate). However, a recent adaptation of ERGMs has been proposed for modeling count-valued sociocentric data (Krivitsky 2012).
Offsetting the above advantageous features of conditional independence models is that terms such as ξ(zi, zj) are limited from the hypothesis testing and interpretational standpoint in that they do not distinguish particular forms of social equivalence or latent homophily. For example, the effect of transitivity is not distinguished from that of cyclicity or higher-order clustering, such as tetradic closure. Therefore, the choice of model in practice might depend on the importance of testing specific hypotheses about higher-order effects relative to obtaining a model whose generative basis allows it to make predictions beyond the data set on which the model was estimated.

Longitudinal Conditional Independence Models
Longitudinal counterparts of conditional independence models are obtained by introducing terms that account for longitudinal dependence (e.g., past states of the dyad). A simple Markov transition model was developed in O’Malley and Christakis (2011) with tie-formation and tie-dissolution treated as unrelated processes. Conditional on the past state of the dyad and the sender and receiver random effects, the value of each tie is assumed to be statistically independent of that of any other tie. A more general formulation extends the p2 model, allowing dependence between ties within a dyad (reciprocity), heterogeneous effects in the formation and dissolution of ties, and the inclusion of higher-order effects (e.g., third-order interactions to account for transitivity) as lagged predictors (Paul and O’Malley 2013).
The approach in Paul and O’Malley (2013) is notable for attempting to capture the best of both worlds: it allows localized (actor or dyadic) versions of the higher-order predictors available in ERGMs to be included as predictors, but avoids degeneracy by using their lagged values as opposed to their current values as predictors. Therefore, conditional on the observed and latent predictors, dyads are cross-sectionally independent but longitudinally dependent on prior states of other dyads (in addition to their own past states) in the network. An extension that builds on Paul and O’Malley (2013) is to incorporate the latent class, distance, or eigenfactor terms in Eq. 11 in the model. Such a model was entertained in Westveld and Hoff (2011) but has not yet been developed.

Part III: Network Science

We now switch attention to methods that have been derived and used in the field of network science. In general, network science approaches avoid assumptions about distributions in models. For example, to test whether a network exhibits a certain property, the commonly employed approach is to use a permutation test to develop a null distribution for a statistic that embodies the property in question and then evaluate how extreme the observed value of the statistic is with respect to the null distribution. This technique is the cornerstone of the procedure used to evaluate the degree of separation to which social clustering can be detected in Szabo and Barabasi (2007).
Network science focuses not only on social networks but also covers information networks, transportation networks, biological networks, and many others. Most of the networks studied within network science are nondirected as ties are typically thought of as connections as opposed to measures for which the distinction between instigator and receiver is relevant. Thus, the networks in this section are assumed to be nondirected unless stated otherwise.
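
The permutation-test logic described in this section can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the property tested is clustering of ties within groups defined by a node attribute, the statistic is the number of ties joining two nodes with the same attribute value, and the null distribution is built by shuffling the attribute labels across nodes while holding the ties fixed. The toy network, the function names, and the number of permutations are all made up.

```python
import random

def same_group_ties(edges, group):
    """Count nondirected ties whose two endpoints share the same attribute value."""
    return sum(1 for u, v in edges if group[u] == group[v])

def permutation_p_value(edges, group, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed number of same-group ties."""
    rng = random.Random(seed)
    nodes = list(group)
    labels = [group[v] for v in nodes]
    observed = same_group_ties(edges, group)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                    # relabel the nodes, keep the ties fixed
        permuted = dict(zip(nodes, labels))
        if same_group_ties(edges, permuted) >= observed:
            at_least_as_extreme += 1
    # The add-one correction counts the observed labelling as one draw from the null.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Toy network: two fully connected groups of four nodes joined by one bridging tie.
group = {v: "a" for v in range(4)}
group.update({v: "b" for v in range(4, 8)})
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
observed, p_value = permutation_p_value(edges, group)
```

A small p-value indicates that the observed count of same-group ties lies in the upper tail of the null distribution. No distributional assumption is needed; the only choices are the statistic and what is held fixed under permutation.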
Generative Models of Network Formation

Network science has taken a somewhat different approach to modeling networks than the social sciences or statistics. Essentially all models developed within network science are generative models, sometimes also known as forward models, in contrast to probabilistic models such as ERGMs. These models start from a set of simple hypothesized mechanisms, often functioning at the level of individual nodes and ties, and attempt to describe what types of network structures emerge from a repeated application of the proposed mechanisms. Many of the models describe growing networks, where one starts from a small connected seed network consisting of a few connected nodes, and then grows the network by subsequent addition of nodes, usually one at a time. The attachment rules specify how exactly an incoming node attaches itself to the existing network.
Generative models are commonly exploratory in nature. If they reproduce the type of structure observed in an empirical network, it is plausible that the proposed mechanisms may underlie network formation in the real world. The main insight to be gained from a generative model is a potential explanation for why a network possesses the type of structure it does. Many of the models are simple in nature, which occasionally leads to analytical tractability, but the main reason for simplicity is the potential to expose clearly the main mechanism(s) driving the phenomenon of interest. It is not uncommon for generative models to possess only two or three parameters, yet occasionally simple generative mechanisms can explain some of the key features surprisingly well. Once a model can explain the main features, it can be fine-tuned by adding more specific or nuanced mechanisms. A few examples of generative models are now described.

Cumulative Advantage Model
Cumulative advantage refers to phenomena where success seems to breed success, such as in the case of accumulation of further wealth to already wealthy individuals. In networks of scientific citations, where a node represents a scientific paper, each node has some number of edges pointing to nodes that correspond to cited papers (de Solla Price 1965). In the present context, for example, there would be an edge pointing from the node representing this chapter to the node representing the 1965 Science paper of Price. While the out-degree of nodes is fairly uniform, as the length of bibliographies is fairly constrained, the in-degree distribution was found to be fat-tailed with the functional form of a power-law, $P(k) \sim k^{-\alpha}$ (de Solla Price 1965). Price later proposed a mathematical model for cumulative advantage processes, “the situation in which success breeds success” (Price 1976). In this model, nodes are added to the network one at a time, and the average out-degree of each node is fixed. The attachment rule in the model specifies that each new paper will cite existing papers with probability proportional to the number of citations they already have. Thus each incoming node will attach itself with some number of directed edges to the existing network, the exact number of ties being drawn from a distribution, and the nodes these new edges are pointing to will be chosen proportional to their in-degree. In this formulation, however, papers with exactly zero citations can never accrue citations. To overcome this problem, one can either consider the original publication as the first citation so that each paper starts with one citation or, alternatively, add a small constant to the number of citations (Price 1976). Either way, the outcome is that the target nodes are chosen in proportion to their in-degree plus this small positive constant. A derivation of the resulting in-degree distribution is given by Newman (2010). Denoting the average out-degree of a node by c and using a to denote the small positive constant, the in-degree distribution P(k) for large values of k has the power-law form $P(k) \sim k^{-\alpha}$, where α = 2 + a/c.
This simple model (although the derivation of the result is quite involved) is able to reproduce the empirical citation (in-degree) distribution for scientific papers with surprising accuracy given that the model only contains two parameters. It may seem odd that the model does not incorporate any notion of paper quality, which surely should
be an important driver of citations. Here it is important to notice that the model does not make any attempt to predict which paper becomes popular (although it can be shown, using the model, that papers published at the inception of a field have a much higher probability of becoming popular). Instead, the model incorporates the quality of papers implicitly, and indeed the number of citations to a paper is frequently seen as an indicator of its quality. Popular papers are also easily discovered, which further feeds their popularity. The idea of using popularity as a proxy for quality may extend to other areas where resources are scarce; for example, skilled surgeons are in high demand.

Preferential Attachment Model
The cumulative advantage model of Price (1976) was developed as a modification of the Polya urn model, which is used to model a sampling process where each draw from the urn, corresponding to a collection of different types of objects, changes the composition of the urn and thereby changes the probability of drawing an object of any type in the future. The standard Polya urn model consists of an urn containing some number of black and white balls, drawing a ball at random and then returning it to the urn along with a new ball of the same color (Feller 1966). Independently of Price, Barabasi and Albert introduced a similar model in 1999 (Barabasi and Albert 1999). They examined the degree distributions of an actor collaboration network (two actors are connected if they are cast in the same movie), the World Wide Web (two web pages are connected if there is a hyperlink from one page to the other), and a power grid (two elements, such as generators, transformers, or substations, are connected if there is a high-voltage transmission line between them), finding that they approximately followed power-law distributions. Although the actor collaboration network and the power grid network are defined much like a projection from a two-mode to a one-mode network, a subtle difference between them is that direct interaction between the nodes can be assumed. In other words, the nodes can be thought of as directly linked.

Both of the generic network models in existence at the time, the Erdős-Rényi and the Watts-Strogatz models, operated on a fixed set of $N$ vertices, and assumed that connections were placed or rewired without any regard to the degrees of the nodes to which they were connected. The model of Barabasi and Albert changed both of these aspects. First, they introduced the notion of network growth, such that at each time step a new node would be added to the network. Second, this new node would connect to the existing network with exactly $m$ nondirected edges, and the nodes it attached to were chosen in proportion to their degree. The probability for the incoming vertex to connect to vertex $i$ depends solely on its degree $k_i$ and is given by

$$\Pi(k_i) = \frac{k_i}{\sum_j k_j}.$$

The model was solved by Barabasi and Albert using rate equations, which are differential equations for the evolution of node degree over time where both degree and time, as an approximation, are treated as if they were continuous variables (Barabasi and Albert 1999; Barabasi et al. 1999). More general solutions were provided by Krapivsky et al., also using rate equations (Krapivsky et al. 2000), and by Dorogovtsev et al. using master equations which, like rate equations, are differential equations for the evolution of node degree, but which (correctly) treat degree as a discrete variable while still making the continuous approximation for time (Dorogovtsev et al. 2000). In the master equation approach, one writes down an equation for the evolution of the number of nodes of a given degree. Let us use $N_k(t)$ to denote the number of nodes of degree $k$ in the network at time $t$, where time is identified with network size, i.e., time $t$ corresponds to the network at the point of its evolution when it consists of $t$ nodes. (The nodes making up the seed network can usually be ignored in the limit as time increases.) The number $N_k(t)$ can change in two ways: it can either increase as an incoming node attaches itself to a node of degree $k - 1$ and thus turns it into a node of degree $k$, or it can decrease as an incoming node attaches itself to a node of degree $k$, turning it into a node of degree $k + 1$. The former situation leads to $N_k(t + 1) = N_k(t) + 1$
and the latter to $N_k(t + 1) = N_k(t) - 1$. Transitions larger than one, e.g., from $k$ to $k + 2$ or from $k$ to $k - 2$, are very unlikely and can be ignored. The value of $N_m(t)$ increases by one per time step as each incoming node has degree $m$, which also means there are no nodes with degree less than $m$, and hence the equations used to model the evolution of quantities like $N_k(t)$ are not valid for $k < m$. The resulting degree distribution has the form

$$P(k) = \frac{2m(m + 1)}{k(k + 1)(k + 2)},$$

which for large $k$ behaves as $P(k) \sim k^{-3}$.

The preferential attachment model of Barabasi and Albert has attracted a tremendous amount of scientific interest in the past few years, and consequently numerous modifications of the model have been introduced. For example, extensions of the model allow:

• Ties to appear and disappear between any pairs of vertices (the original formulation only considers the addition of ties between the incoming vertex and the set of vertices already in existence).
• Vertices to be deleted either uniformly at random or based on their connectivity.
• The attachment probability $\Pi(k_i)$ to be super-linear or sub-linear in degree, or to consist of several terms.
• Nodal attributes, such as the attractiveness (the propensity with which new ties form with the node) or fitness (the propensity with which established ties remain intact) of a node, to enter the attachment probability in addition to degree.
• Edges to assume weights instead of {0, 1} binary values to codify connection strength between any pair of elements.

In the context of physician networks, a preferential attachment model could be used to examine the process of new physicians seeking colleagues to ask for advice upon joining a medical organization, such as a hospital. Under the preferential attachment hypothesis, new physicians would be more likely to form ties with and thus seek advice from popular established physicians or physicians in the same cohort (e.g., medical school or residency program).

Social Network Models
The class of models known as network evolution models can be defined via three properties: (i) the models incorporate a set of stochastic attachment rules which determine the evolution of the network structure explicitly on a time-step-by-time-step basis; (ii) the network evolution starts from an empty network consisting of nodes only, or from a small seed network possessing arbitrary structure; and (iii) the models incorporate a stopping criterion, which for growing network models is typically in the form of the network size reaching a predetermined value, and for dynamical (nongrowing) network models the convergence of network statistics to their asymptotic values. Many network evolution models do not reference intrinsic properties or attributes of nodes, and in this sense they are similar to the various implementations of preferential attachment models that do not postulate node-specific fitness or attractiveness.

Most network evolution models that are intended to model social networks employ some variants of focal closure and cyclic closure (see, e.g., Kossinets and Watts (2006)). Focal closure refers to the formation of ties between individuals based on shared foci, which in a medical context could correspond to a group of doctors who practice in a particular hospital (the focus). The concept of shared foci in network science is analogous to homophily in social network analysis. More broadly, ties could represent any interest or activity that connects otherwise unlinked individuals. In contrast, cyclic closure refers to the idea of forming new ties by navigating and leveraging one's existing social ties, a process that results in a cycle in the underlying network. Because the network is nondirected, the term cycle is used interchangeably with closure. This differs from the case of a directed network, where a cycle is a specific form of closure, with transitivity being another form. Triadic closure, which is the
special case of cyclic closure involving just three individuals, refers to the process of getting to know friends of friends, leading to the formation of a closed triad in the nondirected network. Most social networks are expected to (i) have skewed and fat-tailed degree distributions, (ii) be assortatively mixed (high-degree individuals are connected to high-degree individuals), (iii) be highly clustered, (iv) possess the small-world property (average shortest path lengths are short, or more precisely, scale as $\log N$), and (v) exhibit community structure.

The models by Davidsen et al. (2002) and Marsili et al. (2004) exemplify dynamical (nongrowing) network evolution models for social networks. Both have a mechanism that starts by selecting a node $i$ in the network uniformly at random. In the model of Davidsen et al., if node $i$ has fewer than two connections, it is connected to a randomly chosen node in the network; otherwise two randomly chosen neighbors of node $i$ are connected together. In the model of Marsili et al., node $i$ (regardless of its degree) is connected with probability $\eta$ to a randomly chosen node in the network; then a second-order neighbor of node $i$, i.e., a friend's friend, is connected with probability $\xi$ to node $i$. The first mechanism in each model, the random connection, emulates focal closure, because there are no nodal attributes signifying shared interests. The point is that the formation of these connections is not driven by the structure of existing connections but, from the point of view of network structure, is purely random. The second mechanism, the notion of triadic closure, is implemented in slightly different ways across the models. If these mechanisms were applied indefinitely, the result would be a fully connected network. To avoid this outcome, the models also delete ties at a constant rate, which makes it possible for network statistics of interest to reach stationary distributions. In the model of Davidsen et al., tie deletion is accomplished by choosing a node in the network uniformly at random, and then removing all of its ties with some probability; Marsili et al. achieve the same effect by selecting a tie uniformly at random, and then deleting it with probability $\lambda$. Growing network evolution models, such as those by Vázquez (2003) and Toivonen et al. (2006), do not usually incorporate link deletion, but instead grow the network to a prespecified size, which obviates the need for link deletion.

Marsili et al. use extensive numerical simulations, as well as a master equation approach applied to a mean-field approximation of the model, to explore the impact of varying the probabilities $\eta$ (global linking), $\xi$ (neighborhood linking), and $\lambda$ (link deletion) on average degree and average clustering coefficient. Consider a situation where the value of $\xi$ (neighborhood linking) is increased while keeping the value of $\lambda$ (link deletion) fixed. At first, for small values of $\xi$, components with more than two nodes are rare, and the network can be said to be in the sparse phase. Upon increasing the value of $\xi$ up to a specific point, a large connected component emerges, and the value of the average degree suddenly jumps up. This point equals $\xi_2/\lambda$ and is known as the critical point; it marks the beginning of the dense phase in the phase diagram of the system. As $\xi$ is increased further, the network becomes more densely connected. Reversing the process by slowly decreasing the value of $\xi$ identifies a range of values $\xi_1 \le \xi \le \xi_2$ where the largest connected component remains densely connected and the average degree remains high. Only when the value of $\xi$ is decreased below a point denoted by $\xi_1$ does the network "collapse" and reenter the sparse phase. This phenomenon, which demonstrates some of the connections between network science and statistical physics, is typical of first-order or discontinuous phase transitions in statistical physics, and it demonstrates how hysteresis, the effect of the system remembering its past state, can arise in networked systems. Although Markov dependence is a special case of hysteresis, its use is generally restricted to probabilistic models whereas hysteresis is typically aligned with nonlinear models of physical phenomena having a continuous state-space. From the social network point of view this means that the network can remain in a connected phase even if the current rate of establishing new connections would not be sufficient for getting the network to that phase in the first place. In more practical terms, this
Fig. 6 Network structures produced by the model of Kumpula et al. by varying the reinforcement parameter as follows: (a) δ = 0, (b) δ = 0.1, (c) δ = 0.5, and (d) δ = 1. Figure adapted from Kumpula et al. (2007)
finding implies that it is possible to maintain a highly connected network with a relatively low "effort" (the $\xi$ parameter in the model) once the network has been established, but that same low level of effort would not be sufficient for establishing the dense phase of network evolution in the first place. (The analogy in social network analysis is that the threshold for forming, e.g., a friendship is greater than that needed for it to remain intact.)

The model by Kumpula et al. (2007), which is another dynamical (nongrowing) network evolution model for social networks, implements cyclic closure and focal closure (see Fig. 6) in a manner similar to the models of Davidsen et al. and Marsili et al., but introduces a minor modification. Unlike the previous models, which produce binary networks with $A_{ij} \in \{0, 1\}$, this model produces weighted networks with $A_{ij} \ge 0$. The main modification deals with the triadic closure
step, which here is implemented as a weighted two-step random walk. Starting from a randomly chosen node $i$, this node chooses one of its neighbors $j$ with probability $w_{ij}/s_i$, where $s_i = \sum_j w_{ij}$ is the strength of node $i$, i.e., the sum of the edge weights connecting it to its neighbors. If node $j$ has neighbors other than $i$, such a node $k$ will be chosen with probability $w_{jk}/(s_j - w_{ij})$, where there is a requirement that $k \ne i$. The weights $w_{ij}$ and $w_{jk}$ on the edges just traversed will be increased by a value $\delta$. In addition, if there is a link connecting node $i$ and node $k$, the weight $w_{ik}$ on that link is similarly increased by $\delta$; otherwise a new link is established between nodes $i$ and $k$ with $w_{ik} = 1$. When $\delta = 0$, there is no clear community structure present, but as the value of $\delta$ is increased, very clear nucleation of communities takes place. This phenomenon occurs when $\delta > 0$ because a type of positive feedback or memory gets imprinted on the network, which reinforces existing connections and makes future traversal of those connections more likely. This is not unlike the models of cumulative advantage or preferential attachment discussed above, but now applies to individual links as opposed to nodes. If one inspects the community structure produced by the model, most of the strong links appear to be located within communities, whereas links between communities are typically weak. This type of structural organization is compliant with the so-called weak ties hypothesis, formulated in Granovetter (1973), which states, in essence, that the stronger the tie connecting two individuals, the higher the fraction of friends they have in common. Onnela et al. showed that a large-scale social network constructed from the cell phone communication records of millions of people was in remarkable agreement with the hypothesis: only the top 5% of ties in terms of their weight deviated noticeably from the prediction. The networks produced by the model of Kumpula et al. are clearly reminiscent of observed real-world social networks, and the inclusion of the tuning parameter $\delta$ makes it straightforward to create networks with sparser or denser communities. The downside is that the addition of weights to the model appears to make it analytically intractable.

Nodal attribute models, in stark contrast to network evolution models, specify nodal attributes for each node, which could be scalar or vector valued. The probability of linkage between any two nodes is typically an increasing function of the similarity of the nodal attributes of the two nodes in consideration. This is compatible with the notion of homophily, the tendency for like to attract like. Nodal attribute models can also be interpreted as spatial models, where the idea is that each node has a specific location in a social space. The models by Boguñá et al. (2004) and Wong et al. (2006) serve as interesting examples. Nodal attribute models do not specify attachment rules at the level of the network, and in some sense can be seen as latent variable models for social network formation. These types of models have been studied less in the network science literature than network evolution models.

Clearly, nodal attribute models have a strong resemblance to models developed and studied in the social network literature that treat dyads as independent conditional on observed attributes of the individuals, other covariates, and various latent variables (individual-specific random effects in the case of the p2 model, categorical latent variables in the case of latent class models, and continuous latent variables under the latent-space and latent eigenmodels in section "Latent Independence Approaches"). Unlike in network science, work on such models in the social network literature has been more prominent than work on network evolution. A difference in the approach of some nodal attribute models and social network models is that the former may use specific rules for determining whether a tie is expected, such as a threshold function (in a sense emulating formal decision making), whereas the latter rewards values of parameters that make the model most consistent with the observed network(s).

Network Communities

Many network characteristics are either microscopic or macroscopic in nature; the value of a microscopic characteristic depends on local network structure only, whereas the value of a
macroscopic characteristic depends on the structure of the entire network. Node degree is an example of a microscopic quantity: the degree of a node depends only on the number of its connections. In contrast, network diameter, the longest of all pairwise shortest paths in the network, can change dramatically by the addition (or removal) of even a very small number of links anywhere in the network. For example, a $k$-cycle consists of $k$ nodes connected by $k$ links such that a cycle is formed with each node connected to precisely two nodes. The diameter of such a network is $\lfloor k/2 \rfloor$, where the floor function $\lfloor x \rfloor$ maps a real number $x$ to the largest integer not exceeding $x$, such that for an even $k$ it follows that $\lfloor k/2 \rfloor = k/2$. For large values of $k$, adding just a few links quickly brings down the value of network diameter. There is a third, intermediate scale that lies between the microscopic and macroscopic scales, which is often known as the mesoscopic scale. For example, a $k$-clique could justifiably be called a mesoscopic object (especially if $k$ is large). Another type of mesoscopic structure is that of a network community, which can be loosely defined as a set of nodes that are densely connected to each other but sparsely connected to other nodes in the network (but not to the extent of resulting in distinct components).

There has been considerable interest, especially in the physics literature, in how to define and detect such communities, and several review papers cover the existing methods (Porter et al. 2009; Fortunato 2010; Newman 2012). The motivation behind many of these efforts is the idea that communities may correspond to functional units in networks, such as unobserved societal structures. The examples range from metabolic circuits within cells (Guimera and Nunes Amaral 2005) to tightly knit groups of individuals in social networks (Newman and Girvan 2004; Traud et al. 2012). The interested reader can consult the review articles on community detection methods (Porter et al. 2009; Fortunato 2010; Newman 2012) for more details. Another application is health care where, for instance, Landon et al. (2012) have deduced communities of physicians based on network ties representing them treating the same patients within the same period of time. The clustering of physicians in communities is shown for one particular Hospital Referral Region (a health care market encompassing at least one major city where both cardiovascular surgical procedures and neurosurgery are performed) in the United States (Fig. 7).

Fig. 7 Communities in a patient-sharing network of physicians. Each vertex corresponds to a physician, and a pair of physicians are connected with a tie if they share patients. The community assignment of each physician is indicated by the node color. In this case the "green" and "orange" communities are fairly distinct

One potential application of network science methods for community detection is in the area of health education and disease prevention (e.g., screening). Due to limited resources, it may not be possible to send materials or otherwise directly educate every member of the population. The partition of individuals into groups would facilitate a possibly more efficient approach whereby the communities are first studied to identify key individuals. Then a few key individuals in each community are trained and advised on mechanisms for helping the intervention to diffuse across the community. Interventions for which such an approach might be useful are generally those where intensive training is required for the intervention to be effective and where delegation of resources through passing on knowledge or advice is possible.
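The k-cycle example given earlier in this section can be checked numerically. The following is a minimal sketch in plain Python, not taken from the chapter: the helper names (`cycle_graph`, `diameter`, `add_link`), the choice k = 100, and the positions of the two added links are all illustrative assumptions. It computes the diameter by breadth-first search before and after two "shortcut" links are added to the ring.

```python
from collections import deque


def cycle_graph(k):
    """Adjacency sets for a k-cycle on nodes 0..k-1 (assumes k >= 3)."""
    return {i: {(i - 1) % k, (i + 1) % k} for i in range(k)}


def eccentricity(adj, source):
    """Longest shortest-path distance from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def diameter(adj):
    """Longest of all pairwise shortest paths (graph assumed connected)."""
    return max(eccentricity(adj, s) for s in adj)


def add_link(adj, u, v):
    """Add a nondirected link between u and v."""
    adj[u].add(v)
    adj[v].add(u)


k = 100  # illustrative ring size
ring = cycle_graph(k)
d_ring = diameter(ring)  # equals k // 2 for a k-cycle

# Two hypothetical shortcut links across the ring.
add_link(ring, 0, 50)
add_link(ring, 25, 75)
d_shortcut = diameter(ring)

print(d_ring, d_shortcut)
```

Only two links out of a possible several thousand are added, yet the diameter drops, which is the sense in which diameter is a macroscopic rather than a microscopic quantity.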
Modularity Maximization

A number of network community detection methods define communities implicitly via an appropriately chosen quality function. The underlying idea is that a given network can be partitioned in a large number of ways, where a partition divides the nodes into subsets such that each node belongs to exactly one subset, and each such partition $P$ has a scalar-valued quality measure associated with it, denoted by $Q(P)$. In principle one would like to enumerate all possible partitions and compute the value of $Q$ for each of them, and the network communities would then be identified as the partition (or possibly partitions) with the highest quality. In practice, however, the number of possible partitions is exceedingly large even for relatively small networks, and therefore heuristics are needed to optimize the value of $Q$. Community detection methods based on quality function optimization therefore have two distinct components, which are the functional form of the quality function $Q$, and the heuristic used for navigating a subset of partitions over which $Q$ is maximized.

The most commonly used optimization-based approach to community detection is modularity maximization, where modularity is one possible choice for the quality function $Q$; in statistical terminology, modularity maximization would be regarded as a nonparametric procedure because it relies on no distributional or functional-form assumptions. There are many variants of modularity, but here the focus is on the original formulation by Newman and Girvan (Newman and Girvan 2003, 2004; Newman 2006). Modularity can be seen as a measure that characterizes the extent of homophily or assortative mixing by class membership, and one way to derive it is by considering the observed and expected numbers of connections between vertices of given classes, where the class of vertex $i$ is given by $c_i$. The following derivation follows closely that of Newman (2010), although other derivations, based, for example, on dynamic processes, are also available.

We start by considering the observed number of edges between vertices of the same class, which is given by $\frac{1}{2} \sum_{i,j} A_{ij}\, \delta(c_i, c_j)$, where $\delta(\cdot\,, \cdot)$ is the Kronecker delta, and the factor 1/2 prevents double-counting vertex pairs. To obtain the expected number of edges between vertices of the same class, cut every edge in half, resulting in two stubs per edge, and then connect these stubs at random. For a network with $m$ edges, there are a total of $2m$ such stubs. Consider one of the $k_i$ stubs connected to vertex $i$. This particular stub will be connected at random to vertex $j$ of degree $k_j$ with probability $k_j/2m$, and since vertex $i$ has $k_i$ such stubs, the expected number of edges between vertices $i$ and $j$ is $k_i k_j/2m$. The expected number of edges falling between vertices of the same class is now $\frac{1}{2} \sum_{i,j} \frac{k_i k_j}{2m}\, \delta(c_i, c_j)$. The difference between the observed and expected number of within-class ties is therefore $\frac{1}{2} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)$. Given that the number of edges varies from one network to the next, it is convenient to deal with the fraction of edges as opposed to the number of edges, which is easily obtained by dividing the expression by $m$, resulting in

$$Q_M(P) = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j).$$

The assignment $P$ of nodes into classes that maximizes modularity $Q_M(P)$ is taken as the optimal partition and identifies the assignment of nodes into network communities. Note that modularity can be easily generalized from binary networks to weighted networks, in which case $k_i$ stands for the strength (sum of all adjacent edge weights) of node $i$, and $m$ is the total weight of the edges in the network.

The expression for modularity has an interesting connection to spin models in statistical physics. In a so-called infinite range $q$-state Potts model, each of the $N$ particles can be in one of $q$ states called spins, and the interaction energy between particles $i$ and $j$ is $-J_{ij}$ if they are in the same state and zero if they are in different states. The energy function of the system, known as its Hamiltonian, is given by the sum over all of the pairwise interaction energies in the system

$$H(\{\sigma\}) = -\sum_{i,j} J_{ij}\, \delta(\sigma_i, \sigma_j),$$
Fig. 8 Schematic of a multislice network. Each slice represents a network encoded by the adjacency tensor $A_{ijs}$, where subscripts $i$ and $j$ are used to index the nodes and subscript $s$ is used to index the slices. Each node is coupled to itself in the other slices, and the structure of this coupling, encoded by the $C_{jsr}$ tensor, depends on whether the slices correspond to snapshots taken at different times (time-dependent network), to communities detected at different resolution levels (multiscale network), or to a network consisting of multiple types of interactions (multiplex network). For time-dependent and multiscale networks, the slice-to-slice coupling extends for each node a tie to itself across neighboring slices only, as exemplified for the node in the upper right corner of the slices; for multiplex networks, the slice-to-slice coupling extends a tie from each node to itself in all the slices, as exemplified for the node in the lower left corner. Whatever the form of this coupling, it is applied the same way to each node, although for visual clarity the slice-to-slice couplings are shown just for two nodes. Figure adapted from Mucha et al. (2010)
where $\sigma_l$ indicates the spin of particle $l$ and $\{\sigma\}$ denotes the configuration of all $N$ spins. Finding the minimum energy state (the ground state) of the system corresponds to finding $\{\sigma\}$ such that $H(\{\sigma\})$ is minimized. The states of the particles (spins) correspond to community assignments of nodes in the network problem, and minimizing $H(\{\sigma\})$ is mathematically identical to maximizing modularity $Q_M(P)$. In the physical system, depending on the interaction energies, the spins seek to align with other spins (interact ferromagnetically) or they seek to have different orientations (interact antiferromagnetically). In the community detection problem, two nodes seek to be in the same community if they are connected by an edge that is stronger than expected; otherwise they seek to be in different communities. This correspondence between the two problems has enabled computational techniques developed for the study of spin systems and other physical systems to be applied to modularity optimization and, more broadly, to the optimization of other quality functions. Simulated annealing, greedy algorithms, and spectral methods serve as examples of these methods. More details and references are available in community detection review articles (Porter et al. 2009; Fortunato 2010).

Although there are several extensions of modularity maximization, only one such generalization is described here. Mucha et al. developed a generalized framework of network quality functions that allows the study of community structure of arbitrary multislice networks (see Fig. 8), which are combinations of individual networks coupled through links that connect each node in one slice to the same node in other slices (Mucha et al. 2010). This framework allows studies of community structure in time-dependent, multiscale, and multiplex networks. Much of the work
in the area of community detection is motivated was interested in examining the community struc-
by the observation that the behavior of dynamical ture of the students at three different scales using,
processes on networks is driven or constrained by say, γ s {0.5, 1, 2}, this would result in a three-
their community structure. The approach of fold replication of the 4 2 slice array with each
Mucha et al. is based on a reversal of this logic, of the three layers having a distinct value for γ s.
and it introduces a dynamical process on the network, and the behavior of the dynamical process is used to identify the (structural) communities. The outcome is a quality function

$$Q_{MS}(P) = \frac{1}{2\mu} \sum_{i,j,s,r} \left[ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + C_{jsr} \right] \delta(c_{is}, c_{jr}),$$

where $A_{ijs}$ encodes the node-to-node couplings within slices and $C_{jsr}$ encodes the node-to-node couplings across slices that are usually set to a uniform value $\omega$; $m_s$ is the number (or weight) of ties within slice $s$ and $\mu$ is the weight of all ties in the network, both those located within slices and those placed across slices; $\gamma_s$ is a resolution parameter that controls the scale of community structure separately for each slice $s$. The standard modularity quality function uses $c_i$ to denote the community assignment of node $i$, but in the multislice context two indices are needed, giving rise to the $c_{is}$ terms, where the subscript $i$ is used to index the node in question and the subscript $s$ to index the slice. The outcome of maximizing $Q_{MS}$, which can be done with the same heuristics as maximization of the standard modularity $Q_M$, is a matrix C that consists of the community assignments $c_{is}$ of each node in every slice.

The multislice framework can handle any combination of time-dependent, multiscale, and multiplex networks. For example, the slices in Fig. 8 could correspond, say, to a longitudinal friendship network of a cohort of college students, each slice capturing the offline friendships of the students in each year. If data on the online friendships of the students were also available, corresponding to a different type of friendship, one could then introduce a second stack of four slices encoding those friendships. The four offline slices and the four online slices form a multiplex system, and they would be coupled accordingly. One could further introduce multiple resolution scales, and if one used three such scales, then, taken together, this would lead to a three-dimensional 4 × 2 × 3 array of slices.

Clique Percolation
Cliques are (usually small) fully connected subgraphs, and a nondirected k-clique is a complete subgraph consisting of k nodes connected with k(k − 1)/2 links. In materials science the term percolation refers to the movement of fluid through porous materials. However, in mathematics and statistical physics, the field of percolation theory considers the properties of clusters on regular lattices or random networks, where each edge may be either open or closed, and the clusters correspond to groups of adjacent nodes that are connected by open edges. The system is said to percolate in the limit of infinite system size if the largest component, held together by open edges, occupies a finite fraction of the nodes. The method of k-clique percolation in Palla et al. (2005) combines cliques and percolation theory, and it relies on the empirical observation that network communities seem to consist of several small cliques that share many of their nodes with other cliques in the same community. In this framework, cliques can be thought of as the building blocks of communities. A k-clique community is then defined as the union of all adjacent k-cliques, where two k-cliques are defined to be adjacent if they share k − 1 nodes. One can also think about "rolling" a k-clique template from any k-clique in the graph to any adjacent k-clique by relocating one of its nodes and keeping the other k − 1 nodes fixed. A community, defined through the percolation of such a template, then consists of the union of all subgraphs that can be fully explored by rolling a k-clique template. As k becomes larger, the notion of a community becomes more stringent, and values of k = 3, ..., 6 tend to be most appropriate because larger values become unwieldy. The special case of k = 2 reduces to bond (link) percolation and k = 1 reduces to site (node) percolation.
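The adjacency rule for k-cliques can be illustrated with a small sketch. The code below is not the implementation of Palla et al. (2005); it is a minimal brute-force version, assuming a small undirected graph given as an edge list. The function name `k_clique_communities` and the toy edge list are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force k-clique percolation for a small undirected graph.

    Illustrative sketch only: it enumerates every k-subset of nodes,
    so it is practical only for toy networks.
    """
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")

    # Build adjacency sets, ignoring self-ties.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # A k-clique is a set of k nodes in which every pair is linked.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge cliques that share k - 1 nodes.
    parent = list(range(len(cliques)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(range(len(cliques)), 2):
        if len(cliques[a] & cliques[b]) == k - 1:
            parent[find(a)] = find(b)

    # A community is the union of the nodes of all merged cliques.
    groups = {}
    for idx, clique in enumerate(cliques):
        groups.setdefault(find(idx), set()).update(clique)
    return sorted(groups.values(), key=sorted)


# Hypothetical toy network: two triangles sharing the edge 2-3,
# a third triangle attached through node 4, and a pendant node 7.
toy_edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
             (4, 5), (4, 6), (5, 6), (6, 7)]
print([sorted(c) for c in k_clique_communities(toy_edges, 3)])
# prints [[1, 2, 3, 4], [4, 5, 6]]
```

In this toy example node 4 belongs to both communities, and node 7 belongs to none because it is never part of a triangle. With k = 2 the same function returns the connected components of the graph.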
The k-clique percolation algorithm is an example of a local community-finding method. One obtains a network’s global community structure by considering the ensemble of communities obtained by looping over all of its k-cliques. Some nodes might not belong to any community (because they are never part of any k-clique), and others can belong to several communities (if they are located at the interface between two or more communities). The nested nature of communities is recovered by considering different values of k, although k-clique percolation can be too rigid because focusing on cliques typically causes one to overlook other dense modules that are not quite as tightly connected.

The advantage of k-clique percolation is that it provides a successful way to consider community overlap. Allowing the detection of network communities that overlap is especially appealing in the social sciences, as people may belong simultaneously to several communities (colleagues, family, friends, etc.). However, the case can be made that it is the underlying interactions that are different, and one should not combine interactions that are of fundamentally different types. In statistics, this is analogous to using composite variables or scales that combine multiple items in, e.g., health surveys or questionnaires. If the nature of the interactions is known, the system might be more appropriately described as a multiplex network, where one tie type encodes professional interactions, another tie type corresponds to personal friendships, and a third tie type captures family memberships. The multislice framework discussed above is able to accommodate memberships in multiple communities as long as distinct interaction types are encoded with distinct (multiplex) ties.

Comparison to Social Network Approaches to “Community Detection”
The latent class models in section “Latent Independence Approaches” partition the actors in a network into disjoint groups that can be thought of as communities. The clustering process can be thought of as a search for structural equivalence in that individuals are likely to be included in the same community if the network around them is similar to that of their neighbors. The criterion for judging the efficacy of the partition of nodes into communities is embedded in the statistical model implied for the network and as such is a balance between all of the terms in the model. This contrasts with a nonmodel-based objective function such as modularity, which focuses on maximizing in some sense the ratio of density of ties within and between communities. To illustrate the difference, consider a k-star. The greater the value of k, the greater the discrepancy in the degree of the actors. Therefore, if k-stars occur frequently, the members of the same k-star are likely to be included in the same group by the latent class model but, due to the difference in degree, are unlikely to be grouped under modularity maximization. However, an advantage of the network science approach is that results are likely to be more robust to model misspecifications than under the social network approach.

In the future it is possible to imagine a bridging of the two approaches to community detection. For example, a model for the network, or the component of the model involving the key determinants of network ties, could be incorporated in the modularity function in (7.1). Depending on the specification, the result might be a weighted version of modularity in which a higher penalty is incurred if individuals with similar traits (or in structurally equivalent positions with respect to k-stars, triadic closure, or other local network configurations) are included in different communities than if individuals with different traits are in different communities. However, to the best of the authors’ knowledge, such a procedure is not available.

Part IV: Discussion and Glossary

In this chapter, the dual fields of social networks and network science have been described, with particular focus on sociocentric data. Both fields are growing rapidly in methodological results and the breadth of applications to which they are applied.

In health applications, social network methods for evaluating whether individuals’ attributes
spread from person-to-person across a population (social influence) and for modeling relationship or tie status (social selection) have been described. Models of relationship status have not been applied as frequently in health applications, where focus often centers on the patient. However, Keating et al. (2007) is a notable exception. Due to the ever-growing availability of data, the interest in peer effects, and the need to design support mechanisms, the role of social network analysis in health care and medicine is likely to undergo continued growth in the future.

A novel feature of this chapter is the attention given to network science. Although network science is descriptively inclined and thus is removed from mainstream translational medical research seeking to identify causes of medical outcomes, the increasing availability of complex systems data provides an opportunity for network science to play a more prominent role in medical research in the future. For example, Barabasi and others have created a Human Disease Network by connecting all hereditary diseases that share a disease-causing gene (Goh et al. 2007). In other work, they created a Phenotypic Disease Network (PDN) as a map summarizing phenotypic connections between diseases (Hidalgo et al. 2009). These networks provided important insights into the potential common origins of different diseases, whether diseases progress through cellular functions (phenotypes) associated with a single diseased (mutated) gene or with other phenotypes, and whether patients affected by diseases that are connected to many other diseases tend to die sooner than those affected by less connected diseases. Such work has the potential to provide insights into many previously untested hypotheses about disease mechanisms.

For example, they may ultimately be helpful in designing “personalized treatments” based on the network position held by an individual’s combined genetic, proteomic, and phenotypic information. In addition, they may suggest conditions for which treatments found to be effective on another condition might also be tried.

There are several important topics that have not been discussed, notably including network sampling. In gathering network data, adaptive methods such as link-tracing designs are often used to identify individuals more likely to know each other and thus to have formed a relationship with other sampled individuals than in a random-probability design. Link-tracing and other related designs are often used to identify hard-to-reach populations (Thompson and Seber 1996; Thompson and Frank 2000; Thompson 2006). However, the sampling probabilities corresponding to link-tracing designs may be difficult to evaluate (generally requiring the use of simulation), and it may not be obvious how they should be incorporated in the analysis. The development of statistical methods that account for the sample design in the analysis of social network data has lagged behind the designs themselves. However, recently progress has been made on statistical inference for sampled relational network data (Handcock et al. 2010).

In the future it is likely that more bridges will form between the social network and the network science fields with models or methods developed in one field used to solve problems in the other. Furthermore, as these two fields become more entwined, it is likely that they will also become more prominent in the solution to important problems in medicine and health care.

Acknowledgments The time and effort of Dr. O’Malley and Dr. Onnela on researching and developing this chapter was supported by NIH/NIA grant P01 AG031093 and Robert Wood Johnson Award #58729. The authors thank Mischa Haider, Brian Neelon, and Bruce E Landon for reviewing an early draft of the manuscript and providing several useful comments and suggestions.

Glossary of Terms

To help readers familiar with social networks understand the network science component of the chapter and conversely for readers familiar with network science to understand the social network component, the following glossary contains a comprehensive list of terms and definitions.
Terms Used in Social Networks

1. Social network: A collection of individuals (referred to as actors) and the (social) relationships or ties linking them.
2. Relationship, Tie: A link or connection between two actors.
3. Dyad: A pair of actors in a network and the relationship(s) between them, two relationships per measure for a directed network, one relationship per measure for an undirected network.
4. Triad: A set of three actors in the network and the relationships between them.
5. Scale or valued relationship: A nonbinary relationship between two actors (e.g., the level of a trait). We focused on binary relationships in the chapter.
6. Directed network: A network in which the relationship from actor i to actor j need not be the same as that from actor j to actor i.
7. Nondirected network: A network in which the state of the relationship from actor i to actor j equals the state of the relationship from actor j to actor i.
8. Sociocentric network data: The complete set of observations on the n(n − 1) relationships in a directed network, or n(n − 1)/2 relationships in an undirected network, with n actors.
9. Collaboration network: A network whose ties represent the actors’ joint involvement on a task (e.g., work on a paper) or a common experience (e.g., treating the same episode of health care for a patient).
10. Bipartite: Relationships are only permitted between actors of two different types.
11. Unipartite: Relationships are permitted between all types of actors.
12. Social contagion, Social influence, Peer effects: Terms used to describe the phenomenon whereby an actor’s trait changes due to their relationship with other actors and the traits of those actors.
13. Mutable trait: A characteristic of an actor that can change state.
14. Social selection: The phenomenon whereby the relationship status between two actors depends on their characteristics, as occurs with homophily and heterophily.
15. Homophily: A preference for relationships with actors who have similar characteristics. Popularly referred to as “birds of a feather flock together.”
16. Heterophily: A preference for relationships with actors who have different characteristics. Popularly referred to as “opposites attracting.”
17. In-degree, Popularity: The number of actors who initiated a tie with the given actor.
18. Out-degree, Expansiveness, Activity: The number of ties the given actor initiates with other actors.
19. k-star: A subnetwork in which the focal actor has ties to k other actors.
20. k-cycle: A subnetwork in which each actor has degree 2 that can be arranged as a ring (i.e., a k-path through the actors returns to its origin without backtracking). For example, the ties A-B, B-C, and C-A form a three-cycle.
21. k degrees of separation: Two individuals linked by a k-path (k − 1 intermediary actors) that are not connected by any path of length k − 1 or less.
22. Density: The overall tendency of ties to form in the network. A descriptive measure is given by the number of ties in the network divided by the total number of possible ties.
23. Reciprocity: The phenomenon whereby an actor i is more likely to have a tie with actor j if actor j has a tie with actor i. Only defined for directed networks.
24. Clustering: The tendency of ties to cluster and form densely connected regions of the network.
25. Closure: The tendency for network configurations to be closed.
26. Transitivity: The tendency for a tie from individual A to individual B to form if ties from individual A to individual C and from individual C to individual B exist. A form of triadic closure commonly stated as “a friend of a friend is a friend.” Reduces to general triadic closure in an undirected network.
27. Centrality: A dimensionless measure of an actor’s position in the network. Higher values
indicate more central positions. There are numerous measures of centrality. Four common ones are degree, closeness, betweenness, and eigenvalue centrality. Degree and eigenvalue centrality are extremes in that degree centrality is determined solely from an actor’s degree (it is internally focused) while eigenvalue centrality is based on the centrality of the actors connected to the focal actor (it is externally focused).
28. Structural balance: A theory which suggests actors seek balance in their relationships; for example, if A likes B and B likes C then A will endeavor to like C as well to keep the system balanced. Thus, the existence of transitivity is implied by structural balance.
29. Structural equivalence: The network configuration (arrangement of ties) around one actor is similar to that of another actor. Even though actors may not be connected, they can still be in structurally similar situations.
30. Structural power: An actor in a dominant position in the network. Such an actor may be one in a strategic position, such as the only bridge between otherwise distinct components.
31. Network component: A subset of actors having no ties external to themselves.
32. Graph theory: The mathematical basis under which theoretical results for networks are derived and empirical computations are performed.
33. Digraph: A graph in which edges can be bidirectional. Unlike social networks, digraphs can contain self-ties. Graphs lie in two-dimensional space.
34. Hypergraph: A graph in dimension three or higher.
35. Maximal subset: A set of actors for whom all ties are intact in a binary network (i.e., has density 1.0). If the set contains k actors, the maximal subset is referred to as a k-clique.
36. Scalar, vector, matrix: Terms from linear and abstract algebra. A scalar is a 1 × 1 matrix, a vector is a k × 1 matrix, and a matrix is k × p, where k, p > 1.
37. Adjacency matrix: A matrix whose off-diagonal elements contain the value of the relationship from one actor to another. For example, element ij contains the relationship from actor i to actor j. The diagonal elements are zero by definition.
38. Matrix transpose: The operation whereby element ij is exchanged with element ji for all i, j.
39. Row stochastic matrix: A matrix whose rows sum to 1 and contain nonnegative elements. Thus, each row represents a probability distribution of a discrete-valued random variable.
40. Random variable: A variable whose value is not known with certainty. It can relate to an event or time period that is yet to occur, or it can be a quantity whose value is fixed (i.e., has occurred) but is unknown.
41. Parametric: A term used in statistics to describe a model with a specific functional form (e.g., linear, quadratic, logarithmic, exponential) indexed by unknown parameters or an estimation procedure that relies on specification of the complete distribution of the data.
42. Nonparametric: A model or estimation procedure that makes no assumption about the specific form of the relationship between key variables (e.g., whether the predictors have linear or additive effects on the outcome) and does not rely upon complete specification of the distribution of the data for estimation.
43. Outcome, Dependent variable: The variable considered causally dependent on other variables of interest. This will typically be a variable whose value is believed to be caused by other variables.
44. Independent, Predictor, Explanatory variable, Covariate: A variable believed to be a cause of the outcome.
45. Contextual variable: A variable evaluated on the neighbors of, or other members of a set containing, the focal actor. For example, the proportion of females in a neighboring county, the proportion of friends with college degrees.
46. Interaction effect: The extent to which the effect of one variable on the outcome varies across the levels of another variable.
47. Endogenous variable: A variable (or an effect) that is internal to a system.
Predictors in a regression model that are correlated with the unobserved error are endogenous; they are determined by an internal as opposed to an external process. By definition outcome variables are endogenous.
48. Exogenous variable: A variable (or an effect) that is external to the system in that its value is not determined by other variables in the system. Predictors that are independent of the error term in a regression model are exogenous.
49. Instrumental variable (IV): A variable with a non-null effect on the endogenous predictor whose causal effect is of interest (the “treatment”) that has no effect on the outcome other than that through its effect on treatment. Often-used sufficient conditions for the latter are that the IV is (i) marginally independent of any unmeasured confounders and (ii) conditionally independent of the outcome given the treatment and any unmeasured confounders. In an IV analysis a set of observed predictors may be conditioned on as long as they are not effects of the treatment and the IV assumptions hold conditional on them. While subject to controversy, IV methods are one of the only methods of estimating the true (causal) effect of an endogenous predictor on an outcome.
50. Linear regression model: A model in which the expected value of the outcome (or dependent variable) conditional on one or more predictors (or explanatory variables) is a linear combination of the predictors (an additive sum of the predictors multiplied by their regression coefficients) and an unobserved random error.
51. Longitudinal model: A model that describes variation in the outcome variable over time as a function of the predictors, which may include prior (i.e., lagged) values of the outcome. Observations are typically only available at specific, but not necessarily equally spaced, times. Longitudinal models make the direction of causality explicit. Therefore, they can distinguish between the association between the predictors and the outcome and the effect of a change in the predictor on the change in the outcome.
52. Cross-sectional model: A model of the relationship between the values of the predictors and outcomes at a given time. Because one cannot discern the direction of causality, cross-sectional models are more difficult to defend as causal.
53. Stochastic block model: A conditional dyadic independence model in which the density and reciprocity effects differ between blocks defined by attributes of the actors comprising the network. For example, blocks for gender accommodate different levels of connectedness and reciprocity for men and women.
54. Logistic regression: A member of the exponential family of models that is specific to binary outcomes. It utilizes a link function that maps expected values of the outcome onto an unrestricted scale to ensure that all predictions from the model are well-defined.
55. Multinomial distribution: A generalization of the binomial distribution to three or more categories. The sum of the probabilities of each category equals 1.
56. Exponential random graph model: A model in which the state of the entire network is the dependent variable. Provides a flexible approach to accounting for various forms of dependence in the network. Not amenable to causal modeling.
57. Degeneracy: An estimation problem encountered with exponential random graph models in which the fitted model might reproduce observed features of the network on average but each actual draw bears no resemblance to the observed network. Often degenerate draws are empty or complete graphs.
58. Latent distance model: A model in which the statuses of dyads are independent conditional on the positions of the actors, and thus the distance between them, in a latent social space.
59. Latent eigenmodel: A model in which the statuses of dyads are independent conditional on the product of the (weighted) latent positions of the actors in the dyad.
60. Latent variable: An unobserved random variable. Random effects and pure error terms are latent variables.
61. Latent class: An unobserved categorical random variable. Actors with the same value of the variable are considered to be in the same latent class.
62. Factor analysis: A statistical technique used to decompose the correlation (or covariance) matrix of a set of random variables into groups of related items.
63. Generalized estimating equation (GEE): A statistical method that corrects estimation errors for dependent observations without necessarily modeling the form of the dependence or specifying the full distribution of the data.
64. Random effect: A parameter for the effect of a unit (or cluster) that is drawn from a specified probability distribution. Treating the unit effects as random draws from a common probability distribution allows information to be pooled across units for the estimation of each unit-specific parameter.
65. Fixed effect: A parameter in a model that reflects the effect of an actor belonging to a given unit (or cluster). By virtue of modeling the unit effects as unrelated parameters, no information is shared between units and so estimates are based only on information within the unit.
66. Ordinary least squares: A commonly used method for estimating the parameters of a regression model. The objective function is to minimize the squared distance of the fitted model to the observed values of the dependent variable.
67. Maximum likelihood: A method of estimating the parameters of a statistical model that typically embodies parametric assumptions. The procedure is to seek the values of the parameters that maximize the likelihood function of the data.
68. Likelihood function: An expression that quantifies the total information in the data as a function of model parameters.
69. Markov chain Monte Carlo: A numerical procedure used to fit Bayesian statistical models.
70. Steady state: The state-space distribution of a Markov chain describes the long-run proportion of time the random variable being modeled is in each state. Often Markov chains iterate through a transient phase in which the current state of the chain depends less and less on the initial state of the chain. The steady state phase occurs when successive samples have the same distribution (i.e., there is no dependence on the initial state).
71. Collinearity: The correlation between two predictors after conditioning on the other observed predictors (if any). When predictors are collinear, distinguishing their effects is difficult, and the statistical properties of the estimated effects are more sensitive to the validity of the model.
72. Normal distribution: Another name for the Gaussian distribution. Has a bell-shaped probability density function.
73. Covariance matrix: A matrix in which the ijth element contains the covariance of items i and j.
74. Absolute or Geodesic distance: The total distance along the edges of the network from one actor to another.
75. Cartesian distance: The distance between two points on a two-dimensional surface or grid. Adheres to Pythagoras’ theorem.
76. Count data: Observations made on a variable with the whole numbers (0, 1, 2, ...) as its state space.
77. Statistical inference: The process of establishing the level of certainty of knowledge about unknown parameters (or hypotheses) from data subject to random variation, such as when observations are measured imperfectly with no systematic bias or a sample from a population of interest is used to estimate population parameters.
78. Null model: The model of a network statistic typically represents what would be expected if the feature of interest was nonexistent (effect equal to 0) or outside the range of interest.
79. Permutation test: A statistical test of a null hypothesis against an alternative implemented by randomly reshuffling the labels (i.e., the
subscripts) of the observations. The significance level of the test is evaluated by resampling the observed data 50–100 times and computing the proportion of times that the test is rejected.

Terms Used in Network Science

1. Network science: The approach developed from 1995 onwards mostly within statistical physics and applied mathematics to study networked systems across many domains (e.g., physical, biological, social, etc.). Usually focuses on very large systems; hence, theoretical results derived in the thermodynamic limit are good approximations to real-world systems.
2. Thermodynamic limit: In statistical physics refers to the limit obtained for any quantity of interest as system size N tends to infinity. Many analytical results within network science are derived in this limit due to analytical tractability.
3. Statistical physics: The branch of physics dealing with many-body systems where the particles in the system obey a fixed set of rules, such as Newtonian mechanics, quantum mechanics, or any other rule set. As the number of bodies (particles) in a system grows, it becomes increasingly difficult (and less informative) to write down the equations of motion, a set of differential equations that govern the motion of the particles over time, for the system. However, one can describe these systems probabilistically. The word “statistical” is somewhat misleading as there is no statistics in the sense of statistical inference involved; instead everything proceeds from a set of axioms, suggesting that “probabilistic” might be a better term. Statistical physics, also called statistical mechanics, gives a microscopic explanation to the phenomena that thermodynamics explains phenomenologically.
4. Generative model: Most network models within network science belong to this category. Here one specifies the microscopic rules governing, for example, the attachment of new nodes to the existing network structure in models of network growth.
5. Cumulative advantage: A stylized modeling mechanism introduced by Price in 1976 to capture phenomena where “success breeds success.” Price applied the model to study citation patterns where power-law or power-law-like distributions are observed for the distribution of the number of citations and successfully reproduced by the model.
6. Pólya urn model: A stylized sampling model in probability theory where the composition of the system, the contents of the urn, changes as a consequence of each draw from the urn.
7. Power law: Refers to the specific functional form $P(x) \sim x^{-\alpha}$ of the distribution of quantity x. Also called Pareto distribution. See scale-free network.
8. Preferential attachment: A stylized modeling mechanism introduced by Barabási and Albert in 1999 where the probability of a new node to attach itself to an existing node i of degree $k_i$ is an increasing function of $k_i$; in the case of linear preferential attachment, this probability is directly proportional to $k_i$. In short, the higher the degree of a node, the higher the rate at which it acquires new connections (increases its degree).
9. Weak ties hypothesis: A hypothesis developed by sociologist Mark Granovetter in his extremely influential 1973 paper “The strength of weak ties.” The hypothesis, in short, states the following: The stronger the tie connecting persons A and B, the higher the fraction of friends they have in common.
10. Modularity: Modularity is a quality function used in network community detection, where its value is maximized (in principle) over the set of all possible partitions of the network nodes into communities. Standard modularity reads as

$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),$$

where $c_i$ is the community assignment of node i and $\delta$ is Kronecker delta; other quantities as defined in the text.
11. Rate equations: Rate equations, commonly used to model chemical reactions, are similar
to master equations but instead of modeling the count of objects (e.g., number of nodes) in a collection of discrete states (e.g., the number of k-degree nodes $N_k(t)$ for different values of k), they are used to model the evolution of continuous variables, such as average degree, over time.
12. Master equations: Widely used in statistical physics, these differential equations model how the state of the system changes from one time point to the next. For example, if $N_k(t)$ denotes the number of nodes of degree k, given the model, one can write down the equation for $N_k(t + 1)$, i.e., the number of k-degree nodes at time t + 1.
13. Fitness or affinity or attractiveness: A node attribute introduced to incorporate heterogeneity in the node population in a growing network model. For example, in a model based on preferential attachment, this could represent the inherent ability of a node to attract new edges, a mechanism that is superimposed on standard preferential attachment.
14. Community: A group of nodes in a network that are, in some sense, densely connected to other nodes in the community but sparsely connected to nodes outside the community.
15. Community detection: The set of methods and techniques developed fairly recently for finding communities in a given network (graph). The number of communities is usually not specified a priori but, instead, needs to be determined from data.
16. Critical point: The value of a control parameter in a statistical mechanical system where the system exhibits critical behavior: previously localized phenomena now become correlated throughout the system which at this point behaves as one single entity.
17. Phase diagram: A diagram displaying the phase (liquid, gas, etc.) of the system as one or more thermodynamic control parameters (temperature, pressure, etc.) are varied.
18. Phase transition: Thermodynamic properties of a system are continuous functions of the thermodynamic parameters within a phase; phase transitions (e.g., liquid to gas) happen between phases where thermodynamic functions are discontinuous.
19. Network diameter: The longest of the shortest pairwise paths in the network, computed for each dyad (node pair).
20. Hysteresis: The behavior of a system depends not only on its current state but also on its previous state or states.
21. Quality function: Typically a real-valued function with a high-dimensional domain that specifies the “goodness” of, say, a given network partitioning. For example, given the community assignments of N nodes, which can be seen as a point in an N-dimensional hypercube, the standard modularity quality function returns a number indicating how good the given partitioning is.
22. Dynamic process: Any process that unfolds on a network over time according to a set of prespecified rules, such as epidemic processes, percolation, diffusion, synchronization, etc.
23. Slice: In the context of multislice community detection, refers to one graph in a collection of many within the same system, where a slice can capture the structure of a network at a given time (time-dependent slice), at a particular resolution level (multiscale slice), or can encode the structure of a network for one tie type when many are present (multiplex slice).
24. Scale-free network: Network with a power-law (Pareto) degree distribution.
25. Erdős-Rényi model: Also known as Poisson random graph (after the fact that the degree distribution in the model follows a Poisson distribution), Bernoulli random graph (after the fact that each edge corresponds to an outcome of a Bernoulli process), or the random graph (as the progenitor of all random graphs). Starting with a fixed set of N nodes, one considers each node pair in turn independently of the other node pairs and connects the nodes with probability p. Erdős and Rényi first published the model in 1959, although Solomonoff and Rapoport published a similar model earlier in 1951.
26. Watts-Strogatz model: A now canonical model by Watts and Strogatz that was
introduced in 1998. Starting from a regular lattice structure characterized by high clustering and long paths, the model shows how randomly rewiring only a small fraction of edges (or, alternatively, adding a small number of randomly placed edges) leads to a small world characterized by high clustering and short paths. The model is conceptually appealing, and shows how to interpolate, using just one parameter, from a regular lattice structure in one extreme to an Erdős-Rényi graph in the other.
27. Mean-field approximation: Sometimes called the zero-order approximation, this approximation replaces the value of a random variable by its average, thus ignoring any fluctuations (deviations) from the average that may actually occur. This approach is commonly used in statistical physics.
28. Ensemble: A collection of objects, such as networks, that have been generated with the same set of rules, where each object in the ensemble has a certain probability associated with it. For example, one could consider the ensemble of networks that consists of six nodes and two edges, each being equiprobable.

References

Airoldi EM, Fienberg SE, Xing EP. Mixed membership stochastic blockmodels. J Mach Learn Res. 2008;9:1981–2014.
Anselin L. Spatial econometrics: methods and models. Dordrecht: Kluwer; 1988.
Barabasi A-L, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–12. http://www.sciencemag.org/content/286/5439/509.abstract
Barabasi A-L, Albert R, Jeong H. Mean-field theory for scale-free random networks. Phys A Stat Mech Appl. 1999;272:173–87. http://www.sciencedirect.com/science/article/pii/S0378437199002915.
Barnett ML, Landon BE, O’Malley AJ, Keating NL, Christakis NA. Mapping physician networks with self-reported and administrative data. Health Serv Res. 2011;46:1592–609.
Barnett ML, Christakis NA, O’Malley AJ, Onnela J-P, Keating NL, Landon BE. Physician patient-sharing networks and the cost and intensity of care in US hospitals. Med Care. 2012a;50:152–60.
Barnett ML, Keating NL, Christakis NA, O’Malley AJ, Landon BE. Reasons for referral among primary care and specialist physicians. J Gen Intern Med. 2012b;27:506–12.
Berkman L, Glass T. Social integration, social networks, social support, and health. In: Social epidemiology. New York: Oxford University Press; 2000. p. 137–73.
Boguñá M, Pastor-Satorras R, Díaz-Guilera A, Arenas A. Models of social networks based on social distance attachment. Phys Rev E. 2004;70:056122. https://doi.org/10.1103/PhysRevE.70.056122.
Bonacich P. Power and centrality: a family of measures. Am J Sociol. 1987;92:1170–82.
Borgatti S, Everett M. Network analysis of 2-mode data. Soc Networks. 1997;19:243–69.
Breiger R. The duality of persons and groups. Soc Forces. 1974;53:181–90.
Cartwright D, Harary F. A generalization of Heider’s theory. Psychol Rev. 1956;63:277–92.
Centola D. Failure in complex social networks. Math Sociol. 2009;33:64–8.
Choi D, Wolfe P, Airoldi E. Stochastic blockmodels with growing number of classes. Arxiv preprint. 2010; arXiv:1011.4644.
Christakis N, Fowler J. The spread of obesity in a large social network over 32 years. N Engl J Med. 2007;357:370–9.
Christakis NA, Fowler JH. Social contagion theory: examining dynamic social networks and human behavior. Stat Med. 2013;32:556–77.
Coleman J, Katz E, Menzel H. The diffusion of innovations among physicians. Sociometry. 1957;20:253–70.
Coleman J, Katz E, et al. Medical innovation: a diffusion study. Indianapolis: Bobbs-Merrill; 1966.
Davidsen J, Ebel H, Bornholdt S. Emergence of a small world from local interactions: modeling acquaintance networks. Phys Rev Lett. 2002;88:128701. https://doi.org/10.1103/PhysRevLett.88.128701.
Dorogovtsev SN, Mendes JFF, Samukhin AN. Structure of growing networks with preferential linking. Phys Rev Lett. 2000;85:4633–6. https://doi.org/10.1103/PhysRevLett.85.4633.
Duijn MV, Snijders TAB, Zijlstra B. P2: a random effects model with covariates for directed graphs. Statistica Neerlandica. 2004;58:234–54.
Erdős P, Rényi A. Random graphs. Publ Math. 1959;6:290–7.
Faust K. Centrality in affiliation networks. Soc Networks. 1997;19:157–91.
Feller W. An introduction to probability theory and its applications, vol. 2. New York: Wiley; 1966.
Festinger L. The analysis of sociograms using matrix algebra. Hum Relat. 1949;2:153–8.
Fienberg S, Wasserman S. Categorical data analysis of single sociometric relations. In: Sociological methodology. New Jersey: Jossey-Bass; 1981. p. 156–92.
Fletcher JM. Social interactions and smoking: evidence using multiple student cohorts, instrumental variables, and school fixed effects. Health Econ. 2008;19:466–84.
Fletcher JM, Lehrer SF. The effect of adolescent health on educational outcomes: causal evidence using genetic lotteries between siblings. Canadian labor market and skills researcher network, working paper no. 32. 2009.
Fortunato S. Community detection in graphs. Phys Reports. 2010;486:75–174.
Frank O, Strauss D. Markov graphs. J Am Stat Assoc. 1986;81:832–42.
Freeman L. Centrality in social networks, I. Conceptual clarification. Soc Networks. 1979;1:215–39.
Freeman L. The development of social network analysis: a study in the sociology of science. Vancouver: Empirical Press; 2004.
Goh K-I, Cusick ME, Valle D, Childs B, Vidal M, Barabasi A-L. The human disease network. Proc Natl Acad Sci. 2007;104:8685–90. http://www.pnas.org/content/104/21/8685.abstract
Goldenberg A, Zheng AX, Fienberg SE, Airoldi EM. A survey of statistical network models. Found Trends Mach Learn. 2009;2:129–233.
Goodreau S. Advances in exponential random graph (p*) models applied to a large social network. Soc Networks. 2007;29:231–48.
Granovetter MS. The strength of weak ties. Am J Sociol. 1973;78:1360–80.
Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433:895–900.
Haines V, Hurlbert J. Network range and health. J Health Soc Behav. 1992;33:254–66.
Handcock MS, Robins GL, Snijders TAB, Moody J, Besag J. Assessing degeneracy in statistical models of social networks. J Am Stat Assoc. 2003;76:33–50.
Handcock M, Raftery A, Tantrum J. Model-based clustering for social networks. J Roy Stat Soc A. 2007;170:301–54.
Handcock MS, Hunter DR, Butts CT, Goodreau SM, Krivitsky PN, Morris M. ergm: A package to fit, simulate and diagnose exponential-family models for networks, http://CRAN.R-project.org/package=ergm. Version 2.2-6. 2010. Project home page at http://statnetproject.org
Hanneke S, Fu W, Xing EP. Discrete temporal models of social networks. Electron J Stat. 2010;4:585–605.
Harary F. On the notion of balance of a signed graph. Mich Math J. 1953;2:143–6.
Harary F. The number of linear, directed rooted and connected graphs. Trans Am Math Soc. 1955;78:445–63.
Heider F. Attitudes and cognitive orientation. J Psychol. 1946;21:107–12.
Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. A dynamic network approach for the study of human phenotypes. PLoS Comput Biol. 2009;5:e1000353. https://doi.org/10.1371/journal.pcbi.1000353.
Hoff PD. Bilinear mixed effects models for dyadic data. J Am Stat Assoc. 2005;100:286–95.
Hoff P. Modeling homophily and stochastic equivalence in symmetric relational data. In: Advances in neural information processing systems, vol. 20. Cambridge, MA: MIT Press; 2008. p. 657–64.
Hoff PD, Raftery AE, Handcock MS. Latent space models for social networks analysis. J Am Stat Assoc. 2002;97:1090–8.
Holland P, Leinhardt S. An exponential family of probability-distributions for directed-graph. J Am Stat Assoc. 1981;76:33–50.
Holland P, Laskey K, Leinhardt S. Stochastic blockmodels: some first steps. Soc Networks. 1983;5:109–37.
House J, Kahn R. Measures and concepts of social support. In: Social support and health. Orlando: Academic; 1985. p. 83–108.
Huisman M, Van Duijn M. Software for statistical analysis of social networks. In: The Sixth International Conference on Logic and Methodology; Amsterdam: 2004.
Huisman M, Van Duijn M. Software for social networks analysis. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005.
Hunter D. Curved exponential family models for social networks. Soc Networks. 2007;29:216–30.
Hunter DR, Handcock MS. Inference in curved exponential family models for networks. J Comput Graph Stat. 2006;15:565–83.
Iwashyna TJ, Chang VW, Zhang JX, Christakis NA. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2002;37:1531–51.
Karrer B, Newman MEJ. Stochastic blockmodels and community structure in networks. Phys Rev E. 2011;83:016107. https://doi.org/10.1103/PhysRevE.83.016107.
Katz L. On the matrix analysis of Sociometric data. Sociometry. 1947;10:233–41.
Katz L. A new status index derived from sociometric analysis. Psychometrika. 1953;18:39–43.
Katz L, Powell JH. Measurement of the tendency toward reciprocation of choice. Sociometry. 1955;18:659–65.
Keating NL, Ayanian JZ, Cleary PD, et al. Factors affecting influential discussions among physicians: a social network analysis of a primary care practice. J Gen Intern Med. 2007;22:794–8.
Klovdahl A. Social networks and the spread of infectious diseases. Soc Sci Med. 1985;21:1203–16.
Kossinets G, Watts DJ. Empirical analysis of an evolving social network. Science. 2006;311:88–90. http://www.sciencemag.org/content/311/5757/88.abstract
Krapivsky PL, Redner S, Leyvraz F. Connectivity of growing random networks. Phys Rev Lett. 2000;85:4629–32. https://doi.org/10.1103/PhysRevLett.85.4629.
Krivitsky PN. Exponential-family random graph models for valued networks. 2012. arXiv preprint, 1101.1359v2 [stat.ME] 19 Jan 2012.
Krivitsky PN, Handcock MS. Fitting position latent cluster models for social networks with latentnet. J Stat Softw. 2008;24. http://statnetproject.org
Krivitsky PN, Handcock MS. A separable model for dynamic networks. 2010. arXiv preprint, 1011.1937v1 [stat.ME].
Kumpula JM, Onnela J-P, Saramäki J, Kaski K, Kertész J. Emergence of communities in weighted networks.
Phys Rev Lett. 2007;99:228701. https://doi.org/10.1103/PhysRevLett.99.228701.
Landon BE, Keating NL, Barnett ML, Onnela JP, Paul S, O’Malley AJ, Keegan T, Christakis NA. Variation in patient-sharing networks of physicians across the United States. JAMA. 2012;308:265–73.
Laumann E, Marsden P, Prensky D. The boundary specification problem in network analysis. In: Burt R, Minor M, editors. Applied network analysis: a methodological introduction. Beverly Hills: Sage; 1983. p. 18–34.
Lorrain F, White H. Structural equivalence of individuals in social networks. J Math Sociol. 1971;1:49–80.
Lyons R. The spread of evidence-poor medicine via flawed social-network analyses. Stat Polit Policy. 2011;2:1–26.
Manski CA. Identification of endogenous social effects: the reflection problem. Rev Econ Stud. 1993;60:531–42.
Marsden P. Network methods in social epidemiology. In: Methods in social epidemiology. New York: Jossey-Bass; 2006. p. 267–86.
Marsden PV, Friedkin NE. Network studies of social influence. Sociol Methods Res. 1993;22:127–51.
Marsili M, Vega-Redondo F, Slanina F. The rise and fall of a networked society: a formal model. Proc Natl Acad Sci USA. 2004;101:1439–42.
McPherson ML, Smith-Lovin C, et al. Birds of a feather: homophily in social networks. Annu Rev Sociol. 2001;27:415–44.
Moreno JL. Who shall survive? Nervous and mental disease processing. The University of Michigan, Ann Arbor; 1934.
Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J-P. Community structure in time-dependent, multiscale, and multiplex networks. Science. 2010;328:876–8. http://www.sciencemag.org/content/328/5980/876.abstract
Newcomb TM. An approach to the study of communicative acts. Psychol Rev. 1953;60:393–404.
Newman ME. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Phys Rev. 2001;64:016132.
Newman MEJ. Modularity and community structure in networks. Proc Natl Acad Sci. 2006;103:8577–82.
Newman M. Networks: an introduction. New York: Oxford University Press; 2010.
Newman MEJ. Communities, modules and large-scale structure in networks. Nat Phys. 2012;8:25–31.
Newman MEJ, Girvan M. Mixing patterns and community structure in networks. In: Pastor-Satorras R, Rubi J, Diaz-Guilera A, editors. Statistical mechanics of complex networks. Berlin: Springer; 2003.
Newman MEJ, Girvan M. Finding and evaluating community structure in networks. Phys Rev E. 2004;69:026113. https://doi.org/10.1103/PhysRevE.69.026113.
Nowicki K, Snijders TAB. Estimation and prediction for stochastic blockstructures. J Am Stat Assoc. 2001;96:1077–87.
O’Malley AJ. The analysis of social network data: an exciting frontier for statisticians. Stat Med. 2013;32:539–55.
O’Malley AJ, Christakis NA. Longitudinal analysis of large social networks: estimating the effect of health traits on changes in friendship ties. Stat Med. 2011;30:950–64.
O’Malley AJ, Marsden PV. The analysis of social networks. Health Serv Outcome Res Methodol. 2008;8:222–69.
O’Malley AJ, Arbesman S, Steiger DM, Fowler JH, Christakis NA. Egocentric social network structure, health, and pro-social behaviors in a National Panel Study of Americans. PLoS One. 2012;7:e36250. https://doi.org/10.1371/journal.pone.0036250.
Opsahl T. Triadic closure in two-mode networks: redefining the global and local clustering coefficients. Soc Networks. 2011; 34. https://doi.org/10.1016/j.socnet.2011.07.001.
Opsahl T, Agneessens F, Skvoretz J. Node centrality in weighted networks: generalizing degree and shortest paths. Soc Networks. 2010;32:245–51.
Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005;435:814–8. https://doi.org/10.1038/nature03607.
Paul S, O’Malley AJ. Hierarchical longitudinal models of relationships in social networks. J R Stat Soc Ser C Appl Stat. 2013;62:705–22.
Pham HH, O’Malley AS, Bach PB, Saiontz-Martinez C, Schrag D. Primary care physicians’ links to other physicians through Medicare patients: the scope of care coordination. Ann Intern Med. 2009;150:236–42.
Piraveenan M, Prokopenko M, Zomaya AY. Assortative mixing in directed biological networks. IEEE Trans Comput Biol Bioinform. 2010;9:66–78. To appear.
Pollack CE, Weissman G, Bekelman J, Liao K, Armstrong K. Physician social networks and variation in prostate cancer treatment in three cities. Health Serv Res. 2012;47:380–403.
Porter MA, Onnela J-P, Mucha PJ. Communities in networks. Not Am Math Soc. 2009;56(1082–1097):1164–6.
Price DDS. A general theory of bibliometric and other cumulative advantage processes. J Am Soc Inf Sci. 1976;27:292–306. https://doi.org/10.1002/asi.4630270505.
Robins G, Pattison P, Woolcock J. Small and other worlds: global network structures from local processes. Am J Sociol. 2005;110:894–936.
Robins GL, Snijders TAB, Wang P, Handcock MS, Pattison PE. Recent developments in exponential random graph (p*) models for social networks. Soc Networks. 2007;29:192–215.
Robins GL, Pattison PE, Wang P. Closure, connectivity and degree distributions: exponential random graph (p*) models for directed social networks. Soc Networks. 2009;31:105–7.
Rubin D. Bayesian inference for causal effects: the role of randomization. Ann Stat. 1978;6:34–58.
Seidman SB. Network structure and minimum degree. Soc Networks. 1983;5:269–87.
Shalizi CR, Rinaldo A. Consistency under sampling of exponential random graph models. 2012. arXiv preprint. arXiv:1111.3054v3
Shalizi CR, Thomas AC. Homophily and contagion are generically confounded in observational social network studies. Sociol Methods Res. 2011;40:211–39.
Simmel G. The sociology of Georg Simmel. New York: The Free Press; 1908.
Snijders T. The degree variance: an index of graph heterogeneity. Soc Networks. 1981;3:163–74.
Snijders T. Stochastic actor-oriented models for network change. J Math Sociol. 1996;21:149–72.
Snijders TAB. The statistical evaluation of social network dynamics. In: Sociological methodology. Oxford, UK: Basil Blackwell; 2001. p. 361–95.
Snijders TAB. Models for longitudinal social network data. In: Models and methods in social network analysis. Cambridge: Cambridge University Press; 2005. p. 215–47.
Snijders TAB. Statistical methods for network dynamics. In: Luchini SR et al., editors. Proceedings of the XLIII Scientific Meeting, Italian Statistical Society, Basil Blackwell, Ltd; 2006. p. 281–96
de Solla Price DJ. Networks of scientific papers. Science. 1965;149:510–5. http://www.sciencemag.org/content/149/3683/510.short.
Steglich C, Snijders TAB, Pearson M. Dynamic networks and behavior: separating selection from influence. Sociol Methodol. 2010;40:329–93.
Szabo G, Barabasi AL. Network effects in service usage. 2007. Arxiv preprint. http://lanl.arxiv.org/abs/physics/0611177
Thompson S. Adaptive web sampling. Biometrics. 2006;62:1224–34.
Thompson S, Frank O. Model-based estimation with link-tracing sampling designs. Survey Methodol. 2000;26:87–98.
Thompson S, Seber GAF. Adaptive sampling. New York: Wiley; 1996.
Toivonen R, Onnela J-P, Saramäki J, Hyvönen J, Kaski K. A model for social networks. Phys A Stat Mech Appl. 2006;371:851–60. http://www.sciencedirect.com/science/article/pii/S0378437106003931
Traud AL, Mucha PJ, Porter MA. Social structure of Facebook networks. Phys A Stat Mech Appl. 2012;391:4165–80. http://www.sciencedirect.com/science/article/pii/S0378437111009186
VanderWeele TJ. Sensitivity analysis for contagion effects in social networks. Sociol Methods Res. 2011;40:240–55.
VanderWeele TJ, Ogburn EL, Tchetgen Tchetgen EJ. Why and when “Flawed” social network analyses still yield valid tests of no contagion. Stat Polit Policy. 2012;3:1050. https://doi.org/10.1515/2151-7509.1050.
Vázquez A. Growing network with local rules: preferential attachment, clustering hierarchy, and degree correlations. Phys Rev E. 2003;67:056104. https://doi.org/10.1103/PhysRevE.67.056104.
Wang W, Wong G. Stochastic Blockmodels for directed graphs. J Am Stat Assoc. 1987;82:8–19.
Wang P, Sharpe K, Robins GL, Pattison PE. Exponential random graph (p*) models for affiliation networks. Soc Networks. 2009;31:12–25.
Wasserman SS, Faust K. Social network analysis: methods and applications. Cambridge: Cambridge University Press; 1994.
Wasserman S, Pattison P. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*. Psychometrika. 1996;61:401–25.
Westveld AH, Hoff PD. A mixed effect model for longitudinal relational and network data, with applications to international trade and conflict. Ann Appl Stat. 2011;5:843–72.
White D, Harary F. The cohesiveness of blocks in social networks: node connectivity and conditional density. Sociol Methodol. 2001;31:305–59.
Wong LH, Pattison P, Robins G. A spatial model for social networks. Phys A Stat Mech Appl. 2006;360:99–120. http://www.sciencedirect.com/science/article/pii/S0378437105004334
Zijlstra BJH, Duijn MV, Snijders TAB. The multilevel P2 model: a random effects model for the analysis of multiple social networks. Methodology. 2006;2:42–7.
27 Survey Methods in Health Services Research
Steven B. Cohen
Contents
Introduction
Designing National Health-Care Surveys to Inform Health Policy and Health Services Research
Types of Health and Health-Care Surveys
Objectives and Content
Survey Design Framework
Cross-Sectional and Longitudinal Survey Designs
Use of Complex Nationally Representative Survey Designs
Sample Size Determination
Controlling for Sampling Error and Bias in Survey Estimates
Sample Size Targets and Precision Requirements
Building Survey Response Rates
Survey Procedures to Facilitate Respondent Cooperation
Estimation of Health-Care Parameters
Development of Sampling Weights
Adjustments for Unit Nonresponse
Adjustments for Survey Attrition
Post-stratification Adjustments
Variance Estimation Considerations
Integrated Survey Designs: Analytical Enhancements Achieved through the Linkage of Surveys and Administrative and Secondary Data
An Example of Survey Integration: The Medical Expenditure Panel Survey
Advantages of Integrated Survey Designs
Linked Provider Data on Expenditures Improves the Accuracy of National Medical Expenditure Estimates in the MEPS
Integrated Design Expands Capacity for Longitudinal Analyses
Integrated Design of MEPS Facilitates Examination of Response Error
Constraints
Policy-Relevant Examples from the Medical Expenditure Panel Survey (MEPS)
Design of the MEPS to Inform Health Policy and Health Services Research
Issues on Measuring and Estimating Health Insurance Coverage in Surveys
Testing for the Impact of Survey Attrition on Health Insurance Coverage Estimates in the MEPS
The Utility of Prediction Models to Oversample the Long Term Uninsured
Summary
References

S. B. Cohen (*)
Division of Statistical and Data Sciences, RTI International, Washington, DC, USA
e-mail: scohen@rti.org
This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply 2019
https://doi.org/10.1007/978-1-4939-8715-3_38
Abstract

Health-care surveys serve as a critical source of essential information on trends in health-care costs, coverage, access, and health-care quality. The findings derived from these surveys often facilitate the development, implementation, and evaluation of policies and practices addressing health care and health behaviors at the national level. This chapter serves to illustrate several survey methods that enhance the performance and utility of health services research efforts. Attention has been given to the topics of sample and survey designs, nonresponse and attrition, estimation, precision, sample size determination, and analytical techniques to control for survey design complexities in analysis. Several of the topics that are featured in this chapter are further connected by their substantive focus on the measurement of trends in health-care costs, coverage, access, and health-care utilization. In addition to highlighting underlying survey operations, estimates, and outputs, the topics that have been covered also serve to identify potential enhancements that facilitate improvements in design, data collection, estimation strategies, and ultimately analytical capacity for health services research efforts.

Introduction

There is a growing demand for timely, high-quality, and precise estimates of health-care parameters at the national and subnational levels and associated readily accessible data resources to inform health-care policy and practice. Existing sentinel health-care databases that provide nationally representative population-based data on measures of health-care access, cost, use, health insurance coverage, health status, and health-care quality provide the necessary foundation to support descriptive and behavioral analyses of the US health-care system. Such studies help inform assessments of the availability and costs of private health insurance in the employment-related and non-group markets, the population enrolled in public health insurance coverage and those without health-care coverage, and the role of health status in health-care use, expenditures, household decision making, and health insurance and employment choices. Health services research efforts provide essential insights into the drivers of trends in health-care expenditures and service utilization; serve to estimate the impact of changes in financing, coverage, and reimbursement policy; and help determine who benefits and who bears the cost of a change in policy. Government and nongovernmental entities rely upon these data and research efforts to evaluate health reform policies, the effect of tax code changes on health expenditures and tax revenue, and proposed changes in government health programs such as Medicare.

In this chapter, attention is given to key survey methods that enhance the conduct of health services research efforts. To ensure their utility and integrity, it is essential that health and health-care surveys are designed according to high-quality, effective, and efficient statistical and methodological practices
and optimal sample designs. This also necessitates that subsequent applications of estimation strategies to the survey data, as well as analytical techniques and interpretations of resultant research findings, are guided by well-grounded statistical theory. The chapter also features important sample design considerations, with coverage given to topics that include frame development, sample size specifications, precision requirements, and sample selection scheme. Adhering to a total survey error framework, challenges that characterize health services research efforts are identified, and the interdependence between the analysts, the health-care survey designers, and the statisticians is reinforced. In this context, the methods that are discussed are illustrated with examples from national health-care survey efforts, though the techniques are also applicable to subnational or population subgroup specific target populations.

Designing National Health-Care Surveys to Inform Health Policy and Health Services Research

Surveys are a critical source of information for the development, implementation, and evaluation of policies and practices addressing health and health care. When properly designed, surveys can provide accurate, unbiased, and generalizable information on population characteristics, risk factors, health status, health-care access, utilization and insurance coverage, and the health-care system itself. To be most useful, surveys must be designed according to sound statistical and methodological principles. Health surveys are data collection efforts designed to acquire information on the nation’s health and health-care characteristics. Several general, though by no means exhaustive, uses of health and health-care survey data include identification of public health problems; program planning and evaluation; health education and health promotion; epidemiological, biomedical, and health services research; measurement of the extent and impact of illness; and the measurement of the use of health-care services, related medical expenditures, and sources of payment for care.

Generally, surveys are operationalized by the selection of a representative sample of the population or universe of interest, referred to as the target population, and the acquisition of information from the sample units obtained in a structured manner through administration of a well-developed questionnaire. The universe of interest is often a population but can be any identifiable group of individual units such as health-care providers or events such as health-care visits. If the sample is selected as a probability sample, in which a frame exists for sample enumeration and every unit selected from the frame has a known probability of selection for the sample, the findings from the sample are generalizable to the population. This is a powerful attribute and enhances the integrity of the data collected. Surveys can have relatively simple or extremely complex designs, but the basic principles of sample design and data collection methodology remain the same. The complexity of the survey often reflects the complexity of the subject under study. As health and health care encompass a wide range of phenomena and relate directly and indirectly to many other domains, it is necessary to develop a range of health surveys to respond to differing needs for information. Each of these surveys is often based on complex designs and sophisticated data collection mechanisms.

Types of Health and Health-Care Surveys

There are three main types of health surveys: population-based surveys that obtain information directly from the subject (or a suitable proxy), surveys that obtain information about entities such as health-care providers, and surveys that are based on administrative records. Population-based surveys are used when it is essential to describe the characteristics of a defined population. Often the population of interest is the general US population and specific subpopulations as defined by such characteristics as age, sex, race/ethnicity, or socioeconomic status. However, a population may also be defined by occupation or
664 S. B. Cohen
any other well-defined characteristics. For example, physicians could be the ultimate sample units of a population-based survey if the information sought related directly to the physician and his practice characteristics. Population-based surveys are most frequently adopted when the sample unit is the best source for providing the required information. When a well-developed sample frame is available, samples for population-based surveys can be selected from lists of all eligible subjects. For example, surveys of Medicare or health plan beneficiaries can be drawn from a list of enrollees. When such lists are not available, other methods such as area-based probability samples are used. Whatever method is adopted, it is important that the sample is selected as a probability sample and may be evaluated for potential coverage and response biases. Once a sample is selected, different data collection modes can be used to collect the necessary information including mail, phone, Web based, and in person. The nature of the content and the sample will affect the mode chosen. Of critical importance is the survey instrument itself. The questionnaire or other data collection instruments need to be developed so that accurate and valid information is obtained. The identification of the respondent is also an important step in the process. In general, obtaining information directly from the survey subject provides more reliable and valid health data (Madans and Cohen 2005).

Many population-based health surveys obtain health information directly from the subject (or an appropriate proxy) either through in-person or telephone interviews or mail questionnaires. Supplemental information in the form of medical records is often added to the information obtained from subjects to enhance completeness and quality. To obtain objective standardized information on health characteristics including undiagnosed conditions, surveys rely on direct examination of populations. These surveys are extremely complex and expensive to undertake but are of added value to accurately describe the health status of the population, particularly for those subpopulations who lack medical care.

In contrast, surveys of the components of the health-care system provide information on the structure, capacity, and functioning of that system. These components range from private physicians’ offices to hospitals, nursing homes, and home health-care agencies. To fully understand the system, it is necessary to cover all components. In order to select representative samples of these components, it is necessary to have sampling frames, equivalent to the list frames mentioned above, that identify each member of each type of health-care provider. Surveys of providers can provide information on different aspects of the health-care system. Questions can be targeted at describing the number of components in a sector as well as their organizational, legal, or financial characteristics. Information can be obtained on the individual provider or on the interactions among related providers or between providers and patients. Interactions with patients can focus on the delivery of care or on how care is paid for.

Sometimes the most accurate source of information comes from an administrative record that was generated as part of the routine operation of a system. This clearly would be the case when the objective of the research is the system producing the records. For example, utilization of Medicare services is most easily obtained from Medicare administrative records. The entire census of records is usually available for these purposes, but often samples of records are taken when the entire universe is not needed. When possible, information from administrative records is often sought as a way to improve the accuracy of information available from the subject in population-based surveys.

Objectives and Content

Health surveys used for policy and program development can be either focused on a particular health or health-care issue or can be multipurpose in nature. The latter surveys tend to be conducted by public entities and are designed to provide ongoing, descriptive information on a range of topics and tend to be based on larger samples. While the information from these surveys can track changes in the population, they are less effective in obtaining detailed information on a
particular subject or in evaluating the success of the survey. Such information is more appropriately obtained from focused surveys. In order to allow for comprehensive studies of the current health-care system, information is needed on the population’s access to health care, their utilization of and expenditures for health-care services, and their health insurance coverage. In a similar vein, an evaluation of the system requires an understanding of the patterns and trends in the use of health-care services and their associated costs and sources of payment. To effectively address these issues, researchers and policymakers need accurate nationally representative data to better permit an understanding of how individual characteristics, behavioral factors, financial and institutional arrangements affect health-care utilization and expenditures in a rapidly changing health-care market. Health surveys are often designed to acquire this information at both the national and subnational levels and for policy-relevant population subgroups of interest (Madans and Cohen 2005).

Access to Care
The population’s access to health-care services is an important factor that may influence patterns of health-care utilization and associated health outcomes. Measures of access to care have also been used as indicators to assess the quality of the nation’s health-care delivery system. In addition to facilitating determinations of the availability of a usual source of care for the provision of necessary medical care, access to care measures serve to identify barriers to care, which include shortages of health-care providers, financial restrictions, limitations in proximity to services, and constraints associated with waiting times. Population-based national health-care surveys such as the Medical Expenditure Panel Survey (MEPS), cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and the Centers for Disease Control and Prevention’s National Center for Health Statistics (CDC/NCHS), collect information on several dimensions of access to health care in America. The survey was designed to yield estimates of the proportion of the population lacking a usual source of care as well as the types and characteristics of providers used by those who do have a usual source of care. Measures of satisfaction with the usual source of health care are also collected in the survey through in-person interviews, in addition to information on experiencing difficulty or delay in obtaining health care or not receiving needed health-care services.

In addition to national population estimates of access to care derived from this survey, the analytical objectives include a capacity to permit specific comparisons of these measures by age, race/ethnicity, sex, perceived health status, health insurance coverage, and place of residence. These analyses permit the identification of potential disparities in access to care, with particular attention given to individuals with low incomes, persons with disabilities or chronic illness, minorities, women and children, elderly, rural, and inner-city populations. Evaluation of the effects of changes in the US health-care system on access to care for these populations will remain a critical issue for policymakers in the next few years.

Use of Health-Care Services
An understanding of the patterns and trends in the use of health-care services is essential to facilitate evaluations of the current health-care system, in addition to informing proposals for modification. Assessments of the degree of equity in the distribution of health-care services and the identification of health-care disparities require an examination of health-care use across vulnerable population subgroups and how it has changed over time. These investigations are essential to discern how service utilization varies according to the characteristics of the population, their health plans, and their providers and to identify other behavioral and institutional factors associated with disparities in service use.

An examination of the variations in the use of health-care services also helps determine the adequacy of access to care across the population. Underutilization of health-care services may be attributable to limitations in access to care as a consequence of the lack of adequate health insurance, financial resources, or limited availability of services in certain areas. Detailed comparisons of patterns of use by subpopulations presumed to require more care (e.g., the elderly, those in poor
health, or the terminally ill) relative to their less vulnerable counterparts help discern whether those most in need of care are receiving it.

The utilization measures that are required for these analyses typically consist of counts of the number of visits or events for specific health-care services that occur in a given calendar year. More specifically, health-care services include office-based visits, ambulatory hospital-based visits, inpatient hospital stays, dental visits, home health visits, and prescribed medicine purchases. This information is acquired through population-based surveys, surveys of providers, and surveys based on administrative records. Health-care surveys are designed to acquire this information at both the national and subnational levels and for policy-relevant population subgroups of interest. The visible national data collection efforts that acquire this type of health-care utilization information include the MEPS and the National Health Interview Survey (population based), the National Ambulatory Medical Care Survey (provider based), and the Medicare Current Beneficiary Survey (primarily population based and supplemented with administrative records).

Cost of Medical Care and Coverage
Health-care expenditures represent one-sixth of the US gross domestic product, exhibit a rate of growth that exceeds other sectors of the economy, and constitute one of the largest components of the Federal and states’ budgets. Although the rate of growth in health-care costs slowed in the mid-1990s, it began to rise again shortly afterward, fueled primarily by increasing costs for hospital care and prescription medications. To effectively address the issue of rising costs, researchers and policymakers need accurate nationally representative data to better permit an understanding of how individual characteristics, behavioral factors, financial incentives, and institutional arrangements affect health-care utilization and expenditures in a rapidly changing health-care market.

The continuing rise in the number of persons without private health insurance has made access to health insurance coverage a critical public policy issue. Informed public policy requires precise estimates of the size and composition of the insured and uninsured populations, as well as information on how demographic characteristics, economic factors, and health status affect health plan eligibility and decisions to enroll in health insurance plans. The demand for accurate and reliable information on the population’s health-care expenditures, insurance coverage, and sources of payment is met by health-care surveys such as the Medical Expenditure Panel Survey (MEPS) cosponsored by the Agency for Healthcare Research and Quality (AHRQ) and NCHS.

Survey Design Framework

Once the underlying survey objectives are articulated, greater specificity is required in order to finalize the underlying survey design. With several competing analytic objectives under consideration, priorities need to be established which will serve to guide the necessary precision specifications for the core study estimates of the target population parameters. A final set of survey objectives is then developed that provides details of the core population domains of interest and the required levels of precision for domain estimates. Underlying study hypotheses to be tested also need to be well specified. The precision requirements for the survey estimates will then be subject to further evaluation and subject to re-specification based on cost constraints.

Cross-Sectional and Longitudinal Survey Designs

National health-care sample surveys are generally characterized by cross-sectional or longitudinal designs. The cross-sectional surveys are designed to provide a snapshot of population characteristics that relate to a fixed point or interval in time. Alternatively, longitudinal surveys collect data on more than one occasion from the sample members of the population of analytical interest in order to measure change and to obtain data for time periods too long to recall accurately in a single interview. Longitudinal observations are essential for characterizing variations in the
population attributes that are sensitive to changes in time.

Longitudinal survey designs are primarily adopted to provide the necessary information to assess changes in the behavior of the population over a specific time period. Often referred to as panel designs, they have the capacity to permit measurement of seasonal and annual variations in population characteristics and behavior. These longitudinal designs are essential to permit the acquisition of the necessary data that will support analyses that measure the impact of changes in health status over time for individuals with specific conditions with respect to their use of health-care services and related expenditures. Well-specified sample size requirements for these surveys that are achieved will also permit comparable studies for different economic groups or special populations of interest, such as the poor, elderly, veterans, the uninsured, or racial/ethnic groups. This type of survey design also allows for the development of economic models designed to produce national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy over time, as well as estimates of who benefits and who bears the cost of such changes in policy.

Longitudinal designs are particularly attractive and well suited for studies that examine the extent of changes in health insurance coverage over time as well as the persistence of catastrophic medical expenditures over time. A cross-sectional survey design can provide accurate national survey estimates of the percent of the population with private coverage, public coverage, or the uninsured at a fixed point in time. Alternatively, the most accurate population estimates of the percent of population ever uninsured in a given year or without coverage for an entire year’s duration come from data collection efforts that have adopted a longitudinal survey design.

Use of Complex Nationally Representative Survey Designs

Many of the large national health-care surveys are characterized by a complex design structure with several stages of sampling. Cluster sampling is also a common feature of these national samples that consider area samples. In these multistage sample designs, the first stage of sampling requires the development of a sampling frame in which the land mass of the nation is partitioned into primary sampling units (PSUs) defined as counties or groups of contiguous counties. The eligible set of units are then stratified based on available geographic and sociodemographic information, and a first-stage sample of these primary units is then selected. This process of subsampling areas continues until sample segments consisting of 100–200 housing units are identified and subsampled. The final stage of sampling is often characterized by the selection of a representative sample of housing units, which are then interviewed to obtain the essential survey information on which subsequent health services research will be based.

This type of sample design has the following attractions. The specification of the sampling frame is both cost-effective and less labor intensive, where the list frames of target population members need to only be constructed for the sampled areas. In addition, the interviewing activity is restricted to the sample areas, achieving efficiencies in travel time and cost for in-person interviewing. In contrast, these efficiencies are achieved at the expense of a loss in precision of survey estimates based on the specified sample size relative to the precision that would be achieved based upon a simple random sample selection scheme. The increased variance in survey estimates in a multistage sample design relative to simple random sampling is the result of the greater likelihood of geographically clustered units to have more homogeneous responses. This within cluster homogeneity is measured by the intra-cluster correlation coefficient which measures the correlation between units from the same cluster. Overall, the differential in the variance of a survey estimate of a population mean $\bar{y}$ based on a complex multistage sample design $\mathrm{Var}_{\text{Design}}(\bar{y})$ with disproportionate sampling relative to a simple random sample $\mathrm{Var}_{\text{srs}}(\bar{y})$ is specified as the design effect.
$$\text{Design effect} = \mathrm{Var}_{\text{Design}}(\bar{y}) \,/\, \mathrm{Var}_{\text{srs}}(\bar{y})$$

In addition, the effective sample size for a design that departs from simple random sampling assumptions is specified as the underlying sample size, $n$, divided by the design effect.

$$\text{Effective sample size} = n \,/\, \left[ \mathrm{Var}_{\text{Design}}(\bar{y}) / \mathrm{Var}_{\text{srs}}(\bar{y}) \right]$$

Sample Size Determination

Stratification is used in sample designs to improve the precision of survey estimates and also provide greater control of the sample distribution. For less complex designs, when a fixed set of strata ($h = 1, 2, \ldots, H$) are defined and data collection costs for surveying units from each distinct stratum and the associated variance estimates for a core criterion variable have been determined, optimum sample allocation strategies have been developed. The values of the sample sizes for each stratum, $n(h)$, may be selected to minimize the variance $\mathrm{Var}(\bar{y})$ of the survey estimate of the criterion variable $y$, expressed as a mean, for a fixed cost ($C$) or to minimize the cost for a specified level of precision $V(\bar{y})$. Consider a cost function of the form

$$\text{Data collection cost } \{C\} = C(o) + \sum_{h=1}^{H} C(h)\, n(h),$$

where $C(o)$ represents an overhead cost and $C(h)$ is the data collection cost per unit in stratum $h$.

The variance of the estimated mean of a criterion variable will be minimized when $n(h)$ is proportional to $N(h)S(h)/\sqrt{C(h)}$, where $N(h)$ is the population in stratum $h$ and $S(h)$ is the standard deviation for the criterion variable.

When cost is fixed, the overall sample size specification to minimize the variance of survey estimate $\bar{y}$ when considering stratified sampling is

$$n = \frac{\left(C - C(o)\right) \sum_{h=1}^{H} N(h)S(h)/\sqrt{C(h)}}{\sum_{h=1}^{H} N(h)S(h)\sqrt{C(h)}}$$

Alternatively, when the precision level $V$ is fixed, the overall sample size specification to minimize cost under stratified sampling assumptions is

$$n = \frac{\left[\sum_{h=1}^{H} W(h)S(h)\sqrt{C(h)}\right] \left[\sum_{h=1}^{H} W(h)S(h)/\sqrt{C(h)}\right]}{V + (1/N) \sum_{h=1}^{H} W(h)S^2(h)},$$

where $W(h) = N(h)/N$ and $N = \sum_{h=1}^{H} N(h)$ (Cochran 1977).

In practice, few health-care surveys are conducted with the primary objective of optimizing the design based on a single parameter estimate. When the design specifications require attention to competing precision specifications for a variety of survey estimates, the optimization process becomes much more complex. Often, sample size optimization for multiple variance constraints does not have a closed form solution. Conventional approaches under these circumstances rely on iterative approaches to sample size determination that provide an optimal solution when convergence criteria are satisfied (Chromy 1981).

For national health-care surveys, the precision requirements may be articulated by specifying the amount of error that may be tolerated in the survey estimates. To illustrate this process, assume some margin of error, $d$, in the estimated survey mean of a criterion variable of interest $y$ from the survey has been established, and there is a small risk ($\alpha$) that the sponsors are willing to incur that the actual error is larger than $d$. This can be expressed as $\Pr\left(|\bar{y} - \bar{Y}| \geq d\right) = \alpha$.

For large samples, $n$ is approximated by $(\text{Design effect})\,[z^2 S^2 / d^2]$, where $z$ is the cutoff point on a standardized normal distribution that cuts off an area $\alpha$ at the tails and $S^2$ is the variance of $y$.

Another way to determine the sample size is to specify the relative standard error (RSE) required for the resultant survey estimate. The RSE of a survey estimate is defined as the ratio of the standard error of the survey estimate $\mathrm{SE}(\bar{y})$ divided by the estimate $\bar{y}$, or $\mathrm{RSE}(\bar{y}) = \mathrm{SE}(\bar{y})/\bar{y}$.

Since $\mathrm{RSE}(\bar{y}) = S / (\bar{y}\sqrt{n})$, then $n = S^2 / \left(\bar{y}^2\, \mathrm{RSE}^2(\bar{y})\right)$.

For example, if one was attempting to obtain an estimate of the proportion of the population under age 65 uninsured in a given year, $p$, with a
RSE = .05 and a survey design effect = 1.6, and a prior estimate of $p$ at .15, the necessary sample size would be $n = 3627$. Here,

$$n = \frac{1.6\,(.15)(.85)}{(.15)^2 (.05)^2} = 3627$$

and

$$\mathrm{RSE}(p) = \frac{\left(1.6\,(.15 \times .85)\right)^{1/2}}{3627^{1/2}\,(.15)} = .05$$

Controlling for Sampling Error and Bias in Survey Estimates

Survey estimates of population parameters are subject to two core sources of error, variable error and bias. Variable error is the random component of survey error attributable to sampling error and variable measurement errors. The bias component captures systematic errors in the survey estimates attributable to the estimation procedure, survey nonresponse, and other sources of nonsampling errors inherent in the measurement process. Consequently, the total error associated with using the sampling estimate $\bar{y}$ to estimate a population parameter $\bar{Y}$ is defined as the difference between the estimator and the parameter, or $(\bar{y} - \bar{Y})$. Using this framework, the total error may be decomposed into the following two terms:

$$(\bar{y} - \bar{Y}) = \{\bar{y} - E(\bar{y})\} + \{E(\bar{y}) - \bar{Y}\},$$

where $E(\bar{y})$ is the expected value of the statistic $\bar{y}$ over repeated sample selections. The first term represents variable errors associated with the sampling and measurement process. The second term represents the bias of the estimator, quantified as the differential between the expected value of the sample statistic over repeated samples and the true value of the population parameter. Based on this specification of the total error associated with the survey estimate of a population parameter, the accuracy of the survey estimate can be assessed by deriving its mean square error. More specifically, the mean square error of an estimator $\bar{y}$ is defined as the expected value of the squared total error,

$$\mathrm{MSE}(\bar{y}) = E\left[(\bar{y} - \bar{Y})^2\right] = E\left[(\bar{y} - E(\bar{y}))^2\right] + \left(E(\bar{y}) - \bar{Y}\right)^2 = \mathrm{Var}(\bar{y}) + \mathrm{Bias}^2$$

A desired objective of national health-care survey designs to inform health services research efforts, and all surveys in general, is the minimization of the mean square error of survey estimates. This requires attention to controlling the allowable level of error attributable to each of these distinct sources of error, the error due to sampling and the bias in survey estimates. A well-designed survey requires careful attention given to the data collection protocol, the design of the survey questionnaire, and the operationalization of data collection strategies to minimize survey nonresponse. With respect to sources of survey nonresponse, this would include unit nonresponse, survey attrition in panel or longitudinal surveys, and item nonresponse. Once the data collection phase of the survey is completed, the variable component of the error is usually fixed. Scrutiny should then be given to the identification of the drivers of survey nonresponse and the sources of systematic measurement error in response profiles. Consideration should also be given to the implementation of nonresponse adjustment strategies, imputation to correct for item nonresponse, and logical editing procedures to alleviate response error inconsistencies in the survey responses.

Sample Size Targets and Precision Requirements

To illustrate the sample size targets for a national health-care survey and associated precision targets, the following example is provided (Cohen 2000). Often, an overall precision requirement for the national health-care survey is specified as the achievement of an average design effect specification for survey estimates of the policy-relevant population subgroups (e.g., average design effect = 1.6). Precision requirements for the survey are then presented in terms of relative
standard errors for the following survey estimates (Table 1):

• A 20% population estimate at the person level for each specified domain (e.g., a percent population estimate such as the rate of uninsured for the population under age 65)
• Mean estimates of the following measures of health-care utilization and expenditures at the person level (precision requirement specified as an average relative standard error):
  – Total health expenditures
  – Utilization and expenditure estimates for inpatient hospital stays
  – Utilization and expenditure estimates for ambulatory physician visits
  – Utilization and expenditure estimates for dental visits
  – Utilization and expenditure estimates for prescribed medicines

Table 1 Targeted average relative standard errors (RSEs) for subpopulation of analytic interest in the 1997 Medical Expenditure Panel Survey Household Component

Subpopulation | Average RSE for a population estimate of 20% (e.g., percent uninsured) | Average RSE for mean use and expenditure estimates
Persons with family income less than 200% of poverty level | .020 | .035
Persons ages 18–64 predicted to incur high medical expenditures | .040 | .070
Persons 65 years and over | .042 | .070
Adults (18 and over) with functional impairments measured in terms of ADLs (a) | .080 | .135
Adults (18 and over) with other impairments measured in terms of IADLs (b) | .080 | .135
Children (under age 18) with activity limitations | .080 | .135
Overall population | .015 | .023

Source: Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality: Medical Expenditure Panel Survey Household Component, 1997
(a) Need help in one or more activities of daily living (ADLs), such as bathing and dressing
(b) Need help in one or more instrumental activities of daily living (IADLs), such as shopping or paying bills

To meet these requirements, the survey must include a minimum number of persons in each domain of interest. The sample sizes necessary to satisfy these precision requirements for the survey estimates are then derived, adjusting for survey nonresponse targets and assumptions regarding the survey’s sample design and estimated design effects. The necessary sample sizes required to meet the precision targets for survey estimates presented in Table 1 are specified in the following table (Table 2; Cohen 2000).

Table 2 Targeted sample yields at the end of three core data collection rounds for 1997 for subpopulations of analytic interest: 1997 Medical Expenditure Panel Survey Household Component

Subpopulation | Targeted sample yield
Persons with family income less than 200% of poverty level | 15,000
Persons ages 18–64 predicted to incur high medical expenditure | 4000
Persons 65 years and over | 3700
Adults (18 and over) with functional impairments measured in terms of ADLs (a) | 1000
Adults (18 and over) with other impairments measured in terms of IADLs (b) | 1000
Children (under age 18) with activity limitations | 1000
Overall population | 34,000

Source: Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality: Medical Expenditure Panel Survey Household Component, 1997
(a) Need help in one or more activities of daily living (ADLs), such as bathing and dressing
(b) Need help in one or more instrumental activities of daily living (IADLs), such as shopping or paying bills
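The link between an RSE target and a required sample size can be sketched numerically. The Python sketch below applies the chapter's RSE relationship to a proportion, taking the variance of a proportion as deff × p(1 − p)/n. It reproduces the worked example (p = .15, RSE = .05, design effect = 1.6) and then, purely as an illustration, computes the sample sizes implied by some of the Table 1 targets for a 20% estimate. The function names, the shortened row labels, and the choice to apply the average design effect of 1.6 to every domain are assumptions made here, not part of the published derivation.

```python
import math


def sample_size_for_rse(p: float, rse: float, deff: float = 1.0) -> float:
    """Sample size at which a proportion p attains the requested RSE.

    Uses SE(p) = sqrt(deff * p * (1 - p) / n) and RSE(p) = SE(p) / p,
    so n = deff * p * (1 - p) / (p**2 * rse**2).
    """
    return deff * p * (1.0 - p) / (p ** 2 * rse ** 2)


def rse_for_sample_size(p: float, n: float, deff: float = 1.0) -> float:
    """RSE of an estimated proportion p from n sampled persons."""
    return math.sqrt(deff * p * (1.0 - p) / n) / p


# (RSE target for a 20% estimate from Table 1, targeted yield from Table 2).
# Labels are shortened for display; deff = 1.6 for every row is an assumption.
TARGETS = {
    "Income < 200% of poverty": (0.020, 15_000),
    "Ages 18-64, high expenditures": (0.040, 4_000),
    "Ages 65 and over": (0.042, 3_700),
    "Adults with ADL impairments": (0.080, 1_000),
    "Overall population": (0.015, 34_000),
}

if __name__ == "__main__":
    n = sample_size_for_rse(p=0.15, rse=0.05, deff=1.6)
    print(math.ceil(n))                                    # 3627
    print(round(rse_for_sample_size(0.15, 3627, 1.6), 3))  # 0.05
    for label, (rse, table_yield) in TARGETS.items():
        implied = sample_size_for_rse(p=0.20, rse=rse, deff=1.6)
        print(f"{label}: implied n ~ {implied:,.0f}; Table 2 yield {table_yield:,}")
```

Under these assumptions the implied sizes (about 16,000; 4,000; 3,628; 1,000; and 28,444) are close to, but not identical with, the Table 2 yields. That is to be expected, since the published targets also reflect nonresponse targets and design assumptions that this simple calculation does not model.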
The current MEPS sample consists of approx- (Stoop et al. 2010). Reluctant respondents are
imately 14,000 households and 32,000 individ- also more likely to attrite over the course of the
uals and includes oversampling of African- survey. Within fixed survey budget constraints,
Americans, Hispanics, Asians, and low-income these costly late-stage call-back interviews impact
households. With respect to desired levels of pre- on overall data quality, timeliness of data release,
cision for survey estimates, a relative standard and overall sample size specifications.
error (RSE) specification of less than or equal to Several studies have demonstrated the utility of
10% is recommended for survey estimates that subsampling nonrespondents in a survey to help
characterize policy-relevant population sub- minimize nonresponse bias and achieve efficien-
groups which include racial and ethnic minorities cies in data collection efforts. Many of these appli-
(RSE (Y) ¼ standard error (Y ) divided by the cations are modeled after the technique proposed
estimate Y ). by Hansen and Hurwitz (1946) to select a sub-
sample from the nonrespondents to get an esti-
mate for the subpopulation represented by the
Building Survey Response Rates nonrespondents (Vartivarian et al. 2006). Variants
of the procedure include application of double
In national household health-care surveys, signif- sampling for ratio and regression estimation with
icant amounts of resources are allocated to obtain a subsampling of the nonrespondents.
the participation of households that constitute the The subsampling of nonrespondents is consid-
last 5–10% of the overall survey response rate. A ered in order to limit survey costs while
substantial number of households that respond maintaining a nationally representative sample.
toward the end of the survey field period are In this vein, the National Survey of Family
characterized by an initial refusal to participate. Growth has implemented a multiphase design
When the specified response rates are in jeopardy which employs the subsampling of nonrespon-
of not being met, concerted use of nonresponse dents. These approaches are increasingly attrac-
conversion techniques is employed in tandem tive to survey designers because they allow for
with occasional extensions of the length of the methods to control the costs at the end of a data
field period. Applications of these “ninth inning” collection period while addressing concerns about
field force engagements to achieve target survey nonresponse rates and errors. For many national
response rates are not cost neutral and often result in-person household surveys, large costs are
in significant increments to data collection costs. incurred for travel to sample segments to inter-
The primary objective of this approach is to view a small set of sample units, usually those
enhance overall longitudinal survey response extremely difficult to contact in prior visits or
rates and achieve a reduction in survey error repeatedly displaying some reluctance to respond
attributable to nonresponse. It has also to the survey. By restricting these expensive visits
been noted that reluctant respondents occasion- to a sample of the nonrespondents at the end of the
ally differ from the more cooperative survey study, a more cost-effective method concentrates
respondents on sociodemographic characteristics, remaining resources on increasing response rates.
which may translate to significant differences in Additional examples of this approach are found in
the core analytic measures obtained from the sur- the General Social Survey, the National Comor-
vey (Stinchcombe et al. 1981; Cohen et al. 2000; bidity Survey Replication, the National Survey of
Lynn 2009). These differences are a key reason for America’s Families, and the National Survey of
continuing to spend resources following them. Recent College Graduates. Related efforts
Alternatively, findings from the European Social focused on subsampling callbacks to improve sur-
Survey and a number of state-level health-related vey efficiency have yielded mixed results, with
surveys in the USA suggest there are few statisti- trivial savings achieved in applications to the
cally significant differences between the sample National Comorbidity Survey, contrasted with
obtained before and after refusal conversion more cost-effective results attained in the
672 S. B. Cohen
American Community Survey. Adaptive survey designs have also been considered as a related framework to improve the efficiency of survey data collection through the application of more tailored data collection treatments for different households identified using paradata. A special case of an adaptive survey design is the responsive survey design, where alternative treatments or data collection strategies are identified (Groves and Heeringa 2006).

Survey Procedures to Facilitate Respondent Cooperation

In national survey efforts such as the MEPS, it is essential to achieve as high a response rate as feasible in order to reduce the potential error due to nonresponse bias that may affect the resulting survey estimates. The “tool chest” of methods to maximize survey participation and maintain cooperation across the multiple rounds of data collection is quite extensive and expensive to administer. The interviewers are often selected from the data collection organization’s pool of experienced staff. Location, previous interviewing experience, work samples, and language fluency are some of the key criteria used for selecting the interviewers and the supervisors. Due to the oversampling of Hispanics in several of these national surveys, a portion of the interviewers must be fluent in both English and Spanish. New household interviewers also receive intensive project-specific training and training in general interviewing techniques.

The households that are selected to participate are traditionally sent a notification letter which explains how they were selected for the survey along with a brief description of the survey. Field staff then call the household when a number is provided or attainable to further introduce the project and make an appointment to conduct the interview. Intensive follow-up efforts often are made by interviewers to contact persons not at home, to follow up broken appointments, and to convert refusals. The interviewers are provided with a variety of materials designed to explain the importance of the study and establish its legitimacy. Interviewers are generally required to record all contacts (in-person, telephone, by mail) that are made with the household and whether they were successful or not. Where appropriate, the conversion attempt may involve reassigning work to a more experienced interviewer.

Estimation of Health-Care Parameters

Development of Sampling Weights

Probability sampling is utilized in health-care surveys to permit the analysis of data from the sample to make inferences about the target population of interest. In order to derive unbiased national estimates of population parameters, the selection probability for each sampling unit must be incorporated into the estimation strategy. This is achieved through the introduction of sampling weights, which adjust for the differential probabilities of selection of the respective sampled units in the health-care survey. In this context, stratified, multistage, area probability samples allow for approximately unbiased estimation of health-care parameters at the national level, contingent on the application of sampling weights that reflect the sampled unit probabilities of selection into the sample. The sampling weight is defined as the inverse or reciprocal of a sample unit’s selection probability into the sample. For multistage sample designs, the weights will be specified as the inverse of the product of each sample unit’s stage-specific selection probability.

In a four-stage sample design typical of national household health-care surveys, the initial sampling weight for the k-th person in the j-th housing unit in the i-th sample segment in the h-th primary sampling unit (generally a county or group of contiguous counties) is specified as

$$W_{hijk} = \frac{1}{P_h \, P_{i|h} \, P_{j|hi} \, P_{k|hij}} = \frac{1}{P_{hijk}}$$

where $P_{hijk}$ is the selection probability for the k-th person in the j-th housing unit in the i-th sample segment in the h-th primary sampling unit; $P_h$ is the first-stage selection probability of
selecting the h-th primary sampling unit; $P_{i|h}$ is the survey sampling weights to correct for potential
second-stage conditional probability of selecting nonresponse bias, most often applied at the
the i-th segment, given the h-th primary housing-unit level. To facilitate these analyses,
sampling unit is selected; $P_{j|hi}$ is the third-stage the demographic, socioeconomic, health-related,
conditional probability of selecting the j-th hous- and interview-specific profiles of respondents
ing unit, given the i-th segment in the h-th primary and nonrespondents are examined, based on
sampling unit is selected; and $P_{k|hij}$ is the available data for both groups (Groves et al.
final-stage conditional probability of selecting the 2009). Based on the results of these analyses,
k-th individual, given the j-th housing unit in the i- weighting classes are specified to adjust for
th segment in the h-th primary sampling housing unit nonresponse. For illustrative pur-
unit is selected. poses, consider weighting classes defined by
Generally, $P_{k|hij} = 1$, as all members of a sam- cross-classifications of the following measures
pled household are selected to participate in the from the Medical Expenditure Panel Survey
survey with certainty. These sampling weights (Cohen et al. 1999):
may be interpreted as inflation factors to represent
the number of units in the target population asso- • Family income of primary reporting unit
ciated with the respective sample unit. (less than $10,000; $10,000–19,999;
$20,000–34,999; $35,000 or more; unknown)
• Size of dwelling unit (one, two, three, four,
Adjustments for Unit Nonresponse five, or more)
• MSA size (MSA, population 500,000 or more;
Once the data collection effort is concluded, care MSA, population less than 500,000; non-MSA)
must be taken to further adjust the survey unit • Region (Northeast, Midwest, South, West)
sampling weights to correct for survey non- • Employment classification of reference person
response. In general, the greater the difference (government, private sector, not in labor force/
among subgroups in response rates and the ana- never worked/worked without pay, unknown
lytic characteristic(s) of interest, the greater is the or under 18 years of age)
need to adjust survey weights for nonresponse. In • DU-level personal help measure (units with at
practice, weighting class nonresponse adjust- least one member unable to perform personal
ments are implemented under the assumption care activities or other routine needs, remaining
that nonresponding sampling units have units with person 70 and over, remaining units
responded in a manner similar to that of respon- with no limitations)
dents with similar sociodemographic and eco- • Propensity to cooperate, based on providing
nomic characteristics within the same adjustment phone number during NHIS (phone number
class. Properly designed, a weighting class non- provided, phone present but no number pro-
response adjustment strategy can result in reduced vided, no phone, unknown)
nonresponse bias. The technique requires that the • Age of reference person (under 25, 25–34,
sample be partitioned into mutually exclusive and 35–44, 45–64, 65 and over)
exhaustive classes, with classification information • Race/ethnicity of reference person (Hispanic,
available for both responding and nonresponding black non-Hispanic, other)
units that are correlated with response propensity • Sex of reference person
and the core criterion variables of the study (Cox • Marital status (married, spouse present, other)
and Cohen 1985).
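The two steps described so far, the initial sampling weight as the reciprocal of the product of the stage-specific selection probabilities and the weighting class adjustment for unit nonresponse, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular survey: the function names, the dictionary keys (`weight`, `cell`, `eligible`, `responded`), and the toy data are hypothetical. The adjustment factor for each class is taken to be the weighted total of eligible units divided by the weighted total of responding units.

```python
from collections import defaultdict


def base_weight(stage_probs):
    """Initial sampling weight: inverse of the product of the
    stage-specific selection probabilities."""
    product = 1.0
    for p in stage_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("selection probabilities must lie in (0, 1]")
        product *= p
    return 1.0 / product


def nonresponse_adjusted_weights(units):
    """Weighting class adjustment for unit nonresponse.

    Each unit is a dict with hypothetical keys: 'weight' (initial weight),
    'cell' (weighting class), 'eligible' and 'responded' (booleans).
    Respondents receive weight * (eligible total / respondent total) for
    their class; all other units receive 0.
    """
    eligible_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for u in units:
        if u["eligible"]:
            eligible_total[u["cell"]] += u["weight"]
            if u["responded"]:
                respondent_total[u["cell"]] += u["weight"]

    adjusted = []
    for u in units:
        if u["eligible"] and u["responded"]:
            factor = eligible_total[u["cell"]] / respondent_total[u["cell"]]
            adjusted.append(u["weight"] * factor)
        else:
            adjusted.append(0.0)
    return adjusted


# Toy data (illustrative only): four-stage design, last stage taken with certainty.
w = base_weight([0.5, 0.1, 0.2, 1.0])

units = [
    {"weight": 100.0, "cell": "A", "eligible": True, "responded": True},
    {"weight": 100.0, "cell": "A", "eligible": True, "responded": False},
    {"weight": 50.0, "cell": "B", "eligible": True, "responded": True},
]
adjusted = nonresponse_adjusted_weights(units)
```

In this toy example the respondent in class A carries the weight of the nonrespondent in the same class, so the adjusted weights still sum to the weighted total of eligible units. A class with no respondents would cause a division by zero here, which is one practical reason sparse cells are collapsed before adjustment.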
In national health-care surveys, analyses are Overall, C cells were identified based on cross-
conducted of characteristics associated with dif- classifications of these measures, with cell col-
ferential nonresponse. These analyses help iden- lapsing often specified according to a hierarchy
tify the most important measures to use in determined by significance level to ensure ade-
developing a nonresponse adjustment to the quate sample representation of the cell. Following
this approach, the nonresponse adjustment for the c-th weighting class takes the form

$$B(c) = \frac{\sum_{i \in c} E(i)\,\mathrm{DUPSWT}(i)}{\sum_{i \in c} R(i)\,\mathrm{DUPSWT}(i)}$$

where DUPSWT(i) is the initial housing unit weight for the i-th sample housing unit, which reflects the reciprocal of the housing unit’s overall selection probability for the sample survey, E(i) = 1 for all survey housing units selected for interviews, E(i) = 0 otherwise, R(i) = 1 for all selected housing units responding in the survey, R(i) = 0 otherwise, and $i \in c$ represents eligible housing units classified in weighting class c. Consequently, the estimation weight adjusted for the respective survey’s housing unit nonresponse, WGTHU1(i), for the i-th housing unit associated with class c, takes the form WGTHU1(i) = B(c) DUPSWT(i). Generally, survey participation is an all or none decision for the entire household, so surveys that interview all members of sampled households will assume this nonresponse adjusted household sampling weight, WGTSP1(i) = WGTHU1(i). Alternatively, when there is differential nonresponse within households, an additional weighting class adjustment should be implemented to correct for this additional level of person-level nonresponse in the survey.

Detailed studies of unit nonresponse in national health-care surveys have revealed that survey nonrespondents were more likely to consist of smaller households, reside in metropolitan areas, and have higher incomes.

Adjustments for Survey Attrition

Some of the large annual national health-care surveys also are characterized by a longitudinal design. The data collected in these ongoing longitudinal surveys may be designed to permit studies of the determinants of health insurance coverage and the use of health services and expenditures over time and to identify changes in the provision of health care in relation to social and demographic factors such as employment or income, the health status and satisfaction with health care of individuals and families, and the health needs of specific population groups such as the elderly and children. In longitudinal survey designs with multiple rounds of data collection, the overall survey response rate is a multiplicative function of the round-specific response rates. In addition to adjusting for survey nonresponse at the first round of a longitudinal survey with multiple rounds of data collection, additional adjustments to the estimation weights are necessary to help mitigate the potential influence of survey attrition on bias in estimates. When the rate of partial response is modest, it is often preferable to treat the partial respondents as complete nonrespondents. In this case, an additional weighting class adjustment to the survey estimation weight to control for survey attrition is appropriate. For example, if a survey required three rounds of data collection to obtain calendar year information for the population, the first-round person-level estimation weights would be adjusted for survey attrition in the following manner:

$$\mathrm{WGTSP2}(i) = F(c)\,\mathrm{WGTSP1}(i)$$

for the i-th person associated with class c, where the nonresponse adjustment for the c-th weighting class takes the form

$$F(c) = \frac{\sum_{i \in c} E(i)\,\mathrm{WGTSP1}(i)}{\sum_{i \in c} R(i)\,\mathrm{WGTSP1}(i)}$$

and WGTSP1(i) is the round 1 nonresponse adjusted person-level weight for the i-th round 1 respondent; E(i) = 1 for all round 1 respondents with positive values of WGTSP1(i); E(i) = 0 otherwise; R(i) = 1 for all persons with E(i) = 1 who responded for their entire period of eligibility in the calendar year covered by the survey over all three data collection rounds; R(i) = 0 otherwise; and $i \in c$ represents all full- and part-year respondents classified in weighting class c.

Often, a logistic regression analysis is used to identify the most important measures to include in specifying a nonresponse adjustment to the
estimation weights in a longitudinal survey to Post-stratification Adjustments

correct for part-year response at the person level.
To illustrate the identification of weighting class To further improve upon the precision of the sur-
cells, c, consider cross-classifications of the fol- vey estimates obtained from a health-care survey,
lowing measures as of the initial round of the poststratification or stratification after sample
MEPS survey: selection is often employed to complement the
initial stratification imposed at the selection
• Round 1 interview classification (no initial stage. The methodology assumes the availability
refusal, initial refusal) of population control totals for the measures used
• Size of MEPS family (one, two, three, four, for poststratification or consideration of estimates
five, or more) of population control totals from a large national
• MSA (MSA, non-MSA) population-based survey with high levels of pre-
• Age (under 20, 20–29, 30–44, 45–64, 65 and cision in survey estimates. Additional gains from
over) poststratification arise as a consequence of
• Marital status of reference person (married, enhanced corrections for survey nonresponse
widowed, divorced, separated, never married). and undercoverage. To illustrate the application
of a poststratification adjustment to the survey
According to prior studies of survey attrition in estimation weights via a weighting class adjust-
a large-scale national longitudinal health-care sur- ment, the following procedure can be used:
vey, participants who initially refused to respond
in the survey were more likely to drop out of the survey in subsequent rounds, in addition to those residing in metropolitan areas. Furthermore, survey attrition was positively correlated with residing in a household with a large number of members, being elderly, never being married, and being without health insurance coverage.

$$\mathrm{WGTSP3}(i) = G(c)\,\mathrm{WGTSP2}(i)$$

for the i-th person associated with class c, where the poststratification adjustment for the c-th weighting class takes the form

$$G(c) = \frac{\mathrm{POPTOT}(c)}{\sum_{i \in c} \mathrm{WGTSP2}(i)}$$
Once adjustments for unit nonresponse and
survey attrition have been implemented, attention where WGTSP2(i) is the first-round nonresponse
must also be given to strategies to correct for item and attrition adjusted person-level weight for
nonresponse. Imputation techniques are then con- the i-th complete respondent and $i \in c$ represents
sidered to complete the data profiles for survey all full- and part-year respondents classified in
items to facilitate the derivation of survey esti- weighting class c, and the weighting class c is
mates. Models are developed to identify the best defined by cross-classifications of population con-
predictors of the criterion variables affected by trol totals POPTOT(c) obtained from the Current
item nonresponse. The predictors are then used Population Survey for the given year for the
to inform imputation strategies that serve to sub- following measures: Census region (Northeast,
stitute a value for the missing data. Standard var- Midwest, South, and West), MSA status (MSA,
iance estimation procedures applied to data sets non-MSA), race/ethnicity (Hispanic, Black but
that have implemented imputation strategies often non-Hispanic, Asian, and other), sex, age, and
underestimate the component of variance due to poverty status. In a complementary manner, post-
imputation. Consideration of multiple imputation stratification can also be implemented through
techniques, where each missing value is replaced iterative marginal adjustments cycling through the
by a set of plausible values, provides a framework respective population control totals for each mea-
to adjust the variances of survey estimates for sure, also known as “raking.”
imputation and also help minimize the bias in Once all these adjustments are made to
survey estimates attributable to item nonresponse improve the accuracy of survey estimates, differ-
(Rubin 1987). ences in survey estimates derived from alternative
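As an illustration of how multiple imputation carries the imputation uncertainty into the variance of a survey estimate, the sketch below applies the standard combining rules usually attributed to Rubin (1987): the point estimate is the average over the completed data sets, and the total variance is the average within-imputation variance plus an inflated between-imputation variance. The function name and the numbers are hypothetical. Each within-imputation variance is assumed to have been obtained already with a design-appropriate method.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine results from m completed (imputed) data sets.

    estimates: point estimates, one per imputed data set
    variances: their estimated sampling variances (same order)
    Returns (combined estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Hypothetical results from three imputed data sets.
estimate, total_variance = combine_imputations([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
```

Here the total variance exceeds the average within-imputation variance, which is the component that single-imputation analyses tend to omit.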
survey sources may still occur. Several factors can Statistical software packages that are com-
contribute to differences in estimates of health- monly used to estimate standard errors from com-
care parameters across surveys. These factors plex multistage designs using the Taylor series
include survey content and questionnaire design, linearization method include SAS ® (version 8.2
definitions of the criterion measures, survey or higher), SUDAAN ®, Stata ®, and SPSS ® (ver-
design and methods, and post-data collection pro- sion 12.0 or higher). The software packages vary
cessing such as editing, imputation, and estima- with respect to the specific types of estimates and
tion techniques. Survey design features such as models that can be produced accounting for the
length of recall period, sample design, and complex survey design and the treatment of miss-
response rates affect the accuracy and precision ing data. For complete information on the capa-
of survey estimates of coverage. Alternative bilities of each package, analysts need to refer to
methodologies for editing the survey data, impu- the appropriate software user documentation man-
tation procedures, and adjustments for survey uals. The websites for SAS, SUDAAN, Stata, and
nonresponse can also affect the final survey esti- SPSS are http://www.sas.com, http://www.rti.org,
mates that are generated. In addition, estimates http://www.stata.com, and http://www.spss.com,
within and across surveys differ depending on respectively. The R language also has a package
the duration of the time period that the survey for complex survey analysis. Information on this
estimates cover. package can be found in the June 2003 R News
newsletter available on the R website at http://
www.r-project.org.
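The post-stratification adjustment discussed in this section, and its iterative marginal variant known as raking, can be sketched as follows. This is a simplified illustration under stated assumptions, not the production procedure of any survey: the function names, the data layout, and the control totals are hypothetical, every class is assumed to contain at least one respondent, and the marginal control totals are assumed to be mutually consistent.

```python
def poststratify(weights, cells, control_totals):
    """Scale weights so that each class sums to its population control total."""
    sums = {}
    for w, c in zip(weights, cells):
        sums[c] = sums.get(c, 0.0) + w
    return [w * control_totals[c] / sums[c] for w, c in zip(weights, cells)]


def rake(weights, margins, controls, max_iter=100, tol=1e-8):
    """Iterative marginal adjustment (raking).

    margins:  dict mapping a measure name to a list of category labels, one per unit
    controls: dict mapping a measure name to {category: population control total}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for name, labels in margins.items():
            # Current weighted total in each category of this measure.
            sums = {}
            for wi, lab in zip(w, labels):
                sums[lab] = sums.get(lab, 0.0) + wi
            # Scale every unit so the category totals match the controls.
            for idx, lab in enumerate(labels):
                factor = controls[name][lab] / sums[lab]
                max_change = max(max_change, abs(factor - 1.0))
                w[idx] *= factor
        if max_change < tol:
            break
    return w


# Toy data (illustrative only).
start = [10.0, 10.0, 10.0, 10.0]
region = ["NE", "NE", "S", "S"]
sex = ["F", "M", "F", "M"]

ps_weights = poststratify(start, region, {"NE": 60.0, "S": 40.0})
raked = rake(
    start,
    {"region": region, "sex": sex},
    {"region": {"NE": 60.0, "S": 40.0}, "sex": {"F": 50.0, "M": 50.0}},
)
```

Post-stratification to a full cross-classification needs a control total for every cell, whereas raking only needs the marginal totals, which is why it is attractive when the cross-classified controls are unavailable or sparse.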
Variance Estimation Considerations Standard errors for these national survey esti-
mates are most accurate when the analytic file
To obtain accurate estimates from complex sur- contains all of the sample persons (e.g., those
vey data, for either descriptive statistics or more with positive values for the person weight vari-
sophisticated analyses based on multivariate able) and the appropriate syntax is used to analyze
models, the survey design complexities need to population subgroups. The table above provides
be taken into account. This is achieved by apply- examples of basic programming code for SAS,
ing the survey estimation weights to produce the SUDAAN, Stata, and SPSS to generate estimates
survey estimates and using an appropriate tech- from MEPS person-level files for the survey var-
nique to derive standard errors associated with iable that measures annual health-care expendi-
the weighted estimates. Several methods for esti- tures, totexp (Table 3).
mating standard errors for estimates from com-
plex surveys have been developed, including the
Taylor series linearization method, balanced Integrated Survey Designs: Analytical
repeated replication, and the jackknife method. Enhancements Achieved through
The national health-care survey public use files the Linkage of Surveys
generally include variables to obtain weighted and Administrative
estimates and to implement a Taylor series and Secondary Data
approach to estimate standard errors for weighted
survey estimates. These variables, which jointly The analytical capacity, quality, and data content
reflect the underlying survey design, include the of household-specific health and health-care sur-
estimation weight, sampling strata, and primary veys are visibly enhanced through integrated
sampling unit (PSU) (Korn and Graubard 1999). designs that feature one-to-one data linkages
The documentation and codebook for the public between surveys, administrative and secondary
use files should contain these survey design vari- data, and future connectivity to electronic health
ables. For example, the documentation should records. The data linkages include direct matches
include the person weight, stratum, and PSU to additional health and socioeconomic measures
variables. acquired for the same set of sample units from
Table 3 Example of software codes for analysis of complex survey data

Package   Code
SAS       proc surveymeans; stratum varstr; cluster varpsu; weight perwt; var totexp;
SUDAAN    proc descript filetype=sas design=wr; nest varstr varpsu; weight perwt; var totexp;
Stata     svyset [pweight=perwt], strata(varstr) psu(varpsu)
          svymean totexp
SPSS      csplan analysis /plan file='filename' /planvars analysisweight=perwt /design strata=varstr cluster=varpsu /estimator type=wr.
          csdescriptives /plan file='filename' /summary variables=totexp /mean /statistics se.

Source: http://www.meps.ahrq.gov/mepsweb/survey_comp/standard_errors.jsp
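For readers who want to see what the packages in Table 3 compute, the following Python sketch estimates a weighted mean and its Taylor series linearization standard error for a stratified design with PSUs treated as sampled with replacement. It is a simplified illustration, not a substitute for the packages above: the function name and the toy records are hypothetical, and the tuple fields simply mirror the roles of varstr, varpsu, perwt, and totexp. The linearized value for each person is w(y - mean) / (sum of weights). These values are totaled within PSUs, and the between-PSU variation is accumulated within each stratum.

```python
import math
from collections import defaultdict


def weighted_mean_se(records):
    """Weighted mean and linearization standard error.

    records: iterable of (stratum, psu, weight, y) tuples.
    Assumes at least two PSUs in every stratum.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # PSU-level totals of the linearized variable, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    var = 0.0
    for stratum, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than two PSUs")
        z_bar = sum(totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return mean, math.sqrt(var)


# Toy data (illustrative only): two strata, two PSUs each, one person per PSU.
sample = [
    (1, 1, 1.0, 1.0),
    (1, 2, 1.0, 3.0),
    (2, 1, 1.0, 5.0),
    (2, 2, 1.0, 7.0),
]
est_mean, est_se = weighted_mean_se(sample)
```

With equal weights and one unit per PSU, the result reduces to the textbook stratified-sample variance of a mean, which gives a quick check on the arithmetic.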
other sources of survey specific or administrative data, in addition to linkage to existing secondary data sources at higher levels of aggregation (both geographic and organizational). One of the more pervasive uses of existing large-scale national surveys or administrative data bases is to serve as a sampling frame to facilitate a cost-efficient identification of an eligible survey population for purposes of sample selection, such as the consideration of the NHIS to serve as a sampling frame for the MEPS and Medicare administrative records to serve as a sampling frame for a survey of Medicare beneficiaries. Health surveys that are so linked to related surveys and/or administrative records from their inception benefit by this capacity for data supplementation that permits enhanced and more extensive analyses that are beyond the more constrained scope of the core health survey. In addition, the use of related surveys or administrative data as sampling frames for health-care surveys often permits enhanced longitudinal analyses when the host sampling frame and the core survey represent successive time intervals and share comparable data elements.

The large majority of the nationally representative population-based health surveys sponsored by the Department of Health and Human Services have also benefited by a capacity to link the survey data to county-level data on health service resources and health manpower statistics available on the Area Resources File (ARF). More specifically, the ARF is a county-specific health resources information system containing information on health facilities, health professions, measures of resource scarcity, health status, economic activity, health training programs, and socioeconomic and environmental characteristics. Geographic codes and descriptors are provided to enable linkage to health surveys to expand analyses conducted by planners, policymakers, researchers, and other professionals examining the nation’s health-care delivery system and factors that may impact health status and health care in the USA. Comparable enhancements to health surveys for supplementation of economic indicators are achievable through linkage of survey data to the socioeconomic indicators made available by the Bureau of the Census through the County and City Data Book and public use files from the decennial census.

Other examples of improved data quality, content, and analytic capacity include linkages between individuals in household-specific health-care surveys with the medical providers and facilities that treat them and with the employers that are the source of their health insurance coverage benefits. In terms of data quality, household reported medical conditions can be evaluated for accuracy relative to specific medical records on medical conditions for the same patient and specific health events. With respect to health-care expenditures collected from household respondents for their reported health-care events, available linked medical provider level data are a more accurate source of information. The availability of such supplemental data on use and expenditures allows for the conduct of methodological studies to evaluate the accuracy of household reported data and informs adjustment strategies to household data in the absence of provider-specific data to reduce bias attributable to response error. To the extent these linkages to provider and employer records include data for time periods beyond the scope of the household surveys, the linkage between survey and administrative data also permits enhanced longitudinal analyses (Cohen et al. 2005).
An Example of Survey Integration: The Medical Expenditure Panel Survey

One of the core health-care surveys in the USA, the Medical Expenditure Panel Survey (MEPS), is characterized by an integrated survey design. Since its inception, the primary analytical focus of the MEPS has been directed to the topics of health-care access, coverage, cost, and use. Over the past several years, the MEPS data have supported a highly visible set of descriptive and behavioral analyses of the US health-care system (Cohen et al. 2009). These include studies of the population’s access to, use of, and expenditures and sources of payment for health care, the availability and costs of private health insurance in the employment-related and non-group markets, the population enrolled in public health insurance coverage and those without health-care coverage, and the role of health status in health-care use, expenditures, household decision making, and health insurance and employment choices. As a consequence of its breadth, the data have informed the nation’s economic models and their projections of health-care expenditures and utilization. The level of the cost and coverage detail collected in the MEPS has enabled public and private sector economic models to develop national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy, as well as estimates of who benefits and who bears the cost of a change in policy. The MEPS consists of a family of three interrelated surveys: the Household Component (HC), the Medical Provider Component (MPC), and the Insurance Component (IC). The survey is sponsored by the Agency for Healthcare Research and Quality (AHRQ).

The MEPS Household Component was designed to provide annual national estimates of the health-care use, medical expenditures, sources of payment, and insurance coverage for the US civilian noninstitutionalized population. In addition to collecting data to yield annual estimates for a variety of measures related to health-care use and expenditures, MEPS also provides estimates of measures related to health status, demographic characteristics, employment, and access to health care. Estimates can be provided for individuals, families, and population subgroups of interest. The data collected in this ongoing longitudinal study also permit studies of the determinants of the use of services and expenditures and changes in the provision of health care in relation to social and demographic factors such as employment or income, the health status and satisfaction with health care of individuals and families, and the health needs of specific population groups such as the elderly and children.

The set of households selected for the Household Component is a subsample of those participating in the National Health Interview Survey (NHIS), an ongoing annual household survey of approximately 40,000 households (100,000 individuals) conducted by the National Center for Health Statistics and Centers for Disease Control and Prevention, to obtain national estimates of health-care utilization, health conditions, health status, insurance coverage, and access (Botman et al. 2000). In addition to the cost savings achieved by eliminating the need to independently list and screen households, selecting a subsample of NHIS participants has resulted in an enhancement in analytical capacity of the resultant survey data. The use of the NHIS data in concert with the data collected for the MEPS provides an additional capacity for longitudinal analyses not otherwise available. Furthermore, the large number and dispersion of the primary sampling units (~200) in MEPS has resulted in improvements in precision over prior expenditure survey designs.

The survey consists of an overlapping panel design in which any given sample panel is interviewed a total of five times in person over 30 months to yield annual use and expenditure data for two calendar years. These rounds of interviewing are spaced about 5–6 months apart. The interview is administered through a computer-assisted personal interview mode of data collection and takes place with a family respondent who reports for him/herself and for other family members. Currently, the MEPS sample consists of 14,000 families and 32,000 individuals and reflects an oversample of the following policy-relevant population subgroups:
Hispanics, blacks, and Asians. Data from two panels are combined to produce estimates for each calendar year.

The MEPS Medical Provider Component is a survey of the medical providers, facilities, and pharmacies that provided care or services to sample persons. The primary objective is to collect detailed data on the expenditures and sources of payment for the medical services provided to individuals sampled for the MEPS. Such data are essential to improve the accuracy of the national medical expenditure estimates derived from the MEPS, since household respondents are not always the most reliable source of information on medical expenditures. The data also serve as a primary imputation source of medical expenditure data to correct for the item nonresponse on this measure by the MEPS household sample participants.

The MEPS Insurance Component was designed to produce national and state-level estimates of the cost of employer-sponsored coverage. National, regional, and state-level estimates can be made of the amount, types, and costs of job-related health insurance. Interviews are conducted annually via mail with 30,000 establishments to obtain national and state-specific estimates of the availability of health insurance at the workplace, the type of coverage provided by employers, and the associated costs of coverage.

Advantages of Integrated Survey Designs

The original MEPS sample design called for an independent screening interview to identify a nationally representative sample and facilitate oversampling of policy-relevant population subgroups. Detailed information was to be obtained on sociodemographic, economic, and health status measures to support an oversample of the following policy-relevant groups:

• Adults (18 years and older) with functional impairments
• Children with limitations of activity
• Individuals 18–64 years who were predicted to incur high medical expenditures
• Individuals predicted to have family income less than 200% of the poverty level

Detailed probabilistic models were to be used to target the oversample of individuals likely to incur high levels of expenditures in addition to those with family incomes less than 200% of the poverty level. Data collection and training costs associated with this independent screening interview were projected to exceed several million dollars. As part of the DHHS Survey Integration Plan, this separate screening interview was eliminated. Instead, NHIS was specified as the sampling frame for MEPS. In addition to the cost savings achieved by substituting NHIS as the MEPS sample frame, the design modification resulted in enhanced analytic capacity of the resultant survey data. The use of the NHIS data in concert with the MEPS data provides an additional capacity for longitudinal analyses not available in the original design. Furthermore, the greater number and dispersion of the sample primary sampling units that comprise the MEPS national sample resulted in improvements in precision over the original design specifications. These features are in clear contrast to new frame construction and/or independent screening interviews that characterize unlinked survey design efforts.

The integrated survey design model also provides additional features with respect to improving data collection strategies tied to the core survey to better ensure that target response rates are achieved. When the core survey is linked to a larger host survey, the survey operations and field staff that are armed with detailed record of calls data from the host survey will be better poised to commit and target necessary nonresponse conversion techniques to those cases that included reluctant or hard to reach respondents in the prior data collection effort.

Capacity to Reduce Bias Attributable to Survey Nonresponse

As a consequence of the complex design of the MEPS HC, the MEPS sample data must be appropriately weighted to obtain
approximately unbiased national estimates for the US civilian noninstitutionalized population. The MEPS estimation weights are built from the estimation weights developed for the NHIS. The use of a sampling weight that has already incorporated the selection probabilities of the sample design and appropriate nonresponse and poststratification adjustments is an added feature of the integrated survey design. Since survey nonresponse is potentially a significant source of bias in survey estimates, the MEPS dwelling unit sampling weights included an adjustment to help reduce its potential for bias. In general, the greater the difference among subgroups in response rates and the analytic characteristic(s) of interest, the greater is the need to adjust survey weights for nonresponse. In the absence of an integrated survey design, the nonresponse adjustment strategy adopted for the MEPS would be constrained to sociodemographic and economic information that was available at the geographic level (e.g., county, state, division, and region), rather than the detailed information available for each household participant in the NHIS sample selected for the MEPS. This is typical of standard household surveys, which use aggregate data at the geographic level to inform the nonresponse adjustments (e.g., per capita income for the county based on secondary data available from the Census, physicians per 1,000 population, and other health manpower statistics at the county level available from the Area Resources File). In the absence of an integrated survey design for the MEPS, none of the household-specific information that factored into the nonresponse adjustments would be available, other than the measures of MSA size and region. Clearly, the MEPS linkage to the NHIS enhances the capacity to specify more direct nonresponse adjustments that better correct for survey nonresponse.

Another survey that benefits from this integrated design model is the Medicare Current Beneficiary Survey (MCBS) sponsored by the Centers for Medicare and Medicaid Services. The MCBS is a continuous, multipurpose survey of a nationally representative sample of aged, disabled, and institutionalized Medicare beneficiaries. It provides a comprehensive source of information on the health status, health-care use and expenditures, health insurance coverage, and socioeconomic and demographic characteristics of the entire spectrum of Medicare beneficiaries. Rather than being linked to a larger survey, the sample for MCBS is drawn from administrative records in CMS’s Medicare enrollment file. The Medicare enrollment files also provide mailing addresses for the sample. Medicare administrative files provide not only the sample frame but also service, diagnosis, and charge details for covered events, month-by-month information on enrollment status, payments for Medicaid buy-ins and HMO membership, and data for nonrespondents to the interview.

Linked Provider Data on Expenditures Improves the Accuracy of National Medical Expenditure Estimates in the MEPS

The MEPS Medical Provider Component (MPC) was primarily designed to reduce the bias associated with national medical expenditure estimates derived from household reported data. The estimation strategy that has been considered to support the data replacement strategy is comprehensive in nature, making full use of MPC data to correct for missing and poor-quality household reported expenditure data. In addition, it provides the basis for a recalibration of household reported data, if significant reporting differentials are observed in expenditure data between households and medical providers.

Integrated Design Expands Capacity for Longitudinal Analyses

The MEPS survey integration with the National Health Interview Survey (NHIS) permits an enhanced capacity for longitudinal analyses of trends in health-care utilization, coverage, access, and health status. The parallel structures of the two surveys make their integration for longitudinal
analyses easier to accomplish. To facilitate the conduct of longitudinal cohort analyses using the NHIS and MEPS data in tandem, NHIS/MEPS linkage files have been developed. These NHIS/MEPS linkage files allow users to link persons in the MEPS public use files to the records of the same persons in the previous NHIS public use files. Examples of enhanced longitudinal analyses based on the NHIS-MEPS linked files include studies of the long-term uninsured and the conduct of episodes of illness studies over an extended time interval.

Integrated Design of MEPS Facilitates Examination of Response Error

In addition to serving as the primary source for the expenditures in the MEPS, the design of the Medical Provider Component provides data that could potentially facilitate adjustments to household reported utilization data that correct for reporting errors (both underreporting and overreporting (telescoping errors)), under the assumption that the medical provider reports are the “gold standard.” Within a given event type, the number of reported events can be aggregated up to the person-provider pair level. The distribution of the difference in utilization counts between the medical provider and household reports can then be examined. For each event type at the person-provider level (ij), a difference measure, $\mathrm{DIFF}_{ij}$, may be computed, where:

$$\mathrm{DIFF}_{ij} = \mathrm{MPSCOUNT}_{ij} - \mathrm{HHSCOUNT}_{ij}$$

$\mathrm{MPSCOUNT}_{ij}$ = the number of events for the person-provider pair reported in provider survey
$\mathrm{HHSCOUNT}_{ij}$ = the number of events for the person-provider pair reported in household survey

The use of MPC data to develop adjustment factors that recalibrate or correct household reported data to reflect utilization counts based on MPC data offers a capacity to inform a utilization adjustment to correct for potential response error associated with household reports. While the development of adjustment factors that correct for both underreporting and overreporting of health-care utilization by household respondents is permissible, which would allow for household event counts to be either scaled down or up, based on reported or imputed MPS information, an alternative approach would be to limit the adjustment to correct the outlier cases (the poorest household reporters of utilization).

Constraints

It is important to note that several of the desired features of an integrated survey design are the sources of its most prominent limitations. As a consequence of acquiring more information on survey respondents through data augmentation and data linkages over time, these analytical enhancements also increase the potential for disclosure of confidential information. To guard against this, it is necessary to impose greater restrictions on the release of data to the public. The sponsorship and operation of a data center to ensure that confidential data are held in a secure environment while permitting more detailed analyses to be conducted with the nonpublicly available data offers a compromise between greater data access and achieving confidentiality protection of data. However, this investment in the development and operation of a secure data center requires additional funds that may compete with sample size enhancements or planned research efforts.

An integrated survey design also requires greater coordination across data sources and organizations. There are often competing demands on the host sample frames that may limit the full benefits of an integrated design from being realized. Furthermore, the enhanced longitudinal data that come with an integrated survey design will often be characterized by more frequent survey contacts and rounds of data collection, which will impact the overall survey response rate. When properly designed and coordinated, as implemented for the MEPS, the integrated survey design remains an attractive model for consideration and adoption.
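
As a rough illustration of the person-provider difference measure defined in the response error discussion, the following Python sketch counts events per person-provider pair in each source and subtracts the household count from the provider count. The record layout (dictionaries with `person_id` and `provider_id` keys), the function names, and the sample records are hypothetical and are not taken from the MEPS files. The sketch assumes the events have already been restricted to a single event type.

```python
from collections import Counter

def count_events(events):
    """Count events per (person_id, provider_id) pair."""
    return Counter((e["person_id"], e["provider_id"]) for e in events)

def diff_by_pair(provider_events, household_events):
    """Return DIFF = provider count minus household count for every pair
    that appears in either source (a missing pair counts as zero)."""
    mps = count_events(provider_events)
    hhs = count_events(household_events)
    return {pair: mps[pair] - hhs[pair] for pair in set(mps) | set(hhs)}

if __name__ == "__main__":
    # Hypothetical records for one event type (e.g., office visits).
    provider_reports = [{"person_id": 1, "provider_id": "A"}] * 3
    household_reports = (
        [{"person_id": 1, "provider_id": "A"}] * 2
        + [{"person_id": 2, "provider_id": "B"}]
    )
    for pair, diff in sorted(diff_by_pair(provider_reports, household_reports).items()):
        print(pair, diff)
```

Under the stated "gold standard" assumption, a positive value suggests household underreporting for that pair and a negative value suggests overreporting.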

Policy-Relevant Examples from the Medical Expenditure Panel Survey (MEPS)

Design of the MEPS to Inform Health Policy and Health Services Research

The MEPS research program, broadly defined to encompass data collection, data development, research, and the translation of research into practice, is directly tied to the strategic goal of identifying strategies to improve access, foster appropriate use, and reduce unnecessary expenditures. Few other surveys provide the foundation for estimating the impact of changes on different economic groups or special populations of interest, such as the poor, elderly, veterans, uninsured, and racial/ethnic groups. The public sector relies upon the MEPS research findings to evaluate health reform policies, the effect of tax code changes on health expenditures and tax revenue, and proposed changes in government health programs such as Medicare. In the private sector, these data are also used to develop economic projections.

The Medical Expenditure Panel Survey (MEPS), initiated in 1996, is designed as a continuous ongoing survey to permit annual estimates of health-care utilization, expenditures, insurance coverage, and sources of payment for the US civilian noninstitutionalized population. Over the past several years, the MEPS data and associated research findings have quickly become a linchpin for the nation’s economic models and their projections of health-care expenditures and utilization. This combination of breadth and depth of the data enables public and private sector analysts to develop economic models designed to produce national and regional estimates of the impact of changes in financing, coverage, and reimbursement policy, as well as estimates of who benefits and who bears the cost of a change in policy. Since 1977, AHRQ’s expenditure surveys have been an important and unique resource for public and private sector decision makers. The survey is unique in the level of detail of information obtained on the health-care services used by Americans at the household level and their associated expenditures (for families and individuals); the cost, scope, and breadth of private health insurance coverage held by and available to the US population; and the specific services purchased through out-of-pocket and/or third-party payments.

The MEPS data support a wealth of basic descriptive and behavioral analyses of the US health-care system. These include studies of the population’s access to, use of, and expenditures and sources of payment for health care, the availability and costs of private health insurance in the employment-related and non-group markets, the population enrolled in public health insurance coverage and those without health-care coverage, and the role of health status in health-care use, expenditures, household decision making, and health insurance and employment choices (Cohen et al. 2009; Cohen 2003).

Efforts to address inequities in the availability of private health insurance and to control health insurance premiums and medical care costs must necessarily focus on the employment-related health insurance market. Historically, the analyses of data from the MEPS family of surveys have figured prominently in this arena. The recent Institute of Medicine (IOM) report “Health Insurance is a Family Matter” notes that “the most comprehensive data on who uses what health-care service and how much is paid for those services comes from the Medical Expenditure Panel Survey.” MEPS-related analyses are prominently used to inform components of this IOM report focused on issues of insurance coverage and cost.

MEPS-derived estimates of the health insurance status of the US civilian noninstitutionalized population are critical to policymakers and others concerned with access to medical care and the cost and quality of that care. Health insurance helps people get timely access to medical care and protects them against the risk of expensive and unanticipated medical events. When estimating the size of the uninsured population, it is critical to consider the distinction between those uninsured for short periods of time and those who are long-term uninsured over several years. Compared to people with health-care coverage, uninsured people are less likely to visit a doctor,
have a usual source of medical care, receive preventive services, or have a recommended test or prescription filled. Consequently, individuals that experience extended periods of being uninsured are particularly at risk for restrictions in access to care and exposure to serious illness and significant financial jeopardy. Since many individuals undergo transitions in the acquisition and loss of health insurance coverage over time, an important consideration is the duration of spells of uninsurance and the capacity of this lack of coverage to lead to less efficient use of health-care services and facilities. In this regard, MEPS research efforts have demonstrated that individuals who experience short spells of being uninsured differ significantly from those who have been uninsured for more than a year on several dimensions, which include access to employer-sponsored coverage, their attitudes and preferences regarding the need for coverage, and their sensitivity to the cost of acquiring coverage. In addition to providing cross-sectional estimates of health insurance coverage each year, the MEPS has the added analytical capacity to identify individuals with gaps in coverage over time as well as the duration of the spells of being uninsured for up to 4 years.

In addition to measuring actual out-of-pocket financial burdens for health care, MEPS provides the only nationally representative data that can be used to measure the extent of underinsurance in the USA. Underinsurance is defined as being at risk of spending more than a certain amount of family income on out-of-pocket expenses in the event of a catastrophic medical illness. Estimates of the underinsured require linked information on families’ health insurance benefits, family income, and risk of experiencing catastrophic medical events.

With health care absorbing increasing amounts of the nation’s resources, the question of how to implement health system design innovations that encourage the provision of high-quality and efficient health-care delivery is a sentinel concern of both private and public payers. To effectively address this issue, researchers and policymakers have benefited from MEPS research findings to better understand how individual characteristics, behavioral factors, financial incentives, and institutional arrangements affect health-care expenditures in a rapidly changing health-care market. Research findings for the MEPS have also served to provide health-care decision makers with a better understanding of the highly concentrated nature of health-care expenditures and the persistence of these high expenditures over time. MEPS studies that examine the persistence of high levels of expenditures over time have been essential to help discern the factors most likely to drive health-care spending and the characteristics of the individuals who incur them.

Recently, greater attention and prioritization have been given to data collection procedures, predictive modeling, and estimation strategies that help improve the precision and quality of the survey estimates that characterize this policy-relevant population subgroup of individuals with high levels of medical expenditures. Research findings from MEPS also provide clear evidence of the utility and appropriateness of probabilistic models as prediction tools for identifying individuals likely to incur high levels of medical expenditures in future years. To the extent that this policy-relevant subset of the population is amenable to successful prediction through the application of well-developed models, the methodology continues to find several venues for application. Prominent examples of applications ripe for implementation include adoption of oversampling strategies for national health-care surveys and the identification of individuals whose health status improvements through disease management programs could most significantly result in potential reductions in overall future year health-care expenditures.

Given the growing attention focused on achieving a better understanding of the impact of rising prescribed medicine costs on health and the consumption of health services, it is also important to note the utility of the MEPS to inform studies examining the association between the use of newer medicines and morbidity, mortality, and health spending. Using this data resource, researchers have been able to determine the direction of the association between the use of newer drugs and all other types of nondrug medical spending. Attention has also focused on studies that identify inappropriate medication use, which
is a major patient safety concern and has significant consequences with respect to health-care costs. With its wealth of data on health conditions, prescribed medication utilization and expenditures, and associated therapeutic drug classifications, the MEPS data have also been helpful to researchers attempting to identify potentially inappropriate medication use in the community.

Issues on Measuring and Estimating Health Insurance Coverage in Surveys

Testing for the Impact of Survey Attrition on Health Insurance Coverage Estimates in the MEPS

The following study illustrates a test to assess the quality of the nonresponse adjustments employed in the MEPS to adjust for potential nonresponse bias attributable to survey attrition. The overlapping panel design of the MEPS survey is particularly well suited to inform these studies. This comparison of the stability of national estimates of health insurance coverage, subject to varying levels of survey attrition, made use of a model-based analysis that included additional controls for other predispositional factors. More specifically, a multivariate analysis was conducted to discern the influence of survey attrition on predicting the likelihood of being uninsured after controlling for sociodemographic and economic factors associated with this coverage measure. Building on previous research efforts that have identified salient factors associated with the presence or absence of health insurance coverage, a logistic regression model was developed from the subset of significant predictors that distinguished the uninsured from those with either public or private coverage (Cohen et al. 2006b; Cohen 2003).

Using data from the 2002 MEPS for individuals between the ages of 18 and 64, the following factors were determined to be significant correlates in distinguishing individuals likely to be uninsured for the entire calendar year from their counterparts with some coverage: age, gender, race/ethnicity, living in MSA, marital status, health status, presence of limitation in activity, level of education, poverty status, born in USA, and total health-care expenditures (Cohen 2003). Once these measures were controlled for in the logistic regression model, it was possible to determine whether an individual’s classification with respect to MEPS panel (year 1 of panel 7 vs. year 2 of panel 6), which varied significantly in terms of level of survey attrition, influenced the prediction of the likelihood of being uninsured in a calendar year. Under the assumption that the two distinct MEPS panels that are combined to produce annual survey estimates were characterized by the same survey response rates, one would not expect to observe a significant panel effect. Given the differing levels of nonresponse across MEPS panels, where the older panel is affected by greater levels of survey attrition, a test for a MEPS panel effect affords the opportunity to assess the influence of unadjusted components of survey attrition on health insurance coverage estimates in a modeling context. The results of the logistic regression analysis reveal no significant effect for MEPS panel classification in distinguishing the full-year uninsured individuals from their insured counterparts (Table 4), when testing at an alpha level of .05. These results serve to further reinforce the efficacy of the estimation strategies adopted in the MEPS to correct for the impact of survey attrition on health insurance coverage estimates and related model-based studies.

Analyses Based on NHIS to MEPS Linkage

In addition to the within-MEPS studies, the linkage of the MEPS to the NHIS permits a related set of analyses to be conducted to discern the impact of survey attrition on national estimates. The design permits appending to the MEPS sample the data profiles collected in the NHIS for the prior year. Using the NHIS data in concert with the restricted sample of MEPS respondents permits the derivation of national estimates for the prior year based on an NHIS subsample characterized by a lower response rate. Using this design feature, the national estimates derived from the MEPS sample, affected by survey attrition, may be compared to the national estimates obtained from the full NHIS, prior to its linkage to MEPS.

Table 4 Logistic regression analysis of the uninsured, testing for panel effects, US civilian noninstitutional population, ages 18–64, 2002

Contrast                         Degrees of freedom   Wald F   P-value (Wald F)
Overall model                    27                   137.91   0.0000
Model minus intercept            26                   72.69    0.0000
Panel classification             1                    1.08     0.2989
Sex                              1                    63.48    0.0000
Race/ethnicity                   3                    18.60    0.0000
Health status                    4                    3.52     0.0082
Limitation in activity           1                    14.71    0.0002
Marital status                   4                    20.59    0.0000
Highest year of education        4                    18.01    0.0000
Poverty status                   4                    62.87    0.0000
USBORN                           1                    83.88    0.0000
MSA status                       1                    4.02     0.0462
Income                           1                    34.34    0.0000
Total health-care expenditures   1                    34.91    0.0000

−2 * Normalized Log-Likelihood Full Model: 14,862.53
Pseudo Model R-Square: 0.175588
Source: Center for Financing, Access, and Cost Trends, AHRQ, Household Component of the Medical Expenditure Panel Survey, 2002
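
The panel-effect test summarized in Table 4 can be written compactly as follows. The notation is introduced here only for illustration and does not appear in the source: $Y_i = 1$ denotes a person who is uninsured for the full calendar year, $\mathrm{PANEL}_i$ is the indicator for MEPS panel classification, and $\mathbf{x}_i$ collects the remaining covariates in the model.

```latex
\operatorname{logit}\,\Pr(Y_i = 1)
  = \beta_0 + \beta_{P}\,\mathrm{PANEL}_i + \mathbf{x}_i^{\top}\boldsymbol{\beta},
\qquad
H_0\colon \beta_{P} = 0 .
```

With the values reported in Table 4 for panel classification (Wald F = 1.08 on 1 degree of freedom, P-value = 0.2989), $H_0$ is not rejected at the .05 level.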

In the National Health Interview Survey, three distinct measures of health insurance coverage are collected as part of the annual survey. These measures determine insurance coverage status at the time of the interview, whether there was a period of being uninsured during the 12-month time frame preceding the interview, and the likelihood of being uninsured for durations that exceed a year from the time of the interview. Each year, CDC’s National Center for Health Statistics releases national estimates of the uninsured based on these measures, determining the percent of the population uninsured at the time of the interview, uninsured for at least part of the past year, and uninsured for more than a year.

The cross-sectional nature of the NHIS design, and its status as the initial baseline interview for the MEPS, helps facilitate the achievement of a survey response rate that has often exceeded 90%. Given the nationally representative nature of the subsample of the NHIS used for the MEPS each year, one may produce national estimates of health insurance coverage using the NHIS measures for the reserved MEPS subsample (prior to the conduct of MEPS interviews) that are convergent with the estimates obtained from the full sample NHIS. Alternatively, national estimates based on the same NHIS measures from the linked full-year 2002 MEPS survey will be characterized by a response rate subject to three additional rounds of interviewing and associated sample attrition. A comparison of the health insurance estimates, based on the NHIS variables derived from the sample restricted to MEPS with the full sample NHIS national estimates, permits another assessment of the impact of survey attrition on the resultant health insurance coverage estimates.

Table 5 provides a summary of the national health insurance estimates derived from the NHIS for calendar year 2001. In addition to including the overall estimates of health insurance coverage for the nation, and for the population under age 65, the table includes further breakdowns distinguished by age groups <18 and 18–64. National estimates of these NHIS measures from the MEPS are derived from the MEPS full-year responding sample linked to the prior year NHIS. Based on the full sample 2001 NHIS, the national estimates of being uninsured by specific time periods for the entire US civilian noninstitutional population were 14.2% at the time of the interview, 17.8% for at least part of the past year, and 8.8% for being uninsured for more than a year since the time of the interview.

Table 5 Comparison of 2001 national estimates of the uninsured derived from the 2001 NHIS and the 2002 MEPS
Percent of uninsured individuals, civilian noninstitutionalized population (standard error)

                 2001 estimates derived from the NHIS              2001 NHIS estimates based on 2002 MEPS
Age group        Uninsured at   Uninsured for    Uninsured for     Uninsured at   Uninsured for    Uninsured for
                 time of        at least part    more than         time of        at least part    more than
                 interview      of the past year a year            interview      of the past year a year
All ages         14.2 (0.23)    17.8 (0.26)      8.8 (0.17)        13.9 (0.52)    17.9 (0.58)      8.9 (0.45)
Under 65 years   15.9 (0.25)    20.0 (0.29)      9.9 (0.19)        15.6 (0.57)    20.0 (0.64)      10.0 (0.50)
18–64 years      18.0 (0.26)    22.0 (0.28)      11.6 (0.21)       17.6 (0.62)    22.2 (0.70)      11.7 (0.55)
Under 18 years   10.9 (0.34)    15.1 (0.41)      6.0 (0.24)        10.7 (0.76)    14.9 (0.89)      5.8 (0.60)

Sources: Center for Financing, Access, and Cost Trends, AHRQ, Household Component of the Medical Expenditure Panel Survey, 2002; National Center for Health Statistics, CDC, National Health Interview Survey, 2001
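
The convergence of the two sets of estimates in Table 5 can be checked approximately with a two-sample z statistic. The sketch below is illustrative only. It treats the two estimates as independent, which is a simplifying assumption because the MEPS-linked sample is a subsample of the NHIS, and the source does not state which test was actually used. The function and variable names are hypothetical.

```python
import math

def z_statistic(est_a, se_a, est_b, se_b):
    """z statistic for the difference of two estimates, assuming independence."""
    return (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# (estimate, standard error) pairs for the "All ages" row of Table 5:
# first the full 2001 NHIS, then the NHIS measures for the 2002 MEPS sample.
ALL_AGES = {
    "at time of interview": ((14.2, 0.23), (13.9, 0.52)),
    "at least part of the past year": ((17.8, 0.26), (17.9, 0.58)),
    "more than a year": ((8.8, 0.17), (8.9, 0.45)),
}

CRITICAL_Z = 1.96  # two-sided critical value for an alpha level of .05

results = {}
for measure, ((nhis, nhis_se), (meps, meps_se)) in ALL_AGES.items():
    z = z_statistic(nhis, nhis_se, meps, meps_se)
    results[measure] = z
    print(f"{measure}: z = {z:.2f}, significant = {abs(z) > CRITICAL_Z}")
```

Under this approximation none of the three all-ages comparisons reaches the critical value, which is in line with the chapter's statement that no significant differences were found at an alpha level of .05.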

Restricting the sample to the full-year MEPS respondents for the subsequent year, the corresponding NHIS-specific national estimates of the uninsured were 13.9% at the time of the interview, 17.9% for at least part of the past year, and 8.9% for being uninsured for more than a year since the time of the interview. As can be observed from a review of the comparisons of the MEPS and NHIS-generated estimates of the uninsured, no significant differences in estimates are evident when testing at an alpha level of .05. A comparison of the NHIS-derived and the MEPS-derived coverage estimates for the population under age 65 and for age groupings <18 and 18–64 revealed similar levels of convergence. Once again, the results present no evidence of nonresponse bias attributable to survey attrition affecting the national coverage estimates when subject to more restrictive response rate requirements in MEPS.

The Utility of Prediction Models to Oversample the Long Term Uninsured

Estimates of the health insurance status of the US civilian population are critical to policymakers and others concerned with access to medical care and the cost and quality of that care. Health insurance helps people get timely access to medical care and protects them against the risk of expensive and unanticipated medical events. When estimating the size of the uninsured population, it is important to consider the distinction between those uninsured for short periods of time and those who are uninsured for several years. Given the risk of exposure to high out-of-pocket medical expenditures faced by the long-term uninsured and associated economic and health-related consequences, this population subgroup is of particular relevance to health policy considerations. Consequently, a prediction model that can accurately identify the long-term uninsured is an important analytical tool. These models have particular relevance as statistical tools to facilitate efficient sampling strategies that permit the selection of an oversample of individuals likely to be uninsured for long periods in the future. This discussion provides a summary of the development of prediction models to identify the long-term uninsured adults under age 65 and includes an evaluation of their potential utility as an oversampling strategy for use in the Medical Expenditure Panel Survey, a core national longitudinal medical care expenditure survey with comprehensive data on health insurance status. This type of modeling effort also enhances the ability to discern the causes of being uninsured and the characteristics of the individuals who are without
coverage. This feature also applies to prediction models that can accurately identify those individuals with transitions in coverage or with no gaps in coverage over a given time interval.

To improve the precision of survey estimates that characterize policy-relevant population subgroups in a cost-efficient manner, oversampling strategies are traditionally included as a core survey design component and implemented in the sample selection phase. When the characteristics of the population that are targeted for an oversample are static in nature, and the sampling frame that will be utilized contains the essential data to facilitate accurate identification of the respective target subpopulation, the underlying conditions permit a straightforward application of disproportionate sampling techniques. Alternatively, when the characteristic of the population targeted for an oversample is subject to transitions over time, the oversampling strategy is subject to much greater uncertainties in terms of achieving the desired sample size enhancements. The greater the departure from a static characteristic, the more challenging the effort and the less certain the outcome. Other obstacles that further limit the successful application of oversampling strategies relate to the level of availability of the key measures essential for the identification of the targeted population subgroup. Consequently, when attention is directed to an effort that attempts to increase the sample yield in a survey of individuals likely to be long-term uninsured in the future, the operation is subject to both constraints at its inception: (1) the focus on a characteristic that is subject to change and (2) a restricted set of predictor measures available on a sampling frame.

Analytical Framework: Model Development

Given the analytical and substantive importance of those individuals that are without health insurance coverage for extended periods of time (a given year or longer), the development and specification of accurate models to predict the future likelihood of the occurrence of this event are highly desirable. At the outset, the specification of a clear definition of what constitutes the long-term uninsured is critical. For this study, the ultimate objective was to develop the best model to predict the set of adults under the age of 65 who are without any health insurance coverage for two consecutive calendar years (Cohen and Yu 2009). With these parameters set, a logistic model specification was considered as most relevant for predicting the set of adults under age 65 most likely to be continuously uninsured for two consecutive calendar years. The longitudinal design of the MEPS, with two consecutive years of data on health-care coverage, use, and expenditures, was ideally suited to permit model development and evaluation.

The logistic model under consideration classified individuals without coverage for two consecutive calendar years as Y = 1, with all other individuals classified as Y = 0. Alternative definitions of the long-term uninsured, such as lacking coverage for more than a year or being continuously uninsured for more than 2 years, are likewise viable. All the predispositional variables included as potential predictors were based on an individual’s data profile prior to the 2-year period of interest. This modeling effort for predicting future health insurance coverage status builds on related efforts that were likewise limited to consideration of the immediate prior year’s predispositional characteristics.

Several studies using MEPS data have identified factors associated with distinguishing individuals most likely to be characterized as the long-term uninsured (Selden and Hudson 2006; Short and Graefe 2003). Given the rare classification of children under the age of 18 as long-term uninsured (only 2% of children were continuously uninsured over the period 2002–2005), the modeling effort was further restricted to adults between the ages of 18 and 64. The precursor information characterizes an individual’s status at a baseline period, which is defined as the year prior to the 2-year period of analytical focus and interest. In developing the prediction model, a core set of potential predispositional measures was identified that were applicable to health insurance take-up models and readily available from a screener interview. These included age, gender, race/ethnicity, health status, limitations
688 S. B. Cohen
in ability to work, marital status, education level (as measured by highest year of education completed), region, MSA status, presence of hospitalization, nativity in USA, family size, poverty status, and health insurance coverage status at time of screening (prior status). More specifically, the measure of prior coverage distinguished whether the individual was covered in the prior year or, if not, the period of time without coverage (<6 months, 6 months to <1 year, 1 year to <3 years, or 3 or more years).

As part of this study, three alternative prediction models were fit to the longitudinal data from the MEPS, the 2004–2005 panel linked to the 2003 NHIS, which served as both the MEPS sampling frame and screening interview. In this setting, Model 1 makes use of the full set of potential predictors that are available from the National Health Interview Survey for purposes of facilitating an oversample of individuals predicted to be long-term uninsured in the MEPS. To assess the performance of the fully specified model relative to a model based on a more restricted set of measures, two additional models were considered for comparative purposes. The second model (Model 2) is restricted to a single measure of one’s insurance status at baseline, further distinguished by length of time without coverage for those uninsured at baseline. From a survey operations perspective, the straightforward application and limited data requirements of this model have particular appeal. Alternatively, the third model (Model 3) replicates the set of measures considered for Model 1 with the exclusion of the insurance status measure at baseline.

Likelihood of Being in the Continuously Uninsured in 2004–2005, Based on 2003 Profiles

In the final logistic regression model developed for predicting adults between the ages of 18–64 likely to be continuously uninsured for two subsequent years, baseline health insurance status, race/ethnicity, marital status, education level (as measured by highest year of education completed), nativity in USA, income, and gender were determined to be significant predictors when testing at a .05 level of significance among the potential set of explanatory measures under consideration (Table 6). The standard errors of all the survey estimates derived from the MEPS in this study and associated test statistics have been adjusted for the impact of clustering due to the multistage survey design and unequal weighting.

Individuals with the longest durations of prior spells without coverage were significantly more likely to be continuously uninsured over the subsequent 2-year period. Hispanics, males, and individuals born outside of the USA were also more likely to be continuously uninsured in the future. Furthermore, low-income individuals, those with less than 12 years of education, those residing in the South or Midwest, and those who were never married in 2003 were associated with a greater likelihood of being classified as long-term uninsured for the period 2004–2005. Finally, a likelihood ratio test for the goodness of fit for this model rejected the null hypothesis that the model’s coefficients were jointly equal to zero. The pseudo-R2 for the model was 0.228, and it had the lowest Akaike information criterion (AIC = 4572.3). A receiver operating characteristic (ROC) analysis was also performed for each model, examining the area under the curve (AUC). The selected model also exhibited the highest AUC (.880).

Remarkably, the second model under consideration (Model 2), which was restricted to a single measure of one’s insurance status, further distinguished by length of time without coverage for those uninsured at baseline, exhibited a relatively comparable goodness of fit and a pseudo-R2 of 0.195 (Table 7). Alternatively, the third model (Model 3), which replicated the set of measures considered for Model 1 with the exclusion of the insurance status measure at baseline, exhibited a weaker goodness of fit and the lowest pseudo-R2 of 0.130.

Determination of the Cutoff Threshold in Predicted Probability to Facilitate Oversampling

Once these predictive models to identify the likelihood of being continuously uninsured have been developed, additional analyses are necessary to
Table 6 Logistic regression model to identify individuals aged 18–64 likely to be continuously uninsured in 2004–2005, based on 2003 profiles (2004–2005 MEPS, 2003 NHIS)

Independent variables and effects             Beta coeff.   SE beta    t-test (B = 0)   P-value (B = 0)
Intercept                                     2.30224       0.24615    9.35282          0.00000
Sex
  Female                                      0.52614       0.08397    6.26584          0.00000
Race/ethnicity recode
  Hispanic                                    0.53787       0.14531    3.70152          0.00027
  Non-Hispanic Black                          0.05864       0.17903    0.32753          0.74355
  Non-Hispanic Others                         0.36421       0.30747    1.18455          0.23737
Region
  Midwest                                     0.54791       0.18237    3.00437          0.00294
  South                                       0.86221       0.17992    4.79216          0.00000
  West                                        0.37245       0.20332    1.83179          0.06822
Marital status (MARITL)
  Married/DK                                  0.71323       0.11881    6.00321          0.00000
  Widowed/divorced/separated                  0.33652       0.14520    2.31754          0.02132
  Living w/partner                            0.32202       0.17812    1.80787          0.07188
Education (EDUCYR)
  12 years/GED                                0.19730       0.11301    1.74583          0.08212
  Some college/DK                             0.37671       0.12214    3.08426          0.00228
  BA/BS degree                                0.74324       0.19376    3.83588          0.00016
  Adv degree                                  0.91486       0.22327    4.09753          0.00006
Born in USA (USBORN)
  No/DK                                       0.58786       0.13694    4.29267          0.00003
Income (INCOME)
  $20K–$75K                                   0.38947       0.10632    3.66309          0.00031
  $75K+                                       0.72273       0.21587    3.34795          0.00095
How long since last had health coverage
  <= 6 months                                 1.72635       0.17993    9.59470          0.00000
  6 months–1 year                             2.00910       0.19350    10.38289         0.00000
  1–3 years                                   2.32315       0.14945    15.54488         0.00000
  3 years+/DK                                 2.93583       0.11909    24.65196         0.00000

Sample size: 8888
Pseudo-R2: 0.228
-2 * Normalized Log-Likelihood Full Model: 4528.34
Approximate chi-square (-2 * Log-L Ratio): 2298.75
Degrees of freedom: 21
Source: 2004–2005 Medical Expenditure Panel Survey, Center for Financing, Access and Cost Trends, Agency for Healthcare Research and Quality; 2003 NHIS, NCHS, CDC
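
The fit statistics printed under Table 6 can be cross-checked against the values quoted in the text. The sketch below is illustrative only and rests on two assumptions that the chapter does not state: that the reported pseudo-R2 is of the Cox-Snell form, 1 - exp(-chi-square / n), and that the AIC counts 22 parameters (the 21 degrees of freedom plus the intercept). Under those assumptions the printed figures reproduce the 0.228 and 4572.3 cited for Model 1.

```python
import math


def cox_snell_r2(lr_chi2: float, n: int) -> float:
    """Cox-Snell pseudo-R2 from a likelihood-ratio chi-square and the sample size.

    ASSUMPTION: the chapter does not name its pseudo-R2; this form is used
    only because it matches the printed values.
    """
    return 1.0 - math.exp(-lr_chi2 / n)


def aic(neg2_loglik: float, n_params: int) -> float:
    """Akaike information criterion: -2 log L plus twice the parameter count."""
    return neg2_loglik + 2 * n_params


# Fit statistics as printed under Table 6 (Model 1).
SAMPLE_SIZE = 8888
NEG2_LOGLIK_FULL = 4528.34
LR_CHI2 = 2298.75
DEGREES_OF_FREEDOM = 21  # slope terms; the intercept adds one more parameter

r2_model1 = cox_snell_r2(LR_CHI2, SAMPLE_SIZE)
aic_model1 = aic(NEG2_LOGLIK_FULL, DEGREES_OF_FREEDOM + 1)

print(f"pseudo-R2 = {r2_model1:.3f}")
print(f"AIC       = {aic_model1:.1f}")
```

Applying the same pseudo-R2 formula to the chi-square of 1932.69 printed under Table 7 gives roughly 0.195, which also agrees with the value reported for Model 2.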
identify the appropriate cutoff threshold in predicted probability for screening purposes to facilitate an oversample of this target population. To accomplish this determination of an operational cutoff point for each model, the predicted probabilities of being identified as continuously uninsured were determined for each sample individual based on the underlying model specification and rank ordered from highest probability to lowest. The predicted probability of being uninsured for two consecutive years in the future (P = exp(y)/(1 + exp(y))) was derived from a transformation of an individual’s predicted log odds (y) based on the respective prediction model under consideration. Based on MEPS longitudinal data, 12.9% of the US civilian
Table 7 Logistic regression model to identify individuals aged 18–64 likely to be continuously uninsured in 2004–2005, based on 2003 coverage profiles (2004–2005 MEPS, 2003 NHIS)

Independent variables and effects             Beta coeff.   SE beta    t-test (B = 0)   P-value (B = 0)
Intercept                                     2.98189       0.07370    40.45741         0.00000
How long since last had health coverage
  <= 6 months                                 1.99909       0.16837    11.87347         0.00000
  6 months–1 year                             2.33658       0.17966    13.00526         0.00000
  1–3 years                                   2.69189       0.13720    19.61994         0.00000
  3 years+/DK                                 3.48061       0.11050    31.49911         0.00000

Sample size: 8888
Pseudo-R2: 0.195
-2 * Normalized Log-Likelihood Full Model: 4894.40
Approximate chi-square (-2 * Log-L Ratio): 1932.69
Degrees of freedom: 4
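
To make the screening use of Model 2 concrete, the following sketch scores each baseline coverage category with the inverse-logit transformation P = exp(y)/(1 + exp(y)) and compares the result with the Model 2 cutoff of 0.268 reported in the text. It is a hedged illustration, not the authors' code. One assumption matters: the intercept is taken to be negative (-2.98189), on the reasoning that a positive intercept would imply that about 95% of adults covered at baseline become long-term uninsured. The "covered" reference category and all function names are likewise introduced here for illustration.

```python
import math

# ASSUMPTION: the intercept's sign is taken as negative; the table prints 2.98189.
INTERCEPT = -2.98189

# Coefficients as printed in Table 7; "covered" is the assumed reference category.
GAP_EFFECT = {
    "covered": 0.0,
    "<=6 months": 1.99909,
    "6 months-1 year": 2.33658,
    "1-3 years": 2.69189,
    "3 years+/DK": 3.48061,
}

MODEL2_CUTOFF = 0.268  # predicted-probability cutoff reported for Model 2


def inv_logit(y: float) -> float:
    """Transform log odds into a probability: exp(y) / (1 + exp(y))."""
    return 1.0 / (1.0 + math.exp(-y))


def logit(p: float) -> float:
    """Transform a probability into log odds."""
    return math.log(p / (1.0 - p))


def predicted_probability(status: str) -> float:
    """Predicted probability of being uninsured for the next two years."""
    return inv_logit(INTERCEPT + GAP_EFFECT[status])


def flagged_for_oversample(status: str) -> bool:
    """True when the predicted probability reaches the Model 2 cutoff."""
    return predicted_probability(status) >= MODEL2_CUTOFF


for status in GAP_EFFECT:
    p = predicted_probability(status)
    print(f"{status:16s} P = {p:.3f}  flagged = {flagged_for_oversample(status)}")

print(f"log odds at the cutoff = {logit(MODEL2_CUTOFF):.3f}")
```

Under these assumptions every adult uninsured at baseline falls above the cutoff and every covered adult falls below it, which is one way to read the operational simplicity the text attributes to Model 2.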
noninstitutionalized population between the ages 18 and 64 were continuously uninsured for the period 2004–2005. Consequently, the initial cutoff point for prediction classification was established by determining the value of the predicted probability above which the sum of the estimation weights associated with the MEPS sample participants represented the top 12.9% of the distribution of the ranked prediction probabilities of being long-term uninsured. As a consequence of the disproportionate sampling scheme adopted in the MEPS to facilitate oversampling of policy-relevant population subgroups, and additional adjustments to the estimation weights to adjust for nonresponse and poststratification, it was necessary to determine the cutoff point based on an estimated population-based distribution of predicted probabilities to ensure greater applicability of the approach beyond the MEPS setting. This cutoff translated to a predicted probability of 0.355 (or log odds of -0.598) based on the fully specified model (Model 1). Similarly, when considering the model that was restricted to a single measure of one’s insurance status at baseline (Model 2), the cutoff translated to a predicted probability of 0.268 (or log odds of -1.006). Alternatively, for the model which excludes the insurance status measure at baseline (Model 3), the cutoff translated to a predicted probability of 0.428 (or log odds of -0.290).

With respect to those who were long-term uninsured, 59.5% were correctly identified by the model, based on the initial cutoff rule applied to the Model 1 predicted likelihood (model sensitivity; Table 8). In addition, of those with some coverage over the 2-year period, 94.0% were correctly identified (model specificity), based on their predicted likelihood relative to the cutoff threshold. When examining predictive capacity, 59.5% of individuals predicted to be long-term uninsured were correctly classified by the model. It was also observed that when considering higher values for the threshold cutoff (top 10%, top 5%), the potential predictive capacity of the model in identifying the long-term uninsured increased (Table 8). Using the top 5% as the threshold, the percent of those predicted to be long-term uninsured who were correctly classified rose to 73.7%. However, this gain in model predictive capacity was at the expense of potential sample yield, given the greater restriction on the resultant eligible sample that fell above the threshold. When simultaneously considering model performance on predictive capacity, sensitivity, and specificity, while efficiently achieving accurate targeted yields from oversampling subject to fixed overall sample size constraints, adoption of the initial cutoff rule was the preferred approach. By establishing a cutoff rule in this manner, one has the capacity to implement a sample selection scheme permitting the oversampling of the long-term uninsured in “real time,” via a screening
Table 8 Examination of the sensitivity, specificity, and predictive capacity of alternative cutoff values – model 1

Likelihood of lower       Logit      Predicted
pred. prob. of long-      cutoff     probability                               True       False
term uninsured                       cutoff        Sensitivity   Specificity   positive   negative
0.8000                    -1.8167    0.1398        0.7350        0.8789        0.4730     0.0427
0.8100                    -1.6711    0.1583        0.7194        0.8882        0.4876     0.0446
0.8200                    -1.5156    0.1801        0.7042        0.8975        0.5037     0.0465
0.8300                    -1.3715    0.2024        0.6905        0.9068        0.5227     0.0480
0.8400                    -1.1730    0.2363        0.6723        0.9157        0.5411     0.0502
0.8500                    -0.9506    0.2788        0.6479        0.9236        0.5561     0.0534
0.8600                    -0.7902    0.3121        0.6275        0.9320        0.5771     0.0558
0.8700                    -0.6228    0.3491        0.5988        0.9392        0.5929     0.0594
0.8714                    -0.5975    0.3549        0.5951        0.9401        0.5948     0.0599
0.8800                    -0.4734    0.3838        0.5694        0.9463        0.6106     0.0630
0.8900                    -0.3303    0.4182        0.5335        0.9526        0.6244     0.0675
0.9000                    -0.1804    0.4550        0.4924        0.9579        0.6337     0.0726
0.9100                    -0.0699    0.4825        0.4531        0.9636        0.6481     0.0774
0.9200                     0.0788    0.5197        0.4180        0.9698        0.6719     0.0815
0.9300                     0.2582    0.5642        0.3765        0.9751        0.6912     0.0864
0.9400                     0.4694    0.6152        0.3317        0.9801        0.7112     0.0916
0.9500                     0.6293    0.6523        0.2869        0.9849        0.7371     0.0967
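
Each row of Table 8 pairs a weighted share of the population with the predicted probability at which that share is reached, and then reports how well that cutoff classifies. A minimal sketch of both steps follows, using plain Python and a small invented data set. The eight records, their weights, and the function names are hypothetical and serve only to show the mechanics; they are not MEPS data and this is not the authors' implementation.

```python
def weighted_top_share_cutoff(probs, weights, share):
    """Return the predicted probability at which the weighted top `share` is reached.

    Records are ranked from highest to lowest predicted probability and their
    estimation weights are accumulated until they cover `share` of the total.
    """
    total = sum(weights)
    running = 0.0
    for p, w in sorted(zip(probs, weights), reverse=True):
        running += w
        if running >= share * total:
            return p
    return min(probs)


def classification_rates(probs, weights, truth, cutoff):
    """Weighted sensitivity, specificity, and predictive capacity at a cutoff."""
    tp = fp = fn = tn = 0.0
    for p, w, y in zip(probs, weights, truth):
        flagged = p >= cutoff
        if flagged and y:
            tp += w
        elif flagged:
            fp += w
        elif y:
            fn += w
        else:
            tn += w
    return {
        "sensitivity": tp / (tp + fn),          # long-term uninsured who were flagged
        "specificity": tn / (tn + fp),          # others who were not flagged
        "predictive_capacity": tp / (tp + fp),  # flagged who were long-term uninsured
    }


# Hypothetical records: predicted probability, estimation weight, true status.
probs = [0.90, 0.70, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
weights = [1, 1, 2, 1, 1, 2, 1, 1]
truth = [1, 1, 0, 1, 0, 0, 0, 0]

cutoff = weighted_top_share_cutoff(probs, weights, share=0.30)
rates = classification_rates(probs, weights, truth, cutoff)
print(cutoff, rates)
```

Raising the cutoff in this toy example shrinks the flagged group, which mirrors the trade-off visible in the table: predictive capacity rises while sensitivity falls.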
Table 9 Required sample size of adults 18–64 to yield a sample of 1760 individuals continuously without health insurance coverage over 2 years (50% increase)

                                     No model-based   Model 1: fully     Model 2: single baseline   Model 3: excludes baseline
                                     oversample       specified model    coverage measure           coverage measure
Required overall sample size         15,000           11,058             11,028                     11,512
Oversampling rate                    N.A.             1.80               1.75                       2.58
Model prediction rate
  (% correct predictions)            N.A.             55.5%              57.1%                      38.8%

Assumes base sample size of 10,000 individuals aged 18–64 in a MEPS panel responding for their entire 2-year period of eligibility in the survey
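
The first column of Table 9 follows from simple proportional scaling, and the savings offered by each model can be read off relative to it. The sketch below reproduces that arithmetic from figures quoted in the chapter (a base sample of 10,000 containing 1173 long-term uninsured adults, and a 50% target increase). It does not attempt to reproduce the model-based sample sizes, whose derivation is not spelled out here; those are taken directly from the table. Variable names are illustrative.

```python
import math

BASE_SAMPLE = 10_000      # adults aged 18-64 responding for the full 2 years
BASE_YIELD = 1_173        # long-term uninsured adults in the base sample
TARGET_INCREASE = 0.50    # desired gain in long-term uninsured sample yield

# Target yield and the additional cases it implies.
target_yield = math.ceil(BASE_YIELD * (1 + TARGET_INCREASE))
additional_cases = target_yield - BASE_YIELD

# Without oversampling, the whole sample must grow in proportion to the target.
sample_without_oversampling = BASE_SAMPLE * target_yield / BASE_YIELD

# Required overall sample sizes as printed in Table 9.
table9_required = {"Model 1": 11_058, "Model 2": 11_028, "Model 3": 11_512}
reduction = {
    model: 1 - size / 15_000 for model, size in table9_required.items()
}

print(target_yield, additional_cases, round(sample_without_oversampling))
for model, share in reduction.items():
    print(f"{model}: {share:.1%} smaller than the 15,000 benchmark")
```

The unrounded proportional figure is slightly above 15,000, so the table value appears to be rounded. Each model-based design needs roughly a quarter fewer respondents than the benchmark.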
interview that collects the necessary input information required for the prediction model under consideration.

Examination of the Sensitivity, Specificity, and Predictive Capacity of Alternative Probabilistic Models

Once a parsimonious model was identified, which consisted of the subset of predictors that were all significant at the .05 level, the model was ready to be evaluated in terms of its accuracy in predicting those adults under age 65 who would be continuously without coverage for the subsequent 2-year period. The performance of the model was assessed based upon an independent representative sample that characterizes the nation’s health insurance coverage experience. In this setting, the design of the MEPS is uniquely suited to this more rigorous criterion to assess model performance. This condition was satisfied through development of the prediction model using data from one specific MEPS longitudinal panel and then applying the model to an
independent MEPS longitudinal panel to assess model performance. Since the model was developed using MEPS data from the 2004–2005 longitudinal panel, the model was then applied to a different MEPS panel to assess performance. In addition, the model’s performance was also assessed in relation to the two alternative prediction models under consideration (Model 2, prior coverage status only; and Model 3, no inclusion of prior coverage status).

Model performance was then evaluated based upon predictive capacity, sensitivity, and specificity, using the distinct predicted probability cutoff thresholds established with the 2004–2005 MEPS longitudinal panel for the three models. For the fully specified model (Model 1), the threshold cutoff point for selecting an oversample of long-term uninsured individuals was 0.355. Using the 2001 NHIS data in tandem with the associated model coefficients to derive a predicted probability of being continuously uninsured in 2002–2003, all individuals with a predicted probability of 0.355 or greater were targeted for an oversample. In the same manner, the threshold cutoff points established for Models 2 and 3 were 0.268 and 0.428, respectively. Based on MEPS longitudinal data, 11.7% of the US civilian noninstitutionalized population between the ages 18–64 were continuously uninsured for the period 2002–2003. Using the predetermined cutoff points for the respective models, the overall percent of the population predicted to be classified as long-term uninsured by Model 1 (11.6%) was most consistent with the population estimate of 11.7%, with both Models 2 and 3 yielding predicted population estimates below 10% based on the preestablished cutoff thresholds.

An assessment of the sensitivity of the alternative models in correctly identifying individuals likely to be continuously uninsured in 2002–2003 indicated that the logistic model that included prior year insurance coverage profiles in tandem with the significant sociodemographic predictors (Model 1) was superior. More specifically, Model 1 correctly identified 54.9% of those individuals who were continuously uninsured throughout 2002–2003. This was significantly better than the sensitivity of Model 3, the prediction model that included the same sociodemographic predictors but excluded prior year insurance coverage status. Model 3 only correctly identified 31.2% of the long-term uninsured. Surprisingly, the model that was restricted to only measuring the prior year’s health insurance coverage status (Model 2) was able to correctly identify 45.1% of the long-term uninsured. Generally comparable performance was observed when examining the alternative models with respect to specificity. Model 1 performed well with a specificity level of 94.1%, with Model 2 at 95.5% and Model 3 at 93.5%.

The next set of comparisons focused on the predictive capacity of the respective models, measured by the percent of individuals with predicted probabilities of being long-term uninsured above the threshold cutoff point who were correctly classified. More specifically, of those individuals predicted to be continuously uninsured throughout 2002–2003, Model 1 correctly predicted 55.5% of the target population. Again, this performance was significantly better than the predictive capacity of Model 3, the prediction model that included the same sociodemographic predictors but excluded prior year insurance coverage status. Model 3 only correctly predicted 38.8% of the target population. Alternatively, the model that was restricted to only measuring the prior year’s health insurance coverage status (Model 2) exhibited the best performance in predictive capacity, with a correct prediction rate of 57.1% for the long-term uninsured.

The final set of comparisons in model performance focused directly on the expected sample necessary to support a 50% increase in sample yield of individuals between the ages of 18–64 who are continuously uninsured over two consecutive calendar years. This enhanced sample size would yield significant improvements in the precision of survey estimates which characterize the long-term uninsured and associated population subgroups. The use of this metric facilitated an evaluation of the efficiency of a model-based oversampling strategy to yield the targeted sample, standardizing the comparison in terms of
sample size requirements under different model specifications. Using an assumption of a base sample requirement of 10,000 individuals aged 18–64 in a MEPS panel responding for their entire 2-year period of eligibility in the survey, the required sample size necessary to achieve a 50% sample size increase above the 1173 long-term uninsured survey participants was derived based on the estimated predictive capacity observed for the alternative models. This sample size specification calls for the inclusion of an additional 587 individuals with the characteristic, resulting in an overall target sample yield of 1760 individuals who are long-term uninsured in the survey.

A summary of the overall sample size requirements to achieve a target sample of 1760 chronically uninsured individuals aged 18–64 in the survey is provided in Table 9. Model 1 performed quite well in terms of the necessary overall sample size to meet the target for the policy-relevant population subgroup under consideration. A sample design without access to the predictor variables from a screening interview such as the NHIS or a design without application of oversampling techniques would require an overall sample of 15,000 adults ages 18–64 to achieve the target. Alternatively, all of the model-based oversampling strategies were substantially more effective than the constrained approach, each requiring a substantially lower overall sample to achieve the targeted sample. In addition, the expected overall sample size specification for the model-based oversampling approach inherent in Model 1 was substantially more modest (11,058) and significantly more efficient than the model which excluded a baseline measure of health insurance status (Model 3). Remarkably, the model-based oversampling strategy that required the lowest overall sample was the model that considered only a single baseline insurance coverage status measure (Model 2).

Summary

Policymakers, health-care leaders, and decision makers are particularly sensitive to recent trends in health-care costs, coverage, access, and health-care quality and are dependent on accurate, reliable national estimates of these health-care parameters to help inform policy and practice. Health-care surveys serve as a critical source of this essential information, and the descriptive and analytical findings they generate are key inputs to facilitate the development, implementation, and evaluation of policies and practices addressing health care and health behaviors. To ensure their utility and integrity, it is essential that these health-care surveys are designed according to high-quality, effective, and efficient statistical and methodological practices and optimal sample designs.

This chapter serves to illustrate several survey methods that enhance the performance and utility of health services research efforts. Attention has been given to the topics of sample and survey designs, nonresponse and attrition, estimation, precision, sample size determination, and analytical techniques to control for survey design complexities in analysis. Several of the topics that are featured in this chapter are further connected by their substantive focus on the measurement of trends in health-care costs, coverage, access, and health-care utilization. In addition to highlighting underlying survey operations, estimates, and outputs, the topics that have been covered also serve to identify potential enhancements that facilitate improvements in design, data collection, estimation strategies, and ultimately analytical capacity for health services research efforts.

A well-designed health-care survey imposes an interdependence between the survey sponsors, the survey designers, the associated statisticians and methodologists, the survey operations, field and management staff, the data processing staff, and the end users, who are primarily the health researchers, policymakers, and the public. The survey methods covered in this chapter should help serve as a roadmap to help realize and strengthen these connections. When all the essential health-care survey contributors work in concert, following the methods covered in this chapter, the overall quality and utility that is achieved in the conduct of health services research should be much greater than the sum of the individual successful components.
References

Botman S, Moore T, Moriarity C, Parsons V. Design and estimation for the National Health Interview Survey, 1995–2004. National Center for Health Statistics. Vital Health Stat. 2000;2(130).
Chromy J. Variance estimators for a sequential sample selection procedure. In: Krewski D, Rao J, Platek R, editors. Current topics in survey sampling. New York: Academic Press; 1981. p. 329–47.
Cochran W. Sampling techniques. New York: Wiley; 1977.
Cohen S. Methodology report #11: sample design of the 1997 Medical Expenditure Panel Survey household component. Rockville: Agency for Healthcare Research and Quality; 2000. http://www.meps.ahrq.gov/data_files/publications/mr11/mr11.shtml
Cohen S. Design strategies and innovations in the Medical Expenditure Panel Survey. Med Care. 2003;4(7):5–12.
Cohen S, Yu W. The utility of prediction models to oversample the long term uninsured. Med Care. 2009;47(1):80–7.
Cohen S, DiGaetano R, Goksel H. Methodology report #5: estimation procedures in the 1996 Medical Expenditure Panel Survey household component. Rockville: Agency for Healthcare Research and Quality; 1999. http://www.meps.ahrq.gov/data_files/publications/mr5/mr5.shtml
Cohen SB, Machlin S, Branscome J. Patterns of attrition and reluctant response in the 1996 Medical Expenditure Panel Survey. J Health Serv Outcome Res Methodol. 2000;1(2):131–48.
Cohen S, Ezzati-Rice T, Yu W. Integrated survey designs: a framework for nonresponse bias reduction. J Econ Soc Meas. 2005;30(2-3):101–14.
Cohen SB, Ezzati-Rice T, Yu W. The utility of extended longitudinal profiles in predicting future health care expenditures. Med Care. 2006a;44(5):45–53.
Cohen S, Ezzati-Rice T, Yu W. The impact of survey attrition on health insurance coverage estimates in a National Longitudinal Health Care Survey. J Health Serv Outcome Res Methodol. 2006b;6:111–25.
Cohen J, Cohen S, Banthin J. The Medical Expenditure Panel Survey: a national information resource to support healthcare cost research and inform policy and practice. Med Care. 2009;47(7):S1:44–50.
Cox B, Cohen S. Methodological issues for health care surveys. New York/Basel: Marcel Dekker; 1985.
Groves R, Heeringa S. Responsive design for household surveys: tools for actively controlling survey errors and costs. J R Stat Soc Ser A. 2006;169:439–57.
Groves R, Fowler F, Couper M, Lepkowski J, Tourangeau R. Survey methodology. 2nd ed. New York: Wiley; 2009.
Hansen M, Hurwitz W. The problem of nonresponse in sample surveys. J Am Stat Assoc. 1946;41:517–29.
Korn E, Graubard B. Analysis of health surveys. New York: Wiley; 1999.
Lynn P. Methodology of longitudinal surveys. Chichester: Wiley; 2009.
Madans J, Cohen S. Health surveys: a resource to inform health policy and practice. In: Health statistics in the 21st century: implications for health policy and practice. New York: Oxford University Press; 2005.
Rubin D. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
Selden T, Hudson J. Access to care and utilization among children: estimating the effects of public and private coverage. Med Care. 2006;44(5):19–26.
Short P, Graefe D. Battery-powered health insurance? Stability in coverage of the uninsured. Health Aff. 2003;22(6):244–55.
Stinchcombe A, Jones C, Sheatsley P. Nonresponse bias for attitude questions. Public Opin Q. 1981;45:359–75.
Stoop I, Billiet J, Koch A, et al. Improving survey response. Chichester: Wiley; 2010.
Vartivarian S, Jang S, Salvucci S, Kasprzyk D. Subsampling nonrespondents: Issue of calculating response rates. In: Proceedings of the section on survey research methods. Alexandria: American Statistical Association; 2006. p. 3796–8.
28 Two-Part Models for Zero-Modified Count and Semicontinuous Data
Brian Neelon and Alistair James O’Malley
Contents
Introduction
Two-Part Models for Zero-Modified Count Data
  Hurdle Models
  Zero-Inflated Count Models
  Regression Models for Zero-Modified Count Data
  Recent Developments
Two-Part Models for Semicontinuous Data
  Two-Part Regression Models for Semicontinuous Data
  Recent Developments
Model Fitting, Testing, and Evaluation
  Zero-Modified Count Models
  Semicontinuous Models
  Model Comparison and Assessment
Software
Conclusion
References
B. Neelon (*)
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
e-mail: brian.neelon@duke.edu

A. J. O’Malley
The Dartmouth Institute for Health Policy and Clinical Practice, Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
e-mail: Alistair.J.O'Malley@Dartmouth.edu

https://doi.org/10.1007/978-1-4939-8715-3_39

Abstract
Health services data often contain a high proportion of zeros. In studies examining patient hospitalization rates, for instance, many patients will have no hospitalizations, resulting in a count of zero. When the number of zeros is greater or less than expected under a standard count model, the data are said to be zero modified relative to the standard model. More precisely, the data are zero inflated if there is an overabundance of zeros, and zero deflated if there are fewer zeros than expected. A similar phenomenon arises with semicontinuous data,
696 B. Neelon and A. J. O’Malley
which are characterized by a spike at zero followed by a right-skewed continuous distribution of positive values. When dealing with zero-modified count and semicontinuous data, flexible two-part mixture distributions are often needed to accommodate both the excess zeros and the skewed distribution of nonzero values. A broad array of two-part models has been introduced over the past three decades to accommodate such data. These include hurdle models, zero-inflated models, and two-part semicontinuous models. While these models differ in their distributional assumptions, they each incorporate a two-part structure in which the zero and nonzero observations are modeled in distinct but related ways. This chapter describes recent developments in two-part modeling of zero-modified count and semicontinuous data and highlights their application in health services research.

Introduction

In health services research, it is common to encounter data with an abundance of zeros. For example, in studies examining outpatient clinic visits, patients who report no visits will be assigned a count of zero. Likewise, in studies examining the frequency of screening mammography, patients who have never received a screening mammogram will have a response value of zero. Count-valued outcomes, like those in the previous two examples, are typically modeled using discrete distributions, such as the Poisson or negative binomial distribution. However, there are times when the proportion of zeros is greater or less than what a standard count distribution would predict, and in such cases the data are said to be zero modified relative to an ordinary count model. A related phenomenon occurs with semicontinuous outcomes, such as medical expenditures, which are characterized by a point mass at zero (representing, say, no expenditures) followed by a right-skewed continuous distribution for the positive values (representing positive expenditures). When dealing with zero-modified count and semicontinuous data, parametric mixture distributions known as two-part models are typically needed to address both the abundance of zeros and the often highly skewed distribution of nonzero values.

Various two-part models have been developed in recent years to address zero-modified count and semicontinuous data, including hurdle models, zero-inflated models, and two-part semicontinuous models. While these models vary in terms of their distributional assumptions and parametric forms, they all incorporate an underlying, two-part structure in which the zero and nonzero observations are modeled through distinct (although sometimes overlapping) sets of parameters.

Sections “Two-Part Models for Zero-Modified Count Data” and “Two-Part Models for Semicontinuous Data” of this chapter provide overviews of zero-modified count and semicontinuous models, respectively. Section “Model Fitting, Testing, and Evaluation” discusses model fitting and evaluation strategies and highlights software packages commonly used to fit such models. The final section provides a summary, discusses potential limitations of two-part models, and points to directions for future research.

Two-Part Models for Zero-Modified Count Data

Zero-modified count data arise frequently in health services research. Consider, for example, a recent study by Neelon et al. (2012) examining emergency department (ED) visits in Durham, North Carolina, during the 2009 calendar year. Figure 1 presents a partial histogram of the visits up to ten visits. The actual number of visits per patient ranged from 0 to 95, with an average of 0.65 visit per patient. Nearly 70% of the patients made no ED visits during the year, 19% had exactly one visit, 5% had exactly two visits, and the remaining 6% had more than two visits.

Now, suppose one is interested in building a statistical model to describe these data. A first step might be to assume that the data were generated according to a Poisson distribution with mean
28 Two-Part Models for Zero-Modified Count and Semicontinuous Data 697
Fig. 1 Partial histogram of ED visits (up to ten visits). Horizontal axis: Number of Emergency Department Visits (0 to 10); vertical axis: Percent.
parameter μ = 0.65, the average number of ED visits in the sample. That is,

$$\Pr(Y = y) = \frac{0.65^{y}\, e^{-0.65}}{y!}, \quad y = 0, 1, \ldots, \tag{1}$$

where Y denotes the number of ED visits. Although this seems like an intuitive (albeit somewhat basic) modeling choice, the model is not especially compatible with the observed data. Under this model, for instance, one would expect 52% zeros and 34% ones, that is, far fewer zeros and more ones than were actually observed. When the number of zeros is greater than would be predicted under a standard count distribution, the data are said to be zero inflated relative to the standard model. Note that the abundance of zeros by itself is not necessarily problematic. For example, under a Poisson model with mean μ = 0.35, one would expect approximately 70% zeros as observed in Fig. 1. However, this same model would predict fewer than 1% of the counts to be greater than two, clearly in conflict with the 6% observed in the data. Ordinary count distributions, therefore, become problematic primarily when there is an abundance of zeros coupled with a longer than predicted right-tailed distribution of positive counts, since these features impose competing influences on the model. In the Poisson case, for example, the high proportion of zeros tends to lower the mean parameter, μ, while large nonzero values tend to increase it. The term “zero inflation,” then, is customarily used to describe data in which a high proportion of zeros, together with a skewed distribution of nonzero counts, leads to a poor-fitting standard count model. More generally, the term zero modification is used to encompass both zero inflation and zero deflation (i.e., fewer than expected zeros). In the presence of zero modification, special two-part mixture distributions are often needed to provide adequate fit to the data. This section reviews common two-part models for zero-modified count data.

Hurdle Models

The hurdle model (Mullahy 1986; Heilbron 1994) is a two-part mixture model consisting of a point mass at zero followed by a zero-truncated count distribution for the positive observations:
$$\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \quad 0 \le \pi \le 1 \\
\Pr(Y = y) &= \pi\, \frac{p(y; \theta)}{1 - p(0; \theta)}, \quad y = 1, 2, \ldots,
\end{aligned} \tag{2}$$

where π = Pr(Y > 0) is the probability of a nonzero response; p(y; θ) is an untruncated, or base, probability distribution with parameter vector θ; and p(0; θ) is the base distribution evaluated at 0. This can be written more compactly as

$$Y \sim (1 - \pi)\, 1_{(y = 0)} + \pi\, \frac{p(y; \theta)}{1 - p(0; \theta)}\, 1_{(y > 0)},$$

where 1(.) denotes the indicator function. Models such as the hurdle model are commonly referred to as “two-part” models because the zeros and nonzero counts are modeled separately, thereby accommodating zero modification. The expected value and variance of the hurdle model are given by

$$\begin{aligned}
E(Y) &= \eta = \frac{\pi \mu}{1 - p(0; \theta)} \quad \text{and} \\
V(Y) &= \eta(\mu - \eta) + \frac{\pi \sigma^{2}}{1 - p(0; \theta)},
\end{aligned} \tag{3}$$

where μ and σ² denote the mean and variance of the base distribution, respectively. In health services research, π is known as the utilization probability, i.e., the probability of using services at least once. When 1 − π = p(0; θ), the hurdle model reduces to its base distribution; when (1 − π) > p(0; θ), the data are zero inflated relative to the base distribution; and when (1 − π) < p(0; θ), there is zero deflation. In the extremes, π = 0 or 1. When π = 1, there are no zero counts, and the model reduces to a truncated count distribution; when π = 0, there are no users (i.e., all counts equal zero), and the model is degenerate at zero. Typically, one assumes that π is strictly between 0 and 1, so that there is a nonzero utilization probability for all individuals under study, and hence all subjects are viewed as “potential” users, even if some do not actually use services during the study period.

Perhaps the most common choice for the base distribution is the Poisson distribution, which gives rise to the Poisson hurdle model:

$$\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \quad 0 \le \pi \le 1 \\
\Pr(Y = y) &= \pi\, \frac{\mu^{y} e^{-\mu}}{y!\,(1 - e^{-\mu})}, \quad \mu > 0;\ y = 1, 2, \ldots,
\end{aligned} \tag{4}$$

where μ is the mean of the ordinary (i.e., untruncated) Poisson. When (1 − π) > exp(−μ), the data are zero inflated relative to an ordinary Poisson, and when (1 − π) < exp(−μ), there is zero deflation.

Alternative hurdle models can be formed by selecting different base distributions, such as the negative binomial, the generalized power series (Patil 1962; Ghosh et al. 2006) or the generalized Poisson distribution (Consul 1989; Gschlößl and Czado 2008). The negative binomial hurdle model, for example, is given by

$$\begin{aligned}
\Pr(Y = 0) &= 1 - \pi, \quad 0 \le \pi \le 1 \\
\Pr(Y = y) &= \frac{\pi}{1 - \left(\frac{r}{\mu + r}\right)^{r}}\, \frac{\Gamma(y + r)}{\Gamma(r)\, y!} \left(\frac{\mu}{\mu + r}\right)^{y} \left(\frac{r}{\mu + r}\right)^{r}, \quad r, \mu > 0;\ y = 1, 2, \ldots
\end{aligned} \tag{5}$$

The negative binomial base distribution is appealing if there is evidence of overdispersion relative to the ordinary Poisson, that is, a variance exceeding the mean. The mean and variance of the negative binomial base distribution are given by μ and μ(1 + μ/r), respectively; hence, (1 + μ/r) is a measure of overdispersion. As r → ∞ the negative binomial converges to a Poisson distribution with mean and variance equal to μ. The connection between the negative binomial and Poisson distributions goes even further, since the former can be derived as a Poisson-gamma mixture. In particular, if W | λ ~ Poi(λ) and λ ~ Ga(r, μ/r), then the marginal distribution of W is negative binomial with mean μ and variance μ(1 + μ/r). Thus, the gamma prior, or “mixing,” distribution for λ induces excess variation relative to the Poisson. More generally, it can be shown that hurdle models are more overdispersed than their base distributions if and only if (1 − π) > p(0; θ), since in this case V(Y)/E(Y) > σ²/μ, where Y is distributed according to Eq. 2 and μ and
σ² are the mean and variance of the base distribution, respectively. For example, the negative binomial hurdle distribution is more overdispersed than the ordinary negative binomial if (1 − π) > [r/(μ + r)]^r or, equivalently, π < 1 − [r/(μ + r)]^r, since [r/(μ + r)]^r is the negative binomial probability of a zero. As a corollary, it follows that the Poisson hurdle model is overdispersed relative to the ordinary Poisson if and only if (1 − π) > exp(−μ) and underdispersed when (1 − π) < exp(−μ). Thus, the Poisson hurdle model allows for both over- and underdispersion. Underdispersion arises when there are fewer zeros than expected under the ordinary Poisson model (Winkelmann 2008). As μ → ∞, the number of zeros expected under the ordinary Poisson model decreases, and the potential for underdispersion diminishes. For detailed discussions of over- and underdispersion in zero-modified count models, see Heilbron (1994), Gschlößl and Czado (2008), and Winkelmann (2008).

Zero-Inflated Count Models

Zero-inflated count models are two-part mixtures consisting of a degenerate distribution at zero and an untruncated count distribution. These include the zero-inflated Poisson (ZIP) model (Lambert 1992) and the zero-inflated negative binomial (ZINB) model (Green 1994; Mwalili et al. 2008). The ZIP model is given by

$$\begin{aligned}
\Pr(Y = 0) &= (1 - \phi) + \phi e^{-\mu}, \quad 0 \le \phi \le 1 \\
\Pr(Y = y) &= \phi\, \frac{\mu^{y} e^{-\mu}}{y!}, \quad \mu > 0;\ y = 1, 2, \ldots;
\end{aligned}$$

or, alternatively,

$$Y \sim (1 - \phi)\, 1_{(Z = 0)} + \phi\, \mathrm{Poi}(y; \mu)\, 1_{(Z = 1)}, \tag{6}$$

where Z is a (latent) indicator variable that takes the value 1 with probability ϕ. The mean and variance of the ZIP model are E(Y) = ϕμ and V(Y) = ϕμ[1 + (1 − ϕ)μ], respectively, and hence V(Y) > E(Y) and the model is overdispersed when ϕ < 1. When ϕ = 1, there is no zero inflation, and the model reduces to the ordinary Poisson with Pr(Y = 0) = exp(−μ). Conversely, when ϕ < 1, exp(−μ) < (1 − ϕ) + ϕ exp(−μ), and the zeros are inflated relative to an ordinary Poisson distribution. Thus, unlike the Poisson hurdle model, the ZIP model accommodates only zero inflation. In fact, because zero-inflated count models can be rewritten as hurdle models with mixing probability π = ϕ[1 − p(0; θ)] (Neelon et al. 2010), they can be viewed as special cases of hurdle models in which only zero inflation and overdispersion are allowed. As with hurdle models, other base distributions can be chosen to model the counts in zero-inflated models. For example, the ZINB model is given by

$$Y \sim (1 - \phi)\, 1_{(Z = 0)} + \phi\, \mathrm{NB}(y; r, \mu)\, 1_{(Z = 1)}.$$

For a comprehensive review of zero-inflated models, see Ridout et al. (1998).

Because each part of the mixture accommodates zeros, zero-inflated models such as the ZIP explicitly partition the zeros into two types: structural or ineligibility zeros (e.g., those that occur because a patient is ineligible for health services) and chance or sampling zeros (those that occur by chance among eligible patients). In the health services setting, the parameter ϕ is known as the eligibility probability, and hence the random variable Z can be viewed as an “eligibility” indicator taking the value 1 if an individual is eligible for services and 0 otherwise. In this context, the parameter μ represents the mean count among eligible subjects (i.e., given Z = 1). In other settings, such as infectious disease epidemiology, ϕ is known as the “at-risk” or “susceptibility” probability, i.e., the probability of belonging to an at-risk or susceptible population (Albert et al. 2011; Preisser et al. 2012). Note that the random variable Z is unobserved, since the observed outcome, Y, provides no direct information about individuals’ eligibility status, only whether they eventually used services as indicated by Y = 0 or Y > 0. If Z were actually observed (e.g., through an eligibility screening process), then ϕ could be estimated using the sample proportion of eligible patients and μ by fitting a count model to the subsample of those eligible. The fact that Z is unobserved means that it is not possible to condition on the eligible group, which, from a policy standpoint, may be the subpopulation of greatest interest. Fortunately, zero-inflated models allow one to estimate ϕ and μ even when Z is
unobserved, a topic discussed in greater detail in section “Model Fitting, Testing, and Evaluation.”

The choice between ZIP and hurdle models is dictated in large part by the aims of the investigator. If zeros can arise in only one way, then a hurdle model may be desirable. For example, in a study of outpatient service use, it may happen that patients either decline services, in which case Y = 0, or they use services one or more times, in which case Y > 0. Here, a hurdle model might reasonably capture the underlying distribution of the counts. In contrast, if patients only use services when they perceive themselves to be “at risk,” then zeros can arise in two ways: among those who are not at risk or among those who are at risk but nevertheless choose not to use services. In this case, a zero-inflated model would seem more appropriate. In some situations, the choice between models is not clear-cut. In these circumstances, Min and Agresti (2005) suggest that hurdle models might provide better fit if there is evidence of zero deflation among subgroups of the population (e.g., among nonsmoking males). Zero-inflated models, on the other hand, imply zero inflation at all covariate values.

Regression Models for Zero-Modified Count Data

Suppose interest lies in modeling the association between a set of predictors x (e.g., age, race, etc.) and a zero-modified response Y. Hurdle models can be extended to the regression setting by modeling each component of the model as a function of x:

$$\begin{aligned}
g[\Pr(Y_i > 0)] &= g(\pi_i) = x_i'\beta_1 = \beta_{10} + \beta_{11} x_{1i} + \ldots + \beta_{1p} x_{pi} \\
\ln(\mu_i) &= x_i'\beta_2 = \beta_{20} + \beta_{21} x_{1i} + \ldots + \beta_{2p} x_{pi}, \quad i = 1, \ldots, n,
\end{aligned} \tag{7}$$

where g(.) is a binary link function, such as the logit or probit link, $Y_i$ denotes the response for the i-th observation, $x_i$ is a p × 1 vector of predictors, and $\beta_1$ and $\beta_2$ are corresponding p × 1 vectors of regression coefficients for each component. Note that Eq. 7 models π = Pr(Y > 0) rather than 1 − π = Pr(Y = 0), since the former is typically of interest. Moreover, for simplicity, identical predictors are assumed for both parts of the model. In general, one might allow for unique predictors for the two components if the goal is to obtain a parsimonious model by removing extraneous variables in one component or if there is a priori scientific reason to believe that the two components are associated with unique sets of predictors.

Choosing a logit link for g(.) gives rise to the logistic hurdle regression model:

$$\begin{aligned}
\mathrm{logit}(\pi_i) &= \ln\!\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta_1 \\
\ln(\mu_i) &= x_i'\beta_2, \quad i = 1, \ldots, n.
\end{aligned} \tag{8}$$

Under model (8), the l-th regression coefficient, $\beta_{1l}$ (1 ≤ l ≤ p), represents the effect of a one-unit change in the l-th predictor, $x_{li}$, on the log odds of service utilization, adjusting for other predictors. The precise interpretation of $\beta_{2l}$ is less straightforward, since, conditional on Y > 0, the counts are modeled via a truncated distribution rather than an ordinary count distribution. Generally speaking, however, $\beta_{2l} > 0$ implies that the expected count among health services users increases as $x_{li}$ increases.

Zero-inflated regression models have a similar form:

$$\begin{aligned}
g[\Pr(Z_i = 1)] &= g(\phi_i) = x_i'\beta_1 \\
\ln(\mu_i) &= x_i'\beta_2, \quad i = 1, \ldots, n,
\end{aligned} \tag{9}$$

where $Z_i$ is the eligibility indicator for the i-th subject as defined in the previous section. Note that the first equation of (9) models $\phi_i$, the eligibility probability for the i-th individual, rather than the utilization probability, which is represented by $\pi_i = \phi_i[1 - p(0; \mu_i)]$. If a logit link is assumed for g, then $\beta_{1l}$ denotes the effect of a one-unit change in covariate l on the log odds of eligibility, while $\beta_{2l}$ represents the effect of a one-unit change in predictor l on the log-mean count given eligibility. Or, put another way, for every one-unit change in predictor l, the mean
count among eligible patients is multiplied by a factor of $\exp(\beta_{2l})$. The parameter $\exp(\beta_{2l})$ is commonly referred to as the incidence rate ratio, or IRR, for the eligible population. As Albert et al. (2011) and Preisser et al. (2012) note, it is more often of interest to make inferences about the entire population comprising both eligible and non-eligible individuals. Consider, for example, the simple case where the model includes a single dichotomous covariate, $x_i$, and a logit link is assumed for g in Eq. 9. In this case, the IRR representing the overall effect of $x_i$ in the entire population is

$$\mathrm{IRR} = \frac{E(Y_i \mid x_i = 1)}{E(Y_i \mid x_i = 0)} = \exp(\beta_{21})\, \frac{\exp(\beta_{11})\,[1 + \exp(\beta_{10})]}{1 + \exp(\beta_{10} + \beta_{11})}, \tag{10}$$

where $\beta_{10}$ is the intercept and $\beta_{11}$ is the coefficient for $x_i$ in the first component and $\beta_{21}$ is the coefficient for $x_i$ in the second component. When $\beta_{11} = 0$, the population IRR is equal to the IRR for the eligible population, $\exp(\beta_{21})$. As $\beta_{11}$ deviates from 0, naively interpreting the IRR for the eligible class as the overall IRR will lead to increasingly biased inferences. For a fuller discussion of this topic, including extensions to multiple categorical and continuous predictors, see Preisser et al. (2012).

A special case of the ZIP regression model is the ZIP(τ) model (Lambert 1992) whereby $\beta_1 = \tau\beta_2$ (−∞ < τ < ∞) in Eq. 9, implying that the covariate effects are proportional across model components. If a logit link is assumed for g, then $\phi_i = (1 + \mu_i^{-\tau})^{-1}$. As τ → −∞, the probability of observing a zero for the i-th subject increases, and as τ → ∞, the probability of a zero decreases (when $\mu_i > 1$). In many applications, the more parsimonious ZIP(τ) can lead to efficiency gains in parameter estimation compared to the ordinary ZIP model.

Heilbron (1994) developed a related zero-altered regression model that can be used to test for zero modification. The model assumes a single distribution for the i-th response, $Y_i$, but uses separate parameters for $\Pr(Y_i = 0)$ and $\Pr(Y_i = y_i \mid Y_i > 0)$. For example, the zero-altered Poisson (ZAP) model takes the form

$$\begin{aligned}
\Pr(Y_i = 0) &= e^{-\mu_{1i}}, \quad \ln(\mu_{1i}) = x_i'\beta_1 \\
\Pr(Y_i = y_i \mid Y_i > 0) &= \frac{\mu_{2i}^{y_i}\, e^{-\mu_{2i}}}{y_i!\,(1 - e^{-\mu_{2i}})}, \quad \ln(\mu_{2i}) = x_i'\beta_2.
\end{aligned} \tag{11}$$

If one sets $x_i'\beta_1 = \gamma + x_i'\beta_2$, then testing for zero modification reduces to testing γ = 0. When γ < 0, the data are zero inflated; when γ > 0, the zeros are deflated; and when γ = 0, the model reduces to a standard Poisson model. In this case, model (11) can be fit using a complementary log-log link for $\Pr(Y_i > 0) = v_i$:

$$\begin{aligned}
\mathrm{cloglog}(v_i) &= \ln[-\ln(1 - v_i)] = \gamma + x_i'\beta_2 \\
\ln(\mu_i) &= x_i'\beta_2.
\end{aligned} \tag{12}$$

For a general discussion of zero-altered models, including extensions of model (11), see Heilbron (1994).

Several authors have proposed zero-modified regression models for repeated measures and clustered count data. The most common approach is to incorporate random effects into the linear predictors for each part of the model. For example, Hall (2000) developed a repeated measure ZIP model that included a random intercept for the Poisson component. Yau and Lee (2001) later introduced uncorrelated random intercepts for both components of a hurdle model. Min and Agresti (2005) extended the approach to include correlated random intercepts for the two components. In particular, the logistic hurdle regression model with correlated random intercepts is given by

$$\begin{aligned}
\mathrm{logit}[\Pr(Y_{ij} > 0 \mid b_{1i})] &= \mathrm{logit}(\pi_{ij}) = x_{ij}'\beta_1 + b_{1i} \\
\ln(\mu_{ij}) &= x_{ij}'\beta_2 + b_{2i}, \quad j = 1, \ldots, n_i;\ i = 1, \ldots, n; \\
b_i &= (b_{1i}, b_{2i})' \sim N_2(0, \Sigma),
\end{aligned} \tag{13}$$
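Eq. 13 can also be read as a recipe for generating data, which is sometimes a useful check on one's understanding of the model. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the single standard-normal covariate, the parameter values, and the seed are all hypothetical choices made for illustration. Each subject receives a pair of random intercepts drawn from a bivariate normal with correlation `rho`; the first shifts the log odds of a nonzero count and the second shifts the log mean of the zero-truncated Poisson used for the positive counts.

```python
import math
import random


def draw_truncated_poisson(mu, rng):
    """Draw from a zero-truncated Poisson(mu) by inverting its CDF.

    Intended for moderate mu only; for very large mu, exp(-mu) underflows.
    """
    u = rng.random() * (1.0 - math.exp(-mu))  # probability mass on y >= 1
    y = 1
    p = mu * math.exp(-mu)                    # ordinary Poisson pmf at y = 1
    cum = p
    while cum < u and p > 0.0:                # p > 0 guards against rounding
        y += 1
        p *= mu / y                           # pmf recursion p(y) = p(y-1) * mu / y
        cum += p
    return y


def simulate_correlated_hurdle(n_subjects, n_obs, beta1, beta2,
                               sd1, sd2, rho, seed=1):
    """Simulate (subject, x, y) triples from a logistic-Poisson hurdle model
    with correlated random intercepts (b1, b2), in the spirit of Eq. 13.

    beta1 and beta2 are (intercept, slope) pairs for the binary and count
    parts; sd1, sd2, rho define the 2 x 2 covariance matrix of (b1, b2).
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_subjects):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b1 = sd1 * z1
        b2 = sd2 * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        for _ in range(n_obs):
            x = rng.gauss(0.0, 1.0)           # hypothetical covariate
            pi = 1.0 / (1.0 + math.exp(-(beta1[0] + beta1[1] * x + b1)))
            mu = math.exp(beta2[0] + beta2[1] * x + b2)
            y = draw_truncated_poisson(mu, rng) if rng.random() < pi else 0
            data.append((i, x, y))
    return data


if __name__ == "__main__":
    sim = simulate_correlated_hurdle(200, 5, (0.0, 0.5), (0.5, 0.3),
                                     sd1=1.0, sd2=1.0, rho=0.5)
    zeros = sum(y == 0 for _, _, y in sim)
    print("share of zeros:", zeros / len(sim))
```

Setting `rho` to 0 gives uncorrelated random intercepts, and setting `sd1` or `sd2` to 0 removes the random effect from that component, which mirrors the nested special cases of the correlated model.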
where $Y_{ij}$ is the j-th response for subject (or cluster) i, $x_{ij}$ is a corresponding vector of predictors for the ij-th observation, $b_{1i}$ and $b_{2i}$ are random intercepts for the i-th subject/cluster, and $N_2(0, \Sigma)$ denotes a bivariate normal distribution with mean 0 = (0, 0)′ and 2 × 2 variance-covariance matrix Σ. Higher dimensional random effects, such as random slopes, can be incorporated as well. The correlated random effects model is appealing if one believes that the process giving rise to a nonzero count is related to the expected count given Y > 0. For example, returning to the ED study presented at the beginning of the section, it might be reasonable to hypothesize that patients with a high propensity to use the ED at least once are also likely to make repeat visits given some utilization. In such cases, the correlated random effects model can lead to improved model fit over uncorrelated random effects, single random effect, and fixed-effects models, all of which arise as special cases of the correlated model. The correlated zero-inflated model has a comparable form to the hurdle model, but as noted in the previous subsection, the interpretation of the parameters differs. For overviews of zero-modified count models for repeated measures, see Min and Agresti (2005) and Neelon et al. (2010). For a more general discussion of count regression models, including zero-modified models, see Cameron and Trivedi (1998), Winkelmann (2008), and Zuur et al. (2012).

Recent Developments

Two-part count models have been adapted to cover a wide range of statistical applications, including latent growth curve models, finite mixture models, generalized additive models, variable selection methods, multivariate analysis, and spatial data analysis. For example, Liu (2007) developed a zero-inflated growth model that allows for correlated random intercepts and slopes for both components. Roeder et al. (1999), Dalrymple et al. (2003), and Min and Agresti (2005) developed finite mixture zero-modified models that cluster subjects into distinct classes defined by latent response trajectories. DeSantis and Bandyopadhyay (2011) proposed a two-state, hidden Markov ZIP model to analyze cocaine dependence, with hypothesized latent states corresponding to “high” or “low” cocaine use. Dobbie and Welsh (2001) and Hall and Zhang (2004) used generalized estimating equations (GEE) to fit population-average (or “marginal”) Poisson hurdle models. Fahrmeir and Osuna Echavarría (2006) developed a generalized additive ZINB model, using penalized splines to model nonlinear trends among the predictors. Lam et al. (2006) proposed a related semiparametric ZIP model. Hsu (2005) introduced a weighted ZIP (W-ZIP) model to predict the time to recurrence of colorectal polyps among patients randomized to high- and low-fiber diets. Buu et al. (2011) developed a variable selection method for ZIP models that allows for component-specific penalties. Williamson et al. (2007) derived power and sample size calculations for studies involving zero-inflated data. For time series analysis, Hasan and Sneddon (2009) developed first-order autoregressive (AR(1)) and moving average (MA(1)) ZIP models. More recently, Silva et al. (2011) proposed a ZIP model for quantitative trait loci (QTL) mapping.

Several authors have introduced zero-inflated models for the analysis of spatially correlated data. Agarwal et al. (2002) developed a spatial ZIP model that incorporated spatially correlated random effects into the Poisson component. Rathbun and Fei (2006) proposed a similar model in which the structural zeros were fitted using a spatial probit model. Ver Hoef and Jansen (2007) extended the approach to include distinct spatial random effects for both model components. Recently, Neelon et al. (2012) developed a spatial Poisson hurdle model for “areal-referenced” data in which the spatial units consist of aggregated regions of space, such as counties or Census tracts. They introduced spatial random effects for both components of the hurdle model and linked the random effects via a bivariate conditionally autoregressive (CAR) prior that induces dependence between the model components and provides spatial smoothing across neighboring regions. As such, their model can be viewed as a
spatial analogue to the correlated hurdle model given in Eq. 13.

There have been a number of other recent developments as well. These include zero-inflated binomial (ZIB) models (Hall 2000; Hall and Zhang 2004), pattern-mixture Poisson hurdle models for non-ignorable missing data (Hasan et al. 2009; Maruotti 2011), the k-ZIG model for extreme zero inflation (Ghosh et al. 2012), zero-inflated generalized Poisson (ZIGP) models (Gschlößl and Czado 2008; Gupta et al. 1996), zero-inflated power series models (Ghosh et al. 2006), and multivariate extensions of zero-inflated models (Li et al. 1999; Walhin and Bivariate 2001; Majumdar and Gries 2010; Arab et al. 2011). These recent applications highlight the growing use of two-part models for the analysis of complex zero-modified count data.

Two-Part Models for Semicontinuous Data

In many cases, the nonzero response distribution is continuous rather than count valued. Such data are commonly referred to as “semicontinuous” because they consist of a mixture of a degenerate distribution at zero and a right-skewed, continuous distribution for the nonzero values. As an illustration, consider Fig. 2, which shows the distribution of annual mental health expenditures among federal employees from a recent study by Neelon et al. (2011). Over 80% of the patients had no annual expenditures, depicted by the vertical line, while the remaining patients spent upward of 1000 USD during the study period. Other examples of semicontinuous data include medical costs (Manning et al. 1981; Duan et al. 1983; Cooper et al. 2003), hospital length of stay (Xie et al. 2004), health assessment scores (Su et al. 2009), and average daily alcohol consumption (Olsen and Schafer 2001; Liu et al. 2012). In some cases, such as days of hospitalization or questionnaire scores, the response is, strictly speaking, integer valued, but the domain is refined enough to be reasonably approximated by a continuous distribution.

As with zero-modified count data, semicontinuous data can be viewed as arising from two distinct stochastic processes: one governing the occurrence of zeros and the second determining the observed value given a nonzero response. The first process is commonly referred to as the “occurrence” or “binary” part of the data, and the second is often termed the “intensity” or “continuous” part. Two-part mixture models are an ideal choice for such data, since they explicitly accommodate both data-generating processes. A lognormal distribution is frequently chosen to model the nonzero values, giving rise to the Bernoulli-lognormal two-part model (Manning et al. 1981):

$$f(y) = (1 - \phi)\, 1_{(y = 0)} + \phi\, \mathrm{LN}(y; \mu, \sigma^{2})\, 1_{(y > 0)}, \quad y \ge 0,\ 0 \le \phi \le 1, \tag{14}$$

where ϕ = Pr(Y > 0), LN(y; μ, σ²) denotes the lognormal density evaluated at y, and μ and σ² denote the mean and variance of ln(Y | Y > 0). Note that Eq. 14 has the same two-part structure as the hurdle model given in Eq. 2 in the previous section and can therefore be viewed as a natural extension of the hurdle model to semicontinuous data. As with hurdle models, when ϕ = 0, the distribution is degenerate at 0; when ϕ = 1, there are no zeros and the distribution reduces to a lognormal density. Typically, one assumes that ϕ is strictly between 0 and 1, so that all individuals have a long-run guarantee of a nonzero value. Note that even if ϕ truly takes the value 0 for some subjects, model (14) cannot identify such individuals. That is, without further identifying assumptions, the model cannot differentiate the so-called never users from those who happened not to use services during the study period.

As in the count setting, alternative distributions can be used to model the positive values. For example, as part of a study examining inpatient medical expenditures, Manning et al. (2005) proposed a one-part generalized gamma distribution that encompasses the Weibull, exponential, and lognormal distributions as special cases. Building on this work, Liu et al. (2010) developed a two-part generalized gamma model for semicontinuous medical
Fig. 2 Distribution of annual mental health expenditures among federal employees. Horizontal axis: Spending ($), 0 to 900; vertical axis: Density.
costs. More recently, Liu et al. (2012) compared generalized gamma, log-skew-normal, and Box-Cox-transformed two-part models and found that the generalized gamma model provided superior fit in their analysis of daily alcohol consumption.

A related model is the Tobit model (Tobin 1958) in which the zeros represent the censoring of an underlying continuous variable Y* below a detection limit, L:

$$\begin{aligned}
Y &= 0 \quad \text{if } Y^{*} < L \\
Y &= Y^{*} \quad \text{if } Y^{*} \ge L.
\end{aligned} \tag{15}$$

Censoring from above and interval censoring are also allowed. Note that in the Tobit model, the zeros arise from censoring of Y* values that fall below L, whereas in two-part semicontinuous models, the zeros are valid observed responses (corresponding to, say, no medical expenditures). The Tobit and two-part models also differ in that the former assumes a single underlying distribution for the data, whereas the two-part model is a mixture of two separate generating processes: one for the zeros and one for the positive values. Recognizing these distinctions, Moulton and Halsey (1995) proposed a zero-inflated Tobit (ZIT) model that distinguished between censored zeros and “true” zeros. The model is a mixture of a point mass at zero and a Tobit model, such that with probability 1 − p, Y is set to 0, and otherwise Y is drawn from a Tobit distribution. Because the ZIT model accommodates two sources of zeros (true zeros and censored zeros), it can be viewed as a semicontinuous version of the zero-inflated count models described in section “Zero-Inflated Count Models.”

Two-Part Regression Models for Semicontinuous Data

Two-part models for semicontinuous data can be extended to the regression setting by incorporating predictors into each component of the model. For example, the Bernoulli-lognormal two-part regression model is given by

$$\begin{aligned}
f(y_i) &= (1 - \phi_i)\, 1_{(y_i = 0)} + \phi_i\, \mathrm{LN}(y_i; \mu_i, \sigma^{2})\, 1_{(y_i > 0)}, \quad \text{where} \\
g(\phi_i) &= g[\Pr(Y_i > 0)] = x_i'\beta_1 \quad \text{and} \\
\mu_i &= E[\ln(Y_i) \mid Y_i > 0] = x_i'\beta_2, \quad i = 1, \ldots, n.
\end{aligned} \tag{16}$$

When a logit link is assumed for g(·), the l-th regression coefficient, $\beta_{1l}$, represents the change in the log odds of a positive response per one-unit change in covariate $x_{il}$, adjusting for other predictors. Likewise, $\beta_{2l}$ represents the adjusted per unit change in the mean of $\ln(Y_i) \mid Y_i > 0$. Note that to convert from the log scale to the original response scale in part 2 of the model, one can
take $\exp(\beta_{2l})$, which denotes the multiplicative change in the median of $Y_i \mid Y_i > 0$ per unit change in $x_{il}$. Because the expected value of $Y_i \mid Y_i > 0$ is given by $\exp(x_i'\beta_2 + \sigma^2/2)$, inference involving the untransformed mean response entails estimation of both $\beta_2$ and $\sigma^2$. If the log-normality assumption fails, nonparametric methods can be used to estimate the untransformed mean in the continuous component of the model (Duan 1983). This topic is discussed in greater detail in section “Model Fitting, Testing, and Evaluation.”

Two-part regression models have also been used to analyze longitudinal and clustered semicontinuous data (Olsen and Schafer 2001; Tooze et al. 2002; Cooper et al. 2007). The most common approach is to introduce correlated random effects for each component, as in model (13) of section “Regression Models for Zero-Modified Count Data.” Assuming a logit link for $g(\cdot)$ and a lognormal distribution for the positive values leads to the logistic-lognormal correlated random effects model:

$$
\begin{aligned}
\operatorname{logit}(\phi_{ij}) &= \operatorname{logit}\left[\Pr(Y_{ij} > 0 \mid b_{1i})\right] = x_{ij}'\beta_1 + b_{1i} \\
\mu_{ij} &= E\left[\ln(Y_{ij}) \mid Y_{ij} > 0,\ b_{2i}\right] = x_{ij}'\beta_2 + b_{2i}, \quad j = 1, \ldots, n_i;\ i = 1, \ldots, n; \\
b_i &= (b_{1i}, b_{2i}) \sim N_2(0, \Sigma),
\end{aligned} \tag{17}
$$

where $Y_{ij}$ denotes the $j$-th response for the $i$-th subject (or cluster); $b_{1i}$ and $b_{2i}$ are correlated subject-specific random intercepts for the binary and continuous components, respectively; and $\Sigma$ is a $2 \times 2$ variance-covariance matrix. The model can be easily extended to include higher-order random effects.

As in the count setting, the correlated model is appealing if one believes that the process giving rise to the positive values is related to the observed value given a positive response. For example, in a study of hospital length of stay, patients who are likely to be admitted to the hospital may also tend to have longer stays than those with lower propensities for admission. This would imply a positive association between the probability of admission (component 1 of the model) and the length of stay given admission (component 2). Modeling the correlation between $b_{1i}$ and $b_{2i}$ directly accommodates the between-component association, thus providing a realistic characterization of the underlying data-generating process.

There are other advantages to modeling the between-component association, however. Most importantly, ignoring the between-component association can lead to biased estimates in the second part of the model (Su et al. 2009). To see this, consider the two-part lognormal model given in Eq. 17, which can be recoded in terms of two random variables:

$$
R = \begin{cases} 0 & \text{if } Y = 0 \\ 1 & \text{if } Y > 0 \end{cases}
\qquad
V = \begin{cases} \text{undefined} & \text{if } R = 0 \\ \log(Y) & \text{if } R = 1, \end{cases} \tag{18}
$$

where subscripts have been omitted to simplify notation. The random variable $R = 1_{(Y > 0)}$ can be viewed as a response indicator for the second component of the model.

Recall that the target population for the continuous part is the set of all subjects with positive responses, that is, those for whom $Y > 0$ (or equivalently for whom $R = 1$). Valid inferences can be achieved by selecting a random sample $V = \{V_1, V_2, \ldots, V_n\}$ from this target population. However, when some individuals have a response value of 0, a subset of $V$ is undefined. These undefined observations can be viewed as akin to missing data. If the two components are truly uncorrelated, then the undefined values are missing completely at random (MCAR). In this case, the model for $R$ includes only an intercept and therefore has no bearing on the model for $V$. Consequently, a model fitted to the observed values of $V$ will yield population-representative estimates.

If the association between the components can be explained entirely by observed data, then the undefined elements of $V$ are missing at random (MAR). In other words, $R$ and $V$ are conditionally independent given the observed data. Modeling $R$ and $V$ separately will once again yield unbiased estimates as long as the model for $V$ is correctly specified and includes all predictors relevant to $R$. In some instances, investigators may wish to
include only a subset of the predictors in part 2 of the model, in which case the model for V will, by necessity, exclude predictors associated with R. Here, one can use the model for R to form sampling weights and fit a weighted regression to V. Alternatively, one can impute V and base the analysis on the observed and imputed data. The key point is that if V is MAR, then modeling R and V separately will not induce bias so long as the model for V is correct and incorporates, in some fashion, the relevant predictors for R.

If, however, R and V remain correlated after adjusting for covariates, then V is not missing at random (NMAR). Here, fitting separate models for R and V induces selection bias in the parameter estimates for V. For example, if the two components are positively correlated, higher-valued V’s will tend to have increased nonzero response probability, Pr(R = 1), conditional on observed covariates. As a result, at fixed values of the observed covariates, there will be an overrepresentation of large response values among the observed cases in V. Ignoring the association between the two components and basing the part-2 analysis solely on observed cases will bias the fixed-effects intercept upward and may lead to bias in other part-2 parameters as well, depending on the structure of the between-component association (Su et al. 2009). One way to correct for this bias is to fit a correlated two-part model analogous to Eq. 17. The resulting model can be viewed as a shared-parameter model (Wu and Carroll 1988) that accounts for unmeasured subject-level factors that induce correlation between R and V. Note that this approach again relies on a conditional independence assumption whereby, this time, R and V are assumed to be stochastically independent given both the observed data and the random effects. While it is impossible to verify whether V is MAR or NMAR, it is often safer to assume NMAR, unless enough covariates have been measured to reasonably account for the dependence between R and V.

For further details on selection bias in two-part semicontinuous models, see Su et al. (2009). For a related discussion regarding selection bias in hurdle count models, see Neelon et al. (2012). For further discussion of missing data mechanisms, see Little and Rubin (2002). For a general overview of shared-parameter models for non-ignorable missing data, see Albert and Follmann (2009).

Recent Developments

There have been a number of recent developments in semicontinuous regression modeling. Liu et al. (2008) developed a multilevel two-part model that incorporates correlated random effects at multiple levels of clustering, for example, longitudinal measurements on patients clustered within clinic. Here, clinics constitute the first clustering level, since patients are nested within clinics; patients then form the second level of clustering, since there are repeated measurements for each subject.

Another active area of research involves two-part growth mixture models for examining longitudinal trends among latent subgroups of individuals (Neelon et al. 2011; Muthén 2001). Growth mixture models assume that the data are generated through a two-step process: first, individuals are placed into one of K latent classes defined by a set of average trajectory curves, one for each component of the two-part model; then, around these average trajectories, individuals are randomly assigned their own, subject-specific curves defined by a set of random effects with class-specific variance parameters. As such, these models can be viewed as finite mixtures of the two-part correlated random effects model expressed in Eq. 17.

Other recent developments include bivariate two-part models (Su et al. 2012), two-part models for the joint analysis of longitudinal and survival outcomes (Liu 2009; Hatfield et al. 2011), two-part models for estimating expected cumulative cost of illness in the presence of censoring (Basu and Manning 2010), and Bayesian extensions of two-part semicontinuous models (Neelon et al. 2011; Liu 2009; Hatfield et al. 2011; Zhang et al. 2006). This recent work highlights a growing interest in parametric two-part modeling and solidifies its current role as a vibrant area of statistical research.
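The selection-bias argument made earlier in this section for the pair (R, V) of Eq. 18 can be checked with a small simulation. The sketch below is only an illustration under simplifying assumptions that are not part of the chapter: a single cross-section with no covariates, a latent standard normal propensity whose sign determines R, and a normal error for V correlated with that propensity at level rho. The function name and all parameter values are hypothetical.

```python
import math
import random


def simulate_observed_mean(rho, mu=2.0, n=100_000, seed=1):
    """Mean of V among cases with R = 1 when the latent errors have correlation rho.

    Assumed toy model (not the chapter's Eq. 17): R = 1(z1 > 0) and
    V = mu + e2, where corr(z1, e2) = rho and both are standard normal.
    """
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)  # latent propensity for a positive response
        z2 = rng.gauss(0.0, 1.0)
        e2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # error for V
        if z1 > 0:  # R = 1, so V = log(Y) is observed
            total += mu + e2
            count += 1
    return total / count


if __name__ == "__main__":
    for rho in (0.0, 0.6):
        # Under this toy model the observed-case mean is mu + rho * sqrt(2 / pi).
        print(rho, round(simulate_observed_mean(rho), 3))
```

With rho = 0 the observed-case mean of V stays near the population value mu, whereas a positive rho shifts it upward by about rho times sqrt(2/pi) in this setup, which mirrors the upward bias in the part-2 intercept described above.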
Model Fitting, Testing, and Evaluation

There is a broad spectrum of estimation, testing, and model evaluation techniques suitable for statistical inference in two-part models, and the choice of method depends on both the type of model being fit and the analytic aims of the investigators. This section highlights common approaches to parameter estimation in two-part models, with the aim of providing an informal overview of these techniques. Readers are encouraged to seek out the cited references for more technical discussions of these methods.

Zero-Modified Count Models

There are a number of approaches to parameter estimation in zero-modified count models, including maximum likelihood (ML), generalized estimating equations (GEE), penalized quasi-likelihood, expectation-maximization (EM) algorithms, and Bayesian methods. For the uncorrelated hurdle models given in Eqs. 2, 7, and 8, parameter estimation proceeds by fitting the two model components separately. For example, for the logistic-Poisson hurdle model (8), $\beta_1$ is estimated by fitting a logistic regression to $\pi_i = \Pr(Y_i > 0)$, while $\beta_2$ is estimated by fitting a truncated Poisson model for $Y_i \mid Y_i > 0$. Newton-Raphson or Fisher scoring algorithms are typically used for ML estimation. For large samples, approximate confidence intervals can be obtained using well-established normal theory results. Predicted values for a future response, $\hat{y}_i$, can also be generated as functions of the regression estimates:

$$
\hat{y}_i = \frac{\hat{\pi}_i \hat{\mu}_i}{1 - e^{-\hat{\mu}_i}}, \quad \text{where} \quad \hat{\pi}_i = \frac{e^{x_i'\hat{\beta}_1}}{1 + e^{x_i'\hat{\beta}_1}} \quad \text{and} \quad \hat{\mu}_i = e^{x_i'\hat{\beta}_2}. \tag{19}
$$

Bootstrapping or large-sample Taylor series approximations can then be used to obtain confidence intervals for the predicted values.

For zero-inflated models, recall that the latent “eligibility” indicator $Z$, and hence the eligibility probability $\phi$, is unobserved. Consequently, ML estimation is commonly implemented using the EM algorithm. Under this approach, the latent indicator $Z$ is treated as missing data. The expectation step involves computing the expected value (with respect to $Z$) of the logged “complete data” likelihood as expressed, for example, in the last line of Eq. 6. In the maximization step, the expected complete data log-likelihood is maximized with respect to the model parameters. Alternatively, since the zero-inflated model can be reparameterized as a hurdle model (Neelon et al. 2010), Newton-Raphson algorithms can be used to obtain the ML estimates. Of the two choices, Lambert (1992) found that the EM algorithm generally outperformed Newton-Raphson. However, for the ZIP($\tau$) regression model, Lambert (1992) notes that the EM algorithm is not useful, since the parameters $\beta$ and $\tau$ are not easily estimated even when $Z$ is observed. She recommends Newton-Raphson procedures in this case. A more technical discussion of these procedures can be found in her paper.

Recall from section “Regression Models for Zero-Modified Count Data” that the zero-altered model can be used to test for the presence of zero modification. ML estimation for zero-altered models proceeds via Newton-Raphson or related Fisher scoring methods. If there is evidence of zero inflation, the investigator may subsequently elect to fit a zero-inflated model. In this case, Vuong’s test (Vuong 1989) or the score test developed by Ridout et al. (2001) can be used to choose between ZIP and ZINB models. Xiang et al. (2007) recently extended the testing procedure to account for repeated measures.

There is a wide range of model fitting strategies for clustered and longitudinal data, including mixed models and GEE. GEE (Liang and Zeger 1986) is a quasi-likelihood approach in which regression estimates are first obtained from score-type estimating equations that include a “working covariance” matrix to account for within-cluster association. Next, asymptotic standard errors that are robust to possible mis-specifications of the working covariance structure are derived. For hurdle models, GEE estimation proceeds by separately estimating the parameters for the binary
and truncated count components. Dobbie and Welsh (2001) used GEE to fit a Poisson hurdle model to clustered count data. Hall and Zhang (2004) extended the approach to zero-inflated models by combining GEE with an EM-type expectation step, resulting in a two-step “expectation-solution” (ES) procedure (Rosen et al. 2000). In the E-step, the expectation of the complete data log-likelihood with respect to the latent indicator Z is computed; in the S-step, GEE is used in lieu of maximum likelihood to obtain parameter estimates and robust standard errors separately for each component of the model.

For the zero-modified random effects models described in Eq. 13, Min and Agresti (2005) proposed a two-stage approach in which numerical integration, such as Gaussian quadrature, is first used to estimate the marginal likelihood integrated across the random effects; then, in the second stage, Fisher scoring is used to maximize the estimated marginal likelihood. More recently, Kim et al. (2012) used restricted maximum quasi-likelihood (RMQL) to fit a correlated negative binomial hurdle model.

Several authors have used the EM algorithm for fitting longitudinal finite mixture (or “latent class”) models. Roeder et al. (1999) used EM to fit a latent class trajectory model as part of a study examining risk factors for long-term criminal behavior. Dalrymple et al. (2003) adopted a similar approach to study longitudinal trends in sudden infant death syndrome, or SIDS. Min and Agresti (2005) used the EM algorithm to fit a discrete random effects model in an analysis of pharmaceutical side effects.

Bayesian methods are also well suited for inference involving zero-modified count data. In Bayesian inference, model parameters are treated as random variables and assigned prior distributions that quantify one’s uncertainty about their values prior to observing the data. Common prior distributions for regression models include normal distributions for fixed-effect parameters, inverse-gamma distributions for error variances, and inverse-Wishart distributions for random effect covariance matrices. These prior distributions are then combined with the current data via Bayes’ theorem to obtain posterior distributions. In this way, Bayesian methodology provides a natural scheme for learning from prior experience. For zero-modified count models, the posterior distributions generally do not have closed forms, and hence Markov chain Monte Carlo (MCMC) algorithms, such as Gibbs sampling (Gelfand and Smith 1990), are often used for posterior inference. At convergence, the MCMC draws form a Monte Carlo sample from the joint posterior distribution of all model parameters, which can then be used to obtain parameter estimates and corresponding interval estimates (credible intervals), thus avoiding the need for asymptotic assumptions. Moreover, because MCMC produces draws from the entire joint posterior distribution of the model parameters, estimation of complex functions of parameters is straightforward. For example, the Bayesian framework is ideal for estimating and obtaining uncertainty intervals for quantities such as the population IRR given in Eq. 10. In the maximum likelihood setting, one would have to perform bootstrapping or derive a Taylor series approximation to obtain standard errors and confidence intervals for such quantities.

In recent years, there has been growing interest in Bayesian methods for fitting zero-modified models. Rodrigues (2003) proposed a data-augmented Gibbs sampling algorithm to fit a ZIP model. Ghosh et al. (2006) used a similar approach to fit zero-inflated generalized power series models, which include the ZIP as a special case. Neelon et al. (2010) developed Bayesian model fitting strategies for repeated measures hurdle, ZIP, and ZAP models and compared various prior specifications, model comparison strategies, and approaches to assessing model fit. Ghosh et al. (2012) used Gibbs sampling to fit the k-ZIG model, which accommodates extreme zero inflation. Several authors have proposed Bayesian methods for analyzing zero-modified spatial data (Neelon et al. 2012; Rathbun and Fei 2006; Ver Hoef and Jansen 2007). In particular, Neelon et al. (2012) used hybrid Gibbs and Metropolis-Hastings steps to fit a spatially correlated Poisson hurdle model. For more on Bayesian estimation of zero-inflated count models, see Winkelmann (2008), Neelon et al. (2010, 2012), and Zuur et al. (2012).
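The predicted value for the logistic-Poisson hurdle model in Eq. 19 is simple enough to compute directly once coefficient estimates are available. The following is a minimal sketch, not code from the chapter or from any of the cited packages: the function names are hypothetical, the coefficient vectors are assumed to have been estimated elsewhere, and the second function exists only to cross-check the closed form against a direct summation over the zero-truncated Poisson probabilities.

```python
import math


def hurdle_prediction(x, beta1, beta2):
    """Predicted mean of a logistic-Poisson hurdle model, following Eq. 19."""
    eta1 = sum(xj * bj for xj, bj in zip(x, beta1))  # linear predictor, binary part
    eta2 = sum(xj * bj for xj, bj in zip(x, beta2))  # linear predictor, count part
    pi_hat = 1.0 / (1.0 + math.exp(-eta1))           # Pr(Y > 0) under a logit link
    mu_hat = math.exp(eta2)                          # untruncated Poisson mean
    # The zero-truncated Poisson has mean mu / (1 - exp(-mu)).
    return pi_hat * mu_hat / (1.0 - math.exp(-mu_hat))


def hurdle_mean_by_summation(pi, mu, kmax=200):
    """Same quantity from the pmf: sum over k >= 1 of k * pi * Pois(k; mu) / (1 - e^-mu)."""
    total = 0.0
    pois = math.exp(-mu)          # Pois(0; mu)
    for k in range(1, kmax + 1):
        pois *= mu / k            # Pois(k; mu) from Pois(k - 1; mu)
        total += k * pois
    return pi * total / (1.0 - math.exp(-mu))
```

For instance, `hurdle_prediction([1.0, 0.5], [0.2, -0.4], [0.5, 0.3])` uses made-up coefficients for an intercept and one covariate; the two functions should agree to numerical precision for the implied probability and Poisson mean.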
Semicontinuous Models

Similar procedures can be used for parameter estimation in two-part semicontinuous models. For example, Duan et al. (1983) used maximum likelihood to estimate a fixed-effects probit-lognormal model. The two components were estimated separately by fitting a probit regression model for the binary part and a lognormal regression model for the continuous part. Maximum likelihood can also be used to fit the Tobit and ZIT models. For the ZIT, Moulton and Halsey (1995) first restructured the model as a hurdle-type model, with the first component comprising both “true” and censored zeros and the second component consisting of positive responses that were assumed to follow a truncated lognormal distribution. They then used a quasi-Newton-Raphson procedure to obtain ML estimates. The EM algorithm could also be applied in this context, since the true zeros in the ZIT model are comparable to the structural zeros in zero-inflated count models.

For clustered semicontinuous data, one can use GEE to fit population-average two-part models. The two components can be estimated separately by fitting one GEE-estimated model for the binary part and another for the continuous part. For two-part mixed models with uncorrelated random effects, ML estimates can be derived by fitting separate random effects models for each component. For the correlated two-part model given in Eq. 17, Olsen and Schafer (2001) used a sixth-order Laplace approximation together with an approximate Fisher scoring algorithm to derive parameter estimates. Tooze et al. (2002) adopted a slightly different approach, using adaptive Gaussian quadrature to first approximate the marginal likelihood and then applying quasi-Newton-Raphson to obtain parameter estimates.

Several authors have proposed Bayesian approaches for fitting two-part semicontinuous models (Neelon et al. 2011; Cooper et al. 2003, 2007; Deb et al. 2006; Ghosh and Albert 2009). For example, Neelon et al. (2011) used a data-augmented MCMC algorithm to fit a probit-lognormal correlated two-part model. For more on Bayesian inference in two-part semicontinuous models, see Cooper et al. (2003) and Neelon et al. (2011).

There is an extensive literature regarding estimation of the untransformed mean response in part 2 of semicontinuous models, a quantity of primary interest in many studies (Duan 1983; Manning 1998; Manning and Mullahy 2001). Recall that in the two-part lognormal regression model (16), $\exp(x_i'\beta_2)$ represents the median response (given $x_i$) among the positive observations. The untransformed mean response, meanwhile, is given by

$$
\psi(x_i) = E(Y_i \mid Y_i > 0, x_i) = \exp\left(x_i'\beta_2 + \sigma^2/2\right), \tag{20}
$$

where $\sigma^2$ is the lognormal variance. Thus, if interest lies in estimating $\psi(x_i)$, it is necessary to estimate $\sigma^2$. A consistent estimator of $\psi(x_i)$ is given by $\hat{\psi}(x_i) = \exp(x_i'\hat{\beta}_2 + \hat{\sigma}^2/2)$, where $\hat{\beta}_2$ is the ordinary least squares (OLS) estimate of $\beta_2$ and $\hat{\sigma}^2$ is the mean squared error obtained from regressing the log-transformed response on $x$. However, if the log-normality assumption is violated (for example, if the data arise from a mixture of lognormal distributions), then $\hat{\psi}(x_i)$ is not a consistent estimator for $\psi(x_i)$. To accommodate departures from log-normality, Duan (1983) developed a consistent nonparametric estimator known as the smearing estimator, which is expressed as

$$
\hat{E}(Y_0 \mid Y_0 > 0, X = x_0) = \exp\left(x_0'\hat{\beta}_2\right) \frac{1}{n_+} \sum_{i:\,Y_i > 0} \exp(\hat{e}_i) = \exp\left(x_0'\hat{\beta}_2\right) \hat{S}, \tag{21}
$$

where $Y_0$ denotes the untransformed response for an individual with covariate profile $x_0$, $n_+$ is the number of nonzero observations, and $\hat{e}_i = \ln(Y_i) - x_i'\hat{\beta}_2$ is the residual for the $i$-th nonzero observation. The expression $\hat{S}$ is known as the “smearing factor.” The method generalizes to any monotone differentiable function $g(Y)$. Because the smearing estimator is nonparametric, it makes no explicit assumption about the distributional form
of $Y$, only that $E[g(Y_i) \mid Y_i > 0, x_i]$ is a linear function of $\beta_2$ and that the errors are independent and identically distributed with mean zero and homogeneous variance $\sigma^2$. When the errors are heteroscedastic (for example, when they depend on covariates), the smearing estimator is biased (Manning 1998). Three approaches have been proposed to account for heteroscedasticity when constructing a smearing estimator: (1) estimate unique smearing factors for different covariate subgroups (Manning 1998), (2) apply separate smearing factors to different parts of the response distribution (Buntin and Zaslavsky 2004), or (3) use $\hat{S} = E[\exp(\hat{e}_i) \mid Y_i > 0, x_i]$ as a corrected smearing factor (Jones 2011), which can be obtained by regressing the exponentiated estimated residuals on $x$ and using the predicted values as the smearing factors at the corresponding values of $x$. Recently, Welsh and Zhou (2006) developed a heteroscedastic smearing estimator for the untransformed marginal mean, $E(Y_i \mid x_i)$, averaged over both the zero and nonzero observations.

Note that retransformations to the Y-scale pose no difficulty for Bayesian inference: after drawing MCMC samples of model parameters on the transformed scale, simply retransform and take the average to estimate the posterior mean on the original data scale. However, unless advanced Bayesian nonparametric techniques are employed (Ferguson 1973), an explicit parametric form for the likelihood must be assumed.

Quasi-likelihood generalized linear models (GLMs) offer an alternative approach to estimating the untransformed mean in part 2 of the model (Manning and Mullahy 2001; Buntin and Zaslavsky 2004; Blough et al. 1999). Here, the untransformed mean is modeled as $\psi(x_i) = h(x_i'\beta_2)$, where $h$ is an inverse-link function (e.g., the exponential function). By modeling $\psi(x_i)$ directly, GLMs avoid the need to transform $Y$ altogether. Next, the variance of $Y$ given $Y > 0$ is modeled as a function of covariates, typically using a power function of the form $V(Y_i \mid Y_i > 0, x_i) \propto \psi(x_i)^{\lambda}$. The approach does not specify a distribution for $Y$, making it robust to mis-specifications that might otherwise occur. The method can also be used to directly estimate the marginal mean, $E(Y_i \mid x_i)$, yielding a simpler, one-part GLM that incorporates both zero and nonzero values.

Estimation for GLMs proceeds by nonlinear weighted least squares, with weights proportional to the inverse variances of the observations (Buntin and Zaslavsky 2004). The choice of $\lambda$ is important, since it can affect the efficiency of the parameter estimates. Choosing $\lambda = 0$ implies constant variance; $\lambda = 1$ implies a “Poisson-type” variance proportional to the mean; and $\lambda = 2$ results in a “gamma-type” variance. To help guide this choice, one can apply the Park test (Park 1966), which exploits the fact that

$$
\ln\left[V(Y_i \mid Y_i > 0, x_i)\right] = \text{constant} + \lambda \ln\left[\psi(x_i)\right]. \tag{22}
$$

To apply the Park test, the log of the squared residuals from a candidate model is regressed on the log-transformed predicted values, $\hat{y}_i$:

$$
\ln\left[(y_i - \hat{y}_i)^2 \mid y_i > 0\right] = \alpha + \lambda \ln(\hat{y}_i) + e_i, \quad i = 1, \ldots, n, \tag{23}
$$

where $e_i$ is a mean-zero error term. An estimate of $\lambda$ close to zero suggests constant variance, an estimate close to 1 suggests a Poisson-like variance, and an estimate close to 2 suggests a gamma-type structure.

In deciding between a GLM and a transformed parametric model, one can employ the following decision procedure, adapted from Manning and Mullahy (2001):

1. Fit an OLS regression to the transformed positive values.
2. If the residuals are highly kurtotic, then the parametric two-part model is generally preferable, since high kurtosis can lead to imprecision (high variability) in quasi-likelihood GLM parameter estimates. To guard against model mis-specifications, smearing should be applied when estimating the untransformed mean. In the presence of heteroscedasticity, multiple covariate- or response-dependent smearing factors should be applied to reduce bias.
3. If there is minimal kurtosis, fit a series of quasi-likelihood GLMs and apply the Park test to determine the optimal value of λ.
4. To avoid over-fitting, use penalized model comparison or cross validation techniques, such as split-sample analyses, to choose between competing models.

The choice between models is guided by nonstatistical considerations as well. For example, if there is interest in estimating both the probability of a positive response and the mean response among positive observations, then a two-part model (either parametric or quasi-likelihood GLM) may be preferable to a one-part model. Further, if it is reasonable to assume that the two components are correlated, then a correlated parametric two-part model, as in Eq. 17, might be most appropriate. For a more detailed comparison of quasi-likelihood GLMs and transformed parametric models, see Manning and Mullahy (2001) and Buntin and Zaslavsky (2004).

Model Comparison and Assessment

There are several model comparison measures that can be used to select among competing two-part models, including the Akaike information criterion (AIC) (Akaike 1974) and the Bayesian information criterion (BIC), also known as the Schwarz criterion (Schwarz 1978). AIC and BIC are referred to as “penalized” criteria because they combine a measure of model fit, typically twice the negative log-likelihood, with a penalty for model complexity, expressed as a function of the number of parameters. Smaller values of AIC and BIC are considered preferable. A related measure for quasi-likelihood and GEE models is the quasi-likelihood under independence, or QIC, criterion (Pan 2001). In the Bayesian setting, a common model comparison statistic is the deviance information criterion (DIC) (Spiegelhalter et al. 2002), which can be used to compare Bayesian hierarchical (i.e., random effect) models. As with the other selection criteria, DIC balances an assessment of model fit with a penalty for complexity. For random effects models, the dimension of the parameter space depends on the degree of heterogeneity between individuals, with greater heterogeneity implying more “effective” parameters. DIC was specifically designed to estimate the number of effective parameters in Bayesian hierarchical models. Celeux et al. (2006) recently adapted the measure to accommodate additional latent variable models, such as finite mixtures.

A second Bayesian comparison measure is the Bayes factor (Kass and Raftery 1995), which offers perhaps the most principled approach to Bayesian model selection. However, because Bayes factors rely on the marginal likelihood of the data under a presumed model, they are not defined for improper (infinite variance) prior distributions. To accommodate improper priors, alternative criteria such as the intrinsic Bayes factor (Berger and Pericchi 1996) have been proposed. The pseudo Bayes factor (Gelfand and Dey 1994) offers a computationally convenient numerical approximation to the Bayes factor, but it has been criticized recently due to its reliance on the computationally unstable harmonic mean (Raftery et al. 2007). Several other Bayesian comparison measures have been proposed specifically in connection with zero-inflated count models, including the group-marginalized DIC (Millar 2009) and the predictive log-score loss function (Ghosh et al. 2012).

To further assess model fit in the Bayesian setting, one can apply Bayesian posterior predictive checks, whereby the observed data are compared to data replicated from the posterior predictive distribution (Gelman et al. 1996). If the model fits well, the replicated data should resemble the observed data. To quantify the degree of similarity, one typically chooses a “discrepancy statistic,” such as a sample moment or quantile, which captures some important aspect of the data. The Bayesian predictive p-value denotes the probability that the model-predicted statistic is more extreme than the observed sample value (i.e., the value expected under the correct model). A Bayesian p-value close to 0.50 represents adequate model fit, while p-values near 0 or 1 indicate lack of fit.
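A posterior predictive check of the kind just described can be sketched in a few lines for a deliberately simple case. The example below is an illustration under assumptions that go beyond the chapter: it checks a one-part Poisson model (not a two-part model), uses a conjugate Gamma(a, b) prior so that posterior draws are available without MCMC, takes the fraction of zeros as the discrepancy statistic, and defines the p-value as the posterior probability that the replicated statistic is at least as large as the observed one. All function names and default values are hypothetical.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def zero_fraction(y):
    """Discrepancy statistic: proportion of zeros in the sample."""
    return sum(1 for v in y if v == 0) / len(y)


def posterior_predictive_pvalue(y, a=0.5, b=0.5, draws=1000, seed=0):
    """Pr(T(y_rep) >= T(y) | y) for T = fraction of zeros under a Poisson model.

    Assumes a Gamma(shape=a, rate=b) prior on the Poisson mean, so the
    posterior is Gamma(shape=a + sum(y), rate=b + n).
    """
    rng = random.Random(seed)
    n, t_obs = len(y), zero_fraction(y)
    shape, rate = a + sum(y), b + n
    exceed = 0
    for _ in range(draws):
        lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale
        y_rep = [poisson_draw(rng, lam) for _ in range(n)]
        if zero_fraction(y_rep) >= t_obs:
            exceed += 1
    return exceed / draws
```

Applied to counts with many excess zeros, the plain Poisson model rarely replicates as many zeros as were observed, so the p-value falls near 0 and signals lack of fit; the same recipe could in principle be applied to draws from a fitted hurdle or zero-inflated model.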
For more information on Bayesian model comparison and assessment strategies, see Millar (2009), Neelon et al. (2010, 2011), and Ando (2010).

Software

There are a number of software packages that can be used for fitting zero-modified count and semicontinuous models. The statistical software program R (R Development Core Team 2012) has several packages for fitting zero-modified count models, including the pscl package (Zeileis et al. 2008; Jackman 2012), which performs ML estimation of zero-inflated and hurdle models; glmmADMB (Fournier et al. 2012; Skaug et al. 2012) for fitting random effect zero-inflated and hurdle models; and MCMCglmm (Hadfield 2010) for Bayesian estimation of hurdle, zero-inflated, and zero-altered models. SAS (SAS 9.1.3 Help and Documentation 2000) offers PROC COUNTREG for fitting zero-inflated count regressions, PROC GENMOD for GEE models, and PROCs NLMIXED and GLIMMIX for random effect zero-modified count models. Stata (Stata Statistical Software 2011) uses the zip and zinb commands for fitting ZIP and ZINB models, HPLOGIT and HNBLOGIT for hurdle models (Hilbe 2005a, b), and gllamm for fitting random effect ZIP models (Rabe-Hesketh et al. 2005). For Bayesian inference, the freeware package WinBUGS (Lunn et al. 2000) can be used to fit various zero-modified count models, including hierarchical models. See Neelon et al. (2010, 2012) for examples.

Many of these packages can also be used to fit semicontinuous models. For example, SAS PROC NLMIXED can be used to fit random effect semicontinuous models (Tooze et al. 2002). The freeware package aML (Lillard and Panis 1998) can be used to fit multilevel two-part models; see Liu et al. (2008) for an application. Mplus software (Muthén and Muthén 1998) is useful for fitting finite mixture and growth mixture two-part models; Muthén (2001) provides an illustration. SAS PROC GENMOD and the Stata command glm can be used to fit quasi-likelihood one- and two-part models. Buntin and Zaslavsky (2004) provide example code for fitting such models. Finally, WinBUGS can be used to fit Bayesian two-part semicontinuous models; see Cooper et al. (2003, 2007) and Ghosh and Albert (2009) for examples. Readers should visit the appropriate software websites for updates and current versions of these packages.

Conclusion

Two-part models play an important role in health services research settings where data are characterized by both a high proportion of zeros and a skewed distribution of positive values. By modeling the zero and nonzero values in distinct ways, two-part models offer a flexible parametric approach to the analysis of zero-modified count and semicontinuous data. In many cases, such flexibility can yield improved model fit over traditional one-part models. At the same time, the reliance on parametric assumptions can be a liability, particularly in the case of semicontinuous data. Misguided assumptions about the response distribution will naturally lead to biased inferences. As in any regression analysis, careful attention to modeling assumptions is paramount to achieving unbiased parameter estimates. If these assumptions appear to be violated, distribution-free quasi-likelihood or other semiparametric approaches may be preferable.

There are a number of active areas of research involving two-part models. These include two-part spatial and spatiotemporal models for semicontinuous data, shared-parameter models for informatively censored zero-modified counts, and inverse-probability weighting methods for population-average two-part models. These developments highlight just a few of the potential opportunities for methodological research involving two-part models.

Lastly, given the scope of the methods described above, this chapter should be viewed as an introductory overview of two-part modeling. Readers are encouraged to consult the references cited herein for further discussions of two-part models and their ongoing application to health services research.
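
As a small illustration of the idea of handling zero and nonzero values separately, the following Python sketch splits a semicontinuous outcome into the probability of a positive value and the mean of the positive values, and then recombines them as Pr(Y > 0) × E[Y | Y > 0]. It is only a covariate-free toy under stated assumptions: the function name two_part_mean and the cost values are hypothetical, and it is not the estimation procedure of any package named in this chapter.

```python
from statistics import mean


def two_part_mean(values):
    """Split a nonnegative outcome into its two parts and recombine them.

    Returns (Pr(Y > 0), E[Y | Y > 0], Pr(Y > 0) * E[Y | Y > 0]).
    """
    if not values:
        raise ValueError("values must not be empty")
    positives = [v for v in values if v > 0]
    # Part 1: how often is the outcome nonzero?
    p_positive = len(positives) / len(values)
    # Part 2: how large is the outcome when it is nonzero?
    mean_positive = mean(positives) if positives else 0.0
    return p_positive, mean_positive, p_positive * mean_positive


# Hypothetical annual costs: many zeros plus a right-skewed positive part.
costs = [0, 0, 0, 0, 0, 0, 120.0, 80.0, 400.0, 1400.0]
p_pos, mean_pos, overall = two_part_mean(costs)
print(p_pos, mean_pos, overall)
```

Without covariates the recombined value is just the sample mean. The two-part structure becomes useful once each part is given its own regression model, for example a binary model for whether the outcome is positive and a skewed-response model for the positive values.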
References

Agarwal DK, Gelfand AE, Citron-Pousty S. Zero-inflated models with application to spatial count data. Environ Ecol Stat. 2002;9(4):341–55. Available from http://www.ingentaconnect.com/content/klu/eest/2002/00000009/00000004/05102063
Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control. 1974;19(6):716–23.
Albert P, Follman D. Shared-parameter models. In: Fitzmaurice G, Davidian M, Verbeke G, Molenberghs G, editors. Longitudinal data analysis. Boca Raton: Chapman & Hall/CRC Press; 2009. p. 433–52.
Albert JM, Wang W, Nelson S. Estimating overall exposure effects for zero-inflated regression models with application to dental caries. Stat Methods Med Res. 2011. Available from http://smm.sagepub.com/content/early/2011/09/08/0962280211407800.abstract
Ando T. Bayesian model selection and statistical modeling. Boca Raton: Chapman & Hall/CRC Press; 2010.
Arab A, Holan SH, Wikle CK, Wildhaber ML. Semiparametric bivariate zero-inflated Poisson models with application to studies of abundance for multiple species. ArXiv e-prints. 2011. Available from http://arxiv.org/abs/1105.3169v1
Basu A, Manning WG. Estimating lifetime or episode-of-illness costs under censoring. Health Econ. 2010;19(9):1010–28. https://doi.org/10.1002/hec.1640.
Berger JO, Pericchi LR. The intrinsic Bayes factor for model selection and prediction. J Am Stat Assoc. 1996;91(433):109–22. Available from http://www.jstor.org/stable/2291387
Blough DK, Madden CW, Hornbrook MC. Modeling risk using generalized linear models. J Health Econ. 1999;18(2):153–71. Available from http://www.sciencedirect.com/science/article/pii/S0167629698000320
Buntin MB, Zaslavsky AM. Too much ado about two-part models and transformation?: comparing methods of modeling Medicare expenditures. J Health Econ. 2004;23(3):525–42. Available from http://www.sciencedirect.com/science/article/pii/S0167629604000220
Buu A, Johnson NJ, Li R, Tan X. New variable selection methods for zero-inflated count data with applications to the substance abuse field. Stat Med. 2011;30(18):2326–40. https://doi.org/10.1002/sim.4268.
Cameron AC, Trivedi PK. Regression analysis of count data. No. 9780521635677 in Cambridge Books. Cambridge University Press; 1998. Available from http://ideas.repec.org/b/cup/cbooks/9780521635677.html
Celeux G, Forbes F, Robert CP, Titterington DM. Deviance information criteria for missing data models. Bayesian Anal. 2006;1(4):651–74.
Consul P. Generalized Poisson distributions: properties and applications. New York: Marcel Dekker; 1989.
Cooper NJ, Sutton AJ, Mugford M, Abrams KR. Use of Bayesian Markov chain Monte Carlo methods to model cost-of-illness data. Med Decis Mak. 2003;23(1):38–53. Available from http://mdm.sagepub.com/content/23/1/38.abstract
Cooper NJ, Lambert PC, Abrams KR, Sutton AJ. Predicting costs over time using Bayesian Markov chain Monte Carlo methods: an application to early inflammatory polyarthritis. Health Econ. 2007;16(1):37–56. https://doi.org/10.1002/hec.1141.
Dalrymple ML, Hudson IL, Ford RPK. Finite mixture, zero-inflated Poisson and hurdle models with application to SIDS. Comput Stat Data Anal. 2003;41(3–4):491–504. https://doi.org/10.1016/S0167-9473(02)00187-1.
Deb P, Munkin MK, Trivedi PK. Bayesian analysis of the two-part model with endogeneity: application to health care expenditure. J Appl Econ. 2006;21(7):1081–99. https://doi.org/10.1002/jae.891.
DeSantis SM, Bandyopadhyay D. Hidden Markov models for zero-inflated Poisson counts with an application to substance use. Stat Med. 2011;30(14):1678–94. https://doi.org/10.1002/sim.4207.
Dobbie MJ, Welsh AH. Modelling correlated zero-inflated count data. Aust N Z J Stat. 2001;43(4):431–44. https://doi.org/10.1111/1467-842X.00191.
Duan N. Smearing estimate: a nonparametric retransformation method. J Am Stat Assoc. 1983;78(383):605–10. Available from http://www.jstor.org/stable/2288126
Duan N, Manning WG, Morris CN, Newhouse JP. A comparison of alternative models for the demand for medical care. J Bus Econ Stat. 1983;1(2):115–26. Available from http://www.jstor.org/stable/1391852
Fahrmeir L, Osuna EL. Structured additive regression for overdispersed and zero-inflated count data. Appl Stoch Model Bus Ind. 2006;22(4):351–69. https://doi.org/10.1002/asmb.631.
Ferguson TS. A Bayesian analysis of some nonparametric problems. Ann Stat. 1973;1(2):209–30. Available from http://www.jstor.org/stable/2958008
Fournier DA, Skaug HJ, Ancheta J, Ianelli J, Magnusson A, Maunder MN, et al. AD model builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim Methods Softw. 2012;27(2):233–49. https://doi.org/10.1080/10556788.2011.597854.
Gelfand AE, Dey DK. Bayesian model choice: asymptotics and exact calculations. J R Stat Soc Ser B Stat Methodol. 1994;56(3):501–14. Available from http://www.jstor.org/stable/2346123
Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J Am Stat Assoc. 1990;85(410):398–409. Available from http://www.jstor.org/stable/2289776
Gelman A, Meng XL, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Stat Sin. 1996;6:733–807.
Ghosh P, Albert PS. A Bayesian analysis for longitudinal semicontinuous data with an application to an acupuncture clinical trial. Comput Stat Data Anal. 2009;53(3):699–706. https://doi.org/10.1016/j.csda.2008.09.011.
Ghosh SK, Mukhopadhyay P, Lu JC. Bayesian analysis of zero-inflated regression models. J Stat Plann Infer. 2006;136(4):1360–75. Available from http://www.sciencedirect.com/science/article/pii/S0378375804004008
Ghosh S, Gelfand AE, Zhu K, Clark JS. The k-ZIG: flexible modeling for zero-inflated counts. Biometrics. 2012;68(3):878–85. https://doi.org/10.1111/j.1541-0420.2011.01729.x.
Greene W. Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Working paper EC-94-10, Department of Economics. New York: New York University; 1994.
Gschlößl S, Czado C. Modelling count data with overdispersion and spatial effects. Stat Pap. 2008;49:531–52. https://doi.org/10.1007/s00362-006-0031-6.
Gupta PL, Gupta RC, Tripathi RC. Analysis of zero-adjusted count data. Comput Stat Data Anal. 1996;23(2):207–18. Available from http://EconPapers.repec.org/RePEc:eee:csdana:v:23:y:1996:i:2:p:207-218
Hadfield JD. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw. 2010;33(2):1–22. Available from http://www.jstatsoft.org/v33/i02/
Hall DB. Zero-inflated Poisson and binomial regression with random effects: a case study. Biometrics. 2000;56(4):1030–9. https://doi.org/10.1111/j.0006-341X.2000.01030.x.
Hall DB, Zhang Z. Marginal models for zero inflated clustered data. Stat Model. 2004;4(3):161–80. Available from http://smj.sagepub.com/content/4/3/161.abstract
Hasan MT, Sneddon G. Zero-inflated Poisson regression for longitudinal data. Commun Stat – Simul Comput. 2009;38(3):638–53.
Hasan MT, Sneddon G, Ma R. Pattern-mixture zero-inflated mixed models for longitudinal unbalanced count data with excessive zeros. Biom J. 2009;51(6):946–60. Available from https://doi.org/10.1002/bimj.200900093
Hatfield LA, Boye ME, Carlin BP. Joint modeling of multiple longitudinal patient-reported outcomes and survival. J Biopharm Stat. 2011;21(5):971–91. Available from http://www.tandfonline.com/doi/abs/10.1080/10543406.2011.590922
Heilbron DC. Zero-altered and other regression models for count data with added zeros. Biom J. 1994;36(5):531–47. https://doi.org/10.1002/bimj.4710360505.
Hilbe J. HNBLOGIT: Stata module to estimate negative binomial-logit hurdle regression. Statistical Software Components, Boston College Department of Economics; 2005a. Available from http://ideas.repec.org/c/boc/bocode/s456401.html
Hilbe J. HPLOGIT: Stata module to estimate Poisson-logit hurdle regression. Statistical Software Components, Boston College Department of Economics; 2005b. Available from http://ideas.repec.org/c/boc/bocode/s456405.html
Hsu CH. Joint modelling of recurrence and progression of adenomas: a latent variable approach. Stat Model. 2005;5(3):201–15. Available from http://smj.sagepub.com/content/5/3/201.abstract
Jackman S. pscl: classes and methods for R developed in the political science computational laboratory. Stanford: Stanford University; 2012. R package version 1.04.4. Available from http://pscl.stanford.edu/
Jones AM. Models for health care. In: Hendry D, Clements M, editors. Oxford handbook of economic forecasting. Oxford: Oxford University Press; 2011. p. 625–54.
Kass RE, Raftery AE. Bayes factors. J Am Stat Assoc. 1995;90(430):773–95. Available from http://www.jstor.org/stable/2291091
Kim S, Chang CC, Kim K, Fine M, Stone R. BLUP (REMQL) estimation of a correlated random effects negative binomial hurdle model. Health Serv Outcome Res Methodol. 2012;12:302–19. https://doi.org/10.1007/s10742-012-0083-0.
Lam KF, Xue H, Bun CY. Semiparametric analysis of zero-inflated count data. Biometrics. 2006;62(4):996–1003. https://doi.org/10.1111/j.1541-0420.2006.00575.x.
Lambert D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics. 1992;34(1):1–14. Available from http://www.jstor.org/stable/1269547
Li CS, Lu JC, Park J, Kim K, Brinkley PA, Peterson JP. Multivariate zero-inflated Poisson models and their applications. Technometrics. 1999;41(1):29–38. https://doi.org/10.2307/1270992.
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22. Available from http://biomet.oxfordjournals.org/content/73/1/13.abstract
Lillard LA, Panis CWA. Multiprocess multilevel modelling, version 2, user’s guide and reference manual. Los Angeles: EconoWare; 1998–2003.
Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. Hoboken: Wiley; 2002.
Liu H. Growth curve models for zero-inflated count data: an application to smoking behavior. Struct Equ Model Multidiscip J. 2007;14(2):247–79. https://doi.org/10.1080/10705510709336746.
Liu L. Joint modeling longitudinal semi-continuous data and survival, with application to longitudinal medical cost data. Stat Med. 2009;28(6):972–86. Available from https://doi.org/10.1002/sim.3497
Liu L, Ma JZ, Johnson BA. A multi-level two-part random effects model, with application to an alcohol-dependence study. Stat Med. 2008;27(18):3528–39. Available from https://doi.org/10.1002/sim.3205
Liu L, Strawderman RL, Cowen ME, Shih YCT. A flexible two-part random effects model for correlated medical costs. J Health Econ. 2010;29(1):110–23. Available from http://www.sciencedirect.com/science/article/pii/S0167629609001386
Liu L, Strawderman RL, Johnson BA, O’Quigley JM. Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study. Stat Methods Med Res. 2012. Available from http://smm.sagepub.com/content/early/2012/04/01/0962280212443324.abstract
Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Stat Comput. 2000;10(4):325–37. https://doi.org/10.1023/A:1008929526011.
Majumdar A, Gries C. Bivariate zero-inflated regression for count data: a Bayesian approach with application to plant counts. Int J Biostat. 2010;6(1):27. Available from http://ideas.repec.org/a/bpj/ijbist/v6y2010i1n27.html
Manning WG. The logged dependent variable, heteroscedasticity, and the retransformation problem. J Health Econ. 1998;17(3):283–95. Available from http://www.sciencedirect.com/science/article/pii/S0167629698000253
Manning WG, Mullahy J. Estimating log models: to transform or not to transform? J Health Econ. 2001;20(4):461–94. Available from http://www.sciencedirect.com/science/article/pii/S0167629601000868
Manning W, Morris C, Newhouse J, Orr L, Duan N, Keeler E, et al. A two-part model of the demand for medical care: preliminary results from the health insurance study. In: van der Gaag J, Perlman M, editors. Health, economics, and health economics. Amsterdam: North-Holland; 1981. p. 103–23.
Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk adjustment of skewed outcomes data. J Health Econ. 2005;24(3):465–88. Available from http://www.sciencedirect.com/science/article/pii/S0167629605000056
Maruotti A. A two-part mixed-effects pattern-mixture model to handle zero-inflation and incompleteness in a longitudinal setting. Biom J. 2011;53(5):716–34. Available from https://doi.org/10.1002/bimj.201000190
Millar RB. Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes’ factors. Biometrics. 2009;65(3):962–9. https://doi.org/10.1111/j.1541-0420.2008.01162.x.
Min Y, Agresti A. Random effect models for repeated measures of zero-inflated count data. Stat Model. 2005;5(1):1–19. Available from http://smj.sagepub.com/content/5/1/1.abstract
Moulton LH, Halsey NA. A mixture model with detection limits for regression analyses of antibody response to vaccine. Biometrics. 1995;51(4):1570–8. Available from http://www.jstor.org/stable/2533289
Mullahy J. Specification and testing of some modified count data models. J Econ. 1986;33(3):341–65. Available from http://www.sciencedirect.com/science/article/pii/0304407686900023
Muthén BO. Two-part growth mixture modeling; 2001. Unpublished Manuscript. Available from http://pages.gseis.ucla.edu/faculty/muthen/articles/Article_094.pdf
Muthén BO, Muthén LK. Mplus (Version 7). Muthén & Muthén; 1998–2012.
Mwalili SM, Lesaffre E, Declerck D. The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research. Stat Methods Med Res. 2008;17(2):123–39. Available from http://smm.sagepub.com/content/17/2/123.abstract
Neelon BH, O’Malley AJ, Normand SLT. A Bayesian model for repeated measures zero inflated count data with application to outpatient psychiatric service use. Stat Model. 2010;10(4):421–39. Available from http://smj.sagepub.com/content/10/4/421.abstract
Neelon B, O’Malley AJ, Normand SLT. A Bayesian two-part latent class model for longitudinal medical expenditure data: assessing the impact of mental health and substance abuse parity. Biometrics. 2011;67(1):280–9. Available from https://doi.org/10.1111/j.1541-0420.2010.01439.x.
Neelon B, Ghosh P, Loebs PF. A spatial Poisson hurdle model for exploring geographic variation in emergency department visits. Journal of the Royal Statistical Society: Series A (Statistics in Society). 2012; Published online ahead of print. Available from https://doi.org/10.1111/j.1467-985X.2012.01039.x
Olsen MK, Schafer JL. A two-part random-effects model for semicontinuous longitudinal data. J Am Stat Assoc. 2001;96(454):730–45. https://doi.org/10.1198/016214501753168389.
Pan W. Akaike’s information criterion in generalized estimating equations. Biometrics. 2001;57(1):120–5. Available from http://www.jstor.org/stable/2676849
Park RE. Estimation with heteroscedastic error terms. Econometrica. 1966;34(4):888. Available from http://www.jstor.org/stable/1910108
Patil GP. Maximum likelihood estimation for generalized power series distributions and its application to a truncated binomial distribution. Biometrika. 1962;49(1–2):227–37. Available from http://biomet.oxfordjournals.org/content/49/1-2/227.short
Preisser JS, Stamm JW, Long DL. Review and recommendations for zero-inflated count regression modeling of dental caries indices in epidemiological studies. Caries Res. 2012;46:413–23.
R Development Core Team. R: a language and environment for statistical computing. Vienna; 2012. ISBN 3-900051-07-0. Available from http://www.R-project.org/
Rabe-Hesketh S, Skrondal A, Pickles A. Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. J Econ. 2005;128(2):301–23. Available from http://www.sciencedirect.com/science/article/pii/S0304407604001599
Raftery AE, Newton MA, Satagopan JM, Krivitsky PN. Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. In: Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, et al., editors. Bayesian statistics 8. Oxford: Oxford University Press; 2007. p. 1–45.
Rathbun S, Fei S. A spatial zero-inflated Poisson regression model for oak regeneration. Environ Ecol Stat. 2006;13:409–26. https://doi.org/10.1007/s10651-006-0020-x.
Ridout M, Demétrio C, Hinde J. Models for count data with many zeros. Proceedings from the International Biometric Conference, Cape Town; 1998. Available from https://www.kent.ac.uk/smsas/personal/msr/webfiles/zip/ibc_fin.pdf
Ridout M, Hinde J, Demétrio CGB. A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics. 2001;57(1):219–23. Available from https://doi.org/10.1111/j.0006-341X.2001.00219.x
Rodrigues J. Bayesian analysis of zero-inflated distributions. Commun Stat Theory Methods. 2003;32(2):281–9. Available from http://www.tandfonline.com/doi/abs/10.1081/STA-120018186
Roeder K, Lynch KG, Nagin DS. Modeling uncertainty in latent class membership: a case study in criminology. J Am Stat Assoc. 1999;94(447):766–76. Available from http://www.jstor.org/stable/2669989
Rosen O, Jiang W, Tanner M. Mixtures of marginal models. Biometrika. 2000;87(2):391–404. Available from http://biomet.oxfordjournals.org/content/87/2/391.abstract
SAS 9.1.3 Help and Documentation. Cary; 2000–2004. Available from http://sas.com/
Schwarz G. Estimating the dimension of a model. Ann Stat. 1978;6(2):461–4. Available from http://www.jstor.org/stable/2958889
Silva FF, Tunin KP, Rosa GJM, Silva MVBd, Azevedo ALS, Verneque RdS, et al. Zero-inflated Poisson regression models for QTL mapping applied to tick resistance in a Gyr x Holstein F2 population. Genet Mol Biol. 2011;34:575–82. Available from http://www.scielo.br/scielo.php?script=sci_arttext&pid=S1415-47572011000400008&nrm=iso
Skaug H, Fournier D, Nielsen A, Magnusson A, Bolker B. glmmADMB: generalized linear mixed models using AD Model Builder; 2012. R package version 0.7.2.12. Available from http://glmmadmb.r-forge.r-project.org
Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. Bayesian measures of model complexity and fit. J R Stat Soc Ser B Stat Methodol. 2002;64(4):583–639. https://doi.org/10.1111/1467-9868.00353.
Stata Statistical Software: Release 12. College Station; 2011. Available from http://stata.com/
Su L, Tom BDM, Farewell VT. Bias in 2-part mixed models for longitudinal semicontinuous data. Biostatistics. 2009;10(2):374–89. Available from http://biostatistics.oxfordjournals.org/content/10/2/374.abstract
Su L, Brown S, Ghosh P, Taylor K. Modelling household debt and financial assets: a Bayesian approach to a bivariate two-part model; 2012.
Tobin J. Estimation of relationships for limited dependent variables. Econometrica. 1958;26(1):24–36. Available from http://www.jstor.org/stable/1907382
Tooze JA, Grunwald GK, Jones RH. Analysis of repeated measures data with clumping at zero. Stat Methods Med Res. 2002;11(4):341–55. Available from http://smm.sagepub.com/content/11/4/341.abstract
Ver Hoef JM, Jansen JK. Space-time zero-inflated count models of harbor seals. Environmetrics. 2007;18(7):697–712. Available from https://doi.org/10.1002/env.873
Vuong QH. Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica. 1989;57(2):307–33. Available from http://www.jstor.org/stable/1912557
Walhin JF. Bivariate ZIP models. Biom J. 2001;43(2):147–60. Available from https://doi.org/10.1002/1521-4036(200105)43:2<147::AID-BIMJ147>3.0.CO;2-5
Welsh AH, Zhou XH. Estimating the retransformed mean in a heteroscedastic two-part model. J Stat Plann Infer. 2006;136(3):860–81. Available from http://www.sciencedirect.com/science/article/pii/S0378375804003337
Williamson JM, Lin HM, Lyles RH. Power calculations for ZIP and ZINB models. J Data Sci. 2007;5:519–34. Available from http://www.jds-online.com/v5-4
Winkelmann R. Econometric analysis of count data. 5th ed. Berlin: Springer; 2008. Available from http://gso.gbv.de/DB=2.1/CMD?ACT=SRCHA&SRT=YOP&IKT=1016&TRM=ppn+368353176&sourceid=fbw_bibsonomy
Wu MC, Carroll RJ. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics. 1988;44(1):175–88. Available from http://www.jstor.org/stable/2531905
Xiang L, Lee AH, Yau KKW, McLachlan GJ. A score test for overdispersion in zero-inflated Poisson mixed regression model. Stat Med. 2007;26(7):1608–22. Available from https://doi.org/10.1002/sim.2616
Xie H, McHugo G, Sengupta A, Clark R, Drake R. A method for analyzing longitudinal outcomes with many zeros. Ment Health Serv Res. 2004;6:239–46. https://doi.org/10.1023/B:MHSR.0000044749.39484.1b.
Yau KKW, Lee AH. Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme. Stat Med. 2001;20(19):2907–20. Available from https://doi.org/10.1002/sim.860
Zeileis A, Kleiber C, Jackman S. Regression models for count data in R. J Stat Softw. 2008;27(8):1–25. Available from http://www.jstatsoft.org/v27/i08/
Zhang M, Strawderman RL, Cowen ME, Wells MT. Bayesian inference for a two-part hierarchical model: an application to profiling providers in managed health care. J Am Stat Assoc. 2006;101(475):934–45. Available from http://www.jstor.org/stable/27590773
Zuur AF, Saveliev AA, Ieno EN. Zero inflated models and generalized linear mixed models with R. Newburgh: Highland Statistics Ltd; 2012. Available from http://www.highstat.com/book4.htm
29 Data Confidentiality
Theresa Henle, Gregory J. Matthews, and Ofer Harel
Contents
Introduction
Introducing the Basics
  Types of Disclosures and an Overview of Terms
  Privacy for Different Types of Data
  Balancing Privacy Versus Utility
Privacy-Preserving Techniques
  Unperturbed and Perturbed Methods
  Basic Methods for Limiting Disclosure Risk
  More Sophisticated SDC Approaches
Measuring Privacy
  K-Anonymity
  Differential Privacy
Conclusion
References
Abstract

When medical data are collected and disseminated for research purposes, the organization which releases the data has an ethical, and in most cases a legal, responsibility to maintain the confidentiality of the data relating to individuals involved. Striking a balance between getting data to researchers and maintaining this confidentiality is becoming an increasingly tricky proposition. Methods developed in the field of statistical disclosure control aim to thwart potential disclosures of private information while still allowing researchers the ability to use the data. This chapter presents a survey of the main types of potential disclosure risks, an overview of the widely used disclosure control methods, and the most common techniques for measuring privacy.

T. Henle · G. J. Matthews
Department of Mathematics and Statistics, Loyola University, Chicago, IL, USA
e-mail: theresahenle@gmail.com; gjm112@gmail.com

O. Harel (*)
Department of Statistics, University of Connecticut, Storrs, CT, USA
e-mail: ofer.harel@uconn.edu

https://doi.org/10.1007/978-1-4939-8715-3_28
Introduction

In 1997, Governor William Weld of Massachusetts arrived at his office to find his medical record waiting for him in his mailbox. Just months before, he had authorized the release of state employee medical records for research purposes and assured the public of the safety of their release, since all explicit identifiers (e.g., name, address, social security number) had been removed from the data. However, a young MIT graduate student by the name of Latanya Sweeney thought otherwise. Using publicly available voting records and the attributes that remained in the data (ZIP code, birthdate, and gender), Sweeney was able to positively identify Governor Weld’s medical record. Then, to make a point, Sweeney mailed Governor Weld’s medical records to him directly (Sweeney 2002b, 2).

Sweeney’s work exposed the vulnerability of medical data and triggered a widespread response by data publishers and policy makers alike. With the ever-expanding amount of publicly available data along with the increasing power of statistical tools, the confidentiality of data has become a growing concern. To abate the threat that releasing medical data poses to the privacy of individuals, the field of statistical disclosure control was born.

Statistical disclosure control (SDC) aims to develop and assess techniques to safely release data to interested parties, such as researchers (Matthews and Harel 2011). If done correctly, the distributed data should retain its utility to the researcher while fully ensuring the privacy of the participants involved. It is important to clarify that a breach of privacy in this case is not a result of a “hacking” incident but rather discovery of information through statistical techniques. In the case of hacking, an “intruder” or “adversary” gains unauthorized access to data in order to discover sensitive information. In statistical disclosure, an intruder is also interested in identifying sensitive information about specific individuals for malicious intent, but they go about learning private information using data which they are authorized to possess (Shlomo 2015, 201). An individual party who seeks to learn some private attribute or attributes of the entities in a data set, often referred to as a “data snooper” in the statistical privacy literature, will use certain techniques, often in conjunction with other sources of data, to reveal private information about individual entities in the data set that was not meant to be known.

The field of statistical disclosure control has continued to grow in complexity as technology and access have made data more readily available. Today, data are ubiquitous: from the development of electronic health records (EHR) to personal electronic devices and wearables. The availability of data is an invaluable resource to researchers for improving our understanding of diseases, treatments, and the human body. However, releasing data to researchers while maintaining the privacy of the individuals involved is a unique balancing act.

Many agencies rely on publicly released data to perform their research. The United States Census Bureau and National Institutes of Health (NIH) are two of the largest sources of publicly available data. While these agencies seek to share data with as many researchers as possible, they are also required to protect the privacy of participants. Therefore, “investigators submitting a research application requesting $500,000 or more of direct costs in any single year to NIH on or after October 1, 2003 are expected to include a plan for sharing final research data for research purposes, or state why data sharing is not possible” (Matthews et al. 2010).

It is clear that these organizations take privacy seriously, but what exactly is meant by privacy in these settings? Professor Alan Westin of Columbia University defined privacy as the right “to determine what information about ourselves we will share with others” (Fellegi 1972). What one person may wish to keep private, another may reveal publicly, but it is of utmost importance that the individual, rather than a third party, makes the choice to release the information. Therefore, when these preferences are unknown, researchers must default to assuming that participants desire privacy.

From a legal standpoint, steps have been taken to protect certain types of individuals’ data. For
instance, in the mid-1990s, the US Congress passed several important pieces of legislation regarding private information. The Health Insurance Portability and Accountability Act (HIPAA 1996) was passed to protect medical data of individuals, and the Family Educational Rights and Privacy Act (FERPA) was designed to protect the educational data of individuals. HIPAA requires that obvious identifiers, such as name, birthdate, and ZIP code, be removed from medical records prior to data release unless explicitly authorized by an individual. However, even with these baseline privacy measures, data can still be at risk of disclosure. Legal guidelines alone are inadequate for ensuring the protection of sensitive data, and therefore greater measures must be taken to effectively maintain data privacy.

Introducing the Basics

Types of Disclosures and an Overview of Terms

The vulnerability of a given data set hinges on the structure and type of information contained within it. Rows in a data set are herein referred to as records, with each record containing a variety of attributes. An attribute is a value associated with some variable that reveals information about the record. Typically, in a medical data set, a record would be a person, and an attribute would be a descriptive characteristic of that person (e.g., age). Attributes need not be sensitive in nature, but because of their potential to identify an individual, they should be considered when establishing methods of privacy protection. There are three main types of data attributes present in medical health data, and the presence of these attributes can lead to certain privacy disclosure risks (Gkoulalas-Divanis and Loukides 2015). Disclosure risks define the process by which a data set can be breached.

The three types of attributes in medical data are direct identifiers, quasi-identifiers, and sensitive attributes. Direct identifiers are the most dangerous type of attributes from a privacy perspective, as they “uniquely identify patients” (Gkoulalas-Divanis and Loukides 2015, 19). For example, data containing names, social security numbers, or addresses make it easy for an attacker to directly identify an individual and are prohibited from being released under HIPAA guidelines. In all there are 18 direct identifiers that are unlawful to release in medical data. Quasi-identifiers also have the potential to identify a patient but require several working in combination in order to do so. Oftentimes, quasi-identifiers exist partially within the data set of interest and partially in a public data set from a separate source. By cross-tabulating between multiple sources, an adversary can identify individuals. Examples of quasi-identifying attributes could be demographic information (e.g., race, sex, age) or diagnosis codes. Lastly, sensitive attributes are the type of patient information that researchers are the most interested in protecting because it is often information a patient “is not willing to be associated with” and therefore very often the target of an adversary’s attack (Gkoulalas-Divanis and Loukides 2015, 19). While the “specification of sensitive attributes is generally left to data owners” (Gkoulalas-Divanis and Loukides 2015, 19), common examples of sensitive attributes in medical data include serious diseases, such as mental illness and life-threatening conditions (Gkoulalas-Divanis and Loukides 2015, 19). The presence of these attributes can lead to disclosure risks and ultimately threaten patients’ privacy.

The main types of disclosure risk are:

1. Identity disclosure (or reidentification)
2. Attribute disclosure
3. Inferential disclosure

Identity and attribute disclosures are the two most commonly cited types of disclosure risk. Identity disclosure occurs when an adversary is able to identify an individual based on their record within a data set. If direct identifiers are present, or the correct combination of quasi-identifiers, a patient is at risk of identity disclosure. According to Latanya Sweeney, “It has been estimated that over 87% of US citizens can be re-identified based
on a combination of only three demographics (ZIP code, gender and date of birth)” (Sweeney 2000, 1). Attribute disclosure occurs when an adversary is able to learn and reveal sensitive attributes about an individual in the data set. Identity disclosure is often a precursor to attribute disclosure: first, a record in the data set is linked to an individual, and then a private attribute about that individual is learned. When data are in tabular form, attribute disclosure is most likely to occur in a column that contains a degenerate distribution of cell counts, as opposed to a column with more uniformly distributed counts. In general, “a row or column with large cell counts would have less risk of identity or attribute disclosure as compared to a row or column with small counts” (Shlomo 2015, 215).

While attribute disclosure often takes place after an identity disclosure has occurred, there are other ways in which attribute disclosure can occur such as group attribute disclosure, disclosure by differencing, and disclosure by linking tables. In these scenarios, individuals need not be identified in order for disclosure risk to occur, such as in the case of group attribute disclosure, where sensitive information is exposed about a group within the data set, rather than an individual person. For example, say that in a certain data set, all people within a small ZIP code have a diagnosis of high blood pressure. If you know of an individual in this data set who lives in that particular ZIP code, you now also know that they have a high blood pressure diagnosis. Note that no identity disclosure has taken place here as no particular record was matched to any individual; however, an attribute disclosure has still taken place.

Another way for attribute disclosure to occur is from what is called disclosure by differencing. An example of this is where two nested tables (i.e., one table is a subset of another table) are subtracted from one another, exposing sensitive information previously unknown (Shlomo 2015, 214). This is often a problem when data are accessed through flexible table generation. In such a scenario, a user is not given access to the data set as a whole but must submit queries to a database. This is problematic from a privacy perspective when a user can difference two separate query results to gain confidential information about a single person in the study. For example, a user might submit a query based on all men under the age of 34, and then subsequently submit a query based on all men under the age of 35. If the difference is 1, then you have identified a unique combination of attributes of an individual in the data set. Note that if the user were to submit a query based ONLY on men who were EXACTLY 34 (in this case only one record), many query systems would suppress the resulting cell, because they suppress any cell whose value is below some prespecified threshold precisely because a small cell count can lead to attribute disclosures. By submitting nested queries and differencing, a user gains access to information that would be suppressed on its own. The best way to avoid disclosure by differencing is to release a single data set as opposed to providing a system that allows for flexible table generation.

Similar to disclosure by differencing, disclosure by linking tables occurs when two tables originate from the same source and therefore have the potential to be linked by common cells or common margins. This can potentially allow an adversary to discover the SDC technique which guards the data and with it the original data values. The best way to avoid disclosure by linking tables is to ensure that the margins and cells of tables be made consistent (Shlomo 2015, 214).

The last type of disclosure risk is inferential disclosure. Inferential disclosure relies on probability and/or modelling to expose attributes with a high degree of confidence. One way in which inferential disclosure can occur is by way of a regression model if the model has “very high predictive power” (i.e., the dependent and explanatory variables are highly correlated). This specific case of inferential disclosure is called model, or predictive, disclosure. Willenborg and de Waal (2001) explain predictive disclosure using microdata containing information about an individual’s gender, age, occupation, location, and income. If an adversary knows certain characteristics (i.e., gender, age, occupation, and location) about a specific individual in the data set, they can build a regression model to predict an unknown value
(say income). Through modelling, the adversary has achieved the “predictive distribution of the target’s income” (Willenborg and de Waal 2001); however, many would argue that such inferences are the goal of statistical modelling, and that guarding against them therefore extends beyond reasonable privacy protection.

Disclosure by differencing, which was previously discussed, can also be considered a type of inferential disclosure. For this reason, inferential disclosure is often associated with web-based interactive data. The recent emergence of interest in inferential disclosure has catalyzed a need for stricter forms of privacy guarantees and has formed the basis for differential privacy, a popular privacy measurement discussed later in this chapter (Shlomo 2015, 203).

There are examples showing that an individual does not even need to be in a data set in order to be at risk of inferential disclosure. For example, say a person were to release their genetic sequencing data (as was done in the Human Genome Project) and they are found to have a specific gene. Later, a subsequent study with new participants reveals that this specific gene makes a person extremely likely to develop a rare form of cancer. The person who released their genetic data was not a part of the study which determined the effects of this specific gene, but they are now subject to its inference: that they will likely develop this form of rare cancer. However, according to the current privacy guidelines, researchers are only beholden to protect the privacy of those individuals included in the published data set.

Privacy for Different Types of Data

There are two common structures for how published data sets are presented: microdata and tabular data. Microdata are “data containing observations on the individual level” such as social surveys or general health surveys. Tabular data “contains frequency counts or magnitude data,” which is more typical of business surveys (Shlomo 2015, 202). Methods for protecting privacy in tabular data can be classified as either pre-tabular, post-tabular, or some combination of these two methods.

Pre-tabular methods are implemented on original microdata before it is transformed into a tabular data set. Post-tabular methods modify data for privacy purposes after the data set is already in its tabular form. The most common forms of post-tabular disclosure control techniques are methods utilizing random rounding and cell suppression.

Data collection techniques can also impact the type and degree of vulnerability associated with a data set. Data can be generated either through sampling from a larger population or by collecting complete information on the population through what is called a census. Though sampling is more common because it requires fewer resources, census data are popular for government publications. Census data contain unique challenges in preventing identity and attribute disclosure because there is no uncertainty about membership in the data. Conversely, sampling as part of the data collection process obscures the ability to make inference on frequency counts, thereby reducing the possibility of identity disclosure.

Lastly, the introduction of new types of data has opened the door to new concerns over data vulnerability. With advancements in technology, large quantities of data are being generated from processes that simply did not exist until recently, such as location data collected from cell phones or individuals’ genome-wide data. With the Global Positioning System (GPS) capability of modern cell phones, it has become easy to track the locations of millions of people at once, yielding massive quantities of location data. Location data are problematic from a privacy perspective as they have the potential to jeopardize confidential information about individuals such as where they live or where they work.

Genetic data, such as data used in genome-wide association studies (GWAS), for example, are another type of data with unique and ever-expanding complexities. Since the inception of the Human Genome Project in 1990, and its completion in 2003, scientists have made great strides in understanding the human genetic structure. The availability of human genetic data is crucial for the continued growth of genetic research. However,
this new source of personal information has been accompanied by its very own set of privacy concerns. Homer et al. (2008) demonstrated “the ability to accurately and robustly determine whether individuals are in a complex genomic DNA mixture” (Homer et al. 2008, 1). Homer argued the need for more stringent methods for sharing and combining individual genotype data across studies, since “sharing only summary data do not completely mask identity” (Homer et al. 2008, 9). Gymrek et al. (2013) demonstrated a shocking breach of privacy when they were able to identify a man whose genetic data had been used in the Human Genome Project. By using a sequence of that man’s genetic data along with information about his location and age, they were able to cross-reference genealogical databases and public records to discover the identity of the individual (Gymrek et al. 2013).

Balancing Privacy Versus Utility

Protection of privacy is not the only concern a data distributor must consider when preparing a data set. Establishing privacy protections can often come at a loss of data utility, causing the data to become unusable or inaccurate. This is true regardless of the type of data used or how a publisher defines privacy in their data set. In general, “it is always possible to increase the privacy of any specific data release, but this almost assuredly comes with a loss of data utility” (Matthews et al. 2010). Therefore, publishers should think carefully about the balance between privacy and utility when preparing a data set for publication. Further, if the distributor knows that the data will be used for a specific purpose, this is often helpful information in choosing an appropriate disclosure control method.

There are two major frameworks for how to measure data utility on data sets where SDC techniques have been applied. The more general of the two is information loss measures, which do not presume any specific intended use for the data. Alternatively, the utility-constrained approach considers the way the data are intended to be used and preserves data utility for that task specifically.

When the intended use of the data is unknown, publishers can utilize information loss measures to quantify data utility. These measures seek to minimize data distortion in a broad sense, making the data more versatile but relinquishing the promise of utility for any specific task. Information loss measures compare the difference in utility of the altered data set to the original data set. This difference in utility is task specific, meaning the altered data set may perform accurately for some desired analysis but inaccurately for others. For example, if a data set was altered such that marginal totals remained constant, it is likely that when testing for average values, one would achieve perfect retention of data utility. However, in the same case, the relationship between variables may not be maintained, and therefore a subsequent regression analysis would be ineffective at reflecting the true nature of the data. This is often the case when electronic health records and data sets that contain multiple variables of interest need to be shared (e.g., demographics and diagnosis codes). The two attribute types cannot be anonymized separately; however, it is difficult to “preserve data utility when anonymizing both of the attribute types together” (Gkoulalas-Divanis and Loukides 2015, 30). For this reason, it is desirable that publishers reveal the type of privacy-preserving method used and that researchers consider the effect that method may have on the tests they wish to perform.

When the intended use of the data is known, the data distributor will likely opt for a utility-constrained approach when measuring data utility. The specific type of utility constraints employed depends entirely upon the intended use of the data, but in general a utility constraint prevents the anonymization procedure from generating data that will produce vastly different results when compared to the original data. For example, a data publisher may want to add noise to a variable but may check that the resulting sample mean of the modified data is relatively close to the true sample mean of the original data set. Further, constraints preventing combinations of variables that are not possible (e.g., a record of a woman with prostate cancer) could also be considered here.
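
The utility constraint described in the example above (add noise to a variable, then check that the sample mean stays close to the original) can be illustrated with a minimal sketch. The function names, the Gaussian noise model, the tolerance, and the sample ages are all assumptions made for illustration; they are not taken from any of the cited methods.

```python
import random
import statistics


def perturb_with_mean_constraint(values, noise_sd, tolerance, seed=0, max_tries=100):
    """Add zero-mean Gaussian noise to each value and accept the result only
    if the sample mean moves by at most `tolerance` (a hypothetical utility
    constraint). Retries with fresh noise until the constraint holds."""
    rng = random.Random(seed)
    target_mean = statistics.fmean(values)
    for _ in range(max_tries):
        noisy = [v + rng.gauss(0.0, noise_sd) for v in values]
        if abs(statistics.fmean(noisy) - target_mean) <= tolerance:
            return noisy
    raise RuntimeError("no perturbation satisfied the utility constraint")


def is_plausible(record):
    """Hypothetical plausibility rule: reject a record that pairs a female
    patient with a prostate cancer diagnosis."""
    return not (record["sex"] == "F" and record["diagnosis"] == "prostate cancer")


# Illustrative (made-up) ages for ten participants.
ages = [34, 41, 29, 57, 63, 38, 45, 52, 70, 26]
released = perturb_with_mean_constraint(ages, noise_sd=2.0, tolerance=0.5)
```

A constraint on the mean alone says nothing about variances or relationships between variables, so a publisher who cares about regression results would need to add constraints targeted at that analysis.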
Privacy-Preserving Techniques

The following section provides an overview of the most common and basic privacy-preserving techniques utilized by researchers. These techniques provide the foundation for the more sophisticated techniques to follow. In general, there are two basic approaches for statistical disclosure control: (1) restricting access to the data, “for example, by limiting its use to approved researchers within a secure data environment (safe access),” or (2) implementing statistical disclosure techniques to protect the data prior to release (“safe data”). It is typical for a publisher to use some combination of both approaches when releasing sensitive health data; however, in this section the focus is on creating “safe data” (Shlomo 2015, 201).

Unperturbed and Perturbed Methods

Statistical disclosure control (SDC) methods protect the privacy of medical health data by preventing adversaries from uncovering sensitive information. SDC methods can be broken down into two categories: perturbed and unperturbed methods. Perturbed methods work by adding noise to data, thereby obscuring the true values. This can be done either through a probability distribution approach or a value distortion approach. The probability distribution approach identifies the distribution of the data and samples from that distribution to create a new data set of plausible values. The value distortion approach perturbs data by building decision tree classifiers for the data where each element is assigned random noise. Then, the perturbed data are sampled to match the distribution of the original data set. In general, the value distortion approach is considered to be more effective than the probability distribution approach; however, random additive noise, if properly filtered, can lead to privacy compromises. In general, perturbed methods are more difficult to implement, both because they require higher statistical sophistication and because of the added inconvenience that one would need the details on how the data were perturbed in order to analyze them.

Unperturbed methods do not alter the data but rather seek a limitation of detail in order to preserve privacy. Methods surveyed here include suppression, generalization, rounding, sampling, and disassociation. The main advantage of unperturbed techniques is that the risk of altering relationships between variables is less than with perturbed techniques. This is because unperturbed techniques protect the data by reducing the detail rather than altering the data through noise addition. However, when a study calls for specific detailed information, unperturbed methods may obscure the data in such a way that they are no longer useful.

Before selecting an SDC method to implement, the first step is to remove obvious identifiers in the data set. Such identifiers include name, social security number, birth date, and home address. As previously stated, “87% of the population in the United States have reported characteristics that made them unique based only on ZIP code, gender and date of birth” (Sweeney 2002b, 2). Removing identifiers is necessary to preserving data privacy, but by itself it is not usually enough to protect the privacy of individuals. Recalling the example from the introduction, the Massachusetts Group Insurance Commission released data under the impression that it was safe, having removed all obvious identifiers. However, Sweeney (2002b) was able to cross-reference the released medical records with publicly available voting records and identify the specific medical record of former Massachusetts Governor William Weld.

Basic Methods for Limiting Disclosure Risk

Among the many disclosure control techniques, the simplest are generalization, suppression, rounding, sampling, randomization, and additive noise.

Generalization

Generalization works by binning similar values of sensitive variables into overarching generalized terms. For example, rather than providing separate
diagnosis codes for different forms of cancer and risking small counts that are more easily exposed, generalization would bin all cancer diagnoses into one or more subsets that contain higher counts. The generalized term is still semantically consistent for the specific diagnosis, such as replacing lymphoma with “cancer.” However, the generalized term does not offer as much detail, thereby obscuring sensitive values and preventing an attacker from distinguishing a specific diagnosis code from within the generalized term. Generalization is best implemented when the number of quasi-identifier attributes is small and when the intended use applies to a range of data rather than a specific class. The more attributes involved, the greater the number of generalized terms required to ensure privacy, which will lead to the degeneration of data utility. When a user seeks information about a group or range of values, such as people from a certain geographic area, generalization provides privacy without any utility loss.

Generalization is susceptible to composition attacks when multiple independent data sets are available. If two equivalence classes share only one sensitive value, an adversary can deduce sensitive information by differencing. For example, the raw data set may contain information about the age of an individual. Rather than reporting exact age, generalization would report, for instance, the age group (e.g., 20–29, 30–39, etc.).

Suppression

Generalization is a favorite technique due to its “faithful” information properties. Although the granularity of detail may not be fine, the accuracy of values is pristine, and the relationship between variables is not disturbed. Suppression is an extreme case of generalization where the most generalized term is utilized. Therefore, where possible, generalization is preferred because it is a superior technique in preserving data utility. Top-bottom coding is another specialized case of generalization that applies specifically to extreme values. For example, there may only be one person in the study that is 99 years old; however, there may be ten individuals over the age of 80. An agency may record specific age for individuals in the study less than 80 years old but utilize the generalized term “over 80” for those ten individuals. Generalization provides the basis for more complex partitioning privacy preservation models such as k-anonymity, l-diversity, and t-closeness (Li et al. 2015, 187). These techniques will be discussed in the following section.

Sampling

The last of the unperturbed methods is sampling. A familiar technique for data collection, sampling is also very useful in privacy preservation. Skinner et al. (1994) make the case that “population uniqueness will be a sufficient condition for an exact match to be verified as correct.” In other words, samples obscure population uniqueness and stifle an adversary’s ability to cross-reference uniqueness between data sets in a linkage attack. Sampling also does extremely well in balancing privacy with utility, as proper sampling techniques should yield data that are an accurate representation of the population. Moreover, sampling is an “easy technique to implement and the resulting sampled data are relatively easy to analyze” (Matthews and Harel 2011).

Randomization

Perturbation techniques work by modifying the contents of the data in some way as the basis for privacy preservation. Randomization is the most basic perturbation technique and can be used for both microdata and tabular data sets. In randomization, noise is randomly added to the original values (or aggregated values), obscuring the true values contained within an individual’s record and making it difficult for an adversary to infer sensitive information. The simplest application of randomization would be random noise drawn independently from an identical distribution with a positive variance and mean of zero. In this case, the addition of random noise “will not change mean of the variable for large data sets, but will introduce more variance,” (Shlomo 2015, 210) which may harm the ability of a researcher to make accurate statistical inferences. Randomization is best used within “small homogenous sub-groups in order to use different initiating perturbation variance for each sub-group” (Li et al. 2015, 180). The use of subgroups for noise
addition is also beneficial in maintaining accurate relationships between variables in the data (Matthews et al. 2010).

Rounding

Rounding is another perturbation method generally applied to tabular data sets. As the name implies, in rounding, observations are rounded up or down to the nearest multiple of a predetermined rounding base. For example, under random rounding, if the rounding base was 1 and the observed value was 0.3, the probability of rounding up would be 0.3, whereas the probability of rounding down would be 0.7. Another method is controlled rounding, “which allows the sum of the rounded values to be the same as the rounded value of the sum of the original data” (Shlomo 2015, 218). A problem with random rounding occurs, however, when cells generated in different tables lack consistency. When this happens, “the true cell count can be learned by generating many tables containing the same cell and observing the perturbation patterns” (Shlomo 2015, 218). An alternative to controlled rounding is semi-controlled random rounding, which “ensures that rounded internal cells aggregate to the controlled rounded total” (Shlomo 2015, 218), thereby enforcing consistency across all generated tables.

More Sophisticated SDC Approaches

Micro-agglomeration, Substitution, Subsampling, and Calibration

MASSC (Micro-agglomeration, Substitution, Subsampling, and Calibration) combines various simple techniques to create a more robust approach to data privatization. The name of the procedure lays out the four steps: micro-agglomeration, substitution, subsampling, and calibration. In micro-agglomeration, records are sorted by the level of risk, dependent on the presence of identifying variables. High-risk identifying variables are called core variables, as compared to noncore identifying variables, which generally pose less risk to privacy. Core identifying variables pose a greater risk because they are generally easier for an intruder to obtain. A greater risk to a record exists when core identifying variables are present and are unique. Disclosure control techniques are then applied to groups of records based on their risk category. Substitution techniques are used to perturb the data. Substitution methods include random rounding, randomization, data swapping, and synthetic data (the last two methods mentioned here are discussed in detail below). The data are then sampled from the perturbed data set to add another layer of privacy protection and to “help reduce the bias caused by substitution” (Singh et al. 2003). A unique and desirable property of MASSC is “that both disclosure risk and information loss can be controlled for simultaneously” (Matthews et al. 2010).

Data Swapping

Data swapping is a privacy-preserving technique popular for its ease of use. Although the technique was originally intended for use on contingency tables, it has become a popular technique for microdata as well. The procedure involves “the swapping of values of variables for records that match on a representative key” (OECD 2008, 126). In other words, given a data set with a sensitive variable, such as cancer diagnoses, where it is necessary to protect against attribute disclosure, some records containing that diagnosis code will swap values with other records, exclusively within that variable. Variables that are not considered sensitive will be untouched by this process, for the record swapping applies only to the variable of concern. An example of this is shown in Fig. 1. In the real data set, the sensitive variable is the participant’s cancer diagnosis. In the swapped data, the second and third rows are swapped within the cancer column, so that the participant in row 2 now is associated with a cancer diagnosis and the participant in row 3 no longer is.

Data swapping is best used when one is simply interested in univariate statistics. Since records are swapped one for one, the marginal totals remain intact, making univariate statistics unchanged. Multivariate relationships, on the other hand, between the affected variable and the other variables in the data set may not be correctly
Fig. 1 A simple example

of data swapping
maintained in the data swapping process. However variables are swapped with records where the
because only the sensitive variable is affected, mul- value of the sensitive variable falls within a
tivariate analysis can be effectively conducted by certain range of the original record. This restric-
simply excluding the sensitive variable. tion allows the relationships between the sensi-
When implementing data swapping, one must be wary of swaps that may result in impossible or improbable records. An example of this would be a data set containing the variables gender and diagnosis code in which swapping resulted in a record suggesting that a female was diagnosed with prostate cancer.

As mentioned, data swapping is effective for both tabular and microdata sets. However, implementation procedures may differ depending on the type of data used. Since microdata provide subject-level information rather than variable aggregates, “many more swaps must be made to preserve the level of privacy” (Matthews et al. 2010). Determining the number of swaps necessary was deemed “computationally impractical” by Fienberg and McIntyre (2004), and therefore, as a best practice, totals should be preserved only approximately. As previously stated, arguably the greatest advantage of this method is that it is very easy to implement. All that is required is microdata and a random number generator (Moore 1996).

Rank-Based Proximity Swapping
A more contemporary alternative to the traditional data swapping method is rank-based proximity swapping, proposed by Greenberg (1987) and popularized by Moore (1996). Unlike data swapping, values of sensitive

tive variable and the other variables in the data set to be more effectively maintained than in traditional data swapping, where the process of swapping is strictly random.

Data Shuffling
Sarathy and Muralidhar (2002) proposed a further extension of data swapping called data shuffling. Data shuffling utilizes a conditional distribution approach in which all of the marginal distributions remain intact. Moreover, pairwise monotonic relationships in the original data are maintained. The method is therefore able to increase privacy protection without sacrificing the high level of utility achieved through data swapping (Sarathy and Muralidhar 2002). For this reason, this method has become standard for many, including the United States Bureau of the Census and the Office for National Statistics in the UK (Lauger et al. 2014).

Randomized Response
Randomized response is a technique for survey data closely related to the previously discussed technique of randomization (Warner 1965; Greenberg et al. 1969). In randomized response, respondents answer a question truthfully with some given probability (e.g., a coin flip). Otherwise, they are instructed to answer the question with the opposite of the truthful answer.
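The randomized response mechanism can be illustrated with a short simulation. The following is a minimal sketch, assuming the design in which each respondent answers truthfully with a known probability and gives the opposite answer otherwise; the function names, the sample size, the prevalence of 0.10, and the truthful-answer probability of 0.7 are all hypothetical choices made for illustration. Under this design a "yes" is reported with probability λ = pπ + (1 − p)(1 − π), where π is the true proportion and p the probability of a truthful answer, so π can be estimated as (λ̂ − (1 − p)) / (2p − 1). The estimator is undefined at p = 0.5, which suggests that under this particular design the randomizing device would need to be biased away from a fair coin for the proportion to be recoverable.

```python
import random


def randomized_response(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, otherwise its opposite."""
    return truth if rng.random() < p_truth else not truth


def estimate_prevalence(responses: list[bool], p_truth: float) -> float:
    """Estimate the true 'yes' proportion from randomized answers.

    Inverts lambda = p*pi + (1 - p)*(1 - pi) for pi.
    """
    if abs(2 * p_truth - 1) < 1e-12:
        raise ValueError("p_truth = 0.5 makes the prevalence unidentifiable")
    observed_yes = sum(responses) / len(responses)
    return (observed_yes - (1 - p_truth)) / (2 * p_truth - 1)


def simulate(n: int = 100_000, prevalence: float = 0.10,
             p_truth: float = 0.7, seed: int = 0) -> float:
    """Simulate n respondents (hypothetical values) and return the estimate."""
    rng = random.Random(seed)
    truths = [rng.random() < prevalence for _ in range(n)]
    responses = [randomized_response(t, p_truth, rng) for t in truths]
    return estimate_prevalence(responses, p_truth)


if __name__ == "__main__":
    print(f"estimated prevalence: {simulate():.3f}")
```

No individual response in this sketch reveals the respondent's true status with certainty, yet the aggregate estimate stays close to the simulated prevalence because the randomization probability is known to the analyst.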
This technique is most useful when the questions being asked require respondents to reveal sensitive information about themselves that they may not be comfortable answering truthfully.

As a simple example, consider the sensitive survey question “Are you an injection drug user?” Before answering, the respondent flips a coin to determine whether or not to answer truthfully; the outcome remains unknown to the administrator of the survey. If the coin lands heads, the respondent is directed to answer the question truthfully. If the coin lands tails, the respondent is directed to answer the question untruthfully (i.e., “Yes” if they ARE NOT an injection drug user and “No” if they ARE an injection drug user). Since the probabilities of truthful and untruthful answers are known, these types of data are useful for many types of analyses, and the uncertainty in the answers provided introduces a level of confidentiality for the data because an adversary cannot be sure whether a given response is actually correct or not.

PRAM
The randomized response method can also be applied to mask raw microdata, even when there is no suspicion of false response. This special case of randomized response is called the Post-randomization Method (PRAM). In PRAM, “for each observation, the real value of a sensitive field would be released with some probability and its opposite would be released with some other probability” (Matthews et al. 2010, 6). The result is essentially a randomized addition of noise. The difference between PRAM and randomized response is that “PRAM is applied after completion of a survey and formation of the data set, whereas randomized response is applied during the interviewing” (Willenborg and De Waal 2001, 32). While in randomized response the random mechanism is “independent of the true score,” in PRAM “the true value is known and one can therefore condition on this value when defining the probability mechanism used to perturb the data” (Matthews et al. 2010, 7). This distortion of the data, however, requires the researcher to have information about the randomization mechanism in order to analyze the data effectively.

Synthetic Data

Synthetic data are a perturbation approach wherein artificial data sets are generated from the original data through the process of multiple imputation (Rubin 1993). Multiple imputation is a technique traditionally used in missing data settings, where missing values are filled in by sampling from an appropriate distribution (Rubin 1987). Multiple imputation requires the creation of multiple completed data sets, each time replacing the originally missing cells with plausible values. The statistical analysis of interest is then performed on each completed data set, and the results are combined across the data sets using Rubin’s combining rules (Harel and Zhou 2007).

When creating synthetic data, however, the purpose is not to impute missing values but rather to create usable data sets that conceal sensitive information. This is accomplished by viewing sensitive attributes of the data as missing values and replacing them using multiple imputation techniques. In the case of fully synthetic data generation, all sensitive variables in the original data set are viewed as missing, and the posterior predictive distribution is used to generate a synthetic “population.” This is repeated to create several fully imputed data sets, each of which is considered a synthetic population. Lastly, random samples are drawn from each synthetic population, and this collection of data sets is released to the public. A popular alternative to fully synthetic data is partially synthetic data. This technique is similar to fully synthetic data but imputes values only for selected sensitive attributes, rather than for the entire collection of sensitive variables. An agency may select individual attributes or entire variables, depending on its privacy needs, and the resulting data set would then contain both real and synthetic data values.

With either fully or partially synthetic data, researchers can perform an analysis on each data set and combine their results using the rules set forth in Raghunathan et al. (2003) (for fully synthetic data sets) or Reiter (2003) (for partially synthetic data sets), which are slightly modified versions of Rubin’s combining rules.

Synthetic data sets are desirable for their ease of analysis. Similar to multiple imputation,
researchers can compute analyses across several synthetic data sets and pool their results for a combined estimate. However, since synthetic data rely on artificial values, researchers may be left pondering the validity of their findings. Raghunathan et al. (2003) and Reiter (2005) set out to assure researchers that synthetic data have merit by showing that, for accurate imputation models, the resulting analyses yield results almost identical to those from the original data. However, “if the model for imputation is incorrect or inaccurate, the resulting analysis from the synthetic data will yield parameter estimates that are much different than those estimated from the actual data. As such, synthetic data sets are only as good as the models used for imputation” (Matthews et al. 2010, 10).

Though the idea of synthetic data sets was slow to catch on, it has become a widely used and highly successful disclosure control technique. The most visible user of this technique is the United States Census Bureau, which has used partially and fully synthetic data in several of its publicly released data sets, including the yearly release of “On the Map” data (Shlomo 2015, 228). These data, generated by personal GPS devices, provide information on the locations of individuals. It would be rather easy to identify individuals based on their home and place of work, making this a statistical disclosure concern. Through the use of synthetic data sets, however, the Census Bureau has been able to release these data without risking the privacy of the individuals involved (Shlomo 2015, 228).

Measuring Privacy

Statistical disclosure techniques are designed to protect the privacy of individuals by masking sensitive attributes and limiting disclosure risk while, at the same time, producing data sets that are useful for analysis and inference. While assessing data utility is relatively straightforward (i.e., how similar is the analysis when using the raw data vs. the analysis when using the protected data), the assessment of privacy is substantially more difficult. This is because many different kinds of disclosure exist, and measures of privacy differ depending on the type of disclosure.

Measures of privacy based on reidentification assess the probability of accurately identifying a subject in the published data set. Spruill (1982) studied the privacy of some masking procedures (e.g., normal random error, random rounding, data swapping, etc.) and proposed a measure of confidentiality based on the percentage of records in the published data set that could be linked to the original record. Paass (1988) discusses a measure of privacy based on matching subjects in the published data set to some additional available information; the proposed measure is based on the percentage of records that are at risk for identification. Paass concluded that the best way to protect privacy is to release as few variables as possible, since the greater the number of variables, the more difficult it is to protect against a privacy attack. Larger data sets (i.e., data sets with many variables) require substantial modifications to the data in order to maintain a robust level of privacy, though this comes at the cost of potentially dramatic reductions in data utility. Paass also notes that the addition of random noise does little to protect privacy in this framework.

K-Anonymity

K-anonymity is an additional privacy measure for data that have had suppression and generalization techniques applied to them. In general, k-anonymity promises a level of anonymization for any given record in the data by focusing on quasi-identifiers. As previously mentioned, quasi-identifiers are “a set of attributes in a data set that could be used for matching with an external database” (Matthews et al. 2010, 16). Quasi-identifiers put an individual at greatest risk for disclosure when a certain combination of attributes is rare or, in the worst case, unique.

Fig. 2 An example of making data 2-anonymous

More formally, k-anonymity states that every set of quasi-identifiers that appears in the
data set must appear at least k times. Thus there is at most a 1/k chance of reidentifying a particular record (Sweeney 2002a).

In application, take the following two data sets with the variables age group, gender, and surgical procedure. Generalization has been applied to the age group variable in the “Privacy Preserved Data” so that there are fewer overall age groups and less potential to uniquely identify an individual based on age. In the raw data set, the combination of age group = 70–75 and gender = F is unique. However, after the data are generalized, every combination of quasi-identifiers (age and sex) appears at least two times. Therefore, the privacy of these data can be measured as 2-anonymous by the principle of k-anonymity. Note, however, that both 70–80-year-old men had the Whipple procedure, thus causing an attribute disclosure even though no record was uniquely identified (Fig. 2). Extensions of k-anonymity include l-diversity (Machanavajjhala et al. 2007) and t-closeness (Li et al. 2007).

Differential Privacy

Differential privacy was proposed in Dwork (2006); it provides formal privacy guarantees and results in one of the strongest versions of privacy. The basic idea of differential privacy is that no single observation in a data set should be overly influential in terms of a function of the data. This means that, for a given function of the data, the value of this function will not change “very much” if ANY one single record in the data is modified. Data sets that differ by only one record are referred to as neighboring data sets. (There are actually two distinct meanings for neighboring data sets: one refers to a record being modified, and the other refers to a record being removed. Here the second definition is used.) Exactly how much values of the function are allowed to change is controlled by the parameter epsilon (ϵ), with smaller values guaranteeing more privacy and larger values guaranteeing less. Guaranteeing that the result of a function of the data does not change “very
much” is accomplished by releasing a randomized version of the function rather than its exact value. This results in very strong privacy. Practically speaking, this type of privacy guarantees that if an adversary knows all records in the data set except for one, they will still not be able to learn very much about the last unknown observation, and this would be true for ANY set of observations.

Example data set: 1, 2, 3, 4, 100

As an example, imagine that a data set contains five observations, one of which is a large outlier. The mean of this data set is 22. However, rather than releasing the value of 22, a randomized version of the mean is released by simply adding some noise to the true value of the sample mean. If no noise were added and the true value of the sample mean were released, an intruder who knew the first four values in this data set and the mean of 22 could learn the exact value of the remaining data point. However, since the released value of the mean is random, the exact value of the remaining data point is uncertain. The exact amount of noise that must be added is based on the data releaser’s choice of the ϵ parameter and what is referred to as the sensitivity of the function. The sensitivity of the function is the absolute value of the largest possible difference between the function computed on the actual data and on a neighboring data set, across ALL neighboring data sets.

As an example of sensitivity, if we consider the neighboring data set with the outlier removed, the mean is now 2.5. This yields a sensitivity of |22 − 2.5| = 19.5, as this is the largest difference across all neighboring data sets.

One of the simplest and most popular ways to achieve ϵ-differential privacy is to add Laplace noise to the true value of the function of interest calculated on the full data set, where the mean of the Laplace distribution is 0 and the variance is determined by the value of ϵ and the sensitivity of the function.

Extensions of differential privacy include several relaxed versions, including (ϵ, δ)-indistinguishability (Nissim et al. 2007) and probabilistic differential privacy (Machanavajjhala et al. 2008). Matthews et al. (2010) and Matthews and Harel (2012) place the problem of measuring privacy in a hypothesis testing framework and use the receiver-operating characteristic (ROC) curve to assess the privacy of a database.

Conclusion

It is estimated that 2.5 quintillion bytes of data are collected every day (DN Capital 2015). These massive quantities of data allow researchers and businesses to perform analyses that were previously unthinkable. However, as the amount of data collected increases, concerns about data privacy naturally follow. Malicious data users often possess the capabilities to expose sensitive attributes and reveal the identities of individuals in a publicly available data set. This is especially problematic in medical data, where sensitive attributes might refer to a serious illness or diagnosis. Therefore, it is of the utmost importance that proper consideration be given to protecting patient privacy prior to releasing medical data, which requires more than simply removing direct identifiers. It is imperative that statistical disclosure control techniques be applied to data to ensure a standard of privacy.

References

DN Capital – Venture Capital. Beyond ‘big data’ to data driven decisions. 2015. Dncaptical.com/thoughts/beyond-big-data-to-data-driven-decisions/.
Dwork C. Differential privacy. In: ICALP. Springer Verlag; 2006. p. 1–12. MR2307219.
Fellegi IP. On the question of statistical confidentiality. J Am Stat Assoc. 1972;67(337):7–18.
Fienberg SE, McIntyre J. Data swapping: variations on a theme by Dalenius and Reiss. In: Domingo-Ferrer J, Torra V, editors. Privacy in statistical databases. Vol. 3050 of lecture notes in computer science. Berlin/Heidelberg: Springer; 2004. p. 519. https://doi.org/10.1007/978-3-540-25955-8_2.
Gkoulalas-Divanis A, Loukides G. A survey of anonymization algorithms for electronic health records. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 17–34.
Greenberg B. Rank swapping for masking ordinal microdata. Technical report, U.S. Bureau of the Census (unpublished manuscript), Suitland; 1987.
Greenberg BG, Abul-Ela A-LA, Simmons WR, Horvitz DG. The unrelated question randomized response model: theoretical framework. J Am Stat Assoc. 1969;64(326):520–39. MR0247719.
Gymrek M, McGuire AL, Golan D, Halperin E, Erlich Y. Identifying personal genomes by surname inference. Science. 2013;339:321–4.
Harel O, Zhou X-H. Multiple imputation: review and theory, implementation and software. Stat Med. 2007;26:3057–77. MR2380504.
Health Insurance Portability and Accountability Act (HIPAA); Pub.L. 104–191, 110 Stat. 1936, enacted August 21, 1996.
Homer N, Szelinger S, Redman M, Duggan D, Tembe W, Muehling J, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008;4(8):e1000167. https://doi.org/10.1371/journal.pgen.1000167
Lauger A, et al. Disclosure avoidance techniques at the U.S. census bureau: current practices and research. Research report series. 2014. www.census.gov/srd/CDAR/cdar2014-02_Discl_Avoid_Techniques.pdf
Li N, Li T, Venkatasubramanian S. t-closeness: privacy beyond k-anonymity and l-diversity. In: Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on; 2007. p. 106–15.
Li H, et al. Differentially private histogram and synthetic data publication. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 35–58.
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M. L-diversity: privacy beyond k-anonymity. ACM Trans Knowl Discov Data. 2007;1(1):3.
Machanavajjhala A, Kifer D, Abowd J, Gehrke J, Vilhuber L. Privacy: theory meets practice on the map. In: International Conference on Data Engineering. Cornell University Computer Science Department, Cornell; 2008. p. 10.
Matthews GJ, Harel O. Data confidentiality: a review of methods for statistical disclosure limitation and methods for assessing privacy. Statist Surv. 2011:1–29. https://doi.org/10.1214/11-SS074.
Matthews GJ, Harel O. Assessing the privacy of randomized vector valued queries to a database using the area under the receiver-operating characteristic curve. Health Serv Outcome Res Methodol. 2012;12(2–3):141–55.
Matthews GJ, Harel O, Aseltine RH. Assessing database privacy using the area under the receiver-operator characteristic curve. Health Serv Outcome Res Methodol. 2010;10(1):1–15.
Moore Jr R. Controlled data-swapping techniques for masking public use microdata. Census Tech Report. 1996.
Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis. In: STOC ‘07: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing; 2007. p. 75–84. MR2402430.
OECD Statistics. Glossary of statistical terms. OECD glossary of statistical terms – data swapping definition, stats. 2008. Oecd.org/glossary/detail.asp?ID=6904
Paass G. Disclosure risk and disclosure avoidance for microdata. J Bus Econ Stat. 1988;6(4):487–500.
Raghunathan TE, Reiter JP, Rubin DB. Multiple imputation for statistical disclosure limitation. J Off Stat. 2003;19(1):1–16.
Reiter JP. Inference for partially synthetic, public use microdata sets. Surv Methodol. 2003;29(2):181–8.
Reiter JP. Releasing multiply imputed, synthetic public use microdata: an illustration and empirical study. J Royal Stat Soc Series A Stat Soc. 2005;168(1):185–205. MR2113234.
Rubin DB. Multiple imputation for nonresponse in surveys. Hoboken: Wiley; 1987. MR0899519.
Rubin DB. Comment on “statistical disclosure limitation”. J Off Stat. 1993;9:461–8.
Sarathy R, Muralidhar K. The security of confidential numerical data in databases. Inf Syst Res. 2002;13(4):389–403.
Shlomo N. Statistical disclosure limitation for health data: a statistical agency perspective. In: Gkoulalas-Divanis A, Loukides G, editors. Medical data privacy handbook. Cham: Springer International Publishing; 2015. p. 201–30.
Singh A, Yu F, Dunteman G. MASSC: a new data mask for limiting statistical information loss and disclosure. In: Proceedings of the Joint UNECE/EUROSTAT Work Session on Statistical Data Confidentiality; 2003. p. 373–94.
Skinner C, Marsh C, Openshaw S, Wymer C. Disclosure control for census microdata. J Off Stat. 1994;10:31–51.
Spruill NL. Measures of confidentiality. Proceedings of the section on survey research methods, American Statistical Association. 1982.
Sweeney L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh; 2000.
Sweeney L. Achieving k-anonymity privacy protection using generalization and suppression. Int J Uncertainty Fuzziness Knowledge Based Syst. 2002a;10(5):571–88. MR1948200.
Sweeney L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. 2002b.
Sweeney L. K-anonymity: a model for protecting privacy. Int J Uncertainty Fuzziness Knowledge Based Syst. 2002c;10(5):557–70. MR1948199.
Warner SL. Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc. 1965;60(309):63–9.
Willenborg L, de Waal T. Elements of statistical disclosure control. New York: Springer; 2001. MR1866909.
Qualitative Research
30
Cynthia Robins
Contents
Introduction
What Is Qualitative Research?
A Sampling of Qualitative Health Research Studies
Methods of Qualitative Data Collection
Informed Consent
Data Collection Approaches
To Record or Not to Record?
Data Analysis
Simplifying the Data
Summary
References
Abstract

Qualitative methods were introduced into the world of health services research about three decades ago and have begun to gain traction among researchers only in the last decade and a half. Despite the growing interest in what qualitative research can tell us about the human understanding and experience of illness, skepticism remains among some scholars about the value added by non-numeric research. Indeed, say some, if the findings from qualitative studies are not generalizable, can it really be called “research” at all? A philosophical debate about qualitative versus quantitative research, however, is not within the purview of this chapter. Rather, the pages that follow have a threefold objective: first, to set forth the epistemological assumptions of qualitative research, which are fundamentally different from their quantitative counterparts (and thus non-comparable); second, to provide the reader with a brief review of seminal works in qualitative health research and to discuss what factors have contributed to the growing interest in such approaches; and, lastly, to provide readers with some basic tools of qualitative data collection and analysis that can serve as templates for their own qualitative health studies. The overarching goal of the chapter is to argue that when conducted systematically by well-trained scholars, qualitative research has the potential to offer us valuable insights into the socio-cultural factors that underlie the interpretation of diseases, the illness experience, and the search for meaningful and effective treatments.

C. Robins (*)
Westat, Rockville, MD, USA
e-mail: cynthiarobins@westat.com

https://doi.org/10.1007/978-1-4939-8715-3_29

Introduction

Qualitative research involves the analysis of nonnumeric data obtained through data collection methods such as in-depth interviews, focus groups, and observations. Although they are gaining traction in the field of health research, qualitative methods are a relatively recent addition, and many health researchers are still unsure about the value they bring to the field. This chapter thus has a threefold objective. First, it briefly sets forth how qualitative research differs philosophically from quantitative research. Although readers may be aware that qualitative and quantitative data collection methods are different, it is also important to understand that the epistemologies that drive each approach are very distinct. Second, this chapter reviews the recent history of the use of qualitative methods in health research. It looks at the social and political processes that co-occurred with the rise of qualitative research in health and healthcare and briefly describes some of the signal studies in the field over the last 20–25 years. Finally, the chapter provides the reader with an overview of the fundamentals of qualitative research, including data collection techniques and the basics of the analytic process. The goal of this chapter is to demonstrate that when skillfully and appropriately implemented, qualitative research can offer critical insights into such phenomena as patients’ experiences, service providers’ views of disease processes and treatments, and key socio-cultural factors that underlie the structure and delivery of health care in different communities.

What Is Qualitative Research?

Much ink has been spilled over the years on what scholars have often referred to as the “qualitative-quantitative divide” or the “science wars,” with scholars arguing about the supremacy of one approach over the other (classic examples include Popper 1934; Kuhn 1962; Sokal 1996). Those arguments will not be revisited here; instead, this chapter starts from the assertion shared by Hopper (2008), Morse (1991), and others that qualitative research is not “better” or “worse” than its scientific counterparts, but legitimate in its own right and on its own terms. In research, as in life, some things simply serve different ends. To ask, “Which is better, an electric drill or a reciprocating saw?” is a pointless question without knowing the project that is to be undertaken. Once the project objectives are clearly defined, however, there is a right answer. Selecting the wrong tool for the job, perhaps because it is the one the researcher likes or knows best, can have disastrous consequences for the work at hand.

For the reader who is not well versed in philosophy and who simply wants to know if he or she should consider using qualitative methods on a project, the following brief distinction is worth considering. Quantitative research is rooted in a philosophy of positivism, i.e., the belief that there is objective truth in the world that can be discovered through the application of the scientific method (see discussion in Ponterotto 2005, 128–129). Through controlled experiments, careful measurements, and agreed-upon numeric indicators (e.g., p-values, confidence intervals), science aims to gather increasing amounts of information about the world. Scientific advances, thus, are viewed as getting us closer to a full understanding of an empirical reality.

Qualitative research, by contrast, has its roots in a philosophy of knowledge often called phenomenology or social constructivism (Morse and Field 1996; Alvesson and Skoldberg 2009). This philosophical position is quite distinct from positivism in that it asserts that human beings’ interactions with the world are always mediated by a socially or culturally provided system of symbols: language, beliefs, values, and rules for behavior.
For example, cultural anthropologists, such as Geertz (1973) and others, operate from the fundamental position that because all of our experiences are filtered through a cultural lens, there is no way to get at what some might refer to as “truth”: What we believe to be truth is someone else’s heresy. Qualitative research thus aims not to uncover “the” truth, but rather “their” truth, often with the explicit aim of creating a foundation of understanding between populations in conflict. The objective of qualitative research generally “is to experience, reflect, organize, understand, and communicate” (Estroff 1981, xvi).

Different paradigms lead to different questions and thus different ways to answer those questions. The reader should not wonder if quantitative or qualitative research is “better,” even if a dissertation advisor prefers one approach or the other. The question really is: Which is the right tool to meet the research objectives? If the research questions seek to understand volumes or counts (e.g., “How much...,” “How many...,” “How often...”), the reader should look to quantitative data collection techniques. If the interest is in what the world looks like from another’s point of view, perhaps with an eye towards understanding motivations (e.g., “Why do...,” “How do...”), then qualitative approaches are likely the best option. Once the research objectives are clearly defined, the choice – or even choices (a researcher may use multiple methods) – will become obvious. What remains is for the researcher to learn how to use the tool properly.

A Sampling of Qualitative Health Research Studies

The use of qualitative methods to learn how people make sense of their medical experiences – either as recipients or providers of health care – is a fairly new phenomenon, dating back only to the mid-1980s. Arguably one of the greatest contributors to this epistemological shift was the patients’ rights movement, which sought to challenge the hegemony of the medical system. The mental health consumer rights movement, for example, was an early catalyst for change, as former patients of psychiatric hospitals decried some of the abuses they had endured under the guise of psychiatric medicine. Members of this movement, such as activist-writer Judi Chamberlain, wanted to tell their side of the story, i.e., to share their experiences and perspectives being “treated” under lock and key. Chamberlain’s landmark work, On Our Own: Patient-Controlled Alternatives to the Mental Health System (Chamberlain 1978), generously incorporated first-person accounts from mental health consumer/survivors and, in so doing, made a compelling argument that there could be two sides to the medical story.

At roughly the same time, qualitative research methods were being incorporated more broadly into health research. One of the earliest such endeavors was Making It Crazy (Estroff 1981), anthropologist Sue Estroff’s account of how individuals with psychiatric disorders were putting together their lives outside of the state mental hospital. Estroff’s research methods included participant observation, in-depth interviews, and ad hoc encounters in the community with formerly hospitalized psychiatric patients. The result of her time “in the field” is an ethnography that offers the reader critical insights into the patients’ perspectives on psychiatric medications, work, and their relationships with others in the community. Other important works were Emily Martin’s (1987) The Woman in the Body, which examined how the language used to describe women’s reproductive systems influences the medical establishment’s approach to pregnancy, childbirth, and menopause, and Joan Cassell’s (1991) Expected Miracles, an ethnography of surgeons and their perceptions of and behaviors around their work. Cassell’s study offered one of the earliest examples of what she referred to as “studying up,” i.e., research into the lives of powerful members of a society rather than the dispossessed.

Occurring about the same time was the adoption of anthropological methods by scholars in other fields, notably within the field of nursing research. This movement is perhaps best epitomized by the work of Janice Morse, a registered nurse who went on to receive advanced degrees in both nursing and anthropology. In the mid- to
736 C. Robins
late-1980s, Morse edited several seminal works (Morse 1988, 1989a, b) that introduced nursing scholars to the epistemology and methods of qualitative research. These approaches proved essential to cross-cultural nursing, where the nurses’ and patients’ understanding of illness and appropriate treatment might be worlds apart. Effective care could best be provided, nursing scholars argued, when these different perspectives were taken into account.

By the early 1990s, qualitative health research – while still not fully accepted by the health research establishment – was becoming both ubiquitous and highly influential. Efforts to combat the rapid spread of HIV/AIDS both in the USA and in other countries and cultures demanded research methods that could uncover how a group’s behaviors and beliefs about the disease were contributing to transmission. Anthropologists and other social scientists using qualitative methods rose to the occasion. Paul Farmer’s (1993) ethnography of the interpretation of HIV/AIDS in Haiti early in the epidemic was a landmark work, demonstrating the critical role of both history and culture in people’s illness experiences. The rapid pace of globalization over the last two decades – and the concomitant potential for pandemics – has only increased the essentialness of qualitative research methods in the health fields (Ramin 2009; Ebola Anthropology Response Platform).

Seminal journals, such as Qualitative Health Research, first published in 1990, and, roughly a decade later, the International Journal of Qualitative Methods as well as the online Forum for Qualitative Research, have provided important avenues for scholars to share their health research findings and learn about new and innovative approaches to qualitative methods. There are also increasingly well-attended research conferences, including the Qualitative Methods Conference and the Qualitative Health Research Conference (held alternating years), both sponsored by the International Institute of Qualitative Methodology at the University of Alberta, Canada, and the International Congress of Qualitative Inquiry held annually at the University of Illinois in Urbana. These forums are testament to the acceptance that qualitative methods have something valuable to offer health care practitioners and researchers.

Methods of Qualitative Data Collection

Informed Consent

Before collecting any data, the researcher must ensure that he or she follows the guidelines for the protection of human subjects. Key to this is the informed consent process, whereby the study participant is told what his or her rights are as a research subject before any data are collected. The core elements of the informed consent process are provided in Fig. 1 and include a description of the study sponsor and how the data will be used, the risks and benefits to the participants, and the voluntary nature of participation, among others.

These elements must be provided to participants in a written informed consent form that the study participant and researcher will both sign and date before data are collected. It is also good practice to review these key elements verbally with participants before beginning an interview or focus group discussion. Examples of how the information can be verbally reviewed with participants can be found in the sample in-depth interview protocol (Fig. 2) and sample focus group guide (Fig. 3).

Data Collection Approaches

There are three primary qualitative data collection strategies that will help researchers understand how the study subjects experience the world and, in turn, make meaning of those experiences: focus groups, in-depth interviews, and participant observation. Each of these is described in turn.

In-Depth Interviews
In-depth interviews (IDIs) are known by a number of other terms, including semi-structured interviews, unstructured interviews, one-on-one interviews, and guided conversations, among
Key Elements of Informed Consent for Research Participants
All study participants need to be given the following information, as applicable, before any data are collected.
• They must be told that the study involves research.
• The purposes of the research must be explained to the participants (e.g., the research is for a dissertation; the study is being funded by a particular agency and why).
• How long the subject’s participation in the study will last (e.g., one hour interview).
• A description of the procedures to be followed (e.g., participant will be asked questions, the interview will be recorded with the participant’s permission).
• Identification of any procedures which are experimental.
• Participants must be informed about any risks or discomfort they may experience during the study. If the study involves more than minimal risk, participants must be told what compensation or treatments will be available to them and whom to contact.
• Participants also must be told if they can expect any benefits from participating in the study (if none, disclose that to the participants as well).
• A disclosure of appropriate alternative procedures or courses of treatment, if any, that might be advantageous to the subject.
• The researcher should explain his or her procedures to maintain the confidentiality of the subjects. This includes how data will be stored as well as steps the researcher will take to make sure not to disclose the participants’ identities in any written or presented materials.
• All participants should be reminded that their participation in the study is voluntary, and that there are no penalties or loss of benefits if they decide not to participate or drop out of the study.
A checklist of the elements of informed consent can be obtained from the U.S. Department of Health and Human Services, Office for Human Research Protections website: https://www.hhs.gov/ohrp/regulations-and-policy/guidance/checklists/index.html (accessed 9.5.17)
Fig. 1 Elements of informed consent
others. All of these terms, however, can mislead the outside observer, who may believe that the researcher and interviewee are having an hour-long discussion bounded by few, if any, parameters. Although the researcher may use an IDI guide that at first glance appears lean, each guide must be carefully crafted to clearly and narrowly frame the topic for the respondent. The guide must also include targeted probes to help the interviewer ensure that, within the general frame, the respondent addresses the areas of greatest interest to the study. IDIs thus require a skilled interviewer who has superior active listening skills and who fully understands how the interviews are intended to support the goals and objectives of the study. He or she must have the intellectual flexibility to move simultaneously between the respondent’s narrative and the study aims, gently guiding the narrative back to the frame when needed, but also listening for new and relevant information that may merit additional probing.
A hypothetical example will help to illustrate the process. Sjogren’s syndrome has been characterized as an “invisible illness,” a disease that may be disabling to the individual who has it, but that offers few visible symptoms to the outside observer (Donohue and Siegel 2000). Sjogren’s symptoms can range from the annoying, such as dry eyes, mouth, and skin, to the disabling, including crippling fatigue, joint pain, and even lymphoma (http://www.sjogrens.org). People living
Introduction
Hello, my name is [NAME]. Thank you for agreeing to talk with me today about how Sjogren’s has
impacted your social life and experiences. This study, which I am conducting for my dissertation at
University, is being funded by [ORGANIZATION].
Informed Consent
Before we get started there are a couple of things I need to mention. First, this is a research project
and your participation is voluntary. You can stop the interview at any time; if I ask you a question
you would prefer not to answer, just tell me and we’ll move on to the next one. Second, I will do
everything I can to maintain your confidentiality. I will not attach your name to any data files and I
will never use your name in any of my writings from this study. I may use quotes from the people I
interview, but the names of interviewees will not be attached to those quotes. I will also remove any
information from that quote that might identify you to others.
There are no direct benefits to you from participating in this research, although your story will
contribute to my efforts to create a resource manual for others living with Sjogren’s. The main risk
to you from participating in this study is that you might experience some emotional distress from
telling your story. I have a list of resources I will give you at the end of the interview if you would
like.
Finally, with your permission, I would like to audio record our interview today. This is so that I do
not have to take many notes while we are talking and I can focus on the story you are sharing with
me. The recording will also help me to be more accurate when analyzing all of the interviews.
Do you have any questions before we get started? [ANSWER ANY QUESTIONS]
Do I have your permission to audio record the interview? [IF YES, TURN ON THE AUDIO
RECORDER]
Interview Questions
I am interested in learning what it is like to live with Sjogren’s, which some people have referred to
as an “invisible illness.” By that they mean the disease can have profound effects on the person who
has it, but it offers few obvious clues to outside observers that the person is ill. What I’d like to do
today is have you tell me a story about your experiences living with Sjogren’s in a world that may not
know you are sick. You can start your story wherever you like, and you can talk as long as you like,
but tell me everything you think is important for me to fully understand your experiences living and
coping with this invisible illness.
PROBES (IF NEEDED):
In what ways, if any, has this unseen illness affected

…your professional life?
…your home life with family members?
…your social life?
How long did it take for you to get a diagnosis after you first began experiencing symptoms
of the disease? Why do you think that was?
How well do you think the medical community recognizes symptoms of the disease?
What have you done that has been most effective in getting your work colleagues, family, and
friends to understand what it’s like living with Sjogren’s?
What, if anything, do you wish you had had – or would still like to have – to help others understand
your experiences?
Is there anything else about your experience living with an invisible illness like Sjogren’s that you
haven’t talked about, but that you think is important for me to hear to fully understand your
experiences?
TURN OFF AUDIO RECORDER AND THANK THE INTERVIEWEE
Fig. 2 Sjogren’s IDI guide
with Sjogren’s may have to make a number of significant lifestyle changes, but often without the support of family or friends, who think the person looks “perfectly healthy.” This illness captures the attention of a hypothetical researcher, who wants to interview people with Sjogren’s to understand their experiences working and living with a disease that no one can see. She hopes to develop a guidebook that can offer sufferers some coping strategies, including talking points that will help the person with the disease explain the illness to people in their social network. Thus, in addition to hearing about her subjects’ social experiences, she also wants to hear from her interviewees what steps they have taken that have been successful in explaining their condition to others, as well as any additional supports they might like to have.

The first question in the IDI guide must set the parameters of the interview for study participants, but also give them sufficient leeway to be able to share their experiences and their points of view. Thus it may look like the following:

    I am interested in learning what it is like to live with Sjogren’s, which some people have referred to as an “invisible illness.” By that they mean the disease can have profound effects on the person who has it, but it offers few obvious clues to outside observers that the person is ill. What I’d like to do today is have you tell me a story about your experiences living with Sjogren’s in a world that may not know you are sick. You can start your story wherever you like, and you can talk as long as you like, but tell me everything you think is important for me to fully understand your experiences living and coping with this invisible illness.
This opening statement is by no means “unstructured” or even “semi-structured” because the interviewee is told precisely the bounds within which her narrative should remain: She is being asked to describe the social aspects of the illness, i.e., what it is like to live with a disease that others cannot see. She is not being asked to give a full accounting of her symptoms, the specialists she sees, or the treatments she is undergoing.

In a perfect world, each interviewee would spontaneously relate a story that fully addresses all areas of interest to the researcher. But because this is an imperfect world, the protocol should include probes so that the interviewer makes sure the respondent addresses the key domains of the research. Possible probes for this hypothetical study might include the following:

• In what ways, if any, has this unseen illness affected
  – . . .your professional life?
  – . . .your home life with family members?
  – . . .your social life?

Notice that these three probes cover the key dimensions of interest (work, family, friends), but
Introduction
Hello, my name is [NAME]. Thank you all for agreeing to participate in this focus group discussion
today about how Sjogren’s has impacted your social lives and experiences. This study, which I am
conducting for my dissertation at University, is being funded by [ORGANIZATION].
Informed Consent
Before we get started there are a couple of things I need to mention. First, this is a research project
and your participation is voluntary. If you decide you no longer want to participate, you can leave
the discussion at any time; if I ask you a question you would prefer not to answer, just tell me and
I’ll move on to the next person. Second, there are no right or wrong answers to any of the questions
that I ask today. You may disagree with what someone else says during the group, and that’s ok. It’s
important that I hear different perspectives. Third, I will do everything I can to maintain your
confidentiality. I will not attach your names to any data files and I will never use your names in any
of my writings from this study. I may use quotes from the focus groups, but the names of
interviewees will not be attached to those quotes. I will also remove any information from that quote
that might identify you to others.
There are no direct benefits to you from participating in this research, although your story will
contribute to my efforts to create a resource manual for others living with Sjogren’s. The main risk
to you from participating in this study is that you might experience some emotional distress from
telling your story. I have a list of resources I will give you at the end of the interview if you would
like.
Finally, with your permission, I would like to audio record our interview today. This is so that I do
not have to take many notes while we are talking and I can focus on the story you are sharing with
me. The recording will also help me to be more accurate when analyzing all of the interviews.
Do you have any questions before we get started? [ANSWER ANY QUESTIONS]
Do I have your permission to audio record the interview? [IF YES, TURN ON THE AUDIO
RECORDER]
Warm-Up Exercise
I’d like to start off by taking just a couple of minutes for us to get to know each other. So if you
would, please tell us just your first name and, briefly, something that you think is unique about
yourself – an interesting hobby, somebody famous that you once met, or an interesting place that
you have visited. [GO AROUND THE ROOM; MODERATOR SHOULD GO LAST]
Discussion Questions
First, I’d like to get a sense of how long each of you has been living with Sjogren’s.
Symptoms
Sjogren’s has often been called an “invisible illness,” that is, a disease in which the symptoms can
have profound effects on the individual who has it, but in ways that may not be obvious to outside
observers. Let’s talk about this idea for a little bit. What symptoms do you all regularly experience
that may affect your daily life, but that you don’t think are noticed by people you work, live, or
socialize with?
Social Impacts
Think for a moment about your professional lives, your home life and family, or your social
activities with friends: Tell me about an instance in which you had to make a lifestyle adjustment to
accommodate your symptoms, but that you didn’t think was fully understood by others, such as
your work colleagues or family and friends.
Strategies
What have any of you done that has been effective in getting your work colleagues, family, and
friends to understand what it’s like living with Sjogren’s?
What, if anything, do you wish you had had – or would still like to have – to help others understand
your experiences?
Close
Is there anything else about your experiences living with an invisible illness like Sjogren’s that you
haven’t talked about, but that you think is important for me to hear to fully understand your
experiences?
Fig. 3 Sjogren’s focus group guide
also remind the interviewee that the focus of the research is on the impact of the invisibility of the illness, in short, the social effects. As an example, perhaps in responding to the third probe, the interviewee describes how she can no longer do a variety of physical activities because of extreme joint pain: she can no longer garden, take weekend hikes with friends, go for her morning run, or walk the dog. Clearly the loss of an array of activities that she once enjoyed is important to the respondent. The researcher can be sympathetic to this wider loss, but needs the respondent to home in on the one activity that relates to the social impact of the illness, namely, hiking with friends. Thus, an appropriate probe at this juncture might be: How well do your friends understand why you stopped going on weekend hikes? A simple probe such as this is respectful of the
respondent’s need to describe these myriad losses, but in a way that steers the narrative back to the research focus.

The researcher should also be aware that the above probes may not be exhaustive and that interviewees may add a dimension to their experiences that the researcher did not anticipate. Perhaps three of the first four interviewees start off their narratives by recounting how many years it took for a doctor to finally recognize their symptoms and provide a diagnosis. The research team is not interested in hearing about the clinical manifestations of the illness per se, but these narratives suggest that the symptoms may be invisible to the medical community as well. Thus, two new questions for subsequent interviews might be:

• How long did it take for you to get a diagnosis after you first began experiencing symptoms of the disease? Why do you think that was?
• How well do you think the medical community recognizes symptoms of the disease?

Qualitative researchers should always be alert to the possibility that data collection will add entirely new dimensions to their understanding of the issue and be prepared to modify the interview protocol, as needed.

Recall, too, that in this example, the researcher’s aim is to create a guidebook for people living with Sjogren’s syndrome, one that includes successful coping strategies and other resources that readers might find useful. Two additional questions might be included in this protocol:

• What have you done that has been most effective in getting your work colleagues, family, and friends to understand what it’s like living with Sjogren’s?
• What, if anything, do you wish you had had – or would still like to have – to help others understand your experiences?

Finally, because this is a bounded narrative, one steered in a particular direction both by the questions and the interviewer, it is always a good idea to give the respondent a last opportunity to talk about something that may have been given short shrift during the interview:

• Is there anything else about your experience living with an invisible illness like Sjogren’s that you haven’t talked about, but that you think is important for me to hear to fully understand your experiences?

Interviewees generally will not take this as an open invitation to talk about their illness experiences for another hour, for two key reasons: First, the protocol was structured so as to give them sufficient latitude to tell their stories; and, second, this summary question reiterates that the boundaries of the discussion are around the social invisibility of the illness. The full interview guide, along with the critical elements of informed consent, is shown in Fig. 2.

With a skilled interviewer, the above example should generate 45 min to an hour’s worth of rich, detailed narrative. And after conducting another 12–15 such interviews, the researcher will likely have sufficient information to at least begin to create the desired end-product. Should there be critical information gaps, additional in-depth interviews can be conducted to complete the product.

Focus Groups
Focus groups are small group (6–10 person) discussions in which a moderator uses a carefully designed protocol to elicit participants’ input on the topic of interest (Morgan and Krueger 1997). While IDIs offer depth on an issue, focus groups provide the breadth necessary when beginning to explore a particular issue. This is a particularly valuable data collection approach in the formative stages of a project, when the study team is still learning the range of study participants’ experiences and perspectives on the topic. Focus groups may also be the data collection method of choice when project resources (money, time) are limited. Sometimes this is unavoidable, although the
researcher should remain cognizant that the lack of depth necessarily limits what one can say about the findings.

Two aspects of the group dynamic need to be considered when developing the discussion protocol for a focus group (see Fig. 3). The first is that even though they have consented to participate, some participants may be a little nervous, uncertain how much they want to reveal about themselves in this group of strangers. Thus, the protocol should include a brief (5 min) “icebreaking” exercise to get rid of any lingering participant butterflies and to begin to create connections between those in the room. A particularly effective strategy is to ask participants to tell the group something unique or interesting about themselves, such as a hobby they have, someone famous they once met, or some unusual place they have visited. Having the moderator also participate in this exercise is an excellent way for him or her to establish rapport with the group members before reassuming control of the discussion.

The second aspect of the group dynamic that must be taken into account is that the protocol questions – and the moderator – must balance the desire for detailed information against the need to hear from as many participants as possible. In the hypothetical Sjogren’s study, the initial questions to a focus group may look something like the following:

• First, I’d like to get a sense of how long each of you has been living with Sjogren’s.
• [Next] Sjogren’s has often been called an “invisible illness,” that is, a disease in which the symptoms can have profound effects on the individual who has it, but in ways that may not be obvious to outside observers. Let’s talk about this idea for a little bit. What symptoms do you all regularly experience that may affect your daily life, but that you don’t think are noticed by people you work, live, or socialize with.
• Think for a moment about your professional lives, your home life and family, or your social activities with friends: Tell me about an instance in which you had to make a lifestyle adjustment to accommodate your symptoms, but that you didn’t think was fully understood by others, such as your work colleagues or family and friends.

These questions endeavor to get at the same issues as those covered in the IDI, but in a way that does not allow any one person to tell his or her life story. For example, the second question about symptom experience is clearly directed to the group (“symptoms which you all regularly experience”) and implies that some of these symptoms may be shared and so discussed. The third question also restricts any participant’s input to a single example – enough to give the group (and the research team) a sense of the breadth of experiences of people living with Sjogren’s. Summary questions can be roughly identical to those used in the IDI:

• What have any of you done that has been effective in getting your work colleagues, family, and friends to understand what it’s like living with Sjogren’s?
• What, if anything, do you wish you had had – or would still like to have – to help others understand your experiences?
• Is there anything else about your experiences living with an invisible illness like Sjogren’s that you haven’t talked about, but that you think is important for me to hear to fully understand your experiences?

An important thing to remember is that because the researcher must necessarily limit each person’s input to the discussion, it will limit the depth around any one person’s contribution to the research topic – often, some important details about a person’s story may be missing. This is the trade-off of conducting focus groups instead of in-depth interviews, so make sure this is the right data collection strategy to answer the research questions. If the researcher has to conduct focus groups because there are constraints on project resources, there may be a temptation to over-
interpret the data, e.g., the analyst may see differences between groups that are, at best, lightly supported by the evidence. Analysts should remember to work with the information they do have and let unanswered questions serve as the basis for their next data collection effort.

Participant Observation/Ethnography
This data collection strategy is invaluable when the researcher believes that subjects’ experiences and perceptions can best be understood in the context in which those experiences occur. The researcher gains an understanding of how the world looks through their eyes by observing their behaviors in the location of interest and asking countless questions, some targeted, some spontaneous (Murchison 2010).

A new researcher may find it tricky trying to create interview protocols for this kind of study, in part because so much about the context is unknown; anticipating what specific questions to ask and of whom can feel like an exercise in futility. In addition, the field site oftentimes is not in a location that lends itself to scheduled in-depth interviews or focus groups. That said, the researcher does know the core study goals and, very generally, the roles of those within the context who might be able to address them. Instead of trying to develop a series of interview guides applicable to every conceivable situation, the researcher might consider developing a table of question domains by interviewee role. The table ensures that the researcher will remain focused on the goals and objectives of the study, but in a way that provides the latitude required for ad hoc encounters in the field. In addition, having a single, focused study document can prove helpful if the work is being conducted by a team.

Another hypothetical example can illustrate this approach: A community clinic is struggling to meet the needs of local residents because residents are reluctant to go there. Community members say they are often treated rudely by staff, and avoid the clinic altogether so as not to be subjected to the abuse. Without an alternative source of care nearby, however, many residents end up not receiving any medical care at all. Indeed, surveys conducted with community members show high rates of morbidity from otherwise very treatable conditions, such as diabetes and high blood pressure.

Participant observation would be an excellent research strategy for trying to understand what is happening in these aversive encounters, why it is occurring, and if the findings point to a possible solution. Locations where the researcher might consider conducting observations could include the clinic waiting room, intake stations where staff make the initial patient contact, weekly team meetings of various staff (e.g., administrators, clinicians, and support staff), and locations throughout the community where the researcher can hear from local residents (e.g., senior centers, community library). An example of the kinds of question domains that might be relevant to this hypothetical study, and the categories of people who might be able to speak to each domain, is illustrated in Fig. 4.

This example table is by no means exhaustive, but suggests areas where there may be a disconnect between the various participants. For example, the administration may need a high patient volume to ensure sufficient reimbursements to keep the clinic operating; clinicians, however, may find the required volume overwhelming because it severely limits the amount of time they can spend with each patient. Intake staff and clinicians may get frustrated with patients who repeatedly return to the clinic with the same issues, clearly not having followed the treatment recommended during the last visit. At the same time, patients do not understand why physicians expect them to be able to follow through on medication regimens when the community does not have a pharmacy. Moreover, patients with mobility challenges are not always able to drive to the closest pharmacy to pick up their prescriptions.

The data produced through participant observation are not as neat and tidy as those produced through IDIs or focus groups. Although the field researcher may be able to conduct the occasional audio-recorded interview and have it transcribed, much of the resulting data will be in the form of comprehensive observation notes written by the researcher on a daily basis. Notes should include some obvious things, such as observations made
Community Residents
Clinic Administrators
Intake Staff
Clinicians
QUESTION DOMAINS
CLINIC ENVIRONMENT
Resource availability, e.g., medical supplies, space, equipment X X X
Staffing, e.g., staff-to-patient ratios, tenure/turnover X X X
Clinic atmosphere, e.g., cleanliness, welcoming, noise levels X X X X
EXPECTATIONS
Patient volume X X X
Length of appointments X X
Time spent in waiting room X X
Appointment outcomes, e.g., diagnosis, treatment X X
Treatment adherence X X
COMMUNITY CONTEXT
Resources, e.g., pharmacies, public transportation, other clinics X X
History of clinic in the community X X X X
Fig. 4 Question domains
by the researcher while sitting in the waiting room: What were interactions like between patients and intake staff? Did the participants seem to be polite with each other or was tension evident? And what was the evidence for either of these observations? How long were patients sitting in the waiting room? What did it look, feel, and smell like while sitting there, i.e., did the researcher find it to be a welcoming environment or not so much? Why? Notes should also be recorded of any ad hoc interviews, whether in the clinic or in the community. Although it likely will not be possible to write verbatim notes while talking with people during these spontaneous encounters, the researcher should write up as complete a recounting of the conversation as memory allows and as soon as possible after the interview. Finally, the researcher should include her own thoughts and feelings in the observation notes. Perhaps she finds the clinic staff insufferable, believing them to be rude to the patients. Conversely, perhaps she finds the patients themselves unpleasant, believing them to be demanding too much from harried physicians. Regardless, it is important that the researcher keep in mind the biases she brings to the work as well as how those biases can easily color her interpretation of the data. Realistically, it is
746 C. Robins
highly unlikely that one clinic managed to hire all of the unbearable doctors, nurses, physicians’ assistants, and administrative staff in the area. The researcher must then ask herself, what might be the structural contributors to the staff’s bad behavior? Are they overworked? Is the pay lower than other similar positions in the area? Do they feel like they are unable to make a positive difference in their patients’ lives? The researcher may not like – indeed, should not feel compelled to like – the individuals with whom she is working. But it is critical to acknowledge those feelings and move beyond them so that systemic challenges – and thus possible solutions to the problems – can be identified.

To Record or Not to Record?

Researchers new to qualitative methods often express discomfort about using an audio recorder during an interview or even a focus group discussion. Particularly when interviewing people about sensitive subjects (e.g., illness, sexuality), the recorder can seem like a monstrous intrusion on the interviewee’s private experiences. Nevertheless, recording is the best way to create an accurate record of what was said during the interview and thus ensure that the analysis is based not on secondary data (e.g., the interviewer’s notes and remembrances), but on the primary results of the data collection effort (e.g., the recording itself or interview transcripts). Edward Ives’s The Tape-Recorded Interview (Ives 1995) is a particularly useful guide for researchers, but the following brief tips may facilitate the reader’s use of an audio recording device.

Discuss the Desire to Record Early in the Process
Except on holidays and birthdays, many people do not care to be surprised. If the study plan is to record the interviews, respondents should be told this at the recruitment stage of the project: “I will be conducting an approximately one-hour interview that, with your permission, I would like to audio record.” If the recording is optional, this gives the respondent some time to consider if he or she is okay with being recorded. If the recording is not optional (e.g., the funder/client may stipulate in a contract that focus groups are to be recorded), this allows potential participants the opportunity to opt out early if they do not wish to be recorded.

Allow Interviewees or Participants to State Things off the Record
Interviews can be very cathartic at times, leading respondents to get something off their chests that they then wish they hadn’t. The researcher should let participants know that if they end up saying something they want to have expunged, it will be deleted from the recording, any notes about it will be scrubbed, and that information will never make it into the report. Sometimes respondents may say, “I need to say something, but it has to be off the record.” The interviewer should turn off the audio recorder, let them say what they need to say, and then ask permission to turn the recorder back on. Interviewees can be much more comfortable knowing they have some editorial control over what they say.

Let Participants Create a Pseudonym
Because study participants’ names will never be used in final reports or journal articles, it makes no difference to the researcher whether they use their real name when being interviewed or not. But some individuals feel more comfortable with the added layer of anonymity that a pseudonym can bring. If a topic, particularly in a focus group discussion, is especially sensitive, the researcher should consider offering participants the option of coming up with their own names for purposes of the discussion.

Store Audio Files in a Secure Location
Neither the researcher nor his/her interviewees should feel confident that an audio file on a portable recording device will not be accessed by others. Not only do such devices lack security features, but also they are small and easily misplaced or lost. Study participants should be told
how the researchers will secure their information, including where the file will be maintained and how quickly the file will be deleted from the portable device.

In the End, the Recorder Usually Becomes Invisible to the Participant
Finally, it bears noting that in most instances, the researcher feels more awkward about using the audio recorder than does the interviewee. People agree to share their stories with researchers because they have something to say that they want others to hear. The use of a recording device offers an assurance that the details of their stories will not get lost and that their experiences will be faithfully recounted.

Data Analysis

Health researchers new to qualitative methods may find themselves immediately overwhelmed by the volume of data generated from in-depth interviews and focus groups. It is not unreasonable, for example, to expect a 1-hour interview to result in a 20–25 page transcript; thus, even a project with only 20 interviews can leave a researcher awash in 400–500 pages of text. Throw in a few focus groups and notes from field observations and the numbers increase dramatically. Perhaps due to this inordinate volume of data, qualitative analysis is often thought of as one of the most mysterious aspects of qualitative research: Work colleagues have called it “magic,” while others have referred to it as “art.” Really, it is neither. Qualitative analysis is the process of systematically reading through one’s data (and re-reading it, numerous times) looking for details and patterns in respondents’ narratives that address the study’s research questions. There are numerous books and articles that offer excellent guidance on both philosophical and logistical aspects of qualitative analysis (Boeije 2010; Bernard and Ryan 2010; Roller and Lavrakas 2015; Thorne et al. 2004); thus, no attempt will be made here to redo what has already been done exceedingly well. Instead, this article provides the reader with a description of the fundamentals of the analytic process, more details of which can be found in the previously cited references.

Simplifying the Data

Reading through several hundred pages of text multiple times to find answers to one’s research questions is a daunting prospect indeed. Thus, the analyst’s first goal must be to distill that indistinct mass of narrative into smaller, like units for further analysis, a simplification process that Miles and Huberman (1994) called “data reduction.” In many instances, particularly in applied health research where the objective is to find answers to very specific questions, the data can be distilled on the basis of predetermined categories or themes that are often embedded in the very questions asked of the respondents.

Deductive Simplification
Using the hypothetical Sjogren’s study as an example, IDI probes and focus group questions asked participants to describe the effects of their illness on three dimensions of their lives: their professional, home, and social lives. Thus, the analyst’s first step towards simplifying the data might involve identifying those sections in each transcript where the interviewee described how her illness had impacted her work, home, and/or social activities. Those descriptions may have come in response to a direct question from the interviewer or may have emerged spontaneously during the interviewee’s recounting of her experiences living with the illness. Regardless, identifying those sections of the transcripts that deal with each of these dimensions means the hundreds of pages of text have now been separated into at least four “piles” of text: that having to do with the impacts of the illness on interviewees’ (1) work lives, (2) home lives, (3) social lives, and (4) text having to do with everything else. “Everything else” (4) may be further simplified by identifying those sections of narrative where interviewees answered other interviewer questions, such as
(4.a.) effective coping strategies, (4.b.) strategies that did not work so well for them, (4.c.) resources that they wish they had available to them, and (4.d.) text still not yet categorized. As the reader can see, in this example, simply using the concepts covered in the interview guide (which, not coincidentally, parallel the research questions), the analyst can readily parse hundreds of pages of data into smaller, more manageable “units” of data for analysis.

Distillation by mapping extant categories onto the data, essentially a deductive approach, is not so much “analysis” as it is a necessary precursor to the analytic process. That is, there is nothing particularly analytic about locating all of the transcript sections in which interviewees describe how Sjogren’s has affected their social lives. However, it is only by reading through all of this similar text that the analyst can then begin to discern patterns in interviewees’ descriptions – i.e., analyze – the ways in which this invisible disease leaves its social mark. Indeed, the analyst ultimately may find at least two threads in these narratives: those cases in which interviewees were no longer able to participate in their peer group’s activities and their social lives collapsed and those, perhaps fewer, instances in which interviewees described a strengthening of their core social relationships. This bifurcated finding may lead the analyst down a further analytic path as he or she endeavors to determine the factors that contribute to any individual experiencing one social trajectory or the other. In sum, deductive data simplification does not preclude inductive (see below) data analysis.

Inductive Simplification
What if the research questions are not nearly so clear-cut as the ones proposed in this article? What if, instead of wondering how Sjogren’s affects the work, home, and social dimensions of people’s lives, the research goal is simply to capture the broad experience of living with Sjogren’s? Instead of asking interviewees to describe what it is like “living with an invisible illness” (which, as noted previously, necessarily implies asking how the person with the illness interfaces with others), the researcher asks, “Please tell me what it has been like for you living with Sjogren’s?” Your probes may be less directive than ours, asking, “How does the illness affect you day-to-day?” rather than, “How has living with this illness affected your social life?”

In this case, simplifying the data requires reliance on an inductive analytic approach, in which the meaningful categories emerge from the reading of the analyst’s data rather than being predetermined by the research questions. Inductive simplification may mean the analyst needs to read all of the interviewees’ transcripts, at least once, possibly twice, before he or she can begin to find recurring themes in their narratives. Many may describe impacts of the illness on their work and social lives and, as a result, an initial cut in the data is created along these two dimensions. But the analyst may also find that interviewees often describe being disappointed in themselves when they find they are no longer able to do not only high-energy activities, such as hiking or playing tennis, but even simple tasks to which they once gave not a moment’s thought. Carrying a basket of laundry, turning a wrench to release the oil drain bolt on the car, even walking up a flight of stairs – once effortless activities have become onerous, if not impossible, to perform. Interviewees describe a loss of self-efficacy that to them is as disturbing as, if not more disturbing than, the loss of their social lives. After reading several similar descriptions, the researcher might create a provisional category, perhaps called “Sense of Self,” and begin to look for additional text that recounts similar feelings and experiences on the part of the narrator. As with deductive simplification, once these subsets of data are defined, the analysts can dive further into each, looking for additional similarities and differences in how individuals describe these like experiences.

Unlike a purely deductive approach, where data reduction is a precursor to analysis, data analysis is part and parcel of inductive data simplification. Meaningful cuts in the data are not predetermined by the research questions or interview guides, but must be determined by the researcher through multiple careful readings of the data and their subsequent interpretation.
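The deductive sorting of transcript text into “piles” described earlier in this section can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the keyword sets, the function name sort_into_piles, and the sample segments are all hypothetical, and simple keyword matching stands in for the judgment an analyst exercises when reading each passage.

```python
import re
from collections import defaultdict

# Hypothetical keyword sets (an assumption for illustration only); in practice
# the analyst decides, by reading, which predetermined category a passage fits.
CATEGORIES = {
    "work": {"job", "work", "boss", "career"},
    "home": {"home", "family", "laundry", "household"},
    "social": {"friend", "friends", "social", "party"},
}


def sort_into_piles(segments, categories=CATEGORIES):
    """Place each transcript segment in every category whose keywords it mentions."""
    piles = defaultdict(list)
    for segment in segments:
        words = set(re.findall(r"[a-z']+", segment.lower()))
        matched = [name for name, keywords in categories.items() if words & keywords]
        # A segment may land in several piles; unmatched text is kept, not dropped.
        for name in matched or ["everything_else"]:
            piles[name].append(segment)
    return dict(piles)


# Invented sample segments, not drawn from any real transcript.
segments = [
    "I had to quit my job last year.",
    "My friends stopped inviting me out.",
    "Carrying laundry at home wears me out, and my boss noticed too.",
    "The weather was nice that day.",
]
piles = sort_into_piles(segments)
for name, texts in piles.items():
    print(name, len(texts))
```

A segment can appear in more than one pile, mirroring the “work, home, and/or social” wording, and nothing is discarded: unmatched text stays in an “everything else” pile for later reading. Inductive simplification has no such predetermined dictionary, because the categories themselves emerge from reading the data.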
Nevertheless, data reduction is still only the first step in the process, whether it begins through induction or deduction. Subsequent analytic efforts will explore the data for additional patterns, such as themes or concepts that are shared by all respondents, or multiple different perspectives on the same issue (e.g., differential impacts on one’s social life). Whenever possible, finding a potential explanation for such differences is the next step in the analytic process. The previous social impacts example described hypothetical interviewees who said their social worlds came undone as a result of their illness and others who said they grew even closer to their core group of friends. The analyst might first look for demographic differences in each of these groups as a possible way to account for the different effects: perhaps the latter interviewees are significantly older than the former or perhaps the first group are single while the second group are married. The analyst might also look to each speaker’s narrative for additional clues that could account for the differences: words such as “outgoing,” “active,” “gregarious,” and “social” may characterize the first group’s narratives, while such terms are largely absent among the second group.

A Note on Data Coding
Over the last 20 years, qualitative researchers have increasingly incorporated software into their approach to data analysis. Sophisticated programs such as NVivo, Atlas.ti, Dedoose, and others are allowing researchers to analyze more data, more quickly, and in a way that is far more transparent than the old-fashioned paper-and-colored-markers approach. Importantly, though, the fundamentals of qualitative data analysis do not change simply because a computer is involved. The analyst must still read through all of the data; reduce the reams of information into manageable, “like” units through either deductive or inductive simplification processes; read the like units to identify narrative themes that are shared by the interviewees or that diverge; and, when possible, seek an explanation to account for those differences. The software does, however, make several of these steps easier.

First, because the analyst is using electronic codes for data reduction rather than colored markers, extracting similarly coded text can be as quick and easy as the click of a button or the writing of a simple program (data query). The analyst thus can spend less time looking for text and more time reading it to see if there are important nuances in interviewees’ narratives. Second, automation allows studies to collect and analyze much larger volumes of data than would be feasible if the work were being done by hand. In 2010, for example, the U.S. Department of Defense supported a Comprehensive Review Working Group (CRWG) to examine active-duty and reserve service members’ views about the potential impact on unit cohesion, morale, and readiness if Don’t Ask, Don’t Tell (the 1993 law barring openly gay individuals from serving in the military) were repealed. In addition to conducting the largest-ever survey of service members and military spouses, the effort included the analysis of hundreds of focus group transcripts, two thousand open-ended survey comments, and literally thousands of comments sent to a DOD inbox. All data collection and analysis took place within a ten-month timeframe, a feat that was possible only with the support of an excellent qualitative data analysis program (Robins and Eisen 2017).

Third, these programs allow the users to link respondent characteristics (e.g., demographic data, geographic location, organizational affiliation) to interview documents such that the analyst can quickly examine the data for any patterns by respondent type. In the Don’t Ask, Don’t Tell study, for example, the team was able to explore respondent sentiment regarding repeal (positive, negative, does not care) by respondent gender, service (e.g., Marine Corps, Army), officer or enlisted status, or pay grade, or any combination of those characteristics (e.g., female Army officers compared to male Army officers). This type of analysis can possibly be done without a computer, but it would be tedious and time-consuming, at best.

Finally, and importantly, qualitative analysis software supports the development of an “audit trail,” a time- and date-stamped description of the decisions and actions of the analytic team.
This is important documentation for clients, some of whom may be uncertain about the rigor with which the qualitative analysis is being done. It is also an invaluable check for the analysts, ensuring that both new and seasoned researchers are able to support both their decisions and their findings with data.

Summary

Qualitative health researchers have helped to shed light on how both patients and clinicians understand states of health, disease, and what constitutes appropriate treatment. The insights generated from their work have contributed to reduced disease transmission, understanding of patients’ lived experiences, improved communication between clinicians and the people they treat, and better patient health care experiences. There has always been the potential for misunderstandings to emerge between patients and clinicians, who have very different funds of knowledge and assumptions about the world. The rise in globalization only exacerbates the potential for conflict in the midst of a medical crisis, resulting in increased disease burden on patients and the systems trying to treat them. Health researchers interested in contributing to the development of constructive dialogues in the clinical encounter may well find that qualitative research methods are the right tool for the job.

References

Alvesson M, Skoldberg K. Reflexive methodology: new vistas in qualitative research. Los Angeles: Sage; 2009.
Bernard HR, Ryan GW. Analyzing qualitative data: systematic approaches. Los Angeles: Sage; 2010.
Boeije H. Analysis in qualitative research. Los Angeles: Sage; 2010.
Cassell J. Expected miracles: surgeons at work. Philadelphia: Temple University Press; 1991.
Chamberlain J. On our own: patient-controlled alternatives to the mental health system. New York: Haworth Press; 1978.
Donohue PJ, Siegel ME. Sick and tired of feeling sick and tired: living with invisible chronic illness. New York: WW Norton & Company; 2000.
Ebola Anthropology Response Platform. http://www.ebola-anthropology.net/
Estroff SE. Making it crazy: an ethnography of psychiatric clients in an American community. Berkeley: University of California Press; 1981.
Farmer P. AIDS and accusation: Haiti and the geography of blame. Berkeley: University of California Press; 1993.
Forum: Qualitative Social Research. Accessible at: http://www.qualitative-research.net/index.php/fqs/index.
Geertz C. The interpretation of cultures. New York: Basic Books; 1973.
Hopper K. Qualitative and quantitative research: two cultures. Psychiatr Serv. 2008;59(7):711.
International Congress of Qualitative Inquiry. Information available at: http://icqi.org/qualitative-health-townhall-meeting/
International Institute for Qualitative Methodology. Information available at: https://www.ualberta.ca/international-institute-for-qualitative-methodology
International Journal of Qualitative Methods. Accessible at: https://us.sagepub.com/en-us/nam/international-journal-of-qualitative-methods/journal202499#description
Ives ED. The tape-recorded interview: a manual for fieldworkers in folklore and oral history. Knoxville: University of Tennessee Press; 1995.
Kuhn TS. The structure of scientific revolutions. Chicago: University of Chicago Press; 1962.
Martin E. The woman in the body: a cultural analysis of reproduction. Boston: Beacon Press; 1987.
Miles MB, Huberman AM. Qualitative data analysis. 2nd ed. Newbury Park: Sage; 1994.
Morgan DL, Krueger RA. The focus group kit. Los Angeles: Sage; 1997.
Morse JM, editor. Recent advances in cross-cultural nursing. Edinburgh: Churchill Livingstone; 1988.
Morse JM, editor. Cross-cultural nursing: anthropological approaches to nursing research. New York: Gordon & Breach; 1989a.
Morse JM, editor. Qualitative nursing research: a contemporary dialogue. Rockville: Aspen Press. Rev ed., Newbury Park: Sage; 1989b.
Morse JM. Getting started: labels, camps, and teams. Qual Health Res. 1991;1(1):3–5.
Morse JM, Field PA. Nursing research: the application of qualitative approaches. Cheltenham: Stanley Thornes Ltd; 1996.
Murchison JM. Ethnography essentials: designing, conducting, and presenting your research. San Francisco: Jossey-Bass; 2010.
Ponterotto JG. Qualitative research in counseling psychology: a primer on research paradigms and philosophy of science. J Couns Psychol. 2005;52(2):126–36.
Popper K. The logic of scientific discovery. London: Routledge; 1934.
Qualitative Health Research. Accessible at https://us.sagepub.com/en-us/nam/journal/qualitative-health-research#description
Ramin BM, McMichael AJ. Climate change and health in sub-Saharan Africa: a case-based perspective. EcoHealth. 2009;6:52. https://doi.org/10.1007/s10393-009-0222-4
Robins CS, Eisen K. Strategies for the effective use of NVivo in a large-scale study: qualitative analysis and the repeal of Don’t Ask, Don’t Tell. Qual Inq. 2017;23(10):768–778.
Roller MR, Lavrakas PJ. Applied qualitative research design: a total quality framework. New York: Guilford Press; 2015.
Sokal AD. Transgressing the boundaries: towards a transformative hermeneutics of quantum gravity. Social Text. 1996;#46/47:217–52.
Thorne S, Reimer Kirkham S, O’Flynn-Magee K. The analytic challenge in interpretive description. Int J Qual Methods. 2004;3(1):1–11.
Part III
Health Care Systems and Policies
Assessing Health Systems
31
Irene Papanicolas and Peter C. Smith
Contents
Introduction
What Is Performance Measurement for?
Defining and Measuring Performance
Defining the Unit of Analysis
Defining Key Performance Objectives
Methodological Issues
Conclusions
References
Abstract

The provision of performance information can play a key role in health system evaluation and performance improvement. In this chapter we review the key debates around the conceptualisation of the health system and the domains of performance commonly measured. The chapter outlines the key challenges to data measurement, such as data availability and methodological concerns. Finally, the chapter considers issues related to data presentation. The chapter concludes by summarising progress made in performance assessment and outlining new directions for future work.

I. Papanicolas (*)
The London School of Economics and Political Science, London, UK
Harvard T.H. Chan School of Public Health, Cambridge, MA, USA
e-mail: i.n.papanicolas@lse.ac.uk

P. C. Smith
Imperial College, London, UK
University of York, York, UK
e-mail: peter.smith@imperial.ac.uk

https://doi.org/10.1007/978-1-4939-8715-3_40

Introduction

The provision of relevant, accurate, and timely performance information can play a pivotal role in ensuring the health system is able to deliver effective and efficient health services. Through its capacity to secure accountability in the health system, to determine appropriate treatment paths for patients, and to plan for future service patterns and structures, information can be used to identify and implement potential improvements in service delivery. Performance information thus plays an
756 I. Papanicolas and P. C. Smith
important role not only as an intrinsic element of the health system but also as a key component of a great deal of health services research. Underlying all of these efforts is the role it plays in enhancing the decisions that various stakeholders, such as patients, clinicians, managers, governments, and citizens, take in identifying performance improvements and steering the health system toward better outcomes overall.

The use of performance measurement for health system improvement has been strongly advocated by pioneers in the field such as Florence Nightingale and Ernest Codman since the late 1800s. Yet only in the past decades have health systems seen a substantial growth in health system performance measurement and reporting to this end. The new growth in performance information and its use for improvement has been the result of multiple factors on both the demand and supply side. On the demand side, increasing public demands for accountability and transparency have created a growing culture requiring proof and accountability. On the supply side, great advances in technology have made it possible to develop and store increasing amounts of information, allowing stakeholders instant access to large volumes of data (Smith et al. 2009).

While these factors give major impetus to the use of information for performance improvement, a large number of key debates and barriers remain. Health systems are still experimenting with performance measurement, and large steps are still needed to coordinate efforts and identify what works. The policy agenda has moved from concerns with whether data collection should be undertaken, and in what areas, to concerns of how to summarize and present data and how to coordinate key interests in order to develop firmly based policies and tangible improvements.

This chapter seeks to summarize some of the main issues emerging in the performance measurement debate. The chapter will begin by considering what the key aims of performance measurement are and what performance measurement seeks to evaluate. This section will draw upon some of the debates which have arisen as a result of the different ways in which the health system and its objectives are conceptualized by different stakeholders and frameworks. The chapter will then consider some of the methodological considerations which have arisen in the use and evaluation of performance information. Finally, it will conclude by discussing the major challenges found in presenting and using performance measures and by presenting key lessons and future priorities.

What Is Performance Measurement for?

Health systems are complex entities with many different stakeholders, including patients, health-care professionals, health-care providers, purchaser organizations, regulators, the government, and the broader citizenry. As outlined by an early report in the area of health information (Rigby et al. 1999), information can be identified as having five key roles in health care (Table 1) relating to the different accountability relationships that exist between the many stakeholders in the system. Through the collection and use of information for decision-making in health systems, stakeholders can hold each other to account, thereby facilitating improvements in effectiveness and efficiency. Thus, the fundamental role of performance measurement is to help enable accountability relationships to function, by enabling stakeholders to make informed decisions. It is therefore noteworthy that, if the accountability relationships are to function properly, no system of performance information should be viewed in isolation from the broader system design within which the measurement is embedded.

Each of the key roles of information described in Table 1 relates to a separate function or role of the health-care system, such as providing patient care or planning and developing health services. Each entails different information needs in terms of the nature of information, the level of detail and timeliness, and the level of aggregation required, in order to function effectively. For example, in choosing which provider to use, a patient may
Table 1 The role and significance of information in health care

Role of health care Type of information needed
Patient care Information to enable patients to make decisions among providers or treatment options, such as
information on:
Location and quality of nearby emergency health services
Quality of options for elective care facilities and physicians
Cost of different services/insurance plans
Reviews of providers by families, friends, or third parties
Information on symptoms and treatment options
Information for patients about how to navigate the health system, such as information on:
Which services they are entitled to
Information for physicians on patient’s health-care needs/problems and clinical history, such as
information on:
Patient diagnosis
Past medical history and family medical history
Patient lifestyle factors
Professional Information to compare relative performance to other professionals and communicate with one
practice another, such as:
Patient information including clinical processes and outcomes of care for all patients with a
similar diagnosis or procedure (i.e., registry information)
Information to enable self-review, such as information on:
Processes of care, clinical processes and outcomes for patients treated as compared to peers,
national averages, or best practice
Errors and adverse events for patients treated
Patient experience for patients treated
Information to defend actions taken when necessary, such as information on:
Historical patient files
Best practice guidelines
Information to provide the foundation for evidence-based practice, such as information from:
Clinical trials
Observational studies
Management Information to enable operational management, such as information on organizational costs and quality
Information to optimize resource management and deployment, such as information on resource availability and needs
Information to enable service improvement, such as information on processes and outcomes of care
Service development Information to evaluate treatments and services, such as information on:
Comparative effectiveness of different treatments and services
Comparative costs of different treatments and services
Measuring outcomes and thus developing knowledge, such as information from:
Variation in regional and national population health
Information to plan for future service patterns and structures:
Variation in supply and demand of health services
Projections of changes in supply and demand of health services
Policy development Information to provide intelligence for policy formulation, such as information on:
Variation in outcomes, clinical processes, and patient experience across different geographical regions or organizations
Costs and effects of different medical interventions and treatments
Prevalence of health needs in the population
Aggregate information on preventable or treatable mortality and morbidity
Information to enable inter-sectoral action, such as information on:
Prevalence of particular lifestyle choices or behaviors
Evidence of association between particular lifestyle choices or behaviors and health outcomes
need detailed comparative data on health outcomes for a specific intervention. In contrast, in holding a government to account, and deciding for whom to vote, a citizen may seek out highly aggregate summaries and trends. Many intermediate needs arise. In order to contribute to operational management, more aggregate information and detailed assurance on safety aspects may be necessary. This variety of uses highlights greatly different information needs in terms of the nature, detail, timeliness, and level of aggregation that information users require. A fundamental challenge in performance measurement is to create information systems that are able to cater efficiently for these diverse needs, both in terms of data collection and data presentation and interpretation.

In practice, the development of performance measurement has rarely been pursued with a clear picture of who the information users are or what their information needs might be. Instead, performance measurement systems have often developed opportunistically, usually seeking to inform a variety of users and presenting a wide range of data in the hope that some of the information collected will be useful to various parties. Yet, given the diverse information needs of the different stakeholders in health systems, it is unlikely that a single method of performance reporting will be useful for everybody. Instead, data sources should be designed and exploited with the needs of different users clearly in mind. This may often involve using data from the same sources in different forms. A major challenge for health systems is therefore to develop more nuanced ways of collecting and presenting performance measures for the different stakeholders without imposing a huge burden of new data collection and analysis.

The starting point of most performance assessments is the creation of a conceptual framework on which to base the collection of information and to use as a heuristic for the understanding of the entity being assessed (whether it be the entire health system, a provider organization, or an individual practitioner). A theoretical framework is necessary to help define a set of measures that reflect key organizational objectives and in turn allow for an appropriate assessment of its performance. In the past decade, numerous conceptual frameworks have been created for health system performance assessment at the international and national levels. In many cases countries have developed more than one performance framework, reflecting variations in national and/or local priorities or the performance of different areas of the health system. While existing frameworks have varied purposes, they all aim to provide a better understanding as to what constitutes “good” performance by identifying the entity whose performance is being assessed, its key objectives, and the underlying structures and factors that drive performance (Papanicolas and Smith 2014).

Defining and Measuring Performance

The role of performance measurement is to measure, analyze, and report the extent to which the health system is achieving its key objectives. In order to assess performance successfully, it is important to be able to unambiguously define the entity being assessed (whether this be the health system, an organization, or an individual), as well as the key performance objectives of this entity.

Defining the Unit of Analysis

One of the main areas of debate across this field of study involves clearly defining the unit under scrutiny, whatever the level of analysis. At the system level, differences exist between national and international stakeholders in determining where the health system boundaries lie and what responsibilities lie within the jurisdiction of the health system. In particular, there is no consensus as to whether a definition of the “health system” should encompass the wider determinants of health outcomes and whether it should include activities which impact health outcomes such as public health, health promotion, and targeting social determinants of health (Papanicolas et al.
2013). There can be no right answer to this question, as institutional arrangements differ between countries, and there are arguments for promoting the use of both wider and narrower boundaries depending on the purpose of the analysis. However, lack of consensus on this issue makes international comparison of performance assessment difficult (Papanicolas and Smith 2014).

At the organizational level, boundaries between different sectors of care such as primary care, hospital care, and long-term care are rarely clearly defined. Part of the difficulty in producing a coherent definition of these services and organizations emerges from differences in remits within and across systems. For example, something like rehabilitation after surgery may be provided in the hospital sector in some systems and in long-term care facilities in others. However, it would be misleading to compare the performance of the two hospital sectors without considering this difference. Whatever the chosen definition, in any evaluation of performance, the crucial objective is to ensure that the achievements being assessed accurately represent the contribution attributable to the entities under scrutiny. For example, in the performance assessment of a hospital, it is essential to isolate the contribution of hospital care to the attainment of performance objectives (e.g., health improvement) and where necessary to adjust for any contribution of other activities such as primary care provision, public health, and contextual factors such as the economic, political, and demographic environment. It is thus necessary for one to consider what range of services falls within the accountability of the hospital – and how the contribution of these services can be assessed controlling for other factors external to the responsibilities of the hospital.

Defining Key Performance Objectives

Section “What Is Performance Measurement for?” above outlines the main objectives of performance assessment and the potential that information holds to ensure that the accountability relationships within the health system can operate in a manner that enables the health-care system to achieve its overarching goals. Thus, to be able to assess the performance of a health system, it is important to articulate clearly its key objectives. There exists a substantial literature which outlines the main goals of the health system (Aday et al. 2004; Atun 2008; Commonwealth Fund 2006; Hurst and Jee-Hughes 2001; IHP 2008; Jee and Or 1999; Kelley and Hurst 2006; Klassen et al. 2009; Murray and Frenk 2000; Roberts et al. 2008; Sicotte et al. 1998), and while there are differences related to the definitions of what particular objectives entail, there seems to be relative consensus on the objectives themselves. These objectives can usually be grouped under a limited number of headings, broadly summarized as:

• The health conferred on citizens by the health system
• The extent to which the health system is equitable
• The extent to which patients and their families are protected from the direct costs of needed health care
• The patient experience offered by the health system
• The efficiency and productivity with which health resources are utilized

The fundamental goal of all health systems is to improve the health of patients and the general public. However, aside from being concerned with the absolute level of health improvement in each system, a number of performance frameworks highlight the importance of distributional (or equity) issues, expressed in terms of inequity in health outcomes. Most health systems today are concerned not only with the ability of the health system to improve health but also with doing so across all groups in the population. Related to this concept is the issue of equity of access to health care or equity of access to and financing of health care; most health systems also seek to protect citizens from the impoverishment that can arise from health-care expenditure and to ensure all groups of the population have access to at least a basic package of health services (Papanicolas et al. 2013).
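The financial protection objective described above can be made concrete with a small calculation. The sketch below is illustrative only: it counts the share of households whose out-of-pocket health payments exceed a chosen fraction of their total consumption. The `Household` type, the sample figures, and the 10% default threshold are assumptions introduced for this example and do not come from the chapter; the appropriate threshold and denominator are analytic choices.

```python
from dataclasses import dataclass


@dataclass
class Household:
    consumption: float  # total household consumption in the period (hypothetical units)
    oop_health: float   # out-of-pocket health payments in the same period


def catastrophic_incidence(households, threshold=0.10):
    """Share of households whose out-of-pocket health spending exceeds
    `threshold` as a fraction of total consumption.

    The threshold is an assumption of this sketch, not a value from the text.
    """
    if not households:
        raise ValueError("need at least one household")
    flagged = sum(
        1
        for h in households
        if h.consumption > 0 and h.oop_health / h.consumption > threshold
    )
    return flagged / len(households)


# Hypothetical survey records: (consumption, out-of-pocket health payments)
households = [
    Household(10_000, 200),    # 2% of consumption
    Household(8_000, 1_200),   # 15%
    Household(12_000, 600),    # 5%
    Household(5_000, 2_000),   # 40%
]

incidence_10 = catastrophic_incidence(households)                   # default 10% threshold
incidence_25 = catastrophic_incidence(households, threshold=0.25)  # stricter threshold
print(incidence_10, incidence_25)
```

With these hypothetical records, two of the four households exceed the 10% threshold and one exceeds 25%. Reporting the result at more than one threshold shows how sensitive the measure is to that choice.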
In 2000, the World Health Report “Health Systems: Improving Performance” highlighted health system “responsiveness” as an intrinsic goal of the health system (Murray and Frenk 2000; WHO 2000). The WHO definition refers to “responsiveness to the legitimate expectations of the population for their interaction with the health system,” and it captures dimensions unrelated to health outcomes such as dignity, communications, autonomy, prompt services, access to social support during care, quality of basic services, and choice of provider. Often this goal is also referred to as patient or population satisfaction or patient experience, yet while there is overlap across these three concepts, they do not all encompass the same characteristics but almost always relate to the underlying expectations of patients and the population. As with health outcomes, it is not only the absolute level of responsiveness/satisfaction or good experience in a system that is of interest but how this is distributed among different groups in the population.

Finally, efficiency and productivity, or the extent to which health resources are used to produce valued outcomes, are also key objectives of health systems. Reflecting the wide range of potential perspectives, economists and policy makers have adopted different conceptualizations of efficiency when analyzing different levels of the health system. Systems-level efficiency is concerned with understanding how well a specific system is using the resources at its disposal to improve health and secure related objectives (Papanicolas and Smith forthcoming). At the organizational level, definitions usually refer to the extent to which health service objectives – such as hospital objectives – have been achieved compared to the maximum that could be attained, given the resources available and the external constraints on attainment. At the very micro level, efficiency can be related to decisions of individual clinicians on how to distribute health-care resources across treatment options in order to maximize valued outputs. The study of this type of efficiency often takes the form of a systematic analysis of the effects and costs of alternative methods or programs for achieving the same objective (e.g., improving quality of life, extending years of life lived, or providing services).

As stated above, the overall aim of performance measurement is to measure, analyze, and report the extent to which the health system is achieving its key objectives. However, we have also seen that information requirements necessary to measure performance vary across the key roles of the health system, the different stakeholders, and the different levels of analysis. Table 2 considers some of the key types of measures relating to the objectives discussed above at different units of analysis, in particular relating to (1) the system level, (2) the organizational level, and (3) the individual level. Information at the systems level is aggregated information that allows stakeholders to consider how performance objectives are being met at the population level. This information can be useful for national or regional benchmarking exercises or to gauge overall performance on particular goals or to assess the impact of system-level reforms. Organizational-level performance can be crucial for many of the key roles of the health system, such as allocating resources, patient choice, treatment, and policy evaluation. Finally, information at the individual level can be very important for physicians and managers to ensure that safe and effective services are delivered to patients.

Methodological Issues

The diverse set of users and information needs in a health system calls for a wide variety of measurement techniques and indicators. Various approaches toward data collection are needed to assemble the necessary information, such as national surveys, patient surveys, administrative databases, and routinely collected clinical information. The domain of performance being examined will in part determine the most appropriate data collection technique (Table 3). For example, when measuring responsiveness, household or individual surveys are likely to be the best sources of patients’ experiences and perspectives, whereas when looking at specific clinical outcomes,
Table 2 Measures of key performance objectives at different levels of analysis

Performance objective Types of measures and their uses
Health improvement System level: measures of aggregated data on the health of the population
(e.g., life expectancy, disability-adjusted life years, avoidable mortality, survival rates)
Organization level: measures of aggregated data on the contribution to health of particular
health sectors or services
(e.g., avoidable hospitalizations, hospital standardized mortality rates, emergency
readmission rates for different organizations/conditions)
Individual level: measures of health status/health gain for individuals
(e.g., QALY, survival, patient-reported outcome measures)
Patient experience System level: measures of aggregated data on population experiences/satisfaction with the health system
(e.g., population satisfaction, population experiences, average waiting times)
Organization level: measures of aggregated data on satisfaction/experience for particular
health sectors or services
(e.g., rates of patient satisfaction, aggregated patient experiences, number of patient
recommendations)
Individual level: measures of satisfaction/experience/responsiveness of individuals
(e.g., overall physician rating)
Equity and fair financing System level: measures of the extent to which there is equity in health, access to health care, responsiveness, and financing
(e.g., rates of access of the population, indices of equity in health and access, out-of-pocket
payments as a % of total health expenditure, catastrophic spending, impoverishing spending)
Organization level: measures of the extent to which there is equity of access, responsiveness,
and financing of particular health-care services
(e.g., utilization rates, unmet need of medical care and dental care)
Individual level: n/a
Efficiency/productivity System level: the extent to which health system objectives are maximized given existing
resources
(e.g., ratio of health system outputs to inputs)
Organization level: the extent to which health sector or health service outputs are maximized
given resources available
(e.g., unit costs, average length of stay)
Individual level: identifying the treatment option which yields the maximum effectiveness
per unit cost
(e.g., QALY/cost)
clinical registries may be a more informative and cost-effective source of information. In practice, although performance measurement efforts have progressed over recent years, many health systems still rely on readily available data as a basis for performance measurement. An important research agenda is to determine where new or revised data collection initiatives would be most valuable.

Regardless of the data sources used, a fundamental issue that arises when seeking to interpret performance data is: What has caused the observed performance and to what practitioners, organizations, or agencies should variations in performance be attributed? The key performance objectives outlined in section “Defining and Measuring Performance” are often the product of numerous determinants. An individual’s health status, for example, reflects some factors that can be directly influenced in the short term by actors in the health services (e.g., improving medical care), others that require long-term action by actors not directly associated with health services (e.g., environmental policy), and yet others that depend primarily on the actions of individuals and their families (e.g., diet).

Various statistical methods can be used to adjust information for different risk factors, such as differences in resources, case mix, and environmental factors, to make performance more comparable across organizations or practitioners.
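One simple way to adjust for case mix is indirect standardization, in which the events observed at a provider are compared with the number expected if its patients had experienced reference rates for their risk group. The sketch below illustrates that general technique and is not a method prescribed by the chapter; the risk strata, reference rates, and hospital counts are all hypothetical values chosen for illustration.

```python
def expected_events(cases_by_stratum, reference_rates):
    """Events expected if each risk stratum experienced the reference rate."""
    return sum(n * reference_rates[s] for s, n in cases_by_stratum.items())


def standardized_ratio(observed, cases_by_stratum, reference_rates):
    """Observed-to-expected ratio; values above 1 mean more events than expected."""
    expected = expected_events(cases_by_stratum, reference_rates)
    if expected <= 0:
        raise ValueError("expected events must be positive")
    return observed / expected


def crude_rate(hospital):
    """Unadjusted event rate: deaths divided by all cases."""
    return hospital["deaths"] / sum(hospital["cases"].values())


# Hypothetical reference death rates per risk stratum
reference_rates = {"low_risk": 0.02, "high_risk": 0.10}

# Hypothetical hospitals with very different case mix
hospital_a = {"cases": {"low_risk": 900, "high_risk": 100}, "deaths": 30}
hospital_b = {"cases": {"low_risk": 200, "high_risk": 800}, "deaths": 80}

crude_a = crude_rate(hospital_a)
crude_b = crude_rate(hospital_b)
ratio_a = standardized_ratio(hospital_a["deaths"], hospital_a["cases"], reference_rates)
ratio_b = standardized_ratio(hospital_b["deaths"], hospital_b["cases"], reference_rates)

print(f"A: crude {crude_a:.3f}, observed/expected {ratio_a:.2f}")
print(f"B: crude {crude_b:.3f}, observed/expected {ratio_b:.2f}")
```

In this invented example hospital B has the higher crude death rate, yet its observed-to-expected ratio is below 1 while hospital A's is above 1, because B treats far more high-risk patients. The conclusion depends entirely on the chosen strata and reference rates, which is one reason such adjustments should be reported alongside the results.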
Table 3 Data sources – strengths and weaknesses

Data type: Administrative data
Advantages:
Readily available
Ease of access
Relatively low acquisition costs
Clear and comparable data
Typically cover large populations
Provide a wealth of information on services provided and potential costs
Disadvantages:
Payment-related incentives may influence data content
Structure of system will influence degree of data available
Coding of diagnosis may be problematic
May not capture crucial clinical parameters
Timing of data entry may not be clear

Data type: Survey data
Advantages:
No strong incentives for gaming
Provides the only source of information on experiences, views, and opinions
Subjective measures are often shown to be good measures of objective measures
Disadvantages:
May be subject to survey bias if response rates are not sufficiently high
Responses can be very sensitive to conditioning effects related to survey length or question wording
May be sensitive to cultural, ethnic, and even gender bias
Longitudinal surveys may be subject to bias related to attrition

Data type: Medical records
Advantages:
Provide a rich source of clinical information
Track data over time
Disadvantages:
May contain contradictory information
Susceptible to manipulation
Requires trained and skilled staff
Reports may be variable and not directly comparable

Data type: Clinical registries
Advantages:
Provides a rich source of data for large numbers of patients suffering a particular health condition
Uniformity in data collection methods and the frequency of data collection
Includes important clinical information and patient information
Disadvantages:
Often limited to particular health conditions
Subject to bias in terms of who is included in the registry
These methods are known as “risk adjustment” techniques. Where variations in performance measures are known to be influenced by factors beyond the control of the entities under scrutiny, it becomes essential to employ methods of risk adjustment when using and comparing indicators to help account for these variations. For example, when measuring hospital outcomes as an indication of quality, it may become crucial to adjust for patient attributes such as their age, comorbidities, or socioeconomic class. Failure to risk-adjust outcome measures before comparing performance may result in drawing misleading conclusions and can have serious implications for policy and quality improvement (Iezzoni 2013). However, many methods of risk adjustment remain highly contested. Therefore, whenever risk adjustment is undertaken, it should be presented in a clear, transparent manner together with the final performance data. Furthermore, when performance assessment is used for health service improvement, it is essential that causality for observed measures is attributed to the correct sources or parties (Terris and Aron 2009).

When collecting and assessing performance information, two types of error should be recognized and controlled for to the extent possible. The first of these is random error, which emerges with no systematic pattern and is always present in quantitative data. Random error can give rise to two types of false inference, commonly known as type 1 errors (false positive) and type 2 errors (false negative). The traditional way of controlling for these errors is to apply statistical tests to data at a conventional significance level (usually 0.05 or 0.01). Although well understood, this statistical approach is essentially arbitrary and ignores the relative cost of making either type of error. The second type of error is systematic error, which may
Table 4 Usefulness of outcome and process indicators

Type of indicator: Outcome indicators
Advantages:
Stakeholders often find outcome measures more meaningful
Directs attention to and focuses health goals on the patient
Encourage long-term health promotion strategies
Increasing use of PROMs
Not easily manipulated
Disadvantages:
May be ambiguous and difficult to interpret as they are the result of many factors, which are difficult to disentangle
Take time to collect and for outcome to materialize
Require a large sample size to detect statistically significant effects
Can be difficult to measure (e.g., wound infection)
Areas best used:
To measure quality of homogenous procedures
To measure quality of homogenous diagnosis with strong links between interventions and outcomes
To measure quality of interventions done to heterogeneous populations suffering a common condition

Type of indicator: Process indicators
Advantages:
Easily measured without major bias or error
More sensitive to quality of care
Easier to interpret
Require a smaller sample size to detect statistically significant effects
Can often be observed unobtrusively
Provide clear pathways for action
Capture aspects of care that are valued by patients aside from outcomes
Disadvantages:
Often too specific, focusing on a particular intervention or condition
May quickly become dated as models of care and technology develop
May have little value for patients unless they understand how they relate to outcomes
May be easily manipulated
Areas best used:
To measure quality of care, especially for treatments where technical skill is relatively unimportant
To measure quality of care of the homogenous conditions in different settings

Source: Adapted from Davies (2005) and Mant (2001)
occur if there have been errors in measurement approaches, such as flawed sampling methods. Systematic errors of this sort will lead to erroneous conclusions concerning a variable’s true value. In order to avoid systematic errors, it is critical that data collection methods are carefully designed, implemented, and audited.

Traditionally, performance measures have been classified as structure, outcome, or process measures. Outcome reflects the eventual objective of the system. However, certain process measures may be more realistic indicators of quality if they are known to be associated with good future outcomes. Different types of indicators will be appropriate depending on the setting. For example, outcome measures such as mortality may be more useful when looking at population health, while process measures will be more indicative of the quality of care for a specific procedure. It is critical that designers of performance measurement schemes are aware of the advantages and disadvantages of different types of indicators when using them to assess performance. Table 4 summarizes the main advantages and disadvantages of using outcome and process indicators and the areas of performance measurement where they are most useful.

Experience indicates that a balanced approach with multiple aggregated and disaggregated indicators is most desirable to cater for the information needs of different stakeholders and to allow more informed policy decisions. For this reason, composite indicators – indicators which combine separate performance indicators into a single
index or measure – are often used to rank or compare the performance of different practitioners, organizations, or systems by providing a “bigger picture” and offering a more rounded view of performance (Goddard and Jacobs 2009). The main virtue of composite indicators is that they capture attention in a way that a mass of separate indicators cannot. However, critics of composite measures argue that reducing the measurement of objectives, or entire dimensions, to one indicator runs the risk of being too simplistic and masking many of the variations in performance that should be studied.

Indeed, if composite indicators are not carefully designed, they may be misleading and could lead to serious failings if used for health system policy making or planning (Smith 2002). One of the main challenges encountered in the creation of composite indicators is selecting which measures to include in the indicator and with what weights, particularly in areas where there is little choice of data, and questionable sources may be used for some components of the indicator. Thus, when using composite indicators, it is prudent to give a full description of all the information that is summarized in the indicator, to provide an insight into the performance of each component and help pinpoint the reasons for variation. In addition, the composite and its inputs should be presented with proper uncertainty measures, which may be more informative than measures of central tendency (Jacobs et al. 2005; Naylor et al. 2002).

It is important to note that rapid progress is being made in all areas of health system data collection, including areas such as the design, collection, governance, linkage, and dissemination of data. These developments have the potential to add further value to the existing data collected, particularly by extending the application of what is already available and by collecting new data in a more coordinated, timely, and reliable fashion. Data linkage is allowing researchers and policy makers to create a more complete record of all factors that contribute to health, facilitating the creation of less noisy indicators and a more holistic picture of health determinants. The adoption of IT systems in health-care organizations and the systematization of classifications within and across countries (using tools such as diagnostic resource groupings and/or ICD codes) also allow more robust comparisons across organizations. Finally, another very large area of development is that of information and communication technologies (ICT), often described within the EU context in particular as “e-health,” which has the potential to improve greatly the scope, volume, and quality of performance data.

Conclusions

The ultimate aim of performance measurement is to help hold the various agents to account, given the organization and structure of the health system, by enabling these stakeholders to make informed decisions. In order for these accountability relationships to function properly, no performance information system should be viewed outside the broader context within which the measurement is fixed. Where possible, performance measurement should provide information for all the relevant accountability relationships present in the health system.

If undertaken carefully, performance measurement can offer a powerful resource for identifying weaknesses and suggesting relevant reforms. The progress that has been achieved is impressive, both in the scope of areas for which data is now available and in the degree to which comparability across different entities has been improved. Table 5 outlines the key developments that have been made across some health service performance domains and also highlights some of the main challenges that remain.

The data collection techniques and methodological tools used for performance measurement have developed considerably in the past decade. The debates raised by the WHO 2000 report in particular have spurred the development of datasets, which are updated regularly with new surveys, process indicators, or outcome indicators in order to best operationalize theoretical concepts. Considerable progress has also been made in the measurement of patient-reported outcomes, patient satisfaction measures, and patient
Table 5 Challenges and developments for the measurement of health service performance domains
Performance domain Challenges for measurement Developments in measurement
Health improvement Many aggregate measures fail to distinguish The development of electronic health records
the contribution of the health system (EHRs) provides more complete information
Problems of comparability among over time, on all factors influencing outcomes
reflecting changes in and differences between Increase in registry data, which identifies
coding rules individual patients and traces them through
Large gaps in availability of evidence on the the care process
effectiveness of treatments reducing Increase in measures of outcomes that are not
mortality defined in terms of cure, which are important
Limited set of dimensions captured by for the measurement of chronic disease and
outcome measures with a marked lack of long-term care
measures on disabilities or discomfort
Lack of available, good-quality, and
comparative data at the patient level
Equity
  Challenges for measurement:
    Lack of existing datasets which provide a longitudinal perspective
    Limited evidence has been recorded on how sensitive inequalities are to the inclusion of environmental effects
    Limited understanding of the factors explaining the health production process and sources of inequalities, including the role of mental conditions along with cognitive biases in measuring self-reported health
    Inadequate identification of what stands behind measures of socioeconomic position, namely, different income sources and measures of wealth and social environmental controls which differ across the life cycle
  Developments in measurement:
    Better collection of indicators on determinants of health
    Investing in data linkages to allow disaggregation by socioeconomic status and better monitoring of health inequalities

Patient experience
  Challenges for measurement:
    Lack of conceptual clarity as to what is the difference between satisfaction, patient experience, and responsiveness
    Lack of clarity as to whose experiences/satisfaction should be measured (population vs. patient vs. general experts)
    Surveys on satisfaction are very sensitive to question wording, sampling, and demographic factors
  Developments in measurement:
    Developing more research to understand determinants of satisfaction, patient experience, and responsiveness
    Developing more precise questions of experience and standardized questionnaires for the evaluation of health services

Efficiency
  Challenges for measurement:
    The production process underlying health systems is intrinsically complex and poorly understood. Most measures make simplifying assumptions that may sometimes result in misleading data
    Outputs are generally multidimensional, and therefore preference weights are needed if they are aggregated into a single measure of attainment. The choice of such weights is intrinsically political and contentious
    A fundamental challenge in developing an efficiency measure is ensuring that the output that is being captured is directly and fully dependent on the inputs that are included in the measurement
    Environmental factors, policy constraints, population characteristics, and other factors may be largely responsible for determining health outcomes, yet it is difficult to incorporate all possible determinants appropriately into an efficiency assessment
    From an accounting perspective, the assignment of inputs and associated costs to specific health system activities is fundamentally problematic, often relying on arbitrary accounting rules or other questionable assignments
    Although researchers have developed indicators that seek to measure full production processes, these measures are often not the most informative for policy makers looking to identify and address inefficiencies
    Many outputs are the results of years of health system endeavor and cannot be attributed to inputs in a single period
  Developments in measurement:
    Research to find suitable metrics that measure organizational factors and administrative structures, which influence inputs and outputs
    Improve clarification on the type of efficiency being measured by different indicators
    Improve the conceptualization of the production process in order to better harmonize data collection efforts
    Improve collection of high-quality comparable data on outputs, inputs, and environmental factors necessary for risk adjustments
    Invest in research to refine methodologies for whole-system efficiency measurement
    Find a balance between whole-system measures and more fragmented efficiency measures
    More consideration of how indicators take static and dynamic elements of inputs and outputs into account
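
The table's point that preference weights are contentious when multidimensional outputs are aggregated into a single measure can be illustrated with a small sketch. Everything in it is hypothetical: the three systems, their indicator scores, and the two weight sets are invented for illustration and are not taken from the chapter or from any dataset. It computes a weighted composite of normalized indicators and shows that the ranking of systems can change when only the weights change.

```python
# Illustrative sketch only: system names, scores, and weights are invented
# assumptions, not data from the chapter.
from typing import Dict, List

# Output indicators, assumed already scaled to 0-1 (higher is better).
SCORES: Dict[str, Dict[str, float]] = {
    "System A": {"health_outcomes": 0.90, "patient_experience": 0.50, "equity": 0.60},
    "System B": {"health_outcomes": 0.70, "patient_experience": 0.80, "equity": 0.70},
    "System C": {"health_outcomes": 0.60, "patient_experience": 0.70, "equity": 0.90},
}


def composite(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of indicator scores; weights are renormalized to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * scores[name] for name in weights) / total


def rank(all_scores: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> List[str]:
    """System names ordered from highest to lowest composite score."""
    return sorted(all_scores, key=lambda s: composite(all_scores[s], weights), reverse=True)


# Two hypothetical sets of preference weights.
OUTCOME_HEAVY = {"health_outcomes": 0.6, "patient_experience": 0.2, "equity": 0.2}
EQUITY_HEAVY = {"health_outcomes": 0.2, "patient_experience": 0.2, "equity": 0.6}

if __name__ == "__main__":
    print("Outcome-heavy weights:", rank(SCORES, OUTCOME_HEAVY))
    print("Equity-heavy weights: ", rank(SCORES, EQUITY_HEAVY))
```

With these invented numbers the ordering reverses between the two weight sets, which is the kind of sensitivity that makes a single composite ranking hard to defend without stating the weights used.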
experience measures. Indicators such as avoidable mortality, which seek to measure the contribution of health care to health, are also being better developed and more frequently used. Indeed, indicators are being selected through rigorous selection mechanisms that aim to identify how appropriate they are, rather than how readily available they are. In addition, risk adjustment techniques have become more advanced and allow us to better control for exogenous factors that may lead to changes in performance.

References

Aday LA, et al. Evaluating the healthcare system: effectiveness, efficiency, and equity. Chicago: Health Administration Press; 2004.
Atun R, Menabde N. Health systems and systems thinking. In: Coker R, Atun R, McKee M, editors. Health systems and the challenge of communicable diseases: experiences from Europe and Latin America. European Observatory on Health Systems and Policies Series; 2008.
Commonwealth Fund. Framework for a high-performance health system for the United States. New York: The Commonwealth Fund; 2006.
Davies H. Measuring and reporting the quality of health care. NHS Quality Improvement Scotland; 2005.
Hurst J, Jee-Hughes M. Performance measurement and performance management in OECD health systems. OECD Labour Market and Social Policy Occasional Papers No. 47. Paris: Organisation for Economic Co-operation and Development; 2001.
Iezzoni L. Risk adjustment for measuring health outcomes. Arlington: Health Administration Press/AUPHA; 2013.
IHP. Monitoring performance and evaluating progress in the scale-up for better health: a proposed common framework. Document prepared by the monitoring and evaluating working group of the International Health Partnership and Related Initiatives (IHP+) led by the WHO and the World Bank; 2008.
Jacobs R, Goddard M, Smith P. How robust are hospital ranks based on composite performance measures? Med Care. 2005;43(12):1177–84.
Jee M, Or Z. Health outcomes in OECD countries: a framework of health indicators for outcome oriented policymaking. OECD Labour Market and Social Policy Occasional Papers No. 36. Paris: Organisation for Economic Co-operation and Development; 1999.
Mant J. Process versus outcome indicators in the assessment of quality of health care. International J Qual Health Care. 2001;13(6):475–480.
Naylor DC, Iron K, Handa K. Measuring health system performance: problems and opportunities in the era of assessment and accountability. In: Organization of Economic Co-operation and Development (OECD), editor. Measuring up: improving health system performance in OECD countries. Paris: OECD Publications; 2002.
Papanicolas I, Kringos D, Klazinga NS, Smith PC. Health system performance comparison: new directions in research and policy. Health Policy. 2013;112(1–2):1–3. ISSN 0168–8510.
Papanicolas I, Smith PC. Theory of system level efficiency in health care. In: Culyer AJ, editor. Encyclopedia of health economics. Philadelphia: Elsevier; 2014. p. 386–394. ISBN 9780123756787.
Rigby M, Roberts R, Purves I, Robins S. Realising the fundamental role of information in health care delivery & management: reducing the zone of confusion. Research report. Nuffield Trust; 1999.
Roberts MJ, et al. Getting health reform right: a guide to improving performance and equity. Oxford: Oxford University Press; 2008.
Sicotte C, et al. A conceptual framework for the analysis of health care organizations’ performance. Health Serv Manage Res. 1998;11:24–48.
Smith PC. Developing composite indicators for assessing health system efficiency. In: Smith PC, editor. Measuring up: improving the performance of health systems in OECD countries. Paris: Organization for Economic Cooperation and Development; 2002.
Smith PC, et al., editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press; 2009.
Terris DD, Aron DC. Attribution and causality in health care performance measurement. In: Smith PC, Mossialos E, Leatherman S, Papanicolas I, editors. Performance measurement for health system improvement: experiences, challenges and prospects. Cambridge: Cambridge University Press; 2009.
WHO. World health report 2000. Health systems: improving performance. Geneva: World Health Organization; 2000.

Health System in Canada
32
Gregory Marchildon
Contents
Introduction
Organization and Governance
Financing
Physical and Human Resources
Delivery of Health Services
Reforms
Assessment
References

Abstract

With a population of 35 million people spread over a vast area, Canada is a highly decentralized federation. Provincial governments carry much of the responsibility for the governance, organization, and delivery of health services, although the federal government plays an important role in maintaining broad standards for universal coverage, direct coverage for specified populations, data collection, health research, and pharmaceutical regulation. Roughly 70% of total health spending is financed from the general tax revenues of federal, provincial, and territorial governments. Most public revenues are used to provide universal access to acute, diagnostic, and medical care services that are free at the point of service as well as more targeted (nonuniversal) coverage for prescription drugs and long-term care services. In the last decade, there have been no major pan-Canadian health reforms, but individual provincial and territorial governments have focused on reorganizing and fine-tuning their regional health system structure and improving the quality, timeliness, and patient experience of primary, acute, and chronic care services. While Canada’s system of universal coverage has been effective in providing citizens with deep financial protection against hospital and physician costs, the narrow scope of coverage has also produced some gaps in coverage and equitable access (Romanow 2002).

G. Marchildon (*)
Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada
e-mail: greg.marchildon@utoronto.ca

https://doi.org/10.1007/978-1-4939-8715-3_41
Introduction

Canada is the second largest country in the world as measured by area, with a mainland that spans a distance of 5514 km from east to west and 4634 km from north to south. The climate is northern in nature, with long, cold winters experienced in almost all parts of the country. The country has a population of 35 million with most of the population concentrated in urban centers close to the border with the United States and the remainder scattered over vast rural and remote areas (Fig. 1).
Canada is a high-income country with an advanced industrial economy and one of the world’s highest Human Development Index rankings. Relative to other OECD countries, Canada’s economic performance has been solid despite the recession triggered by the financial crisis of 2008. The burden of disease is among the lowest in the OECD even though Canada’s ranking, based on health-adjusted life expectancy (HALE), slipped from second in 1990 to fifth by 2010 (Murray et al. 2013). The two main causes of death in Canada are cancer and cardiovascular disease.
Canada is a constitutional monarchy, based on a British-style parliamentary system, and a federation with two constitutionally recognized orders of government. The federal government is responsible for certain aspects of health and pharmaceutical regulation and safety, data collection, research funding, and some health services and coverage for specific populations, including First Nations and Inuit. The second order of government consists of ten provincial governments which bear the principal responsibility for a broad range of social policy programs and services (Marchildon 2013).
Fig. 1 Map of Canada

Organization and Governance

In Canada, the governance, organization, and delivery of health services are highly decentralized for at least three reasons: (1) the constitutional responsibility of provinces for the funding, administration, and delivery of most health services, (2) the status of physicians as independent contractors, and (3) the existence of multiple organizations, from regional health authorities to privately owned and governed hospitals and clinics, all of which operate at arm’s length or independently from provincial governments.
Provincial and territorial governments are responsible for administering their own tax-funded universal, first-dollar coverage programs. Historically, the federal government used its spending power to encourage the introduction of these programs based on high-level national principles, including the portability of coverage among provinces and territories. In most provinces, health services are organized and delivered by regional health authorities (RHAs) which have been legislatively delegated to provide hospital, long-term, and community care as well as to improve population health within defined geographical areas.
Provincial ministries of health retain the responsibility to provide targeted coverage for pharmaceuticals and for remunerating physicians. Most physicians work on fee-for-service with fee schedules determined through negotiations between the medical associations and ministries of health at the provincial level of government. As independent professionals as opposed to salaried employees, physicians have considerable autonomy from the managerial control of provincial health ministries or RHAs.
Despite this apparent decentralization, the federal government retains significant steering responsibilities. Through its cash transfers to the provincial governments and the threat of their withdrawal, the federal government sets pan-Canadian standards for hospital and medical care services through the Canada Health Act. The federal department of health – Health Canada – is responsible for ensuring that provincial governments are adhering to the five criteria in the Act: public administration, comprehensiveness, universality, portability, and accessibility. Established in 2004 in response to the lack of national direction during the severe acute respiratory syndrome (SARS) epidemic the year before, the Public Health Agency of Canada performs a broad array of public health functions including infectious disease control, surveillance, and emergency preparedness and, through community partners, facilitates various health promotion and illness prevention initiatives (Fig. 2).
Due to the constitutional division of powers in Canada, there is no single ministry or agency responsible for system-wide national planning. Provincial ministries of health are responsible for planning and regulating their respective health systems, but they collaborate through mechanisms such as federal-provincial-territorial councils and working groups of Ministers and Deputy Ministers of Health. The provincial and federal governments have also established a number of specialized intergovernmental agencies to pursue more specialized objectives including health data collection and dissemination (the Canadian Institute for Health Information or CIHI), health technology assessment (the Canadian Agency for Drugs and Technologies in Health or CADTH), electronic health records (Canada Health Infoway), and patient safety.
Provincial and territorial governments regulate health facilities and organizations since RHAs are delegated authorities without a law-making or regulatory capacity. These governments are also responsible for managing blood products and services through Canadian Blood Services in most of the country and Héma-Québec in the province of Quebec. Provincial or other governments are not directly involved in facility accreditation, and health organizations are accredited on a voluntary basis through Accreditation Canada, a membership-based nongovernmental body. Most health professions, including physicians and nurses, are self-regulating within each province and territory based on framework laws established by the relevant governments.
Six provincial governments have established health quality councils to work with health providers and organizations to improve quality and
Fig. 2 Organization of the Canadian health system
[Organization chart, not reproduced. Recoverable labels: Canadian Constitution; federal government (Minister of Health, Health Canada, Public Health Agency of Canada, Patented Medicine Prices Review Board, Statistics Canada, Canadian Institutes for Health Research); Canada Health Act, 1984; transfer payments; provincial and territorial governments (ministers and ministries of health, regional health authorities, provincial and territorial prescription drug programs); federal-provincial-territorial conferences and committees; Canadian Agency for Drugs and Technologies in Health (1989); Canadian Institute for Health Information (1994); Canadian Blood Services (1996); Canada Health Infoway (2001); Canadian Patient Safety Institute (2003); Health Council of Canada (2003-2013).]
safety, as well as report outcomes to the general public. However, no government has given a provincial quality council the power to regulate quality or set enforceable standards.
The federal government, through Health Canada, regulates medical devices; determines the initial approval and labeling of all prescription drug therapies, herbal medicines, and homeopathic preparations; and prohibits direct-to-consumer advertising of pharmaceuticals. Pharmaceutical advertising targeting physicians is subject to federal law as well as to codes established by industry associations. The federal government has exclusive jurisdiction over the patenting of new inventions, including pharmaceuticals, and patent protection is set at the 20-year OECD norm. Provincial governments use a number of regulatory tools, including reference pricing, licensing of generics, bulk purchasing, tendering, and discounting, to contain the cost of their respective prescription drug plans (Paris and Docteur 2006).
Due to a high degree of health system decentralization, physician autonomy, and onerous privacy laws, Canada has been slower than other countries in integrating information and communications technology (ICT) into health delivery. In a 2009 survey of 11 OECD countries by the Commonwealth Fund, Canadian family doctors scored the lowest in terms of using electronic health records (EHRs) and had the lowest electronic functionality (Schoen et al. 2009). Although the evidence is limited and now somewhat dated, it does appear that hospitals in Canada are also behind in their adoption and use of ICT (Urowitz et al. 2008).
Three provincial organizations and one national-level organization provide health technology assessments (HTA) to provincial and
federal ministries of health and delegated health authorities. As the sole pan-Canadian HTA agency, CADTH’s mandate is to provide evaluations of new prescription drugs, as well as medical devices, procedures, and systems, to federal, provincial, and territorial governments. CADTH’s recommendations are advisory in nature, and it is up to the governments in question to decide whether or not to introduce these technologies.
The patient rights movement is relatively underdeveloped in Canada compared to similar movements in the United States and Western Europe. While there are patient-based organizations focusing on particular diseases, there are only a handful of more broadly based, rights-oriented patient groups. In recent years, patient rights have been exercised through the courts, relying on the constitutional “right to life, liberty and security of the person” in the Canadian Charter of Rights and Freedoms, although most attempts to extend this to a right of access to quality health care within a reasonable time have failed (Jackman 2010).
Patients and their respective physicians have been more successful in using such Charter rights to create a right to private health care and private health insurance. In 2005, the Supreme Court of Canada provided a limited form of this right in a situation where the majority of the court interpreted public waiting lists for certain types of elective surgery as unreasonable (Flood et al. 2005).

Financing

Every provincial and territorial government provides universal coverage for medically necessary hospital, diagnostic, and medical care services (Taylor 1987). These 13 governments act as single payers in providing full coverage for their respective provincial and territorial residents. In return for receiving federal transfers, provincial and territorial benefits are provided on a first-dollar basis and on the same terms and conditions to all residents as stipulated in the Canada Health Act. Moreover, these benefits are portable among the provinces and territories. Beyond this so-called Medicare coverage, federal, provincial, and territorial governments offer their own categorical programs in, and targeted benefits for, long-term care and prescription drugs.
Based on 2011 data, federal, provincial, and territorial governments were responsible for funding 70.4% of all health spending in Canada, the majority of which is raised through general taxation. Three provinces supplement their revenues through annual health-care premiums, but these too flow into provincial general revenue funds. The remaining health financing comes from out-of-pocket payments (14.7%), private health insurance (11.8%), and other sources (3.1%) (CIHI 2013).
Since the Canada Health Transfer constitutes roughly 20% of total provincial government health expenditures, the provincial governments are responsible for raising the lion’s share of revenues for health (CIHI 2013). Provincial tax revenues come from a number of sources, including (in rough order of importance) individual income taxes, consumption taxes (including “sin” taxes on alcohol and gambling), and corporation taxes. In those provinces benefitting from an abundance of natural resources, resource royalties and taxes are significant sources of revenue (Marchildon 2013).
Consistent with being a tax-based Beveridge-style health system, there is limited pooling of funds in the Canadian system. However, there is a type of pooling through cash transfers – from the federal government (which collects tax at the national level) to the provincial and territorial governments and from provincial governments (which pool federal transfers with own-source revenues) to RHAs – which, as public non-governmental bodies, have no autonomous powers of taxation.

Physical and Human Resources

From the 1940s until the 1960s, Canada experienced a boom in hospital building encouraged by the introduction and expansion of universal hospital
coverage and federal hospital construction grants. By the 1990s, much of this hospital infrastructure was outdated. Some provincial governments also felt burdened with too many small and inefficient hospitals in rural and remote areas. As a result, hospitals were closed, consolidated or converted, and, in some provinces, put under the governance and ownership of newly created RHAs (Ostry 2006).
Despite recent reinvestments in hospital capital, less in bricks and mortar and more in medical equipment, imaging technologies, and ICT, the number of acute care beds per capita has continued to decline. This is in part the result of improvements in clinical procedures and the expansion of non-hospital-based surgical clinics that specialize in day surgeries. Although in the past Canada had fallen behind other OECD countries in terms of the supply and use of advanced imaging equipment, the supply of computed tomography (CT) scanners, magnetic resonance imaging (MRI) units, and positron emission tomography (PET) scanners is now closer to the OECD average.
After a lengthy period in the 1990s when the supply of physicians and nurses was reduced because of the concerted efforts of governments to reduce spending and pay down accumulated public debt, spending on the health workforce has climbed steadily since the turn of the century. Medical, nursing, and other health profession faculties have expanded their seats to produce more graduates, even while an increasing number of foreign-educated doctors and nurses have immigrated to Canada.
With the exception of physicians, most health workers are employees of health-care organizations, RHAs, and health ministries and are remunerated through salary and wage income. The majority of health workers in the public sector are unionized, and their remuneration is set through collective bargaining agreements. The majority of physician remuneration is through fee-for-service. However, alternative payment contracts – particularly for general practitioners (GPs) – are becoming more common in part as a result of primary care reforms.

Delivery of Health Services

All provincial and territorial governments have public health programs. They also conduct health surveillance and manage epidemic response. While the Public Health Agency of Canada develops and manages programs supporting public health programs at the provincial, regional, and local community levels, the stewardship for most day-to-day public health activities and supporting infrastructure remains with the provincial and territorial governments.
Most primary care is provided by GPs and family physicians, with family medicine recently recognized as a specialization by the Royal College of Physicians and Surgeons of Canada. Although mandated through policy and practice rather than law, GPs and family physicians act as gatekeepers, deciding whether patients should obtain diagnostic tests and prescription drugs or be referred to medical specialists.
Provincial ministries have renewed efforts to reform primary care in the last decade. Many of these reforms focus on transitioning from the traditional physician-only practice to interprofessional primary care teams capable of providing a broad range of primary care, health promotion, and illness prevention services.
Almost all acute care is provided in public or private nonprofit hospitals, although specialized ambulatory and advanced diagnostic services are sometimes provided in private for-profit clinics, particularly in larger urban centers. Most hospitals have an emergency department that is fed by independent emergency medical service units providing first response care to patients while they are being transported to the hospital. Due to the scattered nature of remote communities without secondary and tertiary care, provincial and territorial governments provide air-based medical evacuation, a major expenditure item for the most northern jurisdictions (Marchildon and Torgerson 2013).
Long-term care services, including supportive home and community care, are not classified as insured services requiring universal access under the five national criteria set out in the Canada Health Act. As a consequence, public policies, subsidies, programs, and regulatory regimes for
long-term care vary considerably among the provinces and territories. Facility-based long-term care (LTC) ranges from residential care with some assisted living services to chronic care facilities (originally known as nursing homes) with 24-hour-a-day nursing supervision. Most residential care is privately funded, whereas high-acuity LTC (requiring 24-hour-a-day nursing supervision) is heavily subsidized by provincial and territorial governments (Canadian Healthcare Association 2009).
Until the 1960s, the locus of most mental health care was in large, provincially run psychiatric hospitals which in turn had evolved out of the nineteenth-century asylum and the twentieth-century mental hospital. With the introduction of pharmaceutical therapies and a greater focus on reintegration into the community, mental health conditions have since been mainly treated on an outpatient basis or, in the case of severe episodes, in the psychiatric wards of hospitals. GPs provide the majority of primary mental health care, in part because medical care is an insured service with first-dollar coverage, whereas psychological services are provided largely on a private basis.
While drugs administered in hospitals are fully covered as an insured service under the Canada Health Act, every provincial and territorial government has a prescription drug plan that covers a portion of the cost for outpatient prescription drugs. The majority of these drug plans target low-income or retired residents. The federal government provides pharmaceutical coverage for eligible First Nations and Inuit. These public insurers depend heavily on health technology assessment to determine which drugs should be included in their respective formularies.
Almost all dental care is delivered by independent practitioners, and 95% of these services are paid privately. Dental services are paid for through private health insurance – provided mainly through employment-based benefit plans – or out of pocket. As a consequence of access being largely based on income, outcomes are highly inequitable.
For historical reasons, the federal government finances a host of health service programs targeting Aboriginal Canadians, in particular eligible First Nation and Inuit citizens. These services include health promotion, disease prevention, and public health programs as well as coverage for medical transportation, dental services, and prescription drug therapies. Despite these targeted efforts, the health gap between these Aboriginal citizens and the majority of society remains large. Since the 1990s, there have been a series of health-funding transfer agreements between the federal government and First Nation governments – largely based on reserves in rural and remote regions of Canada. At the same time, there has been an Aboriginal health movement advocating for a more uniquely Aboriginal approach to health and health care (Marchildon 2013).

Reforms

There have been no major pan-Canadian health reforms in the past decade. However, individual provincial governments have concentrated on two categories of reforms: (1) structural change involving the governance and management of health services as a more integrated health system, mainly through the reorganization and fine-tuning of their regional health systems, and (2) process-type reforms, aimed at addressing bottlenecks in delivery, improving patient responsiveness and elevating both quality and safety.
The introduction of RHAs allowed provincial governments to directly manage the health system through arm’s-length delegated bodies. RHAs manage services as purchaser-providers except in Ontario, where the local health integration networks (LHINs) fund (purchase) but do not deliver services directly. The purpose of the reform was to gain the benefits of vertical integration by managing facilities and providers across a broad continuum of health services and to improve the coordination of “downstream” curative services with more “upstream” public and population health services and interventions. In the last decade, there has been a trend to reduce the number of RHAs, thereby increasing the geographic and population size of RHAs in each province,
in order to capture greater economies of scale and scope.
Influenced chiefly by quality improvement initiatives in the United States and the United Kingdom, provincial ministries of health have established new institutions, mechanisms, and tools to improve the quality, safety, timeliness, and responsiveness of health services. Six provinces have established health quality councils to accelerate quality improvement initiatives at the provincial, regional, and clinical levels. Some provinces have also launched patient-centered care initiatives aimed at improving the experience of patients and informal caregivers. Patient dissatisfaction with long wait times for elective surgery as well as specialist and diagnostic services has triggered efforts in all provinces to better manage and reduce wait times.
In contrast, the federal government has largely removed itself from engaging the provinces in any pan-Canadian reform efforts. This is in part the consequence of the perceived failure of the “10-Year Plan to Strengthen Health Care,” signed by the Prime Minister and the Premiers of all provinces and territories in 2004.
The “10-Year Plan” ends in the fiscal year 2013–2014. In December 2011, the federal government announced its reconfiguration of the Canada Health Transfer for the decade following the 10-Year Plan. After 2014, increases in the transfer to the provinces, originally 6% per annum, will be held to the rate of economic growth with a minimum floor of 3%, and all transfers will be made on a pure per capita basis, without taking into consideration the tax capacity of the provinces. The removal of any equalization component in the transfer will make it more difficult for lower-income provinces to continue to ensure coverage is maintained at the standard enjoyed in higher-income provinces.

Assessment

The model of universal Medicare has been effective in protecting Canadians against high-cost hospital and medical care. At the same time, the narrow scope of the benefit package has resulted in larger gaps in coverage, as pharmaceutical therapies and LTC have grown in importance over time. Because 70% of financing for health care in Canada comes from general taxation, most financing is relatively equitable; there is less equity in the remaining 30%, which comes from out-of-pocket sources and employment-based insurance benefits associated with better-paid jobs.
There are disparities in terms of access to health care, but outside of a few areas such as dental care and pharmaceuticals, they do not appear to be large. For example, there appears to be a pro-poor bias in terms of primary care and a pro-rich bias in the use of specialist physician services, but the gap in either case is not large. There is also an historic east-west economic gradient dividing the less wealthy provinces in eastern Canada and the wealthier provinces in the more western parts of the country from Ontario to British Columbia. At present, the economic division is more between those provinces rich in natural resources – particularly petroleum-producing provinces such as Alberta, Saskatchewan and Newfoundland – and those provinces without such resources. These differences are addressed through equalization payments from federal revenue sources to “have-not” provinces that ensure the latter have the revenues necessary to provide comparable levels of public services, including health care, without resorting to prohibitively higher tax rates.
While Canadians are generally satisfied with the financial protection offered by Medicare, they are less satisfied with their access to particular services. Beginning with the budget cuts to health care in the 1990s, emergency rooms became overcrowded and waiting times for nonurgent care became lengthier (Tuohy 2002). Based on a survey of patients in selected OECD countries conducted in 2010, Canada ranked poorly in terms of waiting times for physician care and nonurgent surgery (Schoen et al. 2010). However, based on relevant mortality and morbidity indicators of health system performance, such as amenable mortality, Canada fares considerably better, posting better outcomes than those in the United Kingdom and the United States (Nolte and McKee 2008).
Canadian performance in terms of the quality of health care has also improved in recent years. This may be a result of the policy focus of provincial governments on quality, assisted by health quality councils and the comparative indicators collected and disseminated by the Canadian Institute for Health Information. This improvement is now being extended to patient responsiveness in the hope that this will improve the quality of the patient experience.
There have been few studies of technical efficiency of health systems in Canada (CIHI 2011). However, some provincial governments are beginning to arrange for external evaluations of recent reforms. In particular, the recent application of “lean production” methodologies in some provincial health systems can be interpreted as an effort to achieve greater efficiency. First developed by Toyota to achieve greater technical efficiency and higher quality in automobile production, lean techniques have been applied to hospitals and other health settings in a number of provinces. The objectives of the lean projects have ranged from reducing surgical wait times to improving patient safety (Fine et al. 2009).
Due to a number of trends and institutional changes, health systems in Canada are more transparent today than in the past. Whether in their roles as citizens, taxpayers, patients, or caregivers, Canadians have been demanding greater transparency on the part of their governments and publicly funded health-care organizations and providers. They now receive a range of health information and analysis from a number of new provincial and intergovernmental organizations, including the Health Council of Canada, which provides accessible reports on the state of Canadian health care. In addition, a number of advocacy organizations and think tanks also provide regular reports on health system issues of concern and interest to the general public.

References

Canadian Healthcare Association. New directions for facility-based long term care. Ottawa: Canadian Healthcare Association; 2009.
CIHI. Health care cost drivers: The facts. Ottawa: Canadian Institute for Health Information; 2011.
CIHI. National health expenditure trends, 1975–2013. Ottawa: Canadian Institute for Health Information; 2013.
Fine BA, et al. Leading lean: a Canadian healthcare leader’s guide. Healthc Q. 2009;12(3):32–41.
Flood CM, Roach K, Sossin L, editors. Access to care, access to justice: the legal debate over private health insurance in Canada. Toronto: University of Toronto Press; 2005.
Jackman M. Charter review as a health care accountability mechanism in Canada. Health Law J. 2010;18:1–29.
Marchildon GP. Canada: health system review. Health Syst Transit. 2013;15(1):1–179. Copenhagen: WHO Regional Office for Europe on behalf of the European Observatory on Health Systems and Policies.
Marchildon GP, Torgerson R. Nunavut: a health system profile. Montreal/Kingston: McGill-Queen’s University Press; 2013.
Murray CJL, Richards MA, Newton JN, et al. UK health performance: findings of the Global Burden of Disease Study. Lancet. 2013;381:997–1021.
Nolte E, McKee M. Measuring the health of nations: updating an earlier analysis. Health Aff. 2008;27(1):58–71.
Ostry A. Change and continuity in Canada’s health care system. Ottawa: CHA Press; 2006.
Paris V, Docteur E. Pharmaceutical pricing and reimbursement policies in Canada. Paris: Organisation for Economic Co-operation and Development, Health Work Group; 2006.
Romanow RJ. Building on values: the future of health care in Canada. Saskatoon: Commission on the Future of Health Care in Canada; 2002.
Schoen C, et al. A survey of primary care physicians in eleven countries. Health Aff. 2009;28(6):w1171–83.
Schoen C, Osborn R, Squires D. How health insurance design affects access to care and costs, by income, in eleven countries. Health Aff. 2010;29:w2323–34.
Taylor MG. Health insurance and Canadian public policy: the seven decisions that created the Canadian healthcare system. 2nd ed. Montreal: McGill-Queen’s University Press; 1987.
Tuohy CH. The cost of constraint and prospects for health care reforms in Canada. Health Aff. 2002;21(3):32–46.
Urowitz S, et al. Is Canada ready for patient accessible electronic health records? A national scan. BMC Med Inform Decis Making. 2008;8:33. http://www.biomedcentral.com/1472-6947/8/33. Accessed 25 Sept 2012.
Health System in China
33
David Hipgrave and Yan Mu
Contents
Introduction
China’s Current Health System Reform
Organization, Governance, and Accountability
Organization of the Health System
Accountability Within Government and to the Population
Planning, Regulation, and Monitoring
Monitoring Progress: China’s Health Information Systems and Technology
Financing
Sources of Funding and Accountability for Its Use
Difficulties Using Available Health Financing for Policy Implementation
Health Expenditure and Sources of Revenue
Collection and Pooling of Funds
Coverage, Benefit, and Cost Sharing
Payment Methods for Health Services
Infrastructure and Its Funding
Health Workforce and Trends
Remuneration of Health Workers
Health Services Delivery and Outcomes
Primary Care and Public Health
Clinical Services
D. Hipgrave (*)
UNICEF, New York, NY, USA
Nossal Institute for Global Health, University of
Melbourne, Melbourne, VIC, Australia
e-mail: dhipgrave@gmail.com
Y. Mu
UNICEF China, Beijing, China
e-mail: ymu@unicef.org

https://doi.org/10.1007/978-1-4939-8715-3_42
Pharmaceutical Care
Private Healthcare
Health Outcomes
Assessment
References
Abstract

The health of China’s population improved dramatically during the first 30 years of the People’s Republic, established in 1949. By the mid-1970s, China was already undergoing the epidemiologic transition, years ahead of other nations of similar economic status, and by 1980, life expectancy (67 years) exceeded that of most similarly low-income nations by 7 years. Almost 30 years later, China’s 2009 health reforms were a response to deep inequity in access to affordable, quality healthcare resulting from three decades of marketization, including de facto privatization of the health sector, along with decentralized accountability and, to a large degree, financing of public health services. The reforms are built on earlier, equity-enhancing initiatives, particularly the reintroduction of social health insurance since 2003, and are planned to continue until 2020, with gradual achievement of overarching objectives on universal and equitable access to health services. The second phase of reform commenced in early 2012. China’s health reforms remain encouragingly specific but not prescriptive on strategy; set in the decentralized governance structure, they avoid the issue of reliance on local government support for the national equity objective, leaving the detailed design of health service financing, human resource distribution and accountability, essential drug lists and application of clinical care pathways, etc. to local health authorities answerable to local government, not the Ministry of Health. Community engagement in government processes, including in provision of healthcare, remains limited. This chapter uses the documentation and literature on health reform in China to provide a comprehensive overview of the current situation of the health sector and its reform in the People’s Republic.

List of Abbreviations

CDC Communicable disease control
GDP Gross domestic product
HMIS Health MIS
HSR Health system reform
LMIC Low- and middle-income countries
MCH Maternal and child health
MDGs Millennium Development Goals
MIS Management information system
MoH Ministry of Health
NCDs Noncommunicable diseases
NDRC National Development and Reform Commission
NEDL National Essential Drugs List
NEMS National Essential Medicines Scheme
NHFPC National Health and Family Planning Commission
PRC People’s Republic of China
RCMS Rural cooperative medical (insurance) scheme
RMB Renminbi (unit of currency)
TCM Traditional Chinese medicine
THE Total health expenditure
UEBMI Urban employees basic medical insurance
URBMI Urban residents’ basic medical insurance

Introduction

Most people are familiar with two things about modern China. The first is its physical size and enormous population. In land area, China is the world’s third largest nation, theoretically spanning 4 h of time difference from west to east (while officially operating on one time zone). Its 2010 census revealed a population approaching 1.34 billion, the world’s largest. China’s population grew most rapidly from the late 1950s to the
Fig. 1 Life expectancy in years: China, the world, and low-income and middle-income nations (Source: World Bank data available at http://data.worldbank.org/)
[Line chart; vertical axis: life expectancy, 40–70 years; horizontal axis: 1962–2017; series: China, World, Low & middle income, Low income]
early 1970s, due to the formerly high fecundity of its women alongside a rapid fall in the crude death rate due to communicable disease control (CDC) and basic public health measures. Life expectancy also rose rapidly during this period (Fig. 1) (Hipgrave 2011a).
The second familiar aspect is China’s meteoric economic development, with an average annual growth rate of around 10% for most of the last 30 years, only falling to 7–8% since the global financial crisis.
These familiar aspects of China have depended on the health of its population improving dramatically during the first 30 years of the People’s Republic of China (PRC) since its establishment in 1949. By the mid-1970s, China was already undergoing the epidemiologic transition, years ahead of other nations of similar economic status, and by 1980, life expectancy in low-income China (67 years) exceeded that of most similarly low-income nations by 7 years (Jamison et al. 1984).
However, with CDC (Hipgrave 2011a), economic development, rapid urbanization, and a dramatically ageing population, China’s health system now faces a vastly different range of issues. China will soon become the first large nation to age before achieving developed nation status. Noncommunicable diseases (NCDs) now account for over 80% of deaths in China and almost 70% of its total disease burden (The World Bank Human Development Unit 2011). A World Bank analysis of NCDs in China (The World Bank Human Development Unit 2011) concluded that “a reduced ratio of healthy workers to sicker, older dependents will certainly increase the odds of a future economic slowdown and pose a significant social challenge in China” (page 2). Equally challenging is the provision of new services for the prevention and management of chronic illness and the government’s averred commitment to equity and universal health coverage. These challenges and commitments were among the stimuli to the major health system reform (HSR) that China commenced in 2009 (State Council 2009).

China’s Current Health System Reform

China’s most recent HSR was a response to deep inequity resulting from three decades of marketization and de facto privatization of the health sector. It was the culmination of many years of debate (Tang et al. 2014a) after acknowledged inaction on the heavy burden of healthcare on household expenditure (Blumenthal and Hsiao 2005; Huang 2011; Liu 2004; Liu et al. 2003; Tang et al. 2008). It comprises initiatives in five main areas:

1. Expanding the coverage and benefit of health insurance schemes in urban and rural areas
2. Establishing a national essential medicines scheme to ensure the availability of affordable medicines and reduce the ability of health providers to profit from the sale of drugs
3. Improving basic service availability and quality while also reducing referrals to specialist care and hospitals
4. Ensuring the availability of basic public health services for all populations
5. Piloting public hospital reform, particularly in order to separate hospital management and clinical service provision

The current HSR builds on earlier, equity-enhancing initiatives including the reestablishment of rural health insurance (Meng et al. 2012) and subsidized hospital maternity services (Feng et al. 2010a). Early progress on the first phase of China’s current HSR (2009–2012) was extensively reviewed, both internally by domestically commissioned teams of international (unpublished) and national experts (Wu and Yang 2013; Li and Chen 2012) and externally (Yip et al. 2012). The Reform is planned to continue to 2020, with gradual achievement of its overarching objectives on universal and equitable access to health services; the second phase (2012–2015) was announced in early 2012 (Ministry of Health 2012a), and a major additional pronouncement on county hospital reform was made in early 2014 (State Council 2014). Monitoring and evaluation of the Reform is slated to prioritize its different hierarchical elements (Figs. 2 and 3), although detailed plans for such evaluation have not been released.
China’s commitment to HSR indicates its ongoing priority for the highest echelons of government (Ministry of Health 2012a). The four-year plan for phase 2 reiterates the goal of universal access to basic health services and seeks to resolve constraints to the supply of China’s increasing and diverse health needs. It again commits to expanding insurance benefits and introduces priority to unifying China’s several health insurance schemes; it encourages development of commercial insurance and the introduction of capitation and other payment reforms to separate doctors from the financial management of hospitals; it suggests that the private sector should manage 20% of health services by 2015; family general practice is promoted alongside expanding community and public health services, and drug production, prescription, and pricing will be further consolidated and regulated; performance-based funding of health staff is also mentioned. These individual areas are discussed further below.
The plan is encouragingly specific but not prescriptive on strategy and avoids the issue of local accountability for financing various health programs, stipulating only that government spending on health should gradually increase as a proportion of total government expenditure. This vagueness hints at a major problem for China’s health sector, the reliance on local government support for the national equity objective (Hipgrave et al. 2012). Another major problem remains the difficulty of reforming hospital management, effectively undoing the private, for-profit system that evolved over recent decades. As a result, China’s
Fig. 2 Mapping China’s health reform priorities over 2009–2020 (Source: WHO China)
Inputs & processes Outputs Outcomes Impact
2009 2010 2011 2012 2020
Have health outcomes and equity improved?
Are services responsive to needs?
Are people protected against financial risks?
Has coverage and benefits improved?
Have healthy behaviours improved?
Has access to services improved?
Did the quality of services improve?
Is utilization reasonable?
Is the process of implementation happening as planned?
Are functional mechanisms established?
Have finances been disbursed?
Has infrastructure been built?
Have policies been implemented?
Fig. 3 Focus of monitoring and evaluation of health reforms in China (Source: WHO China)
HSR has not yet reduced the proportional financial burden of healthcare on households or their risk of catastrophic expenditure on health (Meng et al. 2012).

Organization, Governance, and Accountability

Organization of the Health System

China’s former Ministry of Health (MoH) recently merged with the body previously responsible for family planning to form the National Health and Family Planning Commission (NHFPC). The Commission contains 23 different departments, offices, and bureaux responsible for setting standards and for the planning, administration, oversight, and reporting on China’s health sector. However, as with most of China’s social sectors, there is a heavy decentralization of responsibility for local planning, financing, and implementation of health services in China (Wong 2010; Zhou 2010a). In China’s decentralized system, policies and reform guidelines are set at national level but implementation is delegated to local authorities at provincial and lower levels. A hierarchy of health authorities oversees these issues at province, prefecture, county, and township levels.
In China’s political economy and governance structure, local health authorities are more responsive to local government than to higher-level cadres within the health sector, meaning that uptake of national policies and recommendations is only guaranteed if there is broad agreement across all sectors of government and at local government level. In the past, when the health sector was of low priority, this severely limited the implementation of national laws relevant to the health sector. For example, the 1989 Law on Control of Infectious Diseases conferred on local government responsibility for various forms of reporting and action, but was weakly implemented, culminating in the wake-up call of SARS in 2003, redrafting of the law and major reform of CDC (Hipgrave 2011a; Wang et al. 2008a). Initiatives depending on countrywide uptake such as the 2010 national measles vaccination campaign still rely heavily on local funding and prioritization; recent environmental degradation and food and drug safety scandals are further evidence of the lack of cross-sectoral priority given to the health sector in China. The partial
rollback of the one-child policy announced at national level in 2013 remains subject to interpretation and optional implementation by provincial governments. Despite its evident high priority (Tang et al. 2014a), many aspects of the HSR itself are dependent on the same support and follow-up by provincial and even county governments (Hipgrave et al. 2012; Brixi et al. 2012).
To ensure that HSR would receive adequate local priority despite this structure and accountability, in early 2010 the HSR Leading Group in the State Council signed “accountability contracts” with provinces on key reform areas, for subsequent delegation and implementation at lower levels (China News Network 2010). In some provinces, a few key HSR targets such as health insurance coverage were incorporated into subnational officials’ performance evaluation criteria, which has been effective in ensuring progress. However, in other, more complicated reform areas, such as strengthening primary healthcare, public hospital reforms, and others, ensuring progress has been more difficult. Indeed, the reform of public hospitals suffers from a lack of consensus or clear national guidance on direction, limiting its prioritization and implementation outside pilot areas, particularly at low levels.

Accountability Within Government and to the Population

Figure 4 illustrates the ideal accountability relationships among government, healthcare providers, and citizens (society) in the delivery of healthcare. However, in China, such relationships have not yet been forged. While there are promising moves to make local government generally more accountable to the public (such as measurement of “green gross domestic product (GDP)” and independent surveys of public opinion on local government performance in some provinces), the main motivation for subnational authorities remains economic development and revenue generation (Zhou 2010b). Moreover, while banking, communications, etc. are carefully regulated and monitored from above, like most social sectors, health services are largely organized and monitored at the local level. It is too costly for China’s undermanned central government to independently monitor and evaluate subnational health performance (Wong 2010; Zhou 2010b). These circumstances explain the limited ability of national health officials to ensure the HSR is fully pursued at grassroots level.
In theory, all government plans represent the will of the people as they are ratified by the National People’s Congress. However, many Congress members are unelected (in the western democratic sense) appointees, and the People’s Congress generally rubber-stamps the documents presented. However, with the increasing attention of the Party and government in China to public comment through social media, albeit increasingly censored (Osnos 2014), and local protests, there is growing acknowledgment of their answerability to the general public. Therefore, while during local planning there is almost no formal process for the public to make input, there are opportunities for the general population to voice
Fig. 4 Accountability relationships for healthcare (Source: Adapted from The World Development Report 2004 (The World Bank 2003))
[Diagram labels: Government; Health Providers; Patients; Regulation; Monitoring and evaluation; Funding; Provision of care; Mechanisms for citizens’ feedback, to inform policy; As active purchasers of care, patients monitor provider performance]
concerns through the courts, social media, petitions, protests, etc., especially when issues affect a significant proportion of a community. Although the process is usually slow (the HSR took many years to be formalized (Tang et al. 2014a)), there is usually gradual recognition and acknowledgment of the need to act. On the other hand, implementation of plans usually requires higher-level pressure on the various lower tiers of government, and this pressure progressively dissipates further down the hierarchy; it may be ignored for issues that don’t have high-level and cross-sectoral support and the support of local government. Hence, targets for insurance coverage and drug price control are accepted, but controlling the environmental impact of local industry is often ignored (Human Rights Watch 2011). In this process, public influence is rather indirect and can be ignored if local economic, political, or vested interests override it.
Patients’ concerns in healthcare delivery may be channeled formally through the National People’s Congress at different levels (although usually only major complaints reach this level) or informally through social media. However, mechanisms to tap the feedback of patients, as the end users of health services, have not been established. There is no ombudsman or independent regulator in China’s health system, and senior appointments are normally approved by the ruling Party organization. However, since launching the HSR, government is learning that empowering patients and regularly collecting their feedback on key parameters such as service prices and quality strengthens accountability across the government levels and can help achieve the overall goals of the reform (State Council 2014). Patient satisfaction and feedback is increasingly incorporated into the performance evaluation framework for HSR implementation (Ma 2013). However, this practice has not yet been standardized, systematized, and regularized throughout China.
An example of the problem China is having in effecting the most difficult aspect of the HSR, the reform of public hospitals, was recently summarized by eminent researchers on China (Yip et al. 2012), who noted the complex web of relationships that govern this endeavor (Fig. 5). It seems likely that China will need all the years up to 2020
Fig. 5 Dispersion of power between ministries and public hospitals in China. *MOH Ministry of Health, NDRC National Development and Reform Commission, MOF Ministry of Finance, MOHRSS Ministry of Human Resources and Social Security, MOCA Ministry of Civil Affairs, CCP Org Dept Organizational Department of Chinese Communist Party, NCMS New Cooperative Medical Scheme, UEBMI Urban Employee Basic Medical Insurance, and URBMI Urban Residents Basic Medical Insurance (Based on Yip et al. 2012)
[Diagram labels, ministries: MOH*; NDRC (planning); NDRC (pricing); MOF; MOHRSS (social security); MOCA; CCP Org Dept; MOHRSS (personnel). Schemes: NCMS; UEBMI/URBMI; Medical Financial Assistance Scheme. Powers: Investment decision; Financial Power (e.g. income, use of funds); Personnel management. Public hospitals: Strategic planning and development; Use of profits and surplus; Staffing decisions; Management and use of assets]
to make progress in this area of reform, although some commentators doubt this will be achieved in the current context (Zhang and Navarro 2014).

Planning, Regulation, and Monitoring

The normal sector-planning practice in China follows the National Five-Year Plan for Social and Economic Development, with different social sectors (including health) developing their respective plans at five-yearly intervals with annual updates. However, the special need for health reform did not allow China’s HSR to fall neatly in line with regular national development planning, which covers two five-yearly periods per calendar decade: the first three-year phase of the HSR covered 2009–2011, while the second overlaps with the latter part of the government’s 12th Five-Year Plan period: 2012–2015. Moreover, the HSR was developed as a cross-sectoral endeavor led by the national planning ministry (the National Development and Reform Commission or NDRC) to address long-accumulated concerns of the population (State Council 2009; Tang et al. 2014a). While it overlapped with a MoH planning and development activity, Healthy China 2020, the HSR was not only a MoH initiative.
As part of the government’s regular planning, the new NHFPC drafts annual national health work plans with annual targets and submits annual budget proposals for approval by the Ministry of Finance and the NDRC, which approves major construction initiatives such as health infrastructure development. With major events such as the HSR, new changes and innovations are often seen in the plans year on year. At subnational levels, health-related authorities (not only health bureaux) in provinces, prefectures, and counties submit annual planning and budget proposals in line with health service delivery needs and stewardship to the development planning and finance authorities at the corresponding tier. Implementation is financed by local budget supplemented by transfers from higher tiers of government (explained below). Local data should be used in formulating plans, but as there is little tradition of regular, independent, or audited data gathering in China, desensitization of administrative and economic data is suspected (Cai 2008; Hu et al. 2011; Walter and Howie 2011; Kaiman 2013; Anonymous 2012).
Regulation of the health sector follows the accountability structure outlined above and appraises progress and achievement against high-level targets set at national and local levels. Performance assessment tends to be quantitative (relating to coverage or throughput of health services), although assessment on more subtle measures such as patient satisfaction, service quality, and disease management has commenced (as outlined in a Guidance on Performance Assessment of Basic Public Health Services Delivery, jointly promulgated by Ministry of Health and Ministry of Finance in January 2011). At management level, government officials are also increasingly being appraised according to efficiency and innovations in rolling out reform initiatives at local level.

Monitoring Progress: China’s Health Information Systems and Technology

With around 20% of the world’s people, population-level changes in China’s health status or indeed any globally important indicator have a major influence on corresponding global progress. For example, China’s progress toward regional and global achievement of the Millennium Development Goal (MDG) targets will impact any final evaluation of the MDGs in 2015.
However, global statistics in any of the biological, physical, and social sciences can only be calculated if China’s data is included and considered to be reasonably accurate, and data from China is not always available. Many lists of global indicators lack an entry from China, and the accuracy of what is released has been questioned (Cai 2008; Mulholland and Temple 2010). Usually, this is simply because China itself does not collect national statistics on the relevant indicators or does not do so in ways comparable with other nations (e.g., see http://www.countdown2015mnch.org/documents/2012Report/2012/2012_China.pdf). However, as long ago as 2000, perspectives on China’s
mortality data were quite positive (Banister and Hill 2000).

The overall lack of data from China rouses suspicion. But while China’s official statistics often lack breakdowns on key indicators (e.g., until recently, child mortality by gender or cause of death; nutrition status by province) or vary widely from one official source to the next (such as the annual birth cohort (Cai 2008) or number of road deaths (Hu et al. 2011)), these issues distract from China’s efforts to improve the content, frequency, quality, and public availability of official data in recent decades (Banister and Hill 2000). Indeed, UNICEF’s “Atlas on Children in China” publishes a wide range of official and recent data (http://www.unicefchina.org/en/index.php?m=content&c=index&a=lists&catid=60), and health statistics and other yearbooks are published annually (Ministry of Health 2012b; National Bureau of Statistics 2012, 2016) with a great degree of detail and disaggregation.

An increasing number of official and peer-reviewed publications on maternal and child health (MCH) in China report official government data (Wang et al. 2011, 2012; Rudan et al. 2010; Ministry of Health 2011a; Feng et al. 2010b, 2011), and this is contributing to summaries of global progress on the world’s health status and MDGs 4 and 5. China relies on several different sources to provide health administrators, the public and academia with information on the health sector. While it has never conducted a demographic and health survey, and its last multi-indicator cluster survey was in 1995, China’s national health services survey has been undertaken with a reasonably consistent methodology on a five-yearly basis since 1993. Many publications have used this source to assess progress in aspects of China’s health system (Meng et al. 2012) and on its health indicators (Wang et al. 2012).

As an example of the other sources used, China’s official MCH management information system (MIS) and the China Health Statistics Yearbook (Ministry of Health 2012b) rely on data from the following:

1. MCH Annual Reports: administrative reports submitted by ~3000 counties and districts across the nation (Ministry of Health 2007).
2. Maternal and Child Mortality Surveillance network, which has been summarized elsewhere (Wang et al. 2011).
3. The China Food and Nutrition Surveillance System, which surveys 40 surveillance sites on a five-yearly basis, most recently in 2010.
4. The ten-yearly National Nutrition Survey, a comprehensive, age-stratified, sex-stratified, and geographically stratified survey with a sample size of almost 200,000 (last completed in 2012).
5. The China Immunization Registration and Information System, a newly computerized administrative system that reports vaccination coverage to the NHFPC.
6. Data gathered on health facilities, human resources, equipment, and services provided to outpatients and inpatients at various subnational levels and collected by the MoH Center for Health Statistics and Information.
7. China’s National Notifiable Disease Reporting System, through which each county reports on 35 notifiable diseases. After SARS, this reporting system was massively upgraded to become web-based with reporting in real time (Fig. 6).
8. Disease Surveillance Points on births, deaths, and on cases of 35 notifiable diseases at 145 selected points around the nation.
9. China’s Vital Registration System, which covers around 8% of the nation’s population but is biased toward urban and eastern locations.
10. National Health Services Survey, which focuses on health status, service uptake, and health financing (Meng et al. 2012); it was last conducted in 2013.
11. National Census, last conducted in 2010 (National Bureau of Statistics 2012), including substantive demographic information.
12. National one percent (inter-census) Household Survey, conducted between the ten-yearly national censuses, last conducted in 2005.

Notwithstanding recent attempts to improve the health MIS (HMIS), monitoring China’s
Fig. 6 Web-based national notifiable disease reporting since 2004 (Source: China Centre for Disease Control, Beijing (with permission))

HSR and health status relies largely on output-based reporting or describes numeric improvements emanating from high-profile national initiatives (Meng et al. 2012), often lacking denominators (Huang 2011; Yip et al. 2012; Ministry of Health 2012c). China does not have a tradition of locally representative, population-based surveys on health outcomes; those which are undertaken are almost never independent. The disaggregated impact of health initiatives and local health status remains unknown except at crude (regional and urban-rural) levels (Meng et al. 2012; Ministry of Health Centre for Health Statistics and Information 2009). This lack of data reduces the ability of governments to allocate resources according to local demography and disease epidemiology (which are changing rapidly with urbanization). In this context, quality implementation of new HMIS initiatives (Hipgrave 2011b) will be critical; however, again these are national initiatives reliant on local funding. The HMIS is mentioned as a priority for the second phase of the HSR (Ministry of Health 2012a), but in general the monitoring and evaluation of China’s health sector remains weak and non-independent and is not prioritized at subnational level.

Financing

Sources of Funding and Accountability for Its Use

Subnational governments, even at county and township level, are responsible for about 90% of social sector financing and for the provision of essential services including health (National Bureau of Statistics 2011). Government expenditure on health depends heavily on local fiscal capacity (Yip et al. 2012; Wong 2010; Feltenstein and Iwata 2005); this varies widely across China, even after adjusting for formula-based “equalization transfers” from central government (Wong 2010; Bloom 2011). On average, tax revenue sharing and intergovernmental transfers finance up to 50% of subnational government expenditure (World Bank 2012).

This system bestows considerable power on provincial governments but also significant financial stress at the lowest levels of government. Each level of government has considerable discretion in transferring resources to successively lower levels. Provincial governments are the main recipients of the central government equalization
[Figure: scatter plot of provincial expenditure on health per capita (RMB) against provincial GDP per capita (RMB); labeled provinces include Tibet, Beijing, Qinghai, Shanghai, Ningxia, Tianjin, Xinjiang, Inner Mongolia, Zhejiang, Yunnan, Gansu, Guizhou, Henan, Chongqing, Guangdong, and Jiangsu]
Fig. 7 Provincial expenditure on health per capita in relation to provincial gross domestic product per capita, 2010 (Source: Ministry of Health, China Health Statistics Yearbook, 2011 (Ministry of Health 2012b))
grants and tax sharing and have significant autonomy in what they do with these funds. Prefecture governments in turn have similar autonomy. In this system, funding for public service delivery by poorer townships and counties tends to be insufficient (Wong 2010; Zhou 2010b).

Apart from earmarked transfers from the MoH and funds for selected nationwide priorities, local governments may withhold resources for lower levels or favor spending in more populous areas or on issues strategic to their careers (Zhou 2010a; Liu 2007). This kind of bias at subnational levels can undermine progress on national development goals (Yang 2011; Uchimura and Jütting 2007).

To supplement resources received from the higher levels, subnational governments raise resources from various fees, the sale of land use rights, and taxes on real estate transactions (World Bank 2012). However, poor localities tend to have limited scope for such revenue generation. The imbalance between resources and expenditure responsibilities, particularly in poor jurisdictions, impacts on health service quality (Yang 2011) and on household health expenditure (Blumenthal and Hsiao 2005; Meng et al. 2012; World Bank 2012).

Moreover, income disparities have widened across localities and population groups within local jurisdictions (Xing et al. 2008; Zheng et al. 2008; UNDP China and China Institute for Reform and Development 2008). The national urban-rural ratio of income per capita has risen from 2.4 in 1991 to 3.2 (up to 4 within certain provinces) in 2010 (Fig. 7) (National Bureau of Statistics 2011). At subnational level, only four provinces (Sichuan, Tibet, Xinjiang, and Yunnan) bucked this trend due to large subsidies to stimulate economic development and poverty reduction. Subsidies for these provinces impact the shape of the line of best fit in Fig. 7, which depicts provincial expenditure on health in relation to provincial GDP, per capita.

Difficulties Using Available Health Financing for Policy Implementation

As mentioned, in China’s decentralized environment, local government expenditures are not
aligned with policy priorities across sectors and programs. There are four distinct components of the national budget system, two of which impact on social sector spending: the general government budget (which relies on various taxation revenues and allocates funds to publicly funded services and activities) and the social security budget. The first of these allocates funds at the sectoral level; line ministries can then decide on and allocate earmarked transfers to the provinces (Wong 2010; Zhou 2010b). However, subnational government spending also relies on off-budget revenues (such as local taxes) for off-budget programs.

Monitoring is limited and there is little effort to align subnational budgets or plans with higher-level priorities. Moreover, apart from some individually monitored earmarked transfers, little information is available on whether governments actually spend money according to budgetary allocations or whether government expenditures and programs lead to the desired outputs and expected outcomes. Achievement of high-profile input and output HSR targets masks the absence of substantive analysis of outcome-level impact (Meng et al. 2012; Yip et al. 2012). Audits tend to focus on detecting malfeasance, not program performance.

Additionally, China’s budget and expenditure cycles are not synchronous. The fiscal year starts with the calendar year, but the budget is not endorsed by the National People’s Congress until the end of March. This delay reduces the budget’s operational significance for subnational governments and central ministries (World Bank 2012). Fragmentation, information limitations, and delays in budget execution limit the ability of national authorities to transform policy priorities into resource allocation and results at the local levels (World Bank 2012).

Health Expenditure and Sources of Revenue

Total health expenditure (THE) in China was US$445.5bn in 2012, at US$329 per capita, and 5.41% of GDP (China National Health Development Research Centre 2013). THE/GDP is modest compared with industrialized countries, which averaged 9.7% in 2010 (OECD 2013), but is average among low- and middle-income countries (LMIC), whose THE/GDP ranges from 2.6% to 10% (e.g., Indonesia 2.6%, Thailand 3.9%, India 4.1%, Russia 5.1%, Vietnam 6.8%, South Africa 8.9%, and Brazil 9.0%) (see data at http://apps.who.int/nha/database). Health expenditure as a proportion of GDP has increased from ~3% to ~5% since 1980, but numeric growth has been enormous due to China’s rapid economic growth (Figs. 8, 9, and 10).

The sources of THE have changed dramatically over time, reflecting changes in the role of government. Marketization beginning in the 1980s led to historically high out-of-pocket expenditure in 2001 (60%), but this had decreased to ~34% in 2012 (China National Health Development Research Centre 2013), mostly through public subsidies for primary health programs, for health providers and for the social insurance schemes.

In 2011, tax-based government expenditures accounted for 30.7% of THE, social health expenditure 34.6%, and out of pocket 34.7% (Fig. 8). Overall, public expenditure on health as a share of THE is similar to that of many other LMIC and also to the United States (even higher if the government contribution to social health insurance is considered), but most high-income countries average around 71% (Tangcharoensathien et al. 2011). WHO calculates this figure differently and has China’s figure at 56%; most nations in South and East Asia average around 41% (see http://apps.who.int/nha/database and Hipgrave and Hort 2014).

Collection and Pooling of Funds

To provide essential health services, reduce inequity, and provide financial protection against catastrophic health expenditure, governments must mobilize sufficient resources via: (1) collecting revenues, (2) pooling of risk, and (3) purchasing goods and services (Gottret and Schieber 2006). Globally, three models of basic healthcare financing are practiced: nationalized health services, social insurance, and private insurance. China’s
[Figure: government, social, and out-of-pocket health expenditure, each as a percentage of GDP (axis 0.0 to 5.5)]
Fig. 8 Government, social and out-of-pocket expenditure on health, 1978–2011 (Source: China Health Statistics Yearbook (Ministry of Health 2012b))
[Figure: total health expenditure (unit: 100 million renminbi) and THE as % of GDP, 1978–2010]
Fig. 9 Total health expenditure (THE) in China, numerically and as a percentage of gross domestic product (Source: China National Health Accounts Report 2012 (China National Health Development Research Centre 2012) (2012: US$1 = ~6 renminbi [RMB]))
[Figure: per capita THE in RMB by year, 1978–2011]
Fig. 10 China’s per capita THE (Source: China National Health Account Report 2012 (China National Health Development Research Centre 2012))
healthcare financing has evolved to a structure dominated by three social insurance schemes with almost universal population coverage: the urban employees basic medical insurance (UEBMI) (financed by formal sector employers and employee contributions), the rural cooperative medical (insurance) scheme (RCMS), and urban residents’ basic medical insurance (URBMI). The latter two receive heavy government subsidization in addition to individual contributions (in a roughly 4:1 ratio).

Government health expenditure stems from tax revenue, as described above. China does not have tax instruments specifically designated to health expenses; the funds are allocated from overall tax revenue. These funds are used to pay the salaries of health workers, purchase equipment, and build infrastructure at various levels and for various specific programs such as public health subsidies or other schemes earmarked by the MoH. Government also funds a social assistance program (the medical financial assistance scheme), which provides cash for designated poor households to purchase health services. There also remains “free medical treatment” for those on the government payroll and for retired military and Party cadres; these arrangements are slated for phasing out. However, government does not as yet contribute substantively to the funding of hospital care, which remains predominantly managed in-house from various sources of revenue (in particular, out-of-pocket payments and insurance) (State Council 2014; Barber et al. 2014).

Coverage, Benefit, and Cost Sharing

Table 1 summarizes the current basic health financing arrangements and benefit provided by the various health insurance schemes in China. It is evident that the major challenge remains fragmentation of the schemes and arrangements and the associated inequity and inefficiency. This is also highlighted in Fig. 11, which depicts the large variation in average numeric benefit and other information about the various schemes. In this context, and given China’s highly mobile population and the limited access of migrant populations to urban health services (Di Martino 2011), the Government is prioritizing integration of the various insurance schemes (Ministry of Health 2012a), but this is a difficult and complex proposition.
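
The headline financing figures quoted in this section can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are assumptions introduced here, and the inputs are the figures quoted in the text (THE of US$445.5bn and US$329 per capita in 2012; the 2011 shares of THE; the roughly 4:1 subsidy ratio for the RCMS and URBMI).

```python
# Figures quoted in the text; all names here are illustrative assumptions.
THE_USD_2012 = 445.5e9          # total health expenditure, US$
THE_PER_CAPITA_USD_2012 = 329   # US$ per person

# Shares of THE by source in 2011, in percent.
shares_2011 = {
    "government": 30.7,
    "social": 34.6,
    "out_of_pocket": 34.7,
}


def implied_population(total, per_capita):
    """Population implied by a total amount and a per capita amount."""
    return total / per_capita


def subsidy_share(gov_parts, individual_parts):
    """Government share of contributions for a gov:individual ratio."""
    return gov_parts / (gov_parts + individual_parts)


population = implied_population(THE_USD_2012, THE_PER_CAPITA_USD_2012)
total_share = sum(shares_2011.values())
public_share = shares_2011["government"] + shares_2011["social"]
gov_contribution_share = subsidy_share(4, 1)

print(f"Implied population: {population / 1e9:.2f} billion")
print(f"Shares of THE sum to {total_share:.1f}%")
print(f"Government plus social share: {public_share:.1f}%")
print(f"Government share under a 4:1 ratio: {gov_contribution_share:.0%}")
```

Under these inputs the total and per capita figures imply a population of about 1.35 billion, the three 2011 shares account for all of THE, and a 4:1 ratio corresponds to government paying about four-fifths of contributions to the two subsidized schemes.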
Table 1 Current healthcare financing arrangements, coverage, and benefit in China

| | UEBMI | RCMS | URBMI | Direct subsidizing of public providers | Medical financial assistance | Free healthcare | Other (private) insurance | Out of pocket (direct payment for care as needed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Beneficiaries | Formal sector workers | Farmers | Urban residents not covered by the UEBMI | All citizens | Poor households with catastrophic health expenses or recognized recipients of China’s social security payments | Public sector employees and special groups such as retired military and Party cadres | Voluntary purchasers of private insurance | Persons not covered by another scheme or having to choose a different health provider (includes many migrant workers) |
| Population coverage | 14.8% | 69.5% | 9.5% | All citizens | 809 million incidents funded in 2011 | 0.7% | 0.3% | 5.2% (excludes most migrant workers) |
| Benefit covered | Outpatient (OP) and inpatient (IP) | Mainly IP, but some counties experiment to cover some care for OPs (e.g., treatment of chronic diseases) | Mainly IP, but some cities experiment to cover some care for OPs (e.g., treatment of chronic diseases) | All services | IP incurring catastrophic cost to individuals | OP and IP | OP and IP | |
| Co-payment arrangement | OP roughly 30–50%; IP: 10–20% | IP roughly 50–60% | IP roughly 60–70% | N.A. | Practices vary across regions | Very minimal | Varies across schemes | 100% |

Note: UEBMI, RCMS, and URBMI are the three social health insurance schemes. For these three schemes, practices vary across regions; co-payment occurs for (1) expenses below scheme thresholds and also above ceilings, (2) expenses on high-end or special services excluded from schemes, and (3) the percentage not reimbursable for different services
Source: Authors’ own compilation; figures from the 2012 China Health Statistics Digest (Ministry of Health 2012b)
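
As a rough check on the claim of almost universal coverage, the percentage entries in the population coverage row of Table 1 can be added up. The sketch below is illustrative only: it assumes the percentage categories are mutually exclusive (the table does not say so), it omits the two entries not reported as percentages, and the names used are introduced here for the example.

```python
# Population coverage (%) as listed in Table 1. Treating the categories as
# mutually exclusive is an assumption made for this check.
coverage_pct = {
    "UEBMI": 14.8,
    "RCMS": 69.5,
    "URBMI": 9.5,
    "Free healthcare": 0.7,
    "Other (private) insurance": 0.3,
    "Out of pocket": 5.2,
}

SOCIAL_INSURANCE = ("UEBMI", "RCMS", "URBMI")

# Combined coverage of the three social health insurance schemes.
social_total = sum(coverage_pct[name] for name in SOCIAL_INSURANCE)

# Sum over every category reported as a percentage.
overall_total = sum(coverage_pct.values())

print(f"Three social insurance schemes: {social_total:.1f}%")
print(f"All percentage categories: {overall_total:.1f}%")
```

On these figures the three social insurance schemes together cover 93.8% of the population, and the six percentage categories add up to 100%.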
[Figure: bubble chart of per capita government funding (RMB) in 2008 or 2010 against year of launch (1950s, 1990s, 2002, 2003, 2007); schemes shown: GFYL, Programme for Employees of Public Administrative Units and Organizations, Programme for Employees of Public Sector Services Units and Organs, RCMS, urban MFA, rural MFA, and Urban Residents’ BMI]
Fig. 11 Government financing per participant across health security schemes introduced during 1950–2007. Note: Bubble size is equivalent to the number of participants. Number of participants is shown in parentheses. Government spending per participant is shown in red. Government funding figures are annual per person, except for the rural and urban medical financial assistance, reported per case (Source: National Health Account Report 2011 and China Health Statistical Digest 2011)
Payment Methods for Health Services

Before the HSR, to ensure financial accessibility, the Chinese government priced primary healthcare services at below cost, but allowed providers to charge high prices for diagnostic tests using high-tech equipment, effectively cross-subsidizing primary services. Providers could also levy a 15% profit on drug sales. Under the prevailing fee-for-service payment modality, this created an incentive for providers to maximize profit by ordering tests and overprescribing drugs. Cost-effective and efficient primary healthcare services were ignored by providers because they were not profitable; those who could not pay for services often chose to forego them (Tang et al. 2008).

The recent reforms to provider payment, and those mooted for the future, aim to: (1) encourage the provision of cost-effective and efficient primary healthcare services, (2) reduce provider reliance on drug income and curb overprescription, and (3) curb cost inflation.

Innovative provider payment methods, such as capitation (for primary health mostly), gross budget, diagnosis-related groups (for hospitals), as well as performance-based payment for health workers, are being piloted at county and district level. Other related policy reforms include a zero markup policy (for essential drugs), implementation of an essential drug list, and so on (Yang et al. 2013a).

Physical and Human Resources

Infrastructure and Its Funding

By international standards China’s average health infrastructure level has been poor. For example, the number of hospital beds per 1000 population in 2011 was around 4, among the lowest in the world (Ministry of Health 2012b). Health infrastructure in China also suffered from a major urban-rural divide in the earlier stages of social and economic development. Not only did urban
health infrastructure enjoy greater public financial support, it attracted loans and other financial instruments because it was profitable and boosted the local economy. For many years, rural facilities received very limited government subsidy and relied on collective funding among farmers. Rural health infrastructure lagged seriously, in terms of both the basic condition of health facilities (buildings, beds, etc.) and the equipment, while big urban hospitals acquired technical equipment of high quality. In 2005, there were 3.6 hospital beds per 1000 urban residents, but only 0.78 in rural townships (Ministry of Health 2007). This inequity was recognized by national government, and in 2006 the majority of a national bond issue was used to finance a project earmarked for rural health, specifically to finance the rebuilding, renovation, and updating of medical equipment for rural providers, including primary health facilities such as CDC and MCH institutions. The NDRC and its local branches approved the funding proposals for physical health infrastructure. More recently, the 2009 HSR allocated large sums to further improve physical health sector infrastructure (focusing on remote rural areas, but also urban community health centers). Progress on this aspect of the Reform has been very positive (Yip et al. 2012).

Health Workforce and Trends

For the majority of China’s population, access to western and formally regulated traditional Chinese medicine (TCM) only commenced with the introduction of China’s famed “barefoot doctors” in the mid-1960s. These cadres numbered 1.8 million at their peak (around one per 600 people), but numbers fell rapidly with economic marketization and liberalization of population movement (Bien 2008). Moreover, village-level care lost its funding base with the dismantling of the rural cooperatives in the early 1980s, and training and supervision of the quality of care provided fell off. As recently as the late 1990s, many doctors lacked training to the level suggested by their rank and title (Youlong et al. 1997), and overprescribing of drugs and inappropriate use of parenteral preparations continue to exemplify the low quality of care, especially in rural areas (Blumenthal and Hsiao 2005; Bloom and Xingyuan 1997; Zhan et al. 1998; Pavin et al. 2003; Dong et al. 2008; Chen et al. 2010).

With economic marketization, medicine at all levels became privatized, physician salaries were paltry, standard consultation fees were fixed below cost (Eggleston et al. 2008), and over 40% of doctors’ and health facilities’ income derived from the sale of drugs (Hu 2010). As a result, doctors worked where they could be assured of income, patients became disillusioned with the care at rural clinics, self-referral to urban clinics increased, and the distribution of doctors, nurses, and health facilities was heavily biased to urban areas (Yip et al. 2012; UNDP China and China Institute for Reform and Development 2008; Youlong et al. 1997; Anand et al. 2008) (Table 2). Residents of urban areas in China, particularly in the large eastern cities, enjoy physical access to health services to the same level as in most developed nations. However, like many other Asian nations, China has trained more doctors than nurses or midwives, and there are progressively fewer staff with formal health training in progressively poorer rural areas (Youlong et al. 1997; Anand et al. 2008) (Table 3). China includes TCM practitioners (13%) in headcounts of health staff (Anand et al. 2008).

China is still paying for the interruption of university education during the Cultural Revolution of 1966–1976, and the paucity of new village doctors trained since the breakup of the village cooperatives in the late 1970s. First, as of 2005, 67.2% of China’s doctors and 97.5% of nurses had only completed junior college or secondary technical school level training, and 6% and 8% respectively had just high school or lower education (Anand et al. 2008). The duration and standard of professional education vary widely across the country (Youlong et al. 1997). Village doctors are an ageing cohort, with a likely high attrition rate in the coming decade (Xu et al. 2014).

However, with massive increases in the number of formal trainees since 1998, the distribution and quality of personnel are probably bigger
Table 2 Health workers in China in 2011

| Categories | Total number (1000s) | Total density | Urban number (1000s) | Urban density | Rural number (1000s) | Rural density |
| --- | --- | --- | --- | --- | --- | --- |
| All health workers | 8616 | 4.58 | 3844 | 7.9 | 4762 | 3.19 |
| Licensed doctors including assistant doctors | 2466 | 1.82 | 1190 | 3 | 1275 | 1.33 |
| Nurses | 2244 | 1.66 | 1304 | 3.29 | 939 | 0.98 |
| Other health professionals | 1492 | 1.1 | | | | |
| Other health workers | 2413 | 1.8 | | | | |

Note: Urban areas refer to jurisdiction under China’s four municipalities and prefecture-level cities. Rural areas refer to counties and county-level cities, as well as township hospitals and village clinics. Density refers to the number of health workers per 1000 population
Source: China Health Statistics Yearbook 2012 (Ministry of Health 2012b)
Table 3 Distribution of doctors and nurses by education level and health institution type, in 2011

| Education level | Hospitals: doctors (%) | Hospitals: nurses (%) | Community health centers: doctors (%) | Community health centers: nurses (%) | Township hospitals/clinics: doctors (%) | Township hospitals/clinics: nurses (%) |
| --- | --- | --- | --- | --- | --- | --- |
| University and above | 62.7 | 11.5 | 31.7 | 5.7 | 3.9 | 0.4 |
| Secondary school and college | 36.3 | 86.4 | 64.6 | 91 | 83 | 87.9 |
| High school or less | 1 | 2.1 | 3.7 | 3.3 | 13.1 | 11.7 |

Note: University and above refers to those with at least a bachelor’s degree. Secondary schools include technical or professional high schools
Source: China Health Statistics Yearbook 2012 (Ministry of Health 2012b)
problems than the overall number of China’s health human resources. Indeed, some data suggest an excess of trainees and the likelihood that many health graduates do not take up professional service. Nonetheless, inequality and inequity in the distribution of doctors and especially nurses between and particularly within provinces remains extreme and has been linked to key health outcomes including infant mortality (Anand et al. 2008).

Authorities in China recognize the prevailing inequity in distribution of health human resources and have initiated training and other schemes to increase the number of qualified personnel and improve their distribution. The 12th Five-Year Plan for health sector development, released in 2012, sets targets for assistant physicians (1.88/1000 population) and nurses (2.07) and lays out plans for increased priority of staffing in rural areas and at community level, of personnel and financial support for poor rural and western health facilities by wealthier urban and eastern facilities, of intensive efforts to fill known human resource gaps among various health and allied health providers, and of tiered registration for doctors that first requires a period of rural service. A focus on community general practice is reiterated in the plan, with a target of 150,000 newly trained or upgraded personnel to provide such services.

In addition, in a 2011 “Guidance” the State Council announced new roles for village doctors, recommending a wide range of tasks (Government of China 2011). By 2020, these cadres should be providing standardized primary care (following new clinical guidelines), implementing public health programs, undertaking disease surveillance, conducting community education, participating in health financing schemes, and maintaining individual e-health dossiers. In theory, it will be possible for the national HMIS to monitor their work. The official engagement of village doctors in a national system is a positive development and should improve public confidence in their services. However, payment for
the planned elevation of village doctors’ responsibilities will derive from a complex mix of
funding streams (Government of China 2011; Ministry of Health 2011b) overseen and additionally
funded by county-level authorities (Government of China 2011) whose accountability for this
national initiative will be to local government (Wong 2010; Zhou 2010b), not health authorities.

Remuneration of Health Workers

It is well established that marketization and the de facto privatization of clinical care by
salaried doctors working in public facilities had, by 2000, resulted in China having one of the
least equitable health systems in the world (The World Health Organization 2000), with over 60%
of THE being out of pocket (Blumenthal and Hsiao 2005; Ho and Gostin 2009; Wang et al. 2007).
One of the main objectives of China’s HSR is to regulate the remuneration of doctors and to
separate their income from choices on clinical care. However, while China has reduced the level
of out-of-pocket expenditure on health to around 35% through increases in public funding and
insurance initiatives (Yip et al. 2012), household health expenditure has not decreased either
numerically or as a proportion of total household expenditure (Meng et al. 2012). Although there
is indirect evidence of increased non-health expenditure by insured households in comparison to
before the schemes were introduced (Bai and Wu 2014), this objective of the HSR is proving to be
the most difficult to achieve. China’s THE is increasing at around 17% per year, and a large
proportion of the increase is due to payment of health facilities, doctors, and other providers
by individuals or insurers. As patient expectations rise but out-of-pocket expenses remain
numerically high, an increasing number of assaults on doctors by patients’ families are being
reported.
On the other hand, the scheduled fees payable to doctors for listed services are set below cost,
forcing clinicians and facilities to charge for other services, investigations, procedures, and
drugs (including those not on the essential drugs list with unregulated prices) (Blumenthal and
Hsiao 2005; Ho and Gostin 2009; Wang et al. 2007; Tian et al. 2008) or through accepting bribes
and kickbacks (Yang and Fan 2012). While the government has committed to improving the quality
of care provided by health providers and is exploring remunerating them through capitation,
diagnostic-related groups and performance-based incentives (Ministry of Health 2012a), separating
hospital management from doctors’ income is proving to be the most difficult element of the
current HSR (Yip et al. 2012).

Health Services Delivery and Outcomes

Primary Care and Public Health

As reviewed elsewhere (Hipgrave 2011a), public health services in China suffered badly under the
marketization of the 1980s and 1990s. CDC in particular was weak, culminating in the SARS
epidemic in 2003. Public funding for preventive health services fell dramatically and was
insufficient to even cover salaries. Public health authorities were left to raise their own
income through charging fees for services, including vaccination (for which fees were only
completely dropped in 2007) and various inspections and screening. Community approaches to
disease control were abandoned in favor of vertical programs reliant on national or external
funding, and disease surveillance was poor.
SARS and health authorities’ realization of the epidemic of NCDs due to ageing, urbanization, and
decreasingly active lifestyles has led to major changes to public health programming in China.
Disease surveillance is now conducted online, in real time, and funding for CDC and preventive
health has increased dramatically. New vaccines were introduced in 2008, although globally
recommended vaccines against Haemophilus influenzae type B, pneumococci, human papilloma viruses,
and rotaviruses are only available privately (ironically, through government providers).
The largest boost to public health came with the 2009 HSR, when government introduced a
minimum 15 renminbi (RMB)/capita subsidy for public health/screening activities to be conducted
across the nation. This had been pre-dated by various vertical preventive health programs, such
as funding of hepatitis B vaccine since 2002 (Cui et al. 2007) and national funding of the EPI
since 2007. The HSR public health funding is provided by a mix of national and local authorities
according to their ability to pay (problematic for poor counties in rich provinces) and the
RMB15 was increased to RMB25 in 2011; it is much higher in wealthy areas. The funds pay providers
to conduct the following services, notionally free of charge: (1) maintenance of individual
electronic health records, (2) health education, (3) vaccination, (4) infectious diseases’
prevention and treatment, (5) screening and management of chronic diseases such as hypertension
and diabetes, (6) mental healthcare, (7) child healthcare, (8) pregnancy and maternity care,
and (9) healthcare for the aged.
For the elderly and those with chronic diseases, this kind of screening, along with the
introduction of zero markup and full reimbursement for drug treatment of NCDs (Yang et al.
2013a), has made a huge difference to their care. However, rollout of this initiative is slow,
and although most targets are being met (Yip et al. 2012), monitoring is hampered by the absence
of local denominators. Moreover, some of the programs, such as management of mental illness,
have not been founded upon a training program for staff ill-equipped to provide them. In
addition, unpublished evidence gathered by UNICEF in 2010 suggests that some of the funds are
being used as salary supplements to support the new responsibilities of village doctors (in
public health and other programs) and that the volume of money allocated to some rural
localities is actually too high, due to out-migration to cities. Meanwhile, the increasing
proportion of China’s population living in urban areas, including most rural-urban migrants,
cannot access such services.
Another boost to public health came with the MoH’s program, also introduced in 2009, to
prioritize interventions for certain vulnerable populations. These include: (1) catch-up
hepatitis B vaccination for those aged <15 years; (2) cervical and breast cancer screening for
women in rural areas; (3) an expansion of the hospital delivery subsidies first introduced in
2000, to cover women in all rural counties; (4) free cataract surgery for the poor; (5) free
folic acid supplementation for rural women before and during pregnancy; (6) improved stoves and
fuel to reduce fluorosis; and (7) introduction of eco-friendly toilets. Again, targets for
introduction of these measures have been set and rollout is proceeding (Yip et al. 2012).
Finally, although firm evidence of impact is scant, local authorities in most Chinese cities
have introduced public education and health literacy programs to enhance awareness on issues like
diet, exercise, cigarette smoking, appropriate care of women before and during pregnancy, infants
and young children, and the elderly. As usual, implementation of national guidelines on such
activities depends on uptake and funding by other sectors and local authorities. The regular
occurrence of outbreaks of food (Xinhua 2011) and environmental contamination (Human Rights
Watch 2011) and other scandals with public health implications indicates the difficulty faced by
national authorities in China’s decentralized context.

Clinical Services

Recent high-profile summaries of China’s health system tend to focus on its administration and
financing and neglect the considerable improvements in clinical care available to the local
population. While standards at all levels of the service hierarchy vary very widely, health
authorities have augmented the care available at virtually all public facilities across the
nation. Moreover, access to services has improved for all the population, albeit at high cost to
both government and individuals (Meng et al. 2012).
Clinical services in China are conducted through a hierarchically arranged network of facilities
ranging from tertiary referral centers in the large cities (most having high-quality diagnostic
and laboratory equipment) to second-tier hospitals at county and district level. Rural townships
and urban communities are served by clinics or hospitals with varying capacity for inpatient care
and surgery. At village or neighborhood level, public or (mostly) private facilities provide
basic outpatient care, usually with an attached dispensary and possibly with links to a
laboratory or radiology service. Concern about the standard of care provided by local facilities
has resulted in many patients self-referring to higher-level facilities and hospitals (Table 4).
As a result, hospitals in China tend to provide care for all levels of illness, resulting in
inefficiency and overcrowding. Expenditure on hospital-based care as a proportion of THE in
China far exceeds that in many OECD nations (Barber et al. 2014), resulting in the high priority
given to improving primary care, community general practice, and lower-level facilities in the
HSR (Yip et al. 2012; Ministry of Health 2012a) and to moving outpatient care in particular from
hospitals to primary care facilities (Barber et al. 2014).

Table 4 Number of outpatient visits and inpatients in health institutions in China in 2011

Health institution type                          Total visits                 Total inpatients
                                                 (100 million person-times)   (10,000 persons)
                                                 (n = 62.7)                   (n = 15,298)
Hospitals, n (%)                                 22.6 (36)                    10,755 (70.3)
  General-acute hospitals                        16.74                        8431
  Hospitals specialized in TCM                   3.61                         1349
  Specialty hospitals                            1.88                         844
  Sanitaria                                      0.05                         98
Community health institutions, n (%)             38.05 (60.7)                 3775 (24.7)
  Health centers                                 8.8                          3472
  Urban health centers                           0.11                         23
  Rural township hospitals                       8.7                          3449
  Outpatient department                          0.7                          13
  Clinics, health centers, and nurse stations    5.2
MCH centers (stations), n (%)                    1.76 (2.8)                   682 (4.28)
Specialized disease prevention and               0.2                          38
  treatment institutes
Source: China Health Statistical Yearbook 2012 (Ministry of Health 2012b)

As would be expected for a nation of this size and variation, clinical services in China vary
widely, from the world-class care available to residents in Shanghai, Beijing, Guangzhou, and
similar cities to the most basic care in rural clinics in far western China. Similarly, models
for the care of chronic illness and the use of day-care and hospital in the home vary widely, but
in general these options are not yet well developed in China. The average length of inpatient
stay is high in China compared to OECD nations (Meng et al. 2012), particularly in public
hospitals, which account for 89% of total beds and 92% of hospital admissions (Barber et al.
2014). Clinicians at community level have usually had training in TCM and many practice both
western medicine and Chinese medicine.
However, the preparedness of clinicians in primary care for the wide range of conditions they
treat varies widely. For example, China’s current HSR acknowledges that the system’s clinical
focus has been ill-suited to the screening and outpatient care of chronic illness, an increasing
priority as rates of noncommunicable diseases rise (The World Bank Human Development Unit 2011).
Similarly, the high-volume model of clinical care in China is poorly suited to the management of
mental illness (Qin et al. 2008), aged care and dementia, and prevention of tobacco-related
illness and alcohol consumption, all of which are needed in China (The World Bank Human
Development Unit 2011; Phillips et al. 2009; Yang et al. 2013b; Zhou et al. 2011; Chan et al.
2013).
With respect to quality of care, in the last decade China has moved to standardize many clinical
pathways and practices, and the concept of evidence-based medicine is increasing. However,
attention to such standards and their influence on clinical care is perceived to be low (Yang
and Fan 2012). Moreover, funding for and the quality and independence of clinical research,
access to information, and the ability of clinicians to practice independent of the profit
motive are
major obstacles to the use of evidence-based guidelines in clinical care in China (Barber et al.
2014; Wang 2010).

Pharmaceutical Care

China’s pharmaceutical sector has been one of the most problematic for health authorities over
recent decades and the focus of major reform efforts in the last few years. In 2008, 42.7% of
China’s THE was on drugs (Hu 2010), compared to 17% in developed nations (Seiter et al. 2010).
Excessive drug prescription was common in rural China (Zhan et al. 1998; Pavin et al. 2003; Dong
et al. 2008; Chen et al. 2010; Yu et al. 2010), and there is evidence that China’s rural health
insurance scheme was encouraging overprescription (Chen et al. 2010; Sun et al. 2009). Drug
sales continue to provide the largest income source for China’s county health facilities;
doctors have a pecuniary incentive to prescribe more, and more expensive, drugs (Chen et al.
2010; Yu et al. 2010). Hospitals and doctors profit significantly from the sale of drugs (Yu
et al. 2010; The World Bank Group East Asia Pacific Region 2010), affecting financial access to
healthcare (Tang et al. 2008; Meng et al. 2012). Weak regulation of drug manufacture and
distribution raises safety concerns (Yu et al. 2010; Guan et al. 2011).
Previous efforts to improve the pharmaceutical sector had limited effect. The impact of laws,
decrees, and 24 separate price reductions over 1996–2007 was constrained by hospital
financing/income generation, market influences, and patient preferences (Chen et al. 2010; Yu
et al. 2010). Price controls were undermined by manufacturers, wholesalers, and retailers and by
hospitals and physicians controlling the prescription of price-controlled drugs (Hu 2010; Yu
et al. 2010; Chen and Schweitzer 2008). New drug approvals were issued at astonishing rates (Ho
and Gostin 2009) and the former head of the national drug administration authority was executed
in 2007 for accepting bribes. Kickbacks and corruption continue to mar the sector (Yip et al.
2012; Yang and Fan 2012).
Acknowledging these problems, China’s HSR included establishment of a National Essential
Medicines Scheme (NEMS) to improve population access to and reduce the cost of essential
medicines (State Council 2009), particularly at grassroots (township and village) level. The
Scheme covers drug production, pricing, distribution, procurement, prescribing, and payment (Hu
2010) and a new National Essential Drugs List (NEDL) for primary healthcare institutions. The
2012 NEDL comprises 317 western drugs and 203 TCM commodities (increased from 205 western drugs
and 102 TCM commodities in 2009) for storage and use by grassroots facilities. Bidding prices for
296 NEDL drugs were capped (Schatz and Nowlin 2010), and a “zero markup” (no profit) policy was
introduced, although markups remain allowed at county-level and higher facilities. By late
January 2012, 99.8% of township hospitals and 58.1% of village clinics had implemented the policy
(Ministry of Health 2012d). In addition, most (urban) districts and (rural) counties had made
NEDL medicines reimbursable by the various health insurance schemes, with higher reimbursement
rates than for nonessential medicines (Ministry of Health 2011c). Finally, to regulate the
pharmaceutical market and distribution of essential drugs, the NEMS introduced province-wise,
collective, internet-based public bidding and procurement for NEDL medicines.
These four elements – the NEDL, zero markup, reimbursement of certain drug costs by insurers, and
public procurement – were designed by the government to wrest control of the public
pharmaceutical sector from the private sector. However, the official HSR documents encourage
local adaptation of the broad design (Ho 2010), including the NEDL (which has indeed been widely
augmented (Guan et al. 2011; Shi et al. 2011)) and strategies to compensate providers for the
zero markup policy. Few evaluations of the impact of the Scheme have emerged. Very early
indications suggested little change in prescribing practices (Yip et al. 2012), but a small
field evaluation found that while drug procurement has been systematized and the cost of care
had declined coincident with reduced drug prices, manufacturers have not
uniformly supported the changes, and some drug prices have actually increased. Provider
compensation for reduced income was mostly ineffective, forcing some to seek alternative sources
of income within and outside the health sector. Rational drug prescribing had improved in this
study. The loss of drug income had forced health facilities to rely more on public financing,
and providers complained of higher workload and lower incomes (Yang et al. 2013a). Similar
issues were found in another study in different locations (Xiao et al. 2013).
The NEMS particularly impacts small rural health facilities and will again rely on considerable
local support for its implementation. Meanwhile, provinces are continuing to augment even a
revised version of the NEDL (Tang et al. 2014a), and zero markup has not yet been applied in
county or higher-level facilities. While insurance reimbursement and capitation may help to
improve prescribing practices and reduce patient outlays, more control of procurement,
manufacturer, and prescriber practices is required. The recently announced reforms of county
hospital funding and administration include a major focus on drug procurement, prescription,
management, and pricing (State Council 2014).

Private Healthcare

As a consequence of the marketization of China’s health sector in the 1990s, provision of health
services was opened significantly to private providers. Private providers increased rapidly in
number and now comprise a significant proportion of the market. For example, in 2005, private
hospitals accounted for only 17.2% of total hospitals, but the share had increased to 38.4% by
2011. In 2011 among all 954,389 health facilities (hospitals, clinics, and other institutions),
47% operated as “private” entities. Reports indicated that private health providers can offer
services at a cheaper price and shorter physical distance and waiting time for patients (Deng
et al. 2013) and are highly active in the provision of healthcare in China. However, most
private facilities are small and poorly equipped, and collectively they only employed 17.5% of
the total labor force, owned 9.7% of total medical beds, and received 9.1% of total patient
hospital visits (Ministry of Health 2012b). Compared with public facilities, a large percentage
of elderly physicians and new laborers in the health market are practicing in private clinics
(Tang et al. 2014b). This staffing structure could have a negative impact on quality of services.
In general, despite rapid development in recent years, private health services are at an early
stage of development in China. One major reason is that the evolution and current standing of
national policy generally still favors public providers in terms of resource allocation,
stewardship (entry and registration control), opportunities for promotion, and social insurance
entitlements. This accounts for common challenges in the private sector, i.e., lack of technical
capacity, poor infrastructure, and thus compromised service quality. Health authorities are now
promoting a robust private sector to encourage competition and efficiency within the health
sector, aiming for 20% of beds and services to be privately provided by 2015. However,
subsidization of grassroots-level public institutions may prevent moves in this direction.

Health Outcomes

While China’s progress on major health indicators during the 30 years immediately following the
foundation of the PRC is unparalleled (Jamison et al. 1984), marketization and the
unaffordability of healthcare for a large proportion of the population stymied progress in the
1980s and 1990s. There are even suggestions that child mortality rates in China actually rose in
the 1980s (Banister and Hill 2000), with the breakup of the commune-based health cooperatives.
Moreover, improvement in certain indicators has been slow. For example, urban maternal mortality
has been slow to fall, almost certainly because reductions in maternity risk for urban residents
have been diluted by the much higher risk of death in pregnancy among urban migrants (Fig. 12)
(Zhang et al. 2014). Geographic disparities also remain great, particularly between eastern and
western
provinces (Wang et al. 2012). In general, the priority given to China’s recent HSR acknowledges
that progress in its population’s health status was less than could have occurred, given the
nation’s economic growth since the 1980s (Yip et al. 2012). An acknowledgement of this is the
government target of a one-year increase in life expectancy by 2015 (Ministry of Health 2012a).
The most comprehensive analysis of the causes of death and disability in China, published in
mid-2013, highlighted the dramatic evolution of its demographic transition, with NCDs now making
up all but two of the top 30 causes of lost life years, and most infectious diseases having
fallen precipitously. The report also noted the contribution of air and household pollution to
mortality and morbidity and the need for cross-sectoral action to tackle the major causes of
ill-health in China (Yang et al. 2013b).

[Figure 12: line chart with Urban, Rural, and Total series]
Fig. 12 Maternal mortality per 100,000 live births by urban-rural location (Source: China Health
Statistics Year Book (Ministry of Health 2012b) and NHFPC (China National Health and Family
Planning Commission 2012))

Nonetheless, in 2010, average life expectancy in China was 74.8 years, and in 2012 the maternal
mortality ratio was 24.5/100,000 live births, infant mortality rate 10.3‰, and under-five
mortality rate 13.2‰ (China National Health and Family Planning Commission 2012). These figures
compare favorably with other developing countries, and China’s performance in reducing rural
maternal and neonatal mortality has been outstanding (Feng et al. 2010b, 2011). China has already
achieved all the health targets in MDGs 4, 5, and 6 and achieved the target on reducing child
underweight in the early 2000s. Urban-rural disparity in under-five and particularly maternal
mortality has declined since 1990, but remains high for child underweight and stunting and
especially for child micronutrient deficiency (UNICEF China, unpublished data; Hipgrave et al.
2014).
Challenges to population health status have been alluded to already and include the rise of
NCDs, especially smoking-related illness (The World Bank Human Development Unit 2011), illness
due to environmental damage and air pollution (The World Bank Human Development Unit 2011;
Millman et al. 2008), urbanization, and the provision of services for newly arrived migrants
(Gong et al. 2012). The prevention of accidents and injury will also play an increasing role in
maintaining China’s trajectory on reducing preventable death and ill-health (Wang et al. 2008b).
As the population ages,
private and institutional care of the elderly is another major issue for China’s health and
other social sectors.

Assessment

China’s progress in maternal and child health, urban health, and communicable disease control is
very encouraging, but the nation’s health system now faces a vastly different range of issues
from those it faced before.
In addition to health insurance reforms that commenced in 2003, in many ways the comprehensive
health system reforms announced in 2009 have been highly successful. Insurance coverage is
almost universal, and the benefit package is gradually expanding, even for outpatient services,
although a system for ensuring coverage for the huge population of rural-urban migrants remains
under development. Introduction of public health screening and management, building of new
health infrastructure and expansion of community-based services, measures to control
profiteering from the sales of drugs, scale-up training of health personnel, and other measures
were needed and are being implemented. On the other hand, the reform of hospital management and
financing remains at the pilot stage, with suggestions but no formal guidance on the model to be
followed.
China’s HSR is encouragingly specific but not prescriptive on strategy. Monitoring the reform
remains predominantly output-based at macro-level; no detailed independent assessments have been
undertaken, and population-level studies of health outcomes related to the reforms have not been
undertaken. Moreover, mechanisms to incorporate patient feedback into health service provision
have not been established and may be ignored if local economic, political, or vested interests
override such input, as has been observed in relation to China’s natural environment. Public
financing of the health sector, although modest by global standards, has improved, particularly
in relation to the proportion of THE that is out of pocket. But costs are rising faster than
government inputs, and poorer constituencies remain least able to fund public services, despite
having the greatest needs. As a result, proportional household expenditure on healthcare has not
declined.
Urban residents of China’s industrialized eastern provinces enjoy a high quality of healthcare
and access to trained personnel. This is not the case for poorer rural residents, particularly
in the nation’s vast western region. The official engagement of village doctors to provide
publicly funded health services in rural areas should improve the standard of and public
confidence in their care, but the burden on this ageing cadre of staff is rising and may be
untenable; again, accountability for this national initiative will be to local government and
health officials unused to the application of treatment algorithms, performance-based
assessment, and clinical audit. Concern about the care provided by community providers continues
to result in many patients self-referring to higher-level facilities and hospitals.
Population health in China is threatened by the rise of NCDs, especially illness due to
diabetes, cardiovascular disease, overweight, tobacco smoking, environmental damage, and air
pollution. The prevention of accidents and injury and management of mental illness will also
play an increasing role in maintaining China’s trajectory on reducing preventable death and
ill-health. The required focus of the health sector on chronic illnesses, aged care, and
outpatient services requires a dramatic increase in the engagement and stewardship of community
providers.
This has been a major focus of China’s health reforms, now well into their second phase, and it
is likely that further major policy and financial inputs will be announced before this phase
concludes in 2015. The private sector will play an increasing role in the provision of health
services in China, but a higher level of stewardship and the use of financial mechanisms to rein
in escalating costs will almost certainly be required, especially for hospital care. To ensure
consistency and transferability, this may involve stronger oversight by and involvement of
national health policy and financing authorities, notwithstanding the power vested in
subnational authorities in China’s system of government.
References

Anand S, Fan VY, Zhang J, Zhang L, Ke Y, Dong Z, et al. China’s human resources for health: quantity, quality, and distribution. Lancet. 2008;372:1774–81.
Anonymous. False data China’s ‘biggest source of corruption’: statistics chief. Want China Times. 10 Apr 2012.
Bai CE, Wu B. Health insurance and consumption: evidence from China’s New Cooperative Medical Scheme. J Comp Econ. 2014;42:450–69.
Banister J, Hill K. Mortality in China 1964–2000. Popul Stud. 2000;58(1):55–75.
Barber SL, Borowitz M, Bekedam H, Ma J. The hospital of the future in China: China’s reform of public hospitals and trends from industrialized countries. Health Policy Plan. 2014;29(3):367–78. https://doi.org/10.1093/heapol/czt023.
Bien C. The barefoot doctors: China’s rural health care revolution, 1968–1981. [Departmental Honors]: Wesleyan University; 2008.
Bloom G. Building institutions for an effective health system: lessons from China’s experience with rural health reform. Soc Sci Med. 2011;72:1302–9.
Bloom G, Xingyuan G. Health sector reform: lessons from China. Soc Sci Med. 1997;45(3):351–60. Epub 1 Aug 1997.
Blumenthal D, Hsiao W. Privatization and its discontents – the evolving Chinese health care system. N Engl J Med. 2005;353(11):1165–70. Epub 16 Sept 2005.
Brixi H, Mu Y, Targa B, Hipgrave D. Engaging sub-national governments in addressing health equities: challenges and opportunities in China’s health system reform. Health Policy Plan. 2012. https://doi.org/10.1093/heapol/czs120.
Cai Y. An assessment of China’s fertility level using the variable-r method. Demography. 2008;45(2):271–81. Epub 11 July 2008.
Chan KY, Wang W, Wu JJ, Liu L, Theodoratou E, Car J, et al. Epidemiology of Alzheimer’s disease and other forms of dementia in China, 1990–2010: a systematic review and analysis. Lancet. 2013;381(9882):2016–23. Epub 12 June 2013.
Chen Y, Schweitzer SO. Issues in drug pricing, reimbursement, and access in China with reference to other Asia-Pacific Region. Value Health. 2008;11(Suppl 1):S124–S9.
Chen W, Tang S, Sun J, Ross-Degnan D, Wagner AK. Availability and use of essential medicines in China: manufacturing, supply, and prescribing in Shandong and Gansu provinces. BMC Health Serv Res. 2010;10:211.
China National Health and Family Planning Commission. Statistical bulletin on national health and family planning development. National Health and Family Planning Commission of the PRC, Beijing; 2012.
China National Health Development Research Centre. China National Health Account 2011 (in Chinese). Beijing: China National Health Development Research Centre; 2012.
China National Health Development Research Centre. China National Health Account 2012 (in Chinese). Beijing: China National Health Development Research Centre; 2013.
China News Network. Ministry of Health signs “military-style order” on health reforms to improve poor accountability (in Chinese). China News Network [Internet]. 2010. Available at: http://www.chinanews.com/jk/jk-ylgg/news/2010/05-24/2299821.shtml. Last viewed 24 Oct 2014.
Cui FQ, Wang XJ, Cao L. Progress in hepatitis B prevention through universal infant immunization – China, 1997–2006. MMWR Morb Mortal Wkly Rep. 2007;56(18):441–5.
Deng G, Dou C, Gong Q. Ownership, fees and service quality by health providers. Econ Rev (Jingji Pinglun) (in Chinese). 2013;1:121–130.
Di Martino K. China: ensuring equal access to education and healthcare for children of internal migrants. In: Bhabha J, editor. Children without a state: a global human rights challenge. Cambridge, MA: MIT Press; 2011.
Dong L, Yan H, Wang D. Antibiotic prescribing patterns in village health clinics across 10 provinces of Western China. J Antimicrob Chemother. 2008;62(2):410–5.
Eggleston K, Ling L, Qingyue M, Lindelow M, Wagstaff A. Health service delivery in China: a literature review. Health Econ. 2008;17(2):149–65.
Feltenstein A, Iwata S. Decentralization and macroeconomic performance in China: regional autonomy has its costs. J Dev Econ. 2005;76(2):481–501.
Feng XL, Shi G, Wang Y, Xu L, Luo H, Shen J, et al. An impact evaluation of the Safe Motherhood Program in China. Health Econ. 2010a;19(Suppl):69–94. Epub 14 Sept 2010.
Feng XL, Zhu J, Zhang L, Song L, Hipgrave D, Guo S, et al. Socio-economic disparities in maternal mortality in China between 1996 and 2006. BJOG. 2010b;117(12):1527–36.
Feng X, Guo S, Hipgrave D, Zhu J, Zhang L, Song L, et al. China’s facility-based birth strategy and neonatal mortality: a population-based epidemiological study. Lancet. 2011;378:1493–500.
Gong P, Liang S, Carlton EJ, Jiang Q, Wu J, Wang L, et al. Urbanisation and health in China. Lancet. 2012;379(9818):843–52. Epub 6 Mar 2012.
Gottret PE, Schieber G. Health financing revisited: a practitioner’s guide. Washington, DC: The World Bank; 2006.
Government of China. State Council Guidance on further strengthening the ranks of rural doctors (in Chinese). 2011. Available at: http://www.gov.cn/zwgk/2011-07/14/content_1906244.htm. Last viewed 24 Oct 2014.
Guan X, Liang H, Xue Y, Shi L. An analysis of China’s national essential medicines policy. J Public Health Policy. 2011;32(3):305–19. Epub 27 May 2011.
Hipgrave D. Communicable disease control in China: from Mao to now. J Glob Health. 2011a;1(2):223–37.
Hipgrave D. Perspectives on the progress of China’s Millman A, Tang D, Perera FP. Air pollution threatens the
2009 – 2012 health system reform. J Glob Health. health of children in China. Pediatrics. 2008;122
2011b;1(2):142–7. Epub 1 Dec 2012. (3):620–8. Epub 3 Sept 2008.
Hipgrave D, Hort K. Will current health reforms in south Ministry of Health. The National Health Statistics
and east Asia improve equity? Med J Aust. 2014;200 reporting system (in Chinese). Beijing: Chinese Acad-
(9):514. emy of Medical Science; 2007.
Hipgrave D, Guo S, Mu Y, Guo Y, Yan F, Scherpbier RW, Ministry of Health. Report on women and children’s health
et al. Chinese-style decentralization and health system development in China. Beijing: China Ministry of
reform. PLoS Med. 2012;9(11):1–4. Health; 2011a.
Hipgrave DB, Fu X, Zhou H, Jin Y, Wang X, Chang S, Ministry of Health. China’s Minister of Health: rural doc-
et al. Poor complementary feeding practices and high tors will continue to serve the masses indefinitely
anaemia prevalence among infants and young children (in Chinese). 2011b. Available at: http://www.gov.cn/
in rural central and western China. Eur J Clin Nutr. gzdt/2011-02/18/content_1805889.htm. Last viewed
2014;68:916. 24 Oct 2014.
Ho CS. Health reform and de facto federalism in China. Ministry of Health. China 2010 health statistical yearbook.
China Int J. 2010;8:33–62. Beijing: China Ministry of Health; 2011c.
Ho CS, Gostin LO. The social face of economic growth: Ministry of Health. China’s State Council announcement
China’s health system in transition. JAMA. 2009;301 on deepening medical and health system planning and
(17):1809–11. Epub 7 May 2009. implementation of the program during the 12th Five
Hu S. Financing, pricing and utilisation of pharmaceuticals Year Plan. 2012a. Available at: http://www.wpro.who.
in China: the road to reform. Beijing: The World Bank int/health_services/china_nationalhealthplan.pdf. Last
East Asia and Pacific Region; 2010. Contract No.: viewed 24 Oct 2014.
58410. Ministry of Health. China health statistics yearbook. Bei-
Hu G, Baker T, Baker SP. Comparing road traffic mortality jing: Chinese Academy of Medical Science; 2012b.
rates from police-reported data and death registration Ministry of Health. Three years of significant progress in
data in China. Bull World Health Organ. 2011;89 health reform (in Chinese). 2012c. Formerly available
(1):41–5. Epub 25 Feb 2011. at: http://www.moh.gov.cn/publicfiles/business/htmlfiles/
Huang YZ. The sick man of Asia. Foreign Aff. mohbgt/s3582/201201/53883.htm. Last viewed 20 Aug
2011;90:119–36. 2012 – MoH website now deleted.
Human Rights Watch. “My children have been poisoned”: Ministry of Health. Health statistical monthly reports. Bei-
a public health crisis in four Chinese provinces. jing: Ministry of Health; 2012d.
New York: Human Rights Watch; 2011. Ministry of Health Centre for Health Statistics and Infor-
Jamison DT, Evans JR, King T, Porter I, Prescott N, Prost mation. An analysis report of the fourth national health
A. China: the health sector. Washington, DC: The services survey in China in 2008. Beijing: China Union
World Bank; 1984. Medical University Press; 2009.
Kaiman J. Chinese statistics bureau accuses county of Mulholland K, Temple B. Causes of death in children
faking economic data. The Guardian. 7 Sept 2013. younger than 5 years in China in 2008. Lancet.
Li L, Chen Q-L. A rational evaluation of China’s health 2010;376(9735):89.
sector reform over three years. Health Econ Res. National Bureau of Statistics. China statistical yearbook
2012;5:7–12. (in Chinese). Beijing: National Bureau of Statistics;
Liu Y. China’s public health-care system: facing the chal- 2011. Available at: http://www.stats.gov.cn/tjsj/ndsj/
lenges. Bull World Health Organ. 2004;82(7):532–8. 2011/indexch.htm. Last viewed 24 Oct 2014.
Epub 27 Oct 2004. National Bureau of Statistics. Tabulation of the 2010 pop-
Liu MD. Sub-provincial intergovernmental fiscal transfers. ulation census of People’s Republic of China. Beijing:
2006 Annual China Fiscal Reform Forum. Beijing: China Statistics Press; 2012.
UNDP; 2007. National Bureau of Statistics. China statistical yearbooks.
Liu Y, Rao K, Hsiao WC. Medical expenditure and rural Beijing: Published annually; 2016.
impoverishment in China. J Health Popul Nutr. OECD. Health at a Glance: OECD Indicators, OECD Pub-
2003;21(3):216–22. Epub 14 Jan 2004. lishing. 2013. https://doi.org/10.1787/health_glance-
Ma XVC. National Commission for Health and Family 2013-en
Planning. Quoted comments given at press conference Osnos E. China’s censored world. New York Times. 2 May
during 13th National People Congress 2013 2014.
(in Chinese). 2013. Available at: http://news.sina.com. Pavin M, Nurgozhin T, Hafner G, Yusufy F, Laing
cn/c/2013-03-15/035926536113.shtml. Last viewed R. Prescribing practices of rural primary health care
24 Oct 2014. physicians in Uzbekistan. Trop Med Int Health.
Meng Q, Xu L, Zhang Y, Qian J, Cai M, Xin Y, et al. Trends 2003;8(2):182–90.
in access to health services and financial protection in Phillips MR, Zhang J, Shi Q, Song Z, Ding Z, Pang S,
China between 2003 and 2011: a cross-sectional study. et al. Prevalence, treatment, and associated disability
Lancet. 2012;379(9818):805–14. of mental disorders in four provinces in China during
2001–05: an epidemiological survey. Lancet. rising tide of non-communicable diseases. Washington,

2009;373(9680):2041–53. Epub 16 June 2009. DC: The World Bank; 2011.
Qin X, Wang W, Jin Q, Ai L, Li Y, Dong G, et al. Preva- The World Health Organization. The world health report
lence and rates of recognition of depressive disorders in 2000: health systems: improving performance. Geneva:
internal medicine outpatient departments of 23 general World Health Organization; 2000.
hospitals in Shenyang, China. J Affect Disord. Tian Y, Hua LJ, Chao WM. Chinese doctors’ salaries.
2008;110(1–2):46–54. Epub 12 Feb 2008. Lancet. 2008;371:1577.
Rudan I, Chan KY, Zhang JS, Theodoratou E, Feng XL, Uchimura H, Jütting, J. Fiscal decentralization, Chinese
Salomon JA, et al. Causes of deaths in children younger style: good for health outcome? OECD Development
than 5 years in China in 2008. Lancet. 2010;375 Centre Working Paper #264. Paris: OECD; 2007.
(9720):1083–9. Epub 30 Mar 2010. UNDP China, China Institute for Reform and Develop-
Schatz G, Nowlin P. Drugs for the masses. China Bus Rev. ment. China national human development report 2007/
2010;2010;22–5. 2008: access for all: basic public services to benefit 1.3
Seiter A, Wang H, Zhang S. A generic drug policy as a billion people. Beijing: UNDP; 2008.
cornerstone to essential medicines in China. 2010. Walter CE, Howie F. Red capitalism: the fragile financial
Contract No.: 58413. foundation of China’s extraordinary rise. Singapore:
Shi LW, Ma YQ, Xu LP, Zhao DH, Zhang Y. Review of Wiley; 2011.
adjustment of essential medicine list at provincial level Wang J. Evidence-based medicine in China. Lancet.
in China. Value Health. 2011;14(3):A14. 2010;375(9714):532–3. Epub 18 Feb 2010.
State Council. Opinions of the Communist Party of China Wang H, Xu T, Xu J. Factors contributing to high costs and
Central Committee and the State Council on deepening inequality in China’s health care system. JAMA.
the health care system reform. 2009. Available at: 2007;298(16):1928–30. Epub 24 Oct 2007.
http://www.china.org.cn/government/scio-press-confer Wang L, Wang Y, Jin S, Wu Z, Chin DP, Koplan JP, et al.
ences/2009-04/09/content_17575378.htm. Last viewed Emergence and control of infectious diseases in China.
24 Oct 2014. Lancet. 2008a;372(9649):1598–605. Epub 22 Oct
'State Council. Opinions on promoting the comprehensive 2008.
reform of county public hospitals (in Chinese). Wang SY, Li YH, Chi GB, Xiao SY, Ozanne-Smith J,
2014. Available at: http://baike.baidu.com/link?url= Stevenson M, et al. Injury-related fatalities in China:
jPS0SkO7GzuPcuhNHwpFhXEFOaPjpJ8PKVX7_Gz an under-recognised public-health problem. Lancet.
RMhMUYnG_u-THEs_dlop84b77tO3Y2PXMSoXy 2008b;372(9651):1765–73. Epub 22 Oct 2008.
KJ5gLPtyg. Last viewed 5 July 2014. Wang YP, Miao L, Dai L, Zhou GX, He CH, Li XH, et al.
Sun X, Jackson S, Carmichael GA, Sleigh AC. Prescribing Mortality rate for children under 5 years of age in China
behaviour of village doctors under China’s New Coop- from 1996 to 2006. Public Health. 2011;125(5):301–7.
erative Medical Scheme. Soc Sci Med. 2009;68 Epub 29 Apr 2011.
(10):1775–9. Epub 4 Apr 2009. Wang Y, Zhu J, He C, Li X, Miao L, Liang J. Geographical
Tang S, Meng Q, Chen L, Bekedam H, Evans T, Whitehead disparities of infant mortality in rural China. Arch Dis
M. Tackling the challenges to health equity in China. Child Fetal Neonatal Ed. 2012;97(4):F285–90. Epub
Lancet. 2008;372(9648):1493–501. Epub 22 Oct 2008. 17 Jan 2012.
Tang S, Brixi H, Bekedam H. Advancing universal cover- Wong C. Public Sector Reforms toward Building the Har-
age of healthcare in China: translating political will into monious Society in China. Paper prepared for the China
policy and practice. Int J Health Plann Manage. Economic Research and Advisory Programme. Univer-
2014a;29(2):160–74. https://doi.org/10.1002/hpm.2207 sity of Oxford; 2010.
Tang C, Zhang Y, Chen L, Lin Y. The growth of private World Bank. China 2030: building a modern, harmonious,
hospitals and their health workforce in China: a compar- and creative high-income society. Washington, DC:
ison with public hospitals. Health Policy Plan. 2014b; The World Bank; 2012.
29(1):30–41. https://doi.org/10.1093/heapol/czs130 Wu N, Yang HW. Retrospective evaluation of the achieve-
Tangcharoensathien V, Patcharanarumol W, Ir P, Aljunid ments of the implementation of essential medicine sys-
SM, Mukti AG, Akkhavong K, et al. Health-financing tem in medical reform of past three years. China Pharm.
reforms in southeast Asia: challenges in achieving uni- 2013;10(5–6):78–82.
versal coverage. Lancet. 2011;377(9768):863–73. Xiao Y, Zhao K, Bishai DM, Peters DH. Essential drugs
Epub 29 Jan 2011. policy in three rural counties in China: what does a
The World Bank. World development report 2004: making complexity lens add? Soc Sci Med. 2013;93:220–8.
services work for the poor people. Washington, DC: https://doi.org/10.1016/j.socscimed.2012.09.034.
The World Bank; 2003. Xing L, Fen S, Luo X, Zhang X. Intra rural income dispar-
The World Bank Group East Asia Pacific Region. Fixing ity in West China. China Econ Q. 2008;1(1):329–50.
the public hospital system in China. Washington, DC; Xinhua. China penalizes 113 over chemical tainted pork.
2010. Contract No.: 58411. China Daily. 26 Nov 2011 10:36.
The World Bank Human Development Unit. Toward a Xu H, Zhang W, Gu L, Qu Z, Sa Z, Zhang X, et al. Aging
healthy and harmonious life in China: stemming the village doctors in five counties in rural China: situation
and implications. Hum Resour Health. 2014;12:36. Zhan SK, Tang SL, Guo YD, Bloom G. Drug prescribing in
Epub 30 June 2014. rural health facilities in China: implications for service
Yang DL. The central-local relations dimension. In: Free- quality and cost. Trop Doct. 1998;28(1):42–8. Epub
man CW, Lu XQ, editors. Implementing health care 3 Mar 1998.
reform policies in China. Washington, DC: Center for Zhang W, Navarro V. Why hasn’t China’s high-profile
Strategic and International Studies; 2011. p. 21–9. health reform (2003–2012) delivered? An analysis of
Yang ZP, Fan DM. How to solve the crisis behind Bribegate its neoliberal roots. Crit Soc Policy. 2014;34:175–98.
for Chinese doctors. Lancet. 2012;379(9812):e13–5. Zhang J, Zhang X, Qiu L, Zhang R, Hipgrave D, Wang Y,
Yang L, Cui Y, Guo S, Brant P, Li B, Hipgrave et al. Maternal deaths among rural-urban migrants in
D. Evaluation, in three provinces, of the introduction China: a case-control study. BMC Public Health.
and impact of China’s National Essential Medicines 2014;14:512.
Scheme. Bull World Health Organ. 2013a;91:184–94. Zheng M, Fu Q, Wang X. Comparative study on structural
Yang G, Wang Y, Zeng Y, Gao GF, Liang X, Zhou M, et al. changes in income disparities in urban households in
Rapid health transition in China, 1990–2010: findings Chongqing Municipality, Shanghai Municipality and
from the Global Burden of Disease Study 2010. Lancet. Sichuan Province. J Reform Strategy. 2008;5:98–101.
2013b;381(9882):1987–2015. Epub 12 June 2013. Zhou LA. Reforming China’s local government gover-
Yip WC-M, Hsiao WC, Chen W, Hu S, Ma J, Maynard nance. In: Incentives and governance: China’s local
A. Early appraisal of China’s huge and complex health- governments. Singapore: Cengage Learning Asia Pte.
care reforms. Lancet. 2012;379(9818):833–42. Ltd.; 2010a.
Youlong G, Wilkes A, Bloom G. Health human resource Zhou LA. Incentives and governance: China’s local gov-
development in rural China. Health Policy Plan. ernments. Singapore: Cengage Learning Asia Pte. Ltd.;
1997;12(4):320–8. Epub 3 Nov 1997. 2010b.
Yu X, Li C, Shi Y, Yu M. Pharmaceutical supply chain in Zhou L, Conner KR, Caine ED, Xiao S, Xu L, Gong Y,
China: current issues and implications for health sys- et al. Epidemiology of alcohol use in rural men in two
tem reform. Health Policy. 2010;97(1):8–15. Epub provinces of China. J Stud Alcohol Drugs. 2011;72
24 Mar 2010. (2):333–40. Epub 11 Mar 2011.
Health System in Egypt
34
Christian A. Gericke, Kaylee Britain, Mahmoud Elmahdawy,
and Gihan Elsisi
Contents
Introduction
Overview
Historical Background Until 2011
Public System
Private System
Information Systems
Financing
Overview
Expenditure
External Sources of Financing
Insurance Coverage
Health Payments
Paying Health Workers
C. A. Gericke (*)
Anton Breinl Centre for Health Systems Strengthening,
James Cook University, Cairns, Australia
University of Queensland School of Public Health,
Brisbane, Australia
e-mail: c.gericke@uq.edu.au
K. Britain
University of Queensland School of Public Health,
Brisbane, Australia
e-mail: kayleebritain@gmail.com
M. Elmahdawy
Ministry of Health, Cairo, Egypt
e-mail: mahmoud77@yahoo.com
G. Elsisi
Ministry of Health, Cairo, Egypt
Faculty of Pharmacy, Heliopolis University, Cairo, Egypt
e-mail: gihanhamdyelsisi@hotmail.com

https://doi.org/10.1007/978-1-4939-8715-3_43

Physical Resources
Human Resources
Provision of Services
Overview
Inpatient Care
Outpatient Care
Mental Health Care
Pharmaceuticals
The Arab Spring Revolution
Reforms
Overview
Past Reforms
Proposed Plans
Assessment
References
Abstract

With over 95 million inhabitants, Egypt is the second most populous country in the Middle East and North Africa. Poverty has nearly doubled over the last 15 years. Egypt has a very young population, and youth unemployment has become a major societal issue.
Egypt’s health-care system is pluralistic, combining both public and private providers and financers. The largest public health-care payers are the Health Insurance Organization (HIO) and the Curative Care Organization (CCO). HIO covers 60% of the population and provides basic coverage to employees, students, and widows through its own hospitals and clinics. CCO contracts with individuals and companies to provide inpatient and outpatient care that was developed through the privatization of Egypt’s health-care providers over the last two decades. Although the public system provides basic universal coverage, it is plagued by chronic underfunding, low service quality, and high out-of-pocket payments.
The private sector comprises private hospitals, doctors, and pharmacies, perceived as being of higher quality than public services. Most private services are paid for out-of-pocket; private health insurance is insignificant.
With only 4.75% of GDP spent on health, total health expenditure (THE) in Egypt is low compared to other lower-middle-income countries. Out-of-pocket payments comprise over 60% of THE. Spending on pharmaceuticals is relatively high at over 25% of THE, mostly in the form of out-of-pocket costs. Another problem is the lack of communication between public and private providers.
Widespread public dissatisfaction with basic living conditions spurred the Arab Spring revolution in 2011. Since then, the country has seen sustained political instability and slow economic growth, which have thwarted most long-term plans for health reform. Several reform measures have been publicly discussed, but only a few were implemented, such as the introduction of a pharmacoeconomics unit in the Ministry of Health to curb the disproportionately high spending on pharmaceuticals.
A long-term national strategy is needed to address issues of growing inequalities in financial access to care, the perceived low quality of public services, as well as the growing privatization of health care, which furthers the existing inequalities in access to care.
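As a rough illustration of how the expenditure shares quoted in the abstract combine, the following Python sketch converts shares of THE into shares of GDP. The inputs (4.75% of GDP, over 60% out-of-pocket, over 25% pharmaceuticals) are the figures quoted in the abstract; because the text says "over", the results should be read as lower bounds. The function and variable names are illustrative assumptions and do not come from the chapter.

```python
# Illustrative arithmetic only; input figures are those quoted in the abstract.
THE_PCT_OF_GDP = 4.75     # total health expenditure (THE) as % of GDP
OOP_MIN_SHARE = 0.60      # out-of-pocket payments: "over 60%" of THE
PHARMA_MIN_SHARE = 0.25   # pharmaceutical spending: "over 25%" of THE


def share_of_gdp(the_pct_of_gdp: float, share_of_the: float) -> float:
    """Convert a share of THE into a percentage of GDP."""
    return the_pct_of_gdp * share_of_the


oop_pct_gdp = share_of_gdp(THE_PCT_OF_GDP, OOP_MIN_SHARE)
pharma_pct_gdp = share_of_gdp(THE_PCT_OF_GDP, PHARMA_MIN_SHARE)

print(f"Out-of-pocket payments: at least {oop_pct_gdp:.2f}% of GDP")
print(f"Pharmaceutical spending: at least {pharma_pct_gdp:.2f}% of GDP")
```

Under these assumptions, households pay directly for health care worth at least about 2.85% of GDP, and pharmaceuticals account for at least about 1.19% of GDP.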
Introduction

Egypt is the second most populous country in the World Health Organization’s (WHO) Eastern Mediterranean region (WHO 2010). Egypt’s continued population growth along with increases in urbanization puts much strain on the country’s capacity to provide enough food through agriculture (CIA 2013) and on many publicly provided services including health care. With its location in the northeast corner of Africa, Egypt has been the cultural bridge between the African continent and the Middle East for millennia. The country consists mostly of desert and relies on the limited stretch of fertile land along the river Nile and its branches as its only perennial water source (CIA 2013). The rapid growth in population and urbanization has been a continuing source of health concerns due to the unsanitary, polluted, and overstrained environment (CIA 2013; Anwar 2003). In recent years, with increasing fertility rates along with decreasing infant mortality rates, the largest age group is 0–14 years (Fig. 1; CIA 2013). This implies a highly dependent population impeding economic growth. With a part of this young population moving into the workforce in recent years, youth unemployment has become a major issue.
Egypt is a lower-middle-income country with the majority of its income coming from tourism, remittances from working abroad, the Suez Canal, and oil sales (WHO 2010). Poverty within the country has continued to decrease through recent decades aided by substantial international donor support (Egypt’s progress 2010). Despite this, poverty rates did increase in 2008 to 25.2% as a result of the global economic crisis. However, extreme poverty continued to decline to 4.8% (Egypt’s progress 2010). Most of the poverty is found in rural areas (most notably in Upper Egypt), which account for just over 50% of Egypt’s population (Egypt’s progress 2010). These inequalities have carried over into both health and literacy indicators for the respective areas (Table 1; Egypt’s progress 2010). Along with raising poverty rates, the economic crisis also reduced economic growth rates from 7% to 4.7%, which resulted in one fifth of the Egyptian population now falling into the “near-poor” category (Egypt’s progress 2010).
Politically, Egypt is a constitution-based republic. It has a mixed legal system based on Napoleonic civil law and Islamic religious law, with judicial review by a Supreme Court and Council of State (CIA 2013). The constitution created three separate branches of government: (a) the executive, headed by the president; (b) the legislative, consisting of a People’s Assembly and the Advisory Council; and (c) the judicial branch, with a Supreme Constitutional Court (consisting of the court president and ten members), Court of Cassation, and subordinate courts: Courts of Appeal, Courts of First Instance, Courts of Limited Jurisdiction, and a Family Court (CIA 2013). After independence from British colonialism and the ousting of the last Egyptian King Farouk in 1952 in the revolution led by General Gamal Abdel Nasser, Egypt came under socialist rule in the 1950s and 1960s (Jabbour 2012). After the death of General Nasser in 1970, his vice president and close ally during the revolution Anwar Al-Sadat replaced him and moved Egypt toward a market-based economy. When President Al-Sadat was assassinated in 1981 by Islamist fundamentalists, his vice president General Hosni Mubarak took over the presidency. He continued Sadat’s market-based economic and internationally open political approach. Mubarak served as president for five consecutive terms up until the Arab Spring revolution in 2011 (Jabbour 2012).
The opening of the country to the world economy under presidents Al-Sadat and Mubarak allowed continued economic growth through the years as well as implementation of various economic reforms in order to balance economic inequalities as well as reduce foreign debt (Ministry of Health, Egypt 2010). Unfortunately, economic reforms were accompanied by problematic social effects giving rise to unemployment and poverty. It seems that the Egyptian Social Fund for Development which was instituted in
[Figure: two population pyramids, Egypt 2000 and Egypt 2013, showing male and female population (in millions) by 5-year age group from 0-4 to 100+]
Fig. 1 Population pyramids for Egypt (2000 and 2013). (Source: United States Census Bureau, International Programs 2013)
1991 to counter some of these undesired side effects had some positive impact through its microcredit and community financing initiatives (Abou-Ali et al. 2010). Despite some social improvements and a growing economy, widespread public dissatisfaction with basic living conditions and high levels of poverty remained and spurred the Arab Spring revolution in 2011.
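The "highly dependent population" noted in the Introduction can be made concrete with the age-structure and vital-rate indicators reported in Table 1. The sketch below computes a dependency ratio and a rate of natural increase from those 2013 values. The formulas are the standard demographic definitions rather than calculations taken from the chapter, and all function and variable names are hypothetical.

```python
# Input values are the 2013 indicators listed in Table 1; the rest is illustrative.
PCT_AGE_0_14 = 32.30          # population aged 0-14 (%)
PCT_AGE_65_PLUS = 4.80        # population aged 65 years and over (%)
BIRTH_RATE_PER_1000 = 23.79   # births per 1000 population
DEATH_RATE_PER_1000 = 4.79    # deaths per 1000 population


def dependency_ratio(pct_young: float, pct_old: float) -> float:
    """Dependents (0-14 and 65+) per 100 people of working age (15-64)."""
    pct_working_age = 100.0 - pct_young - pct_old
    return 100.0 * (pct_young + pct_old) / pct_working_age


def natural_increase_pct(births_per_1000: float, deaths_per_1000: float) -> float:
    """Annual natural population increase in percent (migration ignored)."""
    return (births_per_1000 - deaths_per_1000) / 10.0


total_dependency = dependency_ratio(PCT_AGE_0_14, PCT_AGE_65_PLUS)
youth_dependency = dependency_ratio(PCT_AGE_0_14, 0.0) * (
    (100.0 - PCT_AGE_0_14) / (100.0 - PCT_AGE_0_14 - PCT_AGE_65_PLUS)
)  # rescaled so the denominator is again the 15-64 age group
growth_pct = natural_increase_pct(BIRTH_RATE_PER_1000, DEATH_RATE_PER_1000)

print(f"Total dependency ratio: {total_dependency:.1f} per 100 working-age people")
print(f"Youth dependency ratio: {youth_dependency:.1f} per 100 working-age people")
print(f"Natural increase: {growth_pct:.2f}% per year")
```

With these inputs there are roughly 59 dependents per 100 people of working age, almost all of them children, and the population grows by about 1.9% per year before migration is taken into account.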
Table 1 Socioeconomic and demographic indicators for Egypt
Indicator                               Year   Value
Socioeconomic
Total population                        2013   85,294,388
Population living in urban areas (%)    2010   43%
Gross national income per capita        2010   6120
Gross domestic product                  2010   $255 billion
GDP growth rate (%)                     2012   2.00%
Poverty rate (%)                        2010   25.20%
Unemployment rate (%)                   2012   12.50%
Rate of urbanization                    2010   2.10%
Literacy rate males (%)                 2010   80%
Literacy rate females (%)               2010   64%
Demographic
Total fertility rate (per woman)        2013   2.9
Population 0–14 (%)                     2013   32.30%
Population 65 years and over (%)        2013   4.80%
Death rate (per 1000 population)        2013   4.79
Birth rate (per 1000 population)        2013   23.79
Sources: CIA Factbook 2013, WHO 2010, World Bank 2013

The uprising has caused economic growth to slow down (CIA 2013) in the past few years due to the political uncertainty along with a significant reduction in tourism (Haley and Beg 2012). At the same time, the revolution has resulted in increased social spending to address public dissatisfaction and has also led to a reduction in foreign exchange reserves contributing to a rising deficit (CIA 2013).
Overall health in Egypt prior to 2011 had been steadily improving over time with marked increases in life expectancy and decreases in infant mortality rates since 1990 (Table 2). Communicable disease control, in particular for endemic tropical diseases such as schistosomiasis, has also made great improvements during this time; however, diarrheal diseases, acute respiratory infections, and hepatitis are still reported from health facilities (CIA 2013). Compared to other MENA countries, the population percentage of communicable diseases is in fact low, while noncommunicable diseases in Egypt are higher than in other countries in the geographic region (WHO 2013). Of the noncommunicable diseases, as in many other countries, obesity is a growing factor with over 33% of the population being obese as of 2008 (WHO 2013). HIV/AIDS has also been an increasing health issue with over 11,000 known persons with the infection as of 2009. Today the top three diseases causing mortality are essential primary hypertension, intracerebral hemorrhage, and fibrosis/cirrhosis (WHO 2013). In contrast, Egypt has put little emphasis on controlling environmental risks to health and well-being (Anwar 2003; Gericke 2006).

Organization and Governance

Overview

Egypt’s health-care system is pluralistic and complex, combining both public and private providers and financers. The government has committed to provide health care to the poor; however, with a system pluralistic in nature, health-care providers compete, and clients are free to choose services based on their needs along with the ability to pay (WHO-EMRO 2006). Consequently, the health-care system relies upon four financing agents:

• Government sector
• Public sector
• Private organizations
• Household payments (out-of-pocket)

The government sector represents the various ministries and departments of the government financed primarily through the Ministry of Finance (MOF). Other government agents are the Ministry of Health and Population (MOHP), the Ministry of Higher Education, and the Ministries of Interior and Defense. The MOHP is responsible for policy formulation and the regulation of the health sector including public, nongovernmental, and private organizations
Table 2 Health trends in Egypt
Indicators                                            1990       1995       2000       2004       2005   2010   2013
Life expectancy at birth (total)                      65.3 (92)  66.9 (98)  67.1 (01)  70.1 (02)  –      73     73.19
Infant mortality rate                                 63         66         24.5       22.4       20.5   –      23.3
Under five mortality rate (per 1000 live births)      –          3.9 (97)   33.8       28.6       26.3   21     –
Maternal mortality ratio (per 100,000 live births)    174 (92)   96 (98)    84 (01)    68 (02)    63     66     –
Sources: CIA 2013; WHO 2006, 2010
covering over 29 ministries and organizations (WHO-EMRO 2006). The MOHP is also responsible for providing preventative and curative care throughout all of Egypt, making it the largest provider of health-care services in the country (WHO-EMRO 2006).
The public sector comprises financially independent governmental organizations. The largest of these are the Health Insurance Organization (HIO) and the Curative Care Organization (CCO) (Haley and Beg 2012; WHO-EMRO 2006). The HIO is Egypt’s public health insurer, with the goal of providing sustainable and universal coverage to employees, students, and widows through its own hospitals and clinics. The CCO contracts with individuals and companies to provide inpatient and outpatient curative care that was developed through the privatization of Egypt’s health-care system (Ministry of Health, Egypt 2010). The private sector includes private health insurance companies that can be either not-for-profit or for-profit (Ministry of Health, Egypt 2010). The private sector has been the fastest-growing source of health provision as the country has continuously moved toward privatization in the last two decades (WHO 2010). The private sector comprises private pharmacies, doctors, and private hospitals and overall provides care that continues to be rated much higher than that of its public counterparts.
For all of these levels of care, out-of-pocket payments have consistently remained the largest source of health financing in Egypt (Ministry of Health, Egypt 2010; WHO-EMRO 2006) with the adoption of the idea of “fee-for-service.” This idea forces households to pay at the point of care in both private and public health facilities.

Historical Background Until 2011

The development of Egypt’s present health-care system begins with the implementation of socialist rule under the Nasserite regime (1950s–1960s). This social movement nationalized many services including hospitals (Jabbour 2012). It was also during this time that the HIO and the CCO were established, as the idea of health insurance based on actuarial premiums was unacceptable (Jabbour 2012). The health-care system grew significantly under Nasser, furthering not only primary care but also secondary and tertiary care through a system of fee-for-service (Jabbour 2012). After these improvements were made under the Nasserite regime, new economic policies were introduced under Sadat and Mubarak to revive a newly falling economic performance. These actions led toward an increased privatization of the health-care system in Egypt. This began with the introduction of the “Infitah” policy under Sadat, which was created to reduce the government’s role in the economy to allow for more private involvement and investments (Salem 2002). Furthermore, this started the development of the health-care system in accordance with international agencies and standards (Jabbour 2012). With the help of mostly USAID policies, investments were made in expanding the private health-care sector (Jabbour 2012).
In the 1990s, the Egyptian government made a declaration to focus on improving health for the nation. The aim of this statement was to initiate the provision of a universal health-care system along with the adoption of the family health model for the provision of primary care (WHO 2010). This led to the government created Health
Sector Reform Program (HSRP) established through the Family Health Fund along with the HIO (Salem 2002). To aid this development, Egypt received substantial foreign aid and assistance from the World Bank, USAID, and the European Commission (Salem 2002). Egypt also became a party to the International Health Regulations (IHR) to improve practice, surveillance, and preparedness for health issues (WHO 2010). Egypt’s HSRP was officially introduced in 1997 to address how health in Egypt is organized, financed, and delivered (Haley and Beg 2012). This program has worked to improve upon the disjointed and complex health system, spanning the private and public sectors, that existed then and still exists in Egypt. A few years after its establishment, the Healthy Egyptians 2010 Initiative was launched in 2000 to foster disease prevention and control (Anwar 2003).
The accumulation of reforms has benefited the health system in Egypt by implementing a social health insurance model, successfully increasing surveillance, and reducing communicable disease incidence and prevalence (WHO 2010). However, given Egypt’s lower-middle-income status, its overall population health is relatively poor in comparison with other lower-middle-income countries. Furthermore, despite some improvements, the burden of noncommunicable diseases has increased, putting further strain on Egypt’s health system (Roberts et al. 2013). Universal health care has yet to be achieved, due in large part to privatization and the subsequent reduction in public spending, which forced an increase in prepaid private and out-of-pocket health expenditure (WHO-EMRO 2006). Compared to other lower-middle-income countries, Egypt spends comparatively little on health care: only 4.75% of GDP (2007–2008) (Ministry of Health, Egypt 2010).

Public System

The primary organization behind the public system in Egypt is the MOHP. The MOHP offers health services free of charge to every Egyptian citizen, covering all inpatient and outpatient care (Elgazzar 2009). This organization is headed by the minister and further employs over 5,000 personnel in managing and delivering public health services (WHO-EMRO 2006). However, due to poor salary bases for doctors along with income-based inequality in service utilization, the quality of public health care in Egypt is known to be poor, which shifts both suppliers and demand to private health care (Elgazzar 2009). Despite this, the MOHP is the major provider of primary, preventative, and curative care with over 4,500 health facilities throughout the country (WHO-EMRO 2006). The MOHP delivers its functions through four separate levels, which correspond to the following levels of health care (WHO-EMRO 2006):

• Central
• Health directorates (governorate level)
• Health districts
• Health-care providers

Centrally, the MOHP is divided into ten sectors (MOHP 2013) depicted in Fig. 2. Together, these sectors control the policy and regulation of health and health services in all of Egypt. The governorate level of the MOHP operates in purchasing and financing health care for the Egyptian population by balancing income and expenditure in compliance with health sector regulations (WHO-EMRO 2006). The district health structure simply replicates the governorate level in functionality, on a smaller scale (WHO-EMRO 2006). Finally, the provider level of the MOHP is divided based on services as well as location (WHO-EMRO 2006). Despite a consistent discrepancy between rural and urban health care in Egypt, the MOHP does try to provide a wide range of necessary services to all populated areas of Egypt.
A main component of the public sector of the Egyptian health system is the HIO. While most Egyptians rely on private care provision in addition to the services provided by the MOHP, the HIO is the largest health insurer in Egypt, with continuous increases in its utilization through the years (Haley and Beg 2012). From 1990 to 2008, the percentage of population insured by the HIO
Fig. 2 Organization of the MOHP. (Source: MOHP 2013)
increased from 10% to 55% (Table 4), showing not only its growing use but also improvements to the public sector through increased access (Ministry of Health, Egypt 2010). However, the provision of health from all public-sector services has suffered from the government’s and the MOHP’s inability to keep up with increasing costs (WHO 2010). This has turned not only patients but also doctors to the private health system, which can provide both better salaries and better physical resources.

Private System

Increased privatization along with poor maintenance of public care has driven substantial development of the private health-care system in Egypt. Moreover, the private system has achieved a perception of high quality within the country. However, the system prior to 2011 did not set up sufficient regulations governing its service and finance, forcing much of the service to be provided through purely out-of-pocket payments (WHO-EMRO 2006). This increases the inequality of health-care access within the country, as the private services are only for those who can afford them. Furthermore, because there are fewer regulations, more doctors are relying on private care work as supplemental income, which has been a key factor in the private system’s perceived better quality of care (WHO 2010). The lack of governmental regulation along with competing health insurers and providers has resulted in a severe lack of communication between the private and public sectors. This has been a key source of the continuous dysfunction of Egypt’s health system.
Information Systems

Egypt’s health information systems have continued to be developed through increased surveillance implementation within the country. In 2000, the Epidemiology and Disease Surveillance Unit (ESU) was created through the MOHP in order to assess health staff and monitor health patterns, risk factors, and diseases (WHO-EMRO 2006). In total, Egypt has increased surveillance to cover 26 communicable diseases, which has helped to reduce the incidence of tuberculosis and to eliminate polio within the country (WHO 2010). Egypt, experiencing a dual burden of disease, has only recently implemented the STEPwise surveillance framework for noncommunicable disease. The STEPwise survey was successfully conducted in 2011 using a standard survey instrument and a methodology adapted to Egypt’s resource setting in accordance with WHO (2013). There is also currently a supply chain system megaproject underway, working to centralize and computerize drug ordering, procurement, delivery, and other associated logistics. In conjunction with this, an electronic medical records (EMR) system is also a work in progress. This would be an integral component of the national health insurance project in order to avoid service duplication and abuse of the system by any group of patients.

Financing

Overview

In 2007/2008, Egypt invested 42.5 billion Egyptian pounds (LE) on health. For a middle-income country in the region, this amount of spending is relatively low (see Table 3 for comparisons) (Ministry of Health, Egypt 2010). Breaking this down, financing derives from direct tax revenues, HIO premium payments, direct out-of-pocket spending from private households, private health insurance premiums, and health spending from employers to employees; finally, assistance also comes from a cigarette tax as well as minor donor assistance (Ministry of Health, Egypt 2010). Overall government investments in the Egyptian health system have been declining over the years (World Bank 2013). This has subsequently forced financing from private households (out-of-pocket) to continue rising, distinguishing it further as the single largest source of health financing in the country. Moreover, household expenditure has risen past 60% of all health investments (Ministry of Health, Egypt 2010). The second largest source of health financing is the Ministry of Finance (MOF). The MOF accounts for the public financing of the health system, most notably in regard to programs and services from the MOHP as well as some support to the social health insurance (HIO). The public sector of health financing totals approximately one third of all health investments (Ministry of Health, Egypt 2010). Private and other external sources account for the remaining two thirds (Fig. 3; Ministry of Health, Egypt 2010).
In summary, Egypt spends a mere 4.7% of GDP on health (Ministry of Health, Egypt 2010). Only 1.6% of GDP accounts for public spending, and 2.1% of GDP accounts for private spending on health care (WHO-EMRO 2006). Given the country’s economic status, this value is low, and the percentage of out-of-pocket spending by individuals is high, compared to regional comparators (Table 3).

Expenditure

In Egypt, public funding for the health system flows to financial agents and then on to providers along mutually exclusive tracks known as silos. This impedes care coordination and effective allocation of resources between the public and private sectors (Ministry of Health, Egypt 2010). From this, expenditure moves into various parts of the health system, ranging from private and public service providers to pharmaceuticals. The largest part of health financing is expenditure for pharmaceuticals (Fig. 4). Pharmaceuticals account for 25.9% of total health expenditure, which is a relatively high percentage in comparison with comparable health systems (Ministry of Health, Egypt 2010). Funding for pharmaceuticals mostly
Table 3 Comparison of health spending in Egypt with other WHO Middle Eastern countries (2007/2008)

Country    Percent of GDP   Government       Health spending as      Out-of-pocket     Per capita health
           spent on         spending as the  the percentage of       expenditure as    spending (constant
           health (%)       percentage (%)   total government        the percentage    2005 US$)
                                             budget (%)              (%)
Algeria    4.49             83.85            10.65                   15.30             205
Djibouti   8.54             76.07            14.15                   23.60             81
Egypt      4.75             33.00            5.00                    60.00             111
Iran       6.30             45.72            11.40                   51.68             294
Jordan     9.10             62.20            11.35                   33.40             273
Lebanon    8.76             48.99            12.39                   39.95             551
Libya      2.80             75.88            5.38                    24.12             383
Morocco    5.33             34.87            6.17                    56.13             133
Syria      3.23             45.13            6.01                    54.87             76
Tunisia    5.95             49.17            8.90                    42.52             213
Sources: WHO NHA data, Egypt NHA results, Jordan NHA report, cited in Egypt MOH 2010
comes from out-of-pocket spending as a result of a lack of communication and education about proper use and distribution. Private clinics receive the next highest amount of expenditure, similarly as a result of out-of-pocket spending by households (Figs. 4 and 5).

External Sources of Financing

While the majority of public health financing comes from the Egyptian government’s Ministry of Finance, Egypt has advanced its health policies and system with the aid of other countries’ resources. USAID has been a leading resource in aiding the achievement of various health-related Millennium Development Goals (Egypt’s progress 2010). USAID was responsible for policies aimed at improving health care through further privatization of the health system. However, USAID withdrew funding in 2009 as a result of Egypt’s significant progress in improving health status. Other countries have also helped in funding health policies and goals following international standards and ideals, in particular the European Union and some of its member states. The International Monetary Fund (IMF) has planned to loan $4.8 billion to the country in recent years; however, ongoing political tensions have prolonged Egypt’s bid to secure this (World Bank 2013).

Insurance Coverage

The largest source of health insurance for Egyptians remains the Health Insurance Organization (HIO), essentially a social health insurance system that is supposed to complement the tax-financed services provided by the MOHP. The aim of this organization is to provide universal health-care coverage for all Egyptians. While this organization has not achieved this goal, its coverage has continued to rise significantly since 1990 (Table 4). As such, the HIO represents the second largest health financing organization in Egypt (WHO-EMRO 2006). The rise in coverage has come with the inclusion of new population groups such as newborns and school-age children (WHO-EMRO 2006).
Given these inclusions, there are four classes of HIO beneficiaries (WHO-EMRO 2006):

1. All employees working in the government sector
2. Private- and public-sector employees and pensioners and widows
3. Beneficiaries of the Student Health Insurance Program (SHIP)
4. Newborn children up to age 5 years

Currently, Egypt has no health insurance laws. The HIO instead operates under different social insurance laws, ministerial decrees, and regulations.
Fig. 3 Egyptian health investments (2007–2008). Chart categories: Households, Ministry of Finance, Private, External. Values shown in the chart: 60%, 35%, 2.2%, 0.6%. (Source: Ministry of Health 2010)
Fig. 4 Health expenditure by type of provider and ownership (2007–2008). Legend entries recovered: MOH Facilities, Private Hospitals, Private Clinics, Pharmacies. Values shown in the chart: 3.3%, 8.8%, 8%, 17.7%, 6.8%, 25.9%, 23.8%, 5.7%. (Source: Ministry of Health 2010)
Fig. 5 Out-of-pocket expenditure by provider (2007–2008). Legend entries recovered: Private Hospitals, Private Clinics, MOH Hospitals, University Hospitals. Values shown in the chart: 8.3%, 8.2%, 33.1%, 38.4%, 2.9%, 0.9%, 3.5%, 2.8%, 1.9%. (Source: Ministry of Health 2010)
Table 4 Insurance coverage (1990–2011)

                       1990   1995   2000   2004   2008   2011
Social insurance       10%    37%    45%    52%    55%    59%
Uninsured/uncovered    90%    63%    55%    49%    N/A    N/A
Sources: Egypt NHA data, cited by WHO 2006, 2010
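
The pace of the coverage increase reported in Table 4 can be made explicit with a little arithmetic. The following is a minimal sketch, assuming only the social insurance row of Table 4 as input; the names coverage, percentage_point_changes, and average_annual_change are illustrative and do not come from the cited sources.

```python
# Social insurance coverage (% of population) by year, transcribed from Table 4.
coverage = {1990: 10, 1995: 37, 2000: 45, 2004: 52, 2008: 55, 2011: 59}


def percentage_point_changes(series):
    """Return the change in coverage between consecutive reported years."""
    years = sorted(series)
    return {(start, end): series[end] - series[start]
            for start, end in zip(years, years[1:])}


def average_annual_change(series, start, end):
    """Average percentage-point change per year between two reported years."""
    return (series[end] - series[start]) / (end - start)


changes = percentage_point_changes(coverage)
for (start, end), delta in changes.items():
    print(f"{start}-{end}: {delta:+d} percentage points")

print(f"Average 1990-2011: {average_annual_change(coverage, 1990, 2011):.2f} points per year")
```

Read this way, most of the gain (27 of 49 percentage points) falls between 1990 and 1995, with much slower growth in the later periods. The reporting intervals are uneven, so the per-period changes are not directly comparable without dividing by the number of years.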
The goal of the HIO is to be a provider of health services to its beneficiaries under a low and fixed premium structure with an extensive benefit package (WHO-EMRO 2006). These benefits include transplants, plastic surgery, and treatments abroad with no cap on the quantity of services (WHO-EMRO 2006). However, there remains inadequate management of HIO service providers, resulting in poor care and low responsiveness in the public system (Mosallam et al. 2013). Therefore, with the higher demand for private care, the benefits of social insurance force most patients insured with the HIO to continue to pay out-of-pocket for most of their health care.
Private health insurance is not significant in Egypt in terms of financing or population coverage, but it is on offer for the select few who can afford it, for example, Egycare. The regulations have made it difficult for organizations to profit from health insurance schemes, which remains the biggest barrier to their spread. However, from time to time, new private health insurance programs appear which benefit upper-middle and upper-class populations (WHO-EMRO 2006).
A major development after the 2011 Arab Spring revolution was the enactment of a new health insurance law, a project in the pipeline that was hotly debated at the ministerial cabinet level as well as on the parliament floor. Unfortunately, political instability with frequent changes in the government executive has delayed the project launch and implementation. A new draft of the law was presented to the Higher Health Council (HHC) in April 2012, which requested some modifications. The goal of this development is to improve universal coverage within the country in a cost-effective way. The new social health insurance (SHI) is intended to cover 90% of the population and reduce out-of-pocket payments to 35% by the end of the implementation phase.

Health Payments

Out-of-pocket payments have always been the largest source of service payment. Regardless of insurance status, there are formal user fees for both outpatient and inpatient public services, with the MOHP facilities charging the smallest fees (WHO 2010). Overall there have been three separate pathways in provider payment (WHO-EMRO 2006):

1. MOF funding
   (a) Funds to government care providers
2. Social insurance
   (a) Funds services as a combined provider and commissioner (half of revenues to finance services by itself and the other half to purchase services and goods from other providers)
3. Direct household funding
   (a) Over 90% of this goes directly to private health-care providers.

Another source of health revenues has been the Family Health Fund (FHF). The FHF pays performance-based incentives to health workers in the public sector (WHO-EMRO 2006).

Paying Health Workers

Prior to 2011, over 50% of health professionals were employed by MOHP facilities (WHO-EMRO 2006). With the limited funds that the MOHP receives, salaries for individuals are limited, forcing most professionals to practice privately for further sources of income. In turn, 89% of medical doctors had been found to hold more than one job prior to 2011 (WHO-EMRO 2006). This allows for a balance in salary payments as a result of out-of-pocket payments to the private facilities.
Since January 2011, the health ministry has worked closely with health-care practitioners (HCPs) and their respective syndicates, in addition to the MOF, to establish a new payroll system for government-employed health-care workers. It is designed to reduce the gap between different payroll categories. In addition, to further reward those willing to serve in distant geographical locations, new incentive schemes are being developed. Once implemented, this would encourage the recruitment of more competitive health-care practitioners into the government health-care system. In addition, it would encourage more health-care workers to serve in remote locations. The new payroll system needs parliamentary approval prior to implementation.

Physical and Human Resources

Physical Resources

The number of health facilities has been growing rapidly over the last two decades (Table 5). These consist of both public and private facilities with
varying services and amenities. Most facilities offer at a minimum basic structural needs (i.e., electricity and water) along with at least one doctor (MOHP 2004). Maternal, child, and reproductive health services have continued to increase, with urban areas showing the highest percentage (MOHP 2004). Overall the private sector has access to more and better medical equipment compared to public facilities, which have been continuously underfunded (WHO 2010; Ministry of Health, Egypt 2010; WHO-EMRO 2006). This has not changed since the Arab Spring revolution.

Table 5 Summary of health facilities (2005)

                                              Number   Beds
MOHP                                          1,166    78,502
Rural                                         669      11,093
Rural (integrated) hospitals                  439      8,509
Urban                                         497      67,406
General and district hospitals                233      34,656
Obstetric and pediatric hospitals             10       752
Mental hospitals                              17       6,415
Teaching hospitals and institutes (THO)       18       5,639
Curative care organization (CCO)              11       2,129
Health insurance organization (HIO)           40       9,828
Other ministries’ hospitals                   119      29,851
Medical schools                               71       25,891
Police and prison                             26       1,382
Private hospitals                             1,329    15,302
Source: Egypt MOHP data, cited by WHO 2006

Table 6 Summary of health workers (2005)

                    Number
Physicians          12,917
Dentists            3,885
Pharmacists         1,277
Nurses              44,300
Lab technicians     3,575
Source: CAI HC data, cited by WHO 2006

Human Resources

Historically, in efforts to make the health-care system more independent, health professions were encouraged and looked upon highly in society. However, with that status, many health-care workers have been unwilling to practice in rural areas of Egypt. This has continued the cycle of a poor distribution of health services in these areas (Jabbour 2012). Efforts had been made by previous governments to encourage practitioners to work in rural areas, but discontent among health professionals with these policies has hindered their implementation (Jabbour 2012). Progress has been made in eliminating a job guarantee following medical school. However, there has always been a lack of communication between the universities producing doctors and the government overseeing policies. Prior to 2011, there were 6.53 physicians and 13.75 nurses per 10,000 people registered in Egypt (WHO-EMRO 2006). There is little to no data as to whether the size of the workforce has been adequate to this point (WHO-EMRO 2006). Little has changed in human resources apart from new payroll incentives to encourage health professionals to practice in rural areas (Table 6).

Provision of Services

Overview

The utilization of health services in Egypt is highly reliant upon the division of public and private sectors of health care. For the most part, the majority of health facilities are run by the MOHP. However, the dichotomous system does not allow for health provision completely by the public sector because the latter is chronically underfunded. In contrast, inpatient care is mainly provided by the MOHP/public sector, while ambulatory and pharmaceutical care is mostly private (WHO-EMRO 2006). While universal health-care coverage has not yet been achieved within the country, 100% of the Egyptian population has access to basic health services (WHO 2010).
For the most part, the MOHP oversees hospitals and outpatient facilities. Other public facilities
consist of HIO service providers, university and teaching hospitals, and institutes along with various other government-run facilities. The private sector also provides extensive services, including hospitals, pharmacies, outpatient facilities, and traditional healers, to name a few. Taking all of these providers into account, 95% of the population is within 5 km of a medical facility (Elgazzar 2009). Of this, public facilities coordinated by the MOHP had grown to over 5,000 health facilities with over 80,000 beds in 2011 (Haley and Beg 2012). Despite the growing number, these facilities are underutilized by up to 50%, and over 60% of primary care visits happen in the private sector (WHO 2010; Gericke 2006). This figure has shown little improvement over time, as in 1994–1995 public hospitals showed an occupancy of only 45% (with other sources showing even lower utilization rates) (WHO-EMRO 2006).
While so much of the population is located close to health facilities, the actual provision of services has been somewhat of an issue between rural and urban areas, along with discrepancies among different socioeconomic populations. This has been the result of a lack of optimal development in rural areas despite the Health Reform Program as well as the lack of affordability of different forms of care. As a result, 70% of outpatient care is obtained privately by wealthier populations, along with longer inpatient stays (Elgazzar 2009).

Inpatient Care

Inpatient care is mainly provided through government-funded health facilities. Eighty-five percent of all inpatient care in 2005 was through government MOHP or public facilities (i.e., HIO) (WHO-EMRO 2006). Admission rates on average were 0.029 per capita per year, which is within the upper-middle range for comparable developing countries (WHO-EMRO 2006). The average length of stay for inpatient care was 3 days; however, wealthier patients have reported up to 1.5 times that (Elgazzar 2009).

Outpatient Care

Unlike inpatient care, the majority of outpatient care is provided by private health facilities. On average only 1.4 out of 3.98 total outpatient visits per capita per year occurred within MOHP facilities (WHO-EMRO 2006).

Mental Health Care

There are few large psychiatric hospitals in Egypt. For the most part, these facilities have remained fairly centralized, providing inpatient care considered inadequate by many observers (WHO 2010). The majority of the issues stem from inpatient care surrounding the provision of acute mental health care, as 60% of the beds are occupied by long-stay patients (WHO 2010). Overall, there has been an increased recognition of the importance of mental health care. However, in Egypt spending and mental health regulations have not kept pace with these increased expectations (Jenkins et al. 2010). This has led to a severe lack of staff, resources, and information regarding mental illness (Roberts et al. 2013). Therefore, increased funding and recognition by the MOHP, as well as other service providers, are necessary in order to redress this situation.

Pharmaceuticals

Pharmaceutical expenditure accounts for the second largest part of total health spending in Egypt. This is a result of the majority of pharmaceuticals being produced in Egypt. The benefits of this have allowed for increased immunization rates in children and the general population. However, many of the pharmaceuticals have not fully met world standards, and with the lack of communication within the health system, there have been issues with management based on needs (WHO-EMRO 2006). Leading the control of pharmaceuticals in Egypt is the Central Administration for Pharmaceutical Affairs (CAPA) within the MOHP. This group has been able to positively influence pharmaceutical developments through a decree
establishing a clinical pharmacy unit and drug information center in every public and private hospital in order to empower and educate patients on medication issues. Furthermore, the use of pharmaceuticals will continue to develop as the MOHP health technology assessment and pharmacoeconomics unit works to better utilize pharmaceutical resources and expenditure.

The Arab Spring Revolution

Taking inspiration from Tunisia, in January of 2011, Egyptian protesters working alongside the only organized opposition force, the Muslim Brotherhood, stormed the streets of major cities in Egypt in order to protest against the current Egyptian regime. This revolution succeeded in overthrowing President Hosni Mubarak. Under the Mubarak presidency between 1981 and 2011, there were many grievances over questions of freedom of expression, other human rights issues, as well as social and economic issues. The revolution followed a number of years with high unemployment rates, low wages, as well as food price inflation. The overall goal of the revolution was to end the president’s regime along with the country’s policy on emergency law, lack of freedom of speech, and overall corruption in the government. The protests varied from peaceful to violent and lasted a total of 28 days until the president was finally overthrown.
In efforts to break up the protests, the Egyptian government attempted to shut down social media the night before the protests started. While this was somewhat successful, the protests still filled the streets the next day, resulting in President Mubarak dismissing his government and appointing a new cabinet and a vice president, Omar Suleiman (the first in 30 years), all in hopes of satisfying the uprising masses. However, protests did not subside until President Mubarak handed power over to the Armed Forces, placing Egypt in a truly transitional state (Abou-El-Fadl 2012). Under the Armed Forces’ oversight, a new prime minister, Essam Sharaf, was announced, the Egyptian Parliament was dissolved, and the Egyptian constitution was put on hold. Following the revolution, Egypt worked to recover with the help of other countries, such as financial assistance from the United States, which further increased Egypt’s already large debt (Hamilton 2013).
It was not until June 2012 that Egypt finally elected a new president. Promising to end years of presidential abuse of power, Mohammed Morsi was sworn in (Hamilton 2013). Within his first year of office, Egypt began importing natural gas. This investment was to the benefit of the nation’s richest businessmen and increased public spending on fuels to 25% of all public expenditure, more than what the country spends on health and education combined (Hamilton 2013). Also within this term, with influence from the Muslim Brotherhood, Morsi broke a number of electoral promises. In summary, his actions neither brought improvements to social issues nor fulfilled the goals of a new constitution to be improved following the revolutionary demands. Because of this, the Egyptian people once again took to the streets in order to overthrow their new president along with the newly developed constitution. The Armed Forces sided with the people. On 3 July 2013, President Morsi was overthrown by the military’s coup d’état, and he, along with other leaders of the Muslim Brotherhood, was arrested and put on trial. After a series of violent demonstrations and bombings of police and military institutions as well as attacks on Coptic Christians and churches, the Muslim Brotherhood was declared a terrorist organization in December 2013. The return to a military government has led to new uncertainty and a continuation of an economically and socially unstable condition. In 2014, General Abdel Fattah Elsisi was elected as the sixth president of Egypt.

Reforms

Overview

The 2011 revolution made way for huge changes within the country. For the most part, the population recognizes the challenges caused by a rapidly growing population alongside an out-of-date public-sector health-care provision (Devi
2013). And while attention has been drawn toward this along with the long-standing demands for more funding for health care and prevention, the unstable government and economy have not been able to make it a priority (Devi 2013). Former Health Minister Hamed has even stated recently that “20% of hospitals in the rural south had no doctors and only 40% of necessary medicines were available in government hospitals and clinics.” There is much optimism that the revolution will ultimately increase public spending in the health-care sector in the future. However, so far it has only resulted in addressing some immediate health concerns (i.e., deaths, injuries, public displacement, and general deterioration of public health-care facilities) (Coutts et al. 2013). The revolution has hurt the economy by not only reducing Egypt’s number one source of income (tourism) but also reducing global investments and directing the limited funds to other areas of need (Haley and Beg 2012). Therefore, regarding the health-care sector, Egypt is mostly left with the reforms set in place prior to the revolution along with the hope of improved health care in the public sector through future increased spending and development.

Past Reforms

Within the past couple of years, the government of Egypt has improved salary bases for doctors and pharmacists in response to their dissatisfaction. Besides this, there have been few minor reforms since the implementation of the Health Sector Reform Program strengthening primary care. This allowed for greater service delivery innovation with the implementation of the Family Health Model to provide better access to integrated services at a higher quality (WHO-EMRO 2006). This also accounts for the development of the Family Health Fund to help with payment and financing for the program. Another step was made toward the goal of universal health care through increasing public health insurance benefits and general coverage. In 2014, Health Minister Elrabat stated that “there is a plan to

Proposed Plans

In the near future, the goal will be to finalize the implementation of the new social health insurance in order to drastically improve health insurance coverage for the general population (both rural and urban), to reduce out-of-pocket spending (from 72% to 35%), and to provide Egyptians with the freedom of choice in terms of provider and treatment location. This plan requires the government budget to grow from 4.7% to 8% at the end of the implementation, with health insurance spending to represent more than 50% of total health expenditure. The implementation of a health technology assessment (HTA) system and the recent establishment of a pharmacoeconomics unit at the Egyptian Drug Authority are seen as a promising step to reduce the exceedingly high amount of pharmaceutical expenditure compared to other subsectors in the health system. Implementation of these plans will start in three to five governorates for 3 years as a pilot, followed by a gradual rollout to the whole nation if successful.
Besides these plans, a focus of future reforms in Egypt will be to continue to develop primary care and prevention outreach (Roberts et al. 2013). The key goals in this area include ensuring that primary care is provided in rural areas, free medical treatment in hospitals, subsidized treatment of children under 6 who are not covered by insurance, and regulation of private health-care providers (Roberts et al. 2013). The Egyptian government (through the HSRP) will continue to work on developing its universal health care through further development of the Family Health Model and ensuring more equity and access to care through increased public spending on health (WHO 2010; WHO-EMRO 2006). Health sector reform is paving the way through the new health insurance system that aims to provide easy access to affordable basic health services to all Egyptians, rich and poor, urban and rural, young and old. Plans also include increasing the percentage of GDP allocated to health
cover all people by the new health insurance sys- care, in addition to better utilizing those
tem within 7 years.” resources.
Assessment

The state of Egyptian health care is fragile and fragmented given the country’s unstable conditions over the last few years. However, the motivation of protest gives hope that the new government will improve the health-care system through further reforms. As of 2014, the largest issue hindering the system is very limited funding. The government has continuously underfinanced the public health sector and thus created large inequalities based on what individuals or their families can afford. Therefore, with the organization of the new constitution and government following the current unrest, one of the first issues to be addressed will be increasing public funding for health care (Gericke 2005). This will allow for the much-needed finances to go toward upgrading and maintaining public health resources as well as increased payment to health-care workers in the public sector. This should also come with subsequent policies to help distribute the resources as well as recognize doctor salaries in order to create an equal distribution of health workers throughout the country (Haley and Beg 2012). With little to no change in regard to spending on health within the last decades, this assessment is not new.

This stagnant health system has maintained minimal communication between private and public health care. This is problematic for overall health. With a lack of government finance and focus on the public sector comes a subsequent lack in the regulation of the private sector, which needs immediate change (Gericke 2005). While the private sector continues to develop and grow in Egypt, there is an uneven distribution of care between the two sectors. There needs to be an increased focus and spending in the public sector in order to come closer to achieving the goal of universal coverage, because as of now only the private sector seems to continue developing and providing good care to those who can afford it. Care provision also needs to spread to the rural areas of Upper Egypt to reduce inequalities in access to care. Likewise, the private health-care sector needs to be better monitored for quality and, in particular for the pharmaceutical market, cost-effectiveness and price controls.

Health indicators in general have improved throughout the years despite the low levels of health spending. However, concerns with regard to noncommunicable and chronic diseases are rising. Therefore, a new strategy is needed to fight these as both smoking and obesity are increasing along with their subsequent poor health outcomes. There is also a need to increase disease surveillance and to work on improving chronic disease control and prevention (Devi 2013; Coutts et al. 2013). Hepatitis C (with its high incidence and prevalence) in Egypt poses what some government officials have publicly addressed as a national security concern. There is a growing need to set priorities along with a written and documented plan on how to proceed to face and solve some of these problems. Finally, greater efforts are required to address both the constraints and gaps in provision of comprehensive reproductive health care in Egypt, including stronger coordination mechanisms among various stakeholders and the need for more effective partnerships with civil society and the private sector.

In conclusion, the Egyptian health system does provide an extensive infrastructure with regard to the number of physicians, clinics, pharmaceuticals, and physical access to hospitals (Gericke 2006). However, with remaining inequalities, there ultimately needs to be a national strategic plan for the next 5–10 years to address the persisting issues. The very frequent change in health ministers (seven between 2010 and 2013) makes it very difficult to move forward with a constant set of objectives. Health care needs to grow in Egypt by diminishing inequality through the spread of affordable care to even the most remote areas and all population segments. Therefore, there is a need to build up the MOHP to create a more structured organization from which health reforms and improvements will come. Improvements in the health sector cannot be seen in isolation and require parallel, substantial improvements to education, health promotion, safe water and housing, and traffic regulations, to name only a few. Despite the recent stabilization of government, the continuing economic problems and low public spending on health have thwarted most attempts at reforming health that have been discussed in recent years.
References

Abou-Ali H, et al. Evaluating the impact of Egyptian Social Fund for Development programmes. J Dev Eff. 2010;2(4):521–55.
Abou-El-Fadl R. Beyond conventional transitional justice: Egypt’s 2011 revolution and the absence of political will. Int J Transit Justice. 2012;6(2):318–30.
Anwar WA. Environmental health in Egypt. Int J Hyg Environ Health. 2003;206(4–5):339–50.
CIA. The World Factbook: Egypt. 2013 [cited 2013 June 27th]. Available from: https://www.cia.gov/library/publications/the-world-factbook/geos/eg.html.
Coutts A, et al. The Arab Spring and health: two years on. Int J Health Serv. 2013;43(1):49–60.
Devi S. Women’s health challenges in post-revolutionary Egypt. Lancet. 2013;381(9879):1705–6.
Egypt’s progress towards achieving the Millennium Development Goals. Ministry of Economic Development, Cairo; 2010. p. 1–154.
Elgazzar H. Income and the use of health care: an empirical study of Egypt and Lebanon. Health Econ Policy Law. 2009;4(4):445–78.
Gericke CA. Comparison of health care financing in Egypt and Cuba: lessons for health reform in Egypt. East Mediterr Health J. 2005;11(5–6):1073.
Gericke CA. Financing health care in Egypt: current issues and options for reform. J Public Health. 2006;14(1):29–36.
Haley DR, Beg SA. The road to recovery: Egypt’s healthcare reform. Int J Health Plann Manag. 2012;27(1):e83–91.
Hamilton OR. Egypt’s latest revolutionary act was profoundly democratic: London: The Guardian; 2013.
Jabbour S. Egypt in crisis: politics, health care reform, and social mobilization for health rights. In: Public health in the Arab world. Cambridge/New York: Cambridge University Press; 2012. p. 477–88.
Jenkins R, et al. Mental health policy and development in Egypt – integrating mental health into health sector reforms 2001-9. Int J Ment Heal Syst. 2010;4:17.
Ministry of Health, Egypt. National Health Accounts 2007/2008: Egypt report. In: Health systems 20/20 project. Bethesda: Abt Associates Inc; 2010. p. 1–45.
MOHP. Egypt service provision assessment survey 2004. 2004. p. 1–410.
MOHP. 2013 [cited 2013 July]; Available from: http://www.mohp.gov.eg/about/OrgChart/default.aspx.
Mosallam RA, Aly MM, Moharram AM. Responsiveness of the health insurance and private systems in Alexandria, Egypt. J Egypt Public Health Assoc. 2013;88(1):46–51.
Roberts B, et al. The Arab Spring: confronting the challenge of non-communicable disease. J Public Health Policy. 2013;34(2):345–52.
Salem MA. Policy Research in Egypt’s Health Sector Reform. The Alliance for Health Policy and Systems Research; 2002. Working paper no. 13.
WHO. Country cooperation strategy for WHO and Egypt 2010–2014: Geneva: World Health Organization; 2010. p. 1–52.
WHO. Egypt: health profile: Geneva: World Health Organization; 2013. p. 1–2.
WHO-EMRO. Health System Profile Egypt: Regional Health Systems Observatory, EMRO, World Health Organization, Cairo; 2006. p. 1–111.
World Bank. World Bank Data: Egypt. 2013 [cited 2013 June 12th]; Available from: http://www.worldbank.org/en/country/egypt.

Health System in France
35
Karine Chevreul and Karen Berg Brigham
K. Chevreul (*)
Health Economics and Health Services Research Unit, URC ECO Ile de France, Paris, France
e-mail: karine.chevreul@urc-eco.fr

K. B. Brigham
University of Washington, Seattle, WA, USA

https://doi.org/10.1007/978-1-4939-8715-3_44

Contents

Introduction
Financing
Primary Care
Hospital Care
Integrated Care
Long-Term Care
Disabled Adults and Children
Public Health
Reforms
Assessment

Abstract

The French Republic is comprised of metropolitan France located in western Europe and a collection of overseas islands and territories on other continents. It is a unitary state with administrative subdivisions: 100 departments (local authorities) embedded in 27 regions. On January 1, 2013, the French population totaled 63.7 million inhabitants in metropolitan France and 2.1 million inhabitants in the overseas territories. France is the second most populous country in the European Union (EU), and over three-quarters of its population lives in urban areas. It has the fifth largest economy in the world. The French political system is a parliamentary democracy with a president and a bicameral parliament consisting of a National Assembly and a Senate. France is a welfare state that developed its social security system after the Second World War with the aim of covering the financial risks associated with
getting sick, being injured in the workplace, getting old, and growing families.

Introduction

The overall picture of the state of health in France contains apparent contradictions. On the one hand, indicators such as life expectancy, life expectancy without disability, and healthy life expectancy show that the health of the population is good. The French average life expectancy is now over 80 years and is the second highest in the world for women. Moreover, the French population is aging, and from 2020 onwards, those aged over 60 will outnumber those aged under 20 (accounting for 27% and 23% of the population, respectively). The aging of the population is not due to a decreasing fertility rate as in other European countries. Indeed, France has the third highest fertility rate in the EU. In addition, older people remain in better health than in many other European countries.

The main causes of death in France are cancer, cardiovascular diseases, accidents, and diseases of the respiratory system. However, France also compares well with regard to cardiovascular diseases, while its relative position with respect to mortality caused by alcoholism, cirrhosis, and cancer of the cervix is improving. Nonetheless, France suffers from a high rate of premature male deaths from accidents and unhealthy habits such as smoking and alcoholism that are the most common causes of avoidable mortality in France. Additionally, France has long reported health inequalities across socioeconomic groups that are wider than in most other European countries. These inequalities result not only from risk factors, but also from disparities in access to health services that require the highest out-of-pocket expenditure by patients.

Organization and Governance

The French health care system is of a mixed type, structurally based on a Bismarckian approach with Beveridge goals reflected in the single public payer model, the increasing importance of tax-based revenue for financing, and strong state intervention. There is statutory health insurance (SHI), which covers virtually 100% of the resident population under various noncompeting schemes.

The delivery of care is shared among private, fee-for-service physicians, private profit-making hospitals, private non-profit-making hospitals, and public hospitals. In addition to the health care sector and the social sector, there is a health and social care sector, known as the third sector, which provides care and services to elderly and disabled people.

Jurisdiction in terms of health policy and regulation of the health care system is divided among the state (parliament, government, and the Administration of Health and Social Affairs), SHI, and local authorities, particularly at the regional level. Reforms over the last two decades have attempted to devolve a greater remit in governance and health policy decision-making to the regional level, particularly with respect to planning. This trend culminated in the 2009 Hospital, Patients, Health and Territories Act (loi hôpital, patients, santé et territoires; HPST), which merged institutions representing the main stakeholders (the state, SHI schemes, health professionals, and public health actors) at the regional level into “one-stop shops,” the 26 regional health agencies (agences régionales de santé; ARS). Cutting across the traditional boundaries of the health care sector, the public health preventive sector, and the health and social care sector for disabled and elderly persons, the ARSs are responsible for ensuring that health care provision meets the needs of the population by improving coordination between the ambulatory and hospital sectors and health and social care sector services while respecting national health expenditure objectives.

Planning and regulation involve negotiations among provider representatives (hospitals and health professionals), the state, represented by both the Ministry in charge of Health and the Ministry in charge of the Economy and Finances, and SHI. The outcome of these negotiations is translated into administrative decrees and laws passed by the parliament. These include public health acts, social security funding acts, and reform acts. In the context of increasing health
care expenditure and the increasing SHI deficit, the role of the state in planning and regulation has increased over the past two decades. The responsibility for capacity planning is shared by the central and regional levels. At the regional level, the ARSs coordinate ambulatory and hospital care and health and social care for the elderly and disabled through a regional strategic health plan (plan stratégique régional de santé; PRS) based on population needs. Each sector’s planning process must comply with the PRS which, starting in 2010, represents the first attempt at regional planning of the ambulatory care sector.

Providers are paid by SHI (or directly by patients who are later reimbursed). The statutory tariffs are set through negotiations between providers and SHI and are approved by the Ministry in charge of Health. Quality of care is regulated at the national level. Hospitals must undergo a certification process every four years, but there is no formal re-certification or re-licensing process for health professionals. However, doctors, pharmacists, dentists, and midwives are required to follow lifelong learning activities through continuing professional development.

The role of patients in regulation and planning has slowly increased in recent years, although their participation remains marginal. The 2009 HPST law created the Regional conference on health and autonomy (Conférence régionale de la santé et de l’autonomie; CRSA) through which patients and their representatives may participate in defining public health priorities at the regional level, including development of the PRS. Patient input is stronger at the services level.

Health information systems and technologies have been developed to help in planning and regulation. The SHI inter-schemes system (système national d’information interrégimes de l’assurance maladie; SNIIR-AM) was established in 2003. It encompasses information on patient health care consumption for which a claim has been sent to SHI, regardless of the type of care (hospital inpatient stays, self-employed doctor visits, drugs, etc.) as long as it is covered by SHI. This system has been facilitated by the development of electronic billing, which has been implemented in the ambulatory sector since the mid-1990s via an individual health insurance electronic card (carte Vitale) on the patient side and an electronic identification card for health workers (carte de professionnel de santé; CPS) on the provider side.

Additionally, in order to improve quality of care and decrease redundancy in consumption, the development of an electronic patient record (dossier médical personnel; DMP) to group medical information and care consumption in ambulatory and hospital settings for patients on a voluntary basis was initiated in 2004. Implementation has not been smooth due to both technical and patient privacy concerns. However, by June 2013, nearly 350,000 patients had DMPs, which are now used by 4800 health professionals in the ambulatory sector and 350 institutions in the hospital sector.

Financing

Financial responsibility for health care in France is mainly borne by SHI. However, SHI only funds around three-quarters of health spending, leaving considerable scope for complementary sources of funding, such as private voluntary health insurance (VHI). Moreover, long-term care for the elderly and disabled is financed differently. It is partly financed by a dedicated fund created in 2004, the National Solidarity Fund for Autonomy (Caisse nationale de solidarité pour l’autonomie; CNSA). Its resources come from SHI and the “solidarity and autonomy contribution” that is generated from the revenue of an unpaid working/solidarity day (journée de solidarité) contributed by the French working population. Local authorities and households also participate in financing these categories of care.

SHI resources mainly come from an earmarked tax called the “general social contribution” (contribution sociale généralisée) based on total income and not only on earned income as was previously the case. Additional revenue accounts for around 13% and comes from specific taxes such as “sin” taxes or taxes on the pharmaceutical companies’ turnover. Funds are pooled at the national level, and there is no formal allocation mechanism in France.

SHI coverage is established according to resident status, and entitlement is based on employment, unemployment, student, or retiree status. Since the introduction of universal medical coverage (couverture maladie universelle; CMU) in 2000, the state has covered the health care costs of residents not otherwise eligible for SHI. Illegal residents who have applied for residency are covered by a special program (aide médicale de l’état; AME).

SHI covers a broad range of services and goods that are provided in hospital or defined in positive lists for outpatient care. In Europe, the level of coverage is considered quite generous, offering rapid access to the latest innovations. The rate of coverage varies across goods and services; for example, the co-insurance rate is 30% for physician and dentist care, 40% for ancillary services and laboratory tests, and 20% for hospitalization. For most drugs, co-insurance amounts to either 35% or 70% but ranges from 0% for nonsubstitutable or expensive drugs to 85% for “convenience medications.” However, there are several conditions for which patients are exempted from co-insurance, such as chronic conditions covered under the ALD scheme (affections de longue durée) or pregnancy after the fifth month. Co-insurance amounts are generally covered by VHI, which provides reimbursement for co-payments and better coverage for medical goods and services that are poorly covered. However, deductibles introduced after 2004 with the aim of improving coordination of care and reducing patient consumption cannot be covered by VHI or else the insuring entity will be subject to financial penalties.

Over recent decades, VHI has gained an important role in ensuring equity of access and financing of health care. It covers 88% of the population on a private basis. Since 2000, in order to ensure that the measures increasing patients’ co-insurance would not result in increased social inequities in access, public complementary insurance (couverture maladie universelle complémentaire; CMU-C) has been offered on a voluntary basis to lower socioeconomic groups and covers 6% of the population.

SHI pays for hospital acute care by means of a DRG-type payment method (tarification à l’activité; T2A). In addition to the 20% co-insurance amount, a hospital catering flat fee amounting to €18 per day is the responsibility of patients or their VHI. Self-employed professionals are paid on a fee-for-service basis and patients are reimbursed based on official tariffs. However, certain self-employed doctors are allowed to practice extra-billing, which impairs the equity of access objective of the system. Financial incentives to improve the quality and efficiency of doctors’ practices and to decrease the level of extra-billing exist. Individual contracts with general practitioners that include pay-for-performance targets were initially implemented in 2009 and extended to specialists in 2012. From 2012, measures designed to rein in excessive extra-billing include a new voluntary “Access to health care” contract. In exchange for maintaining their extra-billing fee practices at 2012 levels, doctors benefit from social and fiscal advantages.

In 2012, total expenditure on health in France was estimated at €243 billion or 12% of gross domestic product (GDP). Expenditure on personal health care accounted for three-quarters of total health expenditure (€183.6 billion), representing an average of €2806 per person. Of this, 75.5% was publicly funded, with complementary voluntary health insurance (VHI) financing 13.7% and households covering 9.6% in out-of-pocket costs. As in other European countries, health care expenditure has steadily increased. As a result, since the late 1990s, SHI annual expenditure has been capped by a national ceiling on SHI expenditure (objectif national des dépenses assurance maladie; ONDAM) approved by the parliament. It is split into subtargets that cover hospital expenditure, social and health care services for the elderly and disabled, and privately delivered care. While there is no formal allocation mechanism, this has provided SHI with a tool to allocate health care expenditure between broad sectors. If the health care system is found to exceed its projected budget by more than 1%, a special parliamentary Alert Committee can ask the head of the Directorate of Social Security (the watchdog for
all social security branches) to present a financial rescue plan.

Physical and Human Resources

In France, there is a high level of facilities, equipment, and other physical resources. However, there are strong disparities in geographic distribution, and France is well below the EU average for MRI units (7 per million population, compared to the EU23 average of 10.3) and CT scanners (11.8 per million population, compared to 20.4).

There are four main categories of hospitals: regional hospitals, general hospitals, local hospitals, and psychiatric hospitals. Capital investment is either covered by reimbursements for service delivery or funded through specific programs. Two nationwide investment plans were launched in the last decade in order to improve quality and safety standards. The ARSs are responsible for the control of capital investment and purchases of major medical equipment.

Following the general trend in European countries, the number of full-time acute beds per 1000 inhabitants has steadily declined over the last 20 years. In 2010, it was 6.4, which is above the EU27 average of 5.3. Reduction in acute care capacity was accompanied by the transformation of acute beds into rehabilitation and long-term care units and the development of day surgery and hospitalization at home.

Nurses and nursing aides form the largest group of professionals, accounting for approximately half of the health care workforce. Registered health professionals also include medical professionals (physicians, dentists, and midwives), pharmacists, professionals involved in rehabilitation (physiotherapists, speech therapists, vision therapists, psychomotor therapists, occupational therapists, and chiropodists) and technical paramedical professions (hearing aid specialists, opticians, and radiographers). The other professions usually identified as contributing to health care include clerical and technical staff working in hospitals, laboratory technicians, pediatric auxiliaries, dieticians, psychologists, and ambulance drivers.

About 7% of the French population works in the health care sector. The number of practicing doctors per 1000 population is slightly lower than the EU27 average (3.3 vs. 3.4), although in France the number includes not only those providing direct patient care, but also managers, educators, researchers, etc. The number of practicing nurses exceeds the EU27 average (8.5 vs. 7.9), and the ratio of nurses to physicians is 2.6, just above the EU average. Workforce forecasting and careful planning of educational capacity are mostly carried out at the national level through the use of numerus clausus for medical professionals. It seeks to prevent shortages or oversupply of health professionals. However, it does not control for the geographical distribution of medical professionals, as self-employed professionals are free to choose where they practice. In order to address the resulting great disparities in the distribution of medical professionals, there has been an increasing transfer of tasks from medical to other professionals such as nurses and the development of incentives for attracting health professionals to under-served areas.

Delivery of Health Services

The delivery of care is shared among private physicians, private profit-making hospitals, private non-profit-making hospitals, and public hospitals. In addition to the health care sector and the social sector, there is a so-called “third sector” which provides both care and social services to elderly and disabled people.

Primary Care

Primary care is mostly delivered in the ambulatory care sector by self-employed professionals who are paid on a fee-for-service basis by patients who receive partial reimbursement from the SHI funds (i.e., co-insurance payments apply). Since the late 1990s, GPs have gained a major role in the coordination of care with the implementation of a semi-gatekeeping system that provides incentives to people to visit their GP prior to consulting a specialist.
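
The cost-sharing rules described under Financing can be illustrated with a small calculation. The following Python sketch is not part of the original chapter. It assumes the co-insurance rates quoted in the text (30% for physician and dentist care, 40% for ancillary services and laboratory tests, 20% for hospitalization) and the €18 daily hospital catering fee. The function name patient_share, the category labels, and the example tariffs are hypothetical, and the sketch ignores drugs, extra-billing, deductibles, and any VHI reimbursement. It also assumes that an exemption (e.g., under the ALD scheme) waives only the co-insurance and not the daily fee, which the text does not specify.

```python
# Illustrative sketch only. Rates are those quoted in the chapter text;
# all names and the example tariffs below are hypothetical.

COINSURANCE_RATES = {
    "physician": 0.30,
    "dentist": 0.30,
    "ancillary": 0.40,
    "laboratory": 0.40,
    "hospital": 0.20,
}

# Flat hospital catering fee per day, borne by patients or their VHI.
HOSPITAL_DAILY_FEE_EUR = 18.0


def patient_share(category, tariff_eur, hospital_days=0, exempt=False):
    """Amount (in euros) not covered by SHI for one episode of care.

    category      -- one of the keys of COINSURANCE_RATES
    tariff_eur    -- official tariff on which reimbursement is based
    hospital_days -- days billed the catering fee (hospital care only)
    exempt        -- True if the patient is exempted from co-insurance
                     (assumption: the daily fee still applies)
    """
    if category not in COINSURANCE_RATES:
        raise ValueError(f"unknown category: {category}")
    rate = 0.0 if exempt else COINSURANCE_RATES[category]
    coinsurance = rate * tariff_eur
    daily_fee = HOSPITAL_DAILY_FEE_EUR * hospital_days if category == "hospital" else 0.0
    return round(coinsurance + daily_fee, 2)


if __name__ == "__main__":
    # Hypothetical tariffs chosen for easy arithmetic.
    print(patient_share("physician", 100.0))
    print(patient_share("hospital", 1000.0, hospital_days=3))
    print(patient_share("hospital", 1000.0, hospital_days=3, exempt=True))
```

Under these assumptions, a three-day hospital stay with a €1000 tariff leaves €200 in co-insurance plus €54 in catering fees to the patient or their VHI, whereas an exempted patient would face only the €54 fee.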
Hospital Care

Hospital care is delivered by public, private non-profit-making, and private profit-making hospitals. Acute medical, surgical, and obstetric care is provided by public as well as private hospitals, with different areas of specialization.

Acute medical care is mainly provided by public hospitals, which account for three-quarters of acute medical care capacity (80% of medical beds and 70% of day-care beds) and perform 75% of full-time episodes and 55% of day-care episodes. Private profit-making hospitals account for 10% of full-time beds and 20% of day-care beds, and they provide 15% of full-time episodes and 40% of day-care episodes; they specialize in a small number of technical procedures for which there are profit opportunities, such as invasive diagnostic procedures (e.g., endoscopies or angiograms). The balance of the acute medical activity is performed by the private non-profit-making sector, which is the main provider in the area of cancer treatment.

Surgical care is mainly delivered by private profit-making hospitals, which perform more than half of all surgical procedures, including 75% of the surgical episodes performed in day-care settings. Surgical care accordingly represents more than half of the acute care activity of the private profit-making sector. These hospitals tend to specialize in procedures that can be performed routinely within a short stay with a predictable length; for example, they perform three-quarters of surgery for cataracts and varicose veins and two-thirds of surgery for carpal tunnel syndrome. Public hospitals perform a third of surgical procedures, with a much wider scope than profit-making hospitals, including the most complex procedures. Surgical procedures performed in the private non-profit-making sector are mostly related to cancer treatment, as for medical stays.

Two-thirds of obstetric procedures are performed within public hospitals, while the private sector accounts for the remaining third, mainly within profit-making hospitals, which account for one-quarter of all obstetrical stays.

Because of concerns about excess acute care capacity, alternatives to full-time inpatient care have been promoted since the late 1980s. Specifically, authorizations to develop “hospital at home” (hospitalisation à domicile; HAD) units, as well as ambulatory care places, have been granted in return for reducing the number of acute beds.

HAD units, which have existed in France for about 50 years, send medical or paramedical staff to the patient’s home on a daily basis in order to provide continuous and coordinated care in cases where a hospital stay would have been otherwise necessary. Administratively, the units are either hospital departments or private, mainly non-profit-making, associations. Each unit is led by a physician, who takes responsibility for the overall coordination of medical care, while nurses coordinate individual treatments; actual care is provided by salaried staff from the hospital or self-employed professionals. In 2011, there were about 305 HAD units that cared for more than 100,000 patients, mainly in the areas of palliative care, cancer treatment, and perinatal care.

Ambulatory surgery accounted for only 40% of surgical hospital stays in France in 2011, compared to nearly 80% in the UK. The Minister of Health has set a target for ambulatory surgeries to exceed 50% of all surgeries by 2016.

Integrated Care

The 2002 Patients’ Rights and Quality of Care Act brought together diverse provider network initiatives under the concept of “health networks” with the aim of strengthening coordination and continuity through the interdisciplinary provision of care, particularly for selected population groups and targeted diseases. The disease management provided by these networks also includes experimentation with new models of care delivery (e.g., nurses performing tasks previously reserved for doctors). Participation is voluntary both for patients and providers. Patients may benefit from services not usually covered by SHI (e.g., podiatric care and dietary advice for diabetics), and physicians may be reimbursed for preventive
services and patient education not otherwise covered. Physicians receive additional
compensation for coordinating the care of patients with certain chronic diseases (€40 per
patient per year).

Long-Term Care

Long-term care for elderly and disabled is provided through both residential care and
home care and falls under the third sector, which combines elements of medical and social
care. The French population aged over 75 years is expected to nearly double by 2050, when
it will constitute 15.6% of the population compared to 8% today. Thus, there is an
increasing need for long-term care services for frail elderly persons at home or in
nursing facilities or other residential care settings. In 2010, French long-term care
spending was estimated at €34 billion, or 1.73% of GDP, of which 70% was publicly funded.

Home care is mainly provided by self-employed physicians and nurses and, to a lesser
extent, by community nursing services (services de soins infirmiers à domicile; SSIAD),
which deliver nursing care at home mainly using employed auxiliary nurses and to a lesser
extent nurses, who are mostly self-employed.

Residential care for elderly people is provided by many types of institution offering
different levels of service. These include collective housing facilities (foyers
logements), offering a range of nonmedical facilities (such as catering and laundry) and
almost no medical care; retirement homes (établissements d’hébergement pour personnes
âgées; EHPA), which accommodate the elderly but also offer medical care; and long-term
care units (unités de soins de longue durée; USLD), which accommodate people whose care
requires constant medical monitoring. These units are provided in autonomous nursing homes
or in hospital wards for very sick and dependent people.

In the early 2000s, intermediary services were created to receive frail elderly persons
not living in residential services for short periods. They care for patients on a daily
basis (accueil de jour) or on a temporary basis (accueil temporaire) with the goal of
offering respite care for families and day care for patients with Alzheimer’s disease and
other dementias.

Disabled Adults and Children

About 3.2 million people are registered as disabled in France, of whom 1.8 million are
affected by a severe disability that limits their functional autonomy. Disability is
measured in terms of an incapacity rate, which takes into account the degree of difficulty
with daily living. Specific committees for children and for adults at the department
level evaluate the rate of incapacity and determine the right to certain benefits. They
also have the authority to refer the disabled person to a specialized institution.

Around 200,000 disabled adults are accommodated in 4800 dedicated facilities. Different
institutions provide a range of services for disabled adults with different levels of
functional autonomy. Nearly 130,000 disabled children are cared for in 2500 facilities. A
large number of institutions offer treatment, special education, and vocational training
to children affected by motor, cerebral, or intellectual disabilities.

Disabled individuals may be eligible for monetary allowances. The disability compensation
allowance (prestation de compensation du handicap; PCH) may be used to finance the wages
of aides to disabled people or their families or any necessary technical devices. The
allowance is funded by the general councils, the CNSA, and the CSG funds and is not means
tested.

Mental Health Care

Mental health care is delivered by both the health sector and the social and health care
sector. As in many other European countries, mental health care policy in France during
the second half of the twentieth century was influenced by a general movement towards
community-based organization of mental health care services, the so-called
“deinstitutionalization” process.

Services provided by the health sector take the
form of both public and private outpatient and inpatient care.

Adult public mental health care is provided within around 800 geographical areas that
cover theoretically equivalent populations of approximately 60,000 inhabitants aged 16 or
more, called mental health care areas (secteurs de soins de santé mentale; MHC). Care
within each area is coordinated by a hospital (a public hospital in more than 90% of the
cases) and includes a wide range of preventive, diagnostic, and therapeutic services,
which are provided in both inpatient and outpatient settings. In particular, ambulatory
care centers (centres médico-psychologiques; CMP) are present in almost every MHC area;
they provide primary ambulatory mental health care, including home visits, and direct the
patients towards appropriate services. The size and resources of MHC areas are quite
heterogeneous.

Public mental health care for children follows a similar territorial organization, with
321 areas covering an average of 46,000 people aged under 20 years (corresponding to an
average of 210,000 inhabitants). These MHC areas for children show even wider
geographical inequalities.

Pharmaceutical Care

France is the fourth largest market for pharmaceutical drugs in the world and the second
in Europe after Germany. Drugs are dispensed by self-employed pharmacists, while the price
of drugs is set administratively for all drugs covered by SHI. Pharmacies have a monopoly
on the dispensing of medicines. As a general rule, retail pharmacies must be owned by a
qualified pharmacist or by a group of pharmacists associated in a company; these
pharmacists or companies cannot be proprietors of more than one pharmacy. The number of
pharmacies is regulated by a numerus clausus that takes into account both the size of the
population to be served and the distance involved in getting to the nearest pharmacy.
There were about 22,000 retail pharmacies in 2012. Since June 2008, pharmacies have been
allowed to sell a limited range of nonprescription drugs “over the counter” on shelves
directly accessible to patients.

A number of measures have been taken to try to improve and limit the prescribing behavior
of physicians as well as the consumption patterns of patients. The promotion of generic
drugs, largely nonexistent until recently owing to the relatively low price of drugs in
France, first occurred in the 1990s. The rate of generic substitution increased to 83% in
2012 from 76% in 2011. The volume of drug consumption has slowed since 2010 due to fewer
prescriptions, the effect of publicity campaigns, including those to reduce antibiotic
use, and removal of certain drugs from the positive list.

Public Health

Public health policy and practice in France have historically been difficult to describe
because they involve numerous actors and sources of funding, and large discrepancies exist
between legislative texts and actual practice, which relies on the initiative of local
actors. The 2004 Public Health Act provided a new framework for public health policy,
firmly establishing the responsibility of the state in public health matters and
emphasizing the role of the regional level for organizational issues. The Act also created
a quantitative assessment framework for health policies encompassing public health
objectives for 5-year periods that must be monitored on an annual basis and set 5-year
targets for most of the related indicators. In order to meet some of these goals, several
national plans have been established, such as those related to cancer; violence,
addictions, and risky behaviors; environment and health; quality of life of patients with
chronic diseases; and the provision of health care for patients with rare diseases.

Reforms

The main objectives of the reforms to the health care system of the last decade were to
contain SHI expenditures without damaging equity in financial access, to increase
geographic equity in access to
care, and to meet the increasing demand for long-term care. Decentralization and a change
in the balance of power between the state and SHI were the main instruments used to
achieve these objectives.

To contain SHI expenditure, two categories of measures were used. The first, called the
“strict accounting cost-containment policy,” primarily focused on decreasing the size of
the benefit basket and levels of coverage, resulting in a shift towards VHI coverage.
After 2004, several new mechanisms were introduced. A coordinated care pathway was
implemented with higher co-insurance for patients consuming care out of this pathway, and
new categories of co-payment for patients were created with the introduction of
deductibles on some categories of care such as drug packages, doctor and nurse
consultations, or patient transportation. Finally, there was stricter control of
statutory tariffs, and starting in 2013 economic considerations have been introduced in
health technology assessment of innovations.

The second category of measures was called the “medically based cost-containment policy”;
it was developed in the 1990s after a long period of strict accounting policies that led
to ongoing conflicts between doctors and SHI. Medically based cost-containment focuses on
the reduction of financial and equity loss due to medical practice variations and aims to
improve medical practice. The main tools used are the implementation of lifelong learning,
the development of practice guidelines by national agencies, and the introduction of good
practice commitments within professionals’ collective agreements with SHI. At first,
coercive measures such as fines for not following continuous education were used to
enforce this new policy, but this was slowly abandoned for a move towards the development
of incentives, most recently the introduction of payment for performance for individual
doctors based on meeting good practice targets. Overall, it appears that the coercive
medically based cost-containment policy did not lead to major improvements in collective
practice and much is expected from the pay-for-performance approach.

In order to facilitate geographical equity in access to care, the HPST reinforced local
planning and simplified regional governance of the health care system by creating the
ARSs. In addition to creating the PRS, which should lead to a common approach in planning
for the hospital, ambulatory, and health and social care sectors, it made formal legal
provisions for the transfer of tasks between professionals. It also linked the regional
medical numerus clausus to needs. In order to optimize the distribution of doctors without
impairing freedom of settlement, incentives to increase the attractiveness of
underrepresented specialties and medically under-served areas are being developed. For
instance, wages for hospital doctors will possibly increase in contexts where there is a
high need for their specialties, and contracts with medical students and self-employed
health professionals with financial incentives to practice in under-served areas will be
implemented on a voluntary basis.

The increasing demand for long-term care is a major concern, as the need for public
funding in the coming decades is estimated to be three times higher than the expected
growth of the population, thereby threatening equity in financing. Since 2005, various
financing reform proposals have been debated, ranging from a newly covered risk under the
social security system to targeted subsidies for private long-term care insurance.
However, to date no reform measure has been enacted.

Assessment

The French health care system has long enjoyed the reputation of being one of the best in
the world. It has become synonymous with universal health coverage and a generous supply
of health services. This reputation comes in large part from success in meeting its goals
of full coverage, access without waiting lists, patient choice, and satisfaction. The
combination of a basic universal public health insurance system and voluntary
complementary private insurance, which provides reimbursement for co-payments required by
the public system as well as coverage for medical goods and services that are poorly
covered by the public system, results in low out-of-pocket costs and high medical care
utilization. France’s average life
expectancy of over 80 years is in part testament to the strong combination of good health
care and good public health policies in France.

Despite these positives, there also are some shortcomings, especially when considering
efficiency and socioeconomic inequality in health. Major problems include lack of
coordination between hospital and ambulatory services, between private and public
provision of care, and between health care and public health. Health expenditures per
capita are higher than the OECD average, ranking usually third or fourth after the United
States, Germany, and Switzerland, depending on the data used and year. The high level of
health expenditure has become increasingly important at a time when the public system is
facing chronic deficits, which are likely to increase with the current economic downturn.
Health System in Japan
36
Ryozo Matsuda
Contents
Introduction
Stewardship/Governance in Health System
Dimensions of Coverage (Breadth, Scope, Depth)
Typologies of Health System
Regulating and Planning; Actors and Responsibilities
Financing
Sources and Collection of Revenue
Pooling of Funds and Resource Allocation
Purchasing Process and Paying for Health Services
Health Spending
Intermediate Care Facilities
The Health Workforce
Provision of Services: Providers, Services, Access, and Quality
Public Health
Primary Care/Ambulatory Care
Specialized Ambulatory Care/Hospital Care
Long-Term Care
Dental Care
Complementary and Alternative Medicines
Assessment
References
R. Matsuda (*)
Ritsumeikan University, Kyoto, Japan
e-mail: ryozo.matsuda@gmail.com

https://doi.org/10.1007/978-1-4939-8715-3_45
Abstract

The health-care system in Japan has been based on the Statutory Health Insurance System,
consisting of more than 3,000 community-based and employment-based insurance plans with
significant subsidies from the general budget. The system, supplemented by the Public
Assistance Program, covers all residents for most medical and dental services. The
national government decides its benefit basket and the prices of covered services and
pharmaceuticals after negotiations with providers and insurance organizations. Two-tiered
local governments are involved in regulating the system, developing supplementary
measures, and providing public health services. Patients are free to choose providers when
they use health services. Physicians are not differentiated into general physicians and
specialists: ambulatory care is provided both at clinics and at hospital outpatient
departments. With different mixes of providers in different regions, the government has
been developing regional regulations. The system works fairly well: access to health care
is good, though financial and geographical barriers have been occasionally reported,
particularly in the era of increasing poverty. Mechanisms to monitor and regulate quality
of care are becoming more important with increasing pressures on resources.

Introduction

The health-care system in Japan has been based on a combination of community-based and
employment-based statutory health insurances with significant subsidies from the general
budget. The system, which is called the statutory health insurance system (SHIS) here, has
been governed at the national level, although local governments have been heavily involved
in the system by operating community-based health insurances and implementing regulations
on health-care providers.

The current system is an accumulation of layers that were molded in different periods and
contexts. In the nineteenth century, when Japan adapted to the changing world order and
economy, health care there changed drastically with the introduction of Western medicine.
The expansion of health-care services and the growing population of waged workers up to
the 1910s led to the establishment of statutory health insurance for workers, an idea
learned from Germany. Facing poverty and sickness, particularly in rural areas with
traditions of mutual assistance, the government created a legal framework to establish
statutory community-based insurances in the 1930s. The framework enabled a municipality to
establish first voluntary and then, in the 1940s, compulsory health insurance for the
uninsured residents in its jurisdiction (Ikegami et al. 2011; Tatara and Okamoto 2009).
The framework worked well until the national economy deteriorated during World War II.

During the occupation by the Allied Nations from 1945 to 1952, new ideas and measures,
including hospital planning, were developed under the influence of the United States. The
government expanded eligibility for the employment-based health insurance and imposed the
obligation of establishing compulsory health insurance on all municipalities by 1961. The
two types of insurance have been the main components of health care funding.

The health system in Japan has been principally universal since the implementation of the
community-based compulsory statutory insurances across the country in 1961. The
health-care services, pharmaceuticals, and medical devices covered, as well as the
coverage rates, are almost the same across all statutory health insurances, although some
insurances provide additional coverage for, e.g., preventive medical examinations. The
rules of payment for providers are also common across insurances.

Delivery of health care has been market-oriented under government regulation. Meanwhile,
local governments have established their own hospitals on their own initiative, supported
by subsidies from the national government. Private providers are principally supposed to
behave as not-for-profits, although some for-profit companies have hospitals that began
their operation
before the establishment of that principle. To administer the system, the national and
local governments have developed complex regulations and incentives over more than half a
century.

Following the structure given by the volume, this case study focuses on basic issues of
the health system and mostly excludes detailed descriptions of differences between
statutory insurances and of innovations in policy making. Also, its descriptions are
limited to the period up to 2013. In translating from Japanese, terms have been chosen to
be clear to international readers, with reference to previous articles (Ikegami et al.
2011); some English names of insurance plans, acts, and organizations differ from the
official translations.

Organization and Governance

Stewardship/Governance in Health System

Health-care policies are developed predominantly by the national government with
involvement of concerned actors, including statutory health insurers, medical professions,
and experts. Within the government, the ruling party, the Cabinet, and the Ministry of
Health, Labour and Welfare (MHLW) as well as the Ministry of Finance and other ministries
join health policy making. Statutory and non-statutory councils and committees of the
government, involving concerned actors, usually discuss policy options to build consensus
for enacting legislation and ministerial ordinances (Rodwin 2011).

The national government develops and enforces laws and regulations on health and long-term
care. Coverage rates and policies are usually decided through bills that must be passed by
the National Diet. The Social Security Council within the MHLW develops national
strategies on quality and safety, cost control, and payment reforms in health care. The
Minister of Health, Labour and Welfare decides the services covered and their prices, the
pharmaceuticals covered and the rule for deciding the price of each pharmaceutical, and
other payment rules in the statutory health insurance systems. These decisions are usually
made in accordance with those of the Central Social Insurance Medical Council, which is a
major arena for policy debates with representatives from insurers, providers, ministry
officials, researchers, and other experts. Meanwhile, technology assessment of
pharmaceuticals and medical devices is conducted by the Pharmaceutical and Medical Devices
Agency, a regulatory agency of the government.

Two-tiered local governments implement policies established by the national government as
well as develop their own policies. Forty-seven prefectures, at the upper level, develop
strategies for health-care development and health promotion, implement regulations on
health facilities, monitor activities of providers, and collect data on health and health
care. There are more than 1,700 municipalities at the lower level, each of which operates
its community-based health insurance for residents who are not covered by other statutory
insurers, as well as the long-term care insurance. Prefectures and municipalities also
implement regulations on clinics and home care providers and hospitals, respectively.
Meanwhile, since local governments have broad authority to develop new policies as long as
these do not conflict with current law, they occasionally develop innovative policies for
collaboration and supplemental measures to decrease cost-sharing for children in their
jurisdictions.

Dimensions of Coverage (Breadth, Scope, Depth)

The SHIS covers all residents in Japan except those with social assistance (or livelihood
protection) and some exceptional cases. In practice, the insurance is operated through the
following three types of compulsory insurance: employment-based health insurance (EHI),
community-based health insurance (CHI), and health insurance for elderly (HIE).

The EHI covers employees and their dependents under age 75. It is operated by more than
1,400 societies established at large companies for their employees, by more than 75 mutual
aid associations for public servants and other defined
groups, and by the National Health Insurance Association (NHIA) for those working at
medium to small companies (Ikegami et al. 2011). Some groups of professionals (e.g.,
doctors in private practice) are covered by associations that they have established for
that purpose.

Municipalities are in charge of administering both the CHI and the HIE, but in different
ways. They operate the CHI by themselves, while they delegate their responsibilities for
the HIE to 47 statutory insurers, also purposely established, at the prefecture level. An
HIE insurer is governed by representatives of all municipalities in a prefecture.

The SHIS provides the same national benefit package, which covers hospital care,
ambulatory care, mental health care, approved prescription drugs, home care,
physiotherapy, and most dental care. Health checks, health education, and counseling are
delivered by statutory insurers to those aged 40 and older. Social assistance provides
similar coverage. Cancer screenings are delivered by municipalities outside the SHIS.

The co-payment rate is 30% in general but 20% for children under 3 years old and 10% for
people aged 70 and over with lower incomes. To mitigate high financial burdens,
catastrophic insurance covers most co-payments over a monthly threshold, which varies
according to the enrollee’s age and income. Also, cost-sharing is reduced for those with
low income, disabilities, mental illness, and specified chronic conditions. A part of
expenditure on health services and goods can be deducted from taxable income.

Providers are prohibited from charging extra fees in general, although they can charge
extra fees for some services specified by the MHLW, including amenity beds, “experimental
treatments,” the outpatient services of large multi-specialty hospitals, after-hours
services, and hospitalizations of 180 days or more.

Catastrophic coverage stipulates a monthly out-of-pocket threshold, which varies according
to enrollee age and income (e.g., 80,100 yen for people under age 75 with an average
income); above this threshold, a 1% co-payment rate is applied. Alternatively, the
threshold works as a ceiling for low-income people, who did not pay more than 35,400 yen a
month in 2013.

Since 2000, long-term care insurance (LTCI) has covered all residents aged 40 and over. It
is compulsory and covers both institutional care and home care. The co-payment rate was
10% in 2013.

Typologies of Health System

The health system in Japan is principally a social insurance-based system, but since the
government has been involved in making decisions on some details of the system and more
than a third of its funds comes from tax, state involvement is far stronger than in most
social insurance-based systems in Western countries (Blank and Burau 2010).

On the one hand, it has been partly based on statutory health insurances: the EHIs are
funded by contributions from both employees and employers, and the CHI by contributions
from beneficiaries and subsidies from tax. On the other hand, the government has held
strong power, particularly in deciding the payment system and levels. Although the system
is operated by more than 3,000 statutory insurers, financing administration is highly
concentrated, with little discretion left to each insurer except on limited issues.
Provision of health-care services is based on market mechanisms without gatekeeping
mechanisms, where public and not-for-profit providers compete with each other as well as
collaborate.

Lee et al. (2008) describe the system as a hybrid model between social health insurance
and national health insurance, where the financing administration of the health system is
concentrated in a national entity and the private sector is dominant in health-care
provision.

Private voluntary health insurance, historically developed as a supplement to life
insurance, appears to play a marginal role (Paris et al. 2010). Traditional plans usually
pay a lump sum when insured persons are hospitalized over a defined period and/or
diagnosed with cancer or any of the other specified chronic diseases. In the last decades,
however, the variety of complementary private insurance policies, sold separately from
life insurance, has been increasing.
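
The cost-sharing rules described under Dimensions of Coverage (a 30% general co-payment rate and a monthly catastrophic threshold) can be illustrated with a short calculation. The Python sketch below is illustrative only and is not an official formula: the function name and its parameters are hypothetical, and it assumes that the 1% rate applies to the part of monthly costs beyond the point at which the regular co-payment reaches the threshold, which is one reading of the rule as stated in the text. For low-income enrollees the threshold is treated as a hard ceiling.

```python
def monthly_copayment(total_cost_yen, rate_pct=30, threshold_yen=80_100,
                      marginal_pct=1, hard_ceiling=False):
    """Estimate one month's out-of-pocket payment in yen (illustrative only).

    total_cost_yen: total cost of covered services used in the month
    rate_pct:       regular co-payment rate in percent (30 in general)
    threshold_yen:  monthly out-of-pocket threshold (80,100 yen in the text's
                    example; 35,400 yen for low-income people in 2013)
    marginal_pct:   co-payment rate assumed to apply above the threshold
    hard_ceiling:   if True, the threshold is an absolute ceiling
    """
    regular = total_cost_yen * rate_pct / 100
    if regular <= threshold_yen:
        # Below the threshold the regular co-payment applies unchanged.
        return regular
    if hard_ceiling:
        # Low-income case: nothing is paid beyond the ceiling.
        return float(threshold_yen)
    # Assumption: the marginal rate applies only to costs beyond the level
    # at which the regular co-payment equals the threshold.
    cost_at_threshold = threshold_yen * 100 / rate_pct
    extra = (total_cost_yen - cost_at_threshold) * marginal_pct / 100
    return threshold_yen + extra


# Hypothetical monthly costs, chosen only to exercise each branch.
print(monthly_copayment(100_000))
print(monthly_copayment(1_000_000))
print(monthly_copayment(1_000_000, threshold_yen=35_400, hard_ceiling=True))
```

Under these assumptions, a month with 100,000 yen of covered care costs the patient the regular 30%, while in a month with 1,000,000 yen of care the patient pays the threshold plus 1% of the cost above 267,000 yen (the cost level at which a 30% co-payment equals 80,100 yen).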
Regulating and Planning; Actors and Responsibilities
The national, prefectural, and municipal governments regulate health care and conduct planning activities in various fields within the SHIS, as structured by law. Statutory insurers are responsible for operating within the framework and regulations stipulated by acts and ordinances.
The national government decides which services and pharmaceuticals are covered by the SHIS and the rules for paying for them. It revises the rules every 2 years by building consensus between providers, insurance organizations, and experts in health policy. Once the rules are proclaimed, they are valid countrywide in the SHIS. The government also sets requirements and quality standards for health-care facilities, most of which local governments enforce in their jurisdictions.
The national government is both required and empowered to develop plans on health promotion and health care. Their objectives include promotion of healthy behavior and environment, higher utilization of personal preventive services, increased efficiency in health-care delivery, and higher utilization of generic drugs (OECD 2009). It also issues guidelines for implementing regulations. It directly supervises the operation of the largest insurer, the NHIA. Seven regional bureaus of health and welfare supervise the operation of the insurance societies, local branches of the NHIA, and the CHI insurers.
Prefectural governments supervise and support the CHI insurers in their jurisdictions in both financial and technical terms. They shall develop their own plans on health promotion and health care and are usually expected to consider policy and technical guidelines developed by the national government. Each prefecture shall develop and publish a health-care plan for its jurisdiction, which shall include an assessment of needs, directions for strategic development, and descriptions of providers. The power and capacity of prefectures to implement the plans have been limited to placing a cap on hospital beds (Hashimoto et al. 2011). Prefectures implement regulations on the quality of hospital services and can develop their own policy measures, including subsidies and regulations, within their budgets. Prefectures shall have public health centers, to which many of the regulatory responsibilities concerning health care and public health in their jurisdictions are usually delegated from the governors’ offices (Tatara and Okamoto 2009).
Municipalities are responsible for operating the CHI and the LTCI, delivering home and welfare services, and promoting health in the population. Large cities with more autonomy than ordinary municipalities shall establish public health centers.
The public can participate at every level of political decision-making. In the last two decades, key committees concerned with health care have become more likely to include members who put patients’ interests first.
Financing
Sources and Collection of Revenue
In 2010, 82.1% of total health expenditure was financed through the SHIS and 14.4% by out-of-pocket payments (OECD 2013). The national and local governments paid around a quarter and a ninth of national health spending, respectively. Contributions are collected by each insurer. Each CHI insurer decides its own, often complex, method of calculating premiums for households. Usually it is based on the number of CHI members in the household and the members’ household income. Rates, therefore, vary between municipalities. Each HIE insurer, organized at the prefectural level, levies premiums on a per-capita and income basis.
The EHI insurers levy premiums on wages. Employers pay half of these premiums for their employees. Premium rates of the EHI societies vary between 3% and 10% of income, whereas rates of the NHIA, which differ between branches, are around 10%.
There are various types of direct and indirect taxes at both the national and local levels, politically controlled. By law, the national
842 R. Matsuda
and local governments are obliged to pay funds to the SHIS, calculated from actual spending.
Pooling of Funds and Resource Allocation
Each insurer in the SHIS is expected to be financially healthy. Subsidies from the national and local governments are granted mainly to the CHI insurers and the HIE insurers and, to a lesser extent, the JHIA. There are cross-subsidies from the CHI and the HI insurers to the HIE insurers, calculated by factoring in the number of enrollees aged 65–74.
Purchasing Process and Paying for Health Services
Providers are paid under a national payment rule, which combines various kinds of activity-based funding methods: fee-for-service payments, per-diem payments, and monthly payments for chronic outpatient care.
Providers send claims for the CHIs to the Central Federation of National Health Insurance and claims for the EHIs to the Health Insurance Claims Review & Reimbursement Services, a statutory body that manages claims in the SHIS.
Health Spending
Total health expenditure (THE) as a percentage of GDP is similar to the OECD average. It has increased continuously in recent decades. In 2010, 63.3%, 9.1%, 21.4%, 3.0%, and 1.6% of the THE were spent on curative and rehabilitative care, long-term nursing care, medical goods, prevention and public health services, and administration, respectively (OECD 2013). Hospitals, nursing and residential care facilities, and ambulatory care providers accounted for 47.1%, 3.8%, and 27.1% of the THE, respectively. More than 20% of the THE was spent on pharmaceuticals.
Increasing health-care demand, partly due to demographic changes and the introduction of new technologies, is considered a cost driver in the Japanese health-care system (Ikegami and Anderson 2012).
Physical and Human Resources
Physical Resources
Hospitals, clinics, intermediary facilities, long-term care facilities, and other facilities have developed. The numbers of hospitals and hospital beds per population are high compared with other OECD countries (Tatara and Okamoto 2009). Health facilities are owned and managed both publicly and privately. Private providers include health facilities owned by physicians as well as medical corporations, which are not-for-profit private legal entities for health-care provision, usually controlled by physicians.
To decrease geographical variations, the national government increased the number of medical courses in the 1960s and 1970s, with a policy aiming for every prefecture to have at least one university with a medical faculty and teaching hospital. Also, since 1956, the government has developed and implemented its Rural Healthcare Plan with subsidies to local governments.
Health facilities need to announce their specialties and/or subspecialties, such as “internal medicine,” “surgery,” “orthopedics,” and “circulatory medicine.” The nomenclature of specialties and subspecialties that can be announced is regulated by the government. It has not so far included “general practice,” “family practice,” or “primary care.” Whether to make “general practice” or “primary care” recognizable has been discussed recently.
Health facilities can install licensed medical devices with their own resources and, in some cases, with subsidies from the governments. Since there have been no regulations on their diffusion, magnetic resonance imaging (MRI) and computed tomography (CT) scanners have spread widely (Anderson et al. 2005).
A hospital is defined by law as a health-care facility that provides medical care with at least 20 beds. Hospitals provide both inpatient care and outpatient care, particularly specialist outpatient care. Hospitals are either publicly or privately owned and/or operated. A fifth of hospitals are publicly owned, accounting for 30% of hospital beds. Small hospitals are common: half of hospitals have fewer than 150 beds. Psychiatric care and long-term care each account for around a fifth of hospital beds. The number of hospital beds is about four times larger than the OECD average (OECD 2009).
The government sets standards on health workforce, buildings, room space, instruments, and other necessities of hospitals. Hospitals need permission from prefectural governments to increase their number of beds. Each prefecture has a plan on the number of hospital beds in its jurisdiction, according to which it can deny applications from hospitals.
As of 2012, there are three types of hospitals: (usual) hospitals, psychiatric hospitals, and infectious disease hospitals. The standards vary between the types.
Prefectural governments designate 378 hospitals, making up approximately 5% of general hospitals, as “community hub hospitals,” which shall operate in close connection with community physicians working at clinics.
A clinic is defined by law as a health-care facility that provides medical or dental care with no beds or with fewer than 20 beds for inpatient care. Most clinics are privately owned and operated. In 2010, only 4.9% of general clinics (clinics excluding dental clinics) were operated by public bodies (Health Statistics Office, Ministry of Health, Labour and Welfare 2011), although their presence is critical in rural areas (Matsumoto et al. 2010). Clinics have a variety of medical functions: most clinics in practice provide primary care, but some provide specialist care. For example, 0.4%, 4.0%, 0.2%, 3.7%, 3.3%, and 11.7% of clinics announce hematology, rheumatology, respiratory surgery, urology, proctological surgery, and dermatology, respectively, as one of their specialties (Health Statistics Office, Ministry of Health, Labour and Welfare 2012b).
Intermediate Care Facilities
In the mid-1980s, a new type of health care facility was created to provide “intermediate care” between the hospital and the community (Ishizaki et al. 1998; Ikegami et al. 2003). Most services at the new facilities were covered by the SHIS until 2000, when the coverage was transferred to the LTCI. Meanwhile, a new interdisciplinary post-acute rehabilitation unit has been incorporated in the SHIS (Miyai et al. 2011). Measures to develop community-based integrated care have been progressively implemented in the 2010s (Tsutsui 2014).
The Health Workforce
Physicians
Anyone without a license given by the government is prohibited by law from using the title “physician” (Ishi). A physician license is given to those who pass the national medical board examination after graduating from medical courses at universities and colleges. The capacity of those courses is strictly regulated by the government. Physicians who have just passed the board examination must undergo mandatory 2-year training aimed at developing general clinical knowledge and skills (Teo 2007). After that, they can in principle practice freely but usually continue training in various specialties (Teo 2007).
Most physicians working at hospitals are employed by the hospitals and receive salaries. Contracts can be concluded either individually or collectively through labor unions. Those salaries are usually not related to payments to hospitals from the SHIS.
Physicians working at clinics are usually the owners and are responsible for overall management in addition to clinical issues. After paying the costs of operating their clinics, including human resources, buildings, and instruments, they can in principle decide how to use the remaining revenue.
Nurses and Other Co-medical Staff
There are two qualifications in nursing: registered nurses and assistant nurses, both of whom need licenses, awarded by the national or prefectural governments, to practice. Most nurses are
employed and receive salaries from their employers. Approximately 60% of nurses work at hospitals, and most of the others at clinics. Some nurses operate home nursing service providers, in which case they earn money as owners of those providers. The Japanese Nursing Association has developed certification programs (Japanese Nursing Association 2011). Public health nurses, who are supposed to work in the field of public health, and midwives also need licenses to practice. Candidates for these two professions must hold a nursing qualification and take additional courses. Qualifications for long-term care, including home helper and care worker at care institutions, exist alongside nursing qualifications.
Other qualified professionals in health care include physical therapists, occupational therapists, radiology technologists, and clinical medical technologists. For alternative medicines, licenses are needed to practice therapeutic massage, acupuncture, moxa cautery, and judo chiropractic treatment (Tatara and Okamoto 2009).
Provision of Services: Providers, Services, Access, and Quality
Public Health
Public health administration has been part of the general administrative structure of the governments and has been separate from the SHIS. Under national legislation, prefectural governments are responsible for public health and environmental health in their jurisdictions (Tatara and Okamoto 2011). Large cities, designated by ministerial ordinances, also have the same responsibility. Those prefectures and cities are also obliged to establish and operate public health centers, and they delegate most of their responsibilities and powers on public health to the directors of those centers.
Municipalities delivered almost all personal preventive services, including vaccination, health checks, and cancer screenings, until 2008. Since the 2008 Reform, statutory health insurers have delivered health checks and behavioral modification programs, while municipalities continue to deliver other personal preventive services (Matsuda 2008). The aim of current health checks, delivered by the insurers, is not to check general health but to detect possible metabolic syndrome so that insurers can intervene to decrease health-care expenditures. The government established targets for uptake rates of health checks and introduced a financial incentive: insurers that fail to achieve the target have to pay more cross-subsidies to the HIE.
Regarding health promotion, the national government has a national plan and strategies for health promotion, “Health Japan 21,” and municipalities organize health activities for their residents using their local health centers.
Primary Care/Ambulatory Care
Ambulatory care is provided by clinics and hospital outpatient departments. The number of ambulatory patients at medical clinics is 2.5 times that at hospitals (Health Statistics Office, Ministry of Health, Labour and Welfare 2012a).
Since physicians in Japan are trained as specialists and primary care or family medicine has not been established as a specialty in clinical medicine, it is difficult to distinguish primary care physicians, although it is easy to recognize such specialists as ophthalmologists, otolaryngologists, and dermatologists. It has been argued that “general practice” should be a specialty and be included in the nomenclature of specialties (Matsuda 2008).
There is no gate-keeping. Patients are free to choose either clinics or outpatient departments of hospitals when they need medical consultations. Meanwhile, highly specialized hospitals can levy extra charges when patients visit them without referral from other providers. Physicians at clinics or outpatient departments deal with first-contact patients, although their performance might not be satisfactory by the standards of trained family physicians.
Specialized Ambulatory Care/Hospital Care
Specialized ambulatory care is provided both at clinics and at outpatient departments of hospitals.
Patients can directly use the care without referral in pharmacies in the community, which dispense pre-
principle, although they shall pay extra charges for scribed pharmaceuticals to patients. Some pharma-
the direct utilization. There has been a financial cies operate only for prescribed pharmaceuticals in
incentive to avoid direct utilization of patients of the SHIS; the others sell OTC drugs and other
specialist care: hospitals with highly specialized goods in addition to provision of prescribed phar-
care functions can charge extra fees to patients. maceuticals. There was a tradition that physicians
Hospitals vary in scale from small hospitals with dispense pharmaceuticals at their offices by them-
20 beds to large with more than 1,000 beds. selves in Japan and the tradition still has remained:
Remuneration for specialist physicians 41% of outpatient prescriptions were still dis-
depends on their status, i.e., whether they pensed by physicians in 2008 (OECD 2009).
are employed physicians or owners or executives Patients pay the same proportions of cost-shar-
of health-care organizations, as described above. ing for prescribed drugs as described above. Phar-
The payment method to hospital inpatient care macists can replace prescribed brand-name drugs
is based on their activities but has been gradually with generic drugs unless physicians explicitly
changing from payment on fee-for-service basis prohibit it on their prescriptions. Generic drugs
to payment on per-diem basis with case-mix count for 47.9% in its quantity and 11.4% in
modifications using the Diagnostic Procedure monetary terms among prescribed drugs dispensed
Combination (DPC), a case-mix classification at pharmacies.
system similar to the Diagnostic-Related Groups
(Matsuda et al. 2008; Okamura et al. 2005).
However, the payment system with the DPC is Long-Term Care
unusual because it includes both a DPC compo-
nent and a fee-for-service component. The former With the mandatory Long-Term Care Insurance,
is a per-diem payment that declines as the length established in 2000, persons with disabilities can
of the hospital stay increases and covers services use monthly budgets, allocated according to their
other than such specified services as surgical pro- assessed needs, to purchase long-term care ser-
cedures and rehabilitation basic charge, which are vices. Long-term care services are classified
covered by the fee-for-service component (OECD largely into institutionalized care and community
2009). A specific coefficient to multiply DPC care. The government prohibits private companies
rate for a hospital is determined in consideration to operate institutionalized care in the LTCI,
of different scales and functions of hospitals. although they can outside the LTCI. Most pro-
Hospitals using the DPCs must submit detailed viders of institutionalized care, therefore, are
data on their services. In 2012, more than half of not-for-profit organizations. Private for-profit
beds were paid with the DPC. The government companies can enter the community care market
uses the data to analyze hospital behaviors and and account for around half of all community care
impacts of financial incentives on them. providers (Olivares-Tirado and Tamiya 2013).
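The two-part structure of DPC-based inpatient payment described above (a per-diem component that declines with length of stay, scaled by a hospital-specific coefficient, plus a fee-for-service component for items such as surgical procedures) can be sketched as follows. This is only an illustration: the tier boundaries, daily rates, coefficient values, and function name are hypothetical and are not taken from the actual fee schedule.

```python
# Hypothetical declining per-diem schedule: (last day of tier, yen per day).
# None marks the final tier, which covers all remaining days.
EXAMPLE_TIERS = [(5, 30_000), (12, 22_000), (None, 18_000)]


def dpc_inpatient_payment(length_of_stay_days, per_diem_tiers,
                          hospital_coefficient, fee_for_service_yen):
    """Return the total payment in yen for one inpatient stay.

    per_diem_tiers must be ordered by day. fee_for_service_yen is a list
    of charges paid outside the per-diem component.
    """
    per_diem_total = 0
    first_day = 1
    for last_day, daily_rate in per_diem_tiers:
        if first_day > length_of_stay_days:
            break  # the whole stay has already been priced
        end = length_of_stay_days if last_day is None else min(last_day, length_of_stay_days)
        per_diem_total += (end - first_day + 1) * daily_rate
        first_day = end + 1
    # The hospital-specific coefficient scales only the per-diem (DPC) part.
    dpc_component = per_diem_total * hospital_coefficient
    # Items such as surgery are added on a fee-for-service basis.
    ffs_component = sum(fee_for_service_yen)
    return dpc_component + ffs_component


if __name__ == "__main__":
    # A 15-day stay with two fee-for-service items (hypothetical amounts).
    print(dpc_inpatient_payment(15, EXAMPLE_TIERS, 1.0, [200_000, 15_000]))
```

Because later days fall into lower tiers, the average per-diem payment falls as the stay lengthens, which is the incentive the text attributes to the per-diem component.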
Integration or coordination of care has been
emphasized in health and long-term care policy.
Particular policies toward integration of care Mental Health Care
include development of disease-oriented clinical
care pathways (Okamoto et al. 2011). Psychiatric hospitals, psychiatric departments
of general hospitals, and psychiatric clinics pro-
vide mental health care covered by the SHIS.
Pharmaceuticals In addition to those providers, prefectures have
mental health centers, which are mostly funded
Prescribed pharmaceuticals for outpatients with tax, to support providers with expertise
and inpatients are covered by the SHIS. In princi- and develop collaboration between concerned
ple, patients bring prescriptions of physician to organizations.
Community mental health care has been developed.
Dental Care
Dental care for children as well as adults is covered by the SHIS. Some common services, including orthodontics and expensive artificial teeth, are excluded from the SHIS coverage.
To become a dentist, one must graduate from a dental school and pass the national board examination. Most dentists own and operate their clinics, are paid on a fee-for-service basis, and employ dental hygienists and technicians who work with them.
Complementary and Alternative Medicines
The government issues licenses to massage therapists, acupuncturists, moxa cauterists, and judo chiropractitioners for providing care. The licensed practitioners can provide defined services in the SHIS provided that physicians order them.
Assessment
One difficulty for anyone trying to assess the Japanese health-care system is that its fragmentation and the lack of robust system-level data make quantitative assessment difficult. The long life expectancy in Japan suggests that the system works at least fairly well, even if strong health consciousness and prevalent healthy behaviors are taken into consideration (Ikeda et al. 2011). Looking at parts of the system, however, inefficiency in delivering health care and imbalances between regions have been pointed out. Health expenditure has been fairly well controlled, but its projected increase in the near future jeopardizes the sustainability of the system (OECD 2010).
Access to health care has been good since patients can in principle choose any provider. However, in some rural areas, patients have difficulty finding physicians, particularly such specialists as obstetricians and pediatricians. Although there have been large differences in health-care resources between prefectures, the reasons for the differences and whether they are inequitable have not been firmly assessed. Furthermore, in an era of increasing poverty, fair and good access to quality health services has encountered new challenges. Those challenges include delinquency in paying premiums to the CHI and cost-related access problems with the current co-insurance rates, particularly in ambulatory care (Matsuda 2016; Murata 2010; OECD 2009).
Quality of care is another area lacking systematic evidence. However, new institutions for hospital certification and policy incentives have been developed since 2000. More and more hospitals publish their clinical indicators, a practice supported by the government. With increasing financial pressures on health-care resources, mechanisms to monitor and regulate quality of care are becoming more important (Hashimoto et al. 2011).
References
Anderson GF, Hussey PS, Frogner BK, Waters HR. Health spending in the United States and the rest of the industrialized world. Health Aff. 2005;24(4):903–14. https://doi.org/10.1377/hlthaff.24.4.903.
Blank RH, Burau V. Comparative health policy. Basingstoke: Palgrave Macmillan; 2010.
Hashimoto H, Ikegami N, Shibuya K, et al. Cost containment and quality of care in Japan: is there a trade-off? Lancet. 2011;378(9797):1174–82. https://doi.org/10.1016/s0140-6736(11)60987-2.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2010 static/dynamic surveys of medical institutions and hospital report. Tokyo: Ministry of Health, Labour and Welfare; 2011.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2011 patient survey. Tokyo: Ministry of Health, Labour and Welfare; 2012a.
Health Statistics Office, Ministry of Health, Labour and Welfare. Summary of 2011 static/dynamic surveys of medical institutions and hospital report (in Japanese). Tokyo: Ministry of Health, Labour and Welfare; 2012b.
Ikeda N, Saito E, Kondo N, et al. What has made the population of Japan healthy? Lancet. 2011;378(9796):1094–105.
Ikegami N, Anderson GF. In Japan, all-payer rate setting under tight government control has proved to be an effective approach to containing costs. Health Aff. 2012;31(5):1049–56. https://doi.org/10.1377/hlthaff.2011.1037.
Ikegami N, Yamauchi K, Yamada Y. The long term care insurance law in Japan: impact on institutional care facilities. International Journal of Geriatric Psychiatry. 2003;18(3):217–221.
Ikegami N, Yoo B-K, Hashimoto H, et al. Japanese universal health coverage: evolution, achievements, and challenges. Lancet. 2011;378(9796):1106–15.
Ishizaki T, Kobayashi Y, Tamiya N. The role of geriatric intermediate care facilities in long-term care for the elderly in Japan. Health Policy. 1998;43(2):141–151.
Japanese Nursing Association. Nursing in Japan. Tokyo: Japanese Nursing Association; 2011. Available at: http://www.nurse.or.jp/jna/english/pdf/nursing-in-japan2011.pdf
Lee S-Y, Chun C-B, Lee Y-G, Seo NK. The National Health Insurance system as one type of new typology: the case of South Korea and Taiwan. Health Policy. 2008;85(1):105–13.
Matsuda R. Arguments for instituting “general physicians”. Health Policy Monitor, April 2008. 2008. Available at: http://www.hpm.org/en/Surveys/Ritsumeikan_University_-_Japan/11/Arguments_for_Instituting__General_Physicians_.html
Matsuda R. Public/Private Health Care Delivery in Japan: and Some Gaps in Universal Coverage. Global Social Welfare. 2016;3:201. https://doi.org/10.1007/s40609-016-0073-1
Matsuda S, Ishikawa KB, Kuwabara K, Fujimori K, Fushimi K, Hashimoto H. Development and use of the Japanese case-mix system. Eurohealth. 2008;14(3):25–30.
Matsumoto M, Inoue K, Kajii E, Takeuchi K. Retention of physicians in rural Japan: concerted efforts of the government, prefectures, municipalities and medical schools. Rural Remote Health. 2010;10(2):1432.
Ministry of Health, Labour and Welfare. Pharmaceutical expenditures at dispensing pharmacies, FY2013 (in Japanese). Tokyo: Ministry of Health, Labour and Welfare; 2014.
Miyai I, Sonoda S, Nagai S, Takayama Y, Inoue Y, Kakehi A, Kurihara M, Ishikawa M. Results of New Policies for Inpatient Rehabilitation Coverage in Japan. Neurorehabilitation and Neural Repair. 2011;25(6):540–547.
Murata C, Yamada T, Chen CC, Ojima T, Hirai H, Kondo K. Barriers to Health Care among the Elderly in Japan. International Journal of Environmental Research and Public Health. 2010;7(4):1330–1341.
OECD. Health-care reform in Japan: controlling costs, improving quality and ensuring equity. In: OECD, editor. OECD economic surveys: Japan 2009. OECD Publishing; 2009. https://doi.org/10.1787/eco_surveys-jpn-2009-6-en.
OECD. Value for money in health spending. 2010. https://doi.org/10.1787/9789264088818-en.
OECD. Health data 2013 [database on the Internet]. 2013.
Okamoto E, Miyamoto M, Hara K, et al. Integrated care through disease-oriented clinical care pathways: experience from Japan’s regional health planning initiatives. Int J Integr Care. 2011. Available at: http://www.ijic.org. URN:NBN:NL:UI:10-1-101572.
Okamura S, Kobayashi R, Sakamaki T. Case-mix payment in Japanese medical care. Health Policy. 2005;74(3):282–6.
Olivares-Tirado P, Tamiya N. Trends and Factors in Japan’s Long-Term Care Insurance System: Japan’s 10-year Experience. Dordrecht: Springer; 2013.
Paris V, Devaux M, Wei L. Health systems institutional characteristics. OECD health working papers, No. 50. Paris: OECD; 2010.
Rodwin MA. Conflicts of interest and the future of medicine: the United States, France, and Japan. New York: Oxford University Press; 2011.
Tatara K, Okamoto E. Japan. Health system review. Health Syst Transit. 2009;11(5):1.
Tatara K, Okamoto A. Public health of Japan 2011. Tokyo: Japan Public Health Association; 2011.
Teo A. The current state of medical education in Japan: a system under reform. Med Educ. 2007;41(3):302–8. https://doi.org/10.1111/j.1365-2929.2007.02691.x.
Tsutsui T. Implementation process and challenges for the community-based integrated care system in Japan. International Journal of Integrated Care. 2014;14(1).
Health System in Mexico
37
Julio Frenk and Octavio Gómez-Dantés
President of the University of Miami and former Minister of Health of Mexico (2000–2006)
Senior researcher, Center for Health Systems Research, National Institute of Public Health, Mexico
J. Frenk (*)
University of Miami, Coral Gables, FL, USA
e-mail: president@miami.edu
O. Gómez-Dantés
National Institute of Public Health, Cuernavaca, MOR, Mexico
e-mail: ocogomez@yahoo.com
https://doi.org/10.1007/978-1-4939-8715-3_46
Contents
Health Conditions
History of the Mexican Health Care System
Organization
Planning and Regulation
Health Information Systems and Technology
Role of Patients
Financing
Coverage and Benefits
Sources of Revenue, Collection, and Pooling
Health Expenditure
Delivery of Personal and Public Health Services
Quality of Care
Recent Reforms
Assessment
References
Abstract
This chapter discusses the Mexican health system. We first describe the general characteristics of Mexico and the health conditions of the Mexican population, with emphasis on noncommunicable diseases, which are now the main cause of death and disability. The following section is devoted to the description of the basic structure of the system: its history; its main institutions; the population coverage; the health benefits of those affiliated to the
different health institutions; its financial sources; the availability of physical, material, and human resources for health; the delivery of personal and public health services; the stewardship functions displayed by the Ministry of Health; and other actors. This part also discusses the role of citizens in the monitoring and evaluation of the health system, as well as the levels of satisfaction with the rendered health services. In part three, the most recent innovations and their impact on the performance of the health system are discussed. Salient among them are the System of Social Protection in Health and the Popular Health Insurance. The chapter concludes with a discussion of the most recent health initiatives and reforms, and a brief analysis of the short- and middle-term challenges faced by the Mexican health system.
Mexico is the largest Spanish-speaking country in the world. It covers 1.9 million km² of land in North America (Central Intelligence Agency). It borders the USA to the north and Guatemala and Belize to the south. Mexico is an upper middle income country with a GDP of US$ppp 1.788 trillion (2012) and a per capita GDP of US$ppp 15,100.1. Its human development index is 0.775 (2012), above the world average of 0.694 and ranking 61 out of 187 countries (UNDP). Inequality, as measured by the Gini index, is 47.2, higher than all other high human development countries except for Brazil (The World Bank). Its principal source of income is services (61.8%), with industry running second (34.2%) and agriculture representing a small and waning portion (4.1%) (Central Intelligence Agency). Its annual economic growth rate during the period 1990–2010 was 2.8% (The World Bank).
Mexico has a population of 116.2 million (2013 est.) that is witnessing (Central Intelligence Agency; Partida 1999):
• A decline in general mortality explained mostly by a reduction in infant mortality from 79 per 1000 live births in 1970 to 16.2 in 2013 (2013 est.)
• An increase in life expectancy at birth from 49.6 years in 1950 to 79.8 years in women and 74.0 in men in 2013 (2013 est.)
• A reduction in fertility from 6.8 children per woman of reproductive age in 1970 to 2.2 in 2013 (2013 est.)
The rapid decline in fertility is driving an aging process, which implies an increasing proportion of older adults in the population structure. Children under 5 will represent less than 10% of the total population in 2050, while older adults will account for over 20% of the total population (Ham-Chande 2012).
Mexico is also going through an accelerated process of urbanization. Eight out of every 10 Mexicans now live in urban areas (Central Intelligence Agency). This is associated with a parallel process of rural population dispersion, which increases the problems of access to health care of a population with major health needs (Reyna-Bernal and Hernández-Esquivel 2006).
Health Conditions
The increase in life expectancy and a growing exposure to unhealthy lifestyles in urban dwellings are modifying the main causes of disease, disability, and death. Mexico is going through a health transition characterized by an increasing predominance of noncommunicable diseases (NCD) and injuries. In 1950 around 50% of all deaths in the country were due to common infections, reproductive events, and diseases related to undernutrition (Fig. 1) (Secretaría de Salud 2001). Today, these ailments account for less than 12% of total deaths, while NCDs and injuries are responsible for almost 90% of national mortality (World Health Organization 2012).
The contribution to mortality of the different age groups is also changing. In 1950, half of total deaths were concentrated in children under 5 and only 15% were concentrated in persons 65 years of age and older (Secretaría de Salud 2007). Nowadays, more than 50% of deaths are concentrated in older adults and less than 10% in children under 5 (Zúñiga and García 2008).
Fig. 1 Evolution of the distribution of mortality by type of disease, México 1950–2010 (vertical axis: 0–100; horizontal axis: 1950, 1990, 2000, 2010; series: deaths due to injuries; deaths due to NCDs; deaths due to UN, RP, and CD). NCD noncommunicable diseases; UN, RP, and CD undernutrition, reproductive problems, and communicable diseases (Source: Ministry of Health, Mexico; Secretaría de Salud (2001))

History of the Mexican Health Care System

The origins of the modern Mexican health system date back to 1943, when the Ministry of Health (MoH) and the Mexican Institute for Social Security (IMSS) were created. IMSS would serve the industrial work force, while the MoH was assigned the responsibility of caring for the urban and rural poor (Frenk et al. 2003). In 1960, a social security institution for civil servants was created, the Institute for Social Security and Services for Government Employees (ISSSTE).

In order to extend access and improve the efficiency and quality of care, a health care reform was launched in 1983: a constitutional amendment establishing the right to the protection of health was introduced; a new health law was published; and health services for the uninsured population were decentralized to state governments (Soberón 1987). The force guiding this program was primary health care. However, universal access to comprehensive services would not be reached until the initial years of the new millennium.

In the 1990s several national health accounts studies revealed that more than half of total health expenditure in Mexico was out-of-pocket. This was due to the fact that half of the population lacked health insurance. This exposed Mexican households to financial crisis. Not surprisingly, Mexico performed poorly on the comparative analysis of fair financing developed by the WHO as part of the World Health Report 2000 (World Health Organization 2000).

These results encouraged the development of further analysis that showed that catastrophic health expenditures were concentrated among the poor and uninsured. The products of these studies generated the advocacy tools to promote a legislative reform that established the System for Social Protection in Health (SSPH) in 2004 (Frenk et al. 2004). This system has mobilized public resources by a full percentage point of GDP over a period of 8 years to provide health insurance, through a public scheme called Seguro Popular, to all those ineligible for social security (those who are self-employed, unemployed, or altogether out of the labor force).

Organization and Governance

Organization

The Mexican health system includes a public and private sector. The public sector comprises the social security institutions [IMSS, ISSSTE, and the social security institutions for oil workers (PEMEX) and the armed forces (SEDENA and SEMAR)], Seguro Popular, and the institutions offering services to the uninsured population, including the Ministry of Health (MoH), the State Health Services (SESA), and the IMSS-Oportunidades Program (IMSS-O) (Fig. 2). These institutions run their own health facilities with their
own staff, except for Seguro Popular, which buys services for its affiliates from the MoH, SESA, and IMSS-O. The private sector includes facilities and providers offering services mostly on a for-profit basis financed either through insurance premiums or out-of-pocket payments.

Fig. 2 The Mexican health system has a public and private sector providing services to overlapping population groups

Planning and Regulation

The MoH is in charge of most stewardship functions, including strategic planning, policy design, intra- and inter-sectoral coordination, regulation of personal health services, sanitary regulation, and evaluation of policies and programs. The regulation of personal health services includes the accreditation of medical and nursing schools, the certification of health professionals, and the accreditation of health facilities. These activities are carried out in coordination with several professional bodies and NGOs, including the National Academy of Medicine and the National Association of Medical Schools and Faculties. The protection of health service users is the responsibility of the National Commission for Medical Arbitrage (CONAMED) (Comisión Nacional de Arbitraje Médico).

Regulation is the responsibility of the Federal Commission for Health Risk Protection (COFEPRIS), charged with assuring food safety, defining environmental standards, promoting occupational health and safety, regulating the pharmaceutical industry, and controlling hazardous substances like alcohol and tobacco (Comisión Federal de Protección contra Riesgos Sanitarios).

The MoH also has an evaluation unit which evaluates the main policies and programs and publishes an annual report on the performance of the Mexican health system and its various components (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México).

Health Information Systems and Technology

Health information is the responsibility of the General Directorate for Health Information based at the MoH (Dirección General de
Table 1 Health care coverage, Mexico 2002 and 2010

                                          2000                       2010
Type of population                        Number (million)   %       Number (million)   %
Population with social security           38.7               37.4    50.7               45.1
Population with private insuranceᵃ        2.5                2.4     2.8                2.5
Population enrolled in Seguro Popular     –                  –       43.5               38.7
Population with health insurance          41.2               39.8    97.0               86.3
Uninsured population                      62.2               60.2    15.3               13.6
Total population                          103.4              100     112.3              100

Source: Refs. (Crónica; Comisión Nacional de Protección Social en Salud; Comisión Nacional de Protección Social en Salud 2012).
ᵃ Around half of the population with private health insurance is also covered by public insurance. In this figure we consider those with private health insurance only
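
Each percentage in Table 1 is a group's population count divided by the total population. The following Python sketch recomputes the 2010 shares from the counts reported in the table. The dictionary keys and the function name coverage_shares are illustrative labels chosen here, not terms used by the chapter's sources.

```python
# Population counts in millions, taken from the 2010 columns of Table 1.
# The key names are illustrative labels, not official category names.
counts_2010 = {
    "social_security": 50.7,
    "private_insurance_only": 2.8,
    "seguro_popular": 43.5,
    "uninsured": 15.3,
}


def coverage_shares(counts):
    """Return each group's share of the total population, in percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 1) for group, n in counts.items()}


shares = coverage_shares(counts_2010)
for group, share in shares.items():
    print(f"{group}: {share}%")
```

Recomputing the shares in this way is a quick check that each row's count and percentage agree with the reported total.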
Información en Salud, Secretaría de Salud, México). In collaboration with other public institutions, this office created the National Health Information System (SINAIS), which generates information on births, deaths, cases of disease, health infrastructure, health services, and financial and human resources (Sistema Nacional de Información en Salud, México). SINAIS has several subsystems including the Epidemiological Surveillance System, the Automated Hospital Discharge System, and the National and State Health Accounts System.

The MoH has an area for the evaluation of medical technology, the National Center for Health Technology Excellence, whose main purpose is to produce and disseminate information on the appropriate selection, incorporation, and use of medical technologies based on evidence of their safety, effectiveness, and efficiency (National Center for Health Technology Excellence).

Role of Patients

Patients in Mexico did not start playing a role in the operation of the Mexican health system until very recently, through the “citizen endorsement groups,” created in 2001 as part of a quality program, the “National Crusade for Quality in Health Care.” The purpose of these groups is to train community volunteers to assess the responsiveness of health care facilities (Ruelas 2006). In 2006, there were 1764 active citizen groups that had endorsed over 1100 health units.

Besides these groups, citizens have traditionally played a limited role in the design and operation of health services, programs, and policies. The main exceptions are the HIV/AIDS and women’s health advocacy groups.

Financing

Coverage and Benefits

The Mexican health system is segmented along three broad categories of beneficiaries: (i) salaried workers and retired population, along with their families; (ii) self-employed workers and unemployed population, along with their families; and (iii) the population with the ability to pay.

As mentioned above, salaried workers are the beneficiaries of social security institutions, which in 2010 covered 50.7 million people (Table 1; Crónica). IMSS covered 80% of this population, and the rest was covered by ISSSTE and the social security institutions for oil workers and the armed forces.

The second category (self-employed and unemployed, and their families) was covered until 2003 by services of the MoH, SESA, and IMSS-O. The recently created Seguro Popular was covering 43.5 million individuals in this category by 2010 (Comisión Nacional de Protección Social en Salud; Comisión Nacional de Protección Social en Salud). By the end of 2011, affiliation to Seguro Popular reached 52 million. This means that Mexico is on
track to reach universal health coverage in the near future.

Finally, the third category includes the users of private health services, mostly upper- and middle-class individuals. However, the poor and those affiliated to social security institutions also use them on a regular basis. According to the National Health and Nutrition Survey 2012 (ENSANUT 2012), over 30% of the insured population regularly use private health services, mostly ambulatory care, for which they usually pay out-of-pocket (Instituto Nacional de Salud Pública 2013). The penetration of private insurance is low. Only six million people in Mexico are covered by private health insurance, half of whom are also covered by public insurance (CNNExpansión).

Those affiliated to social security institutions have access to a broad, but not explicitly defined, package of health services that includes ambulatory and hospital care, including high specialty care. Coverage includes drugs as well. Those affiliated to Seguro Popular have access to a comprehensive and explicit package of 270 essential interventions and the respective drugs. They also have access to a package of over 60 high-cost interventions for the treatment of acute neonatal conditions, cancer in children, cervical and breast cancer, and HIV/AIDS, among other diseases. Finally, the uninsured population has access to a limited package of benefits that vary considerably depending on the type of population (urban or rural).

Sources of Revenue, Collection, and Pooling

As shown in Fig. 2, social security institutions are financed with contributions from the government, the employer (which in the case of ISSSTE, PEMEX, SEDENA, and SEMAR is also the government in its role as employer), and the employee. The MoH and the SESA are financed mostly with federal and state government resources coming from general taxation. IMSS-O, which is directed to the rural poor of 17 states, is financed with federal resources but operated by IMSS. Finally, Seguro Popular is financed with federal and state government contributions and family contributions, with total exemption for those families in the bottom 40% of the income distribution.

Private services are financed mostly out-of-pocket. A very small portion of private health expenditure comes from private insurance premiums.

Health Expenditure

Total health expenditure as a percentage of GDP in Mexico in 2010 was 6.3%, well below the OECD average (9.3%) and below the Latin American average (6.8%), but up from 5.1% in 2000 (World Health Organization; Organization for Economic Cooperation and Development; World Health Organization). Health expenditure per capita in that same year was US$ppp 603, up from US$ppp 328 in 2000.

Mexico’s public expenditure on health as a percentage of total health expenditure in 2010 was 49%, up from 46.6% in 2000 but still the third lowest of OECD countries (World Health Organization; Organization for Economic Cooperation and Development).

Private expenditure accounts for 51% of total health expenditure in Mexico, a much larger portion than in the average OECD country (17%) and a larger portion than in Argentina (35.6%), Colombia (25.4%), and Uruguay (34.7%) but lower than in Brazil (53.0%) (World Health Organization; Organization for Economic Cooperation and Development; World Health Organization).

Ninety-two percent of private health expenditure is out-of-pocket (World Health Organization). The remaining 8% corresponds to private insurance premiums (World Health Organization). In Argentina, Brazil, Colombia, and Uruguay, out-of-pocket expenditure accounts for 60%, 57.8%, 67.7%, and 39.6% of total private health expenditure, respectively (World Health Organization). This means that Mexico has the highest level of out-of-pocket expenditure of middle-income countries in Latin America. This exposes households to catastrophic financial events. In 2000, an estimated
three million Mexican families suffered catastrophic or impoverishing health expenditures (Frenk et al. 2006). However, several studies showed that by 2006 this figure began to decline due to the implementation both of several programs to combat poverty and Seguro Popular (Knaul et al. 2006, 2011).

Physical and Human Resources

Excluding medical offices of the private sector, the Mexican health system has about 27,000 health units, 3976 of which are hospitals, for a rate of 3.5 hospitals per 100,000 population (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México). Of the total number of hospitals, 1386 (33.6%) are public and 2590 are private (66.4%). Of the total number of public hospitals, 2147 (54%) belong to SESA and MoH and 1829 (44%) to social security institutions.

In 2010 the three main public institutions (MoH/SESA, IMSS, and ISSSTE) had 74,064 hospital beds and 2900 operating rooms for a rate of 6.5 beds per 10,000 population and 2.5 operating rooms per 100,000 population (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México).

Private hospitals have 34,000 hospital beds. Most of them are general hospitals and are concentrated in the largest cities of the country. Most of them have 20 beds or less. Some of these units, in fact, can hardly be considered hospitals at all since they have no laboratories, no radiology and imaging services, and no blood banks.

The Mexican health system also has over 20,000 public ambulatory units, most of which belong to SESA (Dirección General de Evaluación del Desempeño, Secretaría de Salud, México 2000).

Regarding high specialty medical equipment and procedures, Mexico has a rate of 3.9 computed tomography units (CTU) and 1.3 radiotherapy units (RTU) per million population, the lowest and second lowest figures for OECD countries, respectively, which on average have 8.2 CTU and 6.9 RTU per million population (World Health Organization 2013; OECD. OECD Health Data 2013).

Regarding human resources, there are 1.96 doctors per 1000 population, below the OECD average (3.0) and the figures for other Latin American countries, such as Argentina (3.0) and Uruguay (3.7) (World Health Organization 2013). The scarcity of these resources is particularly acute when it comes to human resources for mental health: in Mexico there are only 0.02 psychiatrists per 1000 population (World Health Organization 2013). The availability of nurses, 2.7 per 1000 population, is also below the OECD average of 8.6 (OECDiLibrary).

Pharmaceuticals

The Mexican market of pharmaceutical products is the 12th largest market in the world and the second largest in Latin America, just below Brazil (Massachusetts Office of International Trade and Investment; Chhabara). Mexico spends 27% of its total health expenditure on pharmaceuticals, the third highest figure for OECD countries (OECD). About 80% of total expenditure in pharmaceuticals is concentrated in generic drugs, a market that has shown important growth rates in the past decade.

Around 80% of total expenditure in pharmaceuticals is private and 90% is out-of-pocket, one of the highest figures in the world (Moïse and Docteur 2008). The public sector accounts for 20% of the national expenditure in pharmaceuticals and 35% of its volume. This difference is due to the fact that most of the drugs purchased by public institutions are generics, which are considerably cheaper than patented drugs.

Delivery of Personal and Public Health Services

Health care services in public institutions are provided at social security, MoH, SESA, and IMSS-O facilities. Those in the formal, private sector of the economy receive health services at IMSS clinics and hospitals. Those in the formal, public sector of the economy receive services at ISSSTE,
PEMEX, SEDENA or SEMAR facilities. Those affiliated to Seguro Popular receive health care at the MoH, SESA, and IMSS-O facilities. The latter institutions also provide services to the uninsured. All these public providers run their health care network with their own personnel.

Private providers offer services through a very heterogeneous network that includes large hospitals offering high-quality but expensive care in a few metropolitan areas and a large number of small hospitals/clinics (general hospitals providing mostly obstetric care) offering services of poor quality.

Social security institutions and Seguro Popular are allowed to hire private providers to supply services for their affiliates when demand surpasses capacity or when there is a lack of personnel, equipment, or other inputs to provide any covered service. In 2012 IMSS contracted out dialysis and hemodialysis services for almost US$ 340 million (Instituto Mexicano del Seguro Social).

Furthermore, as mentioned above, due to problems of access and quality of public services, many individuals affiliated both to social security institutions and Seguro Popular make regular use of private out-patient services paying out-of-pocket. ENSANUT 2012 indicates that 39% of total out-patient services are offered by private providers.

The use of private hospital services by those affiliated to social security or Seguro Popular is less common for two reasons: the quality of services offered by public providers tends to increase with the level of care, and middle-class and poor households seldom have the resources needed to make use of private hospital facilities. ENSANUT 2012 indicates that only 17% of total hospitalizations in Mexico occur in private facilities, down from 23.9% in 2000 and 20.9% in 2006 (Instituto Nacional de Salud Pública 2013). This trend matches the upward trend in hospitalizations observed in units of the MoH which increased from 25.9% of total hospitalizations in Mexico in 2000 to 38.3% in 2012, a clear effect of the implementation of Seguro Popular.

Public health services are provided by MoH to all the population, regardless of its affiliation status to any particular health institution. These services include health promotion, risk control, and disease prevention activities, including vaccination, and epidemiological surveillance.

Quality of Care

Quality has been a concern of the Mexican health system for a long time. A quality assessment conducted at the end of the past century in more than 1900 public health centers and 214 general public hospitals documented problems with waiting times, drug supply, medical equipment, and use of medical records. Historically, public institutions have operated as monopolies with no choice, poor responsiveness to consumer needs, and lack of concern for quality. Furthermore, health care facilities were not subject to a formal accreditation process.

In the past decade two national quality programs were implemented: the National Crusade for Quality in Health Care and Sícalidad. These initiatives were designed to improve standards of personnel and technical quality in service delivery and enhance the capacity of citizens to demand accountability.

A central component of these initiatives was the strengthening of the certification process for public and private health units, which is now coordinated by the National Health Council (NHC), an institution created in 1917 as the highest policymaking body in the sector. This process was reinforced by a provision incorporated into the General Health Law in 2003 requiring the accreditation of all units providing services to Seguro Popular.

Initiatives to monitor and improve the availability of drugs in public institutions were also implemented in the early 2000s. External measurements have shown major improvements in drug availability in all public institutions, especially in ambulatory facilities.

A national system of indicators, Indica, was also put in place to monitor quality of care by state and institution. This monitoring system includes indicators for waiting times for ambulatory and emergency care, waiting times for elective
interventions, and distribution and dispensing of pharmaceuticals, among other indicators.

Several external surveys have measured the levels of satisfaction with health care in Mexico. Regarding overall satisfaction with hospital care, ENSANUT 2012 indicates that 80.6% of health service users consider health care services either “good” or “very good” (Instituto Nacional de Salud Pública 2013). Social security institutions providing services to oil workers and the armed forces show the highest satisfaction levels (97%), followed by private facilities (92%).

Recent Reforms

The creation of the SSPH in 2004 allowed for the expansion of health care coverage for the non-salaried population while also improving the quality of the available services and the protection against health risks. This system was able to reorganize and increase public funding by a full percentage point of GDP over 8 years in order to provide universal health insurance. The vehicle for achieving this aim was Seguro Popular. By December of 2012, 52 million people were enrolled in it (Comisión Nacional de Protección Social en Salud). If we add to these figures those affiliated to social security institutions and those with private health insurance, we can reasonably state that Mexico is on track to achieve universal health coverage.

The reform also contemplated quality-oriented initiatives including the organization of training programs on quality improvement tools for health professionals; the monitoring of quality indicators through the regular information systems and external satisfaction and responsiveness surveys; and the establishment of a compulsory accreditation for all units willing to provide services to those affiliated to Seguro Popular.

Regarding public health, the Mexican reform established a protected fund for community health services targeting health promotion and disease prevention interventions, which allowed, among other things, for a major expansion of the basic immunization scheme; additional public health investments to enhance human security through epidemiological surveillance and improved preparedness to respond to emergencies, natural disasters, and the threats related to globalization, including potential pandemics; and a major reorganization leading to the establishment of a new public health agency (COFEPRIS) charged with protection against health risks.

Another crucial component of the health reform was an external evaluation that used a quasi-experimental design. This community trial, implemented in 2005–2006 in over 38,000 households taking advantage of the phase-in implementation of the intervention, showed that Seguro Popular was reducing out-of-pocket expenditures and providing protection against catastrophic health expenditures especially to the poorest households (King et al. 2009). Additional studies also showed improvements in health service utilization and effective coverage both of preventive and curative interventions, including interventions for the main causes of disease, such as diabetes and breast cancer (Lozano et al. 2006; Gakidou et al. 2006).

Assessment

As shown in this chapter, Mexico has made progress in the three main objectives of health systems: improving health conditions, enhancing responsiveness to the legitimate expectations of the population, and providing financial protection (Murray and Frenk 2000). However, the country is facing emerging challenges.

Efforts to control pretransition ailments have yielded significant progress. However, as immunization coverage expanded and deaths due to diarrhea and acute respiratory infections declined, NCDs began to exercise an increasing pressure on the health of the population and the health system. Salient among these challenges is a critical need for additional public funding to extend access to costly interventions for NCDs, such as cardiovascular diseases, cancer, diabetes and its complications, and mental health problems.
Another challenge facing the Mexican health system is to achieve the right balance between additional investments in health promotion, risk control, and disease prevention, urgently needed to address the health risks related to NCDs, on the one hand, and investments in personal curative health services on the other.

Finally, further progress in quality of health care is still expected. The most critical areas are technical quality of care; availability of drugs in hospital settings; availability of care during evenings and weekends; and waiting times for ambulatory emergency care and elective interventions.

Narrowing gaps in access to health care also remains a challenge that needs to be urgently addressed. These gaps affect mostly indigenous communities that account for almost 10% of the national population.

In general terms, the most pressing challenge of the Mexican health system is integration, which implies the creation of a national health fund that guarantees access to the same set of health benefits to all Mexicans, the reduction of transaction costs associated with a segmented system, and the universal and egalitarian exercise of the right to health care.

References

Central Intelligence Agency. The World Factbook. North America: Mexico. Available at: https://www.cia.gov/library/publications/the-world-factbook/geos/mx.html. Accessed 14 Oct 2013.
Chhabara R. Making the most of the Mexican pharma market. Available at: http://social.eyeforpharma.com/marketing/making-most-mexican-pharma-market. Accessed 17 Oct 2013.
CNNExpansión. Mexicanos adquieren pocos seguros: AMIS. Available at: http://www.cnnexpansion.com/economia/2009/05/18/aseguradoras-suman-el-17-de-pib. Accessed 15 Oct 2013.
Comisión Federal de Protección contra Riesgos Sanitarios. Home Page. Available at: http://www.salud.gob.mx/unidades/cofepris/notas_principal/rimonabant.html. Accessed 15 Oct 2013.
Comisión Nacional de Arbitraje Médico. Home Page. Available at: http://www.conamed.gob.mx/main_2010.php. Accessed 15 Oct 2013.
Comisión Nacional de Protección Social en Salud. Sistema de Protección Social en Salud. Informe de Resultados 2010. Available at: http://www.seguro-popular.salud.gob.mx/images/pdf/informes/Informe_Resultados_SPSS_2010.pdf. Accessed 15 Oct 2013.
Comisión Nacional de Protección Social en Salud. Sistema de Protección Social en Salud. Informe de Resultados 2012. Available at: http://www.seguro-popular.salud.gob.mx/images/pdf/informes/InformeResultados-2-SPSS-2012.pdf. Accessed 15 Oct 2013.
Crónica. Tiene México el mayor número de beneficiarios en salud: FCH. Available at: http://www.cronica.com.mx/notas/2010/541852.html. Accessed 15 Oct 2013.
Dirección General de Evaluación del Desempeño, Secretaría de Salud, México. Misión. Available at: http://www.dged.salud.gob.mx/contenidos/dged/mision_vision.html. Accessed 15 Oct 2013.
Dirección General de Evaluación del Desempeño, Secretaría de Salud, México. Observatorio del Desempeño Hospitalario 2011. Mexico City: Secretaría de Salud. pp. 1–28.
Dirección General de Información en Salud, Secretaría de Salud, México. Misión, visión y objetivo. Available at: http://www.dgis.salud.gob.mx/acercade/misionvision.html. Accessed 15 Oct 2013.
Frenk J, Sepúlveda J, Gómez-Dantés O. Evidence based health policy: three generations of reform in Mexico. Lancet. 2003;362(9396):1667–71.
Frenk J, Knaul F, Gómez-Dantés O, et al. Fair financing and universal social protection. The structural reform of the Mexican health system. Mexico City: Secretaría de Salud; 2004.
Frenk J, González-Pier E, Gómez-Dantés O, et al. Comprehensive reform to improve health system performance in Mexico. Lancet. 2006;368:1524–34.
Gakidou E, Lozano R, González-Pier E, et al. Assessing the effect of the 2001–06 Mexican health reform: an interim report card. Lancet. 2006;368:1920–35.
Ham-Chande R. Diagnóstico socio-demográfico del envejecimiento en México. In: Consejo Nacional de Población, México. Mexico City: CONAPO; 2012b. p. 141–55.
Instituto Mexicano del Seguro Social. Gasto en subrogaciones y servicios integrales 2012 (unpublished report).
Instituto Nacional de Salud Pública. Encuesta Nacional de Salud y Nutrición 2012. Cuernavaca: INSP; 2013.
King G, Gakidou E, Imai K, et al. Public policy for the poor? A randomized assessment of the Mexican universal health insurance programme. Lancet. 2009;373:1447–54.
Knaul FM, Arreola-Ornelas H, Méndez-Carniado O, et al. Evidence is good for your health system: policy reform to remedy catastrophic and impoverishing health spending in Mexico. Lancet. 2006;368:1828–41.
Knaul FM, Arreola-Ornelas H, Méndez O, Wong R. Financiamiento y sistema de salud en México: evolución en la desigualdad en la carga financiera entre población afiliada a la seguridad social y afiliados al Seguro Popular. Mexico City: Fundación Mexicana para la Salud; 2011.
Lozano R, Soliz P, Gakidou E, et al. Benchmarking of performance of Mexican states with effective coverage. Lancet. 2006;368:1729–41.
Massachusetts Office of International Trade and Investment. Mexican pharmaceutical industry. Available at: http://www.moiti.org/pdf/Mexican%20Pharmaceutical%20Industry.pdf. Accessed 15 Oct 2013.
Moïse P, Docteur E. Las políticas de precios y reembolsos farmacéuticos en México, OCDE, 2007. Salud Publica Mex. 2008;50(suplemento 4):s504–10.
Murray CJL, Frenk J. A framework for assessing the performance of health systems. Bull WHO. 2000;78(6):717–31.
National Center for Health Technology Excellence. Mission. Available at: http://www.cenetec.salud.gob.mx/descargas/folletoingles.pdf. Accessed 15 Oct 2013.
OECD. OECD Health Data 2013. How does Mexico compare. Available at: http://www.oecd.org/els/health-systems/Briefing-Note-MEXICO-2013.pdf. Accessed 16 Oct 2013.
OECD. OECDiLibrary. Pharmaceutical expenditure. Available at: http://www.oecd-ilibrary.org/social-issues-migration-health/pharmaceutical-expenditure_pharmexp-table-en. Accessed 17 Oct 2013.
OECDiLibrary. Health: key tables from OECD. Practising nurses. Available at: http://www.oecd-ilibrary.org/social-issues-migration-health/practising-nurses_nursepract-table-en. Accessed 16 Oct 2013.
Organization for Economic Cooperation and Development. OECD StatExtracts. Available at: http://stats.oecd.org/index.aspx?DataSetCode=HEALTH_STAT. Accessed 15 Oct 2013.
Partida V. Veinticinco años de transición epidemiológica en México. In: CONAPO. La situación demográfica de México 1999. Mexico City: CONAPO; 1999.
Sistema Nacional de Información en Salud, México. Información por temas. Available at: http://sinais.salud.gob.mx/estadisticasportema.html. Accessed 15 Oct 2013.
Soberón G. El cambio estructural en la salud. Salud Publica Mex. 1987;29(2):127–40.
The World Bank. Gini index. Available at: http://data.worldbank.org/indicator/SI.POV.GINI. Accessed 14 Oct 2013.
The World Bank. Data. GDP growth (annual %). Available at: http://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG. Accessed 14 Oct 2013.
UNDP. International human development indicators. Mexico. Available at: http://hdrstats.undp.org/en/countries/profiles/MEX.html. Accessed 14 Oct 2013.
World Health Organization. World health report 2000. Health systems: improving performance. Geneva: WHO; 2000.
World Health Organization. Non-communicable diseases. Country profiles 2011. Geneva: WHO; 2012. p. 124.
World Health Organization. World Health Statistics 2013. Geneva: WHO; 2013.
World Health Organization. National health accounts. Mexico. Available at: http://apps.who.int/nha/database/StandardReport.aspx?ID=REP_WEB_MINI_TEMPLATE_WEB_VERSION&COUNTRYKEY=84027. Accessed 15 Oct 2013.
World Health Organization. National health accounts. Available at: http://www.who.int/nha/en. Accessed 15 Oct 2013.
Ham-Chande R. Diagnóstico socio-demográfico del envejecimiento en México. In: Consejo Nacional de Población, México. Mexico City: CONAPO; 2012a. p. 141–55.

Further Reading

Reyna-Bernal A, Hernández-Esquivel JC. Poblamiento,
desarrollo rural y medio ambiente. Retos y prioridades
Three publications by 2000 the same
de la política de población. In: CONAPO. La situación
demográfica de México 2006. Mexico City: CONAPO; authors were particularly useful for the
2006. development of this chapter:
Ruelas E. Citizen’s quality councils: an innovative mech-
anism for monitoring and providing social endorsement Frenk J, Gómez-Dantés O. Para entender el sistema de
of healthcare providers’ performance. Healthcare salud. Mexico City: Nostra Editores; 2008.
Papers. 2006;6(3):33–7. Gómez-Dantés O. Mexico. In: Johnson JA, Stoskopf CH,
Secretaría de Salud. Programa Nacional de Salud editors. Comparative health systems. Global perspec-
2001–2006. La democratización de la salud en México. tives. Boston: Jones and Bartlett Publishers; 2009. p.
Hacia un sistema universal de salud. Mexico City: 337–47.
Secretaría de Salud; 2001. p. 33. Gómez-Dantés O, Sesma S, Becerril V, et al. The
Secretaría de Salud. Programa Nacional de Salud health system of Mexico. Salud Publica Mex. 2011;53
2007–2012. Mexico City: Secretaría de Salud; 2007. (suppl 2):S220–32.
Health System in the Netherlands
38
Madelon Kroneman and Willemijn Schäfer
Contents
Introduction
Organization of the System
Planning and Regulation
The Role of Patients and the Population
Financing
Dimensions of Coverage of Curative Care
Long-Term Care
Pooling of Funds
Purchasing Process
Health Spending and Cost Control
Physical Resources: Hospitals
Paying the Hospital
Medical Specialists
General Practitioners
Pharmacists
Nurses
Other Information on Health-Care Personnel
Public Health
Primary/Ambulatory Care
Specialized Ambulatory Care/Inpatient Care
Long-Term Care
Dental Care
Out-of-Hour and Emergency Care
Informal Care
Palliative Care
Reforms
Assessing the Health System
Some Indicators of Health and Health Care in the Netherlands
The Dutch Health-Care System in International Perspective
References

M. Kroneman (*) · W. Schäfer
Netherlands Institute of Health Services Research
e-mail: m.kroneman@nivel.nl; willemijn.schafer@gmail.com
https://doi.org/10.1007/978-1-4939-8715-3_14
Abstract

A lengthy process of policy efforts to reform the health-care system and to introduce managed competition into the system resulted in the new Health Insurance Act (Zorgverzekeringswet) in 2006. A single compulsory health insurance scheme was introduced, and managed competition for providers and insurers became a major driver in the health-care system. This has meant fundamental changes in the roles of patients, insurers, providers, and the government. Insurers negotiate with providers on price and quality, and patients choose the provider they prefer and join a health insurance policy of their choice. The system of managed competition is currently in place for the curative health-care sector and part of the mental health-care sector (ambulatory mental care and institutional mental health care up to 1 year). Since 2006, the role of the national government has changed from directly steering the system to safeguarding the proper functioning of the health-care markets. Long-term care (nursing care and long-term mental care) is regulated by the Long-term Care Act (Wet langdurige zorg) and the Social Support Act (Wet maatschappelijke ondersteuning). During the past decade, social support for disabled and chronically ill and several forms of home care were already transferred to municipalities. General practice plays a central role in the Dutch health-care system. All citizens are listed with a general practitioner (GP) or GP practice. GPs serve as gatekeepers: patients have to visit their GPs first for their health complaints, and only upon referral can they go to a medical specialist. Furthermore, compared to other countries, the relative number of nurses is high. Dutch citizens are on average very satisfied with their health-care providers, and the accessibility of the health-care system is excellent. However, so far, the Netherlands has not been successful in curbing the growth of health-care expenditure. The government tries to control costs in several ways, for instance, by increasing the compulsory deductible.

Abbreviations
GDP Gross domestic product
GP General practitioner
OECD Organisation for Economic Co-operation and Development

Introduction

The Netherlands is situated in Western Europe and borders the North Sea, Germany and Belgium. It covers an area of 41,543 km² (Centraal Bureau voor de Statistiek 2009) and had a population of 16.8 million in 2013, the majority of whom (79%) are native Dutch (Statistics Netherlands 2013). The Netherlands has the tenth largest economy in the world and ranks 16 in GDP (Ministry of Foreign Affairs 2013). Between 1970 and 2011, the life expectancy at birth of the Dutch population grew from 73.6 to 81.3 years (Statistics Netherlands 2013). The infant mortality declined from 4.9 per 1,000 live births in 2005 to 3.6 in 2011, which is slightly below the average rate for all OECD countries (4.4 per 1,000 live births) (OECD 2013). In 2011, most deaths were caused by malignant neoplasms (cancer), which is in contrast with other EU countries, where diseases of the circulatory system are the main cause of
death. The burden of disease is higher among immigrants than among native Dutch inhabitants. Important risk factors affecting the health of the Dutch population are smoking and overweight. Between 2000 and 2010, the average of regular daily smokers was slightly below the EU average (World Health Organisation 2013). According to self-reported data, almost half of the population is overweight (Statistics Netherlands 2013).

Organization and Governance

Organization of the System

A lengthy process of policy efforts to reform the health-care system and to introduce managed competition into the system finally resulted in the new Health Insurance Act (Zorgverzekeringswet) in 2006. With the introduction of a single compulsory health insurance scheme, the former dual system of public and private insurance for curative care was abandoned. Managed competition for providers and insurers became a major driver in the health-care system. The new system introduced new roles for patients, health insurers, health-care providers, and the government. Three markets exist: the health insurance market, the health provision market, and the health purchasing market. Within the health-care purchasing market, insurers have to negotiate with providers on price, quality, and volume of care. In the health-care provision market, patients can choose the provider they prefer. In the health insurance market, citizens can join a health insurance policy which best fits their needs and requirements. The system of managed competition is currently in place for the curative health-care sector and part of the mental health-care sector (ambulatory mental care and institutional mental health care up to 1 year). Since 2006, the role of the national government has changed from directly steering the system to safeguarding the proper functioning of the health-care markets. With the introduction of market mechanisms in the health-care sector and the privatization of former public health insurance funds (sickness funds), the Dutch system represents an innovative and unique variant of a social health insurance system.

The Dutch population aged 18 years and older is obliged to take out health-care insurance for the basic health-care package. Children under the age of 18 are included in the policy of one of their parents, and their premium is paid by the government. Health insurers are obliged to accept applicants without restrictions. Differentiation of premiums for different risk conditions (such as age, sex, and chronic diseases) is not allowed. Health insurers are free to set the community-rated premium and to contract health-care providers, under the condition that they have to operate within the national health-care budget set by the government and that they have to contract sufficient providers to ensure good access to care for their insured population. Health insurers are compensated for their insured with high risk for health-care costs via a risk adjustment scheme. In addition to the basic insurance package, health insurers offer voluntary complementary insurance for care that is not covered by the Health Insurance Act. For instance, partial coverage of glasses or dental care is often part of the voluntary insurance.

General practice plays a central role in the Dutch health-care system. All citizens are listed with a general practitioner (GP) or GP practice. GPs serve as gatekeepers: patients have to visit their GPs first for their health complaints, and only upon referral can they go to a medical specialist. About 96% of all contacts are dealt with within primary care (Cardol et al. 2004). An important prerequisite is that GP care in the Netherlands is freely accessible and exempted from the compulsory deductible which is currently in place for other forms of care.

Long-term care (nursing care and long-term mental care) is regulated by the Exceptional Medical Expenses Act (AWBZ). This Act was intended originally (1968) to provide care for those with chronic conditions requiring continuous care that involves considerable financial consequences. Since the introduction of the Act, many types of care have been added, resulting in a rapid growth in expenditure in such a way that
the affordability became at risk and thus the call for reform became urgent. During the past decade, social support for disabled and chronically ill and several forms of home care were already transferred to municipalities. In 2015 the long-term care in the Netherlands was completely reformed. Home nursing care for people who require 24-hour supervision per day is now regulated by the Long-term Care Act (Wet langdurige zorg). People in need of care who live at home receive care through the Social Support Act (Wet maatschappelijke ondersteuning), which is the responsibility of municipalities. Home nursing care became part of the Health Insurance Act and is now the responsibility of health insurers.

Planning and Regulation

The role of the Dutch government is steering from a distance. It defines the framework in which health care can be developed. Responsibilities have been transferred to insurers, providers, and patients, and the government only supervises quality, accessibility, and affordability of health care. The establishment of new supervisory agencies in the health sector aims to avoid undesired market effects in the new system. Traditionally, self-regulation has been an important characteristic of the Dutch health-care system. Professional associations are responsible for reregistration schemes and are involved in quality improvement, for instance, by developing professional guidelines.

Responsibilities of the National Government

The government should ensure that managed competition results in safe, accessible, and affordable health care of good quality. Only a few instruments have been left to the government to directly interfere in the health-care system. An essential competence of the government is setting the budget for health-care expenditures. Other important competences of the central government are taking decisions on the content of the basic health insurance package and on cost-sharing. Furthermore, in order to prevent preferred risk selection, the government sets the rules for risk adjustment among health insurers. In the care sector, the central government has a number of explicit responsibilities. These include creating the preconditions for quality, accessibility, safety, and affordability of the care for people with chronic conditions; strengthening the position of citizens, in particular patients and their representatives; and stimulating innovation. To meet these responsibilities, the government has supervisory and advisory bodies in place. Furthermore, at national level, there is legislation which describes the conditions in which the markets have to operate.

Supervisory Bodies

Independent supervisory bodies take care of safeguarding accessibility, affordability, and quality of care:

• The Dutch Healthcare Authority (NZa) supervises the compliance of actors with the Health Insurance Act (Zvw) and the Health Care Market Regulation Act (Wmg). The NZa intervenes with restrictions or obligations when an actor (that is, a health insurer, health-care provider, or consumer), together or alone, hinders fair competition in (part of) the health-care market. Furthermore, the NZa establishes tariffs and performance directions for those health services that are not subject to free negotiations. Lastly, the NZa monitors health-care markets and promotes their transparent and fair operation. In addition, the NZa imposes on tariffs for health services that are not freely negotiable and on extending the share of freely negotiable services.
• The National Healthcare Institute (Zorginstituut Nederland) advises the Ministry of Health, Welfare and Sport on the content of the basic health insurance package. Furthermore, it supplies information to insurers (but also consumers and providers) on the nature, content, and scope of the basic health insurance. The Healthcare Institute also administers the Health Insurance Fund and operates the risk adjustment scheme.
• The Health and Youth Care Inspectorate (IGJ) supervises the health-care providers in the areas of quality and safety.
The Role of Patients and the Population

Within the Dutch health-care system, the population is free to choose a health insurer. The idea is that people will choose those insurers with the best price/quality performance. In practice, the main reason for people to switch is the level of the premium for the basic insurance package, and competition on quality of care seems to be absent (see www.hspm.org). Patients are expected to choose providers based on quality (for instance, through providers selected by their health insurer and/or by comparing providers on quality on the website www.kiesbeter.nl). In practice, patients follow the recommendation of their GP in choosing a health-care provider (Dautzenberg et al. 2012; Reitsma et al. 2012).

Financing

Dimensions of Coverage of Curative Care

Basic health insurance is obligatory for all Dutch residents. Children under the age of 18 are insured free of charge but have to be included in one of the parents’ policies. The nominal premium for children is paid by the government. For persons aged 18 or over, there is a compulsory deductible of €385 (2018). Excluded from this deductible are GP care, maternity care, and dental care under the age of 18. In addition to the compulsory deductible, people can opt for a voluntary deductible. This voluntary deductible may range from €100 to a maximum of €500 in exchange for a reduction in the premium.

The basic health insurance covers all curative (somatic and mental) health care that is considered essential, effective, cost-effective, and unaffordable for individuals. “Essential” refers to its capacity to prevent loss of quality of life or to treat life-threatening conditions. The affordability criteria state that no services need to be included that are affordable for individual citizens and for which they can take responsibility (Brouwer 2004). The content of the benefit package is defined by the government and covers more or less all primary and secondary curative care. Excluded are dental care for persons older than 18 years of age and some elective procedures such as plastic surgery without medical indication and, since 2013, simple walking aids. Partly covered are, for instance, allied health care, some medicines, and in vitro fertilization.

Citizens pay for their health insurance through a community-rated premium and an income-dependent contribution. For 2013, the community-rated premium varied from €92 to €112 per month. Health insurers are free to set the premium level. The insured persons pay these premiums directly to their health insurer. For children below the age of 18, the government covers the premium through a contribution into the Health Insurance Fund. Insurers are not allowed to differentiate the premium of one specific policy for the basic benefit package for different groups of people. There is one exception: insurers may offer collective contracts. Collective contracts are established between groups of insured (e.g., a company with employees) and the insurance company. Insurance companies are allowed to offer a maximum of 10% reduction on the individual premium. Insured people are free to join a collective policy or buy an individual policy. The system of collective policies was established to give the insured more influence (“voice”) on the insurance companies. The threat of the loss of a large number of insured persons may persuade insurers to satisfy the collectivity and compete on price and quality of care. In addition, successful negotiations may lead to more demand-driven care and care that is tailored to the needs of the target group of the collective. In 2012, 67% of the insured persons participated in a collective insurance policy.

The income-dependent contribution is collected by the Tax Office, which levies the contribution from salary together with payroll taxes. After collecting all the contributions, the Tax Office transfers the money to the Health Insurance Fund (Zorgverzekeringsfonds), where the money is allocated after risk adjustment to the health insurers.

To ensure access to basic health insurance under a system with flat rate premiums and to
compensate for undesired income effects for lower-income groups, a “health-care allowance” funded from general tax was created. In 2011, six out of ten households received a health-care allowance of on average €85 per month. People with chronic diseases or a handicap received a compensation of €99 per year for the compulsory deductible in 2013.

Voluntary Health Insurance (VHI)

Most insurance companies offer voluntary packages in combination with the basic benefit basket. In 2012, 88% of the insured took out complementary VHI (Ten Hove et al. 2012). VHI covers care that is not included in the basic package, for instance, dental care, glasses, or physical therapy (for persons without a chronic indication). In addition, some co-payments may be covered, for instance, for ambulatory mental care. Contrary to basic health insurance, health insurers are free to set premium levels and may apply preferred risk selection for complementary VHI based on medical criteria or other risk factors. Insurers are obliged to offer VHI independent from the basic health insurance, but some insurers discourage taking VHI without a basic insurance by increasing the premium or by stating that VHI can only be taken when a basic insurance is taken at the same insurer (Roos and Schut 2009).

Long-Term Care

Exceptional Medical Expenses Act (AWBZ)

Long-term care is insured under the Long-term Care Act (Wlz). This is a social health insurance scheme that is intended to provide care for those with chronic conditions (physical and/or mental) requiring 24-hour supervision per day (either physically, mentally, or medically). Everyone who is legally residing in the Netherlands or pays payroll tax in the Netherlands is compulsorily insured. At present (2018), long-term care at home is provided by municipalities under the Social Support Act (Wmo). Home nursing is provided by health insurers under the Health Insurance Act. Under certain conditions, people can receive a personal budget to buy the care they need.

To cover the expenses for the Wlz, a contribution of 9.65% is levied on the salary of the citizens, with a maximum of €3,280 per year (2018). The revenues are collected by the Tax Office and transferred to the Long-term Care Fund, administered by the Netherlands Healthcare Institute. The expenses for the Social Support Act are covered by general taxes and are transferred to the municipalities through the municipality fund. The budget is not earmarked.

Pooling of Funds

In the Netherlands, administering and providing basic health insurance are delegated to private health insurers. These insurers are funded by the nominal premium directly received from clients and a contribution from the Health Insurance Fund, which pools the income-dependent employer contributions (collected by the Tax Office) and the state contribution (e.g., to cover children under 18). The allocation among the health insurers is based on the health risk profile of their insured population. The government sets the level of the income-dependent contribution, with the notion that, at national level, the total income-dependent contributions for adults should amount to approximately 50% of the total funding of basic health insurance, while the nominal premiums should account for the other 50%.

Purchasing Process

Health insurers buy health care for their insured population (possibly by selective contracting). They negotiate contracts with hospitals (on volume and quality but also lump sum) and with committees that represent GPs. The negotiations with GPs are in practice hardly on tariff but more on activities aimed at increasing GP care and substitution of secondary care to primary care (modernization and innovation activities).
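The Wlz contribution described under Long-Term Care above can be read as a simple capped percentage. The following is a minimal sketch in Python, assuming the 9.65% rate and the €3,280 annual maximum quoted for 2018 and treating that maximum as a cap on the yearly amount. The function and constant names are illustrative only, and actual payroll rules (income brackets, tax credits) are deliberately left out.

```python
from decimal import Decimal

# Figures quoted in the text for 2018; the names are illustrative, not official.
WLZ_RATE = Decimal("0.0965")      # 9.65% of salary
WLZ_ANNUAL_CAP = Decimal("3280")  # maximum contribution per year, in euros


def wlz_contribution(annual_salary: Decimal) -> Decimal:
    """Return the yearly Wlz contribution in euros under the simplified rule.

    Assumption: the contribution is rate * salary, limited to the annual cap.
    """
    if annual_salary < 0:
        raise ValueError("annual_salary must be non-negative")
    return min(annual_salary * WLZ_RATE, WLZ_ANNUAL_CAP)


if __name__ == "__main__":
    for salary in (Decimal("20000"), Decimal("60000")):
        print(salary, wlz_contribution(salary))
```

Under these assumptions a salary of €20,000 yields a contribution of €1,930, and any salary above roughly €34,000 (3,280 / 0.0965) pays the maximum.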
Purchasing of long-term institutional and home nursing care is delegated to health insurers. Health insurers negotiate with providers on price, volume, and quality of care.

Health Spending and Cost Control

Initially, after the reform in 2006, the community-rated premiums were set too low, because health insurers tried to attract the population via these low premiums. This resulted in a loss for health insurers. Over the years, premiums have slowly increased. Since 2009, insurers have been able to realize a profit on their insurance policies (2012) (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] 2013).

So far, the Netherlands has not been successful in curbing the growth of health-care expenditure. The government tries to control costs in several ways, for instance, by increasing the compulsory deductible. Initially, in 2008, the deductible was €150. This increased to €350 in 2013. Furthermore, some care has been taken out of the basic package, such as simple walking aids in 2013. For long-term care, the government tries to increase the involvement of citizens, by stimulating them to take care of family and neighbors on a voluntary basis. Since 2006, the growth in expenditure has varied from 6.8% (2008) to 2.3% (2011). In 2012, the growth was 3.7% (Centraal Bureau voor de Statistiek 2013a, b).

Physical and Human Resources

Physical Resources: Hospitals

The structure of health care in the Netherlands comprises a dense network of premises, equipment, and other physical resources. In 2010, there were 8 university hospitals and 84 acute care hospitals in the Netherlands, subdivided into 28 top clinical centers and 57 general hospitals (Nederlandse Vereniging van Ziekenhuizen 2012). In 2009, there were 2.8 beds per 100,000 inhabitants, which is among the lowest in Europe. In addition to general and university hospitals, independent treatment centers have become part of the acute care hospital sector. These private centers provide selective non-emergent treatments for admissions up to 24 h.

Most hospitals are corporations. Hospitals are nonprofit institutions as a for-profit motive is not allowed. Since 2008, however, a few pilots have started that allow paying out a part of the profit to shareholders. Attracting shareholders might give hospitals the opportunity to generate more investment for quality improvement and innovation. Whether or not hospitals should be allowed to generate profit and to have shareholders is still a topic of political debate.

Within hospitals, approximately 55% of medical specialists are self-employed and organized in partnerships (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] and DBC-onderhoud 2012). These partnerships usually work in one hospital. In a few hospitals, especially university hospitals, the specialists are employed by the hospital. In 2012, there were 21,750 registered medical specialists. The largest categories were psychiatrists (3,299), internists (2,168), and anesthesiologists (1,805) (KNMG 2013).

Paying the Hospital

Hospitals have been paid through Diagnosis Treatment Combinations (Diagnose Behandel Combinaties, DBCs) since 2005. The DBC system was based on the concept of DRGs (Diagnosis-Related Groups), but it constituted a newly developed classification system. The DRG system is based on the diagnosis of a patient, and there is one DRG per patient for each hospital episode. The DBC system provides a DBC for each diagnosis treatment combination, and thus, more than one DBC per patient is possible. The system was, however, considered too complex and error-prone. Therefore, by 2012, the system was updated. New DBCs were formulated, and the number of DBCs was reduced from 30,000 to 3,000. The DBC tariffs include the costs of medical specialist care, nursing care, and the use of medical equipment and diagnostic procedures.
Apart from these direct costs, indirect costs such as education, research, and overhead are also included. The reimbursement for each DBC is not influenced by a longer or shorter hospital stay or a deviant number of diagnostic procedures for a certain patient.

Since the introduction of the DBC system, there have been two segments: the freely negotiable segment and the regulated segment. To allow health insurers and hospitals to get used to the new system, in which they had to negotiate prices for the DBCs, only a small part (10% in 2005) was freely negotiable, and the prices for the regulated part were based on the former system of paying the hospital. Gradually the freely negotiable part increased. In 2012, the former system of paying the hospital was abolished, with a transition model for the years 2012 and 2013. Now there is a freely negotiable part (about 70% of the DBC turnover) in which hospitals and insurers are free to set prices and a regulated part for which the Dutch Healthcare Authority (one of the supervisory organizations) establishes maximum prices. In practice, some insurers do not negotiate prices for the DBCs but negotiate a lump sum amount with the hospitals.

As compensation for investments is included in the tariffs, since 2008 for hospitals and since 2009 for long-term care institutions, health institutions are fully responsible for the realization of their (re)constructions and the purchase of equipment. No external approval of building plans applies, although the quality of premises is externally assessed every 5 years.

Medical Specialists

Medical specialists are either independent professionals organized in partnerships working in a hospital (55% in 2010) (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] & DBC-onderhoud 2012) or in salaried service of a hospital. Since 2008, medical specialists have been paid through the DBC system. The independent partnerships have to negotiate their tariffs with the hospital they work in.

General Practitioners

In 2012, there were 8,879 GPs (53 per 100,000 inhabitants), 43% of whom were female. GPs work in independent practices, either alone (26%) or with two or more other GPs (74%). Patients are listed with a GP practice. About 11% of the GPs work in salaried service for other GPs; the majority of these salaried GPs are female (87%) (Van Hassel and Kenens 2013).

GPs receive a capitation fee per patient per year. For older patients and patients from deprived areas, a higher fee is applicable, but this is only paid if there is an agreement with the health insurer (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] 2011). Per patient contact the GP receives a fee, differentiated by type of contact: practice consultations, home visits, telephone consultations, and prescription refills. Practice nurses take part in the routine care in general practice for persons with chronic conditions such as diabetes, hypertension, and COPD/asthma. Fees for practice nurses are freely negotiable or are part of integrated care agreements. Integrated care agreements are financed via bundled payments. Integrated care addresses the care for patients with the following chronic conditions: diabetes type II and COPD, and for persons with a high risk of cardiovascular diseases. Under the system of bundled payments, a care group organizes all care that is necessary for managing these diseases. Care groups are owned by GPs in a certain region; they vary in size from 4 to 150 GPs. The care group coordinates the care and pays the different care providers who are involved in the care.

Patients are free to participate in integrated care or to organize the necessary care themselves.

Besides the abovementioned payment methods, GPs may negotiate with insurers for the financing of activities for improvement of efficiency or substitution of care. These activities are only reimbursed if this is negotiated in a contract with the health insurer.

Out-of-hour services for GP care are mostly provided by GP out-of-hour cooperatives. GPs who participate in this system receive a per hour compensation. The majority of GPs participate in a GP out-of-hour cooperative (approximately 97% in 2013).
Pharmacists

For pharmaceutical care, provided by pharmacists or dispensing GPs, remuneration is based on predefined activities that are described by the Dutch Healthcare Authority. Examples of such activities are dispensing the medicine to the patient and providing information about the medication. Health insurers and pharmacists can freely negotiate prices for these activities.

Nurses

Compared to other countries, the relative number of nurses is particularly high. Most nurses are working in home care and in care for the elderly and disabled. Substitution and transfer of tasks from medical to nursing professionals is an important trend. For instance, practice nurses, who take care of chronic patients with certain diagnoses, are since 2011 allowed to prescribe medicines (Editorial Office Nursing 2011), and specialist nurses caring for pulmonary and diabetes patients are allowed to prescribe medicines since 2013 (Oelen 2013).

Other Information on Health-Care Personnel

Medical education is provided at each of the eight Dutch universities, while nurses can either be educated at an intermediate, higher, or academic level, depending on the professional profile. The quality of health-care professionals is safeguarded by obligatory registration and by various licensing schemes. Workforce forecasting and careful planning of educational capacity seek to prevent shortages or oversupply of health professionals. In a small and densely populated country like the Netherlands, unequal distribution of providers is not a major issue, although in some parts of large cities, additional efforts need to be made to match demand and supply. In 2012, about 15% of the working age population was working in the health-care sector (including home care, child care, and social support).

Delivery of Health Services

Public Health

Disease prevention, health promotion, and health protection fall under the responsibility of municipalities. A number of uniform tasks are specified in the Public Health Act (Wpg) and include among others youth health care, public health for asylum seekers, medical screening, and community mental health.

Youth health care (jeugdgezondheidszorg) provides preventive and mental care for all children aged between 0 and 19 years. Until the age of 4, children visit child health centers (consultatiebureaus) for checkups. The most important tasks of preventive health care are the monitoring of growth and development, early detection of health or social problems (or risks), screening and vaccination, and providing advice and information concerning health. This care is provided by specialized physicians and nurses. When treatment is necessary, the child health center will refer to other primary health-care providers, mostly GPs. Youth mental care is the responsibility of municipalities.

The National Vaccination Programme (Rijksvaccinatieprogramma, RVP) consists of childhood vaccinations (DTP-Hib-HepB, MMR, MenC, pneumococci, and HPV for girls of the age of 12). Other national screening programs are screening for cervical cancer, breast cancer, and vaccination against influenza. The heel prick for newborns screens for 17 diseases.

Primary/Ambulatory Care

In the majority of cases, the first point of contact for people with a medical complaint will be their GP. The GP has a central role in the health-care system and acts as gatekeeper of the system. This means that for “prescription-only medicines” or medical specialist care, a prescription or referral from a GP is required. For specific problems, patients can also directly access allied health professionals, such as physiotherapists and remedial therapists. However, these professionals are not
qualified to prescribe medication or to refer to secondary care. Two other directly accessible primary care professionals are midwives and dentists. These disciplines are also qualified to refer to some forms of secondary care, such as gynecologists in the case of midwives and dental surgeons in the case of dentists. They are also allowed to prescribe some types of medication.

Patients register with a GP of their choice and can switch to a new one without restriction. However, GPs have the right to refuse a patient. Reasons to refuse a patient can be that the patient lives too far from the practice or that the GP has too many patients on the list. Almost 100% of the population can reach a GP within 15 min from their home. GPs can usually be visited within 2 days. A full-time GP has a practice list of approximately 2,350 persons.

Specialized Ambulatory Care/Inpatient Care

Dutch hospitals provide practically all forms of outpatient as well as inpatient secondary care. Except in cases of emergency, patients only consult a specialist upon referral from a GP. Most hospitals also have 24-h emergency wards.

Pharmaceutical Care

The supply of prescription-only pharmaceuticals is exclusively reserved to pharmacists and dispensing GPs (in some rural areas). Over-the-counter (OTC) pharmaceuticals for self-medication are available both at pharmacies and chemists. There are three types of pharmacies: public pharmacies, hospital pharmacies, and dispensing general practices. Public pharmacies should be reachable within 4.5 km from the patient’s home. If this is not the case, a local GP can ask for a dispensing license. In 2011, about 6% of the GP practices were dispensing medicines. A new development in pharmaceutical care is the emergence of Internet pharmacies. In 2013, there were eight Internet pharmacies active. Most of them do not have a physical location and deliver medicines by courier services.

Long-Term Care

Long-term care is provided both in institutions (residential care) and in communities (home care). Long-term institutional care is financed by the Long-term Care Act (Wlz). The Centre for Needs Assessment (CIZ) has been commissioned by the government to carry out assessment for eligibility under the Wlz. Patients, their relatives, or their health-care providers can file a request with the CIZ for long-term care. The CIZ assesses the patient’s situation and decides what care is required. Patients can choose between receiving a personal care budget (only if they need care for more than 10 h per week) to purchase care themselves and receiving the care in kind. Personal budgets are subject to discussion because of a number of fraud cases.

Nursing homes are especially for people with severe conditions who require constant nursing care. All others in need of care receive this care at home. The majority of the residents in nursing homes and residential homes are older than 80 years.

Home care is provided by home care organizations. Besides care for the elderly and people with disabilities, home care organizations provide maternity care. Since the long-term care reform of 2015, the number of people who are eligible for nursing homes has decreased drastically. It is the policy of the Dutch government to keep people at home as long as possible.

Mental Health Care

Mental health care is provided both in primary and in secondary health care. Primary health-care professionals in mental health care include GPs, psychologists, and psychotherapists. When more specialist care is required, the GP refers the patient to a psychologist, an independent psychotherapist, or a specialized mental health-care institution. When the mental problems can be handled within general practice, the GP may refer to a mental care practice nurse (praktijkondersteuner GGZ), who is working within the practice.
The first three years of mental health treatment are part of the basic health insurance and are thus financed under the Health Insurance Act (Zvw). The funding of preventive mental health care and youth mental care is part of the Social Support Act (WMO), which means that the responsibility for organizing this care lies with the municipalities.

Dental Care

Oral health care is provided in primary care by private dentists and dental hygienists. Most citizens register with a dentist. Most dentists work in small independent practices (about 70%). Dental hygienists are specialized in preventive care and can be visited directly or upon referral from the dentist. Preventive tasks and relatively simple dental care are increasingly being shifted to dental hygienists. Nine out of ten dentists regularly refer to a dental hygienist, either in their own practice, in the practice of a colleague, or in an independent dental hygienist practice.

Out-of-Hour and Emergency Care

Patients with non-life-threatening conditions go to the special GP cooperatives for out-of-hour care. For life-threatening conditions or upon referral by the GP in the GP post, patients can go to the 24-h emergency department of the hospital.

Informal Care

The estimates of the number of people who provide informal care vary from approximately 1.7 million people (Oudijk et al. 2010) to 3.7 million (Houben-van Herten and Te Riele 2011). Informal carers (60% women, about half aged 45–65 years) provided care (emotional support, household work, accompanying during visits to family) to ill or disabled people, mostly to parents (40%) or spouses (18%). It is the policy of the government to stimulate informal care, in order to keep healthcare affordable.

Palliative Care

Palliative care is provided by general practitioners, home care, nursing homes, specialists, and voluntary workers at home. Furthermore, there are growing numbers of hospices and palliative units (e.g., in nursing homes). Most palliative care is integrated into the regular health-care system.

Reforms

The main reform in the Dutch health-care system took place in 2006. The dual system in which two thirds of the population (earning an income below a certain threshold) was insured publicly and one third privately was abolished. Since 2006, there is one insurance system for all citizens, with a community-rated premium that cannot be differentiated toward different risk groups. Insurers are obliged to accept citizens who apply for a health insurance policy. Together with this reform, the financing system changed. Although some aspects of market forces were already incorporated into the system before the reform, since 2006, market mechanisms have been officially introduced into the system. This imposed a new role, especially on health insurers and health-care providers. They had to learn to negotiate on price, volume, and quality. To ensure a smooth transition, in the first years, only a small part of the provided care was freely negotiable. This share increased over the years, and in 2012, about 70% of the hospital care expenditure was freely negotiable, with the remaining 30% being regulated, covering care that is too difficult or not suitable for free market negotiations, such as intensive care in hospitals. The Dutch Healthcare Authority defines the care activities that are subject to remuneration. The prices for these activities in the free segment can be negotiated by the market parties, although for some issues, maximum prices are set. For instance, for the remuneration of independent medical specialists, a maximum hourly tariff is set. Selective contracting by insurers is allowed, as long as insurers can assure sufficient care for their clients. However, until recently, none of the large insurers opted for selective contracting. There has been one
attempt by a large insurer to refrain from contracting a large hospital in the Dutch capital in 2012, which got a lot of attention in the Dutch newspapers. The hospital finally agreed to the lower budget and thus can still provide care to its patients.

Another important reform is found in the Exceptional Medical Expenses Act (AWBZ). This Act regulated long-term care in the Netherlands up to 2015. However, over the years, the act encompassed more and more care activities, leading to a strong increase in expenditure. The main target of the reform is to reduce the care insured under the act to the care it was initially meant for: care that is unaffordable for individual citizens and their insurers. This is, for instance, care in a medical home for the elderly. The following care was transferred from the AWBZ to other acts. Home help and social support became a responsibility of municipalities under the Social Support Act (WMO). Curative mental care was transferred to the Health Insurance Act and became part of the basic insurance package (for the first three years of care). Youth care was transferred to municipalities under the Youth Act. Home nursing care was transferred to the Health Insurance Act. The most important consequence of this choice is that under the Health Insurance Act, citizens have a right to certain care, whereas under the municipalities, the emphasis will be on individual responsibility. Municipalities have the obligation to compensate citizens in such a way that they can participate in society. The individual circumstances of the citizen may be taken into account. This is called the compensation principle: tailor-made measures instead of rules. The reform came with a major reduction in the budget, since municipalities were considered to be closer to their citizens and thus better able to efficiently organize the care.

Assessing the Health System

Some Indicators of Health and Health Care in the Netherlands

The Dutch government stipulated in the explanatory note accompanying the health-care budget in 2013 that essential care of good quality should be available and affordable for all citizens. The increasing demand for care and increasing costs as a result of technological and demographic developments may result in fundamental changes in health care. People are encouraged to stay at home as long as possible, with the aid of informal carers and volunteers. Examples of new initiatives are institutional care providers who aim to agree by contract with informal carers to provide a minimum of 4 h of informal care per month. This led to a lot of societal commotion. Furthermore, mild forms of institutional care are no longer provided, and new patients needing this type of care will receive this care at home.

Dutch citizens are on average very satisfied with their health-care providers (they give a score of 7.7–7.9 on a scale of 1–10) (Statistics Netherlands 2012). Healthy persons are slightly more satisfied than persons with ill health, and lower-educated people are more satisfied than young people and higher-educated people.

In 2011, life expectancy for males was 79.2 years and for females 82.9 years. In the past decade, the life expectancy for men increased by 3.4 years and for women by 2.2 years. Healthy life expectancy increased significantly for men (from 9.2 to 10.9 healthy years for 65-year-olds) but not for women (Statistics Netherlands 2012).

Mortality from cardiovascular diseases has steadily decreased over the past decade. Several factors have contributed to this decrease, such as better treatment of high cholesterol and high blood pressure and more attention for a healthy lifestyle. Furthermore, more people are aware of the fact that they have high blood pressure, making treatment possible. In addition, the development of technological options to treat cardiovascular diseases resulted in more patients surviving the disease (Statistics Netherlands 2012). Mortality due to cancer increased slightly in the past decade. In 2008, cancer overtook cardiovascular diseases as the most important cause of mortality.

Affordability of health care is still a cause of debate in the Netherlands. Expenditure on health care continues to increase over the years, both due
to increasing prices and an increase in volume of care. The government wishes to diminish especially the increase in volume of care. From 2006 to 2011, the expenditure on care under the Health Insurance Act and the Exceptional Medical Expenses Act increased by on average 4.4% per year. In 2011, the expenditure increased by 3.6%.

Citizens find accessibility of and solidarity in health care important. However, citizens appear to have little insight into health-care expenditure. They are aware of the compulsory deductible and of the community-rated premium for the Health Insurance Act, but they are hardly aware of the income-related premiums for the Health Insurance Act and the Exceptional Medical Expenses Act that are paid by their employer directly to the government (Kooiker et al. 2012). Competition in care is not popular among Dutch citizens; it is associated with a profit orientation, expensive managers, and a large overhead (Kooiker et al. 2012).

The accessibility of the Dutch health-care system is excellent. Nearly all citizens are insured, and waiting times are on average acceptable. There are a few specialties that have a longer waiting time than the norm of 4 weeks for a first appointment, and only for a few treatments does the waiting time exceed what is seen as acceptable.

Competition in the health insurance market seems to be present. In 2012, 6% of the citizens switched insurers, and in 2013, this was 8.3%, which can be seen as an indicator that competition in this market is present. In the health-care purchasing market, nearly all general practitioners are contracted by the health insurers for the maximum tariff (Nederlandse Zorgautoriteit [Dutch Healthcare Authority] 2012). Health insurers managed to contract 90% of the hospitals for the year 2011 before the end of that year, which is rather late, considering that health insurers have to publish their premiums in November of the year before. To evaluate quality of care, several indicators have been developed by the Dutch Healthcare Authority, but these are not yet published due to the fact that they cannot yet be corrected for casemix. The development in prices for the freely negotiable part of hospital care showed a decrease in 2010 of 3% and in 2011 of 1.3%. However, these decreases are mainly due to the tariff caps for medical specialists that were the result of the large increase in medical specialists’ income in the years before. Selective contracting in the health purchasing market is currently still in its infancy. In 2012, a large insurer decided not to contract a large hospital, but later that year, the hospital accepted the lower tariffs proposed by the insurer. In the health-care provision market, patients mainly go to the medical specialist who is advised by their general practitioner. There is information available on the Internet on quality of care, but consumers find it difficult to use this information (Damman et al. 2012).

The Dutch Health-Care System in International Perspective

When looking at health-care supply, the Netherlands has a low number of acute care hospital beds with 301 beds per 100,000 inhabitants in 2010, below the EU average, but 10 countries have a lower number of beds, with Finland on top with about 180 beds per 100,000 inhabitants. The supply of long-term beds (in nursing and elderly homes) is large compared to most European countries, with 1,036 beds per 100,000 inhabitants. For those countries where information is available, only Finland and Malta had a higher supply of long-term care beds in 2011. The Netherlands has nearly the lowest number of physicians in Europe (58 physicians per 100,000 inhabitants), with only Denmark and Ireland having even lower numbers. The number of general practitioners is also below the EU average, with 72 GPs per 100,000 inhabitants, the EU average being 82 GPs per 100,000 inhabitants (World Health Organisation 2013).

Acute care hospital admission rates are among the lowest in Europe with 11.4 admissions per 100 inhabitants in 2009. Since 2001, with 8.8 admissions per 100 inhabitants, the number of admissions is increasing. The average
length of stay had decreased considerably from above the EU average in 2000 with 9 days to 5.6 days in 2009, which is below the EU average. The number of doctor’s consultations is slightly below the EU average, with 5.8 consultations per person in 2009. Health-care expenditure as percentage of GDP is the highest in Europe with almost 12% in 2010 (World Health Organisation 2013).

This chapter is mainly based on the Health System Review of the Netherlands (Schäfer et al. 2010) and the publications in www.hspm.org: The Netherlands, with updates where necessary.

Box 1 Main features of the most important acts that regulate the Dutch health care system

Health Insurance Act (Zorgverzekeringswet)
Regulates the compulsory basic health insurance for citizens, the voluntary and compulsory deductible, the obligation for health insurers to accept every person who applies for a policy, the risk adjustment system to compensate health insurers for persons with high health-care consumption, and the supervision of the system.

Long-term Care Act
This act is a social health insurance scheme and regulates the admission of people to nursing homes. People need 24-hour supervision to be eligible for this type of care.

Youth Care
This act regulates mental care and educational support for children under the age of 18 and their parents. The care is organized by municipalities.

Health Care Allowance Act (Wet op de Zorgtoeslag)
Regulates that people with low incomes are partly compensated for the community-rated premium, in order to keep health insurance affordable for this group.

Social Support Act (Wet Maatschappelijke Ondersteuning)
Introduces the right for each citizen to be able to fully participate in society; municipalities should help to overcome barriers. For instance, home help, transportation, home adaptations, sheltered housing, and wheelchairs can be applied for at the municipality.

Health Care Market Regulation Act (Wet marktordening gezondheidszorg)
This act regulates the development, structuring, and supervision of the health-care markets. The act regulates the establishment of the Dutch Healthcare Authority as an independent administrative organization that supervises the health-care markets.

Health Care Admission Act (Wet Toelating Zorginstellingen)
Health-care institutes need an admission if they provide care under the Health Insurance Act or the Exceptional Medical Expenses Act. A request is handled by the Central Information point Professions in Health Care (Centraal Informatiepunt Beroepen Gezondheidszorg).

Public Health Act (Wet publieke gezondheid)
The act regulates collective prevention, infectious diseases control, and youth care.

Individual Health Care Professions Act (Wet op de Beroepen in de Gezondheidszorg)
Regulates the care provision by health-care professionals and the quality of care. A second aim is protection of patients. Professionals have to register in the BIG registry.

Medical Treatment Agreement Act (Wet op de Geneeskundige Behandelovereenkomst)
Regulates the right to information, consent for medical treatment, and access to medical files. The Act further regulates the requirement of confidentiality and the right to privacy during medical treatment.
References

Brouwer WBF. Met het oog op gepaste zorg, Deel I: Over-, onder en gepaste consumptie in de zorg vanuit economisch perspectief [With a view to suitable care, part I: over-, under- and suitable consumption of care from economic perspective]. Zoetermeer: Council for Public Health and Health Care (RVZ); 2004.
Cardol M, Van Dijk L, De Jong JD, De Bakker D, Westert GP. Tweede Nationale Studie naar ziekten en verrichtingen in de huisartspraktijk: huisartsenzorg: wat doet de poortwachter? [Dutch National Study of General Practice: GP care: activities of the gatekeeper]. Utrecht: NIVEL; 2004.
Centraal Bureau voor de Statistiek. Regionale Kerncijfers Nederland [Regional core statistics the Netherlands]. http://statline.cbs.nl/StatWeb/publication/?DM=SLNL&PA=70072NED&D1=286-288&D2=0,2,10,31,62,84,135&D3=11-13&VW=T. 26 Aug 2009. Centraal Bureau voor de Statistiek. 28 Aug 2009.
Centraal Bureau voor de Statistiek. Stijging zorguitgaven vooral door toename volume [Increase healthcare expenditure mainly due to increase in volume]. 3 Jan 2013a and 15 Oct 2013a.
Centraal Bureau voor de Statistiek. Uitgaven aan zorg met 3,7 procent gestegen [Health expenditure increased with 3.7 percent]. 16 May 2013b and 15 Oct 2013b.
Damman OC, Hendriks M, Rademakers J, Spreeuwenberg P, Delnoij DMJ, Groenewegen PP. Consumers’ interpretation and use of comparative information on the quality of health care: the effect of presentation approaches. Health Expect. 2012;15(2):197–211.
Dautzenberg M, Weenink J-W, Faber M, Ouwens M. Kiezen borstkankerpatienten voor kwaliteit? [Do breast cancer patients choose for quality]. Nijmegen: IQ Scientific Institute for Quality of Healthcare; 2012.
Editorial Office Nursing. Verpleegkundig specialist mag medicatie voorschrijven [Specialist nurses allowed to prescribe medicines]. Nursing, Tijdschrift voor verpleegkundigen. 2 Nov 2011.
Houben-van Herten M, Te Riele S. Vrijwillige inzet [Volunteers]. Den Haag/Heerlen: Centraal Bureau voor de Statistiek; 2011.
KNMG. Aantal geregistreerde specialisten/profielartsen op 31 december van het jaar [Number of registered medical specialists per December 31]. 11 Jan 2013 and 10 Oct 2013.
Kooiker J, De Klerk M, Ter Berg J, Schothorst Y. Meebetalen aan de zorg. Nederlanders over solidariteit en betaalbaarheid van de zorg [Co-paying in health care. Dutch citizens about solidarity and affordability of health care]. Den Haag: Sociaal Cultureel Planbureau; 2012.
Ministry of Foreign Affairs. About the Netherlands. 2013. 10 Oct 2013.
Nederlandse Vereniging van Ziekenhuizen. Gezonde zorg. Brancherapport algemene ziekenhuizen 2012 [Healthy care. Branche report general acute care hospitals 2012]. Utrecht/Den Haag: Nederlandse Vereniging van Ziekenhuizen/SiRM; 2012.
Nederlandse Zorgautoriteit [Dutch Healthcare Authority]. Tariefbeschikking [Tariff decision]. TB/CU-7023-01; volgnr. 29. 16 Dec 2011.
Nederlandse Zorgautoriteit [Dutch Healthcare Authority]. Marktscan huisartsenzorg. Weergave van de markt tot en met 2011 [Market scan GP care. Overview of the market up and until 2011]. Utrecht: Nederlandse Zorgautoriteit; 2012.
Nederlandse Zorgautoriteit [Dutch Healthcare Authority]. Marktscan en beleidsbrief Zorgverzekeringsmarkt. Weergave van de markt 2009–2013 [Market scan and policy brief Health insurance market. Overview of the market 2009–2013]. Utrecht: Nederlandse Zorgautoriteit; 2013.
Nederlandse Zorgautoriteit [Dutch Healthcare Authority] & DBC-onderhoud. Toelichting op de honorariumberekening DBC-zorgproducten [Explanatory note to the calculation of the tariffs for medical specialists in DBC care products]. Utrecht: Nederlandse Zorgautoriteit; 2012.
OECD. OECD health data. 2013. 9 Sept 2013.
Oelen M. Meer verpleegkundigen willen medicatie voorschrijven [More nurses would like to prescribe medicines]. Nursing, Tijdschrift voor verpleegkundigen. 25 Apr 2013 and 10 Oct 2013.
Oudijk D, De Boer A, Woittiez I, Timmermans J, De Klerk M. Mantelzorg uit de doeken [Informal care explained]. Den Haag: Sociaal Cultureel Planbureau; 2010.
Reitsma M, Brabers A, Masman W, De Jong J. De kiezende burger [The choosing citizen]. Utrecht: NIVEL; 2012.
Roos AF, Schut FT. Evaluatie aanvullende en collectieve ziektekostenverzekeringen 2009 [Evaluation of VHI and collective health insurances 2009]. Rotterdam: Instituut Beleid en Management Gezondheidszorg (Erasmus MC) Erasmus Universiteit Rotterdam; 2009.
Schäfer W, Kroneman M, Boerma W, Van den Berg M, Westert G, Devillé W, Van Ginneken E. The Netherlands health system review. Health Syst Transit. 2010;12(1):1–228.
Statistics Netherlands. Gezondheid en zorg in cijfers 2012 [Health and healthcare in figures 2012]. Heerlen: Centraal Bureau voor de Statistiek; 2012.
Statistics Netherlands. Statline. 2013. 15 Sept 2013.
Ten Hove M, Van Hilten O, Berger-van Sijl M, Mets-op’t Land JM. Zorgthermometer. Verzekerden in beweging [Care monitor. Insured on the move]. Utrecht: Vektis; 2012.
Van Hassel DTP, Kenens RJ. Cijfers uit de registratie van huisartsen. Peiling 2012 [Figures from the GP registry 2012]. Nivel: Utrecht; 2013.
World Health Organisation. European health for all database. 2013. World Health Organisation.
Health System in Singapore
39
William A. Haseltine and Chang Liu
Contents
Introduction
Organization and Planning
Regulation
Health Information Systems and Technology
The Role of Patients
Financing
Funding
Coverage and Subsidies
Sources of Revenue
Cost Control
Pooling of Funds and Purchasing
Healthcare Infrastructure
2012 Singapore Healthcare Professional Workforce
Workforce Trends
Paying Healthcare Professionals
Primary Care
Community Health Assist Scheme
Care Coordination
Long-Term Care
Breakdown of operators for various long-term care services
Mental Healthcare
The Private Hospitals
Reforms
Main Reforms
Recent Reforms
Planned Reforms
Assessment
User Experience
Health Outcomes
Transparency and Accountability
References

W. A. Haseltine (*) · C. Liu
ACCESS Health International, New York, NY, USA
e-mail: wahaseltine@me.com; chang.liu@accessh.org

https://doi.org/10.1007/978-1-4939-8715-3_47

Abstract

Singapore is a small island nation located off the southern tip of the Malay Peninsula in Southeast Asia. The country has a population of 5.31 million, of which 3.82 million are citizens and permanent residents. With a land area of 715 square kilometers, Singapore’s population density is 7,422 per square kilometer, making it one of the most densely populated sovereign states in the world. Ethnically, the population is overwhelmingly Chinese – almost 75%, followed by Malays at just over 13%, and Indians at 9%.

Introduction

Among the citizens and permanent residents (i.e., excluding nonresidents in the country), approximately 23% fall under the age of 20, 67% are between 20 and 64, and 10% are 65 or older. The median age is 38.4 (Department of Statistics, Singapore 2013).

Until 1959, the year it achieved internal self-government, Singapore was a colonial outpost of the British Empire. At the time of the British withdrawal, the country was impoverished, with no industrial base or natural resources upon which to build its economic future. After a brief and unsuccessful merger (1963–1965) with Malaysia – its much larger neighbor to the north – Singapore became a fully independent nation under a government controlled by the People’s Action Party or PAP.

The People’s Action Party has been the majority party ever since, and its longevity in power has provided Singapore with a remarkable era of political stability. This stability has over the years nurtured a consistent political vision, a constancy of purpose and action, and a culture of cooperation among all government ministries. As a result, its policymakers have been able to develop and implement extremely long-range plans that reflect the nation’s desire for collective well-being and social harmony. Stable, astute political leadership and long-term economic policy planning have turned the once impoverished country into an economic powerhouse, allowing it to build its world-class healthcare system.

The earliest steps leading up to the creation of the system involved improving the general state of public health through proper sanitation, control of infectious diseases, and development of clean water and food supplies. Once satisfied that they had reached these goals, health policy planners began to build the health system’s infrastructure, including primary care centers at the community level as well as regional hospitals.

Organization and Governance

Organization and Planning

Singapore’s Ministry of Health has overall government responsibility for addressing the healthcare needs of the people. Key ongoing activities include: assessment of needs and planning for services and for manpower, governance, and financing.

Assessing needs: The ministry makes regular projections of the disease burden and determines whether the current levels of service are sufficient. Service gaps that are detected are prioritized at the national and the regional levels.

Services planning: The ministry projects facility requirements for primary care locations, acute and community hospitals, nursing homes, and other services. Local care models are assessed to ensure they remain up to date with the latest medical advances as well as local developments. The ministry is also responsible for planning and developing the system’s IT capability.

Manpower planning: The ministry projects manpower demand and responds with training and education, attracting talent, and overseas recruitment as necessary to meet demand. It is
also responsible for workforce management including retention and upgrading of skills.

Governance and financing of the system: The ministry is also responsible for financing policies and governance, including a performance management system. It also creates feedback mechanisms to drive continual improvement in all areas of responsibility.

Regulation

The healthcare system is regulated by the Ministry of Health through legislation, regulation, and enforcement. One of its agencies, the Health Sciences Authority, regulates health products, including medicines. Professional bodies, including the Singapore Medical Council, Singapore Dental Council, Singapore Nursing Board, and Singapore Pharmacy Board, self-regulate their healthcare professionals through codes of ethics and conduct, practices, and guidelines.

One of the core regulatory functions of the ministry is the licensing of healthcare institutions under the Private Hospitals and Medical Clinics Act and conducting regular inspections and audits. These institutions provide services that aid in or provide medical diagnosis, treatment, rehabilitation, and management of patients. Laboratory and radiology services are two examples. Public and private hospitals, clinics, laboratories, and nursing homes are required to submit applications to the ministry for the license to operate. Pre-licensing inspections are conducted to ensure standards. Complaints, surveillance, and analysis of advertisements are used to identify potential problems, and they are followed up with compliance audits and possible prosecutions. Marketing by these licensed facilities is also regulated in order to safeguard the public against false or unsubstantiated claims and to prevent inducement to use nonessential services such as aesthetics medicine.

The ministry also works closely with professional bodies such as the Academy of Medicine and the College of Family Physicians and with union-associations such as the Singapore Medical Association as well as industry groups to discuss a wide range of issues such as their practice, ethics, and standards of care and to consult on policy and operational matters. The ministry also engages them to explain policy rationale and garner their support in implementing various initiatives.

The Health Sciences Authority regulates the manufacture, import, supply, presentation, and advertisement of health products – including medicines, complementary medicines (traditional medicine and health supplements), cosmetic products, medical devices, tobacco products, and medicinal products for clinical trials. Its mission is to ensure that all meet internationally benchmarked standards of safety, quality, and efficacy.

The insurance industry is regulated by the Monetary Authority of Singapore as part of its role as the financial regulatory authority of Singapore. The Ministry of Health regulates the segment of the health insurance market for plans that are paid by Medisave.

Health Information Systems and Technology

Singapore benefits from an information management system that collects, reports, and analyzes information to aid in the formulation of policy as well as the monitoring of implementation. Sources of information include administrative data and survey-based data, articles and reports from professional journals, and reports from external organizations.

The Singapore healthcare system is heavily invested in IT infrastructure and in the development of information systems for processing and storing large volumes of data in support of policy research, planning, operations, and monitoring. High-quality data standards, IT security, and audits are utilized to ensure accuracy and reliability of all information collected. In addition, external data is carefully screened to ensure that sources are reputable and trustworthy.

Both public and private healthcare providers are required to report their service statistics to the Ministry of Health, including two types of information: inpatient capacity and utilization, such as number of inpatient beds, beds in service, bed
occupancy rates, inpatient discharges, and average lengths of stay, and surgical procedures, including inpatient and day surgeries, and deliveries.

In addition, public providers are required to report on their polyclinic, specialist outpatient, and emergency department attendances.

The Role of Patients

The needs of the nation’s patients and stakeholders are taken into account through various means. Public consultation takes place before policies are enacted to ensure that public sentiment, concerns, and feedback are added to the discussion; that diverse views, testing, and refinement of ideas take place; and that public understanding and support are cultivated in order to facilitate implementation.

The Ministry of Health conducts an annual patient satisfaction survey for patients of the public sector healthcare institutions. The survey focuses on key service areas such as overall satisfaction and expectations, care coordination, facilities, care and concern shown by medical professionals, as well as their knowledge and skills.

Financing

Funding

Funding of the system comes from a combination of government subsidies, individual savings, insurance, and other third-party payers, such as employer benefits. The philosophy at the heart of Singapore’s system is the requirement that consumers of healthcare must share in the costs of their care. Thus, private expenditure on care (including Medisave, MediShield, and Medisave-approved insurance plans) is high compared to countries with comparable systems – almost 70% (68.6%) of the total national expense of healthcare. As a result, while government subsidies reduce the cost of services provided for those who opt for subsidized care, patients approach their healthcare choices knowing that they will pay a part of the bill. Still, national saving accounts, insurance programs, and a safety net help to ameliorate the financial burden.

Coverage and Subsidies

Subsidies flow to and through the healthcare system in this way: government pays subsidies directly to public hospitals, polyclinics, and other healthcare providers, reimbursing them for a portion of their costs for treating patients. The funding system is a hybrid mix of block grants and Casemix, a methodology for classifying and describing providers’ “output.” Approximately 70 medical conditions are financed through Casemix.

Hybrid block grants are allocated to public hospitals. A portion of the hospitals’ annual budgets is provided as a block grant, with the remainder provided on a piece-rate basis for 70 common conditions based on Diagnosis-Related Groups (DRG). DRG is a system for classifying inpatient and day surgery cases, according to the patients’ diagnosis and treatment, into one of more than 600 groupings. Hospitals can reallocate their savings for use in the broad areas that the Ministry of Health has identified, such as teaching and research. The hybrid block budgets are reviewed every 3 to 5 years against the actual workload of the care providers.

Patients receive the benefits of the government system of subsidies in a number of ways, including acute and inpatient care in specific ward classes in the public hospitals, outpatient care in the public polyclinics as well as the specialist outpatient clinics at public hospitals, and emergency care at all public hospitals. Eligible low- and middle-income patients may also receive subsidies for intermediate- and long-term care at facilities managed by voluntary welfare and private organizations, and for outpatient treatment for chronic and/or acute conditions, and also certain dental procedures, at private sector primary care providers.

Subsidies are closely linked to the ward classes in Singapore’s public hospitals, which range from private rooms to dormitory-style accommodations
with a corresponding range of amenities, choices, and prices but access to the same doctors and assurance of the same quality of care. There are four classes: A, B1, B2, and C. A is the most costly, with C the least costly. A-level wards contain private rooms with bath, air conditioning, and access to private doctors of the patient’s choice. C patients are in open wards, with eight or nine patients in a room, sharing a bath, and usually without air conditioning. Doctors are assigned to these patients.

As amenities increase, subsidies decrease. Patients in the A wards receive no subsidy, while C-ward patients receive subsidies of up to 80% – depending on their income – of their ward charges, drugs, and medical treatment. C-ward patients also receive subsidies on surgical procedures and physicians’ fees. In the wards between A and C, subsidies increase as amenities and choices decrease.

Class ward | Subsidy level
A | 0%
B1 | 20%
B2 | 65–50% [a]
C | 80–65% [a]
[a] Financial means testing determines eligibility for subsidy for patients in C and B2 wards

Means testing in public hospitals as of 1 January 2009

Average monthly income of patient (SGD) [a] | Citizens’ subsidy [d], Class C ward (%) | Citizens’ subsidy [d], Class B2 ward (%)
$3,200 and below [b] | 80 | 65
$3,201–$3,350 | 79 | 64
$3,351–$3,500 | 78 | 63
$3,501–$3,650 | 77 | 62
$3,651–$3,800 | 76 | 61
$3,801–$3,950 | 75 | 60
$3,951–$4,100 | 74 | 59
$4,101–$4,250 | 73 | 58
$4,251–$4,400 | 72 | 57
$4,401–$4,550 | 71 | 56
$4,551–$4,700 | 70 | 55
$4,701–$4,850 | 69 | 54
$4,851–$5,000 | 68 | 53
$5,001–$5,100 | 67 | 52
$5,101–$5,200 | 66 | 51
$5,201 and above [c] | 65 | 50
[a] Monthly income is defined as average monthly wage based on last available 12 month data (including bonuses)
[b] No income and property with annual value (estimated value of a property if it were rented out) $13,000 and below
[c] No income and property with annual value exceeding $13,000
[d] Singapore permanent residents in most income bands receive half the corresponding subsidy that citizens receive (Ministry of Health, Singapore)

Patients do have a choice in the matter of ward classes. Individuals with high incomes can choose the C ward, but their subsidy would be much lower than what a low-income individual receives. Conversely, low-income patients can choose to stay in a class A ward if they can pay for it.

Sources of Revenue

Government Healthcare Budget
Funding of the healthcare system takes place through the Ministry of Health. The ministry’s budget for fiscal year 2013 is $5.7 billion. The ministry’s budget is used for healthcare subsidies, promoting good health practices in the population, developing manpower, training of healthcare professionals, and infrastructure. A total of $4 billion is allocated for subsidies to Singaporeans receiving medical care at the public hospitals, polyclinics, community hospitals, and institutions providing intermediate and long-term care. Other budget allocations include $177 million for initiatives addressing obesity prevention, tobacco control, childhood preventive health services, chronic disease management, and public education, and $70 million for Medisave grants to newborn Singapore citizens (Ministry of Health, Singapore 2013c).

Private Expenditure on Healthcare
The other major source of funding for the system is private financing and expenditure on healthcare. Singaporeans pay co-payments and deductibles that are often higher than in other nations.
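
The interplay between public subsidy and private payment can be made concrete with a small lookup. The sketch below encodes only the citizens’ means-testing schedule tabulated under Coverage and Subsidies (1 January 2009) and the flat rates for Class A and B1 wards. The function names, and the simplification that one subsidy rate applies to the whole bill, are illustrative assumptions and not part of the Ministry of Health’s published method. The reduced rates for permanent residents are left out because the source gives them only approximately.

```python
from bisect import bisect_left

# Upper bounds (SGD) of the monthly income bands in the means-testing table.
BAND_UPPER_BOUNDS = [3200, 3350, 3500, 3650, 3800, 3950, 4100, 4250,
                     4400, 4550, 4700, 4850, 5000, 5100, 5200]
# Class C subsidy (%) for each band; the last entry covers $5,201 and above.
CLASS_C_SUBSIDY = [80, 79, 78, 77, 76, 75, 74, 73,
                   72, 71, 70, 69, 68, 67, 66, 65]
# In every row of the table the Class B2 rate is 15 points below Class C.
B2_OFFSET = 15
# Class A and B1 wards are not means tested.
FLAT_SUBSIDY = {"A": 0, "B1": 20}


def citizen_subsidy_pct(monthly_income, ward):
    """Return the citizens' subsidy (%) for a ward class (hypothetical helper)."""
    if ward in FLAT_SUBSIDY:
        return FLAT_SUBSIDY[ward]
    if ward not in ("B2", "C"):
        raise ValueError(f"unknown ward class: {ward!r}")
    # Index of the first band whose upper bound is >= the income.
    band = bisect_left(BAND_UPPER_BOUNDS, monthly_income)
    rate = CLASS_C_SUBSIDY[band]
    return rate - B2_OFFSET if ward == "B2" else rate


def patient_share(bill, monthly_income, ward):
    """Amount left to the patient, assuming one rate applies to the whole bill."""
    return bill * (100 - citizen_subsidy_pct(monthly_income, ward)) / 100


if __name__ == "__main__":
    for income, ward in [(3000, "C"), (4000, "B2"), (6000, "C"), (6000, "A")]:
        print(income, ward, citizen_subsidy_pct(income, ward))
```

Under these assumptions, whatever the subsidy does not cover is the portion that falls to Medisave, insurance, or out-of-pocket payment, which is the private expenditure discussed in this section.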

According to the World Health Organization, private expenditure amounts to almost 70% (68.6%) of the nation’s total expenditure on care. This statistic reflects the government’s guiding philosophy that healthcare is not free and, as stated earlier, that consumers of care must pay a portion of the cost of their care. Of the private expenditure, 74.2% represents out-of-pocket expenditure versus 8% from Medisave and 6% from MediShield and Integrated Shield Plans (World Health Organization 2013; Ministry of Health, Singapore).

At the heart of Singapore’s system of private financing and expenditure are mandated savings and insurance programs that help consumers pay for care. They are known as the “3Ms” – Medisave, MediShield, and Medifund. They play a critical role in maintaining the health and welfare of Singapore’s people and the success of the healthcare system itself. The most critical component of the trio is Medisave, a mandatory, individual medical savings account to which workers contribute a percentage of their wages, which employers match. Medisave grew out of the nation’s Central Provident Fund, a mandatory savings program originally created by the British during their rule of Singapore to help workers pay for their retirement. Contributions to the accounts are tax exempt, as are withdrawals. The account is used to pay for health services and health insurance for the account’s owner as well as for family members.

MediShield, the second of the 3Ms, is a low-cost insurance program paid for by the insured for coverage against catastrophic inpatient bills and selected outpatient care. MediShield premiums can be paid for from the individual’s Medisave account. Singaporeans are automatically enrolled in the program but are able to opt out if they so desire. Soon to be introduced is an extension of this program called MediShield Life, which will cover all Singaporeans.

Private health insurance is also available. While affordable, the plans also include deductibles and co-payments in accordance with the healthcare system’s requirement that consumers of care must contribute to the cost of their care. Catastrophic insurance is widely held and covers partial costs of expensive or long-term treatment. Insured patients must usually pay 20% of the cost of such care.

Private, Medisave-approved insurance plans, called Integrated Shield Plans, are meshed together with MediShield to form an integrated plan for users. Such private plans give patients additional benefits and coverage for paying the costs of private hospitals or Class A and B1 wards in the public hospitals. Policyholders keep the benefits and coverage afforded them by their basic MediShield plans. In addition, Medisave can be used to pay the premiums of the approved, private plans, subject to a limit. Like MediShield, they also include deductibles and co-payments in accordance with the healthcare system’s requirement that consumers of care must contribute to the cost of their care. Catastrophic insurance is widely held and covers partial costs of expensive or long-term treatment. Insured patients must usually pay 10–20% of the cost of such care (Ministry of Health, Singapore).

Medifund, the third “M,” is an endowment program funded by the government as a healthcare safety net that helps the poor pay for their care. Medifund was set up in 1993 to assist Singaporeans who could not pay their medical bills. Needy citizens can apply for assistance and are means tested before their applications are approved.

In addition to the 3Ms, another program, ElderShield, was introduced in 2002 to provide insurance coverage for the costs of long-term care necessitated by very serious disabilities in the elderly. ElderShield is an opt-out program that commences for individuals when they turn 40 years of age. The insurance is offered by private insurers only, who are selected through competitive bidding that takes place every 5 years. Premiums are fixed at a flat rate based on the age of the individual joining the program and are paid by the insured until age 65. Benefits are set at fixed payouts of $400 per month.

Cost Control

Singapore is a leader in keeping costs under control, and it does so while providing world-class healthcare. The nation spends 4.5% of GDP on care versus, for example, 17.9% of GDP in the United States and 9.3% in the United Kingdom. Here are some examples of private and public spending on healthcare for several nations. All data as of 2010.

Indicator | Singapore | United States | India | China
Total expenditure on health as % of GDP | 4.5 | 17.6 | 3.7 | 5
General government expenditure on health as % of total expenditure on health | 31.4 | 48.2 | 28.2 | 54.3
Private expenditure on health as % of total expenditure on health | 68.6 | 51.8 | 71.8 | 45.7
(World Health Organization 2013)

Singapore controls the costs of healthcare in a number of ways, perhaps first and foremost in the manner by which it both fosters and controls competition. The nation approaches healthcare as a quasi-capitalist market. Amid concerns in the early 1990s of soaring health costs, the government issued a white paper entitled “Affordable Health Care” that, among other issues, set the goal of engaging competition and market forces to improve service and raise efficiency. It was established that government would intervene directly in the healthcare sector when the market failed to keep costs down. This became the guiding policy of the system. Public and private hospitals exist side by side in this market, with the public sector having the advantage of patient incentives and subsidies. Because it can regulate the number of public hospitals and beds, the government is able to shape the environment of the marketplace. Within that environment, market forces regulate the private sector, which must be careful to not price itself out of the market. At the same time, government sets subsidy and cost-recovery targets for each ward class, which indirectly keeps the public sector hospitals from producing excess profits. Hospitals are also given annual budgets for patient subsidies, so they can plan accordingly, knowing in advance the levels of reimbursement they will receive for patient care. They are required to break even within this budget. The entire system functions successfully because the quality of care in the public hospitals is extremely high and is scrupulously maintained.

Singapore also regulates the number of medical students studying in the country, as well as the number of foreign medical schools’ degrees recognized in the country. In this way, the number of practicing physicians is controlled, preventing an oversupply of medical services and avoiding induced demand. The medical savings programs, the insurance programs, and the subsidies to public hospitals are continually adjusted. The numbers of beds in the public hospitals are carefully controlled. Government regulates and limits the private insurance programs available to Singaporeans. Wages of doctors in the public sector are kept reasonable and not sky-high and are periodically reviewed with the goal of keeping them competitive with the private sector.

The private sector operates and thrives in this quasi-capitalist environment, serving patients who wish to pay more for certain services or amenities and competing with public sector facilities on price and quality.

Price Transparency
Another factor controlling costs is price and outcome transparency. The Ministry of Health makes available on its website the hospital bills for common illnesses (arranged alphabetically from anemia to urinary stone), treatments, and ward classes: http://www.moh.gov.sg/content/moh_web/home/costs_and_financing/HospitalBillSize.html.

Patients can look up the costs of specific surgeries, the number of cases treated in each hospital, tests, and more. The data is complete for public sector hospitals while private hospital data is voluntary and may not carry the detail of the public sector information. Armed with pricing
information, consumers of care can better shop for the services they require.

Pooling of Funds and Purchasing

Currently, there is no framework to pool funds to purchase provider services and goods, although a system does exist that aggregates demand for bulk purchasing pricing. The Group Purchasing Office (GPO Pharma) consolidates drug purchases at the national level. One goal of this system is to keep drug prices affordable for the elderly and lower-income groups and contain the costs of pharmaceutical-related expenditure. GPO also purchases medical supplies, equipment, and IT services for the healthcare system.

Physical and Human Resources

Healthcare Infrastructure

The data below provide a clear snapshot of the main components of Singapore’s healthcare infrastructure as of December 2012:

Number of public acute hospitals (beds): 7 (6,985)
Number of public specialty centers (beds): 8
Number of private acute hospitals (beds): 9 (1,555)
Number of private other hospitals (beds): 1 (20)
Number of public polyclinics: 18
Number of private medical clinics for primary care: about 2,400
Number of community health centers: 2
Number of nursing homes: 66
Number of hospices: 4

2012 Singapore Healthcare Professional Workforce

Profession | Public | Private | Total (in active practice)
Total no. of doctors | 6,131 | 3,515 | 9,646
Specialists | 2,342 | 1,293 | 3,635
Nurses | 20,911 | 8,348 | 29,259
Midwives | 89 | 65 | 154
Dentists | 357 | 1,215 | 1,572
Optometrists/opticians | 155 | 2,124 | 2,279
Pharmacists | 934 | 1,048 | 1,982

Information on the number of occupational therapists, psychologists, and medical lab technicians is not available at this time (Ministry of Health, Singapore).

Workforce Trends

Anticipating growth in demand, Singapore will expand its healthcare professional workforce by 20,000 by the year 2020. This increase covers doctors, nurses, dentists, pharmacists, and allied health professionals, representing a 50% increase from 2011. The nation is also expanding its training pipeline, encouraging mid-career professionals to join the healthcare sector, and supporting older healthcare staff who wish to continue working for as long as they can.

Singapore is also looking to greater use of technology, such as tele-consultations, and equipment such as patient mobility aids to raise the productivity levels of its professional workforce.

Paying Healthcare Professionals

The data below show the gross monthly wage of healthcare professionals in Singapore in 2012.

2012 Gross monthly wage [a]

Healthcare professionals | 25th percentile ($) | Median [b] ($) | 75th percentile ($)
Primary care doctors: General practitioners/physicians | 9,058 | 11,398 | 16,358
Specialist doctors: Specialist medical practitioners (medical) | 9,919 | 20,516 | 30,300
Specialist doctors: Specialist medical practitioners (surgical) | 15,610 | 21,595 | 26,516
Nurses (registered nurses) | 2,583 | 3,061 | 3,837
Pharmacists | 3,743 | 4,387 | 5,574
Psychologists | 2,841 | 3,537 | 4,498
Occupational therapists | 2,880 | 3,139 | 3,723
Medical technicians: Medical and Pathology Lab. Tech | 2,700 | 3,268 | 4,492
[a] Data on monthly gross wages was collected from the Occupational Wage Survey, 2012. Monthly gross wage refers to the sum of the basic wage, overtime payments, commissions, allowances, and other regular cash payments. It is before deduction of employee CPF contributions and personal income tax and excludes employer CPF contributions, bonuses, stock options, other lump-sum payments, and payments in kind. Detailed information on the Survey Coverage and Methodology is available online
[b] Median wage refers to the wage level at the middle of the wage distribution which divides the bottom half of wage earners from the upper half (Ministry of Manpower, Singapore 2012)

Delivery of Health Services

Primary Care

Primary care is provided mainly by private rather than public providers. There are 2,400 private medical clinics offering primary care. Many belong to private general practitioner chains, and they are located throughout the neighborhoods of Singapore.

Specialized ambulatory surgical services are provided at the Singapore General Hospital and National University Hospital.

Community Health Assist Scheme

Singapore’s public healthcare sector has been strengthening its ties with the large networks of private general practitioners. They are being enlisted into the Community Health Assist Scheme, a program that provides basic care, treats certain chronic illnesses, and offers dental care. Lower- and middle-income patients can receive subsidized outpatient services, including dental services, at the private clinics just as they would at a government polyclinic. The program also covers treatment of common chronic diseases.

Care Coordination

Singapore’s Agency for Integrated Care (AIC) was set up under MOH Holdings in 2009 and operates at the patient, provider, and system levels for the benefit of patients and families. The Agency provides hospitals with teams – called Aged Care Transition Teams or ACTION Teams – that coordinate discharge planning and facilitate the transition of elderly patients from hospitals to the intermediate and long-term care sector. It is also the central referral agency for aged care services in the community, such as nursing homes, community hospitals, day rehabilitation, day dementia care, homecare services, and home hospice services. Another key function of the Agency is working with primary and intermediate and long-term care providers to expand service capacity and improve healthcare capabilities. The Agency also supports caregivers through “AIC @ City Square Mall,” a program that provides
There are 18 public polyclinics – multi-doctor facil- information on community support resources and
ities providing outpatient care, immunization ser- referral services for health and social care issues. In
vices, health screening, pharmacy services, and addition, the Agency administers financial assis-
more. Some also offer dental services. The tance schemes such as the Caregivers Training
policlinics are meant to serve lower-income patients Grant and the Foreign Domestic Worker Grant, to
who might not be able to afford the fees of the help families offset the costs of hiring and training
private clinics. foreign domestic workers.
Long-Term Care

Long-term care services for the elderly are managed by voluntary welfare organizations or by private operators. These include both residential and nonresidential care options. Government subsidies are available for seniors who utilize these services, subject to a means test.

Residential facilities cater to the convalescent sick or elderly individuals who do not require hospital care but are too ill or frail to care for themselves or to be cared for at home. Examples of residential care facilities include nursing homes and inpatient hospices. Respite care services are also available to caregivers.

Nonresidential services such as home and community-based care are also available to support the elderly. Home care services involve the care staff visiting the homes of the homebound elderly to provide medical, nursing, social care, and/or palliative care services. There are also eldercare services, such as maintenance day care, day rehabilitation, and dementia day care, provided within centers in the community. The elderly attend these centers during the day but go back to their homes in the evening. Such nonresidential services are important in providing alternative care options to institutionalization and in helping seniors age gracefully in the community.

Below is a listing of the various long-term care services available in Singapore.

Breakdown of operators for various long-term care services

Facility                                                                   Total no.   VWO-run   Private operators
Residential facilities
  Nursing homes                                                            66          32        34
  Inpatient hospices                                                       4           4         0
Nonresidential facilities
  Eldercare day centers                                                    69          63        6
  Home palliative care providers                                           9           8         1
  Home care providers (including home healthcare and social services)      38          20        18

Note: All figures are dated as of Dec 2013 except for the number of Eldercare day centers and nursing homes, where figures are as of current (Ministry of Health, Singapore)

Mental Healthcare

The National Mental Health Blueprint, formulated in 2007, guides Singapore agencies in providing mental health services, including active mental health education and prevention as well as early detection and treatment for people at risk or facing emotional difficulties. The Community Mental Health Master Plan, developed in 2012, lays the groundwork for building a network of care and supporting systems to enable integrated community living.

In addition, resources and workshops developed by Singapore’s Health Promotion Board promote mental well-being. Programs are targeted at the young and the old as well as their family members/caregivers.

Singapore has one acute tertiary psychiatric hospital – the Institute of Mental Health. Services offered there include psychiatric, rehabilitative, and counseling services for children, adolescents, adults, and the elderly, long-term care, and forensic services. The Institute also houses the National Addictions Management Services to treat patients with addictions.

Psychiatric services are also embedded in the other public hospitals, which offer general as well as more specialized services, such as services for eating, sleep, and addiction disorders and geriatric psychiatry.

Community Care

As of this writing, Singapore is rolling out a series of community-based mental health services to complement those offered in tertiary mental health facilities. Balanced development of tertiary
and community-based services has been shown to improve health and social outcomes while reducing system cost. Components of the community care program include: multidisciplinary shared care teams that provide treatment and care to the mentally ill through service networks in the community, support for caregivers to cope with caregiving, and a community safety network for people with dementia and depression and their caregivers. There are also community-based, targeted mental health programs for youths, adults, and the elderly.

Psychiatric Intermediate and Long-Term Care

The majority of psychiatric long-term care services, where individuals require residential care or a period of transition and close supervision after discharge, are provided by the Institute of Mental Health and voluntary welfare organizations – supported by the Ministry of Health and the Ministry of Social and Family Development – such as Singapore Association for Mental Health and Singapore Anglican Community Services. Types of long-term care facilities include psychiatric nursing homes, rehabilitation homes, and day care centers.

Pharmaceutical Care

In Singapore, pharmacists are now involved in providing more direct patient care as members of multidisciplinary healthcare teams. In the public sector, pharmacy services and pharmaceutical care by pharmacists are provided through the Departments of Pharmacy at each public hospital/institution.

Pharmacists dispense and review medications, provide medication counseling to patients upon discharge, and perform specialized clinical pharmacy services in hospitals, for example as dedicated ICU pharmacists.

In the outpatient and community setting, pharmacists also undertake health management and disease prevention counseling, provide patient medication management and adherence services, and run specialized pharmacy clinics, such as an anticoagulation clinic.

In the intermediate- and long-term care setting as well as in the home, programs have been introduced where pharmacists visit nursing homes and aid in managing residents’ medication needs more effectively. With the Pharmacist Outreach Program, pharmacists visit the homes of referred patients to check medication compliance and identify and address drug-related problems in consultation with the primary physician.

Pharmacists are also involved in the supply of medicines and in medication safety at the institutional level, through reviewing drug formularies and monitoring the use of drugs. They are also involved in medication safety initiatives at the institutional or national level, in medication error reporting and monitoring frameworks, and in the monitoring and reporting of adverse drug events.

The Private Hospitals

Private hospitals account for approximately 20% of inpatient beds. Patients may use either the public or private system, as long as they can pay the costs of their preferred provider. Luxury amenities are available in some of the private hospitals. Private hospitals are also more involved in medical tourism than are the public facilities. Parkway Pantai is the main private hospital group in Singapore.

There is a trend toward tapping private hospitals’ spare capacity to treat subsidized patients from the public system. Private hospitals’ bed occupancy rate averages about 55% (MOH 2012 Committee of Supply Speech).

Reforms

Main Reforms

Several main reforms in the Singapore system are aimed at making healthcare more affordable for consumers.

The Community Health Assist Scheme, which provides subsidized healthcare services at private (as
opposed to public) general practitioner clinics, will no longer have age restrictions, opening up subsidized medical and dental care at over 900 private clinics for lower- and middle-income Singaporeans.

Currently, Medisave can be used for treatment of ten chronic diseases in the outpatient setting. The government is also expanding Medisave use for five more chronic conditions – osteoarthritis, benign prostatic hyperplasia, anxiety, Parkinson’s disease, and nephritis/nephrosis (chronic kidney disease) – bringing the total number up to 15. These will also be subsidized through the Community Health Assist Scheme, again giving patients the opportunity to be treated at the private clinics.

High-risk groups will also benefit from expanded Medisave use for pneumococcal and influenza vaccinations. Over the years, Medisave use has been expanded gradually to cover chronic conditions such as diabetes and high blood pressure as well as health screenings and vaccinations for selected groups. The Medisave Contribution Ceiling was increased in 2016, and there is no longer a Medisave Minimum Sum.

Recent Reforms

MediShield Life

MediShield Life is the recent reform and transformation of the national health insurance program. The reform, initiated in November 2015, aims to address the growing need for chronic disease care and long-term care. Coverage is now universal and compulsory and includes individuals with preexisting conditions. Previously ending at age 90, coverage is now for life. The lifetime cap on benefits has been removed and the annual limit increased to SGD100,000.

Another recent change provides better protection from large hospital bills, by reducing coinsurance payments below 10 percent for the portion of the bill exceeding SGD5,000. Fewer than 1 percent of Singaporeans will need to pay additional premiums.

Planned Reforms

Better Care for the Aged

In 2015, the Ministerial Committee on Ageing unveiled new features of an SGD3 billion national plan to help Singaporeans age with confidence, lead active lives, and maintain strong bonds with family and community. The plan encompasses about 60 initiatives covering 12 areas: health and wellness, learning, volunteerism, employment, housing, transport, public spaces, respect and social inclusion, retirement adequacy, health care and aged care, protection for vulnerable seniors, and research.

Patient costs at specialist outpatient clinics in public hospitals will be lowered through increased subsidies for lower- and middle-income groups. As of this writing, complete details of the plan have not been announced.

Assessment

The User Experience

Singapore’s Ministry of Health conducts an annual patient satisfaction survey that helps it understand patients’ levels of satisfaction and expectations for the public healthcare institutions. The survey includes patient satisfaction for certain service attributes such as waiting times, facilities, and care coordination.

The results of the 2012 survey showed that 77.1% of respondents indicated they were satisfied; 77.7% of patients would “strongly recommend” or “likely recommend” the healthcare institutions to others based on their own experience.

Health Outcomes

Singapore’s healthcare system delivers very high-quality care with outcomes that are usually better than those found in most high-income countries. It is ranked sixth globally by the World Health Organization – far ahead of the United States at number 37 and the United Kingdom at 18.
• Life expectancy for women – currently 84.5 years versus 65 years in 1960
• Life expectancy for men – currently 79.9 years versus 61.2 years in 1960

Singapore also has a vastly improved survival rate among newborns and infants, a rate better than that of most developed countries:

• Neonatal mortality rate per 1,000 births is now 1.1 versus 17.7 in 1960.
• Infant mortality rate per 1,000 births is now 1.8 versus 34.9 in 1960.

Other outcomes:

• Under 5 mortality rate (per 1,000 live births – both sexes) is 2.8 versus 7.5 in 1990

Source: Singapore registry of births and death report (2012).

In addition, its cancer survival rates are similar to Europe’s, and its cardiovascular disease death rate is half that of the rest of the Asia/Pacific region.

Efficiency

Singapore uses a performance measurement and management process to help healthcare providers assess and benchmark their performance against their peers. The National Health System Scorecard uses internationally established performance indicators to compare performance in Singapore. The Public Acute Hospital Scorecard is used to measure institutional-level performance. Its indicators cover clinical quality and patient perspectives. Similar scorecards for providers are being rolled out in primary care facilities and in community hospitals.

The scorecards lay out the standards of service and key deliverables required of the public healthcare institutions, and they are monitored to ensure compliance. They incorporate internationally accepted indicators and definitions where possible, such as the Centers for Medicare & Medicaid Services Joint Commission-aligned measures for acute myocardial infarction and stroke. The inclusion of these evidence-based and validated indicators allows for comprehensive benchmarking, enabling identification of areas of strong performance as well as areas where improvements are needed.

In 2008 Singapore introduced a set of National Standards for Healthcare which is used to set priorities for improvement efforts and alignment with planning initiatives. It focuses on key areas of concern and promotes a culture of continuous quality improvement.

National Standards for Healthcare is implemented through a network of Healthcare Performance Offices each chaired by a senior clinical leader who reports directly to the institution’s chief executive officer/chairman medical board. Resulting quality improvement outputs can then be incorporated into the National Health System Scorecard and the Public Acute Hospital Scorecard for performance analysis and monitoring.

Transparency and Accountability

Regarding policy development and implementation, Singapore’s Ministry of Health consults stakeholders and the public before policies are enacted. Stakeholders are engaged through dialogue and the public through public consultation. A set of principles and processes guides the public consultation, ensuring that public sentiment, concerns, feedback, and diverse views are taken into account.

The Ministry of Health also gathers data on consumer needs and determines actionable insights that might improve healthcare policies. It also engages in extensive face-to-face conversations through visits to private and public sector institutions, town halls, and feedback sessions. The ministry also identifies potential issues and concerns from the complaint and appeal letters it receives from customers or their Members of Parliament. Quarterly Customer Feedback reports are brought to senior management meetings for discussion. The corporate planning cycle incorporates the review of customer feedback as a key process to guarantee policy responsiveness.
Some concrete actions taken as a result of public consultation include: extension of Medisave use for pneumococcal vaccination, treatment of schizophrenia and major depression, and expanded coverage for major chronic diseases; raised withdrawal limits for community hospital stays and day rehabilitation center visits; and Medisave use for mammograms and colonoscopies. Directly as a result of customer feedback, Medisave withdrawals were also extended to palliative care, including palliative care in the home.

References

Department of Statistics, Singapore. http://www.singstat.gov.sg/statistics/latest_data.html#8/ http://www.singstat.gov.sg/publications/publications_and_papers/cop2010/cop2010adr.html. Accessed Oct 2013.
Yong Gan Kim. Straitstimes.com http://www.straitstimes.com/mnt/html/parliament/mar6-GanKimYong-pt1.pdf
Ministry of Health, Singapore. MOH (APO). 2012.
Ministry of Health, Singapore. 2013. http://www.moh.gov.sg/content/dam/moh_web/Publications/Educational%20Resources/2009/MT%20pamphlet%20%28English%29.pdf. Accessed Oct 2013.
Ministry of Health, Singapore. 2013. http://www.moh.gov.sg/content/moh_web/home/costs_and_financing/schemes_subsidies/Medishield/Medisave-approved_Insurance.html. Accessed Oct 2013.
Ministry of Health, Singapore. Expenditure overview. 2013c. http://www.singaporebudget.gov.sg/budget_2013/expenditure_overview/moh.html. Accessed Oct 2013.
Ministry of Health. All Singapore residents to enjoy universal coverage under MediShield Life, with no exclusions. 2015a. https://www.google.com.sg/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwikkpWSzPDRAhVJMo8KHWjXAhoQFggaMAA&url=https%3A%2F%2Fwww.moh.gov.sg%2Fcontent%2Fmoh_web%2Fhome%2FpressRoom%2FpressRoomItemRelease%2F2015%2Fall-singapore-residents-to-enjoy-universal-coverage-under-medish0.html&usg=AFQjCNFJfEHv-OmUHPEjw-RiELmI8afPFw
Ministry of Health. Better health, better future for all. Ministry of Health Initiatives for 2015. 2015. https://www.google.com.sg/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwiq4P-GzPDRAhXCRY8KHZr9BQYQFggaMAA&url=https%3A%2F%2Fwww.moh.gov.sg%2Fcontent%2Fdam%2Fmoh_web%2FPressRoom%2FResources%2FMOH%2520Factsheet.pdf&usg=AFQjCNFMoyT0VGwMdxLjebX-UVSJsu5EOg
Ministry of Manpower, Singapore. Median, 25th and 75th percentile of monthly gross wages of Common Occupations in Health Industry. 2012. Ministry of Manpower (Table 4.9, http://stats.mom.gov.sg/Pages/Occupational-Wages-Tables-2012.aspx#)
Weizhen T. Today Online. Thursday Oct 17 2013. http://www.todayonline.com/singapore/medishield-life-more-sustainable-private-medical-schemes-gan
World Health Organization. World health statistics 2013. http://www.who.int/gho/publications/world_health_statistics/2013/en/. Accessed Oct 2013.

Health System in the USA
40
Andrew J. Barnes, Lynn Y. Unruh, Pauline Rosenau, and
Thomas Rice
Contents
Introduction
Public and Private Organizations
Financing of Major Insurance Programs
Coverage
Sources of Revenue
Financing and Financial Flows
Medicare
Medicaid
Private Insurance
Human Resources
A. J. Barnes (*)
Department of Health Behavior and Policy, School of
Medicine, Virginia Commonwealth University, Richmond,
VA, USA
e-mail: andrew.barnes@vcuhealth.org
L. Y. Unruh
Department of Health Management and Informatics,
College of Health and Public Affairs, University of Central
Florida, Orlando, FL, USA
e-mail: lynn.unruh@ucf.edu
P. Rosenau
Division of Management, Policy and Community Health,
School of Public Health, University of Texas Health
Science Center at Houston, Houston, TX, USA
e-mail: pauline.rosenau@uth.tmc.edu
T. Rice
Department of Health Policy and Management, Fielding
School of Public Health, University of California, Los
Angeles, CA, USA
e-mail: trice@ucla.edu

https://doi.org/10.1007/978-1-4939-8715-3_48
Provision of Health-Care Services
Public Health
Outpatient Services
Acute Inpatient Care
Long-Term Care
Palliative Care
Reforms
Assessment
Overview
Access
US Data
International Comparisons
Outcomes and Quality
Expenditures
Conclusions
References

Abstract

This analysis of the US health system reviews its organization and governance, health financing, health-care provision, health reforms, and health system performance. The US health system has both considerable strengths and notable weaknesses. It has a large and well-trained health workforce, a wide range of high-quality medical specialists as well as secondary and tertiary institutions, and a robust health research program and, for selected services, has among the best medical outcomes in the world. But it also suffers from incomplete coverage of its citizenry, health expenditure levels per person far exceeding all other countries, poor health indicators on many objective and subjective measures of quality and outcomes, an unequal distribution of resources and outcomes across the country and among different population groups, and lagging efforts to introduce health information technology. It is difficult to determine the extent to which deficiencies are health system related, though it seems that at least some of the problems are a result of poor access to care. Because of the adoption of the Affordable Care Act (ACA) in 2010, the USA is facing a period of enormous potential change. The major provisions of the ACA were implemented in 2014, although judicial setbacks, delays, and legislative repeals to its core provisions have reduced its overall impact. Improving coverage was a central aim, envisaged through mandates that certain individuals purchase, and employers offer, private health insurance as well as subsidies for lower-income uninsured citizens to purchase private insurance. However, in late 2017, the individual mandate to purchase insurance was repealed by Congress, with an effective date of January 2019. Eligibility for Medicaid, which provides public coverage for low-income individuals and families, is also expanded, and greater protections for insured persons have been instituted. Furthermore, primary care and public health are receiving increased funding, and improving quality and controlling expenditures are addressed through a range of policies. Early assessments of the ACA suggest coverage rates have expanded, particularly for low-income adults in some states. Whether the ACA will be effective in addressing the US health-care system’s historic challenges can only be determined over time.

The material used in this chapter was adapted or taken directly from our book on the US health-care system – Rice T, Rosenau P,
Unruh LY, Barnes AJ, Saltman RB, van Ginneken E, Health Syst Transit 15(3):1–431, 2013.

Introduction

The US is a large, wealthy country, with double the gross domestic product of any other in the world. It is a federal, constitutional democracy, with decision-making authority divided between the federal and state governments. In 2016 nearly one-fifth (17.9%) of its economy was spent on health care ($3.3 trillion), amounting to $10,348 per capita (Hartman et al. 2017). As with many such national averages in this report, there are wide variations across the states, with spending per capita in 2014 ranging from about $5,982 per person in Utah to more than $11,944 in the District of Columbia (Kaiser Family Foundation 2014a). Tax rates are lower than in almost all other high-income countries, consistent with the fact that its public sector provides fewer social services. Despite being a high-income nation, the US ranks poorly, compared to other high-income countries, on measures of income equality. Because the US birth rate is higher than that of most developed countries, its dependency ratio – those too young or too old to work, divided by the working age population – is expected to grow more slowly than in most other countries.

The racial and ethnic makeup of the US population is quite varied, with approximately 61.3% non-Hispanic White, 17.8% Hispanic or Latino, 13.3% non-Hispanic Black or African American, and the remainder other and/or mixed racial and ethnic groups (US Census Bureau 2017). Hispanics and Latinos are the fastest-growing group, with a 49% population increase between 2000 and 2010, compared to just 5% for others (Ennis et al. 2011). This proportional relationship also continues to change: Asians have replaced Hispanics and Latinos to be the fastest-growing group, with a total population of 21 million as of 2015, representing a 3.4% increase compared with 2014 (U.S. Census Bureau 2016). Moreover, in California, there are now almost twice as many Hispanics and Latinos age 18 and younger than there are whites (Kidsdata.org 2015).

Historically, the US has resisted central planning or control at both the federal and state levels. The US health-care system reflects this wider context, having developed largely through the private sector and combining high levels of spending with distinctively low levels of government regulation. The US spends far more money on health care per person than any other country.

International comparison shows a varied picture with respect to access to health care, health behaviors, and outcomes. The US is unusual among high-income OECD countries in that most Americans still receive their coverage from private health insurance, and more than 12% of non-elderly adults are uninsured, although this proportion has been reduced significantly through implementation of the Affordable Care Act (Kaiser Family Foundation 2016a). With regard to health behaviors, the picture is again varied; the USA has been notably effective in reducing smoking rates and has one of the lowest smoking rates internationally. But it has been less effective in grappling with nutritional health and obesity. The US does well on some disease indicators (e.g., certain cancers) but poorly on others (e.g., asthma). Compared to other developed countries, life expectancy is lower and mortality is higher (World Bank 2017).

Organization and Governance

Public and Private Organizations

In the US health-care system, public and private payers purchase health-care services from providers subject to regulations imposed by federal, state, and local governments as well as by private regulatory organizations. Figure 1 illustrates the interplay between four main actors: (1) government, (2) private insurance, (3) providers, and (4) regulators, as well as the types of relationships that connect them.
Fig. 1 Organization of the US health system (diagram not reproduced here; it shows federal, state, and local government, public insurance programs, private insurance, providers, and regulatory organizations, connected by regulation, hierarchical, and contract relationships)

Government actors include those at the federal, state, and local levels. Both the federal and state governments have executive, legislative, and judicial branches (although the figure only shows this for the federal government). Under the executive branch of the federal government, the Department of Health and Human Services (HHS) plays the largest administrative role in the US health-care system. HHS includes agencies such as the Centers for Medicare and Medicaid Services (CMS), which administers the two major public health insurance programs: (1) Medicare, which provides near-universal coverage for those 65 and older as well as the disabled and those with end-stage renal disease, and (2) Medicaid and the Children’s Health Insurance Program (CHIP), which primarily provide insurance for some low-income families and those with disabilities. Medicaid also covers long-term care services after individuals have used up all their own income and assets and, along with Medicare, covers low-income seniors (referred to as “dual eligibles”). Other agencies within HHS include research and regulatory agencies such as the Agency for Healthcare Research and Quality (AHRQ), the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the National Institutes of Health (NIH). The Department of Veterans Affairs, which oversees the Veterans Health Administration to provide care to military veterans, is a federal agency independent of HHS.

Public purchasers include federal and state agencies. Medicare is the largest public purchaser. State governments, together with funds provided by the federal government, purchase health-care services through Medicaid and CHIP, although both programs are state-administered. Both state and local governments are also involved in providing health care in a number of ways, making it possible for low-income and other disadvantaged individuals and families to obtain care. These include operating public hospitals as well as providing medical and preventive services through state and local health departments and their associated clinics and community health centers.

In addition to government purchasers, private insurers and individuals also purchase health care in the US. Private insurance plans have historically been categorized into three types: health maintenance organization (HMO) plans that provide or contract to provide managed care, preferred provider organization (PPO) plans that contract with a preferred network of providers to provide care at lower costs, and high-deductible health plans (HDHPs) that typically offer lower premiums but higher deductibles than HMOs and PPOs. The vast majority of Americans with private insurance obtain it through an employer. The Patient Protection and Affordable Care Act (ACA), signed into law on March 23, 2010, is resulting in significant changes in the US health-care system. As shown in Fig. 1, these include the establishment of federal and state-based insurance exchanges for individuals without access to public or employer-based insurance to purchase private coverage as mandated by law. The ACA also allows providers that organize into Accountable Care Organizations (ACOs) to share in savings they achieve in the Medicare program.

Planning

There is a range of public and private organizations that undertake health system planning in the US. In spite of this, coordinated health planning by the various actors outlined in Fig. 1 is not highly developed. In part this reflects the pluralist and market-oriented nature of the US health-care system. Planning for emergencies and natural disasters, however, is given serious consideration in both the government and private sector. For example, the CDC plans for national and international response to public health emergencies.

Regulation

All actors in the health-care system are subject to regulation, often from multiple government and nongovernment agencies. Major federal regulatory organizations fall under the umbrella of HHS and include CMS, which regulates public payments to private providers and provider quality; the CDC, which focuses on prevention and control of communicable and noncommunicable diseases; and the FDA, which regulates food and drug safety. State regulatory bodies include public health departments, provider licensing boards,
and insurance commissioners. Local counties and cities also regulate health care through their public health and health service departments, including regulating communicable diseases and restaurant safety. Independent nongovernment and provider organizations such as the American Medical Association (for physicians) and the Joint Commission (for hospitals) also play a regulatory role in the US health-care system.

Patient Rights

The US does not have a national comprehensive Patient Bill of Rights (WHO August 2007). The right to health care is not in the US Constitution, and it remains controversial, though some states have enacted a Patient Bill of Rights. Some patient rights in the US have been initiated by the court system. For example, the Supreme Court ruled that individuals with disabilities have the right to receive services in non-institutional settings whenever possible. Since the 1990 passage of the Americans with Disabilities Act (ADA), those in the US with physical and/or mental disabilities have been granted additional civil rights. The Health Insurance Portability and Accountability Act (HIPAA) of 1996 governs the security and confidentiality of patient information. As a result of this legislation, how patient information is collected, stored, and transferred is subject to careful protection.

Financing of Major Insurance Programs

Coverage

Public purchasers – primarily Medicare and Medicaid – cover more than 30% of the population (Kaiser Family Foundation 2016b). The remainder of the US population – including those with employer-sponsored health insurance, individual private insurance, and the uninsured – are considered private purchasers. More than half of Americans obtain health insurance from their employer. Employer-sponsored coverage is funded by a combination of employer and employee premiums and employee out-of-pocket costs. After implementation of the ACA and the expansion of the individual private insurance market through income-based subsidies, nearly 16 million Americans have individually purchased coverage, at least half of whom purchased private insurance through one of the federal or state-based exchanges. In 2016, 2 years after the implementation of the ACA’s major coverage expansion efforts, approximately 9% of all Americans were uninsured (28 million), including many young adults, minorities, and low-income households (Kaiser Family Foundation 2017a, 2018a).

Sources of Revenue

The sources of revenues in the US health-care system have changed considerably over the past 40 years. In 1970, one-third of funding was from out-of-pocket payments. Currently, public sources constitute 37% of spending and private sources 34%, and 11% is paid out-of-pocket (CMS 2016). While out-of-pocket payments have fallen as a percentage of the total, real out-of-pocket spending per person has actually risen considerably. This is because the size of the health-care system has grown so rapidly.

Financing and Financial Flows

Broadly speaking, financing in the US health-care system originates from employers, employees, and individuals. From them, it flows to private insurers and health plans as well as state and federal governments. Private and public purchasers then transfer dollars to providers through a variety of payment mechanisms. Figure 2 depicts financial flows in the US health-care system.

Beginning with the left-hand side of the figure, employers, employees, individuals, and charities pay into the health-care system through various taxes, premiums and other out-of-pocket expenses, and donations. Employed persons and their families contribute to private employer-sponsored insurance through premiums and cost
Fig. 2 Sources of revenue, financing, and financial flows [figure not reproduced; it traces direct payments, transfer flows, and service flows from employers, employees, individuals, and charities, through payroll, corporate, income, sales, and property taxes and through premiums and cost sharing, to the Medicare Part A hospital insurance fund, general federal revenues (Medicare Parts B, C, and D; military, VA, IHS, and public health), general state revenues, Medicaid/CHIP, and insurers and health plans, and on to primary care physicians, specialists, hospitals, prescriptions, and other providers via FFS, capitation, salary, DRGs, per diem, CR, DSH, negotiated discounts, and formularies]
sharing. Individuals may purchase non-group coverage outside of the employment market. In addition to payroll taxes, individuals contribute to general federal and state revenue funds to finance public health-care coverage through income, sales, and property taxes. There is no value-added tax (VAT) in the US.

In the past, care for low-income and uninsured individuals was financed through private charities, a safety net system of public and community clinics, as well as by hospitals and physicians. Additional funding came from general tax revenues, but in many cases the care received was uncompensated and its cost was therefore borne by providers. Prior to the ACA, it was estimated that of the $57 billion in uncompensated care expenditures, hospitals contributed 61% and physicians 14%, with the remainder coming from a variety of community organizations (Kaiser Family Foundation 2013). In 2011, the federal government, through the Medicaid Disproportionate Share Hospital (DSH) program, allotted $11.2 billion to hospitals serving a disproportionate number of uninsured and Medicaid patients (Kaiser Family Foundation 2013). These payments were expected to decrease as the ACA was fully implemented and many of the uninsured and those with preexisting conditions acquired health insurance. However, many states have not expanded Medicaid, leaving a number of uninsured people who continue to require uncompensated hospital care, and subsequent legislation delayed the reduction of DSH payments to hospitals (Kaiser Family Foundation 2016c).
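
The split of pre-ACA uncompensated care quoted above can be turned into dollar amounts with a few lines of arithmetic. The following is only an illustrative sketch: the function and variable names are hypothetical, and the only inputs are the $57 billion total and the 61% and 14% shares given in the text, with the remainder attributed to community organizations as the text describes.

```python
# Figures quoted in the text (pre-ACA estimate, Kaiser Family Foundation 2013).
TOTAL_UNCOMPENSATED_BILLION = 57.0
SHARES = {"hospitals": 0.61, "physicians": 0.14}


def uncompensated_care_breakdown(total_billion, shares):
    """Return dollar amounts (in billions) per contributor, plus the remainder."""
    remainder_share = 1.0 - sum(shares.values())
    breakdown = {
        name: round(total_billion * share, 2) for name, share in shares.items()
    }
    # The text attributes whatever is left to community organizations.
    breakdown["community organizations"] = round(total_billion * remainder_share, 2)
    return breakdown


breakdown = uncompensated_care_breakdown(TOTAL_UNCOMPENSATED_BILLION, SHARES)
for name, amount in breakdown.items():
    print(f"{name}: ${amount:.2f} billion")
```

Under these figures the remainder works out to one quarter of the total, which is larger than the physicians' share.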
Table 1 Payment mechanisms for health services

                                           Payers
Services                                   Medicare                  Medicaid/CHIP      Insurers and health plans      Insured individuals      Uninsured individuals
Inpatient hospital care                    DRG                       DRG, per diem, CR  FFS, per diem                  Co-payment, coinsurance  Direct
Physicians and other health professionals  FFS                       FFS, capitation    FFS, capitation, salary        Co-payment, coinsurance  Direct
Prescription drugs                         Subsidies for premiums    DAWP               Formularies                    Co-payment, coinsurance  Direct
Long-term care and home health             PPS for limited duration  PPS, CR            Per diem for limited duration  Direct                   Direct

Notes: CR cost reimbursement, DAWP discounted average wholesale price, DRG diagnosis-related group, FFS fee-for-service, PPS prospective payment system
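
Because Table 1 is in effect a two-key lookup (service and payer), it can be handy to hold it as a nested mapping when working with it programmatically. The sketch below is only an illustration: the names `PAYMENT_MECHANISMS`, `mechanisms_for`, and `payers_using` are hypothetical, and the entries are transcribed from Table 1 rather than drawn from any official data source.

```python
# Table 1 transcribed as a nested mapping: service -> payer -> payment mechanisms.
# All identifiers here are hypothetical and chosen for illustration only.
PAYMENT_MECHANISMS = {
    "Inpatient hospital care": {
        "Medicare": ["DRG"],
        "Medicaid/CHIP": ["DRG", "per diem", "CR"],
        "Insurers and health plans": ["FFS", "per diem"],
        "Insured individuals": ["co-payment", "coinsurance"],
        "Uninsured individuals": ["direct"],
    },
    "Physicians and other health professionals": {
        "Medicare": ["FFS"],
        "Medicaid/CHIP": ["FFS", "capitation"],
        "Insurers and health plans": ["FFS", "capitation", "salary"],
        "Insured individuals": ["co-payment", "coinsurance"],
        "Uninsured individuals": ["direct"],
    },
    "Prescription drugs": {
        "Medicare": ["subsidies for premiums"],
        "Medicaid/CHIP": ["DAWP"],
        "Insurers and health plans": ["formularies"],
        "Insured individuals": ["co-payment", "coinsurance"],
        "Uninsured individuals": ["direct"],
    },
    "Long-term care and home health": {
        "Medicare": ["PPS for limited duration"],
        "Medicaid/CHIP": ["PPS", "CR"],
        "Insurers and health plans": ["per diem for limited duration"],
        "Insured individuals": ["direct"],
        "Uninsured individuals": ["direct"],
    },
}


def mechanisms_for(service, payer):
    """Look up how a given payer pays for a given service."""
    return PAYMENT_MECHANISMS[service][payer]


def payers_using(service, mechanism):
    """List the payers that use a given mechanism for a given service."""
    return [
        payer
        for payer, mechanisms in PAYMENT_MECHANISMS[service].items()
        if mechanism in mechanisms
    ]


print(mechanisms_for("Inpatient hospital care", "Medicaid/CHIP"))
print(payers_using("Physicians and other health professionals", "capitation"))
```

Keying by service first mirrors the row layout of the table; keying by payer first would work equally well if most queries start from the payer.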
In the US, how health services are paid for depends on the service provided, the type of health worker providing it, the funder, as well as where the service is provided (e.g., hospital or ambulatory care center, California or New York). Given this complexity, the payment mechanisms for each type of health service are shown according to the payer involved (e.g., Medicare, insurers, and health plans) in Table 1.

Medicare

The Medicare program provides health insurance coverage to nearly all Americans age 65 and older as well as to many disabled Americans and people with end-stage renal disease – a total of about 55 million people. It covers medically necessary care with the exception of extended long-term care and dental care. Medicare is divided into four parts, labeled Parts A, B, C, and D. Part A, hospital coverage, includes not only hospital care but also some post-acute nursing home, home health, and hospice care. Part B, supplemental medical insurance, is a voluntary program with essentially the same eligibility requirements as Part A. It covers physicians’ services (both inpatient and outpatient); outpatient care; medical equipment, tests, and X-rays; home health care; some preventive care; and a variety of other medical services. Despite its voluntary nature, about 95% of those eligible enroll in it because it is heavily subsidized.

Part C, Medicare Advantage, is an alternative to Parts A and B. Enrollment is voluntary. It provides coverage for the same services and, at the discretion of the organization offering coverage, sometimes additional benefits such as vision or hearing. One of the main differences between Part C and the preceding two parts, which are sometimes called “traditional Medicare,” is that Part C coverage is offered through private organizations (e.g., insurers and HMOs). In 2017, 33% of Medicare beneficiaries were enrolled in Medicare Advantage plans, but aspects of the ACA could lead to reductions in enrollment in the future (Kaiser Family Foundation 2017b).

Part D, prescription drug coverage, began in 2006 and is also voluntary. Like Part C, Part D benefits are provided through private insurers. There are dozens of Part D plans in each state – in addition to dozens of Medicare Advantage plans providing drug coverage in many urban areas. Also like Part C, premiums and benefits vary by plan, with competition based not only on premium differences but also on differences in benefits and, in particular, on which drugs a plan’s formulary lists as “preferred” and therefore subject to lower patient co-payments. Over 70% of Medicare beneficiaries are covered under Part D. Most other beneficiaries have drug coverage from another source, such as coverage from a former employer, but 12% do not have any drug coverage (Kaiser Family Foundation 2017c).
In addition to services not covered, there are substantial patient cost-sharing requirements. As a result, about 90% of all beneficiaries obtain some form of supplemental insurance coverage, mainly through Medicare Advantage plans (which usually cover additional services), Medicaid, or private policies called “Medigap.” Coverage for hospital care under Part A contains two significant gaps. First, there is a deductible for each inpatient hospital stay. In 2018, that amount was $1,340 (Medicare.gov 2018a). Second, for those rare stays that exceed 60 days, there are substantial daily co-payments. Part A’s nursing home coverage is limited because it is only for short-term skilled care following a hospital admission, rather than extended long-term care. For eligible stays, up to 100 days are covered. During the first 20 days, there are no co-payments, but there is a substantial daily co-payment of $167.50 for days 21–100 of a stay. In contrast, there is no co-payment for home health-care services.

Coverage for physicians’ and other medical services under Part B is also subject to patient cost sharing. The patient is responsible for 20% of all covered expenses (with no maximum) after meeting an annual deductible of $183 (all figures are for 2018) (Medicare.gov 2018b). The 20% coinsurance requirement is perhaps the main reason why the vast majority of Medicare beneficiaries seek some form of supplemental insurance coverage. It is difficult to generalize about the depth of coverage under Part C because each plan has its own benefit structure. Federal minimum requirements are that coverage be at least as comprehensive as under Parts A and B. As noted, most Part C plans offer additional services. About 80% offer prescription drug coverage. It is also difficult to generalize about Part D (stand-alone prescription drug coverage) because benefits vary by insurance plan. The main characteristic is a feature called the “donut hole.” Insurers provide coverage (with cost sharing) up to a certain amount of drug spending per year, at which point there is a period of no coverage at all. When total drug spending reaches a “catastrophic” level, almost all drug costs are covered. As part of the ACA, the donut hole will shrink and is scheduled to be eliminated by 2020.

Medicaid

Unlike Medicare, which is available to nearly all individuals age 65 and older, Medicaid is a means-tested program. It is designed to provide health insurance for those with the lowest incomes and fewest assets, for the disabled, and for poor seniors with Medicare coverage, as well as for the disabled and seniors who have exhausted their financial resources, often as a result of very high long-term care expenses. Medicaid is a key resource for some of the poorest and sickest Americans.

Medicaid programs are state-based, but they are funded jointly by the states and the federal government. In return for federal dollars, states are required to meet certain federal government standards. Participation by the states is voluntary, though historically all of the states have chosen to participate. Services are largely purchased from the private sector. Until 2014, the federal government paid between 50 and 74% of Medicaid costs, with a larger federal share in states with lower per capita income and the states paying the remainder. Beginning in 2014, federal contributions changed for those states that expanded Medicaid, with the federal government paying 100% of costs for those newly eligible, gradually falling to 90% by 2020.

Medicaid covers several distinct population groups. The breadth of coverage varies by population group and by state.

Prior to the ACA, the main groups typically covered by Medicaid were as follows:

Low-income children
Low-income pregnant women
Low-income disabled persons
Low-income senior citizens
Low-income parents of dependent children

For adults, in some states that have not expanded Medicaid coverage, there are not only income restrictions but also asset limitations that can preclude eligibility.

Medicaid covers roughly 17 million more Americans (a total of 74 million) than Medicare. As noted, the breadth of coverage varies considerably by eligibility group and by state. As of
February 2018, 33 states, including the District of Columbia, had expanded their Medicaid coverage in accordance with the ACA, and 18 had not (Kaiser Family Foundation 2018b). In those states that have chosen to expand, all adults and children below 138% of the federal poverty level (FPL) are now eligible for Medicaid. (In 2017, the federal poverty level was $12,060 for a single individual and $24,600 for a family of four; Healthcare.gov 2018.)

In the other states, children and pregnant women have the most liberal eligibility requirements. States are required to cover pregnant women and children up to age six if their incomes are at or below 138% of the FPL and children ages 6–18 up to 100% of the FPL. Many states employ even higher, or more generous, income eligibility thresholds. When combined with CHIP coverage, the median state provides coverage to children up to 235% of the FPL and pregnant women up to 185%. To illustrate the critical role that Medicaid plays for pregnant women, the program pays for 45% of all births in the US. Coverage is somewhat narrower for seniors and the disabled, however, with eligibility mandated up to 75% of the FPL.

In the 18 states that have not expanded coverage, low-income parents of dependent children face the most stringent eligibility requirements. Nine states cover them only if their incomes are below 40% of the FPL – with Alabama and Texas providing such coverage only up to 18% of the FPL (i.e., an annual income even as low as $2,200 would disqualify an individual from coverage in those states). In contrast, Connecticut and the District of Columbia cover these adults at incomes in excess of 200% of the FPL, taking advantage of the joint funding by the federal government. Recently, several states have either considered or passed legislation that would also impose work requirements on many Medicaid recipients of working age (Kaiser Family Foundation 2018b). This illustrates the large variation in breadth of coverage that currently exists between states, although this variation has been reduced considerably as a result of the ACA.

Beginning in 2014, states that choose to expand their Medicaid coverage will receive 100% of the costs from the federal government to add all poor people and the near poor up to 138% of the poverty level to Medicaid rolls for 4 years. The federal contribution will gradually decrease to 90%.

Several states have petitioned the federal government for special arrangements in their Medicaid expansion, and they have received approval to proceed. These are called “1115 demonstration waivers” and typically involve exceptions to the usual Medicaid rules that are budget neutral for CMS. Examples include charging a co-pay or premium to recipients for services, imposing a penalty for nonpayment of premiums, including work requirements, offering “wellness incentive” programs, and structuring the program like a health savings account (HSA). As of February 2018, 35 states have received waivers from CMS to tailor their own Medicaid programs (Kaiser Family Foundation 2018c).

The initial evidence on the effectiveness of these innovations to save money, improve the quality of care, and/or improve population health is limited. However, states are required by CMS to report such evidence during the demonstration waiver. Almost all of the waivers add to the complexity of the Medicaid program and could increase the cost of administration. This will be evaluated by CMS going forward. In the tradition of American federalism, successful innovations could spread to other states.

The scope of coverage under Medicaid is generally wide but varies by state. Federal law requires that states provide the following services: inpatient and outpatient hospital, physician, nurse practitioner, laboratory and radiology, nursing home and home health care for those age 21 and older, health screening for those under age 21, family planning, and transportation. Other services are optional for states. This designation means that if a state chooses to cover the service, it will receive matching funds from the federal government. Optional services include some major services such as prescription drugs and dental care but also care provided by professionals besides physicians and nurse practitioners, durable medical equipment, eyeglasses, rehabilitation, various types of
institutional care, home- and community-based services, personal care services, and hospice.

In general, those eligible for Medicaid receive services at little or no cost. However, states sometimes put restrictions on the number of services that are covered per year. Moreover, payments to physicians are usually low. In 2013, about 30% of physicians reported that they would not take new Medicaid patients (Decker 2013). Psychiatrists were the most likely to reject new Medicaid patients (56%), and cardiovascular disease specialists were the least likely, with only 9% rejecting such patients (Decker 2013).

One development with the potential to provide more mainstream access to physician office care is the movement toward the use of managed care in the Medicaid program. Over 70% of Medicaid beneficiaries are in managed care plans. The exact nature of these arrangements varies from state to state. Some include capitation (rather than fee-for-service) for providers and/or primary care case management. States often prefer managed care as a means of both enhancing quality and controlling costs and are likely to rely on it as the program expands through provisions in the ACA.

Private Insurance

In 2016, 179 million Americans were covered by private insurance; 157 million of these had employer-sponsored coverage (Kaiser Family Foundation 2016d). While having employer-sponsored insurance is almost always advantageous – employers generally subsidize premiums – it is not available to everyone. First, it is necessary to be employed or be a family member of someone employed. Second, the employer has to offer coverage; until 2015 or 2016, it was completely voluntary on the part of the employer. Third, if coverage is offered, the employee has to be eligible for it. And fourth, even if eligible, the employee has to be willing to pay the employee’s share of the premiums, which can be considerable. It is the people who are better-off economically who are able to meet the four conditions mentioned above. Individuals and families without an entry into the employer insurance market, and who are not eligible for Medicare and Medicaid, often seek coverage individually. Historically, individual coverage has had several disadvantages compared with employer group coverage and therefore was normally purchased only if employer-sponsored coverage was unavailable. Prior to the ACA, plans purchased in the individual private market were usually unsubsidized; administrative costs tended to be high (25–40%); health examinations were often necessary; cost-sharing requirements were, on average, higher; and fewer types of services tended to be covered. However, the individual market is changing substantially with the creation of the health insurance exchanges under the ACA.

Some employers, particularly larger ones, offer a choice of health insurance products to their employees. Among firms offering a choice, only about 20% of employees nationally can choose among three or more plans (California HealthCare Foundation 2009). For federal government employees, there can be dozens of choices. Employees with a choice can generally switch to a different plan once per year, irrespective of their health history or status.

Historically the most common arrangement offered by employers was a PPO. Among all covered workers, in 2017 48% were enrolled in PPOs, 14% in HMOs, 10% in point of service plans (POS – a blend of HMO and PPO arrangements that allow members to seek care from non-network providers at a higher cost), 28% in high-deductible plans (note that some of these may be PPOs or HMOs), and less than 1% in conventional insurance (traditional fee-for-service) plans (Kaiser Family Foundation 2017d). The biggest change in recent years has been the relatively rapid rise of high-deductible plans with a savings option, many of which are classified as health savings accounts (HSAs). In HSAs, the policy holder agrees to purchase insurance with a high deductible (currently averaging about $2,200 annually for individual coverage and twice that for family coverage). Premium contributions can be made by the individual and/or employer. These contributions are tax deductible, can accumulate year to year if unspent, and therefore can be used for future medical
expenses. They can be withdrawn to pay for eligible medical care.

Market share in health insurance is dominated by larger firms that generally market nationally. (Blue Cross Blue Shield plans, while having a national presence, usually market in individual states.) In 2013, three of the largest insurers covered 80% of people enrolled in individual, small group, and large group private insurance markets in at least 37 states (US Government Accountability Office 2014).

Prior to January 2014, insurers priced their products in two ways: experience rating and community rating. Under experience rating, the most common technique used, insurers charged employers (or individuals) on the basis of past cost experience or, when data were lacking, on predicted expenditures. In contrast, community rating entailed charging the same amount to all groups (or even individuals). In the individual insurance market, premiums were generally experience-rated. Each individual went through medical underwriting in which their risks were assessed.

Under the ACA, state-based exchanges combined with the individual mandate to purchase insurance are intended to reduce adverse selection problems in the individual and small group market by requiring plans selling in exchanges to use community rating (older individuals can be charged more than the younger, but differences within age cohorts will be prohibited), rather than experience rating, and by increasing risk pooling to a far greater extent than has been the case in the past in the US. Exchanges will also reduce or eliminate the need for individuals to purchase insurance through agents or brokers, whose fees can absorb 20% of the total premium during the first year of enrollment (Whitmore et al. 2011). One of the key requirements of the ACA is that individuals purchase coverage or pay a penalty. Similarly, firms with more than 50 employees will also have to provide coverage or pay a penalty. These “sticks,” combined with the “carrots” of subsidies for individuals to purchase coverage, will, it is hoped, lead to a system where community rating will be viable.

There are significant user charges associated with private insurance. Beginning with premiums, the average cost of employer-based single coverage was $6,690 in 2017, 18% of which was paid by the employee. For family coverage, the employee paid 31% of the total cost of $18,764. The employee’s contribution to family coverage has risen considerably over the past decade – by 6.8% per year compared to 4.8% for the share paid by the employer (Kaiser Family Foundation 2017a). This is one of several examples of how employers have shifted more costs onto employees as health-care costs have risen.

As is the case in many high-income countries, there are often substantial co-payments for prescription drugs. In most employer-sponsored plans, there are multiple “tiers,” each of which has its own cost-sharing requirements. Their purpose is mainly to encourage the use of cheaper drugs, particularly generics, the use of which has grown substantially in recent years. One way in which employer coverage tends to be more generous than Medicare’s is that there is usually a limit on annual out-of-pocket expenditures. Over 80% of employer-sponsored health plans establish such a maximum. In 2014 the median out-of-pocket maximum for an employee with individual coverage was approximately $6,000 (Kaiser Family Foundation 2014a).

Administrative costs tend to be higher in private insurance than in government-sponsored programs like Medicare and Medicaid. This is a result of several factors in addition to the need for profits. Private insurers engage in “underwriting” activities, which involve examining past claim expenses to determine a competitive yet still profitable premium to charge. They also need to market and advertise since, unlike government programs, they do not have a captive audience. Finally, to protect themselves against unexpectedly high claims, insurers often need to factor in a risk premium. Estimates vary on the size of administrative costs (including profits and taxes). Most agree, however, that administrative costs are much higher for insurance policies covering individuals and small firms. One study, conducted by a US actuarial firm, estimated that in 2003, private insurers spent 16.7% on administrative costs. By market segment, administrative costs were estimated to be 30% in the individual
market, 23% in the small employer market, and 12.5% for large employers (Milliman 2006). In contrast, Medicare administrative costs for the overall program were 1.4% (Centers for Medicare and Medicaid Services 2016).

Physical and Human Resources

A health-care system requires adequate physical and human resources for the delivery of health care. Physical resources encompass capital stock, infrastructure, medical equipment, and information technology. Human resources are practitioners who diagnose and treat patients, technologists, technicians, and support occupations (Bureau of Labor Statistics (BLS) 2011a, b).

Physical Resources

Capital Stock
Table 2 presents trends in the number of several types of health-care facilities in the US for selected years through 2012. The total number of ambulatory care facilities increased by 24% from 1997 to 2012. All types of ambulatory facilities, such as physician and dentist offices, ambulatory surgical centers, and rural health clinics, experienced this growth. Ambulatory surgical centers and rural health clinics grew tenfold or more between 1980 and 2012.
In contrast to the growth in ambulatory care, the number of hospitals decreased significantly from 1975 to 2009. The consolidations and closings of hospitals that contributed to this decline are related to changes in hospital payment from retrospective to prospective and the rise of managed care practices promoting reduced lengths of stay and competition between hospitals (Harrison 2007).
The total number of nursing homes also decreased, but the number of skilled nursing homes increased threefold. The number of Medicare-certified home health and hospice agencies increased fivefold or more, most likely in response to changes in Medicare reimbursement and shifts from inpatient to outpatient care.

Institutional Infrastructure
A number of changes have occurred in the infrastructure of health-care institutions in the past decades. Figure 3 shows that between 1970 and 1990, the number of community hospital beds per 1,000 population declined by 14%. From 1990 to 2012, the decline was even greater, at 30%. The number of beds in psychiatric institutions fell 58% from 1970 to 1990 and another 36% from 1990 to 2000, leveling off in 2000. The number of skilled nursing home beds fell nearly 15% from 1990 to 2012.

Medical Equipment
The use of medical equipment has skyrocketed over the past decades. Reductions in hospital length of stay and the provision of more acute care on an outpatient basis require a greater use of medical equipment (Danzon and Pauly 2001). Medicare, Medicaid, and private insurance companies indirectly cover the costs of medical equipment in medical facilities as part of the overall reimbursement for care and directly cover the costs of medical equipment to individuals (Tunis and Kang 2001).

Information Technology
Health information technology (HIT) has become an important part of health care (Hersh 2009). On the provider side, medical record-keeping, decision-making, imaging, and prescribing can now be aided by computer and Internet data storage, organization, and retrieval. On the consumer side, the Internet has become a source of information (and misinformation) on health care, and patients may be able to communicate with physicians through email. HIT is slowly integrating the provider and consumer sides so that patients can view and add to their medical record online (Hogan and Kissam 2010).
The adoption of health information systems has been slow in the US. In 2013, 78% of office-based physicians used some kind of electronic health record (EHR) in their practice, while 59% of hospitals had a basic EHR system (Adler-Milstein et al. 2014; Hsiao and Hing 2014).
The US government has put significant funding into the expansion of HIT. In 2009 the
Table 2 Number of selected types of health-care facilities in the US, 1975–2012

                                                        Number of facilities
Type of facility                                             1975     1980     1985     1990     1995     2000     2005     2010     2012   % chng
Ambulatory care (all facilities)a                               –        –        –        – 455,381a 489,038a 547,709a        –  582,733    24.53
Physicians’ officesa                                            –        –        –        – 195,449a 203,118a 209,730a        –  221,470    12.48
Dentists’ officesa                                              –        –        –        – 114,178a 118,305a 127,033a        –  133,194    15.37
Ambulatory surgical centers (Medicare certified)                –        –      336    1,165    2,112    2,894    4,445    5,316    5,335    176.3
Rural health clinics (Medicare certified)                       –      391      428      517    2,775    3,453    3,661    3,845    3,940   163.89
Hospitals (all)                                             7,156    6,965        –    6,649    6,291    5,810    5,756    5,794    5,723    22.25
Nursing homes (all)                                             –        –        –        –   16,389   16,886        –   15,700   15,673     4.46
Skilled nursing homes (Medicare certified)                      –    5,052    6,451    8,937        –   14,841   15,006   15,084   15,132    99.88
Home health agencies (Medicare certified)                   2,242    2,924    5,679    5,661    8,437    7,857    8,090   10,914   11,930   136.72
Hospices (Medicare certified)                                   –        –      164      772    1,927    2,326    2,872    3,405    3,509   182.14
End-stage renal disease facilities (Medicare certified)         –      999    1,393    1,987    2,876    3,787    4,755    5,631    5,766   140.93

Sources: For ambulatory care facilities (all, physicians’ offices, dentists’ offices), Census Bureau 2010 (NAICS data). Obtained from http://factfinder.census.gov/. For hospitals: Health, United States, 2013, Table 107; Health, United States, 2014, Table 98. For nursing homes (all): Health, United States, 2014, Table 101. For the Medicare-certified facilities of all types: Health, United States, 2013, Table 111. Column for 2012 in table uses 2011 data
Notes: – Data not available
Information is not available about the methods for counting the number of facilities. We assume that each stand-alone facility is counted whether it is part of a larger organization or not. In that case if a merger results in the closing of one facility, the number of facilities will decrease, but if a merger does not result in the closing of a facility, the number will be unchanged
a Years for these numbers are 1997, 2002, and 2007, respectively. The numbers for 2007 are estimations
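
Table 2 does not state how its "% chng" column was computed. As a hedged illustration, the published values appear to be reproduced, to within rounding, by a midpoint percent change: the absolute difference between the first and last available counts, divided by the average of the two. This is an inference from the numbers, not a documented method, and the function names below are hypothetical.

```python
def midpoint_pct_change(first: float, last: float) -> float:
    """Unsigned percent change relative to the average of the two endpoints.

    Assumption: this is the formula behind the table's "% chng" column;
    the table itself does not document its method.
    """
    midpoint = (first + last) / 2
    return abs(last - first) / midpoint * 100


def simple_pct_change(first: float, last: float) -> float:
    """Conventional signed percent change relative to the first value."""
    return (last - first) / first * 100


# First and last available counts, copied from Table 2.
rows = {
    "Ambulatory care (all facilities)": (455_381, 582_733),
    "Hospitals (all)": (7_156, 5_723),
    "Home health agencies (Medicare certified)": (2_242, 11_930),
}

for name, (first, last) in rows.items():
    print(
        f"{name}: midpoint {midpoint_pct_change(first, last):.2f}%, "
        f"simple {simple_pct_change(first, last):+.2f}%"
    )
```

Under this reading the column reports only the magnitude of change, so the 22.25 shown for hospitals corresponds to a decline. The values also differ from a conventional percent change measured from the first year, which matters when comparing the column with statements such as "tenfold" or "threefold" growth in the text.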
Fig. 3 Number of beds in US community hospitals, psychiatric institutions, and nursing homes per 1,000 population, 1970–2012. Series shown: psychiatric institutions, community hospitals, skilled nursing homes. Years on the horizontal axis: 1970, 1980, 1990, 1995, 2000, 2005, 2010, 2011, 2012 (Notes: Community hospitals are defined as nonfederal, short-term general, and other specialized hospitals. The types of facilities included in the category of community hospitals have changed over time. Psychiatric institutions are defined as all 24-h psychiatric hospitals and residential treatment organizations. Skilled nursing homes are those that are certified with the Centers for Medicare and Medicaid Services. Sources: (1) For community hospitals: Health United States, 2006, 2007, 2008, 2009, 2011. (2) For psychiatric hospitals: Foley et al. (2004), DHHS pub. no. (SMA)-06-4195, chap. 19; Health, United States, 2009, Table 119; Health, United States, 2011, Table 117. (3) For skilled nursing homes: Health, United States, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011)
Health Information Technology for Economic and Clinical Health (HITECH) Act was passed. It provides $30 billion to hospitals to adopt EHRs. Hospitals must build systems that have “meaningful use” in stages of increasingly advanced requirements (Adler-Milstein et al. 2014). In addition, the ACA has incentivized physicians and hospitals to adopt EHRs by encouraging innovations such as ACOs, which are difficult to run without an EHR (Adler-Milstein et al. 2014).

Human Resources

Health-Care Workforce
Table 3 presents the numbers of workers employed in several health-care occupations between 1990 and 2014. Increases in employment occurred with most health-care diagnosing and treating practitioners, such as physicians, chiropractors, registered nurses (RNs), and therapist occupations. Employment also increased with most of the technologist and technician occupations and all of the support occupations. Employment fell for dentists, physician assistants, and clinical laboratory personnel.

International Mobility
The numbers of US health-care professionals include immigrants to the US and exclude emigrants from the US. In 2014, 26% of physicians and 24% of residents in specialty programs in the US were international medical graduates (Ranasinghe 2015). Over 8% of the US nursing workforce in 2004 consisted of international nursing graduates (US DHHS 2010).
Although immigrants add to the health-care workforce supply, there is no evidence that they improve distributional issues. Furthermore, a reliance on immigration reduces the incentive to
Table 3 Employed US health-care personnel per 1,000 population, 1990–2014 (selected occupations)

Occupation                                            1990   1995   2000   2005   2010   2011   2012   2013   2014 % chng
Health-care diagnosing and treating practitioners
Chiropractors                                            –      –   0.15   0.28   0.18   0.18   0.19   0.18   0.21   0.56
Dentists                                              0.64   0.59   0.61   0.55   0.57   0.58   0.53   0.58   0.60   0.01
Optometrists                                          0.09   0.13   0.12   0.14   0.12   0.09   0.11   0.13   0.15   0.27
Pharmacists                                           0.69   0.65   0.80   0.84   0.83   0.88   0.91   0.88   0.92   0.15
Physicians and surgeons                               2.32   2.64   2.62   2.81   2.82   2.64   2.91   2.95   3.18   0.20
Physician assistants                                     –      –   0.15   0.25   0.32   0.26   0.35   0.41   0.26   0.84
Podiatrists                                           0.06   0.04   0.02   0.04   0.04   0.02   0.03   0.04   0.03   0.01
Registered nurses                                     6.70   7.52   7.79   8.17   9.21   8.68   9.19   9.15   9.06   0.16
Occupational therapists                               0.15   0.20   0.20   0.29   0.35   0.36   0.38   0.35   0.35   0.59
Physical therapists                                   0.37   0.49   0.51   0.60   0.61   0.71   0.67   0.71   0.77   0.45
Respiratory therapists                                0.25   0.36   0.27   0.32   0.42   0.43   0.35   0.35   0.35   0.27
Speech-language therapists (pathologists)             0.25   0.35   0.31   0.33   0.43   0.40   0.47   0.43   0.43   0.35
Health-care technologists and technicians
Clinical laboratory technologists and technicians     1.20   1.42   1.02   1.13   1.11   1.03   1.02   1.08   0.92   0.10
Dental hygienists                                     0.35   0.36   0.39   0.45   0.46   0.47   0.52   0.58   0.55   0.35
Licensed practical and licensed vocational nurses     1.77   1.52   1.81   1.72   1.86   1.80   1.70   1.77   2.01   0.11
Medical records and health information technicians    0.28   0.08   0.31   0.41   0.38   0.37   0.29   0.28   0.43   0.36
Health-care support occupations
Nursing, psychiatric, and home health aides           5.87   6.69   5.24   6.42   6.24   6.36   6.77   6.75   6.21   0.16
Dental assistants                                     0.76   0.80   0.76   0.88   0.97   0.98   0.88   0.88   0.86   0.12

Sources: Current Population Survey (CPS), Bureau of Labor Statistics, HRSA, DHHS; US Census Bureau, Census 1990, 2000, 2010, and population estimates 2011–2014
Notes: Dashes indicate data are not available. % change is from 1990 to 2014 or from the earliest year. A new occupational classification system for occupational employment (SOC) was introduced by the CPS in 2003. The 1990 and 1995 data are based on the old classification system and may not be fully comparable to later data. The table reports numbers employed rather than full-time equivalents (FTEs), so the actual amount of human resources employed may be less than that reflected in the table due to part-time employment. On the other hand, since these are employment numbers, the total number of individuals in each occupation would be larger if unemployed individuals were counted
Calculations: Employment and population were rounded to three decimal places
expand educational capacity, raise wages, and improve working conditions (Flynn and Aiken 2002). Finally, migration from low-income countries is a “brain drain” for those countries (Aiken 2007).

Distribution
The US has a high proportion of specialists to primary care physicians (around 1.5 times as many in 2012) (Hing and Hsiao 2014). Further, the primary care physician to population ratio in rural areas is only 4/5 that of urban areas (Hing and Hsiao 2014). In nursing, the biggest distributional issue is the low number of RN faculty (AACN 2017). This creates bottlenecks in the educational process and contributes to nursing shortages (AACN 2017). The ACA includes policies aimed at improving supply and distribution issues related to primary care including scholarships and loan repayment programs for primary care physicians, short-term increases in primary care payment rates for Medicaid, and additional
support for Federally Qualified Health Centers to provide essential health services to more uninsured and low-income patients.

Adequacy
Projections of the adequacy of physicians using several forecasting models indicate a future shortage of physicians of 5–20% by 2020 (COGME 2005; BHPr 2008). Other projections indicate that a smaller increase in supply would be needed if distributional issues were improved or if there were an increased use of nonphysician providers and osteopaths (Weiner 2007). In nursing, forecasters unanimously predict a large future shortage (BHPr 2010).

Provision of Health-Care Services

The US has several major health-care sectors, including public health, primary, specialty, acute inpatient, dental, mental health, pharmaceutical, post-acute, long-term, and palliative care. Access to these services and navigation through the US health-care system differ depending upon the care that is needed and whether an individual is insured or uninsured. Insured individuals tend to enter the health-care system through a primary care or specialty provider. Uninsured individuals often do not have a regular primary care provider but instead may visit community health centers and emergency departments. Due to out-of-pocket costs, they may be reluctant or unable to seek care unless they are experiencing an emergency.

Public Health

Public health focuses on promoting health at the population level through investigating and intervening in the environmental, social, and behavioral factors in health and disease. It emphasizes prevention and health promotion (Shi and Singh 2012). Public health is promoted mostly through public agencies. At the federal level, public health services are headed by the US Public Health Service (USPHS), a division of HHS. There are several subdivisions within the USPHS, such as the CDC. Federal laws allow state health agencies to determine the scope and amount of services and to establish the vehicles for providing those services. As a result, the services vary significantly across the states. Local public health agencies at the county or city levels (“health departments”) carry out many public health functions (Salinsky 2010).
Public health services include communicable disease control, environmental hazard prevention, emergency terrorism preparedness and response, occupational health, health promotion and screening, and licensing, regulation, and planning of health-care facilities and providers.

Outpatient Services

Primary Care
In 2010, 55% of the visits to physicians in the US were to a primary care physician (US Department of Health and Human Services 2014). Primary care practitioners are physicians, nurse practitioners, physician assistants, and nurse midwives who are generalists or who specialize in family medicine, internal medicine, pediatrics, obstetrics, and gynecology (Bodenheimer and Pham 2010).
Access to primary care requires that patients have the ability to pay for care, adequate transportation to care, and the health literacy to demand and use the care; it also requires that the supply, distribution, and time of providers are adequate (Shi and Singh 2012). For these reasons, the uninsured and those with insurance but unable to afford high out-of-pocket costs due to inadequate coverage have difficulty accessing primary care. Additionally, those covered by Medicaid may experience problems accessing primary care due to their inability to find a private physician who accepts Medicaid patients (Shi and Singh 2012).

Specialty Care
Forty-five percent of visits to physicians in the US in 2010 were to specialists (US Department of Health and Human Services 2014, Tables 91, 92). Many of the issues with access to primary
care are even more of a concern with specialty care. Care coordination among primary care and specialist providers is a growing issue in the US, where the typical Medicare beneficiary sees two primary care physicians and five specialists a year, and patients with multiple conditions may see up to sixteen physicians (Bodenheimer 2008). This can lead to over-, under-, and conflicting treatment and polypharmacy. Two initiatives to improve care coordination in the US are patient-centered medical homes (PCMHs) and ACOs (Phillips and Bazemore 2010; CMS 2012). In PCMHs each patient has an ongoing relationship with a primary care provider, who directs the medical team, and the patient’s care is coordinated across all health-care settings, with patients actively participating in decision-making (Rittenhouse et al. 2011). In ACOs payment from Medicare is tied to the performance of the provider organization, thus conferring financial risks and rewards for care management and patient outcomes to providers.

Emergency Care
Emergency departments (EDs) are a major part of the US health-care safety net (Shen and Hsia 2010). EDs in hospitals that receive payment from Medicare are required by the Emergency Medical Treatment and Active Labor Act (EMTALA) to provide care to anyone needing emergency treatment. Hospitals must care for the individuals until they are stable. This allows under- and uninsured persons access to the ED for emergency conditions.
EDs tend to be overused for nonurgent problems and for serious problems that could have been prevented with better primary and specialty care. ED overcrowding, long wait times, hospital diversions, the lack of ED space and staff, and patient boarding have been problems for many years (GAO 2009).

Urgent Care
Urgent care is walk-in care provided outside the ED setting in centers that are open in the evening on weekdays and at least 1 day over the weekend (Weinick et al. 2009). Services focus on acute episodic care for minor illnesses and emergencies such as upper respiratory infections, lacerations, and fractures. Medical care is typically performed by family physicians, nurse practitioners, and physician assistants (Weinick et al. 2009).
In 2011 there were more than 9,000 urgent care centers (UCCs) in the US (Yee et al. 2013). Urgent care services have expanded in response to difficulties in seeing primary care practitioners on an urgent basis and after-hours, high ED costs, and long ED wait times (Yee et al. 2013). Some individuals use UCCs because they do not have a regular source of primary care. An individual must have insurance or pay out-of-pocket for care.

Retail Clinics
Located in pharmacies, grocery stores, and department stores, retail clinics are emerging as places to go for treatment of minor medical conditions (RAND 2010). They tend to be staffed by nonphysician practitioners, such as nurse practitioners or physician assistants, and they treat a limited number of conditions and needs, such as skin conditions, sore throats, pregnancy testing, infections, diabetes screening, and immunizations (Mehrotra et al. 2008).

Acute Inpatient Care

Individuals who are acutely ill and need to have round-the-clock care require inpatient care provided in hospitals. The availability of hospital services depends upon the insurance status of the individual seeking care, the type of hospital, and the geographic area. For those who have private or public insurance, care is accessed through a physician referral to a hospital that the physician recommends and that is in the insurance provider network. For those without insurance, access to care depends upon how sick they are.
When an uninsured patient’s condition is not an emergency (such as planned surgery), access to hospital care becomes dependent upon hospital ownership. Government-owned hospitals must provide charity care to those who do not have insurance or cannot pay for out-of-pocket portions of their care (Weissman et al. 2003). These hospitals provide the majority of charity care in the US (Weiner et al. 2008). Charity care is also provided
by nonprofit private hospitals. It is financed through federal payments for treating Medicaid patients for DSH hospitals, tax exemptions, and cross-subsidies from other payers (Weissman et al. 2003). For-profit hospitals also provide charity care, but they do not receive tax exemptions for this, and it is unclear whether they provide as much charity care as nonprofit hospitals (Cram et al. 2010; Schlesinger et al. 2003). The expansion of health insurance, as being undertaken through the ACA, is expected to improve access to inpatient care in the US and reduce hospitals’ uncompensated care costs, cost shifting, and other irrationalities of the system.

Mental Health Care

The mental health-care landscape has changed significantly over the past decades. Long-term institutionalization, which was a major treatment strategy for many mental health problems, is no longer the preferred way to treat those problems. Instead, treatment occurs through outpatient care, accompanied by the increased use of pharmaceuticals which can be managed on an outpatient basis, and short-term inpatient stays (US Department of Health and Human Services 2014, Table 106; Ling et al. 2008).
Only about one-third of Americans with mental health problems actually receive treatment for their problem (Cunningham 2009). Insured patients generally receive mental health care in the outpatient settings of offices of private psychiatrists, psychologists, and licensed social workers and inpatient settings of private psychiatric and general hospitals (Shi and Singh 2012). Patients without insurance who cannot pay out-of-pocket expenses are treated in state and county mental health hospitals, community health centers, EDs, and hospitals (Shi and Singh 2012). Other access issues include shortages of mental health providers and the stigma that is attached to mental illness (Cunningham 2009).
A goal of the ACA is to improve access to mental health care by promoting mental health parity and expanding insurance coverage for mental health. Insurance regulation will prohibit discrimination against those with preexisting mental health conditions, increasing rates, or canceling insurance for those who develop mental health conditions.

Pharmaceutical Care

Spending on prescription drugs was the fastest-growing component of US health costs until just recently. Spending increased rapidly from 1970 until 2001 (CMS 2014). From the 1990s to 2015 US spending on retail prescription drugs increased from 7% to 12% of total health expenditures (GAO 2017). Pharmaceutical production and marketing in the US are completely privatized but are regulated by the Food and Drug Administration (FDA). Prices are not regulated, although the government negotiates payment discounts in some of its programs such as Medicaid (but not Medicare where a provision in the Part D legislation prohibits Medicare from negotiating bulk discounts on drugs).
Many pharmaceuticals are overused, inappropriately used, and underused in the US. Overuse and inappropriate use occur with certain medications such as antibiotics and antidepressants and with the practice of polypharmacy among the elderly (Conti et al. 2011; Misurski et al. 2011; van der Hooft et al. 2005). Underuse is associated with financial barriers. In 2011, 23% of individuals in the National Health Interview Survey reported cost-related medication underuse (Berkowitz et al. 2014).
Overuse of medications has been cited as a result of aggressive marketing by pharmaceutical companies to both physicians and consumers (Brody and Light 2011; Budetti 2008; Williams et al. 2011). Pharmaceutical companies sometimes market their drugs by taking advantage of new diseases, literally promoting the existence of the disease in their advertisements (also known as “disease mongering”) (Brody and Light 2011). A health problem is reframed and promoted in the media and popular culture as having a pharmaceutical solution (Williams et al. 2011). These strategies have been termed “pharmaceuticalization.” Whether a condition is a true health
problem and is best treated with pharmaceuticals or other products, or has been pharmaceuticalized, is controversial (Metzl and Herzig 2007).

Long-Term Care

Long-term care consists of a number of different health-care services for individuals with conditions that are not expected to significantly improve and that need ongoing care.
Through a complex financial web, essentially all Americans have access to nursing homes. The financial options are as follows: If an elderly person is admitted to a nursing home post hospitalization, Medicare will cover a limited number of skilled nursing days, contingent upon rehabilitation progress. If the individual needs to stay beyond Medicare-covered days, or was never hospitalized, she must pay out-of-pocket or, once she has used up (“spent down”) her own assets (not including a family home and other exclusions), through Medicaid. A private room in a nursing home averaged $90,000 a year in 2016 (Longtermcare.gov 2018), so those paying out-of-pocket soon run out of money. Long-term care insurance covers nursing home care, but few Americans have this insurance (Kovner and Knickman 2011) because it is expensive and only rarely subsidized.

Palliative Care

Palliative care is the care of persons with a terminal illness. It entails the relief of pain and other symptoms to make the person comfortable, as well as psychosocial and spiritual support (Field and Cassel 1997). Hospice services are an integral part of palliative care and were delivered to 1.6 million persons in 2009, mostly older persons and those with cancer (Shi and Singh 2012; NHPCO 2010). In 2010, 32% of Medicare decedents older than 65 years received care from a Medicare-certified hospice (Aldridge et al. 2015).
Medicare, Medicaid in most states, and most private insurance plans cover hospice. Because most hospice care is for the elderly, and the elderly are fully covered by Medicare, the number of uninsured individuals needing hospice care is quite small (Lorenz et al. 2003). For the small number of individuals without insurance coverage, hospices may provide care regardless of ability to pay (Pietroburgo 2006).

Reforms

The Patient Protection and Affordable Care Act (ACA) constitutes one of the most important reforms to the US health system to date. The ACA was signed into law in 2010 and was implemented over several years. Its scope is very broad, and while its principal goal was to increase access to health services through the expansion of both private and public insurance, it also included measures to improve quality and to control costs. In the version of the ACA signed into law, almost everyone was required to have insurance; this is called the “individual mandate.” There were penalties for failure to have insurance, but exemptions applied (e.g., religious objection, inability to pay). However, in 2017, the individual mandate to purchase insurance was repealed by Congress; individuals will no longer be required to purchase coverage beginning in 2019. Sliding scale subsidies help individuals and families purchase required private health insurance coverage through health-care exchanges. For example, a family of four (all nonsmokers) with a very-low-income level of $23,550 in 2014 received a tax credit to cover 95–100% of its insurance premiums if purchased on a government-sponsored health insurance exchange officially called the Marketplace. The same family with an income of $40,000 per year received a tax credit worth 77% of the total cost of their health insurance. They had to pay $161 per month or about 5% of their annual income for health insurance. If this family’s income reached 400% of the FPL or around $95,000 per year, they had to purchase insurance without any subsidy. They paid about 9% of their annual income for health insurance. For a given amount of coverage offered by a particular private insurer, premiums can vary by rating area (i.e., geographical location), age,
family size, and tobacco use. A calculator available on the health insurance exchange website allows those seeking insurance to determine the approximate amount of subsidy they will receive (Kaiser Family Foundation 2018d).
Health insurance exchanges have been set up by states or the federal government to make it easier for consumers to compare and choose health insurance policies by providing information in a standardized form. Policies are regulated as to what they must cover. Insurers selling through the exchanges cannot reject an applicant due to health status nor can they charge more to those with a history of preexisting medical conditions. Premiums can, however, vary based on age, smoking status, and geographic location. No annual or lifetime limits can be placed on the value of insurance coverage. There are also minimum requirements for the percent of premiums insurers must use for the health benefits of those who purchase policies.
The ACA also set Medicaid eligibility standards which were more generous than those in effect in many states. The law made the federal government responsible for most of the cost of this expansion of Medicaid (90–100%) in states that were below the new national standard. However, as a result of the Supreme Court ruling in 2012, states were given the option of not expanding Medicaid. As of early 2018, 32 states and D.C. have expanded Medicaid with the others working on waivers or not taking action at this time (Kaiser Family Foundation 2018c, 2018d). They may, however, choose to participate in subsequent years. In June of 2015, the Supreme Court ruled on the King v. Burwell case. King challenged the legality of federal subsidies awarded to those purchasing health insurance on federal insurance exchanges. When the ACA was drafted and adopted into law, wording indicated that subsidies would be available to those who enrolled in an exchange “established by the state,” and King argued that the federal exchanges were not established by a state and therefore they could not offer subsidies. The case was critical to the survival of the ACA because initially most states (34) failed to establish their own exchange. The federal government had stepped in to set one up in each of these states. In some cases the federal government was invited to do this by the state itself, but in other cases the state refused to set up its own exchange as a means to protest against the ACA. The Supreme Court sided with the Obama administration (Burwell) and ruled that the intent of Congress had been to provide subsidies on all exchanges across the USA.
Medicare benefits were enhanced by the ACA. Preventive services are covered without a co-payment from the patient. Over time, the coverage gap (“doughnut hole”) for prescription drug coverage is being removed. Medicare Advantage plans (private out-sourced forms of managed care Medicare) are experiencing reductions in how much they are paid by the federal government to take care of Medicare patients because of evidence that they have been paid much more than their costs in the past. Those achieving higher-quality scores for care receive bonuses and those with lower scores, financial penalties.
Employers with 50 or more employees must offer health insurance, or face a penalty. This mandate became effective in 2015. Employers with fewer employees do not have to provide coverage. Some small employers receive tax credits to offer coverage.
Providers who choose to organize into ACOs have the opportunity to share in any savings they accrue, initially from Medicare but eventually other payers may participate as well. The ACA includes experiments with innovative payment systems that avoid the problems inherent in fee-for-service reimbursement. Bundled service payments are an example. Scholarships and loans included in the ACA are intended to encourage more primary care physicians to work in underserved rural and urban areas. Cost control policies in the ACA included the formation of an Independent Payment Advisory Board to keep Medicare spending in line with economic growth. Additionally, while the ACA forbids the use of cost-effectiveness research in determining service coverage and reimbursement under Medicare, the law established the Patient-Centered Outcomes Research Institute to spur comparative effectiveness research in the health-care sector.
The ACA was designed to be budget neutral. To help pay for the ACA, high-income individuals
and families pay higher taxes on unearned and investment income, and they pay higher payroll taxes to finance Medicare. A tax was added to some medical devices and to services offered by tanning salons. There is also a tax on “Cadillac” or high-benefit health insurance plans offered by employers, although numerous postponements in Congress have delayed levying the tax until at least 2020. In the end, the ACA is redistributive from the healthier to the sicker and from the wealthier to the poorer.

The ACA was adopted by a small margin in the Congress, and opposition to this reform remains strong. But today it is the law, and it is unlikely that it will be completely reversed. Voters and stakeholders become accustomed to the benefits they receive, and removing them is increasingly difficult as time passes. Revisions to the ACA will be ongoing; health system reform is never final. New legislation may be necessary to resolve dilemmas that were overlooked or impossible to resolve at the time the ACA was adopted by Congress. While the current Republican President Donald J. Trump made repealing and replacing the ACA a central focus of his 2016 presidential campaign, widespread opposition to repealing the benefits of the ACA undermined efforts to remove some of its protections. Nonetheless, Congress repealed the individual mandate to purchase health insurance (effective in 2019) in addition to other legislative strategies to reduce ACA protections, including a 2017 Executive Order by President Trump for agencies to explore options that would expand short-term health insurance and other less-comprehensive forms of health coverage, relax rules about associations offering less comprehensive coverage to members, shorten the sign-up period for individual coverage, reduce outreach for enrollment for individual coverage, and attempt to cut spending on federal subsidies offered to help individuals purchase health insurance through the federal exchange. Despite these efforts, and the uncertainty and increased costs they created in many state exchanges, enrollment in the exchanges fell only 5% in 2018 compared to the previous year (Kaiser Family Foundation 2018a). This suggests that the popularity of the expanded coverage afforded by the ACA endures, creating challenges as legislators from both parties try to shape the U.S. health care system moving forward.

Assessment

Overview

The US health system has both considerable strengths and notable weaknesses. These are discussed in the following sections in the context of access, quality and outcomes, and expenditures from the USA and international perspectives.

Access

In 2013, just prior to the main provisions of the ACA being implemented, it was estimated that 44.6 million Americans under the age of 65 (16.7%) were uninsured (US Department of Health and Human Services 2014, Table 114). This rate had been relatively steady since 2000 except for an uptick during the Great Recession.

The distribution of uninsured was skewed toward those who were economically most vulnerable. In 2013, nearly 30% of the non-elderly with incomes below twice the federally designated poverty level were uninsured, compared to just 5% of those whose income exceeded 400% of the poverty level. Coverage varied considerably by race/ethnicity as well. Among those under age 65, about 16% of non-Hispanic whites, 19% of African Americans, and 14% of Asians were uninsured. This compares to 31% of Hispanics/Latinos (US Department of Health and Human Services 2014, Table 114). Poor and near-poor children were the one group that has had increasing insurance coverage over the years. Their uninsurance rate in 2013 was about 7%, less than half that of poor and near-poor parents as well as adults without children. The lower uninsurance rates for poor and near-poor children reflected the success of CHIP.

After nearly 4 years, the 2014 public and private insurance expansions brought about by the ACA have reduced the number of uninsured
considerably. Private health insurance coverage is rising as a result of the employer and individual insurance mandates, coupled with subsidies provided to purchase health insurance. In addition, Medicaid coverage is expanding as program eligibility rules have been loosened in states that accept federal subsidies for expansion. As noted, in those 32 states and D.C., all poor and near-poor persons with incomes up to 138% of the federal poverty level are covered. By the middle of 2016, the uninsurance rate was estimated to have fallen to 9% (28 million) (Kaiser Family Foundation 2018a).

The ACA also is intended to create more equity between people in like circumstances. This is accomplished in three primary ways. First, where previously about half of poor and near-poor adults (defined here as those with incomes up to 138% of the federal poverty level) were ineligible for Medicaid, all such persons are eligible for coverage in the states that have elected to accept federal funding for Medicaid expansion. Second, the great majority of those whose incomes are too high for Medicaid will be insured through subsidized private coverage. Third, individuals with preexisting medical conditions or a history of illness will be eligible to purchase insurance and be able to do so at the same price as others.

In the US, there is a direct relationship between insurance status and having one’s usual source of medical care in a physician’s office. Generally, those with private health insurance and Medicare have access to physicians’ private practices. This is not the case, however, for most of the uninsured and, as mentioned earlier, many persons on Medicaid. Having a usual source of care provides a critical entry into the health-care system through access to primary care, preventive services, and referrals to specialists. In 2013, 76% of women with a usual source of care received mammograms within a 2-year period, and 84% received cervical exams in the past 3 years. For those without a usual source of care, the figures were 30% and 62%, respectively (US Centers for Disease Control and Prevention 2015).

Selected measures of access are discussed next, first for the US and then across countries.

US Data

Figure 4 shows the relationship between insurance status and the use of particular services in 2016. The most striking figures relate to having a usual source of care, where 49% of the uninsured report having no usual source of care, versus just about 12% for those with employer coverage or Medicaid (Kaiser Family Foundation 2017b). Among the uninsured, 23% report that they did not obtain needed care due to costs, and 18% say that they could not afford a prescription drug. By comparison, people with Medicaid are roughly half as likely to report these problems, with rates even lower for those with private insurance. These figures demonstrate the critical role that Medicaid plays in facilitating access to care among those with low incomes.

Fig. 4 Barriers to health care among non-elderly adults by insurance status, 2016 (Kaiser Family Foundation 2017b)

Another impact of being uninsured is the stage at which a person is diagnosed for particular cancers. For melanoma and colorectal, lung, and breast cancers, the uninsured are between two and three times as likely as the insured to be diagnosed at stage III or IV compared to stage I (Kaiser Family Foundation 2012).

International Comparisons

Comparative international data used in this section are obtained from the Commonwealth Fund, a US-based foundation. Eleven countries were included in the surveys, with samples in each country ranging from approximately 1,000–3,000 (for methodology, see High et al. 2017).

Compared to ten other developed nations included in the survey, access problems due to the cost of medical care are greater in the USA. Table 4 examines sicker adults (those in poor health, having received surgery or hospitalization in the past 2 years, or received care for a chronic illness, injury, or disability in the past year). The table shows five access problems that result from costs, where in each case, Americans had greater problems than those in other countries. To illustrate, the table shows that 33% of Americans had problems accessing medical care due to costs in the past year. The next highest figures were 22% (Switzerland) and 17% (France). In sharp contrast, the figure was just 7% in the UK and in Germany (High et al. 2017).

A final set of access metrics concerns how promptly care is received. Table 5 shows several indicators of waiting times in 11 high-income countries. The US performed well internationally with regard to seeing a specialist and getting elective surgery, with Germany and France performing best and Norway and Canada worst. The picture is different for primary care. The US ranked 8th out of the 11 countries for seeing a doctor or nurse on the same or next day. This is not surprising. Access to specialty care and surgery is relatively high because there are ample resources and few restrictions on what and how much medical equipment hospitals, other health facilities, and physicians can purchase and own. In contrast, primary care efforts in the US fall behind many other high-income countries (Starfield and Shi 2002).

Outcomes and Quality

The US performs well on some measures of quality and outcomes from an international perspective, while it does not perform so well on others. Performance on some of these measures is discussed next.

Mortality

US life expectancy at birth was 81.2 years in 2015 (Worldbank 2015). It tied for 26th out of the 32 high-income OECD countries, at about 2 years below the median. With respect to infant mortality, US rates have declined substantially over the past two decades but not as fast as other countries. As a result, it ranks the highest among the 31 high-income OECD countries in infant mortality (OECD 2015).

Amenable mortality is defined as “premature deaths from causes that should not occur in the presence of timely and effective health care” (Nolte and McKee 2011). Figure 5, adapted from a 2017 Commonwealth Fund report, illustrates that in the 2014 period, the USA had the highest amenable mortality rate among all countries, nearly double that of Switzerland, the country with the lowest figure (Schneider et al. 2017). Typical explanations for the poor US performance compared to other countries with respect to mortality rates
Table 4 Cost-related access problems in 11 high-income countries

Raw scores (%)
Measure | Source | AUS | CAN | FRA | GER | NETH | NZ | NOR | SWE | SWIZ | UK | US
Overall benchmark ranking | 2016 | 2 | 9 | 10 | 8 | 3 | 4 | 4 | 6 | 6 | 1 | 11
Had any cost-related access problem to medical care in the past year | 2016 | 14 | 16 | 17 | 7 | 8 | 18 | 10 | 10 | 22 | 7 | 33
Skipped dental care or check up because of cost in the past year | 2016 | 21 | 28 | 23 | 14 | 11 | 22 | 20 | 20 | 21 | 11 | 32
Insurance denied payment for medical care or did not pay as much as expected | 2016 | 9 | 14 | 24 | 8 | 8 | 2 | 2 | 2 | 12 | 1 | 27
Patient had serious problems paying or was unable to pay medical bills | 2016 | 5 | 6 | 23 | 4 | 7 | 5 | 8 | 5 | 11 | 1 | 20
Doctors report their patients often have difficulty paying for medications or out-of-pocket costs | 2015 | 25 | 30 | 17 | 13 | 52 | 30 | 3 | 6 | 9 | 12 | 60
Out-of-pocket expenses for medical bills more than $1,000 in the past year, US$ equivalent | 2016 | 16 | 15 | 7 | 5 | 7 | 7 | 13 | 4 | 46 | 4 | 36
Source: (High et al. 2017)
include “a high rate of uninsured and a fragmented delivery system with relatively weak primary care and poor coordination of care between providers and sites” (Schoenbaum et al. 2011).

Objective Measures of Quality

There exist voluminous data on outcomes and quality of care in the US. The discussion is divided into three sections: prevention and screening, cancer survival rates, and asthma admissions.
Table 5 Timeliness of care in 11 high-income countries

Raw scores (%)
Measure | Source | AUS | CAN | FRA | GER | NETH | NZ | NOR | SWE | SWIZ | UK | US
Last time needed medical attention was able to see doctor or nurse the same or next day | 2016 | 67 | 43 | 56 | 53 | 77 | 76 | 43 | 49 | 57 | 57 | 51
Very or somewhat difficult to get medical care in the evening, weekend, or on a holiday without going to the emergency room (base, sought after-hour care) | 2016 | 44 | 63 | 64 | 64 | 25 | 44 | 40 | 64 | 58 | 49 | 51
Waiting time for emergency care was 2 h or more (base, used emergency room in past 2 years) | 2016 | 523 | 50 | 9 | 18 | 20 | 30 | 34 | 39 | 26 | 32 | 25
Waiting time to see a specialist was 2 months or more (base, saw or needed to see a specialist in past 2 years) | 2016 | 13 | 30 | 4 | 3 | 7 | 20 | 28 | 19 | 9 | 19 | 6
Waiting time of 4 months or more for elective/nonemergency surgery (base, those needing elective surgery in the past year) | 2016 | 8 | 18 | 2 | 0 | 4 | 15 | 15 | 12 | 7 | 12 | 4
Source: (High et al. 2017)
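
The rank quoted in the text for same- or next-day access can be checked against the first row of Table 5. The following is a minimal Python sketch, not part of the original analysis. It assumes that the Table 5 columns follow the country order given in the Table 4 header (AUS through US) and that a higher percentage is better for this measure; the function name `competition_rank` is hypothetical.

```python
# Values copied from the first row of Table 5 (able to see a doctor or
# nurse the same or next day, %).
# Assumption: columns follow the country order of the Table 4 header.
countries = ["AUS", "CAN", "FRA", "GER", "NETH", "NZ",
             "NOR", "SWE", "SWIZ", "UK", "US"]
same_or_next_day = [67, 43, 56, 53, 77, 76, 43, 49, 57, 57, 51]


def competition_rank(values, labels, target):
    """Rank of `target` when higher is better (1 = best); ties share a rank."""
    score = dict(zip(labels, values))[target]
    # The rank is one more than the number of strictly better scores.
    return 1 + sum(1 for v in values if v > score)


us_rank = competition_rank(same_or_next_day, countries, "US")
print(us_rank)
```

Under these assumptions the script prints 8, which agrees with the rank reported in the text for the US among the 11 countries.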
A. J. Barnes et al.
Fig. 5 Mortality amenable to health care (Source: Adapted from Schneider et al. 2017). Data from: European Observatory on Health Systems and Policies (2017). Trends in amenable mortality for selected countries, 2004 and 2014. Data for 2014 in all countries except Canada (2011), France (2013), the Netherlands (2013), New Zealand (2012), Switzerland (2013), and the U.K. (2013). Amenable mortality causes based on Nolte and McKee (2004). Mortality and population data derived from WHO mortality files (Sept. 2016); population data for Canada and the U.S. derived from the Human Mortality Database. Age-specific rates standardized to the European Standard Population (2013).
Unless otherwise noted, all data are from OECD (2015).

Prevention and Screening: The US immunization rates in 2015 were diphtheria, tetanus, and pertussis, 84.6%; measles, 91.9%; hepatitis B, 92.6%; and influenza, 67%. The US is among the lower half of countries for DTP, measles, and hepatitis B. It is, however, among the countries with the highest rates for influenza vaccination. With regard to screening rates for breast cancer (mammography) and cervical cancer (Pap smears), of the 14 countries OECD compared, the US has the second highest mammography (cancer screening) rate for women age 50–69, at 81% (after the Netherlands) among 12 countries, and (among 11 countries) the highest cervical cancer screening rate for women age 20–69, at 85%.

Cancer Survival: Cancer survival is often considered a good measure of the quality of a medical care system because high survival rates are related both to preventive (screening) care and to treatment success. The US has been very successful with regard to breast cancer treatment, in part due to the high mammography screening rates. The 5-year survival rate, 89%, is highest of 18 OECD countries. The US survival rate for cervical cancer of 62%, in contrast, is the third lowest of the 18 countries. For colorectal cancer, with a 5-year survival rate of 64%, the US ranks in the top third of the countries.

Asthma Admissions: The hospital admission rate for asthma in the US is among the highest among the 32 high-income OECD countries, at 89.7 per 100,000 population, with only the Slovak Republic and Korea higher. This is likely the result of a high uninsurance rate and poor preventive care.

Subjective Measures of Quality

The leading source of these data for international comparisons is the Commonwealth Fund, using annual surveys of patients or physicians that have been conducted in up to 11 countries since 2007. The 2011 survey focused on adults with a history of illness, while the 2013 survey examined
nationally representative samples of all adults. The data below are from the 2014 report (Davis et al. 2014).

With regard to care coordination, compared to the other countries, sicker adults in the US had among the highest rates of problems with test results or records not being available when they saw their doctor as well as having duplicate tests ordered. One area in which the US did well was patients receiving a written plan for care after hospital discharge or surgery, at 92%, well above the other ten countries.

Five metrics of patient safety are shown in Table 6: whether the patient believed a medical mistake was made in treatment, received the wrong medication or dose, was given incorrect test results, experienced delays in being notified about abnormal test results, or (among those hospitalized) reported an infection from the hospital stay. For the first four measures, the US ranked near the bottom in patient safety among the 11 countries. However, for the last measure (hospital infections), the US figure was the best (Davis et al. 2014).

Equity of Outcomes

The US suffers from major inequities or disparities in access to health care as well as in health outcomes. A few of the more noteworthy disparities are discussed here (unless noted, all figures are from the US Department of Health and Human Services (2016)). Beginning with infant mortality, the overall rate in 2015 was 5.9 deaths per 1,000 live births. The rates for both whites (4.9) and Hispanics/Latinos (5.01) are considerably higher than they are for Asian/Pacific Islanders (3.7). The rate for African Americans, however, is more than double that of whites, at 10.9. The infant mortality rate for American Indians and Alaskan Natives is also high at 7.7, higher than the rate for whites, Hispanics, and Asians. Infant mortality also varies considerably by state, with the rate in Massachusetts (4.3) about half that in several states in the South. Given the racial differences just noted, it is not surprising that the states with the highest rates tend to have higher proportions of African American residents. Life expectancy at birth shows similar patterns: In 2015, whites had, on average, a 3.8-year longer life expectancy than African Americans. This gap has narrowed considerably in recent years; in 2006, it was 5.1 years.

This disparity between African Americans and other races also holds for certain diseases. Diabetes rates, for example, are 80% higher among African Americans than whites. For end-stage renal disease, African American incidence and prevalence rates are about three times those of whites. There are disparities by income as well. In the case of diabetes, rates for those below 200% of the FPL are twice those of people above 400% of the FPL. While diet and genetic factors play a strong role in diabetes, disparities in treatment relate to both the medical care system itself and access to it. Similarly, there are different cancer survival rates according to race. Overall 5-year survival rates in the 1999–2006 period were 69% for whites compared to 59% for African Americans. Among ten of the most common types of cancer, whites had higher survival rates for nine of them (all but stomach cancer).

One of the stated objectives of the ACA is to improve quality and outcomes. First, preventive care is encouraged because such services will not be subject to patient co-payments under Medicare and Medicaid. Medicare will also cover one comprehensive risk assessment. Second, ACOs, some believe, can increase quality by encouraging coordination of currently disparate providers and discouraging the provision of unnecessary services. Third, additional comparative effectiveness research will be funded, and fourth, a number of financial incentives based on quality and outcomes are initiated under the legislation. These include reimbursement incentives for hospital performance and value-based payments to providers.

Expenditures

The US spends far more on health care per person than any other country. There is little agreement on why the US is an outlier in this regard. Those on the left often point to what they see as several contributing factors: lack of consolidated
Table 6 Measures of patient safety in 11 high-income countries

Raw scores (%)
Measure | Source | AUS | CAN | FRA | GER | NETH | NZ | NOR | SWE | SWIZ | UK | US
Overall benchmark ranking
Patient believed mistake was made in treatment or care in past 2 years | 2011 | 10 | 11 | 6 | 8 | 11 | 13 | 17 | 11 | 4 | 4 | 11
Patient given wrong medication or wrong dose at a pharmacy or hospital in past 2 years | 2011 | 4 | 5 | 6 | 8 | 6 | 7 | 8 | 5 | 2 | 2 | 8
Patient given incorrect results for a diagnostic or lab test in past 2 years (base, had a lab test ordered) | 2011 | 4 | 5 | 3 | 2 | 6 | 5 | 4 | 3 | 3 | 2 | 5
Patient experienced delays in being notified about abnormal test results in past 2 years (base, had a lab test ordered) | 2011 | 7 | 11 | 3 | 5 | 5 | 8 | 10 | 9 | 5 | 4 | 10
Hospitalized patients reporting infection in hospital or shortly after | 2013 | 9 | 11 | 8 | 10 | 12 | 12 | 10 | 8 | 10 | 12 | 5
Source: Davis et al. (2014)
purchasing power among buyers of care, the lack of universal insurance coverage, high marketing and administrative costs among private insurers, too many specialists and not enough primary care doctors, and direct-to-consumer advertising of prescription drugs. Those on the right point to a bloated government bureaucracy and a myriad of regulations that stifle competition, along with medical liability laws that encourage overprovision and overutilization of services. Other factors that observers on both sides point out are high unit prices paid to providers, particularly in the fee-for-service system, proliferation of medical technologies, and unhealthy behaviors.

Per capita spending is more than double the median level for OECD countries, nearly 40% more than the second most expensive country, Switzerland, and health-care expenses constitute
Fig. 6 Cumulative increases in health insurance premiums, workers’ contributions to premiums, inflation, and workers’
earnings, 1999–2014
over one-sixth of the US economy (Hartman et al. 2014). The rate of growth in health-care spending exceeded the GDP growth rate every year since at least the 1960s until 2010, which has increasingly squeezed the finances of all levels of government, employers, and individuals.

Employers and employees also have seen large increases in their contributions to the health-care costs of employer-sponsored health insurance. Between 1999 and 2014, total premiums rose by 191% and the workers’ share by 212%. In contrast, wages rose by only 54% over this period (Fig. 6).

Looking now at changes over time, Fig. 7 illustrates growth in national health expenditure per capita expressed in US purchasing power parities for six countries: Canada, Germany, Japan, the Netherlands, the UK, and the US from 2000 to 2016. Growth rates in the Netherlands and Japan exceed those of the other countries. However, in 2016, US spending was more than double that in the UK because the UK started at such a low level of spending. Thus, when one combines both level of spending and rate of growth, the US is an international outlier.

There are two overall ways in which the ACA may help contain expenditures. First, it includes a number of initiatives that have the potential to change the financing and delivery system. These include encouraging the development and/or growth of ACOs; bundled payment systems, which provide payment for a set of related services usually related to an episode of illness (as opposed to fee-for-service); medical homes (a physician-directed organization that oversees the provision of access to comprehensive care across health-care facilities and over a patient’s life); electronic medical records; and the linking of reimbursement to performance outcomes (initially, for Medicare hospital stays).

In addition, the ACA includes a number of direct mechanisms that could control expenditures, including large cuts in previously expected payment levels to Medicare Advantage (usually, managed care) plans, which in 2012 were estimated to have been paid 7% more than it would have cost for the same individuals to have been enrolled in the traditional fee-for-service Medicare program (Medicare Payment Advisory
[Figure 7: line chart of expenditures per capita (US$, current PPP) by year, 2000–2016, for Canada, Germany, Japan, the Netherlands, the United Kingdom, and the United States]
Fig. 7 National health expenditures, per capita in six countries, 2000–2016 (Source: OECD 2017)
Commission 2012), the tax on “Cadillac” or high-benefit health insurance plans, and the Independent Payment Advisory Board, which is to recommend ways to reduce Medicare costs if they exceed a certain threshold.

The ACA does not include a number of cost-containment methods that have been employed in some other countries. These include global budgets, coordinating provider payment among public and private insurers (i.e., an “all-payers” system), controlling the supply of resources (e.g., through expenditure targets or technology controls), and using cost-effectiveness research to determine which services should be reimbursed and, if so, how much.

Conclusions

In summary, the US health-care system is among the best in the world in some respects while suffering from significant shortcomings in others. The US is distinguished from its counterparts by its historic distaste for health planning, lack of control over the dissemination of medical technologies, reluctance to take advantage of the potential bargaining power afforded through large government insurers, the lack of centralized prices and prospective budgeting, and, most importantly, the absence of guaranteed insurance coverage.

With the adoption of the Affordable Care Act in 2010, and subsequent legal and policy challenges to its core provisions, the US health care system continues to change. Nonetheless, despite many legal and political challenges, the core provisions of the ACA have endured. The ACA addresses major challenging issues such as geographic variation in the use of services and a bias toward subspecialty rather than primary care services but mainly through small programs and pilot studies. The types of changes needed in health-care delivery are unlikely to result from legislation. Rather, they need to be innovated and supported by both the public and private sectors as each grapples with the cost, quality, and access issues they face. They also hinge on changing individual and provider behaviors. Solving the most vexing health-care financing, delivery, and policy issues depends as much on finding a common ground among US policymakers and, more broadly, the American public, as it does on medical, social, behavioral, and organizational sciences.
References

AACN. Nursing Faculty Shortage: American Association of Colleges of Nursing Fact Sheet. 2017. http://www.aacnnursing.org/News-Information/Fact-Sheets/Nursing-Faculty-Shortage
Adler-Milstein J, DesRoches CM, Furukawa MF, Worzala C, Charles D, Kralovec P, Jha AK. More than half of US hospitals have at least a basic EHR, but stage 2 criteria remain challenging for most. Health Aff (Proj Hope). 2014;33(9):1664–71.
Aiken L. US nurse labor market dynamics are key to global nurse sufficiency. Health Serv Res. 2007;42(3):1299–310.
Aldridge MD, Canavan M, Cherlin E, Bradley EH. Has hospice use changed? 2000–2010 utilization patterns. Med Care. 2015;53(1):95–101.
Berkowitz SA, Seligman HK, Choudhry NK. Treat or eat: food insecurity, cost-related medication underuse, and unmet needs. Am J Med. 2014;127(4):303.e3–10.e3.
BHPr. The physician workforce: projections and research into current issues affecting supply and demand. BHPr, HRSA, U.S. DHHS. Dec 2008. http://bhpr.hrsa.gov/healthworkforce/reports/physwfissues.pdf. Accessed 19 Apr 2013.
BHPr. The registered nurse population: findings from the 2008 National Sample Survey of Registered Nurses. BHPr, HRSA, U.S. DHHS. 2010. http://bhpr.hrsa.gov/healthworkforce/rnsurvey2008.html. Accessed 19 Apr 2013.
BLS. Current population survey. Bureau of Labor Statistics, Department of Labor. 2011a. http://www.bls.gov/cps/home.htm. Accessed 19 Apr 2013.
BLS. Occupational outlook handbook, 2010–11 ed. Bureau of Labor Statistics, Department of Labor. 2011b. http://www.bls.gov/oco/. Accessed 19 Apr 2013.
Bodenheimer T. Coordinating care: a perilous journey through the health care system. N Engl J Med. 2008;358:1064–71.
Bodenheimer B, Pham HH. Primary care: current problems and proposed solutions. Health Aff. 2010;29(5):799–805.
Brody H, Light DW. Efforts to undermine public health: the inverse benefit law: how drug marketing undermines patient safety and public health. Am J Public Health. 2011;101(3):399–404.
Budetti PP. Market justice and U.S. health care. JAMA. 2008;299(1):92–4.
California HealthCare Foundation. California health care almanac. 2009. http://www.chcf.org/~/media/MEDIA%20LIBRARY%20Files/PDF/E/PDF%20EmployerBenefitsSurvey09.pdf. Accessed 19 Apr 2013.
Centers for Medicare and Medicaid Services. Annual Report of the Boards of Trustees of the Federal Hospital Insurance and Federal Supplementary Medical Insurance Trust Funds. 2016. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/ReportsTrustFunds/Downloads/TR2016.pdf. Accessed 5 Apr 2018.
CMS. What’s an ACO? Centers for Medicare and Medicaid Services web page. 2012. https://www.cms.gov/ACO/. Accessed 19 Apr 2013.
CMS. National health expenditure data. 2014. http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/index.html
COGME. Physician workforce policy guidelines for the United States, 2000–2020. Washington, DC: Committee on Graduate Medical Education; 2005. www.cogme.gov/16.pdf. Accessed 19 Apr 2013.
Cohen RA, Martinez ME. Health insurance coverage: early release of estimates from the National Health Interview Survey. Jan–Mar 2015. http://www.cdc.gov/nchs/data/nhis/earlyrelease/insur201508.pdf. Accessed 6 Aug 2015.
Congressional Budget Office. Insurance coverage provisions of the Affordable Care Act – CBO’s April 2014 baseline. 2014. https://www.cbo.gov/sites/default/files/cbofiles/attachments/43900-2014-04-ACAtables2.pdf. Accessed 19 Apr 2013.
Conti R, Busch A, Cutler D. Overuse of antidepressants in a nationally representative adult patient population in 2005. Psychiatr Serv. 2011;62(7):720–6.
Cram P, et al. Uncompensated care provided by for-profit, not-for-profit, and government-owned hospitals. BMC Health Serv Res. 2010;10:90.
Cunningham PJ. Beyond parity: primary care physicians’ perspectives on access to mental health care. Health Aff. 2009;28:w490–501.
Danzon PM, Pauly MV. Insurance and new technology: from hospital to drugstore. Health Aff. 2001;20(5):86–100.
Davis K, Stremikis K, Squires D, Schoen C. Mirror, mirror on the wall, 2014 update: how the U.S. health care system compares internationally. New York: Commonwealth Fund; 2014. http://www.commonwealthfund.org/publications/fund-reports/2014/jun/mirror-mirror. Accessed 6 Aug 2015.
Decker S. Two-thirds of primary care physicians accepted new Medicaid patients in 2011–12: a baseline to measure future acceptance rates. Health Aff. 2013;32(7):1183–7.
Ennis SR, Ríos-Vargas M, Albert NG. The Hispanic population 2010, 2010 Census briefs. U.S. Census Bureau. 2011. http://www.census.gov/prod/cen2010/briefs/c2010br-04.pdf. Accessed 19 Apr 2013.
Field MJ, Cassel CK. Approaching death: improving care at the end of life. Washington, DC: National Academies Press, Institute of Medicine; 1997. http://www.nap.edu/catalog/5801.html. Accessed 19 Apr 2013.
Flynn L, Aiken LH. Does international nurse recruitment influence practice values in U.S. hospitals? J Nurs Scholarsh. 2002;34(1):67–73.
Foley DJ, et al. Highlights of organized mental health services in 2002 and major national and state trends. In: Manderscheid RW, Berry JT, editors. Mental health, United States 2004. Rockville: Substance Abuse and Mental Health Services Administration; 2004. p. 203, Table 19.2. http://store.samhsa.gov/shin/content/SMA06-4195/SMA06-4195.pdf. Accessed 19 Apr 2013.
Gallup. U.S. Uninsured Rate Steady at 12.2% in Fourth Kaiser Family Foundation. Uninsured Rates for Non-
Quarter of 2017. 2017. http://news.gallup.com/poll/ elderly Adults by Gender. 2016a. https://www.kff.org/
225383/uninsured-rate-steady-fourth-quarter-2017.aspx. uninsured/state-indicator/rate-by-gender/?currentTime
Accessed 8 Feb 2018. frame=0&sortModel=%7B%22colId%22:%22Location
GAO. Hospital emergency departments: crowding con- %22,%22sort%22:%22asc%22%7D. Accessed 8 Jul
tinues to occur, and some patients wait longer than 2018.
recommended time frames. Washington, DC: US Gov- Kaiser Family Foundation. Health insurance coverage of
ernment Accountability Office; 2009. http://www.gao. total population. 2016b. https://www.kff.org/other/
gov/new.items/d09347.pdf. Accessed 19 Apr 2013. state-indicator/total-population/?currentTimeframe=0
GAO. Drug Industry: Profits, Research and Development &selectedDistributions=medicaid–medicare–other-
Spending, and Merger and Acquisition Deals. 2017. public&sortModel=%7B%22colId%22. Accessed
https://www.gao.gov/assets/690/688472.pdf. Accessed 21 Feb 2018.
5 Apr 2018. Kaiser Family Foundation. Federal Medicaid Dispropor-
Harrison TD. Consolidations and closures: an empirical tionate Share Hospital (DSH) Allotments. 2016c.
analysis of exits from the hospital industry. Health https://www.kff.org/medicaid/state-indicator/federal-
Econ. 2007;16(5):457–74. dsh-allotments/?currentTimeframe=0&sortModel=%
Hartman M, et al. National Health Care Spending In 2016: 7B%22colId%22:%22Location%22,%22sort%22:%
Spending And Enrollment Growth Slow After Initial 22asc%22%7D. Accessed 21 Feb 2018.
Coverage Expansions. 2017. Health Aff, p.10.1377/ Kaiser Family Foundation. Health Insurance Coverage of
hlthaff. http://www.healthaffairs.org/doi/10.1377/ the Total Population. 2016d. https://www.kff.org/other/
hlthaff.2017.1299 state-indicator/total-population/?dataView=1¤tTime
Healthcare.gov. Federal Poverty Level. 2018. Available at: frame=0&selectedDistributions=employer–non-group–
https://www.healthcare.gov/glossary/federal-poverty- uninsured&sortModel=%7B%22colId%22:%22Loca
level-FPL/. Accessed 14 Feb 2018. tion%22,%22sort%22:%22asc%22%7D. Accessed 21
Hersh W. A stimulus to define informatics and health Feb 2018.
information technology. BMC Med Inform Decis Kaiser Family Foundation. Key facts about the uninsured
Mak. 2009;9:24. population. 2017a. https://www.kff.org/uninsured/
High E, Schneider C, Sarnak DO. Appendix 1. Eleven- fact-sheet/key-facts-about-the-uninsured-population/.
Country Summary Scores on Health System Perfor- Accessed 8 Feb 2018.
mance. Mirror, Mirror 2017: International Kaiser Family Foundation. Medicare advantage. 2017b.
Comparison Reflects Flaws and Opportunities for Medicare advantage. http://files.kff.org/attachment/
Better U.S. Health Care. 2017. http://www. Fact-Sheet-Medicare-Advantage. Accessed 21 Mar
commonwealthfund.org/interactives/2017/july/mirror- 2018.
mirror/assets/Schneider_mirror_mirror_2017_Appen Kaiser Family Foundation. The Medicare Part D Prescrip-
dices.pdf. Accessed 18 Feb 2018. tion Drug Benefit. 2017c. http://files.kff.org/attach
Hing E, Hsiao C State Variability in Supply of Office-based ment/Fact-Sheet-The-Medicare-Part-D-Prescription-
Primary Care Providers: United States 2012. 2014. US Drug-Benefit. Accessed 21 Feb 2018.
Department of Health and Human Services. Kaiser Family Foundation. 2017 Employer Health Benefits
Hogan SO, Kissam SM. Measuring meaningful use. Health Survey. 2017d. https://www.kff.org/report-section/ehbs-
Aff. 2010;29(4):601–6. 2017-summary-of-findings/. Accessed 21 Feb 2018.
Hsiao C, Hing E. Use and characteristics of electronic Kaiser Family Foundation. Health Insurance Coverage of
health record systems among office-based physician the Total Population. 2018a. https://www.kff.org/other/
practices: United States, 2001–2013. NCHS Data state-indicator/total-population/?dataView=0¤tTime
Brief. 2014;143:1–8. frame=0&sortModel=%7B%22colId%22:%22Loca
Kaiser Family Foundation. Kaiser slides. 2012. http://facts. tion%22,%22sort%22:%22asc%22%7D
kff.org/. Accessed 19 Apr 2013. Kaiser Family Foundation. Status of State Action on the
Kaiser Family Foundation. Federal Disproportionate Share Medicaid Expansion Decision. 2018b. https://www.kff.
(DSH) hospital allotments. 2013. http://kff.org/medic org/health-reform/state-indicator/state-activity-around-
aid/state-indicator/federal-dsh-allotments. Accessed 11 expanding-medicaid-under-the-affordable-care-act/?
Oct 13. currentTimeframe=0&sortModel=%7B%22colId%
Kaiser Family Foundation. Employer health benefits: 22:%22Location%22,%22sort%22:%22asc%22%7D.
2014 annual survey. 2014a. http://files.kff.org/attach Accessed 14 Feb 2018.
ment/2014-employer-health-benefits-survey-full-report. Kaiser Family Foundation. Medicaid waiver tracker: Which
Accessed 9 Aug 2015. states have approved and pending section 115 Medicaid
Kaiser Family Foundation. Health Care Expenditures per waivers? 2018c. https://www.kff.org/medicaid/issue-
Capita by State of Residence. 2014b. https://www.kff. brief/which-states-have-approved-and-pending-sec
org/other/state-indicator/health-spending-per-capita/? tion-1115-medicaid-waivers/. Accessed 14 Feb 2018.
currentTimeframe=0&sortModel=%7B%22colId% Kaiser Family Foundation. Subsidy calculator. 2018d.
22:%22Location%22,%22sort%22:%22asc%22%7D. http://kff.org/interactive/subsidy-calculator/. Accessed
Accessed 9 Aug 2015. 18 Feb 2018.
Kaiser Family Foundation. Marketplace Enrollment, 2014- OECD. OECD.Stat. 2015. http://stats.oecd.org/index.
2018. 2018. https://www.kff.org/health-reform/state- aspx?DataSetCode=HEALTH_STAT
indicator/marketplace-enrollment-2014-2017/?current OECD. OECD.Stat. 2017. http://stats.oecd.org/OECDStat_
Timeframe=0&sortModel=%7B%22colId%22:%22 Metadata/ShowMetadata.ashx?Dataset=SHA&Coords
Location%22,%22sort%22:%22asc%22%7D. Accessed =%5BLOCATION%5D.%5BDEU%5D&ShowOnWeb
21 Mar 2018. =true&Lang=en. Accessed 18 Feb 2018.
Kidsdata.org. Child population, by race/ethnicity. 2015. Phillips RL, Bazemore AW. Primary care and why it mat-
http://www.kidsdata.org/topic/33/child-population-race/ ters for U.S. health system reform. Health Aff. 2010;29
table#fmt=144&loc=2,127,347,1763,331,348,336,171, (5):806–10.
321,345,357,332,324,369,358,362,360,337,327,364, Pietroburgo J. Charity at the deathbed: impacts of public
356,217,353,328,354,323,352,320,339,334,365,343, funding changes on hospice care. Am J Hosp Palliat
330,367,344,355,366,368,265,349,361,4,273,59,370, Med. 2006;23(3):217–23.
326,333,322,341,338,350,342,329,325,359,351,363, Ranasinghe PD. International medical graduates in the US
340,335&tf=79&ch=7,11,726,10,72,9,939&sortCo physician workforce. J Am Osteopath Assoc. 2015;115
lumnId=0&sortType=asc. Accessed 3 Aug 2015. (4):236–41.
Kovner AR, Knickman JR. Health care delivery in the RAND. Health care on aisle 7: the growing phenomenon of
United States. 9th ed. New York: Springer; 2011. retail clinics. RAND Health Research Highlights. Clin
Ling DC, Berndt ER, Frank RG. Economic incentives and Sch Rev. 2010;3(1):10–3.
contracts: the use of psychotropic medications. Rittenhouse D, et al. Small and medium-size physician
Contemp Econ Policy. 2008;26(1):49–72. practices use few patient-centered medical home pro-
Longtermcare.gov. Costs of Care. 2018. https:// cesses. Health Aff (Proj Hope). 2011;30(8):1575–84.
longtermcare.acl.gov/costs-how-to-pay/costs-of-care. Salinsky E. Governmental public health: an overview of
html. Accessed 5 Apr 2018. state and local public health agencies, National Health
Lorenz K, et al. Charity for the dying: who receives Policy Forum, background paper no. 77. Washington,
unreimbursed hospice care? J Palliat Med. 2003;6 DC: George Washington University; 2010. http://www.
(4):585–91. nhpf.org/library/background-papers/BP77_GovPublic
Medicare Payment Advisory Commission. Health care Health_08-18-2010.pdf. Accessed 19 Apr 2013.
spending and the Medicare program. 2012. http:// Schlesinger M, Mitchell S, Gray B. Measuring community
www.medpac.gov/documents/Jun12DataBookEntire benefits provided by nonprofit and for-profit HMOs.
Report.pdf. Accessed 19 Apr 2013. Inquiry. 2003;40(2):114–32.
Medicare.gov. Your Medicare Coverage. 2018a. Centers Schneider EC, et al. Mirror, Mirror 2017. International
for Medicare and Medicaid Services. https://www. Comparison Reflects Flaws and Opportunities for Bet-
medicare.gov/coverage/hospital-care-inpatient.html. ter US Health Care. 2017. Commonwealth Fund. http://
Accessed 18 Feb 2018. www.commonwealthfund.org/interactives/2017/july/
Medicare.gov. Part B Costs. 2018b. Centers for Medicare mirror-mirror/assets/Schneider_mirror_mirror_2017.
and Medicaid Services. https://www.medicare.gov/your- pdf. Accessed 5 Apr 2018.
medicare-costs/part-b-costs/part-b-costs.html. Accessed Schoenbaum SC, et al. Mortality amenable to health care in
21 Mar 2018. the United States: the roles of demographics and health
Mehrotra A, Wang M, Lave J, Adams J, McGlynn E. Retail systems performance. J Public Health Policy. 2011;32
clinics, primary care physicians, and emergency depart- (4):407–29.
ments: a comparison of patients’ visits. Health Aff. Shen Y, Hsia R. Changes in emergency department access
2008;27(5):1272–82. between 2001 and 2005 among general and vulnerable
Metzl JM, Herzig RM. Medicalisation in the 21st century: populations. Am J Public Health. 2010;100(8):1462–9.
introduction. Lancet. 2007;369(9562):697–8. Shi L, Singh DA. Delivering health care in America: a
Milliman Inc. Medicare versus private health insurance: systems approach. 5th ed. Boston: Jones & Bartlett;
the cost of administration. 2006. http://www.cahi.org/ 2012.
cahi_contents/resources/pdf/CAHIMedicareTechnical Starfield B, Shi L. Policy relevant determinants of health:
Paper.pdf. Accessed 19 Apr 2013. an international perspective. Health Policy. 2002;60
Misurski DA, Lipson DA, Changolkar AK. Inappropriate (3):201–18.
antibiotic prescribing in managed care subjects with Tunis SR, Kang JL. Improvement in Medicare coverage of
influenza. Am J Manag Care. 2011;17(9):601–9. new technology: how Medicare has responded to the
NHPCO. NHPCO facts and figures: hospice care in Amer- need to improve access to beneficial technologies.
ica 2010. National Hospice and Palliative Care Orga- Health Aff. 2001;20(5):83–5.
nization. 2010. http://www.nhpco.org/files/public/ U.S. Census Bureau. NAICS 6211, Offices of physicians.
Statistics_Research/Hospice_Facts_Figures_Oct-2010. 2010. http://www.census.gov/econ/census02/data/
pdf. Accessed 19 Apr 2013. industry/E62111.HTM#bridge. Accessed 19 Apr 2013.
Nolte E, McKee M. Variations in amenable mortality – U.S. Census Bureau. 2014. http://factfinder.census.gov/
trends in 16 high-income nations. Health Policy. faces/tableservices/jsf/pages/productview.xhtml?src=
2011;103:47–52. bkmk. Accessed 4 Jul 2015.
U.S. Census Bureau. Sumter County, Fla., is Nation’s Weiner J. Expanding the US medical workforce: global
Oldest, Census Bureau Reports. 2016. Press Release: perspectives and parallels. BMJ. 2007;335
CB16-107. https://www.census.gov/newsroom/press- (7613):236–8.
releases/2016/cb16-107.html. Accessed 21 Mar 2018. Weiner S, et al. Managing the unmanaged: a case
U.S. Census Bureau. Quickfacts US, Population estimates study of intra-institutional determinants of
2017. 2017. Available at: https://www.census.gov/ uncompensated care at health care institutions
quickfacts/fact/table/US/PST045217#viewtop. Accessed with differing ownership models. Med Care.
21 Mar 2018. 2008;46(8):821–8.
U.S. Centers for Disease Control and Prevention. Cancer Weinick RM, Bristol SJ, DesRoches CM. Urgent
screening and test use – United States, 2013. Morb care centers in the U.S.: findings from a national
Mortal Wkly Rep. 2015. http://origin.glb.cdc.gov/ survey. BMC Health Serv Res. 2009;9:79.
mmwr/preview/mmwrhtml/mm6417a4.htm?s_cid= Weissman J, Gaskin DJ, Reuter J. Hospitals’ care of
mm6417a4_w. Accessed 6 Aug 2015. uninsured patients during the 1990s: the relation of
U.S. Department of Health and Human Services. Health, teaching status and managed care to changes in market
U.S., 2014. 2014. http://www.cdc.gov/nchs/data/hus/ share and market concentration. Inquiry. 2003;40
hus14.pdf. Accessed 19 Aug 2015. (1):84–93.
U.S. Department of Health and Human Services, Health Whitmore H, et al. The individual insurance market before
Resources and Services Administration. The registered reform: low premiums and low benefits. Med Care Res
nurse population: findings from the 2008 National Rev. 2011;68(5):594–606.
Sample Survey of Registered Nurses. 2010. Retrieved WHO. The right to health – fact sheet. 2007. http://www.
from http://bhpr.hrsa.gov/healthworkforce/rnsurveys/ who.int/mediacentre/factsheets/fs323_en.pdf. Accessed
rnsurveyfinal.pdf 19 Apr 2013.
US Department of Health and Human Services. Health, Williams SJ, Martin P, Gabe J. The pharmaceuticalisation
U.S., 2016. 2016. https://www.cdc.gov/nchs/data/hus/ of society? A framework for analysis. Sociol Health
hus16.pdf. Accessed 8 Feb 2018. Illn. 2011;33(5):710–25.
U.S. Government Accountability Office. Private health World Bank. Life Expectancy at Birth, total (years). 2017.
insurance: concentration of enrollees among individ- https://data.worldbank.org/indicator/SP.DYN.LE00.IN.
ual, small group, and large group insurers from 2010 Accessed 21 Feb 2018.
through 2013. 2014. http://www.gao.gov/assets/670/ Yee T, Lechner AE, Boukus ER. The surge in urgent
667245.pdf. Accessed 2 Aug 2015. care centers: emergency department alternative
Van der Hooft C, et al. Inappropriate drug prescribing in or costly convenience? Center for Studying
older adults: the updated 2002 Beers criteria–a popula- Health System Change. Res Brief. (26). July
tion-based cohort study. Br J Clin Pharmacol. 2013. www.hschange.com/CONTENT/1366/1366.
2005;60(2):137–44. pdf
Health System Typologies
41
Claus Wendt
Contents
Introduction
Typologies
  The Role of Actors and Institutions in Healthcare
  How Do Healthcare Systems Work?
Discussion
References
C. Wendt (*)
University of Siegen, Siegen, Germany
e-mail: wendt@soziologie.uni-siegen.de
https://doi.org/10.1007/978-1-4939-8715-3_21

Abstract

Since the early 1970s, scholars have been working on typologies for the comparison of healthcare systems. Typologies enable scholars to more easily replicate existing studies and contrast findings from a comparative study with those of other studies that cover different years and countries. Typologies might also help to identify institutional indicators that seem to be of particular promise when comparing healthcare systems and reform processes. This contribution provides an overview of health system typologies, which can be roughly divided into two areas of research: (1) classifications that focus on modes of governance, actors, and institutions and (2) classifications that try to capture how healthcare is financed, provided, and regulated. This chapter identifies prominent examples of both areas of research and also describes and characterizes types of healthcare systems and country classifications.

Introduction

Healthcare systems are characterized by different levels and modes of financing, service provision, and regulation. Various actors are represented in the healthcare arena. Decision-makers may emphasize the relevance of inpatient and outpatient healthcare, prevention, and rehabilitation. Healthcare systems are, in other words, complex institutions that are difficult to capture. Typologies can be used as a tool to compare healthcare systems based on a few preselected indicators. A major goal of comparative typology research is to reduce the massive amount of data; as a method of comparative research, typologies reduce the level of complexity. The strengths of typologies can be seen in offering a conceptual basis for generalizing across highly diverse
healthcare systems. In healthcare, typologies have so far mainly been used to contrast different types of healthcare systems, to group countries into types, and to identify similarities and differences among countries. Recently, typologies of welfare states and of healthcare systems have been used for combining macro- and micro-research in healthcare. Comparative scholars have, for instance, studied macro-level effects on patients’ access to healthcare, health status, and satisfaction.

With respect to the triangular model of healthcare systems as the backbone of this volume, health system typologies generally refer to one, two, or even all three dimensions: financing agencies, healthcare providers, and patients. Typologies can roughly be divided into two areas of research. The first concentrates on actors and institutions by asking who finances, provides, and regulates healthcare services. The second area of research is interested in the what and captures levels and structures of financing, provision, and regulation.

Typologies are rooted in Max Weber’s methodology of ideal-types. According to Weber (1949: 90; italics in original), an ideal-type “is formed by the one-sided accentuation of one or more points of view and by the synthesis of a great many diffuse, discrete, more or less present and occasionally absent concrete individual phenomena, which are arranged according to those one-sidedly emphasized viewpoints into a unified analytical construct.” This method can be used as a tool for grouping countries (real-types) into health system types to identify similarities and differences among healthcare systems, analyze changes over time, and study the effects of healthcare systems characterized by different institutional setups.

Social health insurance and National Health Service have been used as terms for contrasting healthcare systems that have different traditions and institutional designs. The first social health insurance (SHI) was implemented in Germany in 1883 by Bismarck; later, countries such as Austria, Hungary, and France followed the German example, and their systems are often labeled as SHI or Bismarckian healthcare systems. The first National Health Service (NHS) system was introduced in Britain in 1946 on the basis of the Beveridge plan and provided an example for countries such as New Zealand, Sweden, and Denmark, which have since been labeled as having NHS or Beveridgian healthcare systems (e.g., Hassenteufel and Palier 2007). In a 1987 study by the OECD, the labels National Health Service, social insurance, and private insurance were used to form a more coherent analytical concept for healthcare system comparison.

This concept, however, has been criticized for essentially referring to the real cases of Britain, Germany, and the USA instead of ideal-types. While sharing certain characteristics (such as tax financing vs. social insurance financing vs. private financing), the three types are designed neither for covering all developed healthcare systems nor for capturing changes over time. The social insurance countries of Central and Eastern Europe (CEE), for instance, differ in many respects (e.g., the weak position of corporate actors) from West European social health insurance, and Southern European NHS systems differ (e.g., regarding administrative capacities) from the British and Nordic NHS systems.

Due to the lack of a commonly used health system typology, some scholars of healthcare systems use the typology of welfare states introduced by Esping-Andersen (1990) as a reference. In the original version, social democratic welfare states were separated from conservative-corporatist and from liberal welfare states. In healthcare research, welfare regimes have been used for analyzing public support of healthcare systems (Gelissen 2002), health status (Conley and Springer 2001), and health inequalities (Eikemo et al. 2008). Arguing that the concept of de-commodification (central to the welfare regime typology) is not designed for capturing characteristics with great importance to health, Bambra (2005) introduced a health de-commodification index consisting of private health expenditure, private hospital beds, and the overall coverage of the healthcare system. The country grouping is slightly different compared with the “classic” welfare state typology. Both concepts, however, were
first and foremost designed to capture social rights and do not reveal the main characteristics of modern healthcare systems.

This chapter summarizes some of the more recent health system typologies related to “organization and governance of health systems,” “health financing,” and “provision of services.” It discusses concepts, country groupings, and findings from studies that use health system typologies. Studies are available that analyze the effects of different health system types on cost containment, access to care, public opinion, health, and health inequality (for an overview, see Beckfield et al. 2013; Burau et al. 2015). Most typologies, however, remain descriptive with the primary task of identifying similarities and differences among today’s healthcare systems. Such studies are critical and have commonly been used for selecting countries for small-n comparative studies in the welfare state discourse. Health system typologies have taken data from data sets such as the OECD Health Data (various years), the WHO Health for All Database (various years), and other international and national sources. Most comparative researchers make use of the OECD data, which is, however, less useful for the detection of institutional boundaries within countries or for the analysis of inequalities (Beckfield et al. 2013). If the effects of different health system types are examined, macro data could be matched with micro data from sources such as the Eurobarometer, the European Social Survey (ESS), the International Social Survey Programme (ISSP), and the Survey of Health, Ageing and Retirement in Europe (SHARE).

Typologies

Health system typologies can basically be divided into frameworks that concentrate on the role of actors and the type of governance on the one hand and into frameworks that try to understand how healthcare systems work, what they invest in people’s health, and what services they provide on the other hand.

The Role of Actors and Institutions in Healthcare

With respect to the triangular concept of this book, typologies in this area of research ask “who” is responsible for governance and organization as well as the purchasing and provision of healthcare services. The 1987 OECD study was one of the first attempts to classify healthcare systems according to preselected dimensions. The analytical dimensions used in OECD 1987 were “coverage,” “financing,” and “ownership,” and the study investigated “who” was responsible in these areas. The creation of types and the classification of countries, however, did not take place on the basis of available comparative data but on the basis of informed reasoning. The OECD study identified (1) a National Health Service (NHS) model with universal coverage, tax financing, and public ownership of healthcare provision; (2) a social insurance model with universal coverage, social insurance financing, and a combination of public and private ownership; and (3) a private insurance model with selective private insurance coverage, private insurance financing, and private ownership (OECD 1987).

Moran (1999) developed a typology of “healthcare states” by asking “who” governs the “consumption,” “provision,” and “production” of healthcare. Governance of “consumption” refers to patients’ eligibility to access healthcare and to the allocation of financial resources to the healthcare system; governance of “provision” refers to the control of doctors and hospitals; and governance of “production” refers to the regulation of medical innovations. On the basis of these dimensions, Moran (1999) constructed four families of healthcare states: (1) the “entrenched command and control state,” in which the state is distinctive in all three governing areas (e.g., the UK and the Scandinavian countries); (2) the “corporatist healthcare states,” in which “consumption” is dominated by public law bodies and the field of outpatient healthcare is dominated by panel doctors’ associations (e.g., Germany); (3) “supply states,” which are dominated by provider interests (e.g., the USA); and (4) “insecure command and
control states,” in which administrative capacities are much lower and private healthcare provision is higher than in the first type (e.g., Italy, Greece, Portugal, Spain). By using Moran’s concept, Burau and Blank (2006) analyzed nine healthcare systems and identified four cases that fully fit one of Moran’s types. Sweden and the UK are perfect examples of the “command and control state”; however, New Zealand and the Netherlands share important characteristics of this type as well. Germany represents the “corporatist healthcare state,” and Australia, Japan, and again the Netherlands match this type in two dimensions. The USA is an example of the “supply healthcare state,” and Singapore also shows major characteristics of this type in addition to corporatist elements.

Wendt et al. (2009) suggest a typology with 27 healthcare system types, 3 of which are ideal-types. These healthcare system types are constructed by combining the dimensions of regulation, financing, and service provision with the involvement of state, nongovernmental (societal), and private actors. In “state healthcare systems,” the state is decisive in all three dimensions; in “societal healthcare systems,” societal and corporate actors are decisive; and in “private healthcare systems,” private actors dominate regulation, financing, and healthcare provision. For each ideal-type, Wendt et al. (2009) identified six combinations in which either the state, societal actors, or private actors are dominant in two dimensions and therefore come close to the respective ideal-type. Six additional combinations do not approach any of the three ideal-types. Based on this typology, Wendt et al. (2009) suggested that the UK and Denmark form “state healthcare systems,” in which the state is decisive in all three dimensions. Germany is classified as a (societal-based) mixed type due to the great importance of private provision, and the USA is labeled a (private-based) mixed type due to the growing importance of public financing through public programs such as Medicare and Medicaid.

Using Wendt et al.’s model, Böhm et al. (2013) compared and classified 30 OECD countries and found 6 health system types for which real cases could be identified: (1) national health service with regulation, financing, and service provision by state actors and institutions (e.g., the UK, the Scandinavian countries, and the Southern European countries of Portugal and Spain); (2) national health insurance with regulation and financing by the state and with private healthcare provision (e.g., Australia, Canada, Ireland, New Zealand, and Italy); (3) a societal-based mixed type with regulation and financing by societal actors such as social insurance and public healthcare provision (e.g., Slovenia); (4) social health insurance with regulation and financing by societal actors and private healthcare provision (e.g., Austria, Germany, Luxembourg, and Switzerland); (5) a private health system with private regulation, financing, and service provision (e.g., the USA); and (6) etatist social health insurance with state regulation, social insurance financing, and private provision (e.g., Belgium, Estonia, France, the Czech Republic, Hungary, the Netherlands, Poland, Slovakia, Israel, Japan, and Korea). Böhm et al. therefore identified two of the ideal-types proposed by Wendt et al. (a state healthcare system and a private healthcare system), while according to this study, an ideal-type societal healthcare system does not exist in today’s OECD world. While corporate actors such as social health insurance and doctors’ associations can (and sometimes do) run their own services, most healthcare systems that are financed by social health insurance contributions rely on private provision.

Most typologies that concentrate on the role of the state and other actors in healthcare (i.e., “who” is governing and regulating, financing, and providing healthcare) have identified one type of system in which the state plays a dominant role and which includes the UK and the Scandinavian countries. Furthermore, in all typologies, the private US healthcare system forms a type of its own. All other empirical and theoretical observations are far from uniform. Most typologies have identified the German healthcare system as representative of a “societal core”; however, while Burau and Blank (2006) cluster the German case together with Australia, Japan, and the Netherlands, Böhm et al. (2013) place Germany in the same group as Austria, Luxembourg, and Switzerland (see Table 1 below).
Table 1 Overview of health system typologies

Authors Dimensions Data Types Country grouping Main goal
The role of actors and institutions in healthcare
OECD Coverage No data (1) National health “Paradigmatic cases”: Construction
(1987) service of types
Financing (2) Social insurance (1) The UK
Ownership (3) Private insurance (2) Germany
(3) The USA
Moran Consumption No data (1) Command and (1) The UK, Scandinavian Construction
(1999) control state countries of types
Provision (2) Corporatist state (2) Germany
Production (3) Supply state (3) The USA
(4) Insecure (4) Greece, Italy, Portugal,
command and control Spain
state
Burau and Consumption Partly (1) Command and (1) Sweden, the UK Grouping of
Blank based on control state (New Zealand, the countries
(2006) OECD Netherlands)
(using Provision health (2) Corporatist state (2) Germany (Australia,
Moran’s Production data Japan, the Netherlands)
concept) (3) Supply state (3) The USA (Singapore)
Wendt Regulation No data Taxonomy of (1) Denmark, the UK Construction
et al. 27 health systems of types
(2009) with three ideal-
types:
Financing (1) State healthcare (2) (Germany)
system
Provision (2) Societal (3) (The USA)
healthcare system
(3) Private healthcare
system
Böhm et al. Regulation OECD (1) National health (1) Denmark, Finland, Grouping of
(2013) health service (regulation, Iceland, Norway, Sweden, countries
(using data; HiT financing, and Portugal, Spain, the UK
Wendt reportsa provision: state)
et al.’s Financing (2) National health (2) Australia, Canada,
concept) insurance (regulation Ireland, New Zealand, Italy
and financing: state;
provision, private)
Provision (3) Societal-based (3) Slovenia
mixed type
(regulation and
financing: societal;
provision, state)
(4) Social health (4) Austria, Germany,
insurance (regulation Luxembourg, Switzerland
and financing:
societal; provision,
private)
(5) Private health (5) The USA
system (regulation,
financing, and
regulation: private)
(6) Etatist social (6) Belgium, Estonia,
health insurance France, the Czech Republic,
(continued)
932 C. Wendt
Table 1 (continued)
(regulation: state; Hungary, the Netherlands,
financing, societal; Poland, Slovakia, Israel,
provision, private Japan, Korea
How do healthcare systems work?
Bambra (2005) | Private health expenditure; Private hospital beds; Coverage of the public system | OECD health data; WHO data | (Construction of types) and grouping of countries
  Grouping suggested by the author and based on Bambra (2005), Table 8:
  (1) High public healthcare index (50 or higher): Finland, Sweden, Norway, the UK
  (2) Middle public healthcare index (around 40): Austria, Belgium, France, Ireland, New Zealand, Canada, Denmark, Italy
  (3) Low public healthcare index (20–30): Australia, Germany, the Netherlands, Switzerland, Japan
  (4) Very low public healthcare index (below 10)
Reibling (2010) | Gatekeeping; Cost-sharing; Supply | OECD health data; HiT reports^a; MISSOC^b | Construction of types and grouping of countries
  (1) Financial incentives states: Austria, Belgium, France, Sweden, Switzerland
  (2) Strong gatekeeping and low supply states: Denmark, the Netherlands, Poland, Spain, the UK
  (3) Weakly regulated and high supply states: The Czech Republic, Germany, Greece
  (4) Mixed regulation states: Finland, Italy, Portugal
Wendt (2009) | Health expenditure; Public-private mix of financing; Privatization of risk; Healthcare provision; Entitlement to care; Payment of doctors; Patients’ access to providers | OECD health data; HiT reports^a | Construction of types and grouping of countries
  (1) Health service provision-oriented type: Austria, Belgium, France, Germany, Luxembourg
  (2) Universal coverage – controlled access type: Denmark, Italy, Ireland, Sweden, the UK
  (3) Low budget – restricted access type: Portugal, Spain, Finland
Wendt (2014) | Health expenditure; Public-private mix of financing; Privatization of risk; Healthcare provision; Payment of doctors; Patients’ access to providers | OECD health data; HiT reports^a | Construction of types and grouping of countries
  (1) Health service provision-oriented type: Austria, Belgium, Canada, France, Germany, Japan, Luxembourg, New Zealand
  (2) Universal coverage – controlled access type: Australia, the Czech Republic, Denmark, Estonia, Hungary, Ireland, Italy, the Netherlands, Poland, Slovak Republic, Slovenia, the UK
  (3) Universal coverage – controlled supply type: Finland, Iceland, Portugal, Spain, Sweden
  (4) Low supply type: Greece (in 2001), Israel, Turkey
a HiT reports: European Observatory of Health Care Systems, Healthcare Systems in Transitions series, see http://www.euro.who.int/en/about-us/partners/observatory/health-systems-in-transition-hit-series
b MISSOC: The EU’s Mutual Information System on Social Protection, see http://ec.europa.eu/social/main.jsp?catId=815&langId=en
How Do Healthcare Systems Work?

While the first category of typologies focuses on the role of the state, the second category is mainly interested in how healthcare systems work, in the level of resources invested in healthcare, in the actual process of service provision, and in patients’ access to healthcare. In this respect, these typologies of health systems are closer to welfare regime types since whether or not citizens have a right to access certain healthcare services is a key factor. Since a strong focus is placed on what healthcare systems actually do, there is a higher potential for analyzing institutional effects on health outcomes. These types of healthcare systems can (but do not need to) be related to typologies that concentrate on actors and institutions (Marmor and Wendt 2012).

So far, only a few typologies have included selected information on levels of expenditure, financing, healthcare provision, or institutional indicators of the healthcare system. Frenk and Donabedian (1987), for instance, focused on the basis for eligibility (citizenship, contributions, poverty) to access the healthcare system (not shown in Table 1), and the OECD 1987 study used the related question of coverage. Both concepts, however, are placed within the more general concept of governance and regulation. By drawing on the extent of private financing, the level of private provision, and the general access provided by the public healthcare system, Bambra (2005) developed a healthcare decommodification index. The main theoretical argument is that patients have easier access to healthcare provision if public coverage is higher and private financing and service provision are lower. A possible country grouping (not provided by Bambra, who was primarily interested in combining cash and service indicators to construct a welfare typology that takes health and social services into account) could juxtapose countries with a high public healthcare index (Finland, Sweden, Norway, the UK), a middle public healthcare index (Austria, Belgium, France, Ireland, New Zealand, Canada, Denmark, Italy), a low public healthcare index (Australia, Germany, the Netherlands, Switzerland, Japan), and a very low public healthcare index (the USA).

Reibling (2010) also used the concept of decommodification as her starting point but focused more directly on access to welfare programs, whereby access is defined by benefit levels and the conditions by which benefits can be accessed. This focus strengthens the patients’ perspective and draws a closer link between healthcare services and individual health. Dimensions for the comparative analysis of access are gatekeeping, cost-sharing, and supply. Gatekeeping is defined as legal regulations that structure patients’ entry and passage through the healthcare system (Reibling 2010). Access, furthermore, is influenced by cost-sharing that may create financial incentives not to use healthcare services,
particularly for minor diseases. Supply, as a major precondition for access, is assessed by provider density and medical technology. By using gatekeeping, cost-sharing, provider density (GPs, specialists, and nurses), and medical technology (magnetic resonance imaging units/MRI, computed tomography scanners/CT), four types of European healthcare systems were constructed: (1) “financial incentive states” that regulate patients’ access to medical care first and foremost by cost-sharing (Austria, Belgium, France, Sweden, Switzerland); (2) “strong gatekeeping and low supply states” that are characterized by low cost-sharing (but where access is controlled by extensive gatekeeping), low numbers of healthcare providers, and low levels of medical technology (Denmark, the Netherlands, Poland, Spain, the UK); (3) “weakly regulated and high supply states” with low legal access regulation and a high supply of healthcare providers (the Czech Republic, Germany, Greece); and (4) “mixed regulation states” that use both gatekeeping and cost-sharing.

In two publications, Wendt (2009, 2014) additionally focused on gatekeeping, cost-sharing, and supply and combined these dimensions with information on entitlement to healthcare, the level of healthcare expenditure, the public-private mix of healthcare financing, and doctors’ remuneration. Healthcare provision is captured by service provider numbers in inpatient and outpatient healthcare, gatekeeping by a healthcare regulation index, and doctors’ remuneration by the payment of general practitioners in the outpatient sector (fee-for-service, capitation, salary). The 2009 article compares European countries, whereas the 2014 article covers both European and non-European countries. By applying cluster analyses in the 2009 typology, Wendt arrived at three types of healthcare systems:

1. The “health service provision-oriented type,” which captures Austria, Belgium, France, Germany, and Luxembourg. This type is characterized by a high level and unquestioned importance of service provision. Patients often have direct access and a choice of both general practitioners and specialists. Cost-sharing is comparatively low, and self-employed doctors are generally paid fee-for-service.
2. The “universal coverage – controlled access type,” represented by Denmark, Italy, Ireland, Sweden, and the UK. While these healthcare systems provide universal coverage, access to care is strictly regulated. Patients typically have to sign up on a general practitioner’s list for a longer period of time, and a referral is required if specialist care is needed. Access to care is further restricted by a comparatively low level of healthcare provision in the outpatient sector. General practitioners are mainly paid on a capitation basis.
3. The “low budget – restricted access type,” which includes Finland, Portugal, and Spain. This type of system is characterized by a low level of healthcare expenditure. Patients’ access is controlled not only by strict access regulation but also by high private co-payments. Most general practitioners receive a salary, and the degree of doctors’ autonomy can therefore be considered to be even lower than in the “universal coverage – controlled access type.”

In Wendt (2014), the number of countries was extended, and the research now covers both European and non-European healthcare systems. When using the same dimensions (except entitlement to care) and newer data, the “health service provision-oriented type” can be confirmed and now also covers Canada, Japan, and New Zealand. The “universal coverage – controlled access type” has also been confirmed and now additionally includes Australia and countries from Central and Eastern Europe. A third type identified in Wendt (2014) is the “universal coverage – controlled supply type,” represented by Finland, Iceland, Portugal, Spain, and Sweden. In this type, the control of doctors’ remuneration is even stricter, and cost-sharing is even higher than in the “universal coverage – controlled access type.” In the publication from 2014, the “low supply type” has been identified as a fourth type of healthcare system, represented by Israel, Turkey, and (in 2001) Greece. This type is characterized by both very low levels of total health
expenditure and low public financing. Levels of healthcare provision in both inpatient and outpatient healthcare are quite low. Patients’ access to medical doctors, however, is hardly controlled by instruments of regulation.

Discussion

The typologies summarized in Table 1 cover two different areas of research. The first group is more focused on types of governance and on the role of the state and other actors in healthcare. The dimensions used are “coverage,” “financing,” “consumption,” and “provision,” and the focus is on “who” is responsible in these areas of the healthcare arena. In almost all typologies, Germany (and to some extent Australia, Japan, and the Netherlands), the UK (often together with the Scandinavian countries and to some extent with New Zealand and the Netherlands), and the USA (with no other countries representing this type) are contrasted. Böhm et al. (2013) put forward one of the first empirical classifications of healthcare systems that covers a larger number of countries. Like earlier “role of actors and institutions” typologies, the UK and the Scandinavian countries are grouped into the same type; however, this time they are together with the Southern European countries. Furthermore, Germany is grouped together with Austria, Luxembourg, and Switzerland. This grouping is much in line with arguments laid down in the OECD 1987 study and in Moran’s comparative work but has so far not been demonstrated empirically. Two other types that have not been suggested in earlier studies are the “national health insurance type,” represented by Australia, Canada, Ireland, New Zealand, and Italy, and the “etatist social health insurance type,” represented by countries from Central and Eastern Europe as well as by Belgium, France, the Netherlands, Israel, Japan, and Korea.

The second group of typologies is more focused on “how” healthcare systems work, what services they provide, and how patients access necessary healthcare services. Both areas of research are necessarily interrelated, for the way healthcare systems are governed influences the way they function. “Command and control states” should be characterized by lower healthcare spending and stronger access regulation. “Supply states,” in which doctors’ associations and other corporate actors are involved in the governance of healthcare, should be characterized by higher levels of healthcare provision, greater doctors’ autonomy, and lower access regulation. However, strong state actors could also use their power and financial capacities to invest more in healthcare. If we want to know how healthcare systems actually work (e.g., for analyzing healthcare systems’ effects on health, health inequalities, and healthcare utilization), dimensions with a stronger focus on healthcare provision and patients’ access to healthcare providers are required.

The different focus of the two concepts becomes clear when comparing two typologies that include the largest number of countries (see Table 2). We almost always find the Scandinavian countries in the same type of healthcare system, irrespective of whether the focus is on governance (Böhm et al. 2013) or on how healthcare systems work (Wendt 2014). Since the mid-2000s, Portugal and Spain have appeared to be close to the Scandinavian group. Almost all CEE countries can be found in a common type of healthcare system. However, while the form of governance seems to be close to that of some Western social health insurance systems (Belgium, France, the Netherlands) and of the Japanese social health insurance, levels of financing and healthcare provision as well as patients’ access to medical care are more similar to the situation in NHS countries such as Denmark, Ireland, Italy, and the UK (see Table 2). The Western social health insurance countries of Austria, Germany, and Luxembourg are similar in both their governance and the way they work. When focusing on levels of financing, healthcare provision, and patients’ access, Germany, Austria, and Luxembourg are close to Belgium, Canada, France, Japan, and New Zealand, which, according to Böhm et al., represent different governance types. The USA seems to be distinct from any other type of healthcare system, both in the way it is regulated and in its level of financing, provision, and patients’ access to care.
Table 2 Comparing two typologies

Böhm et al. (2013) | Wendt (2014)
(1) Denmark, Finland, Iceland, Norway, Sweden, Portugal, Spain, the UK | (3) Finland, Iceland, Portugal, Spain, Sweden
(6) Belgium, Estonia, France, the Czech Republic, Hungary, the Netherlands, Poland, Slovakia, Israel, Japan, Korea | (2) Australia, the Czech Republic, Denmark, Estonia, Hungary, Ireland, Italy, the Netherlands, Poland, Slovak Republic, Slovenia, the UK
(2) Australia, Canada, Ireland, New Zealand, Italy | -
(4) Austria, Germany, Luxembourg, Switzerland | (1) Austria, Belgium, Canada, France, Germany, Japan, Luxembourg, New Zealand
(3) Slovenia | -
- | (4) Greece (in 2001), Israel, Turkey
(5) The USA | (5) The USA
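
The comparison in Table 2 can also be made mechanically. The following is a minimal sketch, not a method used in either study: it transcribes the type memberships listed in Table 2 and cross-tabulates the countries that appear in both typologies. The names BOEHM_2013, WENDT_2014, invert, and cross_tabulate are hypothetical, and the country labels are normalized for matching (for example, "Slovakia" for "Slovak Republic", "UK" for "the UK"), which is an assumption of this sketch.

```python
from collections import defaultdict

# Type memberships transcribed from Table 2 (type numbers as used in Table 1).
# Country labels are shortened and normalized so the two lists can be matched.
BOEHM_2013 = {
    1: ["Denmark", "Finland", "Iceland", "Norway", "Sweden", "Portugal", "Spain", "UK"],
    2: ["Australia", "Canada", "Ireland", "New Zealand", "Italy"],
    3: ["Slovenia"],
    4: ["Austria", "Germany", "Luxembourg", "Switzerland"],
    5: ["USA"],
    6: ["Belgium", "Estonia", "France", "Czech Republic", "Hungary", "Netherlands",
        "Poland", "Slovakia", "Israel", "Japan", "Korea"],
}
WENDT_2014 = {
    1: ["Austria", "Belgium", "Canada", "France", "Germany", "Japan",
        "Luxembourg", "New Zealand"],
    2: ["Australia", "Czech Republic", "Denmark", "Estonia", "Hungary", "Ireland",
        "Italy", "Netherlands", "Poland", "Slovakia", "Slovenia", "UK"],
    3: ["Finland", "Iceland", "Portugal", "Spain", "Sweden"],
    4: ["Greece", "Israel", "Turkey"],
    5: ["USA"],
}


def invert(typology):
    """Turn {type: [countries]} into {country: type}."""
    return {country: t for t, countries in typology.items() for country in countries}


def cross_tabulate(first, second):
    """Group countries present in both typologies by their pair of type numbers."""
    first_by_country, second_by_country = invert(first), invert(second)
    table = defaultdict(list)
    for country in sorted(first_by_country.keys() & second_by_country.keys()):
        table[(first_by_country[country], second_by_country[country])].append(country)
    return dict(table)


crosstab = cross_tabulate(BOEHM_2013, WENDT_2014)
for (boehm_type, wendt_type), countries in sorted(crosstab.items()):
    print(f"Böhm ({boehm_type}) / Wendt ({wendt_type}): {', '.join(countries)}")

# Countries classified by only one of the two typologies.
only_one = sorted(invert(BOEHM_2013).keys() ^ invert(WENDT_2014).keys())
print("Classified in one typology only:", ", ".join(only_one))
```

Each printed line is one cell of the cross-tabulation. Cells that mix several governance types under a single Wendt (2014) type illustrate the point made in the text: similar governance does not imply similar levels of financing, provision, and access.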
This overview of health system typologies suggests that the way healthcare systems are governed does not directly dictate the way they function. Even if very similar actors are involved in the regulation, financing, and provision of healthcare, the results can be very different levels of financing, healthcare provision, and access regulation among individual countries. It is therefore essential to construct health system typologies for both areas of research. It depends on the specific research question at hand what the more useful typological category is. Classifications capturing the role of actors and modes of governance are better suited to analyze reform options, cost containment, and physical and human resource strategies in different health system types, whereas classifications capturing how healthcare systems actually work are better suited for assessing health systems and their influence on health, inequalities in health, and utilization of healthcare services.

The triangular model of health systems is of importance for health system typologies not only with respect to the main players and their interactions in the three health markets (the health insurance market, the healthcare purchasing market, and the healthcare provision market) but also with respect to the way patients can use the healthcare system, which is related to factors such as the resources spent on healthcare, cost-sharing arrangements, the level of healthcare services actually provided, and how patients can use these healthcare services.

Health system typologies also have limitations that are in part related to their strength of simplification. The identification of health system types always depends on the indicators chosen, and therefore the selection of indicators and their theoretical justification is key to healthcare system typologies. Furthermore, the correct definition of indicators is not always an easy task. For instance, does health insurance offered by private organizations in the Netherlands, which are highly regulated, count as private or as social health insurance? Also, so far typologies have used national averages that conceal regional differences. Due to the trend of decentralization, future typologies may have to take geographic inequalities into account (Reibling 2010). More generally, according to Freeman and Frisina (2010) and Burau et al. (2015), a trade-off between simplification and accuracy is inherent to typologies.

References

Bambra C. Cash versus services: ‘worlds of welfare’ and the decommodification of cash benefits and health care services. J Soc Policy. 2005;34(2):195–213.
Beckfield J, Olafsdottir S, Sosnaud V. Healthcare systems in comparative perspective: classification, convergence, institutions, inequalities, and five missed turns. Annu Rev Sociol. 2013;39:127–46.
Böhm K, Schmid A, Götze R, Landwehr C, Rothgang H. Five types of OECD healthcare systems: empirical results of a deductive classification. Health Policy. 2013;113(3):258–69.
Burau V, Blank RH. Comparing health policy: an assessment of typologies of health systems. J Comp Policy Anal. 2006;8(1):63–76.
Burau V, Blank RH, Pavolini E. Typologies of healthcare systems and policies. In: Kuhlmann E, Blank RH, Bourgeault IL, Wendt C, editors. The Palgrave international handbook of healthcare policy and governance. Basingstoke: Palgrave Macmillan; 2015. p. 101–15.
Conley D, Springer KW. Welfare state and infant mortality. Am J Sociol. 2001;107(3):768–807.
Eikemo TA, Bambra C, Judge K, Ringdal K. Welfare state regimes and differences in self-perceived health in Europe: a multilevel analysis. Soc Sci Med. 2008;66:2281–95.
Esping-Andersen G. The three worlds of welfare capitalism. Cambridge: Polity Press; 1990.
Freeman R, Frisina L. Health care systems and the problem of classification. J Comp Policy Anal Res Pract. 2010;12(1):163–78.
Frenk J, Donabedian A. State intervention in medical care: types, trends and variables. Health Policy Plan. 1987;2(1):17–31.
Gelissen J. Worlds of welfare, worlds of consent? Public opinion on the welfare state. Leiden: Brill; 2002.
Hassenteufel P, Palier B. Towards neo-Bismarckian health care states? Comparing health insurance reforms in Bismarckian welfare systems. Soc Policy Adm. 2007;41(6):574–96.
Marmor T, Wendt C. Conceptual frameworks for comparing healthcare politics and policy. Health Policy. 2012;107(1):11–20.
Moran M. Governing the health care state. A comparative study of the United Kingdom, the United States and Germany. Manchester: Manchester University Press; 1999.
OECD. Financing and delivery of health care. A comparative analysis of OECD countries. Paris: OECD; 1987.
Reibling N. Healthcare systems in Europe: towards an incorporation of patient access. J Eur Soc Policy. 2010;20(1):5–18.
Weber M. The methodology of the social sciences. New York: The Free Press; 1949.
Wendt C. Mapping European healthcare systems. A comparative analysis of financing, service provision, and access to healthcare. J Eur Soc Policy. 2009;19(5):432–45.
Wendt C. Changing healthcare system types. Soc Policy Adm. 2014;48(7):864–88.
Wendt C, Frisina L, Rothgang H. Health care system types. A conceptual framework for comparison. Soc Policy Adm. 2009;43(1):70–90.

Further Reading

Freeman R. The politics of health in Europe. Manchester: Manchester University Press; 2000.
Freeman R, Moran M. Reforming health care in Europe. West Eur Polit. 2000;23(2):35–59.
Gauld R. The new health policy. Maidenhead: Open University Press; 2009.
Giaimo S, Manow P. Adapting the welfare state – the case of health care reform in Britain, Germany, and the United States. Comp Pol Stud. 1999;32(8):967–1000.
Immergut EM. Health politics: interests and institutions in Western Europe. Cambridge: Cambridge University Press; 1992.
Marmor T, Wendt C, editors. Reforming healthcare systems. Two Volumes. Cheltenham/Northampton: Edward Elgar Publishing; 2011.
Montanari I, Nelson K. Social service decline and convergence: how does healthcare fare? J Eur Soc Policy. 2012;23(1):102–16.
Moran M. Understanding the welfare state: the case of health care. Br J Polit Int Relat. 2000;2(2):135–60.
Rothgang H, Cacace M, Frisina L, Grimmeisen S, Schmid A, Wendt C. The state and healthcare. Comparing OECD countries. Basingstoke: Palgrave Macmillan; 2010.
Smith P, Anell A, Busse R, Crivelli L, Healy J, Lindahl AK, et al. Leadership and governance in seven developed health systems. Health Policy. 2012;106:37–49.
Tuohy C. Accidental logics: the dynamics of change in the health care arena in the United States, Britain, and Canada. New York: Oxford University Press; 1999.

42 Organization and Governance: Stewardship and Governance in Health Systems
Scott L. Greer
Contents
Introduction
Definitions: Into the Mire
Comparing and Measuring Governance
Good Enough, or Better, Governance
Attributes of Governance
Transparency
Accountability
Participation
Integrity
Policy Capacity
A Diagnostic Approach
Conclusion
References

S. L. Greer
Department of Health Management and Policy, University of Michigan, Ann Arbor, MI, USA
e-mail: slgreer@umich.edu
https://doi.org/10.1007/978-1-4939-8715-3_22

Abstract

Governance, how decisions are made and implemented, is an important part of health care and health policy. It is also the subject of a large and often confusing literature. This chapter presents the results of a review of the governance literature for health. First, it notes that not all problems are of governance. Second, it introduces five domains of governance in which governance problems, challenges, and policies are located: Transparency, Accountability, Participation, Integrity and Capacity. Together they make the TAPIC framework and can be used to identify governance dimensions of policy problems. Third, better governance through the TAPIC model can also reduce the likelihood of other problems.

Introduction

Stewardship and governance, like “resilience” or “strategic,” are “power words” (Frederickson 2005). They sound desirable, are difficult to argue with, and give an automatic advantage in most arguments to the people who invoke them.
As a result, both have been stretched by academics, governments, international organizations, consultants, and other ideological entrepreneurs who want the power that comes with their invocation.

This chapter will first separate out stewardship and governance, providing key definitions and making the point that while they might be in the hands of political rivals, they are not intellectually rivalrous concepts. It then presents the results of our review of concepts, presenting the five attributes of governance (which are also among many desirable objectives of stewardship) that emerged as mutually exclusive and able to cover the many activities and ideas classified as “governance.”

Definitions: Into the Mire

Governance has several kinds of meaning. On one hand, it has spread across multiple fields that use it in different ways to discuss topics as different as the proper constitution of a company board and the nature of public management in the Internet age. On the other hand, it is used for a variety of normative, empirical, and mixed projects.

While there have been sporadic uses of the word for many years, it became a common modern concept first in the discussion of management, specifically corporate governance, the organization of power within commercial firms. In the 1980s, it started to pick up a second usage; it was used in political economy research to discuss arrangements in which organizations such as unions, professions, and government collectively coordinated activity (e.g., Campbell and Lindberg 1991). In the aftermath of the Cold War, more academics became interested in it as a descriptive term for systems that produced collective decisions without having clear centers of hierarchical power (as distinct, in some once-fashionable formulations, from “government”). In this capacity, the term drew on and partially displaced perfectly good older terms such as “networks.” In the hands of these scholars, governance came to mean almost anything that generated order without hierarchy; its meanings in transaction cost economics (Williamson 1996), European studies (Marks et al. 1996), international relations (Rosenau and Czempiel 1992), and public management (Rhodes 1997), for example, differed greatly.

International organizations became particularly interested as part of the backlash against structural adjustment lending and, in particular, their role in the Asian financial crisis and its aftermath. Fifteen years of increasingly invasive policy conditionality in the service of structural adjustment failed to produce the desired effects in the structurally adjusted countries (Greer 2013; Woods 2006). They turned to good governance as a solution (e.g., World Bank 1992, 1994). The essential logic was simple enough: reforms, especially those imposed through conditional loans, frequently had serious noncompliance problems, faced serious implementation problems, and had the wrong effects. The response was to blame these problems on the governance – the organization, probity, competence, and coordination – of the countries involved and try to improve that as a part of development or financial rescue (Nunnenkamp 1995).

In 2013, all three preoccupations are alive and well: we have governance as a field of management, including corporate governance and clinical governance in health (Walshe and Smith 2011), governance as a sprawling and contested term applied in endless different ways by social scientists in analyzing the world (Kjaer 2004; Bevir 2013), and governance as a normative concept used when policymakers speak about improving, essentially, international public management (Fukuyama 2013).

In each of these incarnations, governance-speak has two essential uses. One is empirical: the description and analysis of what is. One is normative: calls for how it ought to be.

Empirically, governance in almost any account is some form of authoritative coordination, which means decisionmaking and implementation. Such analyses tend to try to capture the mechanisms by which authoritative decisions are made, analyzing the powers, responsibilities, and coordination of professions, insurers, providers, governments at different levels, and the other actors who make and implement decisions in health systems.

Normatively, governance can be termed good, or better or worse, and the parallel normative,
policy-oriented literature seeks to improve it by promoting, essentially, various forms of “good governance.” In general, this normative literature is focused on policy interventions and institutional changes. The real solution to corruption, social science makes quite clear, is reducing inequality in society by expanding social rights and economic redistribution (Uslaner 2008; Rothstein 2011). That seems to be beyond the scope of most governance advice, which focuses on the level of individuals (hiring the right people) and organizations, and perhaps legal frameworks (Sabet 2012, 21 for the distinction). Many accounts, of course, mix normative and empirical in more or less coherent, articulated, and useful ways.

Stewardship, by contrast to governance, is a word with a more limited history in health policy. While the word is as old as the concept of a steward – a person entrusted with looking after something – its grand entrance into the global health policy vocabulary came in the 2000 World Health Report (World Health Organization 2000) (WHR), which defined it as one of four key functions of health systems alongside resource generation, financing, and service delivery. The WHR defines stewardship as “the careful and responsible management of the well-being of the population,” and “... the very essence of good government” (Travis et al. 2003 for a lucid discussion in the WHO context).

Separating governance and stewardship is conceptually easier than it might look. Firstly, governance is a structure or pattern, whereas stewardship is an activity. As a result, pursuing an item such as capacity or development or transparency from a long list of policies can be seen as good stewardship or establishment of better governance. A person occupying a position in a system of governance can be a better or worse steward. Secondly, stewardship is almost always normative in health policy discussions. Governance in the sense of authoritative coordination exists in almost any functional society (by definition), even if it is not good. Stewardship, by incorporating care, responsibility, good government, and the well-being of the population, makes itself a normative rather than empirical concept. Thirdly, stewardship was a concept largely confined to global health policy discussions, while governance is, for better or for worse, discussed in many fields of human activity.

Comparing and Measuring Governance

Measuring the quality of governance has been a preoccupation of scholars and international organizations for some years now, and the result has been a variety of initiatives that attempt to define governance in quantitative, comparable terms. Given that the latest initiatives are the most ambitious yet, the next years should be fertile ones for the quantitative, comparative study of governance.

The largest project is based at the University of Gothenburg. The “Quality of Government” project, as it is known, aggregates a wide variety of databases (its key findings are in Rothstein 2011). A variety of other projects, including the Varieties of Democracy project based at the University of Notre Dame (Coppedge et al. 2012), try to enhance our comparative understanding and measurement of a wide spectrum of governance indicators (Fukuyama 2013 for a review). These databases, which face the data and coding problems of all large-scale international quantitative comparative research efforts, are mostly focused on general regime types and put less focus on the actual management of health systems.

The measurement of health systems governance is somewhat less developed, since it is not unintelligent to focus instead on actual health outcomes (an imperfect enough set of outcomes) (Smith et al. 2008). The comparison of health systems and their governance is, by contrast, rather more developed. The European Observatory on Health Systems and Policies has significantly advanced comparative health systems research by producing books, written to templates, on the health systems of every country in the WHO European region and a variety of others. Its Health Systems and Policy Monitor is a regularly updated source of information on health policies, from which much can be learned about governance.

942 S. L. Greer
Good Enough, or Better, Governance

Two words, three broad traditions of their use, a plethora of international comparative enterprises, and both normative and empirical applications: this is a dispiriting starting point for a discussion of how the vocabulary of governance and stewardship may be used to understand or improve health systems.

The first problem to address is the confusion created by political analysts of many stripes, ranging from entrepreneurial consultants to entrepreneurial academics, who sought to distinguish governance as a type of organization from government. This approach defined governance in terms of self-organization, networks, and a blend of public, nongovernmental, and private actors, rather than “government,” which connoted hierarchy, legalism, and inflexibility. The essential distinction was spurious and misleading; networks were hardly new forms of political organization, in the West or anywhere else, and the hierarchical authority of states and other big organizations such as corporations remained very powerful and effective (Bevir 2013). Here, following on current usage and the international institutions, governance is a description of overall decisionmaking and implementation rather than an ideal-type rendering of a particular form of public administration.

The next problem is with the concept of “good governance.” If governance can be better or worse, then it seems reasonable to seek to identify and generalize practices of good governance, whether it is corporate governance activists trying to generalize good recruitment practices for boards or international financial institutions trying to generalize good governance for the recipients of their funds. Two difficulties arise. The first is revealed by the syllogism: if governance is how decisions are made and implemented, then good governance is good decisionmaking and implementation throughout a whole society. The likelihood that the same things, defined with any level of specificity, will constitute good governance in every society on earth seems limited (Andrews 2013). Excessive concreteness is a besetting problem in advice about good governance.

The third problem is that governance, being a power word (Frederickson 2005) whose invocation strengthens all sorts of arguments and claims, therefore has had a wide range of attributes added to it. These are often self-contradictory or hard to derive from either data or first principles. For example, some international organizations view “conflict prevention” as an important aspect of good governance, and others do not (Barbazza and Tello 2014). Does this mean that the WHO regards conflict as part of good governance? Obviously not. Rather, what it shows is that lists of attributes of good governance have a tendency to be arbitrary and utopian. Defining the aspects of good governance is tantamount to defining the good society, and that is questionable on matters of taste and practicality.

Notably, few if any systems show all the attributes that have been assigned to “good governance,” and many highly functional systems have aspects of poor governance: opacity, corruption, nepotism, clientelism, and other problems occur in many places. Few if any countries vaulted into high-income brackets while enjoying good governance as many define it today (Greer and Jarman 2011; Brewer et al. 1999), and a few practices we associate with bad governance have proved flexible and effective. For example, clientelism can mean disruption and bad administration by political jobbers but also allows reformers to put technically skilled people into important posts (Grindle 2012).

The problem, therefore, is the one noted by Tolstoy: all happy families are the same, but all unhappy families are different. So many things have to go right to produce a happy family that the variation within the category of happy families is limited. Unhappy families have many more degrees of freedom. And it is in the realm of unhappy families that policy scholars and policymakers must operate.

The solution lies in the simple concept of “Good enough governance.” Good enough governance is a concept formulated by Merilee Grindle, who pointed out that many lists of governance attributes have an arbitrary and utopian character (Grindle 2004, 2007; Thomas 2015).
Drawing from this, a more intellectually and practically satisfying approach to governance is to view governance not as a desirable end state but rather as an activity that can be carried out in different ways with different effects. This diagnostic approach views governance as a phenomenon that exists in essentially all societies and sometimes causes a problem for something else. Governance problems can be diagnosed as a reason for policy failures, and strengthening one aspect or another of governance can remedy policy failures. Likewise, some policies are just not sustainable in some systems; governance that is good enough for maintaining basic public health functions might not be good enough to operate sophisticated quasi-markets for health care.

In other words, rather than insistently defining good governance, it makes more sense to identify aspects of governance that improve the ability of health systems to achieve a sustainable balance of equity, access, and cost containment. So, then, what are aspects of governance that influence the ability of health systems to achieve their goals, and which can in some cases be improved? Or, on the other side of the coin, what is a governance problem (as distinct from some other kind of problem), and what is a detailed taxonomy of governance problems that might need understanding or remedy?

Attributes of Governance

The first question in using governance analysis to improve policies and systems is whether the challenge, or problem, or opportunity is one of governance. There are other reasons programs fail. They can be fundamentally bad ideas (though high-capacity, participative, transparent governance might reduce the odds of bad ideas being adopted). They can be underfunded. They can also lack political support.

By a process of elimination, a workable, funded, and supported policy that fails suggests a governance issue. More positively, do problems appear to lie in the decisionmaking and implementation systems of society? If so, that means the problems lie in governance.

More specifically, our review found five key aspects of governance that matter and in many cases can be strengthened. They are not a list of attributes to which every society should aspire; they are, rather, five aspects of health systems that influence the success or failure of policies. One of the remarkable aspects of the governance literature is that, beneath a level of apparent conceptual confusion, the same words and concepts constantly recur. In other words, despite many different terms and many different lists with different inclusions and exclusions, and many different conceptual hierarchies, the same five issues recur. We sorted them into groups with minimal overlap that scholars or policymakers interested in governance should consider (Greer et al. 2016; Greer et al. 2017). The result is the TAPIC framework, named for its domains of transparency, accountability, participation, integrity, and capacity. Any of the five might be the first or most important issue, and all can exist relatively independently of each other (accountability without transparency, for example, is the norm in both medical care and automobile repair). The literature review and analysis is presented in Greer et al. (2016). Case studies exploring and showing the uses of the TAPIC framework can be found in that book, and in (Jarman 2017; Wolfe et al. 2017; Exworthy et al. 2017; Trump 2017; Vasev 2017; Willison 2017; Greer et al. 2017).

Transparency

Transparency involves two things: making decisions clear and making clear the grounds on which decisions were made (Woods 1999). At a minimum, this means the kind of basic publicity long familiar in functional governments – official notifications, open meetings, and latterly informative websites that make policies and policy processes understandable.

There are a variety of problems with such a simple form of transparency, however; for a start, as every consumer knows, “fine print” can look transparent and effectively hide companies’ actions. Transparency can be taken too far; decisionmaking necessarily involves both deals
and ambiguity, and problems arise if transparency displaces real decisionmaking into shadows or becomes a weapon for those who want to replace argument and prioritization with something more mechanistic (Best 2005). It also has the problem that policy information can be intricate, and efforts to simplify it can also distort it (as frequently happens with both politics and website redesigns). The result is that simple notification should probably be flanked by devices that permit informed access to the policy process so that informed journalists, NGOs, citizens, and experts can contest decisions and their grounds. These mechanisms can include inspectorates, ombuds procedures, public data releases, and freedom of information laws.

Effective transparency should improve policy by enhancing accountability and participation, deterring or quickly identifying corruption and incompetence, and making policies more predictable. The result, in theory, will be trust that an organization will not be erratic and will be under constant pressure to be competent.

Accountability

Accountability is a relationship in which an actor (such as a government agency) must account for its actions to a forum (such as a legislature) which can sanction it. In other words, it has three key attributes: actions, reporting, and sanction. A good accountability relationship means that the interests of the forum (legislature, population) are always in the mind of the actor, but the actor has autonomy to formulate superior solutions. It can also allow productive innovation; holding somebody accountable for outcomes within limits rather than process can produce learning and better policy outcomes in general (Sabel 2001; Behn 2001).

Mechanisms that policymakers use to achieve accountability are diverse, including contracts; reporting requirements; financial mechanisms such as pay for performance; laws that specify objectives, reporting, and mechanism; competitive bidding; organizational separation such as purchaser/provider splits; conflict of interest policies; ombuds processes; legislative oversight and committees of oversight; and regulation, including the establishment of dedicated regulatory agencies. Each of these focuses on increasing the extent of reporting and the ability of the forum to sanction the actor.

Accountability is not the same thing as a principal-agent relationship, which is a favored form of economic modeling. In a public sector principal-agent relationship, a principal chooses an agent to carry out its wishes (Smith et al. 1997; Besley and Coate 2003). Governance, in this analysis, is better insofar as it shortens and clarifies principal-agent relationships. There are two key problems with this style of analysis. The first is that frequently the relationship is hard to characterize in that way – it might actually be a fiduciary model rather than an agency relationship. The second is that it is essentially normative rather than political; it assumes that there should be a clear principal, agent, and instructions. A quick reflection on, for example, the many missions of a hospital shows the empirical limits (Marmor 2001).

Participation

Participation means that affected parties have access to decisionmaking and power so that they acquire a meaningful stake in the work of an institution (Woods 1999). Participation has many normatively desirable aspects – it is the basis of democracy, after all – but there is also a pragmatic case for participation of affected parties in decisions that spans political regimes. That is simple: participation helps to reduce or avoid the problems that emerge when key affected groups resist a policy or when a policy is made without knowing what they know. For example, complex medical payment incentive systems do not work as intended if they are made without understanding how doctors work and are paid (a common problem in “pay for performance” schemes). In the worst case, it makes it clear what depth of opposition a policy will face once enacted.

There are a variety of well-established participation mechanisms, as well as a very large and notably confused literature on public participation
in health that rarely explains the point of participation (for a critical discussion, see Stewart 2013) and some experiments in novel forms of public participation, such as participatory budgeting, whose popularity outside their places of origin is clearer than their effectiveness (Seekings 2013). Established mechanisms of participation include stakeholder forums, public consultations, elections, appointed community representatives on boards, and legal remedies (e.g., legislation that allows aggrieved outsiders to litigate processes). They can also include research, e.g., surveys of local opinion about a given option. When affected bodies are other governments or organizations, advisory committees, partnerships, joint budgets, and special forums for consultation are effective mechanisms for ensuring that different governments will be aware of decisions and make their views clear.

The benefit of participation is the potential creation of “ownership,” i.e., a sense among affected parties that they have a stake in the success of an initiative. Without ownership, there is a real risk of sabotage, lassitude, or simple ignorance, all of which amount to implementation failure. There is also the potential benefit of increased legitimacy – the sense that decisions are taken in ways that reflect the relevant interests.

Integrity

Integrity is one of many words for the key attributes of a well-run modern bureaucracy: processes of representation, decisionmaking, and enforcement should be clearly specified; all members should be able to understand and predict the processes by which an institution will take decisions and apply them; and individuals should have clear roles and responsibilities. In other words, an organization with a high level of integrity is meritocratic, separates the person and the office, and is not corrupt. These are the bases for well-functioning, long-lasting, trustworthy organizations.

Mechanisms policymakers can use to promote or entrench organizational integrity include internal audit (so that money moves as intended and can be traced), clear personnel policies (regular hiring, job descriptions, and procedures to weed out flawed people), a clear mandate for each organization, a clear and reliable budgeting process, administrative procedures such as document management and minuted meetings, external audit (to put a check on people within the organization), and a clear sense of organizational roles and purposes. Many of these policies, if added together, are bureaucracy – for better or for worse. The challenge of public management is to gain the benefits of bureaucracy in terms of merit, impartiality, and efficiency without risking too much wasted effort or incompetence.

Policy Capacity

Finally, most accounts of effective health governance include a discussion of policy capacity: the ability to develop policy that is aligned with resources in pursuit of societal goals. Policy capacity is a property of what Edward Page calls the “policy bureaucracy,” that part of an organization, especially a government, whose purpose is to produce policy (Page and Jenkins 2005). Just as a health policy initiative can run into trouble for a lack of medical staff, it can run into trouble for a lack of policy staff who are capable of identifying, synthesizing, and analyzing a wide variety of information in order to spot problems, make the case against ill-considered policies, and work through the procedural and practical challenges of implementation. It can look good to reduce policy capacity – civil servants at the heart of the state do not always have public sympathy – but it can have negative consequences in the form of poorly thought-out policies.

The development and improvement of policy capacity is a central preoccupation of public management scholarship, and the list of tools for doing it is long. It includes mechanisms to produce intelligence on developments in the system and its performance, so that policymakers can identify and react to problems, and intelligence on process such as budgetary and legal issues (all too often neglected in health policy analysis), research and analysis capacity (trained staff who can conduct or commission research and deal with literature and
outside experts), staff training (e.g., so that a doctor hired into a health ministry can learn about budgeting and law), strong hiring procedures that balance merit and responsiveness in the central policy bureaucracy, procedures to incorporate experts with their different career structures and incentives, and, all too often forgotten, extensive capacity for purchasing and managing relationships with outsiders such as regulated industries or government contractors. This long list suggests something important: while policy bureaucracies are routinely dwarfed by the systems they manage, they go beyond the minister’s immediate office. Civil servants further from the minister, and from the glamor of politics, fulfill an important role and can respond to investment and organizational development.

A Diagnostic Approach

In the scholarly and grey literature, almost everything framed as a component of good governance or as an attribute of governance in general can be fitted into these five categories. If we use them as a diagnostic tool (before or after a problem arises), then we can first see if a policy failure, or risk, depends on decisionmaking and implementation and then work out what kind of governance issue exists and might be remedied – if, for example, the problem is of sabotage and poor implementation by excluded interested parties, then greater transparency and participation might be called for. It is less productive to elevate them, or any other framework, into good governance, for the simple reasons that there are tensions between them, all of them can be taken to extremes (e.g., transparency can make productive dealmaking impossible), and not all of them will mean the same thing or have the same salience in every system (e.g., integrity is much less of an issue in Northern Europe than in most of the rest of the world). We can, however, try to use the TAPIC framework for diagnoses not just of specific policy problems but of policymaking problems. This should in turn reduce the likelihood of unworkable policies being adopted, or workable policies adopted without adequate finance.

Conclusion

Governance and stewardship might seem like hopelessly fuzzy concepts, but the exercise of grouping the many things said about them reveals five relatively coherent attributes of a health system that are the object of policies for improvement and that can have an effect on the ultimate cost, quality, and access of health.

References

Andrews M. The limits of institutional reform in development. Cambridge: Cambridge University Press; 2013.
Barbazza E, Tello JE. A review of health governance: definitions, dimensions and tools to govern. Health Policy. 2014;116(1):1–11.
Behn RD. Rethinking democratic accountability. Washington, DC: Brookings; 2001.
Besley T, Coate S. Centralized versus decentralized provision of local public goods: a political economy approach. J Public Econ. 2003;87(12):2611–37.
Best J. The limits of transparency: ambiguity and the history of international finance. Ithaca: Cornell University Press; 2005.
Bevir M. A theory of governance. In: A theory of governance. Berkeley: University of California Press; 2013.
Brewer J, Hellmuth E, Brewer J. Rethinking Leviathan: the eighteenth-century state in Britain and Germany. Oxford: Oxford University Press; 1999.
Campbell JL, Lindberg LN. The evolution of governance regimes. In: Campbell JL, Rogers Hollingsworth J, Lindberg LN, editors. Governance of the American economy. Cambridge: Cambridge University Press; 1991.
Coppedge M, Gerring J, Lindberg S. V-Dem: varieties of democracy project description. In: Varieties of democracy project description. South Bend: University of Notre Dame Kellogg Institute; 2012.
Exworthy M, Powell M, Glasby J. The governance of integrated health and social care in England since 2010: great expectations not met once again? Health Policy. 2017;121(11):1124–1130.
Frederickson HG. Whatever happened to public administration? Governance, governance everywhere. In: Ferlie E, Lynn Jr LE, Pollitt C, editors. Oxford handbook of public management. New York: Oxford University Press; 2005.
Fukuyama F. What is governance? Governance. 2013;26(3):347–368.
Greer SL. Structural adjustment comes to Europe: lessons for the eurozone from the conditionality debates. Global Soc Policy. 2013;14:51.
Greer SL, Jarman H. The British civil service system. In: van der Meer FM, editor. Civil service systems in Western Europe. Cheltenham: Edward Elgar; 2011.
Greer SL, Wismar M, Figueras J, editors. Strengthening health system governance: better policies, stronger performance. Brussels/Philadelphia: European Observatory on Health Systems and Policies/Open University Press; 2016.
Greer SL, Vasev N, Wismar M. Fences and ambulances: Intersectoral governance for health. Health Policy. 2017;121(11):1101–1104.
Grindle MS. Good enough governance: poverty reduction and reform in developing countries. Governance. 2004;17(4):525–48.
Grindle MS. Good enough governance revisited. Dev Policy Rev. 2007;25(5):533–74.
Grindle MS. Jobs for the boys: patronage and the state in comparative perspective. Cambridge, MA: Harvard University Press; 2012.
Jarman H. Trade Policy Governance: What Health Policymakers and Advocates Need to Know. Health Policy. 2017;121(11):1105–1112.
Kjaer AM. Governance. New York; 2004.
Marks G, Hooghe L, Blank K. European integration from the 1980s: state-centric v. multi-level governance. J Common Mark Stud. 1996;34(3):341–78.
Marmor T. Fads in medical care policy and politics: the rhetoric and reality of managerialism. London: The Nuffield Trust; 2001. [Rock Carling Fellowship Lecture 2001].
Nunnenkamp P. What donors mean by good governance: heroic ends, limited means, and traditional dilemmas of development cooperation. IDS Bull. 1995;26(2):9–16.
Page EC, Jenkins B. Policy bureaucracy: government with a cast of thousands. Oxford: Oxford University Press; 2005.
Rhodes RAW. Understanding governance: policy networks, governance, reflexivity and accountability. Philadelphia: Open University Press; 1997.
Rosenau JN, Czempiel E-O. Governance without government: order and change in world politics. Cambridge: Cambridge University Press; 1992.
Rothstein B. The quality of government: corruption, social trust and inequality in international perspective. Chicago: University of Chicago Press; 2011.
Sabel C. A quiet revolution of democratic governance: towards democratic experimentalism. In: OECD, editor. Governance in the 21st century. Paris: OECD; 2001.
Sabet DM. Police reform in Mexico: informal politics and the challenge of institutional change. Stanford: Stanford University Press; 2012.
Seekings J. Is the south ‘Brazilian’? The public realm in urban Brazil through a comparative lens. Policy Polit. 2013;41(3):351–70.
Smith PC, Stepan A, Valdmanis V, Verheyen P. Principal-agent problems in health care systems: an international perspective. Health Policy. 1997;41(1):37–60.
Smith PC, Mossialos E, Papanicolas I. Performance measurement for health systems improvement: experiences, challenges and prospects. Copenhagen: WHO Regional Office for Europe; 2008.
Stewart E. What is the point of citizen participation in health care? J Health Serv Res Policy. 2013;18(2):124–6.
Thomas MA. Govern Like Us: U.S. Expectations of Poor Countries. Columbia University Press; 2015.
Travis P, Egger D, Davies P, Mechbal A. Towards better stewardship: concepts and critical issues. In: Global programme on evidence for health policy discussion papers. 2002. www.who.int/healthinfo/paper48.pdf
Travis P, Egger D, Davies P, Mechbal A. Towards better stewardship: concepts and critical issues. In: Murray CJ, Evans DB, editors. Health systems performance assessment: methods, debate and empiricism. Geneva: World Health Organization; 2003.
Trump BD. Synthetic biology regulation and governance: Lessons from TAPIC for the United States, European Union, and Singapore. Health Policy. 2017;121(11):1139–1146.
Uslaner EM. Corruption, inequality, and the rule of law. Cambridge: Cambridge University Press; 2008.
Vasev N. Governing energy while neglecting health - The case of Poland. Health Policy. 2017;121(11):1147–1153.
Walshe K, Smith J. Leadership and governance. In: Healthcare management. 2nd ed. Maidenhead: Open University Press; 2011.
Williamson OE. The mechanisms of governance. New York: Oxford University Press; 1996.
Willison C. Shelter from the Storm: Roles, responsibilities, and challenges in United States housing policy governance. Health Policy. 2017;121(11):1113–1123.
Wolfe I, Mandeville K, Harrison K, Lingam R. Child survival in England: strengthening governance for health. Health Policy. 2017;121(11):1131–1138.
Woods N. Good governance in international organizations. Glob Gov. 1999;5:39–61.
Woods N. The globalizers: the IMF, the World Bank, and their borrowers. Ithaca: Cornell University Press; 2006.
World Bank. Governance and development. Washington, DC: World Bank; 1992.
World Bank. Governance: the World Bank’s experience. Washington, DC: World Bank; 1994.
World Health Organization. The world health report 2000: health systems: improving performance. Geneva: WHO; 2000.
Provision of Health Services:
Long-Term Care 43
Vincent Mor and Anna Maresso
Contents
Introduction
Who Uses Long-Term Care Services and Supports?
Background to Long-Term Service and Support “Systems”
Structure of Chapter
Financing of Long-Term Care
Expenditure on Long-Term Care
Coverage
Paying for Long-Term Care Services
Structure of the Delivery System
The Long-Term Care Services and Supports Continuum
Long-Term Care Bed Capacity
Community-Based Service Capacity
Informal Care Provision and Cash Payments for Dependent Care Allowances
Regulating Quality
Different Regulatory Approaches to Quality Assurance
The Regulatory Reach of Quality Monitoring
Challenges Facing Quality Monitoring
Summary and Conclusions
References
V. Mor (*)
Department of Health Services, Policy and Practice, Brown University School of Public Health, Providence, RI, USA
Providence Veterans Administration Medical Center, Center on Innovation, Providence, RI, USA
e-mail: vincent_mor@brown.edu

A. Maresso
European Observatory on Health Systems and Policies, London School of Economics and Political Science, London, UK
e-mail: a.maresso@lse.ac.uk

https://doi.org/10.1007/978-1-4939-8715-3_24

Abstract
This chapter examines the financing, organization and regulation of long-term care in OECD countries. Historically, long-term care services and supports constitute a blending of social welfare benefits and health care provision. Depending on the complexity and severity of care recipients’ needs, delivery is characterized by both specialized nursing and medical care and personal and home-help services such as assistance with meals, grooming and
household chores. The delivery of long-term care is accomplished via institutional (residential) care, formal home care services, as well as through informal care provided by family members or hired care givers. In line with the preferences of older people to remain in their own homes, the past decade has seen a substantial shift in most OECD countries towards more home- and community-based care. This trend has regulatory and cost implications for monitoring the quality of care, which in the past has focused predominantly on institutions. Moreover, increased demand for formal services, in both residential and home care settings, due to ageing population pressures, also has implications for the long-term care workforce, with shortages anticipated over the next 20–40 years.

While funding of long-term care services comes mainly from public sources, there are very large variations between OECD countries in the resources dedicated to this sector. Eligibility for coverage also varies between countries, ranging from universal systems – based solely on need and not on income – to long-term care systems that apply means testing and safety-net principles to determine who qualifies for publicly-provided long-term care services and benefits. However, irrespective of financing model, all countries use some form of needs assessment to judge an applicant’s level of functional impairment and care needs. Financial support is provided

discussion of key challenges in quality monitoring and its role in enhancing user choice and stimulating improvements in providers’ performance.

Introduction

The network of long-term care services and supports that provide assistance, both financial and personal, to frail and disabled individuals in society is not really a system. While many who study long-term care, and who document and compare the policies and practices that characterize a country’s service structure, talk about the “long-term care system,” in most countries it is best to view long-term services and supports as an amalgam of laws, policies, rules, practices, and service providers that emerged over the decades as a response to social and demographic changes in developed and developing societies. Unlike medical care or even public health structures, historically long-term care services and supports constitute a blending of social welfare supports and health care provision. The supports required by frail older, or seriously disabled, individuals include enhanced finances made necessary to buy the help needed to sustain daily life or the services from appropriate agencies to provide that help. Since the principal cause of frailty and/or disability is compromised health, it is almost always the case that more and more complex and comprehen-
via in-kind services or through cash benefits sive medical care is needed in conjunction with
to recipients to purchase the services they support services.
need (with varying degrees of restrictions). Another factor that differentiates long-term
Cost-sharing, in the form of user charges, care services and supports from the provision of
play a role in all countries, to different health care services is that most often long-term
degrees, with service users, unless they are care services and supports is a family affair made
destitute, having to meet a proportion of the possible by a person’s spouse and children and
cost of their care from their own private less often extended family. Indeed, evidence sug-
resources. gests that, depending upon the country, between
The chapter also looks at the regulatory 10 and 40 times more care is provided by informal
mechanisms used across a selection of carers than by formal agency staff, whether insti-
countries to monitor the quality of long-term tutionally based or providing home-based ser-
care, particularly in residential facilities, vices (Columbo et al. 2011). Unlike demands for
identifying three broad quality assurance primary care medicine, as demand for long-term
approaches. The chapter ends with a care services and supports increases it can be met
43 Provision of Health Services: Long-Term Care 951
by policies favoring provision of care by families or by formal sources, the former more reminiscent of how long-term care has been historically provided.

The history of formal long-term care in western societies is closely bound up with the emergence of state sponsored social welfare efforts ranging from “almshouses” to outdoor relief efforts designed to support paupers and others unable to care for themselves (Kellog 1883; Katz 1996). Almshouses in Britain, the Netherlands, France, and Belgium housed indigent elderly and disabled persons unable to care for themselves without family members to whom they or local authorities or charities could appeal for support. In European and Anglo-Saxon countries these facilities emerged from a tradition of sectarian or local charitable organizations but were not infrequently conflated with support for the poor, the destitute, and the alcoholic. Local authorities, not just in England where the poor laws prevailed, established almshouses or hospices to care for the unfortunate and dying as a civic responsibility.

There are two overlapping dualities that characterize the scope and delivery of long-term services and supports. First, services and supports represent both financial support for basic food and shelter and the provision of physical support and care for those unable to do even the simplest daily tasks without help. Second, because care recipients are in need largely due to the complexity and severity of their medical conditions, services generally involve both unskilled homemaker services and specialized nursing and medical care. In addition, different countries have adopted varying mechanisms to meet the long-term care needs of their populations, ranging from cash payments to eligibility determination processes, which fundamentally define each country’s long-term services and supports structure. As will be observed in the paragraphs below, many countries make all these different dimensions of services and supports available under public funding with or without means testing the client and/or her family.

Who Uses Long-Term Care Services and Supports?

Most users of formal long-term care services (institutional or community-based) are women aged 80 and over. However, according to OECD health data, there is considerable country-to-country variation in the proportion of women aged 80 and over who use long-term care services, from a low of 2% in Poland to a high of over 45% in Norway (Columbo et al. 2011). Interestingly, depending upon the country, a sizeable minority of long-term care service recipients are under age 65, with Poland leading the way with almost half (48%) of formal care recipients being under 65. However, the substantial variation in the availability of home versus institutionally based long-term care services, and in how cash allowances provided to frail elderly persons and their families are counted in long-term care user statistics, makes it difficult to be too precise in comparing rates of use across OECD countries. This is a theme which will be revisited throughout this chapter since, without reliable data to characterize the nature of the services provided and the characteristics of the recipient population, it is difficult to have a great deal of confidence in many of the statistics used to compare the long-term care systems of one country with another.

Background to Long-Term Service and Support “Systems”

As noted, long-term care “systems” are in almost all cases a misnomer because it is the exceptional country that actually has an integrated system. Regardless of its “system-ness,” in general, long-term care can be conceptualized as three interlocking sets of policies and forces which apply, regardless of the country. These three features include: (1) financing and reimbursement, that is, who pays and how the services rendered are reimbursed; (2) the organization of the delivery system, that is, how the providers of long-term care services and supports are organized and coordinated, since clients often receive a multiplicity of services from different providers and sources;
and (3) the regulatory or quality assurance system, that is, the regulations, rules, and procedures governing licensure and quality standards for agencies serving the long-term care population.

These three components are interdependent; changes made in each affect the implementation and impact the others have, either directly or indirectly. For example, over the past decade most OECD countries have increased their emphasis on home and community-based services to meet the preferences of a new generation of seniors who are less willing to be relegated to an institutional setting (Grabowski et al. 2010; Damiani et al. 2011). Indeed, the movement toward “consumer directed care” represents the epitome of the shift toward home and community-based care since the underlying assumption is that older persons will use the new discretion to remain in the community and outside of institutions (Alakeson 2010).

Shifting payments to home care providers, away from the past dominance of institutions, has immediate implications for the structure of the delivery system as well as how it is regulated. To implement policies stimulating the development of home care services, regulatory structures that have historically been oriented toward monitoring quality in institutions must be realigned to manage a much more diverse and complex oversight process. Assuring that agencies charged with meeting the needs of the elderly in their homes actually are providing the care for which they are paid requires visiting clients and their families in their homes and/or demanding extensive, auditable care management documentation. This means that realigning long-term care services from the institution to the community will require a very different, and costly, regulatory and oversight structure. Furthermore, financing home and community services represents a substantial departure from the institutional approach, where purchasing a day of care in a nursing home is well understood. In the case of home care services, whether to pay by the hour, by the skill level of the staff person, or even to bundle payment with other post-acute care services or via capitation are all decisions that have different implications for how payments are made. To the extent that financing changes are also designed to give the eligible service recipient some choice as to how their needs are to be met in the form of “consumer direction” (Doty et al. 2010), additional operational complications arise related to personal care workers’ compensation, indemnification, and even whether family members can be paid to provide the care. As is obvious, these wrinkles in the financing rules and allowances introduce a further complication in the regulatory control structures since it is difficult for government to regulate the quality of familial relationships.

Changes in financing also have implications for the organization of the long-term care delivery system. For example, countries that instituted universal long-term care insurance policies that include cash transfers must determine whether those funds can be used to purchase home care services, regardless of the licensure status of the agency or worker employed to provide the service. However, policies which only reimburse recipients and their families for services rendered by licensed or professionally supervised staff necessarily mean costs will be higher. Without such requirements, institutional care providers, who are required to adhere to professional licensure requirements and labor laws including tax withholdings, would have a legitimate complaint about there not being a “level playing field,” since cash transfer payments that result in families hiring illegal immigrants can be seen as undermining the formal health and social care services labor market.

Understanding how changes in one component of the system affect the others is further complicated by the fact that it spans the health care sector as well as the formal and informal labor market. The emphasis on home care places increased pressure on family caregivers who, in rich countries, often supplement direct family care time with undocumented workers’ time, thereby violating labor laws and possibly endangering the frail older person.

Recent efforts within the OECD to better characterize the variation in long-term care
systems across member countries have identified clusters of countries based upon variation along two dimensions pertinent to long-term care. The two dimensions are, first, the “generosity” of the formal entitlements to the long-term care services and supports that recipients require and, second, the ease with which individuals can access needed community services and the organizational complexity associated with monitoring or regulating the array of available services. A recent OECD report which characterized most EU countries on these dimensions found that countries with more generous long-term care financing systems also offered a broader array of services and more choice for service users (Mot and Willemé 2012).

Structure of Chapter

In this chapter we characterize the major issues facing industrialized countries with respect to the financing, organization, and regulation of long-term care services and supports. Each of these aspects of countries’ long-term care “systems” is discussed separately, although, as noted, these are integrally intertwined. To the extent that the available literature allows, we compare selected countries with respect to their approach to financing long-term services and supports and their organization (private vs. public, institutional vs. home based) with special reference to the level of informal care support provided by family and friends. Finally, the regulatory structures for licensing and certifying institutional and home care agencies meeting the needs of frail elders are described across selected countries. Since we rely heavily upon research studies funded by the EU and/or OECD for our comparative data, not all countries are consistently represented. We close the chapter by identifying several salient issues that could benefit from additional rigorous cross-national review along with the challenges facing industrialized nations as they strive to meet the long-term care needs of their growing population of frail elderly persons.

Financing of Long-Term Care

Expenditure on Long-Term Care

Expenditures on long-term care vary significantly among countries, and spending reflects differences in care needs, utilization rates, the comprehensiveness of formal long-term care services, and the role of families in providing informal care. Another factor that affects expenditures is whether services are defined as health or social services. With this proviso in mind, a recent report (Columbo et al. 2011) calculates that OECD countries spent an average of 1.5% of GDP on long-term care in 2008 (long-term care spending is calculated on the basis of health-related long-term care services, including palliative care, nursing care, personal care services, and health services supporting family care, and social services related to long-term care, including home help and care assistance and residential care services), but with levels for individual countries ranging from less than 0.5% (e.g., Hungary, Slovenia, South Korea, and Poland) to more than 3% (The Netherlands and Sweden). These differences are put into even starker relief when per capita expenditure on long-term care is considered. On this metric, the highest spenders devote almost three times as many resources as mid-range countries (such as Australia, Germany, Japan, and the United States) and between 20 and 30 times more than the lowest spending countries (see Table 1). Table 1 also indicates the source of long-term care funding in each country, revealing that long-term care is mainly funded through public sources. It should be noted, however, that data systems in most countries are not sufficiently robust to capture all aspects of private spending on long-term care, and that there is a great deal of underreporting of direct out-of-pocket spending. Nevertheless, with the exception of Switzerland, government sources account for the lion’s share of public financing for long-term care. Interestingly, the data also indicate that in Portugal, Germany, and Spain private expenditures on long-term care are considerable.
Table 1 Funding for publicly provided long-term care, selected OECD countries

                 Per capita       LTC funding      Total government/state        Private share/out-of-pocket  Private insurance
                 spending on LTC  as % of GDP (a)  component (%) of public       component (%) of public      component (%) of public
                 (US$ PPP) (a)                     LTC expenditure (incl. taxes  LTC expenditure (b, d)       LTC expenditure (b, d, e)
Country (c)                                        and social insurance) (b, d)
Slovak Republic  42               0.2              –                             –                            –
Czech Republic   59               1.4              100                           0                            0
Poland           68               0.4              92.3                          0.3                          0
Korea            73               0.3              76.9                          17.8                         0
Hungary          108              0.3              90.3                          2.4                          0.9
Spain            271              0.6              71.9                          28.1                         0
Slovenia         302              0.8              75.4                          24                           0.5
Australia        367              0.8              88.9                          8.5                          0.3
New Zealand      383              1.3              92                            4.4                          1.3
United States    455              0.6              –                             –                            –
Germany          470              0.9              67.2                          30.4                         1.7
Austria          497              1.1              81.8                          17.1                         0
Japan            527              1.4              88.9                          7.1                          4
France           564              1.7              99.2                          0.4                          0.4
Canada           574              1.2              82                            16.8                         0.4
Iceland          638              1.7              100                           0                            0
Belgium          707              1.7              90                            0.2                          9.8
Denmark          724              1.8              89.6                          10.4                         0
Finland          790              1.8              84.4                          14.2                         0
Luxembourg       822              1.4              –                             –                            –
Norway           1276             2                89.3                          10.7                         –
Sweden           1332             3.6              99.2                          0.8                          0
The Netherlands  1431             3.5              99.9                          0                            0
Portugal         –                0.1              53.4                          45.4                         1.1
Switzerland      –                0.8              38.8                          58.4                         0.4

Source: Adapted from Columbo et al. (2011)
Notes
(a) Data from 2008.
(b) Data from 2007.
(c) Countries are listed from lowest to highest per capita expenditure on long-term care.
(d) Funding from government sources, private out-of-pocket expenditures, and private insurance do not always add up to 100% as the following other minor funding sources are excluded from this table: nonprofit institutions serving households, corporations (other than health insurance), and “other”.
(e) Data on out-of-pocket spending for some of the countries are underestimated. For example, in the Netherlands, cost-sharing on long-term care services is estimated to account for 8% of the total long-term care expenditure. The share of out-of-pocket spending for Switzerland is overestimated as cash benefits granted for care in care facilities are not considered.
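
The per capita comparison drawn from Table 1 can be checked with a few lines of arithmetic. The sketch below is only illustrative: the spending figures are copied from Table 1, but the composition of the "highest" and "lowest" groups is an assumption made here (the text names only the mid-range examples), and a simple unweighted mean is assumed for each group.

```python
# Per capita LTC spending (US$ PPP, 2008), copied from Table 1.
PER_CAPITA = {
    "Slovak Republic": 42, "Czech Republic": 59, "Poland": 68, "Korea": 73,
    "Australia": 367, "United States": 455, "Germany": 470, "Japan": 527,
    "Norway": 1276, "Sweden": 1332, "The Netherlands": 1431,
}

# Mid-range countries are the examples named in the text.
MID_RANGE = ["Australia", "Germany", "Japan", "United States"]
# ASSUMPTION: "highest" and "lowest" groups are chosen here for illustration;
# the chapter does not state which countries make up each group.
HIGHEST = ["Norway", "Sweden", "The Netherlands"]
LOWEST = ["Slovak Republic", "Czech Republic", "Poland", "Korea"]


def group_mean(countries):
    """Unweighted mean of per capita spending for a list of countries."""
    values = [PER_CAPITA[c] for c in countries]
    return sum(values) / len(values)


high_vs_mid = group_mean(HIGHEST) / group_mean(MID_RANGE)
high_vs_low = group_mean(HIGHEST) / group_mean(LOWEST)

print(f"Highest vs mid-range spenders: {high_vs_mid:.1f}x")
print(f"Highest vs lowest spenders:    {high_vs_low:.1f}x")
```

Under these assumed groupings the ratios come out at roughly 3 and roughly 22, in line with the "almost three times" and "between 20 and 30 times" described in the text; other groupings would shift the figures somewhat.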
Coverage

One way of looking at the financing mechanisms behind public long-term care is to consider three aspects of coverage: the scope of entitlement (i.e., on what basis are citizens entitled to publicly provided services and benefits?), the range of benefits covered, and the proportion of the benefit cost that is covered, including those services that are excluded from public funding (cost-sharing and user charges).

Types of Public Long-Term Care Systems

The scope of entitlement provides a useful way to classify countries’ public long-term care systems since this approach captures whether entitlement is universal or whether access to services is means-tested and thus reserved for the poorest individuals who are protected through a public safety-net. In addition, such coverage may be financed through a single program (such as general taxation or a mandatory long-term care insurance scheme) or through multiple programs and benefits. Using these criteria, Columbo et al. (2011) identify three long-term care models: (1) universal coverage systems with a single program, (2) mixed systems, and (3) means-tested safety-net systems. Table 2 classifies a number of OECD countries according to this typology. The main feature of single-program universal systems, as found in Germany, Japan, Luxembourg, the Netherlands, and South Korea, which have mandatory long-term care insurance schemes, and the Nordic countries, which have tax-based financing programs, is that they provide public long-term care services to everyone who is assessed as needing care, based on their dependency level and regardless of income. That is, access to services is not dependent on the income level or assets of beneficiaries. Mixed systems, as seen in Australia, Austria, France, and Spain, typically have a number of different programs and benefit schemes operating side by side, which can be either universal or means-tested, with the amount of the benefit adjusted downwards as the recipient’s income level increases. The countries in this group may also have medical-related or nursing benefits covered universally (free) through the health system. Finally, means-tested safety-net systems (found in the United Kingdom and USA) use income or asset tests to set a threshold for entitlement to publicly provided long-term care services and benefits. Income and asset-testing is used to target those with the highest care needs and to protect those who otherwise would not have the means to purchase care privately. However, if means-testing thresholds are set quite low, a large proportion of elderly people in need of long-term care may be excluded from receiving publicly provided services, until they become impoverished paying for such services privately. The only other way that these individuals’ long-term care needs can be met is if services are provided as part of the health care system (as in the case of nursing care in the United Kingdom) which sets up the dynamic of cross-subsidy between the health and social services sectors, the latter generally being a more costly means of meeting the same need.

It should be noted, however, that whatever long-term care coverage model a country has, needs assessments to judge an applicant’s level of functional impairment and care needs are a central component of determining eligibility. Nor is it the case that countries’ approach to coverage necessarily corresponds with their level of spending. For example, while the Netherlands, Sweden, Norway, and Luxembourg dedicate the highest per capita spending to long-term care services, other universal-system countries such as Germany, Japan, South Korea, as well as Denmark and Finland fall within the mid to upper-mid expenditure range. Similarly, all mixed system countries have mid-range long-term care spending, as does the USA, which belongs to the means-tested safety-net group.

What Long-Term Care Services Are Covered?

Most public long-term care systems cover both institutional and home-based services, although the range of services covered varies, as does the proportion of the cost (see Table 2, as well as the subsection on cost-sharing below). Universal long-term care systems tend to provide comprehensive long-term care packages encompassing institutional/residential services, home care nursing, domestic assistance as well as sheltered housing schemes, assistive devices, home modification, and transport to community services. However, in some universal systems, such as those in Germany and South Korea, a notable omission from the long-term care package is accommodation (all rooms in Germany and private rooms in South Korea) and meal costs in nursing homes; these must be paid for out-of-pocket. In Japan, lodging and meals in nursing homes are only partially covered. In mixed
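Table 2 Public coverage of long-term care, selected OECD countries, 2010

Fields listed for each country: type of system based on eligibility (universal; mixed; means-tested/low income); financing source (tax, social security contribution); government levels contributing to financing; program characteristics; who is covered; income/means-testing to determine eligibility; needs assessment; types of benefits provided.

Australia
Type of system: Mixed
Financing source: Tax-based
Government levels contributing to financing: Federal, state, and local
Program characteristics: Multiple programs
Who is covered: Older people
Income/means-testing to determine eligibility: Yes
Needs assessment: Yes
Types of benefits provided: In-kind only: home and institutional care

Austria
Type of system: Mixed
Financing source: Tax-based
Government levels contributing to financing: Federal and regional (Länder)
Program characteristics: Cash benefit programs: universal cash benefit (Pflegegeld) and 24-hour care benefit
Who is covered: All disabled people
Income/means-testing to determine eligibility: Universal cash benefit: no; 24-hour care benefit: yes
Needs assessment: Yes
Types of benefits provided: Cash: home and institutional care. Some in-kind benefits provided by regional governments

Canada
Type of system: Mixed
Financing source: Tax-based
Government levels contributing to financing: Federal and provinces
Program characteristics: Various programs by province
Who is covered: All people in need
Income/means-testing to determine eligibility: Universal (home care) and means-tested (institutional care)
Needs assessment: Yes
Types of benefits provided: In kind: home and institutional care

Denmark
Type of system: Universal
Financing source: Tax-based
Government levels contributing to financing: National and local
Program characteristics: Single program of assistance
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Finland
Type of system: Universal
Financing source: Tax-based
Government levels contributing to financing: National and municipalities
Program characteristics: Single program of assistance
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

France
Type of system: Mixed
Financing source: Taxes, social contributions
Government levels contributing to financing: Central and local
Program characteristics: Various income-related benefits
Who is covered: All people in need; Handicap allowance is for those aged 60+
Income/means-testing to determine eligibility: Income
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Germany
Type of system: Universal
Financing source: Social insurance
Government levels contributing to financing: Payroll contributions
Program characteristics: LTC insurance system with multiple insurers
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Italy
Type of system: Mixed
Financing source: Tax
Government levels contributing to financing: National and regional
Program characteristics: Institutional care benefits part of the health system; cash care allowance covers home care
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Japan
Type of system: Universal
Financing source: Social insurance, plus personal contributions
Government levels contributing to financing: National
Program characteristics: LTC insurance system. Insured individuals aged 40–65 pay 30% of total LTC costs
Who is covered: Over 65, or 40–65 with age-related disease
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: In kind only: home and institutional care

Korea
Type of system: Universal
Financing source: Social insurance and taxes
Government levels contributing to financing: National
Program characteristics: LTC insurance system
Who is covered: Over 65s, or under 65 suffering from geriatric diseases
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Luxembourg
Type of system: Universal
Financing source: Social insurance, tax and a special tax
Government levels contributing to financing: National
Program characteristics: Single LTC insurance system part of health insurance system
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

The Netherlands
Type of system: Universal
Financing source: Social insurance
Government levels contributing to financing: National
Program characteristics: LTC insurance system with multiple insurers
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

New Zealand
Type of system: Mixed
Financing source: Tax-based
Government levels contributing to financing: National
Program characteristics: Health funding authority responsible for LTC provision; Residential Care Subsidy
Who is covered: All people in need
Income/means-testing to determine eligibility: Yes
Needs assessment: Yes
Types of benefits provided: In kind only: home and institutional care

Norway
Type of system: Universal
Financing source: Tax-based
Government levels contributing to financing: National and local
Program characteristics: Single program
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Spain
Type of system: Mixed
Financing source: Tax-based
Government levels contributing to financing: Central and regional
Program characteristics: National long-term care system administered by regions
Who is covered: All people in need
Income/means-testing to determine eligibility: Yes
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

Sweden
Type of system: Universal
Financing source: Tax-based
Government levels contributing to financing: Local and national (11–12%)
Program characteristics: Single program
Who is covered: All people in need
Income/means-testing to determine eligibility: No
Needs assessment: Yes
Types of benefits provided: Cash, in-kind and vouchers: home and in-kind care varies across municipalities

Switzerland
Type of system: Mixed
Financing source: Social (health insurance), plus state budget
Government levels contributing to financing: National and cantons
Program characteristics: Mandatory health insurance program plus complementary cash benefits under Disability Insurance
Who is covered: All people in need
Income/means-testing to determine eligibility: Asset tested (for some benefits)
Needs assessment: Yes
Types of benefits provided: Cash and in-kind institutional care; home care mainly provided by private organizations

United Kingdom
Type of system: Means-tested safety-net
Financing source: Tax-based
Government levels contributing to financing: National and local
Program characteristics: Various programs and allowances
Who is covered: Social care benefits to all adults in need; specific allowances for the disabled and elderly disabled
Income/means-testing to determine eligibility: Asset tested (for some benefits)
Needs assessment: Yes
Types of benefits provided: Cash and in-kind home and institutional care

United States
Type of system: Means-tested safety-net
Financing source: Tax-based
Government levels contributing to financing: National and state
Program characteristics: Medicaid and Medicare programs
Who is covered: People of low income (Medicaid); Seniors (Medicare)
Income/means-testing to determine eligibility: Medicaid is means-tested; Medicare is universal for seniors
Needs assessment: Yes
Types of benefits provided: Mainly in-kind: institutional benefits. Optional state home care benefits

Sources: Adapted from Fernandez et al. (2009); Columbo et al. (2011); Swartz (2013)
Note: LTC = long-term care.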
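
The three coverage models classified in Table 2 differ mainly in how income enters the entitlement decision once a needs assessment has been passed. A minimal sketch of that logic follows. It is not any country's actual rule: the benefit amount, income threshold, and taper rate are hypothetical values chosen only to make the contrast visible, and the linear taper used for mixed systems is an assumed simplification of "the benefit adjusted downwards as the recipient's income level increases".

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    needs_care: bool      # outcome of the needs assessment (used in all models)
    annual_income: float  # hypothetical currency units


# HYPOTHETICAL parameters, not taken from any country's rules.
FULL_BENEFIT = 10_000.0
INCOME_THRESHOLD = 20_000.0
TAPER_RATE = 0.25  # benefit withdrawn per unit of income above the threshold


def public_benefit(system: str, applicant: Applicant) -> float:
    """Return the illustrative public benefit under one coverage model."""
    # Every model starts from the needs assessment.
    if not applicant.needs_care:
        return 0.0
    if system == "universal":
        # Entitlement depends on need only, regardless of income.
        return FULL_BENEFIT
    if system == "mixed":
        # Benefit is reduced as income rises (assumed linear taper).
        excess = max(0.0, applicant.annual_income - INCOME_THRESHOLD)
        return max(0.0, FULL_BENEFIT - TAPER_RATE * excess)
    if system == "means-tested":
        # An income threshold gates entitlement entirely.
        return FULL_BENEFIT if applicant.annual_income <= INCOME_THRESHOLD else 0.0
    raise ValueError(f"unknown system type: {system}")


applicant = Applicant(needs_care=True, annual_income=30_000.0)
for system in ("universal", "mixed", "means-tested"):
    print(system, public_benefit(system, applicant))
```

With these assumed parameters, the same applicant receives the full benefit under the universal model, a reduced benefit under the mixed model, and nothing under the means-tested safety-net model. That last case illustrates how a low threshold can exclude people in need until their own resources are spent.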
systems, typically nursing care, either in home or institutional settings, is financed on a universal basis by the parallel health system while personal (social) care is covered under separate benefit schemes. For example, in Italy, special nursing homes for elderly people are covered via the health system budget while home care services are mainly financed by a non-means-tested cash care allowance whose modest level means it is most often used to pay for informal care. In Canada, most provinces cover nursing and personal care (such as help with bathing and grooming) in home settings, but other assistance such as domestic help and meal preparation may require the user to pay a fee. In the group of means-tested safety-net countries, the United States sets a basic mandatory basket of long-term care services (such as nursing facility services and home health-related services) through its Medicaid program for people on low incomes, but individual states determine what other services may be covered. In most states, while benefit structures cover support for daily living activities in home-care settings as well as accommodation and meals in nursing homes, the latter services are only available to those who meet strict means-testing and who have exhausted their own resources before becoming eligible for public support (Columbo et al. 2011).

Paying for Long-Term Care Services

Comparing countries’ approach to paying for services is complicated by the lack of comparable international data on the different reimbursement mechanisms used to pay providers for different types of care, whether it be fee-for-service payments, capitation, or day-rates for nursing costs. The importance of having data on the impact of different reimbursement vehicles may be illustrated by the case of how institutional services (i.e., in nursing homes) are paid for. If a country’s reimbursement mechanism does not recognize, and adjust payment levels proportionally for, clients/patients who have more complex needs and require more care, there will be a disincentive for providers to admit such individuals as their greater impairment will cost the facility more in terms of the time, labor, and skills required to care for them. In this example, a form of case-mix reimbursement (such as the Resource Utilization Groups case-mix system used in many US states and in Ontario, Canada) that provides an incentive to care for sicker patients would be more appropriate than a flat-rate reimbursement model that pays the same amount per nursing home resident, regardless of the intensity of their care needs. While most often applied to the institutional setting, it is possible to devise case-mix reimbursement models for home care services. Such considerations impact on both the efficiency of the long-term care system as well as on its capacity to meet the growing care needs of the population requiring long-term care services.

Cash Benefit Schemes

There is some cross-national information on cash-benefit schemes, which offer recipients the choice to purchase care services that they feel best meet their needs from the provider they prefer. While most countries offer a combination of in-kind services and cash benefits, a few, like Austria, France, and the Czech Republic, use cash benefits as the main type of long-term care purchasing mechanism. These schemes differ among countries as to whether they are available alongside in-kind benefits or whether recipients must choose either one or the other, whether the level of the cash benefit is determined through means/income testing, and whether any restrictions are placed on how the benefit may be used. For example, some countries require that only accredited formal services be hired while others have very few restrictions and allow the benefit to be used to pay family members or other informal carers for services rendered in the home. There is some evidence that the use of unregulated cash payments seems to incentivize the hiring of migrant care workers in countries such as Austria and Italy, who either substitute or complement personal care and domestic assistance traditionally provided by the family (van Hooren 2008; Columbo et al. 2011; Phillips and Schneider 2007; see also section “Structure of The Delivery System” on Provision). Table 3 provides an overview of the cash
Table 3 Cash for care schemes for long-term care services, selected OECD countries

| Country | Benefits available | Cash benefit programs | Income/asset tested | Use restrictions |
|---|---|---|---|---|
| Austria | Both in-kind and cash | 1) Cash Allowance for Care (Pflegegeld); 2) 24-hour care benefit; 3) Dementia care benefit | 1) No; 2) Income; 3) No | No. Can be used to pay for care by relatives or other carer |
| Czech Republic | Only cash benefits | Care allowance | No | No. For services or care by relatives |
| Denmark | In-kind, cash, and vouchers | BPA (Citizen Controlled Personal Assistance) | No | Yes. Not for nursing care |
| France | In-kind and cash benefits are separate | Allocation personnalisée d’autonomie (APA) | Income | Yes. Use of APAs is strictly controlled. Can be used to pay for care by relatives but not a spouse |
| Germany | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme: 52% of users opt for cash benefits | No | Yes. Cannot be used to pay for care by relatives or for some services (such as GP services) |
| Italy | In-kind and cash benefits are separate | Indennità di accompagnamento (Carer/Companion allowance) | No | No. Can be used to pay relative or other carer |
| Korea | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme | No | Only available to users who live in remote areas with few facilities, are unable to use LTC facilities due to national disasters, or are unsuitable for institutional LTC due to physical or mental condition. Cannot be used to pay for care by relatives |
| Luxembourg | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits part of LTC insurance scheme: Cash Allowance for Care | No | Cash for the first 10.5 hours of care per week |
| The Netherlands | Users must choose between either in-kind or (lower value) cash benefits | Cash benefits (Personal Care Budgets) are part of LTC insurance scheme: 12% of users opt for Personal Care Budgets | No | 98.5% of expenses must be justified and unspent funds returned. Personal Care Budgets can be used to pay for care by relatives but they must have a contract |
| Spain | Users must choose between either in-kind or cash benefits (the latter vary according to program) | 1) Allowance for user to hire services; 2) Allowance for user receiving informal care; 3) Allowance for Personal Assistance | 1) Income; 2) Income; 3) Income | 1) Hire through accredited centers; 2) To compensate informal carers who must be a relative or in rural areas; a neighbor can qualify; 3) Expenses must be justified; carer must have professional qualifications |
| Sweden | In-kind and cash benefits are complementary; also vouchers | 1) Attendance Allowance; 2) Assistance Allowance | 1) No; 2) No | Yes. Cannot be used to cover medical expenses or to pay for care by relatives |
| United Kingdom | In-kind and cash benefits are complementary | 1) Attendance Allowance; 2) Direct Payments; 3) Individual (social care) Budgets | 1) Income and asset tested; 2) Income; 3) Income and asset tested | 1) No; 2) Yes. Spending record required; 3) Yes. Cannot be used to pay for care by relatives |

Sources: Adapted from van Hooren (2008); Columbo et al. (2011); Swartz (2013); Wirrmann Gadsby (2013)
benefit schemes available in a selection of countries, highlighting these major differences.

Cost Sharing

Cost-sharing, in the form of copayments, deductibles, or user-charges, applies to all long-term care systems, whether they are universal, mixed, or means-tested, safety-net systems. Commentators (Swartz 2013) have noted that rising long-term care costs, aging populations, and pressure on public sector spending due to structural deficits and the financial crisis in Europe since 2008 have led to a shift to greater cost-sharing among users of long-term care services or their relatives. In most cases, cost-sharing is subject to income thresholds, with exemptions available for those meeting set criteria, such as low-income status (Columbo et al. 2011; Swartz 2013). For example, in the Nordic countries with universal systems, cost-sharing mechanisms account for relatively low shares of publicly financed formal long-term care services, and in Sweden and Norway such contributions are capped. In contrast, beneficiaries in South Korea are required to pay a coinsurance rate of 20% for residential care and 15% for home care (Jung et al. 2014). Similarly, in Australia, those eligible for public long-term care services still need to contribute to the cost of their personal care in both residential and home settings, with the amount determined through means-testing (Columbo et al. 2011). Table 4 summarizes a number of cost-sharing approaches to long-term care and provides some country examples. Like user charges and cost-sharing approaches in health care, it is clear that the different approaches found in long-term care systems reflect not only the incentive structures that modulate access to care but also countries’ different emphasis on social protection for vulnerable or low-income groups.

Structure of the Delivery System

As can be seen in Fig. 1, based upon OECD data on long-term care, there is substantial variation in the percentage of the population over 65 using long-term care services (OECD 2013b). Consistent with its origins in the medieval alms house and hospice, traditionally long-term care was synonymous with residential arrangements provided in an institution. Indeed, in spite of the concerted effort that most OECD governments have made in “rebalancing” long-term services and supports from institutions to community-based services, spending on long-term care in institutions was higher than spending at home in virtually all OECD countries in 2008. On the other hand, there are many more using long-term care services residing at home. While in the average OECD country 12.9% of those 65 and over receive formal long-term care services, less than half (under 5%) receive care in a residential or institutional setting. Indeed, most countries report that about twice as many in the population of long-term care users are receiving those services at home (Columbo 2011). Figure 2, reflecting OECD Health Statistics data, reveals that among countries with data on the distribution of home-based and institutional care over time, the share of long-term care users receiving home care has increased in most countries and as an OECD average.
Table 4 Cost-sharing approaches for long-term care services in selected countries

| Cost-sharing approach | Country examples |
|---|---|
| Users have to first exhaust their own means (means-tested systems) | United Kingdom: Eligibility for residential care is means-tested and individuals with savings over a threshold are not eligible for public support. Cost-sharing is applicable according to income/savings under the threshold but some support from local government is available. Individuals with less than GBP 14,250 in savings qualify to have their residential costs fully covered. |
| Residual cost-sharing – after defined public benefits | France: The Allocation personnalisée d’autonomie (APA) cash benefit is subject to a national ceiling, and the level of benefit decreases as a proportion of income. |
| | Germany: Cost-sharing applies when the costs of long-term care services go beyond the fixed public benefits. Families are required to help cover costs that exceed statutory benefits. For residential care, recipients must cover accommodation and meals; means-tested social assistance benefits may be available to those who cannot meet these costs. |
| Flat rate (as a percentage) cost-sharing | Japan: The long-term care insurance scheme sets a user-charge rate of 10% on all public long-term care services (excluding preventive services). |
| | South Korea: Under the national long-term care insurance scheme beneficiaries pay 20% of total institutional care costs and 15% of home care services costs, with reductions or exemptions for low-income individuals. |
| User charges are linked to income and/or assets-based benefits | Finland: In home care, private contributions are set according to the amount of care needed and income of the recipient and other household members, covering about 15% of total costs. In institutional care, personal contributions are set at 85% of the recipient’s net income. |
| | Norway: Municipalities have the flexibility to set personal contributions within given frameworks. Personal contributions are typically income-related, except for short-term stays in nursing homes, where contributions are set independently from income. For long-term nursing home stays, personal contributions cannot exceed 80% of a resident’s income in excess of a given amount. For home care, user charges are set so as to leave the recipient with a minimum income for extra expenses. |
| | Spain: Private contributions are determined by each autonomous region and differ according to care setting and type of service. The level of cost-sharing depends on an assessment of financial capacity, typically based on available capital, the beneficiary’s estate, and household income. Private out-of-pocket payments range from 70–90% for residential care and 10–65% for home care. |

Source: Columbo et al. (2011)
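
The flat-rate schemes listed in Table 4 are the simplest to express as arithmetic: the user pays a fixed share of the cost of the service. The following Python sketch is illustrative only. The rates are those quoted in Table 4 for Japan and South Korea; the function name, the setting labels, and the service cost of 1000 currency units are hypothetical, and the sketch ignores the low-income reductions and exemptions and Japan's exclusion of preventive services.

```python
# Flat-rate user-charge shares quoted in Table 4 (fractions of service cost).
# Assumption: low-income reductions/exemptions and Japan's exclusion of
# preventive services are deliberately ignored in this sketch.
FLAT_RATES = {
    ("Japan", "institutional"): 0.10,
    ("Japan", "home"): 0.10,
    ("South Korea", "institutional"): 0.20,
    ("South Korea", "home"): 0.15,
}


def user_charge(country: str, setting: str, service_cost: float) -> float:
    """Return the user's share of service_cost under a flat-rate scheme."""
    if service_cost < 0:
        raise ValueError("service_cost must be non-negative")
    try:
        rate = FLAT_RATES[(country, setting)]
    except KeyError:
        raise ValueError(
            f"no flat rate recorded for {country!r}, {setting!r}"
        ) from None
    return rate * service_cost


# Hypothetical service cost of 1000 currency units, for illustration only.
for country, setting in FLAT_RATES:
    charge = user_charge(country, setting, 1000.0)
    print(f"{country:12s} {setting:14s} {charge:8.2f}")
```

The means-tested and income-linked approaches in the table cannot be reduced to a single rate in this way, because the user's contribution depends on thresholds and assessments that vary by country and, in some cases, by region or municipality.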
The Long-Term Care Services and Supports Continuum

As noted, presently most OECD countries have a higher percentage of long-term care users receiving care at home than institutionally based care. While many countries have had a range of different types of long-term care residential arrangements for frail older persons for many decades, the full array of community-based services has been a relatively recent development. This required the development
Fig. 1 Percentage of population aged 65 or over receiving long-term care services, by country, 2011 (Source: OECD 2013a). [Bar chart; vertical axis: % of population aged 65 years and over; series: Institutions, Home.]
of a continuum of long-term services and supports, ranging from household chores to intensive, medically oriented nursing home care to serve individuals as their needs increase. The different levels of intensity of nursing home care offered range from facilities that manage chronically bed-bound patients requiring oxygen, artificial feeding, and intravenous care to facilities specializing in short term rehabilitation to independent small apartments offering congregate meals. Nonresidential long-term care services can range from intensive round the clock “respite” services to weekly chore and cleaning services, and all the range of nursing to meals services offered in the homes of dependent elders or in community settings. Day care programs, with and without medical and nursing support, increasingly serve frail older individuals who otherwise live with caregiver children. Finally, some have argued that even the differentiation between residential and home care services can be false since, regardless of where one lives, needed services can be provided to meet their needs (Kane et al. 1998). Indeed, some have argued that financing rules and contradictory regulatory controls are the major drawbacks to having more comprehensive and responsive long-term care delivery systems.

Long-Term Care Bed Capacity

Even though home care is more prevalent than institutional care, the best data regarding long-term care services across the OECD refer to the availability of residential long-term care beds per elderly person. Figure 3 reveals the substantial intercountry variation in the number of residential care home beds per 1000 elderly, ranging from under 20 in Italy, Poland, and Korea to over 60 in the Nordic and northern European countries. (In the OECD data, Japan has very few nursing home beds but many long stay hospital beds which are not counted as nursing homes although they serve a very similar population (Ikegami et al. 2014).) Not included in these figures are OECD countries that do not report data on long-term care use such
Fig. 2 Share of long term care recipients aged 65 and over receiving care at home, 2000 and 2011 (Source: Columbo et al.
2011)
as Mexico and Chile which must be presumed to have even less well-developed resources than countries like Italy, Korea, and Poland. In spite of the higher prevalence of home care recipients, spending on long-term care in institutions is higher than spending on home care in all OECD countries reporting with the exception of Denmark (Columbo et al. 2011). This reflects two phenomena: first, institutional care is more costly than home care, and second, a higher proportion of those receiving institutional care are very impaired, particularly in the absence of very involved family caregivers (Carpenter and Hirdes 2013).

In the absence of standardized definitions for what constitutes a “long-term care bed,” OECD data on the rates of such beds per 1000 elderly are necessarily vague. Many countries have different definitions for what constitutes a long-term care bed. For example, Japan has many small, long stay hospitals licensed quite differently from other Japanese acute hospitals but serving populations that are not all that different from licensed nursing facilities (Ikegami et al. 1994; Ikegami et al. 1997). On the other hand, over the last several decades a new class of residential long-term care home, Assisted Living Facilities, in the USA, that are not nursing homes per se, has made it appear as if the number of nursing home beds per capita elderly has fallen dramatically, even though the average impairment level of Assisted Living residents now is as great as it used to be among nursing home residents two decades ago (Sloane et al. 2005; Smith et al. 2007; Stevenson and Grabowski 2010). Thus, the recorded number of long-term care beds per 1000 elderly is very sensitive to the different definitions of what constitutes a long-term care bed, suggesting that in both Japan and the USA, OECD data undercount the number of long-term care beds, although for different reasons. Indeed, according to the first national survey of long-term care providers done by the US National Center for Health Statistics, in 2012 there were some 15,700 nursing homes but 22,000 Assisted Living Facilities, with 39 nursing home beds per 1000 elderly and 20 Assisted Living beds per 1000 elderly. Adding these together places the USA at the same level of total long-term care beds per thousand elderly as exists in the Netherlands and Belgium (but still below Sweden) and substantially higher than Germany or France. However, it is not known whether certain classes
Fig. 3 Residential Care Home Beds per 1000 Elderly aged 65+, selected OECD Countries, 2009 (Source: OECD 2013a)
of retirement housing in those countries actually serve some of the same functions as do Assisted Living Facilities in the USA.

Notwithstanding the difficulty of measuring current bed capacity in a comparable way, OECD planners have recently projected the future need for residential long-term care in multiple countries. Between 2010 and 2060, a recent report estimates that a 100% increase in the number of residential long-term care placements will be needed in Germany, whereas it projects nearly twice that in the Netherlands. In Spain and Poland, which begin from a much lower base, the rate of increase estimated to be needed is around 150%. Such projections are notoriously unreliable, at least based upon the experience in the USA, where future demand for nursing home care was anticipated to more than double by now. However, changes in personal preferences and expectations, and the poor public image of nursing homes, all worked to undermine those projections (Miller et al. 2010, 2012). Continually dropping nursing home occupancy rates, in spite of declining bed supply, are a clear indication of reductions in demand for nursing home care in spite of the ongoing aging of the population (Larson Allen 2008; Feng et al. 2011).

Community-Based Service Capacity

In response to a growing preference for care at home, over the past decade many OECD countries have implemented programs and benefits to support home-based care. Where trend data are available, the share of people aged 65 and over receiving long-term care at home, as a share of the total number of long-term care recipients, has increased substantially over the past 10 years (Columbo et al. 2011). Other than this general statement, however, there are few studies that have documented the changing supply or the detailed levels of use of the variety of different home care services offered by agencies in Europe. The OECD data on long-term care services provide specific information only on the number of nursing home beds in the country and not information regarding the number of adult day care centers or home care agencies, even though the literature and the OECD data themselves clearly reveal that for the last decade or more the majority of recipients of formal long-term services and supports have them provided in their homes.

One obvious reason for this lack of systematic data on service supply relates to the difficulty of arriving at common definitions of services across the boundaries of different countries, languages,
and cultural histories. Additionally, in many OECD countries the organization and provision of long-term care services is a local matter for municipalities operating under national guidelines that still allow considerable local discretion in how long-term care policies are implemented (Tarricone and Touros 2008). This means that national governments may not have the kind of detailed data that would make it possible to characterize the supply of services across the whole country. For example, a 2008 report on home care included a section on the supply of home care without offering any data regarding the number of agencies, workers, or services available to the elderly population (Tarricone and Touros 2008).

A report compiled as part of a conference hosted at the University of Amsterdam by Professor Duyvendak and his colleagues included a series of detailed case studies regarding the structure of the long-term care service delivery systems in Greece, Germany, Italy, Poland, the Netherlands, England, Sweden, and Norway. The project was designed to assess the adequacy of political commitment to providing support services to the frail elderly and others with long-term care needs (Duyvendak et al. 2009). No consistent information about the supply of home care services was available across all the countries, other than statements that formal home nursing and care aide services provided by established and authorized entities were largely unavailable in countries like Greece, Italy, and Poland whereas in countries like Sweden and Norway, all municipalities have these kinds of service agencies. This is consistent with OECD data indicating that the proportion of the elderly population receiving any formally provided long-term services and supports was very low in the southern European countries but much higher in northern European countries.

One of the recent issues facing all OECD countries is the projected number of needed long-term care workers relative to the growing number of frail, aged individuals in the population (Columbo et al. 2011). Projections for the numbers of workers relative to the size of the population in need conducted by the European Network of Economic Policy Institutes strongly point to the fact that most countries will face a significant labor shortage (Mot and Willeme 2012). Indeed, demand for formal institutional care, as well as formal home care support services, is anticipated to increase by 100% to over 200% between 2010 and 2060 in the face of a relative flattening of the available number of informal caregivers and a projected decline in the number of long-term care workers employed in the formal sector. These projections for the four countries examined as part of the ANCIEN project are consistent with other European countries, and the authors suggest that policy makers face major challenges in the coming years as demand outstrips the supply of caregivers, both informal family members and formal agency employees. The only way to increase the supply of formal care workers is to substantially alter their compensation and improve their working conditions, both of which will dramatically increase the costs of services that are already projected to bankrupt countries based only upon changing demography.

Informal Care Provision and Cash Payments for Dependent Care Allowances

While policies offering cash payments to frail elders and their family members are discussed under the financing section above, these policies have a direct bearing on the structure of the market for long-term care services for several reasons (Ungerson and Yeandle 2007). First, many countries which have some form of cash payments provided to eligible frail elders and their families allow those funds to be used by family members, ostensibly as compensation for foregone labor force activity (Wiener 2007). Second, unless there are explicit limitations on the use of such cash transfers, according to the limited empirical research that has been done on the issue, recipients and their families appear to be more likely to purchase unskilled household and personal care help from the unregulated labor market. That is, there are many reports documenting that this kind of work is frequently done by undocumented workers (Bettio and Solinas 2009). Third, the interrelationship between cash transfer programs
from long-term care insurance and the role of informal care and the undocumented “grey” labor force for domestic help has begun to receive considerable attention in EU and OECD countries in the context of the raging debate regarding illegal immigration and the cost of employment. While it is not the place of this chapter to address these issues thoroughly, the implications for the organization of long-term services and supports are considerable, since many would argue that the growth of the illegal labor market for domestic help with the aging of the population undermines the development of a robust home care services system. It is this issue we address in the final paragraphs of this segment of the chapter (van Hooren 2008).

Informal care is the dominant source of support for most community dwelling elderly throughout the developed world. While many have argued that the availability of formal agency support undermines and substitutes for endogenous informal care from families, the evidence from both microeconomic and macroeconomic studies does not support this contention (Tarricone and Touros 2008; Rothgang 2003; Foster et al. 2007). While the proportion of the frail elderly in northern Europe who receive some form of formal care services from municipalities is much higher than in southern Europe, the proportion receiving assistance from families and friends is similar, although the relative share of support may be more highly weighted toward formal service providers (OECD 2013a).

As noted, numerous OECD countries have some form of cash payment system for frail elders and their families who require long-term care services and supports. While there are many details that differentiate the manner in which the funds are provided and the conditions under which they can be used, it is clear that these are extremely popular programs (Columbo et al. 2011). Indeed, the popularity of the cash option is best seen in the fact that the vast majority of German households eligible for support under the long-term care insurance law elect to receive cash rather than services even though the value of the cash is far less than the replacement cost of the service. In the USA the success of the “cash and counseling” demonstration in three states was the stimulus for major expansions of this option for Medicaid programs across the country since both family members serving as informal caregivers and the workers hired by the family and patients reported improved satisfaction with their circumstances when compared to the control group (Foster et al. 2007; RWJF 2013).

One of the key provisions of these cash allowance programs is the extent to which the use of the funds by recipients and their families is regulated, that is, whether how the money is spent is predetermined and/or whether there are restrictions on the kinds of workers that can be hired (van Hooren 2008). In a comparative policy analysis, van Hooren appears to suggest that countries with little restriction on how such cash allowances are used are associated with higher proportions of eligible households using illegal domestic workers. Although the data are necessarily limited, comparisons of Italian families’ use of undocumented domestic workers to care for the elderly and the virtually nonexistent use of such workers in the Netherlands suggest that families in countries with better developed formal agency community-based services will rely less upon the unregulated labor market to care for frail elders (van Hooren 2008; Simmonazzi 2009).

In the USA, recent policies have expanded the applicability of “cash and counseling” programs which encourage income and functionally eligible elders and their families to use their cash allowance to arrange for personal care attendants and home care assistants directly, thereby making their money stretch. Like many of the European cash transfer programs, the autonomy inherent in such self-directed care seems to promote an informal economy with workers, whether legally documented or undocumented, being paid without standard employer benefits such as holidays, vacation, and sick time which would be expected by a worker employed by an agency. For agencies to compete for labor when they have to withhold taxes and other payments from employees (not necessarily required for cash payments) as well as for customers, who do not necessarily want to pay for the higher cost of agency-supervised
workers, places them in a highly disadvantageous position. Many US states committed to implementing such “consumer directed care” programs are exploring ways of creating a labor market that elderly and disabled clients and their families and advocates can rely upon and that also has provisions for paying workers’ benefits while indemnifying the care recipient. (See the “Cash and Counseling” Resource Center Web site for presentations and discussions of care worker training, benefits payment, and liability insurance. http://www.bc.edu/schools/gssw/nrcpds/cash_and_counseling.html.)

Over the next decades, OECD countries will have to devise more cohesive policies to support the population’s desire to receive care at home without inadvertently stimulating demand for undocumented workers and illegal immigration which, in turn, undermine a well-functioning formal market in long-term care services. Northern European countries, like wealthier US states, have well-developed home care agency structures, whether publicly or privately operated, precisely because they have invested in this sector of the long-term care system, whereas southern European countries, like their US southern state counterparts, have relied more extensively on family caregiving which, when unable to meet elders’ needs, seeks to purchase assistance from the informal labor market. How to devise financing and reimbursement policies as well as regulatory structures to address these issues presents a major challenge to developed countries.

Regulating Quality

Ensuring the quality of long-term care involves more than just putting into place regulatory rules and procedures to govern the licensure (registration) and certification of providers of long-term care services. It also involves having systems to monitor the safety, effectiveness, and success of those services in terms of maintaining the well-being, health outcomes, and dignity of long-term care recipients. In practice, the latter goal is more difficult to achieve and requires increased investment in time, skilled staff, and resources to ensure appropriate reporting dimensions, standardized data gathering procedures, and consistent assessment protocols that can then feed into quality improvement measures. However, just as there is a great deal of variation in how countries organize and finance long-term care, there are also differences in the regulatory approaches to assuring quality.

Different Regulatory Approaches to Quality Assurance

In a recent comparative study that we conducted on the regulation of long-term care quality in 14 countries, we identified three main approaches that underpin the quality assurance frameworks in the countries with relatively well-developed long-term care systems (Mor et al. 2014). The first approach, as seen in countries such as Austria, Germany, Japan, and Switzerland, delegates the main responsibility for upholding standards, training, and staff certification requirements for the long-term care workforce, as well as for monitoring quality, to professional organizations. In this approach, government is a partner in quality assurance rather than assuming a primary “policing” role. While government is still involved in setting standards for long-term care via legislation, the “professionalism-based” approach to quality regulation places considerable trust in “self-regulation.” This position is predicated on the assumption that associations of professionals involved in long-term care have distinctive expertise that the state can rely upon to ensure their commitment to training and ethical good practices in caring for the elderly.

In contrast, a second approach (followed in countries such as Australia, England, the Netherlands, and Spain) is much more empirical and inspection-based, where government authorities assume the primary role in rule-making and monitoring providers’ compliance with statutorily defined regulations. This “inspection-based” approach stresses the need for close oversight
by central authorities as there is generally less societal confidence that professionals, or providers, will always act in the interests of frail elderly people using long-term care services. A third approach, in place in Canada, Finland, New Zealand, and the United States, builds upon the existing inspection-oriented approach to licensing, inspection, and complaints investigation by adding quality measurement and public reporting protocols based on intensive data gathering and analysis. This “data management and public reporting” approach emphasizes standardization and reporting of data so that long-term care users, ideally, can act as consumers and choose the best services suited to their needs, with quality being boosted by market competition among providers. The best example of this approach can be seen in the United States, where the RAI Minimum Data Set (MDS) is used (the MDS is a Resident Assessment Instrument (RAI) which is required in all US nursing homes in order to ensure that a resident’s care plan is based upon a comprehensive assessment of their needs) and the government’s Nursing Home Compare Web site reports information on a range of MDS-based assessment measures for short- and long-stay nursing home residents (see http://www.medicare.gov/nursinghomecompare/search.html).

The Regulatory Reach of Quality Monitoring

Monitoring the quality of long-term care can address structures, processes, and, less often, outcomes. It is also useful to divide quality regulation functions in terms of three broad domains: (1) standard setting and initial inspection and licensure, (2) ongoing surveillance and enforcement, and (3) reporting and/or rewarding performance. Table 5 summarizes a wide selection of these regulatory functions in a selection of OECD countries, with the check marks in the columns mainly applicable to the residential care sector as quality regulation of home care agencies is very underdeveloped. Moreover, a check mark refers only to the fact that the regulatory function takes place and does not purport to indicate the effectiveness of the regulations or the overall quality of care.

As can be seen, the first four rows of Table 5 look at structural standards that are relevant to the licensing of long-term care providers. All countries in this sample require providers to register with designated authorities and, in the case of residential facilities, to demonstrate that requirements for the physical plant (such as fire and safety arrangements and quality of life considerations such as room size) are met. Moreover, the OECD reports that in two-thirds of its member countries accreditation or certification of care facilities is compulsory, a condition for reimbursement and contracting, or common practice (OECD 2010). In addition, formal regulations govern the level of education and training that groups of long-term care workers (e.g., registered nurses, personal care workers) must attain in order to be employed by a long-term care provider. However, it is noteworthy that the levels of required training as well as experience vary markedly among countries. For example, certified care workers need 75 h of training and experience in the United States, 430 h in Australia, 75 weeks in Denmark, and 3 years in Japan (OECD 2010; OECD/European Commission 2013; Table 4). In addition, market conditions, such as local unemployment rates or the availability of excess labor (such as illegal immigrants), have a big influence on the strictness with which providers apply these professional standards. Another consideration is the cost implications of mandating minimum training of staff in formal care settings. Along with minimum wages, social security contributions, and other labor-related overhead, training requirements contribute to higher wage costs in the formal care sector, making formal long-term care services too expensive for many users, particularly in countries
indicating that a particular function is an integral where public coverage is limited. In such cases,
part of the quality regulation regime in the partic- informal care from relatives or hiring cheaper care
ular country. It is important to note that the table workers from an available pool of migrant
rows includes quality assurance functions that are workers in the “grey labor market” becomes the
Table 5 Long-term care regulatory functions, selected countries
970
Regulatory The South

Function USAa Canada Netherlands England Australia Finland Japan Germany Austria Spain New Zealand Korea Switzerland China
1. Registration/ X X X X X X X X Xb X X X X X
Licensure
2. Structural X X X Xc X X X X Xb X X X X X
(physical)
standard setting
3. Professional X X X Xd X X X Xb X X X X X
education and
training
standards
4. Long-term X Xe X X X X X X X
care
professional
associations
5. Care Process X X X Xf Xg X X X Xb X X X Xb
minimum
standards
6. Resident/ X X X X X X X Xb X X X
Client outcomes
measures
7. Routine X X X X X X X Xb X X X Xb Xh
inspection
8. Random/ X X X X X X X X X
unannounced
inspections
9. Data and X X X X X X X Xb
experience-
based inspection
10. Monetary X X X X Xi X X X Xb,j X X
penalties for
noncompliance
11. Sanction and X X X X X X X Xk X X X X
warning system
12. Legal X X X Xl Xm X
appeals process
43
13. Complaint X X X X X X X Xb X X
collection and
monitoring
system
14. Telephone X X X X X X X
or Web-based
action-line
complaint
process
15. Public X X X X X Xn X X X
reporting
16. Consumer X X X X
choice data
17. Pay-for- X X
Performance
quality
Provision of Health Services: Long-Term Care
assurance
Source: Mor et al. (2014)
Notes:
a
Functions can vary slightly across nursing homes and community-based options.
b
Varies across regions.
c
Regulations pertaining to structural aspects are very broadly specified for the most part.
d
Standards for all groups exist but enforceability depends on the staff group (e.g., nurse, social worker, care worker).
e
Industry associations of aged care providers and nursing professional bodies.
f
Some provider regulations refer to aspects of the care process but these are broadly specified. Within the regulations, cross-references are made to best practice guidelines, but
inspectors seem to have a good deal of leeway on how to interpret this standard. NICE is developing quality standards which set out care process minimum standards, but it is not
yet clear to what extent these are enforceable or merely guidelines.
g
Legislation sets out a Schedule of Specified Care and Services and requires providers to deliver care of “an appropriate standard.” However, the legislation does not set out
minimum staff-resident ratios or hours of care.
h
Poorly or variably enforced.
i
There are no fines for poor care per se. However, poor providers can be sanctioned by having government funding withheld for new residents until they meet care standards. This is
a form of financial sanction.
j
These exist but are poorly enforced and with very low level of fines.
k
Contracts can be terminated by the Long-term Care Fund.
l
Appeals can be brought against registration or outcome of inspections.
m
Providers have appeal rights against regulatory decisions. Care recipients have appeal rights against a decision by the government to not to approve them for subsidized care.
n
Limited to inspections.
971
only viable alternative, especially where unregulated cash benefits are available to further incentivize this solution (van Hooren 2008) (See also section “Structure of The Delivery System” above). Obviously, the use of unskilled informal or hired carers may have consequences for the quality of care provided, although many positive benefits of informal or hired care, such as fostering empowerment in the care user and building strong relations of trust, also have been reported (Columbo et al. 2011; Dale et al. 2005).

One final aspect in this group of functions is the role of professional organizations and/or independent, nongovernment organizations in helping to set standards for long-term care providers (Row 4). Again, there is wide variability among countries in both the participation rates of such organizations and the rigor with which they pursue their roles in standard setting, i.e., whether they actively participate in developing benchmarks for best practice (e.g., in Austria and Japan) or whether they tend to limit their role to advocating in favor of minimum standards (USA).

Rows 5 to 12 consider different functions associated with ongoing monitoring and enforcement as captured by inspection regimes. Such monitoring focuses in particular on process standards that are applied to encourage positive aspects of care (such as weight monitoring, wound monitoring, fall prevention, and infection control) or to prohibit practices that often have a negative impact (e.g., the use of physical restraints or the use of antipsychotic medications) as well as the sanctions that can be imposed in cases of poor performance and/or noncompliance. Such standards exist in several countries to minimize the use of these behavior control schemes by requiring extensive documentation to justify their use, and making them the subject of inspection. In contrast, the monitoring of residents’ health or well-being outcomes as an aspect of quality control is a relatively advanced and complex objective, involving the definition of such “outcomes” for frail elderly individuals and then establishing standardized data systems that carers and inspectors can use to determine whether such outcomes have been achieved.

The best example of such an assessment tool is the Resident Assessment Instrument Minimum Data Set version (RAI-MDS) applied to nursing home contexts and the versions developed for the assessment of individuals receiving home care (interRAI-HC) and care in community settings (interRAI CHA). First developed in the United States in the late 1980s and subsequently extended through an international consortium of experts, the current comprehensive suite of assessment instruments consists of standardized tools to detect long-term care users’ strengths, needs, and potential risks to enable individualized monitoring and care planning. In addition, collected data are aggregated to produce quality indicators on processes and outcomes both at the individual and facility/organizational level (Mor et al. 2010; Hutchinson et al. 2010). The mandated use or testing of RAI assessment instruments internationally has been growing over the last decade, with a presence in several countries in North America, Europe, South-East Asia, and Australasia (see http://www.interrai.org/worldwide.html).

Focusing directly on approaches to inspection, we can see from Table 5 that after a provider has been certified or licensed, routine inspections are almost universally carried out, albeit according to different time-frames and conditions that may trigger an inspection (Rows 7–9). For example, inspections may take place every few years, or they may be less frequent based on a provider’s good performance in previous inspections; the latter kind of “risk-based” regulation relies on using historical inspection data to shape the regularity and intensity of subsequent inspections. Alternatively, a desk audit of data submitted in advance by a provider may take place in some circumstances, either in lieu of or prior to an on-site inspection. In addition, ad hoc inspections may be triggered by a complaint by a resident or family member, and the assessor will often both investigate the source of the particular complaint and seek to document other problems in the same care domain as the complaint. On-site inspections may also be carried out according to regular schedules or be random (unannounced) with the aim of observing providers in carrying
out their day-to-day care duties with no prior notification. To date, however, there are no empirical data on the efficacy of one approach over the other in terms of stimulating better quality of care (Mor et al. 2014).

Once inspections uncover a problem with an aspect of care, different regulatory frameworks employ various means to rectify the problem (Rows 10–12). Some countries prefer to view inspections as collaborative, compliance-based exercises in which inspectors first work in tandem with providers to find solutions to the identified deficiencies. This may take the form of informal negotiations and persuasion. In other cases, more formalized “deterrence-based” procedures are used, such as issuing warnings to return to compliance within a given timeframe. Most systems, however, do have some form of official sanctions if providers fail to respond adequately or if repeated cases of noncompliance are found. The ability of regulators to levy fines or other penalties against poorly performing providers links quality to financial consequences and represents one way of incentivizing improvements. This can be done by fining the provider directly, restricting further admissions (and therefore potential revenue), and/or, in countries with public long-term care financing, withholding reimbursement until the specific problem is fixed.

A last-resort and very rarely applied sanction is decertification, or revocation, of a provider’s license to operate (Angelelli et al. 2003). Regulators are often reluctant to impose this ultimate penalty as relocating frail elderly residents from a facility that has been sanctioned with closure would mean finding other suitable and available places in the same area and may also potentially impose “relocation” stress on residents. It is also worth noting that all of the compliance enforcement methods mentioned above may be subject to lengthy procedural requirements that regulators have to adhere to and/or legal appeal processes open to sanctioned providers. These processes often involve considerable periods of time before a noncompliance issue is resolved.

Finally, the quality of long-term care can be monitored via various means to report on providers’ performance, whether through established complaints channels or making available systematic data on provider quality performance. In some countries, financial incentives (such as pay-for-performance tariffs) may be in place through public funders of long-term care to encourage providers to participate in quality assurance programs (Rows 13–17). Although complaints monitoring data are scarce, systems for submitting complaints about the treatment of long-term care recipients exist in most countries, again with substantial variation in the means available (e.g., written complaints or telephone/internet action-lines) and the requirements for responding to such complaints.

Of equal salience is making information about providers’ quality performance available to consumers. This could take the form of presenting the results of recent inspections or supplying specific performance data on providers through easily accessible media such as the Internet. For example, in the USA various measures of quality, ranging from staffing levels, to inspection results, to indicators of process and outcome quality, are computerized and posted on government Web sites. In Finland, data on residents’ outcomes are voluntarily fed back to providers with the intention that ultimately this information (particularly if it is positive) might be used by the providers themselves to inform potential long-term care consumers in their areas. A similar structure is in place in New Zealand for its home care agencies, with plans to extend the practice to residential care facilities. This transparency can have several advantages. Firstly, if directly available to consumers, performance data based on various indicators can inform choices in selecting a long-term care provider (Werner et al. 2009). Secondly, public reporting of providers’ quality can exert pressure on them to address problems and maintain standards, particularly if they are performing poorly and do not wish to compromise their reputation in the local long-term care market. Thirdly, having access to performance data, particularly on comparable quality indicators across a number of providers (either local or national), can feed into individual providers’ voluntary quality improvement strategies (Werner and Konetzka 2009).

Challenges Facing Quality Monitoring

While comparative international information is still quite scarce, available sources (Mor et al. 2014) highlight that regardless of how developed or underdeveloped a long-term care system is, regulatory frameworks and some form of monitoring activities always exist for residential facilities (nursing homes). In contrast, due to the existence of informal care arrangements as well as less developed quality assurance programs applicable to formal home care agencies, there is much less regulation and knowledge about the regulation of home care, although some countries (such as the Netherlands, New Zealand, Canada, the USA, and Switzerland) have made inroads into monitoring the quality of home care. Given the relative growth of the latter sector and the general preference by long-term care users to remain in their own home for as long as feasible (European Commission 2008), addressing the relative scarcity of quality assurance frameworks for home care settings will be a major but necessary challenge for governments in the future.

A second key challenge is the availability of data that can be used for quality assurance purposes, not only in the residential care sector, where data gathering can often be sporadic and not standardized across facilities, but also in the home care sector, where data are even more limited. As far as European countries go, it is still the case that standardized information derived from the inspection process is neither routinely collected nor archived for subsequent use. One reason for this is the difficulty of standardizing inspections, or assessments, across different regions, particularly if this function is decentralized to lower levels of government administration (such as municipalities or local agencies) or to providers themselves. Another hurdle is variability in the interpretations and evaluations of individual assessors. Thus, the lack of consistency hinders meaningful comparisons across providers. Indeed, even in the USA where standardized inspection protocols are in place and computerized, there is substantial interstate variation in the conduct and results of inspection (Mukamel et al. 2012).

Related to this is the issue of whether any collected data are made publicly available. One stumbling block is the opposition of providers to sharing information not only with the public but specifically with competitors, particularly if there is a reputational risk involved in releasing information about poor performance. While some countries have started to collect inspection-based data, very few follow the example of the United States where the availability of such data on residential services helps would-be residents to make choices about what nursing home facilities would best suit their needs or to vote with their feet if their existing facility falls short. The exercise of choice based on quality data also extends to health and long-term care insurers who in the future will increasingly have to make purchasing decisions about competitive service suppliers based on both cost-effectiveness and the quality of care. Indeed, this process is already taking place in the USA, Canada, Finland, and New Zealand.

A final consideration is effecting a sea change in the attitude to quality monitoring and its role in incentivizing improvements in the quality of long-term care. While the primary purpose of measuring long-term care structures, processes, and outcomes is to ensure the safety and dignity of service users, the information harnessed by this process can be an invaluable tool for providers to assess the relative quality of their performance against relevant benchmarks, be it officially set standards or industry averages. Armed with such meaningful data, and skilled staff to interpret it, long-term care providers would then be in a much better position to shape their improvement strategies, not only to enhance their marketability but, more importantly, for the benefit of the elderly clients in their care.

Summary and Conclusions

Governments of the rapidly aging industrialized countries are just beginning to be aware of the enormous challenges they will face in meeting the care needs of the frail elderly. Over the last decade or so, most countries have begun the difficult process of rebalancing the provision of long-term care from a system that was almost entirely supporting residential or institutional care to one in which the majority of service recipients were cared for in their homes. This shift was accelerated, or made possible, in many countries by the introduction of direct cash payments to eligible individuals and their families, allowing them to direct their long-term care mix of services using entitlement funds for which they are eligible because of their need for functionally based support.

There are several consequences that these shifts in care orientation have brought about. First, giving older consumers and their family members control over who they hire has substantially altered the labor market for long-term care services, particularly in those areas where well-developed agency-based long-term home care services do not exist. Indeed, the availability of some financial support may allow late middle-aged children of frail elderly persons to remain out of the formal workforce or provide another reason for these individuals to retire early, becoming full-time, partially paid caregivers to their aged and frail parents. Second, as we have seen, monitoring the quality of care and services rendered to frail older persons is difficult enough when only provided in large residential care settings. Administrative procedures for reporting staffing levels and quality as well as documenting services rendered are sufficiently burdensome that many countries do not require this form of reporting. Furthermore, hiring independent inspectors to monitor the performance of these institutional providers constitutes a large expense even if facilities are only inspected annually. However, truly monitoring quality issues requires more frequent inspections, unannounced inspections, and inspections instituted in response to residents’ and families’ complaints. While this constitutes a very difficult task in the case of residential care, monitoring quality in the home care setting, much less policing family members’ own provision of care to the frail older person in their own homes, is considerably more complex and costly, requiring close collaboration with what in the USA is known as “adult protective service” given the real potential for abuse.

Extending quality measurement to the home care setting, particularly if including frail older persons receiving cash payments which they apply to paying family or undocumented and unlicensed workers, presents numerous challenges. Home care providers in the USA, many Canadian provinces, and New Zealand have implemented individualized quality metrics as part of a routine client assessment process, and these data have been used to report on provider quality (Mor et al. 2014). These experiences suggest that the use of this kind of “microlevel” information is certainly a feasible approach to quality performance measurement of home care services, but such measurement necessarily depends upon professionals periodically assessing the client and using those data to calculate indicators of quality performance. In the case of cash payments to family and informal labor market participants, this approach is not viable without introducing a mandatory assessment in recipients’ homes, a process that may be perceived as excessively onerous, entailing an invasion of privacy. Furthermore, since most OECD countries have not even established a solid data reporting system in reference to institutional care provision and quality, it would be hard to imagine that most would be willing to institute an even more complicated and costly data-based approach to quality oversight of home care services.

A newer challenge that increasingly worries policy makers in the health care delivery space is the linkage between reimbursement and quality measurement. Strategies to assure quality by applying “value-based purchasing” have been tried with limited success in the USA but are likely to become the next major policy debate. To even consider this approach, however, there is a critical need for consistent data about patients’ outcomes in selected areas and providers’ characteristics and services. To date, these types of data exist in only a few countries, but the complexities of introducing such systems are substantial even after the data collection and assembly challenges have been met. It is likely, however, that as demand for various types of long-term care services increases due to
population aging and inadequate private savings among the elderly, public support for long-term care services may well be contingent upon those services being viewed as value for money.

References

Alakeson V. International development in self-directed care. Issue Brief (Commonw Fund). 2010;78:1–11.
Angelelli J, Mor V, et al. Oversight of nursing homes: pruning the tree or just spotting bad apples? Gerontologist. 2003;43(2):67–75.
Bettio F, Solinas G. Which European model for elderly care? Equity and cost-effectiveness in home based care in three European countries. Econ Lavoro. 2009;43(1):53–71.
Carpenter I, Hirdes J. A good life in old age: monitoring and improving quality in long term care. OECD Health Policy Studies, OECD Publishing; 2013. https://doi.org/10.1787/9789264194564-en
Columbo F, Llena-Nozal A, Mercier J, Tjadens F. Help wanted? Providing and paying for long-term care. Paris: OECD Publishing; 2011.
Dale S, Brown R, Phillips B, Carlson BL. How do hired workers fare under consumer-directed personal care? Gerontologist. 2005;45(5):583–92.
Damiani G, Farelli V, Anselmi A, Sicuro L, Solipaca A, Burgio A, Iezzi DF, Ricciardi W. Patterns of long term care in 29 European countries: evidence from an exploratory study. BMC Health Serv Res. 2011;11:316.
Doty P, Mahoney KJ, Sciagaj M. New state strategies to meet long-term care needs. Health Aff. 2010;29(1):49–56.
Duyvendak JW, Grootegoed E, Savernije MT, Tonkens E. Day 1: long-term care in Europe, the state of the art. Presentation given at Does Europe care? European Conference on Long-Term Care and Diversity, Amsterdam; 2009. http://www.careconference.eu/site/sites/default/files/Part201.pdf.
European Commission. Long-term care in the European Union. Brussels: Commission of the European Communities, DG Employment, Social Affairs and Equal Opportunities; 2008.
Feng Z, Lepore M, Clark MA, Tyler D, Smith DB, Mor V, Fennell ML. Geographic concentration and correlates of nursing home closures: 1999–2008. Arch Intern Med. 2011;171(9):806–13.
Fernandez JL, Forder J, Trukeschitz B, Rokosová M, McDaid D. How can European States design efficient, equitable and sustainable funding systems for long-term care for older people? Copenhagen: World Health Organization and World Health Organization on behalf of the European Observatory on Health Systems and Policies; 2009.
Foster L, Dale S, Brown R. How caregivers and workers fared in Cash and Counseling. Health Serv Res. 2007;42(1 Pt 2):510–32.
Grabowski DC, Cadigan RO, Miller EA, Stevenson DG, Clark M, Mor V. Supporting home- and community-based care: views of long-term care specialists. Med Care Res Rev. 2010;67(Suppl 4):82S–101S.
Hutchinson AM, Milk DL, Maisey S, Johnson C, Squires JE, Teare G, Estabrooks CA. The resident assessment instrument-minimum data set 2.0 quality indicators: a systematic review. BMC Health Serv Res. 2010;10:166. https://doi.org/10.1186/1472-6963-10-166.
Ikegami N, Fries BE, Takagi Y, Ikeda S, Ibe T. Applying RUG-III in Japanese long-term care facilities. Gerontologist. 1994;34(5):628–39.
Ikegami N, Morris JN, Fries BE. Low-care cases in long-term care settings: variation among nations. Age Ageing. 1997;26(Suppl 2):67–71.
Ikegami N, Ishibashi T, Amano T. Japan’s long-term care regulations focused on structure – rationale and future prospects. In: Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Jung H-Y, Jang S-N, Seok J-E, Kwon S. Quality monitoring of long-term care in the Republic of Korea. In: Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Kane RA, Kane RL, Ladd RC. The heart of long-term care. New York: Oxford University Press; 1998.
Katz MB. In the shadow of the Poorhouse: a social history of welfare in America. Tenth anniversary edition. New York: Basic Books; 1996.
Kellogg DO. The pauper question. Atl Mon. 1883;51(307):638–652.
Larson Allen L. Mapping the future: estimating Florida aging service needs 2008–2030. Tallahassee: Agency for Health Care Administration; 2008.
Miller EA, Mor V, Clark M. Reforming long-term care in the United States: findings from a national survey of specialists. Gerontologist. 2010;50(2):238–52.
Miller EA, Tyler DA, Rozanova J, Mor V. National newspaper portrayal of U.S. nursing homes: periodic treatment of topic and tone. Milbank Q. 2012;90(4):725–61.
Mor V, Miller EA, Clark M. The taste for regulation in long-term care. Med Care Res Rev. 2010;67(Suppl 4):38S–64S.
Mor V, Leone T, Maresso A, editors. Regulating long-term care quality: an international comparison. Cambridge: Cambridge University Press; 2014.
Mot E, Willemé P, editors. Assessing needs of care in European nations, ENEPRI policy brief no. 14, vol. 2012. Centre for European Policy Studies: Brussels; 2012.
Mukamel DB, Weimer DL, Harrington C, Spector WD, Ladd H, Li Y. The effect of state regulatory stringency on nursing home quality. Health Serv Res. 2012;47(5):1791–813.
OECD. Ensuring quality long-term care for older people. Paris: OECD Publishing; 2010. Policy Brief.
OECD. Recipients of long-term care. In: Health at a glance 2013: OECD indicators. Paris: OECD Publishing; 2013a. https://doi.org/10.1787/health_glance-2013-75-en
OECD. OECD health data: long-term care resources and utilisation. Paris: OECD; 2013b.
OECD/European Commission. A good life in old age? Monitoring and improving quality in long-term care, OECD health policy studies. Paris: OECD Publishing; 2013. https://doi.org/10.1787/9789264194564-en
Phillips B, Schneider B. Commonalities and variations in the Cash and Counseling programs across the three demonstration States. Health Serv Res. 2007;42(1 Pt 2):397–413.
Rothgang H. Long-term care for older people in Germany. In: Comas-Herrera A, Wittenberg R, editors. European study of long-term care expenditure. Investigating the sensitivity of projections of future long-term care expenditure in Germany, Spain, Italy and the United Kingdom to changes in assumptions about demography, dependency, informal care, formal care and unit costs. Report to the European Commission, Employment and Social Affairs DG: 24–42. 2003. http://ec.europa.eu/employment_social/soc-prot/healthcare/ltc_study_en.pdf
RWJF – Robert Wood Johnson Foundation. Executive summary: cash and counseling program. Princeton: Robert Wood Johnson Foundation; 2013. Available at: http://www.rwjf.org/content/dam/farm/reports/program_results_reports/2013/rwjf406468/subassets/rwjf406468_1
Simmonazzi A. Home care and cash transfers. Effects on the elderly care-female employment trade-off. Cost Conference. Rome; 2009.
Sloane PD, Zimmerman S, Gruber-Baldini AL, Hebel JR, Magaziner J, Konrad TR. Health and functional outcomes and health care utilization of persons with dementia in residential care and assisted living facilities: comparison with nursing homes. Gerontologist. 2005;45 Spec No 1(1):124–32.
Smith DB, Feng Z, Fennell ML, Zinn JS, Mor V. Separate and unequal: racial segregation and disparities in quality across US nursing homes. Health Aff. 2007;26(5):1448–58.
Stevenson DG, Grabowski DC. Sizing up the market for assisted living. Health Aff. 2010;29(1):35–43.
Swartz K. Searching for a balance of responsibilities: OECD countries’ changing elderly assistance policies. Annu Rev Public Health. 2013;34:397–412.
Tarricone R, Touros AD, editors. The solid facts: home care in Europe. Copenhagen: World Health Organization Regional Office for Europe and Università Commerciale Luigi Bocconi; 2008.
Ungerson C, Yeandle S. Cash for care in developed welfare states. Houndmills: Palgrave Macmillan; 2007.
van Hooren F. Bringing policies back in: How social and migration policies affect the employment of immigrants in domestic care for the elderly in the EU-15. Paper presented at Transforming elderly care at local, national and transnational level, International Conference at the Danish National Centre for Social Research (SFI), Copenhagen; 2008.
Werner RM, Konetzka RT. What drives nursing home quality improvement under public reporting? An examination of post-acute care. Chicago: AcademyHealth; 2009.
Werner RM, Konetzka RT, Stuart EA, Norton EC, Polsky D, Park J. Impact of public reporting on quality of postacute care. Health Serv Res. 2009;44(4):1169–87.
Wiener JM. Commentary: cash and counseling in an international context. Health Serv Res. 2007;42(1 Pt 2):567–76.
Wirrmann Gadsby E. Personal budgets and health: a review of the evidence. London: PruComm. Policy Research Unit in Commissioning and the Health Care System, Department of Health; 2013.

44 Provision of Health Services: Mental Health Care
Jon Cylus, Marya Saidi, and Martin Knapp
Contents
Introduction: Why Is Mental Health Important?
Definitions and Spectrum of Mental Health Disorders
Direct and Indirect Costs
Stigma
Comorbidity
Provision of Mental Health Care: How Is Care Delivered?
Who Delivers Care: Medical Professionals, Unpaid Caregivers
Financing Mental Health Services: How Is Care Financed?
Key Policy Dimensions/Recent Policies and Trends
Personalization and Empowerment
Carer and Family Impact
Prevention, Promotion, Public Mental Health (e.g., Campaigning)
Aging and Dementia
Employment
New Advancements in Treatments and Technologies
Discussion
References
J. Cylus (*) · M. Saidi · M. Knapp
The London School of Economics and Political Science, London, UK
e-mail: j.d.cylus@lse.ac.uk; m.saidi1@lse.ac.uk; m.knapp@lse.ac.uk
https://doi.org/10.1007/978-1-4939-8715-3_25

Abstract
Despite estimates that one in four people experience a significant episode of mental illness during their lifetime (Kohn et al., Bull World Health Organ 82:858–866, 2004), mental health remains a much neglected issue. Throughout the world, the level of resources dedicated to mental health is incommensurate with its prevalence or with its burden on society. Stigma and social exclusion can make it harder for people with mental health problems to obtain and maintain work, access appropriate health services, participate in their communities, or enjoy family life. Public attitudes toward mental illness, although showing signs of improving in recent years, are often negative and sometimes discriminatory. A recent Eurobarometer (2010) survey found that two thirds of EU citizens reported feeling uncomfortable talking to someone with a “significant mental health problem,” while one in five found it “difficult.”

Introduction: Why Is Mental Health Important?
Despite significant gaps, awareness of mental health problems continues to improve. Many countries, particularly in Europe, have taken steps to develop or modernize their mental health policy frameworks, highlighting the need to try to prevent mental health problems, to raise awareness of them when they arise, and to improve the volume and quality of resources available for treatment and care. However, frameworks in many other countries remain outdated, with their mental health-care capacity correspondingly deficient. Recently the World Health Organization launched its Mental Health Action Plan 2013–2020 (World Health Organization 2013), which introduced six crosscutting principles that it suggested should be at the heart of the policy discussion for global mental health. Universal health coverage for all was emphasized, as well as the promotion of human rights, such that all mental health strategies should be compliant with international and regional human rights instruments. Evidence-based practice should be promoted for mental health strategies as well as interventions for treatment, prevention, and promotion. Policies should adopt a life course approach, taking into account health and social needs at all stages and ages of life. Partnership between multiple sectors (health, education, housing, social, judicial, and other relevant sectors as well as the private sector) should be encouraged, taking into account local (country and regional) contextual factors. Finally, and perhaps for the first time, the WHO put empowerment at the forefront of its mental health agenda, stating that people with mental health problems should be involved in several aspects of policy.

Saxena et al. (2007) have highlighted the global scarcity of resources for mental health, inequity in their access, as well as inefficiencies in their use; and these have serious consequences. Treatment gaps, defined as the nonreceipt of care when it is needed, still occur, and as many as a third of individuals with schizophrenia and other non-affective psychoses do not receive any treatment (Kohn et al. 2004).

In this chapter, we look at the state of mental health, mental health care, and policy across the world. We discuss how people with mental health problems are treated and how services designed to care for these individuals are financed. We end by focusing on key policy trends which will help to shape future priorities and actions.

Definitions and Spectrum of Mental Health Disorders
The most widely used classifications of mental disorders are the International Classification of Diseases-10 (ICD-10) (World Health Organization 1992) and the Diagnostic and Statistical Manual-V (DSM-V); the former is currently being revised and updated. The current form of the DSM represents its first update in more than two decades; however, it has not been without controversy. The controversy mainly surrounds the perhaps oversimplified description of autistic spectrum disorder (Wing et al. 2011).

Otherwise, the ICD-10 defines mental disorders as “the existence of a clinically recognisable set of symptoms or behaviour, associated in most cases with distress and with interference with personal functions.”

For the most part, the etiology of mental health problems is not fully known, but their determinants and risk factors can be grouped into three distinct categories: biological factors (e.g., heredity or physical diseases), psychological factors (e.g., traumatic experiences or early separation), and social factors (e.g., lack of social support and deprivation) (Lehtinen et al. 2007, p. 127).

In terms of definitions, common mental disorders (CMDs) are mental conditions that cause marked emotional distress and interfere with people’s daily functioning; however, they do not usually affect insight or cognition.
These comprise different types of depression and anxiety, and their symptoms include low mood and a loss of interest and enjoyment in usual activities. Anxiety disorders include generalized anxiety disorder, panic disorder, phobias, and obsessive-compulsive disorder (OCD). OCD, the most severe form of anxiety disorder, is characterized by a combination of obsessive thoughts and compulsive behaviors, where obsessions are defined as recurrent and persistent thoughts and impulses or images that are intrusive and inappropriate and cause anxiety or distress, while compulsions are repetitive, purposeful, and ritualistic behaviors or mental acts that are performed in response to obsessive intrusion and to a set of rigidly prescribed rules (National Centre for Social Research 2007). Psychoses are disorders of the mind that can produce disturbances in thinking as well as perceptions severe enough to produce distortions in perceptions of reality. Psychoses may also impair motivation and may be associated with affective dysregulation (depression, mania), as well as alterations in information processing (cognitive impairment) (Van Os et al. 2010). Van Os et al. conclude that overall, psychotic outcomes are associated with living in an urban area, being part of a minority group, cannabis use, and developmental trauma, and hence are linked to the three categories of risk factors described above. An important systematic review of the literature on the epidemiology of schizophrenia (McGrath et al. 2004) found a lifetime prevalence rate of 0.5–1%. Rates varied across the dimensions Van Os et al. suggested, as well as by gender: schizophrenia was more common in males than in females.

Kessler et al. (2009) studied the prevalence rates reported in the first 17 World Mental Health Surveys and found lifetime prevalence estimates of any DSM-IV disorder to be 18.1–36.1%. These were the highest in Colombia, France, New Zealand, Mexico, the Netherlands, and South Africa and the lowest in China and Nigeria. Kessler et al. (2009) commented that the low prevalence rates in the last two countries may be downwardly biased. Anxiety disorders were consistently found to be the most prevalent class of mental disorder in the general population (16%), again higher in Western developed countries. Mood disorders were also very common (12%) and were also mainly reported in Western countries. Kessler et al. suggested that one reason for the possible underestimation of prevalence rates in some countries may be that the DSM categories are less relevant to symptom expression in some countries than in others.

Direct and Indirect Costs
The latest data from the Global Burden of Disease study (2013) estimate that mental and behavioral disorders accounted for 198.3 million disability-adjusted life years (DALYs), with unipolar depressive disorders accounting for 37.8% of them. Anxiety disorders were the second biggest contributor, at 13.6% of mental health DALYs. The WHO (2013) ranked mental and behavioral disorders the sixth leading cause of DALYs worldwide for 2011, surpassing respiratory diseases, neurological and sense organ conditions, musculoskeletal diseases, and endocrine, blood, immune disorders and diabetes.

Importantly, the burden of mental and substance abuse disorders has increased significantly since 1990. However, almost a third of countries, surprisingly perhaps, still do not have a designated budget for mental health; and 21% of the countries that do have a specific mental health budget spend less than 1% of their total health budgets on mental health (World Health Organization 2008).

People with mental health problems experience high rates of unemployment. For example, in OECD countries, people with mental health problems are, depending on level of severity, from two to three times up to six to seven times more likely to be unemployed than people without such conditions (OECD 2012). One reason for this difference is that illness can make it difficult to perform a job, but perhaps bigger problems are stigma and discrimination.

People with a history of mental health problems still face problems in the open employment market, including stigma and a reluctance from employers to give them a job (McDaid 2008). The
fact that some people with mental health problems receive social security benefits may also hinder their chances of seeking and obtaining employment (OECD 2011).

Stigma
Stigma can be a “mark of disgrace associated with a particular circumstance, quality or person” (Oxford Dictionaries 2010), yet it is no longer physical or bodily in nature (Goffman 1963; Wahl 1999); it is now viewed as personal, psychological, and social. People are no longer physically branded but labeled by society as poor, homosexual, criminal, or, in this case, mentally ill. These labels influence perceptions and behaviors and lead to the devaluation and denigration of those who are so labeled (Thornicroft 2007).

Research on stigma in mental health has largely relied on attitude surveys and has been descriptive; very few studies have investigated this aspect from the standpoint of a person with mental illness. Discrimination against people with mental health problems remains consistent in different parts of the world (Thornicroft et al. 2009). For example, in Ethiopia, key informants were asked about their perceptions of several different health and mental health conditions: they judged schizophrenia to be the most severe, and mental illness was frequently associated with talkativeness, aggression, and strange behavior (Alem et al. 1999). In the Arab world, there is much stigma associated with mental health services (Al-Krenawi and Graham 2000; Savaya 1995). Thornicroft et al. (2009) have reported high and consistent rates of experienced discrimination among people with schizophrenia across countries of various income levels. A cross-sectional survey conducted in 27 countries using face-to-face interviews with 732 participants with schizophrenia found that across all countries, the most common areas of negative experienced discrimination were seeking or maintaining friendships, discrimination by relatives, finding and keeping a job, and intimate or sexual relationships. Examining self-stigma and discrimination across several European countries, Brohan et al. (2010) found that people with schizophrenia reported the highest self-stigma scores and perceived discrimination in Greece and the lowest empowerment scores in Ukraine. In this case, empowerment and increased social contact were significantly associated with reduced self-stigma scores.

The greatest barriers to social inclusion for people with mental health problems are said to be stigma and discrimination (Baldwin and Marcus 2011; Social Exclusion Unit 2004). Indeed, misconceptions about mental health can also lead to the belief that these diseases are untreatable and that people who have them are not valued members of their communities, subsequently leading to appropriate support and resources not being delivered (Funk et al. 2012).

Lack of access to proper judicial mechanisms that would protect their rights (World Health Organization 2005b) means that people with mental health problems may often experience human rights violations in the community (Drew et al. 2005; Funk et al. 2005), and sometimes major life-changing decisions are made on their behalf with regard to housing or treatment, for example (World Health Organization 2005b). Where institutions still exist, living conditions are poor and present risks to people’s physical health (Drew et al. 2011). Sharma (1999), reporting on the state of mental hospitals, found that many had undergone no structural transformations after previously having been jails. Other hospitals were at risk of serious overcrowding as single-person cells were used to house several patients. In others, patients lacked any sanitary facilities and received inappropriate treatment.

Public education campaigns have produced mixed results (Thornicroft 2007, p. 244) and have perhaps been confined to certain countries. A concerted program in Australia called “beyondblue” aimed at conveying accurate information about depression, and its initial evaluations showed a series of benefits, including better community recognition of people with depression, reforms in life insurance and income protection, as well as intervention programs in schools (Ellis et al. 2002).

Combatting mental health stigma has been at the forefront of mental health policy in England.
The Time to Change campaign, led by two mental health charities, promoted public mental health awareness. A study measuring its efficacy (Evans-Lacko et al. 2013c) suggested that its marketing tools, which promoted social contact between members of the public and people with mental health problems, had positive outcomes on social stigma, and perhaps more so on behavior and attitudes than on knowledge (Evans-Lacko et al. 2013a). Smith (2013) commented that although the economic analysis seems to indicate benefits from the program, the assumptions used in the model seem to lead to uncertain conclusions: from a net cost to a benefit of £223 million.

Comorbidity
Comorbidity is common within the mental health population, as 30% of all people with a long-term health condition also have a mental health problem (Cimpean and Drake 2011). Other estimates have shown that people with long-term conditions were up to three times more likely to experience a mental health problem compared to the general population (Naylor et al. 2012). Although much of the evidence relates specifically to affective disorders such as depression and anxiety (Naylor et al. 2012), studies have shown higher rates of conditions such as asthma, arthritis, cancer, and HIV/AIDS (Chapman et al. 2005; Sederer et al. 2006) among people with mental health problems, compared to people without. There are also risk factors associated with the development of mild cognitive impairment or Alzheimer’s disease (Velayudhan et al. 2010); studies in Japan (Ohara et al. 2011) and Sweden (Xu et al. 2009) have also shown associations between diabetes and dementia.

In England, recent policy has focused on the association between poor health and mental health (Department of Health 2011), given these considerable costs (it is estimated that between £8 and £13 billion of NHS spending in England is due to comorbid mental health problems and long-term conditions (Naylor et al. 2012)) and the burden to society (Department of Health 2011). Comorbidity is also associated with lower quality of life: utilizing data from the World Health Surveys, Moussavi et al. (2007) showed that people who suffered from depression as well as a long-term condition reported lower quality of life scores compared to people who only suffered from long-term conditions (Fig. 1).

Fig. 1 Prevalence of major depression in patients with physical illnesses (World Health Organization 2003a)
General population 10%; Tuberculosis 46%; HIV/AIDS 44%; Cancer 33%; Diabetes 27%; Stroke 31%; Epilepsy 30%; Myocardial infarction 22%; Hypertension 29%

Provision of Mental Health Care: How Is Care Delivered?

Who Delivers Care: Medical Professionals, Unpaid Caregivers
The occurrence of mental illness does not always imply a need for treatment (Bebbington 1990). Nevertheless, much like its determinants, the treatment can be categorized into biological (such as psychotropic drugs), psychological
(or psychotherapies), and psychosocial (like case management and family interventions) (Lehtinen et al. 2007, p. 128).

Antipsychotics control the production of dopamine, a main neurotransmitter in the brain, an excess of which may play a part in producing hallucinations, delusions, and thought disorder; hence they are mainly used for the treatment of schizophrenia. Older antipsychotics, such as haloperidol and chlorpromazine, depending on their dosage, have side effects which include stiffness and shakiness. In comparison, newer drugs, the most popular of which are clozapine and olanzapine, have side effects which include sleepiness and slowness, weight gain, sexual problems, increased risk of diabetes, and some risk of Parkinson’s disease; long-term use can produce movements of the face and, rarely, of the arms and legs. Both are administered in the form of a pill. Increasingly, the use of depot antipsychotics, where medication is given as an injection every 2–4 weeks, has become more common: medication is hence released slowly over the course of time. Depots are usually administered at the local GP surgery, at a community mental health center, at a special outpatient clinic, or by a nurse at home (Royal College of Psychiatrists Public Education Editorial Board 2012).

Antidepressants are also frequently administered; their main effect is to increase the amount of serotonin and/or noradrenaline in the brain (Lehtinen et al. 2007, p. 131). Because of their potentially adverse side effects, “new” antidepressants were introduced in the 1990s to curb these. Overall, the uptake of antidepressants has been on the rise in recent decades.

Cross-country variations are also apparent, with the USA leading the pack in terms of drug prescribing. More recent data show a continuation of this upward trend, especially with regard to antidepressants in the USA (Olfson and Marcus 2009), New Zealand (Exeter et al. 2009), and Italy (Deambrosis et al. 2010) as well as antipsychotics in various countries (Verdoux et al. 2010). A study of the trends of antipsychotic prescribing in the USA for anxiety disorders among a representative sample of visits to office-based psychiatrists (Comer et al. 2011) found an increase from 10.6% (1996–1999) to 21.3% (2004–2007). Data from England showed annual increases in psychiatric drug prescriptions from 1998 to 2010 of 6.8% on average: antidepressant prescriptions rose by 10% per year, while antipsychotics grew by 5.1%.

Antipsychotics overtook antidepressants as the most costly class of psychiatric drug, with costs rising by 22% (Ilyas and Moncrieff 2012). Similarly, data from the USA in recent years show that antipsychotics, antidepressants, and drugs for attention deficit hyperactivity disorder have been consistently ranked among the most expensive prescription drugs (IMS Health 2010).

The first point of contact for mental health care in many countries is usually primary health care, and a majority of countries allow primary health care (PHC) doctors to prescribe and/or continue prescribing medicines for mental and behavioral disorders either without restrictions (56%) or with some legal restrictions (40%), such as allowing prescriptions only in certain categories of medicines or only in emergency settings. In other cases, psychiatrists or neurologists would take responsibility for prescribing for patients with more severe or treatment-resistant symptoms. Only 3% of respondent countries in a WHO survey did not allow any form of prescription by PHC doctors (World Health Organization 2011).

Treatment, care, and support for people with mental health problems are managed by primary, secondary (and tertiary) health-care settings, with much treatment and care delivered in the community by non-medics. The most comprehensive form of mental health care, which comprises a balance between hospital and community-based services, has only been achieved in a few high-income countries (Saxena et al. 2007). Only half of the countries in Africa, the eastern Mediterranean, and southeast Asia provide community-based care (World Health Organization 2005). Within-country differences also exist in terms of the availability of community-based care: this type of care is restricted to only a few areas in China, India, Paraguay, and Zambia. In general, about 52% of low-income countries and about 97% of high-income countries provide
community-based care (Saxena et al. 2007; World Health Organization 2005).

Hospital inpatient beds were the mainstay of mental health provision in many high-income countries for many decades and remain crucially important, but in many countries, the specialist (institutional) asylums are being or have been closed.

The global median number of facilities per 100,000 population is 0.61 outpatient facilities, 0.05 day treatment facilities, 0.01 community residential facilities, and 0.04 mental hospitals. In terms of psychiatric beds in general hospitals, the global median is 1.4 beds per 100,000 population. Higher income countries typically have more facilities and higher admission/utilization rates.

Deinstitutionalization is the process of shifting the care and support for patients with mental illness from custodial asylums to community-based settings and saw its real beginnings in the USA and then in England in the 1970s (Shorter 1997). This period also saw a shift in treatment, which became demedicalized as non-physician specialists began to assume a role (Shorter 2007, pp. 21, 22).

In England, generally, studies have demonstrated that deinstitutionalization has had positive outcomes for service users (see, e.g., the TAPS studies). However, systematic data on the preferences and situations of people with mental health problems are largely missing, with no existing European overview (Anderson et al. 2007). Data from the UK show that although the majority of people with mental health problems live in mainstream housing (Boardman 2010; Social Exclusion Unit 2004), many live in residential care homes (Health and Social Care Information Centre 2013) or in supported housing services or in independent flats where they receive “floating support” (Centre for Housing Research 2013), which is support for a set number of hours a week within a person’s home.

Data from Priebe et al. (2005) show that in fact, in most of the nine selected European countries (Austria, Denmark, England, Germany, Republic of Ireland, Italy, the Netherlands, Spain, and Switzerland), the number of psychiatric hospital beds was generally on the decline between 2002 and 2006. On the other hand, forensic bed spaces have been on the increase (except in Ireland, Italy, and Switzerland), as well as places in supported and supportive housing (except in Ireland and Switzerland) and in prisons. More specifically, in Iceland, Italy, and Sweden, psychiatric hospitals no longer exist and care is provided in beds in general hospitals or in community-based facilities (Medeiros et al. 2008).

However, community-based residential services are not available in all countries. Turkey and most cantons in Switzerland do not possess such facilities.

Deinstitutionalization is advancing at different paces in different countries, mainly due to national traditions and the sociocultural context, the availability of resources, as well as financial incentives (Fakhoury and Priebe 2002). Within Europe, the rates of the closure of asylums have been uneven between countries, and sometimes gaps have been reported between the closure of institutions and the provision of alternative services (Medeiros et al. 2008).

Research conducted in 2000 has shown that, for example, in Asia, and specifically in Japan (Kuno and Asukai 2000) and Hong Kong (Yip 2000), deinstitutionalization has yet to occur. In Japan, Kuno and Asukai (2000) comment that deinstitutionalization is unlikely to happen in the near future since people with mental health problems are not valued as members of society. More recently in Japan, the Sasagawa Project (Mizuno et al. 2005) aimed to move people with mental health problems from hospital to community living; this project claimed to be the first of its kind in the country. The study on the closure of Sasagawa hospital and the subsequent relocation of patients into Sasagawa “village” reported positive outcomes; however, there is much to say about the segregation of people with mental health problems.

Deinstitutionalization is currently under way in several South American countries (Larrobla and Botega 2000). In Australia, Moxham and Pegg (2000) commented that the shift to community care was not met with systematic and adequate planning and the delivery of appropriate housing services or placements.
Recently, a new project, EMERALD, was launched to improve mental health outcomes within health systems and to identify potential barriers to doing so, specifically in low- and middle-income countries (EMERALD 2014); results of this program have yet to be disseminated.

Financing Mental Health Services: How Is Care Financed?
Though the cost of poor mental health has been estimated at between 3% and 4% of GDP in many European countries, no country dedicates a proportionate level of resources to treating mental health disorders (Gabriel and Liimatainen 2000). Just over two thirds of countries across the world have a budget that is specifically dedicated to mental health, and many countries spend less than 1% of their total health budget on mental health-care services (Thornicroft and Maingay 2002). According to the 2005 WHO Mental Health Atlas, South East Asia had the highest proportion of countries with a specified budget for mental health care (90%); the Western Pacific had the lowest proportion of countries (59%). European countries often allocate funds specifically for mental health, despite not always having a specific line item within their national budgets (World Health Organization 2005a).

Generally, mental health budget information is scarce in low-income countries (Raja et al. 2010). A study by Raja et al. (2010) found, however, that national ring-fenced budgets for mental health as a percentage of national health spending for 2007–2008 were less than 4% in Sri Lanka, Ghana, and Kerala (India) and less than 7% in Uganda. Even in countries that dedicate substantial resources to mental health, coverage for mental health-care services may be more limited than for other health-care services (Knapp et al. 2007).

Worldwide, government funds such as those generated by taxes are the most common source of mental health financing (World Health Organization 2003b). In countries where the government pays for the bulk of mental health care, care is often financed in a similar way to general health care in that particular country. Out-of-pocket payments are also an important source of funding for mental health care in some countries, particularly outside of Europe. Even so, nearly half of western European countries levy user charges for specialist mental health-care services, even within their publicly funded system (Knapp et al. 2006).

Generally, voluntary health insurance does not play a major role in funding mental health care. However, in some countries like the UK and Germany, there has been some expansion of mental health-care coverage within voluntary health insurance (Knapp 2007). In the USA there have also been recent efforts to ensure that private health insurers cover mental health conditions no differently than they cover physical conditions.

Naturally, each country allocates different levels of funding to the treatment of mental health. Historically, spending has been directed toward psychiatric hospitals; for example, three quarters of mental health spending in Sri Lanka, Ghana, Kerala (India), and Uganda was on psychiatric hospital care. Recently though, there have been shifts in many countries toward allocating funds to community-based services as opposed to psychiatric hospitals. As a result of this move into the realm of social and community care, in some cases there has also been a trend to shift mental health-care funding away from health budgets and onto social protection budgets. This intersectoral approach to financing mental health care is not exclusive to high-income countries; for example, according to the WHO, the Burundi Ministry of Finance requested a social sector loan from the World Bank for work on early childhood development, which had an explicit mental health component (World Health Organization 2003b).

Key Policy Dimensions/Recent Policies and Trends
Several key policy dimensions have dominated the global conversation on mental health.
Personalization and Empowerment

Service user involvement is becoming increasingly commonplace, through patient-centered care, shared decision-making, patient participation, as well as the recovery model (Storm and Edwards 2013).

Personalized care and services are said to empower individuals and improve their quality of life. A novel route to personalization is through “direct payments” or “personal budgets,” which have been introduced in a few countries, namely, England, Scotland, and the Netherlands (Knapp and McDaid 2007, p. 93), as well as the USA. In England more specifically, direct payments are “cash payments made to individuals who have been assessed as needing services, in lieu of social service provisions” (Department of Health 2008). They are aimed at giving recipients greater control over their own lives, enabling them to purchase services other than those provided by a local council, including novel solutions in terms of services and activities; the money a person receives is still decided on following an assessment of need. However, the uptake of direct payments among mental health service users has been slow, and it has been reported that they may have great difficulty of access, possibly due to a lack of awareness or, as reported by staff, difficulties in managing payments (Davey et al. 2007). Still, despite low uptake rates, there was great diversity in their use, ranging from support with regard to personal care and transport to everyday activities (Spandler and Vick 2004).

Individual budgets (subsequently called personal budgets) were later introduced in the UK, promising greater personalized purchasing and freedom in the selection of the chosen type of care and support (Department of Health 2006), and were to be delivered as a single transparent sum allocated to a person in their name and held on their behalf (like a bank account), allowing the individual to choose to take the funds either in cash (as a direct payment) or as a mixture of cash and services up to the value of their individual budget.

A Cochrane review of advance treatment directives (Campbell and Kisely 2009), which are documents that specify a person’s preferences for treatment should they lose the capacity to make such decisions in the future, found two RCTs which included 321 people with severe mental illness. Although concluding that too little data was available to make robust conclusions, the authors felt that more intensive forms of advance treatment directives seemed promising.

Recently, an RCT conducted in the UK (Thornicroft et al. 2013) aimed to test whether JCP (joint crisis planning) was associated with better outcomes among mental health service users, compared to a control group receiving treatment as usual. No significant differences were found with regard to the primary outcome measure, namely, compulsory admissions; however, a modest improvement was found with regard to the therapeutic relationship. Qualitative analyses revealed that some patients in the intervention group reported positive experience; however, there was concern among others about how clinical services struggled to put JCP into practice.

In the Netherlands, an RCT compared the quality aspects of crisis plans drawn up with the help of patient advocates with those drawn up by clinicians (Ruchlewska et al. 2012). The quality aspects checklist was devised specifically for this study and comprised four domains: (1) relapse indicators/daily functioning, (2) advance statements on what to do during a crisis, (3) medical information, and (4) information on contacts. The study concluded that, in terms of completeness and specificity, crisis plans drawn up by patient advocates were of better quality than those completed by clinicians.

Carer and Family Impact

The impact of mental health problems can be felt through several mediums, namely, mortality, suicide, and crime and also by family. Schizophrenia, for example, can have enormous personal consequences for people with the illness and their families, as well as tremendous economic
consequences for them, as well as for governments and society as a whole. For example, in England, the total cost of schizophrenia has been estimated at almost £12 billion, and this includes a cost to the public sector of more than £7 billion (Andrew et al. 2012). Many relatives or other unpaid carers of people with schizophrenia may give up employment (4.8% of carers) or take time off work (15.5% took a mean 12.5 days off) in order to provide care and support. In economic terms, this translates into a loss of £517 (in 2011/2012 prices) per individual with schizophrenia living in a household (Mangalore and Knapp 2007).

The WHO has emphasized that more support is required for unpaid (sometimes called informal) carers, as usually their expenses as well as their opportunity costs (e.g., from lost employment) are not covered by the State or by insurance (World Health Organization 2003a). In addition to the emotional strain of caring, relatives can also be exposed to the stigma and discrimination associated with mental ill health. This in turn often translates into social isolation and exclusion from their communities, friends, and relatives.

Prevention, Promotion, Public Mental Health (e.g., Campaigning)

Given the huge psychological, economic, and societal burdens, much emphasis has been placed on the prevention and promotion of mental health (World Health Organization 2004). (Much of the discussion on prevention can be found in Jané-Llopis and Anderson (2007).) In addition to targeted interventions, the WHO distinguishes macro-strategies that may reduce risk and improve quality of life. These include improving nutrition (especially in impoverished countries); improving housing and its quality; improving access to education; reducing economic insecurity; strengthening community networks through, for example, the Communities That Care program, already in force in the USA, the Netherlands, the UK, and Australia (Hawkins et al. 2002); and reducing the harm from addictive substances, through interventions such as taxes and bans on underage drinking. For example, a comprehensive anti-smoking campaign can reduce smoking by up to 6% (Saffer 2000).

More specifically, Jané-Llopis and Anderson (2007, pp. 191–192) carefully lay out an integrated policy framework for the promotion of mental health and the prevention of mental disorders. These are subdivided by age categories (childhood and adolescence, adulthood, and older groups) as well as by type of approach, whether public or mental health policy. Starting with fetal development, it is important to raise awareness among expectant mothers of the risk of substance use during pregnancy; for example, smoking while pregnant doubles the risk of lower birth weight. Educational programs in some countries to help pregnant women cease smoking have had immediate and long-term mental health gains for infants (Institute of Medicine 2001). Other interventions during childhood include parenting interventions. These target basic reading skills or other parenting skills and are said to improve literacy as well as emotional and language growth (Jané-Llopis and Anderson 2007, p. 193). Indeed, poor school performance increases the risk of social and mental health problems. School prevention programs involve general cognitive, problem-solving, and social skill-building, resulting in 50% reductions in depressive symptoms (Greenberg et al. 2001). Unfortunately, however, most low-income countries lack appropriate child and adolescent mental health services (Patel et al. 2008).

Funk et al. (2012) also focused on similar aspects to those of the WHO (2010) with regard to mental health interventions to improve development, by employing targeted poverty-alleviation programs in order to break the cycle between mental illness and poverty.

Funk et al. (2012) discuss many interventions, including pharmacological, psychosocial, and care-management strategies for schizophrenia, depression, alcohol misuse, epilepsy, and suicide prevention that have been associated with positive outcomes across the world, regardless of wealth. Suicide prevention should be highlighted through comprehensive public health programs and should at least comprise the following
interventions in low- and middle-income countries (LMICs): reducing the access to means for suicide, responsible and deglamorized media reporting, and early identification and treatment of people with mental and substance use disorders.

An important point to consider in working-age adults is employment and associated stress factors that may lead to anxiety, depression, or stress-related problems. Interventions to improve mental health in the workplace have centered on task and technical interventions (e.g., lowering workload or ergonomic improvements) and clarifying job role expectations as well as improving social environment (e.g., conflict resolution) (Price and Kompier 2006). There is now evidence that many of these prevention and promotion initiatives can be not only effective but also cost-effective. Andrew et al. (2012) assessed the various interventions in schizophrenia in terms of effectiveness and cost-effectiveness. One intervention for which the authors found strong evidence of cost-effectiveness was individual placement and support, which aims to help people with schizophrenia find competitive employment.

Aging and Dementia

With the world population aging rapidly, and people living longer, the prevalence rate of age-related disorders is increasing. One such disorder is dementia, which often has an overwhelming effect on the individual with the illness, their family, and society more generally, prompting the WHO to promote it as a major public health priority (World Health Organization 2012). Dementia is a chronic and progressive syndrome, caused by a variety of brain illnesses and affecting memory, thinking, behavior, and ability to perform everyday activities. The latest figures from the WHO (2012) estimated the total number of people with dementia worldwide in 2010 to be 35.6 million, and this number is projected to nearly double every 20 years, to 65.7 million in 2030 and 115.4 million in 2050. The worldwide annual incidence of dementia is nearly 7.7 million new cases, implying one new case every 4 s.

Total estimated worldwide costs of dementia were US$ 604 billion in 2010. In high-income countries, informal care (45%) and formal social care (40%) make up the majority of costs, in comparison to direct medical costs (15%), which are much lower. In low-income and lower-middle-income countries, direct social care costs are small, and the costs of unpaid care provided by the family dominate (World Health Organization 2012). Given the expected growth over the coming decades in the number of people with dementia, the costs of supporting and treating them can also be expected to increase rapidly. For example, a study comparing future dementia costs in Italy, Spain, the UK, and Germany suggested that the proportion of GDP spent on long-term care would more than double between 2000 and 2050 (Comas-Herrera et al. 2006).

These projected future trends have prompted much discussion and also some real action across many countries. One of the first countries to develop such a plan was Canada in 1999, and their “Alzheimer Strategy – Preparing for our future” runs until 2014. A good example of an integrated action plan for dementia comes from France, which was one of the first European countries to launch such a program (in 2008). Based on 44 measures to combat dementia and related disorders (République Française 2013), the key aims are to improve diagnosis, to provide better treatment and support through establishing “coordinators” throughout the country and through encouraging treatment at home by skilled support staff, and to provide more effective help through developing and diversifying respite structure and through the use of technology (such as a telephone line or a website). A final aim was to create a foundation for scientific cooperation to stimulate and coordinate research through memory clinics and diagnostics centers, with a lesser reliance on antipsychotic drugs (République Française 2008). It also aimed to change the way dementia is viewed, by raising awareness at the national and international level. The plan pledged 1.6 billion Euros over this period.
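
The WHO (2012) dementia figures quoted above (35.6, 65.7, and 115.4 million people; nearly 7.7 million new cases a year) can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, a 365-day year is assumed, and the doubling time assumes constant exponential growth between the quoted years, which the source does not state.

```python
import math

# Figures quoted in the text (World Health Organization 2012).
PREVALENCE_MILLIONS = {2010: 35.6, 2030: 65.7, 2050: 115.4}
NEW_CASES_PER_YEAR = 7.7e6

# Assumption: a 365-day year (leap years ignored).
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def seconds_between_cases(cases_per_year):
    """Average time in seconds between two new cases."""
    return SECONDS_PER_YEAR / cases_per_year


def implied_doubling_time(start_value, end_value, years):
    """Doubling time in years, assuming constant exponential growth
    from start_value to end_value over the given number of years."""
    growth_rate = math.log(end_value / start_value) / years
    return math.log(2) / growth_rate


interval = seconds_between_cases(NEW_CASES_PER_YEAR)
doubling_2010_2030 = implied_doubling_time(
    PREVALENCE_MILLIONS[2010], PREVALENCE_MILLIONS[2030], 20
)
doubling_2030_2050 = implied_doubling_time(
    PREVALENCE_MILLIONS[2030], PREVALENCE_MILLIONS[2050], 20
)

print(f"One new case roughly every {interval:.1f} s")
print(f"Implied doubling time 2010-2030: {doubling_2010_2030:.1f} years")
print(f"Implied doubling time 2030-2050: {doubling_2030_2050:.1f} years")
```

Under these assumptions the interval between new cases comes out at about 4 s, consistent with the text, while the implied doubling times are somewhat longer than 20 years, which matches the wording "nearly double every 20 years."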
More recently (December 2013), the Health Ministers from the G8 countries met in London for a Dementia Summit, following which they jointly issued a declaration and communiqué, spelling out clearly the challenges so often experienced by family and other carers of people with dementia and the need for action. Further joint action is planned to tackle what has become a major global mental health challenge.

Employment

Previous studies have shown the enhancing effect employment can have within the mental health population. However, poor-quality jobs can be detrimental to mental health. This is problematic because people with mental health problems often find themselves in low-skilled jobs, which can add strain to their emotional well-being, as well as not being suited to their needs and preferences (OECD 2011).

However, despite these gains in outcomes, employment rates among people with mental health problems vary by diagnosis severity and are relatively low compared to the general population.

People with a history of mental health problems face difficulties in the open employment market, including stigma and a reluctance from employers to give them a job (Manning and White 1995), with some even alluding to their perceived risk of violence (Roberts et al. 2004). A recent study using Eurobarometer surveys of 2006 and 2010 has demonstrated that the economic crisis has widened the gap even more in terms of unemployment rates between people with and without mental health problems; those who were particularly affected were men and people with lower educational attainment (Evans-Lacko et al. 2013b). Additionally, people living within countries with higher levels of stigmatizing attitudes toward people with mental health problems were particularly vulnerable to unemployment in 2010.

Nevertheless, it is generally agreed that motivation to work has a significant influence on whether people with severe mental illness gain competitive employment (Catty et al. 2008); antipsychotic medication also plays a role here as
Figure: Employment/population ratio (employed people as a proportion of the working-age population), by severity of mental disorder (severe disorder, moderate disorder, no disorder), ten OECD countries, latest available year (late 2000s) (OECD 2011)
well. Another reported and important disincentive to work is a risk of losing entitlements. Innovative solutions have come from OECD countries. For example, in Luxembourg, people who are on benefits and find a job receive a permanent payment to supplement any loss in earnings (OECD 2010).

Previous studies have found that people with mental health problems are generally interested in pursuing employment opportunities (Grove 1999; Secker et al. 2001) but felt that their mental health was an important barrier to doing so (Bond et al. 1997; Sainsbury Centre for Mental Health 2004; Secker et al. 2001).

Most people with mental health problems can achieve competitive employment (Bond et al. 1997). Studies of supported employment schemes in the USA have shown that employment may lead to improvements in outcomes, in terms of mental health treatment (Cuyun Carter et al. 2011), through increasing self-esteem and improving quality of life (Van Dongen 1996), as well as alleviating psychiatric symptoms and reducing dependency (Cook and Razzano 2000). A Cochrane review of vocational rehabilitation found supported employment schemes to be more effective than various types of prevocational training in obtaining and maintaining employment (Crowther et al. 2001). An evidence-based refinement of the supported employment approach, individual placement and support (IPS), has had positive outcomes in the UK and the USA (Leff and Warner 2006, p. 134). A report commissioned by the cross-government Health Work and Wellbeing Programme on mental health and the workplace (Lelliott et al. 2008) concluded that IPS has the strongest evidence base of interventions aimed at helping people with severe mental illness back into employment. However, the authors added that IPS can only be beneficial among people who believed they were ready for paid employment. Other limitations included the fact that most people ended up in mostly part-time, entry-level jobs; the long-term outcomes and economic benefits are unknown.

Individual Placement and Support (IPS)

IPS is based on an integrated approach to seeking and maintaining employment. It starts with the principle that no mental health service users are excluded due to a poor previous work history, lack of “work-readiness,” frequent hospital admissions, or apparent symptoms. Vocational programs should be integrated as part of the mental health agency or team. The achievement of competitive employment, one that also takes into consideration preferences, work and education history (if applicable), strengths, and weaknesses, is key. Rapid job search and placement is preferred to prior assessment, skills training, and vocational counseling. The most valuable assessment is made after obtaining employment, as well as providing support and services for a sustained period of time. Job coaches should also be used to help service users understand the complex rules governing disability benefits and to assist them in making the best employment decisions (Bond 1998).

The OECD (2011) has also recently highlighted the promotion of good mental health in the workplace, as well as its linkages to well-being and productivity. The OECD reports that most people with mental ill health are in work yet that productivity losses are extensive. These may include short- and long-term absenteeism, early retirement, reduced employment opportunities, presenteeism, days out-of-role, and reduced lifetime productivity due to premature mortality (McDaid et al. 2008). Data from the OECD show sharp increases in absenteeism and presenteeism with poorer mental health. So far the OECD has released several country reports, with others due in 2014. For example, in Sweden, Denmark, Norway, and Belgium, a common recommendation was to tackle the issue of possible future unemployment from an early age, during the school years, and to minimize dropouts, as well as focus on cases that were deemed at risk.
Another commonality was to tackle the issue of mental ill health within the workplace in a systematic and rapid fashion, before potential absenteeism or job loss.

Other techniques may involve cognitive behavioral therapy within the workplace, and several US studies demonstrated positive outcomes, in terms of better mental health outcomes and higher rates of job retention (Wang et al. 2006, 2007). In France, Électricité de France and Gaz de France are major companies and employers that implemented a program, “APRAND” (Action de Prévention des Rechutes des troubles Anxieux et Dépressifs), which focused mostly on prevention, by encouraging company health physicians, primary care doctors, and social workers to identify anxiety and depression problems early among employees. The implementation of preventative activities among people on long-term sick leave who had been identified as having anxiety problems had better outcomes in terms of recovery or remission compared to people who received regular treatment (Godard et al. 2006).

A recent report released by the OECD (2014) stated that, in an international comparison, the UK is among the most advanced countries in terms of awareness of the costs of mental illness for society and the benefits that employment may bring to people with mental health problems. The OECD recognized that stricter eligibility criteria for benefits as well as large-scale reassessments were steps in the right direction. The report also stated some key recommendations, which focused on prevention (and early intervention to avoid sickness benefit becoming a disability benefit) and the expansion of psychological therapies for people with CMDs, more awareness of employability within the benefits system, and building on the better integration of health and employment services. Other recommendations include investing in labor markets more generally, to be able to provide better support for people with mental health problems, and focusing on outcome payments for employers in order to promote better employment outcomes.

New Advancements in Treatments and Technologies

Innovative and cost-effective treatments and technologies with regard to mental health have started to be developed, as traditional services struggle to cope with the growing demand and variety of needs of their users, at least in the UK (Limb 2012). Video games, online and social network tools, as well as mobile phone apps have been quick to respond. The internet not only provides greater control and power over services (Gournay 2000) but can also be a powerful therapeutic tool (Foroushani et al. 2011) and a way for service users to voice their opinions. For instance, an online system for people with eating disorders in Germany (Moessner and Bauer 2012), which makes use of e-mailing and forums, generated high user satisfaction and, more importantly, increased the probability of users seeking professional help. In the UK, a Google search for “suicide” generates links for the Samaritans and other support groups first (Wicks 2012). Otherwise, PlayMancer is an EU initiative to develop a video game prototype to treat specific mental health problems, such as impulse control disorders. This game uses biofeedback to help people learn relaxation skills and develop better self-control and emotion regulation strategies. Initial results are positive, and patients have started to show new coping styles and more self-control (Fernandez-Aranda et al. 2012). In England, the NHS has developed a mobile phone app “My Journey” as part of their early interventions in psychosis, to monitor young persons’ mood, keep track of medication, set goals, and be a source of advice if needed (http://www.sabp.nhs.uk/eiip/app).

Discussion

Mental health problems present tremendous personal and financial burdens to the individual, to their carers, and to society as a whole. However, resources allocated by different countries to the care and support of these individuals vary. The WHO has put mental health promotion at the heart of its policy agenda; huge inequalities still remain.
Figure: Incidence of absenteeism and presenteeism (in percentage) and average absence duration (in days), by mental health status, average over 21 European OECD countries in 2010
Panel A. Sickness absence incidence: percentage of persons who have been absent from work in the past 4 weeks (apart from holidays). Severe disorder 42, moderate disorder 28, no mental disorder 19.
Panel B. Average duration of sickness absence: average number of days absent from work in the past 4 weeks (of those who have been absent). Severe disorder 7.3, moderate disorder 5.6, no mental disorder 4.8.
Panel C. Presenteeism incidence: percentage of workers not absent in the past 4 weeks but who accomplished less than they would like as a result of an emotional or physical health problem. Severe disorder 88, moderate disorder 69, no mental disorder 26.
Note: Averages are represented by dashed lines
Source: OECD calculations based on Eurobarometer (2010)
Stigma seemed to be pervasive across the board: people in low-income and maybe middle-income countries still experience basic human rights violations, while in high-income countries, policies are more focused on social inclusion and integration. Indeed, the promotion of personalized care and service user empowerment, although producing mixed results, is a step in the right direction.

Deinstitutionalization has yet to occur in many countries, although the process has been under way in some parts of Europe and the USA since the 1970s. Regardless of the financial resources and funding arrangements in place, perhaps stigma and discrimination do play a role in the continued existence of asylums and institutions, alluding to the so-called NIMBY phenomenon (Thornicroft 2007).

Mixed evidence exists with regard to anti-stigma campaigns, and it could be that a more integrated approach to mental health promotion should be adopted, with a focus on prevention as well.

E-technologies may prove to be innovative and, perhaps more importantly, cost-effective solutions but should still be regarded as complementary therapies.

References

Alem A, Jacobsson L, Araya M, Kebede D, Kullgren G. How are mental disorders seen and where is help sought in a rural Ethiopian community? A key informant study in Butajira, Ethiopia. Acta Psychiatr Scand. 1999;397:40–7.
Al-Krenawi A, Graham JR. Culturally sensitive social work practice with Arab clients in mental health settings. Health Soc Work. 2000;25(1):9–22.
Anderson R, Wynne R, McDaid D. Housing and employment. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe. Maidenhead: Open University Press/McGraw-Hill Education; 2007.
Andrew A, Knapp M, McCrone P, Parsonage M, Trachtenberg M. Effective interventions in schizophrenia: the economic case: a report prepared for the Schizophrenia Commission. London: Rethink Mental Illness; 2012.
Angermeyer MC, Matschinger H. Reporting of isolated violent attacks by people with schizophrenia in the media changes attitudes towards people with mental illness. Soc Sci Med. 1996;43(12):1721–8.
Angermeyer MC, Matschinger H. Causal beliefs and attitudes to people with schizophrenia. Trend analysis based on data from two population surveys in Germany. Br J Psychiatry. 2005;186:331–4.
Ayuso-Mateos JL. Global burden of bipolar disorder in the year 2000. Geneva: WHO; 2000a.
Ayuso-Mateos JL. Global burden of obsessive-compulsive disorder in the year 2000. Geneva: WHO; 2000b.
Ayuso-Mateos JL. Global burden of panic disorder in the year 2000: version 1 estimates. Geneva: WHO; 2000c.
Ayuso-Mateos JL. Global burden of schizophrenia in the year 2000: version 1 estimates. Geneva: WHO; 2000d.
Baldwin ML, Marcus SC. Stigma, discrimination, and employment outcomes among persons with mental health disabilities. In: Schultz IZ, Rogers ES, editors. Work accommodation and retention in mental health. New York: Springer; 2011.
Bebbington P. Population surveys of psychiatric disorder and the need for treatment. Soc Psychiatry Psychiatr Epidemiol. 1990;25(1):33–40.
Bennett D. The value of work in psychiatric rehabilitation. Soc Psychiatry. 1970;5(4):224–30.
Boardman J. How are people with mental health problems excluded? In: Boardman J, Currie A, Killaspy H, Mezey G, editors. Social inclusion and mental health. London: Royal College of Psychiatrists; 2010.
Bond GR. Principles of individual placement and support. Psychiatr Rehabil J. 1998;27:345–59.
Bond GR, Drake RE, Mueser KT, Becker DR. An update on supported employment for people with severe mental illness. Psychiatr Serv. 1997;48(3):335–46.
Brohan E, Elgie R, Sartorius N, Thornicroft G, GAMIAN-Europe Study Group. Self-stigma, empowerment and perceived discrimination among people with schizophrenia in 14 European countries: the GAMIAN-Europe study. Schizophr Res. 2010;122(1–3):232–8.
Catty J, Lissouba P, White S, Becker T, Drake RE, Fioritti A, et al. Predictors of employment for people with severe mental illness: results of an international six-centre randomised controlled trial. Br J Psychiatry. 2008;192:224–31.
Centre for Housing Research. Supporting people. 2013. From https://supportingpeople.st-andrews.ac.uk/index.cfm
Chapman DP, Perry GS, Strine TW. The vital link between chronic disease and depressive disorders. Prev Chronic Dis. 2005;3(2):1–3.
Cimpean D, Drake RE. Treating co-morbid chronic medical conditions and anxiety/depression. Epidemiol Psychiatr Sci. 2011;20(2):141–50.
Comas-Herrera A, Wittenberg R, Costa-Font J, Gori C, Di Maio A, Patxot C, et al. Future long-term care expenditure in Germany, Spain, Italy and the United Kingdom. Ageing Soc. 2006;26:285–302.
Cook J, Razzano L. Vocational rehabilitation for persons with schizophrenia: recent research and implications for practice. Schizophr Bull. 2000;26(1):87–103.
Crosby C, Barry M, Carter MF, Lowe CF. Psychiatric rehabilitation and community care: resettlement from a North Wales Hospital. Health Soc Care. 1993;1:355–63.
Crowther RE, Marshall M, Bond GR, Huxley P. Helping people with severe mental illness to obtain work: systematic review. Br Med J. 2001;322:204–8.
Cuyun Carter GB, Milton DR, Ascher-Svanum H, Faries DE. Sustained favorable long-term outcome in the treatment of schizophrenia: a 3-year prospective observational study. BMC Psychiatry. 2011;11:143.
Davey V, Fernández J-L, Knapp M, Vick N, Jolly D, Swift P, et al. Direct payments: a national survey of direct payments policy and practice. London: London School of Economics; 2007.
Deambrosis P, Chinellato A, Terrazzani G, Pullia G, Giusti P, Skaper SD, et al. Antidepressant drug prescribing patterns to outpatients of an Italian local health authority during the years 1998 to 2008. J Clin Psychopharmacol. 2010;30(2):212–5.
Department for Education and Employment. Towards full employment in a modern society. 2001.
Department of Health. Our health, our care, our say: a new direction for community services. 2006. Retrieved from http://webarchive.nationalarchives.gov.uk/+/www.dh.gov.uk/en/Publicationsandstatistics/Publications/PublicationsPolicyAndGuidance/DH_4127453
Department of Health. Direct payments. 2008. From http://www.dh.gov.uk/en/SocialCare/Socialcarereform/Personalisation/Directpayments/DH_080273
Department of Health. No health without mental health: a cross-government mental health outcomes strategy for people of all ages. 2011.
Drew N, Funk M, Pathare S, Swartz L. Mental health and human rights. In: Herrman H, Saxena S, Moodie R, editors. Promoting mental health: concepts, emerging evidence, practice. Geneva: World Health Organisation; 2005.
Ellis PM, Smith DA, beyond blue: the national depression initiative. Treating depression: the beyond blue guidelines for treating depression in primary care. “Not so much what you do but that you keep doing it”. Med J Aust. 2002;176(Suppl):S77–83.
EMERALD. EMERALD. 2014.
Eurobarometer. Mental health eurobarometer. 2010.
Evans-Lacko S, Malcolm E, West K, Rose D, London J, Rusch N, et al. Influence of Time to Change’s social marketing interventions on stigma in England 2009–2011. Br J Psychiatry Suppl. 2013c;55:s77–88.
Fakhoury W, Priebe S. The process of deinstitutionalisation: an international overview. Curr Opin Psychiatry. 2002;15:187–92.
Fakhoury W, Priebe S. Deinstitutionalization and reinstitutionalization: major changes in the provision of mental healthcare. Psychiatry. 2007;6(8):313–6.
Fernandez-Aranda F, Jimenez-Murcia S, Santamaria JJ, Gunnard K, Soto A, Kalapanidas E, et al. Video games as a complementary therapy tool in mental disorders: PlayMancer, a European multicentre study. J Ment Health. 2012;21(4):364–74.
Foroushani PS, Schneider J, Assareh N. Meta-review of the effectiveness of computerised CBT in treating depression. BMC Psychiatry. 2011;11:131.
Funk M, Saraceno B, Drew N. Global perspective on mental health policy and service development issues. In: Knapp M, McDaid D, Mossialos E, Thornicroft G, editors. Mental health policy and practice across Europe: the future direction of mental health care. Maidenhead: Open University Press; 2005.
Gabriel P, Liimatainen M-R. Mental health in the workplace. Geneva: International Labour Organisation; 2000.
Godard C, Chevalier A, Lecrubier Y, Lahon G. APRAND programme: an intervention to prevent relapses of anxiety and depressive disorders – first results of a medical health promotion intervention in a population of employees. Eur Psychiatry. 2006;21(7):451–9.
Goffman E. Stigma: notes on the management of spoiled identity. Englewood Cliffs: Prentice-Hall; 1963.
Gournay K. Commentaries and reflections on mental health nursing in the UK at the dawn of the new millennium: commentary 2. J Ment Health. 2000;9(6):621–3.
Greenberg MT, Domitrovich C, Bumbarger B. The prevention of mental disorders in school-aged children: current state of the field. Prev Treat. 2001;4.
Grove B. Mental health and employment. Shaping a new agenda. J Ment Health. 1999;8:131–40.
Hawkins JD, Catalano RF, Arthur MW. Promoting science-based prevention in communities. Addict Behav. 2002;27(6):951–76.
Evans-Lacko S, Henderson C, Thornicroft G. Public Health and Social Care Information Centre. Community
knowledge, attitudes and behaviour regarding care statistics, social services activity – England,
people with mental illness in England 2009–2012. 2011–12, Final release. 2013. From https://catalogue.
Br J Psychiatry. 2013a;202:51–7. ic.nhs.uk/publications/social-care/activity/comm-care-
Evans-Lacko S, Knapp M, McCrone P, Thornicroft G, soci-serv-act-eng-11-12-fin/comm-care-stat-eng-2011-
Mojtabai R. The mental health consequences of the 12-soci-serv-act-rep.pdf
recession: economic hardship and employment of peo- Institute of Medicine. Clearing the smoke.
ple with mental health problems in 27 European coun- Washington, DC: National Academy Press; 2001.
tries. [Research Support, Non-U.S. Gov’t]. PLoS One. Jané-Llopis E, Anderson P. A policy framework for the
2013b;8(7):e69792. promotion of mental health and the prevention of
mental disorders. In: Knapp M, McDaid D, Manning C, White PD. Attitudes of employers to the
Mossialos E, Thornicroft G, editors. Mental health mentally ill. Psychiatr Bull. 1995;19:541–3.
policy and practice across Europe. Maidenhead: Marrone J, Golowka E. If work makes people with mental
McGraw-Hill; 2007. illness sick, what do unemployment, poverty, and
Kessler RC, Aguilar-Gaxiola S, Alonso J, Chatterji S, Lee S, social isolation cause? Psychiatr Rehabil J. 2000;
Ormel J, et al. The global burden of mental disorders: an 23(2):187–93.
update from the WHO World Mental Health (WMH) McCourt CA. Life after hospital closure: users’ views
surveys. [Research Support, N.I.H., Extramural]. of living in residential ‘resettlement’ projects. A case
Epidemiol Psichiatr Soc. 2009;18(1): 23–33. study in consumer-led research. Health Expect.
Killaspy H. From the asylum to community care: learning 2000;3:192–202.
from experience. Br Med Bull. 2006;79:245–58. McDaid D, Knapp M, Medeiros H, the MHEEN Group.
Knapp M. Mental health policy and practice across Europe: Employment and mental health: assessing the eco-
the future direction of mental health care. Maidenhead: nomic impact and the case for intervention. London:
Open University Press; 2007. Personal Social Services Research Unit; 2008.
Knapp M, McDaid D. Financing and funding mental health Medeiros H, McDaid D, Knapp M, the MHEEN Group.
care services. In: Knapp M, McDaid D, Mossialos E, Shifting care from hospital to the community in
Thornicroft G, editors. Mental health policy and prac- Europe: economic challenges and opportunities.
tice across Europe. Maidenhead: McGraw-Hill/Open London: Personal Social Services Research Unit; 2008.
University Press; 2007. Mental Disability Advocacy Centre. Guardianship and
Knapp M, McDaid D, Amaddeo F. Financing arrange- human rights in Bulgaria: analysis of law, policy and
ments for mental health in Western Europe. J Ment practice. 2007a.
Health. 2006. Mental Disability Advocacy Centre. Guardianship and
Knapp M, McDaid D, Amaddeo F, Constantopoulos A, human rights in Russia: analysis of law, policy and
Oliveira MD, Salvador-Carulla L, Zechmeister I, the practice. 2007b.
MHEEN Group. Financing mental health care in Mindout for mental health. Working minds: making mental
Europe. J Ment Health. 2007;16(2):167–80. health your business. 2000.
Kohn R, Saxena S, Levav I, Saraceno B. The treatment Mizuno M, Sakuma K, Ryu Y, Munakata S, Takebayashi T,
gap in mental health care. [Research Support, Murakami M, et al. The Sasagawa project: a model for
Non-U.S. Gov’t Review]. Bull World Health Organ. deinstitutionalisation in Japan. Keio J Med. 2005;
2004;82(11):858–66. 54(2):95–101.
Kuno E, Asukai N. Efforts toward building a community- Moessner M, Bauer S. Online counselling for eating dis-
based mental health system in Japan. Int J Law orders: reaching an underserved population? J Ment
Psychiatry. 2000;23(3–4):361–73. Health. 2012;21(4):336–45.
Larrobla C, Botega NJ. Psychiatric care policies and Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V,
deinstitutionalization in South America. Actas Esp Ustun B. Depression, chronic diseases, and decrements
Psiquiatr. 2000;28(1):22–30. in health: results from the World Health Surveys.
Lawrie SM. Newspaper coverage of psychiatric and phys- Lancet. 2007;370(9590):851–8.
ical illness. Psychiatr Bull. 2000;24:104–6. Moxham LJ, Pegg SA. Permanent and stable housing
Leff J. The outcome for long-stay non-demented patients. for individuals living with a mental illness in the
In: Leff J, editor. Care in the community: illusion or community: a paradigm shift in attitude for mental
reality? London: Wiley; 1997. health nurses. Aust N Z J Ment Health Nurs.
Leff J, Warner R. Social inclusion of people with mental 2000;9(2):82–8.
illness. Cambridge: Cambridge University Press; 2006. National Centre for Social Research. Adult psychiatric
Leff J, Dayson D, Gooch C, Thornicroft G, Wills W. morbidity in England, 2007. Results of a household
Quality of life of long stay patients discharged from survey. 2007. Retrieved from http://www.ic.nhs.uk/
two psychiatric institutions. Psychiatr Serv. 1996; webfiles/publications/mental%20health/other%20men
47:62–7. tal%20health%20publications/Adult%20psychiatric%
Lehtinen V, Katschnig H, Kovess-Masféty V, Goldberg D. 20morbidity%2007/APMS%2007%20(FINAL)%20St
Mental health policy and practice across Europe. andard.pdf
Maidenhead: McGraw Hill; 2007. Naylor C, Parsonage M, McDaid D, Knapp M, Fossey M,
Lelliott P, Tulloch S, Boardman J, Harvey S, Henderson M, Galea A. Long-term conditions and mental health. The
Knapp M. Mental health and work. London: Royal cost of co-morbidities. London: The King’s Fund and
College of Psychiatrists; 2008. Centre for Mental Health; 2012.
Limb M. Digital technologies offer new ways to OECD. Sickness, disability and work: breaking the
tackle mental health problems. Br Med J. 2012;345: barriers. 2010.
e5163. OECD. Sick on the job. 2011. From http://www.oecd.org/
Mangalore R, Knapp M. Cost of schizophrenia in England. els/emp/sickonthejob2011.htm
[Research Support, Non-U.S. Gov’t]. J Ment Health OECD. Sick on the job? Myths and realities about mental
Policy Econ. 2007;10(1):23–41. health and work. OECD Publishing; 2012.
Ohara T, Doi Y, Ninomiya T, Hirakawa Y, Hata J, Iwaki T, incidence of schizophrenia in different cultures. A pre-
et al. Glucose tolerance status and risk of dementia liminary report on the initial evaluation phase of the
in the community: the Hisayama study. [Comparative WHO Collaborative Study on determinants of outcome
Study Research Support, Non-U.S. Gov’t]. Neurology. of severe mental disorders. Psychol Med. 1986;16(4):
2011;77(12):1126–34. 909–28.
Olfson M, Marcus SC. National patterns in antidepressant Savaya R. Attitudes towards family and marital counseling
medication treatment. [Comparative Study Research among Israeli Arab women. J Soc Serv Res. 1995;
Support, Non-U.S. Gov’t Research Support, 21(1):35–51.
U.S. Gov’t, P.H.S.]. Arch Gen Psychiatry. 2009;66(8): Secker J, Grove B, Seebohm P. Challenging barriers to
848–56. employment, training and education for mental health
Oxford Dictionaries. Stigma. Oxford Dictionaries. 2010. service users: the service user’s perspective. J Ment
From http://oxforddictionaries.com/definition/stigma Health. 2001;10(4):395–404.
Paykel ES, Hart D, Priest RG. Changes in public attitudes Sederer LI, Silver L, McVeigh KH, Levy J. Integrating
to depression during the Defeat Depression Campaign. care for medical and mental illnesses. [Comment].
[Research Support, Non-U.S. Gov’t]. Br J Psychiatry. Prev Chronic Dis. 2006;3(2):A33.
1998;173:519–22. Shepherd G. Institutional care and rehabilitation. London:
Price R, Kompier M. Work stress and unemployment: Longman; 1984.
risks, mechanisms, and prevention. In: Hosman C, Shepherd G, Muijen M, Dean R, Cooney M. Inside resi-
Jané-Llopis E, Saxena S, editors. Prevention of mental dential care. London: The Sainsbury Centre for Mental
disorders: effective strategies and policy options. Health; 1995.
Oxford: Oxford University Press; 2006. Social Exclusion Unit. Mental health and social
Priebe S, Badesconyi A, Fioritti A, Hansson L, Kilian R, exclusion social exclusion unit report. 2004. Retrieved
Torres-Gonzales F, et al. Reinstitutionalisation in men- from http://www.socialinclusion.org.uk/publications/
tal health care: comparison of data on service provision SEU.pdf
from six European countries. Br Med J. 2005; Spandler H, Vick N. Direct payments, independent living
330:123–6. and mental health. London: Health and Social Care
Raja S, Wood SK, de Menil V, Mannarath SC. Mapping Advisory Service; 2004.
mental health finances in Ghana, Uganda, Sri Lanka, Tansella M. Community psychiatry without mental
India and Lao PDR. Int J Ment Heal Syst. 2010;4:11. hospitals – the Italian experience: a review. J R Soc
Read J, Baker S. Not just sticks and stones: a survey of Med. 1986;79:664–9.
the discrimination experienced by people with mental Thornicroft G. Shunned. Discrimination against people
health problems. 1996. with mental illness. Oxford: Oxford University Press;
République Française. National plan for “Alzheimer and 2007.
related diseases” 2008–2012. 2008. Available from Thornicroft G, Bebbington PE. Deinstitutionalisation –
http://www.plan-alzheimer.gouv.fr/IMG/pdf/Plan_ from hospital closure to service development. Br
Alzheimer_2008-2012_uk.pdf J Psychiatry. 1989;155:739–53.
République Française. 44 measures in order to fight Thornicroft G, Maingay S. The global response to mental
Alzheimer’s disease and related disorders. 2013. From illness – an enormous health burden is increasingly
http://www.plan-alzheimer.gouv.fr/-44-measures-.html being recognised. Br Med J. 2002;325(7365):608–9.
Roberts S, Heaver C, Hill K, Rennison J, Stafford B, Thornicroft G, Brohan E, Rose D, Sartorius N, Leese M.
Howat N, et al. Disability in the workplace: employers Global pattern of experienced and anticipated discrim-
and service providers’ response to the Disability ination against people with schizophrenia: a cross-
Discrimination Act in 2003 and preparation for 2004 sectional survey. Lancet. 2009;373:408–15.
changes. 2004. Van Dongen CJ. Quality of life and self-esteem in
Royal College of Pscyhiatrists Public Education Editorial working and nonworking persons with mental illness.
Board. Antipsychotics. 2012. From http://www.rcpsyc Community Ment Health J. 1996;32(6):535–48.
h.ac.uk/mentalhealthinfo/treatments/antipsychoticmed Velayudhan L, Poppe M, Archer N, Proitsi P, Brown RG,
ication.aspx Lovestone S. Risk of developing dementia in people
Saffer H. Tobacco advertising and promotion. In: Jha P, with diabetes and mild cognitive impairment.
Chaloupka F, editors. Tobacco control in developing [Research Support, Non-U.S. Gov’t]. Br J Psychiatry.
countries. Oxford: Oxford Medical Publications; 2000. 2010;196(1):36–40.
p. 215–36. Verdoux H, Tournier M, Begaud B. Antipsychotic pre-
Sainsbury Centre for Mental Health. Briefing 27. Benefits scribing trends: a review of pharmaco-epidemiological
and work for people with mental health problems: a studies. [Review]. Acta Psychiatr Scand. 2010;
briefing for mental health workers. 2004. Retrieved 121(1):4–10.
from http://www.centreformentalhealth.org.uk/pdfs/ Wahl OF. Telling is risky business. New Brunswick:
briefing_27.pdf Rutgers University Press; 1999.
Sartorius N, Jablensky A, Korten A, Ernberg G, Anker M, Wang PS, Patrick A, Avorn J, Azocar F, Ludman E,
Cooper JE, et al. Early manifestations and first-contact McCulloch J, et al. The costs and benefits of enhanced
