Clinical Trials in Neurology
Design, Conduct, Analysis

Edited by
Bernard Ravina, MD, MS
Medical Director, Translational Neurology, Biogen Idec, Cambridge, MA, USA
Jeffrey Cummings, MD
Director, Cleveland Clinic Lou Ruvo Center for Brain Health in Nevada, Ohio, and Florida, USA
Michael P. McDermott, PhD

Professor of Biostatistics, and Professor of Neurology,
University of Rochester School of Medicine, Rochester, NY, USA
R. Michael Poole, MD, FACP

Head, CNS and Pain Innovative Medicine Unit, AstraZeneca PLC, Waltham, MA, USA
CAMBRIDGE UNIVERSITY PRESS
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521762595
© Cambridge University Press 2012
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2012
Printed in the United Kingdom at the University Press, Cambridge
A catalogue record for this publication is available from the British Library
Library of Congress Cataloguing in Publication data

Clinical trials in neurology : design, conduct, analysis / edited by Bernard Ravina ... [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 978-0-521-76259-5 (hardback)
1. Neurology – Research – Methodology. 2. Clinical trials. I. Ravina, Bernard.
RC337.C62 2012
616.80072′4–dc23 2012000303
ISBN 978-0-521-76259-5 Hardback
Cambridge University Press has no responsibility for the persistence or

accuracy of URLs for external or third-party internet websites referred to in
this publication, and does not guarantee that any content on such websites
is, or will remain, accurate or appropriate.
Every effort has been made in preparing this book to provide accurate and up-to-date information
which is in accord with accepted standards and practice at the time of publication. Although case
histories are drawn from actual cases, every effort has been made to disguise the identities of the
individuals involved. Nevertheless, the authors, editors and publishers can make no warranties that
the information contained herein is totally free from error, not least because clinical standards
are constantly changing through research and regulation. The authors, editors and publishers therefore
disclaim all liability for direct or consequential damages resulting from the use of material contained
in this book. Readers are strongly advised to pay careful attention to information provided by the
manufacturer of any drugs or equipment that they plan to use.
To J, Gers, Double Reh, and Bewds
Contents

List of contributors
Preface
Acknowledgements

Section 1. The role of clinical trials in therapy development

1 The impact of clinical trials in neurology
E. Ray Dorsey and S. Claiborne Johnston
2 The sequence of clinical development
R. Michael Poole
3 Unique challenges in the development of therapies for neurological disorders
Gilmore N. O’Neill

Section 2. Concepts in biostatistics and clinical measurement

4 Fundamentals of biostatistics
Judith Bebchuk and Janet Wittes
5 Bias and random error
Susan S. Ellenberg and Jacqueline A. French
6 Approaches to data analysis
William R. Clarke
7 Selecting outcome measures
Robert G. Holloway and Andrew D. Siderowf

Section 3. Special study designs and methods for data monitoring

8 Selection and futility designs
Bruce Levin
9 Adaptive design across stages of therapeutic development
Christopher S. Coffey
10 Crossover designs
Mary E. Putt
11 Two-period designs for evaluation of disease-modifying treatments
Michael P. McDermott
12 Enrichment designs
Kathryn M. Kellogg and John Markman
13 Non-inferiority trials
Rick Chappell
14 Monitoring of clinical trials: Interim monitoring, data monitoring committees, and group sequential methods
Rickey E. Carter and Robert F. Woolson
15 Clinical approaches to post-marketing drug safety assessment
Gerald J. Dal Pan

Section 4. Ethical issues

16 Ethics in clinical trials involving the central nervous system: Risk, benefit, justice, and integrity
Jonathan Kimmelman
17 The informed consent process: Compliance and beyond
Scott Y. H. Kim

Section 5. Regulatory perspectives

18 Evidentiary standards for neurological drugs and biologics approval
Russell Katz
19 Premarket review of neurological devices
Eric A. Mann and Peter G. Como

Section 6. Clinical trials in common neurological disorders

20 Parkinson’s disease
Karl Kieburtz and Jordan Elm
21 Alzheimer’s disease
Joshua D. Grill and Jeffrey Cummings
22 Acute ischemic stroke
Devin L. Brown, Karen C. Johnston, and Yuko Y. Palesch
23 Multiple sclerosis
Richard A. Rudick, Elizabeth Fisher, and Gary R. Cutter
24 Amyotrophic lateral sclerosis
Nazem Atassi, David Schoenfeld, and Merit Cudkowicz
25 Epilepsy
John R. Pollard, Susan S. Ellenberg, and Jacqueline A. French
26 Insomnia
Michael E. Yurcheshen, Changyong Feng, and J. Todd Arnedt

Section 7. Clinical trial planning and implementation

27 Clinical trial planning: An academic and industry perspective
Cornelia L. Kamp and Jean-Michel Germain
28 Clinical trial implementation, analysis, and reporting: An academic and industry perspective
Cornelia L. Kamp and Jean-Michel Germain
29 Academic-industry collaborations and compliance issues
D. Troy Morgan

Index
Contributors

J. Todd Arnedt, PhD
Assistant Professor, Departments of Psychiatry and Neurology; Director, Behavioral Sleep Medicine Program, University of Michigan Health System, Ann Arbor, MI, USA

Nazem Atassi, MD, MMSc
Instructor in Neurology, Harvard Medical School and Massachusetts General Hospital, Boston, MA, USA

Judith Bebchuk, ScD
Statistical Scientist, Statistics Collaborative Inc., Washington, DC, USA

Devin L. Brown, MD, MS
Associate Professor, Department of Neurology, University of Michigan Health System, Ann Arbor, MI, USA

Rickey E. Carter, PhD
Associate Professor of Biostatistics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA

Rick Chappell, PhD
Professor, Department of Biostatistics and Medical Informatics, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA

William R. Clarke, PhD
Professor of Biostatistics, The University of Iowa, Iowa City, IA, USA

Christopher S. Coffey, PhD
Professor, Department of Biostatistics; Director, Clinical Trials Statistical and Data Management Center, The University of Iowa, Iowa City, IA, USA

Peter G. Como, PhD
Lead Reviewer/Neuropsychologist, Center for Devices and Radiological Health, Division of Ophthalmic, Neurological, and ENT Devices, US Food and Drug Administration, Silver Spring, MD, USA

Merit Cudkowicz, MD, MMSc
Professor of Neurology, Harvard Medical School and Massachusetts General Hospital, Boston, MA, USA

Jeffrey Cummings, MD
Director, Cleveland Clinic Lou Ruvo Center for Brain Health in Nevada, Ohio, and Florida, USA

Gary R. Cutter, PhD
Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL, USA

Gerald J. Dal Pan, MD, MHS
Director, Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA

E. Ray Dorsey, MD, MBA
Associate Professor of Neurology, Johns Hopkins University School of Medicine, Baltimore, MD, USA

Susan S. Ellenberg, PhD
Professor of Biostatistics, Center for Clinical Epidemiology and Biostatistics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA

Jordan Elm, PhD
Research Assistant Professor, Department of Biostatistics, Medical University of South Carolina, Charleston, SC, USA

Changyong Feng, PhD
Assistant Professor of Biostatistics, Department of Biostatistics and Computational Biology, University of Rochester School of Medicine, Rochester, NY, USA

Elizabeth Fisher, PhD
Department of Biomedical Engineering, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA

Jacqueline A. French, MD
Director, Epilepsy Study Consortium, Department of Neurology, NYU Langone Medical Center, New York, NY, USA

Jean-Michel Germain, PhD
Global Trial Director, Wyeth Pharmaceuticals France, a Division of Pfizer Inc., Collegeville, PA, USA

Joshua D. Grill, PhD
Mary S. Easton Center for Alzheimer’s Disease Research; Katherine and Benjamin Kagan Alzheimer’s Disease Treatment Development Program, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA

Robert G. Holloway, MD, MPH
Professor of Neurology and Community and Preventive Medicine, University of Rochester Medical Center, Rochester, NY, USA

Karen C. Johnston, MD, MSc
Harrison Distinguished Professor and Chair, Department of Neurology, University of Virginia, Charlottesville, VA, USA

S. Claiborne Johnston, MD, PhD
Professor of Neurology and Epidemiology, University of California San Francisco, San Francisco, CA, USA

Cornelia L. Kamp, MBA
Department of Neurology, University of Rochester Medical Center, Rochester, NY, USA

Russell Katz, MD
Director, Division of Neurology Products, US Food and Drug Administration, Silver Spring, MD, USA

Kathryn M. Kellogg, MPH, BA
Research Fellow, Department of Emergency Medicine, University of Rochester School of Medicine, Rochester, NY, USA

Karl Kieburtz, MD, MPH
Robert J. Joynt Professor Neurology; Director, Center for Human Experimental Therapeutics; Professor, Community & Preventive Medicine and Environmental Medicine; University of Rochester Medical Center, Rochester, NY, USA

Scott Y. H. Kim, MD, PhD
Associate Professor of Psychiatry and Co-Director, Center for Bioethics and Social Sciences in Medicine, and Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI, USA

Jonathan Kimmelman, PhD
Clinical Trials Research Group, Biomedical Ethics Unit, Department of Social Studies of Medicine, Faculty of Medicine, McGill University, Montreal, QC, Canada

Bruce Levin, PhD
Professor, Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA

Michael P. McDermott, PhD
Professor, Department of Biostatistics and Computational Biology and Department of Neurology, University of Rochester Medical Center, Rochester, NY, USA

Eric A. Mann, MD, PhD
Clinical Deputy Director, Division of Ophthalmic, Neurological, and ENT Devices, Center for Devices and Radiological Health, US Food and Drug Administration, Silver Spring, MD, USA

John Markman, MD
Director, Translational Pain Research, Department of Neurosurgery, University of Rochester School of Medicine, Rochester, NY, USA

D. Troy Morgan Esq.
Director of Corporate Compliance, Biogen Idec, Cambridge, MA, USA

Gilmore N. O’Neill, MB, MMedSc
Vice President, Multiple Sclerosis – Clinical Development, Biogen Idec, Cambridge, MA, USA

Yuko Y. Palesch, PhD
Professor of Biostatistics and Director of the Division of Biostatistics and Epidemiology, Medical University of South Carolina, Charleston, SC, USA

John R. Pollard, MD
Penn Epilepsy Center, University of Pennsylvania, Philadelphia, PA, USA

R. Michael Poole, MD, FACP
Head, CNS and Pain Innovative Medicine Unit, AstraZeneca PLC, Waltham, MA, USA

Mary E. Putt, PhD, ScD
Associate Professor of Biostatistics and Epidemiology, Center for Clinical Epidemiology and Biostatistics, Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA, USA

Bernard Ravina, MD, MS
Medical Director, Translational Neurology, Biogen Idec, Cambridge, MA, USA

Richard A. Rudick, MD
Director, Mellen Center for Multiple Sclerosis Treatment and Research, Department of Neurology, Neurological Institute, Cleveland Clinic, Cleveland, OH, USA

David Schoenfeld, PhD
Professor of Medicine, Harvard Medical School and Massachusetts General Hospital, Boston, MA, USA

Andrew D. Siderowf, MD, MSCE
Associate Professor of Neurology at the Pennsylvania Hospital, University of Pennsylvania, Philadelphia, PA, USA

Janet Wittes, PhD
President, Statistics Collaborative Inc., Washington, DC, USA

Robert F. Woolson, PhD
Professor Emeritus, College of Medicine, Medical University of South Carolina, Charleston, SC; Center for Health Services Research in Primary Care, Durham VAMC, Durham; Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, NC, USA

Michael E. Yurcheshen, MD
Assistant Professor, Departments of Neurology and Internal Medicine; Director, Sleep Medicine Fellowship, University of Rochester School of Medicine, Rochester, NY, USA
Preface
The aging population is increasing the global burden of neurological diseases and the need for safe and effective therapeutics for these disorders. While therapeutic targets for neurological disorders are increasingly tractable, neurology also has one of the highest failure rates in late-stage clinical trials. There is an increasing need for proficiency in the design, conduct, analysis, and interpretation of clinical trials in neurology. This is especially true in the early and middle stages of therapeutic development, which determine if and how comparative efficacy studies should be conducted.

The goal of this book is to describe how the principles of clinical trials can be applied to the challenges that arise in developing therapies for neurological disorders. The fundamentals of clinical trials are explored in several existing texts and are the same across different fields of medicine. Here we describe the application of those principles to the specific clinical questions that arise with the study of neurological diseases.

There is no one trial design that meets all objectives for a particular phase of development. Rather there are parameters that need to be optimized for each intervention, question, and study. A clinical trial can be defined as an experiment in humans that is designed to test a medical, surgical, behavioral, or other type of intervention. This definition does not presuppose a particular design, type of control group, or analysis plan. When designing a trial and consulting this text for guidance, the reader should carefully consider the clinical question they are facing and how that question fits in the overall program of research for the intervention. The next step is to select a design that can practically and efficiently answer the question and guide decision-making about the intervention and the steps to further develop it.

The underlying motivation for this text is the notion that better clinical trial design and conduct will improve the efficiency of the development process by eliminating interventions with a low likelihood of success and focusing resources on those with more promise. This does not mean that all trials will be positive. By carefully selecting the appropriate dose, design, population, measure, and analytical approach we can best test the intervention’s mechanism and its relevance for treating patients with neurological disorders. Rather than a high volume of clinical trials, we seek high quality trials that have the potential to lead to improvements in patient care and quality of life.

Audience

This text is intended for those who conduct clinical trials in academia, the pharmaceutical and biotechnology industries, and government and is written by experts from each of these areas. The intended audience is meant to include the broad spectrum of medical researchers, statisticians, data managers, trial managers, regulators, and program officials. Clinical trials are by nature multidisciplinary, social undertakings that are accomplished by teams. Those teams work most effectively when the members have a common understanding of goals and principles that unite their different areas of expertise.

Organization and terminology

The text is written to emphasize key concepts, with examples from neurology and other fields and references that can provide additional detail. It should be regarded as a starting point for learning about clinical trials and a companion to formal coursework and practical experience.

The text begins with a description of the growing need for progress in the treatment of neurological disorders, the sequence of clinical development, and a discussion of the unique challenges of neurology research, such as measuring drug disposition in the central nervous system. While this is not a book specifically about drug development, any clinical trial must be nested within an overall development plan to determine how to optimize the intervention (learning) and then to actually test it (confirming) for its hypothesized benefit. Subsequent sections focus on core principles of clinical trials: control of bias and random error, basic aspects of statistical inference, notable clinical trial designs in the neurology literature, clinical measurement and assessment of outcomes, interim monitoring, ethics, and the regulatory framework for drugs and devices using the US as an example. We then consider how these principles manifest in clinical trials for several common neurological disorders.

We have devoted two chapters to clinical operations, which is unusual in a clinical trials text. It is not sufficient to merely design an elegant experiment. The experiment must be conducted in a manner that ensures the integrity of the intervention and the study data. The steps involved in planning and implementing studies are often neglected in texts and courses and many trials fail on aspects of execution, timeline, and budget. This is especially true for many neurological disorders, where clinical trials are relatively new and researchers are often working in uncharted or unfamiliar territory. Our objective is to provide direction from what has been learned through experience to help researchers avoid costly mistakes. The final chapter of the text focuses on issues of financial relationships and compliance in industry-academic collaborations. This issue is of growing importance and transparency is necessary to facilitate these essential collaborations and ensure trust in the clinical research enterprise.

Disclaimer

Any views or opinions presented in this book are solely those of the authors, and not necessarily those of the US Food and Drug Administration or the authors’ employers or institutions.
Acknowledgements
This is a multi-author text and this diverse group in many ways reflects the multidisciplinary teams needed to conduct clinical trials and develop new therapies. Many of the authors and my co-editors have been mentors and colleagues through my positions at the National Institutes of Health (NIH), academic medicine, and now the biotechnology industry. I am grateful to them not only for contributing to this text but for facilitating my own interest in and understanding of clinical trials. The NIH/NINDS Neurology Clinical Trials Methods Course brought many of us together. The focused discussions and debates with faculty and trainees alike have helped to shape my approach to clinical trials. I would like to thank Janine Fitzpatrick and Briana Bouchard for their administrative and technical support and Nancy Richert for the MRI cover image.

The otherwise un-named contributor to this text is my wife Joanna. Her unwavering support and critical thinking skills have been essential for this text and for the many studies, large and small, that fill a career in clinical research.

Bernard M. Ravina
Cambridge, MA
Section 1. The role of clinical trials in therapy development

Chapter 1
The impact of clinical trials in neurology
E. Ray Dorsey and S. Claiborne Johnston
Overview

Fueled by the aging global population and economic growth of developing countries, the demand for new, safe, and effective therapeutics for neurological conditions in the US and globally will increase dramatically over the next generation. Scientific discovery and clinical investigation are critical for developing and evaluating new treatments and can have substantial public health benefits. However, several challenges confront the development of new therapies. Some of these are generic (e.g., rising costs of drug development, misaligned incentives, recruitment of research participants) and some are specific to neurological conditions (e.g., slow course of neurodegenerative conditions, limited availability of biomarkers). Along with these challenges are potential advances that could accelerate development, including scientific progress in the platforms that support discovery and development (e.g., in genetics and biotechnology) and in the more active participation of patients and advocacy groups that can help fuel the development of new treatments, even for the rarest of disorders. Beyond drugs for neurological conditions, clinical trials will examine other promising therapeutic interventions, including devices and procedures. Meeting the great need for effective therapeutics will not only require continued scientific discovery but also modifications in commercial incentives, improvements in the conduct of clinical trials, and advocacy and participation by the growing number of individuals affected by neurological conditions.

The burden of neurological disease is growing globally

The increase in life expectancy that occurred in the twentieth century has led to substantial increases in the number of individuals with neurological conditions, a trend that is expected to accelerate during this century. In China, for example, the number of individuals over 65 will more than double from 110 million in 2010 to nearly 240 million by 2030 (Figure 1.1) [1]. This change in population structure, occurring in many countries, will increase the burden of neurological disease globally [2]. Cerebrovascular disease currently accounts for the majority of global disability for neurological disorders as measured in disability-adjusted life years and will account for 4% of total disability-adjusted life years globally by 2030 [2]. Other conditions, such as Alzheimer’s disease and Parkinson’s disease, will see the number of individuals affected increase, and that increase will be greatest in developing countries [3, 4]. The number of individuals with Parkinson’s disease in the world’s most populous nations is projected to more than double from approximately 4 million in 2005 to over 8 million in 2030 (Figure 1.2) [4].

The growth in the burden of neurological disease coupled with the economic growth of developing economies, especially in Asia, will increase the global demand for neurotherapeutics. As the income of countries increases (as measured by per capita gross domestic product), countries tend to devote a greater proportion of their gross domestic product to health care [5]. Access to care for individuals with neurological conditions is severely limited in many parts of the world; however, with increasing income, a larger proportion of individuals in developing economies will have the resources necessary to benefit from current and future treatments for their conditions.

Clinical investigations can have a substantial public health impact

The development of new drugs and treatments is costly. The current estimate for the successful development of a drug, including opportunity costs, is $800 million,
Clinical Trials in Neurology, ed. Bernard Ravina, Jeffrey Cummings, Michael P. McDermott, and R. Michael Poole. Published
by Cambridge University Press. © Cambridge University Press 2012.
Figure 1.1. Population pyramids for China, 2010 (a) and 2030 (b), showing male and female population (in millions) by five-year age group from 0–4 to 100+. Source: US Census Bureau, International Data Base, available at http://www.census.gov/ipc/www/idb/
[6] and the estimate for the successful development of a new neurological drug exceeds $1 billion [7]. While the resources required to develop a new therapy are substantial, the societal return on this investment in improved health can be even larger.

One economic study suggests that the societal return from improved health on a handful of proven interventions would justify total US health care expenditures, including the research to produce the new therapies [8]. A detailed analysis of clinical trials funded by the National Institute of Neurological Disorders and Stroke found that the public return on investment in clinical trials has been substantial [9]. In that study, the investigators examined the costs associated with 28 clinical trials and resulting health care expenditures from adoption of interventions with benefit and compared those costs to resulting improvements in health over 10 years following completion of the trial. The study found that the total cost of the clinical trials was $335 million and that over ten years the total cost associated with the clinical trials and adoption of the beneficial intervention was $3.6 billion. However, the estimated net health benefit was $18.1 billion, which was calculated as the incremental health benefit from
Figure 1.2. Change in number of people with Parkinson’s disease in the world’s most populous nations from 2005 to 2030*. Legend: 0–50% growth; 50–100% growth; >100% growth; not examined.
*Among individuals over 50 in the world’s ten most and Western Europe’s five most populous nations.
Reproduced with permission from ref [4].
the intervention (measured in quality-adjusted life years and then multiplied by the per-person annual gross domestic product) projected over ten years. The net societal benefit was, therefore, $15.5 billion ($18.1 billion less $3.6 billion), a 40-fold return on the research investment.

The results of the study highlight two additional important findings (Table 1.1). First, only a small minority (6 of the 28 or 21%) of the clinical trials were associated with any incremental societal benefit. And, second, most (80%) of the societal benefit came from two clinical trials. These points highlight the substantial risk of drug development for neurological conditions and the need to reduce and spread that risk effectively.

Developing new and novel drugs is increasingly difficult

In addition to the inherent risks involved in clinical trials, the challenges of translating scientific advances into new therapeutic advances are increasing. From 1994 to 2003, funding for US biomedical research from industry and government doubled [10]. Funding grew at a slower rate from 2003 to 2008 and now exceeds $100 billion annually [11]. However, despite this increase in financial support, the number of novel treatments approved by the US FDA has remained relatively stagnant [10, 11], even when allowing for time lags between when the investments were made and when new products might be expected [12]. Thus, the return on the research investment over at least the last 10 years, measured as new therapies, is decreasing.

Coupled with the lack of increase in the number of new drugs is the rising cost of drug development [13]. In 1979, the estimated cost for the clinical development of a new drug was $54 million. By 2003, that number had increased nine-fold to $467 million [6]. Larger scale and longer duration trials may account for some of the increase in costs.

Another large cost and barrier to the development of new therapies is the recruitment of research participants [14]. Public participation may be the most critical challenge. Despite bearing the burden of disease and expressing a strong desire to participate in clinical trials, the public is not always encouraged to participate in research [15]. Only 7% of Americans report their physician ever suggested that they participate in a research study [15], and when they do participate, participants often are not informed of the research results [16, 17]. Dedicated efforts at informing individuals of research opportunities, reducing the travel burden of
Table 1.1. Estimated use, health benefits, treatment costs, and net societal benefits from eight clinical trials funded by the National Institute of Neurological Disorders and Stroke (a)

10-year projections

| Trial | Quality-adjusted life years per use | Societal cost per use ($) | Total net uses | Quality-adjusted life years | Treatment costs ($) | Incremental net benefits ($) |
| Randomized Indomethacin Germinal Matrix/Intraventricular Hemorrhage Prevention Trial | 1.00 | −632 | 146 837 | 146 837 | −92 857 340 | 6 003 009 978 |
| Diazepam for acute repetitive seizures | NA | 849 | 1 050 776 | | −891 839 458 | 890 276 155 |
| Recombinant beta interferon as treatment for multiple sclerosis | 0.014 | 3213 | 297 256 | 4038 | 955 140 007 | −800 131 189 |
| Asymptomatic Carotid Artery Stenosis Collaborative Study | 0.25 | 11552 | 371 282 | 92 820 | 4 288 862 203 | −590 564 802 |
| Stroke prevention in atrial fibrillation I | 0.24 | 984 | 147 736 | 35 457 | 145 402 116 | 1 267 774 453 |
| North American Symptomatic Carotid Endarterectomy Trial | 0.35 | 1819 | 163 669 | 57 120 | 297 716 385 | 1 940 786 211 |
| Tissue plasminogen activator in ischemic stroke | 0.75 | −6074 | 178 517 | 134 066 | −1 084 314 904 | 6 469 781 905 |
| Extracranial/Intracranial Arterial Anastomosis Study | NA | 30 998 | −10 500 | .. | −325 476 690 | 296 277 864 |
| Total | .. | .. | .. | 470 339 | 3 292 632 319 | 15 477 210 576 |

NA: not available. Incremental net benefits include trial treatment costs, and quality-adjusted life years valued at 2004 per capita gross domestic product $40 310. Products of per use and net use data vary slightly from 10-year projections because of rounding.
(a) The clinical trials are from a set of 28 phase 3 clinical trials whose funding was completed before January 1, 2000 and for which data on use, health benefits, and costs were available.
Reproduced with permission from ref [9].
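
The valuation described in the note to Table 1.1 (quality-adjusted life years valued at per capita gross domestic product) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in ref [9]: the function names `qaly_value` and `net_benefit` are hypothetical, and treating net benefit as valued QALYs minus treatment costs is a simplification. The published figures also reflect rounding and other cost components, so the output approximates the tabulated values rather than reproducing them.

```python
# 2004 US per capita GDP in dollars, taken from the note to Table 1.1.
GDP_PER_CAPITA_2004 = 40_310


def qaly_value(qalys, gdp_per_capita=GDP_PER_CAPITA_2004):
    """Monetary value of a number of QALYs, each valued at per capita GDP."""
    return qalys * gdp_per_capita


def net_benefit(qalys, treatment_costs, gdp_per_capita=GDP_PER_CAPITA_2004):
    """Assumed simplification: valued health gain minus treatment costs.

    A negative treatment cost (a cost saving) increases the net benefit.
    """
    return qaly_value(qalys, gdp_per_capita) - treatment_costs


# Totals row of Table 1.1 (10-year projections).
total_qalys = 470_339
total_treatment_costs = 3_292_632_319

value = qaly_value(total_qalys)
net = net_benefit(total_qalys, total_treatment_costs)

print(f"Value of QALYs gained:   ${value / 1e9:.1f} billion")
print(f"Approximate net benefit: ${net / 1e9:.1f} billion")
```

The same function can be applied to a single row of the table; for a cost-saving intervention such as the indomethacin trial, the negative treatment cost adds to the valued health gain instead of offsetting it.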
studies [18], and communicating research results [19] can facilitate participation in clinical trials.

The public is increasingly looking for roles beyond passive participation as research ‘subjects’ in clinical trials. Some, especially those affected by rare conditions, are creating their own research networks [20], funding their own studies [21], and even forming their own virtual biotechnology firms. Active participation by the public can lead to creative solutions to many of the challenges industry currently faces and may ultimately reduce the costs of development and increase the impact of proven therapies.

Developing neurotherapeutics has its own set of challenges

Many of the challenges of drug development are particularly acute for treatments of neurological conditions. Like biomedical research as a whole, increases in funding for neuroscience research have not translated into an increase in the number of novel treatments [22]. Particular challenges include a paucity of validated biomarkers [23] – with the notable exception of imaging for multiple sclerosis – that can assess efficacy (or lack thereof) of experimental therapeutics, longer duration of clinical trials [7], and higher failure rates due to lack of efficacy [24].

The scope of investigations for neurological treatments is growing

The scope of clinical trials for neurological conditions is rapidly expanding to address orphan indications, biologics, medical devices, surgeries, and comparative effectiveness studies. Interest in orphan drugs is increasing, due in part to advances in the understanding of rare neurological disorders and the high profile commercial success of some drugs for orphan indications. For example, the drug imiglucerase (Cerezyme) for Gaucher’s disease generated nearly $800 million of revenue in 2009 [25].

The design of the pivotal studies that have led to the approval of drugs for orphan indications within neurology differs from that for non-orphan indications, and this may reduce the costs of clinical development. For example, 68% of drugs with orphan indications did not have at least two pivotal studies that were randomized, double-blind, or placebo-controlled even though the standard regulatory requirements are the same for products with an orphan drug designation [26]. By contrast, 100% of pivotal studies for non-orphan indications included at least two randomized, double-blind, placebo-controlled studies.

Scientific advances have also led to the development of new biological therapies for neurological conditions. Some of these have addressed conditions with previously very limited treatment options (e.g., botulinum toxin for focal dystonia) and others have demonstrated substantial efficacy (e.g., natalizumab for multiple sclerosis). However, along with these benefits have come risks, including manufacturing and safety. The emergence of significant safety concerns (e.g., progressive multifocal leukoencephalopathy) with natalizumab [27] has led to restrictions on its use and has increased the need and interest for long-term safety monitoring of drugs [28].

In addition to drugs, clinical trials frequently evaluate devices for neurological conditions. The number of devices approved by the FDA is actually more than ten-fold greater than the number of drugs [29]. Part of this difference is due to the lower US regulatory threshold for the approval of devices compared to drugs [29, 30]. The FDA classifies devices into three levels. As described in more detail in the chapter on device regulation, Class I devices are generally low-risk devices and Class II devices represent an intermediate risk. Both are generally exempt from premarket review by the FDA unless the manufacturer desires to market the device for a new indication. Class II devices are evaluated by a Premarket Notification, or 510(k), process that only requires that the new device is as safe and effective (‘substantially equivalent’) to another marketed Class II device. Most 510(k) submissions, which the FDA has 90 days to review, do not require clinical data to demonstrate substantial equivalence. Class III devices, which comprise only 5% of products, are more complex and high-risk, and must demonstrate a ‘reasonable assurance of the safety and effectiveness’ for their desired indication [30]. Some Class III devices, such as deep-brain stimulators, have undergone rigorous assessments in clinical trials [31, 32].

The scope of clinical trials for neurological interventions also includes surgeries. High quality data on surgical interventions, such as temporal lobe resections for epilepsy [33], are critical to understanding their relative risks and benefits in the target populations. The challenge, like that for drugs and devices, is that once benefit has been established for a given target population in a rigorous study, the intervention quickly spreads to populations for which the benefit is lower or
not established. For example, while carotid endarterectomy offers significant benefit for symptomatic carotid disease [34, 35], the vast majority are done for individuals with asymptomatic disease for whom the benefits are much smaller and less clear. Similar to outcomes of device trials, surgical outcomes in clinical trials are a function of the investigators – often the most experienced surgeons operating in the most experienced centers – which raises questions about the generalizability of results to the broader population.

A final frontier for clinical investigations in neurology is comparative effectiveness studies. Comparative effectiveness research ‘is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care [36].’ While comparative effectiveness has gained more attention recently due to the $1.1 billion in funding for these studies as part of the American Recovery and Reinvestment Act of 2009 [37], comparative effectiveness studies in neurology are not new. For example, about half of the 31 trials the National Institute of Neurological Disorders and Stroke funded prior to 2000 could qualify as comparative effectiveness research [38]. Among these trials were the comparison of low-dose warfarin plus aspirin vs. standard warfarin for stroke prevention for those with atrial fibrillation and a comparison of valproate vs. phenytoin for seizure prophylaxis after brain trauma. Trials like these, including trials comparing ways health care is delivered, will likely become more common in the future, especially because many of the top priorities for comparative effectiveness research identified by the Institute of Medicine involve neurological conditions [37].

Conclusions

The need for and impact of clinical trials for neurology will increase in the future. Demographic and economic factors will fuel this demand and increase the geographic reach of clinical trials, which will raise its own challenges [39]. Continued scientific advances will allow better characterization of clinical conditions, new biomarkers will provide for more efficient and informative investigations, and increased public participation will lead to more creative funding and organization of clinical trials. The scope of clinical trials for neurology is rapidly expanding and has moved past drugs to devices, surgeries, and comparative effectiveness research. The ultimate success of these expanded investigations will require continued attention to rigorous methodology, measures to reduce the burden of participation, and expanded collaboration among industry, other sponsors, and investigators.

Acknowledgement

We thank Mr. Nick Scoglio for his assistance in the preparation of this chapter.

References

1. U.S. Census Bureau, International Data Base. 2010. http://www.census.gov/ipc/www/idb/ (Accessed March 8, 2010).
2. World Health Organization. Global burden of neurological disorders estimates and projections. 2006. http://www.who.int/mental_health/neurology/chapter_2_neuro_disorders_public_h_challenges.pdf (Accessed February 5, 2010).
3. Ferri CP, Prince M, Brayne C, et al. Global prevalence of dementia: a Delphi consensus study. Lancet 2005; 366: 2112–7.
4. Dorsey ER, Constantinescu R, Thompson JP, et al. Projected number of people with Parkinson disease in the most populous nations, 2005 through 2030. Neurology 2007; 68: 384–6.
5. Reinhardt UE, Hussey PS, and Anderson GF. U.S. health care spending in an international context. Health Aff 2004; 23: 10–25.
6. DiMasi JA, Hansen RW, and Grabowski HG. The price of innovation: new estimates of drug development costs. J Health Econ 2003; 22: 151–185.
7. Adams CP and Bratner W. Estimating the cost of new drug development: is it really 802 million dollars? Health Aff 2006; 25: 420–8.
8. Cutler DM and McClellan M. Is technological change in medicine worth it? Health Aff 2001; 20: 11–29.
9. Johnston SC, Rootendberg JD, Katrak S, et al. Effect of a US National Institutes of Health programme of clinical trials on public health and costs. Lancet 2006; 367: 2057–8.
10. Moses H, Dorsey ER, Matheson DH, et al. Financial anatomy of biomedical research. JAMA 2005; 294: 1333–42.
11. Dorsey ER, de Roulet J, Thompson JP, et al. Funding of US biomedical research, 2003–2008. JAMA 2010; 303: 137–43.
12. Dorsey ER, Thompson JP, Carrasco M, et al. Financing of U.S. biomedical research and new drug approvals across therapeutic areas. PLoS One 2009; 4: e7015.
13. Booth B and Zemmel R. Prospects for productivity. Nat Rev Drug Discov 2004; 3: 451–6.
14. Sung NS, Crowley WF, Genel M, et al. Central challenges facing the national clinical research enterprise. JAMA 2003; 289: 1278–87.
15. Research!America: An Alliance For Discoveries in Health: 2008 Poll Data. http://www.researchamerica.org/advocacy_awards (Accessed February 20, 2010).
16. Meier B. Participants left uninformed in some halted medical trials. New York Times, October 30, 2007 (Accessed February 23, 2010).
17. Berenson A. After a trial, silence. New York Times, November 21, 2007 (Accessed February 23, 2010).
18. Karlawish J, Cary MS, Rubright J, et al. How redesigning AD clinical trials might increase study partners’ willingness to participate. Neurology 2008; 71: 1883–8.
19. Dorsey ER, Beck CA, Adams M, et al. Communicating clinical trial results to research participants. Arch Neurol 2008; 65: 1590–5.
20. Frydman GJ. Patient-driven research: rich opportunities and real risks. J Particip Med 2009; 1: e12.
21. Merz J. Finding a cure: paying to keep your drug trial alive. Wall Street Journal, April 10, 2007 (Accessed February 20, 2010).
22. Dorsey ER, Vitticore P, and de Roulet J. Financial anatomy of neuroscience research. Ann Neurol 2006; 60: 652–9.
23. Dunckley T, Coon KD, and Stephan DA. Discovery and development of biomarkers of neurological disease. Drug Disc Today 2005; 10: 326–334.
24. Gordian MA, Singh N, and Zemmel RW. Why drugs fall short in late-stage trials? McKinsey Q November 2006 (Accessed February 20, 2010).
25. Morrison T. Big biotechs preview earnings as JPM conference continues. BioWorld.com. January 13, 2010 (Accessed February 5, 2010).
26. Mitsumoto J, Dorsey ER, Beck CA, et al. Pivotal studies of orphan drugs approved for neurological disease. Ann Neurol 2009; 66: 184–90.
27. Kleinschmidt-DeMasters BK and Tyler KL. Progressive multifocal leukoencephalopathy complicating treatment with natalizumab and interferon beta-1a for multiple sclerosis. N Engl J Med 2005; 353: 369–74.
28. The Pink Sheet, November 3, 2008, p. 27–28.
29. Johnston SC and Hauser SL. Neurology and medical devices. Ann Neurol 2006; 60: 11A–12A.
30. Yustein A. The FDA’s process of regulatory premarket review for new medical devices. http://www.gastro.org/user-assets/Documents/08_Publications/06_GIHep_Annual_Review/Articles/Yustein.pdf (Accessed February 20, 2010).
31. Deuschl G, Schade-Brittinger C, Krack P, et al. A randomized trial of deep-brain stimulation for Parkinson’s disease. N Engl J Med 2006; 355: 896–908.
32. Weaver FM, Follet K, and Stern M. Bilateral deep brain stimulation vs best medical therapy for patients with advanced Parkinson disease: a randomized controlled trial. JAMA 2009; 301: 63–73.
33. Wiebe S, Blume WT, Girvin JP, et al. A randomized, controlled trial of surgery for temporal-lobe epilepsy. N Engl J Med 2001; 345: 311–8.
34. North American Symptomatic Carotid Endarterectomy Trial Collaborators. Beneficial effect of carotid endarterectomy in symptomatic patients with high-grade carotid stenosis. N Engl J Med 1991; 325: 445–53.
35. Burton TM and Kamp J. Study boosts stents in stroke prevention. Wall Street Journal, February 27, 2010 (Accessed March 5, 2010).
36. Chaturvedi S, Bruno A, Feasby T, et al. Carotid endarterectomy – an evidence-based review: report of the Therapeutics and Technology Assessment Subcommittee of the American Academy of Neurology. Neurology 2005; 65: 794–801.
37. Institute of Medicine. Initial Priorities for Comparative Effectiveness Research. 2009. http://www.iom.edu/~/media/Files/Report%20Files/2009/ComparativeEffectivenessResearchPriorities/CER%20report%20brief%2008–13–09.ashx (Accessed February 8, 2010).
38. Johnston SC and Hauser SL. Comparative effectiveness research in the neurosciences. Ann Neurol 2009; 65: A6–A8.
39. Glickman SW, McHutchison JG, Peterson ED, et al. Ethical and scientific implications of the globalization of clinical research. N Engl J Med 2009; 360: 816–23.
Section 1: The role of clinical trials in therapy development

Chapter 2: The sequence of clinical development
R. Michael Poole
Introduction

Clinical development can be described as a process of asking and answering specific scientific and operational questions at specific times to learn about the risks and benefits of drugs or devices that may be useful for human health. Good clinical development requires the involvement of skilled scientists from many different disciplines working together under the guidance of a thoughtful plan that describes the program of research that will provide the data to answer these questions. Because the human, monetary, and time resources required to initiate and complete a clinical development program are significant, every such plan involves careful articulation and sequencing of the questions to be answered.

It is especially important at the outset to state clearly the ultimate objective for a clinical program and how the approach being undertaken may improve on what is currently known or practiced. Is the purpose of the trial to improve prognostication, or provide a better understanding of disease or biomarkers? Is the objective to demonstrate efficacy, safety, or economic advantages of a drug or device over current standards of care? Is there an expectation that the approach will offer improved survival or long-term outcome? Each of these objectives requires a very different clinical plan and sequence of experiments.

Typically, clinical programs are described as involving several specific phases (phases I–IV). By convention, this scheme provides some understanding of the kinds of trials employed and the subjects being studied, but the specific phase does not provide a good basis for understanding exactly what kinds of questions are being asked. Trials typically thought of as being performed during a specific phase (such as a human volunteer study, phase 1) can be performed at multiple times during a development program. It is preferable when creating a clinical development plan to organize one’s thinking into stages of information gathering that will accomplish specific objectives.

Table 2.1 provides an illustration of this concept and shows that, in the simplest way of thinking, clinical programs can be divided into early, middle and late stages. Although there is some overlap, each development stage has unique objectives that are required to progress further into development. The information collected at each stage builds upon what has already been learned and influences how decisions are made with respect to study design, population, indication, and program size.

What follows is a brief description of the questions that are typically asked and answered at each stage of clinical development and the kinds of clinical trials that are utilized in the effort. This chapter focuses specifically on the activities and questions that are involved in the generation of data to support the registration and approval of a drug candidate. The ultimate objective in this case is to demonstrate the use of a drug for management of symptoms or signs of an illness or to cure or slow progression of a disease. However, a similar framework and discipline can be used when ordering the sequence of questions for medical devices or for more academic clinical programs aimed at improving diagnosis, gaining better understanding of a disease state, or prevention of illness. Lastly, some important sources of information apart from the general scientific and medical literature are provided.
Table 2.1 Early, middle, and late development: objectives and examples of studies performed

| Objectives | Early stage | Middle stage | Late stage |
|---|---|---|---|
| Human pharmacology and biomarker exploration | ‘First in human’, single and multiple ascending dose trials (‘phase 1’) | Targeted special safety studies in patients and volunteers | Special formulation pharmacology; drug-drug interaction studies; drug metabolism in renal and liver impairment |
| Exploratory efficacy and safety studies | Early, ‘first in patient’ studies | Dose-ranging efficacy and safety studies in patients (‘phase 2’) | Dose-ranging studies in new indications |
| Confirmatory efficacy trials | | Seamless exploratory dose ranging and confirmatory efficacy | Pivotal confirmatory trials in primary indication; comparative efficacy trials (‘phase 3’) |
| Therapeutic use studies, new indications expansion | | Comparative efficacy trials | New indications, expanded population studies, combination trials (‘phase 4’) |
Early stage clinical development

Early stage clinical research involves the design and conduct of studies aimed at understanding the basic human pharmacology of a drug. The program of early research is built upon knowledge gained from preclinical in vitro and in vivo experiments that define and justify an initial assessment of potential benefit and risk to human subjects. Clinical studies are then designed and performed to produce data that will enable initial determinations of safety and tolerability, pharmacokinetics, pharmacodynamics, and aspects of drug action and CNS penetration for the drug.

Every early stage clinical development program requires information derived from basic laboratory and animal experiments that define the fundamental pharmacologic properties of a drug. Basic information about the biological target, cellular pathways and the biochemical mechanism of action should be known. Information about the potency and selectivity of the compound for its target and the nature of concentration vs. response relationships is critical to the design of an early clinical program. Typically, data is available from more than one in vivo efficacy model that provides justification for exploration in humans. This data should include information about the time course of onset and duration of effect, dose vs. response characteristics, and the no-pharmacologic effect dose. Any information on biomarkers from in vivo models is also enormously useful at this stage.

In addition, safety and toxicology data from both in vitro and animal testing is needed to justify exposure in humans. Data from acute and chronic studies in animals as well as safety pharmacology studies help to define the dose range that can be used safely in humans and can highlight specific toxicity issues that may need to be monitored. In certain settings, special studies examining the potential for reproductive toxicity and carcinogenicity are required. Additional information on drug metabolizing enzymes, drug metabolites, the potential for drug interaction, and initial estimates of preclinical pharmacokinetics help to define parameters for early studies. When they are available, data from animals on pharmaceutical properties such as absorption and bioavailability are also useful in helping to design an early clinical program.

The main goals of early clinical studies are to provide initial assessments of safety, tolerability and pharmacokinetics and to estimate the dose range that will be deployed in later trials. This is usually accomplished through a combination of single ascending dose and multiple ascending dose trials that help to determine the maximum tolerated dose and regimen that provides adequate drug exposure for the proposed indications.

The key objectives of single ascending dose studies are to define safety, tolerability, pharmacokinetics and pharmacodynamics of a drug. The dose range deployed usually covers approximately two logs and
is framed by a starting dose that is a fraction of the preclinical pharmacologic no-pharmacologic effect dose (NOPED) in the most appropriate or sensitive species and limited to a top dose that is guided by the preclinical exposure (drug concentration in plasma) at the no-adverse effect level (NOAEL). Although designs are highly variable, as many as 6–8 dose levels are used with dose increments typically >2-fold at the lowest doses and <2-fold at the highest doses. Commonly, about eight subjects are exposed in each dose cohort at a placebo-to-drug ratio of one to three. Close assessments of vital signs, hematology and blood chemistry, electrocardiography, and adverse events are collected in each cohort and advancement to the next dose level is allowed only after thorough review of these data. Intensive plasma sampling for pharmacokinetics is also performed in each cohort although typically these data are not available before advancement to the next dose level. At study end, an assessment is made of the overall tolerability and safety across the examined dose range along with any defined dose-limiting toxicity whether defined by adverse event or laboratory evidence. Detailed analysis of pharmacokinetic samples adds to the profile of the medication. This information is then used to help define design parameters for multiple ascending dose studies.

Multiple ascending dose studies extend observations on human pharmacology to longer periods of dosing. Again, the key objectives are to provide data on safety, tolerability and pharmacokinetics with prolonged dosing. In most studies, the duration of dosing ranges from 7 to 14 days with dosing frequency determined by the pharmacokinetic parameters defined in single-dose studies. Typically, 4–5 dose levels are examined in the single ascending dose study, with the dose range covering a little over 1 log.

Single and multiple ascending dose human pharmacology studies are usually conducted in healthy volunteers whose age may reflect the target population for the intended indication for the drug. Healthy volunteers are often preferred at this stage since the assessments of the tolerability and pharmacokinetic profile of the drug are less likely to be contaminated by disease-related adverse events and concomitant medications. However, there are several situations where early assessments of human pharmacology should be supplemented by data from the target patient population.

For some medications the tolerability profile in patients differs markedly from that in healthy volunteers. For example, patients with chronic epilepsy and schizophrenia who are chronically exposed to anticonvulsant or antipsychotic medications respectively, typically report fewer central nervous system adverse events than normal volunteers exposed to the same doses of a new medication. To ensure an accurate determination of the tolerable dose range, during early development both single-dose and multiple-dose studies are conducted in parallel in patients and normal volunteers. The combined data set provides the best overall initial picture of safety, tolerability and pharmacokinetics: studies in normal volunteers provide an assessment of normal human pharmacokinetics and determine which adverse events can reasonably be attributed to drug exposure; studies in patients provide a more accurate assessment of the tolerable dose range. Other studies specifically designed to characterize drug–drug interactions and effects on pharmacokinetic parameters can be performed to provide information about effects of concomitant medications used in patient populations.

Some initial studies in humans can only be conducted in patients. Medications with substantial potential toxicity risks such as cytotoxic or genotoxic drugs cannot be administered to normal volunteers and for this reason, early studies are conducted in patients. The most common setting where this occurs is in oncology drug development where initial single- and multiple-dose studies are virtually always conducted in cancer patients. Examples from neurological therapeutics include the use of specific B-cell depleting therapies for multiple sclerosis and immunotherapeutic vaccines for Alzheimer’s disease [1, 2].

Data generated from the kinds of experiments described thus far provide an initial picture of the human pharmacology of a drug. Ideally, early research efforts should also provide evidence of drug exposure at the target site of action over a period of time that is consistent with what is believed to be needed for efficacy in the human disease state. Further confidence is gained by demonstrating that the drug binds to the target at the site of action and that binding to the target results in a measurable pharmacologic effect. In these respects, wherever possible both single- and multiple-dose studies should include measures of central nervous system penetration and pharmacodynamic properties of drugs that are related to both primary and secondary mechanisms of action. Conducting these kinds of early assessments in patients rather than healthy volunteers may be easier to justify ethically and may generate data that is more relevant for decision-making.
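The arithmetic behind the single ascending dose design described earlier (a dose range of about two logs, 6–8 dose levels, increments larger than 2-fold at the low end and smaller than 2-fold at the high end, and cohorts of about eight subjects at a placebo-to-drug ratio of one to three) can be illustrated with a short sketch. The function names, the 1 mg and 100 mg doses, and the linearly shrinking step weights are assumptions made purely for illustration; the chapter does not prescribe any particular escalation rule, and real schedules are set from preclinical NOPED and NOAEL data.

```python
import math

def sad_dose_levels(start_dose, top_dose, n_levels):
    """Hypothetical escalation schedule: n_levels doses from start_dose to
    top_dose, with fold-increments that shrink as the dose rises."""
    if n_levels < 2 or not 0 < start_dose < top_dose:
        raise ValueError("need n_levels >= 2 and 0 < start_dose < top_dose")
    n_steps = n_levels - 1
    # Assumed weighting: early steps get a larger share of the log range.
    weights = [n_steps - i for i in range(n_steps)]
    total_log = math.log10(top_dose / start_dose)
    doses = [start_dose]
    for w in weights:
        step_log = total_log * w / sum(weights)
        doses.append(doses[-1] * 10 ** step_log)
    return doses

def cohort_allocation(cohort_size=8, placebo_parts=1, drug_parts=3):
    """Split a cohort according to a placebo-to-drug ratio (default 1:3)."""
    parts = placebo_parts + drug_parts
    if cohort_size % parts != 0:
        raise ValueError("cohort size must be a multiple of the ratio total")
    unit = cohort_size // parts
    return {"placebo": placebo_parts * unit, "active": drug_parts * unit}

if __name__ == "__main__":
    # Illustrative values only: 1 mg to 100 mg spans two logs in 7 levels.
    doses = sad_dose_levels(1.0, 100.0, 7)
    for lower, upper in zip(doses, doses[1:]):
        print(f"{lower:7.2f} mg -> {upper:7.2f} mg ({upper / lower:.2f}-fold)")
    print(cohort_allocation())
```

With these illustrative inputs the first steps are larger than 2-fold and the last steps smaller than 2-fold, and a cohort of eight splits into two placebo and six active subjects. Any other weighting that front-loads the escalation would serve the same illustrative purpose.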
Estimates of exposure in the brain can be determined by direct assessment of drug concentration in the cerebrospinal fluid or indirectly by effects on physiological or imaging measures. Both efficacy and safety pharmacologic dose–response relationships can be assessed by the addition of targeted clinical measurements to the standard data collection. A simple example comes from the early development of serotonin-norepinephrine reuptake inhibitors where investigators made assessments of pupillometry and pulse/blood pressure measures in each cohort to estimate the dose relationship for adrenergic effects. More complex assessments of serotoninergic effects can be provided by quantitative polysomnography [3]. These examples show that substantial insights on dose–response pharmacology can be provided with relatively small sample sizes.

Neuroimaging can provide important evidence of distribution of drug in areas of the CNS with known target expression. For example, the cerebral distribution of C-11 labeled donepezil in the brains of Alzheimer’s patients has been demonstrated using PET imaging [4]. In addition, imaging studies can provide important evidence of specific drug effects in the brain. Another PET ligand, Pittsburgh Imaging agent B (PIB), a C-11 labeled thioflavin ligand that binds to fibrillar beta-amyloid, was used to confirm the clinical diagnosis and demonstrated the ability of a monoclonal antibody to lower cerebral beta-amyloid in patients with Alzheimer’s disease [5]. Important information on brain function that may be modulated with drug therapy may eventually come from other measures like functional MRI [6, 7]. The evolving importance of brain imaging studies in drug development was borne out in a recent review of new drug applications in the Neuropharmacology Division at the US FDA. This review showed that a substantial number of those projects utilized neuroimaging during early stage development [8].

The data generated in early stage studies provide confidence for deciding whether to advance a drug into more complicated and expensive trials in specific patient populations. Further, they provide evidence for the selection of safe doses to be used in those studies and insights into specific safety or tolerability issues that may need further clarification. Increasingly, pharmaceutical companies are utilizing pharmacokinetic and pharmacodynamic modeling to build confidence in their assessments of the dose-response relationship for drugs in early development. Lalonde and colleagues provided a useful review of the role that model-based development can play in determining dose-response relationships [9]. They make a strong case for the more routine use of quantitatively defined, model-based decision criteria in early development and point to several organizational challenges for broader implementation of model-based development. These include the need for early development scientists to be more specific about the assumptions made in creating the models and for team members with less training in quantitative scientific disciplines to become comfortable with the process of defining and applying quantitative decision rules.

Middle stage clinical development

The middle stage of clinical development typically involves more significant exploration of therapeutic efficacy in patients. The issues that need to be addressed at this stage include defining the specific patient population to be studied, the determination of the dose range and regimen, and the selection and evaluation of endpoints for use in later confirmatory studies. Trials conducted in this stage of clinical research carry a special burden within an overall development program because the data generated in them have a significant impact on future trial size, expense, and risk. It is especially important that the limitations of trial design and data interpretation at this stage are clearly understood and communicated to investigators, patients, and other stakeholders.

During middle stage development it is critical to begin to characterize the dose-response relationship for efficacy and safety endpoints in the selected population. Determination of the likely effective and safe dose range is a critical objective of middle stage development that affects not only the design of later stage trials but other aspects of non-clinical development as well. An important study from the FDA showed that a substantial percentage of new drugs approved were relabeled to correct dose ranges, and the majority of these changes were for safety reductions [10]. Of all therapeutic areas examined, drugs for nervous system indications had the highest percentage of dosing changes. Establishment of the optimal dose range requires that substantial attention be paid to selection of the appropriate patient population, efficacy endpoints, and safety evaluations.

Patient selection during middle stage evaluation of efficacy typically is more restrictive than in later stages of development because there is a desire to provide
control over aspects of the disease state that might descriptions ‘no pain’ or ‘worst imaginable pain’. More
influence the therapeutic response to a drug. The spe- complicated examples of PRO include various quality
cific restrictions that are employed depend on the clin- of life rating instruments. Regulatory agencies review
ical setting. For example, in the evaluation of a new and evaluate the suitability of PRO assessments based
analgesic medication a protocol may exclude patients upon several characteristics including the medical
whose pain is refractory to multiple medications on condition and population for intended use, concepts
the premise that those patients would be unlikely to being measured, number of items, conceptual frame-
respond to any new medication. Similarly, initial proof- work, data collection method, scale administration,
of-efficacy trials for new anticonvulsants typically response options, scoring and weighting of items or
require that patients have recurrent seizures despite domains, and availability of translations or cultural
treatment with more than one medication. Here, the adaptations.
drug to be tested needs to demonstrate anticonvulsant Because the properties of a measurement instru-
efficacy above background therapy in order to advance ment like a PRO need to be well understood prior to
to further studies in less severely affected patients. In collecting definitive efficacy data in pivotal confirma-
certain clinical trial settings where placebo response tory trials, this important groundwork must be initi-
rates are known to be high (major depressive disorder, ated and is often largely completed during middle stage
painful diabetic neuropathy, generalized anxiety dis- development. The FDA has published a useful guidance
order), protocols may require that patient selection be document that is aimed at ensuring that the process for
based upon responses to evaluation instruments prior evaluating new instruments is adequately understood
to randomization or after a period of placebo run-in. In by clinical researchers [11].
each of these settings the external validity of a positive Although general safety data collection at this stage
efficacy signal is limited by the bias introduced by the is important, the strategy for learning about specific
restricted patient selection. safety issues needs special attention. The development
Endpoint selection in early efficacy trials depends plan should take into account what has already been
on the nature of the drug effect expected, previous learned in the initial experience with healthy volun-
experience with measurement scales used in the dis- teers and what is known or believed to be an issue in
ease state, and the kind of decision problem faced by the patient population of interest. For example, some
the study team. The specific endpoints selected should anticonvulsants are known to have adverse effects on
balance the need to measure the effect of the drug on the cognitive function in epilepsy patients and specific
disease state, provide some initial reassurance that the scales aimed at quantifying the dose-response relation-
drug effect is clinically meaningful, and have adequate ship for these effects may be needed. Similarly, in the
operating characteristics for studies that typically are evaluation of certain psychoactive agents, rather than
of somewhat smaller sample size. If the ultimate goal relying on spontaneous reporting to detect withdrawal
of the development program is the approval of a new effects, specific instruments such as the Physicians
medication, the endpoints selected for pivotal trials Withdrawal Checklist are often deployed at this stage
must be acceptable to regulatory authorities wherever to gain insight into the dose-response relationship [12].
the drug will be registered. When a new instrument is Separate study visits specific to this objective may be
used, substantial evidence of its measurement proper- necessary and special care is taken when determining
ties and appropriateness for efficacy assessment will be the appropriate schedule for study drug dosing rela-
needed. tive to the evaluation of withdrawal effects. Another
This is particularly true for patient-reported out- example comes from the evaluation of drugs for neuro-
comes (PRO), which are used commonly in CNS devel- protection in the setting of acute stroke where often
opment. A PRO is a report of a patient’s health status or there is a need to be certain that the drug is compatible
condition that comes directly from the patient, with- with concomitant use of recombinant tissue plasmino-
out interpretation of the patient’s response by another gen activator (rtPA). hese agents can have both phar-
person. A simple example is the Numeric Pain Rating macokinetic and pharmacodynamic interactions with
Scale, a measurement tool used to evaluate the ei- rtPA that may require speciic plasma sampling, blood
cacy of analgesic medications. On this scale, patients tests, or imaging to understand fully.
rate their pain using a number from 1 to 10, where Another speciic safety issue particularly important
the extreme ends of the scale are anchored with the to CNS drug development is the assessment of abuse
liability. Although some components of this assessment are undertaken at middle stages of development, efforts may begin earlier with preclinical assessments in animals and extend well into late stage development and post-marketing. The drug’s primary and secondary pharmacology, absorption and metabolism, intended patient population, and final formulation all affect the timing and extent of the overall assessment [13]. Initial abuse liability studies in humans may be undertaken at middle stage; however, these studies should only be considered and designed in the context of an overall strategy for abuse assessment. Careful planning and decision-making are essential since the data generated during assessment of abuse liability can have profound impacts on the overall value and availability of a new treatment.

Another critical objective of middle stage development is the assessment, understanding and mitigation of patient access and study feasibility issues that may arise during later studies. Every experienced clinical researcher has dealt with the gap between expectations and reality that comes from incorrectly projecting large numbers of suitable patients for a specific trial. This common problem was described by the clinical pharmacologist Louis Lasagna, who stated that the number of patients actually available for a clinical trial is between 10% and 33% of original estimates [14]. The gap is usually a result of the particular requirements and design of the clinical experiment. For example, narrow inclusion and exclusion criteria or restrictions on concomitant medications may eliminate many patients from participation. Similarly, the period of study participation may be too long or the study procedures too onerous for some patients. These issues require objective evaluation and an honest assessment of the scientific and pragmatic trade-offs that need to be made in order for later trials to be successful. Much of this assessment can and should be done during clinical trials conducted in middle stage development. Failure to do so can have significant, negative effects on later trials.

The specific clinical trial designs deployed in the middle development stage typically utilize a broad dose range derived from the early experience in volunteers and patients. Ideally, the program of research will provide an early determination of the doses that provide no effect, maximum effect, and the best overall balance of efficacy and adverse effects. This can be accomplished by conducting multiple dose-ranging, parallel-group studies with overlapping dose ranges or, under the right circumstances, by utilizing adaptive randomization schemes and assessments [15].

Adaptive trials can be designed to assist with specific decision problems related to efficacy or safety endpoints and can be used effectively in assessments of performance relative to comparator agents. Because of the promise they hold for efficient clinical decision-making, particularly around identification of the optimal dose range, trials utilizing adaptive designs are becoming more commonplace in industry settings. An in-depth review of this topic is provided in Chapter 9 of this text.

Active controls are frequently employed in middle stage development, particularly in areas where placebo response rates can be high and failed trials are common (neuropathic pain, Alzheimer’s disease, depression and generalized anxiety disorder). In this setting the positive control mainly functions to demonstrate that the experiment has adequate assay sensitivity to detect treatment effects with the new agent. In some circumstances the positive control also serves as a comparator to evaluate efficacy or safety advantages of a new treatment. Although the study may not be powered to test the question of superiority of the new treatment over the comparator, sufficient insight may be gained to help with decision-making about whether to proceed to definitive efficacy trials. This assessment, and the ensuing discussions and decision-making, are aided by careful articulation, in advance of seeing data, of the specific efficacy or safety advantage that must be demonstrated by the new treatment. This is one of the most important activities undertaken during middle stage development.

Late stage clinical development
Clinical trials conducted during late stage development are aimed at extending efficacy and safety observations in larger populations. The two key objectives of late stage development are confirmation of efficacy and firmer establishment of the general safety profile with enhanced understanding of special safety issues. These data provide an adequate basis for assessing the benefit/risk relationship of a new treatment. Typically, additional efforts are made to confirm the optimal dose–response relationship and to provide evidence of quality of life benefits. In large pharmaceutical companies, significant resources are also expended in late stage development on comparative efficacy trials that are sometimes necessary for initial regulatory approval
and for making cost-benefit arguments with third party payers in the US and government pricing authorities in other parts of the world.

Late stage confirmatory clinical trials often utilize a broader study population than was studied during early development. This is done to ensure that the studies performed provide evidence of efficacy and safety that is relevant for the majority of patients with a particular disease. This often necessitates loosening the entry criteria that were used in middle stage trials, which can involve significant risks since a less highly selected population may respond less predictably to a drug. Increasingly, however, late stage trials are focusing on specific subsets of patients determined either by genetic makeup or specific biomarkers to be particularly suited for a new treatment. The best examples of this approach are currently being pursued in oncology, a simple example of which is the use of estrogen antagonists in estrogen receptor-positive breast cancer. In neurology, the previously mentioned use of imaging methods to determine the presence of fibrillar amyloid in the brains of patients with Alzheimer’s disease might ultimately be used to define the appropriate patient population for anti-amyloid drugs.

Regardless of whether it is narrowly or broadly defined, careful description of the patient population is essential to the interpretation of study results. For example, study protocols should describe the method for determining that study subjects have the correct diagnosis and that the stage or severity of their disease has been determined adequately. This is particularly important in late stage trials where the study population may be less strictly defined by exclusion or inclusion criteria. The methods used for patient selection in late stage studies that are used to support regulatory approval and product labeling are evaluated and interpreted carefully during regulatory review.

Late stage studies are usually powered at higher levels than in earlier development, with sample size estimation typically employing smaller type-2 error rates. Partly this is done to ensure the robustness of any positive efficacy signal. The additional power provided by the larger sample size may be necessary for validation of novel endpoints, and can help to add confidence to the interpretation of secondary efficacy measures and supplementary analyses of the primary endpoint.

In addition, larger sample sizes provide a more substantial basis for interpreting safety and tolerability results from a single study or from a program of clinical research. Confidence in the accuracy of the safety and tolerability profile is derived from the number of patients exposed and their duration of treatment with a medication. For chronically administered drugs for non-life-threatening conditions, the International Conference on Harmonisation (ICH) guidelines recommend an overall exposure of 1500 patients with 300–600 patients exposed for 6 months and 100 patients exposed for one year [16]. These exposures must occur at the dose or dose range believed to be efficacious. There are circumstances where these guidelines can be relaxed (for example, when the number of patients affected is small), but occasionally the required number can be even larger (for example, when there is a need to quantify the frequency of rare but serious adverse events known to occur in a particular drug class). Most often, development teams plan carefully to ensure that the basic exposure requirements set forth by ICH will be met by the time that applications for regulatory approval are submitted and reviewed. These requirements underscore the need to understand the effective dose range as early as possible during development; failure to do so can lead to significant delays while additional patient exposures are accrued.

The plan for broadening the understanding of specific safety issues needs to be articulated at the beginning of late stage clinical trials. For example, certain CNS drugs are believed to increase the risk for suicidal behaviors. If the risk is known or believed to be particularly high for a given drug class, specific data collection instruments may be needed for the program and investigators should be specifically instructed in the handling of adverse events related to suicidality. When a particular safety or tolerability issue is uncovered in middle stage development, a specific plan for the data collection needed to fully describe and understand the issue should be created for all late stage trials. For example, initial efficacy trials in middle stage may uncover that peripheral edema complicates the use of a medication in a significant percentage of patients. For any patient presenting with a complaint of edema, specific additional medical history is recorded, limb measures are taken and additional blood, urine or other testing is performed to more fully understand the nature of the edema in specific cases. At the end of the late stage program, these data are summarized and described in aggregate and can provide significant insight into a particular safety or tolerability issue. Having a plan for uniform data collection across studies for important safety issues makes this effort much easier and the resulting interpretation more robust.
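
As a rough illustration of why a smaller type-2 error rate drives up the size of a late stage trial, the sketch below applies the standard normal-approximation formula for a two-arm comparison of means, n per group = 2(z(1-alpha/2) + z(1-beta))^2 sd^2 / delta^2. The function name, the two-sided alpha of 0.05, and the treatment difference and standard deviation in the usage lines are illustrative assumptions and are not taken from any trial discussed in this chapter.

```python
import math
from statistics import NormalDist


def per_group_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Patients per group for a two-arm comparison of means.

    Normal-approximation formula; delta is the assumed treatment
    difference and sd the assumed common standard deviation.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type-1 error
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type-2 error
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)


# Hypothetical planning values: a difference of 0.5 standard deviations.
middle_stage_n = per_group_sample_size(delta=0.5, sd=1.0, power=0.80)
late_stage_n = per_group_sample_size(delta=0.5, sd=1.0, power=0.90)
print(middle_stage_n, late_stage_n)
```

Under these assumed inputs, moving from 80% to 90% power raises the requirement from 63 to 85 patients per group, before any allowance for dropout or for the additional exposure needed to characterize safety.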
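
The ICH exposure figures cited above [16] lend themselves to a simple running tally during late stage planning. The following sketch counts patients by duration of exposure at the efficacious dose and compares the totals with the 1500, 300 and 100 patient figures quoted in the text, taking the lower end of the 300–600 range as the 6-month threshold. The class and function names, the day cut-offs used for 6 months and one year, and the example data are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Assumed day cut-offs for "6 months" and "one year" of exposure.
SIX_MONTHS_DAYS = 182
ONE_YEAR_DAYS = 365


@dataclass
class ExposureSummary:
    total: int
    six_months: int
    one_year: int

    def meets_targets(self, min_total=1500, min_six_months=300, min_one_year=100):
        """Compare counts with the thresholds quoted in the text [16]."""
        return (
            self.total >= min_total
            and self.six_months >= min_six_months
            and self.one_year >= min_one_year
        )


def summarize_exposure(days_on_drug):
    """Tally patients from per-patient exposure durations in days."""
    days = list(days_on_drug)
    return ExposureSummary(
        total=len(days),
        six_months=sum(d >= SIX_MONTHS_DAYS for d in days),
        one_year=sum(d >= ONE_YEAR_DAYS for d in days),
    )


# Made-up program: 1200 short exposures, 250 of roughly 6 months,
# and 120 extending beyond one year.
program = summarize_exposure([30] * 1200 + [200] * 250 + [400] * 120)
print(program, program.meets_targets())
```

A tally of this kind only counts patients; it says nothing about whether the exposures occurred at the dose range ultimately judged efficacious, which is the condition the text identifies as the usual source of delay.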
It is very common for healthy human volunteer studies to be performed during late stage development. These trials often have as their specific objectives the generation of data on drug-drug interactions, drug metabolism and pharmacokinetics. Alternate dosing formulations are also frequently studied during late stage, such as liquid formulations that may be appropriate for pediatric populations. These formulations may be required in order to conduct pediatric studies, which are typically not initiated until there is some assurance that a drug will be successful in adult populations. Sometimes specific tolerability and safety issues can be studied more robustly in trials utilizing healthy volunteers. For example, a placebo-controlled clinical trial to assess the effect of the drug pregabalin on sperm motility was conducted in 30 healthy male subjects [17]. This study would have been difficult to complete with a high level of data quality in the diabetic, psychiatric, and epileptic patient populations for which the drug was ultimately approved.

Clinical trial designs deployed in late stage development typically involve large and relatively simple parallel group assignment to drug, placebo and, sometimes, active comparators. In late stage experiments, active controls are usually employed to provide direct evidence of comparative efficacy for the purpose of demonstrating advantages of the new drug over existing agents. Here, the data generated with the comparator is mainly used to support superior efficacy claims, to justify the additional investment needed to market a new product, and to meet the requirements of regulatory agencies around the world for pricing decisions.

Since earlier studies conducted during middle stage can almost never detect small differences in efficacy and do not provide a complete safety profile, it is appropriate to continue to explore dose-response in late stage development. For antihypertensives, antidepressants, anti-migraines, and anti-psychotics most or all pivotal trials include some degree of dose-ranging. Robert Temple, Deputy Director for Clinical Science in the Center for Drug Evaluation & Research at FDA, has stated publicly his opinion that dose-ranging designs should be utilized more commonly in pivotal trials performed during late stage development [18].

In certain special circumstances, novel designs may be used in late stage development that accomplish a seamless transition from the typical dose-ranging trial used in middle stage to a parallel group, pivotal proof-of-efficacy study normally used in late stage [19]. These ‘seamless phase 2–3’ studies should be deployed very selectively based upon several factors. When the eligible population of patients is small and there is an urgent need for a new treatment, these designs may help to save time required for development and may make the most efficient use of eligible patients. This ‘adaptive’ approach is not appropriate for programs in which efficacy measurements or surrogates need validation or are poorly understood in the patient population. In any circumstance, close discussion with regulatory agencies is essential before embarking on a study with this design.

Another trial adaptation that can be useful in late stage development is sample size re-estimation. At the beginning of a clinical trial, the assumptions that underlie sample size calculations may not be well understood. In particular, the variability in the primary efficacy parameter may be over- or underestimated and can significantly affect the likelihood of observing a statistically significant result at study end. This may be a particular problem when entry criteria change from middle stage to late stage trials. Sample size re-estimation involves examining blinded efficacy data at a predetermined point in study enrollment and calculating the variability in the primary parameter. If the variability observed is significantly larger than the estimate that was used for original sample size calculations, the sample size is adjusted upward to reflect the observed value for variability in the primary parameter. No statistical penalty needs to be paid for this adjustment, but the procedure must be carefully documented in the statistical analysis plan.

Open-label safety extension studies are also frequently used during late stage development. Typically, these studies follow directly after pivotal, double-blind, proof-of-efficacy studies and have as a primary objective the collection of long-term safety data for a drug used for a chronic condition. In these studies, participants enter a transition period from receiving blinded study medication in the preceding controlled trial, following which they immediately enroll and receive active study drug in the open-label extension. The duration of patient participation in an open-label extension study is typically planned for at least one year. In addition to collecting safety data, open-label trials sometimes include efficacy data collection for the purpose of observing longer-term responses to a medication. Because an open-label study is uncontrolled, the interpretation of both efficacy and safety data is limited. The interpretation of both can be enhanced somewhat by ensuring a blinded transition from the double-blind to
the open-label phase; that is, neither the patients nor the investigators are informed of the preceding double-blind treatment assignment at the time of transition to open-label. Patients may benefit from participating in open-label studies by being allowed access to a potentially effective medication that would otherwise be unavailable. Access to active medication in a follow-on open-label study also provides incentive for some patients to enroll in the preceding double-blind trials where they may receive placebo. Sponsors benefit by the generation of long-term safety data that would otherwise not be collected easily in prolonged double-blind studies.

Important sources of information
Besides the general scientific and medical literature, there are several important sources of information that can help with the strategy for clinical development programs and the design of specific trials and their questions. Some of these resources are free and available on government-sponsored internet sites while others are proprietary collections of information that require subscription fees for access.

The FDA provides access to guidance documents that outline regulatory requirements related to the development of drugs and devices [20]. There are general guidance documents related to both preclinical and clinical requirements for development in any therapeutic area, as well as specific guidance for some, but not all, CNS indications. Clinical guidance documents describe design requirements, endpoints, and analytic approaches to consider when conducting trials aimed at registration and marketing approval. Regulators work diligently to keep up with the latest science related to clinical trials; in this respect, guidance documents may be somewhat outdated and may not completely reflect the current thinking of regulatory scientists. These documents can therefore provide a starting point for strategic thinking, but fulsome and contemporaneous discussion with reviewing scientists at regulatory agencies is essential before making significant commitments to program or trial designs.

Another important resource provided by FDA is the collection of documents describing its review of data submitted in New Drug Applications (NDA) for drugs approved for marketing in the US [21]. This database, indexed alphabetically by drug name, provides access to PDF files of reviews conducted by regulatory scientists from clinical pharmacology, statistics and medical disciplines, correspondence between the sponsor and FDA, and approved labels and labeling changes. The reviews cover assessments of pharmacokinetics, efficacy and safety, and detail the concerns raised by FDA scientists in their assessment of the drug’s risk and benefit. These reviews provide an important and detailed source of information on design elements, entry criteria, and performance of endpoints in clinical trials. Importantly, data is available from trials that were submitted in support of the drug application but sometimes not submitted for publication in peer-reviewed journals. In this respect, a more complete view of the data that supports a drug’s efficacy and safety profile is available and can help to frame the challenges that may be expected in a clinical program aimed at the same indication. The database does not contain information for all drugs approved in the US but there are significant additions to the document database every year.

European regulators also provide access to documents that describe requirements for drug evaluation and registration. Similar to the FDA, the European Medicines Agency (EMEA) web site contains links to development guidance documents, reviews of approved products and administrative requirements [22]. Since the labels created for the EU differ depending on the country where the drug is marketed, there is not the same access to product labeling that is available on the FDA web site. Agency scientists working in different countries can have different opinions of the data that is necessary to support usage of a drug for a particular indication, and therefore it is necessary to compare requirements in the US and EU when considering the strategy for a clinical trial program aimed at registration in both regions. Sometimes, regulatory conclusions on risk and benefit differ substantially from region to region. A careful comparative review of information from EMEA and FDA web resources can provide essential insights for the clinical development plan when global registration is a primary objective.

The FDA also provides public access to meeting materials and transcripts from public advisory committee meetings [23]. These meetings are organized by the FDA in order to obtain independent expert advice on scientific, technical, and policy matters. The meetings are open to the public and are a good source of information on drugs that are under review for regulatory approval, scientific matters such as safety issues related to particular drug classes, and public discussion prior to the promulgation of guidelines. Although the best insight is gained by attending advisory committee meetings in
person, transcripts, briefing documents, and presentation materials are enormously useful by themselves.

The US National Institutes of Health (NIH), through the National Library of Medicine, provides access to information on clinical trials currently underway at sites in the US and around the world. The web site, ClinicalTrials.gov, is a registry of federally and privately supported clinical trials conducted in the US and around the world. The registry is a searchable, online database that provides information on study objectives, requirements for participation, locations of investigative sites, and contact information. As of this writing, ClinicalTrials.gov contained information on 94 215 trials conducted in the US and 173 countries, including those sponsored by US federal government agencies (such as NIH) and private industry [24].

Pharmaceutical companies provide data on trials that are underway and results for trials that have completed. For example, Novartis posts information on clinical trials that are currently recruiting subjects as well as results from completed trials in searchable, online databases [25, 26]. Most large pharmaceutical companies provide similar internet access to information on their projects in development. Although the information posted in these documents is sometimes not highly detailed, considerable insight can be gained into design considerations and performance of endpoints used in studies.

Proprietary databases also exist which gather publicly available information into a searchable format. One example of this kind of database, TrialTrove, is marketed by Citeline Intelligence Solutions [27]. TrialTrove provides surveillance of planned, ongoing, and completed clinical trials from numerous public domain sources. The information is reviewed by a scientific analyst staff and sorted into topic and discipline areas. The database can be searched by drug name, indication or disease state, and specific pharmacologic approaches. The information is similar to that available from government and company databases but in general is more detailed and provides information reported from sources such as press releases that are not typically cited by companies or government sites. Access to databases of this kind requires subscription payments.

Conclusions
Clinical development is an expensive and time consuming effort that must be carefully planned in order to provide essential information to characterize the risks and benefits of a drug or device. In planning a clinical research program, it is useful to consider the sequence of questions that must be posed and answered in order to proceed through each stage of data gathering, keeping the ultimate objective in mind. As the plan for development unfolds, the specific tactics used to answer questions may change as results become available. Although studies typically become progressively larger and operationally more complicated as development proceeds into late stage, the specific questions posed are usually more focused as the specific characteristics of a drug are revealed. There are numerous information resources apart from scientific literature that should be used when creating a clinical development plan.

References
1. Gilman S, Koller M, Black RS, et al. Clinical effects of Abeta immunization (AN1792) in patients with AD in an interrupted trial. Neurology 2005; 64: 1553–62.
2. Mehta LR, Schwid SR, Arnold DL, et al. Proof of concept studies for tissue-protective agents in multiple sclerosis. Mult Scler 2009; 15: 542–6.
3. Chalon S, Pereira A, Lainey E, et al. Comparative effects of duloxetine and desipramine on sleep EEG in healthy subjects. Psychopharmacology (Berl) 2005; 177: 357–65.
4. Okamura N, Funaki Y, Tashiro M, et al. In vivo visualization of donepezil binding in the brain of patients with Alzheimer’s disease. Br J Clin Pharmacol 2008; 65: 472–9.
5. Rinne JO, Brooks DJ, Rossor MN, et al. 11C-PiB PET assessment of change in fibrillar amyloid-beta load in patients with Alzheimer’s disease treated with bapineuzumab: a phase 2, double-blind, placebo-controlled, ascending-dose study. Lancet Neurol 2010; 9: 363–72.
6. Wong DF, Tauscher J, and Gründer G. The role of imaging in proof of concept for CNS drug discovery and development. Neuropsychopharmacology 2009; 34: 187–203.
7. Pihlajamäki M and Sperling RA. Functional MRI assessment of task-induced deactivation of the default mode network in Alzheimer’s disease and at-risk older individuals. Behav Neurol 2009; 21: 77–91.
8. Uppoor RS, Mummaneni P, Cooper E, et al. The use of imaging in the early development of neuropharmacological drugs: a survey of approved NDAs. Clin Pharmacol Ther 2008; 84: 69–74.
9. Lalonde RL, Kowalski KG, Hutmacher MM, et al. Model-based drug development. Clin Pharmacol Ther 2007; 82: 21–32.
10. Cross J, Lee H, Westelinck A, et al. Postmarketing drug dosage changes of 499 FDA-approved new molecular entities, 1980–1999. Pharmacoepidem Drug Safety 2002; 11: 439–46.
11. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling. 2009. www.fda.gov/downloads/Drugs/…/Guidances/UCM193282.pdf.
12. Rickels K, Garcia-Espana F, Mandos LA, et al. Physician withdrawal checklist (PWC-20). J Clin Psychopharmacol 2008; 28: 447–51.
13. Mansbach RS, Feltner DE, Gold LH and Schnoll SH. Incorporating the assessment of abuse liability into the drug discovery and development process. Drug Alcohol Depend 2003; 70: S73–85.
14. van der Wouden JC, Blankenstein AH, Huibers MJ, et al. Survey among 78 studies showed that Lasagna’s law holds in Dutch primary care research. J Clin Epidemiol 2007; 60: 819–24.
15. Quinlan J, Gaydos B, Maca J, et al. Barriers and opportunities for implementation of adaptive designs in pharmaceutical product development. Clin Trials 2010; 7: 167–73.
16. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH Harmonised Tripartite Guideline: The extent of population exposure to assess clinical safety for drugs intended for long-term treatment of non-life-threatening conditions, E1. http://www.ich.org/MediaServer.jser?@_ID=435&@_MODE=GLB (Accessed August 2010.)
17. Lyrica package insert. http://www.pfizer.com/pfizer/download/uspi_lyrica.pdf (Accessed August 2010.)
18. Temple R, Comments made at presentation at Drug Information Association meeting, June 14, 2004. Washington, DC.
19. Maca J, Bhattacharya S, Dragalin V, et al. Adaptive Seamless Phase II/III Designs—Background, Operational Aspects, and Examples. Drug Inf J 2006; 40: 463–73.
20. US Food and Drug Administration, Guidances (Drugs). http://www.fda.gov/drugs/guidancecomplianceregulatoryinformation/guidances/default.htm (Accessed August 2010.)
21. US Food and Drug Administration, Drugs@FDA. http://www.accessdata.fda.gov/scripts/cder/drugsatfda/index.cfm (Accessed August 2010.)
22. European Medicines Agency. http://www.ema.europa.eu/ema/index.jsp?curl=pages/home/Home_Page.jsp&murl=&mid=&jsenabled=true (Accessed August 2010.)
23. US Food and Drug Administration, Advisory Committees. http://www.fda.gov/AdvisoryCommittees/default.htm (Accessed August 2010.)
24. ClinicalTrials.gov. http://clinicaltrials.gov/ (Accessed August 2010.)
25. Novartis Research and Development, Clinical Trials. http://www.novartis.com/research/clinical-trial.shtml (Accessed August 2010.)
26. Novartis Clinical Trial Results Database, Neuroscience. http://www.novctrd.com/ctrdWebApp/clinicaltrialrepository/public/products.jsp?divisionId=2&diseaseAreaID=3 (Accessed August 2010.)
27. Citeline Products and Services: Citeline TrialTrove. http://www.citeline.com/trialtrove (Accessed August 2010.)
Section 1 The role of clinical trials in therapy development
Chapter
Unique challenges in the development
3 of therapies for neurological disorders

Gilmore N. O’Neill
Introduction

The ultimate goal of clinical science is to identify novel therapeutics to relieve human suffering. Therapies for neurological diseases, including amyotrophic lateral sclerosis (ALS), Parkinson’s disease (PD), Alzheimer’s disease (AD), schizophrenia and neuropathic pain, aim to arrest or slow the progression of disease and the worsening of disability, and/or to relieve symptoms.

The challenge of any therapeutic development plan is to deliver, with confidence, the appropriate concentration of an intervention to the intended target on the intended cell type(s), in the intended tissue type, for the intended duration of time. To do this, it is necessary to confirm that the target of interest is expressed in humans, and most particularly in the human disease of interest, and then to confirm one’s ability to hit the target and drive the expected downstream effects (Figure 3.1). In this chapter we will discuss the unique challenges to answering these questions when developing CNS therapies. Difficulties arise because of the elusive nature of the targets, the challenges of delivering therapies across the blood brain barrier (BBB) (delivery and efflux equilibrium), the intricacies of models of CNS biologies, and the handicaps to measuring drug concentrations and pharmacodynamic markers in the CNS. In addition, clinical trials suffer from the insidious course followed by many neurological diseases; from the limitations of functional outcome measures, which have often been developed as descriptors or classifiers of disease rather than as clinical trial endpoints, e.g., the Expanded Disability Status Scale (EDSS); and from the challenges posed by diseases that may only manifest clinically after the accumulation of significant pathologic burdens. It is these difficulties that are largely responsible for the greater than average attrition rates in late CNS drug development. Therefore, every effort should be made to develop the techniques, early in a drug’s development, that are necessary to answer these questions as soon as possible after entry into human studies. Indeed, serious consideration should be given to not advancing a drug or therapeutic program in the absence of these techniques, to avoid exposing patients to risks not balanced by a reasonable probability of efficacy and to avoid squandering critical resources.

Notwithstanding this thesis in support of rational drug development, if empirically compelling human trial data appear, then an opportunistic approach is reasonable.

In the previous chapter (Chapter 2), you will have read about the general principles of drug development and will have seen how specific questions are posed and answered at different stages of a drug’s development. In Chapter 1, you will have read about the enormous and ever increasing human and economic burden of neurological and psychiatric disease on this planet. This chapter will focus on the key early phase questions that must be answered prior to starting pivotal or registration trials. Prior to initiating phase 3 studies, which use considerable resources and expose a large number of patients to risks associated with a novel therapeutic, the early studies should have:

• identified the optimal population(s) in which to develop the new therapeutic
• preliminarily defined the dosing paradigm for the novel therapeutic so that investigators can be confident that the biology under investigation is being impacted by the investigational therapeutic.

Rational drug development identifies biological targets that may be important to a disease’s pathophysiological process and then creates interventions that impact these targets. The key challenge to drug development is the translation of these discoveries from the laboratory
Figure 3.1. Examples of factors that determine the ability to translate from a potential target to a clinical development candidate. Factors shown around the drug development candidate: target expressed in man; target expressed in diseased tissue; intervention binds target (engagement); target binding impacts signaling; target binding alters disease model; effect persists with prolonged exposure.
bench into the human patient. It is in this endeavor that clinical biomarkers can be used.

In considering clinical biomarkers it is important to distinguish between pharmacodynamic markers that measure the biological effect of a therapeutic intervention and other biomarkers that reflect the pathophysiological processes of the targeted clinical disease. A pharmacodynamic biomarker allows the investigator to ascertain if the study drug is interacting with and affecting its desired target and helps to identify the dosage range and exposures required to affect this target. A pharmacodynamic marker will not necessarily predict a therapeutically meaningful effect in the studied disease and population, but it will allow the investigator to confirm that the biological hypothesis has been tested in clinical trials, leading to a definitive ‘positive’ or ‘negative’ outcome. Such a clear binary outcome is eminently more desirable than a ‘failed’ study where the clinical outcome in the disease population is negative but it is not known if the targeted biology was altered by the study drug.

Examples of pharmacodynamic markers include assays of dystrophin in studies of therapeutic ribosomal read-through of premature termination codons in Duchenne muscular dystrophy and Aβ clearance from the brain in AD. Biomarkers that reflect the pathophysiological process of the target neurodegenerative disease are used to identify the optimal test population for a new therapeutic, to monitor disease progression, and to measure slowing of disease progression. Examples of biomarkers of disease pathophysiology include MRI brain lesion number and volume in multiple sclerosis (MS) and Pittsburgh Compound B PET scanning to confirm the presence of Aβ plaque in the brains of AD patients. In some instances, changes in these biomarkers (MS MRI) are also highly predictive of a clinical effect [1].

Development of pharmacodynamic (Pd) biomarkers for CNS drug development should be a high priority. For timely delivery of such biomarkers, their development should occur in parallel with the transition of a molecule from non-clinical to clinical development.

Why are CNS diseases difficult to treat?

It is well recognized that few drugs (~11%) entering clinical trials will be approved for human use [2]. Indeed, the rate of successful development is poorer for CNS drugs than for other therapeutic areas, averaging only 8% [3], with half of these failures occurring late in development (see Chapter 2).

Unfortunately, many of these failed development programs have been associated with persisting uncertainty about the relevance of their biological targets to human diseases. The story of neurotrophic factor development in ALS and PD is a great example, where clinical trials were conducted without any certainty that the interventions actually entered the CNS in adequate concentrations or impacted their cognate receptors [4–6].

In addition, difficulties arise because of the complexity of the CNS and the challenges of developing clinical endpoints to capture therapeutic effects on multiple functional domains controlled by complex brain
circuits. Some of these problems have arisen because we have historically used clinical scales that were developed to describe subgroups of domains affected by CNS diseases. One example is the EDSS in MS, which is largely influenced by lower extremity walking function, but less so by cognition or upper extremity function. Furthermore, it has been extremely difficult to determine and agree on what degree of change represents a clinically meaningful outcome. All of these factors have contributed to a high degree of failure in late stages of development (see Chapter 2).

This chapter will focus on the challenges presented by therapeutic targets in the CNS, the manner in which animal models have been and can be used to support the translation of therapeutic biologies to the human CNS, the BBB, and the uncertainties of CNS drug exposures. This chapter will also attempt to address how these challenges can be met and the associated risks of CNS therapeutic development mitigated.

Targets

CNS targets traditionally tend to be proteins that are critical to neural signaling, trophic signaling, or guidance of neural projections. Neural signaling proteins include neurotransmitter receptors, ion channels, and neurotransmitter transporters. Other proteins include members of the glial cell line-derived neurotrophic factor (GDNF), neurotrophin and other trophic factor receptor families. Finally, a complex network of repulsion and attraction guidance factors includes the Nogo receptor, Lingo receptor, and semaphorin families.

Target selection is always difficult in the drug development process. It is particularly challenging for neurological indications owing to the relative complexity of the human CNS, the paucity of validated targets (most neurological targets, while tantalizing, are poorly validated and thus very risky) and the ‘orphan’ nature of many neurological diseases.

CNS targets that have been clinically validated in human CNS disease include the dopamine D2, serotonin 5HT1b, gamma amino butyric acid-A (GABA-A), and N-methyl-D-aspartate (NMDA) receptors, the norepinephrine and 5HT transporters, and monoamine oxidase and catechol-O-methyl transferase (COMT). Additionally, MS, which is the most successful field of neurotherapeutics when one considers disease modification, has several robustly validated targets that include the VLA-4 integrin and type 1 interferon receptors.

Non-validated targets are numerous and have been and will be identified through the understanding and modeling of putative disease related pathways. Analyses of human neuropathology and disease genetics have identified targets such as Aβ amyloid and tau in Alzheimer’s disease, and NGF, TRPV1 and voltage-gated sodium channels in pain.

When considering such high risk targets, it is necessary to develop as much information as possible around the target’s expression in human disease and to predict the behavior of the target’s specific pharmaceutical effects, including affinities, metabolism and CNS concentrations, and biological effects prior to making decisions to advance through clinical trials.

Animal models of CNS biology and ‘human disease’

Animal CNS disease models are unique tools that have led to a significant increase in the number of potential new therapeutic targets and an improved understanding of the biologies underlying disease processes. They have, however, proven disappointing in predicting therapeutic efficacy in humans in many areas including neuropathic pain, ALS, PD, stroke, spinal cord injury and MS [7–9]. This has led to considerable debate about the use of animal models in drug discovery. Nevertheless, models have clear utility and have largely suffered from the use of incorrect assumptions, inappropriate endpoints, and a failure to understand their limitations. Models attempt to extend human pathology to other species or to in vitro systems. Few models succeed in perfect replication of human disease. The modeling of human nervous system biologies in lower species such as rodents is particularly difficult. Some reasons for this include the enormous relative complexity of the human cerebral cortex required to support language, self-awareness, and comprehension, in addition to the deep nuclear development required for upright walking. This significantly impacts human drug development. In pain, for example, the main burden of pain is spontaneous, while animal models can only be interrogated using evoked pain outcomes such as thermal hyperalgesia. Similar issues can be expected in cognitive research and drug development for spinal cord injury and psychiatric disease. The reasons for the translational disconnect between rodents and humans have been recently summarized [8].

Many targets lack homology across species and thus have quite different affinities for drug molecules. In addition, animal models are quite susceptible to changes in genetic background and transgenic animal models
suffer from transgene dose variations. Challenges in simulating human drug exposures in rodents and even non-human primates may often be an additional problem, owing to rapid and alternative metabolism pathways that may generate species specific metabolites. In contrast to rodents, humans walk upright and engage in highly complex social interactions that require unique functional domains in the human CNS. This has been particularly challenging for the development of cognition enhancing drugs for the treatment of AD, mild cognitive impairment and schizophrenia.

Some of these issues arise because of the difference between the conduct of survival studies in animal facilities and in human studies, the difficulty of examining spontaneously occurring symptoms, such as pain, in animals, and the frequent under-powering of therapeutic studies in animals [10, 11].

In summary, the animals (primates and non-primates) themselves are physiologically quite different from humans. They metabolize drugs differently from humans, and the application of human clinical endpoints such as cognition and survival runs the risk of misinforming decisions to move drugs into human development.

In general, an animal model can be extremely informative in translating an in vitro defined concentration-binding relationship to an in vivo dose-effect or, more usefully, an in vivo concentration-effect relationship through demonstration of target engagement using histological or physiological outcomes, e.g., Aβ clearance from the brains of mutant APP transgenic mice [12]. In addition, animal models can support the identification and validation of pharmacodynamic markers (see later). It is in the use of the above approaches that one can avoid the pitfalls associated with the often underappreciated differences between animal models and patients.

Blood brain barrier and its impact on CNS drug development

In addition to achieving appropriate selectivity, efficacy, and systemic pharmacokinetic characteristics, CNS therapeutics must overcome an additional hurdle: they must penetrate the blood brain (or if given intrathecally, the CSF-brain) barrier [13].

The BBB is a physical and physiological barrier that biochemically isolates the CNS, so preserving CNS homeostasis by protecting it from endogenous and exogenous toxins. This barrier is composed of several components:

• A physical barrier created by high-resistance tight junctions between the endothelial cells of the cerebral vasculature.
• A dynamic barrier composed of:
  ◦ Enzymes, including esterases and peptidases, that are expressed in endothelium and astrocytes. These enzymes are inducible and act as ‘enzymatic barriers’ to brain influx of xenobiotics [13].
  ◦ Active efflux transporters, which actively remove biological molecules from the CNS. There are multiple transporters having multiple substrates [14]. These transporters may be energy-dependent or energy-independent.
    • p-glycoprotein (p-GP; MDR1; ABCB1) is the best characterized of the BBB efflux transporters for low molecular weight molecules. It, along with the MRP (ABCC) family and BCRP (ABCG2), is a member of the ATP binding cassette protein family of energy-dependent transporters that are expressed at the BBB [15]. Assays are currently available to assess if drug candidates are p-GP substrates.
    • Energy-independent efflux transporters belong to the SLC gene family and include the organic anion transporter, acidic amino acid transporter and others including the taurine transporter [16].
    • Macromolecules such as IgG and transferrin are actively effluxed by transcytosis from the brain to circulating blood. In addition, Aβ is actively effluxed across the BBB via the low-density lipoprotein related protein.

There are several reviews of the BBB and its relevance to CNS pharmaceutical development worth studying [13, 14, 16, 17]. In dealing with the BBB it is important to distinguish it from the blood-CSF and CSF-brain barriers, which demonstrate significant histological and physiological differences from the BBB. In fact, these differences do result in differences in drug concentrations in human CSF and brain extracellular fluid, as measured by intraoperative microdialysis [18].

Identifying and creating compounds with optimal CNS pharmaceutical properties

To be used in the treatment of CNS disorders, any pharmaceutical must be able to achieve adequate concentrations in the brain and spinal cord extracellular fluid.
In order to do this, a molecule must achieve a favorable equilibrium between CNS entry and efflux.

Lipophilic compounds may be able to diffuse across the BBB, particularly those of low molecular weight (<500 Da). One quick ‘rule of thumb’ for small molecules states that a molecule with five or fewer nitrogen and oxygen atoms has a high chance of entering the brain [19]. Nevertheless, it is important to remember that the majority of small molecules are unable to cross the BBB [16]. So while rational drug design using combinatorial synthesis and computational models to predict BBB penetration should be used, various in vitro assays should also be used to evaluate the BBB penetration properties of a drug candidate prior to its consideration for development as a CNS therapeutic [20].

In addition, the above does not apply to macromolecules such as recombinant proteins and monoclonal antibodies, which are being increasingly used to target neuropathologies such as AD. Current data suggest that <0.1% of systemic concentrations of a monoclonal antibody will enter the brain [21], as assessed by the analysis of CSF, being used as an imperfect surrogate for brain interstitial fluid.

Ultimately, empiric investigation of a drug’s penetration into brain in humans is best and can be studied non-invasively by PET or MR spectroscopy, where possible.

Various approaches can be used to enhance CNS drug delivery [16]. Chemical modifications of the pharmaceutical molecules are regularly used. These modifications are made to overcome limited access of drugs to the brain. They can be divided into lipid-mediated transport, a pro-drug approach, and a lock-in system. There are several disadvantages of these systems, which include increasing permeability to all biological membranes in the body and enhancing efflux as well as influx through the BBB.

Exploitation of carrier-mediated and receptor-mediated transport can also be used. L-DOPA is a classic example, where dopamine, a water-soluble molecule which cannot cross the BBB, is transformed to a large neutral amino acid that uses the large neutral amino acid transporter to cross the BBB, after which it is converted to the active neurotransmitter. Genetically engineered fusion proteins that exploit the receptor-mediated transport system are another tool for delivering large molecules across the BBB. Receptor systems currently being evaluated include the insulin receptor and the transferrin receptor [22].

Other workers have attempted to increase drug delivery through disruption of the BBB. The most robust literature describes hyperosmotic shock disruption of the BBB. In this method, the hyperosmolar agent mannitol has been used to administer antineoplastic drugs such as methotrexate in the treatment of brain tumors [23].

The most apparently simple way to deliver a drug to the CNS is to by-pass the BBB entirely. Intra-parenchymal delivery via catheter has been used, most recently in clinical trials of recombinant GDNF in PD, with mixed results [4]. It remains unclear if adequate amounts of the drug were delivered to the target putamen [24]. This could be a function of poor diffusion from the catheter tip. One approach to deal with this problem is to use bulk flow, through convection-enhanced diffusion, to drive drug distribution. This method has been used to treat gliomas and metabolic brain disease (e.g., neuronopathic Gaucher’s disease) [25]. Finally, intrathecal (IT) and intraventricular (ICV) approaches deliver drug to the CSF, so by-passing the blood-CSF barrier. These methods have had success in the management of pain (IT opioids), spasticity (IT baclofen) and meningeal neoplasia. Nevertheless, drug diffusion across the CSF-brain barrier (ependyma) is slow, inversely proportional to molecule size, and results in drug delivery just adjacent to the ependymal surfaces [26, 27], thus limiting its utility in delivery to deep brain structures.

With all these considerations, it is clear that drug development for the CNS is very challenging and that evidence of CNS delivery and CNS target engagement is critical to avoid the significant late stage trial failures associated with the CNS therapeutic area.

How can biomarkers and CNS exposure measurements mitigate the risks of CNS drug development?

Biomarkers

Biological markers (biomarkers) are objectively measured characteristics that can be evaluated to describe normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention [28]. The term biomarker has been used very loosely, yet a set of definitions has been laid down by an NIH working group since 2001 [29]:

• Type 0 biomarker: A biomarker that changes longitudinally with the disease process.
• Type 1 biomarker: A biomarker that changes in response to a therapeutic intervention, i.e., a
Figure 3.2. Examples of critical biomarker readouts in drug development. Flow shown for a candidate therapeutic, with a yes/no branch at each step: engages target; affects target; alters disease.
pharmacodynamic marker. It does not necessarily have to reflect a change in the disease process.
• Type 2 biomarker: A biomarker whose change can predict a clinical alteration in a disease process, i.e., a ‘surrogate biomarker’, and be used in place of the clinical endpoint.

Biomarkers can be classified further according to the questions that they can answer in human clinical trials during the course of a drug development plan. There are three broad questions (Figure 3.2):

• Does the therapeutic hit/engage its target? Can it do this at reasonable doses with acceptable safety?
• Does the therapeutic, once engaged, alter the target’s behavior? What is the dose-effect/concentration-effect relationship?
• If the therapeutic engages and alters target behavior, what is the effect on the disease of interest?

Measuring CNS exposure and target engagement

Ideally, one can ascertain the concentration of free drug in the interstitial fluid of the CNS. The plasma pharmacokinetics of systemically delivered drugs has been extensively investigated, but kinetics in the human brain has received scant attention. In practice, measuring it is extremely difficult without the use of invasive techniques. Invasive measurement techniques include microdialysis [30] and CSF measurement. Microdialysis has been used to measure drug concentrations in brain parenchyma in the context of epilepsy surgery.

CSF is a poor surrogate for assessing CNS exposure [31] because it does not always correlate accurately with unbound drug concentration in the brain. As described above, the blood-CSF barrier differs from the BBB because it is formed by the choroid plexus epithelium, which does have tight junctions, but where the capillaries are fenestrated. In addition, the blood-CSF barrier has its own specific drug transporters [22]. Furthermore, brain interstitial fluid and CSF are composed differently and, indeed, significantly lower levels of a number of anti-convulsants have been demonstrated in CNS interstitial fluid when compared to CSF [18].

Non-invasive imaging techniques have been used with success to confirm CNS delivery of small and biological molecules and to demonstrate target engagement by the therapeutic agent [32, 33]. Radio-labeled receptor ligands have been used with PET scanners to confirm delivery and receptor occupancy (RO) rates in the development of psychiatric and neurological pharmaceuticals. In one recent clinical study (Figure 3.3) we
[Figure 3.3 plots: receptor occupancy (%) versus (a) dose (mg/d), (b) Cmin (µg/mL), and (c) AUC0–τ (µg·h/mL).]
Figure 3.3. Relationship among vipadenant dose (a), steady-state Cmin (b), and steady-state AUC0–τ (c) and receptor occupancy in the putamen. Receptor occupancy was estimated using Bayesian Emax (dose) or logistic regression (Cmin, AUC0–τ) models. Black circles and the thick line represent the mean predicted receptor occupancy; white diamonds represent actual trial data. Thin lines represent the 95% confidence limits of the predicted mean. (Reproduced from reference 33 with permission from Wolters Kluwer Health.)
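The Emax dose-occupancy relationship named in the caption of Figure 3.3 can be illustrated with a minimal sketch. It assumes the simplest hyperbolic form, RO = Emax × dose / (ED50 + dose), rather than the Bayesian model fitted in the cited study, and the function names and parameter values (Emax of 100% and ED50 of 10 mg/d) are hypothetical, chosen only to show the shape of the curve and how a dose range might be read from it.

```python
def emax_occupancy(dose, emax, ed50):
    """Predicted receptor occupancy (%) from a simple hyperbolic Emax model.

    dose -- daily dose (mg/d), must be >= 0
    emax -- maximal achievable occupancy (%)
    ed50 -- dose producing half of emax (mg/d), must be > 0
    """
    if dose < 0 or ed50 <= 0:
        raise ValueError("dose must be >= 0 and ed50 must be > 0")
    return emax * dose / (ed50 + dose)


def dose_for_occupancy(target, emax, ed50):
    """Invert the model: dose (mg/d) predicted to reach a target occupancy (%)."""
    if not 0 <= target < emax:
        raise ValueError("target occupancy must lie in [0, emax)")
    return ed50 * target / (emax - target)


# Hypothetical parameters for illustration only (not estimates from any study).
EMAX_PERCENT = 100.0
ED50_MG_PER_DAY = 10.0

if __name__ == "__main__":
    for dose in (0, 5, 10, 20, 50, 100):
        ro = emax_occupancy(dose, EMAX_PERCENT, ED50_MG_PER_DAY)
        print(f"dose {dose:>3} mg/d -> predicted occupancy {ro:5.1f}%")
```

Under these assumed values, the inverse function shows why occupancy data help to bound a dose range: each further increment in occupancy near the plateau requires a disproportionately larger dose.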
used a 11C-labeled high affinity ligand for the adenosine A2a receptor to establish dose-RO and blood PK-RO relationships. These relationships did not tell us anything about the potential clinical efficacy in patients but enabled us, with confidence, to refine a dose range for testing in a series of clinical efficacy phase 2 trials.

Measuring biological effects of a therapeutic

In the absence of the ability to define CNS exposures, a Pd marker that demonstrates biochemical alterations of the target is almost a necessity, except in those indications, e.g., neuropathic pain, where phase 2 studies can empirically test a number of potentially efficacious dose paradigms using clinical outcome measures in small numbers of patients over short follow-up periods.

Pd markers can be biochemical (e.g., serum cytokines, serum soluble Aβ species), radiological (e.g., Pittsburgh B PET ligand) or electrophysiological (e.g., magnetoencephalography) in nature. Pd biomarkers are used in Phase 1, 2, and 3 studies, although most commonly in Phase 1 and 2 studies.

Developers of ataluren for Duchenne muscular dystrophy (DMD) have used an elegant set of human and animal studies to support the use of muscle expression of dystrophin as a pharmacodynamic marker to confirm biological activity and to refine the clinical dose range before embarking on Phase 2B and 3 studies that rely on highly variable endpoints such as the 6-minute walk test [34]. In one set of experiments [35], the developers demonstrated that ataluren can drive ribosomal read-through of premature stop codons, promoting expression of dystrophin in primary muscle cells from Duchenne patients and mdx mice, and rescue striated muscle function in mdx mice. These data laid the groundwork for a Phase 2A study that demonstrated the upregulation of dystrophin in muscle biopsies from ataluren treated DMD patients. This is a nice example of translational validation of a Pd biomarker from non-clinical to clinical experiments.

Other examples of biochemical Pd biomarkers have supported AD drug development, most particularly for Aβ targeted strategies. Recently, clinical development of solanezumab as a passive immunotherapy for AD was advanced to Phase 3 following a human study that demonstrated substantial dose-dependent increases in plasma and CSF Aβ [36]. This clinical observation mimicked the serum and CSF effects of M266 (a murine homolog of solanezumab) that were linked, in animal studies, to the clearance of Aβ from the CNS [37]. The validity of this translational approach using a set of biochemical biomarkers was critically supported by a phase 0 study demonstrating Aβ’s diurnal variation in CSF, thus helping to control for spurious conclusions around non-treatment related alterations in CSF Aβ levels [38].

Radiologically, Aβ imaging has become a useful tool for measuring the effects of anti-Aβ therapies
in humans [39]. Non-clinical radio-pathological correlation experiments have confirmed the feasibility of imaging Aβ clearance [40] from the CNS. In addition, a reduction in Pittsburgh Compound B signal on human brain PET scans following passive immunotherapy has been demonstrated [41], further supporting the validity of this methodology for deriving biological proof of principle in human CNS trials. There are no human data showing that these interventions or CNS Aβ clearance mitigate AD dementia, but the biologically relevant doses have been defined, allowing the necessary Phase 3 confirmatory studies that use clinical dementia outcome measures to confidently test the hypothesis that Aβ deposition in the brain is a key pathophysiological event that drives AD dementia.

Similar preparatory non-clinical and phase 0 clinical studies are warranted for the development of Pd biomarkers for proof of biological principle studies. Such tools can markedly reduce the risk of moving forward in CNS drug development by providing robust proof of biology and by identifying a biologically relevant dose range for further clinical studies.

Conclusions

All clinical trials, like any scientific experiment, must give a clear answer that supports a clear decision. This goal is particularly challenging for CNS therapeutics development and has resulted in a high historical rate of attrition. Nevertheless, new imaging, biochemical and electrophysiological methods offer opportunities to mitigate the risks of failure. To do this, a CNS drug development plan must define clear parameters to test a biological hypothesis through the direct or indirect confirmation of target engagement and alteration within the CNS (through CNS imaging, electrophysiological, or biochemical assays), followed by correlation of that biology to a clinically meaningful outcome. In other words, the goal of early Phase 1 and 2 development is to confirm biological effect prior to embarking on a phase 3 comparative efficacy clinical protocol.

References

1. Sormani MP, Bonzano L, Roccatagliata L, et al. Magnetic resonance imaging as a potential surrogate for relapses in multiple sclerosis: a meta-analytic approach. Ann Neurol 2009; 65: 268–75.
2. Kola I, Landis J. Can the pharmaceutical industry reduce attrition rates? Nature Rev 2004; 3: 711–5.
3. Miller G. Is pharma running out of brainy ideas? Science 2010; 329: 502–4.
4. Gill SS, Patel NK, Hotton GR, et al. Direct brain infusion of glial cell line-derived neurotrophic factor in Parkinson disease. Nature Med 2003; 9: 589–95.
5. A controlled trial of recombinant methionyl human BDNF in ALS: The BDNF Study Group (Phase III). Neurology 1999; 52: 1427–33.
6. Ochs G, Penn RD, York M, et al. A phase I/II trial of recombinant methionyl human brain derived neurotrophic factor administered by intrathecal infusion to patients with amyotrophic lateral sclerosis. Amyotroph Lateral Scler Other Motor Neuron Disord 2000; 1(3): 201–6.
7. Rothstein JD. Current hypotheses for the underlying biology of amyotrophic lateral sclerosis. Ann Neurol 2009; 65(Suppl 1): S3–9.
8. Geerts H. Of mice and men: bridging the translational disconnect in CNS drug discovery. CNS Drugs 2009; 23: 915–26.
9. Akhtar AZ, Pippin JJ, Sandusky CB. Animal models in spinal cord injury: a review. Rev Neurosci 2008; 19: 47–60.
10. Ludolph AC, Bendotti C, Blaugrund E, et al. Guidelines for preclinical animal research in ALS/MND: A consensus meeting. Amyotroph Lateral Scler 2010; 11: 38–45.
11. Gold R, Linington C, and Lassmann H. Understanding pathogenesis and therapy of multiple sclerosis via animal models: 70 years of merits and culprits in experimental autoimmune encephalomyelitis research. Brain 2006; 129: 1953–71.
12. Wilcock DM, Rojiani A, Rosenthal A, et al. Passive amyloid immunotherapy clears amyloid and transiently activates microglia in a transgenic mouse model of amyloid deposition. J Neurosci 2004; 24: 6144–51.
13. Ghersi-Egea JF, Leininger-Muller B, Cecchelli R, and Fenstermacher JD. Blood-brain interfaces: relevance to cerebral drug metabolism. Toxicol Lett 1995; 82–83: 645–53.
14. Terasaki T and Ohtsuki S. Brain-to-blood transporters for endogenous substrates and xenobiotics at the blood-brain barrier: an overview of biology and methodology. NeuroRx 2005; 2: 63–72.
15. Loscher W and Potschka H. Blood-brain barrier active efflux transporters: ATP-binding cassette gene family. NeuroRx 2005; 2: 86–98.
16. Patel MM, Goyal BR, Bhadada SV, et al. Getting into the brain: approaches to enhance brain drug delivery. CNS Drugs 2009; 23: 35–58.
17. Neuwelt E, Abbott NJ, Abrey L, et al. Strategies to advance translational research into brain barriers. Lancet Neurol 2008; 7: 84–96.
26
18. Rambeck B, Jurgens UH, May TW, et al. Comparison of brain extracellular fluid, brain tissue, cerebrospinal fluid, and serum concentrations of antiepileptic drugs measured intraoperatively in patients with intractable epilepsy. Epilepsia 2006; 47: 681–94.
19. Norinder U and Haeberlein M. Computational approaches to the prediction of the blood-brain distribution. Adv Drug Deliv Rev 2002; 54: 291–313.
20. Lohmann C, Huwel S, and Galla HJ. Predicting blood-brain barrier permeability of drugs: evaluation of different in vitro assays. J Drug Target 2002; 10: 263–76.
21. Rubenstein JL, Combs D, Rosenberg J, et al. Rituximab therapy for CNS lymphomas: targeting the leptomeningeal compartment. Blood 2003; 101: 466–8.
22. Pardridge WM. Drug delivery to the brain. J Cereb Blood Flow Metab 1997; 17: 713–31.
23. Kroll RA and Neuwelt EA. Outwitting the blood-brain barrier for therapeutic purposes: osmotic opening and other means. Neurosurgery 1998; 42: 1083–99; Discussion 99–100.
24. Salvatore MF, Ai Y, Fischer B, et al. Point source concentration of GDNF may explain failure of phase II clinical trial. Exper Neurol 2006; 202: 497–505.
25. Lonser RR, Schiffman R, Robison RA, et al. Image-guided, direct convective delivery of glucocerebrosidase for neuronopathic Gaucher disease. Neurology 2007; 68: 254–61.
26. Krewson CE, Klarman ML, and Saltzman WM. Distribution of nerve growth factor following direct delivery to brain interstitium. Brain Res 1995; 680: 196–206.
27. Blasberg RG, Patlak C, and Fenstermacher JD. Intrathecal chemotherapy: brain tissue profiles after ventriculocisternal perfusion. J Pharmacol Exper Ther 1975; 195: 73–83.
28. Frank R and Hargreaves R. Clinical biomarkers in drug discovery and development. Nature Rev 2003; 2: 566–80.
29. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Therap 2001; 69: 89–95.
30. Helmy A, Carpenter KL, and Hutchinson PJ. Microdialysis in the human brain and its potential role in the development and clinical assessment of drugs. Curr Med Chem 2007; 14: 1525–37.
31. Lin JH. CSF as a surrogate for assessing CNS exposure: an industrial perspective. Curr Drug Metab 2008; 9: 46–59.
32. Weinmann O, Schnell L, Ghosh A, et al. Intrathecally infused antibodies against Nogo-A penetrate the CNS and downregulate the endogenous neurite growth inhibitor Nogo-A. Mol Cell Neurosci 2006; 32: 161–73.
33. Brooks DJ, Papapetropoulos S, Vandenhende F, et al. An open-label, positron emission tomography study to assess adenosine A2A brain receptor occupancy of vipadenant (BIIB014) at steady-state levels in healthy male volunteers. Clin Neuropharmacol 2010; 33: 55–60.
34. Finkel RS. Read-through strategies for suppression of nonsense mutations in Duchenne/Becker muscular dystrophy: aminoglycosides and ataluren (PTC124). J Child Neurol 2010; 25: 1158–64.
35. Welch EM, Barton ER, Zhuo J, et al. PTC124 targets genetic disorders caused by nonsense mutations. Nature 2007; 447: 87–91.
36. Siemers ER, Friedrich S, Dean RA, et al. Safety and changes in plasma and cerebrospinal fluid amyloid beta after a single administration of an amyloid beta monoclonal antibody in subjects with Alzheimer disease. Clin Neuropharmacol 2010; 33: 67–73.
37. DeMattos RB, Bales KR, Cummins DJ, et al. Peripheral anti-A beta antibody alters CNS and plasma A beta clearance and decreases brain A beta burden in a mouse model of Alzheimer’s disease. Proc Natl Acad Sci USA 2001; 98: 8850–5.
38. Blennow K, Zetterberg H, Minthon L, et al. Longitudinal stability of CSF biomarkers in Alzheimer’s disease. Neurosci Lett 2007; 419: 18–22.
39. Bacskai BJ, Frosch MP, Freeman SH, et al. Molecular imaging with Pittsburgh Compound B confirmed at autopsy: a case report. Arch Neurol 2007; 64: 431–4.
40. Maeda J, Ji B, Irie T, et al. Longitudinal, quantitative assessment of amyloid, neuroinflammation, and anti-amyloid treatment in a living mouse model of Alzheimer’s disease enabled by positron emission tomography. J Neurosci 2007; 27: 10957–68.
41. Rinne JO, Brooks DJ, Rossor MN, et al. 11C-PiB PET assessment of change in fibrillar amyloid-beta load in patients with Alzheimer’s disease treated with bapineuzumab: a phase 2, double-blind, placebo-controlled, ascending-dose study. Lancet Neurol 2010; 9: 363–72.
Section 2: Concepts in biostatistics and clinical measurement

Chapter 4: Fundamentals of biostatistics
Judith Bebchuk and Janet Wittes

Statistical formulation of clinical questions

While in vitro and animal experimentation can yield valuable information about the action of a new drug or other intervention, only clinical trials on human beings can determine a drug’s safety profile and clinical efficacy in humans. The necessity for using people as subjects in potentially risky experiments makes clinical trials difficult to perform well, since they must be conducted in accordance not only with scientific rigor but also observing ethical guidelines, regulatory codes, and legal statutes. Furthermore, the investigators designing and carrying out a clinical trial must be fully aware of the serious consequences of their findings and must maintain high standards of scientific probity. The large number of competing therapies, the high cost of conducting a clinical trial, the ethical considerations that mandate against further testing of therapies shown to have an unfavorable profile of risks and benefits, and the desire for the timely introduction of new and effective therapies into general application all sharply limit the redundancy of clinical research and increase the importance of the integrity of individual studies. For example, if a clinical trial wrongly declares a beneficial therapy to be ineffective, the potential gains from use of the therapy will most likely be lost indefinitely because the therapy will probably not be tested again. Conversely, the falseness of a finding that an unsafe drug is safe or that a worthless therapy is beneficial may not be detected until large amounts of resources are wasted or until many people are hurt.

This chapter highlights some important aspects of the design and analysis of clinical trials and sketches a number of relevant statistical concepts [1–3]. The remainder of this book presents more complete discussions of various topics addressed in this chapter. The chapter begins with some general ideas about the design of controlled clinical trials. It then sketches basic statistical principles with an introduction to calculating the necessary sample size for trials. A basic formula is presented for sample size that can be adapted to continuous, binary, and time-to-failure variables. Because of the importance in neurology of trials studying time to failure, analyses relevant to this type of outcome are then introduced. Issues related to the effect of multiplicity on sample size are addressed and issues that affect sample size are mentioned. Finally, Bayesian analysis is briefly introduced. Throughout, we illustrate the methods using an example from a hypothetical trial that tests the cognitive subscale of the Alzheimer’s Disease Assessment Scale (ADAS-Cog).

General principles of the design of controlled clinical trials

A controlled clinical trial of a medical intervention should have at least one primary hypothesis that drives its design. Typically, the hypothesis is expressed in terms of the effect of the intervention on one or more outcomes of primary interest. For example, investigators studying a new drug that potentially decreases the rate of loss of cognitive functioning may hypothesize that the drug will lead to a lower decline of score on a cognitive assessment test than the control treatment. In general, the more explicit the stated primary purpose, the more likely one can design a feasible study to answer the question of interest.

Well-designed and well-executed trials include an unambiguous protocol approved by the Institutional
Review Boards (IRBs) or Ethics Committees of the participating clinics, laboratories, and data centers. The comprehensiveness of both the protocol and its supporting manuals of operation should reflect the size and complexity of the investigation and the length of the follow-up, as well as the number of clinical centers, laboratories, and other organizations involved. During a trial, unexpected events may occur that necessitate changes in the protocol. The protocol should include explicit, well-documented procedures for making amendments after the study has begun in order to protect the scientific integrity of the trial.

Primary outcome

A clinical trial should include at least one explicit, unambiguously defined primary outcome that forms the basis for calculating the sample size of the trial. For example, in a study of the effect of a new anti-dementia drug on progression of Alzheimer’s disease, the protocol should state the outcome in terms of the measure that will be used. For example, the endpoint may be ‘cognitive function’ or ‘functional ability’ as assessed by a specific instrument but not a vague reference to ‘measures of disease progression.’

Most neurological clinical trials have one of three types of primary outcome: a continuous or ordinal variable, a binary variable, or a time-to-failure variable. These three types of outcomes lead to different types of studies.

A continuous variable is a quantity like a score that is measured on a continuous, or nearly continuous, scale. (In the specific example of ADAS-Cog, the score ranges from 0 to 70.) An ordinal variable has several ordered classes. For example, the Clinical Global Impression is a 7-point scale that measures the global functional status of a patient. A binary variable has two possible values, for example, a score above or below 40 on the ADAS-Cog or, for trials studying the effect of an intervention on mortality, dead or alive.

A time-to-failure variable measures the time from randomization to the occurrence of an event. (Some trials use time from initiation of treatment, but using any time different from randomization compromises the expected equivalence of the study groups assured by the process of randomization.) Time to death and time to loss of 20 points on the ADAS-Cog are examples of time-to-failure variables. See Chapter 7 for a fuller discussion on issues related to measurement.

Secondary and exploratory outcomes

Most clinical trials in neurology study more than the primary outcome. Secondary outcomes are measures that are of clinical interest but are less important to the aims of the trial than the primary outcome.

The protocols of many clinical trials list a host of secondary outcomes with little consideration of their relative importance in terms of the inferences to be made from the trial. A helpful rule of thumb is to consider as secondary outcomes only those for which the investigators have formal hypotheses. Using that guideline, the investigators should include in the protocol of a clinical trial a list of the secondary outcomes with the planned methods of measurement and analysis as well as the magnitude of treatment effect the study is likely to detect. When secondary outcomes are specified, investigators should in general use the same degree of care in collecting relevant data for these outcomes as for the primary outcomes.

In addition to formal secondary outcomes, the protocol may list many variables to be measured as exploratory outcomes. Often, the sample size of the trial is too small to expect precise assessments of the effect of the experimental intervention on these outcomes. In other cases, too little information is available to calculate power for these exploratory outcomes. Moreover, the degree of care in collecting and validating these outcomes may be less intense than the care exerted for the primary and secondary outcomes.

Study population

A clinical trial must achieve a balance between the advantages of homogeneity and the advantages of heterogeneity. Ideally, the study cohort is sufficiently homogeneous to yield a high probability of learning whether a therapy is safe and effective while sufficiently heterogeneous to provide assurance that the observed results are applicable to a wide range of people with the condition under study. No rule provides reliable guidance for planning the composition of the study cohort for a single study or for structuring a series of studies to investigate a therapy in different populations. Failure to anticipate fully the consequences of overly rigid inclusion and exclusion criteria for a clinical trial may lead to great difficulty in recruiting patients [4].

Designers of clinical trials should not overestimate the ability of investigators to recruit participants into
the trial. We recommend as simple a set of entry and exclusion criteria as possible:

1. The criteria should mimic as closely as possible the patient population to which the results are intended to refer.
2. The criteria may exclude people who are unlikely to comply with the requirements of the protocol. For example, the study may exclude people who have severe underlying illnesses not under study, who are substance abusers, or who are likely to move to another geographic area during the study. Many trials exclude people who, because of cognitive problems, are not able to comply with the study regimen. In trials of neurological disorders, the patient population of interest may be cognitively impaired; in that case, the entry criteria should allow them to participate in the trial, but the protocol should be written in such a way as to facilitate compliance and the statistical analysis plan should deal explicitly with how to handle missing data arising from non-compliance. Similar considerations are relevant to trials of psychiatric conditions where the nature of the disease may lead to considerable non-compliance with the study regimen.
3. The criteria may exclude people taking medications that are not appropriate for use with the intervention being tested.
4. Exclusions on the basis of demographic criteria alone (e.g., sex, age, and race) are not often scientifically justifiable and may make recruitment unnecessarily difficult. In trials of new drugs, however, the standards of local IRBs and general ethical considerations may exclude women of childbearing potential. Furthermore, in trials of primary prevention of disease, a trial may reasonably exclude demographic subgroups with very low incidence rates to limit the sample size of the study and to focus the question of prevention on subgroups at high risk.
5. If possible, before the protocol is written, the entry and exclusion criteria should be applied to a database of people potentially eligible for the study in order to estimate the likely rate of recruitment.

Reference population

In planning a clinical trial, investigators often specify in the protocol the population to which the treatment is expected to apply. In particular, if a trial excludes a specific subgroup of people but investigators intend to generalize to a population more heterogeneous than that represented in the study cohort, the protocol should address both the justification for excluding the subgroup and the rationale for generalization to the reference population. Furthermore, the publications describing the results of the trial should include a description of the population to which the results are to be applied.

Projected timeline

Plans for a clinical trial should describe explicitly the timeline for an individual participant in the study. In studies in which the outcome is measured very soon after the participants are recruited, the timeline is uniform for each participant and follows calendar time, while in trials with long-term follow-up the timeline may differ for each participant [5]. In many long-term studies, patient enrollment takes place throughout the course of the trial, and follow-up for each participant continues until the trial ends. Some trials end on a pre-specified date; some end at a fixed number of months after the last participant has entered; and some end after a fixed number of primary outcome events have occurred. Because the rate of recruitment often differs from expected, the average follow-up time may be considerably longer or shorter than planned so that the probability of finding an effect of treatment may be higher, or lower, than expected. In particular, when recruitment is slower than anticipated, average follow-up time is likely to be longer than expected; conversely, rapid recruitment may lead to shorter follow-up time than planned.

Control group

Comparing observations from an experimental group to observations from a control group is central to science. In a few very unusual medical settings, the control group need not be explicit, for the new observation is so surprising that it defies all previous experience. Penicillin provides the classic example of a new drug that had an immediately obvious benefit and needed no control group to show efficacy. Almost always, however, the medical condition being studied varies in its presentation, and the treatment being studied elicits variable response. Therefore, rigorous, unbiased inference about the effect of a drug or other intervention requires comparison to a concurrent, randomized control. In very early phases of drug development a control group may not be necessary, but for clinical trials that aim to evaluate both safety and efficacy, a control
group is important. Since adverse events in a treatment group may be either a result of the medical condition being treated or a reaction to the new drug, the safety of a drug can only be accurately assessed by comparison with a control group. Similarly, the beneficial effect of a treatment can be measured only in relation to a control. The control group may be a group treated with a number of interventions, including placebo, ‘usual care,’ ‘standard of care’, ‘other therapy’ plus a placebo [5], a non-drug intervention (e.g., surgery or a behavioral intervention), or a competing drug. A control group should be as comparable as possible to the experimental group so that differences in the endpoint being studied are attributable solely to the difference in therapy. Randomization assures that the experimental and control groups have identical expected distributions of measured and unmeasured baseline variables. The larger the sample size, the more likely the two groups will be very similar to each other with regard to baseline characteristics. In small samples, while randomization ensures identical expected distributions, the actual distributions may differ sizably from each other by chance.

Basic statistics for randomized clinical trials

This section briefly describes the basic frequentist statistical testing paradigm used by the typical randomized clinical trial with particular reference to ideas necessary in selecting sample size. We do not address estimation because that topic is covered in Chapter 6. We introduce hypothesis testing and confidence intervals insofar as they are relevant to sample size calculation. Chapter 6 includes more detail.

Null and alternative hypotheses

The study question in a typical clinical trial is formulated in terms of two opposing hypotheses: the ‘null’ hypothesis and an ‘alternative’ hypothesis. The study is designed to provide evidence that will disprove the null and therefore ‘accept’ the alternative. For example, consider a trial of a drug whose purpose is to slow the rate of decline of the score on the ADAS-Cog. If the study has two arms, drug and placebo, the null hypothesis might be, ‘Mean change in score from baseline in patients treated with drug is the same as among those treated with placebo.’ A ‘one-sided’ alternative hypothesis would be, ‘Mean decrease in score from baseline is less among patients treated with drug than among those treated with placebo.’ The use of a one-sided alternative implies that the investigators do not entertain the possibility that the drug might lead to a greater decrease in mean change in score from baseline compared to placebo. A ‘two-sided’ alternative hypothesis would state, ‘Mean change in score from baseline is different among patients treated with drug than among those treated with placebo.’ Adoption of a two-sided alternative allows the data to provide evidence of either favorable or unfavorable effect of drug on cognitive function. For most comparative efficacy studies in neurology, two-sided alternative hypotheses are considered appropriate.

The type I error, or ‘α-level’

Having specified the null and alternative hypotheses, the investigators select an α-level, the probability of erroneously concluding that the null hypothesis is false if the null hypothesis is indeed true. Although the choice of α-level is arbitrary, many clinical trials use α = 0.05 or α = 0.01. To continue our example, suppose the new drug had no effect on the mean change in ADAS-Cog score from baseline and the investigators selected a two-sided α-level of 5%. Then, the probability would be 0.05 that the clinical trial would ‘reject’ the null hypothesis and falsely find that the mean change in score from baseline differs (either better or worse) in the treatment and control groups.

Many investigators view an experiment that produces a p-value less than 0.05 two-sided, or less than 0.025 one-sided, as strong evidence, even ‘proof’, that the treatment under study was effective. We caution that 0.05 is not a very stringent criterion: the probability that a single toss of a pair of dice yields two sixes is 1/36 = 0.028. If you were playing a game of backgammon and your opponent rolled a pair of sixes on the first toss, you would think the opponent was lucky; you would not think the dice were loaded.

The type II error, or ‘β-level’

To calculate sample size, the investigators must predict the degree of effect ΔA of the therapy under study (the subscript A denotes ‘alternative’). In our example, they might specify that drug treatment might lead to a decline in the ADAS-Cog subscale that is 10 points less than the decline in the control group. The β-level, or type II error rate, is the probability of failing to reject the null hypothesis if the true effect of the drug is ΔA. The choice of β-level, like the choice of α-level, is arbitrary. Typical values used in many clinical trials are 5, 10, or 20%. The smaller the α- and β-levels, the less likely the clinical trial is to make an incorrect conclusion. In our example, β = 0.20 would imply that if the true effect of treatment were to reduce the average decline in the ADAS-Cog over the period of the study by a mean of 10 points relative to the decline in the control group, the probability of failing to reject the null hypothesis would be 0.20, in which case we would not learn that the treatment was truly efficacious.
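The two error rates can be made concrete with a small simulation. The sketch below is illustrative only: the per-group sample size of 16, the standard deviation of 10 points, the use of a z-test with known standard deviation, and the function name are all assumptions introduced here, not values taken from the trial example. It simulates many two-arm trials, once with no true effect and once with a true effect ΔA of 10 points, and counts how often a two-sided test at α = 0.05 rejects the null hypothesis.

```python
import random
from statistics import NormalDist

def rejection_rate(true_diff, n_per_group=16, sigma=10.0, alpha=0.05,
                   n_trials=20000, seed=1):
    """Fraction of simulated two-arm trials in which a two-sided z-test
    rejects the null hypothesis of no difference between arms.
    All default values are hypothetical choices for illustration."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    se_mean = sigma / n_per_group ** 0.5              # SE of one arm's mean
    se_diff = (2 * sigma ** 2 / n_per_group) ** 0.5   # SE of the difference in means
    rejections = 0
    for _ in range(n_trials):
        # Draw each arm's sample mean directly from its sampling distribution.
        control_mean = rng.gauss(0.0, se_mean)
        treated_mean = rng.gauss(true_diff, se_mean)
        z = (treated_mean - control_mean) / se_diff
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

type_i_rate = rejection_rate(true_diff=0.0)         # null hypothesis is true
type_ii_rate = 1 - rejection_rate(true_diff=10.0)   # true effect equals Delta_A
print(f"type I error rate  ~ {type_i_rate:.3f}")    # close to alpha = 0.05
print(f"type II error rate ~ {type_ii_rate:.3f}")   # roughly 0.19 under these assumptions
```

Under these assumed inputs the simulated type I error rate sits near the chosen α, while the type II error rate depends on the sample size, the variability, and the size of the true effect.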
Statistical tests of significance

A test of significance is a procedure that calculates whether the observed data provide sufficient evidence to reject the null hypothesis. The choice of test depends on the nature of the outcome under study. Standard textbooks on statistics provide many tests tailored to different settings [6–8].

Sample size

Selecting the α- and β-levels and hypothesizing the effect ΔA of the drug allow calculation of the sample size. The required sample size increases with any of the following:

1. decreasing α-level
2. decreasing β-level
3. increasing variability of the outcome
4. decreasing ΔA

Thus, although ideally the α- and β-levels would both be very small, practical and economic constraints limit the sample size and preclude arbitrarily low error rates.

Power

The power γ of a statistical test is 1 − β: the probability of rejecting the null hypothesis when the true effect of treatment is ΔA. Power is often expressed in terms of percentages. In our example, the β-level is 0.20 so the power γ is 0.80, or, as usually expressed, 80%, when ΔA = 10 points. Therefore, if the true effect of drug is to decrease the decline by 10 points relative to control, the power is 1 − 0.2 or 80%.

The power decreases with any of the following:

1. decreasing α-level
2. decreasing ΔA
3. decreasing sample size
4. increasing variability of the outcome

Note that while we speak of power at the alternative ΔA, in fact power is a function of the class of possible alternatives. Rather than speak simply of power as a single number, a more useful construct is to consider power as a function γ(Δ) where the power is calculated over a range of values of effect sizes Δ (see Figure 4.1).

[Figure 4.1. Power of a test of the difference in ADAS-Cog score between anti-dementia treatment and placebo. The standard deviation of ADAS-Cog score in each group is assumed to be 10. Horizontal axis: difference in ADAS-Cog score (0 to 10); vertical axis: power (0 to 1).]

Figure 4.1 shows the power as a function of difference in ADAS-Cog scores for a test of a new anti-dementia drug compared with placebo. The study, which has been designed to have a two-sided α-level of 5%, has 84 participants each in the active and standard of care arms. The standard deviation of the ADAS-Cog score is assumed to be 10 points. Note that if the true difference in treatment effect between the anti-dementia drug and placebo is 5 points, the power to detect the difference is roughly 90%. If, however, the true difference is 2 points the power is only about 25%.

For fixed power, sample size increases proportionately to the variance (which is the square of the standard deviation) and inversely proportionately to the square of the difference to be detected. Thus, if for a given α-level and power, 100 people per group are needed to detect a difference of four points, then 400 are needed to detect a difference of two points.

The p-value

When all data from a study have been gathered, the primary hypothesis is tested. The p-value is the probability
of observing an apparent effect of treatment at least as large as shown by the data if the null hypothesis is in fact true. The smaller the p-value, the more confidence in the conclusion that the null hypothesis is not true.

Confidence intervals

Closely related to statistical tests are confidence intervals. A statistical test asks whether one can reject the null hypothesis. A confidence interval is the set of null hypotheses that the data could not have rejected had the statistical test been performed at the stated α-level. Suppose, for example, that the clinical trial of ADAS-Cog reports that the 95% confidence interval for the difference between the change from baseline in ADAS-Cog for the new drug and placebo is (4, 8). We can interpret the interval in one of two ways. One correct interpretation is that if we did an infinite number of identical clinical trials, 95% of the confidence intervals calculated would cover the true difference between the changes. Another interpretation is that the data from this trial would reject any null hypothesis that the true difference is less than 4 or greater than 8. In particular, it rejects the null hypothesis that the true difference is zero. Note the confidence interval does not mean that the probability is 95% that the true difference is between 4 and 8. Chapter 6 describes confidence intervals in more detail.

Sample size for controlled clinical trials

A basic formula for sample size

The statistical literature contains formulas for determining sample size in many specialized situations. This part describes a simple generic formula that provides a first approximation of sample size and that forms the basis of variations appropriate to specialized situations. To understand these principles, consider a trial that aims to compare two treatments with respect to a parameter of interest, again, say ADAS-Cog. For simplicity, suppose that half of the participants will be randomized to a new drug and the other half to a control group. The trial investigators may be aiming to compare mean values, proportions, odds ratios, hazard ratios, or some other statistic. Suppose that with proper mathematical transformation, the difference between the estimated parameters in the treatment and control groups has an approximately normal distribution. These conditions allow construction of a generic formula for the required sample size. Typically, in comparing means or proportions, the difference between the sample statistics has an approximately normal distribution. In comparing odds ratios or hazard ratios, the logarithm of the ratio, or, equivalently, the difference in the logarithms, has this property.

Consider three different trials using a new drug called ‘COG-Plus’ to improve cognitive function score relative to control in a study group of people with Alzheimer’s disease and a baseline cognitive functioning score of 20 or more.

The first hypothetical study, to be called the Slower COG Decline Trial, tests whether COG-Plus in fact lowers the rate of decline of cognitive functioning scores relative to placebo. The trial, which randomizes patients to receipt of COG-Plus or placebo, measures the cognitive functioning score at the end of the sixth month of therapy. The outcome is the continuous variable ‘score on the ADAS-Cog.’

The second study, to be called the Low COG Prevention Trial, compares the proportions of people in the treated and control groups with cognitive functioning scores above 25 points at the end of 1 year of treatment with COG-Plus or placebo.

The third study, called the Time to COG-loss, follows patients for at least 5 years and compares times to loss of 10 points in the two groups. This type of outcome is a time-to-failure variable.

The formulas for determining sample size use several statistical concepts. Throughout this chapter, Greek letters denote a true or hypothesized value, while italic Roman letters denote observations.

Under the above conditions, a generic formula for the number of persons needed in each group to achieve the stated type I (α) and type II (β) error rates is:

$$ n = 2\left\{\frac{\left[\xi_{1-\alpha/2} + \xi_{1-\beta}\right]\sigma}{\Delta_A}\right\}^{2} $$

where σ² is the variance of the outcome measure and its square root σ is its standard deviation.

The formula assumes one treatment group and one control group of equal size and two-tailed hypothesis testing. The quantity ξ_x is the value that corresponds to the 100x-th percentile of the standard normal distribution (e.g., ξ_0.975 = 1.96 and ξ_0.8 = 0.84). Typical controlled trials in neurology set the statistical significance level at 0.05 or 0.01 and the power at 80 or 90%. Table 4.1 shows the sample sizes required for various levels of α and β relative to the sample size needed for a study with a two-sided α equal to 0.05 and 80% power (β = 0.20).
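As a numerical check, the generic formula can be evaluated directly. The sketch below is a minimal illustration under the normal approximation; the function names are hypothetical, and the inputs (standard deviation of 10 points, two-sided α of 0.05, 84 participants per arm) are those of the Figure 4.1 example. It computes the unrounded per-group sample size, the power at a given true difference, the effect of halving the detectable difference, and one relative sample size of the kind tabulated in Table 4.1.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Unrounded per-group sample size from the generic formula:
    2 * {[xi_(1-alpha/2) + xi_(1-beta)] * sigma / delta}^2."""
    xi_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    xi_beta = STD_NORMAL.inv_cdf(power)          # power = 1 - beta
    return 2 * ((xi_alpha + xi_beta) * sigma / delta) ** 2

def power_at(delta, n, sigma, alpha=0.05):
    """Power gamma(delta) of a two-sided z-test with n participants per group."""
    xi_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = delta / (sigma * (2 / n) ** 0.5)     # standardized true difference
    return STD_NORMAL.cdf(shift - xi_alpha) + STD_NORMAL.cdf(-shift - xi_alpha)

# Design of the Figure 4.1 example: SD 10, two-sided alpha 0.05.
n_for_5_points = n_per_group(delta=5, sigma=10, power=0.90)   # about 84 per group
power_for_2_points = power_at(delta=2, n=84, sigma=10)        # about 0.25

# Halving the difference to be detected quadruples the sample size.
scaling = n_per_group(delta=2, sigma=10) / n_per_group(delta=4, sigma=10)

# Sample size for alpha 0.01 and 90% power relative to the reference
# design (alpha 0.05, 80% power); sigma and delta cancel in the ratio.
relative = n_per_group(1, 1, alpha=0.01, power=0.90) / n_per_group(1, 1)

print(round(n_for_5_points, 1), round(power_for_2_points, 2),
      round(scaling, 1), round(relative, 1))
```

In practice the computed sample size is rounded up to a whole number of participants, and software based on the t-distribution may give slightly larger values for small trials.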
various levels of α and β relative to the sample size needed for a study with a two-sided α equal to 0.05 and 80% power (β = 0.20).

Table 4.1 Relative sample sizes as a function of statistical power and α level

                      Power
α          50%    70%    80%       90%    95%
0.05       0.5    0.8    1.0 (a)   1.3    1.7
0.01       0.9    1.2    1.5       1.9    2.3
0.001      1.5    1.8    2.2       2.7    3.1

(a) Reference group.
To read the table, choose a power and an α level. Suppose one is interested in a trial with 90 percent power and an α level of 0.01. The entry of 1.9 in the table means that such a trial would require 1.9 times the sample size required for a trial with 80 percent power and an α level of 0.05.

Some people in using sample size formulae mistakenly interpret the ‘2’ as meaning two groups and hence incorrectly use half the sample size necessary.

For tests at significance level 0.05, the sample size needed to achieve high power is considerably larger than the sample size needed to observe a p-value of 0.05. Thus, many people get confused by what appears to be a very large sample size needed to show statistical significance. They point to studies where a much smaller sample size demonstrated a statistically significant effect. In fact, that observation is correct: if a trial is designed with a two-sided α-level test of 0.05 and power of 80%, the expected p-value under the alternative is 0.005. Similarly, if the same trial had 90% power, the expected p-value would be 0.001 (see p. 43 of Proschan, et al. [3] for a proof).

One way to understand this apparent contradiction is to consider the sample size required for 50% power. In that case, the sample size formula reduces to $N = 2\sigma^2[\xi_{1-\alpha/2}/\Delta_A]^2$ because $\xi_{1-0.5} = \xi_{0.5} = 0$. In other words, the ‘just barely significant’ cut-off occurs at 50% power. The reason to design studies with larger sample sizes (e.g., studies with 80% or 90% power) is to ensure a high probability of actually showing statistical significance.

Continuous variables: testing the difference between mean responses
To calculate the sample size needed to test the difference between two mean values, one makes several assumptions.
1. The responses of participants are independent of each other. The formula does not apply to studies that randomize in groups, for example, trials that assign the same treatment to all students in a classroom, or all people in a village, or all visitors to a clinic, or to studies that match patients or parts of the body and randomize pairwise. For this type of randomization in groups (i.e., cluster randomization), see, for example, Donner and Klar [9]. Analysis of studies with pairwise randomization focuses on the difference between the results in the two members of the pair.
2. The variance of the response is the same in both the treated and control groups.
3. The sample size is large enough that the observed difference in means is approximately normally distributed. In practice, for reasonably symmetric distributions, a sample size of about 30 in each treatment arm is sufficient to apply normal theory. The central limit theorem legitimizes the use of the standard normal distribution. For a discussion of its appropriateness in a specific application, consult any standard textbook on statistics.
4. In practice, the variance is unknown. Therefore, the test statistic under the null hypothesis replaces σ with s, the sample standard deviation. The resulting statistic has a t distribution with 2(n – 1) degrees of freedom (df). Under the alternative hypothesis, the statistic has a non-central t-distribution with non-centrality parameter 2n ΔA and, again, 2(n – 1) df. Standard software packages for sample size calculations employ the t and non-central t-distributions [10–12]. Except for small sample sizes, the difference between the normal distribution and the t-distribution is quite small, so the normal approximation yields adequately close sample sizes in most situations. Table 4.2 presents the necessary sample size for a two-arm study using the normal approximation under the assumption of no non-compliance with protocol.

Binary variables: testing the difference between two proportions
Calculation of the sample size needed to test the difference between two proportions requires several assumptions.
1. The responses of participants are independent.
Table 4.2 Approximate total sample size for a controlled clinical trial that compares two groups when the primary outcome is a continuous variable

Δ/σ     Power = 90% (n)    Power = 80% (n)
0.1     4200               3100
0.2     1100                790
0.3      470                350
0.4      270                200
0.5      170                130
0.6      120                 88
0.7       88                 66
0.8       68                 50
0.9       54                 40
1.0       44                 34
1.5       20                 16
2.0       12                 10

α = 0.05; Δ is the difference to be detected and σ is the population standard deviation. The sample size per group is half the value in the table.

Table 4.3 Approximate total sample size for a controlled clinical trial that compares two groups when the primary endpoint is a binary variable

Proportion with      Proportion with the event in group 1
the event in
group 2              0.05    0.1     0.2     0.3     0.4     0.5
0.1                  1242
0.2                   228    572
0.3                   102    178     824
0.4                    62     94     238     992
0.5                    42     58     116     268    1076
0.6                    32     40      66     122     280    1076
0.7                    24     32      46      74     122     268
0.8                    18     22      21      46      66     116
0.9                    14     16      22      21      40      58

α = 0.05; power = 90%; table assumes no loss to follow up, no non-compliance, no multiple looks at the data, and uses the Fisher’s exact test. The sample size per group is half the value in the table.
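The normal-approximation calculations behind Tables 4.1 to 4.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the continuous-outcome formula is taken to be $n = 2\sigma^2[(\xi_{1-\alpha/2} + \xi_{1-\beta})/\Delta_A]^2$ per group (the form that reduces to the 50% power expression quoted in the text), and the binary calculation uses the simple binomial formula of this section together with the continuity correction of Fleiss [13]. The results come close to the printed entries but need not match them exactly, since the printed tables appear to be rounded and may rest on more exact methods.

```python
from math import sqrt
from statistics import NormalDist


def xi(p):
    """Value xi_p of the standard normal distribution with P(Z <= xi_p) = p."""
    return NormalDist().inv_cdf(p)


def z_sum_sq(alpha, power):
    """(xi_{1-alpha/2} + xi_{1-beta})^2 for a two-sided test with power 1 - beta."""
    return (xi(1 - alpha / 2) + xi(power)) ** 2


def relative_size(alpha, power):
    """Sample size relative to alpha = 0.05 and 80% power (compare Table 4.1)."""
    return z_sum_sq(alpha, power) / z_sum_sq(0.05, 0.80)


def total_n_continuous(effect, power, alpha=0.05):
    """Total size for both groups; effect is Delta/sigma (compare Table 4.2)."""
    per_group = 2 * z_sum_sq(alpha, power) / effect ** 2
    return 2 * per_group


def total_n_binary(pc, pt, power, alpha=0.05):
    """Total size for two proportions with the Fleiss correction (compare Table 4.3)."""
    pbar = (pc + pt) / 2
    n = 2 * pbar * (1 - pbar) * z_sum_sq(alpha, power) / (pc - pt) ** 2
    n_corrected = n / 4 * (1 + sqrt(1 + 4 / (n * abs(pc - pt)))) ** 2
    return 2 * n_corrected


print(round(relative_size(0.01, 0.90), 1))       # compare with Table 4.1
print(round(total_n_continuous(0.1, 0.90)))      # compare with Table 4.2
print(round(total_n_binary(0.10, 0.05, 0.90)))   # compare with Table 4.3
```

Writing the per-group size first and doubling it explicitly keeps the ‘2’ in the formula distinct from the number of groups, which is the confusion the text warns about.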
2. The probability of an event is $\pi_c$ and $\pi_t$ for each person in the control group and the treated group, respectively. Because the sample sizes in the two groups are equal, the average event rate is $\bar{\pi} = (\pi_c + \pi_t)/2$. This assumption of constancy of proportions within each group is rarely strictly valid. If the proportions vary considerably in recognized ways, one may refine the sample size calculations to reflect that heterogeneity. Often, however, one hypothesizes average values of $\pi_c$ and $\pi_t$ and calculates sample size as if those proportions applied to each individual in the study.

Under these assumptions, the binary outcome variable has a binomial distribution, and the following simple formula provides the sample size for each of the two groups:

$$ n = 2\bar{\pi}(1 - \bar{\pi}) \frac{(\xi_{1-\alpha/2} + \xi_{1-\beta})^2}{(\pi_c - \pi_t)^2} $$

The simple formula uses the same variance under both the null hypothesis and the alternative hypothesis. A more accurate approach would acknowledge that the variance under the null is proportional to $2\bar{\pi}(1 - \bar{\pi})$ while under the alternative it is proportional to $\pi_c(1 - \pi_c) + \pi_t(1 - \pi_t)$.

The formula, which uses the normal approximation, becomes inaccurate as $n\pi_c$ and $n\pi_t$ become very small (e.g., less than 5). If one employs a correction for continuity in the final analysis, or if one will be using Fisher’s exact test, one should replace n with the formula given by Fleiss [13]:

$$ n' = \frac{n}{4}\left(1 + \sqrt{1 + \frac{4}{n\,|\pi_c - \pi_t|}}\right)^2 $$

Table 4.3 presents the necessary sample size for a two-arm study using the test for proportions under the assumption of no non-compliance with protocol.

Failure time studies
Many neurological clinical trials compare therapies with respect to time to occurrence of the primary outcome. This time is often called failure time or time to failure. More optimistically, the time may be measured not as the time to failure but as the length of time the participant has not failed, or the survival time. Here we introduce several important concepts related to failure
time distributions [14]. Specifically, we mention censoring, hazard, survival curves, the Kaplan-Meier representation of the estimated survival curves, the log-rank test, and the Cox proportional hazard model.

Censoring
Trials that compare time to failure usually end before all the participants experience the primary outcome under study. These participants are said to be ‘censored’ at the time of their last observation. In the usual methods of time-to-failure analysis, censoring is assumed to be ‘non-informative;’ that is, the mechanism causing the censoring favors neither those who are more likely to fail nor those who are less likely to fail [15]. Several mechanisms lead to censoring in clinical trials. The simplest type of censoring is so-called administrative censoring: the study ends before all persons experience the primary outcome. For example, in a 10-year study of survival among a low risk group, only a small proportion of the study group is expected to die by the time the study ends. At the end of the study, no one knows when an administratively censored person will die.

In some clinical trials, each participant has a fixed follow-up time. More typically, the trial ends on a ‘common closeout date.’ Since participants are recruited over a period of months or years, the length of the follow-up time is specific for each person. This ‘staggered entry’ leads to unequal time of administrative censoring. Because the degree of administrative censoring is independent of treatment, such censoring is non-informative. Standard life-table methods are appropriate for handling the resulting unequal follow-up times (see, for example, Collett [16]).

A second type of censoring is caused by loss to follow-up. In this case, the endpoint cannot be measured because the participant or the participant’s medical records become unavailable to the study. A person is then censored at the time of loss. Vigorous efforts by the investigator can often minimize loss to follow-up. Some participants who have moved residences are willing to be measured at a clinic near their new home. Sometimes routinely collected data like the National Death Index can be used to ascertain vital status at the common closeout date even if the participant is not following study protocol. Loss to follow-up is conceptually more difficult to deal with than administrative censoring because such losses may be informative. The life-table methods appropriate to administrative censoring are strictly valid when some participants are lost to follow-up only if the mechanism that leads to loss favors neither those who would have experienced the outcome nor those who would not have. In an effort to show that losses did not occur differentially by treatment group, many investigators use baseline parameters to compare those lost to follow-up to those who were not lost or compare the patients lost to follow-up from the treatment and the control groups. The fact that such a comparison shows no difference is not sufficient to preclude informative censoring. Imagine, for example, a study of memory agents that compares two groups of people with identical baseline parameters. During the course of the study, a number of people in the placebo group who have experienced a decrease in memory drop out of the study because they perceive that they are not receiving any benefit from the study agent and wish to switch to an efficacious treatment. Since functioning is associated both with memory loss and dropping out of the study, this censoring is informative in spite of the fact that all people in the study had identical baseline characteristics. Although standard life-table methods are very often used when patients are lost to follow-up, investigators should be aware of potential bias arising from loss to follow-up. Similar problems occur when participants withdraw from the trial. Such withdrawals are often not at random so that censoring them as if the withdrawal were non-informative can introduce bias into the analysis.

Another important type of censoring is that caused by competing risks. For example, in a long-term survival study of patients with Alzheimer’s disease, many people die of causes other than those due to progression of their Alzheimer’s disease during the course of the study. Such non-Alzheimer’s death is a competing event that precludes the occurrence of the study outcome. This type of censoring is often informative. Censoring occurs only when the outcome cannot be measured. The standard methods of statistical analysis (e.g. life tables, Kaplan-Meier survival curves, the log-rank test, and Cox models) can deal with censoring computationally. All, however, make the assumption that the censoring is non-informative.

In summary, at any given time during the study only a subset of the study cohort is at risk for experiencing the primary outcome. This subset decreases each time a primary outcome occurs and each time a person leaves the study by loss to follow-up or by competing risk. Losses due to administrative censoring do not lead to bias in the inference about the estimated effect of treatment (except when the statistical methods confound loss and effect), but losses due to non-independent
Figure 4.2. Four hazard functions. The four hazard curves correspond to different clinical settings. A flat curve represents constant risk, the curve with lambda1 = 0.05 and lambda2 = 0.5 represents high immediate risk but diminishing risk as time proceeds. The two increasing curves show functions that describe deteriorating conditions.
[Plot not reproduced. Horizontal axis: Years since randomization (0 to 5). Vertical axis as labeled in the extracted figure: Probability of survival (0.0 to 1.0). Legend: lambda1 = 0.1, lambda2 = 1; lambda1 = 0.05, lambda2 = 0.5; lambda1 = 0.15, lambda2 = 2.3; lambda1 = 0.175, lambda2 = 5.1.]
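The link between the hazard curves of Figure 4.2 and the corresponding survival probabilities can be illustrated with a short Python sketch. The text does not state which parametric family the legend values describe; the code below assumes a Weibull form, $S(t) = \exp[-(\lambda_1 t)^{\lambda_2}]$, because that choice gives a flat hazard when lambda2 = 1, a decreasing hazard when lambda2 < 1, an increasing hazard when lambda2 > 1, and a 5-year survival close to 0.6 for all four legend pairs. The function names are hypothetical.

```python
from math import exp, log

# Legend parameters (lambda1, lambda2) as printed with Figures 4.2 and 4.3.
CURVES = [(0.1, 1.0), (0.05, 0.5), (0.15, 2.3), (0.175, 5.1)]


def survival(t, lam1, lam2):
    """S(t) = exp(-(lam1 * t) ** lam2); the Weibull form is an assumption."""
    return exp(-((lam1 * t) ** lam2))


def hazard(t, lam1, lam2):
    """Closed-form h(t) = -d log S(t) / dt for the assumed Weibull model."""
    return lam2 * lam1 * (lam1 * t) ** (lam2 - 1)


def numeric_hazard(t, lam1, lam2, eps=1e-6):
    """Central-difference approximation of -d log S(t) / dt."""
    upper = log(survival(t + eps, lam1, lam2))
    lower = log(survival(t - eps, lam1, lam2))
    return -(upper - lower) / (2 * eps)


for lam1, lam2 in CURVES:
    s5 = survival(5, lam1, lam2)
    hazards = [round(hazard(t, lam1, lam2), 3) for t in (1, 3, 5)]
    print(f"lambda1={lam1}, lambda2={lam2}: S(5)={s5:.3f}, h(1), h(3), h(5)={hazards}")
```

Comparing `hazard` with `numeric_hazard` at any time point is a quick way to confirm that the closed form really is the negative derivative of log S(t).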
competing risks may lead to bias in the estimated size of the treatment effect.

Hazard rate
Consider a study that assigns time 0 to the date a patient was randomized. For any small interval of time Δt about a specific time t, the probability that a person will experience the event under study is represented by h(t)Δt. The function h(t) is called the hazard function. Figure 4.2 plots four hazard curves that correspond to very different clinical settings. The flat line represents constant hazard; that is, the risk of mortality, or more generally, the risk of the event under study, is constant over time. The curve with λ1 = 0.05 and λ2 = 0.5 represents typical hazards after surgery: high immediate post-surgical mortality, but diminishing mortality risk as time proceeds. The two increasing curves show functions that describe deteriorating conditions. The curve with λ1 = 0.15 and λ2 = 2.3 represents a cohort of initially healthy people whose risk of death increases fairly steadily during the first 5 years after randomization. The curve with λ1 = 0.175 and λ2 = 5.1 depicts a cohort of people at low risk for death during the first 2 years, but rapidly increasing risk thereafter.

Survival curve
The function that describes the proportion of participants alive at time t is the survival curve S(t). Figure 4.3 shows the four survival curves associated with the hazard curves of Figure 4.2. In all cases, since S(5) = 0.6, 60% of the people live beyond 5 years. The hazard curve h(t) is related mathematically to S(t):

$$ h(t) = -\frac{d\{\log[S(t)]\}}{dt} $$

Kaplan-Meier curve
Perhaps the most common representation of the survival curve in clinical trials is the Kaplan-Meier curve, which interprets the survival curve as a product of probabilities. For example, in a 7-year trial of mortality following diagnosis of Alzheimer’s disease the two-year survival rate can be written as:

S(2) = S(1)S(2|1)

where S(2|1) is the probability of surviving at least 2 years for a participant who has survived for 1 year. Similarly, the 3-year survival rate is:

S(3) = S(1)S(2|1)S(3|2)

where S(3|2) is the probability that a person who has survived for 2 years will survive for at least 3 years. Finally, the 7-year survival rate is:

S(7) = S(1)S(2|1)S(3|2)S(4|3)S(5|4)S(6|5)S(7|6)

To construct the Kaplan-Meier curve, we estimate each component probability from the set of
Figure 4.3. Four survival functions. The figure shows the four survival curves associated with the hazards curves in Figure 4.2.
[Plot not reproduced. Horizontal axis: Years since randomization (0 to 5). Vertical axis: Probability of survival (0.0 to 1.0). Legend: lambda1 = 0.1, lambda2 = 1; lambda1 = 0.05, lambda2 = 0.5; lambda1 = 0.15, lambda2 = 2.3; lambda1 = 0.175, lambda2 = 5.1.]
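The product-of-conditional-probabilities construction described for the Kaplan-Meier curve can be sketched directly. The Python below is a minimal illustration rather than the routine of any particular statistical package; the function names and the ten follow-up times are hypothetical, with an event flag of 1 for a death and 0 for a censored observation.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time at which a death occurs."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every participant whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        # Deaths and censored participants both leave the risk set.
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which the estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in months; 1 = death, 0 = censored.
times = [6, 13, 21, 30, 38, 45, 52, 60, 72, 90]
events = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
print("median:", median_survival(curve))
```

Censored participants contribute to the size of the risk set up to the time they are censored but never cause a step, which is how the estimator uses partial follow-up under the non-informative censoring assumption. Returning None from `median_survival` covers studies that end before the curve reaches 0.5.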
observations. In the typical Kaplan-Meier curve, each time a person dies the curve steps down; the height of the step represents the probability of death within the preceding horizontal time interval. The height of the graph from zero at each time t represents the overall probability of survival to time t. Figure 4.4 shows a typical Kaplan-Meier curve. Here, failure time is death. The curve starts at the point (t, S(t)) = (0, 1) because the entire patient cohort is alive at randomization. In most neurological clinical trials, S(t) does not drop to zero because the study ends while some participants are still alive. To determine the median survival time, draw a horizontal line at 0.5 on the y-axis and when this line hits the survival curve draw a vertical line down to the x-axis. In this simple case, the median survival time is 38 months [S(38) = 0.5] and the dotted lines are shown on Figure 4.4. Many standard statistical software packages have subroutines that plot Kaplan-Meier curves. Note that the Kaplan-Meier estimator of the survival curve does not make any assumptions about the shape of either the survival curve or the hazard function. As previously mentioned, it does assume that censoring occurs non-informatively.

Log-rank test
The log-rank test is a widely used method for comparing survival curves in randomized clinical trials. The test, which requires no assumptions regarding the shapes of the survival curves, compares treatment and control groups each time a person experiences the primary study outcome. Suppose, for example, at the time of the dth study outcome n1 patients remain in the control and n2 in the treated group. If treatment has no effect on the outcome, the death would have occurred in the control group with probability n1/(n1 + n2). The log-rank test compares the expected number of events in the control group during the study with the actual number observed. Standard texts on survival analysis present formulas for performing the calculation [16]. Because the calculation requires meticulous accounting for each person’s time of event or censoring, we recommend using a standard computer program.

Sample size formulae
Consider a trial that compares time to some specified event, for example, loss of 10 points from baseline on the ADAS-Cog scale in a study of Alzheimer’s disease. Let $\pi_c$ and $\pi_t$ be the probability that a person in the control group and a person in the treated group, respectively, experience an event during the trial. The relative risk is $\pi_t/\pi_c$. Define $\theta = \ln(1 - \pi_c)/\ln(1 - \pi_t)$.

Assume that the event rate is such that within each of the two groups every participant in a given treatment group has approximately the same probability of experiencing an event. Assume that no participant
Figure 4.4. Example of a Kaplan-Meier curve for a small study.
[Plot not reproduced. Horizontal axis: Time (months), 0 to 90. Vertical axis: Survival probability, 0 to 1.]
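The event-count formulas of Freedman [17] and Schoenfeld [18] discussed in this section can be compared with a short Python sketch. It is illustrative only: the function names are hypothetical, θ is computed exactly as defined in the text, and the allocation ratio m enters only the Schoenfeld formula through the factor (m + 1)²/m. The results are not expected to reproduce Table 4.4, which is stated to use the Lakatos method.

```python
from math import log
from statistics import NormalDist


def z_sum_sq(alpha=0.05, power=0.90):
    """(xi_{1-alpha/2} + xi_{1-beta})^2 for a two-sided test."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) ** 2


def theta(pc, pt):
    """theta = ln(1 - pc) / ln(1 - pt), as defined in the text."""
    return log(1 - pc) / log(1 - pt)


def freedman_events(pc, pt, alpha=0.05, power=0.90):
    """Total number of events in both groups (Freedman, 1:1 allocation)."""
    th = theta(pc, pt)
    return ((th + 1) / (th - 1)) ** 2 * z_sum_sq(alpha, power)


def schoenfeld_events(pc, pt, alpha=0.05, power=0.90, m=1.0):
    """Total number of events (Schoenfeld); m:1 allocation replaces 4 by (m+1)^2/m."""
    factor = (m + 1) ** 2 / m
    return factor * z_sum_sq(alpha, power) / log(theta(pc, pt)) ** 2


def per_group_size(total_events, pc, pt):
    """Sample size in each group under 1:1 allocation."""
    return total_events / (pc + pt)


# Hypothetical design: 40% of controls and 30% of treated participants have an event.
pc, pt = 0.40, 0.30
for name, events in (("Freedman", freedman_events(pc, pt)),
                     ("Schoenfeld", schoenfeld_events(pc, pt))):
    print(name, round(events), "events;", round(per_group_size(events, pc, pt)), "per group")
```

Both functions return a number of events; time enters only through the event probabilities, so the conversion to a number of participants is kept in a separate step.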
withdraws from the study. In a study in which half of the participants will receive experimental treatment and half will be controls, Freedman [17] presents the following simple formulas.

Total number of events in both treatment groups:

$$ \left(\frac{\theta + 1}{\theta - 1}\right)^2 \left(\xi_{1-\alpha/2} + \xi_{1-\beta}\right)^2 $$

Sample size in each group:

$$ \frac{1}{\pi_c + \pi_t} \left(\frac{\theta + 1}{\theta - 1}\right)^2 \left(\xi_{1-\alpha/2} + \xi_{1-\beta}\right)^2 $$

An even simpler formula is due to Schoenfeld [18] who derived it for the log-rank test without assuming an exponential model. Under their models, the total number of events required in the two treatment groups is:

$$ \frac{4\left(\xi_{1-\alpha/2} + \xi_{1-\beta}\right)^2}{[\ln(\theta)]^2} $$

Then the sample size required in each treatment group is:

$$ \frac{4}{\pi_c + \pi_t} \cdot \frac{\left(\xi_{1-\alpha/2} + \xi_{1-\beta}\right)^2}{[\ln(\theta)]^2} $$

If the ratio of allocation to treatment and control is m:1 rather than 1:1, the ‘4’ in the above formula becomes $(m + 1)^2/m$. Neither of the above formulae explicitly incorporates time. In fact, time appears only in the calculation of the probabilities $\pi_c$ and $\pi_t$ of events.

Table 4.4 presents the necessary sample size for a two-arm study using the log-rank test under the assumption of no non-compliance with protocol.

General problems of multiplicity as it relates to sample size
Most clinical trials study more than one outcome of interest. A trial of treatment after diagnosis of Alzheimer’s may compare 5-year mortality, 10-year mortality, and time to a score of <20 on the ADAS-Cog. A trial to study the effect of treatment on cognitive function might compare memory loss and ability to perform activities of daily living. The probability of type I error increases with the number of endpoints considered (as discussed earlier in this chapter, type I error, or α-error, is the error incurred by falsely finding two treatments to be different when they truly have equivalent effects). The standard approach to statistical testing in clinical trials presupposes a single outcome; if there is more than one outcome of interest, the statistical test procedure must be adjusted if the experiment is to preserve the stated type I error rate. Many
approaches are available to adjust for multiplicity; different people support approaches that range from no adjustment to extreme adjustment [19] with follow-up time twice recruitment time [20]. The simplest approach, the so-called Bonferroni method, counts the number of statistical tests k to be performed and then divides the α-level by k. The resulting α-level is used to declare significance. This conservative approach will lead to large sample sizes if there are many tests. For example, consider a study with an α-level of 5%. If the sample size were 100 per group in an experiment with a single primary outcome, under a Bonferroni adjustment the size would need to be 118 if there were two primary outcomes, 136 for four, and 159 for 10 (figures consistent with 90% power). Similarly, in a trial that compares more than two drugs, the α-level, and with it the sample size, should be adjusted to account for the multiple comparisons possible among treatments.

Table 4.4 Approximate total sample size for a controlled clinical trial that compares time to event in a treatment and control group

Proportion with      Proportion with the event in group 1
the event in
group 2              0.05    0.1     0.2     0.3     0.4     0.5
0.1                   758
0.2                   120    356
0.3                    53    107     554
0.4                    31     53     153     694
0.5                    21     33      73     182     773
0.6                    15     22      43      83     195     785
0.7                    12     16      28      47      86     190
0.8                     9     13      20      30      46      79
0.9                     8     10      14      20      27      39

α = 0.05; power = 90%; table assumes no loss to follow-up, no non-compliance, no multiple looks at the data and uses the Lakatos method. The sample size per group is half the value in the table.

How should investigators address the issue of multiple outcomes? When feasible, they can severely limit the number of outcomes to be formally tested. If having a limit of one or two outcomes is scientifically or medically unacceptable, the investigators should decide to whom they are addressing the results of the study and use a method of adjustment acceptable to their intended audience. If the results of the experiment will support a submission to the US FDA, the investigators should discuss the appropriate methods of adjustment with the FDA.

Many statisticians, the two authors of this chapter included, recommend statistical adjustments to maintain the type I error rate at the stated level. If, however, the results are to be reported in a professional journal that does not require such adjustment, then the investigators may decide to adopt the methods standard for the work previously published in the journal. If the experiment is to be used as a pilot for the design of a larger study, then the degree of adjustment may not need to be very rigorous.

One rigorous approach to multiplicity is to declare a single primary outcome variable and assign to it the entire type I error rate. Then list a set, preferably a small set, of important secondary outcomes. Apply a rule to adjust for multiplicity of these secondary outcomes. See, for example, Dmitrienko, et al. [21] for a discussion of various approaches for adjusting for multiplicity. In their α-preserving paradigm, if the primary outcome is not statistically significant, then one cannot declare significance for any of the secondary outcomes. Problems of multiplicity also arise in sequential analysis of clinical trials. See Chapter 14 for details on this type of multiplicity.

Other considerations in calculating sample size
The sample size discussion in this chapter introduces the basic concepts and alludes to such complicating factors as multiplicity and loss to follow-up. Actual sample size calculations must account for a host of deviations from ideal in addition to the two already mentioned. Participants may stop taking study medication; they may cross over to the active medication either by the design of the protocol or, if the medication or a similar one is already available, they may do so in violation of the protocol. They may be only partially compliant, taking their medication sometimes but not always, or taking more than prescribed. The population itself may be too heterogeneous to assume that all participants share the same underlying parameters of interest. The centers involved in the study may recruit patients of very different severities of disease and the centers may use quite different standards of care.

In general, the more variability in the population studied and the more variability in the investigators’ patterns of treating patients, the larger the sample size must be to maintain adequate power. In designing a randomized clinical trial, the prudent investigator will seriously consider the ways in which the assumptions
underlying the planned statistical methods are likely to fail and the potential for deviations from the protocol to occur. The investigators, including the statisticians, should deal carefully with the consequences of the likely failure of assumptions and violations of protocol. In certain types of studies in neurology, for example, in prevention of stroke in high risk populations, these types of problems are no more severe than in many other fields of medicine. In other areas, however, for example, Alzheimer’s disease, severe epilepsy, and ALS, the nature of the population under study is such that many participants fail to complete the protocol as planned. To the extent feasible, the design of such trials should incorporate features that either allow large enough sample sizes to overcome the resultant increases in variability or that redefine outcome variables in such a way as to avoid violations of protocol.

Bayesian statistics
Our discussion thus far has assumed that the clinical trial will be conducted in the classical, or frequentist, framework. Philosophically, a frequentist considers that the parameter of interest is fixed; that is, if the sample size were large enough, the estimated value of the parameter would converge to the true value. In Bayesian statistics, on the other hand, the parameter itself is viewed as having a distribution. One starts with a ‘prior’ distribution for that parameter and one uses the data from the clinical trial to modify one’s prior. In the past, few clinical trials were performed in the Bayesian framework, but Bayesian methods have become more widely used recently. See Berry [22] for a basic description of the approach.

References
1. Meinert CL. Clinical Trials. Oxford: Oxford University Press, 1986.
2. Friedman LM, Furberg C, and DeMets DL. Fundamentals of Clinical Trials, 4th ed. New York: Springer, 2010.
3. Proschan M, Lan KKG, and Wittes JT. Statistical Monitoring of Clinical Trials: A Unified Approach. New York: Springer, 2006.
4. Yusuf S, Held P, and Teo KK. Selection of patients for randomized controlled trials: Implications of wide or narrow eligibility criteria. Stat Med 1990; 9: 73–86.
5. Lan K and DeMets D. Group sequential procedures: calendar versus information time. Stat Med 1989; 8: 1191–8.
6. Moore D and McCabe GP. Introduction to the Practice of Statistics, 3rd edition. New York: W.H. Freeman and Company, 1999.
7. Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall, 1991.
8. Pagano M and Gauvreau K. Principles of Biostatistics, 2nd ed. Duxbury, MA: Duxbury Press, 2000.
9. Donner A and Klar N. Design and Analysis of Cluster Randomized Trials in Health Research. London: Arnold, 2000.
10. Borenstein M, Rothstein H, Cohen J, et al. Power and Precision. Englewood, NJ: Biostat, Inc., 2001.
11. Elashoff J. nQuery Advisor Version 4.0 User’s Guide. Los Angeles, CA: Statistical Solutions, 2000.
12. Hintze J. PASS 2008. NCSS LLC, Kaysville, UT, 2008.
13. Fleiss J, Tytun A, and Ury H. A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics 1980; 36: 343–6.
14. Cox DR and Oakes D. Analysis of Survival Data. New York: Chapman and Hall, 1984.
15. Lagakos SW. General right censoring and its impact on survival data. Biometrics 1979; 35: 139–156.
16. Collett D. Modelling Survival Data in Medical Research, 2nd ed. Boca Raton, FL: Chapman and Hall/CRC, 2003.
17. Freedman L. Tables of the number of patients required in clinical trials using the logrank test. Stat Med 1982; 1: 121–9.
18. Schoenfeld D. The asymptotic properties of nonparametric tests for comparing survival distributions. Biometrika 1981; 68: 316–9.
19. Lakatos E. Sample sizes based on the log-rank statistic in complex clinical trials. Biometrics 1988; 44: 229–241.
20. Miller JRG. Survival Analysis. New York: John Wiley & Sons, Inc., 1981.
21. Dmitrienko A, Tamhane AC, Wang X, et al. Stepwise gatekeeping procedures in clinical trial applications. Biometr J 2006; 48: 984–91.
22. Berry D. Bayesian clinical trials. Nature Rev Drug Discov 2006; 5: 27–36.
Section 2 Concepts in biostatistics and clinical measurement
Chapter 5 Bias and random error
Susan S. Ellenberg and Jacqueline A. French

Introduction
The goal of a controlled clinical trial, as it is for any controlled experiment, is to compare the effects of interventions on outcomes of interest. In order to draw valid and reliable conclusions from a trial, one must believe that any observed difference between groups treated differently is due to the difference in interventions and not to any inherent differences between the groups, or simply to the play of chance.

Bias is the existence of systematic differences between groups that will lead to differences in outcomes regardless of any difference in treatment effect. A major focus of clinical trial methodology, in regard to design, conduct, and analysis, relates to the control of bias, as bias can arise in any of these areas. For example, a trial designed with a historical control group consisting of previously treated individuals identified from medical records might be biased since there are many reasons why individuals treated in the past might be different, and have different prognoses for the outcome of interest, from those treated currently. A trial conducted so that those evaluating outcomes are aware of the treatment assignments might be biased if the evaluators believe that one treatment is likely superior. A trial analysis that excludes individuals who did not comply with the assigned treatment might be biased if non-compliance is associated with prognosis for outcome.

Random error refers to differences that occur by chance. If we flip a fair coin 20 times we are not likely to observe exactly 10 heads and exactly 10 tails, although that is the expected outcome. In coin flips, the random error may result from the force going into the flip, air currents in the room, or other conditions extraneous to the fairness of the coin. Similarly, in a clinical trial, if two treatments were in fact equivalent, and if we treated 50 subjects with each treatment, we would not expect to observe precisely identical outcomes. In a clinical trial, we need to plan our trial so that, if there is a true difference in outcomes, we will expect to observe a large enough difference to be able to distinguish it from a difference attributable to chance. As with bias, the control of random error is important to consider throughout the process of a clinical trial.

In this chapter, we consider methods to limit bias and random error at each stage of a clinical trial: design, conduct, analysis and interpretation of results.

Study design

Bias
Many aspects of study design relate to control of bias. The one of greatest importance is the method of assignment to treatment. Since the middle of the twentieth century it has been widely accepted that the best way to minimize bias related to subject characteristics is to assign treatment at random. This means using a truly random mechanism to determine the treatment assignment for each successive subject. Alternative approaches all have the potential for creating treatment groups that are systematically different, thereby confounding any attempt to estimate treatment effect by comparing outcomes in the treatment groups. Some types of non-randomized control groups, and the problems they raise, are as follows:
• Historical controls: may have received differing concomitant therapies; may have been treated by different physicians, using different protocols to manage therapy; may not have met all inclusion criteria for current study; may have differing distributions of prognostic factors [1]
• Concurrent subjects choosing not to receive investigational treatment: choice of treatment may be associated with prognosis
Clinical Trials in Neurology, ed. Bernard Ravina, Jeffrey Cummings, Michael P. McDermott, and R. Michael Poole. Published by
Cambridge University Press. © Cambridge University Press 2012.
• Concurrent subjects at other sites: similar to problems with historical controls
• Systematic assignment according to birthdate, first letter of last name, etc.: assignment of each patient will be known to recruiting investigator, may influence decision to approach patient for study
• Alternating treatment assignments: similar to systematic assignment above.
When a true randomized design is used, there can be no reason other than chance (whose influence can be controlled by sample size, as we will discuss later) for outcomes to differ between arms other than the different treatment assigned to each arm.
Probably the tool of next greatest importance in the control of bias is blinding (or masking; these terms are used interchangeably). Ideally one would wish to use a double-blind design, in which neither the subject nor treating physician knows to which treatment the subject is assigned. In this way, neither the subject’s perception of his/her health status, nor the physician’s decisions about patient management can be influenced by the knowledge of the treatment assignment.
Studies of new drugs in which subjects on the control arm can be untreated are typically designed with placebo controls, which maintain the double-blind. The placebo must match the active drug in route and schedule of administration, appearance, smell and taste. When two active drugs are being compared, double-blinding can be more complicated. It is usually not feasible to prepare different active drugs so that they look, smell and taste the same. The approach commonly used is a ‘double-dummy’ design in which a placebo for each drug is prepared and subjects receive one active drug and one placebo but of course do not know which is which. In this way, drugs with different routes and schedules can still be compared in a double-blind fashion. An excellent example is the Heparin in Acute Embolic Stroke Trial (HAEST) in which subcutaneous heparin (dalteparin 100 IU/kg) twice a day was compared to aspirin tablets 160 mg once a day [2]. To maintain the study blind, patients received either aspirin tablets and subcutaneous injection of a saline placebo, or subcutaneous heparin and an aspirin placebo.
Double-blind designs are not always feasible due to ethical or logistical considerations. In a trial of surgery versus medication, for example, the treating physician cannot be blinded, but if a sham surgical procedure can be done ethically, it may still be done as a single-blind study with all subjects undergoing either the real or sham procedure, and taking either an active or placebo medication. Many single-blind studies involving transplantation of experimental tissue and therefore requiring a sham surgical procedure in the control group have been performed in Parkinson’s disease [3,4]; the known high rate of placebo response in single-arm trials has led researchers to insist on double-blind designs and these studies have largely been accepted by institutional review boards and research participants.
In many cases, however, an unblinded design will be necessary, and other approaches to control bias will have to be implemented. Sham surgeries, while largely accepted in Parkinson’s disease, arthroscopic knee surgery, and a few other areas, are always controversial and are complicated to conduct. Even in trials comparing medication strategies a blinded design is not always possible or ethical. Some medications have distinctive side effects that make it difficult to blind. Further, some drugs with narrow therapeutic indices or potentially serious toxicities may be difficult to manage in a fully blinded way, or clinicians may not feel comfortable managing a serious medical condition without fully understanding which therapies have been employed. Such was the case when a blinded active control comparison trial involving the currently best available therapies for status epilepticus, a life-threatening condition, was suggested. The clinical investigators were initially hesitant about managing the intravenous administration of four different treatments (diazepam (0.15 mg per kilogram of body weight) followed by phenytoin (18 mg per kilogram), lorazepam (0.1 mg per kilogram), phenobarbital (15 mg per kilogram), or phenytoin (18 mg per kilogram)) in a blinded way. Ultimately, however, the trial was successfully performed [5].
Random error
Sample size
Suppose we randomize twenty subjects between two therapies, ten to each. Suppose then that six subjects have a good outcome with drug A but only four with drug B. Can we conclude then that drug A is superior? Certainly not with any high confidence; even though drug A’s success rate is 50% higher, this degree of variation from the expected finding under the assumption that they have the same effect (five successes on each arm) is entirely consistent with chance. Just as we would not be surprised to flip a fair coin twenty times and get six heads in the first ten flips and then four heads in
the next ten, the comparison of six versus four is not at all inconsistent with the two drugs having identical effects. If, however, we treated not 10 but 100 subjects with each drug, and observed 60 successes with drug A and 40 with drug B, we would have a much stronger case for concluding that drug A is superior: the probability that we would observe this much of a difference if the drugs really had the same effect is less than 1%.
Thus, the key to controlling random error in designing a trial is to ensure that the sample size is large enough to permit an observed difference of a specified size to be considered documentation of a true difference in effect. The method of determining the required sample size depends on the type of variable being assessed. If the variable is binomial (e.g., success vs. failure), the comparison will be of the proportion of successes; if the variable is a continuous (or approximately continuous) measure (e.g., weight, blood pressure, IQ score), the comparison will be of the means or medians; if the variable is the time until the event of interest occurs, the comparison will be of these times, accounting for the length of time the subject has been under study and whether or not the subject has had the event.
The goal in calculating sample size is to limit two kinds of random errors: 1) the error of falsely concluding that the two treatments being compared produce different effects when in fact there is no difference; and 2) the error of falsely concluding that the treatments being compared produce similar effects when in fact one is better than the other. The first is referred to as type I error (or ‘alpha error’ as in sample size formulae this error is designated by the Greek letter α); the second is referred to as type II error (or ‘beta error’, designated β). Other commonly used terms relating to these errors are ‘significance level,’ which is equivalent to type I error and ‘power,’ which is the complement of type II error and therefore represents the probability that we will correctly identify a treatment effect as large as or larger than the difference the study was intended to identify.
The key factors in determining sample size are the difference between the experimental and control group that is deemed important to identify; the variability of the outcome measure in the study population; and the magnitude of type I and type II error we are willing to accept. The smaller the difference we wish to identify, the larger the variability of the outcome measure, and the smaller the risk of type I and type II errors we can accept, the larger the sample size will be.
Details of sample size calculations in different scenarios are given in Chapter 4.
Inclusion criteria
Another way to reduce random error is by selecting eligibility criteria that exclude individuals who have little chance of showing a treatment effect, either because of their underlying health status or because of environmental factors that might affect their adherence to the study protocol. Making the study sample more homogeneous with respect to prognosis for showing a treatment effect will reduce variability. (On the other hand, a highly homogeneous study population will yield study results that may be less clearly generalizable to the target population for the intervention.)
Study conduct
Bias
Experimentation with human beings is an imperfect science; it is impossible to exercise the same extent of control over study implementation as would be possible in a laboratory or animal experiment. Many aspects of study conduct have the potential to bias study results.
Allocation concealment
Some of the benefits of randomization may be lost if study personnel involved in recruiting and entering subjects are aware of the treatment assignment for the next subject to be entered. This is primarily an issue in unblinded (sometimes called ‘open label’) studies, for which a computer-generated assignment list will provide this information. In a multi-center study with a central or web-based randomization process, the upcoming assignment would remain hidden from site investigators, but for single-site studies it can be a concern. If an investigator knows that the next subject to be entered will be assigned a specific treatment, he/she will be able to make a subjective judgment about whether to try to recruit a particular subject. This could lead to systematic differences between arms despite the randomization [6]. The use of sealed envelopes to be opened when the subject agrees to be randomized has been shown to be particularly problematic; investigators may be tempted to open the envelope to learn the assignment and only then decide whether to try to recruit the subject. Implementation of randomization should always consider how to ensure that the allocation schedule remains concealed from investigators.
Blinded outcome evaluation
The evaluation of subject outcomes, the primary focus of the trial, should be done without knowledge of the
subject’s treatment assignment whenever possible in order to avoid influencing the evaluator who may have a prior belief about the relative efficacy of the treatments being compared. When outcomes are assessed by means of imaging, laboratory measures or subject questionnaires, blinding the evaluators is generally straightforward, even when the trial is not conducted in a single- or double-blind fashion. When the primary outcome results from a clinical evaluation, however, it may be more difficult to arrange for a blinded evaluation, especially when there are subjective aspects to the evaluation, or when one treatment involves surgery. For example, in a study comparing bilateral deep brain stimulation to best medical therapy in patients with advanced Parkinson’s disease, patients were required to wear caps to blind the raters to the presence or absence of surgical scars [7].
Non-compliance and dropout
In nearly all clinical trials, it is inevitable that some subjects will not receive the study treatment according to protocol. They may forget to take drugs, stop taking them (or take them inconsistently) because of side effects; they may fail to return for study visits at which treatment is administered or provided; they may refuse to undergo testing. In unblinded studies, they may refuse the assigned treatment if they had hoped to be assigned to the other treatment group. It is generally not possible to know on an individual basis whether a non-compliant subject is more or less likely to have a favorable outcome than a compliant subject, but studies have suggested that there can be a strong systematic difference in prognosis between those who are and are not compliant with the study protocol [8–10]. Thus, it is important to try to maintain information on non-compliant subjects and to obtain the data necessary to include them in the primary analysis. Even subjects who refused assigned treatment, for whatever reason, should be kept in the study if at all possible and encouraged to undergo evaluation for outcome. This issue is elaborated further in the account of analysis.
Random error
Random error in the conduct of a study is commonly referred to as ‘noise.’ Such errors increase the variability of study outcomes and hence reduce the precision of estimation and the power of the study to detect differences between treatment strategies. Errors in data entry, missing data due to lost lab slips or other records, randomization of a subject who does not meet eligibility criteria, and faulty measures of study outcomes all contribute to increased variability and thereby reduce the chance that the study will be able to document a true difference in treatment effects.
Operations manual and training
There are many ways to minimize random error in the conduct of clinical studies. First and foremost is the development of a detailed manual of procedures and the training of study personnel in these procedures. Training may need to occur more than once during a study, especially if important new procedures are introduced. A manual of procedures should ideally be available electronically with a search function that facilitates accessing the information of interest.
In developing the manual of procedures, it is important to consider how best to reduce variability of certain measurements. Standardizing the time of day may be important for some measures, or timing of the measure with respect to last food intake. Symptoms of some neurological disorders such as Parkinson’s disease can vary substantially on a diurnal basis. Serum concentrations will be much less variable if they are taken at a predetermined interval from the time of dosing. If the measure requires subject input, it will be important to provide instructions to the site that will be relayed to the subject on how to complete the measure.
Data entry and audit
Quality control of the data entry process can also reduce error. Missing values, out-of-range data or data inconsistent with other entered data can be identified, either at time of data entry (for web-based data entry systems) or by regular batch edits of the entire database. Resolving such errors is not always possible but in many cases the database can be updated with the correct information. The sooner errors are identified and referred back to the clinical sites for their attention, the more likely such errors can be corrected, so quality control systems should give high priority to timely feedback to clinical sites.
Centralization of operations
Centralization of some study functions can help minimize variability associated with differences among participating clinical sites in a multi-center study. Having laboratory samples run by a central laboratory, rather than at each site, will eliminate variability due to use of different equipment and different protocols. If laboratory results are not needed for patient management,
running all study samples in a single batch at the end of the study will reduce variability even further.
Study assessments that incorporate some element of subjectivity can also be centralized. Many trials rely on a central adjudication group to make outcome assessments for all subjects in a study. Such groups may be employed to read scans, assess pathology samples, or review medical charts, and come to consensus on individual subject outcomes. For example, in a recent highly successful randomized blinded trial that assessed the effects of three different antiepileptic drugs (valproic acid, ethosuximide, and lamotrigine) in children with absence seizures, all EEGs were read by a centralized group. This group determined patient eligibility in the trial, and also determined response to treatment. In the eligibility review, the central readers disagreed with the local reader in only three cases; these cases were then excluded [11]. In some trials, however, differences between local and central readers can be substantial.
Case report forms
The design of study forms can influence the quality of data. The items on each form must be crystal clear with regard to what information is being asked for, and possible answers offered must be mutually exclusive and exhaustive. A common error in study form design is omission of an ‘other’ option when the respondent is asked to select one of several responses; it is difficult for the person entering data to know what to enter when none of the options offered appears appropriate, and this may lead to selection of an available but inaccurate response. When there are many possible options investigators may be tempted to simply have the response entered as free text. While in some cases there may be good reasons for collecting data as free text, this should be avoided when possible as it allows for substantially more errors in transcription and creates major difficulties in data analysis.
Conducting an initial pilot test of data forms prior to initiating data collection on study subjects is highly recommended. Review of the forms by investigators is insufficient as many unclear questions or questions with inadequate response options will not be identified until someone actually tries to complete the forms for specific individuals.
On-site monitoring
Electronic data editing is a form of quality control monitoring, but for many studies electronic editing is supplemented by on-site monitoring conducted by someone otherwise independent of the study who reviews study records on a regular basis to identify errors or other problems in study conduct, and to verify at least some portion of the computerized data by checking them against original source records such as hospital charts, lab slips, etc. Checking of every data item is almost always unnecessary. An approach used in some studies is to verify all data pertaining to the primary outcome and to eligibility, and then some fraction (e.g., 10%) of the remaining data, with expansion of the review if problems arise in the data that are initially checked. Many studies incorporate even more limited on-site checking; studies sponsored by pharmaceutical companies generally perform substantial on-site checking, in many cases involving 100% of data elements, but the slight improvement one might have in accuracy is unlikely to warrant the extensive resources required to verify every data element in most cases.
Analysis of study data
Bias
Even in a study that is designed and conducted with a meticulous eye to avoiding bias, results may still be severely biased if inappropriate methods of data analysis are adopted. Methods that can bias results are those that involve removing subjects from analysis for a systematic reason, thereby undermining the assumption that the treatment groups generated by randomization are prognostically equivalent.
Intention-to-treat
The cornerstone of an unbiased analysis is the intention-to-treat principle. An intention-to-treat (ITT) analysis is one in which everyone who was randomized into the study is included in the analysis – no one is dropped out because they switched treatments, stopped taking treatment, or otherwise failed to comply with the protocol. This often seems counterintuitive to investigators – why count the outcome for someone assigned to arm A who did not get the arm A treatment (or only a minimal amount of it)? The reason this is important is easiest to see for a trial comparing an active treatment with a placebo. The conventional approach to such a trial is to try to show a treatment effect by ‘disproving’ the assumption that there is no difference between the treatment and placebo. Under that assumption, called the ‘null hypothesis,’ it wouldn’t matter if someone didn’t get treatment, since they would be receiving either an ineffective treatment or a placebo. If one
drops out subjects who refused or stopped taking their assigned treatment, however, those dropped out might be sicker on average than others, and that could lead to an apparent difference in outcomes by arm, even if the treatment being studied had no effect at all.
A dramatic example of this potential bias was seen in an NIH trial conducted in the 1970s, the Coronary Drug Project (CDP) [12]. In this trial, several drugs were tested against a placebo control to assess whether any of them improved survival rates in men at high risk for cardiovascular mortality. Treatments were taken as pills, and subjects were asked to bring their supplies to the clinic at each return visit. Compliance was a problem in the trial; analysis of pill counts revealed that a substantial proportion of study subjects failed to take 20% or more of their required medication. A naïve approach to this situation might have been to perform an analysis that compared those in the treatment groups who took 80% or more of their medication with those who took less than 80%. The results of such analyses, as shown in Table 5.1, were quite surprising. Those who took less than 80% of their medicine had about a 60% higher mortality rate than those who were more adherent. The results for the placebo group, however, were even more extreme. Since taking more or less placebo could not influence mortality, it was clear that men with worse prognosis were more likely to be non-adherent to medication [8]. The CDP investigators tried to account for the result in the placebo group by adjusting for all known prognostic factors but were able to explain only a small proportion of the difference between better and worse adherers by such adjustment.

Table 5.1 Five-year mortality in patients given clofibrate or placebo, according to cumulative adherence to protocol prescription

                     Clofibrate                              Placebo
Adherence^a          No. of patients   % mortality^b         No. of patients   % mortality^b
< 80%                357               24.6 ± 2.3 (22.5)     882               28.2 ± 1.5 (25.8)
≥ 80%                708               15.0 ± 1.3 (15.7)     1813              15.1 ± 0.8 (16.4)
Total study group    1065              18.2 ± 1.2 (18.0)     2695              19.4 ± 0.8 (19.5)

^a A patient’s cumulative adherence was computed as the estimated number of capsules actually taken as a percentage of the number that should have been taken according to the protocol during the first five years of follow-up or until death (if death occurred during the first five years).
^b The figures in parentheses are adjusted for 40 base-line characteristics. The figures given as percentages ± 1 SE are unadjusted figures whose SEs are correct to within 0.1 unit for the adjusted figures.
Reproduced with permission from Massachusetts Medical Society and the New England Journal of Medicine.

The clear lesson of this example is that people who take medication as prescribed may be substantially prognostically different from those who do not, for reasons that we cannot explain by factors that we can measure. Hence, eliminating non-compliers from analyses raises the real danger of introducing a major bias into the analysis. In the case of the CDP, eliminating the poor compliers from both arms would have produced the same close-to-zero estimate of treatment effect as doing the standard intention-to-treat analysis, with all randomized subjects included; in general, however, one cannot be certain that those who comply with one of the study treatments will be prognostically similar to those who comply with the other.
Intention to treat is an important tool in preventing bias, but a true ITT analysis requires that data on all randomized subjects are available for analysis. When subjects drop out and are not evaluated for the primary outcome, the approach to handling these dropouts can introduce bias. In a study that compared donepezil to rivastigmine as treatments for mild to moderate Alzheimer’s disease, there were many more dropouts due to side effects in the rivastigmine arm. These dropouts were included in the analysis with the outcome at their last assessment prior to dropout substituting for the outcome at study completion. Subjects in the rivastigmine arm tended to drop out earlier and thus to have the cognitive assessment earlier in their disease, favoring the less-well-tolerated treatment [13]. In epilepsy studies, treatments which cause very early dropout due, for example, to rapid titration, can cause individuals to drop out before they have had a seizure after randomization. Some studies, attempting to include all randomized subjects, define these patients as seizure-free, driving up the seizure-free percentages in the treated arm, as compared to placebo [11].
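
The substitution described above, in which the last assessment before dropout stands in for the end-of-study outcome, can be illustrated with a small sketch. All scores, the visit schedule, the arm labels and the helper name `locf` are hypothetical and chosen only for illustration; they are not data from the studies cited. Both arms are constructed to decline at exactly the same rate, so any apparent difference at the final analysis comes entirely from how dropouts are handled.

```python
from statistics import mean


def locf(visits):
    """Return the last observed value in a visit series (None marks a missed visit)."""
    observed = [v for v in visits if v is not None]
    if not observed:
        raise ValueError("no observed value to carry forward")
    return observed[-1]


# Hypothetical cognitive scores (higher is better) at months 0, 3, 6 and 12.
# Every subject follows the same declining trajectory: 30, 28, 26, 22.
# Arm X is well tolerated, so all three subjects complete the study.
arm_x = [
    [30, 28, 26, 22],
    [30, 28, 26, 22],
    [30, 28, 26, 22],
]
# Arm Y is poorly tolerated, so two of three subjects drop out after month 3.
arm_y = [
    [30, 28, None, None],
    [30, 28, None, None],
    [30, 28, 26, 22],
]

# Analysis value per subject: final score if observed, otherwise last score carried forward.
locf_mean_x = mean(locf(s) for s in arm_x)
locf_mean_y = mean(locf(s) for s in arm_y)

print("Arm X mean analysis score:", locf_mean_x)
print("Arm Y mean analysis score:", locf_mean_y)
print("Apparent advantage of arm Y:", locf_mean_y - locf_mean_x)
```

In this constructed example the arm with earlier dropout appears four points better at the final analysis even though no subject's trajectory differs between arms, which is the direction of bias described for the less well tolerated treatment.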
The analytical approach that uses the last measure prior to dropout as the primary outcome for subjects who do not complete the study is generally referred to as ‘last observation carried forward (LOCF).’ As noted above, this approach can lead to biased estimates of treatment effect. A variety of other methods have been proposed for handling missing data; these are discussed in more detail in Chapter 6. No method can guarantee absence of bias in the presence of missing data, however; exploratory analyses should always be conducted to assess the possible extent of bias caused by lack of primary outcome data on some subjects. Such analyses, referred to as sensitivity analyses, can use a variety of approaches to impute the missing data; for example, a ‘worst case scenario’ analysis might assume that all subjects with missing outcome data were treatment successes if on the control arm and treatment failures if on the investigational arm. If the treatment still showed significant benefit in an analysis with such extreme assumptions one could be certain that the missing data were not hiding information that could change the conclusions. Other types of sensitivity analyses making less extreme assumptions should also be performed; if multiple methods lead to the same conclusions one can feel reasonably confident that the missing data are not leading to errors in interpretation of the data.
Eligibility assessment
It might seem logical that eliminating subjects who are found upon review to have not fully met the inclusion criteria should not lead to any bias – after all, these subjects should not have been entered in the first place. But if the eligibility review is performed by individuals with knowledge of treatment assignment and study outcome, bias could enter in as reviewers made judgments when adjudication of baseline eligibility criteria was not straightforward [15]. Eligibility reviews should always be performed by individuals blinded both to treatment and to study status.
Random error
Random error can be reduced by performing analyses that account for prognostic factors.
Stratification factors
Randomization is often stratified by factors that are expected to be related to prognosis for the primary study outcome. These factors often include study site, demographic factors such as age and gender, and baseline measures of clinical relevance. Study analyses should always account for stratification factors, calculating the treatment comparison within strata and then aggregating across all strata. Since the data within strata will be more homogeneous than the data overall, stratifying the analysis reduces variability [16–18] (also see Chapter 6).
Adjustment for covariates
In most studies, the sample size is too small to permit stratification by more than two or three factors. There may be additional factors that are known to be prognostic for study outcome. When analyses are performed that account for the influence of these factors, the variability with which the treatment comparison is assessed will be reduced, thereby increasing power to detect differences [19–21].
Interpretation of study results
Bias
The multiple comparisons problem
In most studies, the treatments are compared with regard to multiple outcomes. The more comparisons are made, the more likely it is to observe a spurious ‘significant’ finding. Ideally, one outcome is selected by investigators as the primary outcome, so that analysis of that outcome is readily interpretable without concern about inflation of the false positive rate. That still leaves the problem of interpreting analyses of other outcomes of interest and importance.
It can be difficult to quantify this problem, and thereby correct for it, since the degree to which the false positive rate is increased depends on how closely correlated the outcomes are. For example, there are multiple stroke scales, and they are very similar. If one performed a study of treatment for stroke and compared the treatment groups on each scale, it is highly unlikely that one would give a statistically significant result if the others were not at least strongly suggestive of an effect. For example, in the placebo-controlled NINDS study of rt-PA the investigators looked at the Barthel Index, modified Rankin score, Glasgow Outcome Score, and the NIHSS and all scales demonstrated that the drug was beneficial [22]. On the other hand, in a study of antiepileptic drug therapy, if one outcome was number of seizures occurring during a defined interval and the other outcome was results of a quality of life assessment at the end of that interval, the results would likely have only
a modest correlation and it is not unimaginable that a significant effect might be shown for one with little or no effect suggested for the other. If one did not clearly specify which outcome was primary, the investigators would have two opportunities to declare the study positive, thereby doubling the possibility of a false positive finding if the drug were truly inactive.
Subsets
One of the most common ways to introduce the problem of multiple comparisons is to evaluate results in subgroups of the study population. It can be readily calculated that if 14 independent tests are performed at the 0.05 level of significance, the chance is better than 50% that at least one comparison will produce a p-value less than 0.05 even when there are no true differences. Such findings can arise from a study in which there is no overall treatment difference but when subgroups are examined, a subgroup is found that appears to benefit [23, 24]. It is often difficult for investigators to take a realistic view of the likelihood that the subgroup effect is a ‘false positive.’
Of comparable importance is the situation where there is a true difference and subgroups of the study population are examined separately. In that case it may well happen that by chance, the data from one subgroup show no treatment effect, or a trend in the wrong direction. To illustrate this problem, investigators conducting a large cardiovascular study analyzed their outcome data by signs of the zodiac and showed that study subjects born under the signs of Gemini and Libra appeared to do worse with the tested treatment, while subjects born under the other signs showed a strong benefit that was highly statistically significant [25]. The investigators appreciated that readers of their paper would not believe that signs of the zodiac could influence the likelihood of treatment success, and included this analysis in their publication to demonstrate that great skepticism is needed when examining results in the other subgroups they considered.
Interim analyses
Another multiple comparisons issue arises when the accumulating data are analyzed multiple times during the course of a study with the idea that the study can be stopped, or at least reported, as soon as the primary outcome shows a significant difference between treatment arms. Allowing multiple opportunities to answer the same question raises the same concerns as address-
that testing for a difference at the nominal 0.05 level ten times during the course of a study raises the type I error, or false positive rate, to 19% [26]. Methods for study monitoring and interim analyses are described in Chapter 14.
Methods to account for multiple comparisons
What can be done about the multiple comparisons problem? The answer surely cannot be to perform only a single significance test when many questions will be of legitimate importance. A variety of statistical methods to allow definitive conclusions to be made when multiple tests are to be performed have been developed. All require either testing at reduced significance levels (Bonferroni and related procedures [27–29]) or setting up nested testing systems whereby secondary hypotheses can be tested only when there is a significant effect on the primary outcome (gatekeeping procedures [30–32]). In the case of multiple tests of a single hypothesis over time, as in the monitoring of accruing clinical trial results, the available methods, such as the commonly used O’Brien-Fleming procedure [33], mostly require testing at reduced significance levels at interim analyses so as to ensure that the probability of a false positive result overall remains less than 0.05 (or whatever significance level has been selected). For multiple testing of different outcomes, the gatekeeping strategies have become more popular. What is most important, however, is the interpretation of the results. Whatever methods are used to account for multiple testing, or even (especially) when no such methods are used at all, authors must describe their approach to multiple testing and how their results should be interpreted given the expected increase in risk of false positives.
Pre-specification of analytical plan
Even when the study objectives are clearly stated and there is a single primary outcome, the details of the primary analyses may not be as clearly defined. For example, the primary objective in a study of an anti-epileptic drug might be to reduce the risk of seizures; this could be quantified in several ways, however. We might compare the simple frequency of seizures over the interval of observation; we might do a seizure-free day assessment; or we might determine how many subjects have had a 50% reduction in seizures. If the intended primary analysis is not specified clearly, multiple analyses could be conducted and the one producing the lowest p-value could be selected. Thus, even if the data remain unbi-
ing multiple diferent questions. It has been shown ased, the interpretation of the analysis might be biased.
Random error
Random error is frequently misinterpreted in discussion sections of clinical trials reports. Our significance tests are intended to quantify random error; they tell us the probability we would see a difference as large as or larger than what we have observed if there were truly no difference between groups. Thus, a very low p-value indicates that the observed results are highly inconsistent with an assumption that the two treatment approaches have the same effect. A large p-value indicates that the data provide inconclusive evidence about the existence of a treatment effect but may suggest that if there is an effect it is probably not large.
One ubiquitous error is stating that ‘no difference was found between treatments X and Y’ whenever the p-value for testing the difference did not cross the 0.05 threshold. The convention that permits us to claim a definitive difference if the significance level dips below 0.05 does not imply that one can definitively conclude that there is no difference when the significance level is above 0.05. A comparison of treatment outcomes resulting in a p-value of 0.07 sends quite a different message from a comparison yielding a p-value of 0.67. The first indicates that a difference this large or larger might be expected 7% of the time when there was truly no treatment difference; the second indicates that a difference this large or larger might be expected 67% of the time when there was truly no treatment difference. These results should not lead to identical statements of ‘there was no difference.’
Another common problem is attributing an insignificant p-value to an insufficient number of subjects, resulting in power too low to have detected a true difference. Low power is, of course, a possible reason for failing to document a difference at the conventional 0.05 level of significance, but the competing reason is, of course, the lack of a true treatment difference. Just as it is misleading to interpret any p-value above 0.05 as evidence of no difference, it is equally misleading to interpret such a p-value as the result of low power, implying that there truly is a difference. In fact, a p-value above the conventional significance level (0.05) means only that a difference attributable to treatment cannot be confirmed with high confidence.

Summary
Control of bias and random error underlies virtually all considerations for the design, conduct and analysis of clinical trials. From sample size considerations to central pathology review, from eligibility reviews to interim monitoring plans, all methodological considerations relate in one way or other to minimizing the potential for bias and reducing random error. The more successful researchers can be in these efforts, the more reliable and informative their clinical trial results will be.

References
1. Byar DP, Simon RM, Friedewald WT, et al. Randomized clinical trials: perspectives on some recent ideas. N Engl J Med 1976; 295: 74–80.
2. Berge E, Abdelnoor M, Nakstad PH, et al. Low molecular-weight heparin versus aspirin in patients with acute ischaemic stroke and atrial fibrillation: a double-blind randomised study. Lancet 2000; 355: 1205–10.
3. Freed CR, Greene PE, Breeze RE, et al. Transplantation of embryonic dopamine neurons for severe Parkinson’s disease. N Engl J Med 2001; 344: 710–19.
4. Olanow CW, Goetz CG, Kordower JH, et al. A double-blind controlled trial of bilateral fetal nigral transplantation in Parkinson’s disease. Ann Neurol 2003; 54: 403–14.
5. Treiman DM, Meyers PD, Walton NY, et al. A comparison of four treatments for generalized convulsive status epilepticus. Veterans Affairs Status Epilepticus Cooperative Study Group. N Engl J Med 1998; 339: 792–8.
6. Schulz KF, Grimes DA. Allocation concealment in randomised trials: defending against deciphering. Lancet 2002; 359: 614–18.
7. Weaver FM, Follett K, Stern M, et al. Bilateral deep brain stimulation vs. best medical therapy for patients with advanced Parkinson disease: a randomized controlled trial. JAMA 2009; 301: 63–73.
8. [no authors listed]. Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. N Engl J Med 1980; 303: 1038–41.
9. Lee YJ, Ellenberg JH, Hirtz DG, et al. Analysis of clinical trials by treatment actually received: is it really an option? Statistics in Medicine 1991; 10: 1595–1605.
10. Oakes D, Moss AJ, Fleiss JL, et al. Use of adherence measures in an analysis of the effect of diltiazem on mortality and reinfarction after myocardial infarction. J Am Stat Assoc 1993; 88: 44–49.
11. Glauser TA, Cnaan A, Shinnar S, et al. Ethosuximide, valproic acid, and lamotrigine in childhood absence epilepsy. N Engl J Med 2010; 362: 790–9.
12. Coronary Drug Project Research Group. The Coronary Drug Project. Design, methods and baseline results. Circulation 1973; 47 (Suppl 1): I1–I79.
13. Wilkinson DG, Passmore AP, Bullock R, et al. A multinational, randomised, 12-week, comparative study of donepezil and rivastigmine in patients with mild to moderate Alzheimer’s disease. Int J Clin Pract 2002; 56: 441–6.
14. Gazzola DM, Balcer LJ, and French JA. Seizure-free outcome in randomized add-on trials of the new antiepileptic drugs. Epilepsia 2007; 48: 1303–7.
15. Schulz KF and Grimes DA. Sample size slippages in randomized trials: exclusions and the lost and wayward. Lancet 2002; 359: 781–5.
16. Green SB and Byar DP. The effect of stratified randomization on size and power of statistical tests in clinical trials. J Chron Dis 1978; 31: 445–54.
17. Lipchik GL, Nicholson RA, and Penzien DB. Allocation of patients to conditions in headache clinical trials: randomization, stratification and treatment matching. Headache 2005; 45: 419–28.
18. Friedman LM, Furberg CD, and DeMets DL. Fundamentals of Clinical Trials, 3rd ed. New York: Springer-Verlag, 1998.
19. Hernandez AV, Steyerberg EW, and Habbema JD. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J Clin Epidemiol 2004; 57: 454–60.
20. Pocock SJ, Assmann SE, Enos LE, et al. Subgroup analysis, covariate adjustment and baseline comparisons in clinical trial reporting: current practice and problems. Statistics in Medicine 2002; 21: 2917–30.
21. Hauck WW, Anderson S, and Marcus SM. Should we adjust for covariates in nonlinear regression analyses of randomized trials? Control Clin Trials 1998; 19: 249–56.
22. The National Institute of Neurological Disorders and Stroke r-TPA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995; 333: 1581–7.
23. Yusuf S, Wittes J, Probstfield J, et al. Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 1991; 266: 93–8.
24. Assmann SF, Pocock SJ, Enos LE, et al. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet 2000; 355: 1064–9.
25. ISIS-2 Collaborative Group. Randomised trial of intravenous streptokinase, oral aspirin, both or neither among 17187 cases of suspected acute myocardial infarction: ISIS-2. Lancet 1988; 2(8607): 349–60.
26. McPherson K. Statistics: the problem of examining accumulating data more than once. N Engl J Med 1974; 290: 501–2.
27. Simes RJ. An improved Bonferroni procedure for multiple tests of significance. Biometrika 1986; 73: 751–4.
28. Hochberg Y. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 1988; 75: 800–2.
29. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979; 6: 65–70.
30. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B 1995; 57: 289–300.
31. Bauer P, Röhmel J, Maurer W, and Hothorn L. Testing strategies in multi-dose experiments including active control. Statist Med 1998; 17: 2133–46.
32. Dmitrienko A, Tamhane AC, Wang X, et al. Stepwise gatekeeping procedures in clinical trial applications. Biometrical Journal 2006; 48: 984–91.
33. O’Brien PC and Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35: 549–56.
Section 2: Concepts in biostatistics and clinical measurement
Chapter 6: Approaches to data analysis
William R. Clarke
Introduction
The goal of this chapter is to provide an introduction to several fundamental methods for analyzing data from clinical trials, including a brief overview of two very important and related concepts: confidence intervals and tests of hypotheses. The chapter begins with a few basic statistical ideas that will be needed in the rest of the chapter. Subsequently there is a brief introduction to descriptive statistics and a discussion of concepts of populations and samples. This is important because statistics provides methods for making inferences about populations from samples from those populations. A discussion of the normal and t distributions follows and the concepts of a confidence interval and hypothesis testing are then discussed, along with illustrations of their use. Finally two important issues in the analysis of clinical trial data are discussed: the Intention to Treat Principle and methods for handling missing data.

Descriptive statistics
Some methods for summarizing data are reviewed here. It is very brief and the reader is urged to review more detailed presentations that are provided in all introductory statistics texts.
First we provide an example. In a preliminary study, investigators selected a sample of ten subjects who would have been eligible for their study and measured their systolic and diastolic blood pressures. The data for this sample are displayed in Table 6.1. While the data provide all of the information that is available from this study, it is difficult to draw any conclusions from this presentation of the data.
In order to better understand the data, we calculate summary values called statistics. A statistic is just a value that is calculated from a collection of data. The usual summary or descriptive statistics describe two characteristics of the data: its centrality (the location of the ‘middle’ of the data) and its variability (how much the data vary about the ‘middle’). In this chapter we will let the symbol $x_i$ represent a data item and the series $\{x_1, x_2, \ldots, x_n\}$ represent the data set.
The sample mean is a common descriptive statistic that locates the ‘middle’ of the data. It is defined as the arithmetic mean of the data set and is usually denoted by the symbol x̄ (pronounced x-bar). In summation notation, the mean is defined as:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

For the blood pressure data, the mean systolic blood pressure is given by:

$$\bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{138 + 150 + 160 + 143 + 160 + 159 + 150 + 156 + 135 + 159}{10} = \frac{1510}{10} = 151.0$$

The median is another measure of central tendency. It is defined as the middle item of the data set. If the number of data points is odd then there is a unique middle data item and the median is defined as that middle item. If the number of data points is even then the median is defined as the average of the two ‘middle’ items. If we order the systolic blood pressures in the sample data set we get the ordered data set:

{135, 138, 143, 150, 150, 156, 159, 159, 160, 160}.
Table 6.1 Blood pressure data
Subject                1     2     3     4     5     6     7     8     9     10
Systolic BP (mmHg)     138   150   160   143   160   159   150   156   135   159
Diastolic BP (mmHg)    94    97    100   98    99    104   106   105   93    112
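
The summary statistics worked out in this section for the systolic pressures in Table 6.1 (mean, median, range, variance and standard deviation) can be reproduced with a few lines of code. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and `statistics.variance` and `statistics.stdev` are the sample versions that divide by (n−1).

```python
import statistics

# Systolic blood pressures (mmHg) for the ten subjects in Table 6.1.
systolic = [138, 150, 160, 143, 160, 159, 150, 156, 135, 159]

mean = statistics.mean(systolic)            # arithmetic mean
median = statistics.median(systolic)        # average of the two middle items (n is even)
data_range = max(systolic) - min(systolic)  # maximum minus minimum
variance = statistics.variance(systolic)    # sample variance, divides by n - 1
std_dev = statistics.stdev(systolic)        # square root of the sample variance

print(f"ordered data: {sorted(systolic)}")
print(f"mean = {mean:.1f}, median = {median:.1f}, range = {data_range}")
print(f"variance = {variance:.2f} (mmHg)^2, standard deviation = {std_dev:.2f} mmHg")
```

The printed values should agree with the hand calculations in the text: a mean of 151.0, a median of 153, a range of 25, a variance of 89.56 and a standard deviation of 9.46.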
Because there are ten data items, the two middle items are items 5 and 6 in the ordered list. The median is defined as the average of these two items. In this case the median systolic blood pressure is (150 + 156)/2 = 153. If the data set consisted of only the first nine items in the list then the median would be the 5th item or 150. By definition, half of the data items are less than or equal to the median and half of the data items are greater than or equal to the median. The mean and the median for these data are close but they are not the same. If the data are symmetrically distributed about the mean then the median and the mean will be approximately the same. However, if the data are not symmetrically distributed or if there are extreme items then the mean and median can be substantially different. The median is not affected by extreme values so when the data are not symmetrically distributed the median is the preferred descriptive statistic.
There are also a number of ways to describe the variability in a data set. Statisticians frequently report the minimum and the maximum values. The difference between the minimum and the maximum is called the range. For the systolic blood pressure data the minimum is 135, the maximum is 160, and the range is (160–135) = 25.
The variance and standard deviation are other commonly used statistics. They are used to describe the variability in a data set. The variance is defined as the average squared difference of each observation from the mean of the data set. Statisticians usually use the symbol s² for the variance. In summation notation, the variance is defined as:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Note that we divide by (n−1) not n. As it turns out, dividing by n will tend to underestimate the true variance. Statisticians have shown that by dividing by (n−1), we obtain an unbiased estimator of the true variance (i.e., one whose average over repeated samples equals the true value). We will discuss this concept more below.
The variance of the systolic blood pressure data is computed as:

$$s^2 = \frac{1}{9}\left\{(138 - 151)^2 + (150 - 151)^2 + \cdots + (159 - 151)^2\right\}$$
$$= \frac{(-13)^2 + (-1)^2 + (9)^2 + (-8)^2 + (9)^2 + (8)^2 + (-1)^2 + (5)^2 + (-16)^2 + (8)^2}{9}$$
$$= \frac{806}{9} = 89.56.$$

The variance is in squared units or, in this case, (mm Hg)². The standard deviation is defined as the square root of the variance and in this case is s = 9.46 mm Hg. The standard deviation is in the same units as the underlying data.
This is a very brief discussion of descriptive statistics. We provide computational details about the mean and variance because we will be using them repeatedly in the following sections.

Populations and samples
A population is a group of individuals that are of interest. In clinical trials the inclusion and exclusion criteria define the population of interest. For example, the Intraoperative Hypothermia for Aneurysm Surgery Trial (IHAST) was conducted to determine the efficacy of hypothermia during surgery to repair ruptured intracranial aneurysms [1]. Specifically, the aim of the IHAST study was to determine whether mild intraoperative hypothermia results in improved neurological outcome in patients with an acute subarachnoid hemorrhage undergoing an open craniotomy to clip their aneurysms. The population could be roughly defined as all such individuals. The inclusion and exclusion criteria for IHAST specifically defined this population as follows:
Eligible patients were at least 18 years of age, were not pregnant, had had a subarachnoid hemorrhage from a radiologically demonstrated intracranial aneurysm within 14 days before surgery, and had a World Federation of Neurological Surgeons score of I, II, or III (‘good grade’) at the time of enrollment, which was verified on arrival in the operating room. Patients were required to have had a Rankin score of 0 (no neurological disability) or 1 (mild disability) before hemorrhage. Patients were excluded if
they had a body-mass index of more than 35, had a cold-related disorder, or had an endotracheal tube in place.
It is important to define the population because results from a clinical trial will only strictly apply to the population from which the study patients were selected. For statistical purposes, we are usually interested in measurable attributes of the population: height, weight, blood pressure, gender, age, etc. We can consider the collection of the values of each of these variables as a population of values. So, for example, we might be interested in the population of blood glucose levels in the IHAST-eligible population.
A sample is defined as a subset of the population. The IHAST sample consisted of 1001 patients that satisfied all inclusion and exclusion criteria and were randomized to receive either hypothermia or normothermia during surgery to clip their ruptured aneurysms.
When we consider a population, we are usually interested in a particular characteristic or characteristics of that population. For example, we might be interested in the blood glucose levels at baseline in the IHAST population. Individual members of the population will have different baseline blood glucose levels. The population of glucose levels can be considered to have a distribution of values. Because we can never observe every member of a population, we must make inference about the population from a sample from that population. Statistical analysis provides methods for making informed estimates or decisions about population characteristics based on statistical summaries prepared from data collected on a sample of the population. Most statistical techniques require that samples are collected in such a way that the probability that each individual from the population is included in the sample is known. The well-known simple random sample requires that the probabilities of being selected are the same for all members of the population. Clearly, it is very unlikely that this is the case for most clinical trials where the sample is a convenience sample and the probabilities of being selected are not known. However, randomizing subjects to treatments will ensure that the statistical analyses are valid [2]. The probability of being assigned to each treatment is known.
Just as each sample has a mean and a variance, each population has a mean and a variance. Characteristics of populations are called parameters. Population parameters include the population mean, the population variance, and the population standard deviation. These values would be calculated in much the same way that statistics are calculated from samples. These population parameters are the real characteristics of interest. For example, we might be interested in the mean or variance of baseline blood glucose levels in patients with ruptured intracranial aneurysms. Because there are a very large number of these individuals, it is impractical to measure them all.
Our inability to measure the entire population requires that we make inference about the population from a sample selected from that population. The discipline of statistics provides methods for making ‘good’ estimates of population parameters (e.g., mean or variance) from samples. It also provides methods for quantifying the degree to which statistical estimates are likely to deviate from the population parameters. Several of these techniques will be illustrated in this chapter.
We speak of the distribution of a certain characteristic in a population. The population distribution is the set of all possible values that a characteristic can assume and the frequencies with which those values occur in the population. We use the mean and variance to describe the distribution of a characteristic that is measured with a device like a ruler, scale, or thermometer. These characteristics are said to be continuous. Height, weight, blood pressure, and serum glucose are continuous variables. For characteristics like race and gender that can have only a small number of distinct values, the distribution is usually described by a listing of the values and the frequency that members of the population take on each value. These variables are called discrete. For example, the distribution of race in the IHAST sample is described in Table 6.2. This table lists all possible values and the relative frequencies with which those values occur in the sample of study participants.

Table 6.2 Distribution of race/ethnicity in the IHAST population
Race/ethnicity    Relative frequency
White             80%
Black             7%
Hispanic          6%
Other             7%

One could ask how we know that this is the distribution of race/ethnicity in this population. The short answer is that we don’t. This table was compiled from a sample of 1001 individuals from that population. Because the sample is so large, we can be confident that the population frequencies will be very close to these values. However, we will never be sure because we will never determine the characteristics of the entire population. The value of statistics is that if we choose the
sample in an appropriate way and if the sample size is sufficiently large then we can be confident that the estimated frequencies will be close to the true values. We can also make probability statements about how close the observed values are to the true values.

Some useful population distributions

The normal and t distributions
The normal distribution is commonly used in statistics. It has the well-known bell shape. Figure 6.1 provides a graph of a normal distribution. Note that it is symmetric about its mean (if we folded it on a line through the middle the two halves would coincide). The normal distribution is completely determined by its mean and variance. The mean is usually denoted by the Greek letter μ and the standard deviation is denoted by the Greek letter σ. The variance is denoted by σ². Figure 6.1 also illustrates some useful properties of the normal distribution. First, 68% of the distribution lies within 1 standard deviation of the mean. This means that if one randomly draws an observation from this distribution, the probability is 0.68 that the observed value will be within 1 standard deviation of the population mean. If the sample is large enough, about 68% of the sample will fall within 1 standard deviation of the mean. Similarly, the probability is 0.95 that the observation will be within 2.0 standard deviations. (Note that the actual values are ±1.96 but this is commonly rounded to 2.0.) The probability is 0.025 that a randomly selected observation will be more than 2.0 standard deviations below the population mean and the probability is 0.025 that a randomly selected observation will be more than 2.0 standard deviations above the population mean.
Figure 6.2 displays normal distributions with mean zero but different standard deviations. Note that a larger standard deviation (variance) means that the distribution has more variation (spread) about the mean.
Another useful distribution is the Student’s t distribution. It looks very much like the unit normal distribution (mean zero and variance one); it is symmetric and is centered at zero but has greater variability (see Figure 6.2). The t distribution depends on one parameter called the degrees of freedom. For small degrees of freedom the distribution has much more spread than the unit normal distribution. As the number of degrees of freedom increases, the t distribution approaches the unit normal distribution. Note that with increasing degrees of freedom, the t distributions have higher maximum values and less probability in the tails.
Given the mathematical properties of distributions, we can compute probabilities associated with drawing observations from normal and t distributions. One very useful set of probabilities is the set of probabilities that a randomly drawn observation will be less than or equal to a given value. For a unit normal distribution (usually denoted by Z) these probabilities can be written as P{Z ≤ z}. So for example, P{Z ≤ 0} = 0.5 and P{Z ≤ −1.96} = 0.025.
It is also useful to find values of z that have particular probabilities. These are called percentiles of the distribution and are denoted by the symbol $z_\alpha$. A useful percentile is denoted $z_{.975}$. This number has the property
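Figure 6.1. A normal probability distribution with mean μ and standard deviation σ. [Figure: curve f(x) with areas of 2.5% below μ − 2σ, 13.5% between μ − 2σ and μ − σ, 34% between μ − σ and μ, 34% between μ and μ + σ, 13.5% between μ + σ and μ + 2σ, and 2.5% above μ + 2σ.]
Figure 6.2. Normal probability distributions with mean zero but different variances. [Figure: curves f(x) for σ = 0.25, σ = 0.5 and σ = 1.0.]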
that $P\{Z \le z_{.975}\} = 0.975$. We already know that 1.96 has this property so that $z_{.975} = 1.96$. This also means that $P\{Z > z_{.975}\} = 0.025$.
Table 6.3 provides percentiles for the unit normal distribution and the t distribution for selected degrees of freedom. Note that as the degrees of freedom (df) increase, the percentiles of the t distribution approach the corresponding percentiles of the unit normal distribution. We will use these percentiles in the sections on confidence intervals and testing of hypotheses.

Table 6.3 Selected percentiles for normal and t distribution
              Student’s t distribution                     Unit normal
Percentile    df = 10    df = 15    df = 20    df = 30     (df = ∞)
0.001         –4.144     –3.733     –3.552     –3.385      –3.09
0.01          –2.764     –2.602     –2.528     –2.457      –2.33
0.025         –2.228     –2.131     –2.086     –2.042      –1.96
0.05          –1.812     –1.753     –1.725     –1.697      –1.645
0.10          –1.372     –1.341     –1.325     –1.310      –1.282
0.90          1.372      1.341      1.325      1.310       1.282
0.95          1.812      1.753      1.725      1.697       1.645
0.975         2.228      2.131      2.086      2.042       1.96
0.99          2.764      2.602      2.528      2.457       2.33
0.999         4.144      3.733      3.552      3.385       3.09

The distribution of the sample mean
In making statistical inference, one draws a sample from a population and computes one or more statistics. Frequently, these include the sample mean and the sample standard deviation. Each sample has its own sample mean and sample variance. If the population is large, there are a large number of possible samples. Repeated sampling will lead to a distribution of sample means and sample variances. By drawing a large number of random samples from the population and computing a sample mean for each, the distribution of the collection of observed sample means will approximate the distribution of the population of all possible sample means.
Statisticians have shown that if repeated samples are drawn at random from a normal distribution with mean μ and standard deviation σ then the distribution of the sample means (called the sampling distribution of the sample mean) will also be normal. The mean of this distribution will be the population mean μ and the variance of this distribution will be the population variance σ² divided by the sample size n. We write:

$$\operatorname{mean}(\bar{x}) = \mu, \qquad \operatorname{variance}(\bar{x}) = \frac{\sigma^2}{n} \tag{6.1}$$

Statisticians say that the sample mean is an unbiased estimator of the population mean because the mean of the sampling distribution of the sample mean is the population mean. If we use the sample mean to estimate the population mean then on average (over repeated samples) the calculated estimate will be close to the population value. Also note that as the sample size increases the variability of the sample mean gets smaller. If we select a very large sample then the variance of the sampling distribution of the sample mean will be very small and our repeated estimates will cluster closely about the true population mean. By taking a large enough sample, we can guarantee with high probability that our estimate is as close as we want to the true population value.
If the population distribution is normal then the sampling distribution of the sample mean will also be normal. Indeed, if the underlying distribution has a finite mean and variance and if the sample size is large enough, the sampling distribution of the sample mean will be approximately normal regardless of the true underlying population distribution. This result is called the central limit theorem.
The standard deviation of the sampling distribution of the sample mean (the distribution of all possible sample means) is $\sqrt{\sigma^2/n} = \sigma/\sqrt{n}$. This quantity is also called the standard error of the mean. Note that it is smaller than the population standard deviation. We will use an estimator of this quantity, $\sqrt{s^2/n} = s/\sqrt{n}$, in many of the methods for confidence interval estimation and hypothesis testing described below.

Confidence intervals

Confidence intervals for a single normal population mean
Using the facts about the sampling distribution of the sample mean from above (‘The distribution of the sample mean’) and a few simple algebraic calculations, one can show that the following probability statement is valid for any normal distribution:
$$\Pr\left\{\bar{x} - 1.96\,(\sigma/\sqrt{n}) < \mu < \bar{x} + 1.96\,(\sigma/\sqrt{n})\right\} = 0.95$$

This means that if we repeatedly draw samples of size n from a normal distribution with mean μ and standard deviation σ and if we calculate the interval

$$\left(\bar{x} - 1.96\,(\sigma/\sqrt{n}),\ \bar{x} + 1.96\,(\sigma/\sqrt{n})\right)$$

for each of those samples, then 95% of those intervals will include the true population mean. This also means that 5% of the intervals will not contain the true population mean. We call the interval $(\bar{x} - 1.96(\sigma/\sqrt{n}),\ \bar{x} + 1.96(\sigma/\sqrt{n}))$ a 95% confidence interval for the true population mean. If we use another percentile of the normal distribution instead of 1.96, say z1−(α/2), then we know that

$$\Pr\left\{\bar{x} - z_{1-\alpha/2}\,(\sigma/\sqrt{n}) < \mu < \bar{x} + z_{1-\alpha/2}\,(\sigma/\sqrt{n})\right\} = 1 - \alpha \qquad (0.2)$$

We say that we are 100 × (1−α)% confident that the true population mean is in the interval:

$$\left(\bar{x} - z_{1-\alpha/2}\,(\sigma/\sqrt{n}),\ \bar{x} + z_{1-\alpha/2}\,(\sigma/\sqrt{n})\right). \qquad (0.3)$$

So, if α=0.05 (and z1−(α/2) = 1.96) then we are 95% confident and if α=0.10 (and z1−(α/2) = 1.645) then we are 90% confident.

These intervals require that we know the population standard deviation σ. Clearly, if we don’t know the population mean then we don’t know the population standard deviation. Fortunately, statistical theory states that if we use the t distribution in place of the unit normal distribution, the following probability statement is true:

$$\Pr\left\{\bar{x} - t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n}) < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n})\right\} = 1 - \alpha$$

where s is the sample standard deviation. The degrees of freedom used to find the appropriate percentile of the t distribution is the sample size minus 1 (n−1). In general, we can be 100 × (1−α)% confident that the true population mean is in the interval:

$$\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n}),\ \bar{x} + t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n})\right). \qquad (0.4)$$

In practice we only collect data on a single sample then we calculate the sample mean x̄, the sample standard deviation s, and the single confidence interval $(\bar{x} - t_{1-\alpha/2,\,n-1}(s/\sqrt{n}),\ \bar{x} + t_{1-\alpha/2,\,n-1}(s/\sqrt{n}))$. The probability is either 0 or 1 that this interval contains the true population mean μ; because there is only one interval. However, if we assume that the true population mean is in the interval then over many such computations we know we will be wrong only 100 × α % of the time. By calculating intervals in this way, we have controlled the frequency that we will be wrong (the interval will not contain the true value μ).

Example:
An investigator is interested in estimating the mean blood glucose level in subjects who would be eligible for participation in the IHAST study. She collects a random sample of 16 subjects from the population and measures their glucose levels. The results of that experiment are summarized in Table 6.4.

Table 6.4 Blood glucose levels in a sample of IHAST eligible patients
Statistic             Computed value
N                     16
Mean                  134.1
Standard deviation    29.6

She wants to compute a 95% confidence interval for the true population mean. The appropriate t distribution is determined by the sample size. If the sample size is n then the appropriate distribution is the t distribution with n−1 degrees of freedom. Because there are 16 observations in the sample, the standard error has (16−1) = 15 degrees of freedom. If we want a 95% confidence interval then we would use the 97.5 percentile of the t distribution with 15 degrees of freedom. From Table 6.3 we see that this percentile is t.975,15 = 2.131. A 95% confidence interval for the true mean glucose level in this population is given by:

$$\begin{aligned}
&\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n}),\ \bar{x} + t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n})\right) \\
&\quad = (134.1 - 2.131(7.40),\ 134.1 + 2.131(7.40)) \\
&\quad = (134.1 - 15.77,\ 134.1 + 15.77) \\
&\quad = (118.3,\ 149.9)
\end{aligned}$$
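The glucose example can be checked with a few lines of code. The sketch below is only an illustration and not part of the original analysis: the function name `t_confidence_interval` is an assumption, and the tabular value t.975,15 = 2.131 is supplied by hand, as in the text, rather than computed.

```python
import math


def t_confidence_interval(mean, sd, n, t_value):
    """Return (lower, upper) for mean +/- t_value * sd / sqrt(n).

    t_value is the tabular percentile of the t distribution with n - 1
    degrees of freedom, looked up by the caller (e.g. from a t table).
    """
    standard_error = sd / math.sqrt(n)
    margin = t_value * standard_error
    return mean - margin, mean + margin


# Summary statistics from Table 6.4; 2.131 is the 97.5th percentile of the
# t distribution with 15 degrees of freedom, as quoted in the text.
lower, upper = t_confidence_interval(mean=134.1, sd=29.6, n=16, t_value=2.131)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")  # prints "95% CI: (118.3, 149.9)"
```

Passing a different tabular value gives an interval at a different confidence level from the same summary statistics.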
We are 95% confident that the true mean glucose level in this population is between 118.3 and 149.9. The true mean may not be in this interval. However, we do know that if we were to repeat this experiment a large number of times (say 10,000,000), 95% of the intervals computed in this way would contain the true population mean and 5% would not.

If we want to be 90% confident then the 95th percentile of the t distribution with 15 degrees of freedom is t.95,15 = 1.753. Using this percentile in our computation we find the 90% confidence interval:

$$\begin{aligned}
&\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n}),\ \bar{x} + t_{1-\alpha/2,\,n-1}\,(s/\sqrt{n})\right) \\
&\quad = (134.1 - 1.753(7.40),\ 134.1 + 1.753(7.40)) \\
&\quad = (134.1 - 12.97,\ 134.1 + 12.97) \\
&\quad = (121.1,\ 147.1)
\end{aligned}$$

Because the confidence coefficient is smaller (90% compared to 95%) the confidence interval is narrower. As we raise the required level of confidence we must widen our confidence interval.

Most confidence intervals use similar methods. In general, a confidence interval is made up of a point estimate (like x̄), the standard error of that estimate, and a tabular value like the z or t values that were used in the previous sections. We will now describe methods for computing confidence intervals for several other situations.

Confidence interval for the difference between two independent normal population means

In a two-arm clinical trial we are usually comparing a new or innovative intervention to a standard or control intervention. If the outcome of interest is a characteristic that is approximately normally distributed then the parameter of interest is usually the difference in the population means, say (μx − μy). Subjects are randomly assigned to treatments and we say that the two samples are independent. We might represent the sample from the innovative population as {x1, x2,...,xm} and the sample from the standard population as {y1, y2,...,yn}. The data from the two samples are usually summarized as sample means and variances, say x̄, sx² and ȳ, sy² respectively. The natural estimator of the difference between the two population means is the difference between the two sample means x̄ − ȳ.

If we want to construct a confidence interval then we will need the standard error of the difference in the sample means. If the two populations have the same variance then the variance of the difference in the sample means is given by $\sigma^2\left(\frac{1}{m} + \frac{1}{n}\right)$. Unfortunately, this variance has an unknown quantity σ². If we can assume that the variances in the two populations are approximately the same then we estimate this quantity from the observed standard deviations with the weighted average

$$s_p^2 = \frac{(m-1)\,s_x^2 + (n-1)\,s_y^2}{(m-1) + (n-1)}$$

called the pooled estimate of the variance or the pooled variance. The standard error of the difference between the two treatment means is estimated by substituting this estimate for the unknown variance and then taking the square root. That is, the standard error of the difference in sample means is estimated by $s_p\sqrt{\frac{1}{m} + \frac{1}{n}}$. A 100(1−α)% confidence interval for the difference in the means is provided by the quantity:

$$\left((\bar{x} - \bar{y}) - t_{1-\alpha/2,\,m+n-2}\; s_p\sqrt{\frac{1}{m} + \frac{1}{n}},\ \ (\bar{x} - \bar{y}) + t_{1-\alpha/2,\,m+n-2}\; s_p\sqrt{\frac{1}{m} + \frac{1}{n}}\right)$$

where the degrees of freedom for the tabular t-value is (m−1)+(n−1) = m+n−2.

Example:
Investigators are interested in estimating the difference in mean glucose levels in two populations: subjects who receive a new intervention and subjects who receive a control intervention. The data from this study are summarized in Table 6.5.

In this case the estimate of the true difference in the population means (Standard minus Innovative) is (138.40 − 119.03) = 19.37 mg/dL. The pooled estimate of the variance is given by:

$$s_p^2 = \frac{(30-1)(31.82)^2 + (30-1)(29.46)^2}{(30 + 30 - 2)} = 940.20$$

$$s_p = \sqrt{940.20} = 30.66$$

The standard error of the difference is therefore: $s_p\sqrt{\frac{1}{30} + \frac{1}{30}} = 30.66\sqrt{\frac{2}{30}} = 7.91$. The appropriate tabular value comes from the t distribution with 30 + 30 − 2 = 58 degrees of freedom. For a 95% confidence interval we would use the 0.975 percentile of the t distribution
with 58 degrees of freedom or t0.975,58 = 2.002 (not presented in Table 6.3). This yields the 95% confidence interval:

(19.37 − 2.002(7.91), 19.37 + 2.002(7.91))
or (19.37 − 15.84, 19.37 + 15.84)
or (3.53, 35.21).

We are 95% confident that the true difference in mean glucose levels between the Innovative intervention and the Standard intervention lies between 3.53 and 35.21 mg/dL. Note that we are 95% confident that the true difference is greater than zero and, hence, that the true difference is in favor of the Innovative intervention.

Table 6.5 Statistical summary of blood glucose levels in two samples
Blood glucose levels (mg/dL)
                      Standard treatment    Innovative treatment
N                     30                    30
Mean                  138.40                119.03
Standard deviation    31.82                 29.46

Confidence interval for a single population proportion

When the outcome is a binary outcome like success or failure then we are usually interested in the success rate (or failure rate). That is, we are interested in the proportion of subjects who experience one of the two possible events. In the IHAST study, the primary measure of efficacy was a Glasgow Outcome Scale (GOS) score at 90 days after surgery. A subject was defined as a success if her/his GOS was 1 (no neurological deficit). In that study 301 of 501 normothermia subjects had a GOS 1 at 90 days after surgery. The success rate for normothermia subjects is:

301 / 501 = 0.601.

We would like a confidence interval for the true proportion of successes in this population. We can use a variant of the method that we used for the sample mean. If we collect a sample of m subjects and code the data as xi = 1 if subject i had a success and xi = 0 if subject i had a failure then the data are just a collection of 1s and 0s. The mean for this sample of m subjects is $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i = \frac{\text{the number of successes}}{m}$. But this is the observed proportion of subjects who experience a success. We will denote this (sample) proportion by p̂. We can use this sample proportion to estimate the population proportion but we would also like to compute a confidence interval for the true proportion.

We have already said that if the sample size is large enough then the sample mean will be approximately normally distributed even if the underlying distribution is not normal (as in this case where the underlying distribution is discrete, with outcomes taking on the values 0 or 1). Statisticians have shown that the variance of the sampling distribution of the sample proportion can be estimated by: $\frac{\hat{p}(1-\hat{p})}{m}$. A 100(1−α)% confidence interval can therefore be computed using the formula:

$$\left(\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{m}},\ \ \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{m}}\right)$$

Example:
An investigator is interested in studying the ability of a new drug to lower the rates of hyperglycemia during the acute treatment of stroke. She conducted a study to compare the rates for her new treatment to those of the standard treatment. The results of this study are described in Table 6.6.

For her initial analyses she computes 95% confidence intervals for the rates of hypoglycemia in each of the two treatment groups. The calculations are provided in Table 6.7. These calculations indicate that we can be 95% confident that the true rate of hypoglycemia in the standard treatment group is between 0.1229 and 0.3021. Similarly, we can be 95% confident that the true rate of hypoglycemia in the innovative therapy group is between 0.0525 and 0.1975. Because these two intervals overlap we should probably not expect that the true proportions are different in the two populations. We will discuss this more later.

Confidence interval for the difference in two independent population proportions

At this stage, the investigator would like to evaluate the difference in the rates of hyperglycemia for the two treatments. The obvious estimate of the difference is the observed difference in the rates p̂S − p̂T = (.2125 − .1250) = .0875.
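Before going further with the difference, the single-proportion interval described above can be sketched in code. This is an illustrative sketch only: the function name `proportion_confidence_interval` is an assumption, the counts (17 of 80 and 10 of 80) are those reported in Table 6.6, and the z percentile is taken from Python's standard-library `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, m, confidence=0.95):
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / m)."""
    p_hat = successes / m
    # z is the 1 - alpha/2 percentile of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / m)
    return p_hat - z * standard_error, p_hat + z * standard_error


# Counts from Table 6.6: events out of subjects treated in each group.
for label, events, treated in [("Standard", 17, 80), ("Innovative", 10, 80)]:
    low, high = proportion_confidence_interval(events, treated)
    print(f"{label}: ({low:.4f}, {high:.4f})")
# prints:
# Standard: (0.1229, 0.3021)
# Innovative: (0.0525, 0.1975)
```

The approximation relies on the sample being large enough for the sample proportion to be roughly normally distributed, as noted in the text.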
Table 6.6 Summary of hypoglycemia rates for two therapies
Treatment             Number of subjects treated    Number with hypoglycemia    Proportion with hypoglycemia
Standard therapy      80                            17                          0.2125
Innovative therapy    80                            10                          0.1250
Total                 160                           27                          0.1688

Table 6.7 Computation of confidence intervals from hypoglycemia data
                                       Standard therapy                      Innovative therapy
Estimate                               0.2125                                0.1250
Standard error (SE)                    √((.2125)(1 − .2125)/80) = 0.0457     √((.1250)(1 − .1250)/80) = 0.0370
Confidence coefficient (percentile)    1.96                                  1.96
Confidence interval calculation        Estimate ± Z0.975(SE)                 Estimate ± Z0.975(SE)
                                       0.2125 ± 1.96 × (0.0457)              0.1250 ± 1.96 × (0.0370)
Confidence interval                    (0.1229, 0.3021)                      (0.0525, 0.1975)

The standard error of the difference in proportions is given by

$$SE(\hat{p}_S - \hat{p}_T) = \sqrt{\frac{\hat{p}_S(1-\hat{p}_S)}{m} + \frac{\hat{p}_T(1-\hat{p}_T)}{n}}$$

In the usual way, we can construct a confidence interval by adding and subtracting the appropriate number of standard errors from the point estimate of the difference. Thus a 100(1 − α)% confidence interval for the difference in proportions can be computed as follows:

$$\left((\hat{p}_S - \hat{p}_T) - z_{1-\alpha/2}\sqrt{\frac{\hat{p}_S(1-\hat{p}_S)}{m} + \frac{\hat{p}_T(1-\hat{p}_T)}{n}},\ \ (\hat{p}_S - \hat{p}_T) + z_{1-\alpha/2}\sqrt{\frac{\hat{p}_S(1-\hat{p}_S)}{m} + \frac{\hat{p}_T(1-\hat{p}_T)}{n}}\right).$$

Note that, as before, this is the point estimate of the difference plus or minus a z-value times the standard error of the difference in the two proportions.

Example:
Table 6.8 provides the computations associated with calculating a 95% confidence interval for the difference in the rates of hypoglycemia between the two treatments.

These results indicate that we can be 95% confident that the true difference in rates can be as small as −0.0278 or as large as 0.2028. In particular, because zero is in this interval we cannot rule out with 95% confidence that the true difference between the rates of hyperglycemia for the two treatments is zero. Indeed, with 95% confidence the true difference could be approximately 3% in favor of the Standard treatment to 20% in favor of the Innovative treatment.

Testing hypotheses

Testing a hypothesis for a single population mean

We have provided an example above that computed a confidence interval for the mean glucose level in the population of subjects that would be eligible for the IHAST study. The upper range of normal for glucose in the general population is approximately 120 mg/dL. Suppose we wanted to determine if the mean glucose level in the IHAST eligible population is greater than 120 mg/dL. We will use the symbol μ to denote the mean serum glucose in this population. We will then define two hypotheses. The first hypothesis is called the null hypothesis and represents the condition that the mean is the normal population mean. That is, the null hypothesis will be that the mean serum glucose level in the IHAST population is less than or equal to 120 mg/dL. The second hypothesis is called the alternative hypothesis
and indicates that the mean is greater than 120 mg/dL. This is called a one-sided hypothesis because we are only interested in a difference in one direction. In general, if the true mean is denoted by μ and the hypothesized value as μ0 then we can write the hypotheses as:

$$H_0: \mu \le \mu_0$$
$$H_A: \mu > \mu_0$$

Table 6.8 Computation of confidence interval for difference in hypoglycemia rates
                                       Difference
Estimate                               0.2125 − 0.1250 = 0.0875
Standard error (SE)                    √((.2125)(1 − .2125)/80 + (.1250)(1 − .1250)/80)
                                       = √(0.002092 + 0.001367) = √0.003459
                                       = 0.0588
Confidence coefficient (percentile)    1.96
Confidence interval calculation        Estimate ± Z.975(SE)
                                       0.0875 ± 1.96 × (0.0588)
                                       or
                                       0.0875 ± 0.1152
Confidence interval                    (−0.0278, 0.2028)

We will design a study, collect and analyze our data, and use those data to arrive at a conclusion by determining whether the data are more consistent with the null hypothesis than with the alternative hypothesis. If we determine that the evidence is in favor of the alternative then we will ‘reject the null hypothesis’ and conclude that the alternative hypothesis is true. If we are unable to conclude that the null hypothesis is false then we will fail to ‘reject the null hypothesis’. That is, we will conclude that there is insufficient evidence to conclude that the null hypothesis is false. Note that we are not accepting the null hypothesis. We are only saying that there is insufficient data to disprove the null hypothesis. We would use a different procedure to prove the null hypothesis.

Let us consider the consequences of these two decisions. Table 6.9 displays the possible decisions that we can make and the consequences of those decisions.

Table 6.9 Errors in testing hypotheses
                                  True state of nature
Decision based on the data        Null hypothesis true    Null hypothesis false
Fail to reject null hypothesis    Correct decision        Type II error
Reject null hypothesis            Type I error            Correct decision

If the null hypothesis is true and we fail to reject it then we have made the correct decision. Similarly, if the null hypothesis is false and we reject it then again we have made the correct decision. However, if the null hypothesis is true and we reject the null hypothesis then we have made an error; we have falsely concluded that the null hypothesis is false. Statisticians call this the type I error. The consequences of making this error can be substantial. For example, in a study of the efficacy of a new treatment, incorrectly rejecting the null hypothesis of no treatment effect could result in physicians prescribing the treatment when it is not effective. If there is toxicity associated with the treatment then falsely rejecting the null hypothesis will result in patients being exposed to adverse health effects without any benefit.

The second or type II error results when we fail to reject the null hypothesis when it is false. In practice, this would correspond to deciding that a treatment is not effective when it really is. A type II error will result in an effective treatment not being used to treat patients who could benefit from the treatment.

The probability of making the correct decision depends on the uncertainty associated with making
inference from a sample to a population; the outcomes have a random component and there is a positive probability that one will make the wrong decision. The probability of making a type I error is called the significance level of the test and is customarily kept at a low level (0.05 or 0.01). We denote the significance level of the test by the Greek letter α and write α = Pr{Rejecting H0|H0 is true}. The probability of a type II error is usually denoted by the Greek letter β and we write: β = Pr{fail to reject H0|H0 is false}. The opposite of a type II error is to reject the null hypothesis when it is false. The probability of this event is 1 minus the probability of a type II error. Statisticians call this probability the power of the test and we write Power = 1 − β = Pr{Reject H0|H0 is false}. The power of a test depends on the true value of the parameter (in our case, the true mean glucose level in subjects eligible for IHAST2). If the true mean is denoted by μ1 then we can write the power as Power = 1 − β = Pr{Reject H0|μ = μ1}. Note that power is a function of the true value of the unknown parameter and we can determine the power of the test for any specified value of μ = μ1.

Just as for the confidence interval, the test of hypothesis will be based on the sample mean and sample standard deviation of observed serum glucose levels in a sample of IHAST2 eligible subjects. It makes sense that if the observed sample mean is very near the null value μ0 then we would not want to reject the null hypothesis because the data appear to be consistent with the null hypothesis. Because we are only interested in whether or not the mean is greater than μ0, we will reject the null hypothesis if the observed mean is sufficiently greater than μ0. But how much greater must the sample mean be in order to arrive at this conclusion?

Remember that we want to make sure that we control the probability of a type I error. Suppose we decide to reject the null hypothesis if the observed sample mean is larger than some constant c. We want to ensure that: Pr{Reject H0|μ = μ0} = Pr{X̄ > c|μ = μ0} = α. As with the discussion of confidence intervals where the variance is not known, when μ0 is the true value of the population mean then we have the result that $\frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ follows a Student’s t distribution with n−1 degrees of freedom, where n is the sample size. Therefore we can use the t-distribution to find the constant c that has the property that $\Pr\left\{\frac{\bar{X} - \mu_0}{s/\sqrt{n}} > c\right\} = \alpha$. This will guarantee that the probability of rejecting the null hypothesis when it is true is equal to α; that is, that the significance level of the test is exactly α. This means that c is the 1−α percentile of the t-distribution and the decision rule for our test becomes: reject the null hypothesis if $\frac{\bar{X} - \mu_0}{s/\sqrt{n}} > t_{1-\alpha,\,n-1}$ or equivalently if $\bar{X} > \mu_0 + t_{1-\alpha,\,n-1}\left(\frac{s}{\sqrt{n}}\right)$.

Example:
Consider now the test to determine if mean serum glucose in IHAST2 subjects is greater than 120 mg/dL. Table 6.4 displays a summary of blood glucose levels on a random sample of 16 subjects who would have been eligible for IHAST2. The observed sample mean is 134.1 mg/dL and the observed sample standard deviation is 29.6 mg/dL. Suppose now that we want to test the hypothesis that the true mean blood glucose level in this population is greater than 120 mg/dL. In this case we can write the null and alternative hypotheses as

$$H_0: \mu = 120$$
$$H_A: \mu > 120.$$

We will reject the null hypothesis in case $\bar{X} > \mu_0 + t_{1-\alpha,\,n-1}\left(\frac{s}{\sqrt{n}}\right)$. Because the sample size is 16, the t-distribution will have 15 degrees of freedom. If we want to use the 5% level of significance then we must determine the 95th percentile of the t-distribution with 15 degrees of freedom. Using an appropriate table we can determine that the 95th percentile of the t-distribution with 15 degrees of freedom is t0.95,15 = 1.753. We will reject the null hypothesis if

$$\bar{X} > 120 + 1.753\left(\frac{s}{\sqrt{n}}\right) = 120 + 1.753\left(\frac{29.6}{\sqrt{16}}\right) = 120 + 1.753(7.4) = 132.97.$$

The observed sample mean was 134.1, which is greater than 132.97, so we will reject the null hypothesis. We will conclude that the alternative is true and that the true mean blood glucose in this population is greater than 120 mg/dL. We cannot know the true mean and we may be wrong in concluding that the true mean is greater than 120 mg/dL. However, we do know that if we were to run this experiment a very large number of times and construct this test for each of the experiments then we would wrongly reject the null hypothesis only 5% of the time when it is true.
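The decision rule in this example can be written out as a short sketch. It is an illustration under stated assumptions rather than a definitive implementation: the function name `one_sided_t_test` is hypothetical, and the percentile t0.95,15 = 1.753 is passed in from a table, as in the text, instead of being computed.

```python
import math


def one_sided_t_test(sample_mean, sd, n, mu0, t_value):
    """Upper-tailed one-sample test of a mean against mu0.

    t_value is the (1 - alpha) percentile of the t distribution with
    n - 1 degrees of freedom, supplied by the caller from a table.
    Returns (critical_mean, reject), where the null hypothesis is
    rejected when sample_mean exceeds critical_mean.
    """
    critical_mean = mu0 + t_value * sd / math.sqrt(n)
    return critical_mean, sample_mean > critical_mean


# Values from Table 6.4, the null value 120 mg/dL, and t(0.95, 15) = 1.753.
critical_mean, reject = one_sided_t_test(
    sample_mean=134.1, sd=29.6, n=16, mu0=120, t_value=1.753
)
print(f"Reject H0 if the sample mean exceeds {critical_mean:.2f}: reject = {reject}")
# prints "Reject H0 if the sample mean exceeds 132.97: reject = True"
```

Expressing the rule as a threshold on the sample mean, rather than on the t statistic, mirrors the "or equivalently" form of the decision rule given above.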
Hypothesis test for two independent normal means, equal variances

It is also possible to construct tests for two or more parameters. In particular, it is frequently useful to compare two population means. Above we constructed a confidence interval for the difference between two population means (‘Confidence interval for the difference between two independent normal population means’). It would be possible to use that confidence interval to determine if the two population means were different. We could conclude that the two means were different if the confidence interval did not contain zero. That is, if we are 100(1−α)% confident that the true difference between the means was greater than (or less than) zero then we might conclude that the two means were different.

We can use the method of hypothesis testing to help us make that same decision. If we denote the mean of population 1 by the symbol μ1 and the mean of population 2 by the symbol μ2 then we can write the null and alternative hypotheses as:

$$H_0: \mu_1 = \mu_2 \quad \text{or} \quad (\mu_1 - \mu_2 = 0)$$
$$H_A: \mu_1 \ne \mu_2 \quad \text{or} \quad (\mu_1 - \mu_2 \ne 0)$$

In this case the null hypothesis is no difference and the alternative hypothesis is that there is a difference. This is called a two-sided test because we are interested in differences in either direction. This is the most common case in clinical trials where it is important to determine whether an innovative treatment is better or worse than the standard treatment. It is possible to define one-sided tests as well. In this case the hypotheses might be as follows:

$$H_0: \mu_1 \le \mu_2 \quad (\mu_1 - \mu_2 \le 0)$$
$$H_A: \mu_1 > \mu_2 \quad (\mu_1 - \mu_2 > 0)$$

The test will be based on the results of independent random samples from each population. The data are described in Table 6.10.

Table 6.10 Data required for test of difference in means
                      Standard treatment    Innovative treatment
N                     n1                    n2
Mean                  X̄1                    X̄2
Standard deviation    s1                    s2

From the account of confidence intervals we know that if the variances in the two populations are the same then:

$$\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

has the Student’s t-distribution with n1 + n2 − 2 degrees of freedom. We will reject the null hypothesis if the difference in the means is too far from zero in either direction. As in the one-sample case we want to determine a constant c such that:

$$P\left\{\left|\frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\right| > c \ \middle|\ H_0 \text{ is true}\right\} = \alpha$$

This means that c should be the (1−α/2) percentile of the t distribution with n1 + n2 − 2 degrees of freedom. We use α/2 because we want to reject the null hypothesis if there is evidence that the true value is different from the null in either direction. For example, in a clinical trial we would want to reject the null hypothesis of no difference if there is evidence that the new treatment is better or if there is evidence that the new treatment is worse. In order to weight differences in each direction equally, we assign ½ of the significance level α to each side of zero. That is, we determine rejection regions either side of zero that have the same probability of occurring when the null hypothesis is true.

When the null hypothesis is true then μ1 − μ2 = 0 so we would reject the null hypothesis if:

$$\frac{\left|\bar{X}_1 - \bar{X}_2\right|}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{1-\alpha/2,\,n_1+n_2-2}$$

or

$$\left|\bar{X}_1 - \bar{X}_2\right| > t_{1-\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Note that the last expression is exactly the expression used to calculate a confidence interval for the difference in the means except that the confidence interval
is described as being less than or equal to the expression to the right of the inequality. The confidence interval is the set of values between the two endpoints and the rejection region is the set of values ‘outside’ of the confidence interval.

For the one-sided test of H0 : μ1 − μ2 ≤ 0 versus H1 : μ1 − μ2 > 0 we would reject the null hypothesis in case:

$$\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} > t_{1-\alpha,\,n_1+n_2-2}.$$

For the one-sided test of H0 : μ1 − μ2 ≥ 0 versus H1 : μ1 − μ2 < 0 we would reject the null hypothesis in case:

$$\frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} < -t_{1-\alpha,\,n_1+n_2-2}.$$

Example:
Consider the data in Table 6.11.

Table 6.11 Summary of blood glucose levels
Blood glucose levels (mg/dL)
                      Standard treatment    New treatment
N                     30                    30
Mean                  138.40                119.03
Standard deviation    31.82                 29.46

Note that most of the calculations have already been done when we calculated a confidence interval for the difference in the two treatment means. In that case the pooled standard deviation was 30.66 and the 97.5th percentile of the t distribution with 58 degrees of freedom is t0.975,58 = 2.002. Therefore, for a two-sided test we will reject the null hypothesis if the absolute difference between the treatment means exceeds $2.002(30.66)\sqrt{\frac{2}{30}} = 15.84$. The observed difference between the two treatment means is 138.40 − 119.03 = 19.37. Because this difference is greater than 15.84, we will reject the null hypothesis and conclude that the mean for the new treatment is significantly lower than the mean for the standard treatment (a good thing in this case).

Hypothesis test for two normal means, unequal variances

A similar method is used to test for the equality of two population means when the population variances are not equal. In this instance the variance of the difference between the two sample means is estimated by:

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}.$$

Statisticians have shown that the distribution of $\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$ is reasonably well approximated by a t distribution; however, the degrees of freedom are not so easy to obtain. For the two sided test we will reject the null hypothesis in case:

$$\frac{\left|\bar{X}_1 - \bar{X}_2\right|}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} > t_{d,\,1-\alpha/2}$$

where the degrees of freedom d is the integer nearest to the value of the following expression:

$$d' = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

Hypothesis test for two proportions

The methods that we described for constructing confidence intervals for the difference in two proportions can be adapted to testing a hypothesis about the difference in those proportions. Just as we did for comparing two population means, we can test a hypothesis concerning two population proportions. If we are interested in whether or not the proportions are different (no specific direction) then the null and alternative hypotheses can be written:

$$H_0: \pi_1 = \pi_2 \quad (\pi_1 - \pi_2 = 0)$$
$$H_A: \pi_1 \ne \pi_2 \quad (\pi_1 - \pi_2 \ne 0)$$

Just as for the comparison of two means, the test statistic is the difference in observed proportions divided by the standard error, or:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}.$$

We will reject the null hypothesis in favor of the two-sided alternative hypothesis at the α significance level if |z| > z1−α/2. If we were testing the one-sided alternative (HA: π1 > π2) then we would reject the null hypothesis if z > z1−α.

Example:
Using the data from the example in Table 6.6 the test statistic becomes

$$z = \frac{.2125 - .1250}{\sqrt{\frac{.2125(1 - .2125)}{80} + \frac{.1250(1 - .1250)}{80}}} = \frac{.0875}{.0588} = 1.488$$

We will reject the two-sided null hypothesis at the 5% level of significance if the absolute value of this test statistic is greater than 1.96. Since 1.488 is less than 1.96 we fail to reject the null hypothesis and conclude that there is insufficient evidence to claim that the rate of hypoglycemia is different for the two therapies.

Recall that we have already calculated a 95% confidence interval for the difference between these two proportions. It was (−0.0278, 0.2028). Because this interval contains zero, we must conclude that it is possible that the true difference is zero. The data do not provide strong evidence that the difference is not equal to zero. This is consistent with our test of hypothesis where we fail to reject the null hypothesis that the true difference is zero.

The p-value

Articles in the medical literature routinely report the p-value associated with a test of hypothesis. The p-value is defined as the probability of observing a value of the test statistic as extreme as or more extreme than the value that was actually observed if the null hypothesis is true. For the previous example the p-value is: p = Pr{|z| > 1.488 | the null hypothesis is true} = 0.1367.

The p-value can be interpreted as the smallest significance level that would have resulted in rejecting the null hypothesis. In that setting, we would reject the null hypothesis at the 5% level of significance if p ≤ 0.05. Because the observed p-value is greater than 0.05 we cannot reject this hypothesis at the 5% level.

The p-value provides a measure of how extreme the observed data are relative to the null hypothesis. Remember that by chance alone some samples will result in test statistics that are relatively far from the null value even when the null hypothesis is true. The p-value only measures the probability of observing a value of the test statistic as extreme (as far from the null hypothesis value) as the one that was observed if the null hypothesis is true. The p-value does not measure the strength of the evidence against the null hypothesis nor does it provide any measure of the probability that the null hypothesis is false.

Other considerations in analyzing clinical trial data: intention to treat

As the discipline of clinical trials developed, it became clear that subjects who were randomized but did not complete the study on the treatment to which they were assigned by randomization created a problem for analyzing study data and for interpreting the results of those analyses. Randomization is intended to protect against bias in estimating the effects of an intervention and to provide a valid framework for testing hypotheses about differences in outcomes due to an intervention. If subjects do not complete the study or cross over to another treatment during the course of follow-up then it becomes difficult to decide how their data should be analyzed.

The method of intention to treat has become the standard for analyzing data from clinical trials. A thorough discussion of this topic requires an extended discourse. Many articles have been published on this topic [3, 4] and most textbooks on clinical trials methodology address the issue (e.g. [2]) (See Chapter 5).

What are some alternatives to intention to treat?

One alternative is to only analyze those subjects who comply with all study regimens. This is called a compliers or ‘evaluable subjects’ analysis. This analysis does not include non-compliers. Unfortunately, compliance is an outcome that can depend on many processes including the severity of a subject’s illness, the toxicity of the drug, and tolerability of the drug. If non-compliance is related to study treatment then the compliers analysis is likely to be biased. In addition, eliminating subjects from the analysis will affect the power of the test.
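The practical difference between an intention to treat analysis and a compliers analysis is which subjects enter each treatment group's summary. The sketch below illustrates this with entirely hypothetical records: the `Subject` fields, the function name `success_rates`, and the data are assumptions made for illustration and are not taken from any study.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    assigned: str   # arm assigned by randomization
    complied: bool  # hypothetical flag: followed the assigned regimen throughout
    success: bool   # binary outcome


def success_rates(subjects, compliers_only=False):
    """Success rate per randomized arm.

    compliers_only=False: intention to treat (every randomized subject counts).
    compliers_only=True:  compliers analysis (non-compliers are dropped).
    """
    counts = {}
    for s in subjects:
        if compliers_only and not s.complied:
            continue
        n, k = counts.get(s.assigned, (0, 0))
        counts[s.assigned] = (n + 1, k + int(s.success))
    return {arm: k / n for arm, (n, k) in counts.items()}


# Hypothetical data, invented purely to show the mechanics.
subjects = [
    Subject("innovative", True, True),
    Subject("innovative", True, True),
    Subject("innovative", False, False),
    Subject("innovative", False, False),
    Subject("standard", True, True),
    Subject("standard", True, False),
    Subject("standard", True, False),
    Subject("standard", False, True),
]

print(success_rates(subjects))                       # intention to treat
print(success_rates(subjects, compliers_only=True))  # compliers analysis
```

In this made-up data the two arms have the same success rate under intention to treat, but the innovative arm looks far better once non-compliers are dropped, which is the kind of bias the text warns about when non-compliance is related to treatment. The compliers analysis also rests on fewer subjects.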
Another alternative to intention to treat is called the as treated analysis. This analysis assigns subjects to the treatment they were receiving at the end of the study. A subject who crosses over from the innovative drug to placebo is analyzed as a placebo subject and a subject who crosses over from placebo to the innovative drug is analyzed as an innovative drug subject. There are logical problems with this strategy because a subject receives one treatment for part of the follow-up period and another for the remainder of follow-up. Variations of this method will analyze a subject in the group that they were in the longest. In any case it should be clear that any results from this analysis will have potential for bias.
Regulatory organizations such as FDA and the International Conference on Harmonisation (ICH) [6] recommend that the primary efficacy analysis be based on the intention to treat principle. However, many studies report a compliers analysis and/or an as treated analysis. If these analyses agree then the overall conclusions will have more credibility since the results are unlikely to depend strongly on compliance. If they do not agree, then both FDA and ICH recommend that the most valid conclusions are those based on the intention to treat analysis.

Other considerations in analyzing clinical trial data: Missing data and imputation
The intention to treat principle requires that all subjects be analyzed according to the treatment that they were assigned by the randomization. This means that subjects who drop out of the study or are lost to follow-up must be included in the analysis. In typical studies this means that we must impute (guess) the value that a subject would have provided had they not dropped out. Imputation can lead to bias in many of the same ways that crossover does. If drop-out is related to toxicity or tolerability then subjects in a more intensive intervention will be less likely to finish the study. Any method used to impute missing values must address the potential bias that could result from ‘guessing’ the value that would have been observed had the subject completed the trial. One method may be good if the data are missing for one set of reasons while another might be preferred if there are other reasons for the data being unobserved.

Types of missing data
Rubin [5] has proposed a list of different classes of reasons for missing data. His first class is called ‘missing completely at random’ or MCAR. In this case, the reason that the value was not observed is completely independent of the data observed for that subject as well as the value that would have been observed. This is the ‘subject was run over by a bus’ category. The reason that the observation is missing depends in no way on characteristics of the subject (e.g. severity of disease) or the true level of the potential observation. This is the best case because one can analyze the complete data and still obtain unbiased results.
The next class of missing data is called ‘missing at random’ or MAR. In this case, the reason why an observation is missing may depend on the observed data but the reason is not related to other processes once the observed data have been taken into account. All of the information about the reasons for the data being missing is contained in data that have already been observed. For example, a subject drops out of the study once her/his disease deteriorates to a level where the subject can no longer continue. In this case, the reason for a subsequent value being missing only depends on the data that have already been observed but not on the unobserved (future) outcomes.
The third class of missing data is called ‘missing not at random’ or MNAR. Sometimes this is referred to as informative missingness. In this case the processes that cause observations to be missing are related to the values that would have been observed but are not. Think of the case of an Alzheimer’s patient who is doing very well cognitively but, all of a sudden, ‘falls off the cliff’ in terms of becoming demented. This event happens after having observed the patient’s ‘good’ cognitive scores but the patient drops out after this event and no ‘bad’ cognitive data are ever observed. In this case, the reason for missing data depends more on what is not observed, and not so much on what is observed. Methods for analysis of data that are MNAR are still being developed and will not be discussed here.

Methods for accounting for missing data
The best way to account for missing data is to vigorously manage the conduct of the study in ways that avoid missing data. If there are no missing data then one does not need to worry about how to account for it in the analysis. If the proportion of missing data is small then the results of the analysis probably will not depend strongly on how you account for the missing data. Unfortunately, avoiding missing data is not always possible.
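To make the three missingness classes described earlier in this section concrete, the following is a minimal simulation sketch, not an analysis method from this chapter. Everything in it is an assumption chosen for illustration: the hypothetical functions `simulate_trial`, `drop_out`, `stratified_mean` and `run_demo`, the normally distributed baseline and outcome, and the drop-out probabilities. It compares the mean outcome among completers with the mean among all subjects, and with a simple adjustment that averages completers within strata of the always-observed baseline value.

```python
import random
import statistics


def simulate_trial(n=20000, seed=1):
    """Return (baseline, outcome) pairs for n hypothetical subjects."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        baseline = rng.gauss(0.0, 1.0)                  # always observed
        outcome = 0.8 * baseline + rng.gauss(0.0, 0.6)  # may go missing
        subjects.append((baseline, outcome))
    return subjects


def drop_out(subjects, prob_missing, seed):
    """Keep each subject with probability 1 - prob_missing(baseline, outcome)."""
    rng = random.Random(seed)
    return [(x, y) for x, y in subjects if rng.random() >= prob_missing(x, y)]


def stratified_mean(subjects, completers):
    """Average completer outcomes within two baseline strata, weighting each
    stratum by its share of ALL randomized subjects (observed data only)."""
    total = 0.0
    for in_stratum in (lambda x: x < 0, lambda x: x >= 0):
        weight = sum(1 for x, _ in subjects if in_stratum(x)) / len(subjects)
        observed = [y for x, y in completers if in_stratum(x)]
        total += weight * statistics.fmean(observed)
    return total


def run_demo():
    subjects = simulate_trial()
    # Assumed drop-out probabilities, chosen only for illustration.
    mechanisms = {
        "MCAR": lambda x, y: 0.3,                    # ignores all data
        "MAR": lambda x, y: 0.6 if x < 0 else 0.1,   # depends on observed baseline
        "MNAR": lambda x, y: 0.6 if y < 0 else 0.1,  # depends on the unseen outcome
    }
    results = {"true mean": statistics.fmean([y for _, y in subjects])}
    for i, (name, prob) in enumerate(mechanisms.items()):
        completers = drop_out(subjects, prob, seed=100 + i)
        results[name + " completers"] = statistics.fmean([y for _, y in completers])
        results[name + " stratified"] = stratified_mean(subjects, completers)
    return results


for label, value in run_demo().items():
    print(f"{label:>18}: {value:+.3f}")
```

Under these assumptions the completers' mean stays close to the full-sample mean when data are MCAR. Under MAR the completers' mean is biased, but the adjustment that uses only observed baseline data recovers the full-sample mean. Under MNAR both the completers' mean and the baseline-adjusted mean remain biased, because the drop-out depends on the value that was never observed.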
Multiple methods have been used to impute missing data [7]. A method that was very commonly used in the past is called ‘last observation carried forward’ or LOCF. In this case, the last value that was observed on the subject earlier in the study is substituted for the missing value. This requires that the outcome be measured repeatedly during follow-up so that a value is available in case a subject drops out. Depending on the situation, this method can result in conservative conclusions that underestimate the true treatment effects or anticonservative conclusions that overestimate the true treatment effects. LOCF will be conservative if, for example, people in the placebo group have worse responses than people in the treatment group and drop out more frequently or earlier. LOCF will be anticonservative if, for example, the disease is progressive and people in the treatment group drop out more frequently or earlier than people in the placebo group. LOCF has been used extensively in the past but is no longer recommended. It is typically associated with bias regardless of the missing data mechanism (even if it is MCAR) and has generally very poor properties relative to other methods.
Other imputation methods are based on developing regression models that predict the value that would have been observed based on data that were observed (e.g., observed outcomes at earlier time points and baseline characteristics) and use the predicted value for the missing value. Clearly, these methods are only valid if the data are missing at random. Regression analyses estimate the mean of values that would have been observed for a subject with a given set of predictor values. The value that would have been observed would also have a random deviation about the predicted value. Ignoring this random component would result in underestimating the true variability in the data. This, in turn, results in inappropriately narrow confidence intervals and inappropriately large test statistics and small p-values. This is a problem with all so-called ‘single imputation’ methods such as those described above, including LOCF.
The method of multiple imputation was developed to overcome the deficiencies of the single imputation methods. It uses likelihood methods to impute missing values but incorporates additional features that account for the uncertainty in the imputation, which is related to the random variability of the individual’s observed value. Many authors have proposed methods for imputing missing values. Some are based on likelihood methods such as the Expectation-Maximization (EM) algorithm. Others rely on regression models to predict the missing data from data that are observed. The procedures that are implemented in statistical packages allow the user to select from a menu of methods. We will not discuss these methods in detail but will briefly discuss the general ideas behind multiple imputation.

The method of multiple imputation
The method of multiple imputation is described in many articles in the literature [8, 9]. The article by Enders [10] provides a useful primer on its use. One advantage of the MI method is that it has been implemented in several statistical software packages (for example both SAS and SPSS support multiple imputation). The basic strategy is to create multiple complete data sets with missing values imputed, usually using a regression approach. The idea, in layman’s terms, is that you impute not the predicted value, but the predicted value plus an appropriate amount of random ‘jitter’ that reflects the uncertainty associated with that predicted value. The multiple data sets are identical for the values that were observed; they only differ with respect to the values that are imputed, which depend on the random ‘jitter’ that is added to each predicted value. For a more detailed description of the process, see the Enders primer article cited above.
These data sets are each analyzed separately using the same statistical methods and the results for each analysis are recorded. Each analysis yields a parameter estimate and a standard error of that estimate. These individual estimates are combined to provide the final estimate of treatment effect and its associated standard error.
The MI method has several advantages. First, it uses observed data to develop the imputed values. Each predicted value is based on a regression analysis using observed data. Variables should be included in the prediction models that are either related to the variable being imputed or to the missing value process. The assumption is that data are MAR and so all of the information necessary for this process should be available. Another major advantage of the MI method is that it properly accounts for the uncertainty of the imputed (predicted) values and thus provides better estimates of the true variance of the treatment effects.

Conclusion
This chapter provides a very brief discussion of selected techniques for analyzing data from clinical trials. The methods presented here are both useful and applicable in many real situations. The literature is rich with other statistical methods that can also be applied to these studies, including methods for analyzing time-to-event data, methods that incorporate covariate information, and methods that are less sensitive to assumptions (e.g., non-parametric methods). The reader is urged to consult appropriate sources to expand on the material presented here [2].

References
1. Todd MM, Hindman BJ, Clarke WR, et al. Mild intraoperative hypothermia during surgery for intracranial aneurysm. New Engl J Med 2005; 352(2): 135–145.
2. Piantadosi S. Clinical Trials: A Methodologic Perspective (second edition). Hoboken, New Jersey: John Wiley and Sons, Inc, 2005.
3. Peduzzi P, Wittes J, and Detre K. Analysis as-randomized and the problem of non-adherence: an example from the Veterans Affairs randomized trial of coronary artery bypass surgery. Stat Med 1993; 12: 1185–1195.
4. Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials 2000; 21: 167–189.
5. Rubin DB. Inference and missing data. Biometrika 1976; 63: 581–592.
6. International Conference on Harmonisation, (n.d.). www.ich.org.
7. Liu M, Wei L, and Zhang J. Review of guidelines and literature for handling missing data in longitudinal clinical trials with a case study. Pharm Stat 2006; 5: 7–18.
8. Fraset G and Ru Y. Guided multiple imputation of missing data: using a subsample to strengthen the missing-at-random assumption. Epidemiology 2007; 18: 246–252.
9. Kenward MG and Carpenter J. Multiple imputation: current perspectives. Stat Methods Med Res 2007; 16: 199–218.
10. Enders CK. A primer on the use of modern missing-data methods in psychosomatic medicine research. Psychosom Med 2006; 68: 427–436.
Section 2 Concepts in biostatistics and clinical measurement
Chapter 7 Selecting outcome measures
Robert G. Holloway and Andrew D. Siderowf

Introduction
The selection and proper use of outcome measures is of vital importance in clinical trials in neurology. Poorly developed and chosen outcome measures can result in missing true effects of a treatment (type II error) or may capture weak signals of effects that are not clinically significant (type I errors). These errors can result in missed opportunities, wasted resources, and patient harm. The field of translational research is providing us with an ever-increasing number of biomarker targets for early-phase clinical trials. Clear verdicts on therapeutic advances will not occur without a reasoned approach to outcome measure selection based on a sound conceptual framework. The development of methodologically sound outcome measures is a critical step but is outside of the scope of this chapter, which will focus on the use of measures rather than on their development. Here we provide an overview of outcome measures in neurology clinical trials, including developing a conceptual endpoint model, role and use of biomarkers, and considerations on how to select, use and interpret them in the context of early-stage clinical trial design.

Outcome measures in neurology clinical trials
The domains of outcomes used in neurology clinical trials range from biomarkers and lab correlates, signs and symptoms of disease, safety endpoints, functional scales, disability scales, survival endpoints, patient-reported outcomes, and health-related quality of life measures to economic endpoints. Each subspecialty in neurology has a growing portfolio of outcome measures [1]. Figure 7.1 shows the relative importance of clinical trial endpoints in early, mid, and late stage therapeutic development.
Figure 7.1. The relative importance of outcome measures and clinical trial endpoints in drug development. [Figure: relative importance of endpoints (biomarkers, symptoms, safety events, functional status, disability, patient-reported, quality of life, economic) across development stages (discovery/pre-clinical, early, middle, late), spanning the learn zone and the confirm zone.]
Early stage clinical trials (phase 1–2) often employ biomarker targets for proof of concept or therapeutic validation. These trials are sometimes referred to as the ‘learn zone’ of drug development (see Chapters 1–3). A growing number of biomarkers are available for early stage clinical trial development and are explained below. Safety endpoints are of critical importance in all stages of development. Functional status and disability rating scales are commonly employed in neurology to capture the multi-dimensional concept and manifestations often associated with neurological conditions. These scales are often the primary outcome measures used in the ‘confirm zone’ of therapeutic development. Health-related quality of life (HRQL) and patient reported outcome measures are becoming increasingly important in clinical development programs. For example, the FDA issued its finalized guidance on the use of Patient Reported Outcome (PRO) measures to support new drug applications and labeling claims in product development [2]. The NIH Toolbox initiative is utilizing state-of-the-art psychometric and technological approaches to develop brief yet comprehensive assessment tools for measuring motor, cognitive, sensory, and emotional function [3]. The NINDS funded Neuro-QOL project aims to develop a clinically relevant and psychometrically robust HRQL assessment tool for adults and children that will be responsive to the needs of researchers in a variety of neurological disorders [4]. These trends toward ‘patient-centeredness’ are also being motivated by payers of medical care who will reward providers based on patient experiences and satisfaction with their care. Finally, economic endpoints will be an increasing consideration as the comparative cost and cost-effectiveness of competing interventions are evaluated.

Table 7.1 Endpoint models: treatments of various neurological diseases
Concept | Outcome | Endpoints
Friedreich’s ataxia and reduction in frataxin expression | Increasing frataxin expression | Primary: change from baseline in frataxin expression levels. Secondary: symptom diary (ataxia rating scale); physical performance (activities of daily living scale)
Stroke | Decrease in stroke rate | Primary: reduction in the proportion of patients with stroke over a 3-year period. Secondary: recanalization of an occluded artery; stroke functional scale (e.g., NINDS)
ALS functional status | Slow functional decline | Primary: change in ALS Functional Status Rating Scale over 6 month period. Secondary: Health-Related Quality of Life; adverse event profile; biomarker outcome (e.g., proteomic profiling of CSF)

Endpoint model
The choice of an outcome measure is one of the most important decisions in designing a clinical trial. Selection of a primary endpoint and secondary endpoints should be driven by the clinical trial objectives, the trial design, the target population enrolled, and the conceptual framework of disease mechanism and the hypothesized effect of treatment. The result of this process should be a rational measurement sequence based on biological effects, concepts being measured, outcomes being used, and the appropriate selection of clinical trial endpoints. Table 7.1 shows examples of endpoint models from various neurological diseases and therapeutic programs. These examples include the important domains of measurement, the physiological
markers (i.e., biomarkers), the clinical outcome measures, and the clinical trial endpoints used in the statistical analysis. The endpoint model is important to help focus on the primary endpoints, the secondary endpoints, and exploratory endpoints by explaining the exact demands placed on the endpoints to meet the clinical trial objectives.
Therapeutic development programs can be viewed as comprising a learn zone and a confirm zone, with confirmation occurring in the phase 3 trial designed to test clinical efficacy against a standard or placebo [5]. The learn zone includes those studies in development that contribute the information necessary to ultimately conduct confirmatory clinical trials. These usually fall within the traditionally grouped phase 1 and phase 2 clinical trials (see Chapter 17). Since many early stage (learn zone) clinical trials use physiological measures or biomarker endpoints in therapeutic development, we review the role and use of biomarkers and surrogate endpoints in various neurological disease programs.

Biomarkers in clinical trials
There is consensus that better biomarkers are needed in almost every area of neurology. Biomarkers can assist in improved diagnosis of patients with neurological disorders. Perhaps more importantly, biomarkers may facilitate more rapid and reliable development of new therapeutics.
This account will focus on the role of biomarkers in clinical trials. The first part will review definitions of terms such as biomarker and surrogate endpoint, place them in a theoretical context, and review some reasons that biomarkers may not succeed as surrogate outcomes. Then the role of biomarkers in the progressive phases of clinical trials will be addressed and finally some examples of classes of biomarkers currently or potentially available for assessment of neurological disorders will be discussed.

Biomarker definitions and conceptual framework
The NIH Biomarkers Definitions Working Group has produced a standard set of definitions for biomarkers and related concepts, and placed them in an overall theoretical framework [6]. The key definitions from this panel are as follows:
A biological marker (biomarker): A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic response to a therapeutic intervention.
A clinical endpoint: A characteristic or variable that reflects how a patient feels, functions, or survives. In a clinical trial, changes in a clinical endpoint may reflect the effect of a therapeutic intervention. For the purpose of understanding the usefulness of a drug in a clinical setting, clinical endpoints are the most credible measure that can be assessed in a clinical trial.
A surrogate endpoint: A biomarker that is intended to substitute for a clinical endpoint. A surrogate endpoint is expected to predict clinical benefit or harm or lack of benefit or harm based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence. Surrogate endpoints are a subset of biomarkers. The term surrogate literally means ‘substitute for’; therefore the NIH working group discourages the use of the term surrogate marker because it suggests the substitution is for a marker rather than for a clinical endpoint.
The greatest interest in biomarkers in clinical trials is when they can be used as surrogate outcome measures. However, finding valid surrogate outcome measures can be very difficult. According to Prentice [7], a surrogate endpoint must both correlate with the true clinical outcome and fully capture the net effect of treatment on the clinical outcome. A schematic showing this relationship is shown in Figure 7.2.
Figure 7.2. The ideal surrogate endpoint. [Diagram labels: Time; Intervention; Disease; Surrogate end point; True clinical outcome.]
Excellent examples of valid surrogate outcome measures exist in some areas of medicine. Cholesterol as a marker for subsequent cardiovascular events is one example. However, there are also many notable examples of biomarkers that have not been successful surrogates. One of the most notorious examples of a failed biomarker is the use of the electrocardiogram in the Cardiac Arrhythmia Suppression Trial (CAST). In this case, ECG showing more regular heartbeats was inversely correlated with survival. Although results from trials with clinical measures as the primary outcome are generally required for drug approval, in some cases, the FDA may grant accelerated marketing approval based on effects on a surrogate endpoint.
The Biomarkers Definitions Working Group conceptual model (Figure 7.3, adapted from [6]) shows the relationship between biomarkers, surrogate markers, and clinical outcomes. In this model, surrogate outcome measures represent a fraction of biomarkers, since only some biomarkers will meet the additional conditions to be called a surrogate outcome measure. Biomarkers are particularly useful in early stages of drug development, including pre-clinical studies to determine the biological activity of a therapeutic agent. Surrogate endpoints become more useful in early clinical studies to predict whether an agent may have an effect on clinical outcomes. Ultimately, studies that employ clinical outcome measures are needed to determine whether an intervention should be adopted in clinical practice.
Figure 7.3. Relationship between biomarkers, surrogate markers and clinical outcomes. [Diagram labels: Biomarkers for efficacy; Surrogate for efficacy; Clinical measure of efficacy; Biomarkers for safety; Surrogate for safety; Clinical measure of safety; Provisional evaluation of safety and efficacy (Phase I and II trials); Further evaluation of clinical efficacy and safety (Phase III trials).]
For the purposes of drug development, biomarkers may be characterized in several other ways depending on how the marker is used and the characteristic that it measures (see Table 7.2). One common schema is to classify biomarkers as measures of disease state or trait. A state biomarker measures the current status of disease, and may change over time as disease status changes. Examples of state biomarkers are imaging studies like MRI after a stroke, which would show a picture of the anatomic lesion that is producing clinical stroke symptoms. Trait biomarkers measure a characteristic that does not change over time, such as a genetic mutation. Trait biomarkers are more likely to be measures of disease risk. In the context of clinical trials, such markers, like a positive gene test for Huntington’s disease, may be entry criteria for trial participation, but are not suitable outcome measures for clinical trials. Additional classification schemes to describe biomarkers have also been defined. Some examples of these categories are shown in Table 7.2.

Table 7.2 Types of biomarkers used in therapeutic development
Role | Description | Examples
Disease biomarker | Indicate the presence or likelihood of a particular disease | apoE 4 for Alzheimer’s risk
Mechanism biomarker | Suggest a drug has its effect through a specific mechanism or pathway | Reduction in inflammatory markers in MS
Pharmacodynamic biomarker | Used to determine the dose that has the highest response to treatment | Dosage showing greatest reduction in platelet aggregation in patients with stroke risk
Target biomarkers | Show that a drug interacts with a particular target in in vitro studies, or in vivo imaging studies | PET study showing serotonin displacement by anti-depressant
Toxicity biomarkers | Indicate potentially harmful effects | Abnormal hepatic enzyme profile in novel anti-convulsant drug

As described by Fleming and DeMets [8], biomarkers may fail to be valid surrogate endpoints in four general ways. Reasons that a potential surrogate may fail are shown in Figure 7.4. The first is that the surrogate outcome is not in the causal pathway between the disease and the true clinical outcome. The second, related, possibility is that there is more than one causal pathway and the potential surrogate is only relevant to one of these pathways. The third possibility is that the surrogate is not in the pathway of the intervention, or is insensitive to it. Finally, the intervention may have a mechanism of action that is independent of the disease process. This last scenario may be most commonly observed in the case of (potentially harmful) side effects of treatment. A fifth possibility proposed by Frank and Hargreaves is that the biomarker may be overly sensitive and not correlated with a meaningful clinical phenotype [9]. In this case, improvements in the biomarker may be demonstrated, but would not be associated with health benefits.
Figure 7.4. Reasons why biomarker endpoints may fail as surrogate endpoints (adapted from [8]). [Four panels, each relating Disease, Intervention, Surrogate end point and True clinical outcome: (1) The surrogate is not in the causal pathway of the disease process. (2) Of several causal pathways of disease, the intervention affects only the pathway mediated through the surrogate. (3) The surrogate is not in the pathway of the intervention’s effect or is insensitive to the effect. (4) The intervention has mechanisms of action independent of the disease process.]

Role of biomarkers in clinical trials
The enthusiasm for using biomarkers in clinical trials is driven by the need to quicken the pace of clinical drug development, as well as the proliferation of new technologies. Using biomarkers has the potential to accelerate the pace of drug development. Novel clinical trial designs, including adaptive designs, are increasingly used with these novel endpoints [10]. This is particularly true in chronic and degenerative neurological disorders where true clinical outcomes evolve very slowly over time. In addition, biomarkers have the potential to provide complementary information about drug mechanisms that may be useful throughout the phases of testing a novel therapeutic.
The Code of Federal Regulations (CFR) defines clinical trials as belonging to three distinct phases (1–3) [11]. Phase 2 is sometimes divided into two sub-phases 2a and 2b. Phase 4 trials are not defined in the CFR, but are often included in discussions of development of pharmaceuticals (see Chapter 2). Biomarkers can be useful in furthering the goals of clinical trials at each stage; however, they may be particularly useful in phase 1 and phase 2 trials.
Biomarkers have a clear role in phase 1 clinical trials. The purpose of these trials, in addition to determining pharmacokinetics and metabolism, is to identify early evidence for biological activity. In this context, biomarkers may be particularly useful, and do not necessarily need to be valid surrogate outcome measures, since the purpose is not necessarily to predict a clinical outcome but to detect signs of biological activity. This has led to a need for objective endpoints that allow clinical trial sponsors to quickly evaluate whether an exploratory drug is at least ‘reasonably likely’ to succeed and to help sponsors make a ‘go-no go’ decision. Despite the challenges of using biomarkers as true ‘surrogates’ in confirmatory trials for registrational purposes, they have increasingly been used for learning purposes and to assist sponsors in making decisions [12].
Biomarkers continue to play a primary role in phase 2 trials. The purpose of phase 2a trials is to identify preliminary evidence of efficacy. Biomarkers, particularly those that are reliable surrogates for true clinical outcomes, provide a means of accomplishing these objectives. Phase 2 trials can gain substantial efficiency from valid, reliable surrogate outcome biomarkers. However, for the biomarker to be useful, it must change more rapidly and/or be measured more precisely than the clinical outcome of interest. Relying on a surrogate biomarker rather than a clinical outcome measure also avoids the problem of performing underpowered efficacy studies in phase 2. These studies are generally difficult to interpret or inconclusive and, in the worst case, may create ethical barriers to conducting subsequent, definitive efficacy trials.
Biomarkers are also useful in phase 2b studies. The goal of phase 2b studies is to identify the best dose of a medication to use in definitive studies. Often, this dose may be chosen based on the dose that produces the greatest response in a surrogate biomarker.
In phase 3 and 4 studies, biomarkers generally play a secondary role relative to valid clinical outcomes. While there are examples in medicine where medications can receive FDA approval based on changes on a biomarker (e.g., change in blood pressure), this situation is the exception in neurology. Nonetheless, biomarkers may provide complementary information about drug mechanism in a phase 3 trial, and add to the credibility of the changes observed in the primary, clinical outcome measure. Use of MRI lesion burden as a secondary outcome in clinical trials of immune modulating therapies for multiple sclerosis (MS) provides an example of this application [13].

Examples of biomarkers in neurology
Biomarkers have been used in studies of a wide variety of neurological disorders with varying degrees of usefulness. Some of these biomarkers address cellular or microscopic features of disease; examples of this group include biochemical biomarkers and the newer ‘-omics’ markers. Other biomarkers address system level physiology, including electrophysiology or imaging studies. Clearly, however, overlap exists in these categories. For example, imaging is used with increasing frequency to probe cellular mechanisms.

Biochemical biomarkers
Biochemical biomarkers are chemical constituents of bio-fluids or tissue that reflect either disease pathophysiology or response to treatment. Biochemical biomarkers are attractive because they can potentially be measured in central laboratories at relatively low expense. They may provide a more direct measurement of the biology of disease than other types of biomarkers.
Biochemical biomarkers are ubiquitous throughout medicine, ranging from measurement of serum electrolytes to the latest high-tech bioassay. Biochemical biomarkers are also common in neurological disorders, including measurement of antibodies in inflammatory neuropathies or spinal fluid constituents in meningitis.
Work in biochemical biomarkers is well represented by efforts to translate knowledge about pathology in Alzheimer’s disease (AD) into useful clinical markers to follow disease progression during life. The most frequently studied potential biochemical biomarkers for AD are beta-amyloid (Aβ1–42), total tau (t-tau) and phospho-tau (p-tau). These biomarkers are attractive because they reflect the plaque and tangle pathology characteristic of AD. Aβ1–42 has been studied frequently as a biomarker for AD. In the CSF of patients with AD, concentrations of Aβ1–42 are reduced by 40–50%. However, Aβ1–42 concentrations do not correlate with dementia severity, and levels remain essentially unchanged over intervals up to one
year. By contrast, tau levels are increased in spinal fluid from patients with AD, but tend to decline over time and with disease progression. Because neither Aβ1–42 nor tau correlates with disease duration or severity, they are not valuable as natural history biomarkers. They may prove to be useful biomarkers of therapeutic effect if a drug can be shown to normalize levels. However, the roles of Aβ1–42 and tau as markers of therapeutic effect in clinical trials are not established [14].

‘Omics’ biomarkers
The combination of emerging high-throughput assay techniques and increased bio-informatics computing power has ushered in a new class of biomarkers including genomics, proteomics and metabolomics. The common links among these groups of biomarkers are that they are derived from unbiased sampling of very large amounts of biological data and that the read-out obtained is a pattern of changes in multiple constituents. This pattern of changes is sometimes referred to as a profile. Metabolomic approaches quantify large numbers of small molecules collectively known as metabolites using techniques such as mass spectrometry. Computational methods capable of interpreting very large amounts of data are used to identify patterns of metabolites present in patients with a given disease that are not present in controls. Metabolomic studies have shown promise in identifying metabolomic profiles for motor neuron disease [15] and Parkinson’s disease (PD) [16]. It remains to be determined whether these technologies may be useful in clinical trials. Proteomics takes a similar approach, dealing with large numbers of proteins sampled from biological specimens. Exploratory proteomics studies have identified panels of proteins that distinguish patients with degenerative disorders including AD and PD from each other and from normal controls [17]. Such biomarkers identified through unbiased approaches must be validated in independent samples before they can be widely accepted. Genomics analyzes data generated from studies of genes and gene expression and again requires intensive bio-informatic analyses. Although these -omics approaches show promise, to date they have primarily been studied as diagnostics. In the future, they may prove to be useful markers of response to therapy, and be integrated into clinical trials.

Imaging biomarkers
Imaging biomarkers are ubiquitous in neurology. These modalities include CT, MRI and metabolic imaging. In particular, metabolic imaging with PET or single photon emission computerized tomography (SPECT) is widely used as a diagnostic and to follow disease progression in neurodegenerative disorders. Dopaminergic degeneration is clearly central to the pathological process in PD, and changes can be measured with imaging in a way that corresponds well to accepted ideas regarding PD pathology. PET and SPECT techniques are available to measure pre- and post-synaptic neurons in the nigro-striatal pathways. In particular, 123iodine-labeled 2β-carbomethoxy-3β-(4-iodophenyl)tropane ([123I] β-CIT) SPECT, and [18F] fluorodopa (Fdopa)-PET have been used as measures of the integrity of the nigro-striatal system. Both have proven to be useful natural history biomarkers, showing consistent declines in binding of approximately 10% per year. However, there have been significant problems with using these markers in clinical trials. In three trials where they have been used as biomarkers, the changes observed in the imaging biomarkers have been inconsistent with changes observed in clinical measures [18–20]. In all cases, the purpose of the biomarker study was to provide complementary evidence showing physiological changes consistent with clinical observation, thus bolstering the clinical data. However, the disconnect between clinical and biomarker results produced controversy regarding the validity of dopaminergic imaging biomarkers and the way that clinical disease severity is assessed in trials. The difficulty in interpreting these studies demonstrates challenges in validating biomarkers for use as surrogate outcome measures in interventional studies.

Recently, PET imaging studies using ligands that bind to β-amyloid have demonstrated the potential for this imaging modality as a biomarker for AD. Pittsburgh Compound-B (PiB) was the first of these compounds to be reported [21]. However, a number of other similar compounds are in development; in vivo studies show an excellent relationship between in vitro amyloid measurement in brain slices and amyloid imaging. Clinical studies have shown excellent capacity for amyloid imaging to differentiate between patients with AD and normal controls. Longitudinal studies are needed to determine the ability of amyloid imaging to predict progression of AD and to identify which patients with mild cognitive impairment will go on to develop AD. While compounds like PiB are beginning to be incorporated into clinical trials, there is too little experience with them as biomarkers in trials to judge their usefulness.
Structural imaging with MRI or CT has been used as both an entry criterion into clinical trials and as an outcome measure. For example, CT has been used to define entry criteria for thrombolysis trials in acute stroke [22].

MRI has frequently been used as a measure of treatment response of MS patients. The presence of multiple brain lesions in patients with an isolated clinical event is the best predictor of a subsequent diagnosis of relapsing-remitting MS. MRI monitoring has been recommended to screen new therapies [23], and is used as an outcome measure in MS clinical trials [13]. Although the relationship between T2 lesions and long-term disability has been controversial [24, 25], one recent study reporting 13 years of clinical follow-up showed a strong relationship between T2 lesion burden and a number of important long-term clinical and imaging measures of disease progression [26]. These findings support the use of MRI as a biomarker for MS clinical trials, and possibly its use as a surrogate endpoint to predict important clinical outcomes.

Practical considerations in endpoint selection
Researchers should define the role each endpoint is intended to play in the clinical trial (e.g., primary, secondary, or exploratory endpoint). This is important so that instrument development and performance can be reviewed in the context of its intended role and to properly plan for the appropriate statistical analysis. Each endpoint should be fit to purpose and be cohesively part of the endpoint model (see Chapters 20–26). A less-is-more approach rather than a value-added approach to endpoint selection often helps focus on those critical domains needing measurement and helps avoid the temptation to collect too much information.

Assessing the characteristics of instruments and selecting endpoints involves an extensive review of the literature and detailed consideration of the proposed clinical trial. Issues needing review include the concepts being measured, the number of items for each instrument, the conceptual framework of the instrument, the medical condition for the intended use, the population for intended use, the data collection method, the administration mode, the response options for each measure (e.g., visual analog scale, Likert scale, rating scale, checklist, recording of events as they occur), the recall period in question, the scoring of the instrument, the weighting of items or domains, the format, the respondent burden, and the availability of culturally adapted versions. Each instrument should have a demonstration of adequate measurement properties (reliability, validity, and ability to detect change), content and score distributions, and information about method of administration and user acceptability. Factors that can contribute to respondent burden include length of questionnaire/interview, formatting, font size, literacy level, need for privacy, and need for physical help in responding. Modifications to existing instruments should be avoided unless additional qualitative work is proposed to document consistent measurement properties. Depending on the endpoint or measure being used, raters will need sufficient training to standardize procedures, reduce random error, and improve measure reliability. This will not only improve study quality but ultimately lower sample size requirements through precision of measurement.

Much of the above information may not be available for newer physiological measures proposed for use as biomarkers in early translational trials. Therefore, early stage translational trials are also helping to establish the measurement properties of biomarkers in an iterative process. This may lead to a situation where biomarker validation lags behind the drug development program it is intended to support. Therefore, when using a newly developed physiological measure, consultation with the appropriate sponsor is critical for planning and implementing the measure into the clinical trial.

Pitfalls to avoid in selecting outcome measures
There are several pitfalls to avoid in selecting outcome measures in clinical trials. These include choosing an endpoint or instrument with little known information about its validity, reliability, and ability to detect change. In addition, one should not use a new measure without proper pilot testing or use a measure differently than recommended, including altering questions or response options. One should use outcome measures judiciously. For example, outcome measure development may occur in early stage clinical trials to refine measurement properties to support their use in confirmatory clinical trials. Alternatively, exploratory outcome measures and endpoints may be used in confirmatory clinical trials for a variety of purposes, including selecting sub-populations who may demonstrate greatest clinical benefit (e.g., ‘patient-selection’ biomarkers).
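
The remark above that better measurement precision lowers sample size requirements can be illustrated with a rough calculation. The sketch below is not taken from the chapter: it assumes a two-arm comparison of means analysed with a normal approximation, it assumes that measurement error adds independently to true between-patient variability, and the standard deviations and target difference are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd_true, sd_error, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-arm comparison of means.

    Observed variance is modeled as true between-patient variance plus
    independent measurement-error variance (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    var_observed = sd_true ** 2 + sd_error ** 2
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * var_observed / delta ** 2)


# Hypothetical values: true SD of 10 units, target difference of 4 units.
n_noisy = n_per_group(delta=4.0, sd_true=10.0, sd_error=6.0)    # less reliable ratings
n_precise = n_per_group(delta=4.0, sd_true=10.0, sd_error=3.0)  # more reliable ratings
print(n_noisy, n_precise)
```

Under these assumed values, halving the measurement-error standard deviation reduces the required number of patients per group, which is the sense in which rater training can pay for itself.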
References
1. ProQolid Database. http://www.proqolid.org/. (Accessed June 11, 2010.)
2. Guidance for Industry. Patient-reported outcome measures: Use in medical product development to support labeling claims. 2009. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf. (Accessed June 17, 2010.)
3. Gershon RC, Cella D, Fox NA, et al. Assessment of neurological and behavioural function: the NIH Toolbox. Lancet Neurol 2010; 9: 138–39.
4. Neuro-QOL. Quality of Life in Neurological Disorders. http://www.neuroqol.org/default.aspx. (Accessed June 17, 2010.)
5. Sheiner LB. Learning versus confirming in clinical drug development. Clin Pharmacol Ther 1997; 61: 275–91.
6. Biomarkers Definitions Working Group. Biomarkers and surrogate endpoints: preferred definitions and conceptual framework. Clin Pharmacol Ther 2001; 69: 89–95.
7. Prentice RL. Surrogate endpoints in clinical trials: definitions and operational criteria. Stat Med 1989; 8: 431–40.
8. Fleming TR and DeMets DL. Surrogate end points in clinical trials: Are we being misled? Ann Int Med 1996; 125: 605–13.
9. Frank R and Hargreaves R. Clinical biomarkers in drug discovery and development. Nature Rev 2003; 2: 566–80.
10. Coffey CS and Kairalla JA. Adaptive clinical trials: Progress and challenges. Drugs 2008; 9: 229–42.
11. The Food and Drug Modernization Act of 1997. Title 21 Code of Federal Regulations Part 312 Subpart H Section 314.500.
12. Spinella DC. Biomarkers in clinical drug development: realizing the promise. Biomarkers Med 2009; 3: 667–69.
13. Jacobs LD, Beck RW, Simon JH, et al. Intramuscular interferon beta 1a therapy initiated during the first demyelinating event in multiple sclerosis. N Engl J Med 2000; 343: 898–904.
14. Sonnen JA, Montine KS, Quinn JF, et al. Biomarkers for cognitive impairment and dementia in elderly people. Lancet Neurol 2008; 7: 704–14.
15. Rozen S, Cudkowicz ME, Bogdanov M, et al. Metabolomic analysis and signature in motor neuron disease. Metabolomics 2005; 2: 101–8.
16. Bogdanov MB, Beal MF, McCabe DR, et al. Metabolomic profiling to develop blood biomarkers for Parkinson’s disease. Brain 2008; 131: 389–96.
17. Abdi F, Quinn JF, Jankovic J, et al. Detection of biomarkers with a multiplex quantitative proteomic platform in cerebrospinal fluid of patients with neurodegenerative disorders. J Alzheimer’s Dis 2006; 9: 293–348.
18. Parkinson Study Group. Dopamine transporter brain imaging to assess the effects of pramipexole vs levodopa on Parkinson disease progression. JAMA 2002; 287: 1653–61.
19. Fahn S, Oakes D, Shoulson I, et al. Levodopa and the progression of Parkinson’s disease. N Engl J Med 2004; 351: 2498–508.
20. Whone AL, Watts RL, Stoessl AJ, et al. Slower progression of Parkinson’s disease with ropinirole versus levodopa: The REAL-PET study. Ann Neurol 2003; 54: 93–101.
21. Klunk WE, Engler H, Nordberg A, et al. Imaging brain amyloid in Alzheimer’s disease with Pittsburgh Compound B. Ann Neurol 2004; 55: 306–19.
22. NINDS rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. N Engl J Med 1995; 333: 1581–88.
23. Miller DH, Albert PS, Barkhof F, et al. Guidelines for the use of magnetic resonance techniques in monitoring the treatment of multiple sclerosis. Ann Neurol 1996; 39: 6–16.
24. Sormani MP, Bonzano L, Roccatagliata L, et al. Magnetic resonance imaging as a potential surrogate for relapse in multiple sclerosis: a meta-analytic approach. Ann Neurol 2009; 65: 270–77.
25. Li DKB, Held U, Petkau J, et al. MRI T2 lesion burden in multiple sclerosis. A plateauing relationship with clinical disability. Neurology 2006; 66: 1384–89.
26. Rudick RA, Lee J-C, Simon J, et al. Significance of T2 lesions in multiple sclerosis: A 13 year longitudinal study. Ann Neurol 2006; 60: 236–42.
Section 3: Special study designs and methods for data monitoring
Chapter 8: Selection and futility designs
Bruce Levin
Introduction
Selection designs and futility designs offer investigators a way to screen potential therapies in early phase clinical research in a relatively rapid manner with fewer patients than would be required for a traditional phase 3 trial for each candidate. To do so requires changing the standard phase 3 paradigm in some substantial way. In a futility design, the paradigm is still that of hypothesis testing, but the traditional null hypothesis of no effect and the two-sided alternative hypothesis of unequal efficacies are reformulated in such a way as to better screen out unpromising therapies. In a selection design, there is a radical shift away from the hypothesis testing paradigm altogether, with a different goal – to select the best from among several competing treatments. In this chapter we explain the rationale for these changes and the basic methods required, starting on more familiar ground with the futility design.

The logic of the futility design
The futility design has appeal for phase 2 clinical trials which seek to obtain a preliminary indication of promising efficacy of an experimental treatment, or the lack thereof, i.e., an indication that further research with the treatment would be futile. The motivation for a futility study arises from a familiar context encountered in cancer research and currently facing neurodegenerative disease researchers: there are many possible treatment candidates but each has only a low a priori probability of having worthwhile efficacy. In such circumstances it would be impractical to demand a definitive phase 3 study for each of those high-cost, low-expectation endeavors. A better strategy is to screen candidate therapies using relatively fewer patients in each case, to be sensitive to suggestions of efficacy, but to stand ready to weed out candidates that lack sufficient promise. Resources may then be saved to bring only the non-futile treatments forward for definitive testing. The futility design was adopted by the National Institute of Neurological Disorders and Stroke (NINDS)-supported NET-PD network to screen out unpromising neuroprotective agents in Parkinson’s disease (PD), and was introduced in the neurological literature by this group in a series of reports and didactic publications [1–6]. The design has also been proposed, discussed, and/or used in stroke research [7], amyotrophic lateral sclerosis (ALS) [8–11], and Huntington’s disease [12].

To gain some insight into the characteristics that a useful screening program would have, consider what impact errors of omission and commission have. Suppose a treatment which is truly superior to a placebo fails by chance to show promise in early phase human trials. If the development program for this treatment were terminated as a result, the loss to humanity could be tremendous. But if a treatment which is truly no better than a placebo looks promising by chance in early tests and is brought forward for definitive testing as a result, the costs could be measured in time, money, and perhaps risks for the patients involved, yet the disappointing truth will ultimately be revealed. Assuming that the first type of error is the more serious, and given that we so desperately need safe and effective neuroprotective agents with precious few resources to find them, it makes sense to design the screening program to be less specific than phase 3 testing traditionally requires in exchange for greater sensitivity to promising treatments. This implies that we should be willing to tolerate a low positive predictive value – after all, good therapies will be hard to find under any circumstances – in exchange for a high negative predictive value, such that candidate treatments which fail the screen are quite likely to be truly without merit. These conclusions are consistent with a public health and
economic perspective: it is important not to overlook potentially useful drugs but carrying forward agents with a low probability of success is not economically sustainable.

The futility design has the above desired properties. It is more properly designated a non-superiority design because the null hypothesis which it tests states that the experimental treatment possesses a pre-specified degree of superiority, while the alternative hypothesis, which generally confers a design its name, states that the experimental treatment does not possess the required degree of superiority, i.e., is non-superior.¹

¹ The non-superiority design should not be confused with the non-inferiority design which is often used in the pharmaceutical industry. In a non-inferiority design, the null hypothesis states that a new treatment has a pre-specified degree of inferiority compared to a standard active treatment, and the goal is to reject that hypothesis in favor of the alternative hypothesis of non-inferiority, i.e., acceptable comparability with the standard treatment. Thus the goals (demonstrating non-superiority versus non-inferiority) and the types of comparators (placebo versus active) make these designs quite distinct. See Chapter 16 for further discussion of the non-inferiority design. The futility design should also be distinguished from phase 3 monitoring plans that allow early stopping for lack of efficacy, called ‘futility stopping’ in that context. See Chapter 19.

To formalize the statement of the design, let θ denote a population parameter measuring, for example, the true average clinical progression of disease over a period of time, with larger values indicating greater disease progression. For example, θ might denote the average increase in the Unified Parkinson’s Disease Rating Scale (UPDRS) over a given time period for a population of PD patients, or the average decline in the revised ALS Functional Rating Scale (ALSFRS-R) over a given time period for a population of ALS patients.² The key step is to define the criterion of superiority, which can be specified in several ways depending on other design elements. In a single-arm design, a value of θ, say θ0, is pre-specified such that an experimental treatment will be defined as ‘superior’ if θE ≤ θ0 and will be defined as ‘non-superior’ if θE > θ0, where θE denotes the true value of θ for patients on the experimental treatment. Note that θE is unknown, so statistical methods must be used to infer whether the superiority or non-superiority hypothesis is true. Note also that θ0 should represent an average disease progression better than that of untreated patients or patients on placebo. In fact, in order for the alternative hypothesis of non-superiority to imply that it would be futile to conduct further testing, the value of θ0 should represent an agreed upon minimum worthwhile efficacy (or maximum allowable progression). This must be done with care, and a consensus of expert judgment is essential, as is careful education of, and buy-in by, key trial participants and patients. In order to qualify as superior, then, an experimental treatment must lead to a certain minimum slowing of disease progression, which we represent by the positive quantity Δ0 = θP – θ0 > 0, where θP denotes the true average disease progression for patients on placebo. If θE > θ0 the experimental treatment is deemed non-superior or ‘futile’ even if it represents a true average disease progression better than that of a placebo, i.e., even if θP > θE > θ0, because it does not achieve the minimum worthwhile efficacy.

² We sidestep here the question of whether θ truly measures a characteristic of the actual mechanism underlying disease progression, includes merely symptomatic features, or both. We wish to let θ refer to representative changes in the usual clinical measures of disease severity.

We may now formally state the null and alternative hypotheses for a single-arm futility design, as follows:

H0: θE ≤ θ0 (superiority) versus
H1: θE > θ0 (non-superiority).

Note that in this formulation we do not need to know what the true placebo progression θP might be exactly, only that θP > θ0. If a value of θP is known, the hypotheses can be restated equivalently in terms of the slowing of disease progression ΔE = θP – θE and Δ0 = θP – θ0 as:

H0: ΔE ≥ Δ0 (superiority) versus
H1: ΔE < Δ0 (non-superiority).

This is an additive formulation of treatment effect. Sometimes a multiplicative formulation may be preferred. In that case the definition of superiority would be stated in terms of the quantity RE = 100(1 – θE/θP)%, which is the percentage reduction in the true average disease progression of the experimental treatment relative to placebo, and R0 = 100(1 – θ0/θP)%, which is the minimum worthwhile percentage reduction:

H0: RE ≥ R0 (superiority) versus
H1: RE < R0 (non-superiority).

In the two-arm design with concurrent placebo controls, both θE and θP are unknown. The hypotheses of the futility design are then defined in terms of the
slowing of disease progression due to the experimental treatment compared to placebo, ΔE = θP – θE, and a pre-specified positive minimum worthwhile slowing of disease progression, which we also denote by Δ0 > 0:

H0: ΔE ≥ Δ0 (superiority) versus
H1: ΔE < Δ0 (non-superiority).

In a multiplicative formulation, it is easiest to state the hypotheses as follows:

H0: θE – π0θP ≤ 0 (superiority) versus
H1: θE – π0θP > 0 (non-superiority),

where 100(1 – π0)% is the minimum worthwhile percentage reduction in the true average disease progression. For example, if a treatment would be deemed superior if it caused a 20% decrease in the decline of the ALSFRS-R over a nine-month follow-up period, then π0 = 0.80 and we would test H0: θE – 0.8θP ≤ 0 versus H1: θE – 0.8θP > 0. Such a formulation was used in the futility trial of coenzyme Q10 in ALS (the QALS trial [8, 9]).

Are there any practical differences between a futility design and a traditional one-sided hypothesis test? Why not just use a traditional test using a more liberal type I error rate to reduce the sample size? The answer is somewhat surprising: the practical difference between the designs is not a matter of statistical power or sample size. Indeed, as discussed below, a traditional one-sided design and a futility design have parallel operating characteristics. Rather, the practical difference appears in terms of what can be said and how to proceed in the event that we fail to reject the null hypothesis in one design or the other. For the traditional test we make statements such as ‘we cannot rule out that the experimental treatment is no better than placebo with 95% confidence’ and exhibit the disappointing confidence intervals which include the parameter ΔE = 0. Even if the trial results are truly inconclusive concerning the efficacy of the treatment and the confidence interval includes rather promising values, the pall of insignificance has been cast over the results and ‘spin’ statements are ultimately post hoc. With the futility design, however, failure to reject the null hypothesis of superiority leads to statements such as ‘with 95% confidence we cannot rule out that the experimental treatment is superior,’ and thus the research should continue to definitive phase 3 testing. This difference may be philosophical, but the latter statement represents a huge advantage. It is consistent with a screening program, and it has the strength of having been planned a priori. Moreover, given that sample sizes in phase 2 trials are generally smaller than in phase 3 trials, use of the traditional formulation can easily produce an underpowered study, even more so if a traditional two-sided design is used, with all of the consequent logistical uncertainties when one fails to reject the null hypothesis of no benefit.³

³ It is worthwhile to point out here that, as always in hypothesis testing, failure to reject H0 is not equivalent to accepting H0 as true. In the futility design if we fail to reject the null hypothesis of superiority, we do not conclude the experimental treatment is superior to placebo. That inference must await an adequate and well-controlled phase 3 clinical trial. We may only conclude that the evidence was insufficient to rule out true superiority.

⁴ As always, it is best to examine the entire power curve for all values of θE (or ΔE or θE – π0θP) rather than just at the specific design alternative, in order to fully perceive the operating characteristic of the trial under all possible parameter values.

Another difference is revealed by considering type I and type II errors and the corresponding sensitivity and specificity of the screening program. In a futility design, a type I error occurs when a truly superior treatment by chance produces sufficiently unpromising results as to cause a declaration of futility. Our premise is that this would be a serious error whose rate of occurrence is controlled by specifying a reasonably low alpha level at the criterion of superiority, ΔE = Δ0. A type II error occurs when we fail to declare a truly non-superior treatment futile. It is natural to assess the power of the test at the particular parameter value of placebo efficacy in the alternative hypothesis. For a single-arm trial we consider the design alternative to be θE = θP; for an additive two-arm trial the design alternative is taken to be ΔE = 0; and for a multiplicative two-arm trial the design alternative is taken to be θE – π0θP = (1 – π0)θP for some assumed value of θP.⁴ Let us define ‘sensitivity’ to mean the probability that we declare a truly superior treatment ‘non-futile’ and ‘specificity’ to mean the probability that we declare a truly non-superior treatment ‘futile’. Then sensitivity is equal to the probability of failing to reject the null hypothesis of superiority with a truly superior treatment, i.e., 1 – α, at the criterion for superiority (or greater if the treatment is even better), while specificity is equal to the power of the test. In the traditional design, sensitivity would correspond to the power of the test (the probability of rejecting the null hypothesis of no benefit with a superior treatment at a given level of efficacy) while specificity would correspond to the probability of failing to
reject the null hypothesis of no benefit given that the efficacy of the treatment is the same as that of placebo, or 1 – α. Insofar as it is typical to set the type I error probability α lower than the type II error probability β in a traditional trial, it follows that sensitivity will be greater than specificity for the futility design compared to the traditional design. For example, if α = 0.05 and β = 0.20 (for 80% power) at the design alternative, the futility design will have 95% sensitivity and 80% specificity, whereas the traditional design would have 80% sensitivity and 95% specificity.

What does this say about the predictive values of the screening program? Suppose we interpret ‘futility’ as a negative outcome and ‘non-futility’ as a positive outcome. Then the negative predictive odds of a futility outcome is given by the prior odds on a non-superior treatment times the likelihood ratio of specificity over one minus sensitivity, or (1 – β)/α = 0.80/0.05 = 16. This likelihood ratio means that a futility outcome is at least 16 times more likely under the non-superiority hypothesis at the design alternative of no benefit than under the null hypothesis of criterion superiority. On the other hand, the positive predictive odds of a non-futile outcome is given by the prior odds on a superior treatment times the likelihood ratio of sensitivity over one minus specificity, or (1 – α)/β = 0.95/0.20 = 4.75 (meaning a non-futile outcome is 4.75 times more likely under the superiority hypothesis than under the design alternative of no benefit).⁵ Thus a futility outcome multiplies the prior odds on non-superiority – which must be quite high, given the rarity of neuroprotective agents – by a factor of 16 or more, yielding a posterior odds on non-superiority yet an order of magnitude greater than the prior odds, whereas failure to declare futility increases the prior odds on superiority – which must be quite small – by a factor of only 4.75.⁶ Consequently, the futility design does a reasonable job of producing negative weight of evidence for unpromising therapies, though enthusiasm should be tempered when a borderline non-futile result is obtained. If a therapy passes the screen of non-futility, it still must undergo subsequent definitive phase 3 testing before it can be considered efficacious.

⁵ Strictly speaking, these likelihood ratios consider only the evidence of having declared a treatment futile or non-futile but nothing more. More informative likelihood ratios can generally be constructed using the observed data from the experiment.

⁶ For instance, if the prior odds on non-superiority is 10 to 1 (corresponding to a prior probability of superiority of 1/11), then increasing the prior odds by a factor of 10 yields posterior odds on non-superiority of 100 to 1 (corresponding to a posterior probability of superiority of 1/101 ≈ 0.01). On the other hand, increasing the prior odds on superiority of 1 to 10 by a factor of 4.75 yields posterior odds on superiority of 4.75/10 = 0.475 (corresponding to a posterior probability of superiority of only 0.475/(1 + 0.475) = 0.322).

Conducting a futility test and sample size considerations
A futility analysis is conducted depending on the precise formulation of the hypotheses and the statistical distribution of the primary endpoint. For brevity, we shall only consider the case of a normally distributed variable with mean θE and standard deviation σ, and illustrate with the primary endpoint of the NET-PD futility studies described in [5], namely, the increase in the UPDRS total score between baseline and either the time at which there was sufficient disability to warrant symptomatic therapy for PD or 12 months, whichever came first. The threshold value was defined as an increase in the UPDRS that was 30% less than the mean progression observed on the total UPDRS score in a historical control group. In this case, the historical control group was chosen to be the group receiving either placebo or α-tocopherol in the Deprenyl and Tocopherol Antioxidative Therapy of Parkinsonism (DATATOP) trial (n = 401), and the mean increase in total UPDRS score was 10.65 units with a standard deviation of 10.4 units [14]. Taking θP as 10.65 and σ as 10.4, θ0 was defined as 0.7 × 10.65 or θ0 = 7.455.

First consider a single-arm study. Let Yi denote the observed decline for the ith patient (i = 1,…,n), let Ȳ denote the average of these values, and let s be the sample standard deviation. The pivotal test statistic is then Student’s t statistic, $t = (\bar{Y} - \theta_0)/(s/\sqrt{n})$. We reject the null hypothesis of superiority in favor of the alternative hypothesis of non-superiority and declare futility if t ≥ tn–1;α where tn–1;α is the critical value of Student’s t distribution with n–1 degrees of freedom cutting off probability α in the upper tail. Equivalently, we reject the null hypothesis of superiority if the one-tailed p-value (computed from the t distribution with n–1 degrees of freedom) is less than α. The power of this test is given by P[tn–1(λ) > tn–1;α], where tn–1(λ) has a non-central t distribution with non-centrality parameter λ = (θE – θ0)/(σ/√n). At the design alternative of treatment efficacy equal to that of placebo, the non-centrality parameter is
81
λ = Δ0/(σ/√n). Standard software for computing power for a one-sample t-test can be used with specification of the difference to be detected as Δ0 = θP – θ0, the standard deviation as σ, the significance level as α (one-tailed), and the sample size as n.

In the NET-PD studies the type I error probability was chosen as α = 0.10 with a sample size of n = 58.⁷ This sample size provided 85% power to detect futility if the efficacy of the treatment was the same as that of placebo (θE = θP = 10.65) assuming σ = 10.4 and θ0 = 7.455.

For an additive two-arm design, the t statistic is $t = (\bar{Y}_P - \bar{Y}_E - \Delta_0)/(s_p\sqrt{n_P^{-1} + n_E^{-1}})$, where ȲP and ȲE are the sample means in the placebo and treatment groups, respectively, nP and nE are the respective sample sizes, and sp is the usual pooled standard deviation estimate. The null hypothesis H0: ΔE ≥ Δ0 is rejected in favor of the alternative hypothesis H1: ΔE < Δ0 and futility is declared if t ≤ –tν;α, where the degrees of freedom are ν = nP + nE – 2. Equivalently, futility is declared if the one-tailed p-value (computed from the t distribution with nP + nE – 2 degrees of freedom) is less than α. The power of this test is given by P[tν(λ) < –tν;α], where the non-centrality parameter is now $\lambda = (\theta_P - \theta_E - \Delta_0)/(\sigma\sqrt{n_E^{-1} + n_P^{-1}})$. At the design alternative θE = θP, the non-centrality parameter for equal sample sizes nE = nP = n is the quantity $-\Delta_0/(\sigma\sqrt{2/n})$. Standard software for computing power for a two-sample t-test can be used with specification of the difference to be detected as –Δ0, the standard deviation as σ, the significance level as α (one-tailed), and the sample size (per group) as n.

If the NET-PD studies had been designed as two-arm studies with concurrent placebo groups having equal sample sizes, Δ0 = 10.65 – 7.455 = 3.195, and σ = 10.4, they would have required n = 115 per group to achieve the same power of 85%, essentially quadrupling the total number of patients compared to the single-arm study.

For a multiplicative formulation in the two-arm design, some saving in sample size is possible. For H0: θE – π0θP ≤ 0, the pivotal test statistic is:

\[ t = \frac{\bar{Y}_E - \pi_0 \bar{Y}_P}{s_p \sqrt{n_E^{-1} + \pi_0^{2} n_P^{-1}}}, \]

which again has Student’s t distribution with ν = nP + nE – 2 degrees of freedom under H0. The null hypothesis H0: θE – π0θP ≤ 0 is rejected and futility is declared if t ≥ tν;α. Equivalently, futility is declared if the one-tailed p-value (computed from the t distribution with nP + nE – 2 degrees of freedom) is less than α. The power of this test is given by P[tν(λ) > tν;α], where the non-centrality parameter is now given by:

\[ \lambda = \frac{\theta_E - \pi_0 \theta_P}{\sigma \sqrt{n_E^{-1} + \pi_0^{2} n_P^{-1}}}. \]

At the design alternative θE = θP, the non-centrality parameter for equal sample sizes nE = nP = n is the quantity:

\[ \lambda = \frac{(1 - \pi_0)\,\theta_P}{\sigma \sqrt{(1 + \pi_0^{2})/n}}. \]

In our example with equal sample sizes of n per group and π0 = 0.70, corresponding to a 30% improvement in disease progression, the non-centrality parameter at the design alternative θE = θP = 10.65 is λ = (0.30·10.65)/[10.4·{(1 + 0.7²)/n}^½], which equals 2.334 when n = 86. Thus a sample size of n = 86 patients per group, or 172 in total, would be needed to achieve 85% power, about triple the sample size of the single-arm design and a saving of 58 patients over the two-arm additive formulation. The saving is due to the reduced variability of ȲE − π0ȲP compared with ȲE − ȲP, by the factor (1 + π0²)/2.

It should be noted that the power of the multiplicative futility test depends on the true value of the placebo parameter θP. For given sample sizes, if θP is at least as large as assumed in the design, the power will be at least as large as planned at the design alternative θE = θP, but if the true placebo decline is smaller than assumed, power will decrease because the non-centrality parameter at the design alternative decreases as its numerator θE – π0θP = (1 – π0)θP decreases. This phenomenon does not occur with the additive formulation if Δ0 is chosen independently of the assumed placebo decline, although if Δ0 is expressed as a fraction of the assumed placebo decline, the power will again depend on it. This phenomenon is also analogous to the effect of overestimating θP in the single-arm design. There, if the historical control value of θP is greater than the true concurrent placebo value and θ0 is set at π0 times the historical control value, an experimental treatment with only the true concurrent placebo efficacy may fail to be declared futile with high probability. This is why it is important to have consensus that a therapy with disease progression no worse than θ0 would indeed be a superior treatment. We discuss this point further below.
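The sample sizes quoted for the three futility designs can be checked roughly in a few lines of Python. The sketch below is only an illustration under stated assumptions: it replaces the non-central t distribution with a normal approximation, power ≈ Φ(|λ| − z_α), so its results will differ slightly from those of exact power software, and the function names are hypothetical rather than taken from any package.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, standing in for the non-central t


def approx_power(noncentrality, alpha):
    """Normal approximation to the one-tailed power: Phi(|lambda| - z_alpha)."""
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(abs(noncentrality) - z_alpha)


def lambda_single_arm(delta0, sigma, n):
    """lambda = Delta0 / (sigma / sqrt(n)) at the design alternative."""
    return delta0 / (sigma / sqrt(n))


def lambda_two_arm_additive(delta0, sigma, n):
    """lambda = -Delta0 / (sigma * sqrt(2 / n)) with n patients per group."""
    return -delta0 / (sigma * sqrt(2.0 / n))


def lambda_two_arm_multiplicative(pi0, theta_p, sigma, n):
    """lambda = (1 - pi0) * theta_P / (sigma * sqrt((1 + pi0^2) / n))."""
    return (1.0 - pi0) * theta_p / (sigma * sqrt((1.0 + pi0 ** 2) / n))


# Design values quoted in the text (DATATOP historical controls).
THETA_P, SIGMA, PI0, ALPHA = 10.65, 10.4, 0.70, 0.10
DELTA0 = THETA_P - PI0 * THETA_P  # 10.65 - 7.455 = 3.195

designs = {
    "single-arm, n = 58": lambda_single_arm(DELTA0, SIGMA, 58),
    "two-arm additive, n = 115 per group": lambda_two_arm_additive(DELTA0, SIGMA, 115),
    "two-arm multiplicative, n = 86 per group": lambda_two_arm_multiplicative(
        PI0, THETA_P, SIGMA, 86
    ),
}

for label, lam in designs.items():
    print(f"{label}: lambda = {lam:.3f}, approx. power = {approx_power(lam, ALPHA):.3f}")
```

All three designs have a non-centrality parameter of about 2.33 in magnitude, which is why they share roughly the same power despite very different total sample sizes.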
⁷ The actual target enrollment was set at n = 65 in order to allow for losses to follow-up.

To summarize, the factors that determine the sample size needed for a futility design are, in roughly decreasing order of importance: (i) the number of arms in the study; (ii) the standard deviation of the primary endpoint, σ; (iii) the non-centrality parameter at the design alternative, which in turn depends on Δ0 in the single arm design and the additive two-arm design, as well as θP in the multiplicative two-arm design; and (iv) the type I error probability, α, and desired power 1 – β.

Potential pitfalls

There are a few pitfalls to be avoided when planning a futility study. The first is that if the sample size is too small, a rather awkward situation can arise. Consider Figure 8.1, which schematically portrays a properly designed two-arm additive futility study with equal sample sizes. The vertical axis portrays the difference in the mean disease progression between the placebo and experimental arms. Positive values towards the top of the diagram indicate better efficacy for the experimental treatment than placebo and negative values towards the bottom indicate worse efficacy for the experimental treatment. On the left side of the diagram, the scale portrays true population parameter values and identifies the regions in the parameter space corresponding to the null hypothesis of superiority and the alternative hypothesis of non-superiority. On the right side of the diagram the scale portrays the sample average values of Ȳ(P) − Ȳ(E) and identifies the critical region $\bar{Y}^{(P)} - \bar{Y}^{(E)} \le \Delta_0 - t_{2n-2;\alpha}\, s_p\sqrt{2/n}$, where the null hypothesis of superiority is rejected, and its complement, where superiority cannot be ruled out at level α. The distance $t_{2n-2;\alpha}\, s_p\sqrt{2/n}$ between Δ0 and the critical region depends on the sample standard deviation sp and the group sample sizes, n. As n increases this distance narrows, implying a greater demand on the experimental treatment to demonstrate promising efficacy in order to avoid a declaration of futility. Conversely, as n decreases, the demand is lessened.

[Figure 8.1. Schematic diagram of a well designed futility study. Vertical axis: difference in endpoint means, from better (top) to worse (bottom). Left scale: truth (θP – θE), with H0 (superiority) above θP – θE = Δ0 and H1 (non-superiority) below it. Right scale: data (Ȳ(P) − Ȳ(E)), with "do not reject H0 (cannot rule out superiority)" above the critical value Δ0 – t2n–2;α sp(2/n)^½ and "reject H0 (declare futility)" below it.]

However, one wants to avoid the awkward situation portrayed in Figure 8.2. Here n is so small that the critical value is actually negative. This means that if the observed average disease progression in the experimental group falls into the circled region, actually looking worse than that in the placebo group, it would nevertheless fail to cause a rejection of superiority. The same result could occur if σ were seriously underestimated, such that the value of $s_p\sqrt{2/n}$ that results is too large. It would be awkward indeed to argue in favor of bringing the experimental treatment forward for phase 3 testing as a promising therapy when it looked worse than placebo in the futility trial. There is nothing logically inconsistent here – the statement that the data are insufficient to rule out superiority at level α is still correct, but the data possess such small evidentiary weight that the statement has little value. This is analogous to the situation with an underpowered phase 3 design. The key is to be sure to have an adequate sample size (to have a high probability of declaring futility when the experimental treatment is truly ineffective) and not to underestimate σ.

[Figure 8.2. Schematic diagram of a poorly designed futility study. The oval indicates the awkward region.]

An interesting case arises that is intermediate between Figures 8.1 and 8.2, portrayed in Figure 8.3. Here the critical value for Ȳ(P) − Ȳ(E) is exactly zero, as would occur if Δ0 happened to equal $t_{2n-2;\alpha}\, s_p\sqrt{2/n}$. In
this case the decision rule is identical to that of a symmetric selection procedure: declare futility if and only if Ȳ(E) ≥ Ȳ(P), i.e., we select the experimental treatment as potentially preferable to the placebo if and only if it does better, no matter by how small an amount. The power of this test is 50% at the design alternative of treatment efficacy equal to that of placebo. Different views are possible here, but some would argue that in this case a one-half chance of proceeding to phase 3 may be reasonable because in such close cases, where the treatment looks better than placebo, a phase 3 trial ought to be done to settle the question. The QALS trial came close to this case [8]. We discuss selection procedures below.

[Figure 8.3. Schematic diagram of a futility design equivalent to a selection design. The layout is that of Figures 8.1 and 8.2, with the "do not reject H0" region labeled "(select E)" and the "reject H0 (declare futility)" region labeled "(select P)".]

We mentioned above that a futility test and a one-sided test of the traditional null hypothesis that θE ≥ θP versus the alternative hypothesis that θE < θP have parallel operating characteristics. That is because, in the additive two-arm design, for example, if the type I error rate is α in each case, the critical values for Ȳ(P) − Ȳ(E) lie the same distance away from the respective null hypothesis values, the distance in each case being $t_{2n-2;\alpha}\, s_p\sqrt{2/n}$. It follows that the non-centrality parameter λ and the power function, P[tν(λ) > tν;α], are identical for the two designs. Thus it is incorrect to view the futility design as inherently more efficient than a traditional design (another pitfall). If futility designs are more efficient than those used for phase 3 trials, it is because futility designs may use one arm rather than two, one-tailed rather than two-tailed testing, and α=0.10 or more rather than α=0.05 or less.

The last pitfall relates to the use of historical control data in the single-arm design. The problems of interpreting studies using historical control data are well-known and need not be repeated here. It will suffice to point out that if θ0 (or π0) represents a superiority criterion based not on an absolute judgment of how well a superior treatment should perform in the current patient population but instead represents a value that would have been superior in the historical patient population, the single-arm futility study may not rule out even a true placebo as futile. This is what occurred in the early NET-PD futility studies, where θ0 was determined based on a 30% improvement in the DATATOP placebo/tocopherol group, observed about 15 years earlier. It turned out that θ0 was too large relative to the current patient population, such that even the placebo-treated patients recruited concurrently in the futility studies as ‘calibration controls’ [13] could not be rejected as futile. This required a series of sensitivity analyses that ran counter to the notion of a pre-specified definition of superiority. For further discussion, see [5–7]. The lesson to be learned is that the additional resources needed for a two-arm study with concurrent controls may well be worth the cost to preserve internal validity. Later NET-PD futility studies have used concurrent controls.

Selection designs

Not every research goal calls for a hypothesis test. There are times when the primary goal is to select a treatment or a dosage of a treatment to bring forward for the next phase of clinical testing or the next study, which need not be phase 3, or to select a subset of candidate treatments from amongst a larger set of competitors. When a choice must be made – because constrained resources do not allow phase 3 testing of all competitors, or, in other circumstances, because an optimal dosage of the experimental drug is unknown – it is natural to use a selection procedure to assist in the decision-making. At such times setting up a null hypothesis and controlling the probability of committing a type I error may be entirely irrelevant. Indeed, if all of the competitors have equal efficacy, we might be completely indifferent as to which treatment we select.⁸ If, however, there is a truly superior treatment among the competitors, we shall want our selection procedure to select that one correctly with high probability. Selection procedures thus offer an attractive approach to the problem of screening potentially good treatments.

⁸ Other things like cost and side effects being equal. We shall assume ‘other things equal’ here in order to focus on basic principles. In practice, if there is only weak evidence supporting a selected treatment against a competitor, other factors will of course play a role in the final decision.

Selection procedures have been in the statistical literature for more than a half century [15–19]. They first appeared in the clinical trials literature in the 1980s [20–22] and are enjoying a resurgence due to current interest in adaptive clinical trial designs [23–24]. When an optimal dosage of a drug is unknown, for example, it is very appealing on grounds of trial efficiency to consider selecting a good dose as part of the same experiment that will evaluate the drug’s promise (in the context of an adaptive phase 2 trial) or its actual efficacy (in the context of an adaptive phase 3 trial). The QALS trial [8, 9] was a two-stage adaptive phase 2 trial that used a selection procedure in its first stage to choose which of two high doses of coenzyme Q10 (1800 mg/day versus 2700 mg/day) to bring forward for a futility test in the second stage. It was adaptive in the sense that the same data used for the selection decision were used again in the futility test to compare the selected dose with the concurrent placebo control data. As another example, the Combination Drug Selection Trial had as its primary goal the selection between two combination therapies (celecoxib and creatine versus minocycline and creatine) for further study in ALS [25].

When a relatively rapid endpoint is available, sequential selection procedures are especially useful [18, 26–33]. The TNK-S2B phase 2B/3 trial of tenecteplase vs. alteplase in acute stroke used a sequential selection procedure with a rapid endpoint to choose between three doses of the experimental drug tenecteplase (0.1, 0.25, or 0.4 mg/kg). The rapid endpoint was a three-category variable for outcomes of major neurological improvement (defined as at least an eight-point improvement at 24 hours on the NIH Stroke Scale or a score of zero), symptomatic intracerebral hemorrhage on CT scan at 24 hours, or neither. This trial was also designed adaptively. See [34] for details of this trial and Chapter 9 for further discussion of adaptive clinical trial designs.

The indifference zone approach and simple selection with fixed sample sizes

Suppose we have two active treatments labeled 1 and 2 and our goal is simply to select the better treatment to bring forward for further testing. In the so-called indifference zone approach, which we follow here, one pre-specifies a minimally worthwhile difference in efficacy, denoted by Δ0. As in our discussion of the futility design, and with the same notation, we assume a normally distributed measure of disease progression Y with mean θ1 or θ2 and common standard deviation σ. If the true difference between θ1 and θ2 is less than Δ0 in magnitude, then one should be indifferent as to which treatment is selected, precisely because the difference is not worthwhile. If, however, the true difference between θ1 and θ2 is at least Δ0 in magnitude (falling into the ‘preference zone’), then the selection procedure should provide a correct selection with probability no smaller than some pre-specified value P* such as 0.80. Thus, if θ1 = θ2, we are completely indifferent (in terms of efficacy) as to which treatment is selected, and a one-half chance of selecting either is perfectly acceptable. As θ1 and θ2 diverge, we want the probability of correct selection (which we abbreviate PCS) to grow, approaching P* as |θ1 – θ2| approaches Δ0. For even larger differences, the PCS should surpass P* and approach certainty for large |θ1 – θ2|.

To achieve these goals with fixed sample sizes we may randomize n patients on each treatment and select the treatment with the smallest observed average disease progression. The probability of a correct selection is then given by P[Ȳ1 < Ȳ2] if θ1 < θ2 or P[Ȳ2 < Ȳ1] if θ2 < θ1.⁹ In either case, the PCS equals the probability to the left of $\sqrt{n}\,|\theta_1 - \theta_2|/(\sigma\sqrt{2})$ in the standard normal distribution. For example, in the QALS trial a sample of size n = 35 patients in each of the two high-dosage coenzyme Q10 groups was sufficient to guarantee PCS ≥ 0.80 if there were a difference of 1.7 points between the true average declines in the nine-month ALSFRS-R, assuming a common standard deviation of σ = 8.4 for the individual declines.¹⁰ This is because √35·1.7/(8.4·√2) = 0.847, which has probability 0.80 to its left in the standard normal distribution.¹¹

⁹ Tied averages do not occur for normally distributed random variables. In practice, disease progression measures with finitely many possible values could result in tied averages with very small probability, in which case a tie-breaking device is used to choose between treatments.

¹⁰ The difference of 1.7 represents a 20% improvement in the assumed average placebo group decline of 8.5 units, which was used for planning purposes.

¹¹ When there are more than two groups, tables or special software are required to derive the PCS. See, e.g., the tables in [19].
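The two-treatment PCS formula is simple enough to evaluate directly. The following sketch uses only the Python standard library to reproduce the QALS figures quoted above; the function names are hypothetical, and the sample-size helper simply inverts the PCS expression, n ≥ 2(z σ/|θ1 – θ2|)² with z the P* quantile of the standard normal distribution, which is an algebraic rearrangement rather than a formula stated in the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def pcs_two_arms(n, difference, sigma):
    """PCS = Phi( sqrt(n) * |theta1 - theta2| / (sigma * sqrt(2)) )."""
    return STD_NORMAL.cdf(sqrt(n) * abs(difference) / (sigma * sqrt(2.0)))


def smallest_n_for_pcs(p_star, difference, sigma):
    """Smallest n per group giving PCS >= p_star at the stated true difference."""
    z = STD_NORMAL.inv_cdf(p_star)
    return ceil(2.0 * (z * sigma / abs(difference)) ** 2)


# QALS planning values from the text: difference 1.7, sigma 8.4, P* = 0.80.
print(f"PCS with n = 35 per group: {pcs_two_arms(35, 1.7, 8.4):.3f}")
print(f"smallest n per group for PCS >= 0.80: {smallest_n_for_pcs(0.80, 1.7, 8.4)}")
print(f"PCS when the treatments are equal: {pcs_two_arms(35, 0.0, 8.4):.2f}")
```

With no true difference the PCS is one-half, as the indifference zone argument requires, and it rises towards 1 as the difference grows.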
By comparison, a traditional test of the null hypothesis H0: θ1 = θ2 versus the alternative hypothesis H1: θ1 ≠ θ2 with α=0.05 (two-tailed) and power of 80% at the design alternative |θ2 – θ1| = 1.7, assuming σ = 8.4, would require samples of size n = 384 per group! The selection procedure requires so many fewer patients because there is no need to control for type I errors to make a good selection. One way to see this is to view the selection design as a hypothesis test that rejects the null hypothesis of equal efficacy in favor of θ1 < θ2 if Ȳ1 < Ȳ2 or rejects H0 in favor of θ2 < θ1 if Ȳ2 < Ȳ1. Under H0 then, by symmetry, the probability of a type I error is controlled only at 0.50 (not 0.05). However, no type I errors will be made at all if we do not attempt any declarations of statistical significance upon making the selection. This is the fundamental difference between hypothesis testing and selection; in a simple selection design, the primary task at hand is to choose one treatment or the other, not to make any formal declaration of statistical significance. Note that post hoc statements of statistical significance at α = 0.05 would be seriously underpowered, so failure to achieve traditional levels of significance would not be considered meaningful.

Sequential selection procedures

There are many different procedures for more general ranking and selection goals such as selection from among more than two treatments, selection of best subsets of treatments (e.g., the best two treatments), ranking treatments in order of efficacy, etc. For brevity we discuss just one, the Levin-Robbins-Leu (LRL) family of sequential selection procedures [28–33]. These procedures are convenient to implement, provide blocking for control of differences by site or prognostic covariates, and allow sequential elimination of inferior treatments as the trial progresses, sequential recruitment of superior treatments, or both. For variety, we now assume a rapid binary endpoint, such as major neurological improvement (MNI), yes/no. We want to choose the best from among c ≥ 2 treatments, where ‘best’ means the one with highest success probability pi (i = 1,…,c) for MNI. The data will now consist of c binomially distributed success tallies after n patient outcomes per group are observed. In the response-adaptive LRL elimination procedure, one pre-specifies a reference integer r ≥ 1 and sequentially observes single binary (yes/no) outcomes on each of the c treatments, one vector of c binary outcomes at a time.¹² After any number of rounds n, if the running success tally of one or more treatments falls r successes behind the tally or tallies currently in the lead, the trailing treatments are eliminated from further consideration, and no further patients are randomized to them. The procedure continues randomizing patients in blocks to the remaining treatments, resuming the success tallies at their current values. The entire process iterates until finally only a single treatment is left, which is then selected as best.¹³

¹² This is called ‘vector-at-a-time’ sampling.

¹³ This is an “open” sequential procedure, meaning there is no pre-specified upper limit to the number of patients randomized. The procedure will terminate using a finite number of patients with probability one, but in practice one imposes an upper limit to the number of patients enrolled, such that if the criteria for selection are not yet met, the trial will be truncated and a special terminal decision made for the selection.

This selection procedure has the following important property. Let wi = pi / (1 – pi) denote the unknown odds on success for the binary outcome on treatment i. Then for any odds w1,…,wc the PCS is bounded from below by a simple formula:

\[ \mathrm{PCS} \ge \frac{w_{[1]}^{r}}{\sum_{i=1}^{c} w_i^{r}}, \]

where w[1] denotes the largest odds corresponding to the best treatment. This result can be used to choose the integer r, as follows. Suppose we want the probability of correct selection to be at least P* whenever the odds ratio w[1] / w[2] between the best and second best treatment success probabilities is at least some pre-specified value Δ. In that case:

\[ \frac{w_{[1]}^{r}}{\sum_{i=1}^{c} w_i^{r}} \ge \frac{\Delta^{r}}{\Delta^{r} + c - 1}. \]

So if we choose r to be the smallest integer greater than or equal to:

\[ \frac{\ln\left\{ \dfrac{(c - 1)\,P^{*}}{1 - P^{*}} \right\}}{\ln \Delta}, \]

then for any set of success probabilities with w[1] / w[2] ≥ Δ:
\[ \mathrm{PCS} \ge \frac{w_{[1]}^{r}}{\sum_{i=1}^{c} w_i^{r}} \ge \frac{\Delta^{r}}{\Delta^{r} + c - 1} \ge P^{*}. \]

For example, to select the best treatment from among c = 3 competitors with an odds ratio of Δ = 2 defining the boundary between the indifference and preference zones, the criterion value r = 3 suffices to guarantee a PCS of at least P* = 0.80 for any p1,…,pc in the preference zone.

The number of patients randomized in a sequential design is a random variable. The expected number of patients depends on r and the specific values of the success probabilities. To illustrate the above example, if p1 = ½ while p2 = p3 = 1/3, so that w[1]/w[2] = 2, the expected number of rounds is 17.4, the expected total number of patients randomized is 43.6, and the expected total number of failures (non-MNIs) is 26.3. By comparison, a fixed sample size binomial procedure would require 24 patients per arm or 72 patients in total, and the expected number of failures would be 24 · (1/2 + 2/3 + 2/3) = 44, illustrating the expected efficiency gain of the sequential design. Note also that the expected total number of patients with the LRL procedure in this example, 43.6, is less than three times the expected number of rounds (52.2 = 3 × 17.4) and the expected number of failures, 26.3, is less than the expected number of rounds times the total failure probability, or 17.4 × (1/2 + 2/3 + 2/3) = 31.9, thanks to the sequential elimination of inferior treatments. This feature strongly appeals on ethical grounds.¹⁴

¹⁴ The exact PCS in this example is 0.814. If the open procedure is truncated after n = 35 rounds, the PCS is 0.80, still large, while the expected number of rounds, patients, and failures decrease slightly to 16.5, 41.8, and 25.2, respectively.

The LRL family of procedures can also be used to select best subsets of pre-specified size b (1 ≤ b < c). To select the best b treatments from c competitors with sequential elimination of inferior treatments and sequential recruitment of superior treatments, one proceeds as follows. Randomize patients a vector-at-a-time, and pause the first time that either of the following events occurs: (i) S[b](n) − S[c](n) = r, where S[b](n) denotes the bth largest success tally after n patient observations per treatment and S[c](n) denotes the cth largest, i.e., worst success tally. This is an elimination event, and any treatment with the worst tally is eliminated. If several treatments are tied with the worst tally, eliminate them all. (ii) S[1](n) − S[b+1](n) = r, where S[b+1](n) denotes the (b+1)st largest tally. This is a recruitment event, and any treatment with a leading tally is recruited, meaning we select it immediately for further development. If several treatments are tied with the best tally, recruit them all. No further patients are randomized to recruited treatments.¹⁵ After an elimination or recruitment or both events occur, the procedure continues with the remaining treatments at their current tallies, and the entire process iterates with the reduced number c′ of remaining treatments and a possibly reduced number b′ of treatments yet to be recruited. The procedure stops when exactly b treatments have been recruited and c – b treatments have been eliminated.

¹⁵ It may seem odd to remove the leading treatment from competition in the case b > 1. It should be noted, however, that there is no claim that the first treatment to be eliminated is the truly worst treatment, only that its record is sufficiently poor that it should not be selected. Similarly, there is no claim that the first treatment to be recruited is truly the best treatment, only that its record is good enough to be among the best b treatments to be selected. Since it is not known that the best treatment has been removed, it is ethical to continue randomizing patients to the other treatments, assuming at the outset there are good reasons to select more than one treatment.

The probability of correctly selecting the best b-tuple of treatments with highest success probabilities is bounded from below by:

\[ \mathrm{PCS} \ge \frac{w_{[1]}^{r} \cdots w_{[b]}^{r}}{\sum_{(b)} w_{(b)}^{r}}, \]

where the summation is over all b-tuples of the form (b) = (i1,…,ib) with 1 ≤ i1 <…< ib ≤ c and where $w_{(b)}^{r} = w_{i_1}^{r} \cdots w_{i_b}^{r}$. For example, if there are c = 4 treatments and it is required to select the best b = 2 treatments, the PCS is bounded from below by $w_{[1]}^{r} w_{[2]}^{r} / (w_1^{r}w_2^{r} + w_1^{r}w_3^{r} + w_1^{r}w_4^{r} + w_2^{r}w_3^{r} + w_2^{r}w_4^{r} + w_3^{r}w_4^{r})$. The preference zone now contains all sets of success probabilities for which w[2]/w[3] ≥ Δ. The value of r is chosen large enough so that:

\[ \mathrm{PCS} \ge \frac{w_{[1]}^{r} w_{[2]}^{r}}{w_1^{r}w_2^{r} + w_1^{r}w_3^{r} + w_1^{r}w_4^{r} + w_2^{r}w_3^{r} + w_2^{r}w_4^{r} + w_3^{r}w_4^{r}} \ge \frac{\Delta^{2r}}{\Delta^{2r} + 4\Delta^{r} + 1} \ge P^{*}. \]
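The LRL lower bounds lend themselves to a quick numerical check. The sketch below is a hedged illustration, not code from the LRL papers: it evaluates the b-tuple bound by brute-force enumeration and searches for the smallest r that meets a target P*. The function names are hypothetical, and the boundary configuration used in `smallest_r` (b odds equal to Δ and the remaining c – b odds equal to 1) is an assumption chosen because it reproduces the two closed forms given in the text, for b = 1 and for c = 4 with b = 2.

```python
from itertools import combinations
from math import prod


def lrl_pcs_lower_bound(odds, r, b=1):
    """Product of the b largest w_i^r divided by the sum, over all b-tuples,
    of the products of the corresponding w_i^r."""
    powered = sorted((w ** r for w in odds), reverse=True)
    numerator = prod(powered[:b])
    denominator = sum(prod(combo) for combo in combinations(powered, b))
    return numerator / denominator


def smallest_r(c, delta, p_star, b=1, r_max=1000):
    """Smallest r whose bound reaches p_star at the assumed boundary
    configuration: b odds equal to delta, the other c - b odds equal to 1."""
    boundary_odds = [delta] * b + [1.0] * (c - b)
    for r in range(1, r_max + 1):
        # Small tolerance guards against floating-point rounding at equality.
        if lrl_pcs_lower_bound(boundary_odds, r, b) >= p_star - 1e-12:
            return r
    raise ValueError("no r <= r_max reaches the requested bound")


# Example from the text: c = 3 treatments, Delta = 2, P* = 0.80.
print("r for c = 3, Delta = 2, P* = 0.80:", smallest_r(3, 2.0, 0.80))

# Bound for success probabilities 1/2, 1/3, 1/3, i.e. odds 1, 0.5, 0.5, with r = 3.
print("lower bound on PCS:", lrl_pcs_lower_bound([1.0, 0.5, 0.5], r=3))
```

The bound is conservative: at p1 = ½ and p2 = p3 = 1/3 it evaluates to 0.80, whereas the exact PCS reported in the footnote for that configuration is 0.814.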
Additional properties of these procedures are discussed in [31–32].

Selection bias

If a selection procedure is used as the first stage of an adaptive trial where the selection data will be used in the final evaluation of the whole trial, an adjustment for selection bias is required due to the potential for capitalizing on chance (see Chapter 5 for more on bias). Suppose, for example, that we will select one of two active treatments in a first selection stage and then use the selected treatment’s data to compare with a concurrent placebo. Suppose further that all three treatments have the same true efficacy. In replications of the experiment, whichever treatment is selected will have a systematic advantage over the placebo because its very selection requires it to look better than its competitor. If patients are not too scarce, a simple method to avoid selection bias is to conduct the selection as a separate experiment from the subsequent evaluation. An adjustment for selection bias was used in the QALS trial because the investigators considered ALS patients relatively rare and wanted to use their selection data in the second-stage futility test. Formulas for correcting the selection bias are given in [8].

Comparative selection trials

Although selection procedures efficiently achieve their goal of selecting best treatments, the desire to ‘test something’ with an accompanying statement of statistical significance seems irresistible. There is the following issue to consider too: if a selection trial is conducted with only active treatments, i.e., without including a placebo as eligible for selection in the contest, then it is possible that all of the active treatments under consideration may be worse than placebo, so that none ‘should’ be selected. Of course, when a placebo is excluded from consideration in a selection trial, consideration of whether or not the selected treatment is better than a placebo is simply outside the goal of the selection trial and additional testing must address that comparative question. Therefore it should be emphasized that just because an active treatment has been selected in a head-to-head comparison with other active treatments, there is no direct evidence that the one selected need be efficacious (compared to placebo). These considerations suggest that a selection design with a concomitant hypothesis test would be of great practical interest.

A new design called the ‘comparative selection trial’ combines selection and hypothesis testing with no need for selection bias adjustments [33]. Briefly, the trial compares c0 placebo arms to c1 active treatment arms, for a total of c0 + c1 = c arms. The goal is to select a subset of pre-specified size b (1 ≤ b ≤ c1) of all ‘better-than-placebo’ (BTP) treatments, assuming one or more exists, or if not, to declare that no such subset exists. The null hypothesis H0 is that there exists no BTP b-tuple of treatments (because at least one placebo arm is better than the bth best active arm in terms of efficacy). The alternative hypothesis H1 is that a BTP b-tuple exists (wherein all b active arms are better than the best placebo arm in terms of efficacy). We wish to test H0 controlling the type I error rate at level α and in so doing, we will control the probability that we will make a false declaration that a BTP b-tuple exists when H0 is true. If H1 is true, we want to have a high probability P* of correctly declaring that a BTP b-tuple exists and correctly selecting one. For example, if there are c1 = 2 active treatments and c0 = 1 placebo treatment and we want to select the best b = 1 treatment, we will test the null hypothesis that there is no better-than-placebo active treatment. If true, we will want to declare this to be the case with probability at least 1 – α. If either or both of the active treatments are better than the placebo, we will want to declare the existence of a BTP treatment and select the best one with probability at least P*.

The LRL family of selection procedures can be used for this problem. The idea is to use data augmentation to ‘handicap’ the placebo treatments’ outcome tallies while selecting a best b-tuple. If the selected b-tuple contains a placebo treatment, we do not reject H0 and we declare that there is no BTP b-tuple of treatments. If the selected b-tuple contains only active treatments, then we reject H0 and declare that the selected b-tuple is a BTP b-tuple. The LRL lower bound formula for the probability of correct selection is used both to select r and to determine how to augment the placebo data in order to control both the type I error rate and the probability of making a false declaration. The type I error rate can be controlled because the data augmentation adds successes to the placebo arms in a carefully specified manner so as to make the placebo arms look better than the active treatments, thereby yielding a high probability of selecting at least one placebo arm under the null hypothesis, avoiding a type I error. The choice of r then guarantees a high probability of correctly rejecting the null hypothesis when it is false and simultaneously selecting a BTP b-tuple when there is a
suiciently large separation between the success prob- 16. Gupta SS. On a decision rule for a problem in ranking
abilities of the best b active and placebo treatments. See means. Mimeograph Series 150, Institute of Statistics.
[33] for details on how to do this. Chapel Hill, University of North Carolina, 1956.
17. Gupta SS. On some multiple decision (selection and
ranking) rules. Technometrics 1965; 7: 225–45.
References
18. Bechhofer RE, Kiefer J and Sobel M. Sequential
1. Palesch Y, Tilley BC, Sackett DL, et al. Applying a phase Identication and Ranking Procedures. Chicago,
II futility study design to therapeutic stroke trials. University of Chicago Press, 1968.
Stroke 2005; 36: 2410–4.
19. Gibbons JD, Olkin I and Sobel M. Selecting and
2. Levin B. he utility of futility (editorial). Stroke 2005; Ordering Populations: A New Statistical Methodology.
36: 2331–2. Wiley, Hoboken, 1977; corrected, unabridged version
3. Elm JJ, Goetz CG, Ravina B, et al. A responsive outcome Society for Industrial & Applied Mathematics,
for Parkinson’s disease neuroprotection futility studies. Philadelphia, 1999.
Ann Neurol 2005; 57: 197–203. 20. Simon R, Wittes RE and Ellenberg SS. Randomized
4. Tilley BC, Palesch YY, Kieburtz K, et al., on behalf of the phase II clinical trials. Cancer Treat Rep 1985; 69:
NET-PD Investigators. Optimizing the ongoing search 1375–81.
for new treatments for Parkinson’s disease: using futility 21. hall PF, Simon R and Ellenberg SS. Two-stage selection
designs. Neurology 2006; 66: 628–33. and testing designs for comparative clinical trials.
5. he NINDS NET-PD Investigators. A randomized, Biometrika 1988; 75: 303–10.
double blinded, futility clinical trial of creatine and 22. Schaid DJ, Wieand S and herneau TM. Optimal
minocycline in early Parkinson’s disease. Neurology two-stage screening designs for survival comparisons.
2006; 66: 664–71. Biometrika 1990; 77: 507–13.
6. he NINDS NET-PD Investigators. A randomized 23. Stallard N and Todd S. Sequential designs for phase III
clinical trial of coenzyme Q10 and GPI-1485 in early clinical trials incorporating treatment selection. Stat
Parkinson disease. Neurology 2007; 68: 20–8. Med 2003; 22: 689–703.
7. Tilley BC and Galpern WR. Screening potential 24. Bischof W and Miller F. Adaptive two-stage test
therapies: Lessons learned from new paradigms used in procedures to ind the best treatment in clinical trials.
Parkinson disease. Stroke 2007; 38: 800–3. Biometrika 2005; 92: 197–212.
8. Levy G, Kaufmann P, Buchsbaum R, et al. A two-stage 25. Gordon PH, Cheung Y-K, Levin B, et al., for the
design for a phase II clinical trial of coenzyme Q10 in Combination Drug Selection Trial Study Group. A
ALS. Neurology 2006; 66: 660–3. novel, eicient, randomized selection trial comparing
9. Kaufmann, P, hompson, JLP, Levy, G, et al., for the combinations of drug therapy for ALS. Amyotrophic
QALS Study Group. Phase II trial of CoQ10 for ALS Lateral Scler 2008; 9: 212–22.
inds insuicient evidence to justify phase III. Ann 26. Buringer H, Martin H and Schriever, KH.
Neurol 2009; 66: 235–44. Nonparametric Sequential Selection Procedures.
10. Czaplinski A, Haverkamp LJ, Yen AA, et al. he value Birkhauser, Boston: 1980.
of database controls in pilot or futility studies in ALS. 27. Bechhofer RE, Santner TJ and Goldsman DM. Design
Neurology 2006; 67: 1827–32. and Analysis of Experiments for Statistical Selection,
11. Cudkowicz M, Katz J, Moore DH, et al. Toward more Screening, and Multiple Comparisons. Wiley, New York:
eicient clinical trials for amyotrophic lateral sclerosis. 1995.
Amyotrophic Lateral Scler 2010; 11: 259–65. 28. Levin B and Robbins H. Selecting the highest
12. he Huntington Study Group DOMINO Investigators. probability in binomial or multinomial trials. Proc Natl
A futility study of minocycline in Huntington’s disease. Acad Sci USA 1981; 78: 4663–6.
Mov Disord 2010; 25: 2219–24. 29. Leu CS, Levin B. On the probability of correct selection
13. Herson J and Carter SK. Calibrated phase II clinical in the Levin-Robbins sequential elimination procedure.
trials in oncology. Stat Med 1986; 5: 441–7. Stat Sinica 1999; 9: 879–91.
14. he Parkinson Study Group. Efect of deprenyl on the 30. Leu CS and Levin B. Proof of a lower bound formula for
progression of disability in early Parkinson’s disease. the expected reward in the Levin-Robbins sequential
New Engl J Med 1989; 321: 1364–71. elimination procedure. Sequent Anal 1999; 18: 81–105.
15. Bechhofer RE. A single-sample multiple decision 31. Leu CS and Levin B. A generalization of the Levin-
procedure for ranking means of normal populations Robbins procedure for binomial subset selection and
with known variances. Ann Math Stat 1954; 25: 16–39. recruitment problems. Stat Sinica 2008; 18: 203–18.
89
32. Leu CS and Levin B. On a conjecture of Bechhofer, False Discovery, Survival Analysis and Other Topics.
Kiefer, and Sobel for the Levin-Robbins-Leu binomial Series in Biostatistics, Volume 4. World Scientiic, 2011.
subset selection procedures. Sequent Anal 2008; 27: 34. Haley EC, hompson JLP, Grotta, JC, et al., for the
106–25. Tenecteplase in Stroke Investigators. Phase IIB/III
33. Leu CS, Cheung YK and Levin B. Subset selection in trial of tenecteplase in acute ischemic stroke: Results
comparative selection trials. In: Bhattacharjee M, Dhar of a prematurely terminated randomized clinical trial.
SK, Subramanian S, eds. Recent Advances in Biostatistics, Stroke 2010; 41: 707–11.
90
Section 3: Special study designs and methods for data monitoring

Chapter 9
Adaptive design across stages of therapeutic development

Christopher S. Coffey

Introduction to adaptive designs

During the planning phase, an investigator must make important decisions that affect the design of a clinical trial (e.g., patient population, primary outcome, and primary hypothesis). Unfortunately, there may be limited information to guide these initial choices. Since more knowledge will accrue as the study progresses, one attractive suggestion is to incorporate an adaptive design that modifies one or more characteristics of the trial based on interim information. This greater flexibility has the potential to require the use of fewer patients within trials, allow a more efficient use of resources, and provide the ability to make effective treatments available to patients more quickly or stop ineffective treatments earlier. Accordingly, there has been substantial recent interest, and a number of concerns, associated with the use of adaptive designs. This chapter will attempt to clarify the definition of an adaptive design, summarize some of the commonly proposed types of adaptive designs, summarize the use of adaptive designs in published neurological trials, and describe some logistical barriers that will need to be addressed in order to more fully achieve the benefits of promising adaptive designs in the future. The reader interested in more details regarding the subject should consult one of a number of excellent review articles [1–6] or recent guidance publications by regulatory agencies [7–8].

Definition of an adaptive design

The rapid proliferation of interest in adaptive designs, and inconsistent use of terminology, has created confusion about similarities and differences among the various techniques. For example, the definition of an ‘adaptive design’ itself is a common source of confusion. The term has been used rather ambiguously in the literature and there are a large number of potential study adaptations. There is clearly a need for a standardized definition of an adaptive design.

The Pharmaceutical Research and Manufacturers of America (PhRMA) Adaptive Designs Working Group (ADWG) was formed in 2006¹. One of the earliest contributions of the working group was the publication of a white paper that provided one of the first formal definitions of an adaptive design: ‘By adaptive design we refer to a clinical study design that uses accumulating data to modify aspects of the study as it continues, without undermining the validity and integrity of the trial’ [1]. The white paper went on to stress that changes should be made ‘…by design, and not on an ad hoc basis’ and that adaptive designs are ‘…not a remedy for inadequate planning’. A similar definition appeared in the recent FDA draft guidance document on adaptive designs: ‘…a study that includes a prospectively planned opportunity for modification of one or more specified aspects of the study design and hypotheses based on analysis of data (usually interim data) from subjects in the study’ [8]. However, the FDA draft guidance document takes a more relaxed view of what is meant by prospectively planned: ‘This can include plans that are introduced or made final after the study has started if the blinded state of the personnel involved is unequivocally maintained when the modification plan is proposed.’

¹ The AD working group has established an external webpage: http://biopharmnet.com/doc/doc12004.html. This webpage provides a central location for publications, training courses, and other documents created by the working group to facilitate the sharing of knowledge.

Much of the research on adaptive designs has been driven by drug development within the pharmaceutical industry. Although many basic principles remain
the same regardless of the venue or funding source, some of the specific advantages and disadvantages of adaptive designs differ when considering the use of such designs in trials funded by the NIH, foundations, or non-profit organizations. To address this issue, a 2009 workshop was held on ‘Scientific Advances in Adaptive Clinical Trial Design.’ The workshop definition of an adaptive design was very similar to that of the ADWG: ‘A protocol that allows certain design features to change from an initial specification based on evolving trial information while maintaining statistical, scientific, and ethical integrity.’

Hence, all three definitions clearly state that only studies with pre-planned adaptations would be considered adaptive designs. For the purposes of this chapter, we take the same approach and consider valid adaptive designs to be only those that consider pre-planned changes.

Types of adaptive designs

Based on the above definitions, it is clear that there are an infinite number of adaptive design possibilities and any number of aspects of the study can be changed. Design features that can change include, but are not limited to, the maximum sample size, the stopping time, the allocation of patients, dosing, the number of treatment arms, the endpoints, or the hypotheses. Clearly, changes to some of these elements are more controversial than others.

In all instances, the objectives of the adaptations should be clearly defined and the operating characteristics should be well understood. For example, before utilizing any adaptive design that involves hypothesis testing, researchers should assess the impact of the increased power on the overall type I error rate and take steps to adjust for any inflation that might be introduced. Such assessments are crucial because adaptive designs are not always better than standard fixed designs. One important assessment when considering an adaptive design is to compare its properties with those obtained from a standard fixed design. The need for such evaluations underscores the need for adaptations to be planned in advance. In order to enable a full simulation of any proposed adaptive design, the extent to which adaptation is planned should be described a priori in detail. As stated by Hung et al.: ‘At the very least, the regulatory agencies need to know every detail of how the trial proceeded during the conduct and adaptations’ [9].

In this chapter, we focus on some specific adaptive designs that have received the most attention to date. Although many adaptive designs employ the use of Bayesian statistical techniques, it is important to consider both Bayesian and frequentist approaches to adaptive designs.

Adaptive designs for early stage exploratory development

Early exploratory (phase 1) trials generally represent the initial introduction of an investigational new drug into humans. These studies are generally small (15–30 subjects) with an objective of determining the maximum tolerated dose (MTD), the largest dose of the drug that can be given before patients start to experience a dose limiting toxicity (DLT) at an unacceptably high rate. These trials help to guide the decision whether to continue a drug development program and, if so, which dose(s) to select for further development. If additional development is planned, an accurate determination of the MTD is very important to the planning and conduct of trials in later phases. Selecting too low a dose may not allow future studies to show efficacy of a potentially useful drug. Similarly, selecting too high a dose may put patients in future trials at unnecessary risk. Traditional approaches for designing phase 1 clinical trials include up-and-down designs or model-based designs where the MTD is treated as a quantile that can be estimated [10]. The most common approach is the ‘3+3 design’, which treats three subjects at each dose level of interest. If no subjects experience a DLT, the dose is increased to the next level. If two or more subjects experience a DLT, the process stops and selects the lower dose as the MTD. If one subject experiences a DLT, then three additional subjects are treated at the given dose. If none of the three additional subjects experiences a DLT, the dose is increased to the next level. Otherwise, the process stops and the dose below is selected as the MTD. This approach is easily understood by clinicians and requires no complex computer program to implement, but tends to treat many subjects at low, ineffective doses and may provide poor estimates of the MTD in neurological settings where DLTs of interest occur less frequently than in the oncology settings where this design originated. Recently, more sophisticated approaches for adaptive dose ranging have been proposed. The most common of these approaches, the continual reassessment method (CRM), is discussed below [11].
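
The 3+3 escalation rules described above are simple enough to simulate directly. The following is a minimal Python sketch rather than a validated trial tool. The function names, the example DLT probabilities, and the convention of declaring the highest dose to be the MTD when escalation passes the top of the dose ladder are illustrative assumptions that do not come from the text.

```python
import random


def three_plus_three(true_dlt_probs, rng):
    """Simulate one 3+3 escalation over an ordered list of dose levels.

    true_dlt_probs[i] is the (hypothetical) true DLT probability at dose i.
    Returns (mtd_index, n_treated), where mtd_index is -1 if even the
    lowest dose is judged too toxic.
    """
    n_treated = [0] * len(true_dlt_probs)
    level = 0
    while level < len(true_dlt_probs):
        p = true_dlt_probs[level]
        # First cohort of three subjects at this dose.
        dlts = sum(rng.random() < p for _ in range(3))
        n_treated[level] += 3
        if dlts == 1:
            # Exactly one DLT: treat three additional subjects at this dose.
            dlts += sum(rng.random() < p for _ in range(3))
            n_treated[level] += 3
        if dlts >= 2:
            # Two or more DLTs overall: stop and select the dose below.
            return level - 1, n_treated
        level += 1
    # Assumption: if every dose is cleared, report the highest dose as MTD.
    return len(true_dlt_probs) - 1, n_treated


def simulate_mtd_selection(true_dlt_probs, n_trials, seed):
    """Count how often each dose index (or -1) is selected as the MTD."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trials):
        mtd, _ = three_plus_three(true_dlt_probs, rng)
        counts[mtd] = counts.get(mtd, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical ladder in which DLTs are rare at the lower doses.
    print(simulate_mtd_selection([0.02, 0.05, 0.10, 0.25], 2000, seed=1))
```

Varying the hypothetical DLT probabilities in `simulate_mtd_selection` gives a rough sense of how the selected dose is distributed across the ladder, which is one way to examine the operating characteristics of the design before comparing it with a model-based alternative.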
Continual reassessment method

The CRM assumes that the probabilities of both efficacy and toxicity increase with dose and that toxicity can be defined as a binary outcome. The ‘acceptable’ level of toxicity must be explicitly defined by the investigators. The MTD is then defined as the highest dose with a toxicity level at or below the specified acceptable level of toxicity. In its original formulation, the method begins with an assumed a priori dose-toxicity curve and a chosen target toxicity level. The first enrolled subject is assigned the dose most likely to be associated with the target toxicity level, based on the initial curve. After the outcome for this patient has been observed, the estimated dose-toxicity curve is refitted (i.e., the posterior distribution of the model is shifted slightly up or down depending on whether the patient experienced a DLT). The next subject is assigned the dose closest to the MTD based on the updated dose-toxicity curve. This process continues until some pre-defined stopping criteria are met.

There are two general strategies for defining the stopping rules: 1) Continue until a specified number of patients are treated at the same dose and the next patient would also be treated at that dose; or 2) Continue until the dose-toxicity curve changes by less than some pre-specified threshold. Regardless of the stopping rule chosen, once the stopping criteria are achieved, the final dose is selected as the MTD. To address some of the concerns raised with the initial CRM proposal, several modified CRM approaches have been developed and implemented. These modifications include always starting at the lowest dose level under consideration, enrolling 2–3 patients in each cohort, proceeding as a standard 3+3 dose escalation design until the first DLT occurs, and specifying that dose escalation cannot increase by more than one level at any time during the study. As compared to the 3+3 design, the CRM typically treats more subjects at the target dose and fewer subjects at ineffective doses. However, the implementation of a CRM requires a substantial collaboration between an investigator and statistician. The method is also rather computationally intensive, although there are documented software packages available for the implementation of the technique. A free package can be downloaded from:
• M.D. Anderson Cancer Center (http://biostatistics.mdanderson.org/SoftwareDownload)

Adaptive designs for late stage exploratory development

Late exploratory (phase 2) trials typically have a number of different goals [12]. These include establishing that the response changes with the dose (proof of concept) and selecting a target dose to take forward into the confirmatory phase. Traditional approaches to such trials involve random allocation to multiple fixed doses with multiple comparison adjustments. A number of adaptive model-based approaches have been proposed, including a D-optimal approach, a normal dynamic linear model (NDLM) [13], and a general adaptive dose allocation. A PhRMA adaptive dose-ranging studies working group was formed in 2006 to address the concern that a poor understanding of dose response is a leading cause of high attrition in late development. One of the initial objectives of this group was to conduct a comprehensive simulation study comparing adaptive model-based approaches to other dose-finding methods [14]. The group concluded that the sample sizes typically used for traditional approaches to dose-finding studies are too small for accurate dose selection and estimation of the dose-response curve. The adaptive model-based methods had increased power to detect dose-response and better precision with respect to selecting a target dose. However, they caution that there is a need to balance gains associated with adaptive dose-ranging designs against the greater methodological and operational complexity currently associated with the use of these designs. In particular, there are very few public software packages available for implementing these methods. As new software is developed, the use of these methods will become much more practical.

Adaptive designs for confirmatory clinical trials

Adaptive designs are generally well accepted and encouraged for early phases of drug development. For a variety of reasons, including the potential for type I error inflation, the use of adaptive designs in confirmatory (phase 3) trials is a bit more controversial. However, it is clear that unplanned adaptations have been utilized for many years in clinical trials. For example, most trials involve changes related to logistical issues, such as recruitment criteria, that do not affect the inferences of interest. Furthermore, in order to determine the required sample size to ensure
a desired level of statistical power, an investigator must specify a clinically meaningful treatment difference and values for any ‘nuisance’ parameters. A nuisance parameter represents any value that must be specified in order to perform a sample size calculation that is not directly related to the effect of the treatment (e.g., the standard deviation of a continuous measure, the overall event rate for a binary outcome, and the accrual rate for a time-to-event outcome). The uncertainty associated with the estimation of most key nuisance parameters at the beginning of a trial, perhaps due to complications from using natural history data to plan a clinical trial, has led to unplanned sample size adjustments in a number of ongoing studies. As an example, the Secondary Prevention of Small Subcortical Strokes (SPS3) study recently increased the overall planned sample size from 2500 to 3000 in order to account for a lower than expected overall event rate.

The biggest change in recent years is that such unplanned design changes are starting to receive greater scrutiny. This is actually a good thing because it forces researchers to give more thought to possible adaptations earlier in the planning process. As a result, investigators are being proactively encouraged to consider adaptation in the original development of a study protocol. However, the use of adaptive designs in the confirmatory setting requires researchers to proactively assess the operating characteristics of any proposed adaptations via simulation. This has the potential to require more resources for study planning, but can lead to great benefits during the conduct of the trial. Below, we briefly summarize the possible adaptations for confirmatory trials that have received the most attention in the literature to date. Although many can also be used in earlier studies, they are most often used for confirmatory trials and that will be our focus.

Group sequential methods

Sequential monitoring of interim data has become integral to modern clinical trials (see Chapter 14). A Data Safety Monitoring Board (DSMB) is usually given the responsibility for monitoring the accumulating data in a clinical trial. In general, DSMBs can be charged with stopping trials for: 1) safety, 2) efficacy or lack of efficacy, or 3) futility (insufficient power). Appropriate statistical methods for interim monitoring exist [15] and are implemented in a number of statistical software packages. Importantly, given the definitions above, group sequential designs are one of the most commonly used adaptive designs in clinical trials.

Adaptive randomization

An adaptive randomization design allows the randomization schedule to be modified during the course of an ongoing trial. There are a number of different types of adaptive randomization procedures. With response adaptive randomization, the allocation probability for assigning patients to treatment groups is determined by the responses observed in previous patients. Examples include the randomized play-the-winner model [16] and the use of a Bayesian bandit allocation rule [17]. Covariate adaptive randomization uses the covariate values of previously enrolled subjects to determine the allocation probabilities for future subjects. For example, a minimization algorithm can be used to assign subjects to treatments in a way that maximizes the balance among treatment groups with respect to the distributions of several covariates [18]. Although adaptive randomization methods are among the oldest proposed adaptations, the use of response-adaptive randomization in confirmatory trials remains the source of much controversy due to concerns that the approach may lead to imbalances in important covariates and has the potential to add complexity to the final analysis. For example, the recent FDA draft guidance document states that ‘Adaptive randomization should be used cautiously in adequate and well-controlled studies, as the analysis is not as easily interpretable as when fixed randomization probabilities are used’ [8].

Sample size re-estimation

The traditional approach to study design involves a substantial effort on the part of the investigators to ensure an adequate sample size is determined before the trial is initiated. Once all required design features have been specified, and a clinically meaningful treatment difference and values for any nuisance parameters have been specified, the investigators can compute the sample size required to achieve the desired power. This approach can be quite complicated since the specification of a ‘clinically meaningful’ treatment difference may not be straightforward or a great deal of uncertainty may exist with respect to the specified values for nuisance parameters. If the assumptions used for the sample size calculations are not correct, the study may have a sample size that is too small or too large. If the sample size is too small, the study will be underpowered and may lead to discarding a potentially useful treatment. Such underpowered studies lead to great confusion in the literature since they are often perceived as negative studies, but would properly be interpreted as inconclusive. Similarly, overpowered
studies collect larger sample sizes than required and waste investigator resources that might have been directed elsewhere. A sample size re-estimation design refers to an adaptive design that allows for a sample size adjustment based on a review of interim data.

Historically, there has been a great deal of controversy surrounding designs that utilize sample size re-estimation. In general, the acceptance of such methods depends greatly on whether the sample size is being modified based on a re-estimated treatment effect or only on re-estimated values for the nuisance parameters. Methods have been proposed that allow the use of sample size re-estimation methods based on a revised treatment effect without inflating the type I error rate [19–23]. However, such methods have proven to be controversial due to concerns as to whether there is any benefit above and beyond that which can be achieved with a standard group sequential design [24]. Generally, a sample size re-estimation method based on a revised estimate of the treatment effect is nearly always less efficient than a group sequential approach [25–26]. That being said, the flexibility involved with such designs may be attractive because it allows starting a smaller study with an option of increasing if interim results seem promising. This could be very attractive to a small company or investigator with limited resources. However, it is vitally important that the rules for modifying the sample size be stated prior to any unblinding of the data. Thus, although methods exist to adjust for potential type I error inflation, the adjustments only apply conditional upon the specific decision that was made. Importantly, if the adaptation was made on an ad-hoc basis, these methods cannot guarantee unconditional control of the type I error rate because it is impossible to simulate the entire study design since one can never go back and clearly state all different decisions that might have been made had different interim results been observed. As a consequence, researchers should avoid post-hoc modifications of the sample size based on observed interim treatment differences.

With internal pilot designs, modifications are based only on re-estimated nuisance parameters [27]. With moderate to large sample sizes, there is minimal (if any) inflation of the type I error rate associated with the use of such designs [28–30]. Thus, internal pilot designs can be used in moderate to large randomized clinical trials to assess key nuisance parameters and make appropriate modifications with little cost in terms of an inflated type I error rate. The fact that internal pilot designs can be used in large trials, with little to no inflation of the type I error rate, suggests that the protocols for all large trials should include re-assessments of nuisance parameters at some interim time point. However, such designs have not been routinely implemented to date.

Adaptive seamless designs

Seamless designs attempt to accomplish, within a single trial, objectives that are normally achieved through separate trials. The goal is to eliminate the downtime between trials. An adaptive seamless design combines phases and uses data from patients enrolled before and after the adaptation for the final analysis. Most interest to date has involved seamless phase 2/3 designs that transition an adaptive dose-finding study into a standard confirmatory trial. However, there are also opportunities for adaptive seamless designs in early development (phase 1/2a) or biomarker adaptive designs that allow design modification (dose selection, dropping arms, etc.) to be based on a short-term biomarker, while using a longer-term clinical endpoint for the confirmation stage.

The use of an adaptive seamless design will result in a more complicated statistical analysis at the end of the trial. When an adaptive seamless design is used, statistical methods must account for the fact that data from the second stage are combined with data from the first stage for the final analysis. The data from both stages must be combined in a way that guarantees control of the overall type I error rate, produces unbiased parameter estimates, and produces confidence intervals with the correct coverage probability. For example, Kaufmann et al. [31] conducted an adaptive seamless trial of coenzyme Q10 (coQ10) for the treatment of amyotrophic lateral sclerosis (ALS). The primary outcome was the nine month decline in the ALS Functional Rating Scale-revised. The first stage used a selection design (see Chapter 8) to select one of two dosages of coQ10 (1800 or 2700 mg/day) to carry forward into stage 2. The second stage compared the selected dose from stage 1 against placebo using a futility design [32] (see Chapter 8). If no adjustment is made to the final test statistic, the type I error rate may be increased due to the positive bias introduced by the fact that the test statistic does not account for the fact that the dose being compared to placebo was chosen as the best dose in stage one. To address this issue, the investigators used simulations to develop and validate a bias correction. This bias correction was then incorporated into the final test statistic in order to preserve the overall type I error rate at the desired level.
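
The inflation described above can be illustrated by simulation. The sketch below is a generic illustration under stated assumptions and is not the bias correction developed by the QALS investigators. Outcomes are assumed to be normal with known unit variance, larger values are assumed to be better, the final analysis is assumed to be a one-sided superiority z-test at the 5% level, and all function names and sample sizes are hypothetical. Arm means are drawn directly from their sampling distributions to keep the example short.

```python
import math
import random

Z_CRIT = 1.6448536269514722  # one-sided 5% critical value, standard normal


def arm_mean(n, true_mean, rng):
    """Draw the sample mean of n outcomes distributed N(true_mean, 1)."""
    return rng.gauss(true_mean, 1.0 / math.sqrt(n))


def simulate_null(n1, n2, n_sims, seed):
    """Estimate type I error rates when no dose differs from placebo.

    Stage 1 enrolls n1 subjects per arm (two doses and placebo) and keeps
    the dose with the larger observed mean. Stage 2 enrolls n2 subjects on
    the selected dose and n2 on placebo.
    Returns (naive_rate, stage2_only_rate).
    """
    rng = random.Random(seed)
    naive = 0
    stage2_only = 0
    n = n1 + n2
    for _ in range(n_sims):
        dose_a = arm_mean(n1, 0.0, rng)
        dose_b = arm_mean(n1, 0.0, rng)
        placebo_1 = arm_mean(n1, 0.0, rng)
        selected_1 = max(dose_a, dose_b)  # selection step
        selected_2 = arm_mean(n2, 0.0, rng)
        placebo_2 = arm_mean(n2, 0.0, rng)

        # Naive analysis: pool both stages and ignore the selection.
        pooled_sel = (n1 * selected_1 + n2 * selected_2) / n
        pooled_pla = (n1 * placebo_1 + n2 * placebo_2) / n
        z_naive = (pooled_sel - pooled_pla) / math.sqrt(2.0 / n)

        # Reference analysis: use only the stage 2 data.
        z_stage2 = (selected_2 - placebo_2) / math.sqrt(2.0 / n2)

        naive += z_naive > Z_CRIT
        stage2_only += z_stage2 > Z_CRIT
    return naive / n_sims, stage2_only / n_sims


if __name__ == "__main__":
    naive_rate, stage2_rate = simulate_null(50, 50, 20000, seed=2024)
    print("naive pooled test:", naive_rate)
    print("stage 2 only test:", stage2_rate)
```

In this sketch the stage 2 only analysis is unaffected by the selection because it discards the data used to choose the dose, at the cost of a smaller sample. A bias correction of the kind described above aims instead to keep the stage 1 data while restoring the nominal error rate.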
The added flexibility of an adaptive seamless design may be offset by the added complexity associated with such designs. Investigators should carefully consider the feasibility of implementing an adaptive seamless design within a given project. Some projects might be better suited to seamless development than others. The length of time needed to make a decision should be small relative to the time for enrollment. If a biomarker will be used for dose selection, it should be validated and well understood. Drug supply and packaging may be more challenging in the seamless design setting because the number of treatment groups may change during the trial. Finally, at the end of each phase in the traditional approach, all analyses are carefully studied by the investigators and sponsors. As a consequence, the ‘go’ or ‘no go’ decision is made by the investigator and sponsor based on a careful review of all data. Adaptive seamless designs raise particular concerns at the end of the first phase because there is the need to keep the investigators and sponsors from knowing any interim findings. To alleviate this concern, the DSMB may play an important role in the decision-making process between phases. As a consequence, the roles and responsibilities of the DSMB are becoming more complex. In general, there should be a clear advantage for implementing a seamless transition before such designs should be utilized. Although important for any adaptive design, the importance of advanced study planning, adequate statistical support, and the need for simulation studies to assess operating characteristics is magnified in an adaptive seamless design.

Examples in neurology
Because this is a rapidly expanding area of research, outside of group sequential designs, there are few published examples of neurology clinical trials using an adaptive design. Some of the published neurological trials that utilized an adaptive design will be discussed here. We stress that this is in no way meant to be an exhaustive list. There are many trials for which some type of adaptation may take place that are not clearly reflected in the published paper. One of the goals of ongoing education efforts is to more clearly delineate exactly what should be described in any publication that utilizes an adaptive design.
As reflected in the recently released FDA draft guidance on the topic [8], adaptive designs are currently better accepted in the ‘learn’ phase of drug development where investigators are generally freer to consider novel designs. Correspondingly, the majority of the published examples describing the use of adaptive designs in neurology fit into this category.
• Krams et al [33] described a dose-response study with randomized adaptive allocation to 1 of 15 doses of UK-279,276 or placebo for the treatment of acute ischemic stroke (AIS). The primary outcome was the change from baseline to day 90 on the Scandinavian Stroke Scale. During the trial, an NDLM continuously reassessed the dose-response curve in order to estimate the dose-response relationship. The NDLM fitted a linear regression model to each dose in order to obtain posterior estimates and 95% posterior credible intervals of the dose-response curve, the minimal dose that yields near maximal efficacy (ED95), and the effect over placebo at the ED95. After each evaluation of the dose-response curve, a termination rule specified that the trial would stop for efficacy if the lower 80% boundary of the credible interval for the effect over placebo at the ED95 was >2 or stop for futility if the upper 80% boundary of the credible interval was <1. This termination rule was used to recommend cessation of the study after futility had been established.
• Ho et al [34] described a two-stage adaptive dose-ranging design to determine an effective and tolerable dose of a novel oral calcitonin gene-related peptide receptor antagonist (MK-0974) for the acute treatment of migraine. The primary outcome was pain relief, defined as a reduction to mild or none two hours after dosing. During the first stage, subjects were randomized to one of seven MK-0974 levels or matching placebo. After 192 patients were randomized, an interim analysis was performed to determine the lowest dose with at least 70% conditional probability of being nominally significant at the end of the trial based on a comparison with placebo. Only the MK-0974 groups with dose levels at least as high as the dose level identified at the end of stage 1 were carried forward into stage 2. When the design was implemented, the study led to the discontinuation of the four lowest doses at the end of the first stage. The results at the end of the second stage suggested that the remaining doses of MK-0974 were generally effective and well tolerated for the treatment of migraine.
• Whelan et al [35] described an outcome-adaptive dose-finding design that will be used in a dose-finding trial for tissue plasminogen activator (tPA) in childhood AIS. The design uses both efficacy (angiographic recanalization or restoration of flow past the area of occlusion on follow-up magnetic resonance angiography) and toxicity (fatal or symptomatic intracranial or systemic hemorrhage) to determine doses for successive patient cohorts. The investigators argue that by integrating both efficacy and toxicity in the selection of doses, the design avoids the additional costs in terms of time and money associated with the usual approach of first assessing toxicity alone, followed by a separate assessment of efficacy. The results of this study have not yet been published.
• Elkind et al [36] conducted an adaptive dose-finding study using the CRM. The study demonstrated that 8 mg/kg/day is the maximum tolerated dose of lovastatin for the treatment of AIS, and demonstrated that the CRM method could be successfully utilized in early phase stroke trials.
• As previously described, Kaufman et al [31] performed an adaptive seamless trial (selection design in stage one, futility design in stage two) of coQ10 for the treatment of ALS. The first stage selected the 2700 mg/day dosage. The second stage established that the effect of coQ10 was not of sufficient magnitude to justify the cost and effort associated with undertaking a confirmatory trial. For this reason, the trial should be considered a success. By using an adaptive seamless design, the investigators were able to select a preferred dose and conclude that further study would not be worthwhile using a sample size of only 185 participants.
• Haley et al [37] described an adaptive seamless trial of intravenous tenecteplase versus standard-dose rtPA in patients with AIS. The trial began by comparing three doses of tenecteplase with standard 0.9 mg/kg rtPA in patients within three hours of stroke onset. The initial phase used a selection design (see Chapter 8) to establish the ‘best’ dose of tenecteplase for further study based on a 24-hour assessment of major neurological improvement balanced against the occurrence of symptomatic intracranial hemorrhage. The trial would then proceed with a futility assessment between the selected dose and rtPA, on the basis of the three month modified Rankin scale, to determine whether or not to proceed with a phase 3 trial. The trial was terminated for slow enrollment after only 112 patients had been randomized, so the advantages of the adaptive design could not be realized.
There are currently very few published examples describing the use of adaptive designs in confirmatory, randomized clinical trials (excluding the common use of group sequential methods for interim monitoring). Olesen et al [38] described a group sequential adaptive randomization design to assess whether a calcitonin gene-related peptide receptor antagonist might be effective in the treatment of migraine attacks. The primary outcome was the reduction from severe or moderate migraine at baseline to mild or no migraine at 2 hours after treatment. Subjects presenting with severe to moderate migraine were treated in groups of six, with two subjects in each group assigned to placebo and the other four subjects assigned to one of six doses (0.25, 0.5, 1, 2.5, 5, or 10 mg administered intravenously over 10 minutes). The dose assignment to the next group of patients depended on the responses observed in all previous patient groups. Based on a total enrollment of 126 patients, the design selected the 2.5 mg dose and found that it was effective in treating acute attacks of migraine (p = 0.001 when comparing the response rate to that observed with placebo). Unfortunately, the design did not lead to early stopping, so the advantage of the adaptive design is not easily apparent.
Although there are few published examples of the use of adaptive designs in confirmatory trials, the use of adaptive designs has become more common in recent years. Because of the lag between study initiation and the publication of final study results, it will take a few years before the impact of an increasing use of adaptive designs is seen in the literature. Hence, the number of published confirmatory randomized controlled trials using an adaptive design in neurology is expected to dramatically increase over the next few years. There is a need for further discussion regarding what aspects of an adaptive design should be included in publications in order to give the reader a clear sense of how the adaptations were planned and implemented.

Barriers to adaptive designs
While the development of additional statistical methodology is needed, this chapter illustrates that appropriate statistical methods currently exist for implementing a number of well-accepted adaptive designs. However, before any adaptive design can be
practically implemented, there are a number of logistical barriers that need to be overcome [39–40]. A few of the most pressing issues are discussed below. However, the reader is cautioned that this is far from an exhaustive list and the barriers may change as progress is made to address some of the barriers and/or new barriers are introduced.

Funding
Current funding mechanisms make it difficult to include an adaptive design since the final sample size may not be known at the outset. This causes logistical problems associated with setting up an overall trial budget and contracting with potential study sites. Discussions will need to take place among sponsors to determine how to gain the advantages of adaptive design within the current funding framework.

Transparency
Adaptive designs require a high degree of transparency with respect to the decisions that will be considered throughout the trial. The extent to which adaptation is planned should be described a priori in detail. However, if all possible adaptations are clearly specified in the protocol, a great deal of information can be inferred once a decision is implemented. This has the potential to unblind researchers and other individuals regarding any observed interim trends in the data. Discussions are needed to resolve this issue. For example, should the details of the adaptation be defined a priori in a separate document to which a limited number of individuals have access?

Computational complexity
Methods for the design and analysis of adaptive designs are often computationally complex. As a result, getting the clinical trials community to accept any particular type of adaptation is merely the first step to utilizing the method. Achieving widespread implementation of accepted methods will require the creation of high quality software packages with validated codes and well-documented examples.

Impact on the Data and Safety Monitoring Board
The DSMB may be required to play a major decision-making role in an adaptive design protocol. This greatly expands the responsibilities of the DSMB. There must also be a good sense of trust between the investigators and DSMB members, since the use of a seamless adaptive design may involve some loss of control on the part of the investigators. Discussions are needed as to whether this should be a responsibility of the DSMB or an external group. If the DSMB is to be involved in this process, it is likely that the time demands on DSMB members will be increased. In addition, at the beginning of the study a number of different possible scenarios should be discussed with the investigators, since this will be the only time that the DSMB will be able to solicit investigator input on how to react at the time of an important design decision. Removing the investigator from discussions surrounding these key design decisions reinforces the need for investigators to pre-specify all adaptations in the protocol so that the DSMB (or other third party) has a clear set of rules to follow for implementing the adaptations.

Summary
The term ‘adaptive design’ creates much confusion since it has been used to refer to a variety of situations. As a result, many incorrectly perceive all adaptive designs as controversial. In fact, regulatory agencies generally encourage the use of adaptive designs for early phases of research. For confirmatory trials, regulatory agencies will accept some adaptive designs but are cautious about others. A number of adaptive designs have been classified as ‘generally well understood adaptive designs with valid approaches to implementation’ in the recently released FDA guidance document on adaptive designs [8]:
• Adapting study eligibility criteria based on analyses of baseline data
• Sample size re-estimation based on blinded interim analyses of aggregate data
• Adaptations based on interim results of an outcome unrelated to efficacy (e.g., discontinuing doses with unacceptable toxicity)
• Adaptations using group sequential methods for early study termination due to demonstrated efficacy or lack of benefit
• Adaptations in the data analysis plan that are not dependent on within study, between group outcome differences.
The list above does not imply that these are the only types of adaptations that should be considered. A number of other adaptive designs may be appropriate, provided that the investigators have adequately
addressed the operating characteristics of the design for the scenario in which it will be utilized. In general, the concept of ‘adaptive by design’ is crucial. By specifying all adaptations in advance, researchers have the ability to simulate the study in order to gain a clear understanding of the operating characteristics of the design. It is extremely important to ensure reliable, well-planned, and thorough simulation studies are employed during the planning phase of an adaptive clinical trial [41]. A common misconception is that an adaptive design requires less planning than a standard trial design. In actuality, the opposite is true. An adaptive design typically requires much more time for the upfront planning and simulation studies that must be done to ensure the validity and integrity of the trial.
The major barriers to the implementation of adaptive designs in future clinical trial protocols are primarily logistical, rather than statistical. A recent publication by members of the adaptive designs working group describes current thinking on good practices for adaptive clinical trials in pharmaceutical product development [42]. However, there is an immediate need for further educational efforts to clarify the strengths and weaknesses of the different types of adaptations that have been proposed. There is also a need for discussions among study sponsors and investigators regarding how to address the logistical barriers associated with the use of adaptive designs within current funding frameworks, and to address whether major changes are needed to the funding models in order to accommodate the use of adaptive designs.
Greater usage of adaptive designs for neurology trials should be encouraged. This will require a better understanding of the strengths and weaknesses of the different types of adaptations that have been proposed. Because this is a rapidly expanding area of research, more practical experiences and case studies are needed in the literature.

References
1. Gallo P, Chuang-Stein C, Dragalin V, et al. Adaptive designs in clinical drug development – An executive summary of the PhRMA working group. J Biopharm Stat 2006; 16: 275–83.
2. Krams M, Burman CF, Dragalin V, et al. Adaptive designs in clinical drug development: Opportunities, challenges, and scope reflections following PhRMA’s November 2006 workshop. J Biopharm Stat 2007; 17: 957–64.
3. Chow SC and Chang M. Adaptive design methods in clinical trials – A review. Orphanet J Rare Dis 2008; 3: 11.
4. Coffey CS and Kairalla JA. Adaptive clinical trials: Progress and challenges. Drugs RD 2008; 9: 229–42.
5. Bretz F, Branson M, Burmann CF, et al. Adaptivity in drug discovery and development. Drug Dev Res 2009; 70: 169–90.
6. Bretz F, Koenig F, Brannath W, et al. Adaptive designs for confirmatory clinical trials. Stat Med 2009; 28: 1181–1217.
7. EMEA. Reflection paper on methodological issues in confirmatory clinical trials with flexible design and analysis plan. EMEA (European Medicines Agency) 2007.
8. Food and Drug Administration. Guidance for Industry: Adaptive Design Clinical Trials for Drugs and Biologics Draft Guidance. http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/default.htm (Accessed May 2010.)
9. Hung HMJ, O’Neill RT, Wang SJ, et al. A regulatory view on adaptive/flexible clinical trial design. Biometrical J 2006; 3: 1–9.
10. Gaydos B, Krams M, Perevozskaya I, et al. Adaptive dose-response studies. Drug Inf J 2006; 40: 451–61.
11. Garrett-Mayer E. The continual reassessment method for dose-finding studies: A tutorial. Clin Trials 2006; 3: 57–71.
12. Bretz F, Hsu J, Pinheiro J, et al. Dose finding – A challenge in statistics. Biometrical J 2008; 50: 480–504.
13. West M and Harrison PJ. Bayesian Forecasting and Dynamic Models. Springer-Verlag: New York, 1997.
14. Bornkamp B, Bretz F, Dmitrienko A, et al. Innovative approaches for designing and analyzing adaptive dose-ranging trials. J Biopharm Stat 2007; 17: 965–95.
15. Proschan MA, Lan KKG, Wittes JT. Statistical Monitoring of Clinical Trials: A unified approach. Springer: New York, 2006.
16. Rosenberger WF. Randomized play-the-winner clinical trials: Review and recommendations. Control Clin Trials 1999; 20: 328–42.
17. Hardwick JP and Stout QF. Bandit strategies for ethical sequential allocation. Control Clin Trials 1991; 23: 421–24.
18. Taves DR. The use of minimization in clinical trials. Contemp Clin Trials 2010; 31: 180–84.
19. Bauer P and Kohne K. Evaluation of experiments with adaptive interim analyses. Biometrics 1994; 50: 1029–41.
20. Proschan MA and Hunsberger SA. Designed extension of studies based on conditional power. Biometrics 1995; 51: 1315–24.
21. Lehmacher W and Wassmer G. Adaptive sample size calculations in group sequential trials. Biometrics 1999; 55: 1286–90.
22. Cui L, Hung HMJ and Wang S. Modifications of sample size in group sequential clinical trials. Biometrics 1999; 55: 853–57.
23. Muller HH and Schafer H. Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and classical group sequential approaches. Biometrics 2001; 57: 886–91.
24. Mehta CR and Patel NR. Adaptive, group sequential, and decision theoretic approaches to sample size determination. Stat Med 2006; 25: 3250–69.
25. Tsiatis AA and Mehta C. On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika 2003; 90: 367–78.
26. Jennison C and Turnbull BW. Efficient group sequential designs when there are several effect sizes under consideration. Stat Med 2006; 25: 917–32.
27. Wittes J and Brittain E. The role of internal pilot studies in increasing the efficiency of clinical trials. Stat Med 1990; 9: 65–72.
28. Proschan MA. Two-stage sample size re-estimation based on a nuisance parameter: A review. J Biopharm Stat 2005; 15: 559–74.
29. Friede T and Kieser M. Sample size recalculation in internal pilot study designs: A review. Biometrical J 2006; 48: 537–55.
30. Proschan MA. Sample size re-estimation in clinical trials. Biometrical J 2009; 51: 348–57.
31. Kaufman P, Thompson JLP, Levy G, et al. Phase II trial of CoQ10 for ALS finds insufficient evidence to justify phase III. Ann Neurol 2009; 66: 235–44.
32. Ravina B and Palesch Y. The phase II futility clinical trial design. Prog Neurother Neuropsychopharmacol 2007; 2: 27–38.
33. Krams M, Lees KR, Hacke W, et al. Acute stroke therapy by inhibition of neutrophils (ASTIN): An adaptive dose-response study of UK-279,276 in acute ischemic stroke. Stroke 2003; 34: 2543–48.
34. Ho TW, Mannix LK, Fan X, et al. Randomized controlled trial of an oral CGRP receptor antagonist, MK-0974, in acute treatment of migraine. Neurology 2008; 70: 1304–12.
35. Whelan HT, Cook JD, Amlie-Lefond CM, et al. Practical model-based dose finding in early-phase clinical trials: Optimizing tissue plasminogen activator dose for treatment of ischemic stroke in children. Stroke 2008; 39: 2627–36.
36. Elkind MSV, Sacco RL, MacArthur RB, et al. High-dose lovastatin for acute ischemic stroke: Results of the phase I dose escalation neuroprotection with statin therapy for acute recovery trial (NeuSTART). Cerebrovasc Dis 2009; 28: 266–275.
37. Haley EC, Thompson JLP, Grotta JC, et al. Phase IIB/III trial of tenecteplase in acute ischemic stroke: Results of a prematurely terminated randomized clinical trial. Stroke 2010; 41: 707–711.
38. Olesen J, Diener H, Husstedt IW, et al. Calcitonin gene-related peptide receptor antagonist BIBN 4096 BS for the acute treatment of migraine. New Engl J Med 2004; 350: 1104–10.
39. Quinlan JA and Krams M. Implementing adaptive designs: Logistical and operational considerations. Drug Information Journal 2006; 40: 437–444.
40. Quinlan J, Gaydos B, Maca J, et al. Barriers and opportunities for implementation of adaptive designs in pharmaceutical product development. Clin Trials 2010; 7: 167–73.
41. Burton A, Altman DG, Royston P, et al. The design of simulation studies in medical statistics. Stat Med 2006; 25: 4279–92.
42. Gaydos B, Anderson KM, Berry D, et al. Good practices for adaptive clinical trials in pharmaceutical product development. Drug Inf J 2009; 43: 539–56.
Chapter 10
Crossover designs
Mary E. Putt

Introduction
This chapter describes crossover trials and their applications in neurology. In a typical crossover trial, each subject receives more than one experimental intervention or placebo during the different periods of the trial. This chapter discusses conditions in neurology suitable for this design, the efficiency that is possible with a crossover trial, and the benefits and limitations of the design. Considerable thought is given to the thorny issue of carryover. This chapter will also review study design, provide guidance regarding the logistics of carrying out a crossover trial and briefly describe some of the issues with missing data. Bioequivalence studies, which typically use crossover designs, are not discussed: the interested reader is referred to [1] for an excellent discussion and to [2] for an illustration in neurology.

Applications in neurology
Table 10.1 lists several recently published crossover studies. Chronic neurological conditions, where the outcome of interest is stable over the duration of a study, are excellent candidates for the design. Crossover trials, in principle, could be used to study aspects of many common neurological disorders including Parkinson’s disease, Alzheimer’s disease, stroke, multiple sclerosis, pain and headache, epilepsy, traumatic brain and spinal cord injury, psychiatric disorders such as social anxiety disorder or generalized anxiety disorder, and developmental disabilities such as attention deficit hyperactivity disorder (ADHD) or autism spectrum disorders.
We briefly describe the trials in Table 10.1; later we revisit these studies to illustrate our discussion. Headache is ideally suited to the crossover design as the condition is chronic and frequently stable over time; our example showed high-flow oxygen to be more effective than placebo for treating cluster headaches [3] (see also [4, 5]). To treat pain, Gilron et al. [6] showed that gabapentin combined with nortriptyline was a more effective analgesic than either alone. For stroke patients in rehabilitation, several assistive walking devices improved functional mobility [7]. Symptoms of restless leg syndrome improved after treatment with ropinirole [8]. In studies of methylphenidate, children with pervasive developmental disorder responded with decreased hyperactivity while a child with ADHD more often completed homework independently [9,10]. Lastly, in an example of a trial reporting a negative finding, patients with Parkinson’s disease showed no significant improvement in the primary outcome, ADAS-cog, during periods on donepezil compared to placebo [2]. Table 10.1 shows that the sample size for each study was small to moderate. While sample size must be calculated carefully for any particular study of interest, Table 10.1 introduces the idea that successful crossover studies are often carried out with modest sample sizes. This has obvious benefits in terms of study cost and accrual; if resources are limiting and/or if eligible patients are difficult to come by, the crossover may be the only feasible design for a clinical trial. Reasons for the design’s efficiency are discussed next.

Efficiency
Crossover designs are efficient. To illustrate, we present sample size estimates for two placebo-controlled parallel and one crossover design for a trial examining the efficacy of donepezil in treating dementia in patients with Parkinson’s disease. We note that a somewhat different design was ultimately used in the published study [11,12]. We estimated that 26 subjects were needed for the simplest crossover design, a 2-treatment 2-period
Table 10.1 Examples of crossover trials. N is the number of subjects in the study

| Study | Condition | Treatment | Design | N Enrolled | N Analyzed (Percent) |
|---|---|---|---|---|---|
| Cohen et al. [3] | Cluster headache | High-flow oxygen vs. placebo | ABAB:BABA (1) | 109 | 76 (70%) |
| Gilron et al. [6] | Neuropathic pain | Morphine vs. gabapentin vs. combination vs. placebo | Balanced Latin square | 57 | 44 (77%) |
| Tyson and Rogers [7] | Walking impairment post-stroke | Four assistive walking devices and control during rehabilitation | Randomized order of receipt of devices | 20 | 20 (100%) |
| Adler et al. [8] | Restless leg syndrome | Ropinirole and placebo | 2 × 2 | 22 | 22 (100%) |
| Research Units on Pediatric Psychopharmacology Autism Network [10] | Hyperactivity in children with pervasive developmental disorder | 3 doses of methylphenidate versus placebo | Placebo followed by 3 randomized doses of methylphenidate | 66 | 58 (88%) |
| Proschan 2008 [9] | ADHD | Methylphenidate versus nothing | N of 1 | 1 | 1 (100%) |
| Ravina et al. [11] | Parkinson’s disease | Donepezil versus placebo | AABB:BBAA (2) | 22 | 19 (86%) |

(1) Four-period design with alternating treatments beginning with A in the first sequence and B in the second sequence
(2) Four-period design analogous to the 2 × 2 design except with two consecutive periods of each treatment
(2 × 2) design (see Figure 10.1) to detect a difference in the mean of ADAS-cog, the cognitive subscale of the Alzheimer’s Disease Assessment Scale, of 3.5 units. A standard deviation of 10 units, and an intra-class correlation coefficient, ρ, of 0.8 was assumed with a type I error rate of 0.05 and a power of 80%. In contrast, a parallel design with a single outcome and a baseline measurement would need 103 patients if the difference between baseline and response was used as the outcome; if analysis of covariance was used for the same data, the estimated sample size is 93 patients [13]. The dramatic savings in patients for the crossover trial reflects the assumed large intra-class correlation coefficient of ρ = 0.8. The intra-class correlation is the ratio of the between-subject variance to the total variance, the sum of the within and between-subject variance, with values closer to 1 indicating that subjects demonstrate substantial heterogeneity in response. The treatment effect in the 2 × 2 design is usually estimated largely, if not wholly, from a within-subject comparison. A value of ρ near 1.0 indicates that variability among patients is large compared to variability within patients. Thus eliminating between-subject variability and basing the estimate on within-subject comparisons yields large savings in patients for the crossover design. Figure 10.2 shows the same calculations for a range of ρ, suggesting that even for more modest ρ substantial savings in patients are achieved. With such efficiency it is natural to ask why crossover studies are not more common. There are perhaps three reasons. First, crossover trials are generally limited to chronic conditions where the endpoint is stable and can be repeatedly measured (but see [14]). Second, bias in the estimated treatment effect may arise from unequal carryover or period by treatment interactions; in my experience this is the primary concern limiting the use of crossover trials (see below). The last part of this chapter describes some logistical challenges involved in successfully completing a crossover trial.

| Sequence | Period 1 | Period 2 |
|---|---|---|
| AB | A | B |
| BA | B | A |

Figure 10.1. 2 × 2 design.

[Plot: sample size (vertical axis, 0 to 500) against ρ (horizontal axis, 0.0 to 0.8) for three designs: parallel group/ANCOVA, parallel group/change from baseline, and crossover.]
Figure 10.2. Sample size as a function of the intra-class correlation, ρ, for a parallel group design or a 2 × 2 crossover. For the parallel group a single outcome and baseline are collected and analyzed either by subtracting the baseline from each subject’s outcome or by analysis of covariance (ANCOVA). Calculations are for a difference in mean outcome of 3.5 units, a standard deviation of 10 units with 80% power for a two-sided Type I error of 0.05 assuming normality as described in [12, 13]; similar results were obtained in PASS 2008 using a more conservative T-distribution for the test statistic.

The simplest crossover design
In the simplest crossover design, a 2 × 2 or AB:BA design, subjects are randomized to either the AB sequence, where they receive treatment A in the first period followed by treatment B in the second period, or to the BA sequence, where the treatment order is reversed (Figure 10.1). Of interest is the treatment effect, in the case where the outcome is continuous, the mean difference in outcome that is due purely to differences between treatments. To develop a procedure for estimating the treatment effect we describe an approach based on a model to account for a number of ‘nuisance parameters’, changes in mean outcome that are not of direct interest in the trial because they are due to factors other than treatment. We then describe approaches for data analysis for several types of outcome variables.
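The sample sizes quoted in the efficiency discussion (26, 103 and 93 for a difference of 3.5 units, a standard deviation of 10, ρ = 0.8, a two-sided type I error of 0.05 and 80% power) can be reproduced with standard normal-approximation formulas. The sketch below is our own illustration, not necessarily the exact procedure of the cited references: it assumes equal allocation, a known common standard deviation, and rounding of the total sample size upward. The function name `total_sample_sizes` is hypothetical.

```python
import math
from statistics import NormalDist


def total_sample_sizes(delta, sd, rho, alpha=0.05, power=0.80):
    """Total subjects needed under a normal approximation (illustrative sketch).

    delta: difference in means to detect; sd: outcome standard deviation;
    rho: intra-class correlation between repeated measures on one subject.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    k = (z * sd / delta) ** 2
    return {
        # Within-subject difference has variance 2 * sd^2 * (1 - rho).
        "crossover_2x2": math.ceil(2 * (1 - rho) * k),
        # Two groups, each analyzed on (outcome - baseline), variance 2 * sd^2 * (1 - rho).
        "parallel_change_from_baseline": math.ceil(8 * (1 - rho) * k),
        # Two groups, outcome adjusted for baseline, variance sd^2 * (1 - rho^2).
        "parallel_ancova": math.ceil(4 * (1 - rho ** 2) * k),
    }


sizes = total_sample_sizes(delta=3.5, sd=10, rho=0.8)
print(sizes)
```

Calling the function with smaller values of `rho` shows how the advantage of the crossover design over the parallel designs changes as the intra-class correlation falls.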
Table 10.2 Expected outcomes expressed as combinations of nuisance and treatment parameters in a 2 × 2 crossover trial. Sequence effects, which behave similarly to subject effects, are omitted to simplify the explanation.

| Sequence | Period 1 (P1) | Period 2 (P2) | Sequence-specific period difference (P1 − P2) |
|---|---|---|---|
| No carryover (Figure 10.3A,B) | | | |
| AB | π1 + δj + μA | π2 + δj + μB | (π1 − π2) + (μA − μB) |
| BA | π1 + δj + μB | π2 + δj + μA | (π1 − π2) + (μB − μA) |
| Overall effect (a) | | | μA − μB |
| With carryover (Figure 10.3C,D) | | | |
| AB | π1 + δj + μA | π2 + δj + μB + λA | (π1 − π2) + (μA − μB) − λA |
| BA | π1 + δj + μB | π2 + δj + μA + λB | (π1 − π2) + (μB − μA) − λB |
| Overall effect (a) | | | (μA − μB) − ½(λA − λB) |

(a) (P1 − P2) for AB less (P1 − P2) for BA, divided by 2
π = period mean; δj = the effect of the jth subject; μ = added effect of treatment; λ = added effect of carryover
Modeling
Nuisance parameters commonly considered in crossover trials include:
a. Subject effects: Individual differences in response.
b. Period effects: Differences in the mean outcome of interest between different periods that would occur irrespective of treatment in those periods.
c. Carryover: The lingering effect of a treatment given in one period into the subsequent period (or periods) of the crossover trial.
d. Treatment by period interactions: Changes in the effect of treatment at different periods of the study. For example, if a treatment is only effective with minimally or moderately affected patients, and the condition of the subjects deteriorates rapidly, a treatment may be effective in the first period(s) of the study and ineffective in subsequent period(s).
e. Sequence effects: Differences in the mean outcome that reflect differences in the response to treatment for subjects assigned to different groups. Sequence effects are essentially aggregated subject effects. For example a sequence effect would occur if patients assigned to one sequence are older and older subjects on average have worse responses.
Period by treatment interactions and carryover are conceptually distinct. A treatment by period interaction is a component of outcome that depends on the treatment administered in the same period where the outcome is measured; in contrast, carryover is a component of outcome due to the treatment administered in the previous period(s). However treatment by period interactions and carryover effects are mathematically indistinguishable in the 2 × 2 trial. Treatment by period interactions are not considered further here. We note that using relatively short trials may reduce the chance of a period by treatment interaction. Additionally there are designs that distinguish carryover and treatment by period interactions if the latter are a potential problem in the study (e.g., see [15]).
Table 10.2 illustrates two models, first without, and then with carryover. The equation form used by statisticians includes sums of nuisance and treatment parameters; the combination of parameters for any one period, or combination of periods is the expected outcome, sometimes called simply the ‘expectation’. This expectation is the mean response expected for the population represented by the sample of patients used in the study.

The model without carryover
Referring to Table 10.2, we hypothesize a situation where we measure the outcome of interest in each period in the absence of experimental treatments; with parameters for period 1 (π1) and period 2 (π2) and an added effect of subject, δj in each period. Layered onto these parameters are treatment parameters where μA is the added effect of treatment A; μB is the added effect of treatment B.
104
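As an illustration only, the expectations in Table 10.2 can be evaluated numerically. The Python sketch below is not part of the original analysis: the function and argument names are hypothetical, and the parameter values are those listed in Table 10.3. It computes the expected outcome in each period for the AB and BA sequences, and then the ‘Overall effect’ as half the difference between the two sequence-specific period differences (P1 − P2).

```python
def expected_outcomes(pi1, pi2, delta, mu_a, mu_b, lam_a=0, lam_b=0):
    """Expected (Period 1, Period 2) outcomes per sequence under Table 10.2.

    lam_a and lam_b are the carryover terms added in Period 2; leaving them
    at zero gives the model without carryover.
    """
    return {
        "AB": (pi1 + delta + mu_a, pi2 + delta + mu_b + lam_a),
        "BA": (pi1 + delta + mu_b, pi2 + delta + mu_a + lam_b),
    }


def overall_effect(outcomes):
    """Half of [(P1 - P2) for AB minus (P1 - P2) for BA]."""
    diff_ab = outcomes["AB"][0] - outcomes["AB"][1]
    diff_ba = outcomes["BA"][0] - outcomes["BA"][1]
    return (diff_ab - diff_ba) / 2


# Values from Table 10.3: pi1 = 8, pi2 = 3, delta_j = 2, mu_A = 5, mu_B = 1.
base = dict(pi1=8, pi2=3, delta=2, mu_a=5, mu_b=1)

no_carryover = expected_outcomes(**base)
equal_carryover = expected_outcomes(**base, lam_a=5, lam_b=5)
unequal_carryover = expected_outcomes(**base, lam_a=5, lam_b=0)

print(no_carryover)                       # {'AB': (15, 6), 'BA': (11, 10)}
print(overall_effect(no_carryover))       # 4.0
print(overall_effect(equal_carryover))    # 4.0
print(overall_effect(unequal_carryover))  # 1.5
```

Changing the value passed as delta leaves every overall effect unchanged, because the subject effect cancels within each period difference. Only the case in which the two carryover terms differ moves the result away from μA − μB = 4.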
Table 10.3 Numerical values of parameters used in Figure 10.3

Parameter
π          δj         μ          λ (Fig. 10.3A,B)    λ (Fig. 10.3C,D)
π1 = 8     δj = 2     μA = 5     λA = 0              λA = 5
π2 = 3                μB = 1     λB = 0              λB = 5

[Figure 10.3: four panels of expected outcome (axis range 4 to 18) against Period (1, 2). Panel A, AB: No CO; panel B, BA: No CO; panel C, AB: +CO; panel D, BA: +CO. Legend (Parameters): Nuisance + trt; Nuisance only.]
Figure 10.3. Examples of expected outcomes for the AB and BA sequences without (A,B) and with (C,D) carryover (see Table 10.3).

Expected outcome
Without carryover, the combined data from the two periods can yield an unbiased estimate of the treatment effect, μA − μB. The column ‘Period difference (P1 − P2)’ shows the difference in expected outcome for periods 1 and 2 for each sequence. Note that taking the period difference eliminates subject effects. This is the basis of the efficiency described above (‘Efficiency’); the estimate of the treatment effect and, importantly, its variance reflect only a within-subject difference. In contrast, in a parallel group trial, the variance of the estimated treatment effect contains both between and within-subject components. Next consider the overall estimate, which is determined by taking the mean difference in outcome P1 − P2 for each sequence, subtracting the BA mean from the AB mean, and dividing by 2. The expectation of this estimate, the ‘Overall effect’, shows that period terms cancel, leaving the desired treatment effect.
Figures 10.3A and 10.3B illustrate these effects for a hypothetical trial using the parameters in Table 10.3, where the only nuisance parameters are period and subject effects. The total of the nuisance parameters (π1 + δj for Period 1 and π2 + δj for Period 2) is shown in grey and the total of all parameters appears in black. At first glance, Figures 10.3A and 10.3B suggest that the response on the AB and BA sequence is very different. For the AB sequence the total for Treatment A versus Treatment B is superior by nine units; for the BA sequence the total for Treatment A versus Treatment B is inferior by one unit. This type of pattern can be disconcerting. However the average difference of the sequence-specific period differences (P1 − P2) from Table 10.2, $\tfrac{1}{2}[(8 - 3 + 5 - 1) - (8 - 3 + 1 - 5)] = 4$, is exactly the treatment effect. For the AB sequence the difference in expected outcome for the two periods appears pronounced because the treatment effect is layered over a substantial period effect; for the BA sequence the difference in expected outcome between treatment conditions appears muted because it is opposite in direction to the period effect and the trends almost cancel. When the effect of treatment is constant across periods, when there is no carryover, and the design is balanced so that each treatment is represented in each period, the data yield an unbiased estimate of the treatment effect despite pronounced period effects such as those in Figure 10.3.

The model with carryover
Consider an identical model, but allow carryover in Period 2, as seen in Table 10.2. The expectation of the mean of the difference of the sequence-specific period differences is $(\mu_A - \mu_B) - \tfrac{1}{2}(\lambda_A - \lambda_B)$ (Overall effect in Table 10.2). The data combined from the two periods no longer yield an unbiased estimate of the treatment effect; the expectation of the overall estimate differs from the treatment effect by a term that is half of the ‘carryover effect’, λA − λB. This problem occurs only when the two carryover terms differ. Carryover itself does not cause bias; differences in carryover yield bias in the estimated treatment effect. We illustrate equal and unequal carryover using Figure 10.3.

Equal carryover
Figure 10.3C and 10.3D show a trial where the two treatments have equal, albeit large, positive carryover
terms. Because carryover from treatments A and B into the next period is identical, the sum of the nuisance parameters changes by an identical amount in period 2. For these data, the mean of the two P1 − P2 differences yields an unbiased estimate of the treatment effect.

Unequal carryover
Now imagine a trial where treatment A carries over into period 2 for the AB sequence (Figure 10.3C) but treatment B does not carry over in the BA sequence (Figure 10.3B). In this case the expectation of the estimated treatment effect is $\tfrac{1}{2}[(8 - 3 + 5 - 1 - 5) - (8 - 3 + 1 - 5)] = 1.5$, a substantial underestimate of the true treatment effect of 4. Because the positive effect of treatment A lingers into period 2 in the AB sequence, the difference between treatments A and B is attenuated.
In this example, carryover from treatment A is large, identical to the original treatment effect of A, while treatment B has no carryover. This results in a huge carryover effect. In practice carryover can be reduced using washout periods (see below), leading to the question: ‘If my washout period reduces but does not completely eliminate carryover, how serious is the bias in the estimated treatment effect?’ [12,16]. Here it can help to think of the carryover effect as a proportion of the treatment effect, i.e.:

$$k = \frac{\lambda_A - \lambda_B}{\mu_A - \mu_B} \qquad (1.1)$$

The expected bias in the estimated treatment effect is the ratio of the treatment effect estimated with and without carryover effects less the desired value of 1. Substituting using Equation 1.1 gives:

$$\text{Bias} = \frac{(\mu_A - \mu_B) - \tfrac{1}{2}(\lambda_A - \lambda_B)}{\mu_A - \mu_B} - 1 = 1 - \tfrac{1}{2}k - 1 = -\tfrac{1}{2}k \qquad (1.2)$$

So for example if k from Equation 1.1 is 20%, indicating that the carryover effect is 20% of the treatment effect, the expectation of this contaminated treatment estimate will underestimate the true treatment effect by 10% (Bias = −10%). The treatment effect is underestimated when k is positive, often because the positive effect of a treatment lingers into the subsequent period. For example, in the study to improve walking post-stroke, a device could have a positive impact on strength that continues into a subsequent period where no device is used, leading to an underestimate of the effect of the device. The treatment effect is overestimated when k is negative, for example when there is a rebound effect, and a treatment yields a negative effect in the subsequent period. This type of carryover may have occurred in a study where a child had insomnia after taking methylphenidate, possibly leading to depressed performance on the subsequent day and an inflated estimate of the true improvement on homework performance [9].

Analysis
Under the null hypothesis, there is no treatment effect while under the alternative there is a treatment effect, i.e.:

$$H_0: \mu_A - \mu_B = 0$$
$$H_A: \mu_A - \mu_B \neq 0 \qquad (1.3)$$

We briefly describe the analysis of data arising from continuous or discrete outcomes, as well as censored time-to-event data [15,17]. These methods are described for the 2 × 2 trial but extend readily to more complex designs.

Continuous outcomes
For continuous outcomes the observations on each subject can be reduced to a single observation by constructing paired differences between periods for each subject [18]. The treatment effect is estimated as half the difference between the two sequences in the mean of these paired differences, as described above (‘The model without carryover’, ‘The model with carryover’). Under the null hypothesis, the P1 − P2 differences have identical means in the two sequences (Table 10.2). Under the alternative hypothesis, the expected difference between the two sequence means is twice the treatment effect. If it is reasonable to assume normality, or if the sample size is large, a two-sample t-test can be used for hypothesis testing. (A paired t-test can be used if the number of subjects on each sequence is identical; a paired t-test will be biased if there are period effects and the number of subjects per sequence differs.) When the sample size is small a permutation test, generally in the form of the Wilcoxon rank sum test, should be used to maintain a valid test. If the distribution of the outcome is believed
to deviate from normality, either because of outliers or skewness, a Wilcoxon rank sum test is again the test of choice irrespective of sample size. In addition to being a valid test, the Wilcoxon has better power for heavy-tailed or skewed distributions.
Equivalently, for the 2 × 2 design, and for some of the more complicated designs described below, a model that accounts for the repeated measures on each subject can be constructed to provide both estimates and hypothesis tests. When the outcome is approximately normally distributed, the model may be a simple linear regression, or equivalently an analysis of variance, including terms for treatment and period as well as a fixed effects term for subject, or a mixed effects model where treatment and period are included as fixed effects and subject is included as a random effect. In crossover trials with small sample sizes, normality is difficult to evaluate. Chen and Wei provide guidance for robust methods for analysis when sample sizes are small [19].

Binary outcomes
The approach is similar for binary outcomes (e.g., any vs. no improvement). It is simplest to reduce the data to one of two outcomes, i.e., improvement on one of the periods compared to the other, or no difference between periods. Under the null hypothesis, results should be similar across sequences. These data can be analyzed using an exact test, either by dropping the outcomes where the results are tied and using Mainland-Gart’s approach, essentially Fisher’s exact test, or by Prescott’s extension of Fisher’s exact test to incorporate information from ties into the 2 × 3 contingency table [20]. A more general approach models the binary or categorical outcome as a function of treatment and period, and can be carried out using a marginal approach implemented with generalized estimating equations (GEE) or a model where the subject effects are considered random [21,22]. These approaches both account for correlation among repeated responses on the same individual, but they answer subtly different questions. The marginal approach makes inference about response averaged across the population where the question of interest is: ‘On average do the odds of response differ for patients receiving different treatments?’ In contrast the mixed effects model is used to ask: ‘Are the odds of response different among treatments for patients receiving both treatments?’ Random effects models are fitted using either conditional logistic regression, which eliminates any between-subject effects, or using a generalized linear mixed effects model [22].

Censored data
As a hypothetical example, in a study of treatments to prevent seizures, the primary outcome could be time to first seizure in each (fixed-length) period. The observation is censored if no seizure occurs in a period. Hypothesis tests for censored data are constructed using a modified version of the Wilcoxon rank sum test taking into account whether censoring is absent or present in one or both periods [23]. Estimates and confidence intervals can also be derived.

Reducing the impact of carryover
Unwanted bias in the estimated treatment effect attributable to a carryover effect has already been described (‘Unequal carryover’). Here we describe approaches to mitigate carryover effects.

Washout periods
‘Sufficient washout periods’ are often recommended to reduce or eliminate carryover effects. In practice it can be difficult to define ‘sufficient’. For a pharmacologic intervention, knowledge of kinetics can be valuable in the planning stages. For example, after seven half-lives less than 1% of the agent remains; at this time point meaningful pharmacologic carryover is removed for many drugs. If the effect of the agent is reasonably rapid and the outcome is closely related to physiological concentration then this might be all of the information that is needed. Other situations might be more complicated if pharmacodynamic effects persist beyond the physical elimination of the drug. The trial of donepezil on cognitive function in Parkinson’s disease used a washout equivalent to 17 half-lives of donepezil, effectively eliminating a pharmacologic carryover [11]. Here the primary outcome was ADAS-cog and we were concerned about carryover related to a training effect, i.e., improved performance over repeated administration of the ADAS-cog instrument. We anticipated period effects reflecting improvement across periods related to the training effect. However if donepezil were effective, outcomes would be better on donepezil than placebo in the first period, and the donepezil effect might carry over into the second period as an enhanced training effect. Using the model described in Table 10.2 the treatment effect would be underestimated. In this trial we used a long washout period primarily to mitigate non-pharmacologic carryover. We note that if donepezil were ineffective (null hypothesis in Equation 1.3 true) any carryover for donepezil and placebo would
likely be identical, and under the model in Table 10.2, the test would be valid [24]. In practice there are many scenarios where it is reasonable to assume that hypothesis tests are valid under the null hypothesis even when the possibility of carryover may bias the estimate under the alternative hypothesis.

The two-stage approach
It was once common to base the analysis of a crossover trial on a preliminary test for carryover effects; to this day investigators sometimes report that a test for carryover proved negative [25]. In the two-stage approach the analysis proceeded as described above (‘Continuous outcomes’) when the test for carryover was negative; if unequal carryover was detected the analysis used the data from the first period of the study, essentially turning the study into a parallel group design and discarding the information from the second period of the study. This approach has numerous problems. First, it is essentially a sequential testing approach, but without proper adjustment for the multiple testing, leading to inflated type I error rates for the test of the treatment effect [26]. Second, the test for unequal carryover is based on a between-subject comparison and the power to detect even large carryover effects is dismal, often comparable to the type I error rate of the test [24]. Declaring the carryover effect absent generally reflects nothing more than lack of statistical power. Lastly, in the unlikely event that unequal carryover is detected, the subsequent test for the treatment effect using only the first period data generally has little power. Instead of the two-stage approach, the study should be carefully designed to reduce potential carryover effects. Moreover in planning the study, a sensitivity analysis can be performed using Equation 1.2 to determine how carryover effects of different magnitudes might impact the estimated treatment effect. Similarly power can be calculated for the combined treatment and carryover effect (Table 10.2, last line ‘Overall effect’) and compared to the power determined using the desired treatment effect. If treatment effects may be underestimated as a result of carryover effects but the magnitude of the possible bias in the estimate is acceptable to the investigator, sample sizes may be adjusted upward to achieve the desired power for a study.

Baselines and carryover effects
Baselines are measurements collected post-randomization but prior to the start of treatment. Baselines may be collected once at the start of the study or prior to each treatment period. Investigators sometimes use the difference between the baseline and outcome as the primary outcome for the analysis. This procedure may have unexpected consequences [27]. Let λA(Bsl) and λB(Bsl) be the carryover of treatments A and B into the baseline measurement for period 2. Following Table 10.2, taking the difference of the outcome during the active period less its baseline values yields an expectation for the estimated treatment effect of:

$$(\mu_A - \mu_B) - \tfrac{1}{2}(\lambda_A - \lambda_B) + \tfrac{1}{2}\left(\lambda_A^{(Bsl)} - \lambda_B^{(Bsl)}\right) \qquad (1.4)$$

As in Equation 1.1

$$k^{(Bsl)} = \frac{\lambda_A^{(Bsl)} - \lambda_B^{(Bsl)}}{\mu_A - \mu_B}, \qquad (1.5)$$

and the bias in the expected outcome is:

$$\text{Bias} = \left[1 + \tfrac{1}{2}\left(k^{(Bsl)} - k\right)\right] - 1 = -\tfrac{1}{2}\left(k - k^{(Bsl)}\right) \qquad (1.6)$$

This analysis produces a biased estimate of the treatment effect unless λA − λB = λA(Bsl) − λB(Bsl) or equivalently λA − λA(Bsl) = λB − λB(Bsl). The analysis eliminates bias only if differences in carryover between outcome and baseline are identical for the two treatments. Otherwise collecting baselines just alters the bias. In a more realistic scenario, if treatment B is a placebo with no carryover and carryover from treatment A decreases in the interval between baseline and outcome, then subtracting baseline values changes the sign of the bias, e.g., if k(Bsl) = 40% and k = 20%, the treatment effect is overestimated by 10% (Bias = 10%), compared to ‘Unequal carryover’ (above) where, without baselines, the treatment effect was underestimated by 10%.

Study design
Alternatives to the 2 × 2 design are used to increase efficiency, provide unbiased estimates in the presence of carryover effects, and to compare more than two treatments. These topics are reviewed along with several recent innovations in design including response adaptive designs, matching and N of 1 trials.

Designs to address carryover effects
An extensive literature describes ways of choosing the number of periods and treatment sequences to maximize statistical power while simultaneously allowing
unbiased estimates of the treatment effect in the presence of carryover effects [28]. Much thought has gone into realistic models of carryover in these studies, for example allowing carryover to depend on the treatment that induces carryover as well as the treatment administered in the period where the carryover occurs. For example, carryover from an active treatment, say A, for headache, may differ depending on whether the subsequent period involves treatment with placebo or a second period of treatment A at a different dose. This type of carryover, called a mixed carryover effect, replaces the less realistic ‘simple carryover’ model that assumes that carryover from treatment A is unaffected by treatment in the subsequent period.
Designs that yield unbiased estimates of the treatment effect in the presence of carryover effects tend to have power that is intermediate to the 2 × 2 design and the parallel group design. While washout periods can eliminate or dramatically reduce carryover, they may be ruled out either for ethical or logistical reasons. Patients may not tolerate the washout, or an effective washout would lengthen the study to the extent that extensive loss to follow-up might occur in the later periods. Here these alternative crossover designs can provide an efficient alternative to the parallel group design.

Baselines and efficiency
Kenward and Roger [29] thoroughly reviewed the use of baselines in crossover trials recommending analysis of these data using ANCOVA and concluding that baselines may improve efficiency, particularly when there is information about the treatment effect that can be gained from between-subject information (see also [30]). Specifics of the ANCOVA analysis are described in [29]. The less desired alternative uses the change from baseline as the outcome in the analyses described above (‘Continuous outcomes’). The efficiency of this approach depends on the decay in the correlation between repeated measurements over time. The method may have better efficiency when the baselines are collected relatively close in time to that of the outcome, and the washout period is long. Under these conditions the baseline and outcome measurement within a period are more highly correlated than say the baseline and outcome from the subsequent period. However the analysis of change from baseline may have worse efficiency than an analysis without baselines if the baseline measure is equally correlated with outcomes from multiple periods. Here ANCOVA should be used.

Matched crossover designs
Modern genomic techniques allow the possibility of individualizing treatments to patients based on their genetic profile. More generally there is interest in tailoring treatments to individual subjects. For example, in asthma, patients with Arg/Arg at the 16th position of the beta-agonist receptor gene may respond differently to inhaled albuterol than those with the Gly/Gly genotype [31]. Similar scenarios are easily envisioned for neurological phenotypes associated with complex underlying genotypes. In the asthma study, individual crossover studies for each group are sufficient if it is of interest to know whether albuterol is more effective than placebo for each group [30]. A design where patients are matched based on their genotype and baseline function, and then randomized to a crossover sequence is usually preferred if the question of interest is whether albuterol is more effective for Arg/Arg than Gly/Gly. The matched design is more efficient than individual studies as long as correlation of the paired subjects on the same treatment is greater than their correlations on different treatments.

Response adaptive designs
Balaam’s design is a two-treatment, four-sequence design which adds two sequences, one with two periods of A and one with two periods of B to the 2 × 2 design (Figure 10.4). Balaam’s design can be the basic design for an appealing response adaptive design where patients are initially randomized to one of the four sequences but over time the probability of allocation to a sequence is altered by the relative success of the treatments. For example if treatment A is consistently superior to B, patients over time are increasingly allocated with higher probability to the AA sequence.

Sequence    Period
            1    2
AB          A    B
BA          B    A
AA          A    A
BB          B    B
Figure 10.4. Balaam’s design: Re-randomization or response adaptive design.
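To make the idea of response adaptive allocation over Balaam’s four sequences concrete, the Python sketch below shows one possible way allocation probabilities could drift toward the AA or BB sequence as successes accumulate. The update rule, the function names and the success counts are all hypothetical and chosen purely for illustration; they are not the allocation rules described in the response adaptive design literature, and a real trial would use a rule specified and justified in its protocol.

```python
import random


def allocation_probabilities(successes_a, successes_b):
    """Hypothetical rule: the AA and BB weights grow with the number of
    successes observed so far on treatments A and B; AB and BA stay fixed."""
    weights = {
        "AB": 1.0,
        "BA": 1.0,
        "AA": 1.0 + successes_a,
        "BB": 1.0 + successes_b,
    }
    total = sum(weights.values())
    return {sequence: weight / total for sequence, weight in weights.items()}


def draw_sequence(successes_a, successes_b, rng):
    """Randomly allocate the next patient using the current probabilities."""
    probs = allocation_probabilities(successes_a, successes_b)
    sequences = list(probs)
    return rng.choices(sequences, weights=[probs[s] for s in sequences], k=1)[0]


# At the start of the trial all four sequences are equally likely.
start = allocation_probabilities(0, 0)
print(start)  # {'AB': 0.25, 'BA': 0.25, 'AA': 0.25, 'BB': 0.25}

# After six successes on A and none on B, AA dominates the allocation.
later = allocation_probabilities(6, 0)
print(later)  # {'AB': 0.1, 'BA': 0.1, 'AA': 0.7, 'BB': 0.1}

next_sequence = draw_sequence(6, 0, random.Random(0))
```

Under this toy rule the crossover sequences AB and BA are never dropped entirely, so some within-subject comparison of the two treatments is retained as allocation shifts.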
More information on response adaptive designs as they relate to crossover studies appears in [32,33]. Response adaptive designs are appealing because they ultimately allocate more patients to the better treatment sequence. However, they are more complicated to administer and potentially less efficient than a fixed allocation scheme.

N of 1 trials
Most clinical trials answer questions about mean differences in response to treatment for patients eligible to enroll in the trial, with little information to guide the clinician on how individual patients may respond. Matching (see above, ‘Matched crossover designs’) addresses one approach to individualizing the information from a trial, but still asks questions about the mean response in subsets of patients. N of 1 trials are designed to determine which of two treatments is more effective for a particular individual of interest [9,34]. Generally this design involves assigning treatment pairs to an individual in random order. For example in [9] the study design specified that methylphenidate and no intervention be assigned in random order to a child with ADHD on Monday and Tuesday, and again on Wednesday and Thursday. The study duration was 7 weeks and the primary outcome was independence in homework completion assessed by a blinded observer. This study could be analyzed using the two-sample t-test or Wilcoxon rank sum test on the P1 − P2 differences as described above (‘Analysis’) if the outcome were continuous, or using Fisher’s exact test for a binary outcome (e.g., a binary indicator of whether homework was completed independently). Here each Monday/Tuesday and Wednesday/Thursday pair is considered an observation on period 1 and 2 of a single sequence. (For this study a 5th observation was also collected each week on Friday. For this reason, instead of analyzing paired differences the analysis was carried out on the unpaired data using only a two-sample t-test, an approach which maintains Type I error as long as the intra-class correlation is non-negative.)

Designs for more than two treatments
Crossover designs in neurology often involve more than two experimental conditions (see Table 10.1). To treat pain, investigators compared placebo to two agents individually and in combination. To study hyperactivity in subjects with pervasive developmental disorders, a number of doses of methylphenidate were of interest [10]. Table 10.1 includes studies with two approaches to design when the number of periods available for study is equivalent to the number of treatments: using a Latin square design or simply randomizing patients to a treatment order. A Latin square is a block with t columns corresponding to periods and t rows corresponding to treatment sequences where t is the number of treatments (Figure 10.5). Each treatment appears in each row and each column exactly once. The 2 × 2 design is the simplest example of a Latin square. As we showed for the 2 × 2 design, Latin squares give unbiased estimates of treatment effects in the presence of period effects. When the order of treatments is simply randomized, care must be taken in the analysis to avoid introducing bias due to period effects.
Designing a crossover trial with more than two treatments is more complicated when either carryover effects need to be considered or when the number of periods differs from the number of treatments. Simple carryover depends only on the treatment in the period prior to when carryover occurs. Senn has written extensively about the irrelevance of simple carryover to clinical research [18]. While these arguments have merit, the simple carryover assumption yields a mathematically tractable model leading to the use of ‘balanced’ Latin square designs where not only does a treatment appear in each period and sequence once, but where each treatment follows every other treatment the same number of times (see Figure 10.6). These designs are intricate, involving multiple sequences. Jones and Kenward [15] provide a detailed description of the issues involved in designing such studies.

Sequence    Period
            1    2    3
ABC         A    B    C
CAB         C    A    B
BCA         B    C    A
Figure 10.5. Latin square: Three treatment design.

Sequence    Period
            1    2    3    4
ABDC        A    B    D    C
BCAD        B    C    A    D
CDBA        C    D    B    A
DACB        D    A    C    B
Figure 10.6. Balanced Latin square: Four treatment design.
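The two defining properties described above, each treatment appearing once in every row and column, and, for a balanced square, each treatment following every other treatment equally often, are easy to check mechanically. The Python sketch below is an illustration only; the function names are hypothetical, and the sequences are taken to be ABC, CAB, BCA for the three-treatment design and ABDC, BCAD, CDBA, DACB for the four-treatment design.

```python
def is_latin_square(sequences):
    """True if each treatment appears exactly once in every row (sequence)
    and every column (period)."""
    treatments = set(sequences[0])
    t = len(treatments)
    if len(sequences) != t or any(len(seq) != t for seq in sequences):
        return False
    rows_ok = all(set(seq) == treatments for seq in sequences)
    cols_ok = all(set(col) == treatments for col in zip(*sequences))
    return rows_ok and cols_ok


def is_balanced(sequences):
    """True if every treatment immediately follows every other treatment
    the same number of times across all sequences."""
    counts = {}
    for seq in sequences:
        for previous, current in zip(seq, seq[1:]):
            counts[(previous, current)] = counts.get((previous, current), 0) + 1
    treatments = set("".join(sequences))
    pairs = [(a, b) for a in treatments for b in treatments if a != b]
    return len({counts.get(pair, 0) for pair in pairs}) == 1


three_treatment = ["ABC", "CAB", "BCA"]
four_treatment = ["ABDC", "BCAD", "CDBA", "DACB"]

print(is_latin_square(three_treatment), is_balanced(three_treatment))  # True False
print(is_latin_square(four_treatment), is_balanced(four_treatment))    # True True
```

The three-treatment square protects against period effects but is not balanced for simple carryover (B always follows A, never C), whereas in the four-treatment square every ordered pair of treatments occurs exactly once in adjacent periods.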
Logistics
Crossover trials have logistical challenges beyond the careful planning and implementation that accompanies any successful clinical trial. The design requires repeated contact with patients, possibly over a prolonged period of time, with increased risk of drop outs. Recruitment can be slow if subjects with neurological conditions need to make repeated visits to a medical center, particularly if caretaker support is needed to make the visit. Patients must be randomized to a sequence of at least two treatments so the investigator must carefully plan how this will occur, particularly if it is a blinded study. In a blinded study, the risk for un-blinding increases when each patient can compare multiple treatments; careful attention to the preparation of the experimental treatment is needed if blinding is to be maintained. Lastly, the investigator must balance the desirability of a washout period (see above) with the risk that patients become non-compliant during the washout and seek alternative treatment, particularly if their condition deteriorates during the washout due to inactive treatment.

Missing or incomplete data
In Table 10.1, the sample size in the analysis was often smaller than the number of subjects who enrolled. Often this reflects loss to follow-up (See Chapter 5 for more on bias and random error). The study report should describe patients who enroll but contribute no observations to, at least qualitatively, understand how the missing data might influence the study’s generalizability. The analysis can use information from patients with ‘incomplete’ observations, i.e., subjects who contribute at least one observation and are missing at least one observation with the analysis depending on the so-called missing data mechanism [21,22]. Data missing completely at random (MCAR) are missing without regard to either the observed or unobserved data in the study. For example if a machine randomly fails to measure an outcome, the missing data are MCAR. The observed data are a random sample of the complete data that would have been observed had the machine not malfunctioned. Information about data that are missing at random (MAR) is contained solely in the observed data. For example in the stroke rehabilitation study, a patient who felt success walking using the first device she tried and subsequently refused to try other devices would have MAR data [7]. Lastly, data that are not missing at random (NMAR) or non-ignorable, have a missingness mechanism that depends on information that is unobserved. For example, if a patient from a Parkinson’s disease study had relatively intact cognitive function at all follow-up visits that he attended, but then withdrew from the trial due to a sudden and sharp deterioration in condition (and such deterioration was never observed), the subsequent missing data would reasonably be classified as NMAR [11].
The assumption regarding the missing data mechanism is critical for the analysis but in general these assumptions are untestable with the available data. Thus the investigator chooses a method based on best available knowledge about the trial. For data that are MCAR, dropping those cases with missing data in a ‘complete case analysis’ yields valid, albeit inefficient, inference. Data that are either MCAR or MAR both yield valid inference in likelihood-based models such as mixed effects models. For example in a 2 × 2 design, subjects with missing data in one period contribute to a between-subject component estimate of the treatment effect in this approach. The likelihood-based approach is preferred over the complete case analysis for MCAR because of the efficiency gained by using all of the data collected in the study. Likelihood-based approaches do require strong distributional assumptions that may not be justifiable; several more robust approaches are available [17]. One approach that should never be used is ‘last observation carried forward’ (LOCF). The very strong assumptions LOCF makes are rarely justified in practice and may introduce bias into the estimates [22]. Lastly, NMAR requires advanced statistical methods and is generally used as a sensitivity analysis rather than as the pre-specified analysis; Simon and Chinchilli [35] suggest one approach for using an NMAR analysis for the 2 × 2 paired crossover trial.

References
1. Chow S, Liu J. Design and Analysis of Bioequivalence Studies. 3rd ed. Boca Raton, FL: Chapman & Hall, 2007.
2. Constantinescu R, McDermott MP, Dicenzo R, et al. A randomized study of the bioavailability of different formulations of coenzyme Q(10) (ubiquinone). J Clin Pharmacol 2007; 47: 1580–6.
3. Cohen AS, Burns B and Goadsby PJ. High-flow oxygen for treatment of cluster headache: A randomized trial. JAMA 2009; 302: 2451–7.
4. Schytz HW, Birk S, Wienecke T, et al. PACAP38 induces migraine-like attacks in patients with migraine without aura. Brain 2009; 132: 16–25.
5. Hauge AW, Asghar MS, Schytz HW, et al. Effects of tonabersat on migraine with aura: a randomised,
double-blind, placebo-controlled crossover study. 20. Prescott RJ. he comparison of success rates in cross-
Lancet Neurol 2009; 8: 718–23. over trials in the presence of an order efect. Appl Stat
6. Gilron I, Bailey JM, Tu D, et al. Morphine, gabapentin, 1981; 30: 9–15.
or their combination for neuropathic pain. N Engl J Med 21. Diggle P, Heagerty P, Liang KY and Zeger S. Analysis
2005; 352: 1324–34. of Longitudinal Data. 2nd ed. New York: Oxford
7. Tyson SF and Rogerson L. Assistive walking devices University Press, 2002.
in nonambulant patients undergoing rehabilitation 22. Fitzmaurice GM, Laird NM and Ware JH. Applied
ater stroke: the efects on functional mobility, walking Longitudinal Analysis. Chichester: John Wiley & Sons,
impairments, and patients’ opinion. Arch Phys Med 2004.
Rehabil 2009; 90: 475–9.
23. Feingold M and Gillespie BW. Cross-over trials with
8. Adler CH, Hauser RA, Sethi K, et al. Ropinirole for censored data. Stat Med 1996; 15: 953–67.
restless legs syndrome: a placebo-controlled crossover
24. Putt ME. Power to detect clinically relevant carry-over in
trial. Neurology 2004; 62: 1405–7.
a series of cross-over studies. Stat Med 2006; 25: 2567–86.
9. Proschan M. Self-experimentation and web trials.
25. Brimacombe J, Keller C, Eschertzhuber S and
Chance 2008; 21: 7–9.
Hohlrieder M. he problem of cross-over design in
10. Research Units on Pediatric Psychopharmacology airway studies: a reply. Aneasthesia 2009; 64: 919.
(RUPP) Autism Network. Randomized, controlled,
crossover trial of methylphenidate in pervasive 26. Freeman PR. he performance of the two-stage analysis
developmental disorders with hyperactivity. Arch Gen of two-treatment, two-period crossover trials. Stat Med
Psychiatry 2005; 62: 1266–74. 1989; 8: 1421–32.
11. Ravina B, Putt M, Siderowf A, et al. Donepezil for 27. Fleiss JL. A critique of recent research on the two-
dementia in Parkinson’s disease: a randomised, double treatment crossover design. Control Clin Trials 1989; 10:
blind, placebo controlled, crossover study. J Neurol 237–43.
Neurosurg Psychiatry 2005; 76: 934–9. 28. Hedayat AS and Stuken J. Optimal and eicient
12. Putt ME and Ravina B. Randomized, placebo- crossover designs under diferent assumptions about
controlled, parallel group versus crossover study the carryover efects. J Biopharm Stat 2003; 13: 519–28.
designs for the study of dementia in Parkinson’s disease. 29. Kenward MG and Roger JH. he use of baseline
Control Clin Trials 2002; 23: 111–26. covariates in crossover studies. Biostatistics 2010; 11:
13. Borm GF, Fransen J and Lemmens WA. A simple sample 1–17.
size formula for analysis of covariance in randomized 30. Liang Y and Carriere KC. On the role of baseline
clinical trials. J Clin Epidemiol 2007; 60: 1234–8. measurements for crossover designs under the self and
14. Nason M and Follman D. Design and analysis of mixed carryover efects model. Biometrics 2010; 66:
crossover trials for absorbing binary endpoints. 140–8.
Biometrics 2010; 66: 958–65. 31. Simon LJ andChinchilli VM. A matched crossover design
15. Jones B and Kenward MG. Design and Analysis of Cross- for clinical trials. Contemp Clin Trials 2007; 28: 638–46.
Over Trials. 2nd ed. Boca Raton, FL: Chapman & Hall/ 32. Liang Y and Carriere KC. Multiple-objective response-
CRC, 2003. adaptive repeated measurement designs for clinical
16. Willan AR and Pater JL. Carryover and the two-period trials. J Stat Plan Inf 2009; 139: 1134–45.
crossover clinical trial. Biometrics 1986; 42: 593–9. 33. Bandyopadhyay U, Biswas A and Mukherjee S.
17. Vonesh EF and Chinchilli VM. Crossover Trials. Adaptive two-treatment two-period crossover design
Linear and nonlinear models for the analysis of repeated for binary treatment responses incorporating carry-
measurements. New York: Marcel Dekker Inc, 1997; over efects. Stat Meth Appl 2009; 18: 33.
119–201. 34. Guyatt G, Sackett D, Taylor DW, et al. Determining
18. Senn S. Cross-over trials in Clinical Research. 2nd ed. optimal therapy – randomized trials in individual
Chichester, John Wiley & Sons, 2002. patients. N Engl J Med 1986; 314: 889–92.
19. Chen X and Wei L. A comparison of recent methods 35. Simon LJ and Chinchilli VM. A pattern mixture model
for the analysis of small-sample cross-over studies. Stat for a paired 2 × 2 crossover design. IMS Collections
Med 2003; 22: 2821–33. 2008, 1: 257–271.
Chapter 11
Two-period designs for evaluation of disease-modifying treatments
Michael P. McDermott
Introduction
Many pharmacologic agents have been developed in recent years for the treatment of certain progressive neurological diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD). Cholinesterase inhibitors such as tacrine, donepezil, rivastigmine, and galantamine and the glutamate antagonist memantine have been US FDA-approved for treatment of the symptoms of AD. Vitamin E has also been suggested to be beneficial in AD [1]. A wider array of treatments is available for the motor symptoms of PD, including levodopa, dopamine agonists (e.g., pramipexole and ropinirole), monoamine oxidase type B (MAO-B) inhibitors (e.g., selegiline and rasagiline), amantadine, anticholinergics, and catechol-O-methyl transferase (COMT) inhibitors (e.g., entacapone and tolcapone). Surgical treatments such as deep brain stimulation and pallidotomy are also employed later in the disease course. While these treatments have been established as efficacious, none have been conclusively shown to modify the underlying course of the disease and most are believed to exert their effects only on disease symptoms.

There is great interest in the problem of designing clinical trials to establish the extent to which a treatment has disease-modifying effects, symptomatic effects, or both in neurodegenerative diseases. Indeed, discovering a treatment that either slows, halts, or even reverses underlying disease progression has been termed the ‘highest priority in PD research’ [2]. The issue of disease modification is of paramount importance to many constituencies, including: 1) people with the disease who are seeking improved quality of life for a longer period of time, if not a cure; 2) governments who will have to confront increasing drains on health care resources and increased health care costs as their populations age; 3) research scientists who seek clearer understanding of the mechanisms that underlie diseases and treatments; 4) pharmaceutical companies who seek product differentiation and an increasing market share; and 5) regulatory agencies such as the FDA and European Medicines Agency (EMA) who will need to make decisions concerning the necessary evidentiary standards to approve a new treatment for an indication of disease modification. All of these constituencies, of course, are also motivated by the desire to help people who have disease.

The term disease modification implies that the treatment has exerted an enduring effect on the course of the underlying disease. For example, in AD this may mean that a key pathological feature of the disease has been modified, such as tau and β-amyloid protein levels in the brain [3]. In PD, it may mean that the rate of loss of catecholaminergic neurons, primarily the dopaminergic projection from the substantia nigra to the striatum, has been altered [4]. Alternatively, it may reflect an alteration in the physiological compensatory mechanisms in PD [5–7]. In either case, for a disease-modifying effect to be important, the impact on the underlying disease would have to be accompanied by a measurable benefit on the clinical course of the disease [8]. This is in contrast to treatments that ameliorate the symptoms of the disease without affecting the underlying disease process. When such a treatment is discontinued, the effect of the treatment disappears in a relatively short period of time.

In order to establish that a treatment has an impact on underlying disease progression, a clinical trial must clearly distinguish between the symptomatic and disease-modifying effects of the treatment. It would not be difficult to design such a trial if a valid marker of the underlying progression of the disease were available.
Although a considerable amount of research has been (and continues to be) devoted to establishing such markers, these efforts have, so far, been unsuccessful. As a result, special trial designs have been developed that attempt to distinguish the symptomatic and disease-modifying effects of treatment using clinical outcome measures. These designs, termed ‘two-period designs’ [9], include the so-called withdrawal and delayed-start (or ‘staggered-start’) designs and their variations. This chapter describes these study designs in terms of their rationale, assumptions, design features, implementation, statistical analysis, and sample size considerations. Important limitations of the designs are also discussed. To date, published results are available for only three trials in neurodegenerative disease (all in PD) that have used the two-period design. Additional experience with this design will ultimately determine its usefulness in discerning the mechanisms of treatment effects (symptomatic, disease-modifying, or both) in neurodegenerative disease.

Problems with single-period designs
One of the earliest examinations of disease modification took place in the Deprenyl and Tocopherol Antioxidative Therapy of Parkinsonism (DATATOP) trial, which was designed to test the hypothesis that selegiline and vitamin E slowed the progression of PD [10–11]. Eight hundred participants with early, untreated PD were randomized to receive selegiline, vitamin E, both treatments in combination, or placebo in a 2 × 2 factorial design. The primary outcome variable was the time from randomization until the development of disability sufficient to require treatment with dopaminergic therapy, as judged by the enrolling investigator. It was assumed that neither of the study interventions had a symptomatic effect, and that the study design, a standard parallel group trial, would be sufficient to demonstrate disease-modifying effects. Although a pronounced beneficial effect of selegiline on the primary outcome variable was demonstrated, an unanticipated short-term symptomatic effect of selegiline was also apparent [10], making the results difficult to interpret with respect to mechanism.

The DATATOP study is an example of a trial that attempted to use an important disease milestone as an outcome variable to measure the disease-modifying effects of an intervention. A virtually identical design was used in a trial of selegiline and vitamin E in AD in which the milestone was death, institutionalization, loss of basic activities of daily living, or a diagnosis of severe dementia, whichever occurred first [1]. The difficulty with this strategy is that such endpoints can be influenced by symptomatic effects as well as disease-modifying effects. This is true even of mortality, which may be delayed by the beneficial consequences of symptomatic prevention of a decline in function.

Others have suggested that standard parallel group designs can be used to address the issue of disease modification by examining the pattern of mean responses over time on a suitable clinical rating scale [12–13]. Even if the pattern of change over time is linear in each treatment group, a group difference in the rate of change (slope) does not necessarily indicate an effect of treatment on the underlying progression of the disease. This pattern is also compatible with the interpretation of a very slow-onset symptomatic effect [14]. It may also arise if the symptomatic effect of the treatment changes as a function of time. For example, the magnitude of the symptomatic effect in a participant may increase as the underlying disease worsens, as the score on the clinical rating scale worsens, or as the participant ages. Indeed, such a pattern might be expected with some treatments in neurodegenerative disease. It is thus clear that a divergence in mean response over time cannot necessarily be attributed to a disease-modifying effect of the intervention.

As noted above, standard single-period parallel group designs could be used to establish a disease-modifying effect of a drug if valid measures of the underlying neurodegenerative process were available. In PD, two imaging outcomes have been explored in this regard. Striatal uptake of fluorodopa, as determined by PET imaging, has been investigated as a measure of the capacity of dopamine neurons to decarboxylate and store levodopa/dopamine [15]. Similarly, striatal uptake of β-CIT, as determined by single photon emission computerized tomography (SPECT), has been examined as a measure of the density of dopamine transporters on presynaptic dopamine terminals [16]. Both markers have demonstrated a characteristic pattern of asymmetric signal loss primarily in the posterior putamen in PD patients and appear to decline linearly over time [17], but have yielded ambiguous results in clinical trials comparing dopamine agonists with levodopa [15–16], possibly due to differential acute effects of these drugs on the dopamine transporter [17–18]. As a result, there remain concerns with these outcomes as measures of the underlying neurodegenerative process in PD. Several candidate biomarkers have
been proposed in AD, including CSF levels of tau and β-amyloid protein, regional and whole brain atrophy on MRI, and imaging of amyloid plaques; these and other proposed markers remain to be established as valid measures of AD progression [8,13].

Two-period designs

Withdrawal design
In the context of AD, Leber [19] formally introduced the concept of the two-period design to investigate the disease-modifying effects of an intervention. One such design is the so-called withdrawal design in which participants are randomly assigned to receive either active treatment or placebo in the first period (Period 1) and followed for a fixed length of time. All participants are then given placebo in the second period (Period 2), i.e., those on active treatment are withdrawn from that treatment and switched to placebo, and those on placebo continue to receive placebo (Figure 11.1). The two periods do not have to be of equal length; Period 1 is chosen to be long enough to allow any disease-modifying effect of the treatment to become apparent, and Period 2 is chosen to be long enough to eliminate (or wash out) any symptomatic effect of the treatment from Period 1. Any group difference in mean response at the end of Period 2 in favor of the group receiving active treatment in Period 1 may then be attributed to a disease-modifying effect of the treatment. This design has been previously employed. For example, in the DATATOP trial, participants had study medications withdrawn after either reaching the study endpoint (disability sufficient to require dopaminergic treatment) or completing their final evaluation and were re-evaluated 1–2 months later [11].

The purpose of the withdrawal maneuver is to determine whether any portion of the treatment effect that is evident at the end of Period 1 persists after withdrawal of treatment, i.e., to distinguish between the short-term symptomatic effect and the disease-modifying effect. A key assumption is the adequacy of the length of the withdrawal period (Period 2). In the DATATOP trial, although the mean response on the Unified Parkinson’s Disease Rating Scale (UPDRS) total score in participants originally receiving selegiline remained slightly better than that in participants not receiving selegiline, this may have been due to the relatively short duration of the withdrawal period (1–2 months) [11]. In the Early vs. Late L-dopa in Parkinson’s Disease (ELLDOPA) trial, participants were randomized to receive one of three dosages of levodopa or matching placebo and followed for 40 weeks, after which they underwent a 2-week withdrawal of study medication [20]. Participants receiving levodopa continued to have substantially better mean UPDRS total scores than those receiving placebo after the withdrawal period, but this again may have been due to the short duration of the withdrawal period. It should be noted that the underlying hypothesis being tested in the ELLDOPA trial was that levodopa would be associated with a worsening of PD progression.
[Figure: Withdrawal design. Mean change in UPDRS score (vertical axis) across Period 1 and Period 2 (horizontal axis) for the P/P and A/P groups.]
Figure 11.1. Illustration of the withdrawal design, in which trial participants are randomly assigned to receive either active (A) or placebo (P) treatment in Period 1 followed by placebo treatment for all participants in Period 2. The notation ‘A/P’ indicates the group that received active treatment in Period 1 followed by placebo treatment in Period 2. The outcome variable is the mean change in the Unified Parkinson’s Disease Rating Scale (UPDRS) total score, where positive changes indicate worsening. Disease modification is supported by a persisting difference in mean response between the A/P and P/P groups at the end of Period 2.
[Figure: Delayed start design. Mean change in UPDRS score (vertical axis) across Period 1 and Period 2 (horizontal axis) for the P/A and A/A groups.]
Figure 11.2. Illustration of the delayed start design, in which trial participants are randomly assigned to receive either active (A) or placebo (P) treatment in Period 1 followed by active treatment for all participants in Period 2. The notation ‘P/A’ indicates the group that received placebo treatment in Period 1 followed by active treatment in Period 2. The outcome variable is the mean change in the Unified Parkinson’s Disease Rating Scale (UPDRS) total score, where positive changes indicate worsening. Disease modification is supported by a persisting difference in mean response between the P/A and A/A groups at the end of Period 2.
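
Figures 11.1 and 11.2 can be related to each other with a small numerical sketch based on the additive mean model that this chapter formalizes in Table 11.1. The symbols (μ1, μ2, αS, αD, α′T, α′′T) follow that table; the function names and all numerical values are hypothetical and chosen only for illustration. The sketch computes the end-of-Period-2 contrasts between group means and illustrates that, under this model, the delayed start contrast equals αD only when α′T = α′′T.

```python
def group_means(mu1, mu2, alpha_s, alpha_d, alpha_t_pa, alpha_t_aa):
    """Mean responses (end of Period 1, end of Period 2) for the four groups.

    Follows the additive model of Table 11.1: alpha_t_pa plays the role of
    alpha'_T (P/A group) and alpha_t_aa the role of alpha''_T (A/A group).
    """
    return {
        "P/P": (mu1, mu2),
        "A/P": (mu1 + alpha_s + alpha_d, mu2 + alpha_d),
        "P/A": (mu1, mu2 + alpha_t_pa),
        "A/A": (mu1 + alpha_s + alpha_d, mu2 + alpha_d + alpha_t_aa),
    }


def contrasts(means):
    """Contrasts between group means at the end of Period 2."""
    end2 = {group: m[1] for group, m in means.items()}
    withdrawal = end2["A/P"] - end2["P/P"]      # equals alpha_D
    delayed = end2["A/A"] - end2["P/A"]         # alpha_D + (alpha''_T - alpha'_T)
    # Difference between differences: alpha''_T - alpha'_T
    alpha_t_diff = (end2["A/A"] - end2["A/P"]) - (end2["P/A"] - end2["P/P"])
    return {
        "withdrawal": withdrawal,
        "delayed_start": delayed,
        "alpha_t_difference": alpha_t_diff,
        # Simple average of the two components; meaningful only if
        # alpha'_T = alpha''_T.
        "averaged": (withdrawal + delayed) / 2,
    }


if __name__ == "__main__":
    # Hypothetical values; negative effects denote benefit on a UPDRS-like scale.
    equal = contrasts(group_means(4.0, 8.0, -3.0, -1.0, -3.0, -3.0))
    unequal = contrasts(group_means(4.0, 8.0, -3.0, -1.0, -3.0, -2.0))
    print("alpha'_T = alpha''_T :", equal)
    print("alpha'_T != alpha''_T:", unequal)
```

In the second scenario the A/A group gains less in Period 2 than the P/A group, and the delayed start contrast no longer recovers the assumed αD even though the withdrawal contrast still does.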
Although the withdrawal design theoretically permits inference about both symptomatic and disease-modifying effects, a potential problem is that there is no blinding with respect to the treatment received during the withdrawal period (Period 2). In addition to the obvious biases that can result, participant retention may become a problem during Period 2, particularly if it is lengthy, since participants will be aware that they are not receiving active treatment. One strategy to address these concerns is to add a third randomized group to the study in which participants remain on active treatment in both Period 1 and Period 2. The advantage of this strategy is that blinding can be maintained throughout the trial. This third group, however, has no value in distinguishing between the disease-modifying and symptomatic effects of the treatment; hence, efficiency is lost [9]. Relatively few participants can be allocated to this third group in order to minimize the loss of efficiency.

Delayed start design
Concerns regarding participant recruitment and retention associated with the withdrawal design motivated Leber to propose an alternative design that he termed the randomized start design [19]; this has also been called the staggered start design and is now commonly known as the delayed start design. The design is the same as the withdrawal design in Period 1, but in Period 2 all participants are given active treatment, i.e., those on placebo are switched to active treatment and those on active treatment continue to receive active treatment (Figure 11.2). As in the withdrawal design, the two periods do not have to be of equal length; Period 1 is chosen to be long enough to allow any disease-modifying effect of the treatment to become apparent, and Period 2 is chosen to be long enough for the treatment to fully exert its symptomatic effect. Any group difference in mean response at the end of Period 2 in favor of the group receiving active treatment during Period 1 may then be attributed to a disease-modifying effect of the treatment.

As in the withdrawal design, a key assumption is the adequacy of the length of Period 2 to ensure that the group first receiving active treatment in Period 2 does not continue to ‘catch up’ to the group that has been receiving active treatment throughout the trial. The delayed start design has been used in three trials in PD to date: the TVP-1012 in Early Monotherapy for Parkinson’s Disease Outpatients (TEMPO) [21] and Attenuation of Disease Progression with Azilect Given Once-Daily (ADAGIO) trials of rasagiline [2, 22] and the Assessment of Potential Impact of Pramipexole on Underlying Disease (PROUD) trial of pramipexole [23–24]. In the TEMPO trial, participants were randomized to receive one of two dosages of rasagiline (1 mg/day or 2 mg/day) or matching placebo and followed for 26 weeks in Period 1. In Period 2, participants in the placebo group were switched to rasagiline 2 mg/day (delayed start group) and participants in the active treatment groups maintained their original treatment assignments for an additional 26 weeks. The
[Figure: two panels, (A) Rasagiline 1 mg/day and (B) Rasagiline 2 mg/day. Mean change in UPDRS total score (vertical axis) against weeks after randomization, 0 to 72 (horizontal axis), for the early start and delayed start groups.]
Figure 11.3. Summary of the ADAGIO trial results for the 1 mg/day (A) and 2 mg/day (B) dosages of rasagiline. Both dosages of rasagiline demonstrated significant benefit relative to placebo at the end of Period 1. For the 1 mg/day dosage, the mean change in the Unified Parkinson’s Disease Rating Scale (UPDRS) total score remained lower in the early start group than in the delayed start group throughout Period 2. For the 2 mg/day dosage, the mean change in UPDRS total score in the delayed start group caught up to that in the early start group.
trial provided evidence that there may be a disease-modifying effect of rasagiline 2 mg/day as the mean response on the UPDRS total score in the early start group remained lower than the mean response in the delayed start group at Week 52 [21].

The TEMPO trial was followed by the confirmatory ADAGIO trial in which participants were randomized with equal allocation to one of four groups: 1) rasagiline 1 mg/day for 72 weeks (36 weeks in Period 1 and 36 weeks in Period 2); 2) placebo for 36 weeks followed by rasagiline 1 mg/day for 36 weeks; 3) rasagiline 2 mg/day for 72 weeks; and 4) placebo for 36 weeks followed by rasagiline 2 mg/day for 36 weeks. The trial produced conflicting results [22]. Counter to expectations, rasagiline 2 mg/day did not appear to have a disease-modifying effect as the delayed start group caught up to the early start group in terms of mean response on the UPDRS total score during Period 2 (Figure 11.3). The 1 mg/day dosage, however, yielded a classic pattern of mean UPDRS total scores over time that would be
[Figure: Complete two-period design. Mean change in UPDRS score (vertical axis) across Period 1 and Period 2 (horizontal axis) for the P/P, A/P, P/A, and A/A groups.]
Figure 11.4. Illustration of the complete two-period design, in which trial participants are randomly assigned to receive either active (A) or placebo (P) treatment in Period 1 followed by either active or placebo treatment during Period 2. This may be viewed as the combination of the withdrawal and delayed start designs. The notation ‘A/P’ indicates the group that received active treatment in Period 1 followed by placebo treatment in Period 2; similar notation is used for the other three groups.
expected from a drug that had an effect with a disease-modifying component (Figure 11.3).

In the PROUD trial, participants were randomized to receive pramipexole 1.5 mg/day or matching placebo and followed for 9 months in Period 1. In Period 2, participants in the placebo group were switched to pramipexole 1.5 mg/day (delayed start group) and participants in the pramipexole group maintained their original treatment assignment for an additional 6 months. The trial demonstrated no evidence of a disease-modifying effect of pramipexole as the delayed start group caught up to the early start group in terms of mean response on the sum of the UPDRS motor and activities of daily living component scores during Period 2 [23].

The delayed start design shares with the withdrawal design the potential problem that there is no blinding with respect to the treatment received during Period 2. Again, one strategy to address this concern is to add a third randomized group to the study in which participants remain on placebo in both Period 1 and Period 2. This third group, however, has no value in distinguishing between the disease-modifying and symptomatic effects of the treatment; hence, efficiency is lost [9]. Relatively few participants can be allocated to this third group in order to minimize the loss of efficiency. A practical problem is that this third group, which would never receive active treatment, might make it less attractive for potential participants to enroll in the trial.

Simulation studies using disease progression modeling suggest that the withdrawal design may provide more power than the delayed start design to detect disease-modifying effects of a treatment [14]. The aforementioned concerns regarding participant recruitment and retention associated with the withdrawal design, however, significantly limit its use in practice.

Complete two-period design
A combination of the withdrawal and delayed start designs, termed the complete two-period design [9], was first presented by Whitehouse et al. [25] who described its use in a trial of propentofylline in AD. The trial had four treatment arms (Period 1/Period 2): placebo/placebo, placebo/propentofylline, propentofylline/placebo, and propentofylline/propentofylline. The results of this trial were apparently never published. The general design is depicted in Figure 11.4.

Under certain assumptions (discussed below), the complete two-period design would have the advantage of blinding without sacrificing efficiency, i.e., data from all treatment arms would provide information on the distinction between the symptomatic and disease-modifying effects of the treatment [9]. In essence, the information from the withdrawal component and the delayed start component of this design can be combined to produce an estimate of the disease-modifying effect of the treatment. As will be explained below, however, the assumptions required for this are somewhat strong.

A statistical model
A statistical model for data from a complete two-period design assumes that a normally-distributed outcome
variable Y is measured on each participant at the end of Period 1 (Y1) and at the end of Period 2 (Y2). A typical analysis of data from this design might include certain covariates such as site and the baseline value of the outcome variable, but these will be ignored here for simplicity. Additional details regarding this model are presented by McDermott et al. [9].

Table 11.1 Statistical model for mean responses in the complete two-period design

Component       Group   End of Period 1    End of Period 2
Withdrawal      P/P     μ1                 μ2
                A/P     μ1 + αS + αD       μ2 + αD
Delayed start   P/A     μ1                 μ2 + α′T
                A/A     μ1 + αS + αD       μ2 + αD + α′′T

Group indicates the treatment assignments (Period 1/Period 2), with P = placebo and A = active.
αS = Symptomatic effect acquired during Period 1.
αD = Disease-modifying effect acquired during Period 1.
α′T = Total incremental effect (symptomatic + disease-modifying) acquired during Period 2 (P/A group).
α′′T = Total incremental effect (symptomatic + disease-modifying) acquired during Period 2 (A/A group).

The model for the mean responses at the end of each period in each of the four treatment arms is provided in Table 11.1. The notation ‘P/A’, for example, indicates the group that received placebo (P) in Period 1 and active treatment (A) in Period 2. At the end of Period 1, participants receiving placebo (i.e., those in the P/P and P/A groups) have a common mean response termed μ1, but participants receiving active treatment (i.e., those in the A/P and A/A groups) have a mean response that includes a treatment effect that is assumed to be a sum of two components: a symptomatic effect (αS) and a disease-modifying effect (αD). Of course, the data at the end of Period 1 cannot be used to distinguish between these two components. For example, in the withdrawal component of the design, the difference in mean response between the A/P and P/P groups would estimate αS + αD; the same is true for the delayed start component of the design (difference in mean response between the A/A and P/A groups). The data from Period 1 are used to estimate only the total treatment effect accrued during that period; the data from Period 2 are used to distinguish between the symptomatic and disease-modifying components of that effect.

At the end of Period 2, participants who received placebo in both periods (P/P) have a mean response termed μ2. The A/P group, which had active treatment withdrawn in Period 2, is assumed to retain the disease-modifying effect acquired from active treatment during Period 1, but any symptomatic effect acquired during Period 1 is assumed to disappear by the end of Period 2. Thus, the mean response in this group is μ2 + αD. The P/A and A/A groups both receive active treatment in Period 2, the total (symptomatic + disease-modifying) effects of which are denoted by the parameters α′T (P/A group) and α′′T (A/A group). The A/A group is also assumed to retain the disease-modifying effect (αD) and lose the symptomatic effect (αS) acquired during Period 1.

This simple model for the mean responses illustrates several important assumptions that underlie the withdrawal and delayed start designs: 1) Period 1 is long enough for a detectable disease-modifying effect to become apparent; 2) the disease-modifying effect acquired over the duration of Period 1 (αD) remains with the participant (at least through the end of Period 2, but presumably longer); 3) Period 2 is long enough for the symptomatic effect from Period 1 (αS) to completely disappear by the end of this period; and 4) withdrawal of active treatment does not modify (e.g., hasten) the disease process in some way.

It is clear from Table 11.1 that the difference in observed mean response between the A/P and P/P groups (withdrawal component) at the end of Period 2 will provide an unbiased estimate of the disease-modifying effect αD. In the delayed start component of the design, however, the difference in observed mean response between the A/A and P/A groups at the end of Period 2 will provide an unbiased estimate of αD only if α′T = α′′T, i.e., if the incremental effect of treatment acquired during Period 2 is the same for the P/A and A/A groups. Put another way, it must be assumed that the total (symptomatic + disease-modifying) effect of treatment received in Period 2 is independent of whether or not the participant received treatment during Period 1. This implies that Period 2 should be chosen to be long enough for the symptomatic effect of the treatment to become fully apparent (P/A group). This critical assumption for the delayed start design is often overlooked and is not testable in a trial that only has treatment groups P/A and A/A. It can be tested, however, using data from a complete two-period design. The parameter α′T can be estimated by the difference in observed mean response between the P/A and P/P groups at the end of Period 2, and the parameter α′′T can be estimated by the difference in observed mean
response between the A/A and A/P groups at the end of Period 2. The difference between these differences, therefore, would form the basis for a test of the null hypothesis that α′_T = α′′_T. If this assumption is correct, then the disease-modifying effect α_D could be estimated by averaging the estimators obtained from the withdrawal and delayed start components of the complete two-period design [9].
The optimal allocation of trial participants to the four treatment arms in a complete two-period design was discussed by McDermott et al. [9]. Equal allocation within the withdrawal component (i.e., between the P/P and A/P arms) and within the delayed start component (i.e., between the P/A and A/A arms) is optimal in terms of minimizing the variance of the estimator for α_D. The allocation of participants between these two components, however, can be arbitrary. Indeed, it may be best to allocate fewer participants to the withdrawal component to improve recruitment and retention in the trial. On the other hand, equal allocation between the withdrawal and delayed start components would maximize the power of the test of the assumption that α′_T = α′′_T [9].

Additional design considerations
Eligibility criteria
Trials in PD that have used two-period designs to address the question of disease modification have thus far involved participants with recently diagnosed PD who do not yet require treatment [2, 21, 24]. In ADAGIO, patients were eligible if they had been diagnosed within the previous 18 months. In PROUD, eligible patients needed to be diagnosed within the previous 2 years. A reasonable hypothesis is that a disease-modifying effect may be more readily detected if treatment is given earlier in the disease course. A potential problem is misdiagnosis in the early stages of a neurodegenerative disease such as PD or AD, although this is not an issue in Huntington’s disease for which the genetic defect is known. Studies in participants with ‘pre-manifest’ disease may be even more attractive, although there are many issues that would need to be resolved in terms of defining a population at high risk for the development of the disease and of defining appropriate outcome measures before such trials could be recommended [26–27]. The study of potentially toxic treatments in individuals who have not yet developed a disease is also associated with practical challenges [26]. The sample size requirements for trials in this population may also be larger than those for trials in a population with manifest disease.
The use of concomitant medications should ideally be minimized in trials of potentially disease-modifying agents, particularly if it is not firmly established that they do not have disease-modifying effects themselves. In ADAGIO, for example, use of levodopa, dopamine agonists, selegiline, rasagiline, or coenzyme Q10 (> 300 mg/day) was prohibited within 4 months of randomization. The difficulties involved in recruiting large numbers of (essentially) untreated subjects must be carefully considered when formulating eligibility criteria.
Eligibility criteria can be tailored to maximize retention since this is a major concern in two-period designs. For example, in ADAGIO only patients who were judged by the site investigator to not likely require symptomatic treatment in the subsequent 9 months were eligible. It may be helpful to exclude those with certain comorbid conditions as well. The concern has been raised that such restrictions on eligibility may yield a cohort of slowly progressive patients in whom a disease-modifying effect may be more difficult to detect [6, 28] or may significantly limit generalizability of the results [6].

Duration of follow-up periods
As summarized above, Period 1 should ideally be chosen to be long enough for a detectable disease-modifying effect to become apparent. Period 2 should be chosen to be long enough for the symptomatic effect from Period 1 to completely disappear by the end of Period 2 and, in the case of a delayed start design, for the symptomatic effect of the treatment to become fully apparent in Period 2. A practical consequence of this is that, in either the withdrawal design or the delayed start design, the group differences in mean response near the end of Period 2 should not be continuing to decrease over time. The duration of these periods, therefore, will depend on the nature of the treatment being studied. Practical aspects related to recruitment and retention also have to be carefully considered.
In PD, an initial treatment period of 9 months was used in both the ADAGIO and PROUD delayed start studies. The length of Period 2 was 9 months in ADAGIO and 6 months in PROUD. Given the inexorable progression of PD and the availability of dopaminergic treatments, a duration of Period 1 beyond 9 months is likely impractical. The current opinion
among PD researchers seems to be that withdrawal designs are not feasible, a situation that may be magnified in trials of AD, although this could be reconsidered for treatments with symptomatic effects that might be expected to disappear relatively rapidly. In neurodegenerative diseases having no known effective treatment, such as Huntington’s disease, longer period durations may be feasible.

Schedule of evaluations
The timing of evaluations needs to be carefully considered for the efficient design of two-period studies. There is the usual consideration of the balance between cost and participant burden versus the benefit of having more information and maintaining contact with all participants to monitor safety and improve retention. Since participant withdrawal is a potential concern and some missing data are inevitable, it is important from an analysis perspective to have a reasonable amount of information on the trajectory of a participant’s responses prior to withdrawal. Another important consideration is the evaluation of the assumption that the group differences in mean response near the end of Period 2 are not continuing to decrease over time. To adequately test this assumption, more frequent evaluations may be required in the latter part of Period 2. This aspect was carefully considered in the design of the ADAGIO trial [2, 22] but was not considered in the design of the PROUD trial [24].

Withdrawal due to worsening disease
A practical issue that arises in two-period designs is how to accommodate participants who require additional treatment due to a decline in their condition. This issue is of particular concern for trials in PD, for which there are many available effective treatments, but applies to AD as well. It is helpful to have a formal operational definition of the need for additional treatment to distinguish this situation from the case where the participant may be doing well but desires to receive additional treatment for reasons unrelated to accumulating disability; the primary endpoint in the DATATOP trial, for example, was declared when the investigator, in his/her clinical judgment, felt that the participant had reached a level of functional disability sufficient to warrant treatment with levodopa [10]. There are a number of options for dealing with this issue, including: 1) withdrawing the participant from the trial; 2) moving the participant directly into Period 2 (this only applies to participants who require treatment in Period 1 and would clearly only be a reasonable option in a trial with a delayed start design, in which all participants receive active treatment during Period 2); and 3) allowing the participant to receive additional treatment while continuing participation in the trial. The third option may be viewed as being consistent with strict adherence to the intention-to-treat principle and might be sensible in a trial with a very pragmatic aim. On the other hand, a trial with a two-period design that attempts to evaluate the disease-modifying effect of a treatment has a primary aim that is much more explanatory or mechanistic than pragmatic, making this option unappealing.
The delayed start trials conducted to date have allowed participants who have been followed for a certain minimum duration in Period 1 (no minimum duration in TEMPO, 24 weeks in ADAGIO, and 6 months in PROUD) to proceed directly into Period 2 if judged by the enrolling investigator to require additional anti-parkinsonian medication. This allows information to be obtained in these participants on the mechanism of the effect of the treatment; however, the time scale for follow-up becomes compressed for these participants, the implications of which are not entirely clear. Also, if the active treatment has a beneficial effect (even if purely symptomatic), the early initiation of Period 2 may occur preferentially in those receiving placebo during Period 1, which may complicate interpretation of the results. In all of these trials, participants who required additional treatment in Period 2 were withdrawn from the trial at that time.

Statistical considerations
Primary analyses
The primary analyses for a two-period design typically focus on three issues: 1) comparison of the mean responses of those receiving active treatment and those receiving placebo at the end of Period 1; 2) comparison of the mean responses in the A/P and P/P arms (withdrawal design) or in the P/A and A/A arms (delayed start design) at the end of Period 2; and 3) evaluation of the assumption that the group differences in mean response near the end of Period 2 are not continuing to decrease over time.
Analyses for the first issue should involve simple comparisons of mean responses at the end of Period 1, as exemplified in the TEMPO [21] and PROUD [24]
studies. In the ADAGIO trial, however, the analyses involved comparisons of the rates of change (slopes) between the rasagiline and placebo groups in Period 1, where the rates of change were based on data from Week 12 to Week 36 [2, 22]. The rationale for this strategy is not clear, particularly since it should only be of interest in Period 1 to determine whether or not the treatment groups differ with regard to mean response at the end of this period and not to try to make inferences about the mechanism of the treatment effect; if the latter were possible, there would be no need for a second period. Also, this analysis strategy requires the pre-specification of a time point beyond which the symptomatic effect of the treatment is fully apparent (Week 12, in the case of the ADAGIO trial), which may be problematic [29].
The key analyses are the group comparisons of the mean responses at the end of Period 2. These analyses should again be relatively straightforward. It may be advantageous to use data from multiple time points near the end of Period 2 to improve precision of the estimated mean responses, but this would require an additional assumption regarding the stability (constancy) of the treatment group difference at all of these time points.
Analyses to address the issue of whether or not the group differences in mean response near the end of Period 2 are continuing to decrease over time are somewhat more complex than those required to address the first two issues. First, a decision must be made prior to study initiation regarding which data to include in the analyses. For example, in ADAGIO the data from Weeks 48–72 were included because it was thought that the symptomatic effect of rasagiline would appear within 12 weeks of its initiation in the delayed start group at Week 36 [22]. Second, a decision must be made regarding how to quantify the evolution of the group difference in mean response over time. In ADAGIO, this was done using a rate of change (slope) that assumed linearity of the relationship between mean response and time during Weeks 48–72 [22].
A third complexity is that the goal of these analyses is to establish that the group differences in mean response are not continuing to decrease over time, a goal that translates into a hypothesis concerning non-inferiority (see Chapter 13 for a thorough explanation of this concept). Let β_{P/A} be the slope (Weeks 48–72) in the delayed start (P/A) group and let β_{A/A} be the corresponding slope in the early start (A/A) group. In ADAGIO the following statistical hypotheses were formulated:

H_0: β_{P/A} − β_{A/A} > δ vs. H_1: β_{P/A} − β_{A/A} ≤ δ,

where δ is the non-inferiority margin. This means that the slope in the P/A group would be considered to be not meaningfully larger than the slope in the A/A group if the difference between them can be demonstrated to be significantly less than the non-inferiority margin δ. As described in Chapter 13, the choice of the non-inferiority margin needs to be made with care to allow for proper interpretation of the trial results. In ADAGIO, the non-inferiority margin was chosen to be δ = 0.15 UPDRS points/week, a value that was not justified in the trial publications [2, 22] and appears to be much too large. This value means that the group difference in mean responses could be shrinking by as much as 3.6 points over the 24-week time period (Weeks 48–72), a value greater than the treatment effect observed during Period 1, yet still be considered to be non-decreasing over time.
Despite the poor choice of non-inferiority margin in ADAGIO, the results for the 1 mg/day dosage indicated that the estimate of β_{P/A} − β_{A/A} was 0.00 with a 95% confidence interval of (−0.04 to 0.04) [22]. The interpretation of the upper confidence bound of 0.04 is that differences between the slopes of more than 0.04 UPDRS points/week (or a convergence of the group means by more than ~ 1 point over 24 weeks) can be ruled out with a high degree of confidence. A choice of non-inferiority margin this small may make researchers more comfortable with the conclusion that the group difference in mean responses is not continuing to decrease appreciably over time.
It should be recognized that two-period studies that aim to investigate the ability of an intervention to modify disease course have an objective that is more explanatory than pragmatic in nature [30]. For this reason, carefully collected data on compliance with the intervention could potentially be quite valuable in the interpretation of the trial results. Statistical methods that attempt to account for participant compliance may be useful in this context [31–32], although these have not been applied to data from the TEMPO, ADAGIO, or PROUD trials.
A final point concerns multiple statistical testing. In order for an intervention to be considered disease modifying, it would likely have to be successful in each of the above three analyses (statistically significant benefit at the end of Period 1, continued statistically significant benefit at the end of Period 2, and non-decreasing group difference in mean response over
time near the end of Period 2), not just one of them. If this were the case, correction for multiple statistical testing would not be required. In fact, this is an example of so-called reverse multiplicity [33] whereby the overall probability of a false-positive result will be less than the significance level used for each of the three tests (e.g., α = 0.05). An exception is if it is desired to make a claim about a significant treatment effect during Period 1 alone, regardless of the mechanism of this effect. In this case, some multiplicity adjustment would be necessary [34].

Strategies for accommodating missing data
As in virtually any clinical trial, the problem of missing data (see Chapter 6) will arise in trials having two-period designs. The implications of missing data are arguably greater in a two-period design, however, due to the fact that information concerning the treatment mechanism (symptomatic vs. disease-modifying) is derived from the data acquired during Period 2. Studies with two-period designs also involve long duration of follow-up, which increases the probability of participant withdrawal. Several statistical methods have been developed to deal with the problem of missing data and are well summarized elsewhere [35–36].
Simple ad-hoc methods for dealing with missing data such as dropping cases with missing data (‘complete case’ analyses) or carrying forward the last available observation (LOCF imputation) have been widely criticized in the literature [37]. Analyzing data only from complete cases involves a comparison of subsets of treatment groups that are determined on the basis of outcome; hence, the benefits of randomization are lost and bias of unknown magnitude and direction can be introduced. LOCF imputation in the setting of a neurodegenerative disease is clearly problematic in terms of bias, particularly if the last observation for the participant is obtained relatively early during follow-up. Moreover, the use of single-imputation methods such as LOCF can artificially increase the precision of estimated treatment effects because the imputed data are treated in the analyses as if they were observed.
Better strategies for accommodating missing data include so-called ‘mixed model repeated measures’ (MMRM) analyses which treat time as a categorical variable and use maximum likelihood to estimate model parameters (e.g., mean treatment group responses at each individual time point) using all available data, including all observed data from participants who prematurely withdraw from the trial [38]. Linear or non-linear mixed effects models [37] that specify a functional form for the relationship between response and time can also be used for this purpose and may be more efficient than the MMRM strategy if the specified functional form is (approximately) correct. Multiple imputation is another technique that has been developed for inference in the setting of missing data [39–40]. It is superior to single-imputation methods because it accounts for the uncertainty associated with the model used for data imputation, i.e., it does not artificially increase the precision of estimated treatment effects. The primary analyses described above for a two-period design would be fairly easy to conduct using these strategies for accommodating missing data.
These methods rely on an important (and untestable) assumption concerning the missing data mechanism: that the data are ‘missing at random’ (MAR). This assumption specifies that the missingness depends only on observed outcomes in addition to covariates, but not on unobserved outcomes [35]. This may be a reasonable assumption under many circumstances, especially if data on participant response can be obtained at the time of withdrawal. One cannot determine, however, if the missingness mechanism is MAR vs. ‘missing not at random’ (MNAR), where missingness can depend on unobserved outcomes in addition to observed outcomes and covariates.
The TEMPO, ADAGIO, and PROUD trials all allowed participants who needed additional antiparkinsonian treatment in Period 1 to move directly to Period 2. The primary analyses in ADAGIO and PROUD, however, had minimum requirements for participation in Period 1 (24 weeks in ADAGIO and 6 months in PROUD) for this to be allowed. In all three trials, only participants who had at least one follow-up evaluation after the start of Period 2 were included in the primary analyses of Period 2 data. The bias introduced by the exclusion of randomized participants is of unknown magnitude and direction, although participant retention in these trials was generally excellent [21–23]. Methods such as propensity score adjustment [41] may be useful in reducing the bias resulting from such participant exclusion [34]. In TEMPO and PROUD, participants who withdrew in Period 2 had their last observed responses carried forward to the final visit for analysis. The ADAGIO trial used the MMRM strategy to deal with missing data in Period 2.
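The concern about LOCF imputation in a progressive disease can be made concrete with a small deterministic sketch. Everything in it is hypothetical and chosen for arithmetic convenience: the visit schedule, the steady worsening of 0.1 points/week, and withdrawal after Week 36 are assumptions of this illustration, not data from TEMPO, ADAGIO, or PROUD, and the helper name `locf` is likewise invented here.

```python
def locf(values):
    """Fill missing entries (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical visit schedule (weeks) and a hypothetical participant whose
# change from baseline worsens steadily by 0.1 points per week.
weeks = [0, 12, 24, 36, 48, 60, 72]
true_change = [0.1 * w for w in weeks]

# Assumption: the participant withdraws after the Week 36 visit, so all
# later visits are unobserved.
observed = [c if w <= 36 else None for w, c in zip(weeks, true_change)]

imputed = locf(observed)

# LOCF freezes the participant at the Week 36 value, so the imputed Week 72
# change understates the true worsening.
locf_bias_at_week_72 = imputed[-1] - true_change[-1]

# Every post-withdrawal value is identical, so treating imputed values as
# observed removes all apparent variability after withdrawal.
post_withdrawal = [v for w, v in zip(weeks, imputed) if w > 36]
distinct_post_withdrawal_values = len(set(post_withdrawal))

print(f"Imputed Week 72 change: {imputed[-1]:.1f}")
print(f"True Week 72 change:    {true_change[-1]:.1f}")
print(f"Bias from LOCF:         {locf_bias_at_week_72:.1f}")
```

Under these assumptions the imputed final value falls short of the true value by the worsening that would have accrued after withdrawal, and the shortfall grows the earlier the participant withdraws. If the timing or frequency of withdrawal differs between treatment arms, the shortfall differs between arms as well, which is one route by which LOCF can bias the Period 2 comparison.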
Sample size determination
There are several important considerations in determining the appropriate sample size for a trial with a two-period design. First, the minimally important effect size for disease modification needs to be specified. In ADAGIO, this was chosen to be 1.8 points for the UPDRS total score [22], and in PROUD, this was chosen to be 3 points [24]. These choices have been criticized by some to not represent clinically important effects [6]. One must bear in mind, however, that this group difference, if real, should be interpreted as the disease-modifying benefit that accrued over a very short period of time (9 months) relative to the duration of the disease and would be expected to continue to accrue over time, indeed possibly over many years. The ADAGIO investigators [22] noted that the observed effect of the 1 mg/day dosage of rasagiline (1.7 points over 36 weeks) represents a 38% reduction in the change from baseline which, if this truly represents disease modification, would be highly meaningful from a clinical standpoint. The choice of effect size for sample size determination should be based on a realistic expectation of the magnitude of a disease-modifying effect that could accrue over a relatively short follow-up period (e.g., 9 months) and may not be very large.
A second consideration is the sample size requirement for determining that the group difference in mean responses is not continuing to decrease appreciably over time near the end of Period 2. This was not a major consideration in the ADAGIO trial because of the large value chosen for the non-inferiority margin. A more appropriate (smaller) choice for the non-inferiority margin, however, may make this aspect of the design the most important determinant of sample size. Other problems such as participant withdrawal, non-compliance, and misdiagnosis also need to be carefully considered. In particular, clinical trial simulation can be highly useful in determining the impact of participant withdrawal, missing data, and the reverse multiplicity problem on the sample size requirements for the trial.

Alternative approaches to determining disease-modifying effects
There are alternative approaches to evaluating the disease-modifying effects of an intervention that require only a single treatment period. As mentioned above, a standard randomized, double-blind, parallel group trial with a valid biological measure of underlying disease progression as the primary outcome variable would be the ideal approach. Such an approach, however, awaits the development of valid biomarkers of underlying disease progression. Another promising approach has been suggested that combines a model for disease progression with a pharmacodynamic model for drug effects [42–43], the latter facilitating inference concerning the mechanisms of the drug effect. These models have been applied to data from the DATATOP trial [44] and the ELLDOPA trial [29], providing evidence for disease-modifying effects of selegiline and levodopa. This approach was also used to provide independent validation (prediction) of the results of the ELLDOPA trial [45]. These methods are analytically complex and rely on several modeling assumptions, but they may overcome some of the limitations of two-period designs for this purpose and appear to hold great promise in facilitating understanding of the mechanisms of drug benefit [29].

Limitations of two-period designs
There are several limitations that accompany the use of two-period designs to determine whether or not an intervention has disease-modifying effects. Many of these have already been discussed, including the assumptions that: 1) Period 1 is long enough for a detectable disease-modifying effect to become apparent; 2) the disease-modifying effect acquired over the duration of Period 1 remains with the participant at least through the end of Period 2, but presumably longer; 3) Period 2 is long enough for the symptomatic effect from Period 1 to completely disappear by the end of Period 2; 4) withdrawal of active treatment does not modify (e.g., hasten) the disease process in some way (withdrawal design); and 5) the total (symptomatic + disease-modifying) effect of treatment received in Period 2 is independent of whether or not the participant received treatment during Period 1 (delayed start design), implying that Period 2 is long enough for the symptomatic effect of the treatment to become fully apparent. Many of these assumptions cannot be verified directly using the data from the two-period design and must rely on evidence external to the trial. Interventions with a very slow onset and/or offset of symptomatic effect may not be well-suited for study using a two-period design [14, 29].
Other limitations previously mentioned include problems with acceptability of the withdrawal design by researchers and potential trial participants; a potential compromise of the blind if only two treatment arms are used; difficulties in recruiting large numbers of untreated subjects; potentially limited generalizability
of the results if ‘slow progressors’ are preferentially represented in the trial; and difficulties with participant retention and the use of proper statistical methods to deal with the resulting missing data.
An additional limitation not previously mentioned includes the possibility of ceiling or floor effects of the clinical rating scale used to measure outcome that might limit the ability of the two-period design to assess disease modification. This might be particularly problematic if participants have very mild disease. A similar concern is that a two-period design might not be able to ascertain the mechanism of the effect of an agent with a very prominent symptomatic effect that overwhelms a disease-modifying effect in participants with very early disease [22, 24].

References
1. Sano M, Ernesto C, Thomas RG, et al. A controlled trial of selegiline, alpha-tocopherol, or both as treatment for Alzheimer’s disease. N Engl J Med 1997; 336: 1216–22.
2. Olanow CW, Hauser RA, Jankovic J, et al. A randomized, double-blind, placebo-controlled, delayed start study to assess rasagiline as a disease modifying therapy in Parkinson’s disease (the ADAGIO study): rationale, design, and baseline characteristics. Mov Disord 2008; 15: 2194–2201.
3. Kaye JA. Methods for discerning disease-modifying effects in Alzheimer disease treatment trials (editorial). Arch Neurol 2000; 57: 312–14.
4. Clarke CE. A “cure” for Parkinson’s disease: can neuroprotection be proven with current trial designs? Mov Disord 2004; 19: 491–8.
5. Schapira AHV and Obeso J. Timing of treatment initiation in Parkinson’s disease: a need for reappraisal? Ann Neurol 2006; 59: 559–62.
6. Clarke CE. Are delayed-start design trials to show neuroprotection in Parkinson’s disease fundamentally flawed? Mov Disord 2008; 23: 784–89.
7. Olanow CW and Rascol O. The delayed-start study in Parkinson disease: can’t satisfy everyone. Neurology 2010; 74: 1149–51.
8. Cummings JL. Defining and labeling disease-modifying treatments for Alzheimer’s disease. Alzheimer’s Dement 2009; 5: 406–18.
9. McDermott MP, Hall WJ, Oakes D, et al. Design and analysis of two-period studies of potentially disease-modifying treatments. Controlled Clin Trials 2002; 23: 635–49.
10. The Parkinson Study Group. Effect of deprenyl on the progression of disability in early Parkinson’s disease. N Engl J Med 1989; 321: 1364–71.
11. The Parkinson Study Group. Effects of tocopherol and deprenyl on the progression of disability in early Parkinson’s disease. N Engl J Med 1993; 328: 176–83.
12. Guimaraes P, Kieburtz K, Goetz CG, et al. Non-linearity of Parkinson’s disease progression: implications for sample size calculations in clinical trials. Clin Trials 2005; 2: 509–18.
13. Vellas B, Andrieu S, Sampaio C, et al. Endpoints for trials in Alzheimer’s disease: a European task force consensus. Lancet Neurol 2008; 7: 436–50.
14. Ploeger BA and Holford NHG. Washout and delayed start designs for identifying disease modifying effects in slowly progressive diseases using disease progression analysis. Pharm Statist 2009; 8: 225–38.
15. Whone AL, Watts RL, Stoessl AJ, et al. Slower progression of Parkinson’s disease with ropinirole versus levodopa: the REAL-PET study. Ann Neurol 2003; 54: 93–101.
16. Parkinson Study Group. Dopamine transporter brain imaging to assess the effects of pramipexole vs. levodopa on Parkinson disease progression. JAMA 2002; 287: 1653–61.
17. Schapira AHV and Olanow CW. Neuroprotection in Parkinson disease: mysteries, myths, and misconceptions. JAMA 2004; 291: 358–64.
18. Clarke CE and Guttman M. Dopamine agonist monotherapy in Parkinson’s disease. Lancet 2002; 360: 1767–69.
19. Leber P. Observations and suggestions on antidementia drug development. Alzheimer Dis Assoc Disord 1996; 10(Suppl 1): 31–5.
20. The Parkinson Study Group. Levodopa and the progression of Parkinson’s disease. N Engl J Med 2004; 351: 2498–2508.
21. Parkinson Study Group. A controlled, randomized, delayed-start study of rasagiline in early Parkinson disease. Arch Neurol 2004; 61: 561–6.
22. Olanow CW, Rascol O, Hauser R, et al. A double-blind, delayed-start trial of rasagiline in Parkinson’s disease. N Engl J Med 2009; 361: 1268–78.
23. Schapira A, Albrecht S, Barone P, et al. Immediate vs. delayed-start pramipexole in early Parkinson’s disease: the PROUD study. Parkinsonism Relat Disord 2009; 15: S2–S81.
24. Schapira AHV, Albrecht S, Barone P, et al. Rationale for delayed-start study of pramipexole in Parkinson’s disease: the PROUD study. Mov Disord 2010; 25: 1627–32.
25. Whitehouse PJ, Kittner B, Roessner M, et al. Clinical trial designs for demonstrating disease-course-altering effects in dementia. Alzheimer Dis Assoc Disord 1998; 12: 281–94.
26. Kieburtz K. Issues in neuroprotection clinical trials in Parkinson’s disease. Neurology 2006; 66(Suppl 4): S50–S57.
27. Vellas B, Andrieu S, Sampaio C, et al. Disease-modifying trials in Alzheimer’s disease: a European task force consensus. Lancet Neurol 2007; 6: 56–62.
28. Ahlskog JE and Uitti RJ. Rasagiline, Parkinson neuroprotection, and delayed-start trials: still no satisfaction? Neurology 2010; 74: 1143–8.
29. Holford NHG, Nutt JG. Interpreting the results of Parkinson’s disease clinical trials: time for a change. Mov Disord 2011; 26: 569–77.
30. Schwartz D and Lellouch J. Explanatory and pragmatic attitudes in therapeutical trials. J Chronic Dis 1967; 20: 637–48.
31. Robins JM, Hernan MA and Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000; 11: 550–60.
32. Frangakis CE and Rubin DB. Principal stratification in causal inference. Biometrics 2002; 58: 21–9.
33. Offen W, Chuang-Stein C, Dmitrienko A, et al. Multiple co-primary endpoints: medical and statistical solutions. Drug Inf J 2007; 41: 31–46.
34. D’Agostino RB Sr. The delayed-start study design. N Engl J Med 2009; 361: 1304–6.
35. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Hoboken, NJ: John Wiley and Sons, Inc., 2002.
36. Molenberghs G and Kenward MG. Missing Data in Clinical Studies. Chichester: John Wiley and Sons, 2007.
37. Molenberghs G, Thijs H, Jansen I, et al. Analyzing incomplete longitudinal clinical trial data. Biostatistics 2004; 5: 445–64.
38. Mallinckrodt CH, Clark WS and David SR. Accounting for dropout bias using mixed-effects models. J Biopharm Statist 2001; 11: 9–21.
39. Little R and Yau L. Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics 1996; 52: 1324–33.
40. Schafer JL. Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman and Hall/CRC, 1997.
41. D’Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statist Med 1998; 17: 2265–81.
42. Chan PLS and Holford NHG. Drug treatment effects on disease progression. Annu Rev Pharmacol Toxicol 2001; 41: 625–59.
43. Holford NHG and Ludden T. Time course of drug effect. In: Welling PG, Balant LP, eds. Handbook of Experimental Pharmacology. Heidelberg: Springer-Verlag, 1994.
44. Holford NHG, Chan PLS, Nutt JG, et al. Disease progression and pharmacodynamics in Parkinson disease – evidence for functional protection with levodopa and other treatments. J Pharmacokinet Pharmacodyn 2006; 33: 281–311.
45. Chan PLS, Nutt JG and Holford NHG. Levodopa slows progression of Parkinson’s disease: external validation by clinical trial simulation. Pharm Res 2007; 24: 791–802.
Chapter 12
Enrichment designs
Kathryn M. Kellogg and John Markman
Introduction
Enriched enrollment designs allow researchers to identify subjects for whom a proposed treatment is more likely to be beneficial and to include only those subjects in the randomized phase of a clinical trial. Since the introduction of enrichment approaches over three decades ago, this method is increasingly used to enhance assay sensitivity for study drug effects when only a subset of subjects in a population is expected to respond to an intervention. This chapter will examine the varied strategies involved in developing a trial using an enrichment design, the advantages and disadvantages of this method, and issues to be considered when planning a study using enrichment strategies. Clinical trials using this design have a variety of names in the literature, examples of which can be found in Table 12.1.
The enrichment design is a relatively new clinical trial method first described by Amery and Dony in 1975 [1]. These researchers identified a need for an alternative to the traditional randomized controlled trial (RCT) in pharmaceutical clinical development because of the high incidence of placebo response and

Table 12.1 Names for a two-stage clinical trial design using select patients from the first stage in the second stage
Enrichment design
Discontinuation design
Randomized discontinuation design
Enriched enrollment with randomized withdrawal
Study with a qualification period

Clinical trials using enrichment designs involve at least two periods (Figure 12.1). In the first period, the enrichment period, subjects are screened for their responsiveness according to predetermined criteria (e.g. a 30% reduction in baseline pain intensity). These criteria vary depending on the type of study being performed. Researchers often use the putative response to the treatment to be studied in the subsequent phase of the trial as a direct screening tool during the enrichment period. However, some researchers use other screening criteria such as biomarkers that may indicate potential response to the intervention. This may be particularly useful when there is a biomarker that can be identified in the short term that predicts response to long-term treatment [2].
the ethical implications of prolonged placebo expo- Researchers use a variety of methods to perform the
sure in half of the study participants who might ben- irst stage of an enrichment design trial. In the simplest
eit from alternative treatments. he run-in periods method, the test drug is given in the irst phase and
common in enrichment designs may have multiple participant response is used to gauge advancement to
objectives, some of which are clinically relevant such the second stage (Figure 12.1). However, some studies
as tolerability, and others which are trial speciic such examine more than one intervention in the irst phase
as subject adherence to the protocol. Since its introduc- in order to ind a subject’s ideal treatment or dose to
tion, this design type has been adopted and reined in be used in the second phase or select subjects whose
many areas of medicine, most notably in psychiatry symptoms worsen upon withdrawal of study drug [3].
and pain research. In these study populations, placebo Other enrichment strategies aim to select for partici-
response rates are oten high and the complex trade-of pants with speciic traits, such as the ability to report
of symptom relief for drug tolerability frequently leads acute pain consistently as evaluated by psychophys-
to high dropout rates in clinical trials. ical screening. Other enrichment approaches feature
pharmacogenomic testing, assessment of baseline characteristics such as a previous response to another treatment, or induction of a pain flare on withdrawal of study medication [4].

Figure 12.1. The steps for a clinical trial using the enrichment design. [Flow diagram: all eligible patients are screened in the enrichment period; non-responders are excluded; responders are randomized to active or control treatment in the randomized, controlled trial, with response or no response assessed in each arm.]

Further refinement of enrichment definitions has been proposed. For example, in their 2008 systematic review Straube et al defined ‘complete enriched enrollment’ as a study in which all participants are known to have been exposed to the test drug, either in clinical practice or in a clinical trial setting [4]. In this case, either the putative responders are advanced to the second phase of the study or the non-responders are excluded. They then defined ‘partial enriched enrollment’ as a study in which previous non-responders are excluded from the study, but those who had not been previously exposed may also have been included, such that not all participants are definitively known to have been exposed to the test drug [5].

Advantages of the enrichment design
If the treatment to be examined in the trial is administered during the enrichment period, observations from this period approximate how a general population may be expected to respond to the treatment in clinical practice. It is important to note that the extent to which the experience of subjects during this uncontrolled exposure is attributable to non-specific treatment effects, natural history, spontaneous resolution, placebo effects, and regression to the mean cannot be discerned. As such, the study period that follows the enrichment phase may be viewed as testing the hypothesis that the response observed in the subjects in the enrichment period is due to chance. In the most commonly used enrichment design, subjects who are putative responders during the enrichment period are enrolled in the subsequent, randomized, controlled trial portion of the study [2].

While the clinical trial without an enrichment phase has long been viewed as the gold standard for clinical evidence, this traditional design has a number of weaknesses, particularly when studying certain disease processes. In a group of subjects with a common chronic pain etiology but heterogeneous underlying pain mechanisms and symptom patterns, the average treatment response in the group exposed to active drug may reveal little about the experience of most participants. The vast majority of subjects may endorse a very limited response while others experience significant benefit; it is the norm that few subjects experience the ‘average’ response [6]. In diseases for which a high proportion of subjects are expected to be non-responders, such as in chronic pain or depression, using group mean reduction as the primary endpoint may mask a clinically meaningful benefit in a subset of subjects due to degradation of assay sensitivity [7–8]. This liability of RCT-based evidence can be mitigated by using an enrichment design.

Another rationale driving the increasing use of the enrichment design is its close replication of clinical practice when compared to the traditional RCT. In the RCT, subjects are enrolled and maintained on the study treatment regardless of its effect. However, in clinical practice it is common for a treatment to be discontinued in subjects for whom no benefit is perceived during an initial treatment interval defined by the expected onset of action and kinetics of the
agent. Only patients who tolerate a therapy and perceive benefit during an initial period of titration and observation are typically maintained on a treatment in actual clinical practice. It is in the population of study subjects that most resemble an intended patient population that clinicians are most concerned about the rates of positive and adverse effects [7, 9]. In actual practice, patients who failed to tolerate an antidepressant or analgesic due to intolerable side effects would be changed to an alternative therapy. The extent to which the results of the enrichment phase emulate clinical practice will vary in accord with the method used to define a responder.

For example, Ho et al performed a trial using gabapentin or tramadol for treatment of pain due to small fiber neuropathies in a group of subjects with biopsy-proven small fiber neuropathy [10]. The enrichment period in this study involved two single-blind phases. In the first single-blind phase, subjects were treated with gabapentin at their pre-study dose. Those whose pain scores were less than or equal to 7.5 were determined to be responders and were enrolled in the subsequent portion of the study. The included subjects were then treated with placebo in the second single-blind phase. Those subjects whose pain did not increase while on placebo were then excluded from the double-blind, randomized portion of the study. By using two stages in the enrollment period, the researchers were able to first eliminate non-responders, and potentially exclude placebo responders.

There are many strengths of the enrichment design that have been cited by its advocates. First, because a trial employing the enrichment design includes only subjects who have been shown to respond to the screening criteria and not the general subject population, these trials are configured to detect the treatment effect in a subpopulation with greater efficiency. That is, fewer subjects are required to be included in the randomization period than in a non-enriched RCT in order to show the separation from placebo, thereby yielding higher assay sensitivity [4–5, 9]. Use of the enriched enrollment randomized withdrawal design has been associated with reduced variability and an increased effect size (mean treatment difference/SD) compared with parallel-group design in trials of post-herpetic neuralgia and painful diabetic neuropathy [11].

Kopec et al. used a computer model based on a variety of assumptions to demonstrate this feature of the enrichment design. The model showed that with 80% sensitivity and 80% specificity for identifying treatment effect, the sample size required in an enrichment design could be reduced by 30% compared to that of an RCT [9]. This increase in sensitivity is particularly relevant when the anticipated effect size of the treatment is small and significant heterogeneity of treatment response is anticipated across subpopulations of subjects. Enrichment designs are not as efficient when only partial enrichment is used. When the effect size is large in the responsive subpopulation of subjects but enrichment is incomplete, the power of the enrichment design has been shown to be similar to that of the RCT [12].

The concept that an enrichment design can have increased sensitivity was demonstrated in a trial performed by Byas-Smith et al in 1995. The first portion of this study was a randomized, double-blind, placebo-controlled crossover trial that included 41 subjects with painful diabetic neuropathy. Subjects were randomly assigned to one of four 3-week treatment sequences including placebo (P) or clonidine (C): C-P-C, P-C-P, C-P-P, or P-C-C. In the first week of each treatment period, the clonidine patch dosage was titrated from the initial dosage of 0.1 mg/day in 0.1 mg increments up to 0.3 mg/day. Subjects kept a daily pain diary and the outcome measures for this portion of the study were ratings of pain intensity and a global relief assessment. Their results showed that there was little difference in pain relief between subjects using the clonidine or placebo patches [13].

The researchers then enrolled 12 subjects who appeared most responsive to clonidine treatment in Phase 1 of the study into a subsequent study. In the next phase, subjects were randomly treated with their maximum tolerated dose of clonidine, as established in Phase 1, in 2 of 4 consecutive 1-week periods as follows: C-P-C-P, C-P-P-C, P-C-P-C, or P-C-C-P. When only those subjects who responded to treatment in Phase 1 were examined in this way, the researchers found these subjects had significantly reduced pain with clonidine treatment when compared with placebo [13].

The mathematical model generated by Kopec also supported this finding [9]. The model showed that if non-compliers and those who experience dose-limiting adverse effects made up 20% of the initial population and these subjects could be excluded with both 80% sensitivity and specificity, the sample size requirement for the subsequent study would be reduced by greater than 30%. When this filtering was performed in conjunction with the exclusion of non-responders with similar accuracy, the
overall reduction in sample size was 20% (rather than the 30% found previously) in comparison to an RCT without enrichment [9]. One important drawback of the enrichment design to be considered below is that such filtering undermines the controlled assessment of the safety of the active treatment.

Another advantage of the enrichment design is the opportunity to use the enrichment period to adjust drug dosing in a flexible fashion to achieve maximum treatment effect before comparing the treatment to placebo [6, 8, 14–15]. The subject may then be assigned to either the subject’s own best dose or to placebo during the randomization period. This can serve to both increase the likelihood of a successful trial, and provide an ethical way to ensure that each subject is optimally disposed to treatment effect [14]. As many RCTs have fixed timetables for drug administration, the less rigid timetable of the enrichment period can also offer researchers more flexibility in dose finding for each subject [8].

The Cardiac Arrhythmia Suppression Trial (CAST) was one of the earlier large, multi-center studies to use an enrichment design. This study examined the hypothesis that the death rate in subjects with asymptomatic or mildly symptomatic ventricular arrhythmia after myocardial infarction would be reduced with arrhythmia suppression. During the enrichment period, the researchers strove to find the treatment that yielded a response for each subject by testing a variety of antiarrhythmics at different dose levels. Researchers defined response as either an 80% reduction in ventricular premature contractions or a reduction of at least 90% in runs of unsustained ventricular tachycardia as recorded on 24h Holter monitoring. Once a subject achieved this outcome, the titration was stopped and that drug and dose were used for the subject in the randomization sequence if the subject was in the treatment group. Specifically, subjects with an ejection fraction (EF) of ≥30% were randomly assigned to receive either encainide-moricizine-flecainide or flecainide-moricizine-encainide, and each drug was tested at two dose levels. Subjects with an EF <30% were not administered flecainide due to its negative inotropic properties but were administered either encainide-moricizine or moricizine-encainide [16–17]. By finding each subject’s best dose, the researchers maximized the chances that the treatment would have a benefit over placebo in the RCT portion of the trial. Notably, this trial was halted prematurely due to increased deaths in the treatment groups. This trial is now seen as an example of the need for caution when using surrogate endpoints as outcomes in a clinical trial and shows the problem of enrichment based on a biomarker of unknown predictive value [18].

Focus on the average outcome in the traditional RCT may serve to mask efficacy in a subgroup by including other subgroups in which the treatment has poor efficacy. This may lead to a treatment with great utility for certain subjects failing to achieve regulatory approval and not going to market due to negative RCTs that are a reflection of study design failure rather than an intervention’s lack of therapeutic efficacy. Because the enrichment design examines a subgroup in which the treatment was tolerated and perceived as effective during an initial exposure, the problem of effects being masked by averaged results among groups in a parallel design may be mitigated [6]. However, this design does not ensure that the optimal subgroup of responders was defined in advance of the randomized phase. The issue of heterogeneity is also seen at a molecular level in the field of oncology, and it has been suggested that enrichment strategies may be appropriate to increase the sensitivity of cancer treatment trials through means such as genotyping [19].

Disadvantages of the enrichment design
Despite its increasing use in multiple fields, the utility and merits of the enrichment design continue to be debated. Some researchers feel there are limitations of the design that cannot be overcome, such as placebo effect issues and problems with unblinding and carryover effects [20–21]. Others maintain that these issues can be overcome in the early stages of trial design or with the use of active comparators [7]. A frequently cited concern is the loss of generalizability (i.e., external validity) in selecting a specific subpopulation in the enrichment phase. It is important to note that in both an enriched trial and one lacking enrichment, subjects are randomized to study treatments. The key difference of course is that a group of putative responders rather than all comers is randomized with the enrichment design. To whom do the results of an enriched trial demonstrating a benefit of active therapy over placebo apply? These considerations will be discussed further below.

Some caveats do apply to the use of the enrichment design. This design is best utilized to study non-curative treatments, as subjects cannot be cured
during the enrichment period and still be studied during the randomization period [1]. Nor can this design be used to study any irreversible treatment, such as surgery [9]. This design is most appropriate for chronic conditions such as chronic pain with target symptom endpoints that remain relatively stable. If a disease is progressive, as was found in a study with this design in an Alzheimer’s disease population, it can be difficult to ascertain which effects are due to the study treatment and which are attributable to disease progression [21]. Additionally, subjects may not return to baseline before the randomization period, which could also potentially affect results [1, 7]. This is similar to problems with other designs such as crossover trials (see Chapter 10), where a washout may be required. Despite these limitations, this design has been employed across a broad array of fields from neurology to oncology to cardiology to study conditions as varied as chronic pain, cancer, Alzheimer’s Disease, and mortality after myocardial infarction [16, 22–24].

The limitation most often cited regarding the enrichment design is that of generalizability. Ascertaining the broader population of subjects to which the results of a clinical trial can be extrapolated is a key interpretative challenge for clinicians. Because this design uses an initial enrichment period during which subjects who do not putatively respond to the enrichment criteria are excluded, critics of this design argue that the results of a trial with an enrichment design are less generalizable than those of a trial without such a feature. These critics have argued that this screening method often magnifies the treatment benefit that may be realized when giving the therapy to the general population in routine clinical practice and that discretion is warranted when considering results of this trial type [21]. Equally important, controlled evaluation of the safety profile of the active treatment is truncated. It is likely that some subjects discontinuing the treatment during the enrichment phase due to adverse effects would potentially develop more severe adverse effects were they to continue on therapy. An RCT without an enrichment phase provides a controlled safety evaluation of the active therapy over a longer time period. This drawback is significant because many of these therapies are indicated for chronic diseases that will result in prolonged exposure. However, proponents of the design have countered that the examination of a subgroup makes deciphering benefits to subpopulations defined by treatment rather than pain etiology easier. Clinicians need to be advised that the evidence supporting the use of therapies studied in this way applies to a more restricted population of patients [7].

It has been argued that the enrichment design decreases recruitment efficiency [25–26]. Because only a subset of subjects from the enrichment period of the trial will continue on to the randomized period, additional subjects need to be screened in the enrichment period to meet statistical power requirements. Lemmens et al. [26] discuss this concern, noting that as the proportion of subjects from the enrichment period who are randomized decreases, the power of the study decreases [25]. This is particularly concerning when: 1) the pool of available study participants is limited; 2) the designers have little guidance as to the relative proportion of subjects who will not be excluded after the enrichment phase; or 3) there is concern that the enrichment phase is not accurately identifying responders to specific treatment effects of the therapy in question. Conversely, Kopec’s computer-simulated comparison of sample sizes indicated that the number of subjects enrolled in the enrichment period to achieve equivalent power in the randomization period was actually slightly lower than the number of subjects required in a conventional RCT. These results were based on an assumption of sensitivity and specificity of greater than 70% for identifying responders in the enrollment period of the enrichment design.

In addition to the above criticisms, there are a number of issues that must be considered by researchers planning a clinical trial using an enrichment design, many of which are not unique to this design. In a classic parallel group RCT without enrichment, participants are exposed to the treatment or the comparator, most often a placebo, and nothing more. In an enrichment design, however, the participant is often exposed to the study intervention during the enrichment period [21, 27]. Participants may therefore be better able to identify what they are receiving for the randomized portion of the study. If adverse events experienced in the enrichment period increase, the subject may assume that he or she is in the treatment group, and if the effects decrease the participant may believe him or herself to be in the placebo group. This unblinding could bias results in either direction, but the ‘reverse placebo effect’ is of particular concern. In this instance, the participant feels he or she has been switched to placebo and therefore reports more symptoms than he or she might in a completely blinded study [7]. These occurrences would be more likely in the case of a study drug with multiple adverse effects that would be obvious to the
participant, or in trials that rely on subject reporting for outcome measures, such as pain scores in a study of analgesics [21].

These concerns have been cited in reference to studies such as the Tacrine Consortium study performed by Davis et al [21, 24]. In this study, otherwise healthy subjects with Alzheimer’s disease were initially randomized to receive tacrine at a dosage of 40 mg or 80 mg or placebo in two-week blocks of varying order. The Alzheimer’s Disease Assessment Scale (ADAS-cog) was used to assess the subjects during each phase of treatment. The subject’s ‘best dosage’ of tacrine was defined as the dosage at which the ADAS was at least 4 points lower than during the two-week placebo baseline period following the dosage-titration period. Subjects who achieved a best dosage were then included in a six-week randomized, double-blind, placebo-controlled period [24]. This design has been criticized because it involved exposure of subjects to both the study drug and placebo during enrichment. Particularly in the case of a drug such as tacrine with adverse effects including nausea and vomiting, when participants experienced the switch to placebo it is possible that unblinding may have occurred. This effect may have influenced the outcome of the study [21].

Unblinding is a concern in many types of clinical trials. However, there are multiple methods that can be used to minimize this effect. By randomizing only subjects who experience minimal adverse effects in the enrichment period, a strategy most enrichment design trials employ, the chance that subjects will recognize a change in frequency in these adverse events may be reduced. Also, if the study drug is tapered for subjects in the placebo group prior to the start of the randomized portion, rather than stopped abruptly, the chance that subjects will identify their treatment may be reduced if using a class of therapy with known withdrawal syndromes such as opioids. The only way to definitively assess the benefit of tapering vis-a-vis the issue of unblinding is to ask the subjects directly as to their beliefs about treatment allocation [28].

In order to identify the existence and extent of any unblinding, subject questionnaires can be directly used to query study participants about their belief as to when they received treatment or placebo [7]. These questionnaires may inquire about the treatment the subject believes he or she is receiving, as well as the reasons for this guess. The answers may be used to determine if there is a higher rate of unblinding than would be predicted by chance. Study results cannot be adjusted for unblinding, and unblinding may make the study results difficult to interpret.

Concern about the validity of the enrichment design also exists due to the potential for carryover effects. If the effects of the study treatment take time to wane after the treatment is stopped, this time lag must be factored into the study design. If this washout time period is not considered, there is potential that effects seen in the placebo group could be attributable to study treatment given to those participants during the enrichment period [21, 29].

Issues related to washout periods were considered by Irving et al. in their trial using the enrichment design to examine the use of gabapentin for treatment of postherpetic neuralgia (PHN). This study was enriched by including only subjects who had previously demonstrated a response to ≥ 1200 mg of gabapentin and excluding those subjects with dose-limiting adverse events and subjects with hypersensitivity to gabapentin. Because most subjects with PHN continue to have issues with pain control, most available subjects were undergoing treatment with various agents when they were enrolled in the study. Therefore, the researchers included a pharmacokinetic washout period of > 5 times the half-life of typical treatments for PHN including benzodiazepines, tricyclic antidepressants, oral steroids, and others, and a 14-day washout of potent opioids. In the study design, a one-week dose tapering period was built in before subjects were begun on active treatment in the randomized portion of the study [30]. These types of washout periods are essential in an enrichment design to reduce carryover effects.

Conclusion
The enrichment design is relatively novel and its many permutations are still being explored. While most studies to date have used criteria in the enrichment period such as subject response to a drug or response to a screening test, in the future these screening criteria may come to more frequently include molecular markers. The distinction between enrichment by response and enrichment by expected mechanism of action is significant. This type of enrichment design study will likely be very important in fields such as neuro-oncology. As more assays are being developed with increasing sensitivity and specificity for different molecular markers, it becomes more realistic to use these markers to screen subjects in an enrichment design setting.
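As a rough illustration of how the sensitivity and specificity of a screening criterion bear on an enriched trial, the following Python sketch computes the fraction of screened subjects who would be randomized, the fraction of those who are true responders, and approximate sample sizes with and without enrichment. It is a simplified calculation, not the model of Kopec et al. [9]: it assumes that only true responders benefit from treatment, that the outcome standard deviation is the same in both designs, and that a standard normal-approximation formula for comparing two means applies. The function names and the example inputs (30% true responders, a one-SD effect in responders) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def screen_summary(prevalence, sensitivity, specificity):
    """Return (fraction passing the screen, fraction of those who are true responders)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    frac_selected = true_pos + false_pos
    return frac_selected, true_pos / frac_selected


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for a two-arm comparison of means.

    effect_size is the mean treatment difference divided by the SD.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1.0 - alpha / 2.0) + std_normal.inv_cdf(power)
    return ceil(2.0 * (z_sum / effect_size) ** 2)


def compare_designs(prevalence, sensitivity, specificity, responder_effect):
    """Compare an all-comers RCT with an enriched design (illustrative assumptions only).

    Assumes non-responders have zero treatment effect, so the average effect in a
    group equals the responder effect times the fraction of true responders in it.
    """
    frac_selected, ppv = screen_summary(prevalence, sensitivity, specificity)
    n_all_comers = n_per_arm(prevalence * responder_effect)
    n_enriched = n_per_arm(ppv * responder_effect)
    return {
        "fraction_selected": frac_selected,
        "true_responder_fraction": ppv,
        "randomized_per_arm_all_comers": n_all_comers,
        "randomized_per_arm_enriched": n_enriched,
        # Subjects who must enter the enrichment period to fill both arms.
        "screened_for_enriched": ceil(2 * n_enriched / frac_selected),
    }


if __name__ == "__main__":
    # Hypothetical inputs: 30% true responders, 80% sensitivity and specificity.
    for key, value in compare_designs(0.30, 0.80, 0.80, 1.0).items():
        print(f"{key}: {value}")
```

Under these assumptions, a more accurate screen raises the proportion of true responders among randomized subjects and so lowers the number that must be randomized, while a more selective screen raises the number that must enter the enrichment period. Real trials would also need to account for dropout, placebo response, and any change in outcome variability after enrichment.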
Table 12.2 Examples of enrichment strategies for identifying responders in an enrichment design trial
1. Utilize study drug to identify responders
Identify responders to study drug given in open fashion
Identify responders to study drug given in single-blind fashion
Identify responders to study drug given in double-blind fashion
Identify patients who have responded to study drug in clinical practice
Identify patients whose condition flares when study drug is withdrawn
2. Utilize alternate methods to identify responders
Identify responders to a drug similar to study drug
Identify patients in whom symptoms can be induced, e.g., induce pain with treadmill test
Identify patients whose symptoms worsen with study drug withdrawal
3. Identify and exclude placebo responders using a placebo run-in
4. Identify and exclude patients with poor compliance
Source: Ref [3, 4].
Enrichment designs are being increasingly used in fields such as chronic pain research because they may better reflect routine clinical practice than other study designs. This strategy has specific advantages for testing a non-curative treatment in a chronic, non-progressive condition [7]. An enrichment design is well suited to examine treatments with small effect sizes in a general population with increased efficiency, particularly those treatments with a greater expected effect in a particular subpopulation of subjects [9]. Issues such as carryover effects and planning for an appropriate washout period must be considered when designing a trial using enrichment design [21]. When used to study an appropriate condition, and with proper planning to avoid pitfalls facing this and other similar clinical trial designs, enrichment enrollment designs offer an efficient way to evaluate potential therapies.

References
1. Amery W and Dony J. A clinical trial design avoiding undue placebo treatment. J Clin Pharmacol 1975; 15: 674–9.
2. Chow SC LJ. Design and Analysis of Clinical Trials, 2nd ed. Hoboken, NJ: John Wiley & Sons, Inc, 2004.
3. Quessy SN. Two-stage enriched enrolment pain trials: a brief review of designs and opportunities for broader application. Pain 2010; 148: 8–13.
4. Straube S DS, Derry S, McQuay HJ and Moore RA. Enriched enrolment: definition and effects of enrichment and dose in trials of pregabalin and gabapentin in neuropathic pain. A systematic review. Br J Clin Pharmacol 2008; 66: 266–75.
6. McQuay HJ, Derry S, Moore RA, et al. Enriched enrolment with randomised withdrawal (EERW): Time for a new look at clinical trial design in chronic pain. Pain 2008; 135: 217–20.
7. Katz N. Enriched enrollment randomized withdrawal trial designs of analgesics: focus on methodology. Clin J Pain 2009; 25: 797–807.
8. Quitkin FM and Rabkin JG. Methodological problems in studies of depressive disorder: utility of the discontinuation design. J Clin Psychopharm 1981; 1: 282–8.
9. Kopec JA, Abrahamowicz M and Esdaile JM. Randomized discontinuation trials: Utility and efficiency. J Clin Epidemiol 1993; 46: 959–71.
10. Ho TW BJ, Froman S and Polydekis M. Efficient assessment of neuropathic pain drugs in patients with small fiber sensory neuropathies. Pain 2009; 141: 19–24.
11. Hewitt DJ, Ho TW, Galer B, et al. Impact of responder definition on the enriched enrollment randomized withdrawal trial design for establishing proof of concept in neuropathic pain. Pain 2011; 152: 514–21.
12. Fu P, Dowlati A and Schluchter M. Comparison of power between randomized discontinuation design and upfront randomization design on progression-free survival. J Clin Oncol 2009; 27: 4135–41.
13. Byas-Smith MG, Max MB, Muir J and Kingman A. Transdermal clonidine compared to placebo in painful diabetic neuropathy using a two-stage ‘enriched enrollment’ design. Pain 1995; 60: 267–74.
14. Knipschild P, Leffers P and Feinstein AR. The qualification period. J Clin Epidemiol 1991; 44: 441–4.
15. Chow SC, editor. Encyclopedia of Biopharmaceutical Statistics. New York, Marcel Dekker, 2000.
16. Echt DS Liebson PR, Mitchell LB, Peters RW, et al. Mortality and morbidity in patients receiving encainide, flecainide, or placebo. The Cardiac Arrhythmia Suppression Trial. N Engl J Med 1991; 324: 781–8.
17. Chow S-C. Encyclopedia of Biopharmaceutical Statistics. New York, Marcel Dekker, 2000.
18. Fleming TR, DeMets DL. Surrogate end points in clinical trials: are we being misled? Ann Intern Med 1996; 125: 605–13.
19. Betensky RA, Louis DN and Cairncross JG. Influence of unrecognized molecular heterogeneity on randomized clinical trials. J Clin Oncol 2002; 20: 2495–9.
20. Staud R, Price DD. Long-term trials of pregabalin and duloxetine for fibromyalgia symptoms: how study designs can affect placebo factors. Pain 2008; 136: 232–4.
21. Leber PD and Davis CS. Threats to the validity of clinical trials employing enrichment strategies for sample selection. Controlled clinical trials 1998; 19: 178–87.
22. Lynch ME, Clark AJ and Sawynok J. Intravenous adenosine alleviates neuropathic pain: a double blind placebo controlled crossover trial using an enriched enrolment design. Pain 2003; 103: 111–7.
23. Rosner GL, Stadler W and Ratain MJ. Randomized discontinuation design: application to cytostatic antineoplastic agents. J Clin Oncol 2002; 20: 4478–84.
24. Davis KL, Thal LJ, Gamzu ER, et al. A double-blind, placebo-controlled multicenter study of tacrine for Alzheimer’s disease. The Tacrine Collaborative Study Group. N Engl J Med 1992; 327: 1253–9.
25. Lemmens HJM WD, Munera C, Eltahtawy A, et al. Enriched analgesic efficacy studies: an assessment by clinical trial simulation. Contemporary Clin Trials 2006; 27: 165–73.
26. Freidlin B SR. Evaluation of randomized discontinuation design. J Clin Oncol 2005; 23: 5094–8.
27. Staud R and Price DD. Role of placebo factors in clinical trials with special focus on enrichment designs. Pain 2008; 139: 479–80.
28. Moore RA, Derry S and McQuay HJ. Response to: Long-term trials of pregabalin and duloxetine for fibromyalgia symptoms: how study designs can affect placebo factors. Pain 2008; 139: 477–9; author reply 9–80.
29. Sonpavde G GM, Hutson TE and Von Hoff DD. Patient selection for Phase II trials. Am J Clin Oncol 2009; 32: 216–9.
30. Irving G, Jensen M, Cramer M, et al. Efficacy and tolerability of gastric-retentive gabapentin for the treatment of postherpetic neuralgia: results of a double-blind, randomized, placebo-controlled clinical trial. Clin J Pain 2009; 25: 185–92.
Chapter 13
Non-inferiority trials
Rick Chappell
Introduction and definitions

The scope of this chapter
The traditional role of the randomized clinical trial is to determine if there is superiority of one treatment, diagnostic technique, or preventive measure over one or more others (See Chapter 2). This paradigm is reasonable when standard of care interventions are non-existent or, in situations when they do exist, have undesirable characteristics such as low efficacy and/or high toxicity rates. As medical progress creates more alternatives and ethical considerations prohibit the use of inactive interventions in many cases, active-control trials are becoming common. These are studies in which one or more experimental treatments are compared to a control treatment whose effectiveness has previously been established [1]. (For simplicity’s sake, this chapter will refer to any intervention under study as a ‘treatment’, though all comments are generalizable to prevention and diagnostic trials.) Active-control superiority trials, each with the goal of trying to determine if a new treatment is better than an existing one with respect to the primary outcome, are possible and indeed common. But a control which shows clinical activity allows a type of question other than superiority to be answered: that of equivalence (also referred to, for reasons explained below, as non-inferiority). That is, we may wish to know if an experimental treatment is approximately as good as the control with respect to a given outcome. Thus active-control trials are not always equivalence studies although most equivalence trials are active-control. A possible exception to the latter statement is when a treatment is investigated as being equivalent to a placebo or other control using toxicity or other undesirable event as an outcome. See [2, 3] for non-technical summaries of ethical, practical, and scientific aspects of equivalence trials.
Despite certain problems in implementation and interpretation, which are discussed below, equivalence trials have yielded useful clinical results. Table 1 of [3] lists examples of important therapeutic advances in which the treatment was not proven more effective than an established treatment – including selective serotonin reuptake inhibitor antidepressants, which, though not shown to be more effective than tricyclic antidepressants, are better tolerated; and the antipsychotic drugs risperidone, olanzapine, and quetiapine which also were found to have fewer side-effects than the existing phenothiazine and butyrophenone classes of drugs without being proven more effective. Therefore, for better or worse, equivalence is being used as evidence for approving and implementing new classes of treatments.
A little clarification regarding terminology may now be useful. First, this chapter only mentions multi-arm randomized trials in ‘Multiple hypothesis testing and non-inferiority trials with more than two treatments’ (below); however, all other discussion also applies to studies with three or more comparators. Also, please notice that the preceding paragraphs use descriptions such as ‘approximately as good as’ and ‘equivalence’ instead of ‘equal to.’ This is because equivalence does not imply mathematical equality and, indeed, the latter cannot be statistically proven. More concretely, it is impossible to prove one treatment’s effect on an outcome to be exactly equal to another’s if the outcome has random variation, as of course do all clinical endpoints. We can only show one treatment to be ‘not much worse’ than another, a point which is made in more detail in ‘A false method of showing equivalence’ and ‘Formal definition of non-inferiority with respect to the equivalence margin and statement of hypotheses’ (below). Another ambiguity is that
equivalence is sometimes used to mean non-inferiority combined with non-superiority. Therefore, equivalence trials are now commonly referred to with greater precision as non-inferiority trials as they are below.
Finally, it is useful to distinguish the terms ‘equivalence’ and ‘bioequivalence’ if only to limit our attention to the former. The term ‘bioequivalence’ is used to describe a study of pharmacokinetic similarity between two treatment formulations, often in healthy subjects [4]. Design and analytic considerations for such experiments are different than those used in non-inferiority trials with clinical outcomes such as the example (SPORTIF III) described below. I do not discuss bioequivalence trials further.

A brief summary of superiority trials’ relevant properties
A few important aspects of superiority trials’ conduct and analysis are now mentioned as background for subsequent development of non-inferiority trials and a comparison of the two types of studies in ‘Key comparisons between superiority and non-inferiority trials’. These are perforce cursory and selected. See Chapter 5 in this volume for more information.

The role of randomization
Randomization is as central to the conduct of non-inferiority as it is to superiority trials. Fisher [5] argued that statistical inference (i.e., p-values and confidence intervals) is impossible without randomization. Even among those who consider that position to be extreme, a randomized controlled trial is the ‘standard by which all other trials are judged’ and ‘the best method for achieving comparability’ quoting [6], p. 61, but see also [7, 8]. This is because it is the only mechanism of assuring approximate comparability between the treatment groups with respect to both observed and unobserved predictors.

The role of intent-to-treat analysis
The intent-to-treat (or intention-to-treat; abbreviated ITT) principle states that patients who are randomized to a treatment group should be analyzed as part of that group even if they crossed over to a different treatment and requires that all outcomes be determined regardless of their purported relation to treatment. That is, the ITT analysis strategy uses all randomized patients along with all of their outcomes. It has been called ‘the most fundamental principle underlying the analysis from randomized controlled trials’ and described as ‘foundational to the experimental nature of randomized controlled trials’ [9]. Bath [10], while describing approaches to clinical trials of stroke, noted that the definition of ITT is sometimes weakened to include only patients who receive one or more treatments (most trialists prefer this latter standard to be referred to in a qualified manner as ‘modified intent-to-treat’ or other similar term). A competing analysis strategy would be to only include those patients who comply closely enough with the protocol. These constitute a per protocol (PP) sample of patients and although a PP analysis is sometimes thought to be relevant for toxicity and other outcomes, it is almost universally considered less relevant than ITT. The role of ITT in non-inferiority trials is discussed further below (‘Intent-to-treat in non-inferiority vs. superiority trials: which analysis population should be used?’).

Other means of avoiding bias
Randomization and an ITT analysis are not the only ingredients of high-quality results in a clinical trial. Although a complete discussion is beyond the limits of the present chapter, I present three particularly important properties. The first is blinding or masking [11]. When a patient is unaware of his or her treatment, he or she is unable to attribute a clinical response to a specific treatment. Such an attribution, if it varied with the treatment group, would bias the estimated treatment effect. A general attribution of improvement without knowledge of the treatment group is certainly possible due to the well-known placebo effect; but if these subjective conclusions are unrelated to the treatment, as they must be in the presence of proper blinding, bias from this source is impossible. See Chapter 5 for details.
Well-defined endpoints, clearly prioritized and stated in advance, are important components of powerful trials yielding useful results. Lack of ambiguity is important in order for a study’s conclusions to be clear. For example, suppose a treatment is hypothesized to reduce bone fractures due to falls. Of course the investigators could make the occurrence of any bone fracture following a fall the study’s primary endpoint. However, even in high-risk populations this tends to be a rare occurrence with moderate follow-up and so a trial with this primary outcome would either be underpowered or very large. Alternative endpoints include: any bone fracture; the number of fractures; the number of separate incidents involving a fracture (this differs from the preceding when a single accident causes more
than one fracture); the number of falls; the number of days on which a fall occurs (patients may not remember multiple falls in a single day); and various measures of balance and stability. These are just some examples of endpoints potentially assessing neurological interventions; trials of treatments which are thought to directly influence bone strength can have other outcomes such as measures of bone mineral density. Each endpoint and its associated hypothesis should be stated in advance, especially for the trial’s primary question, and is often a trade-off between clinical relevance and the trial’s ability to answer the question.
One important aspect of the research question for any outcome in which time plays a role (which is the majority of clinical outcomes) is length of patient follow-up. There are choices involved with even such an apparently unambiguous outcome as mortality: whether mortality is to be defined dichotomously as having occurred or not occurred over the course of the study; or whether time to death is to be the primary descriptor; whether all-cause mortality will be assessed; or, if not, which deaths due to intercurrent illnesses will be excluded; and, in all cases, how long each patient is to be observed. In all but the most pernicious illnesses some survivors will be seen and so their lack of events at the study’s end, or right-censoring, will require special analytic techniques.

Statement of null and alternative hypotheses for superiority trials
Consider a superiority trial in which an endpoint is described with a quantity denoted, for simplicity’s sake, as ‘Effect’. This could be a mean or a median of a continuous outcome such as a stroke severity scale; a proportion of seizure-free patients; the hazard of or median time to death or other failure time outcome; or some other quantity of interest. Suppose also that the treatment groups are labeled ‘Experimental’ and ‘Control’, where the latter could refer to an active, placebo or other control, and are abbreviated E and C. Then assuming lower values of Effect are better the null hypothesis of no difference is formulated as:

$$H_0: \text{Effect}_E - \text{Effect}_C = 0,$$

meaning that Effect is identical in the experimental treatment and control groups, and the alternative hypothesis of superiority is:

$$H_A: \text{Effect}_E - \text{Effect}_C \le 0,$$

implying that the treatment group betters the control as measured by Effect. Data from the superiority trial are used to either reject or not reject $H_0$ according to a pre-defined significance level (maximum false positive error rate, also called type I error rate or α) such as 0.05, 0.025, or 0.01. Rejecting $H_0$ and accepting $H_A$ would imply that, subject to the possibility of error quantified by the p-value, E is superior to C. On the other hand failure to reject $H_0$ would not imply that $H_A$ is false, rather that the trial does not contain enough information to support the conclusion that it is true. The inequality in $H_A$ need not be strict – substituting ≠ for ≤ merely makes the hypothesis two-sided so that it tests superiority in either direction.

A false method of showing equivalence
The penultimate sentence of the previous part belies an informal, seemingly ubiquitous yet erroneous strategy for attempting to prove equivalence: performing a clinical trial then concluding superiority if $H_0$ is rejected and equivalence otherwise. Of course, a small clinical trial could fail to reject a false null hypothesis. In fact, if the goal of a trial is to show equivalence by not rejecting $H_0$, the chance of success would be maximized by a sample size of 0! No information yields no conclusive evidence of superiority, but of course should give no evidence of equivalence either.
An alternative has long been advocated by statisticians: computing a confidence interval for $\text{Effect}_E - \text{Effect}_C$ (using the notation of ‘Statement of null and alternative hypotheses for superiority trials’, again assuming large effects to be unfavorable) and using it to characterize the difference between treatment and control in terms of the effect of interest. A confidence interval may include zero, meaning that equivalence would not be ruled out, but it will also include a range of other possibilities for the difference in effects. This entire range must be compared to clinically interesting differences in order to be interpreted. (The considerations which go into defining ‘clinically interesting’ are discussed below in ‘Choice of equivalence margin and scale’.) In particular, the upper endpoint of the confidence interval for $\text{Effect}_E - \text{Effect}_C$ gives the worst reasonable estimated performance of the experimental treatment compared to the control. ‘Reasonable’ reflects the coverage of the confidence interval and corresponding false positive error rate, often but not necessarily 97.5% and 2.5%, respectively.
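As a minimal sketch of the confidence-interval approach described above, the following Python fragment computes an interval for a difference in event proportions (experimental minus control, lower values being better) and reports its upper endpoint. The Wald (normal-approximation) interval, the helper name `diff_ci`, and the event counts are all assumptions made for illustration; the chapter does not prescribe a particular interval method.

```python
from math import sqrt
from statistics import NormalDist


def diff_ci(events_e, n_e, events_c, n_c, coverage=0.95):
    """Two-sided Wald CI for (event proportion in E) - (event proportion in C).

    Hypothetical helper: the normal approximation is an assumption of this
    sketch and is only reasonable for fairly large samples.
    """
    p_e = events_e / n_e
    p_c = events_c / n_c
    diff = p_e - p_c
    se = sqrt(p_e * (1 - p_e) / n_e + p_c * (1 - p_c) / n_c)
    # Two-sided coverage of 95% leaves 2.5% in each tail.
    z = NormalDist().inv_cdf(1 - (1 - coverage) / 2)
    return diff - z * se, diff + z * se


# Made-up counts: 40 events among 1000 patients on E, 38 among 1000 on C.
lower, upper = diff_ci(40, 1000, 38, 1000)
print(f"95% CI for the difference: ({lower:.4f}, {upper:.4f})")
print(f"worst reasonable excess event proportion for E: {upper:.4f}")
```

The upper endpoint of a two-sided 95% interval is also a one-sided 97.5% bound, which corresponds to the 97.5% and 2.5% figures quoted in the text. An interval that contains zero is not in itself evidence of equivalence; the whole range, and the upper endpoint in particular, still has to be judged against clinically interesting differences.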
Formal definition of non-inferiority with respect to the equivalence margin and statement of hypotheses
As with most aspects of the analysis of a randomized clinical trial, the standard to which the worst performance of the treatment vs. the control (the upper endpoint of the confidence interval just mentioned) is to be compared should be specified in advance. This standard is called a non-inferiority margin; the ICH Guideline E3 [12] requires the trial’s protocol to state the margin to be a ‘pre-specified degree of inferiority’ often denoted as Δ. We can thus frame the hypothesis-testing paradigm in the usual fashion, where rejecting the null hypothesis $H_0$ gives a successful resolution of the trial’s primary goal (a conclusion that the experimental treatment is non-inferior to the control treatment) and failing to reject it results in the opposite (that a conclusion of inferiority is reasonable). This requires a new pair of null and alternative hypotheses:

$$H_0: \text{Effect}_E - \text{Effect}_C > \Delta,$$

meaning that Effect in the experimental treatment is inferior to Effect in the control group by an amount exceeding Δ. Also

$$H_A: \text{Effect}_E - \text{Effect}_C \le \Delta,$$

meaning that the experimental treatment may be superior to the control, identical to it, or slightly inferior to it in terms of Effect but, in the last case, the inferiority is no more than Δ. If we reject $H_0$ and accept the alternative hypothesis $H_A$ we claim to have demonstrated E to be non-inferior to C with respect to a pre-specified margin Δ. If (and only if) so, the confidence interval for $\text{Effect}_E - \text{Effect}_C$ falls entirely below Δ. There are a variety of possible combinations of non-inferiority and superiority conclusions; these are discussed in the following part and shown in Figure 13.1.

Motivating example – SPORTIF III, a non-inferiority trial of ximelagatran vs. warfarin in stroke prevention

Brief description of study background and goals
The Stroke Prevention using Oral Thrombin Inhibitor in Atrial Fibrillation (SPORTIF) III trial was an open-label study of the efficacy of ximelagatran vs. warfarin in the prevention of strokes and systemic embolic events in patients with atrial fibrillation. A blinded trial of otherwise similar design, SPORTIF V, was also conducted approximately concurrently. For simplicity’s sake, this chapter will only mention SPORTIF III below although considerations discussed here apply to both studies. The trials’ designs are described in more detail in [13] and their results in [14]. Ximelagatran was a new thrombin inhibitor under investigation as an alternative to warfarin because of the latter’s side effects and intolerability in some patients. Because ethical considerations forbade a placebo arm in these high-risk patients, non-inferiority trials were conducted to determine whether or not ximelagatran is clinically equivalent to warfarin. SPORTIF III’s salient characteristics are given in Table 13.1.

Statement of equivalence margin and hypotheses
Based on previous studies and assuming equality of effect, a primary event rate of 3.1% per year was estimated in advance of the study for patients in both treatment groups with additional patients or follow-up planned if necessary to guarantee at least 80 primary endpoints. The margin of equivalence was chosen to be 2% per year, inducing the following null and alternative hypotheses, denoted $H_0$ and $H_A$, respectively, where $\text{Rate}_E$ and $\text{Rate}_C$ refer to the annual rates of the primary endpoints in the two groups:

$$H_0: \text{Rate}_E - \text{Rate}_C > 2\%/\text{year}$$

and

$$H_A: \text{Rate}_E - \text{Rate}_C \le 2\%/\text{year}.$$

Thus ximelagatran (‘E’) is the experimental treatment and warfarin (‘C’) is the active control. The primary aim of SPORTIF III was to prove that ximelagatran does not cause an excess of 2%/year or more in strokes and systemic embolic events compared to warfarin. The logic behind the choice of a 2% margin is briefly discussed below (‘Demonstrating superiority to placebo’).

Interpretation of possible trial results
Figure 13.1, from Figure 2 of [13], summarizes an assortment of hypothetical SPORTIF III results with their interpretations. Point estimates (diamonds) with two-sided 95% confidence intervals are given for a variety of scenarios. Dotted lines show the 2% margins of clinical equivalence. The top two confidence intervals
lie entirely within the region of clinical equivalence; for these cases, there is evidence that neither drug is superior to the other. The third interval indicates a possibility but no proof that ximelagatran is superior. For all three cases, non-inferiority of ximelagatran is demonstrated ($H_0$ is rejected in favor of $H_A$) because the confidence intervals do not overlap the right dotted line drawn at Δ = 2%/year. The next two confidence intervals are inconclusive, in that neither non-inferiority of ximelagatran nor superiority of warfarin can be demonstrated. The following interval shows a situation in which superiority (and thus, of course, non-inferiority also) of ximelagatran is concluded. The last possible case is that of ximelagatran’s proven inferiority to warfarin.

Table 13.1 Design elements of SPORTIF III

Trial type                      Randomized, parallel, two cohorts
Blinding status                 Open-label, blinded assessment
Planned sample size             3407 patients at 259 sites in 23 countries
Patient population              Age ≥ 18 y.o., atrial fibrillation, high-risk (at least one of: hypertension;
                                age ≥ 75 y.o.; previous stroke, TIA, or systemic embolism; left ventricular
                                dysfunction; or age ≥ 65 y.o. and diabetes mellitus or coronary artery disease)
Timing                          Enrollment 7/2000–12/2001
Planned average duration of     16 months
treatment / followup
Treatment groups                Ximelagatran (E; experimental treatment)
                                Warfarin (C; active control)
Primary endpoint                Stroke (ischemic or hemorrhagic) or systemic embolic event
Margin of equivalence           Δ = 2%/year
Size of test for equivalence    α = 0.025
Power of test for equivalence   90% for primary event rate of 3.1%/year in each treatment group

Figure 13.1. Hypothetical outcomes from the SPORTIF III trial with point estimates and 95% two-sided confidence intervals. [Figure: intervals labeled Non-inferiority, Inconclusive, Superiority, and Inferiority; regions marked ‘Ximelagatran better’, ‘Clinical equivalence’, and ‘Warfarin better’; horizontal axis: event rate difference (ximelagatran minus warfarin), % per year, from –6 to 6.]

Key comparisons between superiority and non-inferiority trials

Sample size
For a non-inferiority trial, the discussion above (‘Formal definition of non-inferiority with respect to the equivalence margin and statement of hypotheses’) shows that $H_0$ is rejected, and $H_A$ accepted, if the confidence interval for the difference in treatment effects falls entirely below the non-inferiority margin Δ. Since confidence intervals narrow with increasing sample size, the chance of rejecting $H_0$ and concluding equivalence (the power of the trial) increases with sample size as long as $H_0$ is false. In this respect non-inferiority and
superiority trials are identical: higher sample sizes give more information. It is also true that the sample size required for a superiority trial with a treatment effect ($\text{Effect}_E - \text{Effect}_C$ in the notation above) of size Δ is the same as that needed for a non-inferiority trial with margin Δ assuming equality ($\text{Effect}_E = \text{Effect}_C$). This holds exactly for normally distributed outcomes, approximately for other types of outcomes such as binary data, and assumes identical powers and critical p-values. See Chapter 4 by Bebchuk and Wittes for a description of sample size calculation in superiority trials.

Table 13.2 Total sample sizes for two-arm non-inferiority trials with false-positive rate = 0.025 and power = 90% as a function of additive non-inferiority margin and event proportions.

Event proportion
in both groups     Δ = 0.05    Δ = 0.1    Δ = 0.15    Δ = 0.2
0.05                    800        200          90         50
0.1                   1,514        380         170         96
0.2                   2,690        674         300        170
0.3                   3,532        884         394        222
0.4                   4,036      1,010         450        254
0.5                   4,204      1,052         468        264
0.6                   4,036      1,010         450        254
0.7                   3,532        884         394        222
0.8                   2,690        674         300        170
0.9                   1,514        380         170         96
0.95                    800        200          90         50

The simplest binary sample size calculation, condensing Equation (11.5) of [15], gives the following formula for a two-arm non-inferiority trial with binary outcomes, non-inferiority margin Δ, and equality of proportions. The assumed proportion of events in the experimental and control groups (under the usual alternative hypothesis used for power/sample size calculation purposes, this is equal in the two groups) is denoted P. Then the total required sample size is

$$n = \frac{4 \, P \, (1 - P) \, (Z_{\text{Power}} + Z_{1-\alpha})^2}{\Delta^2}.$$
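The formula above can be sketched in a few lines of Python. Here Z_Power and Z_{1-α} are the standard normal quantiles for the required power and for one minus the significance level. The function name `total_sample_size` and the rounding rule (each arm rounded up to a whole patient, then doubled) are assumptions of this sketch; with unrounded quantiles that rule appears to reproduce the entries of Table 13.2, but the chapter does not state how its entries were rounded.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(p, delta, alpha=0.025, power=0.90):
    """Total n for a two-arm non-inferiority trial with a binary outcome.

    Implements n = 4 * P * (1 - P) * (Z_power + Z_{1-alpha})**2 / delta**2,
    assuming the same event proportion p in both arms.
    Assumption of this sketch: each arm is rounded up to a whole patient.
    """
    z_power = NormalDist().inv_cdf(power)      # about 1.28 for 90% power
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 1.96 for alpha = 0.025
    n_total = 4 * p * (1 - p) * (z_power + z_alpha) ** 2 / delta ** 2
    per_arm = ceil(n_total / 2)
    return 2 * per_arm


# Print one row per event proportion, one column per margin.
margins = (0.05, 0.1, 0.15, 0.2)
for p in (0.05, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, [total_sample_size(p, d) for d in margins])
```

Because P × (1 − P) is symmetric about 0.5, the rows for 0.6 through 0.95 mirror those for 0.4 through 0.05. Using the rounded quantiles 1.28 and 1.96 instead of the unrounded ones shifts some entries by a patient or two.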
$Z_{\text{Power}}$ is the normal quantile for the required power; for example, with 90% power $Z_{0.9} = 1.28$. Also, $Z_{1-\alpha}$ is the normal quantile for one minus the significance level α; for a traditional significance level of 0.025, $Z_{0.975} = 1.96$. Table 13.2 gives sample sizes for a two-arm non-inferiority trial with 90% power and a significance level of α = 0.025 for a range of non-inferiority margins Δ and event proportions.
This table shows that if the event rate in both groups is 0.6 then 450 subjects (225 per group) are required to provide 90% power to demonstrate non-inferiority of the experimental group with respect to a 0.15 margin. Table 13.2 holds no matter whether events are successes or failures, but note that certain cells aren’t applicable to both situations. For example if events are failures then the lower right corner isn’t relevant: assuming proportions equal to 0.9 with a margin of 0.2 would allow non-inferiority to extend above a proportion of 1. An experimental treatment whose failure rate is 100% couldn’t be usefully claimed to be non-inferior to anything. However if events are successes then the lower right corner is useful but the upper left is not.
Sample size formulas for proportions using a relative (multiplicative) instead of an absolute (additive) margin are given in [15], as are methods for time-to-event outcomes.
Fleming [16] claims the notion that non-inferiority trials with scientifically rigorous margins always require very large sample sizes is a common one, and correctly describes it as a myth. The logic varies with the situation; at times a large margin may be sufficient to show clinical equivalence and at other times a smaller one is required. Remember that all the sample size calculations above are carried out under the assumption of true equivalence, that proportions in each group are the same. It is certainly possible to perform these computations using the equations of [15] under a more optimistic assumption of superiority on the part of the experimental treatment. This procedure has been proposed [17] and indeed lowers the sample size. However, an investigator assuming superiority and testing only for non-inferiority can be accused of ‘having his cake and eating it too’ (or even her cake). On the other hand, pessimism – that is, assuming a slight inferiority of the new treatment compared to the standard treatment – could be a usefully conservative strategy in computing sample size.

How patient non-adherence influences results in the two types of trials
Consider a trial in which one drug is tested for superiority over another. Suppose the trial is successful in that the experimental treatment effect significantly improves upon the control’s but that there is substantial non-compliance in each group. Although
the results could be legitimately criticized on various grounds – including lack of generalizability, impracticality of extension to ordinary clinical practice, and underestimation of true efficacy and toxicity rates – the non-compliance would not contradict the basic conclusion of superiority. Non-compliance biases the ‘true effect’ (the difference achieved under the scenario of full drug exposure) towards the null and therefore the observed results would be conservative. The same cannot be said of non-inferiority trials: if nobody in either group of this type of study took their drugs then of course the treatments would appear to be equivalent. Thus it has been noted that ‘One of the concerns that has been expressed regarding equivalence trials is that sloppiness in the conduct of the trial biases results towards no difference [18, 19].’ But since nearly all trials are sloppy to some extent, this leads us to the question of what amount of non-compliance in a non-inferiority trial invalidates its results. No firm answer has been given so far; Chi et al. [1] have given a practical recommendation that ‘One should design the current trial to be as similar as possible to the historical placebo control trials used in estimating the historical control effect.’

Intent-to-treat in non-inferiority vs. superiority trials: which analysis population should be used?
As stated above (‘The role of intent-to-treat analysis’), the intent-to-treat strategy for analysis is crucial to the interpretation of superiority trials. The ICH Guideline E3 [12], in a section which neither specifically refers to nor excludes non-inferiority trials, states ‘An analysis using all available data should be carried out for all studies intended to establish efficacy.’ Most analysts agree with this statement. But per-protocol analyses, for reasons given above, are also particularly relevant for non-inferiority trials. The problem with the ITT strategy in non-inferiority trials is that it could bias results towards equivalence, making it an anti-conservative strategy. It can also bias results in the opposite direction if the patterns of non-compliance are different in the two groups. Since PP analyses can also be biased, we are left with a difficult choice. Wiens and Zhao [20] promote ITT, saying that ‘The ITT analysis follows from randomization, and must be used to maintain the integrity of randomization.’ They also point out that a non-inferiority trial should be conducted similarly to the trials which established efficacy of the control. Since the latter were presumably superiority trials which used ITT, comparability is enhanced by using ITT in the non-inferiority trial. A practical compromise is to perform both ITT and PP analyses, hope that they are similar, and if so to use the former for the main results. If the two analyses differ then it may be a useful if laborious exercise to model the missing data/non-compliance mechanisms and perform sensitivity analyses on their effects upon the trial’s results.

Practical issues in non-inferiority trials

Choice of equivalence margin and scale
Many authors have discussed the choice of margin in non-inferiority trials [21–25]. All agree that it should be made in advance and pre-specified in the protocol. A variety of guidelines have been put forth; although these are usually subjective, rational decisions can be made based on clinical factors and data from the historical study or studies which established the active control treatment’s efficacy.
The margin Δ should clearly not exceed the benefit provided by the control treatment; otherwise the experimental therapy would not be shown superior to placebo (assuming that the control was tested against a placebo) even if non-inferiority by Δ was demonstrated. One rule of thumb is that the margin should be less than half the benefit ascribed to the control, i.e., the new treatment retains at least one-half of the benefit of the active control treatment. Even this ‘50% rule’ is vague because it could use either the estimated control benefit or the lower bound of the 95% or other level confidence interval (the description of Figure 13.3 below uses estimated benefits but lower confidence bounds could be substituted without changing its message). One opinion, published in the aptly named ‘The trials and tribulations of non-inferiority: the ximelagatran experience’ [26], claimed that SPORTIF III had ‘… an unreasonably generous margin that was potentially biased toward non-inferiority’. The authors thus evinced ‘… a lack of confidence that ximelagatran retains at least 50% of warfarin’s effect (a prerequisite to the establishment of non-inferiority)’ and expressed a preference, based on a meta-analysis of warfarin’s effect compared to placebo, for a margin of Δ = .68%/year. They did not mention their preference’s consequence (because sample size is roughly inversely proportional to the square of the margin) that a change of the margin from 2% annually downward to .68% would have
multiplied the sample size by a factor of 8.7, necessitat-

ing approximately 29,500 patients.
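The factor of 8.7 quoted above follows from the rule of thumb that sample size is roughly inversely proportional to the square of the margin. The short sketch below reproduces that arithmetic only; the function name is hypothetical, and the calculation assumes that event rates, power, and significance level stay fixed while the margin changes.

```python
def sample_size_inflation(old_margin: float, new_margin: float) -> float:
    """Approximate factor by which sample size grows when the margin shrinks.

    Assumption (rule of thumb from the text): sample size is inversely
    proportional to the square of the margin, everything else held fixed.
    """
    if old_margin <= 0 or new_margin <= 0:
        raise ValueError("margins must be positive")
    return (old_margin / new_margin) ** 2


# Margins in percentage points per year, as quoted for SPORTIF III.
factor = sample_size_inflation(old_margin=2.0, new_margin=0.68)
print(f"sample size inflation factor: {factor:.1f}")
```

Because the relationship is quadratic, even a modest tightening of the margin has a large effect on the required number of patients.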
Another standard for setting the margin is that it should be clinically relevant. Even if the active control was shown to have a large effect, a margin of half that effect may be too large for judging non-inferiority between it and an experimental treatment. For example, if an active control antibiotic is known to cure about 90% of infections a margin of 45% is likely too large for comparing it to an experimental treatment. A successful trial may not yield clinically useful results even if it were to show a maximum 45% difference, allowing a cure rate as low as 45% for the new treatment. See [26] for an extensive discussion of the choice of margin for the SPORTIF III trial.

The choice of a margin’s scale (e.g., should Δ constitute a difference between two means, proportions, or rates; or their ratio; or some other quantity such as an odds ratio) is less commonly mentioned but is also important. This is not an issue in superiority trials. For example, the null hypothesis of no effect on mean blood pressure could be expressed as a difference between means equaling 0 or a ratio equaling 1, but it wouldn’t matter: the two are mathematically equivalent. In non-inferiority trials the scale used in the null hypothesis does matter because we need it in order to define the margin. The designers of the SPORTIF trial assumed for the sake of power calculations that event rates were 3.1%/year with an absolute, or additive, 2% margin (see Table 13.1). They could have chosen a relative, or multiplicative, scale to judge equivalence, for example 5.1/3.1 = 1.65. This would have resulted in a very different trial. The interpretation for patient populations with different risks would change: if we wanted to extrapolate SPORTIF’s results to a higher-risk group with stroke/systemic embolic rate of 6%, a 2% absolute margin indicates that they would have a maximum rate of 8%; but a 1.65 relative margin gives a maximum rate of 9.9%. Power calculations produce different sample sizes for the two scales with a relative margin requiring a larger trial. Analytic methods would change as well: a relative margin relies on the well-known proportional hazards model, while an absolute margin requires the more rarely used additive hazards regression [27]. Note that although ‘Formal definition of non-inferiority with respect to the equivalence margin and statement of hypotheses’ (above) only shows hypotheses for absolute differences, they can also apply to relative differences when effects are logarithmically transformed.

Demonstrating superiority to placebo
The problem of invoking a non-randomized comparison
Temple and Ellenberg [3] state the fundamental problem of non-inferiority studies to be one of assay sensitivity: the ability of a trial to show that a new therapy is effective. This can be achieved by demonstrating it to be superior to a placebo control or, as discussed here, to not be inferior by some defined amount to a known effective treatment. However the inference that non-inferiority to the known treatment implies superiority to a placebo involves a crucial assumption, and Temple and Ellenberg point out that ‘support for this assumption must come from sources external to the trial.’ In the SPORTIF III example we must assume that warfarin is effective in the population under study in order for non-inferiority of ximelagatran to be clinically interesting. This is not provable by the trial because an inactive control group is not included in it. Thus inference between the experimental treatment and active comparator relies on historical evidence of sensitivity to drug effects (HESDE; see Section V of [28]), based on past trials. Figure 13.2a shows a schematic for this type of inference. A non-inferiority trial (right) yields a randomized comparison between the experimental and active control treatments in its patients while the historical superiority trial (left) gives a randomization-based estimate of the active control’s effect compared to a control. But the inferential leap that the active control’s effect in the non-inferiority trial is the same as in the historical trial is not based on evidence, or at least not on evidence of the same quality as that used for the other conclusions. Therefore HESDE, like all important truths, can be a nebulous thing [29]. This is illustrated by the International Conference of Harmonisation (ICH) Guideline E10 [30] which gives a variety of scenarios in which a well-run randomized controlled clinical trial’s result may not be reproduced. HESDE is relevant only if it was achieved under conditions similar to those obtained by the new trials. These conditions include, but are not limited to, the patients under study, adjuvant treatments, and diagnostic standards. Note that comparisons conducted within a single trial have no such problems as shown by Figure 13.2b. Though their generalizability may be questioned, these conclusions’ validity depends only upon all patients being drawn from the same population and their treatment randomly determined [31].
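The difference between an absolute and a relative margin, as described for SPORTIF in this section, can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (a 3.1%/year design rate, a 2% absolute margin, and a higher-risk group with a 6% rate); the function and variable names are illustrative assumptions.

```python
def max_rate_absolute(control_rate: float, margin: float) -> float:
    """Largest tolerated event rate under an additive (absolute) margin."""
    return control_rate + margin


def max_rate_relative(control_rate: float, ratio: float) -> float:
    """Largest tolerated event rate under a multiplicative (relative) margin."""
    return control_rate * ratio


design_rate = 3.1       # %/year, assumed event rate in the power calculation
absolute_margin = 2.0   # percentage points per year

# Relative margin implied by the same design point, rounded as in the text.
relative_margin = round((design_rate + absolute_margin) / design_rate, 2)

high_risk_rate = 6.0    # %/year, the higher-risk group used as an example
print(f"relative margin: {relative_margin}")
print(f"absolute scale, max rate: {max_rate_absolute(high_risk_rate, absolute_margin):.1f}%")
print(f"relative scale, max rate: {max_rate_relative(high_risk_rate, relative_margin):.1f}%")
```

The two scales agree at the design rate of 3.1%/year and diverge as soon as the underlying risk differs from it.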
Figure 13.2a. Schematic for inference in non-inferiority trials. [Historical trial: nH patients are randomized to placebo (nH/2) or old drug (nH/2), and the two arms are compared. Equivalence trial: nE patients are randomized to old drug (nE/2) or new drug (nE/2), and the two arms are compared. The link between the two trials is marked with a question mark.]

Figure 13.2b. Schematic for inference in superiority trials. [n patients are randomized to old drug (n/2) or new drug (n/2), and the two arms are compared.]

The problem of equivalency drift
Even if HESDE holds and the inference indicated by a question mark in Figure 13.2a is valid, other problems can interfere with non-inferiority studies’ assay sensitivity. Suppose instead of an ‘Old drug’ and a ‘New drug’ we have a series of treatments denoted ‘Drug 1’ … ‘Drug 4’ and that Drug 1 was proven to be superior to a placebo by 4% in a historical trial. This is schematically illustrated in Figure 13.3, which depicts an artificial series of results via point estimates (these are less realistic than lower confidence interval endpoints but render a clearer example). For ethical or perhaps other reasons, Drug 2 was not compared to a placebo but to Drug 1 in a non-inferiority trial with a non-inferiority margin of 2%. We suppose that the null hypothesis was rejected in this trial and non-inferiority was concluded implying, in the presence of HESDE, that Drug 2’s benefit exceeded 2%. In fact it was estimated to be 1.5% lower than Drug 1’s, as 2.5%. Then, similarly, Drug 3 can be concluded to be non-inferior to Drug 2, with an estimated effect of 1.2%, say, and Drug 4 seen to be non-inferior to Drug 3. But Drug 4 has an estimated 0% benefit! This unpleasant feature, in which a series of non-inferiority trials can make an active treatment appear equivalent to a placebo even in the presence of HESDE, has been termed ‘bio-creep’ [16]. It can be prevented by comparing Drugs 2, 3, and 4 all to Drug 1 and not to each other in an ever-more-imprecise sequence. This preventive may be impractical because Drug 1 could be off the market or undesirable due to considerations such as side effects.

The problem of an incentive to produce minimally significant results
In ‘Practical issues in non-inferiority trials’ (above) it was mentioned that the precision of the active control’s estimated effect is relevant in choosing a non-inferiority margin for comparing it to an experimental treatment. If the active control’s effect estimated from a historical trial was 10% with a confidence interval of ± 4% then we clearly don’t want a non-inferiority margin in the new study to be 6% or higher. Under HESDE the active control’s benefit in the new trial could be 6%, the lower end of the confidence interval, and so a 6% margin would allow an ineffective experimental treatment to be judged equivalent to it. This means that in a new age of active controlled non-inferiority trials a drug company could have an incentive to make demonstration of assay sensitivity as difficult as possible for its product’s successors. One way to do so in the present example would be to aim, perhaps using interim stopping guidelines, for a confidence interval of ± 9.9% whereby the 50% rule could require a margin of 0.05%, half the control’s minimum benefit. Are we headed for a future in which all confidence intervals just barely exclude 0 and all p-values are 0.049? A related point is that if there are several active controls available there is great incentive to choose the ‘worst’ active control as a comparator.
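The dependence of the margin on the precision of the historical estimate can be made concrete with the two confidence intervals used in this section. The sketch below is an illustration only: the function names are hypothetical, and the ‘50% rule’ is implemented simply as half of the lower confidence bound, which is how the text applies it.

```python
def lower_bound(estimate: float, half_width: float) -> float:
    """Lower end of a symmetric confidence interval, estimate +/- half_width."""
    return estimate - half_width


def fifty_percent_margin(estimate: float, half_width: float) -> float:
    """Margin equal to half the control's minimum plausible benefit.

    Assumption: the rule is applied to the lower confidence bound, and the
    interval must exclude 0 for a margin to be defined at all.
    """
    minimum_benefit = lower_bound(estimate, half_width)
    if minimum_benefit <= 0:
        raise ValueError("confidence interval includes 0; no margin is defensible")
    return 0.5 * minimum_benefit


# Historical effect of 10% estimated with two different precisions.
for half_width in (4.0, 9.9):
    lb = lower_bound(10.0, half_width)
    margin = fifty_percent_margin(10.0, half_width)
    print(f"10% +/- {half_width}%: lower bound {lb:.2f}%, margin {margin:.2f}%")
```

A wide interval around the same point estimate leaves successors with a margin so small that non-inferiority becomes very hard to demonstrate.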
Figure 13.3. The problem of equivalency drift. [Benefit, on a scale from 0 to +4%, is plotted for Drug 1, Drug 2, Drug 3, and Drug 4; each successive pair is judged equivalent under a margin of non-inferiority (Δ) of 2%.]
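The drift depicted in Figure 13.3 can be traced numerically with the artificial point estimates from the text (4%, 2.5%, 1.2%, and 0%) and a 2% margin. This is a simplified sketch: like the figure, it compares point estimates rather than confidence bounds, and the helper name is an assumption.

```python
MARGIN = 2.0  # non-inferiority margin, percentage points

# Estimated benefit over placebo for each drug (artificial values from the text).
effects = {"Drug 1": 4.0, "Drug 2": 2.5, "Drug 3": 1.2, "Drug 4": 0.0}


def non_inferior(control_effect: float, new_effect: float, margin: float = MARGIN) -> bool:
    """True if the new drug's shortfall relative to the control is below the margin.

    Simplification: uses point estimates only, ignoring sampling error.
    """
    return control_effect - new_effect < margin


names = list(effects)

# Sequential comparisons: each drug against its immediate predecessor.
for previous, current in zip(names, names[1:]):
    verdict = non_inferior(effects[previous], effects[current])
    print(f"{current} vs {previous}: non-inferior = {verdict}")

# Anchored comparisons: every later drug against Drug 1.
for current in names[1:]:
    verdict = non_inferior(effects["Drug 1"], effects[current])
    print(f"{current} vs Drug 1: non-inferior = {verdict}")
```

Every link in the sequential chain passes, yet Drugs 3 and 4 fail when anchored to Drug 1, which is the remedy for bio-creep suggested in the text.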
Multiple hypothesis testing and non-inferiority trials with more than two treatments
Designers of clinical trials which formally test two or more sets of hypotheses usually need to reduce the individual tests’ p-values for significance in order to preserve the overall false positive error, also called type I error. This is true when the trial is declared a success if any null hypothesis is rejected, a common strategy in superiority trials with multiple endpoints (for example, when a treatment is hoped to reduce at least one of a number of symptoms). However, non-inferiority trials are often conducted to show equivalence on all sets of hypotheses. These hypotheses could reflect equivalence of two or more outcomes, in two or more patient subgroups, or among three or more treatments. In this case, setting the critical p-value for each comparison at the overall value (e.g., 0.025) has the result of making the tests of universal non-inferiority to be conservative. That is because every null hypothesis of inferiority must be rejected in order for this conclusion to be reached. Thus either the unadjusted p-values are used or each significance level could be increased to make the overall probability of erroneously rejecting all null hypotheses equal to a nominal value such as 0.025. The second choice is rarely made and involves specialized calculations. This scenario is a special case of the ‘reverse multiplicity problem’ [32].

Moving from a superiority to a non-inferiority hypothesis or vice-versa, after the results are in
Having failed to show superiority it may be tempting to switch to a ‘salvage hypothesis’ of non-inferiority or, having shown non-inferiority, to hope to successfully address the ‘home-run hypothesis’ of superiority. These strategies are theoretically possible but have the following practical pitfall. The caution concerning multiplicity of testing mentioned above can be waived in this specific instance. Dunnett and Gent [33] showed that because the hypothesis of non-inferiority is nested inside that of superiority (the latter always implies the former, for a given population), we can conduct both tests without adjusting for multiple comparisons. Wiens [34, informatively titled ‘Something for nothing in non-inferiority / superiority testing: a caution’] pointed out that Dunnett and Gent’s results crucially depend upon the same strategies (ITT vs. per-protocol) being used for each analysis. Although superiority trials usually use the ITT strategy for their primary analyses, ‘Intent-to-treat in non-inferiority vs. superiority trials: which analysis population should be used?’ (above) warns that both ITT and per-protocol strategies are relevant for non-inferiority trials. It is certainly possible for an ITT analysis to show superiority while a per-protocol analysis of the same data fails to demonstrate non-inferiority. How then would investigators interpret simultaneous superiority and inferiority?
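For the setting of universal non-inferiority described under multiple hypothesis testing above, a rough calculation shows why testing each hypothesis at the overall level is conservative. The sketch below rests on two strong assumptions that are not in the text: the tests are statistically independent, and every null hypothesis of inferiority is true. The function names are hypothetical, and the result is meant as intuition rather than as a recipe for setting significance levels.

```python
def prob_all_rejected(per_test_alpha: float, k: int) -> float:
    """Chance of wrongly rejecting all k nulls.

    Assumes independent tests and that every null hypothesis is true.
    """
    return per_test_alpha ** k


def per_test_alpha_for(overall_alpha: float, k: int) -> float:
    """Per-test level that makes prob_all_rejected equal overall_alpha.

    Same assumptions as above; if only some nulls are true, or the tests
    are correlated, this simple inversion no longer applies.
    """
    return overall_alpha ** (1.0 / k)


OVERALL = 0.025
for k in (2, 3):
    conservative = prob_all_rejected(OVERALL, k)
    raised = per_test_alpha_for(OVERALL, k)
    print(f"k={k}: all-reject probability at 0.025 each = {conservative:.6f}; "
          f"per-test level giving 0.025 overall = {raised:.3f}")
```

The gap between the two numbers illustrates the conservatism of unadjusted testing; the dependence on the independence and all-nulls-true assumptions is one reason the adjusted approach calls for specialized calculations.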
Summary
In conclusion, non-inferiority trials can make useful scientific contributions when ethical considerations disallow a placebo or other inactive control. However, unlike the scenario of a superiority trial with a placebo, their assay sensitivity is not directly ensured by randomized comparison and so there are numerous cautions in their use. In particular, the measures of study quality mentioned in ‘Key comparisons between superiority and non-inferiority trials’ should be carefully examined. Finally, when interpreting results it is important to remember that non-inferiority can be shown only with respect to a given margin and is only as relevant as that margin.

References
1. Chi GYH, Chen G, Rothmann, M, and Li, N. Active control trials. In Chow SC, ed. Encyclopedia of Biopharmaceutical Statistics 3rd ed. London: Informa Health Care, 2010; 8–15.
2. Temple R, Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: Ethical and scientific issues. Ann Intern Med 2000; 133: 455–463.
3. Ellenberg SS, Temple R. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 2: Practical issues and specific cases. Ann Intern Med 2000; 133: 464–470.
4. Endrenyi, L. Bioequivalence. In The Encyclopedia of Biostatistics, 2nd ed. P Armitage and T Colton, eds. New York, Wiley, 2010.
5. Fisher RA. The arrangement of field experiments. J Min Agric Great Britain 1926; 33: 503–513.
6. Friedman LM, Furberg CD, DeMets, DL. Fundamentals of Clinical Trials, 3rd ed. New York, Springer-Verlag, 1998.
7. Armitage P. The role of randomization in clinical trials. Stat Med 1982; 1: 345–52.
8. Kempthorne O. The Design and Analysis of Experiments. New York, Wiley, 1952.
9. DeMets, DL, Cook, TD and Roecker, E. Selected issues in the analysis. In Introduction to Statistical Methods for Clinical Trials, DL DeMets and TD Cook, eds. Boca Raton, FL, Chapman & Hall/CRC, 2008.
10. Bath, P. Acute stroke. In Textbook of Clinical Trials, 2nd ed. D Machin, S Day, and S Green, eds. Chichester, John Wiley & Sons, 2006.
11. Day, S. Blinding or Masking. In The Encyclopedia of Biostatistics, 2nd ed. P Armitage and T Colton, eds. New York, Wiley, 2010.
12. International Conference of Harmonisation. Guideline E3: Choice of Control Group and Related Issues in Clinical Trials. 2000. http://www.ich.org/cache/compo/475–272–1.html#E3
13. Halperin JL. Ximelagatran compared with warfarin for prevention of thromboembolism in patients with nonvalvular atrial fibrillation: Rationale, objectives, and design of a pair of clinical studies and baseline patient characteristics (SPORTIF III and V). Am Heart J 2003; 146, 431–8.
14. Hankey JH, Klijn CJM, and Eikelboom JW. Ximelagatran or Warfarin for stroke prevention in patients with atrial fibrillation? Stroke 2004; 35: 389–391.
15. Julious, SA. Sample Sizes for Clinical Trials. Boca Raton, FL, Chapman & Hall/CRC, 2010.
16. Fleming, T. Current issues in non-inferiority trials. Stat Med 2008; 27: 317–32.
17. Friedlin B, Korn EL, George, SL, Gray R. Randomized clinical trial design for assessing noninferiority when superiority is expected. J Clin Onc 2007; 25: 5019–5023.
18. Temple, R. Government viewpoint of clinical trials. Drug Inf J 1982; 16: 10–17.
19. Hauck, WW and Anderson, S. Some issues in the design and analysis of equivalence trials. Drug Inf J 1999; 33: 109–118.
20. Wiens BL, Zhao W. The role of intention to treat in analysis of noninferiority studies. Clinical Trials 2007; 4: 286–291.
21. Blackwelder W. Proving the null hypothesis in clinical trials. Control Clinical Trials 1982; 3: 455–63.
22. Siegel JP. Equivalence and non-inferiority trials. Am Heart J 2000; 139: S166–70.
23. Gould A. Another view of active-controlled trials. Control Clin Trials 1991; 12: 474–85.
24. Hasselblad V, Kong DF. Statistical methods for comparison to placebo in active-control trials. Drug Inf J 2001; 35: 435–49.
25. Snapinn SM. Alternatives for discounting in the analysis of noninferiority trials. J Biopharm Stat 2004; 14: 263–73.
26. Kaul S, Diamond GA, and Weintraub, WS. Trials and tribulations of non-inferiority: the ximelagatran experience. J Amer Coll Cardiol 2005; 46: 1986–95.
27. Klein, JP and Moeschberger, ML. Survival Analysis, 2nd ed. New York, Springer-Verlag, 2003.
28. Kwang IK. Active-controlled noninferiority/equivalence trials: methods and practice. In Buncher CR, Tsay JY, eds. Statistics in the Pharmaceutical Industry. Boca Raton, FL, Chapman & Hall/CRC. 2006; 193–230.
29. Singer, IB. A Crown of Feathers. New York, Farrar, Straus and Giroux, 1973.
30. International Conference of Harmonisation. Guideline E10: Choice of Control Group and Related Issues in Clinical Trials. 2000. http://www.ich.org/cache/compo/475–272–1.html#E10.
31. Lachin J. Statistical properties of randomization in clinical trials. Contr Clin Trials 1988; 9: 289–311.
32. Offen W, Chuang-Stein C, Dmitrienko, A et al. Multiple co-primary endpoints: Medical and statistical solutions. A report from the Multiple Endpoints Expert Team of the Pharmaceutical Research and Manufacturers of America. Drug Inf J 2007; 41: 31–46.
33. Dunnett CW, Gent M. An alternative to the use of two-sided tests in clinical trials. Stat Med 1996; 15: 1729–1738.
34. Wiens BL. Something for nothing in noninferiority/superiority testing: a caution. Drug Inf J 2001; 35: 241–245.
Chapter 14
Monitoring of clinical trials: Interim monitoring, data monitoring committees, and group sequential methods
Rickey E. Carter and Robert F. Woolson
Introduction
Accumulating data may be reviewed regularly in all phases of clinical development for decision-making based on safety or clinical benefit. In later phase clinical trials, especially phase 3 trials, this ongoing review ideally takes a comprehensive look at the trial’s conduct, problems in clinical assessments, patient compliance, protocol adherence, patient safety, and patient response to therapy. For randomized controlled clinical trials, the review will also incorporate assessments of the integrity of the randomization and the maintenance of the treatment group assignment blind, in the event the trial is blinded, single or double. Clearly, these periodic evaluations are organized to meet the ethical obligations in conducting clinical research and protecting patients from undue harm. How these reviews are conducted, what is reviewed, and who conducts the reviews are critical features that impact the success in meeting these ethical obligations.

Randomized controlled therapeutic trials are often designed to compare two treatments, A & B, in an effort to decide which is superior. Randomization of a patient to treatment A or to treatment B can be defended ethically if clinical experts are truly uncertain which of the two is superior. This clinical uncertainty, equipoise, exists at the start of a trial, but accumulating data may alter this state as the trial proceeds. Therefore, a plan must be in place to review these data in a manner preserving the principles of sound research design. These research principles, at a minimum, include: an unbiased assessment of the two treatment groups, and the control of the statistical type I and type II error rates at the levels prescribed when the trial was launched. Simultaneously, this preservation of sound principles of research design can be secured while maintaining both individual patient safety, and the ethics of clinical practice.

This chapter discusses the process of reviewing accumulating clinical trial data in a formal manner. The presentation includes a discussion of the role of the Data Monitoring Committee (DMC), and a description of state-of-the-art statistical techniques of interim monitoring permitting ongoing review of these data. Together, these two elements of the DMC and interim monitoring contribute heavily to ethically and scientifically sound periodic reviews. In the next part, we provide an overview of the structure and the operations of a DMC. Following this the statistical issues and challenges of interim monitoring are elucidated; several commonly used approaches for interim monitoring are described. Applications to neurological clinical trials are considered; these illustrate the process and the challenge of monitoring efficacy data on an ongoing basis.

One example trial is the TOAST clinical trial. ‘Trial of Org 10172 in Acute Stroke Treatment’ (i.e., TOAST) was a phase 3 randomized, double-blind clinical trial examining the efficacy of a new antithrombotic drug for improving the outcome of persons with acute ischemic stroke [1]. The study was a joint effort of a number of academic medical centers, the National Institute of Neurological Disorders and Stroke (NINDS) and Organon, manufacturer of danaparoid (Org 10172), and this trial was organizationally complex with multiple components. The trial was funded primarily through formal grants from NINDS with additional support from Organon. Among the trial components were a clinical coordinating center, a statistical and data coordinating center, and a DMC. The total sample size was approximately 1200 participants and four interim analyses were planned and conducted over the course of the study. The study concluded with a non-significant efficacy difference between the ORG
10172 and placebo groups. In spite of the ‘negative’ finding of no difference in the primary efficacy outcome between the two treatment arms, important efficacy and safety information were gained. The complete study design was published [1], and the principal study findings have also been summarized [2]. The methods of this chapter will be illustrated using some information from these published papers.

Data monitoring committees and trial monitoring overview
Multi-center trials must be coordinated and administered efficiently. Often a set of committees is constituted to streamline this effort. While there is no single committee structure paradigm that fits every trial, there are several commonly used configurations. One such arrangement includes: a Steering Committee to govern overall study conduct; an Operations Committee to handle day-to-day decisions; a Publication Committee to form writing assignments and writing teams; and a DMC to monitor the trial on an ongoing basis.

With the exception of the DMC, members for these committees are selected primarily from the trial’s clinical and scientific team. Thus, the trial’s principal investigators, clinical investigators, biostatisticians, and other key trial leaders typically represent the pool from which committee members are chosen. In contrast, DMC members are frequently chosen from a group of individuals who are not actively involved in the trial, ideally limited to those individuals external to the trial and free from conflicts of interest. This degree of independence from the trial, investigators, and sponsors is a desirable attribute for the DMC to conduct its work in an unbiased manner. While a fully independent DMC is desired and should be viewed as a requirement for phase 3 registration trials, earlier phase clinical trials or clinical trials with particularly rare diseases may be unable to achieve complete independence. Furthermore, the degree of monitoring and independence of the committee should be commensurate with the risks of the interventions. These issues can be challenging to address and several groups provide guidance in this area, e.g., NIH Policy: http://grants.nih.gov/grants/guide/notice-files/not98–084.html.

Diverse membership is critical, since the DMC often will be the independent body responsible for monitoring the trial at intervals during the trial’s conduct. The responsibility to perform independent review includes a review and evaluation of accumulated safety data as well as of key clinical efficacy data. Assessing the trial’s performance and quality is also required, since a poorly conducted trial will be unable to test the trial hypotheses, and would therefore be unethical to continue. Clearly, expertise in clinical trial design and analysis is required in addition to clinical specialist knowledge of the disease. Specifically, from the scientific and clinical perspective it is important the DMC membership include experts in the clinical condition under investigation, experts in key related medical areas, biostatisticians with expertise in clinical trials and trial monitoring, and individuals in related areas specific to the trial. This last group might include bioethicists, basic science specialists, patient advocate representatives, and representatives from the public. The size and composition of the DMC should be commensurate with the risk and complexity of the study and the exact charge of the DMC.

For the DMC to provide effective safeguards for the human participants, the DMC must be fully informed of the trial’s progress and have a mechanism for dialogue with the sponsor and investigators. This issue can be addressed by careful organization of the DMC meetings. Formal summary reports are prepared for the DMC’s review or, in some cases, by members of the DMC itself. However, these reports only provide for one-way dialogue. A best practice is to have the reports distributed to the DMC sufficiently prior to the scheduled meeting so that a pre-meeting review of the materials can identify critical points that warrant clarification during the conduct of the DMC meeting. The conduct of this meeting should allow for dialogue with the investigative team while maintaining the scientific integrity of the trial. Typically, the meeting is conducted in at least two phases: an open session followed by a closed session. During the open session, representatives of the study team are invited to participate in the study’s discussion, present an overview of the trial’s current status and answer any questions that may have arisen during the pre-review of meeting materials. Often the summary reports prepared for the open session are called the Open Report. This summary report focuses on the overall study progress and does not provide any information about treatment group differences. This restriction is incorporated to minimize the risk of compromising the blind or otherwise jeopardize the scientific integrity of the study. After the open session draws to a close, the attendees not formally on the DMC are excused and a closed session begins. A Closed Report is prepared in advance
for this session and is reviewed during the closed session. The closed session is when a formal confidential review of the entire trial occurs. The specific nature of both the open and closed sessions is detailed in writing in the form of a DMC Charter. This would include a detailed description of the planned open and closed reports.

A DMC is ordinarily constituted early in the life of a trial. Ideally, the committee is formed and has its initial meeting before the first patient is enrolled and randomized. As a first order of business the DMC must establish its template for its functioning. Many aspects of this template include the establishment and approval of a DMC Charter. This charter should be drafted by the trial organizers and the document is really a proposed protocol for the DMC’s operations and functions. It is helpful if the trial organizers provide the first draft of this document, since they are among the most knowledgeable persons regarding what needs to be monitored for both safety and efficacy. The draft charter describes the DMC’s responsibilities, identifies its members and chair, outlines the structure of tables and reports to be given to the DMC, describes the statistical plans for monitoring safety and efficacy, and includes a proposed set of times and intervals for DMC meetings. Most importantly, the DMC Charter identifies a clear delineation of the pathways of communication of the DMC with the trial investigators, sponsors and others. This last point requires careful consideration as the DMC must be mindful of its responsibility to protect confidentiality of trial results throughout the study. This draft charter is one that could be adopted by the DMC, or it might be revised and finalized by the DMC at this initial meeting. There is typically discussion regarding the statistical monitoring plan and the identification of threshold boundaries warranting special action should they be exceeded. This is one of the reasons it is best for this deliberation to take place before the trial’s first patient is enrolled and randomized. Objectivity is essential in setting the monitoring and review guidelines. With no data available, this objectivity is easier to maintain and defend.

Once the charter is approved it becomes a central trial roadmap allowing others to see where the DMC will be going, and at the end where they have been. There are several excellent templates for a DMC Charter including Appendix A of Ellenberg, Fleming & DeMets [3] and NINDS provides templates for Open and Closed Reports on its website [4].

Composition and operations of a DMC
In TOAST the DMC was formed by the project’s steering committee in consultation with, and with the approval of NINDS, the primary National Institutes of Health sponsor. The DMC members included individuals with no direct ties to the study and with no conflicts of interest. Hence, the DMC was constituted to be independent of the trial. After an initial DMC meeting at the trial’s initiation, the DMC met face-to-face annually, and had multiple interim teleconferences. Each meeting began with an open session and key project investigators provided an overview of the trial’s progress to date. This included an update on patient recruitment, data quality and study performance, and a summary of special issues requiring attention. The Statistical Center provided summary tables for the open session and for the closed session to follow. The open session attendees included the clinical principal investigators, the statistical investigators, project coordinators and additional appropriate staff, representatives from NINDS, representatives from Organon, and the DMC. In the open session no data were presented by treatment group; all data summaries were provided in the aggregate across the two treatment groups.

Following the open session, a closed session followed and those present included the DMC, the NINDS representative, and the study statisticians. (It is now more common for there to be a second statistical group preparing the DMC reports; this allows the study statistical group to remain blinded. This was not done in the TOAST trial.) Others were excused, but would be called back afterwards for a closing open session. Data presented at the closed session were separated by treatment group and depicted any differences in safety, efficacy, compliance, etc., between the two treatment groups. The DMC discussed any treatment group differences and interpreted the data to-date. In some cases additional analyses were requested by the DMC requiring the statistical center to do these and distribute these at a later date to the DMC. After the closed session the DMC met in executive session to form its recommendations. Following this, a final open session was held with the same attendees as the original open session. In this final session the DMC Chair delivered a set of recommendations and an evaluation of the trial to date. Following this the meeting adjourned. A meeting summary including overall recommendations of the DMC was prepared in writing following the meeting for distribution to the study investigators, sponsors and institutional review boards.
It is important to note that one of the major items to be reviewed in the closed DMC session was the accumulated efficacy data. The primary outcome in TOAST was favorable outcome at 3 months post-randomization. Patients were considered to have a favorable outcome if they had a good Glasgow Coma Score and a good Barthel Index (measure of activities of daily living). Thus, the favorable outcome assessment was a composite score. At each DMC meeting an interim analysis was done comparing the favorable outcome rates between the two treatment groups. A major challenge was to perform this comparison at each interim analysis while preserving the ability to do a valid comparison at the 5% significance level at the end of the trial. The rest of the chapter will deal with procedures for achieving these aims.

Mechanics/statistics of interim monitoring
A DMC’s objectivity is in part due to independence of the members. A formal statistical framework can enhance the objectivity by providing a universal language to communicate the accumulating evidence. This part will introduce key terminology used in the statistical monitoring of clinical trials while motivating the need for this statistical framework.

Interim monitoring of efficacy and safety data & issues of multiple testing
Measuring an intervention’s efficacy, for example, may require a complex battery of assessments in order to measure adequately the full scope of the disease or condition. Statistically, such a multi-faceted assessment introduces a set of hypothesis tests. The statistical implications of these multiple hypothesis tests can be characterized in the context of the well-known multiple comparisons problem. Added to this problem’s complexity is the fact that modern clinical trials introduce an additional dependency into this hypothesis testing problem. Namely, clinical trial data are routinely analyzed as the trial progresses (sequential analysis). This additional dependency dimension to the multiple testing setting is not as easily addressed by simple correction factors such as the Bonferroni adjustment. However, as this part will detail, statistical methodology has been developed to allow for the routine monitoring of a trial’s accumulating data. This possibility enables the sound monitoring of the study. It can be argued that all randomized clinical trials should consider some form of interim monitoring; the protocol should clearly detail how this monitoring will be accounted for in the final analysis. This part provides guidance for developing monitoring schemes.

Issue of multiplicity
Consider a setting where two or more measures are used to quantify the efficacy profile of an intervention. In the context of the TOAST trial, the Barthel Index and Glasgow Coma Scale were used to assess activities of daily living and level of consciousness, respectively. The NIH Stroke Scale was also used as a quantitative neurological exam, and finally, a supplemental motor exam was used to measure limb strength. In this scenario, there were four critical measures to assess an intervention’s efficacy at improving post-stroke functioning. Intuitively, having four assessments increased the likelihood of declaring a treatment group difference. Thus, the power to detect a treatment group difference could be apparently increased, albeit at the cost of an increased type I error rate.
In usual statistical parlance, a type I error is a ‘false positive’ result. For example, when testing a single hypothesis, the probability of incorrectly concluding a significant effect (rejecting the null hypothesis) when in fact there is no effect is denoted as α. Conversely, the probability of correctly failing to reject the null hypothesis when there is in fact no effect is (1 − α). When multiple statistical tests are performed, one obtains a set of hypothesis tests, each with a comparison-wise type I error rate. The family-wise error rate is the error rate for an entire collection of comparisons. Virtually any introductory biostatistics textbook describes many valid procedures for controlling one or the other of these rates. For interim testing the principle of a family-wise error rate applies to the notion of controlling the error rate for the entire collection of interim looks we take of the trial until its termination. We will return to this, but first we continue our general discussion of this concept of family-wise error rate.
To ensure the collection of hypothesis tests carries only probability α of any type I error (i.e., the family-wise error rate), the individual tolerance level for a type I error for each hypothesis test must be more rigorous (α* < α). Conceptually, a large number of tests each conducted at the 5% significance level is associated with a larger family-wise error rate than would a smaller number of tests each conducted at the 5% significance level [5]. This issue and approaches to managing the
family-wise error rate have been fully discussed in clinical trial texts [6] and in the introductory biostatistics literature. With independent comparisons simple probability shows what can happen to the overall, i.e. family-wise, error rate. Suppose one intends to make two comparisons, each independent of one another, and suppose we plan to conduct each comparison at a type I error rate of 0.05. Under the hypothesis of no treatment group differences, the probability that one test does not reject the null is 0.95 (i.e., 1 − 0.05). If the two tests are independent, then the probability that both do not reject is (0.95) × (0.95) or 0.9025. Hence, the probability that at least one of the two tests rejects is 1 − 0.9025, or 0.0975. Thus, the family-wise error rate for this collection of two comparisons is not 0.05, but 0.0975. If you had c independent tests then the family-wise error rate would be 1 − (0.95)^c. If c = 5, that is five independent tests, then the family-wise error rate can be calculated to be 0.2262. So, techniques to handle multiple comparisons evolved to allow us to lower the individual comparison rate to something smaller than 0.05 in order to keep the family-wise rate at 0.05. Interim testing in clinical trials builds on similar logic, although the procedures are more involved since the interim comparisons are not independent, but are built on accumulating data (on the same endpoints) over time.
The probability of a false positive finding increases with repeated interim assessment of accumulating data from the same trial. This form of multiplicity presents special features because the probability of rejecting the null hypothesis is conditionally associated with previous examinations of the data. In fact, one could consider the probability of rejecting the null hypothesis at analysis K or earlier as

$$P(\text{reject } H_0 \text{ at analysis } K \text{ or earlier}) = P(\text{reject at 1st look}) + P(\text{reject at 2nd look} \mid \text{did not reject at 1st}) + \cdots + P(\text{reject at } K\text{th look} \mid \text{did not reject previously}) \qquad (14.1)$$

As with the regular multiple testing scenario, in order to control the overall type I family-wise error rate a higher degree of statistical evidence is required for rejecting the null hypothesis at each analysis (i.e., for the comparison at each analysis time) when this type of multiple testing is involved.
Contrary to the independent testing framework, however, an increased number of interim analyses does not inflate the type I error rate as appreciably as one might expect (e.g., as it did in the preceding for independent tests). Ellenberg et al. illustrated the probability of a type I error based on the number of interim analyses for one-sided testing at α = 0.025 [3]. With only one test, the type I error rate is the nominal 0.025, but with the addition of only one interim analysis halfway through the study, the error rate jumps to 0.041, a 64% increase. Adding further interim analyses once one has already been conducted has a less pronounced effect on the inflation of the type I error rate. For example, with five total analyses, the type I error rate is 0.075, three times the nominal 0.025 rate, and with 10 analyses, the error rate is approximately four times the nominal rate (0.096). Thus, the greatest relative inflation in the type I error rate occurs when one moves from no interim analyses to any number of interim analyses, and the relative ‘penalty’ for adding additional interim analyses is not as striking as adding additional hypothesis tests in the independent testing framework. Nonetheless, there is a significant inflation in the type I error rate when any interim analyses are performed. Therefore, there is a penalty and one must account for this when designing the study, if the intent is to keep the overall type I error for the trial at a prescribed level like 0.05 or 0.025, as is usually desired.
Two general approaches are available for specifying the required significance level for each of the successive evaluations of the data. The first method is a fully specified group sequential approach. This approach pre-specifies the stopping boundaries for all analyses. The second approach allows for more flexibility in the interim analyses in that unplanned analyses can be included while still providing appropriate control of the type I error rate. The presentation continues with a description of group sequential methods, followed by a more flexible design that increases practical utility.

Group sequential methods
Group sequential methods for determining the critical values to be used during interim analyses (i.e., ‘stopping boundaries’) represented a key advancement in the theory and application of sequential analyses. The design flexibility to allow for interim analyses fundamentally changed study design and provided a broad platform to monitor clinical trials. The group sequential foundation rests on two primary design considerations. The first consideration is that the number of analyses (interim analyses and the final analysis) is specified. Denote this number as k. The second consideration is
Table 14.1 Upper-limit critical values (stopping boundaries) for a Z-score required for termination due to efficacy at analysis point k, two-sided α = 0.05

Planned analyses   Analysis number   Pocock    Haybittle-Peto   O’Brien-Fleming
1                  1                 1.95996   1.95996          1.95996
2                  1                 2.17827   3.00000          2.79651
2                  2                 2.17827   1.96729          1.97743
3                  1                 2.28948   3.00000          3.47111
3                  2                 2.28948   3.00000          2.45445
3                  3                 2.28948   1.97510          2.00405
4                  1                 2.36129   3.00000          4.04862
4                  2                 2.36129   3.00000          2.86281
4                  3                 2.36129   3.00000          2.33747
4                  4                 2.36129   1.98275          2.02431
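
As a rough check on Table 14.1, the following Python sketch (not part of the original chapter) converts critical values to nominal two-sided p-values and regenerates the O’Brien-Fleming column from its final value. It relies on the classical property that, for equally spaced analyses, the O’Brien-Fleming boundary at look k of K equals the final boundary multiplied by √(K/k). The function names and the dictionaries of constants are illustrative assumptions; the constants themselves are copied from the table.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

# Final-analysis O'Brien-Fleming critical values, copied from Table 14.1.
OBF_FINAL = {2: 1.97743, 3: 2.00405, 4: 2.02431}
# Constant Pocock critical values, copied from Table 14.1.
POCOCK = {2: 2.17827, 3: 2.28948, 4: 2.36129}


def two_sided_p(z):
    """Nominal two-sided p-value for a standard normal critical value."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def obf_boundaries(total_looks):
    """O'Brien-Fleming boundaries for equally spaced looks.

    Look k of K uses the final critical value scaled by sqrt(K / k).
    """
    c_final = OBF_FINAL[total_looks]
    return [c_final * sqrt(total_looks / k) for k in range(1, total_looks + 1)]


if __name__ == "__main__":
    for total in (2, 3, 4):
        print(f"Planned analyses: {total}")
        for look, z in enumerate(obf_boundaries(total), start=1):
            print(
                f"  look {look}: O'Brien-Fleming z = {z:.5f} "
                f"(p = {two_sided_p(z):.5f}), "
                f"Pocock z = {POCOCK[total]:.5f} "
                f"(p = {two_sided_p(POCOCK[total]):.5f})"
            )
```

With two planned analyses the Pocock value corresponds to a nominal two-sided p-value of roughly 0.029 at every look, whereas the O’Brien-Fleming final value stays close to the conventional 0.05 level.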
the method in which the critical values will be selected. Clearly, these critical values for k > 1 need to be more stringent than the critical value if no interim analyses are to be conducted (i.e., k = 1). This is due to multiplicity in testing, which is the basis for the adjustment.

Specific stopping boundaries
While there are numerous stopping boundaries in the statistical literature, three are discussed here. The three methods differ in ease of implementation and intention, but all three are broadly applicable to clinical trials. They have been used widely. Pocock’s method [7] is straightforward to implement since only one critical value is used throughout the study. A crucial limitation of this approach is that (relatively) little statistical evidence may be required to stop the trial early for efficacy but at the end of the trial, a p-value much less than the nominal error rate (α) is required to reject the null hypothesis. The Haybittle-Peto [8] method mirrors that of Pocock in that the same critical value is used for all interim analyses but a smaller critical value is used at the final analysis to bring the required level of significance more in line with the nominal error rate. To ensure overall control of the type I error rate, the stopping boundaries during the interim analyses are larger than those of Pocock’s method, so this approach attenuates some of the concern with Pocock’s method. A third group sequential method is due to O’Brien and Fleming [9]. Their approach allows for a gradation in the stopping boundaries, with the first analysis requiring the largest amount of statistical evidence to reject the null hypothesis and the final analysis needing an observed level of significance very near the nominal error rate. For analyses in the early study period, the required level of significance is much greater using the O’Brien-Fleming boundaries than the other two methods. Table 14.1 demonstrates the interrelationship of the number of analyses and the group sequential method. Note for k > 2, the Haybittle-Peto and O’Brien-Fleming approaches provide very similar final critical values. The key distinction between the two methods is the manner in which you reach the final critical value. O’Brien and Fleming’s approach is such that a rejection of the null hypothesis is more likely during the mid-study period since a very conservative hypothesis test was conducted first early in the study. This judicious selection of the testing strategy is emphasized with the next part on α spending functions and flexible designs.

Flexible design methods
Clinical trials require rigor in the protocol to ensure consistency across multiple sites and reproducibility of the findings. However well-designed the protocol is, there is the likelihood that the study may progress in a manner that is generally unanticipated. This could be as straightforward as accrual being lower than anticipated, or there could be scientific concerns raised during the course of the study that warrant a more frequent examination of the study data. This could be driven in part by the needs of the DMC or by new literature that may affect the risk-to-benefit ratio. For all of these situations, having rigor in the analysis plan while balancing the ever present need to ensure human subject safety is essential. Flexible designs are well suited to meet
this need. These designs relax the assumptions of the group sequential methodology. Specifically, these flexible designs allow the modification of k and the timing and spacing of the interim looks. Spending functions are the principal analytical tools permitting the flexibility in the study design. These spending functions are robust tools for dynamic application, and therefore are ideally applicable to large complex clinical trials. Spending functions primarily spend the α across the set of interim and final looks at the data. The general approach for ‘spending’ α over the course of a study is an important methodological advancement over the group sequential methodology.
Conceptually, the target α (say 0.05) is established a priori and each incremental analysis utilizes some of the available error. The rate of spending determines the overall stopping boundaries at any point during the study. Alpha spending functions have been developed to resemble a variety of group sequential stopping boundaries, with the O’Brien-Fleming-like boundaries being a highly attractive option for the reasons specified in the group sequential part above. A difference here is that we are now permitted to have unequally spaced evaluations or unplanned evaluations while still providing overall protection to the target α and β error rates. This protection requires complex conditional probability calculations, but greatly expands the capacity for trial monitoring.

Illustrative example
Prior to the formal introduction of the α spending functions, consider this scenario. Suppose a large phase 3 clinical trial has two formal efficacy interim analyses planned when 33% and 66% of the participants have the primary endpoint available for analysis. The protocol specifies that the O’Brien-Fleming group sequential stopping boundaries will be used to provide protection to the overall α level. According to Table 14.1, the three critical values (in absolute value for two-sided testing) required to have early stoppage of the trial due to efficacy are (3.47, 2.45, 2.00). Suppose during the review of the first interim analysis, the DMC determined that waiting until 66% of the participants have been enrolled would be unacceptable and that an interim analysis should be conducted when 50% of the participants have been enrolled instead. Using traditional group sequential methods, this type of modification is not possible. Specifically, the group sequential methods are based on equally-spaced preplanned analyses. Using the α spending approach, we will see that we can re-estimate the remaining stopping boundaries on the basis of the amount of α already spent. This is not to suggest that one should be casual with respect to the original statistical analysis plan, but rather to reinforce the importance of design flexibility when needed.

Alpha spending functions
Such design flexibility was made possible with the advent of α spending functions. An α spending function [10–12], α(τ), is a monotonically increasing function that regulates the amount of type I error spent during each interim analysis as the proportion of the information (τ) increases. At the start of the trial, the function equals zero, reflecting that none of the type I error rate has yet been spent. At the conclusion, the α spending function should equal α, the preplanned type I error rate. In a theoretical sense, when τ = 100% this would represent the minimum variance-covariance obtainable for the given sample size (i.e., Fisher’s information). Determining the fraction of this theoretical quantity may appear daunting at first; however, in practice, for the common clinical trial settings the fraction depends on either the planned total sample size and/or the planned total number of events in survival analysis. In particular, for normally distributed outcomes measured only once, the observed fraction of the total theoretical information is τ = n_obs/n_planned. For survival analyses, τ is approximated by τ ≈ d/D, where d represents the number of observed events at the interim analysis when a total of D events are anticipated. For repeated measures analysis, τ may represent the proportion of observed measurements of the dependent variable divided by all potential measurements, or τ ≈ r/(NM), where r, N, and M represent the observed number of dependent measurements at the interim analysis, the total number of planned participants, and the number of repeated measurements per participant, respectively.
There is great flexibility in the shape of the monotonic spending function provided the constraints α(0) = 0 and α(1) = α are incorporated. For example, a function that is concave up would spend a small amount of the α early in the study. This may be desirable since there is imprecision in these early trial estimates. Conversely, a concave down function would increase the likelihood of stopping early but at the expense that a significant portion of the α would be spent early in the trial. This translates into a large critical value (or a requirement for a very small p-value) at the end of the
study, something that may be undesirable. While there is in fact an infinite number of potential spending functions, framing a spending function around the familiar group sequential boundaries of Pocock or O’Brien-Fleming has proven to be a useful method of selecting a spending function.
An O’Brien-Fleming-like α spending function is:

$$\alpha_{OF}(\tau) = 2 - 2\Phi\left[\frac{Z_{1-\alpha/2}}{\sqrt{\tau}}\right], \qquad (14.2)$$

and a Pocock-like α spending function is:

$$\alpha_{P}(\tau) = \alpha \ln\left[1 + (e - 1)\tau\right] \qquad (14.3)$$

[11]. Generation of the critical value for the first interim analysis can be easily accomplished by hand, but generation of multiple stopping points (which are conditional on previous examinations) can be complicated. Use of specialized software is recommended in these settings. Before turning to this it is appropriate to comment on futility, or the lack of efficacy, which could be the basis for early study termination.

‘Futility’ analyses
The development of group sequential methods and α spending functions thus far has focused on early termination due to efficacy. When early termination for efficacy is considered, one has generally observed at interim analysis a test statistic value that exceeds the pre-determined stopping boundary. One could argue that a very strong reversal in the direction of the treatment effect would need to be observed for the effect to no longer be significant at the end of the trial, and based on the most current information (the trial to date), the likelihood of this would be very low. Thus, equipoise is not present so early termination may be warranted. Conversely, a low probability of reaching a statistically significant conclusion is also justification for early termination. So-called futility analyses address this concern. It is important to distinguish this type of interim analysis from the phase 2 futility design and analyses described in Chapter 8.
Futility analyses utilize the concept of stochastic curtailment to determine the likelihood of obtaining a statistically significant result based on the accumulating data obtained in the course of the study [13–14]. The rationale for early termination for futility mirrors that for early termination due to efficacy. In particular, based on the data accrued to date in the study, the likelihood of crossing the stopping boundaries is particularly low. Conditional power is used to determine the probability of concluding a statistically significant result. This power is calculated conditionally on the data observed to date. Figure 14.1 describes the interaction of conditional power with unconditional power (i.e., power estimated prior to the start of the study).
Lan and Wittes [15] provide formulas based on the ‘B-value’, a sample size independent quantity representing accumulating data, to calculate conditional power for a variety of settings. Ideally, one would want conditional power to remain in the range of the protocol’s assumed power, but interpreting what is ‘high’ and what is ‘low’ is difficult, particularly if the decision is post hoc. Lan et al. [16] recommend a threshold for low to high conditional power in the range of 0.5 to 1.0, and specification of a lower limit for conditional power in the DMC Charter may prove useful in interpreting the
Figure 14.1. Interrelationship of unconditional power with conditional power.

Unconditional (‘planned’) power low, conditional power low: Early stoppage due to futility should be considered.
Unconditional (‘planned’) power low, conditional power high: Larger than anticipated effect or less variability observed; continue the trial.
Unconditional (‘planned’) power high, conditional power low: If early in the study, need to consider imprecision in estimates, but monitoring more closely is warranted. If in the mid- or late stage of the study, consider early termination if safety profile is marginal. Consider maintaining the study if risks are acceptable so that a more informed conclusion regarding the alternative hypothesis can be made.
Unconditional (‘planned’) power high, conditional power high: Continue study, but early termination due to efficacy may be possible.
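
The conditional power axis of Figure 14.1 can be made concrete with the B-value of Lan and Wittes. The Python sketch below is an illustration only, under stated assumptions: the interim statistic Z(t) is standard normal under the null, B(t) = Z(t)√t, the final test uses the fixed two-sided critical value z_{1−α/2} with no adjustment for interim boundaries, and only crossing of the upper boundary is counted. The function names and the three drift choices (null, design, current trend) are hypothetical helpers introduced for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conditional_power(z_interim, info_frac, drift, alpha=0.05):
    """Probability that the final Z exceeds z_{1-alpha/2}, given interim data.

    Given B(t), the final statistic B(1) is normal with mean
    B(t) + drift * (1 - t) and variance (1 - t), where drift is the
    assumed expected value of the final Z statistic.
    """
    if not 0.0 < info_frac < 1.0:
        raise ValueError("info_frac must lie strictly between 0 and 1")
    critical = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    b_value = z_interim * sqrt(info_frac)
    remaining = 1.0 - info_frac
    mean_final = b_value + drift * remaining
    return 1.0 - STD_NORMAL.cdf((critical - mean_final) / sqrt(remaining))


def design_drift(power, alpha=0.05):
    """Drift implied by the protocol's planned (unconditional) power."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + STD_NORMAL.inv_cdf(power)


def current_trend_drift(z_interim, info_frac):
    """Drift estimated from the data so far: B(t) / t = Z(t) / sqrt(t)."""
    return z_interim / sqrt(info_frac)


if __name__ == "__main__":
    z_half, t_half = 1.0, 0.5  # hypothetical interim result at 50% information
    print("null drift:   ", round(conditional_power(z_half, t_half, 0.0), 4))
    print("design drift: ", round(conditional_power(z_half, t_half, design_drift(0.90)), 4))
    print("current trend:", round(conditional_power(
        z_half, t_half, current_trend_drift(z_half, t_half)), 4))
```

Comparing the three printed values shows how strongly conditional power depends on the assumed drift, which is one reason a pre-specified lower limit in the DMC Charter is helpful.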
calculations. Note that at study completion, the null hypothesis will be either rejected or not, so high conditional power along the course of the study is generally desirable. Low conditional power, however, does not rule out the likelihood of rejecting the null hypothesis, so one must be aware of the potential inflation to the false negative error rate when considering early termination due to low conditional power.
A less formal method of estimating conditional power is to re-evaluate the sample size assumptions in the context of the accumulating data. Power can be re-estimated based on the minimum clinically significant difference defined in the protocol and the observed variation in the primary endpoint. It is not unreasonable to expect that changes in the assumed (within group) standard deviation may occur in a large phase 3 study, so reassessment of the necessary sample size may be warranted to allow for adequate power to detect the minimum clinically significant difference. Such considerations fall under the rapidly developing area of adaptive designs (see Chapter 9).

Fully sequential designs
While group sequential methods have broad applicability, there are sequential designs that involve examining the primary endpoint after every participant (or every two or three participants). For this approach, the endpoint needs to be available before the next participant is enrolled. For safety studies (e.g. phase 1 studies) with a clearly defined adverse event endpoint, this approach is often employed and the ‘3 + 3’ or ‘up/down’ designs are examples of this more traditional sequential approach. For efficacy trials, particularly multicenter studies, this approach has numerous logistical issues to be overcome. That said, there are cases where the fully sequential designs could be appropriate, but in the context of neurology trials, which often require long-term follow-up to assess efficacy, their use will be less common. As such, the methods will not be covered further.

Use of statistical software for interim analyses
The statistical underpinnings of the methods discussed are complex computationally and are enabled through the use of specialized software. With recent additions to the SAS System and the increased popularity of R, the implementation of the methods is straightforward and broadly available. This part discusses three software approaches: R, SAS, and EAST. It is worth noting that the calculations often require polynomial approximations and numerical integration of approximated functions and are intense computationally. Practically, the use of approximation integrals can introduce trivial differences in the estimated stopping boundaries across statistical software. For this reason, it is important that the protocol specify the software used for calculations so that the stopping boundaries are reproducible and consistent over time.
This part does not provide comprehensive details regarding the use of each software package; instead, general features will be illustrated. To illustrate the calculations, a clinical trial with three planned analyses τ = {0.33, 0.66, 1.00} and the O’Brien-Fleming-like α spending function will be used. This scenario will be modified to include the addition of an unplanned analysis when 50% of the participants have the outcome measured (τ = {0.33, 0.50, 0.66, 1.00}) as was illustrated earlier. The generation of the stopping bounds will be illustrated using R. The remaining two software approaches are described in the context of increased functionality over R, which is somewhat rudimentary.
The R-project is an open-source statistical computing environment, and users have contributed numerous modules (‘packages’) that contain programming code for specific analyses. There are several packages available within R to create the stopping boundaries, but for this presentation, the ldBounds package will be illustrated [17]. The ldBounds package is distributed through GNU-2 public license and comes with software documentation in the form of a help file [17]. Interface with the software is through command-line syntax in R. This implementation is most basic and only produces the stopping boundaries. Nonetheless, ldBounds may prove sufficient for many statisticians working on trials.
Generation of the stopping bounds for the illustrative scenario uses the ‘bounds’ command that is available once the ldBounds package is loaded. Figure 14.2 provides the necessary syntax to generate the original and modified study design. Note that the original stopping boundary (|z|>3.7307 at τ = 0.33) is unaffected by the addition of an extra interim analysis at τ = 0.50. Furthermore, the final stopping boundary remained essentially identical (|z|>1.9917 vs. |z|>1.9931) with the addition of this one additional interim look. The same could be said for the originally planned second interim analysis (|z|>2.5262 vs. |z|>2.5546). This is an illustration of a point made earlier in the chapter;
Figure 14.2. Command syntax and summary output from R using the ldBounds package.

Original (Planned) Study Design
R Syntax
> times1<-c(0.33,0.66,1.0)
> obf_original <-bounds(times1,iuse=c(1,1),alpha=c(0.025,0.025))
> summary(obf_original)
Output
Lan-DeMets bounds for a given spending function
n = 3
Overall alpha: 0.05
Type: Two-Sided Symmetric Bounds

Lower alpha: 0.025
Upper alpha: 0.025
Spending function: O'Brien-Fleming
Boundaries:
Time Lower Upper Exit pr. Diff. pr.
1 0.33 -3.7307 3.7307 0.00019097 0.00019097
2 0.66 -2.5262 2.5262 0.01159656 0.01140558
3 1.00 -1.9917 1.9917 0.05000000 0.03840344
/*******************************************************/
Modified Study Design
R Syntax
> times2<-c(0.33,0.5, 0.66,1.0)
> obf_modified <-bounds(times2,iuse=c(1,1),alpha=c(0.025,0.025))
> summary(obf_modified)
Output
Lan-DeMets bounds for a given spending function
n = 4
Overall alpha: 0.05
Type: Two-Sided Symmetric Bounds

Lower alpha: 0.025
Upper alpha: 0.025
Spending function: O'Brien-Fleming
Boundaries:
Time Lower Upper Exit pr. Diff. pr.
1 0.33 -3.7307 3.7307 0.00019097 0.00019097
2 0.50 -2.9692 2.9692 0.00305065 0.00285967
3 0.66 -2.5546 2.5546 0.01159656 0.00854591
4 1.00 -1.9931 1.9931 0.05000000 0.03840344
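
The ‘Exit pr.’ column in Figure 14.2 can be checked by hand against the O’Brien-Fleming-like spending function (14.2), and the first boundary can be recovered from it directly. The Python sketch below is an illustration rather than the ldBounds implementation. It assumes, consistent with the printed output, that each side of the symmetric two-sided boundary spends 0.025 through the same function, so that the cumulative exit probability is twice the one-sided spend. The function names are hypothetical; the Pocock-like function (14.3) is included for comparison.

```python
from math import e, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obf_spend(tau, alpha):
    """O'Brien-Fleming-like spending, Eq. (14.2): 2 - 2*Phi(z_{1-alpha/2}/sqrt(tau))."""
    if tau <= 0.0:
        return 0.0
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 - 2.0 * STD_NORMAL.cdf(z / sqrt(tau))


def pocock_spend(tau, alpha):
    """Pocock-like spending, Eq. (14.3): alpha * ln(1 + (e - 1) * tau)."""
    return alpha * log(1.0 + (e - 1.0) * tau)


def exit_probability(tau, alpha_per_side=0.025):
    """Cumulative two-sided exit probability when each side spends alpha_per_side."""
    return 2.0 * obf_spend(tau, alpha_per_side)


def first_boundary(tau, alpha_per_side=0.025):
    """Upper critical value at the first look, which needs no conditioning."""
    return STD_NORMAL.inv_cdf(1.0 - obf_spend(tau, alpha_per_side))


if __name__ == "__main__":
    for tau in (0.33, 0.50, 0.66, 1.00):
        print(f"tau = {tau:.2f}: cumulative exit probability = {exit_probability(tau):.8f}")
    print(f"first boundary at tau = 0.33: {first_boundary(0.33):.4f}")
    print(f"Pocock-like spend at tau = 0.33 (alpha = 0.05): {pocock_spend(0.33, 0.05):.5f}")
```

Only the first boundary has this closed form; later boundaries depend on the joint distribution of the sequential statistics, which is why software such as ldBounds, SAS, or EAST is needed beyond the first look.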
namely, once a study is designed to include at least one interim analysis, the relative effect of a small number of additional analyses does not appreciably change the overall significance level.
SAS Version 9.2 for Windows incorporates new procedures for the design (PROC SEQDESIGN) and testing (PROC SEQTEST) of studies involving interim analyses. Prior versions of SAS utilized SAS MACROs to perform the necessary calculations. One of the more informative sets of MACROs is fully documented in an excellent general reference by Dmitrienko et al. [18]. For the new SAS procedures, SAS provides thorough documentation in the software’s help files with numerous examples.
In contrast to the ldBounds implementation in R, PROC SEQDESIGN includes provisions to estimate sample size during the design phase. This integration allows for estimates of expected sample sizes under the
null and alternative hypotheses. In the context of study management and budget, these additional estimates could prove highly informative. Additionally, SAS provides computational routines for estimating conditional power. Since this is a new release, the level of sophistication and flexibility of the SAS offerings does not yet reach that offered by Cytel’s EAST, but it may prove to be a very viable software package considering the widespread installation base for SAS.
EAST is a comprehensive design and analysis software tool and may be the most comprehensive of the three approaches described here. Like SAS, generation of the stopping boundaries coincides with the description of the distribution of the primary endpoint and the estimation of the sample size. For this reason, using EAST requires the largest amount of training, particularly if one is only seeking to generate general stopping boundaries. The interface, however, is intuitive and the software comes with comprehensive help files. This software is designed to be used throughout the course of the study. In doing so, estimates of conditional power, additions of unplanned analyses, and graphical displays of study estimates are readily available from within a single software package. EAST is highly regarded as an excellent tool for interim monitoring of clinical trials; however, the other approaches do allow for application of the methods discussed in this chapter.

Recommendations and additional considerations
Selection of the stopping boundaries has been an active area of theoretical statistical research. Jennison and Turnbull [19] provide an excellent summary of this development in their comprehensive textbook on sequential methods for clinical trials. Each of the developed methods strives to balance several considerations. To present these considerations, a study will be divided into three coarse categories reflecting the amount of data and/or sample size accrued. The ‘early’ study category reflects the trial when a small fraction of the subjects have been enrolled (e.g., 30%). During this phase of the study, one would expect limited power for efficacy and imprecision in the estimates. The ‘mid-study’ category is when the treatment effects should be estimated with reasonable precision and informed decisions could be made regarding the protocol’s assumptions (hypothesized effect size, sample size

of the planned sample size). One would expect during this phase that the treatment effects are estimated with desired precision and the final subjects will provide the additional observations to minimize the estimated standard errors so that the targeted power is obtained. Figure 14.3 presents a broader overview of these points.

Stopping boundary selection and interim analysis frequency
Using the framework presented in Figure 14.3, an interim analysis plan may ideally allow for an early study interim analysis with minimal impact on the overall type I error rate. One or more interim analyses during the middle period of the study would be viewed as critical to the ongoing management of the study. It is debatable as to whether interim monitoring is required late in the study. Thus, using equally spaced analyses (pure group sequential methods), k = 2 or 3 are attractive options.
When k = 2, one interim analysis will be conducted, often when 50% of the study is complete. This approach is useful when the intervention has had numerous prior investigations so that the protocol assumptions are likely realistic. Virtually any of the group sequential methods discussed here would be appropriate when k = 2. When there is uncertainty with the assumptions, additional interim analyses are recommended. If using group sequential methods, it is recommended that only k = 3 and the O’Brien-Fleming stopping boundaries be used. This approach allows for interim analyses at 33% and 66% of the participant accrual. These time periods are included in the mid-study phase and a large treatment effect will be required to stop the trial with only 33% of the study data. When k > 3 and equally spaced analyses are planned, one or more interim analyses will be conducted late in the study. If more flexibility with testing is desired, particularly when we need flexibility to ‘front load’ the analyses in the early and mid-periods of the study, an α spending approach should be considered. The O’Brien-Fleming-like α spending function is an attractive choice, particularly if interim analyses will be conducted very early in the study’s accrual. Finally, there may be practical issues governing the timing and spacing of interim analyses. For example, the DMC may plan to meet semi-annually and may also require a formal interim look at those times. Thus, the DMC
estimates, etc.). he ‘late study’ phase is when the inal Charter would provide this prescription governing the
participants are being enrolled (perhaps 70–100% number and timing of the interim looks.
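To make the behavior of an O’Brien-Fleming-like α spending function concrete, the following is a minimal sketch rather than a validated implementation. It assumes the commonly cited Lan-DeMets form α(t) = 2 − 2Φ(z/√t), where t is the information fraction, Φ is the standard normal distribution function, and z is the upper α/2 standard normal quantile. The function names, the overall α of 0.05, and the look times (33%, 66%, 100%) are illustrative assumptions and are not taken from the software packages described above.

```python
from statistics import NormalDist


def obf_like_alpha_spent(t: float, alpha: float = 0.05) -> float:
    """Cumulative type I error spent by information fraction t (0 < t <= 1).

    Assumed form (O'Brien-Fleming-like spending function):
        alpha(t) = 2 - 2 * Phi(z / sqrt(t)),  z = upper alpha/2 normal quantile.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must be in (0, 1]")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = std_normal.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - std_normal.cdf(z / t ** 0.5))


def incremental_alpha(fractions, alpha: float = 0.05):
    """Type I error newly spent at each look (differences of cumulative spend)."""
    spent = [obf_like_alpha_spent(t, alpha) for t in fractions]
    return [spent[0]] + [later - earlier for earlier, later in zip(spent, spent[1:])]


if __name__ == "__main__":
    looks = [0.33, 0.66, 1.0]  # hypothetical look times for k = 3
    cumulative = [obf_like_alpha_spent(t) for t in looks]
    for t, cum, inc in zip(looks, cumulative, incremental_alpha(looks)):
        print(f"t = {t:.2f}  cumulative alpha = {cum:.5f}  spent at this look = {inc:.5f}")
```

Under these assumptions almost none of the overall α is spent at the first look, which is consistent with the statement that a large treatment effect is required to stop with only 33% of the data. The increments shown are amounts of type I error spent, not the nominal significance levels at each look; converting them to stopping boundaries requires the joint distribution of the sequential test statistics and is left to dedicated software.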
Figure 14.3. Statistical considerations related to the amount of study information available.

Early study: 30% or less of the sample size accrued
Efficacy considerations
Features:
• Unstable treatment effects
• Wide confidence intervals
Implications:
• Only profound differences should warrant consideration for termination due to efficacy
• Early study efficacy analysis could be viewed as a “practice run”: ensure endpoint availability, data quality, etc.
Safety considerations
Features:
• Only high incident events likely to be observed
• Confidence intervals on event rates will provide little useful information
Implications:
• Unlikely trial could be stopped due to safety unless there is a vastly different risk profile or unanticipated complications are observed

Mid-study: 30–70% of the sample size accrued
Efficacy considerations
Features:
• Treatment effects can be estimated with reasonable precision
• Point estimates can be compared to study/sample size assumptions
Implications:
• Critical period where efficacy and study assumptions should be validated
• Termination due to efficacy and futility potential
Safety considerations
Features:
• Common, anticipated events observed in sufficient numbers to provide reasonable summaries
• Rare and/or serious events may be observed, but probability of observation is still low
Implications:
• Critical to evaluate expected vs. unanticipated events to ensure study risks are appropriately communicated
• Excessive unanticipated events may warrant early termination

Late study: 70% or more of the sample size accrued
Efficacy considerations
Features:
• Stable parameter estimates
• Borderline to acceptable power for hypothesized clinical effect
Implications:
• Efficacy monitoring late in the study may be unnecessary and will result in lower than desired power due to increased “alpha spending”
• Critical decisions regarding stopping for efficacy should have occurred previously
Safety considerations
Features:
• Maximum amount of safety data will be available
• For large clinical trials (n>400), rare events (1%–5%) have reasonable probability of being observed; extremely rare (<1%) may not be observable
Implications:
• Adverse events rates between treatment groups can be quantified with acceptable precision
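The safety entries in Figure 14.3 rest on a simple calculation: the chance of seeing at least one event of a given true rate among the participants exposed so far. The sketch below illustrates this under the assumption that events occur independently with the same probability in each exposed participant, so that the probability of at least one event is 1 − (1 − p)^n. The function name and the example of 200 exposed participants (one arm of a hypothetical 1:1 trial of 400) are illustrative assumptions, not values from the chapter.

```python
def prob_at_least_one_event(event_rate: float, n_exposed: int) -> float:
    """Probability of observing one or more events among n_exposed participants.

    Assumes independent participants, each with the same true event
    probability `event_rate` (a simple binomial model).
    """
    if not 0.0 <= event_rate <= 1.0:
        raise ValueError("event_rate must be between 0 and 1")
    if n_exposed < 0:
        raise ValueError("n_exposed must be non-negative")
    # Complement of "no participant has the event".
    return 1.0 - (1.0 - event_rate) ** n_exposed


if __name__ == "__main__":
    n_exposed = 200  # hypothetical: one arm of a 1:1 trial with n = 400
    for rate in (0.05, 0.01, 0.001):
        p = prob_at_least_one_event(rate, n_exposed)
        print(f"true rate {rate:.3f}: P(at least one event in {n_exposed}) = {p:.3f}")
```

Under this model, events with a true rate of 1% to 5% are likely to appear at least once in an arm of that size, while an event with a rate of 0.1% will usually not be seen at all, which matches the distinction the figure draws between rare and extremely rare events.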
Sample size considerations
Whether group sequential methods or a flexible design approach is implemented, a result is that a greater amount of evidence (larger test statistic) is needed over the course of the study to reject the null hypotheses. Thus, the overall sample size required to test the same effect size is larger with planned interim analyses if the null hypothesis is true. On the other hand, if the alternative hypothesis is true, the expected sample size using interim analyses is actually lower [19].

Further, in practice, both forms of multiplicity (multiple endpoints/comparisons and sequential tests) readily occur and require attention in the statistical plan. A simple solution to addressing this is to first determine how the multiple endpoints will be addressed through the correction to the α level. This will determine the per-comparison level of significance for each endpoint. It is this adjusted level that can be used when planning the study for sequential analyses. It is comforting to note that the statistical literature suggests very little sample size inflation for the O’Brien-Fleming monitoring of a single endpoint. There is a slight increase, but given the massive uncertainties in other aspects of sample size estimation, the relative magnitude of this adjustment is of little practical importance and can often be disregarded.

Stopping is not always mathematically justified
Reaching conclusions regarding continuation or discontinuation of a study is rarely black and white. In the case an intervention lacks a sufficient safety profile and the dosage cannot be adjusted to improve the
safety for the human participants, early termination of the study may be easily recommended. However, it is worth noting that this decision may not come from a statistically supported conclusion. Safety concerns do not equate with false positive results in the same way as in an efficacy scenario. In fact, a false positive result with safety (concluding the risk:benefit ratio is unfavorable) will likely slow the development of an intervention by suggesting that additional research is needed before the intervention moves forward. This is essential to the safety of the human participants and the population that may ultimately be a candidate for the intervention.

This part has focused on monitoring efficacy and safety data from a statistical and/or hypothesis-oriented view. However, the reader should bear in mind that there are other aspects of routine trial monitoring. First, the study’s data quality control process provides ongoing assessment of data quality. Good clinical practice provides recommendations for the data quality standard to which all studies should adhere [20]. If in the course of the study it is observed that data quality is poor (untimely, copious data entry mistakes, monitoring reports with numerous source document to case report form discrepancies, etc.) the trial may be stopped temporarily or permanently to account for these issues. Likewise, new information may become available that changes the risk:benefit ratio. These situations and many more are generally outside the purview of statistical decisions but are just as important to the scientific integrity of the study.

Comments
Statistically appropriate monitoring of a clinical trial by an independent DMC improves the safety to the participants while maintaining the scientific integrity of the study. All clinical trials should consider the need for interim analysis. While large phase 3 studies should include interim analyses, smaller studies will still benefit from the inclusion of an independent DMC and formal statistical monitoring.

References
1. Adams HP Jr, Woolson RF, Clarke WR, et al. Design of the Trial of Org 10172 in Acute Stroke Treatment (TOAST). Control Clin Trials 1997; 18: 358–77.
2. The Publications Committee for the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) Investigators. Low molecular weight heparinoid, ORG 10172 (danaparoid), and outcome after acute ischemic stroke: a randomized controlled trial. JAMA 1998; 279: 1265–72.
3. Ellenberg SG, Fleming TR and DeMets DL. Data Monitoring Committees in Clinical Trials: a practical perspective. New York: John Wiley & Sons, LTD, 2003.
4. NINDS. Outline of DSMB Report. 2010. http://www.ninds.nih.gov/research/clinical_research/policies/dsmb_outline.htm (Accessed September 9, 2010.)
5. Carter RE. A simple illustration for the need of multiple comparison. Teach Stat 2010; 32: 90–91.
6. Piantadosi S. Clinical Trials: a methodologic perspective (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc, 2005.
7. Pocock SJ. Interim analyses for randomized clinical trials: the group sequential approach. Biometrics 1982; 38: 153–62.
8. Haybittle JL. Repeated assessment of results in clinical trials of cancer treatment. Br J Radiol 1971; 44: 793–7.
9. O’Brien PC and Fleming TR. A multiple testing procedure for clinical trials. Biometrics 1979; 35: 549–56.
10. Lan KKG and DeMets DL. Discrete sequential boundaries for clinical trials. Biometrika 1983; 70: 659–63.
11. Lan KKG and DeMets DL. Changing frequency of interim analyses in sequential monitoring. Biometrics 1989; 45: 1017–20.
12. Lan KKG and DeMets DL. Group sequential procedures: calendar versus information time. Statistics in Medicine 1989; 8: 1191–8.
13. Halperin M, Lan KK, Ware JH, et al. An aid to data monitoring in long-term clinical trials. Control Clin Trials 1982; 3: 311–23.
14. Lan KK, DeMets DL and Halperin M. More flexible sequential and non-sequential designs in long-term clinical trials. Commun Stat 1984; 13: 2330–53.
15. Lan KK and Wittes J. The B-value: a tool for monitoring data. Biometrics 1988; 44: 579–85.
16. Lan KK, Simon R and Halperin M. Stochastically curtailed tests in long-term clinical trials. Commun Stat 1982; C(1): 207–19.
17. Casper C and Perez OA. Package ‘ldbounds’ 2006.
18. Dmitrienko A, Molenberghs G, Chuang-Stein C, et al. Analysis of Clinical Trials Using SAS: a practical guide. Cary, NC: SAS Institute Inc, 2005.
19. Jennison C and Turnbull BW. Group Sequential Methods with Applications to Clinical Trials. New York: Chapman & Hall/CRC, 2000.
20. ICH harmonised tripartite guideline: guideline for good clinical practice E6(R1), 1996.
Section 3 Special study designs and methods for data monitoring
Chapter 15 Clinical approaches to post-marketing drug safety assessment
Gerald J. Dal Pan
Introduction
Monitoring and understanding the safety of drug and therapeutic biological products is a process that proceeds throughout the product’s life cycle, spanning the period prior to first administration to humans through the entire marketing life of the product. Pre-approval drug safety assessment includes animal toxicology and pharmacology studies, clinical pharmacology studies (also known as phase 1 studies), proof-of-principle studies for the disease or condition under study (also known as phase 2 studies), and confirmatory studies of safety and efficacy (also known as phase 3 studies). In each of these stages of drug development, important drug safety information is obtained. These topics have been covered elsewhere in detail [1].

At the time a drug product is approved, there is a substantial amount of data regarding its safety profile. In the pre-approval review process, FDA reviews these data, along with data on the product’s efficacy, to determine if the potential benefits of the drug exceed the potential risks for its intended use. As part of the approval process, FDA reviews the product’s professional labeling (also referred to as the package insert), to ensure that, amongst other things, the product’s uses and its risks are explained. Risks of the products are presented in the following sections of the label: Highlights, Boxed Warnings, Contraindications, Warnings and Precautions, and Adverse Reactions [2,3].

Though the pre-approval testing of a drug is very rigorous, and the review of the data is very thorough, there are still some uncertainties about the complete safety profile of a drug when it is brought to market. Several factors contribute to these uncertainties. First, the number of patients treated with the drug prior to approval is limited, generally from several hundred to a few thousand. Second, patients in clinical trials tend to be carefully selected for inclusion in these trials, and are thus more clinically homogeneous than patients treated in the course of clinical practice once a drug is marketed. Compared to patients in clinical trials, patients treated in clinical practice may have a broader range of comorbidities, take a wider variety of concomitant medications, and have a wider spectrum of the underlying disease being treated. Third, additional populations of patients, such as children or the elderly, who may not have been studied in large numbers in clinical trials, may be treated with the product once it is marketed. In addition, marketed drug products are often used for diseases or conditions for which they are not indicated, or at doses outside of the approved range. Because of this ‘off-label’ use, patients treated in clinical practice are more diverse than those treated in clinical trials.

The goal of the post-marketing, or post-approval, safety program is to identify drug-related adverse events that were not identified prior to approval, to refine knowledge of the known adverse effects of the drug, and to understand better the conditions under which the safe use of the drug can be optimized.

The scope of this endeavor is broad. The core activity is usually the identification of previously unrecognized adverse events associated with the use of the drug. However, it is not sufficient simply to note that a drug can cause an adverse event. Rather, an investigation into not only the potential causal role of the drug in the development of the adverse event, but also into the conditions leading to the occurrence of the adverse event in one person or population and not in others should be the focus of any post-marketing drug safety effort. Factors such as dose-response relationships, drug-drug interactions, drug-disease interactions, drug-food interactions, and the possibility of medication error must be carefully considered.
A full understanding of the factors that can lead to a drug-related adverse event can, in some cases, lead to interventions that can minimize the severity or occurrence of the adverse event, and thus enhance the safe use of the drug. For this reason, the approach to understanding adverse events, especially serious adverse events, in the post-marketing period, must be as comprehensive as possible.

The identification of a new safety issue with a drug often begins with a single observation. Such observations may come from animal studies, chemical studies and assays, or observations of human experience with the drug. In the post-market period, such observations are usually clinical observations, often made at the point of care in the course of clinical practice. A practitioner or patient notes the development of symptoms or signs that were not present, or were present in less severe form, prior to the patient’s using the medicine. If this sign or symptom is not listed in the product’s approved labeling, patients and practitioners may not attribute it to the drug. If further evaluation reveals a clinically significant process (e.g., acute severe liver injury, rhabdomyolysis, agranulocytosis), it is important for the practitioner to keep a side effect due to a drug in the differential diagnosis of the event. If a medication side effect is not included in the differential diagnosis, a potential association between a drug and a previously unrecognized side effect will not be made, and the patient may not receive appropriate treatment. If, on the other hand, the practitioner believes the drug played a role in the development of the new clinical findings, he or she can forward relevant clinical information to either the drug’s manufacturer or to a drug regulatory authority, such as the FDA in the US.

In the post-marketing period, the investigation of adverse events is a multi-disciplinary one. The analysis of a complex adverse event can involve the fields of medicine, pharmacy, epidemiology, statistics, pharmacology, toxicology, and others. A discussion of the role of each of these disciplines in drug safety assessment is beyond the scope of this chapter. This chapter will discuss the broad categories of clinical investigations used in post-market drug safety assessment.

This chapter will present an overview of the three main methods of clinical post-marketing safety assessment: case reports and case series, observational epidemiological studies, and clinical trials. As will be discussed, no one method is better than another. Rather, the choice of method depends on the particular safety question to be answered.

Case reports and case series
A core aspect of the post-approval drug safety system in the US is the reporting of adverse events to FDA. In the US, adverse events in individual patients are generally identified at the point of care. Patients, physicians, nurses, pharmacists, or anyone else at the point of care who suspects that there may be an association between an adverse event and a drug or therapeutic biological product can, but is generally not required to, report the adverse event to either the manufacturer or to the FDA.

The public – including health care professionals, patients, and consumers – can send reports directly to FDA via the MedWatch program (http://www.fda.gov/medwatch/), which was established in 1993 to allow health care providers and consumers to send a report about serious problems that they suspect are associated with any medical product (i.e., drug, biologic, device) directly to FDA. Members of the public can also report suspected adverse events to a product’s manufacturer; the manufacturer, in turn, is then subject to regulations regarding the submission of these reports to FDA.

When the manufacturer of a product receives an adverse event report, it is required to report the event to the FDA. The specific reporting requirements depend both on the regulatory status of the product and on the nature of the event. In general, adverse events are defined as ‘serious’ if they result in any of the following outcomes:

‘Death, a life-threatening adverse drug experience, inpatient hospitalization or prolongation of existing hospitalization, a persistent or significant disability/incapacity, or a congenital anomaly/birth defect. Important medical events that may not result in death, be life-threatening, or require hospitalization may be considered a serious adverse drug experience when, based upon appropriate medical judgment, they may jeopardize the patient or subject and may require medical or surgical intervention to prevent one of the outcomes listed in this definition. Examples of such medical events include allergic bronchospasm requiring intensive treatment in an emergency room or at home, blood dyscrasias or convulsions that do not result in inpatient hospitalization, or the development of drug dependency or drug abuse’ [4].

Adverse events are also defined as ‘unexpected’ if they are:

‘Not listed in the current labeling for the drug product. This includes events that may be symptomatically and pathophysiologically related to an event listed in the labeling, but differ from the event because of greater severity
or specificity. For example, under this definition, hepatic necrosis would be unexpected (by virtue of greater severity) if the labeling only referred to elevated hepatic enzymes or hepatitis. Similarly, cerebral thromboembolism and cerebral vasculitis would be unexpected (by virtue of greater specificity) if the labeling only listed cerebral vascular accidents. ‘Unexpected,’ as used in this definition, refers to an adverse drug experience that has not been previously observed (i.e., included in the labeling) rather than from the perspective of such experience not being anticipated from the pharmacological properties of the pharmaceutical product’ [4].

From a public health perspective, adverse events that are both serious and unexpected are of the greatest concern, since information about such events may require regulatory action, such as a labeling change or dissemination of information to the public, on the part of FDA, the manufacturer, or both.

The above system of adverse event reporting is sometimes called a passive, spontaneous reporting system. It is called passive because FDA receives this information without actively seeking it out. It is called spontaneous because the persons who initially report the adverse events to either the FDA or to the manufacturer choose what events to report. Because this system of adverse event reporting is voluntary on the part of health care professionals, patients, and consumers, it is generally recognized that there is substantial underreporting of adverse events to FDA. Two survey-based studies conducted in the 1980s, one in Maryland [5] and the other in Rhode Island [6], examined physician reporting of adverse events to FDA, and concluded that fewer than 10% of adverse events were reported to FDA. These studies were conducted prior to the development of the current MedWatch program in 1993, and do not consider the contribution of reporting from sources other than physicians. Calculating the proportion of adverse event reports that FDA actually receives requires that the true number of adverse events in the population be known. For most adverse events, this number is not known. In some cases, however, data are available that allow an estimate of the extent of reporting to be calculated. For example, the extent of reporting to FDA of cases of hospitalized rhabdomyolysis associated with statin use was estimated using a projected estimate of the number of such cases in the US and comparing it to the number of reports of statin-associated hospitalized rhabdomyolysis in FDA’s Adverse Event Reporting System, a database that houses FDA’s post-marketing adverse event reports [7]. The projected national estimate was obtained by using incidence rates obtained from a population-based cohort study [8], and applying those incidence rates to national estimates of statin use. Across four statins (atorvastatin, cerivastatin, pravastatin, and simvastatin), the estimated overall extent of adverse event reporting was 17.7%. For individual statins, the estimated extent of reporting ranged from 5.0% (atorvastatin) to 31.2% (cerivastatin). Further analysis revealed that the high proportion of reporting of cerivastatin cases was driven by reports received after the dissemination of a Dear Health care Professional Letter notifying physicians of the risks of cerivastatin-associated rhabdomyolysis. The estimated extent of reporting was 14.8% before the letter and rose to 35.0% after. It is important to note that the results of this study apply only to reporting cases of statin-
Figure 15.1. Number of direct, 15-day, and periodic reports received (solid bars) and entered (checkered bars) into the FDA Adverse Event Reporting System (AERS) from 2000 through 2009. FDA receives direct reports straight from the public; 15-day reports and periodic reports are submitted to FDA by industry. The 15-day reports describe adverse events that are both serious and unexpected (i.e., not in the product’s approved labeling), as well as adverse events from post-approval clinical trials that are serious, unexpected, and judged to be reasonably associated with the drug. Industry submits all other adverse event reports as periodic reports. FDA enters all direct reports, 15-day reports, and all other reports of serious adverse events into the AERS database. Reports of non-serious adverse events are entered only for new molecular entities in the first 3 years of marketing.
[Bar chart. x-axis: Year (2000–2009); y-axis: Number of reports (0–600 000); series: Direct, 15-day, Periodic, each shown as Received and Entered.]
associated rhabdomyolysis. The extent of reporting for different drug-adverse event pairs will be different, and cannot be estimated from the results of this study.

FDA receives over 500,000 adverse event reports a year; approximately 94% are from manufacturers; the remainder are directly from the public via the MedWatch system. The number of reports has been increasing over the past decade (Figure 15.1). Many manufacturers submit reports to FDA electronically, using the standards set forth by the International Conference on Harmonisation (ICH) [9,10], which includes regulators and industry representatives from three regions, the US, Japan, and the European Union [9,10].

The adverse event reports that FDA receives from the public and from the manufacturers are entered into a database known as the Adverse Event Reporting System (AERS), which contains about 5 million adverse event reports. Adverse events in AERS are coded using a system called MedDRA, the Medical Dictionary for Regulatory Activities [11].

Other large databases of post-marketing adverse events are the European Medicines Agency’s Eudravigilance and the World Health Organization’s Vigibase. In large databases, data mining techniques can be applied to identify previously unrecognized potential drug-related adverse events [12,13].

The review of case reports of suspected adverse events is a complex process that has been described elsewhere [14]. It typically begins by identifying one or more case reports with the outcome of interest (e.g., aplastic anemia). Because the case reports that form a case series often come from several sources that do not report adverse events in a standardized way, it is usually necessary to develop a case definition. The case definition centers around the clinical characteristics of the event of interest, without regard to the causal role of the drug whose relationship to the adverse event is being investigated. Once a case definition is established, each report is reviewed to determine if the event meets the case definition and if the report is to be included in the case series. Depending on the specific question(s) to be answered by the case series, other exclusion criteria may also apply. For example, one would generally exclude a case in which the report provides no evidence that the patient ever took the drug of interest. In other cases, one may restrict the case series to only certain formulations of the drug if the drug safety question concerns some formulations but not others (e.g., include case reports in which an intravenous formulation, but not an oral formulation, was used). In other cases, it may be appropriate to restrict case reports to certain age groups (e.g., limit the case series to only case reports describing the suspected adverse events in pediatric patients), or to certain indications for use (e.g., limit the case series to case reports in which the drug was used for a certain off-label indication). Exclusion criteria for a case series must be carefully considered so that potentially relevant cases are not excluded. In general, if the purpose of the case series is to examine the relationship between a drug and a suspected adverse event that has not been previously associated with the drug, it is best to include as many case reports as possible in the case series, and to minimize the number of excluded cases.

Once the case series has been developed, it is next necessary to review each case report individually in order to determine if there is a plausible causal relationship between the drug and the adverse event. At the level of the individual case report, it is often difficult to establish with certainty that the drug caused the adverse event of interest. For example, if the adverse event of interest is one that is common in persons with the disease or condition for which the drug is indicated when the drug is not used, establishing a causal role for the medicine in the development of the adverse event is generally not possible. For example, the incidence of Parkinson’s disease is much higher in persons over age 60 years than it is in persons below that age [15]. In this situation, review of a report describing a myocardial infarction in a 70-year-old patient on an anti-parkinsonian agent will generally not be informative in determining if the anti-parkinsonian agent played a causal role in the development of the myocardial infarction, as myocardial infarction occurs commonly in this age group. Similarly, review of a case report is not likely to shed light on the causal relationship between a medicine and a suspected adverse event when the suspected adverse event is a manifestation of the underlying illness which the medicine is treating. For example, review of a case report of suicidal behavior in a patient taking an antidepressant is not likely to be sufficient to establish a causal link between the suicidal behavior and the antidepressant. Review of a case series to establish a causal relationship between a drug and a suspected adverse event is most useful when the suspected adverse event is rare in the population when the medication is not used, is not a manifestation of the underlying disease, and is generally thought to be the result of exposure to a medicine. Examples of suspected adverse events in
this category are acute hepatic failure, aplastic anemia, agranulocytosis, serious skin reactions such as Stevens-Johnson Syndrome and toxic epidermal necrolysis, and certain arrhythmias, such as torsade de pointes.

The approach to assessing the causal role of a medicine in the development of an adverse event has evolved over the past four decades [16,17]. In general, these approaches rely on a systematic review of each case report to ascertain the temporal relationship between drug use and the development of the adverse reaction, an assessment of any co-existing diseases or medications that could confound the relationship between the medicine and the adverse event, the clinical course after withdrawing the drug (de-challenge), and the clinical course after re-introduction of the drug (re-challenge). Naranjo and colleagues [18] have developed a quantitative method based on these general principles for estimating the probability that a drug caused an adverse clinical event. The World Health Organization [19] has developed a qualitative scale for categorizing causality assessments.

To help place reports of adverse events in a broader context, data on drug utilization are often incorporated into the analysis. Typically, these data provide information on the number of prescriptions dispensed for a given drug in a defined time period; in some cases data on the number of persons who have taken the drug may also be available. These data, which are obtained from commercial vendors, are used to calculate a reporting rate. The reporting rate is calculated by dividing the number of cases of an adverse event in persons taking a given drug reported in a defined time period by the number of prescriptions dispensed for that drug in a given time period. It is important to note that the reporting rate is not an incidence rate. The calculation of an incidence rate requires knowledge of the total number of cases of the adverse event in the population as well as knowledge of the total number of persons and the duration of drug exposure in the population taking the drug. Because the adverse event reporting systems […] population-based context, and for following trends over time.

The case of aplastic anemia associated with felbamate therapy illustrates the role that case reports can play in the assessment of a previously unknown adverse event in the post-approval period [21]. Felbamate is an antiepileptic agent approved for use in the US on July 29, 1993. Pre-approval studies showed no evidence of significant, non-reversible hematologic abnormalities [22]. Within about 1 year of approval, 20 cases of aplastic anemia, three of them fatal, had been reported in the US [21]. Review of the case reports suggested a causal role for felbamate. An estimated 100 000 patients had taken felbamate during this time [21]. While the true incidence of aplastic anemia in patients taking felbamate cannot be calculated because case ascertainment may be incomplete, the estimated rate is 20/100 000/year, or 200/million/year. By contrast, the population background rate of aplastic anemia is low, about 2/million/year [23]. Thus, the observed cases of aplastic anemia suggest that aplastic anemia is about 100 times more frequent in patients taking felbamate than in the general population. Based on this finding, the FDA and the manufacturer recommended that patients not be treated with felbamate unless the benefits of the drug were judged to outweigh the risk of aplastic anemia [21]. A subsequent review of 31 case reports of aplastic anemia in patients taking felbamate [23], using the criteria of the International Agranulocytosis and Aplastic Anemia Study (IAAAS), established that felbamate was the only plausible cause in three cases, and the most likely cause in eleven cases. For the remaining nine cases, there was at least one other plausible cause. The authors estimated the ‘most probable’ incidence of aplastic anemia in patients exposed to felbamate to be 127 per million. Because aplastic anemia is uncommon in the population and because it is generally the result of a medication or other toxin, a careful analysis of a case series can establish the relationship of
only receive a small fraction of all drug-related adverse felbamate to aplastic anemia.
events occurring in the population, the total number
of cases of the adverse event of interest is not available.
In addition, the drug utilization data oten report drug Active surveillance
utilization in terms of number of dispensed prescrip- Active surveillance systems are also being explored to
tions, not in terms of number of actual persons taking identify and examine drug safety issues. Drug safety
the medication. For these reasons, a reporting rate is active surveillance systems, which take advantage of
not the same as an incidence rate. Nonetheless, des- large repositories of automated healthcare data, are
pite some well-recognized limitations [20], reporting now being developed and tested by multiple organiza-
rates are useful for placing adverse event reports in a tions. he common feature of these systems is that they
do not rely on health care providers or patients to recognize and report adverse events that may be related to medication use. Rather, these systems often use sophisticated statistical methods to actively search for patterns in linked prescription, outpatient and inpatient utilization of care data that might suggest the occurrence of an adverse event related to drug therapy. This lack of reliance on healthcare providers or patients to detect the event, relate it to a drug and then report it to FDA, along with its prospective nature, is what makes these systems active rather than passive in their scope. However, one system is unlikely to address all drug safety problems or all patient populations. While there is much interest in developing these systems, there is also much work to be done in the validation of these systems.
Observational epidemiological studies
Observational epidemiological studies of drug safety, also known as observational pharmacoepidemiological drug safety studies, are widely used in the post-marketing period. Because these studies, like case reports and case series, rely on actual patient experience, they can provide an assessment of a drug’s safety under actual conditions of use. Observational drug safety studies, which can be prospective or retrospective, can be used to make inferences about the safety of the drug, provided that they are carefully designed, conducted, analyzed, and interpreted. Unlike case reports and case series, observational epidemiological studies can include a control group. Unlike in clinical trials, the investigator in an observational epidemiological drug safety study does not assign treatment to patients; patient treatment decisions are made in the course of routine clinical care and are independent of the study.
The two most common observational epidemiological study designs are the case-control design and the cohort design [24]. In each type of study an ‘exposure’ is related to an ‘outcome’. For drug safety studies, the exposure is usually the use of the drug being investigated, and the outcome is usually the adverse event of interest. Case-control studies of drug safety compare the frequency of exposure (i.e., the frequency of use of the drug being investigated) amongst cases (i.e., those with the adverse event of interest) to the frequency of exposure amongst controls (i.e., those without the adverse event of interest). Cohort studies follow persons with and without the exposure (i.e., those who use and do not use the drug being investigated) over time for the outcome of interest.
In the design and analysis of observational pharmacoepidemiological studies, careful attention must be paid to the potential for bias and confounding, each of which can lead to an erroneous estimate of the effect of the exposure on the outcome. In drug safety studies, these factors would lead to an erroneous conclusion regarding the relationship of the use of a drug to the development of an adverse event [25]. One particular type of confounding, confounding by indication, is especially important to address in the design of observational epidemiological drug safety studies. Persons who take a given medication are different from those who do not take that medication in many ways. One important way in which they can be different is the reason, or indication, for which they are taking the medication. If the characteristics of those with this indication are also related to the development of the adverse event of interest, an observed association between the medication and adverse event may be confounded by the indication for treatment – that is, the association may be explained not by a direct effect of the drug on the outcome, but rather by the relationship of the indication for treatment both to use of the drug and to the development of the adverse outcome, which produces an indirect link between the drug and the adverse outcome. While analytic techniques may control for confounding by indication in some cases, it should not be assumed that such techniques will always eliminate the effect of confounding by indication. It is therefore important to consider carefully the potential impact of confounding by indication in the design of the study, in order to minimize the chance that this will occur.
Cohort studies
A cohort study is designed to determine if there is an association between an exposure and an outcome in a defined group followed over time [24]. In a drug safety cohort study, a group of persons treated with the drug of interest and a comparable group of persons not treated with that drug are identified and followed over time. (The group of persons treated with the drug of interest can also be compared to a group of persons treated with an alternative treatment.) The incidence of the adverse event of interest is ascertained in each group. A relative risk is obtained by dividing the incidence in the group treated with the drug of interest by
the incidence in the group not treated with the drug of interest. A relative risk of 1.0 implies that the incidence is equal in the two groups. A relative risk greater than 1.0 implies that those who receive the drug of interest have a higher risk of the outcome of interest than those who did not receive the drug of interest. Similarly, a relative risk less than 1.0 implies that those who receive the drug of interest have a lower risk of the outcome of interest than those not treated with the drug of interest. To determine if the relative risk is statistically significantly different from 1.0, it is customary to calculate and report p-values. To determine the precision of the estimate, 95% confidence intervals can be calculated and reported. Because cohort studies measure the incidence of the outcome in two groups, a risk difference can be calculated. This measure quantifies the excess risk attributable to the drug of interest, and is thus more suitable for considering the public health impact of the findings.
In a prospective cohort study, the investigator identifies the cohort members at the start of the study, ascertains drug exposure, and follows users and non-users of the drug contemporaneously over time to determine who develops the outcome of interest. This design may be particularly useful when the outcome of interest is common and is likely to occur within a reasonable time after drug treatment is initiated. A reasonable time after drug exposure is one in which a sufficient number of outcome events is likely to occur, but is not so long that it results in an unacceptable delay in obtaining study results.
When the outcome of interest is infrequent or when there is a long latency between exposure and the development of an outcome event, the prospective cohort study design may not be the most feasible approach, because it may take several years to complete the study. In addition, loss to follow-up may make the original study design inadequate to address the original question, especially in the case of very long follow-up periods. Similarly, the introduction of new treatments during the study period may make the results of the study uninterpretable, irrelevant, or both. Because of these limitations, the prospective cohort design is not often used in observational epidemiological drug safety studies.
In some situations, the cohort study design can be employed by using existing information on patient treatments and outcomes that have already occurred. If such data are available, mainly in administrative claims data or electronic medical records, cohorts can be constructed by identifying persons who took the drug of interest and those who did not take it. Similarly, outcomes of interest that occurred in patients after entry into the cohort can be ascertained in these datasets. This design is known as a retrospective cohort design. It is conceptually identical to the prospective cohort design, except that the use of the drug of interest and the development of the outcomes of interest occurred prior to the initiation of the study. By using already existing data, a retrospective cohort study can be conducted and completed much more rapidly than a prospective cohort study. Of course, if existing data are not available or suitable for a retrospective cohort study, this approach cannot be used, and another approach must be sought.
The availability of large computerized administrative health care databases and electronic medical record systems provides a substantial source of data in which to examine drug safety questions. These databases contain information on medication exposure and health outcomes. Records of dispensed prescriptions and prescribed medications are the measure of medication exposure. Health outcomes are generally measured by diagnostic codes or procedure codes. Because diagnostic codes and procedure codes are recorded for administrative, and not research, purposes, it is important that their validity be understood when used in drug safety studies. To accomplish this, outcomes can be ascertained and adjudicated in a manner that is blinded to treatment received, in order to avoid bias in outcome ascertainment.
A retrospective cohort study using administrative claims data was used to examine the incidence of hospitalized rhabdomyolysis in patients treated with lipid-lowering agents [8]. Drug-specific inception cohorts of statin (atorvastatin, cerivastatin, fluvastatin, lovastatin, pravastatin, and simvastatin) and fibrate (fenofibrate and gemfibrozil) users were established by identifying new users (defined as no use within the 180 days prior to entrance into the drug-specific cohort). Hospitalization claims were reviewed for diagnosis codes indicative of possible rhabdomyolysis. Medical record review of hospitalizations by investigators blinded to statin or fibrate exposure status was performed to identify cases of rhabdomyolysis, according to a case definition. Incidence rates of rhabdomyolysis per 10 000 person-years of treatment were calculated. The incidence per 10 000 person-years for cerivastatin monotherapy was 5.34 (95% CI, 1.46–13.68). The corresponding rates were lower for monotherapy
with atorvastatin (0.54, 0.22–1.12), pravastatin (0, 0–1.1), and simvastatin (0.49, 0.06–1.76). Amongst the fibrates, the rate for gemfibrozil monotherapy was 3.70 (0.76–10.82) and the rate for fenofibrate monotherapy was 0 (0–14.58). Further analysis showed that the incidence rate for the combination of cerivastatin and gemfibrozil was markedly elevated (1036, 389–2117). This retrospective cohort study demonstrated that the risk of rhabdomyolysis was low for monotherapy with atorvastatin, pravastatin, and simvastatin, but higher for cerivastatin. Additionally, it demonstrated that statin-fibrate combination therapy increased this risk.
Case-control studies
Like cohort studies, case-control studies are designed to measure an association between an exposure and an outcome. While cohort studies follow defined groups based on exposure (i.e., cohorts) over time to ascertain the outcome of interest, case-control studies start with the identification of those who have the outcome of interest (cases) and an appropriately selected group that does not have the outcome of interest (controls). The frequency of exposure is then ascertained in each group and compared between groups.
In observational drug safety case-control studies, persons with the adverse event of interest (cases) are compared to persons without the adverse event of interest (controls). The proportion of cases that received the drug of interest is determined, as is the proportion of cases that did not receive the drug of interest. Similarly, the proportions of controls who did, and who did not, receive the drug of interest are determined. If the drug of interest is associated with the adverse event of interest, the frequency of the exposure amongst cases will be higher than that amongst the controls. The measure of this association in a case-control study is expressed as an odds ratio. The design of a case-control study does not permit calculation of incidence rates. A true relative risk, therefore, cannot be calculated. However, when the outcome of interest is relatively rare, the odds ratio functions as an estimate of the relative risk. Like the relative risk obtained from cohort studies, an odds ratio of 1.0 implies no association between the drug of interest and the adverse event of interest. An odds ratio greater than 1.0 implies that those who receive the drug of interest have a higher risk of the outcome of interest than those who did not receive the drug of interest. Similarly, an odds ratio less than 1.0 implies that those who receive the drug of interest have a lower risk of the outcome of interest than those not treated with the drug of interest. As with the relative risks obtained from cohort studies, p-values and 95% confidence intervals are customarily calculated and reported to determine statistical significance and precision of the estimate, respectively. A variant of the case-control study that is often used in pharmacoepidemiology is the nested case-control study, in which cases and controls are selected from a cohort.
Case-control studies are particularly useful when the outcome of interest is relatively uncommon, because such outcomes are not likely to be observed in a clinical trial or a cohort study. Similarly, if the outcome has a long latency relative to the exposure, a cohort study or a clinical trial may not be feasible. Designing a robust case-control study is complex. From a broad perspective, there are three features that must be carefully considered when designing a pharmacoepidemiologic case-control study: the definition of a case, the measurement of exposure to the drug, and the selection of a control group.
Case definitions must be carefully considered so that they ensure that the outcome of interest is adequately captured. For example, an overly narrow case definition may result in failure to identify all clinically relevant events, while an overly broad case definition may result in inclusion of clinically irrelevant events. In either case, an imprecise definition can lead to an incorrect estimate of the association of the drug of interest to the adverse event of interest. An imprecise case definition can lead to failure to identify an association when one actually exists, or it can lead to an incorrect conclusion that an association exists when one actually does not exist. Case-control studies generally obtain data on drug exposure retrospectively. This can be accomplished either by examining medication records, such as medical records or administrative claims data, or by administering questionnaires to patients, their health care providers, or other respondents. It is important to understand the method of medication exposure ascertainment in order to identify the potential limitations in its validity. Finally, selection of controls must be done in a way to minimize selection bias.
Despite these challenges, well-designed case-control studies can be useful sources of information about the adverse effects of medicines.
A nested case-control design was used to examine the relationship of dopamine agonists to cardiac-valve regurgitation [26]. Using the General Practice Research Database, a computerized medical records system containing information on approximately
6.3 million lives from about 350 general practices in the UK, the researchers identified a cohort of 11 417 patients who had at least two prescriptions for an anti-parkinsonian medication, were 40–80 years of age, and met other eligibility criteria. Anti-parkinsonian medications included levodopa, selegiline, bromocriptine, cabergoline, pergolide, lisuride, pramipexole, and ropinirole. From this cohort, they identified 81 patients with possible new valvular regurgitation. Fifty of these 81 patients were then excluded because they did not have a confirmed diagnosis (n = 40), because they had pre-existing valvular heart disease (n = 2), or because they had a myocardial infarction within the previous 3 years (n = 8). The remaining 31 patients formed the case group. For each case, up to 25 controls were selected from the patients in the cohort who did not have possible new valvular regurgitation, matched on sex, age (within 2 years) and year of entry into the study cohort. Patients with myocardial infarction within 3 years prior to the index date (the date that resulted in the same duration of follow-up for the control patients as the case patient) were excluded from the control group. The final case-control analysis included 31 cases and 663 controls. Exposure to a dopamine agonist was quantified in two ways. First, cumulative duration of use of a dopamine agonist was categorized as less than 6 months or more than 6 months. Second, for patients using pergolide or cabergoline, the total daily dose of the dopamine agonist was calculated and categorized as 3 milligrams (mg) or less daily or more than 3 mg daily. Conditional logistic regression, adjusting for multiple patient characteristics, was used to calculate odds ratios, which were used as estimates of incidence-rate ratios. The adjusted incidence-rate ratio of cardiac-valve regurgitation was elevated amongst patients using pergolide (adjusted incidence-rate ratio 7.1, 95% CI: 2.3 to 22.3) or cabergoline (adjusted incidence-rate ratio 4.9, 95% CI: 1.5 to 15.6). No cases of new valvular regurgitation were found amongst users of the other dopamine agonists. This case-control study thus demonstrated that the use of pergolide or cabergoline was associated with the development of cardiac-valve regurgitation.
Clinical trials
Clinical trials play an important role in assessing the safety of medicines. The majority of clinical trials are performed primarily to assess the efficacy of a product. In these trials, safety assessments are routinely included, though there is usually not a specific safety hypothesis. As efficacy trials are powered to demonstrate the efficacy of a product, such trials are typically not well suited for detecting rare adverse events, nor are they generally of sufficient size to determine if there are clinically significant differences in the frequency of a specific adverse event between two treatments. In these settings, some of the most useful safety information gained from a clinical trial is an understanding of the frequency of the most common adverse events.
In the post-market period, clinical trials can play many important roles in the ongoing safety assessment of a medicine. The framework of a properly designed and carefully conducted clinical trial is well suited for assessing the safety of a medicine in selected circumstances. First, randomization assigns treatment independent of individual patient characteristics and physician preferences. Randomization thus avoids the problem of confounding by indication that can be present in observational studies. Second, patient data are collected in a standardized way defined in the clinical trial protocol. This method allows investigators to ensure that all clinically relevant baseline and post-baseline data are captured, including detailed information on patient demographics, disease duration and severity, prior treatment, past medical history, and concomitant medications. Third, data on the dosage and duration of study treatment regimens are carefully recorded. Fourth, outcomes, including adverse events of interest, can be ascertained and recorded in a systematic and standardized way. Information on onset date or time, seriousness, clinical course and severity, response to treatment (including response to withdrawal of the test medicine), and extent of resolution can be obtained in a uniform way. For key adverse event outcome measures, the protocol can stipulate the additional clinical details that need to be recorded. If necessary, specific outcome events of interest can be adjudicated using pre-defined criteria by an independent group of experts not otherwise involved in the trial who can be blinded to treatment assignment. Finally, blinding of treatment assignment minimizes bias in assessing adverse events.
Despite the advantages of the clinical trial methodology for assessing adverse events, there are constraints to the set of clinical trials that are done prior to a drug’s approval that limit knowledge of a drug’s full safety profile. First, clinical trials are generally conducted in patients who are more homogeneous than the larger population of patients who will receive the drug once it is marketed. Patients in clinical trials may differ from those treated in clinical practice in terms of disease
severity, concomitant and past illnesses, concomitant medication, and personal characteristics. These factors can each influence the development of an adverse event. Second, for medicines intended for chronic use, the duration of treatment in a clinical trial is generally relatively short compared to that used in clinical practice. Clinical trials are often not practical for the detection or characterization of adverse events that emerge only after prolonged exposure to a medicine.
Given the strengths and limitations of clinical trials for the detection of adverse events, what is the role of clinical trials in the assessment of safety in the post-approval period? There are actually many roles for clinical trials. First, clinical trials of efficacy often continue in the post-approval period. These trials are generally designed to expand the medicine’s original indication, by studying different patient populations, different indications, and different dosing regimens. Though most post-approval clinical trials in these settings will be efficacy studies, each of these circumstances affords the opportunity to enhance knowledge of the medicine’s safety profile. For example, new patient populations may have a broader range of concomitant illnesses or a greater range of severity of the disease being treated. Similarly, studying the efficacy of a medicine in a new indication will often result in a patient population that is different from the one previously studied. Careful collection of safety data in a clinical trial with a broader population or in a new indication may thus reveal new patterns of adverse events not previously recognized. Clinical trials with new dosing ranges or dose regimens may reveal dose-dependent toxicities not previously appreciated. For these reasons, careful collection of safety data in clinical trials is important in the post-approval period.
In some cases, a clinical trial is conducted specifically to test a safety hypothesis. The impetus for such a trial may be findings from case reports, observational studies, or previous clinical trials. In addition to general considerations for all clinical trials, there are certain features of clinical trials designed specifically to test drug safety hypotheses that must be considered carefully. First, while most clinical trials typically specify a primary efficacy endpoint, and collect adverse event data to characterize the general safety profile of a medicine, clinical trials designed to answer a specific safety question must clearly specify a defined safety endpoint. This endpoint may be a single outcome, or it may be a composite outcome.
Second, the design of a post-marketing clinical trial testing a safety hypothesis is often an active-controlled trial that uses a non-inferiority study design. If, in the post-marketing setting, effective treatments, in addition to the drug whose safety is being tested, exist for the condition being treated, it would generally be unethical to withhold treatment. Thus, placebo-controlled clinical trials to study a potential safety risk would likely not be possible, nor would they necessarily be relevant. In these cases, active comparators are used. In a clinical trial designed to test a drug safety hypothesis, the relevant demonstration of safety would be that the test drug of interest has no higher risk of the adverse event of interest than the comparator treatment (either an active comparator or a placebo), within a specified margin. To determine that the test drug has no higher risk of the adverse event of interest than does a comparator treatment, a non-inferiority clinical trial design is used [27,28]. The objective of a non-inferiority clinical trial is to show that the difference in the frequency of an outcome between two treatment groups is small (see Chapter 13). For non-inferiority trials with a primary safety outcome, the objective is to show that the difference in the frequency of the safety outcome between the two treatment groups is small. Assuming that the frequency of the adverse event of interest in the comparator group is known, based on prior data, to be acceptable, the objective becomes showing that the frequency of the adverse event of interest in the test drug group is not clinically meaningfully higher than that of the comparator. To accomplish this goal, the investigators must determine the maximum clinically acceptable increase in the frequency of the adverse event of interest in the test drug group relative to the comparator group. This difference is known as the non-inferiority margin. With respect to the adverse event of interest, the test drug is non-inferior to the comparator drug if the upper limit of the 95% confidence interval around the measure comparing the two groups is below the pre-specified non-inferiority margin. Because the non-inferiority trial design seeks to demonstrate that the specified risk of the drug of interest does not exceed that of a comparator agent by more than a pre-defined, and usually small, amount, the magnitude of the non-inferiority margin is based on clinical judgment and must be carefully considered. Active-controlled, non-inferiority trials present a special set of challenges. The absence of a placebo arm results in loss of assay sensitivity, or the ability to distinguish between active and inactive treatments. If two treatments have the same frequency of an adverse event in an active-controlled clinical trial and the test drug is determined to be non-inferior to the
comparator agent, it is still not known if that adverse event is related to treatment in each of the two arms, or if it is unrelated to treatment in the two arms [27,28].
Relative to observational epidemiological studies, clinical trials designed to answer drug safety questions are usually more costly and more time-consuming. One circumstance in which clinical trials are preferred for answering drug safety questions is when there is concern that the techniques used to adjust for confounding in observational epidemiological studies do not allow for complete control of confounders. When the observed association is small (e.g., a relative risk of 1.5 or less) and there is concern that residual confounding is present, observational epidemiological studies will often not be able to sort out a causal effect from one driven by residual confounding. In this circumstance, if the drug safety question is important, and there is genuine uncertainty about the relationship of the drug to the adverse outcome of interest, a clinical trial may be the only acceptable option.
For example, the ‘Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen or Naproxen’ (PRECISION) trial is designed to evaluate the cardiovascular safety of celecoxib, ibuprofen, and naproxen [29]. Celecoxib, a non-steroidal anti-inflammatory drug (NSAID), is a selective cyclooxygenase-2 inhibitor used in the treatment of osteoarthritis and rheumatoid arthritis. Prior data had indicated that another member of the class, rofecoxib, was associated with an elevated risk of cardiovascular morbidity [30]. A further review of the entire NSAID class revealed uncertainty about the relative cardiovascular effects of all drugs in the class. The PRECISION trial was designed to address this issue. The trial was designed to include 20 000 patients with symptomatic osteoarthritis or rheumatoid arthritis at high risk for, or with, established cardiovascular disease. Patients will be randomized to naproxen 375 mg bid, ibuprofen 600 mg tid, or celecoxib 100 mg bid in a 1:1:1 allocation. Subjects will be followed for 48 months. The primary safety outcome is the first occurrence of the Antiplatelet Trialist Collaboration (APTC) endpoint, which includes cardiovascular death, non-fatal myocardial infarction, or non-fatal stroke. The trial uses a non-inferiority design, with a statistical hypothesis

to be fulfilled: the upper limit of the one-sided 97.5% confidence interval for the hazard ratio cannot exceed 1.33, and the point estimate of the hazard ratio cannot exceed 1.12. The trial is ongoing.
A variant of the standard clinical trial that is sometimes used for drug safety studies is the so-called ‘large simple trial’. This technique can be used when randomization is deemed to be the only way to control confounding completely.
In certain situations, prior data may suggest that the strength of an association between a drug and an adverse event is numerically small but clinically important. If the question is clinically important, a clinical trial may be the best way to address the issue, since in these situations observational studies may not completely control for confounding. However, numerically small associations require that clinical trial sample sizes be large. Such large trials might not be feasible if they were to collect all the detailed information that is typically collected in a standard clinical trial. The large, simple trial is an alternative approach that allows for large numbers of subjects to be studied by minimizing the volume and complexity of data collected, while maintaining the methodological rigor of a clinical trial. Protocols for large, simple trials are developed to ensure that only data relevant for the specific question of interest are recorded at baseline and at follow-up, and that data on the specific outcome(s) of interest are captured. Because large amounts of detailed data are not collected in a large, simple trial, the ideal outcome data are those that are objectively defined, such as hospitalization or death. Follow-up and outcome information may be obtained using epidemiological techniques not used in traditional clinical trials, such as vital records databases, or questionnaires administered to patients or caregivers not involving the investigator.
A large, simple trial was used to assess the safety of ibuprofen in children between 6 months and 12 years of age [31]. The investigators randomly assigned treatment with acetaminophen 12 mg/kg, ibuprofen 5 mg/kg, or ibuprofen 10 mg/kg to patients who were seen as outpatients for an acute febrile illness and who met other entry criteria. A total of 84 192 patients were recruited from the practices of 1735 pediatricians, family practitioners, and general practitioners. Outcome
that none of the treatments is inferior to either of the information was obtained via a self-administered
others. hree pairwise comparisons will be used to test questionnaire mailed to parents or guardians 4 weeks
each drug against the other two. he published non- ater enrollment. he questionnaire asked about the
inferiority deinition for this trial speciies that two initial febrile illness, the amount of medication taken,
conditions must be met for the non-inferiority criteria supplemental treatments received, and the occurrence
170
of serious adverse events in the four-week interval. If a hospitalization was reported, the investigators requested a copy of the hospital record. The principal outcomes of interest were acute gastrointestinal bleeding, acute renal failure, anaphylaxis, and Reye’s syndrome. Follow-up data were obtained for all but 0.3% of enrolled children. The investigators found a risk of acute gastrointestinal bleeding of 7.2/100 000 children treated with either dose of ibuprofen (95% CI, 2 to 18 per 100 000). The corresponding risk amongst acetaminophen-treated children was zero per 100 000 (95% CI, 0 to 11 per 100 000) (p = 0.31 for the difference). For acute renal failure, anaphylaxis and Reye’s syndrome, the observed risk among children randomized to either dose of ibuprofen was zero per 100 000 (95% CI, 0 to 5.4 per 100 000). The authors concluded that the risks of hospitalization for gastrointestinal bleeding, acute renal failure and anaphylaxis were not increased following short-term use of ibuprofen in children.

Summary
There are many possible approaches to studying safety of drugs in the post-marketing period. These include review of individual case reports, case series, observational epidemiological studies, and clinical trials. Each approach has its own strengths and limitations, and no single approach is appropriate for all situations. Rather, the approach taken must consider what is already known about the adverse event and the knowledge gaps that need to be filled. Additional critical factors include the nature of the adverse outcome under study, its expected frequency, the availability of existing data, and the importance and urgency of answering the question. Regardless of the approach chosen, careful attention must be paid to selecting proper control groups and comparator agents, minimizing bias, and controlling for confounding.

References
1. Institute of Medicine, Committee on the Assessment of the U.S. Drug Safety System. Natural History of a Drug. The Future of Drug Safety: Promoting and Protecting the Health of the Public. 2006.
2. US Food and Drug Administration. 21 CFR 201.56. 2009. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=201.56
3. US Food and Drug Administration. 21 CFR 201.57. 2009. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfCFR/CFRSearch.cfm?fr=201.57
4. US Food and Drug Administration. 21 CFR 314.80. 2009. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=314.80
5. Rogers A, Israel E, Smith C, et al. Physician knowledge, attitudes, and behavior related to reporting adverse drug events. Arch Int Med 1988; 148: 1596–1600.
6. Scott H, Rosenbaum S, Waters W, et al. Rhode Island physicians’ recognition and reporting of adverse drug events. Rhode Island Med J 1987; 70: 311–6.
7. McAdams M, Staffa J, Dal Pan G. Estimating the extent of reporting to FDA: a case study of statin-associated rhabdomyolysis. Pharmacoepidemiol Drug Safety 2008; 17: 229–39.
8. Graham D, Staffa J, Shatin D, et al. Incidence of hospitalized rhabdomyolysis in patients treated with lipid-lowering drugs. JAMA 2004; 292: 2585–90.
9. US Food and Drug Administration. The ICH Guideline on Clinical Safety Data Management: Data Elements for Transmission of Individual Case Safety Reports. 2009. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm073090.pdf
10. US Food and Drug Administration. Guidance for Industry – E2B(M): Data Elements for Transmission of Individual Case Safety Reports. 2005. http://www.fda.gov/RegulatoryInformation/Guidances/ucm129428.htm
11. MedDRA MSSO. Medical Dictionary for Regulatory Activities Maintenance and Support Services Organization. 2010. http://www.meddramsso.com/
12. Almenoff J, Tonning J, Gould A, et al. Perspectives on the use of data mining in pharmacovigilance. Drug Safety 2005; 28: 981–1007.
13. Almenoff J, DuMouchel W, Kindman L, et al. Disproportionality analysis using empirical Bayes data mining: a tool for the evaluation of drug interactions in the post-marketing setting. Pharmacoepidemiol Drug Safety 2003; 12: 517–21.
14. Anonymous. Causality assessment of adverse events following immunization. Weekly Epidemiol Rec 2001; 76: 85–9.
15. Bower J, Maraganore D, McDonnell S, et al. Incidence and distribution of Parkinsonism in Olmsted County, Minnesota, 1976–1990. Neurology 1999; 52: 1214–20.
16. Jones J. Determining causation from case reports. In: Strom B, ed. Pharmacoepidemiology. 4th edition. John Wiley & Sons, Ltd, 2005; 557–70.
17. Meyboom R, Hekster Y, Egberts A, et al. Causal or casual?: The role of causality assessment in pharmacovigilance. Drug Safety 1997; 17: 374–89.
18. Naranjo C, Busto U, Sellers E, et al. A method for estimating the probability of adverse drug reactions. Clin Pharmacol Therap 1981; 30: 239–45.
19. The Uppsala Monitoring Centre. The Use of the WHO-UMC System for Standardised Case Causality Assessment. 2010. http://www.who-umc.org/
20. Rodriguez E, Staffa J, Graham D. The role of databases in drug postmarketing surveillance. Pharmacoepidemiol Drug Safety 2001; 10: 407–10.
21. Nightingale S. Recommendation to immediately withdraw patients from treatment with felbamate. JAMA 1994; 272: 995.
22. Pennell P, Ogaily M, Macdonald R. Aplastic anemia in a patient receiving felbamate for complex partial seizures. Neurology 1995; 45: 456–60.
23. Kaufman D, Kelly J, Anderson T, et al. Evaluation of case reports of aplastic anemia among patients treated with felbamate. Epilepsia 1997; 38: 1265–9.
24. Strom B. Study designs available for pharmacoepidemiology studies. In: Strom B, ed. Pharmacoepidemiology. 4th edition. John Wiley & Sons, Ltd, 2005; 17–28.
25. Csizmadi I, Collet J and Boivin J. Bias and confounding in pharmacoepidemiology. In: Strom B, ed. Pharmacoepidemiology. 4th edition. John Wiley & Sons, Ltd, 2005; 791–809.
26. Schade R, Andersohn F, Suissa S, et al. Dopamine agonists and the risk of cardiac-valve regurgitation. New Engl J Med 2007; 356: 29–38.
27. Temple R and Ellenberg S. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1. Ethical and scientific issues. Ann Int Med 2000; 133: 455–63.
28. Ellenberg S and Temple R. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 2: Practical issues and specific cases. Ann Int Med 2000; 133: 464–70.
29. Becker MC, Wang TH, Wisniewski L, et al. Rationale, design, and governance of Prospective Randomized Evaluation of Celecoxib Integrated Safety versus Ibuprofen Or Naproxen (PRECISION), a cardiovascular end point trial of nonsteroidal antiinflammatory agents in patients with arthritis. Am Heart J 2009; 157: 606–12.
30. Bresalier R, Sandler R, Quan H, et al. Cardiovascular events associated with rofecoxib in a colorectal adenoma chemoprevention trial. N Engl J Med 2005; 352: 1092–102.
31. Lesko SM and Mitchell AA. An assessment of the safety of pediatric ibuprofen: A practitioner-based randomized clinical trial. JAMA 1995; 273: 929–33.
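Worked example: interval estimates for rare adverse events. The confidence limits quoted for the pediatric ibuprofen trial [31] earlier in this chapter can be approximated from first principles. The Python sketch below is illustrative only and is not taken from the trial report. It assumes that the 1:1:1 randomization placed about two thirds of the 84 192 enrolled children on ibuprofen (56 128, ignoring the 0.3% lost to follow-up), and that the reported risk of 7.2 per 100 000 corresponds to about four bleeding events in that group; neither figure is stated in the chapter, and the function names are hypothetical. It computes an exact (Clopper-Pearson) two-sided 95% interval and, for outcomes with no observed events, a one-sided 95% upper bound.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); direct summation, adequate for small k."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p in [lo, hi] with g(p) = target, for g increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(events, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial risk."""
    lower = 0.0
    if events > 0:
        # Lower limit: the p at which P(X >= events) equals alpha / 2.
        lower = solve_increasing(lambda p: 1.0 - binom_cdf(events - 1, n, p), alpha / 2)
    upper = 1.0
    if events < n:
        # Upper limit: the p at which P(X <= events) equals alpha / 2.
        upper = solve_increasing(lambda p: -binom_cdf(events, n, p), -alpha / 2)
    return lower, upper


def zero_event_upper(n, alpha=0.05):
    """One-sided (1 - alpha) upper bound on risk when 0 of n subjects have the event."""
    return 1.0 - alpha ** (1.0 / n)


PER = 100_000
N_TOTAL = 84_192                         # enrolled children (from the text)
N_IBUPROFEN = round(N_TOTAL * 2 / 3)     # ASSUMPTION: two of three equal arms
N_ACETAMINOPHEN = N_TOTAL - N_IBUPROFEN  # ASSUMPTION: the remaining arm
EVENTS_GI_BLEED = 4                      # ASSUMPTION: implied by 7.2 per 100 000

low, high = clopper_pearson(EVENTS_GI_BLEED, N_IBUPROFEN)
print(f"GI bleeding, ibuprofen: {EVENTS_GI_BLEED / N_IBUPROFEN * PER:.1f} per 100 000 "
      f"(95% CI {low * PER:.1f} to {high * PER:.1f})")
print(f"No events, ibuprofen: upper bound {zero_event_upper(N_IBUPROFEN) * PER:.1f} per 100 000")
print(f"No events, acetaminophen: upper bound {zero_event_upper(N_ACETAMINOPHEN) * PER:.1f} per 100 000")
```

Under these assumptions the results land close to the reported limits (2 to 18, 0 to 5.4, and 0 to 11 per 100 000). The zero-event bounds agree better with a one-sided than a two-sided calculation, which is an inference from the numbers rather than something the chapter states. The example also shows why sample size matters for safety questions: even with tens of thousands of children per group, an outcome that is never observed can only be shown to occur less often than a few times per 100 000.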
Section 4: Ethical issues
Chapter 16
Ethics in clinical trials involving the central nervous system: Risk, benefit, justice, and integrity
Jonathan Kimmelman

Introduction
Drugs targeting the CNS have one of the highest rates of attrition during development [1]. Though there have been many spectacular successes in drug and biological development, the clinical course of many CNS disorders, like amyotrophic lateral sclerosis and Alzheimer’s disease, has changed little in decades.

Development of safe and effective interventions against diseases of the CNS therefore remains an important goal. As with any clinical trials, those involving neurological disorders should cohere with the core principles underlying human research ethics: respect for persons, beneficence, and justice [2]. However, CNS trials often present particular challenges with respect to applying these principles. These relate to a cluster of factors: neurological disorders often implicate capacities necessary for informed consent, interventions in brain function involve significant degrees of uncertainty and risk, and many trials rely on subjective endpoints.

Here, we survey basic ethical principles and practices for human experimentation, and extend these to clinical trials of CNS interventions. This chapter only touches on the related subject of regulatory and legal issues in neurological research; we also refer the reader to other sources for specialized topics like advanced research directives [3], emergency studies [4], and neuroimaging. Finally, our discussion of informed consent is cursory; a more detailed account can be found in Chapter 17.

Basic principles of human research ethics

Research and clinical care as morally distinct activities
Why protect human subjects? Why is the consent process for research so much more laborious than it is in care settings? Why do clinical investigators have to get permission from third parties – institutional review boards (IRBs) – to give a drug to half their patients, while clinicians need not get permission from anyone to give the drug to all their patients?

The answers to these questions take us to the heart of human research ethics, which is founded on a recognition that research and care are morally distinct activities. One often cited reason for considering them distinct is risk: volunteers in human research endure higher degrees of uncertainty and risk than patients in clinical care. This is almost certainly the case for phase 1 clinical trials, in which interventions that have only been tested in animals are first applied in human beings. But research is not always riskier or more uncertain than care. As we will see below, principles like clinical equipoise are designed to ensure that the risks and benefits of research participation are equivalent to those in competent care settings. Moreover, some care interventions (e.g., surgical procedures) can be very risky, while some research procedures (e.g., a retrospective chart review) are minimally risky. Though risk
and uncertainty pose important challenges in clinical research, it seems difficult to argue that risks alone justify the extra ethical vigilance accorded to clinical research.

A more satisfactory explanation is that, in care settings, clinicians have obligations to consider only the best interests of their patients when making care decisions, whereas in research, clinical investigators legitimately endure divided loyalties. In particular, only in exceptional instances should caregivers consider external interests when making treatment decisions about a particular patient. In contrast, though researchers have well-established obligations to advance (or at least, not set back) the interests of their volunteers, they also have obligations to society by advancing medical knowledge. The latter obligations sometimes impose practices that, on their face, at least, seem to antagonize the interests of patients who volunteer for clinical trials. For example, most patients prefer to – and are indeed entitled to – know the identity of a drug they are receiving. In research, however, trialists often randomize study volunteers and then mask them to their treatment allocation. They perform such procedures to ensure the internal validity, and hence the social value, of the knowledge gained by the study. Other elements of research practice that help secure its social value, but that arguably are in tension with what patients might identify as in their own interest, include the use of comparators (especially placebos), subtherapeutic dosing in phase 1 trials, research procedures like blood draws that are not performed to inform care, exclusion criteria that prevent co-interventions, wash-out periods, and rigid protocols that prevent patients from selecting their dose, treatment schedule, or treatment.

It is clear, then, that medical research is morally distinct from clinical care. According to Immanuel Kant’s celebrated categorical imperative, a person should never be instrumentalized, that is, used only as means to some other end. Medical research certainly uses people as means to another end. Research ethics offers a set of principles and practices to ensure that medical science does not only use human subjects for other ends.

The history of human protection: scandal and reaction
Contemporary research ethics practices emerged in response to a series of scandals and atrocities in human experimentation. Following the Nazi doctors trial, the Nuremberg Code established ten directives for human experimentation. Principal among these was an absolute requirement for the informed consent of subjects. In 1964, the World Medical Association relaxed this requirement with its Declaration of Helsinki, thus providing an ethical policy compatible with research on individuals lacking consent capacity. These policies were not widely honored in North America. A series of revelations, starting with Henry Beecher’s 1966 exposé in the New England Journal of Medicine [5] and continuing past the Tuskegee Syphilis Study, led the US Congress to empanel a National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The National Commission was the first body to articulate broad ethical principles for research in its Belmont Report. Their recommendations were largely taken up in regulations issued by the Department of Health, Education, and Welfare (45 CFR 46). In the years since, the Declaration of Helsinki has undergone several revisions, and various other countries and entities have developed their own policies. Policies of many professional societies, like the American Academy of Neurology [6], largely recapitulate themes in major policy documents like the Belmont Report and Declaration of Helsinki.

Core principles of major codes of research
Numerous policies, codes, and regulations have followed from this history. Though they take different positions on specific issues – for example, they differ about when the use of placebo controls is ethical – there is nearly universal consensus on certain principles and practices. All policies express the view that the autonomy and welfare of human subjects must be protected; that clinical studies should be designed to meaningfully advance medical knowledge; and that protocols should meet certain standards of justice. These principles are put into practice through a series of well-established frameworks and mechanisms. Autonomy is ensured through the provision of informed consent (for persons with capacity) or approximated through surrogate decision-making plus a restriction on research risk (for persons lacking capacity). The welfare of study volunteers is protected by ensuring that risks of clinical studies are justified by a credible appeal to direct benefits for study volunteers and benefits for society through knowledge. The latter establishes a requirement that all research meet a threshold of validity. The justness of clinical research is protected by ensuring that disadvantaged or vulnerable groups are not recruited
in an opportunistic manner, that they do not disproportionately bear the burdens and risks of knowledge production activities, and that they are not denied the knowledge value of medical research through undue exclusion from trials. All major codes of research ethics agree that adherence to the principles and practices outlined above should be prospectively reviewed by an independent and competent body (in the US, these committees are IRBs).

The principles of research ethics are best thought of as conditions that must each be fulfilled for a study to proceed. There may be unusual circumstances where principles are in conflict, and it may be necessary to balance competing objectives and principles. In general, however, one should avoid the temptation to think of principles as exchangeable: the justice of a clinical trial cannot be ‘purchased’ by providing greater benefits to volunteers or their communities; an unfavorable risk-benefit balance in a protocol is not purchased by a particularly robust informed consent procedure.

Regulatory vs. ethical obligations
Regulations governing human protections aim at establishing a baseline level of ethical conduct. Researchers often assume, then, that unless an action is specifically excluded by regulations, it is ethical. Yet there are many examples of research conduct that clearly count as unethical, but are not specifically barred by regulations. For example, current human protections laws do not mandate full publication of negative or unfavorable findings in clinical trials. However, selective publication of clinical trial data is widely viewed as unethical.

In CNS research, the tensions between regulation and ethics are perhaps greatest around the use of placebo controls. As we will see below, many ethicists hold that clinical trials that violate the principle of clinical equipoise are unethical. Nevertheless, US regulations and FDA policy, as well as international policies aimed at harmonizing regulatory standards across jurisdictions, do not require clinical equipoise except where use of comparators presents life-threatening and/or irreversible morbidity [7–8]. Another example where regulatory and ethical standards diverge is with the principle of justice: neither US nor ICH policy specifically addresses the fairness of locating trials in economically disadvantaged settings. Clearly, then, researchers should avoid conflating regulation and ethics.

With the above principles established, the sections below turn to how principles and practices surrounding risk, benefit, and justice play out in clinical trials involving CNS disorders.

Risk-benefit balance

Component analysis, clinical equipoise, and acceptable risk
A first step in establishing ethical design of a study is ensuring that risks are reasonable in relation to benefits. How are investigators and IRBs to make this judgment? The prevailing approach is through component analysis, which begins with the recognition that clinical trials often involve a mixture of different procedures, the risks of which will have different justifications [9–10]. When interventions are performed for scientific reasons (e.g., blood draws to monitor drug metabolites, or lumbar punctures to measure biomarkers), risks are only justified insofar as they are outweighed by knowledge benefits. There are restrictions on the level of risk for research procedures performed on patients deemed vulnerable or lacking consent capacity (e.g., children [11], prisoners [12]). The Declaration of Helsinki, for example, allows only minimal risk research procedures on incompetent subjects [13]. This thus establishes important limits on research risk in many realms of CNS research, including traumatic brain injury [14] and advanced neurodegenerative disease.

When interventions are performed with evidence sufficient to support belief that patients might benefit, the standard for deciding risk acceptability is clinical equipoise. Clinical equipoise establishes two conditions that must be met at the outset of a clinical trial. First, ‘there must be honest, professional disagreement among expert clinicians about the preferred treatment.’ According to this condition, patients should never be systematically disadvantaged by enrollment in a clinical trial by allocation to a study intervention that is demonstrably inferior to standard of care. As such, when trials administer drugs to patients with unmet medical needs, there should be uncertainty within the expert community as to the drug’s comparative merits with other drugs provided within the study (e.g., in the control arm) or available outside the study in a care setting.

The second condition embodied in the principle of clinical equipoise is that studies should ‘be designed in such a way as to make it reasonable to expect that, if it is successfully conducted, … the results should … be convincing enough to resolve the dispute among clinicians’
[15]. Because statistically underpowered or methodologically unsound trials only rarely resolve disputes among practitioners about the comparative clinical merits of drugs, such studies generally do not meet the principle of clinical equipoise, and are hence unethical.

The second condition of clinical equipoise builds on the principle that ethical research should fulfill a threshold condition of validity, and that the medical and social value of knowledge produced in clinical trials are important criteria in the ethical evaluation of clinical trials. The requirements of validity and value ground other ethical practices that have become well established in clinical research. For example, when results of clinical trials go unpublished, the broader clinical community cannot use such findings to inform practice. As such, failure to publish blocks a necessary step through which clinical research is translated into social value, and policies like the Declaration of Helsinki require prospective registration of clinical trials in a public database, and publication of positive, negative, and inconclusive research findings. CNS research has been subject to the same kinds of publication bias seen in other realms [16–17]. It should be noted here that, though FDA policy does not require prospective registration of phase 1 studies, the ethical rationale for prospective registration of early phase studies is similar to that for later phase studies.

Distinctive features of risk in studies involving brain interventions
There are many ways that risks presented by studies that involve brain interventions have a different character than those encountered in other therapeutic areas. First, the brain is the organ of personhood. Inadvertent disruptions to brain processes have the potential to diminish such essentially human capacities as language, cognition, identity, and sociality. In part because of the intricacy of brain circuitry, disruptions are extremely difficult to reverse.

A second challenging characteristic of risk in studies that involve the brain is the type of uncertainty about risk. Unlike most toxicities, impairments in human brain processes like cognition or sociality do not lend themselves to easy testing in animal models [18]. Uniquely human traits, like capacity for language, are, by definition, impossible to model in animals. Therefore, animal studies do not provide a reliable basis for anticipating many types of harms that can occur in CNS trials. Such harms can also be difficult to monitor or detect.

These distinctive features each have several implications for the design and review of CNS trials. First, the greater uncertainty associated with brain interventions should be interpreted as higher risk. Greater uncertainty makes it more difficult to rule out the possibility of major adverse outcomes occurring. These adverse outcomes potentially implicate qualities that are essential to an individual’s selfhood. Given that the primary aim of clinical research is the production of generalizable knowledge, investigators (and review committees) should proceed with extreme caution. Second, because distinctively human responses are impossible to anticipate in animal studies, human studies provide the first opportunity to monitor the effects of an intervention on the human mind. The principle of beneficence would favor study designs that carefully monitor subjects for changes in cognition, affect, and other brain functions as appropriate [19].

Human clinical experiments: role of preclinical studies
Phase 1 trials of new CNS interventions, as with all interventions, generally present a high degree of risk and uncertainty. The Nuremberg Code and the Declaration of Helsinki clearly articulate a requirement for preceding clinical testing with animal and/or laboratory experiments. A series of well-designed preclinical experiments can provide a sound ethical basis for initiating human clinical trials if they provide a reasonably reliable basis for estimating and avoiding risk, and sound reasons to expect that human testing will meaningfully inform the development of an intervention or a class of interventions.

A full discussion of design principles and ethics of preclinical research is well beyond the scope of this chapter. Nevertheless, there is a growing literature showing that many preclinical studies in neurology do not appear to take basic measures to ensure preclinical study validity. For example, various meta-analyses consistently show that only a minority of CNS preclinical studies address threats to internal validity through use of a priori statement of hypothesis, randomization, concealed treatment allocation, or masked outcome assessment (see, for example, [20–24]). Whether these methodological practices actually invalidate preclinical findings is unclear, though meta-epidemiological studies have found that failure to conceal treatment allocation [25] and to publish [26] led to larger effect sizes.
A lack of methodological rigor in preclinical stud- hese might include collection of biomarker data,
ies raises concerns about risk-beneit balance in phase imaging, histological studies, and a plan for autopsy
1 studies. It is thus the responsibility of preclinical in the event of volunteer death. Research components
researchers to provide reasonably reliable evidence increase the likelihood that, in the event that desired
of an intervention’s safety and promise, and it is the responses are not observed in a trial, investigators can
responsibility of clinical investigators to solicit study determine why a drug is failing, and whether modiica-
volunteers only ater these standards have been met. tion of the approach might lead to successful transla-
tion [28–29].
Phase 1 trials: planning for positive
and negative results Subject selection in early phase trials
One particularly vexing category of clinical research is CNS trials involving aggressive interventions oten
the phase 1 trial (of which irst-in-human trials are a raise diicult ethical questions about which category of
special class). To these authors’ knowledge, there are no patients to include in initial tests. In realms like cancer
reliable estimates of risks and beneits for phase 1 stud- or infectious disease, aggressive and novel approaches
ies of any CNS disorders. Nevertheless, the very high are most oten tested in patients who are no longer
attrition rate for CNS drugs would lead one to infer that responsive to standard therapies. his is because the
direct beneits (that is, beneits attributable to receiv- risk-beneit balance of trial enrollment is more favo-
ing study interventions) are limited. In some circum- rable for them: trial enrollment for patients with
stances, risks in phase 1 studies can be considerable. advanced disease entails less opportunity cost to them,
For example, recently completed clinical trials involv- because they are not imperiling adequate health status,
ing gene transfer of neurotrophic factors involved eight and participation does not necessitate withdrawal of
intraputaminal inoculations [27]. Assuming surgical established efective care.
risks in these studies are similar to those for electrode However, there are several reasons why patients in
implantation in deep brain stimulation, delivery alone confers a 0.9% risk of mortality and a 4% risk of intracerebral hemorrhage leading to serious neurological deficits.

It is a matter of some controversy whether intervention risks can be ethically justified by the prospect of direct benefit for volunteers, or whether they are justified entirely by social knowledge. The present author finds the latter justification more plausible, especially for trials where enrollment requires withholding of validated interventions (in the case of Parkinson’s disease, future trials might involve withholding deep brain stimulation from patients for whom it is indicated). Regardless of how one justifies risk in phase 1 trials involving surgical delivery, such trials are only justifiable insofar as laboratory and preclinical studies strongly support the initiation of human testing. To maximize the knowledge value of studies while minimizing the exposure of volunteers to risk, phase 1 trials should be designed with two objectives in mind: first, they should provide reliable evidence of optimal dose such that phase 2 trials can select the appropriate doses, route of administration, and, in some circumstances, patient population. Second, investigators should incorporate into trials research components that enable validation at key steps in the causal pathway of drug action.

earlier stages of disease might be attractive candidates for early phase studies. One reason this author rejects is that, because interventions aim at halting progression of disease, patients with less advanced disease have a greater prospect of benefit. This argument necessarily subscribes to the position that risks in early phase trials are generally justified by an appeal to therapeutic benefit. Anyway, if other established effective forms of care are available for patients, it strains credulity to argue that a never-before-tested intervention, for which appropriate dosing, scheduling, and delivery methods are not established, is in genuine clinical equipoise with one that is already validated. A more convincing rationale for enrolling patients with relatively recent disease onset is that such studies enable a more meaningful test of the intervention’s properties. There is also less concern that, should adverse events occur, attribution of cause will be confounded by disease status. If later stage trials are to be pursued in patients with early disease, there may be validity advantages to performing earlier phase studies in a similar patient group. An additional factor that may make medically stable patients more attractive candidates from an ethical perspective is that they may be in a better position to provide valid and authentic informed consent, since their decisions are not impelled by perceived medical
necessity [30]. However, this last advantage is tempered by the suggestion that patients with advanced disease might have advantages in decision-making as compared with patients with early disease, as the former are more likely to have adapted to their illness [31].

How this debate is resolved ultimately hinges on a utilitarian calculus that the risks of jeopardizing the adequate health status of patients in early disease stages are justified by the incremental gain in knowledge from enrolling them instead of patients who are treatment refractory. This author inclines toward the position that, if a study is primarily aimed at testing safety, feasibility, and defining conditions for testing in later stage trials, enrollment of patients with advanced disease is generally a more prudent course. However, reasonable people can disagree on this. We suggest that one way of resolving this controversy about risk and benefit is to seek advice from a representative cross-section of the disease community [30]. We will return to this point in our discussion of justice.

Placebo controls and clinical equipoise

At the opposite end of clinical development is the randomized controlled trial, in which new interventions are tested against a comparator drug. Few clinical trial design features have inspired as much debate as the use of placebo controls. Such debate has been further intensified by a proliferation of ethical standards, and tensions between regulatory standards and ethics bodies.

Many trials involving neurological disorders show evidence of placebo responses [32]. Controversy surrounding the use of placebo comparator arms has been especially pitched in clinical trials involving relapsing-remitting multiple sclerosis [33–34]. On its face, relapsing-remitting multiple sclerosis is precisely the type of condition for which placebo controls are methodologically desirable: its course is remitting, and outcome measurements often involve variables that are subjective or otherwise susceptible to bias. The rationale for including placebo comparators (plus randomized and masked treatment allocation) is to control for subjective report and assessment of study outcomes, expectancy effects triggered by perceived administration of therapy, and various factors that might cause spontaneous remission (e.g. regression to the mean).

The National Multiple Sclerosis Society twice issued policies specifying conditions where the use of placebo controls could be ethically acceptable in trials involving MS [33]. The most recently articulated conditions include (briefly): 1- forms of disease for which there is no established effective therapy; 2- participants refuse established effective therapy; 3- enrolling subjects are not responding to established effective therapy; 4- established effective therapy is not available to enrolling subjects because of resource constraints; 5- studies are short-term and aimed at proof of concept; 6- use of placebo controls will not cause serious or irreversible harm [33, 35].

In the opening of this part, we described the principle of clinical equipoise as the standard for justifying risk of drug administration in late phase clinical trials. To what extent are the conditions specified above consistent with the principle of clinical equipoise? Conditions 1 and 3 are uncontroversial and fulfill clinical equipoise. Condition 2 could, in principle, fulfill the principle of clinical equipoise provided that patient refusal of established effective therapy has a medical basis and occurs independently of the invitation to trial enrollment. Condition 4 could be consistent with clinical equipoise, though as we will see in the next part, it is constrained by concerns about justice. Provided that medications are withheld for a very short period, and that harms are carefully monitored, modest, and immediately treated, a nuanced reading of clinical equipoise could be compatible with condition 5. In such circumstances, the appropriate moral framework for evaluating risk under component analysis is to view the withholding of care as a research procedure. Condition 6 is more problematic for clinical equipoise: it would fall beneath the standards of competent care for clinicians to withhold medications from patients in a manner that led to moderate or long-lasting (but not irreversible) morbidity. Moreover, even were such risk deemed ethically acceptable, placebo-controlled trials meeting the sixth condition would not enable the resolution of relevant clinical uncertainty: clinicians and their patients need to know whether a new drug works better than established effective drugs, not whether the new drug works better than no treatment. Proponents of clinical equipoise, then, would question condition 6, and instead urge the use of alternative trial designs, like placebo add-on or non-inferiority studies [36].

One last issue complicating the ethics of placebo use in clinical trials is the possibility that volunteers will become unmasked during the course of the study, as may happen if there are treatment-specific side effects. When unmasking occurs, interpretation of results is confounded by the possibility that outcome differences
between arms represent a placebo effect rather than a pharmacologic response. Another confounding possibility is that unmasked subjects in the placebo arm are seeking co-interventions or dropping out of a study. To ensure valid interpretation of placebo-controlled trials, investigators should assess and report the quality of the blind at the completion of the study.

Sham controls

Further complicating ethical debates surrounding the choice of comparators is the use of sham surgical controls. Many cutting edge treatment strategies in neurology, like stem cell transplantation, gene transfer, and neurotrophic factors, involve surgical delivery. Without active placebo controls, like sham surgical procedures, such studies are susceptible to confounding as a result of the placebo response.

This is because the strength of placebo responses tends to correlate with the degree of a procedure’s invasiveness [37]. In addition, placebo responses tend to be greater when subjective outcomes are used. The methodological case for sham controls is therefore particularly strong for CNS disorder trials that involve both surgery and subjective endpoints. Parkinson’s disease is one such example; in several instances, clinically significant and durable responses have been observed following sham procedures [38]. In this case, there is evidence to suggest that placebo responses are in part driven by disturbances in basal ganglia dopamine turnover [39–41].

Absent sham controls, then, inferences about causation for clinical response are likely to be unreliable, thus frustrating the ethical requirements of value and validity. Nevertheless, the use of sham controls is ethically contentious. Concerns divide into two categories. First, applying sham controls may expose patients to non-trivial risk and burden [42]. Sham interventions are, by definition, invasive and justified by an appeal to research warrant rather than therapeutic benefit for volunteers. The level of harm associated with sham interventions depends, of course, on the nature of the sham procedure. At one extreme is a study that performed sham implantations of catheters into the putamen of study subjects, thereby exposing volunteers to the full risk of brain surgery [43]. Some sham controls in movement disorder trials have involved exposing patients to a course of immunosuppression [44]. More typically, sham controls in brain intervention studies involve partial burr holes to the cranium without penetration of the dura. Partial burr holes enable the masking of study volunteers to intervention, with only modest risk and burden. If sham interventions require extended withholding of established effective therapy from volunteers, they may also violate clinical equipoise. For instance, in trials involving Parkinson’s disease, patients allocated to the sham arm may be asked to forgo otherwise medically indicated treatment like deep brain stimulation. This exposes patients with unmet medical needs to the burdens of unmanaged illness.

The second ethical critique of sham controls is their deceptive element. In studies that involve ‘awake surgery,’ sham procedures require that clinicians enact a theater of surgical delivery. Of course, there is an element of deception in any placebo controlled trial, as placebos are administered in part to elicit a level of expectation comparable to that for patients in the active arm. However, some commentators question whether it is ethical for clinical practitioners to actively mislead patients, even if they have been warned ahead of time about deceptive design elements [45].

Though use and design of sham controls continues to inspire debate [42, 46–47] even among volunteers themselves [31, 48], discussion appears to have moved well beyond simplistic and categorical opposition. As long as shams continue to be used, skeptics and proponents agree on three necessary conditions for use of sham controls. First, risks must be minimized: investigators should select sham procedures that reduce risk and burden for volunteers. As penetration of the dura exposes volunteers to a range of potential risks without being necessary for maintaining a blind, such invasive sham procedures should be avoided. Second, risks and burdens for sham procedures must be justified by the prospect of knowledge value. This means that there should be a very high degree of confidence that a study is addressing a significant and immanent question for the clinical community, and that the study is designed and likely to be executed in a way that will produce meaningful results that cannot be obtained through alternative study designs. Sham controlled studies therefore warrant particular attentiveness to supporting evidence and rigorous trial design. To that end, research teams should plan to query patients at the end of the study about whether they believe they have been allocated to the active arm. Third, research teams should ensure a careful informed consent process, making certain that patients understand that they may be allocated to sham interventions. Researchers
sometimes substitute the word ‘placebo’ for ‘sham’ when discussing a trial. This substitution should be avoided during informed consent, as shams are considerably less benign than placebos. Research teams should also not attempt to entice wavering volunteers with the prospect of an open-label extension study in which patients in the sham arm can later receive active treatment, because this may not come to pass if the intervention shows unacceptable toxicity or inadequate activity. Finally, teams should provide a careful debriefing process for volunteers at the completion of the study.

Correlative studies

In realms like cancer research, the amount of tissue procured from patients for pharmacokinetic and pharmacodynamic studies has increased over the years [49]. Similarly burdensome or risky research procedures within clinical trials are likely on the rise in CNS research as well. Brain imaging and biomarkers in cerebrospinal fluid promise a way of measuring drug activity for conditions like amyotrophic lateral sclerosis [50], Alzheimer’s disease, and MS before clinical responses are detectable. Imaging provides an opportunity to follow response in numerous brain diseases. Moreover, such studies provide an opportunity to test a drug’s activity along key points in the causal pathway of drug action.

Correlative and marker studies raise two sets of issues. The first concerns the policies for storage and sharing of banked data and tissues. We direct the interested reader to other sources for a more complete discussion of privacy protections and data sharing policies. The second set of issues concerns the assessment and management of risk. Correlative and marker studies are neither designed nor expected to address a volunteer’s unmet health needs. They therefore present volunteers with risks and burdens in the absence of a clinical rationale. Under component analysis, the risks of such study components must be justified by a credible claim about the value of the knowledge that will be produced. The assessment of research value requires that investigators and reviewers attend to three elements of burdensome studies embedded within clinical trials. First, investigators must demonstrate the validity of study design, including sampling and statistical methods. Because correlative studies are rarely the central focus of clinical trials, investigators may underestimate the ethical significance of ensuring statistically and methodologically valid design. Second, at the outset of the study, investigators should be able to establish the prognostic or correlative value of markers that will be measured, including assay validity. The burdens of correlative studies are not justified if prognostic biomarkers have unproven predictive value. Third, researchers should demonstrate an intention to publish the results of their correlative studies. Trial registries tend not to list correlative study components within clinical trials, and there is generally little if any pressure to publish findings of correlative studies, especially when they produce negative or inconclusive results. This raises concerns that burdens that volunteers have submitted to will go unredeemed by a gain in generalizable knowledge.

Correlative studies embedded within drug trials also raise concerns about informed consent. Because correlative study procedures mingle with therapeutic activities, research subjects might not appreciate that the former are performed for research purposes only. One small study found that most patients receiving non-diagnostic serial tumor biopsies in the context of a phase 1 cancer study incorrectly perceived the procedure as aimed at disease management [51]. If this indeed shows failed comprehension (as the authors claim), it raises concerns that volunteers may not be providing valid informed consent. To thwart such misunderstandings, separate consent should be sought for burdensome research procedures like lumbar punctures, and research teams might assess the adequacy of a volunteer’s understanding before accepting their informed consent as valid.

Brain imaging and incidental findings

Many CNS drug trials involve brain imaging; in one report, brain abnormalities, like malignancies or vascular malformations, were detected in as many as 18% of healthy volunteers [52]. Incidental findings are probably less common in the context of CNS trials, because many patients will have already received brain scans as part of their diagnosis. Nevertheless, trials involving brain imaging should plan for the management of incidental findings. Several guidelines for addressing incidental findings in brain imaging have been put forward [53–54]. These vary somewhat, but tend to concur on the following items: 1- researchers should submit a plan for managing incidental findings to the IRB, and disclose to subjects the possibility of incidental findings during informed consent; 2- researchers should obtain subjects’ informed consent to report incidental findings to them should any occur; 3- research teams should consider whether professionals capable of interpreting
the clinical relevance of neuroimaging scans should be included in the study personnel; 4- research teams should prioritize disclosure of incidental findings to subjects (or their surrogates) who have consented to receiving this information, and follow up with written communications [55].

Justice and fairness

Justice and a fair distribution of risks and benefits

Among the three canonical principles of research ethics, justice is probably the least familiar and celebrated within the clinical research community. The relative obscurity of this principle stems, at least in part, from the fact that considerations of justice do not implicate the kinds of personal interests that clinicians routinely encounter with informed consent and risk. Despite the flagrant injustices behind early to mid twentieth century scandals that motivated research ethics policy, the principle of justice was articulated only belatedly with the Belmont Report. There, justice is conceived largely in terms of ensuring that disadvantaged individuals do not disproportionately bear burdens and risks of clinical research.

Three historical developments have driven an expansion of what the principle of justice is thought to encompass in research. First is the globalization of research, and the increasing volume of high-income country-sponsored trials pursued in low and middle-income countries. Second is a recognition that certain classes of patients (namely, children, women (especially pregnant women), persons in low and middle-income countries, persons of color, and the elderly) have been deprived of the benefits of medical knowledge in part because of their exclusion from clinical research. Third is the ascendancy of disease advocacy groups that have used justice-based arguments for greater access and inclusion not only to clinical trials, but also to experimental interventions outside of trials. All three expansions are apparent in contemporary CNS clinical research.

Research and disadvantaged populations

Patients disadvantaged by poverty, incarceration, confinement, lack of health care access, and/or marginal political status often present convenient research opportunities. Unchecked by the principle of justice, advancing the medical interests of relatively advantaged populations would build on unfair disadvantages of others.

Though first articulated by the National Commission in the 1970s, the principle of justice lay more or less dormant until its revival in the mid-1990s following a series of controlled trials in Africa and Thailand. In these studies, pregnant women were randomized to either an abbreviated course of AZT or placebo in order to test whether vertical transmission of HIV could be reduced. Critics alleged that, because a standard course of AZT had been shown to prevent vertical transmission, the studies violated clinical equipoise by depriving some patients of established effective care. Study defenders argued that because the standard course of AZT was not affordable for patients in impoverished settings, the study met a local standard of clinical equipoise.

Following this debate, major international codes of research ethics developed two policies for ensuring fair and non-exploitative research design. First, clinical trials should make provisions for post-trial access. The Declaration of Helsinki states that ‘protocol[s] should describe arrangements for post-study access by study subjects to interventions identified as beneficial in the study or access to other appropriate care or benefits.’ Study designers should therefore address whether patients responding to a study intervention will continue to receive it once the study ends. Though this policy applies to all trials, the issue of post-trial access is a particular concern where patients or health care systems are unable to afford continued treatment once a study ends.

The second policy is the principle of responsiveness: trials should always be part of a program of inquiry that will expand the capacity of health-related social structures in the host community to meet urgent health needs [56]. As such, studies should not actively recruit patients who are members of groups that are unlikely to be able to access or benefit from the knowledge that a trial produces.

Issues of justice arise with particular frequency whenever CNS trials involve placebos. Recall that, according to the National Multiple Sclerosis Society, the use of placebo controls may be acceptable where established effective therapy is not available to enrolling subjects because of resource constraints. This policy can comply with the principle of justice if the study is testing an intervention for MS that is likely to be accessible despite the resource constraints of the local health care system. However, it violates the principle of justice where interventions
are unlikely to be affordable or accessible to the types of patients recruited into the study [33]. Because patients unable to access established effective interventions are often unable to access new and cutting edge interventions, proposed placebo controlled trials in low-income settings will often falter on the principle of responsiveness. To address concerns about responsiveness, investigators should produce evidence that the intervention they are testing is likely to be affordable and deployable given the resource constraints of the host community.

Inclusivity, evidence needs, and inclusion

Exclusion of patients also has adverse consequences for society, because it deprives the health care system of evidence needed to provide effective care to certain classes of patients. Among the categories of patients that have been excluded historically are children, women, and people of color.

Though changes in research and patent policy have helped address some exclusions, various commentators point out that others, e.g. pregnant women [57] and the elderly, remain to be addressed [58]. For example, most epilepsy drugs are tested in younger populations; extending these results to elderly patients is made difficult by the presence of co-morbidities and altered metabolism associated with aging [59].

The design and review of clinical trials should determine whether eligibility criteria are fair and appropriate. On the one hand, trials should strive to test interventions in a population that is as diverse and heterogeneous as the ultimate target population for the drug. On the other hand, patients belonging to certain groups are expected to have biological differences that affect clinical responses. Inclusive eligibility criteria can antagonize validity aims if effects in one patient group ‘dilute out’ effects for another. Therefore, studies that recruit biologically diverse patients should adequately power and plan for a subgroup analysis. This is especially critical when recruiting members of vulnerable or disadvantaged groups.

The integrity of the research enterprise

A third salient along which the principle of justice has expanded concerns patient access to investigational agents. Organized patient advocates have pressed policy-makers to relax restrictions on access to investigational CNS drugs; they have also, at times, urged more permissive and inclusive standards for clinical trials [60–62]. These appeals, though often couched in terms of patient autonomy, build on the intuition that, because patients ultimately bear the risks and burdens of trial participation, their perspectives should be incorporated into the design and review of trials.

Nevertheless, appeals for access should not be allowed to override the core objectives of clinical research. However much trials aim to protect the interests of subjects, they are ultimately designed to advance medical knowledge by producing generalizable knowledge. Greater access and inclusion present two threats to this objective. First, packaging trials as therapeutic vehicles potentially diverts attention from their scientific purpose. For example, clinicians are often tempted to fudge eligibility criteria in order to enable enrollment of otherwise excluded patients [63]. If these exclusions have a valid scientific justification, their violation can confound the interpretation of trial outcomes. Second, access and/or less restrictive risk standards can threaten the interests of other legitimate stakeholders in clinical research. For example, major adverse events in one trial can have cascading adverse effects on related lines of research [64], and poorly designed or executed studies potentially damage the credibility and standing of a broader research field.

The perspectives of potential research subjects and disease communities can and should inform the design and review of clinical trials, especially where contentious designs or levels of risk are involved. Nevertheless, the principle of justice also requires that investigators and reviewers safeguard the integrity of the research enterprise by maintaining appropriate standards of quality, safety, and methodology.

Beyond protecting human subjects

Clinical investigators and responsibilities to non-research subjects

The issues we have addressed thus far largely center on duties investigators (and by extension, IRBs) owe to human volunteers in CNS trials. However, investigators have duties to other stakeholders as well. With some exceptions, regulations and major ethical policies do not specifically address these other ethical duties. In this part, we briefly discuss several issues of particular relevance to CNS drug development.

Risks and burdens for third parties

Many neurological clinical trials require the participation not only of subjects, but also of their caregivers.
For example, Alzheimer’s disease clinical trials often perform assessments of caregiver outcomes [65]. Even when they do not, the conduct of such studies involving patients with compromised or declining capacity may depend crucially on the cooperation of caregivers. Caregivers often do not fall within the definition of ‘human subject,’ and are hence not always accorded protections of informed consent and risk review under existing policy. Yet clearly, their interests are implicated in clinical trials, and they bear at least some burdens of the research. Elsewhere, I called implicated third parties ‘research bystanders,’ and argued that protections should be extended to them in the form of risk review, burden minimization, and under some circumstances, informed consent [66–67].

The duty to initiate trials before diffusion of risky interventions

To a large degree, drug and biologics regulations bar clinicians from introducing non-validated interventions into medical practice without clinical testing. This helps protect the public from undue risk, while promoting the production of knowledge to enable evidence-based practice. Nevertheless, non-validated CNS interventions have occasionally been introduced into clinical practice before rigorous testing has established a favorable risk-benefit balance. For example, several overseas clinics market non-validated cell transplantation to patients with neurodegenerative diseases and spinal cord injury [68–69].

The Belmont Report states that ‘radically new procedures… should… be made the object of formal research at an early stage in order to determine whether they are safe and effective.’ The Declaration of Helsinki makes a similar point in paragraph 35. Researchers thus have positive duties to subject their interventions to clinical testing (or, barring that, systematic study) regardless of whether an intervention falls within the remit of domestic drug regulatory bodies.

Fostering critical public engagement with findings

We began this chapter by noting the inexorable and morbid course of many neurodegenerative diseases. Patients and their families often invest significant energy in following and responding to cutting edge research developments. Patient expectations concerning an intervention’s therapeutic possibilities shape their decision-making. Given that these expectations are often established prior to the consent encounter, the information patients and family members receive before being solicited for trial enrollment plays a crucial role in patient exercise of autonomy.

Researchers have obligations to interact with various publics in ways that foster critical engagement with the implications of their research findings. Specifically, they should avoid issuing press releases that do not provide context for evaluating the implications of a study. Thus, if an early phase study shows promising effects, researchers should emphasize that many interventions that show promising effects at this stage do not withstand larger, more rigorous testing. Researchers should also attend to various non-verbal or affective elements of communication that shape public expectations. For example, they should avoid presenting to the media patient testimonials from small, uncontrolled clinical trials.

Conclusion

Disorders of the CNS present a number of challenges for specifying core principles and practices of research ethics. Patients frequently have compromised consent capacities, and risks are often considerable: access to the brain can require invasive approaches, harms are potentially irreversible and difficult to model in animals, and they implicate functions necessary for personal identity and human interaction. The distinctive nature of neurological illnesses, and of interventions designed to reverse them, leads to recurrent ethical tensions surrounding the initiation of translational clinical trials, subject selection, the use of placebo comparators in randomized controlled trials, and standards for acceptable risk.

Addressing unmet health needs of patients with CNS disorders will necessitate finding ways of adapting general principles and practices of research ethics to these circumstances. However compelling the need or objectives of clinical research, research ethics always begins with the premise that the rights and interests of human subjects are inviolable. The task of ethical research is both to work within these constraints, and to design studies that align knowledge production activities with patient care objectives. And where patient care and research objectives diverge in non-trivial ways (as they inevitably will), researchers should at least ensure that their subjects share with them a conviction in the value of the research.
Acknowledgments
This work was funded by the Canadian Institutes of Health Research (NNF 80045 and MSH 87725).

References
1. Pangalos MN, Schechter LE and Hurko O. Drug development for CNS disorders: strategies for balancing risk and reducing attrition. Nat Rev Drug Discov 2007; 6(7): 521–32.
2. The National Commission for the Protection of Human Subjects of Biomedical and Behavioural Research. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Department of Health and Welfare, 1979.
3. Stocking CB, Hougham GW, Danner DD, et al. Speaking of research advance directives: planning for future research participation. Neurology 2006; 66: 1361–6.
4. Schats R, Brilstra EH, Rinkel GJ, et al. Informed consent in trials for neurological emergencies: the example of subarachnoid haemorrhage. J Neurol Neurosurg Psychiatry 2003; 74: 988–91.
5. Beecher HK. Ethics and clinical research. N Engl J Med 1966; 274: 1354–60.
6. Ethical issues in clinical research in neurology: advancing knowledge and protecting human research subjects. The Ethics and Humanities Subcommittee of the American Academy of Neurology. Neurology 1998; 50: 592–5.
7. International Conference on Harmonisation (ICH). Guidance for Industry. E10 Choice of Control Group and Related Issues in Clinical Trials: U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), 2001.
8. Temple R and Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000; 133: 455–63.
9. Weijer C and Miller PB. When are research risks reasonable in relation to anticipated benefits? Nat Med 2004; 10: 570–3.
10. Freedman B, Fuks A and Weijer C. Demarcating research and treatment: a systematic approach for the analysis of the ethics of clinical research. Clin Res 1992; 40: 653–60.
11. Protection of Human Subjects: Criteria for IRB approval of research 45 CFR 46.400 et seq. Department of Health and Human Services, 2005.
12. Protection of Human Subjects: Criteria for IRB approval of research 45 CFR 46.300 et seq. Department of Health and Human Services, 2005.
13. World Medical Association. Declaration of Helsinki, 1964.
14. Menon DK. Unique challenges in clinical trials in traumatic brain injury. Crit Care Med 2009; 37(1 Suppl): S129–35.
15. Freedman B. Equipoise and the ethics of clinical research. N Engl J Med 1987; 317: 141–5.
16. Rowbotham MC. The impact of selective publication on clinical research in pain. Pain 2008; 140: 401–4.
17. Liebeskind DS, Kidwell CS, Sayre JW and Saver JL. Evidence of publication bias in reporting acute stroke clinical trials. Neurology 2006; 67: 973–9.
18. Mathews DJ, Sugarman J, Bok H, et al. Cell-based interventions for neurologic conditions: ethical challenges for early human trials. Neurology 2008; 71: 288–93.
19. Duggan PS, Siegel AW, Blass DM, et al. Unintended changes in cognition, mood, and behavior arising from cell-based interventions for neurological conditions: ethical challenges. Am J Bioeth 2009; 9: 31–6.
20. van der Worp HB, de Haan P, Morrema E, et al. Methodological quality of animal studies on neuroprotection in focal cerebral ischaemia. J Neurol 2005; 252: 1108–14.
21. Gibson CL, Gray LJ, Bath PM, et al. Progesterone for the treatment of experimental brain injury; a systematic review. Brain 2008; 131: 318–28.
22. O’Collins VE, Macleod MR, Donnan GA, et al. 1,026 experimental treatments in acute stroke. Ann Neurol 2006; 59: 467–77.
23. Banwell V, Sena ES and Macleod MR. Systematic review and stratified meta-analysis of the efficacy of interleukin-1 receptor antagonist in animal models of stroke. J Stroke Cerebrovasc Dis 2009; 18: 269–76.
24. Benatar M. Lost in translation: treatment trials in the SOD1 mouse and in human ALS. Neurobiol Dis 2007; 26: 1–13.
25. Crossley NA, Sena E, Goehler J, et al. Empirical evidence of bias in the design of experimental stroke studies: a metaepidemiologic approach. Stroke 2008; 39: 929–34.
26. Sena ES, van der Worp HB, Bath PM, et al. Publication bias in reports of animal stroke studies leads to major overstatement of efficacy. PLoS Biol 2010; 8: e1000344.
27. Marks WJ, Jr., Ostrem JL, Verhagen L, et al. Safety and tolerability of intraputaminal delivery of CERE-120 (adeno-associated virus serotype 2-neurturin) to patients with idiopathic Parkinson’s disease: an open-label, phase I trial. Lancet Neurol 2008; 7: 400–8.
28. Kimmelman J. Gene Transfer and the Ethics of First-in-human Research: Lost in translation. Cambridge: Cambridge University Press, 2010.
29. Kimmelman J, London AJ, Ravina B, et al. Launching invasive, first-in-human trials against Parkinson’s disease: ethical considerations. Mov Disord 2009; 24: 1893–901.
30. Kimmelman J. Stable ethics: enrolling non-treatment-refractory volunteers in novel gene transfer trials. Mol Ther 2007; 15: 1904–6.
31. Frank SA, Wilson R, Holloway RG, et al. Ethics of sham surgery: perspective of patients. Mov Disord 2008; 23: 63–8.
32. de la Fuente-Fernandez R, Schulzer M and Stoessl AJ. The placebo effect in neurological disorders. Lancet Neurol 2002; 1: 85–91.
33. Lublin FD and Reingold SC. Placebo-controlled clinical trials in multiple sclerosis: ethical considerations. National Multiple Sclerosis Society (USA) Task Force on Placebo-Controlled Clinical Trials in MS. Ann Neurol 2001; 49: 677–81.
34. Miller A. Ethical issues in MS clinical trials. Mult Scler 2005; 11: 97–8.
35. National Multiple Sclerosis Society. Ethics of Placebos in MS Clinical Trials Reassessed in New Publication. 2008. http://www.nationalmssociety.org/news/news-detail/index.aspx?nid=202 (Accessed November 2, 2010.)
36. National Placebo Working Committee. National Placebo Initiative (NPI). Health Canada. 2005. http://www.hc-sc.gc.ca/dhp-mps/prodpharma/activit/proj/npinotice_inpavis-eng.php (Accessed November 2, 2010.)
37. Kaptchuk TJ, Goldman P, Stone DA, et al. Do medical devices have enhanced placebo effects? J Clin Epidemiol 2000; 53: 786–92.
38. Watts RL, Freeman TB, Hauser RA, et al. A double-blind, randomised, controlled, multicenter clinical trial of the safety and efficacy of stereotaxic intrastriatal implantation of fetal porcine ventral mesencephalic tissue (Neurocelli-PD) vs. imitation surgery in patients with Parkinson’s disease (PD). Parkinsonism and Related Disord 2001; 7(S87).
39. Lidstone SC and Stoessl AJ. Understanding the placebo effect: contributions from neuroimaging. Mol Imaging Biol 2007; 9: 176–85.
40. de la Fuente-Fernandez R, Ruth TJ, Sossi V, et al. Expectation and dopamine release: mechanism of the placebo effect in Parkinson’s disease. Science 2001; 293: 1164–6.
41. Lidstone SC, Schulzer M, Dinelle K, et al. Effects of expectation on placebo-induced dopamine release in Parkinson disease. Arch Gen Psychiatry 2010; 67: 857–65.
42. Weijer C. I need a placebo like I need a hole in the head. J Law Med Ethics 2002; 30: 69–72.
43. Lang AE, Gill S, Patel NK, et al. Randomized controlled trial of intraputamenal glial cell line-derived neurotrophic factor infusion in Parkinson disease. Ann Neurol 2006; 59: 459–66.
44. Bjorklund A, Dunnett SB, Brundin P, et al. Neural transplantation for the treatment of Parkinson’s disease. Lancet Neurol 2003; 2: 437–45.
45. Macklin R. The ethical problems with sham surgery in clinical research. N Engl J Med 1999; 341: 992–6.
46. London AJ and Kadane JB. Placebos that harm: sham surgery controls in clinical trials. Stat Methods Med Res 2002; 11: 413–27.
47. Horng SH and Miller FG. Placebo-controlled procedural trials for neurological conditions. Neurotherapeutics 2007; 4: 531–6.
48. Cohen PD, Herman L, Jedlinski S, et al. Ethical issues in clinical neuroscience research: a patient’s perspective. Neurotherapeutics 2007; 4: 537–44.
49. Goulart BH, Clark JW, Pien HH, et al. Trends in the use and role of biomarkers in phase I oncology trials. Clin Cancer Res 2007; 13: 6719–26.
50. Ryberg H, Askmark H and Persson LI. A double-blind randomized clinical trial in amyotrophic lateral sclerosis using lamotrigine: effects on CSF glutamate, aspartate, branched-chain amino acid levels and clinical parameters. Acta Neurol Scand 2003; 108: 1–8.
51. Agulnik M, Oza AM, Pond GR, et al. Impact and perceptions of mandatory tumor biopsies for correlative studies in clinical trials of novel anticancer agents. J Clin Oncol 2006; 24: 4801–7.
52. Katzman GL, Dagher AP and Patronas NJ. Incidental findings on brain magnetic resonance imaging from 1000 asymptomatic volunteers. JAMA 1999; 282: 36–9.
53. Illes J, Kirschen MP, Edwards E, et al. Ethics. Incidental findings in brain imaging research. Science 2006; 311: 783–4.
54. Wolf SM, Lawrenz FP, Nelson CA, et al. Managing incidental findings in human subjects research: analysis and recommendations. J Law Med Ethics 2008; 36: 219–48.
55. Illes J, Kirschen MP, Edwards E, et al. Practical approaches to incidental findings in brain imaging research. Neurology 2008; 70: 384–90.
56. London AJ and Kimmelman J. Justice in translation: from bench to bedside in the developing world. Lancet 2008; 372: 82–5.
57. Adab N, Tudur SC, Vinten J, et al. Common antiepileptic drugs in pregnancy in women with epilepsy. Cochrane Database Syst Rev 2004; 3: CD004848.
58. Avorn J. Powerful Medicines: The Benefits, Risks, and Costs of Prescription Drugs – Chapter 7. New York, Vintage Books, 2005.
59. Leppik IE, Brodie MJ, Saetre ER, et al. Outcomes research: clinical trials in the elderly. Epilepsy Res 2006; 68 Suppl 1: S71–6.
60. Winerip M. Fighting for Jacob. The New York Times 1998.
61. The hard way to a Bill of Rights. Lancet Neurol 2005; 4: 787.
62. Patient choice in clinical trials. Lancet 2005; 365(9476): 1984.
63. Chen PW. Bending the Rules of Clinical Trials. The New York Times 2009.
64. Wilson JM. Medicine. A history lesson for stem cells. Science 2009; 324: 727–8.
65. Lingler JH, Parker LS, DeKosky ST, et al. Caregivers as subjects of clinical drug trials: a review of human subjects protection practices in published studies of Alzheimer’s disease pharmacotherapies. IRB 2006; 28: 11–8.
66. Kimmelman J. Medical research, risk, and bystanders. IRB 2005; 27: 1–6.
67. Kimmelman J. Missing the forest: further thoughts on the ethics of bystander risk in medical research. Camb Q Healthc Ethics 2007; 16: 483–90.
68. Baker M. Tumours spark stem-cell review. Nature 2009; 457: 941.
69. Lau D, Ogbogu U, Taylor B, et al. Stem cell clinics online: the direct-to-consumer portrayal of stem cell medicine. Cell Stem Cell 2008; 3: 591–4.
Section 4 Ethical issues
Chapter 17
The informed consent process: Compliance and beyond

Scott Y. H. Kim

Introduction
This chapter provides an evidence-based and practical overview of informed consent for neurological clinical trials, in four parts. The first part places the doctrine of informed consent within an overall framework of clinical research ethics, along with a brief history of informed consent. The second part discusses the three key elements of informed consent: how and what information to disclose; ensuring voluntary consent; and how to assess the decision-making capacity of potential subjects with cognitive impairment. The third part discusses issues to consider when considering enrollment of subjects based on surrogate consent. The conclusion critically examines the widely discussed concept of therapeutic misconception and suggests how to enhance the quality of subjects’ decision-making about research participation.

The purpose of informed consent

The place of informed consent in research ethics
What makes clinical research ethical? Perhaps the first thing that comes to mind is informed consent. This is not surprising since autonomy, the ethical basis for informed consent, has become the dominant concept in Western bioethics [1]. But informed consent is only one among several requirements of ethical clinical research. If one were to review the various ethics codes, commission reports, declarations, and scholarly literature from around the world on clinical research ethics and reduced them to a set of common principles, one will likely find the seven principles identified by Emanuel et al.: social or scientific value, scientific validity, fair subject selection, favorable risk-benefit ratio, independent review, informed consent, and respect for potential and enrolled subjects [1]. These requirements for ethical clinical research are in roughly sequential order in the process of evaluating the ethics of a research protocol.

There are five requirements that precede the question of informed consent. In other words, a clinical research protocol must satisfy five other requirements before it is deemed ethically permissible to even offer research participation to potential subjects. Thus, informed consent cannot make ethical the involvement of a person in a clinical trial that is of dubious scientific or social value, or that uses shoddy methods, or that targets a sample only for convenience, or that has not minimized the risks, or that has not undergone independent review. Although some of these elements are commonly thought of only as scientific criteria for evaluating research protocols, they are actually important ethical criteria that precede the question of informed consent.

So what role does informed consent play in research ethics? In general, rather than making a research protocol ethical, informed consent makes the involvement of specific subjects in ethically approved research ethical. It is a duty owed to specific individuals that shows respect for their right to self-determination.

History of informed consent for research
The purpose of research is fundamentally different from that of treatment. When a surgeon recommends an operation to her patient, the patient can reasonably assume that the surgeon’s primary purpose in recommending the procedure is to improve his health and welfare. When a researcher offers a research protocol to a patient, on the other hand, the primary purpose of that research protocol is not the specific subject’s health and welfare. The primary goal is the generation of scientific knowledge. This primary research goal implies a potential for some degree of sacrifice – of health,
welfare, or comfort – on the part of the subject for the sake of generating scientific knowledge. The amount of such trade-off will vary depending on the clinical trial. At one extreme might be research involving placebos when effective treatments exist. Some have even argued that as long as the research subject does not suffer permanent serious injury or death, the trade-off may be permissible [2] whereas others have proposed a lower limit on risk in such situations [3]. On the other hand, for some research protocols, especially when they involve diseases for which no effective treatments exist and the proposed intervention is not too risky or burdensome, the amount of trade-off may be less. In either case, the main goal of clinical trials is, by treating the subject as a means (with his or her permission), to generate knowledge that can be applied to persons with the same medical condition; the primary goal is not to treat the specific individuals enrolling in the study.

The doctrine of informed consent in the treatment context was developed largely through case law in the 1950s to 1970s [4]. But given the important distinction between research and treatment, the necessity of informed consent for research was recognized much earlier (even if it was not called informed consent at that time). For example, as early as 1907, Sir William Osler was asked to testify to the British Royal Commission on Vivisection regarding the ethics of Major Walter Reed’s experiment on yellow fever [5]. When Osler was asked by the Commission whether ‘to experiment upon man with possible ill results was immoral,’ he answered, ‘It is always immoral, without a definite, specific statement from the individual himself, with a full knowledge of the circumstances’ [Osler quoted in [5] p. 131]. In fact, the essential difference between treatment and research was formally recognized even earlier [6].

The practice of informed consent for research
In order for a person to provide valid, informed consent, three conditions must be met. The person must be provided adequate information. He or she must possess decision-making capacity. And the decision must be made voluntarily, without coercion or undue influence.

Information to be disclosed for research consent
For most clinical research (and for most clinical trials in neurology), written informed consent will be required. The US Federal regulations are explicit about what needs to be disclosed to potential subjects; these elements are summarized in Table 17.1.

Table 17.1 Legally required disclosure elements for informed consent for research (Title 45 Code of Federal Regulations 46.116a&b)
Always required:
(1) A statement that the study is research, its purpose and procedures
(2) Any reasonably foreseeable risks or discomforts
(3) Any benefits that may be reasonably expected
(4) Any alternative treatments that might be advantageous to the subject
(5) Degree of confidentiality expected
(6) Compensation, if any, and whether and nature of treatment available if injury occurs
(7) Contact information for further questions
(8) Statement that participation is voluntary
Required when appropriate:
(1) A statement regarding currently unforeseeable risks
(2) When the investigator may terminate the subject’s participation
(3) Any additional costs to the subject that may result from participation in research
(4) Consequences of withdrawal from study and procedures for orderly withdrawal
(5) A statement that significant new findings during the study which may relate to continued participation by the subject will be provided
(6) Approximate number of subjects in the study

Because the disclosure elements are so explicitly spelled out, an investigator will find that his or her local research ethics review board (an institutional review board, or IRB, in the US) will have considerable say over what goes into an informed consent document. In fact, IRBs usually have a detailed template that the investigator will be expected to use.

Although the IRB’s job is to ensure that the informed consent forms are ‘understandable,’ the tendency of IRB requirements regarding informed consent forms often goes in the other direction, albeit unintentionally. Research shows that informed consent documents are written at a high level of reading difficulty. In fact, IRBs use language in their informed consent templates that is far above the levels they require their investigators to use (typically 8th grade level) in informed consent forms, by an average of almost three grade levels (average text level was 10.6th grade) [7].

What should the investigator do? IRBs vary considerably in their oversight practices and policies [8, 9]. The researcher may not be able to do much in some cases. For instance, one of the most bureaucratic and difficult to understand passages in most clinical research informed consent forms is the section on ‘Privacy and Confidentiality’ because it is often written in lawyerly language in complying with the Federal HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule. In spite of these bureaucratic constraints there are a few things that a researcher can do to improve the quality of disclosure.

The very nature of research involves uncertainties and probabilities. How to best communicate probabilities of risk and potential benefits is a common and complex issue [10]. First, should probabilities be expressed using words such as ‘possible,’ ‘rare,’ ‘unlikely’ or by numerical expressions? Studies have shown that it is generally better to use numerical expressions in the form of natural frequencies (i.e., ‘5 out of 100’) rather than relying solely on verbal expressions of probability [10]. Also, although ‘possible’ and ‘probable’ seem to indicate quite different likelihoods of an event occurring, many factors affect perceptions of such probability expressions; for example, mere valence (i.e., ‘possible’ has positive valence) of a verbal expression can create perceptions of probability that are much greater than what one might intend [11]. Second, being sensitive to how the probability statements are framed is important. Thus, when discussing potential benefits and risks, it may be important to present both the likelihood of a good outcome and the likelihood of a bad outcome, especially if the outcome in question is central to the risk-benefit analysis that could affect a person’s willingness to participate. Thus, instead of saying ‘serious bleeding is rare,’ it may be better to say, ‘it is expected that if 100 persons were given this medication, on average 3 persons will experience severe bleeding and 97 persons will not.’

If the protocol is long and complex, it may be useful to prepare a short one page summary document (which will need to be approved by the IRB) of the long informed consent form. IRBs will not allow such forms to replace the longer form, but such a summary may be a useful tool to reinforce the key points and to provide an easy-to-grasp overview of the clinical trial.

In representing the risks and burdens of a study protocol, being clear and straightforward will serve the project well in the long run. Drop-outs are expensive and compromise the quality of science. An important ingredient in subjects’ motivation to participate is the trust and confidence they feel in the researchers and their institutions [12]. Candor and transparency go a long way in earning such trust and confidence. As the burdens or risks involved in a research study increase, the greater the effort should be in ensuring that subjects’ understanding is optimal. A variety of methods have been attempted, including multimedia interventions, enhanced consent forms, extended discussion formats, and test/feedback procedures [13]. A review of 42 such studies showed that, perhaps not surprisingly, the most effective means of improving understanding is extended, one-on-one discussions with the subjects [13].

Voluntary consent
The Federal regulations require that informed consent will be sought ‘only under circumstances that provide the prospective subject or the representative sufficient opportunity to consider whether or not to participate and that minimize the possibility of coercion or undue influence’ (45CFR46.116). Of the three elements of informed consent, this one is the least well conceptualized and studied [14].

Although evidence is scarce, it is highly unlikely that research subjects participate from coercion or undue influence. A recent study of 88 subjects enrolled in clinical trials for a variety of conditions found ‘little evidence’ of constraints on voluntariness [15]. Participation in clinical trials requires a good deal of cooperation based on trust. It is unlikely that a subject who feels coerced or feels external pressure would
volunteer to participate, and even less likely that such a subject would continue to cooperate. Thus, discussions in bioethics regarding threats to voluntary decision-making have focused on other ways in which a subject may make a less than optimal decision – either by being misled by the information provided or misunderstanding the nature of research participation due to internal pressures (such as a desperate desire to benefit therapeutically). Systematic studies of informed consent forms do not reveal that subjects are given inaccurate information [16]. However, concerns that very sick individuals desperate for relief may conflate research with treatment (the so-called ‘therapeutic misconception’) remain [17]. These concerns are discussed in the final part of this chapter.

Decision-making capacity and cognitive impairment in neurological disorders
Many neurological conditions involve impaired cognitive function. Conditions such as Alzheimer’s disease (AD) have a devastating impact on their victims. We cannot currently alter the course of the disease and the best hope for advances in treating persons with such illnesses rests on research. However, the assault on the brain that impairs the overall cognitive and decision-making abilities creates the ethical problem of needing to conduct research with those who are often not capable of providing their own informed consent.

The dementing illnesses have a major impact on consent capacity, even when the disease is in the early stages. In a study of 60 patients with mild cognitive impairment (MCI) with a mean mini-mental state examination (MMSE) score of 28.4, 27% to 53% were deemed to have capacity for treatment consent that was ‘marginal or below,’ depending on the standard of capacity used. In this study, ‘marginal or below’ was defined psychometrically as persons falling 1.5 standard deviations below the control group mean [8]. In a study that examined 40 persons with MCI regarding research consent capacity for a typical phase 3 drug clinical trial, expert judges categorized subjects using audio-taped capacity interviews. They found that 40% of MCI subjects were incapable of providing informed consent, despite a MMSE mean score of 28.3 (SD 1.1) [19]. In a study of persons with AD (mean MMSE 22.9) using the same capacity instrument (MacArthur Capacity Assessment Tool-Clinical Research), 66% of the mild to mild-moderate AD patients failed a clinician panel-validated threshold on at least one of four standards of decision-making ability [20]. Clearly, the capacity to consent to research or treatment is impaired very early in conditions such as AD.

Other neurodegenerative disorders with cognitive impairment will of course be associated with impaired decision-making abilities. For example, depending on the legal standard used, 25% to 80% of Parkinson’s disease patients with ‘mild’ level of cognitive impairment were found to be marginally incapable or incapable of providing consent for treatment [21].

Assessment of capacity
The practice of capacity assessment is still an evolving field, especially when done to assess capacity for research consent. In contrast to the other two elements of informed consent (disclosure and voluntariness), the Federal regulations are silent in terms of criteria for assessing capacity. Over the past several years, some states in the US have passed specific laws regulating research with adults lacking capacity and some of these discuss the criteria for capacity. A recent New Jersey statute defines ‘unable to consent’ as:

‘…unable to voluntarily reason, understand, and appreciate the nature and consequences of proposed health research interventions… and to reach an informed decision’ [22].

The principles of capacity assessment in the treatment context generally apply to the research context as well [23]. All adults are presumed to have decision-making capacity (DMC), although that presumption can be challenged, as in cases where the subject is known to have a cognitive disorder that often impairs a person’s DMC.

The terms ‘capacity,’ ‘decision-making capacity,’ and ‘competence’ can be used interchangeably to indicate a clinical determination approximating what a court would decide; the latter can be specified as adjudicated competence or capacity, to avoid any confusion [23].

The assessment of decision-making capacity is measured according to four standards or abilities: evidencing a choice, understanding, appreciation, and reasoning [24]. Although the exact terms may be different, most statutes and policy documents mention (or can reasonably be interpreted to overlap with) these four abilities [23]. Evidencing a choice is a minimal standard, merely the ability to state a preference that is stable enough to be implemented. Understanding is the ability to comprehend intellectually the facts of the
decision-making situation. Appreciation is the ability to apply those facts to one’s own situation, and involves an ability to form appropriate beliefs. For instance, a person with AD may acknowledge that the researchers are telling him that he has dementia (thus exhibiting understanding of what the researchers are saying), and yet fail to believe that he actually has dementia (thus failing to appreciate the fact). The ability to reason refers to general procedural ability to process information without obvious processing defects; it is not a standard about the ‘reasonableness’ of the subject’s decision.

DMC is distinct from a diagnosis. It is a functional concept. Even if one has AD, the old fashioned, vague label of ‘unsound mind’ cannot be used to justify categorizing someone as lacking capacity. A person lacks capacity if he or she is unable to carry out the requisite abilities underlying decision-making, not simply because he or she has a diagnosis.

DMC is context sensitive. A capacity assessment is largely an exercise in balancing the duty to respect the person’s autonomy interests with the duty to protect his welfare interests. Thus, in assessing whether a person lacks capacity, the potential welfare implications for the subject must be taken into account: the greater the potential for harm and lower the potential for benefit, the higher the level of abilities needed to be deemed competent. This is a long-standing principle that is widely accepted in policy documents [25, 26] and in practice [27].

Recommendations regarding capacity assessment
The decision as to whether a specific plan for capacity assessment is required in a clinical trial generally rests with the local IRB. However, there are no uniform standards for formulating such plans. Such plans need to be flexible and adapted to the particular context. The investigator should, at minimum, be familiar with the laws and regulations of one’s own jurisdiction (and one’s own institution’s interpretation of those laws and regulations). The investigator should also be familiar with the elements of the modern practice of capacity assessment, along with the available empirical data on capacity for the population of interest, if available. Because the level of knowledge regarding these matters may vary considerably among IRBs, the investigator may need to educate his or her IRB.

The rigor or intensiveness of capacity evaluation should vary depending on the subject population and the risk-benefit profile of the protocol. At one extreme may be an informal judgment of capacity made by a research assistant. This may be appropriate, for example, for a minimal risk observational or interview study involving AD patients, with no sensitive information. Sometimes brief forms or questionnaires might be used to guide the assessment and to document the fact that subjects have understood the essential elements of informed consent, or for use as an initial screen to determine whether further, more intensive assessment is needed [28].

At the other extreme, the capacity evaluation procedures may need to be a systematic, structured evaluation by an experienced, independent mental health professional (or perhaps even a panel of such experts) who renders his judgment using a detailed and validated capacity assessment tool. This may be an appropriate standard when enrolling potentially impaired persons who provide their own informed consent for a high-risk study, such as first in human neurosurgical experiments.

Surrogate consent for research
Although the need for informed consent in research has long been recognized, controversy about how best to regulate research involving those who cannot consent for themselves remains, not only in the US [29] but also internationally [30]. Because the situation varies according to jurisdiction, it is impossible to give a uniform guidance on how to involve decisionally incapacitated subjects in neurological research. The investigator will need to work closely with his or her IRB. An excellent, detailed guidance on how to work with one’s IRB on these issues has been published by the Alzheimer’s Association [31]. Some of the key questions that will need to be addressed by the investigator and the IRB are as follows.

In practice, close family members tend to serve as de facto surrogates. We have found that persons at risk for AD, family caregivers, and the general public are all broadly in favor of de facto family consent [32–34]. But policy is not so clear. The US Federal research regulations require that for an incompetent adult, a legally authorized representative (LAR) provide permission for the incapacitated subject to participate in research (45 CFR 46.102c). However, the regulations defer to the states on who can serve as LAR. Although California, Virginia, and New Jersey have recently enacted laws that answer this question, most states have not addressed the issue clearly, if at all [22, 35–37].
The Federal regulations do not provide explicit guidance. Published documents by various groups do not agree [29–31]. For example, the recent law passed in California does not limit the research by specifying risk-benefit categories, leaving the judgment to local IRBs, whereas laws in Virginia and New Jersey do spell out the types of research allowed in terms of risks and benefits, such as excluding psychosurgery, and limiting risk on ‘non-therapeutic’ research. Most attempts to articulate a policy on this topic tend to focus on whether research holding no prospect for direct benefit to the subjects can involve risks that are greater than ‘minor increase over minimal risk’ [31].
When a person is able to provide affirmative agreement to participate, even if not capable of informed consent, it is generally agreed that such assent is essential. Dissent by an incompetent adult should generally be respected as well. Excellent discussion and recommendations regarding this issue can be found elsewhere [38]. Another widely discussed principle is that persons with incapacity may not be enrolled in research unless that research focuses on the subjects’ medical condition [30]. Also, some advocate that research should not be performed with incompetent persons if it can be performed with competent persons (although this can be more complicated than it seems: see next part).

Ethical analysis: Should only competent subjects be enrolled in certain types of research?
When involving cognitively impaired subjects in clinical trials, the ‘right thing to do’ will require a deliberative process of thinking through various options, in working with independent ethics review bodies. It may be useful therefore to work through a realistic example that an investigator may encounter, as an exercise in ethical analysis.
Are there certain types of research that are so risky that only competent subjects – even if they have a disorder such as AD – should be allowed to enroll? Recently the Recombinant DNA Advisory Committee (RAC) of the National Institutes of Health recommended to researchers proposing to conduct a phase 2 sham control gene transfer study for AD that: (a) only competent subjects be enrolled and (b) requiring permission from a caregiver be prohibited because it would ‘undermine the autonomy’ of the subject [39].
The main consideration in favor of this ‘competent only’ requirement is the advantage of autonomous decision-making by the AD patient who can provide his or her own consent. Many people feel that if a competent person decides to take on a high-risk option, then it is more permissible than allowing an incapacitated person to take on that risk based on a surrogate’s permission. This follows the logic of autonomy, at least theoretically.
A ‘competent only’ policy has obvious limitations as well. Since the threshold for capacity should be sensitive to the risk-benefit context, when a study’s risk-benefit ratio is seen as quite high – i.e., those cases likely to elicit a competent only policy – very few AD patients will be competent to consent. Involving those few, perhaps atypical, AD patients may limit the generalizability of the clinical trial’s findings. It will also make recruitment more difficult and expensive. But the point of a competent only policy may just be that sometimes the quality of science and the extra costs are the price to pay to uphold an important ethical principle. This trade-off is probably the most obvious focus when trying to balance the pros and cons of a policy of enrolling only competent subjects.
However, it may be useful to examine what such a policy might look like at the level of implementation, as a way of thinking through the merits of a competent only policy. Ethicists, often non-clinicians, who advocate a competent only policy may not realize that a capacity determination is not a straightforward assessment. In fact, although capacity researchers have developed methods for measuring the abilities relevant to DMC in a dimensional sense, there is very little guidance on how to make a categorical determination of capacity, i.e., there is no ‘gold standard’ we can use to determine whether a cognitively impaired person is in fact competent [40]. Because the capacity for research consent is a relatively new domain of assessment, there can be widely differing opinions about where this line should be drawn, even among clinicians who routinely perform capacity assessments in other settings [27].
Is it possible then that a competent only policy places too much emphasis on a difficult-to-implement distinction? A person with well-diagnosed AD who is deemed ‘competent’ remains a highly vulnerable subject because he or she is still cognitively impaired. On the other hand, there is considerable evidence that even if a person with mild to moderate AD is deemed ‘incompetent,’ he or she may still retain important, ethically relevant abilities, such as the ability to convey a preference, the ability to work cooperatively with a loved one, or the ability to delegate authority to a trusted surrogate [41].
In studies of persons with AD, it has been repeatedly shown that despite the obvious and significant loss in the ability to provide independent informed consent, such persons still tend to make medical treatment and research participation choices that are similar to age-matched controls and choices that are, in the main, quite reasonable [42, 43].
Is it better policy to require the ‘competent’ but vulnerable subjects to stand alone (i.e., prohibit a joint permission from a close relation) based on a difficult assessment; or, to require a broader approach by respecting their remaining abilities (by maximizing their involvement in decision-making) and yet providing additional safeguards, such as the informed permission of a family member?
Another consideration is that even if a person with AD is ‘capable’ of consenting to a highly risky study, it is quite likely that he or she will lose that capacity during the trial and will need a surrogate’s permission to maintain that person’s enrollment in the study [26]. Thus, even if a policy of competent only enrollment is used, the de facto practice will have to involve a person who agrees to serve as a surrogate. From a legal point of view, a surrogate’s permission at the beginning of the study, if the subject is deemed competent, may not be necessary. But is there a reason to prohibit a surrogate’s informed permission, especially since a de facto agreement from that surrogate is needed anyway? Also, since no one has a legal right to participate in a research study, it would seem reasonable for researchers to exercise the option of requiring informed decisions from both the subject and the prospective surrogate, if the researcher believes this will enhance the protection of a vulnerable research subject.
The point is not that a competent only policy is necessarily right or wrong. The answer will surely vary for different clinical trials. The investigator should carefully think through such a policy in working with his or her ethics review committee, and make sure that the theoretical rationale of upholding subject autonomy is not outweighed by other real-world ethical considerations.

Conclusion: Helping potential subjects make good decisions

Concern over therapeutic misconception
Some of the most devastating human illnesses are neurological disorders, with only marginally effective symptomatic treatment available. It is understandable that there is an increasing focus on novel and often aggressive interventions to treat these disorders, including brain stimulation, gene transfer, and cell transplants, among others [44–47].
The much needed effort to find new interventions is accompanied by a long-standing concern that persons with serious, incurable disorders may be so desperate for improvement that they are particularly vulnerable to what is called the therapeutic misconception (TM), which was first described over 25 years ago by Appelbaum and colleagues [48] as the tendency of research subjects to conflate research with treatment, thereby generating mistaken beliefs about the purpose and nature of research procedures, including the potential for benefits and harms [17].
Although the concept of TM seems intuitive, the term is used in the literature ‘to denote a number of related, but not always identical concepts’ [49]. For instance, in one study the investigators defined TM as the sum of three types of phenomena: subjects’ therapeutic motivation for participation, their perception of therapeutic benefit, and their failure to understand the purpose of research [50]. It is likely that most persons with serious, often devastating, conditions with inadequate treatment options will volunteer for clinical trials because they are hoping for therapeutic benefit, even for early phase studies [12, 50]. But to assume that merely having such a motivation is a form of a misconception seems inaccurate. Motivation and understanding may influence one another, but they are not the same thing. Subjects motivated by personal benefit may in fact understand that the purpose of the clinical trial is scientific, for the benefit of society [50]. They may, for example, see themselves as using the clinical trial as an opportunity to receive benefit, in a kind of a gamble [12].
However, it is also reasonable to worry that when patients feel desperate about obtaining therapeutic benefit and volunteer for a clinical trial on that basis, they may not be in an optimal position to coolly absorb and weigh all of the relevant elements of a clinical trial. That is, although it is wrong to equate therapeutic motivation with a misunderstanding, it is reasonable to be on guard against the natural human tendency to interpret facts in line with one’s motivations. As one subject put it in one of our studies, ‘I really don’t remember thinking about what [the researchers] were trying to accomplish as much as how it was going to affect me… I wasn’t sure at the beginning, to tell you the truth, even though I went through the study. Then I realized that
what they were trying to do was to see if there was any harm done. That was really the basis of the study’ [12]. A subject who fails to see the experimental purpose of a clinical trial may in turn fail to understand and appreciate the details of the clinical trial.

Preventing therapeutic misconception
Informed consent for research has become institutionalized. This can encourage a ‘compliance’ mindset in which informed consent is seen as merely a vehicle for transferring information. People are seen as information receptacles that need to be filled with the right kind of information [51]. The focus becomes the informed consent document, which becomes longer and longer as more and more information is deemed necessary to ‘transfer’ to the subject.
But suppose informed consent is seen as more than compliance with regulations. What if it were seen as a conversation designed to promote good decision-making by potential subjects and investigators? The concern over TM is valuable. It serves as a reminder that the informed consent conversation should take into account where the patients are starting from, rather than seeing them as empty information receptacles.
The therapeutic motivation common in research subjects should therefore be an essential element in framing the informed consent conversation. Such a conversation at some point should involve an explicit question to the potential subject: ‘Mr. Jones, can you tell me what your main reasons are for wishing to participate in this clinical trial?’ Such a question is not required by the Federal regulations, and no IRB requires that an investigator ask it.
But the question will often bring a subject’s therapeutic motivation out onto the table. This will provide the investigator with an essential point of contrast between the scientific purpose of the clinical trial and the subject’s underlying motivation: ‘Although we hope that it might help you, our main concern is to see whether or not the experimental therapy is safe and effective. We are doing an experiment to answer that scientific question. That is why we will follow procedures that we would never use if our main goal were to benefit you.’
The investigator can then go on to explain those elements of research design such as randomization, use of placebos (such as use of sham surgery), limits on use of other medications, etc. that are done for the sake of answering the scientific question. In this way, informed consent can go beyond just disclosure of information. Instead, informed consent can become a means of helping potential subjects make decisions, by providing a framework for an interactive conversation that places the concerns of the subject and the aims of the clinical trial in context.

Acknowledgments
Supported in part by a Greenwall Faculty Scholars Award in Bioethics.

References
1. Emanuel EJ, Wendler D and Grady C. What makes clinical research ethical? JAMA 2000; 283: 2701–11.
2. Temple R and Ellenberg SS. Placebo-controlled trials and active-control trials in the evaluation of new treatments. Ann Int Med 2000; 133: 455–63.
3. Emanuel EJ and Miller FG. The ethics of placebo-controlled trials – a middle ground. New Engl J Med 2001; 345: 915–19.
4. Faden RR and Beauchamp TL. A History and Theory of Informed Consent. New York: Oxford University Press, 1986.
5. Jonsen AR. The Birth of Bioethics. New York: Oxford University Press, 1998.
6. Vollmann J and Winau R. The Prussian regulation of 1900: early ethical standards for human experimentation in Germany. IRB 1996; 18: 9–11.
7. Paasche-Orlow M, Taylor HA and Brancati FL. Readability standards for informed consent forms as compared with actual readability. New Engl J Med 2003; 348: 721–26.
8. McWilliams R, Hoover-Fong J, Hamosh A, et al. Problematic variation in local institutional review of a multicenter genetic epidemiology study. JAMA 2003; 290: 360–66.
9. Dziak K, Anderson R, Sevick MA, et al. Variations among Institutional Review Board reviews in a multisite health services research study. Health Serv Res 2005; 40: 279–90.
10. Lipkus IM. Numeric, verbal, and visual formats of conveying health risks: Suggested best practices and future recommendations. Med Decis Making 2007; 27: 696–713.
11. Teigen KH and Brun W. Verbal probabilities: A question of frame? J Behav Decision Making 2003; 16: 53–72.
12. Kim SYH, Schrock L, Wilson RM, et al. An approach to evaluating the therapeutic misconception. IRB: Ethics & Hum Res 2009; 31: 7–14.
13. Flory J and Emanuel E. Interventions to improve research participants’ understanding in informed consent for research: A systematic review. JAMA 2004; 292: 1593–1601.
14. Appelbaum PS, Lidz CW and Klitzman R. Voluntariness of consent to research: a conceptual model. Hastings Cent Rep 2009; 39: 30–39.
15. Appelbaum PS, Lidz CW and Klitzman R. Voluntariness of consent to research: a preliminary empirical investigation. IRB 2009; 31: 10–14.
16. Horng S, Emanuel EJ, Wilfond B, et al. Descriptions of benefits and risks in consent forms for phase I oncology trials. New Engl J Med 2002; 347: 2134–40.
17. Lidz CW and Appelbaum PS. The therapeutic misconception: Problems and solutions. Med Care 2002; 40 (Suppl V): V55–63.
18. Okonkwo O, Griffith HR, Belue K, et al. Medical decision-making capacity in patients with mild cognitive impairment. Neurology 2007; 69: 1528–35.
19. Jefferson AL, Lambe S, Moser DJ, et al. Decisional capacity for research participation in individuals with mild cognitive impairment. J Am Geriatr Soc 2008; 56: 1236–43.
20. Kim SYH, Caine ED, Currier GW, et al. Assessing the competence of persons with Alzheimer’s disease in providing informed consent for participation in research. Am J Psychiatr 2001; 158: 712–17.
21. Dymek MP, Atchison P, Harrell L, et al. Competency to consent to medical treatment in cognitively impaired patients with Parkinson’s disease. Neurology 2001; 56: 17–24.
22. New Jersey, Access to Medical Research Act, Title 26, 14.1–14.5. In: 2008; 14.11–14.15.
23. Kim SYH. Evaluation of Capacity to Consent to Treatment and Research. New York: Oxford University Press, 2010.
24. Appelbaum PS. Assessment of patients’ competence to consent to treatment. New Engl J Med 2007; 357: 1834–40.
25. President’s Commission for the Study of Ethical Problems in Medicine and Biomedical and Behavioral Research. Making health care decisions: the ethical and legal implications of informed consent in the patient-practitioner relationship, 1982. Report No. One.
26. National Bioethics Advisory Commission. Research Involving Persons with Mental Disorders That May Affect Decisionmaking Capacity. Rockville, MD, NBAC, 1998.
27. Kim SYH, Caine ED, Swan JG, et al. Do clinicians follow a risk-sensitive model of capacity determination? An experimental video survey. Psychosomatics 2006; 47: 325–29.
28. Palmer BW, Dunn LB, Appelbaum PS, et al. Assessment of capacity to consent to research among older persons with schizophrenia, Alzheimer disease, or diabetes mellitus: Comparison of a 3-item questionnaire with a comprehensive standardized capacity instrument. Arch Gen Psychiatr 2005; 62: 726–33.
29. Kim SYH, Appelbaum PS, Jeste DV, et al. Proxy and surrogate consent in geriatric neuropsychiatric research: Update and recommendations. Am J Psychiatr 2004; 161: 797–806.
30. Wendler D and Prasad K. Core safeguards for clinical research with adults who are unable to consent. Ann Int Med 2001; 135: 514–23.
31. Alzheimer’s Association. Research consent for cognitively impaired adults: Recommendations for Institutional Review Boards and investigators. Alzheimer’s Dis Assoc Disord 2004; 18: 171–75.
32. Kim SYH, Kim HM, McCallum C, et al. What do people at risk for Alzheimer’s disease think about surrogate consent for research? Neurology 2005; 65: 1395–1401.
33. Kim S, Wall I, Stanczyk A, et al. Assessing the public’s views in research ethics controversies: deliberative democracy and bioethics as natural allies. J Emp Res Hum Res Ethics 2009; 4: 3–16.
34. Kim SYH, Kim HM, Langa KM, et al. Surrogate consent for dementia research: A National Survey of Older Americans. Neurology 2009; 72: 149–55.
35. California Health and Safety Code, Amendment to Section 24178, 2002.
36. Code of Virginia, Title 32.1, Section 162.16–162–18. 2002; Section 162.116–162.119.
37. Saks ER, Dunn LB, Wimer J, et al. Proxy Consent to Research: Legal Landscape. Yale J Health Law Policy Ethics 2008; 8: 37–78.
38. Black BS, Rabins PV, Sugarman J, Karlawish JH. Seeking assent and respecting dissent in dementia research. Am J Geriatr Psychiatr 2010; 18: 77–85.
39. Minutes of the Recombinant DNA Advisory Committee. 2008. http://oba.od.nih.gov/oba/RAC/meetings/Sept2008/RAC_Minutes_09–08.pdf. (Accessed January 27, 2010.)
40. Kim SYH. When does decisional impairment become decisional incompetence? Ethical and methodological issues in capacity research in schizophrenia. Schizophr Bull 2006; 32: 92–7.
41. Kim SYH and Appelbaum PS. The capacity to appoint a proxy and the possibility of concurrent proxy directives. Behav Sci Law 2006; 24: 469–78.
42. Kim SYH, Cox C and Caine ED. Impaired decision-making ability and willingness to participate in research in persons with Alzheimer’s disease. Am J Psychiatr 2002; 159: 797–802.
43. Marson DC, Cody HA, Ingram KK, et al. Neuropsychological predictors of competency in Alzheimer’s disease using a rational reasons legal standard [comment]. Arch Neurol 1995; 52: 955–59.
44. Lozano AM, Dostrovsky J, Chen R, et al. Deep brain stimulation for Parkinson’s disease: disrupting the disruption. Lancet Neurol 2002; 1: 225–31.
45. Kaplitt MG, Feigin A, Tang C, et al. Safety and tolerability of gene therapy with an adeno-associated virus (AAV) borne GAD gene for Parkinson’s disease: an open label, phase I trial. Lancet 2007; 369: 2097–105.
46. Freed CR, Greene PE, Breeze RE, et al. Transplantation of embryonic dopamine neurons for severe Parkinson’s disease. New Engl J Med 2001; 344: 710–19.
47. Hochberg LR, Serruya MD, Friehs GM, et al. Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 2006; 442: 164–71.
48. Appelbaum PS, Roth LH and Lidz C. The therapeutic misconception: informed consent in psychiatric research. Int J Law Psychiatry 1982; 5: 319–29.
49. Appelbaum PS, Lidz CW and Grisso T. Therapeutic misconception in clinical research: frequency and risk factors. IRB Ethics & Human Research 2004; 26: 1–8.
50. Henderson GE, Easter MM, Zimmer C, et al. Therapeutic misconception in early phase gene transfer trials. Soc Sci Med 2006; 62: 239–53.
51. Manson N and O’Neill O. Rethinking Informed Consent in Bioethics. New York: Cambridge University Press, 2007.
Section 5 Regulatory perspectives
Chapter 18
Evidentiary standards for neurological drugs and biologics approval

Russell Katz
Introduction
The evidentiary standards for the approval of drugs to treat human disease are set forth in the relevant sections of the Food, Drug, and Cosmetic Act (the Act) [1]. This statute, enacted by Congress in 1938, and amended in important ways numerous times since, describes the evidence a sponsor must submit, and that the FDA (the Agency) must find acceptable, in order for a drug to be approved for marketing in the US. The law sets out broad standards for both the demonstration of effectiveness and safety, and implementing regulations written by the Agency further define, more specifically, how the statutory standards can be met. Both the Act and the regulations are sufficiently flexible to accommodate a wide variety of clinical situations; that is, they anticipate, and allow for, different standards for drug approval for the myriad conditions and diseases that afflict patients. The Public Health Service Act is the statute under which biological products (‘…any virus, therapeutic serum, toxin, antitoxin, or analogous product…’) are regulated; this statute requires, as a standard of effectiveness, that these products be shown to be ‘potent’. For all intents and purposes, the standards for the demonstration of effectiveness are identical for drugs and biologics [2]. This chapter will focus primarily on some of the more important and current issues related to the demonstration of effectiveness of drugs and biologics.

General effectiveness
The basic legal requirement for a demonstration of effectiveness is codified in the Act at Section 505(d), and is described as follows:
‘…substantial evidence that the drug will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the proposed labeling thereof…’
Until 1997, the Act defined ‘substantial evidence’ as follows:
‘…evidence consisting of adequate and well-controlled investigations, including clinical investigations, by experts qualified by scientific training and experience to evaluate the effectiveness of the drug involved, on the basis of which it could fairly and responsibly be concluded by such experts that the drug will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the labeling or proposed labeling thereof [1].’
The requirement that substantial evidence of effectiveness derive from clinical investigations was intended to embody the accepted scientific standard for independent replication or corroboration. That is, a ‘positive’ finding in a single study (perhaps even performed by a single investigator) was not considered to be adequate to support a conclusion that a drug was effective; such a finding had to be independently (e.g., by other investigators studying other patients) confirmed.
However, in 1997, Congress amended the Act by passing the Food and Drug Administration Modernization Act (FDAMA). Among other important changes, a new definition of substantial evidence of effectiveness was added to the law. The relevant language is given below:
‘If the Secretary determines, based on relevant science, that data from one adequate and well-controlled clinical investigation and confirmatory evidence (obtained prior to or after such investigation) are sufficient to establish effectiveness, the Secretary may consider such data and evidence to constitute substantial evidence…[3]’
The law now contains both definitions of substantial evidence, and either can be applied in any given case.
Although replication is most commonly required, the law provides no guidance as to when the alternative definition of substantial evidence was to be applied, nor does the law provide a definition of confirmatory evidence. However, the Agency has described some of the elements of a single trial that might permit it to constitute, with confirmatory evidence, substantial evidence of effectiveness. Some of these elements include:
1) a small p-value (demonstrating that the findings are very unlikely to have occurred by chance)
2) multiple outcomes showing statistically significant differences from the control
3) multiple study centers showing positive findings
4) multiple sub-groups (e.g., both mildly and severely impaired patients) equally benefited by drug
5) multiple dose groups showing benefit
Although not all of these elements need to be positive in such a setting, the more robust the findings, the more likely that the results of a single study can be considered to constitute substantial evidence of effectiveness [4].

Surrogate markers as primary outcome measures
Another critical change to the law introduced with FDAMA was a provision regarding the use of surrogate markers as primary outcome measures.
Ordinarily, drugs are approved on the basis of a showing of an effect on a measure that is of clear clinical benefit to patients. In essentially all cases, drugs to treat neurological disease are approved on the basis of clinical trials that examine the drug’s effects on a face valid measurement with clinical meaning (e.g., scales that measure symptoms, event [seizures], time to events of interest, etc.).
As part of FDAMA, however, Congress granted the FDA the authority to approve a drug on the basis of an effect on what can be called an ‘unvalidated’ surrogate marker. As described below, under its Fast Track provisions, the Agency may approve a drug:
‘…upon a determination that the product has an effect on … a surrogate endpoint that is reasonably likely to predict clinical benefit [1].’
A surrogate marker is (typically) a laboratory test (biochemical test, imaging test, etc.) that, by itself, bears no direct relationship to how a patient feels or functions. For example, although a patient’s blood pressure may be high, he or she does not ordinarily experience any clinical symptom reflective of this measurement, nor exhibit any clinical manifestations of an improvement in that measurement.
The Agency has for many years approved treatments on the basis of studies that examine a treatment’s effects on surrogate markers, without any assessment of the patient’s clinical symptoms (common examples include anti-hypertensives, cholesterol-lowering drugs, treatments for glaucoma). The justification for relying on these measurements in these cases is that evidence exists demonstrating that changes in these surrogates are reflected (usually in the relatively distant future) in changes in clinically important outcomes (for example, a decrease in heart attacks and strokes for anti-hypertensives and cholesterol-lowering agents, and preservation of normal vision for treatments for glaucoma). Because there is evidence establishing the relationship between a treatment’s effects on these surrogates and clinically important outcomes, these surrogates are considered ‘validated’.
As a general matter, reliance on a drug’s effect on a surrogate marker for approval is applied in those cases in which the clinical outcome of interest is likely to be demonstrable only over many years. That is, in these cases, studies capable of examining the treatment’s effects directly on the clinical outcome(s) of interest may need to be impractically long.
FDAMA permits the Agency to approve treatments on the basis of their effects on what may be called ‘unvalidated’ surrogates (this standard has been in the regulations since 1992). Unvalidated surrogates are those for which the relationship between the treatment’s effects on the surrogate and the clinical outcome(s) of interest has not been established. The law does require, as described above, that there be a ‘reasonably likely’ relationship between the effect on the surrogate and the clinical outcome of interest; the basis for such a conclusion can vary, but is ultimately a judgment.
Because the reasonably likely standard introduces a degree of uncertainty about the treatment’s utility that does not exist with the usual basis for drug approval (after all, the effect on the surrogate may not predict the hoped-for clinical benefit), the law stipulates that this standard for approval be applied only in those cases where the disease being treated is serious or life-threatening and where the treatments already available are inadequate. In addition, the law requires that the surrogate be validated after the drug is approved.
Although the potential to approve drugs on the basis of their effects on unvalidated surrogate markers is attractive for many reasons, such approvals raise serious questions.
Most important, as alluded to above, because the effect on the surrogate need only be reasonably likely (and not established by evidence) to predict the clinical benefit, it could turn out that the treatment, in fact, does not predict the hoped-for clinical benefit. Indeed, numerous examples in the literature describe studies in which the proposed treatment did affect the surrogate in the desired way but had either no effect, or a deleterious effect on the ultimate clinical outcome. The relationship between the surrogate and the clinical outcomes in the untreated state may not continue to exist under treatment conditions.
There can be many reasons for the potential dissociation between the effects of a drug on an (unvalidated) surrogate and the clinical outcome with which the surrogate is correlated in the untreated state. In general, however, they are probably related to the fact that drugs can have both desirable and undesirable effects, many of which are unknown and unpredictable. Based on our understanding of the mechanism of action of a drug, we might predict that its effects on both the surrogate and clinical outcomes will be beneficial. In reality, it may ‘fix’ the surrogate and have other unpredicted actions that make the patient’s clinical symptoms worse. Alternatively, there may be many underlying pathophysiological pathways that lead to clinical symptoms in a particular disease. Although a drug may have effects on a pathway that result in a desirable change on a surrogate outcome measure, it may have no effect or harmful effects on other pathways, resulting in an overall effect on the patient that is either null or harmful [5].
For these reasons, we would be most confident that a drug’s effect on the surrogate will translate into a clinical benefit when we have a complete understanding of all of the drug’s actions, as well as a complete understanding of all of the physiologic events underlying the production of symptoms. Of course, we never have such a complete understanding of either the treatment or the disease; in this respect, the conclusion that a drug’s effect on an unvalidated surrogate marker will predict the clinical outcome of interest is always uncertain.

Pediatric studies
In an effort to promote the development of drugs to treat pediatric patients, several statutory mechanisms have been adopted.
The Pediatric Research Equity Act (PREA) was passed by Congress in 2003. This legislation requires sponsors to develop treatments for pediatric patients (defined as patients 16 years of age and younger) for those indications approved in adults [6].
Of course, the indication for which the drug is approved in adults must exist in at least some subset of pediatric patients in order for the requirements for pediatric studies to apply. For those subsets of pediatric patients in which the disease in adults does not exist, the Agency will grant a waiver of the requirements.
The specific kind of pediatric data required will depend on the specific clinical setting. If it can be demonstrated that: 1) the condition for which the drug is approved in adults is essentially the same as in pediatric patients; 2) there is evidence that pediatric patients will respond similarly to the drug as do adults; 3) there is evidence that pediatric patients will respond to the same doses (or plasma exposures) as adults, the only specific pediatric requirement may be for pharmacokinetic studies to determine an appropriate dosing regimen in pediatric patients that will produce relevant plasma exposures. On the other hand, if there is uncertainty about the similarity of the disease or the exposure-response relationship in pediatric and adult patients, a single controlled trial in pediatric patients will usually be required.
Independent of the specific requirements imposed for pediatric effectiveness data, there will almost always be a requirement for safety data in pediatric patients. Although some adult safety data may be relevant to the pediatric population, the Agency will almost always be interested in defining the effects of the treatment on the developing child, including an assessment of growth (height and weight), cognitive and neuropsychological development, sexual maturation, and other issues. Furthermore, additional special studies may be required for drugs known to have effects that may be particularly problematic on the developing human (e.g., pediatric patients may require specific bone density assessments when treated with drugs that affect bone metabolism).
In addition to the PREA requirements for pediatric studies, the Agency has another statutory mechanism for obtaining data in pediatric patients.
The Best Pharmaceuticals for Children Act (BPCA) was passed by Congress in 2002. The provisions of this act, unlike PREA, are voluntary. Specifically, BPCA provides that if sponsors perform and submit by a specified time, studies in pediatric patients requested by the Agency, any existing marketing exclusivity for a drug will be extended by 6 months (that is, generic
versions of the drug will not be permitted for this additional 6 months). This exclusivity is extended whether or not the studies performed demonstrate that the treatment is effective or safe in pediatric patients. Under BPCA, the Agency can ask for studies not only in those indications already approved in adults but also for indications where the Agency considers that the treatment is likely to be used in pediatric patients. (By contrast, PREA requires pediatric studies only in the same indication for which the drug is approved in adults.) Typically, studies required in pediatric written requests (PWRs) include extensive dose-finding and pharmacokinetic studies, controlled trials, and safety data [7].
Because the sponsor may accrue a large financial benefit by conducting the studies requested by the Agency regardless of the outcome of the studies, and because the goal is to conduct studies optimally designed to yield useful information in the pediatric population, great energy is expended to ensure, to the extent possible, that the controlled trials conducted by the sponsors are designed to maximize the potential of the studies to detect a treatment effect, if there is one. This imperative may result in the imposition of specific requirements that may not always be part of studies performed in adults.
For example, although it is always important for adequate dose finding to be performed, it is particularly important in studies done to satisfy BPCA. If a sponsor proposes to study in pediatric patients only a single dose that has been shown to be effective in adults, and that dose is not shown to be effective in pediatric patients, such a study is not likely to be considered adequate to satisfy the demands of BPCA, because pediatric patients may have a different dose-response than adults. Indeed, if we knew a priori that pediatric patients responded to a given dose as adults do, a controlled trial in pediatric patients would be unnecessary. For this reason, PWRs typically require studies that explore the full tolerated dose range in pediatric patients to ensure, to the extent possible, that an effect will be demonstrated if it exists. Similarly, sample sizes for pediatric studies are typically calculated from estimates of effect size and data variability obtained in adults. Of course, these measures may be different in the pediatric population, so studies conducted to satisfy BPCA may need to incorporate interim analyses to assess whether these parameters are as predicted; if they are not, the sample size or other study parameters may need to be amended.

Orphan diseases
Rare diseases raise numerous questions related to the evidentiary standards for drug approval. The Agency defines orphan diseases as those with a prevalence of less than 200 000 in the US [8]. Although there are numerous benefits associated with the designation of a treatment as an ‘orphan drug’, including grants, tax advantages, and a waiver of the requirement to perform studies in pediatric patients, neither the law nor the regulations describe any different standard of evidence (either for safety or effectiveness) required for the approval of treatments for orphan or non-orphan diseases. For example, the determination of effectiveness for an orphan indication must meet one of the two definitions of substantial evidence discussed above. However, the specific data necessary to support approval of an orphan treatment will depend upon the specific clinical setting. For example, the requirements (for the demonstration of both safety and effectiveness) for approval of a drug intended to treat an orphan disease with a prevalence of 3000 people are likely to be substantially different from those imposed on a treatment intended to treat an orphan disease with a prevalence of 150 000 people.

Types of acceptable study designs
Although the Act does not define ‘adequate and well-controlled investigations’, the implementing regulations describe five different types of clinical trials that can, depending upon the clinical setting, be considered to contribute to a finding of substantial evidence of effectiveness. These five study types are described at 21 Code of Federal Regulations (CFR) 314.126:
1) Placebo concurrent control: patients are assigned (typically randomly) to treatment with the investigational drug or an inactive placebo.
2) Dose-comparison concurrent control: patients are assigned (typically randomly) to one of several doses of the investigational drug; in this design, there may also be a placebo group.
3) No treatment concurrent control: patients are assigned (typically randomly) to the investigational treatment or to standard care, but no placebo.
4) Active treatment concurrent control: patients are assigned (typically randomly) to the investigational drug or to an active drug already approved for that indication. This design may also incorporate several fixed doses of either treatment as well as placebo.
5) Historical control: patients are given the investigational drug but there is no concurrent control group; the responses of the patients are compared to responses in a cohort of patients with the same condition not included in the study.

Although the five types of control groups described above as providing substantial evidence of effectiveness may all be appropriate under certain circumstances, in the development of treatments for patients with neurological illness, the use of historical controls is rarely acceptable. It is rarely, if ever, the case that a concurrent control cannot be included in a study of a neurological treatment.
Historical controls are the weakest type of control, primarily because there is usually considerable uncertainty that the patients being given the investigational drugs are similar in all relevant aspects to the patients constituting the historical control. The great advantage to utilizing a concurrent control group to which patients have been randomized is that randomization can be counted on (in most cases) to create treatment groups that are similar in the attributes (both known and unknown) that might affect response to treatment. If attributes that can affect patients’ responses to the applied treatment are mal-distributed among groups, this is likely to result in a bias (that is, one group will be more likely to respond than another, unrelated to the treatment itself) that may be extraordinarily difficult to detect. If a difference between treatments is detected in such a study, it will be difficult, if not impossible, to determine if the difference is related to the treatment or the differences in responsiveness of the groups themselves. The use of non-concurrent historical controls will invariably raise questions of interpretability that may be impossible to answer. If there were conditions for which detailed information was available about the natural history of the untreated condition (for example, obtained from a large cohort of patients followed prospectively), and we were reasonably certain that the patients constituting this cohort were essentially identical to the ones being treated with the investigational drug (including elements of the standard of care of the historical control and the study population), and the effect produced by the treatment was extremely large, so that it could not reasonably be attributed to the fact that patients knew they were on active treatment, it might be possible to interpret a difference between the responses of the two cohorts as being due to the treatment. Unfortunately, we do not typically have this information for most of the conditions for which sponsors are currently developing treatments.
A critical aspect in the interpretation of the results of almost all clinical trials of neurological treatments is the requirement that a difference in outcomes be shown between the investigational treatment and the control in order for the results to be interpretable. The design that is usually most efficient in this regard employs a concurrent placebo control (such a design may also include multiple fixed doses and/or an active control), though this is almost never required. Of course, a trial that does not distinguish between the effects of an applied treatment and a placebo group cannot be interpreted as demonstrating an effect of the drug.
By contrast, trials employing an active control are often designed to demonstrate equivalence of two treatments with the intention of drawing the conclusion that the new treatment is effective. In most cases, a trial that fails to distinguish an effect between an investigational treatment and an active control is uninterpretable. Such an outcome has two possible interpretations: either both drugs were effective, or both drugs were ineffective. The first interpretation seems the most logical; after all, a new drug was shown to be ‘equivalent’ to a drug known to be effective.
The flaw in this argument is that it is often impossible to know (with any reasonable degree of certainty) that the active control was effective in this particular study. Not every drug previously determined to be active (on the basis of adequate and well-controlled trials) is effective at all times, in all populations. Using this design, the only way to conclude that the investigational drug was effective is to show that the active treatment was also effective in this particular study; this can only be shown by demonstrating that patients not treated with the active control would have had a worse outcome. In this sense, an active control trial that fails to show a difference between treatments can be considered a type of historical controlled trial, the weakest, most difficult to interpret trial design, as discussed above [9–11].
One circumstance (perhaps the only one) in which an active control trial that does not distinguish treatments can appropriately be interpreted as establishing the effectiveness of the new treatment is the case in which there is a very large dataset of controlled trials that has uniformly demonstrated the effectiveness of the active control in patients essentially the same as those enrolled in the active controlled study itself. Typically, we would expect that the previous trials
would have uniformly demonstrated superiority of the active control to a control (usually placebo) in many well-designed and conducted trials. Even one trial in which the active control was not superior to placebo would raise questions about whether or not the active control could reasonably have been known to have been effective in the trial in which it was included as a control. If there were such a large number of trials of the active control, all of them positive, we might be confident that it was effective in the trial in which it was compared to the investigational drug. Unfortunately, such a large, robust clinical trial database in which a proposed active control has been uniformly shown to be superior to placebo (or other control) does not exist for most, if any, of the drugs sponsors have proposed as active controls in studies of neurological disease. For this reason, a trial of a neurological treatment that does not distinguish the effects of that treatment and an active control is typically considered to be uninterpretable.
Of course, if an investigational drug is shown to be superior to an active control, this can be interpreted as being a ‘positive’ study. In this case, we may not know if the active control was effective or not, but if it was not, then the new treatment has been shown to be superior to what, in effect, was a placebo (at least in this trial), which is the usual source of evidence of effectiveness. The only caveat about interpreting a difference between an investigational treatment and an active control is that, in order to interpret this difference as demonstrating a beneficial effect of the new treatment, we must assume that the active control did not make patients worse than they would have been without the treatment. This is usually a reasonable assumption, but there may be cases in which such an assumption is wrong. For example, there are certain anti-epilepsy drugs (AEDs) that are considered to exacerbate certain specific seizure types. If one of these AEDs were used as an active control in a study of an investigational treatment for that seizure type, any apparent superiority of the new treatment may be spurious.
Similarly, there are studies in which all patients receive the investigational treatment for a specified period of time, after which they are randomized to continue on the treatment or receive placebo (so-called randomized withdrawal designs). In these studies, the outcome measure is typically either the time to, or the proportion of patients reaching, a specified failure event. Although these are ordinarily acceptable designs (and are frequently used to establish long-term effectiveness), in some of these cases, patients randomized to placebo may suffer withdrawal phenomena immediately after randomization, during which their condition may be worse than if they had never received treatment at all. In this case also, any difference seen between the patients continuing on drug and those experiencing withdrawal on placebo might inappropriately be attributed to a beneficial effect of the drug [12]. Additionally, withdrawal symptoms attributable to the investigational agent might unblind investigators to treatment assignments and create bias in a particular study. In some cases, withdrawal symptoms may be mitigated by slowly withdrawing treatment over time in patients randomized to placebo. Nonetheless, any difference between the new treatment and a control can be interpreted to support a beneficial effect of the new treatment.
Although a difference between the new treatment and almost any control can be interpreted to demonstrate an effect of the new treatment, as noted earlier, the most efficient and most common control group is a placebo group. Almost all trials of new agents to treat neurological disease employ a placebo group, even though there may be cases in which all patients are on other background treatments as well. In these studies, so-called add-on studies, patients are randomized to have the new drug, or the placebo, added on to their background medications; these studies can demonstrate that the new treatment is effective when added to other treatments, but cannot establish that the new treatment is effective by itself.
Although an argument has been made that a group in which patients receive only placebo is unethical when alternative treatments are available for the condition under investigation, the international community, including the FDA, has not routinely adopted this position. To be sure, if the condition under study is serious or life-threatening, and the available treatments have been shown to prevent significant morbidity or mortality, these treatments cannot ethically be withheld. However, in many cases, the available treatments provide only symptomatic benefits, and withholding them for the relatively short durations necessary to establish the effectiveness of a new treatment does not expose the patient to any important risk. In these cases, it is perfectly acceptable from an ethical point of view to randomize patients to placebo. Indeed, if patients were required to receive the best available care in all cases in which treatments were available, no new treatments could ever be developed, because it would be ethically
unacceptable to withhold the available treatments from patients, which means that they could not receive any new treatment that had not yet been established to be effective. The previous point notwithstanding, if the only studies that could be done were those that employed active controls, it is likely that many of these would not be interpretable, for the reasons discussed earlier. The conduct of clinical trials that are known to be uninterpretable is itself seriously problematic from an ethical point of view.

Disease modification and prevention
To date, the treatments available to treat progressive neurological disease are, almost without exception, considered to provide symptomatic benefit to patients; that is, there is no evidence that the available treatments slow the progression of the underlying disease process. However, at this time, numerous treatments are being developed that are believed to slow the progression of the underlying disease. It is worth considering the elements of clinical trial designs that could support such a claim.
In the typical case, patients are randomized to receive investigational drug or a control (usually placebo). Any difference in favor of drug is considered to demonstrate an effect of the drug, but such a design cannot distinguish between a symptomatic effect of the drug and an effect on the underlying progression of the disease. For this reason, numerous clinical trial designs have been proposed as being capable of detecting a disease-modifying effect of a treatment.
It is commonly proposed that a trial (or outcome) that shows an increasing difference between study treatments over time defines a disease-modifying effect. In this view, symptomatic effects (which are usually seen early after treatment initiation and are typically considered to wane over time) could not possibly increase over time because the disease itself is progressing. Therefore, it is argued, such an outcome must reflect an effect of the treatment on the underlying disease process. Although this response could reflect a disease-modifying effect, it is possible that a symptomatic effect could, in fact, increase over time as the disease process progresses. The possibility that such an outcome may not represent a disease-modifying effect has made this scenario unacceptable (at this time) as establishing a disease-modifying effect of any treatment.
More commonly, many sponsors have proposed that surrogate markers be used in the service of establishing a treatment’s effect on slowing disease progression. Specifically, it is postulated that a treatment’s effect on a given surrogate reflects an effect on the disease itself. For example, a treatment may decrease the appearance of brain atrophy as imaged on MRI in patients with Alzheimer’s disease (AD), and this would be taken as evidence that the drug had an effect on the underlying disease. Another example would be a treatment that decreased the amount of amyloid in the brains of patients with AD, as seen on PET scanning. It is clear that both atrophy and amyloid deposition increase with the progression of AD, and the assumption, therefore, is that a treatment that interrupts this process must, almost by definition, slow the progression of the disease.
However, as stated earlier, the approval of a treatment (in this case, for a claim for disease modification) based on an effect on an ‘unvalidated’ surrogate is problematic. At this time, the Agency has determined that a demonstrated effect on such a surrogate would not, by itself, be adequate to support a claim of a disease-modifying effect. However, a treatment shown to have an effect on a clinical outcome as well as on a proposed surrogate might, under certain circumstances (including a wide consensus among the community of experts about the relationship of the surrogate to the progression of the disease), be considered to support a disease-modification claim. More appropriately, the Agency has endorsed a study design which is considered adequate to demonstrate a disease-modifying effect.
In this design, patients are randomized to either drug or placebo, as in the standard study design, and the expectation is that a difference will emerge between the treatments at an appropriate time (this is identical to the typical design and outcome that support a standard claim). At this point, patients originally assigned to drug are switched over to placebo, and patients originally treated with placebo continue to receive placebo. In this second phase, if patients originally assigned to drug (and now receiving placebo) approach the ratings of the patients continuing on placebo, the effect seen in the first phase is considered to reflect a symptomatic effect (that is, when the treatment is withdrawn, they respond as if they had been on placebo all along). If, however, the patients originally assigned to drug do not approach (or reach) the original placebo patients when they are switched to placebo, the implication is that their original treatment with drug fundamentally altered their disease (otherwise, they would have
‘caught up’ to the original placebo patients). A similar design, except that in the second phase patients originally assigned to placebo are switched to drug, and those originally assigned to drug remain on drug, has also been proposed. These so-called randomized withdrawal and randomized start designs, respectively, have the great advantage of essentially ‘forcing’ a conclusion that the treatment has modified the disease, as opposed to relying on numerous assumptions about drug effects and pathophysiological events leading to disease that other approaches to disease modification require for interpretation. However, the randomized withdrawal (and start) designs are complicated and pose numerous methodological problems (for example, how long should the second phase be to accurately determine whether or not patients are ‘approaching’ each other; what are the statistical criteria to determine if patients are approaching each other; how should dropouts be handled in these long-term studies, etc.). Nonetheless, these designs have the great advantage of requiring few assumptions in order for a disease modification claim to be supported.
Another related issue of considerable interest is the determination of an effect of treatment on preventing neurological disease.
The design of a trial intended to show prevention of disease raises numerous questions, including the fundamental question of what constitutes a disease. In most degenerative diseases of the nervous system, the pathological hallmarks of the disease can predate the onset of clinical symptoms (and therefore diagnosis) by decades. Does a treatment that prevents the onset of symptoms (but that is applied after the onset of the pathology) truly prevent the disease?
An important point to make in this context is that delaying the time to diagnosis or the onset of symptoms is not the same as prevention. Delaying the time to diagnosis or symptoms, although perfectly acceptable as an outcome supporting drug approval, is entirely consistent with a symptomatic effect, and therefore cannot be considered to establish a preventive effect.
Similarly, long trial duration in asymptomatic patients cannot, by itself, be considered to establish a preventive effect. For most diseases, the period of risk continues for the patient’s life. For this reason, a study of even several years’ duration, in which drug-treated patients do not develop symptoms, cannot definitively establish that patients will not become symptomatic later. In some cases, if the period of risk of developing symptoms is known and finite, a trial duration of appropriate length could, in theory, establish an effect on prevention, but these circumstances are rare, if they exist at all.
Regardless, many sponsors are contemplating developing treatments to be applied to patients with signs of pathology (either imaging or biochemical) but without clinical symptoms, in the hope of preventing those symptoms from occurring. Besides the obvious advantages to public health of doing so, recent experience with various treatments suggests that treating patients with purported disease-modifying agents once clinical symptoms have occurred may be futile, because the damage to necessary structures makes these treatments ineffective. For this reason, it might be necessary to study pre-symptomatic patients simply in order to establish an effect of the treatment. As noted above, in most cases, these trials would not be capable of establishing that the treatment prevented the disease, but could reliably be interpreted as delaying the time to the onset of symptoms (that is, they could detect a meaningful effect of the drug).
In most cases, the hope would be to treat patients many years before the expected onset of symptoms. In this setting, a trial of any reasonable duration would not be expected to show an effect on clinical symptoms. Therefore, a surrogate marker would most likely be acceptable as a primary outcome measure. However, as previously discussed, considerable information about the effect of the drug on the surrogate and the expected clinical outcome would need to be available in order for the Agency to conclude that the effect seen on the surrogate would be ‘reasonably likely’ to predict the desired effect on the clinical outcome. For example, in the case of AD, studies in patients with very early AD (i.e., mildly symptomatic patients) might establish a relationship between the treatment and the surrogate and clinical symptoms. Such data might then provide confidence that an effect on the surrogate alone (in the pre-symptomatic patients) would predict the delay to the onset of clinical symptoms that we would require in order to grant a claim.

Comparative effectiveness and safety
Another area of increasing interest is the area of comparative effectiveness and/or safety. For various reasons, there is considerable interest in the design of clinical trials that will demonstrate either the superior effectiveness of one treatment compared to another, or, alternatively, the superior safety profile. These comparisons are important, but trials designed to demonstrate
the superiority of one treatment compared to another are potentially problematic.
In particular, the critical consideration in these comparisons is that any trial designed to demonstrate superiority should incorporate elements to ensure that the comparison is a fair one. For example, in trials designed to demonstrate that one drug is more effective than another, it is critical that appropriate doses of each treatment are compared. The choice, for instance, of a maximally tolerated high dose of the new treatment compared to a low dose of the control will result in an unfair comparison, and will not be adequate to conclude that the new treatment is superior to the old. Further, if the old treatment must be titrated, but is not titrated in the trial in which it is used as a control, any finding of superiority of the new treatment may be biased and uninterpretable. Another consideration would involve the appropriate choice of outcome measures. One drug may be superior to another on a particular measure of effectiveness, but the opposite may be true for a different measure of effectiveness. The over-arching principle to be applied in such studies is that the control treatment must be administered under conditions in which it will be maximally effective, and that all relevant measures of effectiveness must be examined; if those conditions are not met in the comparative trial, any statement about the superior effectiveness of the new treatment will be questionable.
Similarly, if a claim of superior tolerability is to be granted, the study on which such a claim is to be granted must be a fair one. In this case, a critical consideration in the design of such trials is that the treatments be compared on doses that are equi-effective. If a finding of increased tolerability of one drug occurs in the setting of a dose of that drug that is less effective than the control, that finding may be misleading. This requirement can be problematic, because a showing of ‘equi-effectiveness’ may be difficult, given that it can only be formally demonstrated through a finding of non-inferiority, a difficult outcome to achieve. Again, as in the case of an attempt to establish superior effectiveness, the ideal study would compare a range of doses of both drugs.
In the case of a trial designed to establish the superior safety profile of one drug compared to another, it is also critical that the trials examine a full range of adverse events and employ methods sensitive enough to adequately assess them. If Drug A is not associated with an adverse event seen with Drug B, but Drug A causes an adverse event not seen with Drug B (or increases the severity or frequency of this latter event), it may be inappropriate to permit a claim of superior tolerability for Drug A.
Of course, it may be possible for a trial to enroll patients who cannot tolerate Drug A (either because of a specific adverse reaction or due to a general lack of tolerability), and compare the tolerability of Drug A with Drug B. In such a trial, patients would be randomized to one or the other drug, and the comparative tolerability could be examined. Even if Drug B caused a ‘new’ adverse event in these patients, they may still prefer Drug B to Drug A.

References
1. Federal Food, Drug, and Cosmetic Act (FD&C Act). United States Code (U.S.C.) Title 21, Chapter 9.
2. Myers AM, et al. An overview of the drug approval process: An FDA perspective. In: Hartzema AG, Tilson HH, Chan KA. Pharmacoepidemiology and Therapeutic Risk Management. Harvey Whitney Books Company. 2008; 67–94.
3. Food and Drug Modernization Act (FDAMA) of 1997. Public Law 105–115. November 21, 1997.
4. Guidance document: Providing Clinical Evidence of Effectiveness for Human Drug and Biological Products. 1998. http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm065012.htm
5. Fleming T and DeMets D. Surrogate end points in clinical trials: Are we being misled? Ann Intern Med 1996; 125: 605–613.
6. Pediatric Research Equity Act (PREA) of 2003. Public Law 108–155. December 3, 2003.
7. Best Pharmaceuticals for Children Act (BPCA). Public Law 107–109. January 4, 2002.
8. Orphan Drug Act. Public Law 97–414. January 4, 1983.
9. Leber PD. Hazards of inference: the active control investigation. Epilepsia 1989; 30 (Suppl 1): S57–63; discussion S64–8.
10. Temple R and Ellenberg SS. Placebo-controlled trials and active control trials in the evaluation of new treatments. Part 1: ethical and scientific issues. Ann Intern Med 2000; 133: 455–63.
11. Temple R and Ellenberg SS. Placebo-controlled trials and active control trials in the evaluation of new treatments. Part 2: practical issues and specific cases. Ann Intern Med 2000; 133: 464–70.
12. Leber PD and Davis CS. Threats to the validity of clinical trials employing enrichment strategies for sample selection. Control Clin Trials 1988; 19: 178–87.
Section 5 Regulatory perspectives
Chapter 19 Premarket review of neurological devices
Eric A. Mann and Peter G. Como
Introduction
The US neurological device market is one of the fastest growing segments in the country’s medical device industry. The global neurological device market is predicted to exceed $5 billion by 2016 [1]. Factors expected to spur this growth include changing patient demographics, increasing physician adoption of innovative technologies, patient demand, and the availability of reimbursement for device-related procedures. In particular, neurological disorders such as epilepsy, chronic migraine headache, stroke, and neurodegenerative disorders (e.g., Alzheimer’s disease, Parkinson’s disease) affect large patient groups, many of which are rapidly increasing in size with the overall aging of the country’s population.
Neurostimulation devices [e.g. deep brain stimulators (DBS), spinal cord stimulators, peripheral nerve stimulators, and vagus nerve stimulators] are the fastest growing category within the neurological device market [1]. Currently, FDA-approved indications for neurostimulation devices include the treatment of debilitating conditions such as treatment-resistant depression, epilepsy, gastroparesis, urinary incontinence, chronic pain, Parkinson’s disease, essential tremor, dystonia, and obsessive compulsive disorder. Additionally, research is underway to expand the indications for use of neurostimulation devices to other important conditions such as obesity, stroke, Alzheimer’s disease, hypertension, migraine, and neuropsychiatric disorders (e.g. Tourette syndrome and addictive disorders). Neurostimulation devices may, in fact, be cost-effective alternatives to traditional pharmacologic therapy for some disorders [2].
Neurointerventional devices constitute another rapidly growing segment of the neurological device market [3]. These devices include catheter-based systems designed to retrieve clots in patients experiencing acute ischemic stroke as well as a variety of coils, stents, flow diverters, and injectable agents designed to embolize and/or occlude intracranial aneurysms and arteriovenous malformations. These ‘minimally invasive’ technologies offer an alternative to open surgical procedures, and may have comparatively lower complication rates and shorter hospital stays.
Overall, the US neurological device market currently appears to be in a similar situation to that of the cardiovascular device market of the 1990s. That is, the substantial unmet need for effective treatments of neurological disorders, coupled with a large and expanding patient population, is expected to spur growth in the neurological device field in coming years.

The role of the FDA
The mission of the Center for Devices and Radiological Health (CDRH) within the FDA is to promote and protect the health of the American public by assuring the safety and effectiveness of medical devices and the safety of radiological products marketed in the US. The enormous scope of this mission is exemplified by the tremendous diversity and number of products regulated as devices including tongue depressors, wheelchairs, tanning beds, in vitro and radiological diagnostic devices, cardiac pacemakers, prosthetic joints, and DBSs. Overall, there are approximately 1700 different generic types of devices identified in the regulations [4]. In 2006 alone, expenditures on medical devices in the US were estimated at $131.6 billion [5].
FDA’s legal authority to regulate medical devices derives from the Medical Device Amendments of 1976 to the Federal Food, Drug, and Cosmetic Act (FD&C Act). As defined under the Act [6], a medical device is:
An instrument, apparatus, implement, machine, contrivance, implant, in vitro reagent, or other similar or related article, including a component part, or accessory which is:
• recognized in the official National Formulary, or the US Pharmacopoeia, or any supplement to them,
• intended for use in the diagnosis of disease or other conditions, or in the cure, mitigation, treatment, or prevention of disease, in man or other animals, or
• intended to affect the structure or any function of the body of man or other animals, and which does not achieve any of its primary intended purposes through chemical action within or on the body of man or other animals and which is not dependent upon being metabolized for the achievement of any of its primary intended purposes.
If the primary intended use of the product is achieved through chemical action or by being metabolized by the body, the product is usually regulated as a drug or biological product. The regulatory framework for devices differs from that for drugs and biologics in that a risk-based classification system determines the level of regulatory oversight for a specific device type. This regulatory approach is consistent with both the wide spectrum of risk levels posed by devices as well as the frequent, incremental modifications made to devices to enhance safety and effectiveness with rapid technological advancement. Requiring at least two adequate and well-controlled clinical investigations, the usual evidentiary standard per the drug regulations, would be inappropriate for many devices (e.g. tongue depressors, bedpans) and impractical and unnecessary for other devices (e.g. minor modifications or design enhancements to currently approved devices). However, evidentiary standards for devices, drugs, and biologics all share the common goal of ensuring both the safety and effectiveness of these products in the US.

Regulatory classification of devices
As described above, the Medical Device Amendments of 1976 established a risk-based classification system for medical devices [6]. The goal of this system is to tailor the degree of regulatory oversight to the risks posed by a particular device type. Each generic type of device is assigned to one of three regulatory classes, each with distinct regulatory requirements (see Table 19.1). A list of classification regulations for various types of diagnostic, surgical, and therapeutic neurological devices, and their regulatory classification is found in the Code of Federal Regulations (CFR) under 21 CFR 882 [7]. The following sections will provide an overview of the regulatory requirements for each of these classes.

Table 19.1 Risk-based classification of medical devices
Class      Risk      Regulatory requirements                  Neurological device examples
Class I    Low       General controls                         Manual surgical instruments, neurological pinwheel, tuning fork, neurosurgical chair
Class II   Moderate  General controls and special controls    EEG cortical and cutaneous electrodes, neurological endoscope, evoked response stimulators
Class III  High      General controls and premarket approval  DBS, VNS(a), cortical stimulators, dural sealants, polymerizing neurovascular embolization agents
(a) DBS: deep brain stimulator; VNS: vagus nerve stimulator.

Class I devices
Class I devices are low risk devices for which FDA has determined that general controls alone will provide a reasonable assurance of safety and effectiveness. General controls are the baseline regulatory requirements of the FD&C Act that apply to all three classes of medical devices. These controls include:
• Adulteration and Misbranding provisions (Sections 501 and 502 of the FD&C Act)
In general, a device will be considered adulterated (and in violation of the FD&C Act) if it is unsanitary, contains a poisonous substance or unsafe color additive, differs from its claimed purity or quality, or fails to meet a required performance standard. The main provisions regarding misbranding require that the labeling not be false or misleading, that the device packaging bear a label containing certain information (e.g. name and address of manufacturer, device’s established name, quantity of contents), and that the device bear adequate directions
for use including appropriate warnings for over-the-counter devices.
• Good manufacturing practices
The device manufacturer must conform to the Quality System Regulation (21 CFR 820) which contains general requirements in the areas of: organization and personnel; design practices and procedures; buildings and environmental control; design of labeling and packaging; controls for components, processes, packaging and labeling; finished device evaluation; distribution and installation; device and manufacturing records; complaint processing; and QA system audits.
• Registration and listing
All manufacturers are required to register their establishments with FDA and submit a list of all devices they manufacture. This information is maintained in databases within FDA.
• Repair, replacement or refund provisions
The FD&C Act authorizes the Agency, after offering an opportunity for an informal hearing, to order manufacturers, importers, or distributors to repair, replace, or refund the purchase price of devices that present an unreasonable risk to health.
• Records and reports on devices
Section 519 of the FD&C Act authorizes FDA to promulgate regulations requiring manufacturers, importers, or distributors to maintain records and reports to assure that devices are not misbranded or adulterated.
• Restricted devices
Under Section 520(e) of the Act, FDA may restrict the sale, distribution, or use of a device if necessary to provide a reasonable assurance of safety and effectiveness. For example, if adequate directions for use for a device cannot be written that will assure safe use of a device by the lay public, the device can be restricted through prescription use. Other restrictions may pertain to labeling or other requirements. For example, hearing aid devices are not restricted through prescription use, but are restricted by regulation regarding specific labeling requirements (e.g. user brochure, technical data to be provided) and the requirement for a medical evaluation by a licensed physician within 6 months of the hearing aid being dispensed.
• Banned devices
If a device presents substantial deception or an unreasonable and substantial risk of illness or injury that cannot be corrected by a change in labeling, then FDA may publish a proposed regulation to ban the device. To date, only one device (prosthetic hair fibers) has been banned by FDA regulation.
• Premarket notification
Section 510(k) of the FD&C Act requires a manufacturer who intends to market a new medical device to submit a premarket notification [also known as a ‘510(k)’] to the Agency. This 510(k) premarket application is described in further detail in ‘Types of FDA Premarket Applications’ below.
Of note, as a result of the 1997 Food and Drug Administration Modernization Act (FDAMA), almost all Class I devices are now exempt from the premarket notification requirement. However, there are limitations to this exemption, as outlined in 21 CFR 882.9 for neurological devices.
Thus, manufacturers are required to comply with the general controls outlined above in order to legally market a Class I medical device. Examples of Class I neurological devices include various simple diagnostic devices (e.g. tuning fork, neurological pinwheel, percussion hammer) and various manual surgical instruments.

Class II devices
Class II devices are moderate risk devices for which FDA has determined that special controls, in addition to the general controls (as outlined above), are necessary to provide a reasonable assurance of safety and effectiveness. The special controls which apply to a certain device type depend on the specific safety and effectiveness issues associated with it, and may include special labeling requirements, mandatory performance standards and post-market surveillance requirements (e.g. patient registry or device-tracking requirements that facilitate device recalls or patient notifications if necessary). For example, the special control for neurovascular embolization devices such as embolization coils (which are Class II devices) is a special controls guidance document created by FDA [8] which outlines specific risks to health posed by these devices (e.g. blood vessel perforation, unintended thrombosis, adverse tissue reaction, infection, hematoma formation) and recommended measures to mitigate these risks (e.g. pre-clinical testing, animal testing, clinical testing, labeling). Thus, any manufacturer intending to market a new neurovascular embolization coil device will need to adequately address the issues outlined in this guidance document and will need to obtain FDA marketing clearance through the premarket notification [510(k)] process which is described below. Unlike Class I devices, most Class II devices still require clearance through the 510(k) process prior to marketing.
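The risk-based scheme summarized in Table 19.1 can be pictured as a simple lookup from device class to risk level, applicable controls, and usual marketing route. The following is only a minimal sketch of that idea: the names ClassProfile, DEVICE_CLASSES, and class_profile are hypothetical and chosen for illustration, and the entries paraphrase this chapter rather than the regulations themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassProfile:
    """Summary of one regulatory class, paraphrased from Table 19.1."""
    risk: str
    controls: tuple
    usual_route: str


# Hypothetical lookup table; keys and field names are illustrative only.
DEVICE_CLASSES = {
    "I": ClassProfile(
        "low",
        ("general controls",),
        "general controls; most device types exempt from 510(k)",
    ),
    "II": ClassProfile(
        "moderate",
        ("general controls", "special controls"),
        "510(k) clearance for most device types",
    ),
    "III": ClassProfile(
        "high",
        ("general controls", "premarket approval"),
        "PMA approval",
    ),
}


def class_profile(device_class: str) -> ClassProfile:
    """Return the profile for 'I', 'II' or 'III' (case-insensitive)."""
    key = device_class.strip().upper()
    if key not in DEVICE_CLASSES:
        raise ValueError(f"unknown device class: {device_class!r}")
    return DEVICE_CLASSES[key]


if __name__ == "__main__":
    for name in ("I", "II", "III"):
        p = class_profile(name)
        print(f"Class {name}: {p.risk} risk; {', '.join(p.controls)}; {p.usual_route}")
```

A lookup of this kind captures only the default position for each class; the exemptions and their limitations described in the text (for example, 21 CFR 882.9) are not modeled.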
Class III devices
Class III is the most stringent regulatory classification for devices. Class III devices are those for which insufficient information exists to assure safety and effectiveness solely through general or special controls. Typically, such devices support or sustain human life, are of substantial importance in preventing impairment of human health, or present a potential, unreasonable risk of illness or injury.
In addition to the General Controls that also apply to Class I and Class II devices, premarket approval (PMA) is the required process of scientific review to ensure the safety and effectiveness of Class III devices (see further description of the PMA process in the following account). Examples of Class III devices which require PMA include DBSs, cortical stimulators, and vagus nerve stimulators.

Types of FDA premarket applications

Premarket notification
A premarket notification or 510(k) is the type of premarket application required by FDA for most Class II devices and some Class I devices [9]. The 510(k) must demonstrate that the device to be marketed is at least as safe and effective, that is, ‘substantially equivalent’, to a legally marketed device (or devices) that is not subject to PMA. A legally marketed device, as described in 21 CFR 807.92(a)(3), is a device that was either:
1. Legally marketed prior to the Medical Device Amendments of 1976 (pre-amendments device), for which a PMA is not required, or
2. A device which has been reclassified from Class III to Class II or I, or
3. A device which has been found substantially equivalent through the 510(k) process.
This legally marketed device to which equivalence is drawn is commonly known as the ‘predicate’ device. Although devices most recently cleared under 510(k) are often selected as the predicate to which substantial equivalence is claimed, any legally marketed device as defined above may be used as a predicate.
A device is substantially equivalent if, in comparison to a predicate, it:
• has the same intended use as the predicate; and has the same technological characteristics as the predicate; or has the same intended use as the predicate; and has different technological characteristics and the information submitted to FDA:
  ◦ does not raise new types of safety and effectiveness questions; and
  ◦ demonstrates that the device is at least as safe and effective as the legally marketed device.
A claim of substantial equivalence does not mean that the new and predicate devices must be identical. Substantial equivalence is established with respect to intended use, design, energy used or delivered, materials, chemical composition, manufacturing process, performance, safety, effectiveness, labeling, biocompatibility, standards, and other characteristics, as applicable. If there are differences in these areas between the new device and the predicate which could impact safety and/or effectiveness, the applicant must provide performance data (e.g. bench, animal, and/or clinical data) in the 510(k) to show that the new device is at least as safe and effective as the cited predicate. Until the submitter receives an order from FDA declaring a device to be substantially equivalent, the submitter may not proceed to market the device. This determination, which is referred to as a ‘clearance’ for marketing (as opposed to ‘approval’ for marketing under the PMA process as described below), is usually made within 90 days of FDA review time and is made based on the information submitted by the applicant.
Over the past decade (1999–2009), FDA has cleared approximately 1800 neurological devices. A significant portion of these 510(k) cleared devices include devices which assess brain function (e.g. EEG monitors, EEG electrodes, depth of anesthesia monitoring systems, intracranial pressure monitors), diagnostic devices such as hearing screeners, biofeedback systems, transcutaneous electrical nerve stimulator (TENS) devices, and various neurological and neurosurgical instruments. An online searchable database of FDA-cleared devices is available at: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMN/pmn.cfm
As noted above, most Class I devices (and some Class II devices) are now exempted by regulation from the 510(k) requirements. However, these exemptions are subject to limitations under the regulations (i.e., if a new device that falls under these ‘exempted’ device types has either a new indication for use or new technology which could impact its safety or effectiveness as compared to other legally marketed predicate devices
within that device type, then the manufacturer would be required to obtain 510(k) clearance prior to marketing the new device).

Premarket approval
The PMA process [4] is the most stringent type of device marketing application and is required by FDA for Class III devices. The applicant must receive FDA approval of its PMA application prior to marketing the device. PMA approval is based on a determination by FDA that the PMA contains sufficient valid scientific evidence to assure that the device is safe and effective for its intended use(s). An approved PMA is, in effect, a private license granting the applicant permission to market the device.
Information contained in a PMA submission typically includes the following: an in-depth device description and indications for use, a description of alternative practices and procedures for the proposed indications for use, a marketing history of the device outside of the US if applicable, detailed manufacturing information, reference to any performance standard or voluntary standard used in the development and testing of the device, results of non-clinical laboratory studies, results of clinical investigations involving human subjects, and copies of all proposed labeling for the device.
The regulations provide 180 days for FDA to review the PMA and make a decision. However, the overall review time may be longer because of deficiencies or questions raised by FDA that need to be addressed by the applicant. An online searchable database of PMA-approved devices is available at: http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMA/pma.cfm
Among the most prominent neurological devices approved under the PMA process are the DBS devices which have been PMA-approved for Parkinson’s disease and essential tremor. The first of these neurostimulators was approved by FDA in 1997 (Medtronic Activa Tremor Control System™). This device system (DBS lead electrodes, lead extensions, implantable pulse generator, memory module, console programmer, burr hole ring and cap, magnet, test stimulator, lead frame kits and accessories) was initially approved for the following indication: ‘unilateral thalamic stimulation for the suppression of tremor in the upper extremity in patients who are diagnosed with essential tremor or parkinsonian tremor not adequately controlled by medications and where the tremor constitutes a significant functional disability.’ Since the original approval, the sponsor has submitted approximately 82 supplemental applications to the original PMA to date which have been approved for a variety of changes to the device hardware (e.g., rechargeable battery) and indications for use (e.g., bilateral implantation, management of the advanced symptoms of Parkinson’s disease).

Humanitarian device exemption
A humanitarian use device (HUD) is a device that is intended to benefit patients by treating or diagnosing a disease or condition that affects or is manifested in fewer than 4000 individuals in the US per year. A device manufacturer’s research and development costs could exceed its market returns for diseases or conditions affecting small patient populations. The HUD provision of the regulations (21 CFR 814 Subpart H) provides an incentive for the development of devices for use in the treatment or diagnosis of diseases affecting these populations.
To obtain marketing approval for an HUD, a humanitarian device exemption (HDE) application is submitted to FDA. An HDE is similar in both form and content to a PMA application, but is exempt from the effectiveness requirement of a PMA. That is, an HDE application is not required to demonstrate that the device is effective for its intended purpose. The application, however, must contain sufficient information for FDA to determine that the device does not pose an unreasonable or significant risk of illness or injury, and that the probable benefit to health outweighs the risk of injury or illness from its use, taking into account the probable risks and benefits of currently available devices or alternative forms of treatment. Additionally, the applicant must demonstrate that no comparable devices (other than another HDE device) are available to treat or diagnose the disease or condition, and that they could not otherwise bring the device to market.
An approved HDE authorizes marketing of the HUD. However, an HUD may only be used in facilities that have established a local institutional review board (IRB) to supervise clinical use of HDE-approved devices. The labeling for an HUD must state that the device is a humanitarian use device and that, although the device is authorized for marketing by Federal Law, the effectiveness of the device for the specific indication has not been demonstrated.
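The HDE conditions described above lend themselves to a checklist. The sketch below is an illustrative simplification, not an FDA procedure: the function names, parameter names, and wording of the messages are hypothetical, each flag stands in for a judgment that FDA makes on the evidence submitted, and the population threshold is simply the figure stated in this chapter.

```python
# Threshold as stated in this chapter ("fewer than 4000 individuals in the
# US per year"); confirm against the current regulation before relying on it.
HUD_MAX_US_PATIENTS_PER_YEAR = 4000


def is_hud_population(us_patients_per_year: int) -> bool:
    """True if the affected population is below the HUD threshold."""
    if us_patients_per_year < 0:
        raise ValueError("patient count cannot be negative")
    return us_patients_per_year < HUD_MAX_US_PATIENTS_PER_YEAR


def unmet_hde_conditions(
    *,
    us_patients_per_year: int,
    poses_unreasonable_risk: bool,
    probable_benefit_outweighs_risk: bool,
    comparable_device_available: bool,
    could_reach_market_otherwise: bool,
) -> list:
    """Return the HDE conditions from the text that are NOT satisfied.

    An empty list means every condition named in the chapter is met.
    Effectiveness is deliberately absent: an HDE is exempt from it.
    """
    unmet = []
    if not is_hud_population(us_patients_per_year):
        unmet.append("population is not below the HUD threshold")
    if poses_unreasonable_risk:
        unmet.append("device poses an unreasonable or significant risk")
    if not probable_benefit_outweighs_risk:
        unmet.append("probable benefit does not outweigh the risk")
    if comparable_device_available:
        unmet.append("a comparable non-HDE device is available")
    if could_reach_market_otherwise:
        unmet.append("device could be brought to market another way")
    return unmet


if __name__ == "__main__":
    print(unmet_hde_conditions(
        us_patients_per_year=2500,
        poses_unreasonable_risk=False,
        probable_benefit_outweighs_risk=True,
        comparable_device_available=False,
        could_reach_market_otherwise=False,
    ))
```

Returning the list of unmet conditions, rather than a single yes/no value, mirrors the way deficiencies are raised and addressed during review.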
Over the past 13 years, FDA has approved nearly 50 HDEs. Several of these have included neurological device technology including neurostimulator devices indicated for restoring or promoting bladder function, gastric emptying, and diaphragmatic function; for aiding in the management of chronic, intractable primary dystonia; and for the treatment of patients with obsessive-compulsive disorder who are resistant to medical therapy. Several other HDEs have been approved for neurovascular indications (stroke, wide-necked intracranial aneurysms) in HUD populations.
A listing of HDE approvals can be found at the FDA website at: http://www.fda.gov/MedicalDevices/ProductsandMedicalProcedures/DeviceApprovalsandClearances/HDEApprovals/ucm161827.htm

Investigational device exemptions
The investigational device exemptions (IDE) regulation (21 CFR 812) pertains to devices that have not been approved or cleared for marketing or that are being tested for indications not previously approved or cleared. The IDE allows the investigational device to be used in a clinical study in order to collect safety and effectiveness data required to support a PMA or, less frequently, a 510(k) or HDE premarket submission. An IDE application to FDA is required for any significant risk device, as defined below.

Significant risk device
A significant risk device presents a potential for serious risk to the health, safety, or welfare of a subject. Significant risk devices may include implants, devices that support or sustain human life, and devices that are substantially important in diagnosing, curing, mitigating or treating disease, or in preventing impairment to human health. Examples of neurological devices currently considered as significant risk devices include implanted intracerebral/subcortical stimulators, implanted spinal cord and peripheral nerve stimulators, neurovascular embolization devices, hydrocephalus shunts, and electroconvulsive therapy devices.
FDA guidance on distinguishing between significant risk and non-significant risk studies is available in the document ‘Significant Risk and Nonsignificant Risk Medical Device Studies’ [10]. FDA is the final arbiter in deciding whether a device study poses significant risk (SR) or non-significant risk (NSR). It should be noted, however, that FDA generally only sees those studies that sponsors submit to the agency or those studies for which an IRB or clinical investigator asks for FDA’s opinion. If FDA disagrees with an IRB’s NSR determination, the sponsor may not begin their study until FDA approves an IDE. If a sponsor submits an IDE to FDA because the sponsor presumed it to be an SR study, and FDA determines that the device study is an NSR study, FDA will inform the sponsor in writing. The study may then be reviewed by the IRB as an NSR study.

Non-significant risk device
Non-significant risk devices are devices that do not pose a significant risk to subjects in a research study. Examples of NSR neurological devices include EEG, functional non-invasive electrical neuromuscular stimulators, and TENS devices for treatment of pain (except chest pain/angina).
An NSR device study requires only IRB approval prior to initiation of a clinical study. Sponsors of studies involving NSR devices are not required to submit an IDE application to FDA for approval.

IDE application process
An IDE application to FDA must include information on relevant preclinical studies and any available clinical data. The sponsor (or sponsor/investigator in the case of an individual or group of individuals not associated with a device manufacturer) must also submit an investigational research plan that describes the research design and analytic methods to be used. This plan should define the study design, study objectives or hypotheses, device description, subject inclusion/exclusion criteria, procedures, data monitoring plan, statistical analysis plan, including sample size estimates and power calculations to detect a significant effect, specification of primary and secondary outcome measures, risks/risk monitoring plan, number of investigators/sites, and whether or not a data safety monitoring committee is planned. Extensive online information is available to assist industry and investigators in the planning, design and conduct of IDE studies at: http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/HowtoMarketYourDevice/InvestigationalDeviceExemptionIDE/ucm162453.htm
An IDE study cannot proceed until the IDE is approved by FDA and an IRB. FDA and investigators or sponsors may engage in extensive discussions about the characteristics and objectives of research studies to support any future claims of safety and effectiveness.
These discussions often occur through the pre-IDE process (see next section).
Upon receipt of an IDE application, sponsors are notified in writing of the date that FDA received the original application and an IDE number assigned for tracking purposes. An IDE application is considered approved 30 days after it has been received by FDA, unless FDA otherwise informs the sponsor within 30 calendar days from the date of receipt that the IDE is approved, approved with conditions, or disapproved. In cases of disapproval, a sponsor has the opportunity to either respond to the deficiencies or to request a regulatory hearing.
Once an IDE application is approved, the following requirements must be met in order to conduct the investigation in compliance with the IDE regulation:
• Labeling – The device must be labeled in accordance with the labeling provisions of the IDE regulation (21 CFR 812.5) and must bear the statement ‘CAUTION – Investigational Device. Limited by Federal (or United States) law to investigational use.’
• Distribution – Investigational devices can only be distributed to qualified investigators [21 CFR 812.43(b)].
• Informed Consent – Each subject must be provided with and sign an informed consent form before being enrolled in the study. 21 CFR 50, Protection of Human Subjects, contains the requirements for obtaining informed consent.
• Monitoring – All investigations must be properly monitored to protect the human subjects and assure compliance with approved protocols (21 CFR 812.46).
• Prohibitions – Commercialization, promotion, and misrepresentation of an investigational device and prolongation of the study are prohibited (21 CFR 812.7).
• Records and Reports – Sponsors and investigators are required to maintain specified records and make reports to investigators, IRBs, and FDA (21 CFR 812.140 and 21 CFR 812.150).

IDE exempt investigations
All clinical investigations of devices must have an approved IDE or otherwise be exempt from the IDE regulation. Studies exempt from the IDE regulation [21 CFR 812.2(c)] include those involving:
1. A legally marketed device when used in accordance with its labeling
2. A diagnostic device if it complies with the labeling requirements in 21 CFR 809.10(c) and if the testing:
   a. is non-invasive;
   b. does not require an invasive sampling procedure that presents significant risk;
   c. does not by design or intention introduce energy into a subject; and
   d. is not used as a diagnostic procedure without confirmation by another medically established diagnostic product or procedure.
3. Consumer preference testing, testing of a modification, or testing of a combination of devices if the device(s) are legally marketed device(s) [that is, the devices have an approved PMA, cleared Premarket Notification 510(k), or are exempt from 510(k)] and if the testing is not for the purpose of determining safety or effectiveness and does not put subjects at risk.
4. A device intended solely for veterinary use.
5. A device shipped solely for research with laboratory animals and that contains the labeling ‘CAUTION – Device for investigational use in laboratory animals or other tests that do not involve human subjects.’
Depending upon the nature of the investigation, those studies which are exempt from the requirements of the IDE regulation may or may not be exempt from the requirements for IRB review.

Pre-IDE process
The pre-IDE process provides a means for gaining FDA comments and feedback on proposed preclinical or clinical studies intended to support a marketing application. This includes studies for both SR and NSR devices or post-market studies which do not require an IDE submission, but which will generate data to support an eventual marketing submission. This process is especially beneficial for medical device manufacturers or sponsors/investigators who have not had previous contact with the FDA, and whose device utilizes new technologies or involves new uses of existing technologies. Early interaction with the agency may help to increase the sponsor’s understanding of FDA requirements, regulations, and guidance documents, and will
Table 19.2 FDA Medical Device Advisory Panels
Anesthesiology and Respiratory Therapy Devices Panel
Circulatory System Devices Panel
Clinical Chemistry and Clinical Toxicology Devices Panel
Dispute Resolution Panel
Ear, Nose and Throat Devices Panel
Gastroenterology and Urology Devices Panel
General and Plastic Surgery Devices Panel
Dental Products Panel
General Hospital and Personal Use Devices Panel
Hematology and Pathology Devices Panel
Immunology Devices Panel
Microbiology Devices Panel
Molecular and Clinical Genetics Panel
Neurological Devices Panel
Obstetrics and Gynecology Devices Panel
Ophthalmic Devices Panel
Orthopaedic and Rehabilitation Devices Panel
Radiological Devices Panel
allow FDA personnel to familiarize themselves with radiation. Each committee consists of experts with
the new technologies. Increased interaction between recognized expertise and judgment in a speciic ield.
FDA and sponsors and investigators may also help to Members have the training and experience necessary
speed the regulatory process and minimize delays in to evaluate information objectively and to interpret
the development of clinically useful devices. he com- its signiicance. While these members are not regular
munication with FDA may take the form of a ‘pre-IDE employees of FDA, they are paid as ‘special government
submission’ and/or a ‘pre-IDE meeting’. Pre-IDE sub- employees’ for the days they participate as members
missions oten focus on troublesome parts of a planned of a panel and assist FDA in its public health mission.
IDE application (e.g., clinical protocol design, pre- he committees are advisory – they provide their com-
clinical testing proposal, pre-clinical test results, and ments and recommendations regarding issues and
protocols for foreign studies when the studies will be questions posed by FDA – but inal decisions are made
used to support future marketing applications to be by the Agency. Panel input is oten requested for ‘irst
submitted to FDA). Upon completion of the review of of kind’ devices or applications which pose challenging
the pre-IDE submission, the reviewing division within safety and/or efectiveness issues.
CDRH will issue comments and responses to questions he majority of neurological device issues requir-
posed by the sponsor within the submission in a timely ing panel input are brought before the Neurological
manner, usually within 60 days of receipt. Pre-IDE Devices Advisory Panel. However, depending on the
meetings may take the form of telephone conference proposed indication for use of the device under con-
calls, video conferences, or face-to-face meetings and sideration, other advisory panels may be involved. For
typically focus on speciic questions or issues raised example, neurostimulation devices to promote gastric
during the review of the pre-IDE submission. or bladder emptying would likely be presented to the
Gastroenterology and Urology Devices Panel which
Role of FDA advisory panels would have the most appropriate clinical and scien-
tiic expertise to evaluate the safety and efectiveness
he Medical Devices Advisory Committee consists of
issues associated with such devices. Alternatively,
18 panels (see Table 19.2). With the exception of the
experts from other advisory panels within FDA may
Medical Devices Dispute Resolution Panel, these pan-
be used to augment necessary areas of expertise for
els advise the Agency about issues related to the safety
the Neurological Devices Advisory Panel for speciic
and efectiveness of medical devices. he Medical
device issues.
Devices Dispute Resolution Panel provides advice to
the Commissioner on complex or contested scientiic
issues between the FDA and medical device sponsors, Summary
applicants, or manufacturers. CDRH has established FDA uses a tiered, risk-based classiication of med-
advisory committees to provide independent, profes- ical devices, including neurological devices, in deter-
sional expertise and technical assistance on the devel- mining the regulatory requirements for the premarket
opment, safety and efectiveness, and regulation of review process. General regulatory requirements (i.e.,
medical devices and electronic products that produce general controls) apply to all classes of devices and are,
213
by themselves, sufficient to assure the safe and effective use of low risk (Class I) devices. Additional ‘special controls’ such as post-market surveillance, conformance to standards and guidance documents, supplement these general controls for moderate risk (Class II) devices. Finally, the PMA process is used to ensure the safety and effectiveness of high risk (Class III) devices. In addition to reviewing premarket applications for these devices [510(k)s, PMAs, HDEs], the FDA is responsible for the regulatory oversight of clinical studies for significant risk investigational devices. The agency actively collaborates with industry and investigators in developing rigorous clinical studies that will provide adequate safety and effectiveness data to support FDA clearance or approval of devices that will benefit the American public. The data generated by such studies may be presented to a CDRH Advisory Panel of external clinical and scientific experts for recommendation and comment for devices with novel technologies, indications for use, or for applications which pose specific challenging safety and effectiveness issues. During the premarket review of neurological and other device types, the FDA strives to fulfill its dual mission of both promoting and protecting the public health by assuring the safety and effectiveness of medical devices.

References
1. PRLog Free Press Release. Prospects of the Neurology Devices Market to 2016. 2010. http://www.prlog.org/10544597-prospects-of-the-neurology-devices-market-to-2016.html
2. Weaver FM, Follett K, Stern M, et al. Bilateral deep brain stimulation vs best medical therapy for patients with advanced Parkinson disease: A randomized controlled trial. JAMA 2009; 301: 63–73.
3. Pena C, Li K, Felten R, et al. An example of US Food and Drug Administration device regulation: Medical devices indicated for use in acute ischemic stroke. Stroke 2007; 38: 1988–1992.
4. Premarket Approval Manual. HHS Publication FDA 97–4214, January 1998; 1–2.
5. Donahoe G and King G. Estimates of Medical Device Spending in the United States. 2009. http://www.advamed.org/NR/rdonlyres/6ADAAA5B-BA37–469E-817B-3D61DEC4E7C8/0/King2009FINALREPORT52909.pdf
6. Federal Food, Drug, and Cosmetic Act (as amended March, 2005). U.S. Government Printing Office. Washington, D.C. 2005. http://www.fda.gov/RegulatoryInformation/Legislation/FederalFoodDrugandCosmeticActFDCAct/default.htm
7. Code of Federal Regulations. U.S. Government Printing Office. Washington, D.C. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/cfrsearch.cfm
8. Center for Devices and Radiological Health FDA. Guidance for industry and FDA staff – Class II special controls guidance document: vascular and neurovascular embolization devices. Silver Spring, MD. http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm072013.htm
9. Center for Devices and Radiological Health FDA. Guidance on the CDRH premarket notification review program. 510(k) Memorandum #K86–3. Silver Spring, MD. http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm081383.htm
10. Information Sheet Guidance for IRBs, Clinical Investigators, and Sponsors: Significant and nonsignificant risk medical devices. http://www.fda.gov/downloads/RegulatoryInformation/Guidances/ucm126418.pdf
Section 6 Clinical trials in common neurological disorders
Chapter 20
Parkinson’s disease
Karl Kieburtz and Jordan Elm

Introduction
In the early nineteenth century James Parkinson described the cardinal motor features of Parkinson’s disease (PD), which remain the hallmark of early diagnosis to this time. His initial observations emphasized the slowness of movement (bradykinesia), rhythmic shaking of the limbs at rest (resting tremor), resistance of the limbs to passive movement (rigidity), and stooped posture with impaired balance (postural change and instability). Later in the nineteenth century these clinical features were confirmed by other neurologists and codified by Charcot. Although the initial description did not include impairment in cognitive functioning, some concerns were raised about this, and other aspects of mood and personality, as a greater understanding of the disease developed. Furthermore, it was recognized as an illness that is chronic and progressive, with no clear treatments that could modify the course.

The underlying neuropathology of this disorder only came to light in the early twentieth century, with the identification of cell loss and atrophy of brain stem and mid-brain nuclei of neurons. Loss was particularly notable in the pigmented pars compacta of the substantia nigra of the mid-brain. However, other pigmented nuclei of the brain stem, such as the locus coeruleus and the dorsal motor nucleus of the vagus, also had evidence of neuronal loss. An understanding of the underlying neurochemical defects in PD did not emerge until the second half of the twentieth century. Several investigators, including Oleh Hornykiewicz, Arvid Carlsson, and others, identified the striatal deficiency of dopamine that was a corollary of the loss of neurons in the substantia nigra pars compacta. This identification of a dopaminergic deficit led to the proposal, and subsequent testing, of replacement of the deficient dopamine via oral supplementation of levodopa, a metabolic precursor of dopamine. Although initial investigation was of uncertain benefit, eventually levodopa emerged as a dramatically effective treatment in reversing most of the motor features of PD, particularly rigidity and bradykinesia.

The ultimate cause of PD remains uncertain, although there are genetic forms of illness (both autosomal recessive and dominant) with clinical and some pathological features similar to otherwise ‘idiopathic’ PD. Still, the vast majority of PD does not have a clear genetic cause, although many investigators believe there is an important interplay between genes and the environment in its pathogenesis. Current hypotheses regarding the mechanism of neurodegeneration include abnormalities in protein folding and trafficking, bioenergetic defects, free radical injury and induction of cell death programs, perhaps with an interaction among all. Potential therapies targeting these mechanisms are under active investigation.

Goals of intervention
With the advent of effective treatment of the classic motor features of PD, a better understanding of the complexity of the clinical features of PD emerged in the last few decades of the twentieth century. Complex motor features such as freezing of gait, falling and motor fluctuations, including wearing off of the response to levodopa and involuntary movements called dyskinesias (often in response to levodopa dosing), were identified. These motor features, particularly freezing and falling, were relatively resistant to the beneficial effects of levodopa, in comparison to rigidity and bradykinesia. In addition, despite the effective treatment of the classic motor features of
PD, additional non-motor features emerged, perhaps reflecting the more extensive neuropathology of PD that involves more than dopaminergic nerve cells. Chief among these non-motor features is impairment in cognition, which may be subtly present even at the earliest diagnosis, but in a proportion of patients will advance to functionally limiting cognitive impairment and dementia. Mood is also impaired in PD most often manifested by depression, but the depression often has atypical anxious features, and the response to standard anti-depressive medications is uncertain. Autonomic function is also impaired in PD with fluctuation in control of blood pressure, gastrointestinal function and urinary bladder emptying. There is often disruption in sleep, sometimes preceding the diagnosis of PD, with various problems including REM behavior sleep disorder, restless legs symptoms, and vivid dreaming. Night-time vivid dreaming sometimes extends into daytime hallucinations, usually of a visual nature, that seem to be precipitated or exacerbated by dopaminergic medications. Although not well studied, many PD patients have complaints of pain, that may represent inadequately treated motor symptoms, but pain is often not responsive to standard dopaminergic medications.

In summary, although the classic and initial features of PD are primarily motoric, a range of symptoms involving cognition, mood, autonomic function, and sleep are also part of the constellation of PD signs and symptoms. While some of the motor features may respond very well to levodopa and other dopaminergic treatment, many of the non-motor features are either non-responsive or are exacerbated by dopaminergic therapies, and currently lack definitive effective treatments. All of the signs and symptoms of PD are progressive in nature and ultimately culminate in significant disability for a majority of people with PD, despite the use of dopaminergic therapies. While mood and sleep disruption may be early features of PD, significant cognitive impairment, autonomic dysfunction, and hallucinations tend to be later features of the illness.

Clinical trials in PD have focused in two major areas: treatments designed to alleviate signs and symptoms in the short run, and treatments designed to modify the long-term progression of the illness. As might be expected, the trial designs and outcome measures for studies addressing these two very different aims are also different. We will review both categories of trials separately.

Study populations
Because of the progressive nature of the disease, randomized trials tightly define the target population depending on the goal of the intervention. In general, clinical trials in PD enroll patients all within the same course of their disease. Frequently, clinical trials enroll one of three groups of patients: 1) early, untreated PD patients; 2) patients who are just initiating dopaminergic therapy; or 3) advanced patients who are experiencing motor fluctuations.

Measurement tools and biomarkers
In clinical trials of short-term improvement with early PD patients the most common primary outcome measure is the Unified Parkinson’s Disease Rating Scale (UPDRS) [1]. This rating scale was developed by expert consensus rather than through a traditional clinicometric process. The UPDRS is divided into three main sections: mentation, activities of daily living, and motor. Recently the UPDRS has been updated and modified as the Movement Disorder Society-UPDRS (MDS-UPDRS) [2], in an attempt to create better clinicometric properties. The classic UPDRS focuses on the traditional motor features (rigidity, bradykinesia, tremor), which were most responsive to levodopa. Hence, as an outcome measure it is most sensitive to improvement in the core or classic PD motoric features. It is not particularly good at assessing other aspects of PD including mood and cognition.

In clinical trials of advanced patients, aimed at reducing ‘off’ time, the most common primary outcome measures are patient-completed diaries, usually on a half hour basis, where the subject indicates whether they are ‘on’ (medication controlling symptoms), ‘off’ (medication not controlling symptoms), or asleep. These diaries are typically collected for 2–3 days before a baseline visit and before subsequent and final visits to assess whether the amount of time spent in the ‘on’ condition has been extended. Additionally, UPDRS scores may be obtained in both the ‘on’ and ‘off’ states to determine if the medication has lessened the severity of symptoms in the ‘off’ condition. Most trials are not attempting to improve the best ‘on’ state.

Clinical trials to address short term improvement in signs and symptoms
As already mentioned, early in the course of PD the classic motor signs predominate the clinical picture.
Table 20.1 Recent randomized, multi-center clinical trials in early PD
Trial | Design | Total sample size | Primary efficacy outcome | Duration
STEP-UP, 1997 [5] | Double-blind, placebo-controlled, parallel-group, multi-arm | 264 | Change in Total UPDRS | 10 wks
Shannon, Bennett, Friedman study of pramipexole, 1997 [6] | Double-blind, placebo-controlled, parallel-group | 335 | Change in ADL and Motor UPDRS | Up to 32 wks
The 056 study of ropinirole, 1998 [7] | Double-blind, parallel-group | 268 | Percentage improvement in motor UPDRS | Interim analysis at 26 wks
TEMPO, 2002 [8] | Double-blind, placebo-controlled, parallel-group, multi-arm | 404 | Change in Total UPDRS | 26 wks
PATCH, 2003 [4] | Double-blind, placebo-controlled, parallel-group, multi-arm | 242 | Change in Motor+ADL UPDRS | 14 wks
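The primary outcomes in Table 20.1 are changes in UPDRS scores from baseline. As a minimal sketch, assuming the total score is the sum of the three main sections and using a hypothetical container class with purely illustrative numbers (not trial data), the change from baseline and the percentage improvement might be computed as follows. Higher UPDRS scores indicate greater impairment, so a negative change is an improvement.

```python
from dataclasses import dataclass


@dataclass
class UpdrsScores:
    """Section scores for one visit (hypothetical container, not a standard format)."""
    mentation: float
    adl: float
    motor: float

    @property
    def total(self) -> float:
        # Assumption: total UPDRS is taken as the sum of the three main sections.
        return self.mentation + self.adl + self.motor


def change_from_baseline(baseline: float, follow_up: float) -> float:
    """Follow-up minus baseline; negative values mean improvement."""
    return follow_up - baseline


def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percentage reduction from baseline; positive values mean improvement."""
    if baseline == 0:
        raise ValueError("percentage improvement is undefined for a baseline of 0")
    return 100.0 * (baseline - follow_up) / baseline


# Illustrative values only, not taken from any trial.
baseline_visit = UpdrsScores(mentation=1, adl=8, motor=20)
week_26_visit = UpdrsScores(mentation=1, adl=6, motor=15)

total_change = change_from_baseline(baseline_visit.total, week_26_visit.total)
motor_improvement = percent_improvement(baseline_visit.motor, week_26_visit.motor)
```

In a trial, these per-patient values would then be compared between treatment arms; the comparison method is not shown here.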
Table 20.2 Recent randomized, multi-center clinical trials of PD motor fluctuations
Trial | Design | Total sample size | Primary efficacy outcome | Duration
SEESAW, 1997 [12] | Double-blind, placebo-controlled, parallel-group | 205 | Change in percentage of ‘on’ time | 24 wks
Lieberman, Ranhosky, Korts (1997) study of pramipexole [13] | Double-blind, placebo-controlled, parallel-group | 360 | Change in ADL UPDRS (average of ‘on’ and ‘off’ ratings), change in motor UPDRS | 32 wks
Lieberman et al (1998) study of ropinirole [14] | Double-blind, placebo-controlled, parallel-group | 149 | Number of patients with 20% or greater decrease in L-dopa dose and 20% or greater reduction in percent time spent ‘off’ | 26 wks
Waters et al (2004) study of Zydis Selegiline [15] | Double-blind, placebo-controlled, parallel-group | 140 | Change in percentage of total daily ‘off’ time | 12 wks
PRESTO, 2005 [11] | Double-blind, placebo-controlled, parallel-group, multi-arm | 472 | Change in total daily ‘off’ time | 26 wks
Rascol, Brooks, Melamed, LARGO, 2005 [17] | Double-blind, placebo-controlled, parallel-group | 687 | Change in total daily ‘off’ time | 18 wks
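Several trials in Table 20.2 use diary-derived ‘on’ or ‘off’ time as the primary outcome. The following is a minimal sketch of how such an outcome might be derived, assuming one entry per half hour over a 24-hour day coded as "on", "off", or "asleep". The function names, the coding, and the choice to express ‘off’ time as a percentage of waking hours are assumptions made for illustration; individual trials define these quantities in their own protocols.

```python
from collections import Counter

SLOT_HOURS = 0.5  # one diary entry per half hour
SLOTS_PER_DAY = 48
VALID_STATES = ("on", "off", "asleep")


def summarize_diary_day(entries):
    """Return hours spent 'on', 'off', and asleep for one 24-hour diary day."""
    if len(entries) != SLOTS_PER_DAY:
        raise ValueError("expected 48 half-hour entries for a 24-hour day")
    unknown = set(entries) - set(VALID_STATES)
    if unknown:
        raise ValueError(f"unrecognized diary states: {sorted(unknown)}")
    counts = Counter(entries)
    return {state: counts[state] * SLOT_HOURS for state in VALID_STATES}


def mean_daily_hours(days, state):
    """Average hours per day in the given state across the diary days."""
    return sum(summarize_diary_day(day)[state] for day in days) / len(days)


def percent_off_while_awake(days):
    """'Off' time as a percentage of waking ('on' plus 'off') time."""
    on_hours = sum(summarize_diary_day(day)["on"] for day in days)
    off_hours = sum(summarize_diary_day(day)["off"] for day in days)
    return 100.0 * off_hours / (on_hours + off_hours)


# Illustrative two-day diaries before the baseline and final visits.
baseline_days = [["asleep"] * 16 + ["off"] * 12 + ["on"] * 20] * 2
final_days = [["asleep"] * 16 + ["off"] * 8 + ["on"] * 24] * 2

change_in_daily_off_hours = (
    mean_daily_hours(final_days, "off") - mean_daily_hours(baseline_days, "off")
)
baseline_percent_off = percent_off_while_awake(baseline_days)
final_percent_off = percent_off_while_awake(final_days)
```

Averaging over the 2–3 diary days before each visit smooths day-to-day variation; a negative change in daily ‘off’ hours corresponds to extended ‘on’ time when sleep duration is unchanged.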
Clinical trials at this stage are largely aimed at improving motor function. Traditionally in clinical practice, levodopa, a potent treatment for the motor signs and symptoms of PD, is delayed until there is significant motor disability. The rationale behind this practice was a concern that levodopa could either hasten the progression of illness, or that the duration of its beneficial effect may be limited and should be preserved until such time as disability warrants therapy. Both of these concerns have subsequently been questioned, but the tradition of delaying dopaminergic therapy in patients with an early diagnosis of PD persists. In this setting, patients who have been identified with idiopathic PD are recruited to test novel interventions which may have anti-parkinsonian effects as monotherapy. Most clinical trial designs have been relatively straightforward and simplistic in this stage of therapeutic development. Usually studies are double-blind, placebo-controlled,
randomized studies of 13–26 weeks in duration. For most effective treatments, anti-parkinsonian efficacy can be measured within 2–4 weeks [3–4].

Later in PD, after the initiation of dopaminergic therapy, motor fluctuations develop. As described above, these fluctuations consist both of wearing off of the beneficial effect of dopaminergic medications and the emergence of involuntary movements called dyskinesias. Most studies in this stage of PD have focused on reducing the amount of time when dopaminergic medications fail to produce clinical benefit. In the vocabulary of this disease, such time is referred to as ‘off’ time, as opposed to the time when the dopaminergic medications are working well and largely alleviating symptoms, which is referred to as ‘on’ time. In general, such trials are not attempting to improve the magnitude of benefit that is achieved when someone is ‘on,’ but just to extend the time spent in the ‘on’ condition. Again the trial designs here are relatively straightforward, usually double-blind, placebo-controlled, randomized studies of 13–26 weeks duration. Again, beneficial effects are usually observed within the first month or so of treatment [10–11].

Despite the use of dopaminergic medications and agents which have been shown to extend and improve ‘on’ time (monoamine-oxidase inhibitors, catechol-O-methyl transferase inhibitors), patients with PD continue to have an inconsistent response to optimized therapy. Such patients may be characterized by continued unpredictable ‘off’ periods as well as by potentially disabling involuntary movements or dyskinesias. Such patients typically have tried multiple medications yet have continued disability associated with their erratic response to dopaminergic medications. The source of this erratic response is not well understood but may be partially explained by erratic absorption of levodopa from the gastrointestinal tract. In such patients invasive procedures have been considered, and include direct infusion of levodopa intravenously or intra-intestinally (both of which seem to improve motor fluctuations) and intracranial operations. Destructive lesions, including pallidotomies, were briefly of interest, but electrode placement and subsequent deep brain stimulation (DBS) have been shown to be effective in reducing ‘off’ time and reducing the severity of ‘off’ periods, based on UPDRS scores [18–20]. Other more experimental procedures have been considered including fetal nigral tissue transplantation, viral vector-based gene therapy, and stem cell therapy. The methodology of such clinical trials is complex and controversial. A randomized, controlled trial of DBS versus best medical management was done in a largely open-label format. Trials of infusions of levodopa preparations have only been double-blinded in a small subset of patients. The intracranial procedures to introduce fetal nigral grafts and viral vector-based gene therapy have been both open label and double-blind. The double-blind trials have used sham surgical approaches to maintain the blinding. This use of sham surgery has been highly controversial, but has had methodological rigor. In sham-controlled trials of fetal nigral transplantation and of gene therapy, the initial open-label indications of efficacy were not replicated. Still, the field remains somewhat uncertain as to the best methodological approach for such trials [21–30].

A different therapeutic approach to motor complications is to attempt to prevent them from developing in the first place. One school of thought in PD treatment has been that the intermittent pulsatile nature of levodopa administration, with peaks and valleys of blood concentrations, may contribute to the development of motor complications. A reasonable hypothesis has been made that sustained release preparations of levodopa may have a favorable impact on motor complications. Another approach would be to give a more sustained delivery of dopaminergic treatment, such as initiating treatment with a dopamine agonist or the combination of levodopa with a dopamine agonist. In such clinical trials, patients would then be followed over time for the emergence of motor complications: dyskinesias, wearing off, or both. Patients recruited for such trials are generally those just initiating dopaminergic therapy. Clinical trials of controlled-release levodopa preparations [31–32] failed to show a difference in the emergence of complications in patients followed for 5 years. In contrast, trials that compared dopamine agonists as initial therapy vs. levodopa as initial therapy have largely shown a reduction in the rate of emergence of complications, despite the eventual addition of levodopa to those individuals initially receiving dopamine agonists [33–36]. There is also evidence to suggest that the initiation of levodopa with a dopamine agonist as opposed to levodopa alone will lead to fewer complications, although methodological rigor is less consistent in these studies [37–39]. Recently a trial compared the initiation of levodopa with the initiation of levodopa plus a COMT inhibitor, and in contrast to expectations, initiation of levodopa with a COMT inhibitor led to a higher rate of dyskinesias [40]. Some of the difficulties in such clinical trials are how to define the onset of dyskinesias
and wearing off. Dyskinesias themselves are often not noticeable in their early stages by patients, although physicians and spouses may identify them more easily. The contribution of mild dyskinesias to disability in the early stage is also unclear. Wearing off, on the other hand, is often appreciable by a research subject, but the symptoms of wearing off are not always motoric and may be more related to somatic discomfort, autonomic dysfunction (particularly gastrointestinal) and mood. Hence, definitively timing the onset of subtle motor complications is difficult, and the implications for subsequent disability are not straightforward.

Clinical trials to modify disease progression
Perhaps more compelling to researchers, patients and families are trials that are designed to detect the ability of interventions to slow the progression of PD. As outlined previously, while motor and (perhaps) cognitive dysfunction are early manifestations of PD, there is a broad range of symptoms and signs that emerge as PD progresses. Motoric dysfunction, loss of ambulatory capability, cognitive impairment, mood disruption, and autonomic dysfunction all eventually contribute to potentially severe disability in individuals with advanced PD. Interventions which could delay the progression of any of these features as well as the cumulative disability would be of great advantage to patients and families and would have an enormous global public health impact. However, such trials are associated with significantly more methodological difficulty for several reasons. Firstly, differentiating long-term changes in disease trajectory from short-term improvement in signs and symptoms is not easy or straightforward. Secondly, such trials by their nature will need to be long in duration, certainly months and usually years. Lastly, selecting outcome measures is problematic given the diverse nature of PD signs and symptoms, and the uncertainty about acceptable ways to measure disability and health related quality-of-life. In response to this, somewhat paradoxically, most studies to assess disease modification have been conducted in individuals with early and untreated PD. Other trial designs are emerging to study disease modifying effects in more advanced PD patients.

The UPDRS is also the traditional outcome measure used in studies designed to assess change in disease progression. Before the initiation of dopaminergic treatment, UPDRS scores deteriorate close to 10 points per year. This has been relatively consistent over time although recently the rate of annual decline seems to have abated somewhat [41]. The reasons for this are unclear but may be related to how the UPDRS scale is actually being used or due to variability, rather than any underlying change in PD progression. Electronic monitoring devices may provide a more quantitative measure of motor function with less variability than the motor score from the UPDRS rating scale. The validation of such instruments and feasibility of incorporating these instruments into a clinical trial is under way [42]. Another way of assessing PD progression has been the time from randomization until the need for initiation of dopaminergic therapy based on emerging motor disability [43]. Such a survival endpoint is useful in trial design; however, the decision about what constitutes sufficient disability to warrant initiation of dopaminergic treatment has certainly changed over time. As researchers and clinicians become less convinced that the delaying of initiation of dopaminergic therapy is important, progressively smaller decrements in motor function seem to be sufficient to warrant dopaminergic therapy. Many clinicians feel that there is no need to delay dopaminergic therapy once the diagnosis has been made. Hence, using the rate of progression of UPDRS in the ‘untreated’ state or the time until need of dopaminergic therapy may represent problematic outcome measures for future trials.

Other ways of assessing the progression of PD have included the time from randomization until the development of motor fluctuations [34]. However, there is concern about the incidence of motor complications as a manifestation of underlying PD progression. Motor fluctuations are thought to be at least in part due to the use of dopaminergic medications, and certainly can be modified by adjustments in dopaminergic medications, raising the possibility that the manifestations are not ones of PD progression but rather the manipulation of pharmacologic treatments.

Trials are just emerging that focus on the development of overall disability in PD, rather than measuring impairments in any particular domain such as motor function or cognitive impairment. The UK based PD-MED and PD-SURG [19] trials used the PDQ-39 and EQ5D, PD-specific and generic health related quality-of-life scales, as primary outcome measures. Health related quality-of-life is not necessarily a direct measurement of disability but reflects an individual’s perception of their overall functioning. Other potential measures of disability in PD could
include measurement of ambulatory capability and ability to function in the home or work environment. Although the Schwab and England ADL scale was initially developed with this in mind, its clinicometric properties are not well studied, and it tends to focus on motor function. Validated overall measures of disability in PD based on motor and non-motor features are being developed. In response to this lack of single measurement of disability or overall PD severity, analytic strategies such as global statistical tests have been proposed. In such an approach, rather than selecting a single primary outcome measure, multiple outcome measures assessing relevant domains are used. They are then analyzed in a single synthetic way using a global statistical test that takes into account an individual’s performance on a series of measures [45]. This seems like a reasonable and fairly efficient approach until a single, more global measurement of functioning in PD can be established.

We have largely used the clinical manifestations of PD as a measurement of the underlying progression of the illness. This approach is fraught with difficulties since there are so many treatments to modify the expression of the clinical features which we don’t believe have any impact on the underlying disease (although this may be a mistake). Ideally we would have some way of assessing the extent and progression of the disease process, whether that is a measure of neuronal atrophy and death, a loss of important physiological compensatory mechanisms, impairment of glial function or structure, or all of the above. Unfortunately such a biomarker of disease progression is lacking. Attempts at developing imaging biomarkers including those that measure the dopamine transporter, the metabolic capacity of dopaminergic neurons, vesicular transporter mechanisms, or pre-synaptic receptor binding have not emerged as effective outcome tools in the setting of clinical trials of PD progression. Measurements of the deposition of alpha synuclein protein, or of inflammation, are other approaches to measuring disease progression in PD, but are largely experimental at this time. There have been analogous attempts to measure changes in body fluid (blood, CSF, urine) markers including alpha-synuclein, indices of cell death, and indices of oxidative stress. While some of these appear promising, none have emerged as an effective biomarker in the context of PD trials. While it will be a very long road for any of these biomarkers

and complementary evidence in the context of studies relying on clinical measures could give us a better framework to understand the clinical effects we observe in trials.

Trial design
Confirmatory (phase 3) clinical trials in PD typically require long-term follow-up (months or years for trials of disease modification) and relatively large sample sizes for definitive evidence of efficacy. As such, it is sensible to perform pilot testing of potential agents. Ideally, phase 2 testing will be conducted quickly in a small sample of patients, so as not to delay the drug development process. Phase 2 tests of potential disease modifying agents in PD have relied on short-term improvement (observed over 1 year or less) in order to complete a phase 2 study within a reasonable time frame. This prompts the question of whether such short-term improvement is purely symptomatic or truly reflecting disease modification. An examination of 1 and 3 month time points compared to 12 months can help alleviate this concern. Moreover, as alluded to before, in order to be able to detect short-term change, phase 2 testing is frequently done in an early, untreated PD sample, making the generalizability of results to treated patients more difficult [46].

Phase 2 or pilot testing of several potential disease modifying agents has been done using a futility design [47]. This design is frequently used in cancer clinical trials to quickly rule out clearly ineffective treatments. Rather than testing for efficacy, each treatment arm is compared to a futility threshold (a predefined maximum worsening to warrant further study of the drug). If the drug is worse than this threshold, then the drug would be discarded as futile. Failure to reject the null hypothesis would imply that further study of the drug should be undertaken in a phase 3 setting. Cancer futility designs are typically single-armed studies in which there is no placebo group, thereby reducing the overall study sample size. However, given the variability of the placebo rate in PD and changes in practice over time, it is advisable to include a concurrent placebo control group and test the futility hypothesis as a two-group comparison. Phase 2 dose-ranging trials (where the maximum tolerated dose is already known) can be performed as a test of linear trend (of the doses) [48]. Two-stage selection/futility designs have been proposed to
to be accepted as surrogate endpoints in the context of select among more than one dose arm in the phase
disease progression studies, using them as supportive 2 setting and are in use in other areas of neurology
[49–50]. Because of the long-term follow-up required for pilot trials in PD, pilot designs that incorporate early stopping rules are not as efficient from a sample size perspective as in disease areas where follow-up time is short. This is because most patients have been enrolled by the time the first cohort completes follow-up or recruitment must be halted after enrollment of the first cohort. All of these designs require a smaller sample size (with the same power) compared to a study designed to test the null hypothesis of equal efficacy in a two-group comparison (treatment vs. placebo). For a pilot study, a smaller sample size can be achieved by setting the false positive rate (alpha) at 0.1 or 0.25 (rather than the conventional 0.05); this is justified since the drug will be tested again in a confirmatory trial [51]. Phase 2 trials also address safety, tolerability, and feasibility issues.
Multiple trial designs have been proposed and used in studies to assess disease modification in PD. The original DATATOP [52] used a 2 × 2 factorial design and randomized subjects to deprenyl, tocopherol, the combination, or placebo. A factorial design permits an evaluation of the interaction of the two drugs. Under the assumption that there is no significant drug interaction, the design is more efficient in terms of sample size than testing each drug alone. The groups were followed over time regarding the development of sufficient disability to warrant dopaminergic therapy. Similar double-blind placebo controlled, parallel-group studies have been conducted using agents including Co-enzyme Q10, CEP-1347 [53] and TCH-346 [54]. These studies used either change in UPDRS or time until development of disability warranting dopaminergic therapy as an outcome measure. The deprenyl and Co-enzyme Q10 studies suggested a beneficial effect of the intervention studied. However, the apparent change in the long-term course of illness may have been a manifestation of short-term improvement in PD signs and symptoms (shifting the progression curve to the left) that confounded the ability to detect a change in the long-term course of PD. Post-hoc disease progression modeling studies [55] suggest that there is both a short-term symptomatic improvement and long-term disease modification in the deprenyl studies, but such post-hoc analyses need to be interpreted cautiously. In an attempt to address this issue of short-term symptomatic vs. disease modification effects, two-period designs have been proposed.
Two-period designs are those in which, after an initial period of randomization to active and placebo, both groups then experience a second period where they are on the same treatment status [56]. In a delayed start design the two groups are initially randomized to active and placebo, and in the second period the placebo group begins on active medication while the active group continues on the original assignment. At the end of the second period, any difference that exists between the two groups is hard to explain as short-term symptomatic improvement alone since both groups are on the medication in the final period. An alternate approach is to randomize subjects to active and placebo and then withdraw the active in the second period, the so-called withdrawal design. The advantage of this design is that in the second period neither group is on active medications and if any difference persists it is more likely to represent a more ‘structural,’ or disease-modifying effect, than a symptomatic effect. On the other hand, study designs which require both groups to be on extended periods of placebo intervention are problematic, and determining the adequate length of the withdrawal period is difficult. A recent example of a delayed start design with rasagiline has been published [57], which suggests that the 1 mg dosage is of benefit regarding disease modification, but the 2 mg failed. The reasons for the differences in these two responses will likely be the source of speculation for some time, and help to point out some of the methodological difficulties and uncertainties regarding two-period design clinical trials.
Several statistical analyses of hypotheses of disease modification are possible with two-period designs. The FDA is currently considering the methodological merit of various approaches [58]. In the delayed start design, three analyses of the primary outcome (e.g., total UPDRS change) are of interest: 1) within period 1, the rate of change in the active group is less than (slower than) the placebo group; 2) the early-start group has a lower change from baseline compared to the delayed-start group (patients who received the drug early have a better final outcome than those starting drug later); 3) the rate of change within period 2 in the early-start group is the same as the rate of the delayed-start group as demonstrated by a test of non-inferiority.
Other trial designs have not focused on trying to differentiate short-term symptomatic improvement from long-term disease modification regarding the clinical features of PD, but instead have focused on determining if there is a long-term change in disability. Long-term disability trials do not necessarily try to imply the mechanism of reduction in disability (i.e., they do
not necessarily try to differentiate symptomatic from disease-modifying effects) but focus on the ultimate outcome of subjects randomized to different treatment strategies. In the long-term trial planned by the NET-PD group over 1700 subjects were randomized to receive either placebo or creatine 10 g per day and will be followed for a minimum of 5 years. During the time of randomized treatment, the subjects may receive any other PD treatments available including surgery. The primary outcome is a set of outcome measures designed to assess disability. The Symbol Digit Test is used to assess cognition, the Rankin Scale to assess overall functional capability, the Schwab and England Scale to assess PD function, selected UPDRS scores to assess ambulatory capability, and the PDQ-39, a PD-specific health-related quality-of-life instrument, to assess quality of life. These outcomes will be analyzed using a global statistical test comparing those receiving creatine with those receiving placebo to determine the overall disability at the end of a minimum of 5 years of treatment. If a single overall disability scale were to emerge in the meantime, it could be proposed as an alternative primary outcome measure.
Lastly, large pragmatic trials of different treatment strategies, or comparative effectiveness trials, have been proposed. The PD-MED and PD-SURG [19] trials in the UK are examples of these. Subjects are randomized to different treatment strategies without the use of a placebo control arm. Follow-up assessment is relatively brief and infrequent, and the primary outcome measure is the PDQ-39, a health-related quality-of-life instrument. Entry criteria are few and the trial can be conducted by both neurologists and gerontologists. The advantage of large pragmatic trials is that their external validity or generalizability is likely higher than that of more explanatory trials. This means that their results are likely more directly applicable to the management of community-based patients with PD. The mechanism of impact of interventions in such trials will not be clear, but that may not be needed in order to choose a treatment strategy which will be most beneficial in the long run. Comparative effectiveness trials largely fall into this type of clinical trial approach.
Standards for efficacy and special safety concerns
Several particular safety concerns have emerged in the context of PD clinical trials. In the early studies with dopamine agonists a phenomenon originally called ‘sleep attacks’ was identified. Individuals who denied prior sleepiness reported falling asleep suddenly, for example while driving. In follow-up of this phenomenon it appears that in fact subjects were drowsy but were unaware of this due to the chronic nature of the drowsiness associated with the treatments. Dopaminergic therapies in general can cause a mild degree of sedation which can lead to excessive daytime sleepiness and to episodes of falling asleep. Hence measuring the extent of excessive daytime sleepiness, also present in unmedicated PD, is important in the context of clinical trials. The Epworth Sleepiness Scale has emerged as a reliable standard tool to measure symptoms of daytime sleepiness in individuals with PD [59]. A cutoff point of 10 is usually suggested as the threshold for excessive sleepiness.
Early in the development of levodopa there was a concern that it could induce or promote melanomas or the proliferation of pigmented skin cells. While several epidemiological studies have not confirmed an increased risk for melanomas with dopaminergic medications, it is clear that PD populations are at about a 2–3-fold risk of melanomas compared to the population without PD. This is distinctive as the risk of other cancers in PD is normal or slightly lower. In the context of clinical trials, increased surveillance for melanomas and other skin cancers is warranted.
While the phenomena of impulse control disorders, especially pathological gambling and inappropriate sexual behavior, have been reported for decades in the context of PD, a recent resurgence of interest in this came from the apparent increase in these behaviors with the co-administration of dopamine agonists and levodopa. Impulse control disorder assessment tools (including the modified Minnesota Impulse Disorders Interview (mMIDI) [60–61] and the Questionnaire for Impulsive-Compulsive Behaviors in Parkinson’s Disease (QUIP) [62]) may be useful in the context of clinical trials to assess baseline and subsequent change in these behaviors. This kind of adverse event raises the issue of low frequency events that are of high importance but that are difficult to measure in the context of clinical trials. These instruments are based on self-reporting and many of these behaviors are considered inappropriate or potentially shameful; hence patients and families are less likely to be willing to report them. Active investigation for such events is necessary.
Lastly, although suicide is an unusual event in PD there is continued interest in observing for the occurrence of suicidal behaviors or suicide. The Beck Depression Inventory (BDI-II) is a widely used screening measure of depression, and it includes a question
on suicidal thoughts [63]. The Columbia Suicide Severity Rating Scale has been developed by the FDA in collaboration with academic investigators. It has been used in trials of PD with relative ease.
Challenges/controversies
One of the biggest controversies in trials of disease modification of PD is the appropriate patient population. While many of the studies have focused on early untreated PD, it remains uncertain as to whether there is truly a population of patients who are early and appropriately not treated any longer. Some investigators feel that early initiation of dopaminergic therapy is warranted for the best long-term outcome, whereas others still believe that delaying initiation of dopaminergic therapy until significant motor disability has emerged is the best course. No clear standard has emerged for the design of trials to assess slowing disease progression, although the FDA and other investigators are working on models of disease progression as one approach. This methodological uncertainty is not present for trials of symptom improvement. In the setting of this controversy it has been increasingly difficult to identify and recruit subjects in this ‘early untreated’ state to clinical trials. This also raises the question of equitable access to and knowledge about clinical trials for patients and families with PD. Many organizations have attempted to increase the awareness of clinical trials so that those who are interested can participate, but dissemination of information about clinical trials remains suboptimal, in particular among patients of racial or ethnic minority groups [64].
The use of sham controls in high-risk or high-intensity intervention studies remains controversial. The rigor of placebo or sham controlled trials has been useful in identifying ineffective therapies thought to be effective in unblinded studies. Fetal nigral tissue transplantation was essentially halted until aspects of it can be optimized. It is likely that gene therapy and stem cell treatments will be subjected to similarly rigorous trial designs. Seeking information about attitudes and beliefs of subjects and their families, as well as researchers, on the appropriateness of such sham control studies is important [65–66].
Studies of disease modification in PD will be long term by definition. In all long-term trials some subjects will stop participation before the intended completion. Hence, there will always be missing data from clinical trials. Alternatively, subjects may participate in clinical trials but elect to no longer take the study medication. They will contribute data but it will not be data derived from the treated state. There exists significant controversy about how to deal with truly missing data. Methodologically sound clinical trials will perform an intent-to-treat analysis, whereby all patients who are randomized are included in the primary analysis. Patients for whom data are not available will have their missing data imputed (or assigned, in some accepted way). Historically, PD patients participating in clinical trials are adherent to study medications and drop-out rates are low. However, because of the progressive nature of the disease it is difficult for many PD patients to remain untreated or remain on a stable dose of dopaminergic therapy for the duration of trial follow-up. The presence of missing data poses a challenge to the conduct of two-period designs (delayed start or withdrawal designs) which may be increasingly used in disease modification trials. Using a delayed start design (rather than a withdrawal design) may circumvent the problem that patients are more likely to discontinue under withdrawal. In early trials that are measuring time to need for levodopa or UPDRS scores, subjects that require additional therapy will not be able to contribute data beyond that point. Hence, they also contribute to the missing data pool. How to handle such missing data and impute data to replace it is an area of active investigation. Frequently this is done by simply carrying forward the patient’s score at the last visit in which the patient initiated dopaminergic therapy or required additional therapy. Better approaches to impute these types of missing data are statistically complex and, as yet, remain an area of continued theoretical research. While some PD clinical trials have discontinued follow-up of patients once they are no longer able to follow the study protocol, there is an emerging consensus that subjects who are in a trial but discontinue the study drug should continue with all protocol assessments. The data generated are not from the active intervention, but are data which reflect the status of the subject in any case. Data in study subjects off study medication are likely to be much more informative than any imputed data that replace missing data.
References
1. Fahn S and Elton RL, Members of the UPDRS Development Committee. The Unified Parkinson’s Disease Rating Scale. In: Fahn S, Marsden CD, Calne DB and Goldstein M, editors. Recent Developments in Parkinson’s disease, Vol 2. Florham Park, NJ, Macmillan Health Care Information. 1987; 153–164: 293–304.
2. Goetz CG, Tilley BC, Shaftman SR, et al. Movement Disorders Society-sponsored revision of the Unified Parkinson’s Disease Rating Scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov Disord 2008; 23: 2129–70.
3. Parkinson Study Group. Safety and efficacy of pramipexole in early Parkinson disease. A randomized dose-ranging study. JAMA 1997; 278: 125–30.
4. Parkinson Study Group. A controlled trial of rotigotine monotherapy in early Parkinson’s disease. Arch Neurol 2003; 60: 1721–8.
5. Parkinson Study Group. Safety and efficacy of pramipexole in early Parkinson disease. A randomized dose-ranging study. JAMA 1997; 278: 125–30.
6. Shannon KM, Bennett JP, Jr and Friedman JH. Efficacy of pramipexole, a novel dopamine agonist, as monotherapy in mild to moderate Parkinson’s disease. The Pramipexole Study Group. Neurology 1997; 49: 724–8.
7. Rascol O, Brooks DJ, Brunt ER, et al. Ropinirole in the treatment of early Parkinson’s disease: a 6-month interim report of a 5-year levodopa-controlled study. 056 Study Group. Mov Disord 1998; 13: 39–45.
8. Parkinson Study Group. A controlled trial of rasagiline in early Parkinson disease: the TEMPO study. Arch Neurol 2002; 59: 1937–43.
10. Parkinson Study Group. A controlled, randomized, delayed-start study of rasagiline in early Parkinson disease. Arch Neurol 2004; 61: 561–6.
11. Parkinson Study Group. A randomized placebo-controlled trial of rasagiline in levodopa-treated patients with Parkinson disease and motor fluctuations. The PRESTO Study. Arch Neurol 2005; 62: 241–8.
12. Parkinson Study Group. Entacapone improves motor fluctuations in levodopa-treated Parkinson’s disease patients. Ann Neurol 1997; 42: 747–55.
13. Lieberman A, Ranhosky A and Korts D. Clinical evaluation of pramipexole in advanced Parkinson’s disease: results of a double-blind, placebo-controlled, parallel-group study. Neurology 1997; 49: 162–8.
14. Lieberman A, Olanow CW, Sethi K, et al. A multicenter trial of Ropinirole as adjunct treatment for Parkinson’s disease. Ropinirole Study Group. Neurology 1998; 51: 1057–62.
15. Waters CH, Sethi KD, Hauser RA, et al. Zydis Selegiline reduces off time in Parkinson’s disease patients with motor fluctuations: a 3-month, randomized, placebo-controlled study. Mov Disord 2004; 19: 426–32.
17. Rascol O, Brooks DJ, Melamed E, et al. Rasagiline as an adjunct to levodopa in patients with Parkinson’s disease and motor fluctuations (LARGO, Lasting effect in Adjunct therapy with Rasagiline Given Once daily, study): a randomized, double-blind, parallel-group trial. Lancet 2005; 365: 947–54.
18. Weaver FM, Follett K, Stern M, et al. Bilateral deep brain stimulation vs. best medical therapy for patients with advanced Parkinson disease: A randomized controlled trial. JAMA 2009; 301: 63–73.
19. Williams A, Gill S, Varma T, et al. Deep brain stimulation plus best medical therapy versus best medical therapy alone for advanced Parkinson’s disease (PD SURG trial): a randomized, open-label trial. Lancet Neurol 2010; 9: 581–91.
20. Follett KA, Weaver FM, Stern M, et al. Pallidal versus subthalamic deep-brain stimulation for Parkinson’s disease. NEJM 2010; 362: 2077–91.
21. Freed CR, Breeze RE, Rosenberg NL, et al. Survival of implanted fetal dopamine cells and neurologic improvement 12 to 46 months after transplantation for Parkinson’s disease. N Engl J Med 1992; 327: 1549–55.
22. Hauser RA, Freeman TB, Snow BJ, et al. Long-term evaluation of bilateral fetal nigral transplantation in Parkinson’s disease. Arch Neurol 1999; 56: 179–87.
23. Schumacher JM, Ellias SA, Palmer EP, et al. Transplantation of embryonic porcine mesencephalic tissue in patients with PD. Neurology 2000; 54: 1042–50.
24. Patel NK, Bunnage M, Plaha P, et al. Intraputamenal infusion of glial cell line-derived neurotrophic factor in PD: a two-year outcome study. Ann Neurol 2005; 57: 298–302.
25. Slevin JT, Gash DM, Smith CD, et al. Unilateral intraputaminal glial cell-line derived neurotrophic factor in patients with Parkinson’s disease: response to 1 year of treatment and 1 year of withdrawal. J Neurosurg 2007; 106: 614–20.
26. Stover NP, Bakay RAE, Subramanian T, et al. Intrastriatal implantation of human retinal pigment epithelial cells attached to microcarriers in advanced Parkinson’s disease. Arch Neurol 2005; 62: 1833–7.
27. Marks WJ, Ostrem JL, Verhagen L, et al. Safety and tolerability of intraputaminal delivery of CERE-120 (Adeno-associated virus serotype 2-neurturin) to patients with idiopathic Parkinson’s disease: an open-label, phase I trial. Lancet Neurol 2008; 7: 400–8.
28. Freed CR, Greene PE, Breeze RE, et al. Transplantation of embryonic dopamine neurons for severe Parkinson’s disease. N Engl J Med 2001; 344: 710–19.
29. Olanow CW, Goetz CG, Kordower JH, et al. A double-blind controlled trial of bilateral fetal nigral transplantation in Parkinson’s disease. Ann Neurol 2003; 54: 403–14.
30. Lang AE, Gill S, Patel NK, et al. Randomized controlled trial of intraputaminal glial cell line-derived neurotrophic factor infusion in Parkinson’s disease. Ann Neurol 2006; 59: 459–66.
31. Koller WC, Hutton JT, Tolosa E, Capilldeo R, and the Carbidopa/Levodopa Study Group. Immediate-release and controlled-release carbidopa/levodopa in PD. A 5-year randomized multicenter study. Neurology 1999; 53: 1012–19.
32. Dupont E, Andersen A, Boas J, et al. Sustained-release Madopar HBS compared with standard Madopar in the long-term treatment of de novo parkinsonian patients. Acta Neurol Scand 1996; 93: 14–20.
33. Rinne UK, Bracco F, Chouza C, et al. Early treatment of Parkinson’s disease with Cabergoline delays the onset of motor complications. Results of a double-blind levodopa controlled trial. Drugs 1988; 55 (Suppl): 23–9.
34. Parkinson Study Group. Pramipexole vs. levodopa as initial treatment for Parkinson disease. A randomized controlled trial. JAMA 2000; 284: 1931–8.
35. Rascol O, Brooks DJ, Korczyn AD, et al. A five-year study of the incidence of dyskinesias in patients with early Parkinson’s disease who were treated with Ropinirole or levodopa. N Engl J Med 2000; 342: 1484–91.
36. Oertel WH, Wolters E, Sampaio C, et al. Pergolide versus levodopa monotherapy in early Parkinson’s disease patients: the PELMOPET study. Mov Disord 2006; 21: 343–353.
37. Allain H, Destee A, Petit H, et al. Five-year follow-up of early lisuride and levodopa monotherapy in de novo Parkinson’s disease. The French Lisuride Study Group. Eur Neurol 2000; 44: 22–30.
38. Rinne UK. Lisuride, a dopamine agonist in the treatment of early Parkinson’s disease. Neurology 1989; 39: 336–9.
39. Nakanishi T, Iawata M, Goto I, et al. Nation-wide collaborative study on the long-term effects of Bromocriptine in the treatment of parkinsonian patients. Final report. Eur Neurol 1992; 32(Suppl 1): 9–22.
40. Olanow CW, Hauser RA, Jankovic J, et al. A randomized, double-blind, placebo-controlled, delayed start study to assess rasagiline as a disease modifying therapy in Parkinson’s disease (the ADAGIO study): rationale, design, and baseline characteristics. Mov Disord 2008; 23: 2194–201.
41. The NINDS NET-PD Investigators. A randomized clinical trial of coenzyme Q10 and GPI-1485 in early Parkinson’s disease. Neurology 2007; 68: 20–8.
42. Goetz CG, Stebbins GT, Wolf D, et al. Testing objective measures of motor impairment in early Parkinson’s disease: Feasibility study of an at-home testing device. Mov Disord 2009; 24: 551–6.
43. Parkinson Study Group. DATATOP: A multicenter controlled clinical trial in early Parkinson’s disease. Arch Neurol 1989; 46: 1052–60.
45. Huang P, Goetz CG, Woolson RF, et al. Using global statistical tests in long-term Parkinson’s disease clinical trials. Mov Disord 2009; 24: 1732–39.
46. Elm JJ, Goetz CG, Tilley B, et al. A responsive outcome for Parkinson’s disease neuroprotection futility studies. Ann Neurol 2005; 57: 197–203.
47. Tilley BC, Palesch YY, Kieburtz K, et al. Optimizing the ongoing search for new treatments for Parkinson disease: Using futility designs. Neurology 2006; 66: 628–33.
48. Shults CW, Oakes D, Kieburtz K, et al. Effects of Coenzyme Q10 in early Parkinson Disease. Evidence of slowing the functional decline. Arch Neurol 2002; 59: 1541–50.
49. Levy G, Kaufmann P, Buchsbaum R, et al. A two-stage design for a phase II clinical trial of coenzyme Q10 in ALS. Neurology 2006; 66: 660–3.
50. Cheung YK, Gordon PH and Levin B. Selecting promising ALS therapies in clinical trials. Neurology 2006; 67: 1748–51.
51. Schoenfeld D. Statistical considerations for pilot studies. Int J Radiat Oncol Biol Phys 1980; 6: 371–4.
52. Parkinson Study Group. Effects of tocopherol and deprenyl on the progression of disability in early Parkinson’s disease. N Engl J Med 1993; 328: 176–83.
53. Parkinson Study Group PRECEPT Investigators. Mixed lineage kinase inhibitor CEP-1347 fails to delay disability in early Parkinson disease. Neurology 2007; 69: 1480–90.
54. Olanow W, Schapira AHV, LeWitt PA, et al. TCH346 as a neuroprotective drug in Parkinson’s disease: a double-blind, randomized, controlled trial. Lancet Neurol 2006; 5: 1013–20.
55. Holford NHG, Chan PL, Nutt JG, et al. Disease progression and pharmacodynamics in Parkinson disease – Evidence for functional protection with levodopa and other treatments. J Pharmacokinet Pharmacodyn 2006; 33: 281–311.
56. McDermott MP, Hall WJ, Oakes D, et al. Design and analysis of two-period studies of potentially disease-modifying treatments. Control Clin Trials 2002; 23: 635–49.
57. Olanow CW, Rascol O, Hauser R, et al. A double-blind, delayed-start trial of Rasagiline in Parkinson’s disease. N Engl J Med 2009; 361: 1268–78.
58. Bhattaram VA, Siddiqui O, Kapcala LP, et al. Endpoints and analyses to discern disease-modifying drug effects in early Parkinson’s disease. AAPS J 2009; 11: 456–64.
59. Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep 1991; 14: 540–45.
60. Christenson GA, Faber RJ, and deZwaan M. Compulsive buying: descriptive characteristics and psychiatric comorbidity. J Clin Psychiatry 1994; 55: 5–11.
61. Weintraub D, Siderowf AD, Potenza MN, et al. Association of dopamine agonist use with impulse control disorders in Parkinson disease. Arch Neurol 2006; 63: 969–73.
62. Weintraub D, Stewart S, Shea JA, et al. Validation of the Questionnaire for Impulsive-Compulsive Behaviors in Parkinson’s Disease (QUIP). Mov Disord 2009; 24: 1461–67.
63. Beck AT, Steer RA, and Brown GK. Manual for Beck Depression Inventory II. San Antonio, TX, Psychological Corporation, 1996.
64. Schneider MG, Swearingen CJ, Shulman LM, et al. Minority enrollment in Parkinson’s disease clinical trials. Parkinsonism Relat Disord 2009; 15: 258–62.
65. Kim SYH, Frank S, Holloway R, et al. Science and ethics of sham surgery. A survey of Parkinson disease clinical researchers. Arch Neurol 2005; 62: 1357–60.
66. Kim SYH, Holloway RG, Frank S, et al. Volunteering for early phase gene transfer research in Parkinson disease. Neurol 2006; 66: 1010–15.
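As an illustration of the two-group futility comparison described in this chapter, the following is a minimal sketch rather than the analysis used in any cited trial. It assumes that higher scores mean more worsening, that the futility threshold is expressed as a minimum benefit over placebo (the hypothetical parameter `min_benefit`), and that a normal approximation is adequate. The function name and the data are invented for illustration. The one-sided alpha of 0.10 follows the relaxed false positive rate discussed for pilot studies.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def futility_test(treated, placebo, min_benefit, alpha=0.10):
    """One-sided two-group futility test on worsening scores (higher = worse).

    H0: the treated arm worsens at least `min_benefit` points less than placebo.
    Rejecting H0 declares the drug futile; otherwise it may proceed to phase 3.
    `min_benefit` and the normal approximation are assumptions of this sketch.
    """
    diff = mean(treated) - mean(placebo)  # negative values favour the drug
    se = sqrt(variance(treated) / len(treated) + variance(placebo) / len(placebo))
    z = (diff + min_benefit) / se  # how far the result falls short of the threshold
    p_value = 1.0 - NormalDist().cdf(z)
    return {"diff": diff, "z": z, "p_value": p_value, "futile": p_value < alpha}


# Hypothetical 12-month worsening scores (invented data, not from any trial).
placebo = [8, 10, 12, 9, 11, 10, 9, 11, 10, 10]
no_effect = [9, 10, 11, 10, 12, 9, 10, 11, 10, 8]
slower = [4, 6, 8, 5, 7, 6, 5, 7, 6, 6]

print(futility_test(no_effect, placebo, min_benefit=3)["futile"])
print(futility_test(slower, placebo, min_benefit=3)["futile"])
```

In this sketch an arm that worsens as much as placebo is declared futile, while an arm that worsens 4 points less is not, so it would be carried forward for confirmatory testing.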
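The simple imputation mentioned in the missing data discussion of this chapter, carrying forward the score from the last visit at which data were contributed, can be sketched as follows. This is only an illustration of last observation carried forward; the function name `locf` and the visit scores are hypothetical, and the chapter itself notes that better approaches remain an area of research.

```python
def locf(scores):
    """Fill missing visits (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)  # stays None until a first score is observed
    return filled


# Hypothetical total UPDRS scores at five visits; the subject started
# dopaminergic therapy at visit 3 and contributed no further data.
visits = [20, 24, 27, None, None]
print(locf(visits))  # [20, 24, 27, 27, 27]
```

Because the carried-forward value freezes the subject at the visit where therapy was started, any later progression is ignored, which is one reason continued follow-up off study medication is preferred over imputation.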
Chapter 21
Alzheimer’s disease
Joshua D. Grill and Jeffrey Cummings
Biological basis for therapies
This chapter will provide an overview of the fundamentals of clinical trials in Alzheimer’s disease (AD). A basic understanding of disease biology and clinical presentation is necessary to interpret matters regarding AD trials and we begin with an overview of the disease. We review the goals of AD trials, the basic tools used in their conduct, current and future trial designs, limitations and challenges to trial conduct, and controversies that exist in the field of AD clinical research. The field of AD trials is a rapidly evolving one. We attempt to address recent changes in trial conduct and to consider future changes that will be needed.
Introduction to Alzheimer’s disease
Alzheimer’s disease is a progressive neurodegenerative disorder characterized over 100 years ago but still lacking adequate therapies. To date, the US FDA has approved five drugs for the treatment of AD. Most studies suggest that these agents provide only symptomatic improvement in AD and pursuit of treatments capable of altering the natural history of AD is rigorous. Therefore, clinical trials of new therapies in AD in the coming years will continue to be a mainstay of AD research.
Recent decades have brought significant increases in the understanding of AD pathogenesis. Much of the focus in AD research continues to revolve around the two hallmarks of disease pathology first described by Alzheimer himself, the neuritic plaque and the neurofibrillary tangle (NFT). These two brain lesions, and the proteins that are most readily used to identify them in immunocytochemical study, have become the focus of research in both disease etiology and treatment development. Neuritic plaques are largely composed of the fibrillogenic 42-amino acid length form of the beta amyloid protein (Aβ42). Neurofibrillary tangles are composed primarily of hyperphosphorylated aggregations of the microtubule-associated protein tau. The molecular events that lead to the formation of these two brain lesions provide ample opportunity for therapeutic intervention. Most attempts at developing disease-modifying drugs to this point have focused on the Aβ cascade (Figure 21.1). Proteolytic processing of the large membrane-bound amyloid precursor protein results in the formation of both seemingly benign and synapto- and neurotoxic proteins of varying sizes. Aβ42 is the result of sequential cleavage by beta and gamma secretase enzymes. The presence of Aβ42 is characteristic of AD. Its presence in neuritic plaques, combined with demonstration that mutations to the genes for amyloid precursor protein or the catalytic subunits of gamma secretase result in an autosomal dominant, inherited early onset form of AD, suggest that it is this post-translational product that is critical to the disease pathogenesis.
Formation of Aβ plaques results from a series of stages of aggregation. It is not clear which or how many of these stages are neuro- or synaptotoxic. Soluble low number combinations of monomeric Aβ may be most toxic. The number of Aβ monomers aggregated into soluble combinations collectively termed oligomers ranges from 2 to 12. Soluble oligomers appear to be more toxic than monomeric Aβ or the insoluble fibrillar aggregates of Aβ found in diffuse or neuritic plaques [1]. Synaptic function is altered in neuronal processes in proximity to Aβ plaque deposition [2].
Tau is an endogenous microtubule-associated protein critical to axonal transport and neuronal health and function. The hyperphosphorylation of tau can occur through activity of a variety of kinases, but glycogen synthase kinase 3β appears to be a primary mechanism for tau hyperphosphorylation. Once
Figure 21.1. The Aβ cascade. A. The amyloid precursor protein (APP) is a large peptide that undergoes proteolytic process at characterized sites, including β-, α-, and γ-sites. B. Serial cleavage at the β- and γ-sites liberates the 42-amino acid-length peptide fragment Aβ42. C. Low n combinations of monomeric Aβ combine into high-molecular weight soluble oligomers that represent synapto- and neurotoxic elements. D. Aggregation of Aβ continues into fibrillar forms such as protofibrils and fibrils. E. Fibrillar Aβ forms diffuse and neuritic Aβ plaques.
hyperphosphorylated, tau condenses, dissociates from microtubules, and aggregates, impairing axonal transport and giving the characteristic appearance of a NFT. Formation of NFTs proceeds topographically and has been used to stage disease severity upon pathological examination. While there is overlap among Aβ and NFT regional pathology, pathological burden in early disease differs between the two hallmark signs in the AD brain. Aβ plaque deposition is first observed in the posterior cingulate cortex and other cortical areas, while NFT formation occurs initially in the entorhinal cortex and hippocampus of the medial temporal lobe.
Aβ plaques and NFTs are accompanied by a variety of other cellular and molecular changes within the AD brain. Inflammatory responses are evident and include increased recruitment of microglia, which are associated with Aβ plaques. As neurons are lost, characteristic depletions of neurotransmitters occur. One such neurotransmitter decrease was discovered early in modern AD research and resulted in the development of the mainstay of current therapies, cholinesterase inhibitors.
Alzheimer’s disease pathology begins a decade or more prior to dementia onset: 20–40% of elderly individuals with normal cognition qualify for post-mortem diagnostic criteria for AD [3]. While this may contradict current pathological theories of AD, it seems probable that such individuals were destined for cognitive impairment and, eventually, full blown dementia. Thus, Alzheimer’s disease is present prior to the onset of Alzheimer’s dementia. Efforts are underway to better understand the earliest stages of biological and clinical AD. Attempts to better characterize the earliest stages of AD resulted in construction of the clinical syndrome mild cognitive impairment (MCI), which is defined as subtle cognitive impairment that distinguishes one from an age-matched cohort but does not impair activities of daily living (ADL). The cognitive impairment is most commonly defined as performance on standardized cognitive tasks that is 1.5 or 2.0 standard deviations below the mean for an age-cohort, adjusted for education. Individuals with MCI are at significantly increased risk for all types of dementia. Individuals who suffer from amnestic MCI (characterized by the presence of memory impairment specifically, either alone or concomitantly with impairments to other cognitive domains) are at significantly increased risk for AD dementia specifically. A concerted effort is underway, including the National Institute on Aging- and industry-sponsored collaborative Alzheimer’s Disease Neuroimaging Initiative, to better characterize the biological markers that predict future dementia among MCI and non-impaired persons. Biological signatures associated with AD predict future AD dementia among MCI cohorts. These include characteristic atrophy of brain volume assessed by MRI, brain hypometabolism as measured by fluorodeoxyglucose (FDG) PET, amyloid burden as measured by amyloid-specific ligands
Chapter 21: Alzheimer’s disease
with PET imaging, and changes in CSF protein levels. Individuals who meet MCI criteria and also demonstrate a biological signature of AD have been defined as prodromal AD [4]. Alternatively, individuals who carry the biological signature of AD but for whom no demonstrable cognitive impairment is present may be defined as preclinical AD.
A wide array of therapeutic interventions for AD are being developed. These include therapies that aim to halt the underlying biology of AD, as well as therapies that aim to improve cognitive function despite the pathological burden of disease. As is the case for many therapeutic realms, clinical trials represent the rate-limiting step to the testing of new therapies for AD. Trials of AD therapies, however, bring unique challenges related to study design and enrollment. Further, it is likely that only through clinical testing of targeted therapies, perhaps in the prodromal and preclinical phases of disease, will many of the debates related to AD pathology be resolved.

Goals of intervention in AD
Alzheimer’s disease is characterized by episodic memory impairments (initially manifest as impairments to short-term episodic memory); language changes such as anomia and fluent aphasia; visuospatial impairments; and executive function compromise. Behavioral impairments such as apathy, depression, and agitation also are common. The course of AD is unrelenting; life expectancy after diagnosis is 8–12 years and quality of life will decline through this period.
Definitive diagnosis of AD is reached only upon post-mortem examination or brain biopsy, demonstrating the presence of fibrillar amyloid. The clinical diagnosis of probable AD is based on symptom presentation, is sensitive and specific when performed by a specialist, and is often supported by biological testing. Screening tools for AD exist and can assist in identification of individuals with cognitive impairment. Neuropsychological assessment is often used to further delineate the type and extent of cognitive abnormality present. Biological measures also can aid in diagnosis, especially differentiating AD from reversible forms of cognitive impairment or other dementias.
Among the biological changes that can be used in the diagnosis of AD, brain atrophy in the hippocampus and entorhinal cortex is well described in the earliest stages. Similarly, bilateral hypometabolism in the temporal lobe, parietal cortex, and posterior cingulate cortex are consistently observed in AD with FDG PET. Recently, disease-specific ligands for use with PET, such as the amyloid-specific Pittsburgh compound B (PIB), florbetaben, and AV-45 have become available for use in AD research and appear to have good specificity and sensitivity for AD identification. Analysis of CSF proteins can be used in diagnostic assessment, although consistency in cutoff points and protein level measures across laboratories is still lacking. Decreased levels (190 pg/mL) of CSF Aβ are expected in AD, hypothetically due to the accumulation of Aβ into plaques in the brain. Concomitant increases in CSF tau and hyperphosphorylated tau occur and are good predictors of AD diagnosis (Figure 21.2).
Currently available treatments include the cholinesterase inhibitors donepezil, galantamine, and rivastigmine, and the glutamate receptor antagonist memantine. None have been demonstrated to possess disease-modifying properties. Symptomatic therapies improve patient performance on cognitive tasks, global measures, ADL, and behavior. Alternatively,
Figure 21.2. Markers of AD progression. AD progresses from an asymptomatic pre-clinical stage, to a period of mild cognitive impairment during which criteria for AD are not met (prodromal AD), and eventually to AD dementia. Biological signs of AD, including reduced CSF Aβ, increased CSF Tau, positive signal on amyloid PET imaging, and brain atrophy (neuron loss) are present in preclinical AD and increase in magnitude with disease progression. CSF Aβ and amyloid PET imaging progress at approximate equal rates, in opposite directions, as amyloid is accumulating in the brain and accordingly decreasing in level in the CSF. Declines in cognition are delayed, relative to the onset of biological changes, and better correlate neuron and synapse number.
[Figure: curves for amyloid PET, CSF Tau, cognition, neuron/synapse number, and CSF Aβ plotted against time across the pre-clinical, prodromal AD, and dementia stages.]
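The psychometric part of the MCI definition given earlier in this chapter (performance 1.5 or 2.0 standard deviations below the age-cohort mean, adjusted for education, without impairment of ADL) can be illustrated with a short sketch. The function names, the example norms, and the assumption that higher scores mean better performance are hypothetical and are not taken from any specific test manual; ADL status is simply supplied as a yes/no judgment.

```python
def z_score(score, cohort_mean, cohort_sd):
    """Standardize a raw score against (assumed) age- and education-adjusted norms."""
    if cohort_sd <= 0:
        raise ValueError("cohort_sd must be positive")
    return (score - cohort_mean) / cohort_sd


def meets_mci_screen(score, cohort_mean, cohort_sd, adl_impaired, cutoff_sd=1.5):
    """True when the score is at least cutoff_sd SDs below the norm and ADL are preserved."""
    below_cutoff = z_score(score, cohort_mean, cohort_sd) <= -cutoff_sd
    return below_cutoff and not adl_impaired


# Hypothetical example: raw score 20 against a norm of mean 26, SD 4 (z = -1.5).
example = meets_mci_screen(score=20, cohort_mean=26, cohort_sd=4, adl_impaired=False)
```

With these illustrative numbers the individual meets the 1.5 SD cutoff but not the stricter 2.0 SD cutoff, which shows how the choice of threshold changes who is classified as having MCI.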
Figure 21.3. Distinction of symptomatic from disease-modifying therapies. The solid line represents a hypothetical model of disease progression (with the caveat that disease progression in AD is not linear). Initiation of symptomatic therapy results in immediate increase in performance, but an unaltered decline in function over time (slope). Disease-modifying therapies, alternatively, may or may not provide symptomatic improvement upon initiation, but alter the course of disease progression (slope) over time, resulting in preservation in cognitive function over time and delay to milestones related to overall cognitive function.
[Figure: cognitive performance (vertical axis) against time (horizontal axis) for untreated AD, symptomatic treatment, and disease-modifying treatment.]
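The contrast drawn in Figure 21.3 can be sketched numerically. This is a minimal illustration that assumes the same simplified linear decline as the figure; the function name, the baseline score, the slope, the size of the symptomatic boost, and the slope reduction are all hypothetical values chosen for readability, not estimates from any trial.

```python
def cognitive_score(t, baseline=100.0, slope=-5.0, start=1.0,
                    symptomatic_boost=0.0, slope_factor=1.0):
    """Hypothetical cognitive score at time t (years) under a linear decline.

    Treatment begins at `start`. A symptomatic effect adds a one-time offset
    (symptomatic_boost) and leaves the slope unchanged; a disease-modifying
    effect multiplies the slope by slope_factor (< 1 means slower decline).
    """
    if t < start:
        return baseline + slope * t
    score_at_start = baseline + slope * start
    return score_at_start + symptomatic_boost + slope * slope_factor * (t - start)


years = range(6)
untreated = [cognitive_score(t) for t in years]
symptomatic = [cognitive_score(t, symptomatic_boost=4.0) for t in years]
disease_modifying = [cognitive_score(t, slope_factor=0.5) for t in years]

# Difference from untreated at each yearly visit.
symptomatic_gain = [s - u for s, u in zip(symptomatic, untreated)]
modifying_gain = [d - u for d, u in zip(disease_modifying, untreated)]
```

In this sketch the symptomatic gain stays constant once treatment starts, while the disease-modifying gain grows with follow-up time, which is why longer observation is needed to tell the two apart.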
disease-modifying therapies offer a different type of benefit, though the specific definition of that benefit is actively debated.
Disease-modifying therapies may not provide an immediate recovery of memory function. Over time, however, the rate of decline in memory would be slowed, relative to the untreated patient (Figure 21.3). One proposed definition for disease modification requires that the underlying biology must be altered. Alternatively, a patient-centered perspective provides a definition whereby clinical milestones must be delayed, such as the ability to perform ADL or nursing home placement. Purely cellular or molecular milestones may not confer clinical benefit. Similarly, symptomatic therapies may improve clinical outcomes without truly altering disease biology. Therefore, we use a definition that combines these two requirements: disease-modifying therapies must both alter the underlying biology of AD that results in cell death and, as a product of that biological effect, produce a measurable impact on clinical disease progression [5]. Given the long course, the apparent preclinical period of potential intervention, and the late-life age-of-onset of AD, the medical and economic ramifications of developing disease-modifying drugs are substantial and research related to development of these therapies is intense. A drug that can delay the onset of AD dementia by 5 years could decrease disease prevalence by 50%. A drug that delays AD dementia by 10 years will alleviate the public health crisis of AD [6].

Alzheimer’s disease clinical trial measurement tools
Clinical trials of the first approved treatments for AD have largely guided subsequent trials, though the targets, mechanisms of action, and intended indications of investigational treatments have evolved. To receive marketing approval from the FDA, a new AD drug must demonstrate efficacy on co-primary outcomes, including a cognitive measure and a functional or global measure in two well conducted trials [7]. Randomized trials of cholinesterase inhibitors were parallel group 3- and 6-month studies. Subjects were blindly assigned to therapy or placebo and cognitive performance was assessed with the AD Assessment Scale which included the cognitive subscale (ADAS-cog) and the non-cognitive (ADAS-noncog) portion. Since these initial trials, the ADAS-cog has remained the cognitive scale used in most trials conducted in mild to moderate AD (Table 21.1). The ADAS-cog is a 70-point scale that assesses performance in memory, orientation, comprehension of language and commands, naming, word finding, and ideational and constructional praxis. An 80-point version of the ADAS-cog that includes a delayed recall task is also available. Attention, working memory, and executive function are largely overlooked by the ADAS-cog; an expanded ADAS-cog 13 with cancellation and maze tasks is available [8]. Standard cholinergic therapy provides a benefit of approximately 2 points on the ADAS-cog after 6 months.
Evaluation of placebo groups in large trials suggests that the ADAS-cog scores decline roughly 4 to 6 points per year in AD [9, 10]. In the single largest AD trial to date, mean ADAS-cog scores in the placebo group declined 4.28 and 7.08 points at 12 and 18 months, respectively [11]. Other studies have demonstrated annual rates of decline as high as 11.4 points [12]. These discrepancies result from the fact that disease progression measured with the ADAS-cog is not linear and populations in different trials differ in their rate of decline. Ito and colleagues performed a meta-analysis of ADAS-cog decline in acetylcholinesterase clinical trials. They found that baseline mini-mental
Table 21.1 Co-primary outcomes used in phase 3 clinical trials in AD
Phase 3 trial [Reference] | Disease stage | Cognitive primary outcome | Global or functional primary outcome
Tacrine [42] | Mild-to-Moderate | ADAS-cog | CIBI
Donepezil [43] | Mild-to-Moderate | ADAS-cog | CIBIC-Plus
Rivastigmine [44] | Mild-to-Moderate | ADAS-cog | CIBIC-Plus
Galantamine [45, 46] | Mild-to-Moderate | ADAS-cog | CIBIC-Plus
Rivastigmine [transdermal] [47] | Mild-to-Moderate | ADAS-cog | ADCS-CGIC
Donepezil [48] | Severe | SIB | ADCS-ADL
Donepezil [16] | Severe | SIB | CIBIC-Plus
Memantine [23] | Moderate-to-Severe | SIB | ADCS-ADL
Tarenflurbil [11] | Mild-to-Moderate | ADAS-cog | ADCS-ADL
Tramiprosate [34] | Mild-to-Moderate | ADAS-cog | CDR-SB
Bapineuzumab [34] | Mild-to-Moderate | ADAS-cog | DAD
Dimebon [drug naïve] [34] | Mild-to-Moderate | ADAS-cog | CIBIC-Plus
Dimebon [donepezil add on] [34] | Mild-to-Moderate | ADAS-cog | ADCS-ADL
Dimebon [memantine add on] [34] | Moderate-to-Severe | SIB | ADCS-ADL
Solanezumab [34] | Mild-to-Moderate | ADAS-cog | ADCS-ADL
LY450139 [34] | Mild-to-Moderate | ADAS-cog | ADCS-ADL
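The placebo-group ADAS-cog figures quoted in the text (declines of 4.28 points at 12 months and 7.08 points at 18 months [11]) can be put on a common per-year footing with simple arithmetic. The sketch below is only an illustration of that calculation; the function name is hypothetical, and the per-interval rate assumes the two reported means refer to the same placebo cohort.

```python
def annualized_rate(change_points, months):
    """Convert a change observed over `months` into points per year."""
    if months <= 0:
        raise ValueError("months must be positive")
    return change_points * 12.0 / months


decline_12m = 4.28  # mean placebo decline at 12 months, from the text
decline_18m = 7.08  # mean placebo decline at 18 months, from the text

rate_first_year = annualized_rate(decline_12m, 12)
rate_over_18_months = annualized_rate(decline_18m, 18)
rate_months_12_to_18 = annualized_rate(decline_18m - decline_12m, 6)
```

The annualized rate is higher over months 12 to 18 than over the first year, which is consistent with the statement that decline measured with the ADAS-cog is not linear.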
Figure 21.4. One-year changes in ADAS-cog score. Taken from Stern et al. [13] this figure demonstrates changes in performance in the ADAS-cog based on baseline entry score on the same outcome measure. The figure clearly illustrates that the greatest annual change in ADAS-cog performance occurs in moderate disease, with minimal changes in very mild and very severe disease. Reprinted with permission from the American Journal of Psychiatry, Copyright 1994, American Psychiatric Association.
[Figure: 1-year change in cognitive subscale score (vertical axis, –5 to 30) against baseline cognitive subscale score (horizontal axis, 0 to 65).]

state examination (MMSE) scores above 27 (or ADAS-cog below 10) were associated with an annual decline of 2.97 ADAS-cog points/year, but baseline MMSE below 12 (ADAS-cog above 40) was associated with an annual decline of 7.52 ADAS-cog points [10]. Similarly, Doraiswamy and colleagues noted an 84% greater decline in the ADAS-cog after 6 months among participants with baseline MMSE 12–18, relative to those with baseline MMSE 19–23. Stern and colleagues examined 1-year change in ADAS-cog among participants with a range of baseline ADAS-cog scores. They noted that in very mild and more severe AD, annual decline is reduced, relative to moderate disease [13] (Figure 21.4). Neither age nor ApoE genetic status appears to impact the rate of decline on ADAS-cog [14].
Because of the performance characteristics of the ADAS-cog, particularly in the earliest stages of disease, Harrison and colleagues developed the Neuropsychological Test Battery (NTB), specifically for use in clinical trials [15]. The NTB uses nine validated components to examine cognitive function in the domains of visual and verbal memory, and executive function. The NTB appears to demonstrate linear decline for both mild and moderate dementia. The NTB has been used in only a few trials and its performance across trials is not yet well understood.
All cholinesterase inhibitors are approved for mild-to-moderate AD and two medications (memantine and donepezil) have been approved by the FDA for severe dementia. Approval in the severe disease stage
was based on improved cognition demonstrated by trials utilizing the severe impairment battery (SIB). The SIB is a 40-item scale with a maximum score of 100 points. Lower scores represent greater impairment. Memory, orientation, language, attention, visuospatial function, and construction are evaluated. The SIB successfully distinguishes moderately severe (MMSE >6) from more severe impairment (MMSE 0 to 5).
Functional measures are included in nearly all clinical trials and the Alzheimer’s Disease Cooperative Study-Activities of Daily Living (ADCS-ADL) is the most commonly used method to examine functional performance. The ADCS-ADL is an informant-based questionnaire that assesses conduct of both basic (e.g. personal hygiene, dressing, eating) and instrumental (e.g. using a telephone, preparing a meal, dealing with finances) ADLs. A total of 45 ADLs are evaluated, chosen for their consistency in decline among genders and levels of disease severity. Scores range from 0 to 78, with higher scores representing more maintained function. In the tarenflurbil study, subjects with MMSE of 20–26 at entry had a baseline ADCS-ADL of 63.6. Subjects receiving placebo declined 6.0 points at 12 months and 9.7 points at 18 months. In a study examining the effects of memantine on ADLs in moderate to severe AD, participants meeting MMSE entry criteria (5–14) had a mean baseline ADCS-ADL score of 36.2. In a trial of donepezil in severe AD (MMSE 0–12), mean baseline ADCS-ADL score was 26.7. After 24 weeks, scores declined to a mean 2.53 [16]. The rate of AD progression as measured with the ADCS-ADL is not affected by ApoE genotype [14]. The ADCS-ADL is currently in use in a variety of phase 3 clinical trials in AD (Table 21.1).
The Disability Assessment for Dementia (DAD) scale also examines functional outcomes and is used in AD trials. The DAD is a 46-item, 100-point caregiver report that assesses 19 basic and 26 instrumental ADLs. A score of 100 represents no disability while a score of 0 represents maximum impairment, or a loss of the ability to independently perform all assessed items. Items are also scored as to whether they represent planning, initiation, or maintenance of activities. In one study of moderate AD patients (mean MMSE=19), mean total DAD score was 70.1 and DAD scores declined 12.5 points over 12 months [17]. This corresponds to a loss of an ADL every other month. Change in DAD score is accelerated later in AD. The DAD is currently in use as a co-primary in at least one phase 3 investigation in AD (Table 21.1).
Global outcome measures are clinician-based tools that aim to detect clinically significant changes. Such clinical significance may result from substantive changes in a single domain or cumulative effects of small changes to multiple domains. Scales assessing Clinical Global Impression (CGI) of Change (CGIC) and Severity (CGIS) are common in medication trials in the neurological disorders. The CGIC was a primary outcome, accompanied by the ADAS-cog, in the initial trials of tacrine that led to the first FDA approval for a treatment of AD. This scale was modified to create the Clinical Interview Based Impression of Change (CIBIC). The CIBIC specifically assesses the patient’s ADLs, language, motivation, behavior, strengths and weaknesses, and history. As a result of a letter from the FDA recommending the use of a CIBIC for regulatory trials in AD, a variety of scales exist. The ADCS developed one comprehensive scale for use in AD clinical trials. The ADCS-CGIC includes a clinician interview with the caregiver, characterizing it as a ‘CIBIC-Plus.’ Fifteen domains are examined by the ADCS-CGIC. Change is rated on a 7-point scale, with 1 representing marked improvement, 4 representing no change, and 7 representing marked worsening. The ADCS-CGIC is valid and reliable and is used often in AD trials. Self- and informant-based reporting versions have been developed. Both versions are valid and reliable, although in a large study of cognitively normal volunteers, subjects demonstrated a bias toward rating themselves as improved. The CIBIC has been used in trials of mild-to-moderate and moderate-to-severe AD and the ADCS-CGIC has recently been modified for use in prevention and MCI trials [18].
Developed by Berg, Morris, and colleagues at Washington University, the Clinical Dementia Rating Scale (CDR) is a global tool that utilizes patient and informant interview to assess memory, orientation, judgment and problem solving, community affairs, home and hobbies, and self-care [19]. The CDR can be scored in two ways. First, it can provide a global score of 0, 0.5, 1.0, 2.0, or 3.0 relating to not demented, very mild dementia, mild dementia, moderate dementia, and severe dementia, respectively. The CDR global score is calculated using the memory score as the primary indicator of disease severity and all other subdomains as secondary. Thus, in most situations the global score is equal to the memory score, unless different levels of impairment are present in three or more other subdomains. Alternatively, a CDR sum of boxes (CDR-SB) score can be assessed by adding the
box scores together. The CDR is widely used in clinical trials and has been included in primary and secondary prevention trials as a primary outcome. It has the advantage that the rater need not recall previous interviews in order to assess change, though consistency in raters is important within trials. The CDR does not, however, assess behavioral symptoms.
The assessment of behavioral outcomes is generally included as a secondary outcome measure. Behavioral symptoms are experienced by 75% and 43% of dementia and MCI patients, respectively [20]. The most widely used scale to assess behavioral symptoms is the Neuropsychiatric Inventory (NPI). The NPI assesses 10 (or 12 depending on the version) unique behavioral symptoms including delusions, hallucinations, dysphoria, anxiety, agitation/aggression, euphoria, disinhibition, irritability/lability, apathy, and aberrant motor activity [21]. The NPI utilizes a caregiver interview to assess for presence of each item. If present, severity (1–3 points), frequency (1–4 points), and associated caregiver distress are scored. Each domain score is calculated as frequency score multiplied by severity score. The NPI total score is equal to the sum of the subdomains. Thus, for the 10-item NPI, a total score of 120 is possible, with higher scores representing worse behavioral symptoms. An additional two items, sleep and appetite disturbances, were later added and a brief clinical version of the scale and an institutional/nursing home version of the NPI are also available. Most trials in AD utilize the full 12-item version of the NPI. In the phase 3 tarenflurbil study, placebo group participants with a mean baseline MMSE of 23.3 had a baseline NPI score of 8.4 and increased 1.74 and 3.37 points after 12 and 18 months, respectively. The placebo group in a recent trial of dimebon (baseline MMSE = 18.7) demonstrated a 2.5 point increase in NPI score over 12 months, from a baseline score of 11.8 [22]. In a trial of memantine in which the mean baseline MMSE of the placebo group was 10.2, the baseline NPI score of 13.4 increased 2.9 points over 6 months [23]. Thus, with disease progression, the incidence and severity of behavioral symptoms, and subsequent NPI scores, increase.
Specific measures of executive function are more commonly being included in clinical trials in AD. These include the Trails A and B, Stroop test, mazes, cancellation, and others. The previously discussed NTB was meant to assess executive function, as this may be a cognitive domain impaired early in AD. But specific scales assessing this form of cognition may be better suited for mild AD trials, MCI trials, or primary or secondary prevention trials.
A number of current trials also assess biomarkers of disease as secondary outcomes. The AN1792 vaccine trial was one of the first to include a biological marker as a secondary outcome measure and yielded difficult-to-interpret results when analyses suggested that individuals who responded to therapy by antibody production manifested increased brain atrophy. Since then, most studies have included one or multiple biomarkers of disease progression, including volumetric MRI; FDG PET; brain amyloid load, PIB or AV-45 PET; and CSF protein analysis of Aβ, tau, phosphorylated tau, or the relation of these protein levels. As yet, the use of biomarkers remains limited to secondary outcomes, often as substudies of the larger overall efficacy study, and no surrogate marker for AD has been validated. Other uses for biomarkers in AD trials will be discussed later.
A limited number of primary and secondary prevention trials have been conducted to date in AD (Table 21.2). For these trials, the primary outcome is most often fulfillment of the diagnostic criteria for dementia or probable AD. One secondary prevention trial has used a global score of 1.0 on the CDR as a primary outcome [24]. The remaining large-scale prevention trials have used the criteria for dementia of an Alzheimer’s type outlined by either the Fourth Edition of Diagnostic and Statistics Manual (DSM) of the American Psychiatric Association or the criteria for AD of the National Institutes of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA). In general, these criteria rely upon clinically demonstrated cognitive impairment to two or more domains, impairment to the patient’s ADL, and the excluding of other possible causes of cognitive impairment. The NINCDS-ADRDA criteria distinguish between possible, probable, and definite AD, with the latter being fulfilled only upon pathological examination of brain tissue.
Economic impact of therapy has become an important outcome for trials of medications that may be eligible for reimbursement. The Resource Utilization in Dementia (RUD) scale examines caregiver time spent assisting patients with ADL and instrumental ADL and supervising the patient. The RUD examines how many days and the number of hours each day a caregiver assisted the patient. Treatment with cholinesterase inhibitors and memantine have been shown to result in a significant reduction in caregiver time assisting patients, relative to placebo [25, 26]. A Resource
Table 21.2 Primary and secondary prevention trials in AD
Trial/Intervention | Primary / Secondary | Population | Study Length | Primary Outcome Measure
Women’s Health Initiative Memory Study (WHIMS) of estrogen plus progestin [49] | Primary | 2947 cognitively normal women 65–79 | 5.6 years [halted for safety reasons] | DSM-IV criteria for dementia, Petersen criteria for MCI
Women’s Health Initiative Memory Study (WHIMS) of estrogen [50] | Primary | 4532 cognitively normal women 65–79 | 5 years | DSM-IV criteria for dementia, Petersen criteria for MCI
Post-menopausal AD with Replacement Estrogens (PREPARE) [51] | Primary | 477 cognitively normal women 65 years or older with a family history of AD | 5 years | Specific criteria based on DSM-IV and NINCDS-ADRDA criteria
AD Anti-inflammatory Prevention Trial (ADAPT) [52] | Primary | 2528 cognitively normal volunteers age 70 or older with a family history of AD | 5 years | DSM-IV and NINCDS-ADRDA criteria
Ginkgo Evaluation of Memory (GEM) study [53] | Primary and secondary | 3069 total volunteers 75 years or older (2587 cognitively normal and 482 MCI) | 5 years | DSM-IV criteria for dementia
GuidAge Study [54] | Secondary | 2854 patients 70 years or older with a memory complaint | 5 years | DSM-IV and NINCDS-ADRDA criteria
Rofecoxib Protocol 078 Study [55] | Secondary | 1457 MCI patients (global CDR = 0.5, CDR memory box ≥ 0.5, and AVLT ≤ 37) | 4 years | NINCDS-ADRDA criteria
Investigation in the Delay to Diagnosis with Exelon [InDDEx] [56] | Secondary | 1018 MCI patients (CDR = 0.5 and NYU Paragraph recall < 9) | 4 years | DSM-IV and NINCDS-ADRDA criteria
Alzheimer’s Disease Cooperative Study of Vitamin E and Donepezil in MCI [57] | Secondary | 769 MCI patients (Petersen criteria [58] and CDR = 0.5) | 3 years | NINCDS-ADRDA criteria
GAL-INT 118 Study (Galantamine) [24] | Secondary | 2048 MCI patients randomized to two studies (global CDR = 0.5, memory box ≥ 0.5) | 2 years | CDR = 1.0
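Several trials in Table 21.2 define eligibility or outcome in terms of CDR scores. The sketch below illustrates two pieces described in the text: the CDR sum of boxes (adding the six box scores) and an entry rule of the form "global CDR = 0.5 and memory box ≥ 0.5". The domain keys, function names, and example ratings are hypothetical; the global CDR is taken as an input because its scoring algorithm is not reproduced here.

```python
# Six CDR domains named in the text; the dictionary keys are illustrative.
CDR_DOMAINS = (
    "memory",
    "orientation",
    "judgment_problem_solving",
    "community_affairs",
    "home_hobbies",
    "self_care",
)


def cdr_sum_of_boxes(boxes):
    """Add the six box scores to give the CDR-SB."""
    missing = [d for d in CDR_DOMAINS if d not in boxes]
    if missing:
        raise ValueError(f"missing box scores: {missing}")
    return sum(boxes[d] for d in CDR_DOMAINS)


def meets_mci_entry(global_cdr, boxes):
    """Entry rule of the form used by some trials in Table 21.2."""
    return global_cdr == 0.5 and boxes["memory"] >= 0.5


# Hypothetical participant with very mild impairment in two domains.
participant = {
    "memory": 0.5,
    "orientation": 0.5,
    "judgment_problem_solving": 0.0,
    "community_affairs": 0.0,
    "home_hobbies": 0.0,
    "self_care": 0.0,
}
participant_sb = cdr_sum_of_boxes(participant)
participant_eligible = meets_mci_entry(0.5, participant)
```

Keeping the global score and the sum of boxes as separate quantities mirrors the text's point that the CDR can be scored in two ways.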
Utilization for Severe Patients (RUSP) scale also has been developed.
The Quality of Life-AD (QOL-AD) is a 13-item disease-specific scale that provides patient and caregiver assessment of patient QOL and caregiver assessment of caregiver QOL [27]. Thirteen domains are assessed: physical health, energy, mood, living situation, memory, family, marriage, friends, chores, ability to do things for fun, money, self, and life as a whole. Items are scored 1–4 for poor, fair, good, and excellent. The QOL-AD is applicable to all stages of disease and is not related to disease severity [27, 28].

Clinical trial designs
Phase 1 clinical trials of drugs being developed for AD are largely similar to industry standard phase 1 designs. Blinded and unblinded ascending single and then multiple dose studies in healthy young volunteers are most common, aiming to establish pharmacokinetics and safety of the candidate compound. Alzheimer’s disease agents are also generally tested in healthy elderly volunteers in phase 1. Elderly patients may have decreased drug absorption, metabolism, or excretion and increased concomitant therapies.
The exception to the inclusion of healthy non-demented volunteers in phase 1 is for immunotherapeutic agents and gene therapies. Phase 1 studies of such therapies for AD are conducted in AD patients. For example, the initial phase 1 investigation of AN-1792 tested two doses of the full-length Aβ peptide vaccine (with two doses of adjuvant for four total study groups) in 80 mild-to-moderate (MMSE 14–26) AD patients. Long-term follow-up of participants has continued for nearly 10 years and fueled continued development of AD immunotherapies. A variety of immunotherapeutic strategies are currently under development, including vaccinations with whole or fragmented Aβ, passive immunization with humanized antibodies against whole or fragmented Aβ, and intravenous immunoglobulin therapy. All phase 1 investigations of these agents have been conducted in AD patients, not healthy volunteers. Similarly, one gene therapy, neurosurgically-delivered DNA for nerve growth factor in an adeno-associated viral vector, has been tested in a 10-subject phase 1 study in mild AD.
Phase 2 clinical trials have a variety of goals. These studies should confirm the safety of an agent in the disease population of interest. They also should determine one or a series of doses to be taken forward in clinical development. In phase 2 the appropriateness of the mechanism of action should be confirmed and initial data on efficacy should be provided, sufficient to warrant phase 3 investigation.
The safety of an agent is often first examined in the disease of interest in a phase 2 study. Adaptive dose-finding and futility designs (as discussed in Chapters 8 and 9) are becoming increasingly popular in all therapeutic areas, with the expressed intent of achieving one of two outcomes: 1) increasing the rapidity of reaching phase 3 investigation to maximize the time after marketing approval before expiration of patent, and 2) preventing the conduct of large, long, and expensive phase 3 trials that ultimately produce negative results. The use of such designs, as well as designs described below, should be considered to reduce the chance of large-scale negative phase 3 trials. Adaptive stratification to

studies). Bateman and colleagues [30] recently developed a technique for using phase 2a studies to confirm candidate target activity in AD. In a recent study of non-demented human volunteers, they performed 24-hour in-dwelling lumbar catheterization to examine if the gamma secretase inhibitor LY450139 reduced Aβ production (a 52% reduction was observed with the therapeutic dose of LY450139). Alternatively, other phase 2a studies examine whether an agent has clinical efficacy.
Phase 2b studies generally represent a final stage before the necessarily large-scale studies of clinical efficacy (i.e. phase 3). They are often used to optimize dosing and confirm earlier trial signals related to efficacy before initiating phase 3. Phase 2a and 2b studies can be conducted concomitantly to establish dose and efficacy. Some phase 2b designs that show efficacy can function as pivotal trials for regulatory purposes, as has occurred recently in some cases in AD. Unfortunately, and more commonly, agents that fail to demonstrate efficacy in phase 2a or 2b but demonstrate suggestive trends are taken on the longer, larger, and more expensive trials powered to demonstrate efficacy if it is present. Great economic and human cost on the part of the sponsor, investigators, and patient participants can be saved by full utilization and endorsement of phase 2 trials that instruct the decision to proceed to phase 3.
To date, phase 3 trials of symptomatic agents in AD have largely employed parallel 6–12 month double-blind randomized design studies of approximately 500 AD patients. In keeping with the initial therapeutic trials in AD and a subsequent regulatory guidance [7], co-primary outcomes are used, including one cognitive outcome (most often the ADAS-cog, see above), and one global or functional outcome. In trials of more severe disease, a similar strategy is utilized, with adjusted choices of outcome measures (the SIB rather than the ADAS-cog, for example). Several parallel design efficacy trials have been conducted in patients meeting criteria for MCI [31, 32]. In one 48-week MCI trial, the co-primary outcomes were the ADAS-cog and the CDR-SB. In a 24-week trial, the co-primary
ensure balance among treatment groups for genotype, outcomes were the NYU Paragraph Recall and the
disease severity, or concomitant treatment subgroups ADCS-CGIC-MCI. No trial has demonstrated eicacy
have been utilized in AD trials [29]. in MCI. For the reasons discussed above, alternate
Some phase 2a study designs aim to conirm in tools might enhance the ability to detect a signal in
humans the desired biological efect observed in ani- those with milder impairment, such as MCI.
mal and in vitro models. Such studies are generally A number of phase 3 trials of potentially disease
small, and may be used to justify the movement into modifying therapies are underway. hese phase 3 (and
larger clinical development (i.e., phase 2b and phase 3 many phase 2) trials are 18-months in length, with the
intention of allowing for demonstration of slowed cognitive decline. Since these trials continue to use cognitive measures with high variance as primary outcomes, they require large numbers of participants, enrolling over 1000 subjects. In addition, biological markers are often included as key secondary outcome measures, potentially providing biological support for a claim of disease modification. Since none of the completed studies have demonstrated a positive effect on the primary outcome, however, it remains unclear if these parameters are sufficient.

Trial designs that theoretically confirm disease modification have been proposed, including the staggered start and randomized withdrawal designs [33]. These designs rely upon the progressive nature of AD to distinguish a therapy that can slow progression during treatment from one that simply provides symptomatic relief. One version of the staggered start design randomizes participants to three blinded groups: placebo for the duration of study, treatment for the duration of study, and a third group that receives placebo at initiation but is blindly switched to therapy some period after study initiation (for example, after 12 months). The study concludes after a subsequent period (for example 6 months, for a total study duration of 18 months). If the two treatment groups are indistinguishable at study end, the treatment is determined to be either ineffective (if the groups are no different from placebo) or symptomatic (if both groups perform better than placebo). If, however, the group that received therapy at study initiation is superior to the delayed start group at study end, it can be concluded that the natural history of disease was altered and the therapy is disease modifying. Similarly, the corresponding version of the randomized withdrawal design utilizes three groups: placebo and two treatment arms at study start. After an initial period, one of the treatment arms is blindly switched to placebo and the effect on natural history of disease is examined after an additional predetermined period.

No AD study has utilized these designs as a primary outcome. Variations have been implemented in some trials, however. The phase 2 investigation of tarenflurbil employed a planned blinded extension that examined differences in slope between those that received drug for 24 consecutive months vs. those who received placebo for 12 months and drug for the following 12 months. Similarly, a phase 3 investigation of LY-450139 planned as a secondary outcome that ‘some time during the study’ patients given placebo would be given LY-450139 to assess the effect of long-term treatment [34].

Monitoring the time until patients reach disease-related milestones will support a claim of disease modification. Possible milestones for use as outcomes in AD trials include functional milestones, such as decline in the ability to perform ADL. Mohs et al [35] utilized a survival design to examine whether donepezil could delay functional decline in moderate AD (MMSE 12–20 at entry). In this trial, patients were randomized to active treatment or placebo. The primary outcome measure was time to clinically evident functional decline. As described by the authors, the survival design offered two distinct advantages over a standard parallel design. First, the drug under study was deemed more likely to demonstrate a significant delay in functional decline over time than it was a functional improvement, relative to placebo. Second, by using this design and discontinuing participants that demonstrated significant decline, it permitted the initiation of open-label therapy in those patients assigned to placebo who met criteria.

Similarly, Sano and colleagues in the ADCS performed a placebo-controlled survival design trial in moderate AD of selegiline, vitamin E, or both [36]. In this two-year study, the primary outcome measure was time to death, institutionalization, loss of basic ADLs, or progression to a global CDR score of 3. Despite randomization, there was imbalance between the groups in baseline disease severity as measured with the MMSE and this difference was highly predictive of the primary outcome. Although no significant effects were seen in the prespecified analyses of primary outcome, significant delays of the defined functional outcomes were seen for both therapies and the combination, when baseline MMSE was included as a covariate. No effects of either agent were seen for secondary outcome measures of cognition.

Survival designs have been implemented in a number of AD prevention studies, including primary prevention of older cognitively normal volunteers, primary prevention of cognitively normal volunteers with a family history of AD, and secondary prevention trials of individuals who meet criteria for MCI (Table 21.2). Time to diagnosis of dementia serves as the primary outcome measure in these studies.

Standards for efficacy and special safety issues

The standard for AD trials is demonstration of efficacy on two co-primary outcomes. Such
efficacy must also be replicated in a second phase 3 trial to achieve marketing approval. All trials leading to approval have used as cognitive outcomes the ADAS-cog for mild-to-moderate AD and the SIB for moderate-to-severe AD. A variety of global and functional measures have been and are being utilized as the co-primary outcomes. These include the CDR and CIBIC (global) and the ADCS-ADL and DAD (functional) (Table 21.1).

While measures for assessing cognition have been largely unchanged in the modern era of AD trials, measures of safety have evolved. All therapies administered to aged patients (and aged brains) carry additional risks that are not present in young participants. This includes age-related vascular changes, in addition to the well-described age-related changes in drug metabolism and increased prevalence of polypharmacy. Trials of a variety of therapies believed to be low in risk have in fact been terminated for safety reasons. This includes a trial of the non-steroidal anti-inflammatory agents celecoxib and naproxen. Safety concerns outside of trials also have been raised, including cardiovascular risks for the peroxisome proliferator-activated receptor-γ (PPAR-γ) agonist rosiglitazone and vascular risks for alpha-tocopherol (vitamin E) at high doses. Further, AD interventions have increased in their invasiveness, now including immunotherapies and other infused agents and neurosurgical deliveries of gene therapies and cell transplantation. In therapies that aim to remove Aβ from the brain, involvement of Aβ in cerebral vasculature may have negative consequences. Studies of passive immunotherapies have required the standard use of safety MRI to identify treatment-related vasogenic edema, hemorrhage, or other complications.

Implementation challenges

Alzheimer’s disease trials face a variety of unique challenges, in addition to the challenges encountered across the spectrum of clinical research in neurodegenerative disease. As in all therapeutic areas, enrollment is a rate-limiting factor to new AD drug approval. Given the increased length and size of AD trials of potential disease-modifying therapies, issues related to recruitment and enrollment are now of greater concern than before. Trials that are slow to enroll are less likely to reach overall recruitment goals, less likely to demonstrate positive effect in the primary outcome, and less likely to be published. Challenges to enrollment have resulted in underpowered and even halted trials in AD [37, 38].

Age is inversely correlated with likelihood of participation in clinical research. Because AD is a disease more common among the elderly, this serves as a barrier to trial participation. Elderly patients also suffer from increased comorbidities and take a high number of concomitant therapies, both of which can prevent a patient from meeting eligibility criteria.

Every trial in AD and MCI, and even prevention trials in cognitively normal individuals, requires the participation of a study partner. The primary caregiver most often fills the role of the study partner, though others typically are permitted to serve in this role. Defined requirements to act as study partner vary from study to study, most often delineated by the number of interactions per week (e.g., three) or the number of hours spent with the subject on a weekly basis (e.g., 10 hours/week). The role of the study partner in AD trials is an important one: ensuring medication compliance, study visit attendance, and fulfilling the role of informant for outcomes assessments that require caregiver interview (e.g., CDR).

Among AD trials, minority recruitment is very low. Faison and colleagues recently assessed minority participation in ADCS and industry-sponsored trials over a 10-year period [59]. In ADCS trials, 90% of all participants were Caucasian. In industry-sponsored trials, Caucasians made up over 96% of all participants. African Americans and Hispanics are at greater risk of developing AD, although the basis of their increased risk is unclear (e.g., no consensus on the role of ApoE in minorities has been reached). Minority participation is made more difficult by the lower likelihood of diagnosis and treatment in early disease in these populations; it is clearly important that their participation in trials be increased.

Demonstration of placebo group decline is critical to the effective testing of disease-modifying therapies. If patients fail to decline when randomized to placebo, treatment effect (when desired treatment outcome is slowing or halting of decline) is impossible to detect. Variability of placebo decline has generated recent discussion in AD trials. Gold examined placebo group decline across 69 prospective double-blind AD trials that used the ADAS-cog as the primary cognitive outcome measure [39]. He concluded that only length of study predicted placebo decline across trials. Moreover, increased number of study sites, increased number of assessments, and high baseline MMSE were
all associated with a lack of placebo group decline on the ADAS-cog. Schneider and Sano recently assessed placebo group decline in eleven 18-month trials for which data were available [40]. They noted a wide range of mean placebo decline on the ADAS-cog in these studies from 4.34 to 9.10 points, despite largely similar inclusion criteria. Mean placebo group decline at 18 months for the examined trials was 6.5 points and roughly 25% of patients declined by 1 point or less.

Trials of disease-modifying therapies are generally at least 18 months in length to allow for sufficient placebo group decline. It remains unclear if this is sufficient time to demonstrate disease-modifying efficacy. Recent large 18-month trials of tramiprosate and tarenflurbil failed to demonstrate significant differences from placebo.

Most 18-month trials are powered to detect a difference of 2 points in mean ADAS-cog scores at study conclusion. Depending on the expected slowing of rate of decline, however, this suggests that the rate of decline observed in these studies might or might not be sufficient for a positive trial. A drug that slows rate of decline by 50% would require 18 months to demonstrate efficacy if the placebo group was declining by 3 points annually. A drug that slowed decline by 25% would require a 3-year trial to demonstrate efficacy under the same parameters. Such effectiveness, however, might represent a significant improvement in the quality of care and is still worthy of marketing. Uncertainties related to effect sizes of potential disease modifying therapies and variance in placebo group decline make planning of well-powered but cost-effective large-scale trials difficult.

Finally, it is clear that AD biology begins prior to clinical phenomenology and this window of time may represent the ideal point of intervention. Trials enrolling individuals who fit criteria for preclinical or prodromal AD may have the greatest likelihood of success, since the pathological burden of disease is minimal. Further, cognitive rescue is greatest in this disease phase. Conducting trials in such populations is difficult, however. Individuals must be identified, informed of their condition, and recruited to long-term participation. Fulfillment of entry criteria, by definition, includes presence of a biomarker of disease. Currently, the biological marker with the greatest predictive sensitivity for AD dementia is CSF analysis via lumbar puncture (LP). Convincing individuals who may think that they are cognitively normal to undergo LP so that they can then learn that they have a disease and/or are likely to soon develop AD dementia is a difficult and ethically challenging arena.

Methodological limitations/controversies

The rapid and substantial increase in research focus, drug development, and trial conduct in AD has not been without challenges. Concerns exist over the lack of regulatory guidance for the next age of AD drug development, i.e., therapies that slow disease progression rather than or in addition to causing symptomatic benefit. It is unclear what information will be included in the prescribing information of the first agent that demonstrates such efficacy. It is also unclear what requirements will be necessary to make such a claim.

New diagnostic criteria for AD have been proposed but have not been endorsed by regulatory bodies [4]. It is a consensus among investigators that implementation of such criteria would improve trials and increase likelihood of success. Such criteria include clinical and biomarker evidence of AD. Alternatively, the use of biomarkers as key secondary outcome measures has become common, but no biological measure of disease has been accepted as a surrogate marker for disease progression or drug efficacy.

Because of the lack of surrogate markers, current trials rely upon clinical assessment tools as primary outcome measures. The minimal clinically significant benefit of therapy for disease-modifying agents is controversial. A European group of experts concluded that a 2-point difference from placebo on the ADAS-cog at 18 months was a ‘minimal clinically important change’ [41]. Others, however, regard global and/or functional measures as being of greater importance. For these tools as well, however, the minimal difference and time course for establishing such a difference remain debated and lacking regulatory guidance.

The recent failures in large-scale phase 3 trials have led to questions regarding which agents should be taken forward into large-scale clinical development. Data from phase 2 are often limited to non-significant results on measures of efficacy and whether such clinical data represent sufficient rationale for moving to phase 3 is debated. Support of such decisions with biomarker data and POP studies should be used more readily.

Finally, it remains unclear if the dominant theory in AD is correct and if the majority of disease modifying therapies aim at the appropriate target. The
amyloid hypothesis remains well-supported by basic and clinical research. Alternate hypotheses exist, however, and the need to pursue other lines of therapeutic research is compelling. Therapies that aim at other pathological characteristics of AD are in development, but lag behind those that aim at Aβ. Inhibitors of the kinases that phosphorylate tau and phosphatases that attempt to reverse this phosphorylation are in development. Agents that aim to protect neurons from death, independent of the cause of that death, including mitochondrial stabilizers and neurotrophic factors, are also in development. Agents that aim at the intrinsic mechanisms of aging, the single greatest risk factor for AD, such as resveratrol and other antioxidants are now being tested clinically.

In summary, AD trials represent an area of urgent need, tremendous enthusiasm, and great promise. An unprecedented number of trials are currently under way at all levels of development. A wide array of mechanisms of action and therapeutic targets are being pursued. Despite this diversity, trial designs and tools are largely unchanged in the modern era of AD research. It is possible that similar evolution in the way that trials are conducted will be needed before effective disease modifying therapies can be demonstrated as clinically effective, approved for large-scale marketing, and utilized to avoid an extraordinary health care burden.

References

1. Shankar GM, Li S, Mehta TH, et al. Amyloid-beta protein dimers isolated directly from Alzheimer’s brains impair synaptic plasticity and memory. Nat Med 2008; 14: 837–42.
2. Meyer-Luehmann M, Spires-Jones TL, Prada C, et al. Rapid appearance and local toxicity of amyloid-beta plaques in a mouse model of Alzheimer’s disease. Nature 2008; 451: 720–24.
3. Price JL, McKeel DW, Jr., Buckles VD, et al. Neuropathology of nondemented aging: presumptive evidence for preclinical Alzheimer disease. Neurobiol Aging 2009; 30: 1026–36.
4. Dubois B, Feldman HH, Jacova C, et al. Research criteria for the diagnosis of Alzheimer’s disease: revising the NINCDS-ADRDA criteria. Lancet Neurol 2007; 6: 734–46.
5. Cummings JL. Defining and labeling disease-modifying treatments for Alzheimer’s disease. Alzheimers Dement 2009; 5: 406–18.
6. Brookmeyer R, Gray S, and Kawas C. Projections of Alzheimer’s disease in the United States and the public health impact of delaying disease onset. Am J Public Health 1998; 88: 1337–42.
7. Leber P. Observations and suggestions on antidementia drug development. Alzheimer Dis Assoc Disord 1996; 10 Suppl 1: 31–5.
8. Mohs RC, Knopman D, Petersen RC, et al. Development of cognitive instruments for use in clinical trials of antidementia drugs: additions to the Alzheimer’s Disease Assessment Scale that broaden its scope. The Alzheimer’s Disease Cooperative Study. Alzheimer Dis Assoc Disord 1997; 11 Suppl 2: S13–21.
9. Holford NH and Peace KE. Results and validation of a population pharmacodynamic model for cognitive effects in Alzheimer patients treated with tacrine. Proc Natl Acad Sci U S A 1992; 89: 11471–5.
10. Ito K, Ahadieh S, Corrigan B, et al. Disease progression meta-analysis model in Alzheimer’s disease. Alzheimers Dement 2010; 6: 39–53.
11. Green RC, Schneider LS, Amato DA, et al. Effect of tarenflurbil on cognitive decline and activities of daily living in patients with mild Alzheimer disease: a randomized controlled trial. JAMA 2009; 302: 2557–64.
12. Suh GH, Ju YS, Yeon BK, and Shah A. A longitudinal study of Alzheimer’s disease: rates of cognitive and functional decline. Int J Geriatr Psychiatry 2004; 19: 817–24.
13. Stern RG, Mohs RC, Davidson M, et al. A longitudinal study of Alzheimer’s disease: measurement, rate, and predictors of cognitive deterioration. Am J Psychiatry 1994; 151: 390–96.
14. Kleiman T, Zdanys K, Black B, et al. Apolipoprotein E epsilon4 allele is unrelated to cognitive or functional decline in Alzheimer’s disease: retrospective and prospective analysis. Dement Geriatr Cogn Disord 2006; 22: 73–82.
15. Harrison J, Minassian SL, Jenkins L, et al. A neuropsychological test battery for use in Alzheimer disease clinical trials. Arch Neurol 2007; 64: 1323–9.
16. Black SE, Doody R, Li H, et al. Donepezil preserves cognition and global function in patients with severe Alzheimer disease. Neurology 2007; 69: 459–69.
17. Feldman H, Sauter A, Donald A, et al. The disability assessment for dementia scale: a 12-month study of functional ability in mild to moderate severity Alzheimer disease. Alzheimer Dis Assoc Disord 2001; 15: 89–95.
18. Schneider LS, Raman R, Schmitt FA, et al. Characteristics and performance of a modified version of the ADCS-CGIC CIBIC+ for mild cognitive impairment clinical trials. Alzheimer Dis Assoc Disord 2009; 23: 260–7.
19. Morris JC. The Clinical Dementia Rating (CDR): current version and scoring rules. Neurology 1993; 43: 2412–4.
20. Lyketsos CG, Lopez O, Jones B, et al. Prevalence of neuropsychiatric symptoms in dementia and mild cognitive impairment: results from the cardiovascular health study. JAMA 2002; 288: 1475–83.
21. Cummings JL, Mega M, Gray K, et al. The Neuropsychiatric Inventory: comprehensive assessment of psychopathology in dementia. Neurology 1994; 44: 2308–14.
22. Doody RS, Gavrilova SI, Sano M, et al. Effect of dimebon on cognition, activities of daily living, behaviour, and global function in patients with mild-to-moderate Alzheimer’s disease: a randomised, double-blind, placebo-controlled study. Lancet 2008; 372: 207–15.
23. Tariot PN, Farlow MR, Grossberg GT, et al. Memantine treatment in patients with moderate to severe Alzheimer disease already receiving donepezil: a randomized controlled trial. JAMA 2004; 291: 317–24.
24. Winblad B, Gauthier S, Scinto L, et al. Safety and efficacy of galantamine in subjects with mild cognitive impairment. Neurology 2008; 70: 2024–35.
25. Wimo A, Winblad B, Shah SN, et al. Impact of donepezil treatment for Alzheimer’s disease on caregiver time. Curr Med Res Opin 2004; 20: 1221–5.
26. Wimo A, Winblad B, Stoffler A, et al. Resource utilisation and cost analysis of memantine in patients with moderate to severe Alzheimer’s disease. Pharmacoeconomics 2003; 21: 327–40.
27. Logsdon RG, Gibbons LE, McCurry SM, and Teri L. Assessing quality of life in older adults with cognitive impairment. Psychosom Med 2002; 64: 510–19.
28. Shin IS, Carter M, Masterman D, et al. Neuropsychiatric symptoms and quality of life in Alzheimer disease. Am J Geriatr Psychiatry 2005; 13: 469–74.
29. Salloway S, Sperling R, Gilman S, et al. A phase 2 multiple ascending dose trial of bapineuzumab in mild to moderate Alzheimer disease. Neurology 2009; 73: 2061–70.
30. Bateman RJ, Siemers ER, Mawuenyega KG, et al. A gamma-secretase inhibitor decreases amyloid-beta production in the central nervous system. Ann Neurol 2009; 66: 48–54.
31. Doody RS, Ferris SH, Salloway S, et al. Donepezil treatment of patients with MCI: a 48-week randomized, placebo-controlled trial. Neurology 2009; 72: 1555–61.
32. Salloway S, Ferris S, Kluger A, et al. Efficacy of donepezil in mild cognitive impairment: a randomized placebo-controlled trial. Neurology 2004; 63: 651–7.
33. Cummings JL, Doody R, and Clark C. Disease-modifying therapies for Alzheimer disease: challenges to early intervention. Neurology 2007; 69: 1622–34.
34. www.clinicaltrials.gov.
35. Mohs RC, Doody RS, Morris JC, et al. A 1-year, placebo-controlled preservation of function survival study of donepezil in AD patients. Neurology 2001; 57: 481–8.
36. Sano M, Ernesto C, Thomas RG, et al. A controlled trial of selegiline, alpha-tocopherol, or both as treatment for Alzheimer’s disease. The Alzheimer’s Disease Cooperative Study. N Engl J Med 1997; 336: 1216–22.
37. Gomez-Isla T, Blesa R, Boada M, et al. A randomized, double-blind, placebo controlled-trial of triflusal in mild cognitive impairment: the TRIMCI study. Alzheimer Dis Assoc Disord 2008; 22: 21–29.
38. de Jong D, Jansen R, Hoefnagels W, et al. No effect of one-year treatment with indomethacin on Alzheimer’s disease progression: a randomized controlled trial. PLoS One 2008; 3: e1475.
39. Gold M. Study design factors and patient demographics and their effect on the decline of placebo-treated subjects in randomized clinical trials in Alzheimer’s disease. J Clin Psychiatry 2007; 68: 430–8.
40. Schneider LS and Sano M. Current Alzheimer’s disease clinical trials: methods and placebo outcomes. Alzheimers Dement 2009; 5: 388–97.
41. Vellas B, Andrieu S, Sampaio C, and Wilcock G. Disease-modifying trials in Alzheimer’s disease: a European task force consensus. Lancet Neurol 2007; 6: 56–62.
42. Knapp MJ, Knopman DS, Solomon PR, et al. A 30-week randomized controlled trial of high-dose tacrine in patients with Alzheimer’s disease. The Tacrine Study Group. JAMA 1994; 271: 985–91.
43. Rogers SL, Doody RS, Mohs RC, Friedhoff LT. Donepezil improves cognition and global function in Alzheimer disease: a 15-week, double-blind, placebo-controlled study. Donepezil Study Group. Arch Intern Med 1998; 158: 1021–1031.
44. Rosler M, Anand R, Cicin-Sain A, et al. Efficacy and safety of rivastigmine in patients with Alzheimer’s disease: international randomised controlled trial. BMJ 1999; 318: 633–8.
45. Tariot PN, Solomon PR, Morris JC, et al. A 5-month, randomized, placebo-controlled trial of galantamine in AD. The Galantamine USA-10 Study Group. Neurology 2000; 54: 2269–76.
46. Raskind MA, Peskind ER, Wessel T, Yuan W. Galantamine in AD: A 6-month randomized, placebo-controlled trial with a 6-month extension. The Galantamine USA-1 Study Group. Neurology 2000; 54: 2261–8.
47. Winblad B, Cummings J, Andreasen N, et al. A six-month double-blind, randomized, placebo-controlled study of a transdermal patch in Alzheimer’s disease – rivastigmine patch versus capsule. Int J Geriatr Psychiatry 2007; 22: 456–67.
48. Winblad B, Kilander L, Eriksson S, et al. Donepezil in patients with severe Alzheimer’s disease: double-blind, parallel-group, placebo-controlled study. Lancet 2006; 367: 1057–65.
49. Shumaker SA, Legault C, Rapp SR, et al. Estrogen plus progestin and the incidence of dementia and mild cognitive impairment in postmenopausal women: the Women’s Health Initiative Memory Study: a randomized controlled trial. JAMA 2003; 289: 2651–62.
50. Shumaker SA, Legault C, Kuller L, et al. Conjugated equine estrogens and incidence of probable dementia and mild cognitive impairment in postmenopausal women: Women’s Health Initiative Memory Study. JAMA 2004; 291: 2947–58.
51. Sano M, Jacobs D, Andrews H, et al. A multi-center, randomized, double blind placebo-controlled trial of estrogens to prevent Alzheimer’s disease and loss of memory in women: design and baseline characteristics. Clin Trials 2008; 5: 523–33.
52. Meinert CL, McCaffrey LD, Breitner JC. Alzheimer’s Disease Anti-inflammatory Prevention Trial: design, methods, and baseline results. Alzheimers Dement 2009; 5: 93–104.
53. DeKosky ST, Williamson JD, Fitzpatrick AL, et al. Ginkgo biloba for prevention of dementia: a randomized controlled trial. JAMA 2008; 300: 2253–62.
54. Vellas B, Andrieu S, Ousset PJ, et al. The GuidAge study: methodological issues. A 5-year double-blind randomized trial of the efficacy of EGb 761 for prevention of Alzheimer disease in patients over 70 with a memory complaint. Neurology 2006; 67: S6–11.
55. Thal LJ, Ferris SH, Kirby L, et al. A randomized, double-blind, study of rofecoxib in patients with mild cognitive impairment. Neuropsychopharmacology 2005; 30: 1204–15.
56. Feldman HH, Ferris S, Winblad B, et al. Effect of rivastigmine on delay to diagnosis of Alzheimer’s disease from mild cognitive impairment: the InDDEx study. Lancet Neurol 2007; 6: 501–12.
57. Petersen RC, Thomas RG, Grundman M, et al. Vitamin E and donepezil for the treatment of mild cognitive impairment. N Engl J Med 2005; 352: 2379–88.
58. Petersen RC, Smith GE, Waring SC, et al. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol 1999; 56: 303–8.
59. Faison WE, Schultz SK, Aerssens J, et al. Potential ethnic modifiers in the assessment and treatment of Alzheimer’s disease: challenges for the future. Int Psychogeriatr 2007; 19: 539–558.
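
Worked example: trial duration and slowing of decline. The discussion of implementation challenges in this chapter states that most 18-month trials are powered to detect a 2-point difference in mean ADAS-cog scores, and relates trial length to the placebo rate of decline and the fraction by which a drug slows it. The sketch below reproduces that arithmetic under a strong simplifying assumption that is not made explicit in the chapter: decline is linear in time in both groups, so the drug-placebo gap grows at (placebo rate × slowing fraction) points per year. The function name and parameters are hypothetical and chosen only for illustration. The calculation ignores variance, sample size, and dropout, so it is not a power calculation.

```python
def months_to_reach_difference(placebo_decline_per_year: float,
                               slowing_fraction: float,
                               target_difference: float = 2.0) -> float:
    """Months until the mean drug-placebo gap reaches target_difference.

    Illustrative assumption: both groups decline linearly, and the drug
    reduces the placebo rate of decline by slowing_fraction (0 to 1).
    """
    if placebo_decline_per_year <= 0:
        raise ValueError("placebo_decline_per_year must be positive")
    if not 0 < slowing_fraction <= 1:
        raise ValueError("slowing_fraction must be in (0, 1]")
    # Points per year by which the treated group falls behind placebo less.
    gap_per_year = placebo_decline_per_year * slowing_fraction
    return 12.0 * target_difference / gap_per_year


if __name__ == "__main__":
    # Placebo decline of 3 ADAS-cog points per year, as in the chapter's example.
    for slowing in (0.50, 0.25):
        months = months_to_reach_difference(3.0, slowing)
        print(f"{slowing:.0%} slowing: about {months:.0f} months")
```

Under these assumptions a 50% slowing reaches a 2-point gap at 16 months and a 25% slowing at 32 months, which is in line with the 18-month and 3-year trial lengths quoted in the chapter once durations are rounded up to conventional trial lengths.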
Section 6: Clinical trials in common neurological disorders
Chapter 22: Acute ischemic stroke
Devin L. Brown, Karen C. Johnston, and Yuko Y. Palesch
Overview

In this chapter, we will discuss clinical trials in acute ischemic stroke. Stroke is one of the leading causes of death in the US and the leading cause of adult disability. Unfortunately, there is currently only one FDA-approved treatment for this devastating and common disease. Ischemic stroke prevention and rehabilitation strategies share little in common with acute therapies so we will focus on acute therapies.

Biological basis

Recanalization

Cerebral infarction is the result of severe enough ischemia for a sufficient time to result in cell death. The progression toward infarction includes protein synthesis failure, anaerobic metabolism, release of neurotransmitters, energy failure, and ultimately, when the threshold of <0.15 cc/gm/min of blood flow is reached, anoxic depolarization [1]. If hypoperfusion can be remedied quickly, penumbral tissues which were not yet critically hypoperfused can be saved. This is the physiological basis of recanalization therapy with lytics and mechanical agents. Ultrasound and physiological studies have shown that recanalization with intravenous recombinant tissue plasminogen activator (IV rt-PA) is associated with tissue salvage and better clinical outcomes [2, 3]. Clinical trials have shown that IV rt-PA is associated with better outcomes compared with control groups [4, 5]. Similarly, successful endovascular clot removal, retrieval, and lysis have been associated with better clinical outcomes than persistent arterial occlusion [6–8]. However, no mechanical system has yet been tested in clinical efficacy studies.

Neuroprotection

An alternative or complementary strategy to recanalization is neuroprotection aimed at interruption of the ischemic cascade for tissue preservation. Numerous aspects such as energy supply failure, membrane depolarization, excitatory amino acid release, intracellular calcium accumulation, free radical elaboration, and cellular edema can be targeted [1]. Despite successes in animal models, no neuroprotective agent has yet been successful in humans. The multitude of failed neuroprotective clinical trials led to the development of the Stroke Therapy Academic Industry Roundtable (STAIR) recommendations [9]. These recommendations describe guidelines for preclinical development of potential neuroprotective agents in the hopes that more rigorous preclinical preparation and drug selection will ultimately yield a successful neuroprotective agent.

Time window

Because ischemia causes time dependent tissue injury, time is critical in initiation of acute stroke therapies. If efficacious therapies can be initiated early enough, the ischemic penumbra can be salvaged, tissue damage can be limited, and clinical outcomes can be improved. Even the most efficacious therapy will fail however if the stroke is completed and no viable tissue remains to be rescued at the time of drug administration. Time is our best marker of salvageable tissue currently, although clearly individuals respond differently to duration of ischemia likely dependent on collateral flow, age, and many other factors. Functional imaging, originally with PET, but now more commonly with CT perfusion and MR perfusion imaging can be used to study the ischemic penumbra. When mismatch of infarct and perfusion deficit is identified, salvageable penumbra
may exist. Studies have shown that thrombolytic therapy is more efficacious in those with existing penumbra [10], but no efficacy study has yet proven that functional imaging can be used to extend the time window for thrombolytic administration [11]. The NINDS rt-PA Stroke Study [5] showed that IV rt-PA is efficacious when used in the first 3 hours of stroke symptom onset, and ECASS III [4] extended this to 4.5 hours. PROACT II [8] showed that endovascular prourokinase, when initiated within 6 hours of proximal middle cerebral artery occlusion and infused over 2 hours, also improves clinical outcomes. Other single arm studies [6] have initiated endovascular recanalization therapy out to 8 hours, but no randomized, controlled efficacy study has proven any clinical benefit to any recanalization therapy initiated after 6 hours. Increased risk of intracranial hemorrhage and less benefit were found in PROACT II compared with the NINDS rt-PA Stroke Study, but this may be due to the more severely affected patients in the endovascular treatment trial.

Goals of intervention

Reduction of death and disability

The main purpose of acute stroke therapy is to reduce death and disability. The only currently FDA-approved therapy for acute ischemic stroke, IV rt-PA, has been proven to reduce disability but does not have an effect on mortality [5]. One would assume that a significant and early reduction in post-stroke deficits should reduce potentially fatal stroke-related complications such as aspiration pneumonia and pulmonary emboli, common causes of death post stroke. Nevertheless, functional recovery is a highly meaningful outcome measure.

Reduction of brain injury volume through penumbral salvage (primary injury)

As discussed above, at the initiation of ischemia, there is a core of infarction surrounded by viable but impaired tissue. Expeditious reperfusion can save the impaired tissue and return it to normal function. Failure to save this tissue results in permanent structural changes and ultimately necrosis of neuronal cells. Volume of infarction does relate to ultimate outcome, but the relationship is non-linear, and depends on location, age, and other factors. Furthermore, there is some evidence that the effects of infarcted tissue, even if it does not directly result in disability, accumulate and may contribute to cognitive dysfunction and poor brain ‘reserve.’ Therefore, preservation of brain tissue is also a goal. In addition to limiting the primary injury caused by ischemia through penumbral salvage, interruption of the ischemic cascade through neuroprotection may decrease secondary injury. Ultimately, the tissue goal is similar: to limit the amount of infarcted brain.

Properties and measurement tools

Biomarkers/biological outcome

Biomarkers may provide an efficient means of determining biological effects of new agents and are useful outcomes in middle development because they predict a clinical endpoint. Surrogate markers, biomarkers that capture the full major effects of a treatment [12, 13], are used to substitute for clinical outcome measures, but have not been accepted in late development acute stroke trials.

Recanalization/reperfusion

Recanalization, the re-establishment of arterial patency, can be assessed by angiography and indirectly by transcranial Doppler ultrasound [14], and graded. Results may be confusing due to inconsistencies in the application of recanalization rating scales, and therefore, it has been recommended that all trials reporting angiographic outcomes include information on target vessel patency, distal filling, and capillary phase perfusion [15]. Recanalization relates but is not identical to antegrade reperfusion, which is the volume of flow through the previously occluded vessel, and collateral perfusion, which represents the volume of flow through collaterals to the ischemic region. Even when recanalization is successful, the region may remain ischemic due to distal emboli. Furthermore, flow may be established too late for some or all of the ischemic tissue to be preserved. Therefore, recanalization can be an important marker of treatment effects, but alone is not sufficient to determine whether a treatment is going to be effective. In middle development studies, early recanalization should be assessed in addition to assessment of late infarct volume [15]. PROACT I provides an example of a middle phase study that used recanalization as the primary endpoint [16]. Subjects with M1 or M2 occlusions had prourokinase or placebo infused directly into the proximal portion of the thrombus, initiated within 6 hours of stroke symptom onset. Both groups received heparin. The primary efficacy outcome was recanalization of the M1 or M2 2 hours after the initiation of the treatment. There was also a primary safety outcome: symptomatic intracerebral hemorrhage within 24 hours of treatment.
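
A binary recanalization endpoint of the kind just described is usually analyzed by comparing the proportion recanalized in each arm. The following is a minimal sketch of such a comparison using a two-sided Fisher exact test. The counts, the function name `fisher_exact_two_sided`, and the choice of test are illustrative assumptions; they are not taken from PROACT I or any other trial, and the text above does not state which test that study used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are treatment arms; columns are recanalized / not recanalized.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x recanalized subjects in arm 1,
        # with all table margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Hypothetical counts, for illustration only.
active_recanalized, active_total = 12, 20
placebo_recanalized, placebo_total = 4, 20

p_value = fisher_exact_two_sided(
    active_recanalized,
    active_total - active_recanalized,
    placebo_recanalized,
    placebo_total - placebo_recanalized,
)
risk_difference = (
    active_recanalized / active_total - placebo_recanalized / placebo_total
)
print(f"Recanalization difference: {risk_difference:.2f}, two-sided p = {p_value:.4f}")
```

With the small arms typical of middle phase studies, an exact test avoids reliance on large-sample approximations. An actual trial would pre-specify its own test and any covariate adjustment in the protocol.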

Imaging outcomes

Infarction size on CT or MRI can be used as an outcome measure in middle phase studies. Final difference between treatment groups can be compared, or alternatively, differences between baseline infarction size and final size can be compared when baseline measures are feasible. However, location of infarction and clinical deficits are also important factors in ultimate outcome in addition to lesion size.

Safety outcome measures

Symptomatic intracranial hemorrhage

Symptomatic intracranial hemorrhage (sICH) is a highly feared complication of acute stroke recanalization therapies because it carries with it a high risk of poor outcome including death. It has been defined differently in different trials but is often characterized by any hemorrhage on CT scan within the first 24–48 hours after stroke symptom onset accompanied by a meaningful deterioration in neurological status, sometimes defined by a worsening on the NIHSS by 4 or more points [17]. CT evidence of hemorrhage has been graded by the European Cooperative Acute Stroke Studies (ECASS) investigators into four categories: hemorrhagic infarction-1 (HI-1) with small petechial hemorrhage, hemorrhagic infarction-2 (HI-2) with confluent petechial hemorrhage, parenchymal hematoma-1 (PH-1) where the hematoma consumes less than 30% of the infarcted area with a mild space-occupying effect, and parenchymal hematoma-2 (PH-2) where the hematoma takes up greater than 30% of the infarcted area and exerts a significant space-occupying effect. Most definitions of sICH do not include a requirement for parenchymal hematoma, but neurological deterioration is more likely with larger amounts of hemorrhage.

Asymptomatic ICH

It has been argued that while parenchymal hematomas are due to a thrombolytic effect, hemorrhagic transformation is related to other factors and is an irrelevant epiphenomenon [18]. Hemorrhagic transformation is typically not accompanied by symptomatic worsening. However, there is some recent evidence that HI-2 is associated with poor clinical outcomes and that the outcome is proportional to the extent of hemorrhage [19]. It is uncertain whether asymptomatic ICH is a meaningful safety outcome because of its lack of clear clinical significance. However, asymptomatic hemorrhage is more common in lytic treated subjects and may be a marker for lytic activity [17].

Neurological worsening

Neurological deterioration may be due to a variety of causes such as sICH, seizure, intracranial hypertension, recurrent stroke, and medical illnesses that include pneumonia and urinary tract infections. Neurological worsening is often specified as a deterioration in NIHSS of 4 points or more, and usually triggers a mandatory head CT to investigate the possibility of sICH. Neurological worsening is typically a safety outcome which is not necessarily reported in the primary trial publication; it is often due to the underlying disease and not the study treatment. Because there can be temporary fluctuations early after stroke, the duration of worsening that constitutes neurological worsening is often specified.

Serious adverse events

Although, in accordance with Good Clinical Practice guidelines, all serious adverse events (SAEs) should be collected in a clinical trial, many SAEs in the 3 months after stroke are related to the underlying stroke rather than the study treatment. However, there are drug- and device-related SAEs in addition to sICH. For instance, thrombolytic treatment can cause angioedema and associated respiratory compromise, although this is uncommon. Use of recanalization devices can cause arterial dissection, vascular perforation, and embolization into a previously unaffected vessel, while angiography itself carries a risk of retroperitoneal hemorrhage and contrast-related complications. Intubation is required for many recanalization procedures, which also can result in complications.

Death

Death is a clinical outcome that is recorded in stroke studies as an SAE but is also often a part of a pre-specified outcome measure, such as in the modified Rankin Scale. On many rating scales, death is ascribed the worst outcome score (e.g., modified Rankin Scale), while with other scales there is no provision for death (e.g., Barthel Index).

Clinical efficacy outcome measures – definitive endpoint

Traditional scales

The selection of an endpoint for a clinical trial depends on the intervention’s mechanism of action and expected effect. Ideally an endpoint should be reliable, reproducible, sensitive, easy to measure, and clinically
meaningful (for later stage trials). The use of more sensitive outcome measures should in general help reduce sample size requirements. For middle phase studies, the endpoint should relate to the mechanism of action of the treatment, even if it is not the most clinically relevant outcome, given the goal of identifying a biological effect with the fewest patients necessary.

A variety of measures can be used to assess recovery post stroke, and there is no consensus about which measure or what cutoffs to use (Table 22.1). The most common efficacy endpoint measures in acute stroke therapy trials are the modified Rankin Scale and the NIHSS [20], most often performed at 3 months post stroke. To improve standardization, certified personnel should be used to administer the NIHSS, and a structured interview should be used for the modified Rankin Scale. The modified Rankin Scale, while simple and reliable, is insensitive and only has 7 categories [21]. The Barthel Index is valid and reliable but is insensitive to small changes and is limited by a ceiling effect [21]. Those attaining the highest (i.e., best) score can nonetheless have significant disabilities [22]. Because placebo-treated patients tend to achieve a ‘favorable outcome’ more frequently for disability than for neurological impairment measures, larger sample sizes may be required when using a disability index, such as the modified Rankin Scale. Required sample sizes may be lower when using the modified Rankin Scale than the Barthel Index given that the modified Rankin Scale is more sensitive to change [23, 24]. Two more comprehensive and stroke-specific outcome measures were more recently developed: the Stroke-Specific Quality of Life (SS-QOL) [25] and the Stroke Impact Scale (SIS) [26]. Both expand the spectrum of limitations in activities, physical abilities, and participation. These stroke-specific measures are increasing in use in clinical trials as secondary outcomes.

Table 22.1 Common stroke outcome measures

| Category | Scale | Measurement | Scoring |
|---|---|---|---|
| Neurological impairment | NIH Stroke Scale [18] | Neurological examination based on 13 clinical items | Ordinal scale ranging from 0 (best) to 2 or 3 or 4 (worst) for each of 13 items |
| Disability measures | Modified Rankin Scale [12] | Functional assessment ranging from no symptoms to death | Ordinal scale ranging from 0 (no symptoms) to 6 (death) |
| Disability measures | Barthel Index [13] | Activities of daily living based on 10 questions on feeding, bathing, grooming, dressing, toilet use, transfers, mobility, stairs, bowel, and bladder continence | Total score from 10 items ranging from 0 to 100 (best) |
| Quality of life | Stroke Impact Scale [26] | 59 questions covering 9 domains (strength, hand function, activities of daily living, mobility, communication, emotion, memory and thinking, and social participation) | Each domain ranging from 0–100 (best) |
| Quality of life | Stroke Specific Quality of Life [25] | 49 questions covering 12 domains (energy, family roles, language, mobility, mood, personality, self-care, social roles, thinking, upper extremity function, vision, and work/productivity) | Overall score is the average of all domains |

Non-traditional scales

As mentioned, quality of life is an important measurement that represents a comprehensive patient-oriented outcome measure. When measured, it is typically relegated to a secondary outcome. Cognitive outcomes are also gaining recognition as important post-stroke measures. No single measure of cognitive function post stroke is accepted, and measurements are complicated by aphasia.

Global statistics

Global statistics incorporate results of more than one measure simultaneously and therefore may increase the chance of identifying a treatment effect, especially within a heterogeneous study population [22]. Study power using a global measure is at least equal to and
often greater than using a single measure, assuming a common treatment effect among the measures. For these reasons, it was used to test the primary hypothesis of part II of the NINDS rt-PA Stroke Study. The global statistic incorporated the results of four pre-specified outcome measures: the NIHSS, Barthel Index, modified Rankin Scale, and the Glasgow Outcome Scale [5]. Patients who were deceased at the time of the outcome assessment at 90 days were ascribed the lowest score in each scale. The global statistics test was significant, as were the tests of the individual outcome measures. Because the results of a global statistic may be difficult to interpret clinically, the FDA may require justification for its use in an acute stroke trial, where a single outcome measure may be sufficient to capture the effect of the treatment.

Early vs. late clinical outcomes

Three months is the most common clinical endpoint in late phase acute stroke trials [21]. Much of the recovery from acute stroke is thought to have occurred by this time, and thus greater differences between the treatment groups may be seen. However, this later endpoint may introduce additional variability compared with earlier time points, from factors unrelated to the treatment allocation. For instance, differences in recurrent stroke or rehabilitation programs may contribute to late outcome differences that are unrelated to the study treatment. Earlier endpoints, such as 24 hours or 7 days, may be feasible depending on the mechanism of action of the treatment. Furthermore, the use of early outcomes in adaptive designs (see below) could result in increased selection efficiency, and a shortened duration in comparison with non-adaptive trials. Middle development studies often use earlier outcomes as primary endpoints. For instance, the IV rt-PA bridging study, a middle phase randomized, controlled trial of IV rt-PA compared with placebo, used a reduction in NIHSS by 4 or more points or NIHSS of zero at 24 hours as the primary outcome measure [27]. NIHSS at 24 hours has been shown to predict the modified Rankin Scale at 3 months [28].

Treatment of outcome analytically – dichotomous, ordinal, continuous, sliding dichotomy

Dichotomous treatment of ordinal outcome scales is common in acute stroke trials. For example, the modified Rankin Scale is often dichotomized into 0–1 vs. 2–6, for trials where very good outcomes are anticipated [5, 29], or into 0–2 vs. 3–6 where reduction in poor outcome is of interest [8, 29]. This approach is advantageous because it is analytically simple, and creates results that are clinically meaningful, interpretable, and easily described to patients. The most significant disadvantage, however, is loss of information, where small but potentially meaningful improvements can be missed. Occasionally, ordinal outcomes are trichotomized, such as in the GAIN Americas trial [30]. In this study, the Barthel Index was trichotomized into 95–100, 60–90, and 0–55 or dead. The extended Mantel-Haenszel test was used to test whether the distribution of scores was different between the treatment groups.

Shift analysis, also known as analysis of distributions or proportional odds model analysis, assesses differences in the distribution of treatment groups across the full range of an outcome scale [31, 32]. It can account for realistic treatment goals and does not require that the most severe patients demonstrate a dramatic, and perhaps unrealistic, improvement in order for the treatment to be called a success. As an example, the Stroke–Acute Ischemic NXY Treatment II (SAINT II) trial compared distribution of modified Rankin Scale scores at 90 days between those treated with a putative neuroprotectant agent, NXY-059, and placebo [33]. Because the SAINT II investigators anticipated the benefits of a neuroprotective agent to affect all levels of severity moderately, they opted to use shift analysis rather than analysis of the dichotomized modified Rankin Scale where only subjects with minimal or no disability would be counted as a ‘success.’ Distributions of scores were compared with a generalized Cochran-Mantel-Haenszel test [33] adjusting for three baseline covariates. This test does not assume proportionality of odds ratio (i.e., does not assume that the odds ratio would be the same regardless of the choice of cutpoint for dichotomization on the ordinal scale). In this study, there was no difference between the groups, but had there been, explanation of the magnitude of treatment benefit to a patient or his/her family may have been somewhat challenging.

The use of a single measure of success applied to a heterogeneous study population may obscure benefit. Therefore, some investigators have proposed different criteria for favorable outcome depending on baseline severity. For example, in the Abciximab in Emergency Stroke Treatment Trial–II (AbESTT-II) [29], this type of sliding dichotomy, or responder analysis, was used. A successful outcome, a so-called ‘responder,’ was defined as follows: if the baseline NIHSS was 4–7, the goal was
a modified Rankin Scale=0; if the baseline NIHSS was 8–14, the goal was a modified Rankin Scale=0–1; if the baseline NIHSS was 15–22, the goal modified Rankin Scale was 0–2. This is one reasonable and analytically simple way of accounting for expected differences in benefit based on initial severity. The results can also be reasonably communicated to patients by referring to the percentage in each treatment group with a favorable outcome.

Studies using individual level patient data from randomized controlled stroke trials demonstrated that tests that maintain the ordinal level of data are typically more efficient than treating functional outcome measures dichotomously [34]. In fact, on average, while maintaining the same statistical power, trials analyzed using an ordinal approach could have been 28% smaller than those measured using a binary approach. However, the analytic approach planned for a trial should be pre-specified based on a variety of factors, including the anticipated treatment effect. Simulation studies have suggested that depending on the pattern of treatment benefit, shift analysis or dichotomous analyses can be more efficient [31]. Shift analysis is likely to be more efficient for treatments that result in uniform mild benefit across outcome levels. However, since the distribution of the outcome data is unknown when designing a trial, the sample size and analysis plan should consider the effect of violation of primary analysis model assumptions, if any, on the statistical power and inferences at the conclusion of the study.

Clinical trial design used in development

Early and middle development

Dose finding

Early and middle development studies of new compounds for acute stroke require pharmacologic data such as determination of the effective plasma level, delineation of the minimum dose that achieves 95% of the maximum effect (i.e., the ED95), optimal dose, time window, and duration of therapy, and contribute to safety data. Early middle phase studies are also needed to determine safety and to gain information on efficacy. For these, broad eligibility criteria are sensible, so that danger to those with comorbidities and the elderly can be determined. Later middle phase studies may benefit from a narrowing of eligibility so that biological effects can be more easily detected. Traditional dose-finding studies, open-label dose escalation studies where a small group of patients are treated with successively higher doses of drug pending the lack of sufficient adverse events, or treated with lower doses if adverse events occur, have been used successfully. This approach was used to develop IV rt-PA. In the first IV rt-PA pilot study [35], 74 patients were treated within 7 dose tiers ranging from 0.35 mg/kg and ending with 1.08 mg/kg within 90 minutes of stroke symptom onset. Members of the safety and monitoring committee and the investigators made consensus decisions about the number of subjects treated per dose and dose advancement. The absence of a single intracranial hematoma in at least six consecutively treated patients in a dosing tier prompted a dose advancement after review. Higher numbers of subjects were required for the highest dose tiers. Two major bleeding complications in six patients at a particular dose resulted in a dose tier reduction. No intracranial hematomas occurred in any of the 58 subjects treated with ≤0.85 mg/kg, although dose tier did not relate to infarction volume. In the second IV rt-PA pilot study [36], where subjects were treated within 91–180 minutes, 20 patients were tested in 3 dose tiers: 0.6 mg/kg, 0.85 mg/kg, and 0.95 mg/kg. One fatal intracranial hemorrhage occurred in each of the highest dose tiers.

One middle development study, the Albumin in Acute Stroke (ALIAS) trial, used its dose escalation results to assess for an efficacy signal by grouping its dose tiers. ALIAS was an open-label, dose-escalation study that tested the safety of moderate to high doses of 25% human albumin in acute ischemic stroke. Six doses were administered; the lowest three were thought to be subtherapeutic based on preclinical studies. The investigators therefore grouped the outcome of the lowest three tiers and the highest three. They also compared the highest three dose tiers with data from the NINDS rt-PA Stroke Study [37].

Adaptive designs (see Chapter 9) are an efficient way of learning the dose-response relationship in real time, but have been applied infrequently in acute stroke trials. One example is the Acute Stroke Therapy by Inhibition of Neutrophils (ASTIN) study [38], which tested a neutrophil inhibitory factor in 15 doses ranging from 10–120 mg and placebo using a Bayesian adaptive dose-response finding study. This approach was designed for early termination for efficacy or futility using a clinically relevant outcome measured at 90 days post stroke. A sequential stopping rule was
applied where the effect compared with placebo of the ED95 was iteratively calculated and, if it reached a preset threshold, would trigger study termination for futility or efficacy. The study was terminated for futility after 966 subjects had been treated.

Single treatment arm studies

When new endovascular mechanical treatments are introduced, no dose-finding studies are necessary. Single treatment arm studies are often performed where patients who meet certain selection criteria are all offered the new treatment. An example of this is mechanical clot removal for larger artery occlusions with the Merci device in the Mechanical Embolus Removal in Cerebral Ischemia (MERCI) trials [6]. The primary outcome was recanalization of the target vessel. Although important safety outcomes, such as sICH and device-related complications, and clinical outcomes were also reported, they could not be compared to a contemporaneous control group. Outcomes were compared between those who had recanalization and those who did not. On the basis of mere single arm study results, the Merci retriever system received FDA approval to ‘restore blood flow in the neurovasculature by removing thrombus in patients experiencing ischemic stroke’ [39], highlighting differences in approval between drugs and devices.

Historical controls

Rather than acquiring contemporaneous controls, some single-arm studies use historical controls. These controls can be gathered through case series, or more practically by using the placebo arm of previously conducted randomized studies. For instance, the outcomes of the placebo arm and treatment arm of the NINDS rt-PA Stroke Study were compared with outcomes of those treated with combined IV and intra-arterial (IA) rt-PA in the Interventional Management of Stroke (IMS) study in the primary outcome publication [40]. The placebo group of PROACT II was used as the comparison for the MERCI trials given that both groups were large artery occlusions and treated in a similar time window [41]. These types of comparisons can provide some useful information; however, because they are not randomized, there are likely to be inherent differences between the two groups that contribute to differential outcomes. Because the controls are not contemporaneously ascertained, secular trends may also contribute to bias. Therefore, this type of study is inappropriate for a late phase study to ascertain definitively the treatment effect. However, a middle phase study using historical control data can provide preliminary information about the treatment effect more efficiently since it requires about one-fourth of the sample size of a concurrently controlled study with the same study parameters.

Futility

The purpose of middle development futility studies is to discard treatments with a small likelihood of success, and to maintain promising treatments to test in late phase studies (see Chapter 8). These protocols are typically conducted as single-arm studies in which all subjects receive the treatment under question. Outcomes used for comparison are obtained from placebo groups of other trials conducted in similar study populations, case-series, or from clinical consensus/judgment of the expected outcome in the untreated patient population. The smallest effect size considered clinically meaningful is determined to provide the threshold that the treatment must pass. To illustrate the utility of performing middle development futility analyses, investigators performed simulated futility analyses applied to data from a convenience sample of a mixture of positive and negative previously published late phase acute stroke trials [42]. In this analysis, futility was established based on the simulations for three treatments, all of which had negative late phase results, using only a small fraction of the sample size that was required for the late phase studies. Three studies did not show futility; one of these had a positive result in the late phase study. Thus, with a fraction of the sample size required for an efficacy study, a single arm futility study can help discard treatments with a low likelihood of success.

Late phase

Parallel group randomized controlled trial

Use of placebos vs. active control (IV rt-PA)

When the NINDS rt-PA Stroke Study was performed, there were no approved acute therapies with which IV rt-PA could be compared. Differences between a new intervention and placebo are generally greater than between a new intervention and a proven treatment. Now that IV rt-PA is FDA-approved, and has been shown to be an efficacious treatment in clinical trials, alternative lytics must be tested against IV rt-PA rather than a placebo within the 3 hour window, and
due to ECASS III [17], within the 3–4.5 hour window as well. However, patients with very severe strokes, or other particular patient groups that would have been excluded from ECASS III trial enrollment, can be enrolled in a placebo-controlled acute stroke therapy trial within the 3–4.5 hour window. Studies that test new devices and drugs within the 4.5 hour window in subjects who may have been treated with IV rt-PA should block stratify based on IV rt-PA treatment.

Randomization allocation ratio

In acute stroke trials, subjects are generally randomized to treatment groups in a 1:1 fashion, where trial power is optimized. However, it may entice patients to enroll if the chance of receiving a placebo is reduced below 50%. For instance, in PROACT II, subjects were allocated in a 2:1 fashion to active treatment and control groups [8]. Uneven group allocation may also allow additional experience to be obtained with a treatment or within a patient subset. However, if the allocation ratio reaches or exceeds 3 (i.e., the proportion on the treatment reaches or exceeds 0.75), statistical test power is significantly reduced for the same total sample size.

Subject selection

Eligibility criteria can be broad, or more focused in an attempt to find subjects who will have more similar responses. Widened eligibility criteria improve generalizability, and increase the available sample population. However, widening introduces heterogeneity and may include subjects with a low likelihood of response to the treatment. In early and mid-development studies, limiting the sample to those who are most likely to benefit, or establishing those most likely to benefit, is helpful in establishing proof of concept. Similar eligibility criteria are then applied to the initial late phase studies. Once there is evidence of success, further trials can be designed to expand the population. For example, limiting the time window to 3 hours in the NINDS rt-PA Stroke Study [5] and then following this with other late phase studies of a more expanded window proved successful. However, there are circumstances where eligibility criteria are liberalized for the later phase studies.

Baseline severity is an important consideration. It is well known from the NINDS rt-PA trial that patients

deficits, such as an NIHSS between 7 and 20. Inclusion of the severity extremes will increase the variability of response and will require a larger sample size. To help account for expected differences in outcomes based on initial severity, some trials have stratified by baseline NIHSS. For instance, the Glycine Antagonist in Neuroprotection (GAIN) Americas trial stratified by age (≤75 vs. >75 years) and NIHSS (2–5, 6–13, or ≥14), creating six strata [30]. Accounting for baseline severity at the time of analysis is another strategy for accounting for this heterogeneity (see above ‘Treatment of outcome analytically – dichotomous, ordinal, continuous, sliding dichotomy’). In smaller trials, where the benefits of stratification are greatest, only a few stratification variables should be selected to minimize the numbers of strata. It has been recommended that the number of strata be less than the total number of subjects divided by four times the block size [43].

Imaging can also be used for patient selection. The ECASS investigators elected to exclude subjects from thrombolytic trials based on early evidence of ischemia in greater than a third of the MCA territory [4]; however, an analysis in the NINDS rt-PA Stroke Study data did not support a treatment by early ischemic change interaction [44]. Some multimodal imaging studies have shown that lack of evidence of penumbra suggests against a thrombolytic response [10]. Eliminating subjects who are unlikely to respond helps reduce study sample size.

Masking

Maintaining proper treatment masking is essential to reduce assessment bias in randomized trials. When treatment allocation masking is not possible, the clinical outcomes assessor should be masked. This can be accomplished by using two different treating and rating investigators or other study team members. Boluses and infusions of study drug and placebos can be prepared to look identical, out of the sight of investigators, by an investigational pharmacist. Masking can become more complex when combinations of agents are tested. For instance, in the Combined approach to Lysis utilizing Eptifibatide And Recombinant tissue-type plasminogen activator (CLEAR) stroke study, standard dose IV rt-PA was tested against a lower dose of IV rt-PA plus eptifibatide using a double-dummy
with a very high NIHSS have less dramatic recover- approach [45]. All patients received either 10% of the
ies. Patients with a low NIHSS on the whole tend to standard dose or 15% of the lower dose bolus in 10 ml.
do quite well and are oten normal or near normal at Patients then received the remainder of the standard
3 months. herefore, the greatest beneit may be seen dose of IV rt-PA in two sequential infusions over 30
when enrollment focuses on those with moderate minutes each, or the remainder of the low dose IVrt-PA
249
over 30 minutes followed by 30 minutes of placebo. The patients who received low dose IV rt-PA were given a 2 hour infusion of eptifibatide, and those who received standard dose IV rt-PA were given a 2 hour infusion of placebo. Volumes given to both groups at all phases were identical, and all infusions were clear.

Effect size
For sample size calculations, a minimum clinically important difference (MCID) or the effect size must be specified. Because the sample size increases with the inverse square of the MCID, detecting small differences requires a very large sample size. In general, the effect size used is the smallest clinically relevant effect. Selecting an effect size from that observed in middle phase studies can be misleading. Middle phase studies tend to have smaller sample sizes, and hence a smaller number of clinical sites and more homogeneous samples. Often, this leads to an observed treatment effect that is larger than what could be observed in larger late-phase studies with greater variability in patient characteristics as well as in clinical management at a larger number of sites. Therefore, the effect size sought should be derived from a clinical perspective, and the observed treatment effect from the middle phase studies used to determine whether the MCID selected for the late phase study can be reasonably achieved. As an example, the ECASS II investigators pre-specified a 10% difference in favorable outcome between the groups. Prior data do often inform the expected treatment difference selected in late phase studies. Examples include the benefit of PROACT I on PROACT II [8], and the influence of prior pooled data from other late phase studies on the extended time window for ECASS III [4]. Estimates of the probability of favorable outcome in the placebo group can be obtained from prior natural history studies.

Study design

Superiority
Most trials are analyzed to compare outcomes between two groups with respect to superiority in an intent-to-treat fashion. This is frequently performed by comparing the proportion of favorable outcome between the two randomized treatment groups, sometimes adjusted for baseline factors such as NIHSS score [8, 30]. Rather than a comparison between a primary outcome measured only at the final visit, comparison adjusting for the baseline value of the outcome generally increases power by decreasing the standard error of the treatment effect.

Non-inferiority
In the post IV rt-PA era, a new treatment can be tested to see if it is superior to IV rt-PA, or whether it is as good as (i.e., not inferior to) IV rt-PA. In acute stroke therapies, identifying something that works equivalently to, or not worse than, IV rt-PA would not have an important impact on patient care, unless the treatment were clearly safer than IV rt-PA. Hence, the usefulness of non-inferiority studies is currently minimal in acute stroke therapy trials [see Chapter 13]. Furthermore, depending on the threshold used to determine non-inferiority, the sample size for these studies is often quite large. Therefore, to test a new thrombolytic agent thought to be similarly effective but with a lower sICH risk, it may be more efficient to incorporate sICH into the primary outcome measure, such as by devising a scheme in which the outcome score would be penalized for an sICH. Finally, the FDA has stringent criteria for non-inferiority studies, where superiority of the active control must be well established from placebo-controlled late phase studies.

Adaptive designs
Using group sequential analysis designs, interim analyses that control for overall type I as well as type II errors can be applied so that trials can be stopped early for efficacy and/or futility. GAIN Americas performed two interim analyses in addition to the final analysis using a group sequential design to limit type I error [30]. The NINDS rt-PA Stroke Study described interim analyses after every three subjects with sICH so that the trial could be stopped early if IV rt-PA were found to be harmful.
Conditional power can be calculated to determine whether trial continuation is futile. PROACT II, for instance, had a preplanned futility analysis conducted after the first 42% of patients had completed their 3 month follow up.
Although not performed frequently, sample size can be recalculated during the trial if parameters unrelated to the efficacy comparison (i.e., nuisance parameters) from the current trial data suggest that the assumptions on which the original sample size calculations were based were incorrect. For instance, if the proportion of favorable outcome in the placebo group is closer to 50% from either direction, the sample size is likely to be inadequate, if the original assumption for the placebo group success rate was much less than or much greater
than 50%. Also, if the variance of the estimate of a continuous outcome measure was underestimated prior to study initiation, it may be prudent to re-estimate the sample size, preferably in a blinded manner using aggregate data, to ensure adequate statistical power for the final analysis. A plan for sample size re-estimation at some point in the trial should be pre-specified prior to beginning the trial.
An efficient method for adjusting sample size while preserving alpha is adaptive randomization. A recent example of this was planned for the recently published, but prematurely terminated, TNK trial [46]. This was planned as a single overarching study including a seamless transition between a middle and late phase. The first piece of the study was designed to select one of three doses of tenecteplase to use for comparison with IV rt-PA through an adaptive design based on a 24 hour outcome measure. This outcome incorporated favorable outcome and sICH using the following scoring system: sICH (0), major neurological improvement (2), neither sICH nor major improvement (1). When one of the three dose arms fell behind the best dose group by 6 points, the dose was discarded. One dose of tenecteplase was eliminated using a sequential selection procedure based on the 24 hour data from only 14 triplets (each assigned to the three different doses).

Standards for efficacy and special safety issues

Adjusted primary analysis
Even with balanced randomization, heterogeneity is known to result in bias towards the null in clinical trial analyses of dichotomous variables or survival data [47]. Adjustment for important baseline characteristics that are associated with outcome thus results in increased efficiency. For example, a reanalysis of the NINDS rt-PA Stroke Study data showed that adjustment for age, NIHSS, stroke subtype, prior disability, diabetes, and history of stroke resulted in a more extreme odds ratio (i.e., greater treatment effect) than bivariate analysis [48]. This would have resulted in a 13% smaller required sample size. Accordingly, pre-specification of a risk-adjusted analysis should be considered in stroke trials that use a binary (or survival) outcome.

Interaction with IV rt-PA
Because patients who are eligible for IV rt-PA therapy should not be denied this approved treatment in favor of enrollment in a non-thrombolytic trial, studies that include IV rt-PA eligible patients now have to account for the effects of the IV rt-PA treatment. As an example, in the SAINT II study, randomization was stratified on intent to administer IV rt-PA, in addition to country, baseline NIHSS, and side of infarction [33]. This stratification is appropriate because IV rt-PA is associated with the primary outcome. Similarly, the analysis was stratified by use of IV rt-PA, NIHSS, and side of infarct. To answer the question as to whether the effect of the study drug differed based on IV rt-PA administration, an interaction was investigated, but not identified.

Implementation issues

Recruitment and consent
Recruitment of acute stroke subjects is challenging. Patients with acute stroke often are unable to consent for themselves (70% in the NINDS rt-PA Stroke Study [49]), requiring family members to act as surrogate decision makers for research. The superimposed challenge is that decisions must be made very quickly and in the setting of a stressful event – an acute medical illness. Some small studies have suggested that acute stroke study consent does not always fulfill the objectives of consent and that patients often have significant misconceptions about the trial design, purpose, and certainty of benefit [50, 51]. Laws and other regulations that govern the use of surrogate consent vary at the country, state, and institutional levels. Further complications include the lack of a federal statutory provision specifying the qualifications of a legally authorized representative, and the inconsistencies in state and Institutional Review Board (IRB) rules on this subject. However, the use of surrogate consent is essential to acute stroke research for two reasons. First, a requirement for self-consent would eliminate the majority of otherwise eligible subjects, thereby substantially increasing study recruitment duration. And second, there are significant differences between patients who can and cannot consent for themselves with respect to age, stroke severity, infarction volume, side of infarction, and ultimate recovery [49, 52]. Studies have not shown an interaction between ability to consent for oneself and IV rt-PA response, however [49].
When surrogate consent is necessary and the legally authorized representative is not physically present, opportunities for obtaining informed consent are limited. Some IRBs allow the use of telephonic consent
if the consent form can be viewed by the surrogate and a signed copy can be returned. This is possible if the surrogate has easy access to a fax machine. In a novel permutation of this process, the Field Administration of Stroke Therapy-Magnesium (FAST-MAG) investigators have pilot tested and are using a consent process that begins during ambulance transport to the hospital [53]. Other investigators have focused on aerial flight rather than ambulance transport. Patients at rural or other non-urban hospital emergency departments are often transferred by helicopter to tertiary care stroke centers after initial evaluation. The ability to capture patients for acute stroke trials during transport would increase early trial enrollment and extend trial opportunities to those who live in non-urban areas who would otherwise arrive too late. The feasibility of obtaining consent from patients or their surrogates during helicopter transport has been demonstrated [54].

Obstacles to recruitment
Recruitment in acute stroke trials is a challenge for many reasons. There is usually a very limited time window in which patients can be enrolled, leaving few patients eligible. This is combined with a very time-limited consent process, an actively sick patient, and often a drug that can cause major adverse events, making the process even more challenging. In drug trials, the study agent is often not available outside of the clinical trial; however, with endovascular treatments, the procedure is often available outside of the research setting because of the different standard for FDA device approvals compared with drugs. For instance, while clinical trials of the Merci retriever are currently ongoing, its use as part of routine clinical care is common. If faced with the option of a randomized trial of recanalization with the Merci retriever or Merci retriever use outside of research, families will often choose the ‘sure thing’ despite the lack of a known risk/benefit ratio. Interventionalists may also be tempted to use devices and receive standard reimbursement rather than enroll in research with lower reimbursement. These issues negatively affect recruitment into trials and diminish trial generalizability. Centers that are enrolling in catheter-based therapy trials need to consider whether they will offer routine clinical use of these unproven treatments to patients who are otherwise eligible for study enrollment. A commitment to avoid this would improve trial recruitment and expedite the advancement of stroke therapies. Similarly, the financial disincentives for centers to participate in device-related acute stroke trials need to be addressed, otherwise the ability to develop any proven endovascular therapies will be jeopardized.

Randomization in multi-center trials under the time constraints
The time pressures of acute stroke study enrollment are fierce and include the consent process, review of eligibility criteria, randomization, and study treatment preparation. Local randomization procedures are typically simpler and easier to implement than central randomization, but may result in imbalances in the overall treatment assignment as well as in baseline characteristics in the total study population across sites. A new method, step-forward randomization [55], has been proposed as a hybrid approach. The first subject of a multi-center trial has a treatment assigned before trial enrollment. After enrollment of each successive subject, a single randomization assignment is made for only the next subject at that site based on the baseline characteristics and treatment assignments of all prior subjects across all sites. This dynamic randomization technique keeps randomization one step ahead of subject enrollment. After each enrollment, the study team enters the enrollment information about the subject just enrolled into the study website, so that the assignment for the next eligible patient can be made prior to the patient’s presentation. The approach allows for incorporation of blinding. The step-forward randomization expedites the treatment assignment process. This randomization scheme has been proposed and applied to acute stroke trials coordinated by the Medical University of South Carolina [55]. Their experience with ALIAS and IMS III suggests the success of the procedure for maintaining covariate balance. However, they caution that step-forward randomization should probably be limited to studies with no more than two strata.

Methodological limitations/controversies

Conducting trials in the face of evolving treatments
The technology of mechanical thrombectomy is evolving faster than the technologies can be tested adequately in clinical trials. This creates a complexity where an
improved device becomes available while the prior device is being tested. For instance, when recruitment for the Multi MERCI single arm study began in January 2004, only the first generation retriever devices, X5 and X6, were available [6]. Only during the study, in August of 2004, did the L5 second generation retriever gain FDA clearance. To address this issue, the investigators performed a non-inferiority analysis testing whether the newer device was not inferior to the older device.

Clinical trials in the setting of variation in the definitions of terms
The NIH-NINDS has initiated a process of creating and defining Common Data Elements (CDEs) for stroke trials [56]. The common utilization of terms is expected to reduce variability and maximize comparability amongst trials. The first draft of the CDEs is now available (www.commondataelements.ninds.nih.gov/Stroke.aspx). It is anticipated that different trials will utilize different elements and that the definitions of elements may evolve to some degree.

Race/ethnicity and gender breakdown of trial participants
Some trials in stroke prevention may naturally enroll a non-representative race/ethnic distribution of subjects by virtue of the disease they target, such as sickle cell disease or intracranial atherosclerosis. Based on the epidemiology of stroke, the most common minority groups should be overrepresented in US acute stroke trials given their higher risk of stroke compared with non-Hispanic whites. However, this does not appear to be the case, especially for Hispanics, who are now the largest minority group in the US. As an example, the NINDS rt-PA Stroke Study enrolled only 5–8% Hispanics in each treatment group despite their prevalence in the 2000 US census of 12.5% [57]. Further complicating the assessment of this issue is the lack of reporting of race/ethnicity in many publications [6, 11, 45] and international trials where the background population representation of these groups differs. To a lesser degree, women also seem to be underrepresented in trials [5, 8], which may be due to their tendency to have strokes later in life, so that they may be differentially excluded by an age limit. The lack of representation of race and ethnic minority groups may limit the generalizability of these clinical trials.

Emergency exception to informed consent
In the US, a provision for emergency exceptions from informed consent has existed since 1996. This provision can be applied to acute stroke therapeutic research if a number of qualifications are met [58]. These include: the condition is life-threatening; available treatments are unsatisfactory or unproven; obtaining informed consent is not feasible and the research cannot be practicably performed without the waiver; direct benefit to the participant is possible; if the potential therapeutic window permits, contact was attempted with the legally authorized representative; there is an IRB-approved protocol and consent. Additional provisions must also be in place including at a minimum: community consultation; public disclosure of the trial and consent process (and later, the results); an independent data monitoring committee; documentation that any family member was called in an attempt to allow him/her to object; after enrollment, the patient, legally authorized representative, or any family member must be sought out to discuss the research to provide the opportunity for research participation to be discontinued. Most acute stroke research easily fits within these confines. However, there are some controversial areas. An approved therapy is available, IV rt-PA, but some have argued that it is unsatisfactory due to hemorrhage risk, moderate benefit, and a short therapeutic window [59]. Furthermore, although consent is possible in some stroke patients or with their legally authorized representatives, exclusion of groups unable to consent, such as those with aphasia, would bias study results. Many acute stroke trials have been carried out to date without invoking the exception to the informed consent requirement for emergency research. Because informed consent is so intrinsic to clinical research ethics, if this exception is to be applied, it should be applied with great care and extensive consideration.

Conclusion
Stroke is a very common and often devastating disease with important public health impact. Unfortunately, despite a multitude of completed clinical trials, only one FDA-approved treatment is currently available. Rigorous clinical trial design is needed to progress new therapies through early, middle, and late development in order to identify efficiently a new treatment for acute ischemic stroke.
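As a rough numerical illustration of two sample-size points made earlier in this chapter (the inverse-square dependence of sample size on the MCID, and the loss of power with uneven allocation for a fixed total sample size), the following Python sketch uses the standard normal approximation for comparing two proportions. The function names, the 30% control-group rate, and the total of 600 subjects are assumptions chosen only for illustration; they are not taken from any of the trials cited, and a real trial would rely on a statistician's full calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def total_sample_size(p_control, mcid, ratio=1.0, alpha=0.05, power=0.80):
    """Approximate total N for a two-sided comparison of two proportions.

    p_control: assumed proportion with favorable outcome on control
    mcid:      absolute difference to detect (treatment minus control)
    ratio:     allocation ratio n_treatment / n_control (hypothetical name)
    """
    p_treat = p_control + mcid
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    # Unpooled variance of the difference, expressed per control subject.
    variance = p_treat * (1 - p_treat) / ratio + p_control * (1 - p_control)
    n_control = (z_alpha + z_beta) ** 2 * variance / mcid ** 2
    return ceil(n_control) + ceil(ratio * n_control)


def power_for_total(n_total, p_control, mcid, ratio=1.0, alpha=0.05):
    """Approximate power for a fixed total N split by the allocation ratio.

    The negligible chance of rejecting in the wrong direction is ignored.
    """
    n_control = n_total / (1 + ratio)
    n_treat = n_total - n_control
    p_treat = p_control + mcid
    se = sqrt(p_treat * (1 - p_treat) / n_treat
              + p_control * (1 - p_control) / n_control)
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(mcid) / se - z_alpha)


if __name__ == "__main__":
    # Halving the MCID roughly quadruples the required sample size.
    for delta in (0.10, 0.05):
        print(f"MCID {delta:.2f}: total N = {total_sample_size(0.30, delta)}")
    # Same total N, increasingly uneven allocation: power declines.
    for r in (1, 2, 3):
        print(f"ratio {r}:1 -> power = {power_for_total(600, 0.30, 0.10, r):.3f}")
```

Under these assumed inputs the required sample size grows by close to a factor of four when the MCID is halved, and power falls steadily as allocation moves from 1:1 to 3:1, which is the qualitative behavior the text describes.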
References
1. Hossmann KA. Viability thresholds and the penumbra of focal ischemia. Ann Neurol 1994; 36: 557–65.
2. Heiss WD, Grond M, Thiel A, et al. Tissue at risk of infarction rescued by early reperfusion: A positron emission tomography study in systemic recombinant tissue plasminogen activator thrombolysis of acute stroke. J Cereb Blood Flow Metab 1998; 18: 1298–1307.
3. Molina CA, Montaner J, Abilleira S, et al. Time course of tissue plasminogen activator-induced recanalization in acute cardioembolic stroke: a case-control study. Stroke 2001; 32: 2821–7.
4. Hacke W, Kaste M, Bluhmki E, et al. Thrombolysis with alteplase 3 to 4.5 hours after acute ischemic stroke. N Engl J Med 2008; 359: 1317–29.
5. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. Tissue plasminogen activator for acute ischemic stroke. The National Institute of Neurological Disorders and Stroke rt-PA Stroke Study Group. N Engl J Med 1995; 333: 1581–7.
6. Smith WS, Sung G, Saver J, et al. Mechanical thrombectomy for acute ischemic stroke: final results of the Multi MERCI trial. Stroke 2008; 39: 1205–12.
7. Bose A, Henkes H, Alfke K, et al. The Penumbra System: a mechanical device for the treatment of acute stroke due to thromboembolism. AJNR, American Journal of Neuroradiology 2008; 29: 1409–13.
8. Furlan A, Higashida R, Wechsler L, et al. Intra-arterial prourokinase for acute ischemic stroke. The PROACT II study: a randomized controlled trial. Prolyse in Acute Cerebral Thromboembolism. JAMA 1999; 282: 2003–11.
9. Recommendations for Standards Regarding Preclinical Neuroprotective and Restorative Drug Development. Stroke 1999; 30: 2752–8.
10. Albers GW, Thijs VN, Wechsler L, et al. Magnetic resonance imaging profiles predict clinical response to early reperfusion: the diffusion and perfusion imaging evaluation for understanding stroke evolution (DEFUSE) study. Ann Neurol 2006; 60: 508–17.
11. Hacke W, Furlan AJ, Al-Rawi Y, et al. Intravenous desmoteplase in patients with acute ischaemic stroke selected by MRI perfusion-diffusion weighted imaging or perfusion CT (DIAS-2): a prospective, randomised, double-blind, placebo-controlled study. Lancet Neurol 2009; 8: 141–50.
12. Prentice RL. Surrogate endpoints in clinical trials: definition and operational criteria. Stat Med 1989; 8: 431–40.
13. De Gruttola VG, Clax P, DeMets DL, et al. Considerations in the evaluation of surrogate endpoints in clinical trials: Summary of a National Institutes of Health Workshop. Control Clin Trials 2001; 22: 485–502.
14. Alexandrov AV, Molina CA, Grotta JC, et al. Ultrasound-enhanced systemic thrombolysis for acute ischemic stroke. N Engl J Med 2004; 351: 2170–8.
15. Saver JL, Albers GW, Dunn B, et al. Stroke therapy academic industry roundtable (STAIR) recommendations for extended window acute stroke therapy trials. Stroke 2009; 40: 2594–600.
16. del Zoppo GJ, Higashida RT, Furlan AJ, et al. PROACT: A phase ii randomized trial of recombinant pro-urokinase by direct arterial delivery in acute middle cerebral artery stroke. Stroke 1998; 29: 4–11.
17. Hacke W, Kaste M, Fieschi C, et al. Intravenous thrombolysis with recombinant tissue plasminogen activator for acute hemispheric stroke. The European Cooperative Acute Stroke Study (ECASS). JAMA 1995; 274: 1017–25.
18. Thomalla G, Sobesky J, Kohrmann M, et al. Two tales: Hemorrhagic transformation but not parenchymal hemorrhage after thrombolysis is related to severity and duration of ischemia: MRI study of acute stroke patients treated with intravenous tissue plasminogen activator within 6 hours. Stroke 2007; 38: 313–8.
19. Dzialowski I, Pexman JHW, Barber PA, et al. Asymptomatic hemorrhage after thrombolysis may not be benign: Prognosis by hemorrhage type in the Canadian alteplase for stroke effectiveness study registry. Stroke 2007; 38: 75–9.
20. Fisher M, Hanley DF, Howard G, et al. Recommendations from the STAIR V meeting on acute stroke trials, technology and outcomes. Stroke 2007; 38: 245–8.
21. Duncan PW, Jorgensen HS, Wade DT. Outcome measures in acute stroke trials: A systematic review and some recommendations to improve practice. Stroke 2000; 31: 1429–38.
22. Fisher M, for the Stroke Therapy Academic Industry Roundtable IV. Enhancing the development and approval of acute stroke therapies: Stroke Therapy Academic Industry Roundtable. Stroke 2005; 36: 1808–13.
23. Weimar C, Kurth T, Kraywinkel K, et al. Assessment of functioning and disability after ischemic stroke. Stroke 2002; 33: 2053–9.
24. Young FB, Lees KR, Weir CJ. Strengthening acute stroke trials through optimal use of disability end points. Stroke 2003; 34: 2676–80.
25. Williams LS, Weinberger M, Harris LE, et al. Development of a stroke-specific quality of life scale. Stroke 1999; 30: 1362–9.
26. Duncan PW, Bode RK, Min Lai S, et al. Rasch analysis of a new stroke-specific outcome scale: the stroke impact scale. Arch Phys Med Rehabil 2003; 84: 950–63.
27. Haley EC, Brott TG, Sheppard GL, et al. Pilot randomized trial of tissue plasminogen activator in acute ischemic stroke. The TPA Bridging Study Group. Stroke 1993; 24: 1000–4.
28. Brown DL, Johnston KC, Wagner DP, et al. Predicting major neurological improvement with intravenous recombinant tissue plasminogen activator treatment of stroke. Stroke 2004; 35: 147–50.
29. Adams HP, Jr., Effron MB, Torner J, et al. Emergency administration of Abciximab for treatment of patients with acute ischemic stroke: Results of an international phase III trial: Abciximab in Emergency Treatment of Stroke Trial (AbESTT-II). Stroke 2008; 39: 87–99.
30. Sacco RL, DeRosa JT, Haley EC Jr, et al. Glycine antagonist in neuroprotection for patients with acute stroke: GAIN Americas: a randomized controlled trial. JAMA 2001; 285: 1719–28.
31. Saver JL, Gornbein J. Treatment effects for which shift or binary analyses are advantageous in acute stroke trials. Neurology 2009; 72: 1310–15.
32. Saver JL. Novel end point analytic techniques and interpreting shifts across the entire range of outcome scales in acute stroke trials. Stroke 2007; 38: 3055–62.
33. Shuaib A, Lees KR, Lyden P, et al. NXY-059 for the treatment of acute ischemic stroke. N Engl J Med 2007; 357: 562–71.
34. Optimising Analysis of Stroke Trials Collaboration. Calculation of sample size for stroke trials assessing functional outcome: comparison of binary and ordinal approaches. Int J Stroke 2008; 3: 78–84.
35. Brott TG, Haley EC Jr, Levy DE, et al. Urgent therapy for stroke. Part I. Pilot study of tissue plasminogen activator administered within 90 minutes. Stroke 1992; 23: 632–40.
36. Haley EC, Levy DE, Brott TG, et al. Urgent therapy for stroke. Part II. Pilot study of tissue plasminogen activator administered 91–180 minutes from onset. Stroke 1992; 23: 641–5.
37. Palesch YY, Hill MD, Ryckborst KJ, et al. The ALIAS pilot trial: A dose-escalation and safety study of albumin therapy for acute ischemic stroke – II: Neurologic outcome and efficacy analysis. Stroke 2006; 37: 2107–4.
38. Krams M, Lees KR, Hacke W, et al. Acute stroke therapy by inhibition of neutrophils (ASTIN): An adaptive dose-response study of UK-279,276 in acute ischemic stroke. Stroke 2003; 34: 2543–8.
39. 510(k) summary. http://www.accessdata.fda.gov/cdrh_docs/pdf3/k033736.pdf.
40. The IMS II Trial Investigators. The interventional management of stroke (IMS) II Study. Stroke 2007; 38: 2127–35.
41. Josephson SA, Saver JL, Smith WS, et al. Comparison of mechanical embolectomy and intraarterial thrombolysis in acute ischemic stroke within the MCA: MERCI and Multi MERCI compared to PROACT II. Neurocrit Care 2009; 10: 43–9.
42. Palesch YY, Tilley BC, Sackett DL, et al. Applying a Phase II futility study design to therapeutic stroke trials. Stroke 2005; 36: 2410–4.
43. Kernan WN, Viscoli CM, Makuch RW, et al. Stratified randomization for clinical trials. J Clin Epidemiol 1999; 52: 19–26.
44. Patel SC, Levine SR, Tilley BC, et al. Lack of clinical significance of early ischemic changes on computed tomography in acute stroke. JAMA 2001; 286: 2830–8.
45. Pancioli AM, Broderick J, Brott T, et al. The combined approach to lysis utilizing eptifibatide and rt-PA in acute ischemic stroke: The CLEAR Stroke Trial. Stroke 2008; 39: 3268–76.
46. Haley EC, Thompson JLP, Grotta JC, et al. Phase IIB/III trial of Tenecteplase in acute ischemic stroke: Results of a prematurely terminated randomized clinical trial. Stroke 2009; 41: 707–711.
47. Gail MH, Wieand S, and Piantadosi S. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 1984; 71: 431–44.
48. Johnston KC, Connors AF, Jr., Wagner DP, et al. Risk adjustment effect on stroke clinical trials. Stroke 2004; 3: e43–e45.
49. Flaherty ML, Karlawish J, Khoury JC, et al. How important is surrogate consent for stroke research? Neurology 2008; 71: 1566–71.
50. Kasner SE, Del Giudice A, Rosenberg S, et al. Who will participate in acute stroke trials? Neurology 2009; 72: 1682–8.
51. Mangset M, Førde R, Nessa J, et al. I don’t like that, it’s tricking people too much…: acute informed consent to participation in a trial of thrombolysis for stroke. J Med Ethics 2008; 34: 751–6.
52. Dani KA, McCormick MT, and Muir KW. Brain lesion volume and capacity for consent in stroke trials: potential regulatory barriers to the use of surrogate markers. Stroke 2008; 39: 2336–40.
53. Saver JL, Kidwell C, Eckstein M, et al. Physician-investigator phone elicitation of consent in the field: a novel method to obtain explicit informed consent for prehospital clinical research. Prehosp Emerg Care 2006; 10: 182–5.
54. Leira EC, Ahmed A, Lamb DL, et al. Extending acute trials to remote populations: a pilot study during interhospital helicopter transfer. Stroke 2009; 40: 895–901.
55. Zhao W. Step-forward randomization in multi-site emergency treatment clinical trials. Acad Emerg Med 2009; 17: 659–65.
56. Saver JL, Warach S, Janis S, et al. Standardizing the structure of stroke clinical and epidemiologic research data: The NINDS Stroke Common Data Element (CDE) Project. Stroke 2012; in press.
57. United States Census 2000. The Hispanic population: Census 2000 brief. http://www.census.gov (Accessed June 1, 2004.)
58. 21 CFR §50.24.
59. Bateman BT, Meyers PM, Schumacher HC, et al. Conducting stroke research with an exception from the requirement for informed consent. Stroke 2003; 34: 1317–23.
Chapter 23
Multiple sclerosis
Richard A. Rudick, Elizabeth Fisher, and Gary R. Cutter
Biological basis for therapies

Pathogenesis of MS on which experimental therapies are based

Multiple sclerosis (MS) is classified as an organ-specific autoimmune disease. Genome-wide association studies have linked HLA and immune system genes to the disease, leaving little doubt that immunological factors contribute to disease pathogenesis [1]. In the early stages of MS, scattered foci of inflammation occur in the central nervous system, the target of the inflammatory response. When these inflammatory foci involve motor, sensory, or visual pathways, clinical relapses occur. With resolution of inflammation, patients recover and enter a clinical remission. Relapses occur during the relapsing-remitting stage of MS (RRMS) at a variable rate, both across and within patients. Studies using MRI have revealed frequent new lesions, defined as gadolinium-enhancing lesions or as new T2-hyperintense lesions. The frequency of new lesions seen on MRI exceeds that of clinical relapses by approximately 10 to 1. For MS treatment, all currently approved disease-modifying drugs target inflammation and are generally indicated for reduction of relapse frequency.

In MS patients, relapses become less frequent over the initial 10–20 years of the disease, and are replaced by slowly advancing neurological disability. This stage is referred to as secondary progressive MS (SPMS). Mechanisms underlying the transition from RRMS to SPMS are not entirely understood, but there appears to be a transition from a mostly inflammatory pathology to one that is neurodegenerative and no longer dependent on inflammation. Although neurodegeneration is presumed to underlie progressive neurological disability in SPMS, axonal transection occurs at sites of CNS inflammation [2] and causes neurodegeneration during the early disease stages. There are no approved disease-modifying drugs that directly target neurodegeneration in MS.

A subtype of MS, primary progressive MS (PPMS), occurs in approximately 15% of patients and is characterized by continuous progression of neurological disability in the absence of relapses from disease onset forward. Mechanisms underlying PPMS are presumed to be similar to those underlying SPMS. Therapies targeting inflammation have been tested in PPMS but have not been beneficial.

The role of animal models in developing MS treatments

No naturally occurring animal model of MS exists. However, for nearly a century, experimental autoimmune encephalomyelitis (EAE) models have provided great insight into the mechanism of immune-initiated inflammation within the CNS. EAE can be induced in a variety of animal species and strains by immunization with CNS constituents or passive transfer of T cells or antibodies from immunized animals, resulting in immunologically mediated inflammatory injury to the CNS. Gold and colleagues extensively reviewed the value and limitations of EAE models in MS research [3]. They point out the tremendous heterogeneity in clinical manifestations and pathology, depending on the animal species or strain and the immunogen. After decades of study of rat and guinea pig models of EAE, mouse models were developed, and recently, various transgenic or knock-out mouse models have been used.

These models have yielded important information about immune-mediated CNS tissue injury, but the
value of EAE in screening therapeutic agents has been limited. First, no single EAE model reliably mimics all aspects of MS. Second, in many models inflammation predominates while demyelination is sparse. Further, no generally accepted models exhibit the marked neurodegeneration observed in later stages of MS, although more recent models may be useful to investigate the axonopathy seen in SPMS [4]. The most significant limitation of the EAE model is that the outcomes of therapeutic strategies tested in EAE do not reliably predict results in humans. Positive therapeutic studies in the EAE model have not always translated into effective treatments in patients with MS, and conversely some beneficial therapies (e.g., interferon therapy) were not preceded by strong efficacy results in animal models. The EAE models appear to be most useful and significant for studies of immune pathogenesis. Consequently, they have not achieved a prominent role in the screening of therapies for MS to date.

Derfuss and colleagues [5] demonstrated that the axoglial protein contactin 2 and its rat homologue TAG-1 may be important autoantigens in the gray matter pathology that has recently been identified in MS. Adoptive transfer of TAG-1-reactive T cells resulted in inflammation predominantly in spinal cord and cortex gray matter; when myelin-oligodendrocyte glycoprotein-specific antibodies were coadministered, focal cortical perivascular demyelination also developed. Contactin 2-induced EAE may represent a new model to analyze mechanisms of and interventions for MS gray matter pathology [6, 7].

‘Neuroprotective’ vs. ‘anti-inflammatory’ therapy

All current therapies for MS target neuroinflammation, with the aim of reducing the frequency of gadolinium-enhancing or new T2 lesions (which are markers of inflammation) and the frequency of relapse, and thus slowing disease progression. Because axons are transected at sites of acute inflammation [2], anti-inflammatory therapy may be neuroprotective by preventing axonal injury and transection. A number of studies [8] have demonstrated that the rate of brain tissue loss, as measured by MRI volumetric studies, slows after effective anti-inflammatory therapy, adding evidence to support the concept that anti-inflammatory therapy may be neuroprotective in the early stages of MS. Anti-inflammatory therapy, however, has not been effective in the later stages of MS, presumably because inflammation is not essential to the ongoing neurodegeneration in the later stages. A range of potential treatments directed at neurodegeneration may be neuroprotective. Some strategies aim to increase axon stability or alter processes that damage axons. Others include remyelination strategies that promote differentiation of oligodendrocyte precursors into myelin-producing cells or the use of mesenchymal or bone marrow-derived stem cells [9]. To date, no trial of neuroprotective therapy in MS has been positive. However, evaluating the efficacy of potential neuroprotective agents is complex because there are no validated methodologies to demonstrate neuroprotection (see below).

Newer hypotheses concerning multiple sclerosis etiology

New hypotheses have emerged regarding the role of ultraviolet light and vitamin D [10–12], vascular comorbidity in driving disability progression [13], and venous obstruction [14]. These hypotheses each lead directly to therapeutic strategies – e.g., vitamin D supplementation, prevention and treatment of vascular comorbidities, or treatment of venous obstruction. As with any intervention, studies will require large sample sizes and rigorous designs, and results will depend on the validity of the underlying hypothesis.

Goals of intervention

Modifying the disease process vs. relieving symptoms

The goals of disease-modifying therapy in MS – reducing relapse frequency or reducing disability progression – may or may not improve quality of life in the short term. An entirely separate approach targets MS symptoms; symptom therapies may significantly benefit patients by reducing morbidity or improving quality of life. Symptom-based therapies for MS are often used off-label, e.g., use of antidepressants or analgesics is common. Additionally, drug development recently has focused on symptom management. Studies of 4-aminopyridine (dalfampridine-SR) have targeted walking speed in patients with MS; studies of dextromethorphan together with quinidine (AVP 923) have targeted pseudobulbar affect; duloxetine and dronabinol have targeted neuropathic pain; solifenacin succinate, bladder symptoms; and modafinil, fatigue. Table 23.1
lists drugs currently being developed for symptomatic treatment – the clinical development pipeline for MS symptom relief is robust.

Table 23.1 Drugs currently under development for symptomatic treatment of multiple sclerosis

Drug | ClinicalTrials.gov identifier | Symptoms targeted | Current status | Sponsor
4-aminopyridine (dalfampridine-SR) | NCT00053417 | Walking speed (timed 25-foot walk) | Approved | Acorda Therapeutics
dextromethorphan + quinidine (Nuedexta) | NCT00573443 | Pseudobulbar affect | Approved | Avanir Pharmaceuticals
duloxetine | NCT00755807 | Neuropathic pain | Completed | Eli Lilly
 | NCT00457730 | | Recruiting |
dronabinol | NCT00959218 | Neuropathic pain | Ongoing | Bionorica Research GmbH
solifenacin succinate | NCT00629642 | Bladder symptoms | Completed | Astellas Pharma Inc
modafinil | NCT00220506 | Fatigue | Recruiting | Sheba Medical Center
 | NCT00142402 | Memory, fatigue, anxiety and depression | Ongoing | Kessler Foundation
modafinil + interferon β-1a* | NCT00210301 | Cognition and fatigue (secondary outcomes) | Recruiting | Institute for Clinical Research
armodafinil | NCT00981084 | Cognitive function and cognitive fatigue | Enrolling, by invitation | University of Missouri, Kansas City
* To test the safety of the combination.

Lessons learned from the development of interferon β to reduce relapses

Development of interferon β for MS [15] was a watershed event because interferon β-1b was the first disease-modifying drug approved by regulatory agencies to treat MS. Thus, its approval ushered in the current therapeutic era in MS. Importantly, approval was supported by a prominent reduction in new T2 hyperintense brain lesions. This firmly established MRI lesions as an important secondary outcome measure for MS clinical trials. Approval of interferon β-1b was quickly followed by approval of intramuscular interferon β-1a and subcutaneous interferon β-1a. Whereas interferon β-1b was approved based on its effect on relapse rate, intramuscular interferon β-1a was approved based on its effect of delaying the time to confirmed worsening on the Kurtzke Expanded Disability Status Scale (EDSS). Disability progression was defined as an increase in the EDSS level, confirmed at the next 6-month scheduled visit. This has led to a still-unresolved debate concerning the methodology used to measure disability progression in RRMS. Despite the controversy about the meaning of confirmed EDSS worsening in RRMS, virtually all subsequent clinical trials of disease-modifying drugs in MS, including those leading to approval of subcutaneous interferon β-1a, used measures of confirmed EDSS worsening.

Study populations

Classification of MS subtypes

In 1996, Lublin and Reingold published the results of an international survey that established standard terminology and categories for the different MS subtypes: RRMS, SPMS, PPMS, and progressive-relapsing MS (PRMS) [16]. This classification has profoundly influenced development of MS therapies because the clinical category has been used as a study entry criterion for nearly all trials of disease-modifying therapy. More recently, clinically isolated syndrome (CIS) has been added as a new category. This refers to the occurrence of a typical clinical syndrome suggesting inflammatory demyelination. When CIS is accompanied by multiple lesions on brain MRI, the likelihood of new MRI lesions or clinical relapses is extremely high [17], but patients with CIS do not meet current international panel criteria for a diagnosis of definite MS [18]. All MS clinical trials enrolling CIS patients have required multiple T2 hyperintense brain lesions as an inclusion
criterion [19–21]. This represents a form of ‘informative enrollment’ (see below). Recently, a consensus panel called for a more precise definition of CIS [22]. In 2009, incidental MRI abnormalities that suggested MS were described and termed the ‘radiologically isolated syndrome’ [23]. No studies to date have entered patients with radiologically isolated syndrome into randomized clinical trials.

The benefits of early treatment

In patients with CIS, interferon β has been shown to reduce conversion to RRMS by 50% [19]. In RRMS, interferon β therapy reduces the frequency of relapses by 33% [24–26]. Studies of interferon β therapy in patients with SPMS or PPMS have been negative. These findings suggest the possibility that anti-inflammatory therapy is most effective at earlier disease stages. Additionally, pathology studies demonstrated transected axons in the inflammatory lesions of patients in early RRMS. Inhibiting inflammation at an early stage, therefore, would seem a good strategy. A crossover study comparing early and delayed subcutaneous interferon β-1a therapy showed that patients initially treated with placebo for 2 years and then switched to interferon β-1a were worse at 4 years compared with patients who received interferon β-1a treatment for all 4 years, supporting the contention that early treatment is better than delayed treatment [27]. These observations led to the concept that early treatment is preferable to delayed treatment, and the MS clinical trial field has moved in that direction, testing interventions in CIS patients and testing aggressive immunomodulatory treatment very early in the disease.

Special issues concerning primary progressive MS and pediatric MS

To date, no treatments have shown significant benefits in patients with PPMS (Table 23.2 [28–34]), although most published studies are relatively small. A notable exception was the PROMiSe trial, in which 943 patients were randomly assigned to receive glatiramer acetate or placebo for 3 years. The two arms did not significantly differ on the primary outcome of delay in disability progression. Patients in the treatment arm had a significant decrease in the number of gadolinium-enhancing lesions and smaller increases in T2 lesion volume, although this difference was not significant. The lower-than-expected disability progression rate as well as study drug discontinuation may have contributed to the negative findings [29]. Presumably, results in PPMS trials have been negative because inflammation drives the pathologic process to a lesser degree in PPMS compared with RRMS and because mechanisms driving neurodegeneration were not specifically targeted. As no approved therapies exist for PPMS, placebo-controlled trials are ethical and needed, and this is an area of extremely high unmet need. Childhood MS [35] has been emphasized recently, but randomized controlled trials in the pediatric MS populations are just beginning.

Informative enrollment in MS clinical trials

The most common approaches to trial enrollment are: 1) selection of patients with a history of relapses in the year or two prior to trial entry; 2) selection of patients with ‘disability progression,’ usually defined as worsening by a specified amount on the EDSS scale; or 3) selection of patients with one or more gadolinium-enhancing lesions on cranial MRI during a run-in or at study entry. As with other inclusion and exclusion criteria, informative enrollment strategies will restrict generalizability of the results. In addition to limiting generalizability, informative enrollment strategies raise other considerations. First, entrance criteria may significantly influence trial results. As noted above, studies of a given drug show a greater effect on relapse frequency in CIS populations compared with RRMS populations, which in turn show greater efficacy than studies in SPMS populations. Thus, restricting trial entry to patients in earlier stages of MS may result in higher observed efficacy. Selecting patients who demonstrated increased EDSS scores may bias trial results in the direction of lower efficacy, since patients remain at various EDSS steps for periods that approach or exceed the duration of clinical trials. Thus, entering patients who recently moved to a higher EDSS level may ensure fewer EDSS events rather than enriching the cohort for added events. Finally, enrolling patients with more progressive disease may enrich the trial for patients more refractory to treatment. Thus, the main advantage of informative enrollment based on disease activity – increased events during the trial – must be balanced against the likely effect of the informative enrollment strategy on the outcome.

Another consideration is that standardized, validated methods to identify patients based on disease activity before randomization are not available. One common approach is to require pre-study relapses for
Table 23.2 Summary of randomized placebo-controlled clinical trials for PPMS

Study | Placebo (n) | Treatment (n) | Drug(s) | Outcome measures | Outcome
Hawker et al. [28] | 147 | 292 | rituximab, two 1000-mg infusions every 24 weeks for 96 weeks | Time to sustained disease progression on EDSS; changes on MRI | Groups did not significantly differ in time to progression. Patients receiving treatment had a significantly smaller increase in T2 lesion volume. Subgroup analyses suggested that treatment may delay disease progression in younger patients, particularly those with inflammatory (gadolinium-enhancing) lesions.
Wolinsky et al. [29] | 316 | 627 | glatiramer acetate, 20 mg SC/day for 36 months | Time to sustained disease progression on EDSS; changes on MRI | Groups did not significantly differ in time to progression. MRI lesion burden was significantly less in the treatment group. Treatment may have slowed progression in males with rapid progression. Because the trial was stopped early and the event rate was low, the trial may have been underpowered to detect a treatment effect.
Montalban [30] | 37 | 36 | IFN β-1b, 8 MIU SC every other day for 2 years | Time to sustained disease progression on EDSS; change in MSFC; QOL measures; changes on MRI | No significant differences were found in disability progression as assessed by EDSS. However, significant differences favoring interferon β-1b treatment were seen in the MSFC score and in T2 and T1 lesion volumes, suggesting that IFN β-1b may have a beneficial effect in PPMS.
Leary et al. [31] | 20 | 30 | IFN β-1a, IM, 30 μg or 60 μg once weekly for 24 months | Time to sustained disease progression on EDSS; changes on MRI; 10-meter walk, 9-hole peg test | No difference in EDSS. T2 lesion load less in 30 μg treatment group but brain volume loss greater with 60 μg.
Rammohan et al. [32] | 72 | 72 | modafinil, oral; crossover design with titration up from 200 mg to 400 mg | FSS score; MFIS score; VAS-F score; EDSS score | Fatigue significantly improved with 200 mg treatment on all measures.
Rice et al. [33] | 54 | 105 | cladribine, SC, 0.07 mg/kg/day for 5 consecutive days every 4 weeks for 2 or 6 cycles (total dose, 0.7 mg/kg or 2.1 mg/kg, respectively), followed by placebo, for a total of 8 cycles (12 months) | Mean change in EDSS; Scripps Neurologic Rating Scale; MRI changes | Treatment did not significantly affect the absolute change or time to progression in EDSS or SNRS scores. Both doses significantly reduced the presence, number, and volume of gadolinium-enhanced T1 brain lesions, and cladribine 2.1 mg/kg decreased the T2 lesion load accumulation.
Filippi et al. [34] | 48 | 14 | cladribine, SC, 0.07 mg/kg/day for 5 consecutive days every 4 weeks for 2 or 6 cycles (total dose, 0.7 mg/kg or 2.1 mg/kg, respectively), followed by placebo, for a total of 8 cycles (12 months) | Change in brain volume | Brain volumes decreased in all patients as a group and in placebo-treated patients when analyzed alone. Neither cladribine dose had any effect on brain volume loss over time. In the placebo group, changes in brain volume did not correlate with changes in other MRI measures.

EDSS = Kurtzke Expanded Disability Status Scale; MSFC = Multiple Sclerosis Functional Composite; QOL = quality of life; SC = subcutaneous; IM = intramuscular; MIU = million international units; FSS = Fatigue Severity Scale; MFIS = Modified Fatigue Impact Scale; VAS-F = Visual Analogue Scale for Fatigue; ESS = Epworth Sleepiness Scale.
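As a rough illustration of the outcome named in most rows of Table 23.2, time to sustained (confirmed) disease progression on the EDSS, the following Python sketch scans one patient's visit record for the first EDSS increase that is still present at a later confirmation visit. It is a minimal sketch under stated assumptions, not the algorithm of any particular trial: the function name, the 1.0-point threshold, and the visit schedule are hypothetical, and actual protocols define the required increase and the confirmation interval in their own terms.

```python
from typing import List, Optional, Tuple

# One visit = (months since baseline, EDSS score at that visit).
Visit = Tuple[float, float]


def time_to_confirmed_progression(
    visits: List[Visit],
    baseline_edss: float,
    min_increase: float = 1.0,          # assumed threshold; protocol-specific in practice
    confirm_after_months: float = 3.0,  # confirmation interval, e.g. 3 or 6 months
) -> Optional[float]:
    """Return the month at which a confirmed worsening began, or None if censored."""
    ordered = sorted(visits)
    threshold = baseline_edss + min_increase
    for i, (onset_month, score) in enumerate(ordered):
        if score < threshold:
            continue  # no worsening at this visit
        # The worsening must persist at every visit up to the confirmation visit.
        for later_month, later_score in ordered[i + 1:]:
            if later_score < threshold:
                break  # reverted before it could be confirmed
            if later_month - onset_month >= confirm_after_months:
                return onset_month
    return None  # no confirmed event during follow-up


# Hypothetical patient: worsening at month 3 that reverts at month 9,
# followed by a worsening from month 12 that persists.
record = [(0, 3.0), (3, 4.0), (6, 4.0), (9, 3.5), (12, 4.0), (15, 4.0), (18, 4.0)]
print(time_to_confirmed_progression(record, 3.0, confirm_after_months=3))
print(time_to_confirmed_progression(record, 3.0, confirm_after_months=6))
```

In this hypothetical record the worsening first seen at month 3 has reverted by month 9, so it counts as an event under 3-month confirmation but not under 6-month confirmation, where the first confirmed event begins at month 12. Patients for whom the function returns None would be treated as censored in a time-to-event analysis.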
trial eligibility, but defining pre-study relapse rate is subjective. Thus, clearly defining the population studied may be difficult.

Lastly, methods to enroll patients based on biological factors, while appealing, are in their infancy in MS. For example, the HLA-DRB1*15 allele is more common in MS patients, and is associated with more rapid MS disease progression. An informative enrollment strategy would be to enroll only HLA-DRB1*15-positive patients. However, the responsiveness of this patient subgroup to a particular therapy cannot be known in advance, so a more appealing strategy would be to conduct pre-planned subgroup analyses in patients positive for HLA-DRB1*15 before using this marker for informative enrollment.

Properties of measurement tools

Clinical measures: relapses, physical function, neuropsychological performance

The most common outcome measure for RRMS trials is the relapse number or rate. This was the primary outcome measure in two of the three pivotal trials of interferon for RRMS [26, 36], the glatiramer acetate trial [24], and the placebo-controlled natalizumab trial [37]. Relapses are relatively simple to count, and by definition have a clinical impact on the patient. However, the relationship between relapses and eventual disability is weak [38]; relapses are sometimes subjective and open to bias, over- or under-reporting, and treatment unmasking; and generally accepted methods for quantifying the severity of each relapse, or for quantifying recovery, are not developed.

The EDSS is an ordinal scale from 0 to 10 that captures the level of disability according to 19 steps [39]. Between 0.0 (normal neurological examination) and 3.5 (moderate disability in more than one functional system) the score is determined by combinations from seven separate functional system scales (e.g., visual, motor, cerebellar, sensory, bowel, and bladder). From 4.0 to 6.0, the scale measures limitations in distance walking. Level 6.0 indicates the need for unilateral assistance to walk, 6.5 bilateral assistance, and ≥ 7.0 measures severity in non-ambulatory patients. There is considerable debate whether the EDSS measures disability accurately at the low end, and whether the middle and high ranges are optimally sensitive for clinical trials. Despite criticism, the EDSS has been the standard measure of neurological disability in nearly all MS clinical trials.

Since the mid-1990s, the EDSS has been used to determine confirmed worsening from the baseline score determined at study entry. Kaplan-Meier analyses of survival curves plotting the time to onset of confirmed disability worsening in each treatment arm have been used to estimate hazard ratios for disability progression with active treatment. Worsening of the EDSS score has been confirmed at a 3-month study visit in most trials; a minority of trials have required 6-month confirmation. The EDSS may revert to baseline more commonly if the 3-month definition is used [40]. Also, the relevance of confirmed EDSS worsening in the early stages of MS is uncertain, although a recent report demonstrated a correlation between 6-month confirmed EDSS worsening and clinical outcome 8 years later [41].

Because of perceived limitations of the EDSS, a National Multiple Sclerosis Society task force recommended the MS Functional Composite (MSFC), a three-part composite consisting of timed measures of ambulation, upper extremity function, and cognition [42]. The MSFC has been extensively tested and validated but has yet to achieve its intended purpose – to replace the EDSS as a primary clinical measure of MS-related disability. A substantial part of the problem lies in interpreting the clinical relevance of the results. As originally recommended, the three MSFC measures are transformed to a single Z score, defined as the average of the Z scores from the ambulation, upper extremity, and cognitive tests. The optimal population used to normalize the clinical trial test scores has been a subject of debate, since the choice of reference population influences the weighting of the different components within the MSFC [43]. Recently, a group analyzed MSFC data collected during the AFFIRM trial and proposed using the MSFC to identify a disability progression event, analogous to how the EDSS is used [44]. Disability progression as demonstrated by the MSFC score correlated with traditional measures of disease activity and progression, and the MSFC score as a measure of progression showed treatment effects similar to EDSS. It is expected that adding a visual assessment measure to the MSFC and possibly substituting a cognitive measure with less learning effect than the Paced Auditory Serial Addition Test (PASAT) will improve the MSFC performance characteristics and allow the MSFC to replace the EDSS as a more useful measure of disability.

Neuropsychological impairment, particularly in processing speed, complex attention, and verbal
learning, has been identified in approximately 50% of MS cases in population-based studies [45]. The effects of treatment on neuropsychological test performance have been reported, although the popularity of neuropsychological testing in MS clinical trials has declined because of time and cost considerations. Six randomized clinical trials have been published that investigate disease-modifying medications and also assess neuropsychological outcome [46], with mixed results. Neuropsychological testing is most appropriate for a study that specifically targets neurocognitive deficits in MS. Efforts are under way to develop and validate brief neuropsychological test batteries that are more practical for MS clinical trials [47].

Patient-reported quality-of-life measures [48]

Many health-related quality-of-life (HR-QOL) scales have been used in MS trials. Generic HR-QOL measures include the Symptom Impact Profile and the Medical Outcomes Study 36-Item Short-Form Survey (SF-36). Hybrid measures are the MS Quality of Life Index and MSQOL-54; MS-specific instruments include the Functional Assessment of MS and MS Impact Scale-29. No consensus exists concerning the optimal patient self-report HR-QOL instrument for MS clinical trials. At least eight clinical trials have reported the effects of interferon or glatiramer acetate treatment on quality of life in MS. The AFFIRM study revealed a strong association between the physical component score of the SF-36 and both the EDSS score and relapse rate and number, and showed significant treatment effects [49]. Patient-reported HR-QOL measures are appealing in that they capture the overall burden of MS, but they are somewhat insensitive in that clinical changes can occur while HR-QOL remains the same. In addition, many HR-QOL measures are non-specific and are therefore most appropriate as secondary outcome measures.

Conventional MRI measures

Measures using MRI, such as the number of contrast-enhancing lesions and T2 hyperintensity volumes, are routinely used in MS clinical trials. Lesions that enhance on T1-weighted images acquired after injection with a paramagnetic contrast agent (typically, gadolinium-DTPA) indicate blood-brain barrier disruption and inflammatory activity. Frequent MRI studies have shown that gadolinium-enhancing lesions occur much more often than clinical relapses. However, enhancement only lasts for 1 to 4 weeks, so enhancing lesions will be missed when periods between serial MRIs are longer. During and following enhancement, most lesions appear hyperintense on T2-weighted MRIs. Once formed, T2 hyperintense lesions may persist indefinitely. Because serial MRIs are costly and impractical for most studies, clinical trials typically include counts of both gadolinium-enhancing lesions and new or enlarging T2 hyperintensities as measures of inflammatory activity. The number of combined unique active lesions is a single measure that has been proposed for use in clinical trials to avoid double counting of enhancing lesions and new T2 lesions [50]. All currently approved MS disease-modifying therapies have been shown to reduce enhancing lesions.

The total volume of T2 hyperintense lesions is considered an estimate of overall MS disease burden. Reductions in the accrual of T2 lesion volume have been reported in the active treatment arms compared to placebo for most MS trials. The significance of these volume reductions has been questioned because the accrual of T2 lesion volume only weakly correlates with disability progression over the short term [51]. Furthermore, post-mortem studies have shown that only about half of T2 hyperintense lesions correspond to focally demyelinated MS lesions [52]. Pathologically, T2 lesions range from transient edema to severe tissue destruction, complicating interpretation. Despite these issues, T2 lesion volume correlates modestly with future brain atrophy [53], and it remains an important measure in trials for confirming that treatment arms are well-matched at baseline.

Lesions that appear persistently hypointense on unenhanced T1-weighted images (T1 black holes) have been shown to correspond to regions with axonal loss [54]. However, black hole total volume correlates strongly with T2 lesion volume, and has not been particularly useful as a clinical trial outcome measure. Recently, the percentage of enhancing lesions that evolve into chronic T1 black holes has been proposed as a marker of neuroprotection [55].

Brain atrophy is a conventional MRI measure that is considered to be a marker of severe tissue destruction. Measurement of changes in normalized brain volume [56] and direct measurement of changes in brain edges from pairs of registered MRIs [57] have been applied in MS clinical trials to estimate whole brain atrophy. Like T2 lesion volume, normalized brain volume can be considered a marker of overall disease
burden, but it has some advantages over lesion measurements. Importantly, it reflects the net effect of the destructive processes due to MS. Brain atrophy correlates more strongly with disability than any other conventional MRI measurement and predicts subsequent disability [58]. However, some important issues relate to the interpretation of atrophy measurements from clinical trials. In the initial period after starting most anti-inflammatory therapies, there is typically an accelerated reduction in brain volume, termed pseudoatrophy [8], which presumably is due to the resolution of inflammatory edema rather than actual tissue loss. Therefore, sometimes treatment effects on atrophy can only be observed after the first year of treatment. Also, although changes in brain volume are much higher in MS patients than in healthy controls, the changes are still very small, on the order of 0.5% to 1% per year. Therefore, highly reproducible methods and studies of adequate duration are required.

Generally, MRI measures have the advantage of being more objective and more sensitive than clinical measures. However, because no MRI measures meet the stringent definition of a surrogate marker of MS, MRI is not accepted by regulatory agencies as a primary outcome for phase 3 trials, although gadolinium-enhancing lesions are commonly used as the primary outcome in phase 2 trials. Sormani and colleagues conducted a pooled analysis of 23 clinical trials that tested the effect of interventions on relapse rate and included MRI lesion measures [59]. The effect of the intervention on MRI lesions accounted for 81% of the variance in the treatment effect on relapses, thus showing a strong association between reduction in MRI lesions and reduction in relapses. Consistency in MRI acquisition is a significant issue in clinical trials. Volumetric measures are highly sensitive to changes in scanner hardware or software and changes in sequence parameters, whereas count measures, e.g., enhancing lesions, are relatively robust. Scanner upgrades can be disastrous for clinical trials, and care must be taken to prospectively plan for unavoidable changes in MRI hardware and software over the study period.

interferon-stimulated gene products may be monitored. Pharmacodynamic markers are useful in early studies to determine the dose or dosing interval, monitor patients for tachyphylaxis, and compare the magnitude of biological effects across doses or agents. With the exception of antibodies to biological agents, few studies have shown correlations between the effect of therapy on a pharmacologic marker and clinical response to therapy.

New approaches to measuring MS

Measures based on MRI have been continually evolving with new image acquisition methods and higher magnet strengths. Several non-conventional MRI techniques are under development for use in MS clinical trials [60]. These include magnetization transfer imaging, T1 and T2 relaxation time measurements, magnetic resonance spectroscopy, diffusion tensor imaging, functional MRI, ultra-high field strength imaging, and molecular imaging. These measures provide greater sensitivity and specificity, allowing quantitative assessment of pathophysiological mechanisms in MS. Challenges remain related to validation, optimization, and standardization that would permit newer measures to be used in multi-center trials.

Interest in optical coherence tomography (OCT), a newer technique that quantifies the retinal nerve fiber layer (RNFL), was stimulated by findings that MS patients have reduced low-contrast letter acuity [61]. These studies demonstrated that the RNFL is thinner in patients than controls, even in those without a history of optic neuritis [62], raising the possibility that OCT could be used to assess treatment effects on the thickness of the RNFL. As yet, no study has demonstrated that treatment affects the fiber layer.

Clinical trial designs and analytical methods used in development
Conventional designs: Preclinical through
phase 3 studies
Conventional designs for phase 1 through phase 3 trials
Pharmacodynamic markers are well known and established. he primary objectives
Many diferent assays have been used in pharmaco- of phase 1 clinical trials are to identify an efective dose
dynamic studies to measure or monitor biological and assess toxicity. In phase 2 trials, the objectives are
efects of speciic therapies. For instance, B-cell to insure that the drug provides some degree of efect-
numbers are monitored in rituximab trials because iveness and insure safety without excess toxicity in
rituximab depletes B cells. For interferon trials, the disease population. It is critical that phase 3 trials
demonstrate clinical effectiveness, but they also provide information about side effects and tolerability of treatments, as well as their impact on quality of life.

Delayed start

Placebo treatment arms are ethically questionable, and increasingly so as more effective drugs are approved. However, active arm comparator designs (e.g., head-to-head trials) require more patients than placebo-controlled trials with the same endpoints. The BEYOND Trial, which compared two doses of interferon β-1b with glatiramer acetate, randomized 2244 patients [63]. Such large sample sizes are driving the need for alternative trial designs that can be accomplished without an active comparator arm. For example, patients can be randomized to double-blinded early- or delayed-start treatment, with subjects in the delayed-start arm receiving placebo until the treatment phase is initiated for that arm. This design was used in the BENEFIT Trial, which randomized 487 patients to interferon beta-1b or placebo for up to 2 years and then initiated therapy. Here the question was timing of treatment and whether earlier intervention was beneficial [21]. The FDA (http://www.fda.gov/RegulatoryInformation/Guidances/ucm125802.htm) endorses such an approach and examines two outcomes: replication of the treatment effect in the patients initially receiving placebo, and sustained parallel differences between the treatment arms after placebo patients are switched to treatment. If a gap persists and treatment effects are replicated, it seems reasonable to conclude that the drug slows disease progression.

Standards for efficacy and special safety issues

Active arm comparison and combination trials

Ethical concerns surrounding placebo-controlled trials in RRMS involve the availability of effective disease-modifying drugs that reduce the severity of MS. One approach has been to conduct placebo-controlled trials in regions of the world where disease-modifying drugs are unavailable, but this does not satisfactorily address the ethical concerns. Another approach has been to offer placebo-controlled trials to patients who decline the use of available drugs [64], but this introduces selection biases. Consequently, MS trialists have moved in recent years to active arm comparison trials and to trials of drugs in combination. Active arm comparisons and trials of combination therapy entail several considerations. The first involves sample size: as event rates are reduced by active therapy in the comparison arms, the number of cases necessary to achieve adequate power increases dramatically. In the example of the BEYOND trial, we saw a more than four-fold increase in sample size, to 2244 patients, compared to the placebo-controlled BENEFIT trial with 487 patients. Second, adverse events may escalate significantly with multiple drug therapy. Another issue is that the FDA often requires at least three arms in combination trials: the combination and each of the component drugs alone or in combination with a placebo. These three groups are necessary to provide evidence of a statistically significant superiority of the combination, say drugs A+B, over A alone and over B alone. The rationale is that if A+B is not better than both drug A alone and drug B alone, there is no reason to expose the patient to both drugs. The Avonex Combination Trial tested the addition of methotrexate, methylprednisolone, or both to intramuscular interferon β-1a in patients with RRMS who had disease activity despite intramuscular interferon β-1a. Although combination therapy showed beneficial trends compared to monotherapy, they were not statistically, and probably not clinically, significant [65].

Adverse events

Opportunistic infections

Two cases of progressive multifocal leukoencephalopathy (PML) in patients participating in the SENTINEL trial [66] were sufficient to stop the study. Importantly, the number of events, two, is insufficient statistically to call for stopping a trial. Such decisions are based on clinical judgment and the severity of the consequences of the events. In this situation, upon notification of the PML cases in February 2005, the FDA suspended use of natalizumab pending a detailed safety review of all patients exposed to the drug. Only after a complete analysis of the estimated risk for PML and other opportunistic infections was the drug reintroduced to the market in June 2006. Subsequent to the reintroduction, worldwide attention has focused on the risk of PML, on risk stratification methods, and on risk minimization and treatment of PML. The natalizumab experience has called into question the methods used in post-marketing surveillance for unusual severe adverse events such as opportunistic infections and has had tremendous implications for the use of potent immunomodulatory,
immunosuppressive, and cytotoxic drugs for MS. Opportunistic infections have also been reported with current drugs in development, including but not limited to fingolimod [67, 68] and cladribine [69].

Cancer

Mitoxantrone (Novantrone) was approved for relapsing and progressive MS. It was rapidly adopted due to the lack of approved treatments for progressive MS. Shortly thereafter it became clear that mitoxantrone was linked to acute leukemia [70], which has significantly limited its use. Cladribine, which is used to treat leukemia, has been associated with second cancers in that setting, and MS clinical trials have thus far demonstrated some increase in cancer in cladribine recipients relative to placebo recipients [69].

Other significant adverse events

Fingolimod has been associated with atrioventricular block, bradycardia, mildly increased blood pressure, and, occasionally, macular edema [71]. These ‘off-target’ effects of fingolimod appear to be mild enough to allow use of the drug. At times, off-target effects have been significant enough to stop development of potential MS drugs. In the mid-1990s, linomide (Roquinimex) was in phase 3 clinical studies for RRMS and SPMS. Approximately 1200 participants were enrolled. Development was stopped when eight patients experienced myocardial infarction and two died. It was later determined that linomide caused pericarditis. Subsequently, a chemical derivative of linomide, laquinimod, was developed. It does not appear to be cardiotoxic and is in phase 3 testing at present.

Implementation issues – challenges in the conduct of the trial

Recruitment challenges

Two decades ago when trials in MS were starting, patient recruitment was not a problem. Today, phase 2 and phase 3 MS trials have recruited patients overseas where regulatory processes are less onerous, trial costs are lower, and patients have fewer therapeutic options. Trial recruitment difficulties may also stem from increased media coverage of clinical trials, where negative media coverage of trials may damage the public’s trust in biomedical research. On the other hand, media reports can create a clamor for treatments that remain unproven. A recent example is the fervor surrounding chronic cerebrospinal venous insufficiency as a possible cause of MS, leading to the demand for endovascular intervention, despite the lack of trial results demonstrating treatment benefits.

Few trials today recruit ahead of schedule, and most suffer from 30% to 40% slower recruitment than planned. Aban et al. discussed their experience in planning and launching a multinational study in myasthenia gravis [72]. They highlighted the additional steps required for international sites and provided estimates of the time required to bring US and non-US sites into full regulatory compliance before they could initiate recruitment. Delays for non-US centers were 13.4 ± 0.96 months, compared with 9.67 months for US centers (p = 0.02). The delay for non-US sites was attributable to Federal Wide Assurance certification and State Department clearance.

Historically, MS trials have enjoyed very high retention rates, usually exceeding 90% in 1- or 2-year studies. However, as the MS treatment options increase, more patients can be expected to exit trials when they experience disease activity or side effects.

Effect of multiple trials within MS centers

When multiple trials exist within an MS center, competition for patients can occur. In such circumstances, if researchers recruit potential study patients according to the trial in which they believe the patient will do best, the trial results may not be generalizable. In many industry-sponsored studies, when speed of recruitment is an important goal, randomization is often not done within each center. When selection biases occur, particularly under circumstances of competing trials, the treatment effects can be confounded by both center and the small patient numbers within a center. There are no statistical remedies for this confounding and its impact on generalizability.

MS severity ‘drift’

The changing patterns of relapses over time with an apparent lessening, even in placebo groups, as well as somewhat reduced disability progression raise the question of changes in MS severity over time. If such drift is occurring, is it a result of changing incidence and newer forms of MS, or differing subgroups of patients detected with differing prognoses? Is earlier diagnosis changing the patterns observed from prior decades of observation, in the era before disease-modifying therapy? Or does therapy have a cumulative effect on clinician awareness
and response to the disease? Drift is important, as it changes the risk-benefit equations that patients and clinicians need to consider when selecting treatments.

Country effects

The participation of multiple countries and centers in MS research includes many untested assumptions concerning bias and trial design. The origin of the patients, the medical care system, the investigative teams and their views, and approaches to clinical trials combine to challenge the assumptions made in trial design, namely that the effects of drugs are independent of country, center, etc. Ultimately, the confounding of such effects with treatment effects may affect the generalizability of the results if assumptions concerning these important covariates are false, but more importantly, effective therapies may be missed due to increased variability.

Challenges and controversies

What is the relationship of treatment to relapses, EDSS, and MRI parameters?

Figure 23.1 shows the course of destructive pathology in the central nervous system in MS. During the initial 10 to 20 years of symptoms, relapses result in periodic neurological problems, but patients tend to function relatively well and would not be considered to be disabled. As the pathology progresses and is superimposed on the aging process, a threshold is surpassed, beyond which progressive neurological disability ensues, and the patient enters the secondary progressive phase.

All approved therapies for MS at present have been directed at RRMS and have targeted inflammation. The degree to which such treatments inhibit and halt

[Figure 23.1a plot: neurological disability (vertical axis) against years after MS onset (horizontal axis, 0 to 30), with the stages RIS, CIS, RRMS, and SPMS marked along the top and features numbered 1 to 4.]

Figure 23.1a. The natural course of multiple sclerosis without disease-modifying drug therapy. The figure shows the stages of MS (see text): RIS – radiologically isolated syndrome; CIS – clinically isolated syndrome; RRMS – relapsing remitting MS; SPMS – secondary progressive MS. (1) Many patients presenting with CIS already have multicentric MRI lesions, indicating preceding subclinical disease activity, designated as new MRI lesions (↑). At the time of CIS and at the time of relapses (2), transient neurological disability appears (vertical lines). Once a threshold of CNS pathology is surpassed (3), disability ceases to be transient, and ongoing disease pathology is manifest as progressively worse neurological disability. The presence of ongoing tissue injury (4) is suggested by MRI studies showing progressive brain atrophy starting early in the disease.

[Figure 23.1b plot: neurological disability (vertical axis) against years after MS onset (horizontal axis, 0 to 30), with the stages RIS, CIS, RRMS, and SPMS marked along the top.]

Figure 23.1b. The course of multiple sclerosis with disease-modifying drug therapy initiated at the time the patient presents with CIS. All currently approved disease-modifying drugs target brain inflammation, reducing new brain MRI lesions and relapses. The figure shows a hypothetical modified course of MS in the presence of disease-modifying drug therapy. New MRI lesions are reduced by about 70%, relapses by 50%, and the rate of progression of ongoing tissue injury by about 35%. Lowering the rate of ongoing tissue injury delays the onset of SPMS to about 20 years from symptom onset and lowers the eventual level of neurological disability.
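
The numbers quoted for Figure 23.1b can be checked with a short calculation. The sketch below assumes, purely for illustration, that ongoing tissue injury accumulates at a constant rate until a fixed threshold is crossed, so that slowing the rate by a fraction r stretches the time to threshold by a factor of 1/(1 − r). The function names and the constant-rate model are assumptions introduced here; they are not taken from the figure or from any trial.

```python
def delayed_onset_years(untreated_onset_years: float, rate_reduction: float) -> float:
    """Years to reach a fixed injury threshold when the injury rate is slowed.

    Hypothetical linear model: threshold = rate * time, so multiplying the
    rate by (1 - rate_reduction) divides the time by the same factor.
    """
    if not 0.0 <= rate_reduction < 1.0:
        raise ValueError("rate_reduction must be in [0, 1)")
    return untreated_onset_years / (1.0 - rate_reduction)


def implied_untreated_onset(treated_onset_years: float, rate_reduction: float) -> float:
    """Invert the same linear model: untreated onset implied by a treated onset."""
    if not 0.0 <= rate_reduction < 1.0:
        raise ValueError("rate_reduction must be in [0, 1)")
    return treated_onset_years * (1.0 - rate_reduction)


if __name__ == "__main__":
    # Values from the Figure 23.1b caption: 35% slower injury, SPMS at about 20 years.
    print(round(implied_untreated_onset(20.0, 0.35), 1))
```

Under this assumption, a 35% slower rate and SPMS onset at about 20 years correspond to an untreated onset near 13 years, which lies within the 10 to 20 years of relatively preserved function described in the text. Real disability accumulation need not be linear, so this is only a consistency check on the figure's hypothetical numbers.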
the CNS pathology is a matter of considerable debate. The traditional measures of relapses and MRI lesions are useful because they measure benefit to patients, but the long-term benefits of drug treatment on clinically significant disability and development of SPMS are still uncertain. Part of the problem relates to the unclear relationship between treatment effect on relapses or MRI lesions and later clinically significant impact on disability. The EDSS has been used in an attempt to define progressive disability in RRMS. Commonly, a defined amount of worsening from the score at study entry is required and must be confirmed at least 3 months later. Many have treated confirmed EDSS worsening as synonymous with progressive disability in RRMS patients, but this remains controversial and uncertain. The EDSS is somewhat imprecise at the low disability end of its range, and in a substantial proportion of patients, reports have documented recovery from confirmed EDSS worsening of 1 point after 3 months [40]. Long-term longitudinal studies are required to determine the relevance of relapses, lesions, and confirmed EDSS worsening as predictors of clinically significant disability and SPMS. Further, it is necessary to determine whether, and to what degree, improvements on these parameters translate into clinical benefits years later in the form of reduced disability. Many experts believe that decreases in relapses, confirmed EDSS worsening, and lesions represent intermediate outcome measures that predict a beneficial long-term effect, but this has not been firmly established.

Is ‘disease free’ a useful concept in MS trials?

The concept of ‘disease free’ (which derives from cancer trials) in MS is based on results from the natalizumab trials, which reported the proportion of patients with no indication of disease activity during treatment. Disease-free is defined as no new MRI lesions, no active MRI lesions during the trial, no relapses, and no worsening on the EDSS score. Although a useful concept, it is not certain that disease-free equates to pathology-free status. For example, much of the MS pathology has been localized to gray matter, and gray matter pathology is not detectable with conventional MRI techniques.

What is the role of brain atrophy studies in clinical trials?

Although brain atrophy has been measured in MS trials for almost two decades, the interpretation of atrophy results has been controversial. Volumetric changes are pathologically non-specific. Some portion of the change reflects real tissue loss due to MS, but superimposed on the disease-related changes are possible physiological fluid shifts related to hydration status [8, 73], effects of gliosis or steroids [74], and possibly cytotoxic effects [75]. These confounding effects are difficult, if not impossible, to control. Despite these complex issues, MS patients lose more brain volume over time than age-matched healthy controls, and atrophy correlates with and predicts disability [76], suggesting that atrophy measurements mainly reflect disease-related change. Currently available MS disease-modifying therapies have been shown to slow atrophy by 30% to 50% in the second year of treatment [56, 77, 78]. This finding has been relatively consistent across trials, therapies, and measurement methodologies. However, more recently, a few MS trials have reported complete cessation or even reversal of brain atrophy [63, 79], the meaning of which is under investigation.

The lack of a validated, pathologically specific measure of neuroprotection has prompted further discussion on the utility of brain atrophy measurements. In 2008, a meeting was convened to develop a consensus on how to measure neuroprotection and repair in MS clinical trials [80]. The panel concluded that brain atrophy measurements are the most feasible, well-characterized, and useful marker of neuroprotection currently available. More specific measures of neuroprotection and repair are essential and are currently being sought.

What are we missing?

An important limitation to using MRI for evaluating disease burden in MS is that abnormalities revealed by conventional MRI are restricted to the white matter. Pathology studies have shown that, in addition to the classic white matter plaques, significant tissue damage occurs in the gray matter and in white matter regions outside areas of focal demyelination [81]. However, most MRI outcome measures in MS (including gadolinium-enhancing lesions, T2 lesions, and T1 black hole lesions) are insensitive to both gray matter pathology and diffuse white matter damage. With conventional MRI acquisitions, only atrophy measurements are sensitive to the effects of damage in the gray matter and normal-appearing white matter. Gray matter atrophy has been shown to be correlated with disability and is currently under investigation as a feasible outcome
measure in trials [82]. Advanced MRI acquisition methods have been applied to detect MS pathology outside of white matter lesions, including magnetization transfer ratio and double inversion recovery [83, 84]. For technical reasons, these non-conventional imaging techniques are not yet ready for use in large multi-center trials.

Design limitations specific to MS

Alternative statistical designs for MS have been discussed extensively. Most are not new, but just have not been implemented in MS. Part of the reason for this is the interaction between regulatory authorities and pharmaceutical companies. Each in its own way is conservative, opting for tried and true. Additional regulatory forces, such as ethics boards, increase the inherent difficulty of more adventurous designs in MS. For example, adaptive designs require elaborate a priori decision-making and often changes to sample sizes, duration of treatment, etc., which require further discussions with IRBs and ethics boards. The practical aspects often outweigh the statistical design properties and even potential cost savings attributed to modifications of the design. A single change in protocol could cost $150 000 if there were 100 centers at $1500 per change.

References

1. International Multiple Sclerosis Genetics Consortium, Hafler DA, Compston A, et al. Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 2007; 357: 851–62.
2. Trapp BD, Peterson J, Ransohoff RM, et al. Axonal transection in the lesions of multiple sclerosis. N Engl J Med 1998; 338: 278–85.
3. Gold R, Linington C, and Lassmann H. Understanding pathogenesis and therapy of multiple sclerosis via animal models: 70 years of merits and culprits in experimental autoimmune encephalomyelitis research. Brain 2006; 129: 1953–71.
4. Soulika AM, Lee E, McCauley E, et al. Initiation and progression of axonopathy in experimental autoimmune encephalomyelitis. J Neurosci 2009; 29: 14965–79.
5. Derfuss T, Parikh K, Velhin S, et al. Contactin-2/TAG-1-directed autoimmunity is identified in multiple sclerosis patients and mediates gray matter pathology in animals. Proc Natl Acad Sci USA 2009; 106: 8302–7.
6. Steinman L. The gray aspects of white matter disease in multiple sclerosis. Proc Natl Acad Sci USA 2009; 106: 8083–4.
7. Rudick RA and Trapp BD. Gray-matter injury in multiple sclerosis. N Engl J Med 2009; 361: 1505–6.
8. Zivadinov R, Reder AT, Filippi M, et al. Mechanisms of action of disease-modifying agents and brain volume changes in multiple sclerosis. Neurology 2008; 71: 136–44.
9. Greenberg BM and Calabresi PA. Future research directions in multiple sclerosis therapies. Semin Neurol 2008; 28: 121–7.
10. Ramagopalan SV, Maugeri NJ, Handunnetthi L, et al. Expression of the multiple sclerosis-associated MHC class II allele HLA-DRB1*1501 is regulated by vitamin D. PLoS Genet 2009; 5: e1000369.
11. Ebers GC. Environmental factors and multiple sclerosis. Lancet Neurol 2008; 7: 268–77.
12. Giovannoni G and Ebers G. Multiple sclerosis: The environment and causation. Curr Opin Neurol 2007; 20: 261–8.
13. Marrie RA, Rudick R, Horwitz R, et al. Vascular comorbidity is associated with more rapid disability progression in multiple sclerosis. Neurology 30: 1041–7.
14. Zamboni P, Galeotti R, Menegatti E, et al. Chronic cerebrospinal venous insufficiency in patients with multiple sclerosis. J Neurol Neurosurg Psychiatry 2009; 80: 392–9.
15. Bermel RA and Rudick RA. Interferon-beta treatment for multiple sclerosis. Neurotherapeutics 2007; 4: 633–46.
16. Lublin FD and Reingold SC. Defining the clinical course of multiple sclerosis: Results of an international survey. Neurology 1996; 46: 907–11.
17. Brex PA, Ciccarelli O, O’Riordan JI, et al. A longitudinal study of abnormalities on MRI and disability from multiple sclerosis. N Engl J Med 2002; 346: 158–64.
18. McDonald WI, Compston A, Edan G, et al. Recommended diagnostic criteria for multiple sclerosis: Guidelines from the international panel on the diagnosis of multiple sclerosis. Ann Neurol 2001; 50: 121–7.
19. Jacobs LD, Beck RW, Simon JH, et al. Intramuscular interferon beta-1a therapy initiated during a first demyelinating event in multiple sclerosis. CHAMPS study group. N Engl J Med 2000; 343: 898–904.
20. Comi G, Martinelli V, Rodegher M, et al. Effect of glatiramer acetate on conversion to clinically definite multiple sclerosis in patients with clinically isolated syndrome (PreCISe study): A randomised, double-blind, placebo-controlled trial. Lancet 2009; 374: 1503–11.
21. Kappos L, Polman CH, Freedman MS, et al. Treatment with interferon beta-1b delays conversion to clinically definite and McDonald MS in patients with clinically isolated syndromes. Neurology 2006; 67: 1242–9.
22. Miller DH, Weinshenker BG, Filippi M, et al. Differential diagnosis of suspected multiple sclerosis: A consensus approach. Mult Scler 2008; 14: 1157–74.
23. Moore F and Okuda DT. Incidental MRI anomalies suggestive of multiple sclerosis: The radiologically isolated syndrome. Neurology 2009; 73: 1714.
24. Johnson KP, Brooks BR, Cohen JA, et al. Copolymer 1 reduces relapse rate and improves disability in relapsing-remitting multiple sclerosis: Results of a phase III multicenter, double-blind placebo-controlled trial. The Copolymer 1 Multiple Sclerosis Study Group. Neurology 1995; 45: 1268–76.
25. Jacobs LD, Cookfair DL, Rudick RA, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. The Multiple Sclerosis Collaborative Research Group (MSCRG). Ann Neurol 1996; 39: 285–94.
26. PRISMS (Prevention of Relapses and Disability by Interferon beta-1a Subcutaneously in Multiple Sclerosis) study group. Randomized double-blind placebo-controlled study of interferon beta-1a in relapsing/remitting multiple sclerosis. Lancet 1998; 352: 1498–504.
27. PRISMS Study Group and the University of British Columbia MS/MRI Analysis Group. PRISMS-4: Long-term efficacy of interferon-beta-1a in relapsing MS. Neurology 2001; 56: 1628–36.
28. Hawker K, O’Connor P, Freedman MS, et al. Rituximab in patients with primary progressive multiple sclerosis: Results of a randomized double-blind placebo-controlled multicenter trial. Ann Neurol 2009; 66: 460–71.
29. Wolinsky JS, Narayana PA, O’Connor P, et al. Glatiramer acetate in primary progressive multiple sclerosis: Results of a multinational, multicenter, double-blind, placebo-controlled trial. Ann Neurol 2007; 61: 14–24.
30. Montalban X. Overview of European pilot study of interferon beta-1b in primary progressive multiple sclerosis. Mult Scler 2004; 10 (Suppl 1): S62; discussion 62–4.
31. Leary SM, Miller DH, Stevenson VL, et al. Interferon beta-1a in primary progressive MS: An exploratory, randomized, controlled trial. Neurology 2003; 60: 44–51.
32. Rammohan KW, Rosenberg JH, Lynn DJ, et al. Efficacy and safety of modafinil (Provigil) for the treatment of fatigue in multiple sclerosis: A two centre phase 2 study. J Neurol Neurosurg Psychiatry 2002; 72: 179–83.
33. Rice GP, Filippi M and Comi G. Cladribine and progressive MS: Clinical and MRI outcomes of a multicenter controlled trial. Cladribine MRI Study Group. Neurology 2000; 54: 1145–55.
34. Filippi M, Rovaris M, Iannucci G, et al. Whole brain volume changes in patients with progressive MS treated with cladribine. Neurology 2000; 55: 1714–8.
35. Banwell B, Ghezzi A, Bar-Or A, et al. Multiple sclerosis in children: Clinical diagnosis, therapeutic strategies, and future directions. Lancet Neurol 2007; 6: 887–902.
36. Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. I. Clinical results of a multicenter, randomized, double-blind, placebo-controlled trial. The IFNB Multiple Sclerosis Study Group. Neurology 1993; 43: 655–61.
37. Rudick RA, Stuart WH, Calabresi PA, et al. Natalizumab plus interferon beta-1a for relapsing multiple sclerosis. N Engl J Med 2006; 354: 911–23.
38. Kremenchutzky M, Rice GP, Baskerville J, et al. The natural history of multiple sclerosis: A geographically based study 9: Observations on the progressive phase of the disease. Brain 2006; 129: 584–94.
39. Kurtzke JF. Rating neurologic impairment in multiple sclerosis: An expanded disability status scale (EDSS). Neurology 1983; 33: 1444–52.
40. Ebers GC, Heigenhauser L, Daumer M, et al. Disability as an outcome in MS clinical trials. Neurology 2008; 71: 624–31.
41. Rudick RA, Lee J, Cutter GR, et al. Significance of disability progression in a clinical trial in relapsing-remitting multiple sclerosis: Eight-year follow-up. Arch Neurol 2010; 67: 1329–35.
42. Rudick R, Antel J, Confavreux C, et al. Recommendations from the national multiple sclerosis society clinical outcomes assessment task force. Ann Neurol 1997; 42: 379–82.
43. Fox RJ, Lee JC, and Rudick RA. Optimal reference population for the multiple sclerosis functional composite. Mult Scler 2007; 13: 909–14.
44. Rudick RA, Polman CH, Cohen JA, et al. Assessing disability progression with the multiple sclerosis functional composite. Mult Scler 2009; 15: 984–97.
45. Rao SM, Leo GJ, Bernardin L, et al. Cognitive dysfunction in multiple sclerosis. I. Frequency, patterns, and prediction. Neurology 1991; 41: 685–91.
46. Cohen JA, Rudick RA, editors. Multiple Sclerosis Therapeutics. 3rd ed. London, UK, Informa Healthcare. 2007.
47. Benedict RH, Cookfair D, Gavett R, et al. Validity of the minimal assessment of cognitive function in multiple sclerosis (MACFIMS). J Int Neuropsychol Soc 2006; 12: 549–58.
48. Rudick RA and Miller DM. Health-related quality of life in multiple sclerosis: Current evidence, measurement and effects of disease severity and treatment. CNS Drugs 2008; 22: 827–39.
49. Rudick RA, Miller D, Hass S, et al. Health-related quality of life in multiple sclerosis: Effects of natalizumab. Ann Neurol 2007; 62: 335–46.
50. Li DK and Paty DW. Magnetic resonance imaging results of the PRISMS trial: A randomized, double-blind, placebo-controlled study of interferon-beta1a in relapsing-remitting multiple sclerosis. Ann Neurol 1999; 46: 197–206.
51. Barkhof F. MRI in multiple sclerosis: Correlation with expanded disability status scale (EDSS). Mult Scler 1999; 5: 283–6.
52. Fisher E, Rudick RA, Cutter G, et al. Relationship between brain atrophy and disability: An 8-year follow-up study of multiple sclerosis patients. Mult Scler 2000; 6: 373–7.
53. Rudick RA, Lee JC, Simon J, et al. Significance of T2 lesions in multiple sclerosis: A 13-year longitudinal study. Ann Neurol 2006; 60: 236–42.
54. van Walderveen MA, Kamphorst W, Scheltens P, et al. Histopathologic correlate of hypointense lesions on T1-weighted spin-echo MRI in multiple sclerosis. Neurology 1998; 50: 1282–8.
55. van den Elskamp IJ, Lembcke J, et al. Persistent T1 hypointensity as an MRI marker for treatment efficacy in multiple sclerosis. Mult Scler 2008; 14: 764–9.
56. Rudick RA, Fisher E, Lee JC, et al. Use of the brain parenchymal fraction to measure whole brain atrophy in relapsing-remitting MS. Multiple Sclerosis Collaborative Research Group. Neurology 1999; 53: 1698–704.
57. Smith SM, Zhang Y, Jenkinson M, et al. Accurate, robust, and automated longitudinal and cross-sectional brain change analysis. Neuroimage 2002; 17: 479–89.
58. Fisher E, Rudick RA, Simon JH, et al. Eight-year follow-up study of brain atrophy in patients with MS. Neurology 2002; 59: 1412–20.
59. Sormani MP, Bonzano L, Roccatagliata L, et al. Magnetic resonance imaging as a potential surrogate for relapses in multiple sclerosis: A meta-analytic approach. Ann Neurol 2009; 65: 268–75.
60. Filippi M. Multiple sclerosis, part II: Nonconventional MRI techniques. Preface. Neuroimaging Clin N Am 2009; 19: xiii–xiv.
acetate in relapsing-remitting multiple sclerosis: A prospective, randomised, multicentre study. Lancet Neurol 2009; 8: 889–97.
64. Polman CH, Reingold SC, Barkhof F, et al. Ethics of placebo-controlled clinical trials in multiple sclerosis: A reassessment. Neurology 2008; 70: 1134–40.
65. Cohen JA, Imrey PB, Calabresi PA, et al. Results of the Avonex Combination Trial (ACT) in relapsing-remitting MS. Neurology 2009; 72: 535–41.
66. Langer-Gould A, Atlas SW, Green AJ, et al. Progressive multifocal leukoencephalopathy in a patient treated with natalizumab. N Engl J Med 2005; 353: 375–81.
67. Cohen JA, Barkhof F, Comi G, et al. Oral fingolimod or intramuscular interferon for relapsing multiple sclerosis. N Engl J Med 2010; 362: 402–15.
68. Kappos L, Radue EW, O’Connor P, et al. A placebo-controlled trial of oral fingolimod in relapsing multiple sclerosis. N Engl J Med 2010; 362: 387–401.
69. Giovannoni G, Comi G, Cook S, et al. A placebo-controlled trial of oral cladribine for relapsing multiple sclerosis. N Engl J Med 2010; 362: 416–26.
70. Martinelli V. J Neurol Sci 2009; 30: S167–70.
71. Kappos L, Antel J, Comi G, et al. Oral fingolimod (FTY720) for relapsing multiple sclerosis. N Engl J Med 2006; 355: 1124–40.
72. Aban IB, Wolfe GI, Cutter GR, et al. The MGTX experience: Challenges in planning and executing an international, multicenter clinical trial. J Neuroimmunol 2008; 201–202: 80–4.
73. Duning T, Kloska S, Steinstrater O, et al. Dehydration confounds the assessment of brain atrophy. Neurology 2005; 64: 548–50.
74. Fox RJ, Fisher E, Tkach J, et al. Brain atrophy and magnetization transfer ratio following methylprednisolone in multiple sclerosis: Short-term changes and long-term implications. Mult Scler 2005; 11: 140–5.
75. Chen JT, Collins DL, Atkins HL, et al. Brain atrophy after immunoablation and stem cell transplantation in multiple sclerosis. Neurology 2006; 66: 1935–7.
76. Bermel RA and Bakshi R. The measurement and clinical relevance of brain atrophy in multiple sclerosis. Lancet Neurol 2006; 5: 158–70.
61. Balcer LJ, Baier ML, Pelak VS, et al. New low-contrast 77. Sormani MP, Rovaris M, Valsasina P, et al.
vision charts: Reliability and test characteristics in Measurement error of two diferent techniques
patients with multiple sclerosis. Mult Scler 2000; 6: for brain atrophy assessment in multiple sclerosis.
163–71. Neurology 2004; 62: 1432–4.
62. Fisher JB, Jacobs DA, Markowitz CE, et al. Relation of 78. Miller DH, Soon D, Fernando KT, et al. MRI outcomes
visual function to retinal nerve iber layer thickness in in a placebo-controlled trial of natalizumab in relapsing
multiple sclerosis. Ophthalmology 2006; 113: 324–32. MS. Neurology 2007; 68: 1390–401.
63. O’Connor P, Filippi M, Arnason B, et al. 250 microg or 79. Paolillo A, Coles AJ, Molyneux PD, et al. Quantitative
500 microg interferon beta-1b versus 20 mg glatiramer MRI in patients with secondary progressive MS treated
271
with monoclonal antibody campath 1H. Neurology 82. Fisher E, Lee JC, Nakamura K, et al. Gray matter
1999; 53: 751–7. atrophy in multiple sclerosis: A longitudinal study. Ann
80. Barkhof F, Calabresi PA, Miller DH, et al. Neurol 2008; 64: 255–65.
Imaging outcomes for neuroprotection and 83. Filippi M, Campi A, Dousset V, et al. A magnetization
repair in multiple sclerosis trials. Nat Rev Neurol 2009; transfer imaging study of normal-appearing white matter
5: 256–66. in multiple sclerosis. Neurology 1995; 45: 478–82.
81. Ludwin SK. he pathogenesis of multiple 84. Geurts JJ, Pouwels PJ, Uitdehaag BM, et al. Intracortical
sclerosis: Relating human pathology to lesions in multiple sclerosis: Improved detection with
experimental studies. J Neuropathol Exp Neurol 3D double inversion-recovery MR imaging. Radiology
2006; 65: 305–18. 2005; 236: 254–60.
272
Chapter 24
Amyotrophic lateral sclerosis
Nazem Atassi, David Schoenfeld, and Merit Cudkowicz

Introduction
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disorder characterized by progressive muscle weakness that eventually affects respiratory muscles and causes death. There is a strong unmet need for development of treatments for people with ALS. In the past 10–15 years there has been an exponential growth in clinical trials in ALS [1]. Much has been learned from these studies about preclinical models and clinical trial design and conduct in ALS. The complexities of ALS still pose major challenges in translating progress in understanding disease mechanisms into effective novel therapies for people with ALS. The development of ALS therapeutics has followed a traditional discovery path with identification of potential targets from a variety of in vitro and in vivo preclinical models. The predictability of these preclinical tools for determination of efficacy in humans with ALS is not yet known. Once a candidate therapy has been identified, investigators are faced with additional challenges to identify compound bioavailability, dosing and pharmacodynamic properties, and the optimal clinical trial design. Preclinical disease models, biomarkers, clinical trial design options, and challenges to the conduct of ALS clinical trials are discussed in this chapter.

Biological basis for interventions
People with ALS develop progressive muscle weakness, atrophy and spasticity, reflecting loss of lower motor neurons (LMNs) and upper motor neurons (UMNs) in the brain and spinal cord. No treatment prevents, halts or reverses the disease, although a small delay in mortality occurs with the drug riluzole [2]. While the majority of ALS cases are sporadic, about 10% of cases are familial and of these, 30% arise due to a hexanucleotide repeat expansion on chromosome 9 and 25% arise due to mutations in the gene encoding SOD1 [3, 4].

Therapeutic targets
The precise cause of selective motor neuron death in ALS is unknown. Many pathogenetic mechanisms have been proposed such as excitotoxicity, oxidative damage, mutant proteins, immune dysregulation, mitochondrial dysfunction, and growth factors (Table 24.1) [5, 6]. Advances in understanding the biology of motor neuron death in ALS have led to more than 32 compounds being tested in phase 2/3 clinical trials in ALS during the past 15 years (Table 24.1). The recent discovery of new ALS genes has expanded our understanding of the role of aberrant RNA metabolism in the pathogenesis of both familial and sporadic ALS.

Pre-clinical disease models
Valid disease models are critical to better understand disease pathogenesis and to the development of new treatments for ALS. A wide range of in vitro and in vivo models are available to both study disease biology and screen therapeutic compounds (Table 24.2). The G93A SOD1 mouse is used routinely as an in vivo model for ALS. Recently, skin fibroblasts from people with ALS were used to produce induced pluripotent stem cells (iPS) that are capable of differentiating into motor neurons and glia [7]. These recent advances in stem cell technology offer potential motor neuron models of sporadic ALS that can help understand disease pathogenesis and screen new drugs.

Challenges in translation from models to people
Most available cell and mouse models are based on the SOD1 mutation that is present in only about 2% of ALS patients. Currently, mice carrying 23 copies of the human G93A SOD1 transgene are considered the
Table 24.1 Examples of past ALS clinical trials and their proposed primary targets

Targeted pathway | ALS clinical trials
Excitotoxicity | Riluzole, gabapentin, topiramate, lamotrigine, dextromethorphan, celecoxib, talampanel, ceftriaxone*
Oxidative damage | Vitamin E, glutathione, N-acetylcysteine, selegiline
Immunoregulation | Interferon β1a, ganglioside, cyclophosphamide, intravenous immunoglobulin, celecoxib, total lymphoid irradiation
Energy & mitochondria | Creatine monohydrate*, coenzyme Q10, branched chain amino acids, L-threonine, KNS-760704*, olesoxime
Growth factors | Ciliary neurotrophic factor, brain-derived neurotrophic factor, thyrotropin releasing hormone [47], growth hormone, insulin-like growth factor [48], xaliproden, VEGF* (SB509 and sNN0029)
Apoptosis | Omigapil (TCH346), minocycline, pentoxifylline, tamoxifen*
Protein aggregation | Arimoclomol
Decrease SOD1 levels | ISIS-333611*, pyrimethamine
Stem cell replacement | Neural stem*

* Active trial.
standard model for ALS therapeutic studies. While the G93A SOD1 is an invaluable tool to test proof of concept that the proposed therapy has the desired biological activity, this mouse model has several limitations [8]. Until there are more therapies that are effective in people, it is not possible to know whether the currently available preclinical models are valid screening tools.

Table 24.2 Common pre-clinical disease models

In vitro
1. Mature cells
• Organotypic spinal cord cultures from post-natal rats
• NSC34 and HeLa cell lines expressing mutant SOD1
• Glutamate excitotoxicity models
2. Embryonic cells
• Organotypic slice cultures from wild type/G93A embryonic spinal cords
• Purified human motor neurons and astrocytes from human embryonic spinal cord anterior horns
• Motor neurons from mice embryonic stem cells (ESCs)
• Motor neurons from human ESCs and pluripotent cells
3. Neuroblastoma cell lines

In vivo
1. Rodent models
• Transgenic motor neuron disease rodent models: G93A SOD1, G85R SOD1, G37R SOD1 rodents, Dynamitin over-expression model
• VEGF mouse model
• PMN, Wobbler, HCSMA mouse models
2. Zebra fish models

Goals of interventions

Slow disease progression
Most therapeutic targets and preclinical disease models discussed in this chapter are focused on modifying disease progression. The primary goals of most clinical trials in ALS are to slow disease progression as measured either by function or survival. Functional scales include measures of strength, pulmonary function, and a questionnaire called the ALS functional rating scale-revised (ALSFRS-R) [9].

Treat ALS-related symptoms
In addition to muscle weakness, people with ALS suffer from many other ALS-related symptoms. Improving ALS symptomatic management is as important to ALS patients and their caregivers as disease modifying treatments. The improvement in survival in people with ALS seen in the past 10 years is likely secondary to improved multidisciplinary care and symptomatic treatments of dyspnea and dysphagia such
as non-invasive ventilation and gastrostomy [10]. There have been very few studies to determine best approaches to manage most of the symptoms of ALS. This is an unmet need and one for which it is difficult to find funding.

Study population

Demographics
The incidence of ALS is approximately 2/100 000/year [11]. Fifty percent of people with ALS die within 3 years of onset of symptoms and 90% die within 5 years [12]. Variability in rate of disease progression and symptom progression is high among people with ALS. Age and gender are the only risk factors repeatedly documented in epidemiological studies [13].

Eligibility in ALS clinical trials
The goal of requiring specific eligibility criteria for subjects to enter ALS clinical trials is to balance confidence in the diagnosis against enrollment early in the disease.

The wide variety of presenting symptoms in ALS makes absolute diagnosis difficult early in disease course and ALS diagnosis is often delayed for approximately 12 months after symptom onset [14]. The World Federation of Neurology Subcommittee on Motor Neuron Disease reached consensus on the criteria for the diagnosis of ALS in 1994 and these were revised in 2000 [15].

Clinical trials in ALS require a certain degree of certainty about the diagnosis based on the El Escorial diagnostic criteria. Clinical trials to assess efficacy of an intervention (phase 2 or 3) often enroll people early in the disease course. This would include people classified as possible, probable or definite by El Escorial criteria, who have a forced vital capacity above 60 or 70% of predicted normal and disease duration less than 2 or 3 years. The reasons for these criteria include wanting to treat people as early as possible in their disease course, minimizing patient to patient variability in rate of disease progression, and, for studies whose primary outcome measure is function, ensuring that people can complete the study. Early phase safety and dosage finding studies (phase 1 or 2a) often include people with more advanced disease. There are trial-specific eligibility criteria that depend on the aims and the primary outcome measures of the trial. For example, trials of drugs targeting SOD1 protein, such as ISIS-333611 and Arimoclomol, require genetic confirmation of SOD1 familial ALS.

Disease heterogeneity
Although ALS by definition involves both UMN and LMN dysfunction, some people have apparently only LMN (progressive muscular atrophy) or only UMN (primary lateral sclerosis) involvement which can be associated with prolonged survival [16]. Similarly, phenotypes that predominantly affect certain body areas (arm, leg, or bulbar) can be associated with different disease progression and survival. In addition, different SOD1 mutations are associated with different disease phenotypes and rates of progression. As ALS is characterized by marked phenotypic heterogeneity, variations in therapeutic response might be an important confounding factor in clinical trials which drives the need for larger sample size.

Statistical challenges because of disease heterogeneity
Most phase 2 or 3 efficacy clinical trials in ALS have not addressed phenotypic or genotypic heterogeneity. Currently there is one trial which is restricted to patients with familial ALS caused by SOD1 mutations that are associated with a rapid rate of progression. The hope is that because the rate of disease progression is rapid and more homogeneous in this population of patients, determination of efficacy can be made with a smaller number of participants.

The question about whether to combine genotypes and phenotypes or to separate them is always problematic. In the 1960s cancer clinical trials for solid tumors often tested agents on many different tumor types. It was only after agents were targeted to specific tumors that progress began to be made. Currently cancer therapy is very specifically targeted to tumor pathology. As more is learned about the biology behind the different phenotypic forms of ALS, the field will learn whether this approach is feasible for ALS.

Measurement tools in ALS
One of the most essential steps in clinical trial design is choosing the outcome measure. The ideal outcome measure for an ALS clinical trial is sensitive to disease progression, clinically meaningful, and easy to
administer even in advanced disease when patients are not able to come to clinic.

Clinical outcome measures

Survival
Survival is the gold standard primary endpoint for ALS trials. However, trials using survival as an outcome measure require prolonged durations (typically 18 months or more) and typically 400 people per arm. Survival is influenced by the individual's rate of disease progression, site of onset, nutrition status, use of riluzole, invasive and non-invasive ventilation, and gastrostomy. The occasional use of permanent assisted ventilation (PAV) by ALS patients confounds survival as a clinical trial endpoint since survival can be considerably extended by PAV [17]. At best this adds variation to a treatment comparison of mortality; at worst there might be a treatment effect on the decision to start PAV. Sometimes the outcome measure is the time to death or the initiation of PAV. This also has difficulties as the decision about when to start PAV, in patients who want it, may be affected by treatment. There is currently no validated surrogate marker for survival. Several other outcome measures have been used in various ALS clinical trials (Table 24.3).

Vital capacity
Vital capacity (VC) is used in most ALS clinical trials as a marker of respiratory muscle weakness. It is correlated with survival [18]. The major limitation of VC use is that patients with bulbar weakness cannot make a good seal around the mouth piece, resulting in increased measurement variability. Time to a drop in VC has been used as a way to address this limitation (NCT00542412). One of the outcome measures in the current clinical trial of Arimoclomol in familial ALS is forced expiratory volume (FEV-6), a new measure that enables patients to reliably measure their respiratory status at home (NCT00706147).

Muscle strength
Measuring the decline in muscle strength using hand-held dynamometry, the Tufts quantitative neuromuscular examination (TQNE) or manual muscle testing, can be easily performed. It takes approximately 45 minutes to administer the TQNE, it requires expensive equipment and is not practical for home visits. In the clinical trial of topiramate in ALS, a significant difference in muscle strength was not associated with excess mortality [19]. Manual muscle testing, on the other hand, offers a faster and more portable method of muscle strength measurement that has comparable variability to the TQNE. The disadvantages of this technique are that grading is qualitative and it may be insensitive to small changes. A more promising technique uses a hand held dynamometer to test isometric strength of multiple muscles [20]. This technique is portable, fast, and has been validated against TQNE [21]. It has been used in previous ALS clinical trials and is one of the outcome measures in the trial of ceftriaxone in ALS (NCT00349622). Muscle weakness is the major ALS symptom and it is important to include it in efficacy trials.

Motor unit number estimates [22]
Motor unit number estimates (MUNE) is an electrophysiological measure of lower motor neuron loss. It uses surface electromyography techniques to assess the progress of motor neuron loss and consequent reinnervation of denervated muscle fibers by surviving neurons. This measure can be reliably performed in a multi-center trial and yields results that show a consistent decline of MUNE over time [23]. One of the advantages of MUNE is that it is one of the closest measures to disease pathology. It requires formal electrophysiology training and it takes approximately 30 minutes to complete. It might be a good outcome measure for a small phase 2, proof of concept, ALS clinical trial. Currently, there is an ongoing longitudinal study of MUNE as a potential outcome measure in ALS.

Patient-reported outcome measures

ALS Functional Rating Scale-Revised
The ALS Functional Rating Scale-Revised (ALSFRS-R) is widely used as a primary or secondary outcome measure in ALS clinical trials. It consists of 12 questions about functional activities, each rated on an ordinal scale (0–4) that captures patients' assessment of their capability and independence (total score of 48). It can be administered quickly (five minutes) in person or over the phone [24] and is also validated for administration from the caregiver [25]. Questions in ALSFRS-R cover four domains: gross motor, fine motor, swallowing, and breathing. The ALSFRS-R rate of decline was approximately 0.92 units per month with a standard error of 0.08 in the placebo arm of the trial of topiramate in ALS [19].
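A rate of decline such as the one quoted above is the slope of the total score against time. As a rough illustration only, the following sketch totals the 12 item scores and fits an ordinary least-squares line to one patient's visits. The visit data and the function names are hypothetical, and per-patient regression is a simplification of the longitudinal models that trial analyses normally use.

```python
from statistics import mean

N_ITEMS = 12          # ALSFRS-R has 12 questions
MAX_ITEM_SCORE = 4    # each is scored 0-4, so the maximum total is 48


def alsfrs_r_total(item_scores):
    """Sum the 12 item scores after checking that each lies in 0..4."""
    if len(item_scores) != N_ITEMS:
        raise ValueError("expected 12 item scores")
    if any(not 0 <= s <= MAX_ITEM_SCORE for s in item_scores):
        raise ValueError("item scores must be between 0 and 4")
    return sum(item_scores)


def slope_per_month(months, totals):
    """Ordinary least-squares slope of total score on time (units/month)."""
    if len(months) != len(totals) or len(months) < 2:
        raise ValueError("need at least two paired observations")
    m_bar, t_bar = mean(months), mean(totals)
    sxx = sum((m - m_bar) ** 2 for m in months)
    if sxx == 0:
        raise ValueError("visits must occur at different times")
    sxy = sum((m - m_bar) * (t - t_bar) for m, t in zip(months, totals))
    return sxy / sxx


# Hypothetical patient: totals recorded at months 0, 3, 6, and 9.
visit_months = [0, 3, 6, 9]
visit_totals = [40, 37, 35, 31]
rate = slope_per_month(visit_months, visit_totals)
print(f"estimated change: {rate:.2f} units/month")
```

A negative slope indicates functional decline; the magnitude can be compared with the placebo-arm rate reported for the topiramate trial.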
Table 24.3 Outcome measures used in past ALS clinical trials

Therapy | Primary outcome measure
Riluzole | Survival & ALSFRS
Gabapentin | MVIC
Topiramate | MVIC
Lamotrigine | Clinical scores (age of onset, bulbar and respiratory involvement, ambulation and functional disability); Functional decline (Norris, Plaitakis and Bulbar scales)
Dextromethorphan | Survival
Talampanel | ALSFRS-R
Ciliary neurotrophic factor | Isometric muscle dynamometry; Combination of MVIC & VC
Insulin-like growth factor-1 | Appel ALS score
Brain-derived neurotrophic factor | VC & Survival
Thyrotropin releasing hormone | Tufts quantitative neuromuscular exam; Muscle strength
Xaliproden | Survival
Vitamin E | Modified Norris limb scale; Survival
N-acetyl-L-cysteine | Survival
Selegiline | Appel ALS total score
Coenzyme Q10 | MVIC
Creatine | Survival; MVIC
Branched chain amino acids | Muscle strength, maximal isometric muscle torque; Disability scales
Nimodipine | Isometric muscle strength
Verapamil | VC and limb megascores
TCH346 | ALSFRS-R
Pentoxifylline | Survival
Minocycline | Safety/tolerability measures; ALSFRS-R
Sodium phenylbutyrate | Safety and tolerability
Cyclophosphamide | Neurological function score
Bovine gangliosides | Neuromuscular function; Various objective tests of muscle strength
Interferon beta (IFNβ1a) | Non self supporting status (Medical Research Council Scale, Norris Scale, Bulbar scores)
Glutathione | Manual muscle testing
Oxandrolone | MVIC
Tamoxifen | Safety, MVIC
Lithium | Survival
Ceftriaxone | Survival
Arimoclomol in familial ALS | ALSFRS-R and survival
KNS-760704 | ALSFRS-R and survival
ISIS-333611 | Safety

ALSFRS-R: ALS functional rating scale-revised; MVIC: maximum voluntary isometric contraction; VC: vital capacity.
Appel rating scale
The Appel rating scale includes both subjective and objective assessments of bulbar, respiratory function, muscle strength, and upper and lower extremity function [26]. Appel scores range from 30 points (healthy) to 164 (maximum impairment).

ALS specific quality of life
Quality of life in ALS patients is not easily determined by standard scales, such as SF-36, that rely mainly on physical function as an indicator of quality of life. The ALS specific quality of life (ALSSQOL) is a self-administered questionnaire that was developed, tested and validated to measure quality of life in ALS patients. It is usually used as a secondary outcome measure in ALS clinical trials [27]. The scale consists of 59 questions, each rated on a 1–10 scale, that ask about the severity of the symptoms of ALS, mood, affect, intimacy, and social issues.

Biomarkers
There are currently no reliable blood, cerebrospinal fluid or imaging biomarkers that track disease progression. The availability of biomarkers that were disease relevant and were related to disease progression could greatly accelerate therapy development. Establishing multi-center pooling of a large number of blood, CSF and tissue samples is essential for biomarker discovery. In addition to the above mentioned disease-related biomarkers, there are biomarkers that are drug-specific. Measuring EAAT2 activity in olfactory nerve ending during the trial of ceftriaxone in ALS is one example of a drug-specific biomarker that tracks pharmacodynamic activity. Another example is measuring histone acetylation levels to determine the ideal dosage of sodium phenylbutyrate in a phase 2 study in ALS [28]. Neuroimaging has proved to be an invaluable biomarker for drug discovery in multiple sclerosis and Alzheimer's disease trials, and is a promising tool in ALS (reviewed in [29]).

Principles of statistics and outcome measures in ALS clinical trials
The statistical techniques used in ALS clinical trials depend on the outcome measure. The proportional hazard regression model is often used to analyze time to death, or time to the composite of death or PAV; these trials need to be large because the mortality in a clinical trial population is about 15–20% per year. For instance, to detect a 50% improvement would require approximately 600 patients followed for 2 years. The length of follow-up becomes a problem as patients tend to stop therapy early in ALS trials, which reduces the power of a trial that requires long follow-up.

Random effects models are used to analyze longitudinal measures such as vital capacity, ALSFRS-R, muscle strength, and MUNE. The most common primary efficacy measure is ALSFRS-R because it can be assessed by telephone if the patient is unable to travel to the clinic [24]. The random effects model specifies that each patient has a linear trajectory in the outcome measure. Their actual measurements will be normally distributed about this trajectory and the trajectories themselves will have normally distributed slopes and intercepts. The primary statistical hypothesis is that the mean slope of these trajectories is different in the active treatment group than it is in the placebo group. This model can be extended to situations where the trajectories are non-linear, where there are important covariates, and where the distributions are not normal.

Trials that use longitudinal outcomes tend to be smaller than trials focused on mortality, although they still are fairly large due to the variability in the rate of patient progression. For instance, the standard deviation of the slope of ALSFRS-R is approximately 0.83 units/month. This implies that 11% of the patients will actually improve on placebo. The standard deviation around the trajectory is 2 units but the effect of this can be minimized by taking enough measurements.

The trial of lithium in ALS has used time to progression as an efficacy measure. Progression was defined as a 6 point drop in ALSFRS-R, death, or PAV. The advantage of this endpoint is that it shortened the trial for rapidly progressing patients on placebo and allowed them to switch to active treatment. The problem with this endpoint is that it is not as powerful as using the ALSFRS-R measurements themselves. One compromise is to use the design but to use a random effects model in the analysis.

Another proposal is to combine a longitudinal outcome with mortality using a rank-based approach which compares time to death when patients die and the last value of their longitudinal endpoint when they survive [30, 31].

Statistical challenges of ALS outcome measures
The largest challenge in clinical trials that use ALSFRS-R or other longitudinal outcome measures is missing data.
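The slope standard deviation quoted under "Principles of statistics and outcome measures in ALS clinical trials" can be turned into rough planning numbers. The sketch below assumes normally distributed slopes, takes the 0.83 units/month standard deviation from the text, and borrows the 0.92 units/month placebo decline reported for the topiramate trial as the mean slope. The text does not state which mean underlies its 11% figure, so the proportion computed here need not match it. The per-arm sample size uses the standard two-sample normal approximation and ignores within-patient measurement error and dropout, so it should be read as an optimistic lower bound, not a trial design. All function names are illustrative.

```python
from math import ceil
from statistics import NormalDist

STD_NORMAL = NormalDist()

SLOPE_SD = 0.83             # units/month, between-patient SD from the text
ASSUMED_MEAN_SLOPE = -0.92  # units/month; assumption (topiramate placebo arm)


def fraction_improving(mean_slope, slope_sd):
    """P(slope > 0) when slopes are normal with the given mean and SD."""
    return 1.0 - NormalDist(mean_slope, slope_sd).cdf(0.0)


def n_per_arm(slope_sd, slope_difference, alpha=0.05, power=0.90):
    """Two-sample normal approximation for comparing two mean slopes."""
    if slope_difference == 0:
        raise ValueError("slope_difference must be non-zero")
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)   # two-sided test
    z_beta = STD_NORMAL.inv_cdf(power)
    n = 2.0 * (z_alpha + z_beta) ** 2 * slope_sd ** 2 / slope_difference ** 2
    return ceil(n)


p_improve = fraction_improving(ASSUMED_MEAN_SLOPE, SLOPE_SD)
# Treatment effect expressed as a 30% slowing of the assumed mean decline.
n_30 = n_per_arm(SLOPE_SD, 0.30 * abs(ASSUMED_MEAN_SLOPE))

print(f"share of placebo patients with a positive slope: {p_improve:.1%}")
print(f"patients per arm for a 30% slowing (90% power): {n_30}")
```

Under these assumptions a 30% slowing calls for roughly 190 patients per arm, the same order of magnitude as the trial sizes quoted in this chapter. The result is sensitive to the assumed mean slope and to the variance components left out of this approximation.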
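For the proportional hazards analysis of survival described under "Principles of statistics and outcome measures in ALS clinical trials", a common back-of-envelope check is the number of deaths the comparison needs. The sketch below uses Schoenfeld's approximation for 1:1 randomization, d = 4(z_{1-α/2} + z_{power})² / (ln HR)², and converts deaths into patients under several assumptions that are not taken from the chapter: exponential survival, 18% annual mortality in the control arm (inside the 15–20% range in the text), 80% power, no dropout, and a "50% improvement" read as a 50% longer median survival (hazard ratio 1/1.5). The function names are illustrative, and the result is not meant to reproduce the chapter's figure.

```python
from math import ceil, exp, log
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Deaths needed for a two-sided log-rank test with 1:1 allocation."""
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and not equal to 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return ceil(4.0 * z_sum ** 2 / log(hazard_ratio) ** 2)


def death_probability(annual_mortality, years, hazard_ratio=1.0):
    """P(death within `years`) under exponential survival (an assumption)."""
    control_hazard = -log(1.0 - annual_mortality)   # events per year
    return 1.0 - exp(-control_hazard * hazard_ratio * years)


def total_patients(hazard_ratio, annual_mortality, years,
                   alpha=0.05, power=0.80):
    """Patients (both arms) needed to observe the required deaths."""
    events = required_events(hazard_ratio, alpha, power)
    p_control = death_probability(annual_mortality, years)
    p_treated = death_probability(annual_mortality, years, hazard_ratio)
    p_event = (p_control + p_treated) / 2.0   # average over the two arms
    return ceil(events / p_event)


hr = 1.0 / 1.5   # assumption: median survival 50% longer on treatment
deaths = required_events(hr)
patients = total_patients(hr, annual_mortality=0.18, years=2.0)
print(f"deaths required: {deaths}; patients required: {patients}")
```

With these inputs the calculation calls for several hundred patients, which illustrates why survival trials in ALS are large. A different reading of "50% improvement", a different mortality rate, or an allowance for dropout shifts the answer considerably.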
Participants may drop out of the trial and not provide data on the outcome measure. Other patients die during the trial. It is not clear how to handle these missing data statistically. It appears that the usual random effects model is fairly robust to this missingness. Alternatives where the missingness process is explicitly modeled appear to give similar results. Methods that combine mortality and ALSFRS-R do not have this problem but have somewhat less power than random effects models unless there is a large effect on mortality that is independent of the relationship of mortality and ALSFRS-R.

Trial designs

ALS trial designs

Phase 1 trials in ALS
Phase 1 trials are designed to learn the pharmacokinetics, tolerable dosage ranges, and initial safety of single and multiple doses of the drug under investigation in humans. Approaches in ALS are similar to those used in other neurological and non-neurological disorders.

Phase 2 trial designs
Phase 2 trials in ALS usually enroll between 60 and 400 participants and trial duration is usually less than 12 months. The main purpose of phase 2 trials in ALS is to gather information about the drug's biological activity (pharmacodynamics), dosage range and schedule, tolerability, side effects, and preliminary efficacy. This information will help guide the decision about whether to proceed to and how to best design a phase 3 trial [32].

Traditional phase 2 trials in ALS are not focused on evaluating efficacy. In addition to toxicity and safety information, it is useful to incorporate pharmacodynamic markers and predictive markers for therapeutic response (proof-of-concept), and to explore a range of dosages.

Phase 2 or proof-of-concept designs are challenging in ALS because there is no short-term sensitive biomarker that can predict long-term therapeutic efficacy. Clinical trials that use available functional outcome measures such as ALSFRS-R typically require approximately 200 subjects per arm and at least 9 months to have the statistical power to identify a treatment effect. To overcome these design challenges, other types of phase 2 designs are sometimes employed such as futility design, multi-arm selection design, and lead-in design [32, 33].

Futility design
The main purpose of futility design studies in ALS is to eliminate drugs that are not worthy of proceeding to phase 3 trials (discussed in Chapter 8). Two futility studies have been performed in ALS [34, 35], studying coenzyme Q10, and the combinations of creatine with minocycline and creatine with celecoxib.

Multi-arm selection design
The purpose of selection designs is to use smaller sample sizes to select a superior treatment or dosage to move forward to a phase 3 trial (discussed in Chapter 8). This approach is particularly useful when there are several candidate treatments or dosages. For example, two doses of CoQ10 were compared and the winner was selected to be compared to placebo in a futility analysis [36] and in another trial, two drug combinations were compared [35].

Lead-in design
The lead-in design provides historical data about each patient prior to treatment onset, hoping to reduce the variance of the outcome measure, which will allow for a smaller sample size. The main concerns of this design are the delay of treatment onset, enrollment difficulties, and non-linearity of the outcome measure. There are two approaches to analyzing these trials depending on whether you assume that the slope of the placebo group will not change after treatment is initiated. With this assumption the slope of the ALSFRS-R lead-in phase serves as a control for the treatment phase and augments the placebo sample size. This assumption has become suspect because in two recent ALS clinical trials (TCH346 and Minocycline), the slope of ALSFRS-R of the placebo arm changed after treatment had started [37, 38]. Without this assumption the slope of ALSFRS-R in the lead-in phase is used as a prognostic covariate. With the latter analysis, the power advantage of a lead-in design is small because the lead-in phase is not as prognostic as one would expect [32].

Phase 3 trial designs in ALS
Phase 3 trials are usually performed after successfully completing a phase 2 trial that provided some evidence of drug activity. A placebo arm is required in phase 3 trials and multiple dosages are sometimes desired. Phase 3 trials need to have adequate power to detect a clinical benefit if it is truly present. The primary outcome measure should be clinically meaningful and the goal of these studies is to conclusively demonstrate efficacy or lack of efficacy, in
addition to long-term safety. Phase 3 trials in ALS need to be powered according to the primary outcome measure used in the trial. For example, approximately 200–250 subjects per arm followed for 9–12 months are needed if ALSFRS-R is the primary outcome measure, and 400 subjects per arm followed for 18 months if survival is chosen to be the primary outcome measure. Traditionally phase 3 trials in ALS looked at survival as the primary endpoint; however, more recent phase 3 trials started using functional outcome measures such as ALSFRS-R as the primary endpoint in order to conduct smaller, shorter, and more efficient trials with fewer drop outs.

Adaptive designs
Adaptive design trials have been recently tried in ALS as a way to conserve on sample size, trial duration, and resources (see Chapter 9). The trials of CoQ10 and more recently the trial of ceftriaxone in ALS are examples of seamless adaptive trial designs using the first stage to determine the best dosage to be used in a later efficacy stage [34, 39].

Challenges in clinical trial design

Dosage selection
Picking the appropriate dosage and route of administration in humans is one of the most challenging aspects of drug development. Determining dosage response and maximum tolerated dosage can be costly but should be key components in phase 2/3 ALS trials. The phase 2/3 randomized trial of TCH346 [37] and the second study of riluzole [40] are examples of well-conducted ALS clinical trials that explored a broad range of dosages. Without determining the ideal therapeutic dosages, it is difficult to interpret the results of negative clinical trials, which may result in erroneous rejection of drugs and scientific hypotheses because of a 'failed' trial.

Sample size
Sample size requirements vary based on the primary outcome measure and expected effect size. For example, 1200 participants are needed to demonstrate a 50% change in median survival during a 1 year follow up trial (90% power and alpha of 0.05), whereas only 200 participants are needed to show a 40% change in ALSFRS-R. However, large changes in ALSFRS-R may not mean similar large changes in survival. Many past ALS tri-

Drug interactions
It is very important to determine experimental drug interactions with riluzole and other commonly used symptomatic treatments. An experimental drug, such as minocycline, that can potentially increase riluzole levels may increase riluzole side effects and may also negatively affect ALSFRS-R scores [41]. Thus, drug interactions can cause apparent worsening of functional decline or mask positive effects of the experimental drug.

Standards for efficacy and special safety concerns

Efficacy in Phase 2/3 ALS trials (Riluzole is the new placebo)
Riluzole is the only FDA-approved treatment for ALS. In the US, only about 60% of people with ALS use riluzole because of its high cost relative to its minimal expected clinical benefit. Most ALS trials in the US enroll people regardless of their riluzole use, and stratified enrollment based on riluzole use is usually implemented to balance treatment groups. Riluzole is more commonly used in Europe and Canada; consequently, most ALS trials in Europe and Canada are considered 'add-on' trials.

Riluzole treatment during an ALS clinical trial potentially sets a higher bar for survival efficacy of the new experimental drugs. For example, a new investigational drug for ALS has to prolong median survival for at least 6 months, which translates to a 30% decrease in ALSFRS-R rate of decline in a 12 month trial of more than 350 subjects.

Achieving balance between rigorous efficacy standards and the feasibility of high-cost, large clinical trials in a rare disease like ALS is a major challenge.

Methods and standards of monitoring safety in ALS clinical trials
As ALS is a devastating disease, the side effects of new treatments are usually acceptable as long as these treatments are effective. Therefore, the safety threshold in ALS clinical trials is relatively high and is comparable to cancer trials, and safety concerns in ALS trials are usually specific to the experimental treatment. High-risk phase 1 and all phase 2/3 ALS clinical trials typic-
als were insuiciently powered including trials of dex- ally have a Data Safety Monitoring Board (DSMB), that
omethorphan (n = 45), vitamin E (n = 104), nimodipine evaluates whether the clinical trial should continue as
(n = 87), verapamil (n = 72), and creatine, 5g (n = 104). planned, requires modiications to the protocol, or
280
should terminate early because of safety or toxicity concerns or convincing evidence of efficacy or futility. The recent trial of lithium (NCT00818389) is an example of an ALS trial that stopped early for futility.

Clinical trial conduct in ALS

Enrollment
The percentage of eligible people with ALS who enroll in trials is surprisingly small. In a recent poll of neurologists involved in ALS clinical research, average enrollment in ALS trials was 25% and highly variable between sites. In a literature review of 36 completed clinical trials in ALS, the average enrollment rate was 2.2 participants/site/month [42]. Slowly enrolling trials are more resource intensive and may end prematurely, without an answer, because of insufficient power. Off-label use of study medication significantly delayed enrollment for the topiramate, minocycline, and celecoxib clinical trials. Low enrollment is not unique to ALS trials, but it has a bigger impact in ALS because it is an orphan disease. Low enrollment has recently led to changes in trial design and eligibility criteria that allow early enrollment in more efficient trials.

Study retention
The dropout rate in past ALS clinical trials was high, particularly in trials of longer duration or with significant adverse events. With intent-to-treat (ITT) analyses, high dropout rates dilute the observed benefits of the new therapy. For longitudinal endpoints such as ALSFRS, dropout can compromise the validity of the trial because the analysis assumes that patients who remain are no different from those who drop out. For example, in the 12-month trial of topiramate in ALS, 23% of the participants did not complete the trial because of subject’s choice, disease progression, adverse events, or difficulty travelling [20]. Conducting shorter trials with fewer visits and offering home visits or travel compensation may ease the burden on participants and improve study retention.

Diagnostic accuracy
Most ALS clinical trials require a confirmed diagnosis and good respiratory function as part of the eligibility criteria. Amyotrophic lateral sclerosis is mainly a clinical diagnosis that follows the El Escorial criteria, which are based on different degrees of certainty about the diagnosis. The inclusion criteria of most ALS trials require the presence of both UMN and LMN dysfunction in multiple body segments to be confident about the diagnosis of ALS. This reduces the number of trial-eligible people and results in trials being conducted in a subpopulation of people with advanced ALS who have probably missed their therapeutic window. Traditionally, people with ‘possible ALS’ according to the El Escorial criteria were excluded from enrollment in ALS trials, but more recently people with ‘possible ALS’ have been allowed to enroll. A prospective population study reported that 35% of patients with ALS were considered trial ineligible at the time of diagnosis and that 16% of patients die of ALS without being considered trial-eligible based on the El Escorial criteria for ‘possible ALS’ [14]. Some of these patients have a very clear clinical presentation of early ALS but unfortunately do not fulfill the strict diagnostic criteria to enter ALS clinical trials. Development of a sensitive and specific diagnostic biomarker for ALS could support early, accurate diagnosis and enrollment in clinical trials, allowing earlier initiation of potential therapies.

Challenges and controversies

Changing natural history
An improvement in survival during the last decade has been demonstrated in different studies [43, 44]. In addition to FDA approval of riluzole for ALS in the mid-1990s, symptom management of ALS has improved with the introduction of multidisciplinary clinics [45], better hospital care, early use of non-invasive ventilation [46], and nutritional support with gastrostomy. Although survival has improved in the placebo arms of ALS trials, function measured by ALSFRS-R has not changed [10].

Multisystem disorder
Most ALS clinical trials target motor dysfunction and survival as outcome measures, with minimal focus on other features of ALS such as cognitive dysfunction. The recognition of extra-motor involvement in ALS has changed the old definition of ALS as a pure motor neuron disease. Approximately half of people with ALS have frontal executive deficits. The lack of good disease models and outcome measures for cognitive dysfunction in ALS is the major challenge in conducting ALS trials that target cognitive dysfunction.
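The contrast drawn under ‘Sample size’ between survival and functional endpoints can be illustrated with two textbook approximations. The sketch below is not the calculation behind the figures quoted in this chapter, which depend on assumptions the text does not state (baseline event rate, length of follow-up, variability of ALSFRS-R decline, dropout). It uses Schoenfeld's approximation for the number of deaths needed by a log-rank test with 1:1 allocation, and the usual normal approximation for comparing two means. The function names and the example inputs (a hazard ratio of 0.5, a standardized difference of 0.5) are illustrative assumptions.

```python
from math import ceil, log
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the two-sided alpha quantile and the power quantile of N(0, 1)."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def required_events(hazard_ratio: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Total deaths needed for a two-arm log-rank test (Schoenfeld approximation, 1:1 allocation)."""
    return ceil(4 * _z_sum(alpha, power) ** 2 / log(hazard_ratio) ** 2)


def required_n_per_arm(delta: float, sd: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Subjects per arm to detect a mean difference `delta` with common SD `sd` (normal approximation)."""
    return ceil(2 * (_z_sum(alpha, power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the mechanics.
    print(required_events(hazard_ratio=0.5))      # deaths, not participants
    print(required_n_per_arm(delta=0.5, sd=1.0))  # per arm, before any dropout allowance
```

Because a survival analysis is driven by the number of deaths rather than the number enrolled, the required enrollment is roughly the event count divided by the proportion of participants expected to die during follow-up. This is one reason trials with survival as the primary endpoint tend to need more participants and longer follow-up than trials with a functional endpoint.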
What is missing? (symptomatic treatments)
Despite an increased number of candidate drugs and clinical trials in ALS, few trials target management of ALS-related symptoms. A survey of clinicians’ practice in the symptomatic treatment of ALS revealed that consensus on treatments was rare among clinicians [47]. The Quality Standards Subcommittee of the American Academy of Neurology along with the ALS Practice Parameters Task Force issued an evidence-based review of the practice parameter in the care of people with ALS in 2009 [49]. Few evidence-based guidelines were produced for symptomatic treatments and further controlled trials were recommended.

Summary
Amyotrophic lateral sclerosis is an orphan neurodegenerative disorder that has one available treatment with modest impact on survival. In addition to the experience in clinical management and clinical trial design and conduct, an enormous amount of new and promising information about ALS genetics, pathophysiology, and biomarkers has become available. These are very exciting and hopeful times for ALS research and therapy discovery.

References:
1. Lanka V, Cudkowicz M. Therapy development for ALS: Lessons learned and path forward. Amyotroph Lateral Scler 2008; 9: 131–40.
2. Bensimon G, Lacomblez L, and Meininger V. The ALSRSG. A Controlled Trial of Riluzole in Amyotrophic Lateral Sclerosis. N Engl J Med 1994; 330: 585–91.
3. Rosen DR, Siddique T, Patterson D, et al. Mutations in Cu/Zn superoxide dismutase are associated with familial amyotrophic lateral sclerosis. Nature 1993; 362: 59–62.
4. Renton AE, Majounie E, Waite A, et al. A hexanucleotide repeat expansion in C9ORF72 is the cause of chromosome 9p21-linked ALS-FTD. Neuron 2011; 72(2): 257–68.
5. Rothstein JD. Current hypotheses for the underlying biology of amyotrophic lateral sclerosis. Ann Neurol 2009; 65(S1): S3–S9.
6. Cleveland DW and Rothstein JD. From Charcot to Lou Gehrig: Deciphering selective motor neuron death in ALS. Nature Reviews Neuroscience 2001; 2(11): 806–19.
7. Dimos JT, Rodolfa KT, Niakan KK, et al. Induced pluripotent stem cells generated from patients with ALS can be differentiated into motor neurons. Science 2008; 321: 1218–21.
8. Scott S, Kranz JE, Cole J, et al. Design, power, and interpretation of studies in the standard murine model of ALS. Amyotroph Lateral Scler 2008; 9: 4–15.
9. Cedarbaum J, Stambler N, Malta E, et al. The ALSFRS-R: a revised ALS functional rating scale that incorporated assessments of respiratory function. J Neurol Sci 1999; 169: 13–21.
10. Qureshi M, Schoenfeld DA, Paliwal Y, et al. The natural history of ALS is changing: Improved survival. Amyotroph Lateral Scler 2009; 10: 324–31.
11. McGuire V, Longstreth Jr W, Koepsell T, et al. Incidence of amyotrophic lateral sclerosis in three counties in western Washington State. Neurology 1996; 47: 571–3.
12. Kurtzke J and Kurland L. The epidemiology of neurologic disease. In: Joynt R, editor. Clinical Neurology. Philadelphia, J.B. Lippincott. 1989; 1–43.
13. Kurtzke JF. Risk factors in amyotrophic lateral sclerosis. Adv Neurol 1991; 56: 245–70.
14. Traynor BJ, Codd MB, Corr B, et al. Clinical features of amyotrophic lateral sclerosis according to the El Escorial and Airlie House diagnostic criteria: A population-based study. Arch Neurol 2000; 57: 1171–6.
15. Brooks B, Miller R, Swash M, et al. El Escorial revisited: revised criteria for the diagnosis of amyotrophic lateral sclerosis. Amyotroph Lateral Scler Other Motor Neuron Disord 2000; 15: 293–9.
16. Gordon PH, Cheng B, Katz IB, et al. The natural history of primary lateral sclerosis. Neurology 2006; 66: 647–53.
17. Conte A, Media F, Luigetti M, et al. Survival in ALS patients after tracheostomy. 20th Symposium on ALS/MND. Berlin. 2009.
18. Traynor BJ, Zhang H, Shefner JM, et al. Functional outcome measures as clinical trial endpoints in ALS. Neurology 2004; 63: 1933–5.
19. Cudkowicz ME, Shefner JM, Schoenfeld DA, et al. A randomized, placebo-controlled trial of topiramate in amyotrophic lateral sclerosis. Neurology 2003; 61: 456–64.
20. Goonetilleke A, Modarres-Sadeghi H and Guiloff R. Accuracy, reproducibility, and variability of hand-held dynamometry in motor neuron disease. J Neurol Neurosurg Psych 1994; 57: 326–32.
21. Beck M, Giess R, Wurfel W, et al. Comparison of maximal voluntary isometric contraction and Drachman’s hand-held dynamometry in evaluating patients with amyotrophic lateral sclerosis. Muscle Nerve 1999; 22: 1265–70.
22. Raoul C, Estevez A, Nishimune H, et al. Motoneuron death triggered by a specific pathway downstream of Fas. potentiation by ALS-linked SOD1 mutations. Neuron 2002; 35: 1067–83.
23. Shefner J, Rutkove SB, David W, et al. Modified incremental motor unit estimation in a longitudinal natural history study of subjects with ALS. 20th International Symposium on ALS/MND. Berlin. 2009.
24. Kaufmann P, Levy G, Montes J, et al. Excellent inter-rater, intra-rater, and telephone-administered reliability of the ALSFRS-R in a multicenter clinical trial. Amyotrophic Lateral Sclerosis 2007; 8: 42–6.
25. Kasarskis EJ, Dempsey-Hall L, Thompson MM, et al. Rating the severity of ALS by caregivers over the telephone using the ALSFRS-R. Amyotroph Lateral Scler Other Motor Neuron Disord 2005; 6: 50–4.
26. Appel V, Stewart S, Smith G, et al. A rating scale for amyotrophic lateral sclerosis: description and preliminary experience. Ann Neurol 1987; 22: 328–33.
27. Simmons Z, Felgoise SH, Bremer BA, et al. The ALSSQOL: Balancing physical and nonphysical factors in assessing quality of life in ALS. Neurology 2006; 67: 1659–64.
28. Cudkowicz ME, Andres PL, Macdonald SA, et al. Phase 2 study of sodium phenylbutyrate in ALS. Amyotrophic Lateral Sclerosis 2009; 10: 99–106.
29. Turner MR, Kiernan MC, Leigh PN, et al. Biomarkers in amyotrophic lateral sclerosis. Lancet Neurol 2009; 8: 94–109.
30. Finkelstein D and Schoenfeld D. Combining mortality and longitudinal measures in clinical trials. Stat Med 1999; 18: 1341–54.
31. Cudkowicz M, Bozik ME, Ingersoll EW, et al. The effects of dexpramipexole (KNS-760704) in individuals with amyotrophic lateral sclerosis. Nat Med 2011; 17(12): 1652–6.
32. Cudkowicz ME, Katz J, Moore DH, et al. Toward more efficient clinical trials for amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis 2010; 11: 259–65.
33. Schoenfeld DA and Cudkowicz M. Design of phase II ALS clinical trials. Amyotrophic Lateral Sclerosis 2008; 9: 16–23.
34. Petra K, John LPT, Gilberto L, et al. Phase II trial of CoQ10 for ALS finds insufficient evidence to justify phase III. Annals of Neurology 2009; 66: 235–44.
35. Gordon PH, Cheung Y-K, Levin B, et al. A novel, efficient, randomized selection trial comparing combinations of drug therapy for ALS. Amyotrophic Lateral Sclerosis 2008; 9: 212–22.
36. Levy G, Kaufmann P, Buchsbaum R, et al. A two-stage design for a phase II clinical trial of coenzyme Q10 in ALS. Neurology 2006; 66: 660–3.
37. Miller R, Bradley W, Cudkowicz M, et al. Phase II/III randomized trial of TCH346 in patients with ALS. Neurology 2007; 69: 776–84.
38. Gordon PH, Moore DH, Miller RG, et al. Efficacy of minocycline in patients with amyotrophic lateral sclerosis: a phase III randomised trial. Lancet Neurol 2007; 6: 1045–53.
39. Cudkowicz M, Greenblatt D, Shefner J, et al. Ceftriaxone in ALS: results of stages 1 and 2 of an adaptive design safety, pharmacokinetic and efficacy trial. 20th International Symposium on ALS/MND. Berlin. 2009.
40. Lacomblez L, Bensimon G, Leigh P, et al. Dose-ranging study of riluzole in amyotrophic lateral sclerosis. Amyotrophic Lateral Sclerosis/Riluzole Study Group II. Lancet 1996; 347: 1425–31.
41. Aggarwal S and Cudkowicz M. ALS drug development: reflections from the past and a way forward. Neurotherapeutics 2008; 5: 516–27.
42. Bedlack RS, Pastula D, Welsh E, et al. Scrutinizing enrollment in ALS clinical trials: Room for improvement? Amyotroph Lateral Scler 2008; 9: 257–65.
43. Czaplinski A, Yen AA, Simpson EP, et al. Slower disease progression and prolonged survival in contemporary patients with amyotrophic lateral sclerosis: Is the natural history of amyotrophic lateral sclerosis changing? Arch Neurol 2006; 63: 1139–43.
44. Testa D, Lovati R, Ferrarini M, et al. Survival of 793 patients with amyotrophic lateral sclerosis diagnosed over a 28-year period. Amyotroph Lateral Scler Other Motor Neuron Disord 2004; 5: 208–12.
45. Traynor BJ, Alexander M, Corr B, et al. Effect of a multidisciplinary amyotrophic lateral sclerosis (ALS) clinic on ALS survival: a population based study, 1996–2000. J Neurol Neurosurg Psychiatry 2003; 74: 1258–61.
46. Dattwyler RJ, Halperin JJ, Pass H, et al. Ceftriaxone as effective therapy in refractory lyme disease. J Inf Dis 1987; 155: 1322–4.
47. Forshew DA and Bromberg MB. A survey of clinicians’ practice in the symptomatic treatment of ALS. Amyotroph Lateral Scler Other Motor Neuron Disord 2003; 4: 258–63.
48. Miller RG, Jackson CE, Kasarskis EJ, et al. Practice Parameter update: The care of the patient with amyotrophic lateral sclerosis: Drug, nutritional, and respiratory therapies (an evidence-based review): Report of the Quality Standards Subcommittee of the American Academy of Neurology. Neurology 2009; 73: 1218–26.
49. Batshaw M, MacArthur R, and Tuchman M. Alternative pathway therapy for urea cycle disorders: twenty years later. J Pediatr 2001; 138(Suppl 1): S46–55.
50. Leigh N, Groups atNAaEAI-S. The treatment of ALS with recombinant insulin-like growth factor (rhIGF-1): pooled analysis of two clinical trials. Neurology 1997; 1997(Suppl 1): A217–A8.
Section 6: Clinical trials in common neurological disorders
Chapter 25: Epilepsy
John R. Pollard, Susan S. Ellenberg, and Jacqueline A. French

Introduction
Epilepsy is a condition that is characterized by unprovoked recurrent seizures. It has also been called ‘the epilepsies’ since epilepsy in fact comprises a number of syndromes and can be precipitated by a large variety of underlying causes. Seizures associated with epilepsy occur with unpredictable and variable timing and frequency. This variability is central to many of the issues surrounding epilepsy clinical trials and, as with all conditions that occur episodically, creates some complexity. Yet antiepileptic drugs have been an active area of drug development, for several reasons.
The pathophysiology is understood; the preclinical models had good predictive value in treating human disease; and the clinical trials were relatively reliable because seizures were objective events that were easily analyzable. More recently the path to epilepsy drug approval has become somewhat more difficult. This chapter will illustrate the important issues to consider while shepherding a compound through epilepsy clinical trials.
There are several facets of epilepsy that deserve treatment but have not been the main objective of approval studies in the past. These include comorbidities such as depression or dementia, and more fundamental problems like status epilepticus or development of epilepsy (epileptogenesis). This chapter will be limited to treatment of seizures.

Biological basis for interventions
It is generally accepted that a seizure results when excitation in one area of the brain exceeds inhibition. This imbalance results in synchronous depolarization of excitatory neurons (which far outnumber inhibitory ones) and ultimately this activity manifests as clinical seizures [1]. In practical application this theory has led to the development of several useful antiepileptic drugs [2]. However, this extremely useful theory is profoundly difficult to extend to the single-cell level, which makes extensive use of recent advances in genetics, functional genomics, proteomics, and signaling pathways. It is difficult to incorporate the variation in neuroanatomy, the differential effects of the drugs on excitatory versus inhibitory neurons, or the effects on astrocytes and glia. One result is that the mechanisms of action of many antiepileptic drugs are still under active investigation.
Complicating the understanding of the biological basis of epilepsy is the fact that there are a number of epilepsy syndromes that may represent fundamentally different types of dysfunction. Dysregulation of thalamocortical circuits probably underlies generalized onset seizures, which can result either from a primary genetic disorder (idiopathic generalized epilepsies such as juvenile myoclonic epilepsy and absence epilepsy) or from conditions associated with diffuse brain disruption (symptomatic generalized epilepsies such as Lennox-Gastaut syndrome and West syndrome). Partial onset seizures start focally or multifocally and then spread to involve adjacent or well-connected brain regions [1]. Usually this results from a focal anatomic brain disturbance (common etiologies include mesial temporal sclerosis, cortical dysplasia, stroke, and traumatic brain injury) [3].
Approximately one-third of patients with epilepsy are treatment resistant, and it is these patients for whom new drugs are often targeted. The mechanism of treatment resistance is poorly understood. Some theories include an inability of drugs to reach relevant targets due to overexpression of multidrug efflux transporters (such as P-glycoprotein), alteration of the targets in a way that makes common drugs ineffective, and/or development of unique mechanisms of seizure genesis
in treatment resistant patients that are not addressed by standard antiepileptic drugs. Thus, it is possible that drugs most appropriate for newly diagnosed patients may not be relevant for patients with treatment resistant epilepsy, who may need unique approaches [4].
The major known mechanisms of action of antiepileptic drugs can be subdivided into those associated with alteration of voltage and receptor gated ion channels (sodium, potassium, calcium), modulation of neurotransmitter release, and modulation of excitatory and inhibitory neurotransmitters [5]. Some antiepileptic drugs are developed as ‘designer drugs’, with a known mechanism as a target. This was the case for vigabatrin, which acts via irreversible inhibition of the GABA metabolizing enzyme GABA transaminase [6]. In contrast, a number of antiepileptic drugs are identified through high throughput screening (see below) or other means, and in these cases mechanisms may be unknown. Drugs identified this way will undergo testing to try to uncover important mechanisms of action. One or several may be discovered, and it is often difficult to confirm that any of these mechanisms discovered post hoc is the principal mechanism by which the drug exerts its efficacy. Occasionally, further investigation will uncover a completely novel mechanism. This was the case for levetiracetam, which ultimately was found to bind to synaptic vesicle protein 2A and is thought to act through modifying the release of neurotransmitter, a previously unknown mechanism [7].
Understanding the putative mechanism of an antiepileptic drug may be important in determining the clinical population that will benefit from the drug. Some mechanisms of action have been associated with aggravation of some seizure types. Most notably, drugs that act either by fast sodium channel blockade (carbamazepine, phenytoin, oxcarbazepine) or GABA enhancement (vigabatrin, tiagabine) are known to exacerbate certain types of generalized seizures (absence, myoclonus) [8].

Preclinical assessment of antiepileptic drugs
One of the great boons to epilepsy drug development has been the availability of the NIH Anticonvulsant Screening Program, which provides preclinical screening at no cost to companies with potential antiepileptic compounds. It is estimated that over the last 35 years, nearly 32 000 investigational antiepileptic drugs have been evaluated by the program [9].
Screening focuses on high throughput models utilizing electrically or chemically induced seizures in normal animals, such as the maximal electroshock test and the pentylenetetrazol test, which in the past were believed to predict efficacy against tonic-clonic and absence seizures, respectively, and the 6 Hz model [10], which may better target drugs for treatment resistant epilepsy. More recently, these screens have been criticized as they do not truly model the human condition, even though they have been quite predictive of at least some efficacy in human clinical trials [11]. Newer models, such as pilocarpine, kainate, or electrically induced post-status epilepsy models, may be more useful for identifying compounds for treatment-resistant epilepsy, but are not useful for high-throughput screening of new chemical entities. Also gaining favor are genetic spike-wave models such as the WAG/RIJ rodent model [12] and the generalized absence epilepsy rat of Strasbourg, which have a good record for predicting efficacy against generalized onset seizures associated with EEG spike-wave, and also can help predict the likelihood of seizure exacerbation [13].
Preclinical models are also used to evaluate pharmacokinetic parameters and toxicology. Of particular interest in antiepileptic drug development is testing in another non-rodent species to refine the model for estimating dosing in humans [14].

Study populations: Human
Populations for epilepsy clinical trials are usually subdivided based on epilepsy syndrome and patient age. It is difficult to enroll treatment sensitive patients into clinical trials once they are already on an established antiepileptic drug, and therefore the trials are also subdivided into those for newly diagnosed patients and those for patients who continue to have seizures despite therapy (treatment resistant).
The majority of epilepsy trials enroll patients with partial (focal) epilepsy, as defined in the International League Against Epilepsy seizure classification, which has recently been revised [15]. The previous classification is typically used for classifying seizures in antiepileptic drug trials, although this may change [16]. Two-thirds of all epilepsies are partial onset, and these can begin at any age. Seizures associated with partial epilepsies are classified depending on degree of spread, which translates to degree of clinical disruption. If a seizure involves a small amount of brain, and thus no alteration of awareness, it is referred to as a simple
partial seizure. A seizure that involves more brain and results in alteration of awareness is called a complex partial seizure, and a seizure that spreads to involve the whole brain results in a secondarily generalized tonic-clonic seizure, often referred to as a ‘convulsion’. Of these seizure types, the most common in clinical trials is the complex partial seizure, but each patient can have one, two, or all of these seizure types. Partial onset seizures are by far the most common type in drug resistant adult epilepsy patients, and therefore partial onset seizures form the basis of most pivotal trials to demonstrate efficacy and safety for initial drug registration [17]. The International League Against Epilepsy defines drug resistant epilepsy as: ‘failure of adequate trials of two tolerated and appropriately chosen and used [antiepileptic drug] schedules (whether as monotherapies or in combination) to achieve sustained seizure freedom’ [18].
The generalized onset epilepsy types typically begin in childhood or adolescence, and some will remit by adulthood. Seizure types for the more benign genetic (idiopathic) generalized epilepsies include absence, myoclonus, and generalized tonic-clonic (often referred to as primarily generalized, to distinguish them from those that occur in the partial epilepsies, as described above). Which of these seizure types manifests is determined by the specific syndrome. Clinical trials have been performed in patients with seizure types associated with juvenile myoclonic epilepsy, as well as in patients with generalized tonic-clonic seizures [19]. Occasionally, trials for absence seizures have been attempted, but these are difficult because the majority of absence seizures are treatment sensitive [19].
The more devastating symptomatic generalized epilepsies are typically accompanied by multiple seizure types, including all of the types described for partial epilepsy, as well as ‘atypical’ absence, myoclonus, and tonic/atonic (‘drop seizures’). Patients with the so-called Lennox-Gastaut syndrome, which is characterized by multiple seizure types, developmental delay, and a characteristic slow-spike-wave EEG pattern, are usually selected for clinical trials of this epilepsy syndrome. This syndrome begins in childhood, but persists into adulthood [20].
Because of ethical considerations, it is considered unacceptable to leave an epilepsy patient untreated because standard of care therapy is effective in preventing serious injury. In addition, the relatively high efficacy of almost all the medications in treatment naïve patients often results in seizure freedom or rare seizures. These characteristics make new onset epilepsy patients a poor choice for initial registration studies, where the regulatory authorities typically insist on a placebo control. Thus, the default antiepileptic trial for drug approval in the US has become the add-on design enrolling drug resistant adult partial onset epilepsy subjects with a high enough seizure frequency to be able to demonstrate a drug effect.

Methods of delivery
The usual method of medication delivery in epilepsy is oral administration. In approval studies, intravenous administration of these drugs has been restricted to situations in which oral medications cannot be administered [21]. In practice, intravenous administration is often used in emergency rooms to ensure that new onset epilepsy patients have achieved adequate serum concentrations before going home. Thus the availability of intravenous phenytoin has helped ensure its persistent status as one of the most prescribed antiepileptic drugs in the US. In addition, intravenous formulations are critical for treatment of patients with status epilepticus. However, clinical trials in this population are extremely difficult, so regulatory approval for this indication has not been sought.
Other methods of administration have been explored for acute and urgent use, where it is important to have immediate drug exposure. Epileptic seizures have a tendency to cluster, and a new indication, ‘acute repetitive seizures’, was created when rectal diazepam underwent regulatory review [22]. In practice, rectal diazepam may also be used for status epilepticus prior to ambulance arrival in children with very severe epilepsy. Intranasal therapy using midazolam is being tested for use in acute repetitive seizures, and an intramuscular formulation of diazepam is under investigation for the same indication.

Measurement tools
The primary instrument for assessment of efficacy in epilepsy clinical trials is counting of seizure events over time. This is usually done by use of a seizure diary. The subjects work with the site investigator to define each seizure type the subject suffers and assign a letter designation for each. The subjects or their caregivers then record how many of each seizure type the subject had that day. If there were none, a box is checked to indicate this, so that a seizure-free day is not mistaken for a missing day of data.
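The diary logic described above, in which a checked box separates a seizure-free day from a day with no entry, can be sketched as a small data structure. This is an illustrative sketch, not a description of any trial's actual case report form: the `DiaryDay` class, the single-letter seizure codes, and the normalization to a 28-day rate are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DiaryDay:
    """One diary day. `counts` maps a seizure-type letter to that day's count."""
    counts: Dict[str, int] = field(default_factory=dict)
    none_box_checked: bool = False  # subject confirmed a seizure-free day

    def total(self) -> Optional[int]:
        """Return the day's seizure count, or None if the day is missing."""
        if self.counts:
            return sum(self.counts.values())
        if self.none_box_checked:
            return 0
        return None  # nothing recorded and box not checked: missing data


def seizures_per_28_days(days: List[DiaryDay]) -> Optional[float]:
    """Seizure frequency normalized to 28 days, using only days with data."""
    totals = [t for t in (d.total() for d in days) if t is not None]
    if not totals:
        return None
    return 28 * sum(totals) / len(totals)


if __name__ == "__main__":
    diary = [
        DiaryDay(counts={"A": 2, "B": 1}),  # three seizures of two types
        DiaryDay(none_box_checked=True),    # confirmed seizure-free day
        DiaryDay(),                         # missing day, excluded from the rate
        DiaryDay(counts={"A": 1}),
    ]
    print(seizures_per_28_days(diary))
```

Treating an unmarked day as zero seizures would bias the rate downward, which is why the sketch drops a missing day from both the seizure total and the count of observed days.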
There is some controversy over the use of the diary, and the diary restricts the type of patient that can enroll in the study. For example, a patient who is unaware of most of his seizures is ineligible. If patients cannot record their own seizures (either because they are not aware of them, or they have cognitive disturbance), they should have caregivers who can record this information for them. Living conditions should remain stable. If a subject is unaware that he is unaware of some seizures, moving in with a relative in the middle of the study could result in reporting of an artificial increase in seizure frequency. Inaccurate evaluation of each seizure type by the neurologist or imperfect administration by the subject may be a source of significant error in centers that are inexperienced in clinical trials. With studies increasingly involving less experienced centers from around the world, in some cases the diaries may be precise but not accurate. The diary is an imperfect tool, but the best available.

Seizures have a tendency to occur in clusters and flurries, which can occur over the course of minutes, hours, or even days. Seizure clusters represent a problem in epilepsy trials. If the patient cannot count the number of seizures that have occurred in a cluster, there may be issues in analysis. A patient who suffers seizures that cluster and uses frequent rescue medication (benzodiazepines for acute seizures, as described above), which is becoming more common, is also not ideal because if a patient randomized to placebo uses more rescue medication than a patient in the treatment group, there may be less separation between the groups. Finally, an effective new antiepileptic therapy can change a single seizure cluster into many countable seizures, and while this change would benefit patients, it could profoundly impact the measurement of the primary endpoints of a trial. However, frequency of use of rescue medications could be used as an outcome measure, but this has not been widely done in the past.

At present, there is no EEG surrogate available that can replace or even enhance standard three month efficacy trials for partial onset seizures. Recording EEG intermittently to capture inter-ictal activity is not useful, as some antiepileptic drugs will be able to reduce seizure frequency, but will have no effect on inter-ictal activity, and some (carbamazepine is an example) are actually known to increase inter-ictal spikes [23]. Occasionally, when a drug is known to suppress inter-ictal spikes, spike counting can be used as a surrogate measure of drug effect [24]. There is no current means to record chronic EEG for long enough periods to be an effective surrogate marker for seizures. Video-EEG has been used in trials of patients with the Lennox-Gastaut syndrome to teach parents how to recognize seizures, but not as an outcome measure [25]. In a recent study, EEG has also been used as an accompaniment to seizure diaries comparing three treatments for absence seizures. In this case, children had a prolonged outpatient EEG with hyperventilation to confirm control of absence seizures, and to rule out unrecognized or unreported events [26].

Some studies have tried to capture the reduction in seizure severity that often occurs with the administration of an effective antiepileptic drug. In practice this type of analysis is difficult to quantitate and is not often used as a primary outcome variable [27].

There are many quality of life scales used in epilepsy studies. One of the most common is the Quality of Life in Epilepsy-31 (QOLIE-31) [28]. They are very helpful in certain circumstances but somewhat insensitive to the reduction in seizure frequency usually achieved by a new compound, likely reflecting the fact that in these trials, seizure freedom is rare, and thus the clinical impact of treatment may be small, rather than a defect in the instruments (see above ‘Preclinical assessment of antiepileptic drugs’) [29].

An extensive battery of neuropsychological assessment is seldom used in antiepileptic trials. The recent exception is the Stimulation of the Anterior Nucleus of the Thalamus for Epilepsy study of the deep brain stimulator in the anterior nucleus of the thalamus [30].

Treatment emergent adverse event frequency is usually assessed in comparison to placebo. This becomes more complicated in the add-on trials used for registration of new antiepileptic drugs for partial onset seizures. In these trials, patients are typically receiving one to three background antiepileptic drugs, to which either a placebo or the new chemical entity is added. The baseline drugs the patient is on often have side effects which are similar to those of the study drug. The resulting pharmacodynamic impact for side effects can result in a high rate of adverse events or even result in subject drop out, potentially making the primary outcomes more difficult to reach [31]. Often, for this reason, the side effect profile of new antiepileptic drugs is overestimated until monotherapy studies are performed later in development. Dropout rate and reason for dropout is another important outcome measure in all epilepsy therapeutic trials.

There are several different outcomes that have been employed in epilepsy clinical trials through use of the
seizure diary. The seizure frequency is highly variable from patient to patient, and there is also variability from month to month within patient, which makes statistical assessment of outcome complex. For this reason, most chronic epilepsy trials employ a 6 to 12 week baseline to establish seizure frequency, followed by a treatment period. Most commonly, the primary outcome measure is assessment of seizure rate during the treatment period, compared to baseline. This is usually assessed as median percent seizure reduction. Another common outcome measure is the ‘responder rate.’ This is a dichotomous assessment of the number of patients who achieve a certain percent of seizure reduction compared to baseline. Commonly, 50% seizure reduction is considered to be clinically meaningful, and thus patients who achieve this are designated as responders [32]. Patients with 75% response may also be reported. Seizure-free rates (100% seizure reduction) are also often reported, but the number of patients achieving this outcome tends to be small, and the reporting has been confounded by use of different definitions of seizure freedom. For example, in some studies, patients who dropped out early but have not had seizures prior to drop out are counted as seizure free. In other studies, patients would need to complete the entire treatment period to be counted [33]. Another complexity is that changes in seizure counts do not obey a normal distribution. This is because the seizure frequency can only be reduced by a maximum of 100%, but can increase by any amount. This is dealt with by normalization using various means, including logarithmic transformation. Bounded functions of seizure rates are often used as well. A common measure is the response ratio, which is the ratio of the difference between the treatment and baseline seizure rates to the sum of these two rates [34].

Monotherapy trials in patients newly diagnosed with epilepsy are usually performed as active control non-inferiority trials. Newly diagnosed patients have a much higher likelihood of seizure freedom, and comparison of the percent of patients that remain seizure free on each randomized therapy is usually used as the outcome measure. Recently, the Committee for Medicinal Products for Human Use (CHMP, European regulatory body) has provided relatively stringent criteria for assessment of seizure freedom in active control monotherapy trials. They suggest that seizure freedom should be compared for at least 6 months, and that the trial be continued for at least 1 year. They also suggest a relative -20% delta for the 95% CI, with 80% power.

Finally, an assessment can specify the overall effectiveness of the drug by assessing whether patients meet certain endpoints over a prespecified period of time. This outcome is used in withdrawal to monotherapy studies in patients with treatment resistant partial onset seizures. In this trial design, a number of exit criteria, which are indicators of worsening of seizures, are prespecified. The number of patients meeting the exit criteria over time is determined. This kind of assessment is also used in a ‘time to n-th seizure design,’ where patients exit the study after they have experienced a predetermined number of seizures. Time to exit has been used in studies of newly diagnosed subjects [35]. This outcome measure is considered to be a composite of efficacy and tolerability. This is said to provide information about drug ‘effectiveness,’ a better assessment of real-life impact, rather than efficacy. The outcome measure has been criticized by some, as drugs that are well tolerated but minimally efficacious can look as effective as drugs that are poorly tolerated but highly efficacious. For more information, refer to Section 2 of this book.

Clinical trial designs

Efficacy

Many drug trials are performed to achieve registration for specific indications by regulatory authorities. Ultimately these clinical trials are designed to demonstrate to the regulatory agencies that a drug is effective and safe. Often an early study is done to aid in dose finding. These are the first studies in epilepsy patients.

Due to the intermittent, unpredictable nature of the clinical manifestations of epilepsy and the various types of epilepsies secondary to variable etiologies, different pathophysiological mechanisms, as well as different clinical and electroencephalographic expressions, there are complexities in assessing efficacy of potentially new antiepileptic drugs in man in a rapid fashion.

One proof of concept study that has been gaining popularity is performed in patients with photosensitive epilepsy. These patients reliably have an epileptiform discharge known as a photoparoxysmal response on their EEG in response to specific frequencies of flashing light. The photoparoxysmal response can be used as a quantitative measure of photosensitivity and therefore epileptogenicity. To date, efficacy of a single
dose of drug in the photoparoxysmal response protocol has been a robust indicator of successful antiepileptic drugs (e.g., levetiracetam and lamotrigine). Combined with blood level monitoring, the model offers information about the time of onset and the duration of the antiepileptic action and side effects [36].

A dose-finding study is typically performed in early development, using different doses of the study drug to try to establish some trend towards efficacy in a dose-dependent manner. These studies are often not powered to detect significant efficacy or to establish a definitive number of subjects experiencing adverse events.

There are alternative designs which have proven useful in the past both for initial demonstration of efficacy and for dose finding. One interesting alternative is a presurgical inpatient study. In this study subjects are enrolled while in an epilepsy monitoring unit. Typically, patients have been weaned off some or all of their background antiepileptic drugs in order to record seizures for surgical localization. Patients are then randomized to test drug or placebo, and finish involvement in the study when a prespecified number of seizures has occurred [37]. Although the design does provide a rapid assessment of drug efficacy in partial seizures, it is an expensive study, and is not considered by the regulatory agencies for drug approval. Ethical issues have also been raised in regards to maintaining patients on placebo alone for prolonged periods solely for the purpose of the trial, as post-ictal psychosis and other adverse events have occurred as a consequence [38].

Definitive proof of efficacy studies usually employ drug resistant partial onset seizure patients with at least three seizures per month. The subjects remain on their typical antiepileptic regimen, usually one to three antiepileptic drugs, in addition to the study drug. The outcomes are measured with a seizure diary.

The regulatory program leading to drug registration usually has approval for an indication as add-on therapy in treatment resistant partial onset seizures as the linchpin. Two adequate and well-controlled trials will need to demonstrate efficacy and safety. These regulatory studies usually enroll drug resistant partial onset seizure patients who are already on one to three antiepileptic drugs, not including a vagus nerve stimulator, and are having at least three or four countable partial seizures per month [39] (See Figure 25.1). After a 6–12 week baseline, patients are randomized to add-on placebo or one of several fixed doses of the test drug. Flexible dose trials are frowned upon by US regulatory agencies. There is typically a 1 to 4 week titration period, followed by a 12 week maintenance period (mandated by the regulatory authorities in Europe and the US to assure a long enough duration of therapy to assess drug effect). At the conclusion of the study, patients are usually offered a long-term maintenance phase. Patients who do not elect to continue on the drug are tapered off over a 1–3 week taper phase. The outcome of seizure reduction is measured with seizure diaries. Often the two definitive pivotal trials to demonstrate efficacy and safety for regulatory bodies are run nearly concurrently, with one based in the US and Canada and one based in Europe, to persuade the two most stringent regulatory agencies. At present, these large studies are run multinationally.

Figure 25.1. Placebo controlled add-on design. All subjects stay on their prescreening antiepileptic drugs. Screening is followed by a baseline phase. Randomization occurs just before drug initiation. In this diagram there are two treatment arms and a placebo add-on. This is followed by an open-label extension in which the dose is determined by the investigator. [Diagram title: Typical phase 3 trial design for add-on therapy. Diagram labels: High dose arm, Low dose arm, Placebo arm; Baseline phase, Treatment phase, Open-label extension; Screening, Randomization, End.]

In addition to approval for partial onset seizures, many drugs are assessed for efficacy in one or more additional seizure types. One design utilizes patients with a very high seizure frequency, those with
Lennox-Gastaut. As noted above, the Lennox-Gastaut syndrome has onset in childhood, and is associated with a high seizure frequency. Some seizure types may be difficult to differentiate from behavioral problems, since the syndrome is associated with developmental delay which can at times be profound. Thus, the study measures both total seizures and frequency of one of the more clear cut seizure types (tonic and atonic seizures) and seizure severity. In other aspects, trial design is similar to that used for randomized placebo-controlled add-on study in partial onset seizures. Recently, the drug rufinamide was evaluated in Lennox-Gastaut syndrome [40].

Most epilepsy efficacy studies are followed by an open-label extension study. In this trial type, all subjects in the efficacy study who tolerated the medication are allowed a chance to try the study medication at a clinician estimated dose. This is used primarily to garner information on long-term use of the drug for safety and for approval [41].

Add-on trials may also be done in patients with idiopathic generalized epilepsy. These trials are difficult, as there are fewer patients who are treatment resistant, and seizures tend to be less frequent than in partial onset seizure syndromes. Typically, these studies will enroll patients with one or more generalized tonic clonic convulsions per month. Other seizure types that occur in idiopathic generalized epilepsy (absence, myoclonus) are allowed and are counted, but are not included in the primary outcome, which is usually % reduction in generalized tonic clonic convulsions. As in the other trials, a 2–3 month baseline will be followed by randomization to adjunctive test drug or placebo. The treatment period is usually 6 months in duration.

By convention, indications are usually granted by regulatory authorities for the primary seizure type that is studied, with an indication of the syndrome where appropriate. Therefore, a successful study as described above would lead to an indication for ‘generalized tonic clonic convulsions associated with idiopathic generalized epilepsy’.

Additional study designs

Trials to assess effectiveness of antiepileptic drugs as monotherapy are complex, primarily because placebo is ethically almost impossible to employ in a patient with active epilepsy. To address this concern, a trial design was introduced which was known as the ‘pseudo-placebo withdrawal to monotherapy study in treatment resistant partial onset seizures’ [37]. In this study design, patients are randomized to treatment with an experimental drug or placebo, after which baseline therapy is withdrawn over 2–8 weeks. A true placebo is not utilized as the comparison, to reduce the likelihood of status epilepticus or secondary generalization. The comparison arm can consist of a minimally effective dose of either the same investigational drug or of any other therapy presumed to be less effective than the test drug. A starting dose of valproic acid (15 mg/kg) has been employed in a number of trials for this purpose. Outcome is assessed in terms of ‘failures’ and ‘completers.’ Failure is determined on the basis of escape criteria, such as doubling of seizure frequency, occurrence of generalized tonic-clonic seizures or increase in seizure severity. If more patients receiving the experimental drug at a therapeutic dose in monotherapy can complete the trial, without fulfilling escape criteria, than patients receiving the less effective comparator in monotherapy, the treatment is considered effective. Over time, concern arose about randomizing patients to a less effective therapy, and most recently, a historical control has been compiled, which uses a meta-analysis of the escape rate from the pseudo-placebo arms of all of the relevant trials, to create a ‘virtual placebo arm’, against which active drugs can be measured [42]. The FDA has agreed to accept trials which use the historical control as the comparator arm for approval of drugs as monotherapy [43]. In contrast, the European regulatory authorities have not accepted this design, and prefer active control non-inferiority trials in newly diagnosed patients (described above).

The time to event design has been mentioned above. This design can be used to study subjects with more rare outcomes of interest, and is being explored as a regulatory endpoint. This design has been used to study seizure clusters, by measuring time to seizure cessation [22]. It could also be used to study a severe seizure type such as primary generalized tonic clonic seizures [44]. Crossover trials are rarely used in epilepsy. The disadvantages of a crossover trial include a much longer duration, risk of patients dropping out early, and, more importantly, a potential unblinding of the trial (see Chapter 10). This is of most concern in a trial which compares placebo to active treatment. Patients may be able to discern a difference in side effects when switching from placebo to drug, or vice versa. There may be carryover effects (that is, long-lasting effects) of treatment, which would impact on the initial portion of the second treatment phase [44]. For these reasons,
the Food and Drug Administration as well as the European Medicines Agency do not favor such trials.

Clinical testing of antiepileptic drugs often considers children separately because of the age-related changes in both brain and overall physiological and biochemical status that occur during childhood along with the age dependency of certain seizure types and epileptic syndromes. Most studies on antiepileptic drugs have considered children to be less than 12 years, and have included those aged 12 years and over in trials designed primarily for adults. However, a recent statement by the European Medicines Agency suggested that focal epilepsy in children is similar enough to its adult counterpart, that ‘the results of efficacy trials performed in adults could to some extent be extrapolated to children provided the dose is established … [46].’

There are a number of severe epilepsy syndromes that occur in infants and young children, such as West syndrome, severe myoclonic epilepsy in infants, and myoclonic astatic epilepsy, which deserve separate trials.

Safety issues

Common side effects of antiepileptic drugs tend to be CNS related (dizziness, drowsiness, diplopia, concentration difficulties) and tend to increase with dose. Behavioral disturbances (irritability, depression, psychosis) are also seen. It may be difficult to determine the true extent of these dose-related adverse events, because most of these trials are add-on. Pharmacodynamic interactions with baseline antiepileptic drugs will tend to amplify apparent toxicity from the new drug. In one study, toxicity developed in 90% of patients who were converted from monotherapy to polytherapy with standard agents [47]. Idiosyncratic side effects, such as hypersensitivity syndromes, pancreatitis, hepatic failure and renal calculi, occur with one or several marketed antiepileptic drugs, and may occur months to years after initiation of the drug. Thus, if they occur over the course of a trial with a new intervention, it may be difficult to determine if they are related to the drug of interest, or the background medication. Patients who are receiving marketed antiepileptic drugs associated with more frequent serious adverse events, such as felbamate (aplastic anemia up to 1 in 3000) and vigabatrin (irreversible visual field defects 30%) are often excluded from participation in trials.

A placebo control is extremely helpful in assessing whether some adverse events are truly related to the intervention being tested, rather than related to the underlying epilepsy. Certain adverse events, such as sudden death, depression, and psychosis, are more common in patients with epilepsy than in the population at large.

One safety issue that is currently challenging those conducting clinical trials is the recent FDA determination that all antiepileptic drugs may cause suicidal thoughts and behaviors [48]. It is likely that screening for suicidal thoughts will be required during future clinical trials. Studies are currently underway to find the best screening tool to identify this effect.

Sudden unexplained death in epilepsy patients is a concern for clinical trials. The rate of sudden unexplained death in epilepsy patients in clinical trials has been estimated at 0.3/100 patient years. In the past, there have been some concerns that certain drugs increase the rates of sudden unexplained death in epilepsy patients, and this may be difficult to confirm or refute in studies without a control group, such as long-term extension studies after randomized trials. However, several analyses have indicated that the rate is consistent with the expected rate for patients with frequent uncontrolled seizures [49].

Implementation issues

One of the persistent difficulties in antiepileptic drug development is the selection of the appropriate dose of the drug. There are cost pressures to move a drug quickly through development, but as with all new therapies, it is prudent to proceed only when there is a reasonable assessment of optimal dosing. A selection of a dose that is too low will result in a study that does not meet its endpoints for efficacy. A dose that is too high will suffer from a high drop-out rate. Dose selection is complicated by the fact that there may be a great deal of interindividual variability in drug metabolism, leading to overdosing in some cases and underdosing in others. This is particularly true, as a proportion of patients may be receiving one of the antiepileptic drugs that is hepatic enzyme inducing, which will lead to relatively lower serum concentrations in that subset of patients. Often, dose will not be adjusted to account for these pharmacokinetic interactions. One possible solution was used in a trial of topiramate, in which the drug was titrated to a fixed serum concentration rather than a fixed dose [50]. However, concentration controlled trials are complex and difficult to perform.
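The effect of enzyme-inducing co-medication on serum concentration can be illustrated with a deliberately simplified sketch. It assumes linear, one-compartment kinetics, in which the average steady-state concentration equals bioavailability times dosing rate divided by clearance. The function names, the dose, the clearance value, and the doubling of clearance under induction are hypothetical numbers chosen for illustration, not data from any trial or drug.

```python
def average_steady_state_conc(dose_mg_per_day, clearance_l_per_day, bioavailability=1.0):
    """Average steady-state serum concentration (mg/L) under linear kinetics."""
    return bioavailability * dose_mg_per_day / clearance_l_per_day


def dose_for_target_conc(target_mg_per_l, clearance_l_per_day, bioavailability=1.0):
    """Daily dose (mg) needed to reach a target average concentration."""
    return target_mg_per_l * clearance_l_per_day / bioavailability


# Hypothetical values, for illustration only.
FIXED_DOSE = 300.0       # mg/day given to every subject in a fixed-dose arm
CLEARANCE = 30.0         # L/day without an enzyme inducer
INDUCTION_FACTOR = 2.0   # assumed: inducing co-medication doubles clearance

conc_without_inducer = average_steady_state_conc(FIXED_DOSE, CLEARANCE)
conc_with_inducer = average_steady_state_conc(FIXED_DOSE, CLEARANCE * INDUCTION_FACTOR)

# A concentration-controlled design would instead adjust the dose per subject.
dose_needed_with_inducer = dose_for_target_conc(
    conc_without_inducer, CLEARANCE * INDUCTION_FACTOR
)

print(conc_without_inducer, conc_with_inducer, dose_needed_with_inducer)
```

Under these assumptions the same fixed dose yields half the exposure in the induced subgroup, and matching exposure would require doubling the dose, which is the kind of adjustment a concentration-controlled design makes subject by subject.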
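Because the discussion of placebo response relies on the diary-based efficacy measures defined earlier in this chapter, a short sketch of those measures may help. It computes percent seizure reduction from baseline, the 50% responder rate, and the response ratio (difference between treatment and baseline rates divided by their sum). The patient data are invented for illustration, and the function names are not from any published analysis plan.

```python
from statistics import median


def percent_reduction(baseline_rate, treatment_rate):
    """Percent seizure reduction from baseline; negative values mean worsening."""
    return 100.0 * (baseline_rate - treatment_rate) / baseline_rate


def response_ratio(baseline_rate, treatment_rate):
    """(T - B) / (T + B); bounded between -1 (seizure free) and +1."""
    # Assumes T + B > 0, which holds when a minimum baseline seizure count is required.
    return (treatment_rate - baseline_rate) / (treatment_rate + baseline_rate)


def responder_rate(reductions, threshold=50.0):
    """Fraction of patients whose percent reduction meets the threshold."""
    return sum(r >= threshold for r in reductions) / len(reductions)


# Hypothetical (baseline, treatment) seizure rates per 28 days for five patients.
patients = [(8.0, 2.0), (10.0, 5.0), (6.0, 6.0), (4.0, 12.0), (12.0, 0.0)]

reductions = [percent_reduction(b, t) for b, t in patients]
ratios = [response_ratio(b, t) for b, t in patients]

median_reduction = median(reductions)
rate_50 = responder_rate(reductions)

print(reductions, median_reduction, rate_50, ratios)
```

The fourth patient shows why percent change is skewed: a tripling of seizures is a 200% worsening, whereas improvement can never exceed 100%. The response ratio for the same patient is 0.5, inside the bounded range from -1 to +1.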
Another issue is that the placebo response rate for clinical trials in epilepsy may be increasing. The rate of responders to placebo in epilepsy trials has been variable over time [51]. The factors that contribute include more reliable antiepileptic drug intake [52], regression to the mean (described above), and potential reduction of patient stress when enrolled in a trial and obsessively supervised. There has also been an increase in failure of clinical trials to separate active drug arms from placebo. Two recent large development programs led to failed trials (carisbamate, brivaracetam) [53]. Studies performed in the 1990s were often done at a few very experienced centers that were resource intensive, leading to solid outcome data. More recently, individual centers have been unable to enroll the same number of subjects, probably due to the rising number of previously approved therapies. A patient is unlikely to try a new chemical entity when an already approved one may be available. The result is that less experienced investigators with less intensively screened subjects enroll in studies. These factors tend to lead to errors such as counting events that are not seizures or enrolling subjects with the wrong syndrome, thus narrowing the difference between placebo and treatment arms. Currently there are efforts to improve the quality of outcomes by assessing each investigator’s seizure descriptions. Another proposed solution is to widen the possible patient pool by using new trial designs such as the time to n-th seizure design to reduce patient burden, and hopefully increase enrollment.

Challenges and controversies

It is always difficult to determine what risks are appropriate for patients to be subjected to. Most patients enter trials with investigational drugs because they have failed many standard drugs, and continue to have seizures that impair their quality of life. They may also be influenced by the promise of free medical care or free drug. However, with so many available marketed drugs, patients may be approached to enter clinical trials before they have failed a number of marketed drugs. This issue also arises in trials to obtain a monotherapy approval, which in Europe are typically done in patients with newly diagnosed epilepsy. It may be hard for an investigator to decide how much information should be available about a drug, before it is reasonable to start newly diagnosed patients on it, when they have so many other options.

Another common concern in epilepsy studies is that many patients may have some or considerable cognitive disturbance. Many studies require patients to be capable of signing their own consent. However, it is not common to perform any specific testing to determine whether the patient is truly capable of understanding what is being asked of them.

Trial duration is an issue in epilepsy trials. Ideally, a clinical trial should last for as long as possible, to assess whether a new therapy will be successful over the long term. Unfortunately, epilepsy trials are performed in patients with severe, treatment resistant epilepsy and frequent seizures. It is not benign to maintain these patients on the same therapy for a prolonged period of time without intervening, as is the case in the placebo arm of randomized trials. Moreover, patients are reluctant to be randomized knowing that they will receive no active treatment for an 8–12 week baseline and a 3-month treatment phase. Efforts are underway to consider new trial designs, such as time to n-th seizure, which would allow patients who were doing poorly to exit sooner and yet provide adequate trial duration to assess treatment effect.

As noted above, monotherapy trial design remains extremely controversial. In the US at the present time the accepted trial design is historical control conversion to monotherapy in treatment resistant partial onset seizures. In Europe, the accepted trial design is monotherapy active control non-inferiority study in newly diagnosed patients. Discussions with regulatory agencies continue, to try and harmonize monotherapy trial designs [54].

Many have questioned the generalizability of epilepsy trials that are performed for registration of new drugs. The majority of these trials enroll patients with very frequent treatment resistant seizures, who may not be representative of the bulk of the patients who will ultimately receive the treatment.

References

1. Kandel ER, Schwartz JH, Jessell TM. Principles of Neural Science. 3rd edition. New York: Elsevier. 1991.
2. Porter RJ. Antiepileptic drugs: future development. Epilepsy Res Suppl 1993; 10: 69–77.
3. Herman ST. Epilepsy after brain insult: targeting epileptogenesis. Neurology 2002; 59(Suppl 5): S21–6.
4. Kwan P, Brodie MJ. Refractory epilepsy: mechanisms and solutions. Expert Rev Neurother 2006; 6: 397–406.
5. White HS, Smith MD, Wilcox KS. Mechanisms of action of antiepileptic drugs. Int Rev Neurobiol 2007; 81: 85–110.
6. Jung MJ, Lippert B, Metcalf BW, et al. gamma-Vinyl GABA (4-amino-hex-5-enoic acid), a new selective irreversible inhibitor of GABA-T: effects on brain GABA metabolism in mice. J Neurochem 1977; 29: 797–802.
7. Lynch BA, Lambeng N, Nocka K, et al. The synaptic vesicle protein SV2A is the binding site for the antiepileptic drug levetiracetam. Proc Natl Acad Sci USA 2004; 101: 9861–6.
8. Perucca E, Gram L, Avanzini G, et al. Antiepileptic drugs as a cause of worsening seizures. Epilepsia 1998; 39: 5–17.
9. White HS, Wolf HH, Woodhead JH, et al. The National Institutes of Health Anticonvulsant Drug Development Program: screening for efficacy. Adv Neurol 1998; 76: 29–39.
10. Barton ME, Klein BD, Wolf HH, et al. Pharmacological characterization of the 6 Hz psychomotor seizure model of partial epilepsy. Epilepsy Res 2001; 47: 217–27.
11. Loscher W and Leppik IE. Critical re-evaluation of previous preclinical strategies for the discovery and the development of new antiepileptic drugs. Epilepsy Res 2002; 50: 17–20.
12. van Luijtelaar EL and Coenen AM. Two types of electrocortical paroxysms in an inbred strain of rats. Neurosci Lett 1986; 70: 393–7.
13. Marescaux C, Vergnes M, and Depaulis A. Genetic absence epilepsy in rats from Strasbourg – a review. J Neural Transm 1992; 35(Suppl 1): 37–69.
14. Mager DE, Woo S, and Jusko WJ. Scaling pharmacodynamics from in vitro and preclinical animal studies to humans. Drug Metab Pharmacokinet 2009; 24: 16–24.
15. Berg AT, Berkovic SF, Brodie MJ, et al. Revised terminology and concepts for organization of seizures and epilepsies: report of the ILAE Commission on Classification and Terminology, 2005–2009. Epilepsia 2010; 51: 676–85.
16. Seino M. Classification criteria of epileptic seizures and syndromes. Epilepsy Res 2006; 70 (Suppl 1): S27–33.
17. Kwan P and Brodie MJ. Early identification of refractory epilepsy. N Engl J Med 2000; 342: 314–9.
18. Kwan P, Arzimanoglou A, Berg AT, et al. Definition of drug resistant epilepsy: consensus proposal by the ad hoc Task Force of the ILAE Commission on Therapeutic Strategies. Epilepsia 2009; 51: 1069–77.
19. Bergey GK. Evidence-based treatment of idiopathic generalized epilepsies with new antiepileptic drugs. Epilepsia 2005; 46 (Suppl 9): 161–8.
20. Arzimanoglou A, French J, Blume WT, et al. Lennox-Gastaut syndrome: a consensus approach on diagnosis, assessment, management, and trial methodology. Lancet Neurol 2009; 8: 82–93.
21. Ramael S, Daoust A, Otoul C, et al. Levetiracetam intravenous infusion: a randomized, placebo-controlled safety and pharmacokinetic study. Epilepsia 2006; 47: 1128–35.
22. Dreifuss FE, Rosman NP, Cloyd JC, et al. A comparison of rectal diazepam gel and placebo for acute repetitive seizures. N Engl J Med 1998; 338: 1869–75.
23. Marciani MG, Gigli GL, Stefanini F, et al. Effect of carbamazepine on EEG background activity and on interictal epileptiform abnormalities in focal epilepsy. Int J Neurosci 1993; 70: 107–16.
24. Milligan N, Dhillon S, Oxley J, et al. Absorption of diazepam from the rectum and its effect on interictal spikes in the EEG. Epilepsia 1982; 23: 323–31.
25. Efficacy of felbamate in childhood epileptic encephalopathy (Lennox-Gastaut syndrome). The Felbamate Study Group in Lennox-Gastaut Syndrome. N Engl J Med 1993; 328: 29–33.
26. Glauser TA, Cnaan A, Shinnar S, et al. Ethosuximide, valproic acid, and lamotrigine in childhood absence epilepsy. N Engl J Med 2010; 362: 790–9.
27. Cramer JA. Seizure measurement in clinical trials. J Epilepsy 1998; 11: 256–60.
28. Cramer JA, Perrine K, Devinsky O, et al. Development and cross-cultural translations of a 31-item quality of life in epilepsy inventory. Epilepsia 1998; 39: 81–8.
29. Leone MA, Beghi E, Righini C, et al. Epilepsy and quality of life in adults: a review of instruments. Epilepsy Res 2005; 66: 23–44.
30. Fisher R, Salanova V, Witt T, et al. Electrical stimulation of the anterior nucleus of thalamus for treatment of refractory epilepsy. Epilepsia 2010; 51: 899–908.
31. Cramer JA, Mintzer S, Wheless J, et al. Adverse effects of antiepileptic drugs: a brief overview of important issues. Expert Rev Neurother 2010; 10: 885–91.
32. Ben-Menachem E, Sander JW, Privitera M, et al. Measuring outcomes of treatment with antiepileptic drugs in clinical trials. Epilepsy Behav 2010; 18: 24–30.
33. Gazzola DM, Balcer LJ, and French JA. Seizure-free outcome in randomized add-on trials of the new antiepileptic drugs. Epilepsia 2007; 48: 1303–7.
34. Pledger GW and Sahlroot JT. Alternative analyses for antiepileptic drug trials. Epilepsy Res 1993; 10 (Suppl): 167–74.
35. Brodie MJ, Richens A, and Yuen AW. Double-blind comparison of lamotrigine and carbamazepine in newly diagnosed epilepsy. UK Lamotrigine/Carbamazepine Monotherapy Trial Group. Lancet 1995; 345: 476–9.
36. Kasteleijn-Nolst Trenite DG, Marescaux C, et al. Photosensitive epilepsy: a model to study the effects of antiepileptic drugs. Evaluation of the piracetam analogue, levetiracetam. Epilepsy Res 1996; 25: 225–30.
293
37. Devinsky O, Faught RE, Wilder BJ, et al. Eicacy 46. Guideline on clinical investigation of medicinal
of felbamate monotherapy in patients undergoing products in the treatment of epileptic disorders.
presurgical evaluation of partial seizures. Epilepsy Res EMEA. 2010. http://www.ema.europa.eu/pdfs/human/
1995; 20: 241–6. ewp/056698enrev2.pdf.
38. Ketter TA, Malow BA, Flamini R, et al. Anticonvulsant 47. Schmidt D. Two antiepileptic drugs for intractable
withdrawal-emergent psychopathology. Neurology epilepsy with complex-partial seizures. J Neurol
1994; 44: 55–61. Neurosurg Psychiatry 1982; 45: 1119–24.
39. Wilensky AJ. Protocol design. Epilepsy Res 1993; 10 48. French JA. Obstacles encountered in designing
(Suppl): 107–13. antiepileptic drug trials. Epilepsy Res 1993; 10 (Suppl):
40. Glauser T, Kluger G, Sachdeo R, et al. Ruinamide for 81–9.
generalized seizures associated with Lennox-Gastaut 49. Walczak T. Do antiepileptic drugs play a role in sudden
syndrome. Neurology 2008; 70: 1950–8. unexpected death in epilepsy? Drug Saf 2003; 26: 673–83.
41. he US Gabapentin Study Group.he long-term safety 50. Christensen J, Andreasen F, Poulsen JH, et al.
and eicacy of gabapentin (Neurontin) as add-on Randomized, concentration-controlled trial of topiramate
therapy in drug-resistant partial epilepsy. Epilepsy Res in refractory focal epilepsy. Neurology 2003; 61: 1210–8.
1994; 18: 67–73. 51. Guekht AB, Korczyn AD, Bondareva IB, and Gusev EI.
42. French JA, Wang S, Warnock B, et al. Historical control Placebo responses in randomized trials of antiepileptic
monotherapy design in the treatment of epilepsy. drugs. Epilepsy Behav 2010; 17: 64–9.
Epilepsia 2010; 51: 1936–43. 52. Cramer J, Vachon L, Desforges C, et al. Dose
43. Perucca E. When clinical trials make history: frequency and dose interval compliance with multiple
Demonstrating eicacy of new antiepileptic drugs as antiepileptic medications during a controlled clinical
monotherapy. Epilepsia 2010; 51: 1933–5. trial. Epilepsia 1995; 36: 1111–7.
44. Biton V, Sackellares JC, Vuong A, et al. Double-blind, 53. Sperling MR, Greenspan A, Cramer JA, et al.
placebo-controlled study of lamotrigine in primary Carisbamate as adjunctive treatment of partial onset
generalized tonic-clonic seizures. Neurology 2005; 65: seizures in adults in two randomized, placebo-
1737–43. controlled trials. Epilepsia 2010; 51: 333–43.
45. Richens A. Proof of eicacy trials: cross-over versus 54. French JA and Schachter S. A workshop on antiepileptic
parallel-group. Epilepsy Res 2001; 45: 43–7; discussion drug monotherapy indications. Epilepsia 2002; 43
9–51. (Suppl 10): 3–27.
294
Chapter 26
Insomnia
Michael E. Yurcheshen, Changyong Feng, and J. Todd Arnedt
Overview
Chronic insomnia affects up to 10% of American adults and exacts a major personal and societal burden. Chronic insomnia has been linked to reduced quality of life, increased risk for psychiatric and substance use disorders, and exacerbation of comorbid health conditions [1–3]. The total costs of insomnia to the health care system are highly significant, with one recent study estimating that average direct and indirect costs for younger adults with untreated insomnia were more than $1200 greater than for adults without insomnia [4].
Amongst the neurological conditions outlined in this textbook, insomnia shares its dual objective and subjective nature with other conditions. Current therapy has both a biological and behavioral basis. For the purposes of this chapter, circadian rhythm disorders, which involve a mismatch between the biological timing system and preferred sleep and wake cycles, are considered pathophysiologically separate from insomnia, and will not be discussed.

Biological basis for intervention
Insomnia is a sleep disorder characterized by complaints of difficulty initiating sleep, maintaining sleep, waking too early, or sleep that is chronically experienced as non-restorative or poor in quality. These complaints occur despite adequate opportunity and circumstances for sleep, and individuals attribute some form of daytime impairment (e.g., fatigue, neurocognitive deficits, mood disturbance) to the sleep problems [5].

Primary insomnia
Primary insomnia can be considered a condition of hyperarousal, and is defined as an insomnia disorder that is not directly caused by another medical, sleep, or mental disorder, or by the direct physiological effects of a substance [5]. The current prevailing psychological construct about primary insomnia is the Spielman ‘3P’ model [6]. This model suggests that an individual has ‘predisposing’ factors that may increase individual susceptibility to insomnia. Identified inherent characteristics that are considered predisposing factors include a familial history of light or disrupted sleep and psychological characteristics such as a tendency to worry excessively and overconcern with personal well-being. ‘Precipitating’ factors are triggering events that initiate a bout of insomnia. Some examples of precipitating conditions include physical stressors (e.g., acute illness, pain), psychiatric stressors (clinical depression, mania), or social stressors (either positive or negative). Once insomnia has been initiated, ‘perpetuating’ factors, counterproductive associations and habits, can maintain it over time, even after the original precipitating event has disappeared or has been managed. The perpetuating factors that have received most attention include behavioral strategies to compensate for poor sleep (e.g., napping), efforts to deal with the consequences of insomnia (e.g., excessive caffeine intake), pre-sleep cognitive arousal, and negative sleep-related beliefs and attitudes (e.g., worry about inability to sleep and daytime consequences as a result of sleep loss, unrealistic sleep expectations).

Comorbid insomnia
Comorbid insomnia is more common than primary insomnia. Although there is some debate about the directionality of the relationship, comorbid insomnia is thought to be caused primarily by a concurrent medical, sleep, or psychiatric disorder, or to be the direct result of a substance. Until recently, most clinical trials have focused on primary insomnia; however,
Table 26.1 Neurotransmitters involved in sleep and wakefulness
Transmitter | Wakefulness | NREM sleep | REM sleep | Examples of sleep-related agonists/upregulators | Examples of sleep-related antagonists
Acetylcholine | x | | x | | Tricyclic antidepressants
Monoamines | x | | | Amphetamines |
Histamine | x | | | | Diphenhydramine
Glutamate | x | | | |
Adenosine | | | | |
Serotonin | | x | | |
GABA | | x | | Benzodiazepines, benzodiazepine receptor agonists |
Hypocretin (orexin A) | x | | x | | Under development
recruitment of appropriate subjects with this disorder is often difficult, given its relative rarity compared to comorbid insomnia. The frequency of comorbid insomnia is now becoming a recognized phenomenon in clinical trials planning. Some recent comparative efficacy trials have examined responses to pharmacologic interventions in cohorts with primary vs. comorbid insomnia [7].
Brief mention should be made of some of the more established neural pathways responsible for initiating and maintaining sleep and wakefulness, as they are putative therapeutic targets. Wakefulness, non-REM sleep, and REM sleep are separate but functionally interconnected states, and are modulated by different neurotransmitter systems. Glutamatergic, cholinergic, and monoaminergic pathways ascending from the brainstem serve critical roles in maintenance of wakefulness [8]. By contrast, non-REM sleep regulation relies largely on GABAergic pathways, ascending from a portion of the reticular activating system and descending from the anterior hypothalamus [9]. To date, most of the developed neuropharmacologic agents for insomnia have focused on these pathways, specifically in the form of GABAergic manipulation. Additional neurotransmitter systems contribute to non-REM sleep. For instance, serotonergic pathways are based largely in the midbrain, and like many of the pathways responsible for wakefulness, have ascending cortical and septal projections. Substantial complexity was introduced into the known basic sleep-wake mechanisms in the late 1990s with the discovery of hypocretin (orexin A). These centers primarily localize to the posterior hypothalamus, but have widespread connections, and play a role in REM sleep regulation [10–12]. This neurotransmitter, however, is involved in more than REM sleep, and has an impact on wakefulness as well. Similarly, acetylcholine, a neurotransmitter also involved in wakefulness, also contributes to REM sleep regulation. Coordination between these various states is complicated, and the details of these patterns are emerging.
Some studies of insomnia suggest that disruption of these mechanisms will result in sleep-wake dysregulation. For instance, in animal models, lesions of the ventrolateral preoptic nucleus in the hypothalamus result in a substantial decrease in NREM and REM sleep [13, 14]. For most individuals with insomnia, however, such distinct lesions are not present. There are several convergent areas of research using different techniques that lend support to hyperarousal in insomnia. It is unclear how the psychological and biological state of hyperarousal relates to the 3P model, and how it causes dysregulation of these neural pathways.
With this as background, interventions for insomnia have been non-pharmacologic, pharmacologic, or both [15]. Drawing a distinction between psychological and pharmacologic interventions may prove arbitrary, as both types of interventions may ultimately result in biological change.

Non-pharmacologic interventions
Cognitive behavioral therapy for insomnia (CBT-I) has become the gold standard therapy for primary insomnia and has demonstrated efficacy for comorbid
insomnia [16]. This multimodal intervention incorporates therapeutic interventions targeting behavioral factors (maladaptive sleep habits, irregular sleep scheduling) and cognitive factors (worry, beliefs, apprehension about sleep) that are believed to perpetuate insomnia over time. Other examples of individual non-pharmacologic interventions for insomnia include, but are not limited to, stimulus control, relaxation therapy, paradoxical intention, and biofeedback [16]. Early studies evaluated the efficacy of individual behavioral therapies and CBT-I via in-person individual treatment format, but more recent trials have expanded to effectiveness studies with treatment modalities ranging from group therapy to telephone consultations to internet-delivered CBT-I [17].

Pharmacologic interventions
Pharmacologic agents with hypnotic properties via several different mechanisms have been evaluated in clinical trials. In the past 10–20 years, many of these trials have studied novel drugs targeted to some of the neurological pathways outlined above.
Regarding the aforementioned neural networks responsible for wakefulness, REM sleep, and non-REM sleep, pharmacologic agents generally act either as sleep ‘agonists’, or wakefulness ‘antagonists’. Table 26.1 summarizes some of the known neurotransmitter systems involved with sleep, and examples of neurotherapeutic agents that act at these targets (benzodiazepine receptor agonists, benzodiazepines, hypocretin antagonists, antihistamines, anticholinergics).
The ideal hypnotic would be a safe, effective agent that preserved sleep macro and micro architecture. It would also be free of side effects (dependency, rebound insomnia, residual daytime sleepiness, medication interactions, etc.) while working rapidly and on a known therapeutic target [18].

Goals of intervention
In general, there are two major goals when treating insomnia:
1) Treat for resolution of/improvement in sleep disruption
2) Treat to improve the associated neurocognitive and medical consequences of insomnia
Compared to hypnotics, CBT-I seems to have a more sustained effect, and perhaps an additional benefit of disease modification [19].
Conditions associated with/exacerbated by insomnia include depression, as well as a host of other neurophysiological complaints including altered concentration, energy levels, attention and vigilance, and motivation [20, 21]. Motor vehicle accidents have also been linked to sleepiness, which can be associated with insomnia [22]. Emerging evidence links some insomnia with hypertension [23, 24]. The studies that have linked insomnia to these conditions have been small, and have some methodological limitations, thereby hindering their use as primary outcome measures.

Study populations
As outlined, insomnia can be primary or comorbid. Most early insomnia trials generally focused on subjects with primary insomnia; however, some calls for clinical trials in comorbid insomnia have been sounded [25].
Furthermore, insomnia is, in many ways, a disorder studied in the developed world. Clinical trials for insomnia are generally conducted in populations that have an awareness of the functional impact of sleep loss, as well as the luxury to consider this disorder as a significant concern.

Special populations
It is estimated that up to 90% of insomnia is comorbid, and psychiatric disease and pain are likely the inciting factors in the majority of these cases [26]. Consequently, there is a significant need for exploratory and confirmatory clinical trials for such conditions. The converse is also true, and the impact of insomnia on these inciting conditions is becoming increasingly recognized. This represents an opportunity to conduct clinical trials that evaluate outcomes on the underlying condition, as the concurrent insomnia is addressed. There is some argument to include more detailed criteria for sleep disturbance as part of the operational definition of many of these conditions, especially in psychiatric disease. Sleep disturbances are varied, though, and finding sufficient uniformity in a potential study population can be a barrier to design and recruitment. These types of studies are far smaller and rarer than would be expected when considering the prevalence of comorbid insomnia in these populations [27].
Insomnia is more common amongst women, and, as a result, many clinical trials have a preponderance of female subjects [28]. Perhaps one of the best studied insomnia subpopulations is menopausal women. These
trials can be challenging, in part because there are several contributing factors to the insomnia [29]. Furthermore, although these subjects could certainly be included in standard insomnia trials, there has been substantial interest in alternative agents that are generally not considered hypnotics/soporifics in other populations [30, 31]. Specifically, hormone replacement therapy has been studied as a targeted treatment for menopausal insomnia, often with contradictory results [32–34].
Another condition that warrants mention is paradoxical insomnia (also known as sleep state misperception). This condition is characterized by objectively normal sleep duration, continuity, and architecture in an individual with complaints of gross sleep disturbances [5]. Although misjudgment of sleep time is a feature of most forms of insomnia, patients with paradoxical insomnia have minimal daytime impairments with a grossly disproportionate perception of sleep time compared to objective measurements. This is a challenging patient population to treat, and no clinical trials to date have evaluated the efficacy of pharmacologic or non-pharmacologic therapeutics on this condition.

Properties of measurement tools

Clinical measures
A variety of objective and subjective measures have a role in clinical insomnia trials. Practical considerations of these measures are introduced here, but their application in clinical trials is detailed below (‘Clinical trial designs and analytical methods used in development’).

Subjective measures

Sleep diary
In both pharmacologic and non-pharmacologic clinical trials of insomnia, sleep diaries are often the primary outcome measure of treatment efficacy. One advantage of daily sleep diaries is the ability to evaluate sleep over days to weeks, providing a more complete picture than polysomnography or other short-term measures. The information is typically averaged over the assessment period, with calculation of key sleep parameters: sleep onset latency (time to fall asleep for the first time), number of awakenings during the night, wakefulness after sleep onset, total sleep time, sleep efficiency (total sleep time/time in bed × 100), and sleep quality. Investigators often select primary and secondary sleep diary outcomes based on the patient selection criteria (e.g., sleep onset vs. maintenance insomnia) or putative mechanisms of the treatment under evaluation. Sleep diaries can be used as stand-alone tools, or as an adjunct to objective measures such as actigraphy.

Questionnaires
Several questionnaires are utilized in clinical trials. These include the Insomnia Severity Index (ISI), which measures the characteristics and severity of the condition, and the Pittsburgh Sleep Quality Index (PSQI), which is a measure of general sleep disruption [35–37]. The Women’s Health Initiative Insomnia Rating Scale (WHIIRS) is used in appropriate populations [38, 39]. Depending on the protocol, measures of daytime sleepiness, such as the Epworth Sleepiness Scale (ESS) or Stanford Sleepiness Scale (SSS), can be appropriate [40, 41].

Collateral clinical and neurocognitive measures
Other subjective measurements are available to measure the impact of insomnia or sleep deprivation on performance. The range of these measures is wide, and will vary depending on the individual protocol and what it seeks to measure. These tests often include psychomotor vigilance tasks. These occasionally also take the form of objective measurements. For instance, the ‘steer clear’ test, a driving simulator that measures surrogates of driving performance, has been used in clinical trials in insomnia and sleep deprivation [42].

Objective measures

Polysomnography
There are several objective tests that are relevant in insomnia trials. Of these, polysomnography (PSG) remains the gold standard. It remains the only reliable method by which sleep stages can be detected, and is the most accurate measure of sleep continuity variables (sleep latency, sleep efficiency, wake after sleep onset, number of arousals) that are often important in these trials. More invasive measures can be taken during PSG monitoring (for instance, blood pressure readings or long-line blood draws). This said, PSG has significant limitations as a research tool. In-lab studies with type 1 devices (as opposed to type 2, 3, or 4 devices that are considered ambulatory) are sometimes prohibitively expensive to use in research protocols [43]. In addition, polysomnography is subject to certain systemic artifacts, including ‘first night effect’ where sleep architecture can be disrupted simply by virtue of being observed in a foreign environment with surface instrumentation.
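The sleep diary calculations described under ‘Sleep diary’ (sleep onset latency, wakefulness after sleep onset, total sleep time, and sleep efficiency as total sleep time/time in bed × 100, each averaged over the assessment period) can be illustrated with a minimal Python sketch. The class, field, and function names are hypothetical, and deriving total sleep time as time in bed minus reported wake time is an assumption; diary instruments differ in how they record wakefulness at the end of the night.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DiaryNight:
    """One night of self-reported diary data; all durations are in minutes."""
    time_in_bed: float             # lights out to final rising
    sleep_onset_latency: float     # time to fall asleep for the first time
    wake_after_sleep_onset: float  # total wakefulness after first falling asleep
    awakenings: int                # number of awakenings during the night

    @property
    def total_sleep_time(self) -> float:
        # Assumption: TST is time in bed minus all reported wake time.
        return (self.time_in_bed
                - self.sleep_onset_latency
                - self.wake_after_sleep_onset)

    @property
    def sleep_efficiency(self) -> float:
        # Sleep efficiency as defined in the text: TST / time in bed x 100.
        return 100.0 * self.total_sleep_time / self.time_in_bed


def summarize(nights):
    """Average each diary parameter over the assessment period."""
    return {
        "SOL": mean(n.sleep_onset_latency for n in nights),
        "NAW": mean(n.awakenings for n in nights),
        "WASO": mean(n.wake_after_sleep_onset for n in nights),
        "TST": mean(n.total_sleep_time for n in nights),
        "SE": mean(n.sleep_efficiency for n in nights),
    }


if __name__ == "__main__":
    # Hypothetical two-night diary: 8 hours in bed on each night.
    week = [DiaryNight(480, 30, 60, 2), DiaryNight(480, 60, 60, 3)]
    for name, value in summarize(week).items():
        print(f"{name}: {value:.1f}")
```

Averaging nightly sleep efficiency, as done here, is one possible convention; a protocol could instead divide total sleep time summed over the period by total time in bed, and the two approaches can give slightly different values.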
Table 26.2 Methods of evaluating sleep and insomnia in clinical trials
Method | Examples | Advantages | Disadvantages
Questionnaire | ISI, PSQI, ESS, SSS (a) | Many validated. Simple to complete. Can have validated cut-offs to differentiate normal from pathology. | Each questionnaire has limitations of what it is able to measure. Can suffer from floor effects.
Sleep diaries | | Simple to complete. Prospective. Longitudinal data. | Requires significant subject effort. Adherence can drop over time.
Actigraphy | | Can provide longitudinal, objective data about sleep. | More involved than questionnaires. Some cost involved. Not as reliable as PSG to determine sleep and wakefulness. Does not always correlate with PSG data in insomnia subjects.
Portable polysomnography monitoring | | Less expensive than in-lab polysomnography. Can provide longitudinal data about sleep. | Many devices cannot determine sleep staging.
In-lab monitored polysomnography | | Best method for determining sleep staging. | Expensive. Does not provide longitudinal data about sleep. Subject to ‘first night’ effect.
(a) See text for abbreviations.
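Table 26.2 notes that questionnaires can have validated cut-offs. As a minimal sketch of how such cut-offs might be applied to screening or outcome data, the following Python function scores the Insomnia Severity Index, assuming the standard seven-item version with each item rated 0 to 4 and the commonly cited interpretation bands (0–7, 8–14, 15–21, 22–28). These bands are not given in this chapter, the function and variable names are hypothetical, and a trial protocol should rely on the instrument's own scoring instructions.

```python
# Upper bound of each band (inclusive) and its commonly cited label.
# Assumption: standard 7-item ISI, items rated 0-4, total score 0-28.
ISI_BANDS = [
    (7, "no clinically significant insomnia"),
    (14, "subthreshold insomnia"),
    (21, "clinical insomnia, moderate severity"),
    (28, "clinical insomnia, severe"),
]


def score_isi(item_responses):
    """Return (total score, interpretation band) for one completed ISI."""
    if len(item_responses) != 7:
        raise ValueError("the ISI has 7 items")
    if any(r not in (0, 1, 2, 3, 4) for r in item_responses):
        raise ValueError("each ISI item is rated from 0 to 4")
    total = sum(item_responses)
    for upper_bound, label in ISI_BANDS:
        if total <= upper_bound:
            return total, label


if __name__ == "__main__":
    # Hypothetical responses from one subject at baseline.
    total, band = score_isi([3, 2, 3, 2, 3, 2, 2])
    print(total, band)
```

Keeping the bands in a single table rather than in nested conditionals makes it straightforward to substitute protocol-specific thresholds, for example a single inclusion cut-off chosen by the investigators.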
A run-in night is often utilized in protocols, especially for large, phase 3 trials. During this night, data are usually recorded, but are rarely used as a baseline for analysis. Lastly, polysomnography provides no longitudinal data about sleep. It does create a detailed ‘snapshot’ of an individual night’s sleep, but this is insufficient to judge sleep objectively over weeks or months.

Actigraphy
Actigraphy, another objective measurement tool for sleep, utilizes movement (or more correctly, the lack thereof) as a surrogate for sleep. This wristwatch-sized unit is worn by the subject, sometimes for weeks at a time. The information is then downloaded for recording and analysis. Although actigraphy data are useful to measure sleep over a sustained period of time, and to confirm objectively what is recorded in subjective sleep journals, actigraphy does not permit measurement of sleep stages and is less accurate in measuring sleep parameters (sleep latency, etc.) than polysomnography.

Biomarkers and their relationship to biological targets
Polysomnography and actigraphy are physiologically more precise in detecting and quantifying sleep latency and sleep disruption than clinical interview, sleep diaries, or questionnaire data, and could be considered neurophysiological extensions of clinical measurements that allow improved sensitivity in identifying sleep characteristics. There are spectral analysis data that suggest faster EEG frequencies are present in sleeping individuals with primary insomnia [44–46]. This finding lends some support to a theory of hyperarousal contributing to insomnia; however, not all studies have yielded similar results [47].
Observational studies have identified biomarkers in related sleep disturbances (e.g., melatonin levels and Period (Per) gene secretagogues in circadian rhythm disturbances); however, there are no analogous serum markers identified to date in primary or comorbid insomnia.
Some observational studies have evaluated biomarkers for the sequelae of insomnia. For instance, inflammatory markers have been explored in forced sleep deprivation [48]. It remains to be seen whether Spielman’s 3P model of insomnia will eventually lend itself to the identification of biomarker surrogates.

Clinical trial designs and analytical methods used in development
The studies highlighted below are examples of some recent clinical trials that evaluated non-pharmacologic,
pharmacologic, or dual therapies for insomnia. They utilize a host of study designs, including the gold standard: randomized controlled double-blind clinical trials. Some of the trials are considered to be seminal in the field. Others were selected as examples of trials that had some methodological or biostatistical limitations in order to highlight challenges or advances in the implementation of such studies.

Examples of clinical trials

Non-pharmacologic based trials
Espie et al. conducted a randomized, controlled clinical trial to study the effectiveness of CBT delivered by primary care nurses [49]. The study aimed to evaluate effectiveness of CBT over weeks to months. Patients with chronic insomnia were randomized with equal probability to two treatments: CBT or self-monitoring control (SMC). The SMC patients entered the treatment replication phase, receiving an identical treatment to the CBT group after 6 weeks. The authors concluded that CBT was an effective intervention as evidenced by both its initial superiority over SMC, and by the replication of a similar outcome with deferred treatment.
This is a half crossover study. Since each individual serves as his/her own control, the influence of covariates other than the treatment assignment is reduced. Also, crossover designs are usually statistically efficient, and require fewer subjects than do non-crossover designs. This is closely related to the first point, as the crossover design generally reduces the variation of the pre- and post-treatment difference. This efficiency is an advantage in insomnia trials, where both variability within the condition and recruitment difficulties can be barriers to conducting the trial.
Despite these advantages, both the order of administration of treatment, and carry-over effects between treatments, can confound the estimates of the treatment effect. For these reasons, crossover designs are not ideal for large-scale insomnia trials until carryover effect can be minimized or disproven. See Chapter 10 for a further discussion of crossover designs.
Other studies have evaluated efficacy of different types of psychological interventions for insomnia. In a comparative efficacy trial, Lichstein et al. studied the treatment effect of three psychological treatments (relaxation, sleep compression, and placebo therapy) on older adults with insomnia [50]. Seventy-four participants were randomized to three treatment groups with equal probabilities. Sleep diary data demonstrated efficacy of all methods, although this finding was not demonstrated with PSG.
Although the use of PSG in this trial provides objective data, this trial has a number of shortcomings that are common in the study of psychological interventions. For instance, like the Espie trial, it is difficult to blind subjects for psychological treatment. The trial also uses some statistical methods that are limiting, although concerns are not necessarily specific to insomnia trials.
A handful of clinical trials have been designed to evaluate non-pharmacologic interventions on comorbid insomnia. Currie et al. studied the effect of CBT on participants with insomnia secondary to chronic pain [51]. Fifty-one subjects finished the study. Using self-report measures and actigraphy, a CBT group showed significant improvement at 7 weeks and again at 3 months, as compared to a wait list control group.
The imputation of missing data in this trial was performed using an intention-to-treat principle, an approach that introduces significant uncertainty into the final comparisons and therefore conclusions. Since this publication, increasing emphasis has been placed on subject retention in clinical trials. Furthermore, other statistical methods have been developed for imputation of missing data [52]. The use of a repeated measures analysis of variance to study outcomes is no longer considered ideal in longitudinal studies. For future studies that study insomnia longitudinally, other methods such as the linear mixed-effect model and generalized estimating equation (GEE) methods could be considered [53, 54].

Pharmacologic-based clinical trials
Roth et al. studied the treatment effect of eszopiclone (ESZ) over 1 year. After 6 months of double-blind, randomized, placebo-controlled treatment with eszopiclone, the study was extended for an additional 6 months in an open-label phase [55]. In effect, there are two groups (placebo-ESZ) and (ESZ-ESZ) in this report. The analyses indicated that significant improvement in sleep and daytime function was evident in those switched from double-blind placebo to 6 months of open-label eszopiclone therapy. In addition, improvements were noted and sustained in the ESZ-ESZ group as well. This trial uses a different design in order to gather long-term data, although half of the trial was open label. There are significant limitations to using an open-label design, but this trial highlights the difficulty in designing ethically appropriate studies
that will maximize subject retention while still gathering long-term data about insomnia.
Krystal et al. conducted a randomized, double-blind, placebo-controlled parallel-group multi-center study that aimed to evaluate long-term efficacy and safety of zolpidem in patients with chronic primary insomnia [56]. Patients were randomized at a rate of 2:1 to two groups: treatment (669) and placebo (349). The subjective sleep measures, including sleep onset latency (SOL), total sleep time (TST), number of awakenings (NAW), wake after sleep onset (WASO), quality of sleep (QOS), and next day functioning were assessed daily through the Patient Morning Questionnaire (PMQ) while the Patient’s Global Impression (PGI) and Clinical Global Impression-Improvement scale (CGI-I) were assessed every 4 weeks. The total study lasted 6 months. The study demonstrated that the zolpidem extended-release treatment was statistically superior to the placebo at each time point of assessment for PGI, CGI-I, TST, WASO, QOS, SOL and NAW. The treatment provided sustained and significant improvements in sleep onset and maintenance, and improved next-day concentration and morning sleepiness. The design of the study is the clearest among all clinical trials discussed here. The advantage of a longitudinal design is to study both the time trend of outcome variables, and the effect of covariates (for example, treatment indicator) on those outcome variables in the same model. In the data analysis, the authors compared the outcome variables of two groups at each assessment time point. Although intuitive, this type of analysis should only be used for a very preliminary analysis. Given the relatively large sample size, semiparametric methods for longitudinal studies (such as GEE, linear mixed model, or generalized linear mixed model) can be used to analyze the data efficiently [53, 54, 57].

Dual therapy and comparative efficacy trials
Recently, there has been interest in using pharmacologic and non-pharmacologic treatment in combination to treat insomnia. Hypnotics can produce rapid symptomatic relief, but the results are often not sustained. Conversely, CBT-I is generally intensive, but often results in sustained improvement. A dual approach represents an uncommon opportunity in neurology.
Morin et al. conducted a randomized controlled trial of subjects with persistent insomnia designed to: 1) evaluate the short and long-term effects of CBT-I, singly and combined with zolpidem and 2) compare the efficacy of maintenance strategies in optimizing long-term outcomes [15]. Initially, the subjects were randomized to either CBT-I or dual therapy. After 6 weeks, the CBT cohort was randomly split in two. One half was assigned to no treatment, and the remaining half continued with CBT. Likewise, the dual therapy group was split in two, with one half randomly transferred to CBT-I treatment only. The extended treatment was stopped after 6 months, but the cohorts were studied for an additional 6 months in follow-up. In the short term, both CBT-I and dual therapy showed efficacy; long-term benefits were maintained for the dual therapy turned CBT-I group. Although the randomization procedure and data analysis were complicated, this study explores a much needed treatment strategy, one with little previous data to support it. Furthermore, it represents an evaluation with active treatment comparisons, a rarity in insomnia trials.
Lastly, in a comparative efficacy trial aimed to evaluate active treatments using objective measures, Sivertsen et al. examined short- and long-term clinical efficacy of CBT and pharmacologic treatment in older adults experiencing chronic primary insomnia [19]. This was a randomized, double-blinded, placebo-controlled trial, with subjects randomized to three groups: CBT, sleep medication, and placebo medication. The treatment period was for 6 weeks, with 6 months follow-up for the two active treatment groups. With polysomnography, they found that CBT was more effective than medication over both the short and long term in certain sleep parameters (efficiency, slow-wave sleep, and total wake time). The study was subject to a host of limitations (small sample size, no blinding for CBT group, last value carried forward for missing data), but does address comparative efficacy using objective methods.

General consideration of design and data analysis

Parallel/crossover design
For crossover insomnia trials, it is difficult to determine exactly how long a washout period is required for psychological interventions, such as CBT-I. For instance, in the Espie trial, the placebo group switched over to the CBT-I treatment after 6 weeks [49]. There are also some issues in estimating the treatment effect of CBT-I, specifically due to the delayed start of CBT-I in some subjects. For instance, the CBT-I evaluated vs.
placebo in the first 6 weeks, may be fundamentally different from the CBT-I that is delayed for 6 weeks.
Sample size
Sample size is one of the most important considerations in clinical trials. Some studies have a sample size too small (for example, in the Sivertsen trial) to make a meaningful statistical inference. As an example, some of the studies highlighted above set power at 80%, while others set it at 90% [19].
Length of follow-up
Clinical trials for insomnia have examined both the short- and long-term effect of treatment. Several studies of hypnotics have examined the efficacy and safety (i.e., the risk of dependency) of these agents for periods up to 6 months in duration [55]. Some non-pharmacologic intervention trials have examined treatment periods longer than this. Future studies that use a commonly accepted standard for short- and long-term treatment periods will help to standardize treatments and facilitate comparative efficacy.
Blindness and placebo effect
In clinical trials with medication as the treatment modality, blinding is simple to perform and maintain, for both the patients and researchers. For CBT-I, blinding is somewhat more difficult. Most trials to date have used wait list controls or minimal intervention comparisons, but some trials have used behavioral placebos successfully.
Primary objective/primary outcome (end point)
Ideally each clinical trial should have only one primary objective, and one primary outcome. Since insomnia is a complicated problem, most clinical trials have several primary objectives and primary outcomes.
Standards for efficacy and special safety issues
Significant progress has been made in developing and testing pharmacologic and non-pharmacologic treatments for insomnia; however, variability remains across clinical trials in the assessment of efficacy and safety [58]. Early studies focused nearly exclusively on changes in sleep parameters with treatment, but the importance of including collateral clinical measures and relevant assessments of daytime functioning has become increasingly recognized.
Subjective
Diary data
Efficacy of insomnia interventions is commonly determined by comparing diary means and/or proportions between a treatment and control group before and after treatment using appropriate standard tests of statistical significance. The magnitude of treatment effects is often expressed in terms of effect size using the d-statistic [59]. Results from several meta-analyses of insomnia treatment studies indicate that pharmacologic and non-pharmacologic insomnia therapies produce medium to large effect sizes on key subjective sleep parameters [60, 61].
Indicators of clinical significance go beyond the inferential statistical analysis and enable investigators to predict whether treatment-related changes are likely to produce meaningful improvements in subjects’ daily lives. Although no consensus exists on how to define clinically meaningful improvements, some possible approaches include comparing sleep improvements to normative comparisons (e.g., mean sleep latency ≤30 minutes), using collateral information from significant others and clinicians, and documenting the proportion of responders and remitters to insomnia based on an accepted criterion.
Collateral clinical and neurocognitive measures
A thorough assessment of insomnia treatment efficacy includes administration of collateral clinical measures. In many trials, related conditions include depression (Beck Depression Inventory), anxiety (State-Trait Anxiety Inventory), fatigue (Multidimensional Fatigue Inventory), and quality of life (SF-36).
Although no standards exist for establishing efficacy, CBT-I trials have included clinician- and significant-other ratings of insomnia symptoms and severity, reasoning that an efficacious treatment should evidence changes in sleep and functioning that are noticeable to others [62]. The Clinical Global Impressions Scale, in particular the Improvement subscale, is commonly used in pharmacologic insomnia studies as a secondary measure of treatment efficacy.
Patients with insomnia frequently complain about deficits in cognition, most notably in the domains of attention, concentration, and memory, yet few clinical trials of insomnia have incorporated these measures as outcomes. Because neurocognitive deficits can be mild and selective, the role of neurocognitive tests in insomnia clinical trials has been restricted to quantifying
residual daytime impairment following nighttime hypnotic administration [63].
Objective
Polysomnography
Polysomnography is considered the gold standard for measurement of sleep in hypnotic trials and is also used as a measure of treatment efficacy in behavioral insomnia trials. Because insomnia has a large subjective component, PSG is essentially never the sole measure of efficacy in clinical trials. It is often used to exclude subjects with other sleep disorders, and is increasingly used as a pre-post measure of treatment efficacy. Key sleep continuity parameters, such as total sleep time and sleep efficiency (total sleep time/planned sleep time × 100) are generally measured. Measuring many parameters in this way may substantially increase Type I error rates (depending on sample size); therefore, defining primary and secondary PSG endpoints based on expected mechanisms of treatment is a preferred approach.
Actigraphy
The use of actigraphy as a measure of efficacy is appealing because it seems to balance the benefits of daily sleep diaries (continuous recording over days to weeks in the home environment) with those of overnight PSG (objective measure of sleep). Actigraphy is generally most useful as a secondary and complementary measure of treatment efficacy and in situations when PSG is not practical.
Figure 26.1. Examples of actigraphy units (AW-2, AW-Spectrum, AW-Score). They are intended to be worn on the non-dominant wrist. Many units come with event markers and/or with light sensors. The actigraphs can store several weeks’ worth of wake-sleep information. These data can be downloaded, stored, and analyzed using a PC. Images courtesy of Philips Respironics, Murrysville, PA.
Efficacy in comorbid populations
In comorbid insomnia trials, determination of treatment efficacy is based on changes in both sleep and the accompanying condition. For example, Fava and colleagues found that remission rates for depression were higher in subjects who received 8 weeks of combined fluoxetine and eszopiclone, compared to those who received fluoxetine plus placebo [64]. In this study, the primary endpoint of efficacy was change in psychiatric symptomatology rather than sleep. Primary efficacy outcomes have also been measured in terms of markers for dependency and pain [65, 66].
Safety measures
In pharmaceutical insomnia trials, as with other pharmacologic treatment studies, it is standard to include laboratory studies, vital signs, electrocardiograms, and self-report scales. These measures are included to document side effects and adverse events while studying the experimental and placebo agents, whether related or unrelated to the study treatment. Residual daytime sedation and its consequences on daytime functioning is a unique focus of safety evaluations in insomnia clinical trials. In clinical trials involving hypnotic-dependent individuals, the Clinical Institute Withdrawal Assessment (CIWA) – Benzodiazepines assesses the type and severity of symptoms that may be related to discontinuation of the medication.
Implementation issues
Recruitment
Insomnia is extremely prevalent in the general population, with as many as one in three individuals complaining of persistent insomnia in a given year [67]. Nevertheless, recruitment issues and challenges are common in insomnia clinical trials. Community advertisement can result in samples of primarily non-treatment-seeking participants. These individuals are characteristically different from treatment-seeking individuals in primary care or specialty settings, which may affect the generalizability of findings [68]. The use of prescription or non-prescription sleep agents, or other CNS active medications, is also a frequent exclusion, and the common use of these drugs is a significant barrier to recruitment. As a result, selection bias may be an issue in many trials, since only volunteers who: 1) can tolerate a withdrawal from their hypnotics, 2) desire to discontinue
these agents, or 3) are not currently on sleep agents are likely to volunteer. As a result, this bias can potentially compromise the external validity of the study.
Few clinical trials in insomnia are multi-site studies, resulting in suboptimal samples in terms of geographic and racial/ethnic diversity. Moreover, while insomnia disproportionately affects older rather than young adults, most randomized trials to date have been carried out with middle-aged samples, with notable exceptions [61].
Sample selection
One challenge unique to insomnia clinical trials is the lack of standards for the insomnia diagnosis. Criteria differ across three widely used nosologies, the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition – Text Revision (DSM-IV-TR), the International Classification of Diseases (ICD-9-CM and ICD-10) and the International Classification of Sleep Disorders Second Edition (ICSD-2), with investigators using all three for sample selection. Recent efforts have been made to address the heterogeneity in insomnia definitions across trials by deriving a consensus definition for insomnia research [69]. In addition, frequency, severity, and duration are important dimensions of insomnia, but there has been little agreement about which cutoffs should be used for study inclusion, or how [70].
Duration of treatment
A methodological challenge relevant to both pharmacologic and non-pharmacologic insomnia trials is determining the duration of treatment. The duration of therapy in a particular study is likely the result of a variety of factors, including the specific research questions, known properties of the treatment under investigation, and practical and methodological considerations. Despite evidence that a significant proportion of individuals with chronic insomnia use hypnotic agents for longer than indicated, very few pharmaceutical trials have evaluated nightly or non-nightly use of these agents beyond 1 month, with median treatment duration of just 1 week [71]. More recent medication trials have been enhanced methodologically by the inclusion of placebo lead-ins to evaluate placebo response and lead-outs to evaluate the sustainability of treatment effects after acute therapy [72]. Issues about treatment duration have also been evident in non-pharmacologic insomnia trials [16].
Appropriate control group
Nearly all pharmacologic insomnia trials to date have compared active treatments to pill placebos, yet a more appropriate question involves the relative risks and benefits of pharmacologic therapies with similar or different mechanisms of action. One of the most commonly used control conditions in non-pharmacologic trials is a wait-list condition. As discussed in other chapters in this book, this introduces both ethical considerations and concerns that post-treatment differences cannot be ascribed to the specific treatment provided vs. non-specific factors associated with participating in a research trial. More recent clinical trials have used behavioral placebos against which to compare active CBT-I treatments, allowing for more specific interpretation of observed treatment-related differences [73, 74]. Given the recruitment challenges inherent in insomnia trials, crossover or single-case designs may be reasonable alternatives to randomized controlled parallel trials in certain circumstances, with the caveat noted above (‘Properties of measurement tools’) that washout periods should be reasonably known.
Frequency of assessments
Repeated assessments in randomized clinical trials are crucial for determining efficacy of therapies, but the frequency of these assessments is a topic of significant debate. While the frequency of assessments in a given trial is guided by a number of considerations, insomnia trials should include, at a minimum, baseline assessments to characterize symptom status at study entry, post-treatment assessments to measure treatment efficacy, and some follow-up assessment to ascertain durability of gains following acute discontinuation of treatment. In addition, assessing outcomes during the course of therapy allows for the determination of treatment response trajectories and process-outcome relationships [75].
Subject retention
A primary challenge to any clinical trial is subject retention. More than one-third of active treatment participants and one-half of placebo participants discontinued their respective treatments in the longest controlled trials of a pharmacologic insomnia therapy (6 months) [63, 76]. In general, attrition rates in large-scale randomized behavioral insomnia trials are lower than rates in medication trials. However, attrition rates increase significantly for behavioral trials that are not in-person, such as internet-based insomnia treatment trials [77].
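Several quantities referred to in this chapter reduce to short calculations: sleep efficiency as defined under polysomnography, the d-statistic used to express effect size, the inflation of Type I error when many sleep parameters are each tested, and the enrollment needed to offset attrition. The Python sketch below is illustrative only. The function names and the numbers in the demonstration are hypothetical, the pooled-standard-deviation form of d is one common convention rather than the one any particular trial used, the Type I error formula assumes independent tests, and the enrollment adjustment is the usual simple inflation by the expected dropout proportion.

```python
import math
import statistics
from fractions import Fraction


def sleep_efficiency(total_sleep_min, planned_sleep_min):
    """Sleep efficiency (%) = total sleep time / planned sleep time x 100."""
    if planned_sleep_min <= 0:
        raise ValueError("planned sleep time must be positive")
    return 100.0 * total_sleep_min / planned_sleep_min


def cohens_d(treated, control):
    """d-statistic: difference in group means divided by the pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * statistics.variance(treated)
        + (n2 - 1) * statistics.variance(control)
    ) / (n1 + n2 - 2)
    mean_diff = statistics.mean(treated) - statistics.mean(control)
    return mean_diff / math.sqrt(pooled_var)


def familywise_error(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def enrollment_target(n_completers, attrition_rate):
    """Subjects to enroll so that about n_completers remain at study end."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(n_completers / (1 - attrition_rate))


if __name__ == "__main__":
    # Hypothetical night: 360 minutes asleep out of 480 planned.
    print(sleep_efficiency(360, 480))
    # Hypothetical sleep latencies (minutes) in two small groups.
    print(cohens_d([20, 30, 40], [40, 50, 60]))
    # Seven sleep parameters, each tested at the 5% level.
    print(round(familywise_error(0.05, 7), 3))
    # Attrition proportions quoted above; 60 completers is hypothetical.
    print(enrollment_target(60, Fraction(1, 3)))
    print(enrollment_target(60, Fraction(1, 2)))
```

With the attrition proportions quoted above (one-third on active treatment, one-half on placebo), retaining a hypothetical 60 completers per arm would require enrolling 90 and 120 subjects, respectively. Testing seven parameters at the 5% level gives roughly a 30% chance of at least one spurious finding under the independence assumption, which is the motivation for prespecifying primary and secondary endpoints.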
Future directions
Clinical trials examining insomnia are in their early stages, and there is significant controversy regarding optimal treatment approaches and duration, and the measurement of efficacy outcomes.
For instance, trials using two active drugs or therapies (i.e., study drug vs. active control) are rare; however, comparative effectiveness trials are critical for identifying differentiating factors among subjects that are related to treatment response/non-response. Ideally, these trials would involve drugs or therapies posited to target different mechanisms of disease. Better access to funding outside of industry-sponsored sources could help with this challenge. The few trials evaluating pharmacologic vs. non-pharmacologic interventions have been encouraging, and could serve as a model for this type of trial.
To date, most clinical trials have focused on treatments for primary insomnia. Both comorbid insomnia and special patient populations (such as menopausal insomnia or insomnia in the elderly or in children) deserve unique attention. Given subject variability, it is not clear that the results from previous hypnotic trials can be generalized to these groups, which have some fundamental differences when compared to a general primary insomnia population.
In general, hypnotic trials have evaluated efficacy over short-term periods, weeks to months. Given the chronicity of the condition, trials evaluating long-term use of hypnotics are important and should be prioritized. Trials that examine long-term adherence to therapies, and sustainability of treatment gains, are needed. Perhaps more importantly, the safety profile of these pharmacologic agents, in terms of dependence and other effects, needs to be evaluated, as many patients remain on these drugs for years.
Insomnia is a complicated process, and afflicted individuals are different from one another in a variety of ways. To date, consistent genetic, physiological, and psychological profiles of insomnia have not been identified. Moreover, the characteristics of responders vs. non-responders in clinical trials are understudied. By identifying phenotypes of insomnia, treatment can potentially be more targeted in the future.
Safety outcomes have heretofore generally been excluded in non-pharmacologic treatment trials. It has been assumed that CBT, biofeedback, relaxation therapy, and other techniques have few treatment risks, but the evidence behind this assumption in an insomnia population is lacking. Studies evaluating the safety profile of these types of treatments are needed.
Lastly, cost analysis in insomnia treatment has been largely excluded from clinical trials. A few population-based studies have examined the direct and indirect costs of untreated insomnia [4]. Studies designed to measure economic productivity, offset by costs of the therapy, are lacking. Outcomes should also be examined in terms of utilization of health care resources and treatment satisfaction.
These gaps in knowledge should serve as a basis for future clinical trials in this rapidly expanding field.
References
1. Breslau N, Roth T, Rosenthal L, et al. Sleep disturbance and psychiatric disorders: a longitudinal epidemiological study of young adults. Biol Psychiatry 1996; 39: 411–8.
2. Katz DA and McHorney CA. The relationship between insomnia and health-related quality of life in patients with chronic illness. J Fam Pract 2002; 51: 229–35.
3. Taylor DJ, Mallory LJ, Lichstein KL, et al. Comorbidity of chronic insomnia with medical problems. Sleep 2007; 30: 213–8.
4. Ozminkowski RJ, Wang S, Walsh JK. The direct and indirect costs of untreated insomnia in adults in the United States. Sleep 2007; 30: 263–73.
5. American Academy of Sleep Medicine. International classification of sleep disorders, 2nd edition: Diagnostic and coding manual. Westchester, Illinois, American Academy of Sleep Medicine. 2005.
6. Spielman AJ, Glovinsky P. The varied nature of insomnia. In: Hauri PJ, ed. Case Studies in Insomnia. New York, Plenum Press. 1991; 1–15.
7. Edinger JD, Olsen MK, Stechuchak KM, et al. Cognitive behavioral therapy for patients with primary insomnia or insomnia associated predominantly with mixed psychiatric disorders: A randomized clinical trial. Sleep 2009; 32: 499–510.
8. Jones BE. Toward an understanding of the basic mechanisms of the sleep-waking cycle. Behavior Brain Sci 1978; 1: 495.
9. Jones BE. The organization of central cholinergic systems and their functional importance in sleep-waking states. Prog Brain Res 1993; 98: 61–71.
10. De Lecea L, Kilduff TS, Peyron C, et al. The hypocretins: hypothalamus-specific peptides with neuroexcitatory activity. Proc Natl Acad Sci 1998; 95: 322–7.
11. Gautvik KM, de Lecea L, Gautvik VT, et al. Overview of the most prevalent hypothalamus-specific mRNAs,
as identified by directional tag PCR subtraction. PNAS 1996; 93: 8733–8.
12. Sakurai T, Amemiya A, Ishii M, et al. Orexins and orexin receptors: a family of hypothalamic neuropeptides and G protein-coupled receptors that regulate feeding behavior. Cell 1998; 92: 573–85.
13. Saper CB, Chou TC, and Scammell TE. The sleep switch: hypothalamic control of sleep and wakefulness. Trends Neurosci 2001; 24: 763–71.
14. Saper CB, Scammell TE, and Lu J. Hypothalamic regulation of sleep and circadian rhythms. Nature 2005; 437: 1257–63.
15. Morin CM, Vallieres A, Guay B, et al. Cognitive behavioral therapy, singly and combined with medication, for persistent insomnia: A randomized controlled trial. JAMA 2009; 301: 2005–15.
16. Morin CM, Bootzin RR, Buysse DJ, et al. Psychological and behavioral treatment of insomnia: Update of the recent evidence (1998–2004). Sleep 2006; 29: 1398–414.
17. Ritterband LM, Thorndike FP, Gonder-Frederick LA, et al. Efficacy of an internet-based behavioral intervention for adults with insomnia. Arch Gen Psychiatry 2009; 66: 692–8.
18. Mendelson WB, Roth T, Cassella J, et al. The treatment of chronic insomnia: drug indications, chronic use and abuse liability. Summary of a 2001 New Clinical Drug Evaluation Unit Meeting Symposium. Sleep Med Rev 2004; 8: 7–17.
19. Sivertsen B, Omvik S, Pallesen S, et al. Cognitive behavioral therapy vs zopiclone for treatment of chronic primary insomnia in older adults: A randomized controlled trial. JAMA 2006; 295: 2851–8.
20. Tsuno N, Besset A, and Ritchie K. Sleep and depression. J Clin Psychiatry 2005; 66: 1254–9.
21. Roth T and Ancoli-Israel S. Daytime consequences and correlates of insomnia in the United States: Results of the 1991 National Sleep Foundation Survey. II. Sleep 1999; 22: S354–S8.
22. Powell NB, Schechtman KB, Riley RW, et al. Sleepy driver near-misses may predict accident risks. Sleep 2007; 30: 331–42.
23. Vgontzas AN, Liao D, Bixler EO, et al. Insomnia with objective short sleep duration is associated with a high risk of hypertension. Sleep 2009; 32: 491–7.
24. Lanfranchi PA, Pennestri M-H, Fradette L, et al. Nighttime blood pressure in normotensive subjects with chronic insomnia: Implications for cardiovascular risk. Sleep 2009; 32: 760–6.
25. Stepanski E and Rybarczyk B. Emerging research on the treatment and etiology of secondary or comorbid insomnia. Sleep Med Rev 2006; 10: 7–18.
26. Lichstein K, Gellis LA, Stone K, et al. Primary and secondary insomnia. In: Pandi-Perumal SR, Monti JM, ed. Clinical Pharmacology of Sleep. Berlin: Birkhauser Basel; 2006.
27. Zucker TL, Samuelson KW, Muench F, et al. The effects of respiratory sinus arrhythmia biofeedback on heart rate variability and posttraumatic stress disorder symptoms: a pilot study. Appl Psychophysiol Biofeedback 2009; 34: 135–43.
28. Liljenberg B, Almqvist M, Hetta J, et al. The prevalence of insomnia: the importance of operationally defined criteria. Ann Clin Res 1988; 17: 1–7.
29. Moline M, Broch L, Zak R. Sleep in women across the life cycle from adulthood through menopause. Med Clin N Am 2004; 88: 705–36.
30. Dorsey CM, Lee KA, and Scharf MB. Effect of zolpidem on sleep in women with perimenopausal and postmenopausal insomnia: a 4-week randomized, multicenter, double-blind, placebo-controlled study. Clin Therap 2004; 26: 1578–86.
31. Joffe H, Petrillo L, Viguera A, et al. Eszopiclone improves insomnia and depressive and anxious symptoms in perimenopausal and postmenopausal women with hot flashes: a randomized, double-blinded, placebo-controlled crossover trial. Am J Obstet Gynecol 2010; 202: 171.e1–.e11.
32. Best NR, Rees MP, Barlow DH, et al. Effect of estradiol implant on noradrenergic function and mood in menopausal subjects. Psychoneuroendocrinology 1992; 17: 87–93.
33. Schiff I, Regestein Q, Tulchinsky D, et al. Effects of estrogens on sleep and psychological state of hypogonadal women. JAMA 1979; 242: 2405–7.
34. Pickett CK, Regensteiner JG, Woodward WD, et al. Progestin and estrogen reduce sleep disordered breathing in postmenopausal women. J Appl Physiol 1989; 66: 1656–61.
35. Buysse DJ, Reynolds III CF, Monk TH, et al. The Pittsburgh Sleep Quality Index: A new instrument for psychiatric practice and research. Psychiatry Research 1989; 28: 193–213.
36. Morin CM. Insomnia: Psychological assessment and management. Barlow DH, ed. New York, The Guilford Press, 1993.
37. Bastien CH, Vallières A, and Morin CM. Validation of the Insomnia Severity Index as an outcome measure for insomnia research. Sleep Med 2001; 2: 297–307.
38. Levine DW, Bowen DJ, Kaplan RM, et al. Factor structure and measurement invariance of the Women’s Health Initiative Insomnia Rating Scale. Psychol Assess 2003; 15: 123–36.
39. Levine DW, Kripke DF, Kaplan RM, et al. Reliability and validity of the Women’s Health Initiative Insomnia Rating Scale. Psychol Assess 2003; 15: 137–48.
40. Johns MW. A new method for measuring daytime sleepiness: The Epworth Sleepiness Scale. Sleep 1991; 14: 540–5.
41. Hoddes E, Dement W, and Zarcone V. The development and use of the Stanford Sleepiness Scale (SSS). Psychophysiol 1972; 9: 150.
42. Findley LJ, Fabrizio MJ, Knight H, et al. Driving simulator performance in patients with sleep apnea. Am Rev Respir Dis 1989; 140: 529–30.
43. Standards of Practice Committee of the American Sleep Disorders Association. Practice parameters for the use of portable recording in the assessment of obstructive sleep apnea. Sleep 1994; 17: 372–7.
44. Freeman R. EEG power in sleep onset insomnia. Electroencephal Clin Neurophysiol 1986; 63: 408–13.
45. Merica H, Blois R, and Gaillard JM. Spectral characteristics of sleep EEG in chronic insomnia. Eur J Neurosci 1998; 10: 1826–34.
46. Krystal AD, Edinger JD, Wohlgemuth WK, and Marsh GR. NREM Sleep EEG frequency spectral correlates of sleep complaints in primary insomnia subtypes. Sleep 2002; 25: 630–40.
47. Bastien CH, LeBlanc M, Carrier J, et al. Sleep EEG power spectra, insomnia, and chronic use of benzodiazepines. Sleep 2003; 26: 313–7.
48. Irwin MR, Wang M, Ribeiro D, et al. Sleep loss activates cellular inflammatory signaling. Biol Psychiatry 2008; 64: 538–40.
49. Espie CA, Inglis SJ, Tessier S, et al. The clinical effectiveness of cognitive behaviour therapy for chronic insomnia: implementation and evaluation of a sleep clinic in general medical practice. Behav Res Ther 2001; 39: 45–60.
50. Lichstein KL, Riedel BW, Wilson NM, et al. Relaxation and sleep compression for late-life insomnia: a placebo-controlled trial. J Consult Clin Psychol 2001; 69: 227–39.
51. Currie SR, Wilson KG, Pontefract AJ, et al. Cognitive-behavioral treatment of insomnia secondary to chronic pain. J Consult Clin Psychol 2000; 68: 407–16.
52. Little RJA and Rubin DB. Statistical Analysis with Missing Data, 2nd edition. Hoboken, NJ, John Wiley & Sons, 2002.
53. Laird NM and Ware JH. Random-effects models for longitudinal data. Biometrics 1982; 38: 963–74.
54. Liang KY and Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73: 13–22.
55. Roth T, Walsh JK, Krystal A, et al. An evaluation of the efficacy and safety of eszopiclone over 12 months in patients with chronic primary insomnia. Sleep Med 2005; 6: 487–95.
56. Krystal AD, Erman M, Zammit GK, et al. Long-term efficacy and safety of zolpidem extended-release 12.5 mg, administered 3 to 7 nights per week for 24 weeks, in patients with chronic primary insomnia: A 6-month, randomized, double-blind, placebo-controlled, parallel-group, multicenter study. Sleep 2008; 31: 79–90.
57. McCulloch CE, Searle SR, Neuhaus JM. Generalized, Linear, and Mixed Models, 2nd edition. New York, John Wiley & Sons. 2008.
58. Buysse DJ, Ancoli-Israel S, Edinger JD, et al. Recommendations for a standard research assessment of insomnia. Sleep 2006; 29: 1155–73.
59. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd edition. New York, Academic Press. 1988.
60. Morin CM, Culbert JP, and Schwartz SM. Nonpharmacological interventions for insomnia: a meta-analysis of treatment efficacy. Am J Psychiatry 1994; 151: 1172–80.
61. Irwin MR, Cole JC, and Nicassio PM. Comparative meta-analysis of behavioral interventions for insomnia and their efficacy in middle-aged adults and in older adults 55+ years of age. Health Psychol 2006; 25: 3–14.
62. Morin CM, Colecchi C, Stone J, et al. Behavioral and pharmacological therapies for late-life insomnia: a randomized controlled trial. JAMA 1999; 281: 991–9.
63. Krystal AD, Walsh JK, Laska E, et al. Sustained efficacy of eszopiclone over 6 months of nightly treatment: Results of a randomized, double-blind, placebo-controlled study in adults with chronic insomnia. Sleep 2003; 26: 793–9.
64. Fava M, McCall WV, Krystal A, et al. Eszopiclone co-administered with fluoxetine in patients with insomnia coexisting with major depressive disorder. Biol Psychiatry 2006; 59: 1052–60.
65. Edinger JD, Wohlgemuth WK, Krystal AD, et al. Behavioral insomnia therapy for fibromyalgia patients: A randomized clinical trial. Arch Intern Med 2005; 165: 2527–35.
66. Morin CM, Bastien C, Guay B, et al. Randomized clinical trial of supervised tapering and cognitive behavior therapy to facilitate benzodiazepine discontinuation in older adults with chronic insomnia. Am J Psychiatry 2004; 161: 332–42.
67. Ohayon MM. Epidemiology of insomnia: what we know and what we still need to learn. Sleep Med Rev 2002; 6: 97–111.
68. Davidson JR, Aime A, Ivers H, et al. Characteristics of individuals with insomnia who seek treatment in a clinical setting versus those who volunteer for a randomized controlled trial. Behav Sleep Med 2009; 7: 37–52.
69. Edinger JD, Bonnet MH, Bootzin RR, et al. Derivation of research diagnostic criteria for insomnia: Report of an American Academy of Sleep Medicine Work Group. Sleep 2004; 27: 1567–96.
70. Lichstein KL, Durrence HH, Taylor DJ, et al. Quantitative criteria for insomnia. Behav Res Ther 2003; 41: 427.
71. Nowell PD, Mazumdar S, Buysse DJ, et al. Benzodiazepines and zolpidem for chronic insomnia: a meta-analysis of treatment efficacy. JAMA 1997; 278: 2170–7.
72. McCall WV, D’Agostino R, and Dunn A. A meta-analysis of sleep changes associated with placebo in hypnotic clinical trials. Sleep Med 2003; 4: 57–62.
73. Edinger JD, Wohlgemuth WK, Radtke RA, et al. Cognitive behavioral therapy for treatment of chronic primary insomnia: a randomized controlled trial. JAMA 2001; 285: 1856–64.
74. Manber R, Edinger JD, Gress JL, et al. Cognitive behavioral therapy for insomnia enhances depression outcome in patients with comorbid major depressive disorder and insomnia. Sleep 2008; 31: 489–95.
75. Morin CM. Measuring outcomes in randomized clinical trials of insomnia treatments. Sleep Med Rev 2003; 7: 263–79.
76. Walsh JK, Krystal AD, Amato DA, et al. Nightly treatment of primary insomnia with eszopiclone for six months: Effect on sleep, quality of life, and work limitations. Sleep 2007; 30: 959–68.
77. Ström L, Pettersson R, and Andersson G. Internet-based treatment for insomnia: A controlled evaluation. J Consult Clin Psych 2004; 72: 113–20.
Section 7 Clinical trial planning and implementation
Chapter 27
Clinical trial planning: An academic and industry perspective

Cornelia L. Kamp and Jean-Michel Germain
Clinical trial planning overview
Implementing large-scale clinical trials is a logistical challenge for both academic investigators and companies. The purpose of this chapter is to describe the planning process for a clinical trial. Successful clinical trials rely on scientific, clinical, and operational excellence. This requires not only the optimal protocol or study design, but also the appropriate experience and expertise in project planning and management. Much of the battle is won or lost in the planning stages when risks can be assessed and mitigated in advance. Table 27.1 provides an overview of key clinical trial activities that need to be planned ahead and managed throughout the trial. In this chapter, we use the example of large, global studies, but the basic principles in planning a trial are the same regardless of trial size.
Project planning
Project and trial management
The study or project team’s primary responsibility is to deliver clinical trials that meet the clinical study plan objectives. A clear definition of roles and responsibility among the team members and effective processes are necessary to successfully manage the multiple steps associated with study planning, initiation, implementation, publication, and ultimate closeout. Here, we will primarily focus on reviewing key roles of the sponsor or coordinating center (e.g., academic coordinating center, contract/clinical research organization (CRO) or sponsor) and the site team.
Study team members
Running multi-center clinical trials requires a matrix team approach across multiple organizations/institutions and numerous individuals within each entity. A sample clinical trial team is included in Figure 27.1. These may vary across different settings and the figure is merely an illustration of the typical study team components including:
• Steering Committee (SC): primary responsibility is for the scientific and clinical conduct of the study, typically used in investigator-initiated studies conducted in academic medical centers
• Sponsor: typically the company or investigator with overall responsibility for the trial
• Operational Team: with the Project Manager as the team leader
• Enrolling Site Team
• Data Safety Monitoring Board (DSMB), or Safety Monitoring Committee (SMC)
• Independent Medical Monitor: individual charged with reviewing day-to-day safety of randomized subjects and answering site questions about inclusion/exclusion criteria
• Vendor Teams which include central laboratory, primary and secondary drug packaging/labeling/distribution, central ECG, electronic diaries, electronic patient reported outcomes (ePROs), etc.
• Endpoint Adjudication Committee (EAC)
• Others as may be required by unique trial designs.
A large phase 3, global trial may have hundreds of study team members, each contributing a unique component to the success of the trial, while smaller proof-of-concept or single site studies will have a more manageable team size of ~10–20 individuals.
Most study management models in the pharmaceutical industry rely on a study or project manager (PM). This role may vary from one company to another but responsibilities remain very similar. The PM is primarily accountable for the execution of the study, leading the study team, and coordinating the various
Table 27.1 Clinical trial overview
Study phases (columns): Study planning; Study start-up; Study maintenance; Study analysis/reporting
Planning projects/studies: Study Planning/Tracking/Communication
Project/study documents: Synopsis/Protocol/Amendment; ICF/ICF Amendment; Advertising; Hold/Early Termination; Investigators Brochure/Annual Update
Service providers: CRO & Vendors Selection; CRO & Vendors Agreements; Budgets & Payments
Investigational product(s): Manufacturing/Packaging/Labeling/Stability Program/Management of Expiry Dates/Distribution; Accountability/Reconciliation
Specimens: Specimen Planning; Specimen Collection and Tracking; Specimen Analysis
Regulatory affairs: IND/CTA/CTX/CTN/IMPD; Safety Reporting
Investigational sites: Sites Identification/Selection; Sites Qualification; Confidentiality Agreements/Clinical Study Agreements; Financial Disclosure; Study Training; Enrolment Planning & Tracking; Enrolment; Development of Monitoring Plan; Monitoring
Data management: CRF Development; Database set-up; Data Collection/Editing/Review/Data Monitoring Committee; Data Reporting; Statistical Analysis Plan/Study Programming/Interim Analysis; Data Analysis; Randomization Code; Code Release
Documents management: Study Documents Translation/Trial Master Files Maintenance
QA: Audits & Inspections/Inspection Readiness
Project communication: Study Registration/Publication/Scientific Communication/Meetings
Compliance: Compliance Management/Training
Sponsor: Pharmaceutical; Biotech; NIH; FDA; DOD; Foundation/combination/other
Regulatory authorities
Steering committee (SC): PI; Co-PI; Biostatistician; 2-4 disease experts; Lay person with disease of interest; PM (ex-officio)
Project team: Project manager (team leader); Database manager; Information analyst; Medical monitor; Clinical scientist; Clinical pharmacist; Statistical programmer; Regulatory; Monitors; Finance/administration; Administrative support
Endpoint adjudication committee (EAC)
Data safety monitoring board (DSMB): Chair; 1-2 experts in area of safety concern; 2-4 disease experts; Biostatistician
Enrolling sites: Primary investigator; Site coordinator; IRB/ethics board (local or central); Contracts office
Vendors: Central laboratory; Central ECGs; ePRO; Drug/device manufacturer; Drug/device packager/distributor; EDC, IWRS/IVRS
Figure 27.1. Sample clinical trial study team.
functional areas (i.e., biostatistics, regulatory, data management, monitoring groups, vendors etc.) as well as external partners, committees (SC, DSMB, EAC) and providers of services such as central laboratory, ECG provider, or drug/device manufacturer, to achieve study objectives and milestones. Other responsibilities include risk analysis and management, adherence to the approved budget, as well as the optimization of operational efficiency. In essence the PM is tasked with ensuring the study is completed on time, with high quality and within budget.
The site manager (SM) or clinical research associate (CRA) is generally the main point of contact for investigational sites. The SM’s primary responsibility is executing the monitoring plan, and ensuring sites comply with Good Clinical Practice (GCP) and International Conference on Harmonisation (ICH), Code of Federal Regulations (21 CFR Part 11) and other regional or local regulatory requirements, as applicable. The CRA is responsible for site initiation, collection of regulatory documents, and monitoring as well as for overseeing the study progress and delivering operational and protocol training to site personnel.
Site organization relies on multiple specialties and areas of expertise, but the primary site team roles are those of the principal investigator (PI) and site coordinator. Since the complexity of studies has significantly increased over the last several years, sponsors have been looking for efficient, well-qualified, and well-trained investigative teams, which have technical, organizational, and administrative skills [1, 2]. The PI is responsible for the overall conduct of the study at his/her site. This includes overseeing the study progress, patient selection and safety, as well as compliance with the federal, state, and local regulatory requirements. The investigator may delegate his/her responsibilities
to other qualified members of the team via the use of a delegation of authority log, but ultimately remains the sole person responsible for the overall conduct of the study and site [3].
The site coordinator is typically responsible for coordinating clinical activities at the site level, managing standard operating procedures (SOPs), site personnel training records, contacts with CRAs or companies subcontracted by the sponsor such as central laboratories, interactive voice/web response system, or data management. This position relies on excellent organization and communication skills and is generally held by a study nurse with established experience in clinical trials.
Communication strategy
Efficient communication across the entire study team is a requisite for successful execution of clinical trials (Figure 27.1). Site personnel will have to communicate and collaborate with various internal and external partners. These partners may be at diverse locations around the world, and language and cultural barriers may present a challenge for communications.
Because the management of clinical trials relies on activities from individuals sitting in various functional domains, departments, or institutions, the sponsor or CRO project team must ensure that the clinical study plan is aligned with site, vendor, and functional area objectives and that all study team members understand the overall strategy and objectives. This is especially true when conducting multi-center, worldwide clinical trials, where the operational team must ensure that all study steps are implemented consistently across sites and regions. Therefore, regular meetings with key team members, whether by teleconference, web-based, or in person, are required to ensure a common understanding of study priorities, challenges, timelines, and budget.
From a project management perspective, clinical trial sponsors value a single point of accountability with the CRO/vendors, with frequent and regular updates to ensure the appropriate oversight of the execution plan.
Service providers
CRO selection and management
The use of CROs (either academic or for-profit) has significantly increased over the last decade. Whether the sponsor is a large biopharmaceutical company or a small one, the reasons for outsourcing activities are often the same: lack of resources or lack of in-house expertise, and large projects involving many sites in multiple regions of the world [4].
When selecting a CRO, most sponsors prioritize the CRO’s experience in the target therapeutic area or indication, its worldwide experience, or its expertise in a specific region. Other important selection criteria generally include the proposal for the implementation and the execution of the project (resources allocation, sites, and countries), budget, as well as the working relationship between the sponsor and the CRO as both parties will need to develop a close collaboration.
The choice of a CRO should result from a thorough evaluation process performed in collaboration with staff with relevant expertise (e.g., site monitoring). The assessment of the CRO should include the evaluation of written SOPs.
In general, when selecting a CRO or any other vendors, it is critical to put together a Request for Proposal (RFP) document that clearly defines the services being requested. The more specific the RFP is, including anticipated key timeline milestones such as first patient enrolled, enrollment duration, last patient enrolled, number of sites, or number of countries, the more realistic and comprehensive the CRO/vendor proposal will be and the more accurate the budget will be. The industry standard is to submit an RFP to a minimum of two or three CROs/vendors for the requested service. The process of requesting RFPs and evaluating returned proposals often helps identify gaps, inconsistencies, or deficiencies in the protocol.
The contract agreement and the scope of work (SOW), which formalizes and delineates the proposal for services, should clearly include detailed obligations to be assumed by the CRO, vendors etc. as well as detailed descriptions of all activities with associated costs and timelines. Also, to preclude any conflict of interest the agreement should clearly state the responsibilities for each party. As an example, Table 27.2 provides a non-exhaustive list of activities and associated responsibilities. More complex studies may include SOWs with even greater clarity of roles including codes like P = primary creation, R = review, A = approval, O = oversight, etc. as many tasks involve input from multiple groups within the sponsor, CRO, or other vendors. The greater the clarity of the defined roles and responsibilities in the planning phase of the study, the greater the likelihood of tasks being completed on time.
Table 27.2 Planning and implementation scope of work
Columns: Task/description; Investigator; Sponsor (Medical team, Project team, Data management/biostatistics, Clinical pharmacy); CRO
A. Protocol development
Develop, refine protocol  X
Develop informed consent form (study template)  X
Develop informed consent form (site specific)  X X
Develop sites & monitor training material  X X
B. Study preparation
Identify regulatory requirements for protocol submission to IRB/IEC & local regulatory bodies  X
Develop/review procedures for reporting serious adverse events to sponsor, IRB, IEC, local regulatory bodies  X X X
Package, label, study medication  X
Develop database and CRF  X
C. Sites selection and qualification
Develop criteria for sites selection  X X
Identify medically appropriate study sites  X
Conduct sites qualification visits  X
Develop clinical study agreement with sites  X X
Obtain & file site documentation (CVs, financial disclosure forms, 1572 FDA forms) & regulatory documents  X
D. Study start-up
Submit protocol and protocol-related documents to IRB/IEC and local regulatory bodies  X X
Follow-up on submission/approval status  X X
Conduct site initiation visits  X
Distribute CRF to sites  X X
E. Study maintenance
Oversee subject recruitment and enrollment  X X
Safety reporting to IRB/IEC and local regulatory bodies  X X
Ship study medication to sites as required  X X
Run consistency & edit checks, derivations, batch validations  X
Follow-up on queries resolution  X
Monitor sites according to monitoring plan  X
F. Study closure
Ensure resolution of all issues at site level  X X
Conduct study medication reconciliation  X X
Perform site closeout visit  X
Inform IRB/IEC and local authority of trial end  X X
In an attempt to gain efficiency and to avoid repeating the same process for each study, most biopharmaceutical companies have now developed Master Services Agreements (MSAs) with preferred CROs and other vendors or are now entering strategic partnerships and alliances with CROs and other vendors.
Data Management vendor selection/management
Sponsors often contract for data management services. Activities generally range from database and Case Report Form (CRF) design to database lock, including clinical data collection, validation and editing, coding, transfer, and, occasionally, analysis and reporting. As with any other clinical trial related activities, data management is governed by GCP and ICH. Therefore, as part of the vendor selection process the sponsor must ensure that the vendor complies with the existing regulations [5] through adequate SOP documentation for all steps. Critical steps include but are not limited to:
• CRF data entry: Whether the protocol uses paper CRF or electronic-CRF (e-CRF), the investigator remains responsible for the accuracy, completeness, legibility, and timeliness of the data reported in the CRF.
• Data entry: For paper CRF, double-data entry with third party verification is considered standard practice. In case additional clinical data from external vendor(s) (e.g., central laboratory) is expected, the format of the data to be received, as well as the frequency of transfer, is to be agreed to and documented in the Data Management
and Data Validation Plan (DMP) prepared with the data vendor and approved by the sponsor. The clinical data that require derivation, such as calculated scores, must be identified together with the raw data they are derived from. The DMP is a living document that is updated throughout the study as modifications to the data management criteria are agreed to with the sponsor.
• Editing and validation: The clinical trial sponsor and the data vendor must agree on standard validations to be performed on an ongoing basis, with appropriate documentation in the DMP. All alterations to the database (addition, deletion, and update) must have a complete and searchable audit trail, and can only be performed by authorized, qualified, and trained personnel. For protocols using e-CRF, clinical data can only be entered by site personnel.
• Management of queries: To address inconsistencies and potential data entry errors from site personnel, queries or clarifications on data collected in the CRF are sent by data management personnel directly to the investigational sites. The DMP defines upfront all data that will be queried. This can include, for example, range checks for an assessment where each question can only be answered 1–4 as an acceptable response, or logic checks across CRFs, such as querying a subject whose demography is indicated as male on the CRF while a ‘yes’ response for pregnancy is provided on the laboratory CRF.
• Coding and dictionary: The coding of adverse events and medications is required to ensure standardized reporting. For the coding of adverse events, MedDRA (Medical Dictionary for Regulatory Activities) is the standard in use from phase 1 to phase 4 clinical trials [6]. The European Directive 2001/20/EC [7] requires that adverse reaction terms be coded according to MedDRA when reporting suspected unexpected serious adverse reactions (SUSARs) [8].
• Data quality check: As per ICH E6 GCP [9], quality checks on pre-determined samples must be conducted to ensure that validated processes have been used for the transformation of the data.
• Safety Data Management: Serious adverse event (SAE) data entered in the clinical database is to be reconciled vs. the safety database used for the collection of serious adverse events.
• Database lock: The lock of the database occurs after all clinical data, including data from external vendors (e.g., central laboratories, ePROs, Interactive Web/Voice Response System (IWRS/IVRS)), have been received and edited, and after all discrepancies have been resolved. At the time of database lock, all permissions to add, delete, or modify clinical data are revoked. For studies that require a per-protocol interim analysis, the lock of the database can occur while the study still includes active subjects.
Laboratory and ECG vendor selection/management
Central laboratories
The use of central laboratories for routine testing (blood chemistry, hematology, and urine analysis) and other biological specimens is now very common in clinical trials. It developed in the late 1980s from the need for clinical trial sponsors to collect and report data in a more consistent way. In the mid and late 1990s, with the extension of clinical trials in emerging markets (Latin America, Eastern Europe) and more recently in Asia/Pacific and India, the use of central laboratories continued to develop so that it is now well established for the monitoring of safety parameters, and also for the collection of biochemical or biological efficacy data (if applicable).
When outsourcing biological samples testing to a central laboratory, clinical trial sponsors are looking for:
• consistent methodology for both the collection and the testing of biological specimens
• consistent reporting (i.e. consistent normal ranges across sites)
• consistent SOPs
• high quality data
• global services
• efficient shipping, tracking and reporting systems
• contingency plans
• high quality services to sites
• responsiveness.
Unless the central laboratory considered has already been selected as a preferred provider of services by the sponsor, the assessment process should include a review of SOPs and ideally a visit of the facility and ancillary sites, if possible.
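The range and logic checks described above under ‘Management of queries’ can be illustrated with a short sketch. The Python below is a hedged example only: the field names (`sex`, `pregnancy`, `q1`, `q2`), the record layout, and the message wording are hypothetical, and a real DMP would specify the actual checks and their handling.

```python
def range_check(form, field, low=1, high=4):
    """Return a query message if the answer is missing or outside low..high."""
    value = form.get(field)
    if value is None or not (low <= value <= high):
        return f"{field}: value {value!r} is outside the allowed range {low}-{high}"
    return None


def sex_pregnancy_check(demography, laboratory):
    """Return a query message if a subject recorded as male has pregnancy 'yes'."""
    if demography.get("sex") == "male" and laboratory.get("pregnancy") == "yes":
        return "demography CRF says male, laboratory CRF says pregnancy = yes"
    return None


def run_edit_checks(subject):
    """Collect all query messages for one subject's CRF data."""
    queries = []
    for field in subject["assessment"]:
        queries.append(range_check(subject["assessment"], field))
    queries.append(sex_pregnancy_check(subject["demography"], subject["laboratory"]))
    return [q for q in queries if q is not None]


# Hypothetical subject record with one out-of-range answer and one logic conflict.
subject = {
    "demography": {"sex": "male"},
    "laboratory": {"pregnancy": "yes"},
    "assessment": {"q1": 3, "q2": 5},
}
for query in run_edit_checks(subject):
    print(query)
```

Each returned message corresponds to one query that data management would send to the site for clarification.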
The contract agreement and scope of work should clearly identify the parameters which are tested together with the methodology and associated costs, as well as detailed information on timelines associated with the shipment, receipt, analysis, and reporting of data. A primary point of contact and accountability for reporting to the sponsors, as well as a primary point of contact for both site personnel and site monitors, should also be identified.
ECG vendors
Cardiac safety issues identified after New Drug Application (NDA) (21 CFR Part 314) approval, starting in the late 1990s, are some of the reasons that several drugs, such as Sertindole (atypical antipsychotic) and Terfenadine (antihistaminic), have been pulled from the market or required significant labeling changes [10]. As a result of these findings, clinical ECG evaluation of novel agents is now required by the regulators. The recommendation was made to clinical trial sponsors to conduct and analyze clinical studies to assess the potential of a drug to delay cardiac repolarization [11]. The effect on cardiac repolarization is evaluated in a study known as a ‘Thorough QT/QTc Trial’, conducted in healthy volunteers unless the drug cannot be studied in that population due to unacceptable tolerability, as with cytotoxic agents. With additional data collected in phase 3, 12-lead ECG monitoring is a widely used safety measure to identify drug-induced cardiac adverse effects.
Clinical trial sponsors increasingly rely on ECG vendors that provide standardized ECG collection and reading that limit variations and inconsistencies that are frequently observed between investigational sites. The selection process for ECG vendors is not too different from that used when selecting a CRO and should focus on the following specific relevant criteria:
• management of ECG equipment (shipping process, supplies management)
• query process
• transmission timing and process from the site to the ECG vendor
• turnover time between ECG acquisition, central reading and data transfer back to the site and to the sponsor
• training program for site personnel and CRAs, and
• customer service.
Other vendors
Similar to central ECG readers and central laboratories, the use of vendors providing central imaging solutions (MRI, β-CIT, PET, DEXA, etc.) has recently developed, especially in the fields of neurology and oncology. As medical imaging is more and more accepted as a surrogate marker of disease progression, there has been an increased need for more standardization in the acquisition and the review of images. In the context of global clinical trials, the study team must ensure that when contracting this activity to an external partner, enough resources and adequate customer service are provided to sites with a 365/24/7 help desk. This is of critical importance as the vendor is likely to have responsibilities for qualifying and training sites for image acquisition and for developing and providing sites with an image acquisition protocol.
Similar precautions would apply to other technologies such as electronic Patient Reported Outcomes (ePRO), which have been progressively replacing paper-based diaries when collecting data directly from subjects. As the cost associated with such technology is substantial, the study team and vendor must ensure that the technology and the proposed devices are really suitable for use in the study (e.g., multiple country trials, special populations: pediatric, elderly subjects). The most important question that needs to be addressed is whether the subjects can actually use the technology as required. It may be necessary to test the technology in a few patients before using it in a clinical trial.
Other solutions such as IWRS/IVRS are discussed below (‘IWRS/IVRS and IRT used to manage clinical supply inventories’).
Clinical trial budgets
The cost of conducting clinical trials continues to increase. A median phase 3 study of 800 subjects, 50 sites and 2 years from First Subject First Visit (FSFV) to Last Subject Last Visit (LSLV) can cost upwards of $25 million or $36 000/day [12]. The cost of bringing one compound to market over a 12 year period in 2000 dollars was ~$802 million [13]. In short, conducting clinical trials is an expensive undertaking, and appropriately budgeting for a clinical trial, whether a single center proof-of-concept study or a large, multi-center, multi-year global phase 3 trial, is of utmost importance to avoid budget shortfalls, which are common.
Budgeting for clinical trials is an art form given the number and varied types of services required. Given the maturity of the CRO, and associated vendor industry
(e.g. central lab, interactive response technology (IRT), electronic data capture (eDC), ECG providers, ePRO, etc.), most clinical trial services providers have a good understanding of the cost of doing business. What typically creates havoc in most clinical trial budgets are all the unexpected challenges such as the need to add visits, efficacy or safety measures, slow enrollment, and problems with drug availability.
The most important step to creating a realistic budget is to develop a detailed scope of work document clearly delineating all aspects of conducting the trial and the responsible groups (see Table 27.1). The greater the granularity, the more precise the budget can become. Established pharmaceutical, biotech companies, and CROs have historical data that can be applied to the creation of each new budget and also rely on experienced operational functional group leaders to review budgets and provide input on the number and type of labor units required for each activity. The budgeting process is a team effort that requires the review and input of all operational stakeholders. This includes obtaining a minimum of two or three proposals for each type of service that will be outsourced and providing the vendors with the most current protocol and realistic timeline for key milestones (see below ‘Timelines’).
When budgeting for a global trial the budget should also factor in anticipated fluctuations in currency exchange rates. Most vendors stipulate pricing in the currency of the parent company so that the sponsor must consider the impact of currency fluctuations. Multi-year studies also need to take into account inflation and increases in labor rates associated with annual merit increases that occur at most companies.
Components of a clinical trial budget
Most industry budgets are created on a unit activity basis (e.g. (# monitoring visits/site) × (# of sites) × (cost per routine monitoring visit) = total routine monitoring budget). Investigator-initiated studies conducted within academic institutions and funded by NIH, FDA, DOD are often based on a percent effort basis (e.g. Project manager 50% effort in Year 01–02; 35% effort Year 03). Regardless of which method is used, the key is attempting to estimate the total effort required based on a final protocol and schedule of activity. The key components of a typical clinical trial are broken down into the main categories below. A non-inclusive budget driver template can be found in Appendix 27.A. As most of the general trial costs are covered in the template by key category, not all are discussed within the section below; only a few are highlighted.
Investigational medicinal product supply
There are many components to investigational medicinal product (IMP) costs, and budgeting for them requires a full understanding of the supply chain: procurement of the active pharmaceutical ingredient (API), excipients, components (bottles, caps, labels, etc.), manufacturing, analytical testing, dissolution testing, stability program, packaging, labeling, distribution, customs costs, and accountability/returns/destruction. Obtaining detailed information on the various costs from the appropriate suppliers early and accounting for inflation and applicable overage (see below ‘Quantities of IMP to order’) will ensure adequate funds are budgeted for this critical component. If the IMP is still in early development phases, additional funds need to be allocated to formulation development and testing. Additionally, studies that include a comparator(s) will require comparable effort for its procurement, which often comes at considerable cost, and matching placebo may not be readily available if a drug is not in clinical development.
Investigator-initiated studies often try to obtain IMP directly from the manufacturer at no cost. As part of initial grant submissions, investigators may obtain a letter of support from the manufacturer indicating that the manufacturer will provide the IMP. However, most grant funding takes anywhere from 1 to 3 years to obtain, depending on the review cycle, and in that time company priorities, leadership changes, mergers and acquisitions, and the general economy often interfere with these letters of support being upheld. These letters of support are not legally binding. Therefore, investigators often end up scrambling to find funds to cover this critical piece of the study.
Operational team effort
Whether budgeting the time and effort for tasks as percent effort or based on unit activity, the unit activity efforts for the responsibility of each functional area need to be identified in order for the appropriate unit cost and number of units or percent effort to be calculated correctly. Clinical trial budgets typically include costs for project management, data management, medical writing, biostatistics, regulatory, clinical supplies support, meeting planning, investigator training, and administrative activities; the list goes on, depending on the complexity and size of the study. In the budget template provided in Appendix 27.A, costs are broken down by unit of activity within the various functional areas.
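The two budgeting methods described in this section, unit activity and percent effort, reduce to simple arithmetic. The Python sketch below illustrates both under stated assumptions: the 50 sites and the 50%/50%/35% effort pattern come from the examples in the text, whereas the visit count, cost per visit, and annual salary are hypothetical placeholders, and the function names are not taken from any budgeting tool.

```python
def unit_activity_cost(visits_per_site, n_sites, cost_per_visit):
    """(# monitoring visits/site) x (# of sites) x (cost per routine visit)."""
    return visits_per_site * n_sites * cost_per_visit


def percent_effort_cost(annual_salary, effort_percent_by_year):
    """Sum of salary x percent effort over the study years."""
    return sum(annual_salary * pct / 100 for pct in effort_percent_by_year)


# Unit activity basis: 50 sites as in the text's example; 8 visits per site
# and 3000 per visit are assumed values for illustration.
monitoring_budget = unit_activity_cost(visits_per_site=8, n_sites=50, cost_per_visit=3000)

# Percent effort basis: project manager at 50% in Years 01-02 and 35% in
# Year 03, as in the text; the 100000 annual salary is an assumed value.
pm_budget = percent_effort_cost(annual_salary=100_000, effort_percent_by_year=[50, 50, 35])

print(monitoring_budget)
print(pm_budget)
```

Either calculation is only as good as its inputs, which is why the text stresses estimating effort from a final protocol and schedule of activities.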
Committees and consultants
Budgeting for the various committees and consultants required for the conduct of a study, whether a DSMB, adjudication committee, scientific advisory committee, or specific consultants, requires determining the membership, frequency of meetings, and compensation per meeting. The budget should include a contingency for impromptu meetings by the DSMB if safety issues emerge.
Data management and interactive response system
Budgets for electronic data capture systems and paper CRFs do have certain shared features, including the actual database build, hardware/software and maintenance fees, and creation of the Data Management Plan (DMP). For studies involving paper CRFs, the cost for the reproduction of each CRF binder, typically created as 3 part no carbon required (NCR), and the cost for the shipping of completed CRFs and data queries need to be factored in. Likewise, budgeting for an IRT solution includes the hardware/software and maintenance fees, along with transaction fees and 365/24/7 help desk support. Costs for IRT systems vary broadly by provider, so obtaining multiple bids will help determine the scope of the solution required.
Monitoring
The monitoring budget is determined based on the monitoring plan, which describes the planned number and type of visits per site. Generally this will include a pre-study visit [for those sites not previously used by a company or not used over a set period of time (e.g. 2 years)], one site initiation visit, routine monitoring visits, and one closeout visit (see Chapter 28). The duration for each visit type includes time and effort for visit preparation, travel to/from the site, on-site monitoring time, and time for report writing and follow-up. Many companies have metrics on average time for each type of activity and budget accordingly. For example, pre-study visits are allocated at 14 hours, initiation visits at 17 hours, and routine and closeout visits at 24 hours for all associated activities.
The other key component of the monitoring budget is travel, which typically includes: airfare, car rental, hotel, meals, and incidentals. In an effort to reduce the travel budget many companies have moved towards a regional-based monitoring strategy to reduce travel time and airfare, ideally with monitors living within driving distance of most of the sites they monitor.
Vendors
Depending on the protocol requirements, the services of one or more vendors may be required. In most studies, central laboratory services are required for routine safety laboratories. Laboratory fees typically include cost per safety assay (clinical chemistry, hematology, urinalysis and pregnancy tests). Other specialty assays may be required. For pharmacokinetic (PK) studies, budgeting for the PK assays is also required and may require a different vendor. In addition to the actual assay, laboratory budgets also need to factor in kits for blood and urine specimen collection and shipment to the laboratory, associated courier fees, and storage fees, as some samples could be kept for a period of time which exceeds study duration. If frozen specimens are transported, additional costs for dry ice and special shipping packs must also be included.
Other vendors that may be required include central ECG providers, electronic diaries, Holter monitoring services etc.
Site costs
In most cases, site payments are the single largest budget item (~50–70%). The site budget typically includes the following components: a per-subject fee (PSF), IRB/EC fees, a one-time non-reimbursable start-up fee, and other per-occurrence costs.
Per subject fee
The per subject fee (PSF) is based on the protocol schedule of activities and the assessments conducted at each visit. Appendix 27.A provides a template for how the budget would typically be determined. Many companies have access to costing databases that help with determining reimbursement rates for typical procedure types that are often regionally based. In most cases, companies will determine a range for the PSF, recognizing that costs in larger metropolitan areas will generally be greater than costs in small cities and that costs vary from country to country. The PSF should factor in the anticipated screen failure rate, such that these additional costs, typically some percentage of the cost of the screening and baseline visit, are also added to the total budget.
As part of the PSF calculation, costs that are standard of care (SOC) costs, typically covered by insurance, need to be flagged. If companies are reimbursing the SOC costs as part of the PSF then sites need to ensure that processes are in place to prevent billing these costs to insurance. Many companies put very stringent
language into their agreements with sites to prevent sites from ‘double-dipping’. For many NIH funded studies SOC costs are determined up front and the costs for these are not budgeted as part of the PSF so that sites would bill insurance.
It is critical to get input from local and regional staff regarding the proposed PSF as in some cases the PSF may be significantly more than what is typically received for a comparable study.

IRB/EC fees
The study budget needs to factor in the average cost per IRB/EC initial review and approval of the protocol, ICF, investigator brochure, and ideally any advertising materials. Additionally, the budget needs to factor in annual IRB/EC renewals for the total duration of the study and some assumption regarding the anticipated number of amendments. On average this may be one or two amendments per year. Also, the budget should factor in some percentage of investigator turn-over that also requires IRB/EC review and approval.

One-time non-reimbursable start-up fee
Start-up costs, regardless of actual enrollment, are often requested by experienced sites to cover the labor effort associated with getting a site up and running. It includes reimbursement for time and effort spent preparing the IRB/EC submission, time spent reading the protocol, training site staff and source document creation, time spent by the investigator and coordinator at an investigator training meeting and an on-site initiation visit, getting systems and processes in place for conducting study visits, etc. The start-up fee may vary based on the complexity of the protocol and added resources, equipment, materials, etc. that may need to be obtained.

General office/clinic costs
The budget should also include costs for copying, faxing, phone usage, courier, long-term storage, materials (paper, gloves, folders, etc.), specialty equipment, or space.

Other per-occurrence costs
Other per-occurrence site costs for consideration include time and effort for serious adverse effect processing, amendments, pharmacy set-up fees, local advertising (e.g. newspaper, radio, etc. advertising creation time, plus actual advertising costs) and others that may be study specific.

Budget management
The person who manages the budget during the trial must have first hand knowledge of whether or not the study is running according to plan or if unexpected costs are being incurred due to study delays or new protocol requirements. This person is often the PM. It is critical that the person responsible for the budget communicates early and often regarding anticipated shortfalls so that sponsors can make plans to raise additional funds or for investigators to go back to the funding source or an alternate funding source to obtain additional funding. Like many construction projects, clinical trials typically cost more than originally budgeted and sponsors and investigators should plan accordingly.

Timelines
As the person responsible for ensuring the study is completed with high quality, within the budget and on time, the PM (this may be the PI for investigator-initiated trials conducted at academic institutions) needs to go through a thorough and detailed preparation phase, planning all possible activities, tasks, or actions that need to be accomplished during the life of the project. The project team and the PM generally rely on using a standard template developed from previous experiences and which identifies the different steps together with the duration of each step. Most project teams or PMs use a countdown approach where the main goal is identified as the final task and all activities that need to take place to achieve that goal are defined. Since conducting clinical trials includes hundreds of tasks, it is not uncommon to break down the timelines by study phase (e.g. ‘study planning and set-up’; ‘study execution’, ‘study reporting’) with the main objective for each phase identified as a final task (e.g. ‘80% of sites initiated’, ‘last subject last visit’, ‘final clinical study report signed-off’). This document is then used routinely to measure the progress of the study by tracking the completion of each step, for reporting purposes, and also to identify potential deviations from the original plan and put in place the appropriate corrective actions or contingency plans.

Elements of realistic timelines
The proper planning of clinical trial activities requires that each process and associated tasks be understood as some activities can be run in parallel while other activities must be sequential. Most project teams develop project timelines based on historical data and recent experience. However, in order to improve the accuracy of
the planning, it is critical for the PM to obtain the appropriate input from key players from the team (internal or external) such as data management, clinical pharmacy, site monitoring or regulatory affairs staff, investigative sites, and other functional areas, and to obtain endorsement on the timeline from the project team and from the management. As the study proceeds, it is important that the planning document be updated with actual dates in real time and shared regularly with individuals from the study team that are accountable for deliverables, especially if there are major shifts in achieving key milestones. For instance, if enrollment goes significantly quicker than originally planned then downstream milestones, like database lock, would be shifted forward. The data management group and biostatistical group need to be informed of this early on to ensure resources are made available earlier than originally planned.

Discussion of critical timeline milestones

Study planning and set-up phase
The main goal of the planning and set-up phase is generally the initiation of all sites or at least of a certain number of sites. To ensure accurate planning one needs to identify all of the steps leading to site activation. For global clinical trials, the regulatory step is probably one of the most challenging as spelled out in ‘Regulatory requirements’ (see below). In preparation for the initiation of sites, other items that need to be planned ahead and tracked during the set-up phase include:
• protocol and protocol-related documents (informed consent form (ICF), CRF, investigator’s brochure, regulatory documents, monitoring plan)
• IMP (manufacturing, labeling, packaging)
• contracts with vendors and investigator sites
• other clinical supplies and material (laboratory kits for blood testing, ECG machines, investigator’s manuals, electronic diaries, etc.)
• site training strategy and materials
• monitoring documents (e.g., monitoring plan)
• data management activities: database set-up and validation, statistical analysis plan (SAP), clinical data review and validation plan, etc. (see Figure 27.1).

Study start-up phase
The end of the study planning phase is generally defined by the first subject enrolled or the First Subject First Visit (FSFV). In the biopharmaceutical industry, this key milestone is one of the most scrutinized by upper management as it really signals the beginning of the study. The FSFV is often associated with key financial pay-outs, which is critical information for stockholders and potential investors in publicly traded companies, and often coincides with a major press release. Assuming that all previous steps and previous key milestones have been achieved and that sites have everything they need to get started, the FSFV milestone depends upon the site’s ability to recruit subjects. A comprehensive site feasibility study (see Chapter 28) conducted during the planning phase can help the project team in validating the planning strategy as well as identifying unforeseen risks and developing contingency plans.

Study execution
The Last Subject Last Visit (LSLV) milestone signals the end of the trial. Every PM has faced situations where enrollment is running behind schedule, with missed targets and milestones. The key for timely completion of a clinical trial relies on a protocol that is feasible and on sites that can recruit enough patients within the project time frame. Site feasibility assessment must be conducted early during the planning phase to identify those sites that have access to the applicable patient population, and that have the experience and infrastructure necessary to conduct the study.

Database lock and database freeze
Terminology for “locking/freezing” of the database varies from company to company. What is important is the general process for securing the database against further changes following treatment unblinding. The database lock is a procedure used to prevent data from being modified or altered when multiple users have access to the database. Generally it takes place after all data from investigative sites and any external vendors are received, checked for completeness, reviewed, edited, and all queries and issues have been resolved. The clinical data can be locked at site, patient, or study level depending on the completeness status of the database. The locked database is then used by the biostatistician and the clinical programmer to ensure that their final programs are suitable and allow the proper analysis, per the Statistical Analysis Plan (SAP), of the data. Treatment codes are then assigned to each subject, per the original randomization, before the database is frozen. At this stage no additional changes can be made to the database. Should any update need to be made to a frozen database, this can only be made by users that have privileged access to the database.
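The lock-then-freeze sequence described above can be pictured as a small state machine. The Python sketch below is illustrative only: the class name `ClinicalDatabase`, the three states, the `privileged` flag, the `open_queries` counter, and the in-memory activity log are all assumptions made for the example, not features of any particular data management system, and the terminology will differ between companies.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DbState(Enum):
    OPEN = "open"      # data entry and query resolution still under way
    LOCKED = "locked"  # data cleaned; routine edits no longer accepted
    FROZEN = "frozen"  # treatment codes merged; changes need privileged access


@dataclass
class ClinicalDatabase:
    """Hypothetical model of a study database moving from open to frozen."""
    state: DbState = DbState.OPEN
    records: dict = field(default_factory=dict)
    treatment_codes: dict = field(default_factory=dict)
    open_queries: int = 0
    audit_trail: list = field(default_factory=list)

    def _log(self, user, action):
        # Assumption: every successful action is recorded with a UTC timestamp.
        self.audit_trail.append((datetime.now(timezone.utc), user, action))

    def update(self, user, subject_id, value, privileged=False):
        # Assumption: once the database is no longer open, only privileged
        # users may change data.
        if self.state is not DbState.OPEN and not privileged:
            raise PermissionError(f"database is {self.state.value}")
        self.records[subject_id] = value
        self._log(user, f"update {subject_id}")

    def lock(self, user):
        # Locking is only allowed once all queries have been resolved.
        if self.open_queries:
            raise ValueError("resolve all queries before locking")
        self.state = DbState.LOCKED
        self._log(user, "lock")

    def freeze(self, user, randomization):
        # Treatment codes are assigned per the original randomization,
        # then the database is frozen.
        if self.state is not DbState.LOCKED:
            raise ValueError("lock the database before freezing")
        self.treatment_codes = dict(randomization)
        self.state = DbState.FROZEN
        self._log(user, "freeze")
```

A typical sequence under these assumptions would be `db.lock("data_manager")` once `open_queries` reaches zero, followed by `db.freeze("statistician", {"S001": "A"})`; any later `db.update(...)` call fails unless `privileged=True` is passed.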
In order to meet 21 CFR part 11 and other regulatory requirements, the database must include an audit trail for recording and tracking all activities from individuals responsible for the development and maintenance of the clinical database.

Reporting
The final timeline activities associated with true completion of a clinical trial include the reporting requirements as spelled out in Chapter 28.

Investigational products: drug or device
The success of any clinical trial depends on the availability of adequate quantities of the investigational product (IP): drug or device (herein ‘clinical supplies or IP’). Clinical supplies are frequently a bottleneck, especially in investigator-initiated studies. Inadequate time is spent in the planning phases understanding the complexity of the clinical supply chain, which includes the availability of the API, excipients, and components (bottles, caps, kit boxes, pill counters, dosing syringes, etc.), understanding the manufacturing timeline and complexity, availability of an appropriate stability program, understanding possible delays at customs, and the time required for primary and secondary packaging, labeling, and distribution. Unfortunately, clinical supplies are not straightforward commodities readily available in the marketplace. As such, a dedicated person needs to be identified in the planning phase of a trial, through LPLV, whose primary responsibility is to ensure appropriate quantities of clinical supplies are available when and where they are needed in a continuous, uninterrupted fashion.
Most pharmaceutical and biotech companies have clinical supply departments whose sole responsibility is to focus on all aspects of the IP used in a clinical trial. Investigator-initiated studies do not typically have this luxury and in many cases academic investigators are not properly trained in all aspects of IP manufacture, procurement, testing, and all other regulatory aspects of manufacturing IP, etc. An in-depth review of supply chain management in the drug industry can be found in [14].
Clinical supplies used in clinical trials must be manufactured according to current Good Manufacturing Practices (cGMPs). The ‘c’ stands for ‘current,’ reminding manufacturers that they must employ technologies and systems which are up-to-date in order to comply with the regulations that are set forth in the Federal Food, Drug, and Cosmetic Act (Chapter IV for food, and Chapter V, Subchapters A, B, C, D, and E for drugs and devices). Good Manufacturing Practice regulations in the US are defined in 21 CFR parts 210 and 211, and Guideline on the Preparation of Investigational New Drug Products, March 1991 [15], and in the European Union are defined in the Clinical Trials Directive 2001/20/EC [16]. According to cGMPs, manufacturers, processors, and packagers of drugs, medical devices, some food, and blood products are required to take proactive steps to ensure that their products are safe, pure, and effective. The regulations require a quality approach to manufacturing, enabling companies to minimize or eliminate instances of contamination, mix-ups, and other errors. Failure of firms to comply with cGMP regulations can result in serious consequences including recall, seizure, fines, and jail time. Issues including record-keeping, personnel qualifications, sanitation, cleanliness, equipment verification, process validation, and complaint handling are addressed by cGMP regulations. Most cGMP requirements are very general and open-ended, allowing each manufacturer to decide individually how to best implement the necessary controls. This provides much flexibility, but also requires the manufacturer to interpret the requirements in a manner best suited for their individual business. When selecting vendors in the clinical supply chain it is critical that each vendor adheres to cGMPs.
There are key terms used throughout the supply chain that are defined in 21 CFR part 210.3:
• API (active pharmaceutical ingredient): Any component that is intended to furnish pharmacologic activity or other direct effect in the diagnosis, cure, mitigation, treatment, or prevention of disease, or to affect the structure or any function of the body of man or other animals. The term includes those components that may undergo chemical change in the manufacture of the drug product and be present in the drug product in a modified form intended to furnish the specified activity or effect.
• Drug substance: Active pharmaceutical ingredient (API) or ‘raw drug substance’
• Drug product: A finished dosage form, for example, tablet, capsule, solution, which contains an active drug ingredient generally, but not necessarily, in association with inactive ingredients. The term also includes a finished dosage form that does not contain an active ingredient but is intended to be used as a placebo [17].
Device classification
The FDA has established classifications for approximately 1700 different generic types of devices and grouped them into 16 medical specialties referred to as panels. Each of these generic types of devices is assigned to one of three regulatory classes (class I through class III) based on the level of control necessary to assure the safety and effectiveness of the device (see Chapter 19). Determining what class a device is categorized in and the regulatory requirements can be found on the FDA device website [18] and within the regulations 21 CFR parts 862–892.

Stability testing of drug substance and drug product
The purpose of stability testing is to provide evidence on how the quality of a drug substance or drug product varies with time under the influence of a variety of environmental factors such as temperature, humidity, and light, and to establish a re-test period for the drug substance or a shelf life for the drug product and recommended storage conditions. As part of the cGMP regulations, the FDA has adopted the standards of the ICH [19] which requires that drug products bear an expiration date determined by appropriate stability testing (21 CFR 211.137 and 211.166). The stability of drug products is required to be evaluated over time in the same container-closure system in which the drug product is marketed or being provided to subjects participating in the clinical trial. In some cases, accelerated stability studies can be used to support tentative expiration dates in the event that full shelf-life studies are not available. When a firm changes the packaging or formulation of a drug product, stability testing must be repeated.
The description of the stability program includes the list of tests to be performed, analytical procedures, acceptance criteria, test time points, storage conditions, and duration of the study. Standard stability programs for solid dosage forms are delineated in the FDA guidance document [20]. The stability program should be conducted for as long as the IMP will be used in the field and a minimum of 1-month accelerated stability data should be available before any materials are used in the field.
• Expiration dates (marketed product): The expiration date on marketed product is based on the data obtained from the stability program collected from three individual production size batches in its original closed container. The date does not mean that drug was unstable after a longer period; it means only that real-time data or extrapolation from accelerated degradation studies indicate that the drug will remain stable through that date.
• Retest dates (investigational product): The date assigned by the manufacturer after which the drug substances need to be examined (retested) to ensure that they remain within suitable specifications for use in the manufacture of a drug product.
For many investigator-initiated studies, planning and budgeting for stability programs is often overlooked. This is an important aspect of the clinical drug supply expiry dating, which in turn affects all study timelines, and should remain at the forefront of the investigator’s responsibilities to fulfill this requirement.

Methods for blinding the IP
Varying methods are available for creating a placebo-to-match (PTM). This includes manufacturing identically matching tablets, capsules, powder, IVs, oral solutions, devices, etc. Attempting to create a PTM for marketed tablets/capsules is often complicated by branding either engraved or printed on a tablet/capsule, which cannot be readily duplicated on the PTM. One of the most common forms of blinding oral dosage forms is over-encapsulation of tablets or capsules, thus allowing for an identically matched final capsule in all physical attributes: size, color, shape, imprints, taste, smell, solubility, etc.
• Over-encapsulation: Over-encapsulation of marketed product changes the formulation of the original dosage form. The effectiveness of tablets/capsules relies on the drug dissolving in the gastrointestinal tract. Over-encapsulated marketed product should be tested to ensure comparable dissolution to the original formulation. This can sometimes be difficult to achieve especially with controlled release (CR) or extended release (ER) type products [21].
Any time double-blind materials are created for use in a clinical trial, testing should be conducted to ensure that patients, investigators, site coordinators, and other members of the project team are unable to distinguish active product from placebo. Engaging the help of the biostatistician to develop formal testing methodology (e.g., total number of matched pairs to test and SAP) can ensure the integrity of the blind from a product standpoint prior to study launch.
In those cases where it is not possible to make a PTM or where it is not cost effective, other blinding methods may be used. For example, for an in-patient early proof-of-concept study, an unblinded pharmacist
may administer the IP to a blindfolded subject. Any unblinded staff should have no other role in the study and should not discuss the study with any other project team members.

Clinical supply labeling and packaging
The labeling requirement for IP is determined by the regulations of the countries in which the study will be conducted. In general, label text for IP includes, but is not limited to, the following: study protocol identification, investigational caution statement, storage conditions of the product, administration directions, expiry or re-test date of the material, name and address of the sponsor, manufacturer, and/or distributor. If conducted in multiple regions, appropriate certified translations are required. Determining upfront if all clinical supplies will have the same labels (e.g., booklet labels allow for multiple languages such that the same clinical supplies can be used for all regions) or if clinical supplies will be region-specific will aid in determining the quantities needed. Consulting the applicable regional regulations during the planning phase will ensure materials are appropriately labeled and avoid the necessity for rework of the supplies. Knowing the container system sizes upfront helps determine appropriate label size and thus total label content that can be included without running into space constraints. Most vendors that deal with packaging and labeling are well versed in the labeling regulations.

Subject use considerations
Formulation of IMP affects subject compliance. There are several key items to take into consideration when thinking about optimizing subject compliance. Large tablets/capsules may be difficult for older subjects with neurological conditions to swallow. Mixing IMP with food/liquid may cause weight gain if daily dosing requirements are frequent. Volume of IMP distributed may make transporting IMP home from the clinic difficult. Likewise, the packaging and labeling may also affect compliance. Use of child-resistant caps or blister packets may make it too difficult for subjects to open the container or, worse yet, once opened, not be able to close it again. Small label text or poorly differentiated labels may cause confusion if there is more than one container that must be opened for each dose. All of these factors need to be considered early on in the planning phase to ensure that the packaging configuration does not negatively impact compliance, thereby skewing study results.

Quantities of IMP to order
The following general formula can be used to determine an initial estimate of the amount of investigational drug that will be needed for a given treatment arm. Although protocols generally specify the time interval between a patient’s visits, most protocols also allow for some flexibility (window) to allow a subject to come earlier or later. This visit window should be taken into consideration for estimating the quantity of investigational drug required for the trial and one should assume that all subjects would come later than required by the protocol. For example, the protocol may provide for a month 3 visit with a ±7 day window. A subject could therefore be seen at 3 months (day 91) plus 1 week (day 98) and still be within the protocol requirements. Day 98 would be considered the plus 7 day side of the visit window.

Total amount of study drug for arm #1 = [(dosage strength × doses/day) × (# dosing days/subject + plus side of each visit window) × (total number of subjects in treatment arm)] × overage (e.g. 1.30 for 30% overage).

It is best to calculate quantities based on the final packaging configuration as in the example provided; final unit dose count may not always coincide with the count contained in the final kit configuration per subject.
Overage accounts for manufacturing waste, loss, damage, extra supplies at the sites that may never be used, and allows for the ability to bring additional sites on-board quickly in the event of slow enrollment. Overage estimates vary based on many factors and may be in the range 15–30%. Overage can be decreased via use of an IRT (see below). Similar calculations can be computed for determining the number of devices required in a device study; however, given the high costs of many devices, overage requirements are typically much lower.

Forecasting IMP needs
While knowing the total quantity of investigational product required for the duration of the study is a relatively straightforward calculation, what is less obvious is determining how much IMP is required at the start of the study and when to manufacture subsequent batches. This becomes especially tricky and important for larger long-term trials. Factors that influence a forecasting algorithm include: site activation rates; enrollment rates; premature withdrawals; the re-test or expiry date; lead time required to order and receive API, excipients
and components; lead time and cost associated with the manufacture of multiple batches; possible challenges in matching active product with an identical matching placebo on multiple manufacturing runs (e.g., color variation between batches can significantly affect whether materials are truly identical, thus impacting the study blind); storage constraints of bulk product at the vendor packaging site; and transportation and customs (for international shipments) time and costs for multiple shipments. Plans must be put in place early in the study to forecast appropriate quantities of IMP to avoid two major pitfalls: 1) producing too much IMP at the start of the study that expires before it can be used (e.g. delayed study start, lower than expected enrollment); and 2) not manufacturing enough IMP such that there is an IMP shortage (e.g. enrollment quicker than anticipated). Forecasting reports should be reviewed and adjusted regularly during the course of the study taking into account actual enrollment rates, enrollment projections, and actual premature withdrawals and permanent drug suspensions, to determine timing of subsequent manufacturing/packaging runs.

IWRS/IVRS and IRT used to manage clinical supply inventories
An integrated solution for the optimization and the management of the drug supply chain in clinical trials is provided by IWRS/IVRS or most recently Interactive Response Technology (IRT). It enables seamless 365/24/7 management of randomization, drug/device supply, patient diary data, laboratory samples, treatment disclosure information, temperature excursions of drug product, and drug returns and reconciliation, all through a web-based platform that can be accessed anytime via the Internet or the telephone. There are many vendors that now offer this service compliant with 21 CFR part 11 (regulations covering computerized systems used in clinical investigations) [4]. Many companies are now utilizing some form of IRT to: help manage complex packaging configurations and titration schedules, to proactively manage global expiry and label updates, to dissociate the enrollment ID # from the drug kit thus allowing the IP to be non-subject specific, and to provide real-time visibility into accountability documentation, including inventory updates and a complete view of the entire supply chain throughout the process and across all sites. The IRT systems maintain site and depot inventory reports that give immediate access to real-time shipment information and dynamic updates to drug status and readily allow for changes to site supply strategies at a site or regional level. There are multiple advantages to using these systems including savings in overall amount of drug required, just in time delivery to the sites when needed, accurate study progress and enrollment information, and ease in the overall management of the drug inventory. However, as clinical trials are more complex (e.g., multiple cohorts, stratified randomization, adaptive design, etc.) the team and the vendors must plan for enough time for the design, development, and validation of the tools. Also, the clinical team should bear in mind that any future change in the design of the study (e.g. randomization, treatment arms) could have a significant impact on the study timelines and budget as the system could go through extensive changes requiring additional work and re-validation.
For large and long-term (more than a 12-month study duration) phase 3 neurological studies where there is routine resupply (e.g. every 3, 4, or 6 months), possible stability issues, and significant drug costs, the use of an IRT is something that should be considered initially in the planning phases of the trial.

Transportation and storage considerations
The IP is transported within the clinical supply chain regularly between manufacturing facilities, to the primary and secondary packagers, to the central depot for distribution to the sites, and including the return of used/unused IP. The chain of custody for a shipment may be quite complex. Factoring in all of the possible temperature fluctuations throughout the chain of custody is critical for ensuring the integrity of the product once received by the sites. Should temperature excursions be identified by site personnel they should be promptly reported to the sponsor so that the impact on IP stability can be assessed before it is dispensed to subjects.
Supplies that are temperature sensitive may require special temperature-controlled and monitored shipping in order to ensure constant temperature throughout the chain of custody. This is often accomplished via the development of special packaging that has been validated (shipping studies have been conducted from central depot to sites in countries furthest from the depot to ensure package maintains the appropriate temperature for the duration of the shipping period and may also include testing of the IP following brief temperature excursions to ensure the IP is still useable). Temperature monitoring devices are often included
in shipments to monitor and record the temperature throughout the transport of the IP.
If shipping internationally, it is critical to establish a communication link between shipper, consignee, customs broker, importer of record, and transport provider [22]. It is important that import approval is obtained before shipping to avoid delays and to ensure that the transport provider has a clear understanding of transportation requirements (e.g., temperature-controlled trucks). Most of the large couriers have been working with industry for years to help address some of these issues and can often provide significant advice in managing the logistics for international transportation of temperature-sensitive IP prior to shipping. Those involved with the actual shipping process should also be knowledgeable and trained on the International Air Transport Association (IATA) regulations, which include guidelines for packaging and labeling diagnostic specimens, and use of hazardous materials like dry ice used for cold chain supplies [23].
Additionally, when exporting clinical materials, regulatory compliance with government agencies in both the originator and destination countries must be ensured prior to shipping.
Factoring in lead time to transport material from one location to another is an important factor in ultimately determining lead time for ordering supplies and ensuring IP arrives where it is required (i.e., at the sites) when there are patients ready to be enrolled. The impact of transportation costs on the overall study budget should not be forgotten, keeping in mind that these costs are significantly affected by fluctuations in fuel prices.
Most sites have limited room temperature secured storage capabilities, and as non-cGMP compliant facilities, they do not always have appropriate temperature/humidity monitoring capabilities.

Accountability, reconciliation and destruction
Regulatory agencies mandate that all IP manufactured for use in clinical trials have cradle-to-grave tracking for accountability, reconciliation, and destruction [24]. Terms are defined as follows:
• Accountability: The amount of IP dispensed to a subject vs. the amount returned by the subject, which takes into account missed doses, lack of compliance, lost IP, and any study drug suspensions. This is typically documented at the subject level on a ‘drug accountability log’ using the smallest dosage level (e.g. tablet, capsule, ml, mg etc.) and monitored routinely by the CRA during on-site monitoring visits.
• Reconciliation: This is the amount of IP manufactured, amount released to the packager, shipped to the clinical sites, dispensed to the subjects, returned from the subjects, and destroyed. The amount of IP remaining at each site at closeout in addition to what was dispensed to the subjects should be equivalent to the amount received during the course of the trial. The following key variables are typically tracked through the life cycle of the manufacture of the IP to its final destruction: Batch/lot #, kit #, site #, subject #, date dispensed, date returned, amount returned, destruction date, and quantity.
• Destruction: The process of destroying all remaining IP at the conclusion of a study after all reconciliation has been completed, using the appropriate documented method for destruction (e.g., incineration, landfill etc.).
Numerous regulations, both cGMP and GCPs, address accountability, reconciliation, and destruction, including 21 CFR parts 312, 210, 211 and ICH guidelines. The life cycle of IP is complex, transitioning from a manufacturing environment that must adhere to cGMPs to a clinical environment bound by GCPs.
These storage con- For complete and accurate drug accountability,
straints should be factored in when considering how reconciliation, and destruction it is essential that all
much IP should be shipped as part of a site’s initial set of companies involved in the supply chain, including
supplies and any restock supplies. he storage require- any monitoring groups, and the participating clinical
ments typically become more problematic for frozen sites have SOPs in place to address these critical aspects
or refrigerated IP. of the IP. Planning for these essential elements at the
As part of the items to be checked during on-site start of the study will ensure that they are tracked by all
monitoring visits, the sponsor monitor or CRA will stakeholders in the supply chain.
ensure the adequate delivery, storage conditions, and he use of IRT can greatly facilitate complete and
proper administration of IP. Proof of documentation accurate drug accountability and reconciliation as well
to verify that sites have properly stored the IP may be as documentation of destruction, especially if deployed
requested throughout the study duration (i.e., if return- at the start of the study with key supply chain stakehold-
ing to distributor or site to site transfers, etc.). ers having clear visibility and input into the system [25].
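The reconciliation rule described above (IP remaining at a site at closeout, plus IP dispensed to subjects, should equal IP received) can be sketched as a simple check. This is a minimal illustration in Python and not the logic of any particular IRT product; the record fields, site identifiers, and quantities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteInventory:
    """Hypothetical site-level IP record, counted in the smallest dosage unit."""
    site_id: str
    received: int               # units received by the site during the trial
    dispensed: int              # units dispensed to subjects
    remaining_at_closeout: int  # undispensed units counted at closeout


def reconciliation_discrepancy(inv: SiteInventory) -> int:
    """Units unaccounted for at the site; zero means the site reconciles."""
    return inv.received - (inv.dispensed + inv.remaining_at_closeout)


def units_not_returned(dispensed: int, returned: int) -> int:
    """Subject-level accountability: dispensed minus returned units.

    The result still has to be explained by doses taken, lost IP, etc.
    """
    return dispensed - returned


# Illustrative data only.
sites = [
    SiteInventory("site-001", received=500, dispensed=420, remaining_at_closeout=80),
    SiteInventory("site-002", received=300, dispensed=250, remaining_at_closeout=45),
]

for s in sites:
    gap = reconciliation_discrepancy(s)
    status = "reconciled" if gap == 0 else f"{gap} unit(s) unaccounted for"
    print(s.site_id, status)
```

A non-zero discrepancy would typically prompt a review of the drug accountability logs before any destruction is documented.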
When conducting clinical trials in the European Union, IMPs may only be used after being released by a trained or certified Qualified Person (QP) [25]. Products that are imported from countries outside the EU are subject to a release by a QP. The release will be at the QP’s discretion, but will be based on a quality assessment of the manufacturing site and review of batch records. The depth of assessment will be dependent upon the recognized standards of GMP in that country.

Data management
As with any other phase of the project, clinical data management activities must be identified and planned ahead during the planning phase of the study. The project team with representatives from key functional areas (data management, programming, biostatistics, medical/clinical) should get together to set the direction and execute the database build and data analysis requirements. The critical items required for database lock include:
• the build and test of the study database, automatic edit checks, and derivations
• database structure and variable naming conventions should follow the Clinical Data Interchange Standards Consortium (CDISC) conventions. CDISC is a global, open, multidisciplinary, non-profit organization that has established standards to support the acquisition, exchange, submission and archive of clinical research data and metadata. CDISC standards are vendor-neutral, platform-independent and freely available via the CDISC website at www.cdisc.org.
• the definition of electronically collected data vs. eCRF or paper CRF collected data such as laboratory data or ePRO data
• the definition of data transfer requirements
• the identification and set-up of blinding requirements
• the confirmation and validation of automated data flags including automated flags for prior concomitant treatments, or the automated flags for treatment emergent adverse events as opposed to adverse events
• the definition of the data cleaning, review and validation processes, including standard validations
• the definition of data presentation (e.g., clinical data listings)
• the confirmation of successful transfer of test data from the different vendors.
Even if all clinical data management activities are planned ahead, the project team still relies on the investigative sites and service providers such as the ECG vendor or central laboratory to enter clinical data into the CRF/eCRF or transfer the data to the sponsor on a regular basis. It is therefore critical to ensure that well-trained and qualified personnel, either at the sponsor or on site, are dedicated to this activity and are either following up with sites or vendors, or directly responding to clinical data management queries, in a timely manner. Again, adequate resources at sites, availability of dedicated personnel as well as the experience of the site in clinical trials are critical aspects that need to be evaluated when selecting potential sites and investigators to ensure the successful execution of a clinical trial.

Coding of clinical data
Case Report Forms are used to collect a variety of subject information such as medical diagnosis, medical history, adverse events, and concomitant treatments. Although instructions for completion of CRFs aim to harmonize the CRF data entry process, the nature of clinical trials involving investigative sites from different practices or backgrounds, and different cultures, leads to considerable variation in the data entered in the CRF fields. In an effort to standardize and harmonize the reporting of information across sites and countries, the coding of clinical data is a critical step when building up the integrated clinical database.
The medical coding system is therefore a harmonization system relying on the use of medical dictionaries which provide matching terms for terms entered in the CRF. Various dictionaries exist and can be used as references. The most commonly used include the WHO Drug Dictionary Enhanced (WHO DDE) for coding of concomitant medications [30], and, for the coding of adverse events, COSTART (Coding Symbols for a Thesaurus of Adverse Reaction Terminology) and MedDRA (Medical Dictionary for Regulatory Activities) [26].
The main objective of MedDRA, which was developed as an ICH initiative, is to standardize the communication between the industry and the regulators through all phases of the drug development cycle, including investigational and marketed drugs. The Maintenance and Support Services Organization reviews and maintains MedDRA on a regular basis, so the data management group must always ensure the most current version is used.
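To illustrate the harmonization idea behind medical coding, the sketch below maps verbatim CRF entries to a single preferred term using a toy lookup table. It is only a sketch under stated assumptions: the synonym table and term names are invented for illustration and are not taken from MedDRA, COSTART, or WHO DDE, which are versioned dictionaries with their own structure. Entries without a match are left uncoded so they can be routed to manual review.

```python
# Toy synonym table (hypothetical; NOT real dictionary content).
TOY_DICTIONARY = {
    "headache": "Headache",
    "head ache": "Headache",
    "cephalalgia": "Headache",
    "nausea": "Nausea",
    "felt sick": "Nausea",
}


def normalize(verbatim: str) -> str:
    """Lower-case the entry and collapse runs of whitespace."""
    return " ".join(verbatim.lower().split())


def code_term(verbatim: str):
    """Return the matching preferred term, or None if manual coding is needed."""
    return TOY_DICTIONARY.get(normalize(verbatim))


# Verbatim entries as different sites might record them.
entries = ["Head  ache", "CEPHALALGIA", "felt sick", "dizzy spells"]
coded = {entry: code_term(entry) for entry in entries}
uncoded = [entry for entry, term in coded.items() if term is None]

print(coded)
print("Needs manual review:", uncoded)
```

In practice the dictionary version used for coding would also be recorded, since the text notes that the most current version must be used.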
Biostatistics and programming

The statistical analysis plan
The SAP is a document that provides a detailed description of the statistical analysis described in the protocol, including detailed procedures, methodology, and statistical techniques for running the statistical analysis. The objectives of the SAP are multiple. It documents the rationale for the choice of the statistical model applied to the statistical analysis and provides evidence that the analysis is performed in a pre-specified manner. Ultimately, it should contain enough information for the analysis to be repeated by the reviewing authorities.
Guidance documents giving directions to sponsors for the design, conduct, and analysis of clinical trials have been released by regulatory authorities in the various regions and should be used as a reference when planning the statistical analysis [27].
Drafting the SAP typically begins shortly after the protocol synopsis is finalized. It is generally prepared by the biostatistician and reviewed by key study team members: clinical pharmacokineticist (e.g., for phase 1 studies), statistical programmer, DSMB, medical team members, steering committee members, health outcome group (if applicable), or regulatory affairs representatives.
The information required in the SAP includes information on:
• study design
• study objectives
• sample size and statistical power
• randomization and blinding techniques
• interim analysis requirements
• primary and secondary endpoints and comparison of interest
• assessment methods for endpoints
• other endpoints and ancillary data
• treatment groups for analysis
• handling of missing data
• statistical software used
• references.
The SAP must also contain a description of the populations studied and analyzed (e.g., ‘all randomized’, ‘intent-to-treat’, ‘per-protocol’ populations). Although the primary analysis is typically run on the ‘all randomized subjects’ population, there may be circumstances that lead to excluding individual subjects from the full analysis. The criteria used to define the different populations of patients must be justified and documented in the relevant section of the SAP. The SAP must be consistent with the statistical section of the protocol, and must be revised according to any subsequent protocol amendment, if applicable. In order to permit the development and the validation of the required analysis programs, the SAP must be completed as early as possible, preferably before enrollment begins and in any case before the study is unblinded (for blinded studies).

Regulatory requirements
The PM needs to have a clear understanding of the different regulatory requirements (e.g., required documents, translation), process (local ethics committee versus central ethics committee, national drug agency), and timelines associated with submitting and obtaining local regulatory clearance for the study. Standard delays and turnaround time for obtaining IRB/IEC and regulatory approval may range from a couple of weeks to a couple of months. In Europe, the European Medicines Agency (EMEA) established guidelines and guidance documents requesting that applications for clinical trials authorization from the competent authorities and ethics committee be reviewed within 60 days of a valid application [28]. In the US, the FDA must respond to a new Investigational New Drug (IND) application within 30 days [29]. For global trials the PM must seek advice from internal regulatory or CRO staff or from local site monitors or affiliates to confirm country-specific submission processes and requirements.
Importantly, it should not be assumed that the submission of the protocol and its approval will happen at the same time for all sites. Local differences must be considered when planning study start-up activities and setting up start-up objectives and realistic timelines.

Conclusion
Properly planning and appropriately resourcing the conduct of a clinical development program or a single clinical trial is an important upfront investment that takes time and input from key stakeholders, and needs to be taken seriously. In most cases active planning can easily take 6–9 months. Just as a construction company would not begin to build a multi-billion dollar hotel without the fully approved architect’s blueprints and all applicable zoning and building approvals, a clinical trial should not be started without first having a fully fleshed out and realistic plan that includes: a realistic study timeline; budget; applicable resources and well-defined project team and other required
Appendix 27.A Template clinical trial overall budget

A non-inclusive template of typical clinical trial costs
Task title | Resource title | Rate per hour | Number of units | Total hours | Total USD ($) | Comments/assumptions
Protocol development tasks
Full protocol development
Protocol amendments | Assume minimum of 1/year
Protocol review
Start-up tasks
Country-specific regulatory submissions (e.g., IND, CTA, etc.)
Country and site feasibility | Based on # of countries/sites required
Identify and recruit sites
Pre-study site qualification visits
Develop, assemble and distribute operations manual
Develop, assemble and distribute site regulatory binder
Develop, translate and distribute ICF
Site budget and contract development/negotiation
Central file set-up
Investigator meeting planning and preparation
Investigator training material
Investigator meeting travel and attendance
Project kick-off/training meetings | Depending on the # of vendors, there may need to be one with the CRO and then one with each of the vendors. For academic centers these kick-off meetings may be between departments or units and vendors.
IRB/EC initial submissions | Average IRB cost in the US for initial submissions is ~$2,500
Regulatory document collection (site specific)
RFP development, distribution and analysis for all vendors | Ideally obtain 2–3 bids for each type of vendor required (central lab, ECG, electronic diaries, IVRW etc.)
Vendor selection, budget/contract negotiations
Clinical supplies (drug or device)
Project management
Cost of drug or device
Materials (excipients, components, etc.)
Label printing charges plus translation
Primary manufacturing
Secondary packaging/labeling
Receipt
Storage
Distribution
Analytical/dissolution/stability fees
Courier shipping charges
Customs fees
Returns/destruction
Central laboratory costs
Central ECG costs
Any other vendor costs (paper CRF binders, IRT, ePRO, Holter monitoring etc.)
Site monitoring tasks
Develop and maintain site monitoring plan
Site initiation visits
Routine monitoring visits
Site closeout visits
Site management tasks
Draft and maintain site management plan
Administer grant payments to sites
Maintain central files (trial master file (TMF))
Newsletter development and distribution | Based on frequency expected (e.g., monthly, quarterly, etc.)
Annual IRB renewals (review time) | Time for PM to review each site’s annual IRB renewal
Project management tasks
Tracking of enrollment and key milestones
Provide and document training for all operational
project team members
Routine internal and external team meetings/
teleconferences and creation of minutes
Review trip reports
Overall project management during study start-up,
implementation and close-out
Vendor management and external committee
management and payments
External committees and consultants
Data Safety Monitoring Board | # of members × number of meetings × payment/member
Steering Committee | # of members × number of
Endpoint Adjudication Committee | # of members × number of
Consultants
Data management tasks
Draft and maintain data management plan
Case report form (CRF) development
CRF design, review and approval
CRF printing and distribution (paper studies)
Develop, assemble and distribute CRF completion
instructions
Electronic data capture (EDC) (including electronic diaries and other ePRO devices)
Develop design specifications
Site assessment, provisioning and training
Design and validate EDC system
Deploy system and support end-users
License fees (if applicable)
EDC help desk support
Database development
Design, test and implement database
Annotate CRFs
Set-up and maintain medical coding dictionaries (e.g.,
AEs, concomitant medications, medical history, etc.)
Documentation of database design and edit checks
(range checks and cross form logic checks)
Double data entry costs (paper studies)
Tracking, storing and logging CRFs (paper studies)
Data entry
If a paper study, data entry costs are driven by the actual process, including any double data entry fees etc.
Data review
The data management system will define all the steps
in the review process
Database coding
Using the appropriate data dictionary, code AEs, concomitant medications, medical history etc. and resolve any discrepancies
Review and approval of applicable coding reports
Electronic data transfers
Design file format transfer specifications
Design, test and validate data transfer files
Provide electronic data transfer files | Depending on the # of external vendors and frequency of file transfer, cost per file transfer (varies by vendor) determines total budget. If combined data from all sources need to go elsewhere, also factor in those costs
Database finalization
Identify and perform all database finalization activities
Perform subject acceptability criteria and flag subjects | Identify those patients that do not meet protocol inclusion/exclusion criteria or had other protocol violations to determine which patients can be included in the various analyses (e.g., per protocol, intention-to-treat, completers etc.)
Authorization and sign-off on database lock
Site closeout tasks
Provide sites with corrections report (paper studies
only)
Provide sites with pdf of all data or CD and obtain
investigator sign-off (EDC only)
EDC site closeout and decommissioning
QC of data management activities vs. clinical study
report (CSR)
Biostatistics tasks
Development and finalization of statistical analysis plan
(SAP)
Development of SAP tables, listing and figures (shells)
Derived data programming
Programming tables, figures, listings and graphs
Validation/QC of tables, figures, listings and graphs
Interim analysis production and QC (if applicable)
Final statistical analysis and QC
Site budget
Site IRB costs
IRB/EC initial submissions | Average IRB cost in the US for initial submissions is ~$2,500
IRB/EC amendment fees | On average assume ~1/year
IRB/EC annual review/renewal fees
Per subject fee (PSF)
Cost per subject based on PSF (see Fig. 27A.1) | PSF is based on the final protocol schedule of activities (SOA). Any changes to the SOA typically result in a modification to the PSF and amendments to the site subcontracts
Other site costs
Local advertising (creation and placement) (e.g.
newspaper, radio, newsletters, flyers etc.)
Site labor cost per amendment (independent of IRB/EC costs) | Two types of amendments: 1) administrative amendments that do not require reconsenting of subjects, but require personnel effort on the submission; 2) those that require modification to the ICF and thus reconsenting of subjects
One-time non-reimbursable site start-up fee | Funds required for the upfront effort put forth by site personnel before any subjects can be enrolled, including time for IRB submission activities, subcontracts, training of personnel etc.
Per occurrence SAE processing
Pharmacy set-up fee
Specialty equipment that may require purchase
Site archiving costs (sorting, boxing and documenting
items for archive)
General office supplies that may be independent of
PSF, including things like dry ice for shipments
Courier fees
Institutional indirect rates | For academic institutions the indirect rates for pharmaceutical-, foundation- and NIH-sponsored studies vary and can range anywhere from 0% to 75%
Reporting/publication tasks
Publication plan and publication creation
Publication reprint costs
Regulatory submissions (IND, CTA etc); include required
annual submission and final end of study submission
Regulatory submissions (IND safety letters, SUSARs)
Development and implementation plan for informing sites and subjects of study results
Submission of required financial reports to SEC and
comparable authorities and competent authorities in
other countries
Overall study archive tasks
Sorting, boxing and documenting materials to be
archived
Central files management & long-term off-site storage/
retrieval costs
Total Clinical Trial Budget ESTIMATE $–
General comments
1. For each item, determine the resource that will perform the task and the associated hourly rate, which should include applicable Facilities and Administration (F&A) rates.
2. Once the final budget has been created, consider applying a ‘fudge factor’ to the total budget, anywhere from 10 to 20% depending on the perceived level of unknowns.
3. Attempt to get a minimum of 2–3 bids for each vendor type required. The bidding process typically identifies protocol inconsistencies and deficiencies, and significant valuable input can be gained from the expertise of the various vendors.
4. If conducting the study in multiple countries or if vendors are located in other countries, factor in anticipated fluctuations in currency exchange rates that are likely to occur during the course of multi-year studies. Hedge against significant negative changes.
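The arithmetic implied by the template and the general comments can be sketched in a few lines of Python. This is a hedged illustration only: it assumes a labor line item is costed as hourly rate × total hours, uses the committee formula given in the template (# of members × number of meetings × payment/member), and applies the 10 to 20% contingency from comment 2. The function names and all figures are hypothetical.

```python
def labor_cost(rate_per_hour: float, total_hours: float) -> float:
    """Cost of a task costed by effort (assumed: hourly rate x total hours)."""
    return rate_per_hour * total_hours


def committee_cost(members: int, meetings: int, payment_per_member: float) -> float:
    """Committee cost as given in the template comments."""
    return members * meetings * payment_per_member


def with_contingency(subtotal: float, contingency: float = 0.15) -> float:
    """Apply a contingency ('fudge factor'); the text suggests 10 to 20%."""
    if contingency < 0:
        raise ValueError("contingency must be non-negative")
    return round(subtotal * (1 + contingency), 2)


# Illustrative figures only.
line_items = {
    "Full protocol development": labor_cost(rate_per_hour=100.0, total_hours=40),
    "Data Safety Monitoring Board": committee_cost(
        members=5, meetings=4, payment_per_member=1500.0
    ),
}
subtotal = sum(line_items.values())
print("Subtotal:", subtotal)
print("With 10% contingency:", with_contingency(subtotal, 0.10))
print("With 20% contingency:", with_contingency(subtotal, 0.20))
```

Computing the total at both ends of the contingency range gives a simple bracket for discussions with the sponsor or funding body.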
Column headings (as recoverable from the figure): Unit cost; Screening; Baseline; Day 1; T1; Visit 1; Visit 2; Visit 3; T2; Visit 4; Visit 5; Premature withdrawal visit/end of study; Total per subject fee; Unscheduled visit
Timeframes: Titration; Maintenance
Study day: SC, 0, 1, 2, 7, 8, 9, 14, 50, 80
Assessments (one row each; all cost cells are blank in the template):
Written informed consent
Update screening projections form
Inclusion/exclusion review
Medical history / demographics
Physical/neuro examinations
Brief physical/neuro examinations
ECG 12-lead
Vital signs / weight
Total functional capacity (TFC)
MMSE
Concomitant medication review
Enrollment log
Safety laboratory procedures
Pharmacokinetics
Hospital stay for pharmacokinetics
UPDRS
Dispense study drug
Dosage log
Drug accountability / compliance
Adverse events log
Adverse events follow up
Coordinator time and effort
Worksheet prep / data entry
Investigator time and effort
Total direct costs
F&A at XX% (include institutional rate)
Total per subject fee
Notes: Determine % of screen failures that will be reimbursed and at what rate (e.g., full screening visit and portion of baseline visit); determine reimbursement rate for premature withdrawals.
Figure 27A.1. Sample Per Subject Fee Budget Template (Based on final protocol Schedule of Activities).
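The structure of the per subject fee template can be illustrated with a short Python sketch: direct costs for each visit are the sum of the unit costs of the assessments scheduled at that visit, F&A is added at the institutional rate, and the per subject fee is the total across visits. The sketch assumes F&A is applied as a percentage of total direct costs; the assessment names, unit costs, visit schedule, and the 25% rate are all hypothetical.

```python
# Hypothetical unit costs (USD) for a few assessments.
UNIT_COST = {
    "informed_consent": 100.0,
    "ecg_12_lead": 80.0,
    "vital_signs": 25.0,
    "updrs": 150.0,
}

# Hypothetical schedule of activities: which assessments occur at which visit.
SCHEDULE = {
    "screening": ["informed_consent", "ecg_12_lead", "vital_signs"],
    "baseline": ["vital_signs", "updrs"],
    "visit_1": ["vital_signs", "updrs"],
}


def visit_direct_cost(visit: str) -> float:
    """Sum of unit costs for all assessments performed at one visit."""
    return sum(UNIT_COST[assessment] for assessment in SCHEDULE[visit])


def per_subject_fee(fa_rate: float) -> dict:
    """Total direct costs, F&A, and total fee for a subject completing all visits."""
    direct = sum(visit_direct_cost(visit) for visit in SCHEDULE)
    f_and_a = direct * fa_rate
    return {"direct": direct, "f_and_a": f_and_a, "total": direct + f_and_a}


fee = per_subject_fee(fa_rate=0.25)
print(fee)
```

Because the fee is derived from the schedule, any change to the schedule of activities changes the result, which is why the template notes that such changes typically require amending the site subcontracts.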
committees (e.g., Steering Committee, DSMB); assessment of protocol feasibility; appropriate site selection, qualification and site training; a realistic recruitment strategy; well-defined CRFs; a fully validated database with appropriate edit and logic checks; a fully developed DMP and SAP; and applicable regulatory approvals. A well-laid-out plan will ensure a smooth execution of the trial, which is discussed in detail as part of the implementation phase in Chapter 28.

References
1. Getz K, Wenger J, Campo R, et al. Assessing the impact of protocol design change on clinical trial performance. Am J Ther 2008; 15: 450–5.
2. Getz K, Campo R, Kaitin K. Variability in protocol design complexity by phase and therapeutic area. Drug Inform J 2011; 45: 413–20.
3. FDA Guidance. Investigator responsibilities – protecting the rights, safety and welfare of study subjects. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM187772.pdf
4. Getz K. Ominous clouds over outsourcing. Applied Clinical Trials. 2010. http://appliedclinicaltrialsonline.findpharma.com/appliedclinicaltrials/CRO%2FSponsor/Ominous-Clouds-OverOutsourcing/ArticleStandard/Article/detail/686210?contextCategoryId=37194. (Accessed August 8, 2011.)
5. 21 CFR Part 11. Electronic Records; Electronic Signatures – Scope of Application. http://www.fda.gov/RegulatoryInformation/Guidances/ucm125067.htm. (Accessed August 8, 2011.)
6. MedDRA MSSO Medical Dictionary for Regulatory Activities, Maintenance, and Support Services. http://www.meddramsso.com. (Accessed August 8, 2011.)
7. Directive 2001/20/EC of the European Parliament and of the Council of 4 April 2001 on the approximation of the laws, regulations and administrative provisions of the Member States relating to the implementation of good clinical practice in the conduct of clinical trials on medicinal products for human use. http://www.eortc.be/Services/Doc/clinical-EU-directive-04-April-01.pdf. (Accessed August 8, 2011.)
8. EudraVigilance. Information for sponsors of non-commercial clinical trials. Reporting Rules of SUSARs to EudraVigilance for commercial and non-commercial sponsors of clinical trials conducted in the EEA. http://eudravigilance.ema.europa.eu/human/index03.asp. (Accessed August 8, 2011.)
9. European Medicines Agency. ICH Topic E6 (R1). Guideline for Good Clinical Practice. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf. (Accessed August 8, 2011.)
10. 21 CFR 314. Applications for FDA Approval to Market a New Drug. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=314. (Accessed August 8, 2011.)
11. ICH E14. Clinical Evaluation of QT/QTc Interval Prolongation and Proarrhythmic Potential for Non-Antiarrhythmic Drugs. 2005. http://www.fda.gov/downloads/RegulatoryInformation/Guidances/ucm129357.pdf. (Accessed August 8, 2011.)
12. Li G. Site Activation, The Key to more Efficient Clinical Trials. Advanstar Communications Inc. 2008.
13. DiMasi JA, Hansen RW and Grabowski HG. The price of innovation; new estimates of drug development costs. J Health Econ 2003; 22: 151–85.
14. Rees H. Supply Chain Management in the Drug Industry: Delivering Patient Value for Pharmaceuticals and Biologics. 2011.
15. FDA Guideline on the preparation of Investigational New Drug Products. 1991. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/ucm070315.pdf. (Accessed August 8, 2011.)
16. Commission Directive 2003/94/EC of 8 October 2003. Laying down the principles and guidelines of good manufacturing practice in respect to medicinal products for human use and investigational medicinal products for human use. 2003. http://ec.europa.eu/health/files/eudralex/vol-1/dir_2003_94/dir_2003_94_en.pdf. (Accessed August 8, 2011.)
17. ICH Harmonized Tripartite Guideline Comparability of biotechnological/biological products subject to changes in their manufacturing process. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Quality/Q5E/Step4/Q5E_Guideline.pdf. (Accessed August 8, 2011.)
18. FDA Guideline. Device Classification. 2009. http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/Overview/ClassifyYourDevice/default.htm. (Accessed August 9, 2011.)
19. The International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH). http://www.ich.org/. (Accessed August 8, 2011.)
20. FDA Guidance for Industry. Q1A (R2) Stability Testing of New Drug Substances and Products. 2003. http://www.fda.gov/downloads/RegulatoryInformation/Guidances/ucm128204.pdf. (Accessed August 8, 2011.)
21. What is Tablet Dissolution Testing? 2011. http://www.tabletdissolution.com/education/dissolution/index.php. (Accessed August 8, 2011.)
22. Lis F, Gourley D, Wilson D, and Page M. Global Supply Chain Management. Applied Clinical Trials. 2009. http://appliedclinicaltrialsonline.findpharma.com/appliedclinicaltrials/article/articleDetail.jsp?id=602049&pageID=1&sk=&date=. (Accessed August 8, 2011.)
23. International Air Transport Association. http://www.iata.org/index.htm. (Accessed August 8, 2011.)
24. Dowlman N, Kwak M, Wood R, et al. Managing the Drug Supply Chain with eProcesses. Appl Clin Trials 2006; 15: 40–5.
25. European QP Association. Qualified Persons in Europe. http://www.qp-association.eu/qualified_person_qp_regulation.html. (Accessed August 8, 2011.)
26. Medical Dictionary for Regulatory Activities. Maintenance and Support Services Organization (MedDRA MSSO). http://www.meddramsso.com/. (Accessed August 8, 2011.)
27. European Medicines Agency. Science Medicines Health. Clinical efficacy and Safety Guidelines Introduction. http://www.ema.europa.eu/htms/human/humanguidelines/efficacy.htm. (Accessed August 8, 2011.)
28. ICH Topic E6 (R1). Guideline for Good Clinical Practice. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf. (Accessed August 8, 2011.)
29. FDA Code of Federal Regulations. 21 CFR Part 312.40. http://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?fr=312.40. (Accessed August 8, 2011.)
30. WHO Drug Dictionary Enhanced (WHO DDE). http://www.umc-products.com/DynPage.aspx (Accessed November 20, 2011).
Section 7 Clinical trial planning and implementation
Chapter 28
Clinical trial implementation, analysis, and reporting: An academic and industry perspective
Cornelia L. Kamp and Jean-Michel Germain

Clinical trial implementation overview
The successful implementation of any clinical trial depends on careful planning as described in Chapter 27. With a realistic timeline, budget and a detailed scope of work for all study activities, the actual execution of the trial should be relatively straightforward. The focus during the implementation phase is on monitoring the study progress, which includes: overall subject recruitment and retention activities, timeliness of Case Report Form (CRF) entry, observation of any site, safety laboratory or data trends that may need to be addressed, and ensuring adequate IP is available as needed. Additionally, the project manager (PM) will be managing the project against the overall budget, pre-defined timeline, and scope of work, and communicating proactively and frequently to all key stakeholders about any changes that have downstream ramifications.
Successful project teams are flexible in managing the trial, planning for major milestones, and addressing day-to-day operational issues as they arise. The more time spent properly planning the execution of the trial, the fewer headaches endured during the study implementation.
This chapter will review the key implementation steps and outline key aspects to be considered and monitored, as well as key requirements to comply with throughout the execution phase, from site selection to reporting of results.

Protocol feasibility
Whether it is performed internally or outsourced to a clinical research organization (CRO), the protocol feasibility should rely on criteria developed and agreed by relevant representatives from the project team (medical, project management). The team should agree on the minimum qualifications required for each principal investigator (PI) and/or sites to participate in the study. These criteria should assess the site experience and the qualifications of the site as well as the chance for achieving the recruitment/retention target.
The feasibility study is usually conducted through a survey to potential sites and includes selection criteria such as:
• site setting (hospital vs. private practice)
• catchment area (large hospital, large city, regional center)
• patient referral system (physician network)
• site experience in the same disease area with indications on previous performance (number of subjects enrolled, recruitment period associated and retention rates)
• site experience in similar study design (e.g., placebo-controlled trial)
• information on recruitment potential (number of new patients seen every month)
• anticipated recruitment rate and recruitment strategy
• advertisement possibility
• comments on eligibility criteria
• confidence in obtaining regulatory and Institutional Review Board (IRB)/Independent Ethics Committee (IEC) approval (‘were trials with similar design recently approved?’)
• frequency of IRB/IEC meetings
• information on the anticipated delay between protocol submission to IRB/IEC and/or local regulatory body approval and site activation
• site technology and infrastructure is consistent with the protocol requirements (MRI equipment,
Chapter 28: Implementation, analysis, and reporting
central pharmacy or adequate space for proper storage of study medication, availability of refrigerator or freezer, etc.)
• site experience in collaborating with external vendors (interactive response system (IRT), central laboratories) or using specific technology such as electronic Case Report Forms (eCRFs), or electronic patient diaries
• site personnel training on FDA, International Conference on Harmonisation (ICH), Good Clinical Practice (GCP) and other applicable regulatory requirements based on region
• site previous exposure to inspections (FDA, European Medicines Agency (EMEA), Drug Enforcement Administration (DEA) for controlled substances, or local drug agency)
• availability of dedicated staff (sub-investigator, study coordinator)
• evaluation of concurrent studies (ongoing or planned) that could compete with the proposed trial
• principal investigator’s interest in participating in the study
• availability of site standard operating procedures (SOPs)

Sites selection, qualification and training

Site selection
Besides a ‘well-designed’ protocol, the site selection process is probably one of the most challenging and critical steps of the project, as the performance of sites in terms of subject recruitment, retention and quality of the data submitted can impact the outcome of a trial. Good Clinical Practices require sponsors to select investigators that are qualified by education, training and experience to assume responsibility for the proper conduct of clinical trials [1]. Investigators must also meet all qualifications specified by the applicable regulatory requirements and provide evidence of their qualifications (through an up-to-date curriculum vitae or other relevant document) upon request by the sponsor, IRB, IEC, and regulatory authorities.

In Europe, the current regulation requires that documentation and information on the qualification and training of the principal investigator in GCPs be sent to the ethics committee for review and approval. The ethics committee must also approve the quality of the investigative sites including the availability of adequate resources and personnel. Therefore, the purpose of the site selection process is to identify sites that have the appropriate experience and expertise with the disease or the indication studied, access to the right population of patients, adequate organizational capabilities and familiarity with clinical trials requirements and regulations.

Site qualification
For sites that have previous experience with the clinical trial sponsor, a review of the site’s previous performance may also provide valuable information on the recruitment rate, drop-out rate, quality of the data (query rate, protocol violations, and audit findings), site collaboration and responsiveness (submission of CRF data, response to queries).

In order to ensure that meaningful data is obtained from the feasibility study, it should be conducted with a mature version of the protocol synopsis with no anticipated major changes. A change in any of the eligibility criteria could have a significant impact on the patient population and the site’s ability to recruit.

The identification of potential investigators can be based on a variety of sources including literature, publications, network, and input from local staff, including Site Monitor (SM) or Clinical Research Associate (CRA), local affiliates or marketing. As a first step in establishing potential future collaboration with sites and PIs, most clinical trial sponsors generally rely on locally trained and experienced internal or outsourced resources for the administration of the feasibility survey. Ultimately, the qualification of the sites to participate in the study is based on the review of the site selection documents including both surveys and on site visit monitoring reports (see ‘Sites Monitoring’) by the appropriate project team members. For regulatory inspection purposes, the clinical trial sponsor should be able to document the selection and qualification process for all sites, and the compliance with its own requirements, so that the filing of the relevant documentation in the Trial Master Files (see ‘Monitoring of Quality’) is appropriately completed.

In order to ensure that sites are meeting their objectives and sponsor’s recruitment expectations, the performance of the sites must be monitored closely after sites are activated, meaning that sites have started to actively enroll subjects into the trial. Also, the quality of the clinical data submitted by the sites should be assessed throughout the sites’ participation with
different tools including the review of CRA monitoring visit reports, clinical data, trending, and site audits (see also ‘Monitoring of Quality’).

Site training
Site personnel must be adequately trained to ensure smooth execution of a clinical trial. In Europe, the EU Directive requires that information documenting the qualification of the PI, the training of the PI in GCP as well as his/her experience in investigational research be reviewed by the Ethics Committees [2]. In the US, FDA also requires similar documentation [3]. For global multicentre trials, the general approach adopted by sponsors has been to complete all training during global and/or local investigator’s meetings. Although the investigator’s meeting is an important training vehicle, it is sometimes organized several months before a site can get started. Therefore, sponsor representatives must plan for enough time during the site initiation visit to go again through important messages from the investigator’s meeting and study procedures. All training should be documented so that it can be provided in case of regulatory inspection. Throughout the execution of the trial the CRA will play a key role in monitoring site adherence to the trial procedures and requirements and in identifying any needs for training (new personnel on site) and retraining in case of non-adherence. Training requirements are obviously not limited to site personnel but also include any team members at the sponsor, CRO, or vendors.

Site monitoring
In accordance with GCP, including the ICH Guidelines for GCPs, sponsors must ensure that the following key aspects are respected [4, 5]:
• The rights and well-being of human subjects are protected.
• The reported trial data are accurate, complete and verifiable from source documents.
• The conduct of the trial is in compliance with the currently approved protocol/amendment(s), GCP, and with the applicable regulatory requirements.

Site monitoring, therefore, requires on-site visits conducted by well-trained study personnel (SM, or CRA). On-site visits are usually performed according to pre-agreed criteria defined in a global monitoring plan (MP) which generally includes information such as the frequency of site monitoring visit, the items and data to be verified and compared with source documents, including [all information in original records and certified copies of original records of clinical findings, observations, or other activities in a clinical trial necessary for the reconstruction and evaluation of the trial. Source data are contained in source documents (original records or certified copies)] [ICH E6]. Additionally the MP typically requires a complete tour of the facilities where any aspect of the study will be conducted, for example ancillary sites such as an imaging unit or clinical pharmacy. The MP also spells out the process for escalating issues and the resolution process. The MP is a critical document for the standardization of monitoring aspects especially when site monitoring activities are outsourced to an external vendor, or when a study is conducted in various geographic locations. The extent and the nature of site monitoring visit depend on the objectives and purpose of the visit.

Site initiation visit
Prior to enrolling any subjects in a study the PI as well as site personnel must be trained in GCP as well as on protocol specific requirements and study procedures, and have to understand their role and responsibilities in the conduct of the study. Although training could already have been dispensed as part of the investigator’s meeting, the SM or CRA will plan for a specific site initiation visit to ensure that the investigator and site staff understand the protocol and GCP. During this visit the SM or CRA will also check the completeness and the accuracy of all study documentation, will make sure that all regulatory and IRB/IEC approvals have been received, and will confirm the qualification of the site for participating in the study.

Regular site monitoring visit
After the first patient is enrolled, the site will receive regular site monitoring visits as described in the global MP. During these visits, the monitor will ensure that the trial is conducted and that the data are recorded and reported in compliance with the protocol requirements, the sponsor SOPs, the international regulation and local regulation as applicable. The monitor will generally focus on ensuring the proper documentation of subject’s informed consent (ICF), the compliance with the requirement for reporting safety information, and the management and storage of clinical supplies including investigational medicinal product (IMP). The monitor will also verify that current study documentation is maintained on site, including study records, such as
updated list of study personnel, training records, list of all subjects enrolled, communication with IRB/IEC and delegation of authority log.

Every on-site visit has to be documented on site and in the sponsor central repository. The monitor has responsibility for preparing a monitoring visit report describing the data which were verified, the findings as well as the corrective actions taken. The monitor has responsibility for following up on all findings until resolution by the site or the sponsor.

Routine study closure
The routine closure of most clinical trials includes the following:
• All sites should have a final site closeout monitoring visit prior to database lock to ensure all data queries have been addressed, and to ensure that the data in the database agree with the source documentation.
• During the final site closeout monitoring visit, the CRA should perform a full accounting of all used and unused investigational product (IP) including drug and devices and prepare IP for final return to the sponsor or destruction of any used/unused IP at each participating center following institutional policy. Final reconciliation of all IP by the study sponsor or designee is also required. The CRA will also ensure appropriate closure of all monitoring issues identified during the course of the study.
• Return or destruction of any other clinical supplies such as ECG machines, personal digital assistants if used for patient reported outcomes, or paper CRF binders.
• Notification of each site’s IRB/IEC that the site is no longer actively enrolling subjects and that the database has been locked. Some IRBs/IECs require that the study remain open until the primary manuscript has been published.
• All site files must be prepared for archive and long-term storage. If an electronic data capture system was used, each site must be provided with a final complete CD of their data [6].
• FDA Code of Federal Regulations (21 CFR 56.115) contains record retention requirements for IRB records.
• FDA 21 CFR 312.57 contains record retention requirements for financial disclosures.
• 21 CFR 312.62 contains record retention requirements for drug disposition and case histories.
• Following the lock of the database, all user permissions to modify the clinical trial database are disabled, so that no further changes to the data can be made.
• All vendors should be notified about study completion to ensure final study tasks and invoicing are completed in a timely manner and to avoid unnecessary expenditures by vendors. The members of the project team and the vendors are responsible for ensuring all documentation for the study is appropriately filed and ultimately archived according to the regulations and applicable internal SOPs.
• Final reporting requirements as delineated in ‘Clinical trial reporting’.

All entities involved in the conduct of clinical trials should have SOPs in place for the orderly closeout of a clinical trial. Dinnett et al. provide lessons learned on study closeout experience from the large multi-center study of Prospective Study of Pravastatin in the Elderly at Risk [7].

Recruitment and retention plan

Recruitment
Much has been previously written in the literature about the importance of recruitment and retention of subjects for the timely completion and success of any clinical trial [8]. The key is ultimately to have a well-defined recruitment and retention plan available at the start of the study with a realistic enrollment timeline and robust contingency plans that are implemented as soon as enrollment falls behind. While much research has been done identifying barriers to recruitment, slow enrollment continues to plague most clinical trials, including those in neurological disorders. Regardless of the funding source, most studies are being completed behind schedule by at least 1 month. Recent data suggest that as many as 80% of studies finish enrollment at least 1 month behind schedule [9]. There are few, if any, reports of studies completing enrollment ahead of schedule. Recruitment starts with site selection (see above ‘Sites selection, qualification and training’). Identifying sites with an appropriate patient pool, with a qualified and experienced investigator/study coordinator team is of utmost importance in ensuring successful recruitment.

Pharmaceutical companies and CROs have spent significant effort during the past decade or more
looking at methods to enhance recruitment, including more sophisticated advertising, targeted databases and interactive websites, educational brochures for subjects [10], better training of staff recruiting subjects, increasing the per subject fee, etc. One trend that has been noted by researchers at Tufts Center for the Study of Drug Development is the fact that protocols have become more complex over the past decade [11, 12, 13]. More frequent visits and more assessments per visit often add lots of ‘nice to have’ but not ‘need to have’ data, collectively driving up the costs of clinical research and hampering effective recruitment efforts as subject and site burden increases.

In many cases the calculation used to determine the enrollment duration is unrealistic. This includes attempting to squeeze a study timeline into a grant timeline or an overall drug development plan. In many cases, project teams determine the enrollment duration by taking the total sample size, divided by the anticipated enrollment rate (e.g. number of subjects enrolled/site/month), and divided by the total number of sites. For example, for a phase 3 Parkinson’s disease (PD) study of 600 subjects with an enrollment rate of 1 subject/site/month at 40 centers, the enrollment duration using this simplistic formula would be calculated as 15 months (600 subjects ÷ 1 subject/site/month ÷ 40 sites = 15 months). The calculation assumes that all sites will be activated (ready to start enrolling subjects, meaning all regulatory documents are in-house, IRB/IEC approval has been obtained, clinical trial agreements are in place and clinical supplies (investigational agent, laboratory kits, etc.) are available at the site) at the exact same time. Unfortunately, this is typically not the case, especially for global trials that involve sites in countries that have different timing for the submission and the review of protocol and regulatory documents.

Published data suggest average site activation of 100 days [14]. In addition to factoring in site activation time, additional time must be factored in for the delay from site activation to first enrollment, which from unpublished data has averaged an additional 83 days. Clearly more robust formulas should be developed when calculating the enrollment duration, taking into account these very real staggered site activation profiles and factoring in reduced enrollment during holiday months (e.g., late November and December and summer vacation months), thereby developing a realistic enrollment timeline from the start.

Recruitment plans need to be specific to the disease and disease stage being recruited. What works for one group of patients does not necessarily transfer to another population. Gathering data about prior enrollment rates for other randomized controlled trials can help in determining realistic enrollment rates and number of sites to include. For example, published data for newly diagnosed PD subjects suggest an enrollment rate of 0.83 subjects/site/month [15], regardless of the funding source or clinical trial infrastructure.

Recruitment of women and minorities
From 1977 to 1993, the FDA forbade early-stage testing of most medication on women of child-bearing potential for fear of causing birth defects. It was not until 1993 with the NIH Revitalization Act that NIH established guidelines for the inclusion of women and minorities in clinical trials [16, 17, 18]. In 1997, the Food and Drug Administration Modernization Act recommended inclusion and documentation of race and ethnicity and analysis thereof. While significant strides have been made in the past 15 years to include more women and minorities, there is still a significant shortage of women and minority participation in clinical trials [19, 20, 21, 22, 23, 24]. In fact, clinical trial participation rates by race for new drug applications (NDAs) submitted between 1995 and 1999 showed a distribution of 88% white, 8% black, 1% Hispanic, and 3% Asian [25]. There is still much to be done to get minority participation rates to coincide with the actual race and ethnicity distribution of the US population.

Materials used for successful enrollment of minorities should be culturally appropriate and translated into the applicable languages. Having investigator, coordinator or other study staff of the same race/ethnicity often helps to remove cultural barriers and build trust, allowing for greater enrollment of the targeted group.

While NIH grant submission forms require a clear breakdown of anticipated enrollment by race, ethnicity, and gender, few NIH studies achieve the targeted enrollment distribution. Likewise, while regulatory authorities would like to see a diverse patient population as part of the data used for an NDA or CTA, there are no regulatory requirements that insist on certain targets. Most NDA/CTA submissions also fall short of the target distribution of the racial/ethnic and gender distributions based on population census in the given region [25].

Retention
Retention starts with recruitment. Identifying the right subjects that meet all protocol inclusion and
exclusion criteria, who fully understand the purpose of their participation for the entire duration of the study and who are committed to follow the study visits and assessments as required. Experienced investigators/coordinators can often determine up front which subjects are most likely to remain in the study for the full duration and will make a decision not to enroll those subjects who, although they may meet inclusion/exclusion criteria, are not apt to remain committed for the long haul. During the site selection process it is important to gather information on site retention rates in prior studies to avoid including those sites who may be high enrollers, but who fail to follow the vast majority of subjects to study completion. Poor subject retention can have a significant negative impact on the outcome of a trial. The chance of showing efficacy can be compromised by a high premature withdrawal rate and/or a high lost to follow-up rate.

Experienced investigator/coordinator teams understand the importance of building and maintaining a strong relationship throughout the duration of the study. This includes open communication during in-person visits, regular follow-up by phone, and a true attempt to understand what the subject and their spouse/caregiver/significant other and family are going through as part of participation. Efforts to eliminate barriers to attend routine visits can help ensure continued participation. This could include: the reimbursement for costs associated with travel; pre-paid phone card; childcare; food; parking; home visits; week night and weekend clinic hours; offering a car service for pick-up/drop-off; and efforts to minimize the entire duration of study visits by keeping assessments involving other departments such as MRI, or assessments by a neuropsychologist, on time so as to avoid large gaps between assessments.

In some studies payment to subjects for time and effort may be allowed. The compensation cannot be coercive in nature, must be prorated based on visits completed, and should be done in compliance with local regulatory requirements (as the regulation may vary from one country to another). In most cases it should be reviewed and approved by the IRB/IEC and the amount and timing of the compensation is generally disclosed in the ICF.

Understanding the intensity of assessments at each protocol visit and the unique situation of each subject in the study can help an investigator/coordinator team in determining specific barriers to ongoing retention for that subject that could be eliminated, often with minimal additional effort by the study team.

The literature is rich with articles on various tools used to ensure subject retention, especially in long-term studies where in-person study visits may be infrequent, every 6 months to 1 year (e.g. birthday and anniversary cards, hand written thank you notes following visits or for special efforts, gift cards for protocol milestones, newsletters reporting ongoing status of the study, etc.) [26, 27]. Budgeting for these types of retention initiatives prior to study start is critical for their success.

The industry as a whole has been paying specific attention to recruitment and retention issues in all clinical trials. There are numerous vendors available that specialize in recruitment/retention initiatives and there are a plethora of training courses available specifically addressing this particular challenge in conducting clinical trials [28].

Retention rates in many neurological disorders are relatively high, with premature withdrawal rates about 10–20% [29, 30]. The sample size calculations must include contingencies for the anticipated premature withdrawal rate to ensure that enough subjects complete the full study for the analysis to be meaningful.

Quality management
Inspections to confirm GCP at investigator sites have been occurring for many years. Originally driven by the FDA, the environment has been changing over the last decade as not only the FDA, but also other regulatory agencies from Europe, Japan, and other countries are now routinely conducting GCP site inspections. It is therefore critical for sites to get prepared for such audits.

Preparing for inspection
Local regulatory agencies can perform GCP inspections as part of a national surveillance program of clinical trials, coordinated by the EMEA or the FDA before marketing authorization. As opposed to the FDA, the EMEA does not employ any full time inspector, but appoints an inspection team by gathering inspectors from two or three different European countries. For inspections related to marketing authorization, the focus of the site inspection will be primarily on the integrity and validity of the data, on the ethical standards, adherence to the protocol, training and qualifications of the study personnel as well as on specific issues that could be identified during the review process of the application dossier/NDA. These inspections can be announced but are frequently unannounced.
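As a rough illustration of the planning arithmetic from the recruitment and retention discussion in this chapter, the Python sketch below reproduces the simplistic enrollment-duration formula (600 subjects, 1 subject/site/month, 40 sites), contrasts it with a month-by-month count in which sites start enrolling at different times, and inflates a sample size for premature withdrawal. The function names, the deterministic monthly model, the conversion of the 100-day activation and 83-day first-enrollment delays into roughly 6 months, and the "completers ÷ (1 − withdrawal rate)" inflation rule are all assumptions made for illustration; they are not formulas given in the text.

```python
def naive_enrollment_months(sample_size, rate_per_site_month, n_sites):
    """Simplistic formula: every site enrolls at the full rate from month 1."""
    return sample_size / (rate_per_site_month * n_sites)


def staggered_enrollment_months(sample_size, rate_per_site_month, start_delays):
    """Months until expected cumulative enrollment reaches sample_size.

    start_delays[i] is the assumed number of whole months that pass before
    site i enrolls its first subject (activation delay plus delay from
    activation to first enrollment). A site with delay d contributes
    subjects in months d + 1, d + 2, and so on.
    """
    if not start_delays or rate_per_site_month <= 0:
        raise ValueError("need at least one site and a positive rate")
    month = 0
    enrolled = 0.0
    while enrolled < sample_size:
        month += 1
        # Count the sites whose start delay has fully elapsed.
        active_sites = sum(1 for delay in start_delays if delay < month)
        enrolled += active_sites * rate_per_site_month
    return month


def inflate_for_withdrawal(completers_needed, withdrawal_pct):
    """Subjects to enroll so that completers_needed are expected to finish.

    Assumed rule: enrolled = completers / (1 - withdrawal rate), rounded up.
    withdrawal_pct is an integer percentage (e.g. 20 for 20%).
    """
    if not 0 <= withdrawal_pct < 100:
        raise ValueError("withdrawal_pct must be between 0 and 99")
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-completers_needed * 100 // (100 - withdrawal_pct))


if __name__ == "__main__":
    print(naive_enrollment_months(600, 1.0, 40))
    # Hypothetical: all 40 sites need about 6 months (100 + 83 days) to start.
    print(staggered_enrollment_months(600, 1.0, [6] * 40))
    print(inflate_for_withdrawal(600, 20))
```

Under these assumptions the same 600-subject study takes 21 months rather than 15 once a uniform 6-month start-up delay is included, and a 20% anticipated withdrawal rate raises the enrollment target from 600 to 750. A per-site list of delays, or a lower rate in holiday months, could be substituted to explore less uniform scenarios.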
The specific items that may be checked during a GCP inspection [31] include but are not limited to:
• Legal and administrative aspects: communication with the IRB/IEC and local regulatory bodies, to ensure that IEC/IRB and local regulatory approvals were obtained for the protocol and its amendment(s) and ICF before the study start. Additionally, that all subjects enrolled in the study signed the ICF prior to any study assessments being completed and any ICF amendments prior to conducting new protocol requirements dictated by the protocol amendment(s).
• Organizational aspects: documentation of delegation of responsibility by the PI, staff qualification, CVs, responsibilities, experience, availability of PI and site personnel, training program and training records, SOPs, contract between the sponsor and the investigator as it relates to delegation of responsibility but not the budget component.
• Facility and equipment: proper use, adequacy and validation of procedures and equipment (including documentation of routine calibration) used for the conduct of the trial.
• Management of biological samples: conditions of collection, storage and shipment, shipping documentation.
• Organization of the documentation: general documentation available, signed, dated and filed on site, trial subject’s documents available including source documents, ICF documents, CRFs.
• Monitoring and audit: review of signed and dated ICF, ICF approval documentation by IRB/IEC, and documentation of the consent process.
• Trial subject data: study conducted according to the approved protocol, source data verification, corrections of CRF data according to ICH/GCP.
• Characteristics of subjects included: accuracy of eligibility criteria as compared to source data, documentation of protocol violations.
• Efficacy and safety assessments: consistency between CRFs and source documents.
• Concomitant therapies: managed according to protocol requirements and recorded in the source documents and in the CRF.
• Management of IP: review of shipping records, drug or device labels, IP accountability documents, destruction (if applicable), treatment compliance, storage conditions, randomization procedures (IRT), and unblinding.

The most common findings from FDA or EMEA GCP inspections of investigator sites are related to the management of IMP, trial management and study oversight, essential documents, delegation of tasks and functions, as well as qualification and training of investigators and site personnel [32].

Monitoring of quality
Sponsors are required to have an internal quality assurance system which includes quality control and auditing [33]. Generally independent from the research and development team, the audit or quality assurance groups routinely perform audits of investigator sites, vendors, clinical trial processes and systems, trial master files, but also conduct pre-inspection activities. In biopharmaceutical companies audit activities are usually planned before the protocol starts. The auditors and the project team generally agree on an audit plan which includes a minimum number of sites to be audited. This plan can be revised during the execution of the study based on the performance of the sites both in terms of recruitment and quality. However, as clinical trials may involve dozens or even sometimes hundreds of sites, only a small proportion are ultimately selected for an audit. It is therefore critical that all sites and institutions running clinical trials get prepared and develop quality management tools to maximize compliance and quality in the execution of clinical trials.

Sites, institutions, and project teams from the sponsor can address and limit most deficiencies found on audits by developing simple tools such as checklists, or ongoing quality checks based on the identification of trends.

Document checklist: The following items are examples of critical site level documents that can be tracked for filing in the sponsor or institution central repository or Trial Master File:
• updated and signed CVs of investigators, sub-investigators and other key site personnel involved in subject assessments
• investigator’s meeting documentation and materials
• statement of investigator (FDA form 1572)
• certification of disclosure of financial interest
• request for shipment of clinical supplies
• confirmation of receipt of clinical supplies
• completed drug dispensing and inventory records
• IRB/IEC approved ICF
• IRB/IEC approved protocol and amendment(s)
• IRB/IEC composition/organization
• IRB annual report
• delegation of site responsibilities and signatures record
• sponsor approved protocol signed and dated by the PI
• acknowledge receipt of Investigator’s Brochure and Safety Reports
• site monitoring trip reports
• site monitoring sign-in-log
• site monitoring correspondences (including documentation on closure of outstanding monitoring issues)
• site monitoring visit follow-up letters
• subject master list

Analysis of trends: the systematic and regular review of specific items can also help the sponsor to monitor the overall quality of the study, and to address potential gaps. For example, the regular review and analysis of monitoring issues reported in site monitoring reports can be very useful to identify recurrent site issues and determine the needs for additional training for sites or CRAs, or the needs for vendors to improve their performance. Reports on subject screening, enrollment, retention, timeliness of data entered into the database, number of queries and time to resolution of queries can provide overall site performance information. Reports looking at overall safety lab trends and overall adverse events that are reviewed regularly by medical staff can provide early safety signals in a blinded fashion.

Compliance and quality management by both sites and sponsors requires constant attention from all team members from conception to final regulatory reporting. The progress of a trial must be assessed by performance metrics but also by developing and generalizing quality assessment measures, as high quality data and compliance with the current regulations and current processes could make the regulatory approval process smoother and could therefore lead to the drug being available on the market sooner.

Database management and lock
When planning for a clinical trial the team must agree early enough on the technology to be used as this will shape the timeline for the project. The use of advanced electronic data capture (eDC) technology has drastically changed the clinical data management area, by expediting the retrieval and clean-up of data and therefore saving time and money on the data management process. Where it could take up to several months with paper CRF studies between the Last Patient Last Visit (LPLV) and the freeze of the database, it now takes a couple of days or a couple of weeks with eDC technology.

Whether the clinical trial uses paper CRF or eDC system, the investigational sites, the project manager and data management staff play a key role in ensuring the database is completed on time. Most sponsors require that clinical data be entered in the CRF within 2 days of the subject’s visit. Site monitors make sure that site personnel are adequately trained in methods to enter clinical data in CRF fields. It is also expected that data management staff edit and clean up the data on an ongoing basis to address missing data or data discrepancies. The project manager and data management group are responsible for closely monitoring the status of the database both for timeliness and quality of data. The project manager usually holds regular meetings with clinical data management representatives (especially when approaching database finalization or interim analysis) to address potential issues as they arise.

Routine study closure and early termination
The early termination of a study must be done in an orderly manner and plans should be put in place at the start of the study outlining the procedure that would be followed in the event of an early termination of a study [34, 35]. This will avoid potentially missing notification to key players and avoid the chaos often associated with the early termination of studies.

The steps followed to complete a routine close out of a study are delineated in ‘Routine study closure’ (see above) and are the same things that need to be completed in the event of early termination. In the event that a study is terminated early regardless of the reason (safety, futility, etc.), each actively participating subject must be notified that the study has been stopped prematurely, including the reason why the study has been stopped. Each subject should be given instructions for stopping the use of the IP, plans for return of the IP, and scheduling of a final visit for each study subject.
Depending on the local regulation and the reason for early termination, expedited reporting to the IRB/IEC and/or local Board of Health may be required.
Following the notification of the participating study subjects, all of the other study team members, including the various vendors, must be informed of the early termination of the study, along with instructions for the orderly conclusion of their services and the contract issues associated with the early termination.
Clinical trial reporting
Reporting the results of clinical trials is not just the publication of a peer-reviewed manuscript. There are several important parts, each serving different purposes, as defined further below. The order in which reporting occurs is important. Delineating a comprehensive reporting plan at the start of the study, encompassing all six elements described below, is essential to ensuring the timeliness, completeness, and transparency of reporting trial results, regardless of the study outcome.
US Securities and Exchange Commission (SEC) and equivalent reporting requirements
According to Section 13 or 15(d) of the Securities Exchange Act of 1934, publicly traded companies must report material corporate events on a more current basis, beyond the required standard quarterly and annual reports [36]. Material corporate events are those events that may affect a company’s stock price. Examples of material corporate events include the results (positive or negative) of a completed clinical trial and the stopping of a study for safety concerns, futility, lack of enrollment, etc. Information is submitted to the SEC using Form 8-K within 4 business days of a material corporate event [37, 38]. Regulation FD provides that when an issuer discloses material non-public information to certain individuals or entities – generally, securities market professionals, such as stock analysts, or holders of the issuer’s securities who may well trade on the basis of the information – the issuer must make public disclosure of that information. In this way, the new rule aims to promote full and fair disclosure [39].
Securities and Exchange Commission reporting requirements supersede the requirements of disclosing study results, or the halting of an ongoing clinical trial, to the research participants, site investigators, and coordinators. Most publicly traded companies will file Form 8-K and in addition issue a press release. Immediately upon meeting the SEC disclosure requirements, critical players in the clinical trials process, the participants, and the site investigators and coordinators, should be informed.
Notification of study results to research participants and site investigators and coordinators
While research participants are essential to the conduct of any clinical trial, they are often the last group to be made aware of study results. Following any SEC and comparable non-US financial reporting requirements, the subjects, caregivers, and the sites that participated should be the next group that is formally made aware of study results, including any therapeutic and/or public health recommendations. In one clinical trial in Huntington’s disease, the investigators used a three-part communication plan to disseminate the study results: 1) a media release from the principal investigators posted on the Huntington’s disease website and emailed to the Huntington’s community; 2) a telephone call from the site investigator or coordinator at each site to the participants providing the results and next steps; and 3) a joint teleconference for the investigators, sponsors, research participants, and caregivers to listen to the results and ask questions [40]. Other means for disseminating study results include having each site send a letter to each of its research participants with the results in lay language, the name and phone number of a contact person to call with questions, the participant’s actual treatment assignment, and copies of the published abstract(s) and/or manuscript(s).
IRB/IEC Notification
Pursuant to 21 CFR parts 312.64 (d) (Investigator reports), 312.66 (Assurance of IRB review) and 56.109 (2) (f) (IRB review of research), investigators must inform the IRB of any changes to an ongoing study, which includes notifying the IRB when a study has been completed. This relieves the IRB of its obligation of continuing review of a research protocol that has been completed. Most institutional and for-profit IRBs have clear procedures for what must be submitted to an IRB at the conclusion of a study and what the IRB defines as a completed study.
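The reporting plan called for in this section can be sketched as a simple checklist. The Python fragment below is illustrative only: the element names are hypothetical labels derived from this section's headings, and the single ordering rule it checks (SEC disclosure before notification of participants and sites) reflects the sequence recommended above for publicly traded sponsors. It is a sketch under those assumptions, not a regulatory tool.

```python
from datetime import date

# Hypothetical labels for the six reporting elements, following this
# section's headings (not an official vocabulary).
REPORTING_ELEMENTS = [
    "sec_disclosure",
    "participant_and_site_notification",
    "irb_iec_notification",
    "regulatory_submission",
    "peer_reviewed_publication",
    "registry_results",
]


def check_reporting_plan(completed):
    """Return a list of problems found in a record of completed reporting steps.

    `completed` maps an element name to the date on which it was done.
    """
    # Any element with no recorded date is reported as missing.
    problems = [
        f"missing: {name}" for name in REPORTING_ELEMENTS if name not in completed
    ]
    # Ordering rule taken from the text: SEC disclosure comes first, and
    # participants and sites are informed immediately afterwards.
    sec = completed.get("sec_disclosure")
    notified = completed.get("participant_and_site_notification")
    if sec is not None and notified is not None and notified < sec:
        problems.append("participants/sites notified before SEC disclosure")
    return problems


if __name__ == "__main__":
    # Made-up dates, for illustration only.
    record = {
        "sec_disclosure": date(2012, 3, 2),
        "participant_and_site_notification": date(2012, 3, 1),
        "irb_iec_notification": date(2012, 3, 15),
    }
    for problem in check_reporting_plan(record):
        print(problem)
```

A sponsor that is not publicly traded would presumably drop the first element; that variation is left out to keep the sketch short.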
In Europe, according to Article 10(c) of Directive 2001/20/EC, the sponsor of a clinical trial must notify the competent authority of the member state(s) concerned that the clinical trial has ended. This end of clinical trial notification must be submitted within 90 days of the end of the clinical trial as defined in the protocol. Should the trial be prematurely terminated, the notification must be submitted expeditiously (within 15 days). The sponsor is also required to file an end of clinical trial notification when the sponsor decides not to commence or not to resume (after a hold) a clinical trial. In such a case, it is not required to follow the expedited reporting process.
Regulatory submissions
While many aspects of the reporting requirements are similar, there are some differences across countries and regions. With the emergence of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH) starting in April 1990, significant effort has been made to bring together the regulatory authorities of Europe, Japan, and the US, and experts from the pharmaceutical industry in the three regions, to discuss scientific and technical aspects of product registration and streamline the regulatory submission process in these regions, including the submission of Clinical Study Reports (CSR) at the conclusion of a study [41, 42]. While significant progress has been made with harmonizing the process within these three regions, there are still distinct differences even within these three regions, let alone in the rest of the world. A few of the differences are addressed below. Before embarking into new regions it is critical that regulatory experts from the given region are included early in the process to ensure all post-study reporting requirements are completed.
US Food and Drug Administration
21 CFR part 312.33 (Annual Reports) requires annual investigational new drug (IND) updates within 60 days of the anniversary of the date that the IND application went into effect. This annual report includes detailed reporting of the outcome of completed studies, via the submission of a CSR as delineated in 21 CFR part 312.33 (3) and ICH E3 (Structure and Content of Clinical Study Reports) [42].
European Medicines Agency
Similar to the reporting requirements in the US, the EMEA also requires the submission of a CSR, but the EMEA regulations require the CSR be submitted to the competent authority of each member state within 1 year of the end of clinical trial notification.
Health Canada
Similar to the US IND reporting requirements, the Therapeutic Products Directorate (TPD) of the Health Products and Food Branch (HPFB) within Health Canada requires the submission of a clinical trial application (CTA) before conducting research in Canada (C.05.006). A CTA is specific to a given protocol, rather than to a development program for a compound for a given indication (e.g. an IND in the US). Unlike in the US, an annual report does not need to be submitted for a CTA, but an updated Investigator’s Brochure must be submitted annually. Additionally, Health Canada must be notified within 15 days of the completion of a clinical trial (C.05.007) or the premature termination of the trial (C.05.015.(1)) [43]. These notifications are submitted via a cover letter and any supporting documentation.
Other countries
All countries have specific regulations relating to conducting clinical trials and their ultimate reporting. Understanding those requirements prior to study launch is critical to keeping a study on track. There are numerous companies that specialize in offering regulatory support of clinical research that can often help navigate the process, thereby avoiding missteps for first-time studies in countries where the investigator/sponsor does not have prior regulatory experience.
The regulatory reporting requirements for gaining regulatory approval to market new drugs, via the submission of the electronic Common Technical Document (eCTD) in the various countries and regions, are beyond the scope of this book. Readers seeking more information on this topic should review 21 CFR part 314 (Applications for FDA to Market a New Drug), ICH regulations [44], and other applicable regulations in the country of interest.
Peer-reviewed publications
Failure to publish an adequate account of a well-designed clinical trial is often regarded as a form of scientific misconduct [45]. Reporting is essential to evidence-based medicine. Reporting has many venues, but peer-reviewed original contributions in journals remain the highest form of data dissemination. Publication of clinical trial results should follow the reporting
requirements as outlined in the Consolidated Standards of Reporting Trials (CONSORT) [46]. CONSORT has become the standard for reporting clinical trials, including a 22-item checklist and a diagram for documenting the flow of participants through the four stages of a clinical trial: enrollment, intervention allocation, follow-up, and analysis. Most major journals require that CONSORT standards be followed. Knowing the requirements of CONSORT will help ensure that critical data about the study cohort are complete, accurate, balanced, devoid of bias, and published according to the pre-specified outcome. Ultimately, including the data elements from CONSORT allows readers to assess the validity of the results and allows for greater ease of comparison between clinical trials. Even with CONSORT in place, reporting of RCTs is not always complete [47].
Transparency in publication has become of utmost importance, not only in presenting all of the data collected in an accurate and comprehensive manner but also in authorship contributions. Full disclosure of authorship contributions, including those of sponsors or any medical writers that have been hired, has become necessary to avoid the practice of ghostwriting that emerged in the mid-2000s [48].
To keep trial documents compliant and transparent, publication planning guidelines from organizations such as the International Society for Medical Publication Professionals (www.ismpp.org), the American Medical Writers Association (www.amwa.org), and the Pharmaceutical Research and Manufacturers of America (www.phrma.org/publications) can be used.
In 2004, the International Committee of Medical Journal Editors (ICMJE) published its clinical trials registration policy requiring the prospective registration (prior to first patient enrolled) of all interventional studies in order for ICMJE journals to even consider publishing the results of a study. The ICMJE accepts registration in several registries including the US registry (www.clinicaltrials.gov), AUS/NZ registry, ISRCTN, Japan Registry, Netherlands registry, and any of the primary registries that participate in the WHO International Clinical Trial Portal. The policy applies to any trial that started recruitment on or after July 1, 2005 (see May 2005 editorial and Frequently Asked Questions for details of the current ICMJE policy including the definition of applicable trials, acceptable registries, timing of registration, and required data items) [49, 50].
Publication bias exists in favor of significant, ‘positive’ results and larger, multi-center, NIH-sponsored trials. There are many sources of this bias including the decision by investigators or journal editors not to publish negative or seemingly uninteresting results. Conflicts of interest exist at both the sponsor level and the investigator level when it comes to publications [51]. Additionally, the publication rate of abstracts and summaries exceeds that of full reports, creating media sampling bias. More stringent conflict-of-interest disclosures by ICMJE, related to financial relationships between sponsors and authors, will likely evolve in the coming years.
Government registry: Basic results reporting requirements
Currently there are about two dozen international clinical trial registries available. The majority of these registries are voluntary, while a few, like the one in the US (www.clinicaltrials.gov), are mandatory. Many registries are set up to adhere to the World Health Organization (WHO) registry requirements for content, quality and validity, accessibility, unique identification, technical capacity, and administration. WHO Primary Registries meet the requirements of the ICMJE (http://www.who.int/ictrp/network/primary/en/index.html).
The clinical trials registry in the US was prompted by policy makers via the FDA Modernization Act of 1997 (FDAMA 1997) to increase clinical trial transparency through the public disclosure of key information about clinical trials. Title VIII of the Food and Drug Administration Amendments Act of 2007 (FDAAA; US Public Law 110–85) requires study sponsors or investigators not only to register their trials prior to the first subject being enrolled but also to report/post study summary results on the www.clinicaltrials.gov registry within one year after the actual or estimated completion date, whichever is earlier (http://prsinfo.clinicaltrials.gov). Completion date is defined in the legislation as the date that the last patient in a trial is evaluated for the primary outcome. This leaves very little time for investigators and sponsors to get a peer-reviewed manuscript in the public domain before needing to post the summary results on www.clinicaltrials.gov. The ICMJE does not consider results data posted in the tabular format required by ClinicalTrials.gov to be prior publication [49]. As the regulations regarding clinical trial registration and result posting continue to evolve, reviewing the current regulations at the time of starting a clinical trial
will help ensure appropriate clinical trial reporting via the registry(ies) at the conclusion of the study [52].
Conclusion
A therapeutically focused team with clinical experts (investigator sites and sponsor clinical team) as well as a proficient and efficient project management team is key to the successful conclusion of a clinical study. Enhanced communication between well-trained key players, well-defined processes and procedures, and appropriate oversight by the sponsor throughout the project life are also prerequisites.
One should not forget that an objective without a plan is just a wish; the team should therefore get involved very early in thorough planning that is flexible enough to allow adaptation to emerging situations.
References
1. ICH Topic E6 (R1). Guideline for Good Clinical Practice. 2009. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf. (Accessed August 9, 2011.)
2. EudraLex – Volume 10 Clinical Trials Guidelines. Detailed guidance on the application format and documentation to be submitted in an application for an Ethics Committee opinion on the clinical trial on medicinal products for human use. http://ec.europa.eu/health/files/eudralex/vol10/12_ec_guideline_20060216_en.pdf (Accessed August 8, 2011.)
3. FDA Guidance Document. Investigator Responsibilities – Protecting the Rights, Welfare and Safety of Study Subjects. 2009. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073122.pdf. (Accessed August 8, 2011.)
4. FDA Guidance: Guideline for the monitoring of Clinical Investigations. Docket Number 82D-0322. 1988.
5. ICH E6 – Good Clinical Practice: Consolidated Guideline. 1996. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM073122.pdf. (Accessed August 8, 2011.)
6. CDISC, Electronic Source Data Interchange (eSDI) Group. Leveraging the CDISC Standards to Facilitate the use of Electronic Source Data Within Clinical Trials, Version 1. 2006. http://www.cdisc.org/esdi-document. (Accessed August 8, 2011.)
7. Dinnett EM, Mungal M, Kent JA, et al. Closing out a large clinical trial: lessons from the Prospective Study of Pravastatin in the Elderly at Risk (PROSPER). Clin Trials 2004; 1: 545–52.
8. Kamp C and Shinaman A. Participant recruitment and retention in clinical trials. In: Dunn C and Chadwick G (eds). Protecting Study Volunteers in Research: A Manual for Investigative Sites. Boston, MA, CenterWatch. 2002.
9. The Center for Information & Study on Clinical Research Participants (CISCRP). Clinical Trial Facts and Figures for Health Professionals. http://www.ciscrp.org/professional/facts.htm. (Accessed August 8, 2011.)
10. Clinical Research Educational Material. http://www.ciscrp.org/professional/store/index.html. (Accessed August 8, 2010.)
11. Getz K, Wenger J, Campo R, et al. Assessing the impact of protocol design changes on clinical trial performance. Am J Ther 2008; 15: 450–7.
12. Getz K. First things first: patient recruitment. Scrip 2008; 3371: III–IV.
13. Getz K, Campo R, and Kaitin K. Variability in protocol design complexity by phase and therapeutic area. Drug Inf J 2011; 45: 413–20.
14. Li G. Site Activation: The Key to More Efficient Clinical Trials. Pharmaceutical Executive. 2008. http://license.icopyright.net/user/viewFreeUse.act?fuid=NjU3OTMwMA%3D%3D. (Accessed August 8, 2011.)
15. Kamp C, Shinaman A, Kieburtz K, et al. Do clinical trial infrastructures impact enrollment rates and baseline demographics in early Parkinson’s disease (PD) studies? Mov Dis 2004; 19(Suppl 9): S198.
16. Fortune T, Wright E, Juzang I, et al. Recruitment, enrollment and retention of young black men for HIV prevention research: experiences from The 411 for Safe Text Project. Contemp Clin Trials 2009; 31: 151–6.
17. NIH Revitalization Act of 1993. U.S. Congress Public Law 103–43. National Institute of Health Revitalization Amendment. Washington, DC. 1993. http://grants.nih.gov/grants/funding/women_min/guidelines_amended_10_2001.htm. (Accessed August 8, 2011.)
18. Food and Drug Administration. Guideline for the Study and Evaluation of Gender Differences in the Clinical Evaluation of Drugs, Notice. Fed Reg 1993; 58: 39405–16.
19. Meinert CL, Gilpin AK, Unalp A, et al. Gender representation in trials. Control Clin Trials 2000; 21: 462–75.
20. Vidaver RM, Lafleur B, Tong C, et al. Women subjects in NIH-funded clinical research literature: lack of progress in both representation and analysis by sex. J Womens Health Gend Based Med 2000; 9: 495–504.
21. Merkatz RB and Junod SW. Historical background of changes in FDA policy on the study and evaluation of drugs in women. Acad Med 1994; 69: 703–7.
22. McCarthy CR. Historical background of clinical trials involving women and minorities. Acad Med 1994; 69: 695–8.
23. Wermeling DP and Selwitz AS. Current issues surrounding women and minorities in drug trials. Ann Pharmacother 1993; 27: 904–11.
24. Killien M, Bigby JA, Champion V, et al. Involving minority and underrepresented women in clinical trials: The National Centers of Excellence in Women’s Health. J Womens Health Gend Based Med 2000; 9: 1061–70.
25. Evelyn B, Toigo T, Banks D, et al. Participation of racial/ethnic groups in clinical trials and race-related labeling: a review of new molecular entities approved 1995–1999. J Natl Med Assoc 2000; 93(12 Suppl): 18S–24S.
26. Levkoff S and Sanchez H. Lessons learned about minority recruitment and retention from the Centers on Minority Aging and Health Promotion. Gerontologist 2003; 43: 18–26.
27. Stahl SM and Vasquez LJ. Approaches to improving recruitment and retention of minority elders participating in research: examples from selected research groups including the National Institute on Aging’s Resource Centers for minority Aging Research. Aging Health 2004; 16(Suppl 5): 9S–17S.
28. Drug Information Association (DIA) Training Courses. http://www.diahome.org/DIAHome/Search/UrlListEO.aspx#Training%20Course. (Accessed August 8, 2011.)
29. Galpern W. The NINDS NET-PD Investigators. A pilot clinical trial of creatine and minocycline in early Parkinson Disease: 19-month result. Clin Neuropharmacol 2008; 31: 141–50.
30. Holloway R. Parkinson Study Group. Pramipexole vs. levodopa as initial treatment for Parkinson disease: A 4-year randomized controlled trial. Arch Neurol 2004; 61: 1044–53.
31. Procedure for Preparing GCP Inspection Requested by the EMEA. 2007. http://www.ema.europa.eu/docs/en_GB/document_library/Regulatory_and_procedural_guideline/2009/10/WC500004455.pdf. (Accessed August 8, 2011.)
32. European Medicines Agency Annual Report of Good Clinical Practices Inspectors Working Group. 2009. http://www.ema.europa.eu/docs/en_GB/document_library/Annual_report/2010/04/WC500089199.pdf. (Accessed August 8, 2011.)
33. ICH Topic E6 (R1). Guideline for Good Clinical Practice. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2009/09/WC500002874.pdf. (Accessed August 8, 2011.)
34. Shepherd R, Macer JL, and Grady D. Planning for closeout – from Day One. Contemp Clin Trials 2008; 29: 136–9.
35. The Center for Information & Study on Clinical Research Participants (CISCRP). Unanticipated Closing of a Clinical Trial. http://www.ciscrp.org/downloads/articles/CISCRP_Article_UnanticipatedClosingofTrial.pdf (Accessed August 8, 2011.)
36. US Securities and Exchange Act of 1934. http://www.sec.gov/about/laws/sea34.pdf. (Accessed August 8, 2011.)
37. US Securities and Exchange Commission. Form 8-K. http://www.sec.gov/answers/form8k.htm (Accessed August 8, 2011.)
38. US Securities and Exchange Commission. Fair Disclosure Regulation, FD. http://www.sec.gov/answers/regfd.htm. (Accessed August 8, 2011.)
39. US Securities and Exchange Commission. Final Rule: Selective Disclosure and Insider Trading. http://www.sec.gov/rules/final/33-7881.htm. (Accessed August 8, 2011.)
40. Dorsey ER, Beck C, Adams M, et al. Communicating clinical trial results to research participants. Arch Neurol 2008; 65: 1590–5.
41. ICH-E3. Structure and Content of Clinical Study Reports. http://www.ich.org/products/guidelines/efficacy/efficacy-single/article/structure-and-content-of-clinical-study-reports.html. (Accessed August 8, 2011.)
42. FDA Guidance. Structure and Content of Clinical Study Report. http://www.fda.gov/downloads/RegulatoryInformation/Guidances/UCM129456.pdf. (Accessed August 8, 2011.)
43. Health Canada. Guidance for Clinical Trial Sponsors. Clinical Trial Application. 2001. http://www.hc-sc.gc.ca/dhp-mps/alt_formats/hpb-dgpsa/pdf/prodpharma/ctdcta-ctddec-eng.pdf. (Accessed August 8, 2011.)
44. ICH M4: The Common Technical Document. http://www.ich.org/products/ctd.html. (Accessed August 8, 2011.)
45. Chalmers I. Underreporting research is scientific misconduct. JAMA 1990; 263: 1405–8.
46. JAMA 1996; 276: 637–9. Updated Ann Intern Med 2001; 134: 663–94; www.consortstatement.org.
47. Toerien M, Brookes ST, Metcalfe C, et al. A review of reporting of participant recruitment and retention in RCTs in six major journals. Trials 2009; 10: 52.
48. PLoS Medicine Editors. Ghostwriting: the dirty little secret of medical publishing that just got bigger. PLoS Med 2009; 6: e1000156.
49. International Committee of Medical Journal Editors. Frequently Asked Questions About Clinical Trial Registration. http://www.icmje.org/faq_clinical.html. (Accessed August 8, 2011.)
50. Collier R. Prevalence of ghostwriting spurs calls for transparency. CMAJ 2009; 181: E161–2.
51. Dunn C and Chadwick G (eds). Protecting Study Volunteers in Research: A Manual for Investigative Sites. Boston, MA, Thompson CenterWatch. 2002.
52. ClinicalTrials.Gov. Protocol Registration System. http://prsinfo.clinicaltrials.gov/. (Accessed August 8, 2011.)
Section 7
Clinical trial planning and implementation
Chapter 29
Academic-industry collaborations and compliance issues

D. Troy Morgan
Introduction
The demand for clinical research has increased dramatically in recent years. Total spending on medical research in the US has doubled over the past decade to nearly $95 billion a year [1]. While this dramatic expansion has created a vast array of opportunities for clinical researchers, there has also been an unprecedented rise in regulatory and compliance obligations for sponsors and investigators alike. The public and lawmakers have become extremely skeptical of industry-sponsored research due to concerns of potential bias and inducement. Over the last 25 years, drug and device makers have displaced the federal government as the primary source of research financing, and this industry support has become vital to many university research programs [2]. However, industry relationships with physicians and academic medical institutions are under intense scrutiny and will become even more challenging in the future.
We cannot live in a nation where drug companies are less than candid, hide information and attempt to mislead the public. When they manipulate or withhold data to hide or minimize findings about safety and/or efficacy they put patient safety at risk.
US Senator Charles Grassley [3]
Although recent enforcement actions and headlines have given rise to some areas of concern, the future of breakthrough therapies depends on the successful collaboration of industry and clinical researchers. This chapter is intended to serve as an introduction to the compliance landscape associated with conducting clinical research in today’s challenging environment.
Clinical trial fair market value and enforcement trends
Clinical trials have taken center stage in recent enforcement activity and government inquiry. Regulators such as the Department of Justice (DOJ) have clearly signaled that their focus is on industry-physician financial relationships. In response to recent allegations that researchers were failing to disclose payments from pharmaceutical companies, such as a world-renowned Harvard psychiatrist who failed to disclose $1.6 million in consulting fees [3], a US Senate committee led by Senator Charles Grassley launched several investigations into the nature of financial arrangements between industry and researchers. The focus of these investigations was to determine whether the payments made to the physicians were in any way excessive (e.g., above fair market value) or inappropriate.
The Centers for Medicare & Medicaid Services (CMS) defines ‘fair market value’ as the compensation that would be included in a service agreement as the result of bona fide bargaining between well-informed parties to the agreement who are not otherwise in a position to generate business for the other party [4]. Because regulators provide little guidance on how fair market value compensation should be determined, sponsors and investigators must rely on their own methodologies. A range of methods to set compensation for clinical research has emerged, including: 1) an institution’s own past practice; 2) compensation surveys; 3) the use of independent third parties to conduct fair market value assessments; 4) benchmarks such as the Medicare reimbursement rate for a given procedure; and 5) a combination of these methods or other methods altogether [5].
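As a rough illustration of the benchmarking method (item 4 above), the following Python sketch compares proposed per-procedure payments with a reference rate. The class and function names, the dollar amounts, and the 20% tolerance are all hypothetical assumptions chosen for illustration; neither this chapter nor the regulators prescribe a specific threshold.

```python
from dataclasses import dataclass


@dataclass
class BudgetLine:
    procedure: str
    proposed_payment: float  # per-procedure payment in the draft budget
    benchmark_rate: float    # reference rate, e.g. a Medicare reimbursement rate


def flag_above_benchmark(lines, tolerance=0.20):
    """Return the procedures whose proposed payment exceeds the benchmark
    by more than `tolerance` (an illustrative, assumed margin)."""
    flagged = []
    for line in lines:
        limit = line.benchmark_rate * (1 + tolerance)
        if line.proposed_payment > limit:
            flagged.append(line.procedure)
    return flagged


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    draft_budget = [
        BudgetLine("MRI", 1500.0, 1000.0),
        BudgetLine("ECG", 55.0, 50.0),
    ]
    print(flag_above_benchmark(draft_budget))
```

A flagged line is not necessarily inappropriate; in this sketch it only marks a payment whose justification should be documented.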
Every payment set forth in a clinical trial budget agreement should represent the fair market value for the services rendered and must not be determined in any manner that takes into account the volume or value of any referrals or business otherwise generated between the investigator and the sponsor.
• Payments should be based on the actual work performed, using a transparent method that takes into account the core activities or elements necessary for each type of clinical study the company conducts;
• Payments should not be based on opportunity cost; even if such a fee seems reasonable, it may not be acceptable since it is not based on the actual services performed;
• Hourly rates should be determined based on objective criteria such as training, specialty, research experience, type of work being performed, and other factors, including a basis for increasing or decreasing base rates.
It is important for sponsors and investigators to document every transaction in order to ensure the integrity of the research and to avoid even the appearance of inducement. The approach for determining fair market value fees for clinical trials described above meets these requirements: the process is transparent; it is based on the actual activities performed by consultants conducting a company’s clinical research; and it establishes hourly rates based on objective factors. Sponsors and investigators should use these strategies as a risk mitigation effort to ensure that their transactions are defensible against regulatory scrutiny and government inquiry.
Emerging disclosure and transparency requirements
US physician payments Sunshine Act (Sunshine Act)
The US Physician Payments Sunshine Act (Sunshine Act) was signed into law on March 23, 2010 as part of the Patient Protection and Affordable Care Act. The Sunshine Act requires manufacturers of drugs, devices, biologics, and medical supplies covered under Medicare or Medicaid to report payments on an annual basis to the Department of Health and Human Services (HHS). Manufacturers must begin recording all transfers of value on January 1, 2012. The information must be reported to HHS by March 31, 2013 and continue on an annual basis. In turn, HHS will post this information on a searchable public database, which is scheduled to be available on September 30, 2013. The database will contain the name, business address, specialty, and National Provider Identifier of the covered recipient. Manufacturers will also report the amount and date of payment, the form of payment (cash or cash equivalent, in-kind items or services, stock, stock options, or ownership interest or dividend), and the nature of the payment. If the payment is related to marketing, education, or research specific to a covered drug, device, biologic, or medical supply, the name of the product must also be reported [6].
The Sunshine Act is part of a growing body of global ‘aggregate spend’ legislation intended to collectively address the following: 1) increased transparency with regard to payments made to health care providers by industry; 2) statutory reporting from pharmaceutical manufacturers for said payments; and 3) monitoring and regulating spend per physician.
Aggregate spend is the total, cumulative amount spent by companies on individual health care professionals and organizations through consulting fees, grants, honoraria, travel, and other consideration. The health care reform law requires that payments and transfers of value to health care providers such as physicians, physician groups, and teaching hospitals be disclosed, whether cash or in-kind. All of these instances must be aggregated into an electronic form, along with the physicians’ National Provider Identifier (NPI), and submitted to the federal government annually.
The Sunshine Act is one of the more demanding disclosure regulations for the industry because any transfer of value, with some minor exceptions for amounts under $10, needs to be reported annually and will be made available to the public. This level of reporting and transparency is unprecedented and will not only require a tremendous amount of work, but will also be visible to the media, competitors, regulatory agencies, and others. Compensation such as investigator meeting fees, meals and accommodations, business courtesies, and other things of value such as leased equipment must be included [7]. In addition to this new federal law, individual states have also adopted their own tracking and reporting requirements. The federal government included a clause in the Sunshine Act to indicate that federal laws preempt individual state laws to the extent that they require the reporting of the same
information. Unfortunately, current state laws require the reporting of different items to a broader audience and therefore escape federal preemption. Additionally, several countries outside of the US are in the process of introducing similar transparency laws [7].

The FDA – Disclosure of financial interests and arrangements of clinical investigators

On February 2, 1998, the FDA published a final rule requiring anyone who submits a marketing application for any drug, biological product, or device to submit certain information concerning the compensation to, and financial interests of, any clinical investigator conducting clinical studies covered by the rule. The financial disclosure regulations were intended to ensure that financial interests and arrangements of clinical investigators that could affect the reliability of data submitted to the FDA are identified and disclosed by the applicant.

To protect research integrity, the NIH requires researchers to report to universities earnings of $10 000 or more per year, for instance, in consulting money from makers of drugs also studied by the researchers in federally financed trials. Universities manage financial conflicts by requiring that the money be disclosed to research subjects, among other measures.

The FDA is also expanding its audit scope to include review of the financial disclosures before, during and after the clinical trial, and is randomly selecting clinical investigators to review the financial transactions between sponsors and investigators to ensure that there is no conflict of interest. In the past this type of documentation was viewed as a minor part of the process. However, with the increasing pressure from the transparency trends, it has become a significant part of the audit process. If a disclosure reveals a conflict of interest, such as payments reaching the threshold, the investigator may be disqualified from participating in the clinical trial. Investigators should consider managing their relationships with industry sponsors to ensure that they do not exceed these minimum thresholds.

As much as investigators would like to work with the sponsor outside of the clinical trial environment in consulting and other advisory capacities, such as speaker engagements, an investigator will need to take proactive steps to monitor that these engagements do not preclude or disqualify them from participating in future clinical studies with the sponsor during any given period of time. The government is requesting that the investigator report any and all significant payments of other sorts (SPOOS), that is, substantial payments or other support provided to an investigator that could create a sense of obligation to the sponsor (e.g., honoraria, consulting fees, grant support for laboratory activities and equipment, or actual equipment for the laboratory/clinic).

The financial disclosure requirement applies to any clinical study submitted in a marketing application that the applicant or the FDA relies on to establish that the product is effective, and any study in which a single investigator makes a significant contribution to the demonstration of safety. The final rule requires applicants to certify the absence of certain financial interests of clinical investigators or to disclose those financial interests. If the applicant does not include certification and/or disclosure, or does not certify that it was not possible to obtain the information, the agency may refuse to file the application [8].

Under the applicable regulations (21 CFR), an applicant is required to submit to the FDA a list of clinical investigators who conducted covered clinical studies and to certify and/or disclose certain financial arrangements as follows:
1. Certification that no financial arrangements with an investigator have been made where study outcome could affect compensation; that the investigator has no proprietary interest in the tested product; that the investigator does not have a significant equity interest in the sponsor of the covered study; and that the investigator has not received significant payments of other sorts; and/or
2. Disclosure of specified financial arrangements and any steps taken to minimize the potential for bias [9].

Disclosable Financial Arrangements:
A. Compensation made to the investigator in which the value of compensation could be affected by study outcome. This requirement applies to all covered studies, whether ongoing or completed as of February 2, 1999.
B. A proprietary interest in the tested product, including, but not limited to, a patent, trademark, copyright or licensing agreement. This requirement applies to all covered studies, whether ongoing or completed as of February 2, 1999.
C. Any equity interest in the sponsor of a covered study, i.e., any ownership interest, stock options,
or other financial interest whose value cannot be readily determined through reference to public prices. This requirement applies to all covered studies, whether ongoing or completed;
D. Any equity interest in a publicly held company that exceeds $50 000 in value. These must be disclosed only for covered clinical studies that are ongoing on or after February 2, 1999. The requirement applies to interests held during the time the clinical investigator is carrying out the study and for 1 year following completion of the study; and
E. Significant payments of other sorts, which are payments that have a cumulative monetary value of $25 000 or more made by the sponsor of a covered study to the investigator or the investigators’ institution to support activities of the investigator exclusive of the costs of conducting the clinical study or other clinical studies (e.g., a grant to fund ongoing research, compensation in the form of equipment or retainers for ongoing consultation or honoraria) during the time the clinical investigator is carrying out the study and for 1 year following completion of the study. This requirement applies to payments made on or after February 2, 1999 [10].

If the FDA determines that the financial interests of any clinical investigator raise a serious question about the integrity of the data, the FDA will take any action it deems necessary to ensure the reliability of the data, including: initiating agency audits of the data derived from the clinical investigator in question; requesting that the applicant submit further analyses of data, e.g., to evaluate the effect of the clinical investigator’s data on the overall study outcome; requesting that the applicant conduct additional independent studies to confirm the results of the questioned study; and refusing to treat the covered clinical study as providing data that can be the basis for an agency action.

There are significant penalties for non-compliance. If a sponsor or investigator unknowingly fails to report a single instance, there will be a $1000 to $10 000 fine that is limited to $100 000 annually. However, if the parties knowingly fail to report a transfer of value, there will be a $10 000 to $100 000 fine that is limited to $1 000 000 annually and an investigation will be opened by the federal government.

In complying with these rules, sponsors and applicants are urged to use reasonable diligence and judgment to collect this information. If sponsors/applicants find it impossible to obtain the financial information in question, applicants are urged to explain why this information was not obtainable and to document attempts made in an effort to collect the information. Additionally, the disclosure forms must be signed and dated by a responsible corporate official or representative of the applicant (e.g., the chief financial officer) and the investigators involved in the study.

Industry guidance

PhRMA Code on interactions with health care professionals and conduct of clinical trials

The Pharmaceutical Research and Manufacturers of America (PhRMA) is a trade group that represents research-based pharmaceutical and biotechnology companies. It has created a voluntary code, commonly known as the PhRMA Code, which sets standards for the health care industry’s interactions with health care professionals. The purpose of the Code is to ensure that interactions are focused on supporting medical education, informing health care professionals about products, and providing medical or scientific information. The PhRMA Code sets standards for many different aspects of the health care industry, such as consultant arrangements, speaker programs, industry support of independent medical education, business courtesies, gifts, and training of company representatives [9].

In 2009, PhRMA updated its model Principles on Conduct of Clinical Trials and Communication of Clinical Trial Results to help assure that clinical research conducted by America’s pharmaceutical research and biotechnology companies continues to be carefully conducted and that meaningful medical research results are communicated to health care professionals and patients. Some of the key changes in the revised Principles are increased transparency about clinical trials for patients and health care professionals, enhanced standards for medical research authorship, and improved disclosure to better manage potential conflicts of interest in medical research.

The following voluntary principles have been adopted by PhRMA to clarify its members’ relationships with other individuals and entities involved in the clinical research process and to set forth recommended standards of practice for the industry.

The key issues addressed are:
• Protecting Research Participants – Clinical research should be conducted in a manner that recognizes the importance of protecting the safety of and respecting research participants. Our interactions with research participants, as well as with clinical investigators and the other persons and entities involved in clinical research, recognize this fundamental principle and reinforce the precautions established to protect research participants.
• Conduct of Clinical Trials – Clinical research should be conducted with the highest quality, including trials and observational studies, to test scientific hypotheses rigorously and gather bona fide scientific data in accordance with applicable laws and regulations, as well as locally recognized good clinical practice. When conducting multinational, multi-site trials, in both the industrialized and developing world, ensure that standards based on the Guideline for Good Clinical Practice of the ICH are followed. In addition, clinical trial protocols are reviewed by independent Institutional Review Boards and Ethics Committees as well as national clinical trials health authorities.
• Ensuring Objectivity in Research – Clinical research will respect the independence of the individuals and entities involved in the clinical research process, so that they can exercise their judgment for the purpose of protecting research participants and to ensure an objective and balanced interpretation of trial results.
• Providing Information About Clinical Trials – Sponsors and investigators are committed to the transparency of clinical trials. They recognize that there are important public health benefits associated with making appropriate clinical trial information widely available to health care practitioners, patients, and others. Such disclosure must maintain protections for individual privacy, intellectual property, and contract rights, as well as conform to legislation and current national practices in patent law. Availability of information about clinical trials and their results in a timely manner is often critical to communicate important new information to the medical profession, patients and the public. Additionally, sponsors are responsible for receipt verification of data from all research sites for the studies we conduct; we ensure the accuracy and integrity of the entire study database, which is owned by the sponsor.

In sponsoring and conducting clinical research, PhRMA places great importance on respecting and protecting the safety of research participants. Principles for the conduct of clinical research are set forth in internationally recognized documents, such as the Declaration of Helsinki and the Guideline for Good Clinical Practice of the International Conference on Harmonisation. The principles of these and similar reference standards are translated into legal requirements through laws and regulations enforced by national authorities, such as the FDA.

PhRMA has a longstanding commitment of supporting its members through the development of model standards of conduct for the health care industry. The PhRMA code of conduct and model standards for clinical research are the foundation of the majority of ethics and compliance programs today and should be a reference guide for any organization or practitioner involved in clinical research.

Fraud and abuse regulations that govern research and development activity

The Federal Anti-Kickback Statute

One of the most influential laws governing the health care industry in the US is the Federal Anti-Kickback Statute, 42 U.S.C. § 1320a-7b(b). This statute prohibits individuals or entities from knowingly and willfully offering, paying, soliciting or receiving remuneration to induce referrals of items or services covered by Medicare, Medicaid, or any other federally funded program. Remuneration means anything of value given, directly or indirectly, overtly or covertly, in cash or in kind, to a health care provider and includes, but is not limited to cash, free goods, free services, and payments for items, services, or data at above fair market value.

The Anti-Kickback Statute is an intent-based statute and may be violated if any one purpose of the transaction or practice is to induce referrals or the purchasing, leasing, or ordering of any item or service, or the recommending of or arranging for such activities, even if there are other legitimate purposes for the transaction or practice.

The any one purpose doctrine is a critical analysis that must be applied to any engagement between health care providers and industry. This can be a complicated analysis due to the fact that even though an engagement meets all of the threshold criteria of a
reasonable and necessary transaction, if the amount of compensation or another element of the engagement appears intended to induce, it could expose both parties to a potential violation. The subjective nature of this doctrine requires parties involved in clinical research to determine what level of risk they are willing to assume for the amount of compensation involved for the contracted services. Additionally, the subjective nature of the negotiation of services by industry and investigators, together with the any one purpose doctrine, creates a paradigm of risk that must be carefully analyzed to ensure that the appropriate level of justification and documentation will defend the transaction.

In evaluating whether any particular business transaction or practice violates the Anti-Kickback Statute, the government may consider whether the transaction or practice has the potential to: increase costs to a Federal Health Care Program, beneficiaries, or enrollees; increase the risk of over-utilization or inappropriate utilization; raise patient safety or quality-of-care concerns; or interfere with appropriate clinical decision-making.

While the anti-kickback law is broad, the Office of Inspector General (OIG) at the US Department of Health and Human Services issued ‘safe harbor’ rules in 1991, identifying specific types of activities not subject to enforcement actions under the anti-kickback statute as long as various conditions are satisfied [10].

The safe harbor rules cover such activities as investments in publicly traded companies, joint ventures, rentals of space or equipment, personal services agreements, sales of practice, discounts, and other arrangements. The safe harbor of particular relevance to clinical research is the ‘personal services safe harbor.’

This personal services safe harbor allows for common business practices, such as consulting arrangements, subject to certain valid business need and fair market value requirements. The arrangement must be in writing and be signed, specify the services to be rendered, specify the length and time of the engagement, last for at least 1 year, provide the aggregate compensation and be consistent with fair market value, not include services to ‘promote’ a business arrangement, and not exceed the reasonable and necessary business purposes of the arrangement. Generally, as long as the service falls within the boundaries of these safe harbors, an anti-kickback law violation has not occurred. Conduct that falls outside a safe harbor does not mean an individual or entity automatically has violated the law. However, compliance with the safe harbor requirements will protect a transaction from anti-kickback scrutiny by the OIG and the Justice Department.

A violation of the Anti-Kickback Statute is a criminal offense, which constitutes a felony punishable by a fine of not more than $25 000 per offense and/or imprisonment for up to 5 years. A conviction also will lead to mandatory exclusion from participation in Federal Health Care Programs and may also lead to civil monetary penalties of up to $50 000 for each violation, plus damages of three times the amount of the remuneration.

The following are recommendations that can be taken to reduce the risk of anti-kickback violations in clinical research: 1) Both the sponsor and the investigator must ensure that only reasonable compensation is paid and received for valid business purposes; 2) All payments for services must be consistent and objective based on the complexity and amount of time for the services, but also must be justifiable for the background and qualifications of the provider; 3) All compensation must only be paid for reasonable necessary services and procedures; 4) There must be an objective and defendable scientific purpose for the research with a well defined protocol; 5) Ensure that the study is objective, unbiased, necessary, and that the payments do not reflect the physician’s ability to generate business; 6) Establish clear methodology to determine a reasonable study size [11].

The False Claims Act

The False Claims Act (31 U.S.C. §§ 3729–3733) is a US law that imposes liability on persons and companies who defraud governmental programs. The law includes a whistleblower provision that allows people who are not affiliated with the government to file actions on behalf of the government. Persons filing under the Act stand to receive a substantial portion, 15–25%, of any recovered damages. The government recovered nearly $22 billion under the False Claims Act between 1987 and 2008, and the Act has received significant media attention in recent years.

There have been many developments and recent enforcement actions in matters involving clinical research where the False Claims Act and Anti-Kickback Statute have been the focus of the investigation. A common theme in recent enforcement actions is duplicative fees, whereby an investigator has already been paid for a service and is additionally being reimbursed
by the federal government for the same procedure, or unnecessary procedures that are not required for standard of care or the clinical research in question. Investigators should be aware that, due to the recent increase in whistleblower claims, anyone, including their internal staff, associated pharmacists and third parties, as well as CROs and patients, could be a potential whistleblower if they have evidence of what could be construed as a false claim or potential kickback.

The Act establishes liability when any person or entity improperly receives from or avoids payment to the Federal government. The Act prohibits: knowingly presenting, or causing to be presented, a false claim for payment or approval; knowingly making, using, or causing to be made or used, a false record or statement material to a false or fraudulent claim; conspiring to commit any violation of the False Claims Act; knowingly making, using, or causing to be made or used a false record to avoid or decrease an obligation to pay or transmit property to the government [11].

The most commonly used of these provisions are the first and second, prohibiting the presentation of false claims to the government and making false records to get a false claim paid. By far the most frequent cases involve situations in which a defendant, usually a corporation but on occasion a health care practitioner, overcharges the federal government for goods or services. Other typical cases entail failure to test a product as required by the rigorous government specifications or selling defective products.

There have also been a series of cases whereby the government has prosecuted organizations and individuals for false claim billing as they relate to the clinical trial setting. More specifically, these cases have involved: submitting false service records or samples in order to show better-than-actual performance; double billing (charging more than once for the same procedure or service); up-coding employee work (billing at doctor rates for work that was actually conducted by a nurse or resident intern); billing for research that was never conducted; and falsifying research data.

Pharmaceutical manufacturers can only promote approved uses of their approved products. Manufacturers are strictly prohibited from promoting investigational products or unapproved uses of approved products. Promotion of an investigational product or an unapproved use of an approved product is not allowed under the law. Only a product, or the particular use of a product, that has been approved by the FDA and deemed to be safe and effective can be promoted.

The False Claims Act not only imposes liability on those who submit the false claims, but has also been used against those who are determined to cause the submission of the claims. Enforcement can take the form of government action or a private party bringing a civil action in the name of the government.

The following are recommendations that can be taken to reduce the risk of False Claims Act violations in clinical research: 1) Avoid any appearance of Anti-Kickback Statute violations; 2) Prohibit billing for free services; 3) Do not encourage billing for free services; 4) Include a ‘no billing for free services’ provision in agreements; 5) Do not double bill for study services; 6) Prohibit practices that encourage double billing; 7) Include a ‘no double billing’ provision in agreements [12].

The Foreign Corrupt Practices Act

Another significant law that governs research and development activities is the Foreign Corrupt Practices Act, 15 U.S.C. §§ 78dd-1, (FCPA). The FCPA prohibits corrupt payments to foreign officials for the purpose of obtaining or keeping business. Specifically, the anti-bribery provisions of the FCPA prohibit the willful use of any means of interstate commerce corruptly in furtherance of any offer, payment, promise to pay, or authorization of the payment of money or anything of value to any person, while knowing that all or a portion of such money or thing of value will be offered, given or promised, directly or indirectly, to a foreign official to influence them in their official capacity, induce the foreign official to do or omit to do an act in violation of their lawful duty, or to secure any improper advantage in order to assist in obtaining or retaining business for or with, or directing business to, any person. The FCPA is interpreted broadly and the definition of a government official includes health care practitioners who are employed by state and federal health care institutions [13].

The anti-bribery provisions of the FCPA apply to all US persons and certain foreign issuers of securities. With the enactment of certain amendments, the anti-bribery provisions of the FCPA were expanded to include foreign firms and persons who cause, directly or through agents and third parties, an act in furtherance of such a corrupt payment to take place within the territory of the US.

Under the Act, the person making or authorizing the payment must have a corrupt intent, and the payment must be intended to induce the recipient to
misuse his official position to direct business wrongfully to the payer or to any other person. Additionally, the FCPA does not require that a corrupt act succeed in its purpose. The offer or promise of a corrupt payment can constitute a violation of the statute.

The FCPA prohibits any corrupt payment intended to influence any act or decision of a foreign official in his or her official capacity, to induce the official to do or omit to do any act in violation of his or her lawful duty, to obtain any improper advantage, or to induce a foreign official to use his or her influence improperly to affect or influence any act or decision.

The Act applies to payments to any public official, regardless of rank or position. The FCPA focuses on the purpose of the payment instead of the particular duties of the official receiving the payment, offer, or promise of payment, and there are exceptions to the anti-bribery provision for facilitating payments for routine governmental action [12].

The FCPA prohibits corrupt payments through intermediaries. Therefore, it is unlawful to make a payment to a third party, while knowing that all or a portion of the payment will go directly or indirectly to a foreign official. The term ‘knowing’ includes conscious disregard and deliberate ignorance. The elements of an offense are essentially the same as described above, except that in this case the ‘recipient’ is the intermediary who is making the payment to the requisite ‘foreign official.’

The FCPA also requires companies in the US to meet its accounting provisions [14]. These accounting provisions, which were designed to operate in tandem with the anti-bribery provisions of the FCPA, require corporations covered by the provisions to: 1) make and keep books and records that accurately and fairly reflect the transactions of the corporation and 2) devise and maintain an adequate system of internal accounting controls.

The following criminal penalties may be imposed for violations of the FCPA’s anti-bribery provisions: corporations and other business entities are subject to a fine of up to $2 000 000; officers, directors, stockholders, employees, and agents are subject to a fine of up to $100 000 and imprisonment for up to 5 years. Moreover, under the Alternative Fines Act, the actual fine may be up to twice the benefit that the defendant sought to obtain by making the corrupt payment. Finally, fines imposed on individuals pursuant to the FCPA are deemed to be punitive and may not be paid by their employer or principal.

The UK Bribery Act

The UK Bribery Act 2010 (c.23), effective July 1, 2011, is an Act of the Parliament of the United Kingdom that covers the criminal law relating to bribery. The Bribery Act applies to UK citizens, residents and companies established under UK law. In addition, non-UK companies can be held liable for a failure to prevent bribery if they do business in the UK.

The Act has been described as ‘the toughest anti-corruption legislation in the world,’ raising the bar above the standard set by the US Foreign Corrupt Practices Act (FCPA). It is more stringent than other anti-corruption laws because it covers not only bribes or inducements to government officials, but also bribes to non-government officials, including UK health care practitioners. The Act also broadens the jurisdictional reach of the UK anti-bribery laws to cover bribery worldwide by individuals who are UK nationals or are ordinarily resident in the UK, and organizations that conduct some portion of their business in the UK. The Act specifically prohibits facilitation payments, which was not a focus of the FCPA [13].

The UK Act represents a new tool for regulators and prosecutors, whose earlier efforts at criminal enforcement of anti-bribery laws against companies were constrained by limitations in attributing criminal misconduct to organizations. The Act redefines the substantive criminal elements of bribery, including a new general offense that covers domestic and foreign bribery, and a separate, stand-alone foreign bribery offense that introduces standards similar in scope to the US Foreign Corrupt Practices Act. It also repeals the pre-existing criminal anti-bribery laws in the UK and creates a new bribery offense imposing liability on organizations whose employees or representatives engage in bribery in the UK or abroad.

There are four general offenses under the UK Bribery Act:
1) Offering, promising or giving a bribe – active bribery.
2) Requesting, agreeing to receive or accepting a bribe – passive bribery.
3) Bribery of a foreign public official.
4) Failure by a commercial organization to prevent a bribe being paid to obtain or retain business or a business advantage [15].

The UK Bribery Act introduces an offense of corporate failure to prevent bribery. In addition, a company or
corporate entity is culpable for bribes given to a third alike. Everyone involved in clinical research activities
party with the intention of obtaining or retaining busi- with a nexus in the UK should be vigilant to ensure that
ness for the organization or obtaining or retaining an payments made and consideration received involving
advantage useful to the conduct of the business by their clinical research be reasonable and necessary, appro-
employees and associated persons, even if they had no priately documented and is not considered an induce-
knowledge of those actions. There is also personal lia- ment or kickback pursuant to the requirements of the
bility for senior company officers that turn a blind eye Act [17].
to board-level bribery.
A company can invoke in its defense that it had Conclusion
adequate procedures designed to prevent persons
Clinical research is our investment in the future of
associated from undertaking misconduct. However, a
public health. Every effort we make today defines the
company must prove that it had ‘adequate procedures’
treatment therapies and potential cures of tomorrow.
in place during the time of the ofense.
Many diferent entities and individuals contribute to
he following procedures must be in place for an
the safe and appropriate conduct of clinical research,
organization to be considered in compliance with the
including not only sponsoring companies but also
UK Act [16]:
regulatory agencies; investigative site staf and med-
• Proportionate Procedures – An organization must ical professionals who serve as clinical investigators;
show proportionate risk mitigation procedures hospitals and other institutions where research is con-
based on the risk and complexity of their business. ducted; and Institutional Review Boards and Ethics
• Top Level Commitment – he organization must Committees.
be able to demonstrate top level commitment While most organizations and practitioners have no
to mitigating bribery risk. Examples include intention of violating compliance requirements in this
internal and external communications from senior area, little guidance is available, which makes adher-
leadership tailored to establish an anti-bribery ence to compliance diicult. At a minimum, sponsors
culture. and investigators must be able to show that research-
• Risk Assessment – Periodic, risk based, and related payments and activities are reasonable, neces-
documented anti-bribery risk assessments sary and in no way create an unethical outcome and
demonstrating corrective actions and continual ensure public trust.
risk mitigation.
• Due Diligence – Proportionate and risk based with References
respect to third parties
• Communication and Training – Communication 1. he Associated Press. $95 Billion a Year Spent on
Medical Research. 2005. http://www.msnbc.msn.com/
and training (internal and external) regarding id/9407342/ns/health-health_care/t/billion-year-spent-
bribery prevention policies medical-research/ (Accessed August 9, 2011.)
• Monitoring & Review – Monitoring procedures 2. Congress of the United States: Congressional
designed to prevent bribery and make Budget Oice. Research and Development in the
improvements where necessary Pharmaceutical Industry. 2006. http://www.cbo.gov/
tpdocs/76xx/doc7615/10–02-DrugR-D.pdf (Accessed
he penalties for committing a crime under the Act are August 9, 2011.)
imprisonment for up to 10 years with unlimited ines
3. Harris G, Carey B. Researchers Fail to Reveal Full
and the potential for the coniscation of property, as well Drug Pay. 2008. http://www.nytimes.com/2008/06/08/
as the disqualiication of directors. he Act has a near- us/08conlict.html (Accessed August 9, 2011.)
universal jurisdiction, allowing for the prosecution of an
4. Federal Register. Final Rule, Stark II, Phase II
individual or company with links to the UK, regardless Regulations. 2004.
of where the crime occurred. Additionally, a company or
5. he Center for Health & Pharmaceutical Law & Policy.
an individual is automatically and perpetually debarred Conlicts of Interest in Clinical Trial Recruitment &
from competing for public contracts where it is con- Enrollment: A Call for Increased Oversight. 2009.
victed of a corruption ofense under the UK Act. http://law.shu.edu/programscenters/healthtechIP/
he UK Act poses a multi-dimensional risk to inves- upload/health_center_whitepaper_nov2009.pdf
tigators, academic medical institutions and sponsors (Accessed August 9, 2011.)
6. Sullivan T. Physician Payment Sunshine Provision: Patient Protection Affordable Care Act Passed the House. 2010. http://www.policymed.com/2010/03/physician-payment-sunshine-provisions-patient-protection-affordable-care-act.html (Accessed August 9, 2011.)
7. 111th Congress. S.301 Physician Payments Sunshine Act of 2009. http://thomas.loc.gov/cgi-bin/query/z?c111:S.301.IS: (Accessed August 9, 2011.)
8. FDA Guidance. 21CFR 54.4 [b].
9. FDA: US Food and Drug Administration. Financial Disclosure by Clinical Investigators. 2001. http://www.fda.gov/RegulatoryInformation/Guidances/ucm126832.htm (Accessed August 9, 2011.)
10. PhRMA. Principles on Conduct of Clinical Trials and Communication of Clinical Trial Results. http://www.phrma.org/about/principles-guidelines/clinical-trials (Accessed August 9, 2011.)
11. Federal Register. Proposed Rules: 42 CFR Part 1001. 2002. http://oig.hhs.gov/fraud/docs/safeharborregulations/MedicareSELECTNPRMFederalRegister.pdf (Accessed August 9, 2011.)
12. Steiner. Clinical Research Law and Compliance Handbook. Jones and Bartlett Publishers, 2005, 182.
13. Wikipedia. False Claims Act. 2011. http://en.wikipedia.org/wiki/False_Claims_Act. (Accessed August 9, 2011.)
14. The False Claims Act: A Primer. http://www.justice.gov/civil/docs_forms/C-FRAUDS_FCA_Primer.pdf (Accessed August 9, 2011.)
15. The United States Department of Justice. Foreign Corrupt Practices Act. http://www.justice.gov/criminal/fraud/fcpa/. (Accessed August 9, 2011.)
16. Atkins M. Introduction to the UK Bribery Act. 2011. http://www.financierworldwide.com/article.php?id=8188. (Accessed August 9, 2011.)
17. Foreign and Commonwealth Office. The UK Bribery Act. 2011. http://www.fco.gov.uk/en/global-issues/conflict-minerals/legally-binding-process/uk-bribery-act. (Accessed August 9, 2011.)
Index
Abciximab in Emergency Stroke Albumin in Acute Stroke (ALIAS) progression, 233

Treatment Trial–II (AbESTT-II), trial, 247 treatment, 229
246 ALS, 273 Alzheimer’s disease Assessment Scale,
abuse liability, 13 biomarker, 278 103
actigraphy, 299, 303 disease models, 273 Alzheimer’s disease Cooperative
active arm comparator designs, 265 futility design studies, 279 Study-Activities of Daily Living
active controls, 13, 143, 201–202 lead-in design, 279 (ADCS-ADL), 232
active-control trials, 135 motor neuron death, 273 Alzheimer’s disease modiication
acute ischemic stroke, 242 mouse model, 273–274 randomized withdrawal design, 236
Acute Stroke herapy by Inhibition of phenotypic heterogeneity, 275 staggered start design, 236
Neutrophils (ASTIN) study, 247 selection designs, 279 trial duration, 238
acute stroke treatment, 243 symptomatic management, 274 Alzheimer’s disease trials, 227, 230,
recruitment, 252 symptomatic treatment, 282 233, 239
acute stroke trials therapies, 273 drug safety, 235
consent, 251 ALS clinical trials, 273, 275, 279 eicacy, 236
early and middle development dosage, 280 enrollment issues, 237
studies, 247 dropout rate, 281 minority recruitment, 237
endpoint measures, 245 drug interactions, 280 phase I clinical trials, 234–235
recruitment, 252 eligibility criteria, 275, 281 phase II clinical trials, 235
subject recruitment, 251 enrollment, 281 phase IIa trials, 235
surrogate consent, 251 missing data, 278 phase IIb trials, 235
ADAGIO trial, 116–117, 122 random efects model, 278 phase III trials, 235
adaptive design, 91, 93, 97, 98, 247 sample size, 280 placebo group decline, 237–238
assessment, 92 ALS trial outcome measure, 275 safety measures, 237
deinition, 91 ALS trials study partner, 237
neurological trials, 96 statistical techniques, 278 amyloid imaging, 75
sample size re-estimation design, 95 ALS trials outcome measure Amyotrophic lateral sclerosis (ALS).
sample size re-estimation methods, longitudinal outcomes, 278 See ALS
95 MUNE, 276 animal CNS disease models, 21
adaptive design type, 92, 98 muscle strength, 276 animal model, 21–22
Adaptive Designs Working Group survival, 276 antegrade reperfusion, 243
(ADWG), 91 vital capacity, 276 antiepilepsy drugs (AEDs), 202
adaptive dose-inding study, 97 ALSFRS-R, 276, 278 antiepileptic drug development, 291
adaptive randomization, 94, 251 ALSSQOL, 278 antiepileptic drug safety, 291
adaptive seamless design, 95–97 alternate dosing formulations, 15 antiepileptic drugs, 284
adaptive trials, 13 alternative hypothesis, 60 mechanism, 285
ADAS-Cog, 101 alternative hypothesis of superiority, antiepileptic drugs trials
scale, 230 137 children, 291
score, 32, 230 Alzheimer’s disease, 11, 14, 19, 25, 29, Anti-Kickback Statute, 356
adverse drug event 113, 132, 203, 204, 227 violation, 357
manufacturer responsibility, 161 behavioral outcomes, 233 Antiplatelet Trialist Collaboration
physician reporting survey, 162 biological signatures, 228 (APTC) endpoint, 170
adverse drug event factors, characteristics, 229 any one purpose doctrine, 356
160–161 diagnosis, 229 aplastic anemia, 164
adverse drug event reporting, 163, 164 diagnostic criteria, 238 Appel rating scale, 278
statins, 162 disease-modifying therapies, 238 Arrhythmia Suppression Trial
Adverse Event Reporting System molecular changes, 228 (CAST), 130
(AERS), 163 pathology, 227–228 as treated analysis, 66
aggregate spend, 353 prevention, 236 aspirin, 6
362
Index
asymptomatic intracranial censored data, 107 clinical trial team, 309

hemorrhage (ICH), 244 Center for Devices and Radiological communication, 312
ataluren, 25 Health (CDRH), 206 clinical trials, 1, 2, 5, 9, 29
atrial ibrillation, 6 central laboratories, 315 candidates, 177
Avonex Combination Trial, 265 central laboratory criteria, 315 caregiver participation, 182
Aβ imaging, 25 cerebral beta-amyloid, 11 censoring, 36
Aβ plaques, 227 cerebral infarction, 242 central laboratories, 315
cerebrovascular disease, 1 cognitively impaired
Barthel Index, 245 cGMP regulations, 321 subjects, 192
baseline severity, 249 characteristics of instruments, 76 committees, 148
baselines, 108 child epilepsy, 291 critical public engagement, 183
basic exposure requirements, 14 China, 1 data management services, 314
Belmont Report, 174, 183 cladribine, 266 data review, 147
BENEFIT Trial, 265 Class I medical devices, 207, 209 design, 15, 28
Best Pharmaceuticals for Children Act Class I neurological devices, 208 design protocol, 28
(BPCA), 199 Class II medical devices, 208 drug safety hypothesis, 169
beta-amyloid (Aβ1–42), 74 Class III medical devices, 209 eligibility criteria, 182
BEYOND Trial, 265 clincal trials scope of work (SOW), 312 endpoint, 244
bias, 42 clinical care, 174 enforcement, 352
binary outcomes, 107 clinical data management, 326 implementation, 338
biochemical biomarkers, 74 Clinical Dementia Rating Scale interpretation of results, 201
bio-creep, 143 (CDR), 232–233 intervention risks, 177
bioequivalence, 136 clinical development, 8, 13, 15, 17 investigational agent access, 182
biological therapies, 5 early stage, 9 management, 312
biomarker adaptive designs, 95 late stage, 13–15 participation, 189
biomarkers, 20, 23, 71, 73–74, 127, 243 middle stage, 11, 13 patient exclusion, 182
pharmacodynamic markers, 20 clinical drug safety trials, 170 payment, 353
type 0 biomarker, 23 clinical endpoint planning, 319, 327
type 1 biomarker, 23 late phase acute stroke study, 246 planning process, 309
type 2 biomarker, 24 clinical enrollment duration, 342 prospective registration, 176
Biomarkers Deinitions Working clinical equipoise, 175, 176, 178 publication bias, 348
Group Clinical Global Impression, 29 quality assurance system, 344
conceptual model, 71 clinical guidance documents, 16 recruitment, 341
blinding, 43, 45, 136 Clinical Interview Based Impression of registries, 348
blood brain barrier, 22, 23 Change (CIBIC), 232 regulatory requirements, 327
blood-CSF barrier, 24 clinical investigators, 182 reporting, 346
Bonferroni method, 40 clinical post-marketing safety reporting and transparency, 353
botulinum toxin, 5 assessment, 161 results dissemination, 346
brain atrophy, 268 clinical research results publication, 347
brain imaging, 180 competent only policy, 192, 193 retention, 342–343
incidental indings, 180 clinical research associate (CRA), 311 safety, 9
studies, 11 clinical research consent, 188 set-up phase, 320
brain intervention study risk, 176 clinical research organizations site audits, 344
(CROs), 312 site monitoring, 339–340
C-11 labeled donepezil, 11 Clinical Study Reports (CSR), 347 site qualiication, 339
calcitonin gene-related peptide clinical supplies, 321 site selection process, 39
(CGRP), 97 clinical supply chain, 321 site training, 340
calcitonin gene-related peptide clinical supply labeling, 323 subject payment, 343
(CGRP) receptor, 96 clinical trial budget, 316, 317 timelines, 319
cancer futility designs, 220 budget management, 319 vendors, 316
cancer therapy, 275 committees, 318 clinically isolated syndrome (CIS), 259
Cardiac Arrhythmia Suppression Trial components, 317 cluster headaches, 101
(CAST), 71, 130 currency luctuations, 317 CNS drug delivery, 23
carotid endarterectomy, 6 data management costs, 318 CNS targets, 21
carrier-mediated and receptor- IRB/EC costs, 319 CNS trial challenges, 173
mediated transport, 23 monitoring budget, 318 CNS trials
carryover, 104 vendors, 318 aggressive interventions, 177
case series, 163 clinical trial design ethical research, 183
CDP, 47 entry and exclusion criteria, 30 risks, 183
celecoxib, 170 timeline, 30 trial enrollment, 177
363
Index
Code of Federal Regulations, 73 DATATOP study, 114 candidates, 177

cognitive behavioral therapy for DATATOP trial, 115, 121 early stage clinical trials, 69
insomnia (CBT-I), 296 decision-making capacity, 190–191 early trial termination, 154, 345
collateral perfusion, 243 Declaration of Helsinki, 174–176, 181, Early vs. Late L-dopa in Parkinson
Combination Drug Selection Trial, 85 183 Disease (ELLDOPA) trial, 115
comorbid insomnia, 295 deep brain stimulation (DBS), 218 ECG vendor criteria, 316
comparative efectiveness studies, 6 deep brain stimulation devices, 210 ECG vendors, 316
comparative selection trial, 88 demonstration of efectiveness, 197 EDSS, 262, 268
complete two-period design, 118 Deprenyl and Tocopherol Antioxidative EDSS scale, 262
participant allocation, 120 herapy of Parkinsonism efect size, 250
statistical model, 118–119 (DATATOP) trial, 114 eicacy, 11, 150, 154
compliance and quality management, device classiications, 322 in vivo model, 9
345 Disability Assessment for Dementia proof-of-eicacy-trials, 12
concomitant medications, 10 (DAD) scale, 232 therapeutic eicacy, 11
conditional power, 154–155 disease modiication, 113 eicacy and safety endpoints, 11
conidence interval, 33, 63 modifying efect, 203 eicacy trials, 168
conirmatory trials, 93, 98 modifying efect study, 203 electronic data capture (eDC)
adaptations, 94 disease prevention efect study, 204 technology, 345
confounding, 179 disease prevention study eligibility, 48
Consolidated Standard of surrogate marker, 204 eligibility criteria, 249
Reporting Trials (CONSORT). disease progression treatment, 203 endovascular mechanical treatments,
See CONSORT disease-modifying efect, 124 248
continual reassessment method, 92 DMC, 159 endpoint, 76, 136
stopping criteria, 93 DMC Charter, 149 model, 70
continuous outcomes, 106 donepezil, 101, 107 endpoint selection, 12
control group, 30–31, 42 dopamine agonist, 218 enrichment design, 132
controlled clinical trial, 42 dopaminergic medication, 218, 219, enrichment design trial
Coronary Drug Project (CDP), 47 222–223 methods, 127
correlative and marker studies, 180 sleep attacks, 222 enrichment design trials, 127
covariate adaptive randomization, 94 dose-response relationship, 11 advantages, 128, 130, 133
CRFs, 326 dose-response study, 96 carryover efects, 132
CRM. See continual reassessment drop out, 66 complete enriched enrollment, 128
method drug approval, 197 generalizability, 131
modiied approaches, 93 drug safety, 160 Kopec mathematical
crossover design, 101, 110 active surveillance systems, 164, 165 model, 129
2-treatment 2-period design, 101 case deinitions, 167 limitations, 130
AB:BA design, 103, 107 clinical eicacy trials, 169 partial enriched enrollment, 128
matched crossover design, 109 clinical trials, 168 planning considerations, 131
parallel design, 103 clinical trials constraints, 168 recruitment eiciency, 131
crossover insomnia trials, 301 drug deinition, 167 responder deinition method, 129
crossover trials, 101 observational epidemiological sample size, 129
applications, 101 study, 165 sensitivity, 129
baselines, 109 post-market clinical trials, 168 strengths, 129
logistical challenges, 111 proile trials, 205 subject response, 128
sample size, 101 relative risk, 165 epilepsy, 284
sequence efects, 104 study approach, 171 photoparoxysmal response, 288
two-stage approach, 108 drug safety issue identiication, 161 epilepsy clinical designs
with carryover, 105 drug safety proile, 160 monotherapy trial design, 292
without carryover, 105 drug safety program, 160 epilepsy clinical trials
CSF measurement, 24 drug stability testing, 322 adverse event frequency, 287
drugs, lack of increase, 3 assessment of eicacy, 286
Data and Safety Monitoring Board drugs, cost, 1 deinitive proof of eicacy studies,
(DSMB), 94 DSMB, 96, 98 289
data monitoring committee, 148, 149 Duchenne muscular dystrophy dose-inding study, 289
data monitoring committee (DMD), 25 drug eicacy, 288
objectivity, 150 dyskinesias, 218 electroencephalogram (EEG), 287
data vendor regulation compliance, dystrophin, 25 exit criteria, 288
314 failed trials, 292
database freeze, 320 early exploratory (phase I) trials, 92 outcome measures, 287–288
database lock, 320, 326 early phase studies placebo response rate, 292
364
Index
populations, 285 lexible design methods, 152 historical evidence of sensitivity

presurgical inpatient study, 289 focal dystonia, 5 to drug efects (HESDE).
quality of life scales, 287 Food and Drug Administration See HESDE
seizure clusters, 287 Modernization Act (FDAMA), human drug development, 21
seizure freedom, 288 197 human drug exposures, 22
sudden unexplained death, 291 Food, Drug, and Cosmetic Act (the human medical research, 174
trial duration, 292 Act), 197 policies, codes and regulations, 174
epilepsy drug delivery, 286 Foreign Corrupt Practices Act. regulations, 175
epilepsy drug eicacy studies, 289 See FCPA welfare of volunteers, 174
epilepsy drug models, 285 fully sequential designs, 155 human pharmacokinetics, 10
epilepsy open label extension funding mechanisms, 98 human pharmacology, 10
study, 290 funding research (US), 3 human research ethics, 173, 175
epilepsy regulatory studies, 289 futility analysis, 81, 154 human research risk, 173
epilepsy syndromes, 284 futility assessment, 97 human volunteer studies, 15
epilepsy treatment resistance, 284 futility design, 78, 80–81, 248 Humanitarian Device Exemption
Epworth Sleepiness Scale, 222 criterion of superiority, 79 (HDE), 210–211
equal carryover, 105 neuroprotective agents, 78 Humanitarian Use Device (HUD), 210
equivalence, 135–137, See non- sample size, 82 Huntington’s disease, 346
inferiority single-arm design, 79, 84 hypotheses, 31
equivalence trials, 135, See non- two-arm design, 79 ‘one sided’ alternative hypothesis, 31
inferiority trials futility design hypotheses, 79 null hypothesis, 31
eszopiclone (ESZ), 300 futility outcome, 81 two sided alternative hypothesis, 31
ethical criteria, 187 sensitivity, 80 hypothesis of non-inferiority, 144
ethical study design, 175 speciicity, 80 hypothesis tests, 150
ethics research type I error, 80
justice, 181 type II error, 80 ICH Guideline E3, 141
EU clinical trials, 326 futility design pitfalls, 83 IDE application, 211
European Medicines Agency, 16 historical control data, 84 IDE exempt studies, 212
evaluable subjects analysis, 65 sample size, 83 IDE study, 211
Expanded Disability Status Scale IHAST, 53–54
(EDSS), 21 G93A SOD1, 273 imaging biomarkers, 75
experimental autoimmune gadolinium-enhancing imiglucerase (Cerezyme), 5
encephalomyelitis (EAE), 257 lesions, 257 IMP formulation, 323
experimental autoimmune Gaucher’s disease, 5 IMP quantities, 323
encephalomyelitis (EAE) model GCP inspections, 343–344 IMP quantity forecasting, 323
limitations, 258 General Practitioner Research income of countries, 1
exploratory outcomes, 29 Database, 167 infarction, 243
generalized onset seizures, 286 informed consent, 187, 188, 194
fair market value compensation, 352 genomics studies, 75 capacity assessment, 190
false claim billing, 358 Glasgow Outcome Scale, 59 decision-making capacity, 190
False Claims Act, 357–358 global outcome measures, 232 emergency exceptions, 253
violations, 358 global statistics, 246 probability statements, 189
family-wise error rate, 150–151 GMP, 340 subject understanding, 189
FCPA, 359 Good Clinical Practice therapeutic motivation, 194
FDA, 3 (GCP), 159 informed consent forms, 188
FDA Adverse Event Reporting System, Good Manufacturing Practice (GMP). insomnia, 295
162 See GMP associated conditions, 297
FDA MedWatch program, 161 group sequential adaptive biomarkers, 299
Federal Anti-Kickback Statute. randomization design, 97 diagnosis criteria, 304
See Anti-Kickback Statute group sequential methods, 151 diary data, 302
Federal Drug Agency, 5 development, 154 intervention goals, 297
Federal Food, Drug, and Cosmetic Act neural pathways, 296
Medical Device Amendments of hazard function, 37 perpetuating factors, 295
1976, 206–207 hazard ratio, 38 pharmocologic agents, 297
Federal Regulations 21 Code, 200 Health Canada, 347 precipitating factors, 295
felbamate, 164 health-related quality-of-life sleep diaries, 298
inancial disclosure regulations, 354 (HR-QOL) scales, 263 treatment eicacy, 302
inancial disclosure requirement, 354 hemorrhagic transformation, 244 insomnia clinical trials, 297, 299
ingolimod, 266 HESDE, 142 blinding, 302
First Subject First Visit (FSFV), 320 historical controls, 201, 248 CBT trials, 300
365
Index
comparative efectiveness Kaplan-Meier curve, 37 missing completely at random

trials, 305 (MCAR), 111
comparative eicacy trial, 301 large simple trial, 170 not missing at random (NMAR),
controls, 304 ibuprofen safety, 170–171 111
dual approach, 301 last observation carried forward missing not at random (MNAR), 66
duration of treatment, 304 (LOCF), 48, 67 mitoxantrone, 266
eicacy, 303 Last Subject Last Visit (LSLV), 320 mixed carryover efect,, 109
eicacy and safety, 302 late exploratory (phase II) mixed efects model, 107
frequency of assessments, 304 trials, 93 mixed model repeated measures, 123
hypnotic trials, 305 Latin square design, 110 MMT, 276
improvement indicators, 302 L-DOPA, 23 model-based development, 11
menopausal women, 297 Lennox-Gastaut Syndrome, 286, 290 modiied Rankin scale, 245–246
neurocognitive tests, 302 levetiracetam, 285 monotonic spending function, 153
psychological interventions, 300 Levin-Robbins-Leu (LRL) sequential mortality, 137
questionnaires, 298 selection procedures, 86–87 MRI, 76
recruitment, 303 levodopa, 217–218, 222 MRI measure
retention, 304 melanomas, 222 brain atrophy, 263, 264
safety outcomes, 305 life expectancy, 1 MS Functional Composite (MSFC),
sample size, 302 likelihood-based approach, 111 262
institutional review board (IRB), 188 linomide, 266 MS trials
intention to treat (ITT), 46–47 lipophilic compounds, 23 adaptive designs, 269
intent-to-treat (ITT) log-rank test, 38 MTD. See maximum tolerated dose
principle, 136 Long-term disability trials, 221 multi-center trial, 252
strategy, 141 LRL. See Levin-Robbins-Leu (LRL) multiple ascending dose studies, 10
Interactive Response Technology sequential selection procedures multiple dose studies, 10
(IRT). See IRT multiple hypothesis testing, 144
interferon β, 259–260 Mainland-Gart’s approach, 107 multiple imputation, 67, 123
interim analysis, 49, 151, 157, 250 manual muscle testing (MMT), 276 multiple sclerosis (MS), 5, 76, 178, 257
sample size, 158 marginal approach, 107 adverse events, 265
internal pilot designs, 95 masking, 249, See blinding anti-inlammatory therapy, 258
International Committee on material corporate events, 346 disease free, 268
Harmonisation Guidelines, 14 maximum tolerated dose (MTD), 92 disease-modifying therapy, 258
International Conference on Mechanical Embolus Removal in drug development, 258
Harmonization (ICH), 163 Cerebral Ischemia (MERCI) early treatment, 260
intra-class correlation, 103 trials, 248 gray matter pathology, 258
Intraoperative Hypothermia for MedDRA dictionary, 326 hypotheses, 258
Aneurysm Surgery Trial (IHAST) medical coding system, 326 multiple trials, 266
53 medical devices, 206 relapse and treatment, 267
intra-parenchymal delivery, 23 legally marketed device, 209 severity drit, 266
investigational device exemptions premarket approval, 210 subtypes, 259
(IDE), 211 regulatory framework, 207 symptom-based therapies, 258
investigational medicinal products substantial equivalence, 209 MRI measures, 264
(IMP), 317 Medical Devices Advisory Committee, MRI studies, 263
Investigational Product (IP). 213 multiple sclerosis trials
See clinical supplies Medical Devices Dispute Resolution enrollment, 260
investigational research plan, 211 Panel, 213 entrance criteria, 260
IP medical devices premarket informative enrollment, 260, 262
accountability, reconciliation and notiication, 209 MRI outcome measures, 268
destruction, 325 metabolomic approaches, 75 recruitment, 266
cradle-to-grave tracking, 325 microdialysis, 24 multiplicity, 151, 158
storage, 325 middle phase studies, 250
transportation, 325 mild cognitive impairment (MCI), N of 1 trials, 110
IRB notiications, 346 190, 228 natalalizumab, 5, 265
IRT, 324 minorities enrollment material, 342 National Commission for the
ischemia, 242 missing at random (MAR), 66 Protection of Human Subjects
ischemia trials, 243 missing completely at random of Biomedical and Behavioral
superiority, 250 (MCAR), 66 Research, 174
ITT strategy, 144 missing data, 66 National Institutes of Health, 17
IV rt-PA pilot study, 247 missing data mechanism, 111 National Multiple Sclerosis Society
missing at random (MAR), 111 policies, 178
366
Index
NDLM, 96 retrospective cohort study, 166 statistical analysis, 221

NET-PD futility studies, 81, 84 observational epidemiological study, Parkinson’s disease progression,
additive two-arm design, 82 165 219–220, 223
control group, 81 case control studies, 167 clinical trials, 220
placebo parameter, 82 cohort study, 165 monitoring devices, 219
single-arm study, 81 cohort study design, 166 survival endpoint, 219
two-arm design, 82 cohort study restrictions, 166 Parkinson’s disease trials, 216, 218
NET-PD network, 78 confounding, 165 disease progression modiication,
neuritic plaques, 227 prospective cohort studies, 166 219
neuroibrillary tangles, 227–228 restrospective cohort design, 166 missing data, 223
neuroimaging, 11 observational epidemiologic study outcome measures, 219
neuroinlammation, 258 designs partial onset seizures, 286
neurointerventional devices, 206 case-control studies, 165 passive spontaneous reporting system,
neurological device market, 206 cohort studies, 165 162
Neurological Devices Advisory Panel, observational epidemiologic study patient follow-up, 137
213 case-control study, 168 patient non-compliance, 141
neurological worsening, 244 nested case-control, 167 patient population, 14
neuropathic pain, 19 ocular coherence tomography (OCT), patient reported outcome (PRO), 12,
neuroprotection, 242, 268 264 69
Neuropsychiatric Inventory open label dose escalation studies, 247 patient selection, 11, 249
(NPI), 233 Open Report, 148 Pediatric Research Equity Act (PREA),
Neuropsychological Test Battery open-label dose-escalation studies, 199
(NTB), 231 247 pediatric studies, 199–200
neuropsychological testing, 263 open-label safety extension studies, 15 pediatric study safety data, 199
Neuro-QOL projects, 69 ordinal outcome scales pediatric written requests (PWRs), 200
neurostimulation de dichotomous treatment, 246 pediatric written responses, 200
vices, 206 ordinal outcomes scales penumbral salvage, 243
neurotherapeutics, challenges, 5 trichotomous treatment, 246 per subject fee (PSF), 318
neurotrophic factor development, 20 orphan diseases, 200 per-protocol analysis, 141, 144
New Drug Applications (NDA), 16 orphan drug approval, 200 pervasive developmental disorder, 101
NIH Anticonvulsant Screening orphan drug efectiveness, 200 PET imaging studies, 75
Program, 285 orphan drugs, 5 pharmaceutical promotion, 358
NIH Biomarkers Deinitions Working outcome measures, 69, 76 Pharmaceutical Research and
Group, 71 outcome-adaptive dose-inding Manufacturers of America
NIH Toolbox, 69 design, 97 (PhRMA). See PhRMA
NINDS rt-PA acute stroke study, 248 pharmacodynamic marker, 25
NINDS rt-PA trial, 246 paradoxical insomnia, 298 pharmacodynamic modeling, 11
no-adverse efect level (NOAEL), 10 parallel group designs, 114 pharmacodynamic studies, 264
non-inferiority, 142, 250 Parkinson’s disease, 1, 19, 101, 107, pharmacokinetic modelling, 11
non-inferiority margin, 122, 138, 169 113–114, 163, 179, 215 pharmacologic agents, 113
non-inferiority trials, 169–170 causes, 215 pharmacologic properties, 9
assay sensitivity, 142–143 comparative efectiveness trials, 222 phase I trial, 177
choice of margin, 141–142 conirmatory (phase III) clinical phase 1 trial design objectives, 177
patient noncompliance, 141 trials, 220 phase 1 trial risks, 177
sample size, 140 disability, 219 phase I clinical trials, 74, 264
non-invasive imaging techniques, 24 early trials, 217 approaches, 92
non-signiicant risk device impulse control disorders, 222 phase II clinical trials, 74, 264
studies, 211 long term disability trials, 221–222 phase III clinical trials, 264
non-validated CNS interventions, 183 medication, 168 PhRMA adaptive dose ranging studies
no-pharmacologic efect dose motor complications, 218, 219 working group, 93
(NOPED), 10 motor features, 215 PhRMA Code, 355
normal distribution, 55 neuropathology, 215 PhRMA principles, 355
novel therapeutic, 19 non-motor features, 216 Physicians Withdrawal Checklist, 12
nuisance parameters, 94, 103–105 of time, 218 Pittsburgh Imaging agent B (PIB), 11
null hypothesis, 46, 60 phase II clinical trials, 220 placebo control trial ethics, 178
Numeric Pain Rating Scale, 12 pilot trials, 221 placebo controls, 178
Nuremberg Code, 174, 176 suicide, 222 principles of justice, 181
Parkinson’s disease modiication, 221 placebo group, 202
observational epidemiological studies, clinical trial, 221 placebo response
170 patient population, 223 rates, 12
367
Index
placebo responses, 179 recruitment and retention plan, 341 severe impairment battery (SIB), 232
placebo-to-match (PTM). See PTM recruitment and retention plans, 342 sham control studies, 223
polysomnography (PSG), 298–299, regression models, 67 sham surgical approach, 43, 218
303 regulatory reporting requirements, sham surgical controls, 179, 223
population, 53, 54 347 conditions of use, 179, 180
population distribution, 54–56 relative risk. See hazard ratio ethical critique, 179
population parameters, 54 remyelination strategies, 258 intervention studies, 179
population standard deviation, 57 Request for Proposal (RFP), 312 shit analysis, 246, 247
positron emission tomography (PET), research ethics, 174 sICH, 250
11, 75 fair research design, 181 signiicance level, 62, 151
post-market drug safety, 160–161 justice, 181 signiicant risk device, 211
pre-approval drug safety assessment, principles, 175 simple carryover, 110
160 research participants, 3 simple random sample, 54
PRECISION trial, 170 Resource Utilization in Dementia single ascending dose studies, 9
preclinical experiments, 176 (RUD) scale, 233 single dose studies, 10
researcher responsibility, 177 response adaptive design, 109 single imputation, 67
pre-IDE process, 212 response-adaptive randomization, 94 single photon emission computerized
pre-IDE submission, 213 reverse multiplicity problem, 144 tomography (SPECT), 75
premarket approval (PMA), 214 reverse placebo efect, 131 single treatment arm studies, 248
primary insomnia, 295 rhabdomyolysis, 166 site coordinator, 312
primary outcome, 29 riluzole, 280 site manager (SM), 311
primary progressive multiple sclerosis routine trial closure, 341, 345 Site Monitoring Visit (SMV), 340
(PPMS), 257, 260 RRMS trials sliding dichotomy analysis, 246
principal investigator (PI), 311 placebo-controlled trials, 265 SMV, 341
principle of responsiveness, 181 relapse number, 262 societal beneit, 3
PROACT I, 243 rt-PA treatment, 251 solanezumab, 25
progressive multifocal spending functions, 153
leukoencephalopathy (PML), 265 safe harbor rules, 357 SPORTIF III trial, 138, 141–142
project manager (PM), 309 safety and tolerability issues, 12, 15 standard deviation, 53
proteomics, 75 safety and tolerability profile, 14 standard of care (SOC) costs, 318
protocol feasibility, 338 safety endpoints, 69 state biomarker, 73
PROUD trial, 116, 118 safety of research participants, 356 statistical software, 155, 157
pseudo-placebo withdrawal study, 290 safety or tolerability issues, 11 EAST, 157
PTM, 322 safety pharmacology studies, 9 ldBounds package, 155
public Advisory Committee meetings, safety studies, 155 R-project, 155
16 sample mean, 52 SAS, 156
Public Health Service Act, 197 sample size, 33, 94, 250 statistics, 52–53
putative disease related pathways, 21 sample size adjustment, 251 step-forward randomization,
p-value, 65 sample size re-estimation, 15 252, 252
statistical analysis plan, 327 stopping boundaries, 152, 155, 157
QALS trial, 85 schizophrenia, 19 Haybittle-Peto method, 152
quality control monitoring, 46 secondary outcomes, 29 O’Brien and Fleming method,
Quality of Life-AD (QOL-AD), 234 secondary progressive MS (SPMS), 152–154
257 Pocock method, 152, 154
radio-labeled receptor ligands, 24 Securities Exchange Act of 1934, 346 stratification factors, 48
random effects models, 107 seizure diary, 286–287 stroke prevention, 6
random error, 42, 44, 45, 50 seizure prophylaxis, 6 stroke prevention trials, 253
randomization, 65, 136 seizures, 284 Stroke Prevention using Oral
randomization allocation ratio, 249 selection design, 78 Thrombin Inhibitor in Atrial
randomized clinical trial, 31, 135 selection of endpoints, 76 Fibrillation (SPORTIF) III trial.
randomized controlled stroke trials, selection procedures, 84–86 See Sportif III trial
247 indifference zone approach, 85 Stroke Therapy Academic Industry
randomized controlled therapeutic sequential selection procedures, 85 Roundtable (STAIR)
trials, 147 sensitivity analyses, 48 recommendations, 242
randomized controlled trial, 136, 178 sequential monitoring, 94 Stroke–Acute Ischemic NXY
randomized start trials, 128, 204 serious adverse events (SAEs), 161, Treatment II (SAINT II)
randomized withdrawal trials, 204 244 trial, 246
outcome measure, 202 serotoninergic effects, 11 stroke-specific outcome measures, 245
rapid endpoint, 85 serotonin-norepinephrine reuptake structural imaging, 76
recanalization, 242, 243 inhibitors, 11 Student’s t-distribution, 55
study forms, 46 time to event trial, 290 type I error, 61, 150, 151
substantial evidence of effectiveness, tissue plasminogen activator, 97 type II error, 61
197 TNK trial, 251
clinical trials, 200–201 TOAST (III) trial, 147, 150 UK Bribery Act, 359
single trial elements, 198 data monitoring committee, 149 unblinding, 132
single trials, 198 tolerability profile, 10 unequal carryover, 106
sufficient washout periods, 107 tolerable dose range, 10 unexpected adverse drug
Sunshine Act, 353 trait biomarker, 73 events, 161
superiority, 250 treatment approval, 203 Unified Parkinson’s Disease Rating
superiority to placebo, 142 treatment by period interaction, 104 Scale (UPDRS), 216, 219
superiority trial, 137 trial monitoring, 159 United States drug safety
patient noncompliance, 140 trial termination, 159 system, 161
sample size, 140 Tufts quantitative neuromuscular unvalidated surrogate markers, 198
surrogate consent, 191 examination (TQNE), 276 US Physician Payments Sunshine Act
surrogate endpoint, 71 two-arm non-inferiority trial (Sunshine Act), 353
surrogate markers, 198, 203 sample size, 140
clinical outcomes, 198, 199 two-period design, 114, 221 variability, 53
studies, 198 ADAGIO trial, 120 variance, 53
surrogate outcome measures, 71 additional treatment, 121 virtual biotechnology
symptomatic and disease-modifying delayed start design, 116, 118 firms, 5
effects, 113 eligibility criteria, 120
symptomatic intracranial hemorrhage evaluation, 121 warfarin, 6
(sICH), 244 limitations, 124–125 washout periods, 132
missing data, 123 Wilcoxon-rank sum test, 106
T2 hyperintense lesions, 263 multiple statistical testing, 122 women and minority participation,
Tacrine Consortium study, 132 period duration, 120 342
tau, 227 primary analyses, 121–123
TEMPO trial, 116 PROUD, 120 ximelagatran, 138, 141
test of significance, 32 sample size, 124
therapeutic development, 19 withdrawal design, 115, 116 zolpidem, 301
therapeutic development programs, 71 two-stage adaptive dose-ranging
therapeutic misconception (TM), 193 design, 96 α spending function, 153, 154

