Mark V. Thomas DMD - Evidence-Based Dentistry, An Issue of Dental Clinics (The Clinics - Dentistry) (2009, Saunders) PDF

EVIDENCE BASED DENTISTRY 0011–8532/02 $15.00 .00
WHAT IS EVIDENCE BASED DENTISTRY?
Gary R. Goldstein, DDS
The volume of literature and lectures directed at the modern dental
practitioner has created some problems. How does one resolve the often
contradictory information? How does one determine what is a cutting-
edge technique and what is useless? In resolving a clinical decision,
evidence rather than empiricism should dictate treatment. Evidence
based dentistry (EBD), based on the concepts developed at McMaster
University,13, 14, 17–22 presents guidelines to determine the validity of study
results and whether they can be applied to clinical practice.
The foundation for evidence based practice was laid by David Sackett,
who has defined it as ‘‘integrating individual clinical expertise with
the best available external clinical evidence from systematic research.’’23
Evidence based dentistry supplies guidelines to help the clinician
make an intelligent decision. In and of itself, EBD does not give defini-
tive answers. It does not exchange the tyranny of the expert for the
tyranny of the literature. As Sackett’s definition states, EBD relies first
on clinical expertise. This expertise is especially critical in dentistry,
where the number of randomized, controlled clinical trials and prospec-
tive cohort studies is limited. In a perfect world, full of quality prospec-
tive studies, one would only have to pull up a well-performed meta-
analysis or systematic review of the evidence on the clinical question to
solve the problem at hand. Unfortunately, these studies are too few, and
clinicians must apply the best available evidence to make a decision.
The Cochrane Collaboration, an international nonprofit organization
whose goal is to make up-to-date, accurate information on the effects of
health care available worldwide, has an Oral Health Group that has
produced some systematic reviews. Their web site
(http://hiru.mcmaster.ca/cochrane/default/htm) is an excellent place to see what
evidence based dental practice will be like in the future.
From the Department of Prosthodontics, New York University College of Dentistry; and
Department of Dental Material Science, New York University Graduate School of Arts
and Sciences, New York, New York
DENTAL CLINICS OF NORTH AMERICA
VOLUME 46 • NUMBER 1 • JANUARY 2002

The internet has made it easy to initiate an evidence based practice
(see article by Felton on page 45 of this issue). Guidelines for EBD are
applicable to peer-reviewed literature and also to publications and lec-
tures that provide a case report or, at best, a case series done under
conditions that may not be similar to those seen in the average dental
office. Armed with the tools of EBD, the clinician can readily evaluate
the mass of data and choose, in an educated manner, what to use and
what to discard.
Unfortunately, most of what is seen in dentistry is product testing
done in laboratories, not operatories. The studies are usually univariate
analyses, because the researcher has been trained to homogenize the
study so that only one variable is tested. Clinicians, however, live in a
multivariate environment. For example, an in vitro study on a dental
cement might deal with retention of castings on extracted teeth. Reten-
tion, however, is not the only variable that a clinician evaluates in
choosing a cement. A clinician must also be concerned with postopera-
tive sensitivity, film thickness, setting time, working time, longevity,
ability to clean up, setting expansion, and so forth. One might also
wonder how good the retention would be in a clinical milieu where
isolation, crevicular fluid, saliva, and intraoral humidity become con-
founding variables. Clinicians, seeing only one variable tested, should
be reluctant to change their cement based on the limited laboratory
study. Needed instead are controlled, long-term clinical trials to help
clinicians make decisions, but such studies are expensive and require a
long time to supply the information. Chambers questioned whether
‘‘there is clinical evidence showing that this restorative material will last
longer in patients’ mouths than it will be on the market’’ (see article by
Chambers on page 29 of this issue).
Using EBD is quite simple:3
1. Create an answerable question.
2. Track down the best evidence to answer the question.
3. Critically appraise the information.
4. Apply the results to one’s patients.
5. Evaluate one’s performance.
The Journal of Prosthetic Dentistry has published a series similar to
the User’s Guide to the Medical Literature,13, 14, 17–22 specific to dentistry, to
help appraise the information.1, 2, 5, 6, 8, 11, 12, 15, 16 Although the guidelines
differ according to the clinical question being asked, certain characteristics
pertain to all studies.
THE USE OF EVIDENCE BASED DENTISTRY IN DETERMINING THERAPY
Was the Assignment of Patients to Treatment Randomized?
Randomization eliminates allocation bias. In theory, randomization
ensures that the variables over which the study has control, and the
unknown variables that come into play in all studies, are equally
distributed among the test groups. To ensure equal distribution, the study
population (N) must be sufficiently large. A randomized controlled trial
(RCT) is considered the optimal research design and is the reference
standard for most clinical questions. Not all RCTs, however, are properly
planned and carried out. The reader must still examine the methodology.
Also, as Sackett concluded, ‘‘some questions about therapy do not require
randomized trials (successful interventions for otherwise fatal conditions)
or cannot wait for the trials to be conducted. And if no
randomized trial has been carried out for our patient’s predicament, we
follow the trail to the next best external evidence and work from there.’’23
Feinstein9 has questioned the blind faith often put in randomized
trials and has suggested that prognostic stratification is critical to the
utilization of the data. He maintains that if data are to be evaluated
in prognostic subgroups, those subgroups should be identified, where
possible, before the study starts, and that subjects should be allocated to
those subgroups before they are randomly allocated to treatment.10 For
example, in a study on implants in which the site (anterior mandible
versus posterior maxilla) is a major variable, it would be sensible to
identify the site before randomizing to ensure that chance alone does
not place most of the anterior mandibles in one group and most of the
posterior maxillae in the other. Another potential confounder would be
smoking. Although it would be unwieldy, if not impossible, to identify
every possible variable, certain dominant ones known to affect the
outcome of the therapy should be identified at the start of the project.
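The allocation scheme Feinstein describes can be sketched in a few lines of code. The following is a minimal illustration, not a protocol from any cited study: the subject records, the choice of strata (implant site and smoking status), and the function name stratified_randomize are all hypothetical. Subjects are first grouped by prognostic stratum and only then shuffled and dealt alternately to the arms within each stratum, so that chance alone cannot concentrate one site in one group.

```python
import random
from collections import defaultdict


def stratified_randomize(subjects, stratum_of, arms=("test", "control"), seed=0):
    """Randomly allocate subjects to arms separately within each stratum.

    subjects   -- list of dicts, each with a unique "id" (hypothetical format)
    stratum_of -- function mapping a subject to its prognostic stratum
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    strata = defaultdict(list)
    for subject in subjects:
        strata[stratum_of(subject)].append(subject)

    allocation = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        for position, subject in enumerate(members):
            # Deal alternately so arm sizes differ by at most one per stratum.
            allocation[subject["id"]] = arms[position % len(arms)]
    return allocation


# Hypothetical study: 24 implant subjects, two sites, smokers and nonsmokers.
subjects = [
    {
        "id": i,
        "site": "anterior mandible" if i < 12 else "posterior maxilla",
        "smoker": i % 2 == 0,
    }
    for i in range(24)
]

allocation = stratified_randomize(subjects, lambda s: (s["site"], s["smoker"]))

# Tally the arms within each site to confirm the balance.
for site in ("anterior mandible", "posterior maxilla"):
    counts = defaultdict(int)
    for s in subjects:
        if s["site"] == site:
            counts[allocation[s["id"]]] += 1
    print(site, dict(counts))
```

With six subjects in each of the four strata, every site contributes equally to both arms. Simple, unstratified randomization offers no such guarantee in a study this small.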
Were All Patients Who Entered the Trial Properly Accounted For and Attributed at Its Conclusion?
It is critical that all patients who enter a trial are properly accounted
for at its conclusion. It is not enough to say that a certain number of
patients dropped out. One must include the dropouts in the statistical
analysis (see article by Clive on page 137 of this issue). The most
common reason patients drop out of a therapy trial is that they are
unhappy with the therapy. Some subjects die, and some move out of the
area, but the number in these categories should be relatively equal in
the control and test groups. If the drop-out rate exceeds 20%, the clini-
cian should be concerned about the external validity or generalizability
of the project.
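The arithmetic behind this concern is easy to illustrate. The sketch below uses hypothetical trial counts and a simple best-case/worst-case bound, which is one common way to keep dropouts in the analysis and is not necessarily the method described in the article by Clive. The 20% threshold is the one given above; the function names are invented for illustration.

```python
DROPOUT_THRESHOLD = 0.20  # threshold taken from the text above


def dropout_rate(enrolled, completed):
    """Fraction of enrolled patients who did not complete the trial."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= completed <= enrolled")
    return (enrolled - completed) / enrolled


def success_bounds(enrolled, completed, successes):
    """Worst- and best-case success proportions over ALL enrolled patients.

    The worst case counts every dropout as a failure; the best case counts
    every dropout as a success. The true rate lies between the two.
    """
    if enrolled <= 0 or not 0 <= successes <= completed <= enrolled:
        raise ValueError("need 0 <= successes <= completed <= enrolled")
    dropouts = enrolled - completed
    return successes / enrolled, (successes + dropouts) / enrolled


# Hypothetical test group: 100 enrolled, 75 completed, 60 successes.
enrolled, completed, successes = 100, 75, 60
rate = dropout_rate(enrolled, completed)
completers_only = successes / completed
worst, best = success_bounds(enrolled, completed, successes)

print(f"dropout rate: {rate:.0%} (exceeds threshold: {rate > DROPOUT_THRESHOLD})")
print(f"success among completers only: {completers_only:.0%}")
print(f"success over all enrolled: {worst:.0%} to {best:.0%}")
```

In this made-up example, the 80% success rate reported for completers could be as low as 60% once the 25 dropouts are counted, which is why a bare statement that patients dropped out is not enough.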
Were Patients, Their Clinicians, and Study Personnel Blinded to Treatment?
Blinding means that someone was not aware of the treatment being
rendered. Double-blinded means that both the evaluators and the patients
were unaware of the therapy being rendered. Blinding is easily done in
a drug trial in which the pills look and taste the same and the patient is
identified only by a code number unknown to the evaluator looking at
the outcome. Blinding can also be easily done in a study of toothpastes
or mouthwashes. It is not always possible to blind a clinical trial. For
example, in a study comparing implant-retained overdentures with ei-
ther two or four fixtures in place, it would be impossible to blind
the patient or the researcher if intraoral examinations were necessary.
Although a nonblinded trial is not ideal, it can still be an excellent
experiment that can generate usable, reliable data.
Were the Groups Similar at the Start of the Trial?
To ensure validity, it is critical that the cohorts (groups) be similar
in all pertinent demographic, medical, and dental factors. Although in a
large study randomization should ensure equivalence, it is the investigators’
responsibility to assess equivalence among cohorts in detail.
Aside From the Experimental Intervention, Were the Groups Treated Equally?
Anything one studies, one alters. Patients who agree to participate
in a study tend to be more compliant than the average. Knowing they
are to be examined may cause them to exercise better home care before
presenting in an effort to please the investigator. It is tempting for
investigators to recall a test group more often when the outcome is
uncertain or side effects are suspected. Co-interventions, such as an
extra prophylaxis, can affect the primary outcome being examined and
the validity of the study. All groups need to be treated equally.
Were All Clinically Important Outcomes Considered?
The reader must decide whether all clinically important outcomes
have been considered. If, for example, in evaluating a new cement for
ceramic restorations, the investigator reports only that the restoration
was in place after the time of the study, it is obvious that other important
considerations have been ignored. If the investigator also evaluates
postoperative sensitivity, film thickness, setting time, working time, lon-
gevity, ability to clean up, setting expansion, and so forth, the important
clinical factors have been evaluated. More commonly, the investigation
might evaluate only two of the factors. Some clinicians would find the
study adequate; other readers might not. An implant study, for example,
might speak of prosthesis stability and neglect the number of implants
remaining. If six implants were placed and three were lost, the prosthesis
might be stable, but the clinician has cause to question the data.
Was Follow-up Sufficiently Long and Complete?
Too often a study is not long enough to be valid to the clinician
(chronology bias). Although a 1-year follow-up may be sufficient in a
study of the efficacy of tetracycline-impregnated cord, the same follow-
up time is not adequate in a study on a new composite resin restoration.
For restorative procedures, a minimum of 3 to 5 years may be necessary
to convince a dentist to change therapy.
Were Objective and Unbiased Outcome Criteria Used?
Outcome criteria are chosen by the investigator, and it is easy to err
by choosing an assessment that best serves the theory of the investigator.
The adage, ‘‘I would not have seen it if I didn’t believe it,’’ readily comes
into play. Picture a study that compares a Lexus with a Yugo and
chooses the following criteria for the study:
Does it have an engine?
Does it have a radio?
Does it have four wheels?
Does it have a windshield?
Does it have seat belts?
Using these criteria, one concludes that the Lexus and Yugo are
similar. Any rational person, however, clearly sees that results based on
questionable outcome assessments are useless. In more sophisticated
studies, such a flaw may not be so obvious.
Will the Results Help Clinicians in Caring for Their Patients?
The critical question for clinicians is whether the results will help
them provide better care for their patients, because that question in-
volves all the others. If the methodology is good, if the statistically
significant results have clinical relevance, and if the data interpretation
is rational, one would lean towards accepting the study. If, however, the
population is not representative of a clinician’s practice or if the inclusion
and exclusion criteria do not match the practice population, clinicians
should be hesitant about applying the results to the population they are
treating.3
USING EVIDENCE BASED DENTISTRY TO EVALUATE THE NEED FOR A DIAGNOSTIC TEST
Was There an Independent, Blind Comparison with a Reference Standard?
A gold (reference) standard is important. In histopathology, the
biopsy is considered the gold standard, but even the biopsy does not
result in 100% agreement among pathologists. The disagreement is mag-
nified when the pathologists are deprived of the clinical findings sup-
plied by the surgeon. If a reference standard exists, one might question
the need for the new test. If the test cannot offer the advantages of being
less expensive, or less invasive, or easier to perform, one should question
its use. Unfortunately, often there is no reference standard, or the refer-
ence may be controversial. Lack of a suitable reference standard does
not mean that the new test is not useful, but a heavier burden of
proof is demanded from the investigator, and the clinician must exercise
more caution.
Were the Methods for Performing the Test Described in Sufficient Detail to Permit Replication?
If the reader cannot perform the test, it is of no use.
Were Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and Likelihood Ratios Presented?
It is not the reader’s responsibility to undertake statistical analysis
when reading an article. Rather, it is the researchers’ obligation to supply
the appropriate data (see article by Brunette on page 87 of this issue).
Because EBD puts the onus of decision making on the clinician, readers
must be familiar with the terms so they can determine if the new test
would have merit in their practices.
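For readers who want a reminder of how these terms relate, the following sketch computes them from a two-by-two table comparing a new test with the reference standard. The definitions are the standard ones; the counts and the function name diagnostic_metrics are hypothetical and are not drawn from any study cited here.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Summary statistics for a test judged against a reference standard.

    tp -- diseased, test positive      fn -- diseased, test negative
    fp -- healthy, test positive       tn -- healthy, test negative
    """
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if 0 in (tp + fn, tn + fp, tp + fp, tn + fn):
        raise ValueError("each row and column of the table must be non-empty")

    sensitivity = tp / (tp + fn)  # diseased patients correctly detected
    specificity = tn / (tn + fp)  # healthy patients correctly cleared
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # probability of disease given a positive test
        "npv": tn / (tn + fn),  # probability of health given a negative test
        "lr_positive": (
            sensitivity / (1 - specificity) if specificity < 1 else math.inf
        ),
        "lr_negative": (
            (1 - sensitivity) / specificity if specificity > 0 else math.inf
        ),
    }


# Hypothetical validation sample: 100 diseased and 200 healthy sites.
metrics = diagnostic_metrics(tp=80, fp=30, fn=20, tn=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values shift with the prevalence of disease in the sample tested, so figures from a study population may not carry over unchanged to a clinician's own practice.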
Will the Patient Be Better Off as a Result of the Test?
Routine testing, if it does not affect the diagnosis, prognosis, or
treatment, has questionable value. If the results do not potentially change
the course of treatment, the test is unnecessary. A patient who fell and
knocked out the coronal portion of a tooth would benefit from a radio-
graph to determine the extent of the fracture but not from a pulp test to
determine vitality. An adolescent with an ulceration from biting the
cheek would be better served by a reexamination in a week rather than
by a biopsy.
Evidence based dentistry will surely be abused.4 Insurance compa-
nies have already developed evidence based care policies that require
dentists to prove that patients need the services.3 The possibility of abuse
does not mean that dentistry should reject EBD. Indeed, dentists have
been practicing EBD, in part, for many years. When clinicians tell pa-
tients to brush and floss, they do so because the evidence supports the
efficacy of these interventions. When dentists advocate fluoride, they do
so because the evidence supports its efficacy. Although many areas of
dental practice are supported by numerous high-quality research proj-
ects, many more areas are supported only by anecdotal data. Hence, the
validity of the data and who evaluates it become critical. Aurbach4 has
questioned:
‘‘Who will be the anointed one or group that determines which evidence
is valid? Who will set the research agenda and determine where the
results will be maintained? Who will validate the research? Who will
maintain the data base to make sure that it is up to date? How will the
results be used?’’3
It is obvious that to control the data, clinicians need to own it. If
clinicians are not sophisticated enough to force good research practices
by their ability to evaluate and reject poor science, they will be at the
mercy of third parties who can use dubious research as justification to
control clinicians’ practices. The sooner dentistry as a profession univer-
sally embraces EBD, the sooner the profession will command the use of
research and prevent its misuse.
WHAT EVIDENCE BASED DENTISTRY IS NOT
Evidence based dentistry is not a veil to mask the same old, inade-
quate research. It is disturbing to see lecturers invoke EBD and present
the same anecdotal lectures they gave before, with different slide titles.
As the profession of dentistry becomes more sophisticated, researchers
and lecturers will be forced to grow also. Evidence based dentistry does
not take the clinical decisions out of clinicians’ hands and put them into
the hands of the literature. In fact, the opposite is true. Evidence based
dentistry gives guidelines for the clinician and relies first on clinical
expertise.
Evidence based dentistry does not mean that third parties will
control dental practices. In fact, educated dentists, understanding the
literature, will be able to prevent the misrepresentation of data by commercial
interests.
Evidence based dentistry does not mean the clinician need not
study basic and dental material sciences. In fact, the opposite is true. To
evaluate the research presented, clinicians need a solid background on
which to base their evaluations and decisions.
Evidence based dentistry does not mean clinicians abandon every-
thing they learned in dental school. It does not force clinicians to go
backwards to justify things the profession universally accepts.
WHO BENEFITS FROM EVIDENCE BASED DENTISTRY
• The ultimate beneficiaries of EBD are members of the public, who
will reap the rewards of better care. The internet allows patients,
as well as professionals, access to health care information. The
public, however, does not have the tools to evaluate the data
adequately and must rely on their educated dentists to help sort
fact from fiction. Patients will be more educated, more involved
in their treatment decisions, and more appreciative of quality care.
• Dentists will also benefit from EBD. Instead of conducting
free product testing for dental product manufacturers, practitioners
will have at their disposal more valid research on which
to predicate their clinical decisions.
• Researchers will benefit by being called upon to do the
clinical testing necessary before new products are placed on the
market.
References
1. Anderson JD: Need for evidence based practice in prosthodontics. J Prosthet Dent
83:58–65, 2000
2. Anderson JD, Zarb GA: Evidence based dentistry: Prognosis. J Prosthet Dent 83:495–
500, 2000
3. Anderson V: Evidence based care, is the defense ready? Dental Economics 28–32, 2000
4. Aurbach FE: Evidence based dentistry: A practitioner’s perspective. J Am Coll Dent
66:17–20, 1999
5. Carr AB, McGivney GP: Measurement in dentistry. J Prosthet Dent 83:266–271, 2000
6. Carr AB, McGivney GP: Users’ guides to the dental literature: How to get started. J
Prosthet Dent 83:13–20, 2000
7. Chambers D: Research for practitioners or research for researchers? J Am Coll Dent
65:2–4, 1998
8. Eckert SE, Goldstein GR, Koka S: How to evaluate a diagnostic test. J Prosthet Dent
83:386–391, 2000
9. Feinstein AR: An additional basic science for clinical medicine: II. The limitations of
randomized trials. Ann Intern Med 99:544–550, 1983
10. Feinstein AR: An additional basic science for clinical medicine: III. The challenges of
comparison and measurement. Ann Intern Med 99:705–712, 1983
11. Felton DA, Lang BR: The overview: An article that interrogates the literature. J Prosthet
Dent 84:17–21, 2000
12. Goldstein GR, Preston JD: How to evaluate an article about therapy. J Prosthet Dent
83:599–603, 2000
13. Guyatt GH, Sackett DL, Cook DJ: Users’ guides to the medical literature. II. How to
use an article about therapy or prevention. A. Are the results of the study valid?
Evidence-Based Medicine Working Group. JAMA 270:2598–2601, 1993
14. Guyatt GH, Sackett DL, Cook DJ: Users’ guides to the medical literature. II. How to
use an article about therapy or prevention. B. What were the results and will they
help me in caring for my patients? Evidence-Based Medicine Working Group. JAMA
271:59–63, 1994
15. Jacob RF, Carr AB: Hierarchy of research design used to categorize the ‘‘strength of
evidence’’ in answering clinical dental questions. J Prosthet Dent 83:137–152, 2000
16. Jacob RF, Lloyd PM: How to evaluate a dental article about harm. J Prosthet Dent
84:8–16, 2000
17. Jaeschke R, Guyatt G, Sackett DL: Users’ guides to the medical literature. III. How to
use an article about a diagnostic test. A. Are the results of the study valid? Evidence-
Based Medicine Working Group. JAMA 271:389–391, 1994
18. Jaeschke R, Guyatt GH, Sackett DL: Users’ guides to the medical literature. III. How
to use an article about a diagnostic test. B. What are the results and will they help me
in caring for my patients? The Evidence-Based Medicine Working Group. JAMA
271:703–707, 1994
19. Laupacis A, Wells GA, Richardson S, et al: Users’ guides to the medical literature V.
How to use an article about prognosis. JAMA 272:234–237, 1994
20. Levine M, Walter S, Lee H, et al: Users’ guides to the medical literature IV. How to
use an article about harm. JAMA 271:1615–1619, 1994
21. Oxman AD, Cook DJ, Guyatt G: Users’ guides to the medical literature VI. How to
use an overview. JAMA 272:1367–1371, 1994
22. Oxman AD, Sackett DL, Guyatt GH: Users’ guides to the medical literature. I. How to
get started. The Evidence-Based Medicine Working Group. JAMA 270:2093–2095, 1993
23. Sackett D, Richardson WS, Rosenberg W, et al: Evidence based Medicine: How to
Practice and Teach EBM. New York, Churchill Livingstone, 1997
Address reprint requests to
NYU College of Dentistry
Division of Restorative and Prosthodontic Sciences
345 East 24th Street
Clinic 5W
New York, NY 10010–4086
e-mail: gary.goldstein@nyu.edu
THE QUESTION
James D. Anderson, BSc, DDS, MScD
HOW QUESTIONS ARISE
There are two aspects to the clinical practice of dentistry. The surgi-
cal component includes all the manipulation of hard and soft tissue that
is performed every day in dental practice. Examples are tooth prepara-
tion and restoration, scaling, orthodontics, and prosthesis fabrication.
The other element involves decision making. The diagnosis of unlocal-
ized dental pain, the prognosis for a periodontally compromised tooth,
the choice of posterior restorative materials, and the risks/benefits assessment
of third molar extractions are examples. Early in a career,
decision making may be the most difficult aspect of clinical practice.
There is an overwhelming array of choices with little or no structure on
which to build an approach to solving the problems. As a practitioner
gains experience, he or she acquires the advantage of having seen the
results of previous decisions, good and bad, and can recall how a
problem was dealt with previously. The practitioner also develops habits
that make each task easier. Habits, too, are the result of decisions made
but not re-examined.
As a start, the thoughtful practitioner will ask first if there is a
compelling reason to intervene for a patient, and second if there is a
compelling reason to intervene at this time. The answers to these ques-
tions can be obvious or elusive. The patient who has severe, throbbing
pain and tender swelling over the apex of a heavily carious lateral
incisor with a large periapical radiolucency clearly needs treatment and
needs it promptly. On the other hand, whether or when to treat the
young patient with impacted but asymptomatic third molars is less
obvious. With experience, practitioners build up a mental library of
circumstances that can be recognized when next encountered. This is
practice by pattern recognition.
From the Faculty of Dentistry, University of Toronto; and the Craniofacial Prosthetic Unit,
Toronto-Sunnybrook Regional Cancer Centre, Toronto, Ontario, Canada
Because of the infinite variety in the combinations of circumstances
encountered every day, the choices made are commonly extensions of
previous experiences. For example, the extension of resin-bonded pros-
thesis designs from the front of the mouth to the posterior segments is
logical, provided provision is made for the extra occlusal load. When no
previous experience is available as a guide, a knowledge of basic biologic
principles can guide decision making. For example, for an edentulous
patient who has had a maxillectomy, ensuring that the design for a
denture includes bilateral support will guide the impression procedures.
Decision making in clinical practice thus is supported by pattern
recognition when experience exists. When experience does not exist,
the practitioner falls back on extensions from previous experiences or
inferences from basic biologic principles. Continuing education guides
and reinforces these strategies. A comfort level develops, which is the
confidence one gains with years in practice.
All these approaches are molded by the single practitioner’s clinical
and educational exposure, that is, by one person’s sample of the profes-
sion’s accumulated knowledge and judgment. Because the practice be-
haviors of dentists are highly divergent, there is clearly great variation
in each practitioner’s sample of knowledge and experience. Hence, the
decisions reflect different biases and knowledge gaps among different
clinicians. This consequence is the problem that evidence based practice
(EBP) is intended to address. The first step in EBP is to acknowledge
that such gaps exist in one’s personal knowledge and experience. Or as
Will Rogers put it, ‘‘Everybody is ignorant, only on different subjects.’’
WHICH QUESTIONS?
In the flow of daily practice, virtually no decisions are made in a
complete information vacuum. (Such decisions would best be made with
the flip of a coin.) When there is no definitive information on a given
problem, there is nearly always some influence, whether it be patient
preference, the practitioner’s knowledge of basic biologic principles, or
the practitioner’s habits. Decisions are made, therefore, without empiric
information about the consequences of the decision. For example, is
endodontic treatment and full-coverage restoration of a nonvital molar
more cost-effective than extraction and replacement with an implant-
supported prosthesis? If practitioners recognize that they do not have
empiric evidence on a current problem, suddenly the practice day be-
comes filled with uncertainty, even for the experienced practitioner. As
in medicine, this uncertainty is, in fact, the nature of dental practice. The
practitioner must decide which questions to pursue in the limited time
available.
Clearly, the thoughtful practitioner will seek evidence to answer
questions that directly affect patient management. Doing so is ethical
practice: it puts the patient’s perspective on the problem ahead of the
practitioner’s. The patient may want to know if chewing will be easier
with a fixed implant-supported prosthesis than with an implant-sup-
ported overdenture. The practitioner, on the other hand, may be more
concerned with implant survival. So, the first criterion in selecting which
questions to pursue is to choose questions from the patient’s perspective.
The fact that the question has arisen means that it can arise again, so
the second criterion suggests that practitioners seek evidence on ques-
tions that assist in staying current and in preparing for the next occasion.
Often in the pursuit of this information, however, the literature does not
provide a definitive answer. To ration time effectively, the third criterion
suggests choosing the questions that are most likely to yield a clear
answer. Of course the searcher cannot know in advance whether the
answer is available to be found. Common problems, however, are more
likely to have a better body of literature than rare problems. Finally, of
course, the searcher should choose interesting questions that spark the
learning process.
WHY BOTHER?
For the Patient
As noted previously, the patient’s questions and the practitioner’s
questions are not always the same. Articulating the question makes it
more likely that the practitioner’s quest for scientific information will
correspond with the patient’s perception of what is important. Thus,
there is better opportunity to include in the question issues that balance
the potential for good with risk of harm. Similarly, the question should
reflect the patient’s wishes and priorities, concerns about costs, and
cultural issues. An implant-supported fixed reconstruction cannot be
done for an edentulous patient without significant surgical procedures
and considerable discomfort and cost over a prolonged period of time.
The patient should expect that the additional discomfort, costs, and time
taken will yield a worthwhile extra benefit in terms of comfort, chewing
efficiency, and appearance beyond conventional dentures. In addressing
these concerns, the practitioner can easily be sidetracked into surrogate
outcomes that do not provide a direct measure of success for the patient.
For example, in the landmark 15-year report of implant success by Adell
and others,1 the authors reported rates of continuously stable prostheses
as high as 100%. Significant numbers of patients, however, had to be
reoperated on as many as three or more times to maintain continuous
prosthesis stability. Although reoperation is less common now, implant
treatment is still not without such risks, and they may be of primary
concern to the patient. A clearly articulated question that probes such
issues focuses the treatment priorities for the patient and assists the
provider in offering appropriate counsel on the potential for harm.
For the Searcher
The most direct approach to finding the answer to a clinical question
is to telephone a colleague and ask. Doing so doubles the sample of
knowledge and experience that is brought to bear on the problem. Given
the variety of practice decisions that are made worldwide, however, this
sample is still unimpressive. There remains, also, the specter of the
blind leading the blind. With the availability of easy access to the
worldwide literature, there is now no reason why that vast resource of
information cannot be applied to the individual clinician’s patient prob-
lem, other than the clinician’s inability to use it effectively. So perhaps
the clinician should waste no time in getting to the literature to hunt
down the evidence. The problem with this approach is that numerous
articles will likely be found that seem to address the clinical issues. As
a result, time will be wasted going through them to find the one that
deals most directly with the issues and provides the strongest evidence.
Thus, for finding the best evidence, there are two advantages to taking
the trouble to articulate a carefully crafted clinical question. One relates
to efficiency in constructing a search, and the other relates to reviewing
the found titles as quickly as possible.
By carefully crafting a question, the searcher learns to be more
specific. The search terms selected for the search become more specific
and thus are more likely to exclude concepts that are peripheral to the
central point. More precise selection is likely to influence the choice of
outcome measure, that is, the result desired by the patient or the out-
come the patient seeks to avoid. When these issues are articulated
carefully, the search terms will yield a smaller number of articles whose
titles and abstracts must be reviewed individually.
Similarly, a carefully crafted question provides criteria against which
found articles can be reviewed for closer inspection. As the titles and
abstracts of articles are scanned, the searcher is asking, ‘‘Do I want to
read this article in detail?’’ If the answer is no, the searcher wants that
answer quickly, to be able to proceed to the next article. Having the
criteria enunciated clearly in the question facilitates a quick judgment.
Here again, the choice of outcome measures is often critical. Articles that
address the same problems as those being researched using the same
interventions but recording different outcomes are of general interest
but are not necessarily relevant. Being able to ascertain quickly that the
outcome reported is not the outcome of interest allows the searcher to
move on to the next article more quickly.
Another advantage of articulating a clearly defined question can be
found in the communication between cooperating providers. In referring
patients to specialists, general practitioners can focus the attention of the
specialist and at the same time circumscribe the specialist’s responsibil-
ity. It therefore is easier for the general practitioner to fulfill the duty to
coordinate specialist services.
Finally, a significant benefit of taking the trouble to frame clinical
questions is the opportunity to organize the questions for later reference.
Lee et al6 suggest the development of critically appraised topics (CATs)
that form a personal library of answers to clinical questions that have
arisen. Of course, such a library needs to be updated from time to time,
but it serves as a starting point for future searches and at the very least
provides a compendium of accumulated best evidence on issues already
encountered.
TYPES OF QUESTIONS
To fill the knowledge gaps, the busy practitioner needs a strategy
to yield the greatest return in information in the least amount of time.
The earlier questions relating to unlocalized pain, periodontally involved
teeth, posterior restorative materials, and third molars are vague. They
do not define what the practitioner really wants to know about those is-
sues.
Sackett et al9 suggest that a searcher might want to obtain either
background information or foreground information. Background informa-
tion relates to a general understanding of a disorder, test, treatment,
product, or other matter. For example, questions such as, ‘‘What is the wear
rate of this posterior composite material?’’ or ‘‘What are the nerve
pathways responsible for unlocalized pain?’’ are background questions.
These questions usually have two components: a question root (who,
what, where, when, why, or how) with a verb, and the item of interest
to which the verb connects it.
Foreground questions, on the other hand, are more specific and
relate to the management of the patient. For example, ‘‘In patients with
unlocalized dental pain, is a cold test more sensitive than an electric
pulp test in identifying a pulpitis?’’ or, ‘‘In patients with asymptomatic
impacted third molars, will removing the teeth cause greater loss of
bone support at the distal of the second molars than not removing
them?’’ are foreground questions. These questions usually have four
components: (1) a population; (2) an intervention; (3) an alternative
intervention; and (4) an outcome (the result of the test, treatment, or
exposure).
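The four components can be written down as a small record, which makes it easy to see when a question is still missing a part. The following is only an illustrative sketch: the class name, field names, and helper methods are assumptions made for this example, and the sample values restate the third-molar question given above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ForegroundQuestion:
    """Hypothetical container for the four parts of a foreground question."""
    population: str    # who the patient resembles
    intervention: str  # test, treatment, or exposure being considered
    alternative: str   # reference against which it is compared
    outcome: str       # result sought, or event to be avoided

    def missing_components(self) -> list[str]:
        # Report any component left blank, so a vague question is caught early.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def summary(self) -> str:
        # One-line restatement, useful for filing the question for later reference.
        return "; ".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


third_molar_question = ForegroundQuestion(
    population="patients with asymptomatic impacted third molars",
    intervention="removing the teeth",
    alternative="not removing the teeth",
    outcome="loss of bone support at the distal of the second molars",
)

print(third_molar_question.summary())
print(third_molar_question.missing_components())
```

A question such as ‘‘What about third molars?’’ would leave three of the four fields empty, which is one way of seeing why it is too vague to steer a search.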
The patient is a member of a population that is usually described
by demographics, diagnosis, symptom, or exposure. The patient, for
example, may be a man in his fifties, who is a smoker, with a complaint
of loose teeth. Some of these factors may be irrelevant, but the relevant
factors are the features that define the population of interest. An inter-
vention describes the action being considered, which usually is a diag-
nostic test, a treatment, or an exposure. The alternative intervention
serves as a reference against which the test or treatment of interest is
compared. One might, for example, compare fixed implant-supported
prostheses against implant-supported overdentures. Finally, the outcome
is the result sought from the test or treatment or the unhappy event one
wishes to avoid, such as a diagnosis of apical periodontitis, or chewing
efficiency, or implant failure.
At any time the searcher may need answers to both background
and foreground questions. As students, practitioners asked many back-
ground questions to learn the biologic principles, disease processes, and
properties of materials. Experienced practitioners, dealing with all the
combinations of circumstances encountered in practice, are more inter-
ested in practical management issues that need to be specifically defined.
Framing a Question
An example illustrates the usefulness of framing a clinical question
as an aid to retrieving an answer quickly.
A dentist saw his edentulous patient on annual follow-up 2 years after
inserting fixed, implant-supported prostheses. The patient complained of dis-
comfort at one of the implants in the mandible. On examination, the implant
was found to be loose and had to be removed. The clinician now is unsure
whether the prosthesis can be expected to continue to function on the four
remaining well-distributed implants. The alternative is to tell the patient that
the remaining implants are too few to support the prosthesis. Preservation of
the remaining implants may require that new implants be inserted and the
prosthesis be remade or at least heavily modified. This alternative is an invasive,
costly, and time-consuming solution that the patient seeks to avoid. The patient
asks the dentist if he is more likely to lose his prosthesis if he continues to
function with just four implants.
The clinician converts the patient’s problem into a question: ‘‘In
edentulous patients with fixed implant-supported prostheses, is the risk
of implant failure greater when the prosthesis is supported by only four
implants than when it is supported by more implants?’’ The population is made
up of edentulous patients who have implant-supported prostheses. The
intervention in this case is an exposure to the use of just four implants.
The alternative is the use of more implants (with the obvious implica-
tions for surgery, cost, time, discomfort, and so forth). The outcome is
implant failure, which could be defined in many different ways. These
phrases of the question will directly steer the choice of terms in the
search strategy and the assessment of the found titles.
HOW A QUESTION STEERS A SEARCH
Specific Definition of Search Terms
Using the concepts defined in the question, the clinician searches
MEDLINE by first entering the term edentulous as a descriptor of the
patient population. The software maps the term to ‘‘jaw, edentulous’’
and ‘‘mouth, edentulous,’’ both of which describe the situation of con-
cern. The next term to enter describes the population in more detail,
that is, those having an implant-supported prosthesis. This term maps
to several Medical Subject Heading (MeSH) terms that describe this
patient, including ‘‘dental prosthesis, implant-supported’’ and ‘‘dental
implants.’’ These terms are relevant to the problem, so both are selected.
Exposure to four implants sounds like a narrow circumstance that could
not be easily generalized in a search of the literature. Because the
situation of four implants is of interest, however, the number four could
be entered as a text word. The search software will then look for all
occurrences of the word four (and words containing ‘‘four’’) in the titles
and abstracts. Finally, the searcher enters a term describing the outcome
measure, which is ‘‘implant failure.’’ The software maps this term to
‘‘prosthesis failure.’’ Combining all these terms yields no information in
the current database. Repeating the search in the 1993–1996 database
yields five articles that may answer the patient’s question.
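The logic of the search just described can be sketched in a few lines: alternative headings for the same concept are joined with OR, and the separate concepts are then joined with AND. The sketch below is an illustration only. The function name is invented for this example, and the output uses generic Boolean notation rather than the exact syntax of any particular MEDLINE interface. The terms themselves are those named in the text.

```python
def build_query(concepts):
    """Join synonyms with OR inside each concept, then join concepts with AND."""
    groups = []
    for terms in concepts:
        quoted = [f'"{term}"' for term in terms]
        group = " OR ".join(quoted)
        if len(quoted) > 1:
            group = f"({group})"  # parenthesize so OR binds before AND
        groups.append(group)
    return " AND ".join(groups)


# One inner list per component of the question.
implant_search = [
    ["jaw, edentulous", "mouth, edentulous"],                    # population
    ["dental prosthesis, implant-supported", "dental implants"], # population detail
    ["four"],                                                    # exposure (text word)
    ["prosthesis failure"],                                      # outcome
]

print(build_query(implant_search))
```

Each added AND group narrows the result, which is why a question with all four components specified tends to return a short list of titles.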
Skimming Titles and Abstracts in Found Literature
The clinician now wants to scan the found titles and abstracts
quickly to identify the best one or two articles that are most likely to
answer the patient’s question. Here, again, the details of the question
facilitate the process. Each title (and abstract, if necessary) is scanned,
and the content is compared with the population, maneuvers, and out-
comes articulated in the question.
Of the five titles found in the search, the first is a case series by
Leimola-Virtanen7 that followed four implants in the mandibles of 39
patients for 3 to 10 years. Implant and prosthesis success rates are
provided. This article thus seems to address the patient’s question quite
closely, except the prostheses used were denture prostheses, not fixed
prostheses. In addition, being a case series, the article offers no control
against which to compare the success rates found in the patients with
only four implants. This article therefore is not a strong piece of evidence
to use in answering the patient’s question.
The title of the next article, by Jemt and Lekholm,5 seems to deal
more with varying amounts of remaining bone. Nothing is said about
the number of implants or prosthesis type used. A quick check of the
abstract against the criteria in the question confirms that this article will
not help answer the question.
The title of the third article describes a study by Brånemark and
others2 that compares the use of four implants against six implants in
edentulous patients. By the title alone, this article seems to satisfy two
of the criteria specified in the question. A check of the abstract reveals
a cross-sectional design that compares the success rates in the
four-implant group against those in a control group with more
implants. This article thus provides much
stronger and more focused evidence of the implant and prosthesis suc-
cess rates that could be expected when only four implants are available.
The fourth article by Zarb and Schmitt10 provides a title and abstract
that are too vague to identify the details of either the maneuvers or the
outcomes. With the relatively focused article by Brånemark et al avail-
able, there seems little value in retrieving this article and reading it
in detail.
Finally, the title of the fifth article, by Jemt and others,4 suggests that
the article deals with overdentures exclusively and thus is not relevant
to the patient’s problem.
This review of the found titles has revealed an article that seems to
address the practitioner’s question directly and provides a study design
that permits useful comparisons of success rates to support an answer
to the patient. Although the evidence is not compelling (the study is not
a randomized trial), it is the best available evidence that bears directly
on the question. The patient can thus be informed that leaving his
prosthesis to function on four implants is unlikely to pose a greater risk
of implant or prosthesis failure than it faced when supported by more
implants. The patient is thus spared the time, cost, and discomfort of
further implant surgery while avoiding any extra risk of failure.
SUMMARY
This exercise of isolating the strongest article from the found titles
should take no more than 1 to 2 minutes. Thus, the whole process of
searching for the best evidence should take no more than 5 minutes. In
medical practices where evidence based practice is done routinely, this
process can be completed in less than 1 minute.8 Obviously, the evalua-
tion could not have been made as expeditiously without the benefit of
the specific details articulated in the question. The question focused the
search terms and expedited the identification of the strongest evidence
that directly addressed the patient’s problem from among the found
titles. It provided the dentist with good (but not compelling) evidence
to support an answer to the patient. It also provided the dentist with a
new piece of information to use the next time the problem of reduced
implant support comes up. The dentist has thus enjoyed the satisfaction
of quickly identifying new knowledge and the confidence that comes
with its use. In addition, the information has provided the dentist
with a small but important block against the deterioration of clinical
judgment skills.
References
1. Adell R, Lekholm U, Rockler B, et al: A 15-year study of osseointegrated implants in the
treatment of the edentulous jaw. International Journal of Oral Surgery 10:387–416, 1981
2. Brånemark PI, Svensson B, van Steenberghe D: Ten-year survival rates of fixed prosthe-
ses on four or six implants ad modum Brånemark in full edentulism. Clinical Oral
Implants Research 6:227–231, 1995
3. Elderton RJ: Variation among dentists in planning treatment. Br Dent J 154:201–206,
1983
4. Jemt T, Chai J, Harnett J, et al: A 5-year prospective multicenter follow-up report on
overdentures supported by osseointegrated implants. Int J Oral Maxillofac Implants
11:291–298, 1996
5. Jemt T, Lekholm U: Implant treatment in edentulous maxillae: A 5-year follow-up
report on patients with different degrees of jaw resorption. Int J Oral Maxillofac
Implants 10:303–311, 1995
6. Lee H, Sauve J, Faroukh M, et al: The critically appraised topic–A standardized
aid for the presentation and storage of evidence based medicine. Clinical Research
41:A543, 1993
7. Leimola-Virtanen R, Peltola J, Oksala E, et al: ITI titanium plasma-sprayed screw
implants in the treatment of edentulous mandibles: A follow-up study of 39 patients.
Int J Oral Maxillofac Implants 10:373–378, 1995
8. Sackett DL, Straus SE: Finding and applying evidence during clinical rounds: The
‘‘evidence cart’’. JAMA 280:1336–1338, 1998
9. Sackett DL, Straus SE, Richardson WS, et al: Evidence based Medicine. How to Practice
and Teach EBM, ed 2. Edinburgh, Churchill Livingstone, 2000
10. Zarb GA, Schmitt A: The edentulous predicament. I: A prospective study of the
effectiveness of implant-supported fixed prostheses. J Am Dent Assoc 127:59–65, 1996

Faculty of Dentistry
University of Toronto
124 Edward Street
Toronto, Ontario M5G 1G6 Canada
e-mail:jim.anderson@utoronto.ca
THERAPY
Anecdote, Experience, or Evidence?
Gary R. Goldstein, DDS, and Jack D. Preston, DDS
How does a practitioner determine what therapy to use? Often, the
decision depends on the age of the practitioner and the experiences
gained in practice. The younger practitioner depends mainly on what
was taught in dental school. All dental schools have a core technique,
usually derived by faculty consensus, that allows a student to develop
competency in one approach to a therapeutic problem. Trying to teach a
novice multiple techniques usually results in the student’s mastering
none. Educators have agreed that teaching one technique well allows
the student to enter practice and satisfy the needs of the public. Unfortu-
nately, dental schools have been unfairly criticized as teaching outdated
and often unrealistic techniques. This criticism is not true. Dental school
faculty almost universally teach time-tested and scientifically sound
procedures. Ethics dictate that patients in dental schools be protected
and not subjected to whimsical trends in treatment. Institutional review boards
mandate that research be structured to ensure the patient’s rights are
preserved. The clinician, unencumbered by such constraints, often makes
forays into other treatment modalities, some successful, others disap-
pointing. Once in practice, the clinician is influenced by observations
based on experience. Such observations, however, are often flawed, and
associations thought to be causal are, instead, only casual. Anecdotal
evidence from colleagues may mold decision-making. With the broad
communication now possible using the internet, such anecdotes
may come from a continent away and from a completely unknown
person. Conversely, upon graduation some clinicians become comfort-
able with a particular procedure and may be wary of change.

From the Department of Prosthodontics, New York University College of Dentistry; De-
partment of Dental Material Science, New York University Graduate School of Arts
and Sciences, New York, New York (GRG); and The University of Southern California
School of Dentistry, Los Angeles, California (JDP)

As clinicians expand their knowledge through lectures and by read-
ing journals, they constantly modify their clinical methods. There is
always a new restorative material on the market, a new surgical tech-
nique, a new piece of equipment, a new toothbrush, and a new tooth-
paste. Detailers joke that dentists are gadget enthusiasts who buy a
product, use it once or twice, and store it in some cabinet, finding it
years later and not remembering when, where, or why it was purchased.
How do practitioners decide which treatments to use? Often, they
are influenced by the prestige of the professor giving the lecture or
writing the article. All too often, however, they are seduced by the
show rather than by the science. Multiple projectors, enhanced digital
presentations, or the glitz of the advertising become the main reasons
for change. Companies market directly to the public who, with inade-
quate ability to evaluate the hype, pressure the practitioner to change
therapy, often with inadequate research to justify the change.
An example is a patient with an edentulous area who presents with
the request for implants. What the patient is really saying is, ‘‘I want
teeth.’’ It is the practitioner’s responsibility to understand that the pa-
tient is requesting the ability to chew better, speak better, or look better.
It is the practitioner’s responsibility to determine the best therapy for
that individual patient and to advise the patient of that therapy and any
other suitable options. Another example is a patient who, having heard
all the hype on tooth bleaching, requests the procedure when the prob-
lem is really recurrent decay around old, severely stained composite
restorations that need replacement.
Any procedure involves some risk, and increasing risk usually ac-
companies more complex therapy. The practitioner should decrease that
risk as much as possible without unduly burdening the patient. Patients
have a moral, ethical, and legal right to know the risks and benefits of
any therapy that is recommended.
Today, information may be obtained from a variety of sources. There
are often newer procedures to supplant the approaches documented in
textbooks. Reports in peer-reviewed journals are more current, de-
pending upon the source and the publication delays. Today, many prac-
titioners obtain information over the internet, through conversations
with other practitioners, and through newsletters and non–peer-re-
viewed periodicals and journals. These less scientific sources can be
useful. For example, the problems of root fracture when cementing
dowels and the fracture of porcelain complete-coverage restorations
when using the first-generation resin/ionomer cements were first made
public in these forums. Regardless of how information is obtained,
anyone seeking newer approaches to improve the delivery of dental
service must apply the rules of evidence in evaluating a suggested
technique. Failure to consider all aspects of a therapy has sometimes
proven disastrous (e.g., the teratogenic effects of thalidomide) or merely
ineffective after encouraging initial results (e.g., early treatments for
AIDS). Therefore, the alert practitioner walks a tightrope between endan-
gering patients with a therapy that has an undetected accompanying
risk and failing to provide optimal therapy that would be of substan-
tial merit.
Anyone considering a new course of care must invoke the rules of
evidence and evaluate the strengths of a report against its inadequacies.
A technique or regimen may have statistical significance but lack clinical
merit. How, then, does the practitioner maintain balance on that tight-
rope and best serve the patient? The rules of evidence have been well
established; their benefit lies in their knowledgeable application. A re-
port may not furnish all the information desired, or the data may be
reported in such a manner that they are difficult to evaluate. Bias
from well-meaning researchers is common, and dentistry is filled with
volumes of pseudoscientific reports in which results have been derived
from a false premise or a flawed research design.
Unfortunately, some established dental procedures have gained ac-
ceptance because a charismatic champion of the technique was a con-
vincing advocate. Often, procedures that had merit were based on a
falsely attributed cause and have been successful for reasons other than
those to which their success has been ascribed.
When considering the merits of a report or lecture, the practitioner
must clearly understand the purpose of the study and how the investiga-
tors sought to establish their premise. The results of the study must
relate directly to this purpose statement. Anything not established as a
purpose of the study should not be given primary consideration.
Subjects enrolled in the study must all have an equal chance to
receive the study intervention (e.g., drug, treatment regimen, material)
rather than the alternative approach (e.g., placebo, previously accepted
technique or regimen, no therapy). Those in the treatment and alterna-
tive groups must be equivalent in all pertinent respects.
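The requirement that every subject have an equal chance of either assignment can be illustrated with a short allocation sketch. This is only one simple scheme (shuffle the enrolled subjects, then split the list), chosen here for illustration. The function name, the subject identifiers, and the seed are all assumptions of this example and are not taken from any study discussed in the text.

```python
import random


def randomize(subject_ids, seed=None):
    """Shuffle the enrolled subjects and split them into two groups.

    Every subject has the same chance of landing in the treatment group.
    With an odd number of subjects, the alternative group gets the extra one.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "alternative": shuffled[half:]}


# Twenty hypothetical subjects, numbered 1 through 20.
allocation = randomize(range(1, 21), seed=42)
print(len(allocation["treatment"]), len(allocation["alternative"]))
```

Random allocation does not by itself guarantee that the two groups are equivalent in all pertinent respects, particularly in a small sample, so the baseline characteristics of the groups still have to be compared.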
Before being enrolled in a study, a person should go through a
complex screening process that establishes them in the appropriate co-
hort. Patients have a dental or medical problem and choose a treatment
facility. They may enter that facility at different stages of the disease
process and hence may have a different prognosis. They may be moti-
vated by cost, location, or the reputation of the facility or the treating
doctor. After screening, the patient is referred to the researcher, whose
study population is further filtered by the informed-consent or volunteer
process. Investigators also tend to include persons perceived to be com-
pliant to ensure their continuance in the study and to rule out apparently
less-compliant or difficult candidates. An additional series of eligibility
decisions is then made to reduce the population further. Inclusion and
exclusion criteria must be clearly established. They are necessary but
must be pragmatic and relevant. As investigators cull the potential
population using appropriate demographic criteria, they also rule out
persons with potentially confounding comorbidities. Ultimately, clini-
cians must ask if the results are applicable to their patient population. If
the sample group is excessively homogenized, the study population may
not be representative of the clinicians’ patients, and the study may have
decreased validity.
For example, in a well-done study on IPS Empress inlays and
onlays, the population (N = 130) consisted of 27 one-surface inlays, 38
two-surface inlays, 40 three-surface inlays,8 and 25 onlays.1 A significant
percentage of the population consisted of class I restorations; therefore,
the data may not be pertinent to a clinician who does not normally
perform class I restorations.
Exactly what a study is to measure must be determined in advance,
and the methods of measurement of the effects must be clearly and
specifically stated. The precision of the measurements (or the converse,
the error of the study) must be established before the study is initiated.
It is not enough to know that the microscope used had a precision of 5
µm. The ability of investigators to repeat their measurement is crucial.
How many persons were involved in making the measurements? Was
their equipment calibrated to ensure that the measurements were equiva-
lent? The method by which the study is to be analyzed must also be
established a priori. Too often, investigators gather data only to find that
statistical analysis is compromised by the procedures used.
The outcome assessment must be relevant. Investigators sometimes
are encumbered by the dogma that the only legitimate way to do an
experiment is to vary one factor at a time.1 This univariate approach is
at odds with the multivariate climate in which the clinician functions.
For example, in reviewing the current literature for dental luting ce-
ments, Rosenstiel et al6 listed 10 different clinically important parame-
ters. A study that concentrates on only one factor may not supply
enough information to warrant a change in material.
Readers must be acutely aware of the structure of the study before
trying to ascertain its applicability to their patients. The design is deter-
mined by the direction of inquiry, who determines the therapy, and the
presence of a control group. Prospective studies are those in which the
therapy is initiated at the start of the study. The advantage of a prospec-
tive trial is that, theoretically, the investigator can control all aspects
of the treatment and minimize the effect of confounding variables.
Retrospective studies are those in which the therapy was initiated before
the beginning of the study. The disadvantage of this study design is the
inability of the investigator to control inadvertent confounding variables.
Studies can be further divided into comparative studies (also called
analytical studies), which have a control group, and descriptive studies,
which have no control group.
The hierarchy of evidence can be listed as
1. Comparative studies
   • Prospective studies
        Randomized, controlled trials (RCTs) – assignment to therapy
        is under the control of the investigator
        Cohort study – two matched study groups (cohorts) are
        assembled and followed. Because the patient self-selects the
        treatment, the assignment to therapy is not under the control
        of the investigator.
   • Retrospective studies
        Case-control study – similar in design to the cohort study
        except that the outcome was present at the time the study
        began
2. Descriptive studies
   • Case series
   • Case report
The hierarchy of evidence gives the reader a primer to use when compar-
ing conflicting evidence.
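As a rough illustration, the hierarchy can be treated as an ordered scale and used to sort a set of retrieved reports so that the strongest designs are read first. The sketch below assumes that each report has already been labeled with its design. The class name, the numeric ranks, and the report labels are hypothetical and serve only to show the ordering.

```python
from enum import IntEnum


class Design(IntEnum):
    """Study designs in the order listed above (1 = strongest)."""
    RCT = 1
    COHORT = 2
    CASE_CONTROL = 3
    CASE_SERIES = 4
    CASE_REPORT = 5


def strongest_first(reports):
    # sorted() is stable, so reports of equal rank keep their original order.
    return sorted(reports, key=lambda report: report[1])


# Hypothetical search results: (label, design).
found = [
    ("Report A", Design.CASE_SERIES),
    ("Report B", Design.COHORT),
    ("Report C", Design.CASE_REPORT),
    ("Report D", Design.RCT),
]

for label, design in strongest_first(found):
    print(design.name, label)
```

The ranking is a reading order, not a verdict. A poorly executed study of a strong design may still deserve less weight than a careful study of a weaker one, so the methodology of each report has to be examined.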
The RCT is the standard for questions regarding therapy.
Because the study is prospective, and the therapy is under the command
of the investigator, an RCT minimizes the bias inherent in other designs.
The control, usually the standard of care, allows the reader to make
direct comparisons; hence, this design provides the best evidence. Un-
derstanding the terminology is more important than simply recognizing
the terminology. Randomization, however, cannot make up for a poorly
planned and implemented study. Just because the design is an RCT
does not relieve the reader from the responsibility of examining the
methodology.
Randomization is effective only when the study population is of
sufficient size. The appropriate study population size in a clinical
trial varies for different questions. Determining the size of the study
population requires a knowledgeable best guess by the researcher and
consultation with a statistician to determine the power of the study.
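To give a sense of what such a calculation involves, the sketch below applies the standard normal-approximation formula for comparing two proportions. The formula is a common textbook method and is not taken from this article. The success rates, the significance level, and the power used in the example are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def subjects_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided test of two proportions.

    n = (z_(1-alpha/2) + z_(power))^2 * [p1(1-p1) + p2(1-p2)] / (p1 - p2)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # value matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Researcher's best guess (hypothetical): 85% success with the standard
# of care and 95% with the new therapy.
n_needed = subjects_per_group(0.85, 0.95)
print(n_needed)
```

The expected success rates are exactly the ‘‘knowledgeable best guess’’ referred to above. Halving the expected difference roughly quadruples the number of subjects required, which is why small trials so often lack the power to detect modest effects.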
Randomized, controlled trials are not always possible because of
cost, time, or ethics. For example, it would not be ethical, in performing
a study on the hazards of smoking, to randomize a cohort to a regimen
that forced a participant to smoke two packs of cigarettes a day to see
if there was a harmful result. Rather, matched cohort studies, although
not ideal, are accepted as the norm to answer a question of harm.4
Sackett et al have concluded that ‘‘Evidence-based medicine is not re-
stricted to randomized trials and meta-analyses. It involves tracking
down the best external evidence (from systematic reviews when they
exist; otherwise from primary studies) with which to answer our clinical
questions.’’7
The determination of an acceptable control refers back to the ques-
tion the study is trying to answer. In a study to determine the efficacy of
a new drug, a placebo could be an acceptable control. The pharmaceutical
industry, however, is concerned about the extent of the placebo effect,
which can be as high as 30%. A current trend is to have a three-group
(instead of the typical two-group) RCT. One group receives the new
drug, one group receives the placebo, and the third group (the control)
receives no treatment. The difference between the no-treatment group
and the placebo group is the placebo effect, whereas the difference
between the placebo group and the treatment group is called the thera-
peutic effect.2 In therapy trials, the most useful control for the clinician
is the current standard of care. The usefulness of a study on a new
headache medication is enhanced if the control is aspirin, ibuprofen, or
acetaminophen rather than a placebo, because most clinicians would not
prescribe or take a placebo for a headache.
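The arithmetic of the three-group design can be shown with a brief sketch. The response rates below are hypothetical percentages invented for this example, and the function name is likewise an assumption. Only the two subtractions follow the definitions given above.

```python
def decompose_effects(no_treatment_pct, placebo_pct, drug_pct):
    """Split the observed improvement into placebo and therapeutic effects.

    All arguments are response rates in whole percentage points.
    """
    placebo_effect = placebo_pct - no_treatment_pct  # placebo vs. no treatment
    therapeutic_effect = drug_pct - placebo_pct      # drug vs. placebo
    return placebo_effect, therapeutic_effect


# Hypothetical trial: 10% improve untreated, 40% on placebo, 70% on the drug.
placebo_effect, therapeutic_effect = decompose_effects(10, 40, 70)
print(placebo_effect, therapeutic_effect)
```

In a two-group trial without the no-treatment arm, only the second difference could be calculated, and the size of the placebo effect would remain unknown.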
Unfortunately, most dental therapy articles are descriptive rather
than comparative. In the typical case study or case series, practitioners
evaluate their own work. Despite the integrity of the clinician, the study
cannot have the same validity as one in which an independent, blinded
observer assesses the outcome. Another problem of descriptive studies
is that authors sometimes want to project their data beyond the scope
of their project. For example, it is easy when doing a case series on
implant product X to compare it with another case series done on
implant product Y. This comparison is dangerous, because the two
studies had different populations, in different settings, receiving differ-
ent therapy from different investigators. The groups are almost always
dissimilar, and treatment regimens are almost always different. Although
such a comparison is acceptable in the discussion section, it should
never appear in the conclusions. Conclusions can report only the results
of the present study in answer to the initial question or hypothesis.
Journals and authors usually express results in positive numbers, a
practice which can be misleading to the clinician. For example, an 85%
success rate might sound impressive, but viewing the same results as a
negative (a 15% failure rate) may have more impact on the
decision-making process.3 An example is a recent article in the Journal of the
American Dental Association evaluating Class V restorations with and
without mechanical retention.5 The authors claimed that ‘‘restoration of
Class V lesions without using mechanical retention could be expected to
succeed in seven of 10 restorations over a three year period,’’ but
clinicians must determine if a 30% failure rate after 3 years is an accept-
able result in their practices.
Results are also presented in terms of statistical significance, and
unfortunately statistical significance does not always relate to clinical
significance. For example, an investigator may use an extremely accurate
measurement device which can report attachment loss around teeth in
tenths of a millimeter. After 2 years of treatment with drug X, the
study shows a statistically significant difference in attachment loss of
0.01 mm when compared with scaling and root planing. But are the results clinically
significant, especially if the drug therapy causes an after-effect? The
clinical relevance of a statistically significant finding is best determined
by the clinicians reading the report and determining if the results are
applicable to their patients.
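The distinction can be made explicit by asking two separate questions of a result: whether the difference is unlikely to be due to chance, and whether it is large enough to matter to a patient. The sketch below is illustrative only. The P value and the threshold for a clinically meaningful difference are hypothetical numbers that a clinician would have to supply, and the function name is an assumption of this example.

```python
def appraise_difference(difference_mm, p_value, meaningful_mm, alpha=0.05):
    """Return (statistically_significant, clinically_relevant) for one result."""
    statistically_significant = p_value < alpha
    clinically_relevant = abs(difference_mm) >= meaningful_mm
    return statistically_significant, clinically_relevant


# The 0.01-mm difference from the example above, with a hypothetical
# P value of 0.01 and a hypothetical 0.5-mm threshold chosen by the clinician.
result = appraise_difference(difference_mm=0.01, p_value=0.01, meaningful_mm=0.5)
print(result)
```

A result that passes the first test but fails the second is the case described above: a real difference that is too small to justify a change in therapy, particularly if the new therapy carries side effects.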
The clinician must also understand the difference between a biologic
response and a clinical response. A new mouthwash may demonstrate
the ability to kill more bacteria or viruses (a biologic response), which
may have no clinical relevance. If, however, a clinical response such as
a decrease in periodontal disease, caries, or malodor can be demon-
strated with the use of the mouthwash, the data have relevance for the
clinician. Another example is the family of glass ionomer restorations
and cements. Fluoride release (a biologic response) is meaningless to the
practitioner unless a documented decrease in caries (a clinical response)
can be demonstrated.
When confronted with evidence that conflicts with the current stan-
dard of care, an experienced clinician can be biased in evaluating data.
‘‘I’ve been doing this for years and it works in my hands,’’ may not be
an acceptable excuse to disregard compelling studies.
The definition of success is controversial, and consensus is often
difficult to achieve. How long should a restoration last? If it is still in
place, but staining compromises the esthetics, is the procedure a success
or failure? How long should an implant last? If there is a loss of
osseointegration along one wall of the implant, but the fixture is rigid
and there is no pain, is the implant a success or a failure? What is an
acceptable success rate for molar endodontics? If there is a small periapi-
cal radiolucency, but the patient has no pain, is it a success or failure? If
the patient has intermittent recurrent pain, but the radiograph demon-
strates a perfect fill, is it a success or a failure?
Success is also tempered by the cause of the failure. Recurrent decay
or periodontal problems that compromise a full-coverage restoration and
are caused by poor home care are different from the same problems
caused by defective margins. A patient fracturing a restoration by biting
into an olive pit is different from a failure caused by an overlooked
occlusal prematurity. Also, changes in the clinician’s advice to patients
can cause embarrassing moments if the dentist is not willing to admit
that current good research has caused a change in thinking. Should a
clinician restore a patient’s lost molars? Patients need teeth to masticate,
to phonate, and for esthetics. If the lost molars are not in the esthetic
zone, the patient has no problem eating or speaking, and extrusion of
the opposing dentition has not occurred or is not a concern (the opposing
molars are also missing) why restore? This argument has intensified
with an article by Witter et al9 in the Journal of Dental Research that
showed 9-year prospective evidence questioning the rationale for restor-
ing the missing molars, and the controversy will persist as researchers
supply more data.
SUMMARY
In dentistry, most changes in therapy come from new techniques
and products that are introduced to the market. Clinicians (and patients)
can be overwhelmed by advertisements and marketing, some obvious
and some (e.g., paid clinical reports in non–peer-reviewed journals) not
so obvious. Because most advances are made with small case studies,
which are at a lower level of evidence, it is imperative that the data clinicians
read or see have the greatest validity possible. This validity is essential
to achieve evidence-based dentistry that uses relevant, high-quality, clini-
cally oriented research that provides better information for the clinician
and better treatment for the patient.
References
1. Brunette DM: Critical Thinking. Understanding and Evaluating Dental Research. Chi-
cago, Quintessence Publishing Co, 1996
2. Chambers D: The big placebo. J Am Coll Dent 66:2–5, 1999
83:599–603, 2000
84:8–16, 2000
5. McCoy RB, Anderson MH, Lepe X, et al: Clinical success of class V composite resin
restorations without mechanical retention. J Am Dent Assoc 129:593–599, 1998
6. Rosenstiel SF, Land MF, Crispin BJ: Dental luting agents: A review of the current
literature. J Prosthet Dent 80:280–301, 1998
7. Sackett D, Richardson WS, Rosenberg W, et al: Evidence-based Medicine: How to
Practice and Teach EBM. New York, Churchill Livingstone, 1997
8. Studer S, Lehner C, Brodbeck U, et al: Short-term results of IPS-Empress onlays and
inlays. J Prosthodont 5:277–287, 1996
9. Witter DJ, Creugers NHJ, Kreulen CM, et al: Occlusal stability in shortened dental
arches. J Dent Res 80:432–436, 2001

Division of Restorative and Prosthodontic Sciences
New York University College of Dentistry
345 East 24th Street
Clinic 5W
New York, NY 10010–4086
e-mail: gary.goldstein@nyu.edu
THE ETHICS OF EXPERIMENTING IN DENTAL PRACTICE
David W. Chambers, EdM, MBA, PhD
This article is not about the experiments dental researchers conduct
in laboratories or controlled clinical trials. It is about the far more
common experiments dentists conduct in their offices: for example, the
first time a new procedure is performed following a continuing education
course, using a material ordered as a sample, performing endodontics
on a molar more complex than any attempted in recent years, or
proceeding with a large case in which several alternatives look equally
attractive.
There is a very simple and well-known rule of ethics for performing
procedures in which there is some attendant risk: Primum non nocere—
above all, cause no harm. This injunction is often attributed to the
Hippocratic Oath, and it has become famous among malpractice attor-
neys and writers of editorials.
The truth is that primum non nocere does not appear in the Hippo-
cratic Oath, and it is doubtful advice.6 It is a Latin gloss on the older
Hippocratic admonition that might better be translated, ‘‘You have been
given great power as a doctor; use it for good and not for evil.’’ It would
be unwise to make avoiding harm the ultimate standard for a care
provider. The only certain way to assure avoiding harm would be to
avoid undertaking treatment altogether.
Attempting to do good for patients always carries risk. This
article addresses the problem of treating patients in an ethical fashion
when there is no way of guaranteeing success. Such situations are
common and unavoidable in dental practice.
From the Office of Academic Affairs and Scholarship, School of Dentistry, University of
the Pacific, San Francisco, California

GENERAL APPROACHES TO ETHICS
The recent interest in ethics in medicine and dentistry reflects the
growing range of choices in the professions. One hundred years ago,
when dentists primarily treated pain caused by advanced caries, fast
forceps were the measure of quality. As dentists began to understand
caries and periodontal diseases, diagnostic acumen assumed importance,
and a range of treatment skills was required. Still, the number of proce-
dures available per condition was small, and patients were both unaware
of alternatives and usually quite willing to follow the judgment of the
dentist. Today, patients visit the dentist in the complete absence of
symptoms for preventive reasons and to seek cosmetic enhancements.
They often bring their own opinions with them. Disease entities have
expanded to include malocclusions, temporomandibular joint considera-
tions, and oral cancers, and the options for treating even the most
basic of conditions—caries—have become bewilderingly vast. Once a
condition needing intervention is identified, there are frequently many
choices of methods and materials for treatment. Industry and continuing
education speakers pressure dentists to consider the merits of the alter-
natives they favor.
As choices multiply, the opportunities for making right and wrong
choices expand. The profession has recognized this situation and has
turned to the field of ethics for guidance. The basic texts in dental ethics
are those by Ozar and Sokol12 and Rule and Veatch.15 An organization
known as PEDNET—Professional Ethics in Dentistry Network—is de-
voted to promoting awareness and discussion of dental ethics, and its
members welcome contact at dozar@luc.edu. Dental schools across the
country are adding courses in ethics to their curricula. A national Alli-
ance for Oral Health has been created, embracing 61 organizations
involved in health care such as the American Dental Association (ADA),
insurers, specialty groups, the military, public health groups, allied den-
tal health professionals, examiners, schools, and so forth. The American
College of Dentists, long concerned with ethics and professionalism, has
an excellent, small handbook (available at www.facd.org). The winter 1996
issue of the Journal of the American College of Dentists contrasts multiple
approaches to ethical analysis of a single case involving managed care.10
Of the many approaches to ethics, the most basic is grounded on
ethical principles. Principles animated the revision of the ADA Code of
Ethics and Professional Conduct completed in 1998. In this approach, a
set of ethical principles (shown in Table 1) is used as a touchstone for
reflection and conduct. Obtaining informed consent from patients, for
example, is appropriate based on the principle of autonomy—the pa-
tients’ right to decide what is to be done with their bodies. The principle
of veracity can be cited as reason for explaining procedures and their
consequences in clear, understandable language. Such principles offer
general guidance, although conflicts can arise among the principles. For
example, a patient may want veneers when what is needed is periodontal
therapy. Autonomy and beneficence clash in this case. The issue
addressed in this article, experimenting in dental practice, can be
framed as a conflict between beneficence (helping the patient and other
patients in the future) and nonmaleficence (not harming the patient).

Table 1. COMMON ETHICAL PRINCIPLES
Principle        Definition
Autonomy         The right of the patient, the dentist, and any other competent
                 individual who is involved to determine what should be done
                 by and to them
Beneficence      An obligation to help others, normally assumed in exchange for
                 privileges granted a group such as professionals
Competence       The capacity to perform as one promises or as expected
Integrity        Consistency throughout one’s actions and language; being
                 guided by core values
Justice          Fairness in the distribution of rewards and obligations and in
                 the processes by which distribution is made; sometimes tested
                 by a willingness to trade places with others one deals with
Nonmaleficence   Avoiding unnecessary harm to others
Veracity         Telling the truth and creating environments where honest views
                 are expressed

Some dental ethicists are pushing beyond the principles approach.
Their work is prompted by questions such as, ‘‘How does a person
recognize when he or she is dealing with an ethical issue?’’, ‘‘What
happens when principles are in conflict?’’, and ‘‘Shouldn’t ethics lead to
right action as well as right thought?’’ Muriel Bebeau has applied the
work of Rest and Narvaez14 to dentistry in proposing an approach to
ethical issues in terms of moral sensitivity, moral reasoning and judgment,
moral motivation and commitment, and (at the highest level)
moral character and competence. Ozar and Sokol12 and Rule and Veatch15
have worked through many cases in dentistry, offering some thoughts on
how competing claims can be addressed and which values take prece-
dence. Bruce Peltier has written about the difficulties of taking ethical
action and has offered practical advice.13
A Discursive Approach to Ethics
The discursive approach to ethics builds on the traditional methods
presented previously.4 This approach sets a context that places greater
emphasis on people than on principles, and it favors ethical behavior
over reflection. Attention is paid to how language is used to create
ethical communities.
Dentistry takes place in a social context.8 There is an understanding
on the patients’ part that dentists are well trained, perform only those
procedures they have high confidence will be successful, value the
patients’ welfare and their own reputation, are part of a network of
professionals available for backup, and will not take advantage of pa-
tients by performing unnecessary work or charging more than is fair.
Patients also realize that they are expected to be present and prompt for
appointments, to pay their bills, to answer honestly when asked about
their health, and to comply with reasonable requests for home care
and postoperative recommendations. This general therapeutic alliance is
understood by reasonable adults. It is the background for the jury
system, and it makes health care possible and efficient. No book contains
these rules, and they are normally discussed only when something
unexpected happens. Patients participating in insurance fraud or dentists
who perform unnecessary work generally understand that they are
acting outside the normal bounds of right and wrong.
In other cases, the therapeutic alliance is ambiguous. The patient
knows a damaged tooth must be fixed. But there are choices: considera-
tions of function, appearance, and cost must be understood and
weighed. Or the patient may be uncertain whether to remain with the
current dentist. The hours are inconvenient, the staff may not show
respect, and the dentist is abstemious with explanations. Again, an
understanding must be reached. These are not cases of universal expecta-
tions that form a treatment alliance. They represent alternatives in a
range of variation that contains individuality. Some dentists are known
to be expensive or to focus on esthetics. Others are known to take a
holistic approach. Some patients have personal traits that make them
difficult to deal with; others require an inordinate amount of attention.
As long as the office team and the patient can come to an understanding
about what is mutually acceptable, the treatment alliance can be preserved
across a wide range of individual variation. Of course, individual
agreements cannot exceed the limits of public acceptability. Dentists
cannot perform medicine even if the patient agrees to medical proce-
dures, and insurance fraud is unacceptable, even with patients’ collusion.
Discursive ethicists are concerned with ethical communities and
agreements that promote civil good. Making and keeping promises is
central to a discursive view of ethics.7 The definition used in this
article is: Ethics is the creation, adjustment, and maintenance of communities
in which participants can reach their potentials.
Several aspects of this definition go beyond the traditional concept
of ethics. First, ethics is a community activity; it concerns the relation-
ships among people. There are no private ethics. Ethics is something
people do together. Second, ethical understandings are created. This is
different from some traditional notions that there are abstract ethical
principles that must be discovered or with which all people would
agree. Discursive ethics is not ethical relativism; some actions such as
lying, murder, and seeking to avoid the penalties of violating agreements
are universally abhorred. The general treatment alliance mentioned pre-
viously contains such examples. Discursive ethics also recognizes that
there can be ethical violations within specific communities. A husband
can cheat on his wife in ways that might not bother other couples. A
dentist can violate the confidence of a patient without violating the ADA
Code or any generally accepted set of ethical rules. Third, discursive
ethics is concerned with the obligation to create ethical communities and
to adjust them when necessary, as well as with avoiding breaches of
established codes. Creating systems that put people in ethical jeopardy
is as wrong as violating the rules of such a system. Some dentists
have argued, for example, that the conditions of some reimbursement
mechanisms are unethical. (They are probably wrong, however, in plead-
ing that it is ethical to violate these conditions if they have voluntarily
agreed to a contract that contains them.)
Discursive ethics uses all the methods of traditional ethical theories
to create ethical communities. Ethics is often defined as the study of
right and wrong, and some ethical theories seem to accept that distin-
guishing right from wrong is the entirety of the ethical problem. Other
theories use the determination of right and wrong as a step in the ethical
process. In traditional ethical theory, judgments of right and wrong are
often made by third parties. In discursive ethics, however, the number
of categories is broader than the right/wrong dichotomy, judgment plays
a smaller role, and the perspective is entirely from within the community.
It may be too crude to categorize people or actions as only ethical
and unethical. Some people are ethically insensitive. They just do not
understand ethical issues; they are surprised when others call ethical
lapses to their attention. They do not pay as close attention to what is
expected as others would like. Some people are ethically awkward. They
try to do good, but they are unskilled. A colleague once described a
situation in which the dentist prescribed narcotics for the same patient
four times in a single day. He said he knew he was doing wrong but he
just could not be assertive with this particular patient.
A third category is ethical abuse. Ethical abuse is more than break-
ing the rules. Abusers want the rules to remain in place precisely so
they can take advantage of others who follow the values of the commu-
nity. Scam artists take advantage of the expectation that trust will be
part of relationships. Insurance frauds defend the insurance system.
Patients who fail to honor their financial obligations steadfastly profess
a relationship with the dentist. Ethical abusers want the benefits of
participation in an ethical community without the obligations of such
participation. (Civil disobedience, by contrast, is a willingness to step
outside a community whose ethics the conscientious objector finds offen-
sive. It is an open disobeying of the community’s norms.) The response
to ethical insensitivity or awkwardness is normally to increase group
concern and to try to help the individual. In the case of abuse, the
community distances the person from the group to preserve the group.
Dentists with addiction problems and those with poor clinical judgment
or skill receive remedial treatment or training. Those who refuse remedi-
ation or engage in purposeful deception lose the privileges of dental
practice. Those who embarrass the profession are shunned.
The Ethics Test
Dentists are in partnership with three ethical communities. The first
partnership is with each individual patient. Dentists operate within the
general treatment alliance, as modified by individual circumstances. The
second relationship is with the profession. It is inherent in professionalism
that the acts of individual members affect the reputation of all
colleagues, and the reputation of the profession is an asset available for
use by the individual practitioner. Regardless of participation in orga-
nized dentistry, any who call themselves dentists are part of the commu-
nity, precisely because patients and the public see it this way. The third
relationship is with the public at large. Customs in a community, laws,
and general civil expectations apply in all cases.
Being aware of the three communities and the mutual ethical expec-
tations placed on all members of these communities is useful in creating
the ethics test. It is helpful to know when one is in an ethical situation.
Academics can always create a hypothetical context that would make a
particular act of a dentist an ethical issue, but dentists need a more
practical way of identifying, from an internal perspective, situations in
which the community is suffering from tension and abuse. If the test is
to be useful, it must work from the point of view of those in the
community. Here is the guideline: An ethical situation exists whenever
members of the community are compromised in their potentials. If the dentist
makes money by overtreating or undertreating or mistreating a patient,
it is an ethical situation. If an associate receives less compensation than
promised or a poorer mix of patients than promised, it is an ethical
issue. If a group of patients has less access to care than contracted for in
their insurance coverage or care that is limited, it is an ethical issue.
From the discursive perspective, it is possible to fashion an ethics
test. The test is oriented to the communities involved and not toward
abstract principles or personal feelings of right or wrong. The test has
two parts:
• If you believe members of the community (patients, colleagues, or
society generally) would be offended or outraged by an action on
your part provided that they knew all the relevant details—do not
do it!
• If you believe members of the community would be concerned
by an action on your part provided they knew all the relevant
details—discuss it with them.
Notice that both parts of the rule directly connect the ethical com-
munity to actions. The admonition, ‘‘Don’t do anything that would
outrage those with whom you have a relationship,’’ is obvious. The
injunction to discuss actions that might be of concern is more novel. It
speaks directly of ethics being the creation and adjustment of communi-
ties. Talking about ethical concerns goes to the point of clarifying and
renegotiating relationships. One of the conditions for membership in a
group is giving others the right to withdraw from the relationship if one
intends to change it. The principle of autonomy is important in this
concept. Veracity, another ethical principle, is also important. When
discussing an ethical concern one must be honest—as one certainly
expects of others in the community. Informed consent is largely a process
of establishing and adjusting mutual expectations in an ethical commu-

nity limited to the dentist and patient in a specific situation. The concept
can be generalized.
EXPERIMENTS IN DENTAL PRACTICE
Dental practice makes use of science in several ways. Fundamental
principles are learned in dental school and updated through reading,
discussions with friends, and continuing education. Manufacturers also
provide information of varying degrees of accuracy and usefulness. By
far the most common way dentists learn is through observing the out-
comes of their work in their own practices on their patients in their own
hands.9 This information is potentially of great value; whether it does in
fact improve practice depends on how each practitioner responds.
A common understanding of the word experiment is a carefully
designed and controlled attempt to reveal truth in a research context. In
his classic The Reflective Practitioner,16 however, Donald Schon shows that
there are other common uses of the term.
An ethical issue is involved in the translation of research findings
into practice. Ethical issues are also involved in the experiments that are
conducted on a regular basis in practice. Most dental experiments in-
volving patients are performed in offices by dentists who are not trained
as researchers and normally do not think of themselves as experiment-
ing. Experimenting is what takes place, however, when a dentist per-
forms his or her first bonding case or first posterior composite. It is an
experiment when the dentist says ‘‘Let’s keep a watch on that tooth.’’
The first injection in dental school or the first endodontics case falls into
the same category. The dental profession even experiments on a whole-
sale basis in initial licensure examinations when unlicensed dentists
perform independent care on patients with a national success rate ap-
proximating 80% (one in five state board experiments fails11).
An experiment is any planned and purposeful action where the
results can be observed and the outcomes contain risk. Table 2 shows
several categories of experiments. Two of these are discussed along with
the rules of ethical experimentation in practice, and the final two are
then considered briefly.
Scientific Investigation
There may be a reluctance to accept the idea that practitioners
perform experiments in their practices because of the dominant concept
of experimentation that comes from science. The characteristics of strict
experimental design, randomized control groups, precisely defined parameters,
and sophisticated statistical analyses are not possible in dental
practice. Dentists who are interested in this type of experimentation
normally associate themselves with universities or other research programs.

Table 2. TAXONOMY OF EXPERIMENTS
Type                       Characteristics
Scientific investigation   Extreme uncertainty regarding outcomes,
                           rigorous control, nonpractice context, purpose
                           to discover general principles, results in
                           publications
Experimental practice      High probability of success, careful observation
                           rather than control, realistic settings, purpose
                           to discover more effective methods, results in
                           improved practice
Heroic measures            High probability of failure, little control, all else
                           has failed
Doing nothing              Unknown outcomes, no control, changes in
                           practice unrelated to outcomes

Experimental Practice
Experimenting in practice is more common than it might sound. It
occurs regularly following continuing education programs, reading the
literature, or talking with colleagues. A visit from a supplier or to the
annual convention is another stimulus. Any new class of procedures is
an experiment. There is a common misconception that the ADA seal
of approval, publications in peer-reviewed journals, Food and Drug
Administration (FDA) endorsement, and other scientific validation protect
a practitioner from experimenting. Unproven products, materials,
procedures, and equipment are only one source of risk in therapy.
Another source contributing to risk is the dentist. There is risk in a
technique when it is tried for the first time, regardless of how much
scientific research has been conducted or how many other dentists have
used the technique successfully. The third major source of risk comes
from the patient. To the extent that the patient in the chair is exactly the
same as the average patient in the research studies, the risk is reduced,
but it is never eliminated. Even a generally established procedure per-
formed by an experienced practitioner can present risk if the patient has
unusual conditions, systemic complications, or idiosyncratic expectations.
Of course, there are also interactions among the three primary categories
of risk—between therapy and dentist, therapy and patient, and patient
and dentist, and the interaction of all three factors.
Previous success involving any one or two of the categories of risk
does not eliminate risk in the others. A dentist who fails in treatment
using a product well-tested in the literature is not immune from ques-
tioning about whether he or she was properly trained and experienced
in the use of the product or whether the use of the product was appropriate
in the particular circumstance. The recent concern over peer-reviewed
literature is in many ways unfortunate. It creates an impression
that only the product or procedure risk matters. The proliferation of
journals that focus on products and procedures and the small number
devoted to differences among dentists or among patients create a misperception
that therapy is the major or even the only important source
of experimental risk in practice.
THE ETHICS OF PRACTICE EXPERIMENTS
The fundamental rule for experimentation in practice is this: if your
patients or colleagues would be shocked to learn that you had tried the
treatment, do not do it; if they would be concerned, discuss it with
them; if there would be no concern, proceed. Discussing treatments one
uses with patients is a matter of informed consent. Discussions with
colleagues are often informal, such as case discussions at component
society meetings, but they could be formalized as literature searches or
seeking the advice of known experts.
An experiment is not necessarily a failure because it does not go as
planned; it is always a failure when it should not have been attempted
in the first place. A motorcyclist who weaves between lanes of automobile
traffic may sustain injury or worse because he or she is a poor rider
or because an automobile driver makes an unexpected maneuver. The
risk lies not so much in the cyclist’s skill as in the poor judgment of being
between the cars. Discursive ethics is concerned with creating ethical
circumstances as well as with acting ethically. There are four ethical
standards for experimentation in practice:
1. The action is undertaken for improving patient oral health.
2. The action is within the standard of care.
3. There is a probable expectation of success based on evidence.
4. The action is performed reflectively, systematically, and with
measured outcomes.
Patients’ Interests First
The patient’s interests must always be the primary concern, and the
reasons for experimentation must always be to improve patient oral
health. Placing patients at risk in hopes of finding a faster or more
profitable way of delivering care is unethical. It is true that all three
parties (dentist, profession, and patient) are at risk in most practice
experiments, but patients cannot be co-opted into endeavors in which
they bear risk for the sake of others’ potential gain. It is insufficient to
argue that patients tacitly agree to general experimentation by agreeing
to care. (Treatment in dental schools is a possible exception to the rule.)
A special challenge to the principle of patients first involves the
difference between the interests of patients individually and collectively.
Can an individual patient be expected to bear the risk for improvements
that will benefit patients generally? This problem is handled in research
by informing patients that they are participating in an experiment, that
they may receive either a standard treatment or an experimental one,
and of the expected outcomes of each. In such circumstances, patients
must consent to participate in a set of therapies that include uncertain
alternatives.
As a general practice, informed consent is vital when attempting a
novel treatment. Consent has the following advantages: (1) it forces the
dentist to think through what is being done in a rigorous fashion; (2) it
offers some legal protection; and (3) it clarifies exactly what is in the
patient’s interests. Sometimes dentists undertake heroic or innovative
treatments on the assumption that patients would prefer these courses
of action. (Certainly, dentists would prefer the successful outcomes if
the odds were not an issue.) Sometimes, a conversation with the patient
about the risks involved reveals that the risks are acceptable but the
proposed outcome is not what the patient prefers. Certainly, honest,
informed consent serves as a check that the innovative treatment is
being done for the patient’s benefit and not the dentist’s. If the dentist
must disclose that a novel treatment is being undertaken primarily for
his or her benefit, the ethical rule ‘‘if there is a concern, discuss it with
those involved’’ will preserve the dentist’s integrity (or the dentist will
lie, most often through incomplete disclosure).
Standard of Care
The second criterion for ethical experimentation is grounded in the

standard of care. The standard of care is a legal concept and one that is
rather fuzzy at the edges—precisely where office experimentation is
involved. In an important sense, the standard of care is an operational
form of the ethical rule ‘‘if one’s colleagues would be shocked at what
was done, do not do it.’’ The normal form of the argument in the
standard of care is that a particular example of therapy for a given
patient and performed by a dentist of certain qualifications falls into a
class that other practitioners would accept. Le Fort resections are reserved
for specialists, often those with specific training. Surgical extractions
can be done by general dentists, but there will be some question
about what other surgical experience the practitioner has and what
protocols were followed. The standard of care does allow for experimen-
tation, but what constitutes acceptable innovation is subject to review
by the standard of what one’s professional peers are doing.
Grounds for Expecting Success
Third, there must be probable reason to expect success with the new
product or procedure or patient. This baseline of probable success can
be established by studying its scientific basis, in conversations with
people who have first-hand knowledge and experience, or through the
dentist’s own experience with similar situations. In a highly abstract
sense, every treatment is a novel application of product and process,
dentist experience, and patient characteristics. Practically, each case is an
example from a class of similar factors. With extensive experience with
similar products or procedures, with dental experience in similar cases,
and familiarity with given categories of patients, the risk goes down.
There are no sharp categories regarding grounded experimentation. The
burden of proof increases rather sharply, however, when the dentist has
to answer that he or she has never used this therapy or any like it, has
little or no experience in such treatment, or has never done such work
on this type of patient. Before trying something new, dentists must ask
themselves, ‘‘On what grounds am I willing to justify taking this risk?’’
Systematic Approach
The final criterion dictates that unusual treatments require unusual
care in their execution. Experimentation cannot be capricious. Dentists
are expected to reflect on alternatives and their benefits and risks and to
share the results of their reflections. The treatment also must be delivered
in a careful fashion, and the results must be recorded. It is valuable in
some cases to prepare a written protocol for innovative treatments. At
an absolute minimum, the reasons for performing experimental work
must be entered in the chart.
Recording the outcomes of experimental procedures is critical. There
is much to be gained from recording outcomes on a routine basis for all
treatment, but experimental procedures are a special case. When
exposing patients, one’s self, and the profession to risk, it is imperative
to learn as much from the experience as possible. Recording outcomes
is necessary to reduce the exposure of further patients and others to
similar risk. If a treatment seems reasonable based on the patient’s
interests, standard of care, and available evidence but results differ from
expectations, the dentist will need to have good information about the
outcomes. Saying that, ‘‘It just didn’t turn out as planned,’’ or, ‘‘We’ll
have to do more such experiments to clarify the situation,’’ are signals
of ethical jeopardy.
The preceding discussion has focused on office experiments that
realistically have a high probability of success. The experiment is ethical,
provided that it meets the criteria of aiming to improve patient care
within the standard of care, is based on treatment that is known to have
a reasonable basis for successful outcomes, and is undertaken in a
reflective fashion. When some of the criteria approach the borderline,
honest communication with the patient will resolve the matter. If any
criteria are not met, office experimentation is unwise. Patients cannot
consent to risks others would regard as foolish.
Heroic Experiments
Heroic experiments are high risk. Although they may be undertaken
in the patient’s best interests, they normally fail two other tests: being
within the standard of care and having evidence of probable success.
Normally, heroic efforts are considered only when there is no other valid
alternative. Professional groups and the public at large normally frown
on such interventions because they expose both the individual patient
and the system for deciding what is appropriate behavior to risk. Den-
tists who may be attracted to such interventions are well counseled to
investigate the standard of care carefully.
The fundamental justification for heroic effort is that all other con-
ventional alternatives have been exhausted and that great risks are
justified to protect the patient from grave harm. There are presumed
trade-offs between the criterion for evidence of probable success and the
criterion for improving the patient’s well being. For such trade-offs to be
considered valid, there is a greatly heightened requirement for informed
consent. The patient’s true interests must be carefully explored, and
there must be overwhelming evidence that the patient understands the
risks associated with various outcomes (including no treatment) and
that the patient has made a completely uncoerced decision. The criteria
are written in capital letters when cases of experimentation in the dental
office deviate from standard circumstances. There may also be cases in
which the patient agrees to heroic treatment that would shock the
profession or the public. A private agreement between the patient and
the dentist—for example, to practice outside legal limits—is still unethi-
cal because there are communities to consider other than the patient.
The Invisible Experiment

Doing nothing is quite literally impossible. Sins of omission are still
sins, as anyone who has been sued for failure to diagnose periodontal
disease will verify. Doing nothing in the context of this article means
adopting a hyperconservative approach and seeking to avoid experimen-
tation in the office by doing only what has been done successfully in
the past. As long as patients do not change, as long as their expectations
remain unaffected by media or reimbursement plans, and as long as no
other dentists innovate, this is a sound strategy. Professionals, however,
have an ethical responsibility to their colleagues to practice to an evolv-
ing standard of care. Technically speaking, a dentist should reveal as
part of informed consent that therapies being offered are behind the
times or that a definitive diagnosis is not being made because of out-
dated knowledge.
READING THE LITERATURE

This article has explored the ethics of experimentation in dental
practice. There is also a well-developed literature on the ethics of
research.17 An area between these two raises some interesting ethical
questions. What is right or wrong in moving knowledge from the
scientific literature to the office practice?
As much as practitioners might wish it were otherwise, responsibil-
ity for using the scientific literature in dentistry rests almost entirely
with the dentist. Certainly, there is bad science, and some of it is
published in peer-reviewed journals or other sources that attempt to
present themselves as authoritative. The ADA and the FDA perform a
valued service in establishing standards for products and materials, but
many products do not seek this approval, including some effective
products that fall outside the FDA’s mandate. There are also some sound
products whose developers choose not to list with the ADA because of
the length of time required for approval or the restrictions on advertising
that the ADA places on products. Further, these organizations review
only products, materials, and devices that make therapeutic and some
cosmetic claims; supplements, for example, fall outside their purview.
When a clinically proven product fails to perform in a particular dentist’s
hands, manufacturers reflexively argue that the failure results from the
dentist’s technique.
Even peer review is not a sufficient standard. In 1998, the Journal of
the American Medical Association published an entire issue on the medical
literature. Included in the publication were a number of papers that
examined the uses and impact of peer review. In several respected
medical journals, the agreement among reviewers was low, and there were
even cases in which, over the entire period studied, the consistency
between peer reviewers and the decision to publish was negative—the
higher the rating by reviewers, the less likely the manuscript was to be
published.3 The situation in dentistry is unknown. The only dental
journal that annually publishes the acceptance rate of manuscripts and
the concordance between reviewers and decision to publish is the Journal
of the American College of Dentists. The rate of concordance in that journal
is moderately high, between 0.60 and 0.80.
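As a rough illustration of what a concordance figure of this kind measures, the following Python sketch computes a Pearson correlation between reviewers' ratings and the editor's decision. The source does not say which statistic the journals used, so the choice of correlation, the function name, and all of the data are assumptions made for illustration only.

```python
from math import sqrt


def concordance(ratings, decisions):
    """Pearson correlation between reviewer ratings and publish decisions.

    ratings   -- mean reviewer score per manuscript (hypothetical scale)
    decisions -- 1 if the manuscript was published, 0 if it was rejected
    The choice of Pearson correlation is an assumption, not the source's method.
    """
    n = len(ratings)
    if n != len(decisions) or n < 2:
        raise ValueError("need two equal-length series with at least two items")
    mean_r = sum(ratings) / n
    mean_d = sum(decisions) / n
    cov = sum((r - mean_r) * (d - mean_d) for r, d in zip(ratings, decisions))
    var_r = sum((r - mean_r) ** 2 for r in ratings)
    var_d = sum((d - mean_d) ** 2 for d in decisions)
    if var_r == 0 or var_d == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / sqrt(var_r * var_d)


# Hypothetical data: six manuscripts, rated from 1 (poor) to 5 (excellent).
ratings = [4.5, 4.0, 3.5, 3.0, 2.0, 1.5]
editor_follows_reviewers = [1, 1, 1, 0, 0, 0]   # high ratings are published
editor_ignores_reviewers = [0, 0, 0, 1, 1, 1]   # high ratings are rejected

if __name__ == "__main__":
    print(round(concordance(ratings, editor_follows_reviewers), 2))
    print(round(concordance(ratings, editor_ignores_reviewers), 2))
```

A positive value means that highly rated manuscripts tend to be published; a negative value corresponds to the situation described above, in which higher ratings go with a lower chance of publication.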
The credibility of published research findings cannot be assured
even by the best external reviewers. Three problems cannot be resolved
through the review process: (1) internal versus external validity, (2)
generalizability, and (3) the baseline problem. Because the individual
dentist cannot transfer responsibility for any of these problems to the
research or the journalistic communities, the practitioner must exercise
ethical practices in these areas as well. In fact, the solution to this
problem has already been addressed—dentists must perform reasonable
experiments in their own practices using the ethical standards discussed
previously.
Internal versus External Validity
Steady advances in the theory and practice of experimental design
and hypothesis testing have brought both basic science and clinical
dental research to a high level of sophistication. The standards for
judging the scientific rigor of research are well understood and are fairly
consistently applied by reviewers. The problem is inherent in the theory
of research design itself.2 The rigor that has been developed is largely in
the area known as internal validity. Controls, placebos, cross-overs,
statistical tests, and so forth all work to increase the likelihood of valid
conclusions in the context in which the research was conducted. A well-
designed study of patients in a nursing home tells about that nursing
home; a clinical trial of a new material conducted at a university applies
to that university. Scientific rigor is important, and reviewers are custom-
arily sensitive to the fine points of experimental design. External
validity—accuracy in general circumstances such as various dental
practices—requires high internal validity in the research, but internal
validity does not guarantee external validity.
Generalizability
External validity is commonly discussed under the heading of
generalizability,1 that is, whether the results of a clinical trial on a certain
product in specific conditions can be generalized to other settings, partic-
ularly to the office of the dentist who is reading the study and may wish
to use the product. Generalizability is a gradient. The more similar the
study conditions described in the literature are to the office where the
results will be applied, the greater the external validity and the less
likely the practitioner will be surprised. External validity, however, will
always be lower in an application than in the study on which the
application is based. An appropriate analogy is shipping cookies across
country: sometimes they arrive only slightly damaged and stale, but
they never improve during the trip.
Responsibility for estimating generalizability of research results
does not rest with the research community; it rests with individual
practitioners. There is no way for the researcher to know all of the
circumstances in which results might be applied. Only the individual
dentist knows the difference between his or her practice and the circum-
stances described in the literature. In this sense, all dental research
consists of two experiments—one conducted by the researcher, and
another conducted by the dentist. The dentist is responsible for the
second experiment, and the ethical nature of the second experiment
should follow the rules already developed.
The Baseline Problem
There is much discussion today regarding evidence based dentistry.
Although the term has been used to describe a variety of activities, the
basic approach seems to be a concern that dental practice be based more
securely on evidence from scientific studies. Certainly, the issues of
internal validity and generalizability must be considered as tempering
the widespread use of this approach. Another issue is also troublesome.
The concept of evidence based dentistry was borrowed from medicine,
and the concept may not carry over effectively to dentistry. Physicians
spend a substantial amount of their practice time diagnosing a broad
range of conditions, but treatment is delegated to nurses, other physi-
cians, therapists, and even to patients using prescribed medications.
Dentists diagnose a much smaller number of conditions, and they
treat those conditions themselves. Problem-solving is a smaller part of a
dentist’s role than treatment, and dentists develop intimate, intuitive
experience of the outcomes of treatment because of their direct involve-
ment in it. In other words, dentists have a rich baseline understanding
of patient conditions.
The baseline problem is a sophisticated issue in scientific decision
making.5 The most basic explanation of the baseline problem is that
valid decisions are made based on what is known in a general sort of
way about classes of conditions (the baseline knowledge) and on what
can be found out by inquiry (the evidence). When trying to determine a
value, such as pocket depth readings or the expected rate of decay
observed in an incipient carious lesion, the best strategy is to combine
the baseline knowledge and the evidence. Dentists do so intuitively
when they shade the probing depth reading based on other probings in
the area or modify their estimate of expected rate of caries advancement
based on both the lesion itself and baseline factors such as the age of
the patient, other evidence of caries in the mouth, and an assessment of
home care.
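One common way to formalize this kind of combination, offered here only as a sketch and not as the method of the cited work, is to weight each source by its precision (the inverse of its variance). The function name, the millimeter values, and the standard deviations below are hypothetical.

```python
def combine_estimates(baseline, baseline_sd, evidence, evidence_sd):
    """Blend a baseline expectation with a new observation.

    Uses inverse-variance (precision) weighting, which is an assumed
    formalization: the less uncertain source receives the larger weight.
    """
    if baseline_sd <= 0 or evidence_sd <= 0:
        raise ValueError("standard deviations must be positive")
    w_base = 1.0 / baseline_sd ** 2
    w_evid = 1.0 / evidence_sd ** 2
    return (w_base * baseline + w_evid * evidence) / (w_base + w_evid)


if __name__ == "__main__":
    # Hypothetical probing example: neighboring sites suggest about 4 mm
    # (baseline), while the current reading is 6 mm (evidence).
    print(combine_estimates(4.0, 1.0, 6.0, 1.0))   # equally trusted sources
    print(combine_estimates(4.0, 1.0, 6.0, 0.5))   # reading trusted more
```

When both sources are equally uncertain the result is their midpoint; as the reading becomes more reliable, the combined estimate moves toward it.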
When the decision involves a course of action rather than a value
estimate, a different logic applies. The rule is always to go with either
the baseline or with the evaluation evidence, whichever has a higher
probability of being accurate. To extract or to treat endodontically, to
bleach or not to bleach, to use an implant or a crown are decisions that
are mutually exclusive—one action excludes the other. Most carious
lesions are best treated based on the individual practitioner’s experience
in the practice (baseline) rather than the literature (external evidence).
The same is true, to varying degrees, for many other treatment decisions
in practice. It must be remembered, however, that whether the dentist
follows practice patterns or the literature in a particular case, if there is
any probability for surprise, a practice experiment is being conducted,
and the appropriate ethics must be observed.
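The contrast with value estimates can be made concrete in a short sketch. Because mutually exclusive actions cannot be averaged, the rule reduces to picking the more trustworthy source. The function name and the probabilities are hypothetical, and breaking ties in favor of the baseline is an assumption, not something the text specifies.

```python
def choose_source(p_baseline_accurate, p_evidence_accurate):
    """Return which source should drive a mutually exclusive treatment choice.

    Each argument is the (hypothetical) probability that following that
    source leads to the right action. Ties go to the baseline by assumption.
    """
    for p in (p_baseline_accurate, p_evidence_accurate):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    if p_baseline_accurate >= p_evidence_accurate:
        return "baseline"
    return "evidence"


if __name__ == "__main__":
    # Hypothetical figures: long personal experience versus a single study
    # carried out in a dissimilar setting.
    print(choose_source(0.85, 0.70))
    # A novel material with little office experience behind it.
    print(choose_source(0.55, 0.75))
```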
References
1. Brennan RL: (Mis)Conceptions about generalizability theory. Educational
Measurement: Issues and Practice 19:5–10, 2000
2. Brunette DM: Critical Thinking: Understanding and Evaluating Dental Research. Carol
Stream, IL, Quintessence, 1996
3. Callaham ML, Baxt WG, Waeckerle JF, et al: Reliability of editors’ subjective quality
ratings of peer reviews of manuscripts. JAMA 280:229–231, 1998
4. Chambers DW: Looking for virtue in a virtuous society – discursive ethics and dental
managed care. J Am Coll Dent 63:39–42, 1996
5. Chambers DW: The roles of evidence and the baseline in dental decision making. J
Am Coll Dent 66:60–68, 1999
6. Chambers DW: Above all, check your references. J Am Coll Dent 67:2–3, 2000
7. Chambers DW: Promises. J Am Coll Dent 67:51–55, 2000
8. Chambers DW, Abrams RG: Dental Communication. Sonoma, CA, The Ohana
Group, 1986
9. Chambers DW, Eng WRL Jr: Practice profile: The first twelve years. Journal of the
California Dental Association 12:25–32, 1994
10. Ethics of managed care. J Am Coll Dent 63: entire issue, 1996
11. Licensure Results. News & Views [the newsletter of the American College of Dentists]
26:5, 1998
12. Ozar D, Sokol D: Dental Ethics at Chairside: Professional Principles and Practical
Applications. Washington, DC, Georgetown University Press, 1999
13. Peltier B: Reflection, introspection, and communication: A psychologist’s view of
dental ethics. J Am Coll Dent 67:33–38, 2000
14. Rest JR, Narvaez D: Moral Development in the Professions: Psychology and Applied
Ethics. Hillsdale, NJ, Lawrence Erlbaum, 1994
15. Rule J, Veatch R: Ethical Questions in Dentistry. Chicago, Quintessence, 1993
16. Schon DA: The Reflective Practitioner: How Professionals Think in Action. New York,
Basic Books, 1983
17. Toward responsible research conduct: The role of scientific societies. J Dent Res 75:823–
860, 1996

David W. Chambers, EdM, MBA, PhD
University of the Pacific
2155 Webster Street
San Francisco, CA 94115
e-mail: dchambers@uop.edu
CONDUCTING A SEARCH OF
THE LITERATURE
David A. Felton, DDS, MS
The explosion of dental literature over the past decade is absolutely
unparalleled in the history of dentistry. Never before has so much
literature been in print, and never before has the clinician been expected
to have current knowledge of such a variety of materials and techniques
to provide the best possible dental care. It is impossible for a clinician
to read everything that is published on a monthly basis or to be expert
in all aspects of dentistry. Frequently, however, the clinician needs to be
able to draw on published reports to recommend treatment modalities
to patients without relying solely on the dentist’s own individual experi-
ence or other empiric methods. The purpose of this article is not to
provide concise information on how to conduct an overview of the
literature, but rather to show how to search the literature for articles
that may be appropriate in answering a particular question related to
patient care. A review of the methodology of conducting an overview
has been published elsewhere.1–3
Perhaps the most time-consuming portion of any review of the
published literature on a given subject is the actual literature search
itself. Several sources should be included in a comprehensive search of
the literature. These sources are traditional references (such as Index
Medicus or Index to Dental Literature); peer-reviewed dental journals in
print, CD-ROM, and on-line electronic formats; electronic databases such
as PubMed, MEDLINE, and Grateful Med; and contacts with appropriate
source individuals.
From the Department of Prosthodontics, The University of North Carolina School of
Dentistry, Chapel Hill, North Carolina

TRADITIONAL REFERENCE SOURCES
Unless the clinician has access to a health sciences library, the use
of traditional source materials such as Index Medicus or the Index to
Dental Literature may be limited. Determining under which traditional
headings in the Index to Dental Literature a particular topic is listed can
also be time-consuming. Each volume of these indices generally covers
the topics published in a single year, and searching through the array of
these indices is often daunting. When reviewing these indices, the reader
is urged to begin with the most current year’s index and to work
backward in time, unless the exact publication date of an article on a
particular subject is known. The exception for this technique might be a
search for a treatment material or method that is antiquated or no longer
practiced, such as the gold foil technique or the clinical use of a particu-
lar all-ceramic crown material that is no longer manufactured. For these
historical searches, review articles might be useful initial sources for the
topic of interest.
PEER-REVIEWED JOURNAL SOURCES
Most information today is acquired from the multitude of dental
and medical journals to which dentists subscribe. Most often, membership
in a professional organization entitles the member to receive the
organization’s designated journal on a quarterly, bi-monthly, or monthly
basis. Not all these journals, however, are peer-reviewed. Peer review
generally implies that a submitted manuscript is blindly reviewed by
one or more experts on the general topic of the manuscript, that sugges-
tions for improving the manuscript are returned to the authors, and that,
following revisions, the manuscript is again reviewed and copy-edited
for clarity before being accepted for publication. Provided that the re-
viewers are skilled in the precepts of evidence based dentistry and apply
those precepts when reviewing the manuscript, one can generally expect
that most peer-reviewed articles are accurate. Subscribing to a periodical
generally entitles the member to have access to the publisher’s on-line
journal source. This on-line source gives the reader access to all pub-
lished manuscripts from that journal, including all back issues that have
been entered into the on-line source. Generally, the subscriber logs onto
the publisher’s website, enters the specific journal log-in information,
selects a password, and begins the search process. Key words can often
be used to search the databases and generate a list of related articles
published by that particular journal. After reviewing the titles, the reader
can select the articles of interest; often, an abstract of the article is
available for review. Finally, many publishers allow the reader to select
and review the entire article on-line, including all tables and figures.
One advantage of this technique is that it allows free access to the
journal as long as the clinician subscribes to it (individually or through
membership in a sponsoring professional organization); it also allows
the clinician to discard old journal issues that may be consuming valu-
able space in the office or home. The disadvantage of this search tech-
nique is that it allows a search for articles in only one particular journal,
rather than providing a more comprehensive listing of all articles pub-
lished on any given topic. This technique may prove too limiting when
treatment decisions require a more comprehensive approach. Several
journals typically underwrite dental conferences or symposia and often
provide a CD-ROM or on-line review of the conference proceedings, for
an appropriate fee. This review may serve as an additional source of
information for the busy clinician. Finally, several journals provide
CD-ROM disks of their published manuscripts for persons without internet
access.
ELECTRONIC DATABASES
One of the easiest and most cost-effective methods to search the
literature is through the use of the PubMed service from the federal
government. This service can be accessed on the internet at
http://www.ncbi.nlm.nih.gov/PubMed. The PubMed system was developed by
the National Library of Medicine, located at the National Institutes
of Health (NIH), and through the National Center for Biotechnology
Information (NCBI). PubMed serves as an excellent search tool for ac-
cessing dental, medical, and biomedical literature citations and provides
links to full-text journals at the web sites of participating publishers.
Publishers participating in the PubMed service electronically submit
their articles just before or at the time of publication. In addition to the
biomedical literature, PubMed provides access and links to various
databases, such as those that contain DNA and protein sequences, popu-
lation-study data sets, and assemblies of complete genomes through
their integrated system. For the practicing clinician, PubMed provides
free access to MEDLINE and Internet Grateful Med, the bibliographic
databases that serve as an excellent source for obtaining current litera-
ture citations.
MEDLINE is PubMed’s premier bibliographic database covering
medicine, dentistry, nursing, veterinary medicine, preclinical sciences,
and the health care system. MEDLINE contains more than 11,000,000
citations dating back to the mid-1960s and contains bibliographic cita-
tions and author abstracts from more than 4000 biomedical journals
published in the US and 70 foreign countries. Only about 80% of the
current journals participate in the MEDLINE citation service, however,
so some information may not be provided. When one accesses the
MEDLINE system and types in the keyword dentistry, the system lists
more than 243,500 citations of the dental literature. These citations are
listed according to key words (or under Medical Subject Heading
[MeSH]) provided by the authors and publishers.
Internet Grateful Med provides free access to MEDLINE, AIDSLINE,
AIDSDRUGS, AIDSTRIALS, BIOETHICSLINE, DIRLINE, HISTLINE, OLDMEDLINE,
POPLINE, TOXLINE, SPACELINE, SDILINE, HSRPROJ,
HealthSTAR, and ChemID. Internet Grateful Med allows the use of
Loansome Doc Document Delivery service, through which the entire text
of published journal articles can be individually ordered. This service is
conducted through a local or regional library, and a fee (which depends
on the library used, but can be $8.00 or more per order) is charged for
the service. If a limited number of articles is required, however, this
service may be more cost-effective than ordering a journal subscription.
The clinician must take this vast array of electronic information and
limit it to the area of interest. Clearly, search strategies must be employed
to reduce the volume to a usable size for the busy reader. The first step
is to develop a logical question. As with any evidence based assessment
of the literature, the hierarchy of literature categories comes into play.
Thus, the study design of interest (randomized, controlled clinical trials
[RCT], prospective clinical trials, retrospective analyses, cross-sectional
studies, or case based assessments) must be reflected appropriately in the
key word selection. Simple additions or deletions of letters or words can make
huge differences in the number of citations that MEDLINE lists.
Recently, the author conducted two searches of the dental literature
to determine (1) the success rates of single-tooth implant therapy, and
(2) an outcome assessment of root canal therapy. Listed in Tables 1 and
2 are the search strategies employed, along with the number of citations
MEDLINE provided for each search strategy. The addition of an ‘‘s’’ to
one word or changing a key word from ‘‘treatment’’ to ‘‘therapy’’ had a
significant effect on the number of citations listed. Similarly, the addition
of RCT, prospective trial, and retrospective trial affected the outcomes for
the two individual search strategies employed. Occasionally, it is neces-
sary to use trial and error to limit the search to a manageable list.
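Readers who want to repeat this kind of comparison programmatically can use the E-utilities interface that NCBI now provides for PubMed. The sketch below builds the variant search strategies and the corresponding count queries; the helper names are hypothetical, the key words are those used in the text, and counts returned today will differ from those reported in the tables because the database has grown.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def strategy_variants(topic_phrases, design_terms):
    """List each topic phrase alone and combined (AND) with each study design."""
    variants = []
    for phrase in topic_phrases:
        variants.append(phrase)
        for design in design_terms:
            variants.append(f"{phrase} AND {design}")
    return variants


def build_count_url(term):
    """Build an ESearch URL that asks PubMed only for the number of hits."""
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def fetch_count(term, timeout=10):
    """Query PubMed for the citation count (needs network access).

    NCBI asks users to keep request rates low; pause between calls when looping.
    """
    with urlopen(build_count_url(term), timeout=timeout) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


if __name__ == "__main__":
    topics = ["single-tooth implant treatment", "single-tooth implant therapy"]
    designs = ["RCT", "prospective trial", "retrospective trial"]
    for term in strategy_variants(topics, designs):
        # Printing the URL only; call fetch_count(term) to run the search.
        print(term, "->", build_count_url(term))
```

Keeping the variants in one list makes it easy to see how a single changed word alters the count, which is the trial-and-error process described above.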
At last check, from the 243,500 dental citations, use of the key words
dentistry and RCT yielded a total of 20 citations; dentistry and prospective
trial yielded 48 citations; dentistry and retrospective trial yielded 38
citations; dentistry and cross-sectional studies yielded 735 citations; and
dentistry and case-controlled studies yielded 20,507 citations. Thus, it can be
concluded that the field of dentistry has a long way to go to provide the
clinician with an adequate amount of conclusive evidence for planning
treatment interventions in patient care.

Table 1. RESULTS OF SEARCH STRATEGIES FOR SINGLE-TOOTH IMPLANT THERAPY

Search Strategy                                           No. MEDLINE Citations Listed
Single-tooth implant treatment                            208
Single-tooth implant treatment and RCT                    0
Single-tooth implant treatment and prospective trial      0
Single-tooth implant treatment and retrospective trial    0
Single-tooth implant therapy                              161
Single-tooth implants therapy                             36
Outcome of single-tooth implant therapy                   39
Outcomes of single-tooth implant therapy                  8
Treatment outcomes of single-tooth implant therapy        30

RCT = randomized, controlled trial.

Table 2. RESULTS OF SEARCH STRATEGIES FOR OUTCOME ASSESSMENT OF ROOT CANAL THERAPY

Search Strategy                                           No. MEDLINE Citations Listed
Outcome of root canal therapy                             179
Outcome of root canal therapy and RCT                     0
Outcome of root canal therapy and prospective trial       1
Outcome of root canal therapy and retrospective trial     0
Outcomes of root canal therapy                            20
Outcome of endodontic therapy                             120
Outcomes of endodontic therapy                            14
Outcome of root canal treatment                           189
Outcomes of root canal treatment                          20
Outcome of endodontic treatment                           125
Outcomes of endodontic treatment                          15

RCT = randomized, controlled trial.

Once the list of citations has been produced, each can be opened to
review an abstract of the article (if available) for initial review. If a useful
article is found, one can improve search results by selecting the related
articles link to find others that MEDLINE has assessed. The entire text
of the article can then be ordered through a link to the publisher (if
available), or through the Loansome Doc Document Ordering service
described previously. When the entire article has been obtained, how-
ever, it is essential that the precepts of evidence based dentistry be
accurately applied to determine whether the research methods, use of
control groups, appropriate sample size, and appropriate statistical tests
have been suitably employed so that the article provides a valid presen-
tation of data. Otherwise, to paraphrase the ancient Romans, ‘‘caveat
lector’’ or, ‘‘Let the reader beware!’’
References
1. Felton D, Lang B: The overview: An article that interrogates the literature. J Prosthet
Dent 84:17–21, 2000
2. Oxman A, Guyatt G, for The Evidence based Medicine Working Group: How to use an
overview. In Sackett DL: Evidence based Medicine: Users’ Guides to the Medical
Literature. Hamilton, Canada, McMaster University Health Sciences Centre, 1993
3. Oxman A, Cook D, Guyatt G, for The Evidence based Medicine Working Group: Users’
guide to the medical literature. VI. How to use an overview. JAMA 272:1367–1371, 1994

David A. Felton, DDS, MS
Department of Prosthodontics
Room 404 Brauer Hall, CB 7450
Chapel Hill, NC 27599–7450
e-mail: dave_felton@dentistry.unc.edu
EVIDENCE BASED DENTISTRY

Design Architecture
Catherine Hayes, DMD, DMSc
Dentists often need to make clinical decisions based on limited
scientific evidence. To base a clinical dental practice on scientific
evidence more effectively, clinicians must have the skills to evaluate the
dental literature critically. In dentistry and dental education, clinical
decision making is traditionally based on expert opinion. These opinions
usually coincide with standard practice. Recently, however, there has
been a shift to support expert opinion or standard practice with evi-
dence.
The shift toward evidence based dentistry provides an opportunity
for the transfer of scientific information into clinical decision making
(Fig. 1). Simply defined, evidence based dentistry focuses on scientific
evidence in guiding clinical decisions. The practice of evidence based
dentistry requires reviewing the results of all research relating to a
particular clinical issue and assessing the validity of the findings. An
additional step is to determine if the study’s results will help in caring
for a particular patient or group of patients, that is, to assess the external
validity (generalizability) of the study. For example, if a particular study
evaluates the effect of a specific treatment on a limited patient popula-
tion, the findings may not be applicable to the practice of a particular
clinician.
Figure 1. Clinical decision making.

From the Department of Oral Health Policy and Epidemiology, Harvard School of Dental
Medicine, Boston, Massachusetts

TYPES OF RESEARCH STUDIES
To evaluate research studies critically, clinicians must have a working
knowledge of the principles of scientific research and an understanding
of the various types of research studies. Briefly, there are two broad
categories of research: basic science and clinical research. The principles
that govern the validity of scientific research are common to both
branches of scientific research. It is more challenging to ensure that a
study is free of bias with clinical research than with basic science or
laboratory research, because in the laboratory the researcher has more
control over the environment and other variables that may influence the
results of the study. This article focuses on assessing the validity of
clinical research studies.
It is important to understand the hierarchy of evidence in clinical
research. All clinical research studies are encompassed under the broad
heading of epidemiologic studies. Epidemiology is defined as the study
of the distribution and determinants of disease frequency in human
populations.2 The distribution of disease refers to who is at risk for a
particular disease. For example, older men have the highest risk for oral
cancer. The determinants of disease are the factors that affect the individu-
al’s risk of developing a disease. For example, tobacco use increases an
individual’s risk for developing oral cancer and is thus considered a risk
factor. A risk factor may increase an individual’s likelihood of devel-
oping a disease (as smoking increases the risk of lung cancer), or it may
decrease an individual’s likelihood of developing disease (as fluoride
decreases the risk of dental caries). In clinical research the aim is to
quantify risk relationships as well as benefits of specific treatments to
improve the health of the public.
Epidemiologic studies include studies that follow the natural course
of disease or treatment effects as well as studies in which the investiga-
tors intervene in assigning a treatment for a particular condition or in
using a preventive agent to decrease the likelihood of disease. These studies
fall into two broad categories: descriptive and analytic
studies.
Descriptive Studies
Descriptive studies describe the general characteristics of the distri-
bution of a disease, particularly in relation to person, place, and time.
Descriptive studies commonly seen in the dental literature are case
reports and case series studies, which are detailed reports of an individual
patient (case report) or a group of patients (case series) with a
particular disease or who have received a particular treatment. Case
series studies abound in the dental literature. An example of a case
series study is one in which investigators report on patients treated in
their practice with a particular implant system. This report may be a
long-term study in which the investigator reports on a variety of treat-
ment outcomes. Although this study may provide interesting informa-
tion to clinicians, it cannot demonstrate the superiority of one treatment
over another without the use of an appropriate comparison group. It is
impossible to know what effect a particular treatment has on these
outcomes without making a comparison with another treatment. This
comparison is possible only with an analytic study design, described
later.
Cross-sectional surveys are another type of descriptive study; they
report the status of an individual with respect to the presence or absence
of both exposure and disease assessed at one point in time. These studies
are also limited in their ability to demonstrate definitively the benefits
of a particular treatment or the significance of a particular exposure. For
example, a study that examined 500 individuals, including a complete
oral examination, a medical examination, and an interview regarding a
variety of health, dietary, and sociodemographic factors, reports on the
association between oral health and diet. The investigators report that
individuals with good oral health also had a healthy diet and conclude that
a healthy diet contributes to adequate oral health. With a cross-sectional
study, however, it is impossible to conclude anything about causality. Adequate
oral health might enable a person to consume more fruits and vegetables
that constitute a healthy diet, a conclusion that is quite different from
the conclusion that adequate diet results in good oral health. Essentially,
in a cross-sectional study it is impossible to determine if A causes B or
vice versa; this situation is analogous to the ‘‘chicken-and-egg’’ phenom-
enon. In summary, descriptive studies are often referred to as hypothesis-
generating studies. They are often the first step in investigating a particu-
lar scientific question.
Analytic Studies
Analytic studies differ from descriptive studies in that they include
an appropriate comparison group that permits the testing of epidemiologic
hypotheses. Causality can be investigated with analytic studies.
The two broad subcategories of analytic studies are intervention and
observational studies.
Intervention studies or clinical trials are considered to be the gold
standard for clinical research studies. Because the examiner assigns the
exposure or treatment, it is often possible to blind both the subject and
the examiner to the treatment assignment, creating a double-blinded
study that minimizes bias of the study findings. Also, the ability to
assign subjects randomly into treatment groups ensures that the only
difference between study groups is the intervention being evaluated. In
a randomized study, each subject has an equal likelihood of being
assigned to any of the study groups, thus reducing the influence of bias.
This process creates groups that are relatively similar with respect to all
variables except for treatment, thus balancing the study groups in terms
of known and unknown confounders. Randomization to create similar
study groups is possible only with clinical trials and therefore signifi-
cantly increases the validity of these studies in comparison with other
clinical research study designs. Whenever possible, a clinical research
question should be addressed with a double-blind, randomized, con-
trolled clinical trial. Such a trial is not always feasible for ethical or
logistic reasons, leading investigators to choose one of the other study
designs.
In observational studies, investigators observe the natural course of
events, noting which subjects are exposed or not exposed, which have
had a particular treatment and which have not, and which have or have
not developed the outcome. There are two subcategories of observational
studies: cohort studies and case-control studies. In a cohort study, sub-
jects are selected on the basis of presence or absence of a particular
exposure (treatment) and then followed to determine the association
between the exposure (treatment) and outcome. All subjects must be
free of the disease of interest at the time the exposure is defined. Cohort
studies are efficient for the study of rare exposures, such as occupational
exposures (e.g., to asbestos), provide the ability to examine multiple
effects of a single exposure, and provide the ability to determine the
temporal relationship between exposure and disease. Cohort studies also
have disadvantages: they are inefficient for the study of rare diseases,
they may be expensive and time consuming, and they have the potential
for loss-to-follow-up bias that may affect the validity of the study.2 An
example of a cohort study in dental research is following individual
smokers and non-smokers to determine their risk for developing peri-
odontal disease. The study subjects must be free of periodontal disease
when the study begins.
The second class of observational study is the case-control study, in
which subjects are selected on the basis of whether or not they have the
disease of interest. Case-control studies are efficient for studying rare
diseases and diseases with long latency periods and have the ability to
examine multiple causes of a single disease. The disadvantages of case-
control studies include their inefficiency for the study of rare exposures,
the difficulty in establishing a temporal relationship between exposure
and disease, and their susceptibility to selection and recall bias.2 An
example of a case-control study is a study examining the association
between oral cancer and smoking. Oral cancer cases are compared with
a similar group of individuals who do not have oral cancer to determine
the difference in smoking rates between the groups. This approach was
used when it was first discovered that smoking is a significant risk factor
for lung cancer. It is important that cases and controls be selected from
the same source population to ensure that study subjects are similar
except in respect to the diagnosis of the study disease.
In summary, the study design chosen to address a specific research
question must take into account the nature of the exposure or treatment
and the nature of the outcome as well as ethical and logistic considera-
tions. For example, if one were studying the effect of two treatments on
a particular disease, to randomize subjects ethically to one treatment or
the other, there must be sufficient belief that either treatment may offer
benefits to the study participant and that neither treatment poses any
risk. This assurance is often not possible, and researchers therefore
choose one of the other analytic approaches. It is important to decide
if the disease or outcome is considered rare and thus decide which
observational design is most efficient in addressing the specific ques-
tions, keeping in mind that bias and confounding are of greater concern
in observational than in intervention studies.
STUDY SAMPLES
Clinical research is conducted using samples of subjects selected
from the population of individuals who have the disease of interest. For
example, if investigators are interested in evaluating a specific treatment
for the replacement of missing teeth, a sample of subjects who meet the
study criteria is selected. Each investigator determines a priori the
inclusion and exclusion criteria and the size of the sample to be used in
a particular study. For example, a study may include adults over age 40
years who have at least six missing teeth. These characteristics are
referred to as inclusion criteria. Anyone who smokes, who has received
antibiotic therapy within the past 6 months, or who has a history of
diabetes is not eligible to participate. These characteristics are referred
to as exclusion criteria. The inclusion and exclusion criteria are based on
characteristics that the investigator believes, from previous research or
clinical experience, may affect the results of the study. Samples are used
to estimate population values, because it is not practical to measure all
individuals in a population. Most of the application of statistics in
medicine and epidemiology involves making inferences from samples
to populations.
MEASURES OF ASSOCIATION
In epidemiologic studies, it is important to quantify the relationship
between exposure and outcome. This quantification is accomplished by
calculating a relative risk or odds ratio, values that are referred to as
measures of association. Table 1 demonstrates the method of calculation.
First, the relative risk is defined as the ratio of the incidence of disease in
the exposed group divided by the incidence of disease in the nonexposed
group. If there is no association between exposure and disease, the
relative risk is equal to 1. If the exposure increases the incidence of
disease, the relative risk is greater than 1. If the exposure is protective,
the relative risk is less than 1.
Table 1. DISEASE STATUS
Exposure Status    Positive    Negative    Total
Positive           a           b           a + b
Negative           c           d           c + d
Total              a + c       b + d

$$\text{Relative risk} = \frac{\text{incidence of disease in exposed subjects}}{\text{incidence of disease in nonexposed subjects}} = \frac{a/(a+b)}{c/(c+d)}$$
Example. A randomized, controlled clinical trial was conducted to evaluate
the effect of two treatments (scaling and root planing versus systemic antibiotic
therapy) on periodontal disease outcomes. Successful treatment was considered
to be that which resulted in a probing pocket depth of less than 4 mm at the
end of 12 months of follow-up. Of the 500 subjects in the scaling and root
planing group, 350 were classified as successful cases, compared with 250 of the
500 subjects in the antibiotic treatment group. The results are shown in Table 2.
The relative risk of 0.71 compares the proportion of successful outcomes with
antibiotic therapy (250/500) with that for scaling and root planing (350/500).
Because subjects in the antibiotic group were only 0.71 times as likely to have
a successful outcome, the standard therapy, scaling and root planing, is more
beneficial in treating periodontal disease. The classification of treatment
success may be considered arbitrary, and the investigator may wish to evaluate
several outcomes, such as actual attachment loss in millimeters.
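The arithmetic behind the relative risk in this example can be sketched in a few lines of Python. This is only an illustration: the function name `relative_risk` and its argument names are assumptions introduced here, and the counts are those reported in Table 2. Note that the "event" in this example is treatment success, so a value below 1 means the first group succeeded less often than the second.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Proportion with the event in group A divided by the proportion in group B."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group sizes must be positive")
    if events_b == 0:
        raise ValueError("relative risk is undefined when group B has no events")
    return (events_a / total_a) / (events_b / total_b)


# Counts from Table 2: successful outcomes out of 500 subjects per group.
rr_antibiotic_vs_srp = relative_risk(250, 500, 350, 500)
rr_srp_vs_antibiotic = relative_risk(350, 500, 250, 500)

print(round(rr_antibiotic_vs_srp, 2))  # 0.71
print(round(rr_srp_vs_antibiotic, 2))  # 1.4
```

Reversing the comparison gives the reciprocal (1.4), which expresses the same finding from the point of view of scaling and root planing; which group is placed in the numerator should always be stated.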
Table 2. EFFECT OF ANTIBIOTIC AND SCALING AND ROOT PLANING THERAPY ON
PERIODONTAL DISEASE
Treatment                   No. of Successful Outcomes    No. of Treatment Failures
Antibiotic                  250                           250
Scaling and root planing    350                           150
Total                       600                           400

$$\text{Relative risk} = \frac{250/500}{350/500} = 0.71$$

In case-control studies, a relative risk cannot be calculated directly, because
subjects are selected on the basis of disease status (by definition, the cases
already have disease), so the incidence of disease cannot be measured. Instead
an odds ratio is calculated using the same 2 × 2 table format. Essentially,
the odds ratio compares the odds of being exposed among cases with the odds of
being exposed among controls.
Example. A case-control study was conducted to determine the association
between cigarette smoking and periodontal disease. Subjects with periodontal
disease were compared with a similar group of subjects free of any periodontal
disease. The participants’ smoking status was then ascertained by self report
and validated by cotinine levels. Of the 1000 subjects with periodontal disease,
400 were smokers, compared with 200 of the 1000 control subjects. The results are
shown in Table 3.
The interpretation of the odds ratio is the same as that of the relative risk.
Therefore, in this example, the conclusion is that smokers are 2.7 times
more likely to have periodontal disease than nonsmokers.
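As a minimal sketch of the odds ratio calculation, the following Python fragment applies the cross-product formula ad/bc to the counts in Table 3. The function and variable names are illustrative assumptions, not part of the original article. The fragment also shows that the cross-product is the same quantity as the odds of exposure among cases divided by the odds of exposure among controls.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product ratio ad/bc for a 2 x 2 case-control table."""
    a, b = exposed_cases, exposed_controls
    c, d = unexposed_cases, unexposed_controls
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Counts from Table 3: smokers and nonsmokers among cases and controls.
or_smoking = odds_ratio(400, 200, 600, 800)

# Equivalent view: odds of exposure in cases versus odds of exposure in controls.
odds_exposure_cases = 400 / 600
odds_exposure_controls = 200 / 800

print(round(or_smoking, 2))                                    # 2.67
print(round(odds_exposure_cases / odds_exposure_controls, 2))  # 2.67
```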
CONFIDENCE INTERVALS
The measures of association are calculated with data from the sam-
ple of individuals being studied; however, it is the population estimate
of risk that is of interest. To estimate the population value of the measure
of association, a confidence interval is calculated. A confidence interval
is one method of statistical inference that allows statements to be made
about the population using data from the sample. The most commonly
used method is that of calculating a 95% confidence interval. The meth-
ods of calculation are beyond the scope of this discussion; interested
readers are referred to a statistical text.1, 5 Briefly, the data can be used
to calculate an interval that includes lower and upper limits. For exam-
ple, in a study conducted to examine the association between diabetes
and tooth loss, the relative risk was calculated to be 1.9, and the 95%
confidence interval was calculated to be 1.2 to 2.7. That is, the data
indicate that there is approximately a twofold increase in the risk of
tooth loss among diabetics as compared with nondiabetics. It can be
concluded with 95% confidence that the true risk is between a 20%
increase and a 2.7-fold increase. Because the null value of 1.0 is not
included in this interval, this result is statistically significant.
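The article leaves the calculation of confidence intervals to statistical texts. Purely as an illustration, the sketch below uses one common large-sample approximation, in which the interval is computed on the logarithmic scale and then transformed back. The choice of this approximation, the function names, and the multiplier 1.96 are assumptions made here and are not taken from the article; the counts are those of Tables 2 and 3.

```python
import math

Z_95 = 1.96  # standard normal multiplier for a two-sided 95% interval


def log_interval(estimate, se_log, z=Z_95):
    """Interval for a ratio measure, built on the log scale and transformed back."""
    log_est = math.log(estimate)
    return math.exp(log_est - z * se_log), math.exp(log_est + z * se_log)


def relative_risk_ci(a, b, c, d):
    """Relative risk with an approximate 95% CI; rows are groups, a and c are events."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, log_interval(rr, se)


def odds_ratio_ci(a, b, c, d):
    """Odds ratio ad/bc with an approximate 95% CI."""
    odds = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds, log_interval(odds, se)


def excludes_null(interval, null_value=1.0):
    """True when the null value lies outside the interval."""
    low, high = interval
    return not (low <= null_value <= high)


rr, rr_ci = relative_risk_ci(250, 250, 350, 150)   # Table 2 (event = success)
odds, odds_ci = odds_ratio_ci(400, 200, 600, 800)  # Table 3

print(round(rr, 2), [round(x, 2) for x in rr_ci], excludes_null(rr_ci))
print(round(odds, 2), [round(x, 2) for x in odds_ci], excludes_null(odds_ci))

# The diabetes and tooth loss example in the text: interval 1.2 to 2.7.
print(excludes_null((1.2, 2.7)))
```

The `excludes_null` check mirrors the reasoning in the text: when the interval does not contain 1.0, the result is statistically significant at the corresponding level. With small cell counts this approximation is unreliable and an exact method would be preferable.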
Table 3. INCIDENCE OF PERIODONTAL DISEASE IN SMOKERS AND NONSMOKERS
                   Disease Status
Tobacco Use    Positive    Negative
Smokers        400         200
Nonsmokers     600         800
Total          1000        1000

$$\text{Odds ratio} = \frac{ad}{bc} = \frac{(400)(800)}{(200)(600)} = 2.67$$

ASSESSING VALIDITY
In interpreting the results of any research study, one must consider
three possible alternative explanations for research findings.2, 4 These
alternative explanations are chance, bias, and confounding. Chance refers
to the probability that the results observed may be a chance occurrence
and not necessarily the result of the treatment under study. Chance is
assessed by statistical analysis of the research data and by calculating a
P-value. The P-value is defined as the probability of obtaining what was
observed, or something more extreme, by chance alone, that is, if there were
truly no association. In scientific research, the cutoff for statistical
significance has traditionally been
set at 0.05. That is, if a P-value is 0.05 or less, the observation is
considered to be statistically significant. Numerous statistical tests are
used to calculate the P-value. The type of test used depends on the type
of data being analyzed. Many statistical texts are available that provide
details of statistical tests.1, 5
Bias refers to the divergence from the truth.
In epidemiologic studies, investigators aim to determine the true rela-
tionship between a specific exposure and a specific outcome. Anything
that obscures this true association may result in bias. For example, if
investigators know the treatment status of a subject, they may pay
closer attention to their evaluation of the outcome, thus introducing
observation bias into the study and interfering with the results. Stan-
dardization and calibration of examiners as well as blinding of the
examiner or investigator and the subjects are important steps that can
be taken to decrease bias in clinical research studies. A more detailed
discussion of bias is provided in the paper by Jacob and Carr.5
Confounding refers to the influence of a second variable or factor on the
relationship between an exposure and outcome. This factor must be associated
with both the exposure and the outcome. For example, in a study
examining the relationship between smoking and oral cancer, alcohol
intake could be considered a potential confounder, because it is an
independent risk factor for disease and is also associated with smoking.
Adjustment should be made for both known and suspected confounders
in multivariate analysis of the data.
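To make the assessment of chance concrete, the sketch below computes a P-value for the counts in Table 2. The article does not say which test would be applied; a Pearson chi-square test for a 2 × 2 table without continuity correction is assumed here purely for illustration, and the function name is hypothetical. For one degree of freedom, the upper-tail probability of the chi-square distribution can be obtained from the complementary error function in Python's standard library.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and P-value, 1 df."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Table 2: successes and failures with antibiotic (250, 250)
# and with scaling and root planing (350, 150).
statistic, p_value = chi_square_2x2(250, 250, 350, 150)

print(round(statistic, 2))  # 41.67
print(p_value < 0.05)       # True
```

A P-value below the traditional cutoff of 0.05 addresses chance only. It says nothing about bias or confounding, which must be assessed from the design and conduct of the study.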
CRITERIA FOR CAUSALITY
If the findings of a study do not seem to be the result of chance,
bias, or confounding, one must attempt to determine if a causal relationship
exists. Several criteria are used in epidemiologic research to determine
if an association is causal.3 These criteria include (1) consistency,
(2) biologic plausibility, (3) strength of association, (4) temporal relation-
ship, and (5) dose-response relationship. Consistency refers to the body
of evidence from multiple studies. The results of the present study must
be compared with previous similar studies to determine if the results
are consistent. If, in fact, several studies demonstrate similar results, this
consistency lends credence to the association’s being a causal one. The
relationship between the exposure and outcome must make sense biolog-
ically. Also, the association should be strong. For example, a relative risk
of 5.0 is more indicative of a true relationship than a relative risk of 1.2.
If it can be demonstrated that an exposure during a specific window of
time is related to the outcome, this temporal relationship provides evidence
for a causal relationship. For example, in the prevention of neural
tube defects, it has been demonstrated that women who consume folic
acid during the time before neural tube closure have a lower likelihood
of giving birth to a child with a neural tube defect than women who take
folic acid outside this critical period, thus providing evidence of a
temporal relationship between the exposure and the outcome. Similarly,
demonstrating that the relationship becomes stronger with increasing
amounts of the exposure also lends credence to a causal relationship. It
is often not possible to satisfy all the criteria for causality. It is the
overall body of evidence regarding the association between a particular
exposure and outcome that allows the inference of causality to be made.
A single study cannot demonstrate causality, because in a single study
only one sample is taken from the entire population of individuals with
a particular condition. It is possible that the sample is not representative
of the entire population. Thus, several studies must consistently demon-
strate similar findings before a conclusion of causality can be made.
SUMMARY
It is important for clinicians to understand the types of clinical
studies that appear in the literature and the inherent strengths and
limitations of each study design. The three possible alternative explanations,
chance, bias, and confounding, must be considered for any research
study. Thus, it is important to evaluate research studies critically in light
of this discussion and not simply to summarize the findings. Finally,
conclusions about causality can be made only on the basis of the body of
evidence, not on any single study.
References
1. Campbell M, Machin D: Medical Statistics: A Common Sense Approach, ed 2. New York, John Wiley & Sons, 1993
2. Hennekens CH, Buring JE: Epidemiology in Medicine. Boston, Little Brown, 1983
3. Hill AB: Principles of Medical Statistics. New York, Oxford University Press, Chapter XXIV, 1966
4. Hulley SB, Hulley SR: Designing Clinical Research. Baltimore, Williams & Wilkins, 1988
6. Weintraub J, Douglass C, Gillings D: Biostats: Data Analysis for Dental Health Care Professionals. Chicago, Joshi International, 1985

Catherine Hayes, DMD, DMSc
Department of Oral Health Policy and Epidemiology
Harvard School of Dental Medicine
188 Longwood Avenue
Boston, MA 02115
e-mail: catherine_hayes@hms.harvard.edu
BIAS IN DENTAL RESEARCH CAN LEAD TO INAPPROPRIATE TREATMENT SELECTION
Rhonda F. Jacob, DDS, MS
In research, as in life, bias is the enemy of truth.

R. F. JACOB
Bias is a systematic error that distorts the true relationship between
an event and its outcome. Bias will negatively affect the truth of the
conclusions. In research, bias includes any systematic error in the design,
conduct, or analysis of a study. Bias can occur at all stages of research,
from the selection of the population and the way treatment is provided to how
and when outcome measurements are made. One report reviewed more
than 50 possible sources of bias in analytic research.33 The various
research designs differ in the features within the design that control bias.
Specific maneuvers attempt to control bias by reducing opportunities for
systematic errors and by encouraging impartial judgment by persons
involved in the study. In health care research, bias can result in a
mistaken estimate of a treatment’s effect or an exposure’s effect on the
course of disease.12 These mistaken estimates probably account for some
of the conflicting conclusions observed in apparently similar studies.
Mistaken estimates can lead to practitioners’ offering ineffective or even
harmful treatments. It is the clinician’s obligation to continue profes-
sional education by reviewing current literature. To optimize continued
learning and patient care, clinicians should understand and scrutinize
the various biases that can exist in research reports.
From the Department of Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas
CLINICAL RESEARCH
How the human population as a whole behaves under natural
conditions and how the entire population of humans responds to a
particular treatment are the ultimate health care questions for researchers
and clinicians. Because the entire human population cannot be entered
into or managed in a study, researchers and clinicians rely on the laws
of probability and inferential statistics, which allow smaller sample
populations to be studied as representatives of the population as a
whole. These studies of sample populations use a multitude of research
methods to determine the relationship between event and outcome. If
stringent research and design criteria are not maintained, the assurance
is lost that the sample population and its event-to-outcome relationship
accurately represent that relationship in the total population; the study
lacks validity.
Health care research designs are broadly described as observational
or experimental.19, 38 In observational studies, a passive investigator usu-
ally observes subjects for exposures and outcomes. In experimental
studies, an involved investigator usually prescribes an intervention to
achieve a particular outcome. It is generally accepted that, because of
the active participation of the investigator, experimental studies offer
the best opportunity to control bias and that a correctly implemented
experimental study offers the best available evidence to answer a specific
research question.
Whether an observational or experimental design is chosen to an-
swer a given health care question depends on the type of research
question being asked. For many health care questions, an experimental
research design may not be appropriate because of the constraints of
population availability, population management, cost, time, and ethics.
Various design strategies have evolved to overcome these constraints,
but some of the strategies increase the possibility of bias.
A hierarchy of research design exists, based on study validity and
the ability to control bias within certain study designs.15, 32 Clinicians
and researchers must understand that less confidence can be placed in
the research conclusions derived from some study designs, and extreme
caution must be exercised when using these study reports to influence
decisions concerning patient care.
In addition to employing the appropriate study design, certain
elementary research methods must be implemented in all studies to
control bias. These include methods regarding patient selection, exam-
iner training, intervention, data collection, and analysis. When bias is
not controlled in these areas of clinical research, conclusions are highly
suspect, no matter what the study design.
HIERARCHY OF RESEARCH DESIGN AND BIAS CONTROL
The hierarchy of research design is based on satisfying three main
criteria: (1) randomized or nonbiased selection of target and control
subjects; (2) intervention or putative exposure under the control of the
investigator; and (3) prospective gathering of outcomes after entry into
the study.7, 9, 11, 13, 14, 15, 34, 38, 39a The control of bias in a given research project
depends largely on whether these criteria are met.
One of the greatest biases of health care research arises from the
methods of selecting the sample populations targeted for the research.
If research subjects are inappropriately selected, no amount of stringent
research methodology can counter the bias of sample population selec-
tion. It has been suggested that the scope of population selection bias in
health care literature poses ‘‘potential catastrophic damage to a study’s
inferential basis’’ and should be taken as a serious threat.7 Some research
designs have more inherent patient-sampling safeguards than others.
When these safeguards are appropriately executed, the higher-quality
design, with the higher-quality patient sampling safeguards, should be
used to make health care decisions.
Randomized, Controlled Trials: ‘‘Best at Bias Control’’
Randomized, control group trials (RCTs) offer the greatest opportunity
for the investigator to identify subjects and then randomly assign
them to the intervention group or the control group by a predetermined
randomization protocol. Treatment is not rendered until the subjects are
randomly assigned to the study groups. Patient data are collected in a
prospective fashion to evaluate the intervention’s effect on the outcome
of interest. This study design is ideal for evaluating therapy. A sample
group of subjects with the malady of interest is further narrowed in
number by the use of specific inclusion and exclusion criteria appropri-
ately based on the research question. Subjects are usually excluded from
the study because their inherent characteristics are not relevant to the
research question. For instance, adults would probably be excluded
from an orthodontic treatment trial evaluating mandibular growth. Some
characteristics or cointerventions may confound the research conclusion.
Confounding characteristics may have or are suspected to have nearly
as profound an effect on the outcome as the intervention, and including
subjects with confounding characteristics makes it difficult to discern
the true effect of the intervention. Subjects with these confounding
characteristics are excluded from the study. For instance, when osseointe-
gration of dental implants was first evaluated, diabetic patients were
often excluded, because diabetes was thought to confound the ability to
measure healing at the implant site. When evaluating a question related
to in-office bleaching of teeth, researchers would probably exclude per-
sons performing at-home bleaching (a cointervention), because this addi-
tional therapy would probably confound the true effect of the in-office
study intervention.
After the subjects are selected, they are queried about their willing-
ness to undergo the study. Ideally, a study would report data on the
subjects who were eligible for the study, but refused to enter it.7 The
investigators should then evaluate the characteristics of those persons
who entered and those who refused. This evaluation can establish that
the persons entering or not entering the study are alike in measurable
respects, and therefore the subjects entering the study are representative
of the total population of those subjects. For instance, in a dental trial in
which subjects are required to pay for therapy, persons of a lower
socioeconomic status may be eligible for the study but consistently
refuse to enter the trial because of financial concerns. This financial issue
becomes a selection bias, before the trial subjects are enrolled. The
subjects in the study should be recognized as representing the popula-
tion within a socioeconomic stratum, rather than the population as
a whole.
The manner in which subjects are recruited before screening can
produce a selection bias. If an implant study is advertised, only persons
interested in implants report to the recruiting site. If all dental school
denture patients are queried about their desire to enter an implant study,
a number will probably refuse. Something is inherently different in
subjects who volunteer for studies versus those who do not. Patients
who actively seek implants and those who are offered and accept im-
plants as an option to new dentures are likely to be from different
subsets of the population. This selection bias of recruitment at the outset
of a study could greatly affect how patients report their satisfaction
outcomes and could account for implant studies’ reporting contradictory
results.2, 20
After subjects are screened and found to be eligible for the study,
they will be randomly assigned to the treatment groups. Randomization
allows the patients an equal opportunity to be assigned to either intervention
group, thereby reducing selection bias and allowing the study results
to be representative of, or generalized to, the total population of other
patients with similar maladies and characteristics. Randomization
should be generated by computer programs,9 and the entry schedule
should be kept blind to investigators and study accrual personnel. As-
signing subjects to study groups by birthdate, entry date, hospital num-
ber, or an alternation schedule is haphazard, but is not randomization.
These ‘‘haphazard or quasi-randomized’’ methods allow study personnel
or referring clinicians to have prior knowledge of which group the
patient will enter. Well-meaning assistants have been known not to enter
a subject in a trial when they believe the subject would receive little
benefit from the assigned therapy. A system of alternating assignment
allows one to guide the order of accrual of subjects and to place a subject
in a specific study group, based on the desires of the subject. Assistants
responsible for accruing subjects might guide some subjects to a particu-
lar group because the morbidity rate is lower and the subject might be
more likely to finish the study. If accrual personnel know the new
therapy is next to be assigned, they might give positively slanted infor-
mation to a prospective subject, thereby ensuring the subjects’ entry into
the study. (Clinicians, too, can be influenced by their perception of what
offers the best treatment opportunity for their patients.) These systematic
biases can distort treatment outcomes. Blind randomization allows equal
distribution of variables known to affect treatment outcomes. Perhaps
more importantly, randomization allows equal distribution of the unknown
characteristics that might affect treatment outcomes. Having
interventions under the control of the examiner at the outset of the study
allows treatment methods to be standardized by trained practitioners.
This control also allows standardization of follow-up regimens, record
keeping, and measurements of outcomes. This standardization of meth-
ods, along with adequate training for practitioners and examiners, helps
minimize bias, thereby making the RCT the definitive clinical trial.
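The contrast drawn above between a computer-generated randomization schedule and an alternation schedule can be sketched as follows. This is a hedged illustration only: the article says that randomization should be computer generated and kept blind to accrual personnel, but it does not prescribe a method. The permuted-block design, the block size of 4, the arm labels, the seed, and the function names are all assumptions introduced for the example.

```python
import random


def block_randomization(n_subjects, arms=("intervention", "control"),
                        block_size=4, seed=None):
    """Permuted-block schedule: each block holds every arm equally often, shuffled."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # private generator so the schedule is reproducible
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_subjects]


def alternation(n_subjects, arms=("intervention", "control")):
    """Alternating assignment: fully predictable, hence not randomization."""
    return [arms[i % len(arms)] for i in range(n_subjects)]


schedule = block_randomization(20, seed=2024)
print(schedule.count("intervention"), schedule.count("control"))  # 10 10
print(alternation(4))
```

With alternation, anyone who knows the last assignment knows the next one, which is exactly the prior knowledge the text warns against. A generated schedule removes that predictability only if the list, and in this sketch the seed, is held by someone independent of subject accrual.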
Observational Comparison Studies: ‘‘Ranges of Bias Control’’
Observational studies, such as cohort and case-control studies, have
been used in epidemiologic surveys to determine the natural history of
a disease and exposures associated with a disease. Observational studies,
in which the investigators do not actually manipulate the subjects’
exposure to a treatment or event, but only observe outcomes and often
retrospectively determine exposures, are often used to discern the preva-
lence of a disease. Such studies are also used to determine population
characteristics that might be risk factors for disease. Observational stud-
ies rank lower in the hierarchy of evidence because they can meet few
if any of the three criteria for bias control. Observational comparison
studies, however, are usually the studies of choice when risk factors or
harmful exposures are being evaluated. The major weakness of these
studies is that the patients are not randomly selected but rather are
selected because they were exposed to a particular event, had a specific
lifestyle choice, or were noted to have a particular outcome.
A number of study designs fall under the description of observa-
tional. The strongest of the observational design strategies is the incep-
tion cohort study, in which the investigator is present immediately after
the exposure or event occurs (at the inception) and follows the subjects
for outcomes using prospective and standardized methods. A control
group, whose subjects were not exposed to the event, must be followed
with the same prospective methods to determine comparatively how
many subjects develop the outcome of interest. Great care must be taken
in selecting subjects for the concurrent control group. The control group
must be as nearly equivalent as possible to the exposed population
in every measurable characteristic that might affect the outcome. The
characteristics commonly considered are age, sex, socioeconomic status,
and educational background. Depending on the outcome of interest,
other characteristics, such as geographic area, concurrent medical conditions,
cointerventions, and occupational exposures, among others, must also
be examined in selecting the control population. Unfortunately, in all
circumstances there are unknown characteristics that may influence the
outcome of interest. Unlike the random assignment of subjects to the
test or control group, there can be no safeguard to assure that these
unknown characteristics are equally distributed in the control population.
Therefore, it is understood that considerable bias can occur in
selecting the control population. An inception cohort trial might be used
to determine whether persons with and without amalgam restorations
have an equal risk of developing multiple sclerosis. Another current
health question that might be considered using the inception cohort
design is whether the risk of autoimmune diseases is equal in women
who undergo silicone breast augmentations versus those who do not.
Both these issues have been hotly debated in health care and media
arenas. There is probably an unknown multifactorial cause for multiple
sclerosis and autoimmune diseases; therefore, selecting a control popula-
tion that is similar for these unknown characteristics is nearly impossible
and fraught with bias.
The inception cohort study is the premier observational study be-
cause of its prospective nature. Unfortunately, waiting for outcomes to
occur may take many years, leading to loss of subjects, loss of trained
study personnel, and prohibitive costs. The difficulty of maintaining the
validity of a protracted study introduces further biases. Other observational
designs using retrospective, one-point-in-time evaluation of comparative
populations with and without the outcome of interest offer a more
immediate answer to the research question. The price for immediacy,
however, is increased bias and risk of distorting the true relationship
between event and outcome.
Three types of retrospective studies are cross-sectional, ex post facto,
and case-control studies.15, 39a In these research designs, the outcome has
already occurred in the test population. The selection of the control
population is critical to reduce bias. Control subjects should be as equiv-
alent as possible in all characteristics to the test population, with the
exception of the exposure of interest. The comparison is between the incidence of
the outcome in the test population and that in the control population. The
control subjects may come from the same population pool as the test
subjects or from a different population pool. For instance, when investi-
gating whether a particular dental assistant chair may increase the risk
for lower back pain, the same-pool subjects might be drawn from all
dental assistants at one dental school, but they would be allocated into
the control or test group based on whether they used a specific design
of chair. Control subjects drawn from a different-pool population could
be assistants working at a different dental school where a different chair
design is used. Same-pool populations are more likely to have similar
demographic and workplace characteristics, both known and unknown.
Regardless of whether same- or different-pool subjects are selected for
the control group, the processes for identifying possible subjects and the
final selection of each subject must be consistent. In either design, the
subjects would be queried about their present or past history of back
pain.
Selection bias is quite difficult to control in observational studies.
Because investigators are often gathering data on exposures that have
already occurred, the existence of an exposure or outcome must often
be confirmed by patient report or past medical records. Medical records
are often incomplete, because practitioners may not document the specific
findings required for the study. Alternatively, an investigator may
infer exposure or outcome from other clinical findings (not the outcome
of interest) or tangential records such as insurance claims. These methods
may lead to a biased selection of subjects who do not represent the
totality of exposed patients. Investigators often evaluate characteristics
in the two populations to show that they are similar in all respects
except the exposure of interest. Even though the two groups being
evaluated may seem comparable, there is always the possibility that one
or more unidentified characteristics are responsible for or at least influence
the outcome of interest; these other characteristics are unlikely to be
distributed equally between the two groups.
Subjects may also be selected based on their recall of an exposure
or event, thereby creating a recall bias. Subjects who have the outcome
of interest, or fear they will develop the outcome of interest, are more
likely to recall that the exposure occurred. During subject interviews,
investigators should blind the subjects to the outcome of interest and
the exposure of interest. This blinding can be accomplished by asking
the subjects many questions regarding various outcomes and exposures
to decrease their awareness of the possible interactions.
Besides the difficulties inherent in population selection, the one-
point-in-time studies have other biases. The assessment of outcomes
represents a snapshot of the subjects’ daily lives. Outcomes that are
identified by waxing and waning signs and symptoms may not be
present during the study evaluation. At evaluation, the outcome may be
at an early, barely detectable level. Subjects aware of the possible out-
come and exposure relationship may have a biased response when asked
to recall their symptoms and exposure data and their cointerventions.
Cointerventions or confounding signs and symptoms are also likely to
wax and wane, thereby affecting the outcomes during the evaluation
and affecting recall by the subjects.
Observational studies have been used extensively to evaluate harm-
ful exposures. Smoking risks for cardiovascular disease and lung cancer
have been universally accepted only in the past decade. Many investiga-
tors from many countries have reported increased health risks in persons
who smoke. Because of the inherent weaknesses in observational studies
and the political and monetary implications of these findings, many
years and hundreds of confirmatory studies were required before the
risks of smoking were accepted. The few studies that have been con-
ducted on the health issues of amalgam restorations or breast implants
reveal a wide range of risks, including no increased risk, for persons
undergoing these treatments. Currently, the literature regarding these
controversies includes more letters to the editor than clinical trials. A
MEDLINE search of reports associating amalgam restorations with multiple
sclerosis reveals only three small case-control trials in the past 20
years, with inconclusive suggestions of an increased risk of multiple
sclerosis or alterations in immune parameters. One study reported that
their multiple sclerosis population did not have an increased number of
amalgam restorations, but did have an increased number of caries com-
pared with the control population.25 The other study reported that the
multiple sclerosis group had an increased number of amalgam restora-
tions.1 This report shows another problem with one-point-in-time evalua-
tions: the inability of such studies to establish a cause-and-effect relation-
ship between exposure and outcome. Patients with multiple sclerosis
may have poorer oral hygiene because of possible physical constraints
of their disease. Therefore, they may have more caries, and if their caries
are treated, they are likely to have more amalgam restorations. This
chicken-or-the-egg problem is common in case-control trials that identify
possible associations between two processes. Associations can be re-
vealed, but not causation; a cause must always precede the outcome.
Case Reports and Case Series: ‘‘Bias Out of Control’’
Having a comparison control group is an absolute criterion for
research. Studies that do not have a comparison group are relegated to
an inferior position in the research hierarchy. These reports are most
commonly referred to as case reports or case series. It has been stated
that in these studies the only basis for comparison is ‘‘implicit, intuitive,
and impressionistic.’’14 Sackett states that inductive reasoning gives way
to seductive reasoning.32 Rather than controlling bias, case reports and
case series are more likely to be bias out-of-control. Reports of a single
patient outcome or a series of patient outcomes are subject to extreme
bias in patient selection and treatment delivery decisions and methods.
Subjects in case series do not represent a random sample of the total
population, patients within the treatment group often have many pre-
treatment characteristics besides the malady of interest, and subjects are
rarely treated with a standardized protocol of therapy. Often, data are
gathered in a retrospective review of charts with nonstandardized mea-
surement and outcomes assessment criteria. Despite their best intentions,
reporting clinicians are biased by the very fact that they rendered the
care and analyzed the outcome. Clinicians should never predict treat-
ment outcomes based on reports that do not have a comparison group.
Despite their unreliability as predictions of treatment outcomes,
unusual case reports and case series have value. These case reports call
attention to little-known maladies, reveal complications of proposed
therapies, and document outcomes that may have occurred because of
exposures and proposed therapies. Precisely documented characteristics
and descriptive data from case series and case reports are often used to
plan subsequent research with control groups.
Historical Control Groups
Control groups are required to assess the value of a therapy. Dental
and medical reports have commonly used data from patients who were
treated earlier at the same institution with a different modality of treatment.
As a new therapy is introduced to the profession, some practitioners
will begin using it. To assess the value of the new therapy, the
practitioners will compare the group receiving new therapy with the
group receiving the older therapy. When comparing the outcomes of the
old and new therapy, the patients who received older therapy would
become the historical control group. Rarely is this historical control
group an arm of an RCT with specific population criteria and prospective
data protocols. Instead, the historical control group usually consists of
patients who were given the older therapy based on a number of
decisions made by the patient and the practitioner, and specific treatment
and outcome analysis methods were not standardized. Often, these data
are gathered from chart reviews. Even if the two groups are treated
during the same time-frame, a multitude of biases exist in this type of
patient assignment and in the non-standardized methods. When patients
in a historical control group were treated many months or even years
previously, unknown variables and unknown cointerventions can create
additional bias that is likely to affect outcome.
An analysis of the literature was performed to compare findings in
therapies when data reports are based on RCTs versus historical control
trials. A total of six therapies had reports of both study designs evaluat-
ing similar outcome endpoints, for a total of 50 RCTs and 56 historical
control trials. The historical control trials found that the new therapy
was better in 78% of the trials, whereas the RCTs found the new therapy
was better in only 20% of the trials. When comparing the control group
in the RCT with the control group in the historical controlled trial, the
control group in the historical control trial not only fared worse than
the experimental group in that trial, but often fared worse than the
control group in the RCT. This finding supports the lack of equivalence
in the two populations.34 The two groups are rarely equivalent, except
for the primary diagnosis. When a new therapy is developed, there are
often conscious or unconscious efforts to narrow the criteria in the
treatment group to include only those who are considered most likely
to benefit or most likely to comply with the new methods. The others
receive the traditional or historical treatment. Also, when historical
controls are used, not all participants are included in the evaluation.
The finding that control groups in the historical control trials fared
worse than control groups in the RCTs suggests that bias in patient
selection may ‘‘irretrievably weight the outcome of HCT in favor of
new therapies.’’34
In a retrospective chart review of patients receiving palatal obturator
prostheses to restore palatal defects following maxillectomy, it was hy-
pothesized that patients had shorter hospital stays when they were given
this prosthesis at time of surgery rather than several days after surgery.
A review of 120 patients from 1960 to 1971 revealed that nearly 58% of
patients did not receive surgical prostheses, and an evaluation of 151
patients from 1980 to 1984 revealed that 45% did not receive a surgical
prosthesis. In the earlier trial, there was a significant difference in the
duration of hospitalization of the two groups studied (22.7 and 14.2
days, respectively), but no significant difference was observed in the
later trial (10.6 and 8.0 days, respectively). The practice of dentistry and
medicine has changed remarkably from 1960 to 1984, but the cause of
the difference in hospital stay in the two groups in the earlier trial and
the cause of the magnitude of difference of hospital stay between the
two trials remain undetermined. Thus, using historical controls, even
within the same institution, presents difficulty in distinguishing treat-
ment effects from changes in ancillary care, manpower, referral patterns,
patient support methods, health care reimbursement, and so forth. His-
torical controls derived from published reports present the same diffi-
culties.
BIAS IN RESEARCH METHODS
Bias control continues beyond design selection and population selection.
Specific methods of bias control should be implemented in the
conduct and analysis of the investigation. These methods are applicable
to all research designs.
Blind Participants
Blinding the investigators, examiners, and subjects to the intervention
and the outcome is a significant controller of bias. Double-blind
methods are ideal: subjects and study personnel are blind
to the treatment assignments and to any study events or information
that might influence outcome assessments. Single-blind methods blind
either the examiner or the patient. When procedures are performed, the
persons who examine subjects for outcomes or collect data from subjects
should not be the same individuals who perform the procedures. Den-
tists have been trained to perform various treatment alternatives. For
example, fixed partial dentures, removable partial dentures, and im-
plants have all been used to replace the same missing tooth. Most
dentists prefer one restorative method over another, and no dentist can
state that the preference is solely based on scientific evidence. If the
preference is not solely based on scientific evidence, there is an element
of bias, and this bias can affect the outcome assessment if outcome data
are collected by the practitioner.
Those who collect data should be blind to the hypotheses of the
study. This blinding is likely to be easier than blinding the clinician who
performed the dentistry. Data on oral conditions, restorative conditions,
and function could be collected; however, only some of the data would
be relevant to a given study. Some institutions have established data
collection facilities, where routine data are collected under strict protocol
for all subjects sent to the data collection facility, irrespective of the
study in which the subjects are involved. Subjects can be queried about
a number of oral conditions without knowing specifically what condition
or exposure is relevant to the hypothesis. Blinding subjects to their
treatment, especially in dentistry, requires ingenuity. Sham treatments
are often unconvincing, and the informed consents required today are
so explicit that study subjects may be biased by the description of the
procedures and the list of possible complications. Preconceived notions
that subjects form during the informed consent process may influence
their outcome responses. This influence may be a problem when a model
consent form, with its blanks to be completed, has been approved by an
institutional review board and is expected to serve as the consent form
for all studies. Investigators should campaign for wording in their spe-
cific consent form that avoids biasing study participants. When in-
forming subjects of the comparative treatments in the study, clinicians
and research assistants should strive to control their own biases. When
persons are screened for study entry, they should be reminded that
the study is being conducted because the dental community is not
convinced which treatment functions better, is faster, is more esthetic,
has greater longevity, and so forth.
When reviewing the literature, clinicians should evaluate whether
blind methods were used in data collection, and the methods for assur-
ing blinding should be explained. With this information, the clinician
can determine if blinding truly occurred. If blind data collection was not
employed, clinicians should search for other studies that address the
research question.
Treat All Subjects the Same
Specific methods for delivery of interventions, data collection, and
analyses should be determined before initiating an investigation. These
protocols should assure that study participants in both treatment and
control groups are treated and assessed equally. Doing so requires that
the same follow-up regimen, follow-up data, and tests be performed
on all subjects. Questionnaires and quality-of-life analyses should be
administered in the same fashion to all participants. Follow-up examina-
tions should be scheduled as often as needed to gather the data neces-
sary to answer the research question and as often as needed to anticipate
complications, complaints, and compliance issues. Bias can result if
patients with complications must alter follow-up regimens because the
follow-up examinations were not scheduled frequently enough. Subjects
with less tolerance or with more complaints have potential for more
frequent follow-up and have the potential of being evaluated differently.
It is likely that more data will be gathered on these subjects. Pertinent
data may be missed on subjects who return sporadically; their complica-
tions and improvements may need to be assessed by history taking
rather than by examiner observation. The inequities in such data gathering
should be recognized as potential biases.
Although prospective interventions are not employed in observa-
tional studies, specific protocols for data review of records, patient
interviews, and tests to evaluate outcomes should be designed in a
prospective fashion. Before the investigation is begun, and even before
the populations are selected, methods must be established so all subjects
are tested and queried similarly. In some observational designs, the
outcome of interest is often present before the study is initiated. The
investigator queries subjects about exposure history. There is potential
for investigators to interview subjects more vigorously to uncover the
exposure when the subjects exhibit the outcome. This difference in the
level of interrogation potentially biases towards a positive correlation
between the exposure and outcome. This problem underscores the need
for established methods for data gathering, as well as the need to blind
the examiners to the outcomes.
Often, subjects are not treated similarly because of missing data. In
dentistry, outcomes or baselines may be retrospectively assessed using
existing radiographs, photographs, or study casts. Records that were not
made for the purpose for which they are currently being used often fall
short of meeting various criteria. Frequently, subjects who are otherwise
eligible for the study cannot be enrolled because these previously col-
lected records are not available or are nondiagnostic. Records made
during a routine clinical examination may serve the purpose for a
patient’s treatment or evaluation on that occasion but are often not
detailed enough for a later research project. For instance, casts made for
custom trays may not be of adequate quality to serve as baseline for
studies that require anatomic detail of all tooth surfaces. Less-than-ideal
radiographs may not be remade if patients complain of discomfort, and
appropriate angulations of film and beam may be sacrificed. Photo-
graphs may be missing; in a busy practice, clinicians may not retain
serial photographs of specific patient outcomes that were unsuccessful
or unesthetic. Investigators must decide either to extrapolate data from
these less-than-ideal sources of documentation or to exclude these poten-
tial research subjects. Although it might seem that the better solution is
to exclude subjects with missing documentation, doing so may create a
serious selection bias. One study sought to evaluate the esthetic out-
comes of a specific surgical method of closing cleft lip and palate.
Subjects came from one surgeon’s practice, were treated by one of two
surgical methods, and were included only if they had had a clinical
photograph made after age 15 years. The esthetics of the lip closure
were evaluated by a panel of lay judges blind to the surgical method.
Subjects with missing photographs or poor-quality photographs were
excluded from the investigation. Twenty subjects were included in each
group for analysis. No data were supplied as to the number of subjects
who never returned before age 15 years, how many subjects failed to
have quality photographs, or the percentage of the entire population
these 40 patients represented. In this investigation, a population selection
bias occurred based on whether photographic documentation was available
on the subjects.31
Calibration and Training of Examiners
Innumerable studies are available in the health care literature that
specifically test the level of agreement among multiple examiners who
are evaluating a clinical test, making a diagnosis, reading radiographs, or
measuring treatment outcomes. More than 300 clinical reports evaluating
observer variability in health care published from 1985 to 1989 were
compiled in a pre-MEDLINE bibliography.8 A MEDLINE search found
57 clinical trials that evaluated observer variation between 1990 and
2001. Various indices of agreement have been formulated based on
percentage, probabilities, correlation coefficients, the kappa statistic (κ),
and others.6 The κ statistic is preferred because it provides for an adjustment
of agreement beyond chance and is appropriate for category scales
and continuous data. (Kappa is affected by prevalence, and it cannot be
calculated when one of the investigators constantly uses the same score.
Variations on the original formulations by Cohen are frequently em-
ployed. Kappa is widely used and widely debated. Continued variations
and other models for measuring agreement are being evaluated in statis-
tical arenas.) It has been estimated that for many medical decisions,
clinical agreement is at a suboptimal level, with κ below 0.35.23 It has
been proposed that κ less than 0.4 is poor agreement, κ of 0.40 to 0.75 is
fair to good agreement, and κ above 0.75 to 1.00 is excellent agreement.10
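The chance adjustment that κ provides can be illustrated with a minimal sketch of the unweighted form for two examiners, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each examiner's score frequencies. The function names and the ratings below are hypothetical; the labels returned by `agreement_band` follow the thresholds proposed in the text. Weighted variants, which are often preferred for ordered scales, are not shown.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two examiners scoring the same sites."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both examiners must score the same non-empty set of sites")
    n = len(ratings_a)
    # Observed agreement: proportion of sites given the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each examiner's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    if expected == 1.0:
        # Both examiners used one and the same score throughout: 0/0.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Label kappa using the thresholds proposed in the text."""
    if kappa < 0.40:
        return "poor"
    if kappa <= 0.75:
        return "fair to good"
    return "excellent"


# Hypothetical ratings of 50 sites by two examiners.
examiner_1 = ["yes"] * 20 + ["no"] * 20 + ["yes"] * 5 + ["no"] * 5
examiner_2 = ["yes"] * 20 + ["no"] * 20 + ["no"] * 5 + ["yes"] * 5
kappa = cohen_kappa(examiner_1, examiner_2)
print(round(kappa, 2), agreement_band(kappa))
```

In this hypothetical example the examiners agree on 40 of 50 sites (80%), chance agreement is 50%, and κ is therefore 0.6, which shows how a seemingly high raw agreement shrinks once chance is removed.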
Even calibrated examiners in dental investigations have not consis-
tently reached good agreement in clinical measurement. Observer
agreement was reported among seven calibrated observers of various
dental specialties, who evaluated quality of bone trabeculation from 100
panoramic radiographs using a five-point scale. This scale was similar
to that used in various implant studies and ranged from lack of trabecu-
lation to bone as dense as cortical bone. The mean intraobserver
agreement was κ = 0.61. The observers were grouped into 21 pairs, with
interobserver agreement ranging from κ = 0.23 to 0.56. Comparison of
all seven examiners measuring all 100 sites and grades revealed
κ = 0.38. Grade 1, representing no trabeculation, had the highest agreement,
κ = 0.76. Grades 2, 4, and 5 had κ = 0.38 to 0.39. The worst agreement
was for normal trabeculation, with κ = 0.23. A grade of 5, representing
dense trabeculation, was given 230 times, but 25 subjects were regraded
to level 2 on a repeat examination by the same examiners.39 These
measurement methods have been used to qualify bony trabeculation
and subsequently enroll or exclude patients from implant studies. These
same bone qualification methods have been used retrospectively to
explain implant failures.
Another investigation considered 11 parameters of fixed restorations
evaluated on a five-point scale by two calibrated examiners from each
of six participating centers. The two examiners from each institution
were evaluated for agreement on each of the 11 parameters. The
agreement of each pair ranged from κ = 0.16 to 0.95. The mean of the
κ values from all six institutions for each parameter ranged from
κ = 0.56 to 0.91. Marginal integrity had the lowest level of agreement.27 An
evaluation of four calibrated examiners investigating the efficacy of
dental radiography found intraexaminer agreement was κ = 0.75 or
higher at baseline and remained at approximately the same level (0.80)
throughout the 24-month period of the study. The interexaminer
agreement among the six pairings of the four examiners ranged from
κ = 0.68 to 0.80 for caries and 0.72 to 0.83 for periodontal disease.40
As with other clinical measurements in health care, various dental
measurements show a wide range in practitioners’ level of agreement. This
lack of agreement indicates how critical it is to decrease bias created by
systematic errors in measurement by training multiple examiners in the
appropriate use of measurement instrumentation and in the implementa-
tion of clinical criteria. The more explicitly each measurement technique
and category is defined, the less ambiguous are the demarcations be-
tween categories, and the higher is the observer agreement. The level of
agreement of multiple examiners should be tested before an investiga-
tion to assure that the examiners have reached an understanding of
measurement criteria and an acceptable level of agreement. During the
investigation, continued calibration is often necessary, and the final level
of agreement achieved during the investigation should be reported.
Accounting for All Subjects
It is disconcerting to an investigator to have subjects not complete
a study. Statistical tests (power analysis) are often performed before the
investigation to determine how many subjects are necessary to detect a
difference in outcome between the groups. When subjects do not finish
the trial, a result may be inconclusive because the lower number of
study subjects causes a lack of statistical power. In prospective trials that
require a long follow-up to determine the outcome of interest, there is
an increased chance of losing subjects for a myriad of reasons: noncompliance,
moving away from the area, loss of contact, inability to travel to
the test site, and unrelated death, among others. It is important to determine
the characteristics of the subjects who left the study and to perform
another analysis of the remaining subjects to determine if the two groups
are still equivalent in the variables that might influence the treatment
effect. In addition, one should determine if the dropouts are more
common in one group than the other. Uneven loss of subjects was found
in a study evaluating the effectiveness of vitamin C in decreasing cold
signs and symptoms.20a The caplets often broke, allowing subjects to
taste the medication, and subjects discussed this occurrence among
themselves while waiting for study evaluations. Persons in the placebo
group realized they were not tasting ascorbic acid and began to drop
out of the study, anticipating no benefit, whereas the subjects ‘‘tasting
the benefit of treatment’’ continued the study. More dropouts in one
group than another can signal a loss of blinding to therapy or may
indicate untoward side effects. Uneven loss makes the study groups
indicate untoward side effects. Uneven loss makes the study groups
unequal in numbers and in known and unknown study variables.
Too often, reports simply change the number of subjects (N) at the
end of the study, with minimal or no reference to the subjects lost to
follow-up. It is assumed that these lost subjects have experienced the
outcomes at the same rate as those subjects remaining in the study. For
example, a systematic literature review of the English-language reports
published since 1960 evaluated the survival rate of fixed partial dentures
(FPD). Difficulty arose in performing a meta-analysis of the reports
because many of the reports did not have any follow-up data on a large
portion of the subjects after insertion of the prostheses.36 As follow-up
continued, even more subjects were lost to follow-up. One report quanti-
fied 255 FPD inserted over 10 years but only had 121 available for
evaluation at year 11.30 Another considered a one-point-in-time evalua-
tion of 77% of an original 184 FPD. No data were reported on the 33%
of lost subjects.5 Eighteen years after insertion of 122 FPD, 66 persons
were available for a follow-up analysis. No data were reported on the
54% of lost subjects.28 A large database of 642 FPD inserted in 1974 was
randomly selected from a national dental insurance registry. A 10-year
evaluation was made, but only 164 persons presented for examination.21
The subjects were evaluated again at year 14, with only 97 of the original
642 subjects reporting.22 It is inappropriate to assume that 30% to 50%
of subjects lost to follow-up would have the same outcomes as those
subjects remaining in the study.
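One way to see why this assumption is unsafe is to bound the result by the two extreme cases: every lost subject failed, or every lost subject survived. The sketch below is illustrative only. The enrolled and examined counts are taken from the 10-year registry example above, treating each prosthesis as one subject; the number of observed failures is a hypothetical value chosen for the example, and the function name is an assumption.

```python
def survival_bounds(n_enrolled, n_followed, n_failed_observed):
    """Bound the survival rate when some subjects are lost to follow-up.

    observed   : survival among subjects actually examined
    worst_case : assumes every lost subject failed
    best_case  : assumes every lost subject survived
    """
    if not 0 < n_followed <= n_enrolled or not 0 <= n_failed_observed <= n_followed:
        raise ValueError("counts are inconsistent")
    n_lost = n_enrolled - n_followed
    n_survived_observed = n_followed - n_failed_observed
    return {
        "lost_fraction": n_lost / n_enrolled,
        "observed": n_survived_observed / n_followed,
        "worst_case": n_survived_observed / n_enrolled,
        "best_case": (n_survived_observed + n_lost) / n_enrolled,
    }


# 642 inserted and 164 examined come from the text; 20 failures is hypothetical.
for name, value in survival_bounds(642, 164, 20).items():
    print(f"{name}: {value:.1%}")
```

When the gap between the worst-case and best-case figures is wide, as it is whenever most subjects are lost, the observed rate alone says little about the outcome in the original population.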
Investigations are often undertaken to determine differences in
treatment outcomes that are usually quite small. Often, the difference in
outcomes between the therapies is only 10%. Loss of subjects will reduce
the statistical ability to detect these small differences in outcome. Losing
only 10% to 15% of subjects can render a study inconclusive. Altering
the final N of the study risks drawing the wrong conclusion about the
value of the therapy.
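The effect of attrition on statistical power can be illustrated with the usual normal-approximation formula for comparing two proportions. The sketch below is a simplified illustration, not a substitute for a formal sample-size calculation; the success rates of 80% and 90%, the two-sided alpha of 0.05, the target power of 80%, and the function names are all assumptions made for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Subjects needed per group to detect p1 versus p2 (normal approximation)."""
    z_a = _norm.inv_cdf(1 - alpha / 2)
    z_b = _norm.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)


def power_at(n: int, p1: float, p2: float, alpha: float = 0.05) -> float:
    """Approximate power when n subjects per group complete the study."""
    z_a = _norm.inv_cdf(1 - alpha / 2)
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return _norm.cdf(abs(p1 - p2) / se - z_a)


# Assumed outcomes: 80% success with one therapy, 90% with the other.
planned = n_per_group(0.80, 0.90)
remaining = int(planned * 0.85)  # 15% of each group lost to follow-up

print(f"planned per group: {planned}")
print(f"power as planned:  {power_at(planned, 0.80, 0.90):.2f}")
print(f"power after loss:  {power_at(remaining, 0.80, 0.90):.2f}")
```

With these assumptions a study sized for 80% power falls to roughly 73% after a 15% loss, so a true 10% difference is more likely to be missed.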
Data Used Appropriately: Chart Reviews and Errors of Omission
In health care research, review of patient treatment records is a
common method of describing disease prognosis and determining thera-
peutic outcomes. Often, historical control data are collected from treat-
ment records to compare previous therapies with current therapies.
Some studies have used insurance records or national health care regis-
tries to gather data on the prevalence of a disease. When patients are
treated as subjects in a research protocol, the data recorded are driven
by the research question. In a well-designed trial, measurements or tests
required for the protocol are documented and read with strict attention
to minimizing bias, using many of the methods previously described.
Records kept for routine treatment in a clinical setting, however, are
often incomplete. Tests may be read, but not recorded. Not all subjects
will receive the same tests, and techniques may be modified based on
factors unrelated to the disease process. Patient compliance is often not
considered. Cointerventions are rarely recorded. Follow-up examinations
are often scheduled at patients’ requests; therefore, unless patients have
a specific complaint, their follow-up schedule will be abbreviated com-
pared with other patients receiving the same treatment. Often, the notes
are influenced by a patient’s complaint; unless the patient complains,
the follow-up note is an array of summary statements of ‘‘patient satis-
fied, within normal limits, normal diet, good esthetics, good occlusion,
watch tooth # 3,’’ and so forth. Treatment records are maintained by the
treating clinician, and often patients are reluctant to complain to their
practitioners, lest that complaint negatively affect the practitioner–
patient relationship. For the same reason, patients may tend to overem-
phasize the positive outcomes of their treatment. Clinicians are also
likely to overestimate the positive outcomes of therapies they deliver,
waiting for patients to bring forward complaints, rather than asking
whether patients experience particular difficulties.
Without standard treatment protocols and documentation, omission
of data or ambiguous interpretation of data to fit a research question is
problematic. A concurrent investigation was undertaken to evaluate
temporomandibular disorder in a group of patients receiving orthognathic
surgery. An RCT evaluating the cost, risks, and efficacy of
two jaw fixation techniques was performed, and pertinent data were
documented by the treating clinicians in the patients’ records. The sec-
ond study involved specific evaluations of patients with temporoman-
dibular disorder performed by blind examiners with specific examina-
tion protocols performed on the same patients. The authors then
examined the disagreement between data taken from the treatment
records and data taken from the temporomandibular disorder examina-
tion. Four parameters were evaluated: (1) a vertical opening of more
than or less than 40 mm, (2) the presence or absence of clicking, popping,
or locking of a joint, (3) the presence or absence of pain, and (4) the
presence or absence of crepitus. Although both studies were prospective,
it became apparent that the surgeons focused more on efficacy of treat-
ment than on secondary outcomes. Often, no data in the treatment
records addressed the criteria for temporomandibular disorder. In other
instances, it was necessary to create operational definitions of the four
criteria that would allow interpretation of the surgeons’ notes to catego-
rize the outcomes. At 2- and 24-month surgical follow-ups, surgeons
stated that 23% and 0% of subjects, respectively, had a vertical opening
below 40 mm, whereas the temporomandibular disorder examiners re-
ported 90% and 21%, respectively. The surgeons reported pain in 8.6%
and 1.7% of the subjects, respectively, whereas the temporomandibular
disorder examiners reported 47% and 29%, respectively. These differ-
ences show the level of disagreement that can occur when data from
routine treatment records are used for research purposes as compared
with data gathered by blind, calibrated examiners.35
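Disagreement of this kind is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The text gives only the marginal percentages for each data source, so the cross-classification in the Python sketch below is hypothetical: it assumes 100 subjects at the 2-month visit and assumes that every subject recorded by the surgeons as having limited opening was also so classified by the examiners.

```python
def cohen_kappa(both_yes: int, chart_only: int, examiner_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters making a yes/no judgment on the same subjects."""
    n = both_yes + chart_only + examiner_only + both_no
    observed = (both_yes + both_no) / n
    chart_yes = (both_yes + chart_only) / n
    exam_yes = (both_yes + examiner_only) / n
    # Agreement expected if the two sources were statistically independent.
    expected = chart_yes * exam_yes + (1 - chart_yes) * (1 - exam_yes)
    return (observed - expected) / (1 - expected)


# Hypothetical 2 x 2 table consistent with the quoted marginals:
# chart records 23% limited opening, blinded examiners 90%.
kappa = cohen_kappa(both_yes=23, chart_only=0, examiner_only=67, both_no=10)
print(f"kappa = {kappa:.2f}")
```

Even under this favorable assumption about overlap, kappa is close to zero, meaning that the two data sources agree little more often than chance alone would produce.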
SUMMARY
The first RCT was instituted in the late 1940s, evaluating streptomycin
and bed rest compared with bed rest alone for tuberculosis.26 This
research design has become the reference standard for comparative
evaluations of therapies because of its prospective nature and the ability
to control bias. Because it is easier to conduct observational studies, they
have often been inappropriately substituted for the better experimental
study designs. Since the 1950s, however, readers of the medical literature
have slowly come to demand quality clinical research to assist them in
caring for their patients. Dentists are somewhat behind their medical
colleagues in using the strongest research designs to answer clinical
questions. In dentistry, observational studies with convenience samples
of patients have been commonly used. It is often argued that few dental
ailments affect a person’s life as negatively as most medical maladies;
therefore, experimental rigors are not required of dental research. Al-
though most dental care does not involve life-and-death issues, dentists
are as eager as physicians to offer their patients optimal care. Optimal
care is best defined through nonbiased research strategies.
References
1. Bangsi D, Ghadirian P, Ducic S, et al: Dental amalgam and multiple sclerosis: A case-
control study in Montreal, Canada. Int J Epidemiol 27:667–671, 1998
2. Boerrigter EM, Geertman ME, van Oort RP, et al: Patient satisfaction with implant-retained
mandibular overdentures: A comparison with new complete dentures not
retained by implants—a multicentre randomized clinical trial. Br J Oral Maxillofac
Surg 33:282–288, 1995
3. Carr AB, McGivney GP: Users’ guides to the dental literature: How to get started. J
Prosthet Dent 83:13–15, 2000
4. Chalmers TC, Celano P, Sacks HS, et al: Bias in treatment assignment in controlled
clinical trials. N Engl J Med 309:1358–1361, 1983
5. Cheung GS, Dimmer A, Mellor R, Gale M: A clinical evaluation of conventional
bridgework. J Oral Rehabil 17:131–136, 1990
6. Cohen J: A coefficient of agreement for nominal scales. Educational and Psychological
Measurement 20:37–46, 1960
7. Ellenberg JH: Selection bias in observational and experimental studies. Stat Med
13:557–567, 1994
8. Elmore JG, Feinstein AR: A bibliography of publications on observer variability (final
installment). J Clin Epidemiol 45:567–580, 1992
9. Feinstein AR: Clinical Epidemiology: The Architecture of Clinical Research. Philadel-
phia, WB Saunders, 1985
10. Fleiss JL: The measurement of interrater agreement. In Fleiss JL: Statistical Methods
for Rates and Proportions, ed 2. New York, John Wiley & Sons, 1981
11. Friedman GD: Primer of Epidemiology, ed 4. New York, McGraw-Hill, 1994
12. Gordis L: Epidemiology. Philadelphia, WB Saunders, 1996
13. Hulley SB, Cummings SR: Designing Clinical Research in Epidemiologic Research,
ed 2. Baltimore, Lippincott Williams & Wilkins, 2001
14. Isaac S, Michael WB: Handbook in Research and Evaluation: A Collection of Principles,
Methods, and Strategies Useful in the Planning, Design and Evaluation of Studies in
Education and the Behavioral Sciences, ed 2. San Diego, CA, Edits Publishers, 1981
16. Jacob RF: [abstracts/commentary]. Journal of Prosthodontics 6:325–327, 1997
19. Jaeschke R, Sackett DL: Research methods for obtaining primary evidence. Int J
Technol Assess Health Care 5:503–519, 1989
20. Kapur KK, Garrett NR, Hamada MO, et al: Randomized clinical trial comparing the
efficacy of mandibular implant-supported overdentures and conventional dentures
in diabetic patients. Part III: Comparisons of patient satisfaction. J Prosthet Dent
82:416–427, 1999
20a. Karlowski TR, Chalmers TC, Frenkel LD, et al: Ascorbic acid for the common cold.
A prophylactic and therapeutic trial. J Am Med Assoc 231:1038–1042, 1975
21. Karlsson S: A clinical evaluation of fixed bridges, 10 years following insertion. J Oral
Rehabil 13:423–432, 1986
22. Karlsson S: Failures and length of service in fixed prosthodontics after long-term
function. A longitudinal clinical study. Swed Dent J 13:185–192, 1989
23. Koran LM: The reliability of clinical methods, data and judgments. N Engl J Med
293:695, 1975
24. Kramer MS: Clinical Epidemiology and Biostatistics: A Primer for Clinical Investigators
and Decision-makers. Berlin, Springer-Verlag, 1988
25. McGrother CW, Dugmore C, Phillips MJ, et al: Multiple sclerosis, dental caries and
fillings: A case-control study. Br Dent J 187:261–264, 1999
26. Medical Research Council: Streptomycin treatment of pulmonary tuberculosis. BMJ
2:769–782, 1948
27. Morris HF: Department of Veterans Affairs cooperative studies project number 147:
Level of examiner reliability over seven years. Implant Dentistry 2:245–249, 1993
28. Palmqvist S, Swartz B: Artificial crowns and fixed partial dentures 18 to 23 years
after placement. International Journal of Prosthodontics 6:279–285, 1993
29. Phillips C, Tulloch JF: The randomized clinical trial as a powerful means for under-
standing treatment efficacy. Seminars in Orthodontics 1:128–138, 1995
30. Reuter JE, Brose MO: Failures in full crown retained dental bridges. Br Dent J
157:61–63, 1984
31. Ross RB, MacNamera MC: Effect of presurgical infant orthopedics on facial esthetics
in complete bilateral cleft lip and palate. Cleft Palate Craniofac J 31:68–73, 1994
32. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical Epidemiology. A Basic Science for
Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991
33. Sackett DL: Bias in analytic research. Journal of Chronic Diseases 32:51–63, 1979
34. Sacks H, Chalmers TC, Smith H Jr: Randomized versus historical controls for clinical
trials. Am J Med 72:233–240, 1982
35. Scott BA, Clark GM, Hatch JP, et al: Comparing prospective and retrospective evalua-
tions of temporomandibular disorders after orthognathic surgery. J Am Dent Assoc
128:999–1003, 1997
36. Scurria MS, Bader JD, Shugars DA: Meta-analysis of fixed partial denture survival:
Prostheses and abutments. J Prosthet Dent 79:459–464, 1998
37. Shugars DA, Bader JD, White BA, et al: Survival rates of teeth adjacent to treated and
untreated posterior bounded edentulous spaces. J Am Dent Assoc 129:1089–1102, 1998
38. Stamm JW: Types of clinical caries studies: Epidemiological surveys, randomized
clinical trials, and demonstration programs. J Dent Res 63:701–707, 1984
39. Taguchi A, Tanimoto K, Suei Y, et al: Observer agreement in the assessment of
mandibular trabecular bone pattern from panoramic radiographs. Dentomaxillofacial
Radiology 26:90–94, 1997
40. Valachovic RW, Douglass CW, Berkey CS, et al: Examiner reliability in dental radiog-
raphy. J Dent Res 65:432–436, 1986
MD Anderson Cancer Center
1515 Holcombe Boulevard,
Box 0441
Houston, TX 77030
e-mail: rjacob@mail.mdanderson.org
SYSTEMATIC REVIEWS OF
THE LITERATURE
The Overview and Meta-analysis
Alan B. Carr, DMD, MS
INTRODUCTION TO THE INFORMATION PROBLEM
The process of being a continual learner in this information age is a
significant challenge. This challenge is especially significant for the
health care provider who realizes that patient care is not a stagnant
undertaking but an evolving process in which the responsibility to act
in the patient’s best interest requires continual infusion of new knowl-
edge and skills. For others, who are possibly less motivated to stay up-
to-date, state licensure organizations impose expectations of continuing
education that strongly suggest it is in the public’s best interest for
professionals to improve their knowledge continually to provide ade-
quate patient care. For all dental practitioners, staying up-to-date is a
challenge because of the vast amount of clinical research available. At
the heart of the problem is the difficulty in finding a focused answer
that has the best chance of truthfully informing clinicians to act in the
patient’s best interest regarding a specific clinical dilemma.
The dilemma associated with the sheer volume of the literature
available is illustrated by a recent publication15 that focused on a specific
area of dental care, the dental implant literature. In this study, the
authors wanted to estimate the quantity of dental implant literature
available from the MEDLINE database from 1989 through
1999 that could be used to guide evidence based decisions. The search
strategy was designed to identify the best evidence related to the categories
of etiology, diagnosis, therapy, and prognosis in implant care. The
results for this single area of dentistry reinforced the notion of an
information explosion. The search provided an amount of clinically
relevant information regarding implants that would require a clinician
to read between one and two articles a week for 52 weeks out of the
year just to stay current with the progress in dental implants. For the
practitioner also interested in staying current in other areas, such as
prosthetic, surgical, periodontal, endodontic, and direct restorative procedures,
staying current could indeed be difficult.
From the Department of Dental Specialties—Prosthodontics, Mayo Graduate School of
Medicine, Mayo Clinic, Rochester, Minnesota
To determine whether this volume of literature is characteristic of
all aspects of dentistry or only of special dental subjects such as dental
implants, another study16 investigated trends in dental and medical
research publications and the proportion of high-quality clinical studies
(randomized, controlled trials [RCTs]) of relevance to general dentistry.
In this study, the authors conducted a MEDLINE search of the literature
published between 1969 and 1999 and found that clinical trials in dental
research had increased to 7% and RCTs had increased to 5% of all dental
research during this period. Although the overall number of research
publications decreased during this period, the proportion specifically
related to outcomes of patient care had increased. Thus, more of the
literature currently published focuses directly on patient care and might
be important for clinicians to read. Between 1979 and 1999, the authors
found that one of every 200 research publications was an RCT, a study
design that has the best chance of providing valid
and reliable information. These trials were relevant to 60% of
the dental care activities for adults and 80% of those for children.
Together these findings suggest that more high-quality information
is available to clinicians than ever before. In a professional life that
leaves little time for reviewing the increasing numbers of potentially
useful research reports, how does the conscientious clinician of today
find the highest quality and most relevant reports among the hundreds
of others?
A SOLUTION FOR THE BUSY PRACTITIONER

One solution is for the clinician to seek reports that synthesize
numerous sources of clinical information into summary statements or
recommendations regarding specific clinical questions or controversies.
These articles can save the clinician time and effort spent sorting through
numerous primary research reports. Such research syntheses go by vari-
ous names and can take a variety of forms. The familiar literature review
is a narrative summary of some clinical topic or group of topics, often
provided by an expert in the field and usually characterized as an
unsystematic compilation of opinion and evidence. Although it intu-
itively seems correct that experts should be able to inform clinicians
about a topic they have studied intensively, it has been shown that they
are less able to produce objective reviews of the literature in their subject
than are nonexperts.11 More reliable are the reviews that take a systematic
approach in providing an overview of the relevant and important pri-
mary research regarding a specific clinical question. (In this context,
primary research refers to the research reports that contain the original
information on which the review is based.) Such a systematic review is
an overview of the primary research that has an explicit statement of
the objectives, materials, and methods and has been conducted following
a previously established rigorous and reproducible methodology.5 When
the systematic review includes a statistical synthesis of the numerical
results of several trials that examined the same question it is termed a
meta-analysis.
Systematic reviews are now considered the most reliable method
for summarizing large volumes of research evidence. These reviews are
less prone to subconscious and subjective forms of bias often seen in
reports by experts because they follow principles of research design
similar to those found in primary research. The fundamental difference
between the primary research study and the systematic review is the
unit of study: patients in primary research, individual studies in a
systematic review. The scientific principles of a systematic review
(documentation of methods before beginning, a comprehensive search
identifying all relevant studies, and the use of rigorous methods for appraisal,
collection, and synthesis of data) limit the bias in identifying and
rejecting studies and provide more reliable and accurate conclusions. The
usefulness of overviews and meta-analyses is reflected in the increasing
numbers of review publications and in the efforts of groups, most
notably the Cochrane Collaboration, to prepare, maintain, and dissemi-
nate results of systematic reviews of health care. The Cochrane Collabo-
ration is an international initiative for systematic review management
and currently has an Oral Health Group that encourages participation
by interested individuals.
ANATOMY OF A SYSTEMATIC REVIEW
The specific features that illustrate the systematic approach and
improve the chance of providing the best synthesized evidence are
• Preparation of a detailed research protocol that outlines the clinical
question of interest
• Selection of criteria for inclusion of articles in the review
• Systematic search of relevant published and unpublished research
• Determination (by two reviewers) of articles that meet pre-
defined inclusion criteria
• Critical appraisal of the quality of selected articles
• Extraction of outcome data from the selected articles
• Data combination (where appropriate) to synthesize and summa-
rize the best evidence
• Report of findings relative to the knowledge base and new questions
raised by the findings
A systematic review has distinct advantages over an unsystematic
approach.6 The authors must describe where the data (the published
trials) come from and how they were processed to arrive at the conclusions.
Being explicit about the methods taken to identify and select the
appropriate trials for processing is important in limiting bias and pro-
vides more accurate and reliable synthesized information from the vol-
umes of related literature. Also, with a systematic review, large amounts
of information can be assimilated in a timely manner, resulting in shorter
delays between research discoveries and implementation of demon-
strated effective patient-management methods. Results of different stud-
ies can be more formally compared, inconsistencies among studies can
be identified, and the causes for the inconsistencies can be evaluated.
When possible, quantitative systematic reviews, or meta-analyses, can
provide more precise answers by combining overall results of many
similar trials, increasing confidence in the clinical application of the re-
sults.
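The gain in precision from combining trials can be shown with inverse-variance (fixed-effect) pooling, one common way of carrying out the statistical synthesis. The Python sketch below is a minimal illustration under stated assumptions: the three effect estimates and their standard errors are invented, and the fixed-effect model is chosen for simplicity rather than because any review cited here used it.

```python
from math import sqrt


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of trial estimates and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]  # more precise trials weigh more
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical trials: difference in success rate (treatment minus control).
effects = [0.12, 0.08, 0.10]
std_errors = [0.06, 0.05, 0.04]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect {pooled:.3f} (95% CI {low:.3f} to {high:.3f})")
print(f"pooled SE {pooled_se:.3f} vs smallest single-trial SE {min(std_errors):.3f}")
```

The pooled standard error is smaller than that of any single trial, which is the sense in which a meta-analysis of similar trials gives a more precise answer.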
A summary of some recent dental systematic reviews illustrates the
important steps in this process. The importance of the search for uncov-
ering all potential sources (articles) that can contribute to the results is
highlighted by recognizing that a simple MEDLINE search alone is
inadequate for this phase of the systematic review process. One recent
report9 describes a process that included a review of 25 electronic databases,
the World Wide Web, hand-searching of relevant journals, and
personal contact with authors in the field for
additional information. Personally contacting authors is an important
attempt to address potential publication-bias problems. This form of
bias, which results from the selective publication of studies based on the
direction and magnitude of their results, is harmful if important negative
studies are not published.10 Because systematic reviews pool results,
conclusions derived in the absence of truthful negative studies could
lead to overestimation of treatment effectiveness.
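A small numerical example shows how selective publication inflates a pooled result. In the Python sketch below, the five effect estimates, the common standard error, and the selection rule (only significantly positive trials are published) are all hypothetical simplifications made for illustration.

```python
from statistics import mean

# Hypothetical effect estimates (difference in success rate, treatment minus
# control) from five equally sized trials sharing one standard error.
STANDARD_ERROR = 0.05
all_trials = [0.15, 0.12, 0.02, -0.03, 0.00]


def is_published(effect: float, se: float = STANDARD_ERROR) -> bool:
    """Assumed selection rule: only significantly positive trials reach print."""
    return effect / se > 1.96


published = [e for e in all_trials if is_published(e)]

print(f"all trials:     mean effect {mean(all_trials):.3f}")
print(f"published only: mean effect {mean(published):.3f}")
```

With these invented figures, a review limited to the published trials would report an effect more than twice as large as one that also located the unpublished negative trials.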
Another review provides a good illustration of several key features
of the systematic process. In this recent report of the effectiveness and
cost-effectiveness of prophylactic removal of wisdom teeth, the authors
wanted to provide a summary of existing evidence on prophylactic
removal of wisdom teeth in terms of the incidence of surgical complica-
tions and the morbidity associated with wisdom tooth retention.18 The
inclusion criteria were in three main categories: design (RCT, literature
review, or decision analysis), patient characteristics (unerupted or im-
pacted wisdom teeth, or those having wisdom tooth extraction prophy-
lactically or because of disease), and reported outcomes (either patho-
logic changes associated with retention of wisdom teeth or postoperative
complications following extraction). The data sources included an ex-
isting review that formed the basis for the report, six electronic data-
bases, paper sources (including Clinical Evidence), web-based resources,
and relevant organizations and professional bodies that were contacted
for further information. For non-English papers, translators were re-
cruited to assist with study selection and data extraction. Decisions
regarding study selection, data extraction, and validity assessments were
made by two independent reviewers; when the reviewers disagreed,
discussion took place to gain consensus. The process of assessing validity
followed a previously established checklist that was used to evaluate
data organized into structured tables. This process resulted in 40 studies
being included in the review: two RCTs, 34 literature reviews, and 4
decision analysis studies. The authors’ method of dealing with such a
mix of data sources is instructive. Specifically, it was stated that the
methodologic quality of the literature reviews (no systematic reviews
were included) was generally poor. Although most of the reviews sug-
gested that prophylactic removal was not warranted, the three reviews
that did suggest such removal was justified were of poorer methodologic
quality than most other reviews. When reviews include primary research
with less-than-optimal designs, Slavin17 emphasizes the need to report
more details about the studies.
Another study faced a similar situation involving questions of
study design related to research synthesis. This study used an alternative
method for selecting the articles to be included when the primary
criteria were not met.14 The aim of this review was to assess the
clinical evidence for the ability of glass-ionomer restoratives to inhibit
secondary caries. A total of 52 articles that met previously established
inclusion criteria were evaluated. Primary and secondary lists of system-
atic criteria for methodologic quality were drawn up. After applying the
primary list of 14 criteria to each article, none was found to be accept-
able. The secondary list, which included design features of a prospective
trial with an appropriate control group, was then applied to the 52
articles and yielded 28 suitable for data extraction and evaluation.
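The two-tier selection just described can be sketched as a simple fallback rule. The Python example below is illustrative only: the review's 14 primary criteria are not listed in the text, so the three criteria used here are hypothetical stand-ins, and only the secondary requirement of a prospective design with a control group is taken from the description above.

```python
from typing import Callable, Dict, List, Tuple

Article = Dict[str, object]
Criterion = Callable[[Article], bool]


def select_articles(
    articles: List[Article],
    primary: List[Criterion],
    secondary: List[Criterion],
) -> Tuple[str, List[Article]]:
    """Apply the primary list; use the secondary list only if nothing passes."""
    passed = [a for a in articles if all(rule(a) for rule in primary)]
    if passed:
        return "primary", passed
    fallback = [a for a in articles if all(rule(a) for rule in secondary)]
    return "secondary", fallback


# Hypothetical stand-ins for the review's quality criteria.
def prospective(a: Article) -> bool:
    return bool(a["prospective"])


def controlled(a: Article) -> bool:
    return bool(a["control_group"])


def randomized(a: Article) -> bool:
    return bool(a["randomized"])


articles = [
    {"id": "A", "prospective": True, "control_group": True, "randomized": False},
    {"id": "B", "prospective": True, "control_group": False, "randomized": False},
    {"id": "C", "prospective": False, "control_group": True, "randomized": False},
]

tier, selected = select_articles(
    articles,
    primary=[prospective, controlled, randomized],
    secondary=[prospective, controlled],
)
print(tier, [a["id"] for a in selected])
```

Recording which tier produced the final set matters, because conclusions drawn from articles admitted under relaxed criteria deserve correspondingly more caution.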
The methodology used in creating a systematic review and the
syntheses such reviews provide make them useful for clinicians who do
not have the time to review all the primary studies related to a clinical
question of interest. Because systematic reviews offer the best chance for
busy clinicians to act in their patients’ best interest, it is important to
know how to evaluate them.
WHAT TO LOOK FOR IN A USEFUL SYSTEMATIC REVIEW
A number of helpful descriptions for evaluating the validity of
systematic reviews have been presented in the literature.3, 7, 13 The im-
portant questions to consider when assessing a systematic review12 are
• Was a clinical question clearly stated and addressed?
• Were the search methods comprehensive enough to find all rele-
vant articles?
• Were explicit methods used to evaluate which articles to include
in the review?
• Was validity of the articles assessed, and was this assessment
reliable and free from bias?
• Were inconsistencies in the findings of the included studies ana-
lyzed?
• Were the findings of the primary studies combined appropriately?
• Were the reviewers’ conclusions supported by the data?
Unless a review clearly states the clinical question it addresses, clinicians
cannot tell whether it can help with their patients’ needs. For
clarity, questions must include specification of the patient population
involved, the intervention or exposure studied (often with a comparison
or standard treatment group), and the outcomes evaluated. Even a good
question cannot be adequately answered if all pertinent articles are not
found to evaluate. The reader therefore must be reasonably assured that
all relevant and important literature has been included in the review. It
is likely that comprehensive searches will include (1) use of one or more
bibliographic databases, (2) a search for reports that cite the important
papers found through a database such as Science Citation Index, (3)
perusal of the references of all relevant papers found (and often the
references of the references), and (4) personal communication with au-
thors and organizations active in the area being reviewed.
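The elements of a clearly stated question listed above (population, intervention or exposure, comparison, and outcomes) can be treated as a checklist. The Python sketch below is one possible way to represent such a question; the class and method names are hypothetical, and the first example is paraphrased from the wisdom tooth review described earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClinicalQuestion:
    population: str = ""
    intervention: str = ""            # or exposure
    comparison: Optional[str] = None  # often, but not always, specified
    outcomes: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Name the required elements that the question leaves unspecified."""
        missing = []
        if not self.population.strip():
            missing.append("population")
        if not self.intervention.strip():
            missing.append("intervention or exposure")
        if not self.outcomes:
            missing.append("outcomes")
        return missing


focused = ClinicalQuestion(
    population="patients with unerupted or impacted wisdom teeth",
    intervention="prophylactic removal",
    comparison="retention",
    outcomes=["postoperative complications", "pathologic changes with retention"],
)
vague = ClinicalQuestion(intervention="glass-ionomer restoratives")

print(focused.missing_elements())
print(vague.missing_elements())
```

A reader can apply the same check informally: if any of the three required elements cannot be found in the review's stated objective, its relevance to a particular patient is hard to judge.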
A comprehensive search will probably yield many articles not useful
for review. An article may be unsuitable because it does not directly
relate to the question of interest or because a certain study design is
methodologically too weak to provide valid information. The authors
should clearly describe how the articles were chosen; the method
used may apply methodologic criteria. Such criteria will not always
produce studies that are valid, so a validity assessment is also necessary
so that the review will be based on data that are as free from bias as
possible. Guidelines for such assessment have recently been published
in dentistry for clinical questions that address diagnosis,2 prognosis,1
and treatment.4, 8 Such guidelines should be applied and reported in
sufficient detail to allow readers to assess the validity of the primary arti-
cles.
Even with the use of methodologic guidelines, assessments can be
both unreliable and biased. Such assessments can affect both the inclu-
sion and validity assessment of the primary studies. As a safeguard, the
primary studies should be assessed by at least two reviewers, each blind
to the other’s decision. The level of disagreement should be known, and
the rules to reach consensus should be reported. To protect from the
bias associated with a lack of blindness, the information regarding the
institution and authors associated with the primary research can be
removed before assessment for inclusion and validity.
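The safeguard of two independent reviewers implies two things a review should be able to report: how often the reviewers disagreed, and which articles went to consensus discussion. The Python sketch below shows one minimal way to derive both from independently recorded decisions; the function name and the include/exclude decisions are hypothetical.

```python
def compare_decisions(reviewer_a: dict, reviewer_b: dict):
    """Return (disagreement rate, sorted ids needing consensus discussion)."""
    if reviewer_a.keys() != reviewer_b.keys():
        raise ValueError("both reviewers must assess the same set of articles")
    disputed = sorted(k for k in reviewer_a if reviewer_a[k] != reviewer_b[k])
    return len(disputed) / len(reviewer_a), disputed


# Hypothetical include (True) / exclude (False) decisions, recorded independently.
first = {"study-01": True, "study-02": False, "study-03": True, "study-04": False}
second = {"study-01": True, "study-02": True, "study-03": True, "study-04": False}

rate, disputed = compare_decisions(first, second)
print(f"disagreement on {rate:.0%} of articles; to consensus: {disputed}")
```

Raw disagreement is only a first summary; a chance-corrected statistic such as kappa is usually reported alongside it.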
Variation in the findings from the assessed studies is inevitable.
Reasons for this variation can include chance, study design, and differ-
ences in the three basic study components mentioned previously (popula-
tion, exposure or intervention, and outcome). Authors of reviews who
discuss the potential impact of all possible sources of variation have met
their responsibility to the reader. Whether the review uses statistical
methods of data synthesis or not, the author should clearly state the basis
for any conclusions and explain any conflicting results. The primary
studies included in the review should have been reported in sufficient
detail to allow the reader to assess critically the basis for any conclusions.
SUMMARY
Systematic reviews in the form of overviews or meta-analyses offer a
solution for busy practitioners who have difficulty keeping abreast of
current literature. Because systematic reviews can condense numerous
studies into reliable and valid summaries of the best available evidence for
a specific clinical problem, they offer significant benefit to busy clinicians.
This article has summarized the major features and advantages of
systematic reviews. It has distinguished those features that attempt to
increase the usefulness of reviews by limiting bias, and it has provided a
summary of important questions clinicians can use to appraise such
reviews critically. With this knowledge, clinicians should be able to use
the literature more appropriately and in a timely fashion.
References
1. Anderson JD, Zarb GA: Evidence based dentistry: Prognosis. J Prosthet Dent 83:495–
500, 2000
2. Eckert SE, Goldstein GR, Koka S: How to evaluate a diagnostic test. J Prosthet Dent
83:386–391, 2000
3. Felton DA, Lang BR: The overview: An article that interrogates the literature. J Prosthet
Dent 84:17–21, 2000
4. [...] 83:599–603, 2000
5. Greenhalgh T: Papers that summarize other papers (systematic reviews and meta-
analyses). In: How to Read a Paper: The Basics of Evidence Based Medicine. London,
BMJ, 1997, p 111
6. Greenhalgh T: Papers that summarize other papers (systematic reviews and meta-
analyses). In: How to Read a Paper: The Basics of Evidence Based Medicine. London,
BMJ, 1997, p 113
7. Greenhalgh T: How to Read a Paper: The Basics of Evidence Based Medicine. London,
BMJ, 1997
8. [...] 84:8–16, 2000
9. McDonagh MS, Whiting PF, Wilson PM, et al: Systematic review of water fluoridation.
BMJ 321:855–859, 2000
10. Montori VM, Smieja M, Guyatt GH: Publication bias: A brief review for clinicians.
Mayo Clin Proc 75:1284–1288, 2000
11. Oxman AD, Guyatt GH: The science of reviewing research. Ann N Y Acad Sci
703:125–131, 1993
12. Oxman AD, Guyatt GH: Guidelines for reading literature reviews. Can Med Assoc J
138:697–703, 1988
13. Oxman AD, Cook DJ, Guyatt GH for the Evidence-Based Medicine Working Group:
Users’ guides to the medical literature. VI. How to use an overview. JAMA
272:1367–1371, 1994
14. Randall RC, Wilson NH: Glass-ionomer restoratives: A systematic review of a second-
ary caries treatment effect. J Dent Res 78:628–637, 1999
15. Russo SP, Fiorellini JP, Weber HP, et al: Benchmarking the dental implant evidence on
MEDLINE. Int J Oral Maxillofac Implants 15:792–800, 2000
16. Sjögren P, Halling A: Trends in dental and medical research and relevance of randomized
controlled trials to common activities in general dentistry. Acta Odontol Scand
58:260–264, 2000
17. Slavin RE: Best evidence synthesis: An intelligent alternative to meta-analysis. J Clin
Epidemiol 48:9–18, 1995
18. Song F, O’Meara S, Wilson P, et al: The effectiveness and cost-effectiveness of prophy-
lactic removal of wisdom teeth. Health Technol Assess (Winch Eng) 4:1–55, 2000

Alan B. Carr, DMD, MS
Department of Dental Specialties—Prosthodontics
Mayo Graduate School of Medicine
Mayo Clinic
200 First Street SW
Rochester, MN 55905
e-mail: Carr.alan@mayo.edu
THE USE OF DIAGNOSTIC DATA
IN CLINICAL DENTAL PRACTICE
Carol Oakley, DDS, MSc, PhD,
and Donald Maxwell Brunette, MSc, PhD
‘‘If it looks like a duck, quacks and waddles like a duck . . . then it
probably is a duck!’’ and ‘‘if you hear hoof beats, think of horses not
zebras’’ (unless, of course, you are on the plains of the Serengeti). At
first glance, these adages may seem irrelevant to the diagnostic process
in clinical dental practice. These adages, however, respectively illustrate
the principle of pattern recognition and the effect of prevalence, both of
which are important aspects of the diagnostic process.
This article presents the dentist in clinical practice with an evidence-
based approach to diagnostic data and tests so that the reader can
become a more discriminating user of tests offered by the medical
profession and, increasingly, by the pharmaceutical industry for
promotional purposes.
This article reviews a few basic principles of biostatistics, discusses
test design and test characteristics, and demonstrates how to identify a
good test and the circumstances in which a test will be useful in the
clinical setting. For ease of discussion, this article focuses on dichoto-
mous data that are divided into mutually exclusive categories: positive
or negative. Data are presented from the dental literature, and clinical
dental examples are used. Texts providing more detailed, comprehensive
information regarding biostatistics, clinical epidemiology, and related
topics are listed with the references.4, 22, 29 Much of the following discus-
sion has been summarized from these sources.
From the Department of Oral Biological and Medical Sciences, Faculty of Dentistry,
University of British Columbia, Vancouver, British Columbia, Canada

THE DIAGNOSTIC PROCESS
Most dentists have had their height, weight, and blood pressure
measured in a physician’s office. They may have had blood drawn for
complete blood cell count and differential blood series or testing for
cholesterol levels, prostate-specific antigen, blood glucose, or thyroid
hormone levels. They may have undergone tuberculin skin tests, mam-
mography, electrocardiograms, and cardiac stress tests or had suspicious
moles removed for histologic examination. They may even have sought
the convenience of home pregnancy tests. As dentists, clinicians have
probably prescribed dental radiographs and used explorers and peri-
odontal probes to detect caries, defective restorative margins, and peri-
odontal attachment loss. They may have applied electric pulp testers or
ice to teeth to determine their vitality. They may have used toluidine
blue dye to aid in selecting sites for biopsy of suspicious oral lesions.
They may have recorded mandibular excursions, palpated muscles of
mastication, and listened for temporomandibular (TM) joint sounds.
As consumers and providers of health care, dentists reasonably
expect that the information obtained from diagnostic investigations is
reliable and truthful. Moreover, it is generally assumed that the informa-
tion obtained from these investigations will provide a diagnosis as to
the presence or absence of an abnormality or disease and that the
diagnosis will direct a subsequent course of management or treatment.
The question remains, however: how can patients and clinicians know
if the data and subsequent diagnosis are correct?
Beck2 maintains that dentistry, in contrast to medicine, has de-
emphasized diagnostic activities and merged them with treatment-plan-
ning activities. Nevertheless, the aim of a medical or dental clinician is
to arrive at a diagnosis that may direct a subsequent course of manage-
ment. The diagnostic process is initiated by the patient history and
symptoms and is followed by the clinical examination, during which the
clinician perceives signs that are manifestations of the disorders. The
clinician may also use assays or measurements that are traditionally
referred to as diagnostic tests or tools. In reality, symptoms, signs, and
assays may all be considered diagnostic tools, because all are sources of
information used to generate a diagnosis.2
Sackett et al29 explain that patients, clinicians, and researchers generally
agree that the presence of disease indicates a derangement in anatomy,
biochemistry, physiology, or psychology. They less often agree,
however, on the exact criteria that define the condition that is the target
of the diagnostic process.
Wulff40 distinguished two major principles of disease: (1) the nominalistic
or patient-oriented principle, and (2) the essentialistic principle
that emphasizes disease as an independent entity. In the nominalistic
approach, disease does not exist as an independent entity, and disease
classification is really a classification of sick people or patients. Thus, a
particular disease is defined by a group of characteristics that occur
more often in persons with the disease than in other people. Patients
will have a pattern of similar symptoms and signs, and their prognosis
and treatment will have some common features. The nominalistic princi-
ple does not require a definition of normality and recognizes that defini-
tions of disease may vary among different societies.40
The essentialistic view40 is closely related to a modern principle of
disease termed biochemical fundamentalism.6 This view is based on the
idea that disease can be described in terms of biochemistry and molecu-
lar biology. Diseases are assumed to follow regular patterns, and once
the underlying biochemical events are understood, the course of the
disease can theoretically be predicted. Hence, disease classification be-
comes a matter of biotechnology, and the need for defining a normal
state is avoided by relying upon statistical terms to define the disease
state. That is, disease is defined by the distribution of certain features in
a particular population and the extent to which that distribution differs
from a similar assessment of a group the investigators consider not
diseased.6, 40 This statistical approach forms the basis for using biomark-
ers as diagnostic or screening tests.
Contemporary clinical medical and dental practice is still an art and
a science. Overall, the nominalistic approach may offer a more realistic
strategy for coping successfully with the varying manifestations of con-
ditions such as coronary heart disease and temporomandibular disorders
(TMD) that can be defined in both essentialistic and nominalistic
terms.24, 29
DIAGNOSTIC DECISION ANALYSIS
The use of diagnostic data and tests can be considered at three
levels: screening, confirmatory, and exclusionary.13, 29 The objective of
screening procedures is the early detection of disease, before symptoms
associated with the disease are apparent. Thus, screening tests are con-
ducted on individuals who do not have symptoms associated with the
condition for which screening is being conducted. Screening tests classify
individuals with respect to their likelihood of having a particular disease,
but they do not diagnose disease. Individuals whose screening tests are
positive require further evaluation by subsequent tests to rule in or to
rule out the presence of the disease.13, 29
The use and interpretation of diagnostic data, including signs,
symptoms, and diagnostic tests, are based on the four principles of
decision analysis29, 31:
1. Clinicians should not consider patients as absolutely having a
disease but rather as having only the probability of disease. The
probability of disease is based on the prevalence of the disease,
the patient’s history (including risk factors, symptoms, signs,
and previous tests), and the clinician’s previous experience with
similar situations.
2. Clinicians use diagnostic tests to improve their estimates of the
probability of disease, and the estimate of probability following
the test may be lower or higher than the estimate of probability
before the test. Tests should be selected by their ability or power
to revise the initial probability of disease.
3. The probability that disease is actually present, following a posi-
tive or negative test result, should be calculated before the test is
performed. Application of this principle results in fewer useless
tests being performed.
4. A diagnostic test should revise the initial probability of disease.
If the revision in the probability of disease does not alter the
planned course of management or treatment, then the use of the
test should be reconsidered. Unless the test provides information
desired for an unrelated problem, tests that will not alter the
planned course of management or treatment should not be per-
formed.
Principle 1 states that in the diagnostic context, patients do not have
a disease; rather, patients have a probability or likelihood of disease. At
the outset, the clinician may assign to the patient a probability of disease
that reflects the clinician’s level of confidence that the target disease is
actually present. This initial probability may be based on the prevalence
(see box) of the disease in the population and may be revised, upwards
or downwards, based upon the patient’s history, symptoms, signs ob-
tained from the clinical examination, previous tests, and the clinician’s
previous clinical experience with similar situations. If the patient is
known to have one or more risk factors for a certain disease, the proba-
bility of disease may be increased. Thus, a pretest probability, risk, or
likelihood of disease is assigned. Diagnostic tests may then be considered
to revise the pretest probability, as per principle 2. That is, by themselves,
the measurements, assays or test results do not reflect 100% certainty as
to the presence or absence of the disease. Instead, the test results, either
positive or negative, are used to revise, upwards or downwards, the
initial pretest probability of disease. Moreover, once a test has been
carried out, the clinician and patient must accept and deal with the
results. That is, the decision that a test provides useful information is
independent of the actual result. If the clinician picks and chooses which
test results to accept or discard, the clinician opens the door to personal
bias and preconceived notions, undermining the principle of objective
testing.
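Principles 2 and 3 can be illustrated with a short calculation. The sketch below revises a pretest probability with Bayes' theorem, using the sensitivity and specificity of a test (test characteristics that are taken up later in this article). The function name and the numeric values are hypothetical and chosen only for illustration; they do not describe any particular dental test.

```python
def posttest_probability(pretest: float, sensitivity: float,
                         specificity: float, positive: bool) -> float:
    """Revise a pretest probability of disease after a dichotomous test result."""
    if positive:
        # Diseased patients who test positive vs. healthy patients who test positive.
        true_pos = pretest * sensitivity
        false_pos = (1.0 - pretest) * (1.0 - specificity)
        return true_pos / (true_pos + false_pos)
    # Diseased patients who test negative vs. healthy patients who test negative.
    false_neg = pretest * (1.0 - sensitivity)
    true_neg = (1.0 - pretest) * specificity
    return false_neg / (false_neg + true_neg)


# Hypothetical test with sensitivity 0.80 and specificity 0.90 (assumed values).
for pretest in (0.05, 0.50, 0.95):
    after_pos = posttest_probability(pretest, 0.80, 0.90, positive=True)
    after_neg = posttest_probability(pretest, 0.80, 0.90, positive=False)
    print(f"pretest {pretest:.2f}: positive -> {after_pos:.2f}, "
          f"negative -> {after_neg:.2f}")
```

Running the loop before ordering a test shows how far each possible result could move the estimate, which is the calculation that principle 3 asks the clinician to make in advance.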
On completion of the clinical examination, and before further inves-
tigations are considered, the clinician may be confident that a particular
disease or condition really is present. In that instance, there is no need
for further investigations or tests, and management appropriate for the
condition should commence without delay. Likewise, if the clinician is
confident that a particular disease is not present, further investigation
or treatment of that disease is not warranted. These decisions are based
on the threshold approach in decision analysis, shown in Figure 1. For
each condition or disease, the clinician sets a threshold for testing known
as the test threshold and a second threshold for treatment known as the
test-treatment threshold.29 In general, these cutoff threshold probabilities
for ruling in or ruling out a disease depend on the particular disease
and the subsequent courses of action or follow-up that relate to either
ruling in or ruling out the disease. That is, the consequences of
false-positive and false-negative results must be weighed in each case. If a
test is not powerful enough to revise the pretest probability so that a
positive or negative result would change the planned course of action,
the test should not be performed.29, 31 The strategies for defining
specific test and test-treatment threshold cutoffs are discussed in greater
detail by Sackett et al.29
Figure 1. Threshold approach to decision analysis: Examples of threshold approach for
disease of pulpal pathology and test of periapical radiograph. ZONE A, A patient complains
of sensitivity to cold and sweet stimuli. These symptoms are localized to an unrestored
tooth with no known history of trauma but with visible cervical abrasion and root recession.
Pulpal pathology is most likely absent because root sensitivity caused by exposed dentin
is the most probable diagnosis. A radiograph would not be warranted because information
obtained from the radiograph would not alter the diagnosis or further management. ZONE
B, A patient with a poorly maintained dentition describes intermittent and increasing
sensitivity to cold and sweet stimuli and occasional spontaneous discomfort lasting over an hour
and requiring analgesics for relief. These symptoms are associated with a heavily restored
tooth with subgingival restorative margins. Recurrent caries or pulpitis may be present. A
radiograph is warranted because it may provide useful information for diagnosis and further
management. ZONE C, A patient describes severe pain with biting pressure and denies
sensation to cold stimuli. These symptoms are localized to a molar with visible gross caries.
Radiographs are not required for the clinician to arrive at the diagnoses of caries and a
nonvital pulp; however, a periapical radiograph is indicated to guide prognosis and further
treatment, such as endodontic therapy or extraction.
Three clinical decisions are depicted in Figure 1. In the first instance,
the pretest probability of a disease is below the test threshold (Zone A
in Fig. 1). The patient is unlikely to have the disease, and even a positive
test result would not alter the posttest probability to a level that would
justify treatment. Therefore, neither treatment of the disorder nor further
testing for the disorder should proceed. For example, multiple yellowish
spots and plaques are observed bilaterally on the posterior buccal mu-
cosa of an elderly male patient. The spots and plaques cannot be re-
moved with gentle wiping of a gauze across the mucosal surface. The
clinician is confident that Fordyce granules are present and that no
pathologic condition is present. Therefore, further investigations such as
biopsy or further management or treatment are not indicated.
In similar fashion, if the pretest likelihood of disease exceeds the
test-treatment threshold (zone C in Figure 1), treatment should proceed
without further diagnostic testing. For example, soft white plaques re-
sembling milk curds are observed on the palate and buccal mucosa of
an elderly male patient. The plaques may be stripped from the tissue,
leaving an intensely erythematous surface with localized bleeding. Oral
thrush (candidiasis) is most likely present, and further investigation
such as biopsy will not alter the diagnosis or the probable management
with antifungal medications.
When the pretest probability falls in between the test and test-
treatment thresholds, however (zone B in Figure 1), testing is indicated,
and treatment should proceed on the basis of the test results. In general,
a diagnostic test is most useful when the pretest probabilities fall be-
tween roughly 30% and 70%.5, 20, 21 For example, an adherent white
plaque is observed on the anterior floor of the mouth and ventral left
lateral tongue of an elderly adult male. A pathologic condition may or
may not be present. Further investigation such as biopsy is indicated to
establish a diagnosis and to direct further management.
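The three zones can be expressed as a simple rule. The following is a minimal sketch, not a clinical tool: the function name is hypothetical, and the default cutoffs of 0.30 and 0.70 are assumptions borrowed from the rough range quoted above, whereas in practice each threshold depends on the disease and on the consequences of false-positive and false-negative results.

```python
def threshold_decision(pretest: float,
                       test_threshold: float = 0.30,
                       test_treatment_threshold: float = 0.70) -> str:
    """Map a pretest probability of disease to zone A, B, or C."""
    if not 0.0 <= test_threshold <= test_treatment_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= test <= test-treatment <= 1")
    if not 0.0 <= pretest <= 1.0:
        raise ValueError("pretest probability must lie between 0 and 1")
    if pretest < test_threshold:
        return "A: neither test nor treat"
    if pretest > test_treatment_threshold:
        return "C: treat without further testing"
    return "B: test, then manage according to the result"


# Assumed pretest probabilities loosely matching the three mucosal examples.
for label, p in (("Fordyce granules", 0.02),
                 ("adherent white plaque", 0.50),
                 ("oral thrush", 0.95)):
    print(f"{label}: {threshold_decision(p)}")
```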
MEASUREMENT RELIABILITY
Measurement reliability refers to the ability to obtain the same
measurement consistently over sequential measures. The reliability of a
measurement may be affected by three sources of variability: (1) the
system or phenomenon being examined, (2) the examination itself, such
as the instruments or equipment used and the examination environment,
and (3) the examiners.4, 29
Variation in the System or Phenomenon Being Measured
Normal biologic variability may be inherent in the phenomenon
being measured. For example, blood pressure and pulse fluctuate
throughout the day and under different circumstances such as stress,
exercise, and body position; hormonal levels fluctuate with the diurnal
and menstrual cycles. Moreover, the very act of measurement may
influence or alter the phenomenon being measured so that repeated
measurements (test-retest) are not reproducible (not reliable). For exam-
ple, if persons are asked to bend over and touch their fingers to their
toes, they may not be able to do so on the first attempt. After several
attempts, however, the distance between fingers and toes may decrease.
In similar fashion, clinical variables for assessment of TMD such as
muscle palpation and assessment of joint sounds may not be stable in
the short- or long-term, and they may be altered by repeated palpation
or repeated mandibular movements.39 Some phenomena such as blood
pressure will demonstrate regression towards the mean by returning to
usual levels over time.4 Therefore, evaluation of some phenomena may
require several examinations over time before a diagnosis is finalized.
Variability from Examination Equipment and Environment
In laboratory-based measurements, instruments are typically calibrated
against established standards such as those of the American
National Bureau of Standards, and the measurements are performed
under controlled and specified conditions. The results and variability in
these measurements are usually expressed as a standard deviation of
the individual values or as confidence intervals around the calculated
means.4, 22
It is important to distinguish the reliability of a measurement from
the precision of the measurement. The precision of a measurement refers
to the exactness or degree of refinement with which a measurement is
stated. For example, clinicians may measure the anatomic root length on
a radiograph to the nearest half-millimeter with a Boley’s gauge or
measure the depth of a periodontal pocket to the nearest millimeter with
a periodontal probe. Alternatively, these measurements could be made
electronically using tools with more precision, perhaps facilitating
measurements to the nearest hundredth of a millimeter. Such a level of
precision, however, may not be clinically relevant and would not neces-
sarily translate to higher reliability scores. That is, just because a
measurement is precise does not mean that it is reliable. In fact, the
inherent variability of the physical attributes associated with many den-
tal conditions is responsible for the inability to attain higher reliability
scores.
Variability may also originate from the incorrect function or use of
measuring devices or instruments. For example, reliable periodontal
probing requires the use of a calibrated probe, correct positioning of
the probe, and application of appropriate probing pressure.
Variability of Examiners
Examiners may be inexperienced or incompetent. Examiners also
differ because of biologic variation in the acuity of their senses (e.g.,
sight, touch, hearing), which may be further affected by their mood
and sleep status. Examiners may also replace evidence by inference,
potentially increasing the diagnostic error because a hasty inference may
close a clinician’s mind to other diagnoses.29 For example, a middle-aged
female patient describes symptoms of constant aching, throbbing pain
that began shortly after a recent lengthy dental appointment. The patient
localizes the symptoms to the right submandibular region, right mandib-
ular angle, and the right mandibular molar teeth. The dentist recalls the
recent restoration of extensive caries on the mandibular right first molar.
No radiographic abnormalities are detected, but irreversible pulpitis is
diagnosed, and lengthy endodontic therapy is completed. Unfortunately,
the patient returns the following day with increased bilateral pain of the
mandibular molar teeth, restricted interincisal opening, and pain radiat-
ing from the mandibular molars bilaterally along sides of the face to
the preauricular and anterior temporalis regions. Temporomandibular
disorders, including referred pain from the masseter muscles to the
mandibular molars, are subsequently diagnosed. In this example, the
clinician jumped to the conclusion that the initial symptoms were of
odontogenic origin and failed to consider the common alternative of
referred pain from the masticatory muscles to the teeth.23
A clinician’s diagnosis may also be affected by the mind set; that is,
clinicians tend to diagnose what they expect or hope to find.29 For
example, when pathologists reach a diagnosis, they may be influenced
by factors other than the histomorphology of the tissue on the slide.
Schwartz et al33 suggest that the pathologist’s knowledge of the patient’s
clinical presentation may be considered and incorrectly weighted in
reaching a diagnosis, so that the clinical data are double counted. If the
pathologist knows that a biopsy specimen has been obtained from an
area of erythroleukoplakia on the floor of the mouth of a heavy smoker
and alcohol drinker, the suspicion of malignancy is raised even before
the slide is placed on the microscope stage.9, 19 In such instances, the
dysplasia or carcinoma may be unconsciously graded as more severe
than if the clinical information were not available to the pathologist.33
Specific biologic assays do not exist for all diseases, and investiga-
tors may need to make judgments using criteria that are not very specific
or make judgments about subject characteristics that are difficult to
evaluate. Because there are no absolute standards, the best that can be
done is to determine if the investigators are consistent in their judg-
ments. That is, performance review of the clinician investigators focuses
on the likelihood that repeated examinations of the same, unchanged
patient by either the same clinician or other clinicians yield identical re-
sults.
Comparisons may be made in which the same investigator examines
the same subjects two or more times (intraexaminer reliability) or in
which different investigators examine the same subjects (interexaminer
reliability). Interobserver variability is minimized when the endpoints
are well defined and quantifiable, such as measuring the anatomic root
length on a periapical radiograph or measuring overbite or overjet on
study models. Interobserver variability is greater when criteria are vague
and subjective, as in the clinical diagnosis of TMD24, 39 or histologic
diagnosis of dysplasia.17, 18, 26
Reliability measures provide information only about how well the
examiners agree, not about whether the conclusions are correct.
Inter- and intraexaminer reliability have been quantitated by such
measures as the Pearson correlation coefficient, intraclass correlation
coefficient (ICC), or the kappa statistic (κ) (Table 1). For more details,
readers are directed to the text by Norman and Streiner.22
Correlation Coefficients
The correlation coefficient, more properly called the Pearson
product-moment correlation coefficient, is used with continuous data. It is based
on the extent to which the relationship between two variables can be
described by a straight line called the regression line. The Pearson
correlation coefficient, r, is a measure of the strength of the relationship
between two sets of data. The strongest positive correlation has a value
of 1.0, no relationship is indicated by 0, and perfect negative correlation
has a value of −1.0. Thus, correlation coefficients with values closest to
±1.0 demonstrate the strongest relationship between sets of data, but perfect
agreement occurs only when the regression line has a slope of 1 and an
intercept of 0; that is, the points fall along the line of equality.
In regression analysis, the square of the correlation coefficient, r², is
known as the coefficient of determination, which is, in effect, the fraction
or proportion of the total variance in the dependent variable that can be
explained by the relationship between variables; r² tends to overestimate
the true reliability. In general, r values will be higher than, and
overestimated in comparison with, the more theoretically sound reliability
calculated by the intraclass correlation coefficient (ICC). Bland and Altman3
have discussed the problems with use of correlation coefficients and have
developed an alternative method for assessing agreement between two
methods of clinical measurement based on graphic techniques.
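The gap between correlation and agreement can be shown with a small calculation. The sketch below uses hypothetical probing depths in which one examiner reads 2 mm deeper than the other at every site. It computes the Pearson r and then the mean difference with 95% limits of agreement, the numeric summary that usually accompanies a Bland and Altman plot; the function names and data are assumptions made for illustration.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient for paired data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def limits_of_agreement(x, y):
    """Mean difference (bias) and 95% limits of agreement between two raters."""
    diffs = [b - a for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical probing depths (mm) at six sites.
examiner_a = [2.0, 3.0, 3.0, 4.0, 5.0, 6.0]
examiner_b = [depth + 2.0 for depth in examiner_a]  # systematic 2 mm offset

print("Pearson r:", round(pearson_r(examiner_a, examiner_b), 3))
print("bias and limits of agreement (mm):",
      limits_of_agreement(examiner_a, examiner_b))
```

In this constructed case the correlation is perfect even though the two examiners never record the same depth, which is why a high r alone does not establish agreement.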
Intraclass Correlation Coefficient
The ICC is generally derived from analysis of variance calculations.
Intraclass correlation coefficient values can range from 0 to 1.0. Unlike r,
the ICC value indicates what proportion of the total observed variability
is caused by variability among the subjects as compared with variability
among the examiners. If most of the variability results from discrepancies
among examiners, the ICC values are low. Alternatively, if the
examiners are reliable (consistent) among themselves, ICC values are
high (e.g., between 0.75 and 1.00), and in effect one examiner could be
replaced with another.8 The ICC values may be interpreted in a manner
similar to κ scores, which are more commonly used.
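As a rough illustration of how an ICC falls out of analysis of variance, the sketch below computes a one-way random-effects ICC from between-subject and within-subject mean squares. This particular form is an assumption chosen for simplicity; the studies cited here may have used other ICC variants. The function name and the data (maximal opening in millimeters for four subjects measured by two examiners) are hypothetical. With small samples this estimator can fall slightly below 0, which is usually read as no reliability.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC; rows are subjects, columns are examiners."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of examiners per subject
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    subject_means = [sum(row) / k for row in ratings]
    # Mean square between subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Mean square within subjects (disagreement among examiners).
    ms_within = sum((x - m) ** 2
                    for row, m in zip(ratings, subject_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical maximal opening (mm): each row is one subject, two examiners.
consistent = [[20, 21], [35, 34], [42, 43], [50, 50]]
inconsistent = [[20, 45], [35, 22], [42, 30], [50, 38]]

print("consistent examiners:", round(icc_oneway(consistent), 3))
print("inconsistent examiners:", round(icc_oneway(inconsistent), 3))
```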
Table 1. RELIABILITY OF SOME MEASUREMENTS/TESTS USED IN DENTISTRY
Test | Reference Number | Correlation Coefficient (Interobserver) | Correlation Coefficient (Intraobserver) | Intraclass Correlation Coefficient | Kappa (Interobserver) | Kappa (Intraobserver)
Periodontics
Probing depth, general 5b, 10a 0.63 0.72 0.26
Plaque 6b, 11a 0.81 0.32 0.22
Temporomandibular disorders
Temporomandibular joint sounds—manual palpation
Trained examiner 8 0.68 0.62
Untrained examiner 8 0.35 0.30
Temporomandibular joint sounds—stethoscope
Trained examiner 8 0.26 0.61
Untrained examiner 8 0.32 0.35
Mandibular kinesiology
Maximal pain-free vertical opening 8, 10b
Trained examiner 0.89 0.90
Untrained examiner 8 0.72
Dental radiology
Caries, calibrated examiner 36a 0.73 0.80
Periodontal disease, calibrated examiner 36a 0.80 0.79
Degenerative temporomandibular joint changes on tomography 5a 0.47–0.80 0.58–0.79
Disk displacement on MR imaging 24 0.70
Oral pathology
Diagnosis of dysplasia 1 0.30–0.63 0.29–0.48 0.05–0.49
Grading of oral leukoplakia from no dysplasia to carcinoma in situ 16 0.27–0.45
Kappa Scores
The best approach in evaluating reliability for noninterval data is
the κ statistic, which adjusts for the degree of agreement expected by
chance. For a perfect association, κ = 1.0, and for no association, κ = 0.
Qualitative interpretations of κ values vary,16, 29 but Brunette4
suggests that κ values below 0.4 indicate poor agreement, κ values of
0.4 to 0.75 are fair, and κ values of 0.75 to 1.0 are excellent. A rule of
thumb is that clinical studies should not proceed before investigators
have been trained and calibrated with demonstrated high κ scores (e.g.,
κ > 0.6).
Table 1 lists the reliabilities of some measurements and tests used
in dentistry and illustrates the differences between correlation coefficients
and κ scores. For example, the interexaminer correlation coefficients
for probing depths and plaque assessment are 0.63 and 0.81,
respectively; in contrast, the interexaminer κ scores are only 0.26 and
0.22! In similar fashion, Abbey et al1 calculated correlation coefficients
and κ scores for six pathologists by comparing their original
sign-out diagnoses of dysplasia with their subsequent reexaminations of the
same slides. Correlation coefficients averaged 0.50; intraexaminer
κ scores ranged from 0.05 to 0.49. In the same study, interexaminer
κ scores for the presence or absence of dysplasia ranged from
0.29 to 0.48.
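The chance correction behind κ can be made concrete with a small example. The sketch below computes Cohen's κ for two examiners rating the same cases as positive or negative; the two-rater form, the function name, and the counts are assumptions made for illustration and are not data from the studies in Table 1.

```python
def cohen_kappa(both_pos: int, a_only: int, b_only: int, both_neg: int) -> float:
    """Cohen's kappa for two examiners and a dichotomous (positive/negative) call."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n      # raw proportion of agreement
    a_pos = (both_pos + a_only) / n           # examiner A's positive rate
    b_pos = (both_pos + b_only) / n           # examiner B's positive rate
    # Agreement expected by chance if the two examiners rated independently.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 100 slides read by two pathologists for dysplasia.
kappa = cohen_kappa(both_pos=10, a_only=10, b_only=10, both_neg=70)
print(f"raw agreement 0.80, kappa {kappa:.3f}")
```

With these assumed counts the examiners agree on 80 of 100 slides, yet κ falls below the 0.4 cutoff for poor agreement, because most of that agreement would be expected by chance when negative findings dominate.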
MEASUREMENT VALIDITY AND THE REFERENCE STANDARD
Measurement validity refers to the truthfulness of the measurement
or technique; in other words, whether the measurement measures what
it claims to measure. The determination of measurement validity requires
a comparison of the measurement or technique with a reference
measure or technique that has been accepted as true and is the acknowl-
edged standard, at the time, for definitive diagnosis of the disease or
condition. The principle of measurement validity is crucial to clinical
measurements because even if a measurement is highly reliable, the
measurement has no diagnostic value if that measure does not accurately
reflect the characteristic of interest. For example, a clinician may reliably
measure the anatomic root length of an incisor on a periapical radio-
graph. If however, the bisecting angle technique rather than the parallel
technique was used for exposure of the radiograph, the measured root
length may not be a true or valid representation of the anatomic root
length.
The classification of disease is traditionally based on pathologic
anatomy,40 and therefore the histopathologist’s diagnosis is typically
regarded as the reference standard. Performing the reference test of
autopsy or histopathologic examination is not always feasible, however,
because obtaining a specimen is generally an invasive procedure that
may also be risky, expensive, and often impossible to perform in a timely
manner. Not all body sites are as readily accessible for biopsy and
histologic examination as the oral soft tissues. Therefore, surrogate pa-
rameters such as biologic assays or measurements are used as the stan-
dard for comparison. For example, in the case of bovine spongiform
encephalopathy and its human variant, Creutzfeldt-Jakob disease, autopsy
is both the reference standard and the only reliable and valid
diagnostic tool at this time. If valid and less invasive laboratory tech-
niques were available, earlier diagnosis of the disease would be possible.
The assumed benefit of earlier disease detection, such as through screen-
ing tests, must be tempered with the possibility that for some diseases
earlier detection is unlikely to improve the prognosis. The early detection
of disease is assumed to be beneficial, because treatment initiated before
the onset of symptoms is assumed to be more effective than later
treatment and thereby the development of disease may be reduced or
eliminated. For some conditions, such as Creutzfeldt-Jakob disease, there
is no effective treatment at this time; hence, the earlier diagnosis of some
conditions must be weighed against the overall risks and benefits for
the individual and society.13
Widmer39 reviewed the measurement validity of TM joint imaging
techniques to anatomy. Arthrography demonstrated an 84% true correla-
tion to anatomy,37 MR imaging had a 73% to 85% true correlation,7 and
tomography had a 63% to 85% true correlation to anatomy.12 Widmer39
also reviewed the measurement validity of TM joint sounds by palpation
and stethoscope in an arthrographic examination of asymptomatic sub-
jects. Assessment for TM joint sounds by manual palpation revealed that
15% of silent joints had disk displacement.37 Joint sound assessment by
stethoscope revealed 14% of silent joints with disk displacement.32 These
results demonstrate that disk displacements may be present in the ab-
sence of joint sounds and that the presence of joint sounds may not offer
a valid assessment of disk displacements.
DIAGNOSTIC VALIDITY AND THE REFERENCE STANDARD
Biologic assays do not exist for all disorders, and for some diseases
and conditions, a real or practical reference standard does not exist. For
example, biologic assays for TMD and fibromyalgia do not exist, and
there is no reference standard for the measurement of active periodontal
disease. Instead, clinicians use measurements such as probing and at-
tachment levels, which are cumulative indices reflecting the history of
disease (in this case, attachment loss) rather than the presence of active
disease.4 In similar fashion, the diagnosis of fibromyalgia relies on the
key clinical feature of decreased pain threshold as manifested by tender-
ness at 18 specified anatomic locations.
Widmer39 distinguishes measurement validity from diagnostic valid-
ity, which is the extent to which diagnostic criteria can be used to classify
persons as to the absence or presence of a disorder with regard to the
current reference standard classification system. That is, in the absence
of a reference standard based on histopathologic or biologic assays, a
general nominalistic impression of the diagnostic usefulness of each
measure is gained through diagnosis of the presence or absence of the
disorder among individuals already known either to have or not to have
the disorder of interest. For example, for fibromyalgia, the diagnostic
validity of tenderness to muscle palpation is evaluated by the ability of
this measurement technique to distinguish between individuals known
to have or known not to have fibromyalgia. In the future, if laboratory
findings are linked to fibromyalgia, this new measurement or test approach
must also be assessed for its ability to distinguish between
individuals known to have or not to have fibromyalgia. Thus, the relative
diagnostic abilities of the existing method of muscle palpation and the
new laboratory finding can be compared; the more successful method
would be regarded as the reference standard until another new test
proves superior.
TEST CHARACTERISTICS
Traditionally, a new test, measurement, or technique is evaluated in
a sample of patients identified by the existing reference standard either
to have or not to have the disease of interest. A general impression of
the diagnostic strengths of a measure, test, or technique may then be
obtained from characteristics or parameters of the test. Test characteristics
are mathematical probabilities that are calculated by direct comparison
between a test, measurement, or technique and the reference standard
in a 2 × 2 contingency table (Figs. 1–2; Box). Summary statistics
such as sensitivity, specificity, and predictive values aid in the comparison
and analysis of different tests. Test accuracy is a measure of the
agreement between the test and the reference standard, but, as discussed
Figure 2. Contingency comparison between gold standard and new test. For example, for
the disease of caries, the gold standard is histologic examination, and a new test for
diagnosis of caries may be direct digital radiography.
Definitions of and Calculations for Test Characteristics

Accuracy is the overall agreement between the test and the reference gold standard. Accuracy may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{Accuracy} = \frac{a + d}{a + b + c + d}$$

Sensitivity is the proportion of diseased individuals correctly identified by the test. Sensitivity is also known as the true positive rate and may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{Sensitivity} = \frac{a}{a + c}$$

Specificity is the proportion of non-diseased individuals correctly identified by the test and is also known as the true negative rate. Specificity may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{Specificity} = \frac{d}{b + d}$$

Prevalence (P) is the overall probability or risk that the disease is present before the test and is also known as the pretest likelihood. Prevalence is the proportion of individuals in a population who have the disease at a specific point in time. Prevalence in a specified population may change over time, and prevalence may change if the definition of the disease changes. Prevalence may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$P = \frac{a + c}{a + b + c + d}$$

Post-test Likelihood of a Positive Test (PTL[+]) is also known as the positive predictive value. For an individual with a positive test result, PTL(+) is the probability that the disease is actually present. The PTL(+) may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{PTL}(+) = \frac{a}{a + b}$$

When the sensitivity, specificity, and prevalence or pretest likelihood are known, PTL(+) may be calculated by the formula

$$\text{PTL}(+) = \frac{P \times \text{LR}(+)}{(1.0 - P) + P \times \text{LR}(+)}$$

where

$$\text{LR}(+) = \frac{\text{true positive}}{\text{false positive}} = \frac{\text{sensitivity}}{1.0 - \text{specificity}}$$

Post-test Likelihood of a Negative Test (PTL[−]): For an individual with a negative test result, PTL(−) is the probability that the disease is actually present. The PTL(−) may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{PTL}(-) = \frac{c}{c + d}$$

When the sensitivity, specificity, and prevalence or pretest likelihood are known, PTL(−) may be calculated by the formula

$$\text{PTL}(-) = \frac{P \times \text{LR}(-)}{(1.0 - P) + P \times \text{LR}(-)}$$

where

$$\text{LR}(-) = \frac{\text{false negative}}{\text{true negative}} = \frac{1.0 - \text{sensitivity}}{\text{specificity}}$$

Negative Predictive Value (NPV): For an individual with a negative test result, the probability that disease is really absent. The NPV may be calculated from a 2 × 2 contingency table as shown in Figure 2 by the formula

$$\text{NPV} = \frac{d}{c + d}$$
in the section on likelihood ratios, accuracy is not the sole measure or
guarantee of a test’s clinical usefulness.
Sensitivity is the proportion of individuals who are correctly identi-
fied as having the disease. Specificity is the proportion of individuals
who are correctly identified as nondiseased. Table 2 illustrates the sensi-
tivities and specificities of some diagnostic tests used in dentistry.
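The definitions in the box can be sketched in a few lines of Python. This is a minimal illustration only: the cell labels are inferred from the box formulas (a = test positive with disease, b = test positive without disease, c = test negative with disease, d = test negative without disease), the class name `TwoByTwo` is hypothetical, and the counts are invented to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2 x 2 table; cell labels are inferred from the box formulas."""
    a: int  # test positive, disease present
    b: int  # test positive, disease absent
    c: int  # test negative, disease present
    d: int  # test negative, disease absent

    @property
    def total(self) -> int:
        return self.a + self.b + self.c + self.d

    def accuracy(self) -> float:
        return (self.a + self.d) / self.total

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / self.total

    def ptl_positive(self) -> float:
        # Probability of disease given a positive test (positive predictive value).
        return self.a / (self.a + self.b)

    def ptl_negative(self) -> float:
        # Probability of disease given a negative test.
        return self.c / (self.c + self.d)

    def npv(self) -> float:
        # Probability that disease is absent given a negative test.
        return self.d / (self.c + self.d)


# Hypothetical counts, not taken from any study.
table = TwoByTwo(a=40, b=10, c=20, d=130)
for name in ("accuracy", "sensitivity", "specificity", "prevalence",
             "ptl_positive", "ptl_negative", "npv"):
    print(f"{name:>13}: {getattr(table, name)():.3f}")
```

Note that PTL(−) and NPV always sum to 1.0, because both describe the same group of individuals with negative test results.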
Sensitivity and specificity are typically calculated in defined popula-
tions in which the disease status of the individuals is already known
and confirmed by the reference standard and in which only extremes of
disease (the very sick) and health (the very healthy) are represented. As
discussed later, these circumstances do not represent the true clinical
situation. If the clinician already knew the disease status of a patient,
Table 2. SENSITIVITIES, SPECIFICITIES, AND LIKELIHOOD RATIOS OF SOME DIAGNOSTIC TESTS USED IN DENTISTRY

| Test | Reference Number | Sensitivity | Specificity | LR(+)* | LR(−)† |
|---|---|---|---|---|---|
| Caries | | | | | |
| Clinical examination | 36b | 0.13 | 0.94 | 2.2 | 0.93 |
| Bite-wing radiographs | 21a | 0.73 | 0.97 | 24.3 | 0.28 |
| Periodontics | | | | | |
| Gingival redness | 11a | 0.27 | 0.67 | 0.82 | 1.09 |
| Plaque | 11a | 0.47 | 0.65 | 1.3 | 0.82 |
| Bleeding on probing (2 mm, 5/6 threshold) | 18a | 0.29 | 0.88 | 2.4 | 0.81 |
| Temporomandibular joint disorders | | | | | |
| Temporomandibular sounds, manual palpation, single click | 7a | 0.43 | 0.75 | 1.7 | 0.76 |
| Disk displacement on MR imaging | | 0.86 | 0.63 | 2.3 | 0.22 |
| Degenerative changes on sagittal tomography | 36a | 0.47 | 0.94 | 7.8 | 0.56 |

*LR(+) is calculated by sensitivity / (1.0 − specificity).
†LR(−) is calculated by (1.0 − sensitivity) / specificity.
there would be no need for further investigation. Instead, the clinician
is typically confronted with equivocal cases among a population of
healthy and diseased individuals.
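As a quick consistency check, the likelihood ratios listed in Table 2 can be recomputed from the tabulated sensitivities and specificities using the footnote formulas. The sketch below is illustrative; the function name `likelihood_ratios` and the row labels are our own, and small differences from the printed values reflect rounding in the table.

```python
# (test, sensitivity, specificity, tabulated LR(+), tabulated LR(-)) from Table 2
TABLE_2 = [
    ("Caries: clinical examination", 0.13, 0.94, 2.2, 0.93),
    ("Caries: bite-wing radiographs", 0.73, 0.97, 24.3, 0.28),
    ("Periodontics: gingival redness", 0.27, 0.67, 0.82, 1.09),
    ("Periodontics: plaque", 0.47, 0.65, 1.3, 0.82),
    ("Periodontics: bleeding on probing", 0.29, 0.88, 2.4, 0.81),
    ("TMJ: sounds, manual palpation, single click", 0.43, 0.75, 1.7, 0.76),
    ("TMJ: disk displacement on MR imaging", 0.86, 0.63, 2.3, 0.22),
    ("TMJ: degenerative changes on sagittal tomography", 0.47, 0.94, 7.8, 0.56),
]


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR(+), LR(-)) as defined in the footnotes to Table 2."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


for name, sens, spec, lr_pos_tab, lr_neg_tab in TABLE_2:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name:<50} LR(+) {lr_pos:6.2f} (table {lr_pos_tab})  "
          f"LR(-) {lr_neg:5.2f} (table {lr_neg_tab})")
```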
The difference between the diagnosis for presence or absence of
disease or abnormality depends on the selection of cutoff points.
Changes in activity or level of any physiologic, biochemical, or molecular
marker are typically reflected by continuous measures. In contrast, the
presence or absence of an abnormality or disease is typically a dichotomous
diagnosis, such as normal versus abnormal or health versus disease.
On occasion, gradations of abnormality are also used, such as mild,
moderate, or severe dysplasia, or hypertension classified as
stage 1 to stage 4. Continuous measures may be collapsed to dichotomous
data by the selection of cutoff points. For example, individuals
exhibit a wide range of pain-free unassisted vertical and horizontal
mandibular movements, and these mandibular kinesiology measure-
ments are used as diagnostic criteria for TMD.39 If the cutoff point
between non-TMD (health) and TMD (disease) is arbitrarily set at inter-
incisal opening of 40 mm, then theoretically, the patient with a 39-mm
opening is eligible for diagnosis of TMD, but another patient with a 41-
mm opening is diagnosed as non-TMD. Alternatively, if the cutoff point
between non-TMD and TMD is set instead at 35 mm, the same patient
with a 39-mm opening would be excluded from diagnosis of TMD. In
similar fashion, the number of specified muscle sites that are tender to
palpation and the number and type of TM joint sounds will affect the
proportions of individuals diagnosed with TMD.39
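The interincisal opening example can be written out as a small sketch of how a cutoff point collapses a continuous measure into a dichotomous label. The function name `classify_opening` is hypothetical, and the handling of a value exactly at the cutoff (treated here as non-TMD) is an assumption, since the text does not specify it.

```python
def classify_opening(opening_mm: float, cutoff_mm: float) -> str:
    """Label a patient from interincisal opening; openings below the cutoff count as TMD."""
    return "TMD" if opening_mm < cutoff_mm else "non-TMD"


# The two cutoffs and the two patients discussed in the text.
for cutoff in (40.0, 35.0):
    for opening in (39.0, 41.0):
        label = classify_opening(opening, cutoff)
        print(f"cutoff {cutoff:.0f} mm, opening {opening:.0f} mm -> {label}")
```

The same patient with a 39-mm opening changes category when the cutoff moves from 40 mm to 35 mm, although nothing about the patient has changed.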
Ideally, the selection of a cutoff point should be based on what is
best for the patients concerned, and the consequences of over- and
underdiagnosis must be considered. If the condition is innocuous and
neither shame nor anguish is associated with the diagnosis (for example,
the diagnoses of linea alba or the common cold), then the cutoff for
classification as diseased may be relaxed. Conversely, if there is no
advantage in early diagnosis, a positive diagnosis has the potential to
produce anxiety in the patient, and there is no effective treatment, the
cutoff for disease should be set high (≥ 99%) to exclude the nondiseased.29
The selection of the cutoff point will determine the proportion
of true-positive, false-positive, true-negative, and false-negative results,
which, in turn, will produce different estimates of the sensitivity and
specificity of the diagnostic test (see box). A perfect test will yield only
true-positive and true-negative results without any overlap or false-
positive or false-negative result (Fig. 3).
Criteria for Selection of Test Thresholds
Low Threshold
• selected if it is important that all individuals with the disease or
its progression are detected
• provides high sensitivity and high PTL()
• results in increased number of false-positive results because of the
low specificity
• is useful for screening for serious or life-threatening disease but
confirmation testing is required (e.g., dentists perform screening
examinations for high blood pressure or for oral cancer in patients
who are asymptomatic for these diseases.)
High Threshold
• limits the number of false-positive results
• is required for confirmation testing
• results in high specificity but lower sensitivity. High-specificity
values are important for diseases that are not life-threatening such
as TMD. High specificity excludes individuals without the disease
from pursuing unnecessary, irrelevant, and possibly invasive, irre-
versible, and expensive treatment.
In general, if a low threshold is selected, the sensitivity is increased,
and the specificity is decreased; a high threshold results in high specific-
ity but lower sensitivity. High sensitivity is desirable for screening tests.
High specificity is required for exclusionary tests to minimize the num-
ber of false-positive results. The highest possible sensitivity and specific-
ity are desirable for confirmatory tests to minimize both false-positive
and false-negative results. Unfortunately, high sensitivity and high speci-
ficity are rarely found in a single test.
[Figure 3, panels A and B. Axes: number of individuals versus test parameter; curves labeled TNF and TPF.]
Figure 3. Hypothetical distribution of healthy (true negative fraction [TNF]) and diseased
(true positive fraction [TPF]) populations. Test results yield different estimates of sensitivity
and specificity. A, Hypothetical perfect test with 100% sensitivity and 100% specificity. The
diseased (TPF, dashed line) and healthy (TNF, solid line) individuals are identified without
false negative (FNF) or false positive (FPF) fractions. B, Hypothetical useless test. The
diseased and healthy populations are not identified by the test.
Receiver Operating Characteristic Analysis
One of the best methods to evaluate the effect of different cutoff
points is the receiver operating characteristic (ROC) analysis (Fig. 4). An
ROC analysis plots the true-positive fraction (sensitivity) as a function
of the false-positive fraction (1.0 − specificity), and points along the
curve can be used to determine the effect of different thresholds for the
test. Selection of points towards the left of the curve yields higher
specificity, and points to the right yield higher sensitivity. An ROC
analysis also permits the comparison of different tests without any
selection of upper or lower reference limits or any particular sensitivity
or specificity. It is widely agreed that ROC curves are independent of
the disease prevalence and therefore reflect the true performance of the
diagnostic tests.11, 29
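To make the construction of an ROC curve concrete, the following sketch sweeps every candidate cutoff over two small groups and computes the area under the resulting curve by the trapezoidal rule. The measurements are invented, higher values are assumed to indicate disease, and the function names `roc_points` and `area_under_curve` are hypothetical.

```python
def roc_points(diseased: list[float], healthy: list[float]) -> list[tuple[float, float]]:
    """Return (FPF, TPF) pairs for each cutoff; a value >= cutoff is called positive."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cutoff above every value: nobody tests positive
    for cutoff in cutoffs:
        tpf = sum(x >= cutoff for x in diseased) / len(diseased)  # sensitivity
        fpf = sum(x >= cutoff for x in healthy) / len(healthy)    # 1.0 - specificity
        points.append((fpf, tpf))
    return points


def area_under_curve(points: list[tuple[float, float]]) -> float:
    """Trapezoidal area under the ROC polyline."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical measurements with partial overlap between the groups.
diseased = [4, 5, 6, 6, 7, 8]
healthy = [1, 2, 3, 3, 4, 5]

curve = roc_points(diseased, healthy)
for fpf, tpf in curve:
    print(f"FPF {fpf:.2f}  TPF {tpf:.2f}")
print(f"area under curve: {area_under_curve(curve):.3f}")
```

In this sketch, two groups with identical distributions give an area of 0.5 (the useless test), and completely separated groups give an area of 1.0 (the perfect test).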
In clinical practice, the selection of cutoff points is determined by
several factors, including mortality and morbidity of the disease, the
[Figure 3, panels C and D. Axes: number of individuals versus test parameter; curves labeled TNF and TPF, with FNF and FPF regions marked at Cut-Off #1 (C) and Cut-Off #2 (D).]
Figure 3 (Continued). C and D, Hypothetical typical test with overlap of healthy and
diseased populations. The selection of the cut-off point to distinguish between healthy and
diseased individuals affects the proportion of the FNF and FPF. Sensitivity and specificity
are affected by the selection of the cut-off point. In C, the cut-off point is located further to
the right than the cut-off point in D. Therefore, the FPF in C is smaller than the FPF in D.
Conversely, the FNF in C is larger than the FNF in D.
consequences of over- and undertreatment, and the cost and time re-
quired to perform the diagnostic test. Once test thresholds are estab-
lished, sensitivity and specificity are considered to be stable properties
of the test because they are apparently not affected by the prevalence of
the target disease. Some evidence, however, indicates that sensitivity
and specificity do change from one clinical population to another,14, 15
especially if the stage of disease varies in different groups of patients.11, 24
The Effects of Prevalence
Sensitivity (true-positive rate) and specificity (true-negative rate) are
measures of how well the test correctly identifies diseased and healthy
individuals, respectively. Sensitivity and specificity do not provide the
clinician with any information about whether the test will provide mean-
ingful diagnostic information for individuals whose disease status is not
known. Hence, the predictive values (see box on pages 100–101) of a
test are required to provide information about how often a test will
Figure 4. Receiver operating characteristics (ROC) curves plot the TPF (sensitivity) against
the FPF (1.0-specificity). ROC curves permit selection of the threshold or cut-off point that
provides the best combination of sensitivity and specificity scores. The most discriminating
tests cluster in the upper left-hand corner, and the most discriminating test has the greatest
area under its ROC curve. ROC curves also permit the comparison of tests without selection
of reference limits or sensitivity and specificity. For example, this figure compares the ROC
curve for conventional radiographic film evaluation of artificial cortical bone lesions, pro-
duced with a size 6 burr in dried mandibles (bulleted line) with the ROC curve for conven-
tional radiographic film evaluation of in vivo periodontal crestal alveolar bone loss (dashed
line). In this example, the area under the ROC curve for the detection of in vitro cortical
lesions is larger than the area under the ROC curve for the in vivo detection of periodontal
crestal bone loss. As expected, conventional radiographic evaluation of in vitro artificial
cortical lesions is more discriminating or a more powerful test than conventional
radiographic evaluation of in vivo crestal bone loss. Solid line = ROC curve of noise or a
hypothetical useless test. (Dashed line, Data from Nummikoski PV, Steffensen B, Hamilton
K, et al: Clinical validation of a new subtraction radiography technique for periodontal bone
loss detection. J Periodontol 71:598–605, 2000; Bulleted line, Data from Paurazas SB,
Geist JR, Pink FE, et al: Comparison of diagnostic accuracy of digital imaging by using
CCD and CMOS-APS sensors with E-speed film in the detection of periapical bony lesions.
Oral Surg Oral Med Oral Pathol Oral Radiol Endod 89:356–363, 2000.)
provide a correct diagnosis in a mixed population. Three predictive
values may be calculated: (1) positive predictive value, (2) negative
predictive value, and (3) posttest likelihood of a negative test.
The positive predictive value is also known as the post-test likelihood
of a positive test (PTL[+]). For a patient who has undergone a
diagnostic test and obtained a positive test result, PTL(+) is the probability
that disease is actually present. When a negative test result is
obtained, the probability that disease is truly absent is known as the
negative predictive value. For a patient with a negative test result, the
clinician may need to know the probability that disease is actually
present; this probability is known as the post-test likelihood of a negative
test (PTL[−]). Although a negative result will reduce the probability of
disease being present, typically it will not absolutely eliminate this
possibility.
The predictive values of a test vary widely as the prevalence of the
disease changes.11, 29 Prevalence is also known as the pretest likelihood,
and it is the overall probability or risk that disease is present before the
test is administered.
For example, toluidine blue has been advocated for the detection of
oral squamous cell carcinoma (SCC). The sensitivity of toluidine blue
ranges from 93.5% to 97.8%, and its specificity ranges from 73.3% to
92.9%.28 The predictive values of toluidine blue and the conclusions
provided by this test will vary, however, depending on the individual
patient to whom or the population in which the test is applied. The
prevalence of SCC in the general population has been estimated at 3%,25
and therefore the posttest likelihood of a positive toluidine blue test in
the general population is only 6%.10 In contrast, the prevalence of SCC,
either as primary or recurrent disease, is greater in a tertiary care center
for oral SCC, where prevalence estimates range from 26%25, 34 to 33%.10
Consequently, the posttest likelihood of a positive test in a tertiary care
center is also greater (51%).10 In the high-prevalence setting, the posttest
likelihoods of the tests are considerably higher than the pretest probabili-
ties, meaning that there is a considerably increased probability that the
disease is actually present. In contrast, the posttest likelihoods of the
same test in the general population (low-prevalence setting) are similar
to the pretest probabilities, meaning that there is only a slight increase
in the probability that the disease is actually present. Nevertheless, the
significance of each positive and negative test must be evaluated on an
individual basis by the clinician, who must then decide the subsequent
course of action.
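The two toluidine blue settings can be related to one another with a short back-calculation. The sketch below infers the positive likelihood ratio implied by the tertiary care figures (pretest 33%, posttest 51%) and applies it to the general-population prevalence of 3%. The implied ratio is our inference from the quoted percentages, not a value reported in the cited studies, and the function names are hypothetical.

```python
def probability_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def implied_likelihood_ratio(pretest: float, posttest: float) -> float:
    """Likelihood ratio that would move the pretest probability to the posttest probability."""
    return probability_to_odds(posttest) / probability_to_odds(pretest)


def posttest_probability(pretest: float, likelihood_ratio: float) -> float:
    return odds_to_probability(probability_to_odds(pretest) * likelihood_ratio)


# Tertiary care center figures quoted in the text: 33% pretest, 51% posttest.
lr_positive = implied_likelihood_ratio(0.33, 0.51)

# Apply the same (inferred) ratio to the general-population prevalence of 3%.
general_population = posttest_probability(0.03, lr_positive)

print(f"implied LR(+): {lr_positive:.2f}")
print(f"posttest probability at 3% prevalence: {general_population:.1%}")
```

Under these assumptions the implied ratio is roughly 2.1, and the same test applied at 3% prevalence yields a posttest probability of about 6%, in line with the figure quoted for the general population.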
The example with the toluidine blue test demonstrates that even a
test with high sensitivity (93.5%–97.8%) and specificity (73.3%–92.9%)27
can yield low predictive values when the prevalence (or pretest likelihood)
is low. Sackett et al29 further illustrate this point using a theoretical
test with 95% sensitivity and 95% specificity under conditions of variable
prevalence. For example, as the prevalence changes from 99% to 1%, the
PTL(+) changes from 99.99% to 16%, respectively. Thus, even a test that
has excellent specificity and sensitivity will produce a low likelihood of
disease being present if it is applied to an individual in a population in
which the initial pretest prevalence is low.
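The dependence of PTL(+) on prevalence can be tabulated directly from the formula in the box. The sketch below uses the theoretical test with 95% sensitivity and 95% specificity; the prevalence values other than 1% are chosen for illustration, and the function name `ptl_positive` is hypothetical.

```python
def ptl_positive(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Post-test likelihood of disease after a positive result."""
    lr_pos = sensitivity / (1.0 - specificity)
    return prevalence * lr_pos / ((1.0 - prevalence) + prevalence * lr_pos)


# Theoretical test from the text: 95% sensitivity, 95% specificity.
for prevalence in (0.01, 0.10, 0.50, 0.90):
    value = ptl_positive(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"prevalence {prevalence:>4.0%} -> PTL(+) {value:.1%}")
```

At a prevalence of 1% the formula gives a PTL(+) of roughly 16%, so about five of every six positive results would be false positives despite the test's apparently excellent characteristics.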
The choice of a particular test for a specific disease is determined
by the power or ability of the test to revise the pretest probabilities,
either upwards to rule in the disease, or downwards to rule out the
disease. In general, for a test with a sufficiently high sensitivity, a
negative result rules out the disease. In contrast, for a test with a
sufficiently high specificity, a positive result rules in the disease.29 In
other words, the clinician relies on pattern recognition: ‘‘if it looks like
a duck, quacks like a duck, and waddles like a duck, it probably is
a duck.’’
Different tests for the same disease can be used in combination,
either in series, such as screening testing followed by confirmation
testing, or in parallel.
In series testing
• tests are used in succession
• if tests A and B are used in series, then either test A or test B can
be used first
• a positive result on the first test requires testing with the second
test
• is less sensitive in detecting disease than parallel testing, but series
testing has greater specificity and is more efficient at confirming
the presence of disease
• is used in confirmation testing
In parallel testing
• tests are performed concurrently
• if tests A and B are used in parallel,
  – a positive result requires a positive result on either test A or test B
  – a negative result requires that both test A and test B are negative
• is more sensitive than series testing for detecting disease but less
efficient at confirming presence of disease
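The trade-off between series and parallel testing can be illustrated numerically. The sketch below assumes that the two tests err independently of one another, which is a simplifying assumption and often does not hold in practice; the sensitivities and specificities are invented, and the function names `series` and `parallel` are hypothetical.

```python
def series(sens_a: float, spec_a: float, sens_b: float, spec_b: float) -> tuple[float, float]:
    """Combined (sensitivity, specificity) when the result is positive only if both tests are positive."""
    sensitivity = sens_a * sens_b
    specificity = 1.0 - (1.0 - spec_a) * (1.0 - spec_b)
    return sensitivity, specificity


def parallel(sens_a: float, spec_a: float, sens_b: float, spec_b: float) -> tuple[float, float]:
    """Combined (sensitivity, specificity) when the result is positive if either test is positive."""
    sensitivity = 1.0 - (1.0 - sens_a) * (1.0 - sens_b)
    specificity = spec_a * spec_b
    return sensitivity, specificity


# Hypothetical tests A and B, assumed independent.
test_a = (0.80, 0.90)
test_b = (0.70, 0.85)

print("series:   sensitivity %.3f, specificity %.3f" % series(*test_a, *test_b))
print("parallel: sensitivity %.3f, specificity %.3f" % parallel(*test_a, *test_b))
```

With these invented values, series testing gives a sensitivity of 0.56 and a specificity of 0.985, whereas parallel testing gives a sensitivity of 0.94 and a specificity of 0.765, matching the qualitative pattern in the lists above.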
At health fairs, clinician dentists may perform screening tests for
oral cancer through a careful visual inspection of the oral soft tissues or
a screening test for TMD by evaluating the patient’s range of pain-free
mandibular movements. With positive results of suspicious oral lesions
or a restricted range of jaw movements and associated discomfort, the
patients would be referred to their own dentists or to specialists for
possible oral biopsy or more detailed TMD evaluation including assess-
ment of joint sounds and TM joint and neck and masticatory muscle
tenderness.
When several tests are used in sequence, the posttest likelihood of
disease after the first test is used as the pretest likelihood for the
subsequent test. A possible problem with this approach is the propaga-
tion of errors, because each test can be considered as having some
associated error. Therefore, as more tests are performed, the precision of
the probability estimate will decline. The posttest probability of disease
may also be distorted by the end of the test sequence if the clinician
assumes that the tests are independent when the test results are actually
dependent. That is, the test result on one test or measure may affect the
characteristics of the second test, a phenomenon termed concordance or
convergence.29 Concordance occurs when patients who are positive on
one of the paired tests are likely to be positive on the other one as well,
or when patients who are negative on one test are likely to be negative
on the other one. For example, the electric pulp-stimulation test is much
more likely to be positive when the thermal (cold) test is positive (i.e.,
the patient reports sensation upon cold stimulation of the tooth) than
when the thermal test is negative (the patient denies sensation to cold
stimuli). Conversely, teeth with negative results on one test (either the
cold or electric pulp test) are also likely to be negative on the other.
Concordance results in an overestimate of disease likelihood. Sackett et
al29 suggest that for short courses of two or three diagnostic tests,
convergence is not a serious problem but should be considered. For
example, concordance was observed between the use of toluidine blue
and visual clinical examination of patients in an oral cancer tertiary care
center by a trained and experienced clinician.10 That is, oral lesions that
were classified as suspicious or positive by one of these methods were
likely to be positive on the other method as well. When the results of
both the visual clinical examination and toluidine blue were positive,
the pretest likelihood of 33% was raised to a posttest likelihood of 54%,
which is greater than the PTL(+) obtained by either toluidine blue
application alone (51%) or the visual clinical examination alone (44%).10
A PTL(+) of 54% calculated with consideration of concordance is a
lower but more realistic value than the PTL(+) of 62% that is calculated
if the tests are used sequentially and assumed to be independent.
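The 62% figure for sequential use under the independence assumption can be reproduced from the percentages quoted above. In this sketch the likelihood ratio of each method is back-calculated from its individual result (33% to 51% for toluidine blue, 33% to 44% for visual examination) and the two ratios are then chained. The back-calculated ratios are our inference, not values reported in the cited study, and the function names are hypothetical.

```python
def probability_to_odds(p: float) -> float:
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    return odds / (1.0 + odds)


def implied_likelihood_ratio(pretest: float, posttest: float) -> float:
    return probability_to_odds(posttest) / probability_to_odds(pretest)


PRETEST = 0.33  # prevalence in the tertiary care center, from the text

lr_toluidine = implied_likelihood_ratio(PRETEST, 0.51)  # toluidine blue alone
lr_visual = implied_likelihood_ratio(PRETEST, 0.44)     # visual examination alone

# Sequential use: the posttest odds of the first test become the pretest odds
# of the second. Multiplying both ratios is valid only if the tests are independent.
posttest_odds = probability_to_odds(PRETEST) * lr_toluidine * lr_visual
independent_estimate = odds_to_probability(posttest_odds)

print(f"posttest probability assuming independence: {independent_estimate:.0%}")
```

The result is about 62%, compared with the 54% observed when concordance between the two methods is taken into account.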
LIKELIHOOD RATIOS AND NOMOGRAMS
Principles three and four of the diagnostic decision analysis require
that the interpretation of possible test outcomes precede the ordering of
the test and that testing should proceed only if the subsequent manage-
ment of the patient will be altered as a result of the test result. How can
this interpretation be accomplished?
If the sensitivity and specificity of a particular test and the prevalence
of the disease of interest are known, the post-test likelihoods of a
positive and negative test can be calculated from the formulas for
PTL(+) and PTL(−) shown in the box on pages 100–101. These calculations
use likelihood ratios that ‘‘express the odds that a given level of a
diagnostic test result would be expected in a patient with (as opposed
to one without) the target disorder.’’29 Sensitivity and specificity are
probability statements, and they may be converted to odds, which
are the ratio of two probabilities. Probabilities and odds contain the
same information but convey it differently. Thus, a probability of 50%
means even odds of 1:1. Likelihood ratios provide a measure of a test’s
ability to revise the pretest probabilities, and they are simple to calculate
from the sensitivity and specificity of the particular test. Although sensi-
tivity and specificity are used to calculate the likelihood ratios of a test,
it is the likelihood ratios, not sensitivity and specificity, that provide
information as to the potential power of the test. As a rule of thumb, if
the sum of a test’s sensitivity and specificity is unity (1.0), the test is
useless: the likelihood ratios of the test are also unity (1.0), and therefore
the test has no power to revise the pretest probability. In general, power-
ful tests for revising pretest probabilities of disease have positive likeli-
hood ratios with values greater than 10 and negative likelihood ratios
less than 0.1.
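The rule of thumb about useless tests is easy to verify numerically. The sketch below computes both likelihood ratios for a few invented sensitivity and specificity pairs that sum to 1.0, and also shows the conversion of a 50% probability to even odds. The function names are hypothetical.

```python
import math


def probability_to_odds(p: float) -> float:
    return p / (1.0 - p)


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR(+), LR(-))."""
    return sensitivity / (1.0 - specificity), (1.0 - sensitivity) / specificity


print(f"odds at 50% probability: {probability_to_odds(0.5):.1f} to 1")

# Invented pairs in which sensitivity + specificity = 1.0.
for sens, spec in [(0.30, 0.70), (0.50, 0.50), (0.90, 0.10)]:
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    useless = math.isclose(lr_pos, 1.0) and math.isclose(lr_neg, 1.0)
    print(f"sensitivity {sens:.2f}, specificity {spec:.2f}: "
          f"LR(+) {lr_pos:.2f}, LR(-) {lr_neg:.2f}, useless={useless}")
```

Because both ratios equal 1.0, multiplying the pretest odds by either one leaves them unchanged, whatever the test result.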
Likelihood ratios offer diagnostic advantages in that they are less
susceptible than sensitivity or specificity to changes in the prevalence or
pretest probability of the disease.29 Likelihood ratios may also be calcu-
lated for dichotomous levels of disease and for several levels of the test
result. The product of the likelihood ratio for the diagnostic test result
and the pretest odds for the target disorder yields the posttest odds for
the target disorder.29 A convenient method for rapidly calculating post-
test probability of disease is offered by the use of likelihood ratios for
the test and nomograms.
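The odds relationship stated here is equivalent to the probability formula given in the box, as the following short derivation suggests. P denotes the pretest probability and LR the likelihood ratio for the observed test result.

```latex
\begin{align*}
\text{pretest odds} &= \frac{P}{1 - P} \\[4pt]
\text{posttest odds} &= \text{LR} \times \frac{P}{1 - P} \\[4pt]
\text{posttest probability} &= \frac{\text{posttest odds}}{1 + \text{posttest odds}}
  = \frac{P \times \text{LR}}{(1 - P) + P \times \text{LR}}
\end{align*}
```

Substituting LR(+) or LR(−) for LR recovers the formulas for PTL(+) and PTL(−), respectively. The nomogram performs the same conversion graphically.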
Nomograms (Fig. 5)30 offer a convenient and fast alternative to the
calculation of posttest likelihoods using the formulas shown in the box
on pages 100–101. Table 2 illustrates the sensitivities, specificities, and
likelihood ratios of some diagnostic tests used in dentistry. Figure 6
demonstrates use of the nomogram in the diagnostic decisions for three
examples of potential interproximal caries (the disease) and use of bite-
wing radiographs (the test). In each case, the clinician detects a small
area of discoloration on the distal aspect of the maxillary second bicuspid
but is not able to engage the explorer interproximally. For the disease of
caries, the clinician has assigned a test threshold of 30% and a test-
treatment threshold of 65% (Figs. 1, 6).
Patient A is an adolescent female with an unrestored permanent dentition
who aspires to a career in modeling. Patient A practices excellent
oral hygiene and is compliant with twice-yearly prophylaxis appointments.
Bitewing radiographs taken 2 years ago at the completion of
orthodontic treatment did not reveal any abnormalities. The clinician
assigns a pretest probability for caries of 1%. The clinician’s pretest
probability is located well below the test threshold of 30%, and therefore
radiographs would not be indicated. In the unlikely event that radiographs
(the test) were performed with a positive test result, the probability
of caries or PTL(+) can be calculated to be 20%. Despite this positive
test result, no further tests or restoration would be indicated, because
this probability is still less than the test threshold of 30%. If the test
results were negative, PTL(−) can be calculated to be 0.4%, effectively
ruling out the presence of caries.
Patient B is a young adult male with a moderately restored posterior
dentition. Patient B is a pastry chef apprentice who demonstrates poor
oral hygiene and poor compliance with recommended dental recall and
prophylaxis appointments. The patient was last seen 3 years ago when
bitewing radiographs revealed no sites of interproximal caries in the
posterior mandibular dentition. The clinician assigns a pretest probabil-
ity of 50% to the presence of caries. This pretest probability is located
between the test and test-treatment thresholds; therefore, bitewing radio-
graphs are indicated. With a positive test result, treatment is indicated,
but a negative test result rules out the disease and treatment.
Patient C is an elderly patient with a heavily restored dentition and
Figure 5. Nomograms have converted pre- and post-test odds to their corresponding
probabilities. To use the nomogram, a straightedge is used to align the pretest probability
(left column) with the likelihood ratio (center column) of the test being used. The post-test
probability is revealed by reading across the straightedge to the right-hand column on the
nomogram. (Data from Fagan, TJ: Nomogram for Bayes’ theorem [letter]. N Engl J Med
293:257, 1975; Sackett DL, Richardson WS, Rosenberg W, et al: Evidence-Based Medicine:
How to Practice and Teach EBM. New York, Churchill Livingstone, 1997, p 127.)
Figure 6. Diagnostic decisions for bitewing radiographs for three patients with possible
caries (the disease). Patient A, By aligning the straightedge at 1% in the pretest probability
column with 24 in the likelihood ratio column, the post-test probability of caries being
present is raised to about 20%—a value well below the test-treatment threshold of 65%
and below the test threshold of 30%. Despite a positive test result, no further tests or
restoration are indicated, and the clinician may feel confident about merely observing the
tooth. When the pretest probability of 1% is aligned with the likelihood ratio (LR) of a
negative test result (0.28), the post-test probability of disease has been further reduced to
about 0.4%, effectively ruling out the presence of caries. Patient B, The pretest probability
of 50% is located between the test and test-treatment thresholds. Radiographs are indicated.
Post-test likelihood of disease (PTL[+]) is raised to 92% and treatment is indicated.
PTL(−) is reduced to 18% and treatment is not indicated. Patient C, The clinician recognizes
that the 95% pretest probability exceeds the established test-treatment threshold;
bitewing radiographs are not required for diagnosis and test results would not alter the
proposed management (restoration of the tooth). Even a negative test result (no radio-
graphic evidence of caries) would still result in an 80% post-test probability of caries being
present. Although 80% is a lesser probability of disease than 95%, it still exceeds the test-
treatment threshold and is probably not low enough to change the planned management.
LR(+) = 24; LR(−) = 0.28 (see Table 2); test threshold = 30%; test-treatment threshold =
65% (see Figure 1); see Figure 5 for nomogram.
recent past history of recurrent and new caries. Patient C is disabled
with rheumatoid arthritis and is xerostomic with poor oral hygiene
although she is a compliant patient. The clinician assigns a pretest
probability for caries at 95%, and treatment is indicated without further
diagnostic testing. That is, radiographs are not required to establish the
diagnosis of caries in this case, although radiographs may provide useful
information to guide treatment of the caries or the diagnosis or treatment
of other pathologic conditions. For patient C, even a negative test result
would still result in an 80% posttest probability of caries being present
and requiring treatment. This case illustrates that clinicians must be
careful not to overestimate the meaning of negative test results when, in
fact, the probability of disease is high.
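The three patient scenarios can be worked through with the formula from the box instead of the nomogram. The sketch below uses the bite-wing likelihood ratios from Table 2 (24.3 and 0.28) and the thresholds assigned in the text (30% and 65%). The function names and the wording of the returned plans are hypothetical. Exact arithmetic gives values close to, but not identical with, those read from the nomogram.

```python
TEST_THRESHOLD = 0.30       # below this, no testing is indicated
TREATMENT_THRESHOLD = 0.65  # above this, treat without further testing
LR_POS, LR_NEG = 24.3, 0.28  # bite-wing radiographs, from Table 2


def posttest(pretest: float, likelihood_ratio: float) -> float:
    """Post-test probability of disease for a given pretest probability."""
    return pretest * likelihood_ratio / ((1.0 - pretest) + pretest * likelihood_ratio)


def plan(pretest: float) -> str:
    """Initial decision based only on the pretest probability and the two thresholds."""
    if pretest < TEST_THRESHOLD:
        return "observe; radiographs not indicated"
    if pretest > TREATMENT_THRESHOLD:
        return "treat without further testing"
    return "take radiographs"


for patient, pretest_probability in (("A", 0.01), ("B", 0.50), ("C", 0.95)):
    print(f"Patient {patient}: pretest {pretest_probability:.0%}, {plan(pretest_probability)}; "
          f"PTL(+) {posttest(pretest_probability, LR_POS):.1%}, "
          f"PTL(-) {posttest(pretest_probability, LR_NEG):.1%}")
```

For patient A the calculation gives about 20% after a positive result and about 0.3% after a negative one; for patient B about 96% and 22%; and for patient C about 84% after a negative result. The management decisions described in the text are the same with either set of figures.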
SUMMARY
This article has briefly introduced the dental clinician to the princi-
ples and practical application of diagnostic decision analysis. There are
trade-offs and uncertainties in the process of arriving at a diagnosis, but
they can be understood and controlled. First, the clinician must under-
stand the significance of disease prevalence and assign to the patient an
initial probability of disease being present. The clinician must then
determine if further diagnostic measurements or tests are warranted. If
so, the appropriate test must be selected, based on the ability of the test
to revise the initial pretest probability. When a diagnostic test is positive,
the clinician must know the probability that disease is actually present.
The clinician must also know the probability that disease is actually
present if the test result is negative. The astute clinician will calculate
the posttest probabilities before proceeding with a test and will base
treatment decisions on test results in accordance with predetermined
test and test-treatment thresholds.
ACKNOWLEDGEMENTS
The authors are grateful to David Perizzolo for formatting the digital figures, to Lesley
Weston for her careful editing, and to Dr. Babak Chehroudi for his critical review.

Donald Maxwell Brunette, MSc, PhD
Department of Oral Biological and Medical Sciences
University of British Columbia
2199 Wesbrook Mall
Vancouver, British Columbia
Canada, V6T 1Z3
e-mail: brunette@interchange.ubc.ca
ASSESSMENT OF KEY ELEMENTS TO DETERMINE CAUSATION AND RISK FACTORS IN DENTISTRY
The hypothesis that an exposure or characteristic is associated with
a particular disease outcome can be statistically proven through large
population studies. Causation studies usually involve identifying dis-
eases that are caused by or whose natural history is modified by lifestyle
choices and environmental exposures. A causal association is one in
which a change in the frequency or quality of an exposure or characteris-
tic results in a corresponding change in the frequency or quality of the
disease outcome. The causal characteristics associated with an increase
in disease are often called risk factors. In 1890, Robert Koch clarified the
cause-and-effect relationship of infectious disease when he postulated
that a bacterium was the cause of a single disease entity. He stated that
the specific organism should be present in all hosts suffering from a
specific disease, the microorganism should be isolated from the diseased
host and grown in pure culture in the laboratory, inoculation of the
cultured organism into a healthy host should cause the disease, and the
microorganism should be reisolated from the inoculated host. Although
not all of Koch’s postulates have proved true for all bacteria, viruses,
and prions, Koch’s postulates marked a milestone for cause-and-effect
thinking in health care science.
Epidemiologists frequently perform causation studies. When a true
cause-and-effect association is determined, this information assists in
formulating global strategies for controlling disease based on population
issues such as living conditions, nutrition, personal behavior, lack of
health care education, absence of immunity, and so forth. Clinicians are
interested in the cause of disease so that they may test therapeutic
strategies to prevent or cure the disease. Clinicians are interested in
prevention and therapy for individual patients and smaller patient
populations. For clinicians, strategies are tailored to individual patient
characteristics that will allow the clinician to modify therapy prescribed
to the patient afflicted with the target disease.

From the Department of Head and Neck Surgery, MD Anderson Cancer Center, Houston, Texas
118 JACOB
Although most dentists may view dentistry as a clinical and thera-
peutic science, many global population issues of causation are related
to dentistry. Some examples are public water fluoridation, a possible
association of amalgam restorations with multiple sclerosis, and smoking
as a risk factor for periodontal disease and dental implant loss. Dentistry
has recently begun to examine the association of periodontal disease
and cardiovascular disease.1, 4, 8 In these studies, it has been statistically
proven that some persons with a diagnosis of heart attack or coronary
heart disease are more likely than the general population to have a
diagnosis of periodontal disease. The question arises whether this associ-
ation is a valid cause-and-effect relationship: does periodontal disease
cause cardiovascular disease? To some, this cause-and-effect relationship
seems outlandish. So did the hypothesis generated by Oliver Wendell
Holmes (professor of anatomy and physiology and later dean of Harvard
Medical School) in 1843 that maternal fevers after childbirth were communicated
from mother to mother by obstetricians who did not practice
hand washing between births. Learned colleagues stated that they sus-
pected the disease was ‘‘accident or providence’’ rather than any process
that could be stemmed by hygiene.5 It was almost 50 years before Koch
set forth the postulates that an infectious agent causes disease.
KEY ELEMENTS FOR EVALUATING CAUSATION

Just as Koch formulated postulates that shaped the assessment of
the validity of the causal association of a specific organism and a disease,
scientists have formulated key elements that assist in judging the scien-
tific evidence for causation. These elements involve chance, bias, con-
founding variables, biologic credibility, temporal relationship, strength
of the relationship, and a dose-response gradient. Making judgments as
to whether associations are causal associations involves an evaluation of
the totality of evidence taken from a number of sources that document
the cause-and-effect relationship. The ultimate test of causation is the
successful use of intervention strategies that therapeutically alter the
risk factor or characteristic, thereby altering or curing the disease.
RESEARCH STUDIES TO EVALUATE CAUSATION

The methodologies of research an epidemiologist uses to study
cause-and-effect relationships are different from those used by a clinician.
Causation studies usually include large sample populations, are
carried out by epidemiologists, and are observational in nature; the
subjects are observed, queried, and measured, without the investigators’
offering or testing any interventions. The study is often hypothesis-
driven: the investigators are gathering data to determine if the character-
istic and the outcome can be found together (associated) in the patient
population, with the statistical analysis supporting the association be-
yond mere chance. Today, health care researchers understand that, unlike
Koch’s simple assessment of ‘‘bacteria cause disease,’’ the cause of dis-
ease is often multifactorial. Multiple characteristics affect the host’s
susceptibility to disease, and how the characteristics come together in
the host affects the magnitude of the disease. Therefore, observational
studies often examine multiple characteristics of the population to deter-
mine associations. Observers record the natural course of events, noting
which subject has or does not have the risk factor and who does or
does not develop the outcome of interest. Different observational study
designs can be used, but some designs offer an improved opportunity
to control bias and confounding variables, thereby increasing the likeli-
hood that these studies report a valid causal association. The two types
of observational studies most often employed are cohort and case-control
studies. Either may be used, but the decision to use one rather than the
other is often based on features of the exposure or risk factor and the
disease, current knowledge of the disease, and considerations of time
and resources.
How is the Magnitude of a Risk Factor Reported?
Depending on the study design, the magnitude of the causal associa-
tion is often described as a ratio, either an odds ratio or a relative risk.
This mathematical analysis considers the ratio of subjects in the exposed
or unexposed group that have or do not have the outcome of interest.
Because this relationship is a ratio, an odds ratio or a relative risk of 1
denotes that there is no difference in outcomes between the two groups.
Relative risks barely above 1 describe a weak association of the risk
factor with the outcome. As the ratio becomes higher than 1, it is an
estimate of the increased risk of having the outcome if the risk factor is
present, as compared with having the outcome if the risk factor is not
present. A relative risk of 1.5 means that the subject with the risk factor
is 50% more likely to have the disease outcome than a subject without
the risk factor. As in all studies, the test population serves only as a
representative population to predict how similar populations would
respond. When testing a subpopulation, one can only estimate how
similar populations would respond. Statistical methods can express the
precision of the estimate with a 95% confidence interval. A relative
risk and confidence interval might be written as 2.2 (C.I. 1.3–4.4). This
expression states that, given the data from this representative study, the
best estimate of the relative risk is 2.2, and that if the study were repeated
many times, about 95% of the confidence intervals calculated in this way
would contain the true relative risk (here, 1.3 to 4.4).7, 9, 15
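A relative risk and its 95% confidence interval can be computed from the four counts of a cohort study. The sketch below is illustrative only: the counts are hypothetical, the function name is an assumption, and the interval uses the common log-based normal approximation, which may differ from the methods used in the studies cited here.

```python
import math


def relative_risk_with_ci(exposed_cases, exposed_total,
                          unexposed_cases, unexposed_total, z=1.96):
    """Relative risk with an approximate 95% confidence interval (log method)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR) under the usual large-sample approximation.
    se_log_rr = math.sqrt(1 / exposed_cases - 1 / exposed_total
                          + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 100 exposed and 20 of 100 unexposed subjects
# develop the outcome.
rr, lower, upper = relative_risk_with_ci(30, 100, 20, 100)
print(f"{rr:.2f} (C.I. {lower:.2f}-{upper:.2f})")  # roughly 1.50 (C.I. 0.92-2.46)
```

In this hypothetical example the point estimate is a 50% increase in risk, but the interval includes 1, so the data are also compatible with no difference between the groups.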
Prospective Cohort Studies
Cohort studies observe large populations with and without the
exposure and observe the subjects forward in time to determine if there
is a difference in the populations as to the incidence of the disease
outcome. The observation process often begins with descriptive statistics
that reveal a difference in prevalence of a disease in a defined popula-
tion, such as a geographic area. A hypothesis then arises that the in-
creased prevalence of the disease in one geographic population versus
another is caused by some environmental factor. For instance, in 1942 a
low prevalence of dental decay was demonstrated to correlate with a
high fluoride concentration in the natural water supply. These descrip-
tive correlations came from a study of 4425 children, 12 to 14 years of
age, in 13 cities located in four states.14 Building on this correlation,
a subsequent prospective study compared the dental status of children in
a city without natural fluoride (Kingston, New York) and a city that had
fluoride added to its water supply (Newburgh, New York).* The children
were examined for decayed, missing, and filled teeth at baseline and
again after 10 years. In the 6- to 9-year-olds who had drunk fluoridated
water for all of their lives, a 57% relative reduction of dental caries was
seen. The older children experienced a 41% relative reduction.2, 6, 7 An-
other observational study compared decayed, missing, and filled tooth
surfaces at baseline and years after fluoridation was removed from the
water supply in Antigo, Wisconsin. This study revealed that the caries
index rose significantly, from 2.1 to 4.8 surfaces per person in the fourth-
grade population and from 0.5 to 2.0 surfaces per person in the second-
grade population.7, 11 This observational study added to the totality of
the evidence that fluoride acts to prevent dental caries. In these studies,
fluoride is a preventive factor or a negative risk factor for dental caries.
These two studies exemplify some of the elements that quantify the
strength of a causal association. In the New York study, the observations
were conducted in a prospective fashion and showed that the exposure
occurred first, and the outcome followed. This demonstration satisfies
the temporal relationship required in causation. In the second study, the
negative risk factor (fluoride) was withdrawn, and the disease incidence
increased. This demonstration satisfies the dose-response gradient of
causation. The strength of the causal evidence is also enhanced by the
magnitude of the causal effect in both studies.
*The addition of fluoride to the water supply could be considered a therapeutic or
interventional trial; however, because the subjects were not randomized but were consid-
ered as two distinct, self-selected populations who were not balanced for other population
characteristics, the methods are similar to those used in an observational trial.
Dental research has since satisfied biologic credibility by explaining
the mechanism of fluorapatite and how it decreases acid dissolution of
enamel. Finally, multiple therapeutic trials evaluating administration of
fluoride in the water supply, in the diet, and through pharmaceutical
supplements have demonstrated a decrease in caries incidence.
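The magnitude of the Antigo findings can be restated as relative changes in the caries index. The following sketch uses only the before and after values quoted above; the function name is a hypothetical helper introduced for illustration.

```python
def relative_change(before, after):
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


# Caries index (surfaces per person) before and after fluoridation was removed.
antigo = {"fourth grade": (2.1, 4.8), "second grade": (0.5, 2.0)}

for grade, (before, after) in antigo.items():
    print(f"{grade}: {relative_change(before, after):.0%} increase")
```

Expressed this way, the index rose by roughly 129% in the fourth-grade population and by 300% in the second-grade population.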
Case-control Studies
Case-control studies differ from cohort studies in that the exposure
or risk factor and outcome have already occurred. There is no following
of the subjects over time, waiting for the outcome to occur. A case-
control study usually includes a fixed population from which the investi-
gator selects a population with the outcome of interest (cases). In a
systematic fashion, the investigator identifies another subject from the
fixed population who is similar in as many characteristics as possible
except for the outcome (control). This type of study is often called a
matched case-control trial. Both these groups are then evaluated to
determine how many of them have the exposure or risk factor. The data
concerning the characteristics and exposures are almost always gathered
retrospectively from chart reviews, patient questionnaires, and other
documentation. Examinations may be performed to confirm some of the
data, such as the outcome. The retrospective data and examination data
are collected at this one point in time. If the final analysis confirms that
the outcome of interest occurs more frequently in the group with the
exposure, an association exists. This type of study is sometimes preferred
to the cohort study because it allows evaluation of rare outcomes and
outcomes that may take many years to manifest. It is also less costly
than longitudinal studies.5, 7, 10, 15 A cross-sectional study is similar to a
case-control study in that the outcomes and exposures have occurred
before the study, and the interface with the investigator is at one point
in time, without longitudinal follow-up. The population is a fixed popu-
lation, but usually only a representative sample, a cross-section of the
fixed population, is evaluated.
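Because a case-control study starts from the outcome, its association is usually summarized as an odds ratio. The sketch below shows one common way to compute an odds ratio and an approximate 95% confidence interval from a 2 x 2 table. The counts are hypothetical, the function name is an assumption, and the unmatched log-based method shown here is a simplification that does not account for matching.

```python
import math


def odds_ratio_with_ci(cases_exposed, cases_unexposed,
                       controls_exposed, controls_unexposed, z=1.96):
    """Odds ratio with an approximate 95% confidence interval (log method)."""
    odds_ratio = (cases_exposed * controls_unexposed) / (
        cases_unexposed * controls_exposed)
    # Standard error of ln(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / cases_exposed + 1 / cases_unexposed
                          + 1 / controls_exposed + 1 / controls_unexposed)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
estimate, lower, upper = odds_ratio_with_ci(40, 60, 20, 80)
print(f"{estimate:.2f} (C.I. {lower:.2f}-{upper:.2f})")
```

With these hypothetical counts the odds ratio is about 2.7, with an interval of roughly 1.4 to 5.0.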
The associations between periodontal disease and coronary heart
disease have been reported through case-control studies and cross-sec-
tional convenience samples.1, 12, 13 The multifactorial causality of coronary
heart disease, the various criteria defining heart disease outcomes, the
various methods of defining periodontal disease, and the large number
of microorganisms in the oral cavity have made it difficult to evaluate
the evidence in these studies. From a statistical standpoint, the multifac-
torial causality of heart disease requires statistical adjustments for as
many as 13 different causal variables, besides periodontal disease. One
study categorized attachment loss in one quadrant of the oral cavity,
compared with a self-reported history of a heart attack, in 5564 persons
older than 40 years of age. After adjustment of other risk factors for
heart attack, the odds ratio for heart attack in persons with attachment
loss of 3 mm or greater in 67% of measurements was 3.8 (C.I. 1.5–9.7)
compared with persons without attachment loss. The odds ratio for
persons with attachment loss of 3 mm or greater in 33% to 67% of
measurements was 2.3 (C.I. 1.2–4.4). There was no statistically significant
difference in odds ratio with attachment loss in less than 33% of mea-
surements.1 A second study of 85 persons referred to a hospital for
angiography, matched with persons without coronary heart disease se-
lected from public records, revealed no difference in the dental indices
of periapical and periodontal disease. The average age was 56 years,
and the author speculated that this group is older than those in previous
studies. There may have been an age-selection bias, such that older
patients with coronary heart disease are in better general health and
have better oral health, because the severely ill patients with coronary
heart disease have already died.13 Another one-point-in-time assessment
from chart review data and periodontal examination of a sample
of 320 Veterans Medical Association dental patients older than 60 years
of age was performed to determine dental associations with coronary
heart disease. Other risk factors were also considered from data gathered
from hospital charts and patient interviews. Use of cardiac medications
was considered to represent a diagnosis of coronary heart disease.
Multiple analyses were performed on 25 characteristics. The medically
recognized risk factors for coronary heart disease did not have significant
association in this study. The authors believed the lack of significance in
this study was probably caused by the increased age of the subjects
and that those subjects with significant associations may have already
succumbed to coronary heart disease. In addition, subjects were being
treated for many of the other risk factors, and therefore those risk factors
were under control. Statistical associations with coronary heart disease
were found for total tooth number up to 14, low salivary levels of
Streptococcus sanguis, gingival bleeding, positive plaque scores, and a
complaint of xerostomia.12
A prospective analysis of 9760 persons concluded that persons with
periodontitis had a 25% increased risk of coronary heart disease com-
pared with those with minimal periodontal disease. Poor oral hygiene,
determined by dental debris and calculus, was also associated with an
increased incidence of coronary heart disease, which was defined as a
hospital admission or death caused by coronary heart disease. Compared
with men without periodontal disease, the highest relative risk for coro-
nary heart disease was for men with periodontitis who were younger
than 50 years old, 1.72 (C.I. 1.10–2.68). An even greater relative risk for
total mortality was found for this group; those with periodontitis had a
relative risk of 2.12 (C.I. 1.24–3.62), and the edentulous subjects had a
relative risk of 2.60 (C.I. 1.33–5.07). The authors concluded that a causal
association between periodontal disease and coronary heart disease is
unclear, and that dental health may be more an indicator of personal
hygiene and overall health care practices.4
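One way to read the ratios reported in these studies is to check whether each confidence interval excludes 1, the value that denotes no difference between groups. The sketch below applies that check to the estimates quoted above; the labels are shorthand and the function name is a hypothetical helper.

```python
def excludes_no_effect(lower, upper, null_value=1.0):
    """True when the interval lies entirely above or below the null value."""
    return lower > null_value or upper < null_value


# (description, estimate, lower limit, upper limit) as quoted in the text.
reported = [
    ("odds ratio, heart attack, attachment loss in 67% of measurements", 3.8, 1.5, 9.7),
    ("odds ratio, heart attack, attachment loss in 33% to 67%", 2.3, 1.2, 4.4),
    ("relative risk, coronary heart disease, men under 50 with periodontitis", 1.72, 1.10, 2.68),
    ("relative risk, total mortality, periodontitis", 2.12, 1.24, 3.62),
    ("relative risk, total mortality, edentulous", 2.60, 1.33, 5.07),
]

for description, estimate, lower, upper in reported:
    verdict = "excludes 1" if excludes_no_effect(lower, upper) else "includes 1"
    print(f"{description}: {estimate} (C.I. {lower}-{upper}) {verdict}")
```

An interval that excludes 1 indicates an association unlikely to be explained by chance alone. It says nothing about bias, confounding, or temporal order, which must be judged separately.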
Case-control studies that interface with the subjects at one point in
time can suggest an association between a characteristic and an outcome,
but they cannot confirm the temporal relationship that the risk factor
came before the outcome. The correct temporal relationship is a primary
element in proving causation, but in case-control studies this element is
missing. In the evaluations of coronary heart disease and periodontal
disease, case-control studies have demonstrated that the two entities
occur simultaneously in the population, but one cannot be certain that
the coronary heart disease did not in some way cause the periodontal
disease. Case-control studies do not control the element of confounding
characteristics, that is, the possibility that a third variable or mechanism,
not yet isolated or understood, is causing the increase in both coronary
heart disease and periodontal disease. Such a confounding element
would account for the association of the coronary heart disease and
periodontal disease without there being a causal relationship between
the two. A confounding characteristic is demonstrated in the study by
Loesche and colleagues.12 There was a significant association of an in-
crease in the complaint of xerostomia in persons with coronary heart
disease. One should not assume that xerostomia causes coronary heart
disease. It is known, however, that cardiac medications cause xerostomia.
Patients with coronary heart disease require cardiac medications. The
association of xerostomia and coronary heart disease results from the
cardiac medications; the cardiac medications are the confounding factor.
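The xerostomia example can be illustrated numerically by separating subjects according to the third variable. The counts below are entirely hypothetical and were chosen so that xerostomia and coronary heart disease appear strongly associated in the pooled data but not within either medication group. The function name is an assumption introduced for this sketch.

```python
def odds_ratio(a, b, c, d):
    """a: exposed with outcome, b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome."""
    return (a * d) / (b * c)


# Hypothetical counts per stratum, ordered as:
# (xerostomia and CHD, xerostomia without CHD,
#  no xerostomia with CHD, no xerostomia without CHD)
strata = {
    "taking cardiac medication": (72, 8, 108, 12),
    "not taking cardiac medication": (2, 18, 18, 162),
}

# Pool the strata by adding the corresponding cells together.
crude = tuple(sum(cells) for cells in zip(*strata.values()))

print(f"pooled odds ratio: {odds_ratio(*crude):.2f}")
for name, cells in strata.items():
    print(f"{name}: odds ratio {odds_ratio(*cells):.2f}")
```

In these hypothetical data the pooled odds ratio is close to 4, yet the odds ratio within each medication group is 1, which is the pattern expected when the third variable accounts for the association.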
Randomized, Controlled Trials
Randomized, controlled trials (RCTs) are rarely conducted as the
first step to determine a causal association. In an RCT, a homogeneous
population of subjects is randomly assigned to two groups, one that will
receive the test intervention and the other that will receive a placebo or
standard-of-care intervention. The two groups are followed prospec-
tively for the outcome of the two interventions. The decided advantage
of RCTs over all other study designs is the investigators’ ability to
control multiple aspects of the trial prospectively, thereby decreasing
bias and offering the greatest opportunity to arrive at a valid and
conclusive answer to a research question.5, 10, 15 In discussions of causa-
tion, the cause is usually harmful, and the outcome is usually a disease.
Initially, only descriptive data are available describing a possible harmful
cause and effect. Even though the cause-and-effect assumption may be
weak, most ethicists and clinicians would not wish to move directly to
an RCT in which the investigator purposefully administers a possibly
harmful event to determine if it really is harmful. For questions of
causation, the initial information to promote the hypothesis and prove
an association between an event and an outcome should be gained
through observational studies. This data gathering can usually be per-
formed more efficiently and cost effectively in case-control or cross-
sectional trials, in which the subjects can be examined at one point in
time. If several trials indicate that sufficient association exists and the
health care impact is judged appropriate for further time and monies to
be expended, several longitudinal, cohort trials could be undertaken.
These studies do not occur in a vacuum, and it is likely that many
investigators are examining the same issue in the laboratory and clini-
cally. As a significant body of evidence mounts that a causal association
exists, RCTs can be used to evaluate treatments that will modify the
harmful cause or risk factor, thereby altering the disease process or
curing it.
Bias in Research and Causation Studies
In any research, the validity of the conclusions is negatively affected
by bias. One major source of bias in observational studies is the difficulty
in assuring a homogeneous study population. In observational studies,
subjects are usually self-selected in that they experienced an exposure,
have an inherent risk factor, or have specific lifestyle behaviors. The
investigators must attempt to quantify the other characteristics of the
study subjects and to select a comparison population with characteristics
as similar to the exposed population as possible. With such ex post facto
population selection, bias can easily occur. Given the intricacies of the
human body, unknown patient characteristics may affect the outcome.
Also, many patients engage in self-directed interventions whereby they
wittingly or unwittingly alter the exposure or outcome. These confound-
ing interventions may affect the disease outcome more than the risk
factors being assessed. Because the investigator does not know about
these confounding entities, they cannot be measured. Nor will the un-
known characteristics be uniformly distributed in both comparison
groups. Randomized, controlled trials control for these unknown, con-
founding characteristics by selecting a large homogeneous population
before rendering any intervention. The large population is then ran-
domly divided into the two comparison groups. The random assignment
of the subjects allows equal assignment of the known and unknown
characteristics into both study groups, thereby creating two homogeneous
populations.
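The balancing effect of random assignment can be demonstrated with a small simulation. This is a sketch under stated assumptions: the population size, the 30% prevalence of an unmeasured characteristic, the seeds, and the function names are all hypothetical and serve only to illustrate the principle.

```python
import random


def randomize(subject_ids, seed=0):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Simulated population: each subject carries a characteristic that the
# investigator does not know about (assumed prevalence of 30%).
population_rng = random.Random(42)
n_subjects = 10_000
hidden_trait = {i: population_rng.random() < 0.30 for i in range(n_subjects)}

test_group, control_group = randomize(range(n_subjects), seed=1)


def prevalence(group):
    """Proportion of a group carrying the hidden characteristic."""
    return sum(hidden_trait[i] for i in group) / len(group)


print(f"test group:    {prevalence(test_group):.3f}")
print(f"control group: {prevalence(control_group):.3f}")
```

With a large population, the hidden characteristic appears in nearly the same proportion of both groups even though the assignment never used it. In small trials, chance imbalances remain possible.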
Other biases can occur in observational studies because of the retro-
spective nature of the data gathering. Investigators must rely on patients
to give valid answers on questionnaires and rely on the completeness of
medical and dental records to acquire information about a subject’s
health and exposure status. One cross-sectional study to evaluate fluorosis
in a school population required that the parents complete a
questionnaire. Forty-five percent of the questionnaires were not accepted
because of invalid responses.3 Without standard treatment protocols and
documentation protocols, difficulties can arise from omission of data or
ambiguous interpretation of data to fit a research question. A study
that evaluated temporomandibular complications was undertaken as a
concurrent evaluation of the efficacy of two orthognathic surgical fixa-
tion techniques. The surgeons and other investigators evaluated the
temporomandibular complications. The surgeons recorded their data in
the patients’ charts as part of the treatment record, whereas the trained
temporomandibular examiners documented their data on standardized
research forms. It was apparent that the surgeons focused more on
efficacy of surgery than on secondary temporomandibular outcomes. In
many instances, the surgeons failed to document the temporomandibular
findings of pain, oral opening, crepitus, locking, or clicking of the joint.
The approximate differences in documentation and disagreement of the
various findings between the surgeons and the trained investigators
ranged from 20% to 65% for each parameter over the various measure-
ment periods.16
Magnitude of the Risk
When the magnitude of risk is great, risk is not easily masked by
bias. Even though the original studies in New York on water fluoridation
did not use randomized populations or populations selected for like
characteristics, the magnitude of the effect in reducing caries was so
great that the causal association was accepted. Further studies and
examination of the key elements related to causation established the
cause-and-effect relationship of fluoride and decrease in caries incidence.
When causes are multifactorial, have a long latency period before the
effect is demonstrated, and the physiology is complex, as in the risk
factors for coronary heart disease, a small increase in risk is not readily
observed. When a characteristic has a small influence on the outcome,
more subjects are required to demonstrate that influence. The more
subjects in a study and the longer the duration of a study, the more
likely it is that the study will be tainted by bias, and the more equivocal
the conclusions become. Such was the case of smoking and lung cancer.
Multiple studies, conducted over many years and from many countries,
were necessary to establish this causal relationship by proving a large
risk of lung cancer among smokers, and proving both a temporal rela-
tionship and a dose-response gradient.
SUMMARY
The best research method for assessing therapeutic modalities is the
RCT. The prospective nature and the randomization of the subjects in
an RCT provide the greatest opportunity to control bias and offer the
most valid answer to the clinical question. Observational studies gener-
ate hypotheses about causation and should be viewed as a first step in
the continuum of health care delivery. The preponderance of evidence
will mount as the hypotheses are tested by additional prospective, longi-
tudinal, observational trials. The clinician’s involvement is to design and
implement therapeutic strategies to alter the causal exposure, intervene
in the dose-response gradient, and block the pathophysiologic mecha-
nisms. Dentistry is an art and a science. Moving through the continuum
from causation hypothesis to therapeutic intervention is the science of
dentistry. It is the science of dentistry that will change the scope of the
profession in this millennium.
References
1. Arbes SJ Jr, Slade GD, Beck JD: Association between extent of periodontal attachment
loss and self-reported history of heart attack: An analysis of NHANES III data. J Dent
Res 78:1777–1982, 1999
2. Ast DB, Schlesinger EF: The conclusion of a 10-year study of water fluoridation. Am J
Pub Health 46:265–271, 1956
3. Brothwell DJ, Limeback H: Fluorosis risk in grade 2 students residing in a rural area
with widely varying natural fluoride. Community Dent Oral Epidemiol 27:130–136,
1999
4. DeStefano F, Anda RF, Kahn HS, et al: Dental disease and risk of coronary heart
disease and mortality. BMJ 306:688–691, 1993
5. Fletcher RH, Fletcher S, Wagner EH: Clinical Epidemiology: The Essentials, ed 3.
Baltimore, Williams & Wilkins, 1996
6. Friedman GD: Primer of Epidemiology, ed 4. New York, McGraw-Hill, 1994
7. Gordis L: Epidemiology. Philadelphia, WB Saunders, 1996
8. Hujoel PP, Drangsholt M, Spiekerman C, et al: Periodontal disease and coronary heart
disease risk. JAMA 284:1406–1410, 2000
9. Hulley SB, Cummings SR: Designing Clinical Research. An Epidemiologic Research,
ed 2. Baltimore, Lippincott Williams & Wilkins, 2001
evidence’’ in answering clinical dental questions. J Prosth Dent 83:137–152, 2000
11. Lemke CW, Doherty JM, Arra MC: Controlled fluoridation: The dental effects of
discontinuation in Antigo, Wisconsin. J Am Dent Assoc. 80:782–786, 1970
12. Loesche WJW, Schork A, Terpenning MS, et al: Assessing the relationship between
dental disease and coronary heart disease in elderly U.S. Veterans. J Am Dent Assoc
129:301–311, 1998
13. Mattila KJ, Askikainen S, Wolf J, et al: Age, dental infections, and coronary heart
disease. J Dent Res 79:756–760, 2000
14. US Public Health Services, 1942, pp 1155–1179, Public Health Report 57
15. Sackett DL, Haynes RB, Guyatt GH, et al: Clinical Epidemiology. A Basic Science for
Clinical Medicine, ed 2. Boston, Little, Brown and Co, 1991
16. Scott BA, Clark GM, Hatch JP, et al: Comparing prospective and retrospective evaluations
of temporomandibular disorders after orthognathic surgery. J Am Dent Assoc
28:999–1003, 1997

MD Anderson Cancer Center
1515 Holcombe Boulevard, Box 0441
Houston, TX 77030
e-mail: rjacob@mail.mdanderson.org
USERS’ GUIDE TO THE DENTAL LITERATURE
How to Use an Article about Prognosis
Patrick M. Lloyd, DDS, MS
Today’s prosthodontic treatments are some of the most sophisticated
the profession has ever been able to offer. State-of-the-art materials
simulate the color, texture, pliability, and wear of human tissues to near
perfection. Techniques have been refined, dramatically reducing the
time required for sophisticated procedures. With proper planning and
sequencing, function and appearance can be restored to such a high
level that the artifice is imperceptible even to the most critical patient.
Coincident with the availability of advanced forms of therapy, pa-
tients have also become more sophisticated and present with specific
requests and desires. Because many patients are financially secure, they
are able to support the costs associated with extensive prosthodontic
treatments. These are patients who, after reviewing the evidence, are
most willing to make the investment of time and resources to achieve a
particular outcome. They have high expectations and will not be satisfied
by results that fall short of predicted outcomes. The demand for author-
ity to support a particular intervention has become increasingly common
among prosthodontic patients whose treatments are either fully or par-
tially the responsibility of third-party providers. Because third-party
providers oversee the care of thousands (sometimes hundreds of thousands)
of persons, they are likely to demand an even greater degree of
proof before they support a given intervention.
From the Department of Family Dentistry, The University of Iowa College of Dentistry,
Iowa City, Iowa

With the growing need for documented efficacy of treatment and
efficiency of rendering care, prosthodontists will serve their patients best
when they fully understand the intricacies of clinical research and the
results reported. This article proposes a structure for evaluating the
literature that pertains to prognosis—the prediction of outcomes and the
frequency of such occurrences (see box).
Users’ Guide for Evaluating an Article About Prognosis

1. Are the results of the study valid?
• Primary guides
Was there a representative and well-defined sample of patients
at a similar point in the course of the disease?
Was follow-up sufficiently long and complete?
• Secondary guides
Were objective and unbiased outcome criteria used?
Was there adjustment for important prognostic factors?
2. What are the results?
• How large is the likelihood of the outcome events in a specified period
of time?
• How precise are the estimates of likelihood?
3. Will the results help a clinician care for patients?
• Are the study patients similar to those in the clinician’s practice?
• Will the results lead directly to selecting or avoiding therapy?
• Are the results useful for reassuring or counseling patients?
From Jacob R, Lloyd P: How to evaluate a dental article about harm. J Prosthet
Dent 84:8–16, 2000; with permission.
It will help practitioners develop the ability to judge whether the
results of an investigation are valid, to interpret the results, and to
determine whether the analysis of the results is relevant to their practice.
A hypothetical case is given here for discussion.
CLINICAL SITUATION
The first patient of the day is a 52-year-old woman with an unremarkable
health history. She has been referred by a general dental
practitioner for evaluation and possible treatment of a missing mandibu-
lar left first molar. About 15 years earlier, tooth #19 was restored with a
multiple-surface, intracoronal silver amalgam restoration. About 3
months ago, the patient bit into a hard piece of bread, separating the
lingual surface of the tooth and resulting in an immediate sharp pain
that persisted for 2 days. Her general dentist diagnosed it as a vertical
root fracture and recommended extraction. The patient has excellent oral
hygiene, a class I Angle’s malocclusion bilaterally, no mucogingival
defects, and an extensively restored posterior dentition (with silver amalgam
as the predominant restorative material). Her third molars are the
only other missing teeth.
Her chief concern is whether a dental prosthesis should be fabri-
cated to replace her missing molar tooth. Her general dentist has told
her that if the edentulous space is left untreated, it will lead to future
problems, the most significant of which would be drifting and shifting
of the adjacent and opposing teeth. Such tooth movement, the dentist
said, often results in severe occlusal disharmony, limiting a patient’s
ability to eat comfortably and, because of the concomitant gingival and
periodontal complications, ultimately leading to the demise of other
teeth. At present, the patient does not find the toothless site to be an
esthetic problem. She reports having slightly modified her chewing
pattern, eating more on her right side than her left since the trauma to
tooth #19.
The prosthodontic specialist informs the patient that before treat-
ment options can be considered it is necessary to make diagnostic casts
and to test the vitality and physical condition of the teeth surrounding
the edentulous space. A relevant article reporting a study of the conse-
quences of not replacing a missing posterior tooth has been published
recently in a national dental journal. The specialist promises to share the
results presented in that article with the patient at her next visit so that
she can make an informed decision.
After spending almost an hour rummaging through a stack of
journals later that day, the practitioner finally locates the article. Its title
seems to fit the patient’s condition perfectly: ‘‘The consequences of not
replacing a missing posterior tooth,’’ by Shugars et al.6 Because the
specialist has read it once, a few months ago, he plans to review it again
in more detail before the patient’s next appointment.
STUDY DESIGNS THAT YIELD INSIGHT AND IDENTIFY PROGNOSIS FACTORS
To advise the patient with the greatest degree of confidence, it is
desirable to have the results of a clinical study that follows over time a
large population of patients who are in every way similar to this patient.
The subjects of the study would have the same condition (missing a
mandibular first molar) and would be comparable in all other domains
(e.g., age, gender, oral hygiene, periodontal support, classification of
malocclusion). Such a study design would allow observation of the
natural history or clinical course of the condition. It would be possible
to monitor the status of anatomic, physiologic, and psychologic condi-
tions that have been reported to occur as a consequence of no treatment.
Ultimately, the clinician would have definitive insight to share with the
patient and could feel secure in advising her.
This type of study would provide the information the reader de-
sired, but it is unlikely to be undertaken for many reasons. First are
considerations of cost and time. To assemble such a pool of patients
would require innumerable resources: hundreds of calibrated examiners
and clinical facilities that could accommodate tens of thousands of
subjects. Identical follow-up treatment would have to be provided to
each subject (e.g., the same period for prophylaxis, operative treatments,
and other, more complicated procedures). To assure that there were no
influences from other health conditions, it would be necessary to remove
patients from the study who developed illnesses or were prescribed
medications. Ultimately, the initial population might be reduced to too
small a group to make a conclusive assessment. Many years would be
required to collect the data necessary to allow advice to be given with
confidence.
A cohort study design offers a more realistic approach for exposing
the risks associated with certain conditions. Patients in a cohort study
would have the same condition (missing a mandibular first molar) but
would be different in ways previously reported to influence the outcome
(e.g., type of malocclusion, periodontal status, other tooth loss). Subjects
would be grouped according to these prognostic factors and followed
over time. Data collected on other conditions that arise during the course
of the study would allow additional analysis to expose other factors
that contribute to the negative outcomes. Absolute risks could be
calculated so that the patient could be offered probabilities on the
outcomes associated with not treating her condition.
The case-control study design is even more practical from both a
resource and a time perspective but is extremely prone to bias. In a case-
control study, subjects with the condition who have experienced the
negative outcome (periodontal destruction, additional tooth loss) are
compared with subjects who have not. Because subjects, cases, and
controls are selected after the event has or should have occurred, there
is tremendous potential bias. Investigators, because they must examine
subjects to determine their appropriateness for the study, cannot be
blinded during the selection process. The population from which sub-
jects are drawn (e.g., a convenience sample from a dental college) further
contributes to bias. Bias is compounded by the inherent shortcomings of
a retrospective study design, substantially reducing the confidence that
clinicians can realistically derive from such a study. Also, because
case-control studies do not follow subjects over time, only estimates of
relative risk (odds ratios) can be calculated. In spite of these
deficiencies, skillfully planned and tightly
monitored case-control studies can play a significant role in patient
care, especially when the outcome under consideration is infrequently
detected or the time needed to observe the outcome is excessively long.
(For example, mesial drifting of teeth posterior to the edentulous space
has been reported to take several years.)
ARE THE RESULTS OF THE STUDY VALID?
Primary Guides
Was There a Representative and Well-defined Sample of Patients at a Similar Point in the Course of the Disease?
The validity of the conclusions drawn by investigators from their
work should be judged on how well the population is defined. Are the
criteria for patient selection well defined and appropriate? Is the data-
base adequate to determine whether the study group represents the total
population of patients at risk for the negative outcome? Shugars et al
studied patients from a large group-model health maintenance organization
who had a first molar or second premolar extracted, were 18 years
of age or older, and were enrolled in the program for at least 8 years.7
The potential for introducing bias into an investigation during the
selection of subjects is quite strong. Questions that should be asked in-
clude:
Did all types of patients have an equal probability of being selected?
Were some patients filtered out because of coexisting conditions?
What measures were taken to ensure that patients represented a broad
cross-section of the population (e.g., age, sex, geographic origin,
socioeconomic status)?
In judging whether a study on prognosis is valid, it is also important
to make sure that all patients entering the investigation are at a similar,
well-defined point in the course of their condition. The investigators
should describe, as specifically as possible and using discipline terminol-
ogy, the stage patients must be in to be included in the study. Shugars
et al decided to enroll patients if there was a radiograph of the adjacent
and opposing teeth within 6 months before or after the extraction.7
Was Follow-up Sufficiently Long and Complete?

Even if there is a true association between a prognostic factor and
an outcome of interest, it may take an extended period of time before
the connection becomes evident. The chronic nature of most dental
diseases and their delayed sequelae call for a rather long and protracted
observation phase to confirm or deny a relationship. For instance, the
loss of additional teeth as a consequence of not replacing a missing
posterior tooth may take several years to occur. Patients should be
examined at regular intervals over a sufficiently long period to judge
whether such an outcome is related to a particular prognostic factor.
Investigators studying the natural history or clinical course of a
disease process or clinical condition are compelled to maintain contact
with individual patients in their study populations. Because of myriad
circumstances, most of which might have little effect on the results of
the study, many patients may be unavailable for follow-up. They may
fail to return for a recall examination because of a family relocation, a
loss of interest in the study, an unrelated debilitating illness, or because
the condition under scrutiny has bothered them enough that they elect
to seek treatment (e.g., placement of a fixed partial denture). The greater
the number of patients lost to follow-up, for whatever reason, the less
confidence can be placed in estimates of true risk for a given adverse
outcome.
The effect of patients who are lost to follow-up depends on the size
of the population being studied and the rate of risk for the outcome
event. For example, if 50 patients in a study population of 1000 were
not available for recall, and the calculated risk of the outcome event for
those patients examined was 25%, the worst-case scenario (i.e., all 50
experienced the event) would be a rate of about 29%. Although this effect may
be of statistical importance, it would be unlikely to be of clinical import.
If, however, the calculated risk were only 1%, the worst-case scenario
would be 6%, an outcome with substantially different clinical implica-
tions.
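The worst-case arithmetic can be checked with a few lines of code. This is a minimal sketch, assuming that the reported risk applies to the 950 patients actually examined and that every patient lost to follow-up experienced the event; the function name is hypothetical. Under that assumption the two scenarios work out to 28.75% and 5.95%, close to the rounded figures quoted in the text.

```python
def worst_case_rate(n_enrolled, n_lost, observed_risk):
    """Event rate if every patient lost to follow-up had the event.

    observed_risk is the risk among the patients who were examined
    (an assumption about how the reported figure was calculated).
    """
    n_examined = n_enrolled - n_lost
    events_observed = observed_risk * n_examined
    return (events_observed + n_lost) / n_enrolled


for observed in (0.25, 0.01):
    worst = worst_case_rate(1000, 50, observed)
    inflation = worst / observed
    print(
        f"observed risk {observed:.0%}: worst case {worst:.2%} "
        f"({inflation:.1f} times the observed risk)"
    )
```

The same 50 missing patients raise a 25% risk by a factor of about 1.2 but raise a 1% risk by a factor of about 6, which is why losses to follow-up matter most when the outcome is rare.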
To lessen the impact of patients lost to follow-up on a study’s
validity, investigators need to report the reasons for unavailability. Each
unavailable patient should be individually counted and identified. In
addition, a comparison should be made, using a multitude of demo-
graphic parameters and clinical conditions, between those for whom a
complete set of follow-up data was collected and those with partial data
on follow-up. Such reporting and analysis increase the confidence that
can be placed on the conclusions made.
This type of analysis, comparing the demographics of respondents
and nonrespondents, was done by Haselton et al1 in a retrospective
study of the clinical performance of high-strength all-ceramic crowns
over a 3-year period. They showed that the age range, gender distribu-
tion, number of ceramic crowns received, and the type of ceramic resto-
ration were comparable between the two groups, allowing readers to
place more confidence in conclusions they drew from the patient base
available for examination.
WHAT ARE THE RESULTS?
How Large is the Likelihood of the Outcome Event Over a Specified Period of Time?
Of all the questions patients pose, none is more frequent than ‘‘How
long will it last?’’ or, in the case of predicting risk, ‘‘What are the odds
that it will happen to me?’’ To satisfy the sophisticated patient whose
decision whether to be treated may depend on the response to this
question, the practitioner should consider crafting an answer that will
address the issue even more completely than the patient expects.
One could first offer a prediction based on absolute prevalence
rates—the percent of likelihood that a particular event will occur at
some time in the future. In the article by Shugars et al6, 12% of the
patients who did not receive treatment for a missing posterior tooth lost
an additional adjacent tooth. The median time was 2.5 years, with a
range of 0.9 to 6.7 years. An additional 13% experienced a tilting of the
teeth adjacent to the edentulous space by a distance greater than 2.0
mm, with a median time of 6.9 years and a range of 1.1 to 9.6 years.
A second-level response would be to advise the patient of the
relative likelihood that she will experience the outcome. This response
would involve calculating the relative risk that the event (additional
tooth loss) would occur during a specified period of time if no treatment
were rendered. In a related article on the same cohort of patients,
Shugars et al7 reported the status of teeth adjacent to a bound edentulous
space for patients who received no treatment and for patients for whom
a fixed partial denture was fabricated. There was a 13% failure rate (e.g.,
an additional tooth was lost adjacent to the edentulous space) for the
untreated group of patients and a 7% failure rate for those who received
a fixed partial denture. These rates demonstrate a relative risk of 1.86.
In other words, patients in this study were 1.86 times as likely to lose a
tooth adjacent to an edentulous space if they received no treatment than
if a fixed partial denture were constructed.
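The relative risk quoted above follows directly from the two failure rates. The short sketch below reproduces the calculation; the function and variable names are illustrative only, and the absolute difference in risk is added as a companion figure that the article itself does not report.

```python
def relative_risk(risk_untreated, risk_treated):
    """Ratio of the risk in the untreated group to that in the treated group."""
    return risk_untreated / risk_treated


risk_untreated = 0.13  # failure rate with no treatment (from the text)
risk_treated = 0.07    # failure rate with a fixed partial denture (from the text)

rr = relative_risk(risk_untreated, risk_treated)
risk_difference = risk_untreated - risk_treated

print(f"relative risk: {rr:.2f}")
print(f"absolute difference in risk: {risk_difference:.0%}")
```

Presenting both figures can help a patient see that a risk nearly twice as high still corresponds to a difference of only a few percentage points.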
Finally, to provide the patient with a perspective on the rate at
which the event is likely to occur over time (more often than not, there
is significant variation), one could provide information gleaned from a
survival curve. These graphic representations depict what occurs over
the course of time and yield information of potentially great value to
the patient. McLaren and White, in a report on the survival rates of In-
Ceram (In-Ceram, Vident, Brea, California) crowns, used multiple survival
graphs to show the rate of failure in each successive month (Fig 1).3a
Figure 1. Reasons for loss of service of In-Ceram crowns followed for 36 months. (From
McLaren E, White S: Survival of In-Ceram crowns in a private practice: A prospective
clinical trial. J Prosthet Dent 83:216–222, 2000; with permission.)
In addition, to help practitioners and patients further, they categorized
their data to identify specific reasons for failure.
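For readers curious about how a survival curve is built, the following sketch computes a Kaplan-Meier estimate, one common method for such curves. It is not the analysis used by McLaren and White: the follow-up times and outcomes below are invented for ten hypothetical crowns, and the function name is an assumption made for illustration. Crowns still in service at their last examination are treated as censored.

```python
def kaplan_meier(durations, failed):
    """Return [(time, survival probability)] at each time with a failure.

    durations: follow-up time for each unit (months here).
    failed:    True if the unit failed at that time, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        failures = sum(1 for d, f in zip(durations, failed) if d == t and f)
        censored = sum(1 for d, f in zip(durations, failed) if d == t and not f)
        if failures:
            # Multiply by the fraction of units at risk that survive time t.
            survival *= (at_risk - failures) / at_risk
            curve.append((t, survival))
        at_risk -= failures + censored
    return curve


# Hypothetical data: months of follow-up and whether each crown failed.
months = [6, 12, 12, 18, 24, 30, 36, 36, 36, 36]
failed = [True, False, True, True, False, True, False, False, False, False]

curve = kaplan_meier(months, failed)
for month, probability in curve:
    print(f"month {month:2d}: estimated survival {probability:.2f}")
```

Plotting these points as a step function gives the familiar descending staircase, from which a patient can read the estimated chance that a restoration is still in service at any given month.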
How Precise are the Estimates of Likelihood?
Even with the best of intentions and systematic planning, the popu-
lation selected for study will always be a sampling of the population as
a whole. From the data collected, the relative risk for a particular event
can be calculated, but the value will be only an estimate. To show the
precision of this estimate, confidence intervals (CIs) are used. Confidence
intervals help clinicians decide the range within which they can be
confident of the relative risk estimate.2 Norderyd et al, reporting on the
risk of periodontal disease in a Swedish adult population, found that
age is correlated with severe periodontal disease progression.4 Because
this was a case-control study, their calculated risks were expressed as
odds ratios, with a value of 1.05 for the age correlation. The CI was 1.02
to 1.07, a rather tight range and one indicating that the calculated risk
is quite precise.
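The link between sample size and the width of a confidence interval can be seen in a small calculation. The sketch below uses the standard logit (Woolf) approximation for the 95% CI of an odds ratio from a two-by-two table. The counts are invented for illustration and are not taken from Norderyd et al, whose estimate may well have been obtained by a different method.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI from a two-by-two table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper


# Hypothetical counts: the second study is ten times larger
# but has the same proportions in every cell.
small_study = odds_ratio_ci(40, 60, 20, 80)
large_study = odds_ratio_ci(400, 600, 200, 800)

for label, (estimate, lower, upper) in (("small", small_study), ("large", large_study)):
    print(f"{label} study: OR {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Both hypothetical studies give the same odds ratio, but the larger one yields a much narrower interval, which is what is meant by a more precise estimate.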
WILL THE RESULTS HELP THE PRACTITIONERS IN CARING FOR PATIENTS?
Are the Study Patients Similar to Those in the Practitioner’s Practice?
Regardless of the steps taken to minimize bias, to standardize measurement,
and to adjust for differences, a study on prognosis has limited
application to one’s practice if the patients under consideration are
unlike those one treats from day to day. An adequate base of information
on the demographics and clinical conditions for patients used in the
study should be reported so that one can judge the level of comparabil-
ity. Characteristics to consider include age, socioeconomic status, pat-
terns of tooth loss, and medication profile—virtually any characteristic
that distinguishes the patient population in a particular practice.
The description of the patients involved in the article by Shugars et
al6, albeit brief, might be adequate to judge their similarity to the patients
a practitioner treats. The article reports the gender distribution (51%
female) and the mean age and range of the population (45.5 years, with
a range of 24 to 90 years). All subjects were also enrolled in a large
group-model health maintenance organization in Portland, Oregon.
These data, although limited, do offer some insight into the comparabil-
ity between patients in this study and in one’s practice.
Will the Results Lead Directly to Selecting or Avoiding Therapy?
It is highly unlikely that the results of any clinical investigation,
whether it deals with prognosis or therapy, will be directly applicable
to one’s practice. The myriad factors that influence subject selection,
measuring, and follow-up protocol should be critically examined to
determine what insight, if any, can be applied to a particular clinical
situation. Relevance is not and should not be considered an all-or-none
situation. Nearly every article contains some evidence that, when used
properly, can help support or refute a decision whether to intervene.
The article reviewed here6 reported a 13% rate of clinically significant
tilting (>2.0 mm) of the teeth adjacent to the edentulous space and
some loss of alveolar bone around the involved teeth; 12% of the patients
who did not receive treatment for a missing posterior tooth lost an
additional adjacent tooth. In sharing this information with a patient, a
practitioner would be obligated to inform the patient how the subjects
were selected and what characteristics could potentially raise or lower
that rate, given the patient’s unique set of conditions.
Are the Results Useful for Reassuring or Counseling Patients?
For the hypothetical patient discussed in this article, there is evidence
that she may not suffer significantly if her condition is not treated
immediately. Although there is a risk that not replacing her missing
posterior tooth will cause her harm, the risk is apparently less than
reported by other investigators and clinicians. To help the practitioner
be more confident about the counsel he or she provides, the practitioner
may want to reread sections of occlusion, fixed prosthodontic, and ortho-
dontic texts. These texts may offer additional theory on how to manage
the condition and what other factors should be monitored to ensure that
an intervention is both appropriate and timely.
ACKNOWLEDGMENT
The author expresses appreciation to Anita Makuluni for her insights and editorial
comments.
References
1. Haselton D, Diaz-Arnold A, Hillis S: Clinical assessment of high-strength all-ceramic
crowns. J Prosthet Dent 83:396–401, 2000
2. Jacob R, Lloyd P: How to evaluate a dental article about harm. J Prosthet Dent
84:8–16, 2000
3. Laupacis A, Wells G, Richardson S, et al: Users’ guides to the medical literature V.
How to use an article about prognosis. JAMA 272:234–237, 1994
3a. McLaren E, White S: Survival of In-Ceram crowns in a private practice: A prospective
clinical trial. J Prosthet Dent 83:216–222, 2000
4. Norderyd O, Hugoson N, Grusovin G: Risk of severe periodontal disease in a Swedish
adult population. J Clin Periodontol 26:608–615, 1999
5. Sackett D, Haynes R, Guyatt G, et al: Clinical epidemiology: A basic science for clinical
medicine, ed 2. Boston, Little, Brown and Co; 1991, pp 173–185
6. Shugars D, Bader J, Phillips W, et al: The consequences of not replacing a missing
posterior tooth. J Am Dent Assoc 131:1317–1323, 2000
7. Shugars D, Bader J, White A, et al: Survival rates of teeth adjacent to treated and
untreated posterior bounded edentulous spaces. J Am Dent Assoc 129:1089–1095, 1998

Patrick M. Lloyd, DDS, MS
Department of Family Dentistry,
The University of Iowa College of Dentistry
S313 DSB
Iowa City, IA 52242
e-mail: patrick-lloyd@uiowa.edu
BIOSTATISTICAL CONSULTATION
FOR DENTAL RESEARCH
Jonathan Clive, PhD
What do dentists need to know about statistics, and why do they
need to know it?
This article suggests some reasonable and convincing answers to
these questions. To focus the discussion, dental health care providers are
considered as either practitioners and specialists who see patients daily
but who do not perform scientific research (PR), or as dental researchers
(DR) who may see patients or students but are also actively engaged in
research.
In this article, it is generally assumed that dental researchers’ activi-
ties involve the acquisition and evaluation of some kind of data. The
term data simply refers to a description, numeric or otherwise, of the
attributes of the experimental units (patients, laboratory animals, teeth,
periodontal tissue, and so forth) being considered. These may be as
basic as the number of decayed, missing, and filled teeth (DMFT), or the
number of decayed, missing, and filled surfaces (DMFS), gingival index,
or some more specialized and exotic measure, such as the number of
cells of a particular type per unit volume.
This article does not provide a crash course in statistics; in fact, no
specific statistics lessons are offered here (although some specific exam-
ples are provided). It would be counterproductive to attempt to cover
in this limited space subject matter that most introductory statistics texts
require several hundred pages to present. Instead, the author discusses
related general concepts that he believes are crucial for both dental
researchers and dental practitioners to understand before beginning the
statistical consultation process.
From the Office of Biostatistical Consultation, University of Connecticut Health Center,
Farmington, Connecticut

STATISTICAL NEEDS OF DENTAL PRACTITIONERS

Although dental practitioners may not possess formal statistical
experience or training, they may frequently use several terms that com-
prise a basic statistics vocabulary. These terms may carry associations or
interpretations that are intuitively understood. Perhaps the terms most
often used in this fashion are mean and average. Given a set of numbers
(which can be assumed to represent data acquired during the course of
some research endeavor), the mean is often interpreted as the most typical
or most representative single value describing all the numbers. Although
there is some justification for this view, the term has a more rigorous
definition. To determine the mean of a set of data, one sums the data
values and divides by the number of observations. The mean represents
the center of gravity of a set of numbers, the value around which all
other numbers are distributed.
The standard deviation (or, equivalently, the variance, which is the
square of the standard deviation) is another basic summary attribute of
data that has a relatively straightforward meaning. It describes the
spread or dispersion of the numbers in the dataset around the mean. The
larger the standard deviation, the greater the variation, or heterogeneity,
among observations. These two measures, the mean and the standard
deviation, arise naturally in the logical study of data. Outlining this
development is helpful in understanding means and standard deviations
and in obtaining an overview of statistical procedures.
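The definitions of the mean, variance, and standard deviation can be made concrete with a short calculation. The sketch below uses eight invented DMFT scores and computes each quantity by hand and with Python's standard `statistics` module. It assumes the population form of the variance (dividing by the number of observations); many texts divide by one less than that number when the data are a sample, which gives a slightly larger value.

```python
import math
import statistics

# Hypothetical DMFT scores for eight patients.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Mean: sum the data values and divide by the number of observations.
mean = sum(scores) / len(scores)

# Variance: average squared distance from the mean (population form).
variance = sum((x - mean) ** 2 for x in scores) / len(scores)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance}, standard deviation = {std_dev}")

# The standard library gives the same results.
library_mean = statistics.mean(scores)
library_std_dev = statistics.pstdev(scores)
```

For these scores the mean is 5 and the standard deviation is 2, so most observations lie within a couple of units of the center of the data.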
To do so, it is useful to consider a common graphic portrayal of
data, as illustrated in Figure 1A. This figure shows a histogram, which
illustrates the distribution of values for some variable, in a sample
assessed from some hypothetic population. The distribution of values
encompasses the values that occur in the data and the frequency with
which they occur. To generate a histogram, the range of the data (small-
est and largest values among the observations) is partitioned into succes-
sive intervals.
A set of DMFS scores, for example, might range from 0 to 40.
Intervals of DMFS could be designated as 0–5, 6–10, 11–15, and so on up to
36–40. The number of observations falling in each of these intervals is
then tallied, and a bar graph is generated. The height of the bar for each
interval is proportional to the frequency or the number of observations
falling within that interval. The resulting plot is called a histogram, and
it, too, is an intuitively and easily understood representation of data.
(Note that the width of the intervals must be constant so that the
comparison of frequencies is meaningful.)
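The tallying step described above can be sketched in a few lines of Python. The scores are hypothetical, and the binning convention (equal-width intervals that include their lower edge, with the top value placed in the last interval) is an assumption made for this illustration rather than a rule taken from the text.

```python
def histogram_counts(values, low, high, width):
    """Tally values into equal-width intervals covering [low, high]."""
    n_bins = (high - low) // width
    counts = [0] * n_bins
    for v in values:
        # The largest value (v == high) is placed in the last interval.
        index = min(int((v - low) // width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical DMFS scores ranging from 0 to 40.
scores = [0, 3, 5, 7, 8, 12, 12, 14, 18, 21, 22, 27, 33, 40]
width = 5
counts = histogram_counts(scores, 0, 40, width)

# Text rendering of the histogram: one bar per interval.
for i, count in enumerate(counts):
    lower = i * width
    print(f"{lower:2d}-{lower + width:2d} | {'#' * count}")
```

Because every interval has the same width, the bar lengths can be compared directly as frequencies.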
BASICS 1: HISTOGRAMS TO BELL-SHAPED CURVES

Although useful and informative, the histogram represents little
more than a basic tool for exploratory data analysis and is purely
descriptive. Statisticians and mathematicians who were not satisfied
with such heuristic descriptions carried the idea further, asking what
happens when a histogram presents all the information possibly
available. This limit can be approached, it was suggested, by making the
intervals more and more narrow as the sample size (the number of
observations or data points) becomes larger and larger. As successive
histograms are generated under these circumstances, their appearance
increasingly resembles a relatively smooth or continuous curve, in
contrast to a single histogram defined over a few intervals, which
resembles a conventional bar chart with adjacent bars.
Figure 1. A, A histogram can be used to graphically represent the distribution of values
contained in a sample of observations. Each bar corresponds to a range of values of the
measure being considered (X), and the height of the corresponding bar (Freq) is
proportional to the count or percentage of all observations falling into that category.
Histograms can be refined by designating more narrow intervals and increasing the number
of observations. B, A histogram may be modeled, or approximated, by an appropriate
mathematical function (see text). FREQ = frequency.
The generation of such a curve suggests that it might be possible to
use some type of mathematical function to characterize it. A mathemati-
cal representation would have several desirable features. First, it would
provide a succinct way of describing a distribution, namely, the
mathematical function describing the curve. Second, it would serve as a
basis for comparing distributions across different populations or groups.
Making such comparisons is a basic activity of statistical inference.
Using a well-defined mathematical function to describe or model a
histogram is illustrated in Figure 1B. Here, a curve has been
superimposed on the histogram shown in Figure 1A. The particular curve
used here is called a normal curve, or normal distribution. It is also well
known as the bell-shaped curve familiar to most scientists. The term
normal in normal distribution is used as a name, and derives from the
suggestion that the distribution of most attributes in the normal (here
used as an adjective) population is well represented by the type of curve
shown in Figure 1B.
In fact, the normal curve is in many cases an acceptable representa-
tion of a distribution, even when the observed histogram is skewed or
not symmetric or, in general, somewhat poorly behaved. Furthermore,
one of the most remarkable results from mathematical statistics shows
that in almost all situations, no matter what the underlying form of the
histogram of observations, the distribution of the means of repeated
samples drawn from the population under study tends to follow a normal
distribution, increasingly so as the sample size grows. The normal
distribution forms the core of much of statistical theory and practice.
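The behavior of sample means can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the skewed population (an exponential distribution with mean 1), the sample size of 30, the number of repeated samples, and the random seed are all arbitrary choices, not values from the text.

```python
import random
import statistics

random.seed(0)        # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30      # observations per sample (assumed)
N_SAMPLES = 2000      # number of repeated samples (assumed)

# Each observation comes from an exponential distribution with mean 1,
# which is strongly skewed to the right.
sample_means = []
for _ in range(N_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)

# For a normal distribution, about 68% of values lie within 1 SD of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / N_SAMPLES

print(f"mean of sample means = {center:.3f}")
print(f"SD of sample means   = {spread:.3f}")
print(f"fraction within 1 SD = {within_one_sd:.3f}")
```

Although the individual observations are far from bell shaped, the sample means cluster symmetrically around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.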
It is important to understand the distinction between a histogram
(Fig. 1A) and the normal curve (Fig. 1B). The histogram represents the
data, or the real world, whereas the normal curve is strictly a model,
an approximation generated by statisticians. A normal distribution is
uniquely characterized by its mean and standard deviation. These are
the same parameters so intuitively familiar to nonstatisticians. The mean
of a normal distribution reflects the center of gravity of the values or
observations and is often referred to as a measure of location, or a measure
of central tendency. The normal distribution peaks at the mean value; that
is, the maximum value of the curve along the y-axis occurs at the
location of the mean on the x-axis. The standard deviation indicates the
spread of the distribution around the mean; the larger the standard
deviation, the more flat or less spiked the normal curve.
BASICS 2: CONFIDENCE-INTERVAL ESTIMATES
The mean and standard deviation of a normal curve (or any set of
data, for that matter) are often presented in the scientific literature as
mean ± standard deviation. For example, an author might write that
‘‘the mean and standard deviation DMFS for the test group was 8.3 ±
4.5, whereas for the control group the mean and standard deviation
DMFS was 11.6 ± 5.2.’’ The symbol ‘‘±’’ means plus or minus and is
technically not correctly used in this context. The standard deviation is
a positive quantity. By convention, however, the use of this notation has
been almost universally adopted.
The best interpretation of a statement of the form mean ± standard
deviation is that approximately 68% of all observations lie within one
standard deviation of the mean. Approximately 95% of all observations
lie within two standard deviations of the mean. (These statements apply
to observations that are well modeled by a normal distribution.) The use
of this notation does provide the reader with the notion of an interval
in which most of the data are contained. A confidence interval represents
the natural extension of this notion of an interval estimate and is another
statistical concept commonly encountered in the scientific literature.
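The 68% and 95% figures quoted above can be checked directly from the normal distribution. The following sketch uses the `NormalDist` class from the Python standard library and then applies the one- and two-standard-deviation intervals to the test-group example (8.3 ± 4.5); it assumes, as the text does, that the observations are well modeled by a normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of a normal distribution within 1 and 2 SDs of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)
print(f"within 1 SD: {within_1_sd:.1%}, within 2 SD: {within_2_sd:.1%}")

# Test-group example from the text: mean 8.3, SD 4.5.
mean, sd = 8.3, 4.5
print(f"about 68% of scores between {mean - sd:.1f} and {mean + sd:.1f}")
print(f"about 95% of scores between {mean - 2 * sd:.1f} and {mean + 2 * sd:.1f}")
```

Note that the two-standard-deviation interval for this example extends below zero, which is impossible for a DMFS score; this is a reminder that the normal curve is a model and only approximates the data.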
The use of a mean ± standard deviation attempts to indicate both
the value of a mean and the precision with which the value was deter-
mined. Thus, one might simply note that ‘‘the estimated mean DMFS
was 8.3.’’ This statement can be extended to include a 95% confidence
interval: ‘‘The mean DMFS was 8.3, with a 95% confidence interval given
by (6.8, 9.8).’’ This statement means that one is 95% confident that the
interval from 6.8 to 9.8 inclusive contains the true mean DMFS for the
population under study. A 95% confidence interval given by (6.8, 9.8) is
much more precise than a 95% confidence interval given, for example,
by (4.8, 11.8).
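A confidence interval of this kind can be sketched as follows. The mean (8.3) and standard deviation (4.5) are taken from the examples above, but the sample size of 35 is an assumption introduced purely for illustration, because the text does not state one. The calculation uses the large-sample normal approximation; a consultant would typically use the t distribution for a sample of this size.

```python
from math import sqrt
from statistics import NormalDist

mean = 8.3   # estimated mean DMFS (from the text)
sd = 4.5     # standard deviation (from the test-group example)
n = 35       # ASSUMPTION: the sample size is not given in the text

# Multiplier for a two-sided 95% interval under the normal approximation.
z = NormalDist().inv_cdf(0.975)

# The half-width shrinks as the sample size grows.
half_width = z * sd / sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

With these assumed inputs the interval is close to the (6.8, 9.8) quoted above; a smaller sample with the same standard deviation would give a wider, less precise interval.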
Most scientific journals currently insist on the use of confidence
intervals beyond simple point estimates when discussing numerical
data. The difference between a simple point estimate and a confidence
interval is illustrated in the following statements: (1) ‘‘It is likely that the
restaurant is on 7th Avenue and 55th Street’’; and (2) ‘‘I am 95% confident
that the restaurant is on 7th Avenue between 52nd and 58th streets.’’
BASICS 3: COMPARING MEANS
If two distributions have identical means, one can assume that the
distributions of values of the variable being measured are identical in the
two groups from which the data were drawn. If two distributions have
similar means, the distributions are similar; finally, if two distributions
have different means, the distributions are different. Statistics provides
a way of estimating how probable it is that two or more means origi-
nated from the same underlying population. This probability is called
a ‘‘P-value’’ and is the last of the routinely invoked statistical terms
considered here.
P-values arise when two means are compared statistically. Thus,
one may report that ‘‘when the two means were compared using a t-test
of independent group means, it was observed that t on 44 degrees of
freedom was 4.55, P < 0.001.’’ The interpretation of this statement is as
follows: if there is really no difference between the groups being
observed, then the probability of observing a mean difference as extreme
as or more extreme than that observed is less than 1 in 1000.
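The reported P-value can be checked from the t statistic and its degrees of freedom alone. The sketch below is one possible way of doing so with only the Python standard library: it integrates the density of the t distribution numerically with Simpson's rule. The function names are hypothetical, and statistical software would normally supply this calculation directly.

```python
import math

def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t, df, steps=2000):
    """Two-sided P-value: area under the t density outside (-|t|, |t|)."""
    upper = abs(t)
    h = upper / steps
    # Simpson's rule on [0, |t|]; steps must be even.
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = 2 * (total * h / 3)   # area between -|t| and |t|
    return 1 - central_area

# Values quoted in the text: t = 4.55 on 44 degrees of freedom.
p = two_sided_p_value(4.55, 44)
print(f"P = {p:.6f}")
```

The result is well below 0.001, in agreement with the statement quoted above.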
Because the P-value in the example is less than 0.05, the difference
is said to be statistically significant. This cutoff point for statistical signifi-
cance (0.05) is rather arbitrary but has developed into a standard in the
scientific literature over time. The P-value indicates the strength of the
evidence against the hypothesis of no difference in means. (The perspec-
tive of no difference is used because doing so reflects how the theory of
statistical hypothesis testing developed.) Small P-values indicate that the
hypothesis of no difference is unlikely. Unlikely does not mean
impossible, however; therefore the researcher (or reader) must choose
between rejecting the hypothesis of no difference and accepting the
conclusion that the data represent the unlikely instance of a large
difference in sample means.
STATISTICAL NEEDS OF DENTAL PRACTITIONERS: GENERAL CONSIDERATIONS
Although dental practitioners generally do not need assistance in
designing and executing an experiment, there are circumstances in the
activities of daily practice for which some statistical insight is required.
For example, dental practitioners need to be able to keep up with the
scientific literature, studying and evaluating scientific reports appearing
in professional journals. Also, patients may ask questions such as, ‘‘Am
I at risk for losing my teeth?’’ or, ‘‘Am I at risk for periodontal disease?’’
or, ‘‘Am I at risk for oral cancer?’’
These are statistical questions. Risk is a statistical concept in epide-
miology, and the appropriate estimation of risk factors for major diseases
and other clinical conditions is a major topic in biomedical research. The
dental practitioner may not need to estimate the risk but rather may
need to be able to explain it and discuss it intelligently with a concerned
patient. In other circumstances, practitioners may need to evaluate an
article in the scientific literature and ponder the implications for their
practice. For example, a practitioner may need to decide whether an
article presents convincing evidence for switching to a different type of
material for restorations.
The study by Kilburn and Asmundsson10 serves as an example of
the importance of reviewing the scientific literature with a certain degree
of skepticism. These authors claimed to disprove the long-held clinical
maxim that the anteroposterior (AP) diameter of the chest is increased
in patients with advanced pulmonary emphysema (who were compared
with a group of nondiseased controls and a second group of patients
with non-emphysema diagnoses). In fact, the authors were not at all
reluctant to assert that ‘‘it is contended that measurement has destroyed
an apparently long-established and often repeated maxim that an in-
creased AP diameter is a common and useful sign of emphysema.’’
The experimental approach used in this study was suspect; further-
more, the authors cited no statistical evidence to support their claim.
They did, however, present sufficient tabulated summary measures to
enable the reader to carry out the basic statistical test that would have
been appropriate for the clinical question. When the reader carries out
the test (the level of statistical knowledge necessary to do so would be
acquired during the first third of an introductory biostatistics course),
the difference in mean AP diameter between the emphysema group and
the nondiseased controls is found to be statistically significant, with
P = 0.04! Thus, not only does this result contradict the main conclusion;
the authors themselves have provided the reader with the resources to
refute the paper.
In this case, the statistical test is a t-test of independent group means,
two-tailed, carried out at the 0.05 level of significance. Note that it is also
possible to perform a one-way analysis of variance given the tabulated
summary measures. A one-way analysis of variance permits comparison
of all three group means simultaneously, as well as the appropriate
multiple-comparison procedures to isolate group differences.
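A t-test of independent group means needs only the tabulated mean, standard deviation, and size of each group. The following sketch shows the pooled-variance calculation. The tabulated values from the emphysema study are not reproduced here; instead, the example reuses the DMFS summary values quoted earlier (8.3 ± 4.5 and 11.6 ± 5.2), and the group sizes of 23 are an assumption made only so that the example has 44 degrees of freedom.

```python
from math import sqrt

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (equal variances assumed) from summary measures."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two group variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, df

# Means and SDs from the DMFS example in the text; group sizes are ASSUMED.
t, df = pooled_t_from_summary(8.3, 4.5, 23, 11.6, 5.2, 23)

# Tabulated two-tailed 0.05 critical value of t for 44 degrees of freedom.
T_CRITICAL = 2.015
print(f"t = {t:.2f} on {df} degrees of freedom")
print("significant at the 0.05 level" if abs(t) > T_CRITICAL else "not significant")
```

The same few lines of arithmetic are all that a reader needs to check a published comparison when the authors report group means, standard deviations, and sample sizes.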
Is it reasonable to expect practitioners to be familiar with these
terms and to be able to duplicate statistical procedures of the type
discussed here in the course of evaluating a journal article? Probably
not. Any statistical background practitioners acquired in dental school
may be long forgotten, and the practitioner probably faces more pressing
concerns involving office management and patient treatment. In addi-
tion, the practical import of a study in the literature may need time to
propagate to the office of a practitioner.
Unfortunately, in the presentation of scientific studies, the situation
is often ‘‘let the reader beware.’’18 It is true that since the appearance of
the Kilburn and Asmundsson article, most journals have increased their requirements
for statistical rigor in submitted manuscripts. Many journals retain statis-
tical consultants for special reviews and will use statistically sophisti-
cated referees where necessary. Nonetheless, a practitioner may need to
know how to evaluate such issues as the suitability of the experimental
design, the appropriateness of the statistical tests used, and whether the
results of the test have been interpreted correctly.
STATISTICAL TRAINING FOR DENTAL PRACTITIONERS
The practitioner must determine what level of statistical insight is
appropriate and how to acquire it. A suggested knowledge base is shown
in Table 1. Although there is no one-size-fits-all statistics curriculum,
this table lists the basic topics an introductory statistics student should
master. Beyond these basic topics, certain enhancements or intermediate
topics are given.
Table 1. SUGGESTED TOPICS FOR A BASIC STATISTICAL EDUCATION, WITH ASSOCIATED ENHANCEMENTS
Topic                   Contents                             Enhancements
Descriptive statistics  Summary measures/graphics
Hypothesis testing      Paired, 2-sample t-tests
                        Compare independent proportions      Higher order contingency tables
                        (analysis of 2 × 2 tables)
                        Power/sample size
                        Type I and type II errors
                        One-way analysis of variance         Linear models/higher order ANOVA designs
Bivariate analysis      Univariate regression/correlation    Multiple linear regression
                                                             Logistic regression
ANOVA = analysis of variance
In the absence of a working knowledge base of this type, a
practitioner may feel it sufficient to ‘‘ask around,’’ at professional meetings,
for example, concerning a particular topic. In special circumstances, it
may be necessary to secure the services of a statistical consultant from a
local university, college, or school of public health. (The department
secretary of the statistics or biostatistics department will, in most cases,
direct a practitioner to an available consultant or the director of a consulting
facility.) Sometimes these two approaches can be combined, as when a
local dental society, for example, invites a statistician to address a meet-
ing and discuss a reference of special interest to the members.
The practitioner may also want to audit a course in introductory
biostatistics. The course should be presented from a research perspective.
Most institutions of higher learning from the community college level
to the college, university, or academic health center (including schools
of public health) level offer such courses, and auditing privileges can
often be obtained.
Alternatively, practitioners may want to follow a self-study regimen.
There are a number of excellent references for this purpose, well written
and especially suited for individual use. For biostatistics, the author
recommends Glantz, Primer of Biostatistics8; for clinical epidemiology,
Sackett, Haynes, and Tugwell, Clinical Epidemiology17 and Friedman,
Primer of Epidemiology7; and for general evaluation of the scientific
literature, Riegelman, Studying a Study and Testing a Test.16 The truly
ambitious practitioner can acquire some statistical software. Statistical
Package for the Social Sciences (SPSS) is a Windows-based, user-friendly
program, and inexpensive student versions are available. A number of
self-instruction texts are available for this package to complement the
useful in-program documentation.2
STATISTICAL NEEDS OF DENTAL RESEARCHERS
At this point, it is appropriate to consider the statistical needs
of the dental researcher, although readers who consider themselves
practitioners are encouraged to keep reading. The statistical needs of
dental researchers are generally more pressing (e.g., there is generally
less time for a self-instruction approach) and may be more technically
advanced than the statistical needs of a practitioner.
Because the researcher is actively involved in some form of dental
research, the need for statistical input during all phases of the research
is important, from the initial formulation of the scientific or clinical
hypotheses being considered, through the execution of the project, to
the preparation of research papers and presentations. The extent and
amount of this input depend, of course, on the magnitude of the project.
Clearly, a survey of several thousand patients involving many study
variables will require more statistical resources than a study involving
20 or 30 laboratory animals and few study variables. Nonetheless, the
planning issues are similar in both cases.
THE ROLE OF THE STATISTICAL CONSULTANT IN DENTAL RESEARCH
Certain primary tasks are the biostatistician’s responsibility when
interacting with a dental researcher. For the present discussion, it is
assumed that the consultation goes beyond a simple drop-in visit during
which the biostatistician answers a few simple questions, such as
how to respond to comments in a manuscript review. Ideally, the researcher
and the biostatistician meet early in the planning process (at the research-
er’s initiative, of course), to discuss the nature of the research, whether
it is an intricate experiment with many outcome measures or a large-
scale clinical trial in which observations are collected at multiple points
over time on a large group of patients or experimental animals.
It is advisable for the biostatistician to become as familiar as possible
with the purpose of the research and the underlying scientific or clinical
considerations. This familiarity can accrue over time, as the biostatisti-
cian and the researcher meet repeatedly and interact regularly over the
course of planning and executing the project. It is not reasonable to
expect the statistical consultant to manifest the same level of understand-
ing of the scientific issues as the researcher. It is also not reasonable,
however, to expect the researcher to be able to handle the technical
mathematical issues of the statistics involved in the research.
Thus, in the interaction between researcher and statistician, what
the researcher needs to know about statistics mirrors in many ways
what the statistical consultant or dental patient needs to know about
clinical dentistry. A patient does not need to know the intricate and
minute clinical and scientific details of how a therapy works to benefit
from it. It is likely that over time a patient will become somewhat
familiar with aspects of the dental procedures received. For example, a
patient receiving implants would be able to advise other patients on the
nature of the process but would not be qualified to apply it.
Similarly, the researcher does not need to know whether maximum
likelihood or least squares algorithms were used to generate estimates
of model parameters. The model needs to target the specific aims of the
research and to address the fundamental hypotheses. It is the interpreta-
tion of the results of the analysis or modeling process that is crucial. (It
is assumed here that the biostatistician has appropriately diagnosed the
adequacy of the model—a process referred to as evaluating the fit of the
model. That is, this discussion assumes good statistical practice on the
part of the statistician, just as it assumes high standards of clinical and
scientific conduct on the part of the researcher.)
STATISTICAL HYPOTHESES AND METHODS
The biostatistician and the researcher need to carefully establish a
one-to-one relationship between a set of clinical or experimental
hypotheses and corresponding statistical hypotheses. These hypotheses are
often expressed in opposite ways. For example, the clinical hypothesis,
‘‘This new treatment, together with proper oral hygiene, will greatly
reduce the rate of increase in DMFS compared with proper oral hygiene
alone,’’ might be expressed in a statistical context as, ‘‘There is no
difference in mean change in DMFS between the drug-and-oral-hygiene
and oral-hygiene-only groups.’’ This transcription will help to specify
the appropriate statistical procedures to be used in analyzing the data.
In some cases, the relationship may be more subtle and may require
extensive interaction and question-and-answer sessions between the re-
searcher and the biostatistician. It is quite possible that the type of
statistical methods selected will change as new understanding arises on
the part of both the researcher and the statistical consultant. New issues
may arise that require additional planning. In any case, it is important
to establish statistical hypotheses for all primary and secondary clinical
and scientific hypotheses and to take the nature of the corresponding
outcome variables into account. The statistical procedures should be
coordinated with the research-specific aims when a research proposal is
being jointly prepared.
COMPLICATING FACTORS
A number of characteristics of dental research need to be considered
when planning statistical analyses. The first characteristic is structural;
quite simply, one is dealing with multiple units if one considers
individual teeth or even individual tooth surfaces. Statisticians refer to
multiple-unit data as high-dimensional data, and such data can seriously
complicate both the research planning and the data analysis. The usual
approach is to attempt a type of statistical analysis known as a dimension
reduction procedure and then study the reduced number of data units.
Alternatively, researchers can confine attention to a specified subset of
the original units. These issues are discussed by Clive and Woodbury.3
An example of such a situation is the paper by Löe et al, which
analyzes data from the well-known study of the progression of periodon-
tal disease among Sri Lankan tea laborers.12 The authors seek to identify
and characterize subtypes of disease development based on analysis of
loss-of-attachment measurements. Three disease subtypes were identi-
fied. Although the clinical utility of these types remains to be established,
their descriptive value is clear.
Many experiments in dental research involve the acquisition and
analysis of longitudinal, or repeated measurements, data, because the
development and manifestation of dental disease is often a gradual
process. Longitudinal research concerns the assessment of experimental
units at several points in time over the entire chronologic course of the
research. Such observations are referred to as clustered, or correlated, data
because the outcomes for individual subjects may be related over time.
It is generally inadvisable to analyze such data in orthodox ways,
as if the observations were independent. For example, if a subject’s
DMFS is above average at time 1, it is likely to be so again at subsequent
readings; this degree of association can influence the analytic results and
needs to be accounted for. Developments in both applied and theoretical
statistics and computer science over the past two decades have made it
possible to deal with these analyses on a fairly routine level; these
approaches are discussed in Diggle, Liang, and Zeger’s The Analysis of
Longitudinal Data4 and in Littell, Milliken, and Stroup’s SAS System for
Mixed Models.11
Armitage et al1 provide an interesting illustration of such research
in assessing the use of elastase as a marker for the progression of
periodontal disease. This paper, which appeared in the dental literature,
is paired with a technical paper from the statistical literature20 that
assesses the advanced longitudinal data analytic techniques in the spe-
cific context of periodontal disease. These articles are noteworthy in that
together they present research evaluating the appropriateness of a new
class of data analytic models as well as clinical and scientific applications
of the new analyses. The software for implementing these procedures is
now routinely available; this was not the case when the papers were
published.
Other interesting examples of longitudinal data analysis for dental
research are given by Neely14 and Chugal et al.2 Neely identifies key risk
variables for tooth loss based on analysis of the Sri Lanka data cited
previously.12 Subjects were seen between one and seven times, and loss
of attachment for two surfaces of each tooth was measured at each
point. Tooth loss was also assessed at each point and used as the
outcome variable for the analysis.
Chugal and colleagues2 investigated factors influencing the success
or failure of endodontic therapy. This research modeled the success or
failure for each canal. The number of canals within teeth varied, as did
the number of treated teeth across patients. In this case, there were, in
fact, two levels of clustered data: canals within teeth, and treated teeth
within patients.
Still another attribute of dental data that complicates analytic con-
siderations concerns the intrinsic lack of precision of some basic mea-
surements; this imprecision is sometimes referred to as noise. A good
example of noise is the limit on precision in measuring loss of attach-
ment. Although loss of attachment is a crucial outcome measure in
many studies, investigators need to be aware of the extent to which
experimental conclusions can be compromised by measurement defi-
ciencies. Certain experimental design considerations devised to deal
with the problem of noise are discussed by Imrey and Chilton.9
SAMPLE SIZE AND POWER ANALYSIS
The estimation of sample size is a crucial aspect of research
development. Having too many experimental subjects wastes time and money,
a circumstance especially frowned upon by funding agencies. Having
too few subjects increases the risk of failing to detect a real effect or a
statistically significant difference. In statistical terms, the smaller the
sample size, the smaller the power of the experiment, which is the
chance of detecting a difference or effect that is really present. In the
chronology of planning, it could be argued that sample size estimation
precedes the specification of statistical techniques described previously.
The selection of methods of analysis often dictates the approach to
sample size estimation, however.
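The link between sample size and power can be sketched with a simple approximation. The example assumes a comparison of two group means with equal group sizes, a two-sided test at the 0.05 level, and a hypothetical standardized effect size of 0.5 (a difference of half a standard deviation). It uses the normal approximation rather than the exact t-based calculation, so the values are indicative only.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is the true difference in means divided by the common SD.
    The negligible chance of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(effect_size * sqrt(n_per_group / 2) - z_crit)

# Hypothetical effect size; the group sizes are chosen for illustration.
EFFECT_SIZE = 0.5
for n in (10, 20, 40, 64, 100):
    print(f"n per group = {n:3d}  power = {approx_power(n, EFFECT_SIZE):.2f}")
```

Under these assumptions, power rises steadily with the number of subjects per group and reaches roughly 80% at about 64 subjects per group.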
The design of the study is another factor influencing the estimation
of sample size. There are several well-established experimental designs
to consider, especially when the research is in the form of a clinical trial.
A randomized clinical trial is a design with at least two study groups
(test and control) to which eligible patients are assigned randomly. Other
designs include case-control, prospective cohort, crossover, and the less
scientifically rigorous pilot and observational studies. The selection of
an appropriate study design is an important aspect of researcher and
biostatistician interaction.
Sample size estimation must include a statistical justification in
terms of testing the primary research hypotheses and specification of
what a clinically or scientifically significant difference is for each of the
main outcome variables in the study. The term difference is used here to
denote the magnitude of the difference between summary outcome
measures across experimental groups.
Note that clinical or scientific significance may be different from
statistical significance. As Feinstein, citing Gertrude Stein, has noted,
a difference has to make a difference to be a difference.6 Thus, rational
experiment planning requires the researcher to estimate what difference
is noteworthy and to specify sample size accordingly. This planning is
in contrast to the haphazard selection of a sample size with the hope
that something statistically significant and worth reporting will turn
up.
Although the biostatistician actually provides sample size estimates,
these estimates are based on extensive input from the researcher. In
addition to the concepts cited previously, the researcher needs to have a
good working understanding of type I and type II errors. In statistical
terms, a type I error involves rejecting a true hypothesis of no difference,
and a type II error involves accepting a false hypothesis of no difference.
The probabilities of these events are denoted by α and β, respectively;
the power of a test is the complement of the type II error probability, 1 − β.
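The meaning of a type I error rate can be illustrated by simulation. In the sketch below, many experiments are simulated in which the hypothesis of no difference is true, and the proportion of experiments that nonetheless reject it is counted. The group size, number of simulated experiments, random seed, and the use of a z-test with a known standard deviation are all simplifying assumptions made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                     # fixed seed for reproducibility
ALPHA = 0.05
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)

def null_experiment_rejects(n=30, sigma=1.0):
    """Simulate two groups with NO true difference; return True if rejected."""
    group_a = [random.gauss(0.0, sigma) for _ in range(n)]
    group_b = [random.gauss(0.0, sigma) for _ in range(n)]
    diff = sum(group_a) / n - sum(group_b) / n
    z = diff / (sigma * sqrt(2 / n))
    return abs(z) > Z_CRIT

N_EXPERIMENTS = 2000
rejections = sum(null_experiment_rejects() for _ in range(N_EXPERIMENTS))
false_positive_rate = rejections / N_EXPERIMENTS

print(f"proportion of false rejections = {false_positive_rate:.3f}")
```

The observed proportion should fall close to the chosen α of 0.05: even when no effect exists, about 1 experiment in 20 will appear statistically significant by chance.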
In research terms, type I and type II errors correspond to concluding
falsely that an effect is present or concluding falsely that an effect is
absent, respectively. Many researchers consider the latter more serious
than the former, because failing to detect an experimental effect might
lead to loss of interest or motivation in the particular type of research
being performed. On the other hand, it is likely that a false effect will
be exposed sooner or later in the course of further research.
The researcher also needs to provide estimates of the magnitude of
the effect of the intervention on each of the outcome variables in one of
the research groups, together with an estimate of the variability; these
estimates are called pilot data. Making these estimates may seem counter-
intuitive, because it may reasonably be asked what purpose the research
serves if some concept of the size and variance of the intervention is
available a priori. In fact, researchers are not presuming to estimate the
effect of the experimental intervention but rather to make a reasonable
speculation on the response that could be expected in the control or
nonintervention group. The pilot data form the basis for estimating the
sample size required to observe a given difference.
Assume, for example, that an investigator is planning to test the
effect of an intervention hypothesized to reduce the rate of loss of
attachment. Suppose further that it is known that over some time period
untreated individuals will lose an average of 4 mm attachment, with a
standard deviation of 3.5 mm, and that these estimates apply to the type of
patient population being studied. The biostatistical consultant can use
this information to estimate the number of patients needed to detect a
specified difference based on given values of α and β and the type of
analysis to be used.
SOURCES OF PILOT DATA
The division of labor is straightforward in sample size estimation.
The researcher needs to supply estimates of the appropriate summary
measures for the important outcome variables for at least one of the
groups in the study (probably the control or non-intervention group).
The researcher also needs to provide an idea of the magnitude of a
clinically significant effect. The statistician uses these data, together with
specifications of α and β and the statistical method to be used, to
estimate a sample size. Table 2 outlines the main steps in performing a
power analysis. It is permissible to estimate sample sizes for a range of
values of α and β, as illustrated in Table 3, which shows a sample size
table for the hypothetical experiment concerning loss of attachment
discussed previously.
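The same inputs can be used in the other direction, to ask what power a given sample size would provide. The sketch below again assumes a two-group comparison of means with the normal approximation; the sample sizes and the 1-mm difference are hypothetical, and a statistician would adapt the formula to the actual design.

```python
from statistics import NormalDist


def approximate_power(n_per_group, sd, difference, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-group comparison of means."""
    z = NormalDist()
    standard_error = sd * (2 / n_per_group) ** 0.5
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Probability of exceeding the critical value in the expected direction;
    # the very small contribution of the opposite tail is ignored.
    return z.cdf(abs(difference) / standard_error - z_alpha)


pilot_sd = 3.5  # mm, from the loss of attachment example
for n in (25, 50, 100, 200):
    print(n, round(approximate_power(n, pilot_sd, difference=1.0), 2))
```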
There are several sources of assistance for the researcher in de-
termining what constitutes a clinically significant difference, as well as
providing pilot data for use by the statistical consultant in estimating
sample size or power. The scientific literature is a valuable source of
background data for this planning. It is quite likely that the researcher
has exhaustively reviewed the literature in the course of formally devel-
oping the research plan. The literature review may provide multiple
sources of pilot data as well as indications of the variation in response
across different classes of patients or potential research subjects.
Table 2. PRIMARY STEPS IN CARRYING OUT A POWER ANALYSIS
Step  Activity                                                                 Responsibility
1     Specify clinical hypotheses                                              DR
2     Determine primary outcome measures                                       DR, B
3     Transcribe clinical hypotheses to statistical hypotheses                 DR, B
4     Specify range of clinically meaningful differences for outcome measures  DR
5     Specify α, β                                                             DR
6     Obtain pilot data for calculations                                       DR
7     Obtain power/sample size estimates                                       B
8     Evaluate results for final sample size/power specification               DR, B
DR = dental researcher; B = biostatistician; α = probability of a type I error; β = probability of
a type II error
Input from colleagues is also a useful source of data for research
planning, especially when inquiries can be focused, so that matters of
pilot data and significant effect can be addressed directly. A researcher
who is also a practicing dentist may have a set of patient records worth
examining. Many providers use readily available software (such as
spreadsheet packages) to construct their own databases, which may be
useful for planning; however, it is important to verify the standards
under which the data were assessed.
Some commercially available software packages for power analysis
provide interactive prompts for assisting users through the steps of a
sample size determination.6 These steps include a variety of techniques
for generating pilot data summary measures from limited input (e.g.,
estimating the standard deviation for an outcome variable based on
estimation of the range or percentile values). These techniques are useful
when prior knowledge or data are limited.
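Two common rules of thumb of this kind are sketched below. They are offered as an illustration and are not claimed to be the techniques used by any particular package; both assume the outcome is roughly normally distributed, and the input values are hypothetical.

```python
from statistics import NormalDist


def sd_from_range(low, high):
    """Rule of thumb: about 95% of a roughly normal variable lies within mean +/- 2 SD."""
    return (high - low) / 4


def sd_from_percentiles(lower_value, upper_value, lower_pct=0.25, upper_pct=0.75):
    """SD implied by two percentile values, assuming a normal distribution."""
    z = NormalDist()
    return (upper_value - lower_value) / (z.inv_cdf(upper_pct) - z.inv_cdf(lower_pct))


# Hypothetical input: a clinician expects nearly all patients to lose 0 to 12 mm.
print(sd_from_range(0, 12))
# Hypothetical input: the 25th and 75th percentiles are thought to be 2 and 6 mm.
print(round(sd_from_percentiles(2.0, 6.0), 2))
```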
In some cases, suitable pilot data are lacking altogether. Such a
situation may arise, for example, early in the history of a line of scientific
inquiry or when first testing a new drug or intervention. Here, the
researcher may consider implementing a pilot study. A pilot study is a
small-scale, preliminary study designed to assess the feasibility of the
proposed research and entails evaluating all aspects of the research
(including administrative matters if a clinical trial is being contemplated).
One of the primary objectives is to acquire a database for use in
formal calculations of sample size needed for a larger experiment or
clinical trial to be carried out at a later date.
Table 3. SAMPLE SIZE ESTIMATES FOR HYPOTHETICAL LOSS OF ATTACHMENT STUDY*
Percent Mean        β Value
Difference      0.10      0.20
10              40        35
15              33        28
20              29        24
25              25        19
*The table is designed to show the number of subjects needed in each of two groups, assuming a
repeated measures design with hypothesis testing carried out at α = 0.05. Sample size estimates are
shown for each of two values of β, and each of several effect sizes.
α = probability of a type I error; β = probability of a type II error
Although relatively small in scale, pilot studies often require as
much interaction between researcher and statistician as more formal
research, particularly in determining which variables will be assessed.
Stopping rules need to be established because, by definition, detailed
power analyses are not possible in a pilot study.
DATA MANAGEMENT
Data management is a deliberately broad term, incorporating a variety
of tasks concerned with data acquisition, storage, confidentiality,
editing, and retrieval. These aspects of research execution are of vital
importance in assuring the quality of the research. Data management is
especially vital in large-scale projects involving the determination of
many variables from many research subjects and possibly at multiple
time points.
Data acquisition begins with the design and planning of data-
gathering forms. Completed forms must then be entered into a computer file;
machine-readable forms can facilitate this activity. Data should be
checked thoroughly. One procedure for checking is dual entry, in which
data forms are entered twice, by independent data technicians. The final
data files for the two operators are compared data point by data point.
With dual entry, the only way an incorrect value can enter the final file
is for each operator to make the identical mistake in the identical
location.
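The point-by-point comparison in dual entry can be sketched as follows. The record layout (a dictionary keyed by a patient identifier) and the variable names are hypothetical; any disagreement is listed so that it can be resolved against the original form.

```python
def compare_entries(first_entry, second_entry):
    """Return (record_id, field, value_1, value_2) for every disagreement."""
    discrepancies = []
    for record_id in sorted(set(first_entry) | set(second_entry)):
        record_1 = first_entry.get(record_id, {})
        record_2 = second_entry.get(record_id, {})
        for field in sorted(set(record_1) | set(record_2)):
            value_1, value_2 = record_1.get(field), record_2.get(field)
            if value_1 != value_2:
                discrepancies.append((record_id, field, value_1, value_2))
    return discrepancies


# Hypothetical files produced by two independent data technicians.
operator_a = {101: {"attachment_loss_mm": 4.0, "age": 52},
              102: {"attachment_loss_mm": 3.5, "age": 47}}
operator_b = {101: {"attachment_loss_mm": 4.0, "age": 25},
              102: {"attachment_loss_mm": 3.5, "age": 47}}

for discrepancy in compare_entries(operator_a, operator_b):
    print(discrepancy)
```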
Although dual entry is an effective mechanism for data-entry qual-
ity control, it is not always feasible. In large studies or clinical trials, the
researcher should plan on printing a randomly selected subset of the
entire data file for verification against the data-gathering forms. It may
also be prudent to check all values of any variables that are particularly
significant. This checking will provide an estimate of the overall data-
entry error rate and may suggest variables or areas in the data file that
need further attention in the editing process.
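Drawing the verification subset and summarizing the result might look like the sketch below. The study size, the number of records checked, and the error count are hypothetical; the fixed seed is an assumption made so that the same subset can be drawn again if the check has to be repeated.

```python
import random


def draw_verification_sample(record_ids, sample_size, seed=2002):
    """Randomly select records to check against the data-gathering forms."""
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    return sorted(rng.sample(list(record_ids), sample_size))


def error_rate(values_checked, errors_found):
    """Proportion of checked data points that disagreed with the forms."""
    return errors_found / values_checked


record_ids = range(1, 501)  # hypothetical study with 500 records
to_check = draw_verification_sample(record_ids, 50)
print(to_check[:5])
# Suppose 50 records of 20 variables each were checked and 7 errors were found.
print(error_rate(50 * 20, 7))
```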
Once a data file has been checked, edited, and found satisfactory,
further examination should involve the search for potential outlying
values. This search can be done for all variables in the file and is a basic
procedure in the exploratory data analysis phase of the research. All
values above the 95th percentile and below the 5th percentile (or some
other specified cutoff points) are listed, together with identifying infor-
mation indicating which record in the data file contains the value. The
researcher can then consider this output and flag blatantly out-of-range
values for further checking. This procedure should be followed for all
major study variables.
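A listing of this kind can be produced with a short routine such as the one below. The record layout and the data are hypothetical, and the 5th and 95th percentiles are simply the defaults mentioned in the text; other cutoff points can be passed in.

```python
from statistics import quantiles


def flag_extremes(records, variable, lower_pct=5, upper_pct=95):
    """List (record_id, value) pairs outside the chosen percentile cutoffs."""
    values = [record[variable] for record in records]
    # quantiles(..., n=100) returns the 1st through 99th percentile cut points.
    cut_points = quantiles(values, n=100, method="inclusive")
    low, high = cut_points[lower_pct - 1], cut_points[upper_pct - 1]
    return [(record["id"], record[variable])
            for record in records
            if record[variable] < low or record[variable] > high]


# Hypothetical data: 40 plausible values plus one likely keying error (45.0 mm).
records = [{"id": i, "loss_mm": 2.0 + 0.1 * i} for i in range(40)]
records.append({"id": 40, "loss_mm": 45.0})
for record_id, value in flag_extremes(records, "loss_mm"):
    print(record_id, value)
```

The listing will always contain some legitimate values, because a fixed share of the data lies beyond any percentile cutoff; it is the researcher's review that separates implausible values from merely extreme ones.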
Data confidentiality is important in clinical trials involving human
subjects. Assuring confidentiality usually entails an intermediate step in
the data-entry process in which identifying information is replaced with
some numeric patient identifier. This is the responsibility of the re-
searcher, who establishes and maintains the key relating the two data
fields. Access to the key is limited by and is under the direction of the
researcher. The file supplied to the statistical consultant should contain
no unique patient data that could be used for identification or to breach
subject anonymity.
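The intermediate de-identification step can be sketched as follows. The field names are hypothetical, and sequential study numbers are used only to keep the example short; the essential feature is that the key is stored separately and stays with the researcher.

```python
def deidentify(records, identifying_fields=("name", "chart_number")):
    """Split records into an analysis file and a key held only by the researcher."""
    key = {}            # study_id -> identifying information (kept by the researcher)
    analysis_file = []  # what is supplied to the statistical consultant
    for study_id, record in enumerate(records, start=1):
        key[study_id] = {f: record[f] for f in identifying_fields if f in record}
        cleaned = {f: v for f, v in record.items() if f not in identifying_fields}
        cleaned["study_id"] = study_id
        analysis_file.append(cleaned)
    return analysis_file, key


# Hypothetical patient records.
raw = [{"name": "Patient A", "chart_number": "C-17", "loss_mm": 4.2},
       {"name": "Patient B", "chart_number": "C-22", "loss_mm": 3.1}]
analysis_file, key = deidentify(raw)
print(analysis_file)
```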
RESEARCH ADMINISTRATION
The biostatistical consultant can help the researcher with other as-
pects of the actual administration and execution of the research project.
Several such topics noted here arise in clinical trials and include the
randomization of patients, protocol deviations, and the analysis of drop-
outs and missing data.
The randomization of patients refers to the assignment of patients
to study groups, usually by some random mechanism. Randomization
is generally a straightforward task, and the method of randomization
depends largely on the study design. Both the researcher and the statisti-
cal consultant need to keep careful track of any protocol deviations.
Protocol deviations involve changes in the study design or plan once
subject intake has begun. These planning changes are sometimes un-
avoidable.
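As one illustration of a random assignment mechanism, the sketch below uses permuted blocks, which keep the group sizes balanced as patients are enrolled. This is an assumed example, not the method prescribed by the text; the group labels, block size, and seed are hypothetical, and the appropriate scheme depends on the study design.

```python
import random


def block_randomize(n_patients, groups=("treatment", "control"), block_size=4, seed=1):
    """Assign patients to groups in shuffled blocks so group sizes stay balanced."""
    if block_size % len(groups) != 0:
        raise ValueError("block_size must be a multiple of the number of groups")
    rng = random.Random(seed)  # a recorded seed makes the allocation list reproducible
    assignments = []
    while len(assignments) < n_patients:
        block = list(groups) * (block_size // len(groups))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]


allocation = block_randomize(12)
print(allocation)
```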
Subject dropout can be a major problem in longitudinal clinical
trials. Patients can leave for a variety of reasons. Some may decide not
to continue participating, especially if the experiment involves some
unpleasant or invasive procedures. Others may leave the area. Some
may become injured or ill. The researcher and the biostatistician will
hope fervently that such dropout is random; that is, one is not primarily
losing the treatment responders or non-responders or only the most
compliant or non-compliant participants. Random dropout implies that
subjects who leave a clinical trial before completion of the protocol do
so at random, and that the remaining subject groups are still homoge-
neous with respect to potentially confounding factors.
The term intent to treat refers to the analysis of data from all subjects,
including those who drop out. The rationale for this method of analysis
is discussed in most clinical trial guides; see, for example, Spilker’s
Guide to Clinical Trials19 and Piantadosi’s Clinical Trials: A Methodological
Perspective.15 Most researchers will also analyze those subgroups of parti-
cipants completing the protocol. A crucial phase of the analysis of data
from a clinical trial is the analysis of dropouts and their comparison
with subjects who remained.
Missing data can be a problem in surveys or retrospective studies.
The difficulty, as with patient dropouts, arises when missing data occur
in a nonrandom fashion. Although the treatment of missing data (involv-
ing, for example, multiple imputation procedures) is the responsibility
of the biostatistician, the researcher needs to participate in planning the
procedures for carrying out the study so that the occurrence of
missing data is minimized or the variables prone to absence are of
relatively minor importance in the study.
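A first descriptive look at missing data, before any formal treatment by the biostatistician, might be sketched as below. The records and variable names are hypothetical. A marked difference in the proportion missing between groups is only a warning sign that the data may not be missing at random, not a formal test.

```python
def missing_by_variable(records, variables):
    """Proportion of records with a missing (None) value, per variable."""
    n = len(records)
    return {v: sum(record.get(v) is None for record in records) / n for v in variables}


def missing_by_group(records, variable, group_field):
    """Proportion missing for one variable within each study group."""
    counts = {}
    for record in records:
        missing, total = counts.get(record[group_field], (0, 0))
        counts[record[group_field]] = (missing + (record.get(variable) is None), total + 1)
    return {group: missing / total for group, (missing, total) in counts.items()}


# Hypothetical records; None marks a missing value.
records = [
    {"group": "treatment", "loss_mm": 3.0, "smoker": None},
    {"group": "treatment", "loss_mm": None, "smoker": "no"},
    {"group": "control", "loss_mm": 4.1, "smoker": "yes"},
    {"group": "control", "loss_mm": 3.8, "smoker": "no"},
]
print(missing_by_variable(records, ["loss_mm", "smoker"]))
print(missing_by_group(records, "loss_mm", "group"))
```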
DISCUSSION
The primary theme of this article is that it is not essential that the
researcher have a strong working knowledge of elementary (or higher)
statistics to perform valid scientific research. Rather, the researcher
should be prepared to work with a biostatistical consultant on an exten-
sive and ongoing basis to plan and execute a research project carefully.
The need for this emphasis was recognized by Moses and Louis, who
suggested that effective collaboration between clinician and statistician
can help identify tractable scientific and statistical problems that need
attention and can help avoid undertaking intractable ones.13 Further-
more, they assert that the ‘‘central requirement for successful collabora-
tion is clear, broad, specific, two-way communication on both scientific
issues and research roles.’’13
The researcher will need to assist the biostatistician in estimating
sample size, in understanding the basics of the science involved, and in
relating scientific and statistical hypotheses. The biostatistician should
come to appreciate the scientific and clinical issues and underlying
principles; likewise, the researcher will come to appreciate how appro-
priately executed data analysis can extract valuable scientific knowledge
from experimental data.
Over time, the researcher will acquire the statistical knowledge
needed to interpret and present study results. The statistical understand-
ing may be focused and restricted to the methods relevant to the particu-
lar study, but it will constitute a useful body of knowledge, appropriate
for future studies or as a basis for using other statistical methods in
different studies.
Most, if not all, scientists are convinced of the utility of mathematical
models in representing and studying natural phenomena. Statistical
models are mathematical models that incorporate probabilistic measures
of uncertainty. In the study of oral health, two primary sources of
variation impart this uncertainty. The first is the natural variation among
patients in measures of oral health; the second is the variation resulting
from sampling, or selecting a subgroup of patients for study, because
the entire population of such patients is impossible to access.
The progression of data analyses intended to account for this varia-
tion, from simple independent group t-tests through complex multivariate
methods, is one of increasing technologic, mathematical, and statistical
sophistication and advancement. It is also a progression that
describes considerable theoretic and applied advances by researchers
attempting to understand dental disease and how to deal with it. Often,
theoretic clarification and understanding derive from the application of
more detailed models, as new information about the processes being
modeled derives from formalization and logical representation.
The influence of this trend of increasing detail and complexity in
data analysis for dental research will become more profound in the
immediate future, as new developments in dental science occur simulta-
neously with new advances in statistical theory and computer science.
This progression will only increase the need for dental researchers to
establish and develop lines of communication with data analysts.
References
1. Armitage GC, Jeffcoat MK, Chadwick DE, et al: Longitudinal evaluation of elastase as
a marker for the progression of periodontitis. J Periodontol 65:120–128, 1994
2. Chugal N, Clive J, Spangberg L: A prognostic model for assessment of the outcome of
endodontic treatment: Effect of biologic and diagnostic variables. Oral Surg Oral Med
Oral Pathol Oral Radiol Endod, in press
3. Clive J, Woodbury MA: Continuous and discrete global models of disease. Mathemati-
cal Modeling 7:1137–1154, 1986
4. Diggle PJ, Liang K-Y, Zeger SL: The Analysis of Longitudinal Data. New York, Oxford
University Press, 1994
5. Elashoff J: nQuery Advisor Version 4.0 User’s Guide. Los Angeles, CA, 2000
6. Feinstein A: Clinical Biostatistics. Boca Raton, FL, CRC Press, 2002
7. Friedman G: Primer of Epidemiology, ed 4. New York, McGraw Hill, 1994
8. Glantz A: Primer of Biostatistics, ed 4. New York, McGraw Hill, 1997
9. Imrey PB, Chilton NW: Design and analytic concepts for periodontal clinical trials. J
Periodontol (suppl) 63:1124–1140, 1992
10. Kilburn KH, Alsmundsson T: Anteroposterior chest diameter in emphysema. Arch
Intern Med 123:379–382, 1969
11. Littell RC, Milliken GA, Stroup WW, et al: SAS System for Mixed Models. Cary, NC,
SAS Institute, 1996
12. Löe H, Anerud A, Boysen H, et al: Natural history of periodontal disease in man.
Rapid, moderate and no loss of attachment in Sri Lankan laborers 14 to 46 years of
age. J Clin Periodontol 13:431–440, 1986
13. Moses LE, Louis TA: Statistical consulting in clinical research: The two-way street. Stat
Med 3:1–5, 1984
14. Neely A: The natural history of periodontal disease in man. Risk factors for progres-
sion of attachment loss in subjects receiving no oral health care. J Periodontol
72(8):1006–1015, 2001
15. Piantadosi S: Clinical Trials: A Methodological Perspective. New York, John Wiley &
Sons, 1997
16. Riegelman RK: Studying a Study and Testing a Test, ed 4. Philadelphia, Lippincott
Williams & Wilkins, 2000
17. Sackett DL, Haynes RB, Tugwell P, et al: Clinical Epidemiology, ed 2. Philadelphia,
Lippincott Williams & Wilkins, 1991
18. Sheehan TJ: The medical literature: Let the reader beware. Arch Intern Med 140:472–
474, 1980
19. Spilker B: Guide to Clinical Trials. New York, Raven Press, 1991
20. Ten Have TR, Landis JR, Weaver SL: Association models for periodontal disease
progression: A comparison of methods for clustered binary data. Stat Med 14:413–
429, 1995
21. Voelkl KE, Gerber S: Using SPSS for Windows. New York, Springer-Verlag New
York, 1999

Jonathan Clive, PhD
Department of Biostatistical Consultation
University of Connecticut Health Center MC3805
Farmington, CT 06030
e-mail: clive@nso1.uch.edu
APPLYING EVIDENCE BASED DENTISTRY TO YOUR PATIENTS
GENERAL ISSUES
A common criticism of evidence based practice is that it seeks to
usurp the individual clinician’s judgment, imposing instead an external
authority found in the literature that may or may not be appropriate.
This criticism is not valid. Indeed, the fourth step of the Evidence based
Practice Model (Fig. 1) reserves a place for the individual practitioner’s
judgment in the application of the literature to the clinical problem.
Evidence based practice therefore seeks to inform clinical decisions, not
to impose them.
After converting the patient’s problem into an answerable question,
searching the literature, and critically appraising the found articles, the
clinician must decide if the valid information that has been revealed
can be applied to the patient whose problem triggered the process. To
do so, the clinician must consider certain specific factors. First, clinicians
cannot allow themselves to be dazzled by elaborate statistics showing
extreme measures of statistical significance. In a trial comparing Bråne-
mark and IMZ implants under mandibular overdentures, Boerrigter et
al1 found a statistically significant difference in bone level changes be-
tween the implant types 1 year after implant placement. The mean
scores were 0.5 mm for the IMZ implants and 1.0 mm for the Brånemark
implants. This difference was found to be statistically significant (P <
0.003), meaning that a difference at least this large would be expected
less than 0.3% of the time if the two implant types truly did not differ.
This difference seems major until one
realizes that it is only a 0.5-mm difference and therefore is unlikely to
be clinically significant. A highly significant statistical difference is there-
fore no indicator of a clinically significant difference.
Figure 1. The steps in the model of evidence-based practice. (From Anderson JD: Need
for evidence-based practice in prosthodontics. J Prosthet Dent 83:58–65, 2000; with permission.)
From the Faculty of Dentistry, University of Toronto; and the Craniofacial Prosthetic Unit,
Toronto-Sunnybrook Regional Cancer Centre, Toronto, Ontario, Canada
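The distinction between statistical and clinical significance can be made concrete with a small sketch. Only the 0.5-mm difference comes from the text; the standard deviation, the group sizes, and the 1.0-mm threshold for clinical importance are hypothetical values chosen for illustration, and the normal approximation is an assumption.

```python
from statistics import NormalDist


def two_sided_p(difference, sd, n_per_group):
    """Two-sided P value for a difference in two means (normal approximation)."""
    standard_error = sd * (2 / n_per_group) ** 0.5
    z = abs(difference) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


observed_difference = 0.5  # mm, the difference reported in the text
assumed_sd = 1.0           # mm, hypothetical
smallest_important = 1.0   # mm, hypothetical threshold for clinical importance

for n in (10, 30, 100):
    p = two_sided_p(observed_difference, assumed_sd, n)
    print(n, round(p, 4), "clinically important:", observed_difference >= smallest_important)
```

The same 0.5-mm difference moves from unremarkable to highly significant as the groups grow, while its clinical importance does not change at all.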
Most articles that describe clinical research report their findings on
a sample of patients. Often, the sample of patients is intended to repre-
sent the whole population. The selected patients therefore should have
demographic and disease characteristics similar to those of the popula-
tion at large. The distribution of age, sex, socioeconomic status, educa-
tion, nutritional status, and occupational range all should reflect society
in general. Similarly, the prevalence, severity, and duration of disease
status should also mirror the general population. Clearly, a sample of
patients in any given study is unlikely to fulfill all these criteria. Often,
the authors do not want to reflect the whole population and limit their
sample to persons of a certain age group, or with a history of exposure
to an agent such as smoking, or with a clinical condition such as
edentulousness. In applying the findings from such studies to the indi-
vidual patient, a clinician must decide if the patient is similar enough to
the study patients for the results to be applicable. One way to do so is
to see if the clinician’s patient would have met the inclusion and exclu-
sion criteria to be included in the study. Often, some differences are found
between the study sample and the present patient. These differences
may not make the article useless. A more useful approach may be to
reverse the question and ask whether the study population is so different
from the patient that the results cannot possibly be applied. This ap-
proach makes it possible to apply some information from the article. If
the study population is divided into subgroups, it may be possible
to match the reader’s patient to one of the groups for more focused
information.
The setting in which the study data were gathered can have a major
impact on the findings. The results of a new, experimental periodontal
treatment tested in a major teaching institution may not be applicable to
the patients of a general practitioner because of an effect called referral
filter bias. The major teaching institution is likely to attract patients
who have more severe periodontal problems than those seen in general
practice. Similarly, the treatment they receive at a major center may not
be feasible in general practice. The patients’ response to the new treat-
ment, therefore, may not be applicable to the patients of a general
practice. An example is the series of patients with severe (apical
third) periodontal bone loss who were rehabilitated with extensive fixed
bridges and aggressive oral hygiene maintenance.6 Such a report offers
little help to the general practitioner who sees less severe periodontal
destruction, is less likely to undertake such extensive reconstructions,
and may not be able to expect such a high degree of patient compliance
in oral hygiene. Therefore, if readers are seeking information to apply to
their general practice, it will be necessary to pay special attention to
how the patients were selected with respect to the severity of their
disease and the feasibility of the treatment approach. The important
question for the practitioner to ask is, ‘‘Could such circumstances be
duplicated in my office?’’
No clinical decisions are made without some element of patient
input. The patient’s preferences, priorities, and resources will therefore
affect clinical decisions. Stated another way, the social and cultural issues
that are important to the patient must be considered when deciding how
to apply the information found in a literature search related to the
patient’s problem. A new, highly effective treatment approach that takes
too long, is likely to be painful, or is too expensive is not appropriate if
it is not consistent with the patient’s wishes. Similarly, treatment solu-
tions exist for problems that are not important to some patients. The use
of effective veneering techniques makes sense only if the social and
cultural pressures on a patient exceed the risks inherent in the technique.
For many people, a less than perfect smile is simply not important. To
suggest a solution where there is no problem invites disaster. Marketing
techniques aimed at creating demand are a concern here.
SPECIFIC APPLICATIONS
The general issues discussed previously apply to any situation in
which one is contemplating the application of valid information found
in the literature to a specific patient situation. There are, however, other
points to be considered when applying certain types of information to
the patient’s situation. These are considered in turn.
Diagnostic Tests
Once valid information about a diagnostic test has been recovered
from the literature, the practitioner must decide whether the test will be
useful for a given patient. To make this decision, the answers to a few
questions will provide guidance.7
1. Is the diagnostic test available, affordable, accurate, and precise
in this setting?
2. Can a clinically sensible estimate of the patient’s pretest probabil-
ity of disease be generated?
• Can personal experience or prevalence statistics be drawn on?
• Are the study patients similar to this patient?
3. Would the results of the test affect the management and help
the patient?
• Could the results influence the decision to treat the condition?
• Would the patient be a willing partner in the treatment?
First and most sensibly, practitioners must be assured that the test
is available, affordable, accurate, and precise in their setting. The answers
to the first two parts of this question are probably obvious. An electric
pulp tester is easily available and usable at reasonable cost in most
dental offices. On the other hand, computed tomography and the associ-
ated software are less available, and their use is certainly more expen-
sive. The answer to the latter two parts of the question may be less
apparent. A diagnostic test that has performed well in the office of a
general practitioner may not perform as well in a specialist’s office or
university clinic. The reason is that the prevalence of the condition being
tested will probably be different in the two settings. Therefore, the
difference in the rate of false-positive (or false-negative) findings will
change the likelihood ratios of the test. Because the prevalence of the
disease may be a major component of the pre-test probability of disease,
the test may behave differently in different settings. Thus, the interpreta-
tion of the electric pulp test may be different in a general practitioner’s
office and in a university teaching endodontic clinic. Similarly, a report
of a new test that was validated in a tertiary care center must be applied
with caution in a general practice setting.
The appropriate use of a diagnostic test begins with a pretest esti-
mate of the likelihood of disease. This estimate may be no more precise
than the prevalence of the condition in the population. A patient who
presents with throbbing pain and facial swelling, however, raises the
pretest estimate of the likelihood of apical periodontitis beyond the
general prevalence in the population. Even if the diagnosis is a guess,
under these conditions it is a better estimate than simple prevalence.
On the other hand, it is more difficult to generate an estimate of the
pretest likelihood of a malignancy in a young patient with an unexplained
ulcer in the palate. Where a practitioner estimates pretest likelihood,
the most recent or most dramatic previous events encountered
influence his or her judgment.5
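The way a pretest estimate combines with a test result can be sketched with the standard odds form of the calculation: pretest odds multiplied by the likelihood ratio give post-test odds. The likelihood ratio of 5 and the two pretest probabilities below are hypothetical numbers chosen to show how the same test result leads to different conclusions in different settings.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical numbers: the same positive result (likelihood ratio 5) in two settings.
for setting, pretest in (("general practice", 0.10), ("referral clinic", 0.50)):
    print(setting, round(posttest_probability(pretest, 5.0), 2))
```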
The third question Sackett and colleagues pose is whether the results
of the test will change the practitioner’s treatment behavior.7 For exam-
ple, a patient presents with a maxillary lateral incisor consumed by
decay that is sensitive to heat and percussion, with throbbing pain and
swelling above the apex. It is highly unlikely that an electric pulp test
of this tooth will change the treatment behavior of the dentist. In this
situation, the treating dentist has already crossed a decision threshold to
treat the tooth based on other clinical findings, and the electric test will
add no new information. On the other hand, the dentist would not do a
biopsy of a lesion in the palate of a teenager who reported burning his
mouth on a hot pizza the night before. Here, the dentist has crossed
a decision threshold in the other direction, deciding not to test the lesion
because of the invasiveness of the test and the low probability of a
finding that warrants treatment. It is in the area between these extremes,
where the results of a test will influence the treatment behavior, that the
time, cost, and discomfort of a test are warranted.
Finally, if the test is painful or costly, the patient may choose not to
know the results of the test rather than submit to the test. For example,
a patient may be reluctant to submit to a CT scan with three-dimensional
reconstruction to measure bone volume before the placement of two
implants in his edentulous mandible when a conventional panoramic
film and clinical examination confirm more than enough bone thickness
and height. Clearly, the patient must be a willing participant in the
diagnostic procedure with an expectation of obtaining valuable new
information that will influence the outcome to justify the additional cost
or discomfort.
Prognosis
Whether the information in an article about the prognosis of a
condition should be applied to a specific patient can be decided by
answering questions specific to this type of article:
1. Will the results lead directly to selecting or avoiding treatment
for an individual patient?
2. Are the results useful for reassuring or counseling patients?
Knowledge of the natural history of a condition clearly will influ-
ence the decision to select or avoid treatment. For example, with the
clarification of the prognosis of juvenile periodontitis,8 treatment can
be more focused and aggressive. On the other hand, an article by
de Leeuw et al3 suggests that in patients with osteoarthrosis and reduc-
ing temporomandibular joint disk displacement, the prevalence of pain
dropped from 43% to only 17% in 2 to 4 years and dropped further to
only 2.4% after as much as 30 years. It would be difficult to suggest
invasive surgical treatment in the face of this information. So, in addition
to the general issues noted at the start of this article, readers of an article
that describes a prognosis must ask whether the results will lead directly
to selecting or avoiding treatment for an individual patient. Unless the
results of an article about prognosis can be used in this way, it is unlikely
that the results will have any application to the individual patient.
In situations such as temporomandibular disorders, providing infor-
mation for the patient may be enough treatment. Simply giving the
patient some understanding of the natural history of the condition can
do much to relieve anxieties by providing realistic expectations. A sec-
ond question for the reader of articles about prognosis, then, is, ‘‘Are
the results useful for reassuring or counseling patients?’’
Therapy
In dentistry, numerous articles advocate improved techniques or
materials over existing therapies. It is not always appropriate to apply
the results of every therapeutic improvement to every patient, even if
the evidence was found to be compelling when critically appraised.
Certain questions specific to articles about therapy will help determine
when to apply improvements to patients and when not to:
1. Are the results reported as outcomes that are important to pa-
tients?
2. Were all clinically important outcomes reported?
3. Are the likely treatment benefits worth the potential harms and
costs?
Evidence for the improvement usually takes the form of increased
longevity (such as implants or fixed partial dentures), reduced numbers
of failures (such as tooth loss), or improvement in subjective parameters
(such as comfort or chewing ability). All of these outcomes are important
to patients. Sometimes, when these outcomes are rare or take a long
time to realize, substitute outcomes, such as attachment loss, bleeding
on probing, and mobility, are used to predict those events that are
important to patients. The use of these surrogates is reasonable and
expedient only to the extent that they do, in fact, predict the events that
are important to patients. A meta-analysis presented recently2 suggested
that guided tissue-regeneration procedures would result in a mean in-
crease in attachment level of 4.0 mm. This result is impressive, but the
application of this information to an individual patient requires that an
increase in attachment level predicts greater tooth longevity—an out-
come more likely to be of interest to the patient than the level of
attachment. If this link has been previously established, this information
is meaningful; if this link has not been established, the usefulness of this
information is limited, even though it is based on a meta-analysis (a
strong design). A further problem was that the underlying studies used
in the meta-analysis were limited to 1 year of follow-up. The reader of
articles that report the results of trials of therapy must therefore be sure
that the outcomes reported are important to patients and not merely
surrogates lacking in predictive value.
To ensure predictive value, it is important that all clinically im-
portant outcomes have been reported in the article. In a randomized
trial of the efficacy of flurbiprofen taken for 3 months after implant
surgery in reducing alveolar bone loss around implants, Jeffcoat and
others4 noted that two patients had to be withdrawn from the study, one
because of stomach upset and another because of a decrease in red blood
cell counts thought to be related to the medication. The trial found a
statistically significant reduction in bone loss between the third and sixth
months after surgery, but at no other time up to 1 year. The difference
in bone mass lost was between 8.6 and 12 mg. The reader therefore must
consider whether the additional risks involved in that dosage of the
drug are worth the benefit of saving those few milligrams of bone.
The significance of saving those few milligrams of bone must also be
considered. The clinician must balance the potential benefits of the
treatment against the potential harms or costs of the treatment. The
information presented in the article informs but does not dictate the
clinician’s decision to apply the findings to the particular patient.
SUMMARY
It should be evident by now that evidence based dentistry leaves
much room for the application of clinical judgment to the literature. This
article points out that judgment in evaluating certain factors is essential
and that the practice of evidence based dentistry is not a process of
blindly following the conclusions found in the literature. Clinicians can
safeguard the patient and themselves against the inappropriate use of
weak or irrelevant evidence in the conduct of daily practice. Skill in
critical appraisal adds confidence to decision making in clinical practice
and helps prevent a decline in skills over the course of a career.
References
1. Boerrigter EM, van Oort RP, Raghoebar GM, et al: A controlled clinical trial of implant-
retained mandibular overdentures: Clinical aspects. J Oral Rehabil 24:182–190, 1997
2. Bragger U: Evidence Based Outcomes of Periodontal Therapy: Clinical Decision Making
in Prosthodontics and the Impact of Implants. Bern, Switzerland, 2001
3. de Leeuw R, Boering G, Stegenga B, et al: Symptoms of temporomandibular joint
osteoarthrosis and internal derangement 30 years after non-surgical treatment. Cranio
13:81–88, 1995
4. Jeffcoat MK, Reddy MS, Wang IC, et al: The effect of systemic flurbiprofen on bone
supporting dental implants. J Am Dent Assoc 126:305–311, 1995
5. Kassirer JP, Kopelman RI: Cognitive errors in diagnosis: Instantiation, classification, and
consequences. Am J Med 86:433–441, 1989
6. Nyman S, Lindhe J: A longitudinal study of combined periodontal and prosthetic
treatment of patients with advanced periodontal disease. J Periodontol 50:163–169, 1979
7. Sackett DL, Straus SE, Richardson WS, et al: Evidence-Based Medicine. How to Practice
and Teach EBM, ed 2. Edinburgh, Churchill Livingstone, 2000
8. Zambon JJ, Christersson LA, Genco RJ: Diagnosis and treatment of localized juvenile
periodontitis. J Am Dent Assoc 113:295–299, 1986

Department of Dentistry
University of Toronto
124 Edward Street
Toronto, Ontario M5G 1G6
Canada
e-mail: jim.anderson@utoronto.ca

