Potential errors and misuse of statistics in studies on leakage in endodontics

C. Lucena1, J. M. Lopez1, R. Pulgar1, C. Abalos2 & M. J. Valderrama3

1Department of Conservative Dentistry, School of Dentistry, University of Granada, Granada; 2Department of Conservative
Dentistry, School of Dentistry, University of Seville, Seville; and 3Department of Statistics and Operations Research, Faculty of
Pharmacy, University of Granada, Granada, Spain

Aim To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods.

Methodology The search strategy used the descriptors 'root filling', 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001–2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis.

Results The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases, the chi-square test or parametric methods were inappropriately selected subsequently. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically significant or not significant effect was reported as a main finding, whilst only 1% also presented an estimation of the magnitude of the effect. When the appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the studies.

Conclusions Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology.

Keywords: confidence intervals, endodontics, leakage, statistical bias.

Received 17 April 2012; accepted 18 July 2012

Introduction

The practice of Endodontics involves decision-making on materials, techniques, procedures and treatment options. Today, in the era of evidence-based dentistry, many of these decisions are based on the review of scientific information available on a given topic. One of the main considerations in this process is that the scientific literature should be sound, reliable and unbiased (Eckert et al. 2003). The standardization and quality of reporting of biomedical research is, therefore, an important obligation of the healthcare community.
However, a growing number of articles are emerging in the dental and statistical literature warning of
flaws in research reports (Baccaglini et al. 2010).
According to Kim et al. (2011), 51.5% of 307 articles
published in 10 dental journals between 1995 and
2009 contained at least one statistical error. Most of

Statistics and leakage studies Lucena et al.

these deficiencies are encountered in the areas of Epidemiology, Periodontology, Implantology and Orthodontics (Galgut & O'Mullane 1998, BeGole 2000,
Galgut 2003, Tu et al. 2004, 2005), but until now,
little has been published on the prevalence of flawed
statistical analysis in Endodontics.
The inappropriate application of statistical methods
to research data is an error of sufficient magnitude to
raise serious questions about the validity of the
conclusions reached (Elenbaas et al. 1983). Nevertheless, in dentistry, most of the papers published on this
question have listed only the types of statistical mistakes committed, but have made no attempt to determine whether both the correct and incorrect statistical
tests lead the investigators to the same conclusions.
Fardi et al. (2011) identified the 100 top-cited articles
published in journals dedicated to Endodontics, analysing
the subject areas to highlight noteworthy trends and to
reflect major advances during the last 50 years. This
review shows that one of the main topics covered is leakage. Although the number of citations that an article
receives is not necessarily a measure of its quality, it does
reflect recognition by the scientific community and the
influence of the article in generating changes in dental
practice or further research (Fardi et al. 2011).
Given the importance of the statistical methodology in
terms of the validity of the findings of any research and
the fact that the subject of leakage represents an area of
significant interest within endodontics, the goals of the
present study were to analyse the quality of statistical
methodology used in the literature with respect to leakage in Endodontics, and to compare the results using
appropriate versus inappropriate inferential statistical methods.

Materials and methods

The database selected for the search was the ISI Web of
Science (WoS), using the Science Citation Index
Expanded (SCI-E). The search was limited to the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report
(JCR) for the time interval 2001–2010. For descriptors,
we used the non-MeSH term 'root filling' combined in
pairs with the non-MeSH terms 'microleakage', 'dye
penetration', 'dye leakage', 'polymicrobial leakage' and
'fluid filtration'. The results were filtered according to
the document type 'article', thus excluding other
types of publications, such as proceedings papers and
abstracts for meetings and congresses, because of the
insufficient information provided in these documents.


The records were imported to the bibliography-management program ProCite, version 5 for Windows,
eliminating all duplicates. Two researchers from the
Conservative Dentistry Department of the University of
Granada (Spain) independently screened the database,
selecting only the documents related to leakage in
Endodontics. Doubtful cases were decided by consensus.
A detailed evaluation was made of the articles to
assess the quality of the statistical methodology used
and to find potential pitfalls in the implementation and
reporting of research methodology that may be
encountered during the study design, data management, statistical analysis or documentation of the
statistical tests applied. For this purpose, we used guidelines specified in previous studies (Strasak et al. 2007,
Baccaglini et al. 2010, Clark & Mulligan 2011).

Study design stage

The study-design assessment recorded whether the working
hypothesis was clearly defined, whether an a priori
effect size and sample size were considered, and whether the
participants were randomly assigned to either an exposure treatment or a control group. Next, the method of
evaluating the leakage was recorded: qualitatively
(as success or failure, according to an ordered scale
and at which level of severity) or quantitatively (mm,
percentages, etc.), and the category (qualitative or
quantitative) assigned to the outcome variable leakage.

Data management stage

The description of basic data made by the authors
was examined (the descriptor of central tendency was
recorded as mean, median or both), as well as
whether the distribution of the data (normal or non-normal) had been specified. It was noted whether the
authors stated that they had performed a test of
normality on the data or explicitly
mentioned the distribution. For articles that did not
describe the distribution of data, it was assumed to be
non-normal, unless sufficient information was
provided to reveal that the data were normally
distributed (Qualls et al. 2010).

Data analysis stage

The inferential statistical tests used were noted and
compared against standards (Qualls et al. 2010, Lucena
et al. 2011) used to test categorical or continuous data
(normally or non-normally distributed).

Documentation stage
All statistical methods applied were checked to confirm
whether they were described clearly and correctly, and
ascertained whether the raw data were presented to
enable a reader to recalculate the results presented
(Strasak et al. 2007, Clark & Mulligan 2011). In addition, whether the magnitude of the effect had been
estimated was also considered (by means of confidence
intervals [CIs], odds ratios, etc.).

Statistical re-analysis
For all the studies that, according to the standards
mentioned, had employed inappropriate methods to
analyse their data, and in which the authors had
provided raw data, statistical methods were applied
that were more appropriate to the nature of the data
presented. Finally, to estimate the magnitude of the
effect, the odds ratios or CIs were calculated in all the
studies in which it was possible to apply them.

Results

The search strategy applied retrieved 371 papers, 209 of
which adhered to the predetermined criteria. Therefore,
the final database consisted of 209 original research articles on leakage in Endodontics, published between 2001
and 2010. The number of articles published annually on
this question ranged from 11 to 32, an average
of approximately 21 articles per year, appearing in 20
journals. Three journals accounted for 83% of the total,
whilst the remaining 17% were distributed amongst the other 17
journals. All articles were laboratory studies. These documents
referred mainly to apical leakage (n = 129). Some of
the studies concerned coronal leakage (n = 30) and 50
articles examined both coronal and apical leakage from
root canals.

Study design and data management stages

Table 1 summarizes the frequencies of the most
common errors found in the papers reviewed. In most
studies (n = 200), leakage was assessed using either a
quantitative measurement scale or a qualitative one,
although in nine papers more than one method was
used. Thus, in three cases, both a qualitative scale and
quantitative method were applied, whilst another three
combined two different quantitative methods, two articles compared three different quantitative methods,
and, finally, one study compared four different quantitative methods. Therefore, the total number of leakage
tests considered was 222. Table 2 presents their distribution according to the type of scale of the outcome
variables used to evaluate the amount of leakage.

Table 1 Common mistakes/inconsistencies found in the articles reviewed

Statistical error type

Study design
  No clear a priori statement or description of the Null Hypothesis under investigation
  No a priori sample-size calculation/effect-size estimation (power calculation)
  Failure to use and report randomization or method of randomization not clearly stated
  Failure to report drop-out subjects from the study
  Failure to use and report blinding if possible
  Failure to report initial equality of baseline characteristics and comparability of study groups
Analysis of study data
  Failure to examine/specify for the normality of continuous data
  Failure to use an inferential test, giving only a description of the data
Documentation of study data
  Failure to provide data with enough detail to enable the reader to recalculate all results
  Failure to specify which test was applied
Presentation of study data
  No a priori significance level definition
  Reporting P = NS or P < 0.05 instead of reporting exact P-values
  Reporting median without its respective standard deviation
  Recording Odds Ratio for treatment group differences
  Recording Confidence Intervals for treatment group differences

Where n = studies in which it is possible to apply the criteria.
In five studies there were no data of significance.

Table 2 Evaluation of leakage

Leakage measurement (N = 222)
  0–1 scale (success or failure)
  0–2 scale
  0–3 scale
  0–4 scale
  0–5 scale
  1–3 scale
  Millimetres, micrometres or nanometres
  Percentages or ratios
  Fluid movement (in mL min⁻¹, µL min⁻¹, etc.)
  Bacterial or glucose concentrations (in mmol mL⁻¹, UI mL⁻¹, etc.)
  Mean of days without leakage
  Filtration area (in mm²)
  Electrical conductance (in k)

In some studies more than one type of evaluation was used.

Data analysis stage

Table 3 summarizes the inferential statistical tests
used in the articles. In 26 articles, authors evaluated
the leakage using a qualitative scale and treated the
outcome variable as categorical. In 11 of them (42%),
Fisher's test was correctly applied. In the remaining
15 (58%) articles, the chi-square test was inappropriately employed.
In 14 of the studies reviewed (6% of the total),
leakage was assessed using a qualitative measurement
scale, but these measurements were used to generate
count outcome variables. In 93% of these cases
(n = 13), the authors selected nonparametric tests,
as appropriate for non-normally distributed data, but in 7% (n = 1),
parametric tests of significance were used on non-normal data.
Finally, in 182 cases, leakage was measured quantitatively; the ANOVA test was appropriately applied in
15 (8%) and the Kruskal–Wallis test in 88 (48%). In
the remaining 75 (41%) articles, a parametric test of
significance was applied to data that were non-normally distributed, or a nonparametric test was applied
to normally distributed data. In four studies (2%) with
a quantitative measurement of leakage, no test of statistical inference was used.
In short, the above results show that in 127 cases
(57%) the authors selected an appropriate test of statistical inference, but in the remaining 95 cases, an
incorrect statistical test (41%) or no statistical test
(2%) was applied. The percentage of articles that used
the appropriate test of statistical inference varied
depending on the journal (Table 4).
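As an illustrative check (not part of the original analysis), the counts reported in this section can be tallied with a short Python sketch. The grouping labels and the function name `tally` are assumptions introduced here; only the counts come from the text.

```python
# Counts of leakage tests as reported in the text of this section.
# The dictionary keys are illustrative labels, not terms from the paper.
groups = {
    "ordinal scale treated as categorical": {"appropriate": 11, "inappropriate": 15, "no test": 0},
    "ordinal scale treated as counts": {"appropriate": 13, "inappropriate": 1, "no test": 0},
    "quantitative measurement": {"appropriate": 15 + 88, "inappropriate": 75, "no test": 4},
}


def tally(groups):
    """Sum the counts of each outcome over all groups."""
    totals = {"appropriate": 0, "inappropriate": 0, "no test": 0}
    for counts in groups.values():
        for key, value in counts.items():
            totals[key] += value
    return totals


totals = tally(groups)
n_tests = sum(totals.values())
percentages = {key: round(100 * value / n_tests) for key, value in totals.items()}

print(totals)       # appropriate, inappropriate and untested cases
print(n_tests)      # total number of leakage tests considered
print(percentages)  # rounded percentages of the total
```

The sums agree with the figures quoted above: 127 appropriate, 91 inappropriate and 4 untested cases out of 222 leakage tests.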

Table 3 Statistical analysis applied in 209 studies reviewed(a)

Ordinal scale (n = 40)
  As a categorical variable (n = 26)
    Expected number in a cell < 5 (n = 26)
  As a continuous variable (n = 14)
    Variables without normal approximation (n = 14)
Quantitative measurement (n = 182)
  As a continuous variable (n = 182)
    Variables with normal approximation (n = 16)
    Variables without normal approximation (n = 162)
    Studies with a quantitative leakage measurement in which no test of statistical inference was used

Numbers in italics refer to studies that used the appropriate test of statistical inference.
(a) In nine studies more than one type of evaluation was used.


International Endodontic Journal, 46, 323–331, 2013

Table 4 Distribution of works by journals

J Endod
Oral Surg Oral Med Oral Pathol Oral Radiol Endod
Int Endod J
Quintessence Int
Aust Endod J
J Appl Oral Sci
J Can Dent Assoc
Am J Dent
Dent Mater J
J Biomed Mater Res B Appl Biomater
J Dent
J Oral Rehab
Dent Mater
Eur J Oral Sci
J Adhes Dent
J Biomater Appl
J Biomed Mater Res A
J Am Dent Assoc
J Prosthet Dent
Med Oral Patol Oral Cir Bucal

Number of articles on leakage in Endodontics published in the time interval 2001–2010.
Percentage of articles that used the appropriate test of statistical inference.

Documentation stage
Amongst the 91 cases that had selected an inappropriate test of statistical inference, only 16 provided
raw data enabling a statistical re-analysis to be performed. With respect to the presentation of the study
results, only three of the 209 articles reviewed provided a measure of the magnitude of the difference,
by means of an odds ratio (two articles) or CIs (one article).

Statistical re-analysis
The change of the statistical methodology with
respect to the one previously applied (n = 16) gave
results that differed from those originally presented in
11 studies (69% of the sample) and represented a
substantial alteration of the final conclusions in three
of them (19% of the recalculated studies). In eight of
the recalculations (50%), a lower P-value was
obtained, although the initial conclusions remained
valid. In five cases (31% of the sample), it was not
possible to compare the results generated in the
present study with the original ones, because the
authors had not provided exact P-values (only
reporting P = NS or P < 0.05).
Of the 91 studies that had selected inappropriate
statistical tests, none provided data with enough
detail to allow CIs to be calculated, whilst in 10 cases,
it was possible to calculate the odds ratio. Amongst
these, four were negative studies based on significance
testing (studies that produced statistically nonsignificant results) and six were positive (studies producing
statistically significant results). The re-analysis of
these negative studies focused on interpreting the
odds ratio rather than P-values and revealed that the
intervention had been beneficial in one study. In this
latter case, the application of the appropriate statistical test predicted this effect, giving statistically
significant results (P = 0.021). In the remaining three
cases, the odds ratio did not reflect beneficial intervention effects. Amongst six studies with positive results,
the odds ratio calculation revealed that the significant
differences were clinically relevant in five studies,
whilst in the remaining one the statistically
significant effect was minor and of negligible clinical
relevance.
Amongst 127 studies that had selected appropriate
statistical tests, none provided data with enough
detail to allow CIs to be calculated, and only in three
cases was it possible to calculate the odds ratio. The
interpretation of the results on the basis of calculating
the odds ratio in the studies with statistically significant differences (n = 2), revealed significant treatment
effects. In the remaining paper that reported negative
results based on significance testing, no substantial
changes with regard to the original interpretation of
the results were necessary after calculating the odds
ratio.
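The odds-ratio calculations described in this section can be sketched as follows. This is a minimal illustration, assuming a 2 x 2 table of leaking and non-leaking specimens and the Woolf (logit) confidence interval; the function name `odds_ratio_ci` and the example counts are hypothetical and are not taken from any of the reviewed studies.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a, b: leaking / non-leaking specimens in group 1
    c, d: leaking / non-leaking specimens in group 2
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid
        # division by zero when a cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical data: 12 of 20 specimens leak with material A, 5 of 20 with B.
or_ab, lower_ab, upper_ab = odds_ratio_ci(12, 8, 5, 15)
print(f"OR = {or_ab:.2f}, 95% CI {lower_ab:.2f} to {upper_ab:.2f}")
```

An interval that excludes 1 points to a difference between groups, and its width conveys how precisely the size of the effect has been estimated, which a P-value alone does not.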

Discussion

Studies of leakage in Endodontics have a long and
controversial history, with both supporters and
detractors (Oliver & Abbott 2001, Susini et al. 2006,
Editorial Board of the Journal of Endodontics 2007,
De-Deus 2008). Thus, the relationship of in vitro
leakage to clinical success or failure of endodontic
treatment has been widely debated (Oliver & Abbott
2001). Furthermore, the inconsistency apparent
amongst the results of different studies means that
the value of virtually any endodontic technique may
be supported or disputed (Editorial Board of the
Journal of Endodontics 2007).

However, the sealing ability of root fillings is still
important and this parameter should continue to be
used as a factor to rank and evaluate new materials
(De-Deus 2008, Wu 2008). Consequently, there is a
need to try to create a standard method that is reliable
and reproducible and that relates to clinical outcomes
(Editorial Board of the Journal of Endodontics 2007).
Concerning this point, it is important to note that
the reliability of any research depends on the internal
validity of the study, which refers to the qualitative
characteristics of the methodology employed (Pandis
et al. 2011). Several papers have discussed methodological aspects that can influence the outcome of
in vitro leakage (Pommel et al. 2001, Camps & Pashley 2003, Karagenc et al. 2006, Rechenberg et al.
2011). Given the importance of the statistical methodology in terms of the validity of the findings of any
research, it is particularly striking that, to date, no
study has determined the prevalence or the exact
implications of possible errors or misuse of statistics
on the validity of these studies.
In the present review, the authors have found a
high prevalence of statistical errors, most of which
are committed during the planning stage and the statistical-analysis phase (Table 1). It should be noted
that errors in the analysis phase can be corrected if
the study design is sound, but flaws in the study
design can lead to data that are unusable.
A recent review (Pandis et al. 2011) concluded that
reporting on sample-size calculations in dental
research is low, ranging from 0% to 28% amongst
dental speciality publications. In this study, only 2%
of the studies employed sample-size estimation. However, an appropriate sample size is crucial for high-quality statistical work, to detect any differences
between study groups. Inadequate sample size often
leads to type-II errors, in which the null hypothesis is
falsely accepted. Thus, a previous study (Schuurs
et al. 1993) concluded that the value of many
endodontic leakage studies is limited because of the
inadequate power of the statistical test applied, as a
result of sample sizes that are unjustifiably small.
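To illustrate what an a priori sample-size calculation involves, the sketch below uses the standard normal-approximation formula for a two-sided comparison of two means, n = 2((z_alpha + z_beta)/d)^2 per group, where d is the standardized difference. The function name `n_per_group` and the default values are assumptions made for the example; an exact calculation based on the t distribution gives slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate number of specimens per group for a two-sided,
    two-sample comparison of means (normal approximation).

    effect_size: standardized difference between group means (Cohen's d)
    alpha: two-sided significance level
    power: desired probability of detecting the difference (1 - beta)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Group sizes needed for a moderate and a large standardized difference.
for d in (0.5, 0.8):
    print(d, n_per_group(d))
```

Under these assumptions, about 63 specimens per group are needed to detect a moderate difference (d = 0.5) with 80% power, and about 25 per group for a large one (d = 0.8).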
With respect to the analytical phase, the present
data are in agreement with the findings of previous
studies (Altman 1991, Tu et al. 2004, 2005, Abt
2010a,b) that one of the most frequent methodological errors is to select an incorrect statistical test. For
example, the chi-squared test is applied to analyse
ordinal-scaled data that appear as frequencies or
number of observations in every category. Application
of this test requires observations to be independent,
and to ensure a reasonably good approximation no
more than 20% of the cells should have an expected
cell frequency < 5 (Kim et al. 2011). To satisfy these
requirements, a sufficiently large sample size is necessary. However, 15 articles were found to violate the
above conditions. In these cases, Fisher's test or the
chi-squared test with Yates' correction would have
been more appropriate options.
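The expected-frequency condition and the Fisher alternative can be sketched in plain Python. The function names, the threshold arguments and the example table are assumptions introduced for illustration; the two-sided P-value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def expected_counts(table):
    """Expected cell frequencies under independence for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_is_reasonable(table, threshold=5, max_share=0.20):
    """True if no more than 20% of cells have an expected frequency below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < threshold)
    return small / len(cells) <= max_share


def fisher_exact_2x2(table):
    """Two-sided Fisher exact P-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_observed = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Hypothetical example: 10 specimens per group, leakage yes/no.
table = [[8, 2], [3, 7]]
print(expected_counts(table))
print(chi_square_is_reasonable(table))
print(round(fisher_exact_2x2(table), 4))
```

In this hypothetical table, two of the four expected frequencies are 4.5, so the chi-squared approximation is not reasonable and the exact test is the safer choice.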
This review also shows that one-way analysis of
variance is one of the most egregiously misused
statistical methods (Kim et al. 2011). For the proper
use of a one-way ANOVA, the populations must be
independent, normally distributed and have the same
variance. With respect to the issue of distribution,
normality should always be tested before choosing between parametric and nonparametric
tests. Nevertheless, by the central limit theorem, a non-normal distribution of the variable does not always
require rejection of the parametric test (Cohen
2001): even if the variables do not follow a
normal distribution, their means may do so, provided
that the sample has been taken randomly from the
population and the sample size is sufficiently large.
When these conditions are not satisfied, a transformation of the variable would make sense. This could
lead to an approximately Gaussian distribution, and
the variances would equalize. Examples of such transformations are the square-root transformation for
Poisson data, the Box-Cox transformation for regression analysis, and the arcsine transformation for
proportions. Another common situation in practice
occurs when there are several independent Gaussian
variables, and one has a different variance. In this
situation, the procedure required is to perform, first,
an ANOVA amongst the remaining variables and then
to apply a hypothesis test between the means of two
variables with different variances, using the Welch
approximation to degrees of freedom. However, as
shown in Table 3, in 75 of the 196 cases with count
variables, these conditions were not met, and
consequently, the statistical methodology applied was
considered to be incorrect.
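The Welch approximation mentioned above can be written out as a short sketch. The function name `welch_t` and the leakage values are hypothetical; the code returns the test statistic and the Welch-Satterthwaite degrees of freedom, and a P-value would then be read from the t distribution with that number of degrees of freedom (not computed here, because the Python standard library has no t distribution).

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard errors of the two sample means.
    v1, v2 = variance(sample1) / n1, variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical dye-penetration depths (mm) for two sealers.
sealer_a = [1.0, 2.0, 3.0, 4.0, 5.0]
sealer_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, df = welch_t(sealer_a, sealer_b)
print(round(t_stat, 3), round(df, 2))
```

With unequal variances the degrees of freedom fall below the n1 + n2 - 2 of the pooled-variance test, which makes the comparison more conservative.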
The article that used a nonparametric test on
normal data has also been classified as incorrect
because in this case a parametric test, being more
powerful and versatile, would have been the correct
choice.
This survey shows the important influence of the
statistical test adopted, with respect to the results and
interpretation of any research, given that a change in
the conclusions was necessary in 19% of the studies
that were statistically re-analysed. In addition, even when
both the correct and the incorrect statistical tests lead
to the same conclusion, the P-values will
differ (in this review, this circumstance arose in
50% of the recalculations). Therefore, the misuse of
statistics in data analysis may lead to erroneous
conclusions and make the research findings difficult
to replicate. This highlights the need for the raw data
to be reported, at least during the editorial review
process, prior to the acceptance of an article. The
final decision on acceptance should be taken only
after screening by a statistical expert provided by the
Journal. Journals could also publish online extended
versions of articles including the entire body of data as
supporting information, thus allowing readers to
perform alternative analyses.
In relation to the presentation and interpretation
of study results, in 99% of the articles reviewed, only
P-values were reported and used to draw conclusions concerning intervention effectiveness. However, it has long
been recognized that over-reliance on P-values is often
misleading (Rothman 1978, Pandis et al. 2011). Thus,
an intervention may be found to be not statistically significant because of inadequate sample size, even though,
if the intervention were applied to the population, it
might have a very important clinical effect. Conversely, a
statistically significant effect could be small and of very
little clinical importance (Petrie et al. 2002). A recent
publication (Vavken et al. 2009) showed that a high
proportion of statistically significant results (31%) do not
reflect relevant treatment effects.
In this context, a more appropriate presentation of
the results would be to quote the exact P-values and
the CIs. Thus, the reader would have the possibility of
interpreting the results from a broader perspective
(Gardner & Altman 1986, Goodman 1999, Abt 2011,
Pandis et al. 2011). Nevertheless, in accordance with
previous findings (Pandis et al. 2011), it is crucial to
note that there is very limited adoption of CI reporting
in the literature on leakage in Endodontics. In this
review, only one of 90 studies that used parametric
tests included CIs. This pattern of data interpretation
may have important implications for the implementation of research findings in clinical practice (Pandis
et al. 2011, Polychronopoulou et al. 2011).
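As a sketch of reporting an effect estimate together with its CI and exact P-value, the following uses a large-sample normal approximation for the difference between two means. The function name `mean_difference_ci` and the data are hypothetical; with the small groups typical of leakage studies, t-based quantiles would be needed and would give a wider interval.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_difference_ci(sample1, sample2, level=0.95):
    """Difference in means with a normal-approximation confidence
    interval and two-sided P-value (large-sample sketch only)."""
    diff = mean(sample1) - mean(sample2)
    se = math.sqrt(stdev(sample1) ** 2 / len(sample1)
                   + stdev(sample2) ** 2 / len(sample2))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return diff, (diff - z * se, diff + z * se), p_value


# Hypothetical leakage measurements (mm) for two filling techniques.
technique_a = [1.0, 2.0, 3.0, 4.0, 5.0]
technique_b = [2.0, 4.0, 6.0, 8.0, 10.0]
diff, (ci_low, ci_high), p = mean_difference_ci(technique_a, technique_b)
print(f"difference = {diff:.1f} mm, "
      f"95% CI {ci_low:.2f} to {ci_high:.2f}, P = {p:.3f}")
```

Here the interval spans zero yet extends to a difference of about 6 mm, so a report of 'not significant' alone would hide the fact that a large effect has not been ruled out.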
In this study, amongst all the papers reporting
positive results on the basis of the inferential test, the
use of the odds ratio reflected large treatment effects
in 75% of them. Conversely, amongst five studies
with negative results (no significant differences), the
use of the odds ratio indicated that the intervention
was probably clinically relevant in only one case
(20%). Therefore, it was verified that the use of the
odds ratio could help avoid an erroneous interpretation in one study, at least.
The limitations of this review include the fact that
only a specific topic (leakage) was included, perhaps
introducing selection bias; however, articles from
20 different journals were examined, and therefore,
the inclusion of more topics, although this would
have increased outcome precision, would not be
expected to alter the conclusions.
The availability of statistical software packages and
the lack of a system to validate the competence of
persons who perform the statistical analysis could
explain this prevalent misuse of statistics in dental
research (Altman et al. 2002, Kim et al. 2011).
Although the errors in research methods are mainly the
authors' responsibility, a clear attitude taken by the
editorial boards of dental journals is also required to
minimize this problem in forthcoming years. Authors
and editors should have the same goal, to seek the highest standards of science. Therefore, researchers may
benefit greatly from involving expert statistical help at all
stages of a research project. Editorial boards could also
contribute by creating detailed guidelines on how to conduct and report statistical methods, by applying stronger
statistical reviewing policies and by encouraging the publication of the full and transparent reporting of research
(i.e. extended versions of articles including the raw data
used in research articles) (Altman 2002). If journals are
willing to implement this policy, they should suggest it
explicitly to authors. With respect to the consequences of
statistical mistakes, as they can go unnoticed for a long
time, their impact is not easy to quantify.

Conclusions

Statistical deficiencies found in studies of leakage in
Endodontics may affect their results and interpretation
and might be one of the reasons for the poor agreement
amongst the reported findings. The standardization of
statistical methodology should be more widely implemented and assistance from a statistical expert is highly
recommended.

Acknowledgements

We thank David Nesbitt and Glenn Harding for the
English version of the text. This research was supported
by Ministerio de Ciencia e Innovacion (Spain) grant
