Annals of EM Gaddis Biostat

SPECIAL CONTRIBUTION
biostatistics
Introduction to Biostatistics:
Part 1, Basic Concepts
Statistical methods commonly used to analyze data presented in journal
articles should be understood by both medical scientists and practicing
clinicians. Inappropriate data analysis methods have been reported in
42% to 78% of original publications in critical reviews of selected medical journals. The only way to halt researchers' misuse of statistics and
improve the clinician's knowledge of statistics is through education. This
is the first of a six-part series of articles intended to provide the reader
with a basic, yet fundamental knowledge of common biomedical statistical methods. The series will cover basic concepts of statistical analysis,
descriptive statistics, statistical inference theory, comparison of means,
X2, and correlational and regression techniques. A conceptual explanation
will accompany discussion of the appropriate use of these techniques.
[Gaddis ML, Gaddis GM: Introduction to biostatistics: Part I, basic concepts. Ann Emerg Med January 1990;19:86-89.]
INTRODUCTION
The practicing physician must remain abreast of new information and
techniques and meet standards for continuing medical education. A major
source of new information is the medical journal. Given the competition
involved in achieving publication of research, as well as the review and
editing process, the reader might justifiably assume that published studies
are correct in their methodology, regardless of the conclusions made. Although this assumption seems reasonable, it is not correct. Between 1979
and 1984, 42% to 78% of original publications from selected medical journals used inappropriate statistical analysis methods. 1-5
Statistical analysis involves organization and mathematical manipulation of data. This process can describe characteristics studied and help to
infer conclusions from the data, thus guiding the acceptance or rejection of
a given treatment or theory. Incorrect data analysis is a grave error in the
research process, often leading to inappropriate conclusions, continued
study of erroneous hypotheses, and curtailed study of viable therapies and
therapeutic adjuncts. Additional dangers ensue when a physician uses nonefficacious treatment on a patient. From this, it becomes apparent that a
basic knowledge of statistics can be an important tool for any clinician,
whether in performing research or simply reading about it.
However, statistical analysis of data is a task commonly delegated to
statistical consultants. This is often justified by citing that there are those
more qualified to perform this function than the principal investigator of a
study. Though this seems logical, the principal investigator remains the
individual ultimately responsible for the content and conclusions of a research project. The principal investigator cannot effectively meet this
obligation fully if he does not have an adequate working knowledge of
biomedical statistics. The investigator cannot plead innocence through ignorance when serious errors are made.
Thus, there is an obvious need for statistical education among clinicians,
not only to provide for a better understanding when reading the biomedical
literature but also to aid medical researchers in communication with consulting statisticians and for selection of appropriate data analysis techniques.
19:1 January 1990
Annals of Emergency Medicine
Monica L Gaddis, PhD

Gary M Gaddis, MD, PhD
Kansas City, Missouri
From the Departments of Surgery and
Emergency Health Services, Truman
Medical Center, University of Missouri,
Kansas City.
Received for publication September 1,
1989. Accepted for publication October 4,
1989.
Address for reprints: Monica L Gaddis,
PhD, Department of Surgery, Truman
Medical Center, 2301 Holmes, Kansas
City, Missouri 64108.
86/143
BIOSTATISTICS
Gaddis & Gaddis
FIGURE I. A bell-shaped or normal
distribution curve.
FIGURE 2. Frequency distributions:
A, bimodal; B, rectangular; C, positively skewed; D, negatively skewed.
(Hopkins KD, Glass GV: Basic Statistics for the Behavioral Sciences. Englewood Cliffs, N e w Jersey, PrenticeHall Inc, 1978, p 36.)
S T A T E M E N T OF PURPOSE
It is our purpose to present a sixpart series discussing the basics of
biomedical statistics, with the intent
to familiarize the reader with the terminology and appropriate use of data
analysis techniques commonly used
in original papers published in Annals of Emergency M e d i c i n e and
o t h e r c l i n i c a l m e d i c a l journals.
Learning will be directed toward a
conceptual understanding of statistical analysis m e t h o d s rather than
computational exercises.
Part 1, in this issue of Annals, presents some basic concepts of statistical analysis. Knowledge of these
building blocks is necessary for a
complete understanding of biomedical statistics and the topics of future
articles in this series.
Part 2 will address descriptive statistics. These include measures of
central tendency (mean, median, and
mode), measures of variability (standard deviation and standard error of
the mean), and confidence intervals.
Appropriate uses of these will be discussed.
Part 3 will introduce statistical inference theory. An explanation of the
concept of hypothesis testing, definition of the probability value P, a discussion of the terms alpha, beta, and
power, and a discussion of clinical
versus statistical significance will be
included. Sensitivity, specificity, and
p r e d i c t i v e value will also be addressed.
Part 4 will present parametric and
nonparametric methods used for the
comparison of means. Included will
be analysis of variance (ANOVA), the
Student's t test, the Mann-Whitney
U test, and m e t h o d s of multiple
comparisons.
Part 5 will present a discussion of
X2 and Fisher's exact tests. Both statistical methods should be understood by readers of biomedical research involving a study of the effic a c y of e x p e r i m e n t a l m e d i c a l
treatments.
144/87
B,
2
Part 6 will give a basic discussion
of correlation and regression, along
with the series' concluding remarks.
SAMPLE V E R S U S
POPULATION
It is clearly impossible for most
scientific studies to survey all individuals of a group about which conclusions are to be drawn. Cost and
time considerations force the researcher to settle for studying a subset of a group in order to form conclusions about the entire group. For
example, if one wished to know the
systolic, diastolic, and mean blood
pressures of the entire population of
the United States, this testing could
be done as part of the next census, in
which all members of the population
would be surveyed. However, the
cost of training census takers for this
time-consuming task and the need to
hire additional census takers because
of the increased time required to
check each individual would make
sampling the entire population an
impractical task. Similarly, in biomedical research, time and financial
costs preclude study of every member of the population.
However, if a representative sample of an appropriate population can
be obtained and studied, conclusions
about the sample can be properly extrapolated to the defined population.

The key to obtaining a representative
sample of that population lies in random selection of a study sample from
the applicable population. Randomization can be accomplished by sample subject selection using a random
number table, drawing numbers or
names out of a hat, or the like. Random sampling implies that all individuals in a population have an equal
chance to be included in the sample.
It is when random allocation of
study subjects to treatment groups is
violated that bias is introduced. Bias
can easily lead to erroneous conclusions because control and treatment
groups may inherently differ in relevant characteristics before the study
is initiated. Therefore, post-treatment differences between groups can
erroneously be ascribed to an effect
of the experimental treatment! Improper allocation of subjects to c o n trol and treatment groups remains a
significant problem confounding current biomedical research.
DATA SCALES
Before an analysis method can be
selected, the type of data that will be
generated by the research process
must be defined. Data will fit a nom19:1 January 1990
F I G U R E 3. Summary of biostatisti-
Statistical The organization and mathematical manipulation of

analysis data, used to describe characteristics studied and/or
Population
to help infer conclusions from the data

A large group possessing a given characteristic or set
of characteristics. A population may be finite (the
states of the United States) or infinite (blood pressure
measurements of all infants born in New York)6
The studied subset of members of a defined

population 6
The
process of selection of a sample from a
Random sample
population whereby each member of the population
has an equal and independent chance of being
chosen 6
Nominal scale Numbers are arbitrarily assigned to characteristics for
data classification
Ordinal scale Numbers are used to denote rank-order, without
defining a magnitude of difference between numbers
Interval scale Numbers denote units of equal magnitude as well as
rank order on a scale without an absolute zero
Ratio scale Numbers denote units of equal magnitude as well as
rank order on a scale with an absolute zero
Distribution The systematic organization of a collection of data
Normal A grouping of data that is graphically symmetrical and
distribution bell shaped. Many human anatomic and physiologic
characteristics are normally distributed
Parametric Used when the data studied are from a sample or
methods population that is normally distributed. Data must be
of an interval or ratio scale
Sample
Nonparametric
methods
Used when the data studied are from a sample or

population that deviates from a normal distribution.
Ordinal data are analyzed using nonparametric
methods of analysis
inal, ordinal, interval, or ratio scale.
Nominal Scale
N o m i n a l is the m o s t p r i m i t i v e of
t h e data scales. I n f o r m a t i o n classified
b y an a s s i g n e d n u m b e r or code to
m a k e the data n u m e r i c fits a n o m i n a l scale. For instance, gender m a y
be defined as 1 for " f e m a l e " and 2 for
" m a l e . " Clinical diagnoses m a y also
be assigned r e p r e s e n t a t i v e n u m b e r s ,
s u c h as 1 for " r e n a l failure," 2 for
"congestive heart failure," 3 for "diabetes," and so forth. It m u s t be emphasized that the numbers selected
are purely arbitrary and are chosen at
the discretion of the researcher witho u t regard to any order of ranking of
severity.6, 7
Ordinal Scale
Ordinal scale data can be ranked in
a specific order, be it low to high or
h i g h to low. A n e x a m p l e w o u l d be
data from a questionnaire in w h i c h a
19:1 January 1990
response of "strongly agree" is scored

as 5, "agree" is scored 4, "no opini o n " is scored 3, "disagree" is scored
2, and "strongly disagree" is scored 1.
In t h i s e x a m p l e , t h e r e s p o n s e s are
s c o r e d on a c o n t i n u u m , w i t h o u t a
c o n s i s t e n t level of m a g n i t u d e of differences between ranks. However,
u n l i k e the n o m i n a l scale, n u m b e r s of
an ordinal scale are not arbitrary because the order of n u m b e r s is meaningful. 6 For instance, in the above example, p r o g r e s s i v e l y larger n u m b e r s
i n d i c a t e p r o g r e s s i v e l y greater agreement with the question presented.
T h e G l a s g o w C o m a Score s allots a
progressively decreased classification
number that implies progressively
worsened obtundation. Other examp l e s of o r d i n a l s c a l e s f a m i l i a r to
many emergency physicians would
i n c l u d e t h e T r a u m a Score 9 and the
Injury Severity Score. 1
A v e r a g e v a l u e s of o r d i n a l s c a l e
data are often calculated but are usu-
cal terms.
ally misleading because of the lack of
any c o n s i s t e n t m a g n i t u d e of differe n c e b e t w e e n u n i t s of t h e scale, u
Also, it is n o t u n c o m m o n in t r a u m a
o u t c o m e studies to see such data reported with standard deviation
values. C a l c u l a t i o n of such statistics
f r o m n o n p a r a m e t r i c , and t h u s nonn o r m a l l y d i s t r i b u t e d data, is h i g h l y
questionable because use of t h e standard deviation assumes that the data
are n o r m a l l y distributed. H
Interval Scale
Interval scale data are a step m o r e
s o p h i s t i c a t e d t h a n ordinal scale data.
N o t only is there a p r e d e t e r m i n e d order to the n u m b e r i n g of the scale, but
also t h e r e is a c o n s i s t e n t l e v e l of
m a g n i t u d e of difference described bet w e e n the observed data units. 6 Interval data also have a clearly defined
u n i t of measure. However, " t h e zero
p o i n t on t h e scale is arbitrary, and
does n o t c o r r e s p o n d to a t o t a l absence of t h e characteristic m e a s u r e d
. . . . ,,6 T h e Farenheit scale for temperature is interval in nature, as the
n u m b e r i n g of the scale is consistent,
y e t the zero v a l u e is arbitrary. Because of t h e consistent n u m b e r i n g of
t h e scale a n d e q u a l m a g n i t u d e bet w e e n m e a s u r e m e n t u n i t s , average
values and measures of variability of
i n t e r v a l scale d a t a are m e a n i n g f u l .
A n interval scale can be converted to
an ordinal scale to show ranking, but
an ordinal scale ordinarily cannot be
converted to an interval scale. 6
Ratio Scale
T h e ratio scale is s i m p l y an interval scale w i t h an a b s o l u t e zero. 6 A
p r e d e t e r m i n e d order to the numbering of the scale is present, as is a cons i s t e n t level of m a g n i t u d e b e t w e e n
each u n i t of measure. Ratio data can
be c o n v e r t e d to o r d i n a l data. H e a r t
rate, blood pressure, distance, time,
and degrees Kelvin r e p r e s e n t exampies of ratio scale data.
DISTRIBUTIONS
O n c e d a t a are collected, t h e y can
be organized into a d i s t r i b u t i o n , or
g r a p h of f r e q u e n c y of o c c u r r e n c e .
T h i s is a v i s u a l l y d e s c r i p t i v e t o o l
that allows the researcher to begin to
define and analyze data.
Figure 1 is a theoretical frequency
d i s t r i b u t i o n of resting h e a r t rate of
88/145
BIOSTATISTICS
Gaddis & Gaddis
e m e r g e n c y p h y s i c i a n s . Its shape is
s y m m e t r i c a l and bell-shaped, and is
defined as the " n o r m a l distribution."
M a n y h u m a n a n a t o m i c and physiologic c h a r a c t e r i s t i c s approach the
normal distribution.6 Other terminology used synonymously with
"normal distribution" includes
G a u s s i a n curve, curve of error, and
n o r m a l probability curve. A n understanding of the characteristics of the
n o r m a l d i s t r i b u t i o n is f u n d a m e n t a l
i n the d e v e l o p m e n t of even a basic
l~nowledge of biostatistics. This topic
will be discussed in greater detail in
Part 2 of this series.
Other distributions are depicted in
Figure 2. 6 Bimodal distributions have
two peaks of cluster, or areas w i t h a
high frequency level. For example, if
the weights of A m e r i c a n adults were
plotted, there w o u l d be two definitive points of cluster, one for female
weight and one for male weight.
Data that are rectangularly distributed show equal frequency of occurrence for all levels of a characteristic.
The date of birth (month and day) of
a sample of all p a t i e n t s seen i n an
e m e r g e n c y d e p a r t m e n t approaches
this distribution pattern.
Skewed data are those that tail off
to either the high or low end of meas u r e m e n t units. The a n n u a l i n c o m e
of patients seen i n the ED of an inner-city c o m m u n i t y hospital will be
defined by a distribution that is posi t i v e l y skewed, s h o w i n g a high frequency of lower a n n u a l incomes. A
negatively skewed distribution has a
cluster of data on the high end of the
u n i t scale and tails off toward the
low end. 6
G i v e n the nature of data collected
in medical science research, the norm a l d i s t r i b u t i o n is t h e o n e w i t h
which the clinician will be most familiar. Most statistical methods ap-
146/89
plied to interval or ratio data assume

t h a t the data are n o r m a l l y distributed. It is w h e n this a s s u m p t i o n is
violated that significant controversy
exists i n the statistical c o m m u n i t y
regarding t h e proper a p p l i c a t i o n of
statistical tests.
PARAMETRIC VERSUS
NONPARAMETRIC METHODS
Statistical methods are defined as
being p a r a m e t r i c or n o n p a r a m e t r i c .
T h e type of a n a l y s i s m e t h o d t h a t
should be selected is d e p e n d e n t on
the nature of the data to be analyzed.
Parametric methods use data extrapolated from a sample of the population studied to n u m e r i c a l l y describe
some characteristic of a population.
P a r a m e t r i c m e t h o d s are v a l i d o n l y
w h e n t h a t characteristic follows or
nearly follows the n o r m a l distribut i o n i n t h e p o p u l a t i o n s t u d i e d . 12
Thus, parametric methods can be applied properly to m o s t i n t e r v a l and
r a t i o scale data w h e n t h o s e data
come from a sample of a n o r m a l l y
distributed population. Parametric
methods can be used to derive measures of c e n t r a l t e n d e n c y and variability, w h i c h will be discussed i n
Part 2 of the series.
N o n p a r a m e t r i c m e t h o d s are app l i e d to n o n - n o r m a l l y d i s t r i b u t e d
data and/or data that do not meet the
c r i t e r i a for t h e u s e of p a r a m e t r i c
m e t h o d s . D a t a t h a t fit the o r d i n a l
scale d e f i n i t i o n should be analyzed
by n o n p a r a m e t r i c m e t h o d s . Examples i n c l u d e Glasgow C o m a Scale, 8
T r a u m a Score, 9 the Injury Severity
Score, lo a n d o t h e r s i m i l a r o r d i n a l
scales data.
SUMMARY
This has been the introductory ins t a l l m e n t of a series of articles outlining the proper use of biostatistics.
We have introduced some basic concepts regarding data and its classification, w h i c h will be needed for understanding of topics to be presented
s u b s e q u e n t l y (Figure 3). Measures of
central tendency, m e a s u r e s of variability, confidence intervals, and the
a p p r o p r i a t e use of t h e s e s t a t i s t i c a l
concepts will be discussed i n part 2
of this series.
REFERENCES
1. Glantz SA: Biostatistics: How to detect, correct, and prevent errors in the medical literature. Circulation 1980;61:1-7.
2. FelsonDT, Cupples LA, Meenan RF: Misuse
of statistical methods in arthritis and rheumatism: 1982 versus 1967-68. Arthritis R h e u m
1984;27:1018-1022.
3. Thorn MD, Pulliam CC, Symons MJ, et al:
Statistical and research quality of the medical
and pharmacy literature. A m J Hosp Pharm
1985;42:1077-1082.
4. Avram MJ, Shanks CA, Dykes MHM, et ah
Statistical methods in anesthesia articles: An
evaluationof two Americanjournalsduringtwo
six-month periods. A n e s t h Analg 1985;64:
607-611.
5. MacArthur RD, Jackson GG: An evaluation
of the use of statistical methodologyin the Journal of Infectious Diseases. J Infect Dis 1984;
149:349~354.
6. Hopkins KD, Glass GV: Basic Statistics for
the Behavioral Sciences. EnglewoodCliffs, New
Jersey, Prentice-HallInc, 1978,
7. Elenbaas RM, ElenbaasJK, Cuddy PG: Evaluating the medical literature. Part II: Statistical
analysis. Ann Emerg Med 1983;12:610-620.
8. Teasdale G, Jennett B: Assessment of coma
and impaired consciousness:A practical scale.
Lancet 1974;2:81-84.
9. Champion HR, 8acco WJ, Carnazzo AJ, et al:
Trauma score. Crit Care Med 1981;9:672-676.
10. Baker SP, O'Neill B, HaddonW Jr, et al: The
Injury Severity Score: A method for describing
patients with multiple injuries and evaluating
emergency care. I Trauma 1974;14:187q96.
ll. Sokal RR, Rohlf FJ: Biometry, ed 2. New
York, WH Freeman and Co, 198!, pp 39-52.
12. Glantz SA: Primer of Biostatistics, ed 2.
New York, McGraw-HillBook Co, 1987.
19:1 January 1990

Annals of EM Gaddis Biostat

Hochgeladen von

Dokumentinformationen

Copyright

Verfügbare Formate

Dieses Dokument teilen

Dokument teilen oder einbetten

Freigabeoptionen

Stufen Sie dieses Dokument als nützlich ein?

Sind diese Inhalte unangemessen?

Copyright:

Verfügbare Formate

Annals of EM Gaddis Biostat

Hochgeladen von

Copyright:

Verfügbare Formate

SPECIAL CONTRIBUTION

19:1 January 1990

Annals of Emergency Medicine

Monica L Gaddis, PhD

FIGURE I. A bell-shaped or normal

about the sample can be properly extrapolated to the defined population.

Statistical The organization and mathematical manipulation of

to help infer conclusions from the data

The studied subset of members of a defined

Used when the data studied are from a sample or

inal, ordinal, interval, or ratio scale.

response of "strongly agree" is scored

Annals of Emergency Medicine

plied to interval or ratio data assume

Annals of Emergency Medicine

19:1 January 1990

Das könnte Ihnen auch gefallen