Biostatistics: DR Priyanka N Maiya

BIOSTATISTICS
DR PRIYANKA N MAIYA

Definition
Types of studies
Variables
Randomization
Various tests

BIOSTATISTICS
(a portmanteau word made from biology and
statistics)
The application of statistics to a wide range of topics
in biology
BIOSTATISTICS
It is the science which deals with development and
application of the most appropriate methods for
the:
Collection of data.
Presentation of the collected data.
Analysis and interpretation of the results.
Making decisions on the basis of such analysis
COMMON STATISTICAL TERMS
Data
Measurements or observations of a variable
Variable
A characteristic that is observed or manipulated
Can take on different values
STATISTICAL TERMS
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
STATISTICAL TERMS
Parameters
Summary data from a population
Statistics
Summary data from a sample
POPULATIONS
A population is the group from which a sample is drawn
e.g., headache patients in a chiropractic office; automobile crash victims in an
emergency room
In research, it is not practical to include all members of a population
Thus, a sample (a subset of a population) is taken
RANDOM SAMPLES
Subjects are selected from a population so that each individual has an
equal chance of being selected
Random samples tend to be representative of the source population
Non-random samples may not be representative
May be biased regarding age, severity of the condition, socioeconomic status
etc.
RANDOM SAMPLES
Random samples are rarely utilized in health care research
Instead, patients are randomly assigned to treatment and control
groups
Each person has an equal chance of being assigned to either of the groups
Random assignment is also known as randomization
VARIABLES
TYPES OF DATA
Constant
Variables
The measurements to be made are referred to as variables
Variables can be classified as quantitative (numeric) or qualitative
(categorical)
Examples
CD4 count represents numbers (or counts) of CD4 lymphocytes per
liter of peripheral blood, and thus is numeric
Prior myocardial infarction status can be classified in several ways. If
it is classified as either yes or no, it is categorical. If it is classified as
number of prior MIs, it is numeric
Types of variables
Quantitative variables: continuous, discrete
Qualitative variables: nominal, ordinal
Further, numeric variables can be broken into two types: continuous
and discrete
Continuous variables can take any value along a line segment.
Examples: weight and diastolic blood pressure.
Discrete variables are variables that can take on only a finite (or
countably infinite) number of outcomes. Number of previous
myocardial infarctions and parity are examples of discrete variables.
Categorical variables also are commonly described in one of two
ways: nominal and ordinal
Nominal variables have distinct levels that have no inherent ordering.
Hair colour and sex are examples of variables that would be
described as nominal.
On the other hand, ordinal variables have levels that do follow a
distinct ordering. Examples relate to degrees of change in patients after
some treatment (such as: vast improvement, moderate improvement,
no change, moderate degradation, vast degradation/death).
TYPES OF STUDIES
Introduction
There are various ways data can be collected
Which questions can be answered, and which ones cannot be
answered, entirely depends on how a specific data set arose
Also the strength of evidence depends on the method of data
collection
TYPES

Observational studies

Experimental studies
Descriptive studies
Analytical studies
EXPERIMENTAL VERSUS OBSERVATIONAL
STUDIES

Observational: The data are collected on a routine basis, and no new
experiment is set up

Examples:
Hospital records on all patients treated
Data collected during yearly medical check-ups
Dataset on sickness absence
EXPERIMENTAL VERSUS OBSERVATIONAL
STUDIES

Experimental: The data are collected in a newly designed and
conducted experiment
Examples:
Rats are treated and measured afterwards, at pre-specified time points
Cancer patients are followed after chemotherapy and the outcome of
interest is the time until disease progression.

DESCRIPTIVE STATISTICS
Descriptive statistics is used to organize
and describe the characteristics of a collection of data.
Descriptive statistics involves no hypothesis and does not draw inferences from the data
Descriptive Statistics measures are:
A. Central tendency: Mean, Median, and Mode
B. Variability: Range, Interquartile range, Variance, Standard
deviation, Coefficient of Variation

MEASURES OF CENTRAL TENDENCY

Arithmetic mean (mean)
Mean = (sum of all observations) / (number of observations)
MEDIAN

The observation which lies in the middle of the ordered observations
The median is less sensitive to extreme values than the mean is.
Medians frequently are used to measure the middle of the distribution
of an ordinal or numerical characteristic that is skewed.

MODE
The value which occurs with the greatest frequency i.e. the most common
value
Mode is not used frequently in practice

For correct practical application of central tendency measurements
two factors are important:
1. The scale of measurements
2. The shape of the distribution of data set

When outlying observations occur in only one direction, the distribution is
called a skewed distribution. There are two types of skewed
distribution:
1. Negatively skewed (skewed to the left): outlying values are small.
Relationship of central tendency measurements in this case is:
Mean < Median < Mode
2. Positively skewed (skewed to the right): outlying values are large.
Mode < Median < Mean
SKEWED DISTRIBUTIONS (CONT.)
Skew is always toward the direction of the longer tail
Positive if skewed to the right
Negative if to the left
The mean is shifted the most
A symmetric distribution has the same shape on both sides of the
mean.
THE NORMAL DISTRIBUTION (CONT.)
A normal distribution is symmetric about its mean
As one moves away from the mean in either direction, the height of the
curve decreases, approaching but never reaching zero
The highest point of the normal curve is at the mean
THE NORMAL DISTRIBUTION (CONT.)
Mean = Median = Mode
MEASURES OF DISPERSION
Range
Variance
Standard deviation
Coefficient of variation

Set 1: -200; -20; -10; 7; 10; 20; 200 (n=7; Mean =1)
Set 2: -20; -5; -2; 7; 2; 5; 20 (n=7; Mean =1)
As we see, the means are the same even though the two data sets are
very different. That is why, for an appropriate description of a data
set, measurements of central tendency must be used together
with variability measurements
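The point made by the two sets can be checked with a short sketch using Python's standard `statistics` module. The helper name `describe` is only illustrative, and the sample standard deviation (n − 1 denominator) is an assumption about which formula is intended.

```python
import statistics

set1 = [-200, -20, -10, 7, 10, 20, 200]
set2 = [-20, -5, -2, 7, 2, 5, 20]

def describe(data):
    """Return the mean, range, and sample standard deviation of a data set."""
    return {
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "sd": statistics.stdev(data),  # sample SD (n - 1 in the denominator)
    }

summary1 = describe(set1)
summary2 = describe(set2)
print(summary1)
print(summary2)
```

Both sets have a mean of 1, but the range and standard deviation of Set 1 are far larger, which the mean alone cannot show.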
THE STANDARD DEVIATION (S)
The standard deviation is an estimate of the average distance of
the values from their mean.
If the data is normally distributed (bell-shaped curve) approximately
68% of data will lie within 1 standard deviation, approximately 95%
within 2 standard deviations, and approximately 99.7% of data will lie
within 3 standard deviations
MORE PROPERTIES
OF NORMAL CURVES
About 68.3% of the area under a normal curve is within one standard
deviation (SD) of the mean
About 95.5% is within two SDs
About 99.7% is within three SDs
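These areas can be reproduced from the standard normal distribution. The sketch below uses the identity P(|Z| < k) = erf(k/√2), with `math.erf` from the Python standard library; the function name `fraction_within` is only illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.2%}")
```

The two-SD value is close to 95.45%, which is why it is quoted as roughly 95% or 95.5% depending on rounding.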
TYPES OF STUDIES
Types of epidemiological studies
Observational
  Case report or case series
  Cross-sectional
  Ecologic
  Case-control
  Cohort
Experimental
  Quasi-experimental
  Randomized controlled studies
ANALYTICAL STUDIES
Case-control studies (retrospective studies)
Both exposure and outcome have occurred before the start of the
study
The study proceeds backwards from effect to cause
It uses a control or comparison group to support or refute an
inference
CASE CONTROL STUDIES

Selection of cases and controls
Matching
Measurement of exposure
Analysis : exposure rates

              Disease
              Yes (cases)   No (controls)
Exposed       a             b
Not exposed   c             d

Odds ratio = ad/bc
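As a minimal sketch of the odds ratio calculation, the function below takes the four cells of the 2x2 table (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls). The function name and the counts are hypothetical and chosen only to illustrate the arithmetic.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2x2 case-control table.

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)

# Hypothetical counts, not taken from any real study.
example_or = odds_ratio(a=30, b=10, c=20, d=40)
print(example_or)
```

An odds ratio above 1 suggests that exposure is more common among cases than among controls.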
CASES AND CONTROLS STUDIES
Advantages:
They are suitable for studying rare outcomes.
They are suitable for outcomes with a long latency
period.
They are cheap and easy to carry out.
It is not necessary to wait for the outcome to occur.
Disadvantages:
Prone to bias.
They depend on the quality of registries.
The control group should be adequately selected,
because it represents the population without the
outcome.
EXAMPLES
Thalidomide tragedy
Oral contraceptives and thromboembolic disease

ANALYTICAL- COHORT
Also known as prospective or longitudinal studies
The cohorts are identified before the appearance of the disease
under investigation
The study groups are observed over a period of time to determine the
frequency of disease among them
The study proceeds forward from cause to effect
COHORT STUDIES
Subjects are selected because they do not have the
outcome, and they are classified by whether or not they have
the risk factor (exposure).
They are followed up to see whether they develop the outcome.
The cohort study can be prospective if the follow-up is
forward in time, or it can be retrospective (historic)
if it goes back in time.

Cohort

                            Disease
                            Yes    No     Total
Exposed to the factor       a      b      a+b
Not exposed to the factor   c      d      c+d

Incidence rates:
Exposed = a/(a+b)
Unexposed = c/(c+d)
Relative risk = incidence among exposed / incidence among non-exposed
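The incidence and relative risk calculations can be sketched as follows, using the cohort table cells (a and b for the exposed group, c and d for the unexposed group). The function name and the counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a cohort 2x2 table.

    a: exposed with disease,   b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    if incidence_unexposed == 0:
        raise ValueError("relative risk is undefined when unexposed incidence is zero")
    return incidence_exposed / incidence_unexposed

# Hypothetical cohort: 20 of 100 exposed and 10 of 100 unexposed develop the disease.
example_rr = relative_risk(a=20, b=80, c=10, d=90)
print(example_rr)
```

In this made-up example the exposed group has twice the incidence of the unexposed group.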
COHORT STUDIES
Advantages:
They are suitable for identifying the causes of an outcome.
They show the natural history of a disease.
They are suitable when the exposure is rare.
They are useful when two or more outcomes are studied at the
same time.
Disadvantages:
They take a long time.
They are expensive.
Subjects can be lost to follow-up.
They are not suitable for studying rare outcomes.
CROSS SECTIONAL STUDIES
Analyze data from a group of subjects at a single point in time.
Describe a disease and its importance for the population.
Identify health needs.
CROSS-SECTIONAL STUDIES
Advantages:
They are useful to know the burden of a disease in a group.
Useful to evaluate diagnostic procedures.
To study common risk factors.
To study common outcomes.
Disadvantages:
Populations may be unwilling to collaborate.
The sample may not be representative of the population.
They are not useful for finding causes of the outcome.
CORRELATION

Pearson's Correlation Coefficient
Spearman's Rank Correlation Coefficient

Much medical research concerns the relationship between two or
more characteristics.
For this purpose it is appropriate to use correlation, which
examines the relationship between two variables

PEARSON'S CORRELATION COEFFICIENT
Pearson's correlation coefficient is one measure of the relationship between
two numerical characteristics, symbolized by X and Y. The correlation
coefficient is denoted by r and is calculated using the formula

r = Σ(X − X̄)(Y − Ȳ) / √[ Σ(X − X̄)² · Σ(Y − Ȳ)² ]

The maximum value that r can achieve is 1, and its minimum value is -1.
Therefore, for any given data set: -1 ≤ r ≤ 1

Positive correlation (+), when the correlation coefficient is 0 < r ≤ 1:
Y tends to increase in magnitude as X increases;
Negative correlation (-), when the correlation coefficient is -1 ≤ r < 0:
Y tends to decrease as X increases;
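A minimal sketch of Pearson's r, written out from the standard definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` and the data are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Illustrative data: Y rises exactly linearly with X, then falls exactly linearly.
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])
r_down = pearson_r(x, [10, 8, 6, 4, 2])
print(r_up, r_down)
```

A perfectly increasing linear relationship gives r at the upper limit and a perfectly decreasing one gives r at the lower limit; real data usually fall in between.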

EXPERIMENTAL
STUDY
Definition
A study in which the risk factor/exposure of interest is controlled
by the investigator; randomization is generally used
Use
Most convincing demonstration of causality
Limitation
Logistical and ethical difficulties in its application to human studies
Randomized controlled trial

Drawing up a protocol
Selecting reference and experimental group
Randomization
Manipulation
Follow up
Assessment

RANDOMIZATION
Randomization is a statistical procedure by which the participants are
allocated into groups called study and control, to receive or not to
receive a manoeuvre/intervention
It is an attempt to eliminate bias and allow for comparability
It is the heart of a controlled trial
METHODS OF RANDOMIZATION
Simple randomization
Throwing coins or dice, spinning wheels, etc.
Random number generators
Pre-generated lists should not be made available in advance, in order
to guarantee blinding
SYSTEMATIC RANDOM SAMPLE
Done by picking every 5th or 10th unit at a regular interval
By this method each unit of a sample frame would have the same
chance of being selected, provided the starting unit is chosen at random.
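A small sketch of systematic sampling, assuming the sampling frame is available as a list and that the starting unit is drawn at random from the first k units. The function name, the frame of 100 unit IDs, and the seed are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick every k-th unit from the sampling frame after a random start."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start among the first k units
    return frame[start::k]

frame = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs
sample = systematic_sample(frame, k=10, seed=1)
print(sample)
```

With 100 units and k = 10, the sample always contains 10 units spaced 10 apart; only the starting point varies.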
BLOCK RANDOMIZATION
With block randomization, subjects are put in small equal-sized
groups (blocks) and, within each block, equal numbers are allocated to
the treatment groups.
Block randomization also implies approximately equal numbers at
each moment during the study.
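Block randomization can be sketched as follows. The block size of 4, the arm labels, and the seed are assumptions made for illustration; a real trial would generate and conceal its allocation list according to the protocol.

```python
import random

def block_randomize(n_subjects, block_size=4, arms=("treatment", "control"), seed=None):
    """Allocate subjects to arms in shuffled blocks with equal numbers per arm."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block holds every arm the same number of times, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]

allocation = block_randomize(12, block_size=4, seed=42)
print(allocation)
```

After every completed block the two arms are exactly balanced, so the group sizes never differ by more than half a block at any point during recruitment.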
STRATIFIED RANDOM SAMPLE
With relatively small studies, (serious) imbalance can arise by
pure chance.
Stratified randomization can be used to ensure balance, at
least with respect to some measured important prognostic factors.
E.g. stratifying the population by a certain characteristic, such as age
groups
BIAS
DEFINITION
Any systematic error in the determination of association between the
exposure and disease.

It reflects some type of non-comparability between the cases and
controls
TYPES
Memory /recall bias
Selection bias
Berksonian bias
Interviewer bias
Bias due to confounding

Observer bias
Investigator bias
Blinding
Randomization
SAMPLE SIZE
The sample size denotes the number of subjects to be in the study
Chosen sample size is too small
This results in low power for detecting clinically relevant alternatives.
We run the risk of obtaining no conclusions, which means that
patients, probands, laboratory animals are burdened unnecessarily by
the study
research resources (money, laboratory places, working time etc.) are
wasted by the study
Chosen sample size is too large
Here we run risk that
the detected significant differences are clinically irrelevant
the inadequate therapy is applied too long
too many patients are bound to the study
resources are wasted
FACTORS INFLUENCING THE SAMPLE SIZE
Degree of difference expected
Degree of variation among subjects
The level of significance desired
Power of the study desired
Drop out rate
Non compliance to treatment
EXAMPLE OF SAMPLE SIZE CALCULATION
It is hypothesised that the dose of drug X needs to be adjusted when
administered to Indians compared to Europeans .
The investigator wants to test the hypothesis and calculate the sample
size required

Variation of T1/2 in Europeans: this may be available from published
papers which give the mean T1/2 and its standard deviation
Variation of T1/2 in Indians: either one has to do a pilot study or
assume that it is the same as in Europeans
The difference one would like to detect: Suppose T1/2 in Europeans is
50 hours and investigator feels that the question of dosage adjustment
will arise only if there is difference of 20 hours.

The alpha or type I error one wishes to tolerate
Two types of error
Type I error: H0 is incorrectly rejected
Type II error: H0 is incorrectly accepted

We would like to set up the rejection region to keep the probability of a
Type I error (α) and the probability of a Type II error (β) as small as
possible.
Unfortunately, for a fixed sample size, if we try to decrease α, we
automatically increase β, and vice versa.
Here α is the probability that we reject the null hypothesis when
it is true.
Suppose the test is performed at the α = 5% level of significance
If H0 is correct, then one will observe a significant result in 5% of the
cases
Hence, in 5% of the cases, H0 would be incorrectly rejected
In biomedical sciences α = 5% is often used, thereby allowing a type I
error in 5% of the cases.

In all cases, a p-value less than α corresponds to a test statistic being
in the rejection region (reject H0), and a p-value larger than α
corresponds to a test statistic failing to be in the rejection region
The probability that we could have obtained as strong or stronger
evidence against the null hypothesis (when it is true) than what we
observed from our sample data is called the observed significance
level or p-value

POWER OF A STUDY
The power of a test corresponds to the probability of rejecting the null
hypothesis when it is false.
That is, in terms of a test of efficacy for a drug, the probability we
correctly conclude the drug is effective when in fact it is.
To find out the chances of detecting a real difference, power is
calculated as
Power = 100% − type II error (β)
When the type II error is 5% or 10%, the power of the study will be 95% or 90%
respectively
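So in the example
Difference to detect: 20 h
Significance: 5% (α)
Power: 90%
Literature shows mean ± SD in Europeans is 49 ± 12.4
SD = 12.4

N = (u + v)² × (sd1² + sd2²) / (μ1 − μ2)²
where u and v are the standard normal deviates for the chosen power and
significance level, and μ1 − μ2 is the difference to detect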

Drop outs of the samples must be taken into account
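The worked example can be sketched in code under several stated assumptions: the formula is read as N = (u + v)² (sd1² + sd2²) / (μ1 − μ2)² per group; u is the one-sided standard normal point for the power and v the two-sided point for the significance level; and the SD in Indians is taken to be the same as in Europeans (12.4), one of the options mentioned above. The function name and the 20% drop-out figure are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(diff, sd1, sd2, alpha=0.05, power=0.90, dropout=0.0):
    """Approximate subjects per group needed to compare two means."""
    if not 0 <= dropout < 1:
        raise ValueError("dropout must be in the range [0, 1)")
    z = NormalDist()
    u = z.inv_cdf(power)          # one-sided point for the chosen power
    v = z.inv_cdf(1 - alpha / 2)  # two-sided point for the significance level
    n = (u + v) ** 2 * (sd1 ** 2 + sd2 ** 2) / diff ** 2
    # Inflate for the expected drop-out rate, then round up to whole subjects.
    return math.ceil(n / (1 - dropout))

n_basic = sample_size_per_group(diff=20, sd1=12.4, sd2=12.4)
n_with_dropout = sample_size_per_group(diff=20, sd1=12.4, sd2=12.4, dropout=0.2)
print(n_basic, n_with_dropout)
```

This uses a normal approximation; with groups this small, a calculation based on the t distribution would give a slightly larger number.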
STATISTICAL TESTS
TYPES OF TESTS
Parametric
Student's paired t test
Student's unpaired t test
ANOVA
Correlation coefficient
Regression analysis
Non parametric
Wilcoxon signed rank test
Wilcoxon rank sum test
Spearman's rank correlation
Chi Square test
STATISTICAL TESTS
In Statistical tests certain formulae /procedures based on certain
concepts /assumptions are used to calculate the P values which
reveal whether a result is significant or not .

A significant result is a result which is not likely to have occurred by
chance
STUDENT'S t TEST (PAIRED AND UNPAIRED)
Used when
Data is quantitative
It is needed to compare two means
The sample is small (n < 30)
The data is distributed normally
Paired t test is used when the group is its own control
Unpaired t test is used when there are two groups: control and test
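As an illustration of the paired case, the sketch below computes the paired t statistic as the mean of the within-subject differences divided by its standard error, with n − 1 degrees of freedom. The function name and the before/after values are made up; the P value would then be read from a t table for the given degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two paired samples of equal length (at least 2 pairs)")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements on the same five subjects before and after treatment.
before = [10, 12, 9, 11, 13]
after = [11, 14, 12, 15, 18]
t_stat, df = paired_t(before, after)
print(t_stat, df)
```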
ANALYSIS OF VARIANCE (ANOVA)
To compare the means of three or more groups together
Used when
Subgroups to be compared are defined by just one factor (one-way ANOVA)
Subgroups are based on two factors (two-way ANOVA), e.g. to compare the mean
haemoglobin levels in different regions and find out if there is a sex
difference
Data is normally distributed
CORRELATION COEFFICIENT (r)
When the degree of linear association between two variables is
required, the correlation coefficient is calculated, e.g. between sugar
level and insulin in blood
To use this
Both test variables must be continuous

REGRESSION
This is used to estimate a dependence relationship
The dependence of one variable on the other is described
mathematically using this test
NON PARAMETRIC TESTS
Chi square test
Used to find out the association between categorical variables
E.g. whether smoking is associated with lung cancer
Used in situations where proportions or percentages of two groups are
compared
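For a 2x2 table with cells a, b (first row) and c, d (second row), the chi-square statistic can be sketched with the shortcut formula n(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)]. The counts are hypothetical, no continuity correction is applied, and 3.841 is the tabulated critical value for 1 degree of freedom at the 5% level.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

CRITICAL_5_PERCENT_DF1 = 3.841  # tabulated chi-square value, 1 df, alpha = 0.05

# Hypothetical counts, e.g. smokers vs non-smokers by disease status.
chi2 = chi_square_2x2(a=30, b=10, c=20, d=40)
print(chi2, chi2 > CRITICAL_5_PERCENT_DF1)
```

A statistic above the critical value corresponds to P < 0.05, that is, evidence of an association between the two variables.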
On applying the tests described above, one will end up with a test statistic
whose value depends on the test used
After obtaining this value, a table is referred to in order to find the P value
If the P value is less than 0.05, the difference between the groups is
considered statistically significant
Clinicians use statistical methodology both to analyze their own data and to
assess the value of others' scientific reports
When assessing the value of a scientific report, a careful clinician reads the
methods section and asks the questions :
Are the subjects selected to minimize bias?
Are the statistical methods appropriate?
Is the study large enough to be statistically powerful?
Are the conclusions warranted by the evidence?
Can the conclusions from this study be extended to cover more
general cases or do they apply only to the limited situation under
study?
Statistics helps to quantify the errors and control the errors

THANK YOU
