
Research Method and Related Concepts from Epidemiology & Statistics

Amrut A Bang
(This note is meant for anyone who seeks a basic introduction to the above subject. It
contains brief descriptions of many useful terms and concepts, but the text may sound cryptic
at times, so the reader will need to work through them and study further to get a firm hold.
The note also provides some insights into the approach of science. To make the best use of
it, get engaged in a study or survey and, while doing so, read this note and apply the
concepts to the practical issues that arise. For better understanding and further study,
follow the references. Thanks to Dr. Pawankumar Patil for giving me the inspiration to
write this one!)
Introduction to the method of science:

Research is a systematic process of collection, analysis and interpretation of data to generate
new knowledge or to answer certain questions.
Put another way, research is finding the answer to an unanswered question in a scientific and
flawless manner, so that the answer is as close to the truth as possible and is acceptable to
the scientific community.
A theory is an abstract generalization of an observed relationship.
A model is a symbolic representation of a relationship observed or expected.

A scientific theory is just a (mathematical) model we make to describe our observations; it
exists only in our minds. So, as with the concepts of real and imaginary time in
astrophysics, it is meaningless to ask which of the two is real. It is simply a matter of
which is the more useful description.



The real test for a scientific theory is whether it makes predictions that agree with
observation. (Theory is Green but Life is Evergreen!)

The way science goes is:
1. Observation
2. Hypothesis
3. Experiment
4. Analysis
5. Conclusion
You can't argue with a mathematical theorem!

Amrut A Bang is a NIRMAN Fellow at MKCL, Pune. He also works as a member of the
NIRMAN coordination team. He holds a bachelor's degree in computer engineering from
Pune Institute of Computer Technology, University of Pune.


Anything scientific has limits! So state the limits and the condition(s) under which the
theory can go wrong, i.e. state how the theory could be falsified.
e.g. J.B.S. Haldane famously growled at opponents of the theory of evolution that if rabbit
fossils were found in the Precambrian, the theory of evolution would be proved wrong.



Importance of measuring:

Niels Bohr famously said that nothing exists until it is measured!
Thus, while we are doing research or conducting a study, it's important that we do not leave
things to chance. As far as possible, try to measure everything that can affect the study.


To have better clarity about the subject, always ask two questions:
1. MK + ?
2. KM ?
Try to answer these two questions for everything that is significant. To remain feasible
and authentic, any good research focuses. Do not try to include or cover
everything in the research. Narrow down!
An important insight of scientific research is that unless we measure things, we can't assess
or state the actual impact of what we are doing. Without measurement, we can remain under an
illusion of self-appreciation, dreaming that we are successful with whatever we are doing.
The hard fact is that the entire society never changes at one go; what we call change is a
gradual, progressively increasing (or decreasing) difference in situations.



We can always go wrong tomorrow.
We may never make things zero, but we can measure and minimize them!

Important points to be remembered while doing social research:

- Know what has been done. Do an in-depth literature review.
- There is no ultimate truth. The method of science to find the truth is an iterative
  process, like a helix.
- Question everything!

Research Protocol:

Whether it is an observational or an experimental study, as in laboratory research, it is
essential to elaborate and follow a research protocol, which increases the likelihood that the
conclusions drawn will be scientifically sound.
The Research Topic should be able to stand alone as an explanation of the study.

The protocol should explain the study in terms of answers to the following questions:

Why? Sets out the study questions and the relevant background information, i.e.
why we should embark on this particular research problem.

How? Describes the study design and the rationale for choosing it. Also describes the
instruments/techniques to be used.

Who? Defines the targets and the study population and sample size.

What? Identifies the variables to be measured and outcomes to be analyzed.

So what? Comments on the expected significance of results and contribution to
knowledge.

Study questions should be written in the form of questions rather than statements. The
questions need greater precision in order to increase the productivity and piercing power of
the research.
The objectives of the study should be SMART i.e. Specific, Measurable, Achievable,
Relevant and Time bound.
All the study formats or questionnaires should be placed as annexures.



Research Design or Protocol Format:

1. Protocol summary
2. Background or Introduction
- Purpose
- Study questions
- Rationales and Previous knowledge on the subject
3. Objectives (Research Questions, Hypotheses)
4. Design and methods
- Study design
- Study population
- Sample size and sampling
- Study subject: selection, definitions
- Sources of information
- Study questionnaires, formats
- Data collection methods
- Data management and statistical data analysis methods
5. Project Management
- Personnel
- Action plan (Time schedule)
6. Strengths and limitations
7. Ethical considerations
8. Expected outcome (Results: Manifest, Latent)
9. Budget summary
10. References
11. Annexure
- Study formats/Questionnaires
- Budget details
- Time frame
- Requirements (Human, Material)



Empirical data is gathered through observation, experience and/or experiment.

Concepts from Epidemiology and Statistics:

Types of research studies:
A. Experimental or intervention study
B. Non-experimental or observational study
   1. Cross-sectional study: gives a snapshot, useful to find prevalence
   2. Longitudinal study: expanded in the time domain
      i. Prospective or cohort or follow-up study: groups with reference to
         cause, looks for the effect
      ii. Retrospective or case-control or case-referent study: groups with
          reference to effect, looks for the cause
A cohort study takes a population with a similar starting point and observes it over a period
of time, for example, a batch of NIRMAN.
A cross-sectional study does not bother about the similarity of the starting point; we
generally measure the instantaneous value of one or more variables in the sample taken.
For any study, comment on its results as well as limitations.
Study or Intervention area vs. Control area: these are the two groups into which the
population is split. The study group is marked by a particular differentiating character,
either a cause to be explored or an intervention to be done. The control is a similar
chunk of population which serves as the reference against which the results in the study
area can be compared, to detect any difference attributable to the cause or
intervention. However, we do measurement in both the areas, with the same rigor.
Before-after comparison in both the study and control areas is generally done in
intervention studies to know the effect or impact of the tried intervention over time.

Sample and Sampling:
A sample is a subset that is supposed to be representative of a population. Sampling is the
method of selecting the sample.

We take a sample because most of the time it is not possible or feasible to study the entire
population. Thus we study the sample and try to generalize the findings to the concerned
population. To be able to do this, utmost care has to be taken while selecting the sample.

Example: Generally, the blood in our body is so uniform that even a few drops of
blood from our finger fairly accurately represent the entire blood in our body, and thus the
check for pathogens can be limited to those drops of blood.
However, most of the time this is not the case with other populations. Because the
population is not so uniform, we have to take a lot of care while selecting the sample.

Generally, to avoid any kind of bias and to increase the probability of the sample being
close to a true representative of the population, the random sampling method is used, in
which every element in the population has an equal chance of getting into the sample. Keep in
mind that sampling can be random, but the sample should be representative.

When the sample is chosen with some predefined criteria in mind or is deliberately chosen
for a cause, it is called a purposive sample and is not representative of the entire population.

Various sampling methods:
1. Simple random sampling
2. Stratified random sampling
3. Systematic sampling
4. Cluster sampling
5. Multistage sampling
For randomization, we have to specify:
1. Population
2. Unit of randomization e.g. individual or cluster
3. Method used to randomize
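
As a rough illustration of the first three sampling methods listed above, here is a minimal Python sketch. The household population, the 10% sampling fraction and the function names are hypothetical, chosen only for this example; a fixed seed is used so that the randomization can be reproduced and reported.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum (subgroup)."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

def systematic_sample(population, step, seed=None):
    """Pick a random starting unit, then take every step-th unit after it."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return population[start::step]

# Hypothetical population: 100 households, 80 rural and 20 urban.
households = [("rural", i) for i in range(80)] + [("urban", i) for i in range(20)]

srs = simple_random_sample(households, 10, seed=1)
strat = stratified_random_sample(households, lambda h: h[0], 0.1, seed=1)
syst = systematic_sample(households, 10, seed=1)
```

The stratified draw always contains 8 rural and 2 urban households, so it mirrors the population on that one characteristic, whereas a simple random sample of 10 may by chance contain no urban household at all.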
The assumption of no difference between populations or no association between factors in the
same population is known as the Null Hypothesis. Typically, the null hypothesis is tested
through the study.

Statistical terms:

Mean is the average of the observed sample of values (denoted $\bar{x}$ and read as x-bar).
Mode is the value which occurs most often in the observed sample of values.
Median is a value such that 50% of the observed values are above it and 50% are below it.
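
A small sketch of these three measures using Python's standard `statistics` module; the list of observations is made up for illustration.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # hypothetical observations

mean = statistics.mean(values)      # sum of the values divided by their count: 30 / 6
median = statistics.median(values)  # with an even count, the average of the two middle values (3 and 5)
mode = statistics.mode(values)      # the most frequent value; 3 occurs twice
```

Here the mean (5) is pulled up by the single large value 10, while the median (4) and the mode (3) are not.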

Measures of dispersion:
Range of a group of observations is the interval between the smallest and the biggest
observation, i.e. it depends only on the two extreme values in the observations.

Interquartile range is the interval between the values of the upper quartile and lower
quartile of observations for that group. Upper quartile of a group is the value above which
25% of the observations fall. Lower quartile is the value below which 25% of the
observations fall. Interquartile range gives us the range which covers the middle 50% of
the observations in the group and thus is unaffected by the occurrence of rare extreme
values.
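
The sketch below contrasts the range with the interquartile range on two made-up data sets that differ only in one extreme value. It relies on `statistics.quantiles` with its default method; other textbooks and software use slightly different quartile conventions, so the exact quartile values are an assumption of this example.

```python
import statistics

def value_range(values):
    """Width of the interval between the smallest and the biggest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Upper quartile minus lower quartile: the spread of the middle 50%."""
    lower, _median, upper = statistics.quantiles(values, n=4)
    return upper - lower

typical = [1, 2, 3, 4, 5, 6, 7, 8]         # hypothetical observations
with_outlier = [1, 2, 3, 4, 5, 6, 7, 100]  # same data with one rare extreme value

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)
iqr_typical = interquartile_range(typical)
iqr_outlier = interquartile_range(with_outlier)
```

The range jumps from 7 to 99 because of the single extreme value, while the interquartile range is the same for both lists.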

Mean deviation is the arithmetic mean of the deviations of observations from the
arithmetic mean, ignoring the sign.
So, mean deviation = $\frac{\sum |x - \bar{x}|}{n}$
where $\bar{x}$ is the arithmetic mean, $n$ is the number of observations and $\sum$ is summation.
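
The formula translates almost directly into code. A minimal sketch, with made-up observations:

```python
def mean_deviation(values):
    """Average absolute deviation of the observations from their arithmetic mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

# Hypothetical observations: the mean is 5 and the absolute deviations are 3, 1, 1, 3.
md = mean_deviation([2, 4, 6, 8])
```

For these four values the mean deviation is (3 + 1 + 1 + 3) / 4 = 2.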

Standard deviation (S or $\sigma$ or stdv) is the square root of the average of the squared
deviations of the observations from the arithmetic mean.
S or $\sigma$ = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
Together with the arithmetic mean, it describes a normal frequency distribution uniquely.
Variance is the square of the stdv.

Coefficient of variation is the stdv expressed as the percentage of arithmetic mean.
So, C.V. (per cent) = (stdv / mean) * 100
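
A short sketch of standard deviation, variance and coefficient of variation, following the definitions above (division by n). The observations are made up. Note that many statistics packages divide by n - 1 instead when the data are a sample rather than a whole population, which gives a slightly larger value.

```python
import math

def std_dev(values):
    """Square root of the average squared deviation from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / n)

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the arithmetic mean."""
    mean = sum(values) / len(values)
    return std_dev(values) / mean * 100

# Hypothetical observations: mean 5, squared deviations sum to 32, so 32 / 8 = 4.
values = [2, 4, 4, 4, 5, 5, 7, 9]

stdv = std_dev(values)
variance = stdv ** 2
cv = coefficient_of_variation(values)
```

Because the C.V. is a percentage, it lets us compare the spread of variables measured in different units, for example weight in kilograms and height in centimetres.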
Standard error: Standard deviation describes the distribution of values around the
arithmetic mean. But the mean itself, being estimated from a sample, may be inaccurate.
Hence, the standard deviation of the mean value is called the standard error (se). For a
sample of n observations it is estimated as stdv / √n.
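
In practice the standard error of the mean is usually estimated as the standard deviation divided by the square root of the number of observations. The sketch below assumes that formula and, to stay consistent with the definition of stdv given earlier, divides by n when computing the standard deviation; the data are made up.

```python
import math

def standard_error(values):
    """Estimated standard deviation of the sample mean: stdv / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    stdv = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return stdv / math.sqrt(n)

# Hypothetical observations: 16 values with mean 5 and stdv 2.
values = [2, 4, 4, 4, 5, 5, 7, 9] * 2

se = standard_error(values)  # 2 / sqrt(16)
```

Since n sits under a square root, quadrupling the sample size only halves the standard error.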

p-value is the probability of obtaining a result at least as extreme as the one observed
purely by chance, i.e. if the null hypothesis were true.
If it is less than 5%, i.e. p < 0.05, then by convention we say that the factor or
relationship is significant and not just due to chance. So the lower the p-value, the lower
the chance of chance and hence the greater the significance.
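
To make the idea concrete, here is a minimal sketch of one simple way to get a p-value: an exact two-sided binomial test of whether a coin is fair. The coin-toss counts are hypothetical, and this particular test is chosen only for illustration; real studies pick a test that matches their design and data.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """Two-sided exact p-value under the null hypothesis that the success
    probability is p_null: the total probability of all outcomes that are
    no more likely than the observed one."""
    def pmf(k):
        return comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)

    observed = pmf(successes)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(pmf(k) for k in range(trials + 1) if pmf(k) <= observed * (1 + 1e-9))

p_unusual = binomial_p_value(16, 20)   # 16 heads in 20 tosses
p_ordinary = binomial_p_value(11, 20)  # 11 heads in 20 tosses
```

With 16 heads out of 20 the p-value is about 0.012, below 0.05, so chance alone is an unlikely explanation; with 11 heads it is about 0.82, so the result is entirely compatible with a fair coin.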
Odds Ratio is an estimate of the relative risk (a close one when the disease is rare).
e.g. what is the risk of developing a disease as compared to the reference group?
Crudely, if a variable's proportion or value is 1 in the reference group, what is the
proportion or value of that variable in the study group or the focus group? The ratio of the
odds of an indicator in Cases to that in Controls is the O.R., and it shows association.

e.g. O.R. of a disease = Odds that exposed individuals will have disease / Odds that
non-exposed individuals will have disease
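
The calculation for a 2x2 table can be sketched as follows. The counts are invented for illustration, and the relative risk is computed alongside only to show how the two measures differ.

```python
def odds_ratio(exposed_ill, exposed_well, unexposed_ill, unexposed_well):
    """Odds of disease among the exposed divided by odds among the non-exposed."""
    odds_exposed = exposed_ill / exposed_well
    odds_unexposed = unexposed_ill / unexposed_well
    return odds_exposed / odds_unexposed

def relative_risk(exposed_ill, exposed_well, unexposed_ill, unexposed_well):
    """Risk (proportion ill) among the exposed divided by risk among the non-exposed."""
    risk_exposed = exposed_ill / (exposed_ill + exposed_well)
    risk_unexposed = unexposed_ill / (unexposed_ill + unexposed_well)
    return risk_exposed / risk_unexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 non-exposed people fall ill.
table = (30, 70, 10, 90)

o_r = odds_ratio(*table)     # (30/70) / (10/90)
r_r = relative_risk(*table)  # (30/100) / (10/100)
```

Here the odds ratio is about 3.86 while the relative risk is 3.0. An O.R. of 1 means no association; the two measures come closer together as the disease becomes rarer.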

By adjusted odds ratio, we mean that apart from the particular variable with which we
are concerned, all other variables are statistically made common at the baseline, i.e.
regression is done and just one variable is kept varying, all the rest having been adjusted.
This adjustment is required so that we can attribute the difference observed to a particular
variable or cause; otherwise there will be confusion as to what the cause of the change is.
Thus by adjusting the variables, we eliminate their effect and isolate the one we want to
focus on.
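
The effect of adjustment can be seen in a small worked sketch. Instead of regression, it uses a simpler, swapped-in technique: splitting the data by the other variable (here a hypothetical age group) and pooling the per-group tables with the Mantel-Haenszel formula. All counts are invented for illustration.

```python
def crude_odds_ratio(tables):
    """Odds ratio after lumping all strata into a single 2x2 table."""
    a = sum(t[0] for t in tables)  # exposed, ill
    b = sum(t[1] for t in tables)  # exposed, well
    c = sum(t[2] for t in tables)  # non-exposed, ill
    d = sum(t[3] for t in tables)  # non-exposed, well
    return (a * d) / (b * c)

def mantel_haenszel_odds_ratio(tables):
    """Odds ratio adjusted for the stratifying variable (Mantel-Haenszel)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical strata, each as (exposed ill, exposed well, non-exposed ill, non-exposed well).
older = (40, 40, 10, 10)   # disease is common, most people are exposed
younger = (2, 18, 8, 72)   # disease is rare, most people are not exposed

crude = crude_odds_ratio([older, younger])
adjusted = mantel_haenszel_odds_ratio([older, younger])
```

Within each age group the exposure makes no difference (odds ratio 1), yet the crude odds ratio is about 3.3 because age is linked to both exposure and disease. The adjusted odds ratio comes out as 1.0, which is only possible because age was recorded in the first place.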



We can't adjust something that we have not measured.

Correlation, Regression and Causality:

Correlation or Association tells us whether two variables are interdependent or co-vary,
i.e. whether they vary together. We don't express one as a function of the other. There is
no distinction between dependent and independent variable.
In Regression, we intend to describe the dependence of a variable on an independent
variable. So we may try to check possible causation of changes in y by changes in x, or
how far variations in y are explained by x.
e.g. The effect of age of a person on blood pressure, dose of insecticide on mortality of an
insect population, etc.
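
The difference can be sketched with the age and blood pressure example. The five data points are invented; the functions compute the Pearson correlation coefficient and an ordinary least-squares line by hand.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient: symmetric in x and y, between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the line y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: age in years and systolic blood pressure in mmHg.
age = [30, 40, 50, 60, 70]
pressure = [118, 125, 130, 138, 144]

r = pearson_r(age, pressure)
slope, intercept = least_squares_line(age, pressure)
```

Swapping the two lists leaves r unchanged, which reflects that correlation has no dependent and independent variable. The regression line does change when they are swapped: here blood pressure is modelled as rising by 0.65 mmHg per year of age, from an intercept of 98.5.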

Correlation is easy to find or assess when there are just two variables involved. But what
about when there are multiple variables?
Regression Analysis is a method which is used to control for all the other influencing factors
and test only the association between the two factors that one is interested in at that
particular moment.


Correlation is a statistical term which indicates whether two variables move together. It is
different from Causality, which implies a Cause and an Effect relationship.
We need to check whether a particular thing is the cause of the other or is just an indicator of
the other. e.g. books are not a cause of intelligence but an indicator.
X can cause Y; Y can cause X; or there can be a third thing Z, a confounder, which causes both
X and Y.
You can hide the real cause by disproportionately blaming the confounder.
Generally, only once the confounding factor is eliminated can we judge whether the
correlation reflects causality.
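
The third case, a common cause Z behind both X and Y, can be demonstrated with simulated data. In the sketch below all numbers are artificial: X and Y never influence each other, yet they correlate because both depend on Z. The partial correlation, one standard way of removing the linear effect of Z, then comes out close to zero.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(xs, ys, zs):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson_r(xs, ys), pearson_r(xs, zs), pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(0)  # fixed seed so the simulation is reproducible
z = [rng.gauss(0, 1) for _ in range(5000)]  # the common cause
x = [zi + rng.gauss(0, 1) for zi in z]      # X depends on Z plus noise
y = [zi + rng.gauss(0, 1) for zi in z]      # Y depends on Z plus noise, not on X

r_xy = pearson_r(x, y)
r_xy_given_z = partial_r(x, y, z)
```

The raw correlation between X and Y is roughly 0.5, while the correlation given Z is near 0. As with the adjusted odds ratio, this check is only possible when Z has been measured.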

Results of a retrospective study may point to the existence of a real association between a
disease and some factor, but can never prove cause and effect. In case of an association
between A and B, (1) A may cause B; (2) B may cause A; (3) both A and B may be caused
by a common factor. Temporality may make it possible to exclude (1) or (2), but the third
possibility can never be excluded by retrospective methods alone.
In addition, any supposed association can later turn out to be a pure coincidence and no
association at all. Repeated checks over a longer timeline can resolve this issue.



Statistical Association or Correlation does not prove Causation.
Does the concept of Time in our daily lives resemble correlation or causation?

Suggested criteria (guidelines) to check for causation by A Bradford Hill and Mervyn Susser:

1. Time order or Temporality: The causal factor must precede the effect, in time. This
is a sine qua non for causality.
2. Consistency: Is the same association found in many studies? Replicating the
association in different samples (different persons, places, times, and circumstances),
with different study designs, and different investigators gives evidence of causation.
3. Strength: Is the association strong? The larger the association, the more likely the
exposure is causing the disease. Weak associations may also be causal but it is harder
to rule out bias and confounding. Quantitative expressions of the strength of
association are: (a) Odds Ratio: the further the OR is from unity, the stronger the
association. (b) p-value: with a 95% confidence interval and p < 0.05, the smaller it is,
the stronger the statistical evidence for the association.
4. Specificity of exposure and outcome: when present, provides evidence of causality,
but its absence does not preclude causation. A one to one relationship between a cause
and an effect was suggested by the Germ Theory of Disease and Henle-Koch
Postulates. But now, the idea of multiple causes leading to a web of causation is more
accepted (esp. in case of chronic diseases).
5. Dose-response relationship: Is a regular gradient of disease risk found to parallel a
gradient in exposure? Persons who have increasingly higher exposure levels have
increasingly higher risks of disease. Or there may be a threshold effect instead, in which
there are no adverse outcomes below a threshold.
6. Coherence: The association in question is not incoherent with what we already know
and the coherence is:
- Theoretical: compatible with pre-existing theory
- Factual: compatible with pre-existing knowledge
- Biological: compatible with current information of natural history and biology of
  diseases
- Statistical: compatible with a reasonable statistical model in the form of a regular
  exposure/effect relation
7. Experiment: Investigator-initiated intervention that modifies the exposure through
prevention, treatment, or removal should result in less disease. This provides strong
evidence for causation, but most epidemiologic studies are observational.

We have a tendency to look for immediate reasons and solutions. But often the chain of
causation runs long and deep in the time domain and is quite complicated and unobvious
at times. We need to challenge the conventional wisdom and keep looking for such
reasons. Besides, the perceived severity of things and the probability of their happening
do not always go hand in hand.

Risks that scare people are generally different from the risks that kill people.

References:

1. Singh S, Suganthi P, Ahmed J & Chadha V: Formulation of health research
protocol: a step by step description, NTI Bulletin, 2005, 41/1&2, 5-10.
2. Bartlett JE II, Kotrlik JW, Higgins CC: Organizational research: determining
appropriate sample size in survey research, Information Technology, Learning,
and Performance Journal, Spring 2001, Vol. 19, No. 1, 43-50.
3. Sundar Rao PSS & Richard J: An introduction to biostatistics, Prentice Hall of
India, 1997, ISBN 81-203-1008-X.
4. Levitt SD & Dubner SJ: Freakonomics: a rogue economist explores the hidden
side of everything, Penguin Books, 2005, ISBN 0-141-02580-8.
5. Hawking S: A brief history of time, Bantam Books, 1988, ISBN 0-553-38016-8.
6. Mandil A: Causal inference in epidemiology, High Institute of Public Health,
University of Alexandria.
7. Karandikar A: Lectures at Metric Consultancy, 22-24 Feb, 2009.
