FAISAL SHAHZAD
PhD Scholar (China), M.Sc. Statistics (Norway), M.Sc. Statistics (BZU, Pak)
Certificate in Public Health (Sweden), Certificate in Epidemiology (Finland),
Certified Takaful Professional (Pak)
faisalisbest@gmail.com
Unit – 1
Introduction
Comprehensive Sample
Sources of Data
• Routinely kept records: e.g. patient information can be obtained from hospitals
• Published reports: commercially available data banks, literature review
• Surveys: a set of specific questions put to respondents.
For example: If the administrator of a clinic wishes to obtain information
regarding the mode of transportation used by patients to visit the clinic, then a
survey may be conducted among patients to obtain this information.
• Experiments / Interviews: Frequently the data needed to answer a question
are available only as the result of an experiment.
For example: If a nurse wishes to know which of several strategies is best for
maximizing patient compliance, she might conduct an experiment in which the
different strategies of motivating compliance are tried with different patients.
Types of variable
A variable is either quantitative or qualitative:
• Quantitative: continuous or discrete
• Qualitative: nominal or ordinal
Sample:
A subset of the population.
Descriptive Statistics
Mean
• A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within
that set of data. As such, measures of central tendency are
sometimes called measures of central location.
• The mean, median and mode are all valid measures of central
tendency, but under different conditions.
• Note: The mean can be badly affected by outliers, while the median is not.
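The effect of an outlier on the mean and the median can be seen in a minimal sketch. The ages below are hypothetical values invented for illustration; they do not come from these slides.

```python
from statistics import mean, median

# Hypothetical ages, invented for illustration only.
ages = [28, 31, 32, 34, 37]
ages_with_outlier = ages + [95]  # one extreme value added

mean_before, median_before = mean(ages), median(ages)
mean_after, median_after = mean(ages_with_outlier), median(ages_with_outlier)

# The mean moves by more than 10 years; the median moves by only 1 year.
print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
```

One extreme value is enough to pull the mean well away from the bulk of the data, which is why the median is often preferred for skewed data.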
The Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
The Sample Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
The sample mean is $\bar{x} = 366/10 = 36.6$.
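As a quick check, the sample mean of the ten ages in this example can be computed directly. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean

# The ten ages from the example above.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

n = len(ages)      # sample size
total = sum(ages)  # sum of all observations
x_bar = total / n  # sample mean: sum divided by n

print(n, total, x_bar)
print(mean(ages))  # the standard library gives the same value
```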
Data: 65 55 89 56 35 14 56 55 87 45 92
Sorted: 14 35 45 55 55 56 56 65 87 89 92 (the median is the middle value, 56)
Mode
• The mode is the most frequent score in our data set. On a
bar chart or histogram it is represented by the highest bar.
You can, therefore, sometimes consider the mode as
being the most popular option.
• Example: in the previous example, 55 and 56 are the modes. A data
set can have one mode, two or three modes, or no mode at all. You
only need to find the most frequently repeated value in the data set.
Data: 65 55 89 56 35 14 56 55 87 45 92
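The median and the modes of this data set can be checked with a short sketch. It assumes Python 3.8 or later, where `statistics.multimode` is available.

```python
from statistics import median, multimode

# The eleven scores from the example above.
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

sorted_scores = sorted(scores)  # the median is read off the ordered data
middle = median(scores)         # middle value of the 11 ordered scores
modes = multimode(scores)       # every value with the highest frequency

print(sorted_scores)
print("median:", middle)
print("modes:", modes)
```

`multimode` returns a list, so it handles data with two or more modes; `statistics.mode` would return only one value.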
Comparison of Mean, Median and Mode
Measures of Dispersion: Variance
• It measures dispersion as the scatter of the values about the mean.
a) Sample Variance ($S^2$):
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean.
b) Population Variance ($\sigma^2$):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $\mu$ is the population mean.
Measures of Dispersion: Variance
• Example:
• Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50; $\bar{x} = 56$
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \dots + (50-56)^2] / (10 - 1)$
$= 810/9 = 90$
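The variance calculation for this data set can be reproduced step by step. This is a minimal sketch; the variable names are illustrative, and the population variance is included only to show the effect of dividing by N instead of n - 1.

```python
from statistics import mean, pvariance, variance

# The ten observations from the example above.
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

x_bar = mean(data)                                   # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)     # squared deviations about the mean
sample_var = sum_sq_dev / (len(data) - 1)            # divide by n - 1
population_var = sum_sq_dev / len(data)              # divide by N

print(x_bar, sum_sq_dev, sample_var, population_var)

# The standard library functions use the same two divisors.
print(variance(data), pvariance(data))
```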
• Line graph
• Frequency polygon
• Frequency curve
• Histogram
• Bar graph
• Scatter plot
• Pie chart
• Statistical maps
Line Graph
[Figure: line graph; x-axis: Age (25 to 65)]
[Figure: frequency by age in years (20-, 30-, 40-, 50-, 60-69), female vs. male]
[Figure: Distribution of a group of cholera patients by age; x-axis: Age (years)]
[Figure: bar chart by marital status (Single, Married, Divorced, Widowed)]
Pie chart
[Figure: pie chart; slices labelled Deletion, Inversion, Translocation; percentages shown: 3%, 18%, 79%]
Doughnut chart
[Figure: doughnut chart comparing Hospital A and Hospital B; categories: DM, IHD, Renal]
Unit 2
Basic Statistical Methods
Normal Distribution
• In this section, we will mainly focus on the normal distribution, the
standard normal distribution, skewness and kurtosis, hypothesis
testing, and Type I and Type II errors.
Normal Distribution
It is one of the most important probability distributions in statistics.
It is the limiting form of the binomial distribution as ‘n’ (the
number of trials) becomes very large for a fixed value of p.
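The limiting behaviour can be illustrated numerically. The sketch below compares binomial probabilities with the normal density that has the same mean np and standard deviation sqrt(np(1 - p)). The value p = 0.5, the values of n, and the helper names are assumptions chosen for illustration.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def max_gap(n, p):
    """Largest absolute difference between the two curves over k = 0..n."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return max(
        abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
        for k in range(n + 1)
    )

p = 0.5  # assumed fixed success probability
gaps = {n: max_gap(n, p) for n in (10, 100, 1000)}

for n, gap in gaps.items():
    print(n, gap)  # the gap shrinks as n grows
```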
The normal density is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
π, e: constants
µ: population mean
σ: population standard deviation
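The density formula translates directly into code. The sketch below assumes Python 3.8 or later and compares a hand-written version with `statistics.NormalDist`. The function name `normal_density` and the parameter values µ = 56, σ = 9 are illustrative assumptions.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_density(x, mu, sigma):
    """Normal density f(x) with mean mu and standard deviation sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Peak of the standard normal curve (mu = 0, sigma = 1), equal to 1 / sqrt(2 * pi).
peak = normal_density(0, 0, 1)

# Hypothetical parameters, chosen only for illustration.
mu, sigma = 56, 9
by_hand = normal_density(60, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(60)

print(peak, by_hand, by_library)
```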
Characteristics of Normal Distribution
• Central limit theorem: in its most general form, under some conditions
(which include finite variance), it states that averages of samples of
observations of random variables independently drawn from independent
distributions converge in distribution to the normal; that is, they become
approximately normally distributed when the number of observations is
sufficiently large.
• It is bell shaped
• The mean, median and mode are equal
• It is unimodal (i.e. it has only one mode)
• The curve is symmetrical about the mean, which is equivalent to
saying that its shape is the same on both sides of a vertical line
passing through the center.
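The convergence of sample averages to the normal can be illustrated with a small simulation. This is a sketch under stated assumptions: the exponential distribution, the sample size n = 50, the 2000 repetitions and the seed are all choices made for illustration, and `statistics.fmean` requires Python 3.8 or later.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

n = 50       # observations per sample (assumed value)
reps = 2000  # number of samples drawn (assumed value)

# Each observation comes from an exponential distribution with mean 1 and
# standard deviation 1, which is strongly skewed and not bell shaped.
sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

centre = fmean(sample_means)   # should be close to the population mean, 1
spread = pstdev(sample_means)  # should be close to 1 / sqrt(n)

# Share of sample means within one standard deviation of their centre;
# for a normal distribution this share is about 68%.
within_one_sd = sum(abs(m - centre) <= spread for m in sample_means) / reps

print(centre, spread, within_one_sd)
```

Although the individual observations are skewed, the averages cluster symmetrically around 1 with a spread near 1/sqrt(50), which is about 0.14.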
Characteristics of Normal Distribution
• The curve is continuous, i.e. there are no gaps or holes. For each value
of x, there is a corresponding value of y.
• The curve never touches the x-axis. Theoretically, no matter how far in
either direction the curve extends, it never meets the x-axis but gets
increasingly closer.
• Total area under the normal distribution curve is equal to 1.00 or 100%.
• The area under the normal curve that lies within one standard deviation
of the mean is approx. 68%, within two standard deviations approx. 95%, and
within three standard deviations approx. 99.7%.
• The normal distribution is completely determined by the parameters µ
and σ.
• All odd-order central moments (moments about the mean) are 0.
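The 68%, 95% and 99.7% areas can be checked from the cumulative distribution function. The sketch below assumes Python 3.8 or later and uses the normal distribution with µ = 0 and σ = 1; the same areas hold for any µ and σ because distances are measured in standard deviations. The helper name `area_within` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # mu = 0, sigma = 1

def area_within(k):
    """Area under the curve between mu - k*sigma and mu + k*sigma."""
    return z.cdf(k) - z.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```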
The normal distribution
depends on the two
parameters µ and σ.
µ determines the location of
the curve.
2 -∞<z<∞