
BASIC STATISTICS (3685)

FAISAL SHAHZAD
PhD Scholar (China), M.Sc. Statistics (Norway), M.Sc. Statistics (BZU, Pak)
Certificate in Public Health (Sweden), Certificate in Epidemiology (Finland),
Certified Takaful Professional (Pak)
faisalisbest@gmail.com
Unit – 1
Introduction

Introduction, Need to study statistics, Nature of Variability, Variance, Covariance and Correlation
Introduction
Statistics is a field of study that deals with:
1- collecting, summarizing, analyzing and interpreting data;
2- drawing inferences about a body of data (the population) when only a part of the data (a sample) is observed.

Statisticians interpret the results and communicate them to others.
Basic Statistics

It is the science that deals with the development and application of the most appropriate methods for:
• Collection of data.
• Presentation of the collected data.
• Analysis and interpretation of the results.
• Making decisions on the basis of such analysis.
Role of statisticians
• To guide the design of an experiment or survey prior to data collection
• To analyze data using proper statistical procedures and techniques
• To present and interpret the results to researchers and other decision makers
Organization of Data

• Data are the raw information or material of statistics.
• Data are categorized into two forms: primary and secondary data.
• We may define data as figures. Figures result from the process of counting or from taking a measurement.
For example: when a hospital administrator counts the number of patients, the figure comes from counting, whereas when a nurse weighs a patient, it comes from measurement.
Types of data
Constant
Variables
Sources of data
• Records: routinely kept records, e.g. by hospitals
• Published reports: published public reports
• Surveys: a set of questions (comprehensive or sample)
• Experiments: a particular question / interview
Sources of Data
• Routinely kept records: e.g. patient information can be obtained from hospitals.
• Published reports: commercially available data banks, literature reviews.
• Surveys: a set of specific questions.
For example: if the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, a survey may be conducted among patients to obtain this information.
• Experiments / Interviews: frequently the data needed to answer a question are available only as the result of an experiment.
For example: if a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients.
A variable
It is a characteristic that takes on different values in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Variable, Observation, Values
• Observations are the units upon which measurements are made (the rows of a data table).
• Variables are the characteristics being measured (the columns).
• Values are the realized measurements (the table cells).
Types of variables
• Quantitative variables: continuous or discrete
• Qualitative variables: nominal or ordinal
Types of variable

Quantitative Variables
It can be measured in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.

Qualitative Variables
Many characteristics are not capable of being measured. Some of them can be ordered or ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
Quantitative variables types

A discrete variable
is characterized by gaps or interruptions in the values that it can assume.
For example:
- the number of daily admissions to a general hospital,
- the number of items produced in a factory daily.

A continuous variable
can assume any value within a specified relevant interval of values assumed by the variable.
For example:
- height,
- weight,
- skull circumference.
No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.
Application of Statistics in Research
In order to provide a more concrete picture of what the course will cover, let us take a brief look at a relatively recent article in the American Journal of Public Health. It shows why statistics is needed: understanding the basic concepts and many of the basic tools of statistics is essential for research.
Article for Review
The Strategy
The basic strategy is to focus first on the types of data, then examine tables, graphs, frequency distributions, types of measurement scales, etc. This is critical background for understanding samples and populations, estimates of population parameters and the fundamental elements of hypothesis testing. We will then move to hypothesis testing (e.g. z-tests, t-tests) and many of the other statistical tools.
Type of statistical analysis

Descriptive
Techniques for collection, organization, summarization and presentation of data,
e.g. mean, median, mode, standard deviation or variance; frequency tables and/or graphical representation of data, etc.

Inferential
Techniques for making generalizations about characteristics of a population based on a sample,
e.g. hypothesis testing, correlations, z-, chi-square or t-tests, ANOVA, etc.
- Parametric: use interval or ratio data
- Non-parametric: use nominal or ordinal data
Population and Sample
Population: a set which includes all measurements of interest to the researcher. Also called the collection of all responses or measurements.

Sample: a subset of the population.
Descriptive Statistics
Mean
• A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within
that set of data. As such, measures of central tendency are
sometimes called measures of central location.
• The mean, median and mode are all valid measures of central
tendency, but under different conditions.
• Note: the mean can be badly affected by outliers, while the median is not.
The Population Mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$
which is usually unknown, so we use the sample mean to estimate it.

The Sample Mean:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.

$$\bar{x} = (42 + 28 + \dots + 37) / 10 = 36.6$$
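As a quick check of the worked example, the sample mean can be recomputed in a few lines of Python. This is only an illustrative sketch; the variable names are assumptions and not part of the original slides.

```python
from statistics import mean

# Ages from the worked example (a random sample of size n = 10).
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

# Sample mean: sum of the observations divided by the sample size.
sample_mean = sum(ages) / len(ages)

print(sample_mean)  # 36.6
print(mean(ages))   # same value from the standard library
```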


Median
• The median is the middle score of a data set that has been arranged in order of magnitude. The median is less affected by outliers and skewed data.
• If the number of observations is odd, choose the middle one. If it is even, take the two middle observations and average them.
• Example: suppose we have the following data. Arrange them in order first, then pick the middle observation.

Raw data: 65 55 89 56 35 14 56 55 87 45 92
Ordered: 14 35 45 55 55 56 56 65 87 89 92
With 11 observations, the median is the 6th ordered value, 56.
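The odd/even rule for the median can be sketched as a small Python function. The function name `median` and the shortened even-sized list are illustrative assumptions; only the 11 scores come from the slide.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(median(scores))        # 56 (11 observations, the 6th ordered value)

# Hypothetical even-sized case: drop the last score to leave 10 observations.
print(median(scores[:-1]))   # 55.5 (average of 55 and 56)
```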
Mode
• The mode is the most frequent score in our data set. On a bar chart or histogram it corresponds to the highest bar. You can, therefore, sometimes consider the mode as being the most popular option.
• Example: in the previous example, 55 and 56 are both modes, since each occurs twice. A data set can have two or three modes, or no mode at all; you only need to find the most repeated value(s) in the data set.

65 55 89 56 35 14 56 55 87 45 92
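Finding the most repeated value(s) is a counting exercise, which the following Python sketch illustrates. The helper name `modes` is an assumption made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that attains the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(modes(scores))  # [55, 56]
```

Note that when every value occurs exactly once this helper returns all of them, whereas the usual convention is to say such a data set has no mode.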
Comparison of Mean, Median and Mode
Measures of Dispersion: Variance
• It measures dispersion as the scatter of the values about the mean.
a) Sample Variance ($S^2$):
$$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $\bar{x}$ is the sample mean.

b) Population Variance ($\sigma^2$):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
where $\mu$ is the population mean.
Measures of Dispersion: Variance
• Example:
• Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50; $\bar{x} = 56$
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \dots + (50-56)^2] / (10 - 1) = 810/9 = 90$

• For the population variance the divisor is 10 and the answer is 810/10 = 81. A smaller variance means the data points are less spread out about the mean.
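The sketch below recomputes the variance step by step from the listed data, so that the sum of squared deviations and both divisors can be checked directly. Variable names are illustrative assumptions.

```python
# Data from the variance example.
data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

n = len(data)
x_bar = sum(data) / n                                  # mean of the data

# Sum of squared deviations about the mean.
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

sample_variance = sum_sq_dev / (n - 1)                 # divisor n - 1
population_variance = sum_sq_dev / n                   # divisor N

print(x_bar, sum_sq_dev)                               # 56.0 810.0
print(sample_variance, population_variance)            # 90.0 81.0
```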
Measures of Dispersion: Standard Deviation
The Standard Deviation:
It shows the variation in the data. If the data are close together, the standard deviation will be small. If the data are spread out, the standard deviation will be large.
The S.D. is the square root of the variance:
a) Sample Standard Deviation: $S = \sqrt{S^2}$
b) Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
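As a small illustration, Python's standard library exposes both forms directly. The data are the ten values from the variance example; the variable names are assumptions.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

# Sample S.D. is the square root of the sample variance (divisor n - 1).
sample_sd = stdev(data)
# Population S.D. is the square root of the population variance (divisor N).
population_sd = pstdev(data)

print(sample_sd)       # about 9.49
print(population_sd)   # 9.0

# Both agree with taking the square root of the variance by hand.
print(math.isclose(sample_sd, math.sqrt(variance(data))))
print(math.isclose(population_sd, math.sqrt(pvariance(data))))
```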
Covariance and Correlation
• This topic will be covered with Unit 4.
Graphical presentation

• Graphs drawn using Cartesian coordinates
  - Line graph
  - Frequency polygon
  - Frequency curve
  - Histogram
  - Bar graph
  - Scatter plot
• Pie chart
• Statistical maps
Line Graph

Year    MMR (per 1000)
1960    50
1970    45
1980    26
1990    15
2000    12

[Line graph: MMR/1000 on the vertical axis (0 to 60) against Year on the horizontal axis (1960 to 2000)]
Figure (1): Maternal mortality rate of (country), 1960-2000


Frequency polygon

Age (years)   Males       Females     Mid-point of interval
20 – 29       3 (12%)     2 (10%)     (20+30) / 2 = 25
30 – 39       9 (36%)     6 (30%)     (30+40) / 2 = 35
40 – 49       7 (28%)     5 (25%)     (40+50) / 2 = 45
50 – 59       4 (16%)     3 (15%)     (50+60) / 2 = 55
60 – 69       2 (8%)      4 (20%)     (60+70) / 2 = 65
Total         25 (100%)   20 (100%)
Frequency polygon
[Figure: frequency polygons for males and females, percentage (%) on the vertical axis against the age-interval mid-points (25, 35, 45, 55, 65) on the horizontal axis]
Figure (2): Distribution of 45 patients at (place), in (time) by age and sex


Frequency curve
[Figure: frequency curves for females and males, frequency on the vertical axis against age in years (20-, 30-, 40-, 50-, 60-69) on the horizontal axis]
Distribution of a group of cholera patients by age

Histogram

Age (years)   Frequency   %
25-           3           14.3
30-           5           23.8
40-           7           33.3
45-           4           19.0
60-65         2           9.5
Total         21          100

[Histogram: percentage (%) on the vertical axis (0 to 35) against age (years) on the horizontal axis]
Figure (2): Distribution of 21 cholera patients at (place), in (time) by age


Bar chart
[Figure: bar chart of percentage (%) by marital status (Single, Married, Divorced, Widowed)]
Bar chart
[Figure: grouped bar chart of percentage (%) by marital status (Single, Married, Divorced, Widowed), with separate bars for males and females]
Pie chart
[Figure: pie chart with three sectors]
• Translocation: 79%
• Inversion: 18%
• Deletion: 3%
Doughnut chart
[Figure: doughnut chart comparing Hospital A and Hospital B by category: DM, IHD, Renal]
Unit 2
Basic Statistical Methods
Normal Distribution
• In this section, we will mainly focus on Normal distribution,
Standard Normal distribution, skewness and kurtosis, Hypothesis
testing, type-I and type-II error etc.
Normal Distribution
It is one of the most important probability distributions in statistics.
It is the limiting form of the binomial distribution as ‘n’ (the number of trials) increases to a very large number for a fixed value of p.
The normal density is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
π, e: constants
µ: population mean
σ: population standard deviation
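To make the density concrete, here is a minimal Python sketch that evaluates it term by term and compares the result with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the chosen test point are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma (sigma > 0)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Peak of the curve for mu = 0, sigma = 1 (illustrative values).
peak = normal_pdf(0.0, 0.0, 1.0)
print(peak)                            # about 0.3989
print(NormalDist(0.0, 1.0).pdf(0.0))   # library value for comparison
```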
Characteristics of Normal Distribution
• Its importance comes from the central limit theorem: under some conditions (which include finite variance), averages of samples of observations of independently drawn random variables converge in distribution to the normal, that is, become approximately normally distributed when the number of observations is sufficiently large.
• It is bell shaped.
• The mean, median and mode are equal.
• It is unimodal (i.e. it has only one mode)
• The curve is symmetrical about the mean, which is equivalent to
saying that its shape is the same on both sides of a vertical line
passing through the center.
Characteristics of Normal Distribution
• The curve is continuous. i.e. there are no gaps or holes. For each value
of x, there is a corresponding value of y.
• The curve never touches the x-axis. Theoretically, no matter how far in
either direction the curve extends, it never meets the x-axis but gets
increasingly closer.
• Total area under the normal distribution curve is equal to 1.00 or 100%.
• The area under the normal curve that lies within one standard deviation of the mean is approx. 68%, within two standard deviations approx. 95% and within three standard deviations approx. 99.7%.
• The normal distribution is completely determined by the parameters µ and σ.
• All odd-order moments about the mean are 0.
The normal distribution depends on the two parameters µ and σ.
µ determines the location of the curve.
σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve.
Note that:
1. P(µ - 1σ < x < µ + 1σ) ≈ 68%
2. P(µ - 2σ < x < µ + 2σ) ≈ 95%
3. P(µ - 3σ < x < µ + 3σ) ≈ 99.7%
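These three probabilities can be checked numerically from the cumulative distribution function. The sketch below uses Python's `statistics.NormalDist`; the helper name `prob_within` is an assumption for illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variable X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Rounded values are roughly 0.6827, 0.9545 and 0.9973.
    print(k, round(prob_within(k), 4))
```

The result does not depend on the particular µ and σ, which is why the rule can be stated once for every normal distribution.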
Application of Normal Dist. Through
68%-95%-99.7% rule
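• A researcher measured the percent body fat of 2000 women. The resulting distribution has a mean of 25% fat and an S.D. of 4.0. Therefore, the scores would be distributed in the following manner:
  - about 68% of the women between 21% and 29% fat (25 ± 4),
  - about 95% between 17% and 33% fat (25 ± 8),
  - about 99.7% between 13% and 37% fat (25 ± 12).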
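Applying the rule to the body-fat example is simple arithmetic, sketched here in Python. The approximate head counts are derived from the rounded 68%, 95% and 99.7% shares and are illustrative only; the names are assumptions.

```python
mean_fat = 25.0     # mean percent body fat
sd_fat = 4.0        # standard deviation
n_women = 2000      # number of women measured

# Rounded shares from the 68%-95%-99.7% rule, keyed by number of S.D.s.
rule = {1: 0.68, 2: 0.95, 3: 0.997}

def band(k):
    """Return (lower limit, upper limit, approximate count) for mean +/- k S.D."""
    lower = mean_fat - k * sd_fat
    upper = mean_fat + k * sd_fat
    return lower, upper, round(rule[k] * n_women)

for k in (1, 2, 3):
    print(k, band(k))
# 1 (21.0, 29.0, 1360)
# 2 (17.0, 33.0, 1900)
# 3 (13.0, 37.0, 1994)
```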
The Standard Normal distribution:
• Is a special case of the normal distribution with a mean of 0 and a standard deviation of 1.
• The equation for the standard normal distribution is written as
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$$

It has the following characteristics:
1- It is symmetrical about 0.
2- The total area under the curve above the x-axis is 1.
3- We can use a separate table to find the probabilities and areas.
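In place of a printed table, the areas can be looked up in software. The sketch below assumes the usual standardization z = (x - µ) / σ, which these slides do not state explicitly, and reuses the body-fat figures (mean 25, S.D. 4) purely as an illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

def z_score(x, mu, sigma):
    """Standardize x (assumed formula: z = (x - mu) / sigma)."""
    return (x - mu) / sigma

z = z_score(29.0, 25.0, 4.0)     # one S.D. above the mean
p_below = std_normal.cdf(z)      # area to the left of z, about 0.8413

print(z)                         # 1.0
print(round(p_below, 4))
print(std_normal.cdf(0.0))       # 0.5, by symmetry about 0
```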
Tests for Skewness and Kurtosis
Skewness is a measure of symmetry, or the lack of
symmetry. A distribution, or data set is symmetric if it
looks the same to the left and right of the center point.
Tests for Skewness and Kurtosis
Kurtosis is a measure of whether the data are heavy-tailed
or light-tailed relative to a normal distribution.
Rules for Skewness and Kurtosis
Rules for Skewness

▫ Skewness > 0: positively (right) skewed, i.e. Mode < Median < Mean
▫ Skewness < 0: negatively (left) skewed, i.e. Mean < Median < Mode
▫ Skewness = 0 is acceptable, i.e. Mean = Median = Mode
▫ If the absolute value of the skewness statistic does not exceed 3 times the standard error of skewness, the skewness is acceptable; otherwise it is not. In the following case, 3 × 0.150 = 0.450, which is far below |-1.493| = 1.493, so the skewness is not acceptable. The same check applies to the others.
Rules for Kurtosis

▫ Different books use different scales: some say kurtosis between -3 and +3 is acceptable (i.e. close to a normal curve), some say between -2 and +2, and some only look at whether the value is above or below 0.
▫ If the absolute value of the kurtosis statistic does not exceed 3 times the standard error of kurtosis, the kurtosis is acceptable; otherwise it is not. In the following case, 3 × 0.299 = 0.897, which is far below 3.934, so the kurtosis is not acceptable. The same check applies to the others.
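The three-standard-error rule of thumb can be written as a one-line check. The sketch below reads the rule as "acceptable when the absolute statistic does not exceed 3 times its standard error", which is an interpretation of the slide rather than a stated definition; the function name is hypothetical.

```python
def within_three_se(statistic, std_error, multiplier=3.0):
    """Rule of thumb (assumed reading): |statistic| <= multiplier * standard error."""
    return abs(statistic) <= multiplier * std_error

# Skewness example from the slide: statistic -1.493, standard error 0.150.
print(within_three_se(-1.493, 0.150))   # False (limit is about 0.450)

# Kurtosis example from the slide: statistic 3.934, standard error 0.299.
print(within_three_se(3.934, 0.299))    # False (limit is about 0.897)
```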
Thank you
