Descriptive Statistics
Statistical procedures used to summarise,
organise, and simplify data. This process
should be carried out in a way that
reflects the overall findings.
Raw data is made more manageable
Raw data is presented in a logical form
Patterns can be seen from organised data
Tables
Graphical techniques
Measures of Central Tendency
Measures of Dispersion (variability)
Coefficient of Correlation
11 September 2015 2
Describing data with tables
1) Frequency table
2) Relative and cumulative frequency
3) Grouped frequency
4) Open-ended groups
5) Cross-tabulation
6) Tables that are not contingency tables
1) Frequency table
Class interval   Tally        Frequency
15.2-20.1        ||||| |||    8
20.2-25.1        |||||        5
25.2-30.1        |||          3
30.2-35.1        ||           2
2) Relative frequency, cumulative frequency
Relative frequency: the frequency of each category as a percentage of the total
Cumulative frequency: the running total of the relative frequencies up to and including each category

Parity   No. of women   Percentage (relative frequency)   Cumulative percentage
0        5              12.5                              12.5
1        6              15                                27.5
2        14             35                                62.5
3        10             25                                87.5
4        3              7.5                               95
7        1              2.5                               97.5
8        1              2.5                               100
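The two derived columns can be reproduced with a short calculation. This is a minimal Python sketch using the parity counts from the table; the function name `relative_and_cumulative` is illustrative and not part of the slides.

```python
from itertools import accumulate

# Parity and number of women, taken from the table above
parity = [0, 1, 2, 3, 4, 7, 8]
women = [5, 6, 14, 10, 3, 1, 1]

def relative_and_cumulative(counts):
    """Return relative (%) and cumulative (%) frequencies for a list of counts."""
    total = sum(counts)
    relative = [100 * c / total for c in counts]
    cumulative = list(accumulate(relative))  # running total of the percentages
    return relative, cumulative

rel, cum = relative_and_cumulative(women)
for p, n, r, c in zip(parity, women, rel, cum):
    print(f"{p:>6} {n:>12} {r:>10.1f} {c:>10.1f}")
```

The last cumulative value should always be 100%, which is a quick check that no category was missed.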
3) Grouped frequency
Grouped frequency: works for continuous metric data
Birth weight (g)   No. of infants
2700-2999          2
3000-3299          3
3300-3599          9
3600-3899          9
3900-4199          4
4200-4499          3
Each group has a width of 300 g. In each class, the first value (e.g. 3600) is the class lower limit and the second value (e.g. 3899) is the class upper limit.
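Assigning each observation to its class is simple integer arithmetic once the starting value and group width are fixed. The sketch below assumes whole-gram weights, a first class starting at 2700 g and a width of 300 g as in the table; the weights themselves are hypothetical, made up only to exercise the function.

```python
from collections import Counter

def class_label(weight_g, start=2700, width=300):
    """Map a whole-gram birth weight to its class, e.g. 3150 -> '3000-3299'."""
    lower = start + ((weight_g - start) // width) * width  # class lower limit
    upper = lower + width - 1                              # class upper limit
    return f"{lower}-{upper}"

# Hypothetical birth weights in grams (illustrative only, not the slide's data)
weights = [2850, 3150, 3420, 3610, 3890, 3950, 4300]
table = Counter(class_label(w) for w in weights)

# Print classes in ascending order of their lower limit
for label in sorted(table, key=lambda s: int(s.split("-")[0])):
    print(label, table[label])
```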
Note
5) Cross-tabulation
Two variables within a single group of individuals
Breast lump diagnosis   2 or fewer children: Yes   No   Totals
3. The lumps in women with 2 or fewer children tend to be more benign than those in women with more than 2 children
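A cross-tabulation simply counts every combination of the two variables. The slide's cell counts did not survive, so this Python sketch uses invented records purely to show the counting; the labels "benign"/"malignant" and the helper name `cross_tab` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical records: (diagnosis, has 2 or fewer children?) for each woman.
# These values are invented for illustration and are NOT the slide's data.
records = [
    ("benign", True), ("benign", True), ("malignant", False),
    ("benign", False), ("malignant", True), ("benign", True),
]

def cross_tab(pairs):
    """Count each (row, column) combination and return the row/column labels."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs}, reverse=True)  # True ("Yes") first
    return counts, rows, cols

counts, rows, cols = cross_tab(records)
print("diagnosis", *("Yes" if c else "No" for c in cols), "Total")
for r in rows:
    cells = [counts[(r, c)] for c in cols]
    print(r, *cells, sum(cells))  # row counts followed by the row total
```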
6) Not contingency tables
Describing data with charts
Histogram & Frequency Polygon
The pie chart
Best suited to about 4-7 categories
One variable only
Start at 0° and draw the segments in the same order as the table
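Each slice's angle is its share of the total multiplied by 360°. As a minimal Python sketch, the parity counts from the relative-frequency table (7 categories) give the angles below; the function names are illustrative.

```python
def pie_angles(counts):
    """Angle of each slice in degrees: frequency / total * 360."""
    total = sum(counts)
    return [360 * c / total for c in counts]

def slice_boundaries(counts):
    """(start, end) angle of each slice, starting at 0 degrees in table order."""
    start = 0.0
    bounds = []
    for angle in pie_angles(counts):
        bounds.append((start, start + angle))
        start += angle
    return bounds

women = [5, 6, 14, 10, 3, 1, 1]  # parity counts from the relative-frequency table
print(pie_angles(women))  # [45.0, 54.0, 126.0, 90.0, 27.0, 9.0, 9.0]
```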
The simple bar chart
The clustered bar chart
The stacked bar chart
The dot plot
Scatter-plot
Example of a Scatter-plot matrix
(multiple pair-wise plots)
Describing data with numeric summary values
1- numbers, percentages and proportions
Measures of central tendency
Also called measures of location
Gives one number which is representative
of all the data
They are the:
Mean
Median
Mode
Sample Mean
Measures of Location - Mean
Given a data set of size n: x_1, x_2, x_3, ..., x_n, the mean of the x's will be denoted by
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
How many hours of television do you watch in a week?
Example: {5, 7, 3, 38, 7} in hours, n = 5
$\sum_{i=1}^{5} x_i = 60$ (the sum of the data points)
leading to: $\bar{x} = \frac{60}{5} = 12$ hours
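The mean formula translates directly into code: add up the observations and divide by n. A minimal Python sketch using the television example (the helper name `mean` is illustrative):

```python
def mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)

tv_hours = [5, 7, 3, 38, 7]  # hours of television per week, from the example
print(sum(tv_hours), mean(tv_hours))  # 60 12.0
```

Dropping the single heavy viewer (38 hours) gives mean([5, 7, 3, 7]) = 5.5, which shows how strongly one extreme value pulls the mean.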
Summation Sign "Σ"
In the formula to find the mean, we use the "summation sign" Σ.
This is just mathematical shorthand for "add up all of the observations":
$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \dots + X_n$
Geometric Mean Example
x       ln(x)
8       2.08
5       1.61
4       1.39
12      2.48
15      2.71
7       1.95
28      3.33
Sum: 79   15.55
The mean using the raw data is: $\bar{x} = \frac{79}{7} = 11.3$
While on the log scale: $\frac{\sum \ln x}{n} = \frac{15.55}{7} = 2.22$
leading to a geometric mean (the antilog of the mean log value) of $e^{15.55/7} \approx 9.22$
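The log-transform, average, back-transform steps above can be checked in a few lines. This Python sketch assumes Python 3.8+ for `statistics.geometric_mean`, which is used only as a cross-check.

```python
import math
import statistics

x = [8, 5, 4, 12, 15, 7, 28]  # data from the geometric mean example

arithmetic = sum(x) / len(x)                     # 79 / 7
log_mean = sum(math.log(v) for v in x) / len(x)  # mean on the natural-log scale
geometric = math.exp(log_mean)                   # back-transform to the original scale

print(round(arithmetic, 1), round(log_mean, 2), round(geometric, 2))  # 11.3 2.22 9.22
print(round(statistics.geometric_mean(x), 2))  # same result from the library
```

The geometric mean is smaller than the arithmetic mean here because the log scale reduces the pull of the large value 28.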
Mean from a Positively Skewed Distribution
Mean
Advantages:
Simple and easy
Most widely used
Can be used for further statistical tests
All values are included
Does not need arrangement of data
Disadvantages:
Affected by extreme values
Sometimes looks ridiculous, e.g. average number of children = 2.7
Median
Measure of Location- Median
Another Example of the Median
Median
Advantages:
Not affected by extreme values
Used for growth curves and income
Can be determined graphically
Disadvantages:
Needs arrangement of data
Difficult to calculate from large amounts of data
Not all values are represented
Final Measure of Location - Mode
It is the most common value found in the dataset (the "fashionable" value)
Hb level of 5 pregnant women: 12, 12.5, 11, 13, 12.5   Mode = 12.5
Hb level of 6 pregnant women: 12, 12.5, 11, 13, 12.5, 8   Mode = 12.5
More than one mode may occur (bimodal, trimodal)
Sometimes there is no mode.
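Python's standard library can list every most-common value, which covers the bimodal case directly. A minimal sketch assuming Python 3.8+ for `statistics.multimode`:

```python
import statistics

hb_5 = [12, 12.5, 11, 13, 12.5]     # Hb levels of 5 pregnant women
hb_6 = [12, 12.5, 11, 13, 12.5, 8]  # the same values plus an extreme low value

print(statistics.multimode(hb_5))  # [12.5]
print(statistics.multimode(hb_6))  # [12.5]: the extreme value 8 does not change the mode
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]: bimodal
```

When every value occurs only once, `multimode` returns all of them, which corresponds to the "no mode" case above.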
Mode
Advantages:
Not affected by extreme values
Disadvantages:
Not all values are represented
Distribution Characteristics
Mode: Peak(s)
Median: Equal areas point
Mean: Balancing point
[Figure: distribution curve with the mode, median and mean marked]
Shapes of Distributions
[Figure: distribution shapes with the mode, median and mean marked]
Shapes of Distributions
Symmetric (right and left sides are mirror images)
Left tail looks like right tail
Mean = Median = Mode
Choosing the most appropriate measure
Type of data   Mode   Median   Mean
Nominal        yes    no       no
Measures of Dispersion
After we know the mean of a set of
measurements it is often of interest to measure
the degree of variation or dispersion around the
mean.
The measurement of dispersion (or variation)
plays an important role in the methods of
statistical inference.
We will discuss:
Range.
Variance.
Standard Deviation.
Range
The difference between the highest and lowest values.
Range = largest value - smallest value
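As a small Python sketch, the range of the dataset {8, 5, 4, 12, 15, 5, 7} used in the variance example is 15 - 4 = 11; the helper name `data_range` is illustrative.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

data = [8, 5, 4, 12, 15, 5, 7]  # dataset from the variance example
print(data_range(data))  # 11
```

Replacing 15 with an extreme 150 changes the range to 146 even though the other six values are unchanged, which is why the range depends on only two values.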
Range
Advantages:
Easy to calculate
Disadvantages:
It is affected by extreme values.
Its value is determined by only two values.
The interpretation of the range is difficult.
It does not provide information about the other values and how dispersed they are.
Variance and Standard Deviation
$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
Variance Example
Consider the dataset {8, 5, 4, 12, 15, 5, 7}
◦ Use the VARIANCE function in SPSS to calculate it, OR
◦ Use the data from the table
$\bar{x} = 8$

x     x_i - x̄    (x_i - x̄)²
8     0           0
5     -3          9
4     -4          16
12    4           16
15    7           49
5     -3          9
7     -1          1

$\sum_{i=1}^{n}(x_i - \bar{x})^2 = 100$
$s^2 = 100 / 6 = 16.67$
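Both forms of the variance formula give the same answer for this dataset. A minimal Python sketch (function names are illustrative; `statistics.variance` is the standard-library sample variance with the same n - 1 divisor):

```python
import statistics

def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)
    return ss / (n - 1)

def sample_variance_shortcut(values):
    """Computational form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)

data = [8, 5, 4, 12, 15, 5, 7]
print(round(sample_variance(data), 2))  # 16.67
print(round(sample_variance_shortcut(data), 2), round(statistics.variance(data), 2))
```

The shortcut form avoids computing every deviation first, but with very large values it can lose precision to rounding, so the deviation form is the safer default.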
Standard Deviation
A measure of the typical deviation of values around the mean (the square root of the variance)
$SD = s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
s has the advantage of being in the same units as the original variable x
From the previous example, $s = \sqrt{16.67} = 4.08$
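Taking the square root of the sample variance reproduces the value above. A minimal Python sketch; `statistics.stdev` uses the same n - 1 divisor, whereas `statistics.pstdev` divides by n and would give a smaller value.

```python
import math
import statistics

data = [8, 5, 4, 12, 15, 5, 7]
n = len(data)
xbar = sum(data) / n
sd = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

print(round(sd, 2))                      # 4.08
print(round(statistics.stdev(data), 2))  # 4.08, same n - 1 divisor
```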
Standard deviation
E.g. Systolic blood pressure
Smoking males   Non-smoking males
120 130 120 150 110 130 120 140
130 170 180 160 130 150 160 130
170 150 130 150
Inter-quartile range
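The median divides a distribution into two halves; the lower quartile (Q1) and upper quartile (Q3) divide it further into quarters, and the inter-quartile range is Q3 - Q1, the spread of the middle 50% of the data.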
Example
The ordered blood pressure data is:
113 124 124 132 146 151 170
Q1 = 124 (2nd value) and Q3 = 151 (6th value), so IQR = 151 - 124 = 27
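The median and quartiles of the ordered blood pressure data can be computed with the standard library. This sketch assumes Python 3.8+ for `statistics.quantiles`; its default `method="exclusive"` places the quartiles at positions (n + 1)/4 and 3(n + 1)/4. Other conventions give slightly different quartiles: `method="inclusive"` gives Q3 = 148.5 for these data, so always state which rule was used.

```python
import statistics

bp = [113, 124, 124, 132, 146, 151, 170]  # ordered blood pressure data

median = statistics.median(bp)
# "exclusive" quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 of the ordered data
q1, q2, q3 = statistics.quantiles(bp, n=4, method="exclusive")
print(median, q1, q3, q3 - q1)  # 132 124.0 151.0 27.0
```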
Coefficient of Variation
$CV = \frac{s}{\bar{x}} \times 100\%$
A measure of spread that is independent of the units of measurement of the variables.
Consequently, it is a useful way of comparing the dispersion of variables measured on different scales.
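A minimal Python sketch of the CV, reusing the variance example (mean 8, s = 4.08); the function name is illustrative. Multiplying every value by 1000 (for example, converting grams to milligrams) leaves the CV unchanged, which is the unit-independence described above.

```python
import statistics

def coefficient_of_variation(values):
    """CV = s / mean * 100%, using the n - 1 standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

data = [8, 5, 4, 12, 15, 5, 7]  # variance example: mean 8, s = 4.08
print(round(coefficient_of_variation(data), 1))  # 51.0

rescaled = [v * 1000 for v in data]  # same data in units 1000 times smaller
print(round(coefficient_of_variation(rescaled), 1))  # unchanged
```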
Coefficient of Correlation
Measure of linear association (relationship)
between two continuous variables.
Setting:
two measurements are made for each
observation.
Sample consists of pairs of values and you
want to determine the association between
the variables.
Association Examples
Example 1: Association between a mother’s
weight and the birth weight of her child
2 measurements: mother’s weight and baby’s weight
Both continuous measures
Birth Weight Data
x: birth weight in ounces
y: increase in weight between the 70th and 100th days of life, expressed as a percentage of birth weight
x (oz)   y (%)
112      63
111      66
107      72
119      52
92       75
80       118
81       120
84       114
118      42
106      72
103      90
94       91
Pearson Correlation Coefficient
[Figure: scatter plot of the Birth Weight Data; x-axis: Birth Weight (in ounces); y-axis: Increase in Birth Weight (%)]
Pearson Correlation Results
          x (oz)     y (%)
x (oz)    1
y (%)     -0.94629   1
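The coefficient in the results table can be reproduced from the 12 pairs of the Birth Weight Data. This Python sketch computes Pearson's r from deviations about the means, $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$; the function name `pearson_r` is illustrative (Python 3.10+ also offers `statistics.correlation`).

```python
import math

# Birth weight data: x = birth weight (oz), y = % increase in weight, days 70-100
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

def pearson_r(xs, ys):
    """Pearson r = S_xy / sqrt(S_xx * S_yy), using deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

print(round(pearson_r(x, y), 5))  # -0.94629
```

The value close to -1 indicates a strong negative linear association: babies with higher birth weights showed smaller percentage weight gains.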
Uses of Statistics
Data presentation
Simplifies large numbers of figures and
reduces volume of data
Enables comparisons across different
groups
Helps us to form and test hypotheses
Helps in prediction, planning and
administration
Helps form suitable policies
Helps measure the standard of health
Thank you