AND
UNIVARIATE DISTRIBUTIONS
ENGSTAT
Statistics
A branch of mathematics dealing with the collection,
presentation, analysis, and interpretation of masses of numerical
data to make decisions, solve problems, and design products and
processes.
Types
Descriptive: describes the main features of a collection of data quantitatively
Inferential:
infers from the sample data what may be true of the population
makes judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance
Figure: Population (described by parameters) and Sample (described by statistics), connected by statistical inference.
Types of Data
QUANTITATIVE (Numeric)
  Continuous: blood pressure, height, weight, age
  Discrete: number of children, number of asthma attacks per week
QUALITATIVE (Categorical)
  Ordinal (ordered categories): stage of cancer; better, same, worse; disagree, neutral, agree
  Nominal (unordered categories): gender (male/female); alive or dead; blood type (O, A, B, AB)
Data Presentation
To Show                   Use                     Data Needed
Frequency of occurrence   Bar chart, pie chart    Tallies by category
Figure: pie chart of diagnoses: Crohn's disease (88, 17%), ulcerative colitis (48, 9%), non-specific chronic inflammatory disease (74%).
Figure: bar chart of counts of Crohn's disease, ulcerative colitis, and non-specific chronic inflammatory disease, by sex (male/female) within age groups A1 (< 16), A2 (17-40), A3 (> 40).
Figure: stacked bar chart of the same diagnosis counts by sex within age groups A1 (< 16), A2 (17-40), A3 (> 40).
Figure: bar chart of Crohn's disease location, L1 (Ileal), L2 (Colon), L3 (Ileocolonic), L4 (UGT), by sex within age groups A1 (< 16), A2 (17-40), A3 (> 40).
Figure: bar chart of Crohn's disease behavior, B1 (NSNP), B2 (Stricturing), B3 (Penetrating), by sex within age groups A1 (< 16), A2 (17-40), A3 (> 40).
Tools for Describing Data: Pie Chart
Tools for Describing Data: Pareto Chart
Tools for Describing Data: Dot Plot
The dot plot, as a representation of a distribution, consists of a group of data points plotted on a simple scale.
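Data: 8 6 4 10 3 8 4 8 5
Sample Mean is given by:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{8+6+4+10+3+8+4+8+5}{9} = \frac{56}{9} \approx 6.22$$
Ordered data: 3 4 4 5 6 8 8 8 10 (the median is the middle, 5th, value: 6)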
Measure of Central Tendency: Ungrouped Data
Data: 8 6 4 10 3 8 4 8 5
The mode of a sample is the value that occurs most frequently.
3 4 4 5 6 8 8 8 10: mode = 8
3 4 4 4 6 8 8 8 10: bimodal (modes 4 and 8)
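A small sketch that returns every mode, so a bimodal sample such as the second one above is reported as such. It uses `collections.Counter` from the standard library; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 4, 4, 5, 6, 8, 8, 8, 10]))  # [8]
print(modes([3, 4, 4, 4, 6, 8, 8, 8, 10]))  # [4, 8]
```

Python 3.8 and later also provide `statistics.multimode`, which returns the modes in the order they are first encountered rather than sorted.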
Measure of Dispersion: Ungrouped Data
Sample variance and sample standard deviation:
$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$$
$$s = \sqrt{s^2}$$
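A sketch applying the two formulas above to the same nine values, with a cross-check against `statistics.variance`, which also uses the n - 1 divisor.

```python
import math
import statistics

data = [8, 6, 4, 10, 3, 8, 4, 8, 5]
n = len(data)
x_bar = sum(data) / n

# Sample variance: squared deviations from the mean, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)
s = math.sqrt(s2)

print(f"s^2 = {s2:.4f}, s = {s:.4f}")
print(math.isclose(s2, statistics.variance(data)))  # True
```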
Measure of Moments: Ungrouped Data
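1. Skewness: a measure of asymmetry
$$\text{skewness} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^3}{(n-1)s^3}$$
If skewness is positive, the distribution is skewed to the right.
If skewness is negative, the distribution is skewed to the left.
2. Kurtosis: a measure of whether the data are peaked or flat relative to a normal distribution
$$k = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^4}{(n-1)s^4}$$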
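A sketch that applies the two moment formulas exactly as written above, with (n - 1) in the denominator and s the sample standard deviation. Statistical packages often use other conventions (for example dividing by n, or subtracting 3 to report "excess" kurtosis), so their numbers may differ. The function names are illustrative.

```python
import math

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def skewness(xs):
    """Sum of cubed deviations divided by (n - 1) * s**3."""
    n = len(xs)
    m = sum(xs) / n
    s = sample_sd(xs)
    return sum((x - m) ** 3 for x in xs) / ((n - 1) * s ** 3)

def kurtosis(xs):
    """Sum of fourth-power deviations divided by (n - 1) * s**4."""
    n = len(xs)
    m = sum(xs) / n
    s = sample_sd(xs)
    return sum((x - m) ** 4 for x in xs) / ((n - 1) * s ** 4)

data = [8, 6, 4, 10, 3, 8, 4, 8, 5]
print(f"skewness = {skewness(data):.4f}, kurtosis = {kurtosis(data):.4f}")
print(skewness([1, 2, 3, 4, 5]))    # 0.0 for a symmetric sample
print(skewness([1, 1, 1, 10]) > 0)  # True: long tail to the right
```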
Measure of Central Tendency: Grouped Data
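$$\text{Mean} = \bar{X} = \frac{\sum f_i X_i}{N}$$
where $f_i$ = frequency of class $i$, $X_i$ = class mark of class $i$, and $N = \sum f_i$ is the total frequency.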
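A sketch of the grouped mean, using the class marks and frequencies from the worked frequency table later in this section (classes 10-13 through 42-45, total 40).

```python
# Class marks X_i and frequencies f_i from the 40-value worked example
class_marks = [11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5, 39.5, 43.5]
freqs       = [0,    3,    15,   10,   5,    2,    3,    2,    0]

N = sum(freqs)  # total frequency
mean = sum(f * x for f, x in zip(freqs, class_marks)) / N
print(N, mean)  # 40 24.0
```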
Measure of Central Tendency: Grouped Data
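$$\text{Median} = L_X + \left(\frac{N/2 - \sum f_i}{f_{med}}\right)c$$
where $\sum f_i$ = sum of the frequencies preceding the median class, $f_{med}$ = frequency of the median class, $L_X$ = lower (true) limit of the median class, and $c$ = class width.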
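A sketch of the grouped-median formula above, applied to the worked frequency table later in this section (true lower limits 9.5, 13.5, ..., class width c = 4). It walks the classes until the running frequency reaches N/2; classes with zero frequency are skipped to avoid dividing by zero.

```python
# Worked-example classes: lower true limits, frequencies, and class width c
lower_limits = [9.5, 13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs        = [0,   3,    15,   10,   5,    2,    3,    2,    0]
c = 4

N = sum(freqs)
target = N / 2  # position of the median

cum_before = 0  # sum of frequencies preceding the current class
for L, f in zip(lower_limits, freqs):
    # median class: first non-empty class whose cumulative frequency reaches N/2
    if f > 0 and cum_before + f >= target:
        median = L + (target - cum_before) / f * c
        break
    cum_before += f

print(round(median, 2))  # 22.3
```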
Measure of Central Tendency: Grouped Data
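$$\text{Mode} = L_{mo} + \left(\frac{f_{mo} - f_1}{2f_{mo} - f_1 - f_2}\right)c$$
where $L_{mo}$ = lower (true) limit of the modal class, $f_{mo}$ = frequency of the modal class, $f_1$ = frequency of the class preceding the modal class, $f_2$ = frequency of the class succeeding the modal class, and $c$ = class width.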
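A sketch of the grouped-mode formula above on the same worked frequency table. Two assumptions not stated on the slide: if several classes tie for the highest frequency the first is taken, and a missing neighbouring class (modal class at either end) is treated as having frequency 0.

```python
lower_limits = [9.5, 13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs        = [0,   3,    15,   10,   5,    2,    3,    2,    0]
c = 4

k = freqs.index(max(freqs))  # modal class: highest frequency (first one if tied)
f_mo = freqs[k]
f1 = freqs[k - 1] if k > 0 else 0               # assumption: 0 if no preceding class
f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumption: 0 if no succeeding class

mode = lower_limits[k] + (f_mo - f1) / (2 * f_mo - f1 - f2) * c
print(round(mode, 2))  # 20.32
```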
Quartile, Decile, Percentile
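$$Q_n = L_{Q_n} + \left(\frac{nN/4 - \sum f_i}{f_{Q_n}}\right)c$$
$$D_n = L_{D_n} + \left(\frac{nN/10 - \sum f_i}{f_{D_n}}\right)c$$
$$P_n = L_{P_n} + \left(\frac{nN/100 - \sum f_i}{f_{P_n}}\right)c$$
where, as for the median, $\sum f_i$ is the sum of the frequencies preceding the class that contains the required position, $f$ is that class's frequency, $L$ is its lower (true) limit, and $c$ is the class width.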
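The three formulas above differ only in the fraction of N being located (n/4, n/10, or n/100), so a single helper covers all of them. A sketch on the worked frequency table; `grouped_fractile` is a hypothetical name.

```python
lower_limits = [9.5, 13.5, 17.5, 21.5, 25.5, 29.5, 33.5, 37.5, 41.5]
freqs        = [0,   3,    15,   10,   5,    2,    3,    2,    0]
c = 4

def grouped_fractile(p, lower_limits, freqs, c):
    """Value below which a fraction p of the N observations lie.

    p = n/4 gives Q_n, p = n/10 gives D_n, p = n/100 gives P_n.
    """
    N = sum(freqs)
    target = p * N
    cum_before = 0  # frequencies preceding the current class
    for L, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            return L + (target - cum_before) / f * c
        cum_before += f
    raise ValueError("p must not exceed 1")

q1 = grouped_fractile(1 / 4, lower_limits, freqs, c)     # first quartile
d5 = grouped_fractile(5 / 10, lower_limits, freqs, c)    # fifth decile (the median)
p90 = grouped_fractile(90 / 100, lower_limits, freqs, c) # 90th percentile
print(round(q1, 2), round(d5, 2), round(p90, 2))  # 19.37 22.3 34.83
```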
Interquartile Range
Also called the midspread or middle fifty, it equals the difference between the upper and lower quartiles:
$$IQR = Q_3 - Q_1$$
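For ungrouped data, Python's standard library can compute the quartiles directly with `statistics.quantiles` (Python 3.8+). Its default `method='exclusive'` is only one of several interpolation conventions, so textbook or spreadsheet quartiles may differ slightly. The sketch reuses the nine-value sample from the ungrouped examples.

```python
import statistics

data = [8, 6, 4, 10, 3, 8, 4, 8, 5]

# n=4 returns the three cut points Q1, Q2, Q3 that split the data into quarters
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 4.0 6.0 8.0 4.0
```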
Measure of Dispersion: Grouped Data
Standard Deviation
1. For a normal distribution, the 68-95-99.7 rule applies: about 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean.
2. The square of the standard deviation is the variance.
3. s denotes the standard deviation of a sample; σ denotes the standard deviation of a population.
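No grouped formula is given above; a common form, assumed here by analogy with the ungrouped formula, weights each squared deviation of a class mark by its frequency: $s^2 = \sum f_i (X_i - \bar{X})^2 / (N-1)$. A sketch on the worked frequency table:

```python
import math

class_marks = [11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5, 39.5, 43.5]
freqs       = [0,    3,    15,   10,   5,    2,    3,    2,    0]

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, class_marks)) / N

# Frequency-weighted squared deviations of the class marks, divided by N - 1
s2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, class_marks)) / (N - 1)
s = math.sqrt(s2)
print(f"variance = {s2:.2f}, s = {s:.2f}")  # variance = 40.36, s = 6.35
```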
Measure of Moments: Grouped Data
1. Skewness: a measure of asymmetry
$$\text{skewness} = \frac{\sum_i f_i (X_i - \bar{X})^3}{(N-1)s^3}$$
Measure of Moments: Grouped Data
2. Kurtosis: a measure of whether the data are peaked or flat relative to a normal distribution
$$k = \frac{\sum_i f_i (X_i - \bar{X})^4}{(N-1)s^4}$$
For a normal distribution, $k = 3$.
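A sketch applying the grouped skewness and kurtosis formulas to the worked frequency table. As with the grouped standard deviation, s is assumed to be computed from the class marks with an N - 1 divisor, since the slides do not state the grouped form of s. `central_sum` is a hypothetical helper name.

```python
import math

class_marks = [11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5, 39.5, 43.5]
freqs       = [0,    3,    15,   10,   5,    2,    3,    2,    0]

N = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, class_marks)) / N

def central_sum(power):
    """Frequency-weighted sum of (X_i - mean) ** power over the classes."""
    return sum(f * (x - mean) ** power for f, x in zip(freqs, class_marks))

s = math.sqrt(central_sum(2) / (N - 1))  # grouped sample standard deviation
skewness = central_sum(3) / ((N - 1) * s ** 3)
kurtosis = central_sum(4) / ((N - 1) * s ** 4)
print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}")
```

For these data the skewness is positive, consistent with a right-skewed distribution whose longer tail runs toward the 34-41 classes.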
Grouped Data: Frequency Distribution
1. Determine the range: Range = highest value - lowest value
24 27 18 21 21 18 24 27
21 15 24 21 21 21 18 24
37 31 24 15 30 21 18 21
25 24 15 39 27 27 24 39
24 27 24 21 37 18 34 21
Frequency Distribution
Range = 39 - 15 = 24
Number of classes (Sturges' rule): K = 1 + 3.322 log₁₀(40) ≈ 6.3
Class interval = 24/6.3 ≈ 3.8, rounded to 4

Classes (apparent limits)   f    True class limits   Class mark
10-13                       0    9.5-13.5            11.5
14-17                       3    13.5-17.5           15.5
18-21                       15   17.5-21.5           19.5
22-25                       10   21.5-25.5           23.5
26-29                       5    25.5-29.5           27.5
30-33                       2    29.5-33.5           31.5
34-37                       3    33.5-37.5           35.5
38-41                       2    37.5-41.5           39.5
42-45                       0    41.5-45.5           43.5
Total                       40
(f = frequency)
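A sketch that rebuilds the frequency table above from the 40 raw values: range, Sturges' K, the rounded class width, and a tally per class. The first lower limit (10) and the last class (42-45) are taken from the table rather than derived, so they are hard-coded here.

```python
import math

# The 40 observations from the slide
data = [24, 27, 18, 21, 21, 18, 24, 27,
        21, 15, 24, 21, 21, 21, 18, 24,
        37, 31, 24, 15, 30, 21, 18, 21,
        25, 24, 15, 39, 27, 27, 24, 39,
        24, 27, 24, 21, 37, 18, 34, 21]

n = len(data)
data_range = max(data) - min(data)  # 39 - 15 = 24
k = 1 + 3.322 * math.log10(n)       # Sturges' rule, about 6.3
width = round(data_range / k)       # 24 / 6.3 = 3.8, rounded to 4

# Apparent class limits 10-13, 14-17, ..., 42-45 (start and end taken from the table)
freqs = []
for lo in range(10, 46, width):
    hi = lo + width - 1
    freqs.append(sum(lo <= x <= hi for x in data))  # tally values in [lo, hi]
    print(f"{lo}-{hi}  f={freqs[-1]}  true limits {lo - 0.5}-{hi + 0.5}  mark {(lo + hi) / 2}")
```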
Figure: frequency histogram (frequency vs. class interval, classes 10-13 through 42-45).
Figure: ogive (cumulative frequency distribution), showing the less-than and greater-than ogives plotted against the class boundaries 13.5 to 41.5.
Measures of Central Tendency
Classes   f    True limits   Xi     Cumulative frequency
10-13     0    9.5-13.5      11.5   0
14-17     3    13.5-17.5     15.5   3
18-21     15   17.5-21.5     19.5   18    (modal class)
22-25     10   21.5-25.5     23.5   28    (median class)
26-29     5    25.5-29.5     27.5   33
30-33     2    29.5-33.5     31.5   35
34-37     3    33.5-37.5     35.5   38
38-41     2    37.5-41.5     39.5   40
42-45     0    41.5-45.5     43.5   40
Total     40
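A sketch of how the cumulative column and the two labelled classes in the table above are obtained: a running total of the frequencies, the class with the largest f (modal class), and the first class whose cumulative frequency reaches N/2 (median class).

```python
from itertools import accumulate

classes = ["10-13", "14-17", "18-21", "22-25", "26-29",
           "30-33", "34-37", "38-41", "42-45"]
freqs = [0, 3, 15, 10, 5, 2, 3, 2, 0]

cum = list(accumulate(freqs))  # "less than" cumulative frequencies
N = cum[-1]

modal = classes[freqs.index(max(freqs))]  # class with the highest frequency
# Median class: first class whose cumulative frequency reaches N/2
median_class = next(c for c, cf in zip(classes, cum) if cf >= N / 2)

print(cum)                  # [0, 3, 18, 28, 33, 35, 38, 40, 40]
print(modal, median_class)  # 18-21 22-25
```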
Exercises
Raw Data
66, 58, 56, 46, 44, 46, 46, 60, 70, 54, 80, 62, 64, 44, 60, 66,
82, 86, 94, 70, 44, 64, 52, 46, 40, 56, 52, 60, 44, 48, 64, 55
a. Group data into frequency distribution
b. Calculate Mean, Median, Mode
c. Calculate Standard deviation
d. Select one percentile, one decile, and one quartile, and calculate each
e. Graph the frequency distribution as a histogram and a frequency polygon