Quartile
Measures of Variation
  The Range, Variance and Standard Deviation, Coefficient of Variation
Shape
  Symmetric, Skewed
Important Summary Measures
Summary Measures (one sample)
  Central Tendency: Mean, Median, Mode
  Quartile
  Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Central Tendency
Central Tendency: Mean, Median, Mode
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Data: You can access practice sample data on
HMO premiums here.
With one data point, the central location is clearly at the point itself.
But if a third data point appears to the left of the midrange of the first two,
it should pull the central location to the left.
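Measures of Central Location (Tendency)
Arithmetic mean
Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size
Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the population size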
Example 4.1
The mean of the sample of six measurements 7, 3, 9, -2, 4, 6 is given by
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{6} = \frac{7 + 3 + 9 - 2 + 4 + 6}{6} = 4.5$
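The arithmetic in Example 4.1 can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the slides.

```python
from statistics import fmean

# Sample of six measurements from Example 4.1
sample = [7, 3, 9, -2, 4, 6]

# Arithmetic mean: sum of the observations divided by their count
mean_by_formula = sum(sample) / len(sample)

# The standard library computes the same quantity
mean_by_library = fmean(sample)

print(mean_by_formula)  # 4.5
print(mean_by_library)  # 4.5
```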
Example 4.2
Suppose the telephone bills of Example 2.1 represent the population
of measurements. The population mean is
$\mu = \frac{\sum_{i=1}^{200} x_i}{200} = \frac{x_1 + x_2 + \dots + x_{200}}{200} = \frac{42.19 + 15.30 + \dots + 53.21}{200} = 43.59$
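Example 4.4
Seven employee salaries were recorded
(in 1000s): 28, 60, 26, 32, 30, 26, 29.
Find the median salary.
Odd number of observations
Sorted: 26, 26, 28, 29, 30, 32, 60. The median is the middle (fourth) value, 29.
Even number of observations
With an eighth salary of 31 added: 26, 26, 28, 29, 30, 32, 60, 31.
Sorted: 26, 26, 28, 29, 30, 31, 32, 60. There are two middle values, so the median is their average, (29 + 30) / 2 = 29.5.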
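As a quick check of the median for the seven salaries, and for the eight-value list that adds a salary of 31, here is a small Python sketch. The variable names are illustrative assumptions.

```python
from statistics import median

# Seven salaries from Example 4.4, in 1000s
salaries = [28, 60, 26, 32, 30, 26, 29]

# Odd number of observations: the median is the middle value after sorting
median_odd = median(salaries)

# Even number of observations: with 31 added, the median is the
# average of the two middle values after sorting
salaries_even = salaries + [31]
median_even = median(salaries_even)

print(sorted(salaries), median_odd)        # middle value is 29
print(sorted(salaries_even), median_even)  # (29 + 30) / 2 = 29.5
```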
Example 4.6
A professor of statistics wants to report the results of a midterm
exam, taken by 100 students. The data appear in file XM04-06.
Find the mean, median, and mode, and describe the information
they provide.
Excel Results
Marks
Mean                  73.98
Standard Error        2.1502163
Median                81
Mode                  84
Standard Deviation    21.502163
Sample Variance       462.34303
Kurtosis              0.3936606
Skewness              -1.073098
Range                 89
Minimum               11
Maximum               100
Sum                   7398
Count                 100
The mean provides information about the overall performance level of the class.
The median indicates that half of the class received a grade below 81%,
and half of the class received a grade above 81%.
The mode must be used when the data are qualitative. If marks are classified by
letter grade, the frequency of each grade can be calculated. Then, the mode
becomes a logical measure to compute.
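Several entries in the Excel output for Example 4.6 are tied together by simple identities, which makes them easy to sanity-check. The sketch below uses only the values reported above; the raw marks in file XM04-06 are not reproduced here, and the dictionary of checks is an illustrative construction.

```python
import math

# Values copied from the Excel output for Example 4.6
mean, std_error, std_dev = 73.98, 2.1502163, 21.502163
variance, minimum, maximum = 462.34303, 11, 100
total, count, value_range = 7398, 100, 89

checks = {
    # Mean is the sum divided by the count
    "mean = sum / count": math.isclose(total / count, mean),
    # Sample variance is the square of the sample standard deviation
    "variance = std_dev ** 2": math.isclose(std_dev ** 2, variance, rel_tol=1e-6),
    # Standard error of the mean is std_dev / sqrt(count)
    "std_error = std_dev / sqrt(count)": math.isclose(
        std_dev / math.sqrt(count), std_error
    ),
    # Range is maximum minus minimum
    "range = max - min": maximum - minimum == value_range,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```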
Relationship among Mean, Median, and Mode
Coefficient of variation: $CV = \frac{S}{\bar{X}}$ (commonly multiplied by 100% and reported as a percentage)
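The coefficient of variation expresses the standard deviation relative to the mean. As a rough sketch, it can be computed for the exam marks of Example 4.6 from the reported mean and standard deviation. Reporting it as a percentage is a common convention assumed here.

```python
# Reported values for the exam marks in Example 4.6
mean = 73.98
std_dev = 21.502163

# Coefficient of variation: standard deviation divided by the mean,
# expressed here as a percentage
cv_percent = std_dev / mean * 100

print(f"CV = {cv_percent:.2f}%")
```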
Measures of variability
(Looking beyond the average)
The variance
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B
are much more dispersed than those in A.
Thus, a measure of dispersion is needed that agrees with this observation.
Let us start by calculating the sum of deviations.
A: (8 - 10) + (9 - 10) + (10 - 10) + (11 - 10) + (12 - 10) = -2 - 1 + 0 + 1 + 2 = 0
B: (4 - 10) + (7 - 10) + (10 - 10) + (13 - 10) + (16 - 10) = -6 - 3 + 0 + 3 + 6 = 0
The sum of deviations is zero in both cases;
therefore, another measure is needed.
The sum of squared deviations
is used in calculating the variance.
See example next.
Let us calculate the variance of the two populations
$\sigma_A^2 = \frac{(8 - 10)^2 + (9 - 10)^2 + (10 - 10)^2 + (11 - 10)^2 + (12 - 10)^2}{5} = 2$
$\sigma_B^2 = \frac{(4 - 10)^2 + (7 - 10)^2 + (10 - 10)^2 + (13 - 10)^2 + (16 - 10)^2}{5} = 18$
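The calculations for populations A and B can be reproduced with a short Python sketch. The helper `deviations` is an illustrative name, not something from the slides.

```python
from statistics import pvariance

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

def deviations(values):
    """Deviation of each value from the mean of the list."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

for name, values in (("A", population_a), ("B", population_b)):
    devs = deviations(values)
    sum_dev = sum(devs)                     # always zero
    sum_sq_dev = sum(d ** 2 for d in devs)  # grows with dispersion
    # Population variance: average squared deviation (divide by N)
    print(name, sum_dev, sum_sq_dev, sum_sq_dev / len(values))
# A 0.0 10.0 2.0
# B 0.0 90.0 18.0

# The standard library agrees with the hand calculation
print(pvariance(population_a), pvariance(population_b))
```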
Why is the variance defined as
the average squared deviation?
Why not use the sum of squared
deviations as a measure of
dispersion instead?
After all, the sum of squared
deviations increases in
magnitude when the dispersion
of a data set increases!!
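One way to see the answer is to compare two data sets of different sizes. The sets below are made up for illustration and do not come from the slides. The sum of squared deviations grows with the number of observations as well as with dispersion, so the larger but tighter set can have the bigger sum. Dividing by the number of observations removes that effect.

```python
from statistics import pvariance

# Illustrative data (assumed, not from the slides)
set_a = [1] * 5 + [3] * 5  # ten values, each 1 unit from the mean of 2
set_b = [1, 5]             # two values, each 2 units from the mean of 3

def sum_squared_deviations(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

ssd_a = sum_squared_deviations(set_a)  # 10.0
ssd_b = sum_squared_deviations(set_b)  # 8.0
var_a = pvariance(set_a)               # 1
var_b = pvariance(set_b)               # 4

# The sum ranks A as more dispersed only because A has more observations;
# the average squared deviation ranks B as more dispersed.
print(ssd_a, ssd_b)
print(var_a, var_b)
```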
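Example 4.8
Solution
$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{3.4 + 2.5 + 4.1 + 1.2 + 2.8 + 3.7}{6} = \frac{17.7}{6} = 2.95$
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right]$
A shortcut formula
$s^2 = \frac{1}{6 - 1} \left[ \left( 3.4^2 + 2.5^2 + \dots + 3.7^2 \right) - \frac{(17.7)^2}{6} \right] = 1.075 \text{ (years)}^2$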
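The definitional and shortcut forms of the sample variance should give the same answer for the six measurements in Example 4.8. The sketch below checks this, assuming the data are the six values shown in the solution (in years); the variable names are illustrative.

```python
from statistics import variance

# Six measurements from the solution of Example 4.8, in years
data = [3.4, 2.5, 4.1, 1.2, 2.8, 3.7]
n = len(data)
mean = sum(data) / n

# Definition: sum of squared deviations divided by n - 1
s2_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut: [sum of squares - (sum)^2 / n] divided by n - 1
s2_shortcut = (sum(x ** 2 for x in data) - sum(data) ** 2 / n) / (n - 1)

# Library function for the sample variance (also divides by n - 1)
s2_library = variance(data)

print(round(mean, 2))           # 2.95
print(round(s2_definition, 3))  # 1.075
print(round(s2_shortcut, 3))    # 1.075
print(round(s2_library, 3))     # 1.075
```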
Sample Standard Deviation
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
For the Sample: use n - 1 in the denominator.
Data $X_i$: 10 12 14 15 17 18 18 24
n = 8 Mean = 16
$s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + (15 - 16)^2 + (17 - 16)^2 + (18 - 16)^2 + (18 - 16)^2 + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
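Interpreting Standard Deviation
Data $X_i$: 10 12 14 15 17 18 18 24
N = 8 Mean = 16
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$
The value of the standard deviation is larger when the data are treated as a sample.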
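The effect of the denominator can be checked directly. In this sketch, `stdev` divides by n - 1 and `pstdev` divides by N. With all eight observations included, the sum of squared deviations from the mean of 16 is 130, so the sample value is about 4.3095 and the population value about 4.0311.

```python
from statistics import pstdev, stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

mean = sum(data) / len(data)
sum_sq_dev = sum((x - mean) ** 2 for x in data)

s = stdev(data)       # sample standard deviation, denominator n - 1
sigma = pstdev(data)  # population standard deviation, denominator N

print(mean, sum_sq_dev)  # 16.0 130.0
print(round(s, 4))       # 4.3095
print(round(sigma, 4))   # 4.0311
```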
Comparing Standard Deviations
[Figure: three data sets plotted on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57
Measures of Association
The covariance
Population covariance: $COV(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
$\mu_x$ ($\mu_y$) is the population mean of the variable X (Y).
N is the population size. n is the sample size.
Sample covariance: $cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
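A minimal sketch of the two covariance formulas follows. The function names and the three (x, y) pairs are made up for illustration and do not come from the slides.

```python
def population_covariance(xs, ys):
    """Average product of deviations from the means (divide by N)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n

def sample_covariance(xs, ys):
    """Same sum of products, divided by n - 1 instead of N."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Illustrative paired data (assumed): y tends to rise with x
x = [2, 6, 7]
y = [13, 20, 27]

cov_sample = sample_covariance(x, y)          # 35 / 2 = 17.5
cov_population = population_covariance(x, y)  # 35 / 3

print(cov_sample, cov_population)
```

A positive value indicates that the two variables tend to move in the same direction. The magnitude depends on the units of X and Y, so it is hard to interpret on its own.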
The coefficient of correlation
COV(X,Y) > 0, or r close to +1: Strong positive linear relationship
COV(X,Y) = 0, or r = 0: No linear relationship
COV(X,Y) < 0, or r close to -1: Strong negative linear relationship
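The sample coefficient of correlation is conventionally defined as the sample covariance divided by the product of the two sample standard deviations, which scales it to lie between -1 and +1. The sketch below assumes that standard definition. The function name and the data are illustrative and do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Sample correlation: cov(X, Y) / (s_x * s_y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (s_x * s_y)

# Illustrative paired data (assumed)
x = [2, 6, 7]
y_rising = [13, 20, 27]   # y increases with x
y_falling = [27, 20, 13]  # y decreases with x

r_positive = correlation(x, y_rising)
r_negative = correlation(x, y_falling)

print(round(r_positive, 4))  # 0.9449, close to +1
print(round(r_negative, 4))  # -0.9449, close to -1
```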