STAT3115
08/28/2015
Simple Measures of Variability
The range is simply the difference between the largest and
smallest observations (Maximum - Minimum).
The interquartile range (IQR) is the difference between the
third and first quartiles (Q3 - Q1). It is the range of the middle
50% of the data.
Often if we use the median as our measure of center, we will
pick one of the above to use as our measure of variability.
Variance
For any observation, the deviation from the mean is given by
$e_i = x_i - \bar{x}$. The sum (and thus average) of all such deviations
is 0.
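A short check of why the deviations sum to zero, using only the definition of the sample mean, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$:

```latex
\sum_{i=1}^{n} e_i
  = \sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Because the positive and negative deviations always cancel, the raw deviations cannot be averaged to measure spread.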
The sample variance is a measure of spread that is computed
using the average of the squared deviations.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{SS_x}{n - 1}$$
The numerator is known as the sum of squares for X. It is
common to divide by n - 1 instead of n. The reason why comes
later.
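A minimal sketch of the definition in Python, assuming a small made-up data set (the values and the function name `sample_variance` are illustrative, not from the course). It computes the sum of squares, divides by n - 1, and compares the result with the standard library's `statistics.variance`, which also uses the n - 1 divisor.

```python
import statistics

def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squares for X
    return ss / (n - 1)

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # made-up values, for illustration only

print(sample_variance(data))       # from the definition
print(statistics.variance(data))   # standard-library equivalent
```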
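Properties of $s^2$
Let $\{x_1, x_2, \ldots, x_n\}$ be the sample and $c$ be any nonzero constant.
If $y_1 = x_1 + c, y_2 = x_2 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.
If $y_1 = c x_1, y_2 = c x_2, \ldots, y_n = c x_n$,
then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$.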
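The two properties (adding a constant leaves the variance unchanged; multiplying by a constant multiplies the variance by the constant squared) can be checked numerically. The sketch below assumes a hypothetical sample `x` and constant `c` chosen only for illustration.

```python
import statistics

x = [3.0, 7.0, 8.0, 12.0]  # hypothetical sample
c = -2.5                   # any nonzero constant

shifted = [xi + c for xi in x]  # add c to every observation
scaled = [c * xi for xi in x]   # multiply every observation by c

var_x = statistics.variance(x)
sd_x = statistics.stdev(x)

# Shifting: variance is unchanged (up to floating-point rounding).
print(statistics.variance(shifted), var_x)
# Scaling: variance is multiplied by c**2, standard deviation by |c|.
print(statistics.variance(scaled), c ** 2 * var_x)
print(statistics.stdev(scaled), abs(c) * sd_x)
```

A negative `c` is used on purpose: it shows why the standard deviation scales by the absolute value of the constant.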
Standard Deviation
The sample standard deviation is the square root of the
variance (same units as observations).
If calculating by hand (almost never), an alternative formula
for calculation of the sample variance is
$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$
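The shortcut formula gives the same value as the definition. A brief derivation: expand the squared deviations and substitute $\sum_{i=1}^{n} x_i = n\bar{x}$.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2
\end{aligned}
```

Dividing both sides by n - 1 gives the sample variance.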
Often if we use the mean as our measure of center, we will
pick the variance or standard deviation to use as our measure
of variability.
Example
n = 13 observations
Ordered data set (smallest to largest)
5.9  6.8  7.0  7.0  7.3  7.4  7.8  7.9  8.0  8.2  9.7  10.7  11.3
What are the measures of spread?
Range = ?
Interquartile range =?
Standard deviation =?
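One way to check answers for the 13 observations is sketched below in Python. Quartile conventions differ between textbooks and software. This sketch assumes the default "exclusive" method of `statistics.quantiles`, which may not match the convention used in class, so the IQR in particular can differ slightly from a hand calculation.

```python
import statistics

# The 13 observations from the example, smallest to largest.
data = [5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3]

data_range = max(data) - min(data)

# Three cut points splitting the data into four parts: Q1, median, Q3.
# Assumption: Python's default "exclusive" quantile method.
q1, median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

sd = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)

print(f"range = {data_range:.2f}")
print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"s = {sd:.3f}")
```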
Comparison of Spread Statistics
Since the variance and standard deviation use the mean, they are
both sensitive to outliers (nonresistant).
The range is also sensitive to outliers.
The IQR is more resistant.
However, the variance and standard deviation are more efficient
than the IQR since they use all the data.
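Resistance can be illustrated by corrupting a single value and recomputing each statistic. The sketch below is a hypothetical illustration: the replaced value `113.0` stands in for an imagined data-entry error, and the helper name `spread` is an assumption. Quartiles use Python's default "exclusive" method.

```python
import statistics

def spread(xs):
    """Return the range, IQR, and sample standard deviation of xs."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "sd": statistics.stdev(xs),
    }

clean = [5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3]
# Hypothetical error: the largest value 11.3 is recorded as 113.0.
contaminated = clean[:-1] + [113.0]

print(spread(clean))
print(spread(contaminated))
```

With one extreme value, the range and standard deviation grow sharply while the IQR stays the same, because the quartiles depend only on the middle of the ordered data.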
Outliers
Data points that are extremely far from the rest of the data are
often thought of as outliers.
How far is far? Measures of variability are often used to
quantify this. Two common rules:
More than 3s (three standard deviations) from the mean.
More than 1.5 times the IQR below the 25th percentile or
above the 75th percentile.
Box plots can be used to explicitly show outliers.
Comparative box plots are side by side box plots that can be
used to compare the same variable across multiple data sets.
Example
n = 31 observations
Ordered data set (smallest to largest)
6 9 12 12 14 14 14 15 16 18 18 18 18 19 19 20
20 21 21 21 22 22 22 23 28 28 29 32 33 33 37
Are there any outliers?
Any points < Q1 - 1.5*IQR = ?
Any points > Q3 + 1.5*IQR = ?
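The 1.5*IQR rule can be applied to this data set with a short Python sketch. It assumes the default "exclusive" quartile method of `statistics.quantiles`. Other quartile conventions can move the fences slightly, so treat the result as a check rather than the definitive answer.

```python
import statistics

# The observations listed in the example, smallest to largest.
data = [6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20,
        20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29, 32, 33, 33, 37]

# Assumption: Python's default "exclusive" quantile method.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Flag any observation that falls outside the fences.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```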
Box Plots
Graphical summary of 5 statistics
Minimum
Maximum
1st quartile
3rd quartile
Median
Box defined by quartiles (first and third)
Median represented as line in box
Lower whisker extends no further than: Q1 - 1.5*IQR
Upper whisker extends no further than: Q3 + 1.5*IQR
Observations outside the fences displayed as dots (outliers)
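The quantities a box plot draws can be computed without plotting. The sketch below assumes made-up data and a hypothetical helper name, `box_plot_stats`. It takes each whisker end to be the most extreme observation still inside the fences, which is one common convention, and uses Python's default "exclusive" quartile method.

```python
import statistics

def box_plot_stats(xs):
    """Quartiles, whisker ends, and outliers as drawn in a box plot."""
    xs = sorted(xs)
    q1, median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up values
stats = box_plot_stats(sample)
print(stats)
```

Here the upper whisker stops at the largest observation inside the fence rather than at the fence itself, and the one value beyond the fence is listed as an outlier.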
Box Plot of Flexural Strength Data
[Figure: box plot with labels marking the outliers, maximum, Q3, median, Q1, and minimum.]