
STAT3115

08/28/2015

Simple Measures of Variability

The range is simply the difference between the largest and smallest observations (Maximum - Minimum).

The interquartile range (IQR) is the difference between the third and first quartiles (Q3 - Q1). It is the range of the middle 50% of the data.

Often, if we use the median as our measure of center, we will pick one of the above to use as our measure of variability.


Variance

For any observation, the deviation from the mean is given by $e_i = x_i - \bar{x}$. The sum (and thus the average) of all such deviations is 0.
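
As a quick numerical check of the claim that the deviations sum to zero, here is a minimal sketch. The sample values are hypothetical and chosen only for illustration; the variable names are not from the slides.

```python
# Hypothetical sample, chosen only for illustration.
x = [4.0, 7.0, 9.0, 12.0]
x_bar = sum(x) / len(x)

# Deviation of each observation from the sample mean: e_i = x_i - x_bar.
deviations = [xi - x_bar for xi in x]

# Positive and negative deviations cancel, so the total is 0
# (up to floating-point rounding for less tidy data).
total = sum(deviations)
print(deviations, total)
```

Because the deviations always cancel, their plain average carries no information about spread, which is why the variance squares them first.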

The sample variance is a measure of spread that is computed using the average of the squared deviations:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{SS_x}{n - 1}$$

The numerator is known as the sum of squares for X. It is common to divide by n - 1 instead of n; the reason why comes later.
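
The definition can be sketched directly in Python. The sample below is hypothetical, and the comparison against the standard library's `statistics.variance` (which divides by n - 1) and `statistics.pvariance` (which divides by n) is only a sanity check, not part of the original notes.

```python
import statistics

x = [4.0, 7.0, 9.0, 12.0]  # hypothetical sample
n = len(x)
x_bar = sum(x) / n

# Sum of squares for X: the squared deviations from the mean, added up.
ss_x = sum((xi - x_bar) ** 2 for xi in x)

# Sample variance divides by n - 1; dividing by n gives a smaller value.
s2 = ss_x / (n - 1)
s2_divide_by_n = ss_x / n

print(ss_x, s2, s2_divide_by_n)
print(statistics.variance(x), statistics.pvariance(x))
```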


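Properties of $s^2$
Let $\{x_1, x_2, \ldots, x_n\}$ be the sample and $c$ be any nonzero constant.

If $y_1 = x_1 + c, y_2 = x_2 + c, \ldots, y_n = x_n + c$, then $s_y^2 = s_x^2$.

If $y_1 = c x_1, y_2 = c x_2, \ldots, y_n = c x_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c| s_x$.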
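
The two properties (adding a constant leaves the variance unchanged; multiplying by a constant c multiplies the variance by c squared and the standard deviation by |c|) can be checked numerically. This is a sketch only: the sample and the constant `c = -3.0` are hypothetical, with a negative constant chosen so the absolute value matters.

```python
import statistics

x = [4.0, 7.0, 9.0, 12.0]  # hypothetical sample
c = -3.0                   # hypothetical nonzero constant

shifted = [xi + c for xi in x]  # y_i = x_i + c
scaled = [c * xi for xi in x]   # y_i = c * x_i

var_x = statistics.variance(x)
var_shifted = statistics.variance(shifted)
var_scaled = statistics.variance(scaled)
sd_scaled = statistics.stdev(scaled)

print(var_x, var_shifted)                      # equal: a shift does not change spread
print(var_scaled, c ** 2 * var_x)              # equal: variance scales by c^2
print(sd_scaled, abs(c) * statistics.stdev(x)) # equal: SD scales by |c|
```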


Standard Deviation

The sample standard deviation is the square root of the variance (same units as the observations).

If calculating by hand (almost never), an alternative formula for calculation of the sample variance is

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$
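
To see that the hand-calculation formula agrees with the definition, the sketch below evaluates both on a hypothetical sample. Only the sum of the observations and the sum of their squares are needed for the shortcut.

```python
x = [4.0, 7.0, 9.0, 12.0]  # hypothetical sample
n = len(x)

sum_x = sum(x)                       # sum of the observations
sum_x_sq = sum(xi ** 2 for xi in x)  # sum of the squared observations

# Shortcut form: (sum of squares - (sum)^2 / n) / (n - 1)
s2_shortcut = (sum_x_sq - sum_x ** 2 / n) / (n - 1)

# Definition: squared deviations from the mean, divided by n - 1
x_bar = sum_x / n
s2_definition = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

print(s2_shortcut, s2_definition)
```

On a computer the definitional form is usually preferred, since the shortcut subtracts two large, nearly equal numbers and can lose precision.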

Often, if we use the mean as our measure of center, we will pick the variance or standard deviation to use as our measure of variability.

Example
n = 13 observations
Ordered data set (smallest to largest):
5.9  6.8  7.0  7.0  7.3  7.4  7.8  7.9  8.0  8.2  9.7  10.7  11.3

What are the measures of spread?


Range = ?
Interquartile range =?
Standard deviation =?
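
One way to check answers to these questions is sketched below, using the 13 values from the example with 7.8 placed in its sorted position. Quartile conventions differ between textbooks and software; the `method="inclusive"` option of `statistics.quantiles` is an assumption here, not necessarily the convention used in the course, so the IQR may differ slightly under another rule.

```python
import statistics

# The 13 observations from the example, in sorted order.
data = [5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3]

# Range: largest minus smallest observation.
data_range = max(data) - min(data)

# Quartiles under the "inclusive" convention (an assumption; others exist).
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
s = statistics.stdev(data)

print(f"range = {data_range:.1f}")
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"s = {s:.3f}")
```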


Comparison of Spread Statistics


Since the variance and standard deviation use the mean, they are both sensitive to outliers (non-resistant).
The range is also sensitive to outliers.
The IQR is more resistant.
However, the variance and standard deviation are more efficient than the IQR since they use all the data.


Outliers

Data points that are extremely far from the rest of the data are
often thought of as outliers.

How far is far? Measures of variability are often used to quantify this:
More than 3 standard deviations (3s) from the mean.
More than 1.5 times the IQR below the 25th percentile or above the 75th percentile.
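
The first rule (more than 3s from the mean) can be sketched as follows. The function name `flag_3s` and the data are hypothetical. Note that the mean and s are themselves pulled toward an extreme value, so in small samples this rule can miss points that look far from the rest.

```python
import statistics

def flag_3s(data):
    """Return the observations lying more than 3 sample SDs from the mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return [x for x in data if abs(x - x_bar) > 3 * s]

# Hypothetical data: twelve values near 10 and one extreme value.
sample = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 50]
flagged = flag_3s(sample)
print(flagged)
```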

Box plots can be used to explicitly show outliers.


Comparative box plots are side-by-side box plots that can be
used to compare the same variable across multiple data sets.


Example
n = 31 observations
Ordered data set (smallest to largest)
6 9 12 12 14 14 14 15 16 18 18 18 18 19 19 20
20 21 21 21 22 22 22 23 28 28 29 32 33 33 37
Are there any outliers?
Any points < Q1 - 1.5*IQR = ?
Any points > Q3 + 1.5*IQR = ?
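
A sketch of the 1.5*IQR check for the data listed in this example follows. As before, the quartile convention (`method="inclusive"` in `statistics.quantiles`) is an assumption; a different convention shifts the fences somewhat.

```python
import statistics

# The ordered observations listed in the example.
data = [6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20,
        20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29, 32, 33, 33, 37]

# Quartiles under the "inclusive" convention (an assumption; others exist).
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: points strictly beyond these are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences = ({lower_fence}, {upper_fence}), outliers = {outliers}")
```

Under this convention the only flagged point is 37; the two values of 33 sit exactly on the upper fence and are not beyond it.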


Box Plots
Graphical summary of 5 statistics:
Minimum
Maximum
1st quartile
3rd quartile
Median

Box defined by quartiles (first and third)
Median represented as line in box
Lower whisker extends no further than: Q1 - 1.5*IQR
Upper whisker extends no further than: Q3 + 1.5*IQR
Observations outside the fences displayed as dots (outliers)
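
The quantities a box plot draws can be computed without any plotting library. The sketch below is one possible reading of the rules above: the function name `box_plot_stats`, the sample data, and the "inclusive" quartile convention are all assumptions. Each whisker ends at the most extreme observation that is still inside its fence.

```python
import statistics

def box_plot_stats(data):
    """Compute the pieces of a box plot: quartiles, whisker ends, outliers."""
    xs = sorted(data)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "min": xs[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": xs[-1],
        "lower_whisker": inside[0],   # smallest observation inside the fences
        "upper_whisker": inside[-1],  # largest observation inside the fences
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }

# Hypothetical data with one extreme value.
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```

In this hypothetical sample the upper whisker stops at 8 rather than at the maximum, and 30 would be drawn as a separate dot.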

Box Plot of Flexural Strength Data
[Figure: box plot of the flexural strength data, with labels marking the outliers, maximum, Q3, median, Q1, and minimum.]
