
# STAT3115

08/28/2015

- The range is simply the difference between the largest and smallest observations (Maximum − Minimum).

- The interquartile range (IQR) is the difference between the third and first quartiles (Q3 − Q1); it is the range of the middle 50% of the data.

- When the median is used as the measure of center, one of the two measures above is usually chosen as the measure of variability.
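
A minimal Python sketch of both measures using only the standard library. Quartile conventions differ between textbooks and software, so the IQR depends on the method chosen; the sketch assumes `statistics.quantiles` with its default `"exclusive"` method and shows the `"inclusive"` method for comparison. The function names and the data list are hypothetical, chosen only for illustration.

```python
import statistics


def data_range(xs):
    """Range: largest observation minus smallest observation."""
    return max(xs) - min(xs)


def iqr(xs, method="exclusive"):
    """Interquartile range Q3 - Q1.

    statistics.quantiles(..., n=4) returns the three quartile cut points
    [Q1, median, Q3]; `method` selects the interpolation convention.
    """
    q1, _, q3 = statistics.quantiles(xs, n=4, method=method)
    return q3 - q1


# Hypothetical data, used only to illustrate the calculations.
sample = [2, 4, 4, 5, 7, 9, 10]

print("range:", data_range(sample))
print("IQR (exclusive):", iqr(sample))
print("IQR (inclusive):", iqr(sample, method="inclusive"))
```

For this small sample the two conventions give different IQRs, which is one reason hand calculations and software output can disagree slightly.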

## Variance

- For any observation $x_i$, the deviation from the mean is $e_i = x_i - \bar{x}$. The sum (and thus the average) of all such deviations is 0, since $\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$.

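- The sample variance is a measure of spread computed from the squared deviations. It is essentially their average, except that the sum is divided by $n - 1$ rather than $n$:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{SS_x}{n-1}$$

The numerator, $SS_x$, is known as the sum of squares for $X$. Dividing by $n - 1$ instead of $n$ is the common convention; the reason for it comes later.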
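
The sketch below computes $s^2$ directly from the definition and checks it against the standard library. It assumes hypothetical data chosen so the mean is a whole number; `sample_variance` is an illustrative name, not a library function. `statistics.variance` divides by $n - 1$, while `statistics.pvariance` divides by $n$.

```python
import statistics


def sample_variance(xs):
    """Sample variance s^2 = SS_x / (n - 1), computed from the deviations."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(xs) / n
    deviations = [x - xbar for x in xs]      # e_i = x_i - xbar
    ss_x = sum(e * e for e in deviations)    # sum of squares for X
    return ss_x / (n - 1)


# Hypothetical data, chosen so the mean is a whole number (5).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
xbar = sum(sample) / len(sample)

print("sum of deviations:", sum(x - xbar for x in sample))
print("s^2 (divide by n - 1):", sample_variance(sample))
print("statistics.variance:", statistics.variance(sample))
print("statistics.pvariance (divide by n):", statistics.pvariance(sample))
```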

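## Properties of $s^2$

Let $\{x_1, x_2, \ldots, x_n\}$ be the sample and let $c$ be any nonzero constant.

If $y_1 = x_1 + c,\ y_2 = x_2 + c,\ \ldots,\ y_n = x_n + c$, then $s_y^2 = s_x^2$.

If $y_1 = c\,x_1,\ y_2 = c\,x_2,\ \ldots,\ y_n = c\,x_n$, then $s_y^2 = c^2 s_x^2$ and $s_y = |c|\,s_x$.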
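
A quick numerical check of both properties, as a sketch. The helper `check_properties` and the sample values are hypothetical; a negative constant is used to show why the standard deviation scales by $|c|$ rather than $c$.

```python
import math
import statistics


def check_properties(xs, c):
    """Numerically check the shift and scale properties of s^2 for constant c."""
    s2_x = statistics.variance(xs)
    shifted = [x + c for x in xs]   # y_i = x_i + c
    scaled = [c * x for x in xs]    # y_i = c * x_i
    shift_ok = math.isclose(statistics.variance(shifted), s2_x)
    scale_var_ok = math.isclose(statistics.variance(scaled), c**2 * s2_x)
    scale_sd_ok = math.isclose(statistics.stdev(scaled), abs(c) * statistics.stdev(xs))
    return shift_ok, scale_var_ok, scale_sd_ok


# Hypothetical sample and constant; c < 0 shows why |c| appears for s_y.
print(check_properties([2, 4, 4, 4, 5, 5, 7, 9], c=-3))
```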

## Standard Deviation

- The sample standard deviation $s$ is the square root of the sample variance; unlike the variance, it is in the same units as the observations.

- When calculating by hand (rarely necessary in practice), an equivalent formula for the sample variance is

$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$
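
The following sketch confirms that the hand-calculation formula matches the definition, then takes the square root to get $s$. The function names and data are hypothetical illustrations.

```python
import math


def variance_definitional(xs):
    """s^2 from the definition: sum of squared deviations over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 from the hand formula: [sum x^2 - (sum x)^2 / n] / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
s2 = variance_shortcut(sample)
print(variance_definitional(sample), s2)
print("s =", math.sqrt(s2))          # standard deviation, same units as the data
```

The shortcut is convenient by hand, but in floating-point arithmetic it subtracts two large, nearly equal sums when the mean is large relative to the spread, which can lose precision; software therefore usually prefers the definitional form or a one-pass method such as Welford's algorithm.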

- When the mean is used as the measure of center, the variance or standard deviation is usually chosen as the measure of variability.

## Example

n = 13 observations

Ordered data set (smallest to largest):

5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3

What are the measures of spread?

- Range = ?
- Interquartile range = ?
- Standard deviation = ?
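
One way to check answers to these questions is the Python sketch below. It assumes quartiles are computed with `statistics.quantiles` and its default `"exclusive"` method; for these 13 observations that gives the same Q1 and Q3 as taking the medians of the lower and upper halves (excluding the overall median), but other conventions can produce a slightly different IQR.

```python
import statistics

# The 13 ordered observations from the example.
data = [5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3]

data_range = max(data) - min(data)
q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
iqr = q3 - q1
s = statistics.stdev(data)                        # sample SD (divides by n - 1)

print(f"Range = {data_range:.2f}")
print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"Standard deviation = {s:.4f}")
```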

- Since the variance and standard deviation are built on the mean, both are sensitive to outliers (non-resistant).
- The range is also sensitive to outliers.
- The IQR is more resistant.
- However, the variance and standard deviation are more efficient than the IQR, since they use all of the data.
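
To see the difference in resistance, the sketch below reuses the 13 observations from the example and, as a hypothetical contamination, replaces the largest value 11.3 with 113.0 (a plausible typo). Quartiles again assume the default `"exclusive"` method of `statistics.quantiles`; `spread_summary` is an illustrative name.

```python
import statistics


def spread_summary(xs):
    """Return (range, IQR, sample SD); quartiles use the default quantiles method."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return max(xs) - min(xs), q3 - q1, statistics.stdev(xs)


data = [5.9, 6.8, 7.0, 7.0, 7.3, 7.4, 7.8, 7.9, 8.0, 8.2, 9.7, 10.7, 11.3]
# Hypothetical contamination: the largest value 11.3 mistyped as 113.0.
contaminated = data[:-1] + [113.0]

for label, xs in [("original", data), ("with outlier", contaminated)]:
    r, spread_iqr, s = spread_summary(xs)
    print(f"{label:>12}: range = {r:6.2f}, IQR = {spread_iqr:.2f}, s = {s:.2f}")
```

The range and standard deviation change dramatically, while the IQR does not change at all, because the quartiles do not depend on the single extreme value.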

## Outliers

Data points that are extremely far from the rest of the data are often regarded as outliers.

- How far is "far"? Measures of variability are often used to quantify this. Two common rules flag a point as an outlier if it is:
  - more than $3s$ from the mean, or
  - more than $1.5 \times \text{IQR}$ below the 25th percentile (Q1) or above the 75th percentile (Q3).
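
Both rules can be sketched as short functions. The function names and the 8-value sample are hypothetical, and the quartiles assume the default `"exclusive"` method of `statistics.quantiles`.

```python
import statistics


def outliers_3s(xs):
    """Points more than 3 sample standard deviations from the mean."""
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [x for x in xs if abs(x - xbar) > 3 * s]


def outliers_iqr(xs, k=1.5):
    """Points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(xs, n=4)   # default "exclusive" method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


sample = [10, 11, 12, 12, 13, 14, 15, 40]   # hypothetical data
print("3s rule flags:", outliers_3s(sample))
print("1.5*IQR rule flags:", outliers_iqr(sample))
```

Here the single large value inflates $s$ so much that the $3s$ rule misses it, while the quartile-based rule still flags it. This is the non-resistance of the standard deviation in action.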

- Box plots can be used to show outliers explicitly.
- Comparative box plots are side-by-side box plots used to compare the same variable across multiple data sets.

## Example

n = 31 observations

Ordered data set (smallest to largest):

6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20, 20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29, 32, 33, 33, 37

Are there any outliers?

- Any points < Q1 − 1.5·IQR = ?
- Any points > Q3 + 1.5·IQR = ?
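
A sketch for checking the fences in this example, assuming the default `"exclusive"` quartile method of `statistics.quantiles`.

```python
import statistics

# The 31 ordered observations from the example.
data = [6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20,
        20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29, 32, 33, 33, 37]

q1, median, q3 = statistics.quantiles(data, n=4)  # default method="exclusive"
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"n = {len(data)}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print("below", lower_fence, ":", [x for x in data if x < lower_fence])
print("above", upper_fence, ":", [x for x in data if x > upper_fence])
```

With the `"inclusive"` method the quartiles shift slightly (Q1 = 15.5, Q3 = 22.5, so the fences become 5.0 and 33.0), but the conclusion is the same: only 37 lies outside the fences.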

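## Box Plots

A box plot is a graphical summary of five statistics:

- Minimum
- Maximum
- 1st quartile (Q1)
- 3rd quartile (Q3)
- Median

How the plot is built:

- The box is defined by the first and third quartiles.
- The median is represented as a line in the box.
- The lower whisker extends to the smallest observation that is no lower than Q1 − 1.5·IQR.
- The upper whisker extends to the largest observation that is no higher than Q3 + 1.5·IQR.
- Observations outside these fences are displayed as dots (outliers).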
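
The numbers needed to draw a box plot can be gathered in one function, sketched below. `box_plot_stats` is a hypothetical helper; it assumes the default `"exclusive"` quartile method and places the whiskers at the most extreme observations inside the 1.5·IQR fences. Plotting libraries may use other quartile or whisker conventions.

```python
import statistics


def box_plot_stats(xs, k=1.5):
    """Summary statistics for a box plot with whiskers at the k*IQR fences."""
    xs = sorted(xs)
    q1, median, q3 = statistics.quantiles(xs, n=4)  # default method="exclusive"
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "minimum": xs[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": xs[-1],
        # Whiskers stop at the most extreme observations inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [x for x in xs if x < lower_fence or x > upper_fence],
    }


# The 31 observations from the outlier example.
data = [6, 9, 12, 12, 14, 14, 14, 15, 16, 18, 18, 18, 18, 19, 19, 20,
        20, 21, 21, 21, 22, 22, 22, 23, 28, 28, 29, 32, 33, 33, 37]
for name, value in box_plot_stats(data).items():
    print(f"{name:>14}: {value}")
```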

## Box Plot of Flexural Strength Data

[Figure: box plot of the flexural strength data, with the outliers, maximum, Q3, median, Q1, and minimum labeled.]
