
Chapter 3

Numerical Descriptive Measures
PowerPoint to accompany:
Learning Objectives
After studying this chapter you should have a better understanding of:

How to calculate and interpret numerical descriptive measures of central tendency, variation and shape for numerical data
How to calculate and interpret descriptive summary measures for a population
How to construct and interpret a box-and-whisker plot
How to calculate and interpret the covariance and the coefficient of correlation for bivariate data
2 Copyright 2013 Pearson Australia (a division of Pearson Australia Group Pty Ltd) 9781442549272/Berenson/Business Statistics /2e
Describing Data
Describing data by its central tendency, variation and shape:

Central Tendency: Arithmetic Mean, Median, Mode, Geometric Mean
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
Shape: Skewness
Measures of Central Tendency
Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
The Arithmetic Mean
For a sample of size n, the sample mean, denoted $\bar{X}$, is calculated as:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

where $\sum$ means to sum or add up, and the $X_i$ are the observed values.
The Median
In an ordered array, the median is the middle number (50% above, 50% below).

Its main advantage over the arithmetic mean is that it is not affected by extreme values.

[Figure: two number lines from 0 to 10, each with Median = 3]
Finding the Median
The location of the median:

$$\text{Median} = \frac{n + 1}{2}\ \text{ranked value}$$

Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data.

Rule 1: if the number of values in the data set is odd, the median is the middle ranked value.

Rule 2: if the number of values in the data set is even, the median is the mean (average) of the two middle ranked values.
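The two rules can be sketched in a few lines of Python. This is only an illustration: the function name `median_by_position` and the sample values are hypothetical and do not come from the slides.

```python
def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    # Zero-based index of the middle (odd n) or upper-middle (even n) value.
    middle = n // 2
    if n % 2 == 1:
        # Rule 1: odd n, so position (n + 1) / 2 is a whole number.
        return ranked[middle]
    # Rule 2: even n, so average the two middle ranked values.
    return (ranked[middle - 1] + ranked[middle]) / 2


# Hypothetical data, for illustration only.
odd_result = median_by_position([7, 1, 5, 3, 9])       # position (5 + 1) / 2 = 3
even_result = median_by_position([7, 1, 5, 3, 9, 11])  # position (6 + 1) / 2 = 3.5
print(odd_result, even_result)
```

With an even number of values the position is a half (here 3.5), which is why the third and fourth ranked values are averaged.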
The Mode
A measure of central tendency

Value that occurs most often (the most frequent)

Not affected by extreme values

Unlike mean and median, there may be no unique (single) mode for
a given data set

Used for either numerical or categorical (nominal) data

An example of no mode: [Figure: number line from 0 to 6, no mode]

An example of several modes: [Figure: number line from 0 to 14, Modes = 5 and 9]
Review Example
Prices for 5 houses located near the beach:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Review Example
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000

$$\text{Mean} = \frac{2{,}000{,}000 + 500{,}000 + 300{,}000 + 100{,}000 + 100{,}000}{5} = \frac{3{,}000{,}000}{5} = \$600{,}000$$

Median (position = (5 + 1)/2 = 3) = $300,000

Mode = $100,000
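As a quick check of the review example, the three measures can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the prices are the five house prices listed for this example ($2,000,000 down to $100,000).

```python
import statistics

# The five beach house prices from the review example.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # 3,000,000 / 5
median_price = statistics.median(house_prices)  # 3rd ranked value of 5
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The mean is twice the median here because the single $2,000,000 house pulls it upward.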
Which Measure of Location is the Best in this Situation?
The mean is generally used most often, unless extreme values
(outliers) exist.

The median is often used, since it is not sensitive to extreme values.

The mode is usually the least used of the three.

Since we have an obvious outlier ($2,000,000), it makes sense to use
the median in this instance.

Most housing prices are now reported as median housing prices in
Australian newspapers due to possible outliers.

Quartiles
Quartiles split the ranked data into four segments with an equal
number of values per segment

The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger.

The second quartile, Q2, is the same as the median (50% are smaller, 50% are larger).

Only 25% of the observations are greater than the third quartile, Q3.

[Figure: ranked data split into four segments of 25% each by Q1, Q2 and Q3]
Quartiles
Similar to the median, we find a quartile by determining the value in
the appropriate position in the ranked data:

First quartile position: Q1 = (n+1)/4

Second quartile position: Q2 = (n+1)/2 (the median)

Third quartile position: Q3 = 3(n+1)/4

where n is the number of observed values (sample size).
Quartile Example
First, the data must be arranged in an ordered array (note n = 15):

1 3 5 7 8 10 11 12 13 16 16 17 18 21 22

Q1 is in the (15+1)/4 = 4th position of the ranked data, so Q1 = 7
Q2 (the median) is in the (15+1)/2 = 8th position of the ranked data, so Q2 = 12
Q3 is in the 3(15+1)/4 = 12th position of the ranked data, so Q3 = 17
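The quartile positions can be checked with `statistics.quantiles` from the Python standard library (Python 3.8 or later). This is a sketch, not the textbook's procedure: the `"exclusive"` method uses the same (n+1)-based positions, but when a position is not a whole number it interpolates linearly, which may differ from whatever rounding rule the textbook applies.

```python
import statistics

# The 15 ranked values from the quartile example.
data = [1, 3, 5, 7, 8, 10, 11, 12, 13, 16, 16, 17, 18, 21, 22]

# method="exclusive" places the cut points at positions (n + 1) / 4,
# (n + 1) / 2 and 3(n + 1) / 4 of the ranked data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

print(q1, q2, q3)
```

For this data set all three positions (4, 8 and 12) are whole numbers, so no interpolation is needed.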

Geometric Mean vs. Geometric Mean Rate of Return
Geometric mean is used to measure the average rate of change of a variable over n periods of time.

$$\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$$

Geometric mean rate of return measures the status of an investment over time or the average percentage change in a variable.

$$\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$$

where $R_i$ is the rate of return in time period i.
Geometric Mean and Mean Rate Example
An investment of $100,000 declined to $50,000 at the end of year one and rebounded to $100,000 at the end of year two:

$$X_1 = \$100{,}000 \qquad X_2 = \$50{,}000 \qquad X_3 = \$100{,}000$$

Year one: 50% decrease. Year two: 100% increase.

The overall two-year rate of return is zero, since it started and ended at the same level.
Geometric Mean and Mean Rate Example
Use the 1-year returns to compute the arithmetic mean and the geometric mean:

Arithmetic mean rate of return:

$$\bar{X} = \frac{(-50\%) + (100\%)}{2} = 25\%$$

Misleading result

Geometric mean rate of return:

$$\begin{aligned}
\bar{R}_G &= [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1 \\
&= [(1 + (-50\%)) \times (1 + (100\%))]^{1/2} - 1 \\
&= [(0.50) \times (2)]^{1/2} - 1 = 1^{1/2} - 1 = 0\%
\end{aligned}$$

More accurate result
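The comparison can be reproduced with a short Python sketch. The function names are illustrative, and returns are written as decimals (-0.50 for a 50% decrease, 1.00 for a 100% increase).

```python
def arithmetic_mean_return(returns):
    """Simple average of the per-period rates of return."""
    return sum(returns) / len(returns)


def geometric_mean_return(returns):
    """Geometric mean rate of return: [(1 + R1)...(1 + Rn)]^(1/n) - 1."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r  # compound the growth factor for each period
    return growth ** (1.0 / len(returns)) - 1.0


yearly_returns = [-0.50, 1.00]  # year one and year two from the example

arithmetic = arithmetic_mean_return(yearly_returns)
geometric = geometric_mean_return(yearly_returns)
print(f"arithmetic: {arithmetic:.0%}, geometric: {geometric:.0%}")
```

The geometric figure agrees with the observation that the investment ended where it started, while the arithmetic figure suggests a 25% yearly gain that never happened.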
Measures of Variation
Measures of variation give information on the spread or variability of the data values.

Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

[Figure: two distributions with the same centre but different variation]

The Range
Simplest measure of variation
Difference between the largest and the smallest values in a set of data

$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$

Example: [Figure: number line from 0 to 14] Range = 14 - 1 = 13
Disadvantages
Ignores the distribution of the data.

[Figure: two differently distributed data sets on a 7 to 12 scale; in both, Range = 12 - 7 = 5]

Like the mean, the range is sensitive to outliers.

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119


The Interquartile Range (IQR)
Like the median, Q1 and Q3, the IQR is a resistant summary measure (resistant to the presence of extreme values).

The interquartile range eliminates outlier problems because the high- and low-valued observations are removed from the calculation.

IQR = 3rd quartile - 1st quartile

$$\text{IQR} = Q_3 - Q_1$$
The Interquartile Range (IQR)
Example:

X minimum = 10, Q1 = 30, Q2 = Median = 45, Q3 = 60, X maximum = 200 (25% of the values lie in each of the four segments)

Range = 200 - 10 = 190 (Misleading)
IQR = 60 - 30 = 30

Even if the value of 200 changes to 300, the IQR remains the same, hence it is resistant to changes in extreme values.
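The resistance of the IQR can be illustrated in Python. As an assumption for this sketch, it reuses the 15 ranked values from the quartile example and then replaces the largest value with a hypothetical outlier of 220; the helper names are illustrative, and `statistics.quantiles` needs Python 3.8 or later.

```python
import statistics


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Third quartile minus first quartile, using (n + 1)-based positions."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1


data = [1, 3, 5, 7, 8, 10, 11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = data[:-1] + [220]  # hypothetical: largest value replaced by 220

print(data_range(data), iqr(data))                  # range and IQR, original data
print(data_range(with_outlier), iqr(with_outlier))  # range jumps, IQR does not
```

Only the largest value changes, so the values in the 4th and 12th positions, and therefore the IQR, stay the same.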
The Sample Variance $S^2$
Measures average scatter around the mean
Units are also squared

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where
$\bar{X}$ = mean
n = sample size
$X_i$ = i-th value of the variable X
The Sample Standard Deviation - S
Most commonly used measure of variation
Shows variation about the mean
Has the same units as the original data

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Calculation Example: Sample Standard Deviation
Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16

$$\begin{aligned}
S &= \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} \\
&= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \\
&= \sqrt{\frac{130}{7}} = 4.3095
\end{aligned}$$

A measure of the average scatter around the mean
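The calculation can be followed step by step in Python. This sketch spells out the formula rather than calling a library routine; the variable names are illustrative.

```python
import math

# Sample data from the calculation example.
sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
mean = sum(sample) / n

# Sum of squared deviations from the mean (the numerator).
squared_deviations = sum((x - mean) ** 2 for x in sample)

variance = squared_deviations / (n - 1)  # sample variance, in squared units
std_dev = math.sqrt(variance)            # sample standard deviation

print(mean, squared_deviations, round(std_dev, 4))
```

Dividing by n - 1 rather than n is what distinguishes the sample formulas from the population formulas.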
Measuring Variation
[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]
Comparing Standard Deviations
[Figure: three data sets plotted on an 11 to 21 scale, all with the same mean]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Variance and Standard Deviation
Advantages
Each value in the data set is used in the calculation
Values far from the mean are given extra weight as
deviations from the mean are squared

Disadvantages
Sensitive to extreme values (outliers)
Measures of absolute variation not relative variation

The Coefficient of Variation
Measures relative variation, i.e. shows variation relative to the mean.
Can be used to compare two or more sets of data measured in different units.
Always expressed as a percentage (%).

$$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$$
Coefficient of Variation Example
Stock A:
Average price last year = $50; standard deviation = $5

$$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

Stock B:
Average price last year = $100; standard deviation = $5

$$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
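The stock comparison reduces to one small function. A minimal sketch, with an illustrative function name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100.0 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

Because the units cancel in the ratio, the same function can compare data sets measured in different units.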
The Z Score
The difference between a given observation and the mean, divided by the standard deviation.

$$Z = \frac{X - \bar{X}}{S}$$

E.g. a Z score of 2.0 means that a value is 2.0 standard deviations above the mean.
A value with a Z score above 3.0 or below -3.0 is considered an outlier.
Z Score Example
If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5?

$$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$$

The value 18.5 is 1.5 standard deviations above the mean.

A negative Z score would indicate that a value is below the mean.
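The same calculation in Python, together with the outlier rule of thumb (Z above 3.0 or below -3.0). The function names and the `threshold` parameter are illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


def is_outlier(z, threshold=3.0):
    """Apply the rule of thumb: Z above 3.0 or below -3.0."""
    return z > threshold or z < -threshold


z = z_score(18.5, mean=14.0, std_dev=3.0)
print(z, is_outlier(z))
```

The sign of the result carries the direction: positive above the mean, negative below it.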
The Shape of a Distribution
Describes how data are distributed.
Measures of shape: symmetric or skewed.

Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Using Microsoft Excel
Use menu choice: Data / Data Analysis / Descriptive Statistics
Numerical Measures for a Population
Population summary measures are called parameters.
The population mean is the sum of the values in the population divided by the population size, N.

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
Population Variance vs. Standard Deviation
Population variance:
The average of the squared deviations of values from the mean

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Population standard deviation:
Shows variation about the mean
Is the square root of the population variance
Has the same units as the original data

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

where $\mu$ = population mean; N = population size; $X_i$ = i-th value of the variable X
The Empirical Rule
If the data distribution is approximately bell-shaped, then:

the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population

$\mu \pm 2\sigma$ contains about 95% of the values in the population

$\mu \pm 3\sigma$ contains about 99.7% of the values in the population
Chebyshev Rule and Examples
Regardless of how the data are distributed, the percentage of values within k standard deviations of the mean must be at least:

$$\left(1 - \frac{1}{k^2}\right) \times 100\% \quad (\text{for } k > 1)$$

At least                      Within
(1 - 1/1^2) x 100% = 0%       k = 1 ($\mu \pm 1\sigma$)
(1 - 1/2^2) x 100% = 75%      k = 2 ($\mu \pm 2\sigma$)
(1 - 1/3^2) x 100% = 89%      k = 3 ($\mu \pm 3\sigma$)
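The Chebyshev bound is easy to tabulate. In this sketch the function name is illustrative, and k = 1 is allowed only to show that the bound is then 0% and so says nothing useful.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("the Chebyshev bound is only meaningful for k >= 1")
    return (1 - 1 / k**2) * 100


for k in (1, 2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}%")
```

For bell-shaped data the empirical rule gives much tighter figures (about 95% within two standard deviations, against a guaranteed 75% here), because Chebyshev's rule makes no assumption about shape.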

Approximating the Mean
Sometimes only a frequency distribution is available, not the raw data.

Use the midpoint of a class interval to approximate the values in that class.

$$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$$

where n = number of values or sample size
c = number of classes in the frequency distribution
$m_j$ = midpoint of the j-th class
$f_j$ = number of values in the j-th class
Approximating the Standard Deviation
Assume that all values within each class interval are located at the midpoint of the class.

$$S = \sqrt{\frac{\sum_{j=1}^{c} (m_j - \bar{X})^2 f_j}{n - 1}}$$
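Both approximations can be sketched in Python. The slides give no frequency distribution, so the three classes below (midpoints 5, 15, 25 with frequencies 2, 5, 3) are hypothetical values chosen only to make the arithmetic easy to follow; the function name is also illustrative.

```python
import math


def grouped_mean_and_std(midpoints, frequencies):
    """Approximate the sample mean and standard deviation from class data."""
    n = sum(frequencies)  # sample size is the total frequency
    mean = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    # Every value in a class is treated as if it sat at the class midpoint.
    squared = sum((m - mean) ** 2 * f for m, f in zip(midpoints, frequencies))
    return mean, math.sqrt(squared / (n - 1))


# Hypothetical frequency distribution, for illustration only.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]

approx_mean, approx_std = grouped_mean_and_std(midpoints, frequencies)
print(approx_mean, round(approx_std, 4))
```

Here n = 10, the approximate mean is (10 + 75 + 75)/10 = 16, and the sum of weighted squared deviations is 490, so S is the square root of 490/9.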
Exploratory Data Analysis
Box-and-Whisker Plot: A graphical display of data using the 5-number summary:

Minimum ($X_{\text{smallest}}$) -- Q1 -- Median -- Q3 -- Maximum ($X_{\text{largest}}$)

[Figure: box-and-whisker plot marking X minimum, Q1, Q2 = Median, Q3 and X maximum, with 25% of the values in each segment]
Distribution Shape and Box-and-Whisker Plot
[Figure: box-and-whisker plots marking Q1, Q2 and Q3 for left-skewed, symmetric and right-skewed distributions]
The Covariance
The sample covariance measures the strength of the linear relationship between two numerical variables.

$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Its sign shows the direction of the relationship (positive or negative)
No causal effect is implied
Is affected by units of measurement, so its size alone does not show relative strength
Correlation
Measures the relative strength of the linear relationship between two variables

$$r = \frac{\text{cov}(X, Y)}{S_X S_Y}$$

where

$$\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

$$S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
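The covariance and correlation formulas can be sketched in Python as follows. The paired data are hypothetical (the slides give none) and the function names are illustrative. The last lines rescale X by 100 to show that the covariance changes with the units while r does not.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation; cov(X, X) is the sample variance."""
    return math.sqrt(sample_covariance(values, values))


def correlation(xs, ys):
    """Coefficient of correlation: covariance over the product of the S values."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical paired observations, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(xs, ys)
r = correlation(xs, ys)

# Same data with X expressed in units 100 times smaller.
xs_rescaled = [100 * x for x in xs]
cov_rescaled = sample_covariance(xs_rescaled, ys)
r_rescaled = correlation(xs_rescaled, ys)

print(cov_xy, round(r, 4), cov_rescaled, round(r_rescaled, 4))
```

For these values the covariance is 6/4 = 1.5 and grows to 150 after rescaling, while r stays at about 0.77 in both cases.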
Features of Correlation Coefficient, r
Also called standardised covariance, i.e. invariant to units of measure.
Ranges between -1 and 1:
The closer to -1, the stronger the negative linear relationship.
The closer to 1, the stronger the positive linear relationship.
The closer to 0, the weaker the linear relationship.
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +.3, r = +1 and r = 0]
Industry Application
Skyscrapers 'linked with impending financial crashes'
There is an "unhealthy correlation" between the building of skyscrapers and
subsequent financial crashes, according to Barclays Capital.

Examples include the Empire State building, built as the Great Depression
was under way, and the current world's tallest, the Burj Khalifa, built just
before Dubai almost went bust.

China is currently the biggest builder of skyscrapers, the bank said.

India also has 14 skyscrapers under construction.

"Often the world's tallest buildings are simply the edifice of a broader
skyscraper building boom, reflecting a widespread misallocation of capital
and an impending economic correction," Barclays Capital analysts said.
(source: http://www.bbc.co.uk/news/business-16494013)
Pitfalls and Ethical Issues
Data analysis is objective.
Should report the summary measures that best meet the
assumptions about the data set

Data interpretation is subjective.
Should be done in fair, neutral and transparent manner

Should document both good and bad results.

Results should be presented in a fair, objective and neutral manner.

Should not use inappropriate summary measures to distort facts.

Do not fail to report pertinent findings even if such findings do not
support original argument.

Chapter Summary
Described measures of central tendency.
Mean, median, mode, geometric mean

Described quartiles.

Described measures of variation.
Range, interquartile range, variance and standard deviation,
coefficient of variation, Z scores

Illustrated shape of distribution.
Symmetric, skewed, box-and-whisker plots

Discussed covariance and correlation coefficient.

Addressed pitfalls in numerical descriptive measures and ethical
considerations.
