management
Descriptive Statistics: Numerical Measures
DAY 3
Recap
Day 1: Introduction, types of statistics, data and its types
Definition of statistics; terminology: population, sample, parameter,
statistic; qualitative and quantitative data; levels of measurement:
nominal, ordinal, interval, and ratio; sources of data: primary and
secondary; applications of statistics in various functions of
management; data mining and data warehousing
Chap 3-4
Arithmetic Mean
Commonly called the mean
is the average of a group of numbers
Applicable for interval and ratio data
Not applicable for nominal or ordinal data
Affected by each value in the data set, including
extreme values
Computed by summing all values in the data set and
dividing the sum by the number of values in the data
set
The mean can be computed from the total (aggregate) and the
number of items alone; the individual values need not be
known
Measures of Central Tendency:
The Mean
The arithmetic mean (often just called
mean) is the most common measure of
central tendency
For a sample of size n:
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
Where $\bar{X}$ (pronounced x-bar) = sample mean
n = sample size
$X_i$ = ith observed value
Measures of Central Tendency:
The Mean (continued)
[Figure: two dot plots on a 0 to 10 axis]
Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
Properties of AM
Sum of deviations from AM is ZERO
Sum of squares of deviation taken from AM
will be minimum
Combined mean
It is affected by change of scale and change of
origin
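The properties listed above can be checked numerically. The following is a minimal sketch, not part of the original slides: the data values, the constants a and b, and the helper names sum_sq_dev and combined_mean are all chosen here for illustration. The combined mean is assumed to be the usual size-weighted average of the group means.

```python
from statistics import mean

data = [5, 9, 16, 17, 18]   # illustrative values (assumption)
m = mean(data)              # arithmetic mean of the data

# Property 1: deviations taken from the mean sum to zero.
sum_dev = sum(x - m for x in data)

# Property 2: the sum of squared deviations is smallest about the mean.
def sum_sq_dev(values, center):
    """Sum of squared deviations of values from an arbitrary center."""
    return sum((x - center) ** 2 for x in values)

ss_mean = sum_sq_dev(data, m)
ss_other = [sum_sq_dev(data, c) for c in (m - 1, m + 1, 10, 16)]

# Property 3 (assumed formula): combined mean of two groups,
# weighting each group mean by its group size.
def combined_mean(n1, mean1, n2, mean2):
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Property 4: change of origin (a) and scale (b).
# If y = a + b * x, then mean(y) = a + b * mean(x).
a, b = 100, 2
m_y = mean(a + b * x for x in data)

print(sum_dev, ss_mean, ss_other, m_y)
```

Any center other than the mean gives a larger sum of squares, which is why the variance is defined about the mean.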
Median
Middle value in an ordered array of numbers.
Applicable for ordinal, interval, and ratio data
Not applicable for nominal data
Unaffected by extremely large and extremely
small values.
Median: Computational Procedure
First Procedure
Arrange the observations in an ordered array.
If there is an odd number of terms, the median is
the middle term of the ordered array.
If there is an even number of terms, the median is
the average of the middle two terms.
Second Procedure
The median's position in an ordered array is given
by (n+1)/2.
Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22
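For the ordered array above, the position rule (n+1)/2 can be sketched as follows. The function name median_by_position is hypothetical, and the even-length case simply reuses the first 16 terms of the same array as an extra illustration.

```python
def median_by_position(ordered):
    """Median of an already sorted list using the (n + 1) / 2 position rule."""
    n = len(ordered)
    position = (n + 1) / 2              # 1-based position in the ordered array
    if n % 2 == 1:
        # Odd number of terms: the median is the middle term.
        return ordered[int(position) - 1]
    # Even number of terms: average the two middle terms.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

ordered_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]

median_odd = median_by_position(ordered_array)        # 17 terms, position 9
median_even = median_by_position(ordered_array[:-1])  # 16 terms, positions 8 and 9

print(median_odd, median_even)
```

With 17 terms the median sits in position (17+1)/2 = 9, which is the value 15.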
[Figure: ordered data with the quartiles Q1, Q2, and Q3 marked]
[Figure: two dot plots illustrating the mode, one labeled Mode = 9 and one labeled No Mode]
Mode -- Example
The mode is 44.
There are more 44s than any other value.
35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Problem
The costs of consumer purchases such as single-family
housing, gasoline, Internet services, tax preparation,
and hospitalization were reported in The Wall Street
Journal. Sample data typical of the cost of tax return
preparation by services such as H&R Block are shown
below.
120 230 110 115 160 130 150 105
195 155 105 360 120 120 140 100
115 180 235 255
- Compute the mean, median and mode
- Compute the first and third quartiles
- Compute and interpret the 90th percentile
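One way to work this problem is sketched below. The percentile helper is hypothetical and assumes the index convention i = (p/100)n: if i is a whole number, average the values in positions i and i+1; otherwise round i up. Other software uses other interpolation rules and may give slightly different quartiles.

```python
import math
from statistics import mean, median, multimode

costs = [120, 230, 110, 115, 160, 130, 150, 105,
         195, 155, 105, 360, 120, 120, 140, 100,
         115, 180, 235, 255]

def percentile(values, p):
    """pth percentile using the index rule i = (p / 100) * n (assumed convention)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    i = p * n / 100
    if i == int(i):
        # Whole number: average the values in 1-based positions i and i + 1.
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    # Otherwise round up and take the value in that 1-based position.
    return ordered[math.ceil(i) - 1]

cost_mean = mean(costs)
cost_median = median(costs)
cost_modes = multimode(costs)   # list of all most frequent values
q1 = percentile(costs, 25)
q3 = percentile(costs, 75)
p90 = percentile(costs, 90)

print(cost_mean, cost_median, cost_modes, q1, q3, p90)
```

Under this convention the 90th percentile is 245: roughly 90% of the sampled preparation costs are at or below $245.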
Measures of Central Tendency:
Review Example
House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum $3,000,000
Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
Measures of Central Tendency:
Which Measure to Choose?
Measures of Central Tendency:
Summary
Central Tendency
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: middle value in the ordered array
Mode: most frequently observed value
Measures of Variability
It is often desirable to consider measures of variability
(dispersion), as well as measures of location.
[Figure: data sets with variability and with no variability]
Measures of Variation
[Figure: distributions with the same center, different variation]
Measures of Variability:
Ungrouped Data
Measures of variability describe the spread or the
dispersion of a set of data.
Common Measures of Variability
Range
Interquartile Range
Mean Absolute Deviation
Variance
Standard Deviation
Z scores
Coefficient of Variation
Range
The difference between the largest and the
smallest values in a set of data
Simple to compute
Range = Largest - Smallest = 48 - 35 = 13
35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Measures of Variation:
The Range
Simplest measure of variation
Difference between the largest and the smallest values:
$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$
Example (dot plot on a 0 to 14 axis):
Range = 13 - 1 = 12
Measures of Variation:
Why The Range Can Be Misleading
[Figure: two dot plots on a 7 to 12 axis]
Range = 12 - 7 = 5 for both data sets
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range = Q3 - Q1
Deviation from the Mean
Data set: 5, 9, 16, 17, 18
Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, 3, 4, 5
[Figure: the deviations -8, -4, +3, +4, +5 plotted about the mean on a 0 to 20 axis]
Mean Absolute Deviation
Average of the absolute deviations from the mean
$M.A.D. = \frac{\sum |X - \mu|}{N} = \frac{24}{5} = 4.8$
X       X - μ    |X - μ|
5       -8       +8
9       -4       +4
16      +3       +3
17      +4       +4
18      +5       +5
Total    0       24
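The table above can be reproduced step by step. This is a small sketch using the slide's data set; the variable names are chosen here for illustration.

```python
data = [5, 9, 16, 17, 18]
n = len(data)

mu = sum(data) / n                              # mean of the data set
deviations = [x - mu for x in data]             # signed deviations from the mean
abs_deviations = [abs(d) for d in deviations]   # absolute deviations

# Mean absolute deviation: average of the absolute deviations.
mad = sum(abs_deviations) / n

print(deviations, abs_deviations, mad)
```

The signed deviations cancel to zero, which is why the absolute values (or, for the variance, the squares) are averaged instead.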
Measures of Variation:
The Standard Deviation
Steps for Computing Standard Deviation
Measures of Variation:
The Variance
Average (approximately) of squared deviations of values from
the mean
Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Measures of Variation:
The Standard Deviation
Most commonly used measure of variation
Shows variation about the mean
Is the square root of the variance
Has the same units as the original data
Sample standard deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Measures of Variation:
Sample Standard Deviation:
Calculation Example
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
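For this sample, the computation can be sketched by following the sample formula directly (squared deviations summed and divided by n - 1). The variable names are illustrative; statistics.stdev from the Python standard library is used only as a cross-check.

```python
import math
from statistics import stdev

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)

x_bar = sum(sample) / n                              # step 1: sample mean
squared_devs = [(x - x_bar) ** 2 for x in sample]    # step 2: squared deviations
sum_squares = sum(squared_devs)                      # step 3: sum of squares
sample_variance = sum_squares / (n - 1)              # step 4: divide by n - 1
sample_sd = math.sqrt(sample_variance)               # step 5: square root

print(x_bar, sum_squares, round(sample_sd, 4))
print(round(stdev(sample), 4))   # library cross-check
```

The sum of squared deviations is 130, so S is the square root of 130/7, approximately 4.31.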
Population Variance
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
X       X - μ    (X - μ)^2
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total    0       130
Population Standard Deviation
Square root of the variance
$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$
$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} = 5.1$
X       X - μ    (X - μ)^2
5       -8       64
9       -4       16
16      +3       9
17      +4       16
18      +5       25
Total    0       130
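The population figures above can be cross-checked with Python's standard statistics module, which is used here purely as an illustration and is not part of the original slides. pvariance and pstdev divide by N, while variance divides by n - 1, so the same five values give a larger result when treated as a sample.

```python
from statistics import pstdev, pvariance, variance

data = [5, 9, 16, 17, 18]

pop_var = pvariance(data)     # divides the sum of squares (130) by N = 5
pop_sd = pstdev(data)         # square root of the population variance
sample_var = variance(data)   # divides the same sum of squares by n - 1 = 4

print(pop_var, round(pop_sd, 1), sample_var)
```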
Sample Variance
Average of the squared deviations from the arithmetic mean
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$
X        X - X̄    (X - X̄)^2
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)
Sample Standard Deviation
Square root of the sample variance
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{663{,}866}{3} = 221{,}288.67$
$S = \sqrt{S^2} = \sqrt{221{,}288.67} = 470.41$
X        X - X̄    (X - X̄)^2
2,398    625      390,625
1,844    71       5,041
1,539    -234     54,756
1,311    -462     213,444
7,092    0        663,866   (totals)
Uses of Standard Deviation
Indicator of financial risk
Quality Control
construction of quality control charts
process capability studies
Comparing populations
household incomes in two cities
employee absenteeism at two plants
Standard Deviation as an
Indicator of Financial Risk
Security    Mean return    Standard deviation
A           15%            3%
B           15%            7%
Measures of Variation:
Comparing Standard Deviations
Data A (dot plot on an 11 to 21 axis)
Mean = 15.5
S = 3.338
Measures of Variation:
Summary Characteristics
The more the data are spread out, the greater the range,
variance, and standard deviation.
The more the data are concentrated, the smaller the range,
variance, and standard deviation.
If the values are all the same (no variation), all these
measures will be zero.
Measures of Variation:
The Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Can be used to compare the variability of two or more sets
of data measured in different units
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
Measures of Variation:
Comparing Coefficients of Variation
Stock A:
Average price last year = $50
Standard deviation = $5
$CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
Stock B:
Average price last year = $100
Standard deviation = $5
$CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less
variable relative to its price
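The stock comparison can be written as a small helper. The function name coefficient_of_variation is hypothetical; it simply applies CV = (S / mean) x 100%.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Relative variation: standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean_value

cv_a = coefficient_of_variation(5, 50)     # Stock A: S = $5, mean = $50
cv_b = coefficient_of_variation(5, 100)    # Stock B: S = $5, mean = $100

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```

Because the CV is unit-free, the same helper can compare series measured in different units, for example prices in dollars against delivery times in days.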
Coefficient of Variation
Ratio of the standard deviation to the mean,
expressed as a percentage
Measurement of relative dispersion
$C.V. = \frac{\sigma}{\mu}(100)$
Coefficient of Variation
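$\mu_1 = 29$, $\sigma_1 = 4.6$
$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$
$\mu_2 = 84$, $\sigma_2 = 10$
$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$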
A home theatre in a box is the easiest and cheapest way to provide surround
sound for a home entertainment centre. A sample of prices is shown here
(Consumer Reports Buying Guide, 2013). The prices are for models with a
DVD player and for models without a DVD player.
Models with DVD Player    Price    Models without DVD Player    Price
Sony HT-1800DP            $450     Pioneer HTP-230              $300
Pioneer HTD-330DV         $300     Sony HT-DDW750               $300
Sony HT-C800DP            $400     Kenwood HTB-306              $360
Panasonic SC-HT900        $500     RCA RT-2600                  $290
Panasonic SC-MTI          $400     Kenwood HTB-206              $300
Compute the mean price for models with a DVD player and the mean price for
models without a DVD player. What is the additional price paid to have a DVD
player included in a home theatre unit?
Compute the range, variance, and standard deviation for the two samples. What
does this information tell you about the prices for models with and without a DVD
player?
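A possible way to work both parts of this exercise is sketched below, treating each list of five prices as a sample (so the variance divides by n - 1). The summarize helper and the variable names are illustrative, not part of the exercise.

```python
from statistics import mean, stdev, variance

with_dvd = [450, 300, 400, 500, 400]
without_dvd = [300, 300, 360, 290, 300]

def summarize(prices):
    """Mean, range, sample variance, and sample standard deviation."""
    return {
        "mean": mean(prices),
        "range": max(prices) - min(prices),
        "variance": variance(prices),
        "std_dev": stdev(prices),
    }

dvd = summarize(with_dvd)
no_dvd = summarize(without_dvd)
extra_for_dvd = dvd["mean"] - no_dvd["mean"]   # additional price paid for a DVD player

print(dvd)
print(no_dvd)
print(extra_for_dvd)
```

Both the centre and the spread are larger for models with a DVD player: they cost more on average, and their prices vary more from model to model.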
Price with DVD player Price without DVD player
Count 5 Count 5
The following data were used to construct the histograms of the number of
days required to fill orders for Dawson Supply, Inc., and J.C. Clark
Distributors.
Use the range and standard deviation to support that Dawson Supply
provides the more consistent and reliable delivery times.
                            Dawson         Clark
Range                       2              8
Minimum                     9              7
Maximum                     11             15
Count                       10             10
Coefficient of variation    6.552898619    25.08873455
Practice
The following times were recorded by the quarter-mile and mile runners of
a university track team (times are in minutes).
Quarter-Mile Times: .92 .98 1.04 .90 .99
Mile Times: 4.52 4.35 4.60 4.70 4.50
After viewing this sample of running times, one of the coaches commented
that the quarter milers turned in the more consistent times. Use the standard
deviation and the coefficient of variation to summarize the variability in the
data. Does the use of the coefficient of variation indicate that the coach's
statement should be qualified?
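A sketch of the computation for this practice problem follows, assuming the five times in each group are treated as a sample. The variability helper is hypothetical.

```python
from statistics import mean, stdev

quarter_mile = [0.92, 0.98, 1.04, 0.90, 0.99]   # times in minutes
mile = [4.52, 4.35, 4.60, 4.70, 4.50]           # times in minutes

def variability(times):
    """Return (sample standard deviation, coefficient of variation in %)."""
    s = stdev(times)
    return s, 100 * s / mean(times)

s_quarter, cv_quarter = variability(quarter_mile)
s_mile, cv_mile = variability(mile)

print(f"Quarter-mile: s = {s_quarter:.4f}, CV = {cv_quarter:.2f}%")
print(f"Mile:         s = {s_mile:.4f}, CV = {cv_mile:.2f}%")
```

The quarter-mile times have the smaller standard deviation, but relative to their mean they vary more than the mile times, so the two measures point in opposite directions.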
General Descriptive Stats Using
Microsoft Excel
1. Select Tools.
2. Select Data Analysis.
3. Select Descriptive Statistics and
click OK.
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
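Outside Excel, a comparable summary can be sketched with Python's standard statistics module. This is an illustration rather than the Excel procedure itself; the dictionary keys are chosen here, and skewness and kurtosis are omitted because the standard library does not provide them.

```python
import math
from statistics import mean, median, mode, stdev

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
n = len(house_prices)

summary = {
    "Mean": mean(house_prices),
    "Standard Error": stdev(house_prices) / math.sqrt(n),
    "Median": median(house_prices),
    "Mode": mode(house_prices),
    "Standard Deviation": stdev(house_prices),   # sample standard deviation
    "Range": max(house_prices) - min(house_prices),
    "Minimum": min(house_prices),
    "Maximum": max(house_prices),
    "Sum": sum(house_prices),
    "Count": n,
}

for name, value in summary.items():
    print(f"{name:20s}{value:>15,.2f}")
```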
Minitab Output
Variable       Median    Maximum    Range      Mode      Skewness    Kurtosis
House Price    300000    2000000    1900000    100000    2.01        4.13
Numerical Descriptive Measures for a
Population
Descriptive statistics discussed previously described a sample,
not the population.
Numerical Descriptive Measures
for a Population: The mean
The population mean is the sum of the values in the
population divided by the population size, N
$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Where $\mu$ = population mean
N = population size
$X_i$ = ith value of the variable X
Numerical Descriptive Measures For A
Population: The Variance $\sigma^2$
Average of squared deviations of values from the mean
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample statistics versus population
parameters