Preliminaries
Summary Measures
Miscellany
Preliminaries
Topics:
What is Statistics?
Typical Descriptive Statistics Problems
Note for the Student
It is recommended that students read this section in its
entirety before coming to class for the lecture to ensure that
they have the required background information.
During the lecture I will mainly focus on sections which have
a direct bearing on the lecture topic under discussion.
Material in the last section serves to complement what we
cover during the lecture.
Statistics Overview
Topics:
What is Statistics?
Applications of Statistics
Learning Objectives:
Learn the nature of Statistics and study its relevance to
Business Research Analysis and Decision Making.
Learn about the different subdisciplines of Statistics concerned
with extracting descriptive information from data, assessing
uncertainty and making statistical inferences & predictions.
What is Statistics?
Statistics is the discipline which makes use of mathematical and
computational techniques to, among other things,
collect data using surveys, observational studies or designed
experiments;
describe, summarize and present the collected data;
assess and quantify uncertainty;
draw inferences about population characteristics based on
sample information;
assess the statistical significance of observed differences or
presence of associations;
construct empirical models to obtain estimates, test
hypotheses or for predictive purposes;
make projections using cross-sectional or time series data.
Applications of Statistics
Some Applications:
Marketing Research
E.g., Assessing Brand Preferences for a Given Product
Finance
E.g., Measuring the Credit Risk of a Counterparty
Insurance
E.g., Measuring Risk of an Insurance Portfolio
Reliability Engineering
E.g., Assessing the Reliability of an Aircraft Engine
Medical Research
E.g., Determining the Efficacy of a New Drug
Q: Do you think Statistics is worthwhile learning? If so, why?
R O R O R O R O D D R D D D R R O R
D R R O R R R R R O O R R D R D D
Summarizing Data
Arterial blood pressures (in mm of mercury) for a sample of 16
children of diabetic mothers are given below.
81.6   84.1   87.6   82.8
82.0   88.9   86.7   96.4
84.6  104.9   90.8   94.0
69.4   78.9   75.2   91.0
What does the data tell you about the average blood pressure
of a child whose mother is diabetic?
What can we conclude about the variability of the blood
pressure measurements?
Source: Adapted from Weiss (2012, p. 95)
Category   | Abs Freq   Rel Freq
-----------+--------------------
Democratic |    13       0.325
Republican |    18       0.450
Other      |     9       0.225
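Class      | Frequency
-----------+----------
(l1, u1]   |    n1
(l2, u2]   |    n2
(l3, u3]   |    n3
  ...      |    ...
(lk, uk]   |    nk

Class      | Frequency
-----------+----------
(10, 20]   |     3
(20, 30]   |     7
(30, 40]   |     4
(40, 50]   |     4
(50, 60]   |     2
Note: (10, 20] refers to values between 10 (exclusive) and 20 (inclusive) etc.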
Class      | Rel Freq   Cum Freq
-----------+--------------------
(10, 20]   |   0.15       0.15
(20, 30]   |   0.35       0.50
(30, 40]   |   0.20       0.70
(40, 50]   |   0.20       0.90
(50, 60]   |   0.10       1.00
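The relative and cumulative frequencies follow directly from the absolute frequencies. The short Python sketch below is an illustration only (the variable names are not from the slides); it divides each class count by the total and accumulates the counts before dividing.

```python
# Absolute frequencies for the classes (10,20], ..., (50,60] from the slides.
classes = ["(10, 20]", "(20, 30]", "(30, 40]", "(40, 50]", "(50, 60]"]
abs_freq = [3, 7, 4, 4, 2]

n = sum(abs_freq)  # total number of observations

# Relative frequency = class count divided by the total count.
rel_freq = [f / n for f in abs_freq]

# Cumulative relative frequency = running count divided by the total count.
cum_freq = []
running = 0
for f in abs_freq:
    running += f
    cum_freq.append(running / n)

for c, r, cf in zip(classes, rel_freq, cum_freq):
    print(f"{c:>9}  {r:.2f}  {cf:.2f}")
```

Accumulating the integer counts first, and dividing only at the end, avoids small floating-point drift in the cumulative column.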
Pie Slice  | Angle
-----------+--------
Democratic | 117 deg
Republican | 162 deg
Other      |  81 deg
Q: How can we improve on this graphical display?
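Each pie slice angle is the relative frequency of the category multiplied by 360 degrees. A minimal Python sketch, assuming the counts 13, 18 and 9 correspond to Democratic, Republican and Other (which is consistent with the angles shown):

```python
# Category counts (assumed pairing of counts with categories; see lead-in).
counts = {"Democratic": 13, "Republican": 18, "Other": 9}
total = sum(counts.values())

# Slice angle in degrees = 360 * relative frequency.
angles = {party: 360 * c / total for party, c in counts.items()}

for party, angle in angles.items():
    print(f"{party:<10} {angle:.0f} deg")
```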
Bar Chart
Each category is represented by a vertical (or horizontal) bar.
The height (or width) of each bar is equal or proportional to
the absolute or relative frequency of a category.
Example
For the political affiliation data, we have the following bar chart.
Frequency: 3, 7, 4, 4, 2
Frequency: 0.15, 0.35, 0.20, 0.20, 0.10
Class: (0, 10], (10, 20], (20, 30], (30, 40], (40, 50], (50, 60]
Digression: Quartiles
Let $x_1, x_2, \ldots, x_n$ denote a set of $n$ observations for our study.
Usually, the $x_i$'s are unordered.
For some applications, we need to work with ordered values in the
dataset, i.e., with $x_{(i)}$'s such that
\[ x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}. \]
Define
\[
Q_2 = \text{second quartile of the } x_i\text{'s}
    = \begin{cases}
        \frac{1}{2}\left( x_{(k)} + x_{(k+1)} \right), & \text{if } n = 2k, \\
        x_{(k+1)}, & \text{if } n = 2k + 1.
      \end{cases}
\]
Note that $Q_2$ is also referred to as the median of the $x_i$'s.
Example
Ordered values: 99.63, 99.76, 100.22, 101.96, 109.76.
Here,
Q1 = 99.76, Q2 = 100.22 and Q3 = 101.96.
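The quartile values above can be reproduced with a median-of-halves rule. The Python sketch below assumes one particular convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, with the overall median included in both halves when n is odd. This convention is an assumption chosen because it reproduces the values on the slide; other software may use different quartile definitions and return slightly different numbers.

```python
def median(sorted_values):
    """Median of an already sorted list, following the Q2 definition."""
    n = len(sorted_values)
    k = n // 2
    if n % 2 == 0:                       # n = 2k: average x(k) and x(k+1)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[k]              # n = 2k + 1: take x(k+1)


def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves.

    Assumption: when n is odd, the median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    lower = xs[:half]
    upper = xs[n - half:]
    return median(lower), median(xs), median(upper)


print(quartiles([101.96, 109.76, 99.63, 99.76, 100.22]))
# (99.76, 100.22, 101.96)
```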
  | 67788899
  | 0012257
  | 28
  | 2
25 27 25 33 26 36 21 14 28 27
31 35 32 26 34 26 30 30 43 33
36 41 29 30 27 29 33 31 40 33
37 21 26 32 29 37 26 22 32 30
20 26 24 31 31
we obtain
1 | 4
1 | 9
2 | 01124
2 | 55556666667778999
3 | 000011112223333444
3 | 56677
4 | 013
Boxplots
We introduce the boxplot via a couple of examples.
Example [Boxplot]
Weekly television viewing times (in hours) of a sample of 20 people
are given below.
25 41 27 32 43
66 35 31 15  5
34 26 32 38 16
30 38 30 20 21
Ordered values:
 5 15 16 20 21
25 26 27 30 30
31 32 32 34 35
38 38 41 43 66
Q1 = 23
Q2 = 30.5
Q3 = 36.5
Adjacent values are the most extreme values that lie within the lower and
upper limits; they are the most extreme observations that are not potential
outliers (Weiss, 2012, p. 120).
Statistics         | Runners                     Nonrunners
-------------------+------------------------------------------------------------
5 Num Summary      | 3.0, 4.85, 6.3, 7.4, 8.8    5.2, 12.3, 18.25, 21.55, 29.4
Limits             | 1.025, 11.225               -1.575, 35.425
Adjacent Values    | 3.0, 8.8                    5.2, 29.4
Potential Outliers | None                        None
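The limits in the table are consistent with the usual rule of placing them 1.5 IQRs below Q1 and above Q3. The sketch below assumes that rule (it is inferred from the tabulated values, not stated explicitly in the surviving text) and applies it to the weekly television viewing times, using the quartiles Q1 = 23 and Q3 = 36.5 given for that example. The function names are illustrative.

```python
def limits(q1, q3):
    """Lower and upper limits, assuming the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def adjacent_and_outliers(values, q1, q3):
    """Adjacent values (most extreme values within the limits) and potential outliers."""
    lower, upper = limits(q1, q3)
    inside = [x for x in values if lower <= x <= upper]
    outliers = sorted(x for x in values if x < lower or x > upper)
    return (min(inside), max(inside)), outliers


# Weekly television viewing times (hours) from the boxplot example.
tv_hours = [25, 41, 27, 32, 43, 66, 35, 31, 15, 5,
            34, 26, 32, 38, 16, 30, 38, 30, 20, 21]

print(limits(23, 36.5))                           # (2.75, 56.75)
print(adjacent_and_outliers(tv_hours, 23, 36.5))  # ((5, 43), [66])
```

Under this rule, 66 hours is the only potential outlier in the viewing-time data, and the adjacent values are 5 and 43.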
Summary Measures
Topics:
Location & Spread of a Distribution
Measures of Central Tendency
Measures of Dispersion
Summary Measures for Grouped Data
Learning Objectives:
Learn how to measure the location and spread of the
distribution of raw data for a single numerical variable.
Learn how to obtain summary measures from grouped data.
Learn how to interpret and choose between the various
summary measures.
Learn the role played by robustness in the selection of a
summary measure.
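Mean
\[ \text{mean} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}, \text{ say.} \]
Median
\[
\text{median} = \begin{cases}
  \frac{1}{2}\left( x_{(k)} + x_{(k+1)} \right), & \text{if } n = 2k, \\
  x_{(k+1)}, & \text{if } n = 2k + 1.
\end{cases}
\]
Mode
mode = data value with highest frequency.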
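The three definitions translate almost line by line into code. The sketch below is illustrative only: it applies the mean and median to the 16 arterial blood pressure readings given earlier, and takes the mode of the political affiliation frequencies, assuming the counts 13, 18 and 9 belong to Democratic, Republican and Other respectively.

```python
# Arterial blood pressures (mm of mercury) from the "Summarizing Data" slide.
bp = [81.6, 84.1, 87.6, 82.8, 82.0, 88.9, 86.7, 96.4,
      84.6, 104.9, 90.8, 94.0, 69.4, 78.9, 75.2, 91.0]


def mean(values):
    """Arithmetic mean: sum of the values divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle ordered value (n odd) or average of the two middle values (n even)."""
    xs = sorted(values)
    n = len(xs)
    k = n // 2
    if n % 2 == 0:
        return (xs[k - 1] + xs[k]) / 2
    return xs[k]


# Mode of categorical data: the category with the highest frequency.
party_counts = {"Democratic": 13, "Republican": 18, "Other": 9}
mode = max(party_counts, key=party_counts.get)

print(round(mean(bp), 2), round(median(bp), 2), mode)
```

For these readings the mean and median round to 86.18 and 85.65, the values quoted for this dataset elsewhere in the slides.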
Example
Consider dataset
101.96, 109.76, 99.63, 99.76, 100.22
with corresponding ordered values
99.63, 99.76, 100.22, 101.96, 109.76.
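Here, the mean is
\[ \bar{x} = \frac{1}{5}\left( 101.96 + 109.76 + 99.63 + 99.76 + 100.22 \right) \approx 102.27 \]
and
median = $x_{(3)}$ = 100.22.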
Q: What about the mode?
Mean   | Y  Y  N  Y
Median | Y  N  Y  N
Mode   | N  N  Y  N
Note
Use a robust (i.e., resistant) measure of central tendency
when outlying values (assuming these are valid) are present.
The trimmed mean is an example of a robust measure of
location - see Exercise 3.54 on p. 101 of Weiss (2012) for a
specific illustration.
Q: What about the mean and median?
Example [Robustness]
The mean is not robust since it is affected by outlying (extreme)
observations.
> set.seed(2012)
> x <- rnorm(50, 10, 1)
> mean(x)
[1] 10.03585
> median(x)
[1] 10.09504
Note that I have decided to stop using R for this course. You may ignore the R
code that you see in this and the next three examples.
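For readers who are not using R, the lack of robustness of the mean can also be illustrated with a few lines of Python. This is a sketch with a hypothetical contaminated value: the five-value dataset from the earlier example is used, and its largest value is replaced by an invented recording error (1097.6) to show how the mean reacts while the median does not.

```python
from statistics import mean, median

data = [99.63, 99.76, 100.22, 101.96, 109.76]

# Hypothetical contamination: the largest value is mistyped as 1097.6.
contaminated = data[:-1] + [1097.6]

print("mean:  ", round(mean(data), 2), "->", round(mean(contaminated), 2))
print("median:", median(data), "->", median(contaminated))
```

The mean moves from about 102.27 to about 299.83, whereas the median stays at 100.22.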
6 7 7 4
> mean(x)
[1] 4
> median(x)
[1] 4
The above example illustrates the case when
mean = median = mode.
5 7 6 4 7 2
> mean(x)
[1] 2.59
> median(x)
[1] 2
Q: What is the practical significance of these examples?
69.4   75.2   78.9   81.6
82.0   82.8   84.1   84.6
86.7   87.6   88.9   90.8
91.0   94.0   96.4  104.9

 6 | 9
 7 | 59
 8 | 22345789
 9 | 1146
10 | 5
Here,
x = 86.18 and median = 85.65.
Q: Which measure do you recommend for the data at hand?
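The stem-and-leaf display for the blood pressure data can be rebuilt programmatically. The sketch below assumes the readings are first rounded to the nearest integer, with the tens forming the stem and the units digit forming the leaf; this assumption reproduces the display shown for these data. The function name is illustrative.

```python
from collections import defaultdict

bp = [81.6, 84.1, 87.6, 82.8, 82.0, 88.9, 86.7, 96.4,
      84.6, 104.9, 90.8, 94.0, 69.4, 78.9, 75.2, 91.0]


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display (stem = tens, leaf = units)."""
    # Round half up; positive data are assumed.
    rounded = sorted(int(x + 0.5) for x in values)
    leaves = defaultdict(list)
    for v in rounded:
        leaves[v // 10].append(v % 10)
    first, last = min(leaves), max(leaves)
    lines = []
    for stem in range(first, last + 1):   # stems without leaves are still printed
        row = "".join(str(d) for d in leaves[stem])
        lines.append(f"{stem:>2} | {row}")
    return lines


print("\n".join(stem_and_leaf(bp)))
```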
Measures of Dispersion
Some measures of dispersion are given below.
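Range
\[ \text{range} = x_{(n)} - x_{(1)} \]
Interquartile Range
\[ \text{IQR} = \text{Third Quartile} - \text{First Quartile} \]
Variance
\[ \text{variance} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Standard Deviation
\[ \text{standard deviation} = \sqrt{ \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right) } \]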
Example
Consider the (ordered) dataset
99.63, 99.76, 100.22, 101.96, 109.76.
Here,
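\[ \text{range} = 109.76 - 99.63 = 10.13 \]
and
\[ \text{IQR} = 101.96 - 99.76 = 2.2. \]
Furthermore,
\[ \text{variance} = \frac{99.63^2 + \cdots + 109.76^2 - 5 \times 102.27^2}{5 - 1} \approx 18.42 \]
and
\[ \text{standard deviation} \approx \sqrt{18.42} = 4.29. \]
Here 102.27 stands for the mean rounded to two decimal places; the value 18.42 is obtained with the unrounded mean 102.266.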
Coefficient of Variation
\[ \text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}}. \]
Example
For data in the previous example,
\[ \text{coefficient of variation} = \frac{4.29}{102.27} \approx 0.04. \]
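The dispersion measures for the five-value dataset can be checked numerically. The following Python sketch is an illustration rather than part of the original slides: it computes the variance both from its definition and from the shortcut form used in the standard deviation formula, and takes the quartiles 99.76 and 101.96 as given in the example.

```python
from math import sqrt

xs = sorted([101.96, 109.76, 99.63, 99.76, 100.22])
n = len(xs)
mean = sum(xs) / n

data_range = xs[-1] - xs[0]          # x(n) - x(1)
q1, q3 = 99.76, 101.96               # quartiles as given in the example
iqr = q3 - q1

# Variance from the definition: squared deviations divided by n - 1.
var_def = sum((x - mean) ** 2 for x in xs) / (n - 1)

# Equivalent shortcut form: (sum of squares - n * mean^2) / (n - 1).
var_short = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

sd = sqrt(var_def)
cv = sd / mean                       # coefficient of variation

print(round(data_range, 2), round(iqr, 2), round(var_def, 2),
      round(sd, 2), round(cv, 2))
```

The two variance formulas agree when the unrounded mean is used. The shortcut form is sensitive to rounding, though: substituting the rounded mean 102.27 gives roughly 17.39 instead of 18.42.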
R   | Y  Y  N  Y  Y
V   | Y  N  N  Y  N
SD  | Y  N  N  Y  Y
IQR | Y  N  Y  Y  Y
CV  | Y  N  N  N  N
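Let
$m_i$ = midpoint of class $i$,
$n_i$ = frequency of class $i$,
$k$ = number of classes,
$n$ = total frequency.
The grouped data mean is
\[ \bar{x}_g = \sum_{i=1}^{k} m_i \frac{n_i}{n} = \frac{1}{n} \sum_{i=1}^{k} m_i n_i. \]
The grouped data variance is
\[ s_g^2 = \sum_{i=1}^{k} m_i^2 \frac{n_i}{n} - \bar{x}_g^2 = \frac{1}{n} \sum_{i=1}^{k} m_i^2 n_i - \bar{x}_g^2. \]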
Example
For the grouped data given earlier, we have
Class      |   ni      mi     mi*ni     mi^2*ni
-----------+-----------------------------------------
(10,15]    |    1    12.5      12.5      156.25
(15,20]    |    2    17.5      35.0      612.50
(20,25]    |    8    22.5     180.0     4050.00
(25,30]    |   17    27.5     467.5    12856.25
(30,35]    |   15    32.5     487.5    15843.75
(35,40]    |    5    37.5     187.5     7031.25
(40,45]    |    2    42.5      85.0     3612.50
-----------+-----------------------------------------
Total      |   50            1455.0    44162.50
Hence,
\[ \bar{x}_g = \frac{1455.0}{50} = 29.1 \quad \text{and} \quad s_g^2 = \frac{44162.50}{50} - 29.1^2 = 36.44. \]
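The grouped data calculation can be verified with a short Python sketch. It assumes, as in the table, that each class is represented by its midpoint, and it uses the divisor n from the grouped variance formula. The variable names are illustrative.

```python
# Class boundaries (lower, upper] and frequencies from the table.
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
freqs = [1, 2, 8, 17, 15, 5, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]   # class midpoints m_i
n = sum(freqs)                                  # total frequency

sum_mn = sum(m * f for m, f in zip(mids, freqs))        # sum of m_i * n_i
sum_m2n = sum(m * m * f for m, f in zip(mids, freqs))   # sum of m_i^2 * n_i

mean_g = sum_mn / n
var_g = sum_m2n / n - mean_g ** 2

print(sum_mn, sum_m2n, round(mean_g, 1), round(var_g, 2))
```

The column totals come out as 1455.0 and 44162.5, which requires the last class to contribute 42.5^2 * 2 = 3612.50 to the final column.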
Miscellany
Topics:
Summation Notation
Classification of Statistical Studies
Questions for Class Discussion
Learning Objectives:
Review the notation used for summation.
Learn about different types of statistical studies.
Summation Notation
Given numerical values $x_1, \ldots, x_n$, we have:
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n \]
\[ \sum_{i=1}^{n} (a x_i + b) = (a x_1 + b) + \cdots + (a x_n + b) = a \sum_{i=1}^{n} x_i + nb \]
Example
If the $x_i$'s are given by 1.75, 2.25, 2.25, 2.25, 1.75, 2.00, 1.50, we have
\[ \sum_{i=1}^{7} x_i = 13.75 \quad \text{and} \quad \sum_{i=1}^{7} \ldots \]
Inferential Study
The study is based on a properly chosen sample (e.g., random
sample).
Inferences made from sample information may be generalized
to a larger population.
Example [Testing Baseballs]
An independent testing company investigated the liveliness of 85
randomly selected Rawlings baseballs from the 1977 supplies of
major league teams.
The Rawlings baseball was found to be more lively than the 1976
Spalding baseball.
Source: Adapted from Weiss (2012, p. 6).
Designed Experiments
A proper randomization technique is used to allocate subjects
(or objects) to treatment and control groups.
Relevant sources of extraneous variation are controlled.
Example [Folic Acid & Birth Defects]
4753 women prior to conception were divided randomly into two
groups. One group took daily doses of folic acid while the other
took only trace elements.
Incidence of major birth defects was much reduced for the group
taking folic acid.
Here, we can infer presence of a causal relationship.
Source: Adapted from Weiss (2012, p. 7).
  | 1259
  | 34558
  | 01889
  | 013566688899
  | 001235567
  | 002234467899
  | 88
  | 05
Question 1 (contd)
A similar display for a sample of 53 female nonvegetarians is given
below.
The decimal point is 1 digit(s) to the right of the |
0 | 5
1 | 14
2 | 34557
3 | 4567779
4 | 0112444569
5 | 0003345577
6 | 0113334799
7 | 1157
8 | 1444
Question 1 (contd)
(a) The quartiles for both groups of females are partially given in
the following table. Fill in the missing entries in table.
Group: Vegetarian, Nonvegetarian
1st Quartile: 38
2nd Quartile: 39
3rd Quartile: 63
Question 2
(a) Which of the following is not a property of the coefficient of
variation?
(i) It is ...
(ii) It is ...
(iii) It is ...
(iv) It is ...
Question 3
Suppose you obtain the following five number summaries from
data on annual (percentage) returns for common stock and
government bonds over a fifteen year period.
Investment: Bonds
[1] -10.460   1.035   4.600  14.080  42.980
Investment: Stocks
[1] -25.930  -0.495  10.710  23.760  44.770
Question 3 (contd)
(b) One of the values given in the five number summary for the
bond returns looks unusual. Is it a potential outlier?
(c) Of the two financial instruments, which is preferred if your
primary investment objective is to choose the one that gives
you the greater level of return on average?
(d) Which is preferred if risk aversion is the key factor influencing
your choice of investment to make?
(e) Is there anything wrong with the following statement?
Under appropriate conditions, the coefficient of variation is a
useful measure to consider when making risk-reward trade-offs
amongst several investment alternatives.
Question 4
Consider the following absolute frequency distribution obtained
from data on distance (in miles) travelled to work for a random
sample of 50 workers.
Classes   | (10,20] (20,30] (30,40] (40,50]
----------+-----------------------------------
Frequency |     3      19      23       5
(a) Determine the grouped data variance using information
provided by the above empirical distribution.
(b) Determine one other grouped data measure of dispersion.
Acknowledgements