Stats Powerpoint From The Worst Prof in The World

STAT101 Introductory Statistics
Data Distributions & Summary Measures
Sutaip L. C. Saw, Ph.D.
Last Revision: 29 July 2014
Outline: Preliminaries; Empirical Data Distributions; Summary Measures; Miscellany
Preliminaries
Topics:
What is Statistics?
Typical Descriptive Statistics Problems
Note for the Student
It is recommended that students read this section in its entirety before coming to class for the lecture to ensure that they have the required background information. (This also applies to the Preliminaries section of subsequent lecture slides.)
During the lecture I will mainly focus on sections which have a direct bearing on the lecture topic under discussion.
Material in the last section serves to complement what we cover during the lecture.
Statistics Overview
Topics:
What is Statistics?
Applications of Statistics
Learning Objectives:
Learn the nature of Statistics and study its relevance to
Business Research Analysis and Decision Making.
Learn about the different subdisciplines of Statistics concerned
with extracting descriptive information from data, assessing
uncertainty and making statistical inferences & predictions.
What is Statistics?
Statistics is the discipline which makes use of mathematical and
computational techniques to, among other things,
collect data using surveys, observational studies or designed
experiments;
describe, summarize and present the collected data;
assess and quantify uncertainty;
draw inferences about population characteristics based on
sample information;
assess the statistical significance of observed differences or
presence of associations;
construct empirical models to obtain estimates, test
hypotheses or for predictive purposes;
make projections using cross-sectional or time series data.
Applications of Statistics
Some Applications:
Marketing Research
E.g., Assessing Brand Preferences for a Given Product
Finance
E.g., Measuring the Credit Risk of a Counterparty
Insurance
E.g., Measuring the Risk of an Insurance Portfolio
Reliability Engineering
E.g., Assessing the Reliability of an Aircraft Engine
Medical Research
E.g., Determining the Efficacy of a New Drug
Q: Do you think Statistics is worth learning? If so, why?
Typical Descriptive Statistics Problems

Organizing Data
Forty students in an Introductory Statistics course were asked to
state their political affiliations (i.e., whether they favoured the
Democratic (D), Republican (R) or Other (O) party). The
following results were obtained.
D D D D O R O R
O R O R O D D R
D D D R R O R D
R R O R R R R R
O O R R D R D D
What type of data are we dealing with?
What can we say about the distribution of political affiliations?
Source: Adapted from Weiss (2012, p. 40).
Summarizing Data
Arterial blood pressures (in mm of mercury) for a sample of 16
children of diabetic mothers are given below.
81.6 84.1 82.0 88.9 84.6 104.9 69.4 78.9
87.6 86.7 90.8 75.2 82.8 96.4 94.0 91.0
What does the data tell you about the average blood pressure
of a child whose mother is diabetic?
What can we conclude about the variability of the blood
pressure measurements?
Source: Adapted from Weiss (2012, p. 95)
Empirical Data Distributions
Topics:
Tabulating Data Distributions
Graphing Data Distributions
Learn tabular and graphical techniques for organizing and
presenting data.
Learn how to choose among the available techniques for a
given problem in descriptive statistical analysis.
Note:
Much of the material in this and the next section is of a review nature.
We'll quickly review such material but spend more time on material
students are less familiar with.
Tabulating Data Distributions

Tabulating Categorical Data
The first column of the table contains the possible categories
and the second column the corresponding absolute frequencies
(optionally, relative frequencies may also be given in another
column).
Example
Consider the political affiliation data given in the first illustrative
problem. Following is the frequency table for the data.
Affiliation   Abs Freq   Rel Freq
Democratic       13       0.325
Republican       18       0.450
Other             9       0.225
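The frequency table above can be reproduced with a short Python sketch (the course's own examples used R; Python is used here purely for illustration). It assumes the 40 responses are coded as the letters D, R and O in the order listed earlier, and uses collections.Counter from the standard library to tally them.

```python
from collections import Counter

# The 40 responses in the order listed above
# (D = Democratic, R = Republican, O = Other).
responses = "DDDDORORORORODDRDDDRRORDRRORRRRROORRDRDD"

labels = {"D": "Democratic", "R": "Republican", "O": "Other"}
counts = Counter(responses)   # tallies each letter
n = len(responses)

# (category, absolute frequency, relative frequency), in the table's order
table = [(labels[code], counts[code], counts[code] / n) for code in "DRO"]

for name, abs_freq, rel_freq in table:
    print(f"{name:<12}{abs_freq:>4}{rel_freq:>8.3f}")
```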
Tabulating Numerical Data

In an absolute frequency table, the number of observations in
each class (i.e., pre-defined sub-interval) is presented.
Class        Frequency
(l1, u1]     n1
(l2, u2]     n2
(l3, u3]     n3
...          ...
(lk, uk]     nk

Abs Frequency Table
Class        Frequency
(10, 20]     3
(20, 30]     7
(30, 40]     4
(40, 50]     4
(50, 60]     2
Note: (10, 20] refers to values between 10 (exclusive) and 20 (inclusive) etc.
Example [Frequency Tables]

The absolute frequency table in the previous slide was obtained
from the following raw data
12 13 17 21 24 24 26 27 27 30
32 35 37 38 41 43 44 46 53 58
The corresponding relative and cumulative frequency tables are:
Class       Rel Freq
(10, 20]    0.15
(20, 30]    0.35
(30, 40]    0.20
(40, 50]    0.20
(50, 60]    0.10

Class       Cum Freq
(10, 20]    0.15
(20, 30]    0.50
(30, 40]    0.70
(40, 50]    0.90
(50, 60]    1.00
Q: What can we deduce from each table?
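The same tabulation for numerical data can be sketched as follows. This is an illustrative Python version that assumes right-closed classes (l, u], as in the note on the previous slide; the helper name class_counts is hypothetical. Cumulative frequencies are built from integer running counts so that the last entry is exactly 1.

```python
# Raw data from the example above.
data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Class boundaries for (10, 20], (20, 30], ..., (50, 60].
edges = [10, 20, 30, 40, 50, 60]

def class_counts(values, edges):
    """Count values in each right-closed class (edges[i], edges[i + 1]]."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        for i in range(len(edges) - 1):
            if edges[i] < x <= edges[i + 1]:   # lower end excluded, upper included
                counts[i] += 1
                break
    return counts

abs_freq = class_counts(data, edges)
n = len(data)
rel_freq = [c / n for c in abs_freq]

# Cumulative relative frequency from integer running totals.
cum_freq, running = [], 0
for c in abs_freq:
    running += c
    cum_freq.append(running / n)

for i in range(len(abs_freq)):
    print(f"({edges[i]}, {edges[i + 1]}]  {abs_freq[i]:>2}  "
          f"{rel_freq[i]:.2f}  {cum_freq[i]:.2f}")
```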
Graphing Data Distributions

Graphing Distributions for Categorical Data
Pie Chart
A circle is divided into pie slices. The area of each slice is
proportional to the relative frequency of each category.
Example
For the political affiliation data, we have the following pie chart.
Pie Slice     Angle
Democratic    117 deg
Republican    162 deg
Other         81 deg
Q: How can we improve on this graphical display?
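The angles in the table follow from multiplying each category's relative frequency by 360 degrees. A minimal sketch, computing from the absolute counts 13, 18 and 9 so that the arithmetic stays exact:

```python
# Absolute counts from the political affiliation frequency table.
counts = {"Democratic": 13, "Republican": 18, "Other": 9}
n = sum(counts.values())   # 40 students

# Each slice's angle is its share (relative frequency) of 360 degrees.
angles = {party: 360 * c / n for party, c in counts.items()}

for party, angle in angles.items():
    print(f"{party:<12}{angle:>6.1f} deg")
```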
Bar Chart
Each category is represented by a vertical (or horizontal) bar.
The height (or width) of each bar is equal or proportional to
the absolute or relative frequency of a category.
Example
For the political affiliation data, we have the following bar chart.
Q: Which is preferred? A pie chart or bar chart?
Side-by-Side Bar Chart

This chart may be used to present bivariate categorical data.
Example [Side-by-Side Bar Chart]
Consider the following distribution of student grades by gender.
Grade    A   B   C   D   E
Female   3   9   7   1   1
Male     4   6   5   3   1
In relative terms, we have the following table.
Grade    A     B     C     D     E
Female   0.14  0.43  0.33  0.05  0.05
Male     0.21  0.32  0.26  0.16  0.05
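The relative table above divides each row by that row's own total (21 females, 19 males), which is what makes the two groups comparable even though they differ in size. A small Python sketch of that computation, with the counts typed in from the first table:

```python
# Grade counts by gender from the first table above.
grades = ["A", "B", "C", "D", "E"]
counts = {
    "Female": [3, 9, 7, 1, 1],
    "Male":   [4, 6, 5, 3, 1],
}

# Divide each row by its own total so the groups become comparable.
row_rel = {group: [c / sum(row) for c in row] for group, row in counts.items()}

print("Grade  ", "  ".join(f"{g:>4}" for g in grades))
for group, props in row_rel.items():
    print(f"{group:<7}", "  ".join(f"{p:.2f}" for p in props))
```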
Example [Side-by-Side Bar Chart] (contd)

Information in the first (second) table may be displayed by the
chart in the left (right) panel of the following figure.
Q: What conclusion(s) can be drawn from the above figure?

Q: Does it matter which chart you base your conclusions on?
Source: Adapted from Chow et al. (2007, p. 7).
Graphing Distributions for Numerical Data

Absolute Frequency Histogram
Displays information contained in an absolute frequency table
using vertical bars with no gaps between bars.
The height of each bar gives the number of observations that
lie in the interval determined by the base of the bar.
Example
Class        Frequency
(10, 20]     3
(20, 30]     7
(30, 40]     4
(40, 50]     4
(50, 60]     2
Relative Frequency Histogram

Displays information in a relative frequency table by vertical
bars with no gaps between bars.
The area of each bar gives the fraction of observations that lie
in the interval determined by the base of the bar.
Example
Class        Rel Freq
(10, 20]     0.15
(20, 30]     0.35
(30, 40]     0.20
(40, 50]     0.20
(50, 60]     0.10
Q: What can you conclude from the above figure?
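Because the area of each bar (height times class width) must equal the relative frequency, the bar heights are the relative frequencies divided by the class width. The Python sketch below checks this for the table above and draws a crude text histogram; with equal class widths, as here, the heights are simply proportional to the relative frequencies, so the shape is unchanged. The text-plot scale is an arbitrary choice.

```python
# Class boundaries and relative frequencies from the table above.
edges = [10, 20, 30, 40, 50, 60]
rel_freq = [0.15, 0.35, 0.20, 0.20, 0.10]
widths = [edges[i + 1] - edges[i] for i in range(len(rel_freq))]

# Height = relative frequency / class width, so that area = relative frequency.
heights = [r / w for r, w in zip(rel_freq, widths)]
areas = [h * w for h, w in zip(heights, widths)]

for i, h in enumerate(heights):
    bar = "#" * round(h * 200)          # arbitrary scale for a text plot
    print(f"({edges[i]}, {edges[i + 1]}]  {h:.3f}  {bar}")

print("total area:", round(sum(areas), 10))
```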
Digression: Identifying Distribution Shapes
Cumulative Frequency Polygon

Displays a plot of cumulative frequency against upper class limit in
an expanded cumulative frequency table (as illustrated below).
Example
Class       Cum Freq (%)
(0, 10]       0
(10, 20]     15
(20, 30]     50
(30, 40]     70
(40, 50]     90
(50, 60]    100
Q: What useful statistic(s) can we deduce from such plots?
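One kind of statistic that can be read off a cumulative frequency polygon is an approximate percentile (for example the median or a quartile): find where the polygon reaches the desired cumulative percentage. The Python sketch below does this by linear interpolation between the plotted points, which assumes observations are spread evenly within each class. polygon_percentile is a hypothetical helper name, and its results are approximations that need not match quartiles computed from the raw data.

```python
# Points of the polygon: (upper class limit, cumulative %), from the table above.
upper = [10, 20, 30, 40, 50, 60]
cum_pct = [0, 15, 50, 70, 90, 100]

def polygon_percentile(p):
    """Approximate p-th percentile by linear interpolation on the polygon."""
    if not 0 <= p <= 100:
        raise ValueError("p must lie between 0 and 100")
    for i in range(1, len(upper)):
        if cum_pct[i] >= p:
            x0, x1 = upper[i - 1], upper[i]
            y0, y1 = cum_pct[i - 1], cum_pct[i]
            return x0 + (p - y0) * (x1 - x0) / (y1 - y0)

for p in (25, 50, 75):
    print(f"P{p} is about {polygon_percentile(p):.2f}")
```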
Digression: Quartiles
Let $x_1, x_2, \ldots, x_n$ denote a set of $n$ observations for our study.
Usually, the $x_i$'s are unordered.
For some applications, we need to work with the ordered values in the
dataset, i.e., with the $x_{(i)}$'s such that
$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}.$$
Define Q2, the second quartile of the $x_i$'s, by
$$Q_2 = \begin{cases} \tfrac{1}{2}\left(x_{(k)} + x_{(k+1)}\right), & \text{if } n = 2k,\\ x_{(k+1)}, & \text{if } n = 2k+1. \end{cases}$$
Note that Q2 is also referred to as the median of the $x_i$'s.
The first quartile, denoted Q1, may be defined as the median of the $x_i$
values less than or equal to Q2.
The third quartile, denoted Q3, may be defined as the median of the
$x_i$ values greater than or equal to Q2.
Example
For the following set of 5 observations
101.96, 109.76, 99.63, 99.76, 100.22
the corresponding ordered sample is
99.63, 99.76, 100.22, 101.96, 109.76.
Here,
Q1 = 99.76, Q2 = 100.22 and Q3 = 101.96.
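The quartile definitions above translate directly into code. The Python sketch below uses statistics.median from the standard library; statistical packages implement several other quartile conventions, so their results can differ slightly from this one.

```python
from statistics import median

def quartiles(xs):
    """Return (Q1, Q2, Q3) using the definitions above:
    Q2 is the median, Q1 the median of values <= Q2,
    and Q3 the median of values >= Q2."""
    ordered = sorted(xs)
    q2 = median(ordered)
    q1 = median([x for x in ordered if x <= q2])
    q3 = median([x for x in ordered if x >= q2])
    return q1, q2, q3

print(quartiles([101.96, 109.76, 99.63, 99.76, 100.22]))
```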
Stem and Leaf Diagram

A stem and leaf diagram (like the one shown below) is a graphical
display that shows the distribution of a set of numerical values.
From it, one can
sometimes recover the original data;
easily infer empirical percentiles;
obtain measures of central tendency and dispersion.
Example
1 | 67788899
2 | 0012257
3 | 28
4 | 2
Ordered data: 16, 17, ..., 38, 42.
The distribution is right-skewed.
Q1 = 18, Q2 = 20 and Q3 = 23.5.
Min = 16 and Max = 42.
Example [Stem and Leaf Display]

For the Cord Strength dataset
25 34 19 34 25 25 27 25 33 26
36 21 14 28 27 31 35 32 26 34
26 30 30 43 33 36 41 29 30 27
29 33 31 40 33 37 21 26 32 29
37 26 22 32 30 20 26 24 31 31
we obtain
1 | 4
1 | 9
2 | 01124
2 | 55556666667778999
3 | 000011112223333444
3 | 56677
4 | 013
Boxplots
We introduce the boxplot via a couple of examples.
Example [Boxplot]
Weekly television viewing times (in hours) of a sample of 20 people
are given below.
25 66 34 30 41 35 26 38 27 31
32 30 32 43 15 5 38 16 20 21
To obtain a boxplot, begin by finding the quartiles from the ordered data:
5 15 16 20 21 25 26 27 30 30
31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q2 = 30.5 and Q3 = 36.5.
Example [Boxplot] (contd)

Then, determine the following limits:
Lower Limit = Q1 - 1.5 × IQR = 2.75,
Upper Limit = Q3 + 1.5 × IQR = 56.75,
where IQR = 36.5 - 23 = 13.5. Finally, obtain 5 and 43 as the
adjacent values* and note that 66 is a potential outlier since it falls
outside the interval (2.75, 56.75).
* Adjacent values are the most extreme values that lie within the lower and
upper limits; they are the most extreme observations that are not potential
outliers (Weiss, 2012, p. 120).
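The boxplot steps (quartiles, limits, adjacent values, potential outliers) can be collected into one function. A Python sketch, reusing the quartile definition from earlier in these notes; the function name is hypothetical, and treating a value that lies exactly on a limit as not outlying is an assumption.

```python
from statistics import median

def quartiles(xs):
    """Q2 = median; Q1 / Q3 = medians of the values <= Q2 / >= Q2."""
    q2 = median(xs)
    return (median([x for x in xs if x <= q2]), q2,
            median([x for x in xs if x >= q2]))

def boxplot_summary(xs, k=1.5):
    """Quartiles, lower/upper limits, adjacent values and potential outliers."""
    q1, q2, q3 = quartiles(xs)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Values exactly on a limit are treated as not outlying (an assumption).
    inside = [x for x in xs if lower <= x <= upper]
    return {
        "quartiles": (q1, q2, q3),
        "limits": (lower, upper),
        "adjacent": (min(inside), max(inside)),
        "outliers": sorted(x for x in xs if x < lower or x > upper),
    }

tv_hours = [25, 66, 34, 30, 41, 35, 26, 38, 27, 31,
            32, 30, 32, 43, 15, 5, 38, 16, 20, 21]
print(boxplot_summary(tv_hours))
```

Applied to the runners and nonrunners data in the next example, the same function should reproduce the limits shown there, up to floating-point rounding.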
Example [Parallel Boxplots]

Measurements on skinfold thickness (in mm) for samples of
runners and nonrunners in the same age group are given below.
Runners          | Nonrunners
-----------------+-------------------------
7.3  6.7  8.7    | 24.0  19.9   7.5  18.4
3.0  5.1  8.8    | 28.0  29.4  20.3  19.0
7.8  3.8  6.2    |  9.3  18.1  22.8  24.2
5.4  6.4  6.3    |  9.6  19.4  16.3  16.3
3.7  7.5  4.6    | 12.4   5.2  12.2  15.6

Statistics
Group        5 Num Summary                   Limits            Adjacent Values   Potential Outliers
Runners      3.0, 4.85, 6.3, 7.4, 8.8        1.025, 11.225     3.0, 8.8          None
Nonrunners   5.2, 12.3, 18.25, 21.55, 29.4   -1.575, 35.425    5.2, 29.4         None
Example [Parallel Boxplots] (contd)
Q: What conclusions can you draw from the above figure?

Source: Adapted from Weiss (2012, pp. 121-122)
Summary Measures
Topics:
Location & Spread of a Distribution
Measures of Central Tendency
Measures of Dispersion
Summary Measures for Grouped Data
Learn how to measure the location and spread of the
distribution of raw data for a single numerical variable.
Learn how to obtain summary measures from grouped data.
Learn how to interpret and choose between the various
summary measures.
Learn the role played by robustness in the selection of a
summary measure.
Location & Spread of a Distribution
Measures of Central Tendency

Let $x_1, x_2, \ldots, x_n$ denote a set of $n$ observations with corresponding
ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$.
Some measures of central tendency are given below.
Mean
$$\text{mean} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}, \text{ say.}$$
Median
$$\text{median} = \begin{cases} \tfrac{1}{2}\left(x_{(k)} + x_{(k+1)}\right), & \text{if } n = 2k,\\ x_{(k+1)}, & \text{if } n = 2k+1. \end{cases}$$
Mode
mode = data value with the highest frequency.
Example
Consider dataset
101.96, 109.76, 99.63, 99.76, 100.22
with corresponding ordered values
99.63, 99.76, 100.22, 101.96, 109.76.
Here, the mean is
$$\bar{x} = \frac{101.96 + 109.76 + 99.63 + 99.76 + 100.22}{5} \approx 102.27$$
and
median = $x_{(3)}$ = 100.22.
Q: What about the mode?
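The three measures can be computed as in the Python sketch below. Python's statistics module also provides mode and multimode, but the small modes helper here is a hypothetical illustration that adopts one common convention: if no value repeats, the dataset is reported as having no mode.

```python
from collections import Counter
from statistics import mean, median

data = [101.96, 109.76, 99.63, 99.76, 100.22]

def modes(xs):
    """Values sharing the highest frequency; [] if no value repeats
    (treated here as 'no mode', one common convention)."""
    counts = Counter(xs)
    top = max(counts.values())
    return [] if top == 1 else sorted(v for v, c in counts.items() if c == top)

print(f"mean = {mean(data):.2f}, median = {median(data)}, mode(s) = {modes(data)}")
```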
Advantages & Disadvantages

Feature                        Mean   Median   Mode
Always Exists?                 Y      Y        N
Always Unique?                 Y      N        N
Not Affected by Outliers?      N      Y        Y
Further Analysis Potential?    Y      N        N
Note
Use a robust (i.e., resistant) measure of central tendency
when outlying values (assuming these are valid) are present.
The trimmed mean is an example of a robust measure of
location; see Exercise 3.54 on p. 101 of Weiss (2012) for a
specific illustration.
Q: What about the mean and median?
Example [Robustness]
The mean is not robust since it is affected by outlying (extreme)
observations.
> set.seed(2012)
> x <- rnorm(50, 10, 1)
> mean(x)
[1] 10.03585
> median(x)
[1] 10.09504
Note that I've decided to stop using R for this course. You may ignore the R
code that you see in this and the next three examples.
Example [Robustness] (contd)

> x <- sort(x)
> x[50] <- 30
> mean(x)
[1] 10.37307
> median(x)
[1] 10.09504
The median is not affected by extreme observations and hence it is
a robust measure of central tendency.
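A Python analogue of the R demonstration above is sketched below. It follows the same steps (50 draws from a normal distribution with mean 10 and standard deviation 1, then the largest value replaced by 30), but Python's random number generator differs from R's, so the printed numbers will not match the R output.

```python
import random
from statistics import mean, median

rng = random.Random(2012)                        # seeded for repeatability
x = sorted(rng.gauss(10, 1) for _ in range(50))  # 50 draws from N(10, 1)
before = (mean(x), median(x))

x[-1] = 30                                       # largest value -> extreme value
after = (mean(x), median(x))

print(f"mean:   {before[0]:.5f} -> {after[0]:.5f}")
print(f"median: {before[1]:.5f} -> {after[1]:.5f}")
```

The median is unchanged because it depends only on the two middle order statistics, while the mean shifts by (30 minus the old maximum) divided by 50.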
Relative Magnitude of Location Measures

Example
> table(x)
x
 1  2  3  4  5  6  7
 4  7 23 32 23  7  4
> mean(x)
[1] 4
> median(x)
[1] 4
The above example illustrates the case when
mean = median = mode.
In the next example, we have

mean < median = mode.
Example
> table(x)
x
1 2 3 4 5 6 7
2 4 7 12 15 33 27
> mean(x)
[1] 5.41
> median(x)
[1] 6
It is also possible that

mean > median = mode.
Example
> table(x)
x
 1  2  3  4  5  6  7
27 33 15 12  7  4  2
> mean(x)
[1] 2.59
> median(x)
[1] 2
Q: What is the practical significance of these examples?
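The mean and median in these three examples can be obtained directly from the frequency tables, without expanding them into 100 individual observations. A Python sketch; freq_mean_median is a hypothetical helper, and it assumes the values are listed in increasing order.

```python
def freq_mean_median(values, counts):
    """Mean and median of data given as a frequency table.
    `values` must be in increasing order; `counts[i]` is the frequency of values[i]."""
    n = sum(counts)
    mean = sum(v * c for v, c in zip(values, counts)) / n

    def order_stat(k):
        """k-th smallest observation (1-based), walking the cumulative counts."""
        cum = 0
        for v, c in zip(values, counts):
            cum += c
            if cum >= k:
                return v
        raise IndexError(k)

    if n % 2 == 0:
        median = (order_stat(n // 2) + order_stat(n // 2 + 1)) / 2
    else:
        median = order_stat(n // 2 + 1)
    return mean, median

values = [1, 2, 3, 4, 5, 6, 7]
for counts in ([4, 7, 23, 32, 23, 7, 4],
               [2, 4, 7, 12, 15, 33, 27],
               [27, 33, 15, 12, 7, 4, 2]):
    m, med = freq_mean_median(values, counts)
    mode = values[counts.index(max(counts))]
    print(f"mean = {m:.2f}, median = {med}, mode = {mode}")
```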
Example [Mean vs Median]

The ordered sample and stem and leaf display for some data on
arterial blood pressure are given below.
69.4 75.2 78.9 81.6 82.0 82.8 84.1 84.6
86.7 87.6 88.9 90.8 91.0 94.0 96.4 104.9
 6 | 9
 7 | 59
 8 | 22345789
 9 | 1146
10 | 5
Here,
$\bar{x}$ = 86.18 and median = 85.65.
Q: Which measure do you recommend for the data at hand?
Measures of Dispersion
Some measures of dispersion are given below.
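Range
$$\text{range} = x_{(n)} - x_{(1)}$$
Interquartile Range
IQR = Third Quartile - First Quartile
Variance
$$\text{variance} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Standard Deviation
$$\text{standard deviation} = \sqrt{\frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)}$$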
Example
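Consider the (ordered) dataset
99.63, 99.76, 100.22, 101.96, 109.76.
Here,
range = 109.76 - 99.63 = 10.13
and
IQR = 101.96 - 99.76 = 2.2.
Furthermore, using the unrounded mean $\bar{x}$ = 102.266,
$$\text{variance} = \frac{99.63^2 + \cdots + 109.76^2 - 5\,(102.266)^2}{5-1} \approx 18.42$$
and
$$\text{standard deviation} = \sqrt{18.42} \approx 4.29.$$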
A relative measure of dispersion is
$$\text{coefficient of variation} = \frac{\text{standard deviation}}{\text{mean}}.$$
Example
For the data in the previous example,
$$\text{coefficient of variation} = \frac{4.29}{102.27} \approx 0.04.$$
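The dispersion measures in the last two examples can be checked with the Python sketch below. It computes the variance both from the definition and from the shortcut formula; the shortcut is sensitive to rounding of the mean (substituting 102.27 for the unrounded 102.266 changes the result to about 17.39), so the mean is kept at full precision.

```python
from math import sqrt
from statistics import median, variance

data = [99.63, 99.76, 100.22, 101.96, 109.76]
n = len(data)
xbar = sum(data) / n

data_range = max(data) - min(data)

# IQR with the quartile definition used in these notes.
q2 = median(data)
iqr = median([x for x in data if x >= q2]) - median([x for x in data if x <= q2])

# Variance from the definition and from the shortcut formula (both use n - 1).
var_def = sum((x - xbar) ** 2 for x in data) / (n - 1)
var_short = (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

sd = sqrt(var_def)
cv = sd / xbar

print(f"range = {data_range:.2f}, IQR = {iqr:.2f}")
print(f"variance = {var_def:.2f} (shortcut: {var_short:.2f}, "
      f"statistics.variance: {variance(data):.2f})")
print(f"sd = {sd:.2f}, CV = {cv:.3f}")
```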
Advantages & Disadvantages

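Feature                      R   V   SD   IQR   CV
Always Exists?               Y   Y   Y    Y     Y
Always Unique?               Y   N   N    N     N
Not Affected by Outliers?    N   N   N    Y     N
Absolute Measure?            Y   Y   Y    Y     N
Same Units?                  Y   N   Y    Y     N
(R = range, V = variance, SD = standard deviation, IQR = interquartile range, CV = coefficient of variation)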
Example [Comparing Stock Performance]

Following are annual logarithmic returns of Microsoft (MSFT) and
Hewlett-Packard (HWP) for the period spanning 1995-1999.
     |  1995    1996    1997    1998    1999
-----+----------------------------------------
MSFT | 0.3644  0.6622  0.5026  0.7648  0.5290
HWP  | 0.5014  0.1836  0.2156  0.1864  0.4921
Some summary statistics for the returns are as follows:
             |  MSFT    HWP
-------------+-----------------
Mean         | 0.5646  0.3158
Std Dev      | 0.1539  0.1657
Median       | 0.5290  0.2156
IQR          | 0.1596  0.3057
Coef of Var  | 0.2727  0.5246
Q: Which of the two stocks performed better over 1995-1999?
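The summary statistics in the table can be reproduced with the Python sketch below, using statistics.stdev (which, like the variance formula given earlier, uses the n - 1 divisor) and the quartile definition from these notes for the IQR. It only recomputes the table; deciding which stock performed better still depends on how return and risk are weighed.

```python
from statistics import mean, median, stdev

returns = {
    "MSFT": [0.3644, 0.6622, 0.5026, 0.7648, 0.5290],
    "HWP":  [0.5014, 0.1836, 0.2156, 0.1864, 0.4921],
}

def iqr(xs):
    """IQR with the quartile definition used in these notes."""
    q2 = median(xs)
    return median([x for x in xs if x >= q2]) - median([x for x in xs if x <= q2])

for ticker, r in returns.items():
    m, s = mean(r), stdev(r)          # stdev uses the n - 1 divisor
    print(f"{ticker}: mean={m:.4f} sd={s:.4f} median={median(r):.4f} "
          f"IQR={iqr(r):.4f} CV={s / m:.4f}")
```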
Mean & Variance for Grouped Data

Grouped data refers to data in a frequency distribution.
Example
Class       | Freq.   Percent     Cum.
------------+------------------------------
(10,15]     |     1      2.00      2.00
(15,20]     |     2      4.00      6.00
(20,25]     |     8     16.00     22.00
(25,30]     |    17     34.00     56.00
(30,35]     |    15     30.00     86.00
(35,40]     |     5     10.00     96.00
(40,45]     |     2      4.00    100.00
------------+------------------------------
Information in the first and any one of the remaining three
columns of the above table constitutes grouped data.
Let
$m_i$ = mid-point of the i-th class,
$n_i$ = frequency of the i-th class,
$k$ = number of classes,
$n$ = total frequency.
The grouped data mean is
$$\bar{x}_g = \sum_{i=1}^{k} m_i \frac{n_i}{n} = \frac{1}{n}\sum_{i=1}^{k} m_i n_i.$$
The grouped data variance is
$$s_g^2 = \sum_{i=1}^{k} m_i^2 \frac{n_i}{n} - \bar{x}_g^2 = \frac{1}{n}\sum_{i=1}^{k} m_i^2 n_i - \bar{x}_g^2.$$
Example
For the grouped data given earlier, we have
Class     |  ni    mi    mi*ni    mi^2*ni
----------+-------------------------------------
(10,15]   |   1   12.5    12.5      156.25
(15,20]   |   2   17.5    35.0      612.50
(20,25]   |   8   22.5   180.0     4050.00
(25,30]   |  17   27.5   467.5    12856.25
(30,35]   |  15   32.5   487.5    15843.75
(35,40]   |   5   37.5   187.5     7031.25
(40,45]   |   2   42.5    85.0     3612.50
----------+-------------------------------------
Total     |  50         1455.0    44162.50
Hence,
$$\bar{x}_g = \frac{1455.0}{50} = 29.1 \quad\text{and}\quad s_g^2 = \frac{44162.50}{50} - 29.1^2 = 36.44.$$
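The grouped-data calculation can be sketched in Python as follows. It builds the midpoints $m_i$ from the class limits and applies the formulas above; note that these grouped-data formulas divide by n, not n - 1.

```python
# Class limits and frequencies from the grouped-data table.
classes = [(10, 15), (15, 20), (20, 25), (25, 30), (30, 35), (35, 40), (40, 45)]
freqs = [1, 2, 8, 17, 15, 5, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]            # m_i
n = sum(freqs)

sum_mn = sum(m * f for m, f in zip(mids, freqs))        # sum of m_i * n_i
sum_m2n = sum(m * m * f for m, f in zip(mids, freqs))   # sum of m_i^2 * n_i

xbar_g = sum_mn / n
s2_g = sum_m2n / n - xbar_g ** 2    # divisor n, as in the formula above

print(n, sum_mn, sum_m2n, round(xbar_g, 2), round(s2_g, 2))
```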
Miscellany
Topics:
Summation Notation
Classification of Statistical Studies
Questions for Class Discussion
Review the notation used for summation.
Learn about different types of statistical studies.
Summation Notation
Given numerical values $x_1, \ldots, x_n$, we have:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
$$\sum_{i=1}^{n} (a x_i + b) = (a x_1 + b) + \cdots + (a x_n + b) = a\sum_{i=1}^{n} x_i + nb$$
Example
If the $x_i$'s are given by 1.75, 2.25, 2.25, 2.25, 1.75, 2.00, 1.50, we have
$$\sum_{i=1}^{7} x_i = 13.75 \quad\text{and}\quad \sum_{i=1}^{7} x_i^2 = 1.75^2 + \cdots + 1.50^2 = 27.5625.$$
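The example sums, and the rule for sums of the form $a x_i + b$, can be checked numerically with the Python sketch below. The constants a = 2 and b = 3 are arbitrary illustrative choices; the data are multiples of 0.25, so the floating-point sums here are exact.

```python
xs = [1.75, 2.25, 2.25, 2.25, 1.75, 2.00, 1.50]
n = len(xs)

total = sum(xs)                      # sum of x_i
total_sq = sum(x ** 2 for x in xs)   # sum of x_i^2 (not the square of the sum)

# Check the linearity rule: sum(a*x_i + b) == a*sum(x_i) + n*b.
a, b = 2.0, 3.0                      # arbitrary illustrative constants
lhs = sum(a * x + b for x in xs)
rhs = a * total + n * b

print(total, total_sq, lhs, rhs)
```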
Classification of Statistical Studies

Observational Study
Observed relationships and other inferences apply only to
the study subjects (or objects) under investigation.
No control of extraneous sources of variation.
Example [Vasectomies & Prostate Cancer]
A study found an association between vasectomy and prostate
cancer: an elevated risk after vasectomy.
There is no information that the study was based on a properly chosen
sample or a properly designed experiment.
We can neither infer causation nor generalize the observed association.
Inferential Study
The study is based on a properly chosen sample (e.g., random
sample).
Inferences made from sample information may be generalized
to a larger population.
Example [Testing Baseballs]
An independent testing company investigated the liveliness of 85
randomly selected Rawlings baseballs from the 1977 supplies of
major league teams.
The Rawlings baseball was found to be more lively than the 1976
Spalding baseball.
Designed Experiments
A proper randomization technique is used to allocate subjects
(or objects) to treatment and control groups.
Relevant sources of extraneous variation are controlled.
Example [Folic Acid & Birth Defects]
Prior to conception, 4753 women were randomly divided into two
groups. One group took daily doses of folic acid while the other
took only trace elements.
The incidence of major birth defects was much lower for the group
taking folic acid.
Here, we can infer the presence of a causal relationship.
Questions for Class Discussion

Question 1
A stem-and-leaf display of daily protein intake (in grams) for a
sample of 51 female vegetarians is shown below.
The decimal point is 1 digit(s) to the right of the |
0
1
2
3
4
5
6
7
8
|
|
|
|
|
|
|
|
|
1259
34558
01889
013566688899
001235567
002234467899
88
05
Question 1 (contd)
A similar display for a sample of 53 female nonvegetarians is given
below.
The decimal point is 1 digit(s) to the right of the |
0 | 5
1 | 14
2 | 34557
3 | 4567779
4 | 0112444569
5 | 0003345577
6 | 0113334799
7 | 1157
8 | 1444
Question 1 (contd)
(a) The quartiles for both groups of females are partially given in
the following table. Fill in the missing entries in the table.
Group           1st Quartile   2nd Quartile   3rd Quartile
Vegetarian                     39
Nonvegetarian   38                            63
Table: Quartiles of Vegetarian and Nonvegetarian Females
(b) Based on the information in the completed table, compare the
location and spread of the two sets of data.
(c) Identify potential outliers, if any, for each dataset. Do you
obtain results that are consistent with what you observe in the
stem-and-leaf displays?
Question 2
(a) Which of the following is not a property of the coefficient of
variation?
(i) It is not always unique.
(ii) It is resistant to outliers.
(iii) It is a relative measure.
(iv) It is not in the same units as the original data.
(b) The (arithmetic) mean computed from raw data is always
unique. The same is true of the mean computed from
grouped data. True or False?
(c) The sample mid-range is a robust measure of location. True
or False?
Question 3
Suppose you obtain the following five-number summaries from
data on annual (percentage) returns for common stock and
government bonds over a fifteen-year period.
Investment: Bonds
[1] -10.460   1.035   4.600  14.080  42.980
Investment: Stocks
[1] -25.930  -0.495  10.710  23.760  44.770
(a) What types of statistics do the numbers in each summary
represent?
Question 3 (contd)
(b) One of the values given in the five number summary for the
bond returns looks unusual. Is it a potential outlier?
(c) Of the two financial instruments, which is preferred if your
primary investment objective is to choose the one that gives
you the greater level of return on average?
(d) Which is preferred if risk aversion is the key factor influencing
your choice of investment to make?
(e) Is there anything wrong with the following statement?
Under appropriate conditions, the coefficient of variation is a
useful measure to consider when making risk-reward trade-offs
amongst several investment alternatives.
Question 4
Consider the following absolute frequency distribution obtained
from data on distance (in miles) travelled to work for a random
sample of 50 workers.
Classes   | (10,20] (20,30] (30,40] (40,50]
----------+-----------------------------------
Frequency |     3      19      23       5
(a) Determine the grouped data variance using information
provided by the above empirical distribution.
(b) Determine one other grouped data measure of dispersion.
Acknowledgements
The current slides are based in part on material from:

Introductory Statistics (9th Edition) by Neil A. Weiss.
Introductory Statistics (2nd Edition) by H. K. Chow, A.
Ghosh, D. H. Y. Leung and Y. K. Tse.
The slides were produced using the Beamer class package and
MiKTeX (a freely available document preparation system).
Customized computations and graphics were produced using R (a
freely available statistical software package).
I am grateful to the developers of the above resources for making
them available.
