
Chapter 2

Methods for Describing Sets of Data
Business Statistics
[Figure: bar chart titled "Our market share far exceeds all competitors!" comparing Us, Y, and X on a vertical axis running from 30% to 36%]
Data Presentation
  Qualitative Data: Summary Table, Bar Graph, Pie Chart, Pareto Diagram
  Quantitative Data: Stem-&-Leaf Display, Frequency Distribution, Histogram
Presenting
Qualitative Data
Summary Table
1. Lists categories & number of elements in category
2. Obtained by tallying responses in category
3. May show frequencies (counts), % or both
Each row is a category; the count is a tally of the responses in that category.

Major        Count
Accounting   130
Economics    20
Management   50
Total        200
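As a rough sketch of the tallying step, the table can be rebuilt from raw responses. The `responses` list here is hypothetical, constructed only so that its counts match the table; the percentage column is added because the slide notes that a summary table may show counts, %, or both.

```python
from collections import Counter

# Hypothetical raw responses, built to match the counts in the summary table.
responses = ["Accounting"] * 130 + ["Economics"] * 20 + ["Management"] * 50

counts = Counter(responses)      # tally the responses in each category
total = sum(counts.values())

# Map each category to (count, percent of total).
summary = {major: (count, 100 * count / total) for major, count in counts.items()}

for major, (count, pct) in summary.items():
    print(f"{major:<12}{count:>6}{pct:>7.0f}%")
print(f"{'Total':<12}{total:>6}")
```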
Bar Graph
[Figure: bar graph of frequency (0 to 150) by major: Acct., Econ., Mgmt.]
Vertical bars for qualitative variables
Bar height shows frequency or %
Zero point
Percent used also
Equal bar widths
Pie Chart
1. Shows breakdown of total quantity into categories
2. Useful for showing relative differences
3. Angle size = (360°)(percent)
[Figure: pie chart of majors: Acct. 65%, Mgmt. 25%, Econ. 10%. The Econ. slice spans (360°)(10%) = 36°]
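The angle rule can be checked with a few lines of Python. This is only an illustrative sketch; the dictionary name `percents` is an assumption, and the values are the percentages of majors shown on the pie chart.

```python
# Percentages of majors from the pie chart.
percents = {"Acct.": 65, "Mgmt.": 25, "Econ.": 10}

# Angle size = (360 degrees)(percent)
angles = {major: 360 * pct / 100 for major, pct in percents.items()}

for major, angle in angles.items():
    print(f"{major:<6} {angle:5.1f} degrees")
```

The slice angles add up to the full 360 degrees, which is a quick sanity check on the percentages.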
Pareto Diagram
Like a bar graph, but with the categories arranged by height in descending
order from left to right.
[Figure: Pareto diagram of frequency (0 to 150) by major, in descending order: Acct., Mgmt., Econ.]
Vertical bars for qualitative variables
Bar height shows frequency or %
Zero point
Percent used also
Equal bar widths
Thinking Challenge
You're an analyst for IRI. You want to show the market
shares held by Web browsers in 2006. Construct a bar
graph, pie chart, & Pareto diagram to describe the data.

Browser             Mkt. Share (%)
Firefox             14
Internet Explorer   81
Safari              4
Others              1
Bar Graph Solution
[Figure: bar graph of Market Share (%), 0% to 100%, by Browser: Firefox, Internet Explorer, Safari, Others]

Pie Chart Solution
[Figure: pie chart of Market Share: Internet Explorer 81%, Firefox 14%, Safari 4%, Others 1%]

Pareto Diagram Solution
[Figure: Pareto diagram of Market Share (%), 0% to 100%, by Browser in descending order: Internet Explorer, Firefox, Safari, Others]
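The only difference between the bar graph and the Pareto diagram is the ordering of the categories. A minimal sketch of that ordering step follows; the text bars are a stand-in for a real chart, and the variable names are assumptions.

```python
# Browser market shares (%) from the Thinking Challenge.
shares = {"Firefox": 14, "Internet Explorer": 81, "Safari": 4, "Others": 1}

# Pareto ordering: categories arranged by height, descending, left to right.
pareto_order = sorted(shares.items(), key=lambda item: item[1], reverse=True)

for browser, share in pareto_order:
    bar = "#" * (share // 2)     # crude text bar, one '#' per 2 percentage points
    print(f"{browser:<18}{share:>3}% {bar}")
```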
Presenting
Quantitative Data
Stem-and-Leaf Display

1. Divide each observation into stem value and leaf value
   Stem value defines class
   Leaf value defines frequency (count)
2. Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Example: 26 has stem 2 and leaf 6.

2 | 144677
3 | 028
4 | 1
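A short sketch of how the display can be built, assuming two-digit data where the tens digit is the stem and the units digit is the leaf (as in the example above):

```python
from collections import defaultdict

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

# One row per stem; the number of leaves in a row is that class's frequency.
display = {stem: "".join(str(leaf) for leaf in leaves) for stem, leaves in stems.items()}

for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```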
Frequency Distribution Table Steps
1. Determine range
2. Select number of classes
Usually between 5 & 15 inclusive
3. Compute class intervals (width)
4. Determine class boundaries (limits)
5. Compute class midpoints
6. Count observations & assign to classes
Determine the range
  Range (R) = highest value − lowest value
Number of classes
  \( C = 1 + \frac{10}{3} \log_{10} N \)  (N = number of observations)
Class Interval
  \( CI = R / C \)  (rounded)
Class Limits/Boundaries
  Lowest limit ≤ lowest value
  Highest limit ≥ highest value
Class Midpoint
  \( CM = (\text{lower limit} + \text{upper limit}) / 2 \)
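The formulas can be tried on a small data set. This sketch borrows the ten observations from the stem-and-leaf slide and assumes the logarithm is base 10; how C and CI are rounded is left to the analyst, so the code prints the unrounded values. The helper `class_midpoint` is a hypothetical name.

```python
import math

data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]

n = len(data)
data_range = max(data) - min(data)             # R = highest value - lowest value
num_classes = 1 + (10 / 3) * math.log10(n)     # C = 1 + 10/3 x log N (base 10 assumed)
class_interval = data_range / num_classes      # CI = R / C, before rounding


def class_midpoint(lower_limit, upper_limit):
    """CM = (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


print(f"R = {data_range}, C = {num_classes:.2f}, CI = {class_interval:.2f}")
print(f"Midpoint of a class with limits 18 and 23: {class_midpoint(18, 23)}")
```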




Histogram
[Figure: histogram. Vertical axis: count (0 to 5); it may also show frequency, relative frequency, or percent. Horizontal axis: class lower boundaries 0, 15.5, 25.5, 35.5, 45.5, 55.5. Bars touch.]

Class          Freq.
15.5 – 25.5    3
25.5 – 35.5    5
35.5 – 45.5    2
Raw Data:
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
20, 18, 42, 25, 57, 26, 35, 29, 34, 40
33, 21, 56, 45, 51, 23, 36, 54, 20, 19
Construct a frequency distribution table.
Relative Frequency Distribution

Class      Frequency    %
18 – 23    7            23
24 – 29    8            27
30 – 35    5            17
36 – 41    4            13
42 – 47    2            7
48 – 53    1            3
54 – 59    3            10
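The table can be reproduced from the 30 raw observations. This is a sketch that takes the class limits from the table as given (first class starts at 18, each class covers 6 integer values, 7 classes) rather than deriving them.

```python
data = [
    24, 26, 24, 21, 27, 27, 30, 41, 32, 38,
    20, 18, 42, 25, 57, 26, 35, 29, 34, 40,
    33, 21, 56, 45, 51, 23, 36, 54, 20, 19,
]

# Class layout taken from the table: 18-23, 24-29, ..., 54-59.
lowest_limit, width, n_classes = 18, 6, 7

counts = [0] * n_classes
for value in data:
    counts[(value - lowest_limit) // width] += 1   # assign observation to its class

# Relative frequency as a whole-number percentage.
percents = [round(100 * count / len(data)) for count in counts]

for k, (count, pct) in enumerate(zip(counts, percents)):
    lower = lowest_limit + k * width
    print(f"{lower} - {lower + width - 1}: {count:>2} {pct:>3}%")
```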
Numerical Data Properties
Standard Notation

Measure              Sample           Population
Mean                 \( \bar{X} \)    \( \mu \)
Standard Deviation   \( S \)          \( \sigma \)
Variance             \( S^2 \)        \( \sigma^2 \)
Size                 \( n \)          \( N \)
Numerical Data Properties
Central Tendency (Location)
Variation (Dispersion)
Shape
Numerical Data Properties
  Central Tendency: Mean, Median, Mode
  Variation: Range, Interquartile Range, Variance, Standard Deviation
  Relative Standing: Percentiles, Z-scores
Central Tendency
Mean
1. Measure of central tendency
2. Most common measure
3. Acts as balance point
4. Affected by extreme values (outliers)
5. Formula (sample mean)
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
Mean Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + X_4 + X_5 + X_6}{6} = \frac{10.3 + 4.9 + 8.9 + 11.7 + 6.3 + 7.7}{6} = 8.30 \]
Median
1. Measure of central tendency
2. Middle value in ordered sequence
If n is odd, middle value of sequence
If n is even, average of 2 middle values
3. Position of median in sequence
\[ \text{Positioning Point} = \frac{n + 1}{2} \]
4. Not affected by extreme values
Median Example (Odd-sized Sample)
Raw Data: 24.1 22.6 21.5 23.7 22.6
Ordered: 21.5 22.6 22.6 23.7 24.1
Position: 1 2 3 4 5
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3.0 \]
\[ \text{Median} = 22.6 \]
Median Example (Even-sized Sample)
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5 \]
\[ \text{Median} = \frac{7.7 + 8.9}{2} = 8.30 \]
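Both median examples follow the same positioning-point rule, which can be sketched as a small function. The name `median_by_position` is hypothetical; Python's built-in `statistics.median` gives the same values.

```python
def median_by_position(values):
    """Return (positioning point, median) using the (n + 1) / 2 rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2                 # 1-based position in the ordered data
    if n % 2 == 1:
        # Odd n: the positioning point is a whole number; take that value.
        return position, ordered[int(position) - 1]
    # Even n: the point falls halfway between two positions; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return position, (lower + upper) / 2


odd_position, odd_median = median_by_position([24.1, 22.6, 21.5, 23.7, 22.6])
even_position, even_median = median_by_position([10.3, 4.9, 8.9, 11.7, 6.3, 7.7])

print(odd_position, odd_median)
print(even_position, round(even_median, 2))
```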
Mode

1. Measure of central tendency
2. Value that occurs most often
3. Not affected by extreme values
4. May be no mode or several modes
5. May be used for quantitative or qualitative
data
Mode Example

No Mode
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
One Mode
Raw Data: 6.3 4.9 8.9 6.3 4.9 4.9
More Than 1 Mode
Raw Data: 21 28 28 41 43 43
Thinking Challenge

You're a financial analyst for Prudential-Bache
Securities. You have collected the following
closing stock prices of new stock issues: 17, 16,
21, 18, 13, 16, 12, 11.
Describe the stock prices
in terms of central tendency.
Mean
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_8}{8} = \frac{17 + 16 + 21 + 18 + 13 + 16 + 12 + 11}{8} = 15.5 \]
Median
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ \text{Positioning Point} = \frac{n + 1}{2} = \frac{8 + 1}{2} = 4.5 \]
\[ \text{Median} = \frac{16 + 16}{2} = 16 \]
Mode

Raw Data: 17 16 21 18 13 16 12 11

Mode = 16
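The three answers to the Thinking Challenge can be cross-checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original solution; `multimode` (Python 3.8+) is included to show the "more than one mode" case from the earlier Mode Example.

```python
import statistics

prices = [17, 16, 21, 18, 13, 16, 12, 11]

mean = statistics.mean(prices)       # balance point
median = statistics.median(prices)   # middle value when ordered
mode = statistics.mode(prices)       # most frequent value

# A data set can have several modes; multimode returns all of them.
two_modes = statistics.multimode([21, 28, 28, 41, 43, 43])

print(mean, median, mode, two_modes)
```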

Summary of Central Tendency Measures

Measure   Formula                              Description
Mean      \( \sum X_i / n \)                   Balance point
Median    Position \( (n + 1) / 2 \)           Middle value when ordered
Mode      none                                 Most frequent
Variation
Range
1. Measure of dispersion
2. Difference between largest & smallest observations
\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]
3. Ignores how data are distributed
[Figure: two dot plots on a scale from 7 to 10 with differently distributed points; each has Range = 10 − 7 = 3]
Variance & Standard Deviation
1. Measures of dispersion
2. Most common measures
3. Consider how data are distributed
4. Show variation about mean (\( \bar{X} \) or \( \mu \))
[Figure: dot plot on a scale from 4 to 12 with the mean marked at \( \bar{X} = 8.3 \)]
Business Statistics
n - 1 in denominator!
(Use N if Population
Variance)
Sampel
Variance
Formula
X X X X X X
n
n 1
2
2
2 2
1
=
+ + +

( ) ( ) ( )

=
S
X X
n
i
i
n
2
2
1
1
=


=

( )
Sample Standard Deviation Formula
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}} \]
Variance Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 8.3 \]
\[ S^2 = \frac{(10.3 - 8.3)^2 + (4.9 - 8.3)^2 + \cdots + (7.7 - 8.3)^2}{6 - 1} = 6.368 \]
Thinking Challenge
You're a financial analyst for Prudential-Bache Securities. You have
collected the following closing stock prices of new stock issues:
17, 16, 21, 18, 13, 16, 12, 11.
What are the variance and standard deviation of the stock prices?
Variation Solution
Raw Data: 17 16 21 18 13 16 12 11
\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{where } \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = 15.5 \]
\[ S^2 = \frac{(17 - 15.5)^2 + (16 - 15.5)^2 + \cdots + (11 - 15.5)^2}{8 - 1} = 11.14 \]
Sample Standard Deviation
\[ S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{11.14} = 3.34 \]
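The variance and standard deviation for the stock prices can be recomputed step by step. This sketch follows the sample formulas above (n − 1 in the denominator); the variable names are assumptions.

```python
import math

prices = [17, 16, 21, 18, 13, 16, 12, 11]

n = len(prices)
mean = sum(prices) / n                                    # sample mean, 15.5

# Sum of squared deviations about the sample mean.
sum_squared_dev = sum((x - mean) ** 2 for x in prices)

variance = sum_squared_dev / (n - 1)                      # sample variance: divide by n - 1
std_dev = math.sqrt(variance)                             # sample standard deviation

print(f"S^2 = {variance:.2f}, S = {std_dev:.2f}")
```

Dividing by `n` instead of `n - 1` would give the population variance, which is the distinction the formula slide calls out.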

Summary of Variation Measures

Measure                           Formula                                                  Description
Range                             \( X_{\text{largest}} - X_{\text{smallest}} \)           Total spread
Standard Deviation (Sample)       \( \sqrt{\sum (X_i - \bar{X})^2 / (n - 1)} \)            Dispersion about sample mean
Standard Deviation (Population)   \( \sqrt{\sum (X_i - \mu)^2 / N} \)                      Dispersion about population mean
Variance (Sample)                 \( \sum (X_i - \bar{X})^2 / (n - 1) \)                   Squared dispersion about sample mean
Interpreting Standard Deviation
Interpreting Standard Deviation: Chebyshev's Theorem
(Applies to any shape data set)
No useful information about the fraction of data in the
interval \( \bar{x} - s \) to \( \bar{x} + s \)
At least 3/4 of the data lies in the interval
\( \bar{x} - 2s \) to \( \bar{x} + 2s \)
At least 8/9 of the data lies in the interval
\( \bar{x} - 3s \) to \( \bar{x} + 3s \)
In general, for k > 1, at least \( 1 - 1/k^2 \) of the data lies in the
interval \( \bar{x} - ks \) to \( \bar{x} + ks \)
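Interpreting Standard Deviation: Chebyshev's Theorem
[Figure: number line marked at \( \bar{x} - 3s \), \( \bar{x} - 2s \), \( \bar{x} - s \), \( \bar{x} \), \( \bar{x} + s \), \( \bar{x} + 2s \), \( \bar{x} + 3s \). Within 1s: no useful information. Within 2s: at least 3/4 of the data. Within 3s: at least 8/9 of the data.]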
Chebyshev's Theorem Example
Previously we found the mean closing stock price of new stock
issues is 15.5 and the standard deviation is 3.34.
Use this information to form an interval that will contain at least
75% of the closing stock prices of new stock issues.
At least 75% of the closing stock prices of new stock
issues will lie within 2 standard deviations of the mean.

\( \bar{x} = 15.5 \), \( s = 3.34 \)
\( (\bar{x} - 2s, \bar{x} + 2s) = (15.5 - 2 \cdot 3.34, 15.5 + 2 \cdot 3.34) = (8.82, 22.18) \)
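The same interval can be computed for any k > 1. This is a minimal sketch; the function name `chebyshev_interval` is hypothetical, and the bound it returns is the guaranteed minimum fraction, not an estimate of the actual fraction.

```python
def chebyshev_interval(mean, s, k):
    """Return (minimum fraction of data, (low, high)) for mean +/- k*s, with k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem gives no useful information for k <= 1")
    min_fraction = 1 - 1 / k ** 2
    return min_fraction, (mean - k * s, mean + k * s)


min_fraction, (low, high) = chebyshev_interval(15.5, 3.34, 2)
print(f"At least {min_fraction:.0%} of the data lies in ({low:.2f}, {high:.2f})")
```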
Interpreting Standard Deviation: Empirical Rule
Applies to data sets that are mound shaped and
symmetric
Approximately 68% of the measurements lie in the
interval \( \mu - \sigma \) to \( \mu + \sigma \)
Approximately 95% of the measurements lie in the
interval \( \mu - 2\sigma \) to \( \mu + 2\sigma \)
Approximately 99.7% of the measurements lie in the
interval \( \mu - 3\sigma \) to \( \mu + 3\sigma \)
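Interpreting Standard Deviation: Empirical Rule
[Figure: mound-shaped curve over a number line marked at \( \mu - 3\sigma \), \( \mu - 2\sigma \), \( \mu - \sigma \), \( \mu \), \( \mu + \sigma \), \( \mu + 2\sigma \), \( \mu + 3\sigma \). Within 1σ: approximately 68% of the measurements. Within 2σ: approximately 95%. Within 3σ: approximately 99.7%.]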
Empirical Rule Example
Previously we found the mean closing stock price of new stock
issues is 15.5 and the standard deviation is 3.34. If we can assume
the data is symmetric and mound shaped, calculate the percentage of
the data that lie within the intervals
\( \bar{x} \pm s \), \( \bar{x} \pm 2s \), \( \bar{x} \pm 3s \).
Empirical Rule Example
According to the Empirical Rule, approximately 68%
of the data will lie in the interval \( (\bar{x} - s, \bar{x} + s) \),
(15.5 − 3.34, 15.5 + 3.34) = (12.16, 18.84)

Approximately 95% of the data will lie in the interval
\( (\bar{x} - 2s, \bar{x} + 2s) \),
(15.5 − 2·3.34, 15.5 + 2·3.34) = (8.82, 22.18)

Approximately 99.7% of the data will lie in the interval
\( (\bar{x} - 3s, \bar{x} + 3s) \),
(15.5 − 3·3.34, 15.5 + 3·3.34) = (5.48, 25.52)
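The three intervals come from the same pattern, so they can be generated in a loop. A small sketch, with the Empirical Rule percentages hard-coded as given on the slide and the assumption (stated in the example) that the data are mound shaped and symmetric:

```python
mean, s = 15.5, 3.34

# Approximate coverage under the Empirical Rule for 1, 2, and 3 standard deviations.
coverage = {1: 68.0, 2: 95.0, 3: 99.7}

intervals = {
    k: (round(mean - k * s, 2), round(mean + k * s, 2))
    for k in coverage
}

for k, (low, high) in intervals.items():
    print(f"About {coverage[k]}% of the data lies in ({low}, {high})")
```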

Numerical Measures of Relative
Standing
Numerical Measures of Relative
Standing: Percentiles
Describes the relative location of a measurement
compared to the rest of the data
The p-th percentile is a number such that p% of the data
falls below it and (100 − p)% falls above it
Median = 50th percentile
Percentile Example
You scored 560 on the GMAT exam. This score puts
you in the 58th percentile.
What percentage of test takers scored lower than you
did?
What percentage of test takers scored higher than you
did?
Percentile Example
What percentage of test takers scored lower than you
did?
58% of test takers scored lower than 560.
What percentage of test takers scored higher than you
did?
(100 − 58)% = 42% of test takers scored
higher than 560.
Numerical Measures of Relative
Standing: Z-Scores
Describes the relative location of a measurement
compared to the rest of the data
Measures the number of standard deviations
away from the mean a data value is located

Sample z-score: \( z = \frac{x - \bar{x}}{s} \)
Population z-score: \( z = \frac{x - \mu}{\sigma} \)
Z-Score Example
The mean time to assemble a product is 22.5 minutes with a
standard deviation of 2.5 minutes.
Find the z-score for an item that took 20 minutes to assemble.
Find the z-score for an item that took 27.5 minutes to assemble.
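Z-Score Example
\( x = 20 \), \( \mu = 22.5 \), \( \sigma = 2.5 \)
\[ z = \frac{x - \mu}{\sigma} = \frac{20 - 22.5}{2.5} = -1.0 \]
\( x = 27.5 \), \( \mu = 22.5 \), \( \sigma = 2.5 \)
\[ z = \frac{x - \mu}{\sigma} = \frac{27.5 - 22.5}{2.5} = 2.0 \]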
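Both calculations use the same one-line formula. A brief sketch, with a hypothetical helper name `z_score`; a negative result means the value is below the mean.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


z_fast = z_score(20, mean=22.5, std_dev=2.5)     # 20-minute assembly
z_slow = z_score(27.5, mean=22.5, std_dev=2.5)   # 27.5-minute assembly

print(z_fast, z_slow)
```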
Quartiles & Box Plots
Quartiles
1. Measure of noncentral tendency
2. Split ordered data into 4 quarters
[Figure: ordered data divided by \( Q_1 \), \( Q_2 \), \( Q_3 \) into four parts of 25% each]
3. Position of i-th quartile
\[ \text{Positioning Point of } Q_i = \frac{i(n + 1)}{4} \]
Quartile (\( Q_1 \)) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_1 \text{ Position} = \frac{1(n + 1)}{4} = \frac{1(6 + 1)}{4} = 1.75 \approx 2 \]
\[ Q_1 = 6.3 \]
Quartile (\( Q_2 \)) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_2 \text{ Position} = \frac{2(n + 1)}{4} = \frac{2(6 + 1)}{4} = 3.5 \]
\[ Q_2 = \frac{7.7 + 8.9}{2} = 8.3 \]
Quartile (\( Q_3 \)) Example
Raw Data: 10.3 4.9 8.9 11.7 6.3 7.7
Ordered: 4.9 6.3 7.7 8.9 10.3 11.7
Position: 1 2 3 4 5 6
\[ Q_3 \text{ Position} = \frac{3(n + 1)}{4} = \frac{3(6 + 1)}{4} = 5.25 \approx 5 \]
\[ Q_3 = 10.3 \]
Numerical Data Properties & Measures
  Central Tendency: Mean, Median, Mode
  Variation: Range, Interquartile Range, Variance, Standard Deviation
  Shape: Skew
Interquartile Range
1. Measure of dispersion
2. Also called midspread
3. Difference between third & first quartiles
\[ \text{Interquartile Range} = Q_3 - Q_1 \]

4. Spread in middle 50%
5. Not affected by extreme values
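The quartile positioning rule and the interquartile range can be sketched together, using the six-value data set from the quartile examples. The rounding rule here is an assumption inferred from those examples: a positioning point is rounded to the nearest whole position, and a point ending in .5 averages the two neighbouring values. Statistical packages often use other interpolation rules and may give slightly different quartiles.

```python
import math


def quartile(values, i):
    """i-th quartile via the positioning point i(n + 1) / 4 (rounding rule assumed)."""
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4      # 1-based positioning point
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0.5:
        # Halfway between two positions: average the two neighbouring values.
        return (ordered[lower - 1] + ordered[lower]) / 2
    nearest = lower if fraction < 0.5 else lower + 1
    return ordered[nearest - 1]


data = [10.3, 4.9, 8.9, 11.7, 6.3, 7.7]

q1, q2, q3 = (quartile(data, i) for i in (1, 2, 3))
iqr = q3 - q1                                  # spread of the middle 50%

print(q1, round(q2, 2), q3, round(iqr, 2))
```

The function does not guard against positioning points that fall outside the data, which can happen for very small samples.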
Thinking Challenge
You're a financial analyst for Prudential-Bache Securities.
You have collected the following closing stock prices
of new stock issues: 17, 16, 21, 18, 13, 16, 12, 11.
What are the quartiles, \( Q_1 \) and \( Q_3 \), and the interquartile range?
Quartile Solution*
\( Q_1 \)
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ Q_1 \text{ Position} = \frac{1(n + 1)}{4} = \frac{1(8 + 1)}{4} = 2.25 \approx 2 \]
\[ Q_1 = 12 \]

Quartile Solution*
\( Q_3 \)
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ Q_3 \text{ Position} = \frac{3(n + 1)}{4} = \frac{3(8 + 1)}{4} = 6.75 \approx 7 \]
\[ Q_3 = 18 \]

Interquartile Range Solution*
Raw Data: 17 16 21 18 13 16 12 11
Ordered: 11 12 13 16 16 17 18 21
Position: 1 2 3 4 5 6 7 8
\[ \text{Interquartile Range} = Q_3 - Q_1 = 18.0 - 12.0 = 6.0 \]
Box Plot
1. Graphical display of data using 5-number summary
[Figure: box plot on a scale from 4 to 12 marking \( X_{\text{smallest}} \), \( Q_1 \), Median, \( Q_3 \), and \( X_{\text{largest}} \)]
Shape & Box Plot
[Figure: three box plots, each marking \( Q_1 \), Median, and \( Q_3 \), for right-skewed, left-skewed, and symmetric data]
Graphing Bivariate Relationships
Describes a relationship between two quantitative variables
Plot the data in a scattergram
[Figure: three scattergrams of y against x showing a positive relationship, a negative relationship, and no relationship]
Scattergram Example
You're a marketing analyst for Hasbro Toys.
You gather the following data:

Ad $ (x)    Sales (Units) (y)
1           1
2           1
3           2
4           2
5           4

Draw a scattergram of the data.

Scattergram Example
[Figure: scattergram of Sales (0 to 4) against Advertising (0 to 5)]
Time Series Plot
Used to graphically display data produced over time
Shows trends and changes in the data over time
Time recorded on the horizontal axis
Measurements recorded on the vertical axis
Points connected by straight lines
Time Series Plot Example
The following data show the average retail price of regular gasoline
in New York City for 8 weeks in 2006.
Draw a time series plot for this data.

Date            Average Price
Oct 16, 2006    $2.219
Oct 23, 2006    $2.173
Oct 30, 2006    $2.177
Nov 6, 2006     $2.158
Nov 13, 2006    $2.185
Nov 20, 2006    $2.208
Nov 27, 2006    $2.236
Dec 4, 2006     $2.298
Time Series Plot Example
[Figure: time series plot of Price (vertical axis, 2.05 to 2.35) against Date (horizontal axis, 10/16 to 12/4)]
Distorting the Truth
with Descriptive Techniques
Errors in Presenting Data
1. Using chart junk
2. No relative basis in comparing data batches
3. Compressing the vertical axis
4. No zero point on the vertical axis
Chart Junk
[Figure: two charts titled "Minimum Wage". Bad presentation: labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80. Good presentation: $ (0 to 4) plotted by year, 1960 to 1990.]
No Relative Basis
[Figure: two charts titled "A's by Class" for FR, SO, JR, SR. Bad presentation: vertical axis is frequency (0 to 300). Good presentation: vertical axis is percent (0% to 30%).]
Compressing Vertical Axis
[Figure: two charts titled "Quarterly Sales" for Q1 to Q4. Bad presentation: vertical axis ($) runs from 0 to 200. Good presentation: vertical axis ($) runs from 0 to 50.]
No Zero Point on Vertical Axis
[Figure: two charts titled "Monthly Sales" for months J, M, M, J, S, N. Bad presentation: vertical axis ($) runs from 36 to 45. Good presentation: vertical axis ($) runs from 0 to 60.]
