© 2005 Thomson/South-Western 1
Measures of Distribution Shape,
Relative Location, and Detecting Outliers
■ Distribution Shape
■ z-Scores
■ Chebyshev’s Theorem
■ Empirical Rule
■ Detecting Outliers
Distribution Shape: Skewness
[Figures: histograms illustrating distribution shape and skewness; vertical-axis ticks run from 0 to .30 in steps of .05]
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
[Figure: histogram; vertical-axis ticks run from 0 to .30 in steps of .05]
z-Scores
The z-score is often called the standardized value.
It denotes the number of standard deviations a data value $x_i$ is from the mean.
$$z_i = \frac{x_i - \bar{x}}{s}$$
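As a rough illustration of the formula, the sketch below computes z-scores for the smallest and largest of the 70 data values listed earlier in the deck. The names `data` and `z_score` are illustrative assumptions, not part of the slides.

```python
import statistics

# The 70 data values listed earlier in the deck.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(data)   # sample mean
s = statistics.stdev(data)     # sample standard deviation (n - 1 divisor)


def z_score(x, mean, s):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / s


z_smallest = z_score(min(data), mean, s)
z_largest = z_score(max(data), mean, s)

print(f"mean = {mean:.2f}, s = {s:.2f}")
print(f"z(smallest) = {z_smallest:.2f}, z(largest) = {z_largest:.2f}")
```

A negative z-score means the value lies below the mean; a positive one means it lies above.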
Chebyshev’s Theorem (or Chebyshev’s inequality)
At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1 ($z > 1$; beware that the value 1 is not included).
Pafnuty Chebyshev (1821-1894)
Chebyshev’s Theorem
At least 75% of the data values must be within $z = 2$ standard deviations of the mean.
At least 89% of the data values must be within $z = 3$ standard deviations of the mean.
At least 94% of the data values must be within $z = 4$ standard deviations of the mean.
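The three percentages follow directly from the bound $1 - 1/z^2$. A minimal sketch, in which the function name `chebyshev_bound` is an assumption made for illustration:

```python
def chebyshev_bound(z):
    """Minimum proportion of any data set within z standard deviations of the mean."""
    if z <= 1:
        # The theorem only applies for z strictly greater than 1.
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z**2


for z in (2, 3, 4):
    print(f"z = {z}: at least {chebyshev_bound(z):.1%}")
```

The slide figures of 89% and 94% are the bounds for z = 3 and z = 4 rounded to whole percentages.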
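For example:
Let $z = 1.5$ with $\bar{x} = 490.80$ and $s = 54.74$. Then at least $1 - 1/(1.5)^2 \approx 0.56$, or 56%, of the data values must lie between $\bar{x} - zs = 490.80 - 1.5(54.74) \approx 409$ and $\bar{x} + zs = 490.80 + 1.5(54.74) \approx 573$.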
Empirical Rule
68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
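These percentages can be checked against the normal distribution itself. The sketch below uses the identity P(|Z| < k) = erf(k / √2) for a standard normal variable; the function name `normal_within` is an assumption. The exact values differ from the slide figures in the second decimal place, presumably because the slide figures come from a rounded normal table.

```python
import math


def normal_within(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} sd: {normal_within(k):.4%}")
```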
[Figure: normal curve over x, centered at µ, with bands µ ± 1σ (68.26%), µ ± 2σ (95.44%), and µ ± 3σ (99.72%)]
Detecting Outliers
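The slide body for this section did not survive extraction, so the following is only a sketch of one common convention, assumed here rather than taken from the slides: flag any value whose z-score is below -3 or above +3. The function name, the threshold default, and the sample list are all hypothetical.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical data: nineteen identical readings and one extreme value.
sample = [50] * 19 + [100]
print(z_score_outliers(sample))
```

A flagged value is a candidate for review (for example, a recording error), not automatically a value to delete.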
Exploratory Data Analysis
■ Five-Number Summary
■ Box Plot
Five-Number Summary
1 Smallest Value
2 First Quartile
3 Median
4 Third Quartile
5 Largest Value
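As an illustration, the five numbers can be computed for the 70 data values listed earlier in the deck. Quartile conventions vary between textbooks and software; this sketch assumes the default ("exclusive") method of Python's `statistics.quantiles`, which for this data gives the same quartiles as the deck's box plot (445, 475, 525). The variable names are illustrative.

```python
import statistics

# The 70 data values listed earlier in the deck.
data = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```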
Box Plot
[Figure: box plot on an axis running from 375 to 625 in steps of 25, with Q1 = 445, Q2 = 475, and Q3 = 525 marked]
[Figure: box plot on an axis running from 375 to 625 in steps of 25; smallest value inside limits = 425, largest value inside limits = 615]
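The slide that defined the limits did not survive extraction. A minimal sketch, assuming the common convention that the limits sit 1.5 × IQR below Q1 and above Q3, using the quartiles and extreme values shown in the figures:

```python
# Quartiles and extreme values as shown on the box plot slides.
q1, q3 = 445, 525
smallest, largest = 425, 615

iqr = q3 - q1                  # interquartile range
lower_limit = q1 - 1.5 * iqr   # assumed 1.5 x IQR convention
upper_limit = q3 + 1.5 * iqr

# Under this convention, a value outside the limits is treated as an outlier.
has_outliers = smallest < lower_limit or largest > upper_limit
print(lower_limit, upper_limit, has_outliers)
```

Under this assumption both extremes fall inside the limits, which agrees with the figure reporting 425 and 615 as the smallest and largest values inside the limits.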
Measures of Association
Between Two Variables
■ Covariance
■ Correlation Coefficient
Covariance
The covariance is a measure of the linear association between two variables.
Positive values indicate a positive relationship.
Negative values indicate a negative relationship.
The covariance is computed as follows:
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \quad \text{for samples}$$
$$\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N} \quad \text{for populations}$$
Correlation Coefficient
The correlation coefficient is computed as follows:
$$r_{xy} = \frac{s_{xy}}{s_x s_y} \quad \text{for samples}$$
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \quad \text{for populations}$$
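A minimal sketch of the sample formulas, written out by hand so that each term is visible. The function names and the five (x, y) pairs are hypothetical and are not taken from the slides.

```python
import math


def sample_covariance(xs, ys):
    """Sum of cross-products of deviations, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical data in which y tends to fall as x rises.
xs = [1, 2, 3, 4, 5]
ys = [9, 8, 6, 5, 2]

s_xy = sample_covariance(xs, ys)
r_xy = sample_correlation(xs, ys)
print(f"s_xy = {s_xy:.2f}, r_xy = {r_xy:.4f}")
```

Dividing by the two standard deviations removes the units of measurement, which is why the correlation coefficient is easier to interpret than the covariance.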
The coefficient can take on values between -1 and +1.
Values near -1 indicate a strong negative linear relationship.
Values near +1 indicate a strong positive linear relationship.
Correlation is a measure of linear association and not necessarily causation.
Just because two variables are highly correlated, it does not mean that one variable is the cause of the other.
Covariance and Correlation Coefficient
■ Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{-35.40}{6 - 1} = -7.08$$
■ Sample Correlation Coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{-7.08}{(8.2192)(.8944)} = -.9631$$
The Weighted Mean and
Working with Grouped Data
■ Weighted Mean
■ Mean for Grouped Data
■ Variance for Grouped Data
■ Standard Deviation for Grouped Data
Weighted Mean
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
$x_i$ = value of observation $i$
$w_i$ = weight for observation $i$
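A short sketch of the formula. The function name and the example figures (unit costs weighted by quantities purchased) are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical example: cost per unit, weighted by the number of units bought.
costs = [3.00, 3.40, 2.80]
units = [1200, 500, 2750]

average_cost = weighted_mean(costs, units)
print(f"{average_cost:.4f}")
```

When every weight is equal, the result reduces to the ordinary mean.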
Grouped Data
Mean for Grouped Data
■ Sample Data
$$\bar{x} = \frac{\sum f_i M_i}{n}$$
■ Population Data
$$\mu = \frac{\sum f_i M_i}{N}$$
where:
$f_i$ = frequency of class $i$
$M_i$ = midpoint of class $i$
Sample Mean for Grouped Data
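The worked table for this slide did not survive extraction. As a sketch, the sample mean can be recomputed from the class frequencies and midpoints that appear in the deck's grouped-data variance table; the variable names are illustrative.

```python
# (frequency f_i, midpoint M_i) for each class in the deck's variance table.
classes = [
    (8, 429.5), (17, 449.5), (12, 469.5), (8, 489.5), (7, 509.5),
    (4, 529.5), (2, 549.5), (4, 569.5), (2, 589.5), (6, 609.5),
]

n = sum(f for f, _ in classes)                      # sample size
grouped_mean = sum(f * m for f, m in classes) / n   # sum of f_i * M_i over n
print(n, round(grouped_mean, 2))
```

Because each observation is replaced by its class midpoint, the grouped mean only approximates the mean of the raw data.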
Variance for Grouped Data
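$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} \quad \text{for sample data}$$
$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \quad \text{for population data}$$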
Sample Variance for Grouped Data
Rent ($)    f_i    M_i     M_i - x̄    (M_i - x̄)^2    f_i (M_i - x̄)^2
420-439       8    429.5     -63.7        4058.96          32471.71
440-459      17    449.5     -43.7        1910.56          32479.59
460-479      12    469.5     -23.7         562.16           6745.97
480-499       8    489.5      -3.7          13.76            110.11
500-519       7    509.5      16.3         265.36           1857.55
520-539       4    529.5      36.3        1316.96           5267.86
540-559       2    549.5      56.3        3168.56           6337.13
560-579       4    569.5      76.3        5820.16          23280.66
580-599       2    589.5      96.3        9271.76          18543.53
600-619       6    609.5     116.3       13523.36          81140.18
Total        70                                           208234.29
■ Sample Variance
$$s^2 = \frac{208{,}234.29}{70 - 1} = 3{,}017.89$$
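The table arithmetic can be re-checked with a short sketch that applies the grouped-data variance formula to the frequencies and midpoints above. The variable names are illustrative; the standard deviation is simply the square root of the variance.

```python
import math

# (frequency f_i, midpoint M_i) for each rent class in the table.
classes = [
    (8, 429.5), (17, 449.5), (12, 469.5), (8, 489.5), (7, 509.5),
    (4, 529.5), (2, 549.5), (4, 569.5), (2, 589.5), (6, 609.5),
]

n = sum(f for f, _ in classes)
x_bar = sum(f * m for f, m in classes) / n

# Sum of f_i * (M_i - x_bar)^2, then divide by n - 1 for sample data.
sum_sq = sum(f * (m - x_bar) ** 2 for f, m in classes)
sample_variance = sum_sq / (n - 1)
sample_std = math.sqrt(sample_variance)

print(f"sum of squares = {sum_sq:.2f}")
print(f"s^2 = {sample_variance:.2f}, s = {sample_std:.2f}")
```

The sketch uses the unrounded mean, so individual rows can differ from the table in the last decimal place while the total and the variance agree to two decimals.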
End of Chapter 3, Part B