Motivation: statements of the following kind appear very frequently in newspapers and magazines:
Sales of new homes are accruing at a rate of 70300 homes per year.
The unemployment rate has dropped to 4.0%.
The Dow Jones Industrial Average closed at 10000
Census
The above numerical descriptions are very familiar to most of us since we use them in
everyday life. As a matter of fact, they are part of statistics. Therefore, statistics is part of
our everyday life. We now give a description of statistics.
Definition of statistics:
Statistics is the art and science of collecting, analyzing, presenting, interpreting
and predicting data.
Objective of this course: to use statistics to give managers and decision makers a
better understanding of the business and economic environment, and thus enable them
to make more informed and better decisions.
1.1 Data
Example 1:
We have a data set for the following 5 stocks:
Koss Corp 36.1 0.89 OTC
Par Technology 81.2 0.32 NYSE
Scientific Tech. 17.3 0.46 OTC
Western Beef 273.7 0.78 OTC
Note: OTC stands for over the counter while NYSE stands for New York Stock
Exchange.
Example 1 (continued):
Qualitative data: OTC, OTC, NYSE, OTC and OTC
Quantitative data: 86.6, 36.1, 81.2, 17.3, 273.7, 0.25, 0.89, 0.32, 0.46 and 0.78
Note: quantitative data are always numeric, but qualitative data may be either
numeric or nonnumeric. For example, ID numbers and automobile license plate
numbers are qualitative data.
Note: ordinary arithmetic operations are meaningful only with quantitative data
and are not meaningful with qualitative data.
Cross-sectional data: data collected at the same or approximately the same point in
time.
Time series data: data collected over several time periods.
Online Exercise:
Exercise 1.1.1
Exercise 1.1.2
Online Exercise:
Exercise 1.2.1
Example 1 (continued):
Tabular approach for stock data:
OTC NYSE
Some numerical quantities can be used to provide important information about the
data, for example, the average or mean. Index numbers are widely used in business,
for example, the Consumer Price Index (CPI) and the Dow Jones Industrial Average
(DJIA).
Online Exercise:
Exercise 1.3.1
Exercise 1.3.2
Descriptive statistics introduced in section 1.4 can provide important and intuitive
information about the data of interest. However, these statistical measures are mainly
exploratory. For more detailed, rigorous and accurate results, a statistical inference
procedure is required. To conduct statistical inference, data need to be drawn from a
set of elements of interest. We now introduce some basic components of the statistical
inference procedure. They are:
Data from a sample can be used to make estimates and test hypotheses about the
characteristics of a population
Example 2:
The 100000 bulbs are the population of interest. In practice, it is not possible (nor
realistic) to test all 100000 bulbs for their lifetime. One workable way is to draw a sample,
say 100 bulbs, and then test their lifetimes. Suppose the average lifetime of the 100
bulbs is 750 hours. Then, the estimate (guess) of the average lifetime of the 100000
bulbs is 750 hours.
Note: the process of making estimates and testing hypotheses about the
characteristics of a population is referred to as statistical inference.
Online Exercise:
Exercise 1.4.1
2.1 Summarizing Qualitative Data:
For qualitative data, we can use frequency distribution and relative frequency. We
now introduce frequency distribution, relative frequency and percent frequency.
Relative frequency: (frequency of a class)/n, where n is the total number of the data.
Based on the frequency distribution, relative frequency, and percent frequency of the
data, we can use tables and graphs to display these frequencies.
Example:
Forbes investigates the degrees held by the 25 best-paid CEOs (chief executive officers).
Tabular summary:
Degree      Frequency   Relative Frequency   Percent Frequency
None             2             0.08                  8
Bachelor        11             0.44                 44
Master           7             0.28                 28
Doctorate        5             0.20                 20
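The three columns of the tabular summary can be reproduced with a short script. This is only a sketch: the list `degrees` is rebuilt from the frequencies in the table, and the helper name `frequency_table` is an illustrative choice, not part of the course material.

```python
from collections import Counter

# The 25 CEO degrees, rebuilt from the frequencies in the table above.
degrees = ["None"] * 2 + ["Bachelor"] * 11 + ["Master"] * 7 + ["Doctorate"] * 5


def frequency_table(values):
    """Map each class to (frequency, relative frequency, percent frequency)."""
    n = len(values)  # total number of data items
    counts = Counter(values)
    return {label: (c, c / n, 100 * c / n) for label, c in counts.items()}


table = frequency_table(degrees)
for label, (freq, rel, pct) in table.items():
    print(f"{label:<10} {freq:>3} {rel:>6.2f} {pct:>5.0f}")
```

The relative frequencies sum to 1 and the percent frequencies to 100, which is a quick check on any such table.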
Graphical display:
[Figure: bar graph of the CEO degree frequencies]
[Figure: pie graph, "CEO Degrees"]
Online Exercise:
Exercise 2.1.1
Exercise 2.1.2
2.2 Summarizing Quantitative Data:
For quantitative data, we need to define the classes first. There are 3 steps to define
the classes for a frequency distribution:
Step 3: Determine the class limits: the smallest possible data value should be larger
than the lower class limit while the largest possible data value should be smaller than
the upper class limit.
Example:
Step 1:
We choose 5 to be the number of classes.
Step 2:
$$\text{class width} = \frac{\text{largest data value} - \text{smallest data value}}{\text{number of classes}} = \frac{33 - 12}{5} = 4.2.$$
Therefore, we use 5 as the class width.
Step 3:
The 5 classes we choose are
10-14 15-19 20-24 25-29 30-34
Note: the lower class limit in the first class (10) is smaller than the
smallest data value 12. Also, the upper class limit in the last class (34) is
larger than the largest data value 33.
Tabular summary:
Cumulative frequency distribution: the number of data items with values less than
or equal to the upper class limit of each class.
Graphical display:
Ogive: a graph of the cumulative frequency distribution, that is, of the number of data
items with values less than or equal to the upper class limit of each class.
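Example (continued):
Class    Frequency   Relative Frequency   Percent Frequency
10-14        4             0.20                 20
15-19        8             0.40                 40
20-24        5             0.25                 25
25-29        2             0.10                 10
30-34        1             0.05                  5
Total       20             1.00                100
Class    Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
<= 19    4+8=12                 0.2+0.4=0.6                     20+40=60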
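The cumulative row above is a running sum of the frequency column. A minimal sketch, assuming the class upper limits and frequencies from the table (the variable names are illustrative):

```python
from itertools import accumulate

# Upper class limits and frequencies from the frequency distribution above.
upper_limits = [14, 19, 24, 29, 34]
frequencies = [4, 8, 5, 2, 1]

n = sum(frequencies)  # total number of data items
cumulative = list(accumulate(frequencies))  # running sum of the frequencies
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for limit, c, r, p in zip(upper_limits, cumulative, cumulative_relative, cumulative_percent):
    print(f"<= {limit}: {c:>2} {r:.2f} {p:.0f}")
```

Plotting `cumulative` against `upper_limits` gives the points of the ogive.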
The histogram is
[Figure: histogram of the data; horizontal axis: data, from 10 to 35]
[Figure: ogive; horizontal axis: data, from 0 to 35; vertical axis: cumulative frequency]
Online Exercise:
Exercise 2.2.1
Exercise 2.1.2
Stem-and-leaf display is a useful exploratory data analysis tool which can provide an
idea of the shape of the distribution of a set of quantitative data.
Example:
1 | 7
2 | 2
3 |
4 |
5 | 2
6 | 6 8
7 | 1
8 | 2 7
9 | 3 5
Online Exercise:
Exercise 2.3.1
In practice, it is often unrealistic or impossible to obtain a population parameter from a
population, for example, the average lifetime of 100000 bulbs. Therefore, a sample
statistic can be used to estimate the population parameter; for example, the average
lifetime of 100 bulbs can be used to estimate the average lifetime of 100000 bulbs.
Example:
Suppose the following data are the scores of 10 students in a quiz,
1, 3, 5, 7, 9, 2, 4, 6, 8, 10.
Some measures need to be used to provide information about the performance of the
10 students in this quiz.
(I) Mean:
Sample mean (sample statistic):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (population parameter):
$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$
Basically, the mean can provide the information about the center of the data.
Intuitively, it can measure the rough location of the data.
Example (continued):
$$\bar{x} = \frac{1 + 3 + \cdots + 10}{10} = 5.5$$
(II) Median:
Example (continued):
$$\text{median} = \frac{5 + 6}{2} = 5.5$$
If the data are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, then
$$\text{median} = 6$$
Note: the median is less sensitive to extreme data values than
the mean. For example, in the previous data, suppose the last value has
been mistyped, so that the data become 1, 3, 5, 7, 9, 2, 4, 6, 8, 100. Then
the median is still 5.5 while the mean becomes 14.5.
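The note can be checked directly with Python's standard `statistics` module. The list names below are illustrative; the data are the quiz scores from the example.

```python
from statistics import mean, median

scores = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
mistyped = scores[:-1] + [100]  # last value typed as 100 instead of 10

# The mean reacts strongly to the extreme value; the median does not move.
print(mean(scores), median(scores))
print(mean(mistyped), median(mistyped))
```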
(III) Mode:
The data value that occurs with the greatest frequency (it need not be numerical).
Note: if the data have exactly two modes, we say that the data are
bimodal. If the data have more than two modes, we say that the data are
multimodal.
(IV) Percentile:
The pth percentile is a value such that at least p percent of the data have this value or
less.
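Example (continued):
Please find the 40th percentile and the 26th percentile for the previous data.
[Solution]
Step 1: the data in ascending order are
1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Step 2:
For the 40th percentile,
$$i = \frac{40}{100} \times 10 = 4.$$
For the 26th percentile,
$$i = \frac{26}{100} \times 10 = 2.6.$$
Step 3:
Since i = 4 is an integer, the 40th percentile is the average of the 4th and 5th values,
$$\text{40th percentile} = \frac{4 + 5}{2} = 4.5,$$
and since i = 2.6 is not an integer, it is rounded up to 3 and the 26th percentile is the 3rd value,
$$\text{26th percentile} = 3.$$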
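The three steps can be sketched as a small function. The rule coded here (average positions i and i+1 when i is an integer, otherwise round i up) is inferred from the worked numbers in this example, so treat it as an assumption. The function name `percentile` is illustrative and is not a library call.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) using the three-step rule of the example."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)  # Step 1: arrange in ascending order
    n = len(values)
    position, remainder = divmod(p * n, 100)  # Step 2: i = (p / 100) * n
    position = int(position)
    if remainder == 0:
        # Step 3a: i is an integer, so average the i-th and (i+1)-th values.
        return (values[position - 1] + values[position]) / 2
    # Step 3b: i is not an integer, so round up and take that value.
    return values[position]


scores = [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
print(percentile(scores, 40), percentile(scores, 26))
```

Other conventions exist (for example, interpolation between neighbouring values), so library routines may return slightly different percentiles for the same data.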
(V) Quartiles:
When dividing data into 4 parts, the division points are referred to as the quartiles.
That is,
Q1 = the first quartile, or 25th percentile
Q2 = the second quartile, or 50th percentile
Q3 = the third quartile, or 75th percentile
Example (continued):
Find the first quartile and the third quartile for the previous example.
Step 2:
For the first quartile,
$$i = \frac{25}{100} \times 10 = 2.5.$$
For the third quartile,
$$i = \frac{75}{100} \times 10 = 7.5.$$
Step 3:
$$Q_1 = 3 \quad \text{and} \quad Q_3 = 8.$$
Online Exercise:
Exercise 3.1.1
Exercise 3.1.2
Example:
Suppose there are two factories producing the batteries. From each factory, 10
batteries are drawn to test for the lifetime (in hours). These lifetimes are:
Factory 1: 10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1
Factory 2: 16, 5, 7, 14, 6, 15, 3, 13, 9, 12.
The mean lifetimes of the two factories are both 10. However, by looking at the data,
it is obvious that the batteries produced by factory 1 are much more reliable than the
ones by factory 2. This implies other measures for measuring the dispersion or
variation of the data are required.
(I) Range:
range = (largest value of the data) - (smallest value of the data).
Example (continued):
Note: the range is seldom used as the only measure of dispersion. The
range is highly influenced by an extremely large or an extremely small
data value.
Example:
The first quartile and the third quartile for the data from factory 1 are 9.9 and 10.1,
respectively, and 6 and 14 for the data from factory 2. Therefore,
IQR (factory 1) = 10.1 - 9.9 = 0.2
IQR (factory 2) = 14 - 6 = 8.
The interquartile range of battery lifetimes for factory 1 is much smaller than the one
for factory 2.
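The population variance and the sample variance are
$$\sigma^2 = \frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}$$
and
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1},$$
respectively, where $\mu$ denotes the population mean. The population standard deviation and sample standard deviation are the
square roots of the population variance and sample variance:
$$\sigma = \sqrt{\sigma^2}$$
and
$$s = \sqrt{s^2},$$
respectively.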
Large sample variance or sample standard deviation implies the data are dispersed
or are highly varied.
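Note:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = \sum_{i=1}^{n} x_i - n \cdot \frac{\sum_{i=1}^{n} x_i}{n} = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i = 0$$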
Example:
$$s^2(\text{factory 1}) = \frac{(10.1 - 10)^2 + (9.9 - 10)^2 + \cdots + (10.1 - 10)^2}{10 - 1} = 0.0111$$
$$s^2(\text{factory 2}) = \frac{(16 - 10)^2 + (5 - 10)^2 + \cdots + (12 - 10)^2}{10 - 1} = 21.1111$$
The sample variance of battery lifetimes for factory 2 is about 1900 times as large as
the one for factory 1.
The sample standard deviations for the data from factories 1 and 2 are
$$\sqrt{0.01111} = 0.1054 \quad \text{and} \quad \sqrt{21.1111} = 4.5947,$$
respectively.
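These figures can be reproduced with the standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor of the sample formulas. The variable names are illustrative; the data are the battery lifetimes from the example.

```python
from statistics import mean, stdev, variance

factory1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]

for name, data in (("factory 1", factory1), ("factory 2", factory2)):
    # Same mean (10 hours) but very different dispersion.
    print(name, round(mean(data), 4), round(variance(data), 4), round(stdev(data), 4))

ratio = variance(factory2) / variance(factory1)
print(round(ratio))  # how many times larger the factory 2 variance is
```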
Example:
In the battery data from factory 1, suppose the measurement is in minutes rather than
hours. Then, the data are 606, 594, 606, 594, 594, 606, 594, 606, 594, 606.
Thus, the standard deviation becomes 6.3245, which is 60 times as large as the value
0.1054 based on the original data measured in hours. However, whether the data are
measured in hours or in minutes, the coefficient of variation is
$$\text{C.V.} = \frac{0.1054}{10} \times 100 = \frac{6.3245}{600} \times 100 = 1.054.$$
Online Exercise:
Exercise 3.2.1
Exercise 3.2.2
3.3 Exploratory Data Analysis:
The five number summary can provide important information about both the location
and the dispersion of the data. They are
Smallest value
First quartile
Median
Third quartile
Largest value
Example (continued):
The box plot is a commonly used graphical method that provides information about both
the location and the dispersion of the data. In particular, when the interest is in comparing
data from different populations, the box plot can provide insight. The box plot is
[Figure: box plot marking the lower limit, Q1, Q3 and the upper limit, with each limit 1.5 IQR beyond the nearer quartile]
Note: data outside the upper limit and lower limit are called outliers.
Example (continued):
[Figure: box plots of the battery lifetimes for factory 1 and factory 2]
Online Exercise:
Exercise 3.3.1
The z-score is a quantity that measures the relative location of a data value.
The z-score, also referred to as the standardized value for observation i, is defined as
$$z_i = \frac{x_i - \bar{x}}{s}.$$
Note: $z_i$ is the number of standard deviations that $x_i$ lies from the mean $\bar{x}$.
Example (continued):
Factory 1:
xi   10.1    9.9     10.1    9.9     9.9     10.1    9.9     10.1    9.9     10.1
zi   0.948   -0.948  0.948   -0.948  -0.948  0.948   -0.948  0.948   -0.948  0.948
Factory 2:
xi   16      5       7       14      6       15      3       13      9       12
zi   1.305   -1.088  -0.652  0.870   -0.870  1.088   -1.523  0.652   -0.217  0.435
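The z-scores in the tables follow from the definition above. A minimal sketch (the helper name `z_scores` is illustrative); small differences in the third decimal place compared with the table come from rounding.

```python
from statistics import mean, stdev


def z_scores(data):
    """Standardized values z_i = (x_i - mean) / s, with s the sample standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]


factory1 = [10.1, 9.9, 10.1, 9.9, 9.9, 10.1, 9.9, 10.1, 9.9, 10.1]
factory2 = [16, 5, 7, 14, 6, 15, 3, 13, 9, 12]

z1, z2 = z_scores(factory1), z_scores(factory2)
# Share of the data within one standard deviation of the mean.
within_one_1 = sum(abs(z) <= 1 for z in z1) / len(z1)
within_one_2 = sum(abs(z) <= 1 for z in z2) / len(z2)
print(within_one_1, within_one_2)
```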
There are two results related to the location of the data. The first result is Chebyshev's
theorem.
Chebyshev's Theorem:
For any population, at least
$$\left(1 - \frac{1}{k^2}\right) \times 100\%$$
of the data lie within k standard deviations of the mean, where k is any value greater than 1.
Based on Chebyshev's theorem, for any data set, it can be roughly estimated that at
least $\left(1 - \frac{1}{k^2}\right) \times 100\%$ of the data lie within k sample standard deviations of the mean.
Example (continued):
The second result is based on the empirical rule. The rule is especially applicable when
the data have a bell-shaped distribution. The empirical rule is
Approximately 68% of the data will be within one standard deviation of the
mean ($-1 \le z_i \le 1$).
Approximately 95% of the data will be within two standard deviations of the
mean ($-2 \le z_i \le 2$).
Almost all of the data will be within three standard deviations of the mean
($-3 \le z_i \le 3$).
Example (continued):
For the data from factory 1, all the data are within one standard deviation of the mean,
while 60% of the data from factory 2 are within one standard deviation of the mean.
The empirical rule is not applicable to these two data sets since neither of them is
bell-shaped. However, for the following data,
2.11 -0.83 -1.43 1.35 -0.42 -0.69 -0.65 -0.29 -0.54 1.92
0.53 -0.27 1.7 0.88 1.25 0.32 -2.18 0.68 0.85 0.34
The histogram of the above data indicates that the data are roughly bell-shaped.
[Figure: histogram of the data (variable rn1); horizontal axis from -2 to 2]
Approximately 65% of the data are within one standard deviation of the mean, which
is similar to the result based on the empirical rule (68%).
Detecting Outliers:
To identify outliers, we can use either the box plot or the z-score. The outliers
identified by the box plot are those data outside the upper limit or lower limit, while
the outliers identified by the z-score are those with a z-score smaller than -3 or
greater than 3.
Note: the outliers identified by the box plot might be different from
those identified by using the z-score.
Online Exercise:
Exercise 3.4.1
Weighted Mean:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}.$$
Note: when data values vary in importance, the analyst must choose the
weight that best reflects the importance of each data value in the
determination of the mean.
Example 1:
The following are 5 purchases of a raw material over the past 3 months.
Purchase Cost per Pound ($) Number of Pounds
1 3.00 1200
2 3.40 500
3 2.80 2750
4 2.90 1000
5 3.25 800
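[solutions:]
$$w_1 = 1200,\; w_2 = 500,\; w_3 = 2750,\; w_4 = 1000,\; w_5 = 800$$
and
$$x_1 = 3.00,\; x_2 = 3.40,\; x_3 = 2.80,\; x_4 = 2.90,\; x_5 = 3.25.$$
Then,
$$\bar{x}_w = \frac{\sum_{i=1}^{5} w_i x_i}{\sum_{i=1}^{5} w_i} = \frac{1200 \times 3.00 + 500 \times 3.40 + 2750 \times 2.80 + 1000 \times 2.90 + 800 \times 3.25}{1200 + 500 + 2750 + 1000 + 800} = 2.96$$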
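The same calculation as a short sketch, with the number of pounds as the weights. The function name `weighted_mean` is an illustrative choice.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


cost_per_pound = [3.00, 3.40, 2.80, 2.90, 3.25]  # x_i
pounds = [1200, 500, 2750, 1000, 800]            # w_i

average_cost = weighted_mean(cost_per_pound, pounds)
print(round(average_cost, 2))
```

For comparison, the unweighted mean of the five prices is 3.07, which overstates the cost per pound because the largest purchase was made at the lowest price.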
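Population mean for grouped data:
$$\mu_g = \frac{\sum_{k=1}^{m} F_k M_k}{\sum_{k=1}^{m} F_k} = \frac{\sum_{k=1}^{m} F_k M_k}{N},$$
where
$M_k$: the midpoint for class k,
$F_k$: the frequency for class k in the population,
$N = \sum_{k=1}^{m} F_k$: the population size.
Sample mean for grouped data:
$$\bar{x}_g = \frac{\sum_{k=1}^{m} f_k M_k}{\sum_{k=1}^{m} f_k} = \frac{\sum_{k=1}^{m} f_k M_k}{n},$$
where
$f_k$: the frequency for class k in the sample,
$n = \sum_{k=1}^{m} f_k$: the sample size.
Sample variance for grouped data:
$$s_g^2 = \frac{\sum_{k=1}^{m} f_k (M_k - \bar{x}_g)^2}{n - 1} = \frac{\sum_{k=1}^{m} f_k M_k^2 - n\bar{x}_g^2}{n - 1}$$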
Example 2:
The following is the frequency distribution of the time in days required to complete
year-end audits:
Audit Time (days)   Frequency
10-14                   4
15-19                   8
20-24                   5
25-29                   2
30-34                   1
What are the mean and the variance of the audit time?
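[solutions:]
$$f_1 = 4,\; f_2 = 8,\; f_3 = 5,\; f_4 = 2,\; f_5 = 1,$$
$$n = f_1 + f_2 + f_3 + f_4 + f_5 = 4 + 8 + 5 + 2 + 1 = 20$$
and
$$M_1 = 12,\; M_2 = 17,\; M_3 = 22,\; M_4 = 27,\; M_5 = 32.$$
Thus,
$$\bar{x}_g = \frac{\sum_{i=1}^{5} f_i M_i}{\sum_{i=1}^{5} f_i} = \frac{4 \times 12 + 8 \times 17 + 5 \times 22 + 2 \times 27 + 1 \times 32}{4 + 8 + 5 + 2 + 1} = 19$$
and
$$s_g^2 = \frac{\sum_{i=1}^{5} f_i (M_i - \bar{x}_g)^2}{n - 1} = \frac{4(12 - 19)^2 + 8(17 - 19)^2 + 5(22 - 19)^2 + 2(27 - 19)^2 + 1(32 - 19)^2}{20 - 1} = 30$$
Online Exercise:
Exercise 3.5.1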
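The grouped-data calculation for the audit times can be checked with a few lines. This is a sketch with illustrative variable names; each class is represented by its midpoint.

```python
# Class limits and frequencies from the audit-time frequency distribution.
classes = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
freqs = [4, 8, 5, 2, 1]

midpoints = [(lower + upper) / 2 for lower, upper in classes]  # M_k
n = sum(freqs)                                                 # sample size

grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
grouped_var = sum(f * (m - grouped_mean) ** 2 for f, m in zip(freqs, midpoints)) / (n - 1)

print(grouped_mean, grouped_var)
```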
They are:
1. Crosstabulations
Example:
Objective: explore the association of the quality and the price for the
restaurants in the Los Angeles area.
The following table is the crosstabulation of the quality rating (good, very good
and excellent) and the meal price ($10-19, $20-29, $30-39, and $40-49) data
collected for a sample of 300 restaurants located in the Los Angeles area.
                              Meal Price
Quality Rating   $10-19   $20-29   $30-39   $40-49   Total
Good                42       40        2        0      84
Very Good           34       64       46        6     150
Excellent            2       14       28       22      66
Total               78      118       76       28     300
The above crosstabulation provides insight about the relationship between the
variables quality rating and meal price. Higher meal prices appear to be
associated with the higher quality restaurants and lower meal prices appear to be
associated with the lower quality restaurants. For example, for the most expensive
restaurants ($40-49), none of these restaurants is rated the lowest quality but most of
them are rated the highest quality. On the other hand, for the least expensive restaurants
($10-19), only 2 of these restaurants are rated the highest quality (2/78 = 2.56%) but
over half of them are rated the lowest quality.
2. Scatter Diagram
Suppose we have the following scatter diagrams for the weights and heights of the
students:
[Figure: three scatter diagrams (left, middle, right) of height against weight; weight axis from 50 to 80]
The left scatter diagram indicates a positive relationship between weight and
height, while the right scatter diagram implies a negative relationship between
the two variables. The middle scatter diagram shows that there is no apparent
relationship between weight and height.
Online Exercise:
Exercise 4.1.1
There are several numerical measures of association. We first introduce the covariance
of two variables.
(I) Covariance:
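Suppose we have two populations,
population 1: $y_1, y_2, \ldots, y_N$ and population 2: $w_1, w_2, \ldots, w_N$.
Also, let
sample 1: $x_1, x_2, \ldots, x_n$ and sample 2: $z_1, z_2, \ldots, z_n$
be drawn from population 1 and population 2, respectively, and let
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}$$
be the sample means of samples 1 and 2, respectively.
The population covariance is
$$\sigma_{yw} = \frac{\sum_{i=1}^{N} (y_i - \mu_y)(w_i - \mu_w)}{N},$$
where $\mu_y$ and $\mu_w$ are the means of populations 1 and 2,
while the sample covariance is
$$s_{xz} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(z_i - \bar{z})}{n - 1} = \frac{\sum_{i=1}^{n} x_i z_i - n\bar{x}\bar{z}}{n - 1}.$$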
Example:
Let $x_i$ be the total money spent on advertisement for some product and $z_i$ be the
sales volume (1 unit = 1000 packs).
xi                                     2    5    1    3    4    1    5    3    4    2
zi                                    50   57   41   54   54   38   63   48   59   46
$(x_i - \bar{x})(z_i - \bar{z})$       1   12   20    0    3   26   24    0    8    5
$$s_{xz} = \frac{\sum_{i=1}^{10} (x_i - \bar{x})(z_i - \bar{z})}{10 - 1} = \frac{99}{10 - 1} = 11.$$
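Example (continued):
$$s_x = \sqrt{\frac{\sum_{i=1}^{10} (x_i - \bar{x})^2}{10 - 1}} = 1.4907 \quad \text{and} \quad s_z = \sqrt{\frac{\sum_{i=1}^{10} (z_i - \bar{z})^2}{10 - 1}} = 7.9303$$
Then,
$$r_{xz} = \frac{s_{xz}}{s_x s_z} = \frac{\sum_{i=1}^{10} (x_i - \bar{x})(z_i - \bar{z})}{\sqrt{\sum_{i=1}^{10} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{10} (z_i - \bar{z})^2}} = 0.93.$$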
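The covariance and correlation for the advertising data can be reproduced as follows. The function names are illustrative; the formulas are the sample versions with the n - 1 divisor.

```python
from math import sqrt


def sample_covariance(x, z):
    """Sum of (x_i - mean x)(z_i - mean z), divided by n - 1."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    return sum((a - mx) * (b - mz) for a, b in zip(x, z)) / (n - 1)


def sample_correlation(x, z):
    """Sample covariance divided by the product of the sample standard deviations."""
    s_x = sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_z = sqrt(sample_covariance(z, z))
    return sample_covariance(x, z) / (s_x * s_z)


advertising = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
sales = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

print(sample_covariance(advertising, sales))
print(round(sample_correlation(advertising, sales), 2))
```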
Example:
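Let $z_i = 2x_i$, $i = 1, 2, 3, 4, 5$.
xi   1   2   3   4   5
zi   2   4   6   8   10
Then,
$$\bar{x} = 3, \quad \bar{z} = 6, \quad s_x = \sqrt{\frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{5 - 1}} = \sqrt{\frac{5}{2}}, \quad s_z = \sqrt{\frac{\sum_{i=1}^{5} (z_i - \bar{z})^2}{5 - 1}} = \sqrt{10},$$
$$s_{xz} = \frac{\sum_{i=1}^{5} (x_i - \bar{x})(z_i - \bar{z})}{5 - 1} = 5.$$
Thus,
$$r_{xz} = \frac{s_{xz}}{s_x s_z} = \frac{5}{\sqrt{\frac{5}{2}}\,\sqrt{10}} = 1.$$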
Online Exercise:
Exercise 4.2.1
Chapter 5 Introduction to Probability
Experiment Outcomes
Toss a coin Head, Tail
Roll a dice 1, 2, 3, 4, 5, 6
Play a football game Win, Lose, Tie
Rain tomorrow Rain, No rain
Example:
Example:
Step 1 (throw die)   Step 2 (throw coin)   Experimental Outcomes
1                    T, H                  (1,T), (1,H)
2                    T, H                  (2,T), (2,H)
3                    T, H                  (3,T), (3,H)
4                    T, H                  (4,T), (4,H)
5                    T, H                  (5,T), (5,H)
6                    T, H                  (6,T), (6,H)
$$S = \{(1,T), (1,H), (2,T), (2,H), (3,T), (3,H), (4,T), (4,H), (5,T), (5,H), (6,T), (6,H)\}$$
The total number of experimental outcomes = 12 = 6 × 2.
2. Permutations:
n objects are to be selected from a set of N objects, where the order is
important.
Example:
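Example:
With N = 5 and n = 3, there are 5 choices for the first object, 4 for the second and
3 for the third, so the number of permutations is 5 × 4 × 3 = 60, that is, N(N-1)...(N-n+1).
Example:
In general, there are N choices for the first object, N-1 for the second, and so on down
to N-(n-1) for the n-th, so
$$N(N-1)(N-2)\cdots[N-(n-1)] = \frac{N!}{(N-n)!}.$$
Counting rule for permutations:
When n objects are taken from N objects, the total number of
permutations is given by
$$P_n^N = (N-n+1)(N-n+2)\cdots N = \frac{N!}{(N-n)!},$$
where $N! = 1 \times 2 \times 3 \times \cdots \times N$ and $0! = 1$.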
3. Combinations:
n objects are to be selected from a set of N objects, where the order
is not important.
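Example:
With N = 5 and n = 2,
$$P_2^5 = 5 \times 4 = 20, \qquad C_2^5 = \frac{P_2^5}{2!} = \frac{20}{2} = 10,$$
that is, 20 permutations and 10 combinations.
Example:
With N = 5 and n = 3,
$$P_3^5 = 5 \times 4 \times 3 = 60, \qquad C_3^5 = \frac{P_3^5}{3!} = \frac{60}{6} = 10.$$
Every 3! = 6 permutations correspond to 1 combination, for a total of 10 combinations.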
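Example:
In general, there are $P_n^N$ permutations, and each combination of n objects corresponds to
$$P_n^n = \frac{n!}{(n-n)!} = \frac{n!}{0!} = n!$$
permutations. Therefore, the number of combinations is
$$C_n^N = \frac{P_n^N}{n!} = \frac{N!}{n!\,(N-n)!}.$$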
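The two counting rules can be checked numerically. The helper functions below are illustrative and written directly from the factorial formulas; they are compared against `math.perm` and `math.comb` from the Python standard library (available from Python 3.8).

```python
from math import comb, factorial, perm


def permutations_count(N, n):
    """Number of ordered selections of n objects from N: N! / (N - n)!."""
    return factorial(N) // factorial(N - n)


def combinations_count(N, n):
    """Number of unordered selections: each combination covers n! permutations."""
    return permutations_count(N, n) // factorial(n)


print(permutations_count(5, 2), combinations_count(5, 2))
print(permutations_count(5, 3), combinations_count(5, 3))
# The library functions agree with the formulas.
print(perm(5, 3), comb(5, 3))
```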
5.2. Events and Their Probability
Modern probability theory: a probability value that expresses our degree of belief that
the experimental outcome will occur is specified.
Example:
Example:
Example:
$$P(E_1) = P(\{2,4,6\}) = P(e_2) + P(e_4) + P(e_6) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}.$$
Note: $P(S) = 1$
Online Exercise:
Exercise 5.2.1
Example:
Example:
Results:
1. $P(A^c) = 1 - P(A)$
2. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
[Figure: Venn diagram of events A and B and their intersection]
Example:
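1. $P(E_2) = P(\{1,3,5\}) = P(\{2,4,6\}^c) = P(E_1^c) = 1 - P(E_1) = 1 - \frac{1}{2} = \frac{1}{2}$
2. $P(E_1 \cap E_2) = 0$, $P(E_1 \cup E_2) = P(E_1) + P(E_2) = \frac{1}{2} + \frac{1}{2} = 1$
3. $P(E_1 \cup E_3) = P(\{1,2,3,4,6\}) = \frac{5}{6}$. We can also use the addition law; then
$$P(E_1 \cup E_3) = P(E_1) + P(E_3) - P(E_1 \cap E_3) = P(\{2,4,6\}) + P(\{1,2,3\}) - P(\{2\}) = \frac{1}{2} + \frac{1}{2} - \frac{1}{6} = \frac{5}{6}$$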
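These results can be verified by treating events as Python sets over the six equally likely outcomes of a fair die. A minimal sketch with an illustrative helper `prob`; exact fractions avoid rounding.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of one throw of a fair die
E1, E2, E3 = {2, 4, 6}, {1, 3, 5}, {1, 2, 3}


def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))


print(prob(S - E1))                              # complement rule
print(prob(E1 | E3))                             # union counted directly
print(prob(E1) + prob(E3) - prob(E1 & E3))       # union via the addition law
```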
Online Exercise:
Exercise 5.3.1
Exercise 5.3.2
5.4. Conditional Probability
occurred.
Example:
Example:
$$P(\{2\} \mid E_1) = \frac{P(E_1 \cap \{2\})}{P(E_1)} = \frac{P(\{2\})}{P(E_1)} = \frac{1/6}{1/2} = \frac{1}{3}$$
Note: $P(A \mid B) + P(A^c \mid B) = 1$
Note: $P(A \cap B) = P(B) P(A \mid B) = P(A) P(B \mid A)$
Independent Events:
$$P(A \mid B) = P(A)$$
$$P(B \mid A) = P(B).$$
Dependent Events:
$$P(A \mid B) \neq P(A)$$
$$P(B \mid A) \neq P(B).$$
Intuitively, if events A and B are independent, then the chance of event A occurring is
the same no matter whether event B has occurred. That is, event A occurring is
independent of event B occurring. On the other hand, if events A and B are
dependent, then the chance of event A occurring given that event B has occurred will
be different from the one with event B not occurring.
Example:
The above result implies that the chance of a promotion given that the candidate is male
is twice as high as the chance given that the candidate is female. In addition, the chance of
a promotion given that the candidate is female (0.15) is much lower than the
overall promotion rate (0.27). That is, the promotion event A is dependent on the
gender event M or W.
A promotion is related to gender.
Note: $P(A \cap B) = P(A) P(B)$ when events A and B are independent.
Online Exercise:
Exercise 5.4.1
Exercise 5.4.2
5.5. Bayes' Theorem
Example 1:
Example 2:
finance assessment for the company in this year.
To find the required probability in the above two examples, the following Bayes'
theorem can be used.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(A^c) P(B \mid A^c)}$$
[Figure: sample space split into A and A^c, with B split into B ∩ A and B ∩ A^c]
We want to know $P(A \mid B) = \frac{P(B \cap A)}{P(B)}$. Since
$$P(B \cap A) = P(A) P(B \mid A),$$
and
$$P(B) = P(B \cap A) + P(B \cap A^c) = P(A) P(B \mid A) + P(A^c) P(B \mid A^c),$$
thus,
$$P(A \mid B) = \frac{P(B \cap A)}{P(B)} = \frac{P(B \cap A)}{P(B \cap A) + P(B \cap A^c)} = \frac{P(A) P(B \mid A)}{P(A) P(B \mid A) + P(A^c) P(B \mid A^c)}$$
Example 1:
A patient who tests positive still has a high probability (0.7519) of not having AIDS.
Bayes' Theorem (general):
Let $A_1, A_2, \ldots, A_n$ be mutually exclusive events and
$$A_1 \cup A_2 \cup \cdots \cup A_n = S,$$
then
$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i) P(B \mid A_i)}{P(A_1) P(B \mid A_1) + P(A_2) P(B \mid A_2) + \cdots + P(A_n) P(B \mid A_n)},$$
$i = 1, 2, \ldots, n$.
[Figure: sample space partitioned into A_1, A_2, ..., A_n, with B split into B ∩ A_1, B ∩ A_2, ..., B ∩ A_n]
Since
$$P(B \cap A_i) = P(A_i) P(B \mid A_i),$$
and
$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) = P(A_1) P(B \mid A_1) + P(A_2) P(B \mid A_2) + \cdots + P(A_n) P(B \mid A_n),$$
thus,
$$P(A_i \mid B) = \frac{P(B \cap A_i)}{P(B)} = \frac{P(A_i) P(B \mid A_i)}{P(A_1) P(B \mid A_1) + P(A_2) P(B \mid A_2) + \cdots + P(A_n) P(B \mid A_n)}$$
Example 2:
$$P(A_1 \mid B_1) = \frac{P(A_1) P(B_1 \mid A_1)}{P(A_1) P(B_1 \mid A_1) + P(A_2) P(B_1 \mid A_2) + P(A_3) P(B_1 \mid A_3)} = \frac{0.5 \times 0.9}{0.5 \times 0.9 + 0.2 \times 0.05 + 0.3 \times 0.05} = 0.95$$
A company with a good finance assessment has a very high probability (0.95) of
a good financial situation in the coming year.
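The general form of Bayes' theorem translates into a few lines of code. The function name `posterior` is an illustrative choice; the priors and conditional probabilities are the three values used in the calculation above.

```python
def posterior(priors, likelihoods, i):
    """P(A_i | B) from the priors P(A_k) and the conditional probabilities P(B | A_k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B and A_k)
    return joint[i] / sum(joint)                          # divide by P(B)


priors = [0.5, 0.2, 0.3]         # P(A_1), P(A_2), P(A_3)
likelihoods = [0.9, 0.05, 0.05]  # P(B_1 | A_1), P(B_1 | A_2), P(B_1 | A_3)

p_a1_given_b1 = posterior(priors, likelihoods, 0)
print(round(p_a1_given_b1, 2))
```

The denominator `sum(joint)` is P(B), so the posteriors over all i always sum to 1.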
Online Exercise:
Exercise 5.5.1
Example:
Outcome   X (tokens)   Y (money)
Win            3           30
Lose          -4          -40
Tie            0            0
In this example, the sample space is S = {Win, Lose, Tie}, containing 3 outcomes. X
is the quantity representing the tokens won or lost under each result, while Y is
the one representing the money won or lost.
the experimental outcome. A formal definition for these numerical quantities is in the
following.
Example:
Note that
Y = 10X,
since
Example:
Online Exercise:
Exercise 6.1.1
Example:
Let $f_x(x)$ be some function corresponding to the probability of the gambling
outcomes for random variable X, defined as
$$f_x(3) = P(X = 3) = \frac{1}{6}$$
$$f_x(-4) = P(X = -4) = \frac{2}{3}$$
$$f_x(0) = P(X = 0) = \frac{1}{6}$$
$f_x(x)$ is referred to as the probability distribution of random variable X.
Similarly, the probability distribution $f_y(x)$ of random variable Y is
$$f_y(30) = P(Y = 30) = \frac{1}{6}$$
$$f_y(-40) = P(Y = -40) = \frac{2}{3}$$
$$f_y(0) = P(Y = 0) = \frac{1}{6}$$
Required conditions for a discrete probability distribution:
Let $a_1, a_2, \ldots, a_n, \ldots$ be all the possible values of the discrete random
variable X. Then, the required conditions for $f_x(x)$ to be the discrete
probability distribution for X are
(a) $f_x(a_i) \ge 0$, for every i.
(b) $\sum_i f_x(a_i) = f_x(a_1) + f_x(a_2) + \cdots + f_x(a_n) + \cdots = 1$
Example:
In the gambling example, $f_x(x)$ is a discrete probability distribution for the random
variable X since
(a) $f_x(3) \ge 0$, $f_x(-4) \ge 0$, and $f_x(0) \ge 0$.
(b) $f_x(3) + f_x(-4) + f_x(0) = 1$.
Similarly, $f_y(x)$ is also a discrete probability distribution for the random variable Y.
numerical value since there are uncountable number of values in an interval. Instead,
the probability can be assigned to a small interval. The probability density function
can describe how the probability distributes in the small interval.
Example:
In the flight delay time example, suppose the probability of being late by at most 0.5
hour is twice the probability of being late by more than 0.5 hour, i.e.,
$$P(0 \le Z \le 0.5) = \frac{2}{3} \quad \text{and} \quad P(0.5 < Z \le 1) = \frac{1}{3}.$$
Then, the probability density function $f_1(x)$ for the random variable Z is
$$f_1(x) = \frac{4}{3}, \; 0 \le x \le 0.5; \qquad f_1(x) = \frac{2}{3}, \; 0.5 < x \le 1.$$
[Figure: plot of f1(x), with vertical axis from 0.0 to 1.5]
The area corresponding to the interval is the probability of the random variable Z
taking values in this interval. For example, the probability of the flight being late
by at most 0.5 hour (the random variable Z taking a value in the interval [0,0.5]) is
$$P(\text{the flight being late by at most 0.5 hour}) = P(0 \le Z \le 0.5) = \int_0^{0.5} f_1(x)\,dx = \frac{4}{3} \times 0.5 = \frac{2}{3}.$$
Similarly, the probability of the flight being late by more than 0.5 hour (the random
variable Z taking a value in the interval (0.5,1]) is
$$P(\text{the flight being late by more than 0.5 hour}) = P(0.5 < Z \le 1) = \int_{0.5}^{1} f_1(x)\,dx = \frac{2}{3} \times 0.5 = \frac{1}{3}.$$
On the other hand, if the probability of being late by at most 0.5 hour is the same as the
probability of being late by more than 0.5 hour, i.e.,
$$P(0 \le Z \le 0.5) = P(0.5 < Z \le 1) = \frac{1}{2},$$
then the probability density function $f_2(x)$ for the random variable Z is
$$f_2(x) = 1, \quad 0 \le x \le 1.$$
Note that the probability density function corresponds to the probability of the random
variable taking values in some interval. However, unlike the probability distribution,
the probability density function evaluated at some value cannot be used to
describe the probability of the random variable Z taking this value.
(b) f ( x)dx 1
a
d
Example:
In the flight time example, $f_1(x)$ is a probability density function for the random
variable Z since
(a) $f_1(x) \ge 0$, $0 \le x \le 1$.
(b) $\int_0^{0.5} \frac{4}{3}\,dx + \int_{0.5}^{1} \frac{2}{3}\,dx = 1$.
Online Exercise:
Exercise 6.2.1
(I): Discrete Random Variable:
Example:
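X: the random variable representing the number of points obtained when throwing a fair die. Then,
$$P(X = i) = f_x(i) = \frac{1}{6}, \quad i = 1, 2, 3, 4, 5, 6.$$
Intuitively, the average number of points when throwing a fair die is
$$\frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.$$
The expected value of the random variable X is just this average,
$$E(X) = \sum_{i=1}^{6} i f_x(i) = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5 = \text{average point}.$$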
Example:
$$E(Y) = 30 f_y(30) + (-40) f_y(-40) + 0 \cdot f_y(0) = 30 \cdot \frac{1}{6} + (-40) \cdot \frac{2}{3} + 0 \cdot \frac{1}{6} = -\frac{130}{6}.$$
(b) Variance:
Example:
Suppose we want to measure the variation of the random variable X in the die
example. Then, the squared distances between the values of X and its mean E(X) = 3.5,
i.e., $(1 - 3.5)^2, (2 - 3.5)^2, (3 - 3.5)^2, (4 - 3.5)^2, (5 - 3.5)^2, (6 - 3.5)^2$,
can be used. The average squared distance is
$$\frac{(1 - 3.5)^2 + (2 - 3.5)^2 + (3 - 3.5)^2 + (4 - 3.5)^2 + (5 - 3.5)^2 + (6 - 3.5)^2}{6} = \frac{8.75}{3}.$$
Intuitively, a large average squared distance implies the values of X scatter widely.
The variance of the random variable X is just the average squared distance (the
expected value of the squared distance). The variance for the die example is
$$Var(X) = E[X - E(X)]^2 = E(X - 3.5)^2 = \sum_{i=1}^{6} (i - 3.5)^2 f(i)$$
$$= (1 - 3.5)^2 \cdot \frac{1}{6} + (2 - 3.5)^2 \cdot \frac{1}{6} + (3 - 3.5)^2 \cdot \frac{1}{6} + (4 - 3.5)^2 \cdot \frac{1}{6} + (5 - 3.5)^2 \cdot \frac{1}{6} + (6 - 3.5)^2 \cdot \frac{1}{6}$$
$$= \frac{8.75}{3} = \text{the average squared distance}$$
Example:
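$$Var(X) = \left(3 + \frac{13}{6}\right)^2 f_x(3) + \left(-4 + \frac{13}{6}\right)^2 f_x(-4) + \left(0 + \frac{13}{6}\right)^2 f_x(0)$$
$$= \left(\frac{31}{6}\right)^2 \cdot \frac{1}{6} + \left(-\frac{11}{6}\right)^2 \cdot \frac{2}{3} + \left(\frac{13}{6}\right)^2 \cdot \frac{1}{6} = 7.472.$$
Similarly, the variance of the random variable Y is
$$Var(Y) = \left(30 + \frac{130}{6}\right)^2 f_y(30) + \left(-40 + \frac{130}{6}\right)^2 f_y(-40) + \left(0 + \frac{130}{6}\right)^2 f_y(0)$$
$$= \left(\frac{310}{6}\right)^2 \cdot \frac{1}{6} + \left(-\frac{110}{6}\right)^2 \cdot \frac{2}{3} + \left(\frac{130}{6}\right)^2 \cdot \frac{1}{6} = 747.2$$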
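The expected values and variances of X and Y in the gambling example can be checked with exact fractions. The dictionary representation and the function names below are illustrative choices.

```python
from fractions import Fraction as F

# Probability distributions from the gambling example: value -> probability.
f_x = {3: F(1, 6), -4: F(2, 3), 0: F(1, 6)}
f_y = {10 * x: p for x, p in f_x.items()}  # Y = 10 X


def expected_value(dist):
    """E(X): sum of value times probability."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X): expected squared distance from the mean."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


print(expected_value(f_x), float(variance(f_x)))
print(expected_value(f_y), float(variance(f_y)))
```

Because Y = 10X, the mean is multiplied by 10 and the variance by 100, which matches 7.472 and 747.2 above.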
Example:
Z: the random variable representing the flight delay time, taking values in [0,1], with
$$P(0 \le Z \le 0.5) = P(0.5 < Z \le 1) = \frac{1}{2}.$$
The expected value of the random variable Z is just the average delay time,
$$E(Z) = \int_0^1 x f_2(x)\,dx = \int_0^1 x\,dx = \left.\frac{x^2}{2}\right|_0^1 = 0.5 = \text{average delay time}.$$
$$E(X) = \int_a^b x f(x)\,dx.$$
Example:
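In the flight time example, suppose the probability density function for Z is
$$f_1(x) = \frac{4}{3}, \; 0 \le x \le 0.5; \qquad f_1(x) = \frac{2}{3}, \; 0.5 < x \le 1.$$
Then, the expected value of the random variable Z is
$$E(Z) = \int_0^1 x f_1(x)\,dx = \int_0^{0.5} x \cdot \frac{4}{3}\,dx + \int_{0.5}^{1} x \cdot \frac{2}{3}\,dx = \left.\frac{x^2}{2} \cdot \frac{4}{3}\right|_0^{0.5} + \left.\frac{x^2}{2} \cdot \frac{2}{3}\right|_{0.5}^{1}$$
$$= \frac{0.5^2}{2} \cdot \frac{4}{3} - \frac{0^2}{2} \cdot \frac{4}{3} + \frac{1^2}{2} \cdot \frac{2}{3} - \frac{0.5^2}{2} \cdot \frac{2}{3} = \frac{5}{12}.$$
Therefore, on average, the flight delay time is $\frac{5}{12}$ hour.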
(b) Variance:
Example:
Suppose we want to measure the variation of the random variable Z in the flight time
example. Suppose $f_2(x)$ is the probability density function for Z. Then, the squared
distance between the values of Z and its mean $E(Z) = \frac{1}{2}$, i.e.,
$\left(x - \frac{1}{2}\right)^2$, $0 \le x \le 1$, can be used. The average squared distance is
$$E\left(Z - \frac{1}{2}\right)^2 = \int_0^1 \left(x - \frac{1}{2}\right)^2 f_2(x)\,dx = \left.\left(\frac{x^3}{3} - \frac{x^2}{2} + \frac{x}{4}\right)\right|_0^1 = \left(\frac{1}{3} - \frac{1}{2} + \frac{1}{4}\right) - \left(\frac{0}{3} - \frac{0}{2} + \frac{0}{4}\right) = \frac{1}{12}.$$
The variance of the random variable Z is just the average squared distance (the
expected value of the squared distance). The variance for the flight time example is
$$Var(Z) = E[Z - E(Z)]^2 = E\left(Z - \frac{1}{2}\right)^2 = \frac{1}{12} = \text{the average squared distance}.$$
$$Var(X) = \sigma^2 = E[X - E(X)]^2 = \int_a^b (x - \mu)^2 f(x)\,dx$$
Example:
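In the flight time example, suppose $f_1(x)$ is the probability density function for Z.
Then, the variance of the random variable Z is
$$Var(Z) = E[Z - E(Z)]^2 = \int_0^1 \left(x - \frac{5}{12}\right)^2 f_1(x)\,dx = \int_0^{0.5} \left(x - \frac{5}{12}\right)^2 \frac{4}{3}\,dx + \int_{0.5}^{1} \left(x - \frac{5}{12}\right)^2 \frac{2}{3}\,dx$$
$$= \left.\left(\frac{x^3}{3} - \frac{5x^2}{12} + \frac{25x}{144}\right) \frac{4}{3}\right|_0^{0.5} + \left.\left(\frac{x^3}{3} - \frac{5x^2}{12} + \frac{25x}{144}\right) \frac{2}{3}\right|_{0.5}^{1} = \frac{11}{144}$$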
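For a piecewise-constant density such as $f_1$ or $f_2$, the integrals reduce to sums of terms of the form $c\,(b^{k+1} - a^{k+1})/(k+1)$, so the results can be checked exactly. This sketch uses the standard identity $Var(Z) = E(Z^2) - [E(Z)]^2$, which is not derived in these notes, and an illustrative helper `moment`.

```python
from fractions import Fraction as F

# Piecewise-constant densities as lists of (left end, right end, density value).
f1 = [(F(0), F(1, 2), F(4, 3)), (F(1, 2), F(1), F(2, 3))]
f2 = [(F(0), F(1), F(1))]


def moment(pieces, k):
    """Integral of x**k * f(x) over all pieces, computed exactly."""
    return sum(c * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for a, b, c in pieces)


for name, pdf in (("f1", f1), ("f2", f2)):
    total = moment(pdf, 0)            # must equal 1 for a density
    mean = moment(pdf, 1)             # E(Z)
    var = moment(pdf, 2) - mean ** 2  # E(Z^2) - E(Z)^2
    print(name, total, mean, var)
```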
Online Exercise:
Exercise 6.3.1
Exercise 6.3.2
Example:
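H: head, T: tail. Let $X_n$ be the number of heads in n tosses of a fair coin.
Two tosses:
$X_2 = 0$: T T; $P(X_2 = 0) = \frac{1}{2} \cdot \frac{1}{2} = \binom{2}{0}\left(\frac{1}{2}\right)^2$ (1 combination)
$X_2 = 1$: H T, T H; $P(X_2 = 1) = 2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \binom{2}{1}\left(\frac{1}{2}\right)^2$ (2 combinations)
$X_2 = 2$: H H; $P(X_2 = 2) = \frac{1}{2} \cdot \frac{1}{2} = \binom{2}{2}\left(\frac{1}{2}\right)^2$ (1 combination)
$$P(X_2 = i) = f_2(i) = \binom{2}{i}\left(\frac{1}{2}\right)^2 = (\text{number of combinations}) \times (\text{the probability of every combination}), \quad i = 0, 1, 2.$$
Three tosses:
$X_3 = 0$: T T T; $P(X_3 = 0) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \binom{3}{0}\left(\frac{1}{2}\right)^3$ (1 combination)
$X_3 = 1$: H T T, T H T, T T H; $P(X_3 = 1) = 3 \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \binom{3}{1}\left(\frac{1}{2}\right)^3$ (3 combinations)
$X_3 = 2$: H H T, H T H, T H H; $P(X_3 = 2) = 3 \cdot \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \binom{3}{2}\left(\frac{1}{2}\right)^3$ (3 combinations)
$X_3 = 3$: H H H; $P(X_3 = 3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \binom{3}{3}\left(\frac{1}{2}\right)^3$ (1 combination)
$$P(X_3 = i) = f_3(i) = \binom{3}{i}\left(\frac{1}{2}\right)^3 = (\text{number of combinations}) \times (\text{the probability of every combination}), \quad i = 0, 1, 2, 3.$$
Then, for n tosses:
$X_n = 0$: T T ... T; $P(X_n = 0) = \left(\frac{1}{2}\right)^n = \binom{n}{0}\left(\frac{1}{2}\right)^n$ (1 combination)
$X_n = 1$: H T ... T, T H ... T, ..., T T ... H; $P(X_n = 1) = n\left(\frac{1}{2}\right)^n = \binom{n}{1}\left(\frac{1}{2}\right)^n$ (n combinations)
$$P(X_n = i) = f_n(i) = \binom{n}{i}\left(\frac{1}{2}\right)^n = (\text{number of combinations}) \times (\text{the probability of every combination})$$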
Example:
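S: Success, F: Failure
Suppose the probability of success is $\frac{1}{3}$ while the probability of failure is $\frac{2}{3}$. Let $Z_n$ be the number of successes in n trials.
Then,
$Z_3 = 0$: F F F; $P(Z_3 = 0) = \left(\frac{1}{3}\right)^0\left(\frac{2}{3}\right)^3 = \binom{3}{0}\left(\frac{1}{3}\right)^0\left(\frac{2}{3}\right)^3$ (1 combination)
$Z_3 = 1$: S F F, F S F, F F S; $P(Z_3 = 1) = 3 \cdot \frac{1}{3}\left(\frac{2}{3}\right)^2 = \binom{3}{1}\frac{1}{3}\left(\frac{2}{3}\right)^2$ (3 combinations)
$Z_3 = 2$: S S F, S F S, F S S; $P(Z_3 = 2) = 3 \cdot \left(\frac{1}{3}\right)^2\frac{2}{3} = \binom{3}{2}\left(\frac{1}{3}\right)^2\frac{2}{3}$ (3 combinations)
$Z_3 = 3$: S S S; $P(Z_3 = 3) = \left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^0 = \binom{3}{3}\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^0$ (1 combination)
$$P(Z_3 = i) = f_3(i) = \binom{3}{i}\left(\frac{1}{3}\right)^i\left(\frac{2}{3}\right)^{3-i} = (\text{number of combinations}) \times (\text{the probability of every combination}), \quad i = 0, 1, 2, 3.$$
Then, for n trials:
$Z_n = 0$: F F ... F; $P(Z_n = 0) = \left(\frac{2}{3}\right)^n = \binom{n}{0}\left(\frac{1}{3}\right)^0\left(\frac{2}{3}\right)^n$ (1 combination)
$Z_n = 1$: S F ... F, F S ... F, ..., F F ... S; $P(Z_n = 1) = n \cdot \frac{1}{3}\left(\frac{2}{3}\right)^{n-1} = \binom{n}{1}\frac{1}{3}\left(\frac{2}{3}\right)^{n-1}$ (n combinations)
$$P(Z_n = i) = f_n(i) = \binom{n}{i}\left(\frac{1}{3}\right)^i\left(\frac{2}{3}\right)^{n-i} = (\text{number of combinations}) \times (\text{the probability of every combination})$$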
From the above example, we readily describe the binomial experiment.
[Derivation:]
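$$E(X) = \sum_{i=0}^{n} i f(i) = \sum_{i=0}^{n} i \binom{n}{i} p^i (1-p)^{n-i} = \sum_{i=0}^{n} i \frac{n!}{i!\,(n-i)!} p^i (1-p)^{n-i}$$
$$= \sum_{i=1}^{n} i \frac{n!}{i!\,(n-i)!} p^i (1-p)^{n-i} = \sum_{i=1}^{n} \frac{n!}{(i-1)!\,(n-i)!} p^i (1-p)^{n-i}$$
$$= np \sum_{i=1}^{n} \frac{(n-1)!}{(i-1)!\,[(n-1)-(i-1)]!} p^{i-1} (1-p)^{(n-1)-(i-1)}$$
$$= np \sum_{j=0}^{n-1} \frac{(n-1)!}{j!\,(n-1-j)!} p^j (1-p)^{n-1-j} \quad (j = i - 1)$$
$$= np$$
(since $\frac{(n-1)!}{j!\,(n-1-j)!} p^j (1-p)^{n-1-j}$ is the probability
distribution of a binomial random variable over n - 1
trials, so the sum equals 1)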
The derivation of $Var(X) = np(1 - p)$ is left as an exercise.
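Both properties can be checked numerically for a particular case before attempting the derivation. This sketch uses n = 3 and p = 1/3 from the success/failure example; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(i, n, p):
    """(number of combinations) x (probability of every combination)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)


n, p = 3, Fraction(1, 3)
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]

mean = sum(i * f for i, f in enumerate(pmf))
var = sum((i - mean) ** 2 * f for i, f in enumerate(pmf))
print(pmf, mean, var)
```

A numerical check for one choice of n and p is not a proof, but it is a quick way to catch an algebra slip in the derivation.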
Online Exercise:
Exercise 7.1.1
Exercise 7.1.2
Properties of Poisson Probability Distribution:
If a random variable X has the Poisson probability distribution $f(x)$
with parameter $\mu$, then
$$E(X) = \mu = \text{the expected number of occurrences}$$
and
$$Var(X) = \mu.$$
The derivations of the above properties are similar to the ones for the binomial
random variable and are left as exercises.
Example:
Suppose the average number of car accidents on the highway in one day is 4. What is the probability of no car accident in one day? What is the probability of 1 car accident in two days?
[solution:]
It is sensible to use a Poisson random variable to represent the number of car accidents on the highway. Let $X$ represent the number of car accidents on the highway in one day. Then,
\[ P(X = i) = f_x(i) = \frac{e^{-4}\, 4^i}{i!}, \quad i = 0, 1, 2, \ldots \]
and
\[ E(X) = 4 . \]
Then,
\[ P(\text{no car accident in one day}) = P(X = 0) = f_x(0) = \frac{e^{-4}\, 4^0}{0!} = e^{-4} = 0.0183 . \]
Since the average number of car accidents in one day is 4, the average number of car accidents in two days is 8. Let $Y$ represent the number of car accidents in two days. Then,
\[ P(Y = i) = f_y(i) = \frac{e^{-8}\, 8^i}{i!}, \quad i = 0, 1, 2, \ldots \]
and
\[ E(Y) = 8 . \]
Then,
\[ P(\text{1 car accident in two days}) = P(Y = 1) = f_y(1) = \frac{e^{-8}\, 8^1}{1!} = 8e^{-8} = 0.0027 . \]
Example:
Suppose the average number of calls by 104 in one minute is 2. What is the probability of 10 calls in 5 minutes?
[solution]:
Since the average number of calls by 104 in one minute is 2, the average number of calls in 5 minutes is 10. Let $X$ represent the number of calls in 5 minutes. Then,
\[ P(X = i) = f_x(i) = \frac{e^{-10}\, 10^i}{i!}, \quad i = 0, 1, 2, \ldots \]
and
\[ E(X) = 10 . \]
Then,
\[ P(\text{10 calls in 5 minutes}) = P(X = 10) = f_x(10) = \frac{e^{-10}\, 10^{10}}{10!} = 0.1251 . \]
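The Poisson probabilities in the two examples above can be reproduced with a few lines of code. This is a minimal sketch; the helper name `poisson_pmf` is illustrative and not part of the notes.

```python
from math import exp, factorial


def poisson_pmf(i, mu):
    """P(X = i) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**i / factorial(i)


# No accident in one day (mean 4 per day)
print(round(poisson_pmf(0, 4), 4))
# Exactly 1 accident in two days (mean 8 per two days)
print(round(poisson_pmf(1, 8), 4))
# Exactly 10 calls in 5 minutes (mean 10 per 5 minutes)
print(round(poisson_pmf(10, 10), 4))
```

Note that the mean is always rescaled to the length of the time interval in question before the formula is applied.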
Online Exercise:
Exercise 7.2.1
Exercise 7.2.2
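Example:
Suppose there are 50 officers, 10 female officers and 40 male officers. Suppose 20 of them will be promoted. Let $X$ represent the number of female promotions. Then,
\[ P(X = 0) = \frac{\binom{10}{0}\binom{40}{20}}{\binom{50}{20}}, \qquad P(X = 1) = \frac{\binom{10}{1}\binom{40}{19}}{\binom{50}{20}}, \]
\[ \ldots, \qquad P(X = i) = \frac{\binom{10}{i}\binom{40}{20-i}}{\binom{50}{20}}, \qquad \ldots, \qquad P(X = 10) = \frac{\binom{10}{10}\binom{40}{10}}{\binom{50}{20}} . \]
That is,
\[ P(X = i) = \frac{\binom{10}{i}\binom{40}{20-i}}{\binom{50}{20}}, \quad i = 0, 1, \ldots, 10 . \]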
Note: $\binom{r}{i}$ is the number of combinations when selecting $i$ elements from group 1, while $\binom{N-r}{n-i}$ is the number of combinations when selecting $n-i$ elements from group 2. $\binom{N}{n}$ is the total number of combinations when selecting $n$ elements from the two groups, while $\binom{r}{i}\binom{N-r}{n-i}$ is the total number of combinations when selecting $i$ and $n-i$ elements from groups 1 and 2, respectively. In the example above, $N = 50$, $r = 10$ and $n = 20$.
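The promotion example can be evaluated numerically. The sketch below assumes N = 50 officers, r = 10 of them female and n = 20 promotions, as in the example; the function name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(i, N, r, n):
    """P(X = i): i elements from group 1 (size r), n - i from group 2 (size N - r)."""
    return Fraction(comb(r, i) * comb(N - r, n - i), comb(N, n))


N, r, n = 50, 10, 20
pmf = [hypergeom_pmf(i, N, r, n) for i in range(r + 1)]

for i, prob in enumerate(pmf):
    print(i, round(float(prob), 4))

# The probabilities add up to 1, and the mean equals n * r / N.
print(sum(pmf), sum(i * prob for i, prob in enumerate(pmf)))
```

Exact fractions make the check that the probabilities sum to 1 free of rounding error.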
Online Exercise:
Exercise 7.3.1
Chapter 8 Continuous Probability Density
Example:
$X$: the random variable representing the flight time from Taipei to Kaohsiung.
Suppose the flight time can be any value in the interval from 30 to 50 minutes. That is, $30 \le X \le 50$.
Question: suppose the probability of a flight time falling within any time interval is the same as the probability of it falling within any other time interval of the same length. Then, what density $f(x)$ is sensible for describing the probability?
Recall that the area under the graph of $f(x)$ corresponding to any interval is the probability of the random variable $X$ taking values in this interval. Since the probabilities of $X$ taking values in any intervals of equal length are the same, the areas under the graph of $f(x)$ corresponding to any intervals of equal length are the same. Thus, $f(x)$ must take the same constant value everywhere on $[30, 50]$. For example, for one-minute intervals,
\[ P(30 \le X \le 31) = \int_{30}^{31} f(x)\,dx = P(31 \le X \le 32) = \int_{31}^{32} f(x)\,dx = \cdots = P(49 \le X \le 50) = \int_{49}^{50} f(x)\,dx . \]
Therefore, we have
\[ f(x) = \frac{1}{20}, \quad 30 \le x \le 50; \qquad f(x) = 0, \quad \text{otherwise.} \]
In the above example, the probability density has the same value throughout the interval in which the random variable takes values. This probability density is referred to as the uniform probability density function.
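A random variable $X$ taking values in $[a, b]$ has the uniform probability density function $f(x)$ if
\[ f(x) = \frac{1}{b-a}, \quad a \le x \le b; \qquad f(x) = 0, \quad \text{otherwise.} \]
The graph of $f(x)$ is
[Figure: graph of $f(x)$ against $x$, constant at height $1/(b-a)$ for $a \le x \le b$.]
\[
E(X) = \int_a^b x f(x)\,dx = \int_a^b x\, \frac{1}{b-a}\,dx = \frac{1}{b-a} \cdot \frac{x^2}{2}\Big|_a^b = \frac{1}{b-a} \cdot \frac{b^2 - a^2}{2} = \frac{1}{b-a} \cdot \frac{(b-a)(b+a)}{2} = \frac{a+b}{2}
\]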
Example:
For the flight time above ($a = 30$, $b = 50$),
\[ E(X) = \frac{50 + 30}{2} = 40, \qquad \mathrm{Var}(X) = \frac{(50 - 30)^2}{12} = 33.33 . \]
Online Exercise:
Exercise 8.1.1
Exercise 8.1.2
The normal probability density, also called the Gaussian density, might be the most
commonly used probability density function in statistics.
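A random variable $X$ has the normal probability density if
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty, \]
where
\[ \mu = E(X), \quad \sigma^2 = \mathrm{Var}(X), \quad \pi = 3.14159\ldots \]
The graph of $f(x)$ is
[Figure: graph of the normal density $f(x)$.]
(a)
\[ \mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \]
and
\[ \sigma^2 = \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx = \int_{-\infty}^{\infty} (x-\mu)^2\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \]
(b)
$\mu$ = the mean of the normal random variable $X$
= the median of the normal random variable $X$ ($P(X \le \mu) = P(X \ge \mu)$)
= the mode of the normal probability density ($f(\mu) \ge f(x)$ for all $x$)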
means but different standard deviations, one is 1 (the solid line)
and the other is 2 (the dotted line):
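[Figure: two normal density curves $f(x)$ centered at the same mean $\mu$.]
(e) The normal density is symmetric with respect to the mean. That is,
\[ f(\mu + c) = f(\mu - c), \quad \text{where } c \text{ is any number.} \]
(f) The probability of a normal random variable follows the empirical rule introduced previously. That is,
\[ P(\mu - \sigma \le X \le \mu + \sigma) = 0.6826 \approx 68\% \]
\[ P(\mu - 2\sigma \le X \le \mu + 2\sigma) = 0.9544 \approx 95\% \]
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) = 0.9973 \approx 100\% \]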
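The standard normal random variable $Z$ is the normal random variable with
\[ \mu = E(Z) = 0, \quad \sigma^2 = \mathrm{Var}(Z) = 1 . \]
The probability of $Z$ taking values in some interval can be found from the normal table. The probability of $Z$ taking values in $[0, z]$, $z \ge 0$, can be obtained from the normal table. That is,
\[ P(0 \le Z \le z) = \text{the area of the region between two vertical lines} = \int_0^z \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx . \]
[Figure: standard normal density with vertical lines marked at $-z$, $0$ and $z$.]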
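Example:
\[
\begin{aligned}
P(Z \ge -1.5) &= P(-1.5 \le Z \le 0) + P(Z \ge 0) \\
&= P(0 \le Z \le 1.5) + \frac{1}{2} \quad (\text{symmetry of } Z) \\
&= 0.4332 + 0.5 = 0.9332
\end{aligned}
\]
\[ P(1 \le Z \le 1.5) = P(0 \le Z \le 1.5) - P(0 \le Z \le 1) = 0.4332 - 0.3413 = 0.0919 \]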
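Example:
$P(Z \ge x) = 0.0099$. What is $x$?
[solutions:]
\[ P(Z \le x) = 1 - P(Z \ge x) = 1 - 0.0099 = 0.9901 \]
\[ \Rightarrow x > 0 \quad \left(\text{if } x \le 0, \text{ then } P(Z \le x) \le \frac{1}{2}\right) \]
\[ \Rightarrow P(Z \le x) = P(Z \le 0) + P(0 \le Z \le x) = \frac{1}{2} + P(0 \le Z \le x) = 0.9901 \]
\[ \Rightarrow P(0 \le Z \le x) = 0.9901 - \frac{1}{2} = 0.4901 \Rightarrow x = 2.33 \]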
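Example:
Let $X$ be a normal random variable with $\mu = 1$ and $\sigma = 2$. Then,
\[ P(1 \le X \le 3) = P(1 - 1 \le X - 1 \le 3 - 1) = P\left(\frac{0}{2} \le \frac{X - 1}{2} \le \frac{2}{2}\right) = P(0 \le Z \le 1) = 0.3413 \]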
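The normal-table lookups in the examples above can be reproduced in code. The sketch below is an alternative to the table, not the method used in the notes: it builds the standard normal cumulative probability from the error function, and the names `phi` and `normal_prob` are illustrative.

```python
from math import erf, sqrt


def phi(z):
    """P(Z <= z) for the standard normal random variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_prob(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b) for a normal X, by transforming to Z = (X - mu) / sigma."""
    return phi((b - mu) / sigma) - phi((a - mu) / sigma)


print(round(1 - phi(-1.5), 4))                        # P(Z >= -1.5)
print(round(normal_prob(1, 1.5), 4))                  # P(1 <= Z <= 1.5)
print(round(1 - phi(2.33), 4))                        # P(Z >= 2.33)
print(round(normal_prob(1, 3, mu=1, sigma=2), 4))     # P(1 <= X <= 3)
```

Results may differ from table-based answers in the last digit because the table values are themselves rounded.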
Online Exercise:
Exercise 8.2.1
Exercise 8.2.2
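The exponential random variable can be used to describe the lifetime of a machine, an industrial product or a human being. Also, it can be used to describe the waiting time of a customer for some service.
For an exponential random variable $X$ with mean $\mu$, that is, with density $f(x) = \frac{1}{\mu} e^{-x/\mu}$, $x \ge 0$:
1.
\[ P(X \le x_0) = 1 - e^{-x_0/\mu}, \quad \text{for any } x_0 \ge 0 . \]
2.
\[ E(X) = \int_0^{\infty} x\, \frac{1}{\mu}\, e^{-x/\mu}\,dx = \mu \]
and
\[ \mathrm{Var}(X) = \int_0^{\infty} (x - \mu)^2\, \frac{1}{\mu}\, e^{-x/\mu}\,dx = \mu^2 . \]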
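[derivation:]
\[
\begin{aligned}
P(X \le x_0) &= \int_0^{x_0} \frac{1}{\mu}\, e^{-x/\mu}\,dx = \int_0^{x_0} e^{-x/\mu}\,d\!\left(\frac{x}{\mu}\right) = \int_0^{x_0/\mu} e^{-y}\,dy \quad \left(y = \frac{x}{\mu}\right) \\
&= -e^{-y}\Big|_0^{x_0/\mu} = -e^{-x_0/\mu} + e^{0} = 1 - e^{-x_0/\mu}
\end{aligned}
\]
Note: $S(x_0) = P(X > x_0) = 1 - P(X \le x_0) = e^{-x_0/\mu}$ is called the survival function.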
Example:
Let X represent the lifetime of a washing machine. Suppose the average lifetime for
this type of washing machine is 15 years. What is the probability that this washing
machine can be used for less than 6 years? Also, what is the probability that this
washing machine can be used for more than 18 years?
[solution:]
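A minimal sketch of the computation, assuming $X$ is an exponential random variable with mean $\mu = 15$ years and using $P(X \le x_0) = 1 - e^{-x_0/\mu}$ together with the survival function $P(X > x_0) = e^{-x_0/\mu}$. The function names are illustrative.

```python
from math import exp


def exponential_cdf(x0, mu):
    """P(X <= x0) for an exponential random variable with mean mu."""
    return 1 - exp(-x0 / mu)


def exponential_survival(x0, mu):
    """P(X > x0) for an exponential random variable with mean mu."""
    return exp(-x0 / mu)


mu = 15  # average lifetime in years
print(round(exponential_cdf(6, mu), 4))        # used for less than 6 years
print(round(exponential_survival(18, mu), 4))  # used for more than 18 years
```

Under these assumptions the two probabilities are about 0.33 and 0.30, respectively.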
The intuition of the above result is as follows. Suppose the time interval is [0,1] (in hours) and $\mu = 4$. Then, on average, there are 4 occurrences during a 1-hour period. Thus, the mean time for one occurrence is $\frac{1}{\mu} = \frac{1}{4}$ (hour). The number of occurrences can be described by a Poisson random variable (discrete) with mean 4, while the time of one occurrence can be described by an exponential random variable (continuous) with mean $\frac{1}{4}$.
Example:
Suppose the average number of car accidents on the highway in two days is 8. What is the probability of no accident for more than 3 days?
[solutions:]
The average number of car accidents on the highway in one day is $\frac{8}{2} = 4$. Thus, the mean time of one occurrence is $\frac{1}{4}$ (day).
Let $Y$ be the Poisson random variable with mean 4 representing the number of car accidents in one day, and let $X$ be the exponential random variable with mean $\frac{1}{4}$ (day) representing the time of one accident occurrence. Thus,
\[ P(\text{no accident for more than 3 days}) = P(\text{the time of one occurrence is larger than 3}) = P(X > 3) = e^{-\frac{3}{1/4}} = e^{-12} \approx 0 . \]
Online Exercise:
Exercise 8.3.1
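Example:
Let $X$ be the binomial random variable over 250 trials with $p = 0.01$. Then, it might not be easy to obtain
\[ P(X = 3) = \frac{250!}{3!\,247!}\, 0.01^3\, 0.99^{247} \]
directly. However, if we only want to obtain an approximation, the Poisson approximation is a good choice.
Poisson approximation:
Let $X$ be a binomial random variable over $n$ trials and let $p \le 0.05$, $n \ge 20$. Let $Y$ be a Poisson random variable with mean $np$. Then, the probability of $X$ taking value $i$ can be approximated by the probability of $Y$ taking value $i$,
\[ P(X = i) \approx P(Y = i) = \frac{e^{-np}\,(np)^i}{i!} . \]
Example:
In the above example, the Poisson random variable with mean $np = 250 \times 0.01 = 2.5$ can be used for approximation. Thus,
\[ P(X = 3) \approx \frac{e^{-2.5}\, 2.5^3}{3!} = 0.2138 . \]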
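To see how close the Poisson approximation is in this example, the sketch below computes the exact binomial probability of 3 successes in 250 trials with p = 0.01 and the Poisson probability with mean np = 2.5. The helper names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(i, n, p):
    """Exact P(X = i) for a binomial random variable over n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(i, mu):
    """P(Y = i) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**i / factorial(i)


n, p = 250, 0.01
exact = binomial_pmf(3, n, p)
approx = poisson_pmf(3, n * p)
print(round(exact, 4), round(approx, 4), round(abs(exact - approx), 4))
```

With a computer the exact value is easy to obtain; the approximation mainly matters for hand calculation.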
Normal approximation:
Let $X$ be a binomial random variable over $n$ trials and let the probability of success be $p$. Let $Y$ be the normal random variable with mean $np$ and variance $np(1-p)$. Then, the probability of $X$ taking value $i$ can be approximated by the probability of $Y$ taking values in $\left[i - \frac{1}{2},\, i + \frac{1}{2}\right]$.
That is,
\[ \binom{n}{i} p^i (1-p)^{n-i} \approx P\left(i - \frac{1}{2} \le Y \le i + \frac{1}{2}\right) = \int_{i - \frac{1}{2}}^{i + \frac{1}{2}} \frac{1}{\sqrt{2\pi\, np(1-p)}}\, e^{-\frac{(x - np)^2}{2np(1-p)}}\,dx . \]
Note: the probability $P\left(i - \frac{1}{2} \le Y \le i + \frac{1}{2}\right)$ can be obtained by transforming $Y$ to the standard normal random variable $Z$.
Example:
Let $X$ be the binomial random variable over 100 trials and let the probability of a success be 0.1. What is the probability of 12 successes by normal approximation?
[solutions:]
The normal random variable with mean $np = 100 \times 0.1 = 10$ and variance $np(1-p) = 100 \times 0.1 \times (1 - 0.1) = 9$ can be used for approximation. Thus,
\[
\begin{aligned}
P(X = 12) &\approx P\left(12 - \frac{1}{2} \le Y \le 12 + \frac{1}{2}\right) = P\left(\frac{11.5 - 10}{3} \le \frac{Y - 10}{3} \le \frac{12.5 - 10}{3}\right) \\
&= P(0.5 \le Z \le 0.83) = P(0 \le Z \le 0.83) - P(0 \le Z \le 0.5) \\
&= 0.2967 - 0.1915 = 0.1052
\end{aligned}
\]
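Note:
Let $X$ be a binomial random variable over $n$ trials with probability of success $p$, and let $Y$ be the normal random variable with mean $np$ and variance $np(1-p)$. Then,
\[ P(X \le i) \approx P\left(Y \le i + \frac{1}{2}\right) = P\left(\frac{Y - np}{\sqrt{np(1-p)}} \le \frac{i + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right) = P\left(Z \le \frac{i + \frac{1}{2} - np}{\sqrt{np(1-p)}}\right), \]
\[ P(X \ge k) \approx P\left(Y \ge k - \frac{1}{2}\right) = P\left(Z \ge \frac{k - \frac{1}{2} - np}{\sqrt{np(1-p)}}\right) . \]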
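Example:
[solution:]
\[ P(X \le 13) \approx P\left(Z \le \frac{13 + \frac{1}{2} - 10}{3}\right) = P(Z \le 1.17) = P(Z \le 0) + P(0 \le Z \le 1.17) = 0.5 + 0.3790 = 0.8790 \]
Note that the exact probability is
\[ P(X \le 13) = \sum_{i=0}^{13} \binom{100}{i}\, 0.1^i\, 0.9^{100-i} = 0.8761 . \]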
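The normal approximation with the half-unit continuity correction can be compared with the exact binomial probabilities for n = 100 and p = 0.1. This is a minimal sketch in which the standard normal probability is computed from the error function instead of the normal table; the helper names are illustrative.

```python
from math import comb, erf, sqrt


def phi(z):
    """P(Z <= z) for the standard normal random variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_pmf(i, n, p):
    """Exact P(X = i) for a binomial random variable over n trials."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


n, p = 100, 0.1
mean = n * p                   # np = 10
sd = sqrt(n * p * (1 - p))     # sqrt(np(1-p)) = 3

# P(X = 12): area under the normal curve between 11.5 and 12.5
approx_eq_12 = phi((12.5 - mean) / sd) - phi((11.5 - mean) / sd)
exact_eq_12 = binomial_pmf(12, n, p)

# P(X <= 13): area under the normal curve to the left of 13.5
approx_le_13 = phi((13.5 - mean) / sd)
exact_le_13 = sum(binomial_pmf(i, n, p) for i in range(14))

print(round(approx_eq_12, 4), round(exact_eq_12, 4))
print(round(approx_le_13, 4), round(exact_le_13, 4))
```

The approximate values differ slightly from table-based answers because the table requires rounding $z$ to two decimal places.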
Review 1
Chapter 1:
Example:
321.10, 0.48, 23.40),, (AMEX, WEB, 153.50, 0.88,
7.50).
Example (continue):
Then,
Grade    Frequency    Relative Frequency    Percent Frequency
E        2            2/20 = 0.10           10
D        3            3/20 = 0.15           15
C        5            5/20 = 0.25           25
B        9            9/20 = 0.45           45
A        1            1/20 = 0.05           5
Total    20           1                     100
2. Summarizing qualitative data:
Example:
[solution:]
\[ \text{Approximate class width} = \frac{79 - 30}{5} = 9.8 \]
The class width is 10.
Thus,
Class    Frequency    Percent Frequency    Cumulative Frequency    Cumulative Percent Frequency
30-39    2            (2/18)100 = 11       2                       11
40-49    3            (3/18)100 = 17       5                       28
50-59    7            (7/18)100 = 39       12                      67
60-69    5            (5/18)100 = 28       17                      95
70-79    1            (1/18)100 = 5        18                      100
Example:
[solution:]
\[ \bar{x}_g = \frac{\sum_{i=1}^{10} f_i M_i}{70}, \]
where $f_i$ is the frequency of class $i$, $M_i$ is the midpoint of class $i$ and $n = 70$ is the sample size. Then,
Rent       f_i    M_i
420-439    8      429.5
440-459    17     449.5
460-479    12     469.5
480-499    8      489.5
500-519    7      509.5
520-539    4      529.5
540-559    2      549.5
560-579    4      569.5
580-599    2      589.5
600-619    6      609.5
Thus,
\[ \sum_{i=1}^{10} f_i M_i = 34525 \quad \text{and} \quad \bar{x}_g = \frac{34525}{70} = 493.21 . \]
\[ s_g^2 = \frac{\sum_{i=1}^{10} f_i \left(M_i - \bar{x}_g\right)^2}{70 - 1} = \frac{208234.287}{69} = 3017.89 \]
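The grouped mean and variance can be recomputed from the frequencies and class midpoints in the table. This is a minimal sketch; the function name `grouped_mean_var` is illustrative.

```python
# Frequencies f_i and midpoints M_i of the 10 rent classes
freqs = [8, 17, 12, 8, 7, 4, 2, 4, 2, 6]
mids = [429.5, 449.5, 469.5, 489.5, 509.5, 529.5, 549.5, 569.5, 589.5, 609.5]


def grouped_mean_var(freqs, mids):
    """Return the grouped sample mean and the grouped sample variance."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / n
    var = sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / (n - 1)
    return mean, var


mean, var = grouped_mean_var(freqs, mids)
print(sum(freqs), sum(f * m for f, m in zip(freqs, mids)))
print(round(mean, 2), round(var, 2))
```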
Chapter 4:
Chapter 5:
Example:
[solution:]
\[ \binom{5}{3}\binom{8}{5} = \frac{5!}{3!\,2!} \cdot \frac{8!}{5!\,3!} = 560 \]
Example:
Assume you are taking two courses this semester (S and C). The
probability that you will pass course S is 0.835, the probability that you
will pass both courses is 0.276. The probability that you will pass at least
one of the courses is 0.981.
(a) What is the probability that you will pass course C?
(b) Are the events of passing the two courses independent?
(c) Are the events of passing the courses mutually exclusive? Explain.
[solution:]
(a)
Let A be the event of passing course S and B be the event of passing course C. Thus,
\[ P(A) = 0.835, \quad P(A \cap B) = 0.276, \quad P(A \cup B) = 0.981 . \]
\[ P(A^c \cap B) = P(A \cup B) - P(A) = 0.981 - 0.835 = 0.146 \]
\[ \Rightarrow P(B) = P(A \cap B) + P(A^c \cap B) = 0.276 + 0.146 = 0.422 \]
(b)
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.276}{0.422} = 0.654 \ne P(A) = 0.835 \]
Thus, events A and B are not independent. That is, passing the two courses are not independent events.
(c)
Since $P(A \cap B) = 0.276 \ne 0$, events A and B are not mutually exclusive.
Review 2
Chapter 4
Bayes Theorem:
Example:
[solution:]
Let
A1: the students are engineering majors
A2: the students are business majors
A3: the students are other majors.
Originally, we know
\[ P(A_1) = 0.4, \; P(A_2) = 0.5, \; P(A_3) = 0.1, \; P(B \mid A_1) = 0.3, \; P(B \mid A_2) = 0.6, \; P(B \mid A_3) = 0.2 . \]
Then, by Bayes theorem,
\[ P(A_1 \mid B) = \frac{P(A_1)\,P(B \mid A_1)}{P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3)} = \frac{0.4 \times 0.3}{0.4 \times 0.3 + 0.5 \times 0.6 + 0.1 \times 0.2} = 0.2727 . \]
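Bayes theorem for a set of mutually exclusive groups can be sketched in a few lines. The code below assumes the prior probabilities 0.4, 0.5, 0.1 and the conditional probabilities 0.3, 0.6, 0.2 from the solution above; the function name `posterior` is illustrative.

```python
def posterior(priors, likelihoods):
    """Return P(A_k | B) for every k, given P(A_k) and P(B | A_k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A_k) P(B | A_k)
    total = sum(joint)                                    # P(B)
    return [j / total for j in joint]


priors = [0.4, 0.5, 0.1]        # P(A1), P(A2), P(A3)
likelihoods = [0.3, 0.6, 0.2]   # P(B | A1), P(B | A2), P(B | A3)

post = posterior(priors, likelihoods)
print([round(x, 4) for x in post])
```

The first entry is $P(A_1 \mid B)$; the three posterior probabilities add up to 1.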
Chapter 5
Example:
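[solution:]
(a)
\[ \sum_x f(x) = f(1) + f(3) + f(5) = 2k + 3k + 4k = 9k = 1 \Rightarrow k = \frac{1}{9} . \]
(b)
\[ P(X \ge 2) = P(X = 3 \text{ or } X = 5) = P(X = 3) + P(X = 5) = f(3) + f(5) = 3k + 4k = 7k = 7 \cdot \frac{1}{9} = \frac{7}{9} . \]
(c)
\[ \mu = E(X) = \sum_x x f(x) = 1 \cdot f(1) + 3 \cdot f(3) + 5 \cdot f(5) = 1 \cdot \frac{2}{9} + 3 \cdot \frac{3}{9} + 5 \cdot \frac{4}{9} = \frac{31}{9} \]
and
\[
\begin{aligned}
\mathrm{Var}(X) &= \sum_x (x - \mu)^2 f(x) = \left(1 - \frac{31}{9}\right)^2 f(1) + \left(3 - \frac{31}{9}\right)^2 f(3) + \left(5 - \frac{31}{9}\right)^2 f(5) \\
&= \frac{22^2}{81} \cdot \frac{2}{9} + \frac{16}{81} \cdot \frac{3}{9} + \frac{14^2}{81} \cdot \frac{4}{9} = \frac{200}{81}
\end{aligned}
\]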
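The arithmetic in this solution can be verified with exact fractions. The sketch below assumes the distribution $f(1) = 2k$, $f(3) = 3k$, $f(5) = 4k$ used in the solution; the variable names are illustrative.

```python
from fractions import Fraction

# Unnormalised weights: f(1) = 2k, f(3) = 3k, f(5) = 4k
weights = {1: 2, 3: 3, 5: 4}

k = Fraction(1, sum(weights.values()))        # the probabilities must sum to 1
f = {x: w * k for x, w in weights.items()}

prob_ge_2 = sum(p for x, p in f.items() if x >= 2)
mean = sum(x * p for x, p in f.items())
var = sum((x - mean) ** 2 * p for x, p in f.items())

print(k, prob_ge_2, mean, var)
```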
Example:
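[solution:]
(a)
\[ \int_0^1 f(x)\,dx = 1 \Rightarrow \int_0^1 \left(a + bx^2\right) dx = 1 \Rightarrow \left(ax + \frac{b}{3}x^3\right)\Big|_0^1 = 1 \Rightarrow a + \frac{b}{3} = 1 \]
and
\[ E(X) = \int_0^1 x f(x)\,dx = \int_0^1 x\left(a + bx^2\right) dx = \left(\frac{a}{2}x^2 + \frac{b}{4}x^4\right)\Big|_0^1 = \frac{a}{2} + \frac{b}{4} = \frac{3}{5} . \]
Solving these two equations gives $a = \frac{3}{5}$ and $b = \frac{6}{5}$.
(b)
\[ f(x) = \frac{3}{5} + \frac{6}{5}x^2, \quad 0 \le x \le 1; \qquad f(x) = 0, \quad \text{otherwise.} \]
Thus,
\[
\begin{aligned}
\mathrm{Var}(X) &= E\left[X - E(X)\right]^2 = E(X^2) - \left[E(X)\right]^2 = E(X^2) - \left(\frac{3}{5}\right)^2 \\
&= \int_0^1 x^2 f(x)\,dx - \frac{9}{25} = \int_0^1 \left(\frac{3}{5}x^2 + \frac{6}{5}x^4\right) dx - \frac{9}{25} \\
&= \left(\frac{1}{5}x^3 + \frac{6}{25}x^5\right)\Big|_0^1 - \frac{9}{25} = \frac{1}{5} + \frac{6}{25} - \frac{9}{25} = \frac{2}{25}
\end{aligned}
\]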
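The two conditions in part (a) form a pair of linear equations in $a$ and $b$. The sketch below solves them by Cramer's rule with exact fractions and then recomputes the variance; it assumes the density $f(x) = a + bx^2$ on $[0, 1]$ and $E(X) = 3/5$ as in the solution, and the variable names are illustrative.

```python
from fractions import Fraction

# Linear system for f(x) = a + b*x**2 on [0, 1]:
#   a   + b/3 = 1      (total probability is 1)
#   a/2 + b/4 = 3/5    (E(X) = 3/5)
a11, a12, r1 = Fraction(1), Fraction(1, 3), Fraction(1)
a21, a22, r2 = Fraction(1, 2), Fraction(1, 4), Fraction(3, 5)

# Cramer's rule for a 2x2 system
det = a11 * a22 - a12 * a21
a = (r1 * a22 - a12 * r2) / det
b = (a11 * r2 - r1 * a21) / det

# E(X^2) = integral of x^2 (a + b x^2) over [0, 1] = a/3 + b/5
second_moment = a / 3 + b / 5
variance = second_moment - r2**2

print(a, b, second_moment, variance)
```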
Chapter 6
Example:
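[solution:]
(a)
Let
Y: the number of customers arriving within 2 minutes.
Then,
\[ E(Y) = \mu = 2 \times 4 = 8 \quad \text{(customers/two minutes)} \]
and
\[ P(Y = i) = \frac{e^{-\mu}\mu^i}{i!} = \frac{e^{-8}\, 8^i}{i!}, \quad i = 0, 1, 2, \ldots \]
Thus,
\[ P(Y = 3) = \frac{e^{-8}\, 8^3}{3!} = 0.0286 . \]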
(b)
Review 2
Chapters 3, 4
Measures of Location, Dispersion, Exploratory Data Analysis,
Measure of Relative Location, Weighted and Grouped Mean and
Variance, Association between Two Variables
Example:
(c) Determine an interval for the batteries lives that will be true for at
least 80% of the batteries.
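[solution:]
Denote
\[ \bar{x} = 60, \quad s = 4 . \]
(a)
\[ [54, 66] = 60 \pm 6 = \bar{x} \pm 1.5s \]
Thus, by Chebyshev's theorem, within 1.5 standard deviations, there is at least
\[ \left(1 - \frac{1}{1.5^2}\right) \times 100\% = 55.55\% \]
of batteries.
(b)
\[ [52, 68] = 60 \pm 8 = \bar{x} \pm 2s \]
Thus, by Chebyshev's theorem, within 2 standard deviations, there is at least
\[ \left(1 - \frac{1}{2^2}\right) \times 100\% = 75\% \]
of batteries.
(c)
\[ \left(1 - \frac{1}{k^2}\right) \times 100\% = 80\% \Rightarrow 1 - \frac{1}{k^2} = 0.8 \Rightarrow k = \sqrt{5} \]
Thus, within $\sqrt{5}$ standard deviations, there is at least 80% of batteries.
Therefore,
\[ \bar{x} \pm \sqrt{5}\,s = 60 \pm \sqrt{5} \times 4 = 60 \pm 8.94 = [51.06, 68.94] . \]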
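The three parts of this solution use Chebyshev's bound in both directions: from $k$ to a guaranteed percentage, and from a target percentage back to $k$. This is a minimal sketch assuming $\bar{x} = 60$ and $s = 4$ as above; the function names are illustrative.

```python
from math import sqrt


def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2


def k_for_coverage(coverage):
    """Smallest k such that the Chebyshev bound reaches the given coverage."""
    return 1 / sqrt(1 - coverage)


mean, s = 60, 4
print(round(chebyshev_bound(1.5), 4))   # part (a): interval [54, 66]
print(chebyshev_bound(2))               # part (b): interval [52, 68]

k = k_for_coverage(0.8)                 # part (c): k = sqrt(5)
lower, upper = mean - k * s, mean + k * s
print(round(k, 4), round(lower, 2), round(upper, 2))
```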
Chapter 5
Basic Relationships of Probability, Conditional Probability and
Bayes Theorem
Example:
The following are the data on the gender and marital status of 200
customers of a company.
           Male    Female
Single     20      30
Married    100     50
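[solution:]
(a)
\[ P(A_1 \cap B_2) = \frac{30}{200} = 0.15 \]
(b)
\[ P(A_2 \cap B_1) = \frac{100}{200} = 0.5 \]
(c)
\[ P(A_1 \mid B_2) = \frac{P(A_1 \cap B_2)}{P(B_2)} . \]
Since
\[ P(B_2) = P(A_1 \cap B_2) + P(A_2 \cap B_2) = \frac{30}{200} + \frac{50}{200} = \frac{80}{200}, \]
\[ P(A_1 \mid B_2) = \frac{P(A_1 \cap B_2)}{P(B_2)} = \frac{30/200}{80/200} = \frac{30}{80} = 0.375 . \]
(d)
\[ P(B_1) = P(A_1 \cap B_1) + P(A_2 \cap B_1) = \frac{20}{200} + \frac{100}{200} = \frac{120}{200} = 0.6 \]
(e)
\[ P(A_2 \mid B_1) = \frac{P(A_2 \cap B_1)}{P(B_1)} = \frac{100/200}{120/200} = \frac{100}{120} = \frac{5}{6} . \]
(f)
Gender and marital status are not mutually exclusive since
\[ P(A_1 \cap B_1) \ne 0 . \]
(g)
Gender and marital status are not independent since
\[ P(A_1 \mid B_2) = \frac{30}{80} \ne P(A_1) = \frac{50}{200} . \]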
Example:
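[solution:]
(a)
\[
\begin{aligned}
P(B_1) &= P(B_1 \cap A_1) + P(B_1 \cap A_2) \\
&= P(A_1)\,P(B_1 \mid A_1) + P(A_2)\,P(B_1 \mid A_2) \\
&= 0.6 \times 0.98 + 0.4 \times 0.2 \\
&= 0.668
\end{aligned}
\]
(b)
By Bayes theorem,
\[ P(A_1 \mid B_1) = \frac{P(A_1 \cap B_1)}{P(B_1)} = \frac{P(A_1)\,P(B_1 \mid A_1)}{P(A_1)\,P(B_1 \mid A_1) + P(A_2)\,P(B_1 \mid A_2)} = \frac{0.6 \times 0.98}{0.6 \times 0.98 + 0.4 \times 0.2} = \frac{0.588}{0.668} = 0.880 \]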
Chapter 6
Example:
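[solution:]
(a)
\[ \sum_x f(x) = f(1) + f(3) + f(5) = 2k + 3k + 4k = 9k = 1 \Rightarrow k = \frac{1}{9} . \]
(b)
\[ P(X \ge 2) = P(X = 3 \text{ or } X = 5) = P(X = 3) + P(X = 5) = f(3) + f(5) = 3k + 4k = 7k = 7 \cdot \frac{1}{9} = \frac{7}{9} . \]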