Reading
Newbold 1.1, 1.3, parts of 1.2.
Anderson, Sweeney, and Williams Chapter 1
Wonnacott and Wonnacott Chapter 1
James T. McClave and P. George Benson Chapter 1
Introductory Comments
This chapter sets the framework for the book. Read it carefully, because the ideas
introduced here form the basis of this subject and of research methodology.
1.
Example 1
Suppose that a population consists of six measurements: 1, 2, 3, 4, 5 and 7. List
all possible different samples of two measurements that could be selected from
the population. Give the probability associated with each sample in a random
sample of n = 2 measurements selected from the population.
Solution
All possible samples are listed below.
Sample   Measurements
1        1, 2
2        1, 3
3        1, 4
4        1, 5
5        1, 7
6        2, 3
7        2, 4
8        2, 5
9        2, 7
10       3, 4
11       3, 5
12       3, 7
13       4, 5
14       4, 7
15       5, 7
Now suppose that we draw a single sample of n = 2 measurements from the 15
possible samples of two measurements. The sample selected is called a random sample if
every sample has an equal probability (1/15) of being selected.
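The enumeration in the table above can be reproduced with Python's standard library. A minimal sketch, assuming the population is held in a list named `population` (a name chosen here for illustration): `itertools.combinations` lists every unordered sample of size 2 in the same order as the table, and each of the 15 samples receives probability 1/15.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5, 7]  # the six measurements from Example 1
n = 2                            # sample size

# Every unordered sample of n distinct measurements, in the order of the table.
samples = list(combinations(population, n))

# Under random sampling each possible sample is equally likely.
prob_each = Fraction(1, len(samples))

for number, sample in enumerate(samples, start=1):
    print(number, sample, prob_each)
```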
It is rather unlikely that we would ever achieve a truly random sample, because the
probabilities of selection will not always be exactly equal. But we do the best we can.
One of the simplest and most reliable ways to select a random sample of n measurements
from a population is to use a table of random numbers (see Appendix B). Random
number tables are constructed in such a way that, no matter where you start in the table
and no matter what direction you move, the digits occur randomly and with equal probability.
Thus, if we wished to choose a random sample of n = 10 measurements from a population
containing 100 measurements, we could label the measurements in the population from
0 to 99 (or 1 to 100). Then, referring to Appendix B and choosing a random starting
point, the next 10 two-digit numbers going across the page would indicate the labels of
the particular measurements to be included in the random sample. Similarly, by moving
up or down the page, we would also obtain a random sample.
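Today a computer's pseudo-random number generator usually plays the role of the printed table. A minimal sketch using Python's `random` module as a stand-in for Appendix B (the seed value 2024 is arbitrary and only makes the run repeatable): `sample` returns n distinct labels, so every subset of 10 labels is equally likely.

```python
import random

population_size = 100  # measurements labelled 0 to 99
n = 10                 # sample size

rng = random.Random(2024)                        # fixed seed so the draw can be repeated
labels = rng.sample(range(population_size), n)   # n distinct labels, drawn without replacement
print(sorted(labels))
```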
Example 2
A small community consists of 850 families. We wish to obtain a random sample of 20
families to ascertain public acceptance of a wage and price freeze. Refer to Appendix B
to determine which families should be sampled.
Solution
Assuming that a list of all families in the community is available (such as a telephone
directory), we could label the families from 0 to 849 (or, equivalently, from 1 to 850).
Then, referring to the Appendix, we choose a starting point. Suppose we have decided to
start at line 1, column 4. Going down the page, we choose the first 20 three-digit
numbers between 000 and 849. From Table B, we have
511  584  754  258  791  045  750  266  099  783
059  105  671  301  498  469  152  568  701  160
These 20 numbers identify the 20 families that are to be included in our sample.
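The rule used when reading the table (skip any number above 849, and skip a label already chosen) can be written as a small function. A minimal sketch: `select_labels` is a hypothetical helper, and the stream passed to it in the test is made up for illustration rather than copied from Appendix B.

```python
def select_labels(numbers, max_label, n):
    """Return the first n distinct labels from `numbers` that are <= max_label."""
    chosen = []
    for value in numbers:
        if value > max_label:   # no family carries this label, so skip it
            continue
        if value in chosen:     # already selected; sampling is without replacement
            continue
        chosen.append(value)
        if len(chosen) == n:
            break
    return chosen


# Hypothetical run of three-digit numbers read down a column of the table.
stream = [511, 584, 754, 999, 258, 584, 791]
print(select_labels(stream, 849, 5))
```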
Learning Objectives
After working through this chapter, you should be able to:
CHAPTER 2
Reading
Newbold Chapter 2
James T. McClave and P. George Benson Chapter 2
Tailoka Frank P Chapter 3
Introductory Comments
This chapter deals with the understanding of data. We construct graphical
representations of the data that make its most important characteristics easy to see.
Most of these graphical representations are very tedious to construct without a
computer; however, one understands much more by drawing a few with pencil and
paper.
Types of business data. Although the number of business phenomena that can be
measured is almost limitless, business data can generally be classified as one of two
types: quantitative or qualitative.
Quantitative data are observations that are measured on a numerical scale. Examples of
quantitative business data are:
i.
ii.
iii.
Qualitative data are observations that are not measurable, in the sense that height is
measured, or countable, in the sense that people entering a store are counted. Many
characteristics can only be classified into one of a set of categories. Examples of
qualitative business data are:
i)
ii)
The brand of petrol last purchased by seventy-four randomly selected car owners.
Again, each measurement would fall into one and only one category.
Notice that each of the examples has nonnumerical or qualitative measurements.
Customer  Residence   Customer  Residence   Customer  Residence
1         NW          11        NW          21        NE
2         SE          12        SE          22        NW
3         SE          13        SW          23        SW
4         NW          14        NW          24        SE
5         SW          15        SW          25        SW
6         NW          16        NE          26        NW
7         NE          17        NE          27        NW
8         SW          18        NW          28        SE
9         NW          19        NW          29        NE
10        SE          20        SW          30        SW
A natural and useful technique for summarizing qualitative data is to tabulate the
frequency or relative frequency of each category.
Definition:
The frequency for a category is the total number of measurements that fall in the
category. The frequency for a particular category, say category i, will be denoted by the
symbol $f_i$.
The relative frequency for a category is the frequency of that category divided by the
total number of measurements; that is, the relative frequency for category i is
Relative frequency $= \dfrac{f_i}{n}$
Residential quadrant   Frequency   Relative frequency
NE                     5           5/30 = .167
NW                     11          11/30 = .367
SE                     6           6/30 = .200
SW                     8           8/30 = .267
Total                  30          1
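The tally behind this table can be checked mechanically. A minimal sketch using `collections.Counter`, assuming the residence list is typed in from the customer table above in customer order: the counter gives each $f_i$, and dividing by n gives the relative frequencies.

```python
from collections import Counter

# Residential quadrant of customers 1 to 30, copied from the customer table.
residence = [
    "NW", "SE", "SE", "NW", "SW", "NW", "NE", "SW", "NW", "SE",
    "NW", "SE", "SW", "NW", "SW", "NE", "NE", "NW", "NW", "SW",
    "NE", "NW", "SW", "SE", "SW", "NW", "NW", "SE", "NE", "SW",
]

n = len(residence)
frequency = Counter(residence)  # f_i for each category

for category in sorted(frequency):
    f_i = frequency[category]
    print(f"{category}: frequency {f_i}, relative frequency {f_i / n:.3f}")
```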
Figure 1.1 (a) Frequency bar chart and (b) relative frequency bar chart for the residential quadrants NE, NW, SE and SW.
The Pie Chart
The second method of describing qualitative data sets is the pie chart. It is
often used in newspaper and magazine articles to depict budgets and other
economic information. A complete circle (the pie) represents the total number of
measurements. This is partitioned into a number of slices, with one slice for each
category. For example, since a complete circle spans 360°, if the relative
frequency for a category is .30, the slice assigned to that category is 30% of 360°,
or (.30)(360°) = 108°.
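The same arithmetic gives every slice of a pie chart for the residential quadrant data. A minimal sketch, assuming the frequencies from the table above: each angle is the relative frequency multiplied by 360°, and the angles add up to a full circle.

```python
# Frequencies of the four residential quadrants from the frequency table.
frequency = {"NE": 5, "NW": 11, "SE": 6, "SW": 8}
n = sum(frequency.values())

# Slice angle = relative frequency x 360 degrees (multiplying first keeps these values exact).
angles = {quadrant: f * 360 / n for quadrant, f in frequency.items()}

for quadrant, angle in angles.items():
    print(f"{quadrant}: {angle:.1f} degrees")
```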
Figure 1.2 The portion of a pie chart corresponding to a relative frequency of .30 (a slice of 108°).
Company  E/S    Company  E/S    Company  E/S
1        1.85   11       2.80   21       2.75
2        3.42   12       3.46   22       6.58
3        9.11   13       8.32   23       3.54
4        1.96   14       4.62   24       4.65
5        6.48   15       3.27   25       0.75
6        5.72   16       1.35   26       2.01
7        1.72   17       3.28   27       5.36
8        8.56   18       3.75   28       4.40
9        0.72   19       5.23   29       6.49
10       6.28   20       2.92   30       1.12
2.
Divide the interval from the smallest to the largest measurement into between five
and twenty equal sub-intervals, making sure that:
a)
Each measurement falls into one and only one measurement class.
b)
3.
4.
Using a vertical axis of about three-fourths the length of the horizontal axis, plot
each frequency (or relative frequency) as a rectangle over the corresponding
measurement class.
Since the number of measurements, n = 30, is not large, we will use six classes to
span the distance between the smallest measurement, 0.72, and the largest
measurement, 9.11. This distance divided by 6 is
$\dfrac{\text{Largest measurement} - \text{Smallest measurement}}{\text{Number of intervals}} = \dfrac{9.11 - 0.72}{6} = \dfrac{8.39}{6} \approx 1.4$
By locating the lower boundary of the first class interval at 0.715 (slightly below the
smallest measurement) and adding 1.4, we find the upper boundary to be 2.115. Adding
1.4 again, we find the upper boundary of the second class to be 3.515. Continuing this
process, we obtain the six class intervals shown in the table below. Note that each
boundary ends in 5 in the third decimal place (one decimal place more than the
measurements), which guarantees that no measurement will fall on a class boundary.
The next step is to find the class frequencies and calculate the class relative frequencies.
Class   Measurement class   Class frequency   Class relative frequency
1       0.715 - 2.115       8                 8/30 = .267
2       2.115 - 3.515       7                 7/30 = .233
3       3.515 - 4.915       5                 5/30 = .167
4       4.915 - 6.315       4                 4/30 = .133
5       6.315 - 7.715       3                 3/30 = .100
6       7.715 - 9.115       3                 3/30 = .100
Total                       30                1.00
Table 1.4
Definition
The class frequency for a given class, say class i, is equal to the total number of
measurements that fall in that class. The class frequency for class i is denoted by the
symbol $f_i$.
Definition
The class relative frequency for a given class, say class i, is equal to the class frequency
divided by the total number n of measurements, i.e.
Relative frequency for class i $= \dfrac{f_i}{n}$
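The class construction and the counts in Table 1.4 can be reproduced directly from the raw earnings per share. A minimal sketch, assuming the lower boundary 0.715, the width 1.4 and the six classes chosen in the text: boundaries are rounded to three decimals to avoid floating-point drift, and because no measurement has three decimals, none can fall on a boundary.

```python
# Earnings per share (E/S) for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

lower = 0.715  # first boundary, slightly below the smallest measurement
width = 1.4    # class width chosen in the text
k = 6          # number of classes

boundaries = [round(lower + i * width, 3) for i in range(k + 1)]

freq = [0] * k
for x in eps:
    for i in range(k):
        if boundaries[i] < x < boundaries[i + 1]:  # each x lands in exactly one class
            freq[i] += 1
            break

n = len(eps)
for i in range(k):
    print(f"{boundaries[i]:.3f} - {boundaries[i + 1]:.3f}: f = {freq[i]}, f/n = {freq[i] / n:.3f}")
```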
[Figure: (a) Frequency histogram and (b) relative frequency histogram of earnings per share, with class boundaries at 0.715, 2.115, 3.515, 4.915, 6.315, 7.715 and 9.115.]
It is often useful to know the number or the proportion of the total number of
measurements that are less than or equal to those contained in a particular class. These
quantities are called the class cumulative frequency and the class cumulative relative
frequency respectively.
For example, if the classes are numbered 1, 2, 3, 4, . . . from the smallest to the largest
values of x, then the cumulative frequency for the third class would equal the sum of the
class frequencies corresponding to classes 1, 2 and 3:
Cumulative frequency for class 3 $= f_1 + f_2 + f_3$
Similarly,
Cumulative relative frequency for class 3 $= \dfrac{f_1 + f_2 + f_3}{n}$
where n is the total number of measurements.
Cumulative frequencies and cumulative relative frequencies for earnings per share data.
Class No.  Measurement class  Class frequency  Cumulative frequency  Class relative frequency  Cumulative relative frequency
1          0.715 - 2.115      8                8                     8/30 = .267               8/30 = .267
2          2.115 - 3.515      7                (8 + 7) = 15          7/30 = .233               15/30 = .500
3          3.515 - 4.915      5                (15 + 5) = 20         5/30 = .167               20/30 = .667
4          4.915 - 6.315      4                (20 + 4) = 24         4/30 = .133               24/30 = .800
5          6.315 - 7.715      3                (24 + 3) = 27         3/30 = .100               27/30 = .900
6          7.715 - 9.115      3                (27 + 3) = 30         3/30 = .100               30/30 = 1.00
Total                         30
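Cumulative frequencies are running totals of the class frequencies. A minimal sketch, assuming the class frequencies of Table 1.4: `itertools.accumulate` produces $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, and so on, and dividing by n gives the cumulative relative frequencies.

```python
from itertools import accumulate

freq = [8, 7, 5, 4, 3, 3]  # class frequencies from Table 1.4
n = sum(freq)

cum_freq = list(accumulate(freq))    # running totals of the class frequencies
cum_rel = [c / n for c in cum_freq]  # cumulative relative frequencies

for i, (c, r) in enumerate(zip(cum_freq, cum_rel), start=1):
    print(f"class {i}: cumulative frequency {c}, cumulative relative frequency {r:.3f}")
```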
[Figure: Cumulative relative frequency distribution for earnings per share data, plotting cumulative relative frequency (0 to 1.0) against the class boundaries from 0.715 to 9.115.]
Learning Objectives
Draw a pie chart and a bar chart, and construct frequency tables, relative
frequency tables and histograms.
Interpret the diagrams. You will understand the importance of captions, axis
labels and the graduation of axes.
CHAPTER 3
DESCRIPTIVE MEASURES
Reading
Newbold Chapter 2
Wonnacott and Wonnacott Chapter 2
Tailoka Frank P. Chapter 4
James T. McClave, Lawrence L. Lapin and P. George Benson Chapter 3
Introductory Comments
This chapter contains themes which allow one to see easily the most important
characteristics of data. The idea is to find simple numbers, such as the mean and the
variance, that summarize those characteristics.
3.
The modal class, the one corresponding to the interval 0.715 - 2.115, lies on the left side
of the distribution. The mode is the midpoint of this interval; that is,
Mode $= \dfrac{0.715 + 2.115}{2} = 1.415$
In the sense that the mode measures data concentration, it provides a measure of central
tendency of the data.
Definition
The mean of a set of quantitative data is equal to the sum of the measurements divided by
the number of measurements contained in the data set. The mean of a sample is denoted
by $\bar{x}$ (read "x bar"), and the formula for this calculation is
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
Example 1
Calculate the mean of the following five sample measurements: 5, 3, 8, 5, 6.
Solution
Using the definition of the sample mean and the summation shorthand notation, we find
$\bar{x} = \dfrac{\sum_{i=1}^{5} x_i}{5} = \dfrac{5 + 3 + 8 + 5 + 6}{5} = \dfrac{27}{5} = 5.4$
The median of a data set is the number such that half the measurements fall below the
median and half fall above. The median is of most value in describing large data sets. If
the data set is characterized by a relative frequency histogram, the median is the point on
the x-axis such that half the area under the histogram lies above the median and half lies
below. For a small, or even a large but finite, number of measurements, there may be
many numbers that satisfy this property. For this reason, we adopt the following
convention for calculating the median of a data set.
Calculating a median
1.
If the number n of measurements in a data set is odd, the median is the middle
number when the measurements are arranged in ascending (or descending) order.
2.
If the number n of measurements is even, the median is the mean of the two
middle measurements when the measurements are arranged in ascending (or
descending) order.
Example 2
Consider the following sample of n = 7 measurements.
5, 7, 4, 5, 20, 6, 2
a)
Calculate the median of this sample.
b)
Eliminate the last measurement (the 2) and calculate the median of the remaining
n = 6 measurements.
Solution
a)
The seven measurements in the sample are first arranged in ascending order
2, 4, 5, 5, 6, 7, 20
Since the number of measurements is odd, the median is the middle measurement.
Thus, the median of this sample is 5.
b)
After removing the 2 from the set of measurements, we arrange the sample
measurements in ascending order as follows:
4, 5, 5, 6, 7, 20
Now the number of measurements is even, and so we average the middle two
measurements. The median is (5+6)/2 = 5.5.
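The two rules for calculating a median translate directly into code. A minimal sketch (the function name `median` is chosen here for illustration; Python's `statistics.median` follows the same convention): sort, then take the middle value when n is odd or average the two middle values when n is even. The checks use the sample from Example 2.

```python
import statistics


def median(data):
    """Median by the two rules above: sort, then take the middle value or average the middle two."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:                              # rule 1: n odd
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2  # rule 2: n even


sample = [5, 7, 4, 5, 20, 6, 2]
print(median(sample))       # part (a)
print(median(sample[:-1]))  # part (b): the last measurement (the 2) removed

assert median(sample) == statistics.median(sample)
```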
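1.
If the median is less than the mean, the data set is skewed to the right.
[Figure: Rightward skewness. A relative frequency histogram with the median to the left of the mean.]
Skewness can be measured by
$\text{Skewness} = \dfrac{\text{Mean} - \text{Mode}}{\text{standard deviation}}$ or $\text{Skewness} = \dfrac{3(\text{Mean} - \text{Median})}{\text{standard deviation}}$
2.
The median will equal the mean when the data set is symmetric.
[Figure: Symmetry. The median and the mean coincide.]
3.
If the median is greater than the mean, the data set is skewed to the left.
[Figure: Leftward skewness. The mean lies to the left of the median.]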
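The second skewness formula can be applied to the earnings per share data. A minimal sketch using Python's `statistics` module (`stdev` uses the n - 1 divisor): a positive value indicates that the mean exceeds the median, that is, the data are skewed to the right.

```python
from statistics import mean, median, stdev

# Earnings per share for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

x_bar = mean(eps)
med = median(eps)
s = stdev(eps)  # sample standard deviation, divisor n - 1

# 3(mean - median) / standard deviation: positive means skewed to the right.
skewness = 3 * (x_bar - med) / s
print(f"mean = {x_bar:.2f}, median = {med:.2f}, skewness = {skewness:.2f}")
```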
Measures of Variation
Definition:
The range of a data set is equal to the largest measurement minus the smallest measurement.
When dealing with grouped data, there are two procedures that can be adopted for
determining the range.
1.
2.
$S^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The second step in finding a meaningful measure of data variability is to calculate the
standard deviation of the data set.
The sample standard deviation, S, is defined as the positive square root of the sample
variance, $S^2$; thus,
$S = \sqrt{S^2} = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
The corresponding quantity, the population standard deviation, measures the variability of
the measurements in the population and is denoted by σ (sigma). The population
variance will therefore be denoted by σ².
Example 3
Solution
For this set of data, $\bar{x} = 3$. Then
$S^2 = \dfrac{(2-3)^2 + (3-3)^2 + (3-3)^2 + (3-3)^2 + (4-3)^2}{5 - 1} = \dfrac{2}{4} = 0.5$
and
$S = \sqrt{0.5} \approx 0.71$
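The shortcut formula for the sample variance is
$S^2 = \dfrac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$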
Example 4
Use the shortcut formula to compute the variances of these two samples of five measurements
each.
Sample 1: 1, 2, 3, 4, 5
Sample 2: 2, 3, 3, 3, 4
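Solution
We first work with sample 1. The quantities needed are:
$\sum_{i=1}^{5} x_i = 1 + 2 + 3 + 4 + 5 = 15$
and
$\sum_{i=1}^{5} x_i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25 = 55$
Then
$S^2 = \dfrac{\sum_{i=1}^{5} x_i^2 - \dfrac{\left(\sum_{i=1}^{5} x_i\right)^2}{n}}{n - 1} = \dfrac{55 - \dfrac{(15)^2}{5}}{5 - 1} = \dfrac{55 - 45}{4} = \dfrac{10}{4} = 2.5$
For sample 2, the corresponding quantities are
$\sum_{i=1}^{5} x_i = 2 + 3 + 3 + 3 + 4 = 15$
and
$\sum_{i=1}^{5} x_i^2 = 2^2 + 3^2 + 3^2 + 3^2 + 4^2 = 4 + 9 + 9 + 9 + 16 = 47$
Then
$S^2 = \dfrac{47 - \dfrac{(15)^2}{5}}{5 - 1} = \dfrac{47 - 45}{4} = \dfrac{2}{4} = 0.5$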
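The definitional formula and the shortcut formula always give the same sample variance. A minimal sketch comparing them on the two samples of Example 4 (the function names are chosen here for illustration): the first sums squared distances from the mean, the second needs only $\sum x_i$ and $\sum x_i^2$.

```python
def variance_definition(xs):
    """S^2 = sum of (x - x_bar)^2 divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """S^2 = (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


for sample in ([1, 2, 3, 4, 5], [2, 3, 3, 3, 4]):
    print(sample, variance_definition(sample), variance_shortcut(sample))
```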
Example 5
The earnings per share measurements for thirty companies selected randomly from the 1980
Financial/Daily Mail are listed here. Calculate the sample variance, $S^2$, and the standard
deviation, S, from these measurements.
1.85  3.42  9.11  1.96  6.48  5.72  1.72  8.56  0.72  6.28
2.80  3.46  8.32  4.62  3.27  1.35  3.28  3.75  5.23  2.92
2.75  6.58  3.54  4.65  0.75  2.01  5.36  4.40  6.49  1.12
Solution
The calculation of the sample variance, $S^2$, would be very tedious for this example if we
tried to use the formula
$S^2 = \dfrac{\sum_{i=1}^{30} (x_i - \bar{x})^2}{30 - 1}$
because it would be necessary to compute all thirty squared distances from the mean.
However, for the shortcut formula we need only compute:
$\sum_{i=1}^{30} x_i = 1.85 + 3.42 + \cdots + 1.12 = 122.47$
$\sum_{i=1}^{30} x_i^2 = (1.85)^2 + (3.42)^2 + \cdots + (1.12)^2 = 657.5239$
Then
$S^2 = \dfrac{\sum_{i=1}^{30} x_i^2 - \dfrac{\left(\sum_{i=1}^{30} x_i\right)^2}{30}}{30 - 1} = \dfrac{657.5239 - \dfrac{(122.47)^2}{30}}{29} = 5.4331$
Notice that we retained four decimal places in the calculation of $S^2$ to reduce rounding
errors, even though the original data were accurate to only two decimal places. Then
$S = \sqrt{S^2} = \sqrt{5.4331} \approx 2.33$
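The same shortcut computation for the thirty earnings per share values takes a few lines. A minimal sketch: the sums are computed directly, the shortcut formula gives $S^2$, and Python's `statistics.variance` (which also divides by n - 1) is used only as a cross-check.

```python
import statistics

# Earnings per share for the thirty companies in Example 5.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

n = len(eps)
sum_x = sum(eps)                  # sum of x_i
sum_x2 = sum(x * x for x in eps)  # sum of x_i squared

# Shortcut formula: S^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = s2 ** 0.5

print(f"sum x = {sum_x:.2f}, sum x^2 = {sum_x2:.4f}")
print(f"S^2 = {s2:.4f}, S = {s:.2f}")

# The standard library uses the same n - 1 divisor, so the results should agree.
assert abs(s2 - statistics.variance(eps)) < 1e-9
```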
distribution (the mean, median and mode should all be about the same) and that tails off
as we move away from the center of the histogram.
2.
b.
c.
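A rule of thumb, called the Empirical Rule, applies to samples with frequency
distributions that are mound-shaped:
a)
Approximately 68% of the measurements will lie within one standard deviation of the mean ($\bar{x} \pm S$).
b)
Approximately 95% of the measurements will lie within two standard deviations of the mean ($\bar{x} \pm 2S$).
c)
Essentially all of the measurements will lie within three standard deviations of the mean ($\bar{x} \pm 3S$).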
Example 6
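Refer to the data for earnings per share for thirty companies selected randomly from the
1980 Financial/Daily Mail, for which $\bar{x} = 4.08$ and $S = 2.33$. Calculate the fraction of
the thirty measurements that lie within the intervals $\bar{x} \pm S$, $\bar{x} \pm 2S$ and
$\bar{x} \pm 3S$, and compare the results with those of Chebyshev's rule and the Empirical Rule.
Solution
$(\bar{x} - S, \bar{x} + S) = (4.08 - 2.33, 4.08 + 2.33) = (1.75, 6.41)$ contains 19 of the 30 measurements (63%);
$(\bar{x} - 2S, \bar{x} + 2S) = (4.08 - 4.66, 4.08 + 4.66) = (-0.58, 8.74)$ contains 29 of the 30 measurements (97%);
$(\bar{x} - 3S, \bar{x} + 3S) = (4.08 - 6.99, 4.08 + 6.99) = (-2.91, 11.07)$ contains all the measurements.
These percentages for 1, 2 and 3 standard deviations (63, 97 and 100) agree fairly well
with the approximations of 68%, 95% and 100% given by the Empirical Rule for
mound-shaped distributions.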
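The counts in this solution can be verified by checking each measurement against the three intervals. A minimal sketch, computing $\bar{x}$ and S with Python's `statistics` module rather than using the rounded values 4.08 and 2.33 (the counts come out the same either way for these data).

```python
from statistics import mean, stdev

# Earnings per share for the thirty companies.
eps = [
    1.85, 3.42, 9.11, 1.96, 6.48, 5.72, 1.72, 8.56, 0.72, 6.28,
    2.80, 3.46, 8.32, 4.62, 3.27, 1.35, 3.28, 3.75, 5.23, 2.92,
    2.75, 6.58, 3.54, 4.65, 0.75, 2.01, 5.36, 4.40, 6.49, 1.12,
]

x_bar = mean(eps)
s = stdev(eps)
n = len(eps)

counts = {}
for k in (1, 2, 3):
    low, high = x_bar - k * s, x_bar + k * s
    counts[k] = sum(1 for x in eps if low <= x <= high)  # measurements inside x_bar +/- kS
    print(f"x_bar +/- {k}S = ({low:.2f}, {high:.2f}): {counts[k]}/{n} = {100 * counts[k] / n:.0f}%")
```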
Example 7
These aids for interpreting the value of a standard deviation can be put to immediate
practical use as a check on the calculation of the standard deviation. Suppose you have a
data set for which the smallest measurement is 20 and the largest is 80. You have
calculated the standard deviation of the data set to be S = 190.
How can you use the Chebyshev or empirical rule to provide a rough check on your
calculated value of S?
Solution
The larger the number of measurements in a data set, the greater will be the tendency for
very large or very small measurements (extreme values) to appear in the data set. But
from the rules, you know that most of the measurements (approximately 95% if the
distribution is mound-shaped) will be within 2 standard deviations of the mean, and
regardless of how many measurements are in the data set, almost all of them will fall
within 3 standard deviations of the mean. Consequently, we would expect the range to be
between 4 and 6 standard deviations, i.e. between 4S and 6S.
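[Figure: The interval from $\bar{x} - 2S$ to $\bar{x} + 2S$, a width of 4S, which contains most of the measurements.]
If we let the range equal 6S, we obtain
$\text{Range} = 80 - 20 = 60 = 6S$, so $S = \dfrac{60}{6} = 10$
Or, if we let the range equal 4S, we obtain a larger (and more conservative) value for S,
namely
$\text{Range} = 60 = 4S$, so $S = \dfrac{60}{4} = 15$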
Now you can see that it does not make much difference whether you let the range equal
4S (which is more realistic for most data sets) or 6S (which is reasonable for large data
sets). It is clear that your calculated value, S = 190, is too large, and you should check
your calculations.
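The range check can be turned into a quick sanity test. A minimal sketch: `plausible_s_bounds` is a hypothetical helper returning S for a range of 6S and of 4S, and the factor of 2 used to flag a suspicious S is an assumption chosen for illustration, not part of the rule in the text.

```python
def plausible_s_bounds(smallest, largest):
    """Rough bounds on S from the rule that the range spans about 4S to 6S."""
    data_range = largest - smallest
    return data_range / 6, data_range / 4  # (S if range = 6S, S if range = 4S)


low, high = plausible_s_bounds(20, 80)
calculated_s = 190

# Hypothetical tolerance: flag S if it is below half the smaller bound or
# above twice the larger bound. The factor 2 is an assumption, not a rule.
suspicious = not (low / 2 <= calculated_s <= 2 * high)
print(f"S should be roughly between {low} and {high}")
if suspicious:
    print("The calculated S looks implausible; recheck the arithmetic.")
```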
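For grouped data with K classes, where $x_i$ is the midpoint and $f_i$ the frequency of class i, and $n = \sum_{i=1}^{K} f_i$:
$\bar{x} = \dfrac{\sum_{i=1}^{K} x_i f_i}{n}$
$S^2 = \dfrac{\sum_{i=1}^{K} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{K} x_i f_i\right)^2}{n}}{n - 1}$
$S = \sqrt{S^2}$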
Example 8
Compute the mean and standard deviation for the earnings per share data using the
grouping shown in frequency Table 1.4.
Solution
The six class intervals, midpoints and frequencies are shown in the accompanying table.
Table 1.4 Earnings per share
Class            Class midpoint $x_i$   Class frequency $f_i$
0.715 - 2.115    1.415                  8
2.115 - 3.515    2.815                  7
3.515 - 4.915    4.215                  5
4.915 - 6.315    5.615                  4
6.315 - 7.715    7.015                  3
7.715 - 9.115    8.415                  3
                                        $n = \sum f_i = 30$
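$\bar{x} = \dfrac{\sum_{i=1}^{6} x_i f_i}{n} = \dfrac{120.85}{30} \approx 4.03$
$S^2 = \dfrac{\sum_{i=1}^{6} x_i^2 f_i - \dfrac{\left(\sum_{i=1}^{6} x_i f_i\right)^2}{n}}{n - 1} = \dfrac{646.4988 - \dfrac{(120.85)^2}{30}}{29} \approx 5.5060$
$S = \sqrt{5.5060} \approx 2.35$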
You will notice that the values of $\bar{x}$, $S^2$ and S from the formulas for grouped data usually do
not agree with those obtained for the raw data ($\bar{x} = 4.08$ and S = 2.33). This is because
we have substituted the value of the class midpoint for each value of x in a class
interval. Only when every value of x in each class is equal to its respective class
midpoint will the formulas for grouped and for ungrouped data give exactly the same
answers for $\bar{x}$, $S^2$ and S. Otherwise, the formulas for grouped data give only
approximations to these numerical descriptive measures.
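The grouped-data formulas use only the class midpoints and frequencies. A minimal sketch, assuming the six midpoints and frequencies of Table 1.4: it reproduces $\sum x_i f_i = 120.85$, $\bar{x} \approx 4.03$, $S^2 \approx 5.5060$ and $S \approx 2.35$, which differ slightly from the raw-data values as the paragraph explains.

```python
midpoints = [1.415, 2.815, 4.215, 5.615, 7.015, 8.415]  # class midpoints x_i
freq = [8, 7, 5, 4, 3, 3]                               # class frequencies f_i

n = sum(freq)
sum_xf = sum(x * f for x, f in zip(midpoints, freq))       # sum of x_i f_i
sum_x2f = sum(x * x * f for x, f in zip(midpoints, freq))  # sum of x_i^2 f_i

x_bar = sum_xf / n
s2 = (sum_x2f - sum_xf ** 2 / n) / (n - 1)
s = s2 ** 0.5

print(f"sum xf = {sum_xf:.2f}, mean = {x_bar:.2f}, S^2 = {s2:.4f}, S = {s:.2f}")
```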
ranking.
Definition
Let $x_1, x_2, \ldots, x_n$ be a set of n measurements arranged in increasing (or decreasing)
order. The pth percentile is a number x such that p% of the measurements fall below the
pth percentile and $(100 - p)\%$ fall above it.
For example, if oil company A reports that its yearly sales are in the 90th percentile of all
companies in the industry, the implication is that 90% of all oil companies have yearly
sales less than company A's, and only 10% have yearly sales exceeding company A's.
[Figure: Relative frequency distribution of yearly sales; 90% of the area lies below company A's sales and 10% lies above.]
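The percentile idea can be expressed as a percentile rank: the percentage of measurements that fall below a given value. A minimal sketch; `percentile_rank` is a name chosen here, the sales figures are hypothetical and not data from the text, and textbooks differ slightly on how ties are treated.

```python
def percentile_rank(data, value):
    """Percentage of measurements in `data` that fall strictly below `value`."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


# Hypothetical yearly sales for ten companies (illustrative only).
sales = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(sales, 95))  # a company with sales of 95 sits at the 90th percentile
```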
Another measure of relative standing in popular use is the Z-score. The Z-score makes
use of the mean and standard deviation of the data set in order to specify the location of a
measurement.
Definition
The sample Z-score for a measurement x is
$Z = \dfrac{x - \bar{x}}{S}$
The Z-score represents the distance between a given measurement x and the mean,
expressed in standard units.
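The Z-score formula is a one-line function. A minimal sketch (the name `z_score` is chosen here for illustration), checked against the figures used in Examples 9 and 10 that follow.

```python
def z_score(x, x_bar, s):
    """Distance of x from the mean, measured in standard deviations."""
    return (x - x_bar) / s


print(z_score(12000, 14000, 2000))  # Example 9: Chipo's income
print(z_score(13500, 17000, 1000))  # Example 10: the executive's salary
```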
Example 9
Suppose 200 steel workers are selected, and the annual income of each is determined.
The mean and standard deviation are $\bar{x}$ = K14,000 and S = K2,000.
Suppose Chipo's annual income is K12,000. What is his sample Z-score?
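[Figure: Income scale from K8,000 ($\bar{x} - 3S$) to K20,000 ($\bar{x} + 3S$), showing Chipo's income x = K12,000 below the mean $\bar{x}$ = K14,000.]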
Solution
Chipo's annual income lies below the mean income of the 200 steel workers.
We compute
$Z = \dfrac{x - \bar{x}}{S} = \dfrac{12000 - 14000}{2000} = -1.0$
which tells us that Chipo's annual income is 1.0 standard deviation below the sample
mean; in short, his sample Z-score is -1.0.
Example 10
Suppose a female bank executive believes that her salary is low as a result of sex
discrimination. To try to substantiate her belief, she collects information on the salaries
of her counterparts in the banking business. She finds that their salaries have a mean of
K17,000 and a standard deviation of K1,000. Her salary is K13,500. Does this
information support her claim of sex discrimination?
Solution
The analysis might proceed as follows. First, we calculate the Z-score for the woman's
salary with respect to those of her male counterparts. Thus
$Z = \dfrac{13500 - 17000}{1000} = -3.5$
The implication is that the woman's salary is 3.5 standard deviations below the mean of
the male distribution. Furthermore, if a check of the male salary data shows that the
frequency distribution is mound-shaped, we can infer that very few salaries in this
distribution should have a Z-score less than -3, as shown in the figure.
[Figure: Relative frequency distribution of male salaries (K) centred at K17,000, with the woman's salary of K13,500 marked at Z-score = -3.5.]
However, the careful investigator should require more information before inferring that sex
discrimination is the case. We would want to know more about the data collection
technique the woman used, and more about her competence at her job. Other factors,
such as length of employment, should perhaps also be considered in the analysis.
Learning Objectives
After working through this Chapter you should be able to
(a)
(b)
Briefly state, with reasons, the type of chart which would best convey the
information for each of the following:
(i)
(ii)
(iii)
Numbers of cars taxed for 2002, 2003 and 2004 in areas A, B and
C of a city.
The weekly cost (K) of rented accommodation was recorded for 100
students living in an area.
Amount (thousands of Kwacha)   Frequency
0 - 4                          3
5 - 9                          17
10 - 14                        24
15 - 19                        31
20 - 24                        19
25 - 29                        6
(i)
Draw a histogram.
(ii)
(iii)
(iv)
2.
3.
The data below are per capita per week numbers of cigarettes sold for 38 states in
a country.
19.20  26.82  19.24  27.18  25.96  30.14  29.27  21.10  28.91  29.92
29.64  21.94  22.58  29.92  26.91  43.40  30.18  23.86  28.56  24.75
24.32  24.78  22.17  20.96  27.38  24.44  26.89  41.46  21.08  23.57
15.80  32.10  24.44  29.04  31.34  29.60  23.12  17.08
(a)
(b)
(c)
(d)
How does this compare with the actual situation as shown in the table
above?
(a)
Briefly state, with reasons, the type of chart which would best convey the
information in each of the following:
(b)
(i)
(ii)
(iii)
56
33
30
31
55
29
27
21
32
43
33
29
27
30
29
26
26
27
26
35
32
28
27
31
27
33
24
27
28
33
49
22
19
46
36
26
38
36
55
33
4. (a)
(i)
(ii)
Calculate the mean and the standard deviation from your frequency
table.
(iii)
Plot a histogram for these data. What is the value of the median?
(iv)
The range
(iii)
The median
(iv)
(v)
(vi)
(vii)
(viii)
18
29
42
50
61
20
33
43
54
63
10
21
35
46
56
67
11
25
39
48
58
69
14
(b)
Explain the term measure of dispersion and state briefly the advantage and
disadvantage of using the following measures of dispersion:
(i)
Range
(ii)
Mean deviation
(iii)
Standard deviation
5.
55  58  25  42  42  58  7   55  57  13
40  40  43  73  28  15  41  22  27  24
28  67  66  66  37  21  28  32  7   34
29  19  29  23  27  30  26  11  17  24
17  26  21  35  12
(a)
(b)
(c)
the mean
the standard deviation
CHAPTER 4
PROBABILITY
Reading
Newbold Chapter 3
Tailoka Frank P Chapter 8
Wonnacott and Wonnacott Chapter 3
Introductory Comments
Probability is more abstract than other parts of this subject, and solving the problems may
be difficult. The concepts are very important for statistics because it is the rules of
probability that allow one to reason about uncertainty. Independence and conditional
probability are important to understand clearly for the purpose of statistical investigation.
4.
Elementary Probability
Counting Techniques. Introduction of the probability concept. The event and the
event relationships. Probability trees, conditional probability and statistical
independence.
Counting techniques: In calculating probabilities, it is essential to be able to work
out n(S) and n(E) as straightforwardly as possible. Permutations and
combinations are very helpful here. We begin with the following basic principle.
Fundamental principle of counting. If two operations A and B are carried out, and
there are m different ways of carrying out A and k different ways of carrying out
B, then the combined operation A and B may be carried out in m × k different ways.
Example 1
Suppose a license plate contains two distinct letters followed by three digits, with
the first digit not zero. How many different license plates can be printed?
The first letter can be printed in 26 different ways, the second letter in 25 different ways
(since the letter printed first cannot be chosen as the second letter), the first digit in 9 ways
and each of the other two digits in 10 ways. Hence
26 × 25 × 9 × 10 × 10 = 585,000
different plates can be printed.
Example 2.
A toy manufacturer makes a wooden toy in two parts, the top part may be coloured red,
white or blue and the bottom part brown, orange, yellow or green. How many differently
coloured toys can be produced?
A red top part may be combined with a bottom part of any of the four possible colours.
Similarly, either a white or a blue top part may be combined with each of the four
different coloured parts. Hence the number of different coloured toys is
3 × 4 = 12
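Both counting examples follow from the fundamental principle: multiply the number of choices at each step. A minimal sketch using `math.prod` (Python 3.8+) for the licence plates and `itertools.product`, which lists every top and bottom pairing explicitly, for the toys.

```python
from itertools import product
from math import prod

# Example 1: 26 choices, then 25, then 9, 10 and 10.
plates = prod([26, 25, 9, 10, 10])
print(plates)

# Example 2: every top colour paired with every bottom colour.
tops = ["red", "white", "blue"]
bottoms = ["brown", "orange", "yellow", "green"]
toys = list(product(tops, bottoms))
print(len(toys))
```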
Example 3
Consider the set of letters a, b, c and d. Then
i)
bdca, dcba and acdb are permutations of the 4 letters (taken all at a time).
ii)
iii)
Example 4
The telephone switchboard in the company requires two operators whose chairs
(positions) are side by side. When the telephone operators go to lunch, two of the four
secretaries take their places. If we make a distinction between the two operators'
positions, in how many ways can the four secretaries fill them?
We can answer this question by determining the number of possible permutations of 4
things taken 2 at a time. There are 4 secretaries, A, B, C and D, to fill the first position.
Once this position has been filled, there are only 3 secretaries to fill the second position.
[Figure: Tree diagram of the ways to fill the first position (4 branches) and then the second position (3 branches each).]
The tree diagram illustrates that there are 4 × 3 = 12 possible permutations of
four things taken two at a time. Suppose that n is the number of distinct objects from
which an ordered arrangement is to be derived, and r is the number of objects in the
arrangement. The number of possible ordered arrangements is the number of
permutations of n things taken r at a time. This is written symbolically as P(n, r) in
general, or $_nP_r$.
$P(n, r) = n(n-1)(n-2)\cdots(n-r+1) = \dfrac{n!}{(n-r)!}$   (1)
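Formula (1) can be checked by multiplying the r factors directly and by listing arrangements. A minimal sketch: `permutations_count` is a name chosen here; `itertools.permutations` lists the 12 seatings of Example 4, and `math.perm` (Python 3.8+) serves as a cross-check on $n!/(n-r)!$.

```python
from itertools import permutations
from math import factorial, perm


def permutations_count(n, r):
    """P(n, r) = n(n-1)...(n-r+1): ordered arrangements of r out of n distinct objects."""
    result = 1
    for k in range(n, n - r, -1):  # multiply the r factors n, n-1, ..., n-r+1
        result *= k
    return result


# Example 4: two positions filled from secretaries A, B, C and D.
seatings = list(permutations("ABCD", 2))
print(len(seatings), permutations_count(4, 2))

# The product form agrees with n!/(n-r)! and with math.perm.
for n, r in [(4, 2), (5, 5), (6, 4), (3, 3)]:
    assert permutations_count(n, r) == factorial(n) // factorial(n - r) == perm(n, r)
```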
Example 5
i)
In a stock room, 5 adjacent bins are available for storing 5 different items. The
stock of each item can be stored satisfactorily in any bin. In how many ways can
we assign the 5 items to the 5 bins?
$P(5, 5) = \dfrac{5!}{(5-5)!} = 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 = 120$
ii)
Suppose that there are 6 different parts to be stocked, but only 4 bins are
available.
$P(6, 4) = \dfrac{6!}{(6-4)!} = \dfrac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{2!} = 360$
Example 6
How many permutations are there of 3 objects, say a, b and c?
There are $P(3, 3) = \dfrac{3!}{(3-3)!} = 3! = 1 \cdot 2 \cdot 3 = 6$ such permutations.
Example 7
Find the number of permutations of the letters of the word ACCOUNTANTS.
The total number of letters in ACCOUNTANTS is 11, among which there are two As, two Cs,
two Ns and two Ts. So the required number of permutations is
$\dfrac{11!}{2!\,2!\,2!\,2!} = 2{,}494{,}800$
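The same rule works for any word with repeated letters: divide n! by k! for every letter that occurs k times. A minimal sketch using `collections.Counter` to find the repeats (the function name `distinct_arrangements` is chosen here for illustration).

```python
from collections import Counter
from math import factorial


def distinct_arrangements(word):
    """n! divided by k! for each letter that occurs k times."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)  # exact: every partial quotient is an integer
    return result


print(distinct_arrangements("ACCOUNTANTS"))
```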
Combinations
A combination is an arrangement of objects without regard to order.
Example 8
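The combinations of the letters a, b, c, d taken 3 at a time are
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}, or simply
abc, abd, acd, bcd. Observe that the following permutations all represent the same combination:
abc, acb, bac, bca, cab, cba.
That is, each denotes the same set {a, b, c}.
[Table: each combination (abc, abd, acd, bcd) listed beside its 3! = 6 permutations.]
Hence the number of combinations of 4 things taken 3 at a time is
$C(4, 3) = \dfrac{P(4, 3)}{3!} = \dfrac{24}{6} = 4$
and, in general,
$C(n, r) = \dfrac{P(n, r)}{r!} = \dfrac{n!}{r!\,(n-r)!}$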
Example 10
$C(10, 6) = \dfrac{10!}{6!\,(10-6)!} = \dfrac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 6!}{6! \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 210$
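Combinations ignore order, so the count is the number of permutations divided by r!. A minimal sketch: `itertools.combinations` lists the four 3-letter combinations of Example 8, and `math.comb` (Python 3.8+) gives $C(n, r)$ directly.

```python
from itertools import combinations
from math import comb, factorial, perm

# Example 8: combinations of a, b, c, d taken 3 at a time (order ignored).
subsets = ["".join(c) for c in combinations("abcd", 3)]
print(subsets)

# C(n, r) = P(n, r) / r! = n! / (r!(n - r)!)
assert comb(4, 3) == perm(4, 3) // factorial(3) == len(subsets)
print(comb(10, 6))
```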
Tree Diagrams
A tree diagram is a device used to enumerate all the possible outcomes of a sequence of
experiments where each experiment can occur in a finite number of ways. The
construction of tree diagrams is illustrated in the following examples.
Example 11
Find the product A × B × C, where
A = {1, 2}, B = {a, b, c} and C = {3, 4}. The tree diagram follows.
[Figure: Tree diagram branching first on 1 or 2, then on a, b or c, then on 3 or 4.]
Its end points give the 12 elements of A × B × C:
(1, a, 3), (1, a, 4), (1, b, 3), (1, b, 4), (1, c, 3), (1, c, 4),
(2, a, 3), (2, a, 4), (2, b, 3), (2, b, 4), (2, c, 3), (2, c, 4)
Observe that the tree is constructed from left to right, and that the number of branches at
each point corresponds to the number of possible outcomes of the next experiment.
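Walking a tree diagram from left to right is what `itertools.product` does. A minimal sketch for Example 11: the tuples come out in the same order as the end points of the tree, and their number is 2 × 3 × 2.

```python
from itertools import product

A = [1, 2]
B = ["a", "b", "c"]
C = [3, 4]

# product() branches on A first, then B, then C, like the tree read left to right.
outcomes = list(product(A, B, C))
for triple in outcomes:
    print(triple)
print(len(outcomes))
```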
Example 12
Mumba and Ened are to play a tennis tournament. The first person to win two games in a
row or to win a total of three games wins the tournament. The following diagram
shows the possible outcomes of the tournament.
[Figure: Tree diagram of the tournament, each branch labelled M (Mumba wins the game) or E (Ened wins the game).]
Observe that there are 10 end points, which correspond to the 10 possible outcomes of the
tournament:
MM, MEMM, MEMEM, MEMEE, MEE, EMM, EMEMM, EMEME, EMEE, EE
The path from the beginning of the tree to an end point indicates who won which game
in the individual tournament.
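The tournament tree can be generated by a short recursion that stops as soon as someone has won. A minimal sketch (the function names are chosen here for illustration): at each node the M branch is explored before the E branch, which reproduces the ten outcomes in the order listed above.

```python
def finished(games):
    """True once someone has won two games in a row or three games in total."""
    if len(games) >= 2 and games[-1] == games[-2]:
        return True
    return games.count("M") == 3 or games.count("E") == 3


def outcomes(games=""):
    """Depth-first walk of the tree: the M branch before the E branch at every node."""
    if finished(games):
        return [games]
    return outcomes(games + "M") + outcomes(games + "E")


print(outcomes())
```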
Basics of Probability
Given a sample space S, we need to assign to each event that can be obtained from S a
number, called the probability of the event. This number will indicate the relative
likelihood of the various events.
For events that are equally likely, the probability of an event can be found from the
following basic probability principle. Suppose an event E can occur in m ways out of a
total of n possible, equally likely outcomes. Then the probability that event E occurs,
written P(E), is
$P(E) = \dfrac{m}{n}$   (1)
This same result can also be given in terms of the cardinal number of a set, where n(E)
represents the number of elements in a finite set E. With the same assumptions given
above,
$P(E) = \dfrac{n(E)}{n(S)}$   (2)
Example 1
Suppose a fair coin is tossed twice. The sample space is S = {HH, HT, TH, TT}.
Set S contains 4 outcomes, all of which are equally likely. (This makes n = 4 in
formula (1) above.) Find the probability of the following outcomes.
a)
One head and one tail: E = {HT, TH}
Event E contains two elements, so
$P(E) = \dfrac{2}{4} = \dfrac{1}{2}$
By this result, one head and one tail will show up 1/2 of the time when a fair coin is
tossed twice.
b)
Two heads
Let event F = {HH} be the event that two heads are observed when a fair coin is
tossed twice. Event F contains one element, so
$P(F) = \dfrac{1}{4}$
c)
Three heads
A fair coin tossed twice can never show three heads. If G is this event, then G = ∅, and
$P(G) = \dfrac{0}{4} = 0$
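Formula (2) can be applied by listing the sample space and counting the outcomes in each event. A minimal sketch: the sample space is generated with `itertools.product`, events are written as predicates (a representation chosen here for illustration), and `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT.
sample_space = ["".join(t) for t in product("HT", repeat=2)]


def probability(event):
    """P(E) = n(E) / n(S) for equally likely outcomes; `event` is a predicate on outcomes."""
    favourable = [o for o in sample_space if event(o)]
    return Fraction(len(favourable), len(sample_space))


p_one_each = probability(lambda o: o.count("H") == 1)     # part (a)
p_two_heads = probability(lambda o: o.count("H") == 2)    # part (b)
p_three_heads = probability(lambda o: o.count("H") == 3)  # part (c)
print(p_one_each, p_two_heads, p_three_heads)
```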
Example 2
If a single playing card is drawn at random from an ordinary 52-card bridge deck,
find the probability of each of the following events.
a)
An ace is drawn
There are four aces in the deck, out of 52 cards, so
P(ace) $= \dfrac{4}{52} = \dfrac{1}{13}$
b)
A face card is drawn
There are 12 face cards in the deck, so
P(face card) $= \dfrac{12}{52} = \dfrac{3}{13}$
c)
A spade is drawn
The deck contains 13 spades, so
P(spade) $= \dfrac{13}{52} = \dfrac{1}{4}$
d)
$\dfrac{26}{52} = \dfrac{1}{2}$
Example 3
The Manager of a department store has decided to make a study on the size of purchases
made by people coming into the store. To begin he chooses a day that seems fairly
typical and gathers the following data. (Purchases have been rounded to the nearest
Kwacha) with sales tax ignored.
Amount of purchase
Number of customer
Probability (relative
frequency)
K0 and under
160
0.280
84
0.147
50
0.088
and under
136
0.239
and under
77
0.135
63
0.111
570
1.000
K13500
K13500
K20250
K20250
K22500
K22500 and over
Probability Distributions.
In Example 3 the outcomes were various purchase amounts, and a probability was
assigned to each outcome. By this process, a probability distribution can be set up; that is
to each possible outcome of an experiment, a number, called the probability of that
outcome, is assigned.
Example 4
Set up a probability distribution for the number of heads observed when a fair coin is
tossed twice.
Number of heads   Probability
0                 1/4
1                 2/4
2                 1/4
Total             1
The probability distribution that was set up suggests the following properties of
probability.
Let $S = \{S_1, S_2, S_3, \ldots, S_n\}$ be the sample space obtained from the union of n distinct
simple events $S_1, S_2, S_3, \ldots, S_n$ with associated probabilities $P_1, P_2, P_3, \ldots, P_n$. Then
1.
$0 \le P_1 \le 1,\ 0 \le P_2 \le 1,\ \ldots,\ 0 \le P_n \le 1$
(All probabilities are between 0 and 1 inclusive);
2.
$P_1 + P_2 + P_3 + \cdots + P_n = 1$
(The sum of all probabilities for a sample space is 1);
3.
P(S) = 1
4.
P(∅) = 0
48
Addition Principle
Suppose $E = \{s_1\} \cup \{s_2\} \cup \cdots \cup \{s_n\}$, where $\{s_1\}, \{s_2\}, \ldots, \{s_n\}$ are distinct simple events. Then
$P(E) = P(s_1) + P(s_2) + \cdots + P(s_n)$
Example 5
Refer to Example 3 and find the probability that a customer spends at least K11250 but less than K20250.
This event is the union of two simple events: spending K11250 and under K13500, and spending K13500 and under K20250. The probability of spending at least K11250 but less than K20250 can thus be found by the addition principle. Let this event be A; then
$P(A) = P(\text{K11250 to under K13500}) + P(\text{K13500 to under K20250}) = 0.088 + 0.239 = 0.327$
… Find P(E').
$P(E') = 1 - P(E) = 1 - \frac{3}{8} = \frac{5}{8}$
Example 7
In Example 3 above, find the probability that a customer spends less than K22500. Let E be the event that a customer spends less than K22500. Then
$P(E) = 0.280 + 0.147 + 0.088 + 0.239 + 0.135 = 0.889$
Alternatively, E' is the event that a customer spends K22500 and over; from the table, P(E') = 0.111, so
$P(E) = 1 - P(E') = 1 - 0.111 = 0.889$
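The addition principle and the complement rule used in Examples 5 and 7 amount to summing rows of the purchase table. A short Python sketch; it assumes the rows of Example 3 are listed from the lowest to the highest spending class, so that the third and fourth rows are the "K11250 to under K13500" and "K13500 to under K20250" classes named in Example 5:

```python
# Relative frequencies from Example 3, in table order (lowest spending class first).
probs = [0.280, 0.147, 0.088, 0.239, 0.135, 0.111]

# Addition principle: P(E) is the sum of the probabilities of the simple
# events that make up E. Assumed: rows 3 and 4 are the two classes of Example 5.
p_example5 = probs[2] + probs[3]

# Example 7: "less than K22500" is every class except the last one.
p_less_than_22500 = sum(probs[:5])

# Complement rule: P(E) = 1 - P(E'), where E' = "K22500 and over" (last row).
p_complement_route = 1 - probs[5]

print(round(p_example5, 3), round(p_less_than_22500, 3), round(p_complement_route, 3))
# 0.327 0.889 0.889
```

Both routes to Example 7 agree, which is a quick check that the relative frequencies sum to 1.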
Odds
The odds in favor of an event E are defined as the ratio of P(E) to P(E'), or
$\frac{P(E)}{P(E')}, \quad P(E') \neq 0$
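Example 8
Suppose the weather forecaster says that the probability of rain tomorrow is 2/5. Find the odds in favor of rain tomorrow.
Let E be the event "rain tomorrow". We have $P(E) = \frac{2}{5}$ and $P(E') = 1 - \frac{2}{5} = \frac{3}{5}$. By the definition of odds, the odds in favor of rain are
$\frac{2/5}{3/5} = \frac{2}{3}$, written 2 to 3 or 2:3
(equivalently, the odds against rain are 3 to 2, or 3:2).
If the odds in favor of an event E are m to n, then
$P(E) = \frac{m}{m + n} \quad \text{and} \quad P(E') = \frac{n}{m + n}$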
Example 9
The odds that a particular bid will be the low bid are 8 to 13. Find the probability that the bid will be the low bid.
Solution
Odds of 8 to 13 show 8 favorable chances out of 8 + 13 = 21 chances altogether. There is a
$\frac{8}{8 + 13} = \frac{8}{21}$
chance that the bid will be the low bid, and a $\frac{13}{21}$ chance that the bid will not be the low bid.
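Converting between odds and probabilities is a one-line calculation each way. A minimal Python sketch using exact fractions; the helper names `odds_in_favor` and `prob_from_odds` are ours, chosen for illustration:

```python
from fractions import Fraction

def odds_in_favor(p):
    """Odds in favour of E: P(E) / P(E'), as a reduced fraction m/n ("m to n").

    Assumes 0 <= p < 1, so that P(E') is not zero.
    """
    p = Fraction(p)
    return p / (1 - p)

def prob_from_odds(m, n):
    """If the odds in favour of E are m to n, then P(E) = m / (m + n)."""
    return Fraction(m, m + n)

print(odds_in_favor(Fraction(2, 5)))   # Example 8: 2/3, i.e. 2 to 3
print(prob_from_odds(8, 13))           # Example 9: 8/21
print(1 - prob_from_odds(8, 13))       # not the low bid: 13/21
```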
Example 10.
If a single card is drawn from an ordinary deck, find the probability that it will be red or a face card.
Let R and F represent the events "red" and "face card", respectively. Then
$P(R) = \frac{26}{52}, \quad P(F) = \frac{12}{52}, \quad P(R \cap F) = \frac{6}{52}$
(there are six red face cards in a deck). By the extended addition principle,
$P(R \cup F) = P(R) + P(F) - P(R \cap F) = \frac{26}{52} + \frac{12}{52} - \frac{6}{52} = \frac{32}{52} = \frac{8}{13}$
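The extended addition principle can be checked by brute force: build the 52-card deck, count the cards in each event, and compare the direct count of "red or face card" with the formula. A small Python sketch; the rank and suit labels are illustrative:

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUIT_COLOUR = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
DECK = list(product(RANKS, SUIT_COLOUR))   # 52 (rank, suit) pairs

def prob(event):
    """Classical probability: number of cards in the event divided by 52."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_red(card):
    return SUIT_COLOUR[card[1]] == "red"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

p_r = prob(is_red)                                    # 26/52
p_f = prob(is_face)                                   # 12/52
p_r_and_f = prob(lambda c: is_red(c) and is_face(c))  # 6/52
p_r_or_f = prob(lambda c: is_red(c) or is_face(c))    # counted directly

# Extended addition principle: P(R or F) = P(R) + P(F) - P(R and F)
assert p_r_or_f == p_r + p_f - p_r_and_f
print(p_r_or_f)   # 8/13
```

Without the subtraction, the six red face cards would be counted twice.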
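Example 11
Suppose two fair dice are rolled. Find each of the following probabilities.
The 36 equally likely outcomes are:
(1,1) (2,1) (3,1) (4,1) (5,1) (6,1)
(1,2) (2,2) (3,2) (4,2) (5,2) (6,2)
(1,3) (2,3) (3,3) (4,3) (5,3) (6,3)
(1,4) (2,4) (3,4) (4,4) (5,4) (6,4)
(1,5) (2,5) (3,5) (4,5) (5,5) (6,5)
(1,6) (2,6) (3,6) (4,6) (5,6) (6,6)
a)
$P(A) = \frac{6}{36}, \quad P(B) = \frac{5}{36}, \quad P(A \cap B) = \frac{1}{36}$, so
$P(A \cup B) = \frac{6}{36} + \frac{5}{36} - \frac{1}{36} = \frac{10}{36} = \frac{5}{18}$
b)
… $= \frac{4}{36}$, $P(\text{second die is } 4) = \frac{6}{36}$, and the probability that both occur is $\frac{1}{36}$, so the required probability is
$\frac{4}{36} + \frac{6}{36} - \frac{1}{36} = \frac{9}{36} = \frac{1}{4}$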
Often we are interested in how certain events are related to the occurrence of
other events. In particular, we may be interested in the probability of the
occurrence of an event given that another related event has occurred. Such
probabilities are referred to as Conditional Probabilities.
The conditional probability of event E given event F, written P(E|F), is
$P(E \mid F) = \frac{P(E \cap F)}{P(F)}, \quad P(F) \neq 0$
Example 11
The Training Manager for a large stockbrokerage firm has noticed that some of the firm's brokers use the firm's research advice, while other brokers tend to go with their own feelings of which stocks will go up. To see if the research department is better than just the feelings of the brokers, the manager conducted a survey of 100 brokers, with results as shown in the following table.
 | Picked stocks that went up | Didn't pick stocks that went up | Total
Used research | 30 | 15 | 45
Didn't use research | 30 | 25 | 55
Totals | 60 | 40 | 100
Letting A represent the event "picked stocks that went up" and letting B represent the event "used research", we can find the following probabilities.
$P(A) = \frac{60}{100} = 0.6 \qquad P(A') = \frac{40}{100} = 0.4$
$P(B) = \frac{45}{100} = 0.45 \qquad P(B') = \frac{55}{100} = 0.55$
Suppose we want to find the probability that a broker using research will pick stocks that go up. From the table above, of the 45 brokers who use research, 30 picked stocks that went up, so
$P(\text{broker who uses research picks stocks that go up}) = \frac{30}{45} = 0.667$
This is a different number from the probability that a broker picks stocks that go up, 0.6, since we have additional information (the broker uses research), which has reduced the sample space. In other words, we found the probability that a broker picks stocks that go up, A, given the additional information that the broker uses research, B. This is called the conditional probability of event A, given that event B has occurred, written P(A|B). In the example above,
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{30/100}{45/100} = \frac{30}{45} = 0.667$
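The same conditional probability can be computed straight from the survey counts. A minimal Python sketch; the dictionary keys are our own labels for the cells of the broker table:

```python
# Counts from the broker survey: rows = used research or not,
# columns = picked stocks that went up (event A) or not.
table = {
    ("research", "up"): 30, ("research", "not up"): 15,
    ("no research", "up"): 30, ("no research", "not up"): 25,
}
total = sum(table.values())   # 100 brokers

p_a = (table[("research", "up")] + table[("no research", "up")]) / total    # P(A)
p_b = (table[("research", "up")] + table[("research", "not up")]) / total   # P(B)
p_a_and_b = table[("research", "up")] / total                               # P(A and B)

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_a_given_b = p_a_and_b / p_b
print(round(p_a, 3), round(p_b, 3), round(p_a_given_b, 3))   # 0.6 0.45 0.667

# Rearranged, this is the product rule: P(A and B) = P(B) * P(A | B)
assert abs(p_b * p_a_given_b - p_a_and_b) < 1e-12
```

Dividing by P(B) is what "reducing the sample space" to the 45 research users means numerically.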
Product Rule: For any events E and F,
$P(E \cap F) = P(F) \cdot P(E \mid F)$
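Example 12.
A class is 2/5 women and 3/5 men. Of the women, 25% are business majors. Find the probability that a student chosen at random from the class is a woman who is a business major.
Solution
Let B and W represent the events "business major" and "woman", respectively. We want to find $P(B \cap W)$. By the product rule,
$P(B \cap W) = P(W) \cdot P(B \mid W)$
Using the given information, $P(W) = \frac{2}{5}$ and $P(B \mid W) = 0.25 = \frac{1}{4}$, so
$P(B \cap W) = \frac{2}{5} \cdot \frac{1}{4} = \frac{1}{10}$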
Example 13
The firm has assigned the following probabilities on the basis of available information. That is, the investment company believes the probability is 0.8 that the XYZ common stock will gain 10% in the next year, assuming that the GNP gains 10% in the same time period. In addition, the company believes the probability is only 0.3 that the GNP will gain 10% in the next year. Use the formula for calculating the probability of an intersection to calculate the probability that XYZ common stock and the GNP both gain 10% in the next year.
Solution.
Let A be the event "XYZ common stock gains 10% in the next year" and B the event "the GNP gains 10% in the next year". Then P(A|B) = 0.8 and P(B) = 0.3, and by the product rule
$P(A \cap B) = P(B) \cdot P(A \mid B) = (0.3)(0.8) = 0.24$
Thus, the probability, according to this investment firm, is 0.24 that both XYZ common stock and the GNP will gain 10% in the next year.
In the previous section we showed that the probability of an event A may be substantially altered by the assumption that the event B has occurred. However, this will not always be the case. In some instances the assumption that event B has occurred will not alter the probability of event A at all. When this is true, we call events A and B independent.
Events A and B are independent if the assumption that B has occurred does not alter the probability that A occurs, i.e.
$P(A \mid B) = P(A) \quad \text{and} \quad P(B \mid A) = P(B)$
Equivalently, by the product rule, A and B are independent when $P(A \cap B) = P(A) \cdot P(B)$.
Example 14
The probability that interest rates will rise has been assessed as 0.8. If they do rise, the probability that the stock market index will drop is estimated to be 0.9. If the interest rates do not rise, the probability that the stock market index will still drop is estimated as 0.4. What is the probability that the stock market index will drop?
Solution
P(A) = P(interest rates rise) = 0.8.
P(B) = P(stock market index drops) = ?
Then the probability of A', the complement of A (interest rates do not rise), is
$P(A') = 1 - 0.8 = 0.2$
The index can drop either after a rise in rates or without one, so by the product rule and the addition principle,
$P(B) = P(A) \cdot P(B \mid A) + P(A') \cdot P(B \mid A') = (0.8)(0.9) + (0.2)(0.4) = 0.72 + 0.08 = 0.80$
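The step used in Example 14 (splitting B into the mutually exclusive pieces "A and B" and "A' and B", then adding) is often called the law of total probability. A minimal Python sketch of the calculation:

```python
# Example 14: A = interest rates rise, B = stock market index drops.
p_a = 0.8               # P(A)
p_b_given_a = 0.9       # P(B | A)
p_b_given_not_a = 0.4   # P(B | A')

p_not_a = 1 - p_a       # complement rule: P(A') = 0.2

# Each piece of B comes from the product rule; the addition principle
# combines them because the two pieces cannot both occur.
p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a
print(round(p_b, 2))    # 0.8
```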
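Example 15
Suppose we toss a fair die. Let B be the event "a number less than or equal to 4 is observed" and A the event "an even number is observed". Are events A and B independent?
$P(B) = \frac{4}{6} = \frac{2}{3}$, since $B = \{1, 2, 3, 4\}$
$P(A) = \frac{3}{6} = \frac{1}{2}$, since $A = \{2, 4, 6\}$
$P(A \cap B) = \frac{2}{6} = \frac{1}{3}$, where $A \cap B = \{2, 4\}$
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/3}{2/3} = \frac{1}{2} = P(A)$
$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/3}{1/2} = \frac{2}{3} = P(B)$
Also $P(A) \cdot P(B) = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3} = P(A \cap B)$, so A and B are independent.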
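Independence can be tested mechanically with Python sets and exact fractions. A small sketch of Example 15, assuming the six faces of the die are equally likely:

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}             # fair die, equally likely faces
A = {x for x in S if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in S if x <= 4}       # at most 4: {1, 2, 3, 4}

def P(event):
    """Classical probability on the die's sample space."""
    return Fraction(len(event), len(S))

p_a_given_b = P(A & B) / P(B)      # & is set intersection
print(P(A), P(B), P(A & B), p_a_given_b)   # 1/2 2/3 1/3 1/2

# Independence: P(A | B) = P(A), equivalently P(A and B) = P(A) P(B)
independent = P(A & B) == P(A) * P(B)
print(independent)                 # True
```

Exact fractions avoid floating-point noise, so the equality test is reliable here.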
Bayes' Theorem
A posteriori Probabilities
Suppose three machines, A, B, and C, produce similar engine components. Machine A produces 45 percent of the total components, machine B produces 30 percent, and machine C, 25 percent. For the usual production schedule, 6 percent of the components produced by machine A do not meet established specifications; for machines B and C, the corresponding figures are 4 percent and 3 percent. One component is selected at random from the total output and is found to be defective. What is the probability that the component selected was produced by machine A?
The answer to this question is found by calculating the probability after the outcomes of the experiment have been observed.
[Figure: Venn diagram of the sample space partitioned into A, B and C, with the event D (defective) split into A ∩ D, B ∩ D and C ∩ D.]
The three mutually exclusive events A, B and C form a partition of the sample space S: apart from being mutually exclusive, their union is precisely S.
The event D may be expressed as:
1. $D = (A \cap D) \cup (B \cap D) \cup (C \cap D)$, a union of mutually exclusive events.
2. The probability that a defective component came from machine A is given by
$P(A \mid D) = \frac{n(A \cap D)}{n(D)} = \frac{P(A \cap D)}{P(D)} = \frac{P(A \cap D)}{P(A \cap D) + P(B \cap D) + P(C \cap D)} \quad (1)$
By the product rule,
$P(A \cap D) = P(A) P(D \mid A), \quad P(B \cap D) = P(B) P(D \mid B), \quad P(C \cap D) = P(C) P(D \mid C)$
so that
$P(A \mid D) = \frac{P(A) P(D \mid A)}{P(A) P(D \mid A) + P(B) P(D \mid B) + P(C) P(D \mid C)} \quad (2)$
Every quantity on the right-hand side of (2) is either given or can be calculated in the usual fashion. In fact, by displaying these quantities on a tree diagram, we obtain Figure 1.0. We may compute the required probability by substituting the relevant quantities into (2), or we may make use of the following device:
P(A|D) = (product of probabilities along the limb through A) / (sum of products of the probabilities along each limb terminating at D)
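Figure 1.0 (tree diagram, shown as a table). Step 1: machine; Step 2: condition of the component (D = defective, D' = not defective).
Machine | Probability | Condition | Conditional probability | Product along the limb
A | $P(A) = 0.45$ | D | $P(D \mid A) = 0.06$ | $P(A \cap D) = P(A) \cdot P(D \mid A) = 0.027$
A | | D' | $P(D' \mid A) = 0.94$ | $P(A \cap D') = P(A) \cdot P(D' \mid A) = 0.423$
B | $P(B) = 0.30$ | D | $P(D \mid B) = 0.04$ | $P(B \cap D) = P(B) \cdot P(D \mid B) = 0.012$
B | | D' | $P(D' \mid B) = 0.96$ | $P(B \cap D') = P(B) \cdot P(D' \mid B) = 0.288$
C | $P(C) = 0.25$ | D | $P(D \mid C) = 0.03$ | $P(C \cap D) = P(C) \cdot P(D \mid C) = 0.0075$
C | | D' | $P(D' \mid C) = 0.97$ | $P(C \cap D') = P(C) \cdot P(D' \mid C) = 0.2425$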
$P(A \mid D) = \frac{(0.45)(0.06)}{(0.45)(0.06) + (0.30)(0.04) + (0.25)(0.03)} = \frac{0.027}{0.027 + 0.012 + 0.0075} = \frac{0.027}{0.0465} = 0.581$
Before looking at any further examples, let us state the general form of Bayes' Theorem.
Let $A_1, A_2, \ldots, A_n$ be a partition of a sample space S and let E be an event of the experiment such that $P(E) \neq 0$. Then the posterior probability $P(A_i \mid E)$ $(1 \le i \le n)$ is given by
$P(A_i \mid E) = \frac{P(A_i) P(E \mid A_i)}{P(A_1) P(E \mid A_1) + P(A_2) P(E \mid A_2) + \cdots + P(A_n) P(E \mid A_n)} \quad (3)$
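Formula (3) translates directly into a few lines of code: multiply each prior by its likelihood, then divide by the total. A minimal Python sketch; the function name `bayes_posteriors` is ours, not from any library:

```python
def bayes_posteriors(priors, likelihoods):
    """Posterior P(A_i | E) for a partition A_1..A_n, given P(A_i) and P(E | A_i)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i) P(E | A_i)
    p_e = sum(joints)                                         # denominator of (3), i.e. P(E)
    if p_e == 0:
        raise ValueError("P(E) must be non-zero")
    return [j / p_e for j in joints]

# Machines A, B, C: shares of output and defect rates from the text.
post = bayes_posteriors([0.45, 0.30, 0.25], [0.06, 0.04, 0.03])
print([round(p, 3) for p in post])   # [0.581, 0.258, 0.161]
```

The same function reproduces the answers to the problems below, for example `bayes_posteriors([0.55, 0.30, 0.15], [0.7, 0.2, 0.1])` for the okra-packing question.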
Problems
1) …
MMD | UPND | Independent
2) Three girls, Chanda, Mumba and Chileshe, pack okra in a factory. From the batch allotted to them, Chanda packs 55%, Mumba 30% and Chileshe 15%. The probability that Chanda breaks some okra in a packet is 0.7, and the respective probabilities for Mumba and Chileshe are 0.2 and 0.1. What is the probability that a packet with broken okra found by the checker was packed by
a) Chanda?
b) Mumba?
c) Chileshe?
3) …
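Solutions
1) MMD | UPND | Independent
$P(\text{MMD}) = 0.40, \quad P(\text{UPND}) = 0.35, \quad P(I) = 0.25$
$P(V \mid \text{MMD}) = 0.45, \quad P(V \mid \text{UPND}) = 0.40, \quad P(V \mid I) = 0.60$
Writing M for MMD and U for UPND,
$P(V) = P(M) P(V \mid M) + P(U) P(V \mid U) + P(I) P(V \mid I) = 0.18 + 0.14 + 0.15 = 0.47$
i) $P(M \mid V) = \frac{P(M) P(V \mid M)}{P(V)} = \frac{0.18}{0.47} = 0.383$
ii) $P(U \mid V) = \frac{P(U \cap V)}{P(V)} = \frac{P(U) P(V \mid U)}{P(V)} = \frac{0.14}{0.47} = 0.298$
iii) $P(I \mid V) = \frac{P(I) P(V \mid I)}{P(V)} = \frac{0.15}{0.47} = 0.319$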
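2) Let D, M and H denote the events that the packet was packed by Chanda, Mumba and Chileshe, respectively, and B the event that the packet contains broken okra.
$P(D) = 0.55, \; P(B \mid D) = 0.7; \quad P(M) = 0.30, \; P(B \mid M) = 0.2; \quad P(H) = 0.15, \; P(B \mid H) = 0.1$
$P(B) = (0.55)(0.7) + (0.30)(0.2) + (0.15)(0.1) = 0.385 + 0.06 + 0.015 = 0.46$
a) $P(D \mid B) = \frac{P(D) P(B \mid D)}{P(B)} = \frac{0.385}{0.46} = 0.837$
b) $P(M \mid B) = \frac{P(M) P(B \mid M)}{P(B)} = \frac{0.06}{0.46} = 0.1304$
c) $P(H \mid B) = \frac{P(H) P(B \mid H)}{P(B)} = \frac{0.015}{0.46} = 0.0326$
3)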
Let R be the event that the professor received the material, and A the event that the professor adopted the book.
$P(R) = 0.8, \quad P(R') = 0.2$
$P(A \mid R) = 0.30, \quad P(A' \mid R) = 0.70$
$P(A \mid R') = 0.10, \quad P(A' \mid R') = 0.90$
$P(R \mid A) = \frac{P(R \cap A)}{P(A)} = \frac{P(R) P(A \mid R)}{P(R) P(A \mid R) + P(R') P(A \mid R')} = \frac{0.8(0.30)}{0.8(0.30) + 0.2(0.10)} = \frac{0.24}{0.26} = 0.923$
Learning Objectives
After working through this Chapter, you should be able to
CHAPTER 5
PROBABILITY DISTRIBUTION
Reading
Newbold Chapters 4 (not 4.4) and only 5.5 in Chapter 5
Wonnacott and Wonnacott Chapter 4
Tailoka Frank P Chapter 9
Introductory Comments
This Chapter introduces three useful standard distributions: two for counts (discrete probability distributions) and one continuous probability distribution. These are so often used that everyone should be familiar with them. We need to know the mean, the variance and how to find simple probabilities.
5.0
1. $f(x) \ge 0$
2. $\sum_x f(x) = 1$
Property 1 simply states that probabilities are greater than or equal to zero. The second property states that the sum of the probabilities in a probability distribution is equal to 1. The notation $\sum_x f(x)$ means the sum of the values of f for all the values that x takes on. We will ordinarily use the term probability distribution to refer to both discrete and continuous variables; probability distributions are also called probability functions, and other terms are sometimes used.
Probability distributions of discrete random variables are often referred to as
probability mass functions or simply mass functions because the probabilities are
massed at distinct points, for example along the x axis.
Probability distributions of continuous random variables are referred to as
probability density functions or density functions.
5.1
$F(c) = \sum_{x \le c} f(x) \quad (2)$
The symbol $\sum_{x \le c} f(x)$ means the sum of the values of f(x) for all values of x less than or equal to c.
Example 1
Shoprite is interested in diversifying its product line into the soft goods market. Mr Phiri, Vice President in charge of mergers and acquisitions, is negotiating the acquisition of Quicksave, a discount shop. To determine the price Shoprite would have to pay per share for Quicksave, he sets up the probability distribution for the stock price shown in the table below.
Probability distribution and cumulative distribution for the price of Quicksave common stock.
Price of Quicksave common stock, x | Probability, f(x) | Cumulative probability, F(x)
K74 250 | 0.08 | 0.08
K76 500 | 0.15 | 0.23
K78 750 | 0.53 | 0.76
K81 000 | 0.20 | 0.96
K83 250 | 0.04 | 1.00
A graph of the cumulative distribution function is a step function; that is, its values change in discrete steps at the indicated values of the random variable x.
[Figure: step graph of F(x), from 0.00 to 1.00, against the price of stock, K74 250 to K83 250.]
Graph of cumulative distribution of the price of Quicksave common stocks.
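The cumulative column is a running total of f(x), and F(c) for any c between the listed prices equals the value at the last price not exceeding c, which is why the graph is a step function. A minimal Python sketch (prices written without the K prefix):

```python
from itertools import accumulate

prices = [74250, 76500, 78750, 81000, 83250]   # x, in Kwacha
f = [0.08, 0.15, 0.53, 0.20, 0.04]             # f(x)

# Running totals give the cumulative distribution F(x).
F = [round(v, 2) for v in accumulate(f)]
print(F)   # [0.08, 0.23, 0.76, 0.96, 1.0]

def cdf(c):
    """F(c) = sum of f(x) over all x <= c; it jumps only at the listed prices."""
    return round(sum(p for x, p in zip(prices, f) if x <= c), 2)

print(cdf(80000))   # 0.76, the same as F(78 750): no jump in between
```

Rounding to two decimals only hides floating-point noise from the repeated additions.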
5.2
$E(x) = \mu = \sum_{\text{all } x} x P(x)$
The variance of a discrete random variable x is
$\sigma^2 = E(x - \mu)^2 = \sum_{\text{all } x} (x - \mu)^2 p(x)$
In general, if g(x) is any function of the discrete random variable x, then
$E[g(x)] = \sum_{\text{all } x} g(x) P(X = x)$
For example,
$E(20x) = \sum 20x P(X = x)$
$E(x^2) = \sum x^2 P(X = x)$
$E(X + 5) = \sum (x + 5) P(X = x)$
Example 2
The random variable X has the following distribution for x = 1, 2, 3, 4.
x | 1 | 2 | 3 | 4
P(X = x) | 0.02 | 0.35 | 0.53 | 0.10
Calculate:
a) $E(x)$
b) $E(5x - 3)$
c) $E(X^2)$
d) $6E(x) + 8$
e) $E(5x^2 + 2)$
Solution
a) $E(x) = \sum x P(X = x) = 1(0.02) + 2(0.35) + 3(0.53) + 4(0.10) = 0.02 + 0.70 + 1.59 + 0.40 = 2.71$
b) $E(5x - 3) = 5E(x) - 3 = 5 \sum x P(X = x) - 3 = 5[1(0.02) + 2(0.35) + 3(0.53) + 4(0.10)] - 3 = 5(2.71) - 3 = 13.55 - 3 = 10.55$
c) $E(X^2) = \sum x^2 P(X = x) = 1^2(0.02) + 2^2(0.35) + 3^2(0.53) + 4^2(0.10) = 0.02 + 1.4 + 4.77 + 1.6 = 7.79$
d) $6E(x) + 8 = 6 \sum x P(X = x) + 8 = 6(2.71) + 8 = 16.26 + 8 = 24.26$
e) $E(5x^2 + 2) = 5E(x^2) + 2 = 5 \sum x^2 P(X = x) + 2 = 5(7.79) + 2 = 40.95$
Example 3
For the data in Example 2, calculate the following:
a) $\text{Var}(5x - 3)$
b) $\text{Var}(4x)$
c) $\text{Var}(3x + 2)$
Solution
a) Since $\text{Var}(ax + b) = a^2 \, \text{Var}(x)$, we have $\text{Var}(5x - 3) = 25 \, \text{Var}(x)$. We will need to find $\text{Var}(x) = E(x^2) - E^2(x)$:
$E(X) = \sum x P(X = x) = 2.71$
$E(X^2) = \sum x^2 P(X = x) = 7.79$
$\text{Var}(x) = E(X^2) - E^2(x) = 7.79 - (2.71)^2 = 0.4459$
$\text{Var}(5x - 3) = 25 \, \text{Var}(x) = 25(0.4459) = 11.1475$
b) $\text{Var}(4x) = 16 \, \text{Var}(x) = 16(0.4459) = 7.1344$
c) $\text{Var}(3x + 2) = 9 \, \text{Var}(x) = 9(0.4459) = 4.0131$
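All of Examples 2 and 3 follow from one rule, $E[g(x)] = \sum g(x) P(X = x)$. A minimal Python sketch with a small helper, `expect`, that is ours for illustration:

```python
values = [1, 2, 3, 4]
probs = [0.02, 0.35, 0.53, 0.10]

def expect(g):
    """E[g(X)] = sum of g(x) P(X = x) over all x."""
    return sum(g(x) * p for x, p in zip(values, probs))

mean = expect(lambda x: x)          # E(X)
second = expect(lambda x: x ** 2)   # E(X^2)
var = second - mean ** 2            # Var(X) = E(X^2) - [E(X)]^2

print(round(mean, 2), round(second, 2), round(var, 4))   # 2.71 7.79 0.4459
print(round(expect(lambda x: 5 * x - 3), 2))             # 10.55, i.e. 5E(X) - 3
print(round(25 * var, 4))                                 # 11.1475, i.e. Var(5X - 3)
```

Computing Var(5X − 3) directly from its definition gives the same number as 25 Var(X), which is a useful check on the rule $\text{Var}(ax + b) = a^2 \, \text{Var}(x)$.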
Example 4
A risky investment involves paying K300 000 that will return K2 700 000 (for a net profit of K2 400 000) with probability 0.3, or K0 (for a net loss of K300 000) with probability 0.7. What is your expected net profit from this investment?
Solution
x | 2 400 000 | −300 000
P(x) | 0.3 | 0.7
(Note that a loss is treated as a negative profit.)
Then
$E(x) = \sum x P(x) = 2\,400\,000(0.3) + (-300\,000)(0.7) = 720\,000 - 210\,000 = 510\,000$
Your expected net profit on an investment of this kind is K510 000. If you were to make a very large number of investments, some would result in a net profit of K2 400 000, and others would result in a net loss of K300 000. However, in the long run, your average net profit per investment would be K510 000.
5.3
On each trial, there are two mutually exclusive possible outcomes, which are referred to as "success" and "failure". In somewhat different language, the sample space of possible outcomes on each experimental trial is $S = \{\text{failure}, \text{success}\}$.
b) …
c) The trials are independent. That is, the outcome on any given trial or sequence of trials does not affect the outcomes on subsequent trials.
Suppose we toss a coin 3 times; then we may treat each toss as one Bernoulli trial. The possible outcomes on any particular trial are a head and a tail. Assume that the appearance of a head is a success. Which outcome is called a success is arbitrary: for example, we may choose to refer to the appearance of a defective item in a production process as a success, and if a series of births is treated as a Bernoulli process, the appearance of a female (male) may be classified as a success.
Consider the experiment of tossing a fair coin three times; the possible sequences of outcomes are
HTH, HHH, HHT, THH, TTT, THT, TTH, HTT
Since the probabilities of a success and a failure on a given trial are, respectively, p and q, the probability of the outcome HTH, for instance, is $pqp = p^2 q$, where p is the probability of observing a head and q is the probability of observing a tail.
Outcome | Probability
HTH | $pqp = p^2 q$
HHH | $ppp = p^3$
HHT | $ppq = p^2 q$
THH | $qpp = qp^2$
THT | $qpq = q^2 p$
TTT | $qqq = q^3$
TTH | $qqp = q^2 p$
HTT | $pqq = pq^2$
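We can obtain the number of such sequences from the formula for the number of combinations of n objects taken x at a time. Thus the number of possible sequences in which two heads can occur is C(3, 2):
$C(n, x) = \frac{n!}{x!(n - x)!}, \qquad C(3, 2) = \frac{3!}{2!\,1!} = 3$
Each of these three sequences contains two heads and one tail. For a fair coin we assign 1/2 to p and 1/2 to q. Hence
$P(x = 2) = C(3, 2)\left(\tfrac{1}{2}\right)^2\left(\tfrac{1}{2}\right) = \frac{3}{8}$
In general, a particular sequence of x successes and n − x failures, such as
$\underbrace{pp \cdots p}_{x \text{ successes}} \; \underbrace{qq \cdots q}_{n - x \text{ failures}}$
has probability $p^x q^{n-x}$, and there are C(n, x) such sequences, so the probability of x successes in n trials is
$f(x) = C(n, x) \, p^x q^{n - x}, \qquad x = 0, 1, \ldots, n$
The fact that this is a probability distribution is verified by noting the following conditions.
1) $f(x) \ge 0$
2) $\sum_x f(x) = 1$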
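Example 5
The tossing of a fair coin 3 times was used earlier as an example of a Bernoulli process. Compute the probabilities of all possible numbers of heads and thus establish a particular binomial distribution.
Solution
$p = \frac{1}{2}$, n = 3. Letting x represent the random variable "number of heads", the probability distribution is as follows:
x (number of heads) | P(x)
0 | $\binom{3}{0}\left(\tfrac12\right)^0\left(\tfrac12\right)^3 = \tfrac18$
1 | $\binom{3}{1}\left(\tfrac12\right)^1\left(\tfrac12\right)^2 = \tfrac38$
2 | $\binom{3}{2}\left(\tfrac12\right)^2\left(\tfrac12\right)^1 = \tfrac38$
3 | $\binom{3}{3}\left(\tfrac12\right)^3\left(\tfrac12\right)^0 = \tfrac18$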
Example 6
A machine that produces stampings for car engines is not working properly and is producing 15% defectives. The defective and nondefective stampings proceed from the machine in a random manner. If 4 stampings are randomly collected, find the probability that 2 of them are defective.
Solution
Let p = 0.15 be the probability that a single stamping will be defective, and let X equal the number of defectives in n = 4 trials. Then q = 1 − p = 1 − 0.15 = 0.85 and
$p(x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x}(0.15)^x(0.85)^{4-x} = \frac{4!}{x!(4-x)!}(0.15)^x(0.85)^{4-x}, \quad x = 0, 1, 2, 3, 4$
$P(2) = \frac{4!}{2!(4-2)!}(0.15)^2(0.85)^2 = 6(0.01625625) = 0.0975375 \approx 0.0975$
The mean, variance and standard deviation of a binomial random variable are given by:
Mean: $\mu = np$
Variance: $\sigma^2 = npq$
Standard deviation: $\sigma = \sqrt{npq}$
To calculate the values of μ and σ in Example 6, substitute n = 4 and p = 0.15 into the following formulas:
$\mu = np = 4(0.15) = 0.60$
$\sigma = \sqrt{npq} = \sqrt{(4)(0.15)(0.85)} = \sqrt{0.51} = 0.714$
Example 7
Payani Serenje owns 5 stocks. The probability that each stock will rise in price is 0.6. What is the probability that three out of the five stocks will rise in price?
Solution
n = 5, p = 0.6, q = 1 − p = 0.4
$P(X = 3) = C(5, 3)(0.6)^3(0.4)^2 = \frac{5!}{3!\,2!}(0.216)(0.16) = \frac{(5)(4)}{2}(0.216)(0.16) = 0.3456 \approx 0.346$
This can be checked against the binomial tables with n = 5, p = 0.6.
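The binomial formula and its mean and standard deviation can be evaluated without tables. A minimal Python sketch using `math.comb` (available in Python 3.8 and later); the helper name `binomial_pmf` is ours:

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 6: n = 4 stampings, p = 0.15 defective
n, p = 4, 0.15
print(round(binomial_pmf(2, n, p), 4))                    # 0.0975
print(round(n * p, 2), round(sqrt(n * p * (1 - p)), 3))   # 0.6 0.714

# Example 7: n = 5 stocks, p = 0.6 chance that each one rises
print(round(binomial_pmf(3, 5, 0.6), 4))                  # 0.3456

# The probabilities over x = 0, 1, ..., n sum to 1, as a distribution must.
assert abs(sum(binomial_pmf(x, n, p) for x in range(n + 1)) - 1) < 1e-12
```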
5.4
$P(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$
where P(x) is the probability that a variable with a Poisson distribution equals x, μ is the mean or expected value of the Poisson distribution, and e ≈ 2.718 is the base of the natural logarithms.
One reason why the Poisson distribution is important in statistics is that it can be used as an approximation to the binomial distribution. If n (the number of trials) is large and p (the probability of success) is small, the binomial probability can be approximated by the Poisson distribution with μ = np. Experience indicates that the approximation is adequate for most practical purposes if n is at least 20 and p is no greater than 0.05.
The Poisson distribution has been used to describe the probability function of such situations as:
1) Product demand
2) Demand for service
3) Number of telephone calls that come through a switchboard
4) Number of death claims per day received by an insurance company
5) Number of breakdowns of an electronic computer per month
2)
There is some rate that characterizes the process producing the outcome. The rate
is the number of occurrences per interval of time or space.
For instance, product demand can be characterized by the number of units purchased in a
specified period. Product demand may be viewed as a process that produces random
occurrences in continuous time.
The characteristics of a Poisson distribution are as follows:
1) The experiment consists of counting the number of times a particular event occurs during a given unit of time, or in a given area or volume (or any other unit of measurement).
2) The probability that an event occurs in a given unit of time, area, or volume is independent of the number that occur in other units.
Note that the most important difference between the binomial and the Poisson distributions is that in the binomial distribution we find the probability of a number of successes in n trials, whereas for the Poisson distribution we find the probability of a number of successes per unit of time.
Example 7
Suppose the random variable X, the number of the company's absent employees on Tuesdays, has (approximately) a Poisson probability distribution. Assume that the average number of Tuesday absentees is 3.4.
a) Find the mean and standard deviation of x, the number of absent employees on Tuesday.
b) Find the probability that exactly 3 employees are absent on a given Tuesday.
c) Find the probability that at least two employees are absent on a Tuesday.
Solution
a) The mean and variance of a Poisson distribution are both equal to μ. Thus for this example
$\mu = 3.4, \quad \sigma^2 = 3.4, \quad \sigma = \sqrt{3.4} = 1.84$
b) We want the probability that exactly three employees are absent on Tuesday. The probability distribution for x is
$P(x) = \frac{\mu^x e^{-\mu}}{x!}$
so (from Table 2)
$P(3) = \frac{(3.4)^3 e^{-3.4}}{3!} = 0.2186$
c) To find the probability that at least two employees are absent on Tuesday, we need to find
$P(X \ge 2) = P(2) + P(3) + \cdots = \sum_{x \ge 2} P(x)$
It is easier to use the complement:
$P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(0) + P(1)] = 1 - \left[\frac{(3.4)^0 e^{-3.4}}{0!} + \frac{(3.4)^1 e^{-3.4}}{1!}\right] = 1 - [0.033373 + (3.4)(0.033373)] = 1 - 0.1468412 = 0.8531588 \approx 0.8532$
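The Poisson probabilities above can be computed directly instead of being read from Table 2. A minimal Python sketch; `poisson_pmf` is our own helper name:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^(-mu) / x! for a Poisson random variable with mean mu."""
    return mu ** x * exp(-mu) / factorial(x)

mu = 3.4                                   # average number of Tuesday absentees
print(round(poisson_pmf(3, mu), 4))        # (b) 0.2186

# (c) complement of "0 or 1 absent"
p_at_least_2 = 1 - (poisson_pmf(0, mu) + poisson_pmf(1, mu))
print(round(p_at_least_2, 4))              # 0.8532
```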
Example 8
a) …
b) Either one or two airplanes will arrive between 13 00 hours and 14 00 hours next Saturday?
c) A total of exactly two airplanes will arrive between 13 00 hours and 14 00 hours during the next three Saturdays?
Solution
a) μ = 3, and we let X be the number of arrivals during the specified time period.
$P(0) = \frac{3^0 e^{-3}}{0!} = 0.049787068 \approx 0.0498$
b) $P(X = 1 \text{ or } X = 2) = P(X = 1) + P(X = 2) = \frac{3^1 e^{-3}}{1!} + \frac{3^2 e^{-3}}{2!} = e^{-3}\left(3 + \frac{9}{2}\right) = \frac{15}{2}(0.049787068) = 0.37340301 \approx 0.3734$
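c) A total of exactly two arrivals in three Saturdays during the period 13 00 hours to 14 00 hours can be obtained, for example, by having two arrivals on the first day, none on the second day, and none on the third day during the specified one-hour period.
The total number of ways in which the event in question can occur is shown in the table below.
Number of arrivals
Saturday 1 | Saturday 2 | Saturday 3
2 | 0 | 0
0 | 2 | 0
0 | 0 | 2
1 | 1 | 0
1 | 0 | 1
0 | 1 | 1
Treating the three Saturdays as independent, each of the first three rows has probability $\frac{3^2 e^{-3}}{2!}\left(\frac{3^0 e^{-3}}{0!}\right)^2$ and each of the last three rows has probability $\left(\frac{3^1 e^{-3}}{1!}\right)^2 \frac{3^0 e^{-3}}{0!}$, so
$P = 3\left[\frac{3^2 e^{-3}}{2!}\left(\frac{3^0 e^{-3}}{0!}\right)^2\right] + 3\left[\left(\frac{3^1 e^{-3}}{1!}\right)^2 \frac{3^0 e^{-3}}{0!}\right] = 3 \cdot \frac{9}{2} e^{-9} + 3 \cdot 9 e^{-9} = \frac{81}{2} e^{-9} \approx 40.5(0.000123) = 0.0049815 \approx 0.005$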
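The table-based calculation in part (c) can be checked by enumerating every split of two arrivals over the three Saturdays. A Python sketch, assuming (as the multiplication of probabilities in the text does) that arrivals on different Saturdays are independent:

```python
from itertools import product
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) = mu^x e^(-mu) / x!"""
    return mu ** x * exp(-mu) / factorial(x)

mu = 3   # mean arrivals per Saturday between 13 00 and 14 00 hours

# Add up every (day 1, day 2, day 3) split of exactly two arrivals.
total = sum(
    poisson_pmf(a, mu) * poisson_pmf(b, mu) * poisson_pmf(c, mu)
    for a, b, c in product(range(3), repeat=3)
    if a + b + c == 2
)

# Shortcut: a sum of independent Poisson counts is Poisson with the summed mean, 3 * 3 = 9.
shortcut = poisson_pmf(2, 3 * mu)
print(round(total, 5), round(shortcut, 5))   # 0.005 0.005
```

Both give $\frac{81}{2}e^{-9} \approx 0.0049981$; the text's 0.0049815 comes from rounding $e^{-9}$ to 0.000123, and both round to 0.005.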
5.5
Similar to the … , $f(x) \ge 0$ and $\int_a^b f(x)\,dx = 1$.
5.6
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty$
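Example 1
Suppose you have a normal random variable x with μ = 50 and σ = 15. Find the probability that x will fall within the interval 30 ≤ x ≤ 70.
Solution
We compute the Z-score (or standard score) for the measurement x; the standard score is defined by
$Z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma}$
Thus
$Z = \frac{30 - 50}{15} = -1.33 \quad \text{and} \quad Z = \frac{70 - 50}{15} = \frac{20}{15} = 1.33$
[Figure: normal curve f(x) with x = 30, 50 and 70 marked.]
From Table 1, the area between 0 and 1.33 is 0.4082, so by symmetry
$P(30 \le x \le 70) = P(-1.33 \le Z \le 1.33) = 2(0.4082) = 0.8164$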
The probability that a normal random variable will be more than 1.64 standard deviations to the right of its mean is indicated in the figure above. Because the normal distribution is symmetric, half of the total probability (0.5) lies to the right of the mean and half to the left. Therefore, the desired probability is
$P(Z > 1.64) = 0.5 - A$
where A is the area between 0 and Z = 1.64, as shown in the figure. Referring to Table 1, the area A corresponding to Z = 1.64 is 0.4495, so
$P(Z > 1.64) = 0.5 - A = 0.5 - 0.4495 = 0.0505$
Example 3
Find the probability that the value of the standard normal variable will be between −1.23 and +1.14.
Solution
Table 1.0 shows that the area under the standard normal curve between 0 and 1.23 is 0.3907, so by symmetry the area between −1.23 and 0 must also be 0.3907. Table 1.0 shows that the area between 0 and 1.14 is 0.3729. Thus, the area between −1.23 and +1.14 equals 0.3907 + 0.3729 = 0.7636, which means that the probability we want equals 0.7636.
[Figure: standard normal curve with −1.23 and +1.14 marked.]
Example 4
Find the probability that the value of the standard normal variable will be between 0.43 and 1.55.
Solution
[Figure: standard normal curve with 0.43 and 1.55 marked.]
From Table 1, the area between 0 and 1.55 is 0.4394 and that between 0 and 0.43 is 0.1664. Therefore the area between 0.43 and 1.55 is 0.4394 − 0.1664 = 0.2730.
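Table look-ups like these can be cross-checked with Python's standard-library `statistics.NormalDist` (Python 3.8 and later), whose `cdf` method gives the area to the left of a value. A short sketch; the helper `between` is ours:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

def between(a, b, dist=Z):
    """P(a <= X <= b) as a difference of cumulative probabilities."""
    return dist.cdf(b) - dist.cdf(a)

print(round(between(30, 70, NormalDist(50, 15)), 4))   # Example 1, exact z = ±1.333...
print(round(1 - Z.cdf(1.64), 4))                       # P(Z > 1.64)
print(round(between(-1.23, 1.14), 4))                  # Example 3
print(round(between(0.43, 1.55), 4))                   # Example 4
```

The results should agree with the table answers to about three decimal places: the tables are rounded to four places, and Example 1 rounds z to ±1.33 (the exact z = 4/3 gives about 0.818 rather than 0.8164).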
The Normal Distribution As An Approximation To The Binomial Distribution
Normal Approximation to the Binomial Distribution. If n (the number of trials) is large and p (the probability of success) is not too close to 0 or 1, the probability distribution of the number of successes occurring in n Bernoulli trials can be approximated by a normal distribution. Experience indicates that the approximation is fairly accurate as long as
$np > 5 \text{ when } p \le \tfrac{1}{2}, \quad \text{and} \quad n(1 - p) > 5 \text{ when } p > \tfrac{1}{2}$
Example 5
The probability that a machine will be down for repairs next week is 1/2. A firm has 100 such machines, and whether one is down is statistically independent of whether another is down. What is the probability that at least 60 machines will be down?
Solution
The number of machines down for repair has a binomial distribution with mean equal to $100 \cdot \frac{1}{2} = 50$ and standard deviation equal to $\sqrt{100 \cdot \frac{1}{2} \cdot \frac{1}{2}} = 5$. Because of the continuity correction, the probability that the number down for repairs is 60 or more can be approximated by the probability that the value of a normal variable with mean equal to 50 and standard deviation equal to 5 exceeds 59.5. The value of the standard normal variable corresponding to 59.5 is (59.5 − 50)/5, or 1.9. Table 3 shows that the area under the standard normal curve between zero and 1.9 is 0.4713, so the area to the right of 1.9 must equal 0.5000 − 0.4713 = 0.0287.
This is the probability that at least 60 machines will be down for repair.
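How good is the approximation here? A Python sketch comparing the continuity-corrected normal answer with the exact binomial tail, using only the standard library (`statistics.NormalDist`, `math.comb`):

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 50 and 5

# Normal approximation with continuity correction: P(X >= 60) is about P(Y > 59.5)
approx = 1 - NormalDist(mu, sigma).cdf(59.5)

# Exact binomial tail for comparison: sum of P(X = k) for k = 60, ..., 100
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(60, n + 1))

print(round(approx, 4), round(exact, 4))
```

The two values differ by less than 0.001, consistent with np = 50 being comfortably above the rule-of-thumb threshold.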
Learning Objectives
After working through this Chapter, you should be able to:
Find the mean and the variance of the binomial, Poisson and Normal distributions.
1.
2.
a)
It is estimated that 75% of a grapefruit crop is good, the other 25% have
rotten centers that cannot be detected unless the grapefruit is cut open.
The grapefruit are sold in sacks of 6. Let r be the number of good
grapefruit in the sack.
i)
ii)
iii)
iv)
v)
b)
a)
In a lottery, you pay K12 500 to choose a number (integer) between 0 and
9999, inclusive. If the number is drawn, you win K12 500,000. What is
your expected gain (or loss) per play?
b)
ii)
iii)
iv)
From past records the hotel knows that 0.2% of its customers will require medical attention while staying in the hotel. Calculate the exact and approximate probability that no customer out of the 500 will require medical attention while attending the conference. Is this approximation better or worse than the approximation used in (ii)? Why?
3.
a)
The Table below shows the probabilities for the number of complaints
received each day by a newspaper agency from customers not receiving a
paper.
No. of complaints
Probability
b)
4.
a)
b)
8
.35
9
.42
10
.18
11
.03
12
.02
i)
ii)
A writer has prepared six articles to submit for publication. The probability of any article being accepted is 0.20. Assuming independence, find the probability that the writer will have
i)
ii)
iii)
iv)
A Toyota dealer wishes to know how many citations to order for the
coming month. Estimated demand is normally distributed, with a standard
deviation of 20 and a mean of 120.
i)
ii)
A client wishes to know what price he might be able to get for a business property. The realtor estimates that a sale price for that property of K600 million would be exceeded no more than 5% of the time. A price of at least K420 million should be obtained at least 90% of the time. Assuming the distribution of sales prices to be normal, answer the following questions.
i)
ii)
What is the probability of a sale price greater than K540 million, less than K640 million, and between K540 million and K600 million?
5.
a)
b)
6.
Which of the following are continuous variables, and which are discrete variables?
i)
ii)
iii)
iv)
v)
ii)
iii)
iv)
c)
The Mulenga Café has found that about 6% of the parties who make reservations don't show up. If 90 party reservations have been made, how many can be expected to show up? Find the standard deviation of this distribution.
a)
b)
i)
65
ii)
89
P( Z 2.12 )
P (16 Z 1.13)
7.
c)
d)
The side effects of a certain drug cause discomfort to only a few patients.
The probability that any individual will suffer from the side effects is
0.005. If the drug is given to 35 000 patients, what is the probability that
three (3) will suffer side effects.
a)
b)
c)
i)
ii)
iii)
A car rental company has determined that the probability a car will need service work in any given month is 0.25. The company has 850 cars.
i) What is the probability that more than 150 cars will require service work in a particular month?
ii) What is the probability that fewer than 180 cars will need service work in a given month? (Give reasons for the method used to calculate the probabilities in (i) and (ii).)
Time (days) | Probability
1 | .04
2 | .21
3 | .34
4 | .31
5 | .10
i)
ii)
iii)
iv)
CHAPTER 6
SAMPLING AND SAMPLING DISTRIBUTION
Reading
Newbold Chapter 6
Wonnacott and Wonnacott Chapter 6
Tailoka Frank P Chapter 10
James T Mc Clave and P George Benson Chapter 7
Introductory Comments
We now start on the work that defines the subject Statistics as a different and unique
subject. The idea of sampling and sampling distribution for a statistic like the mean must
be clearly understood by all users of statistics. This is not an easy Chapter to understand.
6.
Sampling Theory
Sampling and Sampling Distribution
6.1
Sampling
If we draw an object from a box, we have the choice of replacing or not replacing the object in the box before we draw again. In the first case a particular object can come up again and again, whereas in the second it can come up only once. Sampling where each member of a population may be chosen more than once is called sampling with replacement, while sampling where each member cannot be chosen more than once is called sampling without replacement.
6.2
Sampling Distributions
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \quad (1)$
If $x_1, x_2, \ldots, x_n$ denote the values obtained in a particular sample of size n, then the mean for that sample is denoted by
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} \quad (2)$
Theorem 6.1
The mean of the sampling distribution of means, denoted by $\mu_{\bar{x}}$, is given by
$E(\bar{X}) = \mu_{\bar{x}} = \mu \quad (3)$
where μ is the mean of the population. Theorem 6.1 states that the expected value of the sample mean is the population mean.
Theorem 6.2
If a population is infinite and the sampling is random, or if the population is finite and sampling is with replacement, then the variance of the sampling distribution of means, denoted by $\sigma_{\bar{x}}^2$, is given by
$E[(\bar{X} - \mu)^2] = \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \quad (4)$
Theorem 6.3
If the population is of size N, if sampling is without replacement, and if the sample size is n ≤ N, then the previous equation is replaced by
$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right) \quad (5)$
while $\mu_{\bar{x}} = \mu$ is from Theorem 6.1.
Note that Theorem 6.3 becomes Theorem 6.2 as N → ∞, since (N − n)/(N − 1) → 1.
Theorem 6.4
If the population from which samples are taken is normally distributed with mean μ and variance σ², then the sample mean is normally distributed with mean μ and variance σ²/n.
Theorem 6.5
Suppose that the population from which samples are taken has a probability distribution with mean μ and variance σ² that is not necessarily a normal distribution. Then the standardized variable associated with $\bar{X}$, given by
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \quad (6)$
is asymptotically normal, i.e.
$\lim_{n \to \infty} P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\,du \quad (7)$
Theorem 6.5 is a consequence of the central limit theorem. It is assumed here that the population is infinite or that sampling is with replacement. Otherwise, the above is correct if we replace $\sigma/\sqrt{n}$ in Theorem 6.5 by $\sigma_{\bar{x}}$ as given in Theorem 6.3.
Example 1.0
Five hundred ball bearings have a mean weight of 5.02kg and a standard deviation of
0.30kg. Find the probability that a random sample of 100 ball bearings chosen from this
group will have a combined weight of more than 5.10kg.
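For the sampling distribution of means, $\mu_{\bar{x}} = \mu = 5.02$ kg, and
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} = \frac{0.30}{\sqrt{100}}\sqrt{\frac{500-100}{500-1}} = 0.027 \text{ kg}$$
The combined weight will exceed 510 kg if the mean weight of the 100 bearings exceeds 5.10 kg.
$$5.10 \text{ in standard units} = Z = \frac{5.10 - 5.02}{0.027} = 2.96$$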
The required probability is the area to the right of $z = 2.96$, as shown in Figure 6.1.
[Figure 6.1: standard normal curve with the area to the right of z = 2.96 marked]
The probability is $0.5 - 0.4985 = 0.0015$. Therefore, there are only 3 chances in 2000 of picking a sample of 100 ball bearings with a combined weight exceeding 510 kg.
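The calculation in Example 1.0 can be checked with a short Python sketch. It assumes the normal approximation of Theorem 6.5, together with the finite population correction of Theorem 6.3. The function name `prob_mean_exceeds` is hypothetical. The standard library's `statistics.NormalDist` supplies the normal CDF in place of the table.

```python
import math
from statistics import NormalDist


def prob_mean_exceeds(threshold, mu, sigma, n, N=None):
    """Normal-approximation P(sample mean > threshold).

    If the population size N is given, sampling is taken to be without
    replacement and the finite population correction (Theorem 6.3) is applied.
    Returns (z, standard error, tail probability).
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    z = (threshold - mu) / se
    return z, se, 1 - NormalDist().cdf(z)


# Example 1.0: the event is a mean weight above 5.10 kg.
z, se, p = prob_mean_exceeds(5.10, mu=5.02, sigma=0.30, n=100, N=500)
print(round(se, 3), round(z, 2), p)
```

When $\sigma_{\bar{x}}$ is not rounded to 0.027, z comes out at about 2.98 rather than 2.96. The tail probability is therefore slightly below the table-based 0.0015.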
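For the sampling distribution of the sample proportion $\bar{p}$ in samples of size $n$, drawn from a population with proportion of successes $p$ (and $q = 1 - p$),
$$\mu_{\bar{p}} = p, \qquad \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}} \qquad (8)$$
For large values of $n$ ($n \ge 30$) the sampling distribution is very nearly a normal distribution, as seen from Theorem 6.5. For finite populations in which sampling is without replacement,
$$\sigma_{\bar{p}} = \sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}}$$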
Example 2.0
A simple random sample of size 64 is selected from a population with $p = 0.30$.
(a) What is the expected value of $\bar{p}$?
(b) What is the standard deviation of $\bar{p}$?
(c) Show the sampling distribution of $\bar{p}$.
(d) What does the sampling distribution of $\bar{p}$ show?
Solution
(a) $E(\bar{p}) = p = 0.30$.
(b) $\sigma_{\bar{p}} = \sqrt{\dfrac{p(1-p)}{n}} = \sqrt{\dfrac{0.30(0.70)}{64}} = \sqrt{0.00328125} = 0.0573$.
(c) Normal, with $E(\bar{p}) = 0.30$ and $\sigma_{\bar{p}} = 0.0573$.
(d) The probability distribution of $\bar{p}$.
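Equation (8) and the answers to Example 2.0 can be reproduced with a small helper. This is a sketch only: the function name is illustrative. The optional finite population correction uses the factor $\sqrt{(N-n)/(N-1)}$ for sampling without replacement.

```python
import math


def proportion_sampling_dist(p, n, N=None):
    """Return (mean, standard error) of the sample proportion.

    mean = p and se = sqrt(p(1 - p)/n), as in eq. (8). If a finite
    population size N is given (sampling without replacement), the se
    is multiplied by sqrt((N - n)/(N - 1)).
    """
    se = math.sqrt(p * (1 - p) / n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return p, se


mean, se = proportion_sampling_dist(0.30, 64)
print(mean, round(se, 4))  # E(p-bar) and the standard deviation of p-bar
```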
for each sample of size $n_2$ drawn from the second population, let us compute a statistic $X_2$ whose mean and standard deviation are $\mu_{X_2}$ and $\sigma_{X_2}$, respectively.
Taking all possible combinations of these samples from the two populations, we can obtain a distribution of the differences $X_1 - X_2$, which is called the sampling distribution of differences of the statistics. The mean and standard deviation of this sampling distribution, denoted respectively by $\mu_{X_1 - X_2}$ and $\sigma_{X_1 - X_2}$, are given by
$$\mu_{X_1 - X_2} = \mu_{X_1} - \mu_{X_2}, \qquad \sigma_{X_1 - X_2} = \sqrt{\sigma^2_{X_1} + \sigma^2_{X_2}} \qquad (9)$$
provided that the samples chosen do not in any way depend on each other. That is, the samples must be independent (in other words, the random variables $X_1$ and $X_2$ are independent).
Similarly, consider the sample means $\bar{x}_1$ and $\bar{x}_2$ from two infinite populations with means $\mu_1, \mu_2$ and standard deviations $\sigma_1, \sigma_2$, respectively. The sampling distribution of the differences of means has mean and standard deviation
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \qquad (10)$$
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (11)$$
Using Theorems 6.1 and 6.2, this result also holds for finite populations if sampling is done with replacement. The standardized variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is in that case very nearly normally distributed if $n_1$ and $n_2$ are large ($n_1, n_2 \ge 30$). Similar results can be obtained for finite populations in which sampling is without replacement by using Theorems 6.1 and 6.3.
Example 3.0
In an age of rising housing costs, comparisons are often made between costs in different areas of the country. The aim is to compare the average cost $\mu_1$ of a 3-bedroom, 2-bath home in Kitwe with the average cost $\mu_2$ of a similar home in Lusaka. To do so, independent random samples were taken of 190 housing costs in Kitwe and 120 housing costs in Lusaka. Describe the sampling distribution of $(\bar{x}_1 - \bar{x}_2)$, the difference in sample mean housing costs in the two cities.
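Solution
The mean of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is
$$E(\bar{x}_1 - \bar{x}_2) = E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2$$
The variance of $\bar{x}_1 - \bar{x}_2$ is the sum of the variances of $\bar{x}_1$ and $\bar{x}_2$; thus
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{\sigma_1^2}{190} + \frac{\sigma_2^2}{120}$$
where $\sigma_1^2$ and $\sigma_2^2$ are the variances of the costs of 3-bedroom, 2-bath homes in Kitwe and Lusaka, respectively. The standard deviation is
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{190} + \frac{\sigma_2^2}{120}}$$
Because both samples are large ($n_1, n_2 \ge 30$), this sampling distribution is approximately normal.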
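For the difference of two sample proportions,
$$\mu_{\bar{p}_1 - \bar{p}_2} = \mu_{\bar{p}_1} - \mu_{\bar{p}_2} = p_1 - p_2 \qquad (13)$$
$$\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\sigma^2_{\bar{p}_1} + \sigma^2_{\bar{p}_2}} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} \qquad (14)$$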
Example 4.0
It has been found that 2% of the tools produced by a certain machine are defective. What is the probability that in a shipment of 400 such tools, 3% or more will prove defective?
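$$\mu_{\bar{p}} = p = 0.02, \qquad \sigma_{\bar{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{0.02(0.98)}{400}} = \frac{0.14}{20} = 0.007$$
$$P(\bar{p} \ge 0.03) = P\left(Z \ge \frac{0.03 - 0.02}{0.007}\right) = P(Z \ge 1.43) = 0.5000 - 0.4236 = 0.0764$$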
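Example 4.0 can be checked the same way. This sketch uses the normal approximation without a continuity correction, matching the worked solution. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def prob_proportion_at_least(threshold, p, n):
    """Normal approximation to P(sample proportion >= threshold).

    No continuity correction is applied. Returns (z, probability).
    """
    se = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / se
    return z, 1 - NormalDist().cdf(z)


z, prob = prob_proportion_at_least(0.03, p=0.02, n=400)
print(round(z, 2), prob)
```

Because z is not rounded to 1.43 here, the result differs slightly from the table-based 0.0764.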
Learning Objectives
After working through this Chapter, you should be able to:
Find the mean and the variance of the Binomial, Poisson and Normal distributions.
Define the sampling distribution of the sample mean, the sample proportion and
their differences.
CHAPTER 7
ESTIMATION
Reading
Newbold Chapter 7
Wonnacott and Wonnacott Chapter 7
Tailoka Frank P Chapter 10
Introductory Comments
We need to know how the population mean is related to the sample mean, and what characteristics the sample mean must have. We also need to know whether the sample is likely to give us an estimate close to the population value. To tell us this, we use confidence intervals.
7.
Estimation Theory
7.1
7.2
Example 1.0
If we say that a distance is 34.5 km, we are giving a point estimate. If, on the other hand, we say that the distance is 34.5 ± 0.04 km, i.e., the distance lies between 34.46 and 34.54 km, we are giving an interval estimate.
A statement of the error or precision of an estimate is often called its reliability.
7.3
7.4
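If the statistic $S$ is the sample mean $\bar{x}$, then the confidence limits for estimation of the population mean $\mu$ are given by
$$\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}} \qquad (1)$$
if sampling is from an infinite population or is with replacement, and by
$$\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}} \qquad (2)$$
if sampling is without replacement from a population of finite size $N$. For the 95% and 99% confidence limits, $z_c = 1.96$ and $z_c = 2.58$, respectively.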
$$67.45 \pm 1.96\left(\frac{2.93}{\sqrt{100}}\right) = 67.45 \pm 0.57$$
Then the 95% confidence interval for the population mean is 66.88 to 68.02 cm, which can be denoted by $66.88 < \mu < 68.02$.
We can therefore say that the probability that the population mean height lies between 66.88 and 68.02 cm is about 95%. In symbols, we write $P(66.88 < \mu < 68.02) = 0.95$. This is equivalent to saying that we are 95% confident that the population mean (true mean) lies between 66.88 and 68.02 cm.
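Formula (1), or formula (2) when a finite population size is supplied, can be sketched in a few lines. As in the height example, the sample standard deviation 2.93 stands in for $\sigma$, which is reasonable for a large sample. `NormalDist().inv_cdf` supplies $z_c$ (about 1.96 for 95%). The function name is illustrative.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95, N=None):
    """Large-sample interval xbar +/- z_c * sigma/sqrt(n), eq. (1).

    If N is given, the finite-population form, eq. (2), is used.
    """
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    half = z_c * se
    return xbar - half, xbar + half


lo, hi = z_interval(67.45, 2.93, 100)
print(round(lo, 2), round(hi, 2))
```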
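7.5
$$-t_{0.025} < \frac{\bar{x} - \mu}{S/\sqrt{n}} < t_{0.025} \qquad (3)$$
from which we can see that $\mu$ can be estimated to lie in the interval
$$\bar{x} - t_{0.025}\frac{S}{\sqrt{n}} < \mu < \bar{x} + t_{0.025}\frac{S}{\sqrt{n}} \qquad (4)$$
with 95% confidence. In general, the confidence limits for population means are given by
$$\bar{x} \pm t_c \frac{S}{\sqrt{n}} \qquad (5)$$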
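Formula (5) can be sketched as follows. The sketch computes $\bar{x}$ and $S$ from the sums $\sum x$ and $\sum x^2$, as in the 90% example that follows ($n = 9$, $\sum x = 121$, $\sum x^2 = 1879$). Python's standard library has no t-distribution quantile function, so $t_c$ is passed in from a t table (here $t_{0.05,8} = 1.860$). The function name is hypothetical.

```python
import math


def t_interval_from_sums(sum_x, sum_x2, n, t_c):
    """Small-sample interval xbar +/- t_c * s/sqrt(n), eq. (5).

    s uses the n - 1 divisor. t_c must be read from a t table with
    n - 1 degrees of freedom.
    Returns (xbar, s, (lower, upper)).
    """
    xbar = sum_x / n
    s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
    half = t_c * s / math.sqrt(n)
    return xbar, s, (xbar - half, xbar + half)


xbar, s, (lo, hi) = t_interval_from_sums(121, 1879, 9, t_c=1.860)
print(round(xbar, 3), round(s, 3), round(lo, 2), round(hi, 2))
```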
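(b) The point estimate of the population mean is
$$\bar{x} = \frac{\sum x}{n} = \frac{121}{9} = 13.444$$
The point estimate of the population standard deviation is
$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n-1}} = \sqrt{\frac{1879 - 121^2/9}{8}} = 5.615$$
(c) We have
$$\bar{x} \pm t_{0.05,8}\frac{s}{\sqrt{n}} = 13.444 \pm 1.860\left(\frac{5.615}{\sqrt{9}}\right) = 13.444 \pm 3.4813$$
Thus, the 90% confidence interval estimate of the population mean is 9.9627 to 16.9253.
7.6
Using the value of $\sigma_{\bar{p}}$ obtained in Chapter 6, we see that confidence limits for the population proportion $p$ are given by
$$\bar{p} \pm z_c\sqrt{\frac{pq}{n}} = \bar{p} \pm z_c\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \qquad (6)$$
if sampling is from an infinite population or is with replacement, and by
$$\bar{p} \pm z_c\sqrt{\frac{pq}{n}}\sqrt{\frac{N-n}{N-1}} \qquad (7)$$
if sampling is without replacement from a population of finite size $N$.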
Example 4.0
A sample poll of 100 voters chosen at random from all voters in a given district indicated that 55% of them were in favour of a particular candidate. Find the 99% confidence limits for the proportion of all voters in favour of this candidate.
The 99% confidence limits for the population proportion $p$ are
$$\bar{p} \pm 2.58\,\sigma_{\bar{p}} = \bar{p} \pm 2.58\sqrt{\frac{\bar{p}(1-\bar{p})}{n}} = 0.55 \pm 2.58\sqrt{\frac{0.55(0.45)}{100}} = 0.55 \pm 0.13$$
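Formula (6) translates directly into code. This minimal sketch takes $z_c$ from `NormalDist().inv_cdf` (about 2.576 for 99%, which the text rounds to 2.58). The function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_interval(p_bar, n, confidence):
    """Interval p_bar +/- z_c * sqrt(p_bar (1 - p_bar) / n), eq. (6)."""
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z_c * math.sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - half, p_bar + half


lo, hi = proportion_interval(0.55, 100, 0.99)
print(round(lo, 2), round(hi, 2))
```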
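7.7
If $S_1$ and $S_2$ are two independent sample statistics with approximately normal sampling distributions, confidence limits for the difference of the corresponding population parameters are given by
$$S_1 - S_2 \pm z_c\,\sigma_{S_1 - S_2} = S_1 - S_2 \pm z_c\sqrt{\sigma^2_{S_1} + \sigma^2_{S_2}} \qquad (8)$$
while confidence limits for the sum of the population parameters are given by
$$S_1 + S_2 \pm z_c\,\sigma_{S_1 + S_2} = S_1 + S_2 \pm z_c\sqrt{\sigma^2_{S_1} + \sigma^2_{S_2}} \qquad (9)$$
For example, suppose the populations are infinite and have known standard deviations $\sigma_1, \sigma_2$. Confidence limits for the difference of the two population means are then given by
$$\bar{x}_1 - \bar{x}_2 \pm z_c\,\sigma_{\bar{x}_1 - \bar{x}_2} = \bar{x}_1 - \bar{x}_2 \pm z_c\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (10)$$
where $\bar{x}_1, n_1$ and $\bar{x}_2, n_2$ are the respective means and sizes of the two samples drawn from the populations.
Similarly, confidence limits for the difference of two population proportions, where the populations are infinite, are given by
$$\bar{p}_1 - \bar{p}_2 \pm z_c\sqrt{\frac{\bar{p}_1(1-\bar{p}_1)}{n_1} + \frac{\bar{p}_2(1-\bar{p}_2)}{n_2}} \qquad (11)$$
where $\bar{p}_1$ and $\bar{p}_2$ are the sample proportions and $n_1$ and $n_2$ are the sizes of the two samples drawn from the populations.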
Example 5.0
In a random sample of 400 adults and 600 teenagers who watched a certain television program, 100 adults and 300 teenagers indicated that they liked it. Construct the 99.7% confidence limits for the difference in proportions of all adults and all teenagers who watched the program and liked it.
Confidence limits for the difference in proportions of the two groups are given by (11). Here subscripts 1 and 2 refer to teenagers and adults, respectively, and $q_1 = 1 - \bar{p}_1$, $q_2 = 1 - \bar{p}_2$. The values $\bar{p}_1 = 300/600 = 0.50$ and $\bar{p}_2 = 100/400 = 0.25$ are, respectively, the proportions of teenagers and adults who liked the program. With $z_c = 3$ for 99.7% confidence,
$$0.50 - 0.25 \pm 3\sqrt{\frac{0.50(0.50)}{600} + \frac{0.25(0.75)}{400}} = 0.25 \pm 0.09 \qquad (12)$$
Therefore, we can be 99.7% confident that the true difference in proportions lies between 0.16 and 0.34.
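Formula (11) with the numbers of Example 5.0 can be checked with this sketch. The value $z_c = 3$ is taken from the example. The function name is hypothetical.

```python
import math


def diff_proportion_interval(p1, n1, p2, n2, z_c):
    """Interval (p1 - p2) +/- z_c * sqrt(p1 q1/n1 + p2 q2/n2), eq. (11)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z_c * se, d + z_c * se


# Subscript 1 = teenagers (300 of 600), subscript 2 = adults (100 of 400).
lo, hi = diff_proportion_interval(300 / 600, 600, 100 / 400, 400, z_c=3)
print(round(lo, 2), round(hi, 2))
```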
We can determine the sample size $n$ needed to provide any desired sampling error. Letting $d$ denote the maximum sampling error, we have
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{d^2}$$
This is the sample size that gives probability $1 - \alpha$ that the sampling error is $d$ or less.
In most cases $\sigma$ will be unknown. In practice, one of the following procedures can be used.
(a) Use a pilot study to select a preliminary sample. The sample standard deviation from the preliminary sample can be used as the planning value for $\sigma$.
(b) Use the sample standard deviation from a previous sample of the same or similar units.
(c) Use judgment or a best guess for the value of $\sigma$. This is where you apply the Empirical rule or Chebyshev's rule.
Example 6.0
How large a sample should one select to be 90% confident that the sampling error is 3 or less? Assume the population variance is 36.
Solution
We have $d = 3$, $z_{0.05} = 1.65$ and $\sigma = 6$. Hence
$$n = \frac{(1.65)^2(6)^2}{3^2} = 10.89$$
In cases where the computed $n$ is a fraction, we round up to the next integer value; hence the recommended sample size here is 11.
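The sample-size rule for a mean, $n = z^2\sigma^2/d^2$ rounded up, fits in a one-line helper. The function name is illustrative. With the figures of Example 6.0 it returns 11.

```python
import math


def sample_size_for_mean(z, sigma, d):
    """n = z^2 sigma^2 / d^2, rounded up to the next integer."""
    return math.ceil((z * sigma / d) ** 2)


print(sample_size_for_mean(z=1.65, sigma=6, d=3))
```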
For a proportion, the corresponding sample size is
$$n = \frac{z_{\alpha/2}^2\,pq}{d^2}$$
In practice, the planning value for the population proportion can be chosen in the same way as for the population mean. However, if none of these procedures applies, use $p = 0.5$.
Example 7.0
In a survey, the planning value for the population proportion $p$ is given as 0.45. How large a sample should be taken to be 95% confident that the sample proportion is within 0.04 of the population proportion?
Solution
We have $d = 0.04$, $z_{0.025} = 1.96$, $p = 0.45$ and $q = 0.55$. Hence
$$n = \frac{(1.96)^2(0.45)(0.55)}{(0.04)^2} = 594.25$$
which we round up to 595.
Example 8.0
How large a sample should be taken to be 90% confident that the sampling error of estimation of the population proportion is 0.02 or less? Assume past data are not available for developing a planning value for $p$.
Solution
We have $z_{0.05} = 1.65$, and we assume that $p = 0.5$, $q = 0.5$ and $d = 0.02$. Therefore
$$n = \frac{(1.65)^2(0.5)(0.5)}{(0.02)^2} = 1701.56$$
which we round up to 1702.
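The proportion version, $n = z^2pq/d^2$ rounded up, can be sketched the same way. The function name is illustrative. It reproduces 595 for Example 7.0 and 1702 for Example 8.0.

```python
import math


def sample_size_for_proportion(z, p, d):
    """n = z^2 p (1 - p) / d^2, rounded up to the next integer."""
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(sample_size_for_proportion(1.96, 0.45, 0.04))  # Example 7.0
print(sample_size_for_proportion(1.65, 0.5, 0.02))   # Example 8.0, p = 0.5
```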
Learning Objectives
Find confidence intervals for means of normal populations, and for differences of means of two normal populations, both when the variances are known and when they are unknown.
CHAPTER 8
HYPOTHESIS TESTING
Reading
Newbold Chapter 9
Wonnacott and Wonnacott Chapter 9
Tailoka Frank P Chapter 10
Introductory Comments
We often need to answer questions about a population, such as "Is the mean of the population less than 5?" or "Is there any difference between two means?" In statistics we try to answer these questions based on the information in samples. The material in this section is also useful in everyday life.
The theory of tests of hypotheses is necessarily linked to that of confidence intervals.
8.0
8.1
Statistical Decisions
Very often in practice we are called upon to make decisions about
populations on the basis of sample information. Such decisions are called
statistical decisions. For example, we may wish to decide on the basis of sample
data whether a new serum is really effective in curing a disease, whether one
educational procedure is better than another, or whether a given coin is loaded.
8.2
Statistical Hypothesis
In attempting to reach decisions, it is useful to make assumptions or guesses about the populations involved. Such assumptions, which may or may not be true, are called statistical hypotheses. In general they are statements about the probability distributions of the populations. For example, if we want to decide
whether a given coin is loaded, we formulate the hypothesis that the coin is fair,
i.e., p = 0.5, where p is the probability of heads. Similarly, if we want to decide
whether one procedure is better than another, we formulate the hypothesis that
there is no difference between the two procedures (i.e., any observed differences
are merely due to fluctuations in sampling from the same population). Such
hypotheses are often called null hypotheses, denoted by $H_0$.
Any other hypothesis that differs from a given null hypothesis is called an
alternative hypothesis. For example, if the null hypothesis is $p = 0.5$, possible alternative hypotheses are $p = 0.7$, $p \ne 0.5$ or $p > 0.5$. A hypothesis alternative to the null hypothesis is denoted by $H_1$.
8.3
8.4
Level of Significance
In testing a given hypothesis, the maximum probability with which we
should be willing to risk a type I error is called the level of significance of the
test. This probability is often specified before any samples are drawn so that
results obtained will not influence our decision.
In practice a level of significance of 0.05 or 0.01 is customary, although other
values are used. If for example a 0.05 or 5% level of significance is chosen in
designing a test of a hypothesis, then there are about 5 chances in 100 that we
would reject the hypothesis when it should be accepted; i.e., whenever the null
hypothesis is true, we are about 95% confident that we would make the right
decision. In such cases we say that the hypothesis has been rejected at a 0.05
level of significance, which means that we could be wrong with probability 0.05.
8.5
in Figure 8.1, and the extreme values of Z would lead to the rejection of the
hypothesis.
[Figure 8.1: standard normal curve; central area 0.95 between Z = -1.96 and Z = 1.96, with a critical region of area 0.025 in each tail]
a)
8.6
8.7
P-Value: The P-value is the smallest value of $\alpha$ that will lead to the rejection of the null hypothesis.
8.8
Special Tests
For large samples, many statistics have nearly normal distributions with mean $\mu_S$ and standard deviation $\sigma_S$. In such cases we can use the above results to formulate decision rules, or tests of hypotheses and significance. The following special cases are just a few of the statistics of practical interest. In each case the results hold for infinite populations or for sampling with replacement. For sampling without replacement from finite populations, the results must be modified.
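1. Means.
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (1)$$
for $n \ge 30$, and for $n < 30$,
$$t_c = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
2. Proportions.
$$Z = \frac{\bar{p} - p}{\sqrt{pq/n}} \qquad (2)$$
In case $\bar{p} = X/n$, where $X$ is the actual number of successes in a sample, (2) becomes
$$Z = \frac{X - np}{\sqrt{npq}} \qquad (3)$$
3. Differences of means.
$$Z = \frac{\bar{x}_1 - \bar{x}_2 - 0}{\sigma_{\bar{x}_1 - \bar{x}_2}}, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (4)$$
4. Differences of proportions.
$$Z = \frac{\bar{p}_1 - \bar{p}_2 - 0}{\sigma_{\bar{p}_1 - \bar{p}_2}} \qquad (5)$$
Under the null hypothesis of no difference between the population proportions, i.e., $p_1 = p_2$, the samples are really drawn from the same population, so that
$$\mu_{\bar{p}_1 - \bar{p}_2} = 0, \qquad \sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where
$$\hat{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$$
is used as an estimate of the population proportion $p$.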
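1. $H_0: \mu = 1600$; $H_a: \mu \ne 1600$.
2. Two-tailed test at $\alpha = 0.05$: the critical values are $-1.96$ and $1.96$, with area 0.025 in each tail and 0.95 between them.
3. With $n = 100$,
$$Z_c = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{100}} = \frac{-30}{12} = -2.5$$
4. Since $Z_c = -2.5 < -1.96$, we reject $H_0$.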
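Steps 3 and 4 above can be sketched as a small large-sample z test. The function name `z_test_mean` is hypothetical. The critical value is passed in (1.96 for a two-tailed test at the 5% level).

```python
import math


def z_test_mean(xbar, mu0, s, n, z_crit):
    """Two-tailed large-sample test of H0: mu = mu0.

    Returns (z, reject), where reject is True when |z| > z_crit.
    """
    z = (xbar - mu0) / (s / math.sqrt(n))
    return z, abs(z) > z_crit


z, reject = z_test_mean(1570, 1600, 120, 100, z_crit=1.96)
print(z, reject)
```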
(b) $Z_C = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{\bar{x} - 12}{3/\sqrt{60}} = -8.83$
Since $Z_C = -8.83 < -1.65$, we reject $H_0$. We conclude that, based on this sample, there is sufficient evidence at the 5% level of significance to say that the population mean is less than 12.
Example 3.0
Consider the following hypothesis test: $H_0: \mu = 15$; $H_a: \mu \ne 15$. Data from a sample of seven items are: 8, 10, 9, 11, 15, 9, 7.
(a) Compute the sample mean.
(b) Compute the sample standard deviation.
(c) With $\alpha = 0.05$, what is the rejection rule?
(d) Compute the value of the test statistic $t$.
(e) What is your conclusion?
Solution
(a) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{69}{7} = 9.857$
(b) The sample standard deviation is
$$s = \sqrt{\frac{\sum x^2 - (\sum x)^2/n}{n-1}} = \sqrt{\frac{721 - 69^2/7}{6}} = 2.6095$$
(c) This is a two-tailed test: we reject $H_0$ if $t_c < -t_{0.025,6} = -2.447$ or $t_c > t_{0.025,6} = 2.447$.
(d) $t_c = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{9.857 - 15}{2.6095/\sqrt{7}} = -5.2144$
(e) Since $t_c = -5.2144 < -2.447$, we reject $H_0$.
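Example 3.0 can be reproduced with the standard library's `statistics.mean` and `statistics.stdev` (the latter uses the $n - 1$ divisor, as in part (b)). The critical value $t_{0.025,6} = 2.447$ is taken from a t table, since the standard library has no t quantile function. The function name is illustrative.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (xbar, s, t) for the test statistic t = (xbar - mu0)/(s/sqrt(n))."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    t = (xbar - mu0) / (s / math.sqrt(n))
    return xbar, s, t


data = [8, 10, 9, 11, 15, 9, 7]
xbar, s, t = one_sample_t(data, 15)
reject = abs(t) > 2.447  # t_{0.025,6} from a t table
print(round(xbar, 3), round(s, 4), round(t, 4), reject)
```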
Example 4.0
Consider the following hypothesis test: $H_0: p = 0.35$; $H_a: p \ne 0.35$. A sample of 500 provides a sample proportion of $\bar{p} = 0.255$.
(a) At $\alpha = 0.01$, what is the rejection rule?
(b) Compute the value of the test statistic $Z$.
(c) What is the P-value?
(d) What is your conclusion?
Solution
(a) This is a two-tailed test. Reject $H_0$ if $Z_C < -2.58$ or $Z_C > 2.58$.
(b)
$$Z = \frac{\bar{p} - p}{\sqrt{pq/n}} = \frac{0.255 - 0.35}{\sqrt{0.35(0.65)/500}} = -4.45$$
(c) Because of the symmetry of the normal distribution, P-value $= 2P(Z \le -4.45) \approx 0$.
(d) Reject $H_0$.
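The test in Example 4.0 can be sketched as follows. The standard error uses the hypothesized $p$, as in part (b). The two-tailed P-value is computed as $2P(Z \le -|z|)$ with `statistics.NormalDist`. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def z_test_proportion(p_bar, p0, n):
    """Two-tailed test of H0: p = p0. Returns (z, p_value)."""
    z = (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = z_test_proportion(0.255, 0.35, 500)
reject = abs(z) > 2.58  # alpha = 0.01
print(round(z, 2), p_value, reject)
```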
Learning Objectives
After working through this chapter you should be able to:
Carry out statistical tests of all the types covered in this Chapter.
Explain the way in which the rejection regions of tests follow from the
distributional results, taking into account the level and considerations of power.
2.
3.
Determine how many different samples of size 2 can be drawn from this
infinite population and list them.
(b)
Determine the means of the samples of part (a). What is the probability assigned to each mean? Construct the sampling distribution of the mean for random samples of size 2 drawn from this infinite population.
(c)
Calculate the mean and the standard deviation of the probability distribution
of part (b) and compare the value of the standard deviation with the
corresponding result obtained from the standard error of the mean formula.
(a)
population parameter
(ii)
sample statistics
(iii)
population
(b)
(a)
(ii)
(iii)
Standard error
(b)
4.
(a)
(b)
(ii)
(iii)
(iv)
(v)
(ii)
A Student's t test.
80
86
90
95
100 110 85 75
105 115 92 74
65
64
85
92
72
73
74
(i)
(ii)
Test the null hypothesis that the mean salary for the private institutions is K5,000,000 more than in the public institutions against the alternative that the mean for the private institutions is more than K5,000,000 greater.
(iii)
State carefully the assumptions you have made in arriving at the test
and confidence interval.
5.
(a)
(b)
6.
(a)
(b)
7.
(a)
Rejection region
(ii)
(ii)
(ii)
(iii)
An Air Force base mess hall has received a shipment of 10,000 gallon-size cans of cherries. The supplier claims that the average amount of liquid is 0.25 gallon per can. A government inspector took a random sample of 100 cans and found the average liquid content to be 0.28 gallon per can, with a standard deviation of 0.10.
(i)
Does this indicate that the supplier's claim is too low? (Use the 5% level of significance.)
(ii)
Model 1:
Mean time
Model 2:
Type I error
(ii)
Decision
(iii)
Type II error
CHAPTER 9
ANALYSIS OF VARIANCE
Reading
Newbold Chapter 15
Wonnacott and Wonnacott Chapter 10
Tailoka Frank P Chapter 13
Introductory Comments
Analysis of Variance (ANOVA) is a popular tool that needs some time and effort to
appreciate. The idea of analysis of variance is to investigate how variation in structured
data can be split into pieces associated with components of the structure. Here we cover
one-way and two-way cases. Both tests and confidence intervals are widely used in
applications.
Analysis Of Variance
Use of F-distribution: The F-distribution is used to test the hypothesis that the variance of
one normal population equals the variance of another normal population.
The second use of the F-distribution involves the analysis of variance techniques,
abbreviated ANOVA. Basically, analysis of variance uses sample information to
determine whether or not three or more treatments produce different results. A treatment
is a cause, or specific source, of variation in a set of data. Following are several cases to
expand on the meaning of a treatment.
Do different treatments of fertilizer affect yield? Do different grades of gasoline affect
performance? Do four different assembly methods result in different population means?
1.
2.
3.
The samples we select from each of the populations are random and independent; that is, they are not related.
Table 1.0
Monthly sales of appliances for three salespeople
Sample   Ms Banda   Mr Chisenga   Third salesperson
1        25         25            19
2        15         15            17
3        14         17            13
4        10         16            11
5        21         17            12
Mean     17         18            14.4
The ANOVA procedure calls for the same hypothesis-testing procedure outlined in the lecture notes on Estimation and Hypothesis Testing.
STEP 1
STEP 2
STEP 3
The test statistic is $F = \text{MSTR}/\text{MSE}$. The numerator has $k - 1$ degrees of freedom. The denominator has $N - k$ degrees of freedom, where $k$ is the number of treatments and $N$ is the total number of observations.
STEP 4
Using the predetermined 0.05 level, the decision rule is to accept the null hypothesis $H_0$ if the computed F value is less than or equal to 3.89, and to reject $H_0$ if the computed F value is greater than 3.89. The decision rule is shown diagrammatically.
[Figure: distribution of F for k = 3 and N = 15; region of acceptance to the left of 3.89 and region of rejection (area 0.05) to the right]
STEP 5
Compute F and arrive at a decision. The first step is to set up an ANOVA table, which is merely a convenient form in which to record the sums of squares and other computations. The general format for a one-way analysis of variance problem is shown in Table 2.0.
Table 2.0
A general format for an Analysis of Variance table
Source of variation          (1) Sum of squares   (2) Degrees of freedom   (3) Mean squares (1)/(2)
Between treatments           SST                  k - 1                    MSTR = SST/(k - 1)
Error (within treatments)    SSE                  N - k                    MSE = SSE/(N - k)
Total                        SS Total             N - 1
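Formulas for the table entries:
$$\text{MSTR} = \frac{\text{SST}}{k-1}, \qquad \text{MSE} = \frac{\text{SSE}}{N-k}, \qquad F = \frac{\text{MSTR}}{\text{MSE}}$$
where SST is the abbreviation for the sum of squares due to treatments and is found by
$$\text{SST} = \sum \frac{T_c^2}{n_c} - \frac{(\sum X)^2}{N}$$
Here $T_c$ is the treatment (column) total and $n_c$ is the number of observations in that treatment. The total sum of squares is
$$\text{SS Total} = \sum X^2 - \frac{(\sum X)^2}{N}$$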
Compute SST:
$$\text{SST} = \frac{85^2}{5} + \frac{90^2}{5} + \frac{72^2}{5} - \frac{247^2}{15} = 4101.8 - 4067.27 = 34.53$$
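Compute SS Total and SSE:
$$\text{SS Total} = \sum X^2 - \frac{(\sum X)^2}{N} = 4355 - \frac{247^2}{15} = 4355 - 4067.27 = 287.73$$
$$\text{SSE} = \text{SS Total} - \text{SST} = 287.73 - 34.53 = 253.2$$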
The three sums of squares and the calculations needed for F are transferred to the ANOVA table, Table 3.0.
Table 3.0
ANOVA table for the Store Managers problem
Source of variation          (1) Sum of squares   (2) Degrees of freedom   (3) Mean squares (1)/(2)
Between treatments           SST = 34.53          k - 1 = 3 - 1 = 2        34.53/2 = 17.265
Error (within treatments)    SSE = 253.2          N - k = 15 - 3 = 12      253.2/12 = 21.1
Total                        SS Total = 287.73    N - 1 = 14
Computing F:
$$F = \frac{\text{MSTR}}{\text{MSE}} = \frac{17.265}{21.1} = 0.818$$
The decision rule states that if the computed value of F is less than or equal to the critical value of 3.89, the null hypothesis is accepted. If the F value is greater than 3.89, $H_0$ is rejected and $H_a$ is accepted. Since 0.818 < 3.89, the null hypothesis is accepted at the 0.05 level. To put it another way, the differences in the mean monthly sales (K17,000, K18,000 and K14,400) are due to chance (sampling). From a practical standpoint, the levels of sales of the three salespeople being considered for store manager are the same. No decision with respect to the position can be made on the basis of monthly sales.
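The one-way computing formulas above can be sketched in plain Python. The sketch uses the monthly sales grouped as (25, 15, 14, 10, 21), (25, 15, 17, 16, 17) and (19, 17, 13, 11, 12). This is the grouping that reproduces the totals used in the calculation ($\sum T^2/n = 4101.8$ and $\sum X^2 = 4355$). The function name is hypothetical, and no library ANOVA routine is assumed.

```python
def one_way_anova(groups):
    """One-way ANOVA via the computing formulas of this chapter.

    groups: list of lists, one list of observations per treatment.
    Returns (SST, SSE, F) with F = MSTR / MSE.
    """
    all_obs = [x for g in groups for x in g]
    N, k = len(all_obs), len(groups)
    cm = sum(all_obs) ** 2 / N  # correction term (sum X)^2 / N
    ss_total = sum(x * x for x in all_obs) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = ss_total - sst
    mstr = sst / (k - 1)
    mse = sse / (N - k)
    return sst, sse, mstr / mse


sales = [
    [25, 15, 14, 10, 21],
    [25, 15, 17, 16, 17],
    [19, 17, 13, 11, 12],
]
sst, sse, f = one_way_anova(sales)
print(round(sst, 2), round(sse, 2), round(f, 3))
```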
Inferences about Treatment Means
Suppose in carrying out the ANOVA procedure, we make the decision to reject the null
hypothesis. This allows us to conclude that not all treatment means are the same.
Sometimes we may be satisfied with this conclusion, but in other instances we may want
to know which treatment means differ. Let us consider the following example:
Four groups of students were subjected to different teaching techniques and tested at the
end of a specified period of time. As a result of dropouts from the experimental groups
(due to sickness, transfer, and so on), the number of students varied from group to group.
Do the data shown below present sufficient evidence to indicate a difference in the mean
achievement for the four teaching techniques? Use 0.05 level of significance.
Technique    1      2      3      4
             65     75     59     94
             87     69     78     89
             73     83     67     80
             79     81     62     88
             81     72     83
             69     79     76
                    90
Total        454    549    425    351
n            6      7      6      4
$$\text{SS Total} = \sum X_{ij}^2 - \frac{(\sum X_{ij})^2}{N} = 139511 - \frac{1779^2}{23} = 139511 - 137601.78 = 1909.22$$
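$$\text{SST} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \text{CM} = \frac{454^2}{6} + \frac{549^2}{7} + \frac{425^2}{6} + \frac{351^2}{4} - 137601.78 = 712.59$$
where $\text{CM} = (\sum X_{ij})^2/N = 137601.78$.
Table 4.0
ANOVA table for students
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   712.59           3                    237.53        237.53/62.98 = 3.77
SSE                   1196.63          19                   62.98
SS Total              1909.22          22
Decision rule: Reject $H_0$ if the computed F value is greater than $F_{0.05,3,19} = 3.13$.
Since $F_C = 3.77 > 3.13$, we reject $H_0$.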
Recall that in the Store Managers data there was no difference in the treatment means, so further analysis of the treatment means was not warranted. However, in the foregoing example on mean achievement for the four teaching techniques, we found a difference in the treatment means. That is, the null hypothesis is rejected and the alternative hypothesis is accepted. Since the achievements do differ, the question is: between which groups do the treatment means differ?
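Several procedures are available to answer this question. Perhaps the simplest is through the use of confidence intervals. A confidence interval for the difference between two population means is found by
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,N-k}\sqrt{\text{MSE}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where:
$\bar{X}_1$ is the mean of the first sample,
$\bar{X}_2$ is the mean of the second sample,
$t$ is obtained from the t distribution with $N - k$ degrees of freedom,
MSE is the mean square error obtained from the ANOVA table,
$n_1$ is the number of observations in the first sample,
$n_2$ is the number of observations in the second sample.
For teaching techniques 1 and 2, with $\bar{X}_1 = 75.67$, $\bar{X}_2 = 78.43$ and $t_{0.025,19} = 2.093$,
$$(75.67 - 78.43) \pm 2.093\sqrt{62.98\left(\frac{1}{6} + \frac{1}{7}\right)} = -2.76 \pm 9.24$$
that is, from $-12.00$ to $6.48$. Because this interval includes zero, we cannot conclude that the mean achievement differs between techniques 1 and 2.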
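The teaching-techniques analysis and the interval for techniques 1 and 2 can be checked with the following sketch. The second score for technique 1 is taken as 87, the value consistent with that group's total of 454 and with $\sum X^2 = 139511$. The value $t_{0.025,19} = 2.093$ comes from a t table. Both function names are hypothetical.

```python
import math


def one_way_anova(groups):
    """Return (SST, SSE, MSE, F) for a one-way ANOVA (unequal group sizes allowed)."""
    all_obs = [x for g in groups for x in g]
    N, k = len(all_obs), len(groups)
    cm = sum(all_obs) ** 2 / N
    ss_total = sum(x * x for x in all_obs) - cm
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cm
    sse = ss_total - sst
    mse = sse / (N - k)
    f = (sst / (k - 1)) / mse
    return sst, sse, mse, f


def diff_interval(g1, g2, mse, t_c):
    """(xbar1 - xbar2) +/- t_c * sqrt(MSE * (1/n1 + 1/n2))."""
    d = sum(g1) / len(g1) - sum(g2) / len(g2)
    half = t_c * math.sqrt(mse * (1 / len(g1) + 1 / len(g2)))
    return d - half, d + half


techniques = [
    [65, 87, 73, 79, 81, 69],
    [75, 69, 83, 81, 72, 79, 90],
    [59, 78, 67, 62, 83, 76],
    [94, 89, 80, 88],
]
sst, sse, mse, f = one_way_anova(techniques)
lo, hi = diff_interval(techniques[0], techniques[1], mse, t_c=2.093)
print(round(f, 2), round(lo, 2), round(hi, 2))
```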
Caution
The investigation of differences in treatment means is a sequential process. The initial
step is to conduct the ANOVA test. Only if the null hypothesis that the treatment means
are equal is rejected should any analysis of the treatment means be attempted.
Two-Way Analysis of variance:
In the appliance sales example, we were unable to show that a difference exists among the mean sales of the three salespeople. In the computation of the F statistic, variation was considered as originating from two sources: it was attributed either to the treatments or to random error (the variation within each treatment). There are other possible sources of variation, such as the training the salespeople had, the days of the week on which the sample data were obtained, etc. Two-way analysis of variance allows us to consider at least one other of these possibilities.
Example:
EUROAFRICA is expanding bus services from the Capital City into the heart of the Copperbelt. There are four routes being considered from Kitwe to four other towns. The travel times in minutes along each of the four routes are given below.
Travel time (minutes) from Kitwe to four other towns
Day         Luanshya   Ndola   Chingola   Mufulira
Monday      40         45      46         34
Tuesday     38         42      44         30
Wednesday   38         40      44         33
Thursday    37         43      42         40
Friday      41         41      40         32
At the 0.05 significance level, can it be concluded that there is a difference among the four routes? Does it make a difference which day of the week it is?
If we test only the null hypothesis that the mean travel time is the same along the four routes, the one-way ANOVA approach applies. The variation that occurs because of differences among the days of the week is then treated as random and is included in the MSE term. This inflates the denominator and thus reduces the F ratio. If the variation due to the day of the week can be removed, the denominator of the F ratio will be reduced. In this case, the day of the week is called a blocking variable. Hence, we have variation due to treatments and variation due to blocks. The sum of squares due to blocks (SSB) is computed as follows:
$$\text{SSB} = \sum \frac{B^2}{k} - \frac{(\sum X)^2}{N}$$
where $B$ refers to the block total, that is, the total for each row, and $k$ refers to the number of items in each block.
The same format is used for the two-way ANOVA table as was used in the one-way ANOVA case. SST and SS Total are computed as before. SSE is obtained by subtraction: SSE = SS Total - SST - SSB. Table 4.0 shows the necessary calculations.
            Luanshya   Ndola   Chingola   Mufulira   Row sum
Monday      40         45      46         34         165
Tuesday     38         42      44         30         154
Wednesday   38         40      44         33         155
Thursday    37         43      42         40         162
Friday      41         41      40         32         154
Total       194        211     216        169        790
ΣX²         7538       8919    9352       5769       31578
n           5          5       5          5
Analogous to the ANOVA table for a one-way analysis, the general format for the two-way case is as follows (here n is the number of blocks):
Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean squares (1)/(2)
Treatments            SST                  k - 1                    MSTR = SST/(k - 1)
Blocks                SSB                  n - 1                    MSB = SSB/(n - 1)
Error                 SSE                  (k - 1)(n - 1)           MSE = SSE/[(k - 1)(n - 1)]
Total                 SS Total
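$$\text{SST} = \frac{194^2}{5} + \frac{211^2}{5} + \frac{216^2}{5} + \frac{169^2}{5} - \frac{790^2}{20} = 31474.8 - 31205 = 269.8$$
$$\text{SSB} = \frac{165^2}{4} + \frac{154^2}{4} + \frac{155^2}{4} + \frac{162^2}{4} + \frac{154^2}{4} - 31205 = 31231.5 - 31205 = 26.5$$
$$\text{SS Total} = \sum X^2 - \frac{(\sum X)^2}{N} = 31578 - \frac{790^2}{20} = 31578 - 31205 = 373$$
$$\text{SSE} = \text{SS Total} - \text{SST} - \text{SSB} = 373 - 269.8 - 26.5 = 76.7$$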
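The values for the various components of the ANOVA table are computed as follows:
Source of variation   (1) Sum of squares   (2) Degrees of freedom   (3) Mean squares (1)/(2)
Treatments            269.8                3                        89.933
Blocks                26.5                 4                        6.625
Error                 76.7                 12                       6.392
Total                 373                  19
The hypotheses are:
For treatments, $H_0$: the mean travel times along the four routes are equal; $H_a$: not all the route means are equal.
For blocks, $H_0$: the mean travel times on the five days are equal; $H_a$: not all the day means are equal.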
First we test the hypothesis concerning the treatment means. There are $k - 1 = 4 - 1 = 3$ degrees of freedom in the numerator and $(k - 1)(n - 1) = (4 - 1)(5 - 1) = 12$ degrees of freedom in the denominator. Using the 0.05 significance level, the critical value of F is 3.49. The null hypothesis that the mean times for the four routes are the same is rejected if the F ratio exceeds 3.49.
$$F = \frac{\text{MSTR}}{\text{MSE}} = \frac{89.933}{6.392} = 14.07$$
The null hypothesis is rejected and the alternative accepted. It is concluded that mean travel time is not the same for all routes. EUROAFRICA will want to conduct some tests to determine which treatment means differ.
Next, we test whether the travel time is the same for different days of the week. The degrees of freedom in the numerator for blocks is $n - 1 = 5 - 1 = 4$. The degrees of freedom in the denominator is the same as before: $(n - 1)(k - 1) = (5 - 1)(4 - 1) = 12$. The null hypothesis that the block means are the same is rejected if the F ratio exceeds 3.26.
$$F = \frac{\text{MSB}}{\text{MSE}} = \frac{6.625}{6.392} = 1.04$$
The null hypothesis is accepted. The mean travel time is the same for the various days of
the week.
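The randomized block (two-way) calculation can be reproduced with a short sketch. It assumes one observation per block-treatment cell, as in the travel-time table. Rows are days (blocks) and columns are routes (treatments) in the order Luanshya, Ndola, Chingola, Mufulira. The function name is hypothetical.

```python
def randomized_block_anova(table):
    """Two-way ANOVA without replication.

    table[i][j] is the observation for block i and treatment j.
    Returns (SST, SSB, SSE, F_treatments, F_blocks).
    """
    n = len(table)      # number of blocks
    k = len(table[0])   # number of treatments
    N = n * k
    all_obs = [x for row in table for x in row]
    cm = sum(all_obs) ** 2 / N
    ss_total = sum(x * x for x in all_obs) - cm
    sst = sum(sum(row[j] for row in table) ** 2 for j in range(k)) / n - cm
    ssb = sum(sum(row) ** 2 for row in table) / k - cm
    sse = ss_total - sst - ssb
    mse = sse / ((k - 1) * (n - 1))
    f_treat = (sst / (k - 1)) / mse
    f_block = (ssb / (n - 1)) / mse
    return sst, ssb, sse, f_treat, f_block


travel = [
    [40, 45, 46, 34],  # Monday
    [38, 42, 44, 30],  # Tuesday
    [38, 40, 44, 33],  # Wednesday
    [37, 43, 42, 40],  # Thursday
    [41, 41, 40, 32],  # Friday
]
sst, ssb, sse, f_treat, f_block = randomized_block_anova(travel)
print(round(sst, 1), round(ssb, 1), round(sse, 1), round(f_treat, 2), round(f_block, 2))
```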
Problems
1)
Suppose that we want to compare the cholesterol contents of four competing diet foods on the basis of the following data (in milligrams per package). The data were obtained for three 6-ounce packages of each of the diet foods.
Diet food   A     B     C     D
            3.6   3.1   3.2   3.5
            4.1   3.2   3.5   3.8
            4.0   3.9   3.5   3.8
n           3     3     3     3
The means of these four samples are $\bar{Y}_A = 3.9$, $\bar{Y}_B = 3.4$, $\bar{Y}_C = 3.4$ and $\bar{Y}_D = 3.7$. We want to know, using the 5% level of significance, whether the differences among them are significant or whether they can be attributed to chance.
2)
Of the three banks in Kitwe, customers are randomly selected from each bank and
their waiting times before service are recorded.
Bank
ZNCB
4.8
Standard Chartered 6.9
bank
7.1
Barclays bank
Do these data indicate a significant difference among the mean waiting times of
these banks? Use the 0.05 significance level.
3.
4)
5)
Ndola
Kitwe
5.6
8.8
9.0
7.8
8.2
7.4
8.2
11.0
10.1
8.9
9.3
10.0
a)
b)
c)
d)
b)
c)
d)
b)
c)
d)
2
D
13.4
B
12.9
A
12.2
C
12.3
3
B
12.7
D
12.9
C
11.4
A
11.9
ANSWERS
1)
Diet food   A       B       C       D
            3.6     3.1     3.2     3.5
            4.1     3.2     3.5     3.8
            4.0     3.9     3.5     3.8
Total ΣX    11.7    10.2    10.2    11.1
ΣX²         45.77   35.06   34.74   41.13
$$\text{SS Total} = \sum X^2 - \frac{(\sum X)^2}{N} = 156.7 - \frac{43.2^2}{12} = 156.7 - 155.52 = 1.18$$
$$\text{SST} = \frac{11.7^2 + 10.2^2 + 10.2^2 + 11.1^2}{3} - 155.52 = 0.54$$
$$\text{SSE} = \text{SS Total} - \text{SST} = 1.18 - 0.54 = 0.64$$
Source of variation   Sum of squares    Degrees of freedom   Mean square   F
Treatments            SST = 0.54        3                    0.18          2.25
Error                 SSE = 0.64        8                    0.08
Total                 SS Total = 1.18   11
Since $F = 2.25 < F_{0.05,3,8} = 4.07$, we accept $H_0$.
2)
Bank                      Waiting times   Sample size   Total   ΣX²
ZNCB                      4.8, 5.5, 6.3   3             16.6    92.98
Standard Chartered Bank                   4             25.0    166.44
Barclays                  7.1, 3.5        2             10.6    62.66
$$\text{SS Total} = 322.08 - \frac{52.2^2}{9} = 322.08 - 302.76 = 19.32$$
$$\text{SST} = \frac{16.6^2}{3} + \frac{25^2}{4} + \frac{10.6^2}{2} - 302.76 = 1.523$$
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   1.523            2                    0.7615        0.257
SSE                   17.797           6                    2.966
SS Total              19.32            8
Since $F = 0.257 < F_{0.05,2,6} = 5.14$, we accept $H_0$.
3.
$H_0: \mu_1 = \mu_2 = \mu_3$
$$\text{CM} = \frac{(104.3)^2}{12} = 906.54$$
$$\text{SST} = \frac{23.4^2}{3} + \frac{31.6^2}{4} + \frac{49.3^2}{5} - 906.54 = 11.718$$
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   11.718           2                    5.859         5.10
SSE                   10.332           9                    1.148
SS Total              22.05            11
We cannot reject $H_0$, since $F = 5.10 < F_{0.01,2,9} = 8.02$. The evidence does not suggest any differences in the weights of tomatoes.
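4)
a) $7.8 \pm t_{0.025,9}\dfrac{S}{\sqrt{3}}$, where $S = \sqrt{\text{MSE}} = \sqrt{1.148} = 1.07$
b) $7.9 \pm 2.262\left(\dfrac{1.071}{\sqrt{4}}\right) = 7.9 \pm 1.2$, i.e., (6.7, 9.1)
c) $(\bar{T}_i - \bar{T}_j) \pm t_{0.025,9}\sqrt{\text{MSE}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}$, with $\dfrac{1}{n_i} + \dfrac{1}{n_j} = \dfrac{1}{3} + \dfrac{1}{4}$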
5)
$$\text{SSB} = \frac{(43.5)^2 + (50.8)^2 + (48.9)^2}{4} - \frac{(143.2)^2}{12}$$
$$\text{SS Total} = 1721.76 - \frac{(143.2)^2}{12} = 1721.76 - 1708.85 = 12.91$$
Source of variation   Sum of squares   Degrees of freedom   Mean square   F
SST                   5.2              3                    1.7333        19.43
SSB                   7.175            2                    3.5875        40.22
SSE                   0.535            6                    0.0892
SS Total              12.91            11
a) $H_0: \mu_A = \mu_B = \mu_C = \mu_D$; $H_a$: at least one of the means is not equal.
b) $H_0: \mu_1 = \mu_2 = \mu_3$; $H_a$: at least one of the means is not equal.
c) $-1.4 \pm t_{0.025,6}\sqrt{0.0892\left(\dfrac{1}{3} + \dfrac{1}{3}\right)} = -1.4 \pm 0.597$, i.e., (-1.997, -0.803)
Learning Objectives
Carry out small examples of one way and two-way analysis of variance with a
hand calculator, presenting in an ANOVA table.
Carry out tests of hypothesis, and to write down confidence intervals as in this
Chapter.
a)
Monday
10.5
8.4
5.9
Tuesday
8.4
9.3
7.1
Friday
12.6
11.4
6.7
Saturday
18.3
7.9
14.2
Sunday
10.8
6.3
13.7
Day
(b)
(i)
(ii)
(iii)
(iv)
(v)
(vi)
State the assumptions required for the validity of the procedures used
in parts (ii) to (v).
2.
(a)
(b)
A power plant, which uses water from the surrounding bay for cooling its
condensers, is required by the Environmental Protection Agency (EPA) to
determine whether discharging its heated water into the bay has a
detrimental effect on the flora (plant life) in the water. The EPA requests
that the power plant make its investigation at three strategically chosen
locations, called stations. Stations 1 and 2 are located near the plant's discharge tubes, while station 3 is farther out in the bay. During one
randomly selected day in each of 4 months, a diver is sent down to each of
the stations, randomly samples a square meter area of the bottom, and
counts the number of blades of the different types of grasses present. The
results are as follows for one important grass type.
Month
(c)
Station
1
May
28
31
53
June
25
22
61
July
37
30
56
August
20
26
48
(i)
(ii)
148
3.
(a)
4.
Sales Area
1
120
76
95
114
60
102
140
85
122
102
80
85
(i)
(ii)
(b)
State the three assumptions of the error term in the analysis of variance
models. Which of the three assumptions is most critical in validating an
analysis of variance model fitted to a data set?
                  Store 1   Store 2   Store 3
High shelf        60        56        52
Eye-level shelf   53        58        56
Low shelf         55        55        59
5. (a) Three of the currently most popular television shows produced the following ratings (percentage of the television audience tuned into the show) over a period of four weeks:

                    SHOW
Week        A        B        C        Totals
1           34.7     28.4     23.8     86.9
2           38.1     32.2     20.7     91.0
3           35.1     32.4     25.8     93.3
4           30.4     28.2     28.9     87.5
Totals      138.3    121.2    99.2     358.7
Associate   16   13   16   9
Full        12   8    7    10   8

Test the null hypothesis that the three population times are equal. Use $\alpha = 0.05$.
CHAPTER 10
TIME SERIES
Reading
Newbold Chapter 17
Tailoka Frank P Chapter 6
Plane and Oppermann 395
Introductory Comments
This Chapter follows on from the work on index numbers and presents some alternative ways of presenting results. Index numbers play an important role in forecasting, and models of forecasting are presented here.
10.1
Introduction
Any variable that is measured over time in sequential order is called a time series. The primary characteristic of a time series is the assumption that the observations have some form of dependence on time. Since this time dependence may take on any number of possible patterns, the problem becomes one of identifying the most important factors.
Business people, economists, and analysts of various kinds all look back at the
sequence of events that occurred over the past year or years in order to understand
what happened and thereby (they hope) to be in a better position to anticipate
what may happen in the future.
A leveling-off of long-term population growth, for example, may indicate to a particular firm that future market expansion may not be unlimited and that more careful attention should be paid to increasing the firm's market share. Even with a general slowdown in population growth, the gradual aging of the population may imply to another firm (one concentrating in consumer goods for older people) that its total market potential is growing substantially year after year. Other types of time-dependent patterns may exist as well. In looking at a time series of monthly or quarterly beer sales, for example, we may discover a regular seasonal pattern in which beer consumption peaks at certain times of the year. Other regular periodic or seasonal variation can be observed in sales of college textbooks and in the observance of such social customs as giving Christmas gifts and Valentine's Day flowers.
The task of time series analysis can therefore be thought of quite generally as a matter of identifying and isolating the various major time-dependent patterns in a given time series. Once this is accomplished, the analysis should enhance the user's ability to forecast variables of interest over the future.
The classical time-series model focuses on the decomposition of the time-dependent variable into four component parts: trend (T), cycle (C), seasonal variation (S), and residual or irregular variation (I).
The model may be additive in its component parts:

$$Y_t = T_t + C_t + S_t + I_t$$

or multiplicative:

$$Y_t = T_t \cdot C_t \cdot S_t \cdot I_t$$

Seasonal variation (S): these are the oscillations that depend on the season of the year. Thus, employment is usually higher at harvest time at Nakambala Sugar Estate in Mazabuka, and rainfall is higher at some times of the year than at others.
The motivation behind decomposing a time series is twofold. On the one hand,
we wish to see whether a particular component is present in a given time series
and to understand the extent to which it explains some of the movements in the
variable of interest. On the other hand, if we wish to forecast a particular
variable, we can usually improve our forecasting accuracy by first breaking it into
component parts, then forecasting each of these parts separately, and finally
combining the individual effects to produce the composite overall forecast.
Business Forecasting is concerned with estimating the future value of some
variable of interest. This may be done for the short-term or for the long-term, and
different forecasting models are more appropriate for one case than for the other.
Forecasting may be done in any of three possible ways: using regression models, using time series models, or using forecasting models created for a specific purpose. Indeed, quantitative forecast models have even been designed for cases in which historical databases are not available, such as when a firm wishes to forecast sales of a new product or the expected profitability or market share for such a product.
Today, forecasters have developed a specialized terminology or jargon and many
forecasting models require a level of mathematical sophistication and the
availability of computers and specialized computer software that go far beyond
the scope of this book. As such, our objective in this course is to provide the
student with a basic understanding of the underlying issues about the use of
various types of forecasting models, rather than to provide a sophisticated level of hands-on experience.
10.2
Trend Analysis
The first component of a time series that we will consider is the long-term trend.
A trend can be linear or nonlinear and, indeed, can take on a whole host of other
functional forms such as polynomials and logarithmic trends, among others. We
shall begin by working through an example using a linear model.
Example
Annual sales for a pharmaceutical company have been recorded over the past 10 years; they are shown in Table 1.1. Calculate a linear trend of the data.

Table 1.1 Annual sales
Year     Time x   x²     Sales Y (K millions)   xY
1975     1        1      18.0                   18.0
1976     2        4      19.4                   38.8
1977     3        9      18.0                   54.0
1978     4        16     19.9                   79.6
1979     5        25     19.3                   96.5
1980     6        36     21.1                   126.6
1981     7        49     23.5                   164.5
1982     8        64     23.2                   185.6
1983     9        81     20.4                   183.6
1984     10       100    24.4                   244.0
Total    55       385    207.2                  1,191.2

Least Squares Method: The simplest method of fitting a linear trend is to use the least squares approach we discussed in the handout on Regression Analysis. In this method, the formulas for the slope and intercept are:

$$b = \frac{\sum xY - \frac{\sum x \sum Y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{1191.2 - \frac{55(207.2)}{10}}{385 - \frac{(55)^2}{10}} = \frac{51.6}{82.5} = 0.6255$$

$$a = \bar{Y} - b\bar{x} = \frac{207.2}{10} - 0.6255\left(\frac{55}{10}\right) = 17.28$$

so the trend line is $\hat{Y} = 17.28 + 0.6255x$. The forecast for 1985 (x = 11) is

$$\hat{Y} = 17.28 + 0.6255(11) = 24.1605 \approx 24.16 \text{ (K millions)}$$
Similarly, forecasting 2 years ahead would involve setting x equal to 12, and so on. Both confidence and prediction intervals can be constructed to give us a bound of confidence about our forecast. The caveat about forecasting outside the data range must be emphasized here, especially if forecasting for more than one time period is being contemplated.
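The least squares trend can be reproduced in a few lines of Python. This is a minimal sketch using the data of Table 1.1 with the years coded x = 1, ..., 10; `trend_forecast` is a hypothetical helper name, and no confidence or prediction intervals are computed.

```python
# Annual sales (K millions) for 1975-1984, coded x = 1..10 as in Table 1.1.
sales = [18.0, 19.4, 18.0, 19.9, 19.3, 21.1, 23.5, 23.2, 20.4, 24.4]
x = list(range(1, len(sales) + 1))
n = len(sales)

sum_x = sum(x)
sum_y = sum(sales)
sum_xy = sum(xi * yi for xi, yi in zip(x, sales))
sum_x2 = sum(xi * xi for xi in x)

# Least squares slope and intercept, as in the formulas above.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n


def trend_forecast(period):
    """Trend value for coded period x (x = 11 is 1985, x = 12 is 1986)."""
    return a + b * period


print(f"Y = {a:.2f} + {b:.4f}x")
print(f"1985: {trend_forecast(11):.2f}, 1986: {trend_forecast(12):.2f}")
```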
Example 1.2
Among the more common functional forms used in trend analysis are the following three:

1. A linear model,

$$y = \beta_0 + \beta_1 x$$

which is appropriate if the first differences (the differences between successive values in the time series) are roughly equal.

2. A polynomial form,

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 \quad \text{(parabola)}$$

or

$$y = \beta_0 + \beta_1 x^2 \quad \text{(parabola)}$$

3. An exponential form,

$$y = \beta_0(\beta_1)^x \quad \text{or} \quad \log y = \log \beta_0 + (\log \beta_1)x$$

which is appropriate if neither a linear nor a polynomial form fits but there nonetheless appears to be a constant rate of increase over time.
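As a sketch of the first and third forms, the Python below computes first differences and fits $y = \beta_0(\beta_1)^x$ by ordinary least squares on $\log y$. The series is made up for illustration (it grows by exactly 10% per period, so the fit should recover $\beta_0 = 100$ and $\beta_1 = 1.1$), and the function names are hypothetical.

```python
import math


def first_differences(values):
    """Differences between successive values, y[t] - y[t-1]."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def fit_exponential_trend(xs, ys):
    """Fit y = b0 * b1**x by least squares on log y = log b0 + (log b1) x.

    All ys must be positive. Returns (b0, b1).
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxy = sum((x - mean_x) * (ly - mean_log) for x, ly in zip(xs, logs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    log_b1 = sxy / sxx                    # slope of the log-linear fit
    log_b0 = mean_log - log_b1 * mean_x   # intercept of the log-linear fit
    return math.exp(log_b0), math.exp(log_b1)


# Made-up series growing by exactly 10% per period.
xs = [1, 2, 3, 4, 5]
ys = [100 * 1.1 ** x for x in xs]
print([round(d, 2) for d in first_differences(ys)])  # differences keep growing
b0, b1 = fit_exponential_trend(xs, ys)
print(f"y = {b0:.2f} * ({b1:.3f})^x")
```

Growing first differences point away from the linear form, while a constant ratio between successive values is what the exponential form captures.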
10.3
Moving Averages
An alternative approach to trend-cycle analysis is to use moving averages. In a sense, the moving average (MA) takes away the short-term seasonal and irregular variation, leaving a combined trend-cycle. Moving averages are widely used to remove seasonal variation, irregular variation (or noise, as it is also called), or both.
Example 1.2
Monthly sales figures for gasoline were recorded at all the gas stations in a particular town, as shown in Table 1.3. Calculate the three-month and five-month moving averages.

Table 1.3 Monthly Regional Gasoline Sales
Month   Gasoline sales (1000s of kilograms)
1       37
2       70
3       45
4       26
5       60
6       45
7       31
8       79
9       24
10      61
11      25
12      44
Solution
A moving average is a simple arithmetic average computed over any number of time periods. For a three-period moving average, we would take the first three months (1, 2 and 3) and average them. Then we would move to the next group of months (2, 3 and 4) and average them, and so on. In a similar fashion, we can compute 5-month moving averages, as shown in Table 1.4, or moving averages over any other number of months.
Table 1.4 Calculations for Moving Averages for Gasoline Sales Example
Month   Gasoline   3-month        3-month MA       5-month        5-month MA
        sales      moving total   (total ÷ 3)      moving total   (total ÷ 5)
1       37         -              -                -              -
2       70         152            50.7             -              -
3       45         141            47.0             238            47.6
4       26         131            43.7             246            49.2
5       60         131            43.7             207            41.4
6       45         136            45.3             241            48.2
7       31         155            51.7             239            47.8
8       79         134            44.7             240            48.0
9       24         164            54.7             220            44.0
10      61         110            36.7             233            46.6
11      25         130            43.3             -              -
12      44         -              -                -              -
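The moving averages in Table 1.4 can be reproduced with a small Python function. This is a sketch; `moving_average` is an illustrative name, and its results are unrounded, whereas the table rounds to one decimal place.

```python
def moving_average(values, k):
    """Average of every run of k consecutive values.

    Entry i averages values[i:i + k]; Table 1.4 writes it against the middle
    month of that run, i.e. month i + (k + 1) // 2 counting from 1.
    """
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


gasoline = [37, 70, 45, 26, 60, 45, 31, 79, 24, 61, 25, 44]  # Table 1.3
ma3 = moving_average(gasoline, 3)   # centred on months 2 to 11
ma5 = moving_average(gasoline, 5)   # centred on months 3 to 10
print([round(v, 1) for v in ma3])
print([round(v, 1) for v in ma5])
```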
Notice that the longer the period over which we average, the smoother the series becomes; eventually the moving average becomes a straight line. Smoothing also reduces the number of observation points: for the 3-month moving average we lose the first and last months; for the 5-month moving average we lose both the first 2 and the last 2 months.
In general, if we set the period of the moving average exactly equal to the number of seasons in one complete seasonal cycle of a given time series, we remove that seasonal variation. For example, if we have quarterly observations and wish to remove the four seasons, we choose a 4-period moving average. Here (and in general), when the number of periods chosen is even, we must compute a centered moving average.
Example 1.3
Historical occupancy rates for a Kasaba resort hotel have been compiled by the government tourism office; these are shown in Table 1.5. Calculate a 4-quarter moving average.

Solution
To remove the seasonal variation, we need to compute a 4-period moving average. This, however, would place the moving average exactly between two quarters. Consequently, we next take a 2-period moving average of all the 4-period moving averages, thereby centering the final moving average on a particular quarter. Our calculations appear in Table 1.6.
Notice that we first calculated the 4-quarter moving totals and then centered them by averaging each pair of adjacent totals. For example, the moving total of the four quarters of 1980 is 105, and the moving total of quarters II, III and IV of 1980 and quarter I of 1981 is 90. The centered moving average, placed against quarter III of 1980, is (105 + 90)/8 = 24.4; we divide by 8 because the two totals together contain 2 × 4 = 8 quarterly values. The remaining centered moving averages are computed in a similar manner.
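A sketch of the centering step for an even period, in Python. It assumes the quarterly values are listed in time order starting from 1980 quarter I (as in Table 1.5); `centered_moving_average` is a hypothetical helper. The values returned are unrounded, so 24.375 corresponds to the 24.4 shown in Table 1.6.

```python
def centered_moving_average(values, period=4):
    """Return (moving totals, centered moving averages) for an even period.

    Each moving total spans `period` values and so falls between two
    observations; averaging adjacent totals, (T1 + T2) / (2 * period),
    centers the result on an observation. Entry j of the averages is
    centered on values[j + period // 2] (0-based).
    """
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    averages = [(t1 + t2) / (2 * period) for t1, t2 in zip(totals, totals[1:])]
    return totals, averages


# Quarterly hotel occupancy, 1980 quarter I to 1984 quarter IV (Table 1.5).
occupancy = [40, 20, 30, 15, 25, 15, 35, 20, 35, 22, 32, 18,
             36, 16, 30, 20, 37, 17, 32, 18]
totals, cma = centered_moving_average(occupancy)
print(totals[:3])   # first 4-quarter moving totals
print(cma[0])       # centered on 1980 quarter III
```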
Table 1.5 Hotel Occupancy
Year    Quarter I   Quarter II   Quarter III   Quarter IV
1980    40          20           30            15
1981    25          15           35            20
1982    35          22           32            18
1983    36          16           30            20
1984    37          17           32            18
Moving averages are specifically designed to remove seasonal and/or irregular variations.
As such, they can be thought of as serving three purposes. First, they are one of several
types of smoothing techniques that remove short-term variation and leave only a
combined trend-cycle. In other words, if we think of the classical multiplicative time
series model, we have

$$Y = T \cdot C \cdot S \cdot I$$

and, dividing both sides by $(S \cdot I)$, we get

$$\frac{Y}{S \cdot I} = \frac{T \cdot C \cdot S \cdot I}{S \cdot I} = T \cdot C = MA$$

That is, we are left with the moving average series, which is composed solely of the trend and cycle.
Second, we can set the period of the moving average exactly equal to the number of
seasonal effects we wish to remove. In that sense, we have deseasonalized our time
series.
Table 1.6 Centered Moving Average Calculation for Hotel Occupancy
Year   Quarter   Occupancy   4-quarter      Centered
                             moving total   moving average
1980   I         40          -              -
       II        20          -              -
       III       30          105            24.4
       IV        15          90             21.9
1981   I         25          85             21.9
       II        15          90             23.1
       III       35          95             25.0
       IV        20          105            27.1
1982   I         35          112            27.6
       II        22          109            27.0
       III       32          107            26.9
       IV        18          108            26.3
1983   I         36          102            25.3
       II        16          100            25.3
       III       30          102            25.6
       IV        20          103            25.9
1984   I         37          104            26.3
       II        17          106            26.3
       III       32          104            -
       IV        18          -              -

Each 4-quarter moving total is shown against the third quarter of its four-quarter window; each centered moving average is the sum of the total on its row and the total on the next row, divided by 8.
This is one of the simplest methods of forecasting but it is only appropriate for series with
no trend or seasonal effect. It is often used to predict the demand for a product in the
next time period so that sufficient stock can be kept to supply it. (This is called demand
forecasting.)
10.4
Irregular Variation
Irregular or random variation is what remains after the trend, cyclical and seasonal variation have been removed. One way of removing it is through smoothing techniques, such as the moving averages we discussed in section 10.3. Another popular technique is exponential smoothing, which we shall look at shortly.
By definition, irregular variations are unpredictable and random, can only sometimes be identified through examination of major external events that might have influenced the time series, and often tend to cancel each other out over time.
$$S_t = \alpha Y_t + \alpha(1-\alpha)Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \cdots$$

This formula states that the current period's smoothed value of the time series, $S_t$, depends on all past values of the dependent variable, although these are weighted progressively less the farther back they go. We set the smoothing constant such that $0 < \alpha < 1$, which means that the successive weights $\alpha, \alpha(1-\alpha), \alpha(1-\alpha)^2, \ldots$ get smaller and smaller. There is a mathematical procedure for selecting the best or optimal value of the smoothing constant, but it is beyond the level of this course. Note that selecting small values of $\alpha$ smooths the time series more completely than selecting large values of $\alpha$ does. By simple mathematical derivation, it can be shown that the extended exponential smoothing equation just described reduces to a computationally simpler form, called the basic exponential smoothing equation:

$$S_t = \alpha Y_t + (1-\alpha)S_{t-1}$$

or

$$S_t = S_{t-1} + \alpha(Y_t - S_{t-1}), \qquad 0 < \alpha < 1 \qquad (1)$$

so that

$$S_2 = \alpha Y_2 + (1-\alpha)S_1, \qquad S_3 = \alpha Y_3 + (1-\alpha)S_2,$$

and so on.
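The claim that the extended equation reduces to the basic one can be checked numerically. The sketch below, using hypothetical function names and made-up data, computes both forms starting from $S_1 = Y_1$. In the weighted-sum form the first observation receives the leftover weight $(1-\alpha)^{t-1}$; that is what makes the two forms agree exactly for a finite series.

```python
def smooth_recursive(ys, alpha):
    """Basic equation: S_1 = Y_1, then S_t = alpha * Y_t + (1 - alpha) * S_{t-1}."""
    smoothed = [ys[0]]
    for y in ys[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


def smooth_weighted_sum(ys, alpha):
    """Extended equation: weights alpha, alpha(1 - alpha), alpha(1 - alpha)^2, ...
    on Y_t, Y_{t-1}, Y_{t-2}, ...; because S_1 = Y_1, the first observation
    receives the remaining weight (1 - alpha)^(t - 1)."""
    smoothed = []
    for t in range(len(ys)):                      # t is a 0-based position here
        recent = sum(alpha * (1 - alpha) ** k * ys[t - k] for k in range(t))
        smoothed.append(recent + (1 - alpha) ** t * ys[0])
    return smoothed


series = [12.0, 15.0, 11.0, 18.0, 14.0, 16.0]    # made-up illustrative data
basic = smooth_recursive(series, 0.3)
extended = smooth_weighted_sum(series, 0.3)
print(all(abs(u - v) < 1e-9 for u, v in zip(basic, extended)))
```

Setting `alpha` to 0 or 1 in either function reproduces the two extreme cases discussed next.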
Setting the smoothing constant to either of its extremes yields one of two cases. When $\alpha = 0$,

$$S_t = 0 \cdot Y_t + (1 - 0)S_{t-1} = S_{t-1}$$

Since we set $S_1 = Y_1$, it follows that $S_t = Y_1$ for all $t$. Thus the smoothed values are simply equal to the initial value of the time series. Setting $\alpha = 1$,

$$S_t = 1 \cdot Y_t + (1 - 1)S_{t-1} = Y_t$$

Thus, the smoothed value of the series is just the most recent observation, and all earlier observations are ignored. Such a series is called a random walk or a naïve forecasting model. Here, the forecast value in any particular year is simply the previous year's value.
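The layout for working out problems using equation (1) is as follows:

Time period   Actual value   α(Y_t − S_{t−1})    Forecast
(t)           (Y_t)                              S_t
0             -              -                   S_0
1             Y_1            α(Y_1 − S_0)        S_1 = S_0 + α(Y_1 − S_0)
2             Y_2            α(Y_2 − S_1)        S_2 = S_1 + α(Y_2 − S_1)
3             Y_3            α(Y_3 − S_2)        S_3 = S_2 + α(Y_3 − S_2)
.             .              .                   .
.             .              .                   .
t             Y_t            α(Y_t − S_{t−1})    S_t = S_{t−1} + α(Y_t − S_{t−1})

The value $S_t$ computed in row $t$ serves as the forecast for period $t + 1$.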
α = 0.1

Month 1986   t    Actual sales   α(Y_t − S_{t−1})   Forecast sales S_t
                  Y_t                               (for the next month)
                                                    2,100 (S_0)
January      1    $1,800         -30                2,070
February     2    2,000          -7                 2,063
March        3    1,800          -26                2,037
April        4    3,000          96                 2,133
May          5    2,700          57                 2,190
June         6    1,900          -29                2,161
July         7    3,000          84                 2,245
August       8    2,600          36                 2,281
September    9    1,700          -58                2,223
October      10   1,200          -102               2,121
November     11   2,400          28                 2,149
December     12   1,500          -65                2,084
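The monthly sales table can be regenerated with a short loop. This sketch assumes the initial forecast $S_0 = 2{,}100$ and $\alpha = 0.1$ used in the table; it does not round at each step, so the final value comes out at about 2,083.18 rather than the table's 2,084.

```python
sales = [1800, 2000, 1800, 3000, 2700, 1900, 3000, 2600, 1700, 1200, 2400, 1500]
alpha = 0.1
forecast = 2100.0          # S_0, the starting forecast used in the table

for t, actual in enumerate(sales, start=1):
    adjustment = alpha * (actual - forecast)   # alpha * (Y_t - S_{t-1})
    forecast += adjustment                     # S_t = S_{t-1} + alpha * (Y_t - S_{t-1})
    print(f"t = {t:2d}  Y = {actual:5d}  adjustment = {adjustment:7.2f}  S = {forecast:8.2f}")

print(f"Forecast for the following month: {forecast:.2f}")
```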
The smoothing constant can be linked to a moving average through $\alpha = \dfrac{2}{n + 1}$, where $n$ is the number of periods in the equivalent moving average. For example, for a 4-quarter moving average over 1 year, $n = 4$ and $\alpha = 2/5 = 0.4$. The larger the value of $n$, and hence the smaller the value of $\alpha$, the greater the smoothing effect.
Worked Examples
1. The old forecast for the first observed value should be taken as 40, with α = 0.2.

$$S_t = S_{t-1} + \alpha(Y_t - S_{t-1})$$

t    Y_t    Old forecast S_{t−1}   α(Y_t − S_{t−1})   New forecast S_t
1    40     40                     0                  40
2    35     40                     -1                 39
3    39     39                     0                  39
4    44     39                     1                  40
5    45     40                     1                  41
6    43     41                     0.4                41.4
7    46     41.4                   0.92               42.32

2. Exponentially smooth the following data. What is the new forecast for the production of aircraft in 1971? (Take α = 0.25.)

Year                         1960  1961  1962  1963  1964  1965  1966  1967  1968  1969  1970
Production of new aircraft   518   395   487   450   319   415   431   312   278   500   450

Year   t    Y_t   Old forecast S_{t−1}   α(Y_t − S_{t−1})   New forecast S_t
1960   1    518   518                    0                  518
1961   2    395   518                    -31                487
1962   3    487   487                    0                  487
1963   4    450   487                    -9                 478
1964   5    319   478                    -40                438
1965   6    415   438                    -6                 432
1966   7    431   432                    -0.25              432
1967   8    312   432                    -30                402
1968   9    278   402                    -31                371
1969   10   500   371                    32                 403
1970   11   450   403                    12                 415

The new forecast for 1971 is therefore 415.
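Both worked examples follow the same rule, new forecast = old forecast + α(actual − old forecast). A minimal Python sketch (the name `exponential_forecast` is hypothetical) reproduces them without rounding at each step, so example 2 gives about 414.95, which rounds to the 415 obtained above.

```python
def exponential_forecast(actuals, alpha, initial):
    """Return the forecasts S_0, S_1, ..., S_n using
    new forecast = old forecast + alpha * (actual - old forecast)."""
    forecasts = [initial]
    for y in actuals:
        old = forecasts[-1]
        forecasts.append(old + alpha * (y - old))
    return forecasts


# Worked example 1: alpha = 0.2, initial forecast 40.
ex1 = exponential_forecast([40, 35, 39, 44, 45, 43, 46], 0.2, 40)

# Worked example 2: aircraft production 1960-1970, alpha = 0.25, starting from 518.
aircraft = [518, 395, 487, 450, 319, 415, 431, 312, 278, 500, 450]
ex2 = exponential_forecast(aircraft, 0.25, 518)

print(round(ex1[-1], 2))   # new forecast after period 7
print(round(ex2[-1], 2))   # forecast for 1971, unrounded working
```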
Problems:
1. The accompanying table shows earnings per share of a corporation over a period of 18 years.

Year   Earnings   Year   Earnings   Year   Earnings
1      3.63       7      7.01       13     3.54
2      3.62       8      6.37       14     1.65
3      3.66       9      5.82       15     2.15
4      5.31       10     4.98       16     6.09
5      6.14       11     3.43       17     5.95
6      6.42       12     3.40       18     6.26

(a)
(b)
α = 0.7 and α = 0.9, find

2.
Quarter   1966   1967   1968
I         20.9   17.5   17.0
II        17.3   14.7   13.5
III       15.6   13.5   13.5
IV        13.9   13.1   13.7

Is there any evidence that manufacturers' sales of women's footwear are subject to seasonal variation? Predict manufacturers' sales during the first quarter of 1969.
New forecasted value = old forecasted value + α(actual observation − old forecasted value), with α = 0.2.

Period      Actual   Old        New
reference   demand   forecast   forecast
1           16       16         16.00
2           20       16         16.80
3           15       16.80      16.44
4           19       16.44      16.95
5           17       16.95      16.96
6           21       16.96      17.77
7           25       17.77      19.22
Learning Objectives
After working through this Chapter you should be able to:
Discuss the appropriate model to use when forecasting: the least squares method, the moving average method and the exponential smoothing method.
CHAPTER 11
INDEX NUMBERS
Reading
Newbold Chapter
Plane and Oppermann Chapter 16
Tailoka Frank P Chapter 5
Introductory Comments
This Chapter looks at index numbers, which are useful in describing the way in which the economy changes from period to period, using prices, quantities, etc.
A device constructed by statisticians which attempts to explain the magnitude of economic changes over time is called an index number.
An index number shows the rate of change of a variable from one specification to another.
You will realize that the index of retail prices attempts to measure the change in the price of a whole range of goods and services that we regularly buy. So you can see that it is attempting to measure the cost of living, something that concerns us all. In times of inflation, the retail price index is probably more important than at any other time in its existence. In some developed countries, increases in pay and pensions are index-linked. The consumer price index (CPI) is an indicator of what is happening to the prices consumers are paying for the items they purchase. The CPI measures changes in price over a period of time and is often used as a measure of inflation.
However, to use an index we need to know what it is, how it is calculated and what its limitations are. The primary function of a price index is to compare prices in one year with those in some other year. Technically, prices in a given year are compared with prices in the base year, which are taken as the standard. Conventionally, $P_1$ refers to the price in the given year and $P_0$ refers to the price in the base year.
A Price Index measures the change in the money value of a group of items over time. If only one item, such as bread, is being considered, the comparison between years may be made by calculating price relatives, i.e., the price in the given year relative to the base year.
$$\text{Price relative} = \frac{P_1}{P_0} \times 100$$

For example, if the price of a loaf was K1100 in 1999 and K1700 in 2000, the 2000 price relative to 1999 was

$$\frac{1700}{1100} \times 100 = 154.5$$

The interpretation of a price index is straightforward. The price index for 2000 is 154.5. This means that the 2000 price of a loaf of bread is 154.5 percent of the 1999 (base year) price of a loaf of bread.
ii) Unless the purpose is clearly defined, the eventual usefulness of the final index will be suspect. In other words, it must be designed to show something in particular.

iii) Selection of the items for inclusion
The main principles to be followed are that the items selected must be unambiguous, relevant to the purpose, and their values ascertainable. Since index numbers are concerned largely with making comparisons over given time periods, an item selected one year must be clearly identified (i.e. in terms of size, weight, capacity, quality, etc.) so that the same item can be selected the following year for comparison.

iv) Selection of appropriate weights
The base-year weighted (Laspeyres) price index is

$$I_1 = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

where $p_0$ and $p_1$ are the base-year and given-year prices and $q_0$ is the base-year quantity of each item.

Item    q_0    p_0    p_1    p_1 q_0    p_0 q_0
A       500    45     75     37500      22500
B       35     50     90     3150       1750
C       65     55     100    6500       3575
Total                        47150      27825

$$I_1 = \frac{47150}{27825} \times 100 = 169.5$$

It may now be stated that prices have risen by 69.5% overall from 1990 to 1991, based on the evidence of these three commodities.
This index is a reasonable measure of the change in prices over a short period of,
say, two years, but if the given year is a longer period in time from the base year,
the weights used tend to become out of date as spending habits change and no
longer give a realistic comparison between the two years. This disadvantage may
be overcome by using a given year weighted index as calculated by the Paasche
formula.
$$I_1 = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

This index gives the change in the total value of the given year's consumption from the value it would have had in the base year. The disadvantage of the Paasche price index is that the quantities must be determined afresh each year, thus adding to the time and cost of data collection. Moreover, each year the index numbers for previous years must be recomputed to reflect the effect of the new quantity weights.

Item    p_1    p_0    q_1    p_1 q_1    p_0 q_1
A       75     45     800    60000      36000
B       90     50     150    13500      7500
C       100    55     80     8000       4400
Total                        81500      47900

$$I_1 = \frac{81500}{47900} \times 100 = 170$$
From this calculation, prices may be said to have risen 70% overall. However, this formula is equally unrealistic in that it compares hypothetical past quantities with current real quantities rather than vice versa. One suggested way out of the dilemma is to calculate an average index number, the geometric mean of the Laspeyres and the Paasche index numbers, which is called Fisher's price index:

$$I_F = \sqrt{I_L \cdot I_P} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \cdot \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$

Fisher's price index has its own disadvantages. Because each year's index number is calculated with new weights, the only comparisons that can be made are between the given year and the base year; successive years are not directly comparable, as they are with the Laspeyres formula. It is also a costly and time-consuming operation to find new weights each year.
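The three index formulas can be checked against the tables with a few lines of Python. The sketch takes the prices and quantities of commodities A, B and C from the tables above; `aggregate` is a hypothetical helper. The Fisher value (about 169.8) is not quoted in the text; it is simply the geometric mean that the formula defines.

```python
import math

# Commodities A, B, C: base-year (p0) and given-year (p1) prices,
# base-year (q0) and given-year (q1) quantities, from the tables above.
p0 = [45, 50, 55]
p1 = [75, 90, 100]
q0 = [500, 35, 65]
q1 = [800, 150, 80]


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyres = aggregate(p1, q0) / aggregate(p0, q0) * 100   # base-year weights
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100     # given-year weights
fisher = math.sqrt(laspeyres * paasche)                   # geometric mean of the two
print(f"Laspeyres {laspeyres:.1f}, Paasche {paasche:.1f}, Fisher {fisher:.1f}")
```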
2.
BASE CHANGE
Year    Index A   Index B
1971    100       66.7
1972    110       73.3
1973    120       80.0
1974    130       86.7
1975    140       93.3
1976    150       100

3.
Week   Share price
1      250
2      300
3      350
4      225

Calculate and interpret a chain base index using week 1 as the base.

$$\text{Index}_{(wk1)} = 100$$

$$\text{Index}_{(wk2)} = \frac{\text{Price}_{wk2}}{\text{Price}_{wk1}} \times 100 = \frac{300}{250} \times 100 = 120 \text{ (to 2 d.p.)}$$

$$\text{Index}_{(wk3)} = \frac{\text{Price}_{wk3}}{\text{Price}_{wk2}} \times 100 = \frac{350}{300} \times 100 = 116.67 \text{ (to 2 d.p.)}$$

$$\text{Index}_{(wk4)} = \frac{\text{Price}_{wk4}}{\text{Price}_{wk3}} \times 100 = \frac{225}{350} \times 100 = 64.29 \text{ (to 2 d.p.)}$$

At the end of the second week the share price had increased by 20% from the end of the first week. By the end of the third week the share price had increased again, but at a slower rate (16.67%) than in week 2. In week 4 the price dipped, with a 35.71% decrease from week 3.
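A sketch of the two operations in Python: a chain-base index (each period relative to the one before) and a change of base for a fixed-base series. The function names are hypothetical; rebasing Index A to 1976 reproduces Index B in the BASE CHANGE table.

```python
def chain_index(prices):
    """Chain-base index: each value relative to the previous one, times 100.
    The first period is given the value 100."""
    return [100.0] + [later / earlier * 100 for earlier, later in zip(prices, prices[1:])]


def rebase(index_values, new_base_position):
    """Fixed-base index re-expressed so that the chosen period equals 100."""
    base = index_values[new_base_position]
    return [v / base * 100 for v in index_values]


weekly_prices = [250, 300, 350, 225]
print([round(v, 2) for v in chain_index(weekly_prices)])

index_a = [100, 110, 120, 130, 140, 150]           # 1971-1976, 1971 = 100
print([round(v, 1) for v in rebase(index_a, 5)])   # 1976 = 100, as in Index B
```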
4.
Year    Index A (Pq66)   Index B (Pq68)
1972    240
1973    200
1974    180              200
1975                     180
1976                     160

Splicing factor: 180/200

Year    Series A (1972 = 100)   Base 1974
1972    100.0                   133.34
1973    83.3                    111.11
1974    75.0                    100.00
1975    67.5                    90.00
1976    60.0                    80.00

The index series B came into being because the weights were changed in 1974. It would, of course, be possible to change the weights every year and, using the chain index technique, relate each year back to the original base of Series A. This is the method used in calculating the index of retail prices.
5.
Year    Income        Price Index   Real Income
1974    K2,610,000    100           K2,610,000.00
1976    K3,150,000    157           K2,006,369.43

Example:
Suppose that the income column in table 3.0 shows the incomes of a sales representative in 1974 and 1976, the base year of the index of retail prices has been taken as 1974, and the value for 1976 is 157. Real income may be calculated by dividing actual income by the price index.

$$\text{1974 real income} = \frac{K2{,}610{,}000}{1.00} = K2{,}610{,}000$$

$$\text{1976 real income} = \frac{K3{,}150{,}000}{1.57} = K2{,}006{,}369.43$$

It may be said that the salesman's real income has decreased by K603,630.57 over the two years.
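The deflation step can be written as a short Python sketch, treating the price index as a proportion of its base value (157 becomes 1.57). The dictionary names are illustrative.

```python
incomes = {1974: 2_610_000, 1976: 3_150_000}   # money income in kwacha
price_index = {1974: 100, 1976: 157}            # index of retail prices, 1974 = 100

# Real income = money income / (price index / 100).
real_income = {year: incomes[year] / (price_index[year] / 100) for year in incomes}
change = real_income[1976] - real_income[1974]
print({year: round(value, 2) for year, value in real_income.items()})
print(f"Change in real income: K{change:,.2f}")
```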
The following figures give the distribution of income percentages for an average family:

Item             %
Food             45
Fuel and light   15
Clothing         05
Rent             20
Other items      15

Item             2003   2004   2005
Food             180    200    215
Fuel and light   40     45     42
Clothing         95     80     95
Rent             50     55     60
Other items      65     80     80
2. (i) Calculate a cost of living index for the years 2004 and 2005, taking 2003 as a base year.
(ii) Comment briefly on the problem of the choice of items and weights when constructing an index number.

3. Explain briefly the major weakness of the Paasche index in this case and suggest an alternative.
The following figures give the distribution of income percentages for an average family:

Item             %
Food             25
Fuel and light   20
Clothing         25
Rent             10
Other items      20

Item             1999   2000   2001
Food             180    195    210
Fuel and light   35     34     30
Clothing         100    90     95
Rent             45     45     50
Other items      65     75     75
(a)
Calculate a cost of living index for the years 2000 and 2001, taking 1999 as a
base year.
Comment briefly on the problem of the choice of items and weights when
constructing an index number.
Define what is meant by a fixed base index number and a chain based
index number and explain the different ways in which these alternatives
have to be interpreted.
From the following data, calculate:
i)
A Laspeyres price index for 2003.
(b)
4.
(a)
(b)
ii)
5.
Commodity   2001 Average price (K)   2001 Quantity   2003 Average price (K)   2003 Quantity
A           18 250                   155             18 750                   195
B           39 100                   275             46 000                   310
C           7 000                    120             9 000                    195
D           14 750                   435             22 700                   380
E           74 200                   95              101 800                  130
(a)
(b)
9 500
510
Apples
3 000
600
Meat
7 000
24 000
7 200
9 990
July 2004
Butter
4 500
4 200
Potatoes
8 500
4 200
Apples
3 500
1 500
Meat
(c)
7 500
19 500
24 000
29 400
You are required to compute a Laspeyres index showing the extent of the rise in
prices of all four commodities.
Explain briefly the major weakness of the Laspeyres index in this case and
suggest an alternative.
Learning Objectives
After working through this Chapter you should be able to
CHAPTER 12
REGRESSION ANALYSIS
Reading
Newbold Chapter 12, 13
Pfaffenberger Chapter 13, 14, 15
James T. McClave, P. George Benson Chapter 10, 11
Wonnacott and Wonnacott Chapter 12, 15
Introductory Comments
We carry through the ideas of least squares fitting, using further assumptions that allow confidence intervals and tests; the connection between regression and analysis of variance then becomes apparent. Correlations are very important for all work with many variables.
Regression analysis helps one determine the probable form of the relationship between variables. The objective of this method of analysis is usually to predict or estimate the value of one variable corresponding to a given value of another variable. The English scientist Sir Francis Galton (1822-1911) first proposed the ideas of regression in reports of his research in the area of heredity, first in sweet peas and later in human stature. (Business Statistics, Third Edition, Daniel/Terrell, page 301).
1.1
complicated for practical use. On the other hand, an analysis that has forced the
sample data into a model that is not applicable is worthless. Fortunately, we can get useful results from a model that falls somewhere between these two extremes.
The type of relationship between the two variables X and Y that is of concern here is a linear relationship. This implies that the relationship of interest has something to do with a straight line. The measurements that are available for analysis come in pairs, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where the measurements $(x_i, y_i)$ are taken on the same entity, called the unit of association.
Two variables X and Y are linearly related if their relationship can be expressed by the following simple linear model:

$$y_i = \alpha + \beta x_i + e_i \qquad (1)$$

where $y_i$ is the value of the Y variable for a typical unit of association from the population, $x_i$ is the value of the X variable for that same unit of association, $\alpha$ and $\beta$ are parameters called the regression constant and the regression coefficient, respectively, and $e_i$ is a random variable with a mean of 0 and a variance of $\sigma^2$. To understand the model of equation (1), we must consider the assumptions underlying simple linear regression.
1.2
To demonstrate inferential procedures, we shall assume in the examples and exercises that follow that the Y values are normally distributed.
4. The variances of the subpopulations of Y are all equal.

$$\mu_{Y|x} = \alpha + \beta x_i \qquad (2)$$

where $\alpha$ and $\beta$ are the Y intercept and slope, respectively, of the line on which all the subpopulation means are assumed to lie.
5. The Y values are statistically independent. This means that in drawing the sample, the values of Y chosen at one value of X in no way depend on the values of Y chosen at another value of X.
We are now in a position to shed some more light on the term $e_i$ in the simple linear model. Solving equation (1) for $e_i$, we have

$$e_i = y_i - (\alpha + \beta x_i) \qquad (3)$$

Thus $e_i$ shows the amount by which $y_i$ deviates from the mean of the subpopulation of Y values from which it is drawn, since by equation (2) that mean is $\alpha + \beta x_i$.
estimate and in order to make inferences about the true line of regression of
Y on X.
We can explain the procedures involved in regression analysis more easily by
means of a numerical illustration.
Example (1)
An operations analyst conducts a study to analyze the relationship between
production and manufacturing expenses in the electronics industry. A sample of
n = 10 firms, randomly selected from within the industry, yields the data in Table (1). Manufacturing expenses is considered to be the dependent variable. It changes as the volume of production varies. On the other hand, a change in

Table (1)
X       Y (thousands of kwachas)
40      150
42      140
48      160
55      170
65      150
79      162
88      185
100     165
120     190
140     185
$$y = a + bx \qquad (4)$$

Here a is the point at which the line crosses the Y axis, and b is the amount by which the line changes per unit change in x. We refer to a as the Y intercept and b as the slope of the line. To draw a straight line for the sample data, then, we need only numerical values for a and b. Once we have these values, we can substitute two different values of X into the equation and get corresponding values of Y. If we plot the resulting coordinates $(x_1, y_1)$ and $(x_2, y_2)$ on the graph and connect them, we have a straight line.
Figure (2) is a graph of a straight line. Here we see the geometric relationships
between the slope, the Y intercept, and a unit change in x.
We can find numerical values for a and b for any set of data such as that in the present example by simultaneously solving the following two equations:

$$\sum Y_i = na + b\sum X_i \qquad (5)$$

$$\sum X_i Y_i = a\sum X_i + b\sum X_i^2 \qquad (6)$$

Their solution yields the equation for the least squares line

$$\hat{y} = a + bx \qquad (7)$$

where $\hat{y}$ denotes the calculated value of Y for a given X, and a and b are estimates of $\alpha$ and $\beta$, respectively.
Table (2) gives the values of $\sum Y_i$, $\sum X_i$, $\sum X_i Y_i$, $\sum Y_i^2$ and $\sum X_i^2$ needed to solve the equations. Substituting values from Table (2) into equations (5) and (6) and solving gives the results that follow.

Figure (2) A linear regression equation illustrating the geometrical interpretations of a and b (b = slope, a = Y intercept).
Table (2)
x_i     y_i      x_i²      x_i y_i    y_i²
40      150      1 600     6 000      22 500
42      140      1 764     5 880      19 600
48      160      2 304     7 680      25 600
55      170      3 025     9 350      28 900
65      150      4 225     9 750      22 500
79      162      6 241     12 798     26 244
88      185      7 744     16 280     34 225
100     165      10 000    16 500     27 225
120     190      14 400    22 800     36 100
140     185      19 600    25 900     34 225
Total
777     1 657    70 903    132 938    277 119
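$$b = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \qquad (8)$$

$$a = \frac{\sum Y - b\sum X}{n} = \bar{y} - b\bar{x} \qquad (9)$$

$$b = \frac{132{,}938 - \frac{(777)(1657)}{10}}{70{,}903 - \frac{(777)^2}{10}} = 0.3978, \qquad a = \frac{1657 - 0.3978(777)}{10} = 134.79$$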
Figure (3) Scatter diagram and least squares line for Example (1): $\hat{y} = 134.79 + 0.3978x$
Suppose that we square the vertical distance from each observed point $y_i$ to the least-squares line, and add these squared distances over all points. The total we get will be smaller than the similarly computed total for any other line that we could draw through the original points. This is why we call the line the least-squares line.
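Equations (8) and (9) can be evaluated directly from the data in Table (1). A minimal Python sketch with illustrative variable names:

```python
# Table (1): production (x) and manufacturing expenses in thousands of kwachas (y).
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # equation (8)
a = (sum_y - b * sum_x) / n                                     # equation (9)
print(f"y-hat = {a:.2f} + {b:.4f}x")
```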
1.4
One method of evaluating the regression equation is to compare the scatter of the points about the regression line with the scatter about $\bar{y}$, the mean of the sample values of Y. Figure (4) shows the regression line and the relative magnitudes of the scatter of the points about $\bar{y}$ for Example (1). The $\bar{y}$ line is horizontal because, regardless of the value of X, $\bar{y}$ remains constant. For these data, the dispersion of the points about the regression line is much less than the dispersion about the $\bar{y}$ line, so the regression line appears to provide a good fit for the data.
The amount by which any observed value of Y, $y_i$, deviates from the mean is $y_i - \bar{y}$, as shown in Figure (4).
Figure (4) Scatter diagram for Example (1) showing deviations about $\bar{y}$ and about the regression line $\hat{y} = 134.79 + 0.3978x$
This difference, $y_i - \bar{y}$, is called the total deviation. Consider, for example, the ninth value of Y. You will find it in Table (1) to be $y_9 = 190$. Since $\bar{y} = 165.7$, the total deviation of this Y value is 190 − 165.7 = 24.3.
The vertical distance between the $\bar{y}$ line and the regression line, $\hat{y}_i - \bar{y}$, is called the explained deviation. It shows the amount by which we reduce the total deviation when we fit the regression line to the points. For the ninth observation, $x_9 = 120$, so $\hat{y}_9 = 134.79 + 0.3978(120) = 182.5$ and the explained deviation is 182.5 − 165.7 = 16.8.
Finally, the vertical distance of the observed Y from the regression line, $y_i - \hat{y}_i$, is called the unexplained deviation. It represents the portion of the total deviation not explained or accounted for by fitting the regression line. In the case of $y_9 = 190$, the unexplained deviation is $y_9 - \hat{y}_9 = 190 - 182.5 = 7.5$.
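Thus the total deviation for a particular $y_i$ is equal to the sum of the explained and unexplained deviations. That is,
$$\underbrace{(y_i - \bar{y})}_{\text{total deviation}} = \underbrace{(\hat{y}_i - \bar{y})}_{\text{explained deviation}} + \underbrace{(y_i - \hat{y}_i)}_{\text{unexplained deviation}}$$
Squaring the deviations and summing over all n points (the cross-product term sums to zero for the least-squares line) gives
$$\underbrace{\sum (y_i - \bar{y})^2}_{\text{total sum of squares}} = \underbrace{\sum (\hat{y}_i - \bar{y})^2}_{\text{explained sum of squares}} + \underbrace{\sum (y_i - \hat{y}_i)^2}_{\text{unexplained sum of squares}} \qquad (9)$$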
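The decomposition can be verified for Example (1) in a few lines. This Python sketch (variable names hypothetical) recomputes the three deviations for the ninth observation, $x_9 = 120$ and $y_9 = 190$, at full precision and checks that equation (9) holds for the fitted line.

```python
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]  # fitted values on the least-squares line

i = 8  # ninth observation (Python lists are zero-indexed)
total = y[i] - y_bar              # total deviation
explained = y_hat[i] - y_bar      # explained deviation
unexplained = y[i] - y_hat[i]     # unexplained deviation
print(round(total, 1), round(explained, 1), round(unexplained, 1))  # 24.3 16.8 7.5

# Equation (9): total SS = explained SS + unexplained SS
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
assert abs(sst - (ssr + sse)) < 1e-6
```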
Each of the terms in equation (9) is a measure of dispersion. The total sum of squares measures the dispersion of the observed values of Y about their mean $\bar{y}$. That is, this term is a measure of the total variation in the observed values of Y. It is the numerator of the familiar formula for the sample variance.
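The explained sum of squares measures the part of the total variation in the observed values of Y that is accounted for by the linear relationship between X and Y; it is also called the sum of squares due to regression. The unexplained sum of squares measures the dispersion of the observed Y values about the regression line, and it is sometimes referred to as the sum of squares of deviations from linearity. It is the quantity that we minimize when we find the least-squares line, and it is usually called the error sum of squares. We may write equation (9) in a more compact form, as follows: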
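$$SST = SSR + SSE \qquad (10)$$
where
$$SST = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{(\sum y_i)^2}{n} \qquad (11)$$
$$SSR = \sum (\hat{y}_i - \bar{y})^2 = b^2\sum (x_i - \bar{x})^2 = b^2\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] \qquad (12)$$
We can get the unexplained sum of squares by subtraction; that is, SSE = SST − SSR.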
For Example (1), equation (11) gives
$$SST = 277{,}119 - \frac{(1{,}657)^2}{10} = 2554.10$$
Alternatively, we may compute SST by squaring and summing the individual total deviations $y_i - \bar{y}$:
$$(-15.7)^2 + (-25.7)^2 + \cdots + (19.3)^2 = 246.49 + 660.49 + \cdots + 372.49 = 2554.10$$
By equation (12), the explained sum of squares, or sum of squares due to regression, is
$$SSR = (0.3978)^2\left[70{,}903 - \frac{(777)^2}{10}\right] = 1666.33$$
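Alternatively, we can get the explained sum of squares by squaring and summing the explained deviations $\hat{y}_i - \bar{y}$. The unexplained sum of squares then follows by subtraction: SSE = SST − SSR = 2554.10 − 1666.33 = 887.77. Note the slight discrepancy, due to rounding, between the results for SSR and SSE computed by the two methods.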
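Formulas (11) and (12), with SSE obtained by subtraction, can be sketched as follows. The slope is deliberately rounded to 0.3978, as in the text, so the output matches the hand-computed figures; with an unrounded slope, SSR and SSE change slightly. Variable names are illustrative.

```python
x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
b = 0.3978  # slope rounded to four decimals, as in the text

sst = sum_y2 - sum_y ** 2 / n             # equation (11)
ssr = b ** 2 * (sum_x2 - sum_x ** 2 / n)  # equation (12)
sse = sst - ssr                           # unexplained SS by subtraction
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
# SST = 2554.10, SSR = 1666.33, SSE = 887.77
```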
When the assumptions we gave in section 1.2 hold, we may use analysis of variance to test for the presence of regression. In this process, the total sum of squares $\sum (y_i - \bar{y})^2$ is divided into two components: the sum of squares due to regression, $\sum (\hat{y}_i - \bar{y})^2$, and $\sum (y_i - \hat{y}_i)^2$, which is a measure of the variability left unexplained after regression has been considered. This last sum of squares is also called the deviations from regression or the error sum of squares. We can also subdivide the total degrees of freedom, n − 1, into two components: 1 for regression and (n − 1) − 1 = n − 2 associated with the error sum of squares. Dividing the sums of squares by their associated degrees of freedom yields the corresponding mean squares. If there is no linear regression (that is, if $\beta = 0$), and if the stated assumptions about the model apply, the ratio of the regression mean square to the error mean square is distributed as F with 1 and n − 2 degrees of freedom.
We can, therefore, test the null hypothesis that $\beta = 0$ using analysis of variance. Table 3 shows the analysis-of-variance table that we can construct.
Table 3
ANOVA Table for Simple Linear Regression
Source of variation                 SS     df       MS                   F
Linear regression                   SSR    1        MSR = SSR/1          MSR/MSE
Deviation from linearity (error)    SSE    n − 2    MSE = SSE/(n − 2)
Total                               SST    n − 1
Table 4
Analysis of Variance for Example (1)
Source        SS          df    MS          F
Regression    1,666.33    1     1,666.33    15.02
Error           887.77    8       110.97
Total         2,554.10    9
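The mean squares and the F ratio in Table 4 follow mechanically from SSR, SSE, and n. A minimal sketch, assuming the rounded sums of squares of Example (1) as inputs; the column layout of the printout is an arbitrary choice.

```python
n = 10
ssr, sse = 1666.33, 887.77  # sums of squares for Example (1)
sst = ssr + sse

df_reg, df_err, df_tot = 1, n - 2, n - 1
msr = ssr / df_reg   # regression mean square
mse = sse / df_err   # error mean square
f_ratio = msr / mse  # compared with F on 1 and n - 2 degrees of freedom

print(f"{'Source':<12}{'SS':>10}{'df':>4}{'MS':>10}{'F':>8}")
print(f"{'Regression':<12}{ssr:10.2f}{df_reg:4d}{msr:10.2f}{f_ratio:8.2f}")
print(f"{'Error':<12}{sse:10.2f}{df_err:4d}{mse:10.2f}")
print(f"{'Total':<12}{sst:10.2f}{df_tot:4d}")
```

The resulting F of about 15.02 is then compared with the tabulated F value for 1 and 8 degrees of freedom at the chosen significance level.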
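When the assumptions in section 1.2 are met, a and b are unbiased point estimators of $\alpha$ and $\beta$, respectively. The means and variances of their sampling distributions are
$$\mu_a = \alpha \qquad (13)$$
$$\sigma_a^2 = \frac{\sigma_{y|x}^2 \sum x_i^2}{n\sum (x_i - \bar{x})^2} \qquad (14)$$
$$\mu_b = \beta \qquad (15)$$
$$\sigma_b^2 = \frac{\sigma_{y|x}^2}{\sum (x_i - \bar{x})^2} \qquad (16)$$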
In equations (14) and (16), $\sigma_{y|x}^2$ is the variance about the population regression line. We also call $\sigma_{y|x}^2$ the unexplained variance of the population. It is the common variance $\sigma^2$ of the subpopulations of Y specified in the initial assumptions.
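The definitional equation for this quantity, for a finite population of size N, is
$$\sigma_{y|x}^2 = \frac{\sum_{i=1}^{N} (y_i - \mu_{y|x_i})^2}{N}$$
where $\mu_{y|x_i} = \alpha + \beta x_i$ is the mean of the subpopulation of Y values at $x_i$.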
When the assumptions are met, then, we can construct confidence intervals for, and test hypotheses about, $\alpha$ and $\beta$ in the usual way. In most cases, inferences about $\alpha$ are not of great interest. The parameter $\beta$, however, is of great interest. If $\beta = 0$, the regression line is horizontal, and an increase or decrease in X is not associated with a change in Y.
In this situation, we conclude that X and Y are not linearly related. A positive $\beta$ indicates that, generally, Y tends to increase as X increases. In this situation, there is a direct linear relationship between X and Y. A negative $\beta$ indicates that values of Y tend to decrease as values of X increase, and there is an inverse linear relationship between X and Y. Figure 5 illustrates these three situations.
Figure (5) (a) Direct linear relationship; (b) Inverse linear relationship; (c) No linear relationship
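When $\sigma_{y|x}^2$ is known, the test statistic for $H_0: \beta = 0$ is
$$z = \frac{b - 0}{\sigma_b} \qquad (18)$$
When $\sigma_{y|x}^2$ is unknown, as in most practical cases, the test statistic is
$$t = \frac{b - 0}{s_b} \qquad (19)$$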
where $s_b$ is the estimator of $\sigma_b$. The associated degrees of freedom are n − 2, the error degrees of freedom from the ANOVA table.
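To find $s_b$, we must first estimate $\sigma_{y|x}^2$. An unbiased estimator of this is given by
$$s_{y|x}^2 = \frac{\sum (y_i - \hat{y}_i)^2}{n - 2} \qquad (20)$$
which is the error mean square (MSE) of the ANOVA table.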
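An alternative formula for $s_{y|x}^2$ is
$$s_{y|x}^2 = \frac{1}{n-2}\left\{\left[\sum y_i^2 - \frac{(\sum y_i)^2}{n}\right] - \frac{\left[\sum x_iy_i - \frac{(\sum x_i)(\sum y_i)}{n}\right]^2}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}\right\} = \frac{1}{n-2}\left\{\sum y_i^2 - \frac{(\sum y_i)^2}{n} - b\left[\sum x_iy_i - \frac{(\sum x_i)(\sum y_i)}{n}\right]\right\} \qquad (21)$$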
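The estimator of $\sigma_b^2$ is
$$s_b^2 = \frac{s_{y|x}^2}{\sum (x_i - \bar{x})^2} \qquad (22)$$
or, in computational form,
$$s_b^2 = \frac{s_{y|x}^2}{\sum x_i^2 - (\sum x_i)^2/n} \qquad (23)$$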
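Because (20) and the two forms of (21) are algebraically equivalent, as are (22) and (23), a numerical check guards against arithmetic slips. The sketch below assumes the Example (1) data and works at full precision; the variable names are hypothetical.

```python
import math

x = [40, 42, 48, 55, 65, 79, 88, 100, 120, 140]
y = [150, 140, 160, 170, 150, 162, 185, 165, 190, 185]
n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

s_xx = sum_x2 - sum_x ** 2 / n     # sum of (x_i - x_bar)^2
s_yy = sum_y2 - sum_y ** 2 / n     # sum of (y_i - y_bar)^2
s_xy = sum_xy - sum_x * sum_y / n  # sum of (x_i - x_bar)(y_i - y_bar)
b = s_xy / s_xx
a = sum_y / n - b * sum_x / n

# Equation (20): residual sum of squares divided by n - 2
s2_yx = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
# Equation (21), both forms
s2_yx_alt1 = (s_yy - s_xy ** 2 / s_xx) / (n - 2)
s2_yx_alt2 = (s_yy - b * s_xy) / (n - 2)

# Equations (22) and (23)
x_bar = sum_x / n
s2_b_def = s2_yx / sum((xi - x_bar) ** 2 for xi in x)
s2_b_comp = s2_yx / s_xx

assert math.isclose(s2_yx, s2_yx_alt1) and math.isclose(s2_yx, s2_yx_alt2)
assert math.isclose(s2_b_def, s2_b_comp)
print(round(s2_yx, 2), round(s2_b_comp, 4))  # 110.95 0.0105
```

At full precision $s_{y|x}^2 \approx 110.95$, slightly below the MSE of 110.97 in Table 4, which was computed with the rounded slope.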
Let us now use the example of production and manufacturing expenses (Example (1)) to show how to test the null hypothesis that $\beta = 0$. First we state the hypotheses and significance level:
$H_0: \beta = 0$, $H_1: \beta \neq 0$, $\alpha = 0.05$
By equation (23),
$$s_b^2 = \frac{110.97}{70{,}903 - (777)^2/10} = 0.0105 \qquad \text{and} \qquad s_b = \sqrt{0.0105} = 0.102$$
The numerator of $s_b^2$ is the error mean square from Table (4); the figures in the denominator come from Table (2). The test statistic that we may compute is
$$t = \frac{0.3978 - 0}{0.102} = 3.9$$
We reject $H_0$, since 3.9 > 2.306, the upper critical value of t for a two-sided test with 8 degrees of freedom and $\alpha = 0.05$. Thus we conclude that $\beta$ is not 0 and that there is a linear relationship between X and Y. Since b is positive, we conclude that the relationship is direct, not inverse. Moreover, since 3.9 > 3.3554, the value of t with 8 degrees of freedom that cuts off an upper-tail area of 0.005, we have P < 2(0.005) = 0.01.
Note that the decision resulting from testing $H_0: \beta = 0$ by means of the t test is the same as that reached using analysis of variance. In fact, the value of t computed from equation (19) is equal to the square root of the F computed in the analysis of variance ($\sqrt{15.02} \approx 3.9$).
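The t test, and its agreement with the F test, can be reproduced as below. This sketch hard-codes the critical values 2.306 and 3.3554 quoted in the text rather than looking them up, takes MSE = 110.97 from Table 4, and keeps $s_b$ unrounded, which gives t ≈ 3.88 instead of 3.9. Variable names are illustrative.

```python
import math

b = 0.3978                    # estimated slope
mse = 110.97                  # s^2_{y|x}, error mean square from Table 4
s_xx = 70903 - 777 ** 2 / 10  # sum of (x_i - x_bar)^2 from the Table (2) totals
f_ratio = 15.02               # F from Table 4

s_b = math.sqrt(mse / s_xx)   # equation (23), then the square root
t = (b - 0) / s_b             # equation (19) with hypothesized slope 0

t_crit = 2.306                # two-sided, alpha = 0.05, 8 df (value from the text)
t_005 = 3.3554                # upper-tail area 0.005, 8 df (value from the text)
reject = abs(t) > t_crit
p_below_001 = abs(t) > t_005  # if True, P < 2(0.005) = 0.01

print(round(s_b, 3), round(t, 2), reject, p_below_001)  # 0.103 3.88 True True
print(round(math.sqrt(f_ratio), 2))                     # 3.88, matching t
```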
We can use equation (19) to test the null hypothesis that $\beta$ is equal to some value other than 0. The hypothesized value for $\beta$, $\beta_0$, replaces 0 in the equation. All other computations, degrees of freedom, and methods of determining significance are the same as in the example.
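Alternatively, we can test the null hypothesis that $\beta = 0$ by means of a confidence interval for $\beta$. We use the general formula for a confidence interval,
estimate ± (reliability factor)(standard error).
When we construct a confidence interval for $\beta$, the estimator is b. The reliability factor is some value of z or t (depending on whether or not $\sigma_{y|x}^2$ is known), and the standard error of the estimator is
$$\sigma_b = \sqrt{\frac{\sigma_{y|x}^2}{\sum (x_i - \bar{x})^2}}$$
When $\sigma_{y|x}^2$ is unknown, we estimate $\sigma_b$ by
$$s_b = \sqrt{\frac{s_{y|x}^2}{\sum (x_i - \bar{x})^2}}$$
Thus in most practical cases, the $100(1 - \alpha)\%$ confidence interval for $\beta$ is given by
$$b \pm t_{\alpha/2}\, s_b \qquad (24)$$
where $t_{\alpha/2}$ has n − 2 degrees of freedom.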
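The text does not carry out interval (24) for Example (1). As an illustration, the sketch below does so, assuming b = 0.3978 and $s_b = 0.102$ from the t test and t = 2.306 for 8 degrees of freedom; the variable names are hypothetical.

```python
b = 0.3978      # estimated slope
s_b = 0.102     # estimated standard error of b
t_crit = 2.306  # t with 8 df, two-sided 95%

half_width = t_crit * s_b
lower, upper = b - half_width, b + half_width
print(f"95% CI for beta: ({lower:.4f}, {upper:.4f})")  # 95% CI for beta: (0.1626, 0.6330)

# An interval that excludes 0 corresponds to rejecting H0: beta = 0 at alpha = 0.05.
excludes_zero = not (lower <= 0 <= upper)
print(excludes_zero)  # True
```

The interval excludes 0, which agrees with the decision reached by the t test.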
1.5
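We can use the sample regression equation for two purposes. First, we can use it to predict the value that Y is likely to assume for a given X; if the assumptions of section 1.2 are met, we can construct a prediction interval for Y. Second, we can use it to estimate the mean of the subpopulation of Y values for a particular value of X. Again, if the assumptions of section 1.2 are met, we can construct a confidence interval for the mean.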
Predicting Y for a Given X
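We get a point prediction of the value Y is likely to assume for a given X by substituting a particular value of X, $x_p$, into the sample regression equation and solving for $\hat{y}$. If the assumptions of section 1.2 are met, and if $\sigma_{y|x}^2$ is unknown, the $100(1 - \alpha)\%$ prediction interval for Y is given by
$$\hat{y} \pm t_{\alpha/2}\, s_{y|x}\sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \qquad (25)$$
where we compute $\sum (x_i - \bar{x})^2$ by means of the formula
$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$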
In Example (1), we wish to predict the manufacturing expenses for a firm that produces 50,000 units. Substituting 50 for x in the sample regression equation gives
$$\hat{y} = 134.79 + 0.3978(50) = 154.68 \approx 155$$
Using expression (25) and the data from Tables 4 and 2, we construct the following 95% prediction interval:
$$155 \pm 2.306\sqrt{110.97}\sqrt{1 + \frac{1}{10} + \frac{(50 - 77.7)^2}{70{,}903 - (777)^2/10}} = 155 \pm 26, \quad \text{that is, } (129,\ 181)$$
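The 95% prediction interval can be reproduced as below. This sketch takes a, b, MSE (Table 4), and the Table (2) totals as given, but does not round $\hat{y}$ to 155 before adding the margin, so its endpoints differ from (129, 181) only through rounding. Variable names are illustrative.

```python
import math

a, b = 134.79, 0.3978         # least-squares line for Example (1)
mse = 110.97                  # s^2_{y|x} from Table 4
n, x_bar = 10, 77.7
s_xx = 70903 - 777 ** 2 / 10  # sum of (x_i - x_bar)^2
t_crit = 2.306                # t with 8 df, 95%

x_p = 50                      # 50,000 units of production
y_hat = a + b * x_p           # point prediction

# Equation (25): interval for a single new Y at x_p
margin = t_crit * math.sqrt(mse) * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / s_xx)
print(f"{y_hat:.2f} +/- {margin:.2f} -> ({y_hat - margin:.1f}, {y_hat + margin:.1f})")
# 154.68 +/- 26.31 -> (128.4, 181.0)
```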
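To estimate the mean of the subpopulation of Y values at $x_p$, we again use the point estimate $\hat{y}$. When $\sigma_{y|x}^2$ is unknown, the $100(1 - \alpha)\%$ confidence interval for this mean is
$$\hat{y} \pm t_{\alpha/2}\, s_{y|x}\sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \qquad (26)$$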
Suppose that, for the example of production and manufacturing expenses, we wish to estimate the mean of the subpopulation of Y values for firms that produce 50,000 units. We obtain the estimates as follows:
$$\hat{y} = 134.79 + 0.3978(50) \approx 155$$
$$155 \pm 2.306\sqrt{110.97}\sqrt{\frac{1}{10} + \frac{(50 - 77.7)^2}{70{,}903 - (777)^2/10}} = 155 \pm 10, \quad \text{that is, } (145,\ 165)$$
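Interval (26) is computed the same way as (25), with the leading 1 removed from under the square root. This sketch uses the rounded quantities of the text and does not round $\hat{y}$ to 155; the variable names are hypothetical.

```python
import math

a, b = 134.79, 0.3978         # least-squares line for Example (1)
mse = 110.97                  # s^2_{y|x} from Table 4
n, x_bar = 10, 77.7
s_xx = 70903 - 777 ** 2 / 10  # sum of (x_i - x_bar)^2
t_crit = 2.306                # t with 8 df, 95%

x_p = 50
y_hat = a + b * x_p           # estimate of the subpopulation mean at x_p

# Equation (26): no leading "1 +" term, because we estimate a mean, not a single Y
margin = t_crit * math.sqrt(mse) * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)
print(f"{y_hat:.2f} +/- {margin:.2f} -> ({y_hat - margin:.1f}, {y_hat + margin:.1f})")
# 154.68 +/- 10.10 -> (144.6, 164.8)
```

The margin here (about 10) is much smaller than the margin of about 26 for a single predicted Y, because interval (25) includes the extra 1 for the variability of an individual observation.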
Learning Objectives
Use the given formulas to compute a and b and fit the least-squares line.
Explain how to set confidence intervals and carry out tests about $\alpha$ and $\beta$ from a small collection of data.
Define the sample correlation coefficient, and link it to the appearance of scatter diagrams.
Construct and use an analysis-of-variance table for a regression, including the F test for $\beta = 0$.