STT041 STT041.1 2ND Set of Hand-Outs

VIII. STATISTICAL EXPERIMENT AND COUNTING TECHNIQUES
Statistical Experiment – is an experiment that generates data.

Examples: tossing a coin, tossing a die, drawing a ball from a box
Sample Space – the set of all possible outcomes of an experiment, denoted by S.

Example 8.1. In the experiment tossing a coin once, S = {H , T}.
Example 8.2. In the experiment tossing a coin twice, S = {HH , HT , TH , TT}.
Example 8.3. In the experiment tossing a die once, S = {1 , 2 , 3 , 4 , 5 , 6}.
Example 8.4. In the experiment tossing a die twice,
S = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
(2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
(3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
(4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
(5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
Sample Point – refers to each outcome in the sample space.
Event – a subset of the sample space, denoted by capital letters such as A, B, C, …, Z.
1. Simple event – a set containing only one element of the sample space.
Example 8.5. In tossing a coin twice, if A is the event of getting two heads, then A = {HH}.
2. Compound event – one that can be expressed as a union of simple events.

Example 8.6. In tossing a coin twice, if B is the event of getting at least one head, then B = {HT ,
TH , HH}.
Null or empty set – a set that contains no elements, denoted by Ø.

Example 8.7. In tossing a die twice, if C is the event of getting a total of 13 dots in the two
tosses, then C = { }.
Disjoint sets / mutually exclusive events – sets that have no elements in common.
Example 8.8. Consider the experiment tossing a die twice. Let A be the event of getting a
total of 4 dots and B be the event of getting a total of 3 dots. Then A = {(1,3) ,
(2,2) , (3,1)} and B = {(1,2) , (2,1)}. Events A and B are mutually exclusive
events.
In the next discussion, we will study how to count the total number of sample points in the
sample space without actually listing the elements.
Counting Principles:
A. Fundamental Principle of Counting (FPC)

If an operation can be performed in $n_1$ ways, and for each of these a second operation can be
done in $n_2$ ways, then the two operations can be performed together in $n_1 \times n_2$ ways.
Example 8.9. How many sample points are there in the sample space when a pair of dice is thrown
once?
Solution: The first die can land in any of 6 ways. For each of these 6 ways the second die can also
land in 6 ways. Therefore, the pair of dice can land in (6)(6) = 36 ways.
STT041/STT041.1 1ST SEMESTER, A.Y. 2020-2021
Example 8.10. How many sample points are in the sample space when a coin is tossed three times?
Solution: (2) (2) (2) = 8 sample points
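The two counts above can be confirmed by brute force: list every sample point and count them. The following is a minimal Python sketch (Python is our own choice; the hand-out itself uses no software) that uses itertools.product, which forms every ordered combination of the available choices, exactly as the FPC describes.

```python
from itertools import product

# Pair of dice: each sample point is an ordered pair (first die, second die).
dice_pairs = list(product(range(1, 7), repeat=2))

# Coin tossed three times: each sample point is a string of H's and T's.
three_tosses = ["".join(seq) for seq in product("HT", repeat=3)]

print(len(dice_pairs))    # 36 = 6 x 6
print(len(three_tosses))  # 8 = 2 x 2 x 2
print(three_tosses)       # ['HHH', 'HHT', 'HTH', 'HTT', 'THH', 'THT', 'TTH', 'TTT']
```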
B. Permutation – is an ordered arrangement of all or a part of n distinct objects.
Property 1: The number of permutations of n distinct objects taken all at a time is n!

Example 8.11. How many distinct permutations can be made from the letters of the word “LOVE”?
Solution: There are 4 letters that can be chosen for the first position, 3 letters for the second
position, 2 letters for the third position, and 1 letter for the last position. Hence, there are
n! = 4! = 4 x 3 x 2 x 1 = 24 distinct permutations that can be made from the letters of the word
“LOVE.” Some of these permutations are LOVE, LOEV, and LVEO.
Example 8.12. In how many ways can An, Joe, Al, Fe, and Cale be seated in a row of five chairs?
Solution: n! = 5! = 5 x 4 x 3 x 2 x 1 = 120 ways.

Property 2: The number of permutations of n distinct objects taken r at a time is $_nP_r = \frac{n!}{(n-r)!}$.
Example 8.13. Consider the 4 objects a, b, c, d. If we arrange these objects two at a time, how many
possible arrangements are there?
Solution: n = 4, r = 2: $_4P_2 = \frac{4!}{(4-2)!} = \frac{4!}{2!} = \frac{4 \times 3 \times 2!}{2!} = 12$ possible permutations or arrangements.
These are ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, and dc.
Example 8.14. If two prizes, a first prize and a second prize, will be awarded among 10 students, in
how many ways can this be done, assuming that no student wins more than one prize?
Answer: $_{10}P_2 = \frac{10!}{8!} = 10 \times 9 = 90$ ways
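Properties 1 and 2, together with Examples 8.11, 8.13 and 8.14, can be checked the same way. In the minimal Python sketch below, itertools.permutations lists the ordered arrangements and math.perm (available from Python 3.8) evaluates $_nP_r = n!/(n-r)!$ directly; the two counts should agree.

```python
from itertools import permutations
from math import factorial, perm  # math.perm requires Python 3.8+

# Property 1: n distinct objects taken all at a time -> n!
love = ["".join(p) for p in permutations("LOVE")]
print(len(love), factorial(4))  # 24 24
print(love[:3])                 # ['LOVE', 'LOEV', 'LVOE']

# Property 2: n distinct objects taken r at a time -> n!/(n - r)!
pairs = ["".join(p) for p in permutations("abcd", 2)]
print(len(pairs), perm(4, 2))   # 12 12
print(pairs)  # ['ab', 'ac', 'ad', 'ba', 'bc', 'bd', 'ca', 'cb', 'cd', 'da', 'db', 'dc']

# Example 8.14: first and second prize among 10 students -> 10P2
print(perm(10, 2))              # 90
```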
Property 3: The number of permutations of n distinct objects arranged in a circle is (n - 1)!.

Example 8.15. In how many ways can 6 different varieties of orchids be planted in a circle?
Solution: There are n = 6 varieties of orchids. The number of permutations is (6-1)! = 5! = 5 x 4 x 3 x 2 x 1 = 120.
That is, there are 120 ways of planting these orchids in a circle.
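One way to see why a circle gives (n - 1)! rather than n! is that the n rotations of any arrangement are treated as one. The sketch below is a brute-force illustration, not an efficient method: it rotates every permutation so that a fixed item comes first and counts the distinct results. The labels A to F for the orchid varieties are hypothetical, and, as in Property 3, an arrangement and its mirror image are counted as different.

```python
from itertools import permutations
from math import factorial

def circular_arrangements(items):
    """Count arrangements of distinct items around a circle, treating
    arrangements that differ only by a rotation as the same."""
    first = min(items)
    distinct = set()
    for p in permutations(items):
        k = p.index(first)
        # Rotate so the smallest item comes first; every rotation of the
        # same circular arrangement yields this same canonical tuple.
        distinct.add(p[k:] + p[:k])
    return len(distinct)

# Hypothetical labels for the six orchid varieties of Example 8.15
orchids = ["A", "B", "C", "D", "E", "F"]
print(circular_arrangements(orchids), factorial(len(orchids) - 1))  # 120 120
```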
Property 4: The number of distinct permutations of n things of which $n_1$ are of one kind (or alike),
$n_2$ of a second kind, . . ., $n_k$ of the kth kind, is given by $\frac{n!}{n_1!\,n_2!\cdots n_k!}$, where
$n = n_1 + n_2 + \cdots + n_k$.
Example 8.16. How many different ways can 3 red, 4 yellow, and 2 green bulbs be arranged in a
string of Christmas lights with 9 sockets?
Solution: The total number of distinct arrangements is $\frac{9!}{3!\,4!\,2!} = \frac{9 \times 8 \times 7 \times 6 \times 5 \times 4!}{3!\,4!\,2!} = 1{,}260$.
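Property 4 can be cross-checked by generating all 9! orderings of the bulbs and letting a set discard the duplicates that arise from swapping identical bulbs. This is a minimal Python sketch; the helper name distinct_arrangements and the letters R, Y, G for the bulb colors are our own illustrative choices.

```python
from itertools import permutations
from math import factorial

def distinct_arrangements(counts):
    """n! / (n1! n2! ... nk!) for groups of identical objects."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # each step divides exactly
    return result

# Example 8.16: 3 red (R), 4 yellow (Y), 2 green (G) bulbs in 9 sockets
bulbs = "RRRYYYYGG"
brute_force = len(set(permutations(bulbs)))  # identical orderings collapse in the set
print(distinct_arrangements([3, 4, 2]), brute_force)  # 1260 1260
```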
C. Combination – a selection of r objects from n distinct objects without regard to order.
The number of combinations of n distinct objects taken r at a time is $_nC_r = \frac{n!}{(n-r)!\,r!}$.
Example 8.17. How many different combinations can be formed with the letters a, b, c, and d if two
letters are taken at a time?
Solution: n = 4, r = 2: the total number of combinations is $_4C_2 = \frac{4!}{(4-2)!\,2!} = 6$. These are ab,
ac, ad, bc, bd, and cd.
Example 8.17. From four mathematicians and three statisticians, find the number of committees of
size three that can be formed with two mathematicians and one statistician.
Solution: The number $n_1$ of ways of selecting two mathematicians from a group of four is
$_4C_2 = \frac{4!}{(4-2)!\,2!} = \frac{4 \times 3 \times 2!}{2!\,2!} = 6$.
The number $n_2$ of ways of selecting one statistician from a group of three statisticians
is $_3C_1 = \frac{3!}{(3-1)!\,1!} = \frac{3 \times 2!}{2!} = 3$.
Using the fundamental principle of counting, the number of committees of size three that
can be formed with two mathematicians and one statistician is $n_1 \times n_2 = 6 \times 3 = 18$
committees.
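The combination formula and the committee count can be verified with itertools.combinations and math.comb (available from Python 3.8). The names M1 to M4 and S1 to S3 below are hypothetical labels for the mathematicians and statisticians.

```python
from itertools import combinations
from math import comb  # requires Python 3.8+

# Example 8.17: letters a, b, c, d taken two at a time, order ignored
print(comb(4, 2), ["".join(c) for c in combinations("abcd", 2)])
# 6 ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']

# Committees of 2 mathematicians (out of 4) and 1 statistician (out of 3);
# M1..M4 and S1..S3 are hypothetical labels.
mathematicians = ["M1", "M2", "M3", "M4"]
statisticians = ["S1", "S2", "S3"]
committees = [pair + (s,)
              for pair in combinations(mathematicians, 2)
              for s in statisticians]
print(len(committees), comb(4, 2) * comb(3, 1))  # 18 18
```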
Exercise 8.18.
1. If an experiment consists of throwing a die and then drawing a letter at random from the English
alphabet, how many points are there in the sample space?
2. An enrolling college student has the following subjects open to her: English 1 (section B or C),
Math 31 (section F, G or X), Filipino 1 (section T, U or Y), and Biology 1 (section D, H, V or W). If
she enrolls in all of these subjects, in how many ways can she arrange her schedule?
3. How many distinct permutations can be made from the letters of the word “columns”? How many
of these permutations start with the letter “n”?
4. In how many ways can 6 people be lined up to get on a bus?
5. How many ways can a caravan of 8 covered wagons be arranged in a circle?
6. From a group of four men and five women, how many committees of size three are possible
a) if there is no restriction?
b) if the committee must have one man and two women?
c) if the committee must have three men and no women?
7. Find the number of ways in which six teachers can be assigned to four sections of an introductory
statistics course if no teacher is assigned to more than one section.
IX. PROBABILITY
In the 17th century, two mathematicians, Blaise Pascal and Pierre de Fermat, carried on a
correspondence discussing mathematical problems dealing with games of chance. They were confronted
with questions raised by gamblers: How can the uncertainty of the outcome of a game played with coins,
dice, or a deck of cards be measured? What constitutes a fair bet? Attempts to answer these questions led to
the birth of probability theory.
Probability is a branch of mathematics that deals with measuring the likelihood that a given event will occur.
Definition: The probability of event A, written P(A), is the sum of the probabilities of all the
sample points in A. Hence, if A has sample points $a_1, a_2, a_3, \ldots, a_n$, then
$P(A) = P(\{a_1\}) + P(\{a_2\}) + P(\{a_3\}) + \cdots + P(\{a_n\})$
Properties of Probability
1. $0 \le P(A) \le 1$. (This means that the probability of an event lies between 0 and 1, inclusive.)
2. If A= Ø, then P(A) = 0. (This means that the probability that an impossible event will happen is
0.)
3. If A = S, then P(A) = P(S) = 1. (This means that the probability of a sure event is equal to 1. An
event is said to be a sure event when the event is equal to the
sample space.)
Approaches in Measuring Probability
1. Subjective Approach
- the probability depends on one’s personal assessment of how likely an event will occur
- it uses personal beliefs, intuition and other indirect information in arriving at probabilities
Example 9.1. A Lakers supporter might say, "I believe that the Lakers have a 0.9 probability of
winning the NBA this year since they have been playing really well."
2. Empirical Approach
- the probability of an event A can be approximated by the proportion of times that A
occurs when the experiment is repeated a very large number of times.
- it is expressed in terms of relative frequencies.
Example 9.2.
1. If a coin is tossed, what is the probability that it will turn up heads?
Solution: Suppose that the coin is tossed 100 times under the same conditions. Suppose further
that the coin falls heads 45 times out of 100. Then
$P(\text{getting a head}) = \frac{45}{100} = \frac{9}{20}$.
2. Consider the table below. Find the probability that the weight of a Math 31 student drawn at
random is between 54 and 60 kg.
Frequency Distribution Table of Weights of Math 31 Students
Class Interval (kg)   Class Boundaries   Frequency (f)   Class Mark
40 – 46               39.5 – 46.5        6               43
47 – 53               46.5 – 53.5        14              50
54 – 60               53.5 – 60.5        10              57
61 – 67               60.5 – 67.5        6               64
68 – 74               67.5 – 74.5        2               71
75 – 81               74.5 – 81.5        2               78
Total                                    40
Solution: Let A be the event that the selected student weighs between 54 and 60 kg. Then
$P(A) = \frac{10}{40} = \frac{1}{4}$.
3. Classical or Theoretical Approach
- probability is computed based on theoretical assumptions about the possible outcomes.
Suppose we can assume that all the possible outcomes of an experiment are equally
likely, that is, each has the same chance of occurring. Then the probability of an event A is
equal to the number of possible outcomes favorable to A, denoted by n(A), divided by
the total number of outcomes of the experiment, denoted by n(S). That is,
$P(A) = \frac{n(A)}{n(S)}$.
Example 9.3.
1. If a die is tossed, each of the 6 faces can be considered equally likely to come up. Thus, the
probability that any face, say the face with 4 dots, will come up is 1/6.
2. In drawing a letter from the English alphabet, the probability of selecting a vowel is
P(vowel) = 5/26.
3. In tossing a coin twice, what is the probability of getting exactly one head? Ans: 2/4 or ½
4. In tossing a die, what is the probability of getting an even number? Ans: 3/6 or ½
5. In tossing a die twice, what is the probability of getting a total of 5 dots? Ans: 4/36 or 1/9
6. In selecting a student from a class with 5 boys and 18 girls, what is the probability that a girl will
be selected? Ans: 18/23
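When the outcomes are equally likely, P(A) = n(A)/n(S) is just a count, so it can be computed by listing S and counting the favorable outcomes. The sketch below assumes equally likely outcomes (the key assumption of the classical approach) and uses fractions.Fraction to keep answers exact; classical_probability is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(A) = n(A) / n(S); valid only if all outcomes are equally likely."""
    n_a = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(n_a, len(sample_space))

two_coins = list(product("HT", repeat=2))
one_die = list(range(1, 7))
two_dice = list(product(range(1, 7), repeat=2))

print(classical_probability(two_coins, lambda o: o.count("H") == 1))  # 1/2
print(classical_probability(one_die, lambda x: x % 2 == 0))          # 1/2
print(classical_probability(two_dice, lambda o: sum(o) == 5))        # 1/9
```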
Some Laws of Probability
Law 1. If events A and B are mutually exclusive, then $P(A \cup B) = P(A) + P(B)$.
Mutually Exclusive Events – are events that do not have common sample points, that is,
$A \cap B = \emptyset$. In other words, the events cannot occur simultaneously.
The probability of occurrence of either A or B is the sum of their separate probabilities. In layman’s terms,
∪ means “or”, so this law is applicable in solving probabilities involving “either ... or”, “at least”, “at most”,
“greater than”, “less than”, etc., as long as the events being combined are mutually exclusive.
Example 9.4.
1. In the experiment of tossing a die, let E be the event that an even number of dots comes up and F
be the event that an odd number of dots comes up. Then
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}
F = {1, 3, 5}
$P(E \cup F) = P(E) + P(F) = 3/6 + 3/6 = 1$.
2. If a card is drawn from an ordinary deck of playing cards, what is the probability that it will be
either an ace or a queen?
Solution: Let A be the event of drawing an ace and Q be the event of drawing a queen. Then
P(A) = 4/52, P(Q) = 4/52, and since A and Q are mutually exclusive, $P(A \cup Q) = 4/52 + 4/52 = 8/52 = 2/13$.
Law 2. If A and B are any events, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
[Figure: Venn diagram of two overlapping events A and B]
In the Venn diagram above, $P(A \cup B)$ is the sum of the probabilities of the sample points in $A \cup B$.
Now, P(A) + P(B) is the sum of all probabilities of the points in A plus the sum of all probabilities of the
points in B. The points found in $A \cap B$ are counted twice; hence, $P(A \cap B)$ has to be subtracted from P(A) + P(B).
Thus, if A and B are any events, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Note: In layman’s terms, ∩ means “both” or “common” to the given events.
Example 9.5.
1. If a card is drawn from an ordinary deck of playing cards, what is the probability that it will be
either a heart or a king?
Solution: P(heart) = 13/52, P(king) = 4/52, P(heart and king) = $P(\text{heart} \cap \text{king})$ = 1/52.
Hence, P(heart or king) = $P(\text{heart} \cup \text{king})$ = 13/52 + 4/52 – 1/52 = 16/52 = 4/13.
2. In tossing a coin twice, what is the probability of getting at least one head?
Solution: P(exactly one head) = 2/4, P(two heads) = 1/4. Since these two events are mutually exclusive,
P(at least one head) = P(exactly one head or two heads) = $P(\text{one head} \cup \text{two heads})$ = 2/4 + 1/4 = 3/4.
3. What is the probability of obtaining less than 5 dots when an ordinary die is tossed?
Ans: 4/6
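Laws 1 and 2 can be checked on a full deck of 52 cards. The minimal Python sketch below represents each card as a (rank, suit) pair, an encoding chosen only for illustration, and compares the probability obtained by direct counting with the value given by each law.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely (rank, suit) cards

def prob(event):
    """Classical probability of an event given as a predicate on one card."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_king(card):
    return card[0] == "K"

def is_ace(card):
    return card[0] == "A"

def is_queen(card):
    return card[0] == "Q"

# Law 2 (any events): P(heart or king) = P(heart) + P(king) - P(heart and king)
direct = prob(lambda c: is_heart(c) or is_king(c))
by_law2 = prob(is_heart) + prob(is_king) - prob(lambda c: is_heart(c) and is_king(c))
print(direct, by_law2)  # 4/13 4/13

# Law 1 (mutually exclusive events): P(ace or queen) = P(ace) + P(queen)
print(prob(lambda c: is_ace(c) or is_queen(c)), prob(is_ace) + prob(is_queen))  # 2/13 2/13
```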
Law 3. If A is an event, then P(A’) = 1 - P(A).
The complement of an event A, denoted by A’, is the set of all points in the sample space S but not in A.
Example 9.6.
1. When a pair of dice is tossed, what is the probability that the sum of points will not be 5?
Solution: P(not a 5) = 1 – P(5) = 1 – 4/36 = 32/36 = 8/9
2. If the probability that it will rain tomorrow is 0.7, what is the probability that it will not rain?
Ans.: 0.3
3. If the probability of passing an exam is 0.4, what is the probability of failing?

Ans.: 0.6
Exercise 9.7.
1. A study is designed to investigate weight and smoking habits of patients with hypertension. Here S
represents all patients with hypertension. Let A1 denote those patients who are overweight and A2 those
that smoke. In a Venn diagram, shade the region representing the following events:
a. patients who are overweight but who do not smoke;
b. patients who smoke but are not overweight;
c. patients who smoke and are overweight;
d. patients who neither smoke nor are overweight.
2. Suppose in #1 that 68% are overweight, 55% smoke, and 30% are overweight and also smoke.
Plot these probabilities in your Venn diagram. (Note that the percentages over the whole Venn diagram,
including the region outside both A1 and A2, add up to 100%.) Then compute the percentages of patients
described in #1(a) to (d).
3. a. Suppose a coin is tossed three times. What is the probability of getting 2 heads?
b. Suppose the experiment is performed 100 times and the frequency for each outcome that turned
up are as follows:
Outcome HHH HHT HTH HTT THH THT TTH TTT

Frequency 11 5 13 9 17 15 18 12
What is the probability of getting 2 heads?
4. If A and B are mutually exclusive events, P(A) = 0.3, and P(B) = 0.5, find
a. $P(A \cup B)$   b. $P(A')$   c. $P(A' \cap B)$
Hint: Construct a Venn diagram and fill in the probabilities associated with the regions.
5. A pair of dice is tossed. Find the probability of getting a total of 8 dots.
6. In a college graduating class of 100 students, 54 studied mathematics, 69 studied history and 35
studied both mathematics and history. If one of these students is selected at random, find the
probability that
a. the student takes mathematics or history;
b. the student takes neither of these subjects;
c. the student takes history but not mathematics.
7. A survey of a class of 35 students in a business school showed the following selection of majors:
Accounting 10
Finance 5
Info Tech 3
Management 6
Marketing 10
Suppose you select a student and observe his or her major, what is the probability that he or she is a
Management major?
8. The distribution of blood types among the whites in the US is roughly as follows: type A – 40%, type
B – 11%, type AB – 4%, type O – 45%. A white man is brought into the emergency room after an
automobile accident. He is to be blood typed. What is the probability that he will be of type A, B, or
AB?
9. Suppose the probability of “curing” childhood leukemia is 1/3. What is the probability that the disease
will not be cured?
10. Trees on a certain mountainous area have been found to be affected by pollution. Suppose that 40%
of the evergreens show mild damage, 15% show moderate damage, 10% are severely damaged, 8%
are dead, and the rest are unaffected. If a tree is selected at random for study, what is the probability
that it is
a. unaffected?
b. at most mildly damaged?
c. severely damaged or dead?
d. neither severely damaged nor dead?
Conditional Probability
Definition. Let A1 and A2 be events such that P(A1) > 0. The conditional probability of A2 given that A1
has occurred, denoted by P(A2 | A1), is defined by
$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} = \frac{\text{probability of occurrence of both events}}{\text{probability of the given event}}$.
Exercise 9.8. It is estimated that 15% of the adult population has hypertension, but 75% of all adults feel
that they do not have this problem. It is also estimated that 6% of the population has
hypertension but do not think that they have the disorder.
a. If an adult patient reports thinking that he or she does not have hypertension, what is the
probability that the patient does, in fact, have it?
b. If the disorder is present, what is the probability that the patient will suspect its
presence?
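Solution: Let A be the event that an adult has hypertension and B be the event that an adult feels that he or
she does not have hypertension. Then P(A) = 15%, P(B) = 75%, and $P(A \cap B)$ = 6%.
a. $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{6\%}{75\%} = 8\%$.
b. Since $P(A \cap B') = P(A) - P(A \cap B) = 15\% - 6\% = 9\%$,
$P(B' \mid A) = \frac{P(A \cap B')}{P(A)} = \frac{9\%}{15\%} = 60\%$.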
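The arithmetic of Exercise 9.8 can be reproduced with exact fractions. In this sketch the variable names and the helper conditional are illustrative; the guard reflects the requirement P(A1) > 0 in the definition of conditional probability.

```python
from fractions import Fraction

# Exercise 9.8, with the percentages written as exact fractions
p_A = Fraction(15, 100)       # A: adult has hypertension
p_B = Fraction(75, 100)       # B: adult feels he or she does not have it
p_A_and_B = Fraction(6, 100)  # has hypertension but does not think so

def conditional(p_both, p_given):
    """P(A2 | A1) = P(A1 and A2) / P(A1), defined only when P(A1) > 0."""
    if p_given <= 0:
        raise ValueError("the given event must have positive probability")
    return p_both / p_given

p_A_given_B = conditional(p_A_and_B, p_B)        # part a
p_A_and_notB = p_A - p_A_and_B                   # 15% - 6% = 9%
p_notB_given_A = conditional(p_A_and_notB, p_A)  # part b

print(f"{float(p_A_given_B):.0%} {float(p_notB_given_A):.0%}")  # 8% 60%
```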
Definition. Two events are independent if either P(A | B) = P(A) or P(B | A) = P(B). Otherwise, A and
B are dependent.
Example 9.9. Suppose two cards are drawn in succession from an ordinary deck with replacement.
a. Let A be the event that the first card is an ace and B be the event that the second card is a
spade. Events A and B are not mutually exclusive. Are they independent events?
b. Let A be the event that the first card is a diamond and B the event that the second card is a
spade. (No single card is both a diamond and a spade, but A and B refer to different draws, so both
events can occur together.) Are they independent events?
Solution: Since the experiment is done with replacement the sample space for both the first and second
draws consists of 52 cards with 4 aces and 13 spades. Hence,
a. P(B | A) = 13/52 and P(B) = 13/52. That is, P(B | A) = P(B) and so events A and B are
independent.
b. P(B | A) = 13/52 and P(B) = 13/52 . Again, A and B are independent events.
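Independence can also be checked by enumerating all 52 x 52 = 2,704 ordered pairs of draws made with replacement, each assumed equally likely. The sketch below is a brute-force illustration under that assumption; the (rank, suit) card encoding and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))
# With replacement, every ordered pair (first card, second card) is equally likely.
TWO_DRAWS = list(product(DECK, repeat=2))  # 2704 outcomes

def prob(event):
    return Fraction(sum(1 for draw in TWO_DRAWS if event(draw)), len(TWO_DRAWS))

def conditional(b, a):
    """P(B | A) = P(A and B) / P(A)."""
    return prob(lambda d: a(d) and b(d)) / prob(a)

def first_is_ace(draw):
    return draw[0][0] == "A"

def first_is_diamond(draw):
    return draw[0][1] == "diamonds"

def second_is_spade(draw):
    return draw[1][1] == "spades"

print(conditional(second_is_spade, first_is_ace), prob(second_is_spade))      # 1/4 1/4
print(conditional(second_is_spade, first_is_diamond), prob(second_is_spade))  # 1/4 1/4
# In part b, A and B can occur together: P(A and B) > 0
print(prob(lambda d: first_is_diamond(d) and second_is_spade(d)))            # 1/16
```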
Exercise 9.10.
1. Suppose a family has four children.
a. Find the probability that exactly two are male.
b. What is the probability that exactly two are male if the first child born is male?
c. Find the probability that the last child born is a male.
d. What is the probability that the last child born is male if the first three are female?
2. In a study of alcoholics, it was found that 40% had alcoholic fathers and 6% had alcoholic
mothers. Forty-two percent had at least one alcoholic parent. What is the probability that a
randomly selected alcoholic will have
a. both parents alcoholic?
b. an alcoholic mother if the father is alcoholic?
c. an alcoholic mother but not an alcoholic father?
d. an alcoholic mother if the father is not alcoholic?
Multiplicative Rules
Theorem. (Multiplicative Rule) If in an experiment the events A and B can both occur, then
$P(A \cap B) = P(A)\,P(B \mid A)$ or $P(A \cap B) = P(B)\,P(A \mid B)$.
If events A and B are independent, then $P(A \cap B) = P(A)\,P(B)$.
Exercise 9.11.
1. In the previous example on hypertension, what percentage of the population has hypertension and
knows about its presence?
2. If we choose three pregnant women from the population, what is the probability that all three children
born will be girls?
3. Suppose a box contains 20 fuses, five of which are defective. If 2 fuses are selected at random
and removed from the box in succession without replacement, what is the probability that both
fuses are defective?
Theorem. (Generalized Multiplicative Rule) If in an experiment the events A1, A2, . . ., Ak can occur, then
$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_k \mid A_1 \cap A_2 \cap \cdots \cap A_{k-1})$.
If the events A1, A2, . . ., Ak are independent, then
$P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\,P(A_2)\,P(A_3) \cdots P(A_k)$.
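As an illustration of the generalized rule that is not one of the exercises, consider drawing three cards in succession without replacement and asking for the probability that all three are aces. The rule gives (4/52)(3/51)(2/50). The sketch below compares that product with a brute-force count over all ordered draws; numbering the cards 0 to 51, with 0 to 3 as the aces, is an assumption made only for this check.

```python
from fractions import Fraction
from itertools import permutations

# Generalized multiplicative rule: P(A1) P(A2 | A1) P(A3 | A1 and A2)
by_rule = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Brute-force check. Hypothetical encoding: cards are 0..51, aces are 0..3.
ACES = {0, 1, 2, 3}
hits = total = 0
for draw in permutations(range(52), 3):  # ordered draws, no card repeated
    total += 1
    if ACES.issuperset(draw):
        hits += 1

print(by_rule, Fraction(hits, total))  # 1/5525 1/5525
```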
Exercise 9.12.
1. The probability that a person visiting his dentist will have an X-ray is 0.6; the probability that a
person who has an X-ray will also have a cavity filled is 0.3; and the probability that a person who
has had an X-ray and a cavity filled will also have a tooth extracted is 0.1. What is the probability
that a person visiting his dentist will have an X-ray, a cavity filled and a tooth extracted?
2. Three cards are drawn in succession from an ordinary deck of playing cards. Find the probability that
the first card is a red ace, the second card is a ten or jack, and the third card is greater than 3 but less
than 7 if the cards are drawn
a) without replacement;
b) with replacement.
Exercise 9.13.
1. The probability that a doctor correctly diagnoses a particular illness is 0.7. Given that a doctor makes
an incorrect diagnosis, the probability that the patient enters a lawsuit is 0.9. What is the probability
that the doctor makes an incorrect diagnosis and the patient sues?
2. A small town has one fire engine and one ambulance available for emergencies. The probability that
the fire engine is available when needed is 0.98, and the probability that the ambulance is available
when called is 0.92. In the event of an injury resulting from a burning building, find the probability
that both the ambulance and the fire engine will be available.
X. RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Definition:
Variable of a study – a characteristic that we observe or measure in a research or survey
Random variable – a function or rule that assigns a real number to each element in the sample space
Note: We use a capital letter to denote a random variable and the corresponding small letter
for a specific value of the random variable.
Example 10.1. In tossing a coin two times, let X represent the number of times a head appears. What are
the possible values of the random variable X?
Solution:
sample points X
HH 2
HT 1
TH 1
TT 0
Thus, the possible values of the random variable X are 0, 1, or 2.
Two Types of Random Variables:

1. Discrete Random Variable – can take on a finite or countable number of possible values.
Example 10.2.
a. gender (0 – female, 1 – male)
b. marital status (0 – single, 1 – married, 2 – divorced/separated, 3 – widowed)
c. number of heads in tossing a coin n times (0, 1, 2, …, n)
d. number of children in a family (0,1, 2, 3, …, n)
e. number of tosses of a coin until a head appears (1, 2, 3, …)
2. Continuous Random Variable – can take on any value in a given interval.

Example 10.3.
a.) height of an MSU-Marawi College student – can be anywhere from 4 ft. to 7 ft.
b.) weight of an MSU-Marawi College student – can be anywhere from 30 kg to 100 kg
Probability Distribution – describes the probability of occurrence for the different possible values of a
random variable.
Two Types of Probability Distributions:

1. Discrete Probability Distribution – a table or a formula listing all the possible values of a random
variable and their corresponding probabilities.
Example 10.4. Construct the probability distribution of X representing the number of heads in
tossing a coin twice.
Solution: The possible values of the random variable X: number of heads in tossing a coin twice
are given in example 10.1 above. In tabular form, the probability distribution of X is as follows:
X 0 1 2 Total
P(X = x) 1/4 2/4 1/4 4/4 = 1
The probability distribution of a random variable can also be shown by means of a graph
called probability histogram. The graph below shows the probability histogram of the random
variable X, representing the number of heads in tossing a coin twice.
[Figure: probability histogram of X; horizontal axis x = 0, 1, 2; vertical axis P(X = x) from 0 to 0.5]
The area of each bar (of width 1) represents the probability that X takes the value under that bar.
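The probability distribution in Example 10.4 can be built mechanically from the sample space: apply X to every sample point, count how often each value occurs, and divide by n(S). The sketch assumes equally likely sample points; the helper name distribution is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def distribution(sample_space, random_variable):
    """Return {x: P(X = x)}, assuming every sample point is equally likely."""
    counts = Counter(random_variable(point) for point in sample_space)
    n = len(sample_space)
    return {x: Fraction(c, n) for x, c in sorted(counts.items())}

tosses = ["".join(seq) for seq in product("HT", repeat=2)]  # HH, HT, TH, TT
dist = distribution(tosses, lambda s: s.count("H"))          # X = number of heads

print(dist)                # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(sum(dist.values()))  # 1
```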
2. Continuous Probability Distribution – identifies the range for all the possible values of the
random variable and their corresponding probabilities
Note: If X is continuous, the probability that X takes any single exact value is zero, that is, P(X = x) = 0. Hence,
i.) we only identify the probability of X over a range of values of X, i.e. P(a < X < b), P(X > a)
or P(X < a).
ii.) $P(a \le X \le b) = P(a < X < b)$
Example 10.5. The table below shows the empirical probability distribution of the weight of
packages of a mixture of vegetables:
Weight (in kilos)          Probability of occurrence
4.5-4.9 (underweight)      0.025
5.0-5.4 (satisfactory)     0.900
5.5-5.9 (overweight)       0.075
Total                      1.000
Example 10.6. The table below shows the relative frequency (or empirical probability)
distribution of the weights of Math 31 students.
Weights (kg)   Class Boundaries   Frequency   Relative Frequency
40 – 46        39.5 – 46.5        6           0.15
47 – 53        46.5 – 53.5        14          0.35
54 – 60        53.5 – 60.5        10          0.25
61 – 67        60.5 – 67.5        6           0.15
68 – 74        67.5 – 74.5        2           0.05
75 – 81        74.5 – 81.5        2           0.05
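The relative frequencies in Example 10.6 are simply f/n with n = 40. A short Python sketch of that computation, with the class labels and frequencies copied from the table:

```python
# Frequencies from the Math 31 weight table (Example 10.6)
classes = ["40 - 46", "47 - 53", "54 - 60", "61 - 67", "68 - 74", "75 - 81"]
freqs = [6, 14, 10, 6, 2, 2]

n = sum(freqs)                      # number of students
rel_freqs = [f / n for f in freqs]  # empirical probability = f / n

for label, f, r in zip(classes, freqs, rel_freqs):
    print(f"{label}   {f:>2}   {r:.3f}")
print(f"Total     {n:>2}   {sum(rel_freqs):.3f}")
```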
Creating a classical probability distribution from an empirical probability distribution of a continuous
random variable:
1. Construct a frequency histogram of X. When the sample size is large, the empirical probability
distribution estimates the classical probability distribution fairly well.
2. Find a function that would approximately follow the shape of the probability histogram. This
function is called the probability density function. Then express the classical probability
distribution in terms of area under the probability density function.
Properties of a probability density function:
1. The function lies on or above the x-axis.
2. The total area under its curve and above the x-axis is equal to 1.
3. The area under the curve between x = a and x = b is equal to $P(a < X < b)$.
Remarks. If the frequency histogram of the weights can be considered as symmetric and bell-shaped, then
we can consider its classical probability distribution to follow a normal probability distribution.
Exercise 10.7.
1. Classify the following random variables as discrete or continuous:
a. the number of automobile accidents each year in Marawi
b. the length of time to play 18 holes of golf
c. the amount of milk produced yearly by a particular cow
d. the number of eggs laid each month by 1 hen
2. A coin is flipped until 3 heads in succession occur. List only those elements of the sample space
that require 6 or fewer tosses. Is this a discrete sample space? Explain.
3. From a box containing 4 black balls and 2 green balls, 3 balls are drawn in succession, each ball
being replaced in the box before the next draw is made. Find the probability distribution for the
number of green balls.
4. Consider the experiment of tossing a die twice. Let X represent the total number of dots that will
appear. Find the probability distribution of X.
5. Suppose we toss a coin 3 times and let X represent the number of heads that will appear. Find the
probability distribution of X.
6. The frequency distribution of the lifetimes (in years) of a random sample of 40 car batteries is as
follows:
Lifetime (in years)   Frequency
1.5 – 1.9             2
2.0 – 2.4             1
2.5 – 2.9             4
3.0 – 3.4             15
3.5 – 3.9             10
4.0 – 4.4             5
4.5 – 4.9             3
Find an empirical probability distribution of the car battery lifetimes based on this result.
7. A continuous random variable X that can assume values between x=1 and x=4 has a density
function given by f(x) = 1/3.
a.) Show that the area under the curve is equal to 1.
b.) Find P(1.5 < X < 3).
c.) Find P(X ≥ 2.2).
Mean and Variance of a Random Variable
We have learned in our previous discussion how to find the mean and variance of interval or ratio
scaled variables. In this section, we will discuss how to find the mean and variance of random variables
when their probability distributions are known. These two values are very important in statistical inference.
Definition. Mean of a Random Variable
Let X be a discrete random variable with the probability distribution
x          x1      x2      ...   xn
P(X = x)   f(x1)   f(x2)   ...   f(xn)
The mean or expected value of X is
$\mu = E(X) = \sum_{i=1}^{n} x_i f(x_i)$
Remarks:
1. The mean of a random variable X is also the mean of the distribution of the random variable.
2. The mean or expected value of a random variable can be interpreted as the mean of the
population or distribution whose observations are all the values of X, i.e., $\mu = \mu_X = E(X)$.
3. The mean of a random variable is the mean value of the random variable that we expect to get
if we repeat the experiment over and over again indefinitely.
Example 10.8. Find the mean of a random variable X, where X represents the number of heads that
occur when 3 coins are tossed once.
Solution:
The table below shows the probability distribution for the random variable X representing the
number of heads that occur when 3 coins are tossed once.
x 0 1 2 3
P(X = x) 1/8 3/8 3/8 1/8
Therefore, $\mu = E(X) = \sum_{i=1}^{4} x_i f(x_i)$ = (0)(1/8) + (1)(3/8) + (2)(3/8) + (3)(1/8)
= 0 + 3/8 + 6/8 + 3/8
= 1.5
This means that a person who tosses 3 coins over and over again will, on the average, get 1.5
heads per toss of the 3 coins. Note that 1.5 is an average value and not necessarily a possible outcome of the
experiment.
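The expected value is a probability-weighted sum, which translates directly into code. This minimal sketch stores a discrete distribution as a dictionary from values to exact fractions (our own representation, not a standard one) and reproduces μ = 1.5 for Example 10.8.

```python
from fractions import Fraction

def expected_value(dist):
    """mu = E(X) = sum of x_i f(x_i) for a distribution given as {x: P(X = x)}."""
    return sum(x * p for x, p in dist.items())

# Number of heads when 3 coins are tossed once (Example 10.8)
three_coins = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# The probabilities of a distribution must add up to 1.
assert sum(three_coins.values()) == 1

mu = expected_value(three_coins)
print(mu, float(mu))  # 3/2 1.5
```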
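Definition. Mean of a Function of One Random Variable
Let X be a discrete random variable with the probability distribution
x          x1      x2      ...   xn
P(X = x)   f(x1)   f(x2)   ...   f(xn)
The mean or expected value of the random variable g(X) is
$\mu_{g(X)} = E[g(X)] = \sum_{i=1}^{n} g(x_i) f(x_i)$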
Example 10.9. Suppose the number of cars, X, that pass through a car wash between 4:00 p.m. and 5:00
p.m. on any sunny Friday has the following probability distribution:
x          4      5      6     7     8     9
P(X = x)   1/12   1/12   1/4   1/4   1/6   1/6
Let g(X) = 2X – 1 represent the amount of money, in dollars, paid to the attendant by the
manager. Find the attendant’s expected earnings for this particular period.
Solution:
$\mu_{g(X)} = E[g(X)] = E[2X - 1] = \sum_{i=1}^{6} (2x_i - 1) f(x_i)$
= [2(4) – 1](1/12) + [2(5) – 1](1/12) + [2(6) – 1](1/4) + [2(7) – 1](1/4) +
[2(8) – 1](1/6) + [2(9) – 1](1/6)
= 38/3 ≈ $12.67
Therefore, for the period from 4:00 p.m. to 5:00 p.m., the attendant will, on the average, earn $12.67.
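E[g(X)] is computed like E(X), except that each value is passed through g before it is weighted. The self-contained sketch below (expected_value is a hypothetical helper name) repeats the car-wash calculation of Example 10.9 with exact fractions and also checks that the answer equals 2μ_X - 1.

```python
from fractions import Fraction

def expected_value(dist, g=lambda x: x):
    """E[g(X)] = sum of g(x_i) f(x_i); with the default g this is just E(X)."""
    return sum(g(x) * p for x, p in dist.items())

# Cars passing through the car wash between 4:00 and 5:00 p.m. (Example 10.9)
car_wash = {4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 4),
            7: Fraction(1, 4), 8: Fraction(1, 6), 9: Fraction(1, 6)}

earnings = expected_value(car_wash, lambda x: 2 * x - 1)  # g(X) = 2X - 1 dollars
mu_x = expected_value(car_wash)

print(earnings, round(float(earnings), 2))  # 38/3 12.67
print(2 * mu_x - 1 == earnings)             # True
```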
Properties of the Mean of a Random Variable
1. If a and b are constants, then $\mu_{aX+b} = a\mu_X + b = a\mu + b$.
Remarks: 1. Setting a = 0, we see that $\mu_b = b$.
2. Setting b = 0, we see that $\mu_{aX} = a\mu$.
2. The mean of the sum or difference of two or more random variables is equal to the sum or difference
of the means of the variables. That is,
$\mu_{X+Y} = \mu_X + \mu_Y$ and $\mu_{X-Y} = \mu_X - \mu_Y$.
3. The mean of the product of two or more independent random variables is equal to the product of the means of
the variables. Therefore, if X and Y are independent,
$\mu_{XY} = \mu_X \mu_Y$.
Example 10.10. Applying property 1 to Example 10.9 above, we have
$\mu_{g(X)} = \mu_{2X-1} = 2\mu_X - 1$,
where $\mu_X = \sum_{i=1}^{6} x_i f(x_i)$ = (4)(1/12) + (5)(1/12) + (6)(1/4) + (7)(1/4) + (8)(1/6) + (9)(1/6)
= 1/3 + 5/12 + 3/2 + 7/4 + 4/3 + 3/2
= 41/6.
Therefore, $\mu_{2X-1} = 2\mu_X - 1 = (2)(41/6) - 1 = 38/3 \approx 12.67$ dollars.
Exercise 10.11. The probability distribution of the discrete random variable X is
$f(x) = \binom{3}{x}\left(\frac{1}{4}\right)^{x}\left(\frac{3}{4}\right)^{3-x}$ for x = 0, 1, 2, 3.
a. Find the mean of X.
b. Find the mean of g(X), where g(X) = 7 – 4X.
c. Find the mean of g(X), where $g(X) = 3X^2 + 8$.
d. Find the mean of g(X), where $g(X) = (2X + 5)^2$.
Definition. Variance of a Random Variable
Let X be a discrete random variable with the probability distribution
x          x1      x2      ...   xn
P(X = x)   f(x1)   f(x2)   ...   f(xn)
The variance of X is
$\sigma^2 = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$
Computing Formula: $\sigma^2 = E(X^2) - \mu^2$.
Remarks:
1. The variance of the random variable X is also the variance of the population whose values
are the values assumed by the random variable X, when the experiment is repeated over
and over again indefinitely.
2. We denote the variance of the random variable X as $\sigma_X^2$ or simply $\sigma^2$.
Example 10.12. Calculate the variance of the random variable X in Example 10.8.
Solution:
Let X = number of heads that occur when 3 coins are tossed once
In Example 10.8 we showed that μ = 1.5. So now, we solve for $E(X^2)$.
$$E(X^2) = \sum_{i=1}^{4} x_i^2 f(x_i) = (0^2)(1/8) + (1^2)(3/8) + (2^2)(3/8) + (3^2)(1/8)$$
$$= 0 + 3/8 + 3/2 + 9/8 = 3.$$
Therefore, $\sigma^2 = E(X^2) - \mu^2 = 3 - (1.5)^2 = 0.75$.
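The sketch below (our own illustration, not from the hand-outs) evaluates both the computing formula and the definitional formula for the three-coin variable of Example 10.12 with exact fractions, so the two can be compared side by side.

```python
from fractions import Fraction as F

# X = number of heads when 3 coins are tossed (Example 10.8)
dist = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}

mu = sum(x * p for x, p in dist.items())                        # E(X)
ex2 = sum(x**2 * p for x, p in dist.items())                    # E(X^2)
var_shortcut = ex2 - mu**2                                      # computing formula
var_definition = sum((x - mu) ** 2 * p for x, p in dist.items())  # E[(X - mu)^2]

print(mu, ex2, var_shortcut, var_definition)
```

Exact fractions avoid the rounding that can creep in when $\mu^2$ is subtracted from $E(X^2)$ in decimals.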
Exercise 10.13. Verify the result using the definitional formula
$$\sigma^2 = E[(X - \mu)^2] = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i).$$
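Definition. Variance of a Function of One Random Variable
Let X be a discrete random variable with probability distribution
$$\begin{array}{c|cccc} x & x_1 & x_2 & \cdots & x_n \\ \hline P(X = x) & f(x_1) & f(x_2) & \cdots & f(x_n) \end{array}$$
The variance of the random variable g(X) is
$$\sigma_{g(X)}^2 = E[(g(X) - \mu_{g(X)})^2] = \sum_{i=1}^{n} (g(x_i) - \mu_{g(X)})^2 f(x_i).$$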
Exercise 10.14. Calculate the variance of g(X) = 2X – 1, where X is the random variable defined in
Example 10.9.
Properties of the Variance of a Random Variable
1. If X is a random variable and b is a constant, then $\sigma_{X+b}^2 = \sigma_X^2 = \sigma^2$.
2. If X is a random variable and a is any constant, then $\sigma_{aX}^2 = a^2\sigma_X^2 = a^2\sigma^2$.
3. The variance of the sum or difference of two or more independent random variables is equal to the
sum of the variances of the variables. That is, if X and Y are independent,
$\sigma_{X+Y}^2 = \sigma_X^2 + \sigma_Y^2$ and $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$.
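The three properties can be checked exactly by building the distribution of the transformed or combined variable and computing its variance from scratch. In this sketch the helpers `transform` and `combine` are our own names; `combine` assumes X and Y are independent, so that f(x, y) = f(x) f(y). We reuse the three-coin variable of Example 10.12 ($\sigma^2 = 3/4$) and an independent copy of it.

```python
from fractions import Fraction as F
from itertools import product

def mean(dist):
    return sum(x * p for x, p in dist.items())

def var(dist):
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, g):
    """Distribution of g(X): add up the probabilities of x values with the same g(x)."""
    out = {}
    for x, p in dist.items():
        y = g(x)
        out[y] = out.get(y, 0) + p
    return out

def combine(dx, dy, op):
    """Distribution of op(X, Y) for INDEPENDENT X and Y, using f(x, y) = f(x) f(y)."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        z = op(x, y)
        out[z] = out.get(z, 0) + px * py
    return out

X = {0: F(1, 8), 1: F(3, 8), 2: F(3, 8), 3: F(1, 8)}   # heads in 3 coin tosses
Y = dict(X)                                           # independent copy, same distribution

print(var(transform(X, lambda x: x + 5)))          # property 1: 3/4
print(var(transform(X, lambda x: 3 * x)))          # property 2: 9 * 3/4 = 27/4
print(var(combine(X, Y, lambda a, b: a + b)))      # property 3: 3/4 + 3/4 = 3/2
print(var(combine(X, Y, lambda a, b: a - b)))      # property 3: also 3/2
```

Note that the difference X − Y has the same variance as the sum, as property 3 states.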
Exercise 10.15. Let X be the number of heads that occur when a coin is tossed twice, and
Y be the outcome when a die is tossed once. X and Y are independent random variables. Find the
variance of the random variable Z = 2X – 7Y + 4. (Apply appropriate properties of the variance
of a random variable.)
Exercise 10.16.
1. The probability distribution of the discrete random variable X is
$$f(x) = \binom{3}{x}\left(\frac{1}{4}\right)^{x}\left(\frac{3}{4}\right)^{3-x}, \quad x = 0, 1, 2, 3.$$
a. Find the variance of X.
b. Find the standard deviation of g(X), where g(X) = 7 – 4X.
c. Find the variance of g(X), where $g(X) = 3X^2 + 8$.
d. Find the standard deviation of g(X), where $g(X) = (2X + 5)^2$.
2. Let X represent the number that occurs when a green die is tossed and Y the number that occurs when
a red die is tossed. Find the variance of the random variable
a. 2X – Y
b. X + 3Y – 5
XI. SOME DISCRETE PROBABILITY DISTRIBUTIONS
Below are some mathematical models for the probability distributions of certain random variables.
Understanding these models will help in calculating the probabilities of some events.
A. Uniform Distribution
Example 11.1. Consider the experiment of rolling a die and let the random variable X represent the number
that comes up. Each possible value of the random variable has the same chance of occurring, so we can
express the probability distribution of X by the formula
$$f(x;6) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6.$$
Example 11.2. Suppose we spin a roulette wheel with 25 equal divisions. If we let X represent the
number on the wheel where it stops, then the probability distribution of X is given by
$$f(x;25) = \frac{1}{25}, \quad x = 1, 2, \ldots, 25.$$
Example 11.3. If we select a subset of size n from a finite sample space of size N, then there are $k = \binom{N}{n}$
possible subsets that can be selected, so that
$$f(x;k) = \frac{1}{\binom{N}{n}}, \quad x = 1, 2, \ldots, \binom{N}{n} = k.$$
Suppose we consider a group of 4 students and we select two students at random. Denote the students as
A, B, C, and D. Then the possible selections would be 1–AB, 2–AC, 3–AD, 4–BC, 5–BD, and 6–CD.
Thus, there are $\binom{4}{2} = 6$ possible selections that we can make, so that
$$f(x;k) = \frac{1}{\binom{4}{2}} = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6.$$
Definition. If the random variable X assumes the values $x_1, x_2, \ldots, x_k$ with equal probabilities, then
the discrete uniform distribution is given by
$$f(x;k) = \frac{1}{k}, \quad x = x_1, x_2, \ldots, x_k.$$
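The student-selection part of Example 11.3 can be reproduced by listing the selections with `itertools.combinations` and giving each one probability 1/k. This is an illustrative sketch of our own; the list `students` simply encodes the labels A to D from the example.

```python
from fractions import Fraction as F
from itertools import combinations
from math import comb

students = ["A", "B", "C", "D"]
selections = ["".join(pair) for pair in combinations(students, 2)]  # AB, AC, AD, BC, BD, CD
k = len(selections)
assert k == comb(4, 2)   # C(4, 2) = 6 possible selections

# Discrete uniform distribution: each of the k selections has probability 1/k
f = {i: F(1, k) for i in range(1, k + 1)}

for i, s in enumerate(selections, start=1):
    print(i, s, f[i])
```

The numbering produced by `combinations` happens to match the order 1–AB, …, 6–CD used in the example.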
B. Binomial Distribution
Tossing a coin n times is a classical example of a binomial experiment. From this experiment we can see
the following characteristics of a binomial experiment:
a. The experiment consists of n repeated trials.
b. Each trial results in an outcome that may be classified as a success or a failure.
c. The probability of a success, denoted by p, remains constant from trial to trial.
(The probability of a failure is denoted by q, q = 1 – p.)
d. The repeated trials are independent.
NOTE: A binomial random variable is defined as the number of successes in n trials.

Example 11.4. Consider the experiment of tossing a coin three times. Let X represent the number of
heads that will appear. Note that
$$P(X=0) = P\{TTT\} = (1/2)(1/2)(1/2) = 1/8$$
$$P(X=1) = P\{HTT, THT, TTH\} = \frac{3!}{1!\,2!}(1/2)(1/2)^2 = 3/8$$
$$P(X=2) = P\{HHT, HTH, THH\} = \frac{3!}{2!\,1!}(1/2)^2(1/2) = 3/8$$
$$P(X=3) = P\{HHH\} = (1/2)^3 = 1/8$$
In general, we can express the formula for the probability distribution of X by
$$f(x;3,\tfrac{1}{2}) = \binom{3}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{3-x}, \quad x = 0, 1, 2, 3.$$
Definition. (Binomial Distribution)
If a binomial trial can result in a success with probability p and a failure with probability q = 1 – p,
then the probability distribution of the binomial random variable X, the number of successes in n
independent trials, is
$$b(x;n,p) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n.$$
The mean and variance of the binomial distribution b(x; n, p) are
$$\mu = np \quad \text{and} \quad \sigma^2 = npq.$$
Note: Computing binomial probabilities can be aided by using the Binomial Distribution Table.
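As an alternative to the table, binomial probabilities can be computed directly. The sketch below is our own; it uses Python's `math.comb` (Python 3.8+) for $\binom{n}{x}$, and `binom_cdf` returns $P(X \le r)$, the kind of cumulative sum many binomial tables list. It reproduces Example 11.4 and checks $\mu = np$ and $\sigma^2 = npq$.

```python
from fractions import Fraction as F
from math import comb

def binom_pmf(x, n, p):
    """b(x; n, p) = C(n, x) p^x q^(n - x)."""
    q = 1 - p
    return comb(n, x) * p**x * q ** (n - x)

def binom_cdf(r, n, p):
    """P(X <= r) = b(0; n, p) + b(1; n, p) + ... + b(r; n, p)."""
    return sum(binom_pmf(x, n, p) for x in range(r + 1))

n, p = 3, F(1, 2)   # Example 11.4: three tosses of a fair coin
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]
print(pmf)          # [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]

mean = sum(x * pmf[x] for x in range(n + 1))
var = sum((x - mean) ** 2 * pmf[x] for x in range(n + 1))
print(mean == n * p, var == n * p * (1 - p))   # True True
```

Probabilities such as "at least r" follow from the complement, $1 - P(X \le r - 1)$.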
Exercise 11.5.
1. The probability that a patient recovers from a rare blood disease is 0.4. If 15 randomly chosen people
are known to have contracted this disease, what is the probability that
(a) exactly 8 survive?
(b) at most 5 survive?
(c) less than 5 survive?
(d) at least 10 survive?
(e) more than 10 survive?
(f) between 3 and 8, exclusive, survive?
(g) between 3 and 8, inclusive, survive?
2. A multiple-choice quiz has 10 questions, each with 4 possible answers of which only 1 is the correct
answer. What is the probability that sheer guesswork yields
a. exactly 5 correct answers?
b. more than 8 correct answers?
c. fewer than 3 correct answers?
d. between 5 and 7, inclusive, correct answers?
C. Negative Binomial Distribution

Characteristics of a Negative Binomial Experiment
1. Each trial results in an outcome that may be classified as a success or a failure.
2. The probability of a success, denoted by p, remains constant from trial to trial.
3. The repeated trials are independent.
4. The trials are repeated until a fixed number of successes occurs.
Here, we are interested in the probability that the kth success occurs on the xth trial.
NOTE: A negative binomial random variable is defined as the number of the trial on which the kth
success occurs.
Example 11.6. Suppose a coin will be tossed until 3 heads have appeared. Let the random variable X
represent the number of trials required until 3 heads have appeared. We know that x = 3, 4, 5, 6, … The
minimum value of x is 3 because it requires at least 3 tosses to get 3 heads. Note that
$$P(X=3) = P\{HHH\} = 1/8$$
$$P(X=4) = P\{HHTH, HTHH, THHH\} = \frac{3!}{2!\,1!}(1/2)^3(1/2)$$
$$P(X=5) = P\{HHTTH, HTHTH, HTTHH, THTHH, TTHHH, THHTH\} = \frac{4!}{2!\,2!}(1/2)^3(1/2)^2$$
In general, the probability distribution of X is given by
$$P(X=x) = \frac{(x-1)!}{(k-1)!\,[(x-1)-(k-1)]!}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{x-3}, \quad x = 3, 4, 5, \ldots,$$
where k = 3 is the required number of heads.
Definition. (Negative Binomial Distribution)
If repeated independent trials can result in a success with probability p and a failure with
probability q = 1 – p, then the probability distribution of the random variable X, the number of the trial
on which the kth success occurs, is given by
$$b^*(x;k,p) = \binom{x-1}{k-1} p^{k} q^{x-k}, \quad x = k, k+1, k+2, \ldots$$
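To see that the formula counts the right outcomes, this sketch (our own, assuming a fair coin as in Example 11.6) compares $b^*(x;3,\tfrac12)$ with a brute-force count over all $2^x$ equally likely H/T sequences of length x in which the 3rd head lands exactly on toss x. The helper names `neg_binom_pmf` and `brute_force` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product
from math import comb

def neg_binom_pmf(x, k, p):
    """b*(x; k, p) = C(x - 1, k - 1) p^k q^(x - k), for x = k, k + 1, ..."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

def brute_force(x, k):
    """Share of the 2^x equally likely H/T sequences whose k-th head is on toss x."""
    hits = sum(
        1
        for seq in product("HT", repeat=x)
        if seq[-1] == "H" and seq.count("H") == k   # last toss is H, and it is the k-th H
    )
    return F(hits, 2**x)

# Example 11.6: toss a fair coin until 3 heads appear
for x in (3, 4, 5, 6):
    print(x, neg_binom_pmf(x, 3, F(1, 2)), brute_force(x, 3))
```

The brute-force count only works because every sequence is equally likely; for p ≠ 1/2 the formula is still valid but the simple counting check is not.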
D. Geometric Distribution
– is a special case of the negative binomial distribution where k = 1
– here we are interested in the probability that the 1st success occurs on the xth trial
Example 11.7. Suppose the probability that a patient will survive a heart operation is 0.7. What is
the probability that the first successful operation will happen on the 1st patient? 2nd? 3rd? 4th patient?
What is the general formula for the probability that the 1st success will occur on the xth patient?
$$P(X=1) = 0.7$$
$$P(X=2) = P\{FS\} = (0.3)(0.7)$$
$$P(X=3) = P\{FFS\} = (0.3)^2(0.7)$$
$$P(X=4) = P\{FFFS\} = (0.3)^3(0.7)$$
In general, the probability that the first success will occur on the xth trial is
$$P(X=x) = (0.3)^{x-1}(0.7).$$
Definition. (Geometric Distribution) If repeated independent trials can result in a success with
probability p and a failure with probability q = 1 – p, then the probability distribution of the random
variable X, the number of the trial on which the first success occurs, is given by
$$g(x;p) = q^{x-1}p, \quad x = 1, 2, 3, \ldots$$
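A short sketch (our own helper names) that evaluates $g(x;p)$ for Example 11.7 and confirms that it is the negative binomial formula with k = 1:

```python
from math import comb, isclose

def geom_pmf(x, p):
    """g(x; p) = q^(x - 1) p: probability that the first success is on trial x."""
    return (1 - p) ** (x - 1) * p

def neg_binom_pmf(x, k, p):
    """b*(x; k, p) = C(x - 1, k - 1) p^k q^(x - k)."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

p = 0.7   # Example 11.7: probability that a heart operation is successful
for x in range(1, 5):
    # The geometric pmf equals the negative binomial pmf with k = 1
    assert isclose(geom_pmf(x, p), neg_binom_pmf(x, 1, p))
    print(x, round(geom_pmf(x, p), 4))
```

Floating-point results are rounded for display, since 1 − 0.7 is not stored exactly as 0.3.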
E. Hypergeometric Distribution
Characteristics of a Hypergeometric Experiment
1. A random sample of size n is selected (without replacement) from a population of N items.
2. k of the N items may be classified as successes and N – k as failures.
NOTE: A hypergeometric random variable is defined as the number of successes in the sample.
Example 11.8. Suppose we will form a committee of size 3 from 4 mathematicians and 2 statisticians.
Let X be the number of mathematicians in the committee. Hence, X = 1, 2, 3. Note that there are
$\binom{6}{3} = 20$ possible committees of size 3 that can be formed. And
$$P(X=1) = \frac{\binom{4}{1}\binom{2}{2}}{20} = \frac{4}{20}$$
$$P(X=2) = \frac{\binom{4}{2}\binom{2}{1}}{20} = \frac{12}{20}$$
$$P(X=3) = \frac{\binom{4}{3}\binom{2}{0}}{20} = \frac{4}{20}$$
The general formula is given by
$$P(X=x) = \frac{\binom{4}{x}\binom{2}{3-x}}{20}, \quad x = 1, 2, 3.$$
Definition. (Hypergeometric Distribution)

If a population of size N contains k items labeled “success” and N – k items labeled “failure,”
then the probability distribution of the hypergeometric random variable X, the number of successes in a
random sample of size n, is
$$h(x;N,n,k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, n.$$
The mean and variance of the hypergeometric distribution h(x; N, n, k) are
$$\mu = \frac{nk}{N} \quad \text{and} \quad \sigma^2 = \frac{N-n}{N-1}\cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right).$$
F. Poisson Distribution
Properties of a Poisson experiment
1. The number of outcomes occurring in one time interval or specified region is independent of
the number that occur in any other disjoint time interval or region of space.
2. The probability that a single outcome will occur during a very short time interval or in a small
region is proportional to the length of the time interval or the size of the region and does not
depend on the number of outcomes occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time interval or fall in
such a small region is negligible.
The probability distribution of the Poisson random variable X, representing the number of outcomes
occurring in a given time interval or specified region, is
$$p(x;\mu) = \frac{e^{-\mu}\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots,$$
where μ is the average number of outcomes occurring in the given time interval or specified region.
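A minimal sketch of the Poisson formula in Python, with our own helper names. The value μ = 4.5 is purely hypothetical (not from any example here); the checks confirm numerically that the probabilities sum to 1 and that the mean of the distribution is μ.

```python
from math import exp, factorial, isclose

def poisson_pmf(x, mu):
    """p(x; mu) = e^(-mu) mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(r, mu):
    """P(X <= r) = p(0; mu) + ... + p(r; mu)."""
    return sum(poisson_pmf(x, mu) for x in range(r + 1))

mu = 4.5   # hypothetical average number of outcomes per interval
# Truncating at x = 99 leaves out a negligible tail for this mu
total = sum(poisson_pmf(x, mu) for x in range(100))
mean = sum(x * poisson_pmf(x, mu) for x in range(100))
print(round(total, 6), round(mean, 6))
```

"More than r" outcomes is then $1 - $ `poisson_cdf(r, mu)`.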
Exercise 11.9. Suppose that a certain intersection has an average of 2 accidents per month. What is the
probability that this intersection will have
a.) 3 accidents in a month?
b.) more than 3 accidents in a month?
c.) less than 3 accidents in a month?
G. Normal Probability Distribution
This is a theoretical probability distribution of a continuous random variable that follows a bell-
shaped curve.
Properties of the normal curve:

1. It is bell-shaped and symmetric about the vertical line passing through the mean.
2. It is asymptotic to the horizontal axis as it proceeds away from the mean.
3. The total area under the curve is equal to 1.
4. The flatness of the curve is determined by the standard deviation of the distribution: a larger
standard deviation means a flatter, more spread-out curve.
Finding probabilities of events using the normal curve is equivalent to finding the area under the curve
pertaining to the given event. Areas under the curve may be found using the z-table. First convert the
random variable X to the standard normal random variable Z using the formula
$$Z = \frac{X - \mu}{\sigma},$$
where μ = mean of the random variable X
σ = standard deviation of X
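Instead of a printed z-table, areas can also be computed with Python's standard-library `statistics.NormalDist` class (Python 3.8+): `cdf` gives the area to the left of a value and `inv_cdf` gives the value with a given area to its left. The sketch below uses hypothetical values μ = 50 and σ = 10, not those of the exercises; answers may differ slightly in the fourth decimal from a table that rounds z to two decimals.

```python
from statistics import NormalDist

def normal_prob_below(x, mu, sigma):
    """P(X < x) for a normal X, via the standardization z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return NormalDist().cdf(z)   # area to the left of z under the standard normal curve

mu, sigma = 50, 10   # hypothetical mean and standard deviation
below = normal_prob_below(65, mu, sigma)                                       # z = 1.5
between = normal_prob_below(65, mu, sigma) - normal_prob_below(40, mu, sigma)  # z from -1 to 1.5
x0 = NormalDist(mu, sigma).inv_cdf(0.90)   # point with 90% of the area below it
print(round(below, 4), round(between, 4), round(x0, 2))
```

"Greater than" probabilities are $1 - P(X < x)$, and a percentage of a group of N people is the probability times 100 (or times N for a count).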
Exercise 11.10.
1. The IQs of 600 applicants to a certain college are approximately normally distributed with a mean of
115 and a standard deviation of 12. If a student will be selected at random, what is the probability that
the student has an IQ
a. less than 110?
b. greater than 120?
c. between 110 and 120?
d. What percentage of the students have an IQ lower than 95?
e. How many of the students have an IQ higher than 100?
2. The heights of 1000 students are normally distributed with a mean of 174.5 centimeters and a standard
deviation of 6.9 centimeters.
a. If a student will be drawn at random, what is the probability that his or her height is less than 170 cm?
b. What percentage of the students are at least 180 cm tall?
c. How many of the students have heights between 170 and 178 cm?
3. A certain type of battery for pocket calculators is known to have an average life of 60 hours of
continuous use. The lives of these batteries follow a bell-shaped distribution. If a battery will be drawn
at random, what is the probability that
a. it will last less than 58 hours?
b. it will last more than 64 hours?
c. it will last between 58 and 62 hours?
4. Among diabetics, the fasting blood glucose level X may be assumed to be approximately normally
distributed with a mean of 106 mg/100 ml and a standard deviation of 8 mg/100 ml. Find the
probability that for a randomly chosen diabetic, his fasting blood glucose is
a. less than 120 mg/100 ml.
b. greater than 121 mg/100 ml.
c. between 90 and 120 mg/100 ml.
d. Find the point x0 that has the property that 25% of all diabetics have a fasting glucose level X
lower than x0.
5. Among females in the US between 18 and 74 years of age, diastolic blood pressure is normally
distributed with a mean of 77 mmHg and a standard deviation of 11.6 mmHg. What is the probability
that a randomly selected woman has a diastolic blood pressure
a. less than 60 mmHg?
b. greater than 90 mmHg?
c. between 60 and 90 mmHg?
d. Find the point x0 that has the property that 30% of the females have a diastolic blood pressure
lower than x0.
e. Find the point x0 that has the property that 30% of the females have a diastolic blood pressure
greater than x0.
The Normal Approximation to the Binomial Distribution
In a binomial experiment, when the number of trials n becomes large, the computation can be very
tedious. An alternative procedure is to approximate the binomial probabilities using the normal curve.
Note that when p is close to 0.5, the probability histogram follows a bell-shaped curve, and the larger n
is, the better the approximation. In fact, as long as p is not too close to 0 or 1 and n is large, the
normal curve can still be used to approximate the binomial probabilities fairly well. Walpole (1982, Intro.
To Statistics, p. 204) said that one possible guide to determine when the normal approximation may be
used is through the values of np and nq: if both are greater than 5, the approximation will be good.
In the normal approximation, the conversion to z follows the formula
$$z = \frac{x_n - \mu}{\sigma} = \frac{x_n - np}{\sqrt{npq}},$$
where $x_n$ is the corresponding cut-off score under the normal curve when it is superimposed on the
binomial probability histogram.
Example 11.11. The probability that a patient recovers from a rare blood disease is 0.6. If 100 people are
known to have contracted this disease, what is the probability that
a. less than one-half survive?
We note that $\mu = np = 100(0.6) = 60$ and $\sigma = \sqrt{(100)(0.6)(0.4)} \approx 4.9$. Then
$$P(X_b < 50) \approx P(X_n < 49.5) = P\left(z < \frac{49.5 - 60}{4.9}\right) = P(z < -2.14) = 0.0162.$$
b. more than 65 will survive?
c. from 40 to 70 will survive?
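To see how good the approximation in part (a) is, this sketch (our own) computes the exact binomial probability $P(X_b \le 49)$ with `math.comb` and compares it with the normal area below the cut-off 49.5 (often called the continuity correction), using `statistics.NormalDist` with the unrounded σ = √24.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.6                      # Example 11.11
q = 1 - p
mu, sigma = n * p, sqrt(n * p * q)   # 60 and about 4.899

# Exact binomial P(X < 50) = P(X <= 49)
exact = sum(comb(n, x) * p**x * q ** (n - x) for x in range(50))

# Normal approximation with the cut-off score x_n = 49.5
approx = NormalDist(mu, sigma).cdf(49.5)

print(round(exact, 4), round(approx, 4))
```

The two values agree to within a few thousandths here, consistent with the guide that np and nq should both exceed 5.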
XII. SAMPLING DISTRIBUTION
Consider all possible samples of size n which can be drawn from a given population with size N.
For each sample we can compute a statistic, such as the mean or the standard deviation, and the value of
the statistic will vary from sample to sample. Hence, the statistic that we compute from sample to sample
can be thought of as a random variable. Being a random variable, it has a probability distribution. The
probability distribution of a statistic is called a sampling distribution. If the statistic that we compute is
the mean of each possible sample, then we have a sampling distribution of the mean. If we are interested
in the variance of each sample, then we come up with a sampling distribution of the variance.
Example 12.1. Suppose we have a population of size 4 whose values are 0, 1, 2 and 3. If we draw a
random sample of 2 elements the possible results and the corresponding mean of the sample are as
follows:
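$$\begin{array}{cc}
\text{Sample} & \text{Sample mean, } \bar{x} \\
\hline
0, 1 & 0.5 \\
0, 2 & 1 \\
0, 3 & 1.5 \\
1, 2 & 1.5 \\
1, 3 & 2 \\
2, 3 & 2.5
\end{array}$$
The sampling distribution of the sample mean therefore is
$$\begin{array}{cc}
\bar{x} & P(\bar{X} = \bar{x}) \\
\hline
0.5 & 1/6 \\
1 & 1/6 \\
1.5 & 2/6 \\
2 & 1/6 \\
2.5 & 1/6
\end{array}$$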
If we graph this distribution, the result is symmetric and close to a bell shape.
For each sampling distribution we can compute the overall mean of the statistic or the variance of
the statistic, etc. For example, we can compute the mean of all the sample means, or we can compute the
variance of the sample means.
Example 12.2. The mean of the sample means in the previous example is equal to
$$\mu_{\bar{x}} = \frac{0.5 + 1 + 1.5 + 1.5 + 2 + 2.5}{6} = 1.5.$$
Note that the mean of the population is also 1.5.
The variance of the sample means is
$$\sigma_{\bar{x}}^2 = \frac{(0.5-1.5)^2 + (1-1.5)^2 + (1.5-1.5)^2 + (1.5-1.5)^2 + (2-1.5)^2 + (2.5-1.5)^2}{6}$$
$$= \frac{1 + 0.25 + 0 + 0 + 0.25 + 1}{6} = \frac{2.5}{6} \approx 0.416667.$$
The properties below (taken from the Central Limit Theorem) are very important in statistical
inference because these will be used in evaluating inferential statements.
Properties: If all random samples of size n are drawn from a population with mean μ and variance $\sigma^2$,
then
a.) the mean of the sample means is equal to the mean of the population, i.e., $\mu_{\bar{x}} = \mu$;
b.) the variance of the sample means is
$$\sigma_{\bar{x}}^2 = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n} \quad \text{if } N \text{ is finite},$$
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \quad \text{if } N \text{ is infinite or very large}.$$
Equivalently, the standard deviation of the sample means is
$$\sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}}\cdot\frac{\sigma}{\sqrt{n}} \quad \text{if } N \text{ is finite},$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{if } N \text{ is infinite or very large};$$
c.) if the sample size n is large enough (at least 30), the sampling distribution of the sample mean $\bar{X}$ is
approximately normally distributed.
Hence,
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}, \quad \text{or equivalently} \quad z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}},$$
is a value of the standard normal variable Z.
Remark: The sampling distribution of the sample mean $\bar{X}$ is approximately normal when the sample size is
large, i.e., $n \ge 30$. If the sample size is small, the sampling distribution of $\bar{X}$ is still normal as
long as the population from which the samples were taken is normally distributed.
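Properties (a) and (b) can be verified on Example 12.1 by listing every sample of size 2 drawn without replacement from the population {0, 1, 2, 3}. In this sketch (our own), `mean` and `pvar` are hypothetical helpers that divide by the number of values, i.e. they treat each list as a whole population; exact fractions make the comparison with the finite-population formula exact.

```python
from fractions import Fraction as F
from itertools import combinations

def mean(values):
    return sum(values, F(0)) / len(values)

def pvar(values):
    """Population variance: the average squared deviation from the mean."""
    m = mean(values)
    return sum(((v - m) ** 2 for v in values), F(0)) / len(values)

population = [0, 1, 2, 3]                    # Example 12.1, N = 4
N, n = len(population), 2
samples = list(combinations(population, n))  # all 6 samples, drawn without replacement
xbars = [F(sum(s), n) for s in samples]

mu, sigma2 = mean(population), pvar(population)
print(mean(xbars), mu)                               # 3/2 3/2
print(pvar(xbars), F(N - n, N - 1) * sigma2 / n)     # 5/12 5/12
```

The value 5/12 is the 0.416667 obtained in Example 12.2.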
Exercise 12.3. Suppose the weights of a population are known to have a mean of 50 kilograms and a
standard deviation of 5 kilograms. What is the probability that a random sample of 36
elements drawn from this population will have a mean weight between 49.5 and 50
kilograms?
The use of the properties above requires that the population standard deviation be known.
However, in many inferential problems it is not known. The theorem below states that we can use
Student's t-distribution instead of the z-distribution when the population standard deviation is unknown,
n is small (n < 30), and the population is normally distributed.
Theorem. (Student’s t - distribution)

If $\bar{x}$ and $s^2$ are the mean and variance, respectively, of a random sample of size n taken from a
population that is normally distributed with mean μ and unknown variance $\sigma^2$, then
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is the value of the random variable T having the t-distribution with v = n – 1 degrees of freedom.
Remarks:
1.) The t-distribution assumes that the population from which the sample is taken is normally
distributed; this assumption matters especially when the sample size is small.
2.) When the sample size n is large, the sample variance will approximate closely the value of the
population variance. Hence the t-distribution and the z distribution will be almost identical.
XIII. STATISTICAL INFERENCE
Statistical Inference is a procedure whereby inferences or conclusions about a population are
made on the basis of the results obtained from a sample drawn from that population.
Two major areas of Statistical Inference

1. Estimation
In estimation, the concern is on estimating unknown population characteristics (or parameters)
like the mean, proportion, or variance. These parameters will be estimated using the results obtained
from the sample.
2. Hypothesis testing
The concern is whether to accept or reject a statement regarding a population characteristic. A
parameter is designated or hypothesized before the collection of sample data is made. To make valid
decisions, most of these statistical procedures assume that the population of interest is normally
distributed.
Statistical Inference I: Estimation
An interval estimate is an interval that we are willing to say, with some degree of certainty,
contains the parameter being estimated. This degree of certainty is called the level of confidence or
confidence coefficient, denoted by $1-\alpha$. A statement of confidence can always be attached to each
interval estimate constructed. Hence, we can refer to interval estimates as confidence intervals and to
their bounds as confidence limits.
When computing a $100(1-\alpha)\%$ confidence interval, where $1-\alpha$ is the confidence coefficient, we
may select any level of confidence we wish by varying the value of α. Commonly used confidence
coefficients are 90%, 95% and 99%, where the corresponding values of α are 0.1, 0.05 and 0.01,
respectively. A practical interpretation of a 95% confidence interval for a parameter, say the population
mean, is that we are 95% confident that the interval computed from a random sample of size n
contains the population mean μ.
Confidence Interval for the Population Mean

To construct a $100(1-\alpha)\%$ confidence interval for the population mean μ, consider the following
notation:
$\bar{x}$ = sample mean
$z_{\alpha/2}$ = standard normal deviate whose area above it is α/2
$t_{\alpha/2}$ = standard t deviate whose area above it is α/2
σ = population standard deviation
s = sample standard deviation
n = sample size
v = degrees of freedom
The computing formulas are:
Case 1. When σ is known and $n \ge 30$:
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Case 2. When σ is unknown and $n \ge 30$:
$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$
Case 3. When σ is unknown, the population is normally distributed, and n is small:
$$\bar{x} - t_{\alpha/2,v}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,v}\frac{s}{\sqrt{n}}, \quad v = n - 1 \text{ degrees of freedom}.$$
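All three cases have the form $\bar{x} \pm (\text{critical value}) \times (\text{standard deviation})/\sqrt{n}$, so one small function covers them. In this sketch (our own), $z_{\alpha/2}$ comes from `statistics.NormalDist().inv_cdf`; the Python standard library has no t-distribution, so for Case 3 the critical value is typed in from a t-table (2.262 is the tabulated $t_{0.025}$ for v = 9). The sample figures are hypothetical, not those of Exercise 13.1.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(xbar, sd, n, crit):
    """Return (lower, upper) = xbar -/+ crit * sd / sqrt(n); crit is z_{a/2} or t_{a/2, v}."""
    half_width = crit * sd / sqrt(n)
    return xbar - half_width, xbar + half_width

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{0.025}, about 1.96

# Cases 1 and 2 (n >= 30): hypothetical sample with xbar = 100, s = 15, n = 36
print(mean_ci(100, 15, 36, z))

# Case 3 (small n, normal population): hypothetical n = 10, so v = 9 and t_{0.025, 9} = 2.262
print(mean_ci(20.0, 4.0, 10, 2.262))
```

Raising the confidence level (smaller α) increases the critical value and therefore widens the interval.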
Exercise 13.1.
1. Suppose the average IQ of a random sample of 50 MSU students is found to be 105 with a standard
deviation of 9. Find a 95% and a 99% confidence interval estimate for the true average IQ of MSU
students. Compare your results.
2. The data below represents the length of life in minutes, measured to the nearest tenth, of a random
sample of 10 black flies subjected to a new spray in a controlled laboratory experiment: 0.7, 1.3, 1.7,
1.9, 2.1, 2.5, 2.8, 3.4, 3.9, 6.3. Assuming that the length of life is normally distributed, construct a
99% confidence interval for the true length of life of black flies subjected to the new spray.
3. A random sample of 8 batteries for pocket calculators is found to have a mean life of 58 hours of
continuous use with a standard deviation of 1.5 hours.
(a) Assuming that the lifetime of batteries is normally distributed, find a 95% confidence interval for
the true mean life of this battery.
(b) The manufacturer of this battery claims that the average life of this battery is 60 hours. Is the
manufacturer justified by his claim using the result in (a)?
Confidence Interval for Population Proportion

When the sample size is sufficiently large (n > 20) and the population proportion is known to be not too
close to 0 or 1, a $100(1-\alpha)\%$ confidence interval for the true population proportion p is given by
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}},$$
where $\hat{p}$ and $\hat{q}$ are the proportions of successes and failures, respectively, in a random sample of size n.
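A minimal sketch of the large-sample interval for a proportion, using `statistics.NormalDist` for $z_{\alpha/2}$. The function name and the figures (60 successes in 200 trials) are hypothetical, chosen only to illustrate the formula.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Large-sample interval p_hat -/+ z_{a/2} sqrt(p_hat q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

print(proportion_ci(60, 200))   # hypothetical: 60 successes in 200 trials
```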
Exercise 13.2. In a random sample of 500 people eating lunch at a hospital cafeteria on various Fridays,
it was found that 160 preferred seafood. Find a 95% confidence interval for the actual
proportion of people who eat seafood on Fridays at this cafeteria.
Sample size determination

If we want to be $100(1-\alpha)\%$ confident that our error in estimating the population mean will not
exceed an amount e, then our sample size should be at least
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2,$$
rounded up to the next whole number. If the population variance is not known, a pilot study with n ≥ 30
may be conducted to estimate it. Here e is the largest error we are willing to tolerate in estimating μ.
If the value to be estimated is the population proportion, then the sample size should be at least
$$n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2}.$$
Here e is the largest error we are willing to tolerate in estimating p.
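Both sample-size formulas are easy to evaluate in code; the only subtle step is rounding up with `math.ceil`, because rounding down would give an error slightly larger than e. The helper names and the inputs below (σ = 10, e = 2; $\hat{p}$ = 0.5, e = 0.05) are hypothetical. Using the exact $z_{0.025} \approx 1.95996$ rather than the table value 1.96 can occasionally change the rounded answer by one.

```python
from math import ceil
from statistics import NormalDist

def z_half(conf):
    """z_{alpha/2} for a given confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def n_for_mean(sigma, e, conf=0.95):
    """Smallest whole n with n >= (z_{alpha/2} sigma / e)^2."""
    return ceil((z_half(conf) * sigma / e) ** 2)

def n_for_proportion(p_hat, e, conf=0.95):
    """Smallest whole n with n >= z_{alpha/2}^2 p_hat q_hat / e^2."""
    return ceil(z_half(conf) ** 2 * p_hat * (1 - p_hat) / e**2)

print(n_for_mean(sigma=10, e=2))             # hypothetical sigma and error bound
print(n_for_proportion(p_hat=0.5, e=0.05))   # hypothetical p_hat and error bound
```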
Exercise 13.3. A soft-drink machine is regulated so that the amount of drink dispensed is approximately
normally distributed with a standard deviation equal to 1.5 deciliters. How large a
sample is needed if we wish to be 95% confident that our sample mean will be within
0.852 deciliter of the true mean?
Exercise 13.4. In Exercise 13.2, how large a sample is required if we want to be 95% confident that our
estimate of p is within 0.02?
Statistical Inference II: Hypothesis Testing
On many occasions, we are called upon to make decisions about a population characteristic. To
reach a decision, it is useful to make speculations or guesses regarding the population parameter. Such
speculations are called statistical hypotheses.
The null hypothesis, denoted by Ho, specifies the value or the possible range of values that a
population parameter may take. When evidence from the sample dictates that the value or range of values,
as stated in the null hypothesis, is not feasible, then an alternative range of values for the parameter of
interest can be assumed. This will be expressed in another mathematical equation called the alternative
hypothesis, and is denoted by H1.
Illustration 13.5. If one hypothesized that the average Filipino height is 165 cm, or $\mu = 165$ cm, a
possible alternative hypothesis may be one of the following:
a. the average Filipino height is not equal to 165 cm, or $\mu \ne 165$ cm,
b. the average Filipino height is greater than 165 cm, or $\mu > 165$ cm,
c. the average Filipino height is less than 165 cm, or $\mu < 165$ cm.
A Test of Statistical Hypothesis is a rule for deciding whether to reject a hypothesis. The information
we gather from the sample will be used to decide whether the hypothesis is to be rejected or not.
Since the value of the population parameter, which we are trying to infer, is unknown, errors are
bound to be committed. It is a natural tendency to try to find ways to minimize these errors, since they
cannot be avoided. Statistical procedures provide a scientific way by which the errors in making a
conclusion are minimized. This is one of the major reasons why Statistics is being studied.
Types of Errors Committed in Testing Hypotheses:

1. Type I Error – is the error of rejecting the null hypothesis when in fact it is true. The probability of
committing this error, denoted by α, is more popularly known as the
level of significance of the test.
2. Type II Error – is the error of accepting the null hypothesis when in fact it is false.
Other terminologies used in performing a test of hypothesis are as follows:

1. Test Statistic – the value based on the sample that is used to determine whether $H_0$ should be
rejected or accepted.
2. Critical Value – identifies the value of the test statistic that would lead to the rejection of $H_0$ at the
chosen level of significance.
3. Critical Region or Rejection Region – is the part of the set of all possible values of a test statistic
for which $H_0$ is rejected.
Types of Statistical Tests
Let θ be the parameter of interest.
A. One-tailed test – is a test such that the alternative hypothesis is one-sided:
$$H_0: \theta = \theta_0 \ (\text{or } H_0: \theta \ge \theta_0), \qquad H_1: \theta < \theta_0$$
or
$$H_0: \theta = \theta_0 \ (\text{or } H_0: \theta \le \theta_0), \qquad H_1: \theta > \theta_0$$
B. Two-tailed test – is a test such that the alternative hypothesis is two-sided:
$$H_0: \theta = \theta_0, \qquad H_1: \theta \ne \theta_0$$
Exercise 13.6. State the null and alternative hypotheses to be used in testing the following claims and
determine generally where the critical region is located.
a. The mean IQ of MSU students is 105.
b. Not less than 50% of the MSU constituents are in favor of Dr. Ricardo de Leon’s
extension of term of office.
c. The average nicotine content of a particular brand of cigarette does not exceed 3.5 mg.
d. The average weight of the varsity basketball players this school year is at least 60
kilos.
Steps in Hypothesis Testing

1. Formulate the null and the appropriate alternative hypothesis.
2. Specify the level of significance to be used.
3. Select the appropriate test statistic.
4. Establish the critical regions.
5. Compute the actual value of the test statistic from the sample.
6. Make a statistical decision, which is to reject $H_0$ when the computed value of the test statistic is
within the critical region; otherwise, accept $H_0$.
7. Draw the conclusion.
Remark: When a null hypothesis is stated, the objective of the experimenter or researcher is to reject it.
If a null hypothesis is rejected, the confidence is quite high that we made the right decision. However, if
the null hypothesis is not rejected, it does not follow that it is true. An appropriate conclusion will be to
state that “there is no substantial evidence to reject the null hypothesis” rather than concluding that the
null hypothesis is true.
XIV. TEST FOR POPULATION MEANS
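Table 14.1 gives a summary of the different one-sided and two-sided null and alternative
hypotheses concerning a single mean. The critical region for each test statistic and each alternative
hypothesis is also given.
A. Hypothesis Testing for a Single Mean
Table 14.1
Table for Testing $H_0: \mu = \mu_0$
$$\begin{array}{llll}
H_0 & \text{Test statistic} & H_1 & \text{Critical region} \\
\hline
\mu = \mu_0 & z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \text{ if } \sigma \text{ is known or } n \ge 30 & \mu < \mu_0 & z < -z_{\alpha} \\
 & & \mu > \mu_0 & z > z_{\alpha} \\
 & & \mu \ne \mu_0 & z < -z_{\alpha/2} \text{ or } z > z_{\alpha/2} \\
\hline
\mu = \mu_0 & z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \text{ if } \sigma \text{ is unknown and } n \ge 30 & \mu < \mu_0 & z < -z_{\alpha} \\
 & & \mu > \mu_0 & z > z_{\alpha} \\
 & & \mu \ne \mu_0 & z < -z_{\alpha/2} \text{ or } z > z_{\alpha/2} \\
\hline
\mu = \mu_0 & t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} \text{ if } \sigma \text{ is unknown and } n < 30, \text{ with } v = n - 1 & \mu < \mu_0 & t < -t_{\alpha} \\
 & & \mu > \mu_0 & t > t_{\alpha} \\
 & & \mu \ne \mu_0 & t < -t_{\alpha/2} \text{ or } t > t_{\alpha/2}
\end{array}$$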
Example 14.1. (two-tailed test)

A certain company wishes to know the mean weight of Brand A soy sauce received in a bulk
scheduled delivery. A random sample of 20 bottles of Brand A soy sauce was weighed and the results
recorded. The mean weight was found to be 8.8 kgs. It is known that the standard deviation of the weights
of all Brand A soy sauce is 1.5 kgs. Is there reason to believe that the mean weight of Brand A soy sauce is
different from 9.0 kgs (as written on the bottles)? Use a 0.05 level of significance. Assume that the
weights of this soy sauce follow a normal distribution.
Solution:
Step 1. Null hypothesis: $H_0: \mu = 9.0$ kgs
Alternative hypothesis: $H_1: \mu \ne 9.0$ kgs
Step 2. Level of significance, α = 0.05
Step 3. Since σ is known, the appropriate test statistic is $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
Step 4. The critical region is $z < -z_{0.05/2}$ or $z > z_{0.05/2}$, that is,
$z < -z_{0.025} = -1.96$ or $z > z_{0.025} = 1.96$.
Step 5. The computed value of the test statistic is
$$z = \frac{8.8 - 9.0}{1.5/\sqrt{20}} = \frac{-0.20}{0.3354} = -0.5963.$$
Step 6. Since the computed statistic does not fall in the critical region, we accept $H_0$.
Step 7. We conclude, at the 0.05 level of significance, that there is no substantial evidence that the mean
weight of Brand A soy sauce is different from 9.0 kgs, as written on the bottles.
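Steps 3 to 6 of Example 14.1 can be mirrored in a few lines of Python. This is a sketch under our own naming (`z_test` and its `tail` argument are hypothetical); the critical values come from `statistics.NormalDist().inv_cdf` instead of the z-table, so they are 1.95996… rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, tail="two"):
    """Return (z, reject H0?) for H0: mu = mu0, with z = (xbar - mu0) / (sigma / sqrt(n))."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    if tail == "two":                      # H1: mu != mu0
        crit = std.inv_cdf(1 - alpha / 2)
        reject = z < -crit or z > crit
    elif tail == "lower":                  # H1: mu < mu0
        reject = z < std.inv_cdf(alpha)
    else:                                  # "upper", H1: mu > mu0
        reject = z > std.inv_cdf(1 - alpha)
    return z, reject

# Example 14.1: xbar = 8.8, mu0 = 9.0, sigma = 1.5, n = 20, alpha = 0.05, two-tailed
z, reject = z_test(8.8, 9.0, 1.5, 20, 0.05)
print(round(z, 4), reject)
```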
Example 14.2. (one-tailed test)
E. canis infection is a tick-borne disease of dogs that is sometimes contracted by humans.
Among infected humans, the distribution of white blood cell counts has an unknown mean μ and standard
deviation σ. In the general population, the mean white blood cell count is 7250/mm³. It is believed that
persons infected with E. canis must, on the average, have lower white blood cell counts. A random sample
of 15 infected persons has a mean white blood cell count of 4767/mm³ and a standard deviation of
3204/mm³. Do we have sufficient evidence to show that the mean white blood cell count of infected
persons is lower than 7250/mm³? Use α = 0.01 and assume that the white blood cell count follows a
normal distribution.
Solution:
Step 1. Null hypothesis: $H_0: \mu = 7250$/mm³
Alternative hypothesis: $H_1: \mu < 7250$/mm³
Step 2. Level of significance: $\alpha = 0.01$
Step 3. Since $\sigma$ is unknown and $n < 30$, the appropriate test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $v = 15 - 1 = 14$.
Step 4. The critical region is $t < -t_{0.01}$, that is, $t < -2.624$.
Step 5. The computed value of the test statistic is
$t = \dfrac{4767 - 7250}{3204/\sqrt{15}} = \dfrac{-2483}{827.269} = -3.001$.
Step 6. Since the computed value of t falls in the critical region (– 3.001 < – 2.624), we reject H0 at the
0.01 level of significance.
Step 7. Therefore, we have sufficient evidence to show that the mean white blood cell count of infected
persons is lower than 7250 / mm3.
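A similar Python sketch reproduces Example 14.2. The standard library has no t-distribution quantile function, so the critical value $t_{0.01} = 2.624$ for v = 14 is copied from the t table used above rather than computed; `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """Return t = (xbar - mu0) / (s / sqrt(n)) and its degrees of freedom v = n - 1."""
    return (xbar - mu0) / (s / sqrt(n)), n - 1


# Example 14.2: xbar = 4767, mu0 = 7250, s = 3204, n = 15 (counts per mm^3)
t, v = one_sample_t(4767, 7250, 3204, 15)
t_crit = 2.624        # t_{0.01} with v = 14, taken from the t table
reject = t < -t_crit  # lower-tailed test because H1: mu < 7250
print(f"t = {t:.3f}, v = {v}, reject H0: {reject}")
```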
Exercise 14.3. A study would like to know whether the mean height of 12- to 40-year-olds who suffer from
fetal alcohol syndrome is equal to the mean height of 12- to 40-year-olds who do not suffer from fetal
alcohol syndrome. A random sample of 31 people selected from those who suffer from fetal alcohol syndrome
showed a mean height of 147.4 cm. The standard deviation for this population is known to be 6 cm. The
population mean height for those who do not suffer from the syndrome is already known to be 160.0 cm.
Test the hypothesis that the mean height of those who have the syndrome is equal to 160.0 cm, the mean
height of those who do not suffer from the syndrome. Use a 0.01 level of significance.
Exercise 14.4. Body mass index is calculated by dividing a person's weight by the square of his or her
height. It is a measure of the extent to which the individual is overweight. For the population of
middle-aged men who later develop diabetes mellitus, the distribution of baseline body mass indices is
approximately normal with an unknown mean $\mu$ and standard deviation $\sigma$. A sample of 58 men selected
from this group has a mean body mass index of 25.0 kg/m² and a standard deviation of 2.7 kg/m². At the
0.01 level of significance, test whether the mean baseline body mass index for the population of
middle-aged men who do develop diabetes is significantly greater than 24.0 kg/m², which is the mean
baseline body mass index for the population of men who do not develop diabetes.
Exercise 14.5. The mean score of 100 randomly selected Math 31 Prelim Examination results is
48.54 with a standard deviation of 20. Test the hypothesis at the 0.01 level of significance that the true
mean score is less than 50.
Remarks: The probabilities of committing a type I error (denoted by α) and a type II error (denoted by β)
have the following properties:
1. For a fixed sample size, as α decreases, β increases. This means that as you decrease the probability of
committing a type I error, you increase the probability of committing a type II error.
2. Suppose we know the specific value of the mean in the alternative hypothesis. The farther the
alternative value of the mean is from the hypothesized value, the smaller the probability of committing a
type II error, and the larger the power of the test, 1 − β.
3. Power is also a function of the sample size; the larger the sample size, the greater the power. This is
because the standard error, $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, decreases as n increases. This means that for small samples,
even if there is a large difference between $\mu_0$ and $\mu_1$, Ho may not be rejected. Conversely, when n is
large, small differences may be found to be significant.
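The three remarks can be illustrated numerically. The Python sketch below computes the power $1 - \beta$ of the two-tailed z test of Ho: $\mu = \mu_0$ when the true mean is $\mu_1$, using the standard result that the z statistic then has mean $(\mu_1 - \mu_0)/(\sigma/\sqrt{n})$ and variance 1. The values $\mu_0 = 9.0$ and $\sigma = 1.5$ echo Example 14.1, but the alternative mean $\mu_1 = 8.5$ is a hypothetical value chosen only for illustration, and `z_test_power` is our own helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and sd 1


def z_test_power(mu0, mu1, sigma, n, alpha):
    """Power (1 - beta) of the two-tailed z test of H0: mu = mu0 when the true mean is mu1."""
    z_half = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu1 - mu0) / (sigma / sqrt(n))  # mean of the z statistic when mu = mu1
    # probability that z falls in either tail of the critical region
    return STD_NORMAL.cdf(-z_half - shift) + (1 - STD_NORMAL.cdf(z_half - shift))


for n in (10, 20, 50, 100):
    print(n, round(z_test_power(9.0, 8.5, 1.5, n, 0.05), 3))
```

The loop shows power rising with n (Remark 3); calling the function with a smaller α, or with $\mu_1$ closer to $\mu_0$, gives a lower power (Remarks 1 and 2).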
B. Hypothesis Testing for Two Sample Means (From Independent Populations)
We will now discuss how to perform a test of hypothesis in comparing the means of two
unrelated populations.
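Table 14.2 Testing Ho: $\mu_1 = \mu_2$

Ho: $\mu_1 = \mu_2$, if the population variances $\sigma_1^2$ and $\sigma_2^2$ are known and the sample sizes $n_1$ and $n_2$ are both large.
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
  H1: $\mu_1 < \mu_2$   Critical region: $z < -z_{\alpha}$
  H1: $\mu_1 > \mu_2$   Critical region: $z > z_{\alpha}$
  H1: $\mu_1 \ne \mu_2$   Critical region: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

Ho: $\mu_1 = \mu_2$, if the population variances are unknown and the sample sizes are both large.
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
  H1: $\mu_1 < \mu_2$   Critical region: $z < -z_{\alpha}$
  H1: $\mu_1 > \mu_2$   Critical region: $z > z_{\alpha}$
  H1: $\mu_1 \ne \mu_2$   Critical region: $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$

Ho: $\mu_1 = \mu_2$, if the population variances are unknown but equal, the sample sizes are small, and the populations are normally distributed.
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, with degrees of freedom $v = n_1 + n_2 - 2$
  H1: $\mu_1 < \mu_2$   Critical region: $t < -t_{\alpha}$
  H1: $\mu_1 > \mu_2$   Critical region: $t > t_{\alpha}$
  H1: $\mu_1 \ne \mu_2$   Critical region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$

Ho: $\mu_1 = \mu_2$, if the population variances are unknown and not equal, the sample sizes are small, and the populations are normally distributed.
Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, with degrees of freedom $v = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$
  H1: $\mu_1 < \mu_2$   Critical region: $t < -t_{\alpha}$
  H1: $\mu_1 > \mu_2$   Critical region: $t > t_{\alpha}$
  H1: $\mu_1 \ne \mu_2$   Critical region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$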
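The four test statistics in Table 14.2 translate directly into code. The Python sketch below is one possible arrangement, not a standard library API: each hypothetical function takes the two sample means, the standard deviations (the known σ's in the first case) and the sample sizes, and the t versions also return the degrees of freedom v. Critical values still have to be read from the z or t tables.

```python
from math import sqrt


def two_sample_z(x1, x2, sd1, sd2, n1, n2):
    """z for Ho: mu1 = mu2; sd1, sd2 are the known sigmas, or s1, s2 when both samples are large."""
    return (x1 - x2) / sqrt(sd1**2 / n1 + sd2**2 / n2)


def pooled_t(x1, x2, s1, s2, n1, n2):
    """t with pooled variance s_p^2 (variances unknown but equal); v = n1 + n2 - 2."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_var_t(x1, x2, s1, s2, n1, n2):
    """t for unknown, unequal variances, with the approximate degrees of freedom v."""
    a, b = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / sqrt(a + b)
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t, v
```

When $n_1 = n_2 = n$ and $s_1 = s_2$, the pooled and unequal-variance statistics coincide and v reduces to 2(n − 1). Because v from the last formula is usually not an integer, it is commonly rounded down before the t table is consulted.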
Exercise 14.6. The same English examination was given to random samples of 20 seniors at university A
and 22 seniors at university B. The students from university A got a mean of 76.8 with a standard
deviation of 9.5, while the students from university B got a mean of 83.6 with a standard deviation of
10.6. Assuming that the scores at both universities are normally distributed with equal variances, test
whether there is sufficient evidence to say that the performance of the students at university A and
university B is significantly different. Use a 0.05 level of significance.
Exercise 14.7. Two techniques (A and B) for reducing anxiety among children were tried out on two
groups of high-anxiety children consisting of 20 members each. At the end of the experiment an
instrument to measure anxiety level was given to both groups. The instrument is scored such that high-
anxiety children get high scores. Group A got a mean score of 70 with a standard deviation of 10, while
group B got a mean score of 65 with a standard deviation of 8. Determine if there is sufficient evidence
to say that technique A is less effective than technique B. Use a .05 level of significance. Assume that
the scores are normally distributed with equal variance.
Exercise 14.8. A certain food manufacturer uses two machines to produce Caramel Popcorn in
regular and special packs. A random sample of 11 regular packs of Caramel Popcorn showed a mean weight
of 5.95 kg with a variance of 0.18 kg², while a random sample of 15 special packs of Caramel Popcorn
showed a mean weight of 6.01 kg with a variance of 0.2 kg². Test whether the mean weight of a
regular pack of Caramel Popcorn is significantly different from the mean weight of a special pack of
Caramel Popcorn using a 0.05 level of significance. Assume that the weights are normally distributed
with equal variances.
Exercise 14.9. A random sample of 50 students from Region A showed a mean score of 80 with a
standard deviation of 15 in the SASE exam, while a random sample of 35 students from Region B showed
a mean score of 83 with a standard deviation of 10. Can we say that the average performance of the
students from Region A is lower than that of Region B? Use a 0.01 level of significance.
C. Hypothesis Testing for Paired Observations
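Table 14.3. Table for Testing Ho: $\mu_1 - \mu_2 = 0$
Ho: $\mu_1 - \mu_2 = 0$. Test statistic: $t = \dfrac{\bar{d}}{s_d/\sqrt{n}}$ with $v = n - 1$,
where $\bar{d}$ = average of the differences, $d_i$, in each pair, and
$s_d^2 = \dfrac{n\sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)}$, so that $s_d$ = sample standard deviation of the differences.
  H1: $\mu_1 - \mu_2 < 0$   Critical region: $t < -t_{\alpha}$
  H1: $\mu_1 - \mu_2 > 0$   Critical region: $t > t_{\alpha}$
  H1: $\mu_1 - \mu_2 \ne 0$   Critical region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$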
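The entries of Table 14.3 can be computed in a few lines of Python. The sketch below (hypothetical helper `paired_t`) forms the differences $d_i = x_{1i} - x_{2i}$, applies the computational formula for $s_d$ from the table, and returns t with v = n − 1. Comparing its $s_d$ with `statistics.stdev` is a quick way to confirm that the shortcut formula gives the ordinary sample standard deviation. The before/after lists are the data of Exercise 14.10.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t for Ho: mu1 - mu2 = 0, using d_i = before_i - after_i and v = n - 1."""
    d = [x1 - x2 for x1, x2 in zip(before, after)]
    n = len(d)
    # s_d^2 = [n * sum(d_i^2) - (sum d_i)^2] / [n(n - 1)], as in Table 14.3
    s_d = sqrt((n * sum(di * di for di in d) - sum(d) ** 2) / (n * (n - 1)))
    t = mean(d) / (s_d / sqrt(n))
    return t, n - 1, d, s_d


before = [58.5, 60.3, 61.7, 69.0, 64.0]  # Exercise 14.10, weight before (x1)
after = [60.0, 54.9, 58.1, 62.1, 58.5]   # Exercise 14.10, weight after (x2)
t, v, d, s_d = paired_t(before, after)
print(round(s_d, 4), round(stdev(d), 4))  # the two standard deviations agree
```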
Exercise 14.10. The weights in pounds (lbs) of 5 women who took a new dieting pill were recorded
before and after taking the pills for 2 weeks. Test the hypothesis that the pills, on average, reduce
weight, using $\alpha = 0.01$. (Note: This is a one-sided test.) The data, including the differences in
weights, are recorded as follows:
Woman   1   2   3   4   5
Weight before ($x_1$)   58.5   60.3   61.7   69.0   64.0
Weight after ($x_2$)   60.0   54.9   58.1   62.1   58.5
$d_i = x_{1i} - x_{2i}$   −1.5   5.4   3.6   6.9   5.5
Exercise 14.11. The following data come from a study that examines the efficacy of saliva cotinine as an
indicator for exposure to tobacco smoke. In one part of the study, seven subjects – none of whom were
heavy smokers and all of whom had abstained from smoking for at least one week prior to the study –
were each required to smoke a single cigarette. Samples of saliva were taken from all individuals 2, 12,
24 and 48 hours after smoking the cigarette. The cotinine levels at 12 hours and at 24 hours are shown
below:
Subject   Cotinine level after 12 hours (mmol/l)   Cotinine level after 24 hours (mmol/l)
1 73 24
2 58 27
3 67 49
4 93 59
5 33 0
6 18 11
7 147 43
Test at the 0.05 level of significance whether the cotinine level is lower 24 hours after smoking the
cigarette than 12 hours after. Assume that the populations are normally distributed with equal variance.
D. Test on 3 or More Means
Assumptions: The k populations from which the samples are taken are independent, each
population follows a normal (bell-shaped) distribution, and all have the same population variance,
$\sigma^2$.
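This time we will use another notation. Suppose our data are displayed as follows:
Group | Observations | Total | Mean
1 | $X_{11}, X_{12}, X_{13}, \ldots, X_{1r}$ | $T_1$ | $\bar{X}_1$
2 | $X_{21}, X_{22}, X_{23}, \ldots, X_{2r}$ | $T_2$ | $\bar{X}_2$
$\vdots$ | $\vdots$ | $\vdots$ | $\vdots$
k | $X_{k1}, X_{k2}, X_{k3}, \ldots, X_{kr}$ | $T_k$ | $\bar{X}_k$
 | | $T_{..}$ (grand total) | $\bar{X}_{..}$ (grand mean)
Note: Every group has the same number of observations, r.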
The Analysis:
If the means of all the groups are the same, then r × (variance of the k group means) will be
approximately equal to the pooled variance of all the groups. So the ratio
f = [r × (variance among the k group means)] / (pooled variance of the k groups)
will have a value near 1. The random variable f follows the F distribution with k − 1 numerator
degrees of freedom, which we denote by v1, and k(r − 1) denominator degrees of freedom, which
we denote by v2.
However, if the means are not all equal, then r × (variance among the group means) will be
much greater than the pooled variance of the k groups. Consequently, the value of f will be much
greater than 1; that is, it will fall in the right tail of the F distribution.
Because the analysis is based on comparing these two variances, the procedure for testing the
equality of 3 or more means is called Analysis of Variance (ANOVA).
The Null and Alternative Hypotheses:
Ho: All group means are equal ($\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$).
H1: At least 2 means are different.
Test Statistic:
f = (r × variance among group means) / (pooled variance of the k groups)
$$f = \frac{r\sum_{i=1}^{k}\left(\bar{X}_i - \bar{X}_{..}\right)^2}{k - 1} \div \frac{(r-1)s_1^2 + (r-1)s_2^2 + (r-1)s_3^2 + \cdots + (r-1)s_k^2}{rk - k}$$
Decision Rule:
We reject Ho if the computed f is greater than the tabular value $f_{\alpha}$ with k − 1 numerator degrees of
freedom and k(r − 1) denominator degrees of freedom.
Alternative way of computing f:
(SS stands for sum of squares.)
$SS_{Total} = \sum_i \sum_j X_{ij}^2 - \dfrac{T_{..}^2}{rk}$
$SS_{Group} = \dfrac{\sum_i T_i^2}{r} - \dfrac{T_{..}^2}{rk}$
$SS_{Replicate} = SS_{Total} - SS_{Group}$
ANOVA Table
Source of Variation | Sum of Squares, SS | Degrees of Freedom | Mean Square, MS | Computed f
Group Means | $SS_{Group}$ | k − 1 | $SS_{Group}/(k-1)$ | $MS_{Group}/MS_{Replicate}$
Replicates | $SS_{Replicate}$ | k(r − 1) | $SS_{Replicate}/[k(r-1)]$ |
Total | $SS_{Total}$ | rk − 1 | |
Note: Mean square is another name for variance.
Example 14.12. A chemical engineer is studying a newly developed polymer to be used in
removing toxic wastes from water. Experiments are conducted at 5 different temperatures. The
response noted is the percentage of impurities removed by the treatment. The following data were
obtained:
Temperature
I II III IV V
40 36 49 47 55
45 42 51 49 60
42 38 53 51 62
48 39 53 52 63
50 37 52 50 59
51 40 50 51 61
Total 276 232 308 300 360
Mean 46 38.7 51.3 50 60
Test at the 0.05 level of significance whether the mean percentage of impurities removed by the
treatment is the same for the 5 temperature levels.
Answer:
Ho: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
H1: At least 2 $\mu_i$'s are different.
$\sum_i \sum_j X_{ij}^2 = (40)^2 + (45)^2 + \cdots + (59)^2 + (61)^2 = 74{,}268$
$SS_{Total} = 74{,}268 - \dfrac{(1476)^2}{6(5)} = 1{,}648.80$
$SS_{Group} = \dfrac{(276)^2 + (232)^2 + \cdots + (360)^2}{6} - \dfrac{(1476)^2}{6(5)} = 74{,}077.3 - 72{,}619.20 = 1{,}458.10$
$SS_{Replicate} = 1{,}648.80 - 1{,}458.10 = 190.7$
ANOVA Table:
Source of Variation | SS | df | MS | Computed f
Group Means | 1,458.10 | 5 − 1 = 4 | 364.53 | 47.8
Replicates | 190.7 | 5(6 − 1) = 25 | 7.63 |
Total | 1,648.80 | 6(5) − 1 = 29 | |
If $\alpha = 0.05$, with numerator df $v_1 = 4$ and denominator df $v_2 = 25$, then $f_{0.05} = 2.76$.
Since computed f = 47.8 is greater than the tabular value of 2.76, we reject Ho and accept
H1. That is, at least 2 group means are significantly different.
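The sums of squares and the f value of Example 14.12 can be reproduced with the shortcut formulas given earlier. In this Python sketch the helper `one_way_anova` is our own name, and it assumes every group has the same number of observations r, as this section does. Because it carries full precision, it gives $SS_{Group} \approx 1{,}458.13$ and $SS_{Replicate} \approx 190.67$ rather than 1,458.10 and 190.7, which come from rounding 74,077.33 to 74,077.3; the computed f is still about 47.8.

```python
def one_way_anova(groups):
    """Return SS_Total, SS_Group, SS_Replicate and f for k groups of equal size r."""
    k, r = len(groups), len(groups[0])
    if any(len(g) != r for g in groups):
        raise ValueError("this sketch assumes equal sample sizes per group")
    grand_total = sum(sum(g) for g in groups)
    correction = grand_total ** 2 / (r * k)  # T..^2 / (rk)
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_group = sum(sum(g) ** 2 for g in groups) / r - correction
    ss_rep = ss_total - ss_group
    ms_group = ss_group / (k - 1)
    ms_rep = ss_rep / (k * (r - 1))
    return ss_total, ss_group, ss_rep, ms_group / ms_rep


temperatures = [
    [40, 45, 42, 48, 50, 51],  # I
    [36, 42, 38, 39, 37, 40],  # II
    [49, 51, 53, 53, 52, 50],  # III
    [47, 49, 51, 52, 50, 51],  # IV
    [55, 60, 62, 63, 59, 61],  # V
]
ss_total, ss_group, ss_rep, f = one_way_anova(temperatures)
print(round(ss_total, 2), round(ss_group, 2), round(ss_rep, 2), round(f, 1))
```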
E. Multiple-Range Test
The analysis of variance is a powerful procedure for testing equality of several means.
However, when we reject Ho and accept H1 – that not all means are equal – we still do not know
which group means are equal and which are different. There are several tests that separate a set
of significantly different means into subsets of homogeneous means and here we will discuss the
procedure for Duncan's Multiple-Range Test.
Assume that the k groups have equal sample size n.

1. First, we arrange the means of the groups from highest to lowest.
2. Second, we compute the range (highest − lowest) of any subset of p sample means. The
range must exceed a certain value before we consider any of the p population means to be
different. This value is called the least significant range for the p means, and is denoted
by $R_p$, where
$R_p = r_p s_{\bar{x}} = r_p\sqrt{\dfrac{s^2}{n}} = r_p\sqrt{\dfrac{MS_{Error}}{n}}$
and $r_p$, called the least significant studentized range, depends on p, the desired level of
significance, and the number of degrees of freedom for $MS_{Error}$ in the ANOVA table.
The value of $r_p$ can be obtained from Table A.11.
3. Third, we label the groups that are not significantly different by connecting them with a
line or labeling them with the same letter. Groups that are caught between two different
labels are subject to further investigation (such as taking more samples) to determine which
group they really belong to.
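Steps 1 and 2 can be sketched in Python as follows. This is a simplified illustration, not a full implementation of Duncan's procedure: it sorts the means from highest to lowest, computes $R_p = r_p\sqrt{MS_{Error}/n}$ for each pair of ordered means spanning p means, and reports the pairs whose range exceeds $R_p$. It does not apply the usual rule that means lying inside a non-significant range are not separated further, and the $r_p$ values must be supplied by the reader from Table A.11. The function `duncan_pairs` and its arguments are hypothetical names.

```python
from math import sqrt


def duncan_pairs(means, ms_error, n, r_p):
    """Simplified Duncan comparison sketch.

    means: dict mapping group label -> sample mean
    ms_error: MSError from the ANOVA table; n: common sample size per group
    r_p: dict mapping p -> least significant studentized range (from Table A.11)
    Returns the pairs of groups whose range exceeds R_p = r_p * sqrt(MSError / n).
    """
    ordered = sorted(means.items(), key=lambda item: item[1], reverse=True)  # highest first
    s_xbar = sqrt(ms_error / n)
    different = []
    for i in range(len(ordered)):
        for j in range(i + 1, len(ordered)):
            p = j - i + 1  # number of ordered means spanned by this pair
            if ordered[i][1] - ordered[j][1] > r_p[p] * s_xbar:
                different.append((ordered[i][0], ordered[j][0]))
    return different
```

For Exercise 14.14, for instance, `means` would hold the five tablet means, `ms_error` would be 2.880 and `n` would be 5, with `r_p` filled in from Table A.11 for p = 2 to 5.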
Exercise 14.13. Consider the results in Example 14.12. Do a Duncan Multiple-Range Test to determine
which groups (temperature levels) have different means at the 0.01 level of significance.
Exercise 14.14. The data in the table below represent the number of hours of pain relief
provided by 5 different brands of headache tablets administered to 25 subjects. The 25 subjects
were randomly divided into 5 groups and each group was treated with a different brand.
Tablet
A B C D E
5 9 3 2 7
4 7 5 3 6
8 8 2 4 9
6 6 3 1 4
3 9 7 4 7
Total 26 39 20 14 33 132
Mean 5.2 7.8 4.0 2.8 6.6 5.28
The ANOVA result is shown below:

Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | Computed f
Treatment (Tablet) | 79.440 | 4 | 19.860 | 6.90
Error (Replicate) | 57.600 | 20 | 2.880 |
Total | 137.040 | 24 | |
i.) Determine if the mean hours of pain relief provided by the 5 tablets are not all equal.
ii.) If Ho is rejected in part (i), do a Duncan Multiple-Range Test to determine which
tablets have different means using a 0.05 level of significance.
XV. TEST FOR THE EQUALITY OF SEVERAL VARIANCES
Ho: $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$
H1: The variances are not all equal.
Test statistic: $b = \dfrac{\left[(s_1^2)^{n_1 - 1}(s_2^2)^{n_2 - 1}\cdots(s_k^2)^{n_k - 1}\right]^{1/(N - k)}}{s_p^2}$
where $s_p^2 = \dfrac{\sum_{i=1}^{k}(n_i - 1)s_i^2}{N - k}$ is the pooled variance of the k groups
and $N = n_1 + n_2 + \cdots + n_k$.
b is a value of the random variable B having the Bartlett distribution.
Critical region:
If $n_1 = n_2 = \cdots = n_k = n$, then we reject Ho if $b < b_k(\alpha; n)$.
If the sample sizes are unequal, we reject Ho if $b < b_k(\alpha; n_1, n_2, \ldots, n_k)$,
where $b_k(\alpha; n_1, n_2, \ldots, n_k) \approx \dfrac{n_1 b_k(\alpha; n_1) + n_2 b_k(\alpha; n_2) + \cdots + n_k b_k(\alpha; n_k)}{N}$.
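Computing b directly from the product of powers can overflow or underflow for large samples, so the Python sketch below evaluates the same quantity on the log scale. It also forms the weighted critical value for unequal sample sizes. The tabulated $b_k(\alpha; n_i)$ values must come from a Bartlett table; the function names and the `b_table` argument are hypothetical, and any numbers passed in an example call are placeholders, not table values.

```python
from math import exp, log


def bartlett_b(variances, sizes):
    """b = [prod (s_i^2)^(n_i - 1)]^(1/(N - k)) / s_p^2, evaluated on the log scale."""
    k, N = len(sizes), sum(sizes)
    sp2 = sum((n - 1) * v for v, n in zip(variances, sizes)) / (N - k)  # pooled variance
    log_numerator = sum((n - 1) * log(v) for v, n in zip(variances, sizes)) / (N - k)
    return exp(log_numerator) / sp2


def bartlett_critical_unequal(sizes, b_table):
    """Approximate b_k(alpha; n1, ..., nk) = sum(n_i * b_k(alpha; n_i)) / N.

    b_table maps each sample size n_i to the tabulated b_k(alpha; n_i)
    for the chosen alpha and number of groups k.
    """
    return sum(n * b_table[n] for n in sizes) / sum(sizes)
```

The numerator of b is a weighted geometric mean of the $s_i^2$ and $s_p^2$ is the matching weighted arithmetic mean, so b never exceeds 1 and equals 1 only when all sample variances are equal. Small values of b therefore point to unequal variances, which is why Ho is rejected when b falls below the critical value.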
Exercise 15.1. Using the previous example on 5 temperature levels in removing toxic waste
from water, test at .05 level of significance if the variances of the 5 groups are equal.
Exercise 15.2. Using the previous example on 5 tablets for relieving pain, test at .01 level of
significance if the variances of the 5 groups are equal.
XVI. TEST ON CORRELATION
A. For Categorical Variables
Recall that we previously computed Cramér's V coefficient to measure the correlation
between categorical variables. In the process of computing Cramér's V, we computed χ². To test a
hypothesis about the presence of correlation for categorical data, the most widely used procedure is the
chi-square test for independence. We do the following steps:
Ho: There is no correlation between variable X and variable Y. (Or we say that they are not related, or
that variable X does not affect variable Y, etc.)
H1: There is a correlation between variable X and variable Y. (Or we say that they are related, they are
dependent, or one variable affects the other, etc.)
c r o  eij 
2
Test statistic: Chi-square (χ ) test for independence:   

2 2 ij
j i eij
100
ST
Critical Region: χ2>  2 with (r-1)(c-1) degrees of freedom.
Note: r = number of rows

c = number of columns
Example 16.1. Recall our earlier example:

A study was conducted to determine if there is a correlation between smoking status and the
presence or absence of cervical cancer. A survey was conducted on 656 women, and they were classified
as having cancer or without cancer and whether they are smokers or non-smokers. The results are as
follows:
Smoker Non-smoker Total

With Cancer 108 117 225
No Cancer 163 268 431
Total 271 385 656
Test whether there is sufficient evidence to show that there is a correlation between smoking and the
presence of cervical cancer. Use a 0.05 level of significance.
Ho: There is no correlation between smoking and the presence of cervical cancer.
H1: There is a correlation between smoking and the presence of cervical cancer.
$e_{11} = \dfrac{(225)(271)}{656} = 92.9$   $e_{12} = \dfrac{(225)(385)}{656} = 132.1$
$e_{21} = \dfrac{(431)(271)}{656} = 178.1$   $e_{22} = \dfrac{(431)(385)}{656} = 252.9$
It will be convenient to write each expected frequency beside the corresponding observed
frequency before χ² is computed:
108 (92.9)   117 (132.1)
163 (178.1)   268 (252.9)
$\chi^2 = \dfrac{(108 - 92.9)^2}{92.9} + \dfrac{(117 - 132.1)^2}{132.1} + \dfrac{(163 - 178.1)^2}{178.1} + \dfrac{(268 - 252.9)^2}{252.9}$
$= 2.45 + 1.73 + 1.28 + 0.90 = 6.36$
Critical value: $\chi^2_{0.05} = 3.841$ with v = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
Decision and Conclusion: Since the computed χ² = 6.36 is greater than 3.841, we reject H0 and
conclude that there is sufficient evidence to show a correlation between smoking and the presence of
cervical cancer.
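The computation in Example 16.1 can be checked with the Python sketch below (the function name is our own). It computes each expected frequency as (row total)(column total)/(grand total) without rounding. With full precision the statistic is about 6.32 rather than 6.36; the 6.36 comes from rounding the expected frequencies to one decimal place first. The decision is unchanged, since both values exceed 3.841.

```python
def chi_square_independence(table):
    """Return (chi2, expected, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # e_ij = (row i total)(column j total) / grand total
    expected = [[rt * ct / grand for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, expected, df


observed = [[108, 117],  # with cancer: smoker, non-smoker
            [163, 268]]  # no cancer: smoker, non-smoker
chi2, expected, df = chi_square_independence(observed)
print(round(chi2, 2), df)
```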
Exercise 16.2. A study was conducted to evaluate the relative efficacy of supplementation with calcium
versus calcitriol in the treatment of postmenopausal osteoporosis. Calcitriol is an agent that has the
ability to increase gastrointestinal absorption of calcium. A number of patients withdrew from this study
prematurely due to the adverse effects of treatment, which include thirst, skin problems, and neurologic
symptoms. The relevant data appear below:
Treatment   Withdrawal: Yes   Withdrawal: No   Total
Calcitriol 27 287 314
Calcium 20 288 308
Total 47 575 622
a) Compute the sample proportion of subjects who withdrew from the study in each treatment group.
b) Test the null hypothesis that there is no association between treatment group and withdrawal from
the study at the .05 level of significance. What do you conclude?
Remarks:
1) A 2 × 2 contingency table has 1 degree of freedom. In this case, when at least one expected frequency
is less than 10, a correction called Yates's correction for continuity must be applied. The formula is
$\chi^2_{\text{corrected}} = \sum_i \sum_j \dfrac{\left(|o_{ij} - e_{ij}| - 0.5\right)^2}{e_{ij}}$
2) The chi-square test for independence is valid only if no expected frequency is less than 1
and no more than 20% of the cells have an expected frequency less than 5. If this condition is violated,
we usually merge classes in a meaningful way so that the expected frequencies increase. In effect, this
also reduces the number of rows or the number of columns.
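Remark 1 can be sketched the same way. The Python function below (a hypothetical helper, written only for 2 × 2 tables) subtracts 0.5 from each $|o_{ij} - e_{ij}|$ before squaring. Applied to the table of Example 16.1 it gives a somewhat smaller statistic than the uncorrected one, which is the intended effect of the continuity correction; that example does not actually require the correction, because all of its expected frequencies exceed 10.

```python
def chi_square_yates(table):
    """Yates-corrected chi-square for a 2 x 2 table of observed counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # continuity correction: shrink each |o - e| by 0.5 before squaring
            total += (abs(observed - expected) - 0.5) ** 2 / expected
    return total


corrected = chi_square_yates([[108, 117], [163, 268]])  # Example 16.1 data
print(round(corrected, 2))
```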
B. For Numerical Variables
Recall that for numerical variables, we use the Pearson r to determine the strength of the
relationship. The value of the coefficient can be anywhere from -1 to +1. The procedure for testing
hypothesis on this correlation coefficient is shown in the table below.
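Table 14.1 Testing Ho: ρ = 0
Ho: ρ = 0. Test statistic: $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$ with degrees of freedom v = n − 2
  H1: ρ < 0   Critical region: $t < -t_{\alpha}$
  H1: ρ > 0   Critical region: $t > t_{\alpha}$
  H1: ρ ≠ 0   Critical region: $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$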
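The table translates into a short Python sketch: compute Pearson's r from the raw data, then $t = r\sqrt{n-2}/\sqrt{1-r^2}$, which is algebraically the same as the form in the table, with v = n − 2. The helper names are our own, the small data set below is made up purely to show the call, and the critical value must still be read from the t table.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient from paired data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with v = n - 2 (requires |r| < 1)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r), n - 2


x = [1, 2, 3, 4]  # made-up illustration data, not from the exercises
y = [2, 1, 4, 3]
r = pearson_r(x, y)  # 0.6
t, v = correlation_t(r, len(x))
```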
Exercise 16.3. Compute and interpret the correlation coefficient for the following grades of 6 students
selected at random:
Math grade 70 92 80 74 65 83
English grade 74 84 63 87 78 90
Then test at the 0.01 level of significance whether there is a relationship between Math and English
grades.
Exercise 16.4. A study was made by a retail merchant to determine the relation between weekly
advertising expenditures and sales. The following data were recorded.
Advertising costs ($) Sales ($)
40 385
20 400
25 395
20 365
30 475
50 440
40 490
20 420
50 560
40 525
25 480
50 510
Compute and interpret the correlation coefficient. Then test at 0.05 level of significance if there is
sufficient evidence to say that there is really a relationship between advertising cost and sales.
XVII. TEST FOR EQUALITY OF SEVERAL PROPORTIONS
Suppose we would like to compare the proportions of successes of k independent populations. We may
apply the following test:
Ho: $p_1 = p_2 = \cdots = p_k$
H1: Not all proportions of successes are equal.
Test statistic: $\chi^2 = \sum_{i}\sum_{j}\dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$
where $o_{ij}$ = observed frequency in the (i, j)th cell
$e_{ij}$ = expected frequency in the (i, j)th cell
Critical Region: $\chi^2 > \chi^2_{\alpha}$ with (r − 1)(c − 1) = (2 − 1)(c − 1) = c − 1 degrees of freedom, where c = k is the number of populations
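For a 2 × c table of successes and failures, the expected frequencies reduce to $n_j\hat{p}$ and $n_j(1 - \hat{p})$, where $n_j$ is the size of sample j and $\hat{p}$ is the pooled proportion of successes, so the statistic can be sketched in Python as below. The function name and its arguments (counts of successes and sample sizes) are our own choices, not a library API.

```python
def chi_square_proportions(successes, sizes):
    """Chi-square for Ho: p1 = ... = pk from the 2 x k table of successes and failures."""
    pooled = sum(successes) / sum(sizes)  # common proportion of successes under Ho
    chi2 = 0.0
    for x, n in zip(successes, sizes):
        e_success, e_failure = n * pooled, n * (1 - pooled)
        chi2 += (x - e_success) ** 2 / e_success + ((n - x) - e_failure) ** 2 / e_failure
    return chi2, len(sizes) - 1  # degrees of freedom c - 1
```

For Exercise 17.1, for example, `successes` would be the defectives per shift and `sizes` the shift totals (defectives plus nondefectives).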
Exercise 17.1. In a shop study, a set of data was collected to determine whether or not the
proportion of defectives produced by workers was the same for day, evening or night shift
worked. The following data were collected:
Shift
Day Evening Night
Defectives 45 55 70
Nondefectives 905 890 870
Use a .025 level of significance to determine if the proportion of defectives is the same for all
three shifts.
Exercise 17.2. The following data come from a study designed to investigate drinking problems
among college students. In 1983, a group of college students was asked whether they had ever driven an
automobile while drinking. In 1987, after the legal drinking age was raised, a different group of
college students was asked the same question.
Drove while drinking   Year 1983   Year 1987   Total
Yes 1250 991 2241
No 1387 1666 3053
Total 2637 2657 5294
Test the hypothesis that the proportions of students who drove while drinking are the same in the
two calendar years using a 0.05 level of significance.
XVIII. TEST FOR GOODNESS-OF-FIT
This is used to test the hypothesis that a random variable follows an expected
distribution.
Ho: The distribution of the observed sample follows the expected distribution.
H1: The distribution of the observed sample does not follow the expected distribution.
Test statistic: $\chi^2 = \sum_{i}\dfrac{(o_i - e_i)^2}{e_i}$
where $o_i$ = observed frequency in the ith cell
$e_i$ = expected frequency in the ith cell
= (expected proportion of observations falling in the ith category) × (total number of observations)
Critical Region: $\chi^2 > \chi^2_{\alpha}$ with v degrees of freedom
Note: The degrees of freedom equal c (the number of categories) minus the number of quantities computed
from the sample that are used to obtain the expected frequencies. If only the total number of
observations in the sample is used to compute the expected frequencies, then the degrees of freedom
equal c − 1.
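A Python sketch of the goodness-of-fit computation follows. It accepts the expected distribution either as proportions or as ratio parts (such as 5:2:2:1 in Exercise 18.3) and scales them to the observed total. The `extra_params` argument is our own device for the Note above: it counts parameters estimated from the sample beyond the total, so with the default of 0 the degrees of freedom are c − 1. The function name is hypothetical.

```python
def goodness_of_fit(observed, weights, extra_params=0):
    """Return (chi2, expected, df) for a chi-square goodness-of-fit test.

    weights: expected proportions or ratio parts; they are normalised to sum to 1.
    extra_params: parameters estimated from the sample besides the total count.
    """
    n = sum(observed)
    total_weight = sum(weights)
    expected = [n * w / total_weight for w in weights]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - extra_params
    return chi2, expected, df
```

For a balanced-die check such as Exercise 18.1, the call would be `goodness_of_fit(frequencies, [1] * 6)`, with the six observed frequencies as the first argument.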
Exercise 18.1. A die is tossed 180 times with the following results:
X 1 2 3 4 5 6
frequency 28 36 36 30 27 23
Is this a balanced die? Use a .01 level of significance.
Exercise 18.2. A researcher wanted to determine whether age is a factor that contributes to the
occurrence of urinary tract infection. To achieve this goal, the researcher took data from a
sample of hospitals and noted the ages of the patients who were found to have urinary tract
infection. Consider the fictitious data below:
Age   <2   2-5   6-12   13-25   26-40   >40   Total
Obs. no. of patients   60   30   16   18   56   130   310
In order to test whether age is a factor in the occurrence of urinary tract infection, the researcher
compares the percentage distribution of the patients according to age with the percentage
distribution of the local population. Data from the local NSO show the following distribution of
the population:
Age   <2   2-5   6-12   13-25   26-40   >40   Total
Population   4850   4615   4560   4485   4400   3980   26890
What is the finding of the researcher using a 0.05 level of significance?
Exercise 18.3. A machine is supposed to mix peanuts, hazelnuts, cashews, and pecans in the
ratio 5:2:2:1. A can containing 500 of these mixed nuts was found to have 269 peanuts, 112
hazelnuts, 74 cashews and 45 pecans. At the .05 level of significance, test the hypothesis that the
machine is mixing the nuts in the ratio 5:2:2:1.