Probability Models
2.1
In tossing a coin there are two possible outcomes, viz., head and tail, which are mutually exclusive and exhaustive. If we now wish to incorporate equal likelihood of occurrence of these elementary outcomes, then we have to presume that the coin is symmetrical (unbiased) and made completely of a homogeneous material.
The classical definition was enunciated by Laplace & Bernoulli. If there are n exhaustive, mutually exclusive and equally likely cases associated with a random experiment E, and of these, m are found to be favourable to a given event A, then the probability of the event A is defined as the ratio
P(A) = m/n.
If, however, the coin is known to be biased in favour of heads, the classical definition stands helpless to interpret the probability of head or tail.
Even if we accept this equally likely criterion, we may not be able to enumerate all the cases in practice. The counting method fails miserably when the number of possible outcomes becomes countably infinite or even uncountable. For example, if the random experiment E consists of tossing a coin (supposed unbiased) until a head appears, then the sample space Ω would be countably infinite, as Ω = {H, TH, TTH, TTTH, … ad inf}. Then to count the elementary outcomes of Ω we have no combinatorial clues. So the classical definition has a very small compass of applicability: a limited class of random experiments for which the sample space is finite. From the mathematical formula it also follows that the classical definition can yield only rational numbers for the probability of an event, a limitation which is very much unacceptable.
The use of the classical definition is limited to games of chance. The concept of "equally likely" might perhaps sound natural in tossing a coin or throwing a pair of dice, where some symmetry is inherent in the system, but what about systems where any such symmetry is lacking? For instance, can anybody proclaim any symmetry in the sex-distribution of the newborn, which depends fully on the biology of the organisms? The answer is in the negative. Thus, overviewing the various drawbacks, it seems that a refinement of the definition of probability is on the cards.
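The mechanical content of the classical rule P(A) = m/n can be made concrete by enumeration on a finite sample space. A minimal sketch (the two-coin event chosen here is our own illustration, not one from the text); note how exact rational arithmetic makes the "only rational probabilities" limitation visible:

```python
from fractions import Fraction
from itertools import product

# Classical definition: P(A) = m/n, with n exhaustive, mutually exclusive,
# equally likely cases and m of them favourable to A.
# Illustrative event: "at least one head in two tosses of a fair coin".
outcomes = list(product("HT", repeat=2))          # n = 4 equally likely cases
favourable = [w for w in outcomes if "H" in w]    # m = 3 favourable cases

p = Fraction(len(favourable), len(outcomes))
print(p)  # 3/4, necessarily a rational number, as the text observes
```

Any event definable on a finite equally-likely sample space can be handled the same way; for infinite sample spaces the enumeration, like the classical definition itself, breaks down.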
2.2
No deterministic rule seems able to account for this apparent irregularity or chaos in the outcomes of the successive trials. But when the outcomes of a large number of repetitions are looked upon as a whole, a striking regularity is felt even in this chaos. This is what is loosely interpreted as statistical regularity.
We sum up these empirical facts in the following: Let some random experiment E be repeated under identical (invariable) conditions as many as n times, and let an event A, connected with this random experiment E, be observed to occur n(A) times. The ratio n(A)/n, a rational number, is called the relative frequency of the event A in these n trials.
We shall therefore conclude that the event A has a probability, say P(A), if it possesses the following peculiarities:
(i) It is plausible, at least conceptually, to repeat the random experiment an indefinitely large number of times under uniform conditions, in each of which the event A may or may not occur.
(ii) As a result of a sufficiently large number of trials, it is to be presumed that the relative frequency of the event A for nearly every large run of trials departs only slightly from a certain constant.
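The stabilisation described in (ii) can be watched empirically. A simulation sketch (our own illustration, assuming a fair coin, so the constant in question is 1/2):

```python
import random

# Empirical illustration of statistical regularity: toss a fair coin n times
# and watch the relative frequency n(A)/n of A = "head" settle near 1/2.
random.seed(1)

def relative_frequency(n):
    heads = sum(random.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```

The early rows wander; the later ones hug 0.5. No finite simulation proves convergence, of course, which is precisely the logical gap the text raises against the statistical definition.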
Advantages: (i) Unlike the classical definition, Von Mises' statistical definition has a relatively wide domain of applicability, since it applies well to random experiments for which the elementary outcomes are not equally likely and/or are countably infinite.
(ii) The statistical definition of probability given here is descriptive rather than
formally mathematical in character. The frequency interpretation assigns an
empirical meaning to probability.
Disadvantages: Von Mises' definition, although far more efficient than Laplace's classical definition, suffers from the following deficiencies:
(i) It claims stabilisation of the relative frequency of an event near a definite
value purely on the basis of practical experience and fails to account for
the traits of those phenomena for which relative frequency stabilises.
(ii) Suppose that from some source we come to know that the probability of an event A is p. However, through a series of independent trials we observe that the relative frequency of A deviates substantially from the proposed value p. In this case we shall not doubt the existence of a definite probability of the event A (i.e., we shall not be circumspect about the convergence of the rational sequence {n(A)/n} or the correctness of the a priori information), but will be sceptical about our premises concerning the proper organisation of the trials. This belief, on which the statistical definition bases itself, has insuperable logical difficulties.
(iii) Von Mises' definition is clueless when the sample space associated with a random experiment is uncountably infinite. For example, if the experiment involves choosing a point at random from the unit interval [0, 1], then the sample space is uncountable and so the statistical definition fails to click.
(iv) The greatest weakness of the statistical definition is inherent in its mathematical formalism. It tries to blend the empirical concept of relative frequency with the far more subtle analytic concept of convergence of a sequence by claiming statistical regularity.
A Few Important Results Deduced from the Statistical Definition
In the following discussion, Ω is the sample space associated with a random experiment E, ∅ stands for the impossible event, and Ω itself is the certain event.
(a) Since the relative frequency of the impossible event is 0,
P(∅) = lim_{n→∞} n(∅)/n = 0.
(b) If A1 and A2 are two mutually exclusive events with relative frequencies n(A1)/n and n(A2)/n, then
P(A1 ∪ A2) = lim_{n→∞} n(A1 ∪ A2)/n = lim_{n→∞} [n(A1) + n(A2)]/n = lim_{n→∞} n(A1)/n + lim_{n→∞} n(A2)/n = P(A1) + P(A2),
where in the penultimate step we used the result n(A1 ∪ A2) = n(A1) + n(A2), which holds by virtue of the Inclusion-Exclusion principle (see art. 1.3) since A1 and A2 are mutually exclusive events.
By the principle of Mathematical Induction, let us extend this result to any finite number of pairwise mutually exclusive events. Suppose A1, A2, …, Am be m pairwise mutually exclusive events, that is,
Ai ∩ Aj = ∅ for i ≠ j, i, j = 1, 2, …, m.
We are to prove that
Σ_{i=1}^{m} P(Ai) = P(∪_{i=1}^{m} Ai).
The case m = 2 has just been settled. Assume that for some k,
Σ_{i=1}^{k} P(Ai) = P(∪_{i=1}^{k} Ai).
Then
Σ_{i=1}^{k+1} P(Ai) = Σ_{i=1}^{k} P(Ai) + P(Ak+1) = P(∪_{i=1}^{k} Ai) + P(Ak+1) = P((∪_{i=1}^{k} Ai) ∪ Ak+1) = P(∪_{i=1}^{k+1} Ai),
since ∪_{i=1}^{k} Ai and Ak+1 are mutually exclusive. This completes the induction.
Next, suppose Ω consists of m elementary events {a1}, {a2}, …, {am}. Since Ω is the certain event, finite additivity gives
Σ_{i=1}^{m} P({ai}) = 1.
However, for the classical definition of probability to hold good, the outcomes are equally likely, i.e., the P({ai}) are all equal. Hence
1 = Σ_{i=1}^{m} P({ai}) = m P({a1}), implying P({ai}) = 1/m.
Hence, also, for any event A, P(A) ≥ 0.
The salient deductions from the Von Mises definition therefore yield the following features:
(i) Non-negativity of the probability of any event A.
(ii) The probability of the certain event is unity and that of the impossible event is zero. As Ω and ∅ are mutually exclusive and exhaustive, P(Ω) = 1 implies P(∅) = 0 and vice versa. Thus we need not mention both results separately, as they are interrelated.
(iii) Probabilities of pairwise mutually exclusive events are finitely additive.
(iv) Under specific restrictions, classical definition follows from the statistical
definition.
Remark : (a) The classical definition being retrievable from the statistical definition, and many other important deductions having been made from this definition, we are naturally inclined to improve the statistical definition in such a way that all these deductions can be transmitted straightforwardly to the new theory. The axiomatic theory of probability was propounded by Kolmogorov in order to fulfil this aim of not losing what we have already gained and, at the same time, overcoming the logical difficulties. Indeed, if we are to bypass the several logical impediments faced in the statistical definition, then the axiomatic theory is the best alternative.
(b) Axiomatic treatment of a subject is, however, not new to an undergraduate student, as he/she has had a number of occasions to lay hands on it. The previous instances of axiomatic development may be shortlisted as:
(i) In modern theories of geometry, where fundamental ingredients like points, straight lines and planes are left uninterpreted, but their existence is axiomatised at the onset of the development of the subject.
(ii) In the development of set theory, where a precise definition of the fundamental concept of a set itself is lacking.
(iii) In modern analysis, where the number system is developed from Peano's axioms, followed by the Axioms of Integers, which appear in three equivalent forms, viz., the well-ordering principle, the first principle of mathematical induction and the second principle of mathematical induction. Here also, no interpretation of the axioms themselves is involved.
Once we embark on the axiomatic development, it is irrelevant to ponder over the interpretation of probability. It is better that we develop the mathematical theory of probability from a handful of axioms and look upon the preceding relative-frequency interpretation of probabilities as merely an intuitive motivation behind the upcoming definitions and theorems.
(c) The axiomatic theory of probability should be processed in such a way that it is logically sound, free from the shortcomings of the preceding theories, and expansive enough to accommodate all kinds of random experiments, no matter whether the associated sample space is finite, countably infinite or even uncountable (continuous). But once infinite sample spaces get into our discussion, we are compelled to update the concept of an event itself, as the crude definition "any subset of the sample space is an event" is no longer sensible.
2.3
to demand that F is closed under finite union, finite intersection and complementation. For instance, if A and B be two events, then A ∪ B occurs if the outcome of our experiment is representable either by a point in A or by a point in B. Clearly then, if it is going to be meaningful to talk about the probabilities that A and B occur, it should also be meaningful to talk about the probability that either A or B occurs, i.e., that the event A ∪ B occurs. Since only sets in F would be assigned probabilities, we should demand that A ∪ B ∈ F whenever A, B ∈ F. Again, to say that the event A does not occur is to say that the outcome of our experiment is not represented by a point in A, so that it must be represented by a point in A^c. It would be rather absurd if we could talk about the probability of A and not of A^c. Thus, to be more realistic, we demand that F is closed under complementation, i.e., A^c ∈ F whenever A ∈ F. By the principle of mathematical induction it then follows that ∪_{k=1}^{n} Ak ∈ F whenever A1, A2, …, An ∈ F.
and is our fundamental object of interest, as assigning a probability to each such elementary event serves our purpose. In case Ω is uncountably infinite, P(Ω), although a σ-algebra by nature, is not chosen for F, since it is too large to work with. So how to choose F when Ω is uncountable? The clue can be had from the fact that, given any non-empty class C of subsets of Ω, there exists a unique smallest σ-algebra F that contains C; in this case F is referred to as the σ-algebra generated by C. Hence the working F depends solely on what non-empty class C of subsets of Ω we choose in a specific context.
Once the σ-algebra F relative to Ω gets fixed up, we have an algebraic structure (Ω, F), known as a measurable space, the members of F being called measurable sets. In the language of probability theory, (Ω, F) is known as the Event space and the elements of F are known to be events. In particular, when Ω = R and F = B, the measurable space (Ω, F) becomes the one-dimensional Borel space (R, B). This measurable space lies at the heart of distribution functions of random variables. In the above discussion we have not yet stated what B consists of: B indeed is the σ-field generated by the class of all semi-infinite intervals of the form (−∞, a], a ∈ R. B contains all the singletons and all types of intervals, finite or infinite, because of the following relations:
(i) (−∞, a) = ∪_{n=1}^{∞} (−∞, a − 1/n]; a ∈ R
(ii) (a, b] = (−∞, b] − (−∞, a]; a, b ∈ R
(iii) {a} = ∩_{n=1}^{∞} (a − 1/n, a + 1/n]; a ∈ R
(iv) (a, +∞) = ((−∞, a])^c; a ∈ R
(v) (a, b) = (−∞, b) − (−∞, a]; a, b ∈ R
(vi) [a, b] = {a} ∪ (a, b]; a, b ∈ R
(vii) R = ∪_{n=1}^{∞} (−∞, n]
(viii) ∅ = ∩_{n=1}^{∞} (−∞, −n]
Remark : (i) The family of all open intervals, the family of all closed intervals, and the families of all open-closed or closed-open intervals are not σ-algebras.
(ii) A non-trivial σ-algebra (i.e., other than P(Ω)) was required for non-enumerable Ω because it is impossible to define probabilities consistently for all subsets of a non-enumerable Ω like R or any interval in R (since any interval in R is numerically equivalent to R itself, vide the Schroeder-Bernstein theorem).
(iii) Some texts define the σ-algebra F as a collection of subsets of Ω that is closed under countable union & countable intersection and contains the sample space Ω itself. This definition is equivalent to our definition of σ-algebra (check!).
Definition of σ-algebra : A non-empty collection F of subsets of a set Ω is called a σ-algebra or a σ-field of subsets of Ω provided it satisfies the following properties:
(i) Ω ∈ F;
(ii) A ∈ F implies A^c ∈ F;
(iii) A1, A2, … ∈ F implies ∪_{n=1}^{∞} An ∈ F.
From this it transpires that every σ-field is a field, but the converse is not always true.
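For a finite Ω the "σ-algebra generated by C" can be computed outright, since countable unions reduce to finite ones. A small sketch (the four-point Ω and the generating class are our own example, not the text's):

```python
# Close a class C of subsets of a FINITE Omega under complementation and
# union; the fixed point reached is the smallest sigma-field containing C.
def generated_sigma_algebra(omega, C):
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(s) for s in C}
    while True:
        new = set(family)
        new |= {omega - A for A in family}              # complements
        new |= {A | B for A in family for B in family}  # pairwise unions
        if new == family:                               # closed: done
            return family
        family = new

F = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
print(sorted(tuple(sorted(A)) for A in F))
# -> [(), (1,), (1, 2, 3, 4), (2, 3, 4)]
```

The class C = {{1}} generates only four measurable sets, far fewer than the sixteen subsets of Ω, illustrating why the generated σ-algebra, rather than the full power set, is the natural working choice.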
Probability Space
To complete the construction of a decent mathematical model for a random experiment, there remains the job of specification or assignment of probabilities. The completed model is conventionally called a Probability space, which contains, together with a measurable space (Ω, F), a set function P(·). A probability space is often denoted as (Ω, F, P), where Ω is the sample space of the random experiment under discussion, F is the σ-algebra of events of possible interest, and P is the probability measure that assigns to each set A ∈ F a real number P(A).
The probability P(A) in the mathematical model is something that is assumed to exist as an intrinsic feature of the random experiment, intended to represent the proportion of trials in which A occurs, no matter whether a long sequence of trials is carried out in practice or not. For a given A ∈ F, P(A) is intended to embody what is commonly deemed as A's chances of occurrence. Since a proportion is a real number between 0 and 1, 0 ≤ P(A) ≤ 1 is a must. Again, to reflect the realistic nature of the mathematical model, the event Ω representing every possible outcome of the experiment should be assigned probability unity, i.e., P(Ω) = 1 is mandatory. We recall that the need of an axiomatic theory of probability was felt to overcome the drawbacks of the classical and statistical definitions, and so the new model we are trying to build up must involve the extracts of those predated theories. Non-negativity of the probability of all events and the fact P(Ω) = 1 were already involved in the earlier theories.
The other thing that appears as an inheritance from the preceding theories is the finite additivity of the probability measure. It states that if A1, A2, …, An be any finite sequence of pairwise mutually exclusive events, i.e., if Ai ∩ Aj = ∅ whenever i ≠ j, then
P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak).
But mind that our model promises to be valid for all kinds of sample spaces, and our F is a σ-algebra of events. Hence we have to extend the finite additivity criterion to countable additivity, because otherwise we cannot account for the probability of events that decompose into a countably infinite number of more elementary events. In fact, unless the extended axiom of addition is incorporated, a series of mathematical predicaments may crop up in the new model. Hence the updated model involves a probability measure P which is a real-valued set function whose domain is the σ-algebra F and which satisfies the properties:
(i) P(A) ≥ 0 ∀ A ∈ F;
(ii) P(Ω) = 1;
(iii) if A1, A2, … be a sequence of pairwise mutually exclusive events in F, then ∪_{n=1}^{∞} An ∈ F and, moreover, P(∪_{n=1}^{∞} An) = Σ_{n=1}^{∞} P(An).
The last axiom of countable additivity holds the key in our model and includes
finite additivity as a special case.
Remarks : (i) Kolmogorov's set of axioms is incomplete in the sense that, even for one and the same sample space Ω, we can choose the probabilities of the events in F in more than one way. For instance, let us consider throwing a die, the associated sample space being Ω = {1, 2, 3, 4, 5, 6}. Had this die been fair, i.e., symmetrical in geometrical appearance and made of a homogeneous material, we would have P(k) = 1/6 for all k = 1, 2, 3, 4, 5, 6. Had the die not been fair, the probabilities could be assigned otherwise, say P(1) = 1/12, P(2) = 1/4, and so on.
(ii) One can never be sure of having assigned the correct probabilities of each event, for, to have that knowledge, one has to perform infinitely many trials; one therefore has no scope of asserting whether the associated probability space represents the random experiment faithfully. This shortcoming is a general feature of any mathematical model.
(iii) If the sample space Ω is discrete and has n elements, it is possible to assign to every singleton {ω} a definite probability. If A ∈ F then P(A) = Σ_{ω∈A} P({ω}). In particular, one may assign equal probability to all the elementary events of Ω; if the cardinality of A be m, P(A) = m/n.
(i) Since for any event A, A ∪ A^c = Ω and A ∩ A^c = ∅, the axiom of countable additivity asserts that 1 = P(A) + P(A^c), which implies that
P(A^c) = 1 − P(A).
(ii) Using the second axiom of Kolmogorov and also (i), we have
P(∅) = P(Ω^c) = 1 − P(Ω) = 1 − 1 = 0,
i.e., the probability of the impossible event is 0.
Remark : If the probability P(A) of an event A be zero, we cannot conclude that A = ∅. Events A for which P(A) = 0 are said to be stochastically impossible. Impossible events are stochastically impossible, but the converse is not always true. The best examples of non-trivial stochastically impossible events can be had in the case of continuous distributions.
The complement of a stochastically impossible event is called a stochastically certain event. Here also, an event B for which P(B) = 1 is not necessarily the certain event Ω. Stochastically certain events are independent of any other event, the proof of which will be furnished when we introduce the idea of independence of events (see worked example (25)).
(iii) P is bounded above by 1, i.e., P(A) ≤ 1 ∀ A ∈ F, because P(A^c) = 1 − P(A) ≥ 0 (vide the first axiom).
(iv) P is monotone and subtractive, i.e., if A, B ∈ F and A ⊆ B, then P(A) ≤ P(B) and P(B − A) = P(B) − P(A).
Proof : Since A ⊆ B, we can write B = A ∪ (B − A), where A ∩ (B − A) = ∅. From additivity it follows that P(B) = P(A) + P(B − A), and as P(B − A) ≥ 0, P(A) ≤ P(B) holds true. Moreover, P(B − A) = P(B) − P(A). If B were Ω, then monotonicity would assert that P(A) ≤ P(Ω), and as P(Ω) = 1, P(A) ≤ 1. So (iv) includes (iii) as a special case.
(v) Addition Rule : If A, B are two events in F,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof : From the juxtaposed figure it follows that
A ∪ B = (A − B) ∪ (B − A) ∪ (A ∩ B),
A = (A − B) ∪ (A ∩ B),
B = (B − A) ∪ (A ∩ B),
the unions on the right being of mutually exclusive events. Hence
P(A ∪ B) = P(A − B) + P(B − A) + P(A ∩ B),
P(A) = P(A − B) + P(A ∩ B),
P(B) = P(B − A) + P(A ∩ B),
so that
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
(vi) Extended or General Addition Rule : If A1, A2, …, An be n random events, then
P(∪_{i=1}^{n} Ai) = Σ_{i=1}^{n} P(Ai) − Σ_{i,j=1, i<j}^{n} P(Ai ∩ Aj) + Σ_{i,j,k=1, i<j<k}^{n} P(Ai ∩ Aj ∩ Ak) − … + (−1)^{n−1} P(A1 ∩ A2 ∩ … ∩ An).
In (v) we have already seen our result to be true for n = 2, i.e.,
P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).
Assume the result to be true for n = t in particular:
P(∪_{i=1}^{t} Ai) = Σ_{i=1}^{t} P(Ai) − Σ_{i,j=1, i<j}^{t} P(Ai ∩ Aj) + Σ_{i,j,k=1, i<j<k}^{t} P(Ai ∩ Aj ∩ Ak) − … + (−1)^{t−1} P(A1 ∩ A2 ∩ … ∩ At).
Then
P(∪_{i=1}^{t+1} Ai) = P((∪_{i=1}^{t} Ai) ∪ At+1)
= P(∪_{i=1}^{t} Ai) + P(At+1) − P((∪_{i=1}^{t} Ai) ∩ At+1)
= P(∪_{i=1}^{t} Ai) + P(At+1) − P(∪_{i=1}^{t} (Ai ∩ At+1))
= P(∪_{i=1}^{t} Ai) + P(At+1) − P(∪_{i=1}^{t} Bi)   (writing Bi ≡ Ai ∩ At+1).
Again we have
P(∪_{i=1}^{t} Bi) = Σ_{i=1}^{t} P(Bi) − Σ_{i,j=1, i<j}^{t} P(Bi ∩ Bj) + Σ_{i,j,k=1, i<j<k}^{t} P(Bi ∩ Bj ∩ Bk) − … + (−1)^{t−1} P(B1 ∩ B2 ∩ … ∩ Bt).
Now,
Σ_{i=1}^{t} P(Bi) = Σ_{i=1}^{t} P(Ai ∩ At+1),
which supplies exactly the summands involving At+1 needed to pass from Σ_{i,j=1, i<j}^{t} P(Ai ∩ Aj) to Σ_{i,j=1, i<j}^{t+1} P(Ai ∩ Aj). Similarly,
Σ_{i,j=1, i<j}^{t} P(Bi ∩ Bj) = Σ_{i,j=1, i<j}^{t} P(Ai ∩ Aj ∩ At+1),
the number of summands P(Ai ∩ Aj ∩ At+1) being tC2. In Σ_{i,j,k=1, i<j<k}^{t} P(Ai ∩ Aj ∩ Ak) and Σ_{i,j=1, i<j}^{t} P(Bi ∩ Bj), the total number of summands is tC3 in the first case and tC2 in the second, so that there are tC3 + tC2 = (t+1)C3 terms involving ordered triplets, which can be expressed in the condensed form
Σ_{i,j,k=1, i<j<k}^{t+1} P(Ai ∩ Aj ∩ Ak).
Proceeding in this way and employing grouping of terms, one may write
P(∪_{i=1}^{t+1} Ai) = Σ_{i=1}^{t+1} P(Ai) − Σ_{i,j=1, i<j}^{t+1} P(Ai ∩ Aj) + Σ_{i,j,k=1, i<j<k}^{t+1} P(Ai ∩ Aj ∩ Ak) − … + (−1)^t P(A1 ∩ A2 ∩ … ∩ At+1),
so that the result holds for n = t + 1 as well, and hence, by induction, for every finite n.
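The general addition rule can be checked mechanically on a finite equally-likely sample space, where P(E) is simply |E|/|Ω|. A numeric sketch (the sample space and events below are our own arbitrary choices):

```python
from itertools import combinations
from fractions import Fraction

# Verify inclusion-exclusion: P(union) equals the alternating sum of
# probabilities of all r-fold intersections, r = 1, ..., n.
omega = set(range(12))
P = lambda E: Fraction(len(E), len(omega))
events = [{0, 1, 2, 3}, {2, 3, 4, 5, 6}, {5, 6, 7, 8}, {0, 8, 9}]

lhs = P(set().union(*events))

rhs = Fraction(0)
n = len(events)
for r in range(1, n + 1):
    for group in combinations(events, r):
        inter = set.intersection(*group)       # r-fold intersection
        rhs += (-1) ** (r - 1) * P(inter)

print(lhs, rhs)   # the two sides agree exactly
assert lhs == rhs
```

Exact rational arithmetic (`Fraction`) avoids any floating-point doubt about the equality.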
(vii) Boole's Inequality : For any n events A1, A2, …, An ∈ F,
P(∪_{k=1}^{n} Ak) ≤ Σ_{k=1}^{n} P(Ak).
Proof : From the addition rule it follows that P(A1 ∪ A2) ≤ P(A1) + P(A2).
Using this result as the basis of induction, we can prove the above result through
the principle of induction.
Alternative form : For any two events A1 and A2 there holds the inequality
P(A1 ∩ A2) ≥ 1 − P(A1^c) − P(A2^c).
The proof is again based on the addition rule (v):
P(A1 ∩ A2) = P(A1) + P(A2) − P(A1 ∪ A2) ≥ P(A1) + P(A2) − 1 = 1 − P(A1^c) − P(A2^c),
since P(A1 ∪ A2) ≤ 1.
Extension : For n events A1, A2, …, An ∈ F,
P(∩_{i=1}^{n} Ai) ≥ 1 − Σ_{i=1}^{n} P(Ai^c).
Proof : For n = 2 the result has just been established.
Suppose that for n = t the proposition is true:
P(∩_{i=1}^{t} Ai) ≥ 1 − Σ_{i=1}^{t} P(Ai^c).
We prove, on the basis of this supposition, that the proposition is true for n = t + 1:
P(∩_{i=1}^{t+1} Ai) = P((∩_{i=1}^{t} Ai) ∩ At+1) ≥ 1 − P((∩_{i=1}^{t} Ai)^c) − P(At+1^c).
But P((∩_{i=1}^{t} Ai)^c) = 1 − P(∩_{i=1}^{t} Ai) ≤ Σ_{i=1}^{t} P(Ai^c), so that
P(∩_{i=1}^{t+1} Ai) ≥ 1 − Σ_{i=1}^{t} P(Ai^c) − P(At+1^c) = 1 − Σ_{i=1}^{t+1} P(Ai^c).
(ix) Deduction of the Classical Definition of Probability
In Laplace's classical definition, the elementary events (i.e., possible outcomes) are mutually exclusive, exhaustive and equally likely. Let the simple events A1, A2, …, An be pairwise mutually exclusive, i.e., Ai ∩ Aj = ∅ for i ≠ j, exhaustive, i.e., ∪_{i=1}^{n} Ai = Ω, and equally likely, say P(Ak) = c for every k. Then finite additivity gives 1 = P(Ω) = Σ_{k=1}^{n} P(Ak) = nc, whence
P(Ak) = c = 1/n, k = 1, 2, …, n.
Taking B as an event which is the union of m simple events, say {Ai1, Ai2, …, Aim},
P(B) = P(∪_{k=1}^{m} Aik) = Σ_{k=1}^{m} P(Aik) = m/n,
which is exactly the classical definition.
(x) Axiom of Continuity : If {An} be a monotone sequence of events, no matter whether it is expanding or contracting, then
P(lim_{n→∞} An) = lim_{n→∞} P(An).
Proof : Suppose first that {An} is expanding, i.e., A1 ⊆ A2 ⊆ … . Put B1 = A1 and Bk = Ak − Ak−1 for k ≥ 2, so that ∪_{k=1}^{n} Bk = An and Bi ∩ Bj = ∅ when i ≠ j. Then
lim_{n→∞} An = ∪_{n=1}^{∞} An = ∪_{k=1}^{∞} Bk,
and by countable additivity,
P(∪_{n=1}^{∞} An) = P(∪_{k=1}^{∞} Bk) = Σ_{k=1}^{∞} P(Bk) = lim_{n→∞} Σ_{k=1}^{n} P(Bk) = lim_{n→∞} P(∪_{k=1}^{n} Bk) = lim_{n→∞} P(An).
If instead {An} is contracting, then {An^c} is expanding, and by the case just proved,
P(∪_{n=1}^{∞} An^c) = lim_{n→∞} P(An^c)
or, 1 − P(∩_{n=1}^{∞} An) = 1 − lim_{n→∞} P(An)
or, P(∩_{n=1}^{∞} An) = lim_{n→∞} P(An)
or, P(lim_{n→∞} An) = lim_{n→∞} P(An).
Thus P behaves like a continuous set function defined on F, which is perhaps the underlying reason why this result is popular as the Axiom of Continuity.
Remark : The extended axiom of addition (or Kolmogorov's axiom of countable additivity) follows from the Axiom of Continuity. Details of the proof are given in Appendix A2.
2.4
Example 1. Two cards are drawn from a deck of well-shuffled cards. What is
the probability that both the extracted cards are aces ?
Solution : As 52 cards form a complete deck, there exist 52 ways of selecting the first card. After the first card is drawn, the second card may be one of the remaining 51 cards, and so the total number of ways of drawing a pair of cards is 52 × 51. All these cases may be regarded as equally likely. To find the number of favourable cases of drawing a pair of aces, we observe that there are 4 aces; therefore there exist 4 ways to get the first ace and 3 ways to have the second ace. Hence the total number of ways to draw a pair of aces is 4 × 3 = 12, and the required probability is
12/(52 × 51) = 1/221.
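Example 1 can be re-done by brute enumeration, this time over unordered pairs (a modelling choice; the ordered count 12/(52 × 51) gives the same answer):

```python
from fractions import Fraction
from itertools import combinations

# 52 cards labelled 0..51; let the four aces be 0..3 (an arbitrary labelling).
deck = range(52)
aces = set(range(4))

pairs = list(combinations(deck, 2))                    # C(52, 2) = 1326 pairs
favourable = [p for p in pairs if set(p) <= aces]      # C(4, 2)  = 6 pairs

print(Fraction(len(favourable), len(pairs)))   # 1/221, as in the text
```

The agreement between the ordered and unordered counts illustrates that either convention works, provided numerator and denominator use the same one.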
Example 2. If two fair dice are thrown simultaneously, what is the probability
that the sum of the points exceeds 9 ?
Solution: The sample space Ω = {(i, j) : i, j = 1, 2, 3, 4, 5, 6}. For visualisation we arrange those ordered pairs in the form of a 6 × 6 matrix: the first die displays its points along the horizontal while the second die displays its points along the vertical. We need to identify those ordered pairs for which i + j > 9. From this it follows that there are in total 1 + 2 + 3 = 6 favourable cases and hence, by the classical definition of probability, P(required event) = 6/36 = 1/6.
For the second part, let p be the probability that in n throws of the pair of dice a points-total of 12 (single-throw probability 1/36) appears at least once, so that p = 1 − (35/36)^n. We need to determine the minimum value of n for which p > 1/3 holds. To this end we must solve the inequality (35/36)^n < 2/3, which gives
n > (log 3 − log 2)/(log 36 − log 35) ≈ 14.393,
so the minimum number of throws is n = 15.
Example 4. A coin is tossed until a head appears, the probability assigned to the sample point requiring i tosses being 1/2^i, i = 1, 2, …, where i = number of tosses. Show that this indeed denotes a probability measure and, moreover, that the probability that the first head will appear in an even-numbered flip of the coin is 1/3.
Solution : The sample space Ω = {H, TH, TTH, …} is countably infinite. Since Ω is infinite,
P(Ω) = Σ_{i=1}^{∞} 1/2^i = (1/2)/(1 − 1/2) = 1,
and each assigned probability is non-negative, so P is indeed a probability measure. Moreover, the probability that the first head appears in an even-numbered flip is
1/2^2 + 1/2^4 + 1/2^6 + … ad inf = (1/2^2)/(1 − 1/2^2) = 1/3.
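Both geometric series of Example 4 can be checked numerically by truncating at a large N (an approximation of the infinite sums, using exact rationals so the only error is the truncation):

```python
from fractions import Fraction

N = 200   # truncation point; the tails are below 2**(-200)
total = sum(Fraction(1, 2**i) for i in range(1, N + 1))        # P(Omega)
even  = sum(Fraction(1, 2**i) for i in range(2, N + 1, 2))     # even-flip terms

print(float(total))   # essentially 1: the assignment is a probability measure
print(float(even))    # essentially 1/3: first head on an even-numbered flip
```

The same check by simulation (tossing until a head and recording the parity of the stopping time) would converge to the same two values, only far more slowly.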
Example 5. In a game of bridge, find the probability that each of the four
players will hold an ace.
Solution : A bridge deck of 52 cards can be partitioned into four hands of 13 cards each in 52!/(13!)^4 ways. The four aces can be distributed in 4! ways among the four contestants. The remaining 48 cards can be partitioned into four groups of 12 apiece in 48!/(12!)^4 ways. Hence the number of favourable cases (i.e., where each player holds an ace) is 4! · 48!/(12!)^4, so that the required probability is
[4! · 48!/(12!)^4] · [(13!)^4/52!] = 2197/20825 ≈ 0.105.
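The factorial expression of Example 5 is easy to evaluate exactly:

```python
from fractions import Fraction
from math import factorial as f

# Total partitions of 52 cards into four 13-card hands, and the favourable
# ones in which the 4 aces go to distinct players (4! ways) and the other
# 48 cards split into four groups of 12.
total      = Fraction(f(52), f(13) ** 4)
favourable = Fraction(f(4) * f(48), f(12) ** 4)

p = favourable / total
print(p, float(p))   # 2197/20825, about 0.1055
```

Exact arithmetic confirms the reduced fraction 2197/20825 (note 2197 = 13^3), a value that rounds to the 0.105 quoted in the text.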
so that the required probability is 225/900 = 1/4.
The same result could have been obtained also from the addition rule of probability, viz.,
P(A ∪ B) = P(A) + P(B) − P(A ∩ B),
where
A = the set of integers that are divisible by 6,
B = the set of integers that are divisible by 8,
A ∩ B = the set of integers that are divisible by both 6 and 8, i.e., divisible by their l.c.m. 24.
From the workout of chapter 1, it follows that P(A) = 1/6, P(B) = 28/225 and P(A ∩ B) = 37/900, whence
P(A ∪ B) = 150/900 + 112/900 − 37/900 = 225/900 = 1/4.
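The counts behind the addition-rule computation can be verified directly, assuming (as the fractions with denominator 900 suggest) that an integer is picked at random from 1 to 900:

```python
from fractions import Fraction

N = 900
A  = sum(1 for k in range(1, N + 1) if k % 6 == 0)    # divisible by 6
B  = sum(1 for k in range(1, N + 1) if k % 8 == 0)    # divisible by 8
AB = sum(1 for k in range(1, N + 1) if k % 24 == 0)   # divisible by lcm = 24

print(A, B, AB)                    # 150, 112, 37
print(Fraction(A + B - AB, N))     # 225/900 = 1/4
```

The three counts are just the floors 900/6, 900/8 and 900/24, which is why the addition rule reproduces the direct count exactly.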
The roots of ax² + bx + c = 0 are real and distinct precisely when the discriminant b² − 4ac > 0, i.e., when ac < b²/4. Suppose that we are writing out the sample space for two dice, one corresponding to a and the other corresponding to c:
(1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (1, 6)
(2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)  (2, 6)
(3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)  (3, 6)
(4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)  (4, 6)
(5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)  (5, 6)
(6, 1)  (6, 2)  (6, 3)  (6, 4)  (6, 5)  (6, 6)
Let us combine it with different values of b and enquire whether the condition ac < b²/4 for real & distinct roots is satisfied or not.
Case I : b = 1, and so ac < 1/4. There exists no admissible entry in the table to satisfy it.
Case II : b = 2, and hence ac < 1. Here also no admissible pair exists.
Case III : b = 3, yielding ac < 9/4. Here the admissible ordered pairs are (1, 1), (1, 2) and (2, 1).
Case IV : b = 4, so that ac < 4. Here the number of admissible pairs is 5, viz., (1, 1), (1, 2), (1, 3), (2, 1), (3, 1).
Case V : b = 5, making ac < 25/4. The admissible pairs are (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (4, 1), (5, 1), (6, 1), fourteen in all.
Case VI : b = 6, making ac < 9. Here, in addition to Case V, the ordered pairs (2, 4) & (4, 2) are to be enlisted, making the count 16.
For convenience, the counts of favourable simple events are listed:
b       : 1   2   3   4   5    6
# cases : 0   0   3   5   14   16      (Total : 38)
Required probability = 38/6³ = 19/108.
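A brute-force pass over all 6³ triples (a, b, c) settles the case analysis; note that the pair (2, 3) sometimes listed under Case IV fails the test there, since ac = 6 is not less than 4, so the favourable count is 38 and the probability 19/108:

```python
from fractions import Fraction
from itertools import product

# Roots of ax^2 + bx + c real and distinct iff b^2 - 4ac > 0, i.e. 4ac < b^2;
# a, b, c are the faces of three fair dice.
favourable = sum(1 for a, b, c in product(range(1, 7), repeat=3)
                 if 4 * a * c < b * b)

print(favourable, Fraction(favourable, 6 ** 3))   # 38 and 19/108
```

Writing the condition as 4ac < b² keeps the arithmetic in integers and avoids any floating-point comparison at the boundary.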
The probability that exactly one of the events A and B occurs is
P(A ∩ B^c) + P(A^c ∩ B) = P(A − A ∩ B) + P(B − A ∩ B) = P(A) + P(B) − 2P(A ∩ B).
Therefore the information supplied reads:
P(A) = 1/2; P(B) = 2/5; P(C) = 3/10; P(A ∩ B) = 7/20; P(B ∪ C) = 3/5; P(A ∩ B ∩ C) = 1/20; P(A^c ∩ B^c ∩ C^c) = 1/4.
Then
P(A ∪ B ∪ C) = 1 − P(A^c ∩ B^c ∩ C^c) = 1 − 1/4 = 3/4.
Again, by the addition rules,
P(B ∩ C) = P(B) + P(C) − P(B ∪ C) = 2/5 + 3/10 − 3/5 = 1/10,
and
P(A ∩ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(B ∩ C) + P(A ∩ B ∩ C) − P(A ∪ B ∪ C)
= 1/2 + 2/5 + 3/10 − 7/20 − 1/10 + 1/20 − 3/4 = 1/20.
80
C5 ways. This
is the total number of possible samples of size 5 that can be checked by the
inspector. We are to find the probability of detecting an underweight loaf. Out
of the 70 loaves that are either of correct weight or of overweight, number of
possible samples of size 5 is
70
70
C5 .
70
C5
80 C
0.497.
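The computation above takes one line; the figure of 70 "good" loaves is stated, and the batch size of 80 (hence 10 underweight loaves) is our reading of the partly lost setup:

```python
from math import comb

# P(no underweight loaf in the sample) = C(70, 5) / C(80, 5)
p_none = comb(70, 5) / comb(80, 5)
print(1 - p_none)   # about 0.4965, i.e. the 0.497 of the text
```

Complementation does the work here: counting samples that *avoid* the underweight loaves is far easier than enumerating the ways of including at least one.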
(i) The number of points in the sample space is n, of which n1 are of colour 1, n2 of colour 2, …, nr of colour r. Therefore the probability of drawing a ball of the kth colour is nk/n (k = 1, 2, …, r).
(ii) The number of possible ways in which the n balls of the urn can be partitioned into r groups (on the basis of colour difference), with n1 balls of colour 1, n2 balls of colour 2, …, nr balls of colour r, is n!/(n1! n2! … nr!). Similarly, the number of ways in which the i selected balls can be colourwise partitioned into r groups, with i1 of colour 1, i2 of colour 2, …, ir of colour r, is i!/(i1! i2! … ir!). In the same vein, the total number of ways in which the unselected (n − i) balls can be colourwise partitioned into r groups, of which (n1 − i1) balls are of colour 1, (n2 − i2) balls of colour 2, …, (nr − ir) balls of colour r, is
(n − i)!/[(n1 − i1)!(n2 − i2)! … (nr − ir)!].
The required probability is p, where
p = [i!/(i1! i2! … ir!)] · [(n − i)!/((n1 − i1)! … (nr − ir)!)] ÷ [n!/(n1! n2! … nr!)]
= C(n1, i1) C(n2, i2) … C(nr, ir) / C(n, i).
(For an ordinary deck of 52 cards classified by suit, observe n1 = n2 = n3 = n4 = 13.)
Example 12. Two numbers i and j are chosen without replacement from the set {1, 2, …, n}. Find the probability that |i − j| ≥ m, m being a preassigned natural number.
Solution : Since the numbers i and j are chosen without replacement from the set {1, 2, …, n}, the total number of elementary events, i.e., ordered pairs (i, j) with i ≠ j and i, j = 1, 2, …, n, is nP2 = n(n − 1). These elementary events can be arranged in the form of an n × n matrix with its main diagonal removed (see the annexed figure). Our aim is to find the cardinality of the set
A = {(i, j) : |i − j| ≥ m, i ≠ j, i, j = 1, 2, …, n}.
(1, 1)  (1, 2)  (1, 3)  …  (1, n)
(2, 1)  (2, 2)  (2, 3)  …  (2, n)
(3, 1)  (3, 2)  (3, 3)  …  (3, n)
  ⋮       ⋮       ⋮           ⋮
(n, 1)  (n, 2)  (n, 3)  …  (n, n)
From this presentation it is clear that, due to the symmetry of the modulus function, working with the upper triangular matrix is sufficient. In general, the number of elements appearing in a diagonal parallel to the main one and lying at a distance r units from it is (n − r). In the upper triangular matrix, the admissible elementary events appear in the parallel diagonals lying at distances m, (m + 1), …, (n − 1) from the main diagonal, making the total number of favourable elementary events in the upper triangular matrix Σ_{r=m}^{n−1} (n − r). Due to symmetry, the picture is exactly the same for the lower triangular matrix. Hence the total number of favourable elementary events is
2 Σ_{r=m}^{n−1} (n − r) = (n − m)(n − m + 1),
and the required probability is
(n − m)(n − m + 1) / [n(n − 1)].
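The closed form of Example 12 is easy to cross-check against direct enumeration for a few small cases (the particular (n, m) values below are arbitrary):

```python
from itertools import permutations

# Count ordered pairs (i, j), i != j, from {1,...,n} with |i - j| >= m.
def count_direct(n, m):
    return sum(1 for i, j in permutations(range(1, n + 1), 2)
               if abs(i - j) >= m)

for n, m in [(6, 2), (10, 3), (12, 5)]:
    assert count_direct(n, m) == (n - m) * (n - m + 1)
print("formula (n-m)(n-m+1) verified")
```

Dividing either side by n(n − 1), the number of ordered pairs, gives the stated probability.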
Example 13. Find the probability of obtaining a given sum of points with n dice.
Solution : There are as many as 6^n elementary events associated with the n dice. The number of favourable cases is the same as the total number of integer solutions of the Diophantine equation x1 + x2 + … + xn = s, where the xk are integers ranging from 1 to 6. This number can be determined by the following device. Multiplying the polynomial (x + x² + … + x⁶) by itself, the product will consist of terms of the form x^(x1+x2), where x1 and x2 independently assume all integral values from 1 to 6. Grouping all the terms with the same exponent s, the coefficient of x^s will yield the number of solutions of the equation x1 + x2 = s, x1, x2 being subject to the condition 1 ≤ xi ≤ 6. Similarly, multiplying the
polynomial (x + x² + … + x⁶) thrice by itself and grouping all the terms with the same exponent s, the coefficient of x^s will yield the number of admissible integer solutions of x1 + x2 + x3 = s. In general, the total number of admissible integer solutions of the equation x1 + x2 + … + xn = s is the coefficient of x^s in the expansion of (x + x² + … + x⁶)^n. (Incidentally, in the language of combinatorics, this multinomial is called the generating function.)
Now
(x + x² + … + x⁶)^n = x^n (1 + x + … + x⁵)^n = x^n (1 − x⁶)^n / (1 − x)^n = x^n (1 − x⁶)^n (1 − x)^(−n)
= x^n · [Σ_{j=0}^{n} (−1)^j C(n, j) x^(6j)] · [Σ_{k=0}^{∞} C(n + k − 1, k) x^k].
Collecting the coefficient of x^s, i.e., the terms with n + 6j + k = s, the number of favourable cases is
Σ_{j=0}^{[(s−n)/6]} (−1)^j C(n, j) C(s − 6j − 1, n − 1),
where [(s − n)/6] denotes the greatest integer not exceeding (s − n)/6. This being the number of favourable elementary events, the required probability is given by the classical definition as
p = 6^(−n) Σ_{r=0}^{[(s−n)/6]} (−1)^r C(n, r) C(s − 6r − 1, n − 1).
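De Moivre's formula can be checked against brute-force enumeration for small n, which also guards against off-by-one slips in the binomial indices:

```python
from math import comb
from itertools import product
from fractions import Fraction

# P(total of n fair dice equals s), by the generating-function formula.
def p_formula(n, s):
    total = sum((-1) ** r * comb(n, r) * comb(s - 6 * r - 1, n - 1)
                for r in range((s - n) // 6 + 1))
    return Fraction(total, 6 ** n)

# The same probability by direct enumeration of all 6^n outcomes.
def p_brute(n, s):
    hits = sum(1 for dice in product(range(1, 7), repeat=n)
               if sum(dice) == s)
    return Fraction(hits, 6 ** n)

for n, s in [(2, 7), (3, 13), (3, 14), (4, 10)]:
    assert p_formula(n, s) == p_brute(n, s)
print(p_formula(3, 14))   # 5/72, matching the worked value below
```

Brute force is exponential in n, so for many dice only the formula remains practical; that is exactly the point of the generating-function device.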
Remark : (i) The above proof is modeled in the shade of Uspensky's Mathematical Probability. The above problem is known as De Moivre's problem.
(ii) In the special circumstances where we are to find the integer-valued solutions of the Diophantine equation x1 + x2 + … + xn = s under the constraints 1 ≤ xk ≤ 6, k = 1, 2, …, n and 6(n − 1) < s < 6n, we can bypass the above approach of generating functions and use tricky combinatorial results as follows. Introduce new variables yk = 6 − xk, k = 1, 2, …, n; the given equation yields a new equation, viz.,
y1 + y2 + … + yn = Σ_{k=1}^{n} (6 − xk) = 6n − s.
The original problem thus reduces to counting the non-negative integer solutions of this equation; since 6n − s < 6, each yk ≤ 5 automatically, and the number of such solutions is C(6n − s + n − 1, n − 1) = C(7n − s − 1, n − 1). Hence the required probability is
6^(−n) C(7n − s − 1, n − 1).
Method I. Take n = 3 dice and the sum s = 14. By the general formula, [(s − n)/6] = [11/6] = 1, so
p = 6^(−3) Σ_{r=0}^{1} (−1)^r C(3, r) C(13 − 6r, 2) = 6^(−3) [C(13, 2) − C(3, 1) C(7, 2)] = (78 − 63)/216 = 15/216 = 5/72.
Method II. With the same set of notations it follows (since 12 < s < 18) that the required probability is
p = 6^(−3) C(7 · 3 − 14 − 1, 3 − 1) = 6^(−3) C(6, 2) = 15/216 = 5/72.
Example 15. From the urn containing N1 white and N2 black balls (N = N1 +
N2 ), balls are drawn successively without replacement. What is the probability
that the first black ball will be preceded by i white balls ?
Solution : Since balls are being drawn without replacement, the stock of balls in the urn goes on decreasing by one at each step. The first white ball can be any one of the N1 whites present in the urn, the second white ball can be any one of the remaining (N1 − 1) whites in the urn, and so on. Thus the first i whites can be drawn out of the urn in as many as N1(N1 − 1)...(N1 − i + 1) ways. The ball drawn at the next step is mandatorily black, and it can be any one of the N2 blacks present in the urn. Now there are (N − i − 1) balls left in the urn and there is no a priori restriction over their draws. Hence they can be chosen one by one in (N − i − 1)! ways. Combining all this, we have the total number of favourable ways of drawing as N1(N1 − 1)...(N1 − i + 1)·N2·(N − i − 1)!. The total number of ways in which all the balls can be drawn one by one, irrespective of their colour, is N!. This gives us the required probability
p = N1(N1 − 1)...(N1 − i + 1)·N2·(N − i − 1)! / N!.

Example 16. If A1, A2, ..., An are events in F, show that

P(∪_{i=1}^{n} Ai) ≥ Σ_{i=1}^{n} P(Ai) − Σ_{i,j=1, i<j}^{n} P(Ai ∩ Aj).    (1)
Solution : For n = 2 the result holds with equality, by the addition rule:

P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2).

For n = k, we presume the inequality to hold true:

P(∪_{i=1}^{k} Ai) ≥ Σ_{i=1}^{k} P(Ai) − Σ_{i,j=1, i<j}^{k} P(Ai ∩ Aj).

Again,

P(∪_{i=1}^{k+1} Ai) = P((∪_{i=1}^{k} Ai) ∪ A_{k+1})
= P(∪_{i=1}^{k} Ai) + P(A_{k+1}) − P(A_{k+1} ∩ (∪_{i=1}^{k} Ai))
≥ Σ_{i=1}^{k+1} P(Ai) − Σ_{i,j=1, i<j}^{k} P(Ai ∩ Aj) − Σ_{i=1}^{k} P(Ai ∩ A_{k+1})
= Σ_{i=1}^{k+1} P(Ai) − Σ_{i,j=1, i<j}^{k+1} P(Ai ∩ Aj),

proving the validity of the proposition for n = k + 1. Note that in the penultimate step we made use of Boole's inequality, since P(A_{k+1} ∩ (∪_{i=1}^{k} Ai)) = P(∪_{i=1}^{k} (Ai ∩ A_{k+1})) ≤ Σ_{i=1}^{k} P(Ai ∩ A_{k+1}). Hence by the principle of induction, the result is true for any finite n.
Remark : In the same vein one may also prove that

P(∪_{i=1}^{n} Ai) ≤ Σ_{i=1}^{n} P(Ai) − Σ_{i,j=1, i<j}^{n} P(Ai ∩ Aj) + Σ_{i,j,k=1, i<j<k}^{n} P(Ai ∩ Aj ∩ Ak).    (2)

In general, if we truncate the expansion occurring in the general addition rule after a negative term, what we get works as a lower bound for P(∪_{i=1}^{n} Ai), while truncation after a positive term yields an upper bound.
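The Bonferroni lower bound just proved (and Boole's upper bound S1) can be verified exactly on a toy probability space; the outcomes, weights and events below are arbitrary choices of mine, used only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# A small concrete probability space: outcomes 0..7 with integer weights
# (total mass 16); the particular weights and events are arbitrary.
weights = {0: 1, 1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 2, 7: 2}
P = lambda E: Fraction(sum(weights[w] for w in E), 16)

events = [{0, 1, 2, 5}, {1, 3, 5, 7}, {2, 3, 4, 5}]
union = set().union(*events)
S1 = sum(P(A) for A in events)                     # Boole's upper bound
S2 = sum(P(A & B) for A, B in combinations(events, 2))

print(S1 - S2, P(union), S1)   # S1 - S2 <= P(union) <= S1
```

Exact rational arithmetic (`Fraction`) avoids any floating-point doubt about the inequalities.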
2.5 Conditional Probability
Given a probability space (Ω, F, P(·)) and a fixed event A ∈ F, define

PA(B) = P(A ∩ B)/P(A), with P(A) > 0 and B ∈ F.

Clearly PA(A) = P(A ∩ A)/P(A) = 1, and for pairwise disjoint events B1, B2, ... in F,

PA(∪_{n=1}^{∞} Bn) = P(A ∩ ∪_{n=1}^{∞} Bn)/P(A) = P(∪_{n=1}^{∞} (A ∩ Bn))/P(A) = [Σ_{n=1}^{∞} P(A ∩ Bn)]/P(A) = Σ_{n=1}^{∞} PA(Bn)

(where we used Kolmogorov's axiom of countable additivity, valid for the original probability measure P(·)). This validates the countable additivity of the new probability measure PA(·). From the above derivations it is clear that PA(·) is itself a probability measure on (Ω, F). (The advanced reader may look upon this aspect as similar to relative topology.) However, one should keep in mind that, given a probability space (Ω, F, P(·)), many
conditional probability spaces (Ω, F, PA(·)) can be constructed, one differing from another on the basis of the chosen A ∈ F. This is why we usually refer to PA(B) (more conventionally written P(B|A)) as the conditional probability of an event B on the hypothesis that another event A has occurred.
Empirical Interpretation :
The interpretation of this new mathematical entity, conditional probability, can be furnished as follows. In a long sequence of n repetitions of the random experiment E under uniform conditions, there is a subsequence of n(A) repetitions in which the event A occurs, and among these n(A) repetitions the event B occurs (together with A) in n(A ∩ B) instances. The ratio n(A ∩ B)/n(A) is known as the conditional frequency ratio of B on the hypothesis that A has already occurred, and is denoted by fA(B). Assuming that lim_{n→∞} fA(B) exists, we have

lim_{n→∞} fA(B) = lim_{n→∞} n(A ∩ B)/n(A) = lim_{n→∞} [n(A ∩ B)/n] / lim_{n→∞} [n(A)/n] = P(A ∩ B)/P(A), provided P(A) > 0,

which is precisely PA(B).
Note that the definition of conditional probability directly gives rise to the multiplication rule of probability theory :

P(A ∩ B) = P(A)·PA(B) = P(A)·P(B|A).
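The empirical interpretation of PA(B) can be watched numerically in a short simulation. The events here, A = 'the first die shows an even face' and B = 'the total exceeds 7', are my own illustrative choices, not taken from the text.

```python
import random

def conditional_freq(trials, seed=0):
    """Empirical f_A(B) = n(A ∩ B)/n(A) over repeated throws of two dice,
    with A = 'first die even' and B = 'total > 7'."""
    rng = random.Random(seed)
    n_A = n_AB = 0
    for _ in range(trials):
        d1, d2 = rng.randint(1, 6), rng.randint(1, 6)
        if d1 % 2 == 0:
            n_A += 1
            n_AB += d1 + d2 > 7
    return n_AB / n_A

# Exact value P(A ∩ B)/P(A): P(A) = 1/2 and P(A ∩ B) = 9/36.
exact = (9 / 36) / (1 / 2)
print(conditional_freq(200_000), exact)
```

With a couple of hundred thousand repetitions the conditional frequency ratio settles within a fraction of a percent of P(A ∩ B)/P(A).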
An Extension result : If A1, A2, ..., An are events in F with P(Ai) > 0 for each i, which are mutually exclusive and together exhaust Ω, then for any event B ∈ F,

P(B) = Σ_{j=1}^{n} P(Aj) P(B|Aj).
Remark : (a) The usefulness of conditional probability may be summed up as follows. It often happens that the sample space for an experiment needs
to be altered to take into account the availability of some limited information about the outcomes of the random experiment. Such information may well eliminate certain outcomes as impossible which were otherwise (i.e., without the information) possible; in such a case either the appropriate or reduced sample space omits these outcomes, or the new probabilities assigned to them in the revised model are zero.
(b) Conditional probability gives rise to independence and dependence of
events - a feature that stamps probability measure with a special status over
other measure theoretic treatments.
(c) The definition of conditional probability is often useful in the form of a multiplication law :

P(A ∩ B) = P(A)·P(B|A).

More generally, if A1, A2, ..., An are events with P(A1 ∩ A2 ∩ ... ∩ A_{n−1}) > 0, then

P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) ... P(An|A1 ∩ A2 ∩ ... ∩ A_{n−1}).
Here P(A1 ∩ A2 ∩ ... ∩ A_{n−1}) > 0 and so P(A1 ∩ A2 ∩ ... ∩ A_{n−2}) > 0 (why ?). This positivity is propagated through all the predecessors, viz., P(A1 ∩ A2 ∩ ... ∩ A_{n−3}), ..., P(A1), ensuring that all the given conditional probabilities are well-defined. This multiplication rule can be deduced by the method of induction. For n = k, suppose the given proposition holds true:

P(A1 ∩ A2 ∩ ... ∩ Ak) = P(A1) P(A2|A1) ... P(Ak|A1 ∩ A2 ∩ ... ∩ A_{k−1}).

Then

P(A1 ∩ A2 ∩ ... ∩ Ak ∩ A_{k+1}) = P((A1 ∩ A2 ∩ ... ∩ Ak) ∩ A_{k+1})
= P(A_{k+1}|A1 ∩ A2 ∩ ... ∩ Ak) P(A1 ∩ A2 ∩ ... ∩ Ak)
= P(A_{k+1}|A1 ∩ A2 ∩ ... ∩ Ak) P(A1) P(A2|A1) ... P(Ak|A1 ∩ A2 ∩ ... ∩ A_{k−1}).

(Here, in the penultimate step, we made use of the fact that by definition the multiplication rule is true for n = 2.)

Thus, on the basis of the supposition, we have asserted that the generalised multiplication rule is true for n = k + 1. By the Principle of Mathematical Induction the result is valid for all finite n ≥ 2.
Useful corollary :

P(B1 ∪ B2|A) = P(B1|A) + P(B2|A) − P(B1 ∩ B2|A).

The proof starts ab initio :

L.H.S. = P(B1 ∪ B2|A)
= P((B1 ∩ A) ∪ (B2 ∩ A))/P(A)
= [P(B1 ∩ A) + P(B2 ∩ A) − P(B1 ∩ B2 ∩ A)]/P(A)
= P(B1 ∩ A)/P(A) + P(B2 ∩ A)/P(A) − P(B1 ∩ B2 ∩ A)/P(A)
= R.H.S.
a non-zero probability is a must. If, in general, there are n alternative causes A1, A2, ..., An that can all lead to the effect B, then the extension rule of conditional probability comes into play. In this context we refer to the extension rule as the Theorem of Total Probability :

If the events A1, A2, ..., An ∈ F constitute a partition of the sample space Ω and P(Aj) > 0 for j = 1, 2, ..., n, then for any event B ∈ F we have

P(B) = Σ_{j=1}^{n} P(Aj) P(B|Aj).

In the language of probability theory one often says that the Aj's are n mutually exclusive and exhaustive causing events and B is the effect. The quantity P(Aj) is called the a priori probability of the jth cause Aj, while the quantity P(B|Aj) is called the conditional probability of the effect B due to the jth cause Aj. The tree-diagram shows the case in the figure.
Bayes' rule may be deemed as focusing on the reverse problem : given that the effect B has taken place, what is the contribution (as far as probability is concerned) of the jth cause Aj ? The symbolic answer is P(Aj|B), which is the a posteriori probability of the jth cause Aj. Bayes' theorem relates the a priori and the a posteriori probabilities of the causes :

If A1, A2, ..., An constitute a partition of the sample space Ω and P(Aj) > 0 for j = 1, 2, ..., n, then for any event B ∈ F which is not stochastically impossible,

P(Ak|B) = P(Ak) P(B|Ak) / Σ_{j=1}^{n} P(Aj) P(B|Aj), k = 1, 2, ..., n.

Remark : (a) This result is extensible to any countably infinite number of events A1, A2, ..., An, ..., each of which has a positive probability and which together exhaust Ω. If B ∈ F with P(B) > 0, then

P(Ak|B) = P(Ak) P(B|Ak) / Σ_{j=1}^{∞} P(Aj) P(B|Aj), k = 1, 2, ....
Proof :

P(Ak|B) = P(Ak ∩ B)/P(B), and

P(B) = P(∪_{k=1}^{∞} (Ak ∩ B)) = Σ_{k=1}^{∞} P(Ak ∩ B) = Σ_{k=1}^{∞} P(Ak) P(B|Ak),

whence the formula follows. For the proof of the original theorem, the steps are the same in letter and spirit.
(b) The formula reassesses the a priori probabilities in the light of the observed effect, producing the a posteriori probabilities, and it bears a close resemblance to the standard rule for a change of base in logarithms. Bayes' rule is of great importance in Decision Theory, especially in Bayesian estimation, which is beyond the scope of the present text.
The situation may be picturised as follows : the sample space is partitioned as Ω = ∪_{k=1}^{n} Ak, so that exactly one of the causes Ak occurs, and B is an effect to which each cause may contribute.
2.6 Independence of Events
Informally speaking, two events A and B associated with the same random experiment E are independent if the occurrence or non-occurrence of either one does not affect the probability of occurrence of the other. In mathematical language, A is independent of B if the conditional probability P(A|B) equals the unconditional probability P(A), i.e., if P(A) = P(A|B). If this is so and P(A) > 0, then according to Bayes' rule,

P(B|A) = P(B)·P(A|B)/P(A) = P(B)·P(A)/P(A) = P(B),

showing that B is then also independent of A.
combination of 1, 2, ..., n taken three at a time,
..................
P(A1 ∩ A2 ∩ ... ∩ An) = P(A1) P(A2) ... P(An).

The constraint relations are C(n, 2) + C(n, 3) + ... + C(n, n) = 2^n − n − 1 in number.

Trivial observation : Mutual independence of events implies pairwise independence of events. However, by the following counterexample we show that pairwise independence of events does not necessarily imply mutual independence of events.
Let the equally likely outcomes of an experiment be one of the four points in R^3 with Cartesian co-ordinates (1, 0, 0), (0, 1, 0), (0, 0, 1) and (1, 1, 1). Let A, B, C denote the events that the first, the second and the third co-ordinate, respectively, equals 1. Then

P(A) = P(B) = P(C) = 1/2,

while

P(A ∩ B) = 1/4 = P(A)·P(B); P(B ∩ C) = 1/4 = P(B)·P(C); P(C ∩ A) = 1/4 = P(C)·P(A),

so that A, B, C are pairwise independent. But

P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A)·P(B)·P(C),

so the three events are not mutually independent.

Conversely, level-3 independence need not imply level-2 (pairwise) independence. Consider a throw of two fair dice and the events A : the first die shows 1, 2 or 3; B : the first die shows 3, 4 or 5; C : the total of the two faces is 9. Obviously we have

A ∩ B = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)};
A ∩ C = {(3, 6)};
B ∩ C = {(3, 6), (4, 5), (5, 4)};
A ∩ B ∩ C = {(3, 6)}.
Here

P(A) = 1/2; P(B) = 1/2; P(C) = 4/36 = 1/9,

and

P(A ∩ B ∩ C) = 1/36 = (1/2)·(1/2)·(1/9) = P(A)·P(B)·P(C),

indicating that the events A, B, C have level-3 independence. But

P(A ∩ B) = 1/6 ≠ 1/4 = P(A)·P(B);
P(A ∩ C) = 1/36 ≠ 1/18 = P(A)·P(C);
P(B ∩ C) = 1/12 ≠ 1/18 = P(B)·P(C).

Hence we draw the conclusion that level-3 independence does not imply level-2 independence.
Figure 2.5: Venn diagram for the example illustrating level-3 independence not
implying level-2 independence.
Figure 2.6: Venn diagram representing the non-transitivity of pairwise independence of events.
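Both counterexamples are finite, so the independence claims can be verified by direct enumeration. The event definitions in this sketch follow my reading of the two examples above and should be checked against the original statements.

```python
from fractions import Fraction

# Bernstein's example: four equally likely points in R^3.
pts = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
P = lambda ev: Fraction(sum(1 for w in pts if ev(w)), len(pts))
A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[2] == 1

pairwise = all(P(lambda w: X(w) and Y(w)) == P(X) * P(Y)
               for X, Y in [(A, B), (B, C), (C, A)])
mutual = P(lambda w: A(w) and B(w) and C(w)) == P(A) * P(B) * P(C)

# The dice example: A = first die in {1,2,3}, B = first die in {3,4,5},
# C = total of 9; definitions reconstructed from the listed intersections.
dice = [(i, j) for i in range(1, 7) for j in range(1, 7)]
Q = lambda ev: Fraction(sum(1 for w in dice if ev(w)), 36)
A2 = lambda w: w[0] in (1, 2, 3)
B2 = lambda w: w[0] in (3, 4, 5)
C2 = lambda w: w[0] + w[1] == 9

level3 = Q(lambda w: A2(w) and B2(w) and C2(w)) == Q(A2) * Q(B2) * Q(C2)
level2 = Q(lambda w: A2(w) and B2(w)) == Q(A2) * Q(B2)
print(pairwise, mutual, level3, level2)   # True False True False
```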
2.7
P(A) = 2/5; P(B) = 8/25; P(C) = 7/25.
If D denotes the event that the product selected is defective, we have

P(D|A) = 5/100 = 1/20; P(D|B) = 3/100; and P(D|C) = 1/50.
By the theorem of total probability,

P(D) = (2/5)·(1/20) + (8/25)·(3/100) + (7/25)·(1/50) = 25/1250 + 12/1250 + 7/1250 = 44/1250.
We are to find the conditional probability P(A|D), i.e., given that a randomly selected product is found to be defective, the probability of its being produced by machine A. To this end we have Bayes' rule :

P(A|D) = P(A)·P(D|A)/P(D) = (1/50)/(44/1250) = 25/44.
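The computation of P(A|D) is easy to mechanise with exact rational arithmetic; the priors and defect rates below are those of the example.

```python
from fractions import Fraction as F

prior = {'A': F(2, 5), 'B': F(8, 25), 'C': F(7, 25)}      # machine probabilities
p_def = {'A': F(1, 20), 'B': F(3, 100), 'C': F(1, 50)}    # defect rates

p_D = sum(prior[m] * p_def[m] for m in prior)   # theorem of total probability
posterior_A = prior['A'] * p_def['A'] / p_D     # Bayes' rule
print(p_D, posterior_A)   # 22/625 (= 44/1250) and 25/44
```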
Example 18. Urn 1 contains one white and two black marbles, Urn 2 contains one black and two white marbles, while Urn 3 contains three black and three white marbles. A die is rolled : if the die shows up 1, 2 or 3, urn 1 is selected; if the die shows up 4, urn 2 is selected; if the die shows up 5 or 6, urn 3 is selected. A marble is then drawn at random from the selected urn. Let A be the event that the marble drawn is white. If U, V, W respectively denote the events that the urn selected is 1, 2, 3, find the probability P(V|A).
Solution : We have

P(A ∩ U) = P(U)·P(A|U) = (3/6)·(1/3) = 1/6,
P(A ∩ V) = P(V)·P(A|V) = (1/6)·(2/3) = 1/9,
P(A ∩ W) = P(W)·P(A|W) = (2/6)·(3/6) = 1/6,

so that P(A) = 1/6 + 1/9 + 1/6 = 4/9. Hence

P(V|A) = P(V)·P(A|V) / [P(U)·P(A|U) + P(V)·P(A|V) + P(W)·P(A|W)] = (1/9)/(4/9) = 1/4.
Example 19. A box contains three coins; two of them are fair and one is two-headed. A coin is selected at random and tossed. If a head appears, the same coin is tossed again; if a tail appears, then another coin is selected at random from the two remaining coins and tossed.
(i) Find the probability that heads appear twice.
(ii) If the same coin is tossed twice, find the probability that it is the two-headed coin.
(iii) Find the probability that tails appear twice.
Solution : Let A1, A2 denote the selection of the two fair coins and A3 that of the two-headed coin, so that P(A1) = P(A2) = P(A3) = 1/3. For the coin first selected,

P(H) = (2/3)·(1/2) + (1/3)·1 = 2/3, P(T) = 1 − P(H) = 1/3.

With this set-up, let us work out (i), (ii) and (iii).

(i) P(two heads) = P(HH). By the condition of the problem, as two heads are required, the first toss must be a head and the coin used is retained for the second toss:

P(HH) = (2/3)·(1/2)^2 + (1/3)·1^2 = 1/6 + 1/3 = 1/2.

(ii) P(the coin is two-headed, given that the same coin was tossed twice) = P(A3|H). Since the same coin is tossed twice, a head on the first toss is a must, as otherwise the coin, per the rules, would be left out. The required probability is

P(A3|H) = (1/3)·1 / (2/3) = 1/2.

(iii) A tail on the first toss occurs only with a fair coin, so P(T) = 1/3; the second coin is then drawn at random from one fair and one two-headed coin, giving a tail with probability (1/2)·(1/2) = 1/4. Hence

P(TT) = (1/3)·(1/4) = 1/12.
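The three-coin procedure is simple to simulate; the estimates below should hover near the exact values 1/2 for two heads and 1/12 for two tails (the latter as reconstructed above).

```python
import random

def trial(rng):
    """One run: select a coin and toss it; on a head toss it again,
    on a tail toss a second coin chosen from the remaining two."""
    coins = ['fair', 'fair', 'two-headed']
    rng.shuffle(coins)
    toss = lambda c: 'H' if c == 'two-headed' or rng.random() < 0.5 else 'T'
    first = coins.pop()
    r1 = toss(first)
    second = first if r1 == 'H' else rng.choice(coins)
    return r1 + toss(second)

rng = random.Random(42)
N = 200_000
runs = [trial(rng) for _ in range(N)]
print(runs.count('HH') / N, runs.count('TT') / N)   # near 1/2 and 1/12
```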
2
letters of the English Alphabet, (a) with replacement (b) without replacement,
Find for each of the cases (a) and (b), the probabilities of the word formed (i)
contains an a (ii) Consists of only vowels (iii) The word is woman.
Solution : Part (a). Left to the reader as an exercise.
Part (b). All the letters being equally likely, P (a in the first draw) =
1
26 .
is the number of favourable cases, while 26P5 (= 26·25·24·23·22) is the total number of outcomes; the ratio of the two is the required probability.
(iii) For the formation of the word 'woman', the first letter must be 'w', followed by 'o', 'm', 'a' and 'n' in succession. Without replacement,

P('w' drawn first) = 1/26,
P('o' drawn second, given 'w' drawn first) = 1/25,
P('m' drawn third, given 'w' and 'o' drawn in the 1st and 2nd draws) = 1/24,

and similarly 1/23 and 1/22 for 'a' and 'n'. The required probability is therefore

(1/26)·(1/25)·(1/24)·(1/23)·(1/22) = 1/7893600.
Example 21. You heard that an old friend of yours has two children, one of
them is a girl, but you do not know whether the other child is a boy or a girl.
How likely is it that the other child is a boy ?
The sample space is Ω = {BB, BG, GB, GG} with uniform probability. (Here we denote a boy by B and a girl by G.) Consider the events X = {BB, BG, GB} and Y = {BG, GB, GG}. Obviously X occurs if at least one child is a boy and Y occurs if at least one child is a girl. Hence

P(X|Y) = P(X ∩ Y)/P(Y) = #{BG, GB} / #{BG, GB, GG} = 2/3.
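A few lines of enumeration confirm the answer:

```python
from fractions import Fraction
from itertools import product

space = list(product("BG", repeat=2))    # {BB, BG, GB, GG}, equally likely
Y = [w for w in space if "G" in w]       # at least one girl (the given info)
X_and_Y = [w for w in Y if "B" in w]     # also at least one boy
p = Fraction(len(X_and_Y), len(Y))
print(p)   # 2/3
```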
Example 22. There are a white and b black balls in an urn. Two players draw one ball each in turn, replacing it each time and stirring the balls of the urn. The player who first draws a white ball wins the game. Find the probability that the player who begins the game wins.
Solution : The first player can obviously win the game either in the first or in the third or in any subsequent odd-numbered draw. In the first draw, the probability of drawing a white ball is a/(a+b). The probability of winning at the third draw is (b/(a+b))^2 · (a/(a+b)). [Observe that the game is extended up to the third draw only if the first two draws yield black balls and the third draw produces a white ball; since balls are replaced, these events are statistically independent.] In general, the (2m+1)-th draw can yield the first white ball with probability (b/(a+b))^{2m} · (a/(a+b)). Using Kolmogorov's axiom of countable additivity, the probability p1 that the player who begins wins is

p1 = a/(a+b) + (b/(a+b))^2 · (a/(a+b)) + ... + (b/(a+b))^{2m} · (a/(a+b)) + ... to ∞
= (a/(a+b)) Σ_{m=0}^{∞} (b/(a+b))^{2m}
= (a/(a+b)) · 1/(1 − (b/(a+b))^2)
= (a+b)/(a+2b),

the geometric series converging because b/(a+b) < 1. Since 2(a+b) > a+2b, we have p1 > 1/2, showing that the game heavily leans on who initiates it. Had there been a toss to demarcate which of the two players will set the game rolling, it is a foregone conclusion that in the long run the starter has the greater probability of winning.
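The closed form can be checked against a partial sum of the series; the helper names are mine.

```python
from fractions import Fraction as F

def p_first_wins(a, b):
    """Closed form (a + b)/(a + 2b) obtained from the geometric series."""
    return F(a + b, a + 2 * b)

def p_first_wins_series(a, b, terms=60):
    """Partial sum of P(win at draw 1) + P(win at draw 3) + ..."""
    p, q = F(a, a + b), F(b, a + b)
    return sum(p * q ** (2 * m) for m in range(terms))

print(p_first_wins(1, 1), float(p_first_wins_series(1, 1)))   # 2/3 either way
```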
A generalisation of the previous problem is the following Huygens problem :

Example 23. A and B throw a pair of dice alternately, in that order. A wins if he scores a total of 6 points before B scores a total of 7 points, in which case B wins. If A starts the game, what is his probability of winning ?
Solution : Write

p1 = P(A scoring 6) = 5/36, q1 = P(A not scoring 6) = 31/36,
p2 = P(B scoring 7) = 6/36, q2 = P(B not scoring 7) = 30/36.

A wins at his (k+1)-th throw precisely when his first k throws and B's first k throws all fail; hence

P(A) = Σ_{k=0}^{∞} p1 (q1 q2)^k = p1/(1 − q1 q2) = (5/36)/(1 − (31/36)·(30/36)) = 30/61,

and

P(B) = 1 − P(A) = 1 − 30/61 = 31/61.
Remark. If the set-up of the problem were such that q2 = q1, then as a corollary we would get back the previous problem. Actually the generalisation owes to the lack of symmetry of this problem.
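The same arithmetic can be carried out in exact fractions:

```python
from fractions import Fraction as F

p1, q1 = F(5, 36), F(31, 36)    # A throws a total of 6 / fails to
p2, q2 = F(6, 36), F(30, 36)    # B throws a total of 7 / fails to
P_A = p1 / (1 - q1 * q2)        # the geometric series summed above
print(P_A, 1 - P_A)             # 30/61 and 31/61
```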
Example 24. Polya's Urn model : Consider an urn that initially contains r red balls and b black balls. At each trial one ball is drawn at random; it is replaced, and c (> 0) balls of the colour drawn are added to the urn. Let Aj denote the event that the jth ball drawn is black. Show that P(Aj) = b/(b+r) for every j = 1, 2, ....
Solution : Clearly P(A1) = b/(b+r). If A1 occurred, the first ball drawn is black and hence at the onset of the second draw the urn contains r red and (b+c) black balls, so that P(A2|A1) = (b+c)/(r+b+c). If instead A1 failed, the first ball drawn is red, and so at the onset of the second draw the urn contains (r+c) red and b black balls, so that P(A2|A1ᶜ) = b/(b+r+c). By the theorem of total probability,

P(A2) = (b/(b+r))·((b+c)/(r+b+c)) + (r/(b+r))·(b/(r+b+c)) = b(b+c+r)/((b+r)(r+b+c)) = b/(b+r),

so P(A2) = P(A1) = b/(b+r). Similarly,

P(A3|A1 ∩ A2) = (b+2c)/(r+b+2c)    (why ?),

and, continuing in the same vein,

P(An|A1 ∩ A2 ∩ ... ∩ A_{n−1}) = (b+(n−1)c)/(r+b+(n−1)c),

so that an induction on j yields P(Aj) = b/(b+r) for every j.
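The invariance P(Aj) = b/(b+r) is striking and easy to test by simulation; the parameter values below (b = 3, r = 2, c = 2) are arbitrary choices of mine.

```python
import random

def polya_p_black(j, b, r, c, trials=100_000, seed=0):
    """Monte Carlo estimate of P(j-th draw is black) in Polya's urn scheme."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        black, red = b, r
        for _ in range(j):
            drew_black = rng.random() * (black + red) < black
            if drew_black:
                black += c        # reinforce the colour just drawn
            else:
                red += c
        hits += drew_black        # colour of the j-th (last simulated) draw
    return hits / trials

print([round(polya_p_black(j, 3, 2, 2), 3) for j in (1, 2, 5)])  # all near 3/5
```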
where we have made use of the independence of events; p, the probability of a road being blocked by a traffic jam, is the same for all roads. Hence

P(open route from A to C via B) = (1 − p^2)^2.

Further, suppose that there is also a direct road from A to C, which is independently blocked by a traffic jam with probability p. Conditioning on the state of the direct road,

P(open route from A to C) = (1 − p^2)^2 · p + 1·(1 − p).
(b) Three urns contain respectively 1 white and 2 black balls; 2 white and 1
black balls; 2 white and 2 black balls. One ball is transferred from the first to
the second urn; then one ball is transferred from the second to the third urn;
finally one ball is drawn from the third urn. Find the probability that the ball
drawn from the third urn is white.
Denote by W (respectively B) the colour of the ball transferred at a given step, and consider the four mutually exclusive channels leading to a white ball from U3 :

C1 : W from U1, W from U2, white from U3;
C2 : W from U1, B from U2, white from U3;
C3 : B from U1, W from U2, white from U3;
C4 : B from U1, B from U2, white from U3.

Observe that P(W|U1) = 1/3 and P(B|U1) = 2/3. Now P(white from U2 | W was transferred from U1) = 3/4, and P(white from U3 | W was transferred from U2 while W was transferred from U1 to U2) = 3/5. By the multiplication rule,

P(C1 channel followed) = (1/3)·(3/4)·(3/5) = 9/60.    (i)

If W came from U1 but B from U2, then P(B from U2 | W from U1) = 1/4 and U3 then holds 2 white and 3 black balls, so

P(C2 channel followed) = (1/3)·(1/4)·(2/5) = 2/60.    (ii)

If B came from U1, then P(white from U2 | B from U1) = 2/4, and P(white from U3 | W was transferred from U2 while B was transferred from U1 to U2) = 3/5, so

P(C3 channel followed) = (2/3)·(2/4)·(3/5) = 12/60.    (iii)

Finally,

P(C4 channel followed) = (2/3)·(2/4)·(2/5) = 8/60.    (iv)

The four channels being mutually exclusive, the required probability is 9/60 + 2/60 + 12/60 + 8/60 = 31/60.
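The channel computation can be re-done mechanically; the loop below just multiplies the same conditional probabilities and sums over the four channels.

```python
from fractions import Fraction as F

# U1: 1W 2B -> transfer one ball -> U2: 2W 1B -> transfer -> U3: 2W 2B -> draw.
p = F(0)
for p1, w2 in ((F(1, 3), 3), (F(2, 3), 2)):   # ball from U1; whites then in U2 (of 4)
    for p2, w3 in ((F(w2, 4), 3), (F(4 - w2, 4), 2)):  # ball from U2; whites in U3 (of 5)
        p += p1 * p2 * F(w3, 5)               # finally, white drawn from U3
print(p)   # 31/60
```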
Find the probabilities that
(i) there is at least one match;
(ii) there is no match;
(iii) there are exactly r matches.
Solution : (i) By the general addition rule, or the so-called Poincaré Theorem,

P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} P(Ak) − Σ_{i1<i2} P(Ai1 ∩ Ai2) + ... + (−1)^{n−1} P(A1 ∩ A2 ∩ ... ∩ An)
= S1 − S2 + S3 − ... + (−1)^{n−1} Sn = Σ_{k=1}^{n} (−1)^{k−1} Sk,

where

Sk = Σ_{i1<i2<...<ik} P(Ai1 ∩ Ai2 ∩ ... ∩ Aik) = C(n, k)·(n−k)!/n! = 1/k!.

[Note that the total number of ways in which n distinct objects can be placed into n positions is n!, and when matches at k specified positions take place, the remaining (n−k) objects can be placed in the remaining (n−k) positions in (n−k)! ways.] Hence

P(∪_{k=1}^{n} Ak) = Σ_{k=1}^{n} (−1)^{k−1}/k!.

(ii) P(no matches) = P(∩_{k=1}^{n} Akᶜ) = 1 − P(∪_{k=1}^{n} Ak) = 1 − Σ_{k=1}^{n} (−1)^{k−1}/k! = Σ_{k=0}^{n} (−1)^k/k!.

Note that the R.H.S. is the sum of the first (n+1) terms of the Taylor series of e^{−1}, and so for moderate values of n, 1/e works as a remarkably good approximation to the probability of no matches. It follows from this simple observation that the probability of at least one match among n randomly permuted objects is practically independent of n.
(iii) To solve the problem of exactly r matches, all that is necessary is to realise that the events 'exactly r matches occurring at the positions i1, i2, ..., ir' are disjoint events for the various choices of i1, i2, ..., ir, the total number of such choices being C(n, r). Thus, using the definition of conditional probability,

P(exactly r matches) = C(n, r)·P(A1 ∩ A2 ∩ ... ∩ Ar ∩ B) = C(n, r)·P(A1 ∩ A2 ∩ ... ∩ Ar)·P(B|A1 ∩ A2 ∩ ... ∩ Ar),

where B denotes the event of no matching in the last (n−r) positions. Again,

P(B|A1 ∩ A2 ∩ ... ∩ Ar) = P(no matches in the last (n−r) positions, given that matches occur at the first r positions) = Σ_{k=0}^{n−r} (−1)^k/k!,

so that

P(exactly r matches) = C(n, r)·((n−r)!/n!)·Σ_{k=0}^{n−r} (−1)^k/k! = (1/r!)·Σ_{k=0}^{n−r} (−1)^k/k!.
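Both formulas can be verified against brute-force enumeration over all permutations for a small n; the function names below are mine.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_exact_matches(n, r):
    """(1/r!) * sum_{k=0}^{n-r} (-1)^k / k!, as derived above."""
    s = sum(Fraction((-1) ** k, factorial(k)) for k in range(n - r + 1))
    return s / factorial(r)

def p_exact_matches_brute(n, r):
    """Directly count permutations of {0,...,n-1} with exactly r fixed points."""
    hits = sum(1 for perm in permutations(range(n))
               if sum(perm[i] == i for i in range(n)) == r)
    return Fraction(hits, factorial(n))

print([float(p_exact_matches(6, r)) for r in (0, 1, 2)])
```

For n as small as 10, the no-match probability already agrees with 1/e to many decimal places, illustrating the remark above.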
P(A) = P(A ∩ Ω) = P(A ∩ (B ∪ Bᶜ)) = P(A ∩ B) + P(A ∩ Bᶜ) = P(A)·P(B) + P(A ∩ Bᶜ),

so that

P(A ∩ Bᶜ) = P(A)(1 − P(B)) = P(A)·P(Bᶜ).

Again,

P(B) = P(B ∩ Ω) = P(B ∩ (A ∪ Aᶜ)) = P(B ∩ A) + P(B ∩ Aᶜ) = P(A)·P(B) + P(B ∩ Aᶜ),

so that

P(B ∩ Aᶜ) = P(B)·P(Aᶜ).

Finally,

P(Aᶜ ∩ Bᶜ) = P(Aᶜ) − P(Aᶜ ∩ B) = P(Aᶜ) − P(Aᶜ)·P(B) = P(Aᶜ)·P(Bᶜ),

indicating that Aᶜ and Bᶜ are independent.
Example 28. Any event A is independent of a stochastically certain event B; i.e., if P(B) = 1, then P(A ∩ B) = P(A)·P(B). Prove or disprove.

Solution : By the monotonicity of the probability function, P(B) ≤ P(A ∪ B). Since P(B) = 1 by hypothesis and P(A ∪ B) ≤ 1 for any event A ∪ B, we have 1 ≤ P(A ∪ B) ≤ 1, i.e., P(A ∪ B) = 1. Using the addition rule,

P(A) + P(B) − P(A ∩ B) = 1.

Hence

P(A ∩ B) = P(A) = P(A)·P(B),

ascertaining the claim.
Example 29. If A, B, C are mutually independent events, show that Aᶜ, Bᶜ and Cᶜ are mutually independent events.

Solution : From the definition of mutual independence of A, B, C it follows that

P(A ∩ B) = P(A)·P(B); P(B ∩ C) = P(B)·P(C); P(C ∩ A) = P(C)·P(A),

and again P(A ∩ B ∩ C) = P(A)·P(B)·P(C). To prove that Aᶜ, Bᶜ and Cᶜ are mutually independent we are to show both level-2 and level-3 independence. But

P(Aᶜ ∩ Bᶜ) = P(Aᶜ)·P(Bᶜ); P(Bᶜ ∩ Cᶜ) = P(Bᶜ)·P(Cᶜ); P(Cᶜ ∩ Aᶜ) = P(Cᶜ)·P(Aᶜ),

because A, B, C are pairwise independent (cf. example 25 above). Finally, for level-3 independence,

P(Aᶜ ∩ Bᶜ ∩ Cᶜ) = P((A ∪ B ∪ C)ᶜ) = 1 − P(A ∪ B ∪ C)
= 1 − [P(A) + P(B) + P(C) − P(A)P(B) − P(B)P(C) − P(C)P(A) + P(A)P(B)P(C)]
= (1 − P(A))(1 − P(B))(1 − P(C)) = P(Aᶜ)·P(Bᶜ)·P(Cᶜ).
2.8 Exercises
7. A man seeks advice regarding one of the possible two courses of action
from three seniors who arrive at their recommendations independently.
He follows the recommendations of the majority. The probabilities of the
individual advisers being wrong are 0.1, 0.05, 0.05 respectively. What is
the probability that the man takes an incorrect advice ?
8. The numbers 1, 2, 3, 4, 5 are written on five cards. Three cards are drawn
in succession. If the resulting digits are arranged from left to right, what is the probability that the three-digit number formed will be even ?
9. De Méré's Paradox : Which is more probable, to get an ace with four dice, or to get a double ace in 24 throws of two dice ?
10. (a) An experiment has four possible outcomes A, B, C, D which are mutually exclusive. Is the following assignment of probabilities a feasible one ?

P(A) = 3/10; P(B) = 7/30; P(C) = 11/20; P(D) = 1/5.

Explain your answer.
3n/(4n^2 − 1).
12. Repeated drawings are made with replacement from the set of five letters A, B, C, D, E. What is the probability that B will not occur ?
13. From an urn containing N1 white and N2 black balls (N = N1 + N2), balls are drawn one by one without replacement until only those of the same colour are left. Prove that the probability that the balls left are white is N1/N.
it is found to be a woman. What is the probability that our first choice
was a man ?
16. Three urns contain respectively 1 white and 2 black balls; 2 white and 1
black balls; 2 white and 2 black balls. One ball is transferred from the first
urn into the second, then one from the latter is transferred into the third;
finally, one ball is drawn from the third urn. What is the probability of
its being white ?
17. In a competitive examination, a student has to undergo a set of multiple
choice questions in which each question has 4 possible alternative answers,
exactly one of which is correct. If the student knows the answer, he picks
the correct alternative and otherwise, he selects one answer randomly out
of the 4 alternatives provided. If the student knows 60% answers correctly,
(a) what is the probability that for a given question, the student gets the
correct answer ?
(b) if the student chooses the correct alternative answer to a given question, what is the probability that he applied guesswork ?
18. For any three events A, B, C defined on the same probability space, if
B C and P (A) > 0, then prove that P (B|A) P (C|A).
19. Two persons throw a pair of dice once each. What is the probability that
the outcome of two throws is equal ?
20. There are three identical boxes, each provided with two drawers. In the first box each drawer contains a gold coin; in the second, one drawer contains a gold and the other a silver coin; in the third, each drawer contains a silver coin. A box is selected at random and one of its drawers opened. If a gold coin is found, find the probability that the box chosen was the second one.
21. There are n urns each containing N balls, of which N1 are white and N2 are black. One ball is transferred from the first to the second urn; then one ball is transferred from the second to the third, and so on; finally one ball is drawn from the nth urn. Prove that the probability of the last ball being white is N1/N.
22. If the probabilities of n mutually independent events are p1, p2, ..., pn, then show that the probability that at least one of these events will occur is 1 − (1 − p1)(1 − p2)...(1 − pn). In technical terms, this probability is referred to as the reliability of a system having n components connected in parallel.
23. A player randomly chooses one of the coins A and B. The probability of coin A showing up heads is 3/4, while the probability of heads for coin B is 1/4. He tosses the chosen coin twice.

(a) Find the probability that he obtains (i) two heads, (ii) exactly one head.
(b) Instead of the above strategy, suppose the player chooses an unbiased
coin and tosses it twice. What procedure or strategy should the player
adopt to maximize the probability of at least one head ?
24. In Chennai 75% of the population are Tamils and the rest non-Tamils. 20% of the Tamils and 10% of the non-Tamils speak English. A stranger to Chennai meets a local resident who can speak English. What is the probability that the local is a non-Tamil ?
25. A person wrote letters to n addresses, put one letter in each envelope, and
then at random wrote one of the n addresses on each envelope. What is
the probability that no letter reached its proper destination ?
26. If the events A, B, C are mutually independent, then prove that the pairs (A, B ∩ C), (B, A ∩ C), (C, A ∩ B) also consist of independent events.
27. An experiment can result in one of five outcomes, whose assigned probabilities are as follows : w1 with probability 1/8; w2, w3, w4 each with probability 3/16; w5 with probability 5/16. Consider the events

E = {w1, w2, w3}; F = {w1, w2, w4}; G = {w1, w3, w4}.

Show that E, F, G are not pairwise independent, although P(E ∩ F ∩ G) = P(E)·P(F)·P(G), i.e., they have level-3 independence.
28. If n objects are distributed at random among a men and b (b < a) women, show that the probability that the women get an odd number of objects is (1/2)[1 − ((a − b)/(a + b))^n].
29. A parent particle can be split up into 0, 1 or 2 particles with probabilities 1/4, 1/2 and 1/4 respectively. Starting with a single particle and letting each particle produced split independently in the same manner, denote by Xi the number of particles in the ith generation. Find (a) P(X2 > 0) and (b) the probability that X1 = 2, given that X2 = 1.
30. It is suspected that a patient has one of the diseases A1 , A2 , A3 . Suppose
that the population percentages suffering from these illnesses are in the
ratio 2 : 1 : 1. The patient is given a test which turns out to be positive in 25% of the cases of A1, 50% of A2, and 90% of A3. Given that out of three
tests taken by the patient two were positive, find the probability for each
of the three illnesses.
31. Laplace's law of succession : Assuming that it was hot on n consecutive days,
what is the probability that it will be hot during the next m days ? Try
to formulate an example to show that this law of succession yields absurd
results at times.
32. In a box there are 10 cut-up alphabet cards bearing the letters : three A's, four M's and three N's. We draw three cards one after another and place the letters on the table in the order they have been drawn. What is the probability that the word MAN will appear ? If, instead, the three letters are drawn out simultaneously, what is the probability that the word MAN can be formed from the letters drawn ?
33. A man goes to his office following one of the three routes A1 , A2 and
A3 . His choice of route is quite independent of the weather. If it rains, his
probabilities of arriving late, following routes A1 , A2 , A3 are 0.06, 0.15, 0.12
respectively. The corresponding probabilities, if it does not rain, are
0.05, 0.10, 0.15.
(a) Given that on a sunny day he arrives late, what is the probability that
he used route A3 ? (Assume that two in every five days are rainy).
(b) Given that on a day he arrives late, what is the probability that it is
a rainy day ?
34. If A, B, C are any three events associated with a random experiment, show that P(A ∩ B|C) = P(B|A ∩ C)·P(A|C).
35. The integers x and y are chosen at random with replacement from the first nine natural numbers {1, 2, ..., 9}. Find the probability that |x^2 − y^2| is divisible by 2.
36. Most pairs of independent events that we deal with are distinct events. However, this need not be the case, as there are events that are independent of themselves. An event A is said to be independent of itself if P(A) = (P(A))^2. Identify these events associated with a given random experiment.
37. A coin is tossed (m + n) times (m > n). Show that the probability of exactly m consecutive heads is (n + 3)/2^{m+2}, and that of at least m consecutive heads is (n + 2)/2^{m+1}.
38. You have two coins in your pocket : a fair one and an unfair one with probability of heads 1/3. One of the coins is picked at random and tossed, falling heads up. How likely is it that it is the fair one ?
39. Cards are dealt from a well-shuffled pack until the first heart appears.
(a) What is the probability that exactly 5 deals are required ?
(b) What is the probability that 5 or fewer deals are required ?
(c) What is the probability that exactly 3 deals were required, given the information that 5 or fewer were required ?
40. The four major blood groups (O, A, B and AB) are present in Indians approximately in the proportions 24%, 28%, 30% and 18%.
(a) If two people are picked at random from this population, whats the
chance that their blood groups are same ? different ?
(b) If four people are picked at random, and P(k) denotes the chance of these four people having exactly k different blood types among them, find P(k) for k = 1, 2, 3, 4.
41. Suppose that a laboratory test on a blood sample yields one of two results, positive or negative. It is found that 95% of people with a particular disease produce a positive result, but 2% of the people without the disease will also produce a positive result (a false positive). Suppose that 1% of the population actually has the disease. What is the probability that a person chosen at random from the population has the said disease, given that the laboratory test yields a positive result ?
Appendix to Chapter II
A1. σ-algebra construction in R
In remark (i) of Section 2.3 we stated that none of the family of all open intervals in R, the family of all half-open intervals in R, and the family of all closed intervals in R is a σ-algebra in its own right. However, the σ-algebras which these families generate are all identical. We shall now prove a theorem which is more general, viz., that the σ-algebras generated by each of the following families of subsets of R are all identical.

[In what follows, the σ-algebra generated by the family Ak of subsets of R is denoted by F(Ak).]
Theorem : Show that F(A1) = F(A2) = ... = F(A8), where
A1 : the family of open intervals of the form (a, b);
A2 : the family of half-open intervals of the form (a, b];
A3 : the family of half-open intervals of the form [a, b);
A4 : the family of closed intervals of the form [a, b];
A5 : the family of semi-infinite intervals of the form (−∞, a);
A6 : the family of semi-infinite intervals of the form (a, +∞);
A7 : the family of open subsets of R;
A8 : the family of closed subsets of R.
Proof : We shall try to prove this theorem in the following sequence :

F(A1) ⊆ F(A2) ⊆ F(A3) ⊆ F(A4) ⊆ F(A5) ⊆ F(A6) ⊆ F(A7) ⊆ F(A8) ⊆ F(A1).

Since

(a, b) = ∪_{n=1}^{∞} (a, b − 1/n],

every open interval belongs to F(A2), and hence F(A1) ⊆ F(A2).
Now

(a, b) = ∪_{n=1}^{∞} [a + 1/n, b − 1/n) ∈ F(A3) and {b} = ∩_{n=1}^{∞} [b − 1/n, b + 1/n) ∈ F(A3),

so that (a, b] = (a, b) ∪ {b} ∈ F(A3). Hence F(A2) ⊆ F(A3).

Again, since

[a, b) = ∪_{n=1}^{∞} [a, b − 1/n],

we have [a, b) ∈ F(A4), and so F(A3) ⊆ F(A4).
Further,

[a, b] = (−∞, b] ∩ ((−∞, a))ᶜ = (∩_{n=1}^{∞} (−∞, b + 1/n)) ∩ ((−∞, a))ᶜ ∈ F(A5),

since, being a σ-algebra, F(A5) is closed under countable intersection and complementation. Hence F(A4) ⊆ F(A5).
We observe that F(A5) ⊆ F(A6), as for any real a,

(−∞, a) = ([a, ∞))ᶜ = (∩_{n=1}^{∞} (a − 1/n, ∞))ᶜ ∈ F(A6).

Since each open interval is an open set in R, any interval of the form (a, ∞) is an open set and so belongs to A7. Hence F(A6) ⊆ F(A7).

As the complement of any closed set in R is an open set in R, and F(A8) is closed under complementation, every open set in R is contained in F(A8). Hence F(A7) ⊆ F(A8).
Finally, by the Lindelöf covering theorem, every open set in R is expressible
as a countable union of open intervals. Hence every open set in R belongs to
F(A_1), and so also does every closed set in R. This ensures that F(A_8) ⊆ F(A_1).
All these prove that the σ-algebras F(A_k), k = 1, 2, . . . , 8 are identical. We
therefore denote them by a common symbol, viz., F(R) or B, and refer to it as
the σ-algebra of Borel subsets of R. The elements of B are called Borel sets
in R.
Note : If A_k is a family of subsets of R, then F(A_k), the σ-algebra generated
by A_k, is the intersection of all σ-algebras (of subsets of R) that contain A_k :
F(A_k) = ⋂ {F : F is a σ-algebra containing A_k}.
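On a finite Ω the generation process can be made concrete: repeatedly closing a family under complementation and union until nothing new appears yields the generated σ-algebra (on a finite set, closure under finite unions suffices, since every σ-algebra there is a finite field of sets). A minimal Python sketch, with a hypothetical helper name:

```python
from itertools import combinations

def generated_sigma_algebra(omega, family):
    """Close `family` under complementation and union until stable.
    On a finite omega every sigma-algebra is a finite field of sets,
    so this closure is exactly F(family). Illustrative sketch only."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(s) for s in family}
    changed = True
    while changed:
        changed = False
        snapshot = list(sets)
        for s in snapshot:
            comp = omega - s
            if comp not in sets:
                sets.add(comp)
                changed = True
        for a, b in combinations(snapshot, 2):
            if a | b not in sets:
                sets.add(a | b)
                changed = True
    return sets

# sigma-algebra on {1, 2, 3, 4} generated by the single set {1}
F = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
print(sorted(sorted(s) for s in F))   # [[], [1], [1, 2, 3, 4], [2, 3, 4]]
```

The infinite case, of course, cannot be reached by such finitary closure; there the "intersection of all σ-algebras containing A_k" definition above is essential.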
It is very interesting to see that N and Z are Borel sets in R. For each k ∈ N,
{k} belongs to F(A_3) and hence to B, so
N = ⋃_{k=1}^∞ {k} ∈ B.
In the same vein, one can show that Z is a Borel set in R. One can also show
that the Cantor set is a Borel set in R. As N and Z are Borel sets, we can
develop the theory of discrete random variables; the Cantor set is useful in
constructing the Cantor function, which is a singular continuous distribution
function (see A3, Chapter 4).
A few remarks on the idea of independence of two or more events
Two events A and B are said to be independent if both the relations P(B|A) =
P(B) and P(B|A^c) = P(B) hold true. But why two? The answer can be found
by looking into the basic definition of conditional probability. Note
that P(B) = P(B|A) is valid only if P(A) ≠ 0; if P(A) = 0, this
relation is undefined. But P(A) = 0 means P(A^c) = 1, and hence P(B|A^c)
is well-defined. So the two relations put up for characterising independence
supplement each other, and both of them lead us to the multiplication rule, viz.,
P(A ∩ B) = P(A)P(B).
Indeed, since P(B|A^c) = P(B ∩ A^c)/P(A^c) and P(B ∩ A^c) = P(B) − P(B ∩ A),
the relation P(B|A^c) = P(B) implies P(A ∩ B) = P(A)P(B).
The pair of relations defining independence of events is intuitively appealing,
as it pronounces that the event B is independent of A provided the knowledge
of occurrence or non-occurrence of A does not in any way alter the probability
of B.
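The multiplication rule is easy to check numerically on a finite sample space; the two events chosen below (on a classical two-dice experiment) are illustrative:

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice; every sample point has probability 1/36
omega = list(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), 36)

A = {w for w in omega if w[0] % 2 == 0}   # first die shows an even face
B = {w for w in omega if w[1] > 4}        # second die shows 5 or 6

# The multiplication rule P(A ∩ B) = P(A)P(B) characterises independence
print(P(A & B), P(A) * P(B))     # 1/6 1/6
print(P(A & B) == P(A) * P(B))   # True
```

Here the independence is built into the physics of the experiment (the two dice do not influence each other), and the arithmetic merely confirms it.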
The idea of independence of several events is a natural extension of the idea
of independence of two events. For instance, three events A, B, C are called
independent if
(i) the probability of B does not in any way depend on whether or not A occurs;
(ii) the probability of C is not in any way influenced by the a priori information
about which two of the four events A, A^c, B, B^c did occur, i.e.,
P(C) = P(C|A ∩ B) = P(C|A^c ∩ B) = P(C|A ∩ B^c) = P(C|A^c ∩ B^c).
All these, however, lead us to the set of four relations characterising mutual
independence of three events A, B, C.
Since A and B are independent, P(A ∩ B) = P(A)P(B), so
P(C) = P(C|A ∩ B) = P(A ∩ B ∩ C)/P(A ∩ B) = P(A ∩ B ∩ C)/[P(A)P(B)],
i.e.,
P(A ∩ B ∩ C) = P(A)P(B)P(C)   [level-3 independence].
Again,
P(C) = P(C|A^c ∩ B) = P((B ∩ C) ∩ A^c)/P(A^c ∩ B) = P((B ∩ C) ∩ A^c)/[P(A^c)P(B)]
= [P(B ∩ C) − P(A ∩ B ∩ C)] / [P(B)(1 − P(A))] = [P(B ∩ C) − P(A)P(B)P(C)] / [P(B)(1 − P(A))],
whence P(B ∩ C) = P(B)P(C).
Similarly, P(C) = P(C|A ∩ B^c) yields P(A ∩ C) = P(A)P(C). Thus the mutual
independence of A, B, C is characterised by the four relations
P(A ∩ B) = P(A)P(B), P(B ∩ C) = P(B)P(C), P(A ∩ C) = P(A)P(C),
P(A ∩ B ∩ C) = P(A)P(B)P(C).
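That all four relations are genuinely needed can be seen from the classical example attributed to S. Bernstein, in which the three pairwise relations hold while the level-3 relation fails; a Python sketch:

```python
from fractions import Fraction

# Classical example attributed to S. Bernstein:
# Omega = {1, 2, 3, 4} with the uniform probability
P = lambda E: Fraction(len(E), 4)
A, B, C = {1, 4}, {2, 4}, {3, 4}

# all three pairwise product relations hold ...
assert P(A & B) == P(A) * P(B)
assert P(B & C) == P(B) * P(C)
assert P(A & C) == P(A) * P(C)

# ... but the level-3 relation fails: 1/4 on the left, 1/8 on the right
print(P(A & B & C), P(A) * P(B) * P(C))   # 1/4 1/8
```

So pairwise independence does not imply mutual independence, which is precisely why the fourth relation must be imposed separately.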
Now, for a sequence of events A_1, A_2, . . ., put B_n = ⋃_{k=n+1}^∞ A_k, so that
{B_n} is a decreasing sequence. When ⋂_{n=1}^∞ B_n = ∅, we have
lim B_n = ⋂_{n=1}^∞ B_n = ∅, and the continuity of P gives
lim P(B_n) = P(lim B_n) = P(∅) = 0.
Again, for any sequence of events {A_k},
P(⋃_{k=1}^∞ A_k) = lim_{n→∞} P(⋃_{k=1}^n A_k) ≤ lim_{n→∞} Σ_{k=1}^n P(A_k) = Σ_{k=1}^∞ P(A_k)   [due to subadditivity].
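The countable subadditivity inequality can be checked numerically on a finite probability space; the overlapping events below are chosen purely for illustration:

```python
from fractions import Fraction

# Finite probability space: one fair die
P = lambda E: Fraction(len(E), 6)

# Overlapping (not disjoint) events, chosen purely for illustration
events = [{1, 2}, {2, 3}, {3, 4, 5}]
union = set().union(*events)

lhs = P(union)                    # P(A1 ∪ A2 ∪ A3) = 5/6
rhs = sum(P(A) for A in events)   # 2/6 + 2/6 + 3/6 = 7/6
print(lhs, rhs, lhs <= rhs)       # 5/6 7/6 True
```

The strict inequality here reflects the overlaps: equality holds exactly when the events are pairwise disjoint.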
If A_1, A_2, . . . , A_n form a partition of the sample space, the theorem of
total probability gives
P(B) = Σ_{j=1}^n P(A_j) P(B|A_j).
This relation reminds us of the linear span representation of a vector in a
vector space in terms of a set of preassigned basis vectors. Here the a priori
probabilities P(A_j), which partition unity, work as the basis vectors, while
the likelihoods P(B|A_j) work like the coefficients appearing in the linear
span. However, if you are too much of a professional, you may treat it barely
as a fiddlestick.
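The "linear span" reading of the total probability theorem can be sketched numerically; the partition and the likelihoods below are illustrative:

```python
from fractions import Fraction

# Illustrative partition A_1, A_2, A_3 with its a priori probabilities
prior = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]       # P(A_j)
likelihood = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 1)]  # P(B|A_j)

assert sum(prior) == 1   # the a priori probabilities partition unity

# Total probability: P(B) = sum_j P(A_j) P(B|A_j) -- the likelihoods
# enter like coordinates over the "basis" formed by the partition
p_B = sum(p * l for p, l in zip(prior, likelihood))
print(p_B)   # 11/24
```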